Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  Vasiliki Kalavri | Boston University 2020. Stochastic averaging: use one hash function to simulate many by splitting the hash value into two parts. We split the input bits to compute the rank(.): $h(x) = (i_0 i_1 \ldots i_{M-1})_2,\ i_k \in \{0,1\}$ and $j = (i_0 i_1 \ldots i_{p-1})_2$. For ...
  0 points | 69 pages | 630.01 KB | 1 year ago
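The splitting step in the snippet above can be illustrated with a minimal Python sketch. It assumes a 32-bit hash (M = 32) taken from SHA-256 and p = 4 index bits, and it assumes rank(.) means the 1-based position of the leftmost 1-bit, as in LogLog-style sketches; the snippet itself does not confirm any of these choices, and the names `hash_bits`, `split_hash`, `rank`, and `update` are hypothetical.

```python
import hashlib

M = 32  # assumed hash width in bits
P = 4   # assumed number of leading bits used as the bucket index j


def hash_bits(x: str) -> int:
    """Return an M-bit hash of x (assumption: first 32 bits of SHA-256)."""
    digest = hashlib.sha256(x.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def split_hash(h: int, m: int = M, p: int = P):
    """Split an m-bit hash into (j, rest): the first p bits and the remaining m - p bits."""
    j = h >> (m - p)                    # j = (i_0 ... i_{p-1})_2
    rest = h & ((1 << (m - p)) - 1)     # (i_p ... i_{m-1})_2
    return j, rest


def rank(w: int, width: int) -> int:
    """Assumed definition: 1-based position of the leftmost 1-bit in a
    width-bit word, or width + 1 if the word is all zeros."""
    if w == 0:
        return width + 1
    return width - w.bit_length() + 1


def update(buckets: list, x: str) -> None:
    """Route x to bucket j and keep the maximum rank seen in that bucket."""
    j, rest = split_hash(hash_bits(x))
    buckets[j] = max(buckets[j], rank(rest, M - P))


if __name__ == "__main__":
    buckets = [0] * (1 << P)  # 2^p buckets simulate 2^p hash functions
    for item in ["a", "b", "c", "a"]:
        update(buckets, item)
    print(buckets)
```

Each bucket then behaves like an independent estimator fed by its own hash function, which is the point of using one hash value for both routing and ranking.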
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  Vasiliki Kalavri | Boston University 2020. • Cost of Merge = 0.5 • Cost of A = 0.5 • Splitting A allows a pre-aggregation similar to what combiners do in MapReduce. Operator separation merge ...
  0 points | 54 pages | 2.83 MB | 1 year ago
PyFlink 1.15 Documentation
  ... "/Users/dianfu/code/src/github/pyflink-faq/testing/test_utils.py", line 122, in setUp self.t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) File "/Users/dianfu/code/src/github ... in in_streaming_mode get_gateway().jvm.EnvironmentSettings.inStreamingMode()) File "/Users/dianfu/code/src/github/pyflink-faq/testing/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-10 ... egg/pyflink/java_gateway.py", line 62, in get_gateway _gateway = launch_gateway() File "/Users/dianfu/code/src/github/pyflink-faq/testing/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-10 ...
  0 points | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
  ... "/Users/dianfu/code/src/github/pyflink-faq/testing/test_utils.py", line 122, in setUp self.t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) File "/Users/dianfu/code/src/github ... in in_streaming_mode get_gateway().jvm.EnvironmentSettings.inStreamingMode()) File "/Users/dianfu/code/src/github/pyflink-faq/testing/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-10 ... egg/pyflink/java_gateway.py", line 62, in get_gateway _gateway = launch_gateway() File "/Users/dianfu/code/src/github/pyflink-faq/testing/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-10 ...
  0 points | 36 pages | 266.80 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
  ... function is executed in the driver process. Output Operations (2/4) ▶ What's wrong with this code? ▶ This requires the connection object to be serialized and sent from the driver to the worker. send(record) // executed at the worker } } Output Operations (3/4) ▶ What's wrong with this code? ▶ Creating a connection object has time and resource overheads. ▶ Creating and destroying a connection ...
  0 points | 113 pages | 1.22 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  Deliverables • One (1) written report of maximum 5 pages (10%). • Code (including pre-processing, deployment, and testing) (40%). • Code deliverables must be accompanied by documentation. Vasiliki Kalavri ...
  0 points | 34 pages | 2.53 MB | 1 year ago
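Apache Flink: Past, Present, and Future
  Continuous Processing & Streaming Analytics. Event-driven Applications ✔ ✔ ✔ Scan the QR code to join the community and Code Up with like-minded developers. Alibaba Cloud Developer Community. Apache Flink China Group 2. Thank you!
  0 points | 33 pages | 3.36 MB | 1 year ago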
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  • Retries simply replay the output that has been checkpointed, i.e. the user's non-deterministic code is not re-executed. Bloom filters for performance • Maintaining a catalog of all IDs ever seen ...
  0 points | 49 pages | 2.08 MB | 1 year ago
Monitoring Apache Flink Applications (Introduction)
  ... time window) for functional reasons. 4. Each computation in your Flink topology (framework or user code), as well as each network shuffle, takes time and adds to latency. 5. If the application emits ...
  0 points | 23 pages | 148.62 KB | 1 year ago
9 results in total