Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/04: Streaming languages and operator semantics Vasiliki Kalavri | Boston University 2020 Vasiliki Kalavri | Boston University 2020 Kalavri | Boston University 2020 Streaming Operators 9 Vasiliki Kalavri | Boston University 2020 Operator types (I) • Single-Item Operators process stream elements one-by-one. • selection, filtering Consider events from stream S1 and stream S2 11 Vasiliki Kalavri | Boston University 2020 Operator types (II) • Sequence Operators capture the arrival of an ordered set of events. • common in0 码力 | 53 页 | 532.37 KB | 1 年前3
PyFlink 1.15 DocumentationKubernetes. Execute PyFlink jobs with Flink Kubernetes Operator See PyFlink Example for more details on how to execute PyFlink jobs with Flink Kubernetes Operator. 1.1.2 QuickStart 1.1.2.1 QuickStart: Table RuntimeException: Error␣ ˓→received from SDK harness for instruction 4: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",␣ ˓→line 289, in _execute s/worker/sdk_worker.py",␣ ˓→line 362, inlambda: self.create_worker().do_instruction(request), request) File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py" 0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationKubernetes. Execute PyFlink jobs with Flink Kubernetes Operator See PyFlink Example for more details on how to execute PyFlink jobs with Flink Kubernetes Operator. 1.1.2 QuickStart 1.1.2.1 QuickStart: Table RuntimeException: Error␣ ˓→received from SDK harness for instruction 4: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",␣ ˓→line 289, in _execute s/worker/sdk_worker.py",␣ ˓→line 362, inlambda: self.create_worker().do_instruction(request), request) File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py" 0 码力 | 36 页 | 266.80 KB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020processing optimizations ??? Vasiliki Kalavri | Boston University 2020 2 • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation alternatives Vasiliki Kalavri | Boston University 2020 Operator selectivity 6 • The number of output elements produced per number of input elements • a map operator has a selectivity of 1, i.e. it produces one output element for each input element it processes • an operator that tokenizes sentences into words has selectivity > 1 • a filter operator typically has selectivity < 1 Is selectivity always known0 码力 | 54 页 | 2.83 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020dataflow with sources S1, S2, … Sn and rates λ1, λ2, … λn identify the minimum parallelism πi per operator i, such that the physical dataflow can sustain all source rates. S1 S2 λ1 λ2 S1 S2 π=2 => scale out • Analytical dataflow-based models Action • Speculative: small changes at one operator at a time • Predictive: at-once for all operators 8 ??? Vasiliki Kalavri | Boston University per task • total time spent processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once0 码力 | 93 页 | 2.42 MB | 1 年前3
监控Apache Flink应用程序(入门)展并与上游系统保持同步。 4.1 吞吐量 Flink提供了多个metrics来衡量应用程序的吞吐量。对于每个operator或task(请记住:一个task可以包含多个 chained-task3),Flink会对进出系统的记录和字节进行计数。在这些metrics中,每个operator输出记录的速率 通常是最直观和最容易理解的。 4.2 关键指标 Metric Scope Description numRecordsOutPerSecond task The number of records this operator/task sends per second. numRecordsOutPerSecond operator The number of records this operator sends per second. caolei – 监控Apache Flink应用程序(入门) 进度和吞吐量监控 – 11 4.3 仪表盘示例 Figure 3: Mean Records Out per Second per Operator 4.4 可能的报警条件 • recordsOutPerSecond = 0 (for a non-Sink operator) 请注意:目前由于metrics体系只考虑Flink的内部通信,所以source operators的输入记录数是0,而sink0 码力 | 23 页 | 148.62 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020• Users define state using arbitrary types • The system is unaware of which parts of an operator constitute state Streaming state 3 • Explicit state primitives including state types and interfaces results: a local or instance variable that is accessed by a task’s business logic Operator state is scoped to an operator task, i.e. records processed by the same parallel task have access to the same state scoped to a key defined in the operator’s input records • Flink maintains one state instance per key value and partitions all records with the same key to the operator task that maintains the state for0 码力 | 24 页 | 914.13 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational semantics of the operator. 19 Vasiliki Kalavri | Boston materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational semantics of the operator. src dest bytes 1 2 20K materialized views. • An operator outputs event streams that describe the changing view computed over the input stream according to the relational semantics of the operator. Results as continuously0 码力 | 45 页 | 1.22 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020t3 t’2 t’3 t4 … Rollback-divergent t1 t2 t3 t’2 t’3 t’4 … The output semantics depend on the operator type: • arbitrary: it depends on order, randomness, or external system • deterministic: it produces result semantics 11 sum 4 3 3 9 6 … 5 6 1 9 sum 6 5 3 10 6 … 7 8 1 10 Can you think of an operator that provides correct, possibly repeating, results even if it re-processes tuples after recovery … 7 8 1 10 Can you think of an operator that provides correct, possibly repeating, results even if it re-processes tuples after recovery? Can you think of an operator that will converge to the correct0 码力 | 49 页 | 2.08 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020input rates and periodically estimates operator selectivities. • The load shedder assigns a cost, ci, in cycles per tuple, and a selectivity, si, to each operator i. • The statistics manager collects records does the operator produce per record in its input? • map: 1 in 1 out • filter: 1 in, 1 or 0 out • flatMap, join: 1 in 0, 1, or more out • Cost: how many records can an operator process in a • The rate defines the probability to discard a tuple and is computed based on statistics and operator selectivity • The optimization objective is to achieve the highest possible accuracy given the0 码力 | 43 页 | 2.42 MB | 1 年前3
共 17 条
- 1
- 2













