 Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 • Flink requires a sufficient number of processing slots in order to execute all tasks of an application. • The JobManager cannot restart the application until enough slots become available JobManager failures ??? Vasiliki Kalavri | Boston University 2020 When the JobManager fails all tasks are automatically cancelled. The new JobManager performs the following steps: 1. It requests It requests processing slots. 3. It restarts the application and resets the state of all its tasks to the last completed checkpoint. Highly available Flink setup ??? Vasiliki Kalavri | Boston University0 码力 | 41 页 | 4.09 MB | 1 年前3 Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 • Flink requires a sufficient number of processing slots in order to execute all tasks of an application. • The JobManager cannot restart the application until enough slots become available JobManager failures ??? Vasiliki Kalavri | Boston University 2020 When the JobManager fails all tasks are automatically cancelled. The new JobManager performs the following steps: 1. It requests It requests processing slots. 3. It restarts the application and resets the state of all its tasks to the last completed checkpoint. Highly available Flink setup ??? Vasiliki Kalavri | Boston University0 码力 | 41 页 | 4.09 MB | 1 年前3
 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020TaskManager can execute several tasks at the same time. • It is statically configured with a certain number of processing slots that defines the maximum number of concurrent tasks it can execute. • A processing for each receiving task that any of its tasks need to send data to. Batching in Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component of a TaskManager0 码力 | 54 页 | 2.83 MB | 1 年前3 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020TaskManager can execute several tasks at the same time. • It is statically configured with a certain number of processing slots that defines the maximum number of concurrent tasks it can execute. • A processing for each receiving task that any of its tasks need to send data to. Batching in Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component of a TaskManager0 码力 | 54 页 | 2.83 MB | 1 年前3
 PyFlink 1.15 Documentationapache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.77 KB | 1 年前3 PyFlink 1.15 Documentationapache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.77 KB | 1 年前3
 PyFlink 1.16 Documentationapache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.80 KB | 1 年前3 PyFlink 1.16 Documentationapache.flink.streaming.runtime.tasks.RegularOperatorChain. ˓→initializeStateAndOpenOperators(RegularOperatorChain.java:110) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask ˓→711) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1. ˓→call(StreamTaskActionExecutor.java:55) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask. ˓→java:687) at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java: ˓→958) at org.apache0 码力 | 36 页 | 266.80 KB | 1 年前3
 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020producer slows down according to the rate the consumer recycles buffers. Remote exchange: If tasks run on different worker nodes, the buffer can be recycled as soon as it is on the TCP channel. • The maximum throughput is limited by the processing rate of the slowest task. • Parallel tasks are connected via virtual channels multiplexed over TCP connections: • In the presence of skew 29 Remarks on CFC • Bakcpressure is inflicted on pairs of communicating tasks only • it does not interfere with other tasks sharing the same TCP connection. • CFC maximizes network utilization and0 码力 | 43 页 | 2.42 MB | 1 年前3 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020producer slows down according to the rate the consumer recycles buffers. Remote exchange: If tasks run on different worker nodes, the buffer can be recycled as soon as it is on the TCP channel. • The maximum throughput is limited by the processing rate of the slowest task. • Parallel tasks are connected via virtual channels multiplexed over TCP connections: • In the presence of skew 29 Remarks on CFC • Bakcpressure is inflicted on pairs of communicating tasks only • it does not interfere with other tasks sharing the same TCP connection. • CFC maximizes network utilization and0 码力 | 43 页 | 2.42 MB | 1 年前3
 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020watermark captures the progress of upstream stages • minimum of output watermarks of all upstream tasks • The output watermark captures the progress of the stage itself • minimum of input watermarks 1. Watermarks must be monotonically increasing in order to ensure that the event time clocks of tasks are progressing and not going backwards. 2. A watermark with a timestamp T indicates that all subsequent0 码力 | 22 页 | 2.22 MB | 1 年前3 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020watermark captures the progress of upstream stages • minimum of output watermarks of all upstream tasks • The output watermark captures the progress of the stage itself • minimum of input watermarks 1. Watermarks must be monotonically increasing in order to ensure that the event time clocks of tasks are progressing and not going backwards. 2. A watermark with a timestamp T indicates that all subsequent0 码力 | 22 页 | 2.22 MB | 1 年前3
 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020or shared subscription • A logical producer/consumer can be implemented by multiple physical tasks running in parallel • Ιf a producer generates events with high rate, we can balance the load by Publishers and Subscribers are applications. 23 Use-cases • Balancing workloads in network clusters • tasks can be efficiently distributed among multiple workers, such as Google Compute Engine instances.0 码力 | 33 页 | 700.14 KB | 1 年前3 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020or shared subscription • A logical producer/consumer can be implemented by multiple physical tasks running in parallel • Ιf a producer generates events with high rate, we can balance the load by Publishers and Subscribers are applications. 23 Use-cases • Balancing workloads in network clusters • tasks can be efficiently distributed among multiple workers, such as Google Compute Engine instances.0 码力 | 33 页 | 700.14 KB | 1 年前3
 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators Keyed state is scoped to a key defined in the operator’s input one state instance. • The keyed state instances of a function are distributed across all parallel tasks of the function’s operator. Keyed state can only be used by functions that are applied on a KeyedStream:0 码力 | 24 页 | 914.13 KB | 1 年前3 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020the same parallel task have access to the same state • It cannot be accessed by other parallel tasks of the same or different operators Keyed state is scoped to a key defined in the operator’s input one state instance. • The keyed state instances of a function are distributed across all parallel tasks of the function’s operator. Keyed state can only be used by functions that are applied on a KeyedStream:0 码力 | 24 页 | 914.13 KB | 1 年前3
 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020completely processed 3. Copy the state of each task to a remote, persistent storage 4. Wait until all tasks have finished their copies 5. Resume processing and stream ingestion 12 ??? Vasiliki Kalavri | post-snapshot events (order maintained by FIFO channels) Termination is satisfied if initiator can reach all tasks (possible in DAGs via multiple initiators, e.g., sources.) p1 p2 p3 p4 p5 p6 p7 p7 p5 p6 p1 p1 p2 p3 p4 34 ??? Vasiliki Kalavri | Boston University 2020 • Assumptions: • DAG of tasks • Epoch change events triggered on each source task (⟨ep1⟩,⟨ep2⟩,…) • Issued by a coordinator or generated0 码力 | 81 页 | 13.18 MB | 1 年前3 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020completely processed 3. Copy the state of each task to a remote, persistent storage 4. Wait until all tasks have finished their copies 5. Resume processing and stream ingestion 12 ??? Vasiliki Kalavri | post-snapshot events (order maintained by FIFO channels) Termination is satisfied if initiator can reach all tasks (possible in DAGs via multiple initiators, e.g., sources.) p1 p2 p3 p4 p5 p6 p7 p7 p5 p6 p1 p1 p2 p3 p4 34 ??? Vasiliki Kalavri | Boston University 2020 • Assumptions: • DAG of tasks • Epoch change events triggered on each source task (⟨ep1⟩,⟨ep2⟩,…) • Issued by a coordinator or generated0 码力 | 81 页 | 13.18 MB | 1 年前3
 Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020patterns • Raise alerts for abnormal system metrics • Detect invariant violations • Identify outlier tasks Inspired by this paper : “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior0 码力 | 34 页 | 2.53 MB | 1 年前3 Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020patterns • Raise alerts for abnormal system metrics • Detect invariant violations • Identify outlier tasks Inspired by this paper : “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior0 码力 | 34 页 | 2.53 MB | 1 年前3
共 11 条
- 1
- 2













