 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/09: Flow control and load shedding ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers queue: what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) 2 ??? Vasiliki Kalavri | Boston University 2020 Load management approaches 3 ! Load applications with strict latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize.0 码力 | 43 页 | 2.42 MB | 1 年前3 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/09: Flow control and load shedding ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers queue: what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) 2 ??? Vasiliki Kalavri | Boston University 2020 Load management approaches 3 ! Load applications with strict latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize.0 码力 | 43 页 | 2.42 MB | 1 年前3
 Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020results of the computation rather than the execution flow. • Imperative languages are used to describe plans of operators the streams must flow through. • Pattern-based languages specify conditions events in a period during which a user was active 17 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition • it is blocking and must be defined over a window 18 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (II) • Duplicate/Copy Operator replicates a stream, commonly to be used as0 码力 | 53 页 | 532.37 KB | 1 年前3 Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020results of the computation rather than the execution flow. • Imperative languages are used to describe plans of operators the streams must flow through. • Pattern-based languages specify conditions events in a period during which a user was active 17 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition • it is blocking and must be defined over a window 18 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (II) • Duplicate/Copy Operator replicates a stream, commonly to be used as0 码力 | 53 页 | 532.37 KB | 1 年前3
 Scalable Stream Processing - Spark Streaming and FlinkLate Data (1/3) ▶ Spark streaming uses watermarks to measure progress in event time. ▶ Watermarks flow as part of the data stream and carry a timestamp t. ▶ A W(t) declares that event time has reached "10 minutes", "5 minutes"), col("word")).count() 67 / 79 Flink 68 / 79 Flink ▶ Distributed data flow processing system ▶ Unified real-time stream and batch processing ▶ Process unbounded and bounded ▶ Periodically, the data sources inject checkpoint barriers into the data stream. ▶ The barriers flow through the data stream, and trigger operators to emit all records that depend only on records before0 码力 | 113 页 | 1.22 MB | 1 年前3 Scalable Stream Processing - Spark Streaming and FlinkLate Data (1/3) ▶ Spark streaming uses watermarks to measure progress in event time. ▶ Watermarks flow as part of the data stream and carry a timestamp t. ▶ A W(t) declares that event time has reached "10 minutes", "5 minutes"), col("word")).count() 67 / 79 Flink 68 / 79 Flink ▶ Distributed data flow processing system ▶ Unified real-time stream and batch processing ▶ Process unbounded and bounded ▶ Periodically, the data sources inject checkpoint barriers into the data stream. ▶ The barriers flow through the data stream, and trigger operators to emit all records that depend only on records before0 码力 | 113 页 | 1.22 MB | 1 年前3
 Apache Flink的过去、现在和未来Micro Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Inventory Payment Shipping Flow-Control Async Call Auto Scale State Management Event Driven Flink 的未来 offline Real-time Batch0 码力 | 33 页 | 3.36 MB | 1 年前3 Apache Flink的过去、现在和未来Micro Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Inventory Payment Shipping Flow-Control Async Call Auto Scale State Management Event Driven Flink 的未来 offline Real-time Batch0 码力 | 33 页 | 3.36 MB | 1 年前3
 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 11 10 8 9 6 7 6 5 watermark record timestamp records 3 Watermarks (in Flink) flow along dataflow edges. They are special records generated by the sources or assigned by the application0 码力 | 22 页 | 2.22 MB | 1 年前3 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 11 10 8 9 6 7 6 5 watermark record timestamp records 3 Watermarks (in Flink) flow along dataflow edges. They are special records generated by the sources or assigned by the application0 码力 | 22 页 | 2.22 MB | 1 年前3
 Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020conditions • Provide real-time scheduling information for public transport • Optimize transport network flow and recommend alternative routes Example: • Alibaba City Brain adjusts traffic lights in real-time0 码力 | 34 页 | 2.53 MB | 1 年前3 Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020conditions • Provide real-time scheduling information for public transport • Optimize transport network flow and recommend alternative routes Example: • Alibaba City Brain adjusts traffic lights in real-time0 码力 | 34 页 | 2.53 MB | 1 年前3
 Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020maintained state • computation: load in terms of computation • communication: load in terms of flow size in the input channel of each parallel task • Partitioning function performance • space required0 码力 | 41 页 | 4.09 MB | 1 年前3 Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020maintained state • computation: load in terms of computation • communication: load in terms of flow size in the input channel of each parallel task • Partitioning function performance • space required0 码力 | 41 页 | 4.09 MB | 1 年前3
 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time0 码力 | 45 页 | 1.22 MB | 1 年前3 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time0 码力 | 45 页 | 1.22 MB | 1 年前3
 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time0 码力 | 54 页 | 2.83 MB | 1 年前3 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time0 码力 | 54 页 | 2.83 MB | 1 年前3
共 9 条
- 1













