Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/09: Flow control and load shedding ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers queue: what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) 2 ??? Vasiliki Kalavri | Boston University 2020 Load management approaches 3 ! Load shedder applications with strict latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize.0 码力 | 43 页 | 2.42 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkautomatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the automatically converts this batch-like query to a streaming execution plan. ▶ 2. Specify triggers to control when to update the results. • Each time a trigger fires, Spark checks for new data (new row in the0 码力 | 113 页 | 1.22 MB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? 12 • Detect environment changes: external workload and system performance unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? Mechanism: How to apply the re-configuration? 12 • Detect environment maintained state • computation: load in terms of computation • communication: load in terms of flow size in the input channel of each parallel task • Partitioning function performance • space required0 码力 | 41 页 | 4.09 MB | 1 年前3
Apache Flink的过去、现在和未来Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Inventory Payment Shipping Flow-Control Async Call Auto Scale State Management Event Driven Flink 的未来 offline Real-time Batch0 码力 | 33 页 | 3.36 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020conditions • Provide real-time scheduling information for public transport • Optimize transport network flow and recommend alternative routes Example: • Alibaba City Brain adjusts traffic lights in real-time Boston University 2020 [1, 4, 5, 23, 8, 0, 7] 5 median ‣ We cannot store the entire stream ‣ No control over arrival rate or order f’ ∞ ? Continuously arriving, possibly unbounded data f read write0 码力 | 34 页 | 2.53 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020arrival and/or a generation timestamp. • They are produced by external sources, i.e. the DSMS has no control over their arrival order or the data rate. • They have unknown, possibly unbounded length, i are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time0 码力 | 45 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020are nodes, data channels are edges • channels have FIFO semantics • streams of data elements flow continuously along edges Operators • receive one or more input streams • perform tuple-at-a-time available cores / threads • Fused operators can share the address space but use separate threads of control • avoid communication cost without losing pipeline parallelism • use a shared buffer for communication0 码力 | 54 页 | 2.83 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020results of the computation rather than the execution flow. • Imperative languages are used to describe plans of operators the streams must flow through. • Pattern-based languages specify conditions events in a period during which a user was active 17 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition • it is blocking and must be defined over a window 18 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (II) • Duplicate/Copy Operator replicates a stream, commonly to be used as0 码力 | 53 页 | 532.37 KB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 11 10 8 9 6 7 6 5 watermark record timestamp records 3 Watermarks (in Flink) flow along dataflow edges. They are special records generated by the sources or assigned by the application0 码力 | 22 页 | 2.22 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020time rate increase : input rate : throughput ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? Mechanism: How to apply the re-configuration? 3 • Detect environment to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Automatic Scaling Control 4 ??? Vasiliki Kalavri | Boston University 2020 The automatic scaling problem 5 Given a logical congestion, back pressure, throughput Policy • Queuing theory models: for latency objectives • Control theory models: e.g., PID controller • Rule-based models, e.g. if CPU utilization > 70% => scale0 码力 | 93 页 | 2.42 MB | 1 年前3
共 13 条
- 1
- 2













