Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
…associated timestamps • Iteration operators define sequences of events or processing steps that are repeated as long as a loop condition is satisfied • not commonly supported • a termination condition must be defined, e.g. a time or iteration limit. Streaming iteration example (Timely Dataflow): terminate after 100 iterations and create the feedback loop:
    let (handle, stream) = scope.loop_variable(100, 1);
    (0..10).to_stream(scope)
        .concat(&stream)
        .inspect(|x| println!("seen: {:?}", x))
        .connect_loop(handle);
Blocking vs. non-blocking operators • A blocking…
0 码力 | 53 pages | 532.37 KB | 1 year ago
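
For comparison, a hedged Java sketch of the same feedback-loop idea using Flink's DataStream iteration API (not the Timely Dataflow example from the slides); the input values and the decrement logic are made up for illustration:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.IterativeStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class IterationSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Long> numbers = env.fromElements(5L, 8L, 3L);

            // Create the feedback loop; the iteration terminates once no feedback
            // element has arrived for 5 seconds (one way to express a termination condition).
            IterativeStream<Long> loop = numbers.iterate(5000);
            DataStream<Long> decremented = loop.map(v -> v - 1).returns(Long.class);
            loop.closeWith(decremented.filter(v -> v > 0));   // values > 0 go around again
            decremented.filter(v -> v <= 0).print();          // values <= 0 leave the loop

            env.execute("streaming iteration sketch");
        }
    }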
Streaming in Apache Flink
Develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time • Streams are natural: events of any type, like sensor readings or clicks… • …subset of stream processing • Processing data as dataflows • Let's talk about time: processing time and event time; events may arrive out of order! • What can be streamed? Anything (if you write a serializer/deserializer)… Keyed-state excerpt:
    // none exists for this key
    if (average == null) average = new MovingAverage(2);
    // add this event to the moving average
    average.add(item.f1);
    averageState.update(average);
    // return …
0 码力 | 45 pages | 3.00 MB | 1 year ago
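
For illustration, a minimal Java sketch of Flink keyed state along the lines of the excerpt; the MovingAverage helper from the slides is not shown there, so a plain running sum/count is used instead (apply after keyBy(r -> r.f0)):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Emits the running average per key.
    public class RunningAverage
            extends RichFlatMapFunction<Tuple2<String, Double>, Tuple2<String, Double>> {

        private transient ValueState<Tuple2<Long, Double>> sumCount; // (count, sum)

        @Override
        public void open(Configuration parameters) {
            sumCount = getRuntimeContext().getState(
                new ValueStateDescriptor<Tuple2<Long, Double>>(
                    "sumCount", Types.TUPLE(Types.LONG, Types.DOUBLE)));
        }

        @Override
        public void flatMap(Tuple2<String, Double> item,
                            Collector<Tuple2<String, Double>> out) throws Exception {
            Tuple2<Long, Double> current = sumCount.value();
            if (current == null) {                      // no state exists for this key yet
                current = Tuple2.of(0L, 0.0);
            }
            current = Tuple2.of(current.f0 + 1, current.f1 + item.f1);
            sumCount.update(current);
            out.collect(Tuple2.of(item.f0, current.f1 / current.f0));
        }
    }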
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
• Processing time: the time of the local clock where an event is being processed; a processing-time window wouldn't account for game activity while the train…; results depend on processing speed and aren't deterministic • Event time: the time when an event actually happened; an event-time window would give you the extra life; results are deterministic • illustrated with the Star Wars episodes: story order corresponds to event time, release order to processing time • What if you were…
0 码力 | 22 pages | 2.22 MB | 1 year ago
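
A minimal Java sketch of working in event time with Flink (pre-1.12-style API, matching the 2020 timeframe of these slides); the example events and the five-second out-of-orderness bound are assumptions:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class EventTimeSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Use the time embedded in the events, not the processing machine's clock.
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("a", 1_000L), Tuple2.of("b", 500L)); // out of order

            // Extract the embedded timestamp and tolerate 5 seconds of out-of-orderness.
            events.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(Tuple2<String, Long> e) {
                        return e.f1; // event time, in epoch milliseconds
                    }
                }).print();

            env.execute("event time sketch");
        }
    }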
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…might fail (or seem as if they failed) • Streaming sources • Producers and consumers: an event is typically generated by a producer (also called publisher or sender) and processed by one or multiple consumers • Event retrieval is defined not by content or structure but by order, e.g. FIFO or priority producer/consumer queues • Message brokers: a message broker is a system that connects event producers with event consumers; it receives messages from the producers and pushes them to the consumers • A TCP connection is a simple messaging system that connects one sender with one recipient
0 码力 | 33 pages | 700.14 KB | 1 year ago
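
As an illustration of a producer publishing events to a broker, a hedged Java sketch using the Kafka client API; the broker address (localhost:9092), topic name ("events"), and payload are made up:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish to a topic; any number of consumer groups can read it independently.
                producer.send(new ProducerRecord<>("events", "sensor-1", "{\"temp\": 21.5}"));
            }
        }
    }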
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
    // …execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // use event time for the application
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
…window assigners for the most common windowing use cases: • they assign an element to windows based on its event-time timestamp or the current processing time • time windows have a start and an… • time-window assigners provide a default trigger that fires the evaluation of a window once the (processing or event) time passes the end of the window • a window is created when the first element is assigned to…
0 码力 | 35 pages | 444.84 KB | 1 year ago
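
A minimal Java sketch of a keyed tumbling event-time window with the default trigger; the input stream of (key, value) pairs is assumed to already carry timestamps and watermarks:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowSketch {
        // Sum values per key over one-minute event-time windows.
        static DataStream<Tuple2<String, Long>> sumPerKeyPerMinute(
                DataStream<Tuple2<String, Long>> readings) {
            return readings
                .keyBy(r -> r.f0)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                // the default event-time trigger fires once the watermark passes the window end
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
        }
    }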
监控Apache Flink应用程序(入门) (Monitoring Apache Flink applications, an introduction), by caolei
…this operator has emitted • Progress and throughput monitoring • Dashboard example (Figure 4): event-time lag per subtask of a single operator in the topology; in this case, the watermark is lagging… • Generally speaking, latency is the delay between the creation of an event and the time at which results based on this event become visible; once the event is created it is usually stored in a persistent message queue… • The causes of growing state are very application-specific: typically, an increasing number of keys, a large event-time skew between different input streams, or simply missing state cleanup may cause growing state
0 码力 | 23 pages | 148.62 KB | 1 year ago
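
Beyond the built-in metrics (records emitted, watermark lag, and so on), operators can register custom metrics; a minimal Java sketch, with the metric name chosen here for illustration:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    public class CountingMap extends RichMapFunction<String, String> {
        private transient Counter emitted;

        @Override
        public void open(Configuration parameters) {
            emitted = getRuntimeContext().getMetricGroup().counter("recordsEmitted");
        }

        @Override
        public String map(String value) {
            emitted.inc();   // visible in the web UI and any configured metrics reporter
            return value;
        }
    }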
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
…streams update relation tables and derived streams update materialized views • An operator outputs event streams that describe the changing view computed over the input stream according to the relational…
0 码力 | 45 pages | 1.22 MB | 1 year ago
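
The "changing view" idea can be made concrete with a continuous query; a hedged Java sketch using Flink's Table API (method names vary across Flink versions, this roughly follows the 1.10-era API; the clicks schema is made up):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public class ChangingViewSketch {
        // clicks: a stream of (username, url) events
        static DataStream<Tuple2<Boolean, Row>> countsAsChangelog(
                StreamExecutionEnvironment env, DataStream<Tuple2<String, String>> clicks) {
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
            tEnv.createTemporaryView("clicks", tEnv.fromDataStream(clicks, "username, url"));

            // A continuous query: its result is a changing view over the input stream.
            Table counts = tEnv.sqlQuery(
                "SELECT username, COUNT(url) AS cnt FROM clicks GROUP BY username");

            // The view's changes are emitted as a retract (changelog) stream:
            // (true, row) adds or updates a result, (false, row) retracts an earlier one.
            return tEnv.toRetractStream(counts, Row.class);
        }
    }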
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
What is a failure? An operator 1. receives an event (mi), 2. stores it in a local buffer and possibly updates its state, 3. produces output (mo). Was mi fully processed? Was mo delivered downstream? What can go wrong: • lost…
0 码力 | 49 pages | 2.08 MB | 1 year ago
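
In Flink, recovery guarantees hinge on periodic checkpoints; a minimal Java sketch enabling exactly-once checkpointing (the interval and timeout values are arbitrary example choices):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointingSketch {
        static void configure(StreamExecutionEnvironment env) {
            // Snapshot operator state every 10 seconds. On failure, the job restarts from the
            // latest completed checkpoint and replays its (replayable) sources from there, so
            // state reflects each input record exactly once.
            env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
            env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
            env.getCheckpointConfig().setCheckpointTimeout(60_000);
        }
    }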
Scalable Stream Processing - Spark Streaming and Flink
…has been consumed or not ▶ No built-in timeouts • think what would happen in our example if the event signaling the end of the user session was lost, or had not arrived for some reason • mapWithState returns another streaming DF ▶ Window operation: aggregations over a sliding event-time window • event time is the time embedded in the data, not the time Spark receives it ▶ use groupBy()… ▶ …streaming uses watermarks to measure progress in event time ▶ watermarks flow as part of the data stream and carry a timestamp t ▶ a W(t) declares that event time has reached time t in that stream • There…
0 码力 | 113 pages | 1.22 MB | 1 year ago
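
A hedged Java sketch of a sliding event-time window with a watermark in Spark Structured Streaming; the column names and durations are assumptions:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.window;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class WatermarkSketch {
        // "events": a streaming Dataset<Row> with columns "timestamp" (event time) and "word".
        static Dataset<Row> slidingCounts(Dataset<Row> events) {
            return events
                .withWatermark("timestamp", "10 minutes")   // drop data more than 10 min behind the watermark
                .groupBy(
                    window(col("timestamp"), "10 minutes", "5 minutes"),  // sliding event-time window
                    col("word"))
                .count();
        }
    }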
Apache Flink的过去、现在和未来 (The past, present, and future of Apache Flink)
17亿/秒 (1.7 billion per second) • Flink's past: batch processing, offline and real-time • Flink's present: adds continuous processing and streaming analytics; architecture changes in Flink 1.9 (runtime, distributed streaming dataflow, query processor); Chinese-language community • Flink's future: adds event-driven applications; microservices, async calls, auto-scaling, state management
0 码力 | 33 pages | 3.36 MB | 1 year ago
共 18 条