Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
insertions, updates, deletions of single row or groups of rows Data Stream Management System • continuous queries • sequential data access, high-rate append-only updates Data Warehouse • complex high, bursty Processing Model query-driven / pull-based data-driven / push-based Queries ad-hoc continuous Latency relatively high low … Vasiliki Kalavri | Boston University 2020 … Traditional DW vs. elasticity 8. Offer low-latency … 2005 … actions, alerts continuous analytics … Building a stream processor… ? … Basic
0 credits | 45 pages | 1.22 MB | 1 year ago | 3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Languages for continuous data processing … • Transforming languages define … Non-blocking (monotonic) queries are the only continuous queries that can be supported on data streams. Proposition: Only monotonic queries can be … Consider a sequence of length n, i.e., S = S_n. If G is a continuous sum, so that it returns the sum of all tuples seen so far: • what is G_j(S) for j < n? • for
0 credits | 53 pages | 532.37 KB | 1 year ago | 3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
retrieve the same message - many-to-many communication - message content / structure matters for delivery … MB architecture advantages • Multiple producers/consumers as concurrent clients • Effective depend on a snapshot and clients are not notified if their query result changes later. … Message delivery and ordering Acknowledgements are messages from the client to the broker indicating that the client processing a message • If an acknowledgement is not received, delivery is retried • Re-delivery might cause re-ordering of messages • Re-delivery complicates stream processing and fault-tolerance • might
0 credits | 33 pages | 700.14 KB | 1 year ago | 3
Apache Flink: Past, Present, and Future
over 10,000 machines State data: PetaBytes Events processed: 10 trillion/day Peak throughput: 1.7 billion/second Flink's past offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ Present Architecture changes in Flink 1.9 Runtime Distributed Manager Ecosystem Flink Hive Flink Zeppelin Chinese community Flink's present offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ ✔ Future Micro Services O_0 O_1 Call Auto Scale State Management Event Driven Flink's future offline Real-time Batch Processing Continuous Processing & Streaming Analytics Event-driven Applications ✔ ✔ ✔
0 credits | 33 pages | 3.36 MB | 1 year ago | 3
Scalable Stream Processing - Spark Streaming and Flink
https://id2221kth.github.io … Where Are We? … Stream Processing Systems Design Issues ▶ Continuous vs. micro-batch processing ▶ Record-at-a-Time vs. declarative APIs … Outline ▶ Spark streaming streaming ▶ Flink … Spark Streaming … Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs … Spark Streaming ▶ Run a streaming real-time stream and batch processing ▶ Process unbounded and bounded Data ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs … Programs and Dataflows
0 credits | 113 pages | 1.22 MB | 1 year ago | 3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Fault-tolerance & high-availability … actions, alerts continuous analytics … Building a stream processor… ? … Optional
0 credits | 34 pages | 2.53 MB | 1 year ago | 3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
introducing load imbalance • Resource management • utilization, isolation • Automation • continuous monitoring • bottleneck detection • stability, accuracy … Challenges of reconfiguration
0 credits | 41 pages | 4.09 MB | 1 year ago | 3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Manager Scheduler QoS Monitor Load Shedder Query Execution Engine Qm Q2 Q1 Ad-hoc or continuous queries Input streams … ??? … Load shedding decisions
0 credits | 43 pages | 2.42 MB | 1 year ago | 3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
the system ensures retrying even if the sender crashes • this technique guarantees at-least-once delivery … RPC retries might create duplicates • RPCs can sometimes succeed even if they appear to have
0 credits | 49 pages | 2.08 MB | 1 year ago | 3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
operators maintain state that reflects part of the stream history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned by key • State can be cleared based
0 credits | 54 pages | 2.83 MB | 1 year ago | 3
10 results in total













