Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/23: Stream Processing Fundamentals Vasiliki Kalavri | Boston University University 2020 What is a stream? • In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally incrementally over time, rather than being available in full before its processing begins. • Data streams are high-volume, real-time data that might be unbounded • we cannot store the entire stream0 码力 | 45 页 | 1.22 MB | 1 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/16: Skew mitigation ??? Vasiliki Kalavri | Uddin Nasir et. al. The power of both choices: Practical load balancing for distributed stream processing engines. ICDE 2015. • Mitzenmacher, Michael. The power of two choices in randomized load balancing0 码力 | 31 页 | 1.47 MB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston operator. Keyed state can only be used by functions that are applied on a KeyedStream: • When the processing method of a function with keyed input is called, Flink’s runtime automatically puts all keyed fare, Collector> out) throws Exception { // similar logic for processing fare events } } } Java example (cont.) 21 Vasiliki Kalavri | Boston University 2020 0 码力 | 24 页 | 914.13 KB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/14: Stream processing optimizations ??? Vasiliki Kalavri | Boston University serialization cost • if operators are separate, throughput is bounded by either communication or processing cost • if fused, throughput is determined by operator cost only Operator fusion A B A B is statically configured with a certain number of processing slots that defines the maximum number of concurrent tasks it can execute. • A processing slot can execute one slice of an application, i.e0 码力 | 54 页 | 2.83 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/11: Windows and Triggers Vasiliki Kalavri | Boston windowing use cases: • They assign an element based on its event-time timestamp or the current processing time to windows. • Time windows have a start and an end timestamp. • All built-in window assigners assigners provide a default trigger that triggers the evaluation of a window once the (processing or event) time passes the end of the window. • A window is created when the first element is assigned0 码力 | 35 页 | 444.84 KB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/21: Introduction Vasiliki Kalavri | Boston University course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems • be proficient in using end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs0 码力 | 34 页 | 2.53 MB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 Vasiliki (Vasia) Kalavri vkalavri@bu.edu CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki Kalavri | Boston University minute? 4 Vasiliki Kalavri | Boston University 2020 • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while while the train is in the tunnel • results depend on the processing speed and aren’t deterministic • Event time • the time when an event actually happened • an event-time window would give you the0 码力 | 22 页 | 2.22 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/23: Cardinality and frequency estimation0 码力 | 69 页 | 630.01 KB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/28: Graph Streaming ??? Vasiliki Kalavri | Vertex streams (not today) ??? Vasiliki Kalavri | Boston University 2020 Batch Graph Processing 9 Batch graph processing systems, such as Apache Graph, GraphX, Pregel, operate offline. They are built from scratch for every new edge? • Can we use graph synopses and summaries and compute graph analytics in one-pass? ??? Vasiliki Kalavri | Boston University 2020 Connectivity & Bipartite property0 码力 | 72 页 | 7.77 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/21: Sampling and filtering streams ??? Vasiliki Roginsky, Miguel Jimeno. A new analysis of the false positive rate of a Bloom filter. Information Processing Letters 110 (2010). Further reading0 码力 | 74 页 | 1.06 MB | 1 年前3
共 1000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 100













