Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… (Vasia) Kalavri, vkalavri@bu.edu. 4/28: Graph Streaming. Vasiliki Kalavri | Boston University 2020. Modeling the world as a graph: Social networks (friend, follows), The web, Actor-movie, results for the search term “graph”. Basics: “node” or “vertex”, “edge”; undirected graph, directed graph. Graph streams: Graph streams model interactions as events that update an underlying graph structure. Edge events: A purchase, a movie rating, a like on an online post
72 pages | 7.77 MB | 1 year ago
Streaming in Apache Flink
… up an environment to develop Flink programs • Implement streaming data processing pipelines • Flink managed state • Event time. Streaming in Apache Flink • Streams are natural • Events of any type …
45 pages | 3.00 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io. Where Are We? Stream Processing Systems: Spark streaming ▶ Flink. Spark Streaming. Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs. Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops up the live stream into batches of X seconds. • Treats each batch as RDDs and processes them using RDD operations
113 pages | 1.22 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
4/14: Stream processing optimizations. Vasiliki Kalavri | Boston University 2020. • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation … basics … source, sink, input port, output port, dataflow graph. Revisiting the basics: Dataflow graph • operators are nodes, data channels are edges • … • What does efficient mean in the context of streaming? • queries run continuously • streams are unbounded • In traditional ad-hoc database queries …
54 pages | 2.83 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… Kalavri, vkalavri@bu.edu. 2/04: Streaming languages and operator semantics. Vasiliki Kalavri | Boston University 2020. … interval of 5–15 s) by an item of type C with Z < 5. Streaming Operators. Operator types (I) • Single-Item Operators … println!("seen: {:?}", x)) .connect_loop(handle); }); … t (t, l1) (t, (l1, l2)) Streaming Iteration Example: Terminate after 100 iterations; Create the feedback loop
53 pages | 532.37 KB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… relatively static and historical data • batched updates during downtimes, e.g. every night. Streaming Data Warehouse • low-latency materialized view updates • pre-aggregated, pre-processed streams and historical data. Data Management Approaches: storage, analytics, static data, streaming data. Vasiliki Kalavri | Boston University 2020. DBMS vs. DSMS: Data: persistent relations … stream can be viewed as a massive, dynamic, one-dimensional vector A[1…N]. The size N of the streaming vector is defined as the product of the attribute domain size(s). Note that N might be unknown
45 pages | 1.22 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
4/02: Elasticity policies and state migration. Vasiliki Kalavri | Boston University 2020. Streaming applications are long-running • Workload will change • Conditions might change • State is … [Figure: dataflow with src, o1, o2 and per-channel record counts] Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate, representative …
93 pages | 2.42 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… to congestion control or video streaming in a lower quality. Vasiliki Kalavri | Boston University 2020. https://commons.wikimedia.org/wiki/File:Adaptive_streaming_overview_daseddon_2011_07_28.png … Rate control • In a network of consumers and producers such as a streaming execution graph with multiple operators, back-pressure has the effect that all operators slow down to the processing speed of the slowest consumer. • If the bottleneck operator is far down the dataflow graph, back-pressure propagates to upstream operators, eventually reaching the data stream sources.
43 pages | 2.42 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… What is this course about? The design and architecture of modern distributed streaming … Fundamental for representing, summarizing, and analyzing data streams. Systems, Algorithms: State management, Operator semantics, Window optimizations, Filtering, counting, sampling, Graph streaming algorithms. Vasiliki Kalavri | Boston University 2020. Tools: Apache Flink: flink.apache.org … compare features and processing guarantees of streaming systems • be proficient in using Apache Flink and Kafka to build end-to-end, scalable, and reliable streaming applications • have a solid understanding …
34 pages | 2.53 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… snapshotting • FIFO reliable channels: no lost or duplicate messages • Strongly connected execution graph: each process can reach every other process in the system • Single initiating process … from a socket? Exactly-once state consistency (in Apache Flink) can be achieved only if all streaming sources are re-settable. Vasiliki Kalavri | Boston University 2020. • Flink checkpoints are exactly once • Flink’s checkpointing and recovery mechanism only resets the internal state of a streaming application • Some result records might be emitted multiple times to downstream systems
81 pages | 13.18 MB | 1 year ago













