Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020(Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/28: Graph Streaming ??? Vasiliki Kalavri | Boston University 2020 Modeling the world as a graph 2 Social networks friend follows The web Actor-movie results for the search term “graph” ??? Vasiliki Kalavri | Boston University 2020 Basics 1 5 4 3 2 “node” or “vertex” “edge” 1 5 4 3 2 undirected graph directed graph 4 ??? Vasiliki Kalavri Kalavri | Boston University 2020 Graph streams Graph streams model interactions as events that update an underlying graph structure 5 Edge events: A purchase, a movie rating, a like on an online post0 码力 | 72 页 | 7.77 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020Vasiliki Kalavri | Boston University 2020 Distributed dataflow systems • Computations as Directed Acyclic Graphs (DAGs) • nodes are operators and edges are data channels • operators can accumulate state Vasiliki Kalavri | Boston University 2020 source sink input port output port dataflow graph Dataflow graph • operators are nodes, data channels are edges • channels have FIFO semantics • streams0 码力 | 45 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020basics 3 source sink input port output port dataflow graph ??? Vasiliki Kalavri | Boston University 2020 Revisiting the basics 4 Dataflow graph • operators are nodes, data channels are edges • 0.5 Operator re-ordering B A A B ??? Vasiliki Kalavri | Boston University 2020 17 • A static graph transformation that enables re-ordering at runtime • It dynamically routes data after measuring Kalavri | Boston University 2020 22 • Multi-tenancy • in streaming systems that build one dataflow graph for several queries • when applications analyze data streams from a small set of sources • Operator0 码力 | 54 页 | 2.83 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate, representative University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate, representative University 2020 src o1 o2 10 recs 10 recs 1 2 3 4 100 rec 100 recs Intuition: use the dataflow graph to extract operator dependencies and system instrumentation to collect accurate, representative0 码力 | 93 页 | 2.42 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 Rate control • In a network of consumers and producers such as a streaming execution graph with multiple operators, back-pressure has the effect that all operators slow down to match the processing speed of the slowest consumer. • If the bottleneck operator is far down the dataflow graph, back-pressure propagates to upstream operators, eventually reaching the data stream sources.0 码力 | 43 页 | 2.42 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020guarantees State management Operator semantics Window optimizations Filtering, counting, sampling Graph streaming algorithms Vasiliki Kalavri | Boston University 2020 Tools Apache Flink: flink.apache0 码力 | 34 页 | 2.53 MB | 1 年前3
PyFlink 1.15 Documentationdianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2.jar # -rw-r--r-- staff 110M Oct 19 16:02 flink-dist-1.15.2.jar -rw-r--r-- 1 duanchen staff 171K Oct 19 16:02 flink-json-1.15.2.jar -rw-r--r-- 1 duanchen staff 20M Oct 19 16:02 flink-scala_2.12-1.15.2.jar -rw-r--r-- 1 'default_catalog.default_database.sourceKafka'. Table options are: 'connector'='kafka' 'format'='json' 'properties.bootstrap.servers'='192.168.101.109:9092' 'scan.startup.mode'='earliest-offset' 'topic'='pyflink_test'0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationdianfu staff 110M 10 18 20:43 flink-dist-1.15.2.jar # -rw-r--r-- 1 dianfu staff 171K 10 18 20:43 flink-json-1.15.2.jar # -rw-r--r-- 1 dianfu staff 20M 10 18 20:43 flink-scala_2.12-1.15.2.jar # -rw-r--r-- staff 110M Oct 19 16:02 flink-dist-1.15.2.jar -rw-r--r-- 1 duanchen staff 171K Oct 19 16:02 flink-json-1.15.2.jar -rw-r--r-- 1 duanchen staff 20M Oct 19 16:02 flink-scala_2.12-1.15.2.jar -rw-r--r-- 1 'default_catalog.default_database.sourceKafka'. Table options are: 'connector'='kafka' 'format'='json' 'properties.bootstrap.servers'='192.168.101.109:9092' 'scan.startup.mode'='earliest-offset' 'topic'='pyflink_test'0 码力 | 36 页 | 266.80 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkWINDOW(time, "1 hour") 61 / 79 Structured Streaming Example (3/3) val inputDF = spark.readStream.json("s3://logs") inputDF.groupBy(col("action"), window(col("time"), "1 hour")).count() .writeStream case class Call(action: String, time: Timestamp, id: Int) val df: DataFrame = spark.readStream.json("s3://logs") val ds: Dataset[Call] = df.as[Call] // Selection and projection df.select("action")0 码力 | 113 页 | 1.22 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020snapshotting • FIFO reliable channels: no lost or duplicate messages • Strongly connected execution graph: each process can reach every other process in the system • Single initiating process 18 The0 码力 | 81 页 | 13.18 MB | 1 年前3
共 10 条
- 1













