 Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020maintain a disjoint set in each partition 3. periodically merge the partial disjoint sets into a global one ??? Vasiliki Kalavri | Boston University 2020 Connected components in Flink 37 DataStream Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020maintain a disjoint set in each partition 3. periodically merge the partial disjoint sets into a global one ??? Vasiliki Kalavri | Boston University 2020 Connected components in Flink 37 DataStream- MILLISECONDS)) .process(new UpdateDisjointSet()) // ephemeral partial state .flatMap(new Merger()) // global state .setParallelism(1); // merging on one task ??? Vasiliki Kalavri | Boston University 2020 MILLISECONDS)) .process(new UpdateDisjointSet()) // ephemeral partial state .flatMap(new Merger()) // global state .setParallelism(1); // merging on one task Will this scale? ??? Vasiliki Kalavri | Boston 0 码力 | 72 页 | 7.77 MB | 1 年前3
 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020backends 7 Vasiliki Kalavri | Boston University 2020 MemoryStateBackend • Stores state as regular objects on TaskManager’s heap • Low read/write latencies • OutOfMemoryError if large grows too large, GC descriptors. • The data types handled by the state are specified as Class or TypeInformation objects. 16 Registering state Vasiliki Kalavri | Boston University 2020 class TemperatureAlertFunction(val method of a function with keyed input is called, Flink’s runtime automatically puts all keyed state objects of the function into the context of the key of the record that is passed by the function call.0 码力 | 24 页 | 914.13 KB | 1 年前3 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020backends 7 Vasiliki Kalavri | Boston University 2020 MemoryStateBackend • Stores state as regular objects on TaskManager’s heap • Low read/write latencies • OutOfMemoryError if large grows too large, GC descriptors. • The data types handled by the state are specified as Class or TypeInformation objects. 16 Registering state Vasiliki Kalavri | Boston University 2020 class TemperatureAlertFunction(val method of a function with keyed input is called, Flink’s runtime automatically puts all keyed state objects of the function into the context of the key of the record that is passed by the function call.0 码力 | 24 页 | 914.13 KB | 1 年前3
 Scalable Stream Processing - Spark Streaming and Flinkacknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics acknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics acknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics0 码力 | 113 页 | 1.22 MB | 1 年前3 Scalable Stream Processing - Spark Streaming and Flinkacknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics acknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics acknowledged. ▶ Fault tolerance in Flink • More coarse-grained approach than Storm. • Based on consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics0 码力 | 113 页 | 1.22 MB | 1 年前3
 监控Apache Flink应用程序(入门)true when using Flink’s filesystem statebackend as it keeps all state objects on the JVM Heap. If the size of long-living objects on the Heap increases significantly, this can usually be attributed to0 码力 | 23 页 | 148.62 KB | 1 年前3 监控Apache Flink应用程序(入门)true when using Flink’s filesystem statebackend as it keeps all state objects on the JVM Heap. If the size of long-living objects on the Heap increases significantly, this can usually be attributed to0 码力 | 23 页 | 148.62 KB | 1 年前3
 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020that checkpoints are meaningful and coherent? 7 ??? Vasiliki Kalavri | Boston University 2020 Global snapshots ??? Vasiliki Kalavri | Boston University 2020 9 Image credit: NASA ??? Vasiliki Kalavri system configuration is eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols system configuration is eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Validity Explained0 码力 | 81 页 | 13.18 MB | 1 年前3 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020that checkpoints are meaningful and coherent? 7 ??? Vasiliki Kalavri | Boston University 2020 Global snapshots ??? Vasiliki Kalavri | Boston University 2020 9 Image credit: NASA ??? Vasiliki Kalavri system configuration is eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols system configuration is eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Validity Explained0 码力 | 81 页 | 13.18 MB | 1 年前3
 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Refreshing distributed caches • an application can publish invalidation events to update the IDs of objects that have changed. • Logging to multiple systems • a Google Compute Engine instance can write0 码力 | 33 页 | 700.14 KB | 1 年前3 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Refreshing distributed caches • an application can publish invalidation events to update the IDs of objects that have changed. • Logging to multiple systems • a Google Compute Engine instance can write0 码力 | 33 页 | 700.14 KB | 1 年前3
 Apache Flink的过去、现在和未来Flink 0.9 Sink Source Offset Computation State Periodic Snapshots 2015 年 6 月份 发布 – 开始内置支持 State Global Checkpoint 新数据 老数据 Checkpoint Barrier N Checkpoint Barrier N-1 Part of Checkpoint N+1 Part of0 码力 | 33 页 | 3.36 MB | 1 年前3 Apache Flink的过去、现在和未来Flink 0.9 Sink Source Offset Computation State Periodic Snapshots 2015 年 6 月份 发布 – 开始内置支持 State Global Checkpoint 新数据 老数据 Checkpoint Barrier N Checkpoint Barrier N-1 Part of Checkpoint N+1 Part of0 码力 | 33 页 | 3.36 MB | 1 年前3
 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020http://streamingbook.net/fig/3-1 Vasiliki Kalavri | Boston University 2020 10 • A watermark is a global progress metric that indicates a certain point in time when we are confident that no more delayed0 码力 | 22 页 | 2.22 MB | 1 年前3 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020http://streamingbook.net/fig/3-1 Vasiliki Kalavri | Boston University 2020 10 • A watermark is a global progress metric that indicates a certain point in time when we are confident that no more delayed0 码力 | 22 页 | 2.22 MB | 1 年前3
 Streaming in Apache FlinkextractTimestamp(MyEvent event) { return element.getCreationTime(); } } Windows (Not the OS) Global Vs Keyed Windows stream. .keyBy( Streaming in Apache FlinkextractTimestamp(MyEvent event) { return element.getCreationTime(); } } Windows (Not the OS) Global Vs Keyed Windows stream. .keyBy(- ) .window( - ) .reduce|agg 0 码力 | 45 页 | 3.00 MB | 1 年前3
共 9 条
- 1













