State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020
State is all data maintained by a task and used to compute results. What operations should state support? What state types can you think of?
• Count, sum, list, map, …
ListState[T]: a list of elements of type T
• ListState.add(value: T)
• ListState.addAll(values: java.util.List[T])
• ListState.get(): Iterable[T]
• ListState.update(values: java.util.List[T])
Operator state: a function can work with operator list state by implementing Flink's ListCheckpointed interface:
• List<T> snapshotState(long checkpointId, long timestamp)
• void restoreState(List<T> state)
24 pages | 914.13 KB | 1 year ago
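The ListState operations named in the snippet can be illustrated with a minimal in-memory sketch. This is a plain Python stand-in for the contract, not Flink's actual implementation; the class name is mine.

```python
class InMemoryListState:
    """Minimal sketch of Flink's ListState contract (not the real implementation)."""

    def __init__(self):
        self._elements = []

    def add(self, value):
        # ListState.add(value: T): append a single element
        self._elements.append(value)

    def add_all(self, values):
        # ListState.addAll(values): append every element of the given list
        self._elements.extend(values)

    def get(self):
        # ListState.get(): return an iterable over all stored elements
        return iter(self._elements)

    def update(self, values):
        # ListState.update(values): replace the whole list
        self._elements = list(values)
```

Usage follows the API one-to-one: add one element, bulk-append, iterate, then overwrite the list wholesale with update.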
PyFlink 1.15 Documentation
You can try PyFlink out without any other step:
• Live Notebook: Table
• Live Notebook: DataStream
The quickstart page covers installation (1.1.1) and preparation (1.1.1.1). If multiple versions of the jar packages exist at the same time for some reason:
# List the jar packages under the lib directory
ls -lh /path/to/python/site-packages/pyflink/lib
# It will output a list of jar packages as follows:
# -rw-r--r-- …
Tables from different TableEnvironments cannot be mixed in the same query, e.g., to join or union them. You can create a Table from a Python list object:
table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])
table.get_schema()
root …
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
You can try PyFlink out without any other step:
• Live Notebook: Table
• Live Notebook: DataStream
The quickstart page covers installation (1.1.1) and preparation (1.1.1.1). If multiple versions of the jar packages exist at the same time for some reason:
# List the jar packages under the lib directory
ls -lh /path/to/python/site-packages/pyflink/lib
# It will output a list of jar packages as follows:
# -rw-r--r-- …
Tables from different TableEnvironments cannot be mixed in the same query, e.g., to join or union them. You can create a Table from a Python list object:
table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])
table.get_schema()
root …
36 pages | 266.80 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
MapReduce word count, reduce side:
reduce(String key, Iterator values):
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(key, AsString(result));
MapReduce combiners example: URL access frequency
map(): (k1, v1) → list(k2, v2)
reduce(): (k2, list(v2)) → list(v2)
GET /dumprequest HTTP/1
54 pages | 2.83 MB | 1 year ago
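The combiner idea from the snippet can be sketched in pure Python: the combiner pre-aggregates each partition's map output before the shuffle, so the reducer sums small partial counts instead of a long list of raw 1s. This is a toy single-process sketch of the pattern, not Hadoop's API; all function names are mine.

```python
from collections import Counter, defaultdict

def map_phase(document):
    # map(): (k1, v1) -> list((k2, v2)); emit (word, 1) for each occurrence
    return [(word, 1) for word in document.split()]

def combine(pairs):
    # Combiner: pre-aggregate map output locally, before the shuffle.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def shuffle(partitions):
    # Group the (already combined) partial counts by key.
    grouped = defaultdict(list)
    for part in partitions:
        for word, n in part:
            grouped[word].append(n)
    return grouped

def reduce_phase(word, partial_counts):
    # reduce(): (k2, list(v2)) -> final count; sums partial counts.
    return word, sum(partial_counts)

def word_count(documents):
    combined = [combine(map_phase(doc)) for doc in documents]
    return dict(reduce_phase(w, ns) for w, ns in shuffle(combined).items())
```

With documents ["a b a", "b b c"], the combiner shrinks the first partition's output from three pairs to two before anything crosses the (simulated) network; the final result is {'a': 2, 'b': 3, 'c': 1}.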
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• Operators with keyed state are scaled by repartitioning keys.
• Operators with operator list state are scaled by redistributing the list entries.
• Operators with operator broadcast state are scaled up by copying the state to the new tasks.
Examples: scaling keyed state, scaling list state, Kafka offsets redistribution.
41 pages | 4.09 MB | 1 year ago
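Flink's key repartitioning on rescaling can be sketched as follows: keys are hashed into a fixed number of key groups (bounded by the maximum parallelism), and each operator instance owns a contiguous range of key groups, so rescaling moves whole ranges rather than rehashing individual keys. This is a simplified sketch of the scheme; real Flink applies MurmurHash to the key's hashCode rather than Python's built-in hash.

```python
def key_group(key, max_parallelism):
    # A key always maps to the same key group, independent of the current
    # parallelism. (Simplification: Flink uses MurmurHash, not Python's hash.)
    return hash(key) % max_parallelism

def operator_index(group, max_parallelism, parallelism):
    # Contiguous key-group ranges are assigned to operator instances, so
    # changing the parallelism only reassigns ranges, not individual keys.
    return group * parallelism // max_parallelism
```

For example, with max_parallelism=128, rescaling from 2 to 4 tasks splits each task's range in two; a key's group (and therefore its state file offset in the snapshot) never changes.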
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Lossy counting:
• Windows are numbered starting from 1, e.g., if ε = 0.2, w = 5 (5 items per window).
• wcur: the current window id.
• We keep a list D of element frequencies and their maximum associated error.
• Once a window fills up, we remove low-frequency elements.
Lossy counting algorithm:
D = {}   // empty list
wcur = 1 // first window id
N = 0    // elements seen so far
Insert step: for each element x in wcur: …
31 pages | 1.47 MB | 1 year ago
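The pseudocode above can be filled out in Python, following Manku and Motwani's lossy counting: D maps each element to (frequency, max_error), a new element entering in window wcur gets error wcur − 1, and at every window boundary entries with frequency + error ≤ wcur are dropped. Variable names match the snippet; this is a sketch, not the slide's own code.

```python
import math

def lossy_count(stream, epsilon):
    w = math.ceil(1 / epsilon)   # window size, e.g. epsilon = 0.2 -> w = 5
    D = {}                       # element -> (frequency, max_error)
    wcur = 1                     # current window id
    n = 0                        # elements seen so far
    for x in stream:
        n += 1
        # Insert step: count x; a newly tracked element may have been missed
        # in up to wcur - 1 earlier windows, hence its max error.
        if x in D:
            f, e = D[x]
            D[x] = (f + 1, e)
        else:
            D[x] = (1, wcur - 1)
        # Delete step: at each window boundary, drop low-frequency entries.
        if n % w == 0:
            D = {k: (f, e) for k, (f, e) in D.items() if f + e > wcur}
            wcur += 1
    return D
```

The guarantee: any element whose true frequency exceeds εN is still in D, and no stored count undercounts by more than εN.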
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Solution #2: sampling users. Sample 1/10th of the users instead.
• Maintain a list of all users seen so far and a flag indicating whether they belong to the sample or not.
• When a new user arrives, decide once whether to include them in the sample (with probability 1/10) and store the flag.
74 pages | 1.06 MB | 1 year ago
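A sketch of the per-user sampling described above, with a dict standing in for the "list of all users seen so far and a flag", and a seeded RNG for reproducibility. The class and method names are mine.

```python
import random

class UserSampler:
    """Sample a fixed fraction of USERS, not of events: once a user is in the
    sample, every one of their events is kept, so per-user statistics
    (e.g. average queries per user) remain unbiased."""

    def __init__(self, fraction=0.1, seed=42):
        self.fraction = fraction
        self.in_sample = {}              # user -> bool flag, as in the slide
        self.rng = random.Random(seed)   # seeded for reproducibility

    def keep(self, user):
        # Decide once per user; afterwards reuse the stored flag.
        if user not in self.in_sample:
            self.in_sample[user] = self.rng.random() < self.fraction
        return self.in_sample[user]
```

The key property is consistency: keep(u) returns the same answer for every event of user u, which is what sampling events uniformly at 1/10 would fail to provide.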
Scalable Stream Processing - Spark Streaming and Flink
updateStateByKey maintains arbitrary state across batches for (key, value) pairs:
def updateStateByKey[S](updateFunc: (Seq[V], Option[S]) => Option[S])
// Seq[V]: the list of new values received for the given key in the current batch
// Option[S]: the state we are updating
113 pages | 1.22 MB | 1 year ago
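The updateStateByKey contract can be mimicked in plain Python: for each key, the update function receives the batch's new values (Seq[V]) and the previous state (Option[S], here None when absent) and returns the new state; returning None drops the key. A toy single-machine simulation of the semantics, not Spark code.

```python
from collections import defaultdict

def update_state_by_key(batch, state, update_func):
    # batch: iterable of (key, value) pairs from the current micro-batch;
    # state: dict of key -> S carried over from previous batches.
    new_values = defaultdict(list)
    for k, v in batch:
        new_values[k].append(v)
    new_state = {}
    # update_func runs for every key with new values or existing state;
    # a None result removes the key from the state.
    for k in set(new_values) | set(state):
        result = update_func(new_values.get(k, []), state.get(k))
        if result is not None:
            new_state[k] = result
    return new_state

def running_count(values, old_state):
    # values plays the role of Seq[V]; old_state of Option[S] (None if absent)
    return (old_state or 0) + len(values)
```

Note that keys with no new values are still visited (so state can be aged out), mirroring the fact that Spark calls the update function for every tracked key in each batch.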
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• Incremental aggregation functions, e.g., AggregateFunction, are applied as elements arrive.
• Full window functions collect all elements of a window and iterate over the list of all collected elements when evaluated: they require more space but support more complex logic.
35 pages | 444.84 KB | 1 year ago
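The trade-off can be shown with a toy window: an incremental aggregate keeps only a small accumulator per window (constant space), while a full window function must buffer every element until evaluation. AvgAggregate below is my simplified analogue of Flink's AggregateFunction interface, not the real one; the median is a function that genuinely needs the full buffer.

```python
class AvgAggregate:
    """Incremental aggregation: only a (sum, count) accumulator is stored
    per window, updated as each element arrives."""

    def create_accumulator(self):
        return (0.0, 0)

    def add(self, value, acc):
        return (acc[0] + value, acc[1] + 1)

    def get_result(self, acc):
        return acc[0] / acc[1]

def full_window_median(elements):
    """Full window function: needs every buffered element at evaluation time;
    the median cannot be computed from a constant-size accumulator."""
    ordered = sorted(elements)
    return ordered[len(ordered) // 2]
```

For a window of a million elements, the average needs two numbers of state; the median needs all million, which is exactly the space/expressiveness trade-off the slide describes.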
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Open questions exist for a data-parallel implementation of spanners:
• How to represent the spanner? As an adjacency list? Which state primitives are suitable? Is RocksDB a suitable backend for graph state?
• How to …
72 pages | 7.77 MB | 1 year ago
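One concrete answer to the representation question is to keep the spanner as an adjacency list (a dict of neighbor sets) and, for each arriving edge, run a depth-bounded BFS: add the edge only if no path of length at most 2k−1 already connects its endpoints. This is a single-machine sketch of the classic greedy (2k−1)-spanner; the data-parallel questions the slide raises remain open, and all names are mine.

```python
from collections import defaultdict, deque

def bfs_within(adj, src, dst, max_hops):
    # Is dst reachable from src using at most max_hops edges of the spanner?
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == max_hops:
            continue
        for nxt in adj[node]:
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return False

def greedy_spanner(edge_stream, k):
    # Spanner state kept as an adjacency list: node -> set of neighbors.
    adj = defaultdict(set)
    for u, v in edge_stream:
        # Keep (u, v) only if the spanner has no u-v path of length <= 2k-1;
        # the retained edges form a (2k-1)-spanner of the streamed graph.
        if not bfs_within(adj, u, v, 2 * k - 1):
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

On the triangle stream (1,2), (2,3), (1,3) with k = 2, the third edge is dropped because the 2-hop path 1-2-3 already satisfies the stretch bound, illustrating how the spanner stays sparse.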
共 12 条