High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020recovery? • how can we ensure minimal downtime and fast recovery? • how can we hide recovery side-effects from downstream applications? Vasiliki Kalavri | Boston University 2020 What is a failure? op Kalavri | Boston University 2020 Recovery types • Precise recovery (exactly-once) • It hides the effects of a failure perfectly • Post-failure output is identical to no-failure 8 Vasiliki Kalavri | | Boston University 2020 Recovery types • Precise recovery (exactly-once) • It hides the effects of a failure perfectly • Post-failure output is identical to no-failure • Rollback recovery (at-least-once)0 码力 | 49 页 | 2.08 MB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply • Allocate new resources, spawn new processes or release unused Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply • Allocate new resources, spawn new processes or release unused Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply • Allocate new resources, spawn new processes or release unused0 码力 | 41 页 | 4.09 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020plan with • expected cycle savings • locations for drop operations • drop amounts • QoS effects (provided that tuples can be associated with a utility metric) 15 ??? Vasiliki Kalavri | Boston0 码力 | 43 页 | 2.42 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict their effects, and decide which and when to apply • Allocate new resources, spawn new processes or release unused0 码力 | 93 页 | 2.42 MB | 1 年前3
PyFlink 1.15 DocumentationYou can also create Python virtual environment with a specific Python version virtualenv --python /path/to/python/executable venv The virtual environment needs to be activated before to use it. To activate following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] # +I[or, 1] # +I[not, 1] # +I[to, 1] # +I[be,--that, 1] # ... If there PyFlink python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__ ˓→file__)))" # It will output a path like the following: # /path/to/python/site-packages/pyflink # Check the logging0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationYou can also create Python virtual environment with a specific Python version virtualenv --python /path/to/python/executable venv The virtual environment needs to be activated before to use it. To activate following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] # +I[or, 1] # +I[not, 1] # +I[to, 1] # +I[be,--that, 1] # ... If there PyFlink python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__ ˓→file__)))" # It will output a path like the following: # /path/to/python/site-packages/pyflink # Check the logging0 码力 | 36 页 | 266.80 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020checkpoints.dir: path/to/checkpoint/folder/ In your Flink program: val env = StreamExecutionEnvironment.getExecutionEnvironment val checkpointPath: String = ??? // configure path for checkpoints on0 码力 | 24 页 | 914.13 KB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020• We want to estimate the distance between any pair of nodes u, v as the length of the shortest path between them. • A spanner H of graph G is a subgraph of G with fewer edges and the same set of0 码力 | 72 页 | 7.77 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkdirectory for stateful streams. val ssc = new StreamingContext(conf, Seconds(1)) ssc.checkpoint("path/to/persistent/storage") 45 / 79 Stateful Stream Operations ▶ Spark API proposes two functions for0 码力 | 113 页 | 1.22 MB | 1 年前3
共 9 条
- 1













