High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
<#WorldCup, 480> <#StarWars, 300> <#Brexit, 521> Any non-trivial streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations
0 credits | 49 pages | 2.08 MB | 1 year ago
PyFlink 1.15 Documentation
(FileSink.for_row_format('/tmp/sink', Encoder.simple_string_encoder("UTF-8")).with_rolling_policy(RollingPolicy.default_rolling_policy(part_size=1024 ** 3, rollover_interval=15 * 60 * 1000, inactivity_interval=5…
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
(FileSink.for_row_format('/tmp/sink', Encoder.simple_string_encoder("UTF-8")).with_rolling_policy(RollingPolicy.default_rolling_policy(part_size=1024 ** 3, rollover_interval=15 * 60 * 1000, inactivity_interval=5…
0 credits | 36 pages | 266.80 KB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
<#StarWars, 300> <#Brexit, 521> Any non-trivial streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations
0 credits | 24 pages | 914.13 KB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… cause unnecessary result degradation! • Load shedding components rely on statistics gathered during execution: • A statistics manager module monitors processing and input rates and periodically estimates … assigns a cost, c_i, in cycles per tuple, and a selectivity, s_i, to each operator i. • The statistics manager collects metrics and estimates those parameters either continuously or by running the … sampling rate • The rate defines the probability to discard a tuple and is computed based on statistics and operator selectivity • The optimization objective is to achieve the highest possible accuracy
0 credits | 43 pages | 2.42 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… filtering and alarm activation • Aggregation of multiple sensors and joins • Examples • Real-time statistics, e.g. weather maps • Monitor conditions to adjust resources, e.g. power generation • Monitor …
0 credits | 34 pages | 2.53 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Cost-based optimization [diagram labels: parsed program representation, optimizer, statistics, input, plan A, plan B, output, lowest-cost plan ???] Vasiliki Kalavri | Boston University 2020
0 credits | 54 pages | 2.83 MB | 1 year ago
7 results in total