Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• Processing time
  • the time of the local clock where an event is being processed
  • a processing-time window wouldn’t account for game activity while the train is in the tunnel
  • results depend on … aren’t deterministic
• Event time
  • the time when an event actually happened
  • an event-time window would give you the extra life
  • results are deterministic and independent of the processing speed
22 pages | 2.22 MB | 1 year ago

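The difference between the two notions of time in the snippet above can be sketched in a few lines of plain Python. This is an illustration only: the event records, the field names `event_time` and `processing_time`, and the helper `tumbling_counts` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def tumbling_counts(events, size, key):
    """Count events per tumbling window of `size` seconds.

    `key` selects which timestamp assigns an event to a window
    (hypothetical field names: "event_time" or "processing_time").
    """
    counts = defaultdict(int)
    for event in events:
        window_start = (event[key] // size) * size
        counts[window_start] += 1
    return dict(counts)

# Hypothetical events: two of them happened early but were processed late,
# for example because the device was offline in a tunnel.
events = [
    {"event_time": 1, "processing_time": 1},
    {"event_time": 4, "processing_time": 12},
    {"event_time": 7, "processing_time": 13},
    {"event_time": 11, "processing_time": 14},
]

by_event_time = tumbling_counts(events, 10, "event_time")
by_processing_time = tumbling_counts(events, 10, "processing_time")
print(by_event_time)
print(by_processing_time)
```

Grouping by `event_time` places the delayed events in the window in which they happened, regardless of when they arrive; grouping by `processing_time` moves them into a later window, so the result depends on the delay.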
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… different things
• last 5 sec
• last 10 events
• last 1h every 10 min
• last user session
Window operators
Vasiliki Kalavri | Boston University 2020
object MaxSensorReadings { def main(args: … 0))) .keyBy(_.id) .timeWindow(Time.minutes(1)) .max("temp") } }
Example: Window sensor readings
In the DataStream API, you can use the … or IngestionTime
Window operators can be applied on a keyed or a non-keyed stream:
• Window operators on keyed windows are evaluated in parallel
• Non-keyed …
35 pages | 444.84 KB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
… DStream.
Window Operations (1/3)
▶ Spark provides a set of transformations that apply to a … over a sliding window of data.
▶ A window is defined by two parameters: window length and slide interval.
▶ A tumbling window effect can be achieved by making slide interval = window length
Window Operations (2/3)
▶ window(windowLength, slideInterval)
  • Returns a new DStream which is computed based on windowed batches.
▶ countByWindow(windowLength, slideInterval)
  • Returns a sliding window count of elements in the stream.
▶ reduceByWindow(func, windowLength, slideInterval)
  • Returns …
113 pages | 1.22 MB | 1 year ago

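The two window parameters named in the snippet above, window length and slide interval, can be illustrated with a small plain-Python sketch. It does not use the Spark API; the helper `sliding_windows` and the sample data are hypothetical, and the choice to emit partial windows at the start of the stream is an assumption of this sketch.

```python
def sliding_windows(items, window_length, slide_interval):
    """Return the contents of each window over `items`.

    A window is emitted every `slide_interval` items and covers the
    last `window_length` items seen so far (fewer at the start).
    """
    windows = []
    for end in range(slide_interval, len(items) + 1, slide_interval):
        start = max(0, end - window_length)
        windows.append(items[start:end])
    return windows

batches = [1, 2, 3, 4, 5, 6]

# Sliding: length 3, slide 2, so consecutive windows overlap.
sliding = sliding_windows(batches, window_length=3, slide_interval=2)

# Tumbling: slide interval equals window length, so windows do not overlap.
tumbling = sliding_windows(batches, window_length=2, slide_interval=2)

sliding_sums = [sum(w) for w in sliding]
print(sliding, tumbling, sliding_sums)
```

Setting the slide interval equal to the window length makes every item fall into exactly one window, which is the tumbling effect the snippet describes.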
Streaming in Apache Flink
… key
MovingAverage average = averageState.value();
// create a new MovingAverage (with window size 2) if none exists for this key
if (average == null) average = new MovingAverage(2);
…
keyBy() .window(<window assigner>) .reduce|aggregate|process(<window function>)
stream. .windowAll(<window assigner>) .reduce|aggregate|process(<window function>)
◦ TumblingEventTimeWindows … s.withGap(Time.minutes(30))
DataStream input = ...
input .keyBy("key") .window(TumblingEventTimeWindows.of(Time.minutes(1))) .process(new MyWastefulMax());
public static class …
45 pages | 3.00 MB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… drop?
• Window-aware load shedding applies shedding to entire windows instead of individual tuples
• When discarding tuples at the sources or another point in a query with multiple window aggregations, it is unclear how shedding will affect the correctness of downstream window operators.
• This approach preserves window integrity and guarantees that the results under shedding will not be approximations …
… shedding measures tuple utility
• The method selects tuples to discard by relying on the notion of a window-based concept drift.
• The metric is defined by computing a similarity metric across windows.
43 pages | 2.42 MB | 1 year ago

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
SQuAl
Queries are represented in graphical representation using boxes and arrows
[Figure: boxes labeled Tumble Window, Tumble Window, and Join(S1.A = S2.A); inputs S1 and S2]
Composite subscription … records arrive.
• projection, selection, union
Window Operators
• Probably the most important operators in stream processing systems
• Almost universally … the stream on which computations can be performed
Window types (I)
• Time-based (logical) windows define their contents as a function of time.
• average …
53 pages | 532.37 KB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… replaces any existing tuple with the same t(A) value to form a new relation state.
• as a sliding window with length k in which each subsequence of k tuples represents a relation state in the sequence
… data channels
• operators can accumulate state, have multiple inputs, express event-time custom window-based logic
• some systems, like Timely Dataflow, support cyclic dataflows and iterations on streams
… continuously along edges
Operators
• receive one or more input streams
• perform tuple-at-a-time, window, logic, pattern matching transformations
• output one or more streams of possibly different …
45 pages | 1.22 MB | 1 year ago

Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… e.g., if ε=0.2, w=5 (5 items per window)
• wcur: the current window id
• We keep a list D of element frequencies and their maximum associated error.
• Once a window fills up, we remove infrequent …
Lossy counting algorithm
D = {} // empty list
wcur = 1 // first window id
N = 0 // elements seen so far
Insert step
For each element x in wcur: if x ∈ D, increase …
31 pages | 1.47 MB | 1 year ago

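The lossy counting snippet above is cut off mid-algorithm. The sketch below fills in the remaining steps from the commonly published form of lossy counting, not from the slides: a new element is stored with error `w_cur - 1`, and at each window boundary entries with `frequency + error <= w_cur` are removed. The function name `lossy_count` and the sample stream are hypothetical.

```python
def lossy_count(stream, window_size):
    """Approximate frequency counting over a stream.

    `window_size` is w, the number of items per window (w = 1/epsilon
    in the usual formulation). Returns (D, n), where D maps each kept
    element to [frequency, max_error] and n is the number of items seen.
    """
    D = {}     # element -> [frequency, maximum associated error]
    w_cur = 1  # current window id
    n = 0      # elements seen so far
    for x in stream:
        n += 1
        if x in D:
            D[x][0] += 1           # known element: increase its frequency
        else:
            D[x] = [1, w_cur - 1]  # assumption: error bound of the standard algorithm
        if n % window_size == 0:
            # Window is full: drop infrequent elements, then move on.
            D = {k: v for k, v in D.items() if v[0] + v[1] > w_cur}
            w_cur += 1
    return D, n

# epsilon = 0.2 corresponds to w = 5 items per window, as in the snippet.
stream = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f"]
counts, seen = lossy_count(stream, window_size=5)
print(counts, seen)
```

In this run only the frequent element `a` survives both pruning steps; every element that appears once is dropped when its window closes.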
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
[Figure: dataflow with operators src, o1, o2 and record counts (10 recs, 100 recs) over an observation window W of 0.5s; labels: Instrumentation, Metrics]
The DS2 model
• Collect metrics per configurable observation window W
• activity durations per worker
• records processed Rprc and records pushed to output Rpsd
93 pages | 2.42 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
State in dataflow computations
[Figure: example records <#StarWars, 300>, <#Brexit>, <#Brexit, 521>]
Any non-trivial streaming computation maintains state:
• rolling aggregations
• window contents
• input offsets
• machine learning models
49 pages | 2.08 MB | 1 year ago














