 Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020• freq(x) > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: • freq(y) < (δ - ε)*N, for a user-chosen value ε 4 (δ - ε)*Ν δ*Ν not included of items N: number of items in the stream fe: true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined threshold, so that freq(x)≥ δ*N,δ∈(0,1) ε: user-defined user-defined error Output: All items with frequency greater than or equal to δ*N. No item with frequency less than (δ-ε)*N. 5 ??? Vasiliki Kalavri | Boston University 2020 Notation (II) • We define windows0 码力 | 31 页 | 1.47 MB | 1 年前3 Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020• freq(x) > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: • freq(y) < (δ - ε)*N, for a user-chosen value ε 4 (δ - ε)*Ν δ*Ν not included of items N: number of items in the stream fe: true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined threshold, so that freq(x)≥ δ*N,δ∈(0,1) ε: user-defined user-defined error Output: All items with frequency greater than or equal to δ*N. No item with frequency less than (δ-ε)*N. 5 ??? Vasiliki Kalavri | Boston University 2020 Notation (II) • We define windows0 码力 | 31 页 | 1.47 MB | 1 年前3
 Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020fires when an item of type A having an attribute X > 0 enters the system and also an item of type B with Y = 10 is detected, followed (in a time interval of 5–15 s) by an item of type C with Z 2020 Streaming Operators 9 Vasiliki Kalavri | Boston University 2020 Operator types (I) • Single-Item Operators process stream elements one-by-one. • selection, filtering, projection, renaming. • when at least one item has been detected. • repetition of an item I of degree (m, n) is satisfied when I is detected at least m times but o more than n times. • negation of an item I is satisfied when0 码力 | 53 页 | 532.37 KB | 1 年前3 Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020fires when an item of type A having an attribute X > 0 enters the system and also an item of type B with Y = 10 is detected, followed (in a time interval of 5–15 s) by an item of type C with Z 2020 Streaming Operators 9 Vasiliki Kalavri | Boston University 2020 Operator types (I) • Single-Item Operators process stream elements one-by-one. • selection, filtering, projection, renaming. • when at least one item has been detected. • repetition of an item I of degree (m, n) is satisfied when I is detected at least m times but o more than n times. • negation of an item I is satisfied when0 码力 | 53 页 | 532.37 KB | 1 年前3
 Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020DataStream[(String, Double)] = … val priceRequests: DataStream[Item] = ... factors.connect(priceRequests).flatMap( new CoFlatMapFunction[(String, Double), Item, Offer] { // shared state between the two streams method for the stream of price requests override def flatMap2(item: Item, out: Collector[Offer]) = { out.collect(computePrice(item, factorValues)) } }) 17 Vasiliki Kalavri | Boston University0 码力 | 26 页 | 3.33 MB | 1 年前3 Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020DataStream[(String, Double)] = … val priceRequests: DataStream[Item] = ... factors.connect(priceRequests).flatMap( new CoFlatMapFunction[(String, Double), Item, Offer] { // shared state between the two streams method for the stream of price requests override def flatMap2(item: Item, out: Collector[Offer]) = { out.collect(computePrice(item, factorValues)) } }) 17 Vasiliki Kalavri | Boston University0 码力 | 26 页 | 3.33 MB | 1 年前3
 Streaming in Apache FlinkgetState(descriptor); } @Override public Tuple2 Streaming in Apache FlinkgetState(descriptor); } @Override public Tuple2- map (Tuple2 - item) throws Exception { // access the state for this key MovingAverage average = averageState this event to the moving average average.add(item.f1); averageState.update(average); // return the smoothed result return new Tuple2(item.f0, average.getAverage()); } } Connected 0 码力 | 45 页 | 3.00 MB | 1 年前3
 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020empty bag and then inserts each successive stream item: • ins([]) = Ø • ins(P:i) = insert(i, ins(P)), where P:i denotes the sequence P extended by item i. Insert-Unique (distinct): The reconstitution Insert-Replace: If the stream has a key, the reconstitution function ins_r guarantees that only the most recent item with a given key is included: • ins_r([]) = Ø • ins_r(P:i) = insert(i, {j | j ∈ ins_r(P) ^ j.A ≠0 码力 | 45 页 | 1.22 MB | 1 年前3 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020empty bag and then inserts each successive stream item: • ins([]) = Ø • ins(P:i) = insert(i, ins(P)), where P:i denotes the sequence P extended by item i. Insert-Unique (distinct): The reconstitution Insert-Replace: If the stream has a key, the reconstitution function ins_r guarantees that only the most recent item with a given key is included: • ins_r([]) = Ø • ins_r(P:i) = insert(i, {j | j ∈ ins_r(P) ^ j.A ≠0 码力 | 45 页 | 1.22 MB | 1 年前3
 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020operations are commutative • natural joins are associative • Move projections early to reduce data item size • Pick join orderings to minimize the size of intermediate results • execute selective joins starvation: every data item is eventually processed • Ensure each worker is qualified: if load balancing is applied after fission, each instance must be capable of processing each item and have access to0 码力 | 54 页 | 2.83 MB | 1 年前3 Streaming optimizations	- CS 591 K1: Data Stream Processing and Analytics Spring 2020operations are commutative • natural joins are associative • Move projections early to reduce data item size • Pick join orderings to minimize the size of intermediate results • execute selective joins starvation: every data item is eventually processed • Ensure each worker is qualified: if load balancing is applied after fission, each instance must be capable of processing each item and have access to0 码力 | 54 页 | 2.83 MB | 1 年前3
 Scalable Stream Processing - Spark Streaming and Flinkelement of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the0 码力 | 113 页 | 1.22 MB | 1 年前3 Scalable Stream Processing - Spark Streaming and Flinkelement of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the element of the source DStream through a given function. ▶ flatMap • Similar to map, but each input item can be mapped to 0 or more output items. ▶ filter • Returns a new DStream by selecting only the0 码力 | 113 页 | 1.22 MB | 1 年前3
共 7 条
- 1













