Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020possibly different type A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java… ??? Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 521> <#WorldCup combiners example: URL access frequency (k2, list(v2)) → list(v2) (k1, v1) → list(k2, v2) map() reduce() 25 ??? Vasiliki Kalavri | Boston University 2020 MapReduce combiners example: URL access google.be, 1 maps.google.com, 1 ??? Vasiliki Kalavri | Boston University 2020 MapReduce combiners example: URL access frequency 27 map() reduce() GET /dumprequest HTTP/1.1 Host: rve.org.uk Connection: 0 码力 | 54 页 | 2.83 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020possibly different type A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java… 40 Vasiliki Kalavri | Boston University 2020 Stateful operators Logic State<#Brexit .max("temp") maxTemp.print() env.execute("Compute max sensor temperature”) } } Example: Apache Flink DataStream API 42 Vasiliki Kalavri | Boston University 2020 Relational Streaming 0 码力 | 45 页 | 1.22 MB | 1 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0.2 w1 w4 w3 w2 ??? Vasiliki Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 input stream ε=0.2 w1 w4 w3 w2 1 2 2 3 5 w1 ??? Vasiliki Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0.2 w1 w4 w3 w2 1 2 2 3 5 w1 1 2 2 3 5 1 0 2 0 1 0 1 0 f1 ε1 f2 ε2 f3 ε3 f5 ε5 ??? Vasiliki Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0.2 w1 w4 w3 w2 1 2 2 3 5 w1 Delete0 码力 | 31 页 | 1.47 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinka given key has been consumed or not. ▶ No built-in timeouts • Think what would happen in our example, if the event signaling the end of the user session was lost, or had not arrived for some reason newCount + oldCount state.update(sum) (key, sum) } 51 / 79 updateStateByKey vs. mapWithState Example (1/3) ▶ The first micro batch contains a message a. ▶ updateStateByKey • updateFunc = (values: value = Some(1), state = 0 • Output: key = a, sum = 1 52 / 79 updateStateByKey vs. mapWithState Example (1/3) ▶ The first micro batch contains a message a. ▶ updateStateByKey • updateFunc = (values:0 码力 | 113 页 | 1.22 MB | 1 年前3
PyFlink 1.15 Documentationvirtual environment. Note that the Flink version and PyFlink version need to be consistent. For example, if you are using Flink 1.15, then you should use PyFlink 1.15 Installing using PyPI PyFlink could make sure that the versions of all the Flink jar packages are consistent, e.g. 1.15.2 in the above example. 1.1.1.2 Local This page shows you how to set up PyFlink development environment in your local /path/to/venv/bin/python3 \ -pyexec /path/to/venv/bin/python3 \ -py word_count.py In the above example, it assumes that there is already a Python virtual environment available at /path/to/venv on all0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationvirtual environment. Note that the Flink version and PyFlink version need to be consistent. For example, if you are using Flink 1.15, then you should use PyFlink 1.15 Installing using PyPI PyFlink could make sure that the versions of all the Flink jar packages are consistent, e.g. 1.15.2 in the above example. 1.1.1.2 Local This page shows you how to set up PyFlink development environment in your local /path/to/venv/bin/python3 \ -pyexec /path/to/venv/bin/python3 \ -py word_count.py In the above example, it assumes that there is already a Python virtual environment available at /path/to/venv on all0 码力 | 36 页 | 266.80 KB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020and last 10 elements of S2 stream-to-relation relation-to-relation relation-to-stream CQL Example 5 Vasiliki Kalavri | Boston University 2020 CQL relation-to-stream operators • Istream (for “insert satisfied when I is not detected. 10 Vasiliki Kalavri | Boston University 2020 Logic Operators Example Select IStream(S1.A, S2.B) From S1 [Rows 50], S2 [Rows 50] (A & B) || (C & D) Explicit conjunction condition must be defined, e.g. time limit 12 Vasiliki Kalavri | Boston University 2020 timely::example(|scope| { let (handle, stream) = scope.loop_variable(100, 1); (0..10).to_stream(scope)0 码力 | 53 页 | 532.37 KB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 20200/9.0))) .keyBy(_.id) .timeWindow(Time.minutes(1)) .max("temp") } } 3 Example: Window sensor readings Vasiliki Kalavri | Boston University 2020 In the DataStream API, you can of(size)) .timeWindow(Time.seconds(1)) .process(new TemperatureAverager) 9 Tumbling window example Window assigner Window function Vasiliki Kalavri | Boston University 2020 overlapping buckets timeWindow(Time.hours(1), Time(minutes(15))) .process(new TemperatureAverager) 11 Sliding window example Window assigner Window function Vasiliki Kalavri | Boston University 2020 a period of activity0 码力 | 35 页 | 444.84 KB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020Boston University 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages ??? Vasiliki Kalavri | Boston University University 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table ??? Vasiliki Boston University 2020 How can we count the number of distinct elements seen so far in a stream? 3 Example use-case: Distinct users visiting one or multiple webpages Naive solution: maintain a hash table0 码力 | 69 页 | 630.01 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020proportion of the stream, e.g. 1/10th 7 search enginequery stream Example use-case: Web search user behavior study Q: How many queries did users repeat last month? ??? Vasiliki fraction of users by hashing usernames to b buckets and selecting the query if h(user) < a. For example, to get a 30% sample: • use 10 buckets, b0, b1, …, b9. • select the query if the user hash value fraction of users by hashing usernames to b buckets and selecting the query if h(user) < a. For example, to get a 30% sample: • use 10 buckets, b0, b1, …, b9. • select the query if the user hash value 0 码力 | 74 页 | 1.06 MB | 1 年前3
共 17 条
- 1
- 2













