Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems • be proficient in using Apache Flink and Kafka ... transfers of €1000 to a "fake account" until either you're out of money or the activity is detected. • Features to detect fraudulent activity like this: • The transaction amount. • The number of recent (e ... challenging? Vasiliki Kalavri | Boston University 2020 Using pseudocode (or the programming language of your choice), write a program that reads a stream of integers and computes: 1. the maximum ...
0 points | 34 pages | 2.53 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... type, content, timing constraints. • Actions define how to produce results from the matches. Language Types Vasiliki Kalavri | Boston University 2020 Three classes of operators: • relation-to-relation: ... portions of a stream. • relation-to-stream: create streams through querying tables Declarative language: CQL Select IStream(*) From S1 [Rows 5], S2 [Rows ... τ> whenever tuple s is in R at time τ. Imperative language: Aurora SQuAl. Queries are represented graphically using boxes and arrows ... Tumble
0 points | 53 pages | 532.37 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... beneficial? Vasiliki Kalavri | Boston University 2020 • Use equivalence transformation rules if the language allows • selection operations are commutative • theta-join operations are commutative • natural ... Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22 Referer: https://www.google.be/ Accept-Language: en-US,en;q=0.8 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 GET /dumprequest HTTP/1.1 Host: ...
0 points | 54 pages | 2.83 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
1. Process events online without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee deterministic (on ... [Figure: "append", showing new events and old events with relation states R(t1), R(t2), R(t3), R(tk)] Vasiliki Kalavri | Boston University 2020 • Base streams ... denotes a key and an arriving tuple t replaces any existing tuple with the same t(A) value to form a new relation state. • as a sliding window with length k in which each subsequence of k tuples represents ...
0 points | 45 pages | 1.22 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Boston University 2020 When the JobManager fails, all tasks are automatically cancelled. The new JobManager performs the following steps: 1. It requests the storage locations from ZooKeeper to ... scaling actions, predict their effects, and decide which and when to apply • Allocate new resources, spawn new processes or release unused resources, safely terminate processes • Adjust dataflow ...
0 points | 41 pages | 4.09 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... or database • Consumer periodically polls and retrieves new data • polling overhead, latency? • Consumer receives a notification when new data is available • how to implement triggers? • Direct ... attributes or meta-data. • Consumers subscribe to events by specifying filters in a subscription language. • Filters define constraints in the form of name-value pairs and basic comparison operators ... Distributing event notifications • a service that accepts user signups can send notifications whenever a new user registers, and downstream services can subscribe to receive notifications of the event. • ...
0 points | 33 pages | 700.14 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
... at which streaming data will be divided into batches. val conf = new SparkConf().setAppName(appName).setMaster(master) val ssc = new StreamingContext(conf, Seconds(1)) ▶ It can also be created from an existing SparkContext object. val sc = ... // existing SparkContext val ssc = new StreamingContext(sc, Seconds(1)) StreamingContext ▶ StreamingContext is the main entry point of all Spark ...
0 points | 113 pages | 1.22 MB | 1 year ago
Streaming in Apache Flink
... are either ◦ public, or ◦ have a default getter and setter Tuple2<String, Integer> person = new Tuple2<>("Fred", 35); // zero based index! String name = person.f0; Integer age = person.f1; ... {}; public Person(String name, Integer age) { … }; } Person person = new Person("Fred Flintstone", 35); Setup • https://training.ververica.com/devEnvSetup.html • Datasets: ... DataStream rides = env.addSource(new TaxiRideSource(...)); DataStream enrichedNYCRides = rides .filter(new RideCleansing.NYCFilter()) .map(new Enrichment()); enrichedNYCRides ...
0 points | 45 pages | 3.00 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
val env = StreamExecutionEnvironment.getExecutionEnvironment val sensorData = env.addSource(new SensorSource) val maxTemp = sensorData .map(r => Reading(r.id,r.time,(r.temp-32)*(5 ... .process(new TemperatureAverager) val avgTemp = sensorData .keyBy(_.id) // shortcut for .window(TumblingEventTimeWindows.of(size)) .timeWindow(Time.seconds(1)) .process(new TemperatureAverager) ... every 15 minutes .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(15))) .process(new TemperatureAverager) val slidingAvgTemp = sensorData .keyBy(_.id) // shortcut for window. ...
0 points | 35 pages | 444.84 KB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... a bitcoin transaction, a packet routed from a source to destination Vertex events: A new product, a new movie, a user Vasiliki Kalavri | Boston University 2020 ... stream: events indicate edge additions or deletions ... A t+1, the graph is obtained by inserting a new edge or deleting an existing edge (u, v) to E(t+1). If any of u, v do not already exist in V(t) ... step: For each vertex • choose the min of neighbors' component IDs and own component ID as the new ID • if the component ID changed since the last iteration, notify neighbors ...
0 points | 72 pages | 7.77 MB | 1 year ago













