Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 4/16: Skew mitigation ... Key partitioning ... same worker • Popular keys cause imbalance ... Addressing skew • To address skew, the system needs to track the frequencies of the partitioning ... frequencies, we only need to track the heavy hitters ... Lossy Counting • Find all items x in a data stream such that: • freq(x) > δ*N, where N is the ...
0 credits | 31 pages | 1.47 MB | 1 year ago

State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 2/25: State Management ... Logic, State ... • machine learning models ... State in dataflow computations ... • No explicit state primitives • Users define state using arbitrary types • The system is unaware ... What are the advantages and disadvantages of each approach? ... • Copy, checkpoint, restore, merge, split, query, subscribe, … State operations and types ... Consider ...
0 credits | 24 pages | 914.13 KB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 4/14: Stream processing optimizations ... • Costs of streaming operator execution • state, parallelism, selectivity • Dataflow optimizations • plan translation alternatives • Runtime optimizations ... Revisiting the basics: source, sink, input port, output port, dataflow graph ... Revisiting the basics: Dataflow ...
0 credits | 54 pages | 2.83 MB | 1 year ago

Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 2/11: Windows and Triggers ... • Practical ... last 1h every 10 min • last user session ... Window operators ... object MaxSensorReadings { def main(args: Array[String]) { val env = StreamExecutionEnvironment ... .max("temp") } } ... Example: Window sensor readings ... In the DataStream API, you can use the time characteristic to tell Flink how to define time when ...
0 credits | 35 pages | 444.84 KB | 1 year ago

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 1/21: Introduction ... Course Information ... 9:30-10:45, MCS B33 • Office Hours: Tue, Thu 11:00-12:30, MCS 206 ... Announcements, updates, discussions • Website: vasia.github.io/dspa20 • Syllabus: /syllabus.html ... com/bu/spring2020/cs591k1/home • For questions & discussions • Blackboard: learn.bu.edu/... • For quizzes, assignment announcements & submissions ... What is ...
0 credits | 34 pages | 2.53 MB | 1 year ago

Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 2/06: Notions of time and progress ... Mobile game application • input stream: user activity • output: rewards based on how fast the user meets goals • e.g. pop 500 bubbles within ... What’s the meaning of one minute? ... • Processing time ...
0 credits | 22 pages | 2.22 MB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 1/23: Stream Processing Fundamentals ... What ... to process stream elements on-the-fly using limited memory ... Properties of data streams • They arrive continuously instead of being available a-priori. • They ... length, i.e. the DSMS does not know when the stream ends. ... DW, DBMS, SDW, DSMS ... Database Management System • ad-hoc queries, data manipulation tasks • insertions ...
0 credits | 45 pages | 1.22 MB | 1 year ago

Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 4/23: Cardinality and frequency estimation ... Counting distinct elements ... How can we count the number of distinct elements seen so far in a stream? Example use-case: Distinct users visiting one or multiple webpages ...
0 credits | 69 pages | 630.01 KB | 1 year ago

Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 4/28: Graph Streaming ... Modeling networks: London, Zurich, Berlin; Transportation networks ... friend, follows; London, Zurich, Berlin; “conservative”, “liberal” ... If you like “Inside job” you ... Basics: “node” or “vertex”, “edge”; undirected graph, directed graph ... Graph streams ...
0 credits | 72 pages | 7.77 MB | 1 year ago

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri | Boston University 2020 | vkalavri@bu.edu | 4/21: Sampling and filtering streams ... Synopses for massive data streams • Maintaining synopses is often the only means of providing interactive response times when exploring massive datasets or high ... maintenance component, user queries, approximate results ... A simple and efficient synopsis ... Suppose that our data consists of a large numeric time series.
0 credits | 74 pages | 1.06 MB | 1 year ago

19 results in total













