State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 520> <#WorldCup, 480> key of the current record so that all records with the same key access the same state State management in Apache Flink 5 Vasiliki Kalavri | Boston University 2020 Operator state Keyed state State state is stored, accessed, and maintained. State backends are responsible for: • local state management • checkpointing state to remote and persistent storage, e.g. a distributed filesystem or a database 0 码力 | 24 页 | 914.13 KB | 1 年前3
PyFlink 1.15 DocumentationIt’s supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv To create a virtual environment Submitting PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported to jobs to YARN cluster It supports to execute PyFlink jobs in application mode, per-job mode and session mode in YARN deployment. You could execute PyFlink jobs in application mode as following: ./bin/flink0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationIt’s supported to use Python virtual environment in your PyFlink jobs, see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv To create a virtual environment Submitting PyFlink jobs for more details. 1.1.1.4 YARN Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It’s supported to jobs to YARN cluster It supports to execute PyFlink jobs in application mode, per-job mode and session mode in YARN deployment. You could execute PyFlink jobs in application mode as following: ./bin/flink0 码力 | 36 页 | 266.80 KB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020optimizations • plan translation alternatives • Runtime optimizations • load management, scheduling, state management • Optimization semantics, correctness, profitability Topics covered in this lecture events? • clicks per user session? 49 t t+1 t+3 t+4 t+5 t+6 t+7 t+2 logged in logged out How would you compute… • the maximum every 100 events? • clicks per user session? • faster than the batch0 码力 | 54 页 | 2.83 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020might mean different things • last 5 sec • last 10 events • last 1h every 10 min • last user session Window operators 2 Vasiliki Kalavri | Boston University 2020 object MaxSensorReadings { def activity followed by a period of inactivity session gap key 3 key 2 key 1 Session windows 12 Vasiliki Kalavri | Boston University 2020 // event-time session windows assigner val sessionWindows = sensorData keyBy(_.id) // create event-time session windows with a 15 min gap .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .process(…) 13 Session window example Window assigner Window0 码力 | 35 页 | 444.84 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020DBMS SDW DSMS Database Management System • ad-hoc queries, data manipulation tasks • insertions, updates, deletions of single row or groups of rows Data Stream Management System • continuous materialized view updates • pre-aggregated, pre-processed streams and historical data Data Management Approaches 4 storage analytics static data streaming data Vasiliki Kalavri | Boston University data State limited, in-memory partitioned, virtually unlimited, persisted to backends Load management shedding backpressure, elasticity Fault tolerance limited support, high availability full support0 码力 | 45 页 | 1.22 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020State is scoped to a single task • Each stateful task is responsible for processing and state management 31 ??? Vasiliki Kalavri | Boston University 2020 Pause-and-restart state migration • State State is scoped to a single task • Each stateful task is responsible for processing and state management 31 block channels and upstream operators ??? Vasiliki Kalavri | Boston University 2020 Pause-and-restart State is scoped to a single task • Each stateful task is responsible for processing and state management 31 snapshot snapshot block channels and upstream operators buffer incoming records0 码力 | 93 页 | 2.42 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkbuilt-in timeouts • Think what would happen in our example, if the event signaling the end of the user session was lost, or had not arrived for some reason. 48 / 79 mapWithState Operation ▶ mapWithState is different types of windows: • Tumbling windows (no overlap) • Sliding windows (with overlap) • Session windows (punctuated by a gap of inactivity) 71 / 79 Watermark and Late Elements ▶ It is possible0 码力 | 113 页 | 1.22 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Algorithms Architecture and design Scheduling and load management Scalability and elasticity Fault-tolerance and guarantees State management Operator semantics Window optimizations Filtering experts with decades of hands-on experience in building and using distributed systems and data management platforms • Have fun! 10 Vasiliki Kalavri | Boston University 2020 Important dates Deliverable recommendations of products, articles, people 26 Vasiliki Kalavri | Boston University 2020 Online traffic management • Analysis of real-time vehicle locations to improve traffic conditions • Provide real-time scheduling0 码力 | 34 页 | 2.53 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020in a period during which a user was active 17 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (I) • Join operators merge two streams by matching elements satisfying a condition blocking and must be defined over a window 18 Vasiliki Kalavri | Boston University 2020 Flow Management Operators (II) • Duplicate/Copy Operator replicates a stream, commonly to be used as input to Article 15 (June 2012). • Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Data Stream Management: Processing High-Speed Data Streams. Springer-Verlag, Berlin, Heidelberg. • David Maier, Jin0 码力 | 53 页 | 532.37 KB | 1 年前3
共 15 条
- 1
- 2













