State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri, vkalavri@bu.edu ... 2/25: State Management ... Vasiliki Kalavri | Boston University 2020 ... Logic, State <#Brexit, 520> <#WorldCup, 480> ... key of the current record so that all records with the same key access the same state ... State management in Apache Flink ... Operator state, Keyed state ... state is stored, accessed, and maintained. State backends are responsible for: • local state management • checkpointing state to remote and persistent storage, e.g. a distributed filesystem or a database
0 credits | 24 pages | 914.13 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... optimizations • plan translation alternatives • Runtime optimizations • load management, scheduling, state management • Optimization semantics, correctness, profitability ... Topics covered in this lecture ... hashing • indexing, pre-fetching • minimize disk access • scheduling ... Objectives • optimize resource utilization or minimize resources • decrease latency, increase throughput • minimize monetary ... Kalavri | Boston University 2020 ... Safety • Ensure resource kinds: all resources required by a fused operator should remain available. • Ensure resource amounts: the total amount of resources required by ...
0 credits | 54 pages | 2.83 MB | 1 year ago
PyFlink 1.15 Documentation
Python virtual environments are supported in PyFlink jobs; see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv: To create a virtual environment ... See Submitting PyFlink jobs for more details. 1.1.1.4 YARN: Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It's supported ... popular container-orchestration system for automating computer application deployment, scaling, and management. This page shows you how to set up a Python environment and execute PyFlink jobs in a Kubernetes ...
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
Python virtual environments are supported in PyFlink jobs; see PyFlink Dependency Management for more details. Create a virtual environment using virtualenv: To create a virtual environment ... See Submitting PyFlink jobs for more details. 1.1.1.4 YARN: Apache Hadoop YARN is a cluster resource management framework for managing the resources and scheduling jobs in a Hadoop cluster. It's supported ... popular container-orchestration system for automating computer application deployment, scaling, and management. This page shows you how to set up a Python environment and execute PyFlink jobs in a Kubernetes ...
0 credits | 36 pages | 266.80 KB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... producer (back-pressure, flow control) ... Vasiliki Kalavri | Boston University 2020 ... Load management approaches: (a) Load shedding (b) Back-pressure (c) Elasticity ... Selectively ... stabilize. • Requires a persistent input source. • Suitable for transient load increase. Scale resource allocation: • Addresses the case of increased load and additionally ensures no resources are ... Credit-based flow control • This classic networking technique turns out to be very useful for load management in modern, highly-parallel stream processors and is implemented in Apache Flink. • Each task ...
0 credits | 43 pages | 2.42 MB | 1 year ago
Apache Flink: Past, Present, and Future (Apache Flink的过去、现在和未来)
... Time Window ... In 2015, Alibaba began using Flink and has continued contributing to the community ... Redesigned distributed architecture: Client, Dispatcher, Job Manager, Task Manager, Resource Manager, Cluster Manager, Task Manager. 1. Submit job 2. Start job 3. Request slots 4. Allocate ... P_2 S_0 S_1 ... Order, Inventory, Payment, Shipping ... Flow-Control, Async Call, Auto Scale, State Management, Event Driven ... The future of Flink: offline, Real-time, Batch Processing, Continuous Processing & Streaming ...
0 credits | 33 pages | 3.36 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management • utilization, isolation • Automation • continuous monitoring • bottleneck detection
0 credits | 41 pages | 4.09 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
DBMS, SDW, DSMS ... Database Management System • ad-hoc queries, data manipulation tasks • insertions, updates, deletions of single row or groups of rows ... Data Stream Management System • continuous ... materialized view updates • pre-aggregated, pre-processed streams and historical data ... Data Management Approaches: storage, analytics, static data, streaming data ... Vasiliki Kalavri | Boston University ... State: limited, in-memory ... partitioned, virtually unlimited, persisted to backends ... Load management: shedding ... backpressure, elasticity ... Fault tolerance: limited support, high availability ... full support
0 credits | 45 pages | 1.22 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... State is scoped to a single task • Each stateful task is responsible for processing and state management ... Vasiliki Kalavri | Boston University 2020 ... Pause-and-restart state migration ... snapshot ... block channels and upstream operators ... buffer incoming records
0 credits | 93 pages | 2.42 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
... Algorithms, Architecture and design, Scheduling and load management, Scalability and elasticity, Fault-tolerance and guarantees, State management, Operator semantics, Window optimizations, Filtering ... experts with decades of hands-on experience in building and using distributed systems and data management platforms • Have fun! ... Vasiliki Kalavri | Boston University 2020 ... Important dates, Deliverable ... recommendations of products, articles, people ... Online traffic management • Analysis of real-time vehicle locations to improve traffic conditions • Provide real-time scheduling ...
0 credits | 34 pages | 2.53 MB | 1 year ago
16 results in total