Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Mechanism: How to apply the re-configuration? 3 • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once for all operators ??? Vasiliki processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once for all operators Too fine-grained0 码力 | 93 页 | 2.42 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020broker: a system that connects event producers with event consumers. • It receives messages from the producers and pushes them to the consumers. • A TCP connection is a simple messaging system which which connects one sender with one recipient. • A general messaging system connects multiple producers to multiple consumers by organizing messages into topics. 7 Message Broker producer producer allow the use of wildcards. 21 Content-based Pub/Sub • Events are grouped according to event properties or contents. • data attributes or meta-data. • Consumers subscribe to events by specifying0 码力 | 33 页 | 700.14 KB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020stream elements on-the-fly using limited memory 2 Vasiliki Kalavri | Boston University 2020 Properties of data streams • They arrive continuously instead of being available a-priori. • They bear DSMS Database Management System • ad-hoc queries, data manipulation tasks • insertions, updates, deletions of single row or groups of rows Data Stream Management System • continuous queries •0 码力 | 45 页 | 1.22 MB | 1 年前3
PyFlink 1.15 Documentation2 GB. If the size of an archive file is more than 2 GB, you could upload it to a distributed file system and then use the path in the command line option -pyarch. • Mix use of the above options You could pyflink-docs, Release release-1.15 1.1.1.5 Kubernetes Kubernetes is a popular container-orchestration system for automating computer application deployment, scaling, and management. This page shows you how DataStream Connectors. [8]: from pyflink.common import Encoder from pyflink.datastream.connectors.file_system import FileSink, RollingPolicy def split(s): splits = s[1].split('|') for sp in splits: (continues0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentation2 GB. If the size of an archive file is more than 2 GB, you could upload it to a distributed file system and then use the path in the command line option -pyarch. • Mix use of the above options You could pyflink-docs, Release release-1.16 1.1.1.5 Kubernetes Kubernetes is a popular container-orchestration system for automating computer application deployment, scaling, and management. This page shows you how DataStream Connectors. [8]: from pyflink.common import Encoder from pyflink.datastream.connectors.file_system import FileSink, RollingPolicy def split(s): splits = s[1].split('|') for sp in splits: (continues0 码力 | 36 页 | 266.80 KB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020no more delayed events will arrive. • Watermarks provide a logical clock which informs the system about the current event time. http://streamingbook.net/fig/2-9 Vasiliki Kalavri | Boston University with a timestamp T indicates that all subsequent records should have timestamps > T. Watermark properties 14 Vasiliki Kalavri | Boston University 2020 Watermarks are essential to both event-time windows0 码力 | 22 页 | 2.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020broadcast join • What does performance depend on? • input data, intermediate data • operator properties • How can we estimate the cost of different strategies? • before execution or during runtime0 码力 | 54 页 | 2.83 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize. • Requires a persistent process of discarding data when input rates increase beyond system capacity. • Load shedding techniques operate in a dynamic fashion: the system detects an overload situation during runtime and selectively streams with known arrival rates C: system processing capacity H: headroom factor, i.e. a conservative estimate of the percentage of resources required by the system at steady state Load(N(I)): the load0 码力 | 43 页 | 2.42 MB | 1 年前3
监控Apache Flink应用程序(入门)....................................................................................... 22 4.14 System Resources....................................................................................... is processed by Apache Flink, which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various reasons including the TaskManager (in case of a containerized setup), or by providing more TaskManagers. In general, a system already running under very high load during normal operations, will need much more time to catch-up0 码力 | 23 页 | 148.62 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020& reconfiguration ??? Vasiliki Kalavri | Boston University 2020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state 2 Checkpointing and all required metadata, such as the application’s JAR file, into a remote persistent storage system • Zookeeper also holds state handles and checkpoint locations 5 JobManager failures ??? Vasiliki Vasiliki Kalavri | Boston University 2020 12 • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions0 码力 | 41 页 | 4.09 MB | 1 年前3
共 20 条
- 1
- 2













