State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations Vasiliki Kalavri | Boston University 2020 • No explicit in Apache Flink - Tzu-Li (Gordon) Tai: https://www.youtube.com/watch?v=euFMWFDThiE • Webinar: Deep Dive on Apache Flink State - Seth Wiesman: https://www.youtube.com/watch?v=9GF8Hwqzwnk Further
24 pages | 914.13 KB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations Vasiliki Kalavri | Boston University 2020 Logic State Distributed
49 pages | 2.08 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Batch API Historic data Kafka, RabbitMQ, ... HDFS, JDBC, ... Event logs ETL, Graphs, Machine Learning Relational, ... Low latency, windowing, aggregations, ... Vasiliki Kalavri | Boston University
26 pages | 3.33 MB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
user queries approximate results Vasiliki Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let this series? $\mathrm{var} = \frac{\sum (x_i - \mu)^2}{N}$ observations
74 pages | 1.06 MB | 1 year ago
PyFlink 1.15 Documentation
workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries such as Pandas
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries such as Pandas
36 pages | 266.80 KB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
at any point in the stream. It is the most general model Hard to develop space-efficient and time-efficient algorithms Vasiliki Kalavri | Boston University 2020 Relational Streaming Model
45 pages | 1.22 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
setMaxParallelism(1024) Setting the max parallelism Vasiliki Kalavri | Boston University 2020 • A Deep Dive into Rescalable State in Apache Flink: https://flink.apache.org/features/2017/07/04/flink-rescalable-state
41 pages | 4.09 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Distributed execution in Flink Vasiliki Kalavri | Boston University 2020 Identify the most efficient way to execute a query • There may exist several ways to execute a computation • query plans plan B output Lowest-cost plan • What does efficient mean in the context of streaming? • queries run continuously • streams are unbounded • In
54 pages | 2.83 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
working set. If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search while MBs only offer topic-based subscription. • DB query results depend on a snapshot
33 pages | 700.14 KB | 1 year ago