Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 20206 Vasiliki Kalavri | Boston University 2020 1. Process events online without storing them 2. Support a high-level language (e.g. StreamSQL) 3. Handle missing, out-of-order, delayed data 4. Guarantee Combine batch (historical) and stream processing 6. Ensure availability despite failures 7. Support distribution and automatic elasticity 8. Offer low-latency 7 2005 Vasiliki Kalavri | Boston University 2020 A stream can be viewed as a massive, dynamic, one-dimensional vector A[1…N]. The size N of the streaming vector is defined as the product of the attribute domain size(s). Note that N might0 码力 | 45 页 | 1.22 MB | 1 年前3
PyFlink 1.15 Documentationstreaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries such 1.1.1.2 Local This page shows you how to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It ExecNodeBase. ˓→translateToPlan(ExecNodeBase.java:134) This is an issue around Java 17. It still doesn’t support Java 17 in Flink. You can refer to FLINK-15736 for more details. To solve this issue, you need to0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationstreaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries such 1.1.1.2 Local This page shows you how to set up PyFlink development environment in your local machine. This is usually used for local execution or development in an IDE. Set up Python environment It ExecNodeBase. ˓→translateToPlan(ExecNodeBase.java:134) This is an issue around Java 17. It still doesn’t support Java 17 in Flink. You can refer to FLINK-15736 for more details. To solve this issue, you need to0 码力 | 36 页 | 266.80 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 2 Vasiliki Kalavri | Boston University 2020 • operations and types 4 Consider you are designing a state interface. What operations should state support? What state types can you think of? • Count, sum, list, map, … Vasiliki Kalavri | Boston University0 码力 | 24 页 | 914.13 KB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic streaming computation maintains state: • rolling aggregations • window contents • input offsets • machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 40 码力 | 49 页 | 2.08 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020Download and play around with “part-00000-of-00500.csv” of: • job events • task events • machine events 13 Vasiliki Kalavri | Boston University 2020 Software requirements • All assignments assume Windows user, you are advised to use Windows subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To develop Flink applications and0 码力 | 34 页 | 2.53 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020Streaming & Batch API Historic data Kafka, RabbitMQ, ... HDFS, JDBC, ... Event logs ETL, Graphs, Machine Learning Relational, … Low latency, windowing, aggregations, ... 2 Vasiliki Kalavri | Boston This value is typically proportional to the number of physical CPU cores that the TaskManager's machine has (e.g., equal to the number of cores, or half the number of cores). 18 Vasiliki Kalavri | Boston0 码力 | 26 页 | 3.33 MB | 1 年前3
Streaming in Apache Flink11M) ... 1> (50797,12M) Stateful Transformations • local: Flink state is kept local to the machine that processes it • durable: Flink state is automatically checkpointed and restored • vertically0 码力 | 45 页 | 3.00 MB | 1 年前3
监控Apache Flink应用程序(入门)gather insights about system resources, i.e. memory, CPU & network-related metrics for the whole machine as opposed to the Flink processes alone. System resource monitoring is disabled by default and requires0 码力 | 23 页 | 148.62 KB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020• MBs assume a small working set. If consumers are slow, throughput might degrade. • DBs support secondary indexes for efficient search while MBs only offer topic-based subscription. • DB query0 码力 | 33 页 | 700.14 KB | 1 年前3
共 13 条
- 1
- 2













