Scalable Stream Processing: Spark Streaming and Flink (lecture slides, 113 pages, 1.22 MB)
Course web page: https://id2221kth.github.io
Excerpt: Stream processing system design issues: continuous vs. micro-batch processing; record-at-a-time vs. declarative APIs. Outline: Spark Streaming, Flink. Spark Streaming provides unified real-time stream and batch processing over both unbounded and bounded data.
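To make the "micro-batch vs. continuous" and "declarative vs. record-at-a-time" design points concrete, here is a minimal sketch using Spark Structured Streaming's Java API: a declarative word count that the engine executes as a series of micro-batches. The socket source, host/port, local master, and output mode are illustrative assumptions, not taken from the slides.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.split;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("StreamingWordCount")
                .master("local[*]")              // assumption: run locally for the demo
                .getOrCreate();

        // unbounded input: lines of text from a socket (hypothetical host/port)
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load();

        // declarative query: the same DataFrame operations used for batch data
        Dataset<Row> counts = lines
                .select(explode(split(lines.col("value"), " ")).alias("word"))
                .groupBy("word")
                .count();

        // the engine runs this continuously as a sequence of micro-batches
        StreamingQuery query = counts.writeStream()
                .outputMode("complete")
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```

The query is written once, declaratively; the decision to execute it in micro-batches (or, in newer Spark releases, in continuous mode) is left to the engine rather than to per-record user code.
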
Course introduction - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (lecture slides, 34 pages, 2.53 MB)
Excerpt: What is this course about? The design and architecture of modern distributed streaming systems; fundamentals for representing, summarizing, and analyzing data streams. Keywords: systems, algorithms, architecture and design, scheduling, …
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (lecture slides, 45 pages, 1.22 MB)
Excerpt: …representation and denotations if left to the application developer/user.
• The programmer needs to design and maintain appropriate state synopses.
• In order to parallelize operations, events must have …
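The two bullets above (hand-maintained state synopses, and keys as the prerequisite for parallelizing operators) can be illustrated with a small sketch in Flink's Java DataStream API. The sensor-reading tuples and names below are hypothetical, purely for illustration.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedSynopsisExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // hypothetical input: (sensorId, reading) pairs
        DataStream<Tuple2<String, Double>> readings = env.fromElements(
                        Tuple2.of("sensor-1", 21.3),
                        Tuple2.of("sensor-2", 19.8),
                        Tuple2.of("sensor-1", 22.1))
                .returns(Types.TUPLE(Types.STRING, Types.DOUBLE)); // make the tuple type explicit

        // each event carries a key (the sensor id), so the per-key synopsis
        // can be partitioned across parallel operator instances
        readings
                .keyBy(r -> r.f0)   // partition the stream by key
                .maxBy(1)           // a trivial per-key synopsis: the largest reading seen so far
                .print();

        env.execute("keyed synopsis example");
    }
}
```

Because every event carries a key, the runtime can split the keyed state of maxBy across parallel instances; without a key, the operator would have to run with parallelism one.
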
PyFlink 1.15 Documentation (36 pages, 266.77 KB)
Excerpts:
• Building the docs (excerpt starts mid-list): install pandoc with conda install pandoc; 3. build the docs with python3 setup.py build_sphinx; 4. open pyflink-docs/build/sphinx/html/index.html in the browser.
• Getting Started: this page summarizes … found on the corresponding connector page in the official Flink documentation. For example, you can open the Kafka connector page and search for the keyword "SQL Client JAR", which is a fat JAR of the Kafka connector.
• Troubleshooting (truncated stack trace): "unable to open JDBC writer at org.apache.flink.connector.jdbc.internal.JdbcOutputFormat.open(JdbcOutputFormat.java:145) at org.apache.flink.connector.jdbc.internal.GenericJdbcSinkFunction.open(Gen…"
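The point about connector JARs can be illustrated with a small Table API sketch (written in Java here rather than PyFlink, to keep the examples in this document in one language). It assumes the Kafka "SQL Client JAR" (flink-sql-connector-kafka) mentioned in the excerpt is on the classpath; the topic name, broker address, and schema are hypothetical.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // the 'kafka' connector below resolves only if the Kafka SQL connector
        // fat JAR (the "SQL Client JAR" from the docs) is on the classpath
        tEnv.executeSql(
                "CREATE TABLE events (" +
                "  user_id STRING," +
                "  action  STRING," +
                "  ts      TIMESTAMP(3)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'events'," +                          // hypothetical topic
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json'" +
                ")");

        // a simple continuous query over the Kafka-backed table
        tEnv.executeSql("SELECT user_id, COUNT(*) AS cnt FROM events GROUP BY user_id").print();
    }
}
```

If the connector JAR is missing, the CREATE TABLE statement fails at job submission with a "could not find any factory for identifier 'kafka'" style error, which is the class of problem the documentation excerpt is pointing at.
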
PyFlink 1.16 Documentation (36 pages, 266.80 KB)
Excerpts: identical to the PyFlink 1.15 entry above (doc build steps, the connector "SQL Client JAR" note, and the truncated "unable to open JDBC writer" stack trace).
Streaming in Apache Flink (lecture slides, 45 pages, 3.00 MB)
Excerpts:
• Flink state is redistributed as the cluster grows and shrinks, and is queryable via a REST API.
• Rich functions provide open(Configuration c), close(), and getRuntimeContext().
• Code fragments (generic type parameters were lost in extraction): a rich function declares a private ValueState field named averageState and creates a ValueStateDescriptor in open(Configuration conf); a RichCoFlatMapFunction declares a private ValueState field named blocked and initializes it in open() via getRuntimeContext().getState(new ValueStateDescriptor<>("blocked", …)).
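The fragments above are truncated, so here is a runnable sketch of the same pattern: a rich function that declares keyed state handles and obtains them from the runtime context in open(). The running-average task, field names, and tuple types are assumptions chosen to echo the averageState fragment, not the slides' exact code.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Emits a running average of the values seen so far for each key.
// Input: (key, value)   Output: (key, running average)
public class RunningAverage
        extends RichFlatMapFunction<Tuple2<String, Double>, Tuple2<String, Double>> {

    // handles to keyed state: event count and sum of values seen so far
    private transient ValueState<Long> countState;
    private transient ValueState<Double> sumState;

    @Override
    public void open(Configuration conf) {
        // state descriptors give each piece of state a name and a type
        countState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Types.LONG));
        sumState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("sum", Types.DOUBLE));
    }

    @Override
    public void flatMap(Tuple2<String, Double> in, Collector<Tuple2<String, Double>> out)
            throws Exception {
        long count = countState.value() == null ? 0L : countState.value();
        double sum = sumState.value() == null ? 0.0 : sumState.value();

        count += 1;
        sum += in.f1;

        countState.update(count);
        sumState.update(sum);

        out.collect(Tuple2.of(in.f0, sum / count));
    }
}
```

It would be applied after keying the stream, e.g. readings.keyBy(r -> r.f0).flatMap(new RunningAverage()), so that each parallel instance only sees (and stores state for) its share of the keys.
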
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (lecture slides by Vasiliki Kalavri, Boston University; 24 pages, 914.13 KB)
Excerpts (code fragments truncated by extraction):
• A Scala operator declares the state handle object, private var lastTempState: ValueState[Double] = _, and creates the state descriptor (val lastTempDescriptor = new Valu…) inside override def open(parameters: Configuration): Unit. The pattern: 1. declare the state handle in the operator (FlatMap) class; 2. assign a name and get the state handle in the open() method. (class TemperatureAlertFunction(val threshold: …)
• A Java fragment declares private ValueState fields rideState and fareState and initializes the state descriptors in open(Configuration config) via getRuntimeContext()…
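The rideState/fareState fragment suggests a two-input operator that buffers whichever event arrives first. Below is a hedged, self-contained sketch of that pattern with simple tuples standing in for the ride and fare types used in the lecture; the field layout and names are assumptions.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Joins a ride event (rideId, driver) with its fare event (rideId, amount);
// whichever arrives first is buffered in keyed state until its partner shows up.
public class EnrichRidesWithFares extends
        RichCoFlatMapFunction<Tuple2<Long, String>, Tuple2<Long, Double>, Tuple3<Long, String, Double>> {

    private ValueState<Tuple2<Long, String>> rideState;
    private ValueState<Tuple2<Long, Double>> fareState;

    @Override
    public void open(Configuration config) {
        // initialize the state descriptors here, in open()
        rideState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("saved ride", Types.TUPLE(Types.LONG, Types.STRING)));
        fareState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("saved fare", Types.TUPLE(Types.LONG, Types.DOUBLE)));
    }

    @Override
    public void flatMap1(Tuple2<Long, String> ride, Collector<Tuple3<Long, String, Double>> out)
            throws Exception {
        Tuple2<Long, Double> fare = fareState.value();
        if (fare != null) {
            fareState.clear();
            out.collect(Tuple3.of(ride.f0, ride.f1, fare.f1));
        } else {
            rideState.update(ride);
        }
    }

    @Override
    public void flatMap2(Tuple2<Long, Double> fare, Collector<Tuple3<Long, String, Double>> out)
            throws Exception {
        Tuple2<Long, String> ride = rideState.value();
        if (ride != null) {
            rideState.clear();
            out.collect(Tuple3.of(ride.f0, ride.f1, fare.f1));
        } else {
            fareState.update(fare);
        }
    }
}
```

Usage would look like rides.keyBy(r -> r.f0).connect(fares.keyBy(f -> f.f0)).flatMap(new EnrichRidesWithFares()); both inputs must be keyed the same way so that matching events land on the same parallel instance and see the same state.
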
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (lecture slides by Vasiliki Kalavri, Boston University; 26 pages, 3.33 MB)
Excerpt: Apache Flink is an open-source, distributed data analysis framework with true streaming at its core and streaming & batch APIs…
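A typical way to connect the two systems is to read a Kafka topic into a Flink DataStream. The sketch below uses the KafkaSource builder from recent Flink releases (the 2020 lecture may well have used the older FlinkKafkaConsumer); the broker address, topic, and group id are hypothetical.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToFlink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // hypothetical broker address, topic, and consumer group
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("events")
                .setGroupId("flink-demo")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");

        // trivial processing: print each record as it arrives
        events.print();

        env.execute("Kafka to Flink");
    }
}
```
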