Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | 1/21: Introduction. Vasiliki Kalavri, Boston University 2020. Course Information • Instructor: Vasiliki Kalavri • Office: MCS 206 • Contact: vkalavri@bu.edu • Course Time … Assignment #3 contributes 20%. Grading Scheme (2): Final Project (50%) • A real-time monitoring and anomaly detection framework • To be implemented individually. Deliverables: Assignment 1 available 1/30, due 2/12; Assignment 2 available 2/13, due 2/26; Assignment 3 available 3/3, due 3/16; Final Project available 3/17, due 4/30. 2/18: No class, self-study. 2/25: Last day to drop classes (without a ‘W’ grade). 4/3: … | 34 pages | 2.53 MB | 1 year ago
PyFlink 1.15 Documentation | … isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory tree which contains its own Python executable files and the installed Python packages. … It should be noted that Types.BIG_INT() represents type information for the Java BigInteger, while Types.LONG() represents type information for a Java long integer. Several users are using Types.BIG_INT() … | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation | … isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory tree which contains its own Python executable files and the installed Python packages. … It should be noted that Types.BIG_INT() represents type information for the Java BigInteger, while Types.LONG() represents type information for a Java long integer. Several users are using Types.BIG_INT() … | 36 pages | 266.80 KB | 1 year ago
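The Types.BIG_INT() versus Types.LONG() distinction matters because a Java long is a 64-bit signed integer, while Python ints (like Java's BigInteger) are unbounded. The following is a minimal plain-Python sketch, assuming the goal is to check whether a value fits Types.LONG() before falling back to Types.BIG_INT(). The helper `pick_flink_int_type` is hypothetical; it returns a type name as a string and does not call PyFlink.

```python
# Bounds of a Java long (64-bit two's-complement signed integer).
JAVA_LONG_MIN = -(2 ** 63)
JAVA_LONG_MAX = 2 ** 63 - 1


def pick_flink_int_type(value: int) -> str:
    """Hypothetical helper: name the PyFlink type that can hold `value`.

    Returns "LONG" when the value fits in a Java long; otherwise returns
    "BIG_INT" (Java BigInteger, which has no fixed bound).
    """
    if JAVA_LONG_MIN <= value <= JAVA_LONG_MAX:
        return "LONG"
    return "BIG_INT"


print(pick_flink_int_type(42))
print(pick_flink_int_type(2 ** 63))
```

For ordinary counters and IDs, Types.LONG() is usually the intended choice. Types.BIG_INT() is only needed when values can exceed the 64-bit range.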
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … Post-failure output is identical to no-failure output • Rollback recovery (at-least-once): it avoids information loss, but the output may contain duplicates; a backup needs to rebuild the state of the failed node • … recovery (at-most-once): it drops data during failure; the backup starts from the most recent information • Recovery semantics: Given a dataflow Q, let Oe be … | 49 pages | 2.08 MB | 1 year ago
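The three recovery guarantees can be contrasted with a toy simulation. This is a minimal sketch, assuming an operator whose output equals its input sequence, deterministic replay, and no operator state. The function names and the `fail_at` / `checkpoint` / `resume_at` positions are hypothetical, and the model deliberately ignores timing and state rebuilding.

```python
from typing import List


def at_least_once_recovery(events: List[int], fail_at: int, checkpoint: int) -> List[int]:
    """Rollback recovery: the backup replays from the last checkpoint.

    Outputs emitted between the checkpoint and the failure appear twice.
    """
    assert 0 <= checkpoint <= fail_at <= len(events)
    return events[:fail_at] + events[checkpoint:]


def exactly_once_recovery(events: List[int], fail_at: int, checkpoint: int) -> List[int]:
    """Replay from the checkpoint, but suppress outputs already emitted,
    so the post-failure output is identical to the no-failure output."""
    assert 0 <= checkpoint <= fail_at <= len(events)
    replay = events[checkpoint:]
    already_emitted = fail_at - checkpoint
    return events[:fail_at] + replay[already_emitted:]


def at_most_once_recovery(events: List[int], fail_at: int, resume_at: int) -> List[int]:
    """The backup starts from the most recent input; events that arrived
    while the node was down (positions fail_at..resume_at-1) are lost."""
    assert 0 <= fail_at <= resume_at <= len(events)
    return events[:fail_at] + events[resume_at:]


events = [1, 2, 3, 4, 5, 6]
print(exactly_once_recovery(events, fail_at=4, checkpoint=2))
print(at_least_once_recovery(events, fail_at=4, checkpoint=2))
print(at_most_once_recovery(events, fail_at=4, resume_at=5))
```

The trade-off shows up directly in the lists. Rollback never loses input but repeats events 3 and 4. The at-most-once variant never repeats but silently skips event 5.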
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … watermark in each passing record, e.g. if the stream contains special records that encode watermark information. val env = StreamExecutionEnvironment.getExecutionEnvironment; env.setStreamTimeCharacteristi… | 22 pages | 2.22 MB | 1 year ago
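Deriving watermarks from special records in the stream (punctuated watermarks, as opposed to periodic ones) can be sketched in plain Python. This is a minimal illustration, not Flink's API. It assumes the special records are modeled by a hypothetical `WatermarkMarker` type and that watermarks must never move backwards.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple, Union


@dataclass
class Event:
    timestamp: int  # event time, e.g. milliseconds
    payload: str


@dataclass
class WatermarkMarker:
    timestamp: int  # hypothetical special record carrying watermark information


Record = Union[Event, WatermarkMarker]


def punctuated_watermarks(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Yield ("event", ts) for data records and ("watermark", ts) whenever a
    marker advances the current watermark; stale markers are ignored so the
    watermark stays monotonic."""
    current: Optional[int] = None
    for rec in records:
        if isinstance(rec, WatermarkMarker):
            if current is None or rec.timestamp > current:
                current = rec.timestamp
                yield ("watermark", current)
        else:
            yield ("event", rec.timestamp)


stream = [Event(1, "a"), Event(3, "b"), WatermarkMarker(2),
          Event(4, "c"), WatermarkMarker(1), WatermarkMarker(4)]
for item in punctuated_watermarks(stream):
    print(item)
```

Event 3 arrives before watermark 2 is emitted, which is allowed. The marker with timestamp 1 is dropped because emitting it would move event-time progress backwards.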
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | [Figure: publisher and subscriber interactions via publish(), notify(), subscribe(), unsubscribe()] advertise(): information regarding future events. Publish/Subscribe Systems … Pub/Sub levels of decoupling • Space: interacting … | 33 pages | 700.14 KB | 1 year ago
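The subscribe / unsubscribe / publish / notify interactions can be sketched as a small topic-based broker. This is a minimal in-process sketch under stated assumptions. The `Broker` class is hypothetical, delivery is synchronous, and `advertise()` is not modeled. Because delivery is synchronous, the sketch shows space decoupling (publishers and subscribers only share a topic name) but not time decoupling.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class Broker:
    """Hypothetical topic-based broker: publishers and subscribers know only
    topic names, never each other (space decoupling)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subs[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callback) -> None:
        if callback in self._subs[topic]:
            self._subs[topic].remove(callback)

    def publish(self, topic: str, event: str) -> int:
        """Notify every current subscriber of `topic`; return how many were notified."""
        subscribers = list(self._subs[topic])
        for notify in subscribers:
            notify(topic, event)
        return len(subscribers)


received = []
broker = Broker()
on_traffic = lambda topic, event: received.append((topic, event))
broker.subscribe("traffic", on_traffic)
broker.publish("traffic", "accident on I-90")
broker.unsubscribe("traffic", on_traffic)
broker.publish("traffic", "cleared")  # no subscribers left, nothing delivered
print(received)
```
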
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … holistic aggregates • Compute on most recent events only: when providing real-time traffic information, you probably don't care about an accident that happened 2 hours ago • Recent might mean different … | 35 pages | 444.84 KB | 1 year ago
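Restricting computation to the most recent events is commonly done with a time-based window that evicts old entries. This is a minimal sketch, assuming timestamps in minutes, in-order arrival, and a 120-minute window to match the 2-hour traffic example. The class `RecentEventsWindow` is hypothetical and does not reflect any particular engine's window implementation.

```python
from collections import deque
from typing import Deque, List, Tuple


class RecentEventsWindow:
    """Keeps events whose timestamp lies in (newest - size, newest].
    Assumes events arrive in timestamp order."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.events: Deque[Tuple[int, str]] = deque()

    def add(self, timestamp: int, event: str) -> None:
        self.events.append((timestamp, event))
        # Evict events that are too old relative to the newest timestamp.
        while self.events and self.events[0][0] <= timestamp - self.size:
            self.events.popleft()

    def contents(self) -> List[str]:
        return [event for _, event in self.events]


window = RecentEventsWindow(size=120)  # 2 hours, in minutes
window.add(0, "accident A")
window.add(60, "jam B")
window.add(130, "accident C")  # "accident A" is now more than 2 hours old
print(window.contents())
```

A deque makes eviction O(1) per expired event. Handling out-of-order input would require buffering by timestamp and evicting based on watermarks rather than the newest arrival.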
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … • The load shedder continuously monitors input rates or other system metrics and can access information about the running query plan • It detects overload and decides what actions to take in order … | 43 pages | 2.42 MB | 1 year ago
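The core decision a load shedder makes is to compare the monitored input rate against processing capacity and, when overloaded, drop a proportional fraction of tuples. This is a minimal sketch, assuming rates are already measured in tuples per second and that uniform, semantics-agnostic shedding is acceptable. The function `shed_load` and its parameters are hypothetical. A real shedder may instead use the query plan to decide where and what to drop.

```python
from typing import Iterable, List


def shed_load(tuples: Iterable[int], input_rate: float, capacity: float) -> List[int]:
    """Keep roughly capacity / input_rate of the tuples when overloaded.

    Tuples are dropped evenly using a credit counter (deterministic,
    unlike random shedding). No tuples are dropped if there is no overload.
    """
    if input_rate <= capacity:
        return list(tuples)  # no overload detected
    keep_ratio = capacity / input_rate
    kept: List[int] = []
    credit = 0.0
    for t in tuples:
        credit += keep_ratio
        if credit >= 1.0:  # enough credit accumulated to admit one tuple
            credit -= 1.0
            kept.append(t)
    return kept


print(shed_load(range(10), input_rate=1000.0, capacity=500.0))
print(shed_load(range(4), input_rate=100.0, capacity=500.0))
```
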
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … optimization • Stratis Viglas and Jeffrey Naughton. Rate-based Query Optimization for Streaming Information Sources. SIGMOD 2002. Further reading … | 54 pages | 2.83 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … Lecture references • Gianpaolo Cugola and Alessandro Margara. Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44, 3, Article 15 (June 2012) … | 53 pages | 532.37 KB | 1 year ago
11 results in total













