Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020
... 20% ... Grading Scheme (2) Final Project (50%): • A real-time monitoring and anomaly detection framework • To be implemented individually Deliverables • ... Final Project: You will use Apache Flink and Kafka to build a real-time monitoring and anomaly detection framework for datacenters. Your framework will: • Detect “suspicious” ... processing important? By 2025, 30% of all data will be real-time data. By 2020, we will be able to store less than 15% of all data. ...
34 pages | 2.53 MB | 1 year ago
Apache Flink: Past, Present, and Future
... BY Name; Flink at Alibaba: cluster size: over 10,000 machines; state data: petabytes; events processed: 10 trillion per day; peak throughput: 1.7 billion per second. Flink's past: Offline | Real-time: Batch Processing, Continuous Processing & Streaming Analytics, Event-driven Applications ✔ ... Present: batch failure recovery (4), batch failure recovery (5), pluggable Shuffle Manager. Ecosystem: Flink Hive, Flink Zeppelin, Chinese-language community. Flink's present: Offline | Real-time: Batch Processing, Continuous Processing & Streaming Analytics, Event-driven Applications ✔ ✔ ... Shipping, Flow-Control, Async Call, Auto Scale, State Management, Event Driven. Flink's future: Offline | Real-time: Batch Processing, Continuous Processing & Streaming Analytics, Event-driven Applications ✔ ✔ ...
33 pages | 3.36 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... than being available in full before its processing begins. • Data streams are high-volume, real-time data that might be unbounded • we cannot store the entire stream in an accessible way • we have ... Michael Stonebraker, Uğur Çetintemel, and Stan Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec. 34, 4 (December 2005).
45 pages | 1.22 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... exchange data through an ATM network, each pair of endpoints first needs to establish a virtual circuit (VC) or connection. • CFC uses a credit system to signal the availability of buffer space from ...
43 pages | 2.42 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... input • e.g. joins, holistic aggregates • Compute on most recent events only • when providing real-time traffic information, you probably don't care about an accident that happened 2 hours ago • Recent ...
35 pages | 444.84 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... operator shares a lock with an upstream operator. • Satisfy deadlines: for applications with real-time constraints or QoS latency constraints. Batching: process multiple data elements in a single batch ...
54 pages | 2.83 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
... col("word")).count() ... Flink ▶ Distributed data flow processing system ▶ Unified real-time stream and batch processing ▶ Process unbounded and bounded data ▶ Design issues • Continuous ...
113 pages | 1.22 MB | 1 year ago
PyFlink 1.15 Documentation
... Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines ...
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
... Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines ...
36 pages | 266.80 KB | 1 year ago
9 results in total