Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020vkalavri@bu.edu CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki Kalavri | Boston University 2020 Mobile game application • input stream: Vasiliki Kalavri | Boston University 2020 • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while the train Event time • the time when an event actually happened • an event-time window would give you the extra life • results are deterministic and independent of the processing speed Notions of time 5 Vasiliki0 码力 | 22 页 | 2.22 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020Conditions might change • State is accumulated over time 2 events/s time rate decrease events/s time throughput degradation events/s time rate increase : input rate : throughput ??? Vasiliki Kalavri | Boston University 2020 Scaling approaches Metrics • service time and waiting time per tuple and per task • total time spent processing a tuple and all its derived results • CPU utilization one operator at a time • Predictive: at-once for all operators 8 ??? Vasiliki Kalavri | Boston University 2020 Queuing theory models 9 • Metrics • service time and waiting time per tuple and per0 码力 | 93 页 | 2.42 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020receive an event 2. store in local buffer and possibly update state 3. produce output What can go wrong: • lost events • duplicate or lost state updates • wrong result 5 mi mo Was mi fully • The state consists of • input queues • operator state • output queues • Short recovery time • High runtime overhead • The checkpoint interval determines the trade-off 14 Ni primary • The state consists of • input queues • operator state • output queues • Short recovery time • High runtime overhead • The checkpoint interval determines the trade-off 14 Ni primary0 码力 | 49 页 | 2.08 MB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020Some slides in this lecture have been generously provided by Paris Carbone, KTHGo read his PhD thesis: http://kth.diva-portal.org/smash/get/diva2:1240814/FULLTEXT01.pdf 2 ??? Kalavri | Boston University 2020 On receiving a marker (I) A process receiving a marker for the first time: 1. Records its own state. 2. Marks the channel that the marker came in on as empty. a. Future cpConfig.setMinPauseBetweenCheckpoints(30000); // allow three checkpoints to be in progress at the same time cpConfig.setMaxConcurrentCheckpoints(3); // checkpoints have to complete within five minutes, or 0 码力 | 81 页 | 13.18 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020• e.g. joins, holistic aggregates • Compute on most recent events only • when providing real-time traffic information, you probably don't care about an accident that happened 2 hours ago • Recent val maxTemp = sensorData .map(r => Reading(r.id,r.time,(r.temp-32)*(5.0/9.0))) .keyBy(_.id) .timeWindow(Time.minutes(1)) .max("temp") } } 3 Example: Window sensor can use the time characteristic to tell Flink how to define time when you are creating windows. The time characteristic is a property of the StreamExecutionEnvironment: Configuring a time characteristic0 码力 | 35 页 | 444.84 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flink79 Stream Processing Systems Design Issues ▶ Continuous vs. micro-batch processing ▶ Record-at-a-Time vs. declarative APIs 3 / 79 Outline ▶ Spark streaming ▶ Flink 4 / 79 Spark Streaming 5 / 79 79 Contribution ▶ Design issues • Continuous vs. micro-batch processing • Record-at-a-Time vs. declarative APIs 6 / 79 Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic entry point of all Spark Streaming functionality. ▶ The second parameter, Seconds(1), represents the time interval at which streaming data will be divided into batches. val conf = new SparkConf().setAppName(appName)0 码力 | 113 页 | 1.22 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series? 3 var = ∑ University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series? 3 • the sum University 2020 A simple and efficient synopsis Suppose that our data consists of a large numeric time series. What summary would let us compute the statistical variance of this series? 3 • the sum0 码力 | 74 页 | 1.06 MB | 1 年前3
Streaming in Apache FlinkFlink programs • Implement streaming data processing pipelines • Flink managed state • Event time Streaming in Apache Flink • Streams are natural • Events of any type like sensors, click streams processing as a subset of stream processing Processing Data Dataflows Let's Talk About Time • Processing Time • Event Time • Events may arrive out of order! What Can Be Streamed? • Anything (if you write events, FALSE for ride end events startTime DateTime the start time of a ride endTime DateTime the end time of a ride, ""1970-01-01 00:00" for start events startLon Float0 码力 | 45 页 | 3.00 MB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020“relation stream”) applied to relation R contains a stream elementwhenever tuple s is in R at time τ. 6 Vasiliki Kalavri | Boston University 2020 Imperative language: Aurora SQuAl Queries are represented attribute X > 0 enters the system and also an item of type B with Y = 10 is detected, followed (in a time interval of 5–15 s) by an item of type C with Z < 5. 8 Vasiliki Kalavri | Boston University 2020 satisfies a loop condition. • not commonly supported • a termination condition must be defined, e.g. time limit 12 Vasiliki Kalavri | Boston University 2020 timely::example(|scope| { let (handle0 码力 | 53 页 | 532.37 KB | 1 年前3
监控Apache Flink应用程序(入门)Flink作业的整个拓扑结构,并且事件和屏障不能相互超越。因此,一个成功的检查点显示没有通道是完全拥挤 的。 3.1 关键指标 指标 范围 描述 uptime job The time that the job has been running without interruption. fullRestarts job The total number of full caolei – 监控Apache Flink应用程序(入门) 监控 – 8 3.2 仪表盘示例 Figure 1: Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7) caolei – 监控Apache Flink应用程序(入门) 监控 – 9 Figure 当watermarks超过30时,结束于t = 30的事件时间窗口将被关闭并计算。 因此,您应该在应用程序中对事件时间敏感的operators(如流程函数和窗口)上监控watermarks。如果当前处理 时间与被称为 even-time skew的watermarks之间的差异非常高,那么它通常意味着可能会出现两种情况。首 先,它可能意味着您只是在处理旧的事件,例如在停机后的追赶期间,或者当您的工作无法继续,而事件正在 排队时。0 码力 | 23 页 | 148.62 KB | 1 年前3
共 23 条
- 1
- 2
- 3













