Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
stream S1 and stream S2
Vasiliki Kalavri | Boston University 2020
Operator types (II)
• Sequence Operators capture the arrival of an ordered set of events.
• common in pattern languages
• events stream is a sequence of unbounded length, where tuples are ordered by their arrival time.
Sequence: Let t1, …, tn be tuples from a relation R. The list S = [t1, …, tn] is called a sequence of length n. The empty sequence [ ] has length 0. We use t ∈ S to denote that, for some 1 ≤ i ≤ n, ti = t.
Model and formalization (II)
Pre-sequence (prefix): Let
0 credits | 53 pages | 532.37 KB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
we observe the entries of A by increasing index. This can model time-series data streams:
• a sequence of measurements from a temperature sensor
• the volume of NASDAQ stock trades over time
This
Stream denotation
An abstract interpretation of the stream as a mathematical structure, e.g. a sequence of (finite) relation states over a common schema R: [r1(R), r2(R), ..., ], where the individual
20K), (2, 5, 32K), (1, 2, 28K)}
Vasiliki Kalavri | Boston University 2020
Such a relation sequence could be represented in various ways:
• as the concatenation of serializations of the relations
0 credits | 45 pages | 1.22 MB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
Processing (DStream)
DStream (1/2)
▶ DStream: sequence of RDDs representing a stream of data.
DStream consistent checkpoints.
Summary
▶ Spark
• Mini-batch processing
• DStream: sequence of RDDs
• RDD and window operations
• Structured streaming
▶ Flink
• Unified batch and stream
0 credits | 113 pages | 1.22 MB | 1 year ago

Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
durably store all events in a sequential (possibly partitioned) log
• A log is an append-only sequence of records on disk
• a producer generates messages by simply appending them to the log and a
partitions
• Within each partition, every message carries an offset, a monotonically increasing sequence number
• Within a partition, all messages are totally ordered but there is no ordering guarantee
0 credits | 33 pages | 700.14 KB | 1 year ago

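The log model in the snippet above (append-only records, a per-partition offset, ordering only within a partition) can be illustrated with a minimal in-memory sketch. This is a toy written for illustration, not the API of Kafka or any other system; the class name `PartitionedLog` and its methods `append` and `read` are hypothetical.

```python
class PartitionedLog:
    """Toy in-memory append-only log split into partitions (illustrative only)."""

    def __init__(self, num_partitions):
        # One independent list of records per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition and return its offset."""
        records = self.partitions[partition]
        records.append(message)
        # The offset is the record's position in the partition, so it
        # increases monotonically with every append to that partition.
        return len(records) - 1

    def read(self, partition, offset):
        """Return all messages in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=2)
first = log.append(0, "a")
second = log.append(0, "b")
other = log.append(1, "c")
```

Offsets here are assigned independently per partition, so `first` and `other` carry the same offset even though they were appended at different times. That is why the sketch gives a total order within a partition but no order across partitions.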
PyFlink 1.15 Documentation
random_source (
    id TINYINT,
    data STRING
) WITH (
    'connector' = 'datagen',
    'fields.id.kind' = 'sequence',
    'fields.id.start' = '1',
    'fields.id.end' = '2',
    'fields.data.kind' = 'random'
)
""")
table = table_env.from_descriptor(
    TableDescriptor
    .for_connector('datagen')
    .option('fields.id.kind', 'sequence')
    .option('fields.id.start', '1')
    .option('fields.id.end', '2')
    .option('fields.data.kind', 'random')
0 credits | 36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
random_source (
    id TINYINT,
    data STRING
) WITH (
    'connector' = 'datagen',
    'fields.id.kind' = 'sequence',
    'fields.id.start' = '1',
    'fields.id.end' = '2',
    'fields.data.kind' = 'random'
)
""")
table = table_env.from_descriptor(
    TableDescriptor
    .for_connector('datagen')
    .option('fields.id.kind', 'sequence')
    .option('fields.id.start', '1')
    .option('fields.id.end', '2')
    .option('fields.data.kind', 'random')
0 credits | 36 pages | 266.80 KB | 1 year ago

Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
each topic, the Kafka cluster maintains a partitioned log. Each partition is an ordered, immutable sequence of records that is continually appended to (a structured commit log). An offset is a sequential
0 credits | 26 pages | 3.33 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
deterministic: it produces the same output when starting from the same initial state and given the same sequence of input tuples
• convergent-capable: it can re-build internal state in a way that it eventually
0 credits | 49 pages | 2.08 MB | 1 year ago

Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Preliminaries
Vasiliki Kalavri | Boston University 2020
Some algorithms model graph streams as a sequence of vertex events. A vertex stream consists of events that contain a vertex and all of its neighbors
0 credits | 72 pages | 7.77 MB | 1 year ago
9 results in total