Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• stream-to-relation: define tables by selecting portions of a stream. • relation-to-stream: create streams through querying tables Declarative language: CQL 4 Vasiliki Kalavri | Boston University stream S1 and stream S2 11 Vasiliki Kalavri | Boston University 2020 Operator types (II) • Sequence Operators capture the arrival of an ordered set of events. • common in pattern languages • events }); t (t, l1) (t, (l1, l2)) Streaming Iteration Example Terminate after 100 iterations Create the feedback loop 13 Vasiliki Kalavri | Boston University 2020 Blocking vs. Non-Blocking operators
0 credits | 53 pages | 532.37 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Processing (DStream) 7 / 79 DStream (1/2) ▶ DStream: sequence of RDDs representing a stream of data. 8 / 79 DStream [consumer group id], [number of partitions]) 15 / 79 Input Operations - Custom Sources (1/3) ▶ To create a custom source: extend the Receiver class. ▶ Implement onStart() and onStop(). ▶ Call store(data) close() } } 33 / 79 Output Operations (4/4) ▶ A better solution is to use rdd.foreachPartition ▶ Create a single connection object and send all the records in a RDD partition using that connection. dstream
0 credits | 113 pages | 1.22 MB | 1 year ago
PyFlink 1.15 Documentation
check your Python version as following: 3 pyflink-docs, Release release-1.15 python3 --version Create a Python virtual environment Virtual environment gives you the ability to isolate the Python dependencies Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment and also useful when deploying a PyFlink job to production when there Management for more details. Create a virtual environment using virtualenv To create a virtual environment using virtualenv, run: python3 -m pip install virtualenv # Create Python virtual environment
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
check your Python version as following: 3 pyflink-docs, Release release-1.16 python3 --version Create a Python virtual environment Virtual environment gives you the ability to isolate the Python dependencies Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment and also useful when deploying a PyFlink job to production when there Management for more details. Create a virtual environment using virtualenv To create a virtual environment using virtualenv, run: python3 -m pip install virtualenv # Create Python virtual environment
0 credits | 36 pages | 266.80 KB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
deterministic: it produces the same output when starting from the same initial state and given the same sequence of input tuples • convergent-capable: it can re-build internal state in a way that it eventually even if the sender crashes • this technique guarantees at-least-once delivery RPC retries might create duplicates • RPCs can sometimes succeed even if they appear to have failed, i.e. a sender can only
0 credits | 49 pages | 2.08 MB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Preliminaries Vasiliki Kalavri | Boston University 2020 8 Some algorithms model graph streams as a sequence of vertex events. A vertex stream consists of events that contain a vertex and all of its neighbors partitioned in disjoint subsets • Single-pass computation: For each edge • if seen for the 1st time, create a component with ID the min of the vertex IDs • if in different components, merge them and update
0 credits | 72 pages | 7.77 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
we observe the entries of A by increasing index. This can model time-series data streams: • a sequence of measurements from a temperature sensor • the volume of NASDAQ stock trades over time This Stream denotation An abstract interpretation of the stream as a mathematical structure, e.g. a sequence of (finite) relation states over a common schema R: [r1(R), r2(R), ..., ], where the individual 20K), (2, 5, 32K), (1, 2, 28K)} 25 Vasiliki Kalavri | Boston University 2020 Such a relation sequence could be represented in various ways: • as the concatenation of serializations of the relations
0 credits | 45 pages | 1.22 MB | 1 year ago
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
durably store all events in a sequential (possibly partitioned) log • A log is an append-only sequence of records on disk • a producer generates messages by simply appending them to the log and a partitions • Within each partition, every message carries an offset, a monotonically increasing sequence number • Within a partition, all messages are totally ordered but there is no ordering guarantee
0 credits | 33 pages | 700.14 KB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
each topic, the Kafka cluster maintains a partitioned log. Each partition is an ordered, immutable sequence of records that is continually appended to—a structured commit log. An offset is a sequential
0 credits | 26 pages | 3.33 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
keyed windows are evaluated in parallel • Non-keyed windows are processed in a single thread To create a window operator, you need to specify two window components: • A window assigner determines how is assigned to it. Flink will never evaluate empty windows! Flink’s built-in window assigners create windows of type TimeWindow. : a time interval between the two timestamps, where start is inclusive // event-time sliding windows assigner val slidingAvgTemp = sensorData .keyBy(_.id) // create 1h event-time windows every 15 minutes .window(SlidingEventTimeWindows.of(Time.hours(1), Time
0 credits | 35 pages | 444.84 KB | 1 year ago
12 results in total













