Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
To sample 1/10th of the stream, we select each stream element i with probability 10%. We can use a random generator that produces an integer ri between 0 and 9, and select input element i if ri = 0.
74 pages | 1.06 MB | 1 year ago
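The selection rule from the slide can be sketched in plain Python (the helper name and seed are illustrative, not from the course materials):

```python
import random

def sample_stream(stream, seed=42):
    """Keep each element with probability 1/10 by drawing an
    integer r_i in [0, 9] and selecting the element if r_i == 0."""
    rng = random.Random(seed)
    return [x for x in stream if rng.randint(0, 9) == 0]

sampled = sample_stream(range(100_000))
# With 100,000 elements, the sample size is close to 10,000.
print(len(sampled))
```

Each element is decided independently, so the sample size is only 10% in expectation, not exactly.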
PyFlink 1.15 Documentation
You can check your Python version as follows: python3 --version. A virtual environment gives you the ability to isolate the Python dependencies: the Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment, and also when deploying a PyFlink job to production. See Dependency Management for more details. To create a virtual environment using virtualenv, run:
python3 -m pip install virtualenv
# Create Python virtual environment
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
You can check your Python version as follows: python3 --version. A virtual environment gives you the ability to isolate the Python dependencies: the Python executable files and the installed Python packages. It is useful for local development to create a standalone Python environment, and also when deploying a PyFlink job to production. See Dependency Management for more details. To create a virtual environment using virtualenv, run:
python3 -m pip install virtualenv
# Create Python virtual environment
36 pages | 266.80 KB | 1 year ago
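The docs above use the third-party virtualenv tool; as a sketch under the same idea, Python's standard-library venv module can create an equivalent isolated environment programmatically (the directory name is illustrative, and pip is omitted for brevity — pass with_pip=True to include it):

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

# Create an isolated environment under a temporary directory
env_dir = Path(tempfile.mkdtemp()) / "pyflink-env"
venv.create(env_dir, with_pip=False)

# The environment has its own Python executable and site-packages,
# so packages installed there do not affect the system interpreter.
bin_dir = "Scripts" if sys.platform == "win32" else "bin"
env_python = env_dir / bin_dir / "python"
subprocess.run([str(env_python), "--version"], check=True)
```

Activating the environment (source pyflink-env/bin/activate) then makes its interpreter the default on the shell's PATH.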
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Keyed windows are evaluated in parallel; non-keyed windows are processed in a single thread. To create a window operator, you need to specify two window components: a window assigner, which determines how elements are assigned to windows, and a window function, which processes the elements of each window. Flink will never evaluate empty windows! Flink's built-in window assigners create windows of type TimeWindow: a time interval between two timestamps, where start is inclusive.
// event-time sliding windows assigner
val slidingAvgTemp = sensorData
  .keyBy(_.id)
  // create 1h event-time windows every 15 minutes
  .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(15)))
35 pages | 444.84 KB | 1 year ago
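A sliding assigner places each timestamp into size/slide overlapping windows. A minimal pure-Python sketch of that assignment logic (not Flink's implementation; function name is illustrative):

```python
def assign_sliding_windows(ts, size, slide):
    """Windows of length `size`, starting every `slide` time units,
    that contain timestamp ts (start inclusive, end exclusive)."""
    last_start = ts - (ts % slide)   # latest window starting at or before ts
    windows = []
    start = last_start
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows

# A 1-hour window sliding every 15 minutes (times in minutes): a record
# at minute 70 belongs to 4 overlapping windows.
print(assign_sliding_windows(70, size=60, slide=15))
# [(60, 120), (45, 105), (30, 90), (15, 75)]
```

With slide dividing size, every record lands in exactly size/slide windows, which is why sliding windows multiply state compared to tumbling ones.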
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)
CQL is a declarative language with two classes of operators: stream-to-relation operators define tables by selecting portions of a stream, and relation-to-stream operators create streams by querying tables. A streaming iteration example creates a feedback loop and terminates after 100 iterations. Blocking vs. non-blocking operators: non-blocking operators can produce output while tuples are continuously added. Example:
CREATE STREAM OpenAuction(
  itemID INT,
  sellerID CHAR(10),
  start_price REAL,
  start_time TIMESTAMP)
53 pages | 532.37 KB | 1 year ago
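CQL's relation-to-stream class can be illustrated with ISTREAM semantics, which emits a tuple at the instant it is inserted into the relation. A toy sketch (a simulation for intuition, not a CQL engine; the auction rows are invented):

```python
def istream(relation_states):
    """Given successive states (sets of tuples) of a relation,
    emit each tuple at the instant it first appears (ISTREAM)."""
    previous = set()
    out = []
    for t, state in enumerate(relation_states):
        for row in sorted(state - previous):
            out.append((t, row))
        previous = state
    return out

# States of an OpenAuction-like relation over three time instants
states = [{("item1", "alice")},
          {("item1", "alice"), ("item2", "bob")},
          {("item2", "bob")}]
print(istream(states))
# [(0, ('item1', 'alice')), (1, ('item2', 'bob'))]
# item1 is emitted at t=0, item2 at t=1; the deletion at t=2 is not emitted
```

DSTREAM (deletions) and RSTREAM (the whole relation at every instant) follow the same pattern with a different set difference.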
Scalable Stream Processing - Spark Streaming and Flink
Kafka input streams are created with a consumer group id and a number of partitions. Input operations - custom sources: to create a custom source, extend the Receiver class, implement onStart() and onStop(), and call store(data) to push received records into Spark. Output operations: a better solution is to use rdd.foreachPartition — create a single connection object and send all the records in an RDD partition using that connection. Word count in Spark Streaming: first we create a StreamingContext:
import org.apache.spark._
import org.apache.spark.streaming._
// Create a local StreamingContext with two working threads
113 pages | 1.22 MB | 1 year ago
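The per-partition connection pattern can be sketched outside Spark: one connection is opened per partition and reused for every record in it, instead of one connection per record (FakeConnection is a stand-in for a real network client):

```python
class FakeConnection:
    """Stand-in for an expensive network client."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.sent = []

    def send(self, record):
        self.sent.append(record)

    def close(self):
        pass

def foreach_partition(partitions, handler):
    # analogous to rdd.foreachPartition: handler sees a whole partition
    for part in partitions:
        handler(part)

def send_partition(records):
    # one connection per partition, reused for every record in it
    conn = FakeConnection()
    for r in records:
        conn.send(r)
    conn.close()

partitions = [[1, 2, 3], [4, 5], [6]]
foreach_partition(partitions, send_partition)
print(FakeConnection.opened)  # 3 connections for 6 records
```

With a naive per-record foreach, the same data would open 6 connections; amortizing connection setup across a partition is the whole point of foreachPartition.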
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
State registered within a flatMap on a keyed stream will be scoped to the key being processed. To create a state object, we have to register a StateDescriptor with Flink's runtime via the RuntimeContext:
var lastTempState: ValueState[Double] = _
override def open(parameters: Configuration): Unit = {
  // create state descriptor
  val lastTempDescriptor = new ValueStateDescriptor[Double]("lastTemp", classOf[Double])
  // obtain the state handle from the runtime context
  lastTempState = getRuntimeContext.getState[Double](lastTempDescriptor)
}
24 pages | 914.13 KB | 1 year ago
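The key scoping can be mimicked with a per-key dictionary: each key transparently sees only its own "lastTemp" slot. A toy sketch (not Flink's API; names are illustrative):

```python
class KeyedValueState:
    """Toy ValueState: one slot per key, scoped by the current key."""
    def __init__(self, default=None):
        self._slots = {}
        self._default = default
        self.current_key = None

    def value(self):
        return self._slots.get(self.current_key, self._default)

    def update(self, v):
        self._slots[self.current_key] = v

last_temp = KeyedValueState(default=0.0)

def flat_map(key, temp):
    last_temp.current_key = key        # the runtime sets the key scope
    delta = temp - last_temp.value()   # reads only this key's slot
    last_temp.update(temp)
    return delta

flat_map("sensor_1", 20.0)
flat_map("sensor_2", 10.0)
print(flat_map("sensor_1", 25.0))  # 5.0: sees sensor_1's 20.0, not sensor_2's
```

The user code never touches the key explicitly when reading or writing state, which is exactly what makes keyed state easy to repartition when the operator is rescaled.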
Streaming in Apache Flink
// access the state for this key
MovingAverage average = averageState.value();
// create a new MovingAverage (with window size 2) if none exists for this key
if (average == null)
  average = new MovingAverage(2);
45 pages | 3.00 MB | 1 year ago
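A MovingAverage with window size 2, as referenced in the snippet, could look like this (a hypothetical Python equivalent; the original is Java):

```python
from collections import deque

class MovingAverage:
    """Average over the last `size` values added."""
    def __init__(self, size):
        self.window = deque(maxlen=size)

    def add(self, value):
        # deque with maxlen drops the oldest value automatically
        self.window.append(value)
        return sum(self.window) / len(self.window)

avg = MovingAverage(2)
print(avg.add(10))  # 10.0
print(avg.add(20))  # 15.0
print(avg.add(40))  # 30.0  (only the last two values count)
```

In the Flink pattern above, such an object would live in per-key ValueState, so each key maintains its own independent window.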
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Retrying even if the sender crashes guarantees at-least-once delivery. However, RPC retries might create duplicates: RPCs can sometimes succeed even if they appear to have failed.
49 pages | 2.08 MB | 1 year ago
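One common way to tolerate such duplicates is to deduplicate at the receiver by sequence number, turning at-least-once delivery into effectively-once processing. A minimal sketch (all names and the lost-ack model are invented for illustration):

```python
def deliver_at_least_once(messages, drop_acks_for=()):
    """Sender retries until acked; receiver deduplicates by sequence id.
    Messages whose acks are 'lost' are redelivered, creating duplicates."""
    received = []        # everything that arrives, duplicates included
    seen = set()
    processed = []       # effect after deduplication
    for seq, payload in messages:
        attempts = 2 if seq in drop_acks_for else 1  # retry on a lost ack
        for _ in range(attempts):
            received.append((seq, payload))
            if seq not in seen:      # receiver ignores replays
                seen.add(seq)
                processed.append(payload)
    return received, processed

received, processed = deliver_at_least_once(
    [(1, "a"), (2, "b"), (3, "c")], drop_acks_for={2})
print(len(received), processed)  # 4 ['a', 'b', 'c']
```

The network delivers message 2 twice, but the deduplicated effect is applied exactly once per sequence number.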
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vertices are partitioned into disjoint subsets. Single-pass computation: for each edge, if it is seen for the first time, create a component with ID equal to the minimum of the vertex IDs; if its endpoints are in different components, merge them and update the component ID to the minimum of the two.
72 pages | 7.77 MB | 1 year ago
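The single-pass rule can be sketched with a plain component map keyed by vertex (a simplified sketch; a real implementation would use a disjoint-set/union-find structure instead of relabeling on every merge):

```python
def stream_connected_components(edges):
    """One pass over the edge stream: a vertex seen for the first time
    joins a component whose ID is the minimum vertex ID; an edge across
    components merges them under the smaller component ID."""
    comp = {}  # vertex -> component id
    for u, v in edges:
        cu = comp.get(u, u)
        cv = comp.get(v, v)
        target = min(cu, cv, u, v)
        # relabel every vertex of both components (simplified merge)
        for w in list(comp):
            if comp[w] in (cu, cv):
                comp[w] = target
        comp[u] = comp[v] = target
    return comp

edges = [(2, 3), (5, 6), (3, 5), (8, 9)]
print(stream_connected_components(edges))
# {2: 2, 3: 2, 5: 2, 6: 2, 8: 8, 9: 8}
```

Edge (3, 5) arrives after components {2, 3} and {5, 6} already exist, so the merge relabels the second component to the smaller ID 2.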