Streaming in Apache Flink

Examples

Tuples
• Flink provides Tuple1 through Tuple25 types.

POJOs
A POJO (plain old Java object) is any Java class that
• has an empty default constructor
• has fields that are either
  ◦ public, or
  ◦ accessible through default getters and setters

Tuple2<String, Integer> person = new Tuple2<>("Fred", 35);

// zero-based index!
String name = person.f0;
Integer age = person.f1;

public class Person {
    public String name;
    public Integer age;

    public Person() {}

    public Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }
}

Lab 1 -- Ride Cleansing

Transforming Data

public static class EnrichedRide extends TaxiRide {
    public int startCell;
    public int endCell;

    public EnrichedRide() {}

    // (a second constructor copies a TaxiRide and fills in the start/end grid cells)
}
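The lab maps each TaxiRide into an EnrichedRide. Since the course materials that follow use the Scala DataStream API, here is a hedged sketch of the same record-through-a-pipeline idea; the Person case class, the sample data, and the filter are illustrative, not part of the training code.

import org.apache.flink.streaming.api.scala._

// Illustrative record type mirroring the Java POJO above.
case class Person(name: String, age: Int)

object PersonJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // A small in-memory stream, just to exercise the API.
    val people: DataStream[Person] = env.fromElements(
      Person("Fred", 35), Person("Wilma", 35), Person("Pebbles", 2))

    // Transform and filter records, analogous to the EnrichedRide map in the lab.
    val adults: DataStream[(String, Int)] = people
      .filter(_.age >= 18)
      .map(p => (p.name, p.age))

    adults.print()
    env.execute("Person example")
  }
}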
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

DataStream API Basics

Example: Sensor Readings

case class Reading(id: String, time: Long, temp: Double)

object MaxSensorReadings {
  def main(args: Array[String]): Unit = {
    // ... computes the max temperature reading per sensor
    //     (the body is completed in the sketch below)
  }
}
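The listing above is cut off. A minimal runnable sketch of what the full job might look like follows; the in-memory source, the one-second processing-time window, and the reduce-based max are assumptions standing in for whatever the original slide showed.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Reading(id: String, time: Long, temp: Double)

object MaxSensorReadings {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stand-in source; in the lecture this would be a sensor or Kafka source.
    val readings: DataStream[Reading] = env.fromElements(
      Reading("sensor_1", 1L, 98.6),
      Reading("sensor_1", 2L, 100.2),
      Reading("sensor_2", 1L, 72.4))

    // Max temperature reading per sensor over 1-second windows.
    val maxTemp: DataStream[Reading] = readings
      .keyBy(_.id)
      .timeWindow(Time.seconds(1))
      .reduce((r1, r2) => if (r1.temp > r2.temp) r1 else r2)

    maxTemp.print()
    env.execute("Compute max sensor temperature")
  }
}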
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

The data types handled by the state are specified as Class or TypeInformation objects.

Registering state
• in the operator (FlatMap) class, i.e. in a rich function variant
• assign a name and get the state handle in the open() method

class TemperatureAlertFunction(val threshold: Double)
    extends RichFlatMapFunction[Reading, (String, Double, Double)] {
  // ... registers a ValueState handle in open() and uses it in flatMap()
  //     (see the completed sketch below)
}

Java example

// a RichCoFlatMapFunction is applied to two connected, keyed streams,
// e.g. rides.connect(fares).flatMap(new MatchFunction());

public static class EnrichmentFunction
        extends RichCoFlatMapFunction<TaxiRide, TaxiFare, Tuple2<TaxiRide, TaxiFare>> {
    // keeps per-key state for the ride and the fare until both have arrived
}
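The TemperatureAlertFunction above is truncated. A sketch of how it might be completed, closely following the published version of this example (a keyed ValueState holds the last temperature, and an alert is emitted when the temperature jumps by more than the threshold):

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Same record type as in the DataStream API lecture.
case class Reading(id: String, time: Long, temp: Double)

class TemperatureAlertFunction(val threshold: Double)
    extends RichFlatMapFunction[Reading, (String, Double, Double)] {

  private var lastTempState: ValueState[Double] = _

  override def open(parameters: Configuration): Unit = {
    // Register the state handle: give it a name and a type.
    val descriptor = new ValueStateDescriptor[Double]("lastTemp", classOf[Double])
    lastTempState = getRuntimeContext.getState[Double](descriptor)
  }

  override def flatMap(reading: Reading, out: Collector[(String, Double, Double)]): Unit = {
    val lastTemp = lastTempState.value()
    val tempDiff = (reading.temp - lastTemp).abs
    if (tempDiff > threshold) {
      // The temperature changed by more than the threshold: emit an alert.
      out.collect((reading.id, reading.temp, tempDiff))
    }
    // Remember the current temperature for the next invocation.
    lastTempState.update(reading.temp)
  }
}

It would be applied on a keyed stream, e.g. readings.keyBy(_.id).flatMap(new TemperatureAlertFunction(1.7)).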
PyFlink 1.15 Documentation

DataStream API

[7]: from pyflink.common import Row
     from pyflink.datastream import FlatMapFunction

     class MyFlatMapFunction(FlatMapFunction):
         def flat_map(self, value):
             for s in str(value.data).split('|'):
                 yield s  # assumed completion: emit each '|'-separated piece
PyFlink 1.16 Documentation

DataStream API

The extract contains the same MyFlatMapFunction example as the PyFlink 1.15 entry above.
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

Example: compute the average temperature per sensor.

// The accumulator holds the sum of temperatures and an event count.
class AvgTempFunction
    extends AggregateFunction[(String, Double), (String, Double, Int), (String, Double)] {
  // ... (completed in the sketch below)
}

ProcessWindowFunction

public abstract class ProcessWindowFunction<IN, OUT, KEY, W extends Window>
        extends AbstractRichFunction {

    // Evaluates the window
    public abstract void process(
        KEY key, Context ctx, Iterable<IN> vals, Collector<OUT> out) throws Exception;

    public abstract class Context implements Serializable {
        // Returns the metadata of the window
        public abstract W window();
        // ...
    }
}
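The aggregate function above is cut off. A sketch of the full average-temperature aggregate and of how it might be applied (the 15-second tumbling window and the input stream are assumptions):

import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// IN = (sensor id, temperature), ACC = (id, sum, count), OUT = (id, average)
class AvgTempFunction
    extends AggregateFunction[(String, Double), (String, Double, Int), (String, Double)] {

  override def createAccumulator(): (String, Double, Int) = ("", 0.0, 0)

  override def add(in: (String, Double), acc: (String, Double, Int)): (String, Double, Int) =
    (in._1, acc._2 + in._2, acc._3 + 1)

  override def getResult(acc: (String, Double, Int)): (String, Double) =
    (acc._1, acc._2 / acc._3)

  override def merge(a: (String, Double, Int), b: (String, Double, Int)): (String, Double, Int) =
    (a._1, a._2 + b._2, a._3 + b._3)
}

// Applying it, assuming readings: DataStream[(String, Double)] already exists:
// val avgTemp = readings
//   .keyBy(_._1)
//   .timeWindow(Time.seconds(15))
//   .aggregate(new AvgTempFunction)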
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

Announcements, updates, discussions
• Website: vasia.github.io/dspa20
• Syllabus: /syllabus.html
• Class schedule: /lectures.html (including today's slides)
• Piazza: piazza.com/bu/spring2020/cs591k1/home

Grading Scheme (1)
• No exam
• 5 in-class quizzes (10%): each quiz contributes 2% to the final grade
• 3 hands-on assignments (40%)

Schedule: vasia.github.io/dspa20/lectures.html
(calendar legend: deadline, no class, guest lecture, quizzes and announcements)

Guest Lectures
Scalable Stream Processing - Spark Streaming and Flink

Input Operations - Custom Sources
▶ To implement a custom source, extend the Receiver class.
▶ Implement onStart() and onStop().
▶ Call store(data) to store received data inside Spark.

class CustomReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  // ... onStart() spawns a thread that reads data and calls store()
}

Basic Operations
▶ Most operations on DataFrame/Dataset are supported for streaming.

case class Call(action: String, time: Timestamp, id: Int)

val df: DataFrame = spark.readStream.json("s3://logs")
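To make the last point concrete, here is a hedged Structured Streaming sketch in Scala; the explicit schema, the local master, the filter, and the console sink are illustrative choices, not from the slides.

import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

case class Call(action: String, time: Timestamp, id: Int)

object StreamingCalls {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StreamingCalls").master("local[*]").getOrCreate()
    import spark.implicits._

    // File streams need an explicit schema; derive it from the case class.
    val callSchema = Encoders.product[Call].schema

    // A streaming Dataset read from a directory of JSON files.
    val calls: Dataset[Call] =
      spark.readStream.schema(callSchema).json("s3://logs").as[Call]

    // The usual DataFrame/Dataset operations apply to the stream.
    val openCounts: DataFrame =
      calls.filter($"action" === "open").groupBy($"id").count()

    // Continuously print the running counts.
    val query = openCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}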
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

Watermarks in Flink

env.getConfig.setAutoWatermarkInterval(5000)

class PeriodicAssigner extends AssignerWithPeriodicWatermarks[Reading] {
  val bound: Long = 60 * 1000   // 1 min in ms
  // ... getCurrentWatermark() and extractTimestamp(), which returns the
  //     record timestamp (see the completed sketch below)
}

class PunctuatedAssigner extends AssignerWithPunctuatedWatermarks[Reading] {
  val bound: Long = 60 * 1000   // 1 min in ms
  // ... emits a watermark only when a record satisfies some condition
}
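Both assigners are cut off above. A sketch of a complete periodic assigner, following the usual bounded-out-of-orderness pattern (the Reading type and the initialization of maxTs are assumptions; the one-minute bound comes from the fragment):

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// Reading(id, time, temp) as used in the other lectures; the event-time
// field is called `time` here, where the fragment above says `timestamp`.
case class Reading(id: String, time: Long, temp: Double)

class PeriodicAssigner extends AssignerWithPeriodicWatermarks[Reading] {
  val bound: Long = 60 * 1000              // tolerate 1 minute of out-of-orderness
  var maxTs: Long = Long.MinValue + bound  // largest timestamp seen so far

  // Called by Flink every autoWatermarkInterval milliseconds.
  override def getCurrentWatermark: Watermark = new Watermark(maxTs - bound)

  override def extractTimestamp(r: Reading, previousElementTimestamp: Long): Long = {
    // Track the maximum observed timestamp and return the record timestamp.
    maxTs = maxTs.max(r.time)
    r.time
  }
}

// Attaching it to a stream, assuming readings: DataStream[Reading] exists:
// val withTimestamps = readings.assignTimestampsAndWatermarks(new PeriodicAssigner)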
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020

Punctuations
• a punctuation marks the end of a substream, e.g. a window fires or a post becomes inactive

case class Reading(id: String, time: Long, temp: Double)

object MaxSensorReadings {
  def main(args: Array[String]): Unit = {
    // ... the same max-temperature-per-sensor example as in the
    //     DataStream API lecture above
  }
}
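Flink exposes punctuation-like progress information mainly through watermarks, but the effect described above can be sketched with a keyed operator that emits its result and purges its state when an explicit end-of-substream marker arrives. Everything below (the PostEvent type, the isPunctuation flag) is illustrative, not from the lecture.

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Illustrative events: either a comment on a post, or a punctuation that
// declares the post inactive (no more comments for this post will arrive).
case class PostEvent(postId: String, isPunctuation: Boolean)

class ActivePostCounter
    extends KeyedProcessFunction[String, PostEvent, (String, Long)] {

  private var count: ValueState[Long] = _

  override def open(parameters: Configuration): Unit = {
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("count", classOf[Long]))
  }

  override def processElement(
      event: PostEvent,
      ctx: KeyedProcessFunction[String, PostEvent, (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {

    if (event.isPunctuation) {
      // The punctuation guarantees no more events for this post:
      // emit the final count and purge the per-key state.
      out.collect((event.postId, count.value()))
      count.clear()
    } else {
      count.update(count.value() + 1)
    }
  }
}

It would run as events.keyBy(_.postId).process(new ActivePostCounter).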
共 11 条
- 1
- 2













