Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Online recommendations Vasiliki Kalavri | Boston University 2020 Sensor measurements analysis • Monitoring applications • Complex filtering and alarm activation • Aggregation of multiple within 2% of today’s high. Financial transaction analysis • Fraud detection, online risk calculation Example: Someone steals your phone and signs in your Web activity analysis • Visualization and aggregation • impressions, clicks, transactions, likes, comments • Analytics on user activity • Filtering, aggregation, joins with static data (e.g. user
0 points | 34 pages | 2.53 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
access, high-rate append-only updates Data Warehouse • complex, offline analysis • large and relatively static and historical data • batched updates during downtimes, e.g. every night Streaming pre-aggregated, pre-processed streams and historical data Data Management Approaches 4 storage analytics static data streaming data Vasiliki Kalavri | Boston University 2020 DBMS vs. DSMS DBMS DSMS Data persistent
0 points | 45 pages | 1.22 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Kafka Vasiliki Kalavri | Boston University 2020 Apache Flink • An open-source, distributed data analysis framework • True streaming at its core • Streaming & Batch API Historic data Kafka, RabbitMQ
0 points | 26 pages | 3.33 MB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
that contain a vertex and all of its neighbors. Although this model can enable a theoretical analysis of streaming algorithms, it cannot adequately model real-world unbounded streams, as the neighbors
0 points | 72 pages | 7.77 MB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
http://infolab.stanford.edu/~ullman/mmds/book.pdf • Ken Christensen, Allen Roginsky, Miguel Jimeno. A new analysis of the false positive rate of a Bloom filter. Information Processing Letters 110 (2010). Further
0 points | 74 pages | 1.06 MB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
cardinalities. European Symposium on Algorithms, 2003. • Flajolet, Philippe, et al. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. 2007. https://hal.archives-ouvertes.fr/fi
0 points | 69 pages | 630.01 KB | 1 year ago
PyFlink 1.15 Documentation
and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries
0 points | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes. If you’re already familiar with Python and libraries
0 points | 36 pages | 266.80 KB | 1 year ago
Streaming in Apache Flink
total fare collected Lab 1 -- Ride Cleansing Transforming Data public static class EnrichedRide extends TaxiRide { public int startCell; public int endCell; public filter(new RideCleansing.NYCFilter()) .map(new Enrichment()); enrichedNYCRides.print(); public static class Enrichment implements MapFunction- { @Override public EnrichedRide taxiRide) throws Exception { return new EnrichedRide(taxiRide); } } FlatMap Function public static class NYCEnrichment implements FlatMapFunction - { @Override public void
0 points | 45 pages | 3.00 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
main steps to develop a Spark structured streaming: ▶ 1. Defines a query on the input table, as a static table. • Spark automatically converts this batch-like query to a streaming execution plan. ▶ 2
0 points | 113 pages | 1.22 MB | 1 year ago
15 results in total