Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Motivating problem: counting distinct users visiting one or multiple webpages. Naive solution: maintain a hash table. Sketch-based solution: convert the stream into a multi-set of uniformly distributed random numbers using a hash function; the more distinct elements we encounter in the stream, the more different hash values we shall see. Example: to count cardinalities up to 1 billion, or 2^30, with an accuracy of 4%, the hash value needs to map elements to M = log2(2^30) = 30 bits. 69 pages | 630.01 KB | 1 year ago
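The hash-based idea above can be made concrete with a small probabilistic counter. The sketch below is a simplified HyperLogLog-style estimator written for illustration, not the course's implementation; the register count (m = 64) and bias constant are assumptions chosen to keep it short.

```python
import hashlib

def estimate_cardinality(items, p=6):
    """Simplified HyperLogLog-style estimate: hash each element to a
    uniformly distributed 64-bit number and, in each of m = 2**p register
    buckets, track the longest run of leading zeros observed."""
    m = 1 << p
    registers = [0] * m
    for x in items:
        h = int(hashlib.md5(str(x).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        bucket = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining bits give the rank
        rank = (64 - p) - rest.bit_length() + 1   # leading zeros + 1
        registers[bucket] = max(registers[bucket], rank)
    alpha = 0.709  # bias-correction constant for m = 64
    return alpha * m * m / sum(2.0 ** -r for r in registers)

print(round(estimate_cardinality(range(10_000))))
```

With m = 64 registers the relative error is roughly 1.04/sqrt(64) ≈ 13%; hitting the slides' 4% target needs more registers, while each hash value still only needs about 30 bits to cover 2^30 distinct elements.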
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Operator selectivity: the number of output elements produced per input element; a map operator has a selectivity of 1, i.e., it produces one output element for each input element it processes. Safety conditions for reordering operators A and B: attribute availability (the set of attributes B reads from must be disjoint from the set of attributes A writes to) and commutativity (applying the two operators in either order must produce the same results). 54 pages | 2.83 MB | 1 year ago
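Selectivity can be measured empirically by counting outputs per input. A minimal sketch, where an operator is modeled as a function from one element to a (possibly empty) list of outputs; the operators and sample data are made up for illustration:

```python
def selectivity(op, sample):
    """Average number of output elements produced per input element."""
    outputs = [y for x in sample for y in op(x)]
    return len(outputs) / len(sample)

double = lambda x: [2 * x]                       # map-like: always 1-in / 1-out
keep_even = lambda x: [x] if x % 2 == 0 else []  # filter-like: 1-in / 0-or-1-out

sample = list(range(10))
print(selectivity(double, sample))     # 1.0
print(selectivity(keep_even, sample))  # 0.5
```

When reordering is safe, pushing the operator with the lowest selectivity earlier in the chain reduces the number of elements every downstream operator must process.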
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Basic API concept: Source -> DataStream -> Operator -> DataStream -> Sink (and analogously Source -> DataSet -> Operator -> DataSet -> Sink). Writing a Flink program: 1. bootstrap sources, 2. apply operators, 3. output to sinks. Streaming word count: textStream.flatMap {_.split("\\W+")}.map {(_, 1)}.keyBy(0).sum(1).print() — the input "live and let live" is split into the tokens "live", "and", "let", "live". Temperature example: val sensorData = env.addSource(new SensorSource); val maxTemp = sensorData.map(r => Reading(r.id, r.time, (r.temp-32)*(5.0/9.0))).keyBy(_.id).max("temp"). 26 pages | 3.33 MB | 1 year ago
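The Scala word-count pipeline can be mimicked in plain Python to show what the flatMap / map / keyBy / sum stages compute; this is a sketch of the semantics, not Flink API code (it splits on whitespace rather than the Scala example's \W+ regex):

```python
from collections import Counter

def word_count(lines):
    # flatMap: split each line into words; map: pair each word with 1;
    # keyBy + sum: accumulate the running count per word.
    counts = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return dict(counts)

print(word_count(["live and let live"]))  # {'live': 2, 'and': 1, 'let': 1}
```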
PyFlink 1.15 Documentation. Local: this page shows you how to set up a PyFlink development environment on your local machine, usually used for local execution or development in an IDE. Set up Python environment: it requires Python 3.6 or above to be pre-installed in your local environment; it's suggested to use Python virtual environments to set up your local Python environment (see "Create a Python virtual environment" for more details). Standalone: the barebone way of deploying Flink; this page shows you how to set up the Python environment and execute PyFlink jobs in a standalone Flink cluster. 36 pages | 266.77 KB | 1 year ago
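The local setup described above amounts to creating a virtual environment and installing the apache-flink package. A minimal sketch — the environment name is arbitrary, and the pip line is left as a comment because it downloads a large package:

```shell
# create and enter an isolated Python environment for PyFlink
python3 -m venv pyflink-env
. pyflink-env/bin/activate
# inside the venv, install the PyFlink version matching the docs:
#   pip install "apache-flink==1.15.*"
python -c "import sys; print(sys.prefix)"
```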
PyFlink 1.16 Documentation. Local: this page shows you how to set up a PyFlink development environment on your local machine, usually used for local execution or development in an IDE. Set up Python environment: it requires Python 3.6 or above to be pre-installed in your local environment; it's suggested to use Python virtual environments to set up your local Python environment (see "Create a Python virtual environment" for more details). Standalone: the barebone way of deploying Flink; this page shows you how to set up the Python environment and execute PyFlink jobs in a standalone Flink cluster. 36 pages | 266.80 KB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink. Transformations: map returns a new DStream by passing each element of the source DStream through a given function; flatMap is similar to map, but each input item can be mapped to zero or more output items. 113 pages | 1.22 MB | 1 year ago
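The map / flatMap distinction can be shown with ordinary Python lists standing in for DStream micro-batches; this is a sketch of the semantics, not the Spark API:

```python
batch = ["live and let live", "stream on"]

# map: exactly one output per input element
# (here each line maps to its list of tokens)
mapped = [line.split() for line in batch]

# flatMap: each input element expands into zero or more output elements,
# and the results are concatenated into one flat stream
flat_mapped = [tok for line in batch for tok in line.split()]

print(mapped)       # [['live', 'and', 'let', 'live'], ['stream', 'on']]
print(flat_mapped)  # ['live', 'and', 'let', 'live', 'stream', 'on']
```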
Streaming in Apache Flink (Committer @ Apache Flink, @SenorCarbone). Contents: DataSet API, DataStream API, concepts, setting up an environment to develop Flink programs, implementing streaming data processing pipelines. Map function example: DataStream<TaxiRide> rides = env.addSource(new TaxiRideSource(...)); DataStream<EnrichedRide> enrichedNYCRides = rides.filter(new RideCleansing.NYCFilter()).map(new Enrichment()); class Enrichment implements MapFunction<TaxiRide, EnrichedRide> { @Override public EnrichedRide map(TaxiRide taxiRide) throws Exception { return new EnrichedRide(taxiRide); } } 45 pages | 3.00 MB | 1 year ago
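The filter-then-map chain above can be sketched in Python with plain classes playing the roles of Flink's FilterFunction and MapFunction. The bounding-box coordinates and record fields are assumptions for illustration, not the course's TaxiRide schema:

```python
class NYCFilter:
    """Keeps only rides whose coordinates fall inside a rough NYC box."""
    def filter(self, ride):
        return -74.3 < ride["lon"] < -73.7 and 40.5 < ride["lat"] < 41.0

class Enrichment:
    """Adds a derived field to each ride that passes the filter."""
    def map(self, ride):
        return {**ride, "enriched": True}

rides = [{"lon": -74.0, "lat": 40.7}, {"lon": 0.0, "lat": 0.0}]
f, m = NYCFilter(), Enrichment()
enriched = [m.map(r) for r in rides if f.filter(r)]
print(enriched)  # [{'lon': -74.0, 'lat': 40.7, 'enriched': True}]
```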
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Windowed maximum: val sensorData = env.addSource(new SensorSource); val maxTemp = sensorData.map(r => Reading(r.id, r.time, (r.temp-32)*(5.0/9.0))).keyBy(_.id).timeWindow(Time.minutes(1)). Setting the time characteristic: object AverageSensorReadings { def main(args: Array[String]) { // set up the streaming execution environment val env = StreamExecutionEnvironment.getExecutionEnvironment ... } }. Windowed minimum: val minTempPerWindow: DataStream[(String, Double)] = sensorData.map(r => (r.id, r.temperature)).keyBy(_._1).timeWindow(Time.seconds(15)).reduce((r1, r2) => (r1._1, r1._2.min(r2._2))). 35 pages | 444.84 KB | 1 year ago
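A tumbling time window assigns each record to the window whose start is its timestamp rounded down to a multiple of the window length. A plain-Python sketch of the 15-second min-temperature computation on synthetic readings (not Flink API code):

```python
from collections import defaultdict

def min_temp_per_window(readings, window_ms=15_000):
    """readings: (sensor_id, timestamp_ms, temperature) tuples.
    Returns {(sensor_id, window_start_ms): min_temperature}."""
    mins = defaultdict(lambda: float("inf"))
    for sensor_id, ts, temp in readings:
        window_start = (ts // window_ms) * window_ms  # tumbling-window assignment
        key = (sensor_id, window_start)
        mins[key] = min(mins[key], temp)              # incremental reduce
    return dict(mins)

readings = [("s1", 1_000, 20.0), ("s1", 14_000, 18.5), ("s1", 16_000, 21.0)]
print(min_temp_per_window(readings))
# {('s1', 0): 18.5, ('s1', 15000): 21.0}
```

Like Flink's reduce on a window, the minimum is folded in incrementally per (key, window) pair rather than buffering all elements.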
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Load shedding as an optimization problem: N is the query network, I the set of input streams with known arrival rates, C the system processing capacity, and H a headroom factor, i.e., the fraction of capacity that can be used at steady state. Selectivity: how many records does the operator produce per record in its input? map: 1 in, 1 out; filter: 1 in, 1 or 0 out; flatMap, join: 1 in, 0, 1, or more out. Cost: how much processing time the operator spends per record. The Load Shedding Road Map (LSRM) is a pre-computed table that contains materialized load shedding plans, ordered by how much load they shed. 43 pages | 2.42 MB | 1 year ago
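The optimization view above boils down to: if the offered load exceeds the usable capacity H·C, drop just enough tuples to bring it back under. A sketch of that arithmetic with symbols loosely matching the slide's C/H notation; the function and its default headroom are assumptions for illustration:

```python
def required_drop_fraction(arrival_rate, cost_per_tuple, capacity, headroom=0.9):
    """Fraction of input tuples to shed so that offered load fits within
    the usable capacity headroom * capacity.

    arrival_rate:   tuples per second on the input stream
    cost_per_tuple: processing seconds consumed per tuple
    capacity:       total processing seconds available per second
    """
    load = arrival_rate * cost_per_tuple
    usable = headroom * capacity
    if load <= usable:
        return 0.0            # system keeps up; shed nothing
    return 1.0 - usable / load

# 1000 tuples/s at 2 ms each = 2.0 seconds of work demanded per second,
# against 0.9 usable seconds -> shed about 55% of the input
print(required_drop_fraction(1000, 0.002, 1.0))
```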
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University). Batch processing assumes we know the entire dataset in advance, e.g., tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins. Distributed dataflow model: a job is a graph of sources, operators, and sinks; e.g., a Twitter source feeding hashtag extraction, topic counting, and trend detection. Example: val env = StreamExecutionEnvironment.getExecutionEnvironment; val sensorData = env.addSource(new SensorSource); val maxTemp = sensorData.map(r => Reading(r.id, r.time, (r.temp-32)*(5.0/9.0))).keyBy(_.id).max("temp"). 45 pages | 1.22 MB | 1 year ago
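The maxTemp example keeps a running maximum per sensor after converting Fahrenheit to Celsius. A plain-Python sketch of that stateful keyed computation over an in-memory stream (not Flink API code):

```python
def running_max_celsius(readings):
    """readings: (sensor_id, temp_fahrenheit) pairs, in stream order.
    Yields (sensor_id, max_celsius_so_far) after each incoming reading,
    mirroring keyBy(id).max(temp) over a converted stream."""
    state = {}  # per-key state, as a keyed operator would hold it
    for sensor_id, temp_f in readings:
        celsius = (temp_f - 32) * 5.0 / 9.0
        state[sensor_id] = max(state.get(sensor_id, float("-inf")), celsius)
        yield sensor_id, state[sensor_id]

stream = [("s1", 212.0), ("s1", 32.0), ("s2", 98.6)]
print(list(running_max_celsius(stream)))
```

Each output is emitted incrementally: the second "s1" reading (0 °C) does not lower the stored maximum of 100 °C, just as Flink's max would retain the larger value.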
共 21 条
- 1
- 2
- 3













