Apache Flink的过去、现在和未来 (Apache Flink: Past, Present, and Future)
Slide excerpt sketching Flink's layered architecture: deployment on a local single JVM, in the cloud (GCE, EC2), or on a cluster (Standalone, YARN); a runtime built as a distributed streaming dataflow over a DAG of StreamOperators; and, on top, the DataStream API for physical-level stream processing alongside the relational Table API & SQL, whose query processor compiles down to the same runtime. The deck closes on the unified operator abstraction, contrasting pull-based and push-based operators.
33 pages | 3.36 MB | 1 year ago

State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
State is all data maintained by a task and used to compute its results, for example a local or instance variable accessed by the task's business logic. Operator state is scoped to an operator instance, whereas keyed state is scoped to a key defined in the operator's input records: Flink maintains one state instance per distinct key value and partitions all records with the same key to the operator task that maintains that key's state. The keyed state instances of a function are thus distributed across all parallel tasks of the function.
24 pages | 914.13 KB | 1 year ago

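A minimal keyed-state sketch in Flink's Scala DataStream API, illustrating the per-key state instances the excerpt describes. The SensorReading record type, field names, and the running count are my own illustrative assumptions, not taken from the course material:

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.Collector

    // Hypothetical record type.
    case class SensorReading(id: String, temperature: Double)

    // Emits a running count of readings per sensor id.
    class CountPerKey extends KeyedProcessFunction[String, SensorReading, (String, Long)] {

      // Flink keeps one instance of this state per distinct key and scopes
      // every access to the key of the record currently being processed.
      lazy val count: ValueState[java.lang.Long] = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Long]("count", classOf[java.lang.Long]))

      override def processElement(
          reading: SensorReading,
          ctx: KeyedProcessFunction[String, SensorReading, (String, Long)]#Context,
          out: Collector[(String, Long)]): Unit = {
        val current = Option(count.value()).map(_.longValue).getOrElse(0L) + 1
        count.update(current) // updates only this key's state instance
        out.collect((reading.id, current))
      }
    }

    // usage: readings.keyBy(_.id).process(new CountPerKey)
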
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Excerpt combining two streams with connect:

    val factors: DataStream[(String, Double)] = …
    val priceRequests: DataStream[Item] = ...
    factors.connect(priceRequests).flatMap(
      new CoFlatMapFunction[(String, Double), Item, Offer] {
        // shared state …

On the Kafka side: consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines. If record M1 is sent by the same producer as record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log. A consumer instance sees records in the order they are stored in the log, and for a topic with replication factor N, up to N-1 server failures are tolerated without losing committed records.
26 pages | 3.33 MB | 1 year ago

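A minimal sketch of the consumer-group behavior the excerpt describes, using the standard Kafka Java client from Scala. The broker address, group id, and topic name are placeholders of my choosing:

    import java.time.Duration
    import java.util.{Collections, Properties}
    import scala.jdk.CollectionConverters._
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // All consumers sharing this group id split the topic's partitions among
    // themselves; each record is delivered to exactly one instance per group.
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "pricing-service")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("offers"))

    while (true) {
      // Within a partition, records arrive in offset order (M1 before M2).
      for (record <- consumer.poll(Duration.ofMillis(500)).asScala)
        println(s"partition=${record.partition} offset=${record.offset} value=${record.value}")
    }
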
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Excerpt showing keyed, connect-based filtering of sensor readings against a stream of switches:

    val forwardedReadings = readings
      // connect readings and switches
      .connect(filterSwitches)
      // key by sensor ids
      .keyBy(_.id, _._1)
      // apply filtering …

35 pages | 444.84 KB | 1 year ago

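The excerpt stops before the filtering function itself. Below is a hedged reconstruction of what such a co-processing filter could look like in Flink's Scala API; the RichCoFlatMapFunction body, state name, and record type are my assumptions, not the course's actual solution:

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.Collector

    // Hypothetical record type.
    case class SensorReading(id: String, temperature: Double)

    // Forwards readings for a sensor only while a switch for that sensor is on.
    class FilterBySwitch
        extends RichCoFlatMapFunction[SensorReading, (String, Boolean), SensorReading] {

      lazy val enabled: ValueState[java.lang.Boolean] = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Boolean]("enabled", classOf[java.lang.Boolean]))

      override def flatMap1(r: SensorReading, out: Collector[SensorReading]): Unit =
        if (enabled.value() == java.lang.Boolean.TRUE) out.collect(r) // drop while disabled

      override def flatMap2(switch: (String, Boolean), out: Collector[SensorReading]): Unit =
        enabled.update(switch._2) // flip this sensor's switch state

    }

    // usage, mirroring the excerpt:
    // readings.connect(filterSwitches).keyBy(_.id, _._1).flatMap(new FilterBySwitch)
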
Streaming in Apache Flink
Excerpt of a control-stream example in the Java DataStream API:

    … .fromElements("data", "DROP", "artisans", "IGNORE")
       .keyBy(x -> x);

    control
        .connect(datastreamOfWords)
        .flatMap(new ControlFunction())
        .print();

    public static class ControlFunction …

45 pages | 3.00 MB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
On DS2's notion of useful time: the time spent by an operator instance in deserialization, processing, and serialization activities. It excludes any time spent waiting on input or on output, and amounts to the time an operator instance would run for if it never had to wait. In the evaluation excerpted here (transient underprovisioning by one instance), DS2 converges in two steps for both operators; the slides show DS2's scaling actions on an Apache Flink wordcount job.
93 pages | 2.42 MB | 1 year ago

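Useful time is the core of DS2's rate model. A sketch of the relationships as I read the DS2 paper (Kalavri et al., OSDI 2018); the notation below is mine, not the slide deck's:

    Over an observation window of wall-clock length $T$, let instance $i$ of
    operator $o$ accumulate useful time $W_i \le T$ and exhibit an observed
    processing rate $R_i^{obs}$. Its true processing rate, the rate it would
    sustain if it never waited on input or output, is

        $$R_i^{true} = R_i^{obs} \cdot \frac{T}{W_i}.$$

    The parallelism estimated for operator $o$ to keep up with the true output
    rates $O_u^{true}$ of its upstream instances is then

        $$p_o = \Bigl\lceil \frac{\sum_{u \in upstream(o)} O_u^{true}}{R_o^{true}} \Bigr\rceil,$$

    where $R_o^{true}$ is the true processing rate of a single instance of $o$.
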
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Excerpt of a streaming iteration example in Timely Dataflow (Rust), building a feedback loop that terminates after 100 iterations:

    .to_stream(scope)
    .concat(&stream)
    .inspect(|x| println!("seen: {:?}", x))
    .connect_loop(handle);

Timestamps grow with loop nesting: t, then (t, l1), then (t, (l1, l2)).
53 pages | 532.37 KB | 1 year ago

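The excerpt is Timely Dataflow in Rust; to keep the sketches in this listing in one language, here is an analogous feedback loop written with Flink's Scala DataStream iterate, a plainly swapped-in technique rather than the Timely code itself. Values below a bound re-enter the loop; the rest leave it:

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val start: DataStream[Long] = env.fromElements(0L)

    // Each pass increments; values < 100 are fed back, values >= 100 are emitted.
    val result = start.iterate((iteration: DataStream[Long]) => {
      val incremented = iteration.map(_ + 1)
      val feedback = incremented.filter(_ < 100)  // re-enter the loop
      val output   = incremented.filter(_ >= 100) // leave the loop
      (feedback, output)
    })

    result.print()
    env.execute("streaming iteration sketch")
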
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
On fission and load balancing: ensure each worker is qualified (if load balancing is applied after fission, each instance must be capable of processing each item and must have access to the necessary state) and establish placement accordingly. Profitability hinges on skew, e.g. when popular keys exist: under skew, throughput is bounded by the instance that receives the highest load.
54 pages | 2.83 MB | 1 year ago

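A small Flink (Scala) illustration of that skew point: after splitting a stateless operator into parallel replicas, round-robin rebalancing spreads records evenly, whereas hash partitioning on a hot key concentrates them on one replica. The stream contents, map function, and parallelism are illustrative choices of mine:

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events: DataStream[(String, Int)] =
      env.fromElements(("hot", 1), ("hot", 2), ("hot", 3), ("cold", 4))

    // Fission of a stateless map into 4 replicas; round-robin distribution
    // keeps the replicas evenly loaded regardless of key popularity.
    val balanced = events.rebalance.map(e => e._2 * 2).setParallelism(4)

    // Key-based routing is required when the operator keeps keyed state, but
    // the popular key "hot" pins most records to a single replica, so overall
    // throughput is bounded by that one instance.
    val skewed = events.keyBy(_._1).map(e => e._2 * 2).setParallelism(4)
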
PyFlink 1.15 Documentation
Notebook excerpt on the Table API: simply selecting a column does not trigger the computation; it returns a Column Expression instance.

    [15]: from pyflink.table.expressions import col
          type(table.id) == type(col('id'))
    [15]: True

    [17]: table.select(col('id')).to_pandas()
    [17]:    id
          0   1
          1   2

Assigning a new Column Expression instance:

    [18]: table.add_columns(col('data').upper_case.alias('upper_data')).to_pandas()
    [18]:    id  data …

36 pages | 266.77 KB | 1 year ago

PyFlink 1.16 Documentation
Same Table API excerpt as the 1.15 entry above: selecting a column is lazy and returns a Column Expression instance, demonstrated with col('id'), to_pandas(), and add_columns(col('data').upper_case.alias('upper_data')).
36 pages | 266.80 KB | 1 year ago

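For comparison with the two PyFlink excerpts above, the same lazy column-expression behavior sketched in the JVM Table API from Scala. I am assuming the $-based expression DSL, fromValues, and upperCase() as documented for recent Flink releases; treat this as a sketch, not the PyFlink docs' own example:

    import org.apache.flink.table.api._
    import org.apache.flink.table.api.Expressions.row

    val tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode())
    val table = tEnv.fromValues(row(1, "Hello"), row(2, "World")).as("id", "data")

    // Selecting a column only builds an expression tree; nothing executes yet.
    val projected = table.select($"id")

    // Upper-casing is declared lazily too; execute() triggers the computation.
    table.addColumns($"data".upperCase().as("upper_data")).execute().print()
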
13 results in total · Pages: 1 2













