PyFlink 1.15 Documentation (36 pages, 266.77 KB, 1 year ago)
Describes how to set up a PyFlink development environment on your local machine, typically used for local execution or for development in an IDE. Setting up the Python environment requires Python 3.6 or above; PyFlink can be given a Python virtual environment at the client side (for job compiling) and at the server side (for Python UDF execution) separately. A Python virtual environment can also be specified for the cluster nodes during job execution, in which case the -pyexec option is additionally required to select the Python virtual environment used at the server side (for Python UDF execution).

PyFlink 1.16 Documentation (36 pages, 266.80 KB, 1 year ago)
The same getting-started material for release 1.16: setting up a local PyFlink development environment, providing a Python virtual environment at the client side (for job compiling) and at the server side (for Python UDF execution) separately, and using the -pyexec option to select the server-side interpreter for Python UDF execution.

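A minimal sketch, not taken from the linked documentation, of how the client-side and server-side interpreters described in the two PyFlink entries above can be configured programmatically. The venv/ path is an assumption; python.executable covers what the -pyexec CLI option selects on the server side, and python.client.executable is its client-side counterpart.

```python
# Illustrative only: point PyFlink at a specific virtual environment.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Server side: the interpreter used for Python UDF execution
# (the -pyexec CLI option selects the same thing).
t_env.get_config().get_configuration().set_string(
    "python.executable", "venv/bin/python")

# Client side: the interpreter used for job compiling.
t_env.get_config().get_configuration().set_string(
    "python.client.executable", "venv/bin/python")
```

When submitting with the CLI instead, the same effect can be achieved with the -pyexec (and, for the client side, -pyclientexec) options of flink run.
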
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (54 pages, 2.83 MB, 1 year ago)
Lecture slides on stream processing optimizations: costs of streaming operator execution (state, parallelism, selectivity); dataflow optimizations and plan translation alternatives, illustrated with pipeline, task, and data parallelism; distributed execution in Flink; and query optimization, i.e. identifying the most efficient way to execute a query, either before execution or during runtime, by enumerating equivalent execution plans and minimizing intermediate results.

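Not from the slides, but a toy illustration of the plan-enumeration idea mentioned in the excerpt: given two commuting filters with assumed selectivities, pick the order that minimizes the intermediate stream feeding the second operator.

```python
# Toy plan enumeration with made-up numbers: choose the order of two
# commuting filters that minimizes the intermediate tuple rate.
input_rate = 1000.0                    # tuples/second entering the pipeline
selectivity = {"f1": 0.5, "f2": 0.1}   # assumed fraction of tuples each filter passes

def intermediate_rate(first_filter: str) -> float:
    # Rate of tuples flowing from the first filter into the second one.
    return input_rate * selectivity[first_filter]

plans = [("f1", "f2"), ("f2", "f1")]
best = min(plans, key=lambda plan: intermediate_rate(plan[0]))
print(best)  # ('f2', 'f1'): apply the more selective filter first
```
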
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (43 pages, 2.42 MB, 1 year ago)
Lecture slides on flow control and load shedding, including approximate answers and an architecture figure in which input streams pass through an input manager, scheduler, QoS monitor, and load shedder before reaching the query execution engine that serves ad-hoc or continuous queries. The goal is to avoid unnecessary result degradation, so load shedding components rely on statistics gathered during execution: a statistics manager module monitors processing and input rates and periodically estimates operator costs and selectivities, either continuously or by running the system for a designated period of time prior to regular query execution.

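The shedding decision itself is typically a simple function of the estimated rates. A toy sketch, with made-up capacity and rate numbers rather than anything from the slides: drop tuples at random with probability 1 - capacity / input_rate whenever the input rate exceeds capacity.

```python
import random

class RandomLoadShedder:
    """Toy rate-based shedder; in a real system, capacity and rates would come
    from the statistics manager, not from hard-coded numbers."""

    def __init__(self, capacity_per_sec: float):
        self.capacity = capacity_per_sec

    def drop_probability(self, input_rate: float) -> float:
        if input_rate <= self.capacity:
            return 0.0
        return 1.0 - self.capacity / input_rate

    def admit(self, input_rate: float) -> bool:
        # True if the tuple should be processed, False if it is shed.
        return random.random() >= self.drop_probability(input_rate)

shedder = RandomLoadShedder(capacity_per_sec=1000)
print(shedder.drop_probability(input_rate=4000))  # 0.75: shed three quarters of the load
```
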
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (45 pages, 1.22 MB, 1 year ago)
Lecture slides contrasting classic data stream management systems with modern dataflow systems: single-node execution, synopses and sketches, approximate results, and in-order processing on one side, versus distributed execution, partitioned state, exact results, and out-of-order support on the other. No particular basic stream model (time-series, turnstile, ...) is imposed by the dataflow execution engine; the burden of representation and denotations is left to the application developer/user. The comparison further covers results (approximate vs. exact), language (SQL extensions and CQL vs. Java, Scala, Python, SQL), execution (centralized vs. distributed), parallelism (pipeline vs. pipeline, task, and data), and state (limited, in-memory vs. partitioned).

Scalable Stream Processing - Spark Streaming and Flink (113 pages, 1.22 MB, 1 year ago)
Lecture slides on scalable stream processing with Spark (Structured) Streaming and Flink. In Structured Streaming: 1. express the streaming computation as a batch-like query, treating the input data stream as a static table; Spark automatically converts this batch-like query to a streaming execution plan. 2. Specify triggers to control when to update the results; each time a trigger fires, Spark checks for new input and incrementally updates the result.

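The lecture presents this in Spark's Scala API; a rough PySpark analogue of the two steps, with an assumed socket source and console sink, might look like the following.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("structured-streaming-triggers").getOrCreate()

# Step 1: treat the unbounded socket stream as a table and write a batch-like query on it.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())
counts = lines.groupBy("value").count()

# Step 2: the trigger controls when the incrementally maintained result is updated.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="10 seconds")
         .start())
query.awaitTermination()
```
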
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (81 pages, 13.18 MB, 1 year ago)
Lecture slides on exactly-once fault tolerance in Apache Flink. The excerpt concerns retrieving a distributed cut in a system execution that yields a valid system configuration, with two requirements: validity (safety) and termination (liveness). A "Validity Explained" figure shows processes p1, p2, p3 exchanging messages m and m' in a possible execution, marking the events included in the cut.

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (41 pages, 4.09 MB, 1 year ago)
Lecture slides and demo on fault tolerance and reconfiguration. The JobManager is a single point of failure for Flink applications: it keeps metadata about application execution, such as pointers to completed checkpoints, and a high-availability mode migrates this responsibility in case of failure. Reasons to reconfigure a running job include handling increased load, scaling in to save resources, fixing bugs or changing business logic, optimizing the execution plan, changing operator placement, mitigating skew and stragglers, and migrating to a different cluster or environment.

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (49 pages, 2.08 MB, 1 year ago)
Lecture slides on high availability, recovery semantics, and guarantees. The excerpt sets up notation for the output stream produced by an input e and, in the event of a failure, for Of, the output of the pre-failure execution of the primary, and O', the output produced by the secondary after recovery; these are used to define guarantees such as precise recovery. Operators are also classified by recovery behaviour: convergent-capable operators can rebuild internal state in a way that eventually converges to a non-failure execution output, and repeatable operators produce identical duplicate tuples.

Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020 (26 pages, 3.33 MB, 1 year ago)
Lecture slides introducing Apache Flink and Apache Kafka. Flink programs are defined in regular Scala/Java methods; the first step is to set up the execution environment, which determines local or cluster execution, I/O, time semantics, parallelism, and more. The excerpt ends with the tail of a "Sensor Readings" example.

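The sensor-readings example in the slides is written in Scala; as a rough, hypothetical analogue in PyFlink (sensor IDs, values, and the threshold below are made up), setting up the execution environment and a small pipeline might look like this.

```python
from pyflink.datastream import StreamExecutionEnvironment

# Set up the execution environment: this is where parallelism and
# local-vs-cluster execution get configured.
env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(2)

# A tiny in-memory source of (sensor_id, temperature) readings.
readings = env.from_collection([
    ("sensor_1", 35.2),
    ("sensor_7", 18.9),
    ("sensor_3", 41.0),
])

# Keep only "hot" readings and print them.
readings.filter(lambda r: r[1] > 25.0).print()

env.execute("sensor-readings")
```
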
12 documents in total; page 1 of 2.













