PyFlink 1.15 Documentation
... management: User-defined function registration, dropping, listing, etc. • Executing SQL queries • Job configuration • Python dependency management • Job submission. For more details of how to create a TableEnvironment ... Table Creation: Table is a core component of the Python Table API. A Table object describes a pipeline of data transformations. It does not ... StreamExecutionEnvironment is responsible for: • DataStream Creation • Python dependency management • Job configuration • Job submission [1]: from pyflink.datastream import StreamExecutionEnvironment from pyflink ...
0 credits | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
... management: User-defined function registration, dropping, listing, etc. • Executing SQL queries • Job configuration • Python dependency management • Job submission. For more details of how to create a TableEnvironment ... Table Creation: Table is a core component of the Python Table API. A Table object describes a pipeline of data transformations. It does not ... StreamExecutionEnvironment is responsible for: • DataStream Creation • Python dependency management • Job configuration • Job submission [1]: from pyflink.datastream import StreamExecutionEnvironment from pyflink ...
0 credits | 36 pages | 266.80 KB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Operator state, Keyed state, State types ... Vasiliki Kalavri | Boston University 2020 ... A pluggable component that determines how state is stored, accessed, and maintained. State backends are responsible ... handle object private var lastTempState: ValueState[Double] = _ override def open(parameters: Configuration): Unit = { // create state descriptor val lastTempDescriptor = new ValueStateDescriptor[Double]("lastTemp" ... rideState; private ValueState fareState; @Override public void open(Configuration config) { // initialize the state descriptors here rideState = getRuntimeContext() ...
0 credits | 24 pages | 914.13 KB | 1 year ago
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Vasiliki Kalavri | Boston University 2020 ... A component is a subgraph in which every vertex is reachable from all other vertices in the subgraph. Connected ... State: the graph and a component ID per vertex • initially equal to vertex ID • Iterative step: For each vertex • choose the min of neighbors' component IDs and own component ID as the new ID • if the component ID changed since the last iteration, notify neighbors ... Batch Connected Components ...
0 credits | 72 pages | 7.77 MB | 1 year ago
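The batch connected-components iteration summarized in the snippet above (each vertex starts with its own ID as component ID, repeatedly takes the minimum over its own and its neighbors' IDs, and notifies neighbors only when its ID changed) can be sketched in plain Python. This is a minimal single-machine sketch, not code from the slides: the function name `connected_components`, the undirected edge-list input, and the example graph are all assumptions made for illustration.

```python
def connected_components(edges):
    """Label each vertex with the smallest vertex ID in its component.

    `edges` is an iterable of (u, v) pairs of an undirected graph
    (hypothetical input format chosen for this sketch).
    """
    # Build an undirected adjacency map.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # State: one component ID per vertex, initially the vertex ID itself.
    component = {v: v for v in neighbors}

    # Vertices whose ID changed in the previous iteration; all vertices
    # count as "changed" before the first iteration.
    changed = set(neighbors)

    while changed:
        # Only vertices notified by a changed neighbor need to recompute.
        notified = set()
        for v in changed:
            notified.update(neighbors[v])

        new_component = dict(component)
        changed = set()
        for v in notified:
            # Min of own component ID and the neighbors' component IDs.
            candidate = min([component[v]] + [component[n] for n in neighbors[v]])
            if candidate < component[v]:
                new_component[v] = candidate
                changed.add(v)
        component = new_component

    return component


if __name__ == "__main__":
    # Hypothetical example graph with two components.
    example_edges = [(1, 3), (1, 4), (2, 4), (5, 3), (6, 7), (7, 8)]
    print(connected_components(example_edges))
```

Reading the previous iteration's IDs while writing into a fresh dictionary mirrors the synchronous, iteration-by-iteration behaviour of the batch algorithm; the loop stops once an iteration changes no component ID.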
Streaming in Apache Flink
... and shrinks • queryable: Flink state can be queried via a REST API ... Rich Functions • open(Configuration c) • close() • getRuntimeContext() ... DataStream input = … ... { private ValueState averageState; @Override public void open(Configuration conf) { ValueStateDescriptor descriptor = new ValueStateDescriptor<>("moving ... String, String> { private ValueState blocked; @Override public void open(Configuration config) { blocked = getRuntimeContext().getState(new ValueStateDescriptor<>("blocked", Boolean ...
0 credits | 45 pages | 3.00 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Vasiliki Kalavri | Boston University 2020 ... Control: When and how much to adapt? Mechanism: How to apply the re-configuration? ... • Detect environment changes: external workload and system performance • Identify bottleneck ... Flink wordcount: Every reconfiguration takes ~30s during which the system is unavailable ... Re-configuration requires state migration with correctness guarantees. ... only complete state is migrated ... Helpers buffer data that cannot yet be safely routed and configuration commands that cannot yet be applied ... Live state migration ...
0 credits | 93 pages | 2.42 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... trigger, evictor, evaluation function, result stream ... Custom windows • Describe each component ... Vasiliki Kalavri | Boston University 2020 ... 32 4 2 5 7 44 8 18 ... Window max over 5 last elements
0 credits | 35 pages | 444.84 KB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Boston University 2020 ... Implementation • Load shedding is commonly implemented by a standalone component integrated with the stream processor • The load shedder continuously monitors input rates or ...
0 credits | 43 pages | 2.42 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... system execution that yields a system configuration. Validity (safety): Obtain a valid system configuration. Termination (liveness): A full system configuration is eventually captured. A snapshot algorithm ... Epoch-Completeness: Obtain an epoch-complete system configuration ... Vasiliki Kalavri | Boston University 2020 ...
0 credits | 81 pages | 13.18 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... Apache Flink • The TaskManagers ship data from sending tasks to receiving tasks. • The network component of a TaskManager collects records in buffers before they are shipped, i.e., records are not shipped ...
0 credits | 54 pages | 2.83 MB | 1 year ago
13 results in total