PyFlink 1.15 Documentation

Run the word-count example and check its output:

    python3 word_count.py
    # You will see output like the following:
    # Use --input to specify file input.
    # Printing result to stdout. Use --output to specify output path.
    # +I[To, 1]
    # +I[be,, 1]

Check the logs in the log file to see if there are any problems:

    # Get the installation directory of PyFlink:
    python3 -c "import pyflink; import os; print(os.path.dirname(os.path.abspath(pyflink.__file__)))"
    # It writes logs under the log directory:
    ls -lh /path/to/python/site-packages/pyflink/log
    # You will see log files like the following:

36 pages | 266.77 KB
PyFlink 1.16 Documentation

Same getting-started excerpt as the 1.15 documentation above (running word_count.py and locating the PyFlink log directory), for release 1.16.

36 pages | 266.80 KB
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)

Configuration options: conf/flink-conf.yaml contains the configuration options as a collection of key-value pairs in the format key: value.

Flink commands. Run a packaged example program:

    ./bin/flink run ./examples/batch/WordCount.jar \
      --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out

Run with an explicit class entry point (the -c/--class option) and arguments:

    ./bin/flink run -c <entry-class> ./examples/batch/WordCount.jar \
      --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out

26 pages | 3.33 MB
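The same key: value options that flink-conf.yaml holds can also be set programmatically when creating a local environment. A minimal sketch, assuming the standard Flink streaming API; the option value and job body are chosen purely for illustration:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalConfigExample {
        public static void main(String[] args) throws Exception {
            // The same key-value pairs that flink-conf.yaml holds can be passed
            // as a Configuration object when creating a local environment.
            Configuration conf = new Configuration();
            conf.setString("taskmanager.numberOfTaskSlots", "4");
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(1, conf);
            env.fromElements("to", "be", "or", "not", "to", "be").print();
            env.execute("config-demo");
        }
    }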
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)

State backends checkpoint state to a distributed filesystem or a database system. Available state backends in Flink: in-memory, file system, and RocksDB.

MemoryStateBackend
• Use only for development and debugging purposes!

FsStateBackend
• Stores state on the TaskManager's heap but checkpoints it to a remote file system
• In-memory speed for local accesses, plus fault tolerance
• Limited by the TaskManager's memory

RocksDBStateBackend
• Stores state in embedded RocksDB instances
• Accesses require de/serialization
• Checkpoints state to a remote file system and supports incremental checkpoints
• Use for applications with very large state

24 pages | 914.13 KB
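For the Flink 1.x API line this lecture targets, selecting RocksDB with incremental checkpoints looks roughly like the sketch below; the checkpoint URI is a placeholder, and the checkpoint interval is chosen for illustration:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StateBackendExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            // Working state lives in embedded RocksDB instances on each TaskManager;
            // checkpoints go to a remote file system. `true` enables incremental
            // checkpoints, which upload only the state changed since the last one.
            env.setStateBackend(
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
            env.enableCheckpointing(10_000); // take a checkpoint every 10 seconds
            // ... define sources, transformations, sinks, then env.execute(...)
        }
    }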
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)

Highly available setup:
• The JobManager writes the JobGraph and all required metadata, such as the application's JAR file, into a remote persistent storage system
• ZooKeeper also holds state handles and checkpoint locations

On recovery, the new JobManager performs the following steps:
1. It requests the storage locations from ZooKeeper to fetch the JobGraph, the JAR file, and the state handles of the last checkpoint from remote storage.
2. It requests processing slots ...

Reconfiguration: Control - when and how much to adapt? Mechanism - how to apply the reconfiguration?
• Detect environment changes: external workload and system performance
• Identify the bottleneck ...

41 pages | 4.09 MB
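A toy sketch of that recovery sequence, with hypothetical CoordinationStore/RemoteStorage interfaces standing in for ZooKeeper and the persistent store — these names and signatures are illustrative only, not Flink's internal API:

    // Hypothetical interfaces for illustration only — not Flink's real internals.
    interface CoordinationStore {
        String storageLocationFor(String jobId);   // e.g., backed by ZooKeeper
    }
    interface RemoteStorage {
        JobArtifacts fetch(String location);       // e.g., a distributed file system
    }
    record JobArtifacts(byte[] jobGraph, byte[] jarFile, byte[] checkpointHandles) {}

    class RecoveringJobManager {
        private final CoordinationStore coordination;
        private final RemoteStorage storage;

        RecoveringJobManager(CoordinationStore coordination, RemoteStorage storage) {
            this.coordination = coordination;
            this.storage = storage;
        }

        JobArtifacts recover(String jobId) {
            // 1. Ask the coordination service where the job's metadata lives.
            String location = coordination.storageLocationFor(jobId);
            // 2. Fetch the JobGraph, the JAR, and the last checkpoint's state handles.
            return storage.fetch(location);
            // 3. (Not shown) request processing slots and restart tasks from the
            //    restored state handles.
        }

        public static void main(String[] args) {
            RecoveringJobManager jm = new RecoveringJobManager(
                jobId -> "s3://recovery/" + jobId,   // fake coordination lookup
                location -> new JobArtifacts(new byte[0], new byte[0], new byte[0]));
            System.out.println(jm.recover("job-42") != null ? "recovered" : "failed");
        }
    }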
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)

A snapshot algorithm runs alongside a system execution and yields a system configuration. It must satisfy:
• Validity (safety): obtain a valid system configuration
• Termination (liveness): a full system configuration is eventually captured
• Epoch-completeness: obtain an epoch-complete system configuration

81 pages | 13.18 MB
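Flink realizes these properties with an aligned-barrier variant of the Chandy-Lamport snapshot algorithm. A simplified sketch of barrier alignment at one operator with multiple input channels — class and method names are illustrative, not Flink's API:

    import java.util.HashSet;
    import java.util.Set;

    // Block each input channel once its barrier for the current checkpoint
    // arrives; when barriers from all channels are aligned, snapshot operator
    // state, forward the barrier downstream, and unblock.
    class BarrierAligner {
        private final int numInputChannels;
        private final Set<Integer> blocked = new HashSet<>();

        BarrierAligner(int numInputChannels) {
            this.numInputChannels = numInputChannels;
        }

        /** Called when a checkpoint barrier arrives on a channel. */
        void onBarrier(int channel, long checkpointId) {
            blocked.add(channel); // buffer further records from this channel
            if (blocked.size() == numInputChannels) {
                snapshotState(checkpointId);   // local state -> durable storage
                forwardBarrier(checkpointId);  // downstream operators align next
                blocked.clear();               // resume all channels
            }
        }

        /** Records on a blocked channel must be buffered, not processed. */
        boolean isBlocked(int channel) {
            return blocked.contains(channel);
        }

        private void snapshotState(long checkpointId) { /* write state snapshot */ }
        private void forwardBarrier(long checkpointId) { /* emit barrier downstream */ }

        public static void main(String[] args) {
            BarrierAligner aligner = new BarrierAligner(2);
            aligner.onBarrier(0, 1L);
            System.out.println("channel 0 blocked: " + aligner.isBlocked(0)); // true
            aligner.onBarrier(1, 1L); // alignment complete: snapshot + forward
            System.out.println("channel 0 blocked: " + aligner.isBlocked(0)); // false
        }
    }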
Streaming in Apache Flink

Flink state:
• ... grows and shrinks
• queryable: Flink state can be queried via a REST API

Rich Functions provide:
• open(Configuration c)
• close()
• getRuntimeContext()

Keyed-state fragments (generic type parameters were lost in extraction; placeholders marked <...> — a completed sketch follows this entry):

    DataStream<...> input = ...

    private ValueState<...> averageState;

    @Override
    public void open(Configuration conf) {
        ValueStateDescriptor<...> descriptor =
            new ValueStateDescriptor<>("moving ...

    private ValueState<Boolean> blocked;

    @Override
    public void open(Configuration config) {
        blocked = getRuntimeContext().getState(
            new ValueStateDescriptor<>("blocked", Boolean...

45 pages | 3.00 MB
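A self-contained version of the moving-average fragment above. The garbled generics are filled in with Tuple2<Long, Long> holding (count, sum), as in the common Flink training example — that state type is an assumption recovered from context, not the original slide:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Per-key moving average over (key, value) events; state holds (count, sum).
    public class MovingAverage
            extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Double>> {

        private transient ValueState<Tuple2<Long, Long>> averageState;

        @Override
        public void open(Configuration conf) {
            ValueStateDescriptor<Tuple2<Long, Long>> descriptor =
                new ValueStateDescriptor<>("moving average",
                    Types.TUPLE(Types.LONG, Types.LONG));
            averageState = getRuntimeContext().getState(descriptor);
        }

        @Override
        public void flatMap(Tuple2<String, Long> input,
                            Collector<Tuple2<String, Double>> out) throws Exception {
            Tuple2<Long, Long> current = averageState.value(); // null on first event
            if (current == null) {
                current = Tuple2.of(0L, 0L);
            }
            current.f0 += 1;        // count
            current.f1 += input.f1; // running sum
            averageState.update(current);
            out.collect(Tuple2.of(input.f0, current.f1 / (double) current.f0));
        }
    }

It would be applied to a keyed stream, e.g. input.keyBy(t -> t.f0).flatMap(new MovingAverage()).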
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (Vasiliki Kalavri, Boston University)

Control: when and how much to adapt? Mechanism: how to apply the reconfiguration?
• Detect environment changes: external workload and system performance
• Identify the bottleneck ...

Flink wordcount: every reconfiguration takes ~30 s, during which the system is unavailable. Reconfiguration requires state migration with correctness guarantees.

Live state migration: only complete state is migrated; helpers buffer data that cannot yet be safely routed, and configuration commands that cannot yet be applied.

93 pages | 2.42 MB
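Flink makes such migration coarse-grained through key groups: keys hash into a fixed number of groups, and rescaling only reassigns contiguous group ranges to operator instances. A simplified sketch of the assignment scheme (Flink additionally murmur-hashes the key hash, which is omitted here):

    // Rescaling moves whole key-group ranges, not individual keys, so state
    // migrates in a few coarse chunks instead of a per-key reshuffle.
    final class KeyGroupSketch {
        static int keyGroupFor(Object key, int maxParallelism) {
            // Simplified: Flink applies an extra murmur hash to key.hashCode().
            return Math.floorMod(key.hashCode(), maxParallelism);
        }

        static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
            // Contiguous key-group ranges map to operator instances.
            return keyGroup * parallelism / maxParallelism;
        }

        public static void main(String[] args) {
            int maxParallelism = 128;
            // Scaling from 2 to 3 instances only reassigns key-group ranges:
            for (String key : new String[] {"to", "be", "or", "not"}) {
                int group = keyGroupFor(key, maxParallelism);
                System.out.printf("key=%s group=%d p2->%d p3->%d%n",
                    key, group,
                    operatorIndexFor(group, maxParallelism, 2),
                    operatorIndexFor(group, maxParallelism, 3));
            }
        }
    }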
How Flink analyzes CDC data in an Iceberg data lake in real time (Flink如何实时分析Iceberg数据湖的CDC数据)

Apache Iceberg basics - the data and metadata layers: Database, Table, Partition Spec, Manifest File, TableMetadata, Snapshot, and the Current Table Version Pointer.

Row-level changes (CDC) are encoded as delete files next to the data files: a position delete removes the row at a given position of one specific data file (e.g., "(file2, 0)"), while an equality delete removes every row whose key columns match the given values. Queries such as SELECT * FROM sample merge the data files with mixed position-delete and equality-delete files at read time.

[Figure: worked example applying position-delete and equality-delete files to data file1 and data file2; row-level details not recoverable from the extraction.]

36 pages | 781.69 KB
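A toy illustration of the two delete-file encodings at read time — this is not Iceberg's API, and the single-integer equality key is an assumption for brevity:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // A position delete drops row N of one specific data file; an equality
    // delete drops every row whose key column matches.
    class IcebergDeleteSketch {
        record Row(int id, String data) {}

        static List<Row> applyDeletes(String fileName, List<Row> rows,
                                      Map<String, Set<Integer>> positionDeletes,
                                      Set<Integer> equalityDeletedIds) {
            Set<Integer> deadPositions = positionDeletes.getOrDefault(fileName, Set.of());
            List<Row> alive = new ArrayList<>();
            for (int pos = 0; pos < rows.size(); pos++) {
                Row r = rows.get(pos);
                if (deadPositions.contains(pos)) continue;         // position delete
                if (equalityDeletedIds.contains(r.id())) continue; // equality delete
                alive.add(r);
            }
            return alive;
        }

        public static void main(String[] args) {
            List<Row> file2 = List.of(new Row(1, "a"), new Row(2, "b"));
            // A "(file2, 0)" position delete plus an equality delete on id = 2:
            System.out.println(applyDeletes("file2", file2,
                Map.of("file2", Set.of(0)), Set.of(2))); // -> []
        }
    }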
Scalable Stream Processing - Spark Streaming and Flink

Spark Streaming offers three categories of streaming sources:
1. Basic sources directly available in the StreamingContext API, e.g., file systems, socket connections.
2. Advanced sources, e.g., Kafka, Flume, Kinesis, Twitter.
3. Custom sources ...

▶ Socket stream: creates a DStream from text data received over a TCP socket connection.

    ssc.socketTextStream("localhost", 9999)

▶ File stream: reads data from files.

    streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)

113 pages | 1.22 MB
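The Scala snippets above have direct Java equivalents. A minimal sketch of consuming the basic socket source with Spark Streaming's Java API; it assumes a text server is listening on the port (e.g., started with nc -lk 9999):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class SocketSourceExample {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SocketSource");
            // Micro-batch interval of 1 second.
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));
            // Basic source: text lines received over a TCP socket connection.
            JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", 9999);
            lines.print(); // emit the first elements of each batch to stdout
            ssc.start();
            ssc.awaitTermination();
        }
    }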
共 14 条
- 1
- 2













