PyFlink 1.15 Documentation
Try PyFlink out without any other step: • Live Notebook: Table • Live Notebook: DataStream. The list below is the contents of this quickstart page: 1.1.1 Installation, 1.1.1.1 Preparation. … If there are any problems, you can perform the following checks. Check the logging messages in the log file to see if there are any problems:
# Get the installation directory of PyFlink
python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__file__)))"
/path/to/python/site-packages/pyflink
# Check the logs under the log directory
ls -lh /path/to/python/site-packages/pyflink/log
# You will see the log files listed as follows: …
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
Try PyFlink out without any other step: • Live Notebook: Table • Live Notebook: DataStream. The list below is the contents of this quickstart page: 1.1.1 Installation, 1.1.1.1 Preparation. … If there are any problems, you can perform the following checks. Check the logging messages in the log file to see if there are any problems (a runnable version of this check follows below):
# Get the installation directory of PyFlink
python3 -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__file__)))"
/path/to/python/site-packages/pyflink
# Check the logs under the log directory
ls -lh /path/to/python/site-packages/pyflink/log
# You will see the log files listed as follows: …
36 pages | 266.80 KB | 1 year ago
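Both PyFlink entries above describe the same diagnostic: find where PyFlink is installed and inspect its log directory. Below is a minimal Python version of that check, assuming PyFlink is installed in the current interpreter and that logs live under `<install dir>/log` as the excerpts state (the layout could differ between releases):

```python
# Locate the PyFlink installation and list its log files.
import os
import pyflink  # assumes `pip install apache-flink` has been run

install_dir = os.path.dirname(os.path.abspath(pyflink.__file__))
print("PyFlink is installed at:", install_dir)

log_dir = os.path.join(install_dir, "log")  # log location per the excerpts
if os.path.isdir(log_dir):
    for name in sorted(os.listdir(log_dir)):
        size = os.path.getsize(os.path.join(log_dir, name))
        print(f"{size:>10} bytes  {name}")
else:
    print("no log directory yet; run a job in local mode first")
```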
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
• Change parallelism: scale out to process increased load, scale in to save resources • Fix bugs or change business logic • Optimize the execution plan • Change operator placement. Streaming applications are long-running: • Workload will change • Conditions might change • State is accumulated over time. [Figure: event rate (events/s) over time, showing a rate decrease.] • Operators with keyed state are scaled by repartitioning keys • Operators with operator list state are scaled by redistributing the list entries • Operators with operator broadcast state are scaled up by copying the state to the new tasks. (A conceptual sketch of the three redistribution schemes follows below.)
41 pages | 4.09 MB | 1 year ago
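A conceptual Python sketch of the three redistribution schemes listed in this entry. It is not the Flink API (Flink assigns keys to key groups rather than hashing them straight to tasks, and the task counts here are made up); the point is only where each kind of state ends up after rescaling:

```python
# Conceptual rescaling of the three state types from the slide excerpt.

def rescale_keyed(state_by_key, new_parallelism):
    """Keyed state: repartition keys, e.g. by hashing each key to a task."""
    tasks = [dict() for _ in range(new_parallelism)]
    for key, value in state_by_key.items():
        tasks[hash(key) % new_parallelism][key] = value
    return tasks

def rescale_list(entries, new_parallelism):
    """Operator list state: redistribute individual entries, e.g. round-robin."""
    tasks = [[] for _ in range(new_parallelism)]
    for i, entry in enumerate(entries):
        tasks[i % new_parallelism].append(entry)
    return tasks

def rescale_broadcast(state, new_parallelism):
    """Broadcast state: every task, old and new, gets a full copy."""
    return [dict(state) for _ in range(new_parallelism)]

print(rescale_list(["a", "b", "c", "d", "e"], 4))  # [['a','e'], ['b'], ['c'], ['d']]
```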
Flink如何实时分析Iceberg数据湖的CDC数据 (How Flink analyzes CDC data in an Iceberg data lake in real time)
… description, weight … mysql_binlog; github.com/ververica/flink-cdc-connectors. Flink natively supports change log streams. [Figure: a change log stream over records A, C, D, E, F, G, interleaving INSERT, DELETE, and UPDATE operations.] Flink + Iceberg. (A small sketch of materializing such a change log follows below.)
36 pages | 781.69 KB | 1 year ago
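The entry describes Flink consuming a change log stream, as produced by the CDC connectors it cites. Below is a small sketch of what materializing such a stream means; the record shape (operation, key, value) is hypothetical, though Flink's own changelog rows carry a similar RowKind tag:

```python
# Apply a change log stream, in order, to reconstruct current table state.
changelog = [
    ("INSERT", "A", 1),
    ("INSERT", "C", 2),
    ("UPDATE", "A", 5),
    ("DELETE", "C", None),
    ("INSERT", "G", 7),
]

table = {}
for op, key, value in changelog:
    if op in ("INSERT", "UPDATE"):
        table[key] = value
    elif op == "DELETE":
        table.pop(key, None)

print(table)  # {'A': 5, 'G': 7}
```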
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… window ids start from 1. • e.g., if ε = 0.2, w = 5 (5 items per window) • wcur: the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window fills up, we remove elements whose frequency plus error is at most wcur. Lossy counting algorithm: D = {} (empty list), wcur = 1 (first window id), N = 0 (elements seen so far). Insert step: for each element x in window wcur, … When the algorithm terminates, D contains an item x if its actual frequency fx > ε·N. Worst case: O((1/ε) · log(εN)) counters. The power of two choices: instead, … (A runnable sketch of lossy counting follows below.)
31 pages | 1.47 MB | 1 year ago
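A runnable sketch of the lossy counting algorithm the excerpt outlines, filling in the pruning rule at window boundaries; variable names follow the slides (D, wcur), and the window size w = ⌈1/ε⌉:

```python
import math

def lossy_count(stream, epsilon):
    w = math.ceil(1 / epsilon)      # items per window, e.g. eps=0.2 -> w=5
    D = {}                          # element -> (frequency, max associated error)
    w_cur = 1                       # current window id
    for n, x in enumerate(stream, start=1):
        if x in D:
            f, e = D[x]
            D[x] = (f + 1, e)
        else:
            D[x] = (1, w_cur - 1)   # may have been missed in earlier windows
        if n % w == 0:              # window boundary: prune infrequent entries
            D = {k: (f, e) for k, (f, e) in D.items() if f + e > w_cur}
            w_cur += 1
    return D

stream = list("aabacabbdaecabab")
print(lossy_count(stream, epsilon=0.2))  # {'a': (7, 0), 'b': (4, 1)}
```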
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… NASDAQ stock trades over time. This model poses a severe limitation on the stream: updates cannot change past entries in A. Useful in theory for the development of streaming algorithms with limited … represented in various ways: • as the concatenation of serializations of the relations • as a list of tuple-index pairs ⟨t, j⟩, where ⟨t, j⟩ indicates that t ∈ rj • as a serialization of r1 followed by … (A tiny example of the pair representation follows below.)
45 pages | 1.22 MB | 1 year ago
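A tiny example of the tuple-index-pair representation named in the second bullet, under the assumption of m = 2 relations; the pair ⟨t, j⟩ simply records that tuple t belongs to relation rj, and arrival order is arbitrary:

```python
# Reconstruct m relations from a single stream of (tuple, index) pairs.
pairs = [("alice", 1), ("bob", 2), ("carol", 1), ("alice", 2)]

m = 2
relations = {j: set() for j in range(1, m + 1)}
for t, j in pairs:
    relations[j].add(t)

print(relations)  # {1: {'alice', 'carol'}, 2: {'bob', 'alice'}}
```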
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Let h be a hash function that maps each stream element into M = log2 N bits, where N is the domain of the input elements. For each element x, let rank(x) be the number of trailing zeros in h(x), and let R be the maximum value of rank(·) seen so far. Claim: the maximum observed rank is a good estimate of log2 n. In other words, the estimated number of distinct elements is n̂ = 2^R. … We need a hash function that maps each input element to log2 n bits. Then, each counter needs to be able to count up to log2 n zeros, i.e., it needs log2(log2 n) bits. … Combining … (A runnable sketch of this estimator follows below.)
69 pages | 630.01 KB | 1 year ago
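A runnable sketch of the rank-based estimator described above. The concrete hash and the trailing-zero definition of rank are assumptions consistent with the excerpt's n̂ = 2^R; note that a single estimator is very noisy, which is presumably why the truncated "Combining" slide averages many of them:

```python
import hashlib

def h(x, bits=32):
    # Stable hash of x to a `bits`-bit integer (sha1 chosen for the demo).
    digest = hashlib.sha1(str(x).encode()).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** bits)

def rank(v, bits=32):
    # Number of trailing zeros in the binary representation of v.
    if v == 0:
        return bits
    r = 0
    while v & 1 == 0:
        v >>= 1
        r += 1
    return r

def estimate_distinct(stream):
    R = 0
    for x in stream:
        R = max(R, rank(h(x)))
    return 2 ** R  # n-hat = 2^R

stream = [i % 1000 for i in range(10_000)]  # 1000 distinct elements
print(estimate_distinct(stream))            # rough power-of-two estimate
```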
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… subscription's queue of messages. Log-structured brokers. Logs as message brokers: • In typical message brokers, once a message is consumed it is deleted • Log-based message brokers take a different approach: messages are persisted in a (possibly partitioned) log • A log is an append-only sequence of records on disk • A producer generates messages by simply appending them to the log, and a consumer receives messages by reading the log sequentially. Partitions and offsets: • A log can be partitioned, so that each partition can be read and written independently of the others • A topic is a set of partitions • Within each partition, every record is assigned a monotonically increasing offset. (A minimal in-memory sketch follows below.)
33 pages | 700.14 KB | 1 year ago
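A minimal in-memory sketch of the partitioned, append-only log the excerpt describes; the class name and key-hash routing are illustrative, not any broker's actual API, and Python's salted hash() stands in for a stable partitioner:

```python
class PartitionedLog:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, message):
        # Route by key; the message's offset is its position in the partition.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(message)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        # Reading never deletes: each consumer just advances its own offset.
        return self.partitions[partition][offset:]

log = PartitionedLog(num_partitions=2)
print(log.append("user-1", "click"))   # e.g. (0, 0)
print(log.append("user-1", "scroll"))  # same partition, next offset
p, _ = log.append("user-2", "click")
print(log.read(p, 0))
```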
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
… the Kafka cluster maintains a partitioned log. Each partition is an ordered, immutable sequence of records that is continually appended to: a structured commit log. An offset is a sequential id number … from time to time to free up disk space. Partitions allow the log to scale beyond a size that will fit on a single server; each individual partition must fit on the servers that host it. … If a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log. • A consumer instance sees records in the order they are stored in the log. • For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log. (A hedged client sketch follows below.)
26 pages | 3.33 MB | 1 year ago
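A hedged client-side sketch using the third-party kafka-python package (one of several Python clients, not part of Kafka itself); it assumes a broker is reachable at localhost:9092 and that the topic name "trades" exists or auto-creation is enabled:

```python
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send("trades", f"record-{i}".encode())
producer.flush()  # block until all records are acknowledged

consumer = KafkaConsumer(
    "trades",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # start from the beginning of the log
    consumer_timeout_ms=5000,       # stop iterating after 5s of silence
)
for record in consumer:
    # Offsets are per partition and strictly increasing, as described above.
    print(record.partition, record.offset, record.value)
```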
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
• Assumptions: • DAG of tasks • Epoch change events triggered on each source task (⟨ep1⟩, ⟨ep2⟩, …) • Issued by a coordinator or generated periodically. Epoch cuts: an epoch-complete consistent cut includes events that 1. precede the epoch change, and 2. are produced by events in the cut. (A toy alignment sketch follows below.)
81 pages | 13.18 MB | 1 year ago
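A toy sketch of how a two-input task could align on the epoch-change markers mentioned above, so that its state at the cut reflects exactly the events that precede the epoch change. This is a simplification of Flink's barrier alignment, with made-up event values, and it assumes every input eventually delivers a marker:

```python
from collections import deque

def align(inputs):
    """inputs: dict name -> deque of events; 'EP' marks an epoch change."""
    blocked, pre_epoch = set(), []
    while len(blocked) < len(inputs):
        for name, queue in inputs.items():
            if name in blocked or not queue:
                continue
            event = queue.popleft()
            if event == "EP":
                blocked.add(name)   # stop reading this input until all align
            else:
                pre_epoch.append(event)
    # Unread events stay queued: they belong to the next epoch.
    return pre_epoch

inputs = {"a": deque([1, 2, "EP", 5]), "b": deque([3, "EP", 4])}
print(align(inputs))  # [1, 3, 2] -- events 4 and 5 follow the epoch change
```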
共 19 条













