Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics, Spring 2020, Vasiliki Kalavri, Boston University (26 pages, 3.33 MB)
Exercise: given input tuples ...2, 4), (1, 5, 3)), what does inputStream.keyBy(0).sum(1).print() output for each key? Also covers coMap / coFlatMap (e.g. val factors: DataStream[(String, Double)]). Configuration: conf/flink-conf.yaml contains the configuration options as a collection of key-value pairs in the format key: value; a common option you might need to adjust is jobmanager.heap.size (the JobManager JVM heap size). Kafka basics: a topic identifies a category of stream records stored in a Kafka cluster; records consist of a key, a value, and a timestamp; a producer publishes a stream of records to a Kafka topic and a consumer...
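The keyBy/sum exercise can be worked through with a small simulation. This is a minimal sketch in plain Python (no Flink dependency) of the rolling-aggregate semantics of keyBy(0).sum(1): one updated record is emitted per input element, with field 1 summed per key. The input tuples below are illustrative, since the slide's full input list is truncated; also, Flink leaves the non-aggregated fields of an aggregate unspecified, and this sketch simply keeps the first value seen per key.

```python
def rolling_sum(records):
    """Simulate keyBy(0).sum(1): group by field 0, running sum of field 1."""
    state = {}  # key -> (key, running_sum, first_extra_field)
    out = []
    for key, value, extra in records:
        if key in state:
            k, s, e = state[key]
            state[key] = (k, s + value, e)  # accumulate field 1
        else:
            state[key] = (key, value, extra)  # first record for this key
        out.append(state[key])  # one output record per input record
    return out
```

For the illustrative input [(2, 2, 4), (1, 5, 3), (2, 3, 1), (1, 4, 2)], the sketch emits [(2, 2, 4), (1, 5, 3), (2, 5, 4), (1, 9, 3)]: each key's sum grows incrementally as its records arrive.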
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (41 pages, 4.09 MB)
Reconfiguring a running job requires the system to: ...resources, safely terminate processes; adjust dataflow channels and network connections; re-partition and migrate state in a consistent manner; and block and unblock computations to ensure result correctness.
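The "re-partition and migrate state" step can be sketched as follows. This is a toy illustration (it is not Flink's key-group mechanism; the names and hashing scheme are assumptions): when parallelism changes, each key is re-assigned by hash partitioning, and any key whose owning task changes must have its state migrated.

```python
def owner(key, parallelism):
    """Which task instance owns this key at the given parallelism."""
    return hash(key) % parallelism  # illustrative; real systems use a stable hash

def keys_to_migrate(keys, old_parallelism, new_parallelism):
    """Keys whose owner changes when rescaling, i.e. whose state must move."""
    return [k for k in keys
            if owner(k, old_parallelism) != owner(k, new_parallelism)]
```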
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (33 pages, 700.14 KB)
A log can be partitioned so that each partition can be read and written independently of the others; a topic is a set of partitions. Within each partition, every message carries an offset, a monotonically increasing sequence number; within a partition all messages are totally ordered, but there is no ordering guarantee across partitions. Failure handling: the broker does not... ...delays: if a message is slow to process, this delays processing of subsequent messages, as each partition is read by a single thread. What would you use when the priority is latency but not ordering?
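The partition/offset model above can be sketched with a toy in-memory log (an assumption for illustration; this is not Kafka's API): records are routed to a partition by key, each partition assigns monotonically increasing offsets, and so ordering holds within a partition but not across partitions.

```python
class ToyTopic:
    """A topic as a set of append-only partitions with per-partition offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        p = hash(key) % len(self.partitions)   # route by key
        offset = len(self.partitions[p])       # next offset in this partition
        self.partitions[p].append((offset, value))
        return p, offset

    def read(self, partition):
        """All records of one partition, in total (offset) order."""
        return list(self.partitions[partition])
```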
Monitoring Apache Flink Applications (Getting Started) (translated from Chinese: 监控Apache Flink应用程序(入门)), by caolei (23 pages, 148.62 KB)
Table-of-contents excerpt: sections 4.12.1, 4.13.1.1, and 4.13.2.1 are each titled "Key Metrics" (pp. 14, 17, 21).
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (53 pages, 532.37 KB)
...replicates a stream, commonly to be used as input to multiple downstream operators. Group by / Partition operators split a stream into sub-streams according to a function or the event contents. Pattern matching, a simpler approach:

SELECT 'modified-pattern123', X.CustomerId
FROM webevents PARTITION BY CustomerId
AS PATTERN (X Y Z)
WHERE X.Event = 'order' AND Y.Event = 'rebate'
  AND Z.Event = 'cancel' AND Z.ItemID = Y.ItemID

PARTITION BY partitions the stream into substreams according to a key; PATTERN (X Y Z) matches a sequence of events that immediately follow one another. With AS PATTERN (X V* Y W* Z), the starred variables match zero...
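The three-event pattern query can be simulated over plain tuples. This is a minimal sketch (the event model and function name are illustrative assumptions, not part of any pattern-matching language): per customer, detect an 'order' immediately followed by a 'rebate' and then a 'cancel' for the same item.

```python
from collections import defaultdict

def matches(events):
    """events: (customer_id, event, item_id) tuples in arrival order.
    Returns the customers with a matching order->rebate->cancel sequence."""
    by_customer = defaultdict(list)  # PARTITION BY CustomerId
    for e in events:
        by_customer[e[0]].append(e)
    hits = []
    for cust, es in by_customer.items():
        # sliding windows of three immediately consecutive events (X Y Z)
        for (_, xe, _), (_, ye, yi), (_, ze, zi) in zip(es, es[1:], es[2:]):
            if xe == "order" and ye == "rebate" and ze == "cancel" and zi == yi:
                hits.append(cust)
    return hits
```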
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (24 pages, 914.13 KB)
Keyed state is scoped to a key defined in the operator's input records. Flink maintains one state instance per key value and partitions all records with the same key to the operator task that maintains the state for that key. State access is automatically scoped to the key of the current record, so all records with the same key access the same state. State backends: RocksDB is an LSM-tree storage engine with a key/value interface, where keys and values are arbitrary byte streams (https://rocksdb.org/).
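The keyed-state scoping described above can be sketched with a toy in-memory backend (an assumption for illustration; this is not Flink's state API): each processing call is implicitly scoped to the current record's key, so records with the same key always see the same state instance.

```python
from collections import defaultdict

class KeyedCounter:
    """One state instance (a counter) per key, scoped automatically by key."""

    def __init__(self):
        self.state = defaultdict(int)

    def process(self, key):
        # access is scoped to the current record's key
        self.state[key] += 1
        return self.state[key]
```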
Scalable Stream Processing - Spark Streaming and Flink - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (113 pages, 1.22 MB)
...pairs where the values for each key are aggregated using the given reduce function. countByValue: returns a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.
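The two transformations can be simulated over one micro-batch. This is a minimal sketch with plain Python lists standing in for an RDD (an assumption for illustration; the first transformation is described but its name is truncated in the extract, reduceByKey in Spark's API):

```python
from collections import Counter, defaultdict

def reduce_by_key(pairs, f):
    """Aggregate the values of each key with the given reduce function."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    out = {}
    for k, vs in grouped.items():
        acc = vs[0]
        for v in vs[1:]:
            acc = f(acc, v)
        out[k] = acc
    return out

def count_by_value(values):
    """Map each distinct value to its frequency in the batch."""
    return dict(Counter(values))
```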
How Flink Analyzes CDC Data in an Iceberg Data Lake in Real Time (translated from Chinese: Flink如何实时分析Iceberg数据湖的CDC数据...) (36 pages, 781.69 KB)
Merge-On-Read; supports incremental reads, making further data transforms easier. Apache Iceberg basics: Data and Metadata; Database, Table, Partition Spec, Manifest File; TableMetadata, Snapshot, Current Table Version Pointer. Example changelog against a click_events table: ...click_events SET VALUES (...); TXN.3: DELETE FROM click_events WHERE primary_key = XX; TXN.4: UPDATE click_events SET VALUES(...) WHERE primary_key = XX. Snapshots, Manifests, Data/Delete Files; INSERT, FILES, DELETE...
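Applying a CDC changelog like the TXN examples above can be sketched with a toy in-memory table (an assumption for illustration; this is not Iceberg or Flink code): each change is keyed by primary key, and the final table state is the result of replaying inserts, updates, and deletes in order.

```python
def apply_changes(changes):
    """Replay a CDC changelog of (op, key[, value]) tuples into a table."""
    table = {}
    for op, key, *value in changes:
        if op in ("INSERT", "UPDATE"):
            table[key] = value[0]      # upsert by primary key
        elif op == "DELETE":
            table.pop(key, None)       # remove the row if present
    return table
```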
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (93 pages, 2.42 MB)
Reconfiguration requires the system to: ...resources, safely terminate processes; adjust dataflow channels and network connections; re-partition and migrate state in a consistent manner; and block and unblock computations to ensure result correctness. Migration strategies: ...migration if the state is large. Progressive: move state to be migrated in smaller pieces, e.g. key-by-key; this can be used to interleave state transfer with processing, though migration duration might increase.
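The progressive, key-by-key strategy can be sketched with toy dicts standing in for operator state (an assumption for illustration): state moves one small batch of keys at a time, so processing can be interleaved with transfer instead of pausing for one bulk move.

```python
def migrate_step(source, target, batch):
    """Move up to `batch` keys of state from source to target.
    Call repeatedly, interleaved with processing, until source is empty."""
    moved = list(source.keys())[:batch]
    for k in moved:
        target[k] = source.pop(k)
    return moved
```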
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics, Spring 2020 (72 pages, 7.77 MB)
The classic iterative model: 1. Load: read the graph from disk and partition it in memory. 2. Compute: read and mutate the graph state. Distributed stream connected components: 1. partition the edge stream, e.g. by source id; 2. maintain a disjoint set in each partition; 3. periodically merge the partial disjoint sets into...
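Step 2 of the streaming connected-components algorithm can be sketched for a single partition (a plain-Python illustration, without the distributed merge step): each incoming edge unions the components of its two endpoints in a disjoint-set (union-find) structure.

```python
class DisjointSet:
    """Union-find over vertices seen so far in the edge stream."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        # process one streamed edge (a, b)
        self.parent[self.find(a)] = self.find(b)

    def connected(self, a, b):
        return self.find(a) == self.find(b)
```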
共 17 条