 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR0 码力 | 81 页 | 13.18 MB | 1 年前3 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR Xmibw3oxSDmPYljzijxko+ds8evW6l6tbcKcgi8Qp ShQKNbuWr0tYFqM0TFCt256bmiCnynAmcFzuZBpTyoa0j21LJY1RB/n02DE5tkqPRImyJQ2Zqr8nchprPYpD2xlTM9Dz3kT8z2tnJroMci7TzKBks0VRJohJyORz0uMKmREjSyhT3N5K2IAqyozNp2xD8OZfXiT+ae2q5t2dV+vXR0 码力 | 81 页 | 13.18 MB | 1 年前3
 Flink如何实时分析Iceberg数据湖的CDC数据2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON GE=DE.GE=D.< = chan>=E.GE=D.< WHEN MAT(HE) AN) +LA,=H)H THEN )ELETE WHEN MAT(HE) AN) +LA,<>H)H I (1,2 I (1,2 D (1,2 I (1,2 D (1,2 I (1,2 IN-E,T(1,2 DE(ETE(1,2 IN-E,T(1,2 问题:)1rg1-On-,1adL终读取RM为空集,实际I该返E I(1,2 。 IN-E,T DE(ETE data 2il11 data 2il11 1quality d1l1t1 2il11 data 2il11 1quality (1,3 , (1,4 , (1,3 , (1,4 , (1,( 53t3 file2 , (1,2 DE-E2E (1,( 53t3 file1 , (1,3 , (1,4 , (1,( 53t3 file2 , (1,2 DE-E2E (1,4 53t3 file1 , (1,3 , (1,4 (file2, 0 , (1,( 53t30 码力 | 36 页 | 781.69 KB | 1 年前3 Flink如何实时分析Iceberg数据湖的CDC数据2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON GE=DE.GE=D.< = chan>=E.GE=D.< WHEN MAT(HE) AN) +LA,=H)H THEN )ELETE WHEN MAT(HE) AN) +LA,<>H)H I (1,2 I (1,2 D (1,2 I (1,2 D (1,2 I (1,2 IN-E,T(1,2 DE(ETE(1,2 IN-E,T(1,2 问题:)1rg1-On-,1adL终读取RM为空集,实际I该返E I(1,2 。 IN-E,T DE(ETE data 2il11 data 2il11 1quality d1l1t1 2il11 data 2il11 1quality (1,3 , (1,4 , (1,3 , (1,4 , (1,( 53t3 file2 , (1,2 DE-E2E (1,( 53t3 file1 , (1,3 , (1,4 , (1,( 53t3 file2 , (1,2 DE-E2E (1,4 53t3 file1 , (1,3 , (1,4 (file2, 0 , (1,( 53t30 码力 | 36 页 | 781.69 KB | 1 年前3
 High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020much input do we need to re-play? How expensive is it to re-construct the state? How fast can we de-duplicate output? Vasiliki Kalavri | Boston University 2020 Gap Recovery • Restart the operator net/fig/5-2 • Receivers store a catalog of all identifiers they have seen and processed. • The de-duplication catalog is stored in a scalable key/value store. Vasiliki Kalavri | Boston University Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive • In a healthy pipeline though, most records will not be duplicates •0 码力 | 49 页 | 2.08 MB | 1 年前3 High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020much input do we need to re-play? How expensive is it to re-construct the state? How fast can we de-duplicate output? Vasiliki Kalavri | Boston University 2020 Gap Recovery • Restart the operator net/fig/5-2 • Receivers store a catalog of all identifiers they have seen and processed. • The de-duplication catalog is stored in a scalable key/value store. Vasiliki Kalavri | Boston University Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive • In a healthy pipeline though, most records will not be duplicates •0 码力 | 49 页 | 2.08 MB | 1 年前3
 Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020Stefani, Lorenzo De, et al. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 2017. https://www.kdd.org/ kdd2016/papers/files/rfp0465-de-stefaniA.pdf Further0 码力 | 72 页 | 7.77 MB | 1 年前3 Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020Stefani, Lorenzo De, et al. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 2017. https://www.kdd.org/ kdd2016/papers/files/rfp0465-de-stefaniA.pdf Further0 码力 | 72 页 | 7.77 MB | 1 年前3
 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020unsubscribe advertise(): information reg. future events Publish/Subscribe Systems 17 Pub/Sub levels of de-coupling • Space: interacting parties do not need to know each other • Publishers do not know who0 码力 | 33 页 | 700.14 KB | 1 年前3 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020unsubscribe advertise(): information reg. future events Publish/Subscribe Systems 17 Pub/Sub levels of de-coupling • Space: interacting parties do not need to know each other • Publishers do not know who0 码力 | 33 页 | 700.14 KB | 1 年前3
 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 RocksDBStateBackend • Stores all state into embedded RocksDB instances • Accesses require de/serialization • Checkpoints state to a remote file system and supports incremental checkpoints •0 码力 | 24 页 | 914.13 KB | 1 年前3 State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 RocksDBStateBackend • Stores all state into embedded RocksDB instances • Accesses require de/serialization • Checkpoints state to a remote file system and supports incremental checkpoints •0 码力 | 24 页 | 914.13 KB | 1 年前3
 PyFlink 1.15 Documentationin a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.77 KB | 1 年前3 PyFlink 1.15 Documentationin a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.77 KB | 1 年前3
 PyFlink 1.16 Documentationin a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.80 KB | 1 年前3 PyFlink 1.16 Documentationin a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.80 KB | 1 年前3
共 8 条
- 1













