Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020secondary I1 O1 N’i I1 Vasiliki Kalavri | Boston University 2020 Passive Standby • Each primary periodically checkpoints its state and sends it to the secondary 6 Ni primary secondary I1 O1 N’i update net/pubs/pubs.html#chandy 13 ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols p1 p2 p3 mAB53icbVBNS8NAEJ34WetX1aOXxSJ4Kok I6q3o Giv5uc9ur1qza27M5Bl4hWkBgWavepXt5+wLEZpmKBadzw3NUFOleFM 4KTSzTSmlI3oADuWShqjDvLZsRNyYpU+iRJlSxoyU39P5DTWehyHtjOmZqgXvan4n9fJTHQZ5FymUHJ5oui TBCTkOnpM8VMiPGlCmuL2VsCFVlBmbT8WG4C2+vEz8s/pV3bs7rzWuizTKcA 0 码力 | 81 页 | 13.18 MB | 1 年前3
Flink如何实时分析Iceberg数据湖的CDC数据、CDC记录实时写入HBase。高吞P + 低延迟。 2、小vSg询延迟低。 3、集u可拓展 ci评C B点 、行存o引不适O分析A务。 2、HBase集ur护成e较高。 3、通过Re12o4Server定DHF23e, ServerlB化Rs存完H用不上。 4、数a格式q定HF23e,不cF拓展到 +arquet、Avro、Orcn。 t点 A3a/21 Kudu 维护 CDC 数据p 、支持L时更新数据,时效性佳。 、支持L时更新数据,时效性佳。 2、CK加速,适合OLAP分析。 方案评估 优点 、cedKudup群,a较小众。维护 O本q。 2、H HDFS / S3 / OSS 等D裂。数据c e,且KAO本不如S3 / OSS。 3、Kudud批量P描不如3ar4u1t。 4、不支持增量SF。 h点 直接D入CDC到Hi2+分析 、流程能E作 2、Hi2+存量数据不受增量数据H响。 方案评估 2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON GE=DE.GE=D.< = chan>=E.GE=D.< WHEN MAT(HE) AN) +LA,=H)H THEN )ELETE WHEN MAT(HE) AN) +LA,<>H)H0 码力 | 36 页 | 781.69 KB | 1 年前3
PyFlink 1.15 Documentation. . . . . . . . . . . . . . . . . . . 22 1.3.1.1 O1: Scala Dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.3.1.2 O2: Java gateway process exited before sending its port . . . . . . . . . . . . . . . . . . . 24 1.3.2.1 O1: How to prepare Python Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . . . . . . . . . . . issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.3.3.1 O1: InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible:0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentation. . . . . . . . . . . . . . . . . . . 22 1.3.1.1 O1: Scala Dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1.3.1.2 O2: Java gateway process exited before sending its port . . . . . . . . . . . . . . . . . . . 24 1.3.2.1 O1: How to prepare Python Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . . . . . . . . . . . issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.3.3.1 O1: InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible:0 码力 | 36 页 | 266.80 KB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020other apps I1 I2 O1 O2 N’1 N’K N’2 … I’1 I’2 O’1 O’2 6 Vasiliki Kalavri | Boston University 2020 Assumptions Ni primary secondary I1 I2 O1 O2 N’i I’1 I’2 O’1 O’2 • The communication let Of be the pre-failure execution of the primary and O’ the output produced by the secondary after recovery. • Precise recovery guarantees Of + O’ = Oe • Rollback recovery allows duplicate tuples downstream: much input do we need to re-play? How expensive is it to re-construct the state? How fast can we de-duplicate output? Vasiliki Kalavri | Boston University 2020 Gap Recovery • Restart the operator0 码力 | 49 页 | 2.08 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020unsubscribe advertise(): information reg. future events Publish/Subscribe Systems 17 Pub/Sub levels of de-coupling • Space: interacting parties do not need to know each other • Publishers do not know who assembled from the following sources: • Martin Kleppmann. Designing data-intensive applications (O’Reilly Media) • Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec0 码力 | 33 页 | 700.14 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 RocksDBStateBackend • Stores all state into embedded RocksDB instances • Accesses require de/serialization • Checkpoints state to a remote file system and supports incremental checkpoints • using a ReduceFunction • ReducingState.add(value: T) • ReducingState.get() • AggregatingState[I, O]: aggregates values using an AggregateFunction Flink’s state primitives 14 Vasiliki Kalavri | Boston0 码力 | 24 页 | 914.13 KB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020serialization send message waiting waiting 13 ??? Vasiliki Kalavri | Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck? What if Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck? What if we scale ο1 x 4? How much to scale ο2? o1 cannot keep up waiting for for output waiting for input src o1 o2 ??? Vasiliki Kalavri | Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck? What if we0 码力 | 93 页 | 2.42 MB | 1 年前3
Apache Flink的过去、现在和未来Processing & Streaming Analytics Event-driven Applications ✔ ✔ 未来 Micro Services O_0 O_1 I_0 I_1 I_2 P_0 P_1 P_2 S_0 S_1 Order Inventory Payment Shipping Flow-Control Async Call Auto Scale0 码力 | 33 页 | 3.36 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020c=5 s=1.0 O ??? Vasiliki Kalavri | Boston University 2020 Overload detection (II) 12 Load coefficient for input I: Total load over m inputs: I c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 O 5 ??? Vasiliki c=5 s=1.0 O 5 12.5 ??? Vasiliki Kalavri | Boston University 2020 Overload detection (II) 12 Load coefficient for input I: Total load over m inputs: I c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 O 5 12.5 5 L2=18.75 O2 I1 c=10 s=0.5 c=10 s=0.8 c=5 s=1.0 O1 c=10 s=0.9 5 ??? Vasiliki Kalavri | Boston University 2020 13 I2 c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 12.5 L2=18.75 O2 I1 c=10 s=00 码力 | 43 页 | 2.42 MB | 1 年前3
共 20 条
- 1
- 2













