Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020operators have completely processed them. 4 Vasiliki Kalavri | Boston University 2020 Active Standby 5 • The secondary receives tuples from upstream and processes them in parallel with the primary Ni of each task to a remote, persistent storage 4. Wait until all tasks have finished their copies 5. Resume processing and stream ingestion 12 ??? Vasiliki Kalavri | Boston University 2020 –Leslie sha1_base64="K6iKeExfG5XRxe0YbdC6wv0424=">AB6HicbVBNSwMxEJ34WetX1aOXYBE 9lV0R1FvRi8cqri20S8m2TY0yS5JVihL/4EXDype/Une/Dem7R609cHA470ZuZFqeDGet43WlpeWV1bL2UN7e2d3Yre/uPJsk0ZQFNRKJbETFMcMUCy61grVQzIiPBmtHwZuI3n5g2PFEPdpSyUJK+4jGnxDrpXp500 码力 | 81 页 | 13.18 MB | 1 年前3
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020than (δ-ε)*N. 5 ??? Vasiliki Kalavri | Boston University 2020 Notation (II) • We define windows of size w = 1/ε with increasing numeric ids, starting from 1. • e.g., if ε=0.2, w=5 (5 items per window) Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0.2 w1 w4 w3 w2 ??? Vasiliki Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0 w1 w4 w3 w2 1 2 2 3 5 w1 ??? Vasiliki Kalavri | Boston University 2020 Example 8 1 2 2 3 5 5 1 1 2 3 3 3 3 1 2 0 1 1 3 5 input stream ε=0.2 w1 w4 w3 w2 1 2 2 3 5 w1 1 2 3 5 1 0 2 0 1 0 1 0 f10 码力 | 31 页 | 1.47 MB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020search term “graph” ??? Vasiliki Kalavri | Boston University 2020 Basics 1 5 4 3 2 “node” or “vertex” “edge” 1 5 4 3 2 undirected graph directed graph 4 ??? Vasiliki Kalavri | Boston University Graph streams Graph streams model interactions as events that update an underlying graph structure 5 Edge events: A purchase, a movie rating, a like on an online post, a bitcoin transaction, a packet 1, 4 5 3 . . . 1 5 4 3 2 ??? Vasiliki Kalavri | Boston University 2020 14 (Vi+1, outbox) <— compute(Vi, inbox) Superstep i Superstep i+1 1 3, 4 2 1, 4 5 3 . . . 1 3, 4 2 1, 4 5 3 . .0 码力 | 72 页 | 7.77 MB | 1 年前3
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020University 2020 5 Let n be the number of distinct elements in the input stream so far and let R be the maximum value of rank(.) seen so far. ??? Vasiliki Kalavri | Boston University 2020 5 Let n be the number of 0s at the end of a hash value by 1, 2R doubles! • R = 4, 2R = 16 distinct elements • R = 5, 2R = 32 distinct elements • R = 6, 2R = 64 distinct elements No estimate in between powers of 2 number of 0s at the end of a hash value by 1, 2R doubles! • R = 4, 2R = 16 distinct elements • R = 5, 2R = 32 distinct elements • R = 6, 2R = 64 distinct elements No estimate in between powers of 20 码力 | 69 页 | 630.01 KB | 1 年前3
Flink如何实时分析Iceberg数据湖的CDC数据INSERT F3152 + Icebe7g CDC导入i案 D6w5st7e+4 c65su4e 15c7e4e5t+3 ch+5ges 、gc近实k导入和实k读取。 2、计算a擎原生gcCDCe入,不需要额外的业务 字r设计。 3、统一的h据t存储,多o化的计算模型。 4、读取合并后的历史h据可F分利wI存加速。 5、云原生gc。 6、gc增量b取。 7、nm足够简s,无在线l务节u。 Part3t354-2 f4 f5 Part3t354-3 Ma43fest- Ma43fest-2 Ma43fest-3 f2 S4aps25t- S4aps25t-2 Meta Data 1NS/RT / UPDAT/ / D/2/T/ 写入 CR/AT/ TA,2/ D;ABl= ( id 1NT N5T NU22, d;E; 1NT N5T NU22, ( 1 (1 1 (3,5 1 (1,2 D (1,2 1 (1,3 1 (3,5 D (1,3 1 (1,2 D (1,2 1 (1,3 1 (3,5 D (1,3 1 (2,5 1NS/RT(1,2 UPDAT/{(1,2 )*(1,3 I 1NS/RT(3,5 D/2/T/(1,3 1NS/RT(2,5 1 (3,5 1 (2,5 S/2/CT0 码力 | 36 页 | 781.69 KB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 20202020 https://commons.wikimedia.org/wiki/File:Adaptive_streaming_overview_daseddon_2011_07_28.png 5 ??? Vasiliki Kalavri | Boston University 2020 Load shedding as an optimization problem N: query network s=0.5 c=5 s=1.0 O ??? Vasiliki Kalavri | Boston University 2020 Overload detection (II) 12 Load coefficient for input I: Total load over m inputs: I c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 O 5 ??? Vasiliki s=0.5 c=5 s=1.0 O 5 12.5 ??? Vasiliki Kalavri | Boston University 2020 Overload detection (II) 12 Load coefficient for input I: Total load over m inputs: I c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 O0 码力 | 43 页 | 2.42 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020produce output 5 mi mo Vasiliki Kalavri | Boston University 2020 What is a failure? op 1. receive an event 2. store in local buffer and possibly update state 3. produce output 5 mi mo Vasiliki failure? op 1. receive an event 2. store in local buffer and possibly update state 3. produce output 5 mi mo Was mi fully processed? Was mo delivered downstream? Vasiliki Kalavri | Boston University 2020 produce output What can go wrong: • lost events • duplicate or lost state updates • wrong result 5 mi mo Was mi fully processed? Was mo delivered downstream? Vasiliki Kalavri | Boston University 20200 码力 | 49 页 | 2.08 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020query-driven / pull-based data-driven / push-based Queries ad-hoc continuous Latency relatively high low 5 Vasiliki Kalavri | Boston University 2020 Traditional DW vs. SDW Traditional DW SDW Update Frequency out-of-order, delayed data 4. Guarantee deterministic (on replay) and correct results (on recovery) 5. Combine batch (historical) and stream processing 6. Ensure availability despite failures 7. Support 1 2 20K 2 5 32K src dest total 1 2 20K 2 5 32K sum Results as continuously updated materialized views 20 Vasiliki Kalavri | Boston University 2020 src dest bytes 1 2 20K 2 5 32K 1 2 28K0 码力 | 45 页 | 1.22 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020don't care about an accident that happened 2 hours ago • Recent might mean different things • last 5 sec • last 10 events • last 1h every 10 min • last user session Window operators 2 Vasiliki applied on a WindowedStream (or AllWindowedStream) and processes the elements assigned to a window. 5 Keyed vs. non-keyed windows Vasiliki Kalavri | Boston University 2020 // define a keyed window operator component Vasiliki Kalavri | Boston University 2020 32 4 2 5 7 44 8 18 Window max over 5 last elements? 32 4 2 5 8 4 2 5 7 8 2 5 7 44 8 5 7 44 8 18 32 8 44 44 21 Can we compute the max more efficiently0 码力 | 35 页 | 444.84 KB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020From S1 [Rows 5], S2 [Rows 10] Where S1.A = S2.A Last 5 elements of stream S1 and last 10 elements of S2 stream-to-relation relation-to-relation relation-to-stream CQL Example 5 Vasiliki Kalavri Kalavri | Boston University 2020 Composite subscription pattern language A(X>0) & (B(Y=10);[timespan:5] C(Z<5))[within:15] A, B, C are topics X, Y, Z are inner fields The rule fires when an item of type an item of type B with Y = 10 is detected, followed (in a time interval of 5–15 s) by an item of type C with Z < 5. 8 Vasiliki Kalavri | Boston University 2020 Streaming Operators 9 Vasiliki0 码力 | 53 页 | 532.37 KB | 1 年前3
共 24 条
- 1
- 2
- 3
相关搜索词
ExactlyoncefaulttoleranceinApacheFlinkCS591K1DataStreamProcessingandAnalyticsSpring2020SkewmitigationGraphstreamingalgorithmsCardinalityfrequencyestimation如何实时分析Iceberg数据CDCFlowcontrolloadsheddingHighavailabilityrecoverysemanticsguaranteesprocessingfundamentalsWindowstriggersStreaminglanguagesoperator













