Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020of length 5. We split the stream into m = 2p = 4 sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} ??? Vasiliki Kalavri | Boston University 2020 11 Stochastic averaging: example Let of length 5. We split the stream into m = 2p = 4 sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} Substream Address Counter S0 00 S1 01 S2 10 S3 11 ??? Vasiliki Kalavri | Boston sub-streams. Consider the input elements {5, 14, 5, 2, 8, 1, …} Substream Address Counter S0 00 S1 01 S2 10 S3 11 • x1=5, h5(5) = 00101 • x2=14, h5(14) = 10110 • x3=5, h5(5) = 00101 •0 码力 | 69 页 | 630.01 KB | 1 年前3
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020iVe QGhRo9apf3X7CshilYJq3fHc1AQ5VYzgZNKN9OYUjaiA+xYKmMOshnx07IiVX6JEqULWnITP09kdNY63Ec2s6YmqFe9Kbif14nM9FkHOZgYlmy+KMkFMQqafkz5XyIwYW0KZ4vZWwoZUWZsPhUbgrf48jLxG/XLund7VmteF WmU4QiO4RQ8OIcm3EALfGDA4Rle iVe QGhRo9apf3X7CshilYJq3fHc1AQ5VYzgZNKN9OYUjaiA+xYKmMOshnx07IiVX6JEqULWnITP09kdNY63Ec2s6YmqFe9Kbif14nM9FkHOZgYlmy+KMkFMQqafkz5XyIwYW0KZ4vZWwoZUWZsPhUbgrf48jLxG/XLund7VmteF WmU4QiO4RQ8OIcm3EALfGDA4Rle iVe QGhRo9apf3X7CshilYJq3fHc1AQ5VYzgZNKN9OYUjaiA+xYKmMOshnx07IiVX6JEqULWnITP09kdNY63Ec2s6YmqFe9Kbif14nM9FkHOZgYlmy+KMkFMQqafkz5XyIwYW0KZ4vZWwoZUWZsPhUbgrf48jLxG/XLund7VmteF WmU4QiO4RQ8OIcm3EALfGDA4Rle0 码力 | 81 页 | 13.18 MB | 1 年前3
监控Apache Flink应用程序(入门)............................. 14 4.8.1 currentProcessingTime - currentOutputWatermark > threshold.................................................................... 14 4.9 "Keeping Up"............. ........................... 14 4.10 关键指标 .............................................................................................................................. 14 4.11 可能的报警条件 .............. ............................. 14 4.12 Monitoring Latency.............................................................................................................. 14 4.12.1 Key Metrics .........0 码力 | 23 页 | 148.62 KB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020s=0.5 c=5 s=1.0 12.5 L2=18.75 O2 I1 c=10 s=0.5 c=10 s=0.8 c=5 s=1.0 O1 c=10 s=0.9 5 14 5 5 19 5 ??? Vasiliki Kalavri | Boston University 2020 13 I2 c=10 s=0.7 c=10 s=0.5 c=5 s=1.0 12.5 L2=18.75 O2 I1 c=10 s=0.5 c=10 s=0.8 c=5 s=1.0 O1 c=10 s=0.9 L1=26.5 5 14 5 5 19 5 ??? Vasiliki Kalavri | Boston University 2020 13 I2 c=10 s=0.7 c=10 s=0.5 c=5 s=1 s=1.0 12.5 L2=18.75 O2 I1 c=10 s=0.5 c=10 s=0.8 c=5 s=1.0 O1 c=10 s=0.9 L1=26.5 5 14 5 5 19 5 r1=10 r/s r2=20 r/s ??? Vasiliki Kalavri | Boston University 2020 13 I2 c=10 s=0.70 码力 | 43 页 | 2.42 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020size, e.g. s elements. 14 ??? Vasiliki Kalavri | Boston University 2020 Instead of a fixed proportion, assume we can only store a sample S of fixed size, e.g. s elements. 14 How can we continuously Instead of a fixed proportion, assume we can only store a sample S of fixed size, e.g. s elements. 14 How can we continuously maintain a representative fixed-size sample of the stream so far? At all Instead of a fixed proportion, assume we can only store a sample S of fixed size, e.g. s elements. 14 How can we continuously maintain a representative fixed-size sample of the stream so far? At all0 码力 | 74 页 | 1.06 MB | 1 年前3
PyFlink 1.15 Documentationimport Schema from pyflink.table.table_descriptor import TableDescriptor (continues on next page) 14 Chapter 1. How to build docs locally pyflink-docs, Release release-1.15 (continued from previous Table also provides the conversion back to a pandas DataFrame to leverage pandas API. [14]: table.to_pandas() [14]: id data 0 1 Hi 1 2 Hello 16 Chapter 1. How to build docs locally pyflink-docs, Release (continued from previous page) 11 12 # create python virtual environment 13 ./miniconda.sh -b -p venv 14 15 # activate the conda python virtual environment 16 source venv/bin/activate "" 17 18 # specify0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationimport Schema from pyflink.table.table_descriptor import TableDescriptor (continues on next page) 14 Chapter 1. How to build docs locally pyflink-docs, Release release-1.16 (continued from previous Table also provides the conversion back to a pandas DataFrame to leverage pandas API. [14]: table.to_pandas() [14]: id data 0 1 Hi 1 2 Hello 16 Chapter 1. How to build docs locally pyflink-docs, Release (continued from previous page) 11 12 # create python virtual environment 13 ./miniconda.sh -b -p venv 14 15 # activate the conda python virtual environment 16 source venv/bin/activate "" 17 18 # specify0 码力 | 36 页 | 266.80 KB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Source 10 12 10 18 23 11 15 11 15 event time watermark 15 14 20 • The input watermark captures the progress of upstream stages • minimum of output watermarks timestamp T indicates that all subsequent records should have timestamps > T. Watermark properties 14 Vasiliki Kalavri | Boston University 2020 Watermarks are essential to both event-time windows and event-time windows Vasiliki Kalavri | Boston University 2020 16 http://streamingbook.net/fig/3-2 14 Vasiliki Kalavri | Boston University 2020 Watermarks provide a configurable trade-off between results0 码力 | 22 页 | 2.22 MB | 1 年前3
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020serialization send message waiting waiting 13 ??? Vasiliki Kalavri | Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck? What if we scale ο1 x 4? How much to scale ο2? ??? Vasiliki Kalavri | Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck? What waiting for output waiting for input src o1 o2 ??? Vasiliki Kalavri | Boston University 2020 14 o1 src o2 back-pressure target: 40 rec/s 10 rec/s 100 rec/s Which operator is the bottleneck?0 码力 | 93 页 | 2.42 MB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020Short recovery time • High runtime overhead • The checkpoint interval determines the trade-off 14 Ni primary secondary I1 O1 N’i update checkpoint send state Vasiliki Kalavri | Boston University Short recovery time • High runtime overhead • The checkpoint interval determines the trade-off 14 Ni primary secondary I1 O1 N’i update checkpoint send state Can you see any disadvantage in0 码力 | 49 页 | 2.08 MB | 1 年前3
共 21 条
- 1
- 2
- 3













