 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020communication, UDP multicast, TCP • HTTP or RPC if the consumer exposes a service on the network • Failure handling: application needs to be aware of message loss, producers and consumers always online 5 Message architecture advantages • Multiple producers/consumers as concurrent clients • Effective failure handling, crashes or disconnects • Broker responsible for message durability • Asynchronous communication messages are totally ordered but there is no ordering guarantee across partitions 28 29 Failure handling • The broker does not need to wait for acknowledgements any more, but simply record consumers'0 码力 | 33 页 | 700.14 KB | 1 年前3 Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020communication, UDP multicast, TCP • HTTP or RPC if the consumer exposes a service on the network • Failure handling: application needs to be aware of message loss, producers and consumers always online 5 Message architecture advantages • Multiple producers/consumers as concurrent clients • Effective failure handling, crashes or disconnects • Broker responsible for message durability • Asynchronous communication messages are totally ordered but there is no ordering guarantee across partitions 28 29 Failure handling • The broker does not need to wait for acknowledgements any more, but simply record consumers'0 码力 | 33 页 | 700.14 KB | 1 年前3
 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Watermarks are essential to both event-time windows and operators handling out-of-order events: • When an operator receives a watermark with time T, it can assume that no0 码力 | 22 页 | 2.22 MB | 1 年前3 Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Watermarks are essential to both event-time windows and operators handling out-of-order events: • When an operator receives a watermark with time T, it can assume that no0 码力 | 22 页 | 2.22 MB | 1 年前3
 Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 202015 The standard error of the LogLog algorithm is inversely related to the number of counters m: Standard error δ ≈ 1.3 m For m = 256, the error is about 8% For m = 1024, the error decreases to 4% Vasiliki Kalavri | Boston University 2020 26 • Query approximation error • Error probability Guarantee: The estimation error for frequencies will not exceed with probability • A higher number ⌈2.71828 ϵ ⌉ Error and space/time trade-offs ??? Vasiliki Kalavri | Boston University 2020 27 Space requirements ??? Vasiliki Kalavri | Boston University 2020 27 For a standard error of , we need0 码力 | 69 页 | 630.01 KB | 1 年前3 Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 202015 The standard error of the LogLog algorithm is inversely related to the number of counters m: Standard error δ ≈ 1.3 m For m = 256, the error is about 8% For m = 1024, the error decreases to 4% Vasiliki Kalavri | Boston University 2020 26 • Query approximation error • Error probability Guarantee: The estimation error for frequencies will not exceed with probability • A higher number ⌈2.71828 ϵ ⌉ Error and space/time trade-offs ??? Vasiliki Kalavri | Boston University 2020 27 Space requirements ??? Vasiliki Kalavri | Boston University 2020 27 For a standard error of , we need0 码力 | 69 页 | 630.01 KB | 1 年前3
 PyFlink 1.15 DocumentationOverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache.flink.client.python.PythonEnvUtils.resetCallbackClient 30 1.3 factories.DynamicTableFactory’ in the classpath Exception Stack: py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.table.api.ValidationException: Unable to create documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.runtime.client.JobExecutionException: Job execution0 码力 | 36 页 | 266.77 KB | 1 年前3 PyFlink 1.15 DocumentationOverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache.flink.client.python.PythonEnvUtils.resetCallbackClient 30 1.3 factories.DynamicTableFactory’ in the classpath Exception Stack: py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.table.api.ValidationException: Unable to create documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.runtime.client.JobExecutionException: Job execution0 码力 | 36 页 | 266.77 KB | 1 年前3
 PyFlink 1.16 DocumentationOverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache.flink.client.python.PythonEnvUtils.resetCallbackClient 30 1.3 factories.DynamicTableFactory’ in the classpath Exception Stack: py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.table.api.ValidationException: Unable to create documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.runtime.client.JobExecutionException: Job execution0 码力 | 36 页 | 266.80 KB | 1 年前3 PyFlink 1.16 DocumentationOverflowError: timeout value is too large . . . . . . . . . . . . . . . . . . . . 30 1.3.5.2 Q2: An error occurred while calling z:org.apache.flink.client.python.PythonEnvUtils.resetCallbackClient 30 1.3 factories.DynamicTableFactory’ in the classpath Exception Stack: py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.table.api.ValidationException: Unable to create documentation. 1.3.4.2 O2: ClassNotFoundException: com.mysql.cj.jdbc.Driver py4j.protocol.Py4JJavaError: An error occurred while calling o13.execute. : org.apache.flink.runtime.client.JobExecutionException: Job execution0 码力 | 36 页 | 266.80 KB | 1 年前3
 Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020estimated frequency of item δ: user-defined threshold, so that freq(x)≥ δ*N,δ∈(0,1) ε: user-defined error Output: All items with frequency greater than or equal to δ*N. No item with frequency less than the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window fills up, we remove infrequent elements. 6 ??? Vasiliki Kalavri | Boston University in wcur: if x ∈ D, increase its frequency, fx = fx +1 else insert with frequency fx = 1 and error εx = wcur - 1 N = N + 1 Delete step Iterate over D and remove every element x with fx + εx0 码力 | 31 页 | 1.47 MB | 1 年前3 Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020estimated frequency of item δ: user-defined threshold, so that freq(x)≥ δ*N,δ∈(0,1) ε: user-defined error Output: All items with frequency greater than or equal to δ*N. No item with frequency less than the current window id • We keep a list D of element frequencies and their maximum associated error. • Once a window fills up, we remove infrequent elements. 6 ??? Vasiliki Kalavri | Boston University in wcur: if x ∈ D, increase its frequency, fx = fx +1 else insert with frequency fx = 1 and error εx = wcur - 1 N = N + 1 Delete step Iterate over D and remove every element x with fx + εx0 码力 | 31 页 | 1.47 MB | 1 年前3
 Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020actual ??? Vasiliki Kalavri | Boston University 2020 parallelism initial rate target actual error p0 p1 prediction x x x DS2 model properties 24 ??? Vasiliki Kalavri | Boston University 2020 Boston University 2020 parallelism initial rate target actual p0 p1 x error p1’ new prediction Gradually minimizes error DS2 model properties 24 ??? Vasiliki Kalavri | Boston University 2020 250 码力 | 93 页 | 2.42 MB | 1 年前3 Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020actual ??? Vasiliki Kalavri | Boston University 2020 parallelism initial rate target actual error p0 p1 prediction x x x DS2 model properties 24 ??? Vasiliki Kalavri | Boston University 2020 Boston University 2020 parallelism initial rate target actual p0 p1 x error p1’ new prediction Gradually minimizes error DS2 model properties 24 ??? Vasiliki Kalavri | Boston University 2020 250 码力 | 93 页 | 2.42 MB | 1 年前3
 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020scaled using approximate query processing techniques, where accuracy is measured in terms of relative error in the computed query answers. 17 ??? Vasiliki Kalavri | Boston University 2020 Which tuples to Katsipoulakis, A. Labrinidis, and P. K. Chrysanthis. Concept-driven load shedding: Reducing size and error of voluminous and variable data streams. (IEEE Big Data ’18) • H. T. Kung, T. Blackwell, and A.0 码力 | 43 页 | 2.42 MB | 1 年前3 Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020scaled using approximate query processing techniques, where accuracy is measured in terms of relative error in the computed query answers. 17 ??? Vasiliki Kalavri | Boston University 2020 Which tuples to Katsipoulakis, A. Labrinidis, and P. K. Chrysanthis. Concept-driven load shedding: Reducing size and error of voluminous and variable data streams. (IEEE Big Data ’18) • H. T. Kung, T. Blackwell, and A.0 码力 | 43 页 | 2.42 MB | 1 年前3
 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020footprint for accuracy • Query results are approximate with either deterministic or probabilistic error bounds • There is no universal synopsis solution • They are purpose-built and query-specific0 码力 | 45 页 | 1.22 MB | 1 年前3 Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020footprint for accuracy • Query results are approximate with either deterministic or probabilistic error bounds • There is no universal synopsis solution • They are purpose-built and query-specific0 码力 | 45 页 | 1.22 MB | 1 年前3
 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020or are discarded cpConfig.setCheckpointTimeout(300000); // do not fail the job on a checkpointing error cpConfig.setFailOnCheckpointingErrors(false); ??? Vasiliki Kalavri | Boston University 2020 Lecture0 码力 | 81 页 | 13.18 MB | 1 年前3 Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020or are discarded cpConfig.setCheckpointTimeout(300000); // do not fail the job on a checkpointing error cpConfig.setFailOnCheckpointingErrors(false); ??? Vasiliki Kalavri | Boston University 2020 Lecture0 码力 | 81 页 | 13.18 MB | 1 年前3
共 10 条
- 1













