Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. Space requirements: As we read the stream, it is not necessary to store any elements seen: • Assume we want to count cardinalities up to 1 billion, or 2^30, with an accuracy of 4%. • The hash value needs to map elements to M = log2(2^30) = 30 bits.
0 credits | 69 pages | 630.01 KB | 1 year ago

Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... information reg. future events. Publish/Subscribe Systems. Pub/Sub levels of de-coupling • Space: interacting parties do not need to know each other • Publishers do not know who / how many subscribers ... get notified asynchronously while possibly performing some other concurrent action. Paradigm / Space Decoupling / Time Decoupling / Synchronization Decoupling: Message-passing, RPC/RMI, Asynchronous Message Queues, Pub/Sub (Yes, Yes, Yes). Can you fill this in? Pub/Sub vs. other paradigms: Message-passing (No, No, Producer-side) ...
0 credits | 33 pages | 700.14 KB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... virtual circuit (VC) or connection. • CFC uses a credit system to signal the availability of buffer space from receivers to senders. • Senders maintain ... containing their number of available credits. • One credit corresponds to some amount of buffer space so that a sender can know how much data they can afford to forward downstream.
0 credits | 43 pages | 2.42 MB | 1 year ago

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... pair of addresses at any point in the stream. It is the most general model. Hard to develop space-efficient and time-efficient algorithms. Relational Streaming ... can be easily updated with a single pass over streaming tuples in their arrival order • Small space: memory footprint poly-logarithmic in the stream size • Low time: fast update and query times
0 credits | 45 pages | 1.22 MB | 1 year ago

Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... it is available for consumption. Records are discarded after their retention time to free up disk space. Partitions allow the log to scale beyond a size ...
0 credits | 26 pages | 3.33 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... of flow size in the input channel of each parallel task • Partitioning function performance • space required to implement routing • lookup cost • Migration performance • re-assignment computation ...
0 credits | 41 pages | 4.09 MB | 1 year ago

Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... window and iterate over the list of all collected elements when evaluated: • They require more space but support more complex logic. • ProcessWindowFunction. Window functions ...
0 credits | 35 pages | 444.84 KB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... the fault-tolerance mechanism under normal, failure-free operation? • How much memory or disk space is required to maintain input tuples and state? Recovery speed • How long does it take for the ...
0 credits | 49 pages | 2.08 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... according to the number of available cores / threads • Fused operators can share the address space but use separate threads of control • avoid communication cost without losing pipeline parallelism
0 credits | 54 pages | 2.83 MB | 1 year ago

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
... approximation • A small synopsis can provide very accurate approximations using very little space: • It might suffice to know that the true answer is roughly $5 million without knowing that the ...
0 credits | 74 pages | 1.06 MB | 1 year ago
