Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
fr/file/index/docid/406166/filename/FlFuGaMe07.pdf • Cormode, Graham, and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms (2005).
0 points | 69 pages | 630.01 KB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
• The performance of this operation is proportional to the size of the state. ▶ mapWithState • It is executed only on the set of keys that are available in the last micro batch. • The performance is proportional
0 points | 113 pages | 1.22 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
re-partitioning and migration • minimize communication • keep duration short • minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management ... Kalavri | Boston University 2020 ... When and how much to adapt? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict
0 points | 41 pages | 4.09 MB | 1 year ago

Monitoring Apache Flink Applications (Introduction)
metrics.latency.granularity: subtask), enabling latency tracking can significantly impact the performance of the cluster. It is recommended to only enable it to locate sources of latency during debugging ... hand, if your job's performance is starting to degrade, among the first metrics you want to look at are memory consumption and ... your TaskManagers are constantly under very high load, you might be able to improve the overall performance by decreasing the number of task slots per TaskManager (in case of a Standalone setup), by providing
0 points | 23 pages | 148.62 KB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict ... requirements ▸ Accuracy ▸ no over/under-provisioning ▸ Stability ▸ no oscillations ▸ Performance ▸ fast convergence • scaling controller: detect symptoms, decide whether to scale, decide ... MIMO too complex • Action • predictive, dataflow-wide • The output signal is the delay time • Performance depends on parameter selection, e.g. pole placement, sampling period, damping • Cannot identify
0 points | 93 pages | 2.42 MB | 1 year ago

Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 • Performance implications • How may checkpointing affect application performance? • How often to checkpoint? • Do we need to checkpoint the complete
0 points | 81 pages | 13.18 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Kalavri | Boston University 2020 • Fault-tolerance trade-offs • Steady-state overhead • How is performance affected by the fault-tolerance mechanism under normal, failure-free operation? • How much ... been checkpointed, i.e. the user's non-deterministic code is not re-executed • Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive
0 points | 49 pages | 2.08 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
placement decisions • different algorithms, e.g. hash-based vs. broadcast join • What does performance depend on? • input data, intermediate data • operator properties • How can we estimate the ... Boston University 2020 • Profitability: under what conditions does the optimization improve performance? • can the decision be automatic? • Safety: under what conditions does the optimization preserve
0 points | 54 pages | 2.83 MB | 1 year ago

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs one needs to consider when designing and deploying
0 points | 34 pages | 2.53 MB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Elasticity ... Selectively drop records: • Temporarily trades off result accuracy for sustainable performance. • Suitable for applications with strict latency constraints that can tolerate approximate
0 points | 43 pages | 2.42 MB | 1 year ago

10 results in total