Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  0 credits | 81 pages | 13.18 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
  • The performance of these operations is proportional to the size of the state. ▶ mapWithState • It is executed only on the set of keys that are available in the last micro batch. • The performance is proportional ...
  0 credits | 113 pages | 1.22 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  ... re-partitioning and migration • minimize communication • keep duration short • minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management ... Kalavri | Boston University 2020 ... When and how much to adapt? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict ...
  0 credits | 41 pages | 4.09 MB | 1 year ago
Monitoring Apache Flink Applications (Getting Started)
  ... metrics.latency.granularity: subtask), enabling latency tracking can significantly impact the performance of the cluster. It is recommended to only enable it to locate sources of latency during debugging ... hand, if your job’s performance is starting to degrade, among the first metrics you want to look at are memory consumption and ... your TaskManagers are constantly under very high load, you might be able to improve the overall performance by decreasing the number of task slots per TaskManager (in case of a Standalone setup), by providing ...
  0 credits | 23 pages | 148.62 KB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  ... to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict ... requirements ▸ Accuracy ▸ no over/under-provisioning ▸ Stability ▸ no oscillations ▸ Performance ▸ fast convergence ... scaling controller: detect symptoms, decide whether to scale, decide ... MIMO too complex • Action • predictive, dataflow-wide ... The output signal is the delay time. Performance depends on parameter selection, e.g. pole placement, sampling period, damping. Cannot identify ...
  0 credits | 93 pages | 2.42 MB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  Fault-tolerance trade-offs: Steady-state overhead • How is performance affected by the fault-tolerance mechanism under normal, failure-free operation? • How much ... been checkpointed, i.e. the user’s non-deterministic code is not re-executed ... Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive ...
  0 credits | 49 pages | 2.08 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  ... placement decisions • different algorithms, e.g. hash-based vs. broadcast join • What does performance depend on? • input data, intermediate data • operator properties • How can we estimate the ... • Profitability: under what conditions does the optimization improve performance? • can the decision be automatic? • Safety: under what conditions does the optimization preserve ...
  0 credits | 54 pages | 2.83 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  ... have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs one needs to consider when designing and deploying ...
  0 credits | 34 pages | 2.53 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
  ... Elasticity ... Selectively drop records: • Temporarily trades off result accuracy for sustainable performance. • Suitable for applications with strict latency constraints that can tolerate approximate ...
  0 credits | 43 pages | 2.42 MB | 1 year ago
9 results in total