Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 3/24: Exactly-once fault-tolerance in Apache Flink. … to a remote, persistent storage
4. Wait until all tasks have finished their copies
5. Resume processing and stream ingestion
… –Leslie Lamport. The distributed … Requirements:
• Taking a snapshot does not interfere with processing
• Processing and messages do not stop
• Each process can locally record its own state
• Any process …
81 pages | 13.18 MB | 1 year ago

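The snapshot steps in the entry above (copy every task's state to remote storage, wait for all copies, resume) can be sketched in plain Java. This is a minimal, hypothetical illustration of the naive stop-the-world approach, not Flink's checkpointing mechanism. It assumes the elided first steps stop ingestion and that no in-flight data is left to drain, and it uses an in-memory map as a stand-in for remote, persistent storage; `Task`, `ingest`, and `snapshot` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Naive, stop-the-world snapshot (hypothetical sketch, not Flink's checkpointing).
class NaiveSnapshot {

    // A task whose local state is a running sum of the values it has processed.
    static final class Task {
        final String id;
        long sum;
        Task(String id) { this.id = id; }
        void process(long value) { sum += value; }
    }

    private final List<Task> tasks = new ArrayList<>();
    // In-memory stand-in for remote, persistent storage: checkpoint id -> (task id -> state).
    final Map<Long, Map<String, Long>> remoteStore = new HashMap<>();
    private boolean ingestionPaused = false;

    NaiveSnapshot(int numTasks) {
        for (int i = 0; i < numTasks; i++) tasks.add(new Task("task-" + i));
    }

    // Input is rejected while a snapshot is in progress.
    boolean ingest(int taskIndex, long value) {
        if (ingestionPaused) return false;
        tasks.get(taskIndex).process(value);
        return true;
    }

    // Takes a snapshot and returns the total of all copied task states (for illustration).
    long snapshot(long checkpointId) {
        ingestionPaused = true; // assumed earlier step: stop ingestion (nothing is in flight here)
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            Map<String, Long> copies = new ConcurrentHashMap<>();
            List<Future<?>> pending = new ArrayList<>();
            // Every task copies its state to the (simulated) remote storage in parallel.
            for (Task t : tasks) {
                pending.add(pool.submit(() -> copies.put(t.id, t.sum)));
            }
            // Wait until all tasks have finished their copies.
            for (Future<?> f : pending) f.get();
            remoteStore.put(checkpointId, new HashMap<>(copies));
            long total = 0;
            for (long v : copies.values()) total += v;
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while taking snapshot", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed to copy its state", e);
        } finally {
            pool.shutdown();
            ingestionPaused = false; // resume processing and stream ingestion
        }
    }
}
```

Because ingestion is blocked for the whole copy, this naive scheme breaks the "taking a snapshot does not interfere with processing" requirement listed in the same entry.
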
CurveBS IO Processing Flow
CurveBS I/O processing flow. Before introducing the I/O processing flow, we first describe the overall architecture, data organization, and topology of CURVE. CurveBS uses the central … sockets.
• NebdServer: accepts requests from NEBDClient and calls Curve Client for the corresponding processing; it can receive requests from different NEBDClients.
3. Through the above split, NebdClient replaces Curve Client and interfaces directly with upper-layer services. NEBDClient contains no processing logic; it only proxies requests and performs a limited number of retries, which ensures that NEBDClient …
13 pages | 2.03 MB | 6 months ago

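The NEBDClient behaviour described above (forward each request unchanged, retry a limited number of times, no other logic) can be sketched as a generic Java proxy. The names `RetryingProxy` and `maxRetries` are hypothetical and are not CurveBS APIs; the server side is modelled as a plain `Function` that may throw.

```java
import java.util.function.Function;

// Hypothetical thin client-side proxy: forwards each request unchanged and retries a bounded
// number of times. Names are illustrative; this is not the CurveBS/NEBD implementation.
class RetryingProxy<Req, Resp> {
    private final Function<Req, Resp> server; // stands in for the server-side component
    private final int maxRetries;             // retries allowed after the first attempt
    private int attempts = 0;                 // total attempts made, kept for inspection

    RetryingProxy(Function<Req, Resp> server, int maxRetries) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        this.server = server;
        this.maxRetries = maxRetries;
    }

    Resp call(Req request) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            attempts++;
            try {
                return server.apply(request); // pure forwarding: no request-specific logic
            } catch (RuntimeException e) {
                lastFailure = e;              // remember the failure and try again
            }
        }
        throw new IllegalStateException(
            "request failed after " + (maxRetries + 1) + " attempts", lastFailure);
    }

    int attempts() { return attempts; }
}
```

Keeping the proxy free of request logic means it can be replaced or restarted without touching the component that does the real work, which is the motivation the entry gives for the split.
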
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 1/23: Stream Processing Fundamentals. What is a stream?
• In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a dataset that is produced incrementally over time, rather than being available in full before its processing begins.
• Data streams are high-volume, real-time data that might be unbounded
• We cannot store the entire stream …
45 pages | 1.22 MB | 1 year ago

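Because a stream may be unbounded and cannot be stored in full, results have to be computed incrementally from a small summary. The Java sketch below (the class name `RunningStats` is illustrative) keeps a count, a running mean, and a maximum in constant memory while reading from an unbounded source of integers.

```java
import java.util.Iterator;
import java.util.stream.LongStream;

// One-pass summary of a (possibly unbounded) stream: O(1) memory, elements are never stored.
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private long max = Long.MIN_VALUE;

    // Fold one element into the summary; the element can be discarded afterwards.
    void add(long x) {
        count++;
        mean += (x - mean) / count; // incremental mean: m_n = m_{n-1} + (x - m_{n-1}) / n
        if (x > max) max = x;
    }

    long count() { return count; }
    double mean() { return mean; }
    long max() { return max; }

    // Unbounded source 1, 2, 3, ...; only a prefix of it can ever be observed.
    static Iterator<Long> naturals() {
        return LongStream.iterate(1, i -> i + 1).boxed().iterator();
    }

    // Consume the next n elements of a source into a fresh summary.
    static RunningStats summarize(Iterator<Long> source, int n) {
        RunningStats stats = new RunningStats();
        for (int seen = 0; seen < n && source.hasNext(); seen++) stats.add(source.next());
        return stats;
    }
}
```

The summary is valid at every point in time, so a result can be emitted after any prefix without revisiting earlier elements.
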
2.1.5 Processing XML and Spreadsheet Data in Go
Processing XML and Spreadsheet in Go. Ri Xu (续日), Gopher China Conference, Beijing 2021, 6/26 - 6/27. Self Introduction: the author of Excelize, a Go language spreadsheet library; familiar with the Go language.
02 Complex XML: • Partial Load • Namespace & Entity • Ser/Deserialize Idempotence
03 High Performance Processing: • XML Schema Definition • DOM or SAX
04 OOXML Spreadsheets: • Excel XML Specification • …
High Performance Processing: XML Components Data Model …
35 pages | 1.34 MB | 1 year ago

Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018. The course web page: https://id2221kth.github.io
Where Are We? Stream Processing Systems Design. Design Issues: ▶ Continuous vs. micro-batch processing ▶ Record-at-a-Time vs. declarative APIs
Outline: ▶ Spark Streaming ▶ Flink
Spark Streaming: ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops …
113 pages | 1.22 MB | 1 year ago

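The micro-batch model in the entry above can be illustrated with a small Java sketch: records are chopped into fixed-length intervals, and the same deterministic batch function (here a word count) runs on each interval in order. This is not Spark's API. It assumes each record's timestamp is its (non-negative) arrival time, and, unlike a real micro-batch engine, it simply produces no batch for intervals that received no records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Micro-batching sketch: group records by fixed-length interval, then run a batch job per group.
class MicroBatcher {

    static final class Event {
        final long timestampMs; // treated as the arrival time; assumed non-negative
        final String word;
        Event(long timestampMs, String word) { this.timestampMs = timestampMs; this.word = word; }
    }

    // Chop the stream into batches: batch index = arrival time / interval length.
    static TreeMap<Long, List<Event>> chop(List<Event> stream, long batchIntervalMs) {
        TreeMap<Long, List<Event>> batches = new TreeMap<>();
        for (Event e : stream) {
            batches.computeIfAbsent(e.timestampMs / batchIntervalMs, k -> new ArrayList<>()).add(e);
        }
        return batches;
    }

    // Deterministic batch job: the same input batch always yields the same word counts.
    static Map<String, Integer> wordCount(List<Event> batch) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Event e : batch) counts.merge(e.word, 1, Integer::sum);
        return counts;
    }

    // Run the batch job on every micro-batch, in interval order.
    static List<Map<String, Integer>> run(List<Event> stream, long batchIntervalMs) {
        List<Map<String, Integer>> results = new ArrayList<>();
        for (List<Event> batch : chop(stream, batchIntervalMs).values()) {
            results.add(wordCount(batch));
        }
        return results;
    }
}
```

Determinism is what lets a failed batch simply be recomputed from its input, which is why the slide stresses "deterministic" batch jobs.
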
[04 RocketMQ, Wang Xin (王鑫)] Stream Processing with Apache RocketMQ and Apache Flink
30 pages | 24.22 MB | 1 year ago

Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 4/16: Skew mitigation. … Uddin Nasir et al. The power of both choices: Practical load balancing for distributed stream processing engines. ICDE 2015.
• Mitzenmacher, Michael. The power of two choices in randomized load balancing …
31 pages | 1.47 MB | 1 year ago

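The two-choices idea behind both references can be sketched as a key router in Java: every key has two candidate workers, and each record goes to whichever candidate this sender has loaded less so far. The hash mixing, the seeds, and the rule that forces two distinct candidates are assumptions made for this sketch. Splitting one key across two workers also means downstream operators must merge the partial results, which is not shown.

```java
// "Power of two choices" routing for a keyed stream (hypothetical sketch).
class TwoChoiceRouter {
    private final long[] load; // records sent to each worker, as seen by this sender

    TwoChoiceRouter(int numWorkers) { load = new long[numWorkers]; }

    // Route one record by key and return the chosen worker index.
    int route(Object key) {
        int a = candidate(key, 0x9E3779B9);
        int b = candidate(key, 0x85EBCA6B);
        if (b == a) b = (a + 1) % load.length; // assumption: force two distinct candidates
        int target = load[a] <= load[b] ? a : b; // the less loaded candidate wins
        load[target]++;
        return target;
    }

    // Seeded integer mix of the key's hash code, mapped onto a worker index.
    private int candidate(Object key, int seed) {
        int x = key.hashCode() ^ seed;
        x *= 0xCC9E2D51;
        x ^= x >>> 15;
        x *= 0x1B873593;
        x ^= x >>> 13;
        return Math.floorMod(x, load.length);
    }

    long load(int worker) { return load[worker]; }

    long maxLoad() {
        long m = 0;
        for (long l : load) m = Math.max(m, l);
        return m;
    }
}
```

With plain hash partitioning a single hot key lands entirely on one worker; here it is split evenly between its two candidates, while each key still touches at most two workers.
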
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 2/25: State Management. … operator. Keyed state can only be used by functions that are applied on a KeyedStream:
• When the processing method of a function with keyed input is called, Flink's runtime automatically puts all keyed …
Java example (cont.): … fare, Collector<…> out) throws Exception { // similar logic for processing fare events } } }
24 pages | 914.13 KB | 1 year ago

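The idea that the runtime sets the current key before invoking a keyed function, so that every state read and write is implicitly scoped to that key, can be simulated in plain Java. `KeyedRuntime` and `KeyedValue` are hypothetical classes that only mimic this behaviour; they are not Flink's `ValueState` API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java simulation of keyed-state scoping (hypothetical classes, not Flink APIs).
class KeyedStateDemo {

    // Holds the key of the record currently being processed.
    static final class KeyedRuntime<K> {
        private K currentKey;
        void setCurrentKey(K key) { currentKey = key; }
        K currentKey() { return currentKey; }
    }

    // Value-state handle: reads and writes go to the slot of the runtime's current key.
    static final class KeyedValue<K, V> {
        private final KeyedRuntime<K> runtime;
        private final Map<K, V> perKey = new HashMap<>();
        KeyedValue(KeyedRuntime<K> runtime) { this.runtime = runtime; }
        V value() { return perKey.get(runtime.currentKey()); } // null if unset for this key
        void update(V v) { perKey.put(runtime.currentKey(), v); }
    }

    // User logic: sees only the state handle, never the key itself.
    static long addToSum(KeyedValue<String, Long> sum, long amount) {
        Long current = sum.value();
        long updated = (current == null ? 0L : current) + amount;
        sum.update(updated);
        return updated;
    }

    // "Runtime" loop over (key, amount) records: scope state to the key, then call user logic.
    static Map<String, Long> run(String[] keys, long[] amounts) {
        KeyedRuntime<String> runtime = new KeyedRuntime<>();
        KeyedValue<String, Long> sum = new KeyedValue<>(runtime);
        Map<String, Long> latest = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            runtime.setCurrentKey(keys[i]);
            latest.put(keys[i], addToSum(sum, amounts[i]));
        }
        return latest;
    }
}
```

The user function never branches on the key, yet records for different keys never see each other's state; that separation is what the runtime provides on a KeyedStream.
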
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 4/14: Stream processing optimizations. … serialization cost
• If operators are separate, throughput is bounded by either communication or processing cost
• If fused, throughput is determined by operator cost only
[Figure: Operator fusion (operators A and B)]
… is statically configured with a certain number of processing slots that defines the maximum number of concurrent tasks it can execute.
• A processing slot can execute one slice of an application, i.e. …
54 pages | 2.83 MB | 1 year ago

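The fusion trade-off above can be made concrete with a back-of-the-envelope throughput model in Java. The model is an assumption of this sketch, not taken from the slides: `cA` and `cB` are per-record operator costs in seconds, and `cComm` is the per-record serialization/transfer cost between two separate tasks, treated as its own pipeline stage.

```java
// Simplified throughput model (records per second) for a chain of two operators A -> B.
class FusionModel {

    // Separate tasks form a pipeline, so the slowest stage (A, B, or the channel) sets the rate.
    static double separateThroughput(double cA, double cB, double cComm) {
        return 1.0 / Math.max(cComm, Math.max(cA, cB));
    }

    // Fused: no serialization, but A and B now run back to back on one thread.
    static double fusedThroughput(double cA, double cB) {
        return 1.0 / (cA + cB);
    }

    // True if fusing the two operators is predicted to increase throughput.
    static boolean fusionHelps(double cA, double cB, double cComm) {
        return fusedThroughput(cA, cB) > separateThroughput(cA, cB, cComm);
    }
}
```

With cheap operators (1 µs each) and a 5 µs channel, the model gives 500,000 records/s fused versus 200,000 separate; with 50 µs operators the ordering flips (10,000 fused versus 20,000 separate), because the fused task now pays both operator costs serially.
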
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu, Spring 2020. 2/11: Windows and Triggers. … windowing use cases:
• They assign an element to windows based on its event-time timestamp or the current processing time.
• Time windows have a start and an end timestamp.
• All built-in window assigners provide a default trigger that triggers the evaluation of a window once the (processing or event) time passes the end of the window.
• A window is created when the first element is assigned …
35 pages | 444.84 KB | 1 year ago

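The window behaviour listed above (assignment by timestamp, a start and end per window, lazy creation on the first element, and an end-of-window trigger) can be sketched for tumbling windows in Java. This is not Flink's API. It assumes half-open windows [start, end), non-negative timestamps, and a time signal (for example a watermark) passed in explicitly; late elements are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Tumbling time windows: assignment, lazy creation, and an end-of-window trigger (sketch).
class TumblingWindows {
    private final long size;
    // Open windows keyed by start timestamp; a window exists only once an element lands in it.
    private final TreeMap<Long, List<Long>> open = new TreeMap<>();

    TumblingWindows(long size) { this.size = size; }

    // Start of the window [start, start + size) that contains ts (ts assumed non-negative).
    static long windowStart(long ts, long size) { return ts - (ts % size); }

    void add(long ts) {
        long start = windowStart(ts, size);
        open.computeIfAbsent(start, s -> new ArrayList<>()).add(ts); // created on first element
    }

    int openWindowCount() { return open.size(); }

    // Fire every window whose (exclusive) end the current time has reached, oldest first,
    // returning start -> element count and discarding the fired windows.
    Map<Long, Integer> advanceTime(long now) {
        Map<Long, Integer> fired = new TreeMap<>();
        while (!open.isEmpty() && open.firstKey() + size <= now) {
            Map.Entry<Long, List<Long>> w = open.pollFirstEntry();
            fired.put(w.getKey(), w.getValue().size());
        }
        return fired;
    }
}
```

Since tumbling windows do not overlap, ordering them by start also orders them by end, so firing can stop at the first window whose end has not been reached.
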