Scalable Stream Processing - Spark Streaming and Flink
▶ Continuous vs. micro-batch processing
▶ Record-at-a-time vs. declarative APIs
Outline
▶ Spark Streaming
▶ Flink
Spark Streaming
▶ Design issues
• Continuous vs. micro-batch processing
• Record-at-a-time vs. declarative APIs
▶ Runs a streaming computation as a series of very small, deterministic batch jobs:
• it chops up the live stream into small batches and processes each batch as a regular Spark job.
Output Operations
▶ The function passed to an output operation is executed in the driver process.
▶ What's wrong with creating a connection object in the driver and using it in the output function? It requires the connection object to be serialized and sent from the driver to the workers.
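The serialization problem above can be sketched without Spark: a connection wraps OS-level resources that pickle cannot ship from the driver to a worker, so the connection must be created inside the task instead. The names below are illustrative, not Spark's API.

```python
import pickle
import threading

class Connection:
    """Stands in for a DB/network connection: real connections wrap
    sockets or locks and cannot be pickled."""
    def __init__(self):
        self._lock = threading.Lock()  # an unpicklable OS-level resource
    def send(self, record):
        return f"sent:{record}"

def good_output(partition):
    # Correct pattern: create the connection inside the task, on the worker,
    # instead of closing over one created in the driver.
    conn = Connection()
    return [conn.send(r) for r in partition]

# What the broken pattern forces the framework to do: serialize a
# driver-side connection so it can be shipped to the workers.
driver_conn = Connection()
try:
    pickle.dumps(driver_conn)
    serializable = True
except TypeError:
    serializable = False

print(serializable)         # False: the connection cannot leave the driver
print(good_output([1, 2]))  # ['sent:1', 'sent:2']
```

The same reasoning explains Spark's recommended pattern of opening connections per partition inside the worker-side function.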
PyFlink 1.15 Documentation
Q: 'int' object has no attribute 'encode'
1.3.6.3 Q3: Types.BIG_INT() vs. Types.LONG()
1.4 API reference
Example traceback from the testing FAQ:
File "/Users/dianfu/code/src/github/pyflink-faq/testing/test_utils.py", line 122, in setUp
    self.t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
File "/Users/dianfu/code/src/github…", in in_streaming_mode
    get_gateway().jvm.EnvironmentSettings.inStreamingMode())
File "/Users/dianfu/code/src/github/pyflink-faq/testing/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-10…
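Q3 above concerns Types.BIG_INT() vs. Types.LONG(): in PyFlink, Types.LONG() corresponds to a 64-bit signed Java long, while Types.BIG_INT() corresponds to java.math.BigInteger, which has no fixed range. A pure-Python sketch of the practical difference (pyflink not required; the helper name is illustrative):

```python
# Range of Types.LONG(), i.e. a 64-bit signed Java long.
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def fits_long(value):
    """Would this Python int survive a round trip through Types.LONG()?"""
    return LONG_MIN <= value <= LONG_MAX

print(fits_long(2**62))  # True:  representable as a Java long
print(fits_long(2**63))  # False: needs Types.BIG_INT() (arbitrary precision)
```

Python ints are arbitrary precision, so the mismatch only surfaces when a value is handed to a fixed-width type on the JVM side.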
PyFlink 1.16 Documentation
Course introduction - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri
Outcomes
At the end of the course, you will hopefully:
• know when to use stream processing vs. other technology
• be able to comprehensively compare features and processing guarantees of streaming systems
Deliverables
• One (1) written report of at most 5 pages (10%)
• Code, including pre-processing, deployment, and testing (40%)
• Code deliverables must be accompanied by documentation
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki Kalavri | Boston University 2020
• static data with storage vs. streaming data with analytics
DBMS vs. DSMS
• Data: persistent relations (DBMS) vs. streams (DSMS)
• Data access: random (DBMS) vs. sequential, single-pass (DSMS)
• Updates: continuous (DSMS)
• Latency: relatively high (DBMS) vs. low (DSMS)
Traditional DW vs. SDW
• Update frequency: low (traditional DW) vs. high (SDW)
• Update propagation: synchronized (traditional DW) vs. asynchronous (SDW)
▶ A stream is interpreted as describing a changing relation.
• Stream elements bear a valid timestamp, Vs, after which they are considered valid and they can contribute to the result.
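The single-pass access pattern that separates a DSMS from a DBMS can be sketched in a few lines: the DBMS-style query runs once over stored data, while the DSMS-style query makes one forward pass and emits an updated answer per arriving element (function names are illustrative).

```python
def one_shot_average(relation):
    # DBMS style: data at rest, random access, the query runs once.
    return sum(relation) / len(relation)

def continuous_average(stream):
    # DSMS style: a single forward pass over the stream, emitting an
    # updated answer as each element arrives.
    count, total = 0, 0.0
    for x in stream:
        count += 1
        total += x
        yield total / count

print(one_shot_average([1, 2, 3]))                # 2.0
print(list(continuous_average(iter([1, 2, 3]))))  # [1.0, 1.5, 2.0]
```

Note that the generator never revisits past elements, matching the "sequential, single-pass" row of the comparison above.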
[05, Compute Platform, Rong Rong] Flink Batch Processing and Its Applications
Why can Flink do batch processing?
• [architecture figure: Table and Stream APIs over both bounded and unbounded data, sharing one SQL layer and runtime; high throughput, low latency]
Hive vs. Spark vs. Flink Batch
• Model: MapReduce (Hive/Hadoop); MapReduce with memory/disk (Spark); pipeline (Flink)
• Throughput: TB-PB (Hive); TB-PB (Spark); not yet validated at large production scale (Flink)
• SQL: HiveSQL; SparkSQL; ANSI SQL
• Ease of use: fair; good; fair
• Tooling/ecosystem: fair; rich; fair
Flink batch applications: data lakes
• Data lake vs. data warehouse
• [architecture figure: Blink SQL + UDFs, queues, storage and compute layers]
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
• A message broker delivers messages to all subscribed consumers in a broadcast fashion.
Brokers vs. databases
• DBs keep data until it is explicitly deleted, while message brokers delete messages once they are consumed.
Decoupling properties of communication paradigms (message-passing, asynchronous RPC, futures, message queues, pub/sub): can you fill this in?
• Pub/sub provides space decoupling, time decoupling, and synchronization decoupling (yes on all three).
• What if a consumer crashed after reading messages beyond its last recorded offset? How can we avoid re-processing?
Logs vs. in-memory brokers
• Multiple consumers with different processing speeds: reading a message from a log doesn't remove it, so each consumer can proceed at its own pace.
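The log-based design sketched above can be made concrete in a few lines: an append-only log keeps every message, each consumer tracks only an offset, and replay after a crash is just re-reading from the recorded offset (class and method names are illustrative, not any broker's API).

```python
class Log:
    """Minimal append-only log in the spirit of a log-based broker:
    reading does not delete, so consumers at different speeds each keep
    their own offset and can replay after a crash."""
    def __init__(self):
        self.entries = []

    def append(self, msg):
        self.entries.append(msg)
        return len(self.entries) - 1  # offset of the new entry

    def read(self, offset):
        # Return everything at or after 'offset'; nothing is removed.
        return self.entries[offset:]

log = Log()
for m in ["a", "b", "c"]:
    log.append(m)

fast_offset, slow_offset = 3, 1  # two independent consumers
print(log.read(slow_offset))     # ['b', 'c']: the slow consumer catches up
print(log.read(fast_offset))     # []: the fast consumer is up to date
print(log.read(0))               # ['a', 'b', 'c']: full replay is possible
```

Contrast this with an in-memory broker that deletes on consumption: there, a slow consumer and a replaying consumer cannot coexist over the same data.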
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Plan alternatives
• order of operators
• scheduling and placement decisions
• different algorithms, e.g. hash-based vs. broadcast join
What does performance depend on?
• input data and intermediate data
• operator plans
Objectives
• optimize into an equivalent computation
• minimize intermediate data and communication
• minimize disk access
Alternatives
• data structures
• sorting vs. hashing
• indexing, pre-fetching
• scheduling
Redundancy elimination
• Eliminate redundant operations, aka subgraph sharing.
• Ensure mergeable state: even a simple counter might differ on a combined stream vs. on separate streams.
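The mergeable-state caveat can be made concrete with a small sketch: per-partition counters merge by plain summation, but an average kept as a single number does not, unless the state is widened to a (sum, count) pair.

```python
# Per-partition counts merge with a plain sum, so a counter is mergeable state.
part1, part2 = [3.0, 5.0, 2.0], [4.0, 6.0]
count1, count2 = len(part1), len(part2)
print(count1 + count2 == len(part1 + part2))  # True

# A running average stored as a single number is NOT mergeable:
avg1, avg2 = sum(part1) / len(part1), sum(part2) / len(part2)
naive = (avg1 + avg2) / 2                     # ignores partition sizes
true_avg = sum(part1 + part2) / len(part1 + part2)
print(naive == true_avg)                      # False

# Keeping (sum, count) pairs restores mergeability:
merged_sum = sum(part1) + sum(part2)
merged_count = count1 + count2
print(merged_sum / merged_count == true_avg)  # True
```

This is why optimizers that split or share subgraphs must check that the operator's state admits a correct merge function.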
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
▶ A window function is applied on a WindowedStream (or AllWindowedStream) and processes the elements assigned to a window.
Keyed vs. non-keyed windows
// define a keyed window operator
stream
  .keyBy(...)
  .window(...)                    // specify the window assigner
  .reduce/aggregate/process(...)  // specify the window function
Time-based window assigners for the …
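A time-based (tumbling) window assigner can be sketched as a pure function from an element's timestamp to the bounds of the single window it belongs to. This is a conceptual sketch, not Flink's WindowAssigner API.

```python
def tumbling_window(timestamp_ms, size_ms):
    """Map an element's timestamp to the [start, end) bounds of the
    single tumbling window containing it."""
    start = timestamp_ms - (timestamp_ms % size_ms)
    return (start, start + size_ms)

events = [(1_000, "a"), (4_500, "b"), (5_000, "c"), (9_999, "d")]
for ts, value in events:
    print(value, tumbling_window(ts, 5_000))
# "a" and "b" fall in [0, 5000); "c" and "d" fall in [5000, 10000)
```

Sliding windows differ only in that one timestamp maps to several overlapping windows rather than exactly one.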
Streaming in Apache Flink
Assigning event-time timestamps:
public long extractTimestamp(MyEvent event) {
    return event.getCreationTime();
}
Windows (not the OS)
Global vs. keyed windows
stream
  .keyBy(...)
  .window(...)
  .reduce|aggregate|process(...)
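Event-time timestamps like the one extracted above typically drive watermark generation: the watermark trails the maximum timestamp seen so far by a fixed out-of-orderness bound. A minimal Python sketch (illustrative class and method names, not Flink's API):

```python
class BoundedOutOfOrdernessWatermarks:
    """Watermark = max event timestamp seen so far, minus a fixed bound
    on how late events are allowed to arrive."""
    def __init__(self, max_delay_ms):
        self.max_delay_ms = max_delay_ms
        self.max_ts = float("-inf")

    def on_event(self, timestamp_ms):
        # Track the largest event-time timestamp observed.
        self.max_ts = max(self.max_ts, timestamp_ms)

    def current_watermark(self):
        return self.max_ts - self.max_delay_ms

wm = BoundedOutOfOrdernessWatermarks(max_delay_ms=2_000)
for ts in [1_000, 5_000, 4_000]:  # note the out-of-order third event
    wm.on_event(ts)
print(wm.current_watermark())     # 3000: no event older than this is expected
```

A window operator would use such a watermark to decide when a window's results can safely be emitted.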
共 16 条
- 1
- 2













