PyFlink 1.15 DocumentationPython Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.3.3 JDK issues . . . . . . . . . separate environment for each project. It is a directory tree which contains its own Python executable files and the installed Python packages. It is useful for local development to create a standalone Python word_count.py # You will see outputs as following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] # +I[or, 1] # +I[not, 1]0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 DocumentationPython Virtual Environment . . . . . . . . . . . . . . . . . . . 24 1.3.2.2 O2: How to add Python Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 1.3.3 JDK issues . . . . . . . . . separate environment for each project. It is a directory tree which contains its own Python executable files and the installed Python packages. It is useful for local development to create a standalone Python word_count.py # You will see outputs as following: # Use --input to specify file input. # Printing result to stdout. Use --output to specify output path. # +I[To, 1] # +I[be,, 1] # +I[or, 1] # +I[not, 1]0 码力 | 36 页 | 266.80 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and FlinkTCP socket connection. ssc.socketTextStream("localhost", 9999) ▶ File stream • Reads data from files. streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory) streamingContext TCP socket connection. ssc.socketTextStream("localhost", 9999) ▶ File stream • Reads data from files. streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory) streamingContext DStream of (word, 1). ▶ Get the frequency of words in each batch of data. ▶ Finally, print the result. val pairs = words.map(word => (word, 1)) val wordCounts = pairs.reduceByKey(_ + _) wordCounts0 码力 | 113 页 | 1.22 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020ETL process complex fast and light-weight ETL: Extract-Transform-Load e.g. unzipping compressed files, data cleaning and standardization 6 Vasiliki Kalavri | Boston University 2020 1. Process events bear a valid timestamp, Vs, after which they are considered valid and they can contribute to the result. • alternatively, events can have validity intervals. • The contents of the relation at time indexes and materialized views for high rates. • Incremental computation: do we recompute the result from scratch whenever a new record is appended to the stream table? Synopses: Maintain summaries0 码力 | 45 页 | 1.22 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri vkalavri@bu.edu Spring 2020 1/28: Stream ingestion and pub/sub systems Streaming sources Files, e.g. transaction logs Sockets IoT devices and sensors Databases and KV stores Message queues subscription. • DB query results depend on a snapshot and clients are not notified if their query result changes later. 13 Message delivery and ordering Acknowledgements are messages from the client0 码力 | 33 页 | 700.14 KB | 1 年前3
Flink如何实时分析Iceberg数据湖的CDC数据delete fileO 写T思l 1N5oFitioA ,elete File和nR SeD3umP大Qi己SeD3um 的,ata FileS 04I3M 2N-DualitJ ,elete File和n RSeD3um小Qi己SeD3um 的,ata FileS 04I3O 读取思l *CClJ ,eletioA *CClJ ,eletioA 5AnFDeNOSRTVU :1 :2 :3 f4 Ice4erg/Are3m1riAer Ice4erg/Are3m1riAer Ice4erg/Are3m1riAer 1 1riAe records Ao D3A3/DeleAe Files. F量文E集I1A4ns4cCion提D /4ACiCion-2 -cebeAg .eC4sCoAe /4ACiCion-1 /4ACiCion-3 -cebeAg D4C4 )enCeA ((3-2 -cebeAgSCAe4m2AiCeA -cebeAgSCAe4m2AiCeA -cebeAgSCAe4m2AiCeA 1 2AiCe AecoAds Co D4C4/DeleCe Files. -cebeAgFiles)ommiCCeA 2 EmiC compleCed D4C4File Co commiCCeA. f1 f2 f3 f4 F量文E集I1A4ns4cCion提D0 码力 | 36 页 | 781.69 KB | 1 年前3
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020produce output What can go wrong: • lost events • duplicate or lost state updates • wrong result 5 mi mo Was mi fully processed? Was mo delivered downstream? Vasiliki Kalavri | Boston University University 2020 Processing guarantees and result semantics 11 sum 4 3 2 1 0 … Vasiliki Kalavri | Boston University 2020 Processing guarantees and result semantics 11 sum 4 3 2 1 … 1 5 Vasiliki University 2020 Processing guarantees and result semantics 11 sum 4 3 1 3 3 … 5 6 Vasiliki Kalavri | Boston University 2020 Processing guarantees and result semantics 11 sum 5 4 3 6 6 … 6 7 10 码力 | 49 页 | 2.08 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020conditions does the optimization preserve correctness? • maintain state semantics • maintain result and selectivity semantics • Dynamism: can the optimization be applied during runtime or does it attributes A writes to. • Commutativity: the results of applying A and then B must be the same as the result of applying B and then A. • holds if both operators are stateless Operator re-ordering B A A attributes A writes to. • commutativity: the results of applying A and then B must be the same as the result of applying B and then A. • holds if both operators are stateless Re-ordering split and merge0 码力 | 54 页 | 2.83 MB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020• They maintain a single value as window state and eventually emit the aggregated value as the result. • ReduceFunction and AggregateFunction • Full window functions collect all elements of a window accumulator); // compute the result from the accumulator and return it. OUT getResult(ACC accumulator); // merge two accumulators and return the result. ACC merge(ACC a, ACC b); } 16 Input type Accumulator type Output type Initialization Accumulate one element Compute the result Merge two partial accumulators Vasiliki Kalavri | Boston University 2020 Use the ProcessWindowFunction0 码力 | 35 页 | 444.84 KB | 1 年前3
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020rate : throughput Why is it necessary? ??? Vasiliki Kalavri | Boston University 2020 • Ensure result correctness • reconfiguration mechanism often relies on fault-tolerance mechanism • State re-partitioning Re-partition and migrate state in a consistent manner • Block and unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt? Re-partition and migrate state in a consistent manner • Block and unblock computations to ensure result correctness ??? Vasiliki Kalavri | Boston University 2020 Control: When and how much to adapt?0 码力 | 41 页 | 4.09 MB | 1 年前3
共 20 条
- 1
- 2













