PyFlink 1.15 Documentation | … 1.3.2.1 O1: How to prepare Python Virtual Environment … 1.3.2.2 O2: How to add Python Files … --version … Create a Python virtual environment: A virtual environment lets you isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory … standalone Python environment and is also useful when deploying a PyFlink job to production when there are many Python dependencies. Python virtual environments are supported in PyFlink jobs … | 36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation | … 1.3.2.1 O1: How to prepare Python Virtual Environment … 1.3.2.2 O2: How to add Python Files … --version … Create a Python virtual environment: A virtual environment lets you isolate the Python dependencies of different projects by creating a separate environment for each project. It is a directory … standalone Python environment and is also useful when deploying a PyFlink job to production when there are many Python dependencies. Python virtual environments are supported in PyFlink jobs … | 36 pages | 266.80 KB | 1 year ago
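The snippet above mentions creating a Python virtual environment but does not show the commands. As a minimal sketch only, one way to create an isolated environment is Python's standard-library `venv` module. This is not necessarily the procedure the PyFlink documentation itself uses; the helper name `create_virtual_env` and the directory name are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_virtual_env(env_dir: Path) -> Path:
    """Create an isolated Python environment in env_dir (hypothetical helper).

    with_pip=False keeps the sketch fast and offline; a real PyFlink setup
    would still need pip and the job's dependencies installed into it.
    """
    venv.create(env_dir, with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg marker file.
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "pyflink_env" is an illustrative directory name, not one from the docs.
        marker = create_virtual_env(Path(tmp) / "pyflink_env")
        print("environment created:", marker.is_file())
```

Each project gets its own directory, so the dependencies of different jobs do not interfere with one another.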
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … f[j] = c_{i,j}; return min(f[1], f[2], …, f[p]) … Vasiliki Kalavri | Boston University 2020 … Computing top-k • In addition to the array of counters, we … elements seen so far • a heap X* of up to k potential heavy hitters and their frequency estimations • We use a frequency threshold f* = N/k to decide whether an element is popular … | 69 pages | 630.01 KB | 1 year ago
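The top-k fragments above can be sketched as follows. This is an illustrative reading of the snippet, not the lecture's own code: a count-min sketch returns the minimum over its rows as the frequency estimate, and an element is kept as a candidate heavy hitter while its estimate is at least f* = N/k. The class names, the hash construction, and the default sizes are assumptions, and a plain dictionary stands in for the heap X* to keep the example short.

```python
import hashlib


class CountMinSketch:
    """Array of counters: `depth` rows, each with `width` counters."""

    def __init__(self, width: int = 256, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += 1

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum is the best estimate.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


class TopKTracker:
    """Tracks candidate heavy hitters using the threshold f* = N / k."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.n = 0  # N: number of elements seen so far
        self.sketch = CountMinSketch()
        self.candidates: dict[str, int] = {}  # stand-in for the heap X*

    def observe(self, item: str) -> None:
        self.sketch.add(item)
        self.n += 1
        threshold = self.n / self.k
        if self.sketch.estimate(item) >= threshold:
            self.candidates[item] = self.sketch.estimate(item)
        # Drop candidates whose fresh estimate fell below the growing threshold.
        self.candidates = {
            x: self.sketch.estimate(x)
            for x in self.candidates
            if self.sketch.estimate(x) >= threshold
        }


if __name__ == "__main__":
    tracker = TopKTracker(k=3)
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
    for element in stream:
        tracker.observe(element)
    print(tracker.candidates)
```

Pruning on every update keeps the example simple; a real implementation would more likely use a heap ordered by estimate and prune lazily.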
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … discard by relying on the notion of window-based concept drift. • The metric is defined by computing a similarity metric across windows. … Vasiliki Kalavri | Boston University 2020 … How many … | 43 pages | 2.42 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … University 2020 • Martin Hirzel et al. A Catalog of Stream Processing Optimizations (ACM Computing Surveys, 2014). • Ron Avnur and Joseph M. Hellerstein. Eddies: continuously adaptive query processing … | 54 pages | 2.83 MB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | Synopses for massive data streams • Maintaining synopses is often the only means of providing interactive response times when exploring massive datasets or high-speed data streams. • Queries are executed … | 74 pages | 1.06 MB | 1 year ago
Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | …-algorithm/ • A video lecture on global snapshots: https://www.coursera.org/lecture/cloud-computing/1-2-global-snapshot-algorithm-hndGi | 81 pages | 13.18 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … Challenges of reconfiguration … Vasiliki Kalavri | Boston University 2020 … Control: When and how much to adapt? Mechanism: How to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler … | 41 pages | 4.09 MB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … Windows Subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in a UNIX environment. • A Java 8.x installation. To develop Flink applications and use its DataStream API in Java … | 34 pages | 2.53 MB | 1 year ago
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020 | … temperature”) } } Flink programs are defined in regular Scala/Java methods. Set up the execution environment: local, cluster, I/O, time semantics, parallelism, … Example: Sensor Readings … | 26 pages | 3.33 MB | 1 year ago
14 results in total













