PyFlink 1.15 Documentation2 QuickStart: DataStream API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.2 User Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Creation, listing Tables, Conversion between Table and DataStream, etc. • User-defined function management: User-defined function registration, dropping, listing, etc. • Executing SQL queries supports various UDFs and APIs to allow users to execute Python native functions. See also the latest User- defined Functions and Row-based Operations. The first example is UDFs used in Table API & SQL [20]:0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentation2 QuickStart: DataStream API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.2 User Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Creation, listing Tables, Conversion between Table and DataStream, etc. • User-defined function management: User-defined function registration, dropping, listing, etc. • Executing SQL queries supports various UDFs and APIs to allow users to execute Python native functions. See also the latest User- defined Functions and Row-based Operations. The first example is UDFs used in Table API & SQL [20]:0 码力 | 36 页 | 266.80 KB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkconnections. 2. Advanced sources, e.g., Kafka, Flume, Kinesis, Twitter. 3. Custom sources, e.g., user-provided sources. 13 / 79 Input Operations ▶ Every input DStream is associated with a Receiver connections. 2. Advanced sources, e.g., Kafka, Flume, Kinesis, Twitter. 3. Custom sources, e.g., user-provided sources. 13 / 79 Input Operations - Basic Sources ▶ Socket connection • Creates a DStream built-in timeouts • Think what would happen in our example, if the event signaling the end of the user session was lost, or had not arrived for some reason. 48 / 79 mapWithState Operation ▶ mapWithState0 码力 | 113 页 | 1.22 MB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020Synopsis: a lossy, compact summary of the input stream input stream synopsis maintenance component user queries approximate results ??? Vasiliki Kalavri | Boston University 2020 A simple and efficient fixed proportion of the stream, e.g. 1/10th 7 search engine <user, query, timestamp> query stream Example use-case: Web search user behavior study Q: How many queries did users repeat last month When a query arrives: • if the user is sampled: add the query to S • if we haven’t seen the user before: generate a random integer ru between 0 and 9 and add the user to the sample if ru = 0. ??? Vasiliki0 码力 | 74 页 | 1.06 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020uk Connection: keep-alive Accept: text/html,application/ xhtml+xml,application/ xml;q=0.9,*/*;q=0.8 User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25 uk Connection: keep-alive Accept: text/html,application/ xhtml+xml,application/ xml;q=0.9,*/*;q=0.8 User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25 uk Connection: keep-alive Accept: text/html,application/ xhtml+xml,application/ xml;q=0.9,*/*;q=0.8 User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/250 码力 | 54 页 | 2.83 MB | 1 年前3
Introduction to Apache Flink and Apache Kafka - CS 591 K1: Data Stream Processing and Analytics Spring 2020setParallelism() in your application. taskmanager.numberOfTaskSlots: The number of parallel operator or user function instances that a single TaskManager can run. This value is typically proportional to the run ./examples/batch/WordCount.jar \ --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out Run with a class entry point and arguments: ./bin/flink run -c ./examples/batch/WordCount.jar \ --input file:///home/user/hamlet.txt --output file:///home/user/wordcount_out 19 Flink commands Vasiliki Kalavri | Boston University 2020 Resources0 码力 | 26 页 | 3.33 MB | 1 年前3
监控Apache Flink应用程序(入门)程度。Flink可以从大多数source获得基本metrics。 4.10 关键指标 Metric Scope Description records-lag-max user applies to FlinkKafkaConsumer The maximum lag in terms of the number of records for any partition best indication that the consumer group is not keeping up with the producers. millisBehindLatest user applies to FlinkKinesisConsumer The number of milliseconds a consumer is behind the head of a time window) for functional reasons. 4. Each computation in your Flink topology (framework or user code), as well as each network shuffle, takes time and adds to latency. 5. If the application emits0 码力 | 23 页 | 148.62 KB | 1 年前3
Apache Flink的过去、现在和未来增量 snapshot 基于 credit 的流控机制 Streaming SQL ------------------------- | USER_SCORES | ------------------------- | User | Score | Time | ------------------------- | Julie | 7 | 12:01 | | Frank -- | ---------------------------- Stream Mode: 12:01> SELECT Name, SUM(Score), MAX(Time) FROM USER_SCORES GROUP BY Name; Flink 在阿里的服务情况 集群规模 超万台 状态数据 PetaBytes 事件处理 十万亿/天 峰值能力 17亿/秒 Flink 的过去0 码力 | 33 页 | 3.36 MB | 1 年前3
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020Kalavri | Boston University 2020 Mobile game application • input stream: user activity • output: rewards based on how fast the user meets goals • e.g. pop 500 bubbles within 1 minute and get extra life latency. Trade-offs 17 Vasiliki Kalavri | Boston University 2020 Periodic: periodically ask the user-defined function for the current watermark timestamp. Punctuated: check for a watermark in each0 码力 | 22 页 | 2.22 MB | 1 年前3
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 20202020 Software requirements • All assignments assume a UNIX-based setup. • If you are a Windows user, you are advised to use Windows subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to impressions, clicks, transactions, likes, comments • Analytics on user activity • Filtering, aggregation, joins with static data (e.g. user profile data) Examples • online A/B testing • trending topics0 码力 | 34 页 | 2.53 MB | 1 年前3
共 19 条
- 1
- 2













