Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020let us compute the statistical variance of this series? 3 Can this synopsis be used to answer general queries? • the sum of all the values • the sum of the squares of the values • the number of hash function h to hash the user name (or IP) and select queries only when h(user) = 0. 13 In general: We can obtain a sample of any a/b fraction of users by hashing usernames to b buckets and selecting hash function h to hash the user name (or IP) and select queries only when h(user) = 0. 13 In general: We can obtain a sample of any a/b fraction of users by hashing usernames to b buckets and selecting0 码力 | 74 页 | 1.06 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020currently active 10 The vector is updated by a continuous stream events where the jth update has the general form (k, c[j]) and modifies the kth entry of A with the operation A[k]←A[k] + c[j]. Vasiliki initiated or terminated between any pair of addresses at any point in the stream. 13 It is the most general model Hard to develop space-efficient and time-efficient algorithms Vasiliki Kalavri | Boston0 码力 | 45 页 | 1.22 MB | 1 年前3
PyFlink 1.15 Documentationfirst example is UDFs used in Table API & SQL [20]: from pyflink.table.udf import udf # create a general Python UDF @udf(result_type=DataTypes.BIGINT()) def plus_one(i): return i + 1 table.select(plus_one(col('id'))) 1.1. Getting Started 17 pyflink-docs, Release release-1.15 [20]: _c0 0 2 1 3 [21]: # create a general Python UDF @udf(result_type=DataTypes.BIGINT(), func_type='pandas') def pandas_plus_one(series):0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationfirst example is UDFs used in Table API & SQL [20]: from pyflink.table.udf import udf # create a general Python UDF @udf(result_type=DataTypes.BIGINT()) def plus_one(i): return i + 1 table.select(plus_one(col('id'))) 1.1. Getting Started 17 pyflink-docs, Release release-1.16 [20]: _c0 0 2 1 3 [21]: # create a general Python UDF @udf(result_type=DataTypes.BIGINT(), func_type='pandas') def pandas_plus_one(series):0 码力 | 36 页 | 266.80 KB | 1 年前3
Streaming in Apache FlinkJava object) is any Java class that • has an empty default constructor • all fields are either ◦public, or ◦have a default getter and setter Tuple2person = new Tuple2<>("Fred" name = person.f0; Integer age = person.f1; public class Person { public String name; public Integer age; public Person() {}; public Person(String name, Integer age) { Transforming Data Transforming Data public static class EnrichedRide extends TaxiRide { public int startCell; public int endCell; public EnrichedRide() {} public EnrichedRide(TaxiRide ride) { 0 码力 | 45 页 | 3.00 MB | 1 年前3
Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020TCP connection is a simple messaging system which connects one sender with one recipient. • A general messaging system connects multiple producers to multiple consumers by organizing messages into0 码力 | 33 页 | 700.14 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020connect(fares) .flatMap(new MatchFunction()); Java example 20 Vasiliki Kalavri | Boston University 2020 public static class EnrichmentFunction extends RichCoFlatMapFunctionrideState; private ValueState fareState; @Override public void open(Configuration config) { // initialize the state descriptors here rideState getRuntimeContext().getState(new ValueStateDescriptor<>("saved fare", TaxiFare.class)); } @Override public void flatMap1(TaxiRide ride, Collector > out) throws Exception { 0 码力 | 24 页 | 914.13 KB | 1 年前3
监控Apache Flink应用程序(入门)resources to the TaskManager (in case of a containerized setup), or by providing more TaskManagers. In general, a system already running under very high load during normal operations, will need much more time0 码力 | 23 页 | 148.62 KB | 1 年前3
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020or equal to τ > 0. If S = Rτ for some τ , then S is pre-sequence of R, denoted S ⊆τ R. In general, if S1, ..., Sn and R1, ..., Rn be timestamped sequences, then (S1, ..., Sn) ⊆τ (R1, ..., Rn) when0 码力 | 53 页 | 532.37 KB | 1 年前3
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020the window Input and output types must be the same Vasiliki Kalavri | Boston University 2020 public interface AggregateFunctionextends Function, Serializable { // create a new processing time and the watermark. ProcessWindowFunction 18 Vasiliki Kalavri | Boston University 2020 public abstract class ProcessWindowFunction extends AbstractRichFunction Collector out) throws Exception; public abstract class Context implements Serializable { // Returns the metadata of the window public abstract W window(); // Returns 0 码力 | 35 页 | 444.84 KB | 1 年前3
共 11 条
- 1
- 2













