Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
81 pages | 13.18 MB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
For each element x, let rank(x) be the number of 0s at the end of h(x), e.g.: x1 = 318, h(x1) = 12 or 01100 => rank(x1) = 2; x2 = 9013, h(x2) = 24 or 11000 => rank(x2) = 3 … h(x) = Σ_{k=0}^{M−1} … Vasiliki Kalavri | Boston University 2020. The hash function h hashes x to any of N values with probability 1/N. Out of all x we hash: around 50% will have a binary representation that ends in at …
69 pages | 630.01 KB | 1 year ago
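The rank definition in the snippet above can be checked with a few lines of Python. This is a minimal sketch, not code from the slides: the function names `rank` and `share_with_rank_at_least` are hypothetical, the hash function h itself is not modeled, and the two hash values (12 and 24) are taken directly from the example.

```python
def rank(h: int) -> int:
    """Return the number of trailing 0 bits in the hash value h."""
    if h <= 0:
        raise ValueError("rank is only defined here for positive hash values")
    # h & -h isolates the lowest set bit; its position is the trailing-zero count.
    return (h & -h).bit_length() - 1


def share_with_rank_at_least(r: int, bits: int) -> float:
    """Fraction of the values 1 .. 2**bits whose rank is at least r."""
    n = 2 ** bits
    return sum(1 for h in range(1, n + 1) if rank(h) >= r) / n


print(rank(12), rank(24))               # hash values from the example above
print(share_with_rank_at_least(1, 10))  # share of values ending in at least one 0
print(share_with_rank_at_least(2, 10))  # share of values ending in at least two 0s
```

Counting over a full range of equally likely hash values shows why roughly half of them end in at least one 0, and each additional trailing 0 halves the share again.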
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. Lossy Counting: find all items x in a data stream such that freq(x) > δ*N, where N is the number of stream elements. The solution will not contain … item e in the input stream; f: estimated frequency of item; δ: user-defined threshold, so that freq(x) ≥ δ*N, δ ∈ (0,1); ε: user-defined error. Output: all items with frequency greater than or equal to δ*N … element x in w_cur: if x ∈ D, increase its frequency, f_x = f_x + 1; else insert with frequency f_x = 1 and error ε_x = w_cur − 1; N = N + 1. Delete step: iterate over D and remove every element x with …
31 pages | 1.47 MB | 1 year ago
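The Lossy Counting steps quoted above can be sketched in Python as follows. Several details are assumptions rather than content from the snippet: the window width is taken as ceil(1/ε), the delete condition (which is cut off in the snippet) is filled in with the commonly published rule f_x + ε_x ≤ w_cur, and the output rule uses the threshold (δ − ε)·N. The function names are hypothetical.

```python
import math


def lossy_count(stream, epsilon):
    """Return (D, N): the summary D maps item -> [f, err]; N is the stream length."""
    width = math.ceil(1 / epsilon)      # assumed window width: ceil(1/epsilon)
    D = {}
    n = 0
    for x in stream:
        n += 1
        w_cur = math.ceil(n / width)    # id of the current window
        if x in D:
            D[x][0] += 1                # f_x = f_x + 1
        else:
            D[x] = [1, w_cur - 1]       # f_x = 1, err_x = w_cur - 1
        if n % width == 0:              # delete step at each window boundary
            # Assumed rule: drop entries with f_x + err_x <= w_cur.
            for key in [k for k, (f, err) in D.items() if f + err <= w_cur]:
                del D[key]
    return D, n


def frequent_items(D, n, delta, epsilon):
    """Items whose estimated frequency is at least (delta - epsilon) * N."""
    return {x for x, (f, _) in D.items() if f >= (delta - epsilon) * n}


# Toy stream of 100 elements: "a" appears 50 times, "b" 25 times, 25 items once each.
stream = ["a" if i % 2 == 0 else ("b" if i % 4 == 1 else f"u{i}") for i in range(100)]
D, n = lossy_count(stream, epsilon=0.1)
print(frequent_items(D, n, delta=0.2, epsilon=0.1))
```

Under these assumptions the summary stays small because rare items are pruned at every window boundary, while items above the threshold are never reported missing.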
PyFlink 1.15 Documentation
…anaconda.com/miniconda/ # Suppose the name of the downloaded miniconda installer is miniconda.sh; chmod +x miniconda.sh; # install miniconda; ./miniconda.sh -b -p miniconda; # Activate the miniconda environment … TableEnvironment.create(env_settings) table_env [1]: …x7fcd16342ac8> [2]: # Create a streaming TableEnvironment; env_settings = EnvironmentSettings.in_streaming_mode(); TableEnvironment.create(env_settings) table_env [2]: …x7fcd1ad0c0f0> Table Creation: Table is a core component of the Python Table API. A Table object describes …
36 pages | 266.77 KB | 1 year ago
PyFlink 1.16 Documentation
…anaconda.com/miniconda/ # Suppose the name of the downloaded miniconda installer is miniconda.sh; chmod +x miniconda.sh; # install miniconda; ./miniconda.sh -b -p miniconda; # Activate the miniconda environment … TableEnvironment.create(env_settings) table_env [1]: …x7fcd16342ac8> [2]: # Create a streaming TableEnvironment; env_settings = EnvironmentSettings.in_streaming_mode(); TableEnvironment.create(env_settings) table_env [2]: …x7fcd1ad0c0f0> Table Creation: Table is a core component of the Python Table API. A Table object describes …
36 pages | 266.80 KB | 1 year ago
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. Inductive step: at time t_{n+1}, we need to compute the probability that an element x in S remains: the probability that element n+1 is not selected, OR the probability that n+1 is selected but it doesn't replace x …
74 pages | 1.06 MB | 1 year ago
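The inductive step quoted above can be illustrated with a short Python sketch. It assumes the slides describe standard reservoir sampling with a fixed sample size, called `s` here; that symbol and both function names are hypothetical and do not come from the snippet. `stay_probability` adds up the two cases named in the snippet using exact fractions.

```python
import random
from fractions import Fraction


def reservoir_sample(stream, s, rng=random):
    """Keep a uniform sample of s elements from a stream of unknown length."""
    sample = []
    for n, x in enumerate(stream, start=1):
        if n <= s:
            sample.append(x)                # the first s elements fill the sample
        elif rng.random() < s / n:          # element n is selected with probability s/n
            sample[rng.randrange(s)] = x    # it replaces a uniformly chosen slot
    return sample


def stay_probability(s, n):
    """P(an element already in S is still in S after element n+1 arrives)."""
    p_selected = Fraction(s, n + 1)
    not_selected = 1 - p_selected                        # case 1: n+1 is not selected
    selected_other_slot = p_selected * Fraction(s - 1, s)  # case 2: selected, x not replaced
    return not_selected + selected_other_slot


s, n = 10, 100
# If x is in S with probability s/n at time t_n, it is in S with s/(n+1) at t_{n+1}.
print(Fraction(s, n) * stay_probability(s, n), Fraction(s, n + 1))
print(reservoir_sample(range(1000), s, random.Random(0)))
```

Using `Fraction` keeps the check exact, so the two printed probabilities can be compared without floating-point tolerance.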
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Which operator is the bottleneck? What if we scale o1 x 4? How much to scale o2? [Diagram labels: src, o1, o2; back-pressure; target: 40 rec/s; 10 rec/s; 100 rec/s; o1 cannot keep up; waiting for output; waiting for input] Vasiliki Kalavri | Boston University 2020
93 pages | 2.42 MB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… pre-aggregation similar to what combiners do in MapReduce. Operator separation [Diagram labels: merge X, merge A, merge A1, merge A2] Vasiliki Kalavri | Boston University 2020 … map(String key, String value): … Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0 …
54 pages | 2.83 MB | 1 year ago
Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
… subscription pattern language: A(X>0) & (B(Y=10);[timespan:5] C(Z<5))[within:15]. A, B, C are topics; X, Y, Z are inner fields. The rule fires when an item of type A having an attribute X > 0 enters the system … 1); (0..10).to_stream(scope) .concat(&stream) .inspect(|x| println!("seen: {:?}", x)) .connect_loop(handle); }); t (t, l1) (t, (l1, l2)) Streaming Iteration … 'modified-pattern123', X.CustomerId FROM webevents PARTITION BY CustomerId AS PATTERN (X Y Z) WHERE X.Event = 'order' AND Y.Event = 'rebate' AND Y.ItemID = X.ItemID AND Z.Event …
53 pages | 532.37 KB | 1 year ago
Streaming in Apache Flink
… env.fromElements("DROP", "IGNORE").keyBy(x -> x); DataStream<String> streamOfWords = env.fromElements("data", "DROP", "artisans", "IGNORE") .keyBy(x -> x); control .connect(streamOfWords) … max)); } } Buffers all the events … DataStream input = ... input .keyBy(x -> x.key) .window(TumblingEventTimeWindows.of(Time.minutes(1))) .reduce(new MyReducingMax(), new …
45 pages | 3.00 MB | 1 year ago













