Skew mitigation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 4/16: Skew mitigation. … uses two hash functions, $H_1$ and $H_2$, and checks the load of the two sampled workers: $P(k) = \arg\min_i \{ L_i(t) : H_1(k) = i \lor H_2(k) = i \}$ • provably reduces load variation exponentially compared to the single choice …
31 pages | 1.47 MB | 1 year ago
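
The two-choice rule in the snippet above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the seeded SHA-256 helper `_hash` and the function names are assumptions made for the example, and load is counted simply as the number of records routed so far.

```python
import hashlib


def _hash(key: str, seed: str, num_workers: int) -> int:
    """Hypothetical seeded hash: maps a key to a worker index in [0, num_workers)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def route_single_choice(keys, num_workers):
    """Baseline: every key goes to the one worker chosen by H1."""
    load = [0] * num_workers
    for key in keys:
        load[_hash(key, "H1", num_workers)] += 1
    return load


def route_two_choices(keys, num_workers):
    """P(k): pick the less loaded of the two workers sampled by H1 and H2."""
    load = [0] * num_workers
    for key in keys:
        c1 = _hash(key, "H1", num_workers)
        c2 = _hash(key, "H2", num_workers)
        target = c1 if load[c1] <= load[c2] else c2
        load[target] += 1
    return load


if __name__ == "__main__":
    stream = ["#Brexit"] * 1000 + ["#WorldCup"] * 10
    print("single choice:", route_single_choice(stream, 4))
    print("two choices:  ", route_two_choices(stream, 4))
```

Because records with the same key may now land on either of two workers, a stateful operator downstream would have to merge the two partial results per key.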
State management - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 2/25: State Management. … Logic, State <k, v> <#Brexit … • MapState[K, V]: a map of keys and values • get(key: K), put(key: K, value: V), contains(key: K), remove(key: K) • iterators over the contained entries, keys …
24 pages | 914.13 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 4/14: Stream processing optimizations. … Stream SQL, Scala, Python, Rust, Java… Stateful operators: Logic, State <k, v>: <#Brexit, 521> <#WorldCup, 480> <#StarWars, 300>; <#Brexit> → <#Brexit, 521> … Emit(key, AsString(result)); MapReduce combiners example: URL access frequency. map(): (k1, v1) → list(k2, v2); reduce(): (k2, list(v2)) → list(v2) … MapReduce …
54 pages | 2.83 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 2/11: Windows and Triggers …
35 pages | 444.84 KB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 1/21: Introduction. … Class schedule: /lectures.html • including today’s slides • Piazza: piazza.com/bu/spring2020/cs591k1/home • For questions & discussions • Blackboard: learn.bu.edu/... • For quizzes, assignment … office hours. … Dataset: a subset of traces from a large (12.5k machines) Google cluster • https://github.com/google/cluster-data/blob/master/ClusterData2011_2.md
34 pages | 2.53 MB | 1 year ago
Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 2/06: Notions of time and progress …
22 pages | 2.22 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 1/23: Stream Processing Fundamentals. … continuous stream events where the jth update has the general form (k, c[j]) and modifies the kth entry of A with the operation A[k] ← A[k] + c[j]. … Time-Series Model: … Cash-Register Model: in this model, multiple updates can increment an entry A[j]. In the jth update (k, c[j]), it must hold that c[j] ≥ 0. This can model insertion-only streams: • monitoring the total …
45 pages | 1.22 MB | 1 year ago
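
The update rule A[k] ← A[k] + c[j] and the cash-register constraint c[j] ≥ 0 can be sketched as below. The function name `apply_updates` and the `cash_register` flag are assumptions made for the example; the snippet itself only states the update rule and the constraint.

```python
from collections import defaultdict


def apply_updates(updates, cash_register=True):
    """Apply a stream of (k, c) updates to a sparse vector A.

    With cash_register=True, every increment must be non-negative,
    which models an insertion-only stream.
    """
    A = defaultdict(int)
    for k, c in updates:
        if cash_register and c < 0:
            raise ValueError(f"cash-register model requires c >= 0, got {c}")
        A[k] += c  # A[k] <- A[k] + c[j]
    return dict(A)


if __name__ == "__main__":
    print(apply_updates([("a", 1), ("b", 2), ("a", 3)]))
```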
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 4/23: Cardinality and frequency estimation. … 12 or 01100 => rank(x1) = 2 • x2 = 9013, h(x2) = 24 or 11000 => rank(x2) = 3. $h(x) = \sum_{k=0}^{M-1} i_k 2^k = (i_0 i_1 \ldots i_{M-1})_2,\ i_k \in \{0, 1\}$ … Let n be the … $\cdot \tfrac{1}{2} \cdots \tfrac{1}{2} = 2^{-r}$. The probability of not seeing a tail with at least r 0s among k elements is $(1 - 2^{-r})^k$. Is this a good estimate? … The probability …
69 pages | 630.01 KB | 1 year ago
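
The rank examples and the probability expression above can be checked with a short sketch. It assumes, as the two examples suggest, that rank counts the trailing zero bits of the hash value; the function names are illustrative only.

```python
def rank(h: int) -> int:
    """Number of trailing zero bits in a positive hash value h."""
    if h <= 0:
        raise ValueError("rank is only defined here for h > 0")
    # h & -h isolates the lowest set bit; its position is the trailing-zero count.
    return (h & -h).bit_length() - 1


def prob_no_tail(r: int, k: int) -> float:
    """Probability that none of k hashed elements ends in at least r zeros."""
    return (1 - 2.0 ** -r) ** k


if __name__ == "__main__":
    print(rank(12), rank(24))   # hash values from the snippet: 01100 and 11000
    print(prob_no_tail(3, 8))
```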
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 4/28: Graph Streaming. … A k-spanner is a graph synopsis that preserves the distances between any pair of nodes up to a factor of k: $\forall (u, v) \in V,\ d_G(u, v) \le d_H(u, v) \le k \cdot d_G(u, v)$. The k-spanner synopsis: E(H) = {}; for (u, v) in input do: if d_H(u, v) > k then E(H).add((u, v)) … [figure: spanner example, k=3] …
72 pages | 7.77 MB | 1 year ago
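
The k-spanner pseudocode above can be made runnable along the following lines. This is a sketch under stated assumptions: the graph is unweighted and undirected, d_H is measured in hops with a bounded breadth-first search, and unreachable pairs count as being farther apart than k. The helper names are hypothetical.

```python
from collections import deque


def hop_distance(adj, src, dst, limit):
    """Hop distance from src to dst in H, or None if it exceeds `limit`."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue  # do not explore beyond `limit` hops
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def build_spanner(edges, k):
    """Keep a streamed edge (u, v) only if d_H(u, v) > k in the spanner so far."""
    adj = {}
    spanner = []
    for u, v in edges:
        if hop_distance(adj, u, v, k) is None:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            spanner.append((u, v))
    return spanner


if __name__ == "__main__":
    triangle = [(1, 2), (2, 3), (1, 3)]
    print(build_spanner(triangle, 2))
```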
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics, Spring 2020
Vasiliki (Vasia) Kalavri (vkalavri@bu.edu), Boston University. 4/21: Sampling and filtering streams. … input • k independent and uniformly distributed hash functions, where k << n. The Bloom filter: n bits; k hash functions h1, h2, …, hk; stream elements x. The empty filter is initialized to all 0s. … for i=1 to k do: j = hi(x); if the jth bit …
74 pages | 1.06 MB | 1 year ago
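
The snippet above cuts off mid-loop, so the following is a sketch of the standard Bloom filter insert and lookup rather than the slide's exact pseudocode. Deriving the k hash functions from SHA-256 with an index prefix is an assumption made to keep the example self-contained, and the class and method names are illustrative.

```python
import hashlib


class BloomFilter:
    """n-bit array with k hash functions; the empty filter is all 0s."""

    def __init__(self, n_bits: int, k_hashes: int):
        self.n = n_bits
        self.k = k_hashes
        self.bits = [0] * n_bits

    def _positions(self, x: str):
        # Hypothetical construction of h_1..h_k: SHA-256 over "i:x", reduced mod n.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, x: str) -> None:
        """Set the bit at position h_i(x) for every i."""
        for j in self._positions(x):
            self.bits[j] = 1

    def might_contain(self, x: str) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all(self.bits[j] for j in self._positions(x))


if __name__ == "__main__":
    bf = BloomFilter(n_bits=1024, k_hashes=3)
    bf.add("#Brexit")
    print(bf.might_contain("#Brexit"))
```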
23 results in total