监控Apache Flink应用程序(入门)..................................................................................... 16 4.13.1 Memory................................................................................................. 7/ops/config.html#configuring-the-network-buffers 8 https://www.da-platform.com/blog/manage-rocksdb-memory-size-apache-flink? __hstc=216506377.c9dc814ddd168ffc714fc8d2bf20623f. 1550652804788.1550652804788 metrics you want to look at are memory consumption and CPU load of your Task- & JobManager JVMs. 4.13.1 Memory Flink reports the usage of Heap, NonHeap, Direct & Mapped memory for JobManagers and TaskManagers0 码力 | 23 页 | 148.62 KB | 1 年前3
Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020integer ru between 0 and 9 and add the user to the sample if ru = 0. Do we need to keep all users in memory? ??? Vasiliki Kalavri | Boston University 2020 We can use a hash function h to hash the user name Kalavri | Boston University 2020 28 Assume we expect around 1 billion elements and we have a fixed memory budget of 512MB • How many hash functions to use? • What would be the false positive rate? Kalavri | Boston University 2020 28 Assume we expect around 1 billion elements and we have a fixed memory budget of 512MB • How many hash functions to use? • What would be the false positive rate?0 码力 | 74 页 | 1.06 MB | 1 年前3
PyFlink 1.15 Documentationenvironments to use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec could not meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.shi following: ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.shi 0 码力 | 36 页 | 266.77 KB | 1 年前3
PyFlink 1.16 Documentationenvironments to use. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name=\ -pyclientexec could not meet. ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.shi following: ./bin/flink run-application -t yarn-application \ -Djobmanager.memory.process.size=1024m \ -Dtaskmanager.memory.process.size=1024m \ -Dyarn.application.name= \ -Dyarn.shi 0 码力 | 36 页 | 266.80 KB | 1 年前3
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020e.g. a distributed filesystem or a database system • Available state backends in Flink: • In-memory • File system • RocksDB State backends 7 Vasiliki Kalavri | Boston University 2020 MemoryStateBackend latencies • OutOfMemoryError if large grows too large, GC pauses • Checkpoints sent to JobManager's heap memory, i.e. the state is lost in case of failure • Use only for development and debugging purposes! FsStateBackend TaskManager’s heap but checkpoints it to a remote file system • In-memory speed for local accesses and fault tolerance • Limited to TaskManager’s memory and might suffer from GC pauses Which backend to choose?0 码力 | 24 页 | 914.13 KB | 1 年前3
Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020the graph from disk and partition it in memory 10 ??? Vasiliki Kalavri | Boston University 2020 1. Load: read the graph from disk and partition it in memory 2. Compute: read and mutate the graph and partition it in memory 2. Compute: read and mutate the graph state 11 ??? Vasiliki Kalavri | Boston University 2020 1. Load: read the graph from disk and partition it in memory 2. Compute: read Lorenzo De, et al. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 2017. https://www.kdd.org/ kdd2016/papers/files/rfp0465-de-stefaniA.pdf Further reading0 码力 | 72 页 | 7.77 MB | 1 年前3
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020entire stream in an accessible way • we have to process stream elements on-the-fly using limited memory 2 Vasiliki Kalavri | Boston University 2020 Properties of data streams • They arrive continuously ins_r(P) ^ j.A ≠ i.A}). 28 Vasiliki Kalavri | Boston University 2020 Query processing challenges • Memory requirements: we cannot store the whole stream history. • Data rate: we cannot afford to continuously easily updated with a single pass over streaming tuples in their arrival order • Small space: memory footprint poly-logarithmic in the stream size • Low time: fast update and query times • Delete-proof:0 码力 | 45 页 | 1.22 MB | 1 年前3
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020cost against resource utilization • Operators on the same host compete for resources, e.g. memory and CPU Operator placement D A B C E D A B C E Profitability ??? Vasiliki Kalavri | Boston Avoid race conditions: either ensure the data is immutable or synchronize access to state. • Manage memory safely: reclaiming and growing without bounds. State sharing Avoid unnecessary data copies B A series of deterministic batch computations on small time intervals • Keep intermediate state in memory • Use Spark's RDDs instead of replication • Parallel recovery mechanism in case of failures 440 码力 | 54 页 | 2.83 MB | 1 年前3
Scalable Stream Processing - Spark Streaming and Flinkassociated with a Receiver object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources: 1. Basic sources directly available in associated with a Receiver object. • It receives the data from a source and stores it in Spark’s memory for processing. ▶ Three categories of streaming sources: 1. Basic sources directly available in Sources (2/3) class CustomReceiver(host: String, port: Int) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging { def onStart() { new Thread("Socket Receiver") { override def run() {0 码力 | 113 页 | 1.22 MB | 1 年前3
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020rate? • drop messages • buffer messages in a queue: what if the queue grows larger than available memory? 2 ??? Vasiliki Kalavri | Boston University 2020 Keeping up with the producers • Producers can rate? • drop messages • buffer messages in a queue: what if the queue grows larger than available memory? • block the producer (back-pressure, flow control) 2 ??? Vasiliki Kalavri | Boston University0 码力 | 43 页 | 2.42 MB | 1 年前3
共 15 条
- 1
- 2













