Memory Usage - IT文库 (IT Library)


Prometheus Deep Dive - Monitoring. At scale.

Introduction; Intro; 2.0 to 2.2.1; 2.4 - 2.6; Beyond; Outro ... Storage results: 15x reduction in memory usage; 6x reduction in CPU usage; 80-100x reduction in disk writes; 5x reduction in on-disk size; 4x reduction in ... has not yet been committed, or has been rolled back, is ignored at query time. We keep write IDs in memory; if we restart or crash, the atomicity of the write ahead log will protect us. ... Richard Hartmann & ...

0 码力 (site credits) | 34 pages | 370.20 KB | 1 year ago
White Paper on Building an Alerting and OnCall Event Center (告警OnCall事件中心建设方法白皮书)

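... incident. As an example, take the rawest kind of alert event: host1 raises a cpu_usage_idle alert at timestamp1, and we call this an event. If the condition does not recover, a second alert is usually sent some time later, say at timestamp1 + 60min, again for host1 and again for the cpu_usage_idle metric. These two alert events are clearly related: they refer to the same problem and differ only in their timestamps. Two such ... the unique identifier of an alert. In the example above, assume the alert rule ID is 32 and the label set is ["__name__=cpu_usage_idle", "host=host1"]; the alert events produced at the two timestamps then have the same hash value. It is computed as: hash(32 + ["__name__=cpu_usage_idle", "host=host1"]) ... This convergence logic from event to alert ... aggregates events into alerts and alerts into incidents, and what is finally notified is the incident. So how exactly does the aggregation work? Alert aggregation: aggregating events into alerts is fairly easy. An algorithm like the following is typically used to determine which events are related: hash(32 + ["__name__=cpu_usage_idle", "host=host1"]) We will call this value the event Hash; events with the same Hash are aggregated into a single alert. Merging alerts into incidents is more complex. We currently support rule-based aggregation, and algorithm-based aggregation will follow: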
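The event Hash described in this excerpt can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say which hash function is used or how the rule ID and label set are serialized, so SHA-256, the `|` and `,` separators, the label sorting, and the names `event_hash` and `aggregate_events` are all hypothetical choices made here.

```python
import hashlib
from collections import defaultdict


def event_hash(rule_id: int, labels: list[str]) -> str:
    """Return a stable key for an alert event.

    Assumption: labels are sorted so that label order does not change the
    hash, and SHA-256 stands in for the unspecified hash function.
    """
    canonical = str(rule_id) + "|" + ",".join(sorted(labels))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def aggregate_events(events: list[dict]) -> dict[str, list[dict]]:
    """Group events that share an event hash into one alert each."""
    alerts: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        alerts[event_hash(event["rule_id"], event["labels"])].append(event)
    return dict(alerts)


# Two events for host1, 60 minutes apart, plus one event for another host.
events = [
    {"rule_id": 32, "labels": ["__name__=cpu_usage_idle", "host=host1"], "ts": 0},
    {"rule_id": 32, "labels": ["__name__=cpu_usage_idle", "host=host1"], "ts": 3600},
    {"rule_id": 32, "labels": ["__name__=cpu_usage_idle", "host=host2"], "ts": 0},
]
alerts = aggregate_events(events)
host1_key = event_hash(32, ["__name__=cpu_usage_idle", "host=host1"])
```

In this sketch the timestamp is deliberately left out of the hashed value, so the two host1 events land in the same alert while the host2 event forms its own.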

0 码力 (site credits) | 23 pages | 1.75 MB | 1 year ago
Intro to Prometheus - With a dash of operations & observability

... not an alert. Important but non-urgent incidents are handled during business hours. Predict your usage so you add capacity during business hours. If there's no playbook, it does not go into production. ... observability; Outro ... Leverage: One combined system allows for correlation and combination. Power usage against service load. Optical networks against outside temperature. Datacenter power feed load against ... state. Dashboards for drill-down. Auto-generated PDFs for customers. Global SLO statements for sales. Usage exports for accounting. If all you have is a hammer... choose your hammer well. Richard Hartmann & ...

0 码力 (site credits) | 19 pages | 63.73 KB | 1 year ago
