Intro to Prometheus - With a dash of operations & observability
... immediate benefits. Focus on removing repeated, manual tasks of no lasting benefit. Show that you free up time and reduce toil. Richard Hartmann & Frederic Branczyk, @TwitchiH & @fredbrancz, Intro to Prometheus ... so not to repeat them. To write a good incident report, there must be no fear of retribution. Blame-free post-mortems allow everyone to document exactly what went wrong and in what order. It is important ...
0 credits | 19 pages | 63.73 KB | 1 year ago
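Bilibili's Unified Monitoring System: Design, Evolution, and Practice
... Case 2. Alert rule: available disk capacity < 10%. Alert rule: disk capacity is predicted to saturate in 3 hours. [Chart axis labels: 0, now, -1h, +3h] predict_linear(node_filesystem_free{}[1h], 3 * 3600) < 0. Anomaly detection, abnormal traffic: abs(requests - requests:holt_winters_rate1h offset 7d) > 0.3 * ...
0 credits | 34 pages | 650.25 KB | 1 year ago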
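The saturation rule in the entry above depends on predict_linear, which extrapolates a trend from recent samples. The following is a minimal Python sketch of that idea, assuming an ordinary least-squares line fitted over (timestamp, value) pairs. It is not Prometheus's implementation; the function name, its signature, and the sample data are hypothetical and chosen only for illustration.

```python
def predict_linear_sketch(samples, eval_time, horizon_seconds):
    """Extrapolate a least-squares line fitted to (timestamp, value) samples.

    Returns the predicted value `horizon_seconds` after `eval_time`.
    Simplified illustration only; names and signature are hypothetical.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")

    # Shift timestamps so that the evaluation time is the origin.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all samples share the same timestamp")

    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x  # fitted value at eval_time

    return intercept + slope * horizon_seconds


# Hypothetical data: free space shrinking by 1 unit per second over the last hour.
samples = [(0, 7200.0), (1800, 5400.0), (3600, 3600.0)]
predicted = predict_linear_sketch(samples, eval_time=3600, horizon_seconds=3 * 3600)

# Mirrors the "< 0" condition of the alert rule: fire if the trend crosses zero.
should_alert = predicted < 0
```

With these made-up samples the fitted trend reaches zero well inside the three-hour horizon, so the condition is true even though free space is still positive at evaluation time.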
OpenMetrics - Standing on the shoulders of Titans
... bucket, i.e. exemplars. Some integrations already support this concept, e.g. OpenCensus. Ingestors are free to discard this optional data, e.g. Prometheus. Richard Hartmann, RichiH@{freenode,OFTC,IRCnet}, richih@{fosdem ...
0 credits | 21 pages | 84.83 KB | 1 year ago
Prometheus Deep Dive - Monitoring. At scale.
... per histogram bucket. Some integrations already support this concept, e.g. OpenCensus. Ingestors are free to discard this optional data, e.g. Prometheus. Richard Hartmann & Frederic Branczyk, @TwitchiH & @fredbrancz
0 credits | 34 pages | 370.20 KB | 1 year ago
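PromQL: From Beginner to Mastery
... are all removed. For now, this can be understood as a subtraction: vector1 - vector2. As an example, take the disk utilization problem again: for large disks over 1 TB, alert when the free space drops below 300 GB. How would the PromQL be written? disk_free{app="clickhouse"}/1024/1024/1024 < 300 unless disk_total{app="clickhouse"}/1024/1024/1024 < 1024
0 credits | 16 pages | 2.77 MB | 1 year ago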
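The "subtraction" reading of unless in the entry above can be sketched in Python. This is a simplified model that assumes each instant vector is a dictionary keyed by its label set, and that two series match only when their label sets are identical. The helper names, host names, and values are hypothetical; real PromQL vector matching has further options (such as on and ignoring) that are not modeled here.

```python
GIB = 1024 ** 3


def labels(**kv):
    """Build a hashable label set (hypothetical helper)."""
    return frozenset(kv.items())


def less_than(vector, threshold):
    """Model of `vector < threshold`: keep only series below the threshold."""
    return {ls: v for ls, v in vector.items() if v < threshold}


def scale(vector, divisor):
    """Model of `vector / divisor` applied to every series."""
    return {ls: v / divisor for ls, v in vector.items()}


def unless(left, right):
    """Model of `left unless right`: drop left series whose labels appear in right."""
    return {ls: v for ls, v in left.items() if ls not in right}


# Hypothetical samples in bytes for three ClickHouse hosts.
disk_free = {
    labels(app="clickhouse", instance="ch-1"): 200 * GIB,
    labels(app="clickhouse", instance="ch-2"): 100 * GIB,
    labels(app="clickhouse", instance="ch-3"): 500 * GIB,
}
disk_total = {
    labels(app="clickhouse", instance="ch-1"): 2048 * GIB,
    labels(app="clickhouse", instance="ch-2"): 512 * GIB,
    labels(app="clickhouse", instance="ch-3"): 4096 * GIB,
}

low_free = less_than(scale(disk_free, GIB), 300)       # free space under 300 GB
small_disks = less_than(scale(disk_total, GIB), 1024)  # disks under 1 TB
alerts = unless(low_free, small_disks)

alerting_instances = sorted(dict(ls)["instance"] for ls in alerts)
```

In this made-up data, ch-2 has little free space but is a small disk, so unless removes it; only ch-1 is both large and low on free space.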
5 results in total