Intro to Prometheus - With a dash of operations & observability
@TwitchiH & @fredbrancz. Intro to Prometheus. Introduction Background Operations & observability Outro. Time split: (1) 1/3 Prometheus, (2) 1/3 Observability, (3) 1/3 Questions. Richard Hartmann & Frederic Branczyk ... Prometheus 101: Inspired by Google’s Borgmon. Time series database. int64 millisecond timestamp, float64 value. Instrumentation & exporters. Not for ... Main selling points: Highly dynamic, built-in service discovery. No hierarchical model, n-dimensional label set. PromQL: ...
19 pages | 63.73 KB | 1 year ago
Prometheus Deep Dive - Monitoring. At scale.
Introduction Intro 2.0 to 2.2.1 2.4 - 2.6 Beyond Outro. Prometheus 101: Inspired by Google’s Borgmon. Time series database. int64 timestamp, float64 value. Ecosystem of instrumentation & exporters. Not for ... @fredbrancz ... Main selling points: Highly dynamic, built-in service discovery. No hierarchical model, n-dimensional label set. PromQL: ... Storage, Prometheus 1.x: We used to have one file per time series, and one common index for all of time. Relatively easy to implement. Pretty efficient. Why change? Richard Hartmann
34 pages | 370.20 KB | 1 year ago
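PromQL: From Beginner to Mastery
querying/functions/ This section walks through some commonly used functions with examples. absent_over_time takes a range vector: if the range vector is empty it returns 1, meaning absent; if the range vector contains data it returns nothing. In production this can be used for no-data alerting, for example: absent_over_time(system_load_norm_1{ident="tt-fc-dev02 ... alert as soon as any one machine loses contact, we might naively write: absent_over_time(system_load_norm_1[5m]) Unfortunately the result is not what we expect: as long as any one machine is still reporting monitoring data, this PromQL returns nothing. Even if 99 machines are already down and only the last one is still reporting, it still returns nothing. So in practice, to use absent_over_time for lost-contact alerting on 100 machines, we have to configure 100 alert rules, ... _over_time: this class of aggregation functions looks very similar to aggregation operators such as sum and avg from the aggregation operators chapter and is easy to confuse with them, so it deserves a note. Take avg: its argument is an instant vector, and it averages the values of multiple series at the same instant. avg_over_time takes a range vector and averages the multiple values that fall within the specified time range. For example, avg_over_time(mem_avai ...
16 pages | 2.77 MB | 1 year ago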
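The behavior described in the PromQL snippet above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the dictionary layout, the host names, and the helper missing_hosts are hypothetical, and the functions imitate the described semantics rather than Prometheus's actual implementation.

```python
from statistics import mean

# Hypothetical toy model: series name -> list of (timestamp_ms, value) samples
# that fall inside the queried range. Not a real Prometheus data structure.

def absent_over_time(window):
    """Return 1 if no selected series has any sample in the range, else None."""
    has_samples = any(samples for samples in window.values())
    return None if has_samples else 1

def missing_hosts(window, expected_hosts):
    """Evaluate absent_over_time once per host, mirroring one alert rule per machine."""
    return [
        host for host in expected_hosts
        if absent_over_time({host: window.get(host, [])}) == 1
    ]

def avg_instant(window, at_ms):
    """Like avg(): average across series, using each series' value at one instant."""
    values = [v for samples in window.values() for t, v in samples if t == at_ms]
    return mean(values)

def avg_over_time(window):
    """Like avg_over_time(): per series, average its samples across the range."""
    return {
        name: mean([v for _, v in samples])
        for name, samples in window.items()
        if samples
    }

# Two hosts are expected, but only host-a reported during the range.
window = {"host-a": [(0, 1.0), (1000, 3.0)], "host-b": []}
whole_fleet = absent_over_time(window)                     # one rule over all hosts
per_host = missing_hosts(window, ["host-a", "host-b"])     # one rule per host
```

Evaluated over the whole fleet, the check yields nothing because host-a still reports; only the per-host evaluation surfaces host-b, which is why the snippet ends up with one rule per machine.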
OpenMetrics - Standing on the shoulders of Titans
... implementation. Still finding minor bugs during implementation. RFC currently blocked on me finding time. Prometheus: experimental support since 2.5.0. Python client library: experimental support since 0 ... Plans: Beyond metrics. OpenMetrics supports more than just metrics. Every single data point in a time series can point to one single event. Especially useful if you emit one trace id per histogram bucket
21 pages | 84.83 KB | 1 year ago
4 results in total