OpenMetrics - Standing on the shoulders of Titans
Problem statement: After Prometheus. Prometheus has become a de facto standard in cloud-native metric monitoring. Ease of exposing data has led to an explosion in compatible metrics endpoints. Problem statement: Goldrush. So about fragmentation... Cloud native is in the buzz phase. Explosive growth always equals goldrush and stake claiming, aka fragmentation. Plans: Beyond cloud. OpenMetrics is intended to go beyond "just" cloud-native fields. Want to get more traditional projects and vendors on board. Arista already on board. Vertiv
0 码力 | 21 pages | 84.83 KB | 1 year ago
Intro to Prometheus - With a dash of operations & observability
Prometheus is a pull-based system. Black-box monitoring: looking at a service from the outside (Does the server answer to HTTP requests?). White-box monitoring: instrumenting code from the inside (How much time Supports dozens of data sources. Modern UI. Allows for complex data manipulation and visualization. Native Prometheus support. New feature: interactive exploration of Prometheus data. Richard Hartmann & Frederic
0 码力 | 19 pages | 63.73 KB | 1 year ago
Prometheus Deep Dive - Monitoring. At scale.
second project to ever join CNCF and the de facto standard in cloud-native monitoring. Kubelets, sidecars, microservices, ALL the cloud-native. But it's a monolithic application... why? Richard Hartmann & issues. Some bugs and some human mistakes in the release process. Always running latest is the cloud-native approach, but this is still not acceptable... especially if every single version has its issues
0 码力 | 34 pages | 370.20 KB | 1 year ago
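1.6 Building a Comprehensive Monitoring System with Nightingale's Extensibility
Yu Bo, DiDi, Expert Engineer. Contents: 01 Where operations monitoring requirements come from. 02 Monitoring pain points: complete coverage, cross-cloud. 03 Introducing Nightingale: a Chinese open-source monitoring system. 04 Nightingale design and implementation: Agentd data collection. 05 Nightingale design and implementation: Server data processing. 06 Nightingale design and implementation: technical challenges and details. Part 1: Where operations monitoring requirements come from. If your company's business depends heavily on IT and IT failures directly affect revenue, you must take your stability system seriously, and within that system monitoring is the most Nightingale: many companies already run it in production and are refining it together. The figure above shows some of the community users; to join the Nightingale community, contact WeChat: UlricQin. Server01, Server02, Agentd, Agentd, LoadBalance. 1. Single-node Prom 2. Clustered m3db 3. Clustered n9e-tsdb. Three storage options; choose as needed. Agentd, Forwarder. Part 5: Nightingale design and implementation: Server data processing. Nightingale Server data processing: 01. Servers 02. API 03. AlarmRule Control 04. CollectRule Control
0 码力 | 40 pages | 3.85 MB | 1 year ago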
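Design, Evolution, and Practice of Bilibili's Unified Monitoring System
sharding (experimental use) • prometheus 2.0 (tsdb). HA: prometheus, server1, server2, server3, IDC. Federation. 1. Lower the cost of writing rules 2. Lower the cost of maintaining multiple IDCs. Rule management page. Example - business monitoring: submissions, accounts, Feed. PaaS hosting, service tree, container, http server, sdk: register, fetch targets, collect data. Throughput, response time, error rate, saturation, circuit breaking, rate limiting; number of submissions, order data, online users, … Golden metrics, business metrics, a small number of events
0 码力 | 34 pages | 650.25 KB | 1 year ago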
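PromQL: From Beginner to Expert
mysql_slave_status_slave_sql_running == 0 and ON (instance) mysql_slave_status_master_server_id > 0
This PromQL expression means: if the MySQL instance is a slave (master_server_id > 0), check its slave_sql_running value; slave_sql_running == 0 means the slave SQL thread is not running. However, the label sets of the two metrics mysql_slave_status_slave_sql_running and mysql_slave_status_master_server_id may not be exactly identical. Fortunately, both carry an instance label, and series that share the same instance label semantically describe several metrics of one instance, so the on keyword can be used to match on the instance label only and ignore all other labels.
0 码力 | 16 pages | 2.77 MB | 1 year ago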
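
The matching behaviour described in the PromQL entry above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how Prometheus implements it: the helper names keep_where and and_on, the sample instances (db1:9104 and so on), and the extra scrape_group label are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]  # one series: (label set, current value)


def keep_where(vector: List[Sample], predicate: Callable[[float], bool]) -> List[Sample]:
    """Imitate a PromQL comparison filter such as `metric == 0`."""
    return [(labels, value) for labels, value in vector if predicate(value)]


def and_on(left: List[Sample], right: List[Sample], on: List[str]) -> List[Sample]:
    """Imitate `left and on(<labels>) right`: keep each left-hand series whose
    `on` labels equal those of at least one right-hand series."""
    right_keys = {tuple(labels.get(name, "") for name in on) for labels, _ in right}
    return [
        (labels, value)
        for labels, value in left
        if tuple(labels.get(name, "") for name in on) in right_keys
    ]


# Hypothetical sample data. The two metrics deliberately carry different
# label sets (scrape_group exists only on the second one).
slave_sql_running: List[Sample] = [
    ({"instance": "db1:9104", "job": "mysql"}, 0.0),  # replica, SQL thread stopped
    ({"instance": "db2:9104", "job": "mysql"}, 1.0),  # replica, SQL thread running
    ({"instance": "db3:9104", "job": "mysql"}, 0.0),  # not a replica
]
master_server_id: List[Sample] = [
    ({"instance": "db1:9104", "job": "mysql", "scrape_group": "a"}, 1.0),
    ({"instance": "db2:9104", "job": "mysql", "scrape_group": "a"}, 1.0),
    ({"instance": "db3:9104", "job": "mysql", "scrape_group": "b"}, 0.0),
]

stopped = keep_where(slave_sql_running, lambda v: v == 0)
is_replica = keep_where(master_server_id, lambda v: v > 0)

# Match on `instance` only; the differing scrape_group label is ignored.
alerts = and_on(stopped, is_replica, on=["instance"])
print(alerts)
```

Without the restriction to instance, the differing label sets would never line up and the result would be empty, which is the problem the on keyword solves in the original expression.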













