How and When You Should Measure CPU Overhead of eBPF Programs
Bryce Kahle, Datadog, October 28, 2020. Why should I profile eBPF programs? CI variance tracking. Tools: kernel.bpf_stats_enabled, kernel…
20 pages | 2.04 MB | 1 year ago
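The kernel.bpf_stats_enabled sysctl named in that deck makes the kernel account run time and run count per eBPF program. A minimal sketch, assuming root and a kernel that exposes the sysctl (4.20 or newer); the bpftool call at the end simply prints the counters the talk is concerned with:

    /* Sketch: enable kernel.bpf_stats_enabled (equivalent to
     * `sysctl -w kernel.bpf_stats_enabled=1`), then dump per-program stats. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/kernel/bpf_stats_enabled", "w");
        if (!f) {
            perror("bpf_stats_enabled");
            return EXIT_FAILURE;
        }
        fputs("1", f);
        fclose(f);

        /* With stats on, `bpftool prog show` lists run_time_ns and run_cnt
         * for each loaded program; run_time_ns / run_cnt is the average CPU
         * cost of one invocation. */
        return system("bpftool prog show") == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

Turning the sysctl back off once the measurement window ends avoids paying the accounting overhead permanently.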
2.2.1 Non-Intrusive Application Observability with Golang + eBPF
… eBPF programming practice: bcc versus libbpf with BPF CO-RE. bcc relies on runtime compilation and embeds the large LLVM/Clang toolchain, so compilation is heavy on CPU and memory and depends on kernel headers. A BPF program is not very different from any other user-space program: compiled once into a binary, it can run in different environments, with libbpf acting as the loader, so developers only need to care about the correctness and performance of the BPF program, not … The architecture as perceived at runtime can be compared against the expected architecture to surface problems, typically when a new application goes live, a new region is opened, or the whole call chain is being reviewed. Anomaly detection: expressing anomalies through the colors of nodes and relationships quickly surfaces problematic nodes and edges, improving detection and localization, typically during runtime call-chain review and upstream/downstream analysis around a problem node. Correlation analysis: switching between relationships shows upstream requests, downstream dependencies, and the service's own instances, improving localization once an abnormal node has been identified. Full-stack data sources with 70+ out-of-the-box alert templates: application level (Pod/Service/Deployment), Kubernetes control plane (apiserver/ETCD/Scheduler), infrastructure (nodes, network, storage), cloud ser…
29 pages | 3.83 MB | 1 year ago
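The libbpf workflow that the excerpt contrasts with bcc comes down to opening a pre-compiled CO-RE object and letting libbpf handle relocation, loading and attachment. A minimal user-space loader sketch; the object file probe.bpf.o and its contents are assumptions for illustration, not taken from the document:

    /* Minimal libbpf loader sketch. The BPF program itself is compiled
     * separately with clang -target bpf against vmlinux.h (CO-RE); with
     * libbpf 1.0+ a NULL return signals failure on these calls. */
    #include <stdio.h>
    #include <bpf/libbpf.h>

    int main(void)
    {
        struct bpf_object *obj = bpf_object__open_file("probe.bpf.o", NULL);
        if (!obj) {
            fprintf(stderr, "failed to open BPF object\n");
            return 1;
        }
        if (bpf_object__load(obj)) {          /* verify + JIT in the kernel */
            fprintf(stderr, "failed to load BPF object\n");
            bpf_object__close(obj);
            return 1;
        }

        struct bpf_program *prog;
        bpf_object__for_each_program(prog, obj) {
            if (!bpf_program__attach(prog))    /* attach each program to its hook */
                fprintf(stderr, "failed to attach %s\n", bpf_program__name(prog));
        }

        getchar();                             /* keep probes attached until Enter */
        bpf_object__close(obj);
        return 0;
    }

Because the object is compiled once against vmlinux.h, the same binary can be shipped to hosts running different kernels, which is the portability argument the excerpt makes against bcc's runtime LLVM/Clang dependency.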
Cilium's Network Acceleration Secrets
The main gains, varying by scenario: lower packet forwarding latency, higher packet throughput, and lower CPU overhead for forwarding packets. eBPF in brief: introduced in Linux kernel 3.19, eBPF programs are written in user space, compiled, and dynamically loaded at designated kernel hook points, where they run safely in a VM; they can modify kernel data, influence the result of a request, or change how the kernel processes it, greatly improving the efficiency of kernel event handling. As of Linux 5.14 there are 32 eBPF program types; Cilium mainly uses:
• sched_cls: packet forwarding, load balancing, and filtering at TC
• xdp: packet forwarding, load balancing, and filtering at XDP
• cgroup_sock_addr: service resolution inside cgroups
• sock_ops + sk_msg: recording the sockets of local application-to-application traffic to accelerate local packet forwarding
Accelerating pod-to-pod traffic on the same node: Cilium's eBPF programs use helpers such as bpf_redirect() or bpf_redirect_peer() to forward traffic between pods on the same host quickly, bypassing much of the kernel protocol-stack processing.
14 pages | 11.97 MB | 1 year ago
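To make the same-node redirect concrete, here is a minimal sched_cls (tc) sketch, not Cilium's actual datapath code: it unconditionally hands every packet to the peer of a hypothetical veth ifindex, which is what bpf_redirect_peer() does for traffic headed into a pod's network namespace.

    /* Minimal tc/sched_cls sketch using bpf_redirect_peer(); TARGET_IFINDEX is
     * a placeholder, real code would look the destination device up per packet.
     * bpf_redirect_peer() requires kernel 5.10 or newer. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define TARGET_IFINDEX 4   /* hypothetical host-side veth ifindex */

    SEC("tc")
    int redirect_to_pod(struct __sk_buff *skb)
    {
        /* Continue processing directly on the peer device inside the pod's
         * namespace instead of traversing the host stack in between. */
        return bpf_redirect_peer(TARGET_IFINDEX, 0);
    }

    char _license[] SEC("license") = "GPL";

Unlike the older bpf_redirect(), bpf_redirect_peer() crosses into the peer device's namespace without the extra backlog-queue pass a normal veth hop incurs, which appears to account for much of the latency saving described above.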
Cilium v1.10 Documentation
… in-kernel verifier ensures that eBPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. eBPF programs can be run at various … Deriving rate limits based on the number of available CPU cores or available memory can be misleading, as the Cilium agent may itself be subject to CPU and memory constraints. For this reason, all API call … network-latency … Set CPU governor to performance: CPU frequency scaling up and down can impact latency tests and lead to sub-optimal performance, so to achieve maximum consistent performance, set the CPU governor to performance: …
1307 pages | 19.26 MB | 1 year ago
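The governor advice in that excerpt amounts to one sysfs write per CPU. A small sketch of that change, assuming root, a cpufreq-capable system, and the standard /sys/devices/system/cpu layout; the documentation presumably follows the heading with its own shell commands:

    /* Sketch: pin every CPU's cpufreq governor to "performance" via sysfs. */
    #include <stdio.h>
    #include <glob.h>

    int main(void)
    {
        glob_t g;
        if (glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor", 0, NULL, &g))
            return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
            FILE *f = fopen(g.gl_pathv[i], "w");
            if (!f)
                continue;                  /* e.g. offline CPU, skip it */
            fputs("performance", f);
            fclose(f);
        }
        globfree(&g);
        return 0;
    }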
Cilium v1.11 Documentation
… clusters or clustermeshes with more than 65535 nodes. Decryption with Cilium IPsec is limited to a single CPU core per IPsec tunnel. This may affect performance in case of high throughput between two nodes. WireGuard … in-kernel verifier ensures that eBPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. eBPF programs can be run at various … Deriving rate limits based on the number of available CPU cores or available memory can be misleading, as the Cilium agent may itself be subject to CPU and memory constraints. For this reason, all API call …
1373 pages | 19.37 MB | 1 year ago
Can eBPF save us from the Data Deluge?
Slide titles include "The data deluge on modern storage", "eBPF and DoS", and "Data DoS in reverse!"; each slide repeats a diagram of a compute node (CPU) linked over the network to a storage node (flash), annotated with a 16-lane PCIe link at 16 GB/s.
18 pages | 266.90 KB | 1 year ago
Cilium v1.8 Documentation
… in-kernel verifier ensures that BPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. BPF programs can be run at various … BPF datapath to perform more aggressive aggregation on packet forwarding related events to reduce CPU consumption while running cilium monitor. The automatic change only applies to the default ConfigMap … Deriving rate limits based on the number of available CPU cores or available memory can be misleading, as the Cilium agent may itself be subject to CPU and memory constraints. For this reason, all API call …
1124 pages | 21.33 MB | 1 year ago
Cilium v1.9 Documentation
… in-kernel verifier ensures that eBPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. eBPF programs can be run at various … Deriving rate limits based on the number of available CPU cores or available memory can be misleading, as the Cilium agent may itself be subject to CPU and memory constraints. For this reason, all API call … and kube-scheduler instances. The CPU, memory and disk size set for the workers might be different for your use case. You might have pods that require more memory or CPU available, so you should design your …
1263 pages | 18.62 MB | 1 year ago
Cilium v1.6 Documentation
… \ --min-cpu-platform "Intel Broadwell" \ kata-testing
gcloud compute ssh kata-testing
# While ssh'd into the VM:
$ [ -z "$(lscpu|grep GenuineIntel)" ] && { echo "ERROR: Need an Intel CPU"; exit 1; …
… in-kernel verifier ensures that BPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. BPF programs can be run at various … BPF datapath to perform more aggressive aggregation on packet forwarding related events to reduce CPU consumption while running cilium monitor. The automatic change only applies to the default ConfigMap …
734 pages | 11.45 MB | 1 year ago
Cilium v1.5 Documentation
… in-kernel verifier ensures that BPF programs are safe to run and a JIT compiler converts the bytecode to CPU architecture specific instructions for native execution efficiency. BPF programs can be run at various … between 10 seconds and 30 minutes, or 12 hours for LRU based maps. This should automatically optimize CPU consumption as much as possible while keeping the connection tracking table utilization below 25%. If needed, … and bpf-ct-global-tcp-max can be increased. Setting both of these options will be a trade-off of CPU for conntrack-gc-interval, and for bpf-ct-global-any-max and bpf-ct-global-tcp-max the amount of …
740 pages | 12.52 MB | 1 year ago
17 results in total














