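GPU Resource Management On JDOS | Liang Yongqing (梁永清) liangyongqing1@jd.com Services provided: 1. GPU containers for experiments 2. Kubeflow-based machine learning training service 3. Model management and model Serving service. Experiment, Training, and Serving are all container-based; physical GPU machines are not provided directly to business teams. GPU experiments: the regular JDOS container service | 11 pages | 13.40 MB | 1 year ago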
Node Operator: Kubernetes Node Management Made Simple | Chen Jun (陈俊, Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: • Upgrade Master & Node Components reliably • Canary Rollout • Master & Node Component Versions Management Motivation: Work Order Deployment Worker Order • Upgrade Nodes Versions • Upgrade Node 10.10 Complicated architecture Work order deployment system cannot meet the requirements of resource management. Operator Observe Action Analyze • Observe: watch desired resource and actual resource | 18 pages | 11.70 MB | 1 year ago
VMware group - Kubernetes on vSphere Deep Dive KubeCon China VMware SIG | project was to enable GPU on Kubernetes with vSphere. Also actively contributing to kubelet, device manager, device plugin area. GitHub: @figo Steve Wong Hui Luo Presenter Bios 3 Abstract Kubernetes placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes | 25 pages | 2.22 MB | 1 year ago
Putting an Invisible Shield on Kubernetes Secrets | complicated! ✓ User access management => raw and extensive! ✓ Secrets management => crucial! • Financial-grade security [1] KubeCon China 2018: Node Operator: Kubernetes Node Management Made Simple - Joe Chen extensions-webhook: /mutating-secret • Annotation: /storage-transform-disable= • Emergency management • High Availability guarantee • KMS • API server & kms-plugin • Cron job backup for KEKs (from Solution • Same TEE-based KMS-plugin runtime • Deployment modes • N (>=3) SGX servers deployed w/ sgx-device-plugin daemonset [1] • kms-plugins deployed as deployment • Interfaces • https + connection reuse | 33 pages | 20.81 MB | 1 year ago
Advancing the Tactical Edge with K3s and SUSE RGS | microservices-centric strategy, in solving the most complex of infrastructure management challenges. When it came to Kubernetes management, the team trialed a number of options. “K3s has been a foundational President Booz Allen Hamilton KubeEdge and K3s seemed the most natural starting point, given the device-centric use case. After assessing other leading Kubernetes distributions, it was clear that many view, the digital solutions business is seeing the impact of SmartEdge on the evolution of the device landscape. According to the team, Booz Allen’s clients can use a range of devices as the software | 8 pages | 888.26 KB | 1 year ago
User interface - State of the UI_ Leveraging Kubernetes Dashboard and Shaping its Future | plugins or integrations 2. Feature parity with kubectl 3. Multi-cluster management 4. Improved security Top requested changes 1. Third-party plugins or integrations running Kubernetes on-prem and in the cloud 2. Feature parity with kubectl 3. Multi-cluster management Most survey takers said that it is very or extremely important to see resources from Kubernetes in GCP and on-prem ● Custom Resource Definitions support ● Service topology view ● Mobile device support ● Cost estimates ● CI/CD pipelines ● ...and more! Additional feature requests Get involved | 41 pages | 5.09 MB | 1 year ago
K8S Installation, Deployment, and Open Services | PEERROUTES=yes IPV4_FAILURE_FATAL=no NAME=Internet UUID=b23b1970-690d-4af5-b014-4ac822dfd42c DEVICE=ens160 ONBOOT=yes IPADDR=202.114.193.101 GATEWAY=202.114.193.254 NETMASK=255.255.255.0 tencent.com/developer/article/1627330 Step1: Install docker-ce # Install the required supporting packages yum install -y yum-utils device-mapper-persistent-data lvm2 # Add the yum repository yum-config-manager --add-repo https://download.docker.c rs0:PRIMARY> rs.secondaryOk() rs0:PRIMARY> show dbs rs0:PRIMARY> use admin rs0:PRIMARY> db.device.find() O. Installing influxdb with Helm Step1. Download the influxdb helm chart helm repo add influxdata https://helm | 54 pages | 1.23 MB | 1 year ago
Kubernetes-Based Container + AI Platform | Registry project CI/CD workspace Pod … resources CPU quota MEM quota Storage quota Device (GPU) quota …. quota Service Config group … k8s objects Application template User scenarios | 19 pages | 3.55 MB | 1 year ago
Methods for Achieving High SLO on Large-Scale Kubernetes Clusters | process • orphaned containers • orphaned pod directories/volumes • orphaned cgroups • orphaned net device and so on, node recovery system can clean up this dirty data or alert cluster admins to process | 11 pages | 4.01 MB | 1 year ago
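In-Depth Analysis of KubeEdge, the CNCF Community's First Kubernetes-Based Edge Computing Platform | Kubernetes Pod and Node status is queried through kubectl in the cloud, and data is collected/reported from the edge. ➔ Edge nodes recover automatically when offline and reconnect to the cloud. ➔ IoT devices can communicate with edge nodes via Device twin and the MQTT protocol. Release V2.0+ Function List: (TBD) ➔ Build a service mesh with KubeEdge and Istio. | 20 pages | 2.08 MB | 1 year ago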













