GPU Resource Management On JDOS
Liang Yongqing (梁永清) liangyongqing1@jd.com Services provided: 1. GPU containers for experiments 2. Machine learning training service based on Kubeflow 3. Model management and model Serving service. Experiment, Training, Serving: all container-based; GPU physical machines are not provided directly to business teams. GPU experiments: the regular JDOS container service
0 credits | 11 pages | 13.40 MB | 1 year ago

Node Operator: Kubernetes Node Management Made Simple
陈俊 (Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: • Upgrade Master & Node Components reliably • Canary Rollout • Master & Node Component Versions Management Motivation: Work Order Deployment Worker Order • Upgrade Nodes Versions • Upgrade Node 10.10 Complicated architecture Work order deployment system cannot meet the requirements of resource management. Operator Observe Action Analyze • Observe: watch desired resource and actual resource
0 credits | 18 pages | 11.70 MB | 1 year ago

Keys managing keys: Turtles all the way down - Securely managing Kubernetes Secrets
Accessible by users who shouldn’t have access, e.g., CEO ○ Stored in public storage buckets Secret management requirements Identity Require strong identities and least privilege Auditing Verify the use application-layer … then tries to decrypt it https://xkcd.com/538/, https://xkcd.com/license.html Key rotation “Keys are analogous to the combination of a safe. If a safe combination is known to an … penetration. Similarly, poor key management may easily compromise strong algorithms.” NIST SP 800-57, Recommendation for Key Management Keys get old Key rotation ● Key rotation is meant to limit the
0 credits | 52 pages | 2.84 MB | 1 year ago

Putting an Invisible Shield on Kubernetes Secrets
Sensitive information • Passwords • OAuth tokens • ssh keys etc. • Stored in etcd • distributed Key-Value data store • How about their security? • Default K8s setup • etcd contents not encrypted (only complicated! ✓ User access management => raw and extensive! ✓ Secrets management => crucial! • Financial-grade security [1] KubeCon China 2018: Node Operator: Kubernetes Node Management Made Simple - Joe Chen Emergency management • High Availability guarantee • KMS • API server & kms-plugin • Cron job backup for KEKs (from KMS) • Static key configuration support in kms-plugin • One click decryption • Key force
0 credits | 33 pages | 20.81 MB | 1 year ago

VMware group: Kubernetes on vSphere Deep Dive KubeCon China VMware SIG
placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes eligible to be scheduled on based on <key: value> labels on the node • Some labels are automatically created, but you can add more specified as NodeSelector <key: value> in the Pod spec Affinity Zones
0 credits | 25 pages | 2.22 MB | 1 year ago

VMware SIG Deep Dive into Kubernetes Scheduling
placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes High eligible to be scheduled on based on <key: value> labels on the node • Some labels are automatically created, but you can add more specified as NodeSelector <key: value> in the Pod spec Affinity Zones
0 credits | 28 pages | 1.85 MB | 1 year ago

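The two VMware snippets above describe a node as eligible when its <key: value> labels match the NodeSelector in the Pod spec. A minimal Python sketch of that matching rule follows. It is not the scheduler's code: the function names and the sample nodes (`node-a`, `node-b`, the `disktype` label) are hypothetical, and only the "every selector pair must appear on the node" rule is taken from the snippets.

```python
from typing import Dict, List, Mapping


def node_matches_selector(node_labels: Mapping[str, str],
                          node_selector: Mapping[str, str]) -> bool:
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


def eligible_nodes(nodes: Mapping[str, Mapping[str, str]],
                   node_selector: Mapping[str, str]) -> List[str]:
    """List the names of nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items()
            if node_matches_selector(labels, node_selector)]


# Hypothetical cluster: one automatically created label plus a user-added one.
nodes: Dict[str, Dict[str, str]] = {
    "node-a": {"kubernetes.io/hostname": "node-a", "disktype": "ssd"},
    "node-b": {"kubernetes.io/hostname": "node-b", "disktype": "hdd"},
}

ssd_nodes = eligible_nodes(nodes, {"disktype": "ssd"})
```

An empty selector matches every node in this sketch, because `all` over no pairs is true.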
KubeCon2020/Technical Practices of Using Kubernetes at Scale in Tencent Meeting
way to release stateful service ➢ Advanced scheduling to improve service stability ➢ Quota management to optimize resource orchestration efficiency ➢ High performance and comprehensive autoscaling systems like Route System, CMDB, CI, Security Platform, etc. • Declarative application lifecycle management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource Service Mesh. • Large-scale and high-performance autoscaling capabilities. • Multi-tenant and quota management. • etc. TKEx Architecture EKS (Elastic Kubernetes Service) TKE (Tencent Kubernetes Engine)
0 credits | 19 pages | 10.94 MB | 1 year ago

Kubernetes Open Source Book - Zhou Li (周立)
semantics. Labels can be used to organize and select subsets of objects. Labels can be attached to objects at creation time, and added or modified at any time afterwards. Each object can define a set of Labels. For a given object, each Key must be unique. "labels" : { "key1" : "value1" , "key2" : "value2" } Labels will eventually be indexed and reverse-indexed to allow efficient queries, watches, sorting, grouping, and similar operations. Do not use non-identifying, large "track" : "daily" , "track" : "weekly" These are only examples of commonly used Labels; you are free to define your own. Remember that a Label key must be unique for a given object. Syntax and character set: a Label is a key-value pair. A valid Label key has two segments, a name segment and a prefix segment, separated by / . The name segment is required, is at most 63 characters long, begins and ends with a letter or digit ( [a-z0-9A-Z] ), and may contain - , _ , . , letters, and digits in between. The prefix segment is optional. It must be a DNS subdomain: a series of DNS Labels separated by . , at most 253 characters in total, followed by / . If the prefix is omitted, the Label Key is presumed to be private to the user. Automated system components that add Labels to end-user objects (for example, kube-scheduler , kube-controller-manager , kube-apiserver
0 credits | 135 pages | 21.02 MB | 1 year ago

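The Label key syntax quoted in the Kubernetes Open Source Book entry above can be checked mechanically. The Python sketch below is a minimal validator, assuming the rules exactly as summarized there: an optional DNS-subdomain prefix of at most 253 characters, a `/`, and a required name of at most 63 characters. The function names are hypothetical, and the per-label DNS rule (lowercase, at most 63 characters per label) is an assumption taken from the usual DNS convention rather than from the snippet.

```python
import re

# Name segment: alphanumeric at both ends; '-', '_' and '.' allowed in between.
NAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?")
# One DNS label (assumption: lowercase, alphanumeric at both ends, '-' inside).
DNS_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def _is_valid_prefix(prefix: str) -> bool:
    """Check the optional prefix: dot-separated DNS labels, 1 to 253 chars total."""
    if not 0 < len(prefix) <= 253:
        return False
    return all(len(label) <= 63 and DNS_LABEL_RE.fullmatch(label) is not None
               for label in prefix.split("."))


def is_valid_label_key(key: str) -> bool:
    """Validate a label key of the form [prefix/]name."""
    prefix, sep, name = key.rpartition("/")
    if sep and not _is_valid_prefix(prefix):
        return False  # a '/' is present but the prefix is empty or malformed
    return len(name) <= 63 and NAME_RE.fullmatch(name) is not None
```

For instance, `is_valid_label_key("example.com/track")` and `is_valid_label_key("track")` both hold under these rules, while `"-track"` and `"/track"` are rejected.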
QCon Beijing 2018/QCon Beijing 2018 - "Kubernetes: Future-Oriented Development and Deployment" - Michael Chen
Very manual, no fault tolerance, hard to scale, etc • Scheduling, provisioning, and resource management of multiple containers – Docker, Mesos → Kubernetes Support – AWS, Azure, Google → Kubernetes ContainerImage2 Replicas: 2 Kubernetes 101 at the Highest Level • Container Cluster = “Desired State Management” – Kubernetes Cluster Services (w/API) • Node = Container Host w/agent called “Kubelet” • Application Node Basic Components Master Node ETCD kube-apiserver kube-controller-manager kube-scheduler • Key/Value Store • Leader based clustering • Can be clustered across Master Nodes • Contains all state
0 credits | 42 pages | 10.97 MB | 1 year ago

Model and Operate Datacenter by Kubernetes at eBay (submitted version)
Kubernetes You have your compute node now, all you need is to configure it by a configuration management orchestration. We use SaltStack. Let’s model a datacenter running Kubernetes Onboard Provision upgrade a cluster? ● Kubernetes is amazing on its simple architecture ● Model + Controller is the key concept of Kubernetes ● It’s easy to extend Kubernetes API and write your controller based on list/watch KafkaCluster, HadoopCluster, MongoDB, ESCluster …… Fleet (Compute, Network, Storage) Configuration Management Infrastructure Service Application Service Recap We are hiring! xnxin@ebay.com cmei@ebay.com
0 credits | 25 pages | 3.60 MB | 1 year ago

38 results in total














