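GPU Resource Management On JDOS. 梁永清 (Liang Yongqing), liangyongqing1@jd.com. Services provided: 1. GPU containers for experiments; 2. a machine learning training service based on Kubeflow; 3. a model management and model Serving service. Experiment, Training, and Serving are all container-based; GPU physical machines are not provided directly to business teams. GPU experiments; JDOS standard container service … (11 pages, 13.40 MB)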
Node Operator: Kubernetes Node Management Made Simple. 陈俊 (Joe), Ant Financial. Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced topics: • Upgrading master and node components reliably • Canary rollout • Master and node component version management. Motivation: Work Order Deployment. Work Order • Upgrade node versions • Upgrade node … 10.10: the deployment system cannot meet the requirements of resource management. Operator loop: Observe, Analyze, Action • Observe: watch the desired resource and the actual resource • Analyze: compute the difference between desired and actual … (18 pages, 11.70 MB)
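The Observe / Analyze / Action loop in the Node Operator excerpt is a reconcile pattern. Below is a minimal Go sketch of that pattern. The in-memory "cluster" map, the NodeStatus type, the version strings, and the batch size are hypothetical stand-ins and do not come from the talk. A real operator would watch the API server and drain nodes before upgrading them. The small batch mimics a canary-style rollout, in which only a few nodes change per pass.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeStatus is a hypothetical, simplified view of one node's component version.
type NodeStatus struct {
	Name           string
	KubeletVersion string
}

// observe reads the actual state. Here it copies an in-memory map that
// stands in for what a real operator would read from the API server.
func observe(cluster map[string]string) []NodeStatus {
	nodes := make([]NodeStatus, 0, len(cluster))
	for name, v := range cluster {
		nodes = append(nodes, NodeStatus{Name: name, KubeletVersion: v})
	}
	// Sort so every pass visits nodes in a deterministic order.
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Name < nodes[j].Name })
	return nodes
}

// analyze returns the nodes whose actual version differs from the desired one.
func analyze(desired string, actual []NodeStatus) []string {
	var drift []string
	for _, n := range actual {
		if n.KubeletVersion != desired {
			drift = append(drift, n.Name)
		}
	}
	return drift
}

// act upgrades at most batch drifted nodes (canary-style) and returns them.
func act(cluster map[string]string, desired string, drift []string, batch int) []string {
	if batch < 0 {
		batch = 0
	}
	if batch > len(drift) {
		batch = len(drift)
	}
	done := drift[:batch]
	for _, name := range done {
		cluster[name] = desired // a real operator would drain and upgrade the node here
	}
	return done
}

// reconcileOnce runs one Observe -> Analyze -> Action pass.
func reconcileOnce(cluster map[string]string, desired string, batch int) []string {
	return act(cluster, desired, analyze(desired, observe(cluster)), batch)
}

func main() {
	cluster := map[string]string{"node-a": "v1.10", "node-b": "v1.10", "node-c": "v1.11"}
	for pass := 1; ; pass++ {
		upgraded := reconcileOnce(cluster, "v1.11", 1)
		if len(upgraded) == 0 {
			break // actual state now matches desired state
		}
		fmt.Printf("pass %d upgraded %v\n", pass, upgraded)
	}
}
```

Each pass recomputes the difference from scratch instead of remembering earlier work. A crash between passes therefore only delays convergence and does not corrupt it.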
VMware group: Kubernetes on vSphere Deep Dive, KubeCon China VMware SIG. … placement of pods. This is used to spread pods across availability zones while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also provides automated placement options for both control plane and worker nodes, so two levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs; that is, Kubernetes … (… affinity groups, NUMA, etc.). This session will explain the options for gaining better performance, resource optimization, and availability through vSphere tuning and Kubernetes configuration and labeling. (25 pages, 2.22 MB)
VMware SIG Deep Dive into Kubernetes Scheduling. … placement of pods. This is used to spread pods across availability zones while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also provides automated placement options for both control plane and worker nodes, so two levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs; that is, Kubernetes is … (… affinity groups, NUMA, etc.). This session will explain the options for gaining better performance, resource optimization, and availability through vSphere tuning and Kubernetes configuration and labeling. (28 pages, 1.85 MB)
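Both vSphere excerpts mention spreading pods across availability zones. Below is a minimal Go sketch of one such spreading rule. It borrows the maxSkew idea from Kubernetes pod topology spread constraints: placing a pod in a zone is allowed only if that zone's count plus one, minus the smallest zone count, stays within maxSkew. The function names, zone names, and counts are hypothetical. The real scheduler also weighs node affinity, taints, and resource fit, all of which this sketch ignores.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedZones returns, in name order, the zones where one more pod would
// keep (count+1) - globalMin <= maxSkew.
func allowedZones(counts map[string]int, maxSkew int) []string {
	minCount, first := 0, true
	for _, c := range counts {
		if first || c < minCount {
			minCount, first = c, false
		}
	}
	var zones []string
	for z, c := range counts {
		if c+1-minCount <= maxSkew {
			zones = append(zones, z)
		}
	}
	sort.Strings(zones)
	return zones
}

// pickZone chooses the least-loaded allowed zone; ties go to the first name.
func pickZone(counts map[string]int, maxSkew int) (string, bool) {
	best, found := "", false
	for _, z := range allowedZones(counts, maxSkew) {
		if !found || counts[z] < counts[best] {
			best, found = z, true
		}
	}
	return best, found
}

func main() {
	counts := map[string]int{"zone-a": 2, "zone-b": 1, "zone-c": 1}
	fmt.Println(allowedZones(counts, 1)) // [zone-b zone-c]
	for i := 0; i < 3; i++ {
		z, _ := pickZone(counts, 1)
		counts[z]++
		fmt.Println("placed pod in", z)
	}
}
```

With maxSkew set to 1, the three placements go to zone-b, zone-c, and then zone-a, which leaves every zone with two pods.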
KubeCon 2020 / Tencent Meeting: Technical Practices of Using Kubernetes at Large Scale. … release stateful services • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High-performance and comprehensive autoscaling. What is … Platform, etc. • Declarative application lifecycle management • Support for big data and AI jobs • Optimized resource isolation and improved resource utilization through hybrid deployment of online … Service Mesh • Large-scale, high-performance autoscaling capabilities • Multi-tenancy and quota management • etc. TKEx architecture: EKS (Elastic Kubernetes Service), TKE (Tencent Kubernetes Engine). (19 pages, 10.94 MB)
01. Analysis of Kubernetes (K8s) Extension Features. Application Catalog | Monitoring | Logging; Management Plane; Infrastructure Services: Policy Management, Cluster Operations, User Management, Lifecycle Management; Infrastructure Services (Networking … extend a managed resource into a current Kubernetes cluster • Auto-generated API in the Kubernetes API server • Customized resource controller to implement your business logic for the managed resource • Natural Kubernetes experience for operating your own resources, with Kubernetes RBAC and authentication • Where it comes from • From ThirdPartyResource in Kubernetes 1.6 • Create a CRD with a spec in Kubernetes … (12 pages, 1.08 MB)
Kubernetes & YARN: a hybrid container cloud. Jian He: Staff Engineer @Alibaba cluster management team; Staff Engineer @Hortonworks; Hadoop Committer & Project Management Committee member. Bushuang Gao: Senior Engineer @Alibaba. Efficient placement of service containers and tasks: when placed together, they should not affect each other (resource contention). • Online workload is low from 1:00am to 6:00am • Offline jobs scale … VTRON RPC; resource management. VTRON: Virtual Total Resources Of Node; cgroup … Kubernetes, YARN: online service usage, offline job resource usage, online service resource quota, offline … (42 pages, 25.48 MB)
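The Kubernetes & YARN excerpt lists three quantities: online service usage, online service resource quota, and offline job resource usage. It also notes that online load is low overnight. The Go sketch below shows one plausible way to derive how much of a node could be lent to offline jobs. The excerpt does not say how VTRON actually computes this, so the formula, the safety margin, the "conservative" mode, and all numbers are assumptions made for illustration.

```go
package main

import "fmt"

// NodeResources is a hypothetical per-node snapshot in CPU millicores.
type NodeResources struct {
	Total       int64 // node capacity
	OnlineQuota int64 // what online services are entitled to
	OnlineUsage int64 // what online services currently use
}

// offlineCapacity returns how much CPU could be lent to offline (YARN) jobs.
// Conservative mode reserves the full online quota. Otherwise only current
// online usage plus a safety margin is reserved, so the lendable amount grows
// when online load drops (e.g. overnight). Both rules are assumptions.
func offlineCapacity(n NodeResources, margin int64, conservative bool) int64 {
	reserved := n.OnlineUsage + margin
	if conservative {
		reserved = n.OnlineQuota
	}
	if free := n.Total - reserved; free > 0 {
		return free
	}
	return 0
}

func main() {
	day := NodeResources{Total: 32000, OnlineQuota: 24000, OnlineUsage: 20000}
	night := NodeResources{Total: 32000, OnlineQuota: 24000, OnlineUsage: 6000}
	fmt.Println(offlineCapacity(day, 2000, false))   // 10000
	fmt.Println(offlineCapacity(night, 2000, false)) // 24000
	fmt.Println(offlineCapacity(night, 2000, true))  // 8000
}
```

The usage-based rule lends far more capacity at night, but it requires reclaiming resources (for example through cgroup limits) if online load spikes. The quota-based rule is safer but lends less.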
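Kubernetes Open Source Book - 周立 (Zhou Li). (Dashboard) Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself. Container Resource Monitoring: Container Resource Monitoring records generic time-series metrics about containers in a central database and provides a UI for browsing that data. Cluster-level Logging … Namespaces provide a scope for names. Resource names must be unique within a namespace, but not across namespaces. Namespaces are a way to divide cluster resources between multiple uses (via resource quota). In future versions of Kubernetes, objects in the same namespace will have the same access control policies by default. It is not necessary to use multiple namespaces just to separate slightly different resources, such as different versions of the same software; you can use … General information about the Node, such as the kernel version, Kubernetes version (kubelet and kube-proxy versions), Docker version (if Docker is used), and OS name. This information is collected from the Node by the kubelet. Management: Unlike pods and services, a Node is not created by Kubernetes: it is created externally by cloud providers such as Google Compute Engine, or exists on physical or virtual machines … (135 pages, 21.02 MB)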
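The namespace rule quoted from the Kubernetes Open Source Book says names must be unique within a namespace but not across namespaces. It can be illustrated with a toy registry keyed by the pair (namespace, name). The Go sketch below is hypothetical and is not how the API server stores objects. It only shows why "web" can exist in both "dev" and "prod" but not twice in "dev".

```go
package main

import "fmt"

// key addresses a namespaced object by (namespace, name), not by name alone.
type key struct{ Namespace, Name string }

// Registry is a toy in-memory store illustrating namespace name scoping.
type Registry struct{ objects map[key]string }

func NewRegistry() *Registry { return &Registry{objects: map[key]string{}} }

// Create fails only if the same name already exists in the same namespace.
func (r *Registry) Create(ns, name, obj string) error {
	k := key{ns, name}
	if _, exists := r.objects[k]; exists {
		return fmt.Errorf("%s/%s already exists", ns, name)
	}
	r.objects[k] = obj
	return nil
}

func main() {
	r := NewRegistry()
	fmt.Println(r.Create("dev", "web", "v1"))  // <nil>
	fmt.Println(r.Create("prod", "web", "v1")) // <nil>: same name, different namespace
	fmt.Println(r.Create("dev", "web", "v2"))  // dev/web already exists
}
```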
Kubernetes Native DevOps Practice. … Environment variables • VolumeMounts: files to be shared or persisted • Resources: resource requirements • ActiveDeadlineSeconds: timeout of the build task • Lifecycle: actions defined … A node group for build nodes and a node group for user applications; scheduling customization; cluster resource autoscaling; the kubelet can do image GC. DevOps Service, DevOps Operator: the DevOps Operator can also use container probes if needed. Infrastructure layer: when resources are insufficient, the cloud provider (vCenter, OpenStack) removes or adds nodes. Extensibility / Integration • Easy to extend the task template. (21 pages, 6.39 MB)
Go Programming Pattern in Kubernetes Philosophy. …hood • Internal systems or commercial software. Kubernetes • The container orchestration and management project created by Google • Successor of Google's Borg/Omega systems • One of the most popular … Deployment name: nginx-deployment minReplicas: 1 maxReplicas: 10 metrics: - type: Resource resource: name: cpu targetAverageUtilization: 50 • API Object Oriented Programming • Core Code Generator • client-gen: generates a typed Kubernetes API client for a type • client.Pod.Get().Resource(…).Do() • conversion-gen: seamless upgrades between API versions • apiVersion: k8s.io/v1alpha1 … (29 pages, 2.12 MB)
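The autoscaler fragment in the Go Programming Pattern excerpt (minReplicas: 1, maxReplicas: 10, CPU targetAverageUtilization: 50) is driven by the Horizontal Pod Autoscaler's core formula, which the Kubernetes documentation gives as desired = ceil(current × currentUtilization / targetUtilization). The result is then kept within the min/max bounds. The Go sketch below applies only that formula. The real controller also applies a tolerance band (0.1 by default) and stabilization windows, which are omitted here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the core HPA formula
//   desired = ceil(current * currentUtil / targetUtil)
// and clamps the result to [minR, maxR]. Tolerance and stabilization,
// which the real controller also applies, are omitted.
func desiredReplicas(current int, currentUtil, targetUtil float64, minR, maxR int) int {
	if targetUtil <= 0 {
		return current // no meaningful target; leave the replica count unchanged
	}
	d := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if d < minR {
		d = minR
	}
	if d > maxR {
		d = maxR
	}
	return d
}

func main() {
	// Bounds and target taken from the manifest fragment: min 1, max 10, 50% CPU.
	fmt.Println(desiredReplicas(4, 75, 50, 1, 10))  // 6
	fmt.Println(desiredReplicas(4, 90, 50, 1, 10))  // 8 (ceil of 7.2)
	fmt.Println(desiredReplicas(8, 100, 50, 1, 10)) // 10 (16 clamped to max)
	fmt.Println(desiredReplicas(2, 10, 50, 1, 10))  // 1 (ceil of 0.4)
}
```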
42 results in total