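GPU Resource Management On JDOS
GPU Resource Management On JDOS. 梁永清, liangyongqing1@jd.com. Services provided: 1. GPU containers for experiments 2. A machine learning training service based on Kubeflow 3. Model management and model serving. Experiment, Training, and Serving are all container-based; physical GPU machines are not offered directly to business teams. GPU experiments: the regular JDOS container service ...
11 pages | 13.40 MB | 1 year ago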
VMware Group: Kubernetes on vSphere Deep Dive KubeCon China VMware SIG
... placement of pods. This is used to spread pods across availability zones, while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also ... tier of high availability and automated placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration ... affinity groups, NUMA, etc.). This session will explain the options to gain better performance, resource optimization and availability through tuning of vSphere, and Kubernetes configuration and labeling ...
25 pages | 2.22 MB | 1 year ago
VMware SIG Deep Dive into Kubernetes Scheduling
... placement of pods. This is used to spread pods across availability zones, while still respecting resource access and availability concerns. When Kubernetes runs on vSphere, the hypervisor platform also ... tier of high availability and automated placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration ... affinity groups, NUMA, etc.). This session will explain the options to gain better performance, resource optimization and availability through tuning of vSphere, and Kubernetes configuration and labeling ...
28 pages | 1.85 MB | 1 year ago
Kubernetes Native DevOps Practice
Architecture and Features • CRD and operator design • Pipeline / Stage / Task / Task Template / Version Control • Logging, monitoring, autoscaling, high availability • Extensibility / Integration • CI/CD ... Environment variable [] VolumeMounts - Files to be shared or persisted [] Resources - Resource requirement ActiveDeadlineSeconds - Timeout of build task Lifecycle - Actions defined ... Architecture and Features • CRD and operator design • Pipeline / Stage / Task / Task Template / Version Control / UI generation / Volume... • Logging, monitoring, autoscaling, high availability • Extensibility / Integration
21 pages | 6.39 MB | 1 year ago
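Operator Pattern: Best Practices for Extending Kubernetes with Go
... is gradually becoming the first choice for developing operators. The Operator Pattern is the officially defined standard extension mechanism and is a K8s Native Application; Operator = CRD + control loop, i.e., Declarative API + Automation; kubebuilder + controller-runtime + helm ... Operator leveraging the status block of the Custom Resource. Configuration of the workload • Operator provides configuration via the spec section of the Custom Resource • Operator reconciles configuration and ... project structure 2. Understand how to design a K8s Declarative API 3. Understand how to obtain events related to a CR (custom resource) 4. Understand how to implement the Operator Control Loop (i.e., the Reconcile function) 5. Understand how to generate secondary resources (Managed Resource) 6. Understand how to write unit tests 7. Understand how to build a Helm Chart. Follow-up questions: 1. How to, without starting ...
21 pages | 3.06 MB | 9 months ago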
KubeCon2020/Resource Orchestration Optimization for Large-Scale Kubernetes Clusters
Resource orchestration optimization of Kubernetes clusters at large scale. Patrickxie (谢谆志). Background: Cloud has been the general trend. How to manage so many clusters, resources and businesses. How to ensure load balancing of cluster nodes 1 2 Improper resource requests 3 Multi-tenant resource preemption. How to expand horizontally more quickly and flexibly 4 Region1 How do you manage ... K8s scheduling is based on the resource requests of Pods. However, in many cases, some nodes have low resource requests but high load, while some nodes have high resource requests but low load. Dynamic-Scheduler ...
27 pages | 3.91 MB | 1 year ago
Go Programming Pattern in Kubernetes Philosophy
... metrics: - type: Resource resource: name: cpu targetAverageUtilization: 50 • API Object Oriented Programming. Core of API “OO”: 1. API objects are stored in etcd 2. Control loops (Sync Loop) ... bind operation 4.2 Start Pod on this machine etcd scheduler api-server ... Pattern 1: Controller • Control everything by Controller • Level driven, not edge driven edge level Image: https://speakerdeck ... Code Generator • client-gen: generate typed Kubernetes API client for type • client.Pod.Get().Resource(…).Do() • conversion-gen: seamless upgrades between API versions • apiVersion: k8s.io/v1alpha1
29 pages | 2.12 MB | 1 year ago
Kubernetes & YARN: a hybrid container cloud
Efficient placement of service containers and tasks. When placed together, don't affect each other. Resource contention ... - Online workload low 1:00am - 6:00am - Offline jobs scale ... Kubernetes: Focus on long running services. Driving current state towards desired state with control loops. YARN: Focus on scheduling jobs ... Kubernetes: Container centric, bottom up. ... VTRON RPC Resource management. VTRON: Virtual Total Resources Of Node. cgroup ... Kubernetes YARN Online service usage Offline job resource usage Online service resource quota Offline ...
42 pages | 25.48 MB | 1 year ago
KubeCon2020/Technical Practice of Using Kubernetes at Large Scale in Tencent Meeting
... stateful service • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High performance and comprehensive autoscaling. What is TKEx • Based ... management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource utilization using hybrid deployment of online and offline services. • Support Service Mesh. ... RollingUpdate? • What are the advantages of batch gray release? • More reliable and better control • More flexible • More efficient. StatefulSetPlus StatefulSetPlus Service (Kube-proxy, CLB, etc ...
19 pages | 10.94 MB | 1 year ago
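Kubernetes Open Source Book - 周立
Labels let users organize their resources however they like. Annotations let users decorate resources with custom information to support their workflows, and give management tools a simple way to checkpoint state. In addition, the Kubernetes control plane uses the same API that is available to developers and users. Users can write their own controllers, such as a scheduler, using their own API, and these APIs can be ... by a generic command-line ... (Dashboard) Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It lets users manage and troubleshoot the applications in a cluster as well as the cluster itself. Container Resource Monitoring: Container Resource Monitoring records generic time-series metrics about containers in a central database and provides a UI for browsing that data. Cluster-level Logging ... us. You must provide the spec, which describes the desired state of the object: the characteristics you want it to have. The status describes the actual state of the object and is supplied and updated by the Kubernetes system. At all times, the Kubernetes Control Plane actively manages the object's actual state so that it matches the desired state you supplied. For example, a Kubernetes Deployment is an object that represents an application running on the cluster. When creating a Deployment, you can set ...
135 pages | 21.02 MB | 1 year ago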