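GPU Resource Management On JDOS | Liang Yongqing (梁永清) liangyongqing1@jd.com Services provided: 1. GPU containers for experiments 2. Machine learning training service based on Kubeflow 3. Model management and model Serving service. Experiment, Training, Serving: all are container-based; physical GPU machines are not offered directly to business teams. GPU experiments: JDOS regular container service | 11 pages | 13.40 MB | 1 year ago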
Node Operator: Kubernetes Node Management Made Simple | Chen Jun (陈俊, Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: • Upgrade Master & Node Components reliably • Canary Rollout • Master & Node Component Versions Management Motivation: Work Order Deployment Work Order • Upgrade Nodes Versions • Upgrade Node 10.10 Complicated architecture Work order deployment system cannot meet the requirements of resource management. Operator Observe Action Analyze • Observe: watch desired resource and actual resource | 18 pages | 11.70 MB | 1 year ago
Kubernetes Security Survival Guide | In its container security guide, NIST identifies the five risks that containerized applications should be most concerned about: Image Risk, Registry Risk, Orchestrator Risk, Container Risk, Host OS Risk ©2019 VMware, Inc. Security hardening implementation reference for Kubernetes: Day 1 & Day 2 for k8s clusters Manages access to k8s API for developers IT Operator IaaS Management Internet User Application User Trust Boundary Trust Boundary Trust Boundary Trust Boundary Authentication and Authorization i. Compliance j. File System Permissions k. User Account Management All hardening is tested and verified before release. You no longer need to start from scratch with every upgrade. If a CVE vulnerability is found, an official patch is provided immediately. • The following servers are not used | 23 pages | 2.14 MB | 1 year ago
Methods for Achieving High SLOs on Large-Scale Kubernetes Clusters | the cluster SLIs on Large k8s Cluster 1. Cluster health state: a combined value that indicates the risk in the cluster. Currently Healthy, Warning and Fatal are the possible values. 2. Success rate A | 11 pages | 4.01 MB | 1 year ago
Key management / keys: Turtles all the way down - Securely managing Kubernetes Secrets | Accessible by users who shouldn’t have access, e.g., CEO ○ Stored in public storage buckets Secret management requirements Identity Require strong identities and least privilege Auditing Verify the use security against penetration. Similarly, poor key management may easily compromise strong algorithms.” NIST SP 800-57, Recommendation for Key Management Keys get old Key rotation ● Key rotation is meant stored cardholder data against disclosure and misuse. 3.6 Fully document and implement all key-management processes and procedures for cryptographic keys used for encryption of cardholder data, including | 52 pages | 2.84 MB | 1 year ago
VMware group: Kubernetes on vSphere Deep Dive KubeCon China VMware SIG | placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes pre-container era Active discussions regarding Kubernetes enhancements going on now in Resource Management Working Group – please join in • See Issue #49964 Using a NUMA aware hypervisor to solve | 25 pages | 2.22 MB | 1 year ago
VMware SIG Deep Dive into Kubernetes Scheduling | placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes High pre-container era Active discussions regarding Kubernetes enhancements going on now in Resource Management Working Group – please join in • See Issue #49964 Using a NUMA aware hypervisor to solve | 28 pages | 1.85 MB | 1 year ago
QCon Beijing 2017/Intelligent Operations/Self Hosted Infrastructure: Automated Operation of Kubernetes as an Example | distributed system Self driving infrastructure Topics ● Cluster management systems ● Today’s problems with operating cluster management systems ● A self-driving approach Motivation: microservices components ○ dynamic dependencies ○ fast deployment iteration ● Solution: automation Cluster management system ● Automation ○ Scheduling ○ Deployment ○ Healing ○ Discovery/load balancing ○ Scaling Kubernetes? ● Operational expertise around app management in k8s extends to k8s itself ○ E.g. scaling ● Bootstrapping simplified ● Simplify cluster life cycle management ○ E.g. updates ● Upstream improvements | 73 pages | 1.58 MB | 1 year ago
KubeCon2020/Technical Practices of Large-Scale Kubernetes Use at Tencent Meeting | way to release stateful service • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High performance and comprehensive autoscaling systems like Route System, CMDB, CI, Security Platform, etc. • Declarative application lifecycle management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource Service Mesh. • Large-scale and high-performance autoscaling capabilities. • Multi-tenant and quota management. • etc. TKEx Architecture EKS (Elastic Kubernetes Service) TKE (Tencent Kubernetes Engine) | 19 pages | 10.94 MB | 1 year ago
QCon Beijing 2018/QCon Beijing 2018 - "Kubernetes - Future-Oriented Development and Deployment" - Michael Chen | Very manual, no fault tolerance, hard to scale, etc • Scheduling, provisioning, and resource management of multiple containers – Docker, Mesos → Kubernetes Support – AWS, Azure, Google → Kubernetes ContainerImage2 Replicas: 2 Kubernetes 101 at the Highest Level • Container Cluster = “Desired State Management” – Kubernetes Cluster Services (w/API) • Node = Container Host w/agent called “Kubelet” • Application Switch Namespace ‘foo’ PODs – Logical Switch Namespace ‘demo’ PODs – Logical Switch Cluster Management Nodes – Logical Switch Master ‘VM’ etcd NCP API Srv Worker ‘VM’ Pod 1 Pod 2 Worker ‘VM’ | 42 pages | 10.97 MB | 1 year ago