Go on GPUChangkun Ou. 2023. Go on GPU. GopherChina 2023. Session "Foundational Toolchains" Go on GPU Changkun Ou changkun.de/s/gogpu GopherChina 2023 Session “Foundational Toolchains” 2023 June 10 1 Changkun Ou. 2023. Go on GPU. GopherChina 2023. Session "Foundational Toolchains" Agenda ● Basic knowledge for interacting with GPUs ● Accelerate Go programs using GPUs ● Challenges in Go when using outlooks 2 Changkun Ou. 2023. Go on GPU. GopherChina 2023. Session "Foundational Toolchains" Agenda ● Basic knowledge for interacting with GPUs ○ Motivation ○ GPU Driver and Standards ○ Render and0 码力 | 57 页 | 4.62 MB | 1 年前3
Bridging the Gap: Writing Portable Programs for CPU and GPU1/66Bridging the Gap: Writing Portable Programs for CPU and GPU using CUDA Thomas Mejstrik Sebastian Woblistin 2/66Content 1 Motivation Audience etc.. Cuda crash course Quiz time 2 Patterns Oldschool Motivation Patterns The dark path Cuda proposal Thank you Why write programs for CPU and GPU Difference CPU/GPU Algorithms are designed differently Latency/Throughput Memory bandwidth Number of cores Motivation Patterns The dark path Cuda proposal Thank you Why write programs for CPU and GPU Difference CPU/GPU Why it makes sense? Library/Framework developers Embarrassingly parallel algorithms User0 码力 | 124 页 | 4.10 MB | 6 月前3
Kubernetes for Edge Computing across
Inter-Continental Haier Production Sites应用互联互通 应用形态复杂 • KPI: 峰值CPU利用率不低 于30% • 资源申请:按峰值30%进 行申请 • 峰值:1000TPS, 平时: 100TPS • 做自己擅长的事情,合作 方式开发 • 产品迭代:如何持续演进 和优化 • 外包管理:如何标准化降 低管理成本,提高质量 外包开发模式 资源利用率KPI 01 04 02 03 海尔集团业务转型 提交多框架(TensorFlow、PyTorch 、MxNet等)的模型训练作业,支 持分布式和 GPU 加速,以及训练过 程的可视化。 模型训练 模型版本管理,模型推理服务的部署 、监控、管理和升级,提供 A/B test 和滚动升级。 模型服务 实现对 GPU 集群资源进行管理,根 据用户作业请求自动分配和回收 GPU 资源。 GPU 集群管理 对接存储系统,管理数据集;提供 notebook 交互式代码开发和调试工0 码力 | 33 页 | 4.41 MB | 1 年前3
Serverless Kubernetes - KubeConpricing • 降低服务运行成本:无需再为闲置的计算资源付费(No Cost when Idle) • 灵活选择容器资源规格(Fine-grained cost model) • 提高资源利用率 CPU (vCPU) Memory (GB) 1 Min. 2 and Max. 8GB, in 1GB increments 2 Min. 4 and Max. 16GB, in0 码力 | 16 页 | 4.25 MB | 1 年前3
PyTorch Release NotesDeep Learning SDK accelerates widely-used deep learning frameworks such as PyTorch. PyTorch is a GPU-accelerated tensor computational framework with a Python front end. Functionality can be easily extended standard defined neural network layers, deep learning optimizers, data loading utilities, and multi-gpu, and multi-node support. Functions are executed immediately instead of enqueued in a static graph, see Preparing to use NVIDIA Containers Getting Started Guide. ‣ For non-DGX users, see NVIDIA ® GPU Cloud ™ (NGC) container registry installation documentation based on your platform. ‣ Ensure that0 码力 | 365 页 | 2.94 MB | 1 年前3
POCOAS in C++: A Portable Abstraction for Distributed Data StructuresCPU vFast GPU vvFast PCI Bus (or other fabric)GPUs as a First-Class Computing Resource CPU GPU PCI Bus (or other fabric) NIC - Historically, network comm. was CPU-centric 1) Direct GPU access to Infiniband allows GPU-to-GPU network transfers 2) Fast in-node fabrics like NVLink, Infinity Fabric allow very fast intra-node transfers DataGPUs as a First-Class Computing Resource CPU GPU PCI Bus (or fabric) NIC Data - Historically, network comm. was CPU-centric 1) Direct GPU access to Infiniband allows GPU-to-GPU network transfers 2) Fast in-node fabrics like NVLink, Infinity Fabric allow0 码力 | 128 页 | 2.03 MB | 6 月前3
Taro: Task graph-based Asynchronous Programming Using C++ CoroutineB" : GPU operation 9Existing TGPSs on Heterogenous Computing - Challenge A C D B! B" 5 task_b = sched.emplace([](&){ 6 // CPU code; // GPU code; 7 }); // CPU thread blocks until GPU finishes B" : GPU operation 10Existing TGPSs on Heterogenous Computing - Challenge A C D B! B" 5 task_b = sched.emplace([](&){ 6 // CPU code; // GPU code; 7 }); // CPU thread blocks until GPU finishes operation B" : GPU operation Atomic execution per task 11Existing TGPSs on Heterogenous Computing - Challenge CPU A B! C Idle GPU D B" Runtime A C D B! B" Assume one CPU and one GPU B! : CPU operation0 码力 | 84 页 | 8.82 MB | 6 月前3
Heterogeneous Modern C++ with SYCL 2020http://wongmichael.com/about ● C++11 book in Chinese: https://www.amazon.cn/dp/B00ETOV2OQ We build GPU compilers for some of the most powerful supercomputers in the world 34 Nevin “:-)” Liber nliber@anl Attribution 4.0 International License SYCL Single Source C++ Parallel Programming GPU FPGA DSP Custom Hardware GPU CPU CPU CPU Standard C++ Application Code C++ Libraries ML Frameworks give better performance on complex apps and libs than hand-coding AI/Tensor HW GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Other BackendsSYCL 2020 is here! Open Standard for0 码力 | 114 页 | 7.94 MB | 6 月前3
Bringing Existing Code to CUDA Using constexpr and std::pmrcudaFree(x); cudaFree(y); } An Even Easier Introduction to CUDA 5 |__global__ void add_gpu(int n, float* x, float* y) { for (int i = 0; i < n; i++) y[i] = x[i] + y[i]; } TEST_CASE("cppcon-1" TEST_CASE("cppcon-1", "[CUDA]") { // … } An Even Easier Introduction to CUDA 6 |__global__ void add_gpu(int n, float* x, float* y) { for (int i = 0; i < n; i++) y[i] = x[i] + y[i]; } TEST_CASE("cppcon-1" 20; float* x; float* y; // … add_gpu<<<1, 1>>>(N, x, y); // … } An Even Easier Introduction to CUDA 7 |__global__ void add_gpu(int n, float* x, float* y) { for (int i = 0;0 码力 | 51 页 | 3.68 MB | 6 月前3
Keras: 基于 Python 的深度学习库. . . . . . . . . 6 2.4 Keras 支持多个后端引擎,并且不会将你锁定到一个生态系统中 . . . . . . . . . . 6 2.5 Keras 拥有强大的多 GPU 和分布式训练支持 . . . . . . . . . . . . . . . . . . . . . . 6 2.6 Keras 的发展得到深度学习生态系统中的关键公司的支持 . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3.3 如何在 GPU 上运行 Keras? . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.3.4 如何在多 GPU 上运行 Keras 模型? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 20.9 multi_gpu_model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 21 贡献 242 210 码力 | 257 页 | 1.19 MB | 1 年前3
共 276 条
- 1
- 2
- 3
- 4
- 5
- 6
- 28
相关搜索词
GoonGPUBridgingtheGapWritingPortableProgramsforCPUandKubernetesEdgeComputingacrossInterContinentalHaierProductionSitesServerlessKubeConPyTorchReleaseNotesPOCOASinC++AbstractionDistributedDataStructuresTaroTaskgraphbasedAsynchronousProgrammingUsingCoroutineHeterogeneousModernwithSYCL2020BringingExistingCodetoCUDAconstexprstdpmrKeras基于Python深度学习













