 Deploy VTA on Intel FPGAINDUSTRIES, INCORPORATED ACCELERATED VISUAL PERCEPTION LIANGFU CHEN 11/16/2019 DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 2 Moore’s Law is Slowing Down MOTIVATION©2019 DE10-Nano DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 5 Software - CMA Contiguous Memory Allocation – Linux Kernel DEPLOY VTA ON INTEL FPGA https://pynq.readthedocs INCORPORATED 6 Software - CMA Contiguous Memory Allocation – Linux Kernel Module DEPLOY VTA ON INTEL FPGA Setup Environment Variables Navigate to 3rdparty/cma and build kernel module Copy kernel module0 码力 | 12 页 | 1.35 MB | 5 月前3 Deploy VTA on Intel FPGAINDUSTRIES, INCORPORATED ACCELERATED VISUAL PERCEPTION LIANGFU CHEN 11/16/2019 DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 2 Moore’s Law is Slowing Down MOTIVATION©2019 DE10-Nano DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 5 Software - CMA Contiguous Memory Allocation – Linux Kernel DEPLOY VTA ON INTEL FPGA https://pynq.readthedocs INCORPORATED 6 Software - CMA Contiguous Memory Allocation – Linux Kernel Module DEPLOY VTA ON INTEL FPGA Setup Environment Variables Navigate to 3rdparty/cma and build kernel module Copy kernel module0 码力 | 12 页 | 1.35 MB | 5 月前3
 Heterogeneous Modern C++ with SYCL 2020Creative Commons Attribution 4.0 International License SYCL Single Source C++ Parallel Programming GPU FPGA DSP Custom Hardware GPU CPU CPU CPU Standard C++ Application Code C++ Libraries ML Frameworks Fusion can give better performance on complex apps and libs than hand-coding AI/Tensor HW GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Other BackendsSYCL 2020 is here! Open Standard / https://www.phoronix.com/scan.php?page=news_item&px=hipSYCL-New-Lite-Runtime https://software.intel.com/content/www/us/en/develop/articles/interoperability-dpcpp-sycl-opencl.html https://www.renesas0 码力 | 114 页 | 7.94 MB | 6 月前3 Heterogeneous Modern C++ with SYCL 2020Creative Commons Attribution 4.0 International License SYCL Single Source C++ Parallel Programming GPU FPGA DSP Custom Hardware GPU CPU CPU CPU Standard C++ Application Code C++ Libraries ML Frameworks Fusion can give better performance on complex apps and libs than hand-coding AI/Tensor HW GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Other BackendsSYCL 2020 is here! Open Standard / https://www.phoronix.com/scan.php?page=news_item&px=hipSYCL-New-Lite-Runtime https://software.intel.com/content/www/us/en/develop/articles/interoperability-dpcpp-sycl-opencl.html https://www.renesas0 码力 | 114 页 | 7.94 MB | 6 月前3
 Bring Your Own Codegen to TVM© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon/Intel Confidentia Presenter: Zhi Chen, Cody Yu Amazon SageMaker Neo, Deep Engine Science Bring Your Own Codegen to TVM Chip© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example showcase: Intel MKL-DNN (DNNL) library 1. Import packages import numpy as np from tvm import relay 2. Load a pretrained Runtime (VM, Graph Runtime, Interpreter) Your Dispatcher Target Device General Devices (CPU/GPU/FPGA) Mark supported operators or subgraphs 1. Implement an operator-level annotator, OR 2. Implement0 码力 | 19 页 | 504.69 KB | 5 月前3 Bring Your Own Codegen to TVM© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon/Intel Confidentia Presenter: Zhi Chen, Cody Yu Amazon SageMaker Neo, Deep Engine Science Bring Your Own Codegen to TVM Chip© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example showcase: Intel MKL-DNN (DNNL) library 1. Import packages import numpy as np from tvm import relay 2. Load a pretrained Runtime (VM, Graph Runtime, Interpreter) Your Dispatcher Target Device General Devices (CPU/GPU/FPGA) Mark supported operators or subgraphs 1. Implement an operator-level annotator, OR 2. Implement0 码力 | 19 页 | 504.69 KB | 5 月前3
 TVM: Where Are We GoingDifferentiable IR Tensor Expression and Optimization Search Space LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Optimization AutoTVM Device FleetExisting Deep Learning Frameworks High-level data from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft, Qualcomm, Alibaba, Intel, … Incubated as Apache TVM recently. Independent governance, allowing competitors to collaborate0 码力 | 31 页 | 22.64 MB | 5 月前3 TVM: Where Are We GoingDifferentiable IR Tensor Expression and Optimization Search Space LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Optimization AutoTVM Device FleetExisting Deep Learning Frameworks High-level data from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft, Qualcomm, Alibaba, Intel, … Incubated as Apache TVM recently. Independent governance, allowing competitors to collaborate0 码力 | 31 页 | 22.64 MB | 5 月前3
 Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SCSource Code DPC++ Uses LLVM/Clang Part of oneAPI hipSYCL Multiple Backends Any CPU Intel CPUs Intel GPUs Intel FPGAs AMD GPUs Any CPU SYCL enables Khronos to influence ISO C++ to (eventually) support integration and deployment of multiple acceleration technologies Level Zero Intel GPUs NVIDIA GPUs Level Zero Intel GPUs AMD GPUs New, not experimental anymore, and works on Ponte Vecchio deployment of multiple acceleration technologies VEO Intel CPUs NEC VEs neoSYCL SX-AURORA TSUBASA TBB Any CPU Samsung PIMS XILINX Versal ACAP LLVM IR FPGA LLVM IR HLS Experimental DPC++ fork DPC++0 码力 | 82 页 | 3.35 MB | 6 月前3 Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SCSource Code DPC++ Uses LLVM/Clang Part of oneAPI hipSYCL Multiple Backends Any CPU Intel CPUs Intel GPUs Intel FPGAs AMD GPUs Any CPU SYCL enables Khronos to influence ISO C++ to (eventually) support integration and deployment of multiple acceleration technologies Level Zero Intel GPUs NVIDIA GPUs Level Zero Intel GPUs AMD GPUs New, not experimental anymore, and works on Ponte Vecchio deployment of multiple acceleration technologies VEO Intel CPUs NEC VEs neoSYCL SX-AURORA TSUBASA TBB Any CPU Samsung PIMS XILINX Versal ACAP LLVM IR FPGA LLVM IR HLS Experimental DPC++ fork DPC++0 码力 | 82 页 | 3.35 MB | 6 月前3
 XDNN TVM - Nov 2019© Copyright 2018 Xilinx Elliott Delaye FPGA CNN Accelerator and TVM© Copyright 2018 Xilinx TVM Target devices and models >> 2 HW Platforms ZCU102 ZCU104 Ultra96 PYNQ Face detection Pose estimation 24% 23% 85% 51% 52% 0% 20% 40% 60% 80% 100% VGG16 ResNet-50 GoogleNet-V3 Aristotle on 7020 FPGA Iphone8plus Kirin 970 CPU MEM CONTROLLER BUS Data Mover IMG WR SCHEDULER WEIGHTS WR SCHEDULER for mainstream neural networks >> 4© Copyright 2018 Xilinx Inference Flow >> 5 MxNet CPU Layers FPGA Layers Runtime Image Model Weights Calibration Set Quantizer Compiler Tensor Graph Optimization0 码力 | 16 页 | 3.35 MB | 5 月前3 XDNN TVM - Nov 2019© Copyright 2018 Xilinx Elliott Delaye FPGA CNN Accelerator and TVM© Copyright 2018 Xilinx TVM Target devices and models >> 2 HW Platforms ZCU102 ZCU104 Ultra96 PYNQ Face detection Pose estimation 24% 23% 85% 51% 52% 0% 20% 40% 60% 80% 100% VGG16 ResNet-50 GoogleNet-V3 Aristotle on 7020 FPGA Iphone8plus Kirin 970 CPU MEM CONTROLLER BUS Data Mover IMG WR SCHEDULER WEIGHTS WR SCHEDULER for mainstream neural networks >> 4© Copyright 2018 Xilinx Inference Flow >> 5 MxNet CPU Layers FPGA Layers Runtime Image Model Weights Calibration Set Quantizer Compiler Tensor Graph Optimization0 码力 | 16 页 | 3.35 MB | 5 月前3
 TVM Meetup Nov. 16th - LinaroGPU mali (midgard) firefly rk3399, rock960 (mali t860) N/A opencl bifrost hikey960 (mali g71) N/A FPGA vta pynq, ultra96 N/A sdaccel Out-of-tree support or WIP: Hexagon DSP (via llvm), Ascend NPU, and0 码力 | 7 页 | 1.23 MB | 5 月前3 TVM Meetup Nov. 16th - LinaroGPU mali (midgard) firefly rk3399, rock960 (mali t860) N/A opencl bifrost hikey960 (mali g71) N/A FPGA vta pynq, ultra96 N/A sdaccel Out-of-tree support or WIP: Hexagon DSP (via llvm), Ascend NPU, and0 码力 | 7 页 | 1.23 MB | 5 月前3
 阿里云容器服务大促备战regional- outlook-and-forecast-study/492024云边端一体化协同双十一直播的背后 50% 5倍在线与离线 异构计算能力 ECS, EBM, GPU, FPGA, ECI 高性能网络 VPC, ENI, RDMA, SLB, DNS Public Cloud Edge Computing Private Cloud 高性能存储 EBS, NAS0 码力 | 17 页 | 17.74 MB | 6 月前3 阿里云容器服务大促备战regional- outlook-and-forecast-study/492024云边端一体化协同双十一直播的背后 50% 5倍在线与离线 异构计算能力 ECS, EBM, GPU, FPGA, ECI 高性能网络 VPC, ENI, RDMA, SLB, DNS Public Cloud Edge Computing Private Cloud 高性能存储 EBS, NAS0 码力 | 17 页 | 17.74 MB | 6 月前3
 Building Effective Embedded Systems: Architectural Best PracticesReal Time Hard Real Time Simple System Don’t care None Complicated System Operating system FPGA/Chip + CPU with operating systemLet’s review a system and decide if an operating system is0 码力 | 241 页 | 2.28 MB | 6 月前3 Building Effective Embedded Systems: Architectural Best PracticesReal Time Hard Real Time Simple System Don’t care None Complicated System Operating system FPGA/Chip + CPU with operating systemLet’s review a system and decide if an operating system is0 码力 | 241 页 | 2.28 MB | 6 月前3
 From Eager Futures/Promises to Lazy Continuations: Evolving an Actor Library Based on Lessons Learned from Large-Scale Deploymentsdon’t care, nor do we need to! ● if it uses a GPU, we don’t care, nor do we need to! ● if it uses an FPGA or a SoC, we don’t care, nor do we need to!function abstraction std::string SpellCheck(std::string0 码力 | 264 页 | 588.96 KB | 6 月前3 From Eager Futures/Promises to Lazy Continuations: Evolving an Actor Library Based on Lessons Learned from Large-Scale Deploymentsdon’t care, nor do we need to! ● if it uses a GPU, we don’t care, nor do we need to! ● if it uses an FPGA or a SoC, we don’t care, nor do we need to!function abstraction std::string SpellCheck(std::string0 码力 | 264 页 | 588.96 KB | 6 月前3
共 148 条
- 1
- 2
- 3
- 4
- 5
- 6
- 15
相关搜索词
 DeployVTAonIntelFPGAHeterogeneousModernC++withSYCL2020BringYourOwnCodegentoTVMWhereAreWeGoingKhronosAPIsforComputeandSafetySCXDNNNov2019Meetup16thLinaro阿里容器服务大促备战BuildingEffectiveEmbeddedSystemsArchitecturalBestPracticesFromEagerFuturesPromisesLazyContinuationsEvolvinganActorLibraryBasedLessonsLearnedfromLargeScaleDeployments













