Deploy VTA on Intel FPGA
Accelerated Visual Perception. Liangfu Chen, 11/16/2019. ©2019 Harman International Industries, Incorporated. Motivation: Moore's Law is Slowing Down. DE10-Nano. Software - CMA: Contiguous Memory Allocation – Linux Kernel Module. https://pynq.readthedocs.io/en/v2 Setup Environment Variables. Navigate to 3rdparty/cma and build kernel module. Copy kernel module
0 points | 12 pages | 1.35 MB | 5 months ago
XDNN TVM - Nov 2019
© Copyright 2018 Xilinx. Elliott Delaye: FPGA CNN Accelerator and TVM. TVM Target devices and models. HW Platforms: ZCU102, ZCU104, Ultra96, PYNQ. Face detection, Pose estimation. [Chart: VGG16, ResNet-50, GoogleNet-V3 on Aristotle on 7020 FPGA, Iphone8plus, Kirin 970; values 24%, 23%, 85%, 51%, 52%] [Diagram: CPU, MEM CONTROLLER, BUS, Data Mover, IMG WR SCHEDULER, WEIGHTS WR SCHEDULER] for mainstream neural networks. Inference Flow: MxNet, CPU Layers, FPGA Layers, Runtime, Image, Model Weights, Calibration Set, Quantizer, Compiler, Tensor Graph Optimization
0 points | 16 pages | 3.35 MB | 5 months ago
Bring Your Own Codegen to TVM
Runtime (VM, Graph Runtime, Interpreter), Your Dispatcher, Target Device, General Devices (CPU/GPU/FPGA). Mark supported operators or subgraphs: 1. Implement an operator-level annotator, OR 2. Implement ... Runtime (VM, Graph Runtime, Interpreter), Your Dispatcher, Target Device, General Devices (CPU/GPU/FPGA). Mark supported operators or subgraphs: 1. Implement extern operator functions, OR 2. Implement a
0 points | 19 pages | 504.69 KB | 5 months ago
Heterogeneous Modern C++ with SYCL 2020
Creative Commons Attribution 4.0 International License. SYCL: Single Source C++ Parallel Programming. [Diagram: Standard C++ Application Code, C++ Libraries, ML Frameworks; GPU, FPGA, DSP, Custom Hardware, CPU, AI/Tensor HW, Other Backends] Fusion can give better performance on complex apps and libs than hand-coding. SYCL 2020 is here! Open Standard -generation-supercomputers/ https://research-portal.uws.ac.uk/en/publications/trisycl-for-xilinx-fpga https://www.imaginationtech.com/news/press-release/tensorflow-gets-native-support-for-powervr-gp
0 points | 114 pages | 7.94 MB | 6 months ago
TVM: Where Are We Going
Differentiable IR, Tensor Expression and Optimization Search Space, LLVM, CUDA, Metal, VTA, Edge FPGA, Cloud FPGA, ASIC, Optimization, AutoTVM, Device Fleet. Existing Deep Learning Frameworks. High-level data
0 points | 31 pages | 22.64 MB | 5 months ago
TVM Meetup Nov. 16th - Linaro
GPU mali (midgard) firefly rk3399, rock960 (mali t860) N/A opencl bifrost hikey960 (mali g71) N/A FPGA vta pynq, ultra96 N/A sdaccel Out-of-tree support or WIP: Hexagon DSP (via llvm), Ascend NPU, and
0 points | 7 pages | 1.23 MB | 5 months ago
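Alibaba Cloud Container Service: Preparing for Major Sales Promotions
regional- outlook-and-forecast-study/492024 Integrated cloud-edge-device collaboration. Behind the Double 11 livestream: 50%, 5x. Online and offline. Heterogeneous computing capability: ECS, EBM, GPU, FPGA, ECI. High-performance networking: VPC, ENI, RDMA, SLB, DNS. Public Cloud, Edge Computing, Private Cloud. High-performance storage: EBS, NAS
0 points | 17 pages | 17.74 MB | 6 months ago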
Building Effective Embedded Systems: Architectural Best Practices
Real Time Hard Real Time Simple System Don’t care None Complicated System Operating system FPGA/Chip + CPU with operating system. Let’s review a system and decide if an operating system is
0 points | 241 pages | 2.28 MB | 6 months ago
Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SC
CPUs NEC VEs neoSYCL SX-AURORA TSUBASA TBB Any CPU Samsung PIMS XILINX Versal ACAP LLVM IR FPGA LLVM IR HLS Experimental DPC++ fork DPC++ fork MLIR Inteon Poligeist SYCL MLIR Bisheng
0 points | 82 pages | 3.35 MB | 6 months ago
From Eager Futures/Promises to Lazy Continuations: Evolving an Actor Library Based on Lessons Learned from Large-Scale Deployments
don’t care, nor do we need to! ● if it uses a GPU, we don’t care, nor do we need to! ● if it uses an FPGA or a SoC, we don’t care, nor do we need to! function abstraction std::string SpellCheck(std::string
0 points | 264 pages | 588.96 KB | 6 months ago