 Deploy VTA on Intel FPGAINDUSTRIES, INCORPORATED ACCELERATED VISUAL PERCEPTION LIANGFU CHEN 11/16/2019 DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 2 Moore’s Law is Slowing Down MOTIVATION©2019 DE10-Nano DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 5 Software - CMA Contiguous Memory Allocation – Linux Kernel DEPLOY VTA ON INTEL FPGA https://pynq.readthedocs INCORPORATED 6 Software - CMA Contiguous Memory Allocation – Linux Kernel Module DEPLOY VTA ON INTEL FPGA Setup Environment Variables Navigate to 3rdparty/cma and build kernel module Copy kernel module0 码力 | 12 页 | 1.35 MB | 5 月前3 Deploy VTA on Intel FPGAINDUSTRIES, INCORPORATED ACCELERATED VISUAL PERCEPTION LIANGFU CHEN 11/16/2019 DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 2 Moore’s Law is Slowing Down MOTIVATION©2019 DE10-Nano DEPLOY VTA ON INTEL FPGA©2019 HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED 5 Software - CMA Contiguous Memory Allocation – Linux Kernel DEPLOY VTA ON INTEL FPGA https://pynq.readthedocs INCORPORATED 6 Software - CMA Contiguous Memory Allocation – Linux Kernel Module DEPLOY VTA ON INTEL FPGA Setup Environment Variables Navigate to 3rdparty/cma and build kernel module Copy kernel module0 码力 | 12 页 | 1.35 MB | 5 月前3
 Bring Your Own Codegen to TVM© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon/Intel Confidentia Presenter: Zhi Chen, Cody Yu Amazon SageMaker Neo, Deep Engine Science Bring Your Own Codegen to TVM Chip© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example showcase: Intel MKL-DNN (DNNL) library 1. Import packages import numpy as np from tvm import relay 2. Load a pretrained Runtime (VM, Graph Runtime, Interpreter) Your Dispatcher Target Device General Devices (CPU/GPU/FPGA) Mark supported operators or subgraphs 1. Implement an operator-level annotator, OR 2. Implement0 码力 | 19 页 | 504.69 KB | 5 月前3 Bring Your Own Codegen to TVM© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon/Intel Confidentia Presenter: Zhi Chen, Cody Yu Amazon SageMaker Neo, Deep Engine Science Bring Your Own Codegen to TVM Chip© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example showcase: Intel MKL-DNN (DNNL) library 1. Import packages import numpy as np from tvm import relay 2. Load a pretrained Runtime (VM, Graph Runtime, Interpreter) Your Dispatcher Target Device General Devices (CPU/GPU/FPGA) Mark supported operators or subgraphs 1. Implement an operator-level annotator, OR 2. Implement0 码力 | 19 页 | 504.69 KB | 5 月前3
 TVM: Where Are We GoingDifferentiable IR Tensor Expression and Optimization Search Space LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Optimization AutoTVM Device FleetExisting Deep Learning Frameworks High-level data from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft, Qualcomm, Alibaba, Intel, … Incubated as Apache TVM recently. Independent governance, allowing competitors to collaborate0 码力 | 31 页 | 22.64 MB | 5 月前3 TVM: Where Are We GoingDifferentiable IR Tensor Expression and Optimization Search Space LLVM, CUDA, Metal VTA Edge FPGA Cloud FPGA ASIC Optimization AutoTVM Device FleetExisting Deep Learning Frameworks High-level data from UW, Berkeley, Cornell, UCLA, Amazon, Huawei, NTT, Facebook, Microsoft, Qualcomm, Alibaba, Intel, … Incubated as Apache TVM recently. Independent governance, allowing competitors to collaborate0 码力 | 31 页 | 22.64 MB | 5 月前3
 XDNN TVM - Nov 2019© Copyright 2018 Xilinx Elliott Delaye FPGA CNN Accelerator and TVM© Copyright 2018 Xilinx TVM Target devices and models >> 2 HW Platforms ZCU102 ZCU104 Ultra96 PYNQ Face detection Pose estimation 24% 23% 85% 51% 52% 0% 20% 40% 60% 80% 100% VGG16 ResNet-50 GoogleNet-V3 Aristotle on 7020 FPGA Iphone8plus Kirin 970 CPU MEM CONTROLLER BUS Data Mover IMG WR SCHEDULER WEIGHTS WR SCHEDULER for mainstream neural networks >> 4© Copyright 2018 Xilinx Inference Flow >> 5 MxNet CPU Layers FPGA Layers Runtime Image Model Weights Calibration Set Quantizer Compiler Tensor Graph Optimization0 码力 | 16 页 | 3.35 MB | 5 月前3 XDNN TVM - Nov 2019© Copyright 2018 Xilinx Elliott Delaye FPGA CNN Accelerator and TVM© Copyright 2018 Xilinx TVM Target devices and models >> 2 HW Platforms ZCU102 ZCU104 Ultra96 PYNQ Face detection Pose estimation 24% 23% 85% 51% 52% 0% 20% 40% 60% 80% 100% VGG16 ResNet-50 GoogleNet-V3 Aristotle on 7020 FPGA Iphone8plus Kirin 970 CPU MEM CONTROLLER BUS Data Mover IMG WR SCHEDULER WEIGHTS WR SCHEDULER for mainstream neural networks >> 4© Copyright 2018 Xilinx Inference Flow >> 5 MxNet CPU Layers FPGA Layers Runtime Image Model Weights Calibration Set Quantizer Compiler Tensor Graph Optimization0 码力 | 16 页 | 3.35 MB | 5 月前3
 TVM Meetup Nov. 16th - LinaroGPU mali (midgard) firefly rk3399, rock960 (mali t860) N/A opencl bifrost hikey960 (mali g71) N/A FPGA vta pynq, ultra96 N/A sdaccel Out-of-tree support or WIP: Hexagon DSP (via llvm), Ascend NPU, and0 码力 | 7 页 | 1.23 MB | 5 月前3 TVM Meetup Nov. 16th - LinaroGPU mali (midgard) firefly rk3399, rock960 (mali t860) N/A opencl bifrost hikey960 (mali g71) N/A FPGA vta pynq, ultra96 N/A sdaccel Out-of-tree support or WIP: Hexagon DSP (via llvm), Ascend NPU, and0 码力 | 7 页 | 1.23 MB | 5 月前3
 TVM@AliOSAGENDA 人 人 e 人 e@ TVM Q@ AliOs Overview TVM @ AliOs ARM CPU TVM @ AliOos Hexagon DSP TVM @ Alios Intel GPU Misc /NiiOS ! 驱动万物智能 PART ONE TVM Q@ AliOs Overview AiOS 1驱动万物智能 AliOs overview 。 AliOs AN 2X MobilenetV2 TFLite 1.34X MobilenetV2 QNNPACK AliOs @ Roewe RX5 MAX OpenVINO @ Intel GPU AliDS AR-Nav Product @ SUV Release and adopt TVM (Apollo Lake Gold) Model 1.6X Intel AliOs TVM Arch Model 。 Facelandmark Pedestrian & Vehicle Detection Voice-GUI Gesture Lanenet NLU DMS FacelD Multimodal Interection CPU (ARM、Intel) 1驱动万物智能 Accelerated0 码力 | 27 页 | 4.86 MB | 5 月前3 TVM@AliOSAGENDA 人 人 e 人 e@ TVM Q@ AliOs Overview TVM @ AliOs ARM CPU TVM @ AliOos Hexagon DSP TVM @ Alios Intel GPU Misc /NiiOS ! 驱动万物智能 PART ONE TVM Q@ AliOs Overview AiOS 1驱动万物智能 AliOs overview 。 AliOs AN 2X MobilenetV2 TFLite 1.34X MobilenetV2 QNNPACK AliOs @ Roewe RX5 MAX OpenVINO @ Intel GPU AliDS AR-Nav Product @ SUV Release and adopt TVM (Apollo Lake Gold) Model 1.6X Intel AliOs TVM Arch Model 。 Facelandmark Pedestrian & Vehicle Detection Voice-GUI Gesture Lanenet NLU DMS FacelD Multimodal Interection CPU (ARM、Intel) 1驱动万物智能 Accelerated0 码力 | 27 页 | 4.86 MB | 5 月前3
 TVM Meetup: QuantizationRelay Graph Target-independent Relay passes Target-optimized graph Target-dependent Relay passes Intel x86 ARM CPU Nvidia GPU ARM GPU Schedule templates written in TVM Tensor IR .. More targets Using QNN Dialect QNN passes Target-independent Relay passes Target-optimized Int8 Relay Graph Intel x86 schedule ARM CPU schedule Nvidia GPU schedule ARM GPU schedule Relay Int8 Graph Target-dependent its Affiliates. All rights reserved. Outline • QNN Dialect • Design • Operators • Results on Intel Cascade Lake© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Quantized Operators0 码力 | 19 页 | 489.50 KB | 5 月前3 TVM Meetup: QuantizationRelay Graph Target-independent Relay passes Target-optimized graph Target-dependent Relay passes Intel x86 ARM CPU Nvidia GPU ARM GPU Schedule templates written in TVM Tensor IR .. More targets Using QNN Dialect QNN passes Target-independent Relay passes Target-optimized Int8 Relay Graph Intel x86 schedule ARM CPU schedule Nvidia GPU schedule ARM GPU schedule Relay Int8 Graph Target-dependent its Affiliates. All rights reserved. Outline • QNN Dialect • Design • Operators • Results on Intel Cascade Lake© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Quantized Operators0 码力 | 19 页 | 489.50 KB | 5 月前3
 julia 1.10.10(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) with JIT profiling support, using eitherCHAPTER 29. ENVIRONMENT VARIABLES 376 • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 1692 页 | 6.34 MB | 3 月前3 julia 1.10.10(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) with JIT profiling support, using eitherCHAPTER 29. ENVIRONMENT VARIABLES 376 • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 1692 页 | 6.34 MB | 3 月前3
 Julia 1.10.9(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) with JIT profiling support, using eitherCHAPTER 29. ENVIRONMENT VARIABLES 376 • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 1692 页 | 6.34 MB | 3 月前3 Julia 1.10.9(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) with JIT profiling support, using eitherCHAPTER 29. ENVIRONMENT VARIABLES 376 • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 1692 页 | 6.34 MB | 3 月前3
 Julia 1.11.4(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) only has an effect if Julia was compiled with JIT profiling support, using either • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 2007 页 | 6.73 MB | 3 月前3 Julia 1.11.4(GNU), others optionally permit placing hidden arguments directly after the character argument (Intel, PGI). For example, Fortran subroutines of the form subroutine test(str1, str2) character(len=*) only has an effect if Julia was compiled with JIT profiling support, using either • Intel's VTune™ Amplifier (USE_INTEL_JITEVENTS set to 1 in the build configuration), or • OProfile (USE_OPROFILE_JITEVENTS External Profiling Currently Julia supports Intel VTune, OProfile and perf as external profiling tools. Depending on the tool you choose, compile with USE_INTEL_JITEVENTS, USE_OPROFILE_JITEVENTS and USE_PERF_JITEVENTS0 码力 | 2007 页 | 6.73 MB | 3 月前3
共 21 条
- 1
- 2
- 3













