TVM@AliOS: AliOS, Powering Intelligence in Everything. Presentation agenda: TVM @ AliOS Overview; TVM @ AliOS ARM CPU; TVM @ AliOS Hexagon DSP; TVM @ AliOS Intel GPU; Misc. Part One, TVM @ AliOS Overview: Multimodal Interaction; CPU (ARM, Intel); Accelerated Op Library / Others; Inference Engine; DSP (Qualcomm). Part Two, TVM @ AliOS ARM CPU: support TFLite (open-sourced and upstreamed to master); optimize on INT8 & FP32. INT8: cache … data … QNNPACK convolution, NHWC layout, cache … (27 pages | 4.86 MB | 5 months ago)
TVM Meetup Nov. 16th - Linaro 2019: Bringing together the Arm ecosystem. Linaro AI Initiative: provide best-in-class deep learning performance by leveraging neural network acceleration in IP and SoCs from the Arm ecosystem, through collaborative … Internal Jira project restricted to Linaro members. Three sub-projects: Arm Compute Library, Arm NN, Android NN Driver. Arm Compute Library has been integrated by MATLAB Coder and ONNX Runtime. Arm upstream IPs. Target / Hardware / Model / Options / Codegen: CPU, arm_cpu, pixel2 (Snapdragon 835), mate10/mate10pro (Kirin 970), p20/p20pro (Kirin 970), -target=arm64-linux-android -mattr=+neon, llvm; firefly rk3399 … (7 pages | 1.23 MB | 5 months ago)
TVM Meetup: Quantization. Target-independent Relay passes; target-optimized graph; target-dependent Relay passes; Intel x86, ARM CPU, Nvidia GPU, ARM GPU; schedule templates written in TVM Tensor IR; more targets. AutoTVM: Tuning the … Target-independent Relay passes; target-optimized Int8 Relay graph; Intel x86, ARM CPU, Nvidia GPU and ARM GPU schedules; Relay Int8 graph; target-dependent Relay layout opt. © 2019, Amazon (19 pages | 489.50 KB | 5 months ago)
TVM@Alibaba AI Labs: Alibaba AI Labs & TVM. Contents: Part 1, ARM32 CPU; Part 2, HIFI4 DSP; Part 3, PowerVR GPU. ARM32 CPU: resolution, quantization, … kernel, AliOS … int8 * int8 … int32 … int16 + int16 x int8 … CPU: MTK8167S (ARM32 A35 1.5GHz); model: MobileNetV2_1.0_224. [Chart: benchmark of MobileNetV2_1.0_224 on MTK8167S] (12 pages | 1.94 MB | 5 months ago)
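The quantization entries above mention int8 kernels that multiply int8 values and accumulate into a wider integer type (int32). As a hedged illustration of that general idea only, and not the code used by TVM, Amazon or Alibaba, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization with a wide accumulator; the function names and the 1/64 scale are assumptions chosen for the example.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization with a wide
# accumulator. Function names and scales are hypothetical, not a TVM API.


def quantize(values, scale):
    """Map floats to int8: round(v / scale) (round-half-to-even), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Dot product of two int8 vectors with an int32-range accumulator."""
    acc = 0
    for a, b in zip(a_q, b_q):
        # |a * b| <= 128 * 128 = 16384, so one product fits in int16,
        # but a running sum of many products can exceed int16, hence int32.
        acc += a * b
    if not -2**31 <= acc < 2**31:
        raise OverflowError("accumulator exceeds the int32 range")
    return acc


def dequantize(acc, scale_a, scale_b):
    """Convert the integer result back to float using the product of the scales."""
    return acc * scale_a * scale_b


a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -0.5]
scale = 1 / 64
a_q, b_q = quantize(a, scale), quantize(b, scale)
acc = int8_dot(a_q, b_q)
print(a_q, b_q, acc, dequantize(acc, scale, scale))
# prints [32, -64, 16] [64, 32, -32] -512 -0.125
```

Production int8 kernels typically also handle zero points (asymmetric quantization) and requantize the int32 result back to int8; this sketch omits both to keep the accumulation step visible.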
 亿联TVM部署performance gain by autotuning 3. TVM can support many kinds of hardware platform: Intel/arm CPU, Nividia/arm GPU, VTA…5 �������������� 1. Get a .log file from the autotvm on Ubuntu 2. Use the .log0 码力 | 6 页 | 1.96 MB | 5 月前3 亿联TVM部署performance gain by autotuning 3. TVM can support many kinds of hardware platform: Intel/arm CPU, Nividia/arm GPU, VTA…5 �������������� 1. Get a .log file from the autotvm on Ubuntu 2. Use the .log0 码力 | 6 页 | 1.96 MB | 5 月前3
TVM: Where Are We Going. … Haichen Shen et al. µTVM: TVM on bare-metal devices. Supports bare-metal JTAG devices, no OS needed: ARM Cortex-M, RISC-V. Credit: Logan Weber. µTVM upcoming: self-hosted runtime. Credit: Logan Weber. Designed … runtime JIT-compiles accelerator micro-code. • Supports heterogeneous devices, 10x better than CPU on the same board. • Moves hardware complexity to software. HW-SW Blueprint for Flexible Deep Learning … (31 pages | 22.64 MB | 5 months ago)
Julia 1.11.4: … Memory-mapped I/O; 83 Network Options; 84 Pkg; 85 Printf; 86 Profiling; 86.1 CPU Profiling; 86.2 Via … 107.6 ARM (Linux); 107.7 Binary … Multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single … (2007 pages | 6.73 MB | 3 months ago)
Julia 1.11.5 Documentation: … Memory-mapped I/O; 83 Network Options; 84 Pkg; 85 Printf; 86 Profiling; 86.1 CPU Profiling; 86.2 Via … 107.6 ARM (Linux); 107.7 Binary … Multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single … (2007 pages | 6.73 MB | 3 months ago)
Julia 1.11.6 Release Notes: … Memory-mapped I/O; 83 Network Options; 84 Pkg; 85 Printf; 86 Profiling; 86.1 CPU Profiling; 86.2 Via … 107.6 ARM (Linux); 107.7 Binary … Multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single … (2007 pages | 6.73 MB | 3 months ago)
Julia 1.10.10: … Memory-mapped I/O; 81 Network Options; 82 Pkg; 83 Printf; 84 Profiling; 84.1 CPU Profiling; 84.2 Via … FreeBSD; 103.9 ARM (Linux); 103.10 … Multi-threading provides the ability to schedule Tasks simultaneously on more than one thread or CPU core, sharing memory. This is usually the easiest way to get parallelism on one's PC or on a single … (1692 pages | 6.34 MB | 3 months ago)