Bridging the Gap: Writing Portable Programs for CPU and GPU
Bridging the Gap: Writing Portable Programs for CPU and GPU using CUDA. Thomas Mejstrik, Sebastian Woblistin. Contents: 1 Motivation (audience etc., CUDA crash course, quiz time); 2 Patterns (Oldschool ...) ... afterwards ... Sections: Motivation, Patterns, The dark path, CUDA proposal, Thank you. Why write programs for CPU and GPU? Differences between CPU and GPU: algorithms are designed differently; latency/throughput; memory bandwidth; number of ... talk ... Why does it make sense? Library/framework developers; embarrassingly parallel algorithms. (124 pages, 4.10 MB)

C++ Exceptions for Smaller Firmware
... heap? I would give an answer… They'd accept… Let's make exceptions work on ARM! Arm GNU Toolchain, GCC 11.3: arm-none-eabi-g++ -o except.elf except.cpp -std=c++20 -Os -g -fexceptions ... time the code throws) ... 💥 Breaking Barrier #1: Enabling exceptions in ARM GCC. Step 1: Download & build ("ARM GNU Toolchain download"). Step 2: Find and modify -fno-exceptions → -fexceptions. Step ... you ... That was to show you that something is here. C++ exceptions from throw to catch on GCC ARM. Things that will NOT be covered here: nested exceptions; anything other than table-based exceptions. (237 pages, 6.74 MB)

TVM@AliOS
AliOS: driving intelligence in all things. Presentation agenda: TVM @ AliOS overview; TVM @ AliOS ARM CPU; TVM @ AliOS Hexagon DSP; TVM @ AliOS Intel GPU; Misc. Part One: TVM @ AliOS overview. Multimodal interaction; CPU (ARM, Intel); accelerated op library / others; inference engine; DSP (Qualcomm). Part Two: AliOS TVM @ ARM CPU. Support TFLite (open source and upstream master); optimize on INT8 & FP32. AliOS TVM @ ARM CPU INT8: cache ... data ... QNNPACK convolution, NHWC layout, cache ... (27 pages, 4.86 MB)

TVM Meetup Nov. 16th - Linaro
2019 ... Bringing together the Arm ecosystem. Linaro AI Initiative: provide the best-in-class deep learning performance by leveraging neural network acceleration in IP and SoCs from the Arm ecosystem, through collaborative ... Internal Jira project restricted to Linaro members. Three sub-projects: Arm Compute Library, Arm NN, Android NN Driver. Arm Compute Library has been integrated by: MATLAB Coder, ONNX Runtime. Arm upstream IPs ... Table (Target, Hardware/Model, Options, Codegen): CPU: arm_cpu; pixel2 (Snapdragon 835), mate10/mate10pro (Kirin 970), p20/p20pro (Kirin 970); -target=arm64-linux-android -mattr=+neon; llvm. firefly rk3399 ... (7 pages, 1.23 MB)

TVM Meetup: Quantization
Diagram labels: target-independent Relay passes; target-optimized graph; target-dependent Relay passes; Intel x86, ARM CPU, Nvidia GPU, ARM GPU; schedule templates written in TVM Tensor IR ... more targets. AutoTVM: tuning the ... Diagram labels: Relay Int8 graph; target-independent Relay passes; target-dependent Relay layout opt; target-optimized Int8 Relay graph; Intel x86 schedule, ARM CPU schedule, Nvidia GPU schedule, ARM GPU schedule. © 2019, Amazon. (19 pages, 489.50 KB)

Just-in-Time Compilation - J F Bastien - CppCon 2020
... interpreter a JiT, or AoT? What if it modifies its bytecode? A CPU executes machine code… an interpreter executes bytecode… That's the same thing: a CPU is an interpreter for machine code. A compiler can perform ... factor of 7 to 20. How do you develop hardware that doesn't exist yet? When I started working on a CPU, it wasn't obvious to me that you do so with simulators (multiple simulators) which simulate what the ... to the performance level of -O4. (Rosenblum et al. 1995) By far the fastest simulator of the CPU, MMU, and memory system of an SGI multiprocessor is an SGI multiprocessor. In other words, when ... (111 pages, 3.98 MB)

Conan 2.10 Documentation
... if self.settings.os == "Macos" and self.settings.arch == "armv8": raise ConanInvalidConfiguration("ARM v8 not supported in Macos") ... Conditional requirements using a conanfile.py: You could add some logic ... cppstd=gnu14 compiler.libcxx=libstdc++11 compiler.version=9 [buildenv] CC=arm-linux-gnueabihf-gcc-9 CXX=arm-linux-gnueabihf-g++-9 LD=arm-linux-gnueabihf-ld ... Important: Please take into account that in order ... case the host machine is a Raspberry Pi 3 with armv7hf architecture operating system and we have the arm-linux-gnueabihf toolchain installed in the Ubuntu machine. If you have a look at the raspberry profile ... (803 pages, 5.02 MB)

Conan 2.6 Documentation
... if self.settings.os == "Macos" and self.settings.arch == "armv8": raise ConanInvalidConfiguration("ARM v8 not supported in Macos") ... Conditional requirements using a conanfile.py: You could add some logic ... cppstd=gnu14 compiler.libcxx=libstdc++11 compiler.version=9 [buildenv] CC=arm-linux-gnueabihf-gcc-9 CXX=arm-linux-gnueabihf-g++-9 LD=arm-linux-gnueabihf-ld ... Important: Please take into account that in order ... case the host machine is a Raspberry Pi 3 with armv7hf architecture operating system and we have the arm-linux-gnueabihf toolchain installed in the Ubuntu machine. If you have a look at the raspberry profile ... (777 pages, 4.91 MB)

Conan 2.9 Documentation
... if self.settings.os == "Macos" and self.settings.arch == "armv8": raise ConanInvalidConfiguration("ARM v8 not supported in Macos") ... Conditional requirements using a conanfile.py: You could add some logic ... cppstd=gnu14 compiler.libcxx=libstdc++11 compiler.version=9 [buildenv] CC=arm-linux-gnueabihf-gcc-9 CXX=arm-linux-gnueabihf-g++-9 LD=arm-linux-gnueabihf-ld ... Important: Please take into account that in order ... case the host machine is a Raspberry Pi 3 with armv7hf architecture operating system and we have the arm-linux-gnueabihf toolchain installed in the Ubuntu machine. If you have a look at the raspberry profile ... (795 pages, 4.99 MB)

Conan 2.7 Documentation
... if self.settings.os == "Macos" and self.settings.arch == "armv8": raise ConanInvalidConfiguration("ARM v8 not supported in Macos") ... Conditional requirements using a conanfile.py: You could add some logic ... cppstd=gnu14 compiler.libcxx=libstdc++11 compiler.version=9 [buildenv] CC=arm-linux-gnueabihf-gcc-9 CXX=arm-linux-gnueabihf-g++-9 LD=arm-linux-gnueabihf-ld ... Important: Please take into account that in order ... case the host machine is a Raspberry Pi 3 with armv7hf architecture operating system and we have the arm-linux-gnueabihf toolchain installed in the Ubuntu machine. If you have a look at the raspberry profile ... (779 pages, 4.93 MB)