Gluon Deployment (8 pages, 16.18 MB): Deploy GluonCV Models: GluonCV Models, MXNet Computational Graph, JSON Acyclic Graph, Export As-is, Optimize with TVM. Like GluonCV? Go build! https://gluon-cv.mxnet.io https://github.com/dmlc/gluon-cv © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
TVM Meetup: Quantization (19 pages, 489.50 KB): … ingests an FP32 graph and a small dataset, finds a suitable quantization scale, and produces a quantized graph. Compiling pre-quantized models (QNN dialect): TVM ingests a pre-quantized graph in TFLite or MXNet and uses the high-level wrapper ops of the QNN dialect. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. TVM Overview: framework graph (MXNet, TF, …), parsers, Relay graph, target-independent Relay passes, target-optimized graph, target-dependent Relay passes; targets include Intel x86, ARM CPU, Nvidia GPU, ARM GPU, and more, with schedule templates written in TVM Tensor IR. AutoTVM: Tuning …
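The calibration step in the snippet above takes an FP32 graph and a small dataset, finds a scale, and emits a quantized graph. The sketch below shows only the "find a suitable quantization scale" part, applied to plain lists of floats. It assumes symmetric per-tensor int8 quantization with a max-abs scale, which is one common choice and not necessarily the calibration mode TVM uses. The function names are hypothetical.

```python
# Sketch of calibration-based post-training quantization.
# Assumption: symmetric, per-tensor int8 with a max-abs scale; TVM's own
# calibration options may differ.

def calibrate_scale(calibration_batches, num_bits=8):
    """Pick a scale so the largest |x| seen in the calibration data maps to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for batch in calibration_batches for x in batch)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to clamped integers using the calibrated scale."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(x / scale))) for x in values]


def dequantize(qvalues, scale):
    """Recover approximate floats from quantized integers."""
    return [q * scale for q in qvalues]


calibration_set = [[0.5, -1.27], [1.0, 0.2]]  # stand-in for "a small dataset"
scale = calibrate_scale(calibration_set)
# 3.0 lies outside the calibrated range, so it saturates at qmax.
q = quantize([1.27, -0.5, 3.0], scale)
```

With a max-abs scale, outliers in the calibration data widen the scale and cost precision for typical values. This trade-off is why other calibration modes, such as percentile- or entropy-based ones, are sometimes preferred.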
Bring Your Own Codegen to TVM (19 pages, 504.69 KB): … can run any model. Your compiler (TVM) supports multiple frontends (e.g., TensorFlow, PyTorch, MXNet). [Diagram: Non-Maximum Suppression, ResNet-50, Dense; Your Chip] © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. System Overview: Relay IR, graph annotation with your annotator, graph partitioning, your codegen (alongside LLVM, CUDA, Metal, VTA), serialized subgraph library, Relay runtime (VM, Graph Runtime, Interpreter). To mark supported operators or subgraphs, either 1. implement an operator-level annotator, or 2. implement a graph-level annotator. Option 1: …
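The "mark supported operators, then partition" flow in the entry above can be illustrated with a toy model. In the sketch below, the model is a linear chain of op names. An operator-level annotator tags each op with a target, and the partitioner groups consecutive ops with the same target into subgraphs. This is not the TVM BYOC API, which operates on Relay IR. `SUPPORTED_OPS`, `annotate`, and `partition` are hypothetical, and the op names are only labels.

```python
# Toy illustration of operator-level annotation followed by graph partitioning.
# Assumptions: the model is a linear chain of op names, and SUPPORTED_OPS is a
# hypothetical capability list for an imaginary accelerator ("Your Chip").
from itertools import groupby

SUPPORTED_OPS = {"nn.conv2d", "nn.relu", "nn.dense"}  # hypothetical


def annotate(ops):
    """Operator-level annotation: tag each op with the target that should run it."""
    return [(op, "accel" if op in SUPPORTED_OPS else "default") for op in ops]


def partition(annotated):
    """Group consecutive ops that share a target into one subgraph."""
    return [
        (target, [op for op, _ in group])
        for target, group in groupby(annotated, key=lambda pair: pair[1])
    ]


ops = ["nn.conv2d", "nn.relu", "vision.non_max_suppression", "nn.dense"]
subgraphs = partition(annotate(ops))
# The NMS op falls back to the default codegen; the rest go to the accelerator.
```

A graph-level annotator (option 2 in the entry) would instead look at whole patterns or regions before deciding. That matters for multi-branch networks, where grouping only consecutive ops is not enough.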
XDNN TVM - Nov 2019 (16 pages, 3.35 MB): … Inference Flow: MXNet; CPU layers and FPGA layers; runtime; image; model weights; calibration set; quantizer; compiler; tensor graph optimization; framework tensor graph to Xilinx tensor graph; frontend; Deep … https://github.com/xilinx © Copyright 2018 Xilinx. TVM as Unified ML Front End: Relay (and NNVM) graph parser, XIR compiler, quantizer, partitioner; @relay.transform.module_pass(opt_level=4) class AccelModule: … supported/not supported, pattern matching, graph colorization; choices of how to partition, especially for multi-branch networks (e.g., YOLOv3, SSD). TVM Graph Partitioning/Fusion: Subgraph …
Dynamic Model in TVM (24 pages, 417.46 KB): … dependent: arange, nms, etc.; control flow: concatenate within a while loop. Limitation of TVM/graph runtime: cannot compile and run dynamic models. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. … new runtime for Relay; dynamic codegen (WIP): kernel dispatch for a single op, graph dispatch for a (sub-)graph. In collaboration with Jared Roesch, Zhi Chen, Wei Chen. … implement … Dispatch a Whole Graph: Resnet, Data -> (Any, 3, 224, 224); dispatch tree: Resnet_copy0, Resnet_copy1, …; 1 <= bs < 17; 17 …
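The "dispatch a whole graph" idea in the entry above compiles several copies of the model. At run time, each concrete batch size for the `Any` dimension is routed to the copy whose bucket contains it. The sketch below assumes sorted bucket lower bounds and uses binary search, which behaves like a balanced dispatch tree. Only the first bucket (1 <= bs < 17) comes from the slide. The other bounds and the two extra copy names are hypothetical.

```python
import bisect

# Lower bound of each batch-size bucket. Only 1 <= bs < 17 comes from the slide;
# the remaining boundaries (and the last two copy names) are hypothetical.
BUCKET_LOWER_BOUNDS = [1, 17, 33, 65]
COMPILED_COPIES = ["Resnet_copy0", "Resnet_copy1", "Resnet_copy2", "Resnet_copy3"]


def dispatch(batch_size):
    """Return the compiled copy whose bucket contains batch_size.

    Binary search over the sorted lower bounds plays the role of the dispatch
    tree; the last bucket is open-ended (bs >= 65).
    """
    if batch_size < BUCKET_LOWER_BOUNDS[0]:
        raise ValueError("batch size must be >= 1")
    idx = bisect.bisect_right(BUCKET_LOWER_BOUNDS, batch_size) - 1
    return COMPILED_COPIES[idx]
```

For example, `dispatch(16)` returns `"Resnet_copy0"` and `dispatch(17)` returns `"Resnet_copy1"`. Each copy can then be tuned for its own range of shapes.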
TVM@AliOS (27 pages, 4.86 MB): … Hexagon DSP. AliOS: driving intelligence for everything. TensorFlow | deploy.so / deploy.json / deploy.bin | NNVM / Relay | Graph Optimization | Compile | libtvm_hexagon_runtime.so. AliOS TVM @ Hexagon DSP: Compute Kernel. [Chart: Mobilenet 1.0 and densenet121; TVM (with Auto Tuning), MXNet + TensorRT, TVM + TensorRT] THANKS
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (52 pages, 1.23 MB): … device-level token-dropping strategy during training. This approach first computes the average computational budget for each device, which means that the capacity factor for each device is equivalent to … al. (2021), we drop tokens with the lowest affinity scores on each device until reaching the computational budget. In addition, we ensure that the tokens belonging to approximately 10% of the training … for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 2368–2378. Association for Computational Linguistics …
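The token-dropping rule in the DeepSeek-V2 snippet has three steps. First, compute an average per-device budget. Then, on each overloaded device, drop the lowest-affinity tokens until the load fits. The sketch below implements those steps under stated assumptions. Each routed (token, device) pair counts as one unit of load. The budget is the average load per device, rounded up. The `Assignment` type and `drop_tokens` function are hypothetical. The sketch ignores the extra rule about roughly 10% of the training data, which the snippet cuts off.

```python
import math
from collections import defaultdict
from typing import List, NamedTuple


class Assignment(NamedTuple):
    token_id: int
    device_id: int
    affinity: float  # routing affinity score of this token for this device


def drop_tokens(assignments: List[Assignment], num_devices: int) -> List[Assignment]:
    """Keep at most `budget` assignments per device, dropping the lowest-affinity ones."""
    # Assumption: budget = average load per device, rounded up.
    budget = math.ceil(len(assignments) / num_devices)
    per_device = defaultdict(list)
    for a in assignments:
        per_device[a.device_id].append(a)
    kept = []
    for items in per_device.values():
        items.sort(key=lambda a: a.affinity, reverse=True)  # highest affinity first
        kept.extend(items[:budget])  # anything past the budget is dropped
    return kept


routed = [
    Assignment(0, 0, 0.9),
    Assignment(1, 0, 0.2),
    Assignment(2, 0, 0.5),
    Assignment(3, 1, 0.7),
]
kept = drop_tokens(routed, num_devices=2)  # device 0 is over budget; token 1 is dropped
```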
Trends Artificial Intelligence (340 pages, 12.14 MB): … arithmetic calculation involving decimal numbers. In AI, total FLOPs are often used to estimate the computational cost of training or running a model. Note: Only language models shown (per Epoch AI, includes …) … development – one that builds on recent exponential gains in model scale, training data, and computational efficiency. Timelines for AGI remain uncertain, but expert expectations have shifted forward … scale and sophistication of artificial intelligence is demanding an extraordinary amount of computational horsepower, primarily from AI-focused data centers. These facilities – purpose-built to train …
OctoML OSS 2019 11 8 (16 pages, 1.77 MB): Co-founder, Architect, Co-founder, Architect; PhD in Computer Architecture, PhD in Computational …, PhD in Machine Learning, PhD in Computer Architecture, PhD …
OpenAI - AI in the Enterprise (25 pages, 9.48 MB): … data. We knew we had a winner on our hands! (Nishant Gupta, Senior Director, Data, Analytics and Computational Intelligence) Product Note: OpenAI has launched Vision Fine-Tuning to further improve ecommerce …