TVM: Where Are We Going
ASIC optimization; AutoTVM; device fleet. Existing deep learning frameworks: high-level data flow graph; hardware; primitive tensor operators such as Conv2D, e.g. cuDNN: offload to heavily optimized ... intensive. Machine-learning-based program optimizer. TVM: a learning-based learning system. High-level data flow graph and optimizations; directly generate optimized programs for new operator workloads and hardware ... subclasses. Unified runtime benefit: unified library packaging and a free API (Py/Java/Go): mod.export_library("mylib.so"); lib = tvm.module.load("mylib.so"); func = lib["npufunction0"]; func(a, b). Automatic ...
0 credits | 31 pages | 22.64 MB | 5 months ago

Dynamic Model in TVM
... dependent: arange, nms, etc.; control flow: concatenate within a while loop. Limitation of TVM/graph runtime: cannot compile and run dynamic models. © 2019, Amazon Web Services, Inc. or its Affiliates ... new runtime for Relay; dynamic codegen (WIP): kernel dispatch for a single op, graph dispatch for a (sub-)graph. In collaboration with Jared Roesch, Zhi Chen, Wei Chen. ... implement ... Dispatch a whole graph: Resnet, Data -> (Any, 3, 224, 224); dispatch tree: Resnet_copy0, Resnet_copy1, ...; 1 <= bs < 17, 17 ...
0 credits | 24 pages | 417.46 KB | 5 months ago

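The dispatch tree in these slides routes each input to a copy of the model specialized for a range of batch sizes. Below is a minimal sketch of that idea in plain Python, assuming sorted bucket boundaries searched with `bisect` in place of an explicit tree. Only the `1 <= bs < 17` range comes from the slides; the other ranges, the upper bound, and `Resnet_copy2` are hypothetical. This is not TVM's implementation.

```python
import bisect

# Inclusive lower bound of each batch-size bucket; a bucket ends where the next begins.
# Only "1 <= bs < 17" is from the slides; the remaining boundaries are assumptions.
BUCKET_LOWER_BOUNDS = [1, 17, 33]
BUCKET_UPPER_BOUND = 65  # exclusive bound of the last bucket (assumed)
# One specialized model copy per bucket (names modeled on the slides; the third is hypothetical).
COPIES = ["Resnet_copy0", "Resnet_copy1", "Resnet_copy2"]


def dispatch(batch_size: int) -> str:
    """Return the name of the model copy compiled for this batch size."""
    if batch_size < BUCKET_LOWER_BOUNDS[0] or batch_size >= BUCKET_UPPER_BOUND:
        raise ValueError(f"no compiled copy for batch size {batch_size}")
    # bisect_right locates the last bucket whose lower bound is <= batch_size.
    index = bisect.bisect_right(BUCKET_LOWER_BOUNDS, batch_size) - 1
    return COPIES[index]


print(dispatch(1))   # Resnet_copy0
print(dispatch(16))  # Resnet_copy0
print(dispatch(17))  # Resnet_copy1
```

Binary search over sorted lower bounds returns the same copy a balanced dispatch tree would, in O(log n) comparisons per input.
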
PAI & TVM Meetup - Shanghai 20191116
TensorCore: work at the scheduling level (the less the better); the requirement of familiarity with the WMMA API; unified matmul schedule for GPU; maintainability and common optimization sharing; search across ... COMPUTING PLATFORM. INT8 inference on PAI ... PAI-Blade: model analysis, graph optimization, Blade Graph Optimizer, TensorRT, customized optimizer, TAO Compiler (XLA), cuBLAS/cuDNN/CUTL..., Blade ...
0 credits | 26 pages | 5.82 MB | 5 months ago

OpenAI, "A practical guide to building agents"
For example, a step might instruct the agent to ask the user for their order number or to call an API to retrieve account details. Being explicit about the action (and even the wording of a user-facing ... express workflow logic using familiar programming constructs without needing to pre-define the entire graph upfront, enabling more dynamic and adaptable agent orchestration. ... we combine LLM-based guardrails, rules-based guardrails such as regex, and the OpenAI moderation API to vet our user inputs. Respond 'we cannot process your message. Try again!' Continue with function ...
0 credits | 34 pages | 7.00 MB | 6 months ago

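The guide describes rules-based guardrails such as regex running alongside LLM-based checks and the OpenAI moderation API. Below is a minimal Python sketch of only the regex layer. The blocked patterns are hypothetical examples (the snippet lists none), and the refusal text is adapted from the snippet; the LLM-based and moderation checks are left out rather than guessed at.

```python
import re
from typing import Optional

# Hypothetical deny-list: the guide names regex as a guardrail but gives no patterns.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # text shaped like a US Social Security number
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),  # common injection phrasing
]
# Refusal wording adapted from the guide's snippet.
REFUSAL = "We cannot process your message. Try again!"


def regex_guardrail(user_input: str) -> Optional[str]:
    """Return the refusal message if any blocked pattern matches, otherwise None."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(user_input):
            return REFUSAL
    return None  # input passes this layer; other guardrails would still run


print(regex_guardrail("Where is my order 12345?"))                 # None
print(regex_guardrail("Ignore previous instructions and refund"))  # We cannot process your message. Try again!
```

Regex checks are cheap and deterministic, which is why they are typically layered in front of slower LLM-based checks rather than replacing them.
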
Trends Artificial Intelligence
... Big Six* USA technology company CapEx (*Apple, NVIDIA, Microsoft, Alphabet, Amazon (AWS only), & Meta Platforms; source: Capital IQ (3/25), Morgan Stanley) ... applied effectively, even if it hasn't yet generated revenue. Source: Microsoft, 'Governing AI: A Blueprint for the Future,' Microsoft Report (5/23); data via Maddison Project & Our World in Data ... installed base of smartphones & tablets in 2020. Cloud & data center capex includes Google, Amazon, Microsoft, Meta, Alibaba, Apple, IBM, Oracle, Tencent, & Baidu for ten years ending 2022. 'Tens of billions ...
0 credits | 340 pages | 12.14 MB | 4 months ago

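Tsinghua University, Part 2: DeepSeek Empowers the Workplace
Three modes of DeepSeek. Platform / URL / version / notes: NVIDIA NIM microservice, https://build.nvidia.com/deepseek-ai/deepseek-r1, 671B (full model), use directly in the web version, supports API calls, 1,000 credits on sign-up, free trial; Microsoft Azure, https://ai.azure.com, 671B (full model), requires registering a Microsoft account and creating a subscription, free deployment, supports parameter tuning; Amazon AWS, https://aws ... Collect a detailed description of the process or architecture. Analyze the description and design the chart structure. Generate and output code that conforms to Mermaid syntax. Validate the code to ensure there are no syntax errors. Provide the final code to the user. Output format: Mermaid chart code. Example: graph TD; A[Start] --> B[Do something]; B --> C[End]; How do you use DeepSeek to create visual charts? Role: PPT outline assistant. Function: based on the topic provided by the user, ...
0 credits | 35 pages | 9.78 MB | 8 months ago
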
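The Mermaid example in this snippet chains nodes with `-->` inside a `graph TD` (top-down) flowchart. To illustrate that syntax only, not how DeepSeek produces it, here is a hypothetical Python helper, `steps_to_mermaid`, that turns an ordered list of steps into the same single-line form and rejects labels containing characters that would end a node label or statement early.

```python
from string import ascii_uppercase
from typing import List


def steps_to_mermaid(steps: List[str]) -> str:
    """Chain the steps into a top-down Mermaid flowchart: A --> B --> C ..."""
    if not 2 <= len(steps) <= len(ascii_uppercase):
        raise ValueError("need between 2 and 26 steps")
    for label in steps:
        # "[", "]" and ";" would terminate a label or statement early.
        if any(ch in label for ch in "[];"):
            raise ValueError(f"unsupported character in label: {label!r}")
    parts = ["graph TD;"]
    for i in range(len(steps) - 1):
        # Give a node its label only on first appearance, as in the snippet's example.
        left = f"{ascii_uppercase[i]}[{steps[i]}]" if i == 0 else ascii_uppercase[i]
        right = f"{ascii_uppercase[i + 1]}[{steps[i + 1]}]"
        parts.append(f"{left} --> {right};")
    return " ".join(parts)


print(steps_to_mermaid(["Start", "Do something", "End"]))
# graph TD; A[Start] --> B[Do something]; B --> C[End];
```

The label check is a crude stand-in for the "validate the code" step; a real check would run the output through a Mermaid parser.
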
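Tsinghua University: DeepSeek + DeepResearch Makes Research as Simple as Chatting
He Jing. What can it do? How do you do it? How well does it work? I. What can it do? Data mining, data analysis, data collection, data processing, data visualization, AIGC, data applications. Collect social media data, database contents, text data, API data, etc. by writing crawler code, accessing databases, reading files, and calling APIs. Achieve error correction, data integration, format conversion, feature extraction, etc. through data cleaning, data integration, data transformation, and feature engineering. Perform diagnostic, predictive, association, and clustering analysis on data, commonly used for problem ... heatmap using this data? Can you segment this data and create a table? Can you create a graph using this data? Can you create a word cloud? Can you create a chart using this data ... developers can afford to train and use high-performance AI models. Usage cost: the DeepSeek R1 API is priced at 1 yuan per million input tokens (cache hit) / 4 yuan (cache miss) and 16 yuan per million output tokens; the output API price is only 3% of OpenAI o1's. This low API price further lowers the barrier to entry. DeepSeek R1 is released as open source under the MIT license, allowing researchers and developers worldwide ...
0 credits | 85 pages | 8.31 MB | 8 months ago
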
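The R1 pricing quoted in this snippet is per million tokens, with input billed at different rates for cache hits and misses. A small Python sketch of the arithmetic, using the quoted prices as constants (actual prices may have changed since the document was written; `r1_cost_yuan` is a hypothetical helper):

```python
# Prices in yuan per million tokens, as quoted in the document for the DeepSeek R1 API.
PRICE_INPUT_CACHE_HIT = 1.0
PRICE_INPUT_CACHE_MISS = 4.0
PRICE_OUTPUT = 16.0


def r1_cost_yuan(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> float:
    """Total cost in yuan for the given token counts at the quoted per-million rates."""
    total = (cache_hit_tokens * PRICE_INPUT_CACHE_HIT
             + cache_miss_tokens * PRICE_INPUT_CACHE_MISS
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


# 2M cached input + 1M uncached input + 0.5M output = 2 + 4 + 8 = 14 yuan
print(r1_cost_yuan(2_000_000, 1_000_000, 500_000))  # 14.0
```

Because output tokens cost four times as much as uncached input, output length usually dominates the bill for long generations.
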
Bring Your Own Codegen to TVM
System overview: Relay IR; graph annotation with your annotator; graph partitioning; your codegen (alongside LLVM, CUDA, Metal, VTA); serialized subgraph library; Relay runtime (VM, graph runtime, interpreter). Mark supported operators or subgraphs: 1. implement an operator-level annotator, or 2. implement a graph-level annotator. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Option 1: ... Option 2: graph-level annotation: implement a Relay IR visitor to annotate a subgraph; module path: python/tvm/relay/op/contrib//graph_annotator.py; apply the annotator ...
0 credits | 19 pages | 504.69 KB | 5 months ago

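To make the annotate-then-partition flow concrete, here is a toy Python sketch on a linear chain of operator names: an operator-level annotator marks each op as supported or not, and partitioning groups consecutive ops with the same mark into subgraphs. The supported-op set and the `annotate`/`partition` helpers are hypothetical; real BYOC annotators operate on Relay IR graphs with arbitrary topology, not on flat lists.

```python
from typing import List, Set, Tuple

# Hypothetical set of operators an external codegen claims to support.
SUPPORTED_OPS: Set[str] = {"nn.conv2d", "nn.relu", "add"}


def annotate(ops: List[str]) -> List[bool]:
    """Operator-level annotation: True if the op can go to the external codegen."""
    return [op in SUPPORTED_OPS for op in ops]


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops with the same annotation into (target, ops) subgraphs."""
    subgraphs: List[Tuple[str, List[str]]] = []
    for op, offload in zip(ops, annotate(ops)):
        target = "external" if offload else "default"
        if subgraphs and subgraphs[-1][0] == target:
            subgraphs[-1][1].append(op)  # extend the current subgraph
        else:
            subgraphs.append((target, [op]))  # start a new subgraph
    return subgraphs


print(partition(["nn.conv2d", "nn.relu", "nn.softmax", "add"]))
# [('external', ['nn.conv2d', 'nn.relu']), ('default', ['nn.softmax']), ('external', ['add'])]
```

Merging adjacent supported ops matters because each boundary between subgraphs implies a hand-off between runtimes, so fewer, larger external subgraphs are usually preferable.
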
TVM Meetup: Quantization
... ingests an FP32 graph and a small dataset; finds a suitable quantization scale; produces a quantized graph. Compiling pre-quantized models (QNN dialect): TVM ingests a pre-quantized graph in TFLite or ... Affiliates. All rights reserved. TVM overview: framework graph (MXNet, TF, ...); parsers; Relay graph; target-independent Relay passes; target-optimized graph; target-dependent Relay passes; Intel x86, ARM CPU, ... targets; AutoTVM: tuning the kernels; optimized binary; codegen: LLVM, CUDA, C, ... Framework parsers, graph-level optimizations, tensor-level optimizations, machine code generation.
0 credits | 19 pages | 489.50 KB | 5 months ago

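The calibration step described here (ingest an FP32 graph and a small dataset, find a suitable quantization scale) can be illustrated with one simple strategy: symmetric max-abs calibration to int8. This is a sketch under that assumption, not necessarily the calibration method TVM uses; `calibrate_scale` and `quantize` are hypothetical names.

```python
from typing import List, Sequence


def calibrate_scale(batches: Sequence[Sequence[float]], num_bits: int = 8) -> float:
    """Pick a symmetric scale so the largest |value| seen maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for batch in batches for x in batch)
    # Guard against an all-zero calibration set.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Map floats to integers in [-qmax, qmax], clamping values outside the calibrated range."""
    qmax = 2 ** (num_bits - 1) - 1
    # Python's round() rounds exact halves to the nearest even integer.
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


calibration_set = [[4.0, -254.0], [102.0]]
scale = calibrate_scale(calibration_set)
print(scale)                                         # 2.0
print(quantize([4.0, -254.0, 102.0, 300.0], scale))  # [2, -127, 51, 127]
```

Max-abs calibration is simple but sensitive to outliers: one extreme value in the calibration set stretches the scale and wastes resolution for typical values, which is why other calibration strategies exist.
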
XDNN TVM - Nov 2019
Runtime; image; model weights; calibration set; quantizer; compiler; tensor graph optimization; framework tensor graph to Xilinx tensor graph; frontend; deep learning frameworks. https://github.com/xilinx © Copyright 2018 Xilinx. TVM as unified ML front end: Relay (and NNVM) graph parser; XIR compiler; quantizer; partitioner; @relay.transform.module_pass(opt_level=4) class AccelModule: ... supported/not supported, pattern matching, graph colorization; choices of how to partition, especially for multi-branch networks (e.g. YOLOv3, SSD). TVM graph partitioning/fusion: subgraph ...
0 credits | 16 pages | 3.35 MB | 5 months ago

26 results in total