Bring Your Own Codegen to TVM
Presenters: Zhi Chen, Cody Yu (Amazon SageMaker Neo, Deep Engine Science; deck marked Amazon/Intel Confidential)
Slide excerpt: "... (dense, ReLU, etc.) Now your customer wants to run a YOLO model, but... Non-Maximum Suppression (NMS) is too new to be supported by your chip. But NMS is supported [by TVM]: your compiler (TVM) supports multiple frontends (e.g., TensorFlow, PyTorch, MXNet)." Diagram labels: ResNet-50, Dense, Non-Maximum Suppression, Compiler of Your Chip, Your Chip.
0 码力 | 19 pages | 504.69 KB | 5 months ago
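The deck's premise is partitioning a model so that operators your chip supports go to its codegen while the rest (e.g., NMS) stays on TVM. Below is a minimal sketch of that partitioning flow using TVM's BYOC Relay passes; the target name `your_chip` and the choice to offload only `nn.dense` are hypothetical, the API is as in recent TVM releases, and a real integration would also register a codegen and runtime under the same name.

```python
import tvm
from tvm import relay

# Hypothetical BYOC target name; only the annotation side is shown here.
TARGET = "your_chip"

@tvm.ir.register_op_attr("nn.dense", "target." + TARGET)
def _dense_supported(expr):
    # Report that the chip can execute nn.dense. Ops without such an
    # annotation (relu, NMS, ...) stay on the default TVM backend.
    return True

def partition(mod):
    seq = tvm.transform.Sequential([
        relay.transform.InferType(),
        relay.transform.AnnotateTarget(TARGET),   # mark supported ops
        relay.transform.MergeCompilerRegions(),   # group adjacent marked ops
        relay.transform.PartitionGraph(),         # split them into external functions
    ])
    return seq(mod)

# Tiny example graph: dense -> relu; only dense is offloaded.
x = relay.var("x", shape=(1, 16))
w = relay.var("w", shape=(8, 16))
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))
print(partition(mod))
```
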
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[Figure excerpt: Training Costs (K GPU Hours/T Tokens); DeepSeek-V2 saves 42.5% of training costs relative to DeepSeek 67B; comparison families include Command R and Qwen1.5.]
Contents excerpt: 5 Conclusion, Limitation, and Future Work (p. 21); A Contributions and Acknowledgments (p. 27); B DeepSeek-V2-Lite: A 16B Model Equipped with MLA and ...
Text excerpt: "... summarize the conclusion, deliberate on the current limitations of DeepSeek-V2, and outline our future work (Section 5). 2. Architecture: By and large, DeepSeek-V2 is still in the Transformer architecture ..."
0 码力 | 52 pages | 1.23 MB | 1 year ago
OpenAI - AI in the Enterprise
Contents excerpt: A new way to work (3); Executive summary (5); Seven lessons for enterprise AI adoption: Start with evals (6), Embed AI into your products (9), Start now and invest early (11), Customize and fine-tune your models (13), AI in the hands of experts (16), Unblock your developers (18), Set bold automation goals (21); Conclusion (22); More resources (24).
Text excerpt: "A new way to work. As an AI research and deployment company, OpenAI prioritizes partnering with global companies because our models will increasingly do their best work with sophisticated, complex, interconnected workflows and systems. We're seeing AI deliver significant ..."
0 码力 | 25 pages | 9.48 MB | 5 months ago
Google 《Prompt Engineering v7》
Text excerpt: "... effective prompt can be complicated. Many aspects of your prompt affect its efficacy: the model you use, the model's training data, the model configurations, your word-choice, style and tone, structure, and ... When prompt engineering, you will start by choosing a model. Prompts might need to be optimized for your specific model, regardless of whether you use Gemini language models in Vertex AI, GPT, Claude, or ... need to tinker with the various configurations of an LLM. LLM output configuration: Once you choose your model you will need to figure out the model configuration. Most LLMs come with various configuration ..."
0 码力 | 68 pages | 6.50 MB | 6 months ago
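The "LLM output configuration" knobs the excerpt refers to are typically temperature, top-k, top-p and output length. A minimal, vendor-neutral sketch is below; the `complete()` client and the `GenerationConfig` name are hypothetical placeholders, not any specific SDK, and only mirror the parameters the guide discusses.

```python
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    temperature: float = 0.2      # lower values give more deterministic output
    top_k: int = 40               # sample only from the k most likely tokens
    top_p: float = 0.95           # nucleus sampling: keep this much probability mass
    max_output_tokens: int = 256  # cap on response length

def complete(prompt: str, config: GenerationConfig) -> str:
    """Placeholder for a real model call (Vertex AI, OpenAI, Anthropic, ...)."""
    raise NotImplementedError

# Near-greedy decoding is a common choice for classification-style prompts.
config = GenerationConfig(temperature=0.0)
# answer = complete("Classify the sentiment of the following review: ...", config)
```
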
Trends Artificial Intelligence
Text excerpt: "... report to life. And, to the many friends and technology builders who helped, directly or via your work, and are driving technology forward."
• Seem Like Change Happening Faster Than Ever? Yes, It Is
• Global Internet User Ramps Powered by AI from Get-Go = Growth We Have Not Seen Likes of Before
• AI & Work Evolution = Real + Rapid
[Chart excerpts: Mobile App Monthly Active Users, MM (Source: Sensor Tower, 5/25), details on page 315; USA IT Jobs - AI vs. Non-AI, +448% vs. -9%, details on page 302.]
0 码力 | 340 pages | 12.14 MB | 4 months ago
TVM@AliOS
Slide excerpt (AliOS TVM @ ARM CPU INT8): "... primitive completely, no tensorize. Some experience: 1. Avoid DataPack; 2. Generate the SMLAL instruction if your ARM does not have dot; 3. compute_at is very important."
Slide excerpt (AliOS TVM @ Hexagon DSP): "Performance is our focus next. We begin to do some work now, such as writing tensorize (tvm.call_pure_intrin, tvm.const(0, ...)) to generate the vrmpy instruction."
0 码力 | 27 pages | 4.86 MB | 5 months ago
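The ARM CPU tips above lean on TVM's schedule primitives. Here is a minimal, self-contained sketch of what `compute_at` does, using the classic `te` schedule API on a toy elementwise pipeline (assumed recent TVM; this is not AliOS's actual INT8 schedule):

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
C = te.compute((n,), lambda i: B[i] * 2.0, name="C")

s = te.create_schedule(C.op)
# Without compute_at, B is materialized in its own loop nest before C runs.
# compute_at fuses B's computation into C's loop, removing the intermediate
# buffer traffic; this locality control is why "compute_at is very important".
s[B].compute_at(s[C], C.op.axis[0])

print(tvm.lower(s, [A, C], simple_mode=True))
```
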
TVM@Alibaba AI Labs
Alibaba AI Labs (阿里巴巴人工智能实验室)
Slide excerpt: "Blocking: splits the workload into thread blocks (work groups) and individual threads (work items)." [Diagram labels: Processing Element (work item), Compute Unit (work group), batch, in_channel.] "Cooperative fetching: lets threads (work items) in the same thread block (work group) cooperatively fetch dependent data." (Reference: https://www.khronos.org/registry/OpenCL/specs/opencl-1...)
0 码力 | 12 pages | 1.94 MB | 5 months ago
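A sketch of these two ideas in TVM's `te` schedule language is below: a square matmul tiled into work groups and work items, with input tiles staged in shared (work-group local) memory and the copy loops bound to the thread indices so the fetch is cooperative. It follows the shape of TVM's public GPU GEMM recipes rather than the Mali schedule from the slides, so treat the tile sizes and structure as illustrative.

```python
import tvm
from tvm import te

n = 1024
num_thread = 8
block_factor = num_thread * 8

A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda y, x: te.sum(A[y, k] * B[k, x], axis=k), name="C")

s = te.create_schedule(C.op)
AA = s.cache_read(A, "shared", [C])   # work-group local staging of A
BB = s.cache_read(B, "shared", [C])   # work-group local staging of B
CC = s.cache_write(C, "local")        # per-thread accumulation registers

block_x = te.thread_axis("blockIdx.x")
block_y = te.thread_axis("blockIdx.y")
thread_x = te.thread_axis((0, num_thread), "threadIdx.x")
thread_y = te.thread_axis((0, num_thread), "threadIdx.y")

# Blocking: outer tiles become work groups, inner tiles become work items.
by, yi = s[C].split(C.op.axis[0], factor=block_factor)
bx, xi = s[C].split(C.op.axis[1], factor=block_factor)
ty, yi = s[C].split(yi, nparts=num_thread)
tx, xi = s[C].split(xi, nparts=num_thread)
s[C].reorder(by, bx, ty, tx, yi, xi)
s[C].bind(by, block_y)
s[C].bind(bx, block_x)
s[C].bind(ty, thread_y)
s[C].bind(tx, thread_x)

# Accumulate in registers; reload the shared tiles once per k-chunk.
s[CC].compute_at(s[C], tx)
ko, ki = s[CC].split(CC.op.reduce_axis[0], factor=num_thread)
s[AA].compute_at(s[CC], ko)
s[BB].compute_at(s[CC], ko)

# Cooperative fetching: all work items in the group share the tile loads.
for load in (AA, BB):
    fy, fx = s[load].op.axis
    ty_l, _ = s[load].split(fy, nparts=num_thread)
    tx_l, _ = s[load].split(fx, nparts=num_thread)
    s[load].bind(ty_l, thread_y)
    s[load].bind(tx_l, thread_x)

print(tvm.lower(s, [A, B, C], simple_mode=True))
```
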
OpenAI 《A practical guide to building agents》
Text excerpt: "... practices to ensure your agents run safely, predictably, and effectively. After reading this guide, you'll have the foundational knowledge you need to confidently start building your first agent. ... behalf with a high degree of independence. Agents are systems that independently accomplish tasks on your behalf. A workflow is a sequence of steps that must be executed to meet the user's goal, whether that's ... When should you build an agent? Building agents requires rethinking how your systems make decisions and handle complexity. Unlike conventional automation, agents are uniquely ..."
0 码力 | 34 pages | 7.00 MB | 6 months ago
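The excerpt's framing (an agent independently accomplishes a task, as opposed to a fixed workflow) boils down to a model in a loop with tools and a step limit. A minimal, vendor-neutral sketch, with the hypothetical `call_llm` placeholder standing in for a real model call:

```python
from typing import Callable, Dict

# Hypothetical tools the agent may invoke by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"Order {order_id}: shipped",
}

def call_llm(messages):
    """Placeholder for a real model call. Expected to return either
    ('final', answer_text) or ('tool', tool_name, argument)."""
    raise NotImplementedError

def run_agent(user_goal: str, max_steps: int = 5):
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):              # the agent loop: model decides each step
        kind, *payload = call_llm(messages)
        if kind == "final":                 # model decided the task is done
            return payload[0]
        name, argument = payload            # model asked to use a tool
        result = TOOLS[name](argument)
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached"    # guardrail against runaway loops
```
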
Facebook -- TVM AWS Meetup Talk
Slide excerpt: "... transcendentals (exp, tanh, erf, etc.) - very general technique, allows clean vectorization - Related work in Gibiansky (2017), Gray (2019), et al. [Image from OpenAI] - Add relay.nn.sparse_dense for block-sparse ... reinterpret to implement rational approximations in user space (~10 lines of Relay IR) - A few days of work - TVM sampling model running in 30us on a single server CPU core - Beat hand-written, highly optimized ..."
0 码力 | 11 pages | 3.08 MB | 5 months ago
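The talk replaces transcendentals with rational approximations because a ratio of polynomials vectorizes cleanly: only multiplies, adds and one divide. A conceptual NumPy sketch of that idea for tanh (a well-known Padé-style approximant, not Facebook's actual kernel):

```python
import numpy as np

def tanh_rational(x):
    # x * (27 + x^2) / (27 + 9 x^2) approximates tanh for moderate |x|;
    # clamp the input so it is not used far outside its useful range.
    x = np.clip(x, -3.0, 3.0)
    x2 = x * x
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)

x = np.linspace(-3.0, 3.0, 101)
# Maximum absolute error vs. np.tanh is on the order of 1e-2 on this range.
print(np.max(np.abs(tanh_rational(x) - np.tanh(x))))
```
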
TVM Meetup: Quantization
Slide excerpt: "... scratch: • New Relay passes and TVM schedules required • AlterOpLayout, graph fusion, etc. require work per operator • No reuse of existing Relay and TVM infrastructure. Option 2 – Lower to a sequence of existing Relay operators: • We introduced a new Relay dialect – QNN – to encapsulate this work • Complete reuse of Relay pass infrastructure • Possible reuse of TVM schedules (only to some extent)"
0 码力 | 19 pages | 489.50 KB | 5 months ago
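Option 2 works because a quantized operator can be expressed as integer arithmetic on (value - zero_point) followed by a rescale, which is the kind of decomposition a QNN-style dialect lowers into existing operators. A conceptual NumPy sketch of a quantized dense; names and the exact decomposition are illustrative, not TVM's actual QNN lowering:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Affine uint8 quantization: q = round(x / scale) + zero_point.
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def qnn_dense(xq, wq, x_scale, x_zp, w_scale, w_zp):
    # Accumulate (xq - x_zp) * (wq - w_zp) in int32, then rescale to float.
    acc = (xq.astype(np.int32) - x_zp) @ (wq.astype(np.int32) - w_zp).T
    return acc * (x_scale * w_scale)

x = np.random.randn(1, 8).astype("float32")
w = np.random.randn(4, 8).astype("float32")
xq = quantize(x, 0.05, 128)
wq = quantize(w, 0.05, 128)
# Small reconstruction error relative to the float matmul, due to 8-bit rounding.
print(np.max(np.abs(qnn_dense(xq, wq, 0.05, 128, 0.05, 128) - x @ w.T)))
```
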
14 items in total.













