Facebook -- TVM AWS Meetup Talk
GRU units and FC layers
- 24 kHz sampling frequency requires a 40 us sampling-net runtime
- The first PyTorch model ran at a 3,400 us sampling-net runtime (image from LPCNet)
Exit, Pursued By A Bear
- 3,400 us (baseline) vs. 40 us (target): an 85x speedup needed. Uh oh.
Enter, TVM and model co-design
- PyTorch operator overhead makes an interpreter infeasible
- Reduce FLOPs with block-sparsified weight matrices
  - note: trades off icache/dcache pressure
  - also available today in FBGEMM
PyTorch and TVM
- Lots of opportunity in PyTorch
- Graph optimization
- Existing fusion infrastructure is fairly limited (CUDA-only, injective-only)
11 pages | 3.08 MB | 5 months ago
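The snippet above mentions reducing FLOPs with block-sparsified weight matrices. A minimal NumPy sketch of that idea follows: zero out whole blocks of a weight matrix, then skip the zero blocks entirely at matrix-vector time, so work scales with block density. The block size and density here are made-up illustrative values, not the talk's actual configuration.

```python
import numpy as np

BLOCK = 16      # block edge length (assumed for illustration)
DENSITY = 0.25  # fraction of nonzero blocks (assumed for illustration)

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)

# Keep only a random subset of blocks; the rest become exact zeros.
mask = rng.random((256 // BLOCK, 256 // BLOCK)) < DENSITY
W_sparse = W * np.kron(mask, np.ones((BLOCK, BLOCK), dtype=np.float32))

def block_sparse_matvec(W, mask, x, block=BLOCK):
    """Multiply only the nonzero blocks, skipping zero blocks entirely."""
    y = np.zeros(W.shape[0], dtype=W.dtype)
    for bi, bj in zip(*np.nonzero(mask)):
        r, c = bi * block, bj * block
        y[r:r + block] += W[r:r + block, c:c + block] @ x[c:c + block]
    return y

x = rng.standard_normal(256).astype(np.float32)
y = block_sparse_matvec(W_sparse, mask, x)

# Same result as the dense product, at roughly DENSITY of the FLOPs.
assert np.allclose(y, W_sparse @ x, atol=1e-4)
```

The icache/dcache trade-off the slide alludes to comes from the irregular block traversal: the dense matvec streams memory linearly, while the sparse loop jumps between blocks.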
Bring Your Own Codegen to TVM
Your chip can run any model; your compiler (TVM) supports multiple frontends (e.g., TensorFlow, PyTorch, MXNet). Non-Maximum Suppression, ResNet-50, Dense → Your Chip. © 2019, Amazon Web Services
19 pages | 504.69 KB | 5 months ago
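The core idea behind "bring your own codegen" is graph partitioning: operators your chip supports are offloaded to its codegen, while unsupported ones (such as NMS in the snippet above) fall back to TVM's default backend. Below is a deliberately simplified sketch of that partitioning decision — the op names, the `CHIP_SUPPORTED` set, and the `partition` helper are illustrative assumptions, not TVM's actual BYOC API.

```python
# Assumed accelerator op set for illustration only.
CHIP_SUPPORTED = {"conv2d", "dense", "relu"}

def partition(graph):
    """Assign each op in a linear op list to a backend (hypothetical helper)."""
    return [("your_chip" if op in CHIP_SUPPORTED else "tvm", op) for op in graph]

# A ResNet-50-style tail: the chip takes the heavy layers,
# TVM keeps non-maximum suppression on the host.
plan = partition(["conv2d", "relu", "dense", "non_max_suppression"])
# plan == [('your_chip', 'conv2d'), ('your_chip', 'relu'),
#          ('your_chip', 'dense'), ('tvm', 'non_max_suppression')]
```

In real TVM BYOC flows the partitioning operates on Relay graphs and groups offloaded ops into subgraph functions, but the supported/unsupported split shown here is the decision it automates.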
OSChina (开源中国) 2023 Large Language Model (LLM) Technical Report
CUDA and Google Cloud TPU both belong to this class of tooling. Such tools are typically supported and maintained by open-source communities, providing flexible, extensible tools and libraries for building and training large machine-learning models, such as TensorFlow, PyTorch, and Hugging Face Transformers. TensorFlow architecture diagram (image source: https://www.geeksforgeeks.org/architecture-of-)
32 pages | 13.09 MB | 1 year ago
3 results in total