TVM: Where Are We Going
Open source, automated end-to-end optimization framework for deep learning. TVM Stack: High-Level Differentiable IR; Tensor Expression and Optimization Search Space; LLVM, CUDA, Metal; VTA; Edge FPGA; Cloud FPGA; ASIC. Optimization: AutoTVM, Device Fleet. Existing deep learning frameworks: high-level data flow graph; primitive tensor operators such as Conv2D offloaded to hardware libraries, e.g. cuDNN. Limitations of the existing approach: a new operator introduced by operator fusion has a potential benefit of 1.5x speedup, but supporting it in cuDNN-style libraries is engineering intensive. Machine learning based program optimizer.
0 码力 | 31 pages | 22.64 MB | 5 months ago
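The operator-fusion benefit this snippet alludes to (a fused Conv2D+activation operator that a fixed library like cuDNN does not ship) can be sketched in pure Python. This is a hypothetical toy, not code from the talk: a 1-D "convolution" followed by ReLU, first as two passes with a materialized intermediate buffer, then fused into a single pass.

```python
def conv_then_relu_unfused(x, w, b):
    # Pass 1: toy 1-D convolution (a dot product per sliding window),
    # materializing an intermediate buffer in memory.
    tmp = [sum(x[i + k] * w[k] for k in range(len(w))) + b
           for i in range(len(x) - len(w) + 1)]
    # Pass 2: ReLU reads the intermediate back and writes the result.
    return [max(v, 0.0) for v in tmp]

def conv_then_relu_fused(x, w, b):
    # One pass: ReLU is applied while each output element is still
    # "in register", so the intermediate buffer never exists.
    return [max(sum(x[i + k] * w[k] for k in range(len(w))) + b, 0.0)
            for i in range(len(x) - len(w) + 1)]
```

The two versions compute identical results; the fused one simply skips a round trip through memory, which is where the speedup a compiler like TVM can generate automatically comes from.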
PAI & TVM Meetup - Shanghai 20191116
Requires familiarity with the WMMA API. Unified matmul schedule for GPU: maintainability and common optimization sharing; search across the entire space (TensorCore + non-TensorCore). Our solution: wmma::mma_sync(compute_local, B_shared_local, A_shared_local, compute_local). Performance optimization: same as non-TensorCore CUDA codegen; auto-tune tiling sizes; vectorized. COMPUTING PLATFORM. INT8 inference on PAI-Blade: model analysis, graph optimization, Blade Graph Optimizer, TensorRT, customized optimizer, TAO Compiler (XLA), cuBLAS/cuDNN/CUTLASS.
0 码力 | 26 pages | 5.82 MB | 5 months ago
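"Auto-tune tiling sizes" from this snippet refers to searching over loop-tiling parameters of a matmul schedule. As a hedged, dependency-free sketch (the function name and tile parameter are mine, not from the deck), here is a tiled matrix multiply where `tile` is exactly the kind of knob an auto-tuner would search over:

```python
def matmul_tiled(A, B, tile=2):
    """Tiled matmul over Python lists; `tile` is the blocking factor
    a schedule tuner (e.g. AutoTVM-style search) would sweep."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles to improve locality; inner loops
    # do the usual triple-nested accumulation within each tile.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

Any tile size yields the same result; only the memory-access pattern (and thus performance on real hardware) changes, which is why the choice is left to a search rather than fixed by hand.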
Curve for CNCF Main
Versions on CurveBS: v1.2 supporting QoS, discard, and silent data corruption checks; v1.3 with some performance optimizations; more details at https://github.com/opencurve/curve/releases. Now working on CurveFS. Roadmap: management, patch, and minor version upgrades supported; file metadata preallocation; Raft optimization; ParallelRaft for writes; reduced write amplification for new file writes; cloud tiering.
0 码力 | 21 pages | 4.56 MB | 6 months ago
Facebook -- TVM AWS Meetup Talk
dcache - also available today in FBGEMM. PyTorch and TVM: lots of opportunity in PyTorch; graph optimization; existing fusion infrastructure fairly limited (CUDA-only, injective-only); kernel synthesis.
0 码力 | 11 pages | 3.08 MB | 5 months ago
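"Injective-only" fusion, as mentioned in this snippet, means a graph optimizer only merges elementwise (injective) operators into one kernel. A minimal, hypothetical sketch of such a pass (the tiny list-of-ops IR and names here are mine, not PyTorch's actual fusion infrastructure):

```python
# Elementwise ops are the "injective" ones an injective-only fuser handles.
INJECTIVE = {"add", "mul", "relu"}

def fuse_injective(graph):
    """Greedily merge runs of adjacent injective ops into one fused node.
    `graph` is a list of op names in execution order."""
    fused, group = [], []
    for op in graph:
        if op in INJECTIVE:
            group.append(op)          # keep extending the current run
        else:
            if group:                 # flush the pending fused group
                fused.append("fused_" + "_".join(group))
                group = []
            fused.append(op)          # non-injective ops pass through
    if group:
        fused.append("fused_" + "_".join(group))
    return fused
```

Note what such a pass cannot do: a reduction like `conv2d` breaks the run, so `conv2d + add + relu` becomes `conv2d` followed by `fused_add_relu` rather than one kernel, which is the limitation the talk points at.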
4 results in total