QCon北京2018-《从键盘输入到神经网络--深度学习在彭博的应用》-李碧野 (QCon Beijing 2018, "From Keyboard Input to Neural Networks: Deep Learning at Bloomberg", Li Biye)
...org/licenses/by-sa/4.0/deed.en K80 © 2018 Bloomberg Finance L.P. All rights reserved. Back to 2018 Heterogeneous Hardware Modified from https://upload.wikimedia.org/wikipedia/commons/6/67/Kubernetes_logo.svg
0 points | 64 pages | 13.45 MB | 1 year ago
从推荐模型的基础特点看大规模推荐类深度学习系统的设计 袁镱 (Designing Large-Scale Deep Learning Systems for Recommendation from the Basic Characteristics of Recommendation Models, Yuan Yi)
Direct compression -> compensation in the training algorithm. [2020] Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation; [ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. (A minimal error-feedback sketch follows this listing.)
0 points | 22 pages | 6.76 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
...[-10.0, 10.0]. We need to transmit a collection (vector) of these variables over an expensive communication channel. Can we use quantization to reduce transmission size and thus save some costs? What if ... integer compressed tensor which uses 3*5*1 = 15 bytes. It gives us a 4x savings in storage and communication costs as compared to the high precision representation. The quantized representation size increases ... (A minimal quantization sketch follows this listing.)
0 points | 33 pages | 1.96 MB | 1 year ago
PyTorch Release Notes
... ‣ On H100 NVLink systems using 2 GPUs for training, certain communication patterns can trigger a corner-case bug that manifests either as a hang or as an "illegal instruction" ... now more accurate when using FP16. ‣ Improved distributed performance, specifically, gradient communication can now overlap with gradient computation in backwards(). ‣ Compatibility changes, specifically ...
0 points | 365 pages | 2.94 MB | 1 year ago
共 4 条
- 1













