《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
... briefly introduced learning techniques such as regularization, dropout, data augmentation, and distillation to improve quality. These techniques can boost metrics like accuracy, precision, and recall, which are often our primary quality concerns. We have chosen two of them, namely data augmentation and distillation, to discuss in this chapter. This is because, firstly, regularization and dropout are fairly straightforward to enable in any modern deep learning framework. Secondly, data augmentation and distillation can bring significant efficiency gains during the training phase, which is the focus of this chapter.
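To illustrate the first point, here is a minimal sketch (my own illustration, not code from the book) of enabling L2 weight regularization and dropout in Keras; the layer sizes, dropout rate, and penalty strength are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A small classifier in which L2 regularization and dropout are enabled
# via a keyword argument and a drop-in layer, respectively.
# All sizes and rates below are illustrative, not values from the book.
model = tf.keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(256, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on this layer's weights
    layers.Dropout(0.3),                                      # randomly zero 30% of activations during training
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

Because both are one-line changes, they are usually the first techniques to try before the heavier-weight methods such as augmentation and distillation.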
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
... To recap, learning techniques can help us meet our model quality goals. Techniques like distillation and data augmentation improve the model quality without increasing the footprint of the model. ... training compute budget, so this approach is a non-starter. While techniques like data augmentation and distillation, as introduced in Chapter 3, do help us achieve better quality with fewer labels and fewer ... techniques like distillation might not be as helpful in certain settings. Subclass distillation, covered in the next subsection, can help us in some of these cases. Let's find out how. Subclass Distillation: It can also ...
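To make the "better quality with fewer labels" point concrete, below is a minimal sketch of label-preserving image augmentation in Keras (my own illustration, not the book's code); the specific transforms and their strengths are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Label-preserving transforms: each epoch the model sees slightly different
# versions of the same labeled images, which acts like additional training data.
# The chosen transforms and their parameters are illustrative assumptions.
augment = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),   # up to ±10% of a full turn (~36 degrees)
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

def preprocess(images, labels, training=True):
    # Apply augmentation only on the training path, never at evaluation time.
    if training:
        images = augment(images, training=True)
    return images, labels
```

Applied inside a tf.data input pipeline, this multiplies the effective diversity of a small labeled set without collecting any new labels.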
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
... efficient model by trimming the number of parameters if needed. An example of a learning technique is distillation (see Figure 1-10), which helps a smaller, deployable model (the student) learn from a larger pre-trained teacher model ... of probabilities for each of the possible classes according to the teacher model. Figure 1-10: Distillation of a smaller student model from a larger pre-trained teacher model. Both the teacher's weights ... way. In the original paper that proposed distillation, Hinton et al. replicated the performance of an ensemble of 10 models with a single model trained using distillation. For vision datasets like CIFAR-10, an accuracy ...
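As a concrete illustration of the teacher/student setup described above, here is a minimal sketch of the distillation loss from Hinton et al. (a generic formulation written for this note, not the book's code); the temperature and the weighting between soft and hard losses are illustrative assumptions.

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=4.0, alpha=0.1):
    """Blend the soft-label (teacher) loss with the hard-label (ground truth) loss."""
    # Temperature-softened probabilities expose the teacher's knowledge about
    # how similar the classes are to one another.
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    log_soft_student = tf.nn.log_softmax(student_logits / temperature)

    # Cross-entropy between the soft distributions (equivalent to KL divergence
    # up to a constant), scaled by T^2 as suggested by Hinton et al. so its
    # gradients stay comparable to the hard-label term.
    soft_loss = -tf.reduce_mean(
        tf.reduce_sum(soft_teacher * log_soft_student, axis=-1)) * temperature ** 2

    # Ordinary cross-entropy against the true labels.
    hard_loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            labels, student_logits, from_logits=True))

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

During training only the student's weights are updated; the teacher is kept fixed and only supplies the soft targets.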
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
... model using data augmentation to achieve higher performance, and subsequently apply compression or distillation to further reduce its footprint. With this chapter, we hope to have set the stage for your exploration ... your deep learning projects. They can often be combined with other approaches that we have already learned, like quantization, distillation, and data augmentation. In the next chapter we will explore some more advanced ...
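As one example of the "further reduce its footprint" step, here is a minimal sketch of post-training quantization with the TensorFlow Lite converter (my illustration, not the book's code); `trained_model` is assumed to be a Keras model that has already been trained, possibly with augmentation or distillation.

```python
import tensorflow as tf

def quantize_to_tflite(trained_model, output_path='model_quantized.tflite'):
    # Convert the trained Keras model to TensorFlow Lite with the default
    # optimizations, which apply post-training quantization to the weights
    # (typically shrinking a float32 model to roughly a quarter of its size).
    converter = tf.lite.TFLiteConverter.from_keras_model(trained_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open(output_path, 'wb') as f:
        f.write(tflite_model)
    return output_path
```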
《动手学深度学习 v2.0》 (Dive into Deep Learning v2.0)
... Socher, R. (2018). A closer look at deep learning heuristics: learning rate restarts, warmup and distillation. arXiv preprint arXiv:1810.13243. [Graves, 2013] Graves, A. (2013). Generating sequences with ...













