《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
The sparsify_smallest() function sets the absolute smallest weights in the input weight matrix to zero. The number of weights to zero out is computed from the sparsity_rate parameter, which denotes the fraction of weights to sparsify. …
    w_1d_sorted_indices = np.argsort(np.abs(w_1d))
    # Compute the number of elements to zero.
    num_elements_to_zero = int(w_1d.shape[0] * sparsity_rate)
    # Set the respective …
… We also define a sparsity_rate variable initialized with the value 0.4 to sparsify 40% of the total number of weights. Finally, we compute the original weight matrix size, the compressed weight matrix size, and …
0 码力 | 34 pages | 3.18 MB | 1 year ago
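The excerpt above shows only part of the pruning routine. As a hedged illustration, here is a minimal, self-contained sketch of what a magnitude-based sparsify_smallest() could look like; the function name, the meaning of sparsity_rate (the fraction of weights zeroed), and the NumPy calls follow the excerpt, but the exact body is an assumption rather than the book's code:

    import numpy as np

    def sparsify_smallest(w, sparsity_rate=0.4):
        # Flatten the weight matrix so every weight can be ranked by magnitude.
        w_1d = w.flatten()
        # Indices of the weights sorted by absolute magnitude, smallest first.
        w_1d_sorted_indices = np.argsort(np.abs(w_1d))
        # Compute the number of elements to zero.
        num_elements_to_zero = int(w_1d.shape[0] * sparsity_rate)
        # Set the smallest-magnitude weights to zero and restore the original shape.
        w_1d[w_1d_sorted_indices[:num_elements_to_zero]] = 0.0
        return w_1d.reshape(w.shape)

With sparsity_rate=0.4, 40% of the weights with the smallest absolute values become zeros, matching the 40% figure mentioned in the excerpt.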
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
Optimization improves two aspects of the training process: performance and convergence. Hyperparameters like the number of filters in a convolution network or … Note that this search space is just choosing if we are … couple of additional drawbacks. First, it suffers from the curse of dimensionality, where the total number of trials grows quickly for each additional hyperparameter value or new hyperparameter. Second, it does not differentiate between unimportant and important hyperparameters: important hyperparameters have a larger number of subspaces or subranges that need to be searched for an optimal value than unimportant parameters.
0 码力 | 33 pages | 2.48 MB | 1 year ago
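A small illustration (not from the chapter) of the curse-of-dimensionality point: with an exhaustive grid, the trial count is the product of the number of candidate values per hyperparameter, so it multiplies with every value or hyperparameter you add. The search space below is hypothetical:

    import itertools

    # Hypothetical search space: a handful of candidate values per hyperparameter.
    search_space = {
        "learning_rate": [1e-4, 1e-3, 1e-2],
        "num_filters": [32, 64, 128],
        "dropout": [0.1, 0.3, 0.5],
    }

    # Grid search tries every combination: 3 * 3 * 3 = 27 trials.
    trials = list(itertools.product(*search_space.values()))
    print(len(trials))  # 27

    # Adding one more hyperparameter with 3 candidate values triples this to 81.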
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
… numerical format, having an algorithmic way to meaningfully represent these inputs using a small number of numerical features will help us solve tasks related to these inputs. Ideally, this representation … It is useful because it is often computationally infeasible to work with data that has a large number of features. However, not all features might be equally important, thus selecting the most informative … / Not Suitable), since there were very few examples. What if you have multiple classes / a large number of examples / more than two features? In those cases, we could use classical machine learning algorithms …
0 码力 | 53 pages | 3.92 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
… like BERT. Self-supervised learning helps models quickly achieve impressive quality with a small number of labels. As we described in chapter 3's 'Learning Techniques and Efficiency' section, labeling … relies heavily on labeled data, and hence achieving high performance on a new task requires a large number of labels. 2. Compute Efficiency: Training for new tasks requires new models to be trained from … model learns these representations, it can then be fine-tuned with a small number of labeled examples over a reasonable number of training epochs to do well on the given task. We will go into details of …
0 码力 | 31 pages | 4.03 MB | 1 year ago
PyTorch Release Notes
… models that currently rely on it, but torch.cuda.amp is the future-proof alternative and offers a number of advantages over APEX AMP. ‣ Guidance and examples demonstrating torch.cuda.amp can be found here. … paper. It is based on the regular ResNet model, which substitutes 3x3 convolutions in the bottleneck block for 3x3 grouped convolutions. This model script is available on GitHub. ‣ SE-ResNext model: This …
0 码力 | 365 pages | 2.94 MB | 1 year ago
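As a quick, hedged sketch (not code from the release notes), a typical torch.cuda.amp training step wraps the forward pass in autocast and scales the loss with GradScaler. Apart from the torch.cuda.amp API itself, everything below (the toy model, data, and hyperparameters) is made up for illustration:

    import torch
    from torch import nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 4).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # autocast/GradScaler are simply disabled when no GPU is present.
    use_amp = device == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    inputs = torch.randn(8, 16, device=device)
    targets = torch.randint(0, 4, (8,), device=device)

    optimizer.zero_grad()
    # Run the forward pass and loss computation in mixed precision.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = nn.functional.cross_entropy(model(inputs), targets)
    # Scale the loss so fp16 gradients do not underflow, then step and update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()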
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
While CPUs progressively became faster, thanks to Moore's law, they were not optimized for the heavy number-crunching at the heart of deep learning. AlexNet was one of the earliest models to rely on Graphics Processing Units (GPUs) … result, changing the input variable leads to a very tiny gradient (if any), and when there are a large number of layers the gradient essentially vanishes. Availability of labelled data: Even if one has enough … model scaled well with the number of labeled examples, since the network had a large number of parameters. Thus, to extract the most out of the setup, the model needed a large number of labeled examples. Collecting …
0 码力 | 21 pages | 3.17 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
… the model footprint (size, latency, memory, etc.). We can reduce the model footprint by reducing the number of trainable parameters. However, this approach has two drawbacks. First, it is hard to determine … the most frequent symbols will take the least number of bits to represent. In aggregate, this would be better than encoding each symbol with the same number of bits. The lookup table (figure 2-1, middle) … correlated with the number of layers and the number of parameters (assuming that the models are well-tuned). If we naively reduce the footprint, we can reduce the number of layers and the number of parameters, …
0 码力 | 33 pages | 1.96 MB | 1 year ago
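A tiny worked example (not from the chapter) of why giving the most frequent symbols the fewest bits helps in aggregate: a fixed-length code spends 2 bits on every one of four symbols, while a prefix-free variable-length code spends fewer bits on the common ones. The symbol frequencies below are made up:

    # Hypothetical symbol frequencies and a prefix-free variable-length code.
    frequencies = {"a": 0.6, "b": 0.2, "c": 0.15, "d": 0.05}
    variable_code = {"a": "0", "b": "10", "c": "110", "d": "111"}

    fixed_bits_per_symbol = 2  # four symbols always need 2 bits with a fixed-length code
    avg_fixed = sum(p * fixed_bits_per_symbol for p in frequencies.values())
    avg_variable = sum(p * len(variable_code[s]) for s, p in frequencies.items())

    print(avg_fixed, avg_variable)  # 2.0 vs. 1.6 bits per symbol on average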
keras tutorial
… neuron layer. It has three important layers: Convolution layer: it is the primary building block and performs computational tasks based on the convolution function. Pooling layer: it is arranged … model. … As learned earlier, Keras layers are the primary building blocks of Keras models. Each layer receives input information, does some computation, and finally outputs the … The layer requires the following minimum details to create a complete layer: shape of the input data, number of neurons / units in the layer, initializers, regularizers, constraints, and activations.
0 码力 | 98 pages | 1.57 MB | 1 year ago
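A hedged illustration (not code from the tutorial) of how those minimum details map onto a Keras layer; the layer sizes, initializer, regularizer, constraint, and activation below are arbitrary choices:

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(16,)),                              # shape of the input data
        keras.layers.Dense(
            units=32,                                          # number of neurons / units
            kernel_initializer="glorot_uniform",               # initializer
            kernel_regularizer=keras.regularizers.l2(1e-4),    # regularizer
            kernel_constraint=keras.constraints.MaxNorm(3.0),  # constraint
            activation="relu",                                 # activation
        ),
    ])
    model.summary()  # reports the resulting layer shapes and parameter counts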
【PyTorch深度学习-龙龙老师】-测试版202112 (PyTorch Deep Learning by Longlong, beta edition 2021-12)
    cpu_time = timeit.timeit(cpu_run, number=3)
    gpu_time = timeit.timeit(gpu_run, number=3)
    print('warmup:', cpu_time, gpu_time)
    # Formal measurement: run 10 times and take the average time.
    cpu_time = timeit.timeit(cpu_run, number=10)
    gpu_time = timeit.timeit(gpu_run, number=10)
    print('run time:', cpu_time, gpu_time)
Plotting the CPU and GPU computation times as curves for different matrix sizes, as shown in Figure 1.21, we can see that when the two matrices are small, the CPU and GPU times are very close, which does not show … maps each class name one-to-one to an integer numbered from 0. For example, for the two sides of a coin, 0 can represent the tail side and 1 the head side; of course this can also be reversed, with 1 representing the tail side. This encoding scheme is called number encoding (Number Encoding). For the handwritten digit image recognition problem, the encoding is even more intuitive: the digits 0-9 directly represent the images whose class names are 0-9.
Figure 3.1: Samples of handwritten digit images.
If we want the model to be able to … on new samples …
0 码力 | 439 pages | 29.91 MB | 1 year ago
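A minimal, hedged reconstruction of the benchmark the excerpt is timing; the workload (a square matrix multiplication), the matrix size, and all names other than cpu_run and gpu_run are assumptions, not the book's exact code:

    import timeit
    import torch

    n = 1000
    cpu_a, cpu_b = torch.randn(n, n), torch.randn(n, n)

    def cpu_run():
        return torch.matmul(cpu_a, cpu_b)

    if torch.cuda.is_available():
        gpu_a, gpu_b = cpu_a.cuda(), cpu_b.cuda()

        def gpu_run():
            out = torch.matmul(gpu_a, gpu_b)
            torch.cuda.synchronize()  # wait for the kernel so the timing is honest
            return out

        # Warm-up runs (absorb one-time CUDA initialization costs).
        cpu_time = timeit.timeit(cpu_run, number=3)
        gpu_time = timeit.timeit(gpu_run, number=3)
        print('warmup:', cpu_time, gpu_time)

        # Timed runs.
        cpu_time = timeit.timeit(cpu_run, number=10)
        gpu_time = timeit.timeit(gpu_run, number=10)
        print('run time:', cpu_time, gpu_time)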
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
… efficiency and label efficiency. Sample Efficiency: sample efficiency is concerned with the total number of training samples, including repeats, seen by the model to reach the desired performance threshold. … cup or a saucer. The number of times you have to point at and call out the kind of the objects such that the child identifies them correctly with the desired accuracy is the total number of samples. If you have a training process that enables the child to reach the same accuracy by seeing a smaller number of samples, that process would be sample efficient. Similarly, a sample-efficient model training process …
0 码力 | 56 pages | 18.93 MB | 1 year ago
36 results in total.













