Keras tutorial
...libraries such as Theano, TensorFlow, Caffe, MXNet, etc., Keras is one of the most powerful and easy-to-use Python libraries, built on top of popular deep learning libraries like TensorFlow, Theano ... avoid breaking the packages installed in the other environments. So it is always recommended to use a virtual environment while developing Python applications. Linux/macOS: Linux or macOS users ... installation location. Windows: Windows users can use the command "py -m venv keras". Step 2: Activate the environment. This step will configure ... (98 pages, 1.57 MB, 1 year ago)
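The Windows command in the snippet, "py -m venv keras", creates a virtual environment named keras. As a minimal sketch, the same step can be done from Python with the standard-library venv module. The helper name create_env is hypothetical, and with_pip=False is an assumption made only to keep the sketch fast and offline.

```python
import pathlib
import tempfile
import venv


def create_env(parent: str, name: str = "keras") -> pathlib.Path:
    """Create a virtual environment, similar to running `py -m venv keras` inside `parent`."""
    env_dir = pathlib.Path(parent) / name
    # with_pip=False skips bootstrapping pip; pass True for a usable install environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(tmp)
        # Every venv gets a pyvenv.cfg file at its root.
        print((env / "pyvenv.cfg").exists())  # True
```

After creation, the environment is activated with keras\Scripts\activate on Windows or source keras/bin/activate on Linux/macOS, after which packages such as Keras install into it rather than into the system Python.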
Efficient Deep Learning Book [EDL] Chapter 4 - Efficient Architectures
...attention mechanism and the hashing trick. In this chapter, we will deep-dive into their architectures and use them to transform large and complex models into smaller and more efficient models capable of running on ... selecting the most informative features is crucial for making the training step efficient. In the case of visual, textual, and other multimodal data, we often construct the features by hand (at least in ... features an embedding, where the two features are its dimensions. We will shortly explain how we can use these embeddings. Table: Animal | Embedding (cute, dangerous); dog (0.85, 0.05); cat (0.95, 0.05); snake (0... (53 pages, 3.92 MB, 1 year ago)
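The snippet treats each animal as a point in a two-dimensional (cute, dangerous) feature space. A small sketch of why that is useful: similar animals end up close together, which can be measured with Euclidean distance or cosine similarity. Only the dog and cat rows are used because the snake row is truncated in the source; the choice of these two similarity measures is an illustration, not necessarily the book's.

```python
import math

# (cute, dangerous) values copied from the excerpt; the snake row is cut off, so it is omitted.
ANIMAL_EMBEDDINGS = {
    "dog": (0.85, 0.05),
    "cat": (0.95, 0.05),
}


def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


dog, cat = ANIMAL_EMBEDDINGS["dog"], ANIMAL_EMBEDDINGS["cat"]
print(round(euclidean(dog, cat), 3))  # 0.1
```

Dog and cat differ only slightly along the "cute" axis, so their distance is small and their cosine similarity is close to 1.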
Efficient Deep Learning Book [EDL] Chapter 2 - Compression Techniques
...tirelessly towards storing and transmitting information in as few bits as possible. Depending on the use case, we might be interested in compressing in a lossless or lossy manner. We can fit 10 apples in a ... perceptible loss in quality. However, further compression might lead to degradation in quality. In our case, we are concerned with compressing deep learning models. What do we really mean by compressing ... beyond a limit hurts quality metrics. Conversely, a higher quality implies a worse footprint. In the case of deep learning models, the model quality is often correlated with the number of layers, and the ... (33 pages, 1.96 MB, 1 year ago)
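To make the lossless versus lossy distinction concrete, here is a minimal sketch. zlib (DEFLATE) stands in for a lossless codec: decompression restores the input exactly. Rounding values to a fixed grid stands in for a lossy scheme: it discards information and introduces small errors. Both choices are illustrative assumptions, not the specific methods the chapter uses.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Lossless: decompress(compress(x)) returns exactly x."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(values, step=0.1):
    """Lossy: snap each value to the nearest multiple of `step`.

    Fewer distinct values are needed to represent the data, but the originals
    cannot be recovered; each value moves by at most step / 2.
    """
    return [round(v / step) * step for v in values]


data = b"weights " * 100
print(len(data), len(zlib.compress(data)))  # repetitive input shrinks a lot
print(lossy_roundtrip([0.123, 0.456]))      # approximately [0.1, 0.5]
```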
Efficient Deep Learning Book [EDL] Chapter 5 - Advanced Compression Techniques
...quantized weight value. That is an implicit form of weight sharing. However, quantization falls short when the data that we are quantizing is not uniformly distributed, i.e., the data is more likely to take ... Exercise: Sparsity improves compression. Let's import the required libraries to start with. We will use the gzip Python module to demonstrate compression. The code for this exercise is available as a ... However, we could use variable pruning rates across the pruning rounds. The motivation behind using variable sparsity is that a pre-trained model's weights will get disrupted if we use a large pruning rate ... (34 pages, 3.18 MB, 1 year ago)
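The exercise title claims that sparsity improves compression, using the gzip module. A minimal stdlib-only sketch of that idea, not the book's exercise code: it packs random "weights" as 32-bit floats, zeroes out the 90% with the smallest magnitude, and compares gzip output sizes. Magnitude pruning, the Gaussian weights, and the 90% sparsity level are all assumptions chosen for illustration.

```python
import gzip
import random
import struct


def compressed_size(values):
    """Pack floats as little-endian float32 and return the gzip-compressed byte count."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(gzip.compress(raw))


def prune(values, sparsity):
    """Magnitude pruning (assumed here): set the `sparsity` fraction with smallest |w| to 0."""
    k = int(len(values) * sparsity)
    smallest = sorted(range(len(values)), key=lambda i: abs(values[i]))[:k]
    pruned = list(values)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
dense = compressed_size(weights)
sparse = compressed_size(prune(weights, 0.9))
print(dense, sparse)  # the pruned array compresses far better
```

Random float bits are nearly incompressible, while long stretches of zero bytes are exactly what gzip's DEFLATE handles well, which is why the sparse version shrinks so much more.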
Efficient Deep Learning Book [EDL] Chapter 3 - Learning Techniques
...available to us, based on what we want: 1. We only care about reaching the accuracy goal of 80%: in this case, it is perfectly fine to take the lower labeling and training costs and call it a day! 2. We want ... improve your model footprint. However, given that they lead to an improvement in quality metrics, we can use them to boost the performance of models that might not have been suitable earlier because of a lower ... a probability of 30% and a 'hamster' with a probability of 70%. Sample generation techniques use models to generate samples for labels. Consider a training sample for English-to-Spanish translation: ... (56 pages, 18.93 MB, 1 year ago)
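The 30%/70% label in the snippet looks like a soft label produced by blending two training samples. Assuming it comes from a mixup-style technique (the excerpt is truncated, so this is an assumption), a minimal sketch: inputs and one-hot labels are combined with the same weight lam, and the model is trained with cross-entropy against the resulting soft target. The three-value "images" and the unnamed first class are hypothetical.

```python
import math


def mix(x1, y1, x2, y2, lam):
    """Mixup-style blend: x = lam*x1 + (1-lam)*x2, and the same for one-hot labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def soft_cross_entropy(probs, target):
    """Cross-entropy against a soft target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical 2-class setup: index 0 is the other class (name cut off in the excerpt),
# index 1 is 'hamster'. The three-pixel inputs are made-up values.
x, y = mix([0.2, 0.4, 0.6], [1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0], lam=0.3)
print(y)  # [0.3, 0.7]: 30% the first class, 70% 'hamster'
```

The loss is lowest when the model predicts exactly the mixed proportions, so the model is rewarded for matching the soft label rather than committing to a single class.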
Efficient Deep Learning Book [EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
...change the output), then we can simply add a few additional layers (known as the prediction head), use the appropriate loss function, and train the model with the labeled data for the task at hand. We can ... general model, our hope is that we can use this limited number of labeled examples for fine-tuning, since the model already knows the general concepts about language, and use the same model across many tasks ... itself, there is no need for any sort of human intervention for labeling. Therefore, we can simply use e-books, Wikipedia, and other sources for NLU-related models, and web images and videos for computer vision ... (31 pages, 4.03 MB, 1 year ago)
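The snippet's point is that the labels come from the data itself, so raw text needs no human labeling. One common way to do this for language is masked-token prediction. The sketch below is a simplified assumption (whitespace tokenization, a hypothetical [MASK] token, no BERT-style 80/10/10 replacement rule): it hides random tokens and uses the hidden tokens as labels.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def make_masked_examples(text, mask_prob=0.15, seed=0):
    """Build (inputs, labels) from unlabeled text with no human annotation.

    Masked positions get MASK in the input and the original token as the label;
    unmasked positions keep their token and get no label (None).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in text.split():
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


inp, lab = make_masked_examples("the model already knows the general concepts", mask_prob=0.3)
print(inp)
print(lab)
```

Any large text source, such as the e-books or Wikipedia mentioned above, can be run through a function like this to produce unlimited pretraining examples.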
PyTorch Release Notes
...support. Functions are executed immediately instead of being enqueued in a static graph, improving ease of use and providing a sophisticated debugging experience. In the container, see /workspace/README.md for ... container from the NGC container registry: ‣ Install Docker. ‣ For NVIDIA DGX™ users, see the Preparing to use NVIDIA Containers Getting Started Guide. ‣ For non-DGX users, see NVIDIA® GPU Cloud™ (NGC) container ... run -it --rm -v local_dir:container_dir nvcr.io/nvidia/pytorch:-py3 Note: If you use multiprocessing for multi-threaded data loaders, the default shared memory segment size with which ... (365 pages, 2.94 MB, 1 year ago)
Efficient Deep Learning Book [EDL] Chapter 1 - Introduction
...the one which does better on training or inference efficiency metrics, or both (depending on the use case). For example, if you are deploying a model on devices where inference is constrained (such as ... together make up the Pareto frontier. However, certain models might offer better trade-offs than others. In case we find models where we cannot get a better quality while holding the latency constant, or we cannot ... to optimize the models for the device they will run on. Privacy & Data Sensitivity: Being able to use as little data as possible for training is critical when the user data might be sensitive to handling / subject ... (21 pages, 3.17 MB, 1 year ago)
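The Pareto frontier mentioned in the snippet is the set of models for which you cannot improve quality without paying more latency, or vice versa. A minimal sketch that extracts it from (name, latency, accuracy) tuples. The four models and their numbers are made up purely for illustration; they are not results from the book.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (name, latency_ms, accuracy)


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return the models not dominated by any other model.

    Model j dominates model i if it is at least as fast and at least as accurate,
    and strictly better on at least one of the two.
    """
    frontier = []
    for i, (name, lat, acc) in enumerate(models):
        dominated = any(
            l2 <= lat and a2 >= acc and (l2 < lat or a2 > acc)
            for j, (_, l2, a2) in enumerate(models)
            if j != i
        )
        if not dominated:
            frontier.append((name, lat, acc))
    return sorted(frontier, key=lambda m: m[1])


# Hypothetical numbers for illustration only.
MODELS = [("A", 10.0, 0.70), ("B", 20.0, 0.80), ("C", 25.0, 0.78), ("D", 40.0, 0.85)]
print([m[0] for m in pareto_frontier(MODELS)])  # ['A', 'B', 'D']
```

Model C is excluded because B is both faster and more accurate; every remaining model trades latency for accuracy, which is what places it on the frontier.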
Lecture Notes on Gaussian Discriminant Analysis, Naive ...
...is labeled by y, and $P(Y = y)$ is the probability that a randomly picked image has label y. In our case, we make a decision by calculating $P(Y = 0 \mid X = x) = \frac{P(X = x \mid Y = 0)\,P(Y = 0)}{P(X = x)}$ (3), P(Y = ... only $P(X = x \mid Y = y)$ and $P(Y = y)$. Recall that, in linear regression and logistic regression, we use a hypothesis function $y = h_\theta(x)$ to model the relationship between feature vector x and label y, while ... $\mu_1 = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}$, $\Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T$. Now we can use the above results to calculate the expressions of $p_Y(y)$, $p_{X|Y}(x \mid 0)$, and $p_{X|Y}(x \mid 1)$ according to ... (19 pages, 238.80 KB, 1 year ago)
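The snippet's estimates for $\mu_1$ and the shared covariance $\Sigma$ can be computed directly. A minimal pure-Python sketch, assuming binary labels in {0, 1}. It also returns $\phi$ (the fraction of class-1 examples) and $\mu_0$ using the standard GDA estimates, which are not visible in the truncated excerpt. The function name gda_fit is hypothetical.

```python
from typing import List, Sequence, Tuple


def gda_fit(
    X: Sequence[Sequence[float]], y: Sequence[int]
) -> Tuple[float, List[float], List[float], List[List[float]]]:
    """Maximum-likelihood GDA parameters: phi, mu0, mu1 and a shared covariance Sigma."""
    m, n = len(X), len(X[0])
    n1 = sum(1 for label in y if label == 1)
    n0 = m - n1
    phi = n1 / m  # estimate of P(Y = 1)
    # Class means: average of x^(i) over examples with that label.
    mu0 = [sum(x[j] for x, lab in zip(X, y) if lab == 0) / n0 for j in range(n)]
    mu1 = [sum(x[j] for x, lab in zip(X, y) if lab == 1) / n1 for j in range(n)]
    # Sigma = (1/m) * sum_i (x^(i) - mu_{y^(i)}) (x^(i) - mu_{y^(i)})^T
    sigma = [[0.0] * n for _ in range(n)]
    for x, lab in zip(X, y):
        mu = mu1 if lab == 1 else mu0
        d = [x[j] - mu[j] for j in range(n)]
        for a in range(n):
            for b in range(n):
                sigma[a][b] += d[a] * d[b] / m
    return phi, mu0, mu1, sigma


phi, mu0, mu1, sigma = gda_fit([[1, 2], [3, 4], [5, 6], [7, 8]], [0, 0, 1, 1])
print(phi, mu0, mu1, sigma)  # 0.5 [2.0, 3.0] [6.0, 7.0] [[1.0, 1.0], [1.0, 1.0]]
```

Each example is centered on the mean of its own class before the outer product, which is why the formula uses $\mu_{y^{(i)}}$ rather than a single global mean.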
Lecture Notes on Support Vector Machine
...not change the hyperplane. Hence, we scale $(\omega, b)$ such that $\min_i y^{(i)}(\omega^T x^{(i)} + b) = 1$. In this case, the margin becomes $1/\|\omega\|$ according to Eq. (6). Then, the problem formulation ... can be used to find non-trivial lower bounds. The duality is said to be strong if $d^* = p^*$. In this case, we can optimize the original problem by optimizing its dual problem. 2.2.2 Complementary Slackness ... We can use several off-the-shelf solvers (e.g., quadprog (MATLAB), CVXOPT, CPLEX, IPOPT) to solve such a QP problem. Let $\alpha^*$ be the optimal value of $\alpha$ for the dual SVM problem. We can use Eq. (36) ... (18 pages, 509.37 KB, 1 year ago)
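The scaling step in the snippet can be checked numerically. A minimal sketch, writing w for $\omega$: divide $(w, b)$ by the smallest functional margin $\min_i y^{(i)}(w^T x^{(i)} + b)$. The hyperplane stays the same, the smallest functional margin becomes 1, and the geometric margin equals $1/\|w\|$. The two data points and the starting $(w, b)$ are made-up values for illustration.

```python
import math


def functional_margins(w, b, X, y):
    """y_i * (w^T x_i + b) for every training point."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)]


def canonicalize(w, b, X, y):
    """Rescale (w, b) so that min_i y_i (w^T x_i + b) = 1; the hyperplane is unchanged."""
    gamma = min(functional_margins(w, b, X, y))
    if gamma <= 0:
        raise ValueError("(w, b) must separate the data")
    return [wj / gamma for wj in w], b / gamma


# Hypothetical separable data and a separating (w, b).
X = [(2.0, 2.0), (-1.0, -1.0)]
y = [1, -1]
w0, b0 = [2.0, 2.0], 0.0

w, b = canonicalize(w0, b0, X, y)
margin = 1.0 / math.hypot(*w)
# The geometric margin before rescaling is the same quantity: min functional margin / ||w0||.
before = min(functional_margins(w0, b0, X, y)) / math.hypot(*w0)
print(w, b, margin, before)  # both margins are sqrt(2), about 1.4142
```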