《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
"…remember, involve me and I learn." – Benjamin Franklin. This chapter is a continuation of Chapter 3, where we introduced learning techniques. To recap, learning techniques can help us meet our model quality goals … training a task-specific model from scratch. One such task is the Microsoft Research Paraphrase Corpus [1], where the model needs to predict whether a pair of sentences is semantically equivalent. The dataset has only … fine-tuned for downstream tasks, for example object detection for tigers, segmentation for pets, etc., where the labeled data might be sparse. [1] Dolan, William B., and Chris Brockett. "Automatically Constructing …"
31 pages | 4.03 MB | 1 year ago
keras tutorial
…subfield of machine learning. Deep learning involves analyzing the input in a layer-by-layer manner, where each layer progressively extracts higher-level information about the input. Let us take a simple … model = Sequential(); model.add(Dense(512, activation='relu', input_shape=(784,))), where line 1 imports the Sequential model from Keras models and line 2 imports the Dense layer and Activation … model.add(Dropout(0.2)); model.add(Dense(num_classes, activation='softmax')) …
98 pages | 1.57 MB | 1 year ago
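A runnable reconstruction of the Sequential-model fragments quoted in the tutorial snippet above; the hidden-layer sizes follow the snippet, while num_classes = 10 and the 784-dimensional (e.g. flattened MNIST) input are assumptions.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation  # Activation imported as in the tutorial

num_classes = 10  # assumption: 10 output classes (e.g. MNIST digits)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))  # 784-dim flattened input
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))  # per-class probabilities

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```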
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
The subsequent chapters will delve deeper into techniques, infrastructure, and other helpful topics where you can get your hands dirty with practical projects. With that being said, let's start off on our … domains where there might not be a single algorithm that works perfectly, and there is a large amount of unseen data that the algorithm needs to process. Unlike traditional algorithm problems where we expect … guidelines. The ImageNet dataset was a big boon in this aspect. It has more than 1 million labeled images, where each image belongs to 1 out of 1000 possible classes. This helped with creating a testbed for researchers …
21 pages | 3.17 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
…dangerous features for six animals [2], and we are calling the tuple of these two features an embedding, where the two features are its dimensions. We will shortly explain how we can use these embeddings. … might want to stay away from it too. Now that we have a two-dimensional embedding for each animal, where each feature represents one dimension, we can represent the animals on a 2-D plot. The feature … cute animals into just two dimensions, and established a relationship between them purely using numbers, where their relative closeness in the euclidean space on the plot denotes their similarity. We can verify …
53 pages | 3.92 MB | 1 year ago
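A small sketch of the similarity idea described in the entry above, using made-up two-dimensional (size, dangerousness) embeddings for a few animals; the chapter's actual feature values differ.

```python
import numpy as np

# Hypothetical 2-D embeddings: (size, dangerousness), both on a 0-1 scale.
embeddings = {
    "cat":   np.array([0.15, 0.10]),
    "dog":   np.array([0.30, 0.20]),
    "tiger": np.array([0.85, 0.95]),
}

def distance(a, b):
    """Euclidean distance between two embeddings; smaller means more similar."""
    return float(np.linalg.norm(embeddings[a] - embeddings[b]))

print(distance("cat", "dog"))    # small: cat and dog are close on the 2-D plot
print(distance("cat", "tiger"))  # large: tiger is far from cat
```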
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
…random variable Y. Joint probability distribution: P(a ≤ X ≤ b, Y = y) = P(a ≤ X ≤ b | Y = y) P(Y = y), where P(a ≤ X ≤ b | Y = y) = ∫_a^b f_{X|Y=y}(x) dx and P(Y = y) = p_Y(y). … log ∏_{i=1}^m p_{X|Y}(x^(i) | y^(i)) p_Y(y^(i)) = Σ_{i=1}^m [ log p_{X|Y}(x^(i) | y^(i)) + log p_Y(y^(i)) ], where θ = {p_{X|Y}(x | y), p_Y(y)}_{x,y}. … Gaussian Distribution (Normal Distribution): p(x; µ, σ) = (2πσ²)^(−1/2) exp(−(x − µ)²/(2σ²)), where µ is the mean and σ² is the variance. Gaussian distributions are important in statistics and are often …
122 pages | 1.35 MB | 1 year ago
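A quick numeric check of the Gaussian density quoted above, written as a plain Python sketch; the values of µ and σ in the example call are arbitrary.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """p(x; mu, sigma) = (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))  # ~0.3989, the peak of the standard normal
```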
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes …
Bayes' theorem is stated mathematically as the following equation: P(A | B) = P(B | A) P(A) / P(B)  (1), where P(A | B) is the conditional probability of event A given event B happens, and P(B | A) is the conditional … = [x_1, · · · , x_n]^T, our goal is to calculate P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)  (2), where y ∈ {0, 1}. In particular, P(Y = y | X = x) is the probability that the image is labeled by y given … Σ_{i=1}^m log [ p_{X|Y}(x^(i) | y^(i); µ_0, µ_1, Σ) p_Y(y^(i); ψ) ] = Σ_{i=1}^m log p_{X|Y}(x^(i) | y^(i); µ_0, µ_1, Σ) + Σ_{i=1}^m log p_Y(y^(i); ψ)  (8), where ψ, µ_0, µ_1, and Σ are parameters. Substituting Eq. (5)∼(7) into Eq. (8) gives us a full expression …
19 pages | 238.80 KB | 1 year ago
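A tiny numeric illustration of Bayes' theorem as stated in Eq. (1) above; the prior and likelihood values are invented for the example.

```python
# Posterior via Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B),
# with P(B) expanded by the law of total probability.
p_A = 0.5             # assumed prior P(A)
p_B_given_A = 0.8     # assumed likelihood P(B | A)
p_B_given_notA = 0.2  # assumed likelihood P(B | not A)

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.8
```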
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
…saliency scores. Then, it proceeds to fine-tune the network. The outcome of this algorithm is a network where weights have been pruned. This style of pruning is called iterative pruning because we prune the … The Optimal Brain Damage (OBD) paper approximates the saliency score using a second derivative of the loss with respect to the weight, where L is the loss function and w is the candidate parameter for removal. Why do we want to compute the second derivative … arbitrary weights, we can ignore the first row in the weight matrix. If the input was of shape [n, 6], where n is the batch size, and the weight matrix was of shape [6, 6], we can now treat this problem to be …
34 pages | 3.18 MB | 1 year ago
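A minimal pruning sketch related to the entry above. It uses weight magnitude as a simple stand-in for the saliency score (OBD instead uses a second-derivative term) and assumes a toy [6, 6] weight matrix like the snippet's example.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 6))  # toy dense-layer weight matrix of shape [6, 6]

# Saliency proxy: absolute magnitude of each weight.
saliency = np.abs(weights)

# Zero out the 50% of weights with the lowest saliency; in iterative pruning the
# network would then be fine-tuned before the next pruning round.
threshold = np.quantile(saliency, 0.5)
pruned_weights = np.where(saliency < threshold, 0.0, weights)
print(f"sparsity: {np.mean(pruned_weights == 0.0):.2f}")
```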
Lecture Notes on Support Vector Machine
1 Hyperplane and Margin. In an n-dimensional space, a hyperplane is defined by ω^T x + b = 0  (1), where ω ∈ R^n is the outward-pointing normal vector, and b is the bias term. The n-dimensional space is separated … problem: min_ω f(ω)  (9), s.t. g_i(ω) ≤ 0, i = 1, · · · , k  (10), h_j(ω) = 0, j = 1, · · · , l  (11), where ω ∈ D is the variable with D = ⋂_{i=1}^k dom g_i ∩ ⋂_{j=1}^l dom h_j representing the feasible domain defined … iii) G can be −∞ for some α and β. Theorem 1 (Lower Bounds Property): If α ⪰ 0, then G(α, β) ≤ p*, where p* is the optimal value of the (original) primal problem defined by (9)∼(11). Proof. If ω̃ is feasible …
18 pages | 509.37 KB | 1 year ago
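A short sketch of the hyperplane definition in Eq. (1) above, using an arbitrary ω and b to check which side of the hyperplane a point lies on and its signed distance.

```python
import numpy as np

omega = np.array([2.0, -1.0])  # assumed outward-pointing normal vector
b = 0.5                        # assumed bias term

def signed_distance(x):
    """Signed distance from x to the hyperplane omega^T x + b = 0."""
    return (omega @ x + b) / np.linalg.norm(omega)

x = np.array([1.0, 1.0])
print(signed_distance(x))      # positive: x lies on the side omega points toward
print(np.sign(omega @ x + b))  # the half-space (label) assigned to x
```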
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
…Search. A simple algorithm for automating HPO is Grid Search (also referred to as Parameter Sweep), where the trial set consists of all the combinations of valid hyperparameter values. Each trial is configured … a good candidate for parallel execution. For example, the trial set for two hyperparameters … Figure 7-2 (a) shows results of grid search trials with two hyperparameters. The blue … It also has a couple of additional drawbacks. First, it suffers from the curse of dimensionality, where the total number of trials grows quickly for each additional hyperparameter value or a new hyperparameter …
33 pages | 2.48 MB | 1 year ago
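A minimal grid-search sketch matching the description above: the trial set is the Cartesian product of the candidate values for each hyperparameter. The value lists and the train_and_evaluate stub are placeholders, not the book's setup.

```python
import itertools

# Candidate values for two hyperparameters (placeholder values).
grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "batch_size": [32, 64],
}

def train_and_evaluate(params):
    """Stub: train a model with `params` and return a validation score."""
    return -abs(params["learning_rate"] - 1e-2)  # pretend 1e-2 is the best learning rate

# Every combination of values is one trial; trials are independent, so they are
# easy to run in parallel.
trials = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best = max(trials, key=train_and_evaluate)
print(f"{len(trials)} trials, best: {best}")
```

Adding a third hyperparameter with, say, four candidate values multiplies the number of trials by four, which is the curse-of-dimensionality drawback the snippet mentions.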
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
…to the digital domain. A popular example of a lossless data compression algorithm is Huffman Coding, where we assign unique strings of bits (codes) to the symbols based on their frequency in the data. More … schemes. The lossy compression algorithms are used in situations (people who like diced apples) where we don't expect to recover the exact representation of the original data. It is okay to recover an … information as a trade-off. It is especially applicable for multimedia (audio, video, images) data, where it is likely that either humans who will consume the information will not notice the loss of some …
33 pages | 1.96 MB | 1 year ago
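A compact Huffman-coding sketch for the lossless scheme mentioned above: symbols that occur more often receive shorter bit strings. This is a generic illustration, not the book's implementation.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code where frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in freq}
    # Each heap entry: (subtree frequency, tie-breaker, {symbol: code within subtree}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, i, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]

print(huffman_codes("abracadabra"))  # 'a' (most frequent) gets the shortest code
```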
共 29 条
- 1
- 2
- 3













