《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review
"…remember, involve me and I learn." – Benjamin Franklin. This chapter is a continuation of Chapter 3, where we introduced learning techniques. To recap, learning techniques can help us meet our model quality goals … training a task-specific model from scratch. One such task is the Microsoft Research Paraphrase Corpus [1], where the model needs to predict whether a pair of sentences is semantically equivalent. The dataset has only … fine-tuned for downstream tasks, for example object detection for tigers, segmentation for pets, etc., where the labeled data might be sparse. [1] Dolan, William B., and Chris Brockett. "Automatically Constructing …"
31 pages | 4.03 MB | 1 year ago
keras tutorial
…subfield of machine learning. Deep learning involves analyzing the input in a layer-by-layer manner, where each layer progressively extracts higher-level information about the input. Let us take a simple … model = Sequential(); model.add(Dense(512, activation='relu', input_shape=(784,))), where line 1 imports the Sequential model from Keras models and line 2 imports the Dense layer and Activation … model.add(Dropout(0.2)); model.add(Dense(num_classes, activation='softmax')) …
98 pages | 1.57 MB | 1 year ago
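A runnable reconstruction of the Sequential-model fragments quoted in the tutorial snippet above; the hidden-layer sizes follow the snippet, while num_classes = 10 and the 784-dimensional (e.g. flattened MNIST) input are assumptions.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation  # Activation imported as in the tutorial

num_classes = 10  # assumption: 10 output classes (e.g. MNIST digits)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))  # 784-dim flattened input
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))  # per-class probabilities

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```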
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
The subsequent chapters will delve deeper into techniques, infrastructure, and other helpful topics where you can get your hands dirty with practical projects. With that being said, let's start off on our … domains where there might not be a single algorithm that works perfectly, and there is a large amount of unseen data that the algorithm needs to process. Unlike traditional algorithm problems where we expect … guidelines. The ImageNet dataset was a big boon in this aspect. It has more than 1 million labeled images, where each image belongs to 1 out of 1000 possible classes. This helped with creating a testbed for researchers …
21 pages | 3.17 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
…dangerous features for six animals [2], and we are calling the tuple of these two features an embedding, where the two features are its dimensions. We will shortly explain how we can use these embeddings. … might want to stay away from it too. Now that we have a two-dimensional embedding for each animal, where each feature represents one dimension, we can represent the animals on a 2-D plot. The feature … cute animals into just two dimensions, and established a relationship between them purely using numbers, where their relative closeness in the euclidean space on the plot denotes their similarity. We can verify …
53 pages | 3.92 MB | 1 year ago
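A small sketch of the similarity idea described in the entry above, using made-up two-dimensional (size, dangerousness) embeddings for a few animals; the chapter's actual feature values differ.

```python
import numpy as np

# Hypothetical 2-D embeddings: (size, dangerousness), both on a 0-1 scale.
embeddings = {
    "cat":   np.array([0.15, 0.10]),
    "dog":   np.array([0.30, 0.20]),
    "tiger": np.array([0.85, 0.95]),
}

def distance(a, b):
    """Euclidean distance between two embeddings; smaller means more similar."""
    return float(np.linalg.norm(embeddings[a] - embeddings[b]))

print(distance("cat", "dog"))    # small: cat and dog are close on the 2-D plot
print(distance("cat", "tiger"))  # large: tiger is far from cat
```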
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
…random variable Y. Joint probability distribution: P(a ≤ X ≤ b, Y = y) = P(a ≤ X ≤ b | Y = y) P(Y = y), where P(a ≤ X ≤ b | Y = y) = ∫_a^b f_{X|Y=y}(x) dx and P(Y = y) = p_Y(y). … log ∏_{i=1}^m p_{X|Y}(x^(i) | y^(i)) p_Y(y^(i)) = Σ_{i=1}^m [ log p_{X|Y}(x^(i) | y^(i)) + log p_Y(y^(i)) ], where θ = {p_{X|Y}(x | y), p_Y(y)}_{x,y}. … Gaussian Distribution (Normal Distribution): p(x; µ, σ) = (2πσ²)^(−1/2) exp(−(x − µ)²/(2σ²)), where µ is the mean and σ² is the variance. Gaussian distributions are important in statistics and are often …
122 pages | 1.35 MB | 1 year ago
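A quick numeric check of the Gaussian density quoted above, written as a plain Python sketch; the values of µ and σ in the example call are arbitrary.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """p(x; mu, sigma) = (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))  # ~0.3989, the peak of the standard normal
```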
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes …
Bayes' theorem is stated mathematically as the following equation: P(A | B) = P(B | A) P(A) / P(B)  (1), where P(A | B) is the conditional probability of event A given event B happens, and P(B | A) is the conditional … = [x_1, · · · , x_n]^T, our goal is to calculate P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)  (2), where y ∈ {0, 1}. In particular, P(Y = y | X = x) is the probability that the image is labeled by y given … Σ_{i=1}^m log [ p_{X|Y}(x^(i) | y^(i); µ_0, µ_1, Σ) p_Y(y^(i); ψ) ] = Σ_{i=1}^m log p_{X|Y}(x^(i) | y^(i); µ_0, µ_1, Σ) + Σ_{i=1}^m log p_Y(y^(i); ψ)  (8), where ψ, µ_0, µ_1, and Σ are parameters. Substituting Eq. (5)∼(7) into Eq. (8) gives us a full expression …
19 pages | 238.80 KB | 1 year ago
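A tiny numeric illustration of Bayes' theorem as stated in Eq. (1) above; the prior and likelihood values are invented for the example.

```python
# Posterior via Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B),
# with P(B) expanded by the law of total probability.
p_A = 0.5             # assumed prior P(A)
p_B_given_A = 0.8     # assumed likelihood P(B | A)
p_B_given_notA = 0.2  # assumed likelihood P(B | not A)

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.8
```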
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
…saliency scores. Then, it proceeds to fine-tune the network. The outcome of this algorithm is a network where weights have been pruned. This style of pruning is called iterative pruning because we prune the … The Optimal Brain Damage (OBD) paper approximates the saliency score using a second derivative of the loss with respect to the weight, where L is the loss function and w is the candidate parameter for removal. Why do we want to compute the second derivative … arbitrary weights, we can ignore the first row in the weight matrix. If the input was of shape [n, 6], where n is the batch size, and the weight matrix was of shape [6, 6], we can now treat this problem to be …
34 pages | 3.18 MB | 1 year ago
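A minimal pruning sketch related to the entry above. It uses weight magnitude as a simple stand-in for the saliency score (OBD instead uses a second-derivative term) and assumes a toy [6, 6] weight matrix like the snippet's example.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 6))  # toy dense-layer weight matrix of shape [6, 6]

# Saliency proxy: absolute magnitude of each weight.
saliency = np.abs(weights)

# Zero out the 50% of weights with the lowest saliency; in iterative pruning the
# network would then be fine-tuned before the next pruning round.
threshold = np.quantile(saliency, 0.5)
pruned_weights = np.where(saliency < threshold, 0.0, weights)
print(f"sparsity: {np.mean(pruned_weights == 0.0):.2f}")
```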
Lecture Notes on Support Vector Machine
1 Hyperplane and Margin. In an n-dimensional space, a hyperplane is defined by ω^T x + b = 0  (1), where ω ∈ R^n is the outward-pointing normal vector, and b is the bias term. The n-dimensional space is separated … problem: min_ω f(ω)  (9), s.t. g_i(ω) ≤ 0, i = 1, · · · , k  (10), h_j(ω) = 0, j = 1, · · · , l  (11), where ω ∈ D is the variable with D = ⋂_{i=1}^k dom g_i ∩ ⋂_{j=1}^l dom h_j representing the feasible domain defined … iii) G can be −∞ for some α and β. Theorem 1 (Lower Bounds Property): If α ⪰ 0, then G(α, β) ≤ p*, where p* is the optimal value of the (original) primal problem defined by (9)∼(11). Proof. If ω̃ is feasible …
18 pages | 509.37 KB | 1 year ago
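A short sketch of the hyperplane definition in Eq. (1) above, using an arbitrary ω and b to check which side of the hyperplane a point lies on and its signed distance.

```python
import numpy as np

omega = np.array([2.0, -1.0])  # assumed outward-pointing normal vector
b = 0.5                        # assumed bias term

def signed_distance(x):
    """Signed distance from x to the hyperplane omega^T x + b = 0."""
    return (omega @ x + b) / np.linalg.norm(omega)

x = np.array([1.0, 1.0])
print(signed_distance(x))      # positive: x lies on the side omega points toward
print(np.sign(omega @ x + b))  # the half-space (label) assigned to x
```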
《Efficient Deep Learning Book》[EDL] Chapter 7 - Automation
…Search. A simple algorithm for automating HPO is Grid Search (also referred to as Parameter Sweep), where the trial set consists of all the combinations of valid hyperparameter values. Each trial is configured … a good candidate for parallel execution. For example, the trial set for two hyperparameters … Figure 7-2 (a) shows results of grid search trials with two hyperparameters. The blue … It also has a couple of additional drawbacks. First, it suffers from the curse of dimensionality, where the total number of trials grows quickly for each additional hyperparameter value or a new hyperparameter …
33 pages | 2.48 MB | 1 year ago
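A minimal grid-search sketch matching the description above: the trial set is the Cartesian product of the candidate values for each hyperparameter. The value lists and the train_and_evaluate stub are placeholders, not the book's setup.

```python
import itertools

# Candidate values for two hyperparameters (placeholder values).
grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "batch_size": [32, 64],
}

def train_and_evaluate(params):
    """Stub: train a model with `params` and return a validation score."""
    return -abs(params["learning_rate"] - 1e-2)  # pretend 1e-2 is the best learning rate

# Every combination of values is one trial; trials are independent, so they are
# easy to run in parallel.
trials = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best = max(trials, key=train_and_evaluate)
print(f"{len(trials)} trials, best: {best}")
```

Adding a third hyperparameter with, say, four candidate values multiplies the number of trials by four, which is the curse-of-dimensionality drawback the snippet mentions.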
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
…to the digital domain. A popular example of a lossless data compression algorithm is Huffman Coding, where we assign unique strings of bits (codes) to the symbols based on their frequency in the data. More … schemes. The lossy compression algorithms are used in situations (people who like diced apples) where we don't expect to recover the exact representation of the original data. It is okay to recover an … information as a trade-off. It is especially applicable for multimedia (audio, video, images) data, where it is likely that either humans who will consume the information will not notice the loss of some …
33 pages | 1.96 MB | 1 year ago
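A compact Huffman-coding sketch for the lossless scheme mentioned above: symbols that occur more often receive shorter bit strings. This is a generic illustration, not the book's implementation.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code where frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in freq}
    # Each heap entry: (subtree frequency, tie-breaker, {symbol: code within subtree}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, i, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]

print(huffman_codes("abracadabra"))  # 'a' (most frequent) gets the shortest code
```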
共 29 条
- 1
- 2
- 3













