Experiment 1: Linear Regression

Figure 1: Plotting the data. (Scatter plot of the training set: age in years on the x-axis, height in meters, roughly 0.7 to 1.4 m, on the y-axis.)

Before starting gradient descent, we need to add the x0 = 1 intercept term to every example, so that each input becomes a two-element vector [x0; x1] and the hypothesis can be written as h(theta) = theta0*x0 + theta1*x1. To do this in MATLAB/Octave, prepend a column of ones to the matrix of training inputs, as sketched below.
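A minimal sketch of this step, assuming the training inputs and targets are loaded into m-by-1 vectors x and y (the data file names below are illustrative placeholders, not part of the original handout):

% Load the training set (file names here are assumptions).
x = load('ex1x.dat');        % ages in years, m x 1
y = load('ex1y.dat');        % heights in meters, m x 1
m = length(y);               % number of training examples

% Add the intercept term: prepend a column of ones so x0 = 1.
x = [ones(m, 1), x];         % x is now m x 2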
Later in the experiment, after evaluating the cost function J(theta) over a grid of (theta0, theta1) values and storing the results in the matrix J_vals, you can visualize the cost surface with surf:

% Because of the way meshgrids work in the surf command, we
% need to transpose J_vals before calling surf, or else the
% axes will be flipped
J_vals = J_vals';
figure;
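For context, here is a sketch of how J_vals might be filled in and then plotted. The grid ranges, resolution, and all variable names other than J_vals are assumptions; the cost is the standard least-squares objective J(theta) = (1/(2m)) * sum over i of (h_theta(x(i)) - y(i))^2:

theta0_vals = linspace(-3, 3, 100);
theta1_vals = linspace(-1, 1, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = (1/(2*m)) * sum((x*t - y).^2);   % least-squares cost
    end
end

J_vals = J_vals';   % transpose before surf, as explained above
figure;
surf(theta0_vals, theta1_vals, J_vals);
xlabel('\theta_0'); ylabel('\theta_1'); zlabel('J(\theta)');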
Answer the following questions:

1. Observe how the cost function changes as the learning rate changes. What happens when the learning rate is too small? Too large? (A gradient descent sketch for experimenting with the learning rate follows these questions.)

2. Using the best learning rate you found, run gradient descent until convergence and report the final values of theta0 and theta1.
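As a starting point for question 1, a minimal batch gradient descent sketch; the learning rate values, iteration count, and variable names are assumptions:

alpha = 0.07;                 % learning rate; also try e.g. 0.01, 0.1, 1
num_iters = 1500;
theta = zeros(2, 1);          % [theta0; theta1]
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    grad = (1/m) * x' * (x*theta - y);     % gradient of J(theta)
    theta = theta - alpha * grad;          % simultaneous update of both parameters
    J_history(iter) = (1/(2*m)) * sum((x*theta - y).^2);
end

% If alpha is well chosen, J decreases steadily; if alpha is too large,
% J oscillates or diverges; if alpha is too small, J decreases very slowly.
figure;
plot(1:num_iters, J_history);
xlabel('Iteration'); ylabel('J(\theta)');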
 《Efficient Deep Learning Book》[EDL] Chapter 1 - Introductionthem have to be realtime, hence there is a need for on-device ML models (where the model inference happens directly on the device). Which makes it imperative to optimize the models for the device they will model of efficient deep learning in the next section. A Mental Model of Efficient Deep Learning Before we dive deeper, let’s visualize two sets of closely connected metrics that we care about. First, Layers (see Figure 1-15), Attention, etc.), that are a significant leap over the baseline methods used before them. As an example, convolutional layers introduce parameter sharing and filters for use in image0 码力 | 21 页 | 3.17 MB | 1 年前3 《Efficient Deep Learning Book》[EDL] Chapter 1 - Introductionthem have to be realtime, hence there is a need for on-device ML models (where the model inference happens directly on the device). Which makes it imperative to optimize the models for the device they will model of efficient deep learning in the next section. A Mental Model of Efficient Deep Learning Before we dive deeper, let’s visualize two sets of closely connected metrics that we care about. First, Layers (see Figure 1-15), Attention, etc.), that are a significant leap over the baseline methods used before them. As an example, convolutional layers introduce parameter sharing and filters for use in image0 码力 | 21 页 | 3.17 MB | 1 年前3
 《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniqueswhich is also used in various fields of computer science in addition to deep learning. Quantization Before we jump to working with a deep learning model, we have a task for you. You have been handed the charge a quantized domain (b-bit values). This process is nothing but (cue drum roll!) ...Quantization! Before we get our hands dirty, let us first make two reasonable assumptions: 1. We know that the value A visualization of 100 samples from the MNIST dataset. Loading and Processing the MNIST Dataset Before we start, the code is available as a Jupyter notebook here. Now let’s take a look at the load_data()0 码力 | 33 页 | 1.96 MB | 1 年前3 《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniqueswhich is also used in various fields of computer science in addition to deep learning. Quantization Before we jump to working with a deep learning model, we have a task for you. You have been handed the charge a quantized domain (b-bit values). This process is nothing but (cue drum roll!) ...Quantization! Before we get our hands dirty, let us first make two reasonable assumptions: 1. We know that the value A visualization of 100 samples from the MNIST dataset. Loading and Processing the MNIST Dataset Before we start, the code is available as a Jupyter notebook here. Now let’s take a look at the load_data()0 码力 | 33 页 | 1.96 MB | 1 年前3
 Lecture 1: Overviewlabels Example 4 T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human-judged error E: A sequence of images and steering commands recorded while ob- serving a essential aspects of the problem, but that is not so complex that overfitting occurs. Overfitting happens when we choose parameters of a model that fit the data we have very well, but do poorly on new data0 码力 | 57 页 | 2.41 MB | 1 年前3 Lecture 1: Overviewlabels Example 4 T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human-judged error E: A sequence of images and steering commands recorded while ob- serving a essential aspects of the problem, but that is not so complex that overfitting occurs. Overfitting happens when we choose parameters of a model that fit the data we have very well, but do poorly on new data0 码力 | 57 页 | 2.41 MB | 1 年前3
 AI大模型千问 qwen 中文文档AutoModelForCausalLM, AutoTokenizer 借助 TextStreamer ,chat 的流式模式变得非常简单。下面我们将展示一个如何使用它的示例: ... # Reuse the code before `model.generate()` in the last code snippet from transformers import TextStreamer streamer = TextStreamer(tokenizer 作为系统提示。 1.3.2 流式输出 借助 TextStreamer ,您可以将与 Qwen 的对话切换到流式传输模式。下面是一个关于如何使用它的示例: # Repeat the code above before model.generate() # Starting here, we add streamer for text generation. from transformers import TextStreamer TextStreamer 之外,我们还可以使用 TextIteratorStreamer ,它将可打印的文本存储在一 个队列中,以便下游应用程序作为迭代器来使用: # Repeat the code above before model.generate() # Starting here, we add streamer for text generation. (续下页) 1.3. 使用 Transformers0 码力 | 56 页 | 835.78 KB | 1 年前3 AI大模型千问 qwen 中文文档AutoModelForCausalLM, AutoTokenizer 借助 TextStreamer ,chat 的流式模式变得非常简单。下面我们将展示一个如何使用它的示例: ... # Reuse the code before `model.generate()` in the last code snippet from transformers import TextStreamer streamer = TextStreamer(tokenizer 作为系统提示。 1.3.2 流式输出 借助 TextStreamer ,您可以将与 Qwen 的对话切换到流式传输模式。下面是一个关于如何使用它的示例: # Repeat the code above before model.generate() # Starting here, we add streamer for text generation. from transformers import TextStreamer TextStreamer 之外,我们还可以使用 TextIteratorStreamer ,它将可打印的文本存储在一 个队列中,以便下游应用程序作为迭代器来使用: # Repeat the code above before model.generate() # Starting here, we add streamer for text generation. (续下页) 1.3. 使用 Transformers0 码力 | 56 页 | 835.78 KB | 1 年前3
 《Efficient Deep Learning Book》[EDL] Chapter 7 - Automationwise addition or the concatenation of output of primitive operations. The concatenation operation happens along the filter dimension to keep the feature map intact. Figure 7-9 shows the Normal and Reduction argument is a numpy array of shape (5, 5) which contains 5 state choices for each of the 5 blocks. Before the cell construction, we standardize the two branch inputs to an appropriate feature space and channel0 码力 | 33 页 | 2.48 MB | 1 年前3 《Efficient Deep Learning Book》[EDL] Chapter 7 - Automationwise addition or the concatenation of output of primitive operations. The concatenation operation happens along the filter dimension to keep the feature map intact. Figure 7-9 shows the Normal and Reduction argument is a numpy array of shape (5, 5) which contains 5 state choices for each of the 5 blocks. Before the cell construction, we standardize the two branch inputs to an appropriate feature space and channel0 码力 | 33 页 | 2.48 MB | 1 年前3
 Lecture 5: Gaussian Discriminant Analysis, Naive Bayesrandom experiment Event A is a subset of the sample space S P(A) is the probability that event A happens It is a function that maps the event A onto the interval [0, 1]. P(A) is also called the probability probability measures a “degree of be- lief”, and Bayes’ theorem links the degree of belief in a proposition before and after accounting for evidence. For proposition A and evidence B P(A), the prior, is the initial0 码力 | 122 页 | 1.35 MB | 1 年前3 Lecture 5: Gaussian Discriminant Analysis, Naive Bayesrandom experiment Event A is a subset of the sample space S P(A) is the probability that event A happens It is a function that maps the event A onto the interval [0, 1]. P(A) is also called the probability probability measures a “degree of be- lief”, and Bayes’ theorem links the degree of belief in a proposition before and after accounting for evidence. For proposition A and evidence B P(A), the prior, is the initial0 码力 | 122 页 | 1.35 MB | 1 年前3
 《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniquesweight. Hence, we can view weight sharing as the general concept behind quantization. However, what happens if our and were outliers, and the real data was clustered in some smaller concentrated ranges? Quantization followed by gzip the size of the final compressed model file was 263.02 KB. Converting to TFLite before gzip helped further reduce the size to 256.71 KB. All in all, we achieved a size reduction of 5.62x0 码力 | 34 页 | 3.18 MB | 1 年前3 《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniquesweight. Hence, we can view weight sharing as the general concept behind quantization. However, what happens if our and were outliers, and the real data was clustered in some smaller concentrated ranges? Quantization followed by gzip the size of the final compressed model file was 263.02 KB. Converting to TFLite before gzip helped further reduce the size to 256.71 KB. All in all, we achieved a size reduction of 5.62x0 码力 | 34 页 | 3.18 MB | 1 年前3
共 22 条
- 1
- 2
- 3













