动手学深度学习 (Dive into Deep Learning) v2.0
… import Image; from torch import nn; from torch.nn import functional as F; from torch.utils import data; from torchvision import transforms … Target audience: This book is for students (undergraduate or graduate), engineers, and researchers who want a solid grasp of practical deep learning techniques. Because we … from scratch … wrote a "learning" program. If we use a huge labeled dataset, it can very likely "learn" to recognize the wake word. This approach of determining a program's behavior from a dataset can be viewed as programming with data. For example, we can design a "cat detector" by providing a machine learning system with many pictures of cats and dogs. The detector eventually learns to output a very large positive number if the input is a picture of a cat, and a very negative number if the input is a picture of a dog … a major branch of … learning, which later parts of this section examine in more detail. 1.2 Key Components in Machine Learning. First we introduce some core components. Whatever the type of machine learning problem, you will encounter these components: 1. the data that we can learn from; 2. a model of how to transform the data; 3. an objective function that quantifies how well the model is doing; 4. an algorithm that adjusts the model's parameters to optimize the objective function.
0 points | 797 pages | 29.45 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
Overview of Compression. One of the simplest approaches towards efficiency is compression to reduce data size. For the longest time in the history of computing, scientists have worked tirelessly towards … popular example of lossless data compression algorithm is Huffman Coding, where we assign unique strings of bits (codes) to the symbols based on their frequency in the data. More frequent symbols are assigned … and the path to that symbol is the bit-string assigned to it. This allows us to encode the given data in as few bits as possible, since the most frequent symbols will take the least number of bits to …
0 points | 33 pages | 1.96 MB | 1 year ago
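The Huffman Coding idea in the snippet above (shorter bit-strings for more frequent symbols, with each code given by the path to the symbol in a tree) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the book's own code; the function names `huffman_codes` and `encode` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit-string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    # The tie-breaker keeps the dicts from ever being compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefixing a bit to every
        # code in a subtree is the same as walking one edge up the tree.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit-strings of all symbols in `text`."""
    return "".join(codes[ch] for ch in text)


if __name__ == "__main__":
    message = "aaaabbc"
    codes = huffman_codes(message)
    print(codes)
    print(len(encode(message, codes)), "bits vs", 8 * len(message), "bits at 8 bits/symbol")
```

Because symbols only ever sit at the leaves of the merge tree, no code is a prefix of another, so the encoded bit-string can be decoded unambiguously.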
人工智能发展史 (History of the Development of Artificial Intelligence)
… cope with multi categories https://youtu.be/aygSMgK3BEM Perceptrons’ Limitation: 1969 http://science.sciencemag.org/content/165/3895/780 Is it Winter? http://www.iro.umontreal.ca/~vincentp/ift339 … pdf http://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf Other Heroes ▪ Big Data ▪ ReLU ▪ BatchNorm ▪ Xavier Initialization ▪ Kaiming Initialization ▪ Dropout http://www.iro … Goodfellow ▪ How I fail https://veronikach.com/how-i-fail/how-i-fail-ian-goodfellow-phd14-computer-science/ http://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf ▪ 2015 https://storage …
0 points | 54 pages | 3.87 MB | 1 year ago
【PyTorch深度学习-龙龙老师】-测试版202112 (PyTorch Deep Learning, 龙龙老师, beta edition 202112)
… a Gaussian distribution with … 0.01: $y = 1.477x + 0.089 + \epsilon,\ \epsilon \sim \mathcal{N}(0, 0.01^2)$. By randomly sampling $n = 100$ times, we obtain a training dataset $D_{\text{train}}$ of $n$ samples. The code is as follows:
    data = []  # list that stores the samples
    for i in range(100):  # sample 100 points in a loop
        x = np.random.uniform(-10., 10.)  # randomly sample the input
        … random.normal(0., 0.01)
        # get the model output
        y = 1.477 * x + 0.089 + eps
        data.append([x, y])  # store the sample point
    data = np.array(data)  # convert to a 2D NumPy array
The for loop performs 100 samplings. Each time, a data point $x$ is randomly sampled from the uniform distribution $U(-10, 10)$, and at the same time from … with mean … 1000 times, returning the optimal w*, b* and the descent of the training loss:
    [b, w] = gradient_descent(data, initial_b, initial_w, lr, num_iterations)
    loss = mse(b, w, data)  # mean squared error at the optimal numerical solution w, b
    print(f'Final loss:{loss}, w:{w}, b:{b}')
0 points | 439 pages | 29.91 MB | 1 year ago
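The snippet above is cut off in several places, so the following is only a sketch of how the sampling step and the `gradient_descent` / `mse` calls could fit together. The function names and the model $y = 1.477x + 0.089 + \epsilon$ come from the snippet; the function bodies, the zero initial values, the learning rate of 0.01, the fixed seed, and the use of the standard-library `random` module in place of NumPy are all assumptions made so the example runs on its own.

```python
import random


def mse(b, w, data):
    """Mean squared error of the line y = w*x + b over all (x, y) samples."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(data, initial_b, initial_w, lr, num_iterations):
    """Plain full-batch gradient descent on the MSE; returns [b, w]."""
    b, w = initial_b, initial_w
    n = len(data)
    for _ in range(num_iterations):
        # Gradients of the MSE with respect to b and w.
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        grad_w = sum(2 * x * (w * x + b - y) for x, y in data) / n
        b -= lr * grad_b
        w -= lr * grad_w
    return [b, w]


random.seed(0)  # assumption: fixed seed so repeated runs give the same data
data = []
for _ in range(100):  # sample 100 points
    x = random.uniform(-10.0, 10.0)  # input drawn from U(-10, 10)
    eps = random.gauss(0.0, 0.01)  # Gaussian noise, standard deviation 0.01
    data.append((x, 1.477 * x + 0.089 + eps))

# Assumed hyperparameters: start from zero, lr = 0.01, 1000 iterations.
b, w = gradient_descent(data, initial_b=0.0, initial_w=0.0, lr=0.01, num_iterations=1000)
loss = mse(b, w, data)
print(f"Final loss:{loss}, w:{w}, b:{b}")
```

With inputs in [-10, 10] the loss surface is much steeper along `w` than along `b`, so the learning rate has to stay small enough for `w` to converge while the iteration count has to be large enough for `b` to catch up.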
PyTorch Tutorial
Python usage − This library is considered to be Pythonic which smoothly integrates with the Python data science stack. • It can be considered as NumPy extension to GPUs. • Computational graphs − PyTorch provides … computation graph) • Various other functions • loss (MSE, CE etc.) • optimizers. Prepare Input Data • Load data • Iterate over examples. Train Model • Train weights. Evaluate Model • Visualise. Tensor … requires_grad=True) • Accessing tensor value: • t.data • Accessing tensor gradient • t.grad • grad_fn – history of operations for autograd • t.grad_fn. Loading Data, Devices and CUDA • Numpy arrays to PyTorch …
0 points | 38 pages | 4.09 MB | 1 year ago
Machine Learning
… Networks and Deep Learning. Feng Li, fli@sdu.edu.cn, https://funglee.github.io, School of Computer Science and Technology, Shandong University, Fall 2018. Deep Feedforward Networks • Also called feedforward … f(x) from the given training data • In the output layer, f(x) ≈ y for each training data, but the behavior of the other layers is not directly specified by the training data • Learning algorithm must decide intermediate layers such that right results can be obtained in the output layer, but the training data do not say what each individual layer should do • The only thing we must provide to the neural network …
0 points | 19 pages | 944.40 KB | 1 year ago
机器学习课程-温州大学-10机器学习-聚类 (Machine Learning Course, Wenzhou University, 10 Machine Learning: Clustering)
… Discovering Clusters in Large Spatial Databases with Noise[J]. Proc.int.conf.knowledg Discovery & Data Mining, 1996. [3] Andrew Ng. Machine Learning[EB/OL]. Stanford University, 2014. https://www.coursera … 2001. 47 References [7] Rodriguez A, Laio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492. [8] Christopher M. Bishop. Pattern Recognition and Machine Learning[M] … A, et al. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection[J]. ACM Transactions on Knowledge Discovery from Data, 2015. [11] 彭涛. 人工智能概论 (Introduction to Artificial Intelligence) [EB/OL]. 北京联 …
0 points | 48 pages | 2.59 MB | 1 year ago
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
… $p_X(x)$, $\forall y$. We calculate $p_{X|Y}(x \mid y)$ for $\forall x, y$ and $p_Y(y)$ for $\forall y$ according to the given training data. Fortunately, we do not have to calculate $p_X(x)$, because $\arg\max_y p_{Y|X}(y \mid x) = \arg\max_y p_{X|Y}$ … learning from training data, but how? Feng Li (SDU), GDA, NB and EM, September 27, 2023, 33 / 122. Warm Up (Contd.) Given a set of training data $D = \{x^{(i)}, y^{(i)}\}_{i=1,\cdots,m}$. The training data are sampled in an i.i.d. manner. The probability of the $i$-th training data $(x^{(i)}, y^{(i)})$: $P(X = x^{(i)}, Y = y^{(i)}) = P(X = x^{(i)} \mid Y = y^{(i)})\,P(Y = y^{(i)}) = p_X(x^{(i)} \mid y^{(i)})\,p_Y(y^{(i)}) = p_{X|Y}(x^{(i)} \mid y^{(i)})\,p_Y(y^{(i)})$. The …
0 points | 122 pages | 1.35 MB | 1 year ago
机器学习课程-温州大学-11机器学习-降维 (Machine Learning Course, Wenzhou University, 11 Machine Learning: Dimensionality Reduction)
… coursera.org/course/ml [2] Hinton G E, et al. Reducing the Dimensionality of Data with Neural Networks[J]. Science, 2006. [3] Jolliffe I T. Principal Component Analysis[J]. Journal of Marketing …
0 points | 51 pages | 3.14 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
… for weight sharing. However, quantization falls behind in case the data that we are quantizing is not uniformly distributed, i.e. the data is more likely to take values in a certain range than another … equally ranges (bins), regardless of the frequency of data. Clustering helps solve that problem by adapting the allocation of precision to match the distribution of the data, which ensures the decoded value deviates … "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science." Nature communications 9.1 (2018): 1-12. Weight sparsity has typically been the primary focus of …
0 points | 34 pages | 3.18 MB | 1 year ago
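The contrast drawn in the snippet above, equal-width bins versus centroids that adapt to where the data actually lies, can be illustrated with a small one-dimensional experiment. This is a sketch under stated assumptions, not the book's implementation: the helper names (`uniform_centroids`, `kmeans_centroids`, `reconstruction_error`), the Gaussian test data, and the choice of 8 centroids and 20 iterations are all hypothetical, and the clustering step is a plain Lloyd's k-means written with the standard library only.

```python
import random


def uniform_centroids(values, k):
    """Centers of k equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + (i + 0.5) * step for i in range(k)]


def kmeans_centroids(values, k, iterations=20):
    """1-D Lloyd's k-means, initialized from the uniform bin centers."""
    centroids = uniform_centroids(values, k)
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids


def reconstruction_error(values, centroids):
    """Mean squared distance from each value to its nearest centroid."""
    return sum(min((v - c) ** 2 for c in centroids) for v in values) / len(values)


random.seed(0)
# Assumed stand-in for a weight tensor: bell-shaped, so far from uniform.
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]

uniform_err = reconstruction_error(weights, uniform_centroids(weights, 8))
cluster_err = reconstruction_error(weights, kmeans_centroids(weights, 8))
print(f"uniform bins: {uniform_err:.5f}  clustered centroids: {cluster_err:.5f}")
```

Starting k-means from the uniform bin centers means each iteration can only keep or lower the reconstruction error, so the clustered centroids are never worse than the equal-width bins on the same data and the same number of levels.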
74 results in total













