Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
Conditional probability: $P(A \mid B) = \frac{P(A, B)}{P(B)}$, equivalently $P(A, B) = P(A \mid B)\,P(B)$.
Corollary (the chain rule): $P(A_1, A_2, \cdots, A_n) = \prod_{k=1}^{n} P(A_k \mid A_1, A_2, \cdots, A_{k-1})$.
Example: $P(A_4, A_3, A_2, A_1) = P(A_4 \mid A_3, A_2, A_1)\,P(A_3 \mid A_2, A_1)\,P(A_2 \mid A_1)\,P(A_1)$.
Suppose we have $n$ features $X = [X_1, X_2, \cdots, X_n]^T$ and assume the features are conditionally independent of each other given the label $Y$. Then
$P(X = x \mid Y = y) = P(X_1 = x_1, \cdots, X_n = x_n \mid Y = y) = \prod_{j=1}^{n} P(X_j = x_j \mid Y = y) = \prod_{j=1}^{n} p_{X_j \mid Y}(x_j \mid y)$,
and for a training sample $(x^{(i)}, y^{(i)})$,
$P(X = x^{(i)} \mid Y = y^{(i)}) = P(X_1 = x_1^{(i)}, \cdots, X_n = x_n^{(i)} \mid Y = y^{(i)}) = \prod_{j=1}^{n} p_{X_j \mid Y}(x_j^{(i)} \mid y^{(i)})$.
Feng Li (SDU), GDA, NB and EM, September 27 | 122 pages | 1.35 MB | 1 year ago
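A minimal sketch (not from the slides; the cond_prob lookup table is a hypothetical stand-in for conditional probabilities estimated beforehand) of how this factorization is used with discrete features:

    import numpy as np

    # cond_prob[y][j][v] = P(X_j = v | Y = y), assumed to be estimated already,
    # e.g., by frequency counts over the training data.
    def log_likelihood(x, y, cond_prob):
        # Conditional independence: P(X = x | Y = y) = prod_j P(X_j = x_j | Y = y);
        # summing logarithms avoids numerical underflow when n is large.
        return sum(np.log(cond_prob[y][j][v]) for j, v in enumerate(x))

Classification then picks the label $y$ that maximizes $\log P(Y = y)$ plus this conditional log-likelihood.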
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes
… with the corresponding probability mass function (PMF) $p_Y(y; \psi) = P(Y = y) = \psi^y (1 - \psi)^{1-y}$ (5), i.e., the label follows a Bernoulli distribution with parameter $\psi$.
• A2: $X \mid Y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$: the conditional distribution of the continuous random variable $X$ given $Y = 0$ is Gaussian, with probability density function (PDF) $p_{X \mid Y}(x \mid 0) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1}(x - \mu_0)\right)$ (6).
• A3: $X \mid Y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$: the conditional distribution of $X$ given $Y = 1$ is Gaussian, parameterized by $\mu_1$ and the shared covariance $\Sigma$, with PDF $p_{X \mid Y}(x \mid 1) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1}(x - \mu_1)\right)$ (7).
Given $m$ sample data points $\{(x^{(i)}, y^{(i)})\}_{i=1,\cdots,m}$, the log-likelihood …
19 pages | 238.80 KB | 1 year ago
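A minimal NumPy sketch (not from the notes) that evaluates the class-conditional densities in Eqs. (6)-(7) directly:

    import numpy as np

    # Evaluate the class-conditional density p(x | y) = N(x; mu_y, Sigma).
    def gaussian_pdf(x, mu, Sigma):
        n = x.shape[0]
        diff = x - mu
        norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
        # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse
        return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

The GDA prediction then compares $\psi\, p_{X \mid Y}(x \mid 1)$ against $(1 - \psi)\, p_{X \mid Y}(x \mid 0)$ and picks the larger.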
keras tutorial
… finds the weights using a normal distribution with stddev = sqrt(scale / n), where n is the number of input units when mode = fan_in and the number of output units when mode = fan_out … for a uniform distribution the weights are drawn within limit = sqrt(3 * scale / n) … lecun_normal generates values using the LeCun normal distribution … RepeatVector layers: RepeatVector is used to repeat the input a set number, n, of times. For example, if RepeatVector with argument 16 is applied to a layer having input shape (batch_size, …
98 pages | 1.57 MB | 1 year ago
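A small sketch assuming the tf.keras API (layer sizes are illustrative and not from the tutorial), combining the two pieces the snippet describes:

    import tensorflow as tf

    # VarianceScaling with mode="fan_in" draws weights with stddev = sqrt(scale / fan_in);
    # RepeatVector(16) turns a (batch_size, 32) tensor into (batch_size, 16, 32).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(
            32,
            kernel_initializer=tf.keras.initializers.VarianceScaling(
                scale=1.0, mode="fan_in", distribution="truncated_normal")),
        tf.keras.layers.RepeatVector(16),
    ])
    model.summary()

The summary should report the RepeatVector output shape as (None, 16, 32), i.e., the 32-dimensional input repeated 16 times.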
动手学深度学习 v2.0 (Dive into Deep Learning v2.0)
8.3.1 Learning a language model … 303; 8.3.2 Markov models and n-grams … 305; 8.3.3 Natural-language statistics …
Notation: I: identity matrix; x_i, [x]_i: the i-th element of vector x; x_ij, [X]_ij: the element of matrix X at row i, column j. Set theory: X: a set; Z: the set of integers; R: the set of real numbers; R^n: the set of n-dimensional real vectors; R^{a×b}: the set of real matrices with a rows and b columns; A ∪ B: the union of sets A and B; A ∩ B: the intersection of A and B; A \ B: the set difference, i.e., the relative complement of B in A.
In order to perform all kinds of data operations, we need some way to store and manipulate data. Usually there are two important things to do: (1) acquire data; (2) process the data once it has been read into the computer. Without some way to store data, acquiring it is meaningless. First we introduce the n-dimensional array, also called a tensor. Readers who have used the NumPy package in Python will find this part familiar. Whichever deep learning framework you use, its tensor class (ndarray in MXNet, Tensor in PyTorch and TensorFlow) …
797 pages | 29.45 MB | 1 year ago
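To make the excerpt's last point concrete, a tiny sketch (values illustrative; PyTorch is one of the frameworks the book supports):

    import torch

    # An n-dimensional array ("tensor") created and reshaped with PyTorch.
    x = torch.arange(12)        # 1-D tensor holding 0, 1, ..., 11
    X = x.reshape(3, 4)         # view the same data as a 3x4 matrix
    print(X.shape, X.sum())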
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
… features. Implementation detail: using the cross-entropy loss when N is large can be computationally expensive due to the N-way softmax calculation; in the real world, as an efficient approximation … table. Size of the vocabulary (N): it is expensive to learn an embedding for every possible token, so we limit our vocabulary to a smaller subset using heuristics like the top N most-frequent words in the … model quality needs to be determined empirically, but d is often in the hundreds for NLP problems, and N might range from thousands to millions. Now that we are familiar with generating embeddings, how do …
53 pages | 3.92 MB | 1 year ago
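As an illustration of the vocabulary-size / embedding-dimension trade-off the excerpt discusses, a minimal sketch (not from the book; the values of N and d are illustrative):

    import torch

    # An embedding table with vocabulary size N and embedding dimension d:
    # the table stores N x d learnable parameters, so N dominates its memory cost.
    N, d = 50_000, 128
    embedding = torch.nn.Embedding(num_embeddings=N, embedding_dim=d)
    token_ids = torch.tensor([3, 17, 42])    # token indices into the vocabulary
    vectors = embedding(token_ids)           # dense vectors of shape (3, d)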
《TensorFlow 快速入门与实战》7-实战 TensorFlow 人脸识别 (TensorFlow Quick Start and Practice, Part 7: Hands-on Face Recognition with TensorFlow)
The legible portion of the indexed snippet introduces LFW (Labeled Faces in the Wild, 2007), the face-recognition benchmark dataset referenced in the slides; the rest of the slide text is unrecoverable.
81 pages | 12.64 MB | 1 year ago
全连接神经网络实战(pytorch 版)(Hands-on Fully Connected Neural Networks, PyTorch edition)

    import torch
    from torch import nn

    class NeuralNetwork(nn.Module):
        def __init__(self):
            super(NeuralNetwork, self).__init__()
            # flatten the input array down to 1 dimension
            self.flatten = nn.Flatten()
            # define the order of computation of the network
            self.linear_relu_stack = nn.Sequential(
                nn.Linear(28 * 28, 512),   # input size assumed (e.g., flattened 28x28 images)
                nn.ReLU(),
                nn.Linear(512, 10),
            )

        def forward(self, x):
            x = self.flatten(x)
            logits = self.linear_relu_stack(x)
            return logits

    model = NeuralNetwork()
    print(model)

… the training loop takes the network, the loss-function computation, and the optimizer; the test function does not need an optimizer:

    epochs = 10
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train_loop(train_dataloader, model, loss_function, optimizer)

29 pages | 1.40 MB | 1 year ago
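The snippet only shows the training loop being invoked; a minimal sketch of the test-side counterpart it alludes to (not taken from the book) could look like this:

    import torch

    # An evaluation loop takes no optimizer, since no parameter update happens here.
    def test_loop(dataloader, model, loss_fn):
        model.eval()
        total_loss, correct = 0.0, 0
        with torch.no_grad():
            for X, y in dataloader:
                pred = model(X)
                total_loss += loss_fn(pred, y).item()
                correct += (pred.argmax(1) == y).sum().item()
        print(f"avg loss: {total_loss / len(dataloader):.4f}, "
              f"accuracy: {correct / len(dataloader.dataset):.4f}")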
Experiment 1: Linear Regression
Recall that the linear regression model is $h_\theta(x) = \theta^T x = \sum_{j=0}^{n} \theta_j x_j$ (1), where $\theta$ is the parameter vector we need to optimize and $x$ is the $(n+1)$-dimensional feature vector. Given a training set … (The training data is actually $n$-dimensional, i.e., $x = [x_1, x_2, \cdots, x_n]$; for each training example we add an extra intercept term $x_0 = 1$, so the resulting feature vector is $(n+1)$-dimensional.)
2D Linear Regression: we start with a very simple case where $n = 1$. Download data1.zip and extract the files (ex1x.dat and ex1y.dat) from the zip file. The files contain some example measurements of heights …
7 pages | 428.11 KB | 1 year ago
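A minimal sketch (not from the handout) of batch gradient descent for this model; the learning rate and iteration count are illustrative choices:

    import numpy as np

    # X is m x (n+1) with the intercept column x_0 = 1 already prepended; y has length m.
    def gradient_descent(X, y, alpha=0.07, iters=1500):
        m, d = X.shape
        theta = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ theta - y) / m   # gradient of the least-squares cost
            theta -= alpha * grad
        return theta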
AI大模型千问 qwen 中文文档 (Qwen large-model documentation, Chinese edition)
The apply_chat_template() function converts the messages into a format the model can understand. Its add_generation_prompt argument adds a generation prompt to the input, namely the <|im_start|>assistant\n header. Note in particular that, following prior practice, we apply the ChatML template to chat models. The max_new_tokens argument sets the maximum length of the response, and through tokenizer.batch_decode() … --local-dir-use-symlinks False. You can then run the model with the following command: ./main -m qwen1_5-7b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt, where -n is the maximum number of tokens to generate. There are other hyperparameters to choose from, and you can run ./main -h to learn about them. 1.4.3 Generate your … on Windows Subsystem for Linux (WSL), run start_wsl.bat. Alternatively, you can install the required dependencies manually in a conda environment; taking macOS as an example: conda create -n textgen python=3.11, conda activate textgen, pip install torch torchvision torchaudio. Next, depending on your operating system, you can run …
56 pages | 835.78 KB | 1 year ago
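A short sketch of the apply_chat_template() usage the excerpt describes (assumes the Hugging Face transformers API; the model name and messages are illustrative):

    from transformers import AutoTokenizer

    # add_generation_prompt=True appends the "<|im_start|>assistant\n" header
    # so the model knows it should start its reply after the ChatML-formatted turns.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about large language models."},
    ]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    print(text)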
Lecture 7: K-Means
Clustering is usually an unsupervised learning problem. Given $N$ unlabeled examples $\{x_1, \cdots, x_N\}$ and a desired number of partitions $K$, the goal is to group the examples into $K$ "homogeneous" partitions.
Problem: given a set of observations $X = \{x_1, x_2, \cdots, x_N\}$ ($x_i \in \mathbb{R}^D$), partition the $N$ observations into $K$ sets ($K \leq N$) $\{C_k\}_{k=1,\cdots,K}$ such that the sets minimize the within-cluster sum of squares. Summing the squared distances over all points defines the K-means "loss function"
$L(\mu, X, Z) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{i,k} \|x_i - \mu_k\|^2 = \|X - Z\mu\|^2$,
where $X$ is $N \times D$, $Z$ is $N \times K$, and $\mu$ is $K \times D$.
Feng Li (SDU), December 28, 2021 | 46 pages | 9.78 MB | 1 year ago
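A compact sketch (not from the slides) of Lloyd's algorithm, which alternates assignment and mean-update steps to reduce this loss; K, the iteration count, and the seed are illustrative:

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centroids
        for _ in range(n_iters):
            # assignment step: z_i = index of the nearest centroid to x_i
            z = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1), axis=1)
            # update step: each centroid becomes the mean of its assigned points
            mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        return mu, z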