Lecture 5: Gaussian Discriminant Analysis, Naive Bayes
Conditional probability: $P(A \mid B) = \frac{P(A, B)}{P(B)}$, equivalently $P(A, B) = P(A \mid B)\,P(B)$.
Corollary (the chain rule): $P(A_1, A_2, \cdots, A_n) = \prod_{k=1}^{n} P(A_k \mid A_1, A_2, \cdots, A_{k-1})$.
Example: $P(A_4, A_3, A_2, A_1) = P(A_4 \mid A_3, A_2, A_1)\,P(A_3 \mid A_2, A_1)\,P(A_2 \mid A_1)\,P(A_1)$.
Suppose we have $n$ features $X = [X_1, X_2, \cdots, X_n]^T$ and assume the features are conditionally independent of each other given the label $Y$. Then
$P(X = x \mid Y = y) = P(X_1 = x_1, \cdots, X_n = x_n \mid Y = y) = \prod_{j=1}^{n} P(X_j = x_j \mid Y = y) = \prod_{j=1}^{n} p_{X_j \mid Y}(x_j \mid y)$,
and for a training sample $(x^{(i)}, y^{(i)})$,
$P(X = x^{(i)} \mid Y = y^{(i)}) = P(X_1 = x_1^{(i)}, \cdots, X_n = x_n^{(i)} \mid Y = y^{(i)}) = \prod_{j=1}^{n} p_{X_j \mid Y}(x_j^{(i)} \mid y^{(i)})$.
Feng Li (SDU), GDA, NB and EM, September 27 | 122 pages | 1.35 MB | 1 year ago
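A minimal sketch (not from the slides; the cond_prob lookup table is a hypothetical stand-in for conditional probabilities estimated beforehand) of how this factorization is used with discrete features:

    import numpy as np

    # cond_prob[y][j][v] = P(X_j = v | Y = y), assumed to be estimated already,
    # e.g., by frequency counts over the training data.
    def log_likelihood(x, y, cond_prob):
        # Conditional independence: P(X = x | Y = y) = prod_j P(X_j = x_j | Y = y);
        # summing logarithms avoids numerical underflow when n is large.
        return sum(np.log(cond_prob[y][j][v]) for j, v in enumerate(x))

Classification then picks the label $y$ that maximizes $\log P(Y = y)$ plus this conditional log-likelihood.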
Lecture Notes on Gaussian Discriminant Analysis, Naive Bayes
… with the corresponding probability mass function (PMF) $p_Y(y; \psi) = P(Y = y) = \psi^y (1 - \psi)^{1-y}$ (5), i.e., the label follows a Bernoulli distribution with parameter $\psi$.
• A2: $X \mid Y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$: the conditional distribution of the continuous random variable $X$ given $Y = 0$ is Gaussian, with probability density function (PDF) $p_{X \mid Y}(x \mid 0) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1}(x - \mu_0)\right)$ (6).
• A3: $X \mid Y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$: the conditional distribution of $X$ given $Y = 1$ is Gaussian, parameterized by $\mu_1$ and the shared covariance $\Sigma$, with PDF $p_{X \mid Y}(x \mid 1) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1}(x - \mu_1)\right)$ (7).
Given $m$ sample data points $\{(x^{(i)}, y^{(i)})\}_{i=1,\cdots,m}$, the log-likelihood …
19 pages | 238.80 KB | 1 year ago
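A minimal NumPy sketch (not from the notes) that evaluates the class-conditional densities in Eqs. (6)-(7) directly:

    import numpy as np

    # Evaluate the class-conditional density p(x | y) = N(x; mu_y, Sigma).
    def gaussian_pdf(x, mu, Sigma):
        n = x.shape[0]
        diff = x - mu
        norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
        # solve(Sigma, diff) computes Sigma^{-1} diff without forming the inverse
        return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

The GDA prediction then compares $\psi\, p_{X \mid Y}(x \mid 1)$ against $(1 - \psi)\, p_{X \mid Y}(x \mid 0)$ and picks the larger.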
keras tutorial
… finds the weights using a normal distribution with stddev = sqrt(scale / n), where n is the number of input units when mode = fan_in and the number of output units when mode = fan_out … for a uniform distribution the weights are drawn within limit = sqrt(3 * scale / n) … lecun_normal generates values using the LeCun normal distribution … RepeatVector layers: RepeatVector is used to repeat the input a set number, n, of times. For example, if RepeatVector with argument 16 is applied to a layer having input shape (batch_size, …
98 pages | 1.57 MB | 1 year ago
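A small sketch assuming the tf.keras API (layer sizes are illustrative and not from the tutorial), combining the two pieces the snippet describes:

    import tensorflow as tf

    # VarianceScaling with mode="fan_in" draws weights with stddev = sqrt(scale / fan_in);
    # RepeatVector(16) turns a (batch_size, 32) tensor into (batch_size, 16, 32).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(
            32,
            kernel_initializer=tf.keras.initializers.VarianceScaling(
                scale=1.0, mode="fan_in", distribution="truncated_normal")),
        tf.keras.layers.RepeatVector(16),
    ])
    model.summary()

The summary should report the RepeatVector output shape as (None, 16, 32), i.e., the 32-dimensional input repeated 16 times.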
动手学深度学习 v2.0 (Dive into Deep Learning v2.0)
8.3.1 Learning a language model … 303; 8.3.2 Markov models and n-grams … 305; 8.3.3 Natural-language statistics …
Notation: I: identity matrix; x_i, [x]_i: the i-th element of vector x; x_ij, [X]_ij: the element of matrix X at row i, column j. Set theory: X: a set; Z: the set of integers; R: the set of real numbers; R^n: the set of n-dimensional real vectors; R^{a×b}: the set of real matrices with a rows and b columns; A ∪ B: the union of sets A and B; A ∩ B: the intersection of A and B; A \ B: the set difference, i.e., the relative complement of B in A.
In order to perform all kinds of data operations, we need some way to store and manipulate data. Usually there are two important things to do: (1) acquire data; (2) process the data once it has been read into the computer. Without some way to store data, acquiring it is meaningless. First we introduce the n-dimensional array, also called a tensor. Readers who have used the NumPy package in Python will find this part familiar. Whichever deep learning framework you use, its tensor class (ndarray in MXNet, Tensor in PyTorch and TensorFlow) …
797 pages | 29.45 MB | 1 year ago
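To make the excerpt's last point concrete, a tiny sketch (values illustrative; PyTorch is one of the frameworks the book supports):

    import torch

    # An n-dimensional array ("tensor") created and reshaped with PyTorch.
    x = torch.arange(12)        # 1-D tensor holding 0, 1, ..., 11
    X = x.reshape(3, 4)         # view the same data as a 3x4 matrix
    print(X.shape, X.sum())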
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
… features. Implementation detail: using the cross-entropy loss when N is large can be computationally expensive due to the N-way softmax calculation; in the real world, as an efficient approximation … table. Size of the vocabulary (N): it is expensive to learn an embedding for every possible token, so we limit our vocabulary to a smaller subset using heuristics like the top N most-frequent words in the … model quality needs to be determined empirically, but d is often in the hundreds for NLP problems, and N might range from thousands to millions. Now that we are familiar with generating embeddings, how do …
53 pages | 3.92 MB | 1 year ago
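As an illustration of the vocabulary-size / embedding-dimension trade-off the excerpt discusses, a minimal sketch (not from the book; the values of N and d are illustrative):

    import torch

    # An embedding table with vocabulary size N and embedding dimension d:
    # the table stores N x d learnable parameters, so N dominates its memory cost.
    N, d = 50_000, 128
    embedding = torch.nn.Embedding(num_embeddings=N, embedding_dim=d)
    token_ids = torch.tensor([3, 17, 42])    # token indices into the vocabulary
    vectors = embedding(token_ids)           # dense vectors of shape (3, d)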
《TensorFlow 快速入门与实战》7-实战 TensorFlow 人脸识别 (TensorFlow Quick Start and Practice, Part 7: Hands-on Face Recognition with TensorFlow)
The legible portion of the indexed snippet introduces LFW (Labeled Faces in the Wild, 2007), the face-recognition benchmark dataset referenced in the slides; the rest of the slide text is unrecoverable.
81 pages | 12.64 MB | 1 year ago
全连接神经网络实战(pytorch 版)(Hands-on Fully Connected Neural Networks, PyTorch edition)

    import torch
    from torch import nn

    class NeuralNetwork(nn.Module):
        def __init__(self):
            super(NeuralNetwork, self).__init__()
            # flatten the input array down to 1 dimension
            self.flatten = nn.Flatten()
            # define the order of computation of the network
            self.linear_relu_stack = nn.Sequential(
                nn.Linear(28 * 28, 512),   # input size assumed (e.g., flattened 28x28 images)
                nn.ReLU(),
                nn.Linear(512, 10),
            )

        def forward(self, x):
            x = self.flatten(x)
            logits = self.linear_relu_stack(x)
            return logits

    model = NeuralNetwork()
    print(model)

… the training loop takes the network, the loss-function computation, and the optimizer; the test function does not need an optimizer:

    epochs = 10
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train_loop(train_dataloader, model, loss_function, optimizer)

29 pages | 1.40 MB | 1 year ago
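The snippet only shows the training loop being invoked; a minimal sketch of the test-side counterpart it alludes to (not taken from the book) could look like this:

    import torch

    # An evaluation loop takes no optimizer, since no parameter update happens here.
    def test_loop(dataloader, model, loss_fn):
        model.eval()
        total_loss, correct = 0.0, 0
        with torch.no_grad():
            for X, y in dataloader:
                pred = model(X)
                total_loss += loss_fn(pred, y).item()
                correct += (pred.argmax(1) == y).sum().item()
        print(f"avg loss: {total_loss / len(dataloader):.4f}, "
              f"accuracy: {correct / len(dataloader.dataset):.4f}")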
Experiment 1: Linear Regression
Recall that the linear regression model is $h_\theta(x) = \theta^T x = \sum_{j=0}^{n} \theta_j x_j$ (1), where $\theta$ is the parameter vector we need to optimize and $x$ is the $(n+1)$-dimensional feature vector. Given a training set … (The training data is actually $n$-dimensional, i.e., $x = [x_1, x_2, \cdots, x_n]$; for each training example we add an extra intercept term $x_0 = 1$, so the resulting feature vector is $(n+1)$-dimensional.)
2D Linear Regression: we start with a very simple case where $n = 1$. Download data1.zip and extract the files (ex1x.dat and ex1y.dat) from the zip file. The files contain some example measurements of heights …
7 pages | 428.11 KB | 1 year ago
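A minimal sketch (not from the handout) of batch gradient descent for this model; the learning rate and iteration count are illustrative choices:

    import numpy as np

    # X is m x (n+1) with the intercept column x_0 = 1 already prepended; y has length m.
    def gradient_descent(X, y, alpha=0.07, iters=1500):
        m, d = X.shape
        theta = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ theta - y) / m   # gradient of the least-squares cost
            theta -= alpha * grad
        return theta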
AI大模型千问 qwen 中文文档 (Qwen large-model documentation, Chinese edition)
The apply_chat_template() function converts the messages into a format the model can understand. Its add_generation_prompt argument adds a generation prompt to the input, namely the <|im_start|>assistant\n header. Note in particular that, following prior practice, we apply the ChatML template to chat models. The max_new_tokens argument sets the maximum length of the response, and through tokenizer.batch_decode() … --local-dir-use-symlinks False. You can then run the model with the following command: ./main -m qwen1_5-7b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt, where -n is the maximum number of tokens to generate. There are other hyperparameters to choose from, and you can run ./main -h to learn about them. 1.4.3 Generate your … on Windows Subsystem for Linux (WSL), run start_wsl.bat. Alternatively, you can install the required dependencies manually in a conda environment; taking macOS as an example: conda create -n textgen python=3.11, conda activate textgen, pip install torch torchvision torchaudio. Next, depending on your operating system, you can run …
56 pages | 835.78 KB | 1 year ago
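A short sketch of the apply_chat_template() usage the excerpt describes (assumes the Hugging Face transformers API; the model name and messages are illustrative):

    from transformers import AutoTokenizer

    # add_generation_prompt=True appends the "<|im_start|>assistant\n" header
    # so the model knows it should start its reply after the ChatML-formatted turns.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about large language models."},
    ]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True)
    print(text)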
Lecture 7: K-Means
Clustering is usually an unsupervised learning problem. Given $N$ unlabeled examples $\{x_1, \cdots, x_N\}$ and a desired number of partitions $K$, the goal is to group the examples into $K$ "homogeneous" partitions.
Problem: given a set of observations $X = \{x_1, x_2, \cdots, x_N\}$ ($x_i \in \mathbb{R}^D$), partition the $N$ observations into $K$ sets ($K \leq N$) $\{C_k\}_{k=1,\cdots,K}$ such that the sets minimize the within-cluster sum of squares. Summing the squared distances over all points defines the K-means "loss function"
$L(\mu, X, Z) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{i,k} \|x_i - \mu_k\|^2 = \|X - Z\mu\|^2$,
where $X$ is $N \times D$, $Z$ is $N \times K$, and $\mu$ is $K \times D$.
Feng Li (SDU), December 28, 2021 | 46 pages | 9.78 MB | 1 year ago
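A compact sketch (not from the slides) of Lloyd's algorithm, which alternates assignment and mean-update steps to reduce this loss; K, the iteration count, and the seed are illustrative:

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centroids
        for _ in range(n_iters):
            # assignment step: z_i = index of the nearest centroid to x_i
            z = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1), axis=1)
            # update step: each centroid becomes the mean of its assigned points
            mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        return mu, z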