AI大模型千问 qwen 中文文档 (Qwen LLM Chinese documentation)
    … AutoTokenizer
    device = "cuda"  # the device to load the model onto
    # Now you do not need to add "trust_remote_code=True"
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen1.5-7B-Chat", torch_dtype="auto" …
    … quantization …
    model_path = "your_model_path"
    quant_path = "your_quantized_model_path"
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
    # Load your tokenizer and …
56 pages | 835.78 KB | 1 year ago
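A minimal sketch of what the snippet above appears to cover — loading Qwen1.5-7B-Chat with Hugging Face transformers and quantizing a checkpoint with the AWQ config shown. The transformers loading part follows the standard API; the AutoAWQ calls, placeholder paths, and device_map choice are illustrative assumptions, not text from the listed document.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda"  # the device to load the model onto
    # Recent Qwen releases no longer need trust_remote_code=True
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen1.5-7B-Chat", torch_dtype="auto", device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

    # AWQ quantization sketch (assumes the AutoAWQ package; API per its usual usage)
    from awq import AutoAWQForCausalLM

    model_path = "your_model_path"              # placeholder paths, as in the snippet
    quant_path = "your_quantized_model_path"
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    awq_model = AutoAWQForCausalLM.from_pretrained(model_path)
    awq_tokenizer = AutoTokenizer.from_pretrained(model_path)
    awq_model.quantize(awq_tokenizer, quant_config=quant_config)  # runs AWQ calibration
    awq_model.save_quantized(quant_path)
    awq_tokenizer.save_pretrained(quant_path)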
【PyTorch深度学习-龙龙老师】-测试版202112 (PyTorch Deep Learning, by Longlong, draft 2021-12)
    … neural networks into the field of reinforcement learning: the DQN algorithm was proposed and reached human-level or better performance on 49 games of the Atari platform; in Go, DeepMind's AlphaGo and AlphaGo Zero successively defeated top human players such as Lee Sedol and Ke Jie; on the multi-agent Dota 2 platform, OpenAI's OpenAI Five beat the TI8 champion team OG in a restricted game setting, demonstrating a great deal of professional-level, high-level play.
    [Figure: timeline of deep-learning milestones, 2006–2019 — DBN (deep belief networks), ImageNet, AlexNet, GAN, VGG/GoogLeNet, ResNet, Batch Normalization, DQN, AlphaGo, AlphaGo Zero, OpenAI Five, Pluribus (Texas hold'em), machine translation]
    … can be trained serially and still give satisfactory results. But deep learning depends heavily on parallel accelerators: most neural networks today are trained on NVIDIA GPUs, Google TPUs, and similar chips. For example, the Go program AlphaGo Zero trained from scratch on 64 GPUs for 40 days before it surpassed every earlier AlphaGo version, and automatic neural-architecture search used 800 GPUs in parallel to find good architectures. At present, ordinary consumer …
439 pages | 29.91 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
    … form of pruning is to zero out a certain, say p, percentage of the smallest absolute valued weights in each training epoch. The result of such a training process is p% weights with zero values. Sparse compressed … have fewer connections. Let's do an exercise to convince ourselves that setting parameter values to zero indeed results in a higher compression ratio. Figure 5-1: An illustration of pruning weights (connections) … compress(). The sparsify_smallest() sets the absolute smallest weights in the input weight matrix to zero. The number of the absolute smallest weights is computed based on the sparsity_rate parameter which …
34 pages | 3.18 MB | 1 year ago
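The excerpt describes magnitude pruning — zeroing out the fraction of weights with the smallest absolute values. A minimal NumPy sketch of that idea; the function name sparsify_smallest and the sparsity_rate parameter follow the excerpt, but the implementation itself is an assumption, not the book's code.

    import numpy as np

    def sparsify_smallest(weights: np.ndarray, sparsity_rate: float) -> np.ndarray:
        """Zero out the `sparsity_rate` fraction of weights with the smallest magnitude."""
        w = weights.copy()
        num_to_zero = int(sparsity_rate * w.size)
        if num_to_zero == 0:
            return w
        # Indices of the smallest-|w| entries in the flattened matrix.
        idx = np.argpartition(np.abs(w).ravel(), num_to_zero - 1)[:num_to_zero]
        w.ravel()[idx] = 0.0
        return w

    w = np.random.randn(4, 4)
    print(sparsify_smallest(w, sparsity_rate=0.5))  # half of the entries become exactly 0

The zeroed matrix compresses better because runs of identical values (zeros) are cheap to encode, and sparse storage formats only keep the non-zero entries.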
Lecture 6: Support Vector Machine
    For now, assume that the entire training data are correctly classified by (ω, b): zero loss on the training examples (non-zero loss later). … Margin Hyperplane: ωᵀx … Karush-Kuhn-Tucker (KKT) Conditions: let ω∗ and (α∗, β∗) be any primal and dual optimal points with zero duality gap (i.e., the strong duality holds); the following conditions should be satisfied — Stationarity: … (Contd.) Most αi's in the solution are zero (sparse solution). According to the KKT conditions, for the optimal αi's, αi [1 − y⁽ⁱ⁾(ωᵀx⁽ⁱ⁾ + b)] = 0, so αi is non-zero only if x⁽ⁱ⁾ lies on one of the two …
82 pages | 773.97 KB | 1 year ago
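For reference, a reconstruction in standard notation of the KKT facts the slide fragment cites, assuming the usual hard-margin SVM with constraints y⁽ⁱ⁾(ωᵀx⁽ⁱ⁾ + b) ≥ 1 (a sketch, not copied from the slides):

    \text{Stationarity:}\qquad \omega^{*} = \sum_i \alpha_i^{*}\, y^{(i)} x^{(i)}, \qquad \sum_i \alpha_i^{*}\, y^{(i)} = 0
    \text{Complementary slackness:}\qquad \alpha_i^{*}\,\bigl(1 - y^{(i)}(\omega^{*\top} x^{(i)} + b^{*})\bigr) = 0

Hence α∗ᵢ > 0 forces y⁽ⁱ⁾(ω∗ᵀx⁽ⁱ⁾ + b∗) = 1, i.e., x⁽ⁱ⁾ sits on one of the two margin hyperplanes — which is the sparsity claim quoted in the snippet.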
全连接神经网络实战. pytorch 版 (Fully connected neural networks in practice, PyTorch edition)
    … and loss
        pred = model(X)
        loss = loss_function(pred, y)
        # Backpropagation
        optimizer.zero_grad()   # zero the gradients
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            loss, current … shape)
    # weight values follow a normal distribution
    m.weight.data.normal_(0.0, 1)
    # biases set to zero
    m.bias.data.zero_()
    Note: the bias is also a weight; because the current layer's bias connects to every neuron of the next layer, the bias shape equals the number of neurons in the next layer. Usage is simple — call it directly after constructing the network object:
        … (m, nn.Linear):
            m.weight.data.normal_(0.0, 1.0)  # .fill_(0.05)
            m.bias.data.zero_()
        def forward(self, x):
            # x = self.flatten(x)
            logits = self.linear_relu_stack …
29 pages | 1.40 MB | 1 year ago
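A small self-contained sketch of the initialization pattern the garbled snippet appears to show — drawing Linear weights from a normal distribution and zeroing the biases via module.apply. The network shape and layer sizes here are illustrative assumptions.

    import torch
    from torch import nn

    class MLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear_relu_stack = nn.Sequential(
                nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
            )

        def forward(self, x):
            return self.linear_relu_stack(x)

    def init_weights(m):
        if isinstance(m, nn.Linear):
            m.weight.data.normal_(0.0, 1.0)  # weights ~ N(0, 1)
            m.bias.data.zero_()              # biases start at zero

    model = MLP()
    model.apply(init_weights)  # applies init_weights to every submodule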
pytorch 入门笔记-03- 神经网络 (PyTorch beginner notes 03 — neural networks)
    … -0.0056, -0.0597, 0.0184, -0.0300]], grad_fn=<SliceBackward0>)
    Zero the gradient buffers of all parameters, then backpropagate with random gradients:
        net.zero_grad()
        out.backward(torch.randn(1, 10))
    Note: torch.nn only supports mini-batch input; the whole torch.nn package accepts batches of samples, not single samples. … Existing gradients must be cleared before the call, otherwise the new gradients are accumulated onto them. Now we call loss.backward() and inspect conv1's bias gradients before and after the backward pass:
        net.zero_grad()
        print('conv1.bias.grad before backward')
        print(net.conv1.bias …
    … lr=0.01)
        # training iteration
        optimizer.zero_grad()                 # zero the gradients
        output = net(input)
        loss = criterion(output, target)      # compute the loss
        loss.backward()                       # backpropagate
        optimizer.step()                      # update the parameters
    Note how optimizer.zero_grad() is used to manually set the gradient buffers to zero.
7 pages | 370.53 KB | 1 year ago
动手学深度学习 v2.0 (Dive into Deep Learning v2.0)
    tensor([True, True, True, True])
    Now let us compute another function of x.
        # By default, PyTorch accumulates gradients, so we need to clear the previous values
        x.grad.zero_()
        y = x.sum()
        y.backward()
        x.grad
        tensor([1., 1., 1., 1.])
    2.5.2 Backpropagation for non-scalar variables: when y is not a scalar, the derivative of the vector y with respect to the vector … but rather the sum of the per-example partial derivatives in the batch.
        # Calling backward on a non-scalar requires a gradient argument that specifies the gradient of the differentiated function w.r.t. self.
        # Here we only want the sum of the partial derivatives, so passing a gradient of ones is appropriate.
        x.grad.zero_()
        y = x * x
        # equivalent to y.backward(torch.ones(len(x)))
        y.sum().backward()
        x.grad
        tensor([0., 2., 4 …
    … any information about how y was computed. In other words, the gradient does not flow backward through u to x. Therefore the backward pass below computes the partial derivative of z = u * x with respect to x, treating u as a constant, rather than the partial derivative of z = x * x * x with respect to x.
        x.grad.zero_()
        y = x * x
        u = y.detach()
797 pages | 29.45 MB | 1 year ago
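The fragments above come from a section on automatic differentiation. A short runnable sketch of the same two points — clearing accumulated gradients with x.grad.zero_() and detaching an intermediate value so it is treated as a constant — assembled here for illustration rather than copied from the book:

    import torch

    x = torch.arange(4.0, requires_grad=True)

    y = (x * x).sum()
    y.backward()
    print(x.grad)       # tensor([0., 2., 4., 6.])

    x.grad.zero_()      # gradients accumulate by default, so clear them first
    y = x * x
    u = y.detach()      # u has the same values as y but no gradient history
    z = (u * x).sum()
    z.backward()
    print(x.grad == u)  # all True: dz/dx = u, because u is treated as a constant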
Experiment 1: Linear Regression
    … efficiency. In your program, scale both types of inputs by their standard deviations and set their means to zero. In Matlab/Octave, this can be executed with
        sigma = std(x);
        mu = mean(x);
        x(:,2) = (x …
    …
        % technically, the first J starts at the zero-eth iteration
        % but Matlab/Octave doesn't have a zero index
        figure;
        plot(0:49, J(1:50), '-')
        xlabel …
7 pages | 428.11 KB | 1 year ago
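A NumPy equivalent of the standardization step described above — scale each feature column by its standard deviation and shift its mean to zero. The sample values are made up for illustration; the original assignment works in Matlab/Octave.

    import numpy as np

    # Column 0 is the intercept term; columns 1+ are the raw features.
    x = np.array([[1.0, 2104.0, 3.0],
                  [1.0, 1600.0, 3.0],
                  [1.0, 2400.0, 4.0]])

    mu = x[:, 1:].mean(axis=0)
    sigma = x[:, 1:].std(axis=0)
    x[:, 1:] = (x[:, 1:] - mu) / sigma   # zero mean, unit standard deviation per feature

    print(x[:, 1:].mean(axis=0))         # approximately [0, 0]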
Machine Learning Pytorch Tutorial
    torch.optim.SGD(model.parameters(), lr, momentum=0)
    ● For every batch of data:
      1. Call optimizer.zero_grad() to reset gradients of model parameters.
      2. Call loss.backward() to backpropagate gradients …
    Loop:
        for epoch in range(n_epochs):
            model.train()
            for x, y in tr_set:
                optimizer.zero_grad()
                x, y = x.to(device), y.to(device)
                pred = model(x)
                loss = criterion(pred …
                … step()
    Slide annotations: iterate n_epochs · set model to train mode · iterate through the dataloader · set gradient to zero · move data to device (cpu/cuda) · forward pass (compute output) · compute loss · compute gradient (backpropagation) …
48 pages | 584.86 KB | 1 year ago
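The tutorial snippet lists the canonical PyTorch training loop; below is a compact, self-contained version of the same steps. The toy linear model, loss, and random data are assumptions added so the sketch runs on its own.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(8, 1).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0)

    tr_set = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)

    n_epochs = 3
    for epoch in range(n_epochs):                 # iterate n_epochs
        model.train()                             # set model to train mode
        for x, y in tr_set:                       # iterate through the dataloader
            optimizer.zero_grad()                 # reset gradients of model parameters
            x, y = x.to(device), y.to(device)     # move data to device (cpu/cuda)
            pred = model(x)                       # forward pass (compute output)
            loss = criterion(pred, y)             # compute loss
            loss.backward()                       # compute gradients (backpropagation)
            optimizer.step()                      # update parameters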
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
    … sample is longer, it is truncated. A shorter sample is padded with null words. The null words have zero word vectors. We will further explain the process of transformation of a sentence to a word vector representation …
        # vector of shape BATCH_SIZE x MAX_SEQ_LEN x WORD2VEC_LEN with zero values
        vector = np.zeros(shape=(len(text), MAX_SEQ_LEN, WORD2VEC_LEN))
        # Fill up zero vector with the actual word vectors from the language …
    … probabilities assigned to every class. Hence the soft-ness as compared to the hard labels, which are one and zero for the correct and incorrect classes respectively. ● The predicted probability for the cat image …
56 pages | 18.93 MB | 1 year ago
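A small sketch of the padding scheme the excerpt describes — allocate a zero-filled batch tensor of shape BATCH_SIZE x MAX_SEQ_LEN x WORD2VEC_LEN and copy in whatever word vectors exist, so shorter samples are implicitly padded with zero vectors and longer ones are truncated. The toy vocabulary and random embedding table are assumptions standing in for a real word2vec model.

    import numpy as np

    MAX_SEQ_LEN = 6
    WORD2VEC_LEN = 4
    vocab = {"the": 0, "cat": 1, "sat": 2}
    embeddings = np.random.randn(len(vocab), WORD2VEC_LEN)  # stand-in for a word2vec table

    def vectorize(text):
        # One zero-filled slot per (sample, position); missing/unknown words stay all-zero.
        vector = np.zeros(shape=(len(text), MAX_SEQ_LEN, WORD2VEC_LEN))
        for i, sample in enumerate(text):
            for j, word in enumerate(sample.split()[:MAX_SEQ_LEN]):  # truncate long samples
                if word in vocab:
                    vector[i, j] = embeddings[vocab[word]]
        return vector

    batch = vectorize(["the cat sat", "the cat"])
    print(batch.shape)  # (2, 6, 4); the second sample's trailing rows are zero vectors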
22 results in total