动手学深度学习 v2.0

Help on built-in function ones in module torch:

ones(...)
    ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor

    Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size.
    …
    layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.
    device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
    requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

    Example:: …
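For reference, a minimal sketch of the device and dtype arguments described in the help text above; the shapes and dtype choices are arbitrary, and the CUDA branch only runs when a GPU is present:

import torch

# All-ones tensor on the CPU with the default dtype (float32).
a = torch.ones(2, 3)
print(a.device, a.dtype)  # cpu torch.float32

# All-ones tensor created directly on the GPU, if one is available.
if torch.cuda.is_available():
    b = torch.ones(2, 3, device="cuda:0", dtype=torch.float16)
    print(b.device, b.dtype)  # cuda:0 torch.float16
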
AI大模型千问 qwen 中文文档

… the model, using Qwen1.5-7B-Chat as the example:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Now you do not need to add "trust_remote_code=True"
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

# Instead of using model.chat(), we directly use model.generate().
…
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Directly use generate() and tokenizer.decode() to get the output.
# Use `max_new_tokens` to control the maximum output length.
…
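To round out the truncated snippet, a minimal sketch of the generation and decoding step the final comments allude to; it assumes the model, tokenizer, and model_inputs from the excerpt above, and max_new_tokens=512 is an illustrative choice:

generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)

# Keep only the newly generated tokens (strip the prompt part of each sequence).
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
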
Machine Learning Pytorch Tutorial

Tensors – Device
● Tensors & modules will be computed with CPU by default. Use .to() to move tensors to the appropriate device.
● CPU: x = x.to('cpu')
● GPU: x = x.to('cuda')

Tensors – Device (GPU) …

model = MyModel().to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), 0.1)
# read data via MyDataset; put dataset into DataLoader;
# construct model and move to device (cpu/cuda)

model.train()
for x, y in tr_set:
    optimizer.zero_grad()
    x, y = x.to(device), y.to(device)
    pred = model(x)
    loss = criterion(pred, y)
    loss.backward()
…
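A self-contained sketch of the device-agnostic training pattern the slides above outline, with a toy model and random data standing in for MyModel and MyDataset (the shapes and learning rate are arbitrary):

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(8, 1).to(device)                      # toy stand-in for MyModel
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 8)   # one random mini-batch
y = torch.randn(16, 1)

model.train()
optimizer.zero_grad()
x, y = x.to(device), y.to(device)   # the batch must live on the same device as the model
loss = criterion(model(x), y)
loss.backward()
optimizer.step()                    # the update step that follows loss.backward()
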
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction

… expenditure on their data centers, hence any efficiency gains are very significant.

Enabling On-Device Deployment
With the advent of smartphones, Internet-of-Things (IoT) devices (refer to Figure 1-5) …

… hence there is a need for on-device ML models (where the model inference happens directly on the device), which makes it imperative to optimize the models for the device they will run on.

Privacy & Data …
… less data collection is required. Similarly, enabling on-device models would imply that the model inference can be run completely on the user's device, without the need to send the input data to the server side.
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures

… can be a bottleneck if the model is going to be deployed on-device (smartphones, IoT devices, etc.), where transmitting the model to the device is limited by the user's bandwidth and by the memory available …

… a smaller vocabulary, and see if the resulting quality is within the acceptable parameters. For on-device models, TFLite offers post-training quantization as described in chapter 2. We could also incorporate …

… shape (N, d). pQRNN demonstrated a model 140x smaller than an LSTM with pre-trained embeddings. An on-device friendly implementation of pQRNN is available in the TensorFlow repository here. We learnt about …
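Since the excerpt mentions TFLite post-training quantization only in passing, here is a minimal sketch of what that conversion step usually looks like; the Keras model below is a placeholder, not a model from the book:

import tensorflow as tf

# Placeholder model standing in for whatever model is being shrunk for on-device use.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Post-training (dynamic-range) quantization applied during conversion to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
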
【PyTorch深度学习-龙龙老师】-测试版202112

print(n, cpu_a.device, cpu_b.device)
# create the two matrices to be computed on the GPU
gpu_a = torch.randn([1, n]).cuda()
gpu_b = torch.randn([n, 1]).cuda()
print(n, gpu_a.device, gpu_b.device)
Next we implement the CPU and …

print(x)  # print the tensor
print(x.shape, x.device, x.dtype)  # print the shape, device and precision
Out[2]:
tensor([1.0000, 2.0000, 3.3000])
torch.Size([3]), cpu, torch.float32
Here the shape attribute is the shape of the tensor, the device attribute is the name of the device the tensor lives on, and the dtype attribute is its numerical precision; a tensor …

… by default GPU memory is allocated on demand, and the amount currently allocated can be queried with the torch.cuda.memory_allocated function, as follows:
# total memory of GPU 0
t = torch.cuda.get_device_properties(0).total_memory
# reserved (cached) memory
r = torch.cuda.memory_reserved(0)
# currently allocated memory
a = torch.cuda.memory_allocated(0)
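The first snippet above sets up a CPU versus GPU matrix-multiplication comparison; a minimal sketch of one way to time it (not the book's exact code, and n and the repeat count are arbitrary):

import timeit
import torch

n = 10000
cpu_a = torch.randn([1, n])
cpu_b = torch.randn([n, 1])
cpu_time = timeit.timeit(lambda: torch.matmul(cpu_a, cpu_b), number=10)
print(f"cpu: {cpu_time:.6f}s")

if torch.cuda.is_available():
    gpu_a = torch.randn([1, n]).cuda()
    gpu_b = torch.randn([n, 1]).cuda()

    def gpu_run():
        torch.matmul(gpu_a, gpu_b)
        torch.cuda.synchronize()  # wait for the kernel, otherwise the timing is misleading

    gpu_run()  # warm-up run
    gpu_time = timeit.timeit(gpu_run, number=10)
    print(f"gpu: {gpu_time:.6f}s")
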
PyTorch OpenVINO 开发实战系列教程第一篇

for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))
if gpu:
    print(x.cuda())
    y = torch.tensor([1, 2, 3, 4], device="cuda:0")
    print("y:", y)

… 1050 Ti
tensor([[ 2.,  3.,  4., 12.],
        [ 3.,  5.,  8.,  1.]], device='cuda:0')
y: tensor([1, 2, 3, 4], device='cuda:0')

Here x is CPU data by default, while y is created directly as GPU data. The above are some of the most basic and most frequently used PyTorch operations; …
全连接神经网络实战. pytorch 版

… to train the network. First, we define the device used to train the network:

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
# move the network model onto cuda
model = NeuralNetwork().to(device)
print(model)
If cuda …

… 'model' + str(9) + '.pth'
checkpoint = torch.load(path)
model2 = NeuralNetwork().to(device)
model2.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint …

… the bias connects to every neuron in the next layer, so the shape of bias is the number of neurons in the next layer. Calling it is also simple: after constructing the network object, call it directly:

model = NeuralNetwork().to(device)
model.weight_init()

We start training and find that accuracy already reaches 78% after the first epoch, and the final result reaches about 81%, which shows that initializing the weights sensibly is of great importance.
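The middle snippet above restores model and optimizer state from a checkpoint dictionary; for context, a self-contained sketch of the matching save/load pattern. The 'model' key is visible in the excerpt, while the 'optimizer' key, the toy model, and the file name below are assumptions for illustration:

import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(4, 2).to(device)                       # toy stand-in for NeuralNetwork()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save: bundle model and optimizer state into one dictionary
# (key names are assumed, except 'model' which appears in the excerpt).
path = 'model' + str(9) + '.pth'                         # hypothetical file name
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, path)

# Load: restore both states, mapping tensors onto the current device.
checkpoint = torch.load(path, map_location=device)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
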
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques

… home-automation device. Figure 3-4 shows the high-level workflow of such a device. The model continuously classifies audio signals into one of the four classes, three of which are the keywords that the device will …

… absence of an acceptable keyword in the input signal.

Figure 3-4: Workflow of a home-automation device which detects three spoken words: hello, weather and time. The output is none when none of the three …
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques

… shown in table 2-1.

Footprint Metrics
● Model Size
● Inference Latency on Target Device
● Training Time for Convergence
● Peak RAM Consumption

Quality Metrics
● Accuracy
● Precision
● Recall
● F1

… a model is useful if we want to deploy a model in a space-constrained environment like a mobile device. To summarize, compression techniques help to achieve an efficient representation of a layer or …

… or cheques using a deep learning system. We are targeting this system to run on a low-end Android device. The resource limitations are under 50 KB of model size and an upper limit of 1 millisecond per prediction …
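For reference, a minimal sketch of how two of the footprint metrics listed above (model size and inference latency on the target device) might be measured for a PyTorch model; the toy model, file name, batch shape, and repeat counts are placeholders:

import os
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Model size: serialize the weights and check the file size on disk.
torch.save(model.state_dict(), "model.pt")
print(f"model size: {os.path.getsize('model.pt') / 1024:.1f} KB")

# Inference latency: average wall-clock time per prediction on this device.
x = torch.randn(1, 64)
with torch.no_grad():
    for _ in range(10):                       # warm-up runs
        model(x)
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    latency_ms = (time.perf_counter() - start) / 100 * 1000
print(f"latency: {latency_ms:.3f} ms per prediction")
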













