PyTorch Release Notes
... multi-threaded data loaders, the default shared memory segment size with which the container runs might not be enough. Therefore, you should increase the shared memory size by issuing one of the following commands: ‣ --ipc=host ‣ --shm-size=<memory size> in the command line to docker run --gpus all ... To pull data and model descriptions from locations outside the container for use by PyTorch or ... (FP8) precision on Hopper GPUs which provides better training and inference performance with lower memory utilization. Transformer Engine also includes a collection of highly optimized modules for popular ...
0 points | 365 pages | 2.94 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 4 - Efficient Architectures
... in ANALOG magazine (1991) So far, we have discussed generic techniques which are agnostic to the model architecture. These techniques can be applied in NLP, vision, speech or other domains. However, owing ... challenges. What good is a model that cannot be deployed in practical applications! Efficient Architectures aim to improve model deployability by proposing novel ways to reduce model footprint and improve ... running on mobile and edge devices. We have also set up a couple of programming projects for a hands-on model optimization experience using these efficient layers and architectures. Let’s start our journey with ...
0 points | 53 pages | 3.92 MB | 1 year ago
AI Large Model Qianwen (Qwen) Chinese Documentation
Qwen Team, May 11, 2024. Quick Start; Documentation ... Qwen is the large language model and large multimodal model series of the Qwen Team, Alibaba Group. Now the large language models have been upgraded ... AutoModelForCausalLM, AutoTokenizer device = "cuda" # the device to load the model onto # Now you do not need to add "trust_remote_code=True" model = AutoModelForCausalLM.from_pretrained( "Qwen/Qwen1.5-7B-Chat", ... tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat") # Instead of using model.chat(), we directly use model.generate() # But you need to use tokenizer.apply_chat_template() to format your inputs ...
0 points | 56 pages | 835.78 KB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 1 - Introduction
... learning algorithms help build models, which as the name suggests is an approximate mathematical model of what outputs correspond to a given input. To illustrate, when you visit Netflix’s homepage, the ... might be popular with other users too. If we train a model to predict the probability based on your behavior and currently trending content, the model will assign a high probability to Seinfeld. While there ... the performance of the model scaled well with the number of labeled examples, since the network had a large number of parameters. Thus to extract the most out of the setup, the model needed a large number ...
0 points | 21 pages | 3.17 MB | 1 year ago
[PyTorch Deep Learning - Teacher Longlong] - Beta Version 202112
... References. Chapter 15: Custom Datasets. 15.1 The Pokémon Dataset; 15.2 Custom Dataset Loading Workflow; 15.3 Pokémon Dataset in Practice; 15.4 Transfer Learning; 15.5 Saved_model; 15.6 Model Deployment; 15.7 References. Preview edition 202112. Introduction to Artificial Intelligence. "What we want is a machine that can learn from experience." - Alan Turing. 1.1 ... container makes it very convenient to build multi-layer networks. For a 3-layer network, we can quickly build it through ... # Wrap 3 network layers in a Sequential container; by default, the output of each layer is the input of the next model = nn.Sequential( # Create the first layer, with 784 inputs and 256 outputs nn.Linear(28*28, 256), nn.ReLU(), # activation function ... ) The first layer is designed with 256 output nodes, the second layer with 128, and the output layer with 10. Calling the model object directly as model(x) returns the output of the model's last layer. 3.8.2 Model Training. After building the 3-layer neural network object, given an input x, we call model(x) to obtain the model output and then compute the current error ℒ with the F.mse_loss loss function: # Create the optimizer and pass in the list of parameters to optimize: [w1 ...
0 points | 439 pages | 29.91 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 2 - Compression Techniques
... compression techniques. Compression techniques aim to reduce the model footprint (size, latency, memory etc.). We can reduce the model footprint by reducing the number of trainable parameters. However ... requires many trials and evaluations to reach a smaller model, if it is at all possible. Second, such an approach doesn’t generalize well because the model designs are subjective to the specific problem. In this chapter, we introduce Quantization, a model compression technique that addresses both these issues. We’ll start with a gentle introduction to the idea of compression. Details of quantization and ...
0 points | 33 pages | 1.96 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 5 - Advanced Compression Techniques
... Can we optimally prune the network connections, remove extraneous nodes, etc. while retaining the model’s performance? In this chapter we introduce the intuition behind sparsity, different possible methods of picking the connections and nodes to prune, and how to prune a given deep learning model to achieve storage and latency gains with a minimal performance tradeoff. Next, the chapter goes over weight ... learn about these techniques together! Model Compression Using Sparsity. Sparsity or Pruning refers to the technique of removing (pruning) weights during the model training to achieve smaller models. Such ...
0 points | 34 pages | 3.18 MB | 1 year ago
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques
... you'll go.” - Dr. Seuss. Model quality is an important benchmark to evaluate the performance of a deep learning model. A language translation application that uses a low quality model would struggle with consumer ... effectively with others who speak different languages. An application that employs a high quality model with a reasonable translation accuracy would garner better consumer support. In this chapter, our ... picked to benchmark learning techniques. It is followed by a short discussion on exchanging model quality and model footprint. An in-depth discussion of data augmentation and distillation follows right after ...
0 points | 56 pages | 18.93 MB | 1 year ago
Amazon AWS AI Services Overview
... memory (with memory access bandwidth of up to 240 GB/s), and 2,496 parallel processing cores. Instance Name | GPU Count | vCPU Count | Memory | Parallel Processing Cores | GPU Memory | Network Performance; p2.xlarge | 1 | 4 | 61 GiB | 2,496 | 12 GiB | High; p2.8xlarge | 8 | 32 ... Understanding Book Flight London Utterances Flight booking London Heathrow Intent / Slot model London Heathrow Origin Destination Departure Date Flight Booking “Book a flight to London” ... Understanding Book Flight London Utterances Flight booking London Heathrow Intent / Slot model London Heathrow Location Location Seattle Origin Destination Departure Date Flight Booking ...
0 points | 56 pages | 4.97 MB | 1 year ago
Dive into Deep Learning (动手学深度学习) v2.0
... determine the current "best parameter set", the parameters that achieve the best performance on the task according to some performance measure. So what exactly is a parameter? Parameters can be thought of as knobs; turning them adjusts the behavior of the program. A program with any given setting of its parameters is called a model. The set of all distinct programs (input-output mappings) that can be produced by manipulating the parameters is called a "family of models". The meta-program that uses a dataset to choose the parameters is called a learning algorithm. Before we start solving problems with machine learning algorithms ... analyzed in more detail. 1.2 Key Components of Machine Learning. First, we introduce some core components. Whatever the type of machine learning problem, you will encounter these components: 1. the data we can learn from; 2. a model of how to transform the data; 3. an objective function that quantifies how effective the model is; 4. an algorithm that adjusts the model's parameters to optimize the objective function. ... b. Whatever means we use to observe the features X and labels y, a small amount of observation error may occur. Therefore, even when we are confident that the underlying relationship between features and labels is linear, we add a noise term to account for the effect of observation error. Before we start searching for the best model parameters w and b, we need two more things: (1) a measure of model quality; (2) a method for updating the model to improve the quality of its predictions. Loss function. Before we start thinking about how to fit the model to the data ...
0 points | 797 pages | 29.45 MB | 1 year ago
69 results in total