Trends – Artificial Intelligence
User + usage + CapEx growth = unprecedented. AI model compute costs are high and rising while inference costs per token are falling, so performance is converging and developer usage is rising. Breakthroughs in large models, cost-per-token declines, open-source proliferation, and chip performance improvements are making new tech advances increasingly more powerful, accessible, and economically competitive.
[Chart residue, p. 293: USA LLM #1 / USA LLM #2 / China comparison]
(340 pages, 12.14 MB, 5 months ago)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
…stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality … even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V2.
[Figure residue: MMLU performance vs. activated parameters (billions) for DeepSeek-V2, DeepSeek 67B, LLaMA 1 33B/65B, LLaMA 2 13B/34B/…]
(52 pages, 1.23 MB, 1 year ago)
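The 93.3% KV-cache reduction cited in the abstract is easiest to appreciate with back-of-the-envelope arithmetic. Below is a minimal sketch of per-token KV-cache size for a standard multi-head-attention transformer; the layer count, head count, head dimension, and fp16 element size are illustrative assumptions, not DeepSeek-V2's actual configuration:

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Per-token KV-cache size for standard multi-head attention:
    2 tensors (K and V) x layers x KV heads x head dim x bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Hypothetical model: 60 layers, 64 KV heads of dim 128, fp16 (2 bytes).
full = kv_cache_bytes_per_token(60, 64, 128)

# A scheme that caches a compressed per-layer latent instead of full K/V
# (the idea behind latent-attention compression) shrinks this dramatically;
# a 93.3% reduction, as quoted in the abstract, would leave:
reduced = full * (1 - 0.933)
print(full, int(reduced))
```

At long context lengths this cache, not the model weights, dominates serving memory, which is why compressing it directly raises the maximum generation throughput.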
Google, Prompt Engineering (v7)
…up the LLM to predict the right sequence of tokens. Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs. This process involves tinkering to find … a low softmax temperature (T), emphasizing a single preferred token with high certainty. A higher Gemini temperature setting is like a high softmax temperature, making a wider range of tokens around the top choice acceptable. … irrelevant: the most probable token becomes the next token predicted. If you set temperature extremely high (above 1, generally into the 10s), temperature becomes irrelevant and whatever tokens make it through …
(68 pages, 6.50 MB, 6 months ago)
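The softmax-temperature analogy in the excerpt can be made concrete. A minimal sketch (the logit values are made up for illustration) showing that a low temperature piles probability onto the most likely token, while a very high temperature flattens the distribution until the choice is essentially random:

```python
import math

def softmax_with_temperature(logits, t):
    """Scale logits by 1/t before softmax: t -> 0 approaches greedy argmax,
    very large t approaches a uniform distribution."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                      # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.1)  # mass piles onto the top token
hot = softmax_with_temperature(logits, 10.0)  # nearly uniform: "irrelevant"
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

With temperature in the tens, all candidate tokens end up with nearly equal probability, which is why the document says whatever tokens survive top-K/top-P filtering are then effectively sampled at random.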
OpenAI, A Practical Guide to Building Agents
…and automate workflows, agents are able to perform the same workflows on users' behalf with a high degree of independence. Agents are systems that independently accomplish tasks on your behalf. … is to build your agent prototype with the most capable model for every task to establish a performance baseline. From there, try swapping in smaller models to see if they still achieve acceptable … In summary, the principles for choosing a model are simple:
01 Set up evals to establish a performance baseline.
02 Focus on meeting your accuracy target with the best models available.
03 Optimize for …
(34 pages, 7.00 MB, 6 months ago)
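The numbered principles amount to a simple selection loop: establish a baseline with the best model, then accept the cheapest model that still meets the accuracy target. A minimal sketch; the model names, eval scores, and `pick_model` helper are hypothetical stand-ins, not part of the OpenAI guide:

```python
def pick_model(models, evaluate, accuracy_target):
    """models: names ordered from most to least capable (and costly).
    evaluate: callable returning an accuracy in [0, 1] for a model name.
    Returns the smallest/cheapest model still meeting the target."""
    baseline = evaluate(models[0])           # principle 01: baseline with the best
    target = min(accuracy_target, baseline)  # can't demand more than the best achieves
    chosen = models[0]
    for m in models[1:]:                     # principle 03: optimize cost/latency
        if evaluate(m) >= target:
            chosen = m                       # smaller model is still acceptable
        else:
            break                            # stop at the first model that misses
    return chosen

# Hypothetical eval scores, for illustration only.
scores = {"large": 0.95, "medium": 0.92, "small": 0.78}
best = pick_model(["large", "medium", "small"], scores.get, accuracy_target=0.90)
print(best)  # "medium": meets the 0.90 target, while "small" does not
```

Starting from the most capable model avoids prematurely capping the agent's abilities, and the eval scores show exactly where smaller models stop being acceptable.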
Tsinghua University, DeepSeek + DeepResearch: Making Research as Simple as Chatting
…energy storage technologies allow intermittent renewable energy to replace traditional energy. High-performance secondary batteries are among the most promising candidates for large-scale energy storage devices, given their high output voltage, high energy density, and long cycle life. To meet the strong demand for further improving electrochemical performance, the search for sustainable materials that give lithium-ion batteries safe and stable cycling performance while providing high capacity and high-voltage curves has sparked in-depth research and discussion. As a promising …
(85 pages, 8.31 MB, 8 months ago)
OpenAI, AI in the Enterprise
We're seeing AI deliver significant, measurable improvements on three fronts:
01 Workforce performance: helping people deliver higher-quality outputs in shorter time frames.
02 Automating routine …
… product improvements. That means shipping updates regularly, getting feedback, and improving performance and safety at every step. The result: users access new advancements in AI early and often. … Set bold automation goals: most processes involve a lot of rote work, ripe for automation. Aim high. Let's drill down into each of these, with customer stories as examples.
(25 pages, 9.48 MB, 5 months ago)
XDNN TVM (Nov 2019)
DNN-specific instruction set: convolution, max pool, etc. Any network, any image size. High frequency and high compute efficiency. Supported on U200 (3 instances), U250 (4 instances), Amazon F1. ~1536 … c_char_p(graph_path.value).value; layout = c_char_p(output_layout.value).value … Performance pipelines: references to our latest results at https://github.com/Xilinx/AI-Model-Zoo (embedded … Measurements we track: latency and throughput. The ML pipeline contains multiple stages, and performance is limited by the slowest one. Performance results are based on Xilinx's own runtime pipeline, available on GitHub (https://github…
(16 pages, 3.35 MB, 5 months ago)
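The point that an ML pipeline's performance is "limited by the slowest stage" is worth a quick sketch. Assuming the stages run fully pipelined over a stream of inputs, steady-state throughput is set by the slowest stage while single-input latency is the sum of all stages; the stage times below are made-up placeholders, not Xilinx measurements:

```python
def pipeline_metrics(stage_ms):
    """For a fully pipelined chain of stages (times in milliseconds):
    latency = sum of all stages; throughput = 1 / slowest stage."""
    latency_ms = sum(stage_ms)
    bottleneck_ms = max(stage_ms)
    throughput_fps = 1000.0 / bottleneck_ms
    return latency_ms, throughput_fps

# Hypothetical stages: pre-processing, FPGA inference, post-processing.
latency, fps = pipeline_metrics([2.0, 4.0, 1.0])
print(latency, fps)  # 7.0 ms latency; 250 images/s, limited by the 4 ms stage
```

This is why speeding up a non-bottleneck stage improves latency but leaves throughput unchanged: only the slowest stage governs sustained images per second.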
TVM: Where Are We Going
…optimized. Open-source, automated end-to-end optimization framework for deep learning. TVM stack: high-level differentiable IR; tensor expression and optimization search space; LLVM, CUDA, Metal; VTA; edge FPGA; cloud FPGA; ASIC; AutoTVM; device fleet. Existing deep learning frameworks: a high-level data flow graph over hardware-primitive tensor operators such as Conv2D (e.g. cuDNN); offload to … engineering intensive. Machine-learning-based program optimizer. TVM: a learning-based learning system; high-level data flow graph and optimizations; directly generate optimized programs for new operator workloads.
(31 pages, 22.64 MB, 5 months ago)
Facebook, TVM AWS Meetup Talk
TVM at Facebook: lots of contributors at FB and elsewhere. Why TVM? Performance matters a lot; heterogeneous computing environment; high variety of workloads; an ever-increasing set of primitives (over 500 aten kernels); interpreter methods not delivering generalized performance. TVM for speech synthesis: WaveRNN-style model architecture; autoregressive sampling net running at faster …
(11 pages, 3.08 MB, 5 months ago)
OctoML OSS (2019-11-08)
…Transformer improvements (TensorFlow): improve scheduling of batch matrix multiplies; early autotuning templates improve performance by ~20%. What we're working on: … this prevents most compute layers from being fused. Reshape …
(16 pages, 1.77 MB, 5 months ago)
17 results in total.