Trends – Artificial Intelligence
User + usage + CapEx growth = unprecedented. AI model compute costs are high and rising while inference costs per token are falling, so performance is converging and developer usage is rising. Breakthroughs in large models, cost-per-token declines, open-source proliferation, and chip performance improvements are making new tech advances increasingly more powerful, accessible, and economically competitive.
[Chart residue, p. 293: USA LLM #1 / USA LLM #2 / China comparison]
(340 pages, 12.14 MB, 5 months ago)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
…stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality … even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V2.
[Figure residue: MMLU performance vs. activated parameters (billions) for DeepSeek-V2, DeepSeek 67B, LLaMA 1 33B/65B, LLaMA 2 13B/34B/…]
(52 pages, 1.23 MB, 1 year ago)
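The 93.3% KV-cache reduction cited in the abstract is easiest to appreciate with back-of-the-envelope arithmetic. Below is a minimal sketch of per-token KV-cache size for a standard multi-head-attention transformer; the layer count, head count, head dimension, and fp16 element size are illustrative assumptions, not DeepSeek-V2's actual configuration:

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Per-token KV-cache size for standard multi-head attention:
    2 tensors (K and V) x layers x KV heads x head dim x bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Hypothetical model: 60 layers, 64 KV heads of dim 128, fp16 (2 bytes).
full = kv_cache_bytes_per_token(60, 64, 128)

# A scheme that caches a compressed per-layer latent instead of full K/V
# (the idea behind latent-attention compression) shrinks this dramatically;
# a 93.3% reduction, as quoted in the abstract, would leave:
reduced = full * (1 - 0.933)
print(full, int(reduced))
```

At long context lengths this cache, not the model weights, dominates serving memory, which is why compressing it directly raises the maximum generation throughput.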
Google, Prompt Engineering (v7)
…up the LLM to predict the right sequence of tokens. Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs. This process involves tinkering to find … a low softmax temperature (T), emphasizing a single preferred token with high certainty. A higher Gemini temperature setting is like a high softmax temperature, making a wider range of tokens around the top choice acceptable. … irrelevant: the most probable token becomes the next token predicted. If you set temperature extremely high (above 1, generally into the 10s), temperature becomes irrelevant and whatever tokens make it through …
(68 pages, 6.50 MB, 6 months ago)
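The softmax-temperature analogy in the excerpt can be made concrete. A minimal sketch (the logit values are made up for illustration) showing that a low temperature piles probability onto the most likely token, while a very high temperature flattens the distribution until the choice is essentially random:

```python
import math

def softmax_with_temperature(logits, t):
    """Scale logits by 1/t before softmax: t -> 0 approaches greedy argmax,
    very large t approaches a uniform distribution."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                      # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.1)  # mass piles onto the top token
hot = softmax_with_temperature(logits, 10.0)  # nearly uniform: "irrelevant"
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

With temperature in the tens, all candidate tokens end up with nearly equal probability, which is why the document says whatever tokens survive top-K/top-P filtering are then effectively sampled at random.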
OpenAI, A Practical Guide to Building Agents
…and automate workflows, agents are able to perform the same workflows on users' behalf with a high degree of independence. Agents are systems that independently accomplish tasks on your behalf. … is to build your agent prototype with the most capable model for every task to establish a performance baseline. From there, try swapping in smaller models to see if they still achieve acceptable … In summary, the principles for choosing a model are simple:
01 Set up evals to establish a performance baseline.
02 Focus on meeting your accuracy target with the best models available.
03 Optimize for …
(34 pages, 7.00 MB, 6 months ago)
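The numbered principles amount to a simple selection loop: establish a baseline with the best model, then accept the cheapest model that still meets the accuracy target. A minimal sketch; the model names, eval scores, and `pick_model` helper are hypothetical stand-ins, not part of the OpenAI guide:

```python
def pick_model(models, evaluate, accuracy_target):
    """models: names ordered from most to least capable (and costly).
    evaluate: callable returning an accuracy in [0, 1] for a model name.
    Returns the smallest/cheapest model still meeting the target."""
    baseline = evaluate(models[0])           # principle 01: baseline with the best
    target = min(accuracy_target, baseline)  # can't demand more than the best achieves
    chosen = models[0]
    for m in models[1:]:                     # principle 03: optimize cost/latency
        if evaluate(m) >= target:
            chosen = m                       # smaller model is still acceptable
        else:
            break                            # stop at the first model that misses
    return chosen

# Hypothetical eval scores, for illustration only.
scores = {"large": 0.95, "medium": 0.92, "small": 0.78}
best = pick_model(["large", "medium", "small"], scores.get, accuracy_target=0.90)
print(best)  # "medium": meets the 0.90 target, while "small" does not
```

Starting from the most capable model avoids prematurely capping the agent's abilities, and the eval scores show exactly where smaller models stop being acceptable.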
Tsinghua University, DeepSeek + DeepResearch: Making Research as Simple as Chatting
…energy storage technologies allow intermittent renewable energy to replace traditional energy. High-performance secondary batteries are among the most promising candidates for large-scale energy storage devices, given their high output voltage, high energy density, and long cycle life. To meet the strong demand for further improving electrochemical performance, the search for sustainable materials that give lithium-ion batteries safe and stable cycling performance while providing high capacity and high-voltage curves has sparked in-depth research and discussion. As a promising …
(85 pages, 8.31 MB, 8 months ago)
OpenAI, AI in the Enterprise
We're seeing AI deliver significant, measurable improvements on three fronts:
01 Workforce performance: helping people deliver higher-quality outputs in shorter time frames.
02 Automating routine …
… product improvements. That means shipping updates regularly, getting feedback, and improving performance and safety at every step. The result: users access new advancements in AI early and often. … Set bold automation goals: most processes involve a lot of rote work, ripe for automation. Aim high. Let's drill down into each of these, with customer stories as examples.
(25 pages, 9.48 MB, 5 months ago)
XDNN TVM (Nov 2019)
DNN-specific instruction set: convolution, max pool, etc. Any network, any image size. High frequency and high compute efficiency. Supported on U200 (3 instances), U250 (4 instances), Amazon F1. ~1536 … c_char_p(graph_path.value).value; layout = c_char_p(output_layout.value).value … Performance pipelines: references to our latest results at https://github.com/Xilinx/AI-Model-Zoo (embedded … Measurements we track: latency and throughput. The ML pipeline contains multiple stages, and performance is limited by the slowest one. Performance results are based on Xilinx's own runtime pipeline, available on GitHub (https://github…
(16 pages, 3.35 MB, 5 months ago)
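The point that an ML pipeline's performance is "limited by the slowest stage" is worth a quick sketch. Assuming the stages run fully pipelined over a stream of inputs, steady-state throughput is set by the slowest stage while single-input latency is the sum of all stages; the stage times below are made-up placeholders, not Xilinx measurements:

```python
def pipeline_metrics(stage_ms):
    """For a fully pipelined chain of stages (times in milliseconds):
    latency = sum of all stages; throughput = 1 / slowest stage."""
    latency_ms = sum(stage_ms)
    bottleneck_ms = max(stage_ms)
    throughput_fps = 1000.0 / bottleneck_ms
    return latency_ms, throughput_fps

# Hypothetical stages: pre-processing, FPGA inference, post-processing.
latency, fps = pipeline_metrics([2.0, 4.0, 1.0])
print(latency, fps)  # 7.0 ms latency; 250 images/s, limited by the 4 ms stage
```

This is why speeding up a non-bottleneck stage improves latency but leaves throughput unchanged: only the slowest stage governs sustained images per second.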
TVM: Where Are We Going
…optimized. Open-source, automated end-to-end optimization framework for deep learning. TVM stack: high-level differentiable IR; tensor expression and optimization search space; LLVM, CUDA, Metal; VTA; edge FPGA; cloud FPGA; ASIC; AutoTVM; device fleet. Existing deep learning frameworks: a high-level data flow graph over hardware-primitive tensor operators such as Conv2D (e.g. cuDNN); offload to … engineering intensive. Machine-learning-based program optimizer. TVM: a learning-based learning system; high-level data flow graph and optimizations; directly generate optimized programs for new operator workloads.
(31 pages, 22.64 MB, 5 months ago)
Facebook, TVM AWS Meetup Talk
TVM at Facebook: lots of contributors at FB and elsewhere. Why TVM? Performance matters a lot; heterogeneous computing environment; high variety of workloads; an ever-increasing set of primitives (over 500 aten kernels); interpreter methods not delivering generalized performance. TVM for speech synthesis: WaveRNN-style model architecture; autoregressive sampling net running at faster …
(11 pages, 3.08 MB, 5 months ago)
OctoML OSS (2019-11-08)
…Transformer improvements (TensorFlow): improve scheduling of batch matrix multiplies; early autotuning templates improve performance by ~20%. What we're working on: … this prevents most compute layers from being fused. Reshape …
(16 pages, 1.77 MB, 5 months ago)
17 results in total.