scaling - IT文库: free downloads of e-books and documents on programming, IT, and the internet, to keep your coding power at full strength!


[Zhou Hongyi's Tsinghua Lecture] The Entrepreneurial Opportunities DeepSeek Brings Us - 360 Zhou Hongyi - 202502

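Must-read for government, enterprises, and entrepreneurs. Our ten predictions about large-model development trends before DeepSeek appeared ... Prediction one of ten: traditional AGI development is slowing down and new directions are needed
• Diminishing marginal returns of the Scaling Law
• Human training data is close to exhaustion
• Synthetic data cannot create new knowledge
• Reasoning ability is hard to generalize and costly
AI that comprehensively surpasses humans is logically untenable ... Four modes of disruptive innovation ... DeepSeek-R1 broke through the large-model Scaling Law bottleneck. That bottleneck had led pessimists to argue that large-model capabilities could not improve qualitatively any further. DeepSeek-R1 opened a new reinforcement learning paradigm, shifting from the pre-training Scaling Law to a reinforcement learning Scaling Law.
Diminishing marginal returns of the pre-training Scaling Law (big data + big parameters + big compute)
• Human-constructed training data has reached its ceiling
• Beyond the trillion-parameter scale, continuing to increase the parameter scale ... training compute cost and engineering difficulty rise sharply
Reinforcement learning Scaling Law
• Use synthetic data to address data exhaustion
• Use self-play reinforcement learning to greatly improve complex reasoning without increasing the parameter count
• Use post-training and inference compute to greatly improve model performance without increasing pre-training compute
DeepSeek's disruptive innovation: technical innovation
• Pre-training models such as GPT: reading voraciously and accumulating knowledge, but the Scaling Law hits a wall
• Pre-training models do not think deeply enough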

0 credits | 76 pages | 5.02 MB | 5 months ago
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

we employ additional RMS Norm layers after the compressed latent vectors, and multiply additional scaling factors at the width bottlenecks (i.e., the compressed latent vectors and the intermediate hidden ... Slightly diverging from original YaRN, due to our distinct attention mechanism, we adjust the length scaling factor to modulate the attention entropy. The factor $\sqrt{t}$ is computed as $\sqrt{t} = 0.0707 \ln s + 1$, ... intelligence. • In our ongoing exploration, we are dedicated to devising methods that enable further scaling up MoE models while maintaining economical training and inference costs. The goal of our next step
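
The length scaling factor quoted in this excerpt can be evaluated directly. The sketch below is only an illustration of that one formula: the function name `yarn_sqrt_t` is hypothetical, `s` is taken to be the length scale that appears inside the logarithm, and the value 40 used in the example is an arbitrary illustrative input rather than something stated in the excerpt.

```python
import math


def yarn_sqrt_t(s: float) -> float:
    """Evaluate the quoted factor sqrt(t) = 0.0707 * ln(s) + 1.

    `s` is assumed to be the length scale inside the logarithm;
    the function name is hypothetical.
    """
    if s <= 0:
        raise ValueError("s must be positive so that ln(s) is defined")
    return 0.0707 * math.log(s) + 1.0


if __name__ == "__main__":
    # With s = 1 the logarithm vanishes, so the factor is exactly 1.
    print(yarn_sqrt_t(1.0))
    # An arbitrary illustrative scale; the factor grows only logarithmically.
    print(round(yarn_sqrt_t(40.0), 4))
```

Because the dependence on `s` is logarithmic, even a large change in `s` moves the factor only slightly above 1.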

0 credits | 52 pages | 1.23 MB | 1 year ago
PAI & TVM Meetup - Shanghai 20191116

threadIdx.y/warpDim.y*warpDim.y ... warpDim.y = 32/warpDim.x = 32/blockDim.x ... Loop scaling ... No need to modify or add any line of code. Computing Platform Business Unit (COMPUTING PLATFORM)
Loss Scaling in TF:
    loss = loss_fn()
    opt = tf.AdamOptimizer(learning_rate=...)
    # minimize() on the loss scale optimizer.
    train_op = loss_scale_optimizer.minimize(loss)
Loss Scaling in PAI-TF ... Loss Scaling: scaling the loss using S ... backward propagation in MP ... gradients ... unscaled gradients
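
The loss-scaling steps named in this excerpt (scale the loss by S, backpropagate in reduced precision, unscale the gradients) can be illustrated without any deep learning framework. The sketch below is not the PAI-TF or TensorFlow implementation: it simulates half precision with Python's `struct` format code `e`, and the helper names `to_fp16` and `backward_fp16` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def backward_fp16(true_grad: float, loss_scale: float = 1.0) -> float:
    """Simulate one gradient value passing through an fp16 backward pass.

    Scaling the loss by `loss_scale` scales every gradient by the same
    factor (chain rule), so the scaled gradient is what gets stored in fp16.
    The result is then unscaled in higher precision.
    """
    scaled_grad_fp16 = to_fp16(true_grad * loss_scale)
    return scaled_grad_fp16 / loss_scale


if __name__ == "__main__":
    tiny_grad = 1e-8  # below the smallest fp16 subnormal (about 6e-8)
    print(backward_fp16(tiny_grad))           # underflows to zero
    print(backward_fp16(tiny_grad, 1024.0))   # survives, up to rounding error
```

The scale must stay small enough that the largest scaled gradient does not overflow half precision; `struct.pack` raises `OverflowError` in that case, which is a property of this simulation rather than of any training framework.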

0 credits | 26 pages | 5.82 MB | 5 months ago
DeepSeek Illustrated: 10-Page PDF

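limited to the domain or problem of that dataset. As a result, such models have a fairly narrow range of application and can usually only solve problems in a specific domain or for a single task. Scaling Laws is a term you have probably come across in many places. What kind of law is it? One of the core reasons large models can be trained on large, diverse datasets and ultimately "learn well" is the guidance of Scaling Laws together with the strengths of the model architecture itself. Scaling Laws state that the more parameters a model has, the stronger its learning capacity; the larger and more diverse the training data, the more general the final model; and even when the data includes noise, the model can still extract general knowledge by following the scaling laws. The Transformer architecture fits Scaling Laws especially well: Transformer is the best network structure for realizing the scaling laws in natural language processing. 2.2 Transformer Basic Architecture: LLMs rely on the Transformer model proposed by Google in 2017; compared with traditional RNNs (recurrent neural networks) and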
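
A common way to make the "more parameters, stronger model" claim concrete is a power-law relation between parameter count and loss. The sketch below assumes that form; the function name `power_law_loss` and the constants `n_c` and `alpha` are hypothetical values chosen only for illustration, not figures taken from this document or from any measured model.

```python
def power_law_loss(n_params: float, n_c: float = 1e13, alpha: float = 0.08) -> float:
    """Hypothetical power-law loss curve: loss = (n_c / n_params) ** alpha.

    `n_c` and `alpha` are made-up illustrative constants.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


if __name__ == "__main__":
    sizes = [1e9, 1e10, 1e11]
    losses = [power_law_loss(n) for n in sizes]
    for n, loss in zip(sizes, losses):
        print(f"{n:.0e} parameters -> loss {loss:.3f}")
    # Each 10x increase in parameters lowers the loss, but by a smaller
    # absolute amount than the previous 10x increase did.
    print(losses[0] - losses[1], losses[1] - losses[2])
```

Under this assumed form, every tenfold increase in parameters multiplies the loss by the same ratio, so the absolute improvement shrinks as the model grows.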

0 credits | 11 pages | 2.64 MB | 8 months ago
Trends Artificial Intelligence

infrastructure investments slowed & revenue grew… will AI follow? From 2020, AWS began rapidly scaling CapEx (+30% Y/Y) to build AI / ML infrastructure, potentially restarting cycle ... CapEx Spend @ Amazon ... infrastructure specialists is emerging to meet this demand. CoreWeave has become one of the fastest-scaling cloud GPU providers, repurposing gaming and crypto hardware supply chains to serve enterprise AI ... highly performant AI cloud infrastructure required for the most advanced applications. We are scaling as fast as possible to capture that demand. The future runs on CoreWeave. - CoreWeave CEO Michael

0 credits | 340 pages | 12.14 MB | 4 months ago
OpenAI - AI in the Enterprise

visitors coming to the site every month, these increases scale up to significant business impact. But scaling up also meant using more tokens. To increase efficiency, OpenAI and Indeed worked together to fine-tune

0 credits | 25 pages | 9.48 MB | 5 months ago
