[Zhou Hongyi's Tsinghua Lecture] The Startup Opportunities DeepSeek Brings Us - 360 Zhou Hongyi - 20250212
Required reading for government, enterprise, and entrepreneurs. Before DeepSeek: our ten predictions on large-model trends. Prediction 1: traditional AGI progress is slowing and a new direction is needed: • Scaling Law returns are diminishing • human training data is nearly exhausted • synthetic data cannot create new knowledge • reasoning ability is hard to generalize and costly. An AI that surpasses humans in every respect is logically untenable. … Four modes of disruptive innovation. DeepSeek-R1 broke through the large-model Scaling Law bottleneck and opened a new reinforcement-learning paradigm, shifting from the pretraining Scaling Law to a reinforcement-learning Scaling Law. The pretraining Scaling Law (big data + big parameters + big compute) shows diminishing marginal returns: • human-constructed training data has hit its ceiling • beyond the trillion-parameter scale, continuing to increase the parameter scale … training compute cost and engineering difficulty rise sharply. This fed large-model pessimism, the view that model capability could not improve qualitatively any further. Reinforcement-learning Scaling Law: • use synthetic data to address data exhaustion • use self-play reinforcement learning to greatly improve complex reasoning without increasing parameter count • use post-training and inference compute to greatly improve model performance without increasing pretraining compute. DeepSeek's disruptive innovation: technical innovation. • Pretrained models such as GPT: read voraciously, accumulate knowledge, Scaling Law hits a wall • Pretrained models do not think deeply enough … | 76 pages | 5.02 MB
MITRE Defense Agile Acquisition Guide - Mar 2014
… Potential Agile Program Structures … 16 Scaling Agile … non-delivery; the government-led development team should be actively managing the development cycle and scaling back capabilities when needed to meet the time-boxed sprint and release schedule. On the other … maturity, training and documentation) • Do stakeholders agree with the release tempo? 16 Scaling Agile: While Agile works best with small, self-organized, co-located teams, some mid-to-large programs … | 74 pages | 3.57 MB
PAI & TVM Meetup - Shanghai 20191116
threadIdx.y/warpDim.y*warpDim.y … warpDim.y = 32/warpDim.x = 32/blockDim.x … Loop scaling … No need to modify or add any line of code. Computing Platform Business Unit (COMPUTING PLATFORM). Loss Scaling in TF: loss = loss_fn() opt = tf.Adamoptimizer(learning_rate=...) … # minimize() on the loss scale optimizer. train_op = loss_scale_optimizer.minimize(loss) Loss Scaling in PAI-TF: scaling the loss using S … backward propagation in MP … gradients … unscaled gradients … | 26 pages | 5.82 MB
 Trends Artificial Intelligence
… infrastructure investments slowed & revenue grew… will AI follow? From 2020, AWS began rapidly scaling CapEx (+30% Y/Y) to build AI / ML infrastructure, potentially restarting cycle. CapEx Spend @ Amazon … infrastructure specialists is emerging to meet this demand. CoreWeave has become one of the fastest-scaling cloud GPU providers, repurposing gaming and crypto hardware supply chains to serve enterprise AI … highly performant AI cloud infrastructure required for the most advanced applications. We are scaling as fast as possible to capture that demand. The future runs on CoreWeave. - CoreWeave CEO Michael … | 340 pages | 12.14 MB
 Real-Time Unified Data Layers:
A New Era for Scalable Analytics,
Search, and AI
… (structured, semi-structured, unstructured). Scaling Costs Are Too High: Traditional databases require expensive tuning, hardware, and licensing to scale. Scaling Smoothly as the Data Volume Grows: Thanks to … | 10 pages | 2.82 MB
安全简介 (Security Overview)
… Agile Software Requirements: Lean Requirements for Teams, Programs, and the Enterprise (2011) and Scaling Software Agility: Best Practices for Large Enterprises (2007). Implementing agile practices at enterprise … | 2 pages | 304.16 KB
DevOps Meetup
… measure it. • Do process map. • Do focus on Quality first. • Do start a book club. Book List: • Scaling Lean & Agile Development: Thinking & Organizational Tools for Large-Scale Scrum, Craig Larman … | 2 pages | 246.04 KB
Julia 1.10.10
… ≈ 0). It is not possible to pick a nonzero atol automatically because it depends on the overall scaling (the "units") of your problem: for example, in x - y ≈ 0, atol=1e-9 is an absurdly small tolerance … matrix; Bidiagonal: upper/lower bidiagonal matrix; Diagonal: diagonal matrix; UniformScaling: uniform scaling operator. Elementary operations: Matrix type, +, -, *, \, other functions with optimized methods; Symmetric … corresponding to the characteristic values x=[x1, x2,...] is available: eigvecs(M, x). The uniform scaling operator: A UniformScaling operator represents a scalar times the identity operator, λ*I. The identity … | 1692 pages | 6.34 MB
Julia 1.10.9
… ≈ 0). It is not possible to pick a nonzero atol automatically because it depends on the overall scaling (the "units") of your problem: for example, in x - y ≈ 0, atol=1e-9 is an absurdly small tolerance … matrix; Bidiagonal: upper/lower bidiagonal matrix; Diagonal: diagonal matrix; UniformScaling: uniform scaling operator. Elementary operations: Matrix type, +, -, *, \, other functions with optimized methods; Symmetric … corresponding to the characteristic values x=[x1, x2,...] is available: eigvecs(M, x). The uniform scaling operator: A UniformScaling operator represents a scalar times the identity operator, λ*I. The identity … | 1692 pages | 6.34 MB
Julia 1.11.4
… ≈ 0). It is not possible to pick a nonzero atol automatically because it depends on the overall scaling (the "units") of your problem: for example, in x - y ≈ 0, atol=1e-9 is an absurdly small tolerance … matrix; Bidiagonal: upper/lower bidiagonal matrix; Diagonal: diagonal matrix; UniformScaling: uniform scaling operator. Elementary operations: Matrix type, +, -, *, \, other functions with optimized methods; Symmetric … corresponding to the characteristic values x=[x1, x2,...] is available: eigvecs(M, x). The uniform scaling operator: A UniformScaling operator represents a scalar times the identity operator, λ*I. The identity … | 2007 pages | 6.73 MB