[Zhou Hongyi's Tsinghua Lecture] The Entrepreneurial Opportunities DeepSeek Brings Us - 360, Zhou Hongyi - 20250212. Before DeepSeek emerged: our ten predictions for large-model trends. Prediction one: traditional AGI progress is slowing and needs a new direction. The Scaling Law shows diminishing marginal returns; human training data is nearly exhausted; synthetic data cannot create new knowledge; reasoning ability is hard to generalize and expensive; AI that comprehensively surpasses humans is logically untenable. Four modes of disruptive innovation. DeepSeek-R1 broke through the large-model Scaling Law bottleneck. The diminishing returns of the big-data, big-parameter, big-compute pre-training Scaling Law (human-constructed training data has hit its ceiling; beyond the trillion-parameter scale, further growth in parameter count sharply raises training-compute cost and engineering difficulty) had fed a pessimism that large-model capability could make no further qualitative leap. DeepSeek instead opened a new reinforcement-learning paradigm, shifting from the pre-training Scaling Law to a reinforcement-learning Scaling Law: use synthetic data to resolve data exhaustion; use self-play reinforcement learning to sharply improve complex reasoning without enlarging the parameter count; use post-training and inference compute to sharply improve model performance without adding pre-training compute. DeepSeek's disruptive innovation, on the technical side: pre-trained models such as GPT "read voraciously" to accumulate knowledge, but that scaling law has hit a wall, and pre-trained models do not think deeply enough.
0 码力 | 76 pages | 5.02 MB | 5 months ago
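For context on the diminishing-returns claim in this excerpt (my addition, not from the slides): the pre-training scaling laws of Kaplan et al. (2020) take a power-law form,

    L(N) \approx (N_c / N)^{\alpha_N},   L(D) \approx (D_c / D)^{\alpha_D},   L(C) \approx (C_c / C)^{\alpha_C}

where L is test loss, N parameter count, D dataset size, and C training compute. The fitted exponents are small (roughly 0.05 to 0.1), so each additional order of magnitude of scale buys a shrinking absolute loss reduction; that flattening is what the slides call diminishing marginal returns.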
The Roles of Symmetry And Orthogonality In Design. ...implies high predictability and consistent behavior (once the pattern is recognized); enables system scaling (in size and complexity). Charley Bay - charleyb123 at gmail dot com, The Roles of Symmetry And Orthogonality, 2021. The Phi Scaling Angle In Nature, https://cosmometry.net/phi-scaling-angle: a spruce tree is fractal scaling based on the phi angle; pelicans in flight with the phi angle; tree bark scaling at the phi angle. Benefits: less complexity, fewer edge cases, increased stability, greater reuse, better scaling. Make things "unrelated": orthogonal design is associated with simplicity (the more orthogonal...
0 码力 | 151 pages | 3.20 MB | 6 months ago
02 TiDB Operator: Architecture and Implementation - Fu Yecheng. Portable, scalable, automated full-lifecycle management of a TiDB cluster: deployment; upgrading; scaling; handling network and hardware failures, etc.; backup/restore/data migration; ... TiDB Operator bootstrapping: configure services, configmaps, etc. Upgrading: change version, config, etc. Scaling. Automatic failover: create replacements for failed replicas. Scaling in (PD): ...scale in the StatefulSet. Scaling in (TiKV): delete the TiKV store via the PD API first; wait for the store to be tombstoned; then scale in the StatefulSet. Scale...
0 码力 | 47 pages | 1.73 MB | 6 months ago
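The TiKV scale-in ordering above matters: removing a pod before PD has drained and tombstoned its store would drop replicas. A minimal control-loop sketch of that ordering in C++ with hypothetical stub helpers (TiDB Operator itself is written in Go, and none of these names are its real API):

    #include <chrono>
    #include <thread>

    // Hypothetical stubs standing in for PD API and Kubernetes calls.
    bool delete_store_via_pd(int /*store_id*/) { return true; } // ask PD to drain the store
    bool store_is_tombstone(int /*store_id*/) { return true; }  // real code would poll PD state
    void scale_statefulset(int /*replicas*/) {}                 // shrink the TiKV StatefulSet

    void scale_in_tikv(int store_id, int target_replicas) {
        delete_store_via_pd(store_id);                // 1. move data off the store
        while (!store_is_tombstone(store_id))         // 2. wait until PD marks it tombstone
            std::this_thread::sleep_for(std::chrono::seconds(1));
        scale_statefulset(target_replicas);           // 3. only then remove the pod
    }

    int main() { scale_in_tikv(1, 2); }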
Back to Basics: Concurrency. "The number of transistors incorporated in a chip will approximately double every 24 months." --Gordon Moore, Intel co-founder. Dennard Scaling (1/3 through 3/3): as transistors pack more tightly together, heat becomes a problem and energy consumption increases (i.e., Dennard scaling breaking down). So the hardware industry has adapted (effectively keeping Moore's Law accurate): we have more, smaller CPUs (i.e., cores) on our machines...
0 码力 | 141 pages | 6.02 MB | 6 months ago
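Since per-core frequency gains stalled, exploiting those extra cores takes explicit concurrency. A minimal C++ illustration (mine, not code from the talk) of discovering the core count and fanning work out across it:

    #include <algorithm>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        // Hardware threads this machine exposes (cores x SMT); may report 0 if unknown.
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::cout << "hardware threads: " << n << '\n';

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([] { /* per-core work goes here */ });
        for (auto& t : workers) t.join();
    }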
MITRE Defense Agile Acquisition Guide - Mar 2014. Potential Agile Program Structures ... 52. 16 Scaling Agile ... non-delivery; the government-led development team should be actively managing the development cycle and scaling back capabilities when needed to meet the time-boxed sprint and release schedule. On the other ... maturity, training and documentation). Do stakeholders agree with the release tempo? 16 Scaling Agile: while Agile works best with small, self-organized, co-located teams, some mid-to-large programs...
0 码力 | 74 pages | 3.57 MB | 5 months ago
Modern C++ for Parallelism in High Performance Computing. ...parallel, with fairly predictable performance. The D2D benchmark explores to what extent we achieve scaling for different parallelization strategies: C-style programming with OpenMP, native mechanisms in modern ... parallel execution is that this code is bandwidth-limited, so we expect parallel efficiency to stop scaling at a certain core count. It is an interesting question whether some parallelism models have other ... large class of numerical analysis algorithms. It comprises the following operations: an array scaling, which is a perfectly parallel operation; a norm computation, a 'reduction', which would...
0 码力 | 3 pages | 91.16 KB | 6 months ago
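The two kernel types named in this excerpt map directly onto C++17 parallel algorithms; a minimal sketch (my example, not code from the paper):

    #include <algorithm>
    #include <cmath>
    #include <execution>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> x(1'000'000, 1.5);

        // Array scaling: perfectly parallel, element-wise multiply.
        std::transform(std::execution::par_unseq, x.begin(), x.end(), x.begin(),
                       [](double v) { return 2.0 * v; });

        // Norm: a reduction over squared elements, then a square root.
        double norm = std::sqrt(std::transform_reduce(
            std::execution::par_unseq, x.begin(), x.end(), 0.0,
            std::plus<>{}, [](double v) { return v * v; }));

        std::cout << norm << '\n'; // both kernels are bandwidth-limited at scale
    }

With libstdc++ you must link against TBB (-ltbb) for the parallel execution policies to actually run in parallel.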
PAI & TVM Meetup - Shanghai 20191116. ...threadIdx.y / warpDim.y * warpDim.y ... warpDim.y = 32 / warpDim.x, warpDim.x = blockDim.x. Loop scaling ... No need to modify or add any line of code. Loss Scaling in TF:

    loss = loss_fn()
    opt = tf.train.AdamOptimizer(learning_rate=...)
    # minimize() on the loss-scale optimizer
    train_op = loss_scale_optimizer.minimize(loss)

Loss Scaling in PAI-TF: scale the loss using S; backward propagation in mixed precision produces scaled gradients; unscaled gradients...
0 码力 | 26 pages | 5.82 MB | 5 months ago
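Why the unscale step recovers the true gradient (a one-line identity, my addition): gradients are linear in the loss, so in LaTeX notation

    \nabla_\theta L = \frac{1}{S} \nabla_\theta (S \cdot L)

Multiplying the loss by S before backprop lifts small fp16 gradients above the underflow threshold, and dividing the resulting gradients by S before the optimizer step leaves the update mathematically unchanged.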
DeepSeek Explained in a 10-Page PDF (DeepSeek图解10页PDF). ...limited to that dataset's domain or problems. As a result, such models have a narrow range of application and typically solve only domain-specific, single-task problems. You have probably seen Scaling Laws mentioned in many places. What do they say? One core reason large models can be trained on massive, diverse datasets and still end up "learning well" is the guidance of Scaling Laws together with the advantages of the model architecture itself. Scaling Laws state that more parameters mean stronger learning capacity, and that the larger and more diverse the training data, the more general-purpose the final model; even when noisy data is included, the model can still extract general knowledge by virtue of scaling. The Transformer architecture realizes Scaling Laws almost perfectly; it is the network structure in natural language processing that best delivers scaling behavior. 2.2 Transformer basic architecture: LLMs rely on the Transformer model proposed by Google in 2017, which, compared with traditional RNNs (recurrent neural networks) and...
0 码力 | 11 pages | 2.64 MB | 8 months ago
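A concrete, quantitative form of "scale parameters and data together" (Hoffmann et al. 2022, the "Chinchilla" result; my addition, not from this PDF): for a fixed compute budget C, the loss-optimal model size and token count scale roughly as

    N_{\mathrm{opt}} \propto C^{0.5}, \qquad D_{\mathrm{opt}} \propto C^{0.5}

which works out to training on roughly 20 tokens per parameter, rather than growing parameters alone.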
MuPDF 1.22.0 Documentation. ...a transformation matrix for the zoom and rotation desired. The default resolution without scaling is 72 dpi:

    ctm = fz_scale(zoom / 100, zoom / 100);
    ctm = fz_pre_rotate(ctm, rotate);
    /* Render ... */

...color (default 303030). -C hex-color: Set white tint color (default FFFFF0). -Y factor: Set UI scaling factor (default calculated from screen DPI). [page]: The initial page number to show. Example usage ... Identity: the identity matrix, shorthand for [1,0,0,1,0,0]. Methods: Scale(sx, sy) returns a scaling matrix, shorthand for [sx,0,0,sy,0,0]. Returns [a,b,c,d,e,f]. Translate(tx, ty) returns a translation...
0 码力 | 175 pages | 698.87 KB | 8 months ago
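The six-number matrices above use the PDF affine convention: [a,b,c,d,e,f] maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). A tiny self-contained check of that convention in plain C++ (my illustration, not MuPDF's API):

    #include <array>
    #include <cstdio>

    using Mat = std::array<double, 6>; // [a, b, c, d, e, f]

    // Apply a PDF-style affine matrix to a point.
    void apply(const Mat& m, double x, double y, double& ox, double& oy) {
        ox = m[0] * x + m[2] * y + m[4];
        oy = m[1] * x + m[3] * y + m[5];
    }

    int main() {
        Mat scale = {2, 0, 0, 2, 0, 0}; // Scale(2, 2), analogous to fz_scale(2, 2)
        double x, y;
        apply(scale, 3, 4, x, y);
        std::printf("(3,4) -> (%g,%g)\n", x, y); // prints (6,8)
    }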
TiDB v8.5 Documentation. Easy horizontal scaling: the TiDB architecture separates computing from storage, letting you scale out or scale in computing or storage capacity online as needed. The scaling process is transparent ... a cost-effective solution that adopts a separate computing and storage architecture, enabling easy scaling of computing or storage capacity separately. The computing layer supports a maximum of 512 nodes ... scenarios: verify TiDB version upgrades; assess change impact; validate performance before scaling TiDB; test performance limits. For more information, see documentation. 2.2.1.4 SQL: Support...
0 码力 | 6730 pages | 111.36 MB | 10 months ago
81 results in total














