DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md. J. Ainslie, J. Lee-Thorp, M. de Jong, Y. Zemlyanskiy, F. Lebrón, and S. Sanghai. GQA: Training generalized multi-query transformer models from large language models. arXiv preprint arXiv:2108.07732, 2021. J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang, B. Hui, L. Ji, M. Li, J. Lin, R. Lin, D. Liu, G. Liu, C. Lu Yang, S. Yang, Y. Yao, B. Yu, H. Yuan, Z. Yuan, J. Zhang, X. Zhang, Y. Zhang, Z. Zhang, C. Zhou, J. Zhou, X. Zhou, and T. Zhu. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023. Y. Bisk, R. Zellers
0 credits | 52 pages | 1.23 MB | 1 year ago
Trends Artificial Intelligence
2005-2025, Per NVIDIA. AI Developer Growth (Google Ecosystem as Proxy) = +5x to 7MM Developers Y/Y. Developers Building with Gemini, MM. AI Development Trending = Unprecedented. Note: Per Google in Companies = Inflected With AI's Rise +250% / Year. CapEx Spend @ Big Six* Tech Companies = +63% Y/Y & Accelerated… ChatGPT WAU data as of 11/23 & 12/24 due to data availability. *Note: Big Six USA investments slowed & revenue grew… will AI follow? From 2020, AWS began rapidly scaling CapEx (+30% Y/Y) to build AI / ML infrastructure, potentially restarting cycle. CapEx Spend @ Amazon AWS = Cloud
0 credits | 340 pages | 12.14 MB | 4 months ago
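Tsinghua University: How Ordinary People Can Seize the DeepSeek Dividend
DeepSeek capability map. Scenario 1: What to do when you suddenly cannot keep up in class. Scenario: In math class, the teacher is explaining implicit differentiation. On the third line of the working, the teacher skips the intermediate derivation and states the result directly: "so here dy/dx = (-2x - y)/(x + 3y²)". You stare at the formula on the whiteboard, baffled: where did the chain-rule expansion from the first two steps go? Why does the denominator suddenly contain 3y²? The classmates around you nod, and the teacher has already turned to the next page to cover application problems. Your palms sweat; you want to raise your hand but fear being told "you can't even do something this simple", yet if you stay silent you worry you will not follow anything that comes next... Prompt: "Implicit differentiation example: derive dy/dx from the equation x² + xy + y³ = 0. Please show the complete chain-rule expansion, especially where the 3y² in the denominator comes from." Get a step-by-step breakdown in seconds: compare it against your notes right away, fill in the gaps, and keep up with the teacher. 2. The 5-minute break (deeper follow-up). When it applies: the class has ended, but another class follows in 10 minutes. Techniques: Ask about details: "Why does differentiating y³ give 3y²·dy/dx rather than 3y²?" Have the AI explain with an analogy:
0 credits | 65 pages | 4.47 MB | 8 months ago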
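The question raised in this snippet, where the 3y² in the denominator comes from, can be answered by differentiating each term of x² + xy + y³ = 0 with respect to x while treating y as a function of x. The derivation below is a worked illustration added here, not text from the listed document:

```latex
\begin{align*}
\frac{d}{dx}\left(x^2 + xy + y^3\right) &= 0 \\
2x + \left(y + x\frac{dy}{dx}\right) + 3y^2\frac{dy}{dx} &= 0 \\
\left(x + 3y^2\right)\frac{dy}{dx} &= -2x - y \\
\frac{dy}{dx} &= \frac{-2x - y}{x + 3y^2}
\end{align*}
```

The xy term is handled by the product rule, and the y³ term by the chain rule, which is why it contributes 3y²·dy/dx rather than 3y² alone. Collecting the two dy/dx terms gives the x + 3y² denominator.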
Manus AI: The First Year of the Agent Era Begins
0 credits | 23 pages | 4.87 MB | 5 months ago
PAI & TVM Meetup - Shanghai 20191116
threadIdx.x -> 0 ... threadIdx.y -> threadIdx.y/warpDim.y*warpDim.y ... warpDim.y = 32/warpDim.x = 32/blockDim.x ... Loss Scaling in PAI-TF: scaling the loss using S; backward propagation in MP; unscaled gradients; zero gradients; apply gradients. Computing Platform
0 credits | 26 pages | 5.82 MB | 5 months ago
TVM: Where Are We Going
emerging tensor instructions. Tensorization Challenge. C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k)). Computation Specification (Tensor Expression). A = tvm.placeholder((8 = tvm.placeholder((8,)) k = tvm.reduce_axis((0, 8)) C = tvm.compute((8, 8), lambda y, x: tvm.sum(A[k, y] * B[k], axis=k)). HW Interface Specification by Tensor Expression. Tensorization. VTA: Open
0 credits | 31 pages | 22.64 MB | 5 months ago
XDNN TVM - Nov 2019
Quantized model (Int16/Int8/...); quantize; test; finetune; needs to increase accuracy (Y/N); deploy; Model for DPU; Origin training data; Calibration data (100-1000 images). © Copyright 2018
0 credits | 16 pages | 3.35 MB | 5 months ago
7 results in total