DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
and inference efficiency of DeepSeek 67B (Dense) and DeepSeek-V2. Contents: 1 Introduction; 2 Architecture; 2.1 Multi-Head Latent Attention: Boosting Inference Efficiency; DeepSeekMoE: Training Strong Models at Economical Costs; 2.2.1 Basic Architecture; 2.2.2 Device-Limited Routing ... characterized by economical training and efficient inference through an innovative Transformer architecture. It is equipped with a total of 236B parameters, of which 21B are activated for each token, and
0 points | 52 pages | 1.23 MB | 1 year ago

Trends Artificial Intelligence
can make. The magic of watching AI do your work for you feels like the early days of email and web search, technologies that fundamentally changed our world. The better / faster / cheaper impacts of ... 5x Faster vs. Google. Note: Dashed-line bars are for years where Google did not disclose annual search volumes. Source: Google public disclosures, OpenAI (12/24). ChatGPT figures are estimates per company ... [Chart: Google Search vs. ChatGPT, by Years Since Public Launch (Google = 9/98, ChatGPT = 11/22)] ... In 1998, tapping emerging
0 points | 340 pages | 12.14 MB | 4 months ago

TVM Meetup Nov. 16th - Linaro
ecosystem ... Linaro AI Initiative: Provide the best-in-class Deep Learning performance by leveraging Neural Network acceleration in IP and SoCs from the Arm ecosystem, through collaborative seamless integration
0 points | 7 pages | 1.23 MB | 5 months ago

XDNN TVM - Nov 2019
SIGNALS MISC CALC AVG POOL MAX POOL ROI POOL ELEMENT WISE ... Efficiency > 50% for mainstream neural networks ... © Copyright 2018 Xilinx ... Inference Flow: MxNet CPU Layers FPGA Layers Runtime
0 points | 16 pages | 3.35 MB | 5 months ago

OpenAI 《A practical guide to building agents》
executing the workflow. Query transaction databases or systems like CRMs, read PDF documents, or search the web. Action: Enable agents to interact with systems to take actions such as adding new information ... return "File saved" search_agent = Agent( name= , instructions= tools=[WebSearchTool(),save_results], ) "output" "timestamp" "Search agent" "Help the user search the internet and save ... workflows effectively. While it’s tempting to immediately build a fully autonomous agent with complex architecture, customers typically achieve greater success with an incremental approach. In general, orchestration
0 points | 34 pages | 7.00 MB | 6 months ago

Google 《Prompt Engineering v7》
enabling LLMs to solve complex tasks using natural language reasoning combined with external tools (search, code interpreter etc.) allowing the LLM to perform certain actions, such as interacting with external ... langchain framework for Python, together with VertexAI (google-cloud-aiplatform) and the google-search-results pip packages. Prompt Engineering, February 2025 ... To run this sample you must create a (free) ... fact, the LLM is scraping Google search results to figure out the band names. Then, it lists the results as observations and chains the thought for the next search.
0 points | 68 pages | 6.50 MB | 6 months ago

Facebook -- TVM AWS Meetup Talk
delivering generalized performance ... Why TVM? XTVM for Speech Synthesis - WaveRNN-style model architecture - Autoregressive sampling net running at faster than real-time - Compute split between GRU
0 points | 11 pages | 3.08 MB | 5 months ago

OSCHINA (开源中国) 2023 Large Language Model (LLM) Technology Report
TensorFlow, PyTorch, Hugging Face Transformers, and others. TensorFlow architecture diagram (image source: https://www.geeksforgeeks.org/architecture-of-tensorflow/) ... LLM infrastructure: programming languages. LLM training and applications typically use several programming languages, depending on the needs of the task and the preferences of the team. ... Its widespread use is due to its concise syntax, strong library support (such as
0 points | 32 pages | 13.09 MB | 1 year ago

OpenAI - AI in the Enterprise
use OpenAI every day; access to documents has jumped from 20% to 80%, with dramatically reduced search time; and advisors spend more time on client relationships, thanks to task automation and faster ... Lesson 4: Customize and fine-tune your models. How Lowe’s improves product search. Enterprises seeing the most success from AI adoption are often the ones that invest time and resources ... the Fortune 50 home improvement company, to improve the accuracy and relevance of their ecommerce search function. With thousands of suppliers, Lowe’s often has to work with incomplete or inconsistent
0 points | 25 pages | 9.48 MB | 5 months ago

TVM: Where Are We Going
framework for deep learning. TVM Stack: High-Level Differentiable IR, Tensor Expression and Optimization Search Space, LLVM, CUDA, Metal, VTA, Edge FPGA, Cloud FPGA, ASIC, Optimization, AutoTVM, Device Fleet ... Existing (Credit: Logan Weber) ... uTVM upcoming: Self Hosted Runtime (Credit: Logan Weber) ... Designed for Accelerators (NPU) ... Search Space for TPU-like Specialized Accelerators: Tensor Compute Primitives, Unified Buffer, Acc, FIFO
0 points | 31 pages | 22.64 MB | 5 months ago

12 results in total













