DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-AI (research@deepseek.com). Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training… Introduction: In the past few years, Large Language Models (LLMs) (Anthropic, 2023; Google, 2023; OpenAI, 2022, 2023) have undergone rapid development… To tackle this problem, we introduce DeepSeek-V2, a strong open-source Mixture-of-Experts (MoE) language model, characterized by economical training and efficient inference through an innovative Transformer…
52 pages | 1.23 MB | 1 year ago
Trends Artificial Intelligence
…ever-growing digital datasets that have been in the making for over three decades; breakthrough large language models (LLMs) that, in effect, found freedom with the November 2022 launch of OpenAI's ChatGPT… 260% annual growth over fifteen years of data to train AI models led to… Note: Only "notable" language models shown (per Epoch AI; includes state-of-the-art improvement on a recognized benchmark, >1K…). FLOPs are often used to estimate the computational cost of training or running a model.
340 pages | 12.14 MB | 4 months ago
OctoML OSS 2019 11 8
…multiple employees to contribute to TVM. Today we'll touch on a few of those contribution areas: core infrastructure improvements to TVM; µTVM: support for microcontrollers in TVM; Virtual Machine dynamic NN support (w/ AWS folks); improved NLP support, with a focus on transformers… Core infrastructure refactors: new integer analysis infrastructure supports the ability to handle… groundwork for improved multi-language support for exposing runtimes and IRs… Unified object protocol: vm::Object, NDArray, tuple/closure, AST nodes; cross-language support; easy to introduce…
16 pages | 1.77 MB | 5 months ago
 TVM@AliOSMobileNetv2 LaneNet 图TFLite1core 图TFLite4core 国QNNPACK 1core 四QNNPACK4core 四TVM1core 四TVM4core AiOS 1驱动万物智能 Alios TVM @ ARM CPU FP32 。,NHWC layout 。 For pointwise0 码力 | 27 页 | 4.86 MB | 5 月前3 TVM@AliOSMobileNetv2 LaneNet 图TFLite1core 图TFLite4core 国QNNPACK 1core 四QNNPACK4core 四TVM1core 四TVM4core AiOS 1驱动万物智能 Alios TVM @ ARM CPU FP32 。,NHWC layout 。 For pointwise0 码力 | 27 页 | 4.86 MB | 5 月前3
 Google 《Prompt Engineering v7》Summary 66 Endnotes 68 Prompt Engineering February 2025 6 Introduction When thinking about a large language model input and output, a text prompt (sometimes accompanied by other modalities such as image evaluating a prompt’s writing style and structure in relation to the task. In the context of natural language processing and LLMs, a prompt is an input provided to the model to generate a response or prediction such as text summarization, information extraction, question and answering, text classification, language or code translation, code generation, and code documentation or reasoning. Please feel free to0 码力 | 68 页 | 6.50 MB | 6 月前3 Google 《Prompt Engineering v7》Summary 66 Endnotes 68 Prompt Engineering February 2025 6 Introduction When thinking about a large language model input and output, a text prompt (sometimes accompanied by other modalities such as image evaluating a prompt’s writing style and structure in relation to the task. In the context of natural language processing and LLMs, a prompt is an input provided to the model to generate a response or prediction such as text summarization, information extraction, question and answering, text classification, language or code translation, code generation, and code documentation or reasoning. Please feel free to0 码力 | 68 页 | 6.50 MB | 6 月前3
 Facebook -- TVM AWS Meetup TalkSparse Transformers, etc - Reduce precision with int8/float16 - very helpful to maintain model in core-private L1 dcaches - Use rational approximations for transcendentals (exp, tanh, erf, etc) - very lines of Relay IR) - A few days of work - TVM sampling model running in 30us on single server CPU core - Beat hand-written, highly optimized baselines (https://github.com/mozilla/LPCNet) by ~40% - Bonus:0 码力 | 11 页 | 3.08 MB | 5 月前3 Facebook -- TVM AWS Meetup TalkSparse Transformers, etc - Reduce precision with int8/float16 - very helpful to maintain model in core-private L1 dcaches - Use rational approximations for transcendentals (exp, tanh, erf, etc) - very lines of Relay IR) - A few days of work - TVM sampling model running in 30us on single server CPU core - Beat hand-written, highly optimized baselines (https://github.com/mozilla/LPCNet) by ~40% - Bonus:0 码力 | 11 页 | 3.08 MB | 5 月前3
 OpenAI - AI in the Enterprisethey could offer more and better insights to clients. They started with three model evals: 01 Language translation Measuring the accuracy and quality of translations produced by a model. 02 Summarization candidate why this specific job was recommended to them. Indeed uses the data analysis and natural language capabilities of GPT-4o mini to shape these ‘why’ statements in their emails and messages to jobseekers their AI application builds. Verdi integrates language models, Python nodes, and APIs to create a scalable, consistent platform that uses natural language as a central interface. Developers now build consistently0 码力 | 25 页 | 9.48 MB | 5 月前3 OpenAI - AI in the Enterprisethey could offer more and better insights to clients. They started with three model evals: 01 Language translation Measuring the accuracy and quality of translations produced by a model. 02 Summarization candidate why this specific job was recommended to them. Indeed uses the data analysis and natural language capabilities of GPT-4o mini to shape these ‘why’ statements in their emails and messages to jobseekers their AI application builds. Verdi integrates language models, Python nodes, and APIs to create a scalable, consistent platform that uses natural language as a central interface. Developers now build consistently0 码力 | 25 页 | 9.48 MB | 5 月前3
 TVM Meetup: QuantizationAmazon Web Services, Inc. or its Affiliates. All rights reserved. Evaluation • Intel Cascade Lake 12-core Server • TFLite Pre-quantized Hosted Models© 2019, Amazon Web Services, Inc. or its Affiliates. All0 码力 | 19 页 | 489.50 KB | 5 月前3 TVM Meetup: QuantizationAmazon Web Services, Inc. or its Affiliates. All rights reserved. Evaluation • Intel Cascade Lake 12-core Server • TFLite Pre-quantized Hosted Models© 2019, Amazon Web Services, Inc. or its Affiliates. All0 码力 | 19 页 | 489.50 KB | 5 月前3
 TVM@Alibaba AI Labskernel, strides, padding, dilation, layout, out_dtype): #Describe algorithm with tensor expression language'; #Return the out operation w How to compute. @autotvm.register_ topi_schedule(schedule_conv2d_nchw,pvr0 码力 | 12 页 | 1.94 MB | 5 月前3 TVM@Alibaba AI Labskernel, strides, padding, dilation, layout, out_dtype): #Describe algorithm with tensor expression language'; #Return the out operation w How to compute. @autotvm.register_ topi_schedule(schedule_conv2d_nchw,pvr0 码力 | 12 页 | 1.94 MB | 5 月前3
共 11 条
- 1
- 2













