DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
…E Discussion About Pre-Training Data Debiasing … F Additional Evaluations on Math and Code … G Evaluation Formats … 1. Introduction: In the past few years, Large Language Models … pre-training corpus. Then, we collect 1.5M conversational sessions, which encompass various domains such as math, code, writing, reasoning, safety, and more, to perform Supervised Fine-Tuning (SFT) for DeepSeek-V2 … datasets include CHID (Zheng et al., 2019) and CCPM (Li et al., 2021). Math datasets include GSM8K (Cobbe et al., 2021), MATH (Hendrycks et al., 2021), and CMath (Wei et al., 2023). Code datasets include …
0 码力 | 52 pages | 1.23 MB | 1 year ago
Trends Artificial Intelligence
…benchmark evaluates a language model's performance across 57 academic and professional subjects, such as math, law, medicine, and history. It measures both factual recall and reasoning ability, making it a standard … unified approach also creates a more seamless experience for users… …we've optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that AI… … Performance on MATH Level 5 Test, Open vs. Closed LLMs by Year Released – 6/23-4/25, per Epoch AI. Note: MATH Level 5 pass@1 refers to the accuracy of an AI model on the MATH benchmark, a dataset …
0 码力 | 340 pages | 12.14 MB | 4 months ago
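The snippet's note on pass@1 is cut off mid-definition. For readers unfamiliar with the metric, a commonly used formalization is the unbiased pass@k estimator of Chen et al. (2021), of which pass@1 is the k = 1 case; this is a general definition and not necessarily the exact protocol Epoch AI used for the chart referenced above.

```latex
% Unbiased pass@k estimator (Chen et al., 2021); pass@1 is the k = 1 case.
% n = samples drawn per problem, c = samples that pass the grader.
\[
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right],
\qquad
\text{pass@}1 \;=\; \mathbb{E}_{\text{problems}}\!\left[\, \frac{c}{n} \,\right].
\]
```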
OpenAI 《A practical guide to building agents》
…enforcement, or safety classification. For example, the agent above processes a math question input optimistically until the math_homework_tripwire guardrail identifies a violation and raises an exception …
0 码力 | 34 pages | 7.00 MB | 6 months ago
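The excerpt describes the guide's guardrail ("tripwire") pattern: a check runs alongside the agent and halts it by raising an exception when a policy such as "no math homework" is violated. Below is a minimal, hypothetical Python sketch of that pattern; the names (GuardrailTripwireTriggered, looks_like_homework, math_homework_tripwire, run_agent) are illustrative assumptions, not the guide's code or the OpenAI SDK's API.

```python
# Minimal sketch of an input-guardrail "tripwire" around a plain agent loop.
# All names are illustrative; they are not taken from the OpenAI Agents SDK.

class GuardrailTripwireTriggered(Exception):
    """Raised when a guardrail decides the agent must stop processing the input."""


def looks_like_homework(text: str) -> bool:
    # Stand-in classifier; a real system would call a small, cheap model here.
    keywords = ("solve for x", "show your work", "homework", "due tomorrow")
    return any(k in text.lower() for k in keywords)


def math_homework_tripwire(user_input: str) -> None:
    """Guardrail: optimistic processing continues unless this raises."""
    if looks_like_homework(user_input):
        raise GuardrailTripwireTriggered("Refusing to do math homework for the user.")


def run_agent(user_input: str) -> str:
    math_homework_tripwire(user_input)   # guardrail runs before/alongside the agent
    return f"Working on: {user_input}"   # placeholder for the actual agent call


if __name__ == "__main__":
    try:
        print(run_agent("Solve for x: 2x + 3 = 11, it's my homework"))
    except GuardrailTripwireTriggered as exc:
        print(f"Guardrail tripped: {exc}")
```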
Google 《Prompt Engineering v7》
….9, and top-K of 20. Finally, if your task always has a single correct answer (e.g., answering a math problem), start with a temperature of 0. NOTE: With more freedom (higher temperature, top-K, top-P … simple as multiplying two numbers. This is because they are trained on large volumes of text and math may require a different approach. So let's see if intermediate reasoning steps will improve the …
0 码力 | 68 pages | 6.50 MB | 6 months ago
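The excerpt recommends choosing decoding settings by task type: a looser configuration (temperature around .9 with top-K 20, per the snippet) for open-ended work, versus temperature 0 when there is a single correct answer such as a math problem. The Python sketch below illustrates that decision as a plain configuration helper; GenerationConfig and choose_config are hypothetical, not Google's API, and the top-P values are illustrative defaults rather than numbers from the guide.

```python
# Hypothetical helper mirroring the guide's advice on decoding settings.
# GenerationConfig / choose_config are illustrative, not a real vendor API.
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    temperature: float
    top_k: int
    top_p: float


def choose_config(single_correct_answer: bool) -> GenerationConfig:
    if single_correct_answer:
        # e.g. a math problem: make decoding (near-)deterministic.
        return GenerationConfig(temperature=0.0, top_k=1, top_p=1.0)
    # Looser starting point for open-ended tasks (values from the excerpt; top_p assumed).
    return GenerationConfig(temperature=0.9, top_k=20, top_p=0.95)


print(choose_config(single_correct_answer=True))
print(choose_config(single_correct_answer=False))
```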
清华大学 (Tsinghua University), DeepSeek + DeepResearch: Making Research as Easy as Chatting
…OpenAI-4o and other closed-source models. • Mathematical reasoning on par with top-tier models: on the AIME 2024 benchmark, DeepSeek R1 scores 79.8% (pass@1), slightly ahead of OpenAI-o1-1217; on MATH-500 it reaches 97.3%, on par with OpenAI-o1-1217 and well ahead of other models. • Expert-level code generation: on programming tasks, DeepSeek R1 reaches an Elo rating of 2029, surpassing …
0 码力 | 85 pages | 8.31 MB | 8 months ago
5 results in total (page 1)













