DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
... a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. ... DeepSeek-V2 with reinforcement learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models. github.com/deepseek-ai/DeepSeek-V2 [Figure: Performance (MMLU) vs. activated parameters (billions) for DeepSeek-V2, DeepSeek 67B, and LLaMA 1 33B]
0 points | 52 pages | 1.23 MB | 1 year ago
Trends Artificial Intelligence
AI Safety ... 4/24: Meta Platforms releases its open-source** Llama 3 model with 70B parameters. 5/24: Google introduces AI Overviews to augment its search functions. 9/24: Alibaba releases ... Next AI Use Case Frontier – Protein Sequencing = Model Size +290% Annually to 98 Billion Parameters Over Four Years. Note: List of models may not be comprehensive. Source: Stanford RAISE Health. [Chart: Protein Sequencing Models (B Parameters), 2020-2024, per Stanford RAISE Health: ProGen, ProtBert, ProGen 2, ProT5, ESM2, ESM3; +290% / year]
Facebook -- TVM AWS Meetup Talk
... specialized code-generation techniques (TVM, Xbyak, etc.) - Interesting new tradeoffs: how const are parameters? - Structure specialization trades off icache/dcache - Also available today in FBGEMM ...
0 points | 11 pages | 3.08 MB | 5 months ago
Bring Your Own Codegen to TVM
... What are not supported yet? ● Duplicated inputs optimization (e.g., reused parameters) ● Multiple outputs (e.g., batch normalization) ● Subgraph merging (e.g., conv2d + ReLU)
0 points | 19 pages | 504.69 KB | 5 months ago
OpenAI "A practical guide to building agents"
... tools. Use multiple agents if improving tool clarity by providing descriptive names, clear parameters, and detailed descriptions doesn't improve performance.
0 points | 34 pages | 7.00 MB | 6 months ago
Google "Prompt Engineering v7"
... creative ways. It changes the final prompt doing the task by utilizing more knowledge in the LLM's parameters than would otherwise come into play when the LLM is prompted directly. It can help to mitigate ...
0 points | 68 pages | 6.50 MB | 6 months ago
6 results in total - page 1