DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
…Artificial General Intelligence (AGI). In general, the intelligence of an LLM tends to improve as the number of parameters increases, allowing it to exhibit emergent capabilities across various tasks (Wei et al.). Figure 1(a) highlights that, on MMLU, DeepSeek-V2 achieves top-ranking performance with only a small number of activated parameters. In addition, as shown in Figure 1(b), compared with DeepSeek 67B, DeepSeek-V2… [Architecture figure: a Transformer block combining RMS Norm, Multi-Head Latent Attention, and a DeepSeekMoE feed-forward network]
52 pages | 1.23 MB | 1 year ago
Google, "Prompt Engineering v7"
…these configurations optimally for your task. Output length: an important configuration setting is the number of tokens to generate in a response; generating more tokens requires more computation from the LLM… you are looking for. Putting it all together: choosing between top-K, top-P, temperature, and the number of tokens to generate depends on the specific application and desired outcome, and the settings chosen… a word or phrase will, by chance, lead back to a prior state, creating a loop due to the vast number of available options. In both cases, the model's sampling process gets "stuck," resulting in monotonous…
68 pages | 6.50 MB | 6 months ago
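The snippet above discusses how temperature, top-K, and top-P interact when choosing the next token. A minimal sketch of that interaction, assuming raw next-token logits (the function name and parameters here are illustrative, not taken from the guide):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Illustrative sampler: temperature reshapes the distribution,
    top-K and top-P then restrict which tokens remain eligible."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())          # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                # most-likely token first
    if top_k > 0:
        order = order[:top_k]                      # keep only the K most likely tokens
    if top_p < 1.0:
        cum = np.cumsum(probs[order])
        # smallest prefix whose cumulative probability covers top_p
        order = order[: max(1, np.searchsorted(cum, top_p) + 1)]
    kept = probs[order] / probs[order].sum()       # renormalize survivors
    return int(rng.choice(order, p=kept))
```

With `top_k=1` (or a tiny `top_p`) this degenerates to greedy decoding, which is one way the "stuck in a loop" behavior described above can arise.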
Trends - Artificial Intelligence
[Chart: Leading USA-Based LLM Users; source: company disclosures; details on page 55] [Chart: Number of Developers (MM), 2005-2025; adoption over 33 years: Internet 90% @ year 23 vs. LLM 90% @ year 3] [Chart: Training Dataset Size (Number of Words) for Key AI Models, 1950-2025, per Epoch AI (5/25): +260% / year] [Chart: Number of Powerful AI Models; as of 4/25, "Large-Scale AI Models" are generally defined as those with a training compute of 10^23 FLOPs or greater, per Epoch AI (5/25)]
340 pages | 12.14 MB | 5 months ago
Dynamic Model in TVM
Invoke: invokes a Relay closure. InvokePacked: invokes a TVM compiled kernel. AllocStorage: allocates a storage block. AllocTensor: allocates a tensor value of a certain shape. AllocTensorReg: allocates a tensor based…
input_shape = [tvm.relay.Any(), 3, 224, 224]
dtype = "float32"
block = get_model('resnet50_v1', pretrained=True)
mod, params = relay.frontend.from_mxnet(block, shape={input_name: input_shape}, dtype=dtype)
…
24 pages | 417.46 KB | 6 months ago
Facebook -- TVM AWS Meetup Talk
…and model co-design
- PyTorch operator overhead makes an interpreter infeasible
- Reduce FLOPs with block-sparsified weight matrices (not a new idea; cf. WaveRNN, Sparse Transformers, etc.)
- Reduce precision…
Related work in Gibiansky (2017), Gray (2019), et al. Image from OpenAI.
- Add relay.nn.sparse_dense for block-sparse matrix multiplication (~50 lines of TVM IR)
- Add relay.reinterpret to implement rational…
11 pages | 3.08 MB | 6 months ago
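The talk above mentions cutting FLOPs with block-sparsified weight matrices. A minimal NumPy sketch of the idea, assuming a BSR-like layout where only nonzero square tiles of the weight matrix are stored (this is an illustration of the technique, not the `relay.nn.sparse_dense` implementation):

```python
import numpy as np

def block_sparse_matmul(x, blocks, block_idx, block_size, out_features):
    """Compute y = x @ W.T where W is given only by its nonzero tiles.

    blocks:    (nnz, bs, bs) array of nonzero tiles of W
    block_idx: sequence of (row, col) tile coordinates, one per tile
    Work (and FLOPs) scale with the number of nonzero tiles, not with
    the dense size of W.
    """
    y = np.zeros((x.shape[0], out_features), dtype=x.dtype)
    bs = block_size
    for blk, (r, c) in zip(blocks, block_idx):
        # each tile contributes to a bs-wide slice of the output
        y[:, r*bs:(r+1)*bs] += x[:, c*bs:(c+1)*bs] @ blk.T
    return y
```

The block structure is what keeps this fast on real hardware: each tile is a small dense matmul, so the kernel stays vectorizable even at high sparsity.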
TVM@Alibaba AI Labs
Cooperative Fetching lets threads (work items) in the same thread block (work group) cooperatively fetch dependent data. https://docs.tvm… https://www.khronos.org/registry/OpenCL/specs/opencl-1…
12 pages | 1.94 MB | 6 months ago
OpenAI, "A practical guide to building agents"
…"timestamp"… "Search agent": "Help the user search the internet and save results if asked." As the number of required tools increases, consider splitting tasks across multiple agents (see Orchestration)… action or output. For example, a step might instruct the agent to ask the user for their order number or to call an API to retrieve account details. Being explicit about the action (and even the wording)… exit conditions include tool calls, a certain structured output, errors, or reaching a maximum number of turns. For example, in the Agents SDK, agents are started…
34 pages | 7.00 MB | 6 months ago
TVM Meetup: Quantization
Overview
• Represent FP32 numbers with lower-precision INT8 numbers
• The integer number stands as a proxy for an FP32 number (not a downcast)
• A quantized tensor is represented with a scale and a zero point…
19 pages | 489.50 KB | 6 months ago
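The overview above says an INT8 tensor, together with a scale and a zero point, acts as a proxy for the original FP32 values. A minimal sketch of that affine mapping, assuming the standard scheme r ≈ scale * (q - zero_point) (the helper names here are illustrative, not a TVM API):

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map FP32 values to INT8 proxies: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate FP32 values: r = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 127, 0          # symmetric range [-1, 1]
x_hat = dequantize(quantize(x, scale, zero_point), scale, zero_point)
# x_hat is close to x; rounding error is bounded by scale / 2 per element
```

This is why it is a proxy rather than a downcast: the INT8 values are meaningless without their scale and zero point, and the round trip only approximates the original tensor.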
8 results in total