DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
…Artificial General Intelligence (AGI). In general, the intelligence of an LLM tends to improve as the number of parameters increases, allowing it to exhibit emergent capabilities across various tasks (Wei et al.). Figure 1(a) highlights that, on MMLU, DeepSeek-V2 achieves top-ranking performance with only a small number of activated parameters. In addition, as shown in Figure 1(b), compared with DeepSeek 67B, DeepSeek-V2… [Architecture figure: a Transformer block combining RMS Norm, Multi-Head Latent Attention, and a DeepSeekMoE feed-forward network]
52 pages | 1.23 MB | 1 year ago
Google, "Prompt Engineering v7"
…these configurations optimally for your task. Output length: an important configuration setting is the number of tokens to generate in a response; generating more tokens requires more computation from the LLM… you are looking for. Putting it all together: choosing between top-K, top-P, temperature, and the number of tokens to generate depends on the specific application and desired outcome, and the settings chosen… a word or phrase will, by chance, lead back to a prior state, creating a loop due to the vast number of available options. In both cases, the model's sampling process gets "stuck," resulting in monotonous…
68 pages | 6.50 MB | 6 months ago
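The snippet above discusses how temperature, top-K, and top-P interact when choosing the next token. A minimal sketch of that interaction, assuming raw next-token logits (the function name and parameters here are illustrative, not taken from the guide):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Illustrative sampler: temperature reshapes the distribution,
    top-K and top-P then restrict which tokens remain eligible."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())          # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                # most-likely token first
    if top_k > 0:
        order = order[:top_k]                      # keep only the K most likely tokens
    if top_p < 1.0:
        cum = np.cumsum(probs[order])
        # smallest prefix whose cumulative probability covers top_p
        order = order[: max(1, np.searchsorted(cum, top_p) + 1)]
    kept = probs[order] / probs[order].sum()       # renormalize survivors
    return int(rng.choice(order, p=kept))
```

With `top_k=1` (or a tiny `top_p`) this degenerates to greedy decoding, which is one way the "stuck in a loop" behavior described above can arise.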
Trends - Artificial Intelligence
[Chart: Leading USA-Based LLM Users; source: company disclosures; details on page 55] [Chart: Number of Developers (MM), 2005-2025; adoption over 33 years: Internet 90% @ year 23 vs. LLM 90% @ year 3] [Chart: Training Dataset Size (Number of Words) for Key AI Models, 1950-2025, per Epoch AI (5/25): +260% / year] [Chart: Number of Powerful AI Models; as of 4/25, "Large-Scale AI Models" are generally defined as those with a training compute of 10^23 FLOPs or greater, per Epoch AI (5/25)]
340 pages | 12.14 MB | 5 months ago
Dynamic Model in TVM
Invoke: invokes a Relay closure. InvokePacked: invokes a TVM compiled kernel. AllocStorage: allocates a storage block. AllocTensor: allocates a tensor value of a certain shape. AllocTensorReg: allocates a tensor based…
input_shape = [tvm.relay.Any(), 3, 224, 224]
dtype = "float32"
block = get_model('resnet50_v1', pretrained=True)
mod, params = relay.frontend.from_mxnet(block, shape={input_name: input_shape}, dtype=dtype)
…
24 pages | 417.46 KB | 6 months ago
Facebook -- TVM AWS Meetup Talk
…and model co-design
- PyTorch operator overhead makes an interpreter infeasible
- Reduce FLOPs with block-sparsified weight matrices (not a new idea; cf. WaveRNN, Sparse Transformers, etc.)
- Reduce precision…
Related work in Gibiansky (2017), Gray (2019), et al. Image from OpenAI.
- Add relay.nn.sparse_dense for block-sparse matrix multiplication (~50 lines of TVM IR)
- Add relay.reinterpret to implement rational…
11 pages | 3.08 MB | 6 months ago
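The talk above mentions cutting FLOPs with block-sparsified weight matrices. A minimal NumPy sketch of the idea, assuming a BSR-like layout where only nonzero square tiles of the weight matrix are stored (this is an illustration of the technique, not the `relay.nn.sparse_dense` implementation):

```python
import numpy as np

def block_sparse_matmul(x, blocks, block_idx, block_size, out_features):
    """Compute y = x @ W.T where W is given only by its nonzero tiles.

    blocks:    (nnz, bs, bs) array of nonzero tiles of W
    block_idx: sequence of (row, col) tile coordinates, one per tile
    Work (and FLOPs) scale with the number of nonzero tiles, not with
    the dense size of W.
    """
    y = np.zeros((x.shape[0], out_features), dtype=x.dtype)
    bs = block_size
    for blk, (r, c) in zip(blocks, block_idx):
        # each tile contributes to a bs-wide slice of the output
        y[:, r*bs:(r+1)*bs] += x[:, c*bs:(c+1)*bs] @ blk.T
    return y
```

The block structure is what keeps this fast on real hardware: each tile is a small dense matmul, so the kernel stays vectorizable even at high sparsity.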
TVM@Alibaba AI Labs
Cooperative Fetching lets threads (work items) in the same thread block (work group) cooperatively fetch dependent data. https://docs.tvm… https://www.khronos.org/registry/OpenCL/specs/opencl-1…
12 pages | 1.94 MB | 6 months ago
OpenAI, "A practical guide to building agents"
…"timestamp"… "Search agent": "Help the user search the internet and save results if asked." As the number of required tools increases, consider splitting tasks across multiple agents (see Orchestration)… action or output. For example, a step might instruct the agent to ask the user for their order number or to call an API to retrieve account details. Being explicit about the action (and even the wording)… exit conditions include tool calls, a certain structured output, errors, or reaching a maximum number of turns. For example, in the Agents SDK, agents are started…
34 pages | 7.00 MB | 6 months ago
TVM Meetup: Quantization
Overview
• Represent FP32 numbers with lower-precision INT8 numbers
• The integer number stands as a proxy for an FP32 number (not a downcast)
• A quantized tensor is represented with a scale and a zero point…
19 pages | 489.50 KB | 6 months ago
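The overview above says an INT8 tensor, together with a scale and a zero point, acts as a proxy for the original FP32 values. A minimal sketch of that affine mapping, assuming the standard scheme r ≈ scale * (q - zero_point) (the helper names here are illustrative, not a TVM API):

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map FP32 values to INT8 proxies: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate FP32 values: r = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 127, 0          # symmetric range [-1, 1]
x_hat = dequantize(quantize(x, scale, zero_point), scale, zero_point)
# x_hat is close to x; rounding error is bounded by scale / 2 per element
```

This is why it is a proxy rather than a downcast: the INT8 values are meaningless without their scale and zero point, and the round trip only approximates the original tensor.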
8 results in total