DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

… MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. … Multi-head Latent Attention (MLA). Through jointly compressing the keys and values into a latent vector, MLA significantly reduces the KV cache during inference. … Then, $\mathbf{q}_t$, $\mathbf{k}_t$, $\mathbf{v}_t$ will be sliced into … the low-rank joint compression for keys and values:

$\mathbf{c}_t^{KV} = W^{DKV} \mathbf{h}_t$,   (9)
$\mathbf{k}_t^{C} = W^{UK} \mathbf{c}_t^{KV}$,   (10)
$\mathbf{v}_t^{C} = W^{UV} \mathbf{c}_t^{KV}$,   (11)

where $\mathbf{c}_t^{KV} \in \mathbb{R}^{d_c}$ is the compressed latent vector for keys and values; $d_c \,(\ll d_h n_h)$ denotes the KV compression dimension; and $W^{DKV} \in \mathbb{R}^{d_c \times d}$ is the down-projection …

52 pages | 1.23 MB | 1 year ago
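As a quick illustration of the KV-cache saving described by equations (9)-(11), here is a minimal numpy sketch of low-rank joint KV compression. The dimensions are made up and this is not DeepSeek's code; the point is that only the $d_c$-dimensional latent $\mathbf{c}_t^{KV}$ needs to be cached per token, while per-head keys and values are recovered from it on the fly.

    import numpy as np

    d, d_c, n_h, d_h = 1024, 128, 8, 64                  # hypothetical model / latent / head sizes
    rng = np.random.default_rng(0)
    W_DKV = 0.02 * rng.standard_normal((d_c, d))         # down-projection, d -> d_c
    W_UK  = 0.02 * rng.standard_normal((n_h * d_h, d_c)) # up-projection for keys
    W_UV  = 0.02 * rng.standard_normal((n_h * d_h, d_c)) # up-projection for values

    h_t  = rng.standard_normal(d)                        # hidden state of one token
    c_kv = W_DKV @ h_t                                   # compressed latent: the only thing cached
    k_c  = (W_UK @ c_kv).reshape(n_h, d_h)               # per-head keys recovered from the latent
    v_c  = (W_UV @ c_kv).reshape(n_h, d_h)               # per-head values recovered from the latent
    print(c_kv.size, k_c.size + v_c.size)                # 128 floats cached vs. 1024 recomputed
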
Trends Artificial Intelligence
AI Development Trending = Unprecedented. Machine-Learning Model* Trending = In 2015… Industry Surpassed Academia as Data + Compute + Financial Needs Rose. *Machine Learning = a subset of AI where machines … Epoch AI, an AI Index data provider, uses the term 'notable machine learning models' to designate particularly influential models within the AI/machine learning ecosystem. Epoch maintains a database of 900 … [Chart: Global Notable Machine Learning Models by Sector, 2003-2024, per Stanford HAI – annual new notable machine-learning models, split into a pre-2015 'Academia Era' and a 2015-today 'Industry Era'.]

340 pages | 12.14 MB | 4 months ago
TVM: Where Are We Going

… (Py/Java/Go):

    lib = tvm.module.load("mylib.so")
    func = lib["npufunction0"]
    func(a, b)

Automatic RPC support:

    remote = tvm.rpc.connect(board_url, port)
    remote.upload("mylib.so")
    remote_mod = remote.load_module("mylib.so")
    func = remote_mod["npufunction0"]
    func(remote_a, remote_b)

Virtual Machine: supporting dynamic workloads – dynamic-shape workloads; more runtime objects (arrays, tuples, trees, ADTs); a runtime for dynamic models (credit: Jared Roesch, Haichen Shen et al.). µTVM: TVM on bare-metal devices – supports bare-metal JTAG devices, no OS needed; ARM Cortex-M, RISC-V (credit: Logan Weber). µTVM upcoming: …

31 pages | 22.64 MB | 5 months ago
开源中国 (OSCHINA) 2023 大模型(LLM)技术报告 [OSCHINA 2023 Large Language Model (LLM) Technology Report]

In the first four months of 2023, funding raised by vector database companies exceeded the total for all of 2022 (source: https://www.cbinsights.com/research/generative-ai-infrastructure-vector-database/). LLM infrastructure: large-model frameworks and fine-tuning. Large-model frameworks are software frameworks designed specifically for building, training, and deploying large machine-learning and deep-learning models. These frameworks provide … data storage, model training, and deployment services; they typically offer easy-to-use interfaces and support rapid iteration and large-scale deployment. Amazon SageMaker, Google Cloud AI Platform, and Microsoft Azure Machine Learning are all cloud platforms that provide end-to-end machine-learning services. … These tools and libraries are designed specifically to accelerate the training and inference of machine-learning models, typically by leveraging hardware such as GPUs or TPUs; they can significantly speed up training and inference …

32 pages | 13.09 MB | 1 year ago
OctoML OSS (2019-11-08)

Team: PhDs in machine learning, computer architecture, and programming languages and compilers; a professor of computational biology and machine learning; backgrounds at Intel, Microsoft, Apple, and Qualcomm; 40+ years of combined experience in computer systems design and machine learning. Open source at OctoML: we are big believers … Infrastructure improvements to TVM: µTVM (support for microcontrollers in TVM); Virtual Machine and dynamic NN support (w/ AWS folks); improved NLP support, with a focus on transformers.

16 pages | 1.77 MB | 5 months ago
Dynamic Model in TVM

Support dynamic models in TVM:
● Support Any-dim in typing
● Use shape functions to compute types at runtime
● Virtual machine as a new runtime for Relay
● Dynamic dependent …

[Diagram: relay.vm.compile turns a Relay program into a Relay executable, a hardware-independent Relay object containing a code segment and VM functions.]

Selected Relay VM instructions: "… type using the entries from a register"; AllocClosure – allocates a closure with a lowered virtual machine function; If – jumps to the true or false offset depending on the condition; Goto – unconditionally …

24 pages | 417.46 KB | 5 months ago
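To make the VM flow above concrete, here is a small sketch of compiling a dynamic-shape model with relay.vm.compile and running it on the Relay virtual machine. Exact API names shift between TVM releases, so treat this as an illustrative sketch rather than canonical usage.

    import numpy as np
    import tvm
    from tvm import relay

    # A relu whose batch dimension is unknown until runtime (Any-dim typing).
    x = relay.var("x", shape=(relay.Any(), 4), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

    exe = relay.vm.compile(mod, target="llvm")            # hardware-independent executable
    vm = tvm.runtime.vm.VirtualMachine(exe, tvm.cpu())    # VM runtime that interprets it

    for batch in (2, 7):                                  # one executable, several input shapes
        data = tvm.nd.array(np.ones((batch, 4), dtype="float32"))
        print(vm.invoke("main", data).shape)
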
TVM Meetup: Quantization

TVM stack: framework parsers, graph-level optimizations, tensor-level optimizations, machine-code generation; codegen via LLVM, CUDA, C, … Quantization uses the affine mapping real_value = scale × (quantized_value − zero_point). How to support framework-quantized operators? Option 1 – completely add new ops from scratch (new Relay passes). Conclusion: the TVM community is pursuing both automatic and pre-quantized model support, and contributions are welcomed; we need new/tuned TVM schedules using fast integer operations like …

19 pages | 489.50 KB | 5 months ago
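For readers skimming this entry, the mapping real_value = scale × (quantized_value − zero_point) is easy to demonstrate with a few lines of plain Python. The sketch below is generic uint8 quantization with hypothetical scale and zero-point values, not TVM's QNN implementation.

    import numpy as np

    def quantize(x, scale, zero_point, qmin=0, qmax=255):
        q = np.round(x / scale) + zero_point              # map real values onto the integer grid
        return np.clip(q, qmin, qmax).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.int32) - zero_point)  # recover approximate real values

    x = np.array([-0.5, 0.0, 0.25, 0.9], dtype=np.float32)
    scale, zero_point = 1.0 / 128, 128                    # hypothetical per-tensor parameters
    q = quantize(x, scale, zero_point)
    print(q, dequantize(q, scale, zero_point))            # reconstruction is close to x, not exact
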
TVM@Alibaba AI Labs

Alibaba AI Labs (阿里巴巴人工智能实验室): HiFi4 DSP and PowerVR GPU support by TVM. [Architecture diagram for PowerVR support: frontends feed symbols and params to the NNVM compiler, which applies computation-graph optimizations and emits an execution graph plus model-layer functions; operators are written as algorithm and schedule against TOPI backends (CUDA TOPI, Mali TOPI, ROCm TOPI, PVR TOPI); a machine-learning automated optimizer pairs a schedule explorer with a cost model.]

12 pages | 1.94 MB | 5 months ago
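The "schedule explorer + cost model" box in that diagram can be pictured with a toy loop like the one below. The knobs and cost functions are entirely hypothetical (this is not AutoTVM's code); the point is that a learned model ranks candidate schedules cheaply so only a shortlist is measured on real hardware.

    import random

    search_space = [{"tile": t, "unroll": u} for t in (4, 8, 16, 32) for u in (0, 1)]

    def predicted_cost(cfg):
        # Stand-in for a learned cost model that scores a schedule without running it.
        return 1.0 / cfg["tile"] + 0.1 * cfg["unroll"] + 0.01 * random.random()

    def measure_on_device(cfg):
        # Stand-in for compiling the schedule and timing it on the target device.
        return 1.0 / cfg["tile"] + (0.05 if cfg["unroll"] else 0.2)

    candidates = sorted(search_space, key=predicted_cost)[:4]  # explore cheaply with the model
    best = min(candidates, key=measure_on_device)              # verify the shortlist on hardware
    print("best schedule knobs:", best)
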
Google 《Prompt Engineering v7》

… the input the model uses to predict a specific output. You don't need to be a data scientist or a machine learning engineer – everyone can write a prompt. However, crafting the most effective prompt can be complicated; … [an inadequate prompt] can hinder the model's ability to provide meaningful output. … The Gemini temperature control can be understood in a similar way to the softmax function used in machine learning. A low temperature setting mirrors a low softmax temperature (T), emphasizing a single …

68 pages | 6.50 MB | 6 months ago
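The softmax analogy in that excerpt is easy to check numerically. The sketch below uses hypothetical next-token logits and plain numpy to show how a low temperature concentrates nearly all probability on one token while a higher temperature flattens the distribution.

    import numpy as np

    def softmax_with_temperature(logits, temperature):
        z = np.asarray(logits, dtype=np.float64) / temperature
        z -= z.max()                                  # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = [2.0, 1.0, 0.2]                          # hypothetical next-token logits
    for t in (0.1, 1.0, 2.0):
        print(t, np.round(softmax_with_temperature(logits, t), 3))
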
清华大学 DeepSeek+DeepResearch 让科研像聊天一样简单 (Tsinghua University: DeepSeek + DeepResearch, Making Research as Simple as Chatting)

… compressive force (shell strength) following Burnett and Belk (2018). A universal material-testing machine (MTS System Corporation, Eden Prairie, MN, USA, Model 661; Fig. 1) was used to determine the shell …

85 pages | 8.31 MB | 8 months ago













