# Overview
First release of RTP-LLM: version 0.2.0 (2025.09).
## Features
### Framework Advanced Features
### New Models
| Model Family (Variants) | Example HuggingFace Identifier | Description | Supported Card Type |
|---|---|---|---|
| DeepSeek (v1, v2, v3/R1) | | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning | NV ✅ |
| Kimi (Kimi-K2) | | Moonshot’s MoE LLMs with 1 trillion parameters, exceptional at agentic intelligence | NV ✅ |
| Qwen (v1, v1.5, v2, v2.5, v3, QwQ, Qwen3-Coder) | | Series of advanced reasoning-optimized models | NV ✅ |
| QwenVL (VL2, VL2.5, VL3) | | Vision-language model series based on Qwen2.5/Qwen3 | NV ✅ |
| Llama | | Meta’s open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and the new Llama 4) with well-recognized performance | NV ✅ |
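
For reference, once one of the models above is being served, it can typically be queried through an OpenAI-compatible chat endpoint. The snippet below is a minimal sketch using the official `openai` Python client; the base URL, port, and served model name are assumptions about a local deployment, not values defined by this release.

```python
# Minimal sketch: query a locally served model through an OpenAI-compatible
# chat endpoint. The base_url, port, and model name are assumptions about a
# local deployment; adjust them to match your server configuration.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8088/v1",  # assumed local endpoint
    api_key="EMPTY",                       # local servers usually ignore the key
)

response = client.chat.completions.create(
    model="Qwen3-8B",  # assumed served model name
    messages=[{"role": "user", "content": "Summarize MoE models in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```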
## Bug Fixes
- Fixed cache metrics not being reported.
- Fixed a deadlock in P/D (prefill/decode) disaggregation.
## Known Issues
- Loading a large number of dynamic LoRA adapters requires increasing `reserver_runtime_mem_mb` (see the sketch after this list).
- MoE models are not yet supported on AMD GPUs.
- MoE models without shared experts cannot be used with `enable-layer-micro-batch`.
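
As a workaround for the first item, the reserved runtime memory can be raised before starting the server. The sketch below assumes the option named above is read from the environment as `RESERVER_RUNTIME_MEM_MB` and uses a placeholder launch command; both are assumptions and may differ in your installation.

```python
# Minimal sketch: reserve more runtime memory before starting the server when
# many dynamic LoRA adapters will be loaded. The environment-variable spelling
# (uppercase form of reserver_runtime_mem_mb) and the launch command are
# assumptions -- substitute the start command used by your deployment.
import os
import subprocess

# Assumed env-var spelling; the release notes only name the lowercase option.
os.environ["RESERVER_RUNTIME_MEM_MB"] = "4096"  # e.g. reserve ~4 GiB

# Placeholder launch command; replace with your actual RTP-LLM start command.
subprocess.run(["python3", "-m", "rtp_llm.start_server"], env=os.environ, check=True)
```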