Overview#

RTP-LLM First Release: Version 0.2.0 (2025.09)

Features#

Framework Advanced Features#

New Models#

| Model Family (Variants) | Example HuggingFace Identifier | Description | Supported Card Types |
|---|---|---|---|
| DeepSeek (v1, v2, v3/R1) | deepseek-ai/DeepSeek-R1 | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. RTP-LLM provides DeepSeek V3/R1 model-specific optimizations. | NV ✅ / AMD ✅ |
| Kimi (Kimi-K2) | moonshotai/Kimi-K2-Instruct | Moonshot's 1-trillion-parameter MoE LLMs, exceptional at agentic intelligence. | NV ✅ / AMD ✅ |
| Qwen (v1, v1.5, v2, v2.5, v3, QwQ, Qwen3-Coder) | Qwen/Qwen3-235B-A22B | Series of advanced reasoning-optimized models: significantly improved performance on logical reasoning, mathematics, science, coding, and academic benchmarks that typically require human expertise, achieving state-of-the-art results among open-source thinking models; markedly better general capabilities such as instruction following, tool usage, text generation, and alignment with human preferences; enhanced 256K long-context understanding. | NV ✅ / AMD ✅ |
| QwenVL (VL2, VL2.5, VL3) | Qwen/Qwen2-VL-2B | Series of advanced vision-language models based on Qwen2.5/Qwen3. | NV ✅ / AMD ❌ |
| Llama | meta-llama/Llama-4-Scout-17B-16E-Instruct | Meta's open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and the new Llama 4), with well-recognized performance. | NV ✅ / AMD ✅ |
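
Once any of these models is being served by RTP-LLM, it can be queried with the standard OpenAI Python client. The snippet below is a minimal sketch, assuming a locally running RTP-LLM server that exposes an OpenAI-compatible /v1/chat/completions route; the host, port, API key, model name, and prompt are placeholders to adapt to your deployment.

```python
# Hedged sketch: assumes an RTP-LLM server is already running locally and
# exposes an OpenAI-compatible endpoint (adjust host/port to your deployment).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8088/v1",  # assumption: local RTP-LLM server address
    api_key="none",                       # assumption: no auth for a local server
)

# The model name here is one of the example identifiers from the table above.
response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",
    messages=[{"role": "user", "content": "Give a one-sentence summary of MoE models."}],
)
print(response.choices[0].message.content)
```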

Bug Fixes#

  • Fixed cache metrics not working

  • Fixed a deadlock in P/D disaggregation

Known Issues#

  • Loading many dynamic LoRA adapters requires a larger reserver_runtime_mem_mb (see the sketch after this list)

  • MoE models are not supported on AMD

  • MoE models without a shared expert cannot use enable-layer-micro-batch
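
As a workaround for the dynamic-LoRA item above, the sketch below shows one way to raise the reserved runtime memory before launching the server. It assumes reserver_runtime_mem_mb is read from the environment at startup; the upper-cased variable name, the 4096 MB value, and the launch script are placeholders, not confirmed RTP-LLM interfaces.

```python
import os
import subprocess

# Workaround sketch for the dynamic-LoRA issue listed above.
# Assumption: RTP-LLM reads reserver_runtime_mem_mb from the environment at startup;
# the variable name casing, the value, and the launch script are placeholders.
os.environ["RESERVER_RUNTIME_MEM_MB"] = "4096"  # reserve more runtime memory (MB)

# Placeholder launch command; replace with however you normally start the server.
subprocess.run(["bash", "start_rtp_llm_server.sh"], check=True)
```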

Performance#

Compatibility#