Attention Backend

Support matrix for different attention backends

| Backend | Page Size > 1 | Spec Decoding | MLA | Sliding Window | Device Support | Server Args | Stage |
|---|---|---|---|---|---|---|---|
| TRT_V1 | | | | | NV ✅, AMD ❌ | `--enable_trtv1_fmha` | PREFILL ✅, DECODE ❌ |
| TRT_V2 | | | | | NV ✅, AMD ❌ | `--enable_trt_fmha` | PREFILL ✅, DECODE ❌ |
| PAGED_TRT_V2 | | | | | NV ✅, AMD ❌ | `--enable_paged_trt_fmha` | PREFILL ✅, DECODE ❌ |
| OPEN_SOURCE | | | | | NV ✅, AMD ❌ | `--enable_open_source_fmha` | PREFILL ✅, DECODE ❌ |
| PAGED_OPEN_SOURCE | | | | | NV ✅, AMD ❌ | `--enable_paged_open_source_fmha` | PREFILL ✅, DECODE ❌ |
| CKFMHA | | | | | NV ❌, AMD ✅ | None | PREFILL ✅, DECODE ❌ |
| FlashInfer | | | | | NV ✅, AMD ✅ | `--disable_flash_infer` | PREFILL ✅, DECODE ✅ |
| XQA | | | | | NV Hopper ✅, AMD ❌ | `--enable_xqa` | PREFILL ❌, DECODE ✅ |
| FlashMLA | | | | | NV Hopper ✅, AMD ❌ | None | PREFILL ❌, DECODE ✅ |
| MMHA | | | | | NV ✅, AMD ✅ | None | PREFILL ❌, DECODE ✅ |
| AiterPA | | | | | NV ❌, AMD ✅ | None | PREFILL ❌, DECODE ✅ |
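
To make the matrix easier to query, the device and stage columns can be encoded as a small lookup table. The following is a minimal sketch, not part of the server's API: the `Backend` type, the `BACKENDS` list, and the `candidates` helper are hypothetical names introduced for illustration, and the flag strings assume the server arguments are written with a leading double hyphen. The "Hopper only" restriction on XQA and FlashMLA is kept as a note rather than enforced.

```python
from typing import List, NamedTuple, Optional, Set


class Backend(NamedTuple):
    """One row of the support matrix (illustrative type, not a real API)."""
    name: str
    devices: Set[str]          # subset of {"NV", "AMD"}
    stages: Set[str]           # subset of {"PREFILL", "DECODE"}
    server_arg: Optional[str]  # flag from the matrix; None where the matrix says "None"
    note: str = ""


# Rows transcribed from the Device Support, Server Args, and Stage columns.
BACKENDS: List[Backend] = [
    Backend("TRT_V1", {"NV"}, {"PREFILL"}, "--enable_trtv1_fmha"),
    Backend("TRT_V2", {"NV"}, {"PREFILL"}, "--enable_trt_fmha"),
    Backend("PAGED_TRT_V2", {"NV"}, {"PREFILL"}, "--enable_paged_trt_fmha"),
    Backend("OPEN_SOURCE", {"NV"}, {"PREFILL"}, "--enable_open_source_fmha"),
    Backend("PAGED_OPEN_SOURCE", {"NV"}, {"PREFILL"}, "--enable_paged_open_source_fmha"),
    Backend("CKFMHA", {"AMD"}, {"PREFILL"}, None),
    Backend("FlashInfer", {"NV", "AMD"}, {"PREFILL", "DECODE"}, "--disable_flash_infer"),
    Backend("XQA", {"NV"}, {"DECODE"}, "--enable_xqa", note="NV Hopper only"),
    Backend("FlashMLA", {"NV"}, {"DECODE"}, None, note="NV Hopper only"),
    Backend("MMHA", {"NV", "AMD"}, {"DECODE"}, None),
    Backend("AiterPA", {"AMD"}, {"DECODE"}, None),
]


def candidates(device: str, stage: str) -> List[str]:
    """Return the names of backends the matrix marks as supported for a device and stage."""
    return [b.name for b in BACKENDS if device in b.devices and stage in b.stages]


if __name__ == "__main__":
    print(candidates("AMD", "PREFILL"))  # ['CKFMHA', 'FlashInfer']
    print(candidates("AMD", "DECODE"))   # ['FlashInfer', 'MMHA', 'AiterPA']
    print(candidates("NV", "DECODE"))    # ['FlashInfer', 'XQA', 'FlashMLA', 'MMHA']
```

The lookup only reflects what the matrix lists as supported. It does not model how the server actually chooses among several eligible backends at runtime.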