Attention Backend
Support matrix for attention backends
| Backend | Page Size > 1 | Spec Decoding | MLA | Sliding Window | Device Support | Server Args | Stage |
|---|---|---|---|---|---|---|---|
| TRT_V1 | ❌ | ❌ | ❌ | ❌ | NV ✅ | `--enable_trtv1_fmha` | PREFILL ✅ |
| TRT_V2 | ❌ | ❌ | ❌ | ❌ | NV ✅ | `--enable_trt_fmha` | PREFILL ✅ |
| PAGED_TRT_V2 | ✅ | ❌ | ❌ | ❌ | NV ✅ | `--enable_paged_trt_fmha` | PREFILL ✅ |
| OPEN_SOURCE | ❌ | ❌ | ❌ | ❌ | NV ✅ | `--enable_open_source_fmha` | PREFILL ✅ |
| PAGED_OPEN_SOURCE | ✅ | ❌ | ❌ | ❌ | NV ✅ | `--enable_paged_open_source_fmha` | PREFILL ✅ |
| CKFMHA | ❌ | ❌ | ✅ | ✅ | NV ❌ | None | PREFILL ✅ |
| FlashInfer | ✅ | ✅ | ✅ | ✅ | NV ✅ | `--disable_flash_infer` | PREFILL ✅ |
| XQA | ✅ | ❌ | ❌ | ❌ | NV Hopper ✅ | `--enable_xqa` | PREFILL ❌ |
| FlashMLA | ✅ | ✅ | ✅ | ❌ | NV Hopper ✅ | None | PREFILL ❌ |
| MMHA | ✅ | ❌ | ❌ | ❌ | NV ✅ | None | PREFILL ❌ |
| AiterPA | ✅ | ❌ | ❌ | ❌ | NV ❌ | None | PREFILL ❌ |
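The matrix can be encoded as data and filtered by the features a deployment needs. The sketch below is a hypothetical helper and not part of the project's API. The names `Backend`, `MATRIX`, and `candidates` are illustrative only.

It relies on three interpretations of the table, all of them assumptions:

- "NV Hopper ✅" means the backend runs only on Hopper GPUs.
- "NV ❌" means it is unavailable on NVIDIA GPUs.
- "PREFILL ❌" in the Stage column means the backend is not used for the prefill stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    name: str
    paged: bool               # "Page Size > 1"
    spec_decoding: bool       # "Spec Decoding"
    mla: bool                 # "MLA"
    sliding_window: bool      # "Sliding Window"
    nv_support: str           # assumption: "all" = NV ✅, "hopper" = NV Hopper ✅, "none" = NV ❌
    server_arg: Optional[str] # "Server Args"; None where the table lists no flag
    prefill: bool             # assumption: True = "PREFILL ✅", False = "PREFILL ❌"


# One record per row of the support matrix above.
MATRIX = [
    Backend("TRT_V1", False, False, False, False, "all", "--enable_trtv1_fmha", True),
    Backend("TRT_V2", False, False, False, False, "all", "--enable_trt_fmha", True),
    Backend("PAGED_TRT_V2", True, False, False, False, "all", "--enable_paged_trt_fmha", True),
    Backend("OPEN_SOURCE", False, False, False, False, "all", "--enable_open_source_fmha", True),
    Backend("PAGED_OPEN_SOURCE", True, False, False, False, "all", "--enable_paged_open_source_fmha", True),
    Backend("CKFMHA", False, False, True, True, "none", None, True),
    Backend("FlashInfer", True, True, True, True, "all", "--disable_flash_infer", True),
    Backend("XQA", True, False, False, False, "hopper", "--enable_xqa", False),
    Backend("FlashMLA", True, True, True, False, "hopper", None, False),
    Backend("MMHA", True, False, False, False, "all", None, False),
    Backend("AiterPA", True, False, False, False, "none", None, False),
]


def candidates(*, paged=False, spec_decoding=False, mla=False,
               sliding_window=False, hopper=False, prefill=None):
    """Hypothetical helper: names of backends covering every requested feature on an NVIDIA GPU."""
    wanted = {"paged": paged, "spec_decoding": spec_decoding,
              "mla": mla, "sliding_window": sliding_window}
    result = []
    for b in MATRIX:
        # Skip backends marked "NV ❌", and Hopper-only backends when the GPU is not Hopper.
        if b.nv_support == "none" or (b.nv_support == "hopper" and not hopper):
            continue
        # Every requested feature must be marked ✅ for this backend.
        if any(need and not getattr(b, feat) for feat, need in wanted.items()):
            continue
        # Optionally filter on the Stage column.
        if prefill is not None and b.prefill != prefill:
            continue
        result.append(b.name)
    return result
```

For example, `candidates(mla=True, spec_decoding=True, hopper=True)` returns `["FlashInfer", "FlashMLA"]`. Without `hopper=True`, only `["FlashInfer"]` remains.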