Pull requests: vllm-project/flash-attention

- Add TopK mask utils (#127, opened Mar 23, 2026 by MatthewBonanni)
- Fp8 two level accumulation every n steps (#122, opened Feb 27, 2026 by PatrykSaffer)
- add support for newer CUDA archs (Spark/Thor) (#121, opened Feb 13, 2026 by askliar)
- Fix issues with async TP (#117, opened Feb 7, 2026 by LucasWilkinson)
- Sync to upstream main 20260121 (#114, opened Jan 22, 2026 by LucasWilkinson)
- Add DCP parameters (#92, opened Sep 16, 2025 by MatthewBonanni) [draft]
- Vllm_flash_attn_with_attention_weights (#88, opened Sep 11, 2025 by SiriusPaul)
- WIP stream k scheduling (#67, opened Apr 29, 2025 by LucasWilkinson) [draft]
- fix: add "typename" prior to dependent type name (#54, opened Feb 28, 2025 by zhiweij1)
- AMD ROCm Build (#41, opened Jan 29, 2025 by ProExpertProg) [draft]
- support KV-Compress paged KV cache (#27, opened Nov 27, 2024 by IsaacRe)
- Add CUDA 8.7 arch for Jetson Orin (#26, opened Nov 27, 2024 by conroy-cheers)
- Update torch to 2.5.1 (#25, opened Nov 7, 2024 by ayakzob)
- Don't disable uneven k to support more headdims (#21, opened Sep 27, 2024 by njhill)
- Update .gitignore to ignore *env/ directories (#16, opened Aug 8, 2024 by wasertech)