
AMD - gpt-oss vllm mxfp4: AITER tuning + n-gram spec decode + server …#1657

Draft
nehaprakriya wants to merge 1 commit into SemiAnalysisAI:main from nehaprakriya:gptoss-fp4-mi355x-aiter-specdec


@nehaprakriya
Collaborator

@nehaprakriya nehaprakriya commented Jun 3, 2026

Summary

  • Enable n-gram speculative decoding (prompt-lookup, 3 draft tokens) — 3.26x decode throughput at low concurrency
  • Add full AITER env-var tuning (MXFP4, FP4 ASM GEMM, unified paged attention, inductor graph partition, fused RoPE+KV cache, opus MoE sorting)
  • Tune server params: gpu-memory-utilization=0.97, max-num-seqs=256, max-num-batched-tokens=16384, GPU_MAX_HW_QUEUES=4

Files changed

File Change
benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh Env vars + spec decode + server tuning
perf-changelog.yaml New entry documenting the uplift

Dependencies

  • AITER kernel patches submitted separately to nehaprakriya/aiter (MoE GEMM num_stages + split_k fix)

Note

Low Risk
Benchmark-only script and changelog changes; no production serving or auth paths.

Overview
Tunes the GPT-OSS FP4 MI355X vLLM benchmark launch path for higher throughput on ROCm.

benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh expands the AITER env tuning (MXFP4, unified paged attention, linear layers, FP4 ASM GEMM with Triton GEMM off, opus MoE sorting, AMDGCN_USE_BUFFER_OPS=1, GPU_MAX_HW_QUEUES=4, etc.) and keeps ROCM_AITER_UNIFIED_ATTN along with the fused RoPE+KV cache and inductor graph partition settings. It adds lossless n-gram prompt-lookup speculative decoding (num_speculative_tokens=3) via --speculative-config, raises --gpu-memory-utilization to 0.97, and sets --max-num-seqs 256 and --max-num-batched-tokens 16384.
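
For readers who want to see how these settings fit together, here is a minimal sketch of a launch command, not the script from this PR. The model ID, the port, and the `prompt_lookup_max` value are assumptions for illustration; only `GPU_MAX_HW_QUEUES`, `AMDGCN_USE_BUFFER_OPS`, the three server flags, and `num_speculative_tokens=3` come from the description above. The remaining AITER variables are left out because the description does not give their exact names.

```shell
#!/bin/sh
set -eu

# Assumption: model ID and port are placeholders, not taken from the PR.
MODEL="${MODEL:-openai/gpt-oss-120b}"
PORT="${PORT:-8000}"

# Env vars named explicitly in the PR description.
export GPU_MAX_HW_QUEUES=4
export AMDGCN_USE_BUFFER_OPS=1

# N-gram (prompt-lookup) speculative decoding with 3 draft tokens.
# Assumption: prompt_lookup_max=4 is illustrative; the PR does not state it.
SPEC_CONFIG='{"method": "ngram", "num_speculative_tokens": 3, "prompt_lookup_max": 4}'

# Collect the vllm arguments as positional parameters so that the JSON
# string survives as a single argument.
set -- serve "$MODEL" \
  --port "$PORT" \
  --gpu-memory-utilization 0.97 \
  --max-num-seqs 256 \
  --max-num-batched-tokens 16384 \
  --speculative-config "$SPEC_CONFIG"

# Dry run by default: print the command instead of starting the server.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "vllm $*"
else
  exec vllm "$@"
fi
```

Running the sketch as-is only prints the command; setting `DRY_RUN=0` would start the server, assuming `vllm` is on the `PATH`.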

perf-changelog.yaml records the gptoss-fp4-mi355x-vllm uplift (~3.26× decode throughput at low concurrency, per changelog).

Reviewed by Cursor Bugbot for commit 286dc1a. Bugbot is set up for automated code reviews on this repo.

Contributor

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@nehaprakriya nehaprakriya force-pushed the gptoss-fp4-mi355x-aiter-specdec branch from da6ae29 to d2b331c Compare June 3, 2026 17:41
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d2b331c.

Comment thread benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh Outdated
@nehaprakriya nehaprakriya force-pushed the gptoss-fp4-mi355x-aiter-specdec branch from d2b331c to 286dc1a Compare June 3, 2026 17:45
@seungrokj seungrokj added the AMD label Jun 4, 2026
@chunfangamd chunfangamd marked this pull request as draft June 5, 2026 01:01
@chunfangamd
Collaborator

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26990596873
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm
Pinned ref: 286dc1a
Approval: not required (trusted collaborator).
