AMD - gpt-oss vllm mxfp4: AITER tuning + n-gram spec decode + server … by nehaprakriya · Pull Request #1657 · SemiAnalysisAI/InferenceX

nehaprakriya · 2026-06-03T17:40:48Z

Summary

Enable n-gram speculative decoding (prompt-lookup, 3 draft tokens) — 3.26x decode throughput at low concurrency
Add full AITER env-var tuning (MXFP4, FP4 ASM GEMM, unified paged attention, inductor graph partition, fused RoPE+KV cache, opus MoE sorting)
Tune server params: gpu-memory-utilization=0.97, max-num-seqs=256, max-num-batched-tokens=16384, GPU_MAX_HW_QUEUES=4
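The sketch below shows how the summary's server settings could map onto a `vllm serve` launch. It is written in Python so the argument list can be inspected. It is not the benchmark script. `MODEL` is a placeholder, and `prompt_lookup_max=4` is an assumed value that the PR does not specify. Treating `VLLM_ROCM_USE_AITER` as the AITER master switch is also an assumption. The full AITER env-var set lives in gptoss_fp4_mi355x.sh and is not reproduced here.

```python
import json
import shlex

# Placeholder model id (assumption); the benchmark script sets its own checkpoint.
MODEL = "openai/gpt-oss-120b"

# n-gram (prompt-lookup) drafting with 3 speculative tokens, per the PR summary.
# prompt_lookup_max=4 is an assumed value, not taken from the PR.
SPEC_CONFIG = {"method": "ngram", "num_speculative_tokens": 3, "prompt_lookup_max": 4}


def build_env() -> dict:
    """Environment overrides; only the last two are named in the PR."""
    return {
        "VLLM_ROCM_USE_AITER": "1",    # assumed master switch for AITER kernels
        "AMDGCN_USE_BUFFER_OPS": "1",  # from the PR overview
        "GPU_MAX_HW_QUEUES": "4",      # from the PR summary
    }


def build_argv(model: str = MODEL) -> list:
    """Server flags from the PR summary, as a `vllm serve` argument list."""
    return [
        "vllm", "serve", model,
        "--gpu-memory-utilization", "0.97",
        "--max-num-seqs", "256",
        "--max-num-batched-tokens", "16384",
        "--speculative-config", json.dumps(SPEC_CONFIG),
    ]


if __name__ == "__main__":
    env = " ".join(f"{k}={v}" for k, v in build_env().items())
    print(env, shlex.join(build_argv()))
```

`shlex.join` quotes the JSON passed to `--speculative-config`, so the printed line can be pasted into a shell as-is.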

Files changed

File	Change
benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh	Env vars + spec decode + server tuning
perf-changelog.yaml	New entry documenting the uplift

Dependencies

AITER kernel patches submitted separately to nehaprakriya/aiter (MoE GEMM num_stages + split_k fix)

Note

Low Risk
Benchmark-only script and changelog changes; no production serving or auth paths are touched.

Overview
Tunes the GPT-OSS FP4 MI355X vLLM benchmark launch path for higher throughput on ROCm.

benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh expands the AITER env-var tuning (MXFP4, unified paged attention, linear layers, FP4 ASM GEMM with Triton GEMM off, opus MoE sorting, AMDGCN_USE_BUFFER_OPS=1, GPU_MAX_HW_QUEUES=4, etc.). It keeps ROCM_AITER_UNIFIED_ATTN and the fused RoPE + KV-cache and inductor graph-partition settings. It also enables lossless n-gram prompt-lookup speculative decoding (num_speculative_tokens=3) via --speculative-config, raises --gpu-memory-utilization to 0.97, and sets --max-num-seqs 256 and --max-num-batched-tokens 16384.
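The sketch below illustrates the two halves of n-gram prompt-lookup speculative decoding. It is simplified and is not vLLM's proposer. The drafting step matches the trailing n-gram against earlier context and proposes the tokens that followed it. The verification step accepts drafts only while they equal the target model's greedy choice. `propose_ngram`, `verify_greedy`, and the toy target are hypothetical names used for illustration. Only greedy decoding is covered here. Sampling-based verification uses rejection sampling, which is not shown.

```python
from typing import Callable, List, Sequence


def propose_ngram(tokens: Sequence[int], k: int = 3,
                  n_max: int = 4, n_min: int = 1) -> List[int]:
    """Prompt-lookup drafting: find an earlier occurrence of the trailing n-gram
    and propose the (up to) k tokens that followed it. Longer n-grams are tried
    first, and the most recent earlier occurrence wins."""
    for n in range(n_max, n_min - 1, -1):
        if len(tokens) <= n:
            continue  # need at least one token before the trailing n-gram
        suffix = list(tokens[-n:])
        # Scan windows that end before the last token, most recent first.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + k])
    return []  # no match: this step falls back to ordinary decoding


def verify_greedy(tokens: Sequence[int], draft: Sequence[int],
                  target_next: Callable[[Sequence[int]], int]) -> List[int]:
    """Greedy verification: keep draft tokens while they equal the target
    model's argmax, then emit one token from the target (a correction or a
    bonus token). The output matches plain greedy decoding, which is what
    makes the scheme lossless. A real engine scores all drafts in a single
    forward pass; this sketch calls target_next once per position for clarity."""
    ctx = list(tokens)
    out: List[int] = []
    for t in draft:
        want = target_next(ctx)
        if t != want:
            out.append(want)  # first mismatch: take the target's token and stop
            return out
        out.append(t)
        ctx.append(t)
    out.append(target_next(ctx))  # all drafts accepted: add the bonus token
    return out


def toy_target(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in model that continues the cycle 1, 2, 3, 4, 1, ..."""
    return ctx[-1] % 4 + 1


if __name__ == "__main__":
    history = [1, 2, 3, 4, 1, 2]
    draft = propose_ngram(history, k=3)
    print(draft, verify_greedy(history, draft, toy_target))  # [3, 4, 1] [3, 4, 1, 2]
```

Each verification step emits between 1 and k + 1 tokens. With num_speculative_tokens=3, one target pass can therefore yield up to 4 tokens when the prompt contains repeated spans.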

perf-changelog.yaml records the gptoss-fp4-mi355x-vllm uplift (~3.26× decode throughput at low concurrency, per changelog).
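The ~3.26× figure can be put in context with a quick calculation. With num_speculative_tokens=3, a single verification step emits at most 4 tokens (3 accepted drafts plus the target's own token). The sketch below uses the standard expected-tokens-per-step formula, 1 + α + … + α^k = (1 − α^(k+1)) / (1 − α). It assumes each draft token is accepted independently with probability α. That is a simplification: n-gram acceptance depends heavily on the input. Also, tokens per step is not the same as throughput, because verification passes cost more than plain decode steps and the changelog figure reflects all the tuning in this PR combined.

```python
def expected_tokens_per_step(alpha: float, k: int = 3) -> float:
    """Expected tokens emitted per target forward pass with k draft tokens,
    assuming each draft token is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    # Geometric sum 1 + alpha + ... + alpha**k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for a in (0.5, 0.8, 0.9, 1.0):
        print(a, expected_tokens_per_step(a, k=3))
```

At α = 1 the value is exactly k + 1 = 4, which is the ceiling on tokens per step for this configuration.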

Reviewed by Cursor Bugbot for commit 286dc1a.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit d2b331c.


chunfangamd · 2026-06-05T01:53:30Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm

github-actions · 2026-06-05T01:53:38Z

@chunfangamd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26990596873
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys gptoss-fp4-mi355x-vllm
Pinned ref: 286dc1a
Approval: not required (trusted collaborator).

nehaprakriya requested a review from a team June 3, 2026 17:40

github-project-automation Bot added this to InferenceMAX Board Jun 3, 2026

claude Bot reviewed Jun 3, 2026


nehaprakriya force-pushed the gptoss-fp4-mi355x-aiter-specdec branch from da6ae29 to d2b331c June 3, 2026 17:41

cursor Bot reviewed Jun 3, 2026


Comment thread on benchmarks/single_node/fixed_seq_len/gptoss_fp4_mi355x.sh (outdated)

286dc1a · AMD - gpt-oss vllm mxfp4: AITER tuning + n-gram spec decode + server parameter tuning

nehaprakriya force-pushed the gptoss-fp4-mi355x-aiter-specdec branch from d2b331c to 286dc1a June 3, 2026 17:45

seungrokj added the AMD label Jun 4, 2026

chunfangamd marked this pull request as draft June 5, 2026 01:01

nehaprakriya wants to merge 1 commit into SemiAnalysisAI:main from nehaprakriya:gptoss-fp4-mi355x-aiter-specdec

nehaprakriya commented Jun 3, 2026 (edited by cursor Bot)
3 participants