Enable RDNA4 (gfx1201) HIP support: arch cheatsheets, prompt builder, preprocessing by mehdi-saeedi · Pull Request #35 · AMD-AGI/AgentKernelArena

mehdi-saeedi · 2026-05-05T23:03:34Z

Summary

Enables the arena to target AMD RDNA4 (gfx1201, e.g. Radeon RX 9070 series) in addition to the existing CDNA targets (MI300/MI355X). The benchmark previously assumed a CDNA-flavored prompt path; on an RDNA4 box the agent received MI300-flavored guidance (XCDs, MFMA, 5.3 TB/s HBM, unified memory), which is wrong for RDNA. This PR adds an RDNA4 architecture spec, an RDNA-flavored HIP cheatsheet, and the plumbing to route the right cheatsheet per target_gpu_model.

What changes

- New cheatsheets (consumed by the agent prompt):
  - `src/prompts/cheatsheet/RDNA4_architecture.md` (+86 lines): gfx1201 spec sheet covering Wave32 as the default, WGP topology, no MFMA (use WMMA), 128 KB LDS per WGP, ~640 GB/s GDDR6, no XCDs, no unified memory, and `--offload-arch=gfx1201`. Calls out CDNA→RDNA pitfalls.
  - `src/prompts/cheatsheet/hip_rdna_cheatsheet.md` (+247 lines): RDNA-flavored HIP best practices (coalescing, occupancy with Wave32, LDS bank conflicts, vector loads, `__launch_bounds__`, etc.), used as the HIP knowledge file when the target is RDNA.
- `src/prompts/cheatsheet/default_cheatsheet.yaml`: registers `RDNA4` → `gfx1201` and introduces a `knowledge_override` map so RDNA tasks pull `hip_rdna_cheatsheet.md` instead of the CDNA-flavored `hip_cheatsheet.md`.
- `src/prompt_builder.py`:
  - Honor `knowledge_override` from the arch entry when picking the language-specific cheatsheet.
  - Stop silently defaulting `target_gpu_model` to `'MI300'`; raise if it is missing. This prevents a misconfigured RDNA box from quietly emitting MI300 prompts.
- `src/preprocessing.py::setup_rocm_env`: in addition to `PYTORCH_ROCM_ARCH`, also export `AMDGPU_TARGETS` and `GPU_TARGETS` (CMake honors these for HIP builds; some hip2hip task makefiles need them).
- `agents/geak_v3/launch_agent.py`: GEAK-v3 uses `simple_prompt_builder()` rather than the full builder in `src/prompt_builder.py`, so the cheatsheet wiring above was not reaching the agent. After the simple prompt is built, this commit appends the `_gpu_arch_precheck_prompt` and the arch cheatsheet from `default_cheatsheet.yaml`, mirroring the section layout used by the regular builder. The lookup is wrapped in `try`/`except` so a cheatsheet lookup error never breaks agent launch.
- `tasks/hip2hip/others/points_in_boxes/config.yaml`: removes a hardcoded MI300X-specific prompt embedded in this single task config (it referenced "304 CUs", "5.3 TB/s", and "MI308X LDS"). It now sets `cheatsheet: null`, so the per-arch cheatsheet from `default_cheatsheet.yaml` is used, matching every other task in the repo.
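
The routing rule above can be sketched as follows. This is a minimal illustration, not the code in `src/prompt_builder.py`: it assumes the parsed `default_cheatsheet.yaml` maps each `target_gpu_model` to an entry with an `arch` field and an optional per-language `knowledge_override` map. The table layout, the `DEFAULT_KNOWLEDGE` constant, and the `resolve_cheatsheets` helper are hypothetical names chosen for the sketch.

```python
from typing import Any, Mapping

# Hypothetical in-memory view of default_cheatsheet.yaml; the real schema may differ.
ARCH_TABLE: dict[str, dict[str, Any]] = {
    "MI300": {"arch": "gfx942"},
    "RDNA4": {
        "arch": "gfx1201",
        # Per-language override: RDNA tasks get the RDNA-flavored HIP file.
        "knowledge_override": {"hip": "hip_rdna_cheatsheet.md"},
    },
}

# Default language-specific knowledge files (the CDNA-flavored one for HIP).
DEFAULT_KNOWLEDGE = {"hip": "hip_cheatsheet.md"}


def resolve_cheatsheets(config: Mapping[str, Any], language: str) -> tuple[str, str]:
    """Return (arch, knowledge_file) for a task config (hypothetical helper)."""
    gpu = config.get("target_gpu_model")
    if not gpu:
        # No silent fallback to MI300: a misconfigured box should fail loudly.
        raise ValueError("config.yaml must set target_gpu_model (e.g. 'MI300' or 'RDNA4')")
    if gpu not in ARCH_TABLE:
        raise ValueError(f"unknown target_gpu_model {gpu!r}; expected one of {sorted(ARCH_TABLE)}")
    entry = ARCH_TABLE[gpu]
    overrides = entry.get("knowledge_override") or {}
    # Prefer the arch-specific override, else fall back to the language default.
    knowledge = overrides.get(language, DEFAULT_KNOWLEDGE[language])
    return entry["arch"], knowledge
```

With this layout, CDNA entries need no change at all: they simply omit `knowledge_override` and keep resolving to `hip_cheatsheet.md`, which is the compatibility property the PR relies on.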

Diffstat

```text
agents/geak_v3/launch_agent.py | +26
src/preprocessing.py | +6 -1
src/prompt_builder.py | +8 -2
src/prompts/cheatsheet/RDNA4_architecture.md | +86 (new)
src/prompts/cheatsheet/default_cheatsheet.yaml | +5
src/prompts/cheatsheet/hip_rdna_cheatsheet.md | +247 (new)
tasks/hip2hip/others/points_in_boxes/config.yaml | +1 -57
7 files, +379 / −60
```

Compatibility

- No new dependencies.
- No changes to evaluator, scorer, task discovery, or task content (other than removing the one hardcoded MI300X prompt in points_in_boxes).
- CDNA targets unchanged: `MI300` and `MI355X` still resolve via the existing entries in `default_cheatsheet.yaml` and the original `hip_cheatsheet.md`.
- This branch was already merged with current `origin/geak_benchmark` (merge commit `22c0439`, no conflicts), so this PR fast-forwards.

Test plan

- `target_gpu_model: RDNA4` resolves to `gfx1201` and pulls `hip_rdna_cheatsheet.md` (verified by inspecting the generated `task_prompt.md` for an RDNA4 run).
- `target_gpu_model: MI300` still resolves to `gfx942` and pulls `hip_cheatsheet.md` (regression check).
- `AMDGPU_TARGETS` / `GPU_TARGETS` are exported correctly for HIP/CMake builds on a gfx1201 box (ROCm 7.2.0).
- The GEAK-v3 agent (`agents/geak_v3/launch_agent.py`) receives the cheatsheet, confirmed via the "Appended cheatsheet (arch=gfx1201, +N chars)" log line and the contents of `workspace_RDNA4_geak_v3/run_*/task_prompt.md`.
- A missing `target_gpu_model` in `config.yaml` now raises with a clear message instead of silently falling back to MI300.
- An end-to-end run on a representative `hip2hip` task on RDNA4 produces a compiling, correct, non-zero-speedup result (in progress; previously blocked by a separate environment issue in which PyTorch's bundled `librocprofiler-register.so` shadowed the system one, which is orthogonal to this PR).

Made with Cursor

… support

- Add RDNA4_architecture.md with gfx1201 hardware specs (Wave32, WGP, WMMA, GDDR6 bandwidth constraints)
- Add hip_rdna_cheatsheet.md as standalone RDNA HIP best practices
- Wire RDNA4 into default_cheatsheet.yaml with knowledge_override
- Set AMDGPU_TARGETS and GPU_TARGETS in preprocessing for CMake builds
- Support knowledge_override in prompt_builder for per-arch cheatsheets
- Remove hardcoded MI300X prompt from points_in_boxes task config

Made-with: Cursor
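
The `setup_rocm_env` change listed in this commit amounts to exporting one arch string under several names. A minimal sketch, assuming the function receives the already-resolved arch (e.g. `gfx1201`) and writes into a mutable environment mapping; the signature and the injectable `env` parameter are hypothetical and only there to make the behavior easy to check without touching the real process environment.

```python
import os
from typing import MutableMapping, Optional

# Names the PR description says the builds read: PyTorch extension builds use
# PYTORCH_ROCM_ARCH; CMake HIP builds and some hip2hip makefiles use the others.
ARCH_ENV_VARS = ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS")


def setup_rocm_env(arch: str, env: Optional[MutableMapping[str, str]] = None) -> MutableMapping[str, str]:
    """Export the GPU target under every variable name (hypothetical signature)."""
    if env is None:
        env = os.environ  # default: mutate the current process environment
    for var in ARCH_ENV_VARS:
        env[var] = arch
    return env
```

Setting all three from the same value keeps PyTorch-extension and CMake-driven builds from silently compiling for different targets.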


simple_prompt_builder() bypasses src/prompt_builder.py, so the architecture cheatsheets + arch-precheck directive wired up in 14d240f were never reaching the agent. Load them in launch_agent.py after the simple prompt is built, mirroring src/prompt_builder.py's section layout (precheck + architecture context + language-specific knowledge). Made-with: Cursor
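
The launch-time wiring described in this commit can be sketched as below, assuming the precheck directive and the two cheatsheet texts can be fetched as strings for a given arch. `append_arch_context`, the `load_sections` callable, and the logger name are hypothetical; the info message follows the "Appended cheatsheet (arch=…, +N chars)" log line quoted in the test plan. The broad `except` reflects the stated goal that a lookup error must never break agent launch.

```python
import logging
from typing import Callable

logger = logging.getLogger("geak_v3.launch_agent")  # hypothetical logger name


def append_arch_context(prompt: str, arch: str, load_sections: Callable[[str], list[str]]) -> str:
    """Append precheck + architecture context + language knowledge to a simple prompt.

    load_sections is a hypothetical callable returning section texts for an arch,
    in order: precheck directive, architecture cheatsheet, language cheatsheet.
    """
    try:
        sections = load_sections(arch)
    except Exception as exc:  # a cheatsheet lookup failure must not break launch
        logger.warning("Cheatsheet lookup failed for arch=%s: %s", arch, exc)
        return prompt
    extra = "\n\n".join(s for s in sections if s)
    if not extra:
        return prompt  # nothing to append
    logger.info("Appended cheatsheet (arch=%s, +%d chars)", arch, len(extra) + 2)
    return prompt + "\n\n" + extra
```

Falling back to the unmodified prompt on error trades prompt quality for availability: the agent still runs, and the warning in the log shows that the arch context was missing.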

irvineoy · 2026-05-06T17:49:29Z

Thanks for adding the RDNA4/gfx1201 support — the routing through knowledge_override and the RDNA-specific prompt context look useful.

One concern before merging: in tasks/hip2hip/others/points_in_boxes/config.yaml, replacing the previous task-specific prompt.cheatsheet with null removes the MI300X-specific guidance, which is good, but it also removes some generic task constraints that still seem important for this harness, such as:

- preserving the exact kernel function names/signatures expected by the task runner
- keeping the launch/configuration interface compatible
- ensuring the output remains directly compilable/runnable with the same interface
- handling shared-memory launch sizing correctly if shared memory is introduced

Could we keep those generic constraints by moving them into prompt.instructions, while dropping only the MI300X-specific hardware details? That would make the task architecture-neutral without weakening the task contract.

irvineoy · 2026-05-06T17:50:33Z

cc @sharareh-y for awareness

mehdi-saeedi · 2026-05-06T21:11:42Z

Thanks, good catch — let me share what I found before we decide.

Looking across the 36 hip2hip task configs:

- All 24 hip2hip/gpumode/* tasks have a generic role-preamble string in prompt.instructions (no kernel-name/signature/launch-config/compile contract).
- All 12 hip2hip/others/* tasks (including points_in_boxes and its siblings like knn, ball_query, roiaware_pool3d, etc.) currently have instructions: null and cheatsheet: null.

So adding the contract here would make points_in_boxes inconsistent with its 11 others/* siblings — it was just an outlier because someone had previously embedded MI300X-specific text in its cheatsheet field.

If you're okay with it, I'd suggest:

1. Land this PR as-is, which makes points_in_boxes consistent with its 11 siblings.
2. Open a separate PR to add the constraints you listed at the framework level so every hip2hip task gets them uniformly. Happy to take that.
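
To make the framework-level option concrete, here is a minimal sketch of appending the reviewer's generic constraints to every hip2hip prompt, independent of each task's own `instructions` field. The `HIP2HIP_CONTRACT` constant, the `build_task_prompt` helper, and detecting the category from the task path are all hypothetical; this is not necessarily how the follow-up PR implements it.

```python
from typing import Optional

# Generic task contract, paraphrasing the constraints listed in the review above.
HIP2HIP_CONTRACT = (
    "Task contract:\n"
    "- Preserve the exact kernel function names and signatures expected by the task runner.\n"
    "- Keep the launch/configuration interface compatible.\n"
    "- The output must compile and run directly with the same interface.\n"
    "- If you introduce shared memory, size it correctly at launch."
)


def build_task_prompt(task_path: str, base_prompt: str, instructions: Optional[str]) -> str:
    """Hypothetical helper: inject the contract uniformly for every hip2hip task."""
    parts = [base_prompt]
    if instructions:  # per-task text; None for the hip2hip/others/* tasks
        parts.append(instructions)
    if task_path.startswith("hip2hip/"):
        parts.append(HIP2HIP_CONTRACT)
    return "\n\n".join(parts)
```

Keeping the contract in one constant means gpumode/* and others/* tasks receive identical wording, which is the consistency argument made above.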

But if you'd prefer I revert just the points_in_boxes change in this PR for now, that's also fine — let me know.

irvineoy · 2026-05-07T00:32:16Z

Your suggestion sounds good. I will merge this diff. Please remember to do the second item. Thanks a lot.

msaeedi-amd added 3 commits April 15, 2026 13:23

Merge remote-tracking branch 'origin/geak_benchmark' into feature/rdn…a-hip-enablement (22c0439)

irvineoy self-requested a review May 6, 2026 17:38

mehdi-saeedi requested a review from sharareh-y May 6, 2026 23:34

irvineoy merged commit 2284ab6 into geak_benchmark May 7, 2026

mehdi-saeedi mentioned this pull request May 7, 2026

feat(hip2hip): inject generic task contract at framework level #38 (Merged, 7 tasks)

mehdi-saeedi deleted the feature/rdna-hip-enablement branch May 8, 2026 14:23

irvineoy merged 3 commits into geak_benchmark from feature/rdna-hip-enablement