Enable RDNA4 (gfx1201) HIP support: arch cheatsheets, prompt builder, preprocessing #35
… support
- Add RDNA4_architecture.md with gfx1201 hardware specs (Wave32, WGP, WMMA, GDDR6 bandwidth constraints)
- Add hip_rdna_cheatsheet.md as standalone RDNA HIP best practices
- Wire RDNA4 into default_cheatsheet.yaml with knowledge_override
- Set AMDGPU_TARGETS and GPU_TARGETS in preprocessing for CMake builds
- Support knowledge_override in prompt_builder for per-arch cheatsheets
- Remove hardcoded MI300X prompt from points_in_boxes task config

Made-with: Cursor
simple_prompt_builder() bypasses src/prompt_builder.py, so the architecture cheatsheets + arch-precheck directive wired up in 14d240f were never reaching the agent. Load them in launch_agent.py after the simple prompt is built, mirroring src/prompt_builder.py's section layout (precheck + architecture context + language-specific knowledge). Made-with: Cursor
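A minimal sketch of the append step this commit describes, assuming the cheatsheet sections arrive as a list of strings. `append_arch_context` and `load_sections` are hypothetical names, not the functions in `launch_agent.py`; the fall-back-on-error behavior follows the try/except noted in the PR description.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def append_arch_context(prompt: str, load_sections: Callable[[], List[str]]) -> str:
    """Append architecture sections to an already-built prompt.

    `load_sections` is a hypothetical callable that returns the precheck
    directive, architecture context, and language-specific knowledge, in
    that order. Any failure while loading leaves the prompt unchanged.
    """
    try:
        sections = load_sections()
    except Exception as exc:  # broad on purpose: a lookup error must not stop the launch
        logger.warning("cheatsheet lookup failed, continuing without it: %s", exc)
        return prompt
    # Join with blank lines so each section reads as its own block.
    return "\n\n".join([prompt, *sections])
```

Passing the loader as a callable keeps the lookup inside the guarded region, so a missing file or malformed YAML degrades to the plain simple prompt rather than an exception.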
Thanks for adding the RDNA4/gfx1201 support — the routing through One concern before merging: in
Could we keep those generic constraints by moving them into

cc @sharareh-y for awareness

Thanks, good catch — let me share what I found before we decide. Looking across the 36
So adding the contract here would make If you're okay with it, I'd suggest:
But if you'd prefer I revert just the

Your suggestion sounds good. I will merge this diff. Please remember to do the second item. Thanks a lot.
Summary
Enables the arena to target AMD RDNA4 (gfx1201, e.g. Radeon RX 9070 series) in addition to the existing CDNA targets (MI300/MI355X). The benchmark previously assumed a CDNA-flavored prompt path; on an RDNA4 box the agent received MI300-flavored guidance (XCDs, MFMA, 5.3 TB/s HBM, unified memory), which is wrong for RDNA. This PR adds an RDNA4 architecture spec, an RDNA-flavored HIP cheatsheet, and the plumbing to route the right cheatsheet per `target_gpu_model`.
What changes
New cheatsheets (consumed by the agent prompt):
- `src/prompts/cheatsheet/RDNA4_architecture.md` (+86 lines): gfx1201 spec sheet covering Wave32 default, WGP topology, no MFMA (use WMMA), 128 KB LDS/WGP, ~640 GB/s GDDR6, no XCDs, no unified memory, and `--offload-arch=gfx1201`. Calls out CDNA→RDNA pitfalls.
- `src/prompts/cheatsheet/hip_rdna_cheatsheet.md` (+247 lines): RDNA-flavored HIP best practices (coalescing, occupancy with Wave32, LDS bank conflicts, vector loads, `__launch_bounds__`, etc.), used as the HIP knowledge file when the target is RDNA.
- `src/prompts/cheatsheet/default_cheatsheet.yaml`: registers `RDNA4 → gfx1201` and introduces a `knowledge_override` map so RDNA tasks pull `hip_rdna_cheatsheet.md` instead of the CDNA-flavored `hip_cheatsheet.md`.
- `src/prompt_builder.py`:
  - Reads `knowledge_override` from the arch entry when picking the language-specific cheatsheet.
  - No longer defaults `target_gpu_model` to `'MI300'`; raises if it is missing. This prevents a misconfigured RDNA box from quietly emitting MI300 prompts.
- `src/preprocessing.py::setup_rocm_env`: in addition to `PYTORCH_ROCM_ARCH`, also exports `AMDGPU_TARGETS` and `GPU_TARGETS` (CMake honors these for HIP builds; needed by some `hip2hip` task makefiles).
- `agents/geak_v3/launch_agent.py`: GEAK-v3 uses `simple_prompt_builder()` (not the full builder in `src/prompt_builder.py`), so the cheatsheet wiring above wasn't reaching the agent. After the simple prompt is built, this commit appends the `_gpu_arch_precheck_prompt` and the arch cheatsheet from `default_cheatsheet.yaml`, mirroring the section layout used by the regular builder. The append is wrapped in try/except so a cheatsheet lookup error never breaks agent launch.
- `tasks/hip2hip/others/points_in_boxes/config.yaml`: removes a hardcoded MI300X-specific prompt embedded in this single task config (it referenced "304 CUs", "5.3 TB/s", "MI308X LDS"). The config now sets `cheatsheet: null`, so the per-arch cheatsheet from `default_cheatsheet.yaml` is used, matching every other task in the repo.

Diffstat
```text
agents/geak_v3/launch_agent.py | +26
src/preprocessing.py | +6 -1
src/prompt_builder.py | +8 -2
src/prompts/cheatsheet/RDNA4_architecture.md | +86 (new)
src/prompts/cheatsheet/default_cheatsheet.yaml | +5
src/prompts/cheatsheet/hip_rdna_cheatsheet.md | +247 (new)
tasks/hip2hip/others/points_in_boxes/config.yaml | +1 -57
7 files, +379 / −60
```
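The routing and environment changes listed under What changes can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `ARCH_TABLE`, `DEFAULT_KNOWLEDGE`, `resolve_knowledge_file`, and `rocm_build_env` are hypothetical names, and the dictionary layout is an assumed stand-in for `default_cheatsheet.yaml`, whose real keys may differ.

```python
from typing import Dict, Optional

# Assumed in-memory stand-in for default_cheatsheet.yaml (real layout may differ).
ARCH_TABLE: Dict[str, dict] = {
    "MI300": {"gfx_target": "gfx942"},
    "RDNA4": {
        "gfx_target": "gfx1201",
        "architecture_file": "RDNA4_architecture.md",
        # Per-arch replacement for the language-specific knowledge file.
        "knowledge_override": {"hip": "hip_rdna_cheatsheet.md"},
    },
}

# Knowledge files used when an arch entry has no override.
DEFAULT_KNOWLEDGE: Dict[str, str] = {"hip": "hip_cheatsheet.md"}


def resolve_knowledge_file(target_gpu_model: Optional[str], language: str) -> str:
    """Pick the language cheatsheet for a GPU model, preferring the override."""
    if not target_gpu_model:
        # No silent fallback to MI300: a missing model is a configuration error.
        raise ValueError("target_gpu_model must be set")
    if target_gpu_model not in ARCH_TABLE:
        raise ValueError(f"unknown target_gpu_model: {target_gpu_model!r}")
    override = ARCH_TABLE[target_gpu_model].get("knowledge_override", {})
    return override.get(language, DEFAULT_KNOWLEDGE[language])


def rocm_build_env(gfx_target: str) -> Dict[str, str]:
    """Return the three environment variables set to the same gfx target."""
    names = ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS")
    return {name: gfx_target for name in names}
```

For example, `resolve_knowledge_file("RDNA4", "hip")` picks the RDNA cheatsheet while `"MI300"` keeps the default, and `os.environ.update(rocm_build_env("gfx1201"))` would apply the exports before a CMake build.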
Compatibility
points_in_boxes).
Test plan
- `target_gpu_model: RDNA4` resolves to `gfx1201` and pulls `hip_rdna_cheatsheet.md` (verified by inspecting the generated `task_prompt.md` for an RDNA4 run).
- `target_gpu_model: MI300` still resolves to `gfx942` and pulls `hip_cheatsheet.md` (regression check).

Made with Cursor