Enable RDNA4 (gfx1201) HIP support: arch cheatsheets, prompt builder, preprocessing #35

Merged: irvineoy merged 3 commits into geak_benchmark from feature/rdna-hip-enablement on May 7, 2026

Conversation

mehdi-saeedi commented May 5, 2026

Summary

Enables the arena to target AMD RDNA4 (gfx1201, e.g. Radeon RX 9070 series) in addition to the existing CDNA targets (MI300/MI355X). The benchmark previously assumed a CDNA-flavored prompt path; on an RDNA4 box the agent received MI300-flavored guidance (XCDs, MFMA, 5.3 TB/s HBM, unified memory), which is wrong for RDNA. This PR adds an RDNA4 architecture spec, an RDNA-flavored HIP cheatsheet, and the plumbing to route the right cheatsheet per target_gpu_model.

What changes

  • New cheatsheets (consumed by the agent prompt):

    • src/prompts/cheatsheet/RDNA4_architecture.md (+86 lines) — gfx1201 spec sheet: Wave32 default, WGP topology, no MFMA (use WMMA), 128 KB LDS/WGP, ~640 GB/s GDDR6, no XCDs, no unified memory, --offload-arch=gfx1201. Calls out CDNA→RDNA pitfalls.
    • src/prompts/cheatsheet/hip_rdna_cheatsheet.md (+247 lines) — RDNA-flavored HIP best practices (coalescing, occupancy with Wave32, LDS bank conflicts, vector loads, __launch_bounds__, etc.) used as the HIP knowledge file when the target is RDNA.
  • src/prompts/cheatsheet/default_cheatsheet.yaml — registers RDNA4 → gfx1201 and introduces a knowledge_override map so RDNA tasks pull hip_rdna_cheatsheet.md instead of the CDNA-flavored hip_cheatsheet.md.

  • src/prompt_builder.py

    • Honor knowledge_override from the arch entry when picking the language-specific cheatsheet.
    • Stop silently defaulting target_gpu_model to 'MI300' — raise if missing. Prevents a misconfigured RDNA box from quietly emitting MI300 prompts.
  • src/preprocessing.py::setup_rocm_env — in addition to PYTORCH_ROCM_ARCH, also export AMDGPU_TARGETS and GPU_TARGETS (CMake honors these for HIP builds; needed by some hip2hip task makefiles).

  • agents/geak_v3/launch_agent.py — GEAK-v3 uses simple_prompt_builder() (not src/prompt_builder.py's full builder), so the cheatsheet wiring above wasn't reaching the agent. After the simple prompt is built, this commit appends the _gpu_arch_precheck_prompt + arch cheatsheet from default_cheatsheet.yaml, mirroring the section layout used by the regular builder. Wrapped in try/except so a cheatsheet lookup error never breaks agent launch.

  • tasks/hip2hip/others/points_in_boxes/config.yaml — removes a hardcoded MI300X-specific prompt embedded in this single task config (referenced "304 CUs", "5.3 TB/s", "MI308X LDS"). Now cheatsheet: null so the per-arch cheatsheet from default_cheatsheet.yaml is used, matching every other task in the repo.
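
The routing described for `src/prompt_builder.py` can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the schema of `default_cheatsheet.yaml` is not shown here, so the `ARCH_TABLE` dict, its `arch` key, the `DEFAULT_KNOWLEDGE` map, and the function name `resolve_cheatsheet` are all assumptions. Only `knowledge_override`, the file names, and the `MI300` → `gfx942` / `RDNA4` → `gfx1201` mappings come from the description above.

```python
from __future__ import annotations

# Hypothetical in-memory stand-in for default_cheatsheet.yaml. Every key
# name except knowledge_override is an assumption made for this sketch.
ARCH_TABLE = {
    "MI300": {"arch": "gfx942"},
    "RDNA4": {
        "arch": "gfx1201",
        "knowledge_override": {"hip": "hip_rdna_cheatsheet.md"},
    },
}

# Language-specific knowledge file used when an arch entry has no override.
DEFAULT_KNOWLEDGE = {"hip": "hip_cheatsheet.md"}


def resolve_cheatsheet(target_gpu_model: str | None, language: str) -> tuple[str, str]:
    """Return (gfx arch, knowledge file) for a target; raise if it is unset or unknown."""
    if not target_gpu_model:
        # No silent fallback to 'MI300': a missing target is a config error.
        raise ValueError("target_gpu_model is missing from config.yaml")
    if target_gpu_model not in ARCH_TABLE:
        raise ValueError(f"unknown target_gpu_model: {target_gpu_model!r}")
    entry = ARCH_TABLE[target_gpu_model]
    # The per-arch override wins; otherwise use the language default.
    override = entry.get("knowledge_override", {})
    return entry["arch"], override.get(language, DEFAULT_KNOWLEDGE[language])
```

Under these assumptions, `resolve_cheatsheet("RDNA4", "hip")` returns `("gfx1201", "hip_rdna_cheatsheet.md")`, while `resolve_cheatsheet("MI300", "hip")` keeps returning the original `hip_cheatsheet.md`.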

Diffstat

```text
agents/geak_v3/launch_agent.py | +26
src/preprocessing.py | +6 -1
src/prompt_builder.py | +8 -2
src/prompts/cheatsheet/RDNA4_architecture.md | +86 (new)
src/prompts/cheatsheet/default_cheatsheet.yaml | +5
src/prompts/cheatsheet/hip_rdna_cheatsheet.md | +247 (new)
tasks/hip2hip/others/points_in_boxes/config.yaml | +1 -57
7 files, +379 / −60
```

Compatibility

  • No new dependencies.
  • No changes to evaluator, scorer, task discovery, or task content (other than removing the one hardcoded MI300X prompt in points_in_boxes).
  • CDNA targets unchanged: `MI300` and `MI355X` still resolve via the existing entries in `default_cheatsheet.yaml` and the original `hip_cheatsheet.md`.
  • This branch was already merged with current `origin/geak_benchmark` (merge commit `22c0439`, no conflicts), so this PR fast-forwards.

Test plan

  • `target_gpu_model: RDNA4` resolves to `gfx1201` and pulls `hip_rdna_cheatsheet.md` (verified by inspecting generated `task_prompt.md` for an RDNA4 run).
  • `target_gpu_model: MI300` still resolves to `gfx942` and pulls `hip_cheatsheet.md` (regression check).
  • `AMDGPU_TARGETS` / `GPU_TARGETS` exported correctly for HIP/CMake builds on a gfx1201 box (ROCm 7.2.0).
  • GEAK-v3 agent (`agents/geak_v3/launch_agent.py`) receives the cheatsheet — confirmed via "Appended cheatsheet (arch=gfx1201, +N chars)" log line and the contents of `workspace_RDNA4_geak_v3/run_*/task_prompt.md`.
  • Missing `target_gpu_model` in `config.yaml` now raises with a clear message instead of silently falling back to MI300.
  • End-to-end run on a representative `hip2hip` task on RDNA4 produces a compiling, correct, non-zero-speedup result (in progress; previously blocked by a separate environment issue with PyTorch's bundled `librocprofiler-register.so` shadowing the system one — orthogonal to this PR).

Made with Cursor

… support

- Add RDNA4_architecture.md with gfx1201 hardware specs (Wave32, WGP,
  WMMA, GDDR6 bandwidth constraints)
- Add hip_rdna_cheatsheet.md as standalone RDNA HIP best practices
- Wire RDNA4 into default_cheatsheet.yaml with knowledge_override
- Set AMDGPU_TARGETS and GPU_TARGETS in preprocessing for CMake builds
- Support knowledge_override in prompt_builder for per-arch cheatsheets
- Remove hardcoded MI300X prompt from points_in_boxes task config
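
As an illustration of the `AMDGPU_TARGETS` / `GPU_TARGETS` item above, the export step might look something like the sketch below. The real signature and body of `setup_rocm_env` are not shown in this PR, so the helper name `export_rocm_targets` and its `env` parameter are hypothetical; only the three variable names come from the description.

```python
import os


def export_rocm_targets(gfx_arch, env=None):
    """Set the target arch under each variable name a HIP build may read.

    Hypothetical helper: `env` defaults to os.environ but can be any
    mutable mapping, which keeps the sketch easy to test.
    """
    env = os.environ if env is None else env
    # PYTORCH_ROCM_ARCH was already exported before this PR; the other two
    # are the additions meant for CMake-driven HIP builds.
    for name in ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS"):
        env[name] = gfx_arch
    return env
```

Calling `export_rocm_targets("gfx1201", {})` on an empty dict yields a mapping with all three variables set to `gfx1201`.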

Made-with: Cursor
simple_prompt_builder() bypasses src/prompt_builder.py, so the
architecture cheatsheets + arch-precheck directive wired up in 14d240f
were never reaching the agent. Load them in launch_agent.py after the
simple prompt is built, mirroring src/prompt_builder.py's section
layout (precheck + architecture context + language-specific knowledge).
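
A rough sketch of the append step this commit describes, assuming a loader callable that returns the three sections in order (precheck, architecture context, language-specific knowledge). The function name `append_arch_context` and the `load_sections` parameter are hypothetical; the actual code in `launch_agent.py` may be structured differently.

```python
import logging

logger = logging.getLogger("launch_agent_sketch")


def append_arch_context(prompt, arch, load_sections):
    """Append arch sections to the prompt; on any lookup error, return it unchanged."""
    try:
        # load_sections is a hypothetical callable returning a list of strings.
        sections = load_sections()
        extra = "\n\n".join(s for s in sections if s)
    except Exception as exc:
        # A cheatsheet problem must never break agent launch.
        logger.warning("Cheatsheet lookup failed, continuing without it: %s", exc)
        return prompt
    if not extra:
        return prompt
    logger.info("Appended cheatsheet (arch=%s, +%d chars)", arch, len(extra))
    return f"{prompt}\n\n{extra}"
```

The broad `except Exception` is deliberate in this sketch: it matches the stated goal that a cheatsheet lookup error never prevents the agent from starting, at the cost of possibly launching with a prompt that lacks architecture context.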

Made-with: Cursor
@irvineoy irvineoy self-requested a review May 6, 2026 17:38

irvineoy commented May 6, 2026

Thanks for adding the RDNA4/gfx1201 support — the routing through knowledge_override and the RDNA-specific prompt context look useful.

One concern before merging: in tasks/hip2hip/others/points_in_boxes/config.yaml, replacing the previous task-specific prompt.cheatsheet with null removes the MI300X-specific guidance, which is good, but it also removes some generic task constraints that still seem important for this harness, such as:

  • preserving the exact kernel function names/signatures expected by the task runner
  • keeping the launch/configuration interface compatible
  • ensuring the output remains directly compilable/runnable with the same interface
  • handling shared-memory launch sizing correctly if shared memory is introduced

Could we keep those generic constraints by moving them into prompt.instructions, while dropping only the MI300X-specific hardware details? That would make the task architecture-neutral without weakening the task contract.


irvineoy commented May 6, 2026

cc @sharareh-y for awareness

mehdi-saeedi commented

Thanks, good catch — let me share what I found before we decide.

Looking across the 36 hip2hip task configs:

  • All 24 hip2hip/gpumode/* tasks have a generic role-preamble string in prompt.instructions (no kernel-name/signature/launch-config/compile contract).
  • All 12 hip2hip/others/* tasks (including points_in_boxes and its siblings like knn, ball_query, roiaware_pool3d, etc.) currently have instructions: null and cheatsheet: null.

So adding the contract here would make points_in_boxes inconsistent with its 11 others/* siblings — it was just an outlier because someone had previously embedded MI300X-specific text in its cheatsheet field.

If you're okay with it, I'd suggest:

  1. Land this PR as-is, which makes points_in_boxes consistent with its 11 siblings.
  2. Open a separate PR to add the constraints you listed at the framework level so every hip2hip task gets them uniformly. Happy to take that.

But if you'd prefer I revert just the points_in_boxes change in this PR for now, that's also fine — let me know.
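
For reference, a survey like the one above could be reproduced with a small helper along these lines. It is only a sketch: it assumes the task configs have already been parsed into dicts (for example with a YAML loader) and keyed by task path, and the function name `audit_prompt_fields` is hypothetical.

```python
from collections import Counter


def audit_prompt_fields(configs):
    """Count, per task group, how many tasks set each prompt field to a non-null value.

    `configs` maps a task path such as "hip2hip/others/knn" to its parsed
    config dict; the group is the first two path components.
    """
    summary = {}
    for task_path, cfg in configs.items():
        group = "/".join(task_path.split("/")[:2])
        prompt = cfg.get("prompt") or {}
        counts = summary.setdefault(group, Counter())
        counts["tasks"] += 1
        for field in ("instructions", "cheatsheet"):
            if prompt.get(field) is not None:
                counts[field] += 1
    return summary
```

A group where `instructions` and `cheatsheet` both count zero corresponds to the `others/*` situation described above, where every task leaves both fields null.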

@mehdi-saeedi mehdi-saeedi requested a review from sharareh-y May 6, 2026 23:34

irvineoy commented May 7, 2026

Your suggestion sounds good. I will merge this PR as-is. Please remember to follow up on the second item. Thanks a lot.

@irvineoy irvineoy merged commit 2284ab6 into geak_benchmark May 7, 2026
@mehdi-saeedi mehdi-saeedi deleted the feature/rdna-hip-enablement branch May 8, 2026 14:23

3 participants