feat(rdna): add RDNA4 Triton + HIP cheatsheets, arch config, and preprocessing support by mehdi-saeedi · Pull Request #37 · AMD-AGI/AgentKernelArena

mehdi-saeedi · 2026-05-06T23:32:15Z

Summary

Enables RDNA4 (gfx1201) as a target architecture for Triton and HIP tasks. Mirrors AgentKernelArena (AKA) HIP-branch commit 14d240f, with an added Triton-specific cheatsheet and per-language knowledge_override wiring.

New RDNA4_architecture.md cheatsheet covering gfx1201 hardware specs (Wave32, WGP, WMMA, GDDR6 bandwidth constraints).
New hip_rdna_cheatsheet.md — standalone RDNA HIP best practices for hip2hip / torch2hip tasks on this branch.
New triton_rdna_cheatsheet.md — standalone RDNA Triton best practices (Wave32 implications, WMMA vs MFMA, gfx1201 tl.dot dtype support).
Wires RDNA4 into default_cheatsheet.yaml with a per-language knowledge_override (hip + triton).
Sets AMDGPU_TARGETS and GPU_TARGETS in src/preprocessing.py so CMake-based HIP builds target RDNA4.
Adds knowledge_override support to src/prompt_builder.py for per-arch cheatsheets.
Removes a hardcoded MI300X prompt from tasks/hip2hip/others/points_in_boxes/config.yaml.
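The PR does not show the shape of default_cheatsheet.yaml or the prompt_builder change, so the following is only a sketch of how a per-language knowledge_override could be resolved. It assumes the YAML parses into a global per-language `knowledge` default plus per-architecture entries, and that an architecture's override replaces the language default rather than adding to it, which fits the description of the RDNA cheatsheets as standalone. The schema, the placeholder file names, and `resolve_cheatsheets` are hypothetical.

```python
from typing import Any, Mapping, Optional

# Hypothetical parsed form of default_cheatsheet.yaml (e.g. from yaml.safe_load).
# Key names and the non-RDNA file names are placeholders, not the repo's schema.
EXAMPLE_CONFIG: dict[str, Any] = {
    "knowledge": {  # default per-language best-practices cheatsheets
        "hip": "hip_default.md",
        "triton": "triton_default.md",
    },
    "arch": {
        "MI300": {"architecture": "MI300_architecture_placeholder.md"},
        "RDNA4": {
            "architecture": "RDNA4_architecture.md",
            "knowledge_override": {
                "hip": "hip_rdna_cheatsheet.md",
                "triton": "triton_rdna_cheatsheet.md",
            },
        },
    },
}


def resolve_cheatsheets(config: Mapping[str, Any], arch: str, language: str) -> list[str]:
    """Return the cheatsheet files to inject for one (arch, language) pair."""
    arch_entry: Mapping[str, Any] = config.get("arch", {}).get(arch, {})
    files: list[str] = []
    if "architecture" in arch_entry:
        files.append(arch_entry["architecture"])
    # Assumption: a per-arch override replaces the language default.
    overrides: Mapping[str, str] = arch_entry.get("knowledge_override", {})
    knowledge: Optional[str] = overrides.get(
        language, config.get("knowledge", {}).get(language)
    )
    if knowledge is not None:
        files.append(knowledge)
    return files
```

With this shape, architectures that define no override fall through to the language defaults, which is the behavior the last test-plan item is meant to protect.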

Files changed (7)

src/preprocessing.py
src/prompt_builder.py
src/prompts/cheatsheet/RDNA4_architecture.md (new)
src/prompts/cheatsheet/default_cheatsheet.yaml
src/prompts/cheatsheet/hip_rdna_cheatsheet.md (new)
src/prompts/cheatsheet/triton_rdna_cheatsheet.md (new)
tasks/hip2hip/others/points_in_boxes/config.yaml

Test plan

Run a Triton task on RDNA4 (gfx1201) and confirm triton_rdna_cheatsheet.md is injected into the agent prompt via knowledge_override.
Run a HIP task on RDNA4 and confirm hip_rdna_cheatsheet.md is injected.
Confirm CMake-based HIP builds use gfx1201 (check AMDGPU_TARGETS / GPU_TARGETS propagation).
Sanity-check that MI300/MI355 task runs are unaffected (cheatsheet selection still works).
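For the AMDGPU_TARGETS / GPU_TARGETS item, a small helper like the one below could compare the environment against the expected target after setup runs. It is a hypothetical sketch: `arch_env_mismatches` and `ARCH_ENV_VARS` are not part of the PR, and the check only confirms that the variables were exported, not that CMake actually compiled code objects for gfx1201.

```python
import os
from typing import Mapping, Optional

# The three variables this PR expects to carry the selected gfx target.
ARCH_ENV_VARS = ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS")


def arch_env_mismatches(
    expected: str, env: Optional[Mapping[str, str]] = None
) -> dict[str, Optional[str]]:
    """Return {var: actual_value} for every arch variable not equal to `expected`.

    An empty dict means all three variables agree on `expected`;
    unset variables are reported with the value None.
    """
    env = os.environ if env is None else env
    return {
        name: env.get(name)
        for name in ARCH_ENV_VARS
        if env.get(name) != expected
    }
```

Calling `arch_env_mismatches("gfx1201")` right before the CMake step and failing on a non-empty result would surface the kind of partial export discussed in the review below.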

Made with Cursor


irvineoy · 2026-05-07T00:36:11Z

One issue I noticed with the new ROCm env propagation: the added CMake env vars are only set on the fallback path, not on the normal rocminfo detection path.

In setup_rocm_env(), when _detect_gfx_arch_from_rocminfo() succeeds, the function currently sets only PYTORCH_ROCM_ARCH and then returns:

```python
detected_arch = _detect_gfx_arch_from_rocminfo()
if detected_arch:
    os.environ["PYTORCH_ROCM_ARCH"] = detected_arch
    logger.info(...)
    return
```

That means on machines where rocminfo works, AMDGPU_TARGETS and GPU_TARGETS remain unset even though the fallback path sets them. I reproduced this on the MI300 box: rocminfo detected gfx942, but after setup_rocm_env("MI300") the env was:

```
PYTORCH_ROCM_ARCH=gfx942
AMDGPU_TARGETS=None
GPU_TARGETS=None
```

This seems to undermine the PR's goal of making CMake/HIP builds pick up the selected arch consistently. Could we set all three vars in the detected-arch branch as well, e.g.:

```python
if detected_arch:
    os.environ["PYTORCH_ROCM_ARCH"] = detected_arch
    os.environ["AMDGPU_TARGETS"] = detected_arch
    os.environ["GPU_TARGETS"] = detected_arch
    logger.info(...)
    return
```
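The body of _detect_gfx_arch_from_rocminfo() is not shown in this thread. As a hypothetical sketch, detection could run rocminfo and take the first agent whose `Name:` field is a gfx target. The regex assumes rocminfo's usual agent lines of the form `Name:  gfx942`, and `parse_gfx_arch` / `detect_gfx_arch` are illustrative names, not the repository's functions.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed rocminfo agent line, e.g. "  Name:                    gfx942".
# Marketing-name and ISA lines ("amdgcn-amd-amdhsa--gfx942:...") do not match.
_GFX_NAME_RE = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)\s*$", re.MULTILINE)


def parse_gfx_arch(rocminfo_output: str) -> Optional[str]:
    """Return the first gfx target listed in rocminfo output, or None."""
    match = _GFX_NAME_RE.search(rocminfo_output)
    return match.group(1) if match else None


def detect_gfx_arch() -> Optional[str]:
    """Hypothetical detector: run rocminfo and parse its output."""
    exe = shutil.which("rocminfo")
    if exe is None:
        return None  # ROCm tools not on PATH: let the caller fall back
    try:
        result = subprocess.run(
            [exe], capture_output=True, text=True, timeout=30, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None  # non-zero exit, timeout, or launch failure
    return parse_gfx_arch(result.stdout)
```

Taking the first match is a simplification; a node with mixed GPU generations would need an explicit policy for which target to export.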

… path too

setup_rocm_env() previously set all three of PYTORCH_ROCM_ARCH, AMDGPU_TARGETS, and GPU_TARGETS only on the fallback path. On the common rocminfo-success path it set just PYTORCH_ROCM_ARCH and returned, leaving the two CMake env vars unset.

Reproduced on MI300: rocminfo detected gfx942 but AMDGPU_TARGETS / GPU_TARGETS remained None, so CMake-based HIP builds did not pick up the selected arch.

Refactor the function to resolve gfx_arch first (rocminfo -> cheatsheet fallback) and converge on a single export block driven by a module-level tuple. Both paths now export all three vars uniformly, and any future CMake env var can be added to the tuple in one place.

Addresses irvineoy's review on #37.

Co-authored-by: Cursor <cursoragent@cursor.com>

mehdi-saeedi · 2026-05-07T01:45:44Z

Thanks @irvineoy — confirmed and fixed in a254c5e.

The asymmetry was real: setup_rocm_env() exported all three vars only on the fallback path; the rocminfo-success path (the common case on any working ROCm box) exported only PYTORCH_ROCM_ARCH and returned, leaving AMDGPU_TARGETS / GPU_TARGETS unset. This undermined the PR's stated goal of getting CMake/HIP builds to pick up the arch consistently.

Rather than duplicate the three-var export inline on each path (which leaves the same shape vulnerable to drift the next time someone touches the function), I refactored to resolve gfx_arch first and converge on a single export block driven by a module-level tuple _ROCM_ARCH_ENV_VARS. Both paths now export all three uniformly, the log message is generated from the same tuple (so it can never lie about what was set), and adding any future CMake env var is a one-line change.

Verified by mocking _detect_gfx_arch_from_rocminfo:

rocminfo succeeds (gfx942) → PYTORCH_ROCM_ARCH=AMDGPU_TARGETS=GPU_TARGETS=gfx942 ✅ (was the broken path)
rocminfo fails, model resolves (RDNA4 → gfx1201) → all three set to gfx1201 ✅ (unchanged)
both fail → env left untouched ✅ (unchanged)
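The refactor described in this comment, together with the three mocked cases, can be sketched as follows. Only `setup_rocm_env`, `_detect_gfx_arch_from_rocminfo`, and `_ROCM_ARCH_ENV_VARS` come from the thread; the `_MODEL_TO_GFX` table is a hypothetical stand-in for the cheatsheet-based fallback, the rocminfo probe is a stub, and returning the resolved arch is an addition for testability.

```python
import logging
import os
from typing import Optional
from unittest import mock

logger = logging.getLogger(__name__)

# Single source of truth for every env var that should carry the gfx target.
_ROCM_ARCH_ENV_VARS = ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS")

# Hypothetical stand-in for the cheatsheet-based model -> gfx lookup.
_MODEL_TO_GFX = {"MI300": "gfx942", "RDNA4": "gfx1201"}


def _detect_gfx_arch_from_rocminfo() -> Optional[str]:
    """Stub for the real rocminfo probe; always reports 'not detected'."""
    return None


def setup_rocm_env(model: str) -> Optional[str]:
    """Resolve the gfx arch once, then export it through one code path."""
    # Prefer the live hardware, then fall back to the configured model.
    gfx_arch = _detect_gfx_arch_from_rocminfo() or _MODEL_TO_GFX.get(model)
    if not gfx_arch:
        logger.warning("Could not resolve gfx arch for %r; env left untouched", model)
        return None
    for name in _ROCM_ARCH_ENV_VARS:
        os.environ[name] = gfx_arch
    # The log line is built from the same tuple, so it cannot drift from what was set.
    logger.info("Set %s=%s", "=".join(_ROCM_ARCH_ENV_VARS), gfx_arch)
    return gfx_arch


def run_mocked_checks() -> None:
    """Replay the three verification cases with the detector mocked."""
    cases = [
        ("gfx942", "MI300", "gfx942"),  # rocminfo succeeds (previously broken path)
        (None, "RDNA4", "gfx1201"),     # rocminfo fails, model resolves
        (None, "unknown", None),        # both fail: nothing exported
    ]
    for detected, model, expected in cases:
        fake_detect = mock.Mock(return_value=detected)
        # Empty the env and swap the module-level detector for this case only.
        with mock.patch.dict(os.environ, clear=True), mock.patch.dict(
            globals(), {"_detect_gfx_arch_from_rocminfo": fake_detect}
        ):
            assert setup_rocm_env(model) == expected
            for name in _ROCM_ARCH_ENV_VARS:
                assert os.environ.get(name) == expected, (model, name)
```

Against the real module, the same cases would more typically patch the detector by its import path with `mock.patch(..., return_value=...)`; patching `globals()` here only keeps the sketch self-contained.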

irvineoy · 2026-05-08T06:20:58Z

Thanks for the contribution.

mehdi-saeedi requested review from irvineoy and sharareh-y May 6, 2026 23:34

irvineoy merged commit 3a79d46 into geak-triton-common-benchmark May 8, 2026

mehdi-saeedi deleted the feature/rdna-triton-enablement branch May 8, 2026 14:24

irvineoy merged 2 commits into geak-triton-common-benchmark from feature/rdna-triton-enablement.
