feat(rdna): add RDNA4 Triton + HIP cheatsheets, arch config, and preprocessing support #37
…rocessing support

- Add RDNA4_architecture.md with gfx1201 hardware specs (Wave32, WGP, WMMA, GDDR6 bandwidth constraints)
- Add hip_rdna_cheatsheet.md as standalone RDNA HIP best practices (needed for hip2hip and torch2hip tasks on this branch)
- Add triton_rdna_cheatsheet.md as standalone RDNA Triton best practices (Wave32 implications, WMMA vs MFMA, gfx1201 tl.dot dtype support)
- Wire RDNA4 into default_cheatsheet.yaml with per-language knowledge_override (hip + triton)
- Set AMDGPU_TARGETS and GPU_TARGETS in preprocessing for CMake builds
- Support knowledge_override in prompt_builder for per-arch cheatsheets
- Remove hardcoded MI300X prompt from points_in_boxes task config

Mirrors AgentKernelArena HIP-branch commit 14d240f, with an added Triton-specific cheatsheet and knowledge_override wiring.

Made-with: Cursor
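The per-language `knowledge_override` wiring in this commit can be pictured as a lookup where an architecture entry (here `RDNA4`) contributes its own architecture cheatsheet and, per language, swaps in a standalone RDNA cheatsheet. Below is a minimal sketch under stated assumptions: the schema keys (`default`, `architectures`, `arch_cheatsheet`) and the default filenames `hip_cheatsheet.md` / `triton_cheatsheet.md` are hypothetical, and "override replaces the default language sheet" is an assumed semantics; the real `default_cheatsheet.yaml` layout and `prompt_builder` logic may differ.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL stand-in for the parsed default_cheatsheet.yaml.
# Key names and the default filenames are invented for this sketch.
CHEATSHEET_CONFIG: Dict[str, Dict] = {
    "default": {
        "hip": "hip_cheatsheet.md",        # hypothetical default sheet
        "triton": "triton_cheatsheet.md",  # hypothetical default sheet
    },
    "architectures": {
        "RDNA4": {
            "arch_cheatsheet": "RDNA4_architecture.md",
            "knowledge_override": {
                "hip": "hip_rdna_cheatsheet.md",
                "triton": "triton_rdna_cheatsheet.md",
            },
        },
    },
}


def resolve_cheatsheets(config: Dict[str, Dict], arch: str, language: str) -> List[str]:
    """Return the cheatsheet files to inject for (arch, language).

    Assumed semantics: the arch cheatsheet (if any) comes first, and a
    per-language knowledge_override replaces the default language sheet.
    """
    arch_cfg = config["architectures"].get(arch, {})
    files: List[str] = []
    if "arch_cheatsheet" in arch_cfg:
        files.append(arch_cfg["arch_cheatsheet"])
    # Per-arch, per-language override wins; otherwise use the default sheet.
    override: Optional[str] = arch_cfg.get("knowledge_override", {}).get(language)
    language_sheet = override or config["default"].get(language)
    if language_sheet:
        files.append(language_sheet)
    return files
```

For example, `resolve_cheatsheets(CHEATSHEET_CONFIG, "RDNA4", "triton")` returns `["RDNA4_architecture.md", "triton_rdna_cheatsheet.md"]`, while an architecture with no entry (such as `"MI300X"` here) falls back to the hypothetical default language sheet only.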
One issue I noticed with the new ROCm env propagation: the added CMake env vars are only set on the fallback path, not on the normal `rocminfo`-detected path. In `setup_rocm_env()`:

```python
detected_arch = _detect_gfx_arch_from_rocminfo()
if detected_arch:
    os.environ["PYTORCH_ROCM_ARCH"] = detected_arch
    logger.info(...)
    return
```

That means on machines where `rocminfo` detection succeeds, `AMDGPU_TARGETS` and `GPU_TARGETS` are never set. This seems to undermine the PR's goal of making CMake/HIP builds pick up the selected arch consistently. Could we set all three vars in the detected-arch branch as well? For example:

```python
if detected_arch:
    os.environ["PYTORCH_ROCM_ARCH"] = detected_arch
    os.environ["AMDGPU_TARGETS"] = detected_arch
    os.environ["GPU_TARGETS"] = detected_arch
    logger.info(...)
    return
```
… path too

setup_rocm_env() previously set all three of PYTORCH_ROCM_ARCH, AMDGPU_TARGETS, and GPU_TARGETS only on the fallback path. On the common rocminfo-success path it set just PYTORCH_ROCM_ARCH and returned, leaving the two CMake env vars unset.

Reproduced on MI300: rocminfo detected gfx942, but AMDGPU_TARGETS / GPU_TARGETS remained None, so CMake-based HIP builds did not pick up the selected arch.

Refactor the function to resolve gfx_arch first (rocminfo -> cheatsheet fallback) and converge on a single export block driven by a module-level tuple. Both paths now export all three vars uniformly, and any future CMake env var can be added to the tuple in one place.

Addresses irvineoy's review on #37.

Co-authored-by: Cursor <cursoragent@cursor.com>
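The refactor described in this commit (resolve `gfx_arch` once, then export from a single block driven by a module-level tuple) might look roughly like the sketch below. It is an illustration, not the PR's code: the tuple name `ROCM_ARCH_ENV_VARS`, the `fallback_arch` and `detect` parameters, and the regex-based `rocminfo` parsing are all assumptions. Passing the detector in as a parameter is a swapped-in design choice that lets the function be exercised without patching module globals.

```python
import logging
import os
import re
import subprocess
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Module-level tuple (name assumed): every env var that should carry the
# selected gfx arch. A future CMake variable only needs to be added here.
ROCM_ARCH_ENV_VARS = ("PYTORCH_ROCM_ARCH", "AMDGPU_TARGETS", "GPU_TARGETS")


def _detect_gfx_arch_from_rocminfo() -> Optional[str]:
    """Return the first gfx target printed by rocminfo, or None on failure.

    The regex parsing is an assumption for this sketch.
    """
    try:
        out = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True, timeout=30
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    match = re.search(r"\bgfx[0-9a-f]+\b", out)
    return match.group(0) if match else None


def setup_rocm_env(
    fallback_arch: Optional[str],
    detect: Callable[[], Optional[str]] = _detect_gfx_arch_from_rocminfo,
) -> Optional[str]:
    """Resolve the gfx arch once, then export it through one code path."""
    # 1. Resolve: rocminfo first, then the arch from the selected cheatsheet.
    gfx_arch = detect() or fallback_arch
    if not gfx_arch:
        logger.warning("No gfx arch detected and no fallback configured")
        return None
    # 2. Export: a single block shared by both paths, so they cannot drift.
    for var in ROCM_ARCH_ENV_VARS:
        os.environ[var] = gfx_arch
    logger.info("Set %s=%s", ", ".join(ROCM_ARCH_ENV_VARS), gfx_arch)
    return gfx_arch
```

Passing a stub such as `detect=lambda: "gfx942"` mimics the MI300 reproduction above without a GPU: with the single export block, all three variables end up as `gfx942`, whereas the original early-return branch set only `PYTORCH_ROCM_ARCH`.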
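Thanks @irvineoy, confirmed and fixed in the follow-up commit. The asymmetry was real: on the `rocminfo`-success path, `setup_rocm_env()` set only `PYTORCH_ROCM_ARCH` and returned early, so `AMDGPU_TARGETS` and `GPU_TARGETS` were never exported. Rather than duplicate the three-var export inline on each path (which leaves the same shape vulnerable to drift the next time someone touches the function), I refactored it to resolve `gfx_arch` first (`rocminfo`, then the cheatsheet fallback) and export all three vars from a single block driven by a module-level tuple. Verified by mocking …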
Thanks for the contribution
## Summary
Enables RDNA4 (`gfx1201`) as a target architecture for Triton and HIP tasks. Mirrors the AgentKernelArena (AKA) HIP-branch commit `14d240f`, with an added Triton-specific cheatsheet and per-language `knowledge_override` wiring.

- Adds `RDNA4_architecture.md`, a cheatsheet covering `gfx1201` hardware specs (Wave32, WGP, WMMA, GDDR6 bandwidth constraints).
- Adds `hip_rdna_cheatsheet.md`: standalone RDNA HIP best practices for `hip2hip` / `torch2hip` tasks on this branch.
- Adds `triton_rdna_cheatsheet.md`: standalone RDNA Triton best practices (Wave32 implications, WMMA vs MFMA, `gfx1201` `tl.dot` dtype support).
- Wires `RDNA4` into `default_cheatsheet.yaml` with a per-language `knowledge_override` (`hip` + `triton`).
- Sets `AMDGPU_TARGETS` and `GPU_TARGETS` in `src/preprocessing.py` so CMake-based HIP builds target RDNA4.
- Adds `knowledge_override` support to `src/prompt_builder.py` for per-arch cheatsheets.
- Removes the hardcoded MI300X prompt from `tasks/hip2hip/others/points_in_boxes/config.yaml`.

## Files changed (7)
- `src/preprocessing.py`
- `src/prompt_builder.py`
- `src/prompts/cheatsheet/RDNA4_architecture.md` (new)
- `src/prompts/cheatsheet/default_cheatsheet.yaml`
- `src/prompts/cheatsheet/hip_rdna_cheatsheet.md` (new)
- `src/prompts/cheatsheet/triton_rdna_cheatsheet.md` (new)
- `tasks/hip2hip/others/points_in_boxes/config.yaml`

## Test plan
- … `gfx1201`) and confirm `triton_rdna_cheatsheet.md` is injected into the agent prompt via `knowledge_override`.
- … `hip_rdna_cheatsheet.md` is injected.
- … `gfx1201` (check `AMDGPU_TARGETS` / `GPU_TARGETS` propagation).

Made with Cursor