[DNM][AMD] agentx-v0.4#1654
… Kimi/Qwen scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to 32 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
… Kimi/MiniMax scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
….0, expand conc list Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…higher range Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… and update script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d update scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…standalone script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…2.5 agentic configs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 6dbef81.
HICACHE_CUDA_GRAPH_MAX_BS="${HICACHE_CUDA_GRAPH_MAX_BS:-16}"
# Don't force ROCm graph capture at every high concurrency point; conc=16
# is the highest known-good capture size for this model/server path.
HICACHE_CUDA_GRAPH_MAX_BS="${HICACHE_CUDA_GRAPH_MAX_BS:-256}"
HiCache graph default too high
Medium Severity
HICACHE_CUDA_GRAPH_MAX_BS defaults to 256 while nearby comments state conc=16 is the highest known-good capture size for this path. For typical sweep concurrencies, the min-cap logic never reduces CUDA_GRAPH_MAX_BS, so the intended safety limit is ineffective even if the launch used that variable.
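The effect described above can be illustrated with a minimal sketch of a min-cap. The helper name `cap_graph_bs` is hypothetical and the script's real capping logic is not shown in this thread, so treat this as an assumption about its shape rather than the PR's code. It compares the two defaults (16 and 256) across a few example concurrency values.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): return the smaller of the sweep
# concurrency ($1) and the graph batch-size cap ($2).
cap_graph_bs() {
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Compare the two defaults discussed in the review. The concurrency values
# are illustrative, not the PR's actual sweep list.
for cap in 16 256; do
  for conc in 4 16 64 128; do
    echo "cap=$cap conc=$conc graph_max_bs=$(cap_graph_bs "$conc" "$cap")"
  done
done
```

With a cap of 16, concurrencies 64 and 128 are clamped to 16. With a cap of 256, every value in this list passes through unchanged, which is the ineffective limit the review points at.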


Summary
- minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache and kimik2.5-fp4-mi355x-vllm-agentic-lmcache entries in amd-master.yaml
- minimaxm2.5_fp4_mi355x.sh agentic benchmark script with LMCache support
- kimik2.5_fp4_mi355x.sh: simplify env vars, build LMCache from source (ROCm HIP), tune LMCACHE_L1_SIZE_GB/TTL/chunk size
- qwen3.5_fp8_mi355x.sh: add HiCache offloading support, add 256k trace corpus cap via WEKA_LOADER_OVERRIDE
- LMCACHE_CHUNK_SIZE default to 32 for MiniMax agentic script

🤖 Generated with Claude Code
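As a rough illustration of an overridable default such as the LMCACHE_CHUNK_SIZE value of 32, the sketch below uses the same `${VAR:-default}` expansion that appears in the HICACHE_CUDA_GRAPH_MAX_BS line quoted in the review. Whether the MiniMax script sets its default this way is an assumption, and the function name `resolve_chunk_size` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print the effective chunk size.
# ${VAR:-32} yields 32 when LMCACHE_CHUNK_SIZE is unset or empty, and the
# caller's value otherwise.
resolve_chunk_size() {
  echo "${LMCACHE_CHUNK_SIZE:-32}"
}

echo "LMCACHE_CHUNK_SIZE=$(resolve_chunk_size)"
```

A caller can still override the value per run, for example by exporting LMCACHE_CHUNK_SIZE=64 before launching the script.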
Note
Medium Risk
Changes KV offload and server launch paths for long agentic runs (runtime LMCache clone/build, experimental DRAM/TODO tuning); benchmark-only but can mislead perf comparisons or fail jobs if offload configs are wrong.
Overview
Expands MI355X agentic-coding coverage in amd-master.yaml with new # target matrix rows that compare GPU-only vs HiCache (SGLang) or LMCache (vLLM) at fixed TP/concurrency grids for Qwen3.5 FP4, GLM-5.1 FP4, MiniMax M2.5 FP4/FP8, and Kimi K2.5 FP4. The existing Qwen3.5 FP8 agentic-hicache entry is retuned (newer SGLang image, TP=4, higher conc list) instead of the prior TP8-only sweep.

Agentic launch scripts are updated to match: 256k-capped trace corpus via WEKA_LOADER_OVERRIDE, shared OFFLOADING handling, and server recipes aligned with matrix defaults. SGLang scripts gain HiCache sizing (3 TB node DRAM budget, skip warmup, graph caps) and refreshed launch flags. vLLM scripts drop the large inline ROCm sitecustomize patches in favor of building LMCache from source (hipcc/BUILD_WITH_HIP), external LMCache MP + LMCacheMPConnector, partitioned CPU DRAM, longer L1 read TTL, and tuned chunk/worker settings; MiniMax FP4 adds a dedicated script, and MiniMax FP8 adds full LMCache plus concurrency-dependent block size / async scheduling.