[AMD] CI - migrate perf test and fix stage-b-test-1-gpu-amd #17340
HaiShaw merged 19 commits into sgl-project:main
Summary of Changes (posted by Gemini Code Assist, addressed to @yctseng0211)
This pull request integrates AMD GPU performance tests into the continuous integration pipeline. By adding registration calls for AMD CI, it ensures that performance benchmarks for both single-GPU and multi-GPU configurations run automatically on AMD hardware, which expands test coverage and helps maintain performance consistency across hardware architectures.
Code Review
The pull request successfully migrates the performance tests for 1-GPU and 2-GPU configurations to include AMD CI. The changes involve importing the register_amd_ci function and then calling it with appropriate estimated times and suite names, consistent with the existing CUDA CI registrations. The modifications are clear, directly address the stated motivation, and integrate well with the existing CI registration framework.
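The registration pattern described in the review can be pictured with a small, self-contained sketch. Everything here is a hypothetical stand-in: `register_ci`, the suite names, and the estimated times are invented for illustration, and the repository's actual `register_amd_ci` helper may have a different signature and behavior.

```python
from collections import defaultdict

# Hypothetical stand-in registry keyed by (backend, suite).
# This is NOT the repository's register_amd_ci; it only illustrates the idea.
_REGISTRY = defaultdict(list)


def register_ci(backend, test_file, est_time, suite):
    """Record that test_file belongs to suite on backend, with an estimated runtime in seconds."""
    _REGISTRY[(backend, suite)].append((test_file, est_time))


def suite_files(backend, suite):
    """Return the test files registered for a backend/suite pair, in registration order."""
    return [name for name, _ in _REGISTRY[(backend, suite)]]


def suite_time(backend, suite):
    """Sum the estimated runtimes for a backend/suite pair."""
    return sum(t for _, t in _REGISTRY[(backend, suite)])


# The same test file is registered once per backend, mirroring the
# "CUDA registration plus AMD registration" pattern. All values are made up.
register_ci("cuda", "test_bench_one_batch_1gpu.py", est_time=120, suite="example-1-gpu")
register_ci("amd", "test_bench_one_batch_1gpu.py", est_time=180, suite="example-1-gpu-amd")
register_ci("amd", "test_bench_serving_2gpu.py", est_time=300, suite="example-2-gpu-amd")
```

A runner could then ask `suite_files("amd", "example-1-gpu-amd")` for the files to execute and use `suite_time` to balance work across CI jobs.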
Force-pushed from 2024c0b to 8fb4552.
…pe; lower threshold for AMD platforms.
Force-pushed from 6a278ec to 7968e27.
These failures will be fixed by #17432.
…ect#17340) Co-authored-by: Bingxu Chen <bingxche@amd.com> Co-authored-by: bingxche <Bingxu.Chen@amd.com> Co-authored-by: michaelzhang-ai <michaelzhang.ai@users.noreply.github.com>

Motivation
This PR migrates the AMD CI performance and accuracy tests from a manual, imperative workflow style to the unified test suite runner (run_suite.py).

Modifications

1. Workflow Reorganization (.github/workflows/pr-test-amd.yml)
- performance-test-1-gpu-part-1-amd → stage-b-test-small-1-gpu-performance-amd
- performance-test-1-gpu-part-2-amd → stage-b-test-large-1-gpu-performance-amd
- performance-test-1-gpu-part-3-amd → merged into other stages
- performance-test-2-gpu-amd → stage-b-test-large-2-gpu-performance-amd
- accuracy-test-1-gpu-amd → stage-b-test-small-1-gpu-accuracy-amd
- accuracy-test-2-gpu-amd → stage-b-test-large-2-gpu-accuracy-amd

2. Test Registration for AMD (test/registered/perf/*.py and test/registered/eval/*.py)
Added register_amd_ci() calls in:
- test_bench_one_batch_1gpu.py
- test_bench_one_batch_2gpu.py
- test_bench_serving_1gpu_large.py
- test_bench_serving_1gpu_part1.py
- test_bench_serving_1gpu_part2.py
- test_bench_serving_2gpu.py
- test_vlm_perf_5090.py
- test_eval_accuracy_large.py
- test_moe_eval_accuracy_large.py

3. AMD-specific Test Adjustments
- Skip the Eagle test on ROCm (@unittest.skipIf(is_hip(), "Skip Eagle test for ROCm"))
- … (fp8_fnuz)

4. Suite Registration Updates
- Updated test/run_suite.py and scripts/ci/slash_command_handler.py to recognize the new AMD test suites

Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
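
As an illustration of the ROCm skip listed under AMD-specific Test Adjustments, here is a minimal sketch using the standard library's `unittest.skipIf`. The `is_hip()` function below is a hypothetical stand-in that always returns True so the skip is visible; the project's real helper detects the platform and is not reproduced here. The test class and method names are also invented.

```python
import io
import unittest


def is_hip():
    """Hypothetical stand-in: pretend this process runs on a ROCm (HIP) build."""
    return True


class TestSpeculative(unittest.TestCase):
    # The decorator is evaluated when the class is defined, so is_hip()
    # is called once at import time, not each time the test runs.
    @unittest.skipIf(is_hip(), "Skip Eagle test for ROCm")
    def test_eagle(self):
        self.fail("not reached while is_hip() returns True")

    def test_baseline(self):
        self.assertEqual(1 + 1, 2)


def run_tests():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSpeculative)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_tests()
```

A skipped test is reported in `result.skipped` together with its reason string, and the run still counts as successful, which is why a platform-gated skip keeps the suite green on AMD runners.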