Add parallel conv fusion quantization pass #1538
Signed-off-by: sunzhongqi <sunzhongqi@didiglobal.com>
📝 Walkthrough
This PR adds support for a
Changes
Parallel Conv Fusion Optimization
🧹 Nitpick comments (1)
modelopt/onnx/quantization/fp8.py (1)
280-305: 💤 Low value
Consider extracting duplicated QDQ tensor group merging logic.
The QDQ tensor group computation and merging logic (lines 280-305) is duplicated nearly verbatim in int8.py (lines 253-278). Consider extracting this into a shared helper function in graph_utils.py to improve maintainability and ensure consistency when adding future passes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/onnx/quantization/fp8.py` around lines 280 - 305, Extract the duplicated QDQ tensor group merging logic into a shared helper (e.g., build_group_qdq_tensors or collect_qdq_group_tensors) in graph_utils.py that accepts (onnx_model, nodes_to_quantize, passes, logger) and internally calls the existing get_concat_eliminated_tensors and get_parallel_conv_fusion_tensors and merge_qdq_tensor_groups logic to produce/return group_qdq_tensors; then replace the block in fp8.py (the group_qdq_tensors creation and trt_guided_options assignment) and the analogous block in int8.py to call this new helper and assign trt_guided_options["group_qdq_tensors"] when non-empty so both modules reuse the same implementation.
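To make the suggested refactor concrete, here is a minimal sketch of what the merging step of such a shared helper might do: combine tensor groups from several passes so that groups sharing a tensor end up with one common scale. The function name `merge_overlapping_groups` and its list-of-lists input are hypothetical; this is not the PR's `merge_qdq_tensor_groups`, whose signature is not shown here.

```python
from collections import defaultdict


def merge_overlapping_groups(groups):
    """Merge tensor-name groups that share at least one tensor.

    Hypothetical illustration only: uses a small union-find over tensor names.
    Groups that end up with a single member are dropped, since there is
    nothing to share a scale with.
    """
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        if not group:
            continue
        first = group[0]
        find(first)  # register the tensor even if the group has one member
        for name in group[1:]:
            union(first, name)

    merged = defaultdict(list)
    for name in parent:
        merged[find(name)].append(name)
    return [sorted(members) for members in merged.values() if len(members) > 1]


# Groups as two different passes might report them (made-up tensor names).
concat_groups = [["a", "b"]]
conv_groups = [["b", "c"], ["x", "y"], ["z"]]
print(merge_overlapping_groups(concat_groups + conv_groups))
```

With one function of this shape in graph_utils.py, fp8.py and int8.py would each only need to call it and store the result when it is non-empty.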
📒 Files selected for processing (5)
modelopt/onnx/quantization/__main__.py
modelopt/onnx/quantization/fp8.py
modelopt/onnx/quantization/graph_utils.py
modelopt/onnx/quantization/int8.py
modelopt/onnx/quantization/quantize.py
What does this PR do?
Type of change: new feature.
Adds a TensorRT-oriented parallel Conv/ConvTranspose fusion quantization pass. The pass groups Q/DQ scales for matching parallel convolution branches so TensorRT can fuse them when the post-conv quantize nodes use the same scale. It is wired into both INT8 and FP8 quantization paths and exposed through the existing passes option.
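The grouping idea described above can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: the `Node` dataclass and `find_parallel_conv_output_groups` are hypothetical stand-ins for the real ONNX graph handling, and "matching" is assumed here to mean the same op type, the same input activation, and identical attributes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for an ONNX graph node."""
    name: str
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)  # values must be hashable


def find_parallel_conv_output_groups(nodes):
    """Group output tensors of Conv/ConvTranspose nodes that branch off one input.

    Assumption: branches match when op type, input activation, and attributes
    are all equal. Each returned group is a list of output tensor names that
    would be given one shared Q/DQ scale.
    """
    buckets = defaultdict(list)
    for node in nodes:
        if node.op_type not in ("Conv", "ConvTranspose"):
            continue
        key = (node.op_type, node.inputs[0], tuple(sorted(node.attrs.items())))
        buckets[key].append(node.outputs[0])
    # Only inputs feeding two or more matching branches form a group.
    return [outs for outs in buckets.values() if len(outs) > 1]


nodes = [
    Node("conv_a", "Conv", ["x", "w_a"], ["y_a"], {"kernel_shape": (3, 3)}),
    Node("conv_b", "Conv", ["x", "w_b"], ["y_b"], {"kernel_shape": (3, 3)}),
    Node("conv_c", "Conv", ["x", "w_c"], ["y_c"], {"kernel_shape": (1, 1)}),
    Node("relu", "Relu", ["y_a"], ["z"]),
]
print(find_parallel_conv_output_groups(nodes))
```

In this toy graph only `y_a` and `y_b` land in one group, because `conv_c` uses a different kernel shape and `relu` is not a convolution.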
Usage
Use the existing quantize API with passes including concat_elimination and parallel_conv_fusion. The new pass is enabled by default for INT8 and FP8 quantization.
Before your PR is Ready for review
Summary by CodeRabbit
- parallel_conv_fusion optimization pass for ONNX quantization workflows to improve model performance.
- concat_elimination pass.
- --passes CLI option to select specific passes.