Add parallel conv fusion quantization pass by NotUr77 · Pull Request #1538 · NVIDIA/Model-Optimizer

NotUr77 · 2026-05-23T10:20:25Z

What does this PR do?

Type of change: new feature.

Adds a TensorRT-oriented parallel Conv/ConvTranspose fusion quantization pass. The pass groups Q/DQ scales for matching parallel convolution branches so TensorRT can fuse them when the post-conv quantize nodes use the same scale. It is wired into both INT8 and FP8 quantization paths and exposed through the existing passes option.

Usage

Use the existing quantize API with passes including concat_elimination and parallel_conv_fusion. The new pass is enabled by default for INT8 and FP8 quantization.

Before your PR is Ready for review

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: N/A

Summary by CodeRabbit

New Features
- Introduced the parallel_conv_fusion optimization pass for ONNX quantization workflows. It aligns Q/DQ scales across matching parallel convolution branches so that TensorRT can fuse them.
- The pass is now enabled by default alongside the existing concat_elimination pass.
- Users can customize optimization behavior via the --passes CLI option to select specific passes.
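As a rough illustration of the pass selection described above, the sketch below shows one way a `--passes` option with both passes enabled by default could be parsed. Only the two pass names come from this PR; the parser, `KNOWN_PASSES`, and `build_parser` are hypothetical stand-ins, not the code in `modelopt/onnx/quantization/__main__.py`.

```python
import argparse

# Pass names are taken from the PR description; everything else here is illustrative.
KNOWN_PASSES = ["concat_elimination", "parallel_conv_fusion"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mimics a multi-value --passes option."""
    parser = argparse.ArgumentParser(description="Illustrative --passes handling")
    parser.add_argument(
        "--passes",
        nargs="+",                    # accept one or more pass names
        choices=KNOWN_PASSES,         # reject unknown pass names
        default=list(KNOWN_PASSES),   # both passes are on unless overridden
        help="Optimization passes to apply.",
    )
    return parser


# No flag given: both passes are selected.
default_args = build_parser().parse_args([])

# Explicit selection: only the listed pass runs.
only_concat = build_parser().parse_args(["--passes", "concat_elimination"])
```

Under this sketch, passing `--passes concat_elimination` alone would be the way to opt out of the new pass.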

Signed-off-by: sunzhongqi <sunzhongqi@didiglobal.com>

copy-pr-bot · 2026-05-23T10:20:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-23T10:20:48Z

📝 Walkthrough

This PR adds support for a parallel_conv_fusion optimization pass in ONNX quantization. Two new tensor-grouping utilities detect fusible parallel convolution branches and merge QDQ scale assignments across them. Both FP8 and INT8 quantization pipelines now conditionally apply this pass alongside the existing concat_elimination, with the new pass also exposed via CLI and the main quantize function.
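To make the idea of merging overlapping group mappings concrete, here is a minimal sketch. It assumes each mapping is a plain dict from tensor name to a representative tensor name whose scale the group shares; that data shape, the function name `merge_group_maps`, and the tie-breaking rule are assumptions for illustration and may differ from the PR's `merge_qdq_tensor_groups`.

```python
def merge_group_maps(*group_maps: dict[str, str]) -> dict[str, str]:
    """Merge tensor -> representative maps so overlapping groups collapse into one."""
    parent: dict[str, str] = {}

    def find(name: str) -> str:
        # Register unseen names as their own root, then walk up with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so the result is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    for group_map in group_maps:
        for tensor, representative in group_map.items():
            union(tensor, representative)

    return {name: find(name) for name in parent}


# Hypothetical tensor names: "b" appears in both maps, so all three tensors end up together.
concat_groups = {"a": "a", "b": "a"}
conv_groups = {"b": "b", "c": "b"}
merged = merge_group_maps(concat_groups, conv_groups)
```

Because a tensor can only carry one scale, any tensor that two passes both claim forces their groups to be joined, which is why the merge is transitive rather than a simple dict update.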

Changes

Parallel Conv Fusion Optimization

| Layer / File(s) | Summary |
| --- | --- |
| Tensor grouping utilities `modelopt/onnx/quantization/graph_utils.py` | Adds `get_parallel_conv_fusion_tensors` to detect fusible parallel Conv/ConvTranspose branches by matching input activations, convolution signatures, and tail patterns, then mapping tensors to shared scale groups via union-find; also adds `merge_qdq_tensor_groups` to merge overlapping group mappings. |
| FP8 pipeline integration `modelopt/onnx/quantization/fp8.py` | Imports the new grouping utilities and updates the `quantize()` function to include `parallel_conv_fusion` in the default passes; refactors QDQ tensor-group construction to conditionally compute groups for both `concat_elimination` and `parallel_conv_fusion`, then merges them into a single mapping for TensorRT-guided quantization. |
| INT8 pipeline integration `modelopt/onnx/quantization/int8.py` | Mirrors FP8 updates: imports grouping utilities, updates default passes, and refactors QDQ tensor-group construction to conditionally compute and merge groups for both passes before attaching to `trt_guided_options`. |
| CLI and public API `modelopt/onnx/quantization/__main__.py`, `modelopt/onnx/quantization/quantize.py` | Updates the `--passes` CLI argument to accept `parallel_conv_fusion` and includes it in the default passes list; updates the main `quantize()` function's default passes parameter to include `parallel_conv_fusion`. |
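The detection step summarized above can be pictured with a simplified sketch. It buckets branches by a matching key instead of using union-find, and it works on a hypothetical `ConvBranch` record rather than a real ONNX graph; the field names and the choice of the first output as representative are assumptions, not the behavior of `get_parallel_conv_fusion_tensors`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvBranch:
    """Hypothetical summary of one Conv/ConvTranspose branch."""
    input_tensor: str    # activation feeding the convolution
    signature: tuple     # e.g. (op_type, kernel_shape, strides), kept opaque here
    tail_ops: tuple      # op types that follow the convolution
    output_tensor: str   # tensor whose post-conv Q/DQ scale should be shared


def group_parallel_branches(branches: list[ConvBranch]) -> dict[str, str]:
    """Map each output tensor of a matching branch set to one representative."""
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for branch in branches:
        key = (branch.input_tensor, branch.signature, branch.tail_ops)
        buckets[key].append(branch.output_tensor)

    groups: dict[str, str] = {}
    for outputs in buckets.values():
        if len(outputs) < 2:
            continue  # a lone branch has nothing to fuse with
        for name in outputs:
            groups[name] = outputs[0]
    return groups


# Two branches share input, signature, and tail; the third has a different kernel.
branches = [
    ConvBranch("x", ("Conv", (3, 3), (1, 1)), ("Relu",), "y0"),
    ConvBranch("x", ("Conv", (3, 3), (1, 1)), ("Relu",), "y1"),
    ConvBranch("x", ("Conv", (1, 1), (1, 1)), ("Relu",), "y2"),
]
fusion_groups = group_parallel_branches(branches)
```

In this toy example only `y0` and `y1` are grouped, since `y2` comes from a convolution with a different signature.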

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding a new parallel conv fusion quantization pass. It directly matches the PR's primary objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No unsafe torch.load, numpy.load, trust_remote_code, eval/exec, nosec comments, or new dependencies found. Pickle deserialization properly gated by user-configurable flag with secure default. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


coderabbitai

🧹 Nitpick comments (1)

modelopt/onnx/quantization/fp8.py (1)
280-305: 💤 Low value

Consider extracting duplicated QDQ tensor group merging logic.

The QDQ tensor group computation and merging logic (lines 280-305) is duplicated nearly verbatim in int8.py (lines 253-278). Consider extracting this into a shared helper function in graph_utils.py to improve maintainability and ensure consistency when adding future passes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/quantization/fp8.py` around lines 280 - 305, Extract the
duplicated QDQ tensor group merging logic into a shared helper (e.g.,
build_group_qdq_tensors or collect_qdq_group_tensors) in graph_utils.py that
accepts (onnx_model, nodes_to_quantize, passes, logger) and internally calls the
existing get_concat_eliminated_tensors and get_parallel_conv_fusion_tensors and
merge_qdq_tensor_groups logic to produce/return group_qdq_tensors; then replace
the block in fp8.py (the group_qdq_tensors creation and trt_guided_options
assignment) and the analogous block in int8.py to call this new helper and
assign trt_guided_options["group_qdq_tensors"] when non-empty so both modules
reuse the same implementation.
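A minimal sketch of the shared helper suggested in this nitpick follows. The real signatures of `get_concat_eliminated_tensors`, `get_parallel_conv_fusion_tensors`, and `merge_qdq_tensor_groups` are not shown in this thread, so the sketch takes them as injected callables; the helper name, its parameters, and the stub functions are all hypothetical.

```python
from typing import Callable, Iterable, Mapping

GroupMap = dict[str, str]


def build_group_qdq_tensors(
    passes: Iterable[str],
    collectors: Mapping[str, Callable[[], GroupMap]],
    merge: Callable[..., GroupMap],
) -> GroupMap:
    """Run the collector for each enabled pass and merge the non-empty results."""
    collected = [collectors[name]() for name in passes if name in collectors]
    non_empty = [groups for groups in collected if groups]
    if not non_empty:
        return {}
    return merge(*non_empty)


# Stub collectors and merge function standing in for the real graph utilities.
def fake_concat_groups() -> GroupMap:
    return {"t0": "t0", "t1": "t0"}


def fake_conv_groups() -> GroupMap:
    return {"t2": "t2", "t3": "t2"}


def fake_merge(*maps: GroupMap) -> GroupMap:
    merged: GroupMap = {}
    for group_map in maps:
        merged.update(group_map)
    return merged


collectors = {
    "concat_elimination": fake_concat_groups,
    "parallel_conv_fusion": fake_conv_groups,
}

# Both fp8.py and int8.py could then share the same few lines.
trt_guided_options: dict = {}
group_qdq_tensors = build_group_qdq_tensors(
    ["concat_elimination", "parallel_conv_fusion"], collectors, fake_merge
)
if group_qdq_tensors:
    trt_guided_options["group_qdq_tensors"] = group_qdq_tensors
```

Keeping the per-pass collectors in a mapping means a future pass would only need a new entry rather than another copy of the merge block in each pipeline.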


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d711cfd1-551b-4439-83c7-a869a539ee06

📥 Commits

Reviewing files that changed from the base of the PR and between 16a0130 and 6680944.

📒 Files selected for processing (5)

modelopt/onnx/quantization/__main__.py
modelopt/onnx/quantization/fp8.py
modelopt/onnx/quantization/graph_utils.py
modelopt/onnx/quantization/int8.py
modelopt/onnx/quantization/quantize.py

Add parallel conv fusion quantization pass

6680944

Signed-off-by: sunzhongqi <sunzhongqi@didiglobal.com>

NotUr77 requested a review from a team as a code owner May 23, 2026 10:20

NotUr77 requested a review from galagam May 23, 2026 10:20

Merge branch 'main' into codex/parallel-conv-fusion-pass

34babe0

coderabbitai Bot reviewed May 23, 2026

NotUr77 wants to merge 2 commits into NVIDIA:main from NotUr77:codex/parallel-conv-fusion-pass
