Add parallel conv fusion quantization pass #1538

Open
NotUr77 wants to merge 2 commits into NVIDIA:main from NotUr77:codex/parallel-conv-fusion-pass

Conversation

@NotUr77 NotUr77 commented May 23, 2026

What does this PR do?

Type of change: new feature.

Adds a TensorRT-oriented parallel Conv/ConvTranspose fusion quantization pass. The pass groups Q/DQ scales for matching parallel convolution branches so TensorRT can fuse them when the post-conv quantize nodes use the same scale. It is wired into both INT8 and FP8 quantization paths and exposed through the existing passes option.
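
To illustrate what "grouping Q/DQ scales" can mean in practice, here is a minimal sketch. It assumes symmetric per-tensor INT8 quantization and assumes that a group's shared scale is taken from the largest absolute activation value (amax) in the group. The PR description does not state how the shared scale is chosen, so `int8_scale` and `group_scales` are hypothetical names and the max-amax rule is an assumption, not the PR's implementation.

```python
def int8_scale(amax: float) -> float:
    """Symmetric per-tensor INT8 scale mapping [-amax, amax] onto [-127, 127]."""
    return amax / 127.0


def group_scales(amax_by_tensor: dict, groups: list) -> dict:
    """Return one scale per tensor, forcing tensors in the same group to share a scale.

    Assumption: the shared scale comes from the largest amax in the group,
    so no branch in the group clips more than it would on its own.
    """
    # Start from independent per-tensor scales.
    scales = {name: int8_scale(amax) for name, amax in amax_by_tensor.items()}
    # Overwrite grouped tensors with the group's shared scale.
    for group in groups:
        shared = int8_scale(max(amax_by_tensor[name] for name in group))
        for name in group:
            scales[name] = shared
    return scales


# Two parallel conv outputs ("a_out", "b_out") and one unrelated tensor.
amax_by_tensor = {"a_out": 2.0, "b_out": 6.0, "other": 1.0}
scales = group_scales(amax_by_tensor, groups=[["a_out", "b_out"]])
```

In this sketch `a_out` and `b_out` end up with identical scales while `other` keeps its own, which is the condition the description says TensorRT needs on the post-conv quantize nodes.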

Usage

Use the existing quantize API with passes including concat_elimination and parallel_conv_fusion. The new pass is enabled by default for INT8 and FP8 quantization.

Before your PR is Ready for review

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: N/A

Summary by CodeRabbit

  • New Features
    • Introduced parallel_conv_fusion optimization pass for ONNX quantization workflows to improve model performance.
    • The pass is now enabled by default alongside the existing concat_elimination pass.
    • Users can customize optimization behavior via the --passes CLI option to select specific passes.

Signed-off-by: sunzhongqi <sunzhongqi@didiglobal.com>
@NotUr77 NotUr77 requested a review from a team as a code owner May 23, 2026 10:20
@NotUr77 NotUr77 requested a review from galagam May 23, 2026 10:20
copy-pr-bot Bot commented May 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

This PR adds a parallel_conv_fusion optimization pass to ONNX quantization. Two new tensor-grouping utilities detect fusible parallel convolution branches and merge QDQ scale-group assignments across them. Both the FP8 and INT8 quantization pipelines now conditionally apply this pass alongside the existing concat_elimination pass, and the new pass is also exposed through the CLI and the main quantize function.
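
As a rough illustration of merging overlapping group mappings, the sketch below uses a small union-find. It assumes each pass reports its groups as a dict from tensor name to a representative tensor name. That representation and the name `merge_tensor_groups` are hypothetical, and the PR's merge_qdq_tensor_groups may work differently.

```python
def merge_tensor_groups(*mappings: dict) -> dict:
    """Merge tensor -> representative mappings so overlapping groups collapse into one.

    Hypothetical representation: each mapping sends a tensor name to the name
    of another tensor in the same scale group.
    """
    parent = {}

    def find(name):
        # Register unseen names as their own root, then walk up with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep the lexicographically smaller root so the result is deterministic.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    for mapping in mappings:
        for tensor, representative in mapping.items():
            union(tensor, representative)

    # Every tensor seen maps to the root of its merged group.
    return {name: find(name) for name in parent}


concat_groups = {"x": "y"}            # e.g. from concat_elimination
conv_groups = {"y": "z", "p": "q"}    # e.g. from parallel_conv_fusion
merged = merge_tensor_groups(concat_groups, conv_groups)
```

Because `y` appears in both mappings, `x`, `y`, and `z` collapse into a single group, while `p` and `q` stay in their own group.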

Changes

Parallel Conv Fusion Optimization

| Layer / File(s) | Summary |
| --- | --- |
| Tensor grouping utilities (modelopt/onnx/quantization/graph_utils.py) | Adds get_parallel_conv_fusion_tensors to detect fusible parallel Conv/ConvTranspose branches by matching input activations, convolution signatures, and tail patterns, then mapping tensors to shared scale groups via union-find; also adds merge_qdq_tensor_groups to merge overlapping group mappings. |
| FP8 pipeline integration (modelopt/onnx/quantization/fp8.py) | Imports the new grouping utilities and updates the quantize() function to include parallel_conv_fusion in the default passes; refactors QDQ tensor-group construction to conditionally compute groups for both concat_elimination and parallel_conv_fusion, then merges them into a single mapping for TensorRT-guided quantization. |
| INT8 pipeline integration (modelopt/onnx/quantization/int8.py) | Mirrors FP8 updates: imports grouping utilities, updates default passes, and refactors QDQ tensor-group construction to conditionally compute and merge groups for both passes before attaching to trt_guided_options. |
| CLI and public API (modelopt/onnx/quantization/__main__.py, modelopt/onnx/quantization/quantize.py) | Updates the --passes CLI argument to accept parallel_conv_fusion and includes it in the default passes list; updates the main quantize() function's default passes parameter to include parallel_conv_fusion. |
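
The detection step summarized in the table can be sketched roughly as follows. This is a simplified illustration, not the PR's get_parallel_conv_fusion_tensors: `ConvNode` is a hypothetical stand-in for an ONNX node, the attributes that make up the "signature" are an assumption, and the tail-pattern matching mentioned in the summary is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvNode:
    """Hypothetical, simplified stand-in for an ONNX Conv/ConvTranspose node."""

    name: str
    op_type: str
    input: str    # name of the activation tensor feeding the node
    output: str   # name of the tensor the node produces
    kernel_shape: tuple
    strides: tuple = (1, 1)
    dilations: tuple = (1, 1)
    group: int = 1


def find_parallel_conv_groups(nodes):
    """Group conv outputs whose nodes share an input activation and a signature."""
    buckets = defaultdict(list)
    for node in nodes:
        if node.op_type not in ("Conv", "ConvTranspose"):
            continue
        # Assumed signature: op type plus the attributes that shape the kernel.
        signature = (node.op_type, node.kernel_shape, node.strides, node.dilations, node.group)
        buckets[(node.input, signature)].append(node.output)
    # Only buckets with at least two branches are "parallel".
    return [outputs for outputs in buckets.values() if len(outputs) > 1]


nodes = [
    ConvNode("conv_a", "Conv", "x", "a_out", (3, 3)),
    ConvNode("conv_b", "Conv", "x", "b_out", (3, 3)),
    ConvNode("conv_c", "Conv", "x", "c_out", (1, 1)),  # same input, different kernel
    ConvNode("conv_d", "Conv", "y", "d_out", (3, 3)),  # same kernel, different input
]
groups = find_parallel_conv_groups(nodes)
```

Here only `a_out` and `b_out` form a group, since the other two nodes differ in either kernel shape or input activation.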

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding a new parallel conv fusion quantization pass. It directly matches the PR's primary objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No unsafe torch.load, numpy.load, trust_remote_code, eval/exec, nosec comments, or new dependencies found. Pickle deserialization properly gated by user-configurable flag with secure default. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
modelopt/onnx/quantization/fp8.py (1)

280-305: 💤 Low value

Consider extracting duplicated QDQ tensor group merging logic.

The QDQ tensor group computation and merging logic (lines 280-305) is duplicated nearly verbatim in int8.py (lines 253-278). Consider extracting this into a shared helper function in graph_utils.py to improve maintainability and ensure consistency when adding future passes.
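
One possible shape for such a shared helper is sketched below. It is only an illustration under stated assumptions: `build_group_qdq_tensors`, its parameters, and the stand-in `merge_by_update` are hypothetical, and the per-pass collectors are injected as callables so the sketch does not guess the signatures of the real functions in graph_utils.py.

```python
def merge_by_update(mappings):
    """Stand-in merge for disjoint mappings; the real merge may need to resolve overlaps."""
    merged = {}
    for mapping in mappings:
        merged.update(mapping)
    return merged


def build_group_qdq_tensors(passes, collectors, merge=merge_by_update):
    """Run only the collectors whose pass is enabled and merge their results.

    passes:     names of the enabled passes, e.g. ["concat_elimination"]
    collectors: dict from pass name to a zero-argument callable that returns
                that pass's tensor-group mapping (hypothetical interface)
    merge:      callable combining a list of mappings into one mapping
    """
    mappings = [collect() for name, collect in collectors.items() if name in passes]
    return merge(mappings) if mappings else {}


# Hypothetical collectors standing in for the two grouping functions.
collectors = {
    "concat_elimination": lambda: {"a": "b"},
    "parallel_conv_fusion": lambda: {"c": "d"},
}

both = build_group_qdq_tensors(["concat_elimination", "parallel_conv_fusion"], collectors)
only_fusion = build_group_qdq_tensors(["parallel_conv_fusion"], collectors)
none_enabled = build_group_qdq_tensors([], collectors)
```

With a helper along these lines, fp8.py and int8.py would each reduce to a single call followed by assigning the result to trt_guided_options["group_qdq_tensors"] when it is non-empty.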

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/quantization/fp8.py` around lines 280 - 305, Extract the
duplicated QDQ tensor group merging logic into a shared helper (e.g.,
build_group_qdq_tensors or collect_qdq_group_tensors) in graph_utils.py that
accepts (onnx_model, nodes_to_quantize, passes, logger) and internally calls the
existing get_concat_eliminated_tensors and get_parallel_conv_fusion_tensors and
merge_qdq_tensor_groups logic to produce/return group_qdq_tensors; then replace
the block in fp8.py (the group_qdq_tensors creation and trt_guided_options
assignment) and the analogous block in int8.py to call this new helper and
assign trt_guided_options["group_qdq_tensors"] when non-empty so both modules
reuse the same implementation.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d711cfd1-551b-4439-83c7-a869a539ee06

📥 Commits

Reviewing files that changed from the base of the PR and between 16a0130 and 6680944.

📒 Files selected for processing (5)
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/fp8.py
  • modelopt/onnx/quantization/graph_utils.py
  • modelopt/onnx/quantization/int8.py
  • modelopt/onnx/quantization/quantize.py
