perf(dpa4): opt so3grid#5517
Pull request overview
This PR optimizes the SO(3) grid-net quadratic operations in the SeZM NN descriptor by moving channel-only linear projections from grid resolution back to coefficient resolution (where possible), reducing work proportional to the grid size while preserving equivariant behavior.
Changes:
- Introduced a shared `_project_frames()` helper to apply per-frame `ChannelLinear` projections directly on packed coefficient tensors.
- Refactored `GridMLP` and `GridBranch` to operate on coefficient operands and use injected `to_grid`/`from_grid` projectors only for the unavoidable point-wise grid product step.
- Replaced the implicit GLU "identity" op path with an explicit `GridProduct` module and removed `_apply_grid_op`, unifying the grid-op call interface.
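The reason the channel-only projections can move off the grid is that they commute with the grid projection: both are linear and act on different axes. A minimal pure-Python sketch of that identity follows. The matrix sizes, the random stand-in `to_grid` matrix, and the helper names are illustrative assumptions, not the PR's code, which works on packed tensors.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_coeff, n_grid, c_in, c_out = 4, 10, 3, 5  # hypothetical sizes, n_grid > n_coeff

to_grid = rand_matrix(n_grid, n_coeff, rng)  # stand-in projector, (n_grid, n_coeff)
weight = rand_matrix(c_in, c_out, rng)       # channel-only linear map, (c_in, c_out)
coeffs = rand_matrix(n_coeff, c_in, rng)     # packed coefficients, (n_coeff, c_in)

# Old order: project to the grid, then mix channels at every grid point.
grid_then_linear = matmul(matmul(to_grid, coeffs), weight)
# New order: mix channels on the coefficients, then project once.
linear_then_grid = matmul(to_grid, matmul(coeffs, weight))

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(grid_then_linear, linear_then_grid)
    for a, b in zip(row_a, row_b)
)

# Multiply-add counts for the channel mixing alone.
cost_on_grid = n_grid * c_in * c_out
cost_on_coeffs = n_coeff * c_in * c_out
```

Only the point-wise product is nonlinear, so it is the one step that has to stay at grid resolution; the saving on the linear steps scales with the ratio of grid points to coefficients.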
Walkthrough
This PR refactors grid-net computation in both PyTorch and DPModel implementations to operate at coefficient resolution. A new
Changes
- PyTorch Grid-Net Coefficient-Space Refactoring
- DPModel Grid-Net Port and Grid Operation Classes
- Documentation and Guard Clarifications
- Test Parity and Coverage Updates
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #5517 +/- ##
=======================================
Coverage 82.19% 82.19%
=======================================
Files 891 891
Lines 101599 101647 +48
Branches 4242 4240 -2
=======================================
+ Hits 83507 83552 +45
- Misses 16789 16792 +3
Partials 1303 1303
This PR passes its own CI but fails in the merge queue. The (run: https://github.com/deepmodeling/deepmd-kit/actions/runs/27460273106)

Root cause: stale base + a semantic merge conflict. This PR makes

deepmd-kit/source/tests/pt/model/test_dpa4_dpmodel_parity.py, lines 1629 to 1652 in 5d94bd6

Each side is fine alone, but the merge of this PR with current master (which already contains #5515) constructs the new required-

Fix: rebase onto current master and update the now-stale call sites to pass
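A semantic merge conflict of this kind produces no textual conflict: one side changes a signature, the other side adds call sites written against the old one. The sketch below shows the general pattern only. The class names and `new_required_arg` are hypothetical placeholders, since the actual argument is not named in the comment above.

```python
class GridOpBefore:
    """Hypothetical constructor as seen by call sites written on the old base."""

    def __init__(self, channels):
        self.channels = channels


class GridOpAfter:
    """Hypothetical constructor after one side adds a required keyword argument."""

    def __init__(self, channels, *, new_required_arg):
        self.channels = channels
        self.new_required_arg = new_required_arg


def stale_call_site(op_cls):
    # Written against the old signature; merges textually without conflict.
    return op_cls(8)


def updated_call_site(op_cls):
    # After rebasing: the call site passes the new required argument.
    return op_cls(8, new_required_arg=1)


stale_call_site(GridOpBefore)  # each side passes on its own

try:
    stale_call_site(GridOpAfter)  # the merged tree combines both changes
    merged_ok = True
except TypeError:
    merged_ok = False
```

Because git sees no overlapping hunks, only a CI run on the merged tree (here, the merge queue) catches the failure.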
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py`:
- Around line 692-695: The `op_type` parameter docstring in the GridNet class
(around lines 692-695) incorrectly states that `"mlp"` is not ported, but this
contradicts the actual implementation which supports all three operation types
including `"mlp"`. Update the docstring for the `op_type` parameter to remove
the note claiming `"mlp"` is not ported, and ensure it accurately reflects that
`"mlp"` is a supported option alongside `"glu"` and `"branch"`, consistent with
the module docstring, BaseGridNet implementation, and serialize/deserialize
logic.
📒 Files selected for processing (5)
- deepmd/dpmodel/descriptor/dpa4_nn/block.py
- deepmd/dpmodel/descriptor/dpa4_nn/ffn.py
- deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py
- deepmd/pt/model/descriptor/sezm_nn/grid_net.py
- source/tests/pt/model/test_dpa4_dpmodel_parity.py
💤 Files with no reviewable changes (1)
- source/tests/pt/model/test_dpa4_dpmodel_parity.py
✅ Files skipped from review due to trivial changes (2)
- deepmd/dpmodel/descriptor/dpa4_nn/block.py
- deepmd/dpmodel/descriptor/dpa4_nn/ffn.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/model/descriptor/sezm_nn/grid_net.py
🛑 Comments failed to post (1)
deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py (1)
692-695: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale docstring: `"mlp"` is now ported.

The docstring claims `"mlp"` is not ported, but this contradicts the module docstring (lines 15-17), the `BaseGridNet` implementation (lines 539-546), and the serialize/deserialize logic (lines 777, 844), which all support `op_type="mlp"`.

📝 Proposed fix

      op_type : str
    -     Point-wise grid operation; ``"glu"`` or ``"branch"`` (``"mlp"`` is
    -     not ported).
    +     Point-wise grid operation: ``"glu"``, ``"mlp"``, or ``"branch"``.
…modeling#5552)

Based on deepmodeling#5517 (`perf(dpa4): opt so3grid` by @OutisLi); this branch contains all of its commits plus one fix commit that addresses the CI failures on that PR.

### Problem

deepmodeling#5517 introduces a new parameter-free `GridProduct` `NativeOP` in `deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py` for the so3grid optimization, but it has no `serialize`/`deserialize` and is not registered via `register_dpmodel_mapping`. The pt_expt backend auto-wraps every dpmodel `NativeOP` sub-component through `_auto_wrap_native_op`, which requires the op to be serializable (or registered) to build its dynamic torch wrapper. Otherwise it raises:

```
TypeError: Cannot auto-wrap GridProduct: it must implement serialize()/deserialize() or be explicitly registered via register_dpmodel_mapping().
```

This broke **every `Test Python` shard** that loads a DPA4 pt_expt model (e.g. `source/tests/pt_expt/model/test_get_model_dpa4.py::TestGetModelDPA4::test_pair_exclude_types_from_descriptor`) on deepmodeling#5517.

### Fix

Add trivial `serialize`/`deserialize` to `GridProduct` (no state; mirrors the `GridMLP` `@class`/`@version` convention). `_auto_wrap_native_op` then passes its `hasattr(value, "serialize")` guard and returns `wrapped_cls.deserialize(value.serialize())` cleanly.

### Notes

- The sibling `GridMLP` (also new in deepmodeling#5517) already implements `serialize`/`deserialize`; only the parameter-free `GridProduct` was missing them.
- Verified by tracing the `_auto_wrap_native_op` code path (`deepmd/pt_expt/common.py:138-170`); the actual pt_expt DPA4 test runs in CI here.
- Once green, this can supersede deepmodeling#5517, or the single fix commit can be cherry-picked onto it.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

* **Refactor**
  * Restructured internal grid-net coefficient processing for improved efficiency.
  * Consolidated grid operation selection logic and refactored supporting utility functions.
* **Documentation**
  * Clarified docstrings for grid-path configuration options.
* **Tests**
  * Extended parity test coverage for grid operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: OutisLi <LTC201806070316@gmail.com>
Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
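The fix described above can be pictured with a small stand-alone sketch. It is not the deepmd-kit code: `auto_wrap` is a simplified stand-in for the guard behaviour quoted in the error message, the version number is an assumed value, and the operands are plain lists rather than tensors.

```python
class GridProduct:
    """Parameter-free stand-in: multiplies two grid operands point-wise."""

    def call(self, a, b):
        return [x * y for x, y in zip(a, b)]

    def serialize(self):
        # No weights to store; only the class tag and a version (assumed value).
        return {"@class": "GridProduct", "@version": 1}

    @classmethod
    def deserialize(cls, data):
        data = dict(data)  # do not mutate the caller's dict
        if data.pop("@class", "GridProduct") != "GridProduct":
            raise ValueError("serialized data is not a GridProduct")
        data.pop("@version", 1)
        return cls()


class BareOp:
    """Stand-in for an op that lacks serialize()/deserialize()."""


def auto_wrap(value):
    """Simplified guard: round-trip serializable ops, reject the rest."""
    if not hasattr(value, "serialize"):
        raise TypeError(
            f"Cannot auto-wrap {type(value).__name__}: "
            "it must implement serialize()/deserialize()"
        )
    return type(value).deserialize(value.serialize())


wrapped = auto_wrap(GridProduct())
product = wrapped.call([1.0, 2.0], [3.0, 4.0])

try:
    auto_wrap(BareOp())
    bare_rejected = False
except TypeError:
    bare_rejected = True
```

Even a stateless op needs the round-trip methods, because the guard checks for their presence rather than for the existence of parameters.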
… pt) (deepmodeling#5555)

## What

Completes the DPA4/SeZM **SO3 grid projection** port to the dpmodel backend so it faithfully **mirrors master's current pt** `sezm_nn/grid_net.py`. After this, the flagship `examples/water/dpa4/input.json` (which sets `ffn_so3_grid=true`, `message_node_so3=true`, `grid_mlp`) runs on dpmodel/pt_expt.

Builds **on top of** the S2-grid base that deepmodeling#5517/deepmodeling#5552 landed (`GridProduct`/`GridMLP`/`op_type='mlp'`, the `_project_frames` refactor). Master's dpmodel was the S2 (`n_frames==1`) slice with SO3/cross-mode fail-fast guarded; this PR generalizes those ops to frame-aware (`n_frames>1`) + cross-mode and adds the missing SO3 pieces, matching current pt exactly (single source of truth: dpmodel == pt). Supersedes deepmodeling#5547 (which ported the *pre*-deepmodeling#5552 design and went structurally stale).

## Changes (all mirror current pt)

- **`grid_net.py`**: add `_project_frames`; generalize `GridMLP`/`GridBranch` to frame-aware (`n_frames`); generalize `BaseGridNet` (un-guard `mode='cross'`, `layout='flat'`, `residual_scale_init`, `n_frames>1`; frame-axis to/from-grid via `xp.matmul`+reshape); add `FrameContract`/`FrameExpand`/`_build_frame_degree_index`; add `SO3GridNet` (self+cross).
- **`projection.py`**: add `SO3GridProjector` (Wigner-D quadrature) + `resolve_so3_grid`/`_build_so3_frame_set`.
- **`ffn.py`**: un-guard `ffn_so3_grid` → `SO3GridNet(mode='self')`.
- **`so2.py`**: un-guard `node_wise_{s2,so3}`/`message_node_{s2,so3}` → cross-mode grid products, applied in `call` + round-tripped in serialize.

## Validation

- Component parity vs pt (weight-copied fp64): `_project_frames`, `GridMLP`/`GridBranch` (incl. S2 byte-identical regression), `BaseGridNet` cross/flat/residual, `FrameContract`/`FrameExpand`, `SO3GridProjector` matrices, `SO3GridNet` self+cross (op_type glu/mlp/branch, kmax 1&2): all **1e-12**; rotation equivariance **1e-10**.
- **fp32** grid-path parity at the computation-in-fp32 budget (actual diffs 1e-6–1e-8 ≪ 1e-4).
- Full-descriptor pt→dpmodel via `DescrptDPA4.deserialize(pt.serialize())` on the example config (lmax=3, mmax=1): **~1e-14**, proving `dp convert-backend` schema interop.
- Permutation-invariance + masked-edge no-op.
- Cross-backend consistency rows (pt vs dpmodel **and pt_expt**, mixed_types) for ffn_so3_grid / message_node_so3 / both / grid_mlp.
- **Verified on remote GPU (Tesla T4):** 617 (grid+parity+pt_expt) + 50 (consistency) pass, no CUDA device errors. pt_expt forward works today via auto-wrap (consistency + descriptor trio green); no explicit registration needed.

## Known limitations

- pt_expt **training** Parameter-promotion for the new weight-bearing grid classes, `torch.export`/AOTI grid coverage, training e2e, argcheck `doc_only_pt_supported` removal, and freeze/DeepEval are a **follow-up PR**.
- `grid_method='e3nn'` (non-Lebedev product grid) stays fail-fast (Lebedev-only, per parent design).
- fp32 grid paths use a ~1e-4 budget by design; fp64 is the parity reference.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added SO(3) grid projection support and frame-aware grid networks for DPA4 descriptors, including SO(3)-based FFN and improved cross-mode grid-product wiring.
  * Extended grid modules to support multi-frame configurations with per-degree frame mixing, and added full SO(3) projector/network serialization.
* **Bug Fixes**
  * Enabled previously disabled/unsupported DPA4 SO(2) convolution cross-mode SO(3)/S2 grid products.
* **Documentation**
  * Updated DPA4 porting-layer documentation to clarify supported configuration flags.
* **Tests**
  * Added/expanded parity, equivariance, serialization/roundtrip, and torch-namespace compatibility tests for the new SO(3) and frame-aware paths.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
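The two tolerance budgets in the validation notes (1e-12 for weight-copied fp64, about 1e-4 for fp32) reflect how rounding accumulates in each precision. The toy sketch below illustrates that on a dot product rather than on the grid path itself. fp32 is emulated with the standard-library `struct` module, and the vector length and seed are arbitrary assumptions.

```python
import random
import struct


def to_fp32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_fp64(a, b):
    """Reference dot product in native fp64."""
    return sum(x * y for x, y in zip(a, b))


def dot_fp32(a, b):
    """Same dot product with inputs and every intermediate rounded to fp32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp32(to_fp32(x) * to_fp32(y)))
    return acc


rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(32)]
b = [rng.uniform(-1.0, 1.0) for _ in range(32)]

reference = dot_fp64(a, b)
# "Weight-copied" rerun: identical inputs, identical precision.
fp64_diff = abs(dot_fp64(list(a), list(b)) - reference)
# Same computation carried out in emulated fp32.
fp32_diff = abs(dot_fp32(a, b) - reference)

FP64_BUDGET = 1e-12
FP32_BUDGET = 1e-4
```

A parity test would then compare `fp64_diff` against `FP64_BUDGET` and `fp32_diff` against `FP32_BUDGET`, which mirrors the choice of fp64 as the reference and a looser fp32 budget.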
Summary by CodeRabbit
Release Notes
Refactor
- `to_grid`/`from_grid` callables and generalized grid-op handling for quadratic/product, polynomial MLP, and branch routing.
- `mlp` and `branch` operations.

Tests
- `mlp`, with updated forward wiring.

Documentation