feat(dpmodel): NeighborGraph 3-body angle machinery (PR-E)#5717
feat(dpmodel): NeighborGraph 3-body angle machinery (PR-E)#5717wanghan-iapcm wants to merge 18 commits into
Conversation
…ftmax Built on the existing xp_maximum_at (no new array_api helper needed). Part of NeighborGraph PR-D (graph-native attention).
Segment-based (global (E,E) boolean deliberately avoided): compact eager form for carry-all graphs + shape-static nonzero-free form for the center-major static layout (jit/export/make_fx traceable). Part of NeighborGraph PR-D; PR-E angles reuse (unordered, no-self).
…r > 0) DescrptBlockSeAtten.call_graph grows _graph_attention: the dense per-center (nnei, nnei) attention square becomes the edge-pair axis (center_edge_pairs, ordered + self-included), softmax over keys becomes segment_softmax grouped by the query edge. Op-for-op mirror of GatedAttentionLayer.call (head_dim QKV slicing, normalize q/k/v, temperature/scaling, smooth shift trick, post-softmax sw and dotr weighting, residual + LayerNorm per layer). - shape-static adapter path (static_nnei threaded from the dense call adapter): bit-exact vs the dense body, rtol 1e-12, full flag matrix (attn_layer 1/2 x dotr x smooth x normalize x temperature, binding and non-binding sel). - carry-all (compact) graphs: exact for non-smooth; for smooth the dense branch keeps sel-padding slots in the softmax denominator (dense output is sel-DEPENDENT, up to ~1e-4) — the carry-all form drops those phantom terms by design (user decision 2026-07-03), pinned by a clean-divergence test. - edge_env_mat(return_sw=True) exposes the per-edge switch (zeroed on padding) for the smooth branch. - uses_graph_lower: attention configs are now graph-eligible (concat tebd, no exclude_types still required).
…ial parity
- test_make_fx_graph_attn: graph forward + autograd.grad at attn_layer=2
traces under make_fx for BOTH smooth branches (the shape-static
center_edge_pairs form is nonzero-free) — required since pt_expt compiled
training routes eligible models through the graph lower.
- model-level graph-vs-legacy lower parity now parametrized over
attn_layer {0, 2} (energy/force/virial/atom_virial, 1e-12 CPU).
- eligibility pins: attention+concat is graph-eligible; se_atten_v2
(tebd_input_mode='strip') correctly stays dense (strip = later PR;
the plan's 'se_atten_v2 inherits for free' did not hold).
- linear-model weight tests: pin smooth_type_embedding=False — the standard (graph-routed, carry-all) and linear (graph-ineligible, dense) submodels otherwise differ by the accepted smooth-attention denominator divergence (~1e-6), which is a route artifact, not a weight-combination bug. - new binding-sel sanity: carry-all graph attention diverges from the sel-truncated dense path when sel binds (spec decision deepmodeling#17).
…rity) neighbor_list=None now takes the carry-all graph default for eligible attention models; explicit World-1 builders take the legacy dense route. With smooth attention the two routes differ by design (PR-D), so the route-equivalence tests pin smooth_type_embedding=False.
…-no-self coverage
…ntiparallel test name
for more information, see https://pre-commit.ci
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR expands NeighborGraph utilities for angle and pair construction, exposes smooth-switch data from edge environment matrices, and extends DPA1 graph-native execution to support transformer attention with ChangesGraph-native DPA1 attention support
Estimated code review effort: 4 (Complex) | ~75 minutes Sequence Diagram(s)sequenceDiagram
participant DescrptDPA1
participant DescrptBlockSeAtten
participant edge_env_mat
participant center_edge_pairs
participant _graph_attention_one_layer
DescrptDPA1->>DescrptBlockSeAtten: call_graph(static_nnei)
DescrptBlockSeAtten->>edge_env_mat: return_sw=True
edge_env_mat-->>DescrptBlockSeAtten: rr, sw_e
DescrptBlockSeAtten->>center_edge_pairs: static_nnei
center_edge_pairs-->>DescrptBlockSeAtten: query_edge, key_edge, pair_mask
DescrptBlockSeAtten->>_graph_attention_one_layer: attention inputs
_graph_attention_one_layer-->>DescrptBlockSeAtten: attended edge embeddings
DescrptBlockSeAtten-->>DescrptDPA1: output, rotation matrix
Possibly related PRs
Suggested labels: Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| ng = build_neighbor_graph(coord, atype, None, 2.0) | ||
| ng_default = attach_angles(ng, a_rcut=1.5) | ||
| ng_full = attach_angles(ng, a_rcut=1.5, ordered=True, include_self=True) | ||
| ai_def = np.asarray(ng_default.angle_index) |
| ng_full = attach_angles(ng, a_rcut=1.5, ordered=True, include_self=True) | ||
| ai_def = np.asarray(ng_default.angle_index) | ||
| am_def = np.asarray(ng_default.angle_mask) | ||
| ai_full = np.asarray(ng_full.angle_index) |
| am = np.asarray(ng.angle_mask) | ||
| ai = np.asarray(ng.angle_index) | ||
| ei = np.asarray(ng.edge_index) # (2, E): [src, dst] | ||
| ev = np.asarray(ng.edge_vec) |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
source/tests/common/dpmodel/test_angle_builder.py (1)
326-364: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a padded-
angle_indexcase to the aggregation tests.Both aggregation tests here only use angle data where every slot is real (no padding). Given the padding-mask precondition gap noted in
angles.py(angle_to_edge_sum/angle_to_node_sum), a test with a paddedangle_indexand non-zero paddingdatavalues would catch regressions if that precondition is ever violated.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/common/dpmodel/test_angle_builder.py` around lines 326 - 364, The aggregation tests in test_angle_builder only cover fully real angle slots, so add a new padded-angle case for angle_to_edge_sum and angle_to_node_sum using a padded angle_index with non-zero data in the padded positions. Reuse the existing test_angle_aggregation and test_angle_aggregation_torch_namespace patterns, but assert that padding is ignored in both the numpy and torch-namespace paths so regressions around the padding-mask precondition are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/dpmodel/utils/neighbor_graph/segment.py`:
- Around line 59-89: The masked softmax in segment_softmax can overflow because
shifted is computed from raw data instead of data_for_max, allowing masked
entries to produce inf and then nan when multiplied by the zero mask. Update
segment_softmax to base the subtraction on data_for_max so masked elements
remain -inf before exp, keeping segment_sum and the denom_e guard from being
poisoned; use the existing segment_max, segment_sum, and mask handling paths to
locate the fix.
---
Nitpick comments:
In `@source/tests/common/dpmodel/test_angle_builder.py`:
- Around line 326-364: The aggregation tests in test_angle_builder only cover
fully real angle slots, so add a new padded-angle case for angle_to_edge_sum and
angle_to_node_sum using a padded angle_index with non-zero data in the padded
positions. Reuse the existing test_angle_aggregation and
test_angle_aggregation_torch_namespace patterns, but assert that padding is
ignored in both the numpy and torch-namespace paths so regressions around the
padding-mask precondition are caught.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b038075f-6809-4579-8ce1-5cf5d1d41c90
📒 Files selected for processing (17)
deepmd/dpmodel/descriptor/dpa1.pydeepmd/dpmodel/utils/neighbor_graph/__init__.pydeepmd/dpmodel/utils/neighbor_graph/angles.pydeepmd/dpmodel/utils/neighbor_graph/env.pydeepmd/dpmodel/utils/neighbor_graph/graph.pydeepmd/dpmodel/utils/neighbor_graph/pairs.pydeepmd/dpmodel/utils/neighbor_graph/segment.pysource/tests/common/dpmodel/test_angle_builder.pysource/tests/common/dpmodel/test_center_edge_pairs.pysource/tests/common/dpmodel/test_dpa1_call_graph_block.pysource/tests/common/dpmodel/test_dpa1_graph_attention_parity.pysource/tests/common/dpmodel/test_graph_angle_cos_parity.pysource/tests/common/dpmodel/test_segment_softmax.pysource/tests/pt_expt/descriptor/test_dpa1.pysource/tests/pt_expt/model/test_dpa1_graph_lower.pysource/tests/pt_expt/model/test_linear_model.pysource/tests/pt_expt/utils/test_neighbor_list.py
| def segment_softmax( | ||
| data: Array, | ||
| segment_ids: Array, | ||
| num_segments: int, | ||
| mask: Array | None = None, | ||
| ) -> Array: | ||
| """Softmax over entries sharing a segment id, numerically stable. | ||
|
|
||
| Mirrors the dense ``np_softmax`` max-subtraction trick with a PER-SEGMENT | ||
| max. ``mask`` (bool, per entry) removes masked entries from the softmax | ||
| entirely (zero weight AND excluded from the denominator). Empty or | ||
| fully-masked segments produce all-zero weights (no NaN). | ||
| """ | ||
| xp = array_api_compat.array_namespace(data) | ||
| if mask is not None: | ||
| # keep masked entries out of the per-segment max: send them to -inf | ||
| neg = xp.full_like(data, -xp.inf) | ||
| data_for_max = xp.where(mask, data, neg) | ||
| else: | ||
| data_for_max = data | ||
| seg_max = segment_max(data_for_max, segment_ids, num_segments) | ||
| # guard -inf (empty / fully-masked segments) so gather doesn't yield inf-inf | ||
| seg_max = xp.where(xp.isinf(seg_max), xp.zeros_like(seg_max), seg_max) | ||
| shifted = data - xp.take(seg_max, segment_ids, axis=0) | ||
| ex = xp.exp(shifted) | ||
| if mask is not None: | ||
| ex = ex * xp.astype(mask, ex.dtype) | ||
| denom = segment_sum(ex, segment_ids, num_segments) | ||
| denom_e = xp.take(denom, segment_ids, axis=0) | ||
| safe = xp.where(denom_e > 0, denom_e, xp.ones_like(denom_e)) | ||
| return ex / safe |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Masked-entry overflow can NaN-poison the whole segment.
shifted is computed from the raw data (line 82), not data_for_max. If a masked entry's data value is much larger than the segment's (unmasked) max, exp(shifted) can overflow to inf; multiplying by the zero mask (line 85) then yields inf * 0 = nan. That NaN propagates through segment_sum (line 86) and poisons the softmax denominator for the entire segment, not just the masked slot — the denom_e > 0 guard doesn't catch NaN (nan > 0 is False), so safe falls back to 1, silently returning an unnormalized result for every entry sharing that segment. This contradicts the docstring's "no NaN" guarantee for masked/partially-masked segments.
Using data_for_max (already -inf on masked entries) for the subtraction avoids ever computing exp(inf).
🐛 Proposed fix
seg_max = segment_max(data_for_max, segment_ids, num_segments)
# guard -inf (empty / fully-masked segments) so gather doesn't yield inf-inf
seg_max = xp.where(xp.isinf(seg_max), xp.zeros_like(seg_max), seg_max)
- shifted = data - xp.take(seg_max, segment_ids, axis=0)
+ shifted = data_for_max - xp.take(seg_max, segment_ids, axis=0)
ex = xp.exp(shifted)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def segment_softmax( | |
| data: Array, | |
| segment_ids: Array, | |
| num_segments: int, | |
| mask: Array | None = None, | |
| ) -> Array: | |
| """Softmax over entries sharing a segment id, numerically stable. | |
| Mirrors the dense ``np_softmax`` max-subtraction trick with a PER-SEGMENT | |
| max. ``mask`` (bool, per entry) removes masked entries from the softmax | |
| entirely (zero weight AND excluded from the denominator). Empty or | |
| fully-masked segments produce all-zero weights (no NaN). | |
| """ | |
| xp = array_api_compat.array_namespace(data) | |
| if mask is not None: | |
| # keep masked entries out of the per-segment max: send them to -inf | |
| neg = xp.full_like(data, -xp.inf) | |
| data_for_max = xp.where(mask, data, neg) | |
| else: | |
| data_for_max = data | |
| seg_max = segment_max(data_for_max, segment_ids, num_segments) | |
| # guard -inf (empty / fully-masked segments) so gather doesn't yield inf-inf | |
| seg_max = xp.where(xp.isinf(seg_max), xp.zeros_like(seg_max), seg_max) | |
| shifted = data - xp.take(seg_max, segment_ids, axis=0) | |
| ex = xp.exp(shifted) | |
| if mask is not None: | |
| ex = ex * xp.astype(mask, ex.dtype) | |
| denom = segment_sum(ex, segment_ids, num_segments) | |
| denom_e = xp.take(denom, segment_ids, axis=0) | |
| safe = xp.where(denom_e > 0, denom_e, xp.ones_like(denom_e)) | |
| return ex / safe | |
| def segment_softmax( | |
| data: Array, | |
| segment_ids: Array, | |
| num_segments: int, | |
| mask: Array | None = None, | |
| ) -> Array: | |
| """Softmax over entries sharing a segment id, numerically stable. | |
| Mirrors the dense ``np_softmax`` max-subtraction trick with a PER-SEGMENT | |
| max. ``mask`` (bool, per entry) removes masked entries from the softmax | |
| entirely (zero weight AND excluded from the denominator). Empty or | |
| fully-masked segments produce all-zero weights (no NaN). | |
| """ | |
| xp = array_api_compat.array_namespace(data) | |
| if mask is not None: | |
| # keep masked entries out of the per-segment max: send them to -inf | |
| neg = xp.full_like(data, -xp.inf) | |
| data_for_max = xp.where(mask, data, neg) | |
| else: | |
| data_for_max = data | |
| seg_max = segment_max(data_for_max, segment_ids, num_segments) | |
| # guard -inf (empty / fully-masked segments) so gather doesn't yield inf-inf | |
| seg_max = xp.where(xp.isinf(seg_max), xp.zeros_like(seg_max), seg_max) | |
| shifted = data_for_max - xp.take(seg_max, segment_ids, axis=0) | |
| ex = xp.exp(shifted) | |
| if mask is not None: | |
| ex = ex * xp.astype(mask, ex.dtype) | |
| denom = segment_sum(ex, segment_ids, num_segments) | |
| denom_e = xp.take(denom, segment_ids, axis=0) | |
| safe = xp.where(denom_e > 0, denom_e, xp.ones_like(denom_e)) | |
| return ex / safe |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/dpmodel/utils/neighbor_graph/segment.py` around lines 59 - 89, The
masked softmax in segment_softmax can overflow because shifted is computed from
raw data instead of data_for_max, allowing masked entries to produce inf and
then nan when multiplied by the zero mask. Update segment_softmax to base the
subtraction on data_for_max so masked elements remain -inf before exp, keeping
segment_sum and the denom_e guard from being poisoned; use the existing
segment_max, segment_sum, and mask handling paths to locate the fix.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5717 +/- ##
==========================================
- Coverage 81.26% 81.18% -0.09%
==========================================
Files 988 990 +2
Lines 110876 111079 +203
Branches 4234 4233 -1
==========================================
+ Hits 90103 90175 +72
- Misses 19247 19375 +128
- Partials 1526 1529 +3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
test_plugin popped deepmd.pt_expt from sys.modules without restoring it,
leaving the package's cached submodules bound to a dead parent. Any later
import of a cached submodule (e.g. deepmd.pt_expt.infer.deep_eval)
re-created a BARE parent whose utils/infer attributes were never rebound,
and mock.patch('deepmd.pt_expt.utils...') in
test_deep_eval_serialize_api failed with AttributeError under py3.10's
mock target resolution. Shard-order dependent: this PR's new test files
reshuffled CI shard 4 and exposed the latent master-side hygiene bug.
Snapshot the whole deepmd.pt_expt module tree before the re-import and
restore it (including the parent-package attribute) afterwards.
Summary
NeighborGraph PR-E: the optional 3-body angle extension of the edge-graph neighbor-list contract (design discussion wanghan-iapcm#4). An angle is a pair of edges sharing a center (
dst(edge_a) == dst(edge_b)), stored asangle_index (2, A)into[0, E)+angle_mask (A,)—edge_vecstays the ONLY geometry leaf, so force/virial assembly is untouched (proven by an invariance test).What's added (all in
deepmd/dpmodel/utils/neighbor_graph/)pad_and_guard_angles(graph.py) — angle-axis padder mirroringpad_and_guard_edges(dynamic guard append / staticangle_capacitywith overflow ValueError).angles.py(new):build_angle_index(edge_index, edge_vec, edge_mask, n_total, a_rcut, *, ordered=False, include_self=False, layout=None)— unordered, no-self pairs of edges sharing a center where BOTH edges are withina_rcut; built on PR-D'scenter_edge_pairs;pair_maskfolded intoangle_mask(never discarded).attach_angles(graph, a_rcut, ...)— post-hoc: edge graph in, graph with angle fields out (dataclasses.replace); default builders keep anglesNone.angle_to_edge_sum/angle_to_node_sum— segment-sum aggregation to the query edge / shared center.graph_angle_cos(angle_index, edge_vec, eps=1e-6)— per-angle cos θ mirroring dpa3 repflowscosine_ijeps placement exactly (+epsin norm denominators,*(1-eps)on the product).angle_padding_fraction(graph)— mask-derived padding-waste report for static capacities.Semantics / decisions
a_sel x a_selsquare including thej==kdiagonal; the graph set keeps one entry per unordered{j,k}pair and moves the degenerate diagonal to the (a_rcut-filtered) edge channel. Dense parity is therefore asserted against the OFF-DIAGONALcosine_ij[j,k](j != k) at rtol/atol 1e-12 (same-math fp64), at non-bindinga_sel. The ordered+self full square stays available via flags.a_sel= normalization-only (carry-all withina_rcut), consistent with the edge-seldecision.sw == 1regime (rtol 1e-12).Tests
source/tests/common/dpmodel/test_angle_builder.py(21) +test_graph_angle_cos_parity.py(6): brute-force triplet oracle (all flag combinations, multi-center, static layout,node_capacitybranch), dpa3 dense-parity + no-self-angle assertion, se_t coordinate oracle, force/virial bit-exact invariance with/without angles, padding-fraction (incl.total==0), torch-namespace smoke tests for every new function. Full neighbor-graph suite: 54 passed.Known limitations
mixed_types=False), used as oracle only.nonzeroincenter_edge_pairs) even when a staticlayoutis passed —angle_capacityfixes the output shape only; shape-static enumeration for export is deferred to PR-G.angle_maskbefore summing (padding angles point at edge 0).A ~ sum(deg^2)capacity overhead mitigated bya_rcut < rcutand reported byangle_padding_fraction.Summary by CodeRabbit
New Features
maxandsoftmax).Bug Fixes
Tests