feat(pt_expt): dpa1(attn_layer=0) graph-native NeighborGraph forward #5583
Merged
wanghan-iapcm merged 69 commits into … on Jun 29, 2026
added 26 commits
June 25, 2026 17:26
…_graph The dense path masks excluded type pairs; the graph path does not yet, so raise NotImplementedError instead of silently diverging.
…pter (attn_layer=0)
…k kernel; sw to dense adapter
…ph_method (Option B)
…gy_deriv) + parity
…d/pt_expt/model/)
serialize roundtrip + dpmodel->pt_expt interop on the attn_layer=0 graph path are already covered by test_dpa1.py::test_consistency (lines 86-113), which routes through the graph forward via the Task-3 dense-call adapter.
…all back to dense Task 3's adapter routed ALL attn_layer==0 through the graph, but the graph only supports tebd_input_mode='concat', no exclude_types, and needs mapping for ghosts. strip-mode / exclude / mapping-None-with-ghosts attn_layer=0 models raised/IndexError'd. uses_graph_lower() now encodes full eligibility and ineligible configs fall back to the legacy dense body unchanged. Fixes test_compressed_forward (attn_layer=0 strip).
…ping-ghosts) dense fallback
…pt graph mask key; legacy opt-out in Option-B test
- _resolve_graph_method/_call_common_graph use getattr(atomic_model,'descriptor',None) so Linear/ZBL models (no descriptor) fall back to dense instead of AttributeError
- pt_expt _call_common_graph override adds the all-ones mask key for dense parity
- test_dpa1_graph_model_energy dense refs use neighbor_graph_method='legacy' to opt out of the now-default carry-all graph (decision deepmodeling#17 default-flip)
…nse default dpmodel/jax compute force/virial analytically inside call_common (energy_derv_r); the energy-only graph lower drops it -> KeyError when force is requested. Only pt_expt has the autograd graph force/virial path, so only pt_expt defaults eligible models to the graph. dpmodel base _resolve_graph_method no longer auto-routes; pt_expt overrides it to re-enable AUTO.
…nse call jit/export-traceable (decision deepmodeling#16)
…x int-sum, Array typing)
- swap dangling memory/spec_unified_edge_nlist.md refs -> public design discussion (#4) so the references resolve
- edge_force_virial: short-circuit n_out=int(node_capacity) when supplied so the static jax/export path never calls int() on a traced sum(n_node)
- derivatives.py: move Array import under TYPE_CHECKING (+ from __future__ import annotations) for subpackage uniformity
for more information, see https://pre-commit.ci
…ng product, not -1) The leading-dim-agnostic refactor used xp.reshape(out, (*lead, -1)), but numpy cannot infer -1 for a size-0 array (a zero-atom forward, nloc==0). Restore the explicit trailing product math.prod(out.shape[len(lead):]). Fixes 101 universal model test_zero_forward failures across all descriptors (SeA/DPA3/...); dense path now byte-unchanged. Re-add the 'import math' dropped in the Task-5 refactor.
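The failure mode in this commit message can be reproduced in isolation. The sketch below assumes plain NumPy in place of the array-API namespace `xp`, and `flatten_trailing` is a hypothetical helper name rather than a function from the PR. It shows why `-1` cannot be inferred for a size-0 array and how the explicit trailing product avoids the problem.

```python
import math

import numpy as np


def flatten_trailing(out: np.ndarray, n_lead: int) -> np.ndarray:
    """Keep the first n_lead axes and merge all remaining axes into one.

    Hypothetical helper for illustration only.
    """
    lead = out.shape[:n_lead]
    # Explicit product instead of -1: with zero elements, any trailing size
    # would satisfy the reshape, so numpy refuses to infer -1.
    trailing = math.prod(out.shape[n_lead:])
    return np.reshape(out, (*lead, trailing))


# A zero-atom forward: nf=1, nloc=0, with a 3x4 block per atom.
empty = np.zeros((1, 0, 3, 4))

try:
    np.reshape(empty, (1, 0, -1))
    inferred_ok = True
except ValueError:
    inferred_ok = False  # numpy cannot infer -1 for a size-0 array

flat = flatten_trailing(empty, 2)
```

The same helper behaves identically to the `-1` form for non-empty inputs, so the dense path is unaffected.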
for more information, see https://pre-commit.ci
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/dpmodel/utils/neighbor_graph/graph.py`:
- Around line 122-127: The helper frame_id_from_n_node still converts n_node to
a Python int via int(xp.sum(n_node)), which breaks tracing for symbolic inputs.
Update frame_id_from_n_node to avoid deriving n_total from a runtime sum and
instead accept a static node count/capacity, following the same export-safe
pattern used by node_validity_mask. Keep the rest of the boundary/searchsorted
logic in place, but ensure all shape-related values come from static inputs so
the function remains trace-friendly.
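One way the suggestion above could look is sketched here with NumPy. This is not the code in `graph.py`: `frame_id_sketch` is a hypothetical name, and clamping padded nodes to the last frame follows the behavior described in a later commit message of this PR. The point is that the output length comes from a static `n_total`, never from `int(sum(n_node))`.

```python
import numpy as np


def frame_id_sketch(n_node: np.ndarray, n_total: int) -> np.ndarray:
    """Map each flat node index to its frame index.

    n_node  : (nf,) number of real nodes per frame
    n_total : static length of the (possibly padded) flat node axis
    """
    nf = n_node.shape[0]
    # Right edges of each frame's node range: [n0, n0+n1, ...].
    boundaries = np.cumsum(n_node)
    node = np.arange(n_total)  # length is static, independent of n_node values
    # side="right" sends a node sitting exactly on a boundary to the next frame.
    fid = np.searchsorted(boundaries, node, side="right")
    # Padding nodes land past the last boundary; clamp them into range.
    return np.minimum(fid, nf - 1)


exact = frame_id_sketch(np.array([2, 3]), 5)
padded = frame_id_sketch(np.array([2, 3]), 7)
```

With the clamp, every entry is a valid segment id, which keeps a padded node axis usable in a later `segment_sum`.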
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ada10980-d2ac-4e52-881e-6ae6632bb748
📒 Files selected for processing (23)
- deepmd/dpmodel/atomic_model/base_atomic_model.py
- deepmd/dpmodel/atomic_model/dp_atomic_model.py
- deepmd/dpmodel/atomic_model/polar_atomic_model.py
- deepmd/dpmodel/descriptor/dpa1.py
- deepmd/dpmodel/fitting/general_fitting.py
- deepmd/dpmodel/model/make_model.py
- deepmd/dpmodel/model/transform_output.py
- deepmd/dpmodel/utils/exclude_mask.py
- deepmd/dpmodel/utils/neighbor_graph/__init__.py
- deepmd/dpmodel/utils/neighbor_graph/graph.py
- deepmd/pt_expt/model/edge_transform_output.py
- deepmd/pt_expt/model/make_model.py
- source/tests/common/dpmodel/case_single_frame_with_nlist.py
- source/tests/common/dpmodel/test_dpa1_call_graph_descriptor.py
- source/tests/common/dpmodel/test_edge_env_mat.py
- source/tests/common/dpmodel/test_fitting_call_graph.py
- source/tests/common/dpmodel/test_graph_atomic_parity.py
- source/tests/common/dpmodel/test_graph_ragged.py
- source/tests/common/test_mixins.py
- source/tests/pd/model/test_env_mat.py
- source/tests/pt_expt/model/test_dpa1_graph_lower.py
- source/tests/pt_expt/model/test_graph_ragged.py
- source/tests/universal/common/cases/cases.py
🚧 Files skipped from review as they are similar to previous changes (3)
- deepmd/dpmodel/utils/neighbor_graph/__init__.py
- deepmd/pt_expt/model/edge_transform_output.py
- deepmd/dpmodel/descriptor/dpa1.py
added 6 commits
June 28, 2026 01:47
…ude non-vacuity; unify polar eye tiling
- forward_atomic_graph fparam-by-frame_id dispatch now UTed (graph==dense 1e-12 + per-frame fparam differs) [review #2]
- pair-exclude non-vacuity toggles pair_excl on the SAME model weights (isolates exclusion from weights) [review #1]
- polar apply_out_stat eye tiling unified to xp.tile(eye, (*atype.shape,1,1)) (drops the ndim==2 if/else) [review #3]
…ph (align with .call) [review deepmodeling#5]
…; move dpmodel transform to edge_transform_output.py [review deepmodeling#8,deepmodeling#10]
- fit_output_to_model_output_graph now takes the NeighborGraph instead of n_node (dpmodel) / edge_vec+edge_index+edge_mask+n_node (pt_expt); the pt_expt autograd leaf is graph.edge_vec. Unifies the two signatures.
- dpmodel fit_output_to_model_output_graph moved transform_output.py -> new edge_transform_output.py (mirrors the pt_expt file layout).
- tighten pair-exclude non-vacuity tolerance (1e-9; the (0,1) effect is ~2e-6).
…per + alias) [review deepmodeling#6,deepmodeling#7,deepmodeling#9]
Mirror the dense lower structure for the graph path:
- NEW model-level forward_common_atomic_graph (builds NeighborGraph + atomic forward_common_atomic_graph + flat-N output transform) -- analogue of the dense forward_common_atomic; the graph build is no longer inlined in the lower [deepmodeling#6].
- call_lower_graph -> public call_common_lower_graph WITH _input/_output_type_cast (edge_vec is the geometry in place of coord), making it a directly-callable PRIMARY interface per spec decision deepmodeling#14 [deepmodeling#7].
- call_lower_graph = call_common_lower_graph alias (mirrors call_lower = call_common_lower) [deepmodeling#9].
The TestCompiledVaryingNatoms dpa1(attn_layer=0) case failed: the uncompiled reference uses the pt_expt carry-all GRAPH forward (default-flip deepmodeling#17) while the compiled forward_lower uses the sel-capped DENSE forward. Those are two different force computations -- even at non-binding sel the forward matches to ~1e-16 but their backward gradients agree only to fp64 accumulation (~1e-12), which the optimizer amplifies into a diverging training trajectory (weight drift ~1e-3 after one step). It is NOT sel-binding and NOT a torch.compile dynamic-shape bug. Pin BOTH sides to the legacy dense env-mat path via force_legacy_descriptor=True (monkeypatch descriptor.uses_graph_lower -> False, killing both the default-flip and the _call_graph_adapter), so this stays a true compile-correctness check on the path it actually compiles. Compiling the GRAPH lower so eager==compiled is tracked for PR-B.
…tion
Add the missing Parameters/Returns sections (and fill incomplete ones) on the NeighborGraph / graph-lower functions so they match the package numpydoc style:
- dpa1: _call_graph_adapter, _call_dense (Parameters+Returns)
- general_fitting.call_graph: add missing g2, h2 params
- neighbor_graph: pad_and_guard_edges, node_validity_mask (Parameters+Returns); from_dense_quartet, build_neighbor_graph_ase (Returns); edge_force_virial (add g_e/edge_vec/edge_index/edge_mask params)
- dpmodel/pt_expt make_model: _resolve_graph_method, _call_common_graph (Parameters+Returns); call_common_lower_graph (replace "Parameters mirror ..." cross-ref with an explicit Parameters section)
- pt_expt edge_transform_output: edge_energy_deriv (Parameters+Returns); fit_output_to_model_output_graph (Returns)

Docstring-only; no behavior change.
iProzd requested changes on Jun 28, 2026
- call_common: an explicit `neighbor_list` (a dense-nlist strategy) is no longer silently ignored by the graph default. Raise on `neighbor_list` + explicit `neighbor_graph_method`; otherwise honor the nlist by taking the dense route.
- frame_id_from_n_node: accept an optional static `n_total` (jax/export trace-friendly, avoids `int(sum(n_node))`); clamp padding nodes to the last frame so a padded node axis stays in range for segment_sum.
- thread `charge_spin` (accept-for-ABI-stability, like comm_dict/n_local) through the graph interface: forward_atomic_graph, forward_common_atomic_graph, call_common_lower_graph, forward_common_lower_graph.
- docs: list neighbor_graph_method options one per line incl. "legacy", clarify "dense"/"ase" are carry-all GRAPH builders (not the dense nlist lower); contrast from_dense_quartet (legacy-quartet adapter, keeps sel truncation) vs the carry-all builders.

Tests: neighbor_list conflict-raise + dense-route fallback; frame_id static n_total (exact + padded).
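The routing rules stated across this PR (the `neighbor_list` conflict, `"legacy"` forcing dense, `"dense"`/`"ase"` forcing the graph, and only `pt_expt` auto-routing eligible models) can be collected into a small decision function. This is a sketch under assumptions: `choose_lower`, its arguments, and the use of `ValueError` are hypothetical and do not mirror the signature of `_resolve_graph_method`; how a forced graph method behaves on an ineligible model is not modeled.

```python
from typing import Optional


def choose_lower(
    backend: str,
    eligible: bool,
    neighbor_graph_method: Optional[str] = None,
    neighbor_list: Optional[object] = None,
) -> str:
    """Return "graph" or "dense" for one forward call (illustrative only).

    eligible stands for: dpa1 with attn_layer == 0, concat tebd,
    no exclude_types, and an atomic model that has a descriptor.
    """
    if neighbor_list is not None:
        if neighbor_graph_method is not None:
            # Assumed exception type; the PR only says this case raises.
            raise ValueError("neighbor_list conflicts with neighbor_graph_method")
        return "dense"  # honor the explicit dense neighbor-list strategy
    if neighbor_graph_method == "legacy":
        return "dense"
    if neighbor_graph_method in ("dense", "ase"):
        return "graph"  # both names select carry-all GRAPH builders
    if neighbor_graph_method is not None:
        raise ValueError(f"unknown neighbor_graph_method: {neighbor_graph_method!r}")
    # Auto: only pt_expt has the autograd force/virial path on the graph.
    return "graph" if backend == "pt_expt" and eligible else "dense"


default_pt_expt = choose_lower("pt_expt", eligible=True)
default_dpmodel = choose_lower("dpmodel", eligible=True)
```

Keeping the decision in one place makes the default-flip easy to opt out of in tests, which is what the `neighbor_graph_method='legacy'` references in the commit messages rely on.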
dpa1 does not consume charge_spin (get_dim_chg_spin()==0; the dense atomic model passes None to the descriptor since add_chg_spin_ebd is False). charge_spin is accepted on the graph lower only for ABI stability with charge/spin-conditioned descriptors (dpa3/dpa4, PR-G). Pin that the dpa1 graph lower output is INVARIANT to charge_spin:
- dpmodel call_common_lower_graph: energy/atom_energy/mask unchanged.
- pt_expt forward_common_lower_graph: energy/force/virial/atom_virial unchanged.

With the existing graph==dense parity at non-binding sel this gives the full claim graph(charge_spin) == graph(None) == dense. Guards against a future regression where charge_spin leaks into the dpa1 graph path.
for more information, see https://pre-commit.ci
OutisLi approved these changes on Jun 28, 2026
CodeQL flagged the unused local `N = nf * nloc`; fold it into the comment.
iProzd approved these changes on Jun 28, 2026
wccc-phys pushed a commit to wccc-phys/deepmd-kit that referenced this pull request on Jul 2, 2026
…C++ inference single & multi-rank (NeighborGraph PR-B) (deepmodeling#5604)

## NeighborGraph PR-B: graph `.pt2` export, compiled training, and C++ inference (single & multi-rank)

This PR spans the full PR-B: **B1** (Python: graph `.pt2` export + compiled training on the graph lower), **B2** (C++ single-rank inference of the graph `.pt2`, dynamic edge axis), and **B3** (C++/LAMMPS multi-rank). Built on the merged PR-A (deepmodeling#5583). Scope: dpa1, `attn_layer=0`, pt_expt.

### B1: graph `.pt2` export + compiled training (Python)

- `forward_common_lower_graph_exportable` trace target; `serialization.py` graph export branch (`lower_kind="graph"`, `lower_input_kind` metadata); `_eval_model_graph` DeepEval dispatch (parity vs eager dpa1 **1e-10 pbc+nopbc**).
- **Compiled training retargeted to the graph lower so eager == compiled** (the MUST-FIX) → `force_legacy_descriptor` deleted. Root cause was a real dpa1 `call_graph` autograd **detach** bug (`xp.asarray(tebd, device=)` drops the tebd-net gradient under torch); fixed.

### B2: C++ graph ingestion (dynamic edge axis, single-rank)

- Graph `.pt2` uses a **dynamic edge axis** (`Dim("nedge", min=2)`): one artifact evals any system size (proven across 56- and 380-edge systems at 1e-10), no C++ capacity ceiling.
- C++ `DeepPotPTExpt`: `lower_input_is_graph_` + `run_model_graph` (NeighborGraph ABI: `atype, n_node, edge_index, edge_vec, edge_mask, …`) + `buildGraphTensors` (mirrors the deepmodeling#5562 edge path; node types from `atype_ext`); `remap_graph_outputs_to_dense_keys` (single-rank).
- gtest: 5 cases × {double,float} = 10/10 (build-nlist parity, dynamic-E 2nd size, `ago>0`, tiny system, atomic-overload). The review process caught two bugs that would otherwise have shipped: an `ago>0` heap-OOB (by inspection) and a public-vs-internal output-key mismatch (at runtime).

### B3: multi-rank C++ / LAMMPS (non-MP)

- **dpa1 is non-message-passing ⇒ multi-rank needs NO `border_op`/with-comm artifact** (that is a message-passing concern, deferred to PR-G). Multi-rank reuses the **same single-rank graph `.pt2`**, fed an **extended-region graph** (`buildGraphTensors(fold_to_local=false)`, `N=nall`, ghost node types from `atype_ext` incl. halo), with owned energy = `sum(atom_energy[0:nloc])` and the extended force folded to owners through the **existing dense `select_map` reverse-comm**. The fail-fast for `graph && multi_rank && has_message_passing` is retained.
- **Validated locally on multi-CPU** (no GPU needed for correctness): `test_lammps_dpa1_graph_pt2.py` covers single-rank vs reference, `mpirun -n 2` ≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an empty-subdomain (`nloc=0`) corner. Single-rank gtests stay 10/10 (multi-rank is purely additive). Multi-rank matched single-rank on the first run.

### Tests / known limitations

- Per-task + whole-phase reviews all Ready-to-merge.
- **pt_expt-only; dpa1 (non-MP) only.** Follow-ons: **PR-C** vesin/nv O(N) builders (carry-all builders still use `nonzero`, eager-only), **PR-D** attention, **PR-E** angles, **PR-F** jax graph force, **PR-G** dpa2/3 message-passing (forward halo + with-comm). CUDA multi-rank unvalidated locally. Carried code-cleanup follow-ups: a ~60-line DRY duplication in `training.py`; the multi-rank *atomic* output branch has no direct gtest (covered indirectly by the mpirun per-atom-virial assertion, since a single-process gtest can't set `nprocs>1`).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added support for graph-schema (NeighborGraph) model archives with a selectable `lower_kind="graph"` export path, including CLI support and new graph-form inference handling.
  * Added static edge-capacity support during graph construction.
* **Bug Fixes**
  * Improved gradient continuity for type embeddings in graph mode.
  * Enhanced trace/export stability by preventing out-of-range graph indices/frame IDs and making scatter/frame sizing more consistent.
* **Tests**
  * Added/extended parity, export metadata, training, and LAMMPS single-/multi-rank validation for graph-form `.pt2`, plus metadata checks for `lower_input_kind`.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## Summary

Adds the graph-native forward path for `dpa1(attn_layer=0)` (the factorizable, mixed-types case), built on the `NeighborGraph` foundation from #5581. Geometry enters the descriptor only through per-edge `edge_vec`; the neighbor-axis reduction becomes a `segment_sum` over edge centers. For `pt_expt` this becomes the default forward (force/virial via a single autograd backward through `edge_vec`).

## What it adds

- `edge_env_mat` (per-edge env-mat 4-vector), `DescrptBlockSeAtten._call_graph` + `DescrptDPA1.call_graph`, model `call_lower_graph` (energy), `neighbor_graph_from_ijs` + an optional ASE O(N) carry-all builder.
- `edge_energy_deriv` (autograd `grad(E, edge_vec)` → `edge_force_virial`) + `forward_common_lower_graph` (energy + force + virial + atom_virial).
- `DescrptDPA1.call` becomes a thin adapter (`from_dense_quartet → call_graph`) preserving the 5-tuple ABI; a shape-static converter keeps it `jax.jit`/`torch.export`-traceable.

## Default behavior

- `pt_expt` defaults eligible `dpa1(attn_layer=0, concat tebd, no exclude_types)` models to the carry-all graph (it has the autograd force/virial path).
- … `sel`.
- Ineligible configs (…, `exclude_types`, linear/ZBL) fall back to the dense path unchanged.
- `neighbor_graph_method="legacy"` forces dense; `"dense"`/`"ase"` force the graph.

## Parity (graph vs legacy dense lower, fp64 CPU)

atom_virial matches the canonical TF==pt-legacy full-to-src convention. dpa1 descriptor + model consistency suites green across dp/jax/pt_expt.
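The core idea of the summary, that geometry enters only through per-edge `edge_vec` and the neighbor-axis reduction becomes a `segment_sum` over edge centers, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: `edge_env_mat_sketch` uses a bare `s(r) = 1/r` with no smooth switch function or normalization statistics, `center` is a hypothetical flat array of edge-center indices, and `segment_sum` is a local helper rather than the library's.

```python
import numpy as np


def segment_sum(data: np.ndarray, segment_ids: np.ndarray, num_segments: int) -> np.ndarray:
    """Sum rows of data that share the same segment id."""
    out = np.zeros((num_segments, *data.shape[1:]), dtype=data.dtype)
    np.add.at(out, segment_ids, data)  # unbuffered add handles repeated ids
    return out


def edge_env_mat_sketch(edge_vec: np.ndarray, edge_mask: np.ndarray) -> np.ndarray:
    """Per-edge 4-vector s(r) * (1, x/r, y/r, z/r), with s(r) = 1/r (simplified)."""
    mask = edge_mask[:, None]
    r = np.linalg.norm(edge_vec, axis=-1, keepdims=True)
    r_safe = np.where(mask > 0, r, 1.0)  # avoid 0/0 on padded edges
    s = 1.0 / r_safe
    env = np.concatenate([s, s * edge_vec / r_safe], axis=-1)
    return env * mask  # padded edges contribute exactly zero


# Three nodes, four edge slots; the last slot is padding (mask 0).
center = np.array([0, 0, 1, 2])
edge_vec = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
edge_mask = np.array([1.0, 1.0, 1.0, 0.0])

env = edge_env_mat_sketch(edge_vec, edge_mask)        # shape (E, 4)
per_node = segment_sum(env, center, num_segments=3)   # shape (N, 4)
```

Because the reduction is indexed by edge center instead of a fixed neighbor axis, the number of neighbors per node is not capped by a dense `sel`-sized slot count, which is what a carry-all graph implies.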
## Known limitations

`make_fx` (forward + grad) traces; full `.pt2` AOTI export is a follow-up (PR-B). The carry-all builders (`build_neighbor_graph`/`from_ijs`) still use `nonzero` (eager-only); their static variants land with the export PR.

Also folds in three follow-up fixes to the #5581 foundation from @OutisLi's review (dangling spec refs → design discussion, `edge_force_virial` jax int-sum short-circuit, `Array` typing).

## Summary by CodeRabbit

- `neighbor_graph_method` routing for energy/force/virial, with carry-all neighbor graphs and graph-output fitting/post-processing.
- … `(i,j,S)` conversion, and per-edge environment-matrix computation), exported as part of the public API.
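The force/virial path in the PR description takes `grad(E, edge_vec)` and scatters it back to nodes. The scatter step can be sketched in NumPy once the per-edge gradient `g_e` is known. The assumptions are loud here: `edge_force_virial_sketch` is a hypothetical function, not the PR's `edge_force_virial`; edges are assumed to point from center to neighbor (`edge_vec = pos[neighbor] - pos[center]`); the virial sign is one common convention and may differ from the PR's; and a toy quadratic energy replaces autograd so that `g_e` is available analytically.

```python
import numpy as np


def edge_force_virial_sketch(g_e, edge_vec, center, neighbor, edge_mask, n_node):
    """Scatter g_e = dE/d(edge_vec) into per-node forces and a total virial."""
    g = g_e * edge_mask[:, None]  # padded edges contribute nothing
    force = np.zeros((n_node, 3), dtype=g_e.dtype)
    np.add.at(force, center, g)     # dE/d pos[center] = -g_e, so F_center += g_e
    np.add.at(force, neighbor, -g)  # dE/d pos[neighbor] = +g_e, so F_neighbor -= g_e
    virial = -np.einsum("ea,eb->ab", g, edge_vec)  # assumed sign convention
    return force, virial


def toy_energy(pos, center, neighbor, edge_mask):
    """Toy energy E = 0.5 * sum_e mask_e * |edge_vec_e|^2, for checking only."""
    edge_vec = pos[neighbor] - pos[center]
    return 0.5 * np.sum(edge_mask * np.sum(edge_vec**2, axis=-1))


pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.3, 1.1, 0.5]])
center = np.array([0, 1, 2, 0])
neighbor = np.array([1, 2, 0, 0])
edge_mask = np.array([1.0, 1.0, 1.0, 0.0])  # last edge slot is padding
edge_vec = pos[neighbor] - pos[center]

g_e = edge_vec  # analytic dE/d(edge_vec) for the toy energy
force, virial = edge_force_virial_sketch(g_e, edge_vec, center, neighbor, edge_mask, 3)
```

In a framework with autograd, `g_e` would come from a single backward pass through `edge_vec`, and the scatter above would be the only remaining step; the forces sum to zero by construction because every edge adds `+g_e` to one node and `-g_e` to another.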