Skip to content

feat(pt_expt): O(N) on-device NeighborGraph builders — vesin & nvalchemiops (PR-C)#5714

Open
wanghan-iapcm wants to merge 13 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-graph-builders-prC
Open

feat(pt_expt): O(N) on-device NeighborGraph builders — vesin & nvalchemiops (PR-C)#5714
wanghan-iapcm wants to merge 13 commits into
deepmodeling:masterfrom
wanghan-iapcm:feat-graph-builders-prC

Conversation

@wanghan-iapcm

@wanghan-iapcm wanghan-iapcm commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

NeighborGraph PR-C — O(N) on-device graph builders (vesin / nv)

Third PR of the NeighborGraph series (after #5581 foundation, #5583 PR-A dpa1 graph forward, #5604 PR-B .pt2/C++). Adds two O(N) carry-all NeighborGraph builders behind the World-2 neighbor_graph_method dispatcher, replacing PR-A's O(N²) dense search / per-frame ASE stopgap with on-device cell lists.

What

  • build_neighbor_graph_vesin (deepmd/pt_expt/utils/vesin_graph_builder.py) — vesin.torch cell list. Device-following: runs the search on the input tensor's device (CUDA kernel on CUDA input, CPU cell list otherwise).
  • build_neighbor_graph_nv (deepmd/pt_expt/utils/nv_graph_builder.py) — nvalchemiops GPU cell list, frame-batched (batch_idx/batch_ptr, one kernel for all frames — no Python loop). CUDA-only.
  • Both structurally clone build_neighbor_graph_ase: search → per-frame local (i, j, S)neighbor_graph_from_ijs(...), which recomputes edge_vec differentiably from the original grad-carrying coords.
  • Wired into the pt_expt make_model graph dispatch (neighbor_graph_method ∈ {"legacy","dense","ase","vesin","nv"}) and DeepEval .pt2 graph inference (new neighbor_graph_method kwarg, default "dense" → existing inference byte-identical). dpmodel/jax fail-fast on vesin/nv (torch/CUDA-only).

Perf-only: all builders emit the SAME neighbor set as dense (carry-all, sel=normalization-only), proven by exact set-equality; energy/force/virial are unchanged (parity 1e-12 CPU / 1e-10 CUDA).

Layering

  • dpmodel stays torch-free: vesin/nv builders live in pt_expt; the dpmodel dispatch only carries a fail-fast message.
  • vesin/nv are optional deps, NOT in pyproject — lazy-imported, guarded by is_vesin_torch_available() / is_nv_available(), ImportError with an install hint on absence.
  • nv decode is a faithful transcription of the tested deepmd/pt/utils/nv_nlist.py:_matrix_to_extended_inputs Step-1 extraction.

Testing

  • Local (CPU): 13 passed (vesin builder 5, vesin/reject dispatch 2, DeepEval graph 6), nv self-skips.
  • Remote GPU (Tesla T4), commit 78f6c24:
    • test_nv_graph_builder.py: 4 passed (set-equality vs dense periodic+non-periodic, frame-batch, differentiable edge_vec).
    • test_graph_builder_dispatch.py: 3 passed on CUDA (vesin parity 1e-10, dpmodel reject vesin+nv, nv parity 1e-10).
    • test_graph_deepeval.py: 6 passed on CUDA (.pt2 graph dense parity + vesin, AOTI compile + torch.as_tensor extraction).

Known limitations

  • vesin per-frame Python loopvesin.torch.compute is single-system (no batch dim), so multi-frame vesin loops over frames (each an O(N) search; nf=1 inference has zero loop cost). nv batches natively — the loop-free path for batched training.
  • vesin/nv not in pyproject; no "auto" selector (explicit strings only).
  • nv search_capacity = max(64, nloc) initial heuristic + 1.25× grow loop (.item() host-sync per grow).
  • nv passes the normalized coord to from_ijs (differentiable lattice translation, identity gradient; edge_vec is lattice-invariant) — consistent with the pt nv-nlist path.
  • jax O(N) graph builders (matscipy/jax-md) = PR-F; attention/angles/MP = PR-D/E/G.

Implements plan_neighbor_graph_prC_implementation; design spec: discussion wanghan-iapcm#4.

Summary by CodeRabbit

  • New Features
    • Added experimental neighbor-graph builders supporting NVIDIA CUDA (NV) and Vesin cell-list construction (vesin), including periodic shifts and frame-aware edge decoding.
  • Bug Fixes
    • Excluded virtual atoms (type < 0) from graph edges consistently for ASE-based neighbor graphs.
    • Improved behavior for empty/zero-neighbor inputs and ensured differentiable edge vectors.
  • Tests
    • Added backend parity tests (dense vs vesin/nv), NV decode regression coverage, gradient/device checks, virtual-atom exclusions, and strengthened plugin entry-point import validation.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@wanghan-iapcm, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 10 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: af0c60cf-6584-47b0-9fb4-80116438a772

📥 Commits

Reviewing files that changed from the base of the PR and between 6396068 and d2cf920.

📒 Files selected for processing (8)
  • deepmd/pt/utils/nv_nlist.py
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/model/make_model.py
  • deepmd/pt_expt/utils/nv_graph_builder.py
  • deepmd/pt_expt/utils/vesin_graph_builder.py
  • deepmd/pt_expt/utils/vesin_neighbor_list.py
  • source/tests/pt_expt/model/test_graph_builder_dispatch.py
  • source/tests/pt_expt/utils/test_graph_pt2_metadata.py
📝 Walkthrough

Walkthrough

The PR adds PT experimental neighbor-graph builders for CUDA and vesin backends, updates ASE neighbor-graph filtering for virtual atoms, and adds regression tests for decoding, parity, dispatch, and plugin import state restoration.

Changes

PT experimental neighbor-graph builders and validation

Layer / File(s) Summary
NV builder
deepmd/pt_expt/utils/nv_graph_builder.py
A CUDA-backed neighbor-graph builder is added with dense-matrix decoding, periodic coordinate handling, capacity growth, virtual-atom filtering, and conversion back to a differentiable NeighborGraph.
Vesin builder
deepmd/pt_expt/utils/vesin_graph_builder.py
A vesin-backed neighbor-graph builder is added with availability checks, frame-wise neighbor-list computation, empty-system handling, virtual-atom filtering, and differentiable graph construction.
Virtual atom filtering
deepmd/dpmodel/utils/neighbor_graph/ase_builder.py, source/tests/common/dpmodel/test_from_ijs.py
The ASE neighbor-graph builder and its regression coverage exclude edges that touch atoms with negative atype values.
Decode regression tests
source/tests/pt_expt/utils/test_nv_matrix_decode.py
The NV dense-matrix decoder is covered by hand-checked, empty-input, and randomized oracle tests.
Graph parity and dispatch tests
source/tests/pt_expt/model/test_graph_builder_dispatch.py, source/tests/pt_expt/utils/test_nv_graph_builder.py, source/tests/pt_expt/utils/test_vesin_graph_builder.py
Parity tests compare the new vesin and NV builders against dense/reference graph construction, and model dispatch tests cover method routing and backend rejection.

Plugin import state restoration

Layer / File(s) Summary
Module subtree restore
source/tests/pt_expt/test_plugin.py
The plugin-loading test now saves and restores deepmd.pt_expt and its submodules in sys.modules, then rebinds the parent package reference after import.

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related issues

Possibly related PRs

  • deepmodeling/deepmd-kit#5559: Also updates source/tests/pt_expt/test_plugin.py around deepmd.pt_expt import and plugin entry-point loading.

Suggested reviewers: njzjz, iProzd

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding O(N) on-device NeighborGraph builders for pt_expt via vesin and nvalchemiops.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 62.40602% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.51%. Comparing base (9bd1f16) to head (6905918).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
deepmd/pt_expt/utils/nv_graph_builder.py 33.87% 41 Missing ⚠️
deepmd/pt_expt/infer/deep_eval.py 79.16% 5 Missing ⚠️
deepmd/pt_expt/model/make_model.py 50.00% 3 Missing ⚠️
deepmd/pt_expt/utils/vesin_graph_builder.py 97.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5714      +/-   ##
==========================================
- Coverage   81.24%   80.51%   -0.73%     
==========================================
  Files         981      990       +9     
  Lines      109860   111001    +1141     
  Branches     4234     4235       +1     
==========================================
+ Hits        89257    89375     +118     
- Misses      19080    20101    +1021     
- Partials     1523     1525       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wanghan-iapcm wanghan-iapcm requested review from OutisLi and iProzd July 2, 2026 13:17
Comment on lines +4 to +7
World-2 counterpart of vesin_neighbor_list.py: instead of building the dense
quartet, it returns per-frame local (i, j, S), then delegates to the array-API
``neighbor_graph_from_ijs`` (which recomputes ``edge_vec`` differentiably from
the ORIGINAL grad-carrying coords). torch-only => lives in pt_expt.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vesin multi-frame loops over frames in Python while nv batches natively. Could we confirm that batched GPU training routes through nv (not vesin) so the per-frame loop is not on the training hot path, and maybe document that vesin is intended for nf=1 inference / CPU?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed: batched training never routes through vesin by default — neighbor_graph_method=None resolves to the "dense" converter (make_model.py _resolve_graph_method), and vesin/nv are explicit opt-in only. Documented the scope in the module docstring in 6905918: vesin = nf==1 inference / CPU (per-frame loop ~1 ms/frame call overhead measured on a T4); batched multi-frame GPU work should opt into nv, which batches all frames in one kernel.

# and search all frames in ONE kernel via batch_idx/batch_ptr.
if periodic:
cell = box.reshape(nf, 3, 3).to(device=device, dtype=coord.dtype)
coord = normalize_coord(coord, cell)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor consistency note: build_neighbor_graph_nv normalizes coords before neighbor_graph_from_ijs while build_neighbor_graph_vesin passes the original unnormalized coords. Both are internally self-consistent (S matches the coords each search used), but a one-line comment on each explaining why they differ would help future readers.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cross-referencing comments in both builders in 6905918: nvalchemiops requires in-cell positions, so nv searches AND recomputes edge_vec on the normalized coords (S matches what the search saw; normalize_coord is a differentiable lattice translation with identity gradient); vesin's cell list handles unwrapped positions natively, so it keeps the original grad-carrying coords.

Comment on lines +167 to +176
dst = edge_idx // max_neighbors # flattened center
src = neighbor_matrix.reshape(-1).index_select(0, edge_idx).to(torch.int64)
shift = shifts.reshape(-1, 3).index_select(0, edge_idx).to(torch.int64)
frame_idx = (dst // nloc).to(torch.int64) # frame of the edge
center_local = (dst % nloc).to(torch.int64) # i = center
src_local = (src % nloc).to(torch.int64) # j = neighbor

return neighbor_graph_from_ijs(
center_local, src_local, shift, coord, box_out, frame_idx, nloc, layout=layout
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decode (matrix -> (i, j, S)) is pure torch and CPU-runnable, but it currently lives inside the CUDA-only build_neighbor_graph_nv, so it only runs under the opt-in CUDA suite. Could we extract it into a standalone helper (taking neighbor_matrix, num_neighbors, shifts, nloc, coord, box) and unit-test it on the default CPU CI with synthetic inputs? The GPU neighbor_list search stays opt-in, but the regression-prone arithmetic (// max_neighbors, % nloc, frame isolation, neighbor_graph_from_ijs) would then be behind the normal merge gate. Not blocking since nv is explicit opt-in.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6905918: extracted nv_matrix_to_ijs(neighbor_matrix, num_neighbors, shifts, nloc) -> (i, j, S, frame_idx) — pure torch, device-agnostic — and the builder now calls it. New default-CI CPU test test_nv_matrix_decode.py covers the regression-prone arithmetic with a hand-checked two-frame fixture (incl. stale-slot garbage that must be ignored), an all-empty matrix, and a random-vs-brute-force-oracle sweep. The GPU neighbor_list search stays behind the opt-in CUDA suite.

Comment on lines +19 to +22
pytestmark = pytest.mark.skipif(
not (torch.cuda.is_available() and is_nv_available()),
reason="nvalchemiops requires CUDA + nvalchemi-toolkit-ops",
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These CUDA integration tests correctly stay gated. The gap is separate: the CPU-runnable decode in nv_graph_builder.py has no default-CI test, so a shared-code refactor could regress it silently without the CUDA tag. Suggest a CPU unit test of the extracted decode (see my comment on the builder).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered by the extraction in 6905918nv_matrix_to_ijs now has a default-CI CPU unit test (source/tests/pt_expt/utils/test_nv_matrix_decode.py: hand-checked, empty, random-vs-oracle), so a shared-code refactor of the decode can no longer regress silently without the CUDA tag.

… (review)

Address iProzd's deepmodeling#5714 review:
- extract the nv dense-matrix -> (i, j, S) decode into nv_matrix_to_ijs
  (pure torch, device-agnostic) and unit-test it on the default CPU CI
  with hand-checked, empty, and random-vs-oracle synthetic inputs; the
  CUDA neighbor_list search stays behind the opt-in GPU suite
- document why nv searches/recomputes on NORMALIZED coords while vesin
  uses the ORIGINAL coords (nvalchemiops needs in-cell positions; vesin
  handles unwrapped positions natively) - each self-consistent with its S
- document the vesin per-frame Python loop scope: intended for nf==1
  inference / CPU; never a default (neighbor_graph_method=None resolves
  to 'dense'); batched multi-frame GPU work should use nv
@wanghan-iapcm wanghan-iapcm requested a review from iProzd July 3, 2026 09:12
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Jul 3, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jul 3, 2026

Dispatches on ``self._neighbor_graph_method``: ``dense``/``ase`` run
backend-agnostic (numpy); ``vesin``/``nv`` run on-device (torch, O(N)).
All backends emit the SAME neighbor set (carry-all, sel-free), so the

@OutisLi OutisLi Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "SAME neighbor set" statement is accurate for normal real atoms, but it is worth documenting one boundary here: the dense builder filters atype < 0 padding/placeholder atoms during graph construction, while the O(N) builders (ase/vesin/nv, matching the SeZM edge-schema style) build geometric edges first and rely on downstream masks/consumers to suppress non-real atoms. For inputs with mixed-type padding this is not literally the same edge set, so downstream graph consumers that need dense parity should explicitly mask/filter edges touching atype < 0, or this docstring should state that responsibility.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — and rather than documenting the divergence, fixed the builders to honor the contract in 6396068: ase/vesin/nv now post-filter the (i, j, S) edge list so virtual atoms (atype < 0) are excluded as both centers and neighbors, matching the dense reference builder (builder.py's documented contract). The "SAME neighbor set" docstring is therefore accurate again, including for mixed-type padding inputs, and downstream graph consumers need no extra masking. Set-equality tests vs the dense builder with a virtual atom added for all three: ase on the default CI, vesin (guarded), nv (CUDA-gated, validated on a T4: 11/11 with the vesin suite). Note the ase-builder half of this bug also exists on master (PR-A vintage) and is fixed here in passing.

@iProzd iProzd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @OutisLi, and I think this is a correctness gap for the opt-in O(N) builders, not just a docstring scope note. Confirmed in code:

  • build_neighbor_graph (dense) drops virtual-atom edges via keep_mask = ... & ~(extended_atype<0) & ~(atype<0).
  • vesin/nv/ase build geometric edges from coords only and never look at atype.
  • forward_common_atomic_graph masks edges only for pair_excl; for atype<0 it only clamps the type and zeroes the virtual atom's own output — it does not drop edges from a virtual neighbor into a real center.

So for inputs with atype<0 (e.g. mixed-natoms padding), a real center that has a virtual atom within rcut gets an extra type-0 edge under vesin/nv/ase but not under dense, polluting the real atom's descriptor/energy/force. The "SAME neighbor set / parity" claim then holds only for all-real-atom inputs, which is likely all the current set-equality fixtures cover.

Rather than only documenting the caller's responsibility, could the O(N) builders filter edges touching atype<0 (drop where center or neighbor-owner is virtual) to match dense, so the parity claim stays literally true? A set-equality test with a virtual-atom fixture would lock it in. Default users are unaffected (default resolves to dense), so this is opt-in-only severity, but it undermines the stated dense parity.

@OutisLi OutisLi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rework the nv graph path to reuse the existing NvNeighborList implementation rather than adding a second nvalchemiops search stack. The new graph builder duplicates the kernel setup, capacity loop, device handling, and matrix decode that NvNeighborList already owns, which increases maintenance cost and has already diverged in behavior. This can be fixed by extending NvNeighborList with a graph-oriented return mode or a thin build_graph method while preserving the existing extended/edges modes so current SeZM users are not affected.

return center_local, src_local, shift, frame_idx


def build_neighbor_graph_nv(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should reuse NvNeighborList instead of introducing a parallel nvalchemiops implementation. NvNeighborList already owns the matrix search, capacity growth, device context, and extended/edges conversions used by the SeZM path. Adding build_neighbor_graph_nv separately means future fixes must be kept in sync manually, and this copy has already drifted by omitting the existing _input_device_context(device) guard. Please extend NvNeighborList with a graph return mode or a thin build_graph method that produces NeighborGraph, while preserving the current extended and edges return modes so existing SeZM behavior stays unchanged.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on single-sourcing — implemented in 2f7ea9e with the dependency direction inverted relative to the suggestion (per project roadmap): the graph-builder module now OWNS the search (nv_search_matrix: device-context pinning — restoring the _input_device_context guard this copy had dropped, good catch — periodic normalization, batch setup, grow-until-fit loop) and NvNeighborList.build delegates to it via a lazy import. Rationale: the World-1 strategy layer is scheduled for deprecation with the dense-nlist path, so the long-lived graph module is the owner and the legacy class is the caller — when World-1 is removed, nothing needs to move again. extended/edges return modes are byte-unchanged; GPU-validated on a T4 (nv builder 18/18, NvNeighborList/nlist_backend suites 75 passed + 26 subtests).

@iProzd

iProzd commented Jul 3, 2026

Copy link
Copy Markdown
Member

Others LGTM.

@OutisLi OutisLi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same design concern applies to both nv and vesin: graph-form NeighborGraph construction should extend the existing NeighborList strategies instead of adding separate parallel builders. VesinNeighborList and NvNeighborList already provide the O(N) search backends used by SeZM through extended/edges return modes; the graph path should be another output contract on the same strategy layer, not a second copy of each backend. Please unify this design while preserving the existing extended and edges modes for current SeZM users.

)


def build_neighbor_graph_vesin(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same structural issue as the nv path: this duplicates the existing VesinNeighborList search/conversion logic instead of extending the established NeighborList strategy. VesinNeighborList already owns the vesin.torch O(N) builder and supports extended/edges return modes for the SeZM path. Please make NeighborGraph another output contract of that shared strategy layer, or provide a thin build_graph method there, while preserving existing extended and edges behavior.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same resolution as the nv path, in 2f7ea9e: the raw vesin search is single-sourced as vesin_graph_builder.vesin_search_ijs (device pinning included) and BOTH VesinNeighborList._build_single and _build_single_edges now call it (lazy import to break the module cycle) — graph module owns, deprecation-bound strategy delegates. Extended/edges behavior byte-unchanged (ss float cast kept at the _build_single call site so ss @ box math is identical); full vesin + neighbor_list suites pass on CPU and T4.

@OutisLi OutisLi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also restore the graph-lower eligibility check in the pt_expt graph dispatch. Explicit neighbor_graph_method values should not bypass the same mixed_types/uses_graph_lower contract enforced by dpmodel and by graph export.

ng = build_neighbor_graph(cc, atype, bb, rcut)
elif method == "ase":
ng = build_neighbor_graph_ase(cc, atype, bb, rcut)
elif method == "vesin":

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pt_expt override no longer performs the graph-eligibility check that dpmodel keeps before building a NeighborGraph. _resolve_graph_method() only protects the default neighbor_graph_method=None path; an explicit neighbor_graph_method="dense", "ase", "vesin", or "nv" reaches this dispatch even for descriptors that do not support the graph lower (for example non-mixed descriptors, DPA1 with attention layers, or other configs where uses_graph_lower() is false). Please restore the same mixed_types() and descriptor.uses_graph_lower() fail-fast check here before dispatching to any builder, and add a regression test with a non-graph-eligible descriptor explicitly passing one of these methods.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — fixed in a6eb9bc: _call_common_graph now carries the same mixed_types() and descriptor.uses_graph_lower() fail-fast as dpmodel (NotImplementedError), so an explicit neighbor_graph_method can no longer reach the builders for ineligible descriptors. Regression test drives se_e2_a through all four explicit methods and expects the raise.

auto_batch_size: bool | int | AutoBatchSize = True,
neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
nlist_backend: str = "auto",
neighbor_graph_method: str = "dense",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document that neighbor_graph_method is graph-form only. For regular .pte / nlist-form .pt2 inference this value is currently ignored because _eval_model() only dispatches to _eval_model_graph() when metadata["lower_input_kind"] == "graph"; users who want the nlist path O(N) builder must use nlist_backend, not neighbor_graph_method. It would be clearer to either fail fast / warn when neighbor_graph_method != "dense" on a non-graph artifact, or spell out this no-op behavior in the constructor docstring.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d2cf920: both knobs are now documented in the DeepEval docstring (which controls which path, and that they don't overlap), and a non-default neighbor_graph_method on a non-graph artifact now RAISES at construction instead of being silently ignored, pointing the user at nlist_backend. Regression test on a nlist-form .pt2.

@OutisLi OutisLi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revisit the public neighbor-backend API design. The PR now exposes two knobs, nlist_backend and neighbor_graph_method, whose scopes are hard to infer from their names and whose supported values differ. From a user perspective both choose the neighbor construction backend, so the API should be made more natural and unambiguous. In particular, if nv is a supported O(N) backend, it should not work only for graph-form .pt2 while the nlist path has no nv option, especially since NvNeighborList already exists.

auto_batch_size: bool | int | AutoBatchSize = True,
neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
nlist_backend: str = "auto",
neighbor_graph_method: str = "dense",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a second public neighbor-builder knob next to nlist_backend, but the names and value sets do not make the distinction clear. nlist_backend controls the nlist/extended lower path and accepts auto/vesin/native; neighbor_graph_method controls only graph-form .pt2 and accepts dense/ase/vesin/nv. A user who wants an O(N) backend cannot tell which knob to use, and nv oddly works only on the graph path even though NvNeighborList already exists for the nlist strategy layer. Please consolidate this into a single backend-selection API, or otherwise rename/fail-fast/document the two paths clearly, and make nv support consistent across both nlist and graph lower schemas without breaking the existing SeZM extended/edges behavior.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed the two-knob split is not a good end state. What this PR now does: documents the split explicitly and fail-fasts on misuse (d2cf920), so a user can no longer pick the wrong knob silently. Full consolidation into a single backend-selection API (and nv on the nlist path) is deferred deliberately: the dense-nlist lower is scheduled for total deprecation in favor of the graph schema (see the NeighborGraph design discussion), at which point nlist_backend disappears and neighbor_graph_method becomes THE selector — consolidating now would mean building API surface on the path being removed. Happy to reorder if you think the deprecation is too far out.

auto_batch_size: bool | int | AutoBatchSize = True,
neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
nlist_backend: str = "auto",
neighbor_graph_method: str = "dense",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One performance note for the eventual unified/auto backend selection: in the SeZM path we found that CPU + nf == 1 is better served by vesin than nvalchemiops, because vesin has the lighter single-system cell-list path while nvalchemiops/Warp has more setup overhead to amortize. The current SeZM selector therefore uses vesin for device.type == "cpu" and nf == 1, and prefers nv for CUDA or multi-frame batches when available. This may be worth carrying over when designing the common default/auto policy.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — carried into the vesin builder's scope note and it matches our benchmark data (T4, diamond C, rcut 6): vesin and nv are equal within noise at nf==1 on CUDA, and vesin's per-frame Python loop (~1 ms/frame) makes nv the clear choice for multi-frame batches; your CPU+nf==1 vesin preference is consistent with vesin's lighter single-system path. Will encode exactly this policy (cpu+nf==1 -> vesin; CUDA/multi-frame -> nv when available) when the auto selector lands with the unified API.

Han Wang added 2 commits July 4, 2026 00:56
test_plugin popped deepmd.pt_expt from sys.modules without restoring it,
leaving the package's cached submodules bound to a dead parent. Any later
import of a cached submodule (e.g. deepmd.pt_expt.infer.deep_eval)
re-created a BARE parent whose utils/infer attributes were never rebound,
and mock.patch('deepmd.pt_expt.utils...') in
test_deep_eval_serialize_api failed with AttributeError under py3.10's
mock target resolution. Shard-order dependent: this PR's new test files
reshuffled CI shard 4 and exposed the latent master-side hygiene bug.
Snapshot the whole deepmd.pt_expt module tree before the re-import and
restore it (including the parent-package attribute) afterwards.
The dense reference builder filters virtual atoms (atype < 0) as both
centers and neighbors during construction, but the geometric ase/vesin/nv
searches are type-blind, so with virtual/padding atoms present the edge
sets diverged from the dense builder and the 'SAME neighbor set' contract
was violated (OutisLi review, deepmodeling#5714). Post-filter the (i, j, S) edge list
in all three builders by the atype of both endpoints. Set-equality tests
vs the dense builder with a virtual atom added for ase (default CI),
vesin (guarded), and nv (CUDA-gated).
Han Wang added 3 commits July 4, 2026 01:38
…egacy strategies delegate

The shared search helpers now live in the World-2 graph-builder modules
(vesin_graph_builder.vesin_search_ijs, nv_graph_builder.nv_search_matrix),
which are the long-term primary path. The legacy VesinNeighborList and
NvNeighborList classes (scheduled for deprecation with the dense-nlist path)
import and call these helpers rather than owning their own copies of the
same logic.

Direction is graph-builder-centric (World-1 strategies are callers, not
owners), matching the reviewer's intent and the project's deprecation roadmap.

Key details:
- vesin_search_ijs: returns (ii, jj, ss) as int64; handles the zeros-box
  for non-periodic internally; pins torch.device for vesin's internal allocs.
  _build_single casts ss to float at the call site (unchanged math for ss@box).
  _build_single_edges uses int64 ss directly (edge_schema_from_ij_shifts
  accepts int shifts). Function-level import in vesin_neighbor_list avoids a
  module-level cycle (vesin_graph_builder imports is_vesin_torch_available
  from vesin_neighbor_list).
- nv_search_matrix: wraps the full search pipeline including
  _input_device_context pinning (restoring the guard the nv graph builder
  previously omitted), periodic normalization, batch tensor construction, and
  the grow-until-fit capacity loop. NvNeighborList.build uses a function-level
  import (lazy, avoids pt->pt_expt module-level cycle). Post-search code in
  build() is wrapped in its own _input_device_context block so device behavior
  is byte-unchanged relative to the original single-context design.
- _matrix_to_extended_inputs, edge_schema_from_neighbor_matrix, and
  _truncate_to_sel_compiled are entirely unchanged (extended/edges modes
  byte-identical).
- _grow_search_capacity and _input_device_context remain in nv_nlist.py for
  callers that import them directly (test_nv_nlist imports _input_device_context).
…ble descriptors

Mirror the dpmodel guard in the pt_expt _call_common_graph override:
_resolve_graph_method's eligibility check only protects the default (None)
path, so an EXPLICIT method reached the builders for descriptors without a
graph lower. Regression test drives se_e2_a (mixed_types False) through all
four explicit methods and expects NotImplementedError. Addresses OutisLi
review on deepmodeling#5714.
…backend split

neighbor_graph_method is consumed only by graph-form .pt2 eval; a
non-default value on a nlist-form artifact was silently ignored, misleading
users who wanted an O(N) builder (that knob is nlist_backend). Document
both knobs in the DeepEval docstring (incl. the planned consolidation at
the dense-nlist deprecation) and raise at construction when
neighbor_graph_method != 'dense' on a non-graph artifact; regression test
on a nlist-form .pt2. Addresses OutisLi review on deepmodeling#5714.
nf, nloc = atype.shape[:2]
target_neighbors = int(sum(sel))
coord = coord.reshape(nf, nloc, 3)
periodic = box is not None
Comment on lines +35 to +39
from deepmd.pt.utils.nv_nlist import (
_input_device_context,
choose_nv_nlist_method,
is_nv_available,
)
nf = coord.shape[0] if coord.ndim == 3 else 1
coord = coord.reshape(nf, -1, 3)
nloc = coord.shape[1]
periodic = box is not None
@wanghan-iapcm wanghan-iapcm requested review from OutisLi and iProzd July 3, 2026 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants