feat(core): batch prod env mat over frames by njzjz · Pull Request #5582 · deepmodeling/deepmd-kit

njzjz · 2026-06-24T17:15:00Z

Summary

- Batch CPU and GPU prod env mat over frames for non-LAMMPS neighbor lists.
- Support no-PBC self-built nlists, manual PBC coord copy, and mesh-provided nlists.
- Keep LAMMPS external nlists on the existing per-sample path.
- Add CPU/GPU multiple-frame neighbor-list and env-mat coverage.

Fixes #2618

Tests

cmake --build source/build-codex-cuda --target runUnitTests_lib deepmd_op -j2
source/build-codex-cuda/lib/tests/runUnitTests_lib --gtest_filter=TestNeighborList.cpu:TestNeighborList.cpu_multiple_frames:TestNeighborList.cpu_lessmem:TestEnvMatA.prod_cpu:TestEnvMatA.prod_cpu_equal_cpu:TestEnvMatA.prod_cpu_multiple_frames:TestEnvMatR.prod_cpu:TestEnvMatR.prod_cpu_equal_cpu:TestEnvMatR.prod_cpu_multiple_frames:TestEnvMatAMix.prod_cpu:TestEnvMatAMix.prod_cpu_equal_cpu
srun --gres=gpu:1 source/build-codex-cuda/lib/tests/runUnitTests_lib --gtest_filter=TestEnvMatA.prod_gpu_multiple_frames:TestNeighborList.gpu_multiple_frames:TestEnvMatA.prod_gpu:TestEnvMatA.prod_gpu_equal_cpu:TestEnvMatR.prod_gpu:TestEnvMatR.prod_gpu_equal_cpu:TestFormatNlist.gpu:TestFormatNlistShortSel.gpu
pytest source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_pbc_self_built_nlist source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_nopbc_self_built_nlist source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_pbc_self_built_nlist_deriv source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_nopbc_self_built_nlist_deriv -q

Summary by CodeRabbit

New Features
- Neighbor-list, neighbor formatting, and environment-matrix operations now support multi-frame processing in a single call via an nframes argument (CPU and GPU).
- Neighbor-list construction can optionally filter candidates using a per-atom type array.
Bug Fixes
- Invalid/filtered center or neighbor types are handled consistently (including zeroing em/em_deriv where applicable).
- Improved GPU robustness with safer temporary-memory management and clearer error handling.
Tests
- Added/extended CPU and GPU tests to validate multi-frame neighbor-list and environment-matrix behavior, including negative-type cases.
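
The multi-frame and type-filter items above can be illustrated with a small host-side sketch. It is not the deepmd-kit implementation: `build_nlist_frames`, its signature, and the brute-force search are assumptions made for illustration. It only shows how frame-major offsets and a per-atom type array (negative meaning "filtered") might interact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the deepmd-kit API: brute-force neighbor search
// over `nframes` frames stored back to back.
//   coord: nframes * nall * 3 values, frame-major
//   type:  nframes * nall values, or empty to disable type filtering
// Row (f * nloc + i) of the result holds the frame-local neighbor indices
// of center atom i in frame f.
inline std::vector<std::vector<int>> build_nlist_frames(
    const std::vector<double>& coord, const std::vector<int>& type,
    int nframes, int nloc, int nall, double rcut) {
  const std::size_t frame_atoms = static_cast<std::size_t>(nall);
  assert(coord.size() == static_cast<std::size_t>(nframes) * frame_atoms * 3);
  assert(type.empty() ||
         type.size() == static_cast<std::size_t>(nframes) * frame_atoms);
  std::vector<std::vector<int>> nlist(static_cast<std::size_t>(nframes) * nloc);
  const double rcut2 = rcut * rcut;
  for (int f = 0; f < nframes; ++f) {
    // Per-frame views into the flattened buffers.
    const double* c = coord.data() + static_cast<std::size_t>(f) * frame_atoms * 3;
    const int* t = type.empty()
                       ? nullptr
                       : type.data() + static_cast<std::size_t>(f) * frame_atoms;
    for (int i = 0; i < nloc; ++i) {
      if (t && t[i] < 0) continue;  // filtered center: leave its row empty
      std::vector<int>& row = nlist[static_cast<std::size_t>(f) * nloc + i];
      for (int j = 0; j < nall; ++j) {
        if (j == i || (t && t[j] < 0)) continue;  // skip self and filtered atoms
        double r2 = 0.0;
        for (int d = 0; d < 3; ++d) {
          const double diff = c[3 * j + d] - c[3 * i + d];
          r2 += diff * diff;
        }
        if (r2 < rcut2) row.push_back(j);
      }
    }
  }
  return nlist;
}
```

Passing an empty `type` vector in this sketch reproduces the unfiltered behavior, which mirrors the "optional" wording of the summary.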

coderabbitai · 2026-06-24T17:23:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

The PR adds multi-frame support to neighbor-list and environment-matrix APIs, refactors CPU and GPU implementations to process flattened frame/atom rows, updates TensorFlow batching paths, and expands tests to cover multi-frame execution.

Changes

Multi-frame batch support for neighbor-list and env-mat ops

Layer / File(s)	Summary
Public API contracts and wrappers `source/lib/include/neighbor_list.h`, `source/lib/include/fmt_nlist.h`, `source/lib/include/prod_env_mat.h`	`build_nlist_cpu`, `build_nlist_gpu`, `format_nbor_list_gpu`, and all four `prod_env_mat_*` templates add `nframes` (and optional `type`) parameters, with inline wrappers preserving previous call shapes.
CPU multi-frame implementations `source/lib/src/neighbor_list.cc`, `source/lib/src/prod_env_mat.cc`	`build_nlist_cpu` and `prod_env_mat_a_cpu`/`prod_env_mat_r_cpu` iterate over frames, use per-frame offsets, apply frame-aware type filtering, and update explicit instantiations.
GPU multi-frame neighbor-list and env-mat kernels `source/lib/src/gpu/neighbor_list.cu`, `source/lib/src/gpu/prod_env_mat.cu`	GPU neighbor-list and env-mat kernels and host functions switch to `nframes * nloc` indexing, frame-aware coordinate/type lookups, and updated explicit instantiations.
TensorFlow op batching and helper wiring `source/op/tf/custom_op.h`, `source/op/tf/neighbor_stat.cc`, `source/op/tf/prod_env_mat_multi_device.cc`	Updates GPU helper return handling, adds RAII cleanup, and introduces batched CPU/GPU coord-normalization, neighbor-list preparation, and `ProdEnvMat*Op` execution paths.
Multi-frame validation tests `source/lib/tests/test_neighbor_list.cc`, `source/lib/tests/test_fmt_nlist.cc`, `source/lib/tests/test_env_mat_a.cc`, `source/lib/tests/test_env_mat_a_mix.cc`, `source/lib/tests/test_env_mat_r.cc`	Adds frame-repetition helpers, multi-frame CPU/GPU tests, and explicit `nframes = 1` updates in existing GPU call sites.
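
The "inline wrappers preserving previous call shapes" mentioned in the first table row follow a common overload pattern. A minimal sketch, assuming a hypothetical routine `row_sums` (the real `prod_env_mat_*` signatures are not shown in this thread), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batched routine (name and signature are illustrative only):
// sums each of the nframes * nloc rows of a frame-major buffer.
inline std::vector<double> row_sums(const std::vector<double>& rows,
                                    int nframes, int nloc, int width) {
  const std::size_t nrows = static_cast<std::size_t>(nframes) * nloc;
  assert(rows.size() == nrows * width);
  std::vector<double> out(nrows, 0.0);
  for (std::size_t r = 0; r < nrows; ++r) {
    for (int k = 0; k < width; ++k) {
      out[r] += rows[r * width + k];
    }
  }
  return out;
}

// Single-frame wrapper: existing callers that never passed nframes keep
// compiling, and they are routed to the batched code with nframes = 1.
inline std::vector<double> row_sums(const std::vector<double>& rows,
                                    int nloc, int width) {
  return row_sums(rows, /*nframes=*/1, nloc, width);
}
```

The design choice is that only one implementation exists; the wrapper adds no logic of its own, so single-frame and multi-frame callers cannot drift apart.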

Sequence Diagram(s)

sequenceDiagram
  participant ProdEnvMatAOp as ProdEnvMatAOp::_Compute
  participant FormatNborListGpu as format_nbor_list_gpu
  participant ComputeEnvMatA as compute_env_mat_a
  ProdEnvMatAOp->>FormatNborListGpu: call(..., nframes)
  FormatNborListGpu->>ComputeEnvMatA: launch over nframes*nloc
  ComputeEnvMatA-->>ProdEnvMatAOp: write em, em_deriv, rij

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

C++

Suggested reviewers

wanghan-iapcm
denghuilu

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: batching prod_env_mat over frames.
Linked Issues check	✅ Passed	The core prod_env_mat CPU/GPU paths now process multiple frames, flattening frame and atom indices to parallelize the frame dimension.
Out of Scope Changes check	✅ Passed	The changes are all tied to multi-frame env-mat and neighbor-list support, with tests and API updates serving the same goal.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

source/lib/src/gpu/prod_env_mat.cu (1)

545-600: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard negative center types in compute_env_mat_r.

compute_env_mat_a returns before using type[i_idx_nall] when the center type is negative, but compute_env_mat_r does not. For filtered rows, the row_nlist[ii] < 0 branch can index avg/std with type[i_idx_nall] < 0, causing device out-of-bounds reads.

Proposed fix

   const int_64 frame_idx = bid / nloc;
   const int_64 atom_idx = bid % nloc;
   const int_64 i_idx_nall = frame_idx * nall + atom_idx;
+  if (type[i_idx_nall] < 0) {
+    return;
+  }
   const int ndescrpt = nnei;
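
The index arithmetic and the guard in the proposed fix can be tried on the host. The sketch below is a CPU stand-in, not the CUDA kernel: `decompose_row` and `lookup_center_avg` are hypothetical names, and `avg` is simplified to one value per type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position of one flattened row inside the batched buffers.
struct RowRef {
  std::int64_t frame_idx;
  std::int64_t atom_idx;
  std::int64_t i_idx_nall;  // index into arrays with stride nall per frame
};

// Mirrors the arithmetic in the diff above: bid runs over nframes * nloc rows.
inline RowRef decompose_row(std::int64_t bid, int nloc, int nall) {
  assert(bid >= 0 && nloc > 0 && nall >= nloc);
  const std::int64_t frame_idx = bid / nloc;
  const std::int64_t atom_idx = bid % nloc;
  return {frame_idx, atom_idx, frame_idx * nall + atom_idx};
}

// Hypothetical per-row lookup (simplified: one avg value per type).
// Returns false for a filtered center without ever indexing avg.
inline bool lookup_center_avg(const std::vector<int>& type,
                              const std::vector<double>& avg,
                              std::int64_t bid, int nloc, int nall,
                              double* out) {
  const RowRef r = decompose_row(bid, nloc, nall);
  const int t = type[static_cast<std::size_t>(r.i_idx_nall)];
  if (t < 0) {
    return false;  // guard: avg[t] would be an out-of-bounds read
  }
  *out = avg[static_cast<std::size_t>(t)];
  return true;
}
```

Without the early return, a center type of -1 would turn into a read before the start of `avg`, which is the hazard the comment describes.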

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/src/gpu/prod_env_mat.cu` around lines 545 - 600,
`compute_env_mat_r` is missing the same negative-center-type guard that
`compute_env_mat_a` already has, so `type[i_idx_nall]` can be negative and then
used to index `avg/std` in the `row_nlist[ii] < 0` path. Update
`compute_env_mat_r` to early-return or otherwise skip processing when the center
type is negative, and make sure both the positive and negative neighbor branches
never read `avg/std` or write outputs using `type[i_idx_nall]` unless it is
valid. Use `compute_env_mat_r`, `compute_env_mat_a`, and `i_idx_nall` to locate
the fix.

🧹 Nitpick comments (1)

source/lib/tests/test_env_mat_a.cc (1)
805-867: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Consider perturbing the second frame's coordinates to strengthen GPU coverage.

Unlike prod_cpu_multiple_frames (Lines 637-642), this GPU test repeats posi_cpy identically for both frames and compares each frame against the same expected_env. With identical frames, a kernel bug that mishandles the per-frame coordinate offset (e.g., reading frame 0's coords for frame 1) would not be detected. Applying the same per-frame perturbation used in the CPU test and comparing against per-frame reference outputs would make this test meaningfully verify frame-offset indexing.
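
One way to build distinct frames for such a test is sketched below. The perturbation actually used in the CPU test is not shown in this thread, so the uniform scaling here is an assumption, and `repeat_frames_scaled` is a hypothetical helper name. A rigid translation is deliberately avoided: it leaves all relative coordinates unchanged and would therefore not change the expected outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical test helper: repeats one frame of coordinates `nframes` times,
// scaling frame f by (1 + f * eps) so that interatomic distances differ
// between frames. Frame 0 is an exact copy of the input.
inline std::vector<double> repeat_frames_scaled(const std::vector<double>& posi,
                                                int nframes, double eps) {
  assert(nframes > 0);
  std::vector<double> out;
  out.reserve(posi.size() * static_cast<std::size_t>(nframes));
  for (int f = 0; f < nframes; ++f) {
    const double scale = 1.0 + f * eps;
    for (double x : posi) {
      out.push_back(x * scale);
    }
  }
  return out;
}
```

With frames built this way, each frame needs its own reference output; `eps` would have to be small enough that no pair crosses the cutoff if the neighbor topology is meant to stay fixed.
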
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a.cc` around lines 805 - 867, The GPU
multi-frame test in prod_env_mat_a_gpu is too weak because it uses identical
coordinates for every frame, so frame-offset indexing bugs can pass unnoticed.
Update the test setup to perturb the second frame’s positions the same way the
CPU multi-frame test does, then compute or store per-frame expected_env values
and compare each frame against its own reference instead of reusing one shared
baseline.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/lib/src/prod_env_mat.cc`:
- Around line 204-210: The R-descriptor normalization in prod_env_mat.cc can
still index avg/std with a negative center type when d_type[ii] is -1, which
must be guarded the same way as the A-path. Update the normalization logic
around the frame_em and frame_em_deriv loops to detect negative d_type[ii]
before using it for avg/std lookup, and zero out the corresponding outputs
instead of normalizing. Keep the fix localized to the descriptor-building code
that handles d_type, frame_em, and frame_em_deriv so negative-type filtering is
handled consistently.
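
A minimal host-side sketch of the behavior requested above, under the assumption that `avg` and `std` are laid out as `ntypes * ndescrpt` tables; `normalize_rows` is a hypothetical name and not the function in `prod_env_mat.cc`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical normalization step for one frame.
//   em:      nloc * ndescrpt values, normalized in place
//   type:    nloc center types, negative meaning "filtered"
//   avg/dev: ntypes * ndescrpt tables (assumed layout)
inline void normalize_rows(std::vector<double>& em,
                           const std::vector<int>& type,
                           const std::vector<double>& avg,
                           const std::vector<double>& dev,
                           int nloc, int ndescrpt) {
  assert(em.size() == static_cast<std::size_t>(nloc) * ndescrpt);
  assert(type.size() == static_cast<std::size_t>(nloc));
  for (int ii = 0; ii < nloc; ++ii) {
    double* row = em.data() + static_cast<std::size_t>(ii) * ndescrpt;
    if (type[ii] < 0) {
      // Filtered center: zero the row instead of indexing avg/dev with -1.
      std::fill(row, row + ndescrpt, 0.0);
      continue;
    }
    const std::size_t base = static_cast<std::size_t>(type[ii]) * ndescrpt;
    for (int k = 0; k < ndescrpt; ++k) {
      row[k] = (row[k] - avg[base + k]) / dev[base + k];
    }
  }
}
```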

In `@source/op/tf/prod_env_mat_multi_device.cc`:
- Around line 702-708: Reject oversized built GPU neighbor lists before rounding
in the built GPU nlist path: in the logic around _build_nlist_gpu and the
chunk_max_nbor_size rounding block, add a guard that checks the reported maximum
against GPU_MAX_NBOR_SIZE before any rounding to 1024/2048/4096, and fail early
if it exceeds the kernel-supported limit. Apply the same fix in the other
duplicated rounding sites so formatting/descriptor kernels never receive an
undersized max_nbor_size for an actually oversized neighbor list.
- Around line 2087-2098: The mesh tensor validation in the neighbor-count loop
allows integer overflow because `neighbor_count` is accumulated as `int` from
tensor data, which can let an invalid buffer size slip through. Update the
bounds check in the mesh parsing path around `numneigh_in`, `neighbor_count`,
and the `mesh_tensor_size` validation to accumulate with a wider unsigned/signed
64-bit type and verify each addition does not exceed the remaining tensor length
before proceeding; apply the same fix in the matching neighbor-count check used
elsewhere in this file.
- Around line 675-737: `firstneigh` is allocated with raw device memory in the
neighbor-list path, but several `OP_REQUIRES`/`OP_REQUIRES_OK` exits can return
before `deepmd::delete_device_memory(firstneigh)` runs, leaking GPU memory.
Update the `prod_env_mat_multi_device` flow that uses `firstneigh` to ensure
cleanup on every early exit, preferably by introducing a small RAII/guard
wrapper around the allocation or switching to a Tensor-owned buffer so the
memory is released automatically even when `_prepare_mesh_nlist_gpu_batch`,
`_build_nlist_gpu`, or later allocations fail. Apply the same pattern in the
other matching `firstneigh` code paths referenced by this diff.
- Around line 2128-2206: The batched CPU helper _prepare_coord_nlist_cpu_batch
currently swallows failures because OP_REQUIRES and OP_REQUIRES_OK only exit the
helper, allowing prod_env_mat_*_cpu to continue with an invalid or uninitialized
inlist. Change _prepare_coord_nlist_cpu_batch to return a tensorflow::Status or
boolean success value, propagate failures from the copy and neighbor-list setup
paths, and update the call sites to wrap it with OP_REQUIRES_OK so descriptor
computation stops immediately on invalid mesh data or allocation errors.
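
For the mesh-tensor validation item in the list above, one overflow-free formulation is to subtract each count from the remaining length rather than summing counts in an `int`. The sketch below assumes a simplified layout (a list of per-atom counts plus the number of slots left for neighbor indices); `neighbor_counts_fit` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check: do the per-atom neighbor counts fit into `available`
// remaining tensor elements? Working with the 64-bit remainder means no
// partial sum is ever formed, so nothing can wrap around.
inline bool neighbor_counts_fit(const std::vector<int>& numneigh,
                                std::int64_t available) {
  if (available < 0) {
    return false;
  }
  std::int64_t remaining = available;
  for (int n : numneigh) {
    if (n < 0 || n > remaining) {
      return false;  // negative count or more neighbors than space left
    }
    remaining -= n;
  }
  return true;
}
```

Two counts near the maximum `int` value would overflow a plain `int` accumulator; in this form the second one is simply rejected.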

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 80d02dbc-a6db-44dc-86b9-5586fc945ee1

📥 Commits

Reviewing files that changed from the base of the PR and between b52c359 and 782ce9d.

📒 Files selected for processing (13)

source/lib/include/fmt_nlist.h
source/lib/include/neighbor_list.h
source/lib/include/prod_env_mat.h
source/lib/src/gpu/neighbor_list.cu
source/lib/src/gpu/prod_env_mat.cu
source/lib/src/neighbor_list.cc
source/lib/src/prod_env_mat.cc
source/lib/tests/test_env_mat_a.cc
source/lib/tests/test_env_mat_a_mix.cc
source/lib/tests/test_env_mat_r.cc
source/lib/tests/test_fmt_nlist.cc
source/lib/tests/test_neighbor_list.cc
source/op/tf/prod_env_mat_multi_device.cc

codecov · 2026-06-24T18:11:46Z

Codecov Report

❌ Patch coverage is 91.13924% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.82%. Comparing base (0e5c170) to head (c98dc29).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
source/op/tf/prod_env_mat_multi_device.cc	80.39%	15 Missing and 25 partials ⚠️
source/lib/src/neighbor_list.cc	81.81%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5582      +/-   ##
==========================================
- Coverage   81.97%   81.82%   -0.15%     
==========================================
  Files         959      960       +1     
  Lines      105748   106180     +432     
  Branches     4102     4168      +66     
==========================================
+ Hits        86684    86886     +202     
- Misses      17573    17796     +223     
- Partials     1491     1498       +7

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

source/lib/src/prod_env_mat.cc (1)
39-80: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Flatten CPU frame/atom work instead of serializing frames.

Both CPU paths still run a serial outer frame loop and recreate the OpenMP region per frame, so large nframes batches with modest nloc do not get frame-level parallelism. Consider restructuring around flattened (frame, atom) rows or another single parallel region with frame-local buffers.

Also applies to: 146-186
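
The flattened structure suggested here can be sketched as a single loop over `nframes * nloc` rows. This is an illustration only: `dist2_to_first_atom` is a hypothetical workload standing in for the env-mat computation, and the pragma takes effect only when the code is compiled with OpenMP enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row workload: squared distance from each atom to the
// first atom of its own frame. coord holds nframes * nloc * 3 values.
inline std::vector<double> dist2_to_first_atom(const std::vector<double>& coord,
                                               int nframes, int nloc) {
  const std::int64_t nrows = static_cast<std::int64_t>(nframes) * nloc;
  assert(coord.size() == static_cast<std::size_t>(nrows) * 3);
  std::vector<double> out(static_cast<std::size_t>(nrows), 0.0);
  // One parallel region over all (frame, atom) rows instead of one per frame.
#pragma omp parallel for
  for (std::int64_t row = 0; row < nrows; ++row) {
    const std::int64_t frame = row / nloc;
    const double* frame_coord = coord.data() + frame * nloc * 3;  // frame-local view
    const double* ci = coord.data() + row * 3;
    double r2 = 0.0;
    for (int d = 0; d < 3; ++d) {
      const double diff = ci[d] - frame_coord[d];
      r2 += diff * diff;
    }
    out[static_cast<std::size_t>(row)] = r2;  // each row writes its own slot
  }
  return out;
}
```

Because every iteration writes a distinct output element, no synchronization is needed; any scratch buffer the real computation requires would have to be private to the thread or the row.
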
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/src/prod_env_mat.cc` around lines 39 - 80, The CPU frame handling
in prod_env_mat.cc still processes nframes serially and re-enters the OpenMP
region for each frame, leaving frame-level parallelism unused. Refactor the
logic around the main frame-processing loop and the existing `#pragma` omp
parallel for section so work is flattened across (frame, atom) rows or handled
in one outer parallel region with frame-local buffers, using the same
frame-em/frame-em_deriv/frame-rij/frame-nlist flow as the current
implementation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/lib/src/gpu/prod_env_mat.cu`:
- Around line 404-407: The A-mix path is indexing the center-type buffer with a
nframes × nall stride in prod_env_mat_a_cpu/prod_env_mat_a_gpu, but some callers
may still pass a compact nframes × nloc type array. Update the A-mix call sites
and tests (especially test_env_mat_a_mix.cc and the multi-device path) so the
center type data is expanded to frame_nall, or switch the kernels to use the
already-expanded f_type/f_atype_cpy argument for center normalization instead of
type. Ensure the chosen buffer matches the posi_cpy expansion and is
consistently sized before the kernel reads frame_idx * nall + atom_idx.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8a1ce82c-bc3c-4298-b2c4-784ef1861f64

📥 Commits

Reviewing files that changed from the base of the PR and between 782ce9d and 1a8586b.

📒 Files selected for processing (15)

source/lib/include/fmt_nlist.h
source/lib/include/neighbor_list.h
source/lib/include/prod_env_mat.h
source/lib/src/gpu/neighbor_list.cu
source/lib/src/gpu/prod_env_mat.cu
source/lib/src/neighbor_list.cc
source/lib/src/prod_env_mat.cc
source/lib/tests/test_env_mat_a.cc
source/lib/tests/test_env_mat_a_mix.cc
source/lib/tests/test_env_mat_r.cc
source/lib/tests/test_fmt_nlist.cc
source/lib/tests/test_neighbor_list.cc
source/op/tf/custom_op.h
source/op/tf/neighbor_stat.cc
source/op/tf/prod_env_mat_multi_device.cc

🚧 Files skipped from review as they are similar to previous changes (10)

source/lib/include/fmt_nlist.h
source/lib/tests/test_fmt_nlist.cc
source/lib/include/prod_env_mat.h
source/lib/tests/test_env_mat_a_mix.cc
source/lib/include/neighbor_list.h
source/lib/src/neighbor_list.cc
source/lib/src/gpu/neighbor_list.cu
source/lib/tests/test_neighbor_list.cc
source/lib/tests/test_env_mat_a.cc
source/op/tf/prod_env_mat_multi_device.cc

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

source/lib/tests/test_env_mat_a_mix.cc (1)

890-895: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Free rij_dev in this cleanup path.

rij_dev is allocated and copied back in this test but never released here, unlike the adjacent GPU tests.

Proposed fix

   deepmd::delete_device_memory(em_dev);
   deepmd::delete_device_memory(em_deriv_dev);
+  deepmd::delete_device_memory(rij_dev);
   deepmd::delete_device_memory(nlist_dev);
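
An alternative to listing every buffer by hand is to tie each release to scope exit, so that a forgotten or skipped `delete_device_memory` call cannot occur. The sketch below uses host memory and a counting deleter purely for demonstration; `HostFree`, `ScopedBuffer`, and `run_with_early_exit` are hypothetical names, and a real version would call the device free function in the deleter.

```cpp
#include <cassert>
#include <memory>

// Stand-in deleter: frees a host array and counts releases. In device code
// the body would call the project's device free routine instead.
struct HostFree {
  int* free_count;
  void operator()(double* p) const {
    delete[] p;
    if (free_count) {
      ++*free_count;
    }
  }
};

using ScopedBuffer = std::unique_ptr<double, HostFree>;

// Simulates a test body with an early exit. Both buffers are released on
// every return path without any explicit cleanup block.
inline bool run_with_early_exit(bool fail_early, int* free_count) {
  ScopedBuffer em(new double[4], HostFree{free_count});
  ScopedBuffer rij(new double[12], HostFree{free_count});
  if (fail_early) {
    return false;  // em and rij are freed here
  }
  em.get()[0] = 1.0;
  rij.get()[0] = 2.0;
  return true;  // and here
}
```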

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 890 - 895, Free the
missing GPU buffer in this cleanup block: the test currently releases em_dev,
em_deriv_dev, nlist_dev, posi_cpy_dev, f_atype_cpy_dev, and atype_cpy_dev, but
omits rij_dev. Update the cleanup path in this test near the existing
deepmd::delete_device_memory calls to also release rij_dev, matching the
adjacent GPU test cleanup patterns and keeping the allocation/freeing symmetry.

🧹 Nitpick comments (1)

source/lib/tests/test_env_mat_a_mix.cc (1)
678-723: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Strengthen the multi-frame assertions beyond em.

Both new multi-frame tests repeat identical coordinates and only compare em, so a frame-offset bug in coordinate reads or in em_deriv/rij/nlist writes could still pass. Perturb one frame’s coordinates within the same neighbor topology and compare the expected em_deriv, rij, and formatted nlist buffers too.

Also applies to: 984-1073
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 678 - 723, The
multi-frame test in test_env_mat_a_mix.cc only validates em, so it can miss
frame-offset or buffer-writing bugs in prod_env_mat_a_cpu. Update the affected
multi-frame cases to perturb one frame’s coordinates while keeping the neighbor
topology valid, then compare the expected outputs for em_deriv, rij, and nlist
in addition to em using the existing prod_env_mat_a_cpu calls and frame-specific
buffers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 249f5166-176c-464a-acd1-d2707922380d

📥 Commits

Reviewing files that changed from the base of the PR and between 1a8586b and 048376b.

📒 Files selected for processing (15)

source/lib/include/fmt_nlist.h
source/lib/include/neighbor_list.h
source/lib/include/prod_env_mat.h
source/lib/src/gpu/neighbor_list.cu
source/lib/src/gpu/prod_env_mat.cu
source/lib/src/neighbor_list.cc
source/lib/src/prod_env_mat.cc
source/lib/tests/test_env_mat_a.cc
source/lib/tests/test_env_mat_a_mix.cc
source/lib/tests/test_env_mat_r.cc
source/lib/tests/test_fmt_nlist.cc
source/lib/tests/test_neighbor_list.cc
source/op/tf/custom_op.h
source/op/tf/neighbor_stat.cc
source/op/tf/prod_env_mat_multi_device.cc

🚧 Files skipped from review as they are similar to previous changes (14)

source/op/tf/custom_op.h
source/lib/include/fmt_nlist.h
source/lib/tests/test_fmt_nlist.cc
source/lib/include/neighbor_list.h
source/lib/include/prod_env_mat.h
source/op/tf/neighbor_stat.cc
source/lib/tests/test_neighbor_list.cc
source/lib/src/neighbor_list.cc
source/lib/tests/test_env_mat_r.cc
source/lib/src/prod_env_mat.cc
source/lib/tests/test_env_mat_a.cc
source/lib/src/gpu/neighbor_list.cu
source/lib/src/gpu/prod_env_mat.cu
source/op/tf/prod_env_mat_multi_device.cc

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

Copilot

Pull request overview

This PR extends DeePMD-kit’s TensorFlow prod_env_mat/neighbor-list pipeline to support processing multiple frames per call (CPU and GPU) for non-LAMMPS neighbor-list modes, improving batching/parallelism across the frame dimension as requested in #2618.

Changes:

Add batched (multi-frame) coord-copy + neighbor-list preparation paths for CPU/GPU in prod_env_mat_multi_device (keeping LAMMPS external neighbor lists on the per-sample path).
Extend core neighbor-list and environment-matrix implementations (CPU/CUDA) with an nframes dimension and update call sites accordingly.
Add/extend unit tests for multi-frame neighbor list and env-mat behavior on CPU/GPU, including negative-type handling.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
source/op/tf/prod_env_mat_multi_device.cc	Adds multi-frame preparation and compute paths for prod-env-mat, including mesh/self-built nlist batching on CPU/GPU.
source/op/tf/neighbor_stat.cc	Updates GPU neighbor-stat path to use the new `_prepare_coord_nlist_gpu` Status-return API and improves device-memory cleanup.
source/op/tf/custom_op.h	Updates `_prepare_coord_nlist_gpu` declaration to return `tensorflow::Status`.
source/lib/src/prod_env_mat.cc	Adds `nframes` support to CPU prod-env-mat implementations and handles negative types consistently.
source/lib/src/gpu/prod_env_mat.cu	Extends GPU neighbor formatting + env-mat kernels to operate over `nframes * nloc`.
source/lib/src/neighbor_list.cc	Extends CPU neighbor-list construction to build `nframes * nloc` rows and optionally filter pairs via per-atom `type`.
source/lib/src/gpu/neighbor_list.cu	Extends GPU neighbor-list construction to support multi-frame layouts and optional type filtering.
source/lib/include/prod_env_mat.h	Updates env-mat APIs with `nframes` and provides single-frame inline wrappers for compatibility.
source/lib/include/neighbor_list.h	Updates neighbor-list APIs/docs to include `nframes` and optional `type` filtering.
source/lib/include/fmt_nlist.h	Updates `format_nbor_list_gpu` signature to accept `nframes` and adds a single-frame wrapper.
source/lib/tests/test_neighbor_list.cc	Adds CPU/GPU multi-frame neighbor-list tests.
source/lib/tests/test_fmt_nlist.cc	Updates GPU format-nlist tests for the new `nframes` parameter.
source/lib/tests/test_env_mat_r.cc	Adds multi-frame + negative-type CPU/GPU coverage for `prod_env_mat_r`.
source/lib/tests/test_env_mat_a.cc	Adds multi-frame CPU/GPU coverage for `prod_env_mat_a`.
source/lib/tests/test_env_mat_a_mix.cc	Fixes/extends A-mix tests and adds multi-frame CPU/GPU coverage.

njzjz-bot

Added directly applicable suggestions for the Copilot findings.

— OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

@njzjz

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: A bot of @njzjz <48687836+njzjz-bot@users.noreply.github.com> Signed-off-by: Jinzhe Zeng <njzjz@qq.com>

for more information, see https://pre-commit.ci

github-actions Bot added Core CUDA ROCM OP labels Jun 24, 2026

dosubot Bot added enhancement new feature labels Jun 24, 2026

njzjz added the Test CUDA Trigger test CUDA workflow label Jun 24, 2026

github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

github-advanced-security AI found potential problems Jun 24, 2026

Comment thread source/op/tf/prod_env_mat_multi_device.cc Fixed

Comment thread source/lib/src/prod_env_mat.cc Fixed

Comment thread source/lib/src/prod_env_mat.cc Fixed

Comment thread source/op/tf/prod_env_mat_multi_device.cc Fixed

njzjz force-pushed the fix/gpu-nlist-prefix-scan branch from 782ce9d to 1a8586b Compare June 25, 2026 07:30

coderabbitai Bot reviewed Jun 25, 2026

Comment thread source/lib/src/gpu/prod_env_mat.cu

feat(core): batch prod env mat over frames

048376b

njzjz force-pushed the fix/gpu-nlist-prefix-scan branch from 1a8586b to 048376b Compare June 25, 2026 10:12

coderabbitai Bot reviewed Jun 25, 2026

njzjz mentioned this pull request Jun 25, 2026

Clarify padded A env-mat normalization semantics across GPU, CPU, and dpmodel #5584

Closed

njzjz added the Test CUDA Trigger test CUDA workflow label Jun 25, 2026

github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 25, 2026

njzjz requested review from iProzd and wanghan-iapcm June 25, 2026 16:37

njzjz linked an issue Jun 25, 2026 that may be closed by this pull request

Clarify padded A env-mat normalization semantics across GPU, CPU, and dpmodel #5584

Closed

wanghan-iapcm reviewed Jun 26, 2026

Comment thread source/lib/tests/test_env_mat_a.cc Outdated

wanghan-iapcm reviewed Jun 26, 2026

Comment thread source/lib/tests/test_env_mat_r.cc

test(core): cover batched GPU env mat outputs

0c4e30d

njzjz requested a review from wanghan-iapcm June 26, 2026 10:30

wanghan-iapcm reviewed Jun 27, 2026

Comment thread source/lib/include/neighbor_list.h

docs(lib): document batched neighbor list inputs

d6ea9eb

njzjz requested a review from wanghan-iapcm June 30, 2026 11:06

iProzd approved these changes Jun 30, 2026

wanghan-iapcm approved these changes Jun 30, 2026

Merge branch 'master' into fix/gpu-nlist-prefix-scan

4c1742b

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>

Copilot AI review requested due to automatic review settings July 1, 2026 02:19

njzjz enabled auto-merge July 1, 2026 02:19

Copilot started reviewing on behalf of njzjz July 1, 2026 02:30 View session

Copilot AI reviewed Jul 1, 2026

Comment thread source/lib/tests/test_neighbor_list.cc

Comment thread source/op/tf/prod_env_mat_multi_device.cc

Comment thread source/op/tf/prod_env_mat_multi_device.cc

njzjz-bot reviewed Jul 1, 2026

Comment thread source/lib/tests/test_neighbor_list.cc

Comment thread source/op/tf/prod_env_mat_multi_device.cc

Comment thread source/op/tf/prod_env_mat_multi_device.cc

Comment thread source/op/tf/prod_env_mat_multi_device.cc

njzjz and others added 2 commits July 1, 2026 14:45

Apply suggestions from code review

c0accc2

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: A bot of @njzjz <48687836+njzjz-bot@users.noreply.github.com> Signed-off-by: Jinzhe Zeng <njzjz@qq.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c98dc29

for more information, see https://pre-commit.ci

njzjz added this pull request to the merge queue Jul 1, 2026

Merged via the queue into deepmodeling:master with commit 5e80487 Jul 1, 2026
57 checks passed

njzjz deleted the fix/gpu-nlist-prefix-scan branch July 1, 2026 16:33

6 participants
