Skip to content

feat(core): batch prod env mat over frames#5582

Merged
njzjz merged 6 commits into
deepmodeling:masterfrom
njzjz:fix/gpu-nlist-prefix-scan
Jul 1, 2026
Merged

feat(core): batch prod env mat over frames#5582
njzjz merged 6 commits into
deepmodeling:masterfrom
njzjz:fix/gpu-nlist-prefix-scan

Conversation

@njzjz

@njzjz njzjz commented Jun 24, 2026

Copy link
Copy Markdown
Member

Summary

  • batch CPU and GPU prod env mat over frames for non-LAMMPS neighbor lists
  • support no-PBC self-built nlists, manual PBC coord copy, and mesh-provided nlists
  • keep LAMMPS external nlists on the existing per-sample path
  • add CPU/GPU multiple-frame neighbor-list and env-mat coverage

Fixes #2618

Tests

  • cmake --build source/build-codex-cuda --target runUnitTests_lib deepmd_op -j2
  • source/build-codex-cuda/lib/tests/runUnitTests_lib --gtest_filter=TestNeighborList.cpu:TestNeighborList.cpu_multiple_frames:TestNeighborList.cpu_lessmem:TestEnvMatA.prod_cpu:TestEnvMatA.prod_cpu_equal_cpu:TestEnvMatA.prod_cpu_multiple_frames:TestEnvMatR.prod_cpu:TestEnvMatR.prod_cpu_equal_cpu:TestEnvMatR.prod_cpu_multiple_frames:TestEnvMatAMix.prod_cpu:TestEnvMatAMix.prod_cpu_equal_cpu
  • srun --gres=gpu:1 source/build-codex-cuda/lib/tests/runUnitTests_lib --gtest_filter=TestEnvMatA.prod_gpu_multiple_frames:TestNeighborList.gpu_multiple_frames:TestEnvMatA.prod_gpu:TestEnvMatA.prod_gpu_equal_cpu:TestEnvMatR.prod_gpu:TestEnvMatR.prod_gpu_equal_cpu:TestFormatNlist.gpu:TestFormatNlistShortSel.gpu
  • pytest source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_pbc_self_built_nlist source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_nopbc_self_built_nlist source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_pbc_self_built_nlist_deriv source/tests/tf/test_prod_env_mat.py::TestProdEnvMat::test_nopbc_self_built_nlist_deriv -q

Summary by CodeRabbit

  • New Features
    • Neighbor-list, neighbor formatting, and environment-matrix operations now support multi-frame processing in a single call via an nframes argument (CPU and GPU).
    • Neighbor-list construction can optionally filter candidates using a per-atom type array.
  • Bug Fixes
    • Invalid/filtered center or neighbor types are handled consistently (including zeroing em/em_deriv where applicable).
    • Improved GPU robustness with safer temporary-memory management and clearer error handling.
  • Tests
    • Added/extended CPU and GPU tests to validate multi-frame neighbor-list and environment-matrix behavior, including negative-type cases.

@njzjz njzjz added the Test CUDA Trigger test CUDA workflow label Jun 24, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR adds multi-frame support to neighbor-list and environment-matrix APIs, refactors CPU and GPU implementations to process flattened frame/atom rows, updates TensorFlow batching paths, and expands tests to cover multi-frame execution.

Changes

Multi-frame batch support for neighbor-list and env-mat ops

Layer / File(s) Summary
Public API contracts and wrappers
source/lib/include/neighbor_list.h, source/lib/include/fmt_nlist.h, source/lib/include/prod_env_mat.h
build_nlist_cpu, build_nlist_gpu, format_nbor_list_gpu, and all four prod_env_mat_* templates add nframes (and optional type) parameters, with inline wrappers preserving previous call shapes.
CPU multi-frame implementations
source/lib/src/neighbor_list.cc, source/lib/src/prod_env_mat.cc
build_nlist_cpu and prod_env_mat_a_cpu/prod_env_mat_r_cpu iterate over frames, use per-frame offsets, apply frame-aware type filtering, and update explicit instantiations.
GPU multi-frame neighbor-list and env-mat kernels
source/lib/src/gpu/neighbor_list.cu, source/lib/src/gpu/prod_env_mat.cu
GPU neighbor-list and env-mat kernels and host functions switch to nframes * nloc indexing, frame-aware coordinate/type lookups, and updated explicit instantiations.
TensorFlow op batching and helper wiring
source/op/tf/custom_op.h, source/op/tf/neighbor_stat.cc, source/op/tf/prod_env_mat_multi_device.cc
Updates GPU helper return handling, adds RAII cleanup, and introduces batched CPU/GPU coord-normalization, neighbor-list preparation, and ProdEnvMat*Op execution paths.
Multi-frame validation tests
source/lib/tests/test_neighbor_list.cc, source/lib/tests/test_fmt_nlist.cc, source/lib/tests/test_env_mat_a.cc, source/lib/tests/test_env_mat_a_mix.cc, source/lib/tests/test_env_mat_r.cc
Adds frame-repetition helpers, multi-frame CPU/GPU tests, and explicit nframes = 1 updates in existing GPU call sites.

Sequence Diagram(s)

sequenceDiagram
  participant ProdEnvMatAOp as ProdEnvMatAOp::_Compute
  participant FormatNborListGpu as format_nbor_list_gpu
  participant ComputeEnvMatA as compute_env_mat_a
  ProdEnvMatAOp->>FormatNborListGpu: call(..., nframes)
  FormatNborListGpu->>ComputeEnvMatA: launch over nframes*nloc
  ComputeEnvMatA-->>ProdEnvMatAOp: write em, em_deriv, rij
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

C++

Suggested reviewers

  • wanghan-iapcm
  • denghuilu
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: batching prod_env_mat over frames.
Linked Issues check ✅ Passed The core prod_env_mat CPU/GPU paths now process multiple frames, flattening frame and atom indices to parallelize the frame dimension.
Out of Scope Changes check ✅ Passed The changes are all tied to multi-frame env-mat and neighbor-list support, with tests and API updates serving the same goal.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
source/lib/src/gpu/prod_env_mat.cu (1)

545-600: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard negative center types in compute_env_mat_r.

compute_env_mat_a returns before using type[i_idx_nall] when the center type is negative, but compute_env_mat_r does not. For filtered rows, the row_nlist[ii] < 0 branch can index avg/std with type[i_idx_nall] < 0, causing device out-of-bounds reads.

Proposed fix
   const int_64 frame_idx = bid / nloc;
   const int_64 atom_idx = bid % nloc;
   const int_64 i_idx_nall = frame_idx * nall + atom_idx;
+  if (type[i_idx_nall] < 0) {
+    return;
+  }
   const int ndescrpt = nnei;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/src/gpu/prod_env_mat.cu` around lines 545 - 600,
`compute_env_mat_r` is missing the same negative-center-type guard that
`compute_env_mat_a` already has, so `type[i_idx_nall]` can be negative and then
used to index `avg/std` in the `row_nlist[ii] < 0` path. Update
`compute_env_mat_r` to early-return or otherwise skip processing when the center
type is negative, and make sure both the positive and negative neighbor branches
never read `avg/std` or write outputs using `type[i_idx_nall]` unless it is
valid. Use `compute_env_mat_r`, `compute_env_mat_a`, and `i_idx_nall` to locate
the fix.
🧹 Nitpick comments (1)
source/lib/tests/test_env_mat_a.cc (1)

805-867: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Consider perturbing the second frame's coordinates to strengthen GPU coverage.

Unlike prod_cpu_multiple_frames (Lines 637-642), this GPU test repeats posi_cpy identically for both frames and compares each frame against the same expected_env. With identical frames, a kernel bug that mishandles the per-frame coordinate offset (e.g., reading frame 0's coords for frame 1) would not be detected. Applying the same per-frame perturbation used in the CPU test and comparing against per-frame reference outputs would make this test meaningfully verify frame-offset indexing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a.cc` around lines 805 - 867, The GPU
multi-frame test in prod_env_mat_a_gpu is too weak because it uses identical
coordinates for every frame, so frame-offset indexing bugs can pass unnoticed.
Update the test setup to perturb the second frame’s positions the same way the
CPU multi-frame test does, then compute or store per-frame expected_env values
and compare each frame against its own reference instead of reusing one shared
baseline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/lib/src/prod_env_mat.cc`:
- Around line 204-210: The R-descriptor normalization in prod_env_mat.cc can
still index avg/std with a negative center type when d_type[ii] is -1, which
must be guarded the same way as the A-path. Update the normalization logic
around the frame_em and frame_em_deriv loops to detect negative d_type[ii]
before using it for avg/std lookup, and zero out the corresponding outputs
instead of normalizing. Keep the fix localized to the descriptor-building code
that handles d_type, frame_em, and frame_em_deriv so negative-type filtering is
handled consistently.

In `@source/op/tf/prod_env_mat_multi_device.cc`:
- Around line 702-708: Reject oversized built GPU neighbor lists before rounding
in the built GPU nlist path: in the logic around _build_nlist_gpu and the
chunk_max_nbor_size rounding block, add a guard that checks the reported maximum
against GPU_MAX_NBOR_SIZE before any rounding to 1024/2048/4096, and fail early
if it exceeds the kernel-supported limit. Apply the same fix in the other
duplicated rounding sites so formatting/descriptor kernels never receive an
undersized max_nbor_size for an actually oversized neighbor list.
- Around line 2087-2098: The mesh tensor validation in the neighbor-count loop
allows integer overflow because `neighbor_count` is accumulated as `int` from
tensor data, which can let an invalid buffer size slip through. Update the
bounds check in the mesh parsing path around `numneigh_in`, `neighbor_count`,
and the `mesh_tensor_size` validation to accumulate with a wider unsigned/signed
64-bit type and verify each addition does not exceed the remaining tensor length
before proceeding; apply the same fix in the matching neighbor-count check used
elsewhere in this file.
- Around line 675-737: `firstneigh` is allocated with raw device memory in the
neighbor-list path, but several `OP_REQUIRES`/`OP_REQUIRES_OK` exits can return
before `deepmd::delete_device_memory(firstneigh)` runs, leaking GPU memory.
Update the `prod_env_mat_multi_device` flow that uses `firstneigh` to ensure
cleanup on every early exit, preferably by introducing a small RAII/guard
wrapper around the allocation or switching to a Tensor-owned buffer so the
memory is released automatically even when `_prepare_mesh_nlist_gpu_batch`,
`_build_nlist_gpu`, or later allocations fail. Apply the same pattern in the
other matching `firstneigh` code paths referenced by this diff.
- Around line 2128-2206: The batched CPU helper _prepare_coord_nlist_cpu_batch
currently swallows failures because OP_REQUIRES and OP_REQUIRES_OK only exit the
helper, allowing prod_env_mat_*_cpu to continue with an invalid or uninitialized
inlist. Change _prepare_coord_nlist_cpu_batch to return a tensorflow::Status or
boolean success value, propagate failures from the copy and neighbor-list setup
paths, and update the call sites to wrap it with OP_REQUIRES_OK so descriptor
computation stops immediately on invalid mesh data or allocation errors.

---

Outside diff comments:
In `@source/lib/src/gpu/prod_env_mat.cu`:
- Around line 545-600: `compute_env_mat_r` is missing the same
negative-center-type guard that `compute_env_mat_a` already has, so
`type[i_idx_nall]` can be negative and then used to index `avg/std` in the
`row_nlist[ii] < 0` path. Update `compute_env_mat_r` to early-return or
otherwise skip processing when the center type is negative, and make sure both
the positive and negative neighbor branches never read `avg/std` or write
outputs using `type[i_idx_nall]` unless it is valid. Use `compute_env_mat_r`,
`compute_env_mat_a`, and `i_idx_nall` to locate the fix.

---

Nitpick comments:
In `@source/lib/tests/test_env_mat_a.cc`:
- Around line 805-867: The GPU multi-frame test in prod_env_mat_a_gpu is too
weak because it uses identical coordinates for every frame, so frame-offset
indexing bugs can pass unnoticed. Update the test setup to perturb the second
frame’s positions the same way the CPU multi-frame test does, then compute or
store per-frame expected_env values and compare each frame against its own
reference instead of reusing one shared baseline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 80d02dbc-a6db-44dc-86b9-5586fc945ee1

📥 Commits

Reviewing files that changed from the base of the PR and between b52c359 and 782ce9d.

📒 Files selected for processing (13)
  • source/lib/include/fmt_nlist.h
  • source/lib/include/neighbor_list.h
  • source/lib/include/prod_env_mat.h
  • source/lib/src/gpu/neighbor_list.cu
  • source/lib/src/gpu/prod_env_mat.cu
  • source/lib/src/neighbor_list.cc
  • source/lib/src/prod_env_mat.cc
  • source/lib/tests/test_env_mat_a.cc
  • source/lib/tests/test_env_mat_a_mix.cc
  • source/lib/tests/test_env_mat_r.cc
  • source/lib/tests/test_fmt_nlist.cc
  • source/lib/tests/test_neighbor_list.cc
  • source/op/tf/prod_env_mat_multi_device.cc

Comment thread source/lib/src/prod_env_mat.cc Outdated
Comment thread source/op/tf/prod_env_mat_multi_device.cc Outdated
Comment thread source/op/tf/prod_env_mat_multi_device.cc Outdated
Comment thread source/op/tf/prod_env_mat_multi_device.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc Outdated
Comment thread source/op/tf/prod_env_mat_multi_device.cc Fixed
Comment thread source/lib/src/prod_env_mat.cc Fixed
Comment thread source/lib/src/prod_env_mat.cc Fixed
Comment thread source/op/tf/prod_env_mat_multi_device.cc Fixed
@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.13924% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.82%. Comparing base (0e5c170) to head (c98dc29).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
source/op/tf/prod_env_mat_multi_device.cc 80.39% 15 Missing and 25 partials ⚠️
source/lib/src/neighbor_list.cc 81.81% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5582      +/-   ##
==========================================
- Coverage   81.97%   81.82%   -0.15%     
==========================================
  Files         959      960       +1     
  Lines      105748   106180     +432     
  Branches     4102     4168      +66     
==========================================
+ Hits        86684    86886     +202     
- Misses      17573    17796     +223     
- Partials     1491     1498       +7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@njzjz njzjz force-pushed the fix/gpu-nlist-prefix-scan branch from 782ce9d to 1a8586b Compare June 25, 2026 07:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/lib/src/prod_env_mat.cc (1)

39-80: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Flatten CPU frame/atom work instead of serializing frames.

Both CPU paths still run a serial outer frame loop and recreate the OpenMP region per frame, so large nframes batches with modest nloc do not get frame-level parallelism. Consider restructuring around flattened (frame, atom) rows or another single parallel region with frame-local buffers.

Also applies to: 146-186

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/src/prod_env_mat.cc` around lines 39 - 80, The CPU frame handling
in prod_env_mat.cc still processes nframes serially and re-enters the OpenMP
region for each frame, leaving frame-level parallelism unused. Refactor the
logic around the main frame-processing loop and the existing `#pragma` omp
parallel for section so work is flattened across (frame, atom) rows or handled
in one outer parallel region with frame-local buffers, using the same
frame-em/frame-em_deriv/frame-rij/frame-nlist flow as the current
implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/lib/src/gpu/prod_env_mat.cu`:
- Around line 404-407: The A-mix path is indexing the center-type buffer with a
nframes × nall stride in prod_env_mat_a_cpu/prod_env_mat_a_gpu, but some callers
may still pass a compact nframes × nloc type array. Update the A-mix call sites
and tests (especially test_env_mat_a_mix.cc and the multi-device path) so the
center type data is expanded to frame_nall, or switch the kernels to use the
already-expanded f_type/f_atype_cpy argument for center normalization instead of
type. Ensure the chosen buffer matches the posi_cpy expansion and is
consistently sized before the kernel reads frame_idx * nall + atom_idx.

---

Nitpick comments:
In `@source/lib/src/prod_env_mat.cc`:
- Around line 39-80: The CPU frame handling in prod_env_mat.cc still processes
nframes serially and re-enters the OpenMP region for each frame, leaving
frame-level parallelism unused. Refactor the logic around the main
frame-processing loop and the existing `#pragma` omp parallel for section so work
is flattened across (frame, atom) rows or handled in one outer parallel region
with frame-local buffers, using the same
frame-em/frame-em_deriv/frame-rij/frame-nlist flow as the current
implementation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8a1ce82c-bc3c-4298-b2c4-784ef1861f64

📥 Commits

Reviewing files that changed from the base of the PR and between 782ce9d and 1a8586b.

📒 Files selected for processing (15)
  • source/lib/include/fmt_nlist.h
  • source/lib/include/neighbor_list.h
  • source/lib/include/prod_env_mat.h
  • source/lib/src/gpu/neighbor_list.cu
  • source/lib/src/gpu/prod_env_mat.cu
  • source/lib/src/neighbor_list.cc
  • source/lib/src/prod_env_mat.cc
  • source/lib/tests/test_env_mat_a.cc
  • source/lib/tests/test_env_mat_a_mix.cc
  • source/lib/tests/test_env_mat_r.cc
  • source/lib/tests/test_fmt_nlist.cc
  • source/lib/tests/test_neighbor_list.cc
  • source/op/tf/custom_op.h
  • source/op/tf/neighbor_stat.cc
  • source/op/tf/prod_env_mat_multi_device.cc
🚧 Files skipped from review as they are similar to previous changes (10)
  • source/lib/include/fmt_nlist.h
  • source/lib/tests/test_fmt_nlist.cc
  • source/lib/include/prod_env_mat.h
  • source/lib/tests/test_env_mat_a_mix.cc
  • source/lib/include/neighbor_list.h
  • source/lib/src/neighbor_list.cc
  • source/lib/src/gpu/neighbor_list.cu
  • source/lib/tests/test_neighbor_list.cc
  • source/lib/tests/test_env_mat_a.cc
  • source/op/tf/prod_env_mat_multi_device.cc

Comment thread source/lib/src/gpu/prod_env_mat.cu
@njzjz njzjz force-pushed the fix/gpu-nlist-prefix-scan branch from 1a8586b to 048376b Compare June 25, 2026 10:12

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
source/lib/tests/test_env_mat_a_mix.cc (1)

890-895: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Free rij_dev in this cleanup path.

rij_dev is allocated and copied back in this test but never released here, unlike the adjacent GPU tests.

Proposed fix
   deepmd::delete_device_memory(em_dev);
   deepmd::delete_device_memory(em_deriv_dev);
+  deepmd::delete_device_memory(rij_dev);
   deepmd::delete_device_memory(nlist_dev);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 890 - 895, Free the
missing GPU buffer in this cleanup block: the test currently releases em_dev,
em_deriv_dev, nlist_dev, posi_cpy_dev, f_atype_cpy_dev, and atype_cpy_dev, but
omits rij_dev. Update the cleanup path in this test near the existing
deepmd::delete_device_memory calls to also release rij_dev, matching the
adjacent GPU test cleanup patterns and keeping the allocation/freeing symmetry.
🧹 Nitpick comments (1)
source/lib/tests/test_env_mat_a_mix.cc (1)

678-723: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Strengthen the multi-frame assertions beyond em.

Both new multi-frame tests repeat identical coordinates and only compare em, so a frame-offset bug in coordinate reads or in em_deriv/rij/nlist writes could still pass. Perturb one frame’s coordinates within the same neighbor topology and compare the expected em_deriv, rij, and formatted nlist buffers too.

Also applies to: 984-1073

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 678 - 723, The
multi-frame test in test_env_mat_a_mix.cc only validates em, so it can miss
frame-offset or buffer-writing bugs in prod_env_mat_a_cpu. Update the affected
multi-frame cases to perturb one frame’s coordinates while keeping the neighbor
topology valid, then compare the expected outputs for em_deriv, rij, and nlist
in addition to em using the existing prod_env_mat_a_cpu calls and frame-specific
buffers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@source/lib/tests/test_env_mat_a_mix.cc`:
- Around line 890-895: Free the missing GPU buffer in this cleanup block: the
test currently releases em_dev, em_deriv_dev, nlist_dev, posi_cpy_dev,
f_atype_cpy_dev, and atype_cpy_dev, but omits rij_dev. Update the cleanup path
in this test near the existing deepmd::delete_device_memory calls to also
release rij_dev, matching the adjacent GPU test cleanup patterns and keeping the
allocation/freeing symmetry.

---

Nitpick comments:
In `@source/lib/tests/test_env_mat_a_mix.cc`:
- Around line 678-723: The multi-frame test in test_env_mat_a_mix.cc only
validates em, so it can miss frame-offset or buffer-writing bugs in
prod_env_mat_a_cpu. Update the affected multi-frame cases to perturb one frame’s
coordinates while keeping the neighbor topology valid, then compare the expected
outputs for em_deriv, rij, and nlist in addition to em using the existing
prod_env_mat_a_cpu calls and frame-specific buffers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 249f5166-176c-464a-acd1-d2707922380d

📥 Commits

Reviewing files that changed from the base of the PR and between 1a8586b and 048376b.

📒 Files selected for processing (15)
  • source/lib/include/fmt_nlist.h
  • source/lib/include/neighbor_list.h
  • source/lib/include/prod_env_mat.h
  • source/lib/src/gpu/neighbor_list.cu
  • source/lib/src/gpu/prod_env_mat.cu
  • source/lib/src/neighbor_list.cc
  • source/lib/src/prod_env_mat.cc
  • source/lib/tests/test_env_mat_a.cc
  • source/lib/tests/test_env_mat_a_mix.cc
  • source/lib/tests/test_env_mat_r.cc
  • source/lib/tests/test_fmt_nlist.cc
  • source/lib/tests/test_neighbor_list.cc
  • source/op/tf/custom_op.h
  • source/op/tf/neighbor_stat.cc
  • source/op/tf/prod_env_mat_multi_device.cc
🚧 Files skipped from review as they are similar to previous changes (14)
  • source/op/tf/custom_op.h
  • source/lib/include/fmt_nlist.h
  • source/lib/tests/test_fmt_nlist.cc
  • source/lib/include/neighbor_list.h
  • source/lib/include/prod_env_mat.h
  • source/op/tf/neighbor_stat.cc
  • source/lib/tests/test_neighbor_list.cc
  • source/lib/src/neighbor_list.cc
  • source/lib/tests/test_env_mat_r.cc
  • source/lib/src/prod_env_mat.cc
  • source/lib/tests/test_env_mat_a.cc
  • source/lib/src/gpu/neighbor_list.cu
  • source/lib/src/gpu/prod_env_mat.cu
  • source/op/tf/prod_env_mat_multi_device.cc

@njzjz njzjz added the Test CUDA Trigger test CUDA workflow label Jun 25, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Jun 25, 2026
@njzjz njzjz requested review from iProzd and wanghan-iapcm June 25, 2026 16:37
Comment thread source/lib/tests/test_env_mat_a.cc Outdated
Comment thread source/lib/tests/test_env_mat_r.cc
@njzjz njzjz requested a review from wanghan-iapcm June 26, 2026 10:30
Comment thread source/lib/include/neighbor_list.h
@njzjz njzjz requested a review from wanghan-iapcm June 30, 2026 11:06
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
Copilot AI review requested due to automatic review settings July 1, 2026 02:19
@njzjz njzjz enabled auto-merge July 1, 2026 02:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends DeePMD-kit’s TensorFlow prod_env_mat/neighbor-list pipeline to support processing multiple frames per call (CPU and GPU) for non-LAMMPS neighbor-list modes, improving batching/parallelism across the frame dimension as requested in #2618.

Changes:

  • Add batched (multi-frame) coord-copy + neighbor-list preparation paths for CPU/GPU in prod_env_mat_multi_device (keeping LAMMPS external neighbor lists on the per-sample path).
  • Extend core neighbor-list and environment-matrix implementations (CPU/CUDA) with an nframes dimension and update call sites accordingly.
  • Add/extend unit tests for multi-frame neighbor list and env-mat behavior on CPU/GPU, including negative-type handling.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
source/op/tf/prod_env_mat_multi_device.cc Adds multi-frame preparation and compute paths for prod-env-mat, including mesh/self-built nlist batching on CPU/GPU.
source/op/tf/neighbor_stat.cc Updates GPU neighbor-stat path to use the new _prepare_coord_nlist_gpu Status-return API and improves device-memory cleanup.
source/op/tf/custom_op.h Updates _prepare_coord_nlist_gpu declaration to return tensorflow::Status.
source/lib/src/prod_env_mat.cc Adds nframes support to CPU prod-env-mat implementations and handles negative types consistently.
source/lib/src/gpu/prod_env_mat.cu Extends GPU neighbor formatting + env-mat kernels to operate over nframes * nloc.
source/lib/src/neighbor_list.cc Extends CPU neighbor-list construction to build nframes * nloc rows and optionally filter pairs via per-atom type.
source/lib/src/gpu/neighbor_list.cu Extends GPU neighbor-list construction to support multi-frame layouts and optional type filtering.
source/lib/include/prod_env_mat.h Updates env-mat APIs with nframes and provides single-frame inline wrappers for compatibility.
source/lib/include/neighbor_list.h Updates neighbor-list APIs/docs to include nframes and optional type filtering.
source/lib/include/fmt_nlist.h Updates format_nbor_list_gpu signature to accept nframes and adds a single-frame wrapper.
source/lib/tests/test_neighbor_list.cc Adds CPU/GPU multi-frame neighbor-list tests.
source/lib/tests/test_fmt_nlist.cc Updates GPU format-nlist tests for the new nframes parameter.
source/lib/tests/test_env_mat_r.cc Adds multi-frame + negative-type CPU/GPU coverage for prod_env_mat_r.
source/lib/tests/test_env_mat_a.cc Adds multi-frame CPU/GPU coverage for prod_env_mat_a.
source/lib/tests/test_env_mat_a_mix.cc Fixes/extends A-mix tests and adds multi-frame CPU/GPU coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/lib/tests/test_neighbor_list.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc

@njzjz-bot njzjz-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added applyable suggestions for the Copilot findings.

— OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Comment thread source/lib/tests/test_neighbor_list.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc
Comment thread source/op/tf/prod_env_mat_multi_device.cc
njzjz and others added 2 commits July 1, 2026 14:45
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: A bot of @njzjz <48687836+njzjz-bot@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
@njzjz njzjz added this pull request to the merge queue Jul 1, 2026
Merged via the queue into deepmodeling:master with commit 5e80487 Jul 1, 2026
57 checks passed
@njzjz njzjz deleted the fix/gpu-nlist-prefix-scan branch July 1, 2026 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

6 participants