feat(core): batch prod env mat over frames#5582
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR adds multi-frame support to neighbor-list and environment-matrix APIs, refactors CPU and GPU implementations to process flattened frame/atom rows, updates TensorFlow batching paths, and expands tests to cover multi-frame execution. ChangesMulti-frame batch support for neighbor-list and env-mat ops
Sequence Diagram(s)sequenceDiagram
participant ProdEnvMatAOp as ProdEnvMatAOp::_Compute
participant FormatNborListGpu as format_nbor_list_gpu
participant ComputeEnvMatA as compute_env_mat_a
ProdEnvMatAOp->>FormatNborListGpu: call(..., nframes)
FormatNborListGpu->>ComputeEnvMatA: launch over nframes*nloc
ComputeEnvMatA-->>ProdEnvMatAOp: write em, em_deriv, rij
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/lib/src/gpu/prod_env_mat.cu (1)
545-600: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winGuard negative center types in
compute_env_mat_r.
compute_env_mat_areturns before usingtype[i_idx_nall]when the center type is negative, butcompute_env_mat_rdoes not. For filtered rows, therow_nlist[ii] < 0branch can indexavg/stdwithtype[i_idx_nall] < 0, causing device out-of-bounds reads.Proposed fix
const int_64 frame_idx = bid / nloc; const int_64 atom_idx = bid % nloc; const int_64 i_idx_nall = frame_idx * nall + atom_idx; + if (type[i_idx_nall] < 0) { + return; + } const int ndescrpt = nnei;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lib/src/gpu/prod_env_mat.cu` around lines 545 - 600, `compute_env_mat_r` is missing the same negative-center-type guard that `compute_env_mat_a` already has, so `type[i_idx_nall]` can be negative and then used to index `avg/std` in the `row_nlist[ii] < 0` path. Update `compute_env_mat_r` to early-return or otherwise skip processing when the center type is negative, and make sure both the positive and negative neighbor branches never read `avg/std` or write outputs using `type[i_idx_nall]` unless it is valid. Use `compute_env_mat_r`, `compute_env_mat_a`, and `i_idx_nall` to locate the fix.
🧹 Nitpick comments (1)
source/lib/tests/test_env_mat_a.cc (1)
805-867: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffConsider perturbing the second frame's coordinates to strengthen GPU coverage.
Unlike
prod_cpu_multiple_frames(Lines 637-642), this GPU test repeatsposi_cpyidentically for both frames and compares each frame against the sameexpected_env. With identical frames, a kernel bug that mishandles the per-frame coordinate offset (e.g., reading frame 0's coords for frame 1) would not be detected. Applying the same per-frame perturbation used in the CPU test and comparing against per-frame reference outputs would make this test meaningfully verify frame-offset indexing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lib/tests/test_env_mat_a.cc` around lines 805 - 867, The GPU multi-frame test in prod_env_mat_a_gpu is too weak because it uses identical coordinates for every frame, so frame-offset indexing bugs can pass unnoticed. Update the test setup to perturb the second frame’s positions the same way the CPU multi-frame test does, then compute or store per-frame expected_env values and compare each frame against its own reference instead of reusing one shared baseline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/lib/src/prod_env_mat.cc`:
- Around line 204-210: The R-descriptor normalization in prod_env_mat.cc can
still index avg/std with a negative center type when d_type[ii] is -1, which
must be guarded the same way as the A-path. Update the normalization logic
around the frame_em and frame_em_deriv loops to detect negative d_type[ii]
before using it for avg/std lookup, and zero out the corresponding outputs
instead of normalizing. Keep the fix localized to the descriptor-building code
that handles d_type, frame_em, and frame_em_deriv so negative-type filtering is
handled consistently.
In `@source/op/tf/prod_env_mat_multi_device.cc`:
- Around line 702-708: Reject oversized built GPU neighbor lists before rounding
in the built GPU nlist path: in the logic around _build_nlist_gpu and the
chunk_max_nbor_size rounding block, add a guard that checks the reported maximum
against GPU_MAX_NBOR_SIZE before any rounding to 1024/2048/4096, and fail early
if it exceeds the kernel-supported limit. Apply the same fix in the other
duplicated rounding sites so formatting/descriptor kernels never receive an
undersized max_nbor_size for an actually oversized neighbor list.
- Around line 2087-2098: The mesh tensor validation in the neighbor-count loop
allows integer overflow because `neighbor_count` is accumulated as `int` from
tensor data, which can let an invalid buffer size slip through. Update the
bounds check in the mesh parsing path around `numneigh_in`, `neighbor_count`,
and the `mesh_tensor_size` validation to accumulate with a wider unsigned/signed
64-bit type and verify each addition does not exceed the remaining tensor length
before proceeding; apply the same fix in the matching neighbor-count check used
elsewhere in this file.
- Around line 675-737: `firstneigh` is allocated with raw device memory in the
neighbor-list path, but several `OP_REQUIRES`/`OP_REQUIRES_OK` exits can return
before `deepmd::delete_device_memory(firstneigh)` runs, leaking GPU memory.
Update the `prod_env_mat_multi_device` flow that uses `firstneigh` to ensure
cleanup on every early exit, preferably by introducing a small RAII/guard
wrapper around the allocation or switching to a Tensor-owned buffer so the
memory is released automatically even when `_prepare_mesh_nlist_gpu_batch`,
`_build_nlist_gpu`, or later allocations fail. Apply the same pattern in the
other matching `firstneigh` code paths referenced by this diff.
- Around line 2128-2206: The batched CPU helper _prepare_coord_nlist_cpu_batch
currently swallows failures because OP_REQUIRES and OP_REQUIRES_OK only exit the
helper, allowing prod_env_mat_*_cpu to continue with an invalid or uninitialized
inlist. Change _prepare_coord_nlist_cpu_batch to return a tensorflow::Status or
boolean success value, propagate failures from the copy and neighbor-list setup
paths, and update the call sites to wrap it with OP_REQUIRES_OK so descriptor
computation stops immediately on invalid mesh data or allocation errors.
---
Outside diff comments:
In `@source/lib/src/gpu/prod_env_mat.cu`:
- Around line 545-600: `compute_env_mat_r` is missing the same
negative-center-type guard that `compute_env_mat_a` already has, so
`type[i_idx_nall]` can be negative and then used to index `avg/std` in the
`row_nlist[ii] < 0` path. Update `compute_env_mat_r` to early-return or
otherwise skip processing when the center type is negative, and make sure both
the positive and negative neighbor branches never read `avg/std` or write
outputs using `type[i_idx_nall]` unless it is valid. Use `compute_env_mat_r`,
`compute_env_mat_a`, and `i_idx_nall` to locate the fix.
---
Nitpick comments:
In `@source/lib/tests/test_env_mat_a.cc`:
- Around line 805-867: The GPU multi-frame test in prod_env_mat_a_gpu is too
weak because it uses identical coordinates for every frame, so frame-offset
indexing bugs can pass unnoticed. Update the test setup to perturb the second
frame’s positions the same way the CPU multi-frame test does, then compute or
store per-frame expected_env values and compare each frame against its own
reference instead of reusing one shared baseline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 80d02dbc-a6db-44dc-86b9-5586fc945ee1
📒 Files selected for processing (13)
source/lib/include/fmt_nlist.hsource/lib/include/neighbor_list.hsource/lib/include/prod_env_mat.hsource/lib/src/gpu/neighbor_list.cusource/lib/src/gpu/prod_env_mat.cusource/lib/src/neighbor_list.ccsource/lib/src/prod_env_mat.ccsource/lib/tests/test_env_mat_a.ccsource/lib/tests/test_env_mat_a_mix.ccsource/lib/tests/test_env_mat_r.ccsource/lib/tests/test_fmt_nlist.ccsource/lib/tests/test_neighbor_list.ccsource/op/tf/prod_env_mat_multi_device.cc
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5582 +/- ##
==========================================
- Coverage 81.97% 81.82% -0.15%
==========================================
Files 959 960 +1
Lines 105748 106180 +432
Branches 4102 4168 +66
==========================================
+ Hits 86684 86886 +202
- Misses 17573 17796 +223
- Partials 1491 1498 +7 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
782ce9d to
1a8586b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
source/lib/src/prod_env_mat.cc (1)
39-80: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftFlatten CPU frame/atom work instead of serializing frames.
Both CPU paths still run a serial outer frame loop and recreate the OpenMP region per frame, so large
nframesbatches with modestnlocdo not get frame-level parallelism. Consider restructuring around flattened(frame, atom)rows or another single parallel region with frame-local buffers.Also applies to: 146-186
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lib/src/prod_env_mat.cc` around lines 39 - 80, The CPU frame handling in prod_env_mat.cc still processes nframes serially and re-enters the OpenMP region for each frame, leaving frame-level parallelism unused. Refactor the logic around the main frame-processing loop and the existing `#pragma` omp parallel for section so work is flattened across (frame, atom) rows or handled in one outer parallel region with frame-local buffers, using the same frame-em/frame-em_deriv/frame-rij/frame-nlist flow as the current implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/lib/src/gpu/prod_env_mat.cu`:
- Around line 404-407: The A-mix path is indexing the center-type buffer with a
nframes × nall stride in prod_env_mat_a_cpu/prod_env_mat_a_gpu, but some callers
may still pass a compact nframes × nloc type array. Update the A-mix call sites
and tests (especially test_env_mat_a_mix.cc and the multi-device path) so the
center type data is expanded to frame_nall, or switch the kernels to use the
already-expanded f_type/f_atype_cpy argument for center normalization instead of
type. Ensure the chosen buffer matches the posi_cpy expansion and is
consistently sized before the kernel reads frame_idx * nall + atom_idx.
---
Nitpick comments:
In `@source/lib/src/prod_env_mat.cc`:
- Around line 39-80: The CPU frame handling in prod_env_mat.cc still processes
nframes serially and re-enters the OpenMP region for each frame, leaving
frame-level parallelism unused. Refactor the logic around the main
frame-processing loop and the existing `#pragma` omp parallel for section so work
is flattened across (frame, atom) rows or handled in one outer parallel region
with frame-local buffers, using the same
frame-em/frame-em_deriv/frame-rij/frame-nlist flow as the current
implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8a1ce82c-bc3c-4298-b2c4-784ef1861f64
📒 Files selected for processing (15)
source/lib/include/fmt_nlist.hsource/lib/include/neighbor_list.hsource/lib/include/prod_env_mat.hsource/lib/src/gpu/neighbor_list.cusource/lib/src/gpu/prod_env_mat.cusource/lib/src/neighbor_list.ccsource/lib/src/prod_env_mat.ccsource/lib/tests/test_env_mat_a.ccsource/lib/tests/test_env_mat_a_mix.ccsource/lib/tests/test_env_mat_r.ccsource/lib/tests/test_fmt_nlist.ccsource/lib/tests/test_neighbor_list.ccsource/op/tf/custom_op.hsource/op/tf/neighbor_stat.ccsource/op/tf/prod_env_mat_multi_device.cc
🚧 Files skipped from review as they are similar to previous changes (10)
- source/lib/include/fmt_nlist.h
- source/lib/tests/test_fmt_nlist.cc
- source/lib/include/prod_env_mat.h
- source/lib/tests/test_env_mat_a_mix.cc
- source/lib/include/neighbor_list.h
- source/lib/src/neighbor_list.cc
- source/lib/src/gpu/neighbor_list.cu
- source/lib/tests/test_neighbor_list.cc
- source/lib/tests/test_env_mat_a.cc
- source/op/tf/prod_env_mat_multi_device.cc
1a8586b to
048376b
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/lib/tests/test_env_mat_a_mix.cc (1)
890-895: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winFree
rij_devin this cleanup path.
rij_devis allocated and copied back in this test but never released here, unlike the adjacent GPU tests.Proposed fix
deepmd::delete_device_memory(em_dev); deepmd::delete_device_memory(em_deriv_dev); + deepmd::delete_device_memory(rij_dev); deepmd::delete_device_memory(nlist_dev);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 890 - 895, Free the missing GPU buffer in this cleanup block: the test currently releases em_dev, em_deriv_dev, nlist_dev, posi_cpy_dev, f_atype_cpy_dev, and atype_cpy_dev, but omits rij_dev. Update the cleanup path in this test near the existing deepmd::delete_device_memory calls to also release rij_dev, matching the adjacent GPU test cleanup patterns and keeping the allocation/freeing symmetry.
🧹 Nitpick comments (1)
source/lib/tests/test_env_mat_a_mix.cc (1)
678-723: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winStrengthen the multi-frame assertions beyond
em.Both new multi-frame tests repeat identical coordinates and only compare
em, so a frame-offset bug in coordinate reads or inem_deriv/rij/nlistwrites could still pass. Perturb one frame’s coordinates within the same neighbor topology and compare the expectedem_deriv,rij, and formattednlistbuffers too.Also applies to: 984-1073
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lib/tests/test_env_mat_a_mix.cc` around lines 678 - 723, The multi-frame test in test_env_mat_a_mix.cc only validates em, so it can miss frame-offset or buffer-writing bugs in prod_env_mat_a_cpu. Update the affected multi-frame cases to perturb one frame’s coordinates while keeping the neighbor topology valid, then compare the expected outputs for em_deriv, rij, and nlist in addition to em using the existing prod_env_mat_a_cpu calls and frame-specific buffers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@source/lib/tests/test_env_mat_a_mix.cc`:
- Around line 890-895: Free the missing GPU buffer in this cleanup block: the
test currently releases em_dev, em_deriv_dev, nlist_dev, posi_cpy_dev,
f_atype_cpy_dev, and atype_cpy_dev, but omits rij_dev. Update the cleanup path
in this test near the existing deepmd::delete_device_memory calls to also
release rij_dev, matching the adjacent GPU test cleanup patterns and keeping the
allocation/freeing symmetry.
---
Nitpick comments:
In `@source/lib/tests/test_env_mat_a_mix.cc`:
- Around line 678-723: The multi-frame test in test_env_mat_a_mix.cc only
validates em, so it can miss frame-offset or buffer-writing bugs in
prod_env_mat_a_cpu. Update the affected multi-frame cases to perturb one frame’s
coordinates while keeping the neighbor topology valid, then compare the expected
outputs for em_deriv, rij, and nlist in addition to em using the existing
prod_env_mat_a_cpu calls and frame-specific buffers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 249f5166-176c-464a-acd1-d2707922380d
📒 Files selected for processing (15)
source/lib/include/fmt_nlist.hsource/lib/include/neighbor_list.hsource/lib/include/prod_env_mat.hsource/lib/src/gpu/neighbor_list.cusource/lib/src/gpu/prod_env_mat.cusource/lib/src/neighbor_list.ccsource/lib/src/prod_env_mat.ccsource/lib/tests/test_env_mat_a.ccsource/lib/tests/test_env_mat_a_mix.ccsource/lib/tests/test_env_mat_r.ccsource/lib/tests/test_fmt_nlist.ccsource/lib/tests/test_neighbor_list.ccsource/op/tf/custom_op.hsource/op/tf/neighbor_stat.ccsource/op/tf/prod_env_mat_multi_device.cc
🚧 Files skipped from review as they are similar to previous changes (14)
- source/op/tf/custom_op.h
- source/lib/include/fmt_nlist.h
- source/lib/tests/test_fmt_nlist.cc
- source/lib/include/neighbor_list.h
- source/lib/include/prod_env_mat.h
- source/op/tf/neighbor_stat.cc
- source/lib/tests/test_neighbor_list.cc
- source/lib/src/neighbor_list.cc
- source/lib/tests/test_env_mat_r.cc
- source/lib/src/prod_env_mat.cc
- source/lib/tests/test_env_mat_a.cc
- source/lib/src/gpu/neighbor_list.cu
- source/lib/src/gpu/prod_env_mat.cu
- source/op/tf/prod_env_mat_multi_device.cc
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
There was a problem hiding this comment.
Pull request overview
This PR extends DeePMD-kit’s TensorFlow prod_env_mat/neighbor-list pipeline to support processing multiple frames per call (CPU and GPU) for non-LAMMPS neighbor-list modes, improving batching/parallelism across the frame dimension as requested in #2618.
Changes:
- Add batched (multi-frame) coord-copy + neighbor-list preparation paths for CPU/GPU in
prod_env_mat_multi_device(keeping LAMMPS external neighbor lists on the per-sample path). - Extend core neighbor-list and environment-matrix implementations (CPU/CUDA) with an
nframesdimension and update call sites accordingly. - Add/extend unit tests for multi-frame neighbor list and env-mat behavior on CPU/GPU, including negative-type handling.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| source/op/tf/prod_env_mat_multi_device.cc | Adds multi-frame preparation and compute paths for prod-env-mat, including mesh/self-built nlist batching on CPU/GPU. |
| source/op/tf/neighbor_stat.cc | Updates GPU neighbor-stat path to use the new _prepare_coord_nlist_gpu Status-return API and improves device-memory cleanup. |
| source/op/tf/custom_op.h | Updates _prepare_coord_nlist_gpu declaration to return tensorflow::Status. |
| source/lib/src/prod_env_mat.cc | Adds nframes support to CPU prod-env-mat implementations and handles negative types consistently. |
| source/lib/src/gpu/prod_env_mat.cu | Extends GPU neighbor formatting + env-mat kernels to operate over nframes * nloc. |
| source/lib/src/neighbor_list.cc | Extends CPU neighbor-list construction to build nframes * nloc rows and optionally filter pairs via per-atom type. |
| source/lib/src/gpu/neighbor_list.cu | Extends GPU neighbor-list construction to support multi-frame layouts and optional type filtering. |
| source/lib/include/prod_env_mat.h | Updates env-mat APIs with nframes and provides single-frame inline wrappers for compatibility. |
| source/lib/include/neighbor_list.h | Updates neighbor-list APIs/docs to include nframes and optional type filtering. |
| source/lib/include/fmt_nlist.h | Updates format_nbor_list_gpu signature to accept nframes and adds a single-frame wrapper. |
| source/lib/tests/test_neighbor_list.cc | Adds CPU/GPU multi-frame neighbor-list tests. |
| source/lib/tests/test_fmt_nlist.cc | Updates GPU format-nlist tests for the new nframes parameter. |
| source/lib/tests/test_env_mat_r.cc | Adds multi-frame + negative-type CPU/GPU coverage for prod_env_mat_r. |
| source/lib/tests/test_env_mat_a.cc | Adds multi-frame CPU/GPU coverage for prod_env_mat_a. |
| source/lib/tests/test_env_mat_a_mix.cc | Fixes/extends A-mix tests and adds multi-frame CPU/GPU coverage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
njzjz-bot
left a comment
There was a problem hiding this comment.
Added applyable suggestions for the Copilot findings.
— OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: A bot of @njzjz <48687836+njzjz-bot@users.noreply.github.com> Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
for more information, see https://pre-commit.ci
Summary
Fixes #2618
Tests
Summary by CodeRabbit
nframesargument (CPU and GPU).em/em_derivwhere applicable).