
[Code scan] System slicing and first append share mutable data with the source #985

Description

@njzjz

This issue is part of a Codex global repository code scan.

System.sub_system() and the first append into an empty System can return or store mutable objects that are shared with the source system. As a result, mutating the derived system can unexpectedly modify the original, and mutating the original after an initial append can modify the append target.

Affected code:

dpdata/dpdata/system.py

Lines 456 to 465 in a7a50bf

```python
if tt.shape is not None and Axis.NFRAMES in tt.shape:
    axis_nframes = tt.shape.index(Axis.NFRAMES)
    new_shape: list[slice | np.ndarray | list] = [
        slice(None) for _ in self.data[tt.name].shape
    ]
    new_shape[axis_nframes] = f_idx
    tmp.data[tt.name] = self.data[tt.name][tuple(new_shape)]
else:
    # keep the original data
    tmp.data[tt.name] = self.data[tt.name]
```
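One way to address the slicing half is to copy whatever the index expression returns. The sketch below is hypothetical and not the project's actual fix: `take_frames` and `keep_non_frame` are illustrative helper names, and it assumes the same per-key logic as the excerpt above, where frames are selected with a slice, list, or index array. Basic slicing such as `s[:1]` yields a NumPy view that shares memory with the source, so an explicit `.copy()` is needed; keys without a frame axis are deep-copied instead of stored by reference.

```python
import copy

import numpy as np


def take_frames(arr: np.ndarray, axis_nframes: int, f_idx) -> np.ndarray:
    """Select frames along axis_nframes and return an independent array (hypothetical helper)."""
    index = [slice(None)] * arr.ndim
    index[axis_nframes] = f_idx
    # Basic slicing (f_idx is a slice) returns a view; .copy() detaches it.
    # Advanced indexing (list/array) already copies, so this adds one extra copy there.
    return arr[tuple(index)].copy()


def keep_non_frame(value):
    """Copy data that has no frame axis instead of sharing the reference (hypothetical helper)."""
    return copy.deepcopy(value)


coords = np.arange(18.0).reshape(2, 3, 3)
print(np.shares_memory(coords, coords[:1]))  # True: plain slicing gives a view

sub = take_frames(coords, 0, slice(None, 1))
sub[0, 0, 0] = 123
print(coords[0, 0, 0])  # 0.0: the source is unchanged

names = ["O", "H"]
kept = keep_non_frame(names)
kept[0] = "X"
print(names)  # ['O', 'H']
```

The unconditional `.copy()` keeps the code simple at the cost of a redundant copy when `f_idx` is already a list or array; a fix could instead copy only when `f_idx` is a slice.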

dpdata/dpdata/system.py

Lines 476 to 482 in a7a50bf

```python
if not len(system.data["atom_numbs"]):
    # skip if the system to append is non-converged
    return False
elif not len(self.data["atom_numbs"]):
    # this system is non-converged but the system to append is converged
    self.data = system.data.copy()
    return False
```
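The first-append branch uses `dict.copy()`, which is a shallow copy: the new dict is a distinct object, but its values are the same list and array objects held by `system.data`. The minimal stdlib-only illustration below assumes a plain dict standing in for `System.data`. Because `copy.deepcopy` also copies NumPy arrays, replacing the shallow copy with `copy.deepcopy(system.data)` is one possible fix, not necessarily the one the maintainers will choose.

```python
import copy

# Stand-in for System.data (plain lists only, so no NumPy is needed here).
source = {"atom_names": ["O", "H"], "atom_numbs": [1, 2]}

shallow = source.copy()       # mirrors `self.data = system.data.copy()`
deep = copy.deepcopy(source)  # independent copy of every nested value

source["atom_names"][0] = "X"

print(shallow["atom_names"])  # ['X', 'H']: the inner list is shared
print(deep["atom_names"])     # ['O', 'H']: unaffected
print(shallow is source, shallow["atom_names"] is source["atom_names"])  # False True
```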

Minimal reproducer:

```python
import numpy as np
import dpdata

data = {
    "atom_names": ["O", "H"],
    "atom_numbs": [1, 2],
    "atom_types": np.array([1, 0, 1]),
    "orig": np.zeros(3),
    "cells": np.arange(18.0).reshape(2, 3, 3) + np.eye(3),
    "coords": np.arange(18.0).reshape(2, 3, 3),
}

# Case 1: a slice of a System shares arrays with the original.
s = dpdata.System(data={k: (v.copy() if hasattr(v, "copy") else list(v)) for k, v in data.items()})
sub = s[:1]
sub.data["coords"][0, 0, 0] = 123
print(s.data["coords"][0, 0, 0])  # currently prints 123.0

# Case 2: the first append into an empty System keeps references to the source data.
src = dpdata.System(data={k: (v.copy() if hasattr(v, "copy") else list(v)) for k, v in data.items()})
tgt = dpdata.System()
tgt.append(src)
src.data["coords"][0, 0, 0] = 7
src.data["atom_names"][0] = "X"
print(tgt.data["coords"][0, 0, 0], tgt.data["atom_names"])  # currently prints 7.0 ['X', 'H']
```

The current output shows the aliasing:

```
123.0
7.0 ['X', 'H']
```

Expected behavior: sub_system() should return independent data, and the first-append path should deep-copy the source data, consistent with the other append paths.
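To guard against regressions, a test could check that two `data` dicts share no mutable storage. The helper below is a hypothetical sketch, not part of dpdata: it compares NumPy arrays with `np.shares_memory` and lists or dicts by identity, and it assumes the values are top-level arrays, lists, or dicts as in the reproducer (nested containers are not inspected). After a fix, `shared_keys(s.data, sub.data)` and `shared_keys(src.data, tgt.data)` would be expected to return an empty list.

```python
import numpy as np


def shared_keys(a: dict, b: dict) -> list:
    """Return the keys whose values in a and b alias the same mutable storage."""
    shared = []
    for key in a.keys() & b.keys():
        va, vb = a[key], b[key]
        if isinstance(va, np.ndarray) and isinstance(vb, np.ndarray):
            # True for views and for the very same array object.
            if np.shares_memory(va, vb):
                shared.append(key)
        elif isinstance(va, (list, dict)) and va is vb:
            # Lists and dicts count as shared only if they are the same object.
            shared.append(key)
    return sorted(shared)


base = {"coords": np.zeros((2, 3, 3)), "atom_names": ["O", "H"], "orig": np.zeros(3)}
aliased = {"coords": base["coords"][:1], "atom_names": base["atom_names"], "orig": base["orig"].copy()}
print(shared_keys(base, aliased))  # ['atom_names', 'coords']
```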
