From be9d87bcb06e5a19f6e27196ac6a6c1889c4deb1 Mon Sep 17 00:00:00 2001
From: "A bot of @njzjz" <48687836+njzjz-bot@users.noreply.github.com>
Date: Tue, 10 Mar 2026 14:33:57 +0800
Subject: [PATCH 1/8] sync(skills): feat: add LAMMPS DeePMD skill (#13)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.

Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@155a6755a8902ac30c4da2ba8ee7238b5e0b70d6
Upstream-Paths:
- machine-learning-potentials/deepmd-finetune-dpa3
- machine-learning-potentials/deepmd-python-inference
- machine-learning-potentials/deepmd-train-dpa3
- machine-learning-potentials/deepmd-train-se-e2-a
- molecular-dynamics/lammps-deepmd
---
 skills/lammps-deepmd/SKILL.md                 | 291 ++++++++++++++++++
 skills/lammps-deepmd/assets/input.nvt.lammps  |  25 ++
 .../references/commands-and-workflow.md       |  54 ++++
 3 files changed, 370 insertions(+)
 create mode 100644 skills/lammps-deepmd/SKILL.md
 create mode 100644 skills/lammps-deepmd/assets/input.nvt.lammps
 create mode 100644 skills/lammps-deepmd/references/commands-and-workflow.md

diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md
new file mode 100644
index 0000000000..a64808e9f3
--- /dev/null
+++ b/skills/lammps-deepmd/SKILL.md
@@ -0,0 +1,291 @@
+---
+name: lammps-deepmd
+description: Run molecular dynamics simulations in LAMMPS with the DeePMD-kit plugin, including preparing input scripts, choosing ensembles such as NVE/NVT/NPT, validating commands against LAMMPS documentation, and executing jobs either with `uvx --from lammps --with deepmd-kit[gpu,torch] lmp` when internet access is available or with a user-specified offline LAMMPS executable.
+compatibility: Requires LAMMPS with DeePMD-kit support. Online mode prefers `uvx --from lammps --with deepmd-kit[gpu,torch] lmp`; offline mode requires a user-provided LAMMPS executable or module.
+license: MIT
+metadata:
+  author: OpenClaw
+  version: "1.0"
+  repository: https://github.com/deepmodeling/deepmd-kit
+  lammps_docs: https://docs.lammps.org/
+---
+
+# LAMMPS + DeePMD-kit
+
+Use this skill when the user wants to run molecular dynamics in LAMMPS with a DeePMD-kit potential, prepare or explain an `input.lammps` file, or switch between common ensembles such as NVE, NVT, and NPT.
+
+## Agent responsibilities
+
+1. Confirm the available execution mode:
+   - **Online mode**: if internet access is available and `uv` is installed, prefer
+     `uvx --from lammps --with deepmd-kit[gpu,torch] lmp ...`
+   - **Offline mode**: do **not** guess the executable. Ask the user which LAMMPS command, module, or container should be used.
+2. Confirm the minimum simulation inputs:
+   - structure/data file (for example `data.system`)
+   - DeePMD model file (for example `graph.pb` or compressed model)
+   - target ensemble (NVE, NVT, NPT, or another explicitly requested setup)
+   - temperature, pressure if applicable, timestep, and total number of steps
+3. Write the LAMMPS input script yourself instead of asking the user to hand-write it.
+4. Keep the example readable and fully explained. If you include an example input script, explain what **every command** does.
+5. When possible, validate command availability against the LAMMPS docs or local `lmp -h` output before execution.
+6. Report clearly which command was run, which files were used, and where outputs were written.
+
+## Decide the execution mode
+
+### Online mode (preferred when internet access is available)
+
+Use:
+
+```bash
+uvx --from lammps --with deepmd-kit[gpu,torch] lmp -in input.lammps
+```
+
+If you need to inspect the local command-line help:
+
+```bash
+uvx --from lammps --with deepmd-kit[gpu,torch] lmp -h | tee /dev/tty
+```
+
+Notes:
+- This is the preferred path because it can provision LAMMPS and DeePMD-kit on demand.
+- The `gpu,torch` extras match the requested runtime pattern from the user.
+- If the environment is slow or the packages are large, warn the user that the first run may take time.
+
+### Offline mode
+
+If internet access is unavailable or the user explicitly wants a site-installed binary, ask a concrete question such as:
+
+- "Which LAMMPS executable should I use, for example `lmp`, `lmp_mpi`, `mpirun -np 8 lmp`, or an HPC module command?"
+- "Do you already have a DeePMD-enabled LAMMPS build on this machine or cluster?"
+
+Do not invent a binary name or module name.
+
+## Minimal information to collect
+
+Ask only for what is missing:
+
+- DeePMD model path
+- LAMMPS data file path
+- ensemble
+- target temperature
+- target pressure if using NPT
+- timestep
+- run length in steps
+- whether velocities should be generated from scratch
+- preferred execution command if offline
+
+## Recommended workflow
+
+1. Inspect available files in the working directory.
+2. Draft `input.lammps`.
+3. Explain the script to the user if they asked for an explanation or if the script is nontrivial.
+4. Run a short smoke test first when reasonable.
+5. Run the full simulation.
+6. Summarize outputs such as `log.lammps`, dump trajectories, restart files, and thermodynamic data.
+
+## Example: annotated NVT input
+
+The following example is adapted from the user-provided tutorial pattern and slightly generalized. See also `assets/input.nvt.lammps`.
+
+```lammps
+variable NSTEPS equal 1000000
+variable THERMO_FREQ equal 1000
+variable DUMP_FREQ equal 1000
+variable TEMP equal 300.0
+variable TAU_T equal 0.1
+
+units metal
+boundary p p p
+atom_style atomic
+
+neighbor 1.0 bin
+
+read_data data.system
+pair_style deepmd graph_compressed.pb
+pair_coeff * *
+
+thermo_style custom step temp pe ke etotal press vol lx ly lz xy xz yz
+thermo ${THERMO_FREQ}
+dump 1 all custom ${DUMP_FREQ} traj.lammpstrj id type x y z
+
+velocity all create ${TEMP} 743574
+fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}
+
+timestep 0.0005
+run ${NSTEPS}
+```
+
+### What every command means
+
+- `variable NSTEPS equal 1000000`
+  - Defines a numeric variable called `NSTEPS` with value `1000000`.
+  - Used later by `run ${NSTEPS}` so the run length is easy to modify in one place.
+
+- `variable THERMO_FREQ equal 1000`
+  - Defines how often LAMMPS prints thermodynamic information.
+  - Used by `thermo ${THERMO_FREQ}`.
+
+- `variable DUMP_FREQ equal 1000`
+  - Defines how often coordinates are written to the trajectory dump.
+
+- `variable TEMP equal 300.0`
+  - Sets the target temperature in the current unit system.
+  - Because `units metal` is used below, this temperature is interpreted in kelvin.
+
+- `variable TAU_T equal 0.1`
+  - Sets the thermostat damping parameter used by the NVT fix.
+  - In `metal` units this is in picoseconds.
+
+- `units metal`
+  - Selects the LAMMPS `metal` unit system.
+  - This determines the physical meaning of timestep, temperature, pressure, energy, distance, and time.
+  - In this unit system, distances are in angstrom, time is in picoseconds, and the timestep should be chosen accordingly.
+
+- `boundary p p p`
+  - Applies periodic boundary conditions in x, y, and z.
+  - Suitable for bulk condensed-phase simulations.
+
+- `atom_style atomic`
+  - Uses the `atomic` atom style, appropriate when atoms have no explicit bonds, angles, or molecular topology in the force field description.
+  - Common for DeePMD simulations of condensed phases when the structure is provided as atoms in a box.
+
+- `neighbor 1.0 bin`
+  - Sets the neighbor-list skin distance to `1.0` in the current distance unit.
+  - Uses the `bin` neighbor-building method.
+  - Neighbor lists help LAMMPS efficiently find nearby atoms for force evaluation.
+
+- `read_data data.system`
+  - Reads the initial atomic structure, atom types, simulation box, and related information from the LAMMPS data file `data.system`.
+  - Replace this filename with the actual user file.
+
+- `pair_style deepmd graph_compressed.pb`
+  - Selects the DeePMD pair style.
+  - Loads the DeePMD model from `graph_compressed.pb`.
+  - Replace the model filename with the actual model path, for example `graph.pb`, `graph-compress.pb`, or another supported exported model.
+
+- `pair_coeff * *`
+  - Activates the previously selected pair style for all atom types.
+  - For DeePMD this often takes the simple form `* *` because the mapping is embedded in the model workflow rather than through conventional pairwise parameters.
+
+- `thermo_style custom step temp pe ke etotal press vol lx ly lz xy xz yz`
+  - Chooses exactly which thermodynamic quantities to print.
+  - `step`: timestep index.
+  - `temp`: instantaneous temperature.
+  - `pe`: potential energy.
+  - `ke`: kinetic energy.
+  - `etotal`: total energy.
+  - `press`: pressure.
+  - `vol`: box volume.
+  - `lx ly lz`: box lengths.
+  - `xy xz yz`: triclinic tilt factors, which are harmless to print even for an orthogonal box.
+
+- `thermo ${THERMO_FREQ}`
+  - Prints the thermo block every `THERMO_FREQ` timesteps.
+
+- `dump 1 all custom ${DUMP_FREQ} traj.lammpstrj id type x y z`
+  - Creates dump ID `1`.
+  - Dumps atoms from group `all`.
+  - Uses the `custom` dump format.
+  - Writes every `DUMP_FREQ` steps.
+  - Saves to `traj.lammpstrj`.
+  - Outputs per-atom columns `id type x y z`.
+
+- `velocity all create ${TEMP} 743574`
+  - Assigns random initial velocities to all atoms.
+  - The target temperature is `TEMP`.
+  - `743574` is the random seed.
+  - Use this when starting a fresh MD trajectory. If restarting from a previous equilibrated state, this command may be unnecessary.
+
+- `fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}`
+  - Creates fix ID `1` on group `all`.
+  - Applies the Nose-Hoover NVT thermostat.
+  - The target temperature is ramped from `${TEMP}` to `${TEMP}`, meaning constant temperature here.
+  - `${TAU_T}` is the thermostat damping constant.
+
+- `timestep 0.0005`
+  - Sets the MD timestep.
+  - In `metal` units, `0.0005` means `0.0005 ps = 0.5 fs`.
+  - The safe choice depends on the system and model quality.
+
+- `run ${NSTEPS}`
+  - Runs molecular dynamics for `NSTEPS` timesteps.
+
+## Common ensemble modifications
+
+### NVE
+
+Replace the NVT thermostat line with:
+
+```lammps
+fix 1 all nve
+```
+
+Meaning:
+- integrates Newton's equations in the microcanonical ensemble
+- no thermostat or barostat is applied
+- useful for short stability checks or production runs after equilibration
+
+### NPT
+
+A typical isotropic NPT alternative is:
+
+```lammps
+variable PRESS equal 1.0
+variable TAU_P equal 1.0
+fix 1 all npt temp ${TEMP} ${TEMP} ${TAU_T} iso ${PRESS} ${PRESS} ${TAU_P}
+```
+
+Meaning:
+- `PRESS` is the target pressure
+- `TAU_P` is the barostat damping constant
+- `iso` applies isotropic pressure control to the simulation box
+- this simultaneously thermostats and barostats the system
+
+When using NPT, it is often useful to keep `vol`, `lx`, `ly`, and `lz` in the thermo output.
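The thermostat and barostat damping constants used with `fix nvt` and `fix npt` can be sanity-checked against the timestep. The sketch below is a hypothetical Python helper, not part of LAMMPS or DeePMD-kit. It assumes the rule of thumb from the LAMMPS `fix nvt`/`fix npt` documentation (roughly 100 timesteps for temperature damping and 1000 timesteps for pressure damping) and renders a matching isotropic `fix npt` line. The function names, the default fix ID, and the default group are illustrative assumptions.

```python
def suggest_damping(timestep_ps):
    """Return (tau_t, tau_p) in ps from the ~100x / ~1000x timestep rule of thumb."""
    return 100.0 * timestep_ps, 1000.0 * timestep_ps


def npt_fix_line(temp_k, press_bar, timestep_ps, fix_id="1", group="all"):
    """Render an isotropic `fix npt` command for LAMMPS `metal` units (K, bar, ps)."""
    tau_t, tau_p = suggest_damping(timestep_ps)
    return (
        f"fix {fix_id} {group} npt temp {temp_k:g} {temp_k:g} {tau_t:g} "
        f"iso {press_bar:g} {press_bar:g} {tau_p:g}"
    )


if __name__ == "__main__":
    # Hypothetical inputs: 300 K, 1 bar, and a 0.5 fs timestep.
    print(npt_fix_line(300.0, 1.0, 0.0005))
```

Treat the suggested values as a starting point only. Stiff or very soft systems may need different damping, and the choice should still be confirmed with the user.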
+
+## Execution templates
+
+### Online run
+
+```bash
+uvx --from lammps --with deepmd-kit[gpu,torch] lmp -in input.lammps
+```
+
+### Online help
+
+```bash
+uvx --from lammps --with deepmd-kit[gpu,torch] lmp -h | tee /dev/tty
+```
+
+### Offline run
+
+Only after the user specifies the executable, use a command such as one of these exact patterns:
+
+```bash
+lmp -in input.lammps
+mpirun -np 8 lmp_mpi -in input.lammps
+srun lmp -in input.lammps
+```
+
+The agent must not choose one of these on its own without user guidance in offline mode.
+
+## Output checklist
+
+After a run, report at least:
+
+- executed command
+- input script path
+- data file path
+- model path
+- main log path
+- trajectory path if any
+- whether the run completed successfully
+- any obvious warnings or errors from the log
+
+## References
+
+- LAMMPS command categories: https://docs.lammps.org/Commands_category.html
+- LAMMPS command index: https://docs.lammps.org/Commands_all.html
+- DeePMD-kit: https://github.com/deepmodeling/deepmd-kit
+- User-provided tutorial reference: https://github.com/tongzhugroup/Chapter13-tutorial/blob/master/input.lammps
+- Detailed notes: `references/commands-and-workflow.md`
diff --git a/skills/lammps-deepmd/assets/input.nvt.lammps b/skills/lammps-deepmd/assets/input.nvt.lammps
new file mode 100644
index 0000000000..17bd72cc86
--- /dev/null
+++ b/skills/lammps-deepmd/assets/input.nvt.lammps
@@ -0,0 +1,25 @@
+variable NSTEPS equal 1000000
+variable THERMO_FREQ equal 1000
+variable DUMP_FREQ equal 1000
+variable TEMP equal 300.0
+variable TAU_T equal 0.1
+
+units metal
+boundary p p p
+atom_style atomic
+
+neighbor 1.0 bin
+
+read_data data.system
+pair_style deepmd graph_compressed.pb
+pair_coeff * *
+
+thermo_style custom step temp pe ke etotal press vol lx ly lz xy xz yz
+thermo ${THERMO_FREQ}
+dump 1 all custom ${DUMP_FREQ} traj.lammpstrj id type x y z
+
+velocity all create ${TEMP} 743574
+fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}
+
+timestep 0.0005
+run ${NSTEPS}
diff --git a/skills/lammps-deepmd/references/commands-and-workflow.md b/skills/lammps-deepmd/references/commands-and-workflow.md
new file mode 100644
index 0000000000..41942437a6
--- /dev/null
+++ b/skills/lammps-deepmd/references/commands-and-workflow.md
@@ -0,0 +1,54 @@
+# LAMMPS + DeePMD-kit Reference Notes
+
+This reference expands the main skill with practical operating guidance.
+
+## When to use this skill
+
+Use this skill when a user needs to:
+
+- run LAMMPS with a DeePMD-kit model
+- write or modify `input.lammps`
+- explain what a LAMMPS command does
+- switch between NVE, NVT, and NPT
+- run through `uvx` in an internet-connected environment
+- run through a site-installed `lmp` command in an offline or HPC environment
+
+## Practical rules for agents
+
+1. Prefer small, explicit input scripts over clever but opaque templates.
+2. Explain every command in the example script, because many users treat the example as a starting point for their own production run.
+3. If the user asks to run a simulation, always confirm the structure file and DeePMD model file before execution.
+4. If the user asks for offline execution, ask which exact LAMMPS command should be used instead of guessing.
+5. If the user only asks for a template, do not overcomplicate it with advanced computes or fixes unless they are needed.
+
+## Suggested smoke test strategy
+
+Before a long production run, consider a short test such as:
+
+```lammps
+run 100
+```
+
+This helps catch obvious issues such as:
+
+- missing model file
+- unsupported pair style in the local LAMMPS build
+- malformed data file
+- immediate numerical instability
+
+Then replace the short run with the intended production length.
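One way to produce a smoke-test copy of an input script without editing the production file is sketched below. It is a hypothetical Python helper that assumes the script sets its run length through a `variable NSTEPS equal ...` line, as the example input in this skill does. Scripts that use a different variable name or a literal `run N` would need a different pattern.

```python
import re

# Matches a line such as "variable NSTEPS equal 1000000". The name NSTEPS follows
# the example input in this skill; it is an assumption, not a LAMMPS convention.
_NSTEPS_LINE = re.compile(
    r"^([ \t]*variable[ \t]+NSTEPS[ \t]+equal[ \t]+)\S+", re.MULTILINE
)


def make_smoke_test_script(script_text, steps=100):
    """Return a copy of a LAMMPS input script with NSTEPS shortened to `steps`."""
    new_text, count = _NSTEPS_LINE.subn(
        lambda m: f"{m.group(1)}{steps}", script_text
    )
    if count == 0:
        # Fail loudly rather than silently running the full-length simulation.
        raise ValueError("no 'variable NSTEPS equal ...' line found")
    return new_text
```

Writing the result to a separate file (for example a hypothetical `input.smoke.lammps`) keeps the production script untouched, so the full run only needs the original file.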
+
+## Typical files in a DeePMD-LAMMPS job
+
+- `input.lammps`: input script
+- `data.system`: atomic structure and box
+- `graph.pb` or `graph_compressed.pb`: DeePMD model
+- `log.lammps`: main textual log
+- `traj.lammpstrj`: trajectory output
+
+## Caution points
+
+- The correct timestep depends on the physical system and the DeePMD model quality.
+- `velocity ... create ...` should usually not be repeated when continuing from a restart.
+- NPT settings need physically sensible damping constants; avoid copying values blindly.
+- Some local LAMMPS builds may support DeePMD under slightly different package configurations. Check `lmp -h` if unsure.

From c3b2ff941a7f67d9ce8e932ca84837a8a6b22dfc Mon Sep 17 00:00:00 2001
From: Jinzhe Zeng
Date: Sat, 14 Mar 2026 21:37:39 +0800
Subject: [PATCH 2/8] sync(skills): style: add mdformat hook (#26)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.

Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@d8ff524bba415d2a0c55e603a6235f567063efb4
Upstream-Paths:
- machine-learning-potentials/deepmd-finetune-dpa3
- machine-learning-potentials/deepmd-python-inference
- machine-learning-potentials/deepmd-train-dpa3
- machine-learning-potentials/deepmd-train-se-e2-a
- molecular-dynamics/lammps-deepmd
---
 skills/lammps-deepmd/SKILL.md                 | 44 ++++++++++++++-----
 .../references/commands-and-workflow.md       |  8 ++--
 2 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md
index a64808e9f3..d90f653f0e 100644
--- a/skills/lammps-deepmd/SKILL.md
+++ b/skills/lammps-deepmd/SKILL.md
@@ -5,7 +5,7 @@ compatibility: Requires LAMMPS with DeePMD-kit support.
Online mode prefers `uvx license: MIT metadata: author: OpenClaw - version: "1.0" + version: '1.0' repository: https://github.com/deepmodeling/deepmd-kit lammps_docs: https://docs.lammps.org/ --- @@ -20,15 +20,15 @@ Use this skill when the user wants to run molecular dynamics in LAMMPS with a De - **Online mode**: if internet access is available and `uv` is installed, prefer `uvx --from lammps --with deepmd-kit[gpu,torch] lmp ...` - **Offline mode**: do **not** guess the executable. Ask the user which LAMMPS command, module, or container should be used. -2. Confirm the minimum simulation inputs: +1. Confirm the minimum simulation inputs: - structure/data file (for example `data.system`) - DeePMD model file (for example `graph.pb` or compressed model) - target ensemble (NVE, NVT, NPT, or another explicitly requested setup) - temperature, pressure if applicable, timestep, and total number of steps -3. Write the LAMMPS input script yourself instead of asking the user to hand-write it. -4. Keep the example readable and fully explained. If you include an example input script, explain what **every command** does. -5. When possible, validate command availability against the LAMMPS docs or local `lmp -h` output before execution. -6. Report clearly which command was run, which files were used, and where outputs were written. +1. Write the LAMMPS input script yourself instead of asking the user to hand-write it. +1. Keep the example readable and fully explained. If you include an example input script, explain what **every command** does. +1. When possible, validate command availability against the LAMMPS docs or local `lmp -h` output before execution. +1. Report clearly which command was run, which files were used, and where outputs were written. ## Decide the execution mode @@ -47,6 +47,7 @@ uvx --from lammps --with deepmd-kit[gpu,torch] lmp -h | tee /dev/tty ``` Notes: + - This is the preferred path because it can provision LAMMPS and DeePMD-kit on demand. 
- The `gpu,torch` extras match the requested runtime pattern from the user. - If the environment is slow or the packages are large, warn the user that the first run may take time. @@ -77,11 +78,11 @@ Ask only for what is missing: ## Recommended workflow 1. Inspect available files in the working directory. -2. Draft `input.lammps`. -3. Explain the script to the user if they asked for an explanation or if the script is nontrivial. -4. Run a short smoke test first when reasonable. -5. Run the full simulation. -6. Summarize outputs such as `log.lammps`, dump trajectories, restart files, and thermodynamic data. +1. Draft `input.lammps`. +1. Explain the script to the user if they asked for an explanation or if the script is nontrivial. +1. Run a short smoke test first when reasonable. +1. Run the full simulation. +1. Summarize outputs such as `log.lammps`, dump trajectories, restart files, and thermodynamic data. ## Example: annotated NVT input @@ -118,56 +119,69 @@ run ${NSTEPS} ### What every command means - `variable NSTEPS equal 1000000` + - Defines a numeric variable called `NSTEPS` with value `1000000`. - Used later by `run ${NSTEPS}` so the run length is easy to modify in one place. - `variable THERMO_FREQ equal 1000` + - Defines how often LAMMPS prints thermodynamic information. - Used by `thermo ${THERMO_FREQ}`. - `variable DUMP_FREQ equal 1000` + - Defines how often coordinates are written to the trajectory dump. - `variable TEMP equal 300.0` + - Sets the target temperature in the current unit system. - Because `units metal` is used below, this temperature is interpreted in kelvin. - `variable TAU_T equal 0.1` + - Sets the thermostat damping parameter used by the NVT fix. - In `metal` units this is in picoseconds. - `units metal` + - Selects the LAMMPS `metal` unit system. - This determines the physical meaning of timestep, temperature, pressure, energy, distance, and time. 
- In this unit system, distances are in angstrom, time is in picoseconds, and the timestep should be chosen accordingly. - `boundary p p p` + - Applies periodic boundary conditions in x, y, and z. - Suitable for bulk condensed-phase simulations. - `atom_style atomic` + - Uses the `atomic` atom style, appropriate when atoms have no explicit bonds, angles, or molecular topology in the force field description. - Common for DeePMD simulations of condensed phases when the structure is provided as atoms in a box. - `neighbor 1.0 bin` + - Sets the neighbor-list skin distance to `1.0` in the current distance unit. - Uses the `bin` neighbor-building method. - Neighbor lists help LAMMPS efficiently find nearby atoms for force evaluation. - `read_data data.system` + - Reads the initial atomic structure, atom types, simulation box, and related information from the LAMMPS data file `data.system`. - Replace this filename with the actual user file. - `pair_style deepmd graph_compressed.pb` + - Selects the DeePMD pair style. - Loads the DeePMD model from `graph_compressed.pb`. - Replace the model filename with the actual model path, for example `graph.pb`, `graph-compress.pb`, or another supported exported model. - `pair_coeff * *` + - Activates the previously selected pair style for all atom types. - For DeePMD this often takes the simple form `* *` because the mapping is embedded in the model workflow rather than through conventional pairwise parameters. - `thermo_style custom step temp pe ke etotal press vol lx ly lz xy xz yz` + - Chooses exactly which thermodynamic quantities to print. - `step`: timestep index. - `temp`: instantaneous temperature. @@ -180,9 +194,11 @@ run ${NSTEPS} - `xy xz yz`: triclinic tilt factors, which are harmless to print even for an orthogonal box. - `thermo ${THERMO_FREQ}` + - Prints the thermo block every `THERMO_FREQ` timesteps. - `dump 1 all custom ${DUMP_FREQ} traj.lammpstrj id type x y z` + - Creates dump ID `1`. - Dumps atoms from group `all`. 
- Uses the `custom` dump format. @@ -191,23 +207,27 @@ run ${NSTEPS} - Outputs per-atom columns `id type x y z`. - `velocity all create ${TEMP} 743574` + - Assigns random initial velocities to all atoms. - The target temperature is `TEMP`. - `743574` is the random seed. - Use this when starting a fresh MD trajectory. If restarting from a previous equilibrated state, this command may be unnecessary. - `fix 1 all nvt temp ${TEMP} ${TEMP} ${TAU_T}` + - Creates fix ID `1` on group `all`. - Applies the Nose-Hoover NVT thermostat. - The target temperature is ramped from `${TEMP}` to `${TEMP}`, meaning constant temperature here. - `${TAU_T}` is the thermostat damping constant. - `timestep 0.0005` + - Sets the MD timestep. - In `metal` units, `0.0005` means `0.0005 ps = 0.5 fs`. - The safe choice depends on the system and model quality. - `run ${NSTEPS}` + - Runs molecular dynamics for `NSTEPS` timesteps. ## Common ensemble modifications @@ -221,6 +241,7 @@ fix 1 all nve ``` Meaning: + - integrates Newton's equations in the microcanonical ensemble - no thermostat or barostat is applied - useful for short stability checks or production runs after equilibration @@ -236,6 +257,7 @@ fix 1 all npt temp ${TEMP} ${TEMP} ${TAU_T} iso ${PRESS} ${PRESS} ${ ``` Meaning: + - `PRESS` is the target pressure - `TAU_P` is the barostat damping constant - `iso` applies isotropic pressure control to the simulation box diff --git a/skills/lammps-deepmd/references/commands-and-workflow.md b/skills/lammps-deepmd/references/commands-and-workflow.md index 41942437a6..5dcb0f917d 100644 --- a/skills/lammps-deepmd/references/commands-and-workflow.md +++ b/skills/lammps-deepmd/references/commands-and-workflow.md @@ -16,10 +16,10 @@ Use this skill when a user needs to: ## Practical rules for agents 1. Prefer small, explicit input scripts over clever but opaque templates. -2. 
Explain every command in the example script, because many users treat the example as a starting point for their own production run. -3. If the user asks to run a simulation, always confirm the structure file and DeePMD model file before execution. -4. If the user asks for offline execution, ask which exact LAMMPS command should be used instead of guessing. -5. If the user only asks for a template, do not overcomplicate it with advanced computes or fixes unless they are needed. +1. Explain every command in the example script, because many users treat the example as a starting point for their own production run. +1. If the user asks to run a simulation, always confirm the structure file and DeePMD model file before execution. +1. If the user asks for offline execution, ask which exact LAMMPS command should be used instead of guessing. +1. If the user only asks for a template, do not overcomplicate it with advanced computes or fixes unless they are needed. ## Suggested smoke test strategy From 373ce6d8beb10df9b8ea1a24510bcd92ebcbecd4 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Sun, 15 Mar 2026 20:07:14 +0800 Subject: [PATCH 3/8] sync(skills): fix(lammps-deepmd): pin lammps version with `lmp` extra (#30) Imported from jinzhezenggroup/computational-chemistry-agent-skills. 
Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@3b9a0dfa034ec2405c592d9691f6c5e80993c705 Upstream-Paths: - machine-learning-potentials/deepmd-finetune-dpa3 - machine-learning-potentials/deepmd-python-inference - machine-learning-potentials/deepmd-train-dpa3 - machine-learning-potentials/deepmd-train-se-e2-a - molecular-dynamics/lammps-deepmd --- skills/lammps-deepmd/SKILL.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md index d90f653f0e..df78c7bce3 100644 --- a/skills/lammps-deepmd/SKILL.md +++ b/skills/lammps-deepmd/SKILL.md @@ -1,7 +1,7 @@ --- name: lammps-deepmd -description: Run molecular dynamics simulations in LAMMPS with the DeePMD-kit plugin, including preparing input scripts, choosing ensembles such as NVE/NVT/NPT, validating commands against LAMMPS documentation, and executing jobs either with `uvx --from lammps --with deepmd-kit[gpu,torch] lmp` when internet access is available or with a user-specified offline LAMMPS executable. -compatibility: Requires LAMMPS with DeePMD-kit support. Online mode prefers `uvx --from lammps --with deepmd-kit[gpu,torch] lmp`; offline mode requires a user-provided LAMMPS executable or module. +description: Run molecular dynamics simulations in LAMMPS with the DeePMD-kit plugin, including preparing input scripts, choosing ensembles such as NVE/NVT/NPT, validating commands against LAMMPS documentation, and executing jobs either with `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp` when internet access is available or with a user-specified offline LAMMPS executable. +compatibility: Requires LAMMPS with DeePMD-kit support. Online mode prefers `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp`; offline mode requires a user-provided LAMMPS executable or module. 
license: MIT metadata: author: OpenClaw @@ -18,7 +18,7 @@ Use this skill when the user wants to run molecular dynamics in LAMMPS with a De 1. Confirm the available execution mode: - **Online mode**: if internet access is available and `uv` is installed, prefer - `uvx --from lammps --with deepmd-kit[gpu,torch] lmp ...` + `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp ...` - **Offline mode**: do **not** guess the executable. Ask the user which LAMMPS command, module, or container should be used. 1. Confirm the minimum simulation inputs: - structure/data file (for example `data.system`) @@ -37,19 +37,19 @@ Use this skill when the user wants to run molecular dynamics in LAMMPS with a De Use: ```bash -uvx --from lammps --with deepmd-kit[gpu,torch] lmp -in input.lammps +uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp -in input.lammps ``` If you need to inspect the local command-line help: ```bash -uvx --from lammps --with deepmd-kit[gpu,torch] lmp -h | tee /dev/tty +uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp -h | tee /dev/tty ``` Notes: - This is the preferred path because it can provision LAMMPS and DeePMD-kit on demand. -- The `gpu,torch` extras match the requested runtime pattern from the user. +- The `gpu,torch,lmp` extras match the requested runtime pattern from the user. - If the environment is slow or the packages are large, warn the user that the first run may take time. 
### Offline mode @@ -270,13 +270,13 @@ When using NPT, it is often useful to keep `vol`, `lx`, `ly`, and `lz` in the th ### Online run ```bash -uvx --from lammps --with deepmd-kit[gpu,torch] lmp -in input.lammps +uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp -in input.lammps ``` ### Online help ```bash -uvx --from lammps --with deepmd-kit[gpu,torch] lmp -h | tee /dev/tty +uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp -h | tee /dev/tty ``` ### Offline run From 89464832f6d74a1b17162ac16f85a4b207f0c21c Mon Sep 17 00:00:00 2001 From: mwDing <148040278+light-cyan@users.noreply.github.com> Date: Sat, 21 Mar 2026 12:30:54 +0800 Subject: [PATCH 4/8] sync(skills): imp(description): update all SKILL.md files (#37) Imported from jinzhezenggroup/computational-chemistry-agent-skills. Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@0fbe96d2890fd1810c9fdbc834234c2d8582d9cb Upstream-Paths: - machine-learning-potentials/deepmd-finetune-dpa3 - machine-learning-potentials/deepmd-python-inference - machine-learning-potentials/deepmd-train-dpa3 - machine-learning-potentials/deepmd-train-se-e2-a - molecular-dynamics/lammps-deepmd --- skills/lammps-deepmd/SKILL.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md index df78c7bce3..a46a697ebd 100644 --- a/skills/lammps-deepmd/SKILL.md +++ b/skills/lammps-deepmd/SKILL.md @@ -1,6 +1,8 @@ --- name: lammps-deepmd -description: Run molecular dynamics simulations in LAMMPS with the DeePMD-kit plugin, including preparing input scripts, choosing ensembles such as NVE/NVT/NPT, validating commands against LAMMPS documentation, and executing jobs either with `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp` when internet access is available or with a user-specified offline LAMMPS executable. +description: > + A tool and knowledge base for running molecular dynamics (MD) simulations in LAMMPS with the DeePMD-kit plugin. 
It handles input script preparation, ensemble selection (NVE/NVT/NPT), and job execution via `uv` or offline binaries. + USE WHEN you need to set up, write, explain, or execute a LAMMPS molecular dynamics simulation using a DeePMD machine learning potential (e.g., `graph.pb`). compatibility: Requires LAMMPS with DeePMD-kit support. Online mode prefers `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp`; offline mode requires a user-provided LAMMPS executable or module. license: MIT metadata: From 9f0efaf30793b5f2702a81abc2a56222f0bc7ffa Mon Sep 17 00:00:00 2001 From: Duo <50307526+iProzd@users.noreply.github.com> Date: Sat, 21 Mar 2026 16:49:03 +0800 Subject: [PATCH 5/8] sync(skills): Add skills for deepmd training/finetune/inference (#35) Imported from jinzhezenggroup/computational-chemistry-agent-skills. Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@0a9651d1a4e5aad79a64cf0ca51b42d1123f2056 Upstream-Paths: - machine-learning-potentials/deepmd-finetune-dpa3 - machine-learning-potentials/deepmd-python-inference - machine-learning-potentials/deepmd-train-dpa3 - machine-learning-potentials/deepmd-train-se-e2-a - molecular-dynamics/lammps-deepmd --- skills/deepmd-finetune-dpa3/SKILL.md | 416 ++++++++++++++++++++++++ skills/deepmd-python-inference/SKILL.md | 300 +++++++++++++++++ skills/deepmd-train-dpa3/SKILL.md | 308 ++++++++++++++++++ skills/deepmd-train-se-e2-a/SKILL.md | 264 +++++++++++++++ 4 files changed, 1288 insertions(+) create mode 100644 skills/deepmd-finetune-dpa3/SKILL.md create mode 100644 skills/deepmd-python-inference/SKILL.md create mode 100644 skills/deepmd-train-dpa3/SKILL.md create mode 100644 skills/deepmd-train-se-e2-a/SKILL.md diff --git a/skills/deepmd-finetune-dpa3/SKILL.md b/skills/deepmd-finetune-dpa3/SKILL.md new file mode 100644 index 0000000000..31393b4450 --- /dev/null +++ b/skills/deepmd-finetune-dpa3/SKILL.md @@ -0,0 +1,416 @@ +--- +name: deepmd-finetune-dpa3 +description: Fine-tune a DPA3 model in 
DeePMD-kit using the PyTorch backend. Use when the user wants to adapt a pre-trained DPA3 model to a new downstream dataset. Supports fine-tuning from a self-trained DPA3 model (.pt checkpoint), from a multi-task pre-trained model, or from a built-in pretrained model downloaded via `dp pretrained download` (e.g., DPA-3.1-3M, DPA-3.2-5M). Covers single-task and multi-task fine-tuning workflows. +compatibility: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. +license: LGPL-3.0 +metadata: + author: iProzd + version: '1.0' + repository: https://github.com/deepmodeling/deepmd-kit +--- + +# DeePMD-kit Fine-tuning: DPA3 + +Fine-tune a pre-trained DPA3 model on a downstream dataset. This skill covers three scenarios: + +1. Fine-tuning from a self-trained single-task DPA3 model +1. Fine-tuning from a multi-task pre-trained DPA3 model +1. Fine-tuning from a built-in pretrained model (e.g., DPA-3.1-3M, DPA-3.2-5M) downloaded via `dp pretrained download` + +## Quick Start + +```bash +# Fine-tune from a self-trained model +dp --pt train input.json --finetune pretrained.pt --use-pretrain-script + +# Fine-tune from a built-in pretrained model +dp pretrained download DPA-3.2-5M +dp --pt train input.json --finetune /path/to/DPA-3.2-5M.pt --use-pretrain-script --model-branch OMat24 +``` + +## Agent Responsibilities + +1. Determine the fine-tuning scenario: + - Does the user have a self-trained `.pt` model? + - Does the user want to use a built-in pretrained model (DPA-3.1-3M, DPA-3.2-5M, etc.)? + - Is the pre-trained model single-task or multi-task? +1. If using a built-in pretrained model, download it first with `dp pretrained download`. +1. Collect the downstream training data paths and element types. +1. Generate the fine-tuning `input.json`. +1. Run fine-tuning and monitor the learning curve. +1. Freeze and test the fine-tuned model. 
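The scenario chosen in the first responsibility mostly determines which flags go on the `dp --pt train` command line. A minimal sketch of that mapping, assuming a hypothetical helper named `build_finetune_command` (it is not part of DeePMD-kit) that uses only the flags shown in Quick Start:

```python
from typing import List, Optional


def build_finetune_command(
    input_json: str,
    pretrained: str,
    model_branch: Optional[str] = None,
    use_pretrain_script: bool = True,
) -> List[str]:
    """Assemble the argument list for a fine-tuning run.

    Hypothetical helper for illustration: it only concatenates flags that
    appear in this skill and does not call DeePMD-kit itself.
    """
    cmd = ["dp", "--pt", "train", input_json, "--finetune", pretrained]
    if use_pretrain_script:
        # Inherit the model architecture from the pre-trained model.
        cmd.append("--use-pretrain-script")
    if model_branch is not None:
        # Only meaningful when the pre-trained model is multi-task.
        cmd += ["--model-branch", model_branch]
    return cmd


if __name__ == "__main__":
    # Self-trained single-task model.
    print(" ".join(build_finetune_command("input.json", "pretrained.pt")))
    # Built-in multi-task model with an explicit branch.
    print(" ".join(build_finetune_command(
        "input.json", "/path/to/DPA-3.2-5M.pt", model_branch="OMat24")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the input file and model path have been confirmed.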
+ +## Scenario 1: Fine-tune from a Self-trained Single-task Model + +When you have trained a DPA3 model yourself and want to adapt it to new data. + +### Step 1: Prepare input.json + +When using `--use-pretrain-script`, the model architecture is inherited from the pre-trained model. You only need to specify `type_map`, data paths, and training parameters: + +```json +{ + "model": { + "type_map": [ + "O", + "H" + ], + "descriptor": {}, + "fitting_net": {} + }, + "learning_rate": { + "type": "exp", + "decay_steps": 5000, + "start_lr": 0.0001, + "stop_lr": 3e-06 + }, + "loss": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60, + "start_pref_v": 0.02, + "limit_pref_v": 1 + }, + "optimizer": { + "type": "AdamW", + "weight_decay": 0.001 + }, + "training": { + "training_data": { + "systems": [ + "./downstream_data/train_0", + "./downstream_data/train_1" + ], + "batch_size": 1 + }, + "validation_data": { + "systems": [ + "./downstream_data/valid_0" + ], + "batch_size": 1 + }, + "numb_steps": 200000, + "gradient_max_norm": 5.0, + "seed": 10, + "disp_file": "lcurve.out", + "disp_freq": 100, + "save_freq": 2000 + } +} +``` + +Fine-tuning tips: + +- Use a smaller `start_lr` (e.g., 1e-4) than training from scratch (1e-3). +- Use fewer `numb_steps` since the model is already pre-trained. +- The elements in the downstream data must be a subset of the pre-trained model's `type_map`. + +### Step 2: Run Fine-tuning + +```bash +dp --pt train input.json --finetune pretrained.pt --use-pretrain-script +``` + +The `--use-pretrain-script` flag tells DeePMD-kit to inherit the model architecture from the pre-trained model, so the `descriptor` and `fitting_net` sections in `input.json` can be empty. + +Without `--use-pretrain-script`, the model section in `input.json` must exactly match the pre-trained model's architecture. 
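The Step 1 input file can also be generated from Python, which makes the subset check on `type_map` easy to run first. The sketch below is illustrative only: `check_type_map` and `make_finetune_input` are hypothetical helper names, and the values simply mirror the JSON shown in Step 1.

```python
import json
from typing import Dict, List


def check_type_map(downstream: List[str], pretrained: List[str]) -> List[str]:
    """Return downstream elements that the pre-trained type_map lacks."""
    return [el for el in downstream if el not in pretrained]


def make_finetune_input(
    type_map: List[str],
    train_systems: List[str],
    valid_systems: List[str],
    numb_steps: int = 200000,
) -> Dict:
    """Build the fine-tuning configuration from Step 1 as a dict."""
    return {
        "model": {
            "type_map": type_map,
            # Left empty on purpose: --use-pretrain-script fills these in.
            "descriptor": {},
            "fitting_net": {},
        },
        "learning_rate": {
            "type": "exp",
            "decay_steps": 5000,
            "start_lr": 1e-4,  # smaller than the 1e-3 used from scratch
            "stop_lr": 3e-6,
        },
        "loss": {
            "type": "ener",
            "start_pref_e": 0.2,
            "limit_pref_e": 20,
            "start_pref_f": 100,
            "limit_pref_f": 60,
            "start_pref_v": 0.02,
            "limit_pref_v": 1,
        },
        "optimizer": {"type": "AdamW", "weight_decay": 0.001},
        "training": {
            "training_data": {"systems": train_systems, "batch_size": 1},
            "validation_data": {"systems": valid_systems, "batch_size": 1},
            "numb_steps": numb_steps,
            "gradient_max_norm": 5.0,
            "seed": 10,
            "disp_file": "lcurve.out",
            "disp_freq": 100,
            "save_freq": 2000,
        },
    }


if __name__ == "__main__":
    # The pre-trained type_map here is a made-up example.
    missing = check_type_map(["O", "H"], ["O", "H", "C", "N"])
    assert not missing, f"elements missing from pre-trained type_map: {missing}"
    config = make_finetune_input(
        ["O", "H"], ["./downstream_data/train_0"], ["./downstream_data/valid_0"]
    )
    print(json.dumps(config, indent=2))
```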
+ +## Scenario 2: Fine-tune from a Multi-task Pre-trained Model + +When the pre-trained model was trained with multiple datasets (multi-task training), you can select a specific branch to fine-tune from. + +### Check Available Branches + +```bash +dp --pt show multitask_pretrained.pt model-branch +``` + +### Run Fine-tuning from a Specific Branch + +```bash +dp --pt train input.json --finetune multitask_pretrained.pt --model-branch CHOSEN_BRANCH --use-pretrain-script +``` + +If `--model-branch` is not set or set to `RANDOM`, a randomly initialized fitting net will be used. + +### Multi-task Fine-tuning (Prevent Forgetting) + +To retain knowledge from the pre-trained datasets during fine-tuning, use multi-task fine-tuning. Prepare a multi-task input script: + +```json +{ + "model": { + "shared_dict": { + "type_map_all": [ + "O", + "H", + "C", + "N" + ], + "dpa3_desc": { + "type": "dpa3", + "repflow": {} + } + }, + "model_dict": { + "pre_data_1": { + "type_map": "type_map_all", + "descriptor": "dpa3_desc", + "fitting_net": {} + }, + "pre_data_2": { + "type_map": "type_map_all", + "descriptor": "dpa3_desc", + "fitting_net": {} + }, + "downstream": { + "finetune_head": "pre_data_1", + "type_map": "type_map_all", + "descriptor": "dpa3_desc", + "fitting_net": {} + } + } + }, + "learning_rate": { + "type": "exp", + "decay_steps": 5000, + "start_lr": 0.0001, + "stop_lr": 3e-06 + }, + "loss_dict": { + "pre_data_1": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60 + }, + "pre_data_2": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60 + }, + "downstream": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60 + } + }, + "training": { + "model_prob": { + "pre_data_1": 0.3, + "pre_data_2": 0.3, + "downstream": 1.0 + }, + "data_dict": { + "pre_data_1": { + "training_data": { + "systems": [ + 
"./pre_data_1/train" + ], + "batch_size": 1 + } + }, + "pre_data_2": { + "training_data": { + "systems": [ + "./pre_data_2/train" + ], + "batch_size": 1 + } + }, + "downstream": { + "training_data": { + "systems": [ + "./downstream/train" + ], + "batch_size": 1 + }, + "validation_data": { + "systems": [ + "./downstream/valid" + ], + "batch_size": 1 + } + } + }, + "numb_steps": 200000, + "gradient_max_norm": 5.0, + "disp_file": "lcurve.out", + "disp_freq": 100, + "save_freq": 2000 + } +} +``` + +Key points: + +- `"finetune_head": "pre_data_1"` specifies which branch the downstream task fine-tunes from. +- `model_prob` controls the sampling probability for each dataset. +- Pre-trained branches continue training in `init-model` mode; the downstream branch fine-tunes from the selected head. + +Run: + +```bash +dp --pt train multi_input.json --finetune multitask_pretrained.pt +``` + +Freeze a specific branch: + +```bash +dp --pt freeze -o model_downstream.pth --head downstream +``` + +## Scenario 3: Fine-tune from Built-in Pretrained Models + +DeePMD-kit provides built-in pretrained models that can be downloaded directly. + +### Step 1: Check Available Models + +```bash +dp pretrained download -h +``` + +Currently available models include: + +- `DPA-3.2-5M` — latest large-scale pretrained model +- `DPA-3.1-3M` — 3M parameter DPA3 pretrained model +- `DPA3-Omol-Large` — large organic molecule model + +### Step 2: Download the Model + +```bash +# Download to default cache directory +dp pretrained download DPA-3.1-3M + +# Download to a custom directory +dp pretrained download DPA-3.1-3M --cache-dir ./models +``` + +The command prints the local path of the downloaded model file on success. + +### Step 3: Check Model Branches (if multi-task) + +```bash +dp --pt show /path/to/DPA-3.1-3M.pt model-branch +``` + +### Step 4: Prepare input.json and Run Fine-tuning + +The input.json is the same as Scenario 1. 
Use `--use-pretrain-script` to inherit the model architecture: + +```json +{ + "model": { + "type_map": [ + "O", + "H" + ], + "descriptor": {}, + "fitting_net": {} + }, + "learning_rate": { + "type": "exp", + "decay_steps": 5000, + "start_lr": 0.0001, + "stop_lr": 3e-06 + }, + "loss": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60, + "start_pref_v": 0.02, + "limit_pref_v": 1 + }, + "optimizer": { + "type": "AdamW", + "weight_decay": 0.001 + }, + "training": { + "training_data": { + "systems": [ + "./my_data/train_0", + "./my_data/train_1" + ], + "batch_size": 1 + }, + "validation_data": { + "systems": [ + "./my_data/valid_0" + ], + "batch_size": 1 + }, + "numb_steps": 200000, + "gradient_max_norm": 5.0, + "seed": 10, + "disp_file": "lcurve.out", + "disp_freq": 100, + "save_freq": 2000 + } +} +``` + +The meaning of each parameter can be generated through `dp doc-train-input`. +The RST output is very long, so use `grep` to find the documentation for a specific parameter: + +```sh +dp doc-train-input | grep -A 7 training/numb_steps +``` + +Run fine-tuning: + +```bash +# Single-task fine-tuning from a specific branch +dp --pt train input.json --finetune /path/to/DPA-3.1-3M.pt --model-branch CHOSEN_BRANCH --use-pretrain-script + +# If the pretrained model is single-task, --model-branch is not needed +dp --pt train input.json --finetune /path/to/DPA3-Omol-Large.pt --use-pretrain-script +``` + +### Step 5: Freeze and Test + +```bash +dp --pt freeze -o finetuned_model.pth +dp --pt test -m finetuned_model.pth -s /path/to/test_system -n 30 +``` + +## Fine-tuning Command Reference + +| Command | Description | +| ------------------------------------------------------------------------ | ------------------------------------------------- | +| `dp pretrained download <model>` | Download a built-in pretrained model | +| `dp pretrained download <model> --cache-dir <dir>` | Download to a custom directory |
+| `dp --pt train input.json --finetune <model>.pt` | Fine-tune from a pre-trained model | +| `dp --pt train input.json --finetune <model>.pt --use-pretrain-script` | Inherit model architecture from pre-trained model | +| `dp --pt train input.json --finetune <model>.pt --model-branch <branch>` | Fine-tune from a specific branch | +| `dp --pt train input.json --finetune <model>.pt --model-branch RANDOM` | Fine-tune with random fitting net | +| `dp --pt show <model>.pt model-branch` | List available branches in a multi-task model | +| `dp --pt freeze -o model.pth` | Freeze the fine-tuned model | +| `dp --pt freeze -o model.pth --head <branch>` | Freeze a specific branch (multi-task) | + +## Agent Checklist + +- [ ] Pre-trained model file exists (downloaded or self-trained) +- [ ] Downstream data elements are a subset of the pre-trained model's `type_map` +- [ ] `--use-pretrain-script` is used if model architecture is unknown +- [ ] Learning rate is reduced compared to training from scratch (e.g., 1e-4 vs 1e-3) +- [ ] For multi-task pretrained models, the correct `--model-branch` is selected +- [ ] Training completes without NaN in `lcurve.out` +- [ ] Fine-tuned model is frozen and tested +- [ ] Test RMSE values are reported to the user + +## References + +- [Fine-tuning documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/finetuning.html) +- [Pretrained model download](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/pretrained.html) +- [Multi-task training](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/multi-task-training.html) +- [DPA3 descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/dpa3.html) +- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit) diff --git a/skills/deepmd-python-inference/SKILL.md b/skills/deepmd-python-inference/SKILL.md new file mode 100644 index 0000000000..f0b4eeaaaf --- /dev/null +++ b/skills/deepmd-python-inference/SKILL.md @@ -0,0 +1,300 @@ +--- +name: deepmd-python-inference
+description: Run Python inference with DeePMD-kit models using the DeepPot API. Use when the user wants to load a trained/frozen DeePMD model (.pth or .pb) or a built-in pretrained model (e.g., DPA-3.2-5M) in Python, predict energy/force/virial for atomic configurations, evaluate descriptors, or calculate model deviation between multiple models. Also covers using `dp test` CLI for batch evaluation against labeled data. +compatibility: Requires deepmd-kit Python package installed. PyTorch backend for .pth models, TensorFlow for .pb models. +license: LGPL-3.0 +metadata: + author: iProzd + version: '1.0' + repository: https://github.com/deepmodeling/deepmd-kit +--- + +# DeePMD-kit Python Inference + +Load a trained DeePMD-kit model in Python and predict energy, forces, and virial for atomic configurations. Also covers CLI-based testing with `dp test`. + +## Quick Start + +```python +from deepmd.infer import DeepPot +import numpy as np + +dp = DeepPot("model.pth") +coord = np.array([[1, 0, 0], [0, 0, 1.5], [1, 0, 3]]).reshape([1, -1]) +cell = np.diag(10 * np.ones(3)).reshape([1, -1]) +atype = [1, 0, 1] +e, f, v = dp.eval(coord, cell, atype) +``` + +## Agent Responsibilities + +1. Determine the model source: + - Frozen model file (`.pth` for PyTorch, `.pb` for TensorFlow) + - Built-in pretrained model name (e.g., `DPA-3.2-5M`) + - Checkpoint file (requires freezing first) +1. Determine the inference task: + - Single-frame prediction (energy, force, virial) + - Batch prediction over multiple frames + - Descriptor evaluation + - Model deviation calculation + - CLI-based testing against labeled data +1. Help the user prepare input arrays in the correct format. +1. Run inference and report results. 
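Preparing the input arrays usually means translating element symbols into type indices and flattening per-atom positions into one row per frame. A small pure-Python sketch of both steps follows; the helper names `symbols_to_atype` and `flatten_frame` are hypothetical, and the model's `type_map` is assumed to be known already.

```python
from typing import List, Sequence


def symbols_to_atype(symbols: Sequence[str], type_map: Sequence[str]) -> List[int]:
    """Map element symbols to indices following the model's type_map order."""
    index = {name: i for i, name in enumerate(type_map)}
    unknown = sorted(set(symbols) - set(index))
    if unknown:
        raise ValueError(f"elements not in type_map: {unknown}")
    return [index[s] for s in symbols]


def flatten_frame(positions: Sequence[Sequence[float]]) -> List[float]:
    """Flatten [[x, y, z], ...] into one row of length natoms * 3."""
    flat: List[float] = []
    for xyz in positions:
        if len(xyz) != 3:
            raise ValueError("each atom needs exactly three coordinates")
        flat.extend(float(c) for c in xyz)
    return flat


if __name__ == "__main__":
    # Assumed type_map for illustration: index 0 is O, index 1 is H.
    atype = symbols_to_atype(["H", "O", "H"], ["O", "H"])
    row = flatten_frame([[1, 0, 0], [0, 0, 1.5], [1, 0, 3]])
    print(atype, len(row))
```

Wrapping `row` in a one-element list and converting it with `np.array` gives a `(1, natoms * 3)` array of the kind used in Quick Start.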
+ +## Python API: DeepPot + +### Load a Model + +```python +from deepmd.infer import DeepPot + +# From a frozen PyTorch model +dp = DeepPot("model.pth") + +# From a frozen TensorFlow model +dp = DeepPot("graph.pb") + +# From a built-in pretrained model (auto-downloads if not cached) +dp = DeepPot("DPA-3.2-5M") +``` + +Built-in pretrained model names include `DPA-3.2-5M`, `DPA-3.1-3M`, `DPA3-Omol-Large`, etc. DeePMD-kit will automatically download and cache the model on first use. + +### Predict Energy, Forces, and Virial + +```python +import numpy as np +from deepmd.infer import DeepPot + +dp = DeepPot("model.pth") + +# Prepare inputs +# coord: (nframes, natoms * 3) in Angstrom +# cell: (nframes, 9) cell vectors in Angstrom, row-major +# atype: list of atom type indices (length natoms) + +coord = np.array( + [ + [0.0, 0.0, 0.0], # atom 0 (O) + [0.0, 0.0, 1.0], # atom 1 (H) + [0.0, 1.0, 0.0], # atom 2 (H) + ] +).reshape([1, -1]) + +cell = np.diag([10.0, 10.0, 10.0]).reshape([1, -1]) + +# atype indices correspond to type_map order in the model +# e.g., if type_map = ["O", "H"], then O=0, H=1 +atype = [0, 1, 1] + +e, f, v = dp.eval(coord, cell, atype) + +print(f"Energy (eV): {e}") # shape: (nframes, 1) +print(f"Forces (eV/A): {f}") # shape: (nframes, natoms, 3) +print(f"Virial (eV): {v}") # shape: (nframes, 9) +``` + +### Non-periodic Systems + +For non-periodic (isolated) systems, pass `cell=None`: + +```python +e, f, v = dp.eval(coord, None, atype) +``` + +### Batch Prediction + +Process multiple frames at once: + +```python +nframes = 10 +natoms = 3 + +coords = np.random.rand(nframes, natoms * 3) +cells = np.tile(np.diag([10.0, 10.0, 10.0]).reshape([1, -1]), (nframes, 1)) +atype = [0, 1, 1] + +e, f, v = dp.eval(coords, cells, atype) +# e: (nframes, 1) +# f: (nframes, natoms, 3) +# v: (nframes, 9) +``` + +### Evaluate Descriptors + +Extract the descriptor (atomic environment representation) from the model: + +```python +descriptors =
dp.eval_descriptor(coord, cell, atype) +# shape: (nframes, natoms, ndesc) +``` + +This can also be done via CLI: + +```bash +dp eval-desc -m model.pth -s /path/to/system -o desc_output +``` + +### Calculate Model Deviation + +Compare predictions from multiple models to estimate uncertainty: + +```python +import numpy as np +from deepmd.infer import calc_model_devi, DeepPot + +coord = np.array([[1, 0, 0], [0, 0, 1.5], [1, 0, 3]]).reshape([1, -1]) +cell = np.diag(10 * np.ones(3)).reshape([1, -1]) +atype = [1, 0, 1] + +graphs = [DeepPot("model_0.pth"), DeepPot("model_1.pth")] +model_devi = calc_model_devi(coord, cell, atype, graphs) +``` + +Important: avoid loading the same model multiple times in a loop, as this can cause memory leaks. + +## CLI Testing: dp test + +Test a frozen model against labeled data: + +```bash +# Basic test +dp --pt test -m model.pth -s /path/to/test_system -n 30 + +# Test with detailed output +dp --pt test -m model.pth -s /path/to/test_system -n 30 -d test_detail +``` + +### dp test Options + +| Option | Description | +| ---------------- | ---------------------------------- | +| `-m MODEL` | Path to the frozen model file | +| `-s SYSTEM` | Path to the test data system | +| `-n NUMB` | Number of test frames | +| `-d DETAIL` | Output prefix for detailed results | +| `--shuffle-test` | Shuffle test frames | + +### Output + +`dp test` prints RMSE values for energy, force, and virial: + +``` +Energy RMSE : 1.234e-03 eV +Energy RMSE/Natoms : 6.427e-06 eV +Force RMSE : 2.345e-02 eV/A +Virial RMSE : 5.678e-02 eV +Virial RMSE/Natoms : 2.957e-04 eV +``` + +With `-d test_detail`, per-frame predictions are saved to files for further analysis.
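To report the RMSE summary back to the user, the printed lines can be collected into a dictionary. The sketch below assumes the log lines look like the sample output above (a label containing `RMSE`, a colon, a number, a unit); the real log format may differ between DeePMD-kit versions, and `parse_rmse` is a hypothetical helper name.

```python
import re
from typing import Dict

# Matches e.g. "Energy RMSE/Natoms : 6.427e-06 eV", with or without a
# logger prefix in front of the label.
_RMSE_PATTERN = re.compile(r"(\w+ RMSE(?:/Natoms)?)\s*:\s*([-+0-9.eE]+)\s*(\S*)")


def parse_rmse(text: str) -> Dict[str, float]:
    """Collect 'label -> value' pairs from RMSE summary lines."""
    results: Dict[str, float] = {}
    for line in text.splitlines():
        match = _RMSE_PATTERN.search(line)
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


if __name__ == "__main__":
    sample = """\
Energy RMSE : 1.234e-03 eV
Energy RMSE/Natoms : 6.427e-06 eV
Force RMSE : 2.345e-02 eV/A
Virial RMSE : 5.678e-02 eV
Virial RMSE/Natoms : 2.957e-04 eV
"""
    for label, value in parse_rmse(sample).items():
        print(f"{label}: {value:.3e}")
```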
+ +## Complete Example: Train, Freeze, and Inference + +```python +import subprocess +import numpy as np +from deepmd.infer import DeepPot + +# Step 1: Train (run in shell) +# dp --pt train input.json + +# Step 2: Freeze (run in shell) +# dp --pt freeze -o model.pth + +# Step 3: Python inference +dp = DeepPot("model.pth") + +# Load test data from deepmd format +coord = np.load("test_system/set.000/coord.npy") # (nframes, natoms*3) +cell = np.load("test_system/set.000/box.npy") # (nframes, 9) +atype_raw = np.loadtxt("test_system/type.raw", dtype=int).tolist() + +# Predict +e, f, v = dp.eval(coord, cell, atype_raw) + +# Compare with reference +ref_energy = np.load("test_system/set.000/energy.npy") +ref_force = np.load("test_system/set.000/force.npy") + +natoms = len(atype_raw) +energy_rmse = np.sqrt(np.mean((e.flatten() - ref_energy.flatten()) ** 2)) / natoms +force_rmse = np.sqrt(np.mean((f.reshape(-1) - ref_force.reshape(-1)) ** 2)) + +print(f"Energy RMSE/atom: {energy_rmse:.6f} eV") +print(f"Force RMSE: {force_rmse:.6f} eV/A") +``` + +## Using Pretrained Models Directly + +Built-in pretrained models can be used without any training: + +```python +from deepmd.infer import DeepPot +import numpy as np + +# Auto-downloads DPA-3.2-5M on first use +dp = DeepPot("DPA-3.2-5M") + +# Water molecule example +coord = np.array( + [ + [0.000, 0.000, 0.117], # O + [0.000, 0.757, -0.469], # H + [0.000, -0.757, -0.469], # H + ] +).reshape([1, -1]) + +cell = np.diag([10.0, 10.0, 10.0]).reshape([1, -1]) +atype = [0, 1, 1] # Check model's type_map for correct indices + +e, f, v = dp.eval(coord, cell, atype) +print(f"Energy: {e[0][0]:.6f} eV") +print(f"Forces:\n{f[0]}") +``` + +To download pretrained models explicitly: + +```bash +dp pretrained download DPA-3.2-5M +dp pretrained download DPA-3.1-3M +dp pretrained download DPA-3.2-5M --cache-dir ./models +``` + +## Input Array Format Reference + +| Array | Shape | Unit | Description | +| ------- | -------------------- | -------- | 
--------------------------------------------- | +| `coord` | (nframes, natoms\*3) | Angstrom | Atomic coordinates, flattened | +| `cell` | (nframes, 9) | Angstrom | Cell vectors, row-major (a1x,a1y,a1z,a2x,...) | +| `atype` | (natoms,) | - | Atom type indices matching model's type_map | + +| Output | Shape | Unit | Description | +| ------ | -------------------- | ---- | ----------------------- | +| `e` | (nframes, 1) | eV | Total energy per frame | +| `f` | (nframes, natoms, 3) | eV/A | Forces on each atom | +| `v` | (nframes, 9) | eV | Virial tensor per frame | + +## Agent Checklist + +- [ ] Model file exists and is accessible (`.pth`, `.pb`, or valid pretrained name) +- [ ] `coord` array is shaped (nframes, natoms\*3) and in Angstrom +- [ ] `cell` array is shaped (nframes, 9) or `None` for non-periodic systems +- [ ] `atype` indices match the model's `type_map` ordering +- [ ] For model deviation, multiple models are loaded only once (not in a loop) +- [ ] Results are reported with correct units (eV, eV/A) + +## References + +- [Python inference documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/inference/python.html) +- [dp test documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/test/test.html) +- [Pretrained model download](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/pretrained.html) +- [DeepPot API reference](https://docs.deepmodeling.com/projects/deepmd/en/latest/api_py/deepmd.infer.html) +- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit) diff --git a/skills/deepmd-train-dpa3/SKILL.md b/skills/deepmd-train-dpa3/SKILL.md new file mode 100644 index 0000000000..8476f2b379 --- /dev/null +++ b/skills/deepmd-train-dpa3/SKILL.md @@ -0,0 +1,308 @@ +--- +name: deepmd-train-dpa3 +description: Train a DeePMD-kit model using the DPA3 descriptor with the PyTorch backend. Use when the user wants to train a state-of-the-art deep potential model based on message passing on Line Graph Series (LiGS). 
DPA3 provides high accuracy and strong generalization, suitable for large atomic models (LAM) and diverse chemical systems. Supports both fixed and dynamic neighbor selection. +compatibility: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. Custom OP library required for LAMMPS deployment. +license: LGPL-3.0 +metadata: + author: iProzd + version: '1.0' + repository: https://github.com/deepmodeling/deepmd-kit +--- + +# DeePMD-kit Training: DPA3 + +Train a deep potential model using the DPA3 descriptor, an advanced message-passing architecture operating on Line Graph Series (LiGS). DPA3 is designed as a large atomic model (LAM) with high fitting accuracy and robust generalization across diverse chemical and materials systems. + +## Quick Start + +```bash +dp --pt train input.json +``` + +## Agent Responsibilities + +1. Confirm the user has a working deepmd-kit environment with PyTorch backend. +1. Collect the minimum required information: + - Training data paths (deepmd/npy or deepmd/hdf5 format) + - Validation data paths + - Element types (type_map) + - Target number of training steps + - Model size preference (L3/L6/L12 layers) +1. Generate a complete `input.json` training configuration. +1. Decide whether to use fixed or dynamic neighbor selection based on system diversity. +1. Run training and monitor the learning curve. +1. Freeze the trained model and optionally test it. + +## Workflow + +### Step 1: Prepare Training Data + +Same format as other DeePMD models. Each system directory should contain: + +``` +system_dir/ +├── type.raw +├── type_map.raw +└── set.000/ + ├── coord.npy + ├── energy.npy + ├── force.npy + ├── box.npy + └── virial.npy +``` + +DPA3 also supports the mixed type data format for multi-element systems. 
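Before training, it can help to confirm that each system directory matches the layout above. A minimal sketch with hypothetical helpers `list_entries` and `missing_entries`; treating `type.raw`, `coord.npy`, `energy.npy`, `force.npy`, and `box.npy` as required and the remaining files as optional is an assumption made for this example, not a rule taken from DeePMD-kit.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed-required files for a periodic, energy+force dataset.
REQUIRED_IN_SET = ("coord.npy", "energy.npy", "force.npy", "box.npy")


def list_entries(system_dir: str) -> List[str]:
    """Return all files under system_dir as POSIX-style relative paths."""
    root = Path(system_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def missing_entries(present: Iterable[str], set_name: str = "set.000") -> List[str]:
    """Return expected relative paths that are absent from `present`."""
    have = set(present)
    expected = ["type.raw"] + [f"{set_name}/{name}" for name in REQUIRED_IN_SET]
    return [entry for entry in expected if entry not in have]


if __name__ == "__main__":
    # "./data/train_0" is the example path used in this skill.
    system = "./data/train_0"
    if Path(system).is_dir():
        print(missing_entries(list_entries(system)) or "layout looks complete")
    else:
        print(f"{system} not found")
```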
+ +### Step 2: Write input.json + +#### Standard DPA3 (fixed selection) + +```json +{ + "model": { + "type_map": [ + "O", + "H" + ], + "descriptor": { + "type": "dpa3", + "repflow": { + "n_dim": 128, + "e_dim": 64, + "a_dim": 32, + "nlayers": 6, + "e_rcut": 6.0, + "e_rcut_smth": 5.3, + "e_sel": 120, + "a_rcut": 4.0, + "a_rcut_smth": 3.5, + "a_sel": 30, + "axis_neuron": 4, + "fix_stat_std": 0.3, + "a_compress_rate": 1, + "a_compress_e_rate": 2, + "a_compress_use_split": true, + "update_angle": true, + "smooth_edge_update": true, + "edge_init_use_dist": true, + "use_exp_switch": true, + "update_style": "res_residual", + "update_residual": 0.1, + "update_residual_init": "const" + }, + "activation_function": "silut:10.0", + "use_tebd_bias": false, + "precision": "float32", + "concat_output_tebd": false, + "seed": 1 + }, + "fitting_net": { + "neuron": [ + 240, + 240, + 240 + ], + "resnet_dt": true, + "precision": "float32", + "activation_function": "silut:10.0", + "seed": 1 + } + }, + "learning_rate": { + "type": "exp", + "decay_steps": 5000, + "start_lr": 0.001, + "stop_lr": 3e-05 + }, + "loss": { + "type": "ener", + "start_pref_e": 0.2, + "limit_pref_e": 20, + "start_pref_f": 100, + "limit_pref_f": 60, + "start_pref_v": 0.02, + "limit_pref_v": 1 + }, + "optimizer": { + "type": "AdamW", + "adam_beta1": 0.9, + "adam_beta2": 0.999, + "weight_decay": 0.001 + }, + "training": { + "stat_file": "./dpa3.hdf5", + "training_data": { + "systems": [ + "./data/train_0", + "./data/train_1", + "./data/train_2" + ], + "batch_size": 1 + }, + "validation_data": { + "systems": [ + "./data/valid_0" + ], + "batch_size": 1 + }, + "numb_steps": 1000000, + "gradient_max_norm": 5.0, + "seed": 10, + "disp_file": "lcurve.out", + "disp_freq": 100, + "save_freq": 2000 + } +} +``` + +The meaning of each parameter can be generated through `dp doc-train-input`. 
+The RST output is very long, so use `grep` to find the documentation for a specific parameter: + +```sh +dp doc-train-input | grep -A 7 training/numb_steps +dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[dpa3\]/repflow/e_sel' +``` + +#### DPA3 with Dynamic Selection + +For systems with highly variable neighbor counts (e.g., multi-element datasets), use dynamic selection by modifying the `repflow` section: + +```json +"repflow": { + "e_sel": 1200, + "a_sel": 300, + "use_dynamic_sel": true, + "sel_reduce_factor": 10.0 +} +``` + +When `use_dynamic_sel` is true, the effective selection is `e_sel / sel_reduce_factor` and `a_sel / sel_reduce_factor` (i.e., 120 and 30 in this example), but the model dynamically adapts to varying neighbor counts. + +### Step 3: Run Training + +```bash +dp --pt train input.json +``` + +To restart from a checkpoint: + +```bash +dp --pt train input.json --restart model.ckpt.pt +``` + +### Step 4: Monitor Training + +The learning curve `lcurve.out` has the same format as other DeePMD models: + +``` +# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr +``` + +### Step 5: Freeze the Model + +```bash +dp --pt freeze -o model.pth +``` + +### Step 6: Test the Model + +```bash +dp --pt test -m model.pth -s /path/to/test_system -n 30 +``` + +## Model Size Guide + +Choose the number of layers based on accuracy vs.
cost trade-off: + +| Model | nlayers | n_dim | e_dim | a_dim | Relative Cost | Use Case | +| ------------- | ------- | ----- | ----- | ----- | ------------- | ---------------------------------- | +| DPA3-L3 | 3 | 256 | 128 | 32 | 1x | Quick prototyping, smaller systems | +| DPA3-L3-small | 3 | 128 | 64 | 32 | 0.8x | Fast iteration, limited GPU memory | +| DPA3-L6 | 6 | 256 | 128 | 32 | 2x | Recommended for production | +| DPA3-L6-small | 6 | 128 | 64 | 32 | 1.4x | Good accuracy/cost balance | + +Benchmark RMSE (averaged over 6 representative systems, 0.5M steps): + +| Model | Energy (meV/atom) | Force (meV/A) | Virial (meV/atom) | +| ------------------------- | ----------------- | ------------- | ----------------- | +| DPA3-L3 (256/128/32) | 5.74 | 85.4 | 43.1 | +| DPA3-L3-small (128/64/32) | 6.99 | 93.6 | 46.7 | +| DPA3-L6 (256/128/32) | 4.85 | 79.9 | 39.7 | +| DPA3-L6-small (128/64/32) | 5.11 | 77.7 | 41.2 | +| DPA2-L6 (reference) | 12.12 | 109.3 | 83.1 | + +## Key Differences from SE_E2_A + +| Aspect | SE_E2_A | DPA3 | +| ----------------- | -------------------- | ------------------------------- | +| Architecture | Two-body embedding | Message passing on LiGS | +| Default precision | float64 | float32 | +| Optimizer | Adam | AdamW (with weight_decay) | +| Loss prefactors | e: 0.02→1, f: 1000→1 | e: 0.2→20, f: 100→60, v: 0.02→1 | +| stop_lr | 3.51e-8 | 3e-5 | +| Gradient clipping | Not used | gradient_max_norm: 5.0 | +| Virial training | Optional | Recommended | +| Model compression | Supported | Not supported | +| Activation | tanh (default) | silut:10.0 | + +## Key Hyperparameters + +### Repflow (Descriptor) + +| Parameter | Description | Default | +| ----------------- | -------------------------------- | -------------- | +| `n_dim` | Node embedding dimension | 128 or 256 | +| `e_dim` | Edge embedding dimension | 64 or 128 | +| `a_dim` | Angle embedding dimension | 32 | +| `nlayers` | Number of message passing layers | 3 or 6 | +| `e_rcut` | Edge cutoff radius 
(A) | 6.0 | +| `e_rcut_smth` | Edge smooth cutoff start | 5.3 | +| `e_sel` | Max edge neighbors | 120 | +| `a_rcut` | Angle cutoff radius (A) | 4.0 | +| `a_rcut_smth` | Angle smooth cutoff start | 3.5 | +| `a_sel` | Max angle neighbors | 30 | +| `update_style` | Residual update style | "res_residual" | +| `update_residual` | Residual scaling factor | 0.1 | + +### Optimizer + +DPA3 uses AdamW by default (decoupled weight decay): + +```json +"optimizer": { + "type": "AdamW", + "adam_beta1": 0.9, + "adam_beta2": 0.999, + "weight_decay": 0.001 +} +``` + +### Gradient Clipping + +Recommended for DPA3 to stabilize training: + +```json +"training": { + "gradient_max_norm": 5.0 +} +``` + +## Agent Checklist + +- [ ] Training data exists and is in deepmd format +- [ ] `type_map` matches the elements in the data +- [ ] Precision is set to `float32` (DPA3 default, not float64) +- [ ] AdamW optimizer is configured with weight_decay +- [ ] `gradient_max_norm` is set (recommended: 5.0) +- [ ] `stop_lr` is 3e-5 (not 3.51e-8 as in SE_E2_A) +- [ ] Virial loss prefactors are included if virial data is available +- [ ] `stat_file` is set to cache statistics (avoids recomputation on restart) +- [ ] Training completes without NaN in `lcurve.out` +- [ ] Model is frozen to `.pth` after training + +## References + +- [DPA3 descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/dpa3.html) +- [DPA3 paper](https://arxiv.org/abs/2506.01686) +- [Training documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html) +- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit) diff --git a/skills/deepmd-train-se-e2-a/SKILL.md b/skills/deepmd-train-se-e2-a/SKILL.md new file mode 100644 index 0000000000..e58c4146be --- /dev/null +++ b/skills/deepmd-train-se-e2-a/SKILL.md @@ -0,0 +1,264 @@ +--- +name: deepmd-train-se-e2-a +description: Train a DeePMD-kit model using the SE_E2_A (DeepPot-SE) descriptor with the PyTorch backend. 
Use when the user wants to train a classical deep potential model for a specific system, prepare training input JSON, run `dp --pt train`, monitor learning curves, freeze the model, and test it. SE_E2_A is the foundational two-body embedding descriptor suitable for most condensed-phase systems.
+compatibility: Requires deepmd-kit with PyTorch backend installed. GPU recommended for production training.
+license: LGPL-3.0
+metadata:
+  author: iProzd
+  version: '1.0'
+  repository: https://github.com/deepmodeling/deepmd-kit
+---
+
+# DeePMD-kit Training: SE_E2_A
+
+Train a deep potential model using the SE_E2_A (Smooth Edition, two-body embedding, all information) descriptor. This is the foundational DeepPot-SE architecture suitable for most condensed-phase systems.
+
+## Quick Start
+
+```bash
+dp --pt train input.json
+```
+
+## Agent Responsibilities
+
+1. Confirm the user has a working deepmd-kit environment with PyTorch backend.
+1. Collect the minimum required information:
+   - Training data paths (deepmd/npy or deepmd/hdf5 format)
+   - Validation data paths
+   - Element types (type_map)
+   - Target number of training steps
+1. Generate a complete `input.json` training configuration.
+1. Explain key hyperparameters if the user is unfamiliar.
+1. Run training and monitor the learning curve (`lcurve.out`).
+1. Freeze the trained model to `.pth` format.
+1. Optionally test the model with `dp test`.
+
+## Workflow
+
+### Step 1: Prepare Training Data
+
+Training data must be in DeePMD format (deepmd/npy or deepmd/hdf5). Each system directory should contain:
+
+```
+system_dir/
+├── type.raw        # atom type indices, one integer per atom
+├── type_map.raw    # element names, one per line
+└── set.000/
+    ├── coord.npy   # coordinates (nframes, natoms*3)
+    ├── energy.npy  # energies (nframes, 1)
+    ├── force.npy   # forces (nframes, natoms*3)
+    └── box.npy     # cell vectors (nframes, 9)
+```
+
+If the user has DFT output (VASP OUTCAR, etc.), refer to the `dpdata-cli` skill for format conversion.
+
+### Step 2: Write input.json
+
+A complete SE_E2_A training configuration:
+
+```json
+{
+  "model": {
+    "type_map": [
+      "O",
+      "H"
+    ],
+    "descriptor": {
+      "type": "se_e2_a",
+      "sel": [
+        46,
+        92
+      ],
+      "rcut_smth": 0.5,
+      "rcut": 6.0,
+      "neuron": [
+        25,
+        50,
+        100
+      ],
+      "resnet_dt": false,
+      "axis_neuron": 16,
+      "type_one_side": true,
+      "seed": 1
+    },
+    "fitting_net": {
+      "neuron": [
+        240,
+        240,
+        240
+      ],
+      "resnet_dt": true,
+      "seed": 1
+    }
+  },
+  "learning_rate": {
+    "type": "exp",
+    "decay_steps": 5000,
+    "start_lr": 0.001,
+    "stop_lr": 3.51e-08
+  },
+  "loss": {
+    "type": "ener",
+    "start_pref_e": 0.02,
+    "limit_pref_e": 1,
+    "start_pref_f": 1000,
+    "limit_pref_f": 1
+  },
+  "training": {
+    "training_data": {
+      "systems": [
+        "./data/train_system_0",
+        "./data/train_system_1"
+      ],
+      "batch_size": "auto"
+    },
+    "validation_data": {
+      "systems": [
+        "./data/valid_system_0"
+      ],
+      "batch_size": 1,
+      "numb_btch": 3
+    },
+    "numb_steps": 400000,
+    "seed": 10,
+    "disp_file": "lcurve.out",
+    "disp_freq": 100,
+    "save_freq": 10000
+  }
+}
+```
+
+The meaning of each parameter can be generated through `dp doc-train-input`.
+Considering the output RST documentation on the screen is very long, use `grep` to find the documentation of a specific parameter:
+
+```sh
+dp doc-train-input | grep -A 7 training/numb_steps
+dp doc-train-input | grep -A 7 'model\[standard\]/descriptor\[se_e2_a\]/sel'
+```
+
+### Step 3: Run Training
+
+```bash
+dp --pt train input.json
+```
+
+To restart from a checkpoint:
+
+```bash
+dp --pt train input.json --restart model.ckpt.pt
+```
+
+To initialize from an existing model:
+
+```bash
+dp --pt train input.json --init-model model.ckpt.pt
+```
+
+### Step 4: Monitor Training
+
+The learning curve is written to `lcurve.out` with columns:
+
+```
+# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
+```
+
+- `rmse_e_*`: energy RMSE per atom (eV/atom)
+- `rmse_f_*`: force RMSE (eV/A)
+- `lr`: current learning rate
+
+Quick visualization:
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+data = np.genfromtxt("lcurve.out", names=True)
+for name in data.dtype.names[1:-1]:
+    plt.plot(data["step"], data[name], label=name)
+plt.legend()
+plt.xlabel("Step")
+plt.ylabel("Loss")
+plt.xscale("symlog")
+plt.yscale("log")
+plt.grid()
+plt.show()
+```
+
+### Step 5: Freeze the Model
+
+```bash
+dp --pt freeze -o model.pth
+```
+
+### Step 6: Test the Model
+
+```bash
+dp --pt test -m model.pth -s /path/to/test_system -n 30
+```
+
+## Key Hyperparameters
+
+### Descriptor
+
+| Parameter       | Description                         | Typical Value    |
+| --------------- | ----------------------------------- | ---------------- |
+| `rcut`          | Cutoff radius (A)                   | 6.0              |
+| `rcut_smth`     | Smooth cutoff start (A)             | 0.5              |
+| `sel`           | Max neighbors per type              | System-dependent |
+| `neuron`        | Embedding net sizes                 | [25, 50, 100]    |
+| `axis_neuron`   | Axis matrix dimension               | 16               |
+| `type_one_side` | Share embedding across center types | true             |
+
+### Fitting Net
+
+| Parameter   | Description            | Typical Value   |
+| ----------- | ---------------------- | --------------- |
+| `neuron`    | Hidden layer sizes     | [240, 240, 240] |
+| `resnet_dt` | Use timestep in ResNet | true            |
+
+### Loss Prefactors
+
+| JSON keys                       | Description              | Start | Limit |
+| ------------------------------- | ------------------------ | ----- | ----- |
+| `start_pref_e` / `limit_pref_e` | Energy weight            | 0.02  | 1     |
+| `start_pref_f` / `limit_pref_f` | Force weight             | 1000  | 1     |
+| `start_pref_v` / `limit_pref_v` | Virial weight (optional) | 0     | 0     |
+
+Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights; the loss shifts from force-dominated early training to balanced energy+force later.
+
+### Training
+
+| Parameter     | Description           | Typical Value       |
+| ------------- | --------------------- | ------------------- |
+| `numb_steps`  | Total training steps  | 400000-1000000      |
+| `batch_size`  | Frames per step       | "auto" or "auto:32" |
+| `start_lr`    | Initial learning rate | 0.001               |
+| `stop_lr`     | Final learning rate   | 3.51e-8             |
+| `decay_steps` | LR decay interval     | 5000                |
+
+### Setting `sel`
+
+`sel` is a list with one entry per element type, specifying the maximum number of neighbors of that type within `rcut`. To determine appropriate values:
+
+```bash
+dp --pt neighbor-stat -s /path/to/data -r 6.0 -t O H
+```
+
+Use values slightly above the reported maximum.
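One way to read "slightly above the reported maximum" is to pad each per-type maximum and round it up. The sketch below is only an illustration: the helper name `suggest_sel`, the 10% margin, and the rounding to a multiple of 4 are assumptions, not DeePMD-kit defaults, and the input numbers are hypothetical stand-ins for the maxima from your own `neighbor-stat` run.

```python
def suggest_sel(max_neighbors, margin_percent=10, multiple=4):
    """Pad each per-type neighbor maximum and round up to a multiple.

    The margin and the multiple are illustrative assumptions,
    not values prescribed by DeePMD-kit.
    """
    sel = []
    for n in max_neighbors:
        # Integer ceiling of n * (1 + margin_percent / 100).
        padded = -(-n * (100 + margin_percent) // 100)
        # Round the padded count up to the next multiple.
        sel.append(-(-padded // multiple) * multiple)
    return sel


# Hypothetical maxima for two atom types, listed in type_map order.
print(suggest_sel([38, 72]))
```

A larger `sel` costs memory and speed, while a value below the true maximum drops neighbors, so the margin is a trade-off to tune per system.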
+
+## Agent Checklist
+
+- [ ] Training data exists and is in deepmd format
+- [ ] `type_map` matches the elements in the data
+- [ ] `sel` is appropriate for the system (use `dp neighbor-stat` if unsure)
+- [ ] `rcut` is reasonable for the system (typically 6.0-9.0 A)
+- [ ] Training completes without NaN in `lcurve.out`
+- [ ] Model is frozen to `.pth` after training
+- [ ] Test RMSE values are reported to the user
+
+## References
+
+- [SE_E2_A descriptor documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/model/train-se-e2-a.html)
+- [Training documentation](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html)
+- [Training advanced options](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training-advanced.html)
+- [DeePMD-kit GitHub](https://github.com/deepmodeling/deepmd-kit)

From 209191855425eb1c1e52a3ee404376b32eb619e5 Mon Sep 17 00:00:00 2001
From: "A bot of @njzjz" <48687836+njzjz-bot@users.noreply.github.com>
Date: Tue, 24 Mar 2026 06:26:43 +0800
Subject: [PATCH 6/8] sync(skills): chore(skills): unify licenses to LGPL-3.0-or-later (#52)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.

Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@622da72a772fd23930bf3661d5eee95419f181b6
Upstream-Paths:
- machine-learning-potentials/deepmd-finetune-dpa3
- machine-learning-potentials/deepmd-python-inference
- machine-learning-potentials/deepmd-train-dpa3
- machine-learning-potentials/deepmd-train-se-e2-a
- molecular-dynamics/lammps-deepmd
---
 skills/deepmd-finetune-dpa3/SKILL.md    | 2 +-
 skills/deepmd-python-inference/SKILL.md | 2 +-
 skills/deepmd-train-dpa3/SKILL.md       | 2 +-
 skills/deepmd-train-se-e2-a/SKILL.md    | 2 +-
 skills/lammps-deepmd/SKILL.md           | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/skills/deepmd-finetune-dpa3/SKILL.md b/skills/deepmd-finetune-dpa3/SKILL.md
index 31393b4450..d48679be56 100644
--- a/skills/deepmd-finetune-dpa3/SKILL.md
+++ b/skills/deepmd-finetune-dpa3/SKILL.md
@@ -2,7 +2,7 @@
 name: deepmd-finetune-dpa3
 description: Fine-tune a DPA3 model in DeePMD-kit using the PyTorch backend. Use when the user wants to adapt a pre-trained DPA3 model to a new downstream dataset. Supports fine-tuning from a self-trained DPA3 model (.pt checkpoint), from a multi-task pre-trained model, or from a built-in pretrained model downloaded via `dp pretrained download` (e.g., DPA-3.1-3M, DPA-3.2-5M). Covers single-task and multi-task fine-tuning workflows.
 compatibility: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended.
-license: LGPL-3.0
+license: LGPL-3.0-or-later
 metadata:
   author: iProzd
   version: '1.0'
diff --git a/skills/deepmd-python-inference/SKILL.md b/skills/deepmd-python-inference/SKILL.md
index f0b4eeaaaf..4fab21c08a 100644
--- a/skills/deepmd-python-inference/SKILL.md
+++ b/skills/deepmd-python-inference/SKILL.md
@@ -2,7 +2,7 @@
 name: deepmd-python-inference
 description: Run Python inference with DeePMD-kit models using the DeepPot API. Use when the user wants to load a trained/frozen DeePMD model (.pth or .pb) or a built-in pretrained model (e.g., DPA-3.2-5M) in Python, predict energy/force/virial for atomic configurations, evaluate descriptors, or calculate model deviation between multiple models. Also covers using `dp test` CLI for batch evaluation against labeled data.
 compatibility: Requires deepmd-kit Python package installed. PyTorch backend for .pth models, TensorFlow for .pb models.
-license: LGPL-3.0
+license: LGPL-3.0-or-later
 metadata:
   author: iProzd
   version: '1.0'
diff --git a/skills/deepmd-train-dpa3/SKILL.md b/skills/deepmd-train-dpa3/SKILL.md
index 8476f2b379..07b9fbfce4 100644
--- a/skills/deepmd-train-dpa3/SKILL.md
+++ b/skills/deepmd-train-dpa3/SKILL.md
@@ -2,7 +2,7 @@
 name: deepmd-train-dpa3
 description: Train a DeePMD-kit model using the DPA3 descriptor with the PyTorch backend. Use when the user wants to train a state-of-the-art deep potential model based on message passing on Line Graph Series (LiGS). DPA3 provides high accuracy and strong generalization, suitable for large atomic models (LAM) and diverse chemical systems. Supports both fixed and dynamic neighbor selection.
 compatibility: Requires deepmd-kit with PyTorch backend installed. GPU strongly recommended. Custom OP library required for LAMMPS deployment.
-license: LGPL-3.0
+license: LGPL-3.0-or-later
 metadata:
   author: iProzd
   version: '1.0'
diff --git a/skills/deepmd-train-se-e2-a/SKILL.md b/skills/deepmd-train-se-e2-a/SKILL.md
index e58c4146be..1e11b2ac89 100644
--- a/skills/deepmd-train-se-e2-a/SKILL.md
+++ b/skills/deepmd-train-se-e2-a/SKILL.md
@@ -2,7 +2,7 @@
 name: deepmd-train-se-e2-a
 description: Train a DeePMD-kit model using the SE_E2_A (DeepPot-SE) descriptor with the PyTorch backend. Use when the user wants to train a classical deep potential model for a specific system, prepare training input JSON, run `dp --pt train`, monitor learning curves, freeze the model, and test it. SE_E2_A is the foundational two-body embedding descriptor suitable for most condensed-phase systems.
 compatibility: Requires deepmd-kit with PyTorch backend installed. GPU recommended for production training.
-license: LGPL-3.0
+license: LGPL-3.0-or-later
 metadata:
   author: iProzd
   version: '1.0'
diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md
index a46a697ebd..5fbe292891 100644
--- a/skills/lammps-deepmd/SKILL.md
+++ b/skills/lammps-deepmd/SKILL.md
@@ -4,7 +4,7 @@ description: >
   A tool and knowledge base for running molecular dynamics (MD) simulations in LAMMPS with the DeePMD-kit plugin. It handles input script preparation, ensemble selection (NVE/NVT/NPT), and job execution via `uv` or offline binaries.
   USE WHEN you need to set up, write, explain, or execute a LAMMPS molecular dynamics simulation using a DeePMD machine learning potential (e.g., `graph.pb`).
 compatibility: Requires LAMMPS with DeePMD-kit support. Online mode prefers `uvx --from lammps --with deepmd-kit[gpu,torch,lmp] lmp`; offline mode requires a user-provided LAMMPS executable or module.
-license: MIT
+license: LGPL-3.0-or-later
 metadata:
   author: OpenClaw
   version: '1.0'

From b30b50d4e61554ea0318d147108172a0630d0efa Mon Sep 17 00:00:00 2001
From: Duo <50307526+iProzd@users.noreply.github.com>
Date: Fri, 27 Mar 2026 07:51:05 +0800
Subject: [PATCH 7/8] sync(skills): improve deepmd skills (#56)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.

Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@3f03f724e35f86024c0de921c7780bef75dee954
Upstream-Paths:
- machine-learning-potentials/deepmd-finetune-dpa3
- machine-learning-potentials/deepmd-python-inference
- machine-learning-potentials/deepmd-train-dpa3
- machine-learning-potentials/deepmd-train-se-e2-a
- molecular-dynamics/lammps-deepmd
---
 skills/deepmd-train-dpa3/SKILL.md    | 28 ++++++++++++++++++++++++++--
 skills/deepmd-train-se-e2-a/SKILL.md | 15 +++++++++++----
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/skills/deepmd-train-dpa3/SKILL.md b/skills/deepmd-train-dpa3/SKILL.md
index 07b9fbfce4..9c361e041a 100644
--- a/skills/deepmd-train-dpa3/SKILL.md
+++ b/skills/deepmd-train-dpa3/SKILL.md
@@ -155,6 +155,10 @@ DPA3 also supports the mixed type data format for multi-element systems.
 }
 ```
 
+If you do not want to train on virial, set the virial prefactors to 0.
+
+DPA3 uses different default loss prefactors compared to SE_E2_A. See the comparison table in the "Key Differences from SE_E2_A" section below.
+
 The meaning of each parameter can be generated through `dp doc-train-input`.
 Considering the output RST documentation on the screen is very long, use `grep` to find the documentation of a specific parameter:
 
@@ -192,12 +196,17 @@ dp --pt train input.json --restart model.ckpt.pt
 
 ### Step 4: Monitor Training
 
-The learning curve `lcurve.out` has the same format as other DeePMD models:
+The learning curve is written to `lcurve.out` with columns:
 
 ```
-# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
+# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn rmse_v_val rmse_v_trn lr
 ```
 
+- `rmse_e_*`: energy RMSE per atom (eV/atom)
+- `rmse_f_*`: force RMSE (eV/A)
+- `rmse_v_*`: virial RMSE (eV/atom, only present if virial data is available)
+- `lr`: current learning rate
+
 ### Step 5: Freeze the Model
 
 ```bash
@@ -264,6 +273,21 @@ Benchmark RMSE (averaged over 6 representative systems, 0.5M steps):
 | `update_style` | Residual update style | "res_residual" |
 | `update_residual` | Residual scaling factor | 0.1 |
 
+### Activation Function
+
+DPA3 uses `silut:10.0` by default. For datasets where training is unstable, consider switching to `tanh`:
+
+```json
+"descriptor": {
+  "type": "dpa3",
+  "repflow": { ... },
+  "activation_function": "tanh"
+},
+"fitting_net": {
+  "activation_function": "tanh"
+}
+```
+
 ### Optimizer
 
 DPA3 uses AdamW by default (decoupled weight decay):
diff --git a/skills/deepmd-train-se-e2-a/SKILL.md b/skills/deepmd-train-se-e2-a/SKILL.md
index 1e11b2ac89..11bcc992bf 100644
--- a/skills/deepmd-train-se-e2-a/SKILL.md
+++ b/skills/deepmd-train-se-e2-a/SKILL.md
@@ -102,7 +102,9 @@ A complete SE_E2_A training configuration:
     "start_pref_e": 0.02,
     "limit_pref_e": 1,
     "start_pref_f": 1000,
-    "limit_pref_f": 1
+    "limit_pref_f": 1,
+    "start_pref_v": 0.02,
+    "limit_pref_v": 1
   },
   "training": {
     "training_data": {
@@ -128,6 +130,10 @@ A complete SE_E2_A training configuration:
 }
 ```
 
+If you do not want to train on virial, set the virial prefactors to 0.
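A minimal sketch of that adjustment, assuming the configuration uses the `start_pref_v` and `limit_pref_v` keys from the `loss` block in this patch. The helper name `disable_virial` and the inline example dictionary are illustrative and not part of the skill.

```python
import json


def disable_virial(config):
    """Return a copy of a training config with both virial prefactors set to 0."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    loss = updated.setdefault("loss", {})
    loss["start_pref_v"] = 0
    loss["limit_pref_v"] = 0
    return updated


# Illustrative fragment; in practice load the full input.json with json.load().
example = {"loss": {"type": "ener", "start_pref_v": 0.02, "limit_pref_v": 1}}
print(json.dumps(disable_virial(example)["loss"], indent=2))
```

Returning a copy keeps the original configuration untouched, so both variants can be written to separate files and compared.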
+
+SE_E2_A uses different default loss prefactors compared to DPA3 (e: 0.02→1, f: 1000→1 vs. e: 0.2→20, f: 100→60, v: 0.02→1).
+
 The meaning of each parameter can be generated through `dp doc-train-input`.
 Considering the output RST documentation on the screen is very long, use `grep` to find the documentation of a specific parameter:
 
@@ -159,11 +165,12 @@ dp --pt train input.json --init-model model.ckpt.pt
 The learning curve is written to `lcurve.out` with columns:
 
 ```
-# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
+# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn rmse_v_val rmse_v_trn lr
 ```
 
 - `rmse_e_*`: energy RMSE per atom (eV/atom)
 - `rmse_f_*`: force RMSE (eV/A)
+- `rmse_v_*`: virial RMSE (eV/atom, only present if virial data is available)
 - `lr`: current learning rate
 
 Quick visualization:
@@ -222,9 +229,9 @@ dp --pt test -m model.pth -s /path/to/test_system -n 30
 | ------------------------------- | ------------------------ | ----- | ----- |
 | `start_pref_e` / `limit_pref_e` | Energy weight            | 0.02  | 1     |
 | `start_pref_f` / `limit_pref_f` | Force weight             | 1000  | 1     |
-| `start_pref_v` / `limit_pref_v` | Virial weight (optional) | 0     | 0     |
+| `start_pref_v` / `limit_pref_v` | Virial weight (optional) | 0.02  | 1     |
 
-Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights; the loss shifts from force-dominated early training to balanced energy+force later.
+Here, `start_pref_*` and `limit_pref_*` set the initial and final loss weights; the loss shifts from force-dominated early training to balanced energy+force later. For virial, set to 0 if not training on virial data.
 
 ### Training
 

From 5156c408b817342d190b67b6f0d922447663e5d5 Mon Sep 17 00:00:00 2001
From: mwDing <148040278+light-cyan@users.noreply.github.com>
Date: Fri, 8 May 2026 02:37:38 +0800
Subject: [PATCH 8/8] sync(skills): Add a description about `mass` (#73)

Imported from jinzhezenggroup/computational-chemistry-agent-skills.

Upstream-Commit: jinzhezenggroup/computational-chemistry-agent-skills@37ed5a29ae463b41fe903543a09004a995e39027
Upstream-Paths:
- machine-learning-potentials/deepmd-finetune-dpa3
- machine-learning-potentials/deepmd-python-inference
- machine-learning-potentials/deepmd-train-dpa3
- machine-learning-potentials/deepmd-train-se-e2-a
- molecular-dynamics/lammps-deepmd
---
 skills/lammps-deepmd/SKILL.md                       |  8 ++++++++
 skills/lammps-deepmd/assets/input.nvt.lammps        |  2 ++
 .../references/commands-and-workflow.md             | 13 ++-----------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/skills/lammps-deepmd/SKILL.md b/skills/lammps-deepmd/SKILL.md
index 5fbe292891..4ec87c9adc 100644
--- a/skills/lammps-deepmd/SKILL.md
+++ b/skills/lammps-deepmd/SKILL.md
@@ -25,6 +25,7 @@ Use this skill when the user wants to run molecular dynamics in LAMMPS with a De
 1. Confirm the minimum simulation inputs:
    - structure/data file (for example `data.system`)
    - DeePMD model file (for example `graph.pb` or compressed model)
+   - atom type to element mapping, including required per-type masses if the data file does not define them
    - target ensemble (NVE, NVT, NPT, or another explicitly requested setup)
    - temperature, pressure if applicable, timestep, and total number of steps
 1. Write the LAMMPS input script yourself instead of asking the user to hand-write it.
@@ -104,6 +105,8 @@ atom_style atomic
 neighbor 1.0 bin
 
 read_data data.system
+mass 1 28.0855
+mass 2 15.999
 
 pair_style deepmd graph_compressed.pb
 pair_coeff * *
@@ -171,6 +174,11 @@ run ${NSTEPS}
   - Reads the initial atomic structure, atom types, simulation box, and related information from the LAMMPS data file `data.system`.
   - Replace this filename with the actual user file.
 
+- `mass 1 28.0855`, `mass 2 15.999`
+
+  - Defines per-type atomic masses when the data file does not contain a `Masses` section.
+  - These example values correspond to a two-type Si/O mapping; adjust them to the actual atom type to element mapping. LAMMPS velocity creation and thermostats require masses; without them, runs can fail with `Not all per-type masses are set`.
+
 - `pair_style deepmd graph_compressed.pb`
 
   - Selects the DeePMD pair style.
diff --git a/skills/lammps-deepmd/assets/input.nvt.lammps b/skills/lammps-deepmd/assets/input.nvt.lammps
index 17bd72cc86..90317704dd 100644
--- a/skills/lammps-deepmd/assets/input.nvt.lammps
+++ b/skills/lammps-deepmd/assets/input.nvt.lammps
@@ -11,6 +11,8 @@ atom_style atomic
 neighbor 1.0 bin
 
 read_data data.system
+mass 1 28.0855
+mass 2 15.999
 
 pair_style deepmd graph_compressed.pb
 pair_coeff * *
diff --git a/skills/lammps-deepmd/references/commands-and-workflow.md b/skills/lammps-deepmd/references/commands-and-workflow.md
index 5dcb0f917d..7365057d17 100644
--- a/skills/lammps-deepmd/references/commands-and-workflow.md
+++ b/skills/lammps-deepmd/references/commands-and-workflow.md
@@ -2,17 +2,6 @@
 
 This reference expands the main skill with practical operating guidance.
 
-## When to use this skill
-
-Use this skill when a user needs to:
-
-- run LAMMPS with a DeePMD-kit model
-- write or modify `input.lammps`
-- explain what a LAMMPS command does
-- switch between NVE, NVT, and NPT
-- run through `uvx` in an internet-connected environment
-- run through a site-installed `lmp` command in an offline or HPC environment
-
 ## Practical rules for agents
 
 1. Prefer small, explicit input scripts over clever but opaque templates.
@@ -34,6 +23,7 @@ This helps catch obvious issues such as:
 - missing model file
 - unsupported pair style in the local LAMMPS build
 - malformed data file
+- missing per-type masses in the data file or input script
 - immediate numerical instability
 
 Then replace the short run with the intended production length.
@@ -49,6 +39,7 @@ Then replace the short run with the intended production length.
 ## Caution points
 
 - The correct timestep depends on the physical system and the DeePMD model quality.
+- Ensure every atom type has a mass, either in the LAMMPS data file `Masses` section or via explicit `mass` commands after `read_data`.
 - `velocity ... create ...` should usually not be repeated when continuing from a restart.
 - NPT settings need physically sensible damping constants; avoid copying values blindly.
 - Some local LAMMPS builds may support DeePMD under slightly different package configurations. Check `lmp -h` if unsure.
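
The per-type mass requirement added in this patch can be checked before a run is launched. The sketch below is not part of the skill: the function name `missing_mass_types` is hypothetical, and the parsing is deliberately simplified (it relies on the `N atom types` header line, strips `#` comments, and treats the first non-numeric line after `Masses` as the next section), so treat it as an assumption-laden illustration rather than a full LAMMPS data-file parser.

```python
def missing_mass_types(data_text):
    """Return atom type IDs that have no entry in the data file's Masses section."""
    n_types = 0
    found = set()
    in_masses = False
    for raw in data_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if line.endswith("atom types"):
            n_types = int(fields[0])
        elif line == "Masses":
            in_masses = True
        elif in_masses and len(fields) == 2 and fields[0].isdigit():
            found.add(int(fields[0]))  # "<type> <mass>" entry
        elif in_masses:
            in_masses = False  # reached the next section header
    return [t for t in range(1, n_types + 1) if t not in found]


# Hypothetical two-type data file in which only type 1 has a mass.
sample = """LAMMPS data file (hypothetical)

2 atoms
2 atom types

0.0 10.0 xlo xhi
0.0 10.0 ylo yhi
0.0 10.0 zlo zhi

Masses

1 28.0855

Atoms # atomic

1 1 0.0 0.0 0.0
2 2 1.0 1.0 1.0
"""
for atom_type in missing_mass_types(sample):
    print(f"atom type {atom_type} has no mass; add a mass command after read_data")
```

Any type reported here needs either a `Masses` entry in the data file or an explicit `mass` command after `read_data`, as the caution point above states.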