deepmodeling · njzjz · Aug 25, 2025 · Aug 24, 2025 · Aug 24, 2025 · Aug 24, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,191 @@
+# DeePMD-kit
+
+DeePMD-kit is a deep learning package for many-body potential energy representation and molecular dynamics. It supports multiple backends (TensorFlow, PyTorch, JAX, Paddle) and integrates with MD packages like LAMMPS, GROMACS, and i-PI.
+
+**Always reference these instructions first and fallback to search or bash commands only when you encounter unexpected information that does not match the info here.**
+
+## Working Effectively
+
+### Bootstrap and Build Repository
+
+- Create virtual environment: `uv venv venv && source venv/bin/activate`
+- Install base dependencies: `uv pip install tensorflow-cpu` (takes ~8 seconds)
+- Install PyTorch: `uv pip install torch --index-url https://download.pytorch.org/whl/cpu` (takes ~5 seconds)
+- Build Python package: `uv pip install -e .[cpu,test]` -- takes 67 seconds. **NEVER CANCEL. Set timeout to 120+ seconds.**
+- Build C++ components: `export TENSORFLOW_ROOT=$(python -c 'import importlib.util,pathlib;print(pathlib.Path(importlib.util.find_spec("tensorflow").origin).parent)')` then `export PYTORCH_ROOT=$(python -c 'import torch;print(torch.__path__[0])')` then `./source/install/build_cc.sh` -- takes 164 seconds. **NEVER CANCEL. Set timeout to 300+ seconds.**
+
+### Test Repository
+
+- Run single test: `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` -- takes 8-13 seconds
+- Run test subset: `pytest source/tests/tf/test_dp_test.py -v` -- takes 15 seconds. **NEVER CANCEL. Set timeout to 60+ seconds.**
+- **Recommended: Use single test cases for validation instead of full test suite** -- full suite has 314 test files and takes 60+ minutes
+
+### Lint and Format Code
+
+- Install linter: `uv pip install ruff`
+- Run linting: `ruff check .` -- takes <1 second
+- Format code: `ruff format .` -- takes <1 second
+- **Always run `ruff check .` and `ruff format .` before committing changes or the CI will fail.**
+
+### Training and Validation
+
+- Test TensorFlow training: `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` -- training proceeds but is slow on CPU
+- Test PyTorch training: `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` -- training proceeds but is slow on CPU
+- **Training examples are for validation only. Real training takes hours/days. Timeout training tests after 60 seconds for validation.**
+
+## Validation Scenarios
+
+**ALWAYS manually validate any new code through at least one complete scenario:**
+
+### Basic Functionality Validation
+
+1. **CLI Interface**: Run `dp --version` and `dp -h` to verify installation
+2. **Python Interface**: Run `python -c "import deepmd; import deepmd.tf; print('Both interfaces work')"`
+3. **Backend Selection**: Test `dp --tf -h`, `dp --pt -h`, `dp --jax -h`, `dp --pd -h`
+
+### Training Workflow Validation
+
+1. **TensorFlow Training**: `cd examples/water/se_e2_a && timeout 60 dp train input.json --skip-neighbor-stat` -- should start training and show decreasing loss
+2. **PyTorch Training**: `cd examples/water/se_e2_a && timeout 60 dp --pt train input_torch.json --skip-neighbor-stat` -- should start training and show decreasing loss
+3. **Verify training output**: Look for "batch X: trn: rmse" messages showing decreasing error values
+
+### Test-Based Validation
+
+1. **Core Tests**: `pytest source/tests/tf/test_dp_test.py::TestDPTestEner::test_1frame -v` -- should pass in ~10 seconds
+2. **Multi-backend**: Test both TensorFlow and PyTorch components work
+
+## Common Commands and Timing
+
+### Repository Structure
+
+```
+ls -la [repo-root]
+.github/               # GitHub workflows and templates
+CONTRIBUTING.md        # Contributing guide
+README.md             # Project overview
+deepmd/               # Python package source
+doc/                  # Documentation
+examples/             # Training examples and configurations
+pyproject.toml        # Python build configuration
+source/               # C++ source code and tests
+```
+
+### Key Directories and Files
+
+- `deepmd/` - Main Python package with backend implementations
+- `source/lib/` - Core C++ library
+- `source/op/` - Backend-specific operators (TF, PyTorch, etc.)
+- `source/api_cc/` - C++ API
+- `source/api_c/` - C API
+- `source/tests/` - Test suite (314 test files)
+- `examples/water/se_e2_a/` - Basic water training example
+- `examples/` - Various model examples for different scenarios
+
+### Common CLI Commands
+
+- `dp --version` - Show version information
+- `dp -h` - Show help and available commands
+- `dp train input.json` - Train a model (TensorFlow backend)
+- `dp --pt train input.json` - Train with PyTorch backend
+- `dp --jax train input.json` - Train with JAX backend
+- `dp --pd train input.json` - Train with Paddle backend
+- `dp test -m model.pb -s system/` - Test a trained model
+- `dp freeze -o model.pb` - Freeze/save a model
+
+### Build Dependencies and Setup
+
+- **Python 3.9+** required
+- **Virtual environment** strongly recommended: `uv venv venv && source venv/bin/activate`
+- **Backend dependencies**: TensorFlow, PyTorch, JAX, or Paddle (install before building)
+- **Build tools**: CMake, C++ compiler, scikit-build-core
+- **C++ build requires**: Both TensorFlow and PyTorch installed, set TENSORFLOW_ROOT and PYTORCH_ROOT environment variables
+
+### Key Configuration Files
+
+- `pyproject.toml` - Python build configuration and dependencies
+- `source/CMakeLists.txt` - C++ build configuration
+- `examples/water/se_e2_a/input.json` - Basic TensorFlow training config
+- `examples/water/se_e2_a/input_torch.json` - Basic PyTorch training config
+
+## Frequent Patterns and Time Expectations
+
+### Installation and Build Times
+
+- **Virtual environment setup**: ~5 seconds
+- **TensorFlow CPU install**: ~8 seconds
+- **PyTorch CPU install**: ~5 seconds
+- **Python package build**: ~67 seconds. **NEVER CANCEL.**
+- **C++ components build**: ~164 seconds. **NEVER CANCEL.**
+- **Full fresh setup**: ~3-4 minutes total
+
+### Testing Times
+
+- **Single test**: 8-13 seconds
+- **Test file (~5 tests)**: ~15 seconds
+- **Backend-specific test subset**: 15-30 minutes. **Use sparingly.**
+- **Full test suite (314 files)**: 60+ minutes. **Avoid in development - use single tests instead.**
+
+### Linting and Formatting
+
+- **Ruff check**: <1 second
+- **Ruff format**: <1 second
+- **Pre-commit hooks**: May have network issues, use individual tools
+
+### Commit Messages and PR Titles
+
+**All commit messages and PR titles must follow [conventional commit specification](https://www.conventionalcommits.org/):**
+
+- **Format**: `type(scope): description`
+- **Common types**: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`, `ci`
+- **Examples**:
+  - `feat(core): add new descriptor type`
+  - `fix(tf): resolve memory leak in training`
+  - `docs: update installation guide`
+  - `ci: add workflow for testing`
+
+### Training and Model Operations
+
+- **Training initialization**: 10-30 seconds
+- **Training per batch**: 0.1-1 second (CPU), much faster on GPU
+- **Model freezing**: 5-15 seconds
+- **Model testing**: 10-30 seconds
+
+## Backend-Specific Notes
+
+### TensorFlow Backend
+
+- **Default backend** when no flag specified
+- **Configuration**: Use `input.json` format
+- **Training**: `dp train input.json`
+- **Requirements**: `tensorflow` or `tensorflow-cpu` package
+
+### PyTorch Backend
+
+- **Activation**: Use `--pt` flag or `export DP_BACKEND=pytorch`
+- **Configuration**: Use `input_torch.json` format typically
+- **Training**: `dp --pt train input_torch.json`
+- **Requirements**: `torch` package
+
+### JAX Backend
+
+- **Activation**: Use `--jax` flag
+- **Training**: `dp --jax train input.json`
+- **Requirements**: `jax` and related packages
+- **Note**: Experimental backend, may have limitations
+
+### Paddle Backend
+
+- **Activation**: Use `--pd` flag
+- **Training**: `dp --pd train input.json`
+- **Requirements**: `paddlepaddle` package
+- **Note**: Less commonly used
+
+## Critical Warnings
+
+- **NEVER CANCEL BUILD OPERATIONS**: Python build takes 67 seconds, C++ build takes 164 seconds
+- **USE SINGLE TESTS FOR VALIDATION**: Run individual tests instead of full test suite for faster feedback
+- **ALWAYS activate virtual environment**: Build and runtime failures occur without proper environment
+- **ALWAYS install backend dependencies first**: TensorFlow/PyTorch required before building C++ components
+- **ALWAYS run linting before commits**: `ruff check . && ruff format .` or CI will fail
+- **ALWAYS test both Python and C++ components**: Some features require both to be built
+- **ALWAYS follow conventional commit format**: All commit messages and PR titles must use conventional commit specification (`type(scope): description`)
diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml
@@ -0,0 +1,65 @@
+name: "Copilot Setup Steps"
+
+# Automatically run the setup steps when they are changed to allow for easy validation, and
+# allow manual testing through the repository's "Actions" tab
+on:
+  workflow_dispatch:
+  push:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+  pull_request:
+    paths:
+      - .github/workflows/copilot-setup-steps.yml
+
+jobs:
+  # The job MUST be called `copilot-setup-steps` or it will not be picked up by Copilot.
+  copilot-setup-steps:
+    runs-on: ubuntu-latest
+
+    # Set the permissions to the lowest permissions possible needed for your steps.
+    # Copilot will be given its own token for its operations.
+    permissions:
+      # If you want to clone the repository as part of your setup steps, for example to install dependencies, you'll need the `contents: read` permission. If you don't clone the repository in your setup steps, Copilot will do this for you automatically after the steps complete.
+      contents: read
+
+    # You can define any steps you want, and they will run before the agent starts.
+    # If you do not check out your code, Copilot will do this for you.
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Create virtual environment
+        run: uv venv venv
+
+      - name: Activate virtual environment
+        run: echo "VIRTUAL_ENV=$PWD/venv" >> $GITHUB_ENV && echo "$PWD/venv/bin" >> $GITHUB_PATH
+
+      - name: Install base dependencies
+        run: uv pip install tensorflow-cpu
+
+      - name: Install PyTorch
+        run: uv pip install torch --index-url https://download.pytorch.org/whl/cpu
+
+      - name: Build Python package
+        run: uv pip install -e .[cpu,test]
+
+      - name: Install pre-commit tools
+        run: uv tool install pre-commit
+
+      - name: Install pre-commit hooks
+        run: pre-commit install --install-hooks
+
+      - name: Verify installation
+        run: |
+          dp --version
+          python -c "import deepmd; import deepmd.tf; print('DeePMD-kit installation verified')"
diff --git a/.gitignore b/.gitignore
@@ -50,3 +50,8 @@ uv.lock
 buildcxx/
 node_modules/
 *.bib.original
+
+# Test output files (temporary)
+test_dp_test/
+test_dp_test_*.out
+*_detail.out