diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
index a733ba9d65..c8e60c514a 100644
--- a/doc/model/dpa2.md
+++ b/doc/model/dpa2.md
@@ -26,7 +26,7 @@ When using the JAX backend, 2 or more MPI ranks are not supported. One must set
 atom_modify map yes
 ```
 
-See the example `examples/water/lmp/jax_dpa2.lammps`.
+See the example `examples/water/lmp/jax_dpa.lammps`.
 
 ## Data format
 
diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
new file mode 100644
index 0000000000..5770a2889b
--- /dev/null
+++ b/doc/model/dpa3.md
@@ -0,0 +1,76 @@
+# Descriptor DPA-3 {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+
+:::{note}
+**Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+:::
+
+DPA-3 is an advanced interatomic potential leveraging the message passing architecture.
+Designed as a large atomic model (LAM), DPA-3 is tailored to integrate and simultaneously train on datasets from various disciplines,
+encompassing diverse chemical and materials systems across different research domains.
+Its model design ensures exceptional fitting accuracy and robust generalization both within and beyond the training domain.
+Furthermore, DPA-3 maintains energy conservation and respects the physical symmetries of the potential energy surface,
+making it a dependable tool for a wide range of scientific applications.
+
+Reference: will be released soon.
+
+Training example: `examples/water/dpa3/input_torch.json`.
+
+## Hyperparameter tests
+
+We systematically conducted DPA-3 training on six representative DFT datasets (available at [AIS-Square](https://www.aissquare.com/datasets/detail?pageType=datasets&name=DPA3_hyperparameter_search&id=316)):
+metallic systems (`Alloy`, `AlMgCu`, `W`), covalent material (`Boron`), molecular system (`Drug`), and liquid water (`Water`).
+Under consistent training conditions (0.5M training steps, batch_size "auto:128"),
+we rigorously evaluated the impacts of some critical hyperparameters on validation accuracy.
+
+The comparative analysis focused on the average root-mean-square errors (RMSEs) of energy, force, and virial predictions across all six systems,
+with results tabulated below to guide scenario-specific hyperparameter selection:
+
+| Model            | comment         | nlayers | n_dim   | e_dim  | a_dim | e_sel   | a_sel  | start_lr | stop_lr  | loss prefactors           | rmse_e (meV/atom) | rmse_f (meV/Å) | rmse_v (meV/atom) | Training wall time (h) |
+| ---------------- | --------------- | ------- | ------- | ------ | ----- | ------- | ------ | -------- | -------- | ------------------------- | ----------------- | -------------- | ----------------- | ---------------------- |
+| DPA3-L3          | Default         | 3       | 256     | 128    | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.74              | 85.4           | 43.1              | 9.8                    |
+|                  | Small dimension | 3       | **128** | **64** | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 6.99              | 93.6           | 46.7              | 8.0                    |
+|                  | Large sel       | 3       | 256     | 128    | 32    | **154** | **48** | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.70              | 83.7           | 43.4              | 14.1                   |
+| DPA3-L6          | Default         | 6       | 256     | 128    | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 4.85              | 79.9           | 39.7              | 19.2                   |
+|                  | Small dimension | 6       | **128** | **64** | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.11              | 77.7           | 41.2              | 14.1                   |
+|                  | Large sel       | 6       | 256     | 128    | 32    | **154** | **48** | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 4.76              | 78.4           | 40.2              | 31.8                   |
+| DPA2-L6 (medium) | Default         | 6       | -       | -      | -     | -       | -      | 1e-3     | 3.51e-08 | 0.02\|1, 1000\|1, 0.02\|1 | 12.12             | 109.3          | 83.1              | 12.2                   |
+
+The loss prefactors (0.2|20, 100|60, 0.02|1) correspond to (`start_pref_e`|`limit_pref_e`, `start_pref_f`|`limit_pref_f`, `start_pref_v`|`limit_pref_v`) respectively.
+Virial RMSEs were averaged exclusively for systems containing virial labels (`Alloy`, `AlMgCu`, `W`, and `Boron`).
+
+Note that all DPA-3 models use `float32` precision, while the other models use `float64` by default.
+
+## Requirements of installation from source code {{ pytorch_icon }}
+
+To run the DPA-3 model on LAMMPS via source code installation
+(users can skip this step if using [easy installation](../install/easy-install.md)),
+the custom OP library for Python interface integration must be compiled and linked
+during the [model freezing process](../freeze/freeze.md).
+
+The customized OP library for the Python interface can be installed by setting environment variable {envvar}`DP_ENABLE_PYTORCH` to `1` during installation.
+
+If one runs LAMMPS with MPI, the customized OP library for the C++ interface should be compiled against the same MPI library as the runtime MPI.
+If one runs LAMMPS with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a [CUDA-Aware MPI](https://developer.nvidia.com/mpi-solutions-gpus) library and CUDA,
+otherwise the communication between GPU cards falls back to the slower CPU implementation.
+
+## Limitations of the JAX backend with LAMMPS {{ jax_icon }}
+
+When using the JAX backend, 2 or more MPI ranks are not supported. One must set `map` to `yes` using the [`atom_modify`](https://docs.lammps.org/atom_modify.html) command.
+
+```lammps
+atom_modify map yes
+```
+
+See the example `examples/water/lmp/jax_dpa.lammps`.
+
+## Data format
+
+DPA-3 supports both the [standard data format](../data/system.md) and the [mixed type data format](../data/system.md#mixed-type).
+
+## Type embedding
+
+Type embedding is within this descriptor with the same dimension as the node embedding: {ref}`n_dim ` argument.
+
+## Model compression
+
+Model compression is not supported in this descriptor.
diff --git a/doc/model/index.rst b/doc/model/index.rst
index 33dbf571cf..b97db858cc 100644
--- a/doc/model/index.rst
+++ b/doc/model/index.rst
@@ -10,6 +10,7 @@ Model
    train-se-e3
    train-se-atten
    dpa2
+   dpa3
    train-hybrid
    sel
    train-energy
diff --git a/doc/model/train-energy-spin.md b/doc/model/train-energy-spin.md
index 1d56d59449..52a470f2a6 100644
--- a/doc/model/train-energy-spin.md
+++ b/doc/model/train-energy-spin.md
@@ -51,6 +51,7 @@ In PyTorch/DP, the spin implementation is more flexible and so far supports the
 - `se_e2_a`
 - `dpa1`(`se_atten`)
 - `dpa2`
+- `dpa3`
 
 See `se_e2_a` examples in `$deepmd_source_dir/examples/spin/se_e2_a/input_torch.json`, the {ref}`spin ` section is defined as the following with a much more clear interface:
 
diff --git a/examples/water/dpa3/README.md b/examples/water/dpa3/README.md
new file mode 100644
index 0000000000..2352248278
--- /dev/null
+++ b/examples/water/dpa3/README.md
@@ -0,0 +1,4 @@
+# Input for the DPA-3 model
+
+This directory stores configuration files for training the 6-layer DPA-3 model.
+For comprehensive hyperparameter selection, consult the [DPA-3 documentation](../../../doc/model/dpa3.md#hyperparameter-tests).
diff --git a/examples/water/dpa3/input_torch.json b/examples/water/dpa3/input_torch.json
new file mode 100644
index 0000000000..ebdbb78724
--- /dev/null
+++ b/examples/water/dpa3/input_torch.json
@@ -0,0 +1,94 @@
+{
+  "_comment": "that's all",
+  "model": {
+    "type_map": [
+      "O",
+      "H"
+    ],
+    "descriptor": {
+      "type": "dpa3",
+      "repflow": {
+        "n_dim": 256,
+        "e_dim": 128,
+        "a_dim": 32,
+        "nlayers": 6,
+        "e_rcut": 6.0,
+        "e_rcut_smth": 3.0,
+        "e_sel": 120,
+        "a_rcut": 4.0,
+        "a_rcut_smth": 2.0,
+        "a_sel": 30,
+        "axis_neuron": 4,
+        "skip_stat": true,
+        "a_compress_rate": 1,
+        "a_compress_e_rate": 2,
+        "a_compress_use_split": true,
+        "update_angle": true,
+        "update_style": "res_residual",
+        "update_residual": 0.1,
+        "update_residual_init": "const"
+      },
+      "activation_function": "silut:10.0",
+      "use_tebd_bias": false,
+      "precision": "float32",
+      "concat_output_tebd": false
+    },
+    "fitting_net": {
+      "neuron": [
+        240,
+        240,
+        240
+      ],
+      "resnet_dt": true,
+      "precision": "float32",
+      "activation_function": "silut:10.0",
+      "seed": 1,
+      "_comment": " that's all"
+    },
+    "_comment": " that's all"
+  },
+  "learning_rate": {
+    "type": "exp",
+    "decay_steps": 5000,
+    "start_lr": 0.001,
+    "stop_lr": 3e-5,
+    "_comment": "that's all"
+  },
+  "loss": {
+    "type": "ener",
+    "start_pref_e": 0.2,
+    "limit_pref_e": 20,
+    "start_pref_f": 100,
+    "limit_pref_f": 60,
+    "start_pref_v": 0.02,
+    "limit_pref_v": 1,
+    "_comment": " that's all"
+  },
+  "training": {
+    "stat_file": "./dpa3.hdf5",
+    "training_data": {
+      "systems": [
+        "../data/data_0",
+        "../data/data_1",
+        "../data/data_2"
+      ],
+      "batch_size": 1,
+      "_comment": "that's all"
+    },
+    "validation_data": {
+      "systems": [
+        "../data/data_3"
+      ],
+      "batch_size": 1,
+      "_comment": "that's all"
+    },
+    "numb_steps": 1000000,
+    "warmup_steps": 0,
+    "gradient_max_norm": 5.0,
+    "seed": 10,
+    "disp_file": "lcurve.out",
+    "disp_freq": 100,
+    "save_freq": 2000,
+    "_comment": "that's all"
+  }
+}
diff --git a/examples/water/lmp/jax_dpa2.lammps b/examples/water/lmp/jax_dpa.lammps
similarity index 91%
rename from examples/water/lmp/jax_dpa2.lammps
rename to examples/water/lmp/jax_dpa.lammps
index c9fdeac47d..f62aa079bf 100644
--- a/examples/water/lmp/jax_dpa2.lammps
+++ b/examples/water/lmp/jax_dpa.lammps
@@ -5,7 +5,7 @@
 units metal
 boundary p p p
 atom_style atomic
-# Below line is required when using DPA-2 with the JAX backend
+# Below line is required when using DPA-2/3 with the JAX backend
 atom_modify map yes
 
 neighbor 2.0 bin
diff --git a/source/tests/common/test_examples.py b/source/tests/common/test_examples.py
index 92ecf3a09f..283b02bc2f 100644
--- a/source/tests/common/test_examples.py
+++ b/source/tests/common/test_examples.py
@@ -58,6 +58,7 @@
     p_examples / "water" / "dpa2" / "input_torch_medium.json",
     p_examples / "water" / "dpa2" / "input_torch_large.json",
     p_examples / "water" / "dpa2" / "input_torch_compressible.json",
+    p_examples / "water" / "dpa3" / "input_torch.json",
     p_examples / "property" / "train" / "input_torch.json",
     p_examples / "water" / "se_e3_tebd" / "input_torch.json",
     p_examples / "hessian" / "single_task" / "input.json",