The following error appears when training a model using the PyTorch backend on the CPU:
[2025-03-22 22:37:08,743] DEEPMD INFO ---Summary of DataSystem: validation -----------------------------------------------
[2025-03-22 22:37:08,743] DEEPMD INFO found 1 system(s):
[2025-03-22 22:37:08,743] DEEPMD INFO            system  natoms  bch_sz  n_bch       prob  pbc
[2025-03-22 22:37:08,743] DEEPMD INFO    ../data/data_3     192       1     80  1.000e+00    T
[2025-03-22 22:37:08,743] DEEPMD INFO --------------------------------------------------------------------------------------
/home/njzjz/codes/deepmd-kit/deepmd/dpmodel/utils/learning_rate.py:42: RuntimeWarning: divide by zero encountered in scalar divide
  np.log(stop_lr / self.start_lr) / (stop_steps / self.decay_steps)
Traceback (most recent call last):
  File "/home/njzjz/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/main.py", line 928, in main
    deepmd_main(args)
  File "/home/njzjz/anaconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 530, in main
    train(
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 340, in train
    trainer = get_trainer(
              ^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 186, in get_trainer
    trainer = training.Trainer(
              ^^^^^^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/train/training.py", line 596, in __init__
    self.optimizer = torch.optim.Adam(
                     ^^^^^^^^^^^^^^^^^
  File "/home/njzjz/anaconda3/lib/python3.12/site-packages/torch/optim/adam.py", line 60, in __init__
    raise RuntimeError("`fused=True` requires all the params to be floating point Tensors of "
RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone'].
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
terminate called after throwing an instance of 'std::system_error'
what(): Broken pipe
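The RuntimeError above says that `fused=True` is only accepted for parameters on the listed device types, so one possible workaround is to request the fused implementation only when the training device is in that list. The sketch below is not the DeePMD-kit code: `FUSED_DEVICE_TYPES`, `fused_supported`, and `build_adam` are hypothetical names, and the device list is copied from the error message for PyTorch v2.3.1, so other PyTorch releases may accept a different set.

```python
# Device types copied from the RuntimeError message above (PyTorch v2.3.1).
# Assumption: other PyTorch releases may support a different set of devices.
FUSED_DEVICE_TYPES = ("cuda", "xpu", "privateuseone")


def fused_supported(device_type: str) -> bool:
    """Return True if the fused Adam path can be requested for this device type."""
    return device_type in FUSED_DEVICE_TYPES


def build_adam(params, lr: float, device_type: str):
    """Hypothetical helper: create Adam, asking for fused only where it is supported."""
    import torch  # imported lazily so the device check above runs without PyTorch

    return torch.optim.Adam(params, lr=lr, fused=fused_supported(device_type))
```

With this guard, a CPU run would call `torch.optim.Adam(..., fused=False)` and skip the failing check, while a CUDA run would still use the fused implementation.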
Bug summary
The following error appears when training a model using the PyTorch backend on the CPU:
DeePMD-kit Version
c9bfa79
Backend and its version
PyTorch v2.3.1+cpu-gd44533f9d07
How did you download the software?
Built from source
Input Files, Running Commands, Error Log, etc.
Water example.
Steps to Reproduce
cd examples/water/se_e2_a
dp --pt train input.json
Further Information, Files, and Links
No response