[BUG] PT: RuntimeError: fused=True requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone']. #4667

Description

@njzjz

Bug summary

The following error appears when training a model using the PyTorch backend on the CPU:

[2025-03-22 22:37:08,743] DEEPMD INFO    ---Summary of DataSystem: validation   -----------------------------------------------
[2025-03-22 22:37:08,743] DEEPMD INFO    found 1 system(s):
[2025-03-22 22:37:08,743] DEEPMD INFO                                        system  natoms  bch_sz   n_bch       prob  pbc
[2025-03-22 22:37:08,743] DEEPMD INFO                                ../data/data_3     192       1      80  1.000e+00    T
[2025-03-22 22:37:08,743] DEEPMD INFO    --------------------------------------------------------------------------------------
/home/njzjz/codes/deepmd-kit/deepmd/dpmodel/utils/learning_rate.py:42: RuntimeWarning: divide by zero encountered in scalar divide
  np.log(stop_lr / self.start_lr) / (stop_steps / self.decay_steps)
Traceback (most recent call last):
  File "/home/njzjz/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/main.py", line 928, in main
    deepmd_main(args)
  File "/home/njzjz/anaconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 530, in main
    train(
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 340, in train
    trainer = get_trainer(
              ^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 186, in get_trainer
    trainer = training.Trainer(
              ^^^^^^^^^^^^^^^^^
  File "/home/njzjz/codes/deepmd-kit/deepmd/pt/train/training.py", line 596, in __init__
    self.optimizer = torch.optim.Adam(
                     ^^^^^^^^^^^^^^^^^
  File "/home/njzjz/anaconda3/lib/python3.12/site-packages/torch/optim/adam.py", line 60, in __init__
    raise RuntimeError("`fused=True` requires all the params to be floating point Tensors of "
RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone'].
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe
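The traceback shows that `torch.optim.Adam` is constructed with `fused=True` (`deepmd/pt/train/training.py`, line 596) while the model parameters live on the CPU, and PyTorch 2.3.1 rejects that combination. Below is a minimal sketch of one possible guard. It is not DeePMD-kit's actual fix. The idea is to request fusion only when every parameter is a floating-point tensor on a device type that the error message lists. `fused_adam_supported` and `build_adam` are hypothetical helper names. The supported-device set is copied from this PyTorch 2.3.1 error message and may differ in other releases.

```python
from collections.abc import Iterable

# Device types accepted for fused Adam according to the PyTorch 2.3.1 error
# message in this report. Assumption: other releases may accept a different set.
FUSED_ADAM_DEVICE_TYPES = frozenset({"cuda", "xpu", "privateuseone"})


def fused_adam_supported(device_types: Iterable[str]) -> bool:
    """Hypothetical helper: True only if every parameter's device type is listed above."""
    types = list(device_types)
    # An empty parameter list has nothing to fuse, so report False.
    return bool(types) and all(t in FUSED_ADAM_DEVICE_TYPES for t in types)


def build_adam(params, lr: float):
    """Hypothetical helper: ask for fused Adam only when PyTorch can honour it."""
    import torch  # deferred so fused_adam_supported() is usable without PyTorch

    params = list(params)
    fused = fused_adam_supported(p.device.type for p in params) and all(
        p.is_floating_point() for p in params
    )
    # fused=False falls back to the non-fused implementation instead of raising.
    return torch.optim.Adam(params, lr=lr, fused=fused)
```

With a CPU-only build such as `2.3.1+cpu`, `build_adam(model.parameters(), lr=1e-3)` would pass `fused=False` and avoid the `RuntimeError`. On CUDA it would still request the fused kernel.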

DeePMD-kit Version

c9bfa79

Backend and its version

PyTorch v2.3.1+cpu-gd44533f9d07

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

The water example shipped with DeePMD-kit (`examples/water/se_e2_a`, input file `input.json`).

Steps to Reproduce

cd examples/water/se_e2_a
dp --pt train input.json

Further Information, Files, and Links

No response