
[BUG] PT parallel training prints the summary on every rank #4595

Description

@njzjz

Bug summary

During PyTorch parallel training, every process prints the full startup summary. It should be printed only once, i.e., on rank 0.
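A minimal sketch of the expected behavior, assuming the launcher exports the RANK environment variable (torchrun does) and that the summary is emitted through Python's standard logging module. RankZeroFilter and make_logger are hypothetical names used for illustration only; they are not DeePMD-kit's API or its actual fix.

```python
import logging
import os


class RankZeroFilter(logging.Filter):
    """Hypothetical filter that lets log records through only on global rank 0.

    If no rank is given, it is read from the RANK environment variable
    exported by torchrun; a process without RANK is treated as a
    single-process run, i.e. as rank 0.
    """

    def __init__(self, rank=None):
        super().__init__()
        if rank is None:
            rank = int(os.environ.get("RANK", "0"))
        self.rank = rank

    def filter(self, record):
        # A falsy return value drops the record; only rank 0 keeps it.
        return self.rank == 0


def make_logger(name, rank=None):
    """Hypothetical helper: a logger whose own records are emitted only on rank 0.

    Note that a filter attached to a logger applies only to records logged
    directly on that logger, not to records propagated from child loggers.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addFilter(RankZeroFilter(rank))
    return logger
```

Attaching the same filter to a handler instead of the logger would also catch records propagated from child loggers, at the cost of affecting everything routed through that handler.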

DeePMD-kit Version

v3.0.1

Backend and its version

PyTorch v2.4.1.post302

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

[2025-02-10 17:58:17,472] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,472] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,472] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,472] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,472] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,472] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,473] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,473] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,473] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,473] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,473] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,473] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,473] DEEPMD INFO    source:
[2025-02-10 17:58:17,473] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,473] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,473] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,473] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,473] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,473] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,473] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,473] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,473] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,473] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,473] DEEPMD INFO    computing device:      cuda:1
[2025-02-10 17:58:17,473] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,473] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,473] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,473] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,473] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,502] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,503] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,503] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,503] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,503] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,503] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,503] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,503] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,503] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,503] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,503] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,503] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,503] DEEPMD INFO    source:
[2025-02-10 17:58:17,503] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,503] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,503] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,503] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,503] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,503] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,503] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,503] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,503] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,503] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,503] DEEPMD INFO    computing device:      cuda:2
[2025-02-10 17:58:17,503] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,503] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,503] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,503] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,503] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,510] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,510] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,510] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,510] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,510] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,510] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,510] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,510] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,510] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,510] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,510] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,510] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,510] DEEPMD INFO    source:
[2025-02-10 17:58:17,510] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,510] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,510] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,510] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,510] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,511] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,511] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,511] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,511] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,511] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,511] DEEPMD INFO    computing device:      cuda:5
[2025-02-10 17:58:17,511] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,511] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,511] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,511] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,511] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,519] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,519] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,519] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,519] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,519] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,519] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,519] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,519] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,519] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,519] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,519] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,519] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,519] DEEPMD INFO    source:
[2025-02-10 17:58:17,519] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,519] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,519] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,519] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,519] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,519] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,519] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,519] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,519] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,519] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,519] DEEPMD INFO    computing device:      cuda:3
[2025-02-10 17:58:17,520] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,520] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,520] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,520] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,520] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,540] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,540] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,540] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,540] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,540] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,540] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,540] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,540] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,540] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,540] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,540] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,540] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,540] DEEPMD INFO    source:
[2025-02-10 17:58:17,541] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,541] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,541] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,541] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,541] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,541] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,541] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,541] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,541] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,541] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,541] DEEPMD INFO    computing device:      cuda:7
[2025-02-10 17:58:17,541] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,541] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,541] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,541] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,541] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,580] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,580] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,580] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,580] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,580] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,580] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,580] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,580] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,580] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,580] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,580] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,580] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,580] DEEPMD INFO    source:
[2025-02-10 17:58:17,580] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,581] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,581] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,581] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,581] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,581] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,581] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,581] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,581] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,581] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,581] DEEPMD INFO    computing device:      cuda:4
[2025-02-10 17:58:17,581] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,581] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,581] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,581] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,581] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[bohrium-156-1256408:01441] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1256408.0/jf.0/1753350144/shared_mem_cuda_pool.bohrium-156-1256408 could be created.
[bohrium-156-1256408:01441] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[bohrium-156-1256408:01435] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1256408.0/jf.0/4004642816/shared_mem_cuda_pool.bohrium-156-1256408 could be created.
[bohrium-156-1256408:01435] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[2025-02-10 17:58:17,708] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,708] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,708] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,708] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,708] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,708] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,708] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,708] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,708] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,708] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,708] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,708] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,708] DEEPMD INFO    source:
[2025-02-10 17:58:17,708] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,708] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,708] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,708] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,708] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,708] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,708] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,708] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,708] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,708] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,708] DEEPMD INFO    computing device:      cuda:6
[2025-02-10 17:58:17,708] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,708] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,709] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,709] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,709] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,806] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,806] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,806] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,806] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,806] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,806] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,806] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,806] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,806] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,806] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,806] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,806] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,806] DEEPMD INFO    source:
[2025-02-10 17:58:17,806] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,806] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,806] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,806] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,806] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,806] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,806] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,806] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,806] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,806] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,806] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,807] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,807] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,807] DEEPMD INFO    computing device:      cuda:0
[2025-02-10 17:58:17,807] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,807] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,807] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,807] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,807] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------

Steps to Reproduce

cd examples/water/se_atten
torchrun --nproc_per_node=4 --no-python dp --pt train input.json
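With torchrun, each of the --nproc_per_node worker processes runs dp --pt train independently, so any unguarded summary code runs once per process. The sketch below guards a summary function with a rank check; get_rank, rank_zero_only and print_summary are hypothetical names, and the fallback order (initialized torch.distributed process group, then the RANK variable, then 0) is an assumption, not how DeePMD-kit is known to work.

```python
import functools
import os


def get_rank():
    """Hypothetical helper returning the global rank of this process."""
    try:
        import torch.distributed as dist

        # Use the process group if torch is installed and it has been initialized.
        if dist.is_available() and dist.is_initialized():
            return dist.get_rank()
    except ImportError:
        pass
    # Otherwise rely on the RANK variable that torchrun exports; default to 0.
    return int(os.environ.get("RANK", "0"))


def rank_zero_only(fn):
    """Decorator: call fn only on rank 0; other ranks get None."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if get_rank() == 0:
            return fn(*args, **kwargs)
        return None

    return wrapper


@rank_zero_only
def print_summary(device):
    # Hypothetical stand-in for the banner and summary shown in the log above.
    line = f"computing device:      {device}"
    print(line)
    return line
```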

Further Information, Files, and Links

No response