OOM problem when running "dp test" (batch_size=1) #748

Description

@Manyi-Yang

Summary
After training a model with the descriptor "neuron": [40, 80, 160], I tried to check it with the "dp test" command and got an OOM (out-of-memory) error on the GPU. However, when I use the same model to run a DP-MD simulation, it works well. Also, when I reduce the descriptor network to [25, 50, 100], "dp test" runs without problems.

DeePMD-kit version, installation method, input file, running commands, error log, etc.
2.0.0.b0, conda

Part of the input parameters:

        "descriptor": {
            "type":             "se_a",
            "sel":              [128],
            "rcut_smth":        1.00,
            "rcut":             7.00,
            "neuron":           [40, 80, 160],
            "axis_neuron":      16,
            "resnet_dt":        false,
            "seed": 722586222
        },
        "fitting_net": {
            "neuron": [
                240,
                240,
                240,
                240
            ],
            "resnet_dt": true,
            "seed": 711230366
        }
    },

Error information when running dp test:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[25600,128,120] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node load/gradients/filter_type_0/MatMul_4_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[load/o_virial/_27]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[25600,128,120] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node load/gradients/filter_type_0/MatMul_4_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
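
For scale, here is a back-of-the-envelope estimate of the single allocation that fails, using only the shape and dtype reported in the log above. It is a rough sketch, not a diagnosis: the helper name `tensor_bytes` is hypothetical, and it counts only the raw element storage of one dense tensor, ignoring allocator overhead and everything else already resident on the GPU.

```python
from math import prod

def tensor_bytes(shape, itemsize):
    """Raw storage, in bytes, of a dense tensor with the given shape.

    `itemsize` is the size of one element in bytes (8 for double).
    """
    return prod(shape) * itemsize

# Shape and dtype copied from the OOM message: shape[25600,128,120], type double.
shape = (25600, 128, 120)
FLOAT64_BYTES = 8

size = tensor_bytes(shape, FLOAT64_BYTES)
print(f"{size} bytes = {size / 2**30:.2f} GiB")
# prints: 3145728000 bytes = 2.93 GiB
```

So this one gradient tensor alone needs roughly 2.9 GiB. The middle dimension, 128, appears to match "sel" in the descriptor; where the leading 25600 comes from with batch_size=1 is not something the log itself explains.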
