TiledArray unit tests fail on NVIDIA V100 GPU #258

@victor-anisimov

I have compiled the latest (Feb 15) copy of TiledArray (commit 53af746) on an Intel Xeon Gold 6152 CPU equipped with an NVIDIA V100 GPU. Compilation finishes successfully, and all tests compile as well. However, test execution stalls.

```
1: [100%] Built target ta_test
1/3 Test #1: build_ta_test .................... Passed 1724.30 sec
test 2
Start 2: ta_test-np-1

2: Test command: mpirun "-n" "1" "/home/vanisimov/tiledarray/tiledarray/build/tests/ta_test" "--log_level=warning" "--show-progress" "--run_test=!@distributed"
2: Test timeout computed to be: 10000000
2: created 3 CUDA streams + 2 I/O streams
2: Running 1788 test cases...
2:
2: 0%   10   20   30   40   50   60   70   80   90   100%
2: |----|----|----|----|----|----|----|----|----|----|
2: *************************************************!!MADNESS: Hung queue?
2: !!MADNESS: Hung queue?
2: !!MADNESS: Hung queue?
2: !!MADNESS: Hung queue?
2: !!MADNESS: Hung queue?
2: !! ERROR TiledArray: Aborting due to MADNESS exception.
2: !! ERROR TiledArray: ThreadPool::await() timed out after 900.0 seconds
2: !! ERROR TiledArray: rank=0 id={0,21626} 105 of 125 tiles set
2: unknown location(0): fatal error: in "um_expressions_suite/tensor_factories": signal: SIGABRT (application abort requested)
2: /home/vanisimov/tiledarray/tiledarray/tests/expressions_cuda_um.cpp(144): last checkpoint
```

The host compiler is gcc/8.2.0, and the CUDA toolkit version is 11.2.0.

For comparison, all unit tests complete successfully when TiledArray is compiled in CPU-only mode with the same gcc/8.2.0. Which version of the TiledArray code should I use to see the CUDA tests pass?
