Closed
Description
One of the tests introduced by #1366 appears to fail intermittently in CI.
For instance, from the CI run for #1459:
Error: Tests run: 17, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 7.993 s <<< FAILURE! -- in com.nvidia.cuvs.CagraBuildAndSearchIT
Error: com.nvidia.cuvs.CagraBuildAndSearchIT.testFloatIndexing -- Time elapsed: 0.186 s <<< ERROR!
java.util.concurrent.ExecutionException:
java.lang.AssertionError: Exception while executing runnable: java.lang.RuntimeException: cuvsCagraBuild returned 0[RAFT failure at file=/tmp/conda-bld-output/bld/rattler-build_libcuvs/work/cpp/src/neighbors/detail/cagra/graph_core.cuh line=1406: Could not generate an intermediate CAGRA graph because the initial kNN graph contains too many invalid or duplicated neighbor nodes. This error can occur, for example, if too many overflows occur during the norm computation between the dataset vectors.
Obtained 6 stack frames
#1 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so(+0x56826d) [0x7a570b4db26d]
#2 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: void cuvs::neighbors::cagra::detail::graph::optimize<unsigned int, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >(raft::resources const&, std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >, std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >, bool, bool) +0x23f9 [0x7a570bf36c19]
#3 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: cuvs::neighbors::cagra::index<float, unsigned int> cuvs::neighbors::cagra::detail::build<float, unsigned int, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)0> >(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)0> >) +0x5f9 [0x7a570bf44009]
#4 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: cuvs::neighbors::cagra::build(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)0> >) +0x21 [0x7a570bf1d3b1]
#5 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs_c.so: cuvsCagraBuild +0x50a [0x7a574799dffa]
#6 in [0x7a579c0861c7]
]
at __randomizedtesting.SeedInfo.seed([66039A8CAFB9D3C9:449B6310296799E0]:0)
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
at com.nvidia.cuvs@25.12.0/com.nvidia.cuvs.CagraBuildAndSearchIT.runConcurrently(CagraBuildAndSearchIT.java:69)
at com.nvidia.cuvs@25.12.0/com.nvidia.cuvs.CagraBuildAndSearchIT.testIndexing(CagraBuildAndSearchIT.java:233)
at com.nvidia.cuvs@25.12.0/com.nvidia.cuvs.CagraBuildAndSearchIT.testFloatIndexing(CagraBuildAndSearchIT.java:215)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:426)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:716)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:637)
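For context on the RAFT message in the trace above, which names overflow during norm computation as one possible cause: the following is a minimal Java sketch of how single-precision overflow can turn distances into NaN. It is an illustration only. The class and method names are hypothetical, the expanded-distance formula is an assumption about how such a computation might be arranged rather than a description of the cuVS kernels, and it has not been established that this is what happens in the failing test.

```java
public class NormOverflowSketch {
    // Squared L2 norm accumulated in float (hypothetical helper, not cuVS code).
    public static float squaredNormFloat(float[] v) {
        float sum = 0f;
        for (float x : v) {
            sum += x * x; // x * x exceeds Float.MAX_VALUE once |x| is above roughly 1.8e19
        }
        return sum;
    }

    // Same quantity accumulated in double, for comparison.
    public static double squaredNormDouble(float[] v) {
        double sum = 0.0;
        for (float x : v) {
            sum += (double) x * (double) x;
        }
        return sum;
    }

    // Dot product accumulated in float.
    public static float dotFloat(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared distance in the expanded form ||a||^2 + ||b||^2 - 2 a.b.
    // Assumption: shown only to illustrate how an infinite norm can yield NaN.
    public static float expandedSquaredDistance(float[] a, float[] b) {
        return squaredNormFloat(a) + squaredNormFloat(b) - 2f * dotFloat(a, b);
    }

    public static void main(String[] args) {
        float[] huge = {3.0e19f, 1.0f};
        System.out.println("float norm infinite:  " + Float.isInfinite(squaredNormFloat(huge)));
        System.out.println("double norm infinite: " + Double.isInfinite(squaredNormDouble(huge)));
        // Infinity - Infinity is NaN, so the distance of the vector to itself is not 0.
        System.out.println("distance is NaN:      " + Float.isNaN(expandedSquaredDistance(huge, huge)));
    }
}
```

If the randomized test data can contain values of this magnitude, NaN distances would make neighbor ordering meaningless, which would be consistent with the "too many invalid or duplicated neighbor nodes" wording. Checking the range of the generated vectors for the failing seed would confirm or rule this out.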
In the interest of unblocking CI on cuvs/main, I think the quickest option is to disable this test temporarily while a fix is sought.
Status
Done