Fix potential OOB access in CAGRA search when graph size < dataset size by irina-resh-nvda · Pull Request #1780 · rapidsai/cuvs

irina-resh-nvda · 2026-02-06T17:00:39Z

Random seed node selection incorrectly used dataset_desc.size to generate indices. When graph.extent(0) < dataset.size, this caused OOB access by attempting to index graph rows that don't exist.

This PR fixes an out-of-bounds memory access bug in CAGRA search that occurs when the graph has fewer nodes than the dataset (e.g., during iterative CAGRA-Q builds with compression).

Solution

Thread graph.extent(0) through the search kernel call chain as a graph_size parameter. Random seeds are now correctly constrained to [0, graph.extent(0)) instead of [0, dataset.size).

Purely internal fix with no API changes. Existing behavior unchanged for normal CAGRA usage where graph and dataset sizes match.
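
To illustrate the fix above, here is a minimal host-side sketch, not the actual kernel code. The helper names `pick_random_seeds` and `all_seeds_in_graph` are hypothetical and do not appear in cuVS; the real seed selection runs inside the CAGRA search kernels. The sketch only shows why drawing seeds from [0, dataset.size) can index graph rows that do not exist, while drawing from [0, graph.extent(0)) cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the random seed selection step.
// Draws `num_seeds` row indices uniformly from [0, upper_bound).
// Precondition: upper_bound > 0.
std::vector<std::uint32_t> pick_random_seeds(std::uint32_t upper_bound,
                                             std::size_t num_seeds,
                                             std::uint64_t rng_seed)
{
  std::mt19937_64 rng(rng_seed);
  std::uniform_int_distribution<std::uint32_t> dist(0, upper_bound - 1);
  std::vector<std::uint32_t> seeds(num_seeds);
  for (auto& s : seeds) { s = dist(rng); }
  return seeds;
}

// Hypothetical check: true if every seed addresses an existing graph row.
bool all_seeds_in_graph(const std::vector<std::uint32_t>& seeds, std::uint32_t graph_rows)
{
  for (auto s : seeds) {
    if (s >= graph_rows) { return false; }
  }
  return true;
}
```

With a 5,000-row graph and a 10,000-row dataset, `pick_random_seeds(5000, n, seed)` always passes `all_seeds_in_graph(..., 5000)`, whereas seeds drawn with an upper bound of 10,000 will almost certainly include indices past the last graph row.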

Testing

Added test that builds an index on 5,000 points, expands the dataset to 10,000 points via update_dataset(), then runs search. This reproduces the exact scenario where the bug would occur. Test verifies both SINGLE_CTA and MULTI_CTA algorithms complete without OOB access.

mfoerste4

LGTM, only minor suggestions.

mfoerste4 · 2026-02-10T22:25:53Z

cpp/src/neighbors/detail/cagra/search_multi_cta_kernel-inl.cuh

  uint32_t* const num_executed_iterations, /* stats */
-  SAMPLE_FILTER_T sample_filter)
+  SAMPLE_FILTER_T sample_filter,
+  const typename DATASET_DESCRIPTOR_T::INDEX_T graph_size = 0)


Is the default value used anywhere besides tests?

The default is used everywhere except in the tests and the future iterative CAGRA-Q implementation.
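
A defaulted trailing parameter like the one in the diff lets existing call sites compile unchanged. The sketch below shows one way such a default could be resolved. It rests on an assumption the PR text does not confirm: that `graph_size == 0` means "not provided" and falls back to the dataset size. The helper `seed_upper_bound` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: resolves the exclusive upper bound for random seed ids.
// Assumption (not confirmed by the PR): graph_size == 0 means the caller did
// not pass a graph size, so the dataset size is used as before.
constexpr std::uint32_t seed_upper_bound(std::uint32_t dataset_size,
                                         std::uint32_t graph_size = 0)
{
  return graph_size != 0 ? graph_size : dataset_size;
}
```

Under this assumption, a caller that omits the argument keeps the previous behavior, and only callers that know the graph is smaller than the dataset pass an explicit value.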

cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh

tfeher

Thanks Irina, the PR looks good to me.

cpp/tests/neighbors/ann_cagra/bug_graph_smaller_than_dataset.cu

irina-resh-nvda added 4 commits February 6, 2026 05:28

add a new max_node_id parameter to the CAGRA search API, allowing users to constrain random seed node selection to a subset of the dataset. This is useful when the graph is smaller than the dataset, such as during iterative build with compression.

13725b3

Changed the max node id parameter name to graph_size for clarity; removed max_node_id from the search parameters structure

0746446

wrote test

70a69d9

minor pre-commit changes

f428e54

irina-resh-nvda requested review from a team as code owners February 6, 2026 17:00

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Feb 6, 2026

irina-resh-nvda self-assigned this Feb 6, 2026

irina-resh-nvda requested a review from mfoerste4 February 6, 2026 17:01

irina-resh-nvda added the bug and non-breaking labels Feb 6, 2026

Merge branch 'main' into add-max-node-id-parameter

c69c067

mfoerste4 approved these changes Feb 10, 2026

irina-resh-nvda added 2 commits February 16, 2026 12:20

Merge branch 'main' into add-max-node-id-parameter

7d3b52c

addressed comments regarding type cast

13b1f77

divyegala mentioned this pull request Feb 16, 2026

JIT LTO Cagra Search #1807

Open

8 tasks

irina-resh-nvda added 6 commits February 20, 2026 01:32

Pre-commit style fix

cb60c6b

Merge branch 'main' into add-max-node-id-parameter

4be18ec

Merge branch 'main' into add-max-node-id-parameter

c6bbece

Merge branch 'main' into add-max-node-id-parameter

58d224a

style fix

b27ad1b

Merge branch 'main' into add-max-node-id-parameter

40e960f

tfeher approved these changes Mar 3, 2026

cpp/tests/neighbors/ann_cagra/bug_graph_smaller_than_dataset.cu (outdated, resolved)

irina-resh-nvda added 3 commits March 3, 2026 13:24

Merge branch 'main' into add-max-node-id-parameter

7ec1360

smaller test

3843a7b

comment fix

86f8b86

irina-resh-nvda wants to merge 16 commits into rapidsai:main from irina-resh-nvda:add-max-node-id-parameter
