[WIP][Java] Exposing CAGRA graph #1102
chatman wants to merge 2 commits into rapidsai:pull-request/1086
Conversation
```java
// Prepare dataset tensor
long[] datasetShape = {rows, cols};
MemorySegment datasetTensor =
    prepareTensor(resources.getArena(), dataSeg, datasetShape, 2, 32, 2, 2, 1);
```
You can/should use the localArena here
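To illustrate the suggestion: a standalone sketch (class and variable names are hypothetical, not from this PR) of allocating through a confined, try-with-resources `Arena` so the native memory is freed deterministically when the call returns, instead of living as long as the shared resources arena.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch: a confined arena scoped to a single call.
public class LocalArenaSketch {
  static long demo() {
    try (Arena localArena = Arena.ofConfined()) {
      // Native allocation owned by the local arena, not the shared one.
      MemorySegment seg = localArena.allocate(ValueLayout.JAVA_LONG);
      seg.set(ValueLayout.JAVA_LONG, 0, 42L);
      return seg.get(ValueLayout.JAVA_LONG, 0);
    } // the arena (and seg's native memory) is released here
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```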
ldematte left a comment:
FYI, I have something similar but without the on-heap array: #1105
I think we should combine the 2 PRs: I like how you extended the CagraRandomizedIT, but I have a preference for letting the C API deal with GPU memory (IMO it is cleaner and in at least one case avoids a copy in GPU memory), unless there is a real need/use case for it.
```java
prepareTensor(resources.getArena(), dataSeg, datasetShape, 2, 32, 2, 2, 1);

// Prepare graph tensor
Arena arena = resources.getArena();
```
Same, the resources arena will go away
```java
}

// Allocate device memory for the graph
MemorySegment graphD = arena.allocate(C_POINTER);
```
It's interesting that you're copying things to device memory here, but I don't think it's needed: `cuvsCagraIndexFromGraph` will work with host memory and do the copy itself (and maybe a tiny bit more efficiently).
```java
// Allocate memory for the graph
long graphElements = (long) size * graphDegree;
Arena arena = resources.getArena();
```
Same here, this can be local
```java
MemorySegment graphMemorySegment = arena.allocate(graphSequenceLayout);

// Allocate device memory for the graph
MemorySegment graphD = arena.allocate(C_POINTER);
```
Same here, I think `cuvsCagraIndexGetGraph` does this for you; this time it's even more important, as (I think) this will save a GPU memory copy.
```java
// Convert to 2D int array
int[][] graph = new int[size][graphDegree];
for (int i = 0; i < size; i++) {
  for (int j = 0; j < graphDegree; j++) {
```
Nit: this can be done "by-row" more efficiently with `MemorySegment.copy`.
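A self-contained sketch of the by-row variant (the class name and the dimensions are made up for illustration): one `MemorySegment.copy` call per row replaces `graphDegree` individual element reads.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch: bulk-copy native memory into int[][] row by row.
public class CopyByRow {
  static int[][] demo() {
    int size = 3, graphDegree = 4; // illustrative dimensions
    try (Arena arena = Arena.ofConfined()) {
      MemorySegment src = arena.allocate((long) size * graphDegree * Integer.BYTES);
      for (int i = 0; i < size * graphDegree; i++) {
        src.setAtIndex(ValueLayout.JAVA_INT, i, i); // fill with 0..11
      }
      int[][] graph = new int[size][graphDegree];
      for (int i = 0; i < size; i++) {
        // Copy one whole row from native memory into the on-heap array.
        MemorySegment.copy(src, ValueLayout.JAVA_INT,
            (long) i * graphDegree * Integer.BYTES, graph[i], 0, graphDegree);
      }
      return graph;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()[2][3]); // last element of the 0..11 fill
  }
}
```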
```java
    CuVSResourcesImpl resources) {
  this.memorySegment = indexMemorySegment;
  this.dataset = dataset;
  this.graphDevicePointer = graphDevicePointer;
```
I think this can be avoided if you don't allocate GPU graph memory yourself, keeping things easier/tidier (but better double check)
@ldematte That's fantastic. We can proceed via your PR (#1105) and close this one (#1102). I opened this draft PR even before #1086 was merged so that @punAhuja could test the end-to-end workflow for building an HNSW graph in Lucene. He reported that it is working for him :-)
In #902 and #1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for cagra and bruteforce index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. #1105 / #1102 or #1104).

This PR extends `Dataset` to support being used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2D only, just 3 data types, etc.). The PR keeps the current implementation simple and minimal on purpose, but structured in a way that is simple to extend.

By itself, the PR is just a refactoring to extend the `Dataset` implementation and reorganize the implementation classes; its real usefulness will come from using it in the PRs mentioned above (in fact, this PR has been extracted from #1105).

The implementation class hierarchy is designed with future extensions in mind: at the moment we have one `HostMemoryDatasetImpl`, but we are already planning a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views on) GPU memory to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process it or forward it to another algorithm (e.g. quantization followed by indexing). Future work will also include adding support/refactoring to allocate and manage GPU memory and DLTensors (e.g. working better with/refactoring `prepareTensor`).

Authors:
- Lorenzo Dematté (https://github.com/ldematte)
- MithunR (https://github.com/mythrocks)

Approvers:
- MithunR (https://github.com/mythrocks)

URL: #1111
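To make the idea concrete, here is a heavily simplified sketch of such a host-memory dataset that works for both input and output: it wraps off-heap memory and converts to/from on-heap rows. All names here (`HostDataset`, `set`, `toArray`) are illustrative, not the PR's actual `Dataset` API.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch of a 2D, float-only, host-memory dataset.
public class HostDatasetSketch {
  static final class HostDataset implements AutoCloseable {
    final Arena arena = Arena.ofConfined();
    final MemorySegment data;
    final int rows, cols;

    HostDataset(int rows, int cols) {
      this.rows = rows;
      this.cols = cols;
      this.data = arena.allocate((long) rows * cols * Float.BYTES);
    }

    // Write into the native backing memory (dataset as output).
    void set(int r, int c, float v) {
      data.setAtIndex(ValueLayout.JAVA_FLOAT, (long) r * cols + c, v);
    }

    // Convert back to an on-heap representation, one bulk copy per row.
    float[][] toArray() {
      float[][] out = new float[rows][cols];
      for (int r = 0; r < rows; r++) {
        MemorySegment.copy(data, ValueLayout.JAVA_FLOAT,
            (long) r * cols * Float.BYTES, out[r], 0, cols);
      }
      return out;
    }

    @Override public void close() { arena.close(); }
  }

  static float demo() {
    try (HostDataset ds = new HostDataset(2, 3)) {
      ds.set(1, 2, 7.5f);
      return ds.toArray()[1][2];
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A real implementation would additionally track the element type and expose the segment for DLTensor interop, but the lifecycle (arena-owned memory, `AutoCloseable`) is the key point.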
Initial attempt at exposing the CAGRA graph via the Java API. This is currently based on @benfred's PR; I'll change the base branch to branch-25.08 once #1086 is merged there.
Note: Introduced `int[][] getGraph()` in CagraIndex for now. We should revisit the performance implications of this, and possibly avoid this two-dimensional on-heap array. I just wanted to try something out for testing the functionality.