Initial cut for a cuVS Java API #450

Merged
rapids-bot[bot] merged 76 commits into rapidsai:branch-25.02 from SearchScale:java-api
Jan 30, 2025

Conversation

@chatman
Contributor

@chatman chatman commented Nov 8, 2024

A Java API for cuVS, for easy integration into Apache Lucene or other Java-based projects.

Try:

./build.sh libcuvs
./build.sh java

To generate docs, run `mvn javadoc:javadoc`.

Prerequisites:

  • JDK 22
  • Maven 3.9.6+
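
The Maven prerequisite above can be checked before invoking the build. This is a minimal sketch, assuming `sort -V` (version sort, as in GNU coreutils) is available; `version_at_least` is a hypothetical helper and not part of the repository's build scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of build.sh): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Check Maven only if it is installed. The first line of `mvn -v`
# looks like "Apache Maven 3.9.6 (...)", so field 3 is the version.
if command -v mvn >/dev/null 2>&1; then
  mvn_version="$(mvn -v | awk 'NR==1 {print $3}')"
  if ! version_at_least "$mvn_version" "3.9.6"; then
    echo "Maven 3.9.6+ is required, found '${mvn_version}'" >&2
  fi
fi
```

The same helper could be applied to the JDK requirement once the version string has been extracted from `java -version`.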

Todo:

  • Generate Project Panama classes using jextract on every build
  • Algorithms other than CAGRA
  • Prefiltering in CAGRA

Co-authored-by: Vivek Narang <vivek@searchscale.com>
@chatman chatman requested review from a team as code owners November 8, 2024 14:05
@chatman chatman requested a review from msarahan November 8, 2024 14:05
@copy-pr-bot

copy-pr-bot bot commented Nov 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the CMake label Nov 8, 2024
@chatman
Contributor Author

chatman commented Nov 8, 2024

FYI @cjnolet ^
An ExampleApp.java is added as a starting point for the review.

@cjnolet cjnolet added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Nov 8, 2024
@cjnolet cjnolet changed the base branch from branch-24.10 to branch-24.12 November 8, 2024 16:06
@cjnolet
Member

cjnolet commented Nov 8, 2024

/ok to test

@chatman chatman changed the title [WIP] Initial cut for a cuVS Java API Initial cut for a cuVS Java API Nov 18, 2024
@chatman
Contributor Author

chatman commented Nov 18, 2024

@narangvivek10 Let's move CuVSResources to the cuvs package instead of common? That way we can keep the Panama internals out of sight of the users.

narangvivek10 and others added 2 commits January 24, 2025 14:45
* Update CI script to update version in pom.xml files
* Add and update examples
* Add GPU info API methods
---------
Co-authored-by: Vivek Narang <vivek@searchscale.com>
@cjnolet
Member

cjnolet commented Jan 24, 2025

/ok to test

Contributor

@bdice bdice left a comment

I skimmed the packaging details at @cjnolet's request. There are some merge conflicts with #593. I had intended to push a couple of small packaging-related changes to this PR to help it move along, but I do not have permission to push to this fork, so I can't help here. Please merge in branch-25.02 and run pre-commit.

The biggest work item needed on the packaging side is to clarify what versioning should look like. I think a patch version (25.02.0) is important to add here, as we often publish patch releases. Currently the update-version.sh script uses a full tag like 25.02.00 but that doesn't correspond to what the version strings look like right now. Also, we should probably normalize it to 25.02.0 (drop the extra leading zero), as we do for several packaging systems (all Python wheels are normalized, as is cuDF's Java layer). I would look at how cuDF versions its Java layer for more inspiration, which I linked in a comment.
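
The normalization described above (25.02.00 becoming 25.02.0) can be sketched in a few lines of bash. This is an illustration only: `normalize_java_version` and `NEXT_JAVA_VERSION` are hypothetical names, not what update-version.sh or cuDF actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a full tag of the form YY.MM.PP into YY.MM.P
# by dropping leading zeros from the patch component only.
normalize_java_version() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  # 10# forces base 10 so a patch such as "08" is not parsed as octal.
  printf '%s.%s.%d\n' "$major" "$minor" "$((10#${patch:-0}))"
}

NEXT_FULL_TAG="25.02.00"
NEXT_JAVA_VERSION="$(normalize_java_version "$NEXT_FULL_TAG")"
echo "$NEXT_JAVA_VERSION"   # prints 25.02.0
```

A tag without a patch component (for example 25.02) is treated as patch 0 in this sketch, which is an assumption rather than an agreed convention.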

Comment on lines +100 to +103
# Update Java API version
sed_runner "s/VERSION=\".*\"/VERSION=\"${NEXT_FULL_TAG}\"/g" java/build.sh
for FILE in java/*/pom.xml; do
sed_runner "/<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START-->.*<!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/s//<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START--><version>${NEXT_FULL_TAG}<\/version><!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/g" "${FILE}"
Contributor

This is currently inconsistent with our normal use of the script. We usually run ./ci/release/update-version.sh 25.02.00 which would assign a value of 25.02.00. However, the pom.xml files use 25.02 without the patch version. I would guess that we do want to have a patch version here, but perhaps we want to normalize it to 25.02.0. That's close to what we do in cuDF's Java code.

Please look at https://github.com/rapidsai/cudf/blob/133e0c869531af94474e0bbb66cb22c5f8ba80f2/ci/release/update-version.sh#L87-L91 and https://github.com/rapidsai/cudf/blob/133e0c869531af94474e0bbb66cb22c5f8ba80f2/java/pom.xml#L24 and see whether we should adopt the same patterns here.
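
To make the marker-based substitution in the excerpt easier to follow, here is a rough sketch that applies the same sed pattern to a throwaway file. The file content is an assumption inferred from the markers in the sed expression, and `NEXT_JAVA_VERSION` is a hypothetical variable holding an already-normalized version; plain `sed -i.bak` stands in for the script's `sed_runner`.

```shell
#!/usr/bin/env bash
# Throwaway file standing in for java/*/pom.xml; its content is an assumption
# inferred from the markers used in the sed expression.
pom="$(mktemp)"
cat > "$pom" <<'EOF'
<project>
  <!--CUVS_JAVA#VERSION_UPDATE_MARKER_START--><version>25.02</version><!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->
</project>
EOF

NEXT_JAVA_VERSION="25.02.0"   # hypothetical, already-normalized version
START='<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START-->'
END='<!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->'

# The address selects the line containing both markers; the empty pattern
# in s//.../ reuses that same regex, so the whole marked span is replaced.
sed -i.bak "/${START}.*${END}/s//${START}<version>${NEXT_JAVA_VERSION}<\/version>${END}/" "$pom"
rm -f "${pom}.bak"

cat "$pom"
```

Because the markers survive the substitution, the same command can be rerun on every release without editing the pom.xml by hand.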

Contributor Author

Thanks Bradley, I've updated the version number format to include a patch number also. This should be better for Java artifacts.

Contributor

Do you think we should normalize it to 25.02.0 instead of 25.02.00? That would align with what we do for Java code elsewhere in RAPIDS, including cuDF and KvikIO.

Contributor

Hey @bdice I will update the versioning following your suggestion above and push this change soon.

java/README.md Outdated
Building
--------

`./build.sh` will generate the libcuvs_java.so file in internal/ directory, and then build the final jar file for the cuVS Java API in cuvs-java/ directory.
Contributor

Suggested change
`./build.sh` will generate the libcuvs_java.so file in internal/ directory, and then build the final jar file for the cuVS Java API in cuvs-java/ directory.
`./build.sh` will generate the `libcuvs_java.so` file in the `internal/` directory, and then build the final jar file for the cuVS Java API in the `cuvs-java/` directory.

@chatman
Contributor Author

chatman commented Jan 27, 2025

I updated the branch by merging the latest branch-25.02 changes. I needed to add a filter param to the CAGRA search (#452).
However, the tests now fail and I see the following (followed by a JVM crash):
free(): double free detected in tcache 2

FYI @narangvivek10, please take a look.

@chatman
Contributor Author

chatman commented Jan 27, 2025

FYI @ChrisHegarty, @narangvivek10: the last working commit in this branch is:

Author: Vivek Narang <123010842+narangvivek10@users.noreply.github.com>
Date:   Fri Jan 24 14:45:44 2025 -0500

    Add examples, update GPU info API methods, and update CI script
    
    * Update CI script to update version in pom.xml files
    * Add and update examples
    * Add GPU info API methods
    ---------
    Co-authored-by: Vivek Narang <vivek@searchscale.com>

@cjnolet
Member

cjnolet commented Jan 28, 2025

/ok to test

@cjnolet
Member

cjnolet commented Jan 28, 2025

/ok to test

build.sh Outdated
# Build the cuvs Java bindings
if (( ${NUMARGS} == 0 )) || hasArg java; then
# build libcuvs first as the Java API depends on it
./$0 libcuvs
Contributor

I'm not sure if it is good practice to call this script recursively to build the C++ code. It seems like it will lead to unexpected behavior. For example, how do you ensure that the CLI arguments such as --allgpuarch or --no-mg are being passed through?

I would recommend removing this line and requiring users to call `./build.sh libcuvs java` if the C++ library has not yet been built. Perhaps a warning can be issued if `hasArg libcuvs` is false. I believe this is how we handle the Python builds, which also depend on C++ libraries being built.

Suggested change
./$0 libcuvs
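
The warning proposed in this thread could look roughly like the following. This is a sketch under stated assumptions: `warn_if_java_without_libcuvs` is a hypothetical helper that does its own argument matching instead of using the script's `hasArg`, and it is not the repository's actual build.sh logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the repository's actual build.sh logic:
# warns when "java" is requested without "libcuvs" in the same invocation.
warn_if_java_without_libcuvs() {
  local args=" $* "
  if [[ "$args" == *" java "* && "$args" != *" libcuvs "* ]]; then
    echo "WARNING: 'java' requested without 'libcuvs'; run './build.sh libcuvs java' if the C++ library has not been built yet." >&2
  fi
}

# In build.sh this would be called once with the script's own arguments,
# so flags such as --allgpuarch stay with the single top-level invocation.
warn_if_java_without_libcuvs "$@"
```

Since nothing is invoked recursively, every flag the user passes applies to whichever targets they listed, which is the behavior the review asks for.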

Contributor

Please revert any files with only copyright changes.

@cjnolet
Member

cjnolet commented Jan 29, 2025

/ok to test

@cjnolet
Member

cjnolet commented Jan 29, 2025

/ok to test

@cjnolet
Member

cjnolet commented Jan 30, 2025

/ok to test

@cjnolet
Member

cjnolet commented Jan 30, 2025

@bdice the devcontainer build seems to be complaining about rmm::rmm target not being available in CMakeLists but I'm not seeing this behavior in any other PRs atm. The thing is, I'm not sure what would have changed in this PR that would have caused this behavior (and nothing seems immediately obvious to me). Do you have any ideas here?

This is the log for the devcontainer: https://github.com/rapidsai/cuvs/actions/runs/13024175979/job/36330368363?pr=450.

@cjnolet
Member

cjnolet commented Jan 30, 2025

/ok to test

Contributor

@msarahan msarahan left a comment

Minor comment. You may want to run shellcheck on your .sh scripts. Mostly pedantic, but sometimes it catches annoying bugs.
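
As one illustration of the kind of bug meant here, ShellCheck's SC2086 check flags unquoted variable expansions, which are split on whitespace. The sketch below shows the effect; the path is hypothetical, and the unquoted expansion is deliberate.

```shell
#!/usr/bin/env bash
# Prints how many arguments it received.
count_args() { echo "$#"; }

FILE="java/my module/pom.xml"   # hypothetical path containing a space

# Unquoted: the value is split on whitespace into two arguments (SC2086).
unquoted_count="$(count_args $FILE)"
# Quoted: the value is passed through as a single argument.
quoted_count="$(count_args "$FILE")"

echo "unquoted: ${unquoted_count}, quoted: ${quoted_count}"
```

Running `shellcheck build.sh` (or any other script path) reports such expansions without executing the script.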

@rapids-bot rapids-bot bot merged commit 4ca47c9 into rapidsai:branch-25.02 Jan 30, 2025
61 checks passed

Labels

ci, CMake, cpp, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

6 participants