Skip to content

Clean up nemo-retriever dependencies#2183

Open
colesmcintosh wants to merge 4 commits into
NVIDIA:mainfrom
colesmcintosh:cleanup-nemo-retriever-dependencies
Open

Clean up nemo-retriever dependencies#2183
colesmcintosh wants to merge 4 commits into
NVIDIA:mainfrom
colesmcintosh:cleanup-nemo-retriever-dependencies

Conversation

@colesmcintosh
Copy link
Copy Markdown

@colesmcintosh colesmcintosh commented May 31, 2026

Description

Audit-driven cleanup of dependency declarations in nemo_retriever/pyproject.toml. Closes #1392.

  • Add runtime dependencies imported by shipped nemo_retriever code to core [project.dependencies]: aiohttp, python-dateutil, fastparquet, opencv-python-headless, scikit-learn, scipy, unstructured-client, and grpcio.
  • Remove the now-redundant scikit-learn, scipy, and opencv-python-headless entries from the service, local, and multimedia extras, since they are covered by core.
  • Regenerate nemo_retriever/uv.lock.

Notes:

  • The issue references api/pyproject.toml; on current main the relevant manifest is nemo_retriever/pyproject.toml.
  • opencv-python-headless is used instead of opencv-python because the project already suppresses opencv-python in uv overrides to avoid cv2/ directory conflicts.
  • redis, python-docx, python-pptx, minio, and pymilvus from the original issue are not imported by the current tree and were intentionally not added.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 31, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

colesmcintosh and others added 2 commits June 1, 2026 17:23
moviepy is not imported anywhere in src or tests; all media handling
goes through the ffmpeg-python core dependency. Its only effect was
pulling moviepy/imageio/proglog into the lock and forcing a global
decorator downgrade (5.2.1 -> 4.4.2) via moviepy's decorator<5 pin.
Remove the extra and regenerate the lock.

Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
@colesmcintosh colesmcintosh marked this pull request as ready for review June 1, 2026 22:31
@colesmcintosh colesmcintosh requested review from a team as code owners June 1, 2026 22:31
@colesmcintosh colesmcintosh requested a review from edknv June 1, 2026 22:31
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

Audit-driven cleanup of nemo_retriever/pyproject.toml that promotes eight runtime-imported dependencies (aiohttp, python-dateutil, fastparquet, opencv-python-headless, scikit-learn, scipy, unstructured-client, grpcio) from optional extras into the core [project.dependencies] list, then removes the now-redundant entries from the service, local, and multimedia extras. The uv.lock is regenerated to reflect the updated resolution.

  • Core dependency promotion: Every newly promoted dep is given a >=x.y.z lower bound, matching the style used throughout the rest of the file. fastparquet also carries the pre-existing <2026 upper bound from the uv override, kept consistent in both places.
  • Extra cleanup: scikit-learn removed from service and local; opencv-python-headless removed from local; scipy removed from multimedia. Multimedia section comment updated to remove the now-inaccurate reference to scipy.
  • Intentional omissions documented: redis, python-docx, python-pptx, minio, and pymilvus are explicitly called out in the PR description as not imported by the current tree.

Confidence Score: 5/5

Safe to merge — the change correctly promotes runtime-required dependencies into core with appropriate version bounds and removes the now-redundant extra entries.

All eight newly promoted dependencies carry >=x.y.z lower bounds consistent with the rest of the file. The fastparquet constraint is kept in sync with the pre-existing uv override. The opencv-python-headless promotion is correctly paired with the existing uv override that suppresses the conflicting opencv-python. Removed extra entries are now fully covered by core, so no consumer loses access to them. The lock file was regenerated, confirming the resolution is satisfiable. No correctness or safety issues were found.

No files require special attention.

Important Files Changed

Filename Overview
nemo_retriever/pyproject.toml Audit-driven promotion of 8 runtime deps into core and removal of their now-redundant extra entries; all new entries carry >=x.y.z lower bounds consistent with repo conventions.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["pip install nemo-retriever\n(core only)"] --> B[Core deps]
    B --> C["aiohttp>=3.12.0 new"]
    B --> D["python-dateutil>=2.9.0 new"]
    B --> E["fastparquet>=2024.11.0,<2026 new"]
    B --> F["opencv-python-headless>=4.8.0 new"]
    B --> G["scikit-learn>=1.6.0 new"]
    B --> H["scipy>=1.11.0 new"]
    B --> I["unstructured-client>=0.42.0 new"]
    B --> J["grpcio>=1.60.0 new"]
    K["nemo-retriever[service]"] --> B
    K --> L["service extras\n(scikit-learn removed)"]
    M["nemo-retriever[local]"] --> B
    M --> N["local extras\n(opencv-headless, scikit-learn removed)"]
    O["nemo-retriever[multimedia]"] --> B
    O --> P["multimedia extras\n(scipy removed)"]
Loading

Reviews (2): Last reviewed commit: "Add version floors to grpcio and unstruc..." | Re-trigger Greptile

Comment thread nemo_retriever/pyproject.toml Outdated
Comment on lines +77 to +82
"unstructured-client",
# Default VDB solution
"lancedb",
# gRPC client for Parakeet/Riva ASR. Required for ASRCPUActor when it
# targets the public NVCF Parakeet endpoint (the default) or any remote NIM.
"grpcio",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing version constraints on new core deps

unstructured-client and grpcio are added with no lower-bound version constraint at all, while every other dep in this file uses either an exact pin or a >=x.y.z range. unstructured-client in particular has a history of breaking SDK changes across minor releases; without any floor, a uv lock --upgrade or a fresh install on a new machine can silently resolve to a version that is incompatible with the call sites already in the tree. grpcio is a native extension that must match the Python ABI and CUDA stack — an unbounded resolution is equally risky there.

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/pyproject.toml
Line: 77-82

Comment:
**Missing version constraints on new core deps**

`unstructured-client` and `grpcio` are added with no lower-bound version constraint at all, while every other dep in this file uses either an exact pin or a `>=x.y.z` range. `unstructured-client` in particular has a history of breaking SDK changes across minor releases; without any floor, a `uv lock --upgrade` or a fresh install on a new machine can silently resolve to a version that is incompatible with the call sites already in the tree. `grpcio` is a native extension that must match the Python ABI and CUDA stack — an unbounded resolution is equally risky there.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added floors in 2a14c0f: grpcio>=1.60.0 (first line with reliable cp312 wheels; also constrained transitively by nvidia-riva-client) and unstructured-client>=0.42.0 (its SDK breaks across minor releases, so floored at the validated minor). Both are satisfied by the current lock, so resolution is unchanged.

Every other core dependency carries a lower bound; grpcio and
unstructured-client were added without one. Pin grpcio>=1.60.0 (first
line with reliable cp312 wheels, also constrained by nvidia-riva-client)
and unstructured-client>=0.42.0 (its SDK breaks across minor releases).
Both are satisfied by the current lock, so resolution is unchanged.

Addresses Greptile review feedback.

Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Clean up dependencies in api/pyproject.toml

1 participant