Clean up nemo-retriever dependencies #2183
Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
moviepy is not imported anywhere in src or tests; all media handling goes through the ffmpeg-python core dependency. Its only effect was pulling moviepy/imageio/proglog into the lock and forcing a global decorator downgrade (5.2.1 -> 4.4.2) via moviepy's decorator<5 pin. Remove the extra and regenerate the lock. Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
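A claim like "not imported anywhere in src or tests" can be checked mechanically. The following is a minimal sketch, not the audit actually used for this PR: it parses Python source with the standard-library `ast` module and collects the top-level names of absolute imports. The function name `imported_top_level_modules` and the sample source are hypothetical.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes the top-level name "a".
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) are skipped: they are first-party code.
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical file contents, for illustration only.
example_source = """
import ffmpeg
from scipy.io import wavfile
from . import helpers
"""

found = imported_top_level_modules(example_source)
print("moviepy" in found)
print(sorted(found))
```

To cover a whole tree, the same function could be applied to every file returned by `pathlib.Path.rglob("*.py")` under the source and test directories. Dynamic imports such as `importlib.import_module("moviepy")` would not be detected by this approach.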
Greptile Summary
Audit-driven cleanup of
| Filename | Overview |
|---|---|
| nemo_retriever/pyproject.toml | Audit-driven promotion of 8 runtime deps into core and removal of their now-redundant extra entries; all new entries carry >=x.y.z lower bounds consistent with repo conventions. |
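The kind of audit summarized in the table can be approximated by comparing the modules a tree imports against the distributions declared in core. This is a rough sketch under stated assumptions: the helper names `dist_name` and `undeclared` are hypothetical, the import-to-distribution map only covers packages named in this PR, and the inputs are illustrative rather than taken from the real manifest.

```python
import re

# Import name -> distribution name, for packages where the two differ.
IMPORT_TO_DIST = {
    "cv2": "opencv-python-headless",
    "sklearn": "scikit-learn",
    "dateutil": "python-dateutil",
    "grpc": "grpcio",
    "unstructured_client": "unstructured-client",
}


def dist_name(requirement: str) -> str:
    """Extract the normalized distribution name from a requirement string."""
    name = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip()).group(0)
    # Normalize runs of "-", "_" and "." to a single "-" and lowercase the name.
    return re.sub(r"[-_.]+", "-", name).lower()


def undeclared(imported: set[str], core_requirements: list[str]) -> set[str]:
    """Return distributions that are imported but missing from core requirements."""
    declared = {dist_name(req) for req in core_requirements}
    needed = {dist_name(IMPORT_TO_DIST.get(module, module)) for module in imported}
    return needed - declared


# Illustrative inputs only.
imported_modules = {"cv2", "sklearn", "aiohttp"}
core = ["aiohttp>=3.12.0", "scikit-learn>=1.6.0"]
missing = undeclared(imported_modules, core)
print(missing)
```

A real audit would also need to exclude standard-library and first-party modules from the imported set before calling `undeclared`.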
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["pip install nemo-retriever\n(core only)"] --> B[Core deps]
B --> C["aiohttp>=3.12.0 new"]
B --> D["python-dateutil>=2.9.0 new"]
B --> E["fastparquet>=2024.11.0,<2026 new"]
B --> F["opencv-python-headless>=4.8.0 new"]
B --> G["scikit-learn>=1.6.0 new"]
B --> H["scipy>=1.11.0 new"]
B --> I["unstructured-client>=0.42.0 new"]
B --> J["grpcio>=1.60.0 new"]
K["nemo-retriever[service]"] --> B
K --> L["service extras\n(scikit-learn removed)"]
M["nemo-retriever[local]"] --> B
M --> N["local extras\n(opencv-headless, scikit-learn removed)"]
O["nemo-retriever[multimedia]"] --> B
O --> P["multimedia extras\n(scipy removed)"]
Reviews (2): Last reviewed commit: "Add version floors to grpcio and unstruc..."
    "unstructured-client",
    # Default VDB solution
    "lancedb",
    # gRPC client for Parakeet/Riva ASR. Required for ASRCPUActor when it
    # targets the public NVCF Parakeet endpoint (the default) or any remote NIM.
    "grpcio",
Missing version constraints on new core deps
unstructured-client and grpcio are added with no lower-bound version constraint at all, while every other dep in this file uses either an exact pin or a >=x.y.z range. unstructured-client in particular has a history of breaking SDK changes across minor releases; without any floor, a uv lock --upgrade or a fresh install on a new machine can silently resolve to a version that is incompatible with the call sites already in the tree. grpcio is a native extension that must match the Python ABI and CUDA stack — an unbounded resolution is equally risky there.
Path: nemo_retriever/pyproject.toml, lines 77-82
Added floors in 2a14c0f: grpcio>=1.60.0 (first line with reliable cp312 wheels; also constrained transitively by nvidia-riva-client) and unstructured-client>=0.42.0 (its SDK breaks across minor releases, so floored at the validated minor). Both are satisfied by the current lock, so resolution is unchanged.
Every other core dependency carries a lower bound; grpcio and unstructured-client were added without one. Pin grpcio>=1.60.0 (first line with reliable cp312 wheels, also constrained by nvidia-riva-client) and unstructured-client>=0.42.0 (its SDK breaks across minor releases). Both are satisfied by the current lock, so resolution is unchanged. Addresses Greptile review feedback. Signed-off-by: Cole McIntosh <colemcintosh6@gmail.com>
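The statement that both floors are satisfied by the current lock can be checked with a small comparison. The sketch below is only an illustration: the locked versions are hypothetical placeholders rather than values read from the real `uv.lock`, and it handles plain numeric releases only. For pre-releases or other version forms, the `packaging` library's `Version` class would be the more robust choice.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release such as '1.60.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_floor(locked: str, floor: str) -> bool:
    """Return True if the locked version is at or above the floor."""
    a, b = parse_release(locked), parse_release(floor)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.60' and '1.60.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Floors taken from the commit message above.
FLOORS = {"grpcio": "1.60.0", "unstructured-client": "0.42.0"}

# Hypothetical locked versions, NOT taken from the real uv.lock.
locked_versions = {"grpcio": "1.67.1", "unstructured-client": "0.42.3"}

violations = [
    name
    for name, floor in FLOORS.items()
    if not satisfies_floor(locked_versions[name], floor)
]
print(violations)
```

An empty `violations` list means adding the floors does not force the resolver to pick different versions.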
Description
Audit-driven cleanup of dependency declarations in `nemo_retriever/pyproject.toml`. Closes #1392.

- Promote runtime dependencies used by `nemo_retriever` code to core `[project.dependencies]`: `aiohttp`, `python-dateutil`, `fastparquet`, `opencv-python-headless`, `scikit-learn`, `scipy`, `unstructured-client`, and `grpcio`.
- Remove the now-redundant `scikit-learn`, `scipy`, and `opencv-python-headless` entries from the `service`, `local`, and `multimedia` extras, since they are covered by core.
- Regenerate `nemo_retriever/uv.lock`.

Notes:

- The original issue refers to `api/pyproject.toml`; on current `main` the relevant manifest is `nemo_retriever/pyproject.toml`.
- `opencv-python-headless` is used instead of `opencv-python` because the project already suppresses `opencv-python` in uv overrides to avoid `cv2/` directory conflicts.
- `redis`, `python-docx`, `python-pptx`, `minio`, and `pymilvus` from the original issue are not imported by the current tree and were intentionally not added.

Checklist