Closed
Labels: bug (Something isn't working)
Description
Describe the bug
Training a scalar quantizer on embeddings with cuVS fails with std::bad_alloc: out_of_memory: CUDA error. The code runs with fewer than 800,000 embeddings but throws the error above that count.
Embedding model - https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard
Dimensions - 2048
Precision - FP16
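For scale, here is a rough estimate of how much device memory the input matrix alone occupies at the reported threshold. It assumes a dense array with no padding and covers only the input. Whatever extra workspace scalar.train allocates is not known from this report and is not included. The helper names are illustrative only.

```python
# Back-of-the-envelope size of the input matrix, using the figures reported above.
BYTES_PER_FP16 = 2  # FP16 stores each value in 2 bytes


def embedding_bytes(n_rows, dim=2048, bytes_per_value=BYTES_PER_FP16):
    """Raw size in bytes of a dense n_rows x dim matrix."""
    return n_rows * dim * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


size = embedding_bytes(800_000)
print(f"{size} bytes = {to_gib(size):.2f} GiB")
# prints: 3276800000 bytes = 3.05 GiB
```

So the input is about 3 GiB at the point where the failure starts. If the embeddings were stored as FP32 instead, the same count would take twice that.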
Steps/Code to reproduce bug
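import time
import cupy as cp
from cuvs.preprocessing.quantize import scalar

def scalar_quantize_cuvs(embeddings):
    start = time.perf_counter()
    embeddings_gpu = cp.asarray(embeddings)  # copy the embeddings to the GPU
    params = scalar.QuantizerParams(quantile=0.99)
    quantizer = scalar.train(params, embeddings_gpu)  # the out-of-memory error is raised here
    transformed = scalar.transform(quantizer, embeddings_gpu)
    quantization_time = time.perf_counter() - start  # assumed: the timing code was left out of the original snippet
    return transformed, quantization_time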
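One possible workaround, not confirmed anywhere in this report, is to train on a subset of the rows and then transform the full set in batches so that less data sits on the GPU at once. Below is a minimal sketch of the batching logic only. batch_slices, quantize_in_batches, train_fn and transform_fn are hypothetical names, and the callables are supplied by the caller, so nothing here depends on cuVS.

```python
def batch_slices(n_rows, batch_rows):
    """Yield slices that cover rows 0..n_rows in chunks of at most batch_rows."""
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    for start in range(0, n_rows, batch_rows):
        yield slice(start, min(start + batch_rows, n_rows))


def quantize_in_batches(embeddings, train_fn, transform_fn, train_rows, batch_rows):
    """Train on the first train_rows rows, then transform batch by batch.

    train_fn(rows) -> quantizer and transform_fn(quantizer, rows) -> output
    are hypothetical callables provided by the caller.
    """
    quantizer = train_fn(embeddings[:train_rows])
    outputs = []
    for rows in batch_slices(len(embeddings), batch_rows):
        outputs.append(transform_fn(quantizer, embeddings[rows]))
    return quantizer, outputs


# Stand-in callables so the sketch runs without a GPU: they only report row counts.
quantizer, outputs = quantize_in_batches(
    list(range(10)),
    train_fn=lambda rows: len(rows),
    transform_fn=lambda q, rows: len(rows),
    train_rows=4,
    batch_rows=3,
)
print(quantizer, outputs)
# prints: 4 [3, 3, 3, 1]
```

With cuVS, train_fn and transform_fn would wrap the scalar.train and scalar.transform calls from the snippet above. Training on a subset changes the quantile estimate, and taking the first rows is only reasonable if the data is not ordered. A random sample is the safer choice. Whether this avoids the allocation failure depends on how much workspace scalar.train needs, which this report does not establish.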
Error Statement
Error during quantization: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
Traceback:
Traceback (most recent call last):
  File "/tmp/ipykernel_765663/1026619002.py", line 35, in scalar_quantize_cuvs
    quantizer = scalar.train(params, embeddings_gpu)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "resources.pyx", line 110, in cuvs.common.resources.auto_sync_resources.wrapper
  File "scalar.pyx", line 126, in cuvs.preprocessing.quantize.scalar.scalar.train
  File "exceptions.pyx", line 37, in cuvs.common.exceptions.check_cuvs
cuvs.common.exceptions.CuvsException: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
Expected behavior
Successful code execution
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of RAFT install: pip (pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com)