[BUG] cuVS Scalar Quantization memory issue #718

@singhmanas1

Description

Describe the bug
Running into a `std::bad_alloc: out_of_memory` CUDA error while training a scalar quantizer on embeddings with cuVS. Training succeeds below 800,000 embeddings but throws the error above that.

Embedding model - https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard
Dimensions - 2048
Precision - FP16

Steps/Code to reproduce bug

import cupy as cp
from cuvs.preprocessing.quantize import scalar

def scalar_quantize_cuvs(embeddings):
    # Move the embeddings to the GPU, then train and apply the scalar quantizer.
    embeddings_gpu = cp.asarray(embeddings)
    params = scalar.QuantizerParams(quantile=0.99)
    quantizer = scalar.train(params, embeddings_gpu)
    transformed = scalar.transform(quantizer, embeddings_gpu)
    return transformed
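A workaround I have been considering: since scalar quantizer training only estimates the value distribution (the 0.99 quantile here), it may not need every row. A minimal host-side sketch of subsampling before `scalar.train`, under the assumption that cuVS accepts any row subset of the dataset for training (`sample_for_training` and `max_rows` are hypothetical names, not cuVS API):

```python
import numpy as np

def sample_for_training(embeddings, max_rows=500_000, seed=0):
    # Quantizer training only needs value-distribution statistics,
    # so a random row subsample should be representative enough.
    n = embeddings.shape[0]
    if n <= max_rows:
        return embeddings
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_rows, replace=False)
    return embeddings[idx]
```

The full dataset would still be passed to `scalar.transform`; only the training input shrinks.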

Error Statement

Error during quantization: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
Traceback:
Traceback (most recent call last):
  File "/tmp/ipykernel_765663/1026619002.py", line 35, in scalar_quantize_cuvs
    quantizer = scalar.train(params, embeddings_gpu)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "resources.pyx", line 110, in cuvs.common.resources.auto_sync_resources.wrapper
  File "scalar.pyx", line 126, in cuvs.preprocessing.quantize.scalar.scalar.train
  File "exceptions.pyx", line 37, in cuvs.common.exceptions.check_cuvs
cuvs.common.exceptions.CuvsException: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
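The traceback shows the allocation failing inside `scalar.train`, but the transform step allocates a full output array as well, so bounding peak device memory there may also matter. A generic sketch of applying any transform in fixed-size row batches (the helper name `transform_in_batches` and the batch size are my own, not part of cuVS):

```python
import numpy as np

def transform_in_batches(transform_fn, data, batch_size=100_000):
    # Apply a (possibly GPU-backed) transform row-batch by row-batch,
    # so only one batch of output lives on the device at a time.
    out = [transform_fn(data[i:i + batch_size])
           for i in range(0, data.shape[0], batch_size)]
    return np.concatenate(out, axis=0)
```

With a trained quantizer this would be called as, e.g., `transform_in_batches(lambda x: scalar.transform(quantizer, x), embeddings_gpu)`, copying each batch back to host before concatenation if needed.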

Expected behavior
Successful code execution

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of RAFT install: [conda, Docker, or from source]
    pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com

Metadata

Labels

bug: Something isn't working

Status

Done