[BUG] cuVS Scalar Quantization memory issue #718

@singhmanas1

Description

Describe the bug
Running into a `std::bad_alloc: out_of_memory` CUDA error while training a scalar quantizer on embeddings with cuVS. Training succeeds below 800,000 embeddings but throws the error above that.

Embedding model - https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard
Dimensions - 2048
Precision - FP16

Steps/Code to reproduce bug

import cupy as cp
from cuvs.preprocessing.quantize import scalar

def scalar_quantize_cuvs(embeddings):
    # Move the embeddings to the GPU, then train and apply the scalar quantizer.
    embeddings_gpu = cp.asarray(embeddings)
    params = scalar.QuantizerParams(quantile=0.99)
    quantizer = scalar.train(params, embeddings_gpu)
    transformed = scalar.transform(quantizer, embeddings_gpu)
    return transformed
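A workaround I have been considering: since scalar quantizer training only estimates the value distribution (the 0.99 quantile here), it may not need every row. A minimal host-side sketch of subsampling before `scalar.train`, under the assumption that cuVS accepts any row subset of the dataset for training (`sample_for_training` and `max_rows` are hypothetical names, not cuVS API):

```python
import numpy as np

def sample_for_training(embeddings, max_rows=500_000, seed=0):
    # Quantizer training only needs value-distribution statistics,
    # so a random row subsample should be representative enough.
    n = embeddings.shape[0]
    if n <= max_rows:
        return embeddings
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=max_rows, replace=False)
    return embeddings[idx]
```

The full dataset would still be passed to `scalar.transform`; only the training input shrinks.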

Error Statement

Error during quantization: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
Traceback:
Traceback (most recent call last):
  File "/tmp/ipykernel_765663/1026619002.py", line 35, in scalar_quantize_cuvs
    quantizer = scalar.train(params, embeddings_gpu)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "resources.pyx", line 110, in cuvs.common.resources.auto_sync_resources.wrapper
  File "scalar.pyx", line 126, in cuvs.preprocessing.quantize.scalar.scalar.train
  File "exceptions.pyx", line 37, in cuvs.common.exceptions.check_cuvs
cuvs.common.exceptions.CuvsException: std::bad_alloc: out_of_memory: CUDA error at: /pyenv/versions/3.12.9/lib/python3.12/site-packages/librmm/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorMemoryAllocation out of memory
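The traceback shows the allocation failing inside `scalar.train`, but the transform step allocates a full output array as well, so bounding peak device memory there may also matter. A generic sketch of applying any transform in fixed-size row batches (the helper name `transform_in_batches` and the batch size are my own, not part of cuVS):

```python
import numpy as np

def transform_in_batches(transform_fn, data, batch_size=100_000):
    # Apply a (possibly GPU-backed) transform row-batch by row-batch,
    # so only one batch of output lives on the device at a time.
    out = [transform_fn(data[i:i + batch_size])
           for i in range(0, data.shape[0], batch_size)]
    return np.concatenate(out, axis=0)
```

With a trained quantizer this would be called as, e.g., `transform_in_batches(lambda x: scalar.transform(quantizer, x), embeddings_gpu)`, copying each batch back to host before concatenation if needed.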

Expected behavior
Successful code execution

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of RAFT install: [conda, Docker, or from source]
    pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com

Metadata

Labels

bug: Something isn't working

Status

Done