
chore: log Ray Pool run batches #4854

Draft
hfutatzhanghb wants to merge 1 commit into lance-format:main from hfutatzhanghb:codex/log-ray-pool-run-batches

Conversation

@hfutatzhanghb
Contributor

Summary

  • Add info-level logs before and after each business batch that the index Ray Pool executes through run_batch.
  • Include batch size, a bounded fragment ID preview, and result status in the log messages.
  • Add a unit test that exercises the shared index Pool helper and verifies the run-batch log messages.
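The bracketing and its verification could look roughly like the sketch below. It is not the code from this PR: the logger name, `run_batch_with_logs`, `capture_run_batch_logs`, the message wording, and the preview limit of 5 are all hypothetical, chosen only to illustrate the pattern with the standard `logging` module.

```python
import logging

# Hypothetical logger name; the real module's logger is not shown in the PR text.
logger = logging.getLogger("index_pool_sketch")


def run_batch_with_logs(batch_index, fragment_ids, work):
    """Log before and after one batch, showing at most 5 fragment IDs."""
    preview = fragment_ids[:5]  # bounded preview; the limit is an assumption
    logger.info(
        "run_batch %d starting: size=%d fragment_ids_preview=%s",
        batch_index, len(fragment_ids), preview,
    )
    result = work(fragment_ids)
    logger.info(
        "run_batch %d finished: size=%d status=%s",
        batch_index, len(fragment_ids), result.get("status"),
    )
    return result


class _ListHandler(logging.Handler):
    """Collects formatted log messages so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def capture_run_batch_logs(fragment_ids):
    """Run one fake batch and return the log messages it produced."""
    handler = _ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        run_batch_with_logs(0, fragment_ids, lambda ids: {"status": "ok"})
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return handler.messages
```

A test in this style would assert that exactly two messages are emitted per batch and that the first carries the size and the truncated preview while the second carries the status.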

Why

The index helper submits fragment_batches to ray.util.multiprocessing.Pool.map_async(..., chunksize=1). In Ray Pool, each submitted chunk is executed by PoolActor.run_batch. With chunksize=1, each run_batch contains one balanced fragment batch.
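A minimal sketch of the dispatch pattern described above, assuming a stand-in pool: it uses the standard library's `multiprocessing.pool.ThreadPool` so it runs without a Ray cluster, on the assumption that `ray.util.multiprocessing.Pool` accepts the same `map_async(func, iterable, chunksize=...)` call. `process_batch` and `run_batches` are hypothetical names, not the helper from this PR.

```python
from multiprocessing.pool import ThreadPool


def process_batch(fragment_batch):
    # Stand-in for the real per-batch index work.
    return {"fragments": list(fragment_batch), "status": "ok"}


def run_batches(fragment_batches, workers=2):
    """Submit each fragment batch as its own chunk and collect results in order."""
    with ThreadPool(processes=workers) as pool:
        # chunksize=1 means every submitted chunk holds exactly one fragment batch.
        async_result = pool.map_async(process_batch, fragment_batches, chunksize=1)
        return async_result.get()
```

With `chunksize=1`, one log pair per chunk corresponds to one fragment batch, which is what makes per-batch bracketing meaningful.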

For large vector index builds, these batches can be memory-heavy or long-running. Bracketing each business batch with info-level logs makes it easier to identify which fragment batch is starting, which one completed, and what status it returned, without logging an unbounded list of fragment IDs.
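One way to keep the fragment ID list bounded is sketched below. The function name `preview_fragment_ids`, the default limit, and the output format are assumptions for illustration; the PR text does not specify how the preview is rendered.

```python
from typing import Sequence


def preview_fragment_ids(fragment_ids: Sequence[int], limit: int = 5) -> str:
    """Render at most `limit` IDs, then summarize how many were omitted."""
    shown = ", ".join(str(i) for i in fragment_ids[:limit])
    omitted = len(fragment_ids) - limit
    if omitted > 0:
        return f"[{shown}, ... +{omitted} more]"
    return f"[{shown}]"
```

The log line then stays a fixed maximum length regardless of how many fragments a batch contains, while still identifying the batch by its first few IDs.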

Validation

  • python -m py_compile lance_ray/index.py tests/test_vector_index_options.py
  • python -m pytest tests/test_vector_index_options.py
  • uv run --no-sync ruff check lance_ray/index.py tests/test_vector_index_options.py
  • git diff --check

@github-actions github-actions Bot added the chore label May 18, 2026
@hfutatzhanghb hfutatzhanghb marked this pull request as ready for review May 18, 2026 09:54
@hfutatzhanghb hfutatzhanghb reopened this May 19, 2026
@hfutatzhanghb hfutatzhanghb marked this pull request as draft May 19, 2026 09:07