
Fix parallel test flakes #513

Merged
jlamypoirier merged 3 commits into main from jlp_debug-parallel-test-flakes on May 14, 2026


@jlamypoirier
Collaborator

Summary

  • allocate per-worker port blocks and dedicated streaming ports to avoid port collisions between pytest-xdist workers
  • use Hugging Face Hub HTTP backoff for Croissant metadata fetches while preserving live Hub test coverage
  • improve the Croissant metadata assertion and log messages so that endpoint and network failures are reported clearly
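The per-worker port blocks from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: it assumes the worker is identified by pytest-xdist's `PYTEST_XDIST_WORKER` environment variable (`gw0`, `gw1`, ...), and the constant values (`BASE_PORT`, `PORTS_PER_WORKER`, `NUM_STREAMING_PORTS`) and helper names are hypothetical.

```python
from __future__ import annotations

import os

# Hypothetical values; the real constants in Fast-LLM's test suite may differ.
BASE_PORT = 20000
PORTS_PER_WORKER = 100
NUM_STREAMING_PORTS = 8  # reserved at the end of each worker's block


def worker_index(worker_id: str | None = None) -> int:
    """Map a pytest-xdist worker id ("gw0", "gw1", ...) to an integer index.

    Falls back to 0 when tests run without xdist ("master" or no env var).
    """
    if worker_id is None:
        worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    if worker_id == "master":
        return 0
    return int(worker_id.removeprefix("gw"))


def worker_port_block(worker_id: str | None = None) -> range:
    """Contiguous block of ports owned exclusively by one xdist worker."""
    start = BASE_PORT + worker_index(worker_id) * PORTS_PER_WORKER
    return range(start, start + PORTS_PER_WORKER)


def streaming_ports(worker_id: str | None = None) -> range:
    """Dedicated sub-range for streaming tests, carved from the end of the worker's block."""
    return worker_port_block(worker_id)[-NUM_STREAMING_PORTS:]


print(worker_port_block("gw1"))  # range(20100, 20200)
print(streaming_ports("gw1"))    # range(20192, 20200)
```

Because each worker's block is a disjoint slice of the port space, two workers can never hand out the same port, and reserving a fixed sub-range for streaming keeps those tests from competing with other tests running on the same worker.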

Testing

  • `FAST_LLM_TEST_RESULTS_PATH="/tmp/fast_llm_tests/Fast-LLM" /Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m pytest -v -n 4 tests/data/test_streaming.py::test_streaming_dataset tests/data/test_streaming.py::test_streaming_sampled_dataset tests/data/test_streaming.py::test_data_streaming`
  • `FAST_LLM_TEST_RESULTS_PATH="/tmp/fast_llm_tests/Fast-LLM" /Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m pytest -v -n 4 tests/data/test_streaming.py::test_data_streaming tests/data/test_streaming.py::test_run_data_streaming_distributed tests/data/test_streaming.py::test_data_streaming_distributed`
  • `FAST_LLM_TEST_RESULTS_PATH="/tmp/fast_llm_tests/Fast-LLM" /Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m pytest -v -n 4 tests/data/test_preparator.py::test_dataset_preparator_from_hub`
  • `FAST_LLM_TEST_RESULTS_PATH="/tmp/fast_llm_tests/Fast-LLM" /Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m pytest -v -n 4 tests/models/test_streaming.py::test_run_model_distributed_streaming tests/models/test_streaming.py::test_model_distributed_streaming` (skipped locally: CUDA unavailable)
  • `git diff --check`

jlamypoirier and others added 3 commits May 14, 2026 16:39
- Assert the model-streaming port range stays within PORTS_PER_WORKER so
  adding entries to _DISTRIBUTED_STREAMING_CONFIGS fails loudly instead
  of silently colliding with the next worker's range.
- Collapse the Croissant fetch assertion message to a single line.
- Rename the lingering `port` parameter on
  `_run_test_data_streaming_distributed` to `redis_port` for consistency
  with the helper it delegates to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jlamypoirier merged commit f26fcc6 into main on May 14, 2026
1 of 2 checks passed
jlamypoirier deleted the jlp_debug-parallel-test-flakes branch on May 14, 2026 at 21:19