
feat(e2e): add CPU E2E test suite with provisioner and rolling release tests #326

Open: runpod-Henrik wants to merge 2 commits into main from Henrik/e2e-cpu-smoke

runpod-Henrik commented Apr 23, 2026

Summary

Adds the full E2E test infrastructure built and validated during v1.14.0 QA. All 15 CPU tests confirmed passing locally.
AE-2168

New files:

  • e2e/provisioner.py — session-scoped endpoint pool with parallel provisioning
  • e2e/test_cpu_suite.py — QB function (smoke, empty string, unicode, concurrent), deps (numpy/pandas), class, LB endpoint (9 pass, 1 xfail AE-2744)
  • e2e/test_rolling_release.py — no-spurious-release and config-change-triggers-drift
  • e2e/test_redeploy.py — scale-to-zero and multi-worker recycle tests (3 pass)
  • e2e/test_gpu_smoke.py — GPU deploy → invoke → undeploy
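
As a rough illustration of what a session-scoped endpoint pool with parallel provisioning might look like, here is a minimal sketch. It is not the code in e2e/provisioner.py: the `EndpointPool` class, its method names, and the `provision`/`teardown` callables are all hypothetical stand-ins for whatever deploy and undeploy calls the suite really uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class EndpointPool:
    """Hypothetical pool that provisions named endpoints in parallel."""

    def __init__(self, provision: Callable[[str], str], teardown: Callable[[str], None]):
        self._provision = provision  # name -> endpoint id (assumed deploy call)
        self._teardown = teardown    # endpoint id -> None (assumed undeploy call)
        self.endpoints: Dict[str, str] = {}

    def start(self, names: Iterable[str], max_workers: int = 4) -> None:
        names = list(names)
        if not names:
            return
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # executor.map preserves input order, so zip pairs each name with its id.
            for name, endpoint_id in zip(names, executor.map(self._provision, names)):
                self.endpoints[name] = endpoint_id

    def get(self, name: str) -> str:
        return self.endpoints[name]

    def close(self) -> None:
        for endpoint_id in list(self.endpoints.values()):
            self._teardown(endpoint_id)
        self.endpoints.clear()


# Stand-in callables; a real suite would deploy and undeploy endpoints here.
torn_down = []
pool = EndpointPool(provision=lambda name: f"ep-{name}", teardown=torn_down.append)
pool.start(["qb-function", "lb-endpoint"])
first = pool.get("qb-function")
pool.close()
```

In a pytest suite, a pool like this would typically be wrapped in a fixture declared with `scope="session"` that calls `start()` before yielding and `close()` afterwards, so every test shares the same endpoints and teardown runs once.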

Updated:

  • e2e/conftest.py — better error messages, sys.path fix, sweep prefix filter
  • e2e/test_cpu_smoke.py — updated for provisioner
  • .github/workflows/e2e.yml — inject FLASH_SDK_GIT_REF
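
The sweep prefix filter mentioned for e2e/conftest.py is not shown in this description. One plausible shape, assuming the sweep deletes leftover endpoints by name, is sketched below. The function name `select_for_sweep` and the `e2e-` prefix are illustrative assumptions, not values taken from the PR.

```python
from typing import Iterable, List


def select_for_sweep(endpoint_names: Iterable[str], prefix: str) -> List[str]:
    """Return only the names a cleanup sweep is allowed to delete."""
    if not prefix:
        # An empty prefix would match every endpoint on the account.
        raise ValueError("sweep prefix must be non-empty")
    return [name for name in endpoint_names if name.startswith(prefix)]


# Hypothetical endpoint names; only the prefixed ones are selected.
names = ["e2e-cpu-smoke-1", "prod-inference", "e2e-rolling-2"]
stale = select_for_sweep(names, prefix="e2e-")
```

Rejecting an empty prefix is a deliberate guard in this sketch: `str.startswith("")` is true for every string, so an unset prefix would otherwise sweep unrelated endpoints.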

Note on GPU smoke: test_gpu_smoke.py will time out in CI when GPU inventory is constrained.
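
For context on how such a timeout tends to surface, the sketch below shows a generic deadline-based polling helper. It assumes the test waits for a worker to become ready by polling; `wait_until` and the simulated readiness check are hypothetical and not taken from test_gpu_smoke.py.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 0.01) -> float:
    """Poll predicate until it is true; return elapsed seconds or raise TimeoutError."""
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        time.sleep(interval)


# Simulated worker that reports ready on the third poll.
polls = {"count": 0}


def ready_on_third_poll() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


elapsed = wait_until(ready_on_third_poll, timeout=1.0)
```

With a helper of this kind, a constrained GPU pool shows up as a `TimeoutError` after the deadline rather than as an assertion failure, which makes capacity problems easier to tell apart from regressions.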

Test plan

  • All 15 CPU tests confirmed passing locally (v1.14.0): cpu_smoke, cpu_suite (9+1 xfail), rolling_release (2), redeploy (3)
  • GPU smoke: requires GPU inventory; expected to time out when constrained
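
The two rolling_release checks in the plan (no spurious release, and a config change triggers drift) can be pictured as a fingerprint comparison. The sketch below is only an assumption about the general technique; the PR does not say how drift is detected, and `config_fingerprint`, `has_drift`, and the config keys are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drift(deployed: dict, desired: dict) -> bool:
    """Report drift when the two configs hash differently."""
    return config_fingerprint(deployed) != config_fingerprint(desired)


# Hypothetical endpoint configs, used only to exercise the comparison.
deployed = {"workers_min": 0, "workers_max": 2}
reordered = {"workers_max": 2, "workers_min": 0}
changed = {"workers_min": 1, "workers_max": 2}

spurious = has_drift(deployed, reordered)  # same settings, different key order
drifted = has_drift(deployed, changed)     # one setting changed
```

Serializing with `sort_keys=True` is what keeps an unchanged config from producing a spurious release in this sketch, since two dicts with the same contents always hash the same.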

🤖 Generated with Claude Code

…e tests

Adds the full E2E test infrastructure built and validated during v1.14.0 QA:

- provisioner.py: session-scoped endpoint pool with parallel provisioning
- test_cpu_smoke.py: updated deploy → invoke → undeploy smoke test
- test_cpu_suite.py: QB function (smoke, empty string, unicode, concurrent),
  deps (numpy/pandas), class, and LB endpoint tests (9 pass, 1 xfail AE-2744)
- test_rolling_release.py: no-spurious-release and config-change-triggers-drift
- test_redeploy.py: scale-to-zero and multi-worker (scale-to-zero + always-on)
  recycle tests; single-slot always-on failures split to test_redeploy_always_on.py
- e2e.yml: enable push/PR CI triggers; inject FLASH_SDK_GIT_REF

All 15 CPU tests confirmed passing locally (v1.14.0). GPU smoke included;
may time out in CI when GPU inventory is constrained.

Excluded from this PR (tracked separately):
- test_redeploy_always_on.py: single-slot always-on recycle (AE-2940/2941/2942)
- test_source_fingerprint.py: needs assertion update
- test_concurrency_modifier.py: inconclusive — needs redesign

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Keep workflow_dispatch-only trigger; schedule can be added back once
the E2E account quota and test batching are sorted out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>