stanza_service: loosen healthcheck so busy workers aren't killed#662

Merged: mircealungu merged 4 commits into master from fix-stanza-healthcheck-timeout on Jul 3, 2026

@mircealungu mircealungu commented Jul 3, 2026

Change (stanza_service/Dockerfile)

HEALTHCHECK timeout 5s → 15s, chosen from measured latency rather than feel: across ~41,340 real tokenizations, only 0.11% take longer than 5s (roughly the p99.9), and the legitimate slow tail reaches ~12s for large texts. The old 5s ceiling therefore tripped on genuine big-text work. 15s clears the observed maximum with margin; interval and retries stay at 30s and 3, so a real outage is still caught in ~90s.
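For reviewers who want to check the numbers, here is a minimal sketch of the arithmetic behind them. It assumes Docker's documented healthcheck behaviour (a probe starts `interval` seconds after the previous one completes, and the container is marked unhealthy after `retries` consecutive failures). The function names are hypothetical, and the 30s interval for the first commit's settings is an assumption inferred from the "~2.5min" figure in the follow-up commit.

```python
def detection_window_s(interval_s: float, retries: int) -> float:
    """Approximate time to flag an outage when each probe fails quickly
    (e.g. connection refused), so probe duration is negligible."""
    return interval_s * retries


def hanging_probe_window_s(interval_s: float, timeout_s: float, retries: int) -> float:
    """Upper bound when every probe hangs until its timeout expires
    before the next interval starts counting."""
    return (interval_s + timeout_s) * retries


# Settings as merged: interval 30s, timeout 15s, retries 3.
merged_window = detection_window_s(30, 3)

# First commit's proposal: retries 5 (interval assumed to stay at 30s).
first_commit_window = detection_window_s(30, 5)

# Same merged settings, but with probes that hang instead of failing fast.
merged_hanging_window = hanging_probe_window_s(30, 15, 3)

# Rough count of tokenizations slower than 5s: 0.11% of ~41,340.
slow_tail_count = round(41_340 * 0.0011)

print(merged_window, first_commit_window, merged_hanging_window, slow_tail_count)
```

The fast-fail estimate matches the ~90s quoted above and the ~2.5min the first commit would have cost. If the service hangs rather than refusing connections, each probe also burns its timeout, so detection with the merged settings could take somewhat longer than 90s.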

Why it matters

In the deployed stack this false positive let autoheal restart a healthy-but-busy stanza container (~15–30s of downtime plus a model reload), which stalled the API and surfaced as transient ~15s page loads.

Note

The deployed docker-compose.yml defines its own stanza healthcheck, which overrides this one at runtime; it is updated in the companion ops PR (which also adds GUNICORN_WORKERS: "2", mem_limit: 16g, and crawl cpu_shares). This change keeps standalone docker run builds consistent.

🤖 Generated with Claude Code

Tokenization is CPU-bound and a long text can keep the (single) gunicorn
worker busy, delaying /health. The aggressive 5s timeout / 3 retries let
orchestrators (autoheal in the deployed stack) restart a healthy-but-busy
container under load, causing ~15-30s of downtime + model reload.

Loosen to timeout 30s / retries 5. Note: the deployed compose file defines
its own stanza healthcheck which overrides this; that is updated in the ops
repo. This keeps standalone `docker run` builds consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jul 3, 2026

ArchLens - No architecturally relevant changes to the existing views

mircealungu and others added 3 commits July 3, 2026 12:58
Keep ~90s detection of a genuine outage (30s interval x 3 retries) instead
of ballooning to ~2.5min; only widen the per-probe timeout 5s -> 10s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

>5s is the p99.9 of real tokenizations and the legitimate tail reaches
~12s for large texts, so 5s tripped on genuine work. 15s clears it;
keep 30s/3 for ~90s outage detection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mircealungu mircealungu merged commit 371e4fa into master Jul 3, 2026
1 of 3 checks passed