stanza_service: loosen healthcheck so busy workers aren't killed #662
Merged
Tokenization is CPU-bound and a long text can keep the (single) gunicorn worker busy, delaying /health. The aggressive 5s timeout / 3 retries let orchestrators (autoheal in the deployed stack) restart a healthy-but-busy container under load, causing ~15-30s of downtime + model reload. Loosen to timeout 30s / retries 5. Note: the deployed compose file defines its own stanza healthcheck which overrides this; that is updated in the ops repo. This keeps standalone `docker run` builds consistent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ArchLens - No architecturally relevant changes to the existing views
Keep ~90s detection of a genuine outage (30s interval x 3 retries) instead of ballooning to ~2.5min; only widen the per-probe timeout 5s -> 10s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
>5s is the p99.9 of real tokenizations and the legitimate tail reaches ~12s for large texts, so 5s tripped on genuine work. 15s clears it; keep 30s/3 for ~90s outage detection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Change (`stanza_service/Dockerfile`)
`HEALTHCHECK` timeout 5s → 15s, chosen from measured latency rather than feel: across ~41,340 real tokenizations, >5s is the p99.9 (0.11%) and the legitimate slow tail reaches ~12s for large texts. So the old 5s ceiling tripped on genuine big-text work. 15s clears the observed max with margin; interval/retries stay at 30s/3 so a real outage is still caught in ~90s.

Why it matters
In the deployed stack this false-positive let `autoheal` restart a healthy-but-busy stanza (~15–30s downtime + model reload), stalling the API and surfacing as transient ~15s page loads.

Note
The deployed `docker-compose.yml` defines its own `stanza` healthcheck that overrides this at runtime; it is updated in the companion ops PR (which also adds `GUNICORN_WORKERS: "2"`, `mem_limit: 16g`, and crawl `cpu_shares`). This keeps standalone `docker run` builds consistent.

🤖 Generated with Claude Code
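For reviewers who want to see the final values in one place, here is a minimal sketch of what the resulting instruction could look like. Only the interval, timeout, and retries values and the `/health` path come from this PR; the port (`8000`) and the `curl` probe command are assumptions for illustration, and the actual probe in `stanza_service/Dockerfile` may differ.

```dockerfile
# Sketch only: port 8000 and the curl probe are assumptions, not taken from this PR.
# interval 30s x retries 3 keeps detection of a genuine outage at roughly 90s.
# timeout 15s clears the ~12s slow tail observed for large texts.
HEALTHCHECK --interval=30s --timeout=15s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```

A probe like this needs `curl` to be present in the image. Because the deployed compose file defines its own healthcheck, a setting like this would only take effect for standalone `docker run` containers.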