fix: cap cluster worker count via IFRAMELY_WORKERS_COUNT (OOMKilled on large nodes) by pratapalakshmi · Pull Request #4 · makeplane/iframely

pratapalakshmi · 2026-06-08T04:44:48Z

Problem (customer incident)

iframely's cluster forks os.cpus().length workers (the graceful-cluster default). That is the host node's vCPU count, and it ignores the container's CPU limit. On a 32-vCPU node, a pod with a 1000m CPU limit still forks ~32 workers, each of which independently loads ~1886 domains, connects to Redis, and fetches secrets from AWS Secrets Manager. Their combined startup memory exceeds the pod memory limit → OOMKilled / CrashLoopBackOff within ~40s.
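To see the mismatch on a running pod, compare os.cpus().length with the container's CPU quota. This is illustration only: iframely does not read cgroup files, the /sys/fs/cgroup/cpu.max path and its "<quota> <period>" format are cgroup v2 conventions assumed here, and parseCpuMax / describeCpuView are hypothetical helpers. With a 1000m limit, Kubernetes typically writes a quota equal to the period, so the container is entitled to about 1 CPU while os.cpus().length still reports every node vCPU.

```javascript
// Illustration only: iframely itself does not read cgroup files.
// Assumes cgroup v2, where the container CPU limit is stored in
// /sys/fs/cgroup/cpu.max as "<quota> <period>" (microseconds),
// or "max <period>" when no limit is set.
const fs = require('node:fs');
const os = require('node:os');

// Convert the contents of cpu.max into a CPU count, or null when unlimited/unparseable.
function parseCpuMax(text) {
  const [quota, period] = text.trim().split(/\s+/);
  if (quota === 'max') return null;
  const q = Number(quota);
  const p = Number(period);
  if (!Number.isFinite(q) || !Number.isFinite(p) || p <= 0) return null;
  return q / p;
}

// Compare what graceful-cluster uses by default with the container's CPU quota.
function describeCpuView(cpuMaxPath = '/sys/fs/cgroup/cpu.max') {
  const hostCpus = os.cpus().length; // vCPUs visible on the node, not the pod limit
  let containerCpus = null;
  try {
    containerCpus = parseCpuMax(fs.readFileSync(cpuMaxPath, 'utf8'));
  } catch {
    // File missing or unreadable: not cgroup v2, or not running in a container.
  }
  return { hostCpus, containerCpus };
}

console.log(describeCpuView());
```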

There was no supported way to cap the worker count: cluster.js never passed workersCount to GracefulCluster.start, so neither an env var nor config.local.js could limit it.

Fix

cluster.js: pass workersCount: CONFIG.CLUSTER_WORKERS_COUNT to GracefulCluster.start.
config.loader.js: set CLUSTER_WORKERS_COUNT from IFRAMELY_WORKERS_COUNT (alias IFRAMELY_WORKERS).
When unset, behaviour is unchanged (graceful-cluster falls back to os.cpus().length).
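A minimal sketch of the resolution rules, assuming the behaviour listed under Testing: the canonical variable wins, the alias is consulted only when the canonical one is unset or empty, and 0 or non-numeric values are ignored so the default applies. resolveWorkersCount and effectiveWorkersCount are hypothetical names, not the actual code in config.loader.js or graceful-cluster; the fallback simply mirrors the statement that graceful-cluster uses os.cpus().length when workersCount is unset.

```javascript
// Hypothetical sketch of the env resolution; the real config.loader.js may differ.
const os = require('node:os');

// Return a positive integer worker count, or undefined to keep the default.
function resolveWorkersCount(env = process.env) {
  // Canonical name first; the alias is used only if the canonical var is unset or empty.
  const raw = [env.IFRAMELY_WORKERS_COUNT, env.IFRAMELY_WORKERS].find(
    (v) => v !== undefined && v.trim() !== ''
  );
  if (raw === undefined) return undefined;
  const s = raw.trim();
  // Accept plain decimal integers only: rejects "abc", "2.5", "-1".
  if (!/^\d+$/.test(s)) return undefined;
  const n = Number.parseInt(s, 10);
  return n > 0 ? n : undefined; // "0" is ignored
}

// Mirrors the documented fallback: no configured value means one worker per os.cpus() entry.
function effectiveWorkersCount(configured) {
  return configured || os.cpus().length;
}

const CONFIG = { CLUSTER_WORKERS_COUNT: resolveWorkersCount() };
console.log(effectiveWorkersCount(CONFIG.CLUSTER_WORKERS_COUNT));
```

Treating an invalid value as unset, rather than failing startup, keeps the previous default, which is the conservative choice for a patch release.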

This complements the existing IFRAMELY_WORKER_MAX_MEMORY_MB knob (added in #3): together they let you size iframely to the CPU/memory the container actually has, rather than to the host node.

Example

IFRAMELY_WORKERS_COUNT=4          # 4 workers regardless of node size
IFRAMELY_WORKER_MAX_MEMORY_MB=400
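One way to pick a value for IFRAMELY_WORKERS_COUNT: take the smaller of the pod's whole CPUs and the number of workers that fit under the pod memory limit at the per-worker cap. This rule of thumb is an assumption, not something the PR prescribes; suggestWorkersCount is a hypothetical helper, it treats IFRAMELY_WORKER_MAX_MEMORY_MB as a per-worker ceiling (inferred from its name), and it reserves nothing for the cluster master process.

```javascript
// Hypothetical sizing helper, not part of iframely.
// Assumed rule of thumb: at most one worker per whole CPU of the pod limit,
// and no more workers than fit under the pod memory limit at the per-worker cap.
// Does not reserve memory for the cluster master process.
function suggestWorkersCount({ cpuLimitMillicores, podMemoryLimitMb, workerMaxMemoryMb }) {
  const byCpu = Math.floor(cpuLimitMillicores / 1000);
  const byMemory = Math.floor(podMemoryLimitMb / workerMaxMemoryMb);
  // Always run at least one worker.
  return Math.max(1, Math.min(byCpu, byMemory));
}

// 4 CPUs and 2048 MB with a 400 MB per-worker cap: min(4, 5) = 4.
console.log(
  suggestWorkersCount({ cpuLimitMillicores: 4000, podMemoryLimitMb: 2048, workerMaxMemoryMb: 400 })
); // 4
```

On the incident's 1000m-CPU pods this yields 1 worker, instead of the ~32 forked by default on a 32-vCPU node.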

Testing

node --check on both files.
Unit-tested the env resolution: canonical var, alias, canonical-wins-over-alias, unset→fallback, 0→ignored, non-numeric→ignored. All pass.

Note

This is a follow-up to the merged #3 (released as v2.5.2). A customer hit this on 32-vCPU nodes. Recommend cutting a patch release once merged so it can flow through downstream image mirrors.

🤖 Generated with Claude Code

graceful-cluster defaults to os.cpus().length workers, which is the HOST node's vCPU count and ignores the container's CPU limit. On large nodes (e.g. 32 vCPU) this forks ~32 workers, each independently loading ~1886 domains + Redis + Secrets Manager, exhausting the pod memory limit -> OOMKilled/CrashLoopBackOff. Pass workersCount to GracefulCluster.start from CONFIG.CLUSTER_WORKERS_COUNT, settable via IFRAMELY_WORKERS_COUNT (alias IFRAMELY_WORKERS). When unset, behaviour is unchanged (falls back to os.cpus().length). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pratapalakshmi force-pushed the fix/iframely-worker-count-oom branch from bdd9b1b to 2193519 on June 8, 2026 04:56

pratapalakshmi merged commit 957d938 into main Jun 8, 2026

pratapalakshmi deleted the fix/iframely-worker-count-oom branch June 8, 2026 04:58
