
feat: child-process isolation for benchmarks #512

Merged
carlos-alm merged 9 commits into main from feat/bench-subprocess-isolation
Mar 19, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Run each engine/model benchmark in a forked subprocess so segfaults only kill the child — the parent survives and collects partial results
  • New shared utility scripts/lib/fork-engine.js handles subprocess lifecycle, timeout (10min), crash recovery, and partial JSON extraction from non-zero exits
  • benchmark.js, query-benchmark.js, incremental-benchmark.js fork per engine (wasm/native); embedding-benchmark.js forks per model (ONNX can OOM/segfault)
  • JSON output contract unchanged — report update scripts and CI workflow need no modifications

Test plan

  • Run node scripts/benchmark.js locally — verify JSON output matches previous schema
  • Run node scripts/query-benchmark.js locally — verify JSON output
  • Run node scripts/incremental-benchmark.js locally — verify JSON output
  • Run node scripts/embedding-benchmark.js locally — verify JSON output
  • Simulate a crash by forcing an engine failure — verify parent collects partial results from surviving engine
  • Verify [fork] log messages appear on stderr showing subprocess spawn/exit

Run each engine/model benchmark in a forked subprocess so that segfaults
(e.g. from the native Rust addon) only kill the child — the parent
survives and collects partial results from whichever engines succeeded.

- New shared utility: scripts/lib/fork-engine.js (isWorker, workerEngine,
  forkEngines) handles subprocess lifecycle, timeout, crash recovery, and
  partial JSON result extraction
- benchmark.js, query-benchmark.js, incremental-benchmark.js: fork per
  engine (wasm/native) via forkEngines()
- embedding-benchmark.js: fork per model (ONNX runtime can OOM/segfault)
- JSON output contract unchanged — report scripts and CI need no changes
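The worker/parent branching convention described above can be sketched as follows; the env-key name and helper signatures are assumptions for illustration, not the exact fork-engine.js code:

```javascript
// Sketch of the worker/parent branching used by fork-engine.js.
// WORKER_ENV_KEY is an assumed name: a forked child carries the engine
// name in its environment, while the parent process does not.
const WORKER_ENV_KEY = '__BENCH_ENGINE__';

function isWorker(env = process.env) {
  return Boolean(env[WORKER_ENV_KEY]);
}

function workerEngine(env = process.env) {
  return env[WORKER_ENV_KEY] ?? null;
}
```

A benchmark script would branch on `isWorker()` at startup: workers run a single engine and print JSON to stdout, while the parent forks one worker per engine and aggregates the results.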
@claude

claude bot commented Mar 19, 2026

Claude finished @carlos-alm's task


PR Review: Child-Process Isolation for Benchmarks

Comprehensive review of subprocess isolation implementation to handle segfaults gracefully.

Review Tasks ✅

  • Analyze root cause and problem justification
  • Review new fork-engine.js utility implementation
  • Check integration in modified benchmark files
  • Assess configuration changes and test integrity
  • Evaluate against codegraph backlog compliance criteria
  • Provide final recommendation and rating

Root Cause Analysis

Problem Being Solved: Native Rust addon segfaults kill entire benchmark processes, preventing collection of results from working engines (WASM vs native). This causes benchmark CI failures and incomplete performance data.

Why This Approach: Child-process isolation is the standard solution for handling potentially unstable native code. Alternative approaches (fixing all segfaults in Rust addon, or abandoning native benchmarks) are either infeasible or reduce test coverage.

Risk Assessment: Low risk - this is defensive programming that improves reliability without changing functionality.

Implementation Quality Assessment

fork-engine.js Design ⭐⭐⭐⭐

  • ✅ Clean worker/parent process API (isWorker(), workerEngine(), forkEngines())
  • ✅ Robust timeout handling (10min default, configurable, SIGKILL fallback)
  • ✅ Graceful error recovery (partial JSON extraction from failed processes)
  • ✅ Sequential execution to avoid DB file conflicts
  • ✅ Clear subprocess lifecycle logging with [fork] prefix
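The sequential execution noted above can be sketched as a simple orchestration loop; the shape is assumed for illustration, not the actual forkEngines() implementation:

```javascript
// Engines run one at a time so workers never contend for the same
// on-disk DB files. A crashed worker yields null instead of throwing,
// letting surviving engines still report results.
async function runSequentially(engines, runOne) {
  const results = {};
  for (const engine of engines) {
    results[engine] = await runOne(engine); // null when the worker crashed
  }
  return results;
}
```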

Integration Quality ⭐⭐⭐⭐

  • ✅ All 4 benchmarks follow consistent fork pattern
  • ✅ JSON output isolation (console.log → stderr in workers)
  • ✅ Maintained output schema - no breaking changes for CI/scripts
  • ✅ Proper cleanup in both parent and worker code paths
  • embedding-benchmark.js uses custom worker implementation (appropriate for per-model forking)

Code Quality ⭐⭐⭐⭐

  • ✅ Consistent error handling and logging
  • ✅ No hardcoded paths or magic numbers
  • ✅ Proper resource cleanup
  • ✅ Clear separation of concerns

Configuration & Test Integrity ✅

  • No configuration changes detected - no linting rules relaxed, no timeouts increased
  • No test modifications - test integrity preserved
  • No biome.json changes - linting standards maintained

Backlog Compliance

  • Zero-dep: ✓ Uses only built-in Node.js child_process and url modules
  • Foundation-aligned: ✓ Improves reliability without changing core functionality - aligns with P5 (embeddable-first)
  • Problem-fit: 3/5 - Useful operational improvement but doesn't directly address "AI agents waste tokens re-orienting in large codebases"
  • Breaking: No - Output contracts unchanged, purely additive isolation layer
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is well-implemented defensive programming.

Minor Observations

  1. Sequential vs parallel execution: Scripts run engines sequentially due to shared DB state - this is correct and documented
  2. Timeout choice: 10-minute timeout is reasonable for benchmarks that parse large codebases
  3. Error recovery: Attempting to parse partial JSON from failed processes is good UX

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a clean, well-designed solution to a real operational problem. The implementation follows Node.js best practices, maintains backward compatibility, and improves benchmark reliability. While it doesn't directly solve codegraph's core "AI orientation" problem, it's valuable infrastructure that prevents CI failures and ensures complete benchmark data collection.

The code quality is high, error handling is robust, and the approach is standard practice for isolating unstable native code.

Recommendation: Approve and merge.


@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR introduces child-process isolation for all four benchmark scripts so that native-addon segfaults only kill the child worker rather than the entire benchmark run. A new shared utility scripts/lib/fork-engine.js exports forkWorker() (generic subprocess lifecycle helper) and forkEngines() (engine-specific orchestrator), which are consumed by all three engine benchmarks; embedding-benchmark.js now also uses forkWorker() instead of its own copy.

All previously raised review issues have been addressed: the dual-timeout problem, the double-resolution race, the cleanup leak in parent paths, the forkEngines throw refactor, and the consolidation of forkModel into the shared forkWorker. The code is well-structured and the JSON output contract is unchanged.

One remaining issue:

  • The git staging safety net in query-benchmark.js calls git checkout . (line 47) to restore the working tree after a crash. This command resets all tracked files to HEAD, not just the hub file that the crashed worker staged. The staged variable already holds the exact filenames, but they are not used for the working-tree restore step — only for the git restore --staged call. A developer running benchmarks with unrelated uncommitted edits in their repository would silently lose those changes.

Confidence Score: 4/5

  • Safe to merge with a targeted fix to the git checkout . scope in query-benchmark.js
  • The core isolation architecture is sound, all previous review items have been resolved, and the JSON schema is unchanged. One logic issue remains: the git safety-net cleanup uses git checkout . instead of scoping the restore to the files identified by git diff --cached --name-only, which can silently discard unrelated developer edits in a dirty working tree.
  • scripts/query-benchmark.js — git checkout . on line 47 should be scoped to the staged files identified on line 43
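A minimal sketch of the suggested fix, assuming the staged list from git diff --cached --name-only is available as an array (the helper name is hypothetical):

```javascript
// Build git argument lists that restore ONLY the files a crashed worker
// staged, instead of `git checkout .` which resets every tracked file.
// Each returned array would be passed to execFileSync('git', args, { cwd: root }).
function scopedRestoreArgs(stagedFiles) {
  if (stagedFiles.length === 0) return [];
  return [
    ['restore', '--staged', '--', ...stagedFiles], // unstage only those files
    ['checkout', '--', ...stagedFiles],            // revert only those files
  ];
}
```

Scoping both invocations to the staged list leaves unrelated uncommitted edits in the developer's working tree untouched.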

Important Files Changed

  • scripts/lib/fork-engine.js: New shared subprocess utility. forkWorker() correctly uses a settled guard, a single SIGKILL-only timeout, and partial-JSON recovery. forkEngines() throws instead of process.exit(), allowing callers to clean up. No issues found.
  • scripts/benchmark.js: Parent path captures versionCleanup and calls it on both success and error paths. Worker path correctly redirects stdout/stderr and writes JSON before exiting. No issues found.
  • scripts/query-benchmark.js: Git staging safety net runs git checkout ., which discards ALL working-tree changes rather than just the hub file that the crashed worker staged. The staged file list is available but unused for the working-tree restore step.
  • scripts/incremental-benchmark.js: Parent correctly captures parentCleanup and calls it on both success and error paths. Worker modifies PROBE_FILE inside a try/finally for orderly cleanup. No issues found.
  • scripts/embedding-benchmark.js: Now correctly imports and uses the shared forkWorker() from fork-engine.js. Worker uses its own env key (__BENCH_MODEL__) distinct from the engine key, which is correct. Console redirect and cleanup are handled properly in both worker and parent paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([node benchmark.js]) --> B{isWorker?}
    B -- No: parent path --> C[resolveBenchmarkSource\nget version + versionCleanup]
    C --> D[forkEngines]
    D --> E[resolveBenchmarkSource\nget srcDir + cleanup]
    E --> F{hasWasm?\nhasNative?}
    F -- neither --> G[throw Error]
    G --> H[caller catches,\ncalls versionCleanup,\nprocess.exit 1]
    F -- at least one --> I[forkWorker wasm\nor native sequentially]
    I --> J[fork child process\nSIGKILL timer 10 min]
    J --> K{child close event}
    K -- killed by signal --> L[settle null]
    K -- code != 0 --> M{valid JSON\nin stdout?}
    M -- yes --> N[settle parsed partial]
    M -- no --> O[settle null]
    K -- code 0 --> P[settle parsed JSON]
    L & N & O & P --> Q[results.wasm / results.native]
    Q --> R[assemble output JSON\nversionCleanup\nprocess.exit 0]
    B -- Yes: worker path --> S[workerEngine\nrun benchmark for engine]
    S --> T[write JSON to stdout\ncleanup\nprocess.exit 0]

Last reviewed commit: "fix: replace process..."

Comment on lines +95 to +107
const child = fork(scriptPath, argv, {
  env: { ...process.env, [WORKER_ENV_KEY]: engine },
  stdio: ['ignore', 'pipe', 'inherit', 'ipc'],
  timeout: timeoutMs,
});

let stdout = '';
child.stdout.on('data', (chunk) => { stdout += chunk; });

const timer = setTimeout(() => {
  console.error(`[fork] ${engine} worker timed out after ${timeoutMs / 1000}s — killing`);
  child.kill('SIGKILL');
}, timeoutMs);
Contributor

P1 Dual timeout fires simultaneously, negating SIGTERM grace period

Both fork()'s built-in timeout option and the manual setTimeout are set to the same timeoutMs value. The built-in timeout option sends SIGTERM at T ms, while the manual timer sends SIGKILL at T ms — nearly simultaneously. Since SIGKILL arrives at essentially the same moment as SIGTERM, the child never has an opportunity for graceful cleanup before it's forcibly killed.

If the intent is SIGKILL-only (fast, no grace period), remove the timeout option from fork() and let the manual timer be the sole mechanism:

const child = fork(scriptPath, argv, {
    env: { ...process.env, [WORKER_ENV_KEY]: engine },
    stdio: ['ignore', 'pipe', 'inherit', 'ipc'],
    // Remove `timeout` here — the manual timer below handles it with SIGKILL
});

If the intent is graceful-then-force (SIGTERM + grace + SIGKILL), offset the manual timer:

const child = fork(scriptPath, argv, {
    env: { ...process.env, [WORKER_ENV_KEY]: engine },
    stdio: ['ignore', 'pipe', 'inherit', 'ipc'],
    timeout: timeoutMs,       // sends SIGTERM at T
});

const timer = setTimeout(() => {
    console.error(`[fork] ${engine} worker timed out — sending SIGKILL`);
    child.kill('SIGKILL');
}, timeoutMs + 5_000);        // SIGKILL 5 s after SIGTERM grace period

The same pattern applies to embedding-benchmark.js (lines 140–152).

Contributor Author

Fixed — removed the timeout option from fork() so the manual SIGKILL timer is the sole timeout mechanism. No more simultaneous SIGTERM+SIGKILL. This is now in the shared forkWorker() helper, so the fix applies to both fork-engine.js and embedding-benchmark.js.

Comment on lines +23 to +24
const { version } = await resolveBenchmarkSource();
const { wasm, native } = await forkEngines(import.meta.url, process.argv.slice(2));
Contributor

P1 cleanup() from first resolveBenchmarkSource() call is never invoked

resolveBenchmarkSource() is called here solely to extract version, but the returned cleanup callback is never captured or called. forkEngines() then calls resolveBenchmarkSource() again internally (and does call its own cleanup). If resolveBenchmarkSource() creates any temporary resources (symlinks, temp dirs, npm installs), those from this first call will be leaked for the lifetime of the parent process.

Contrast this with incremental-benchmark.js line 22, which correctly captures and calls parentCleanup().

Suggested change
- const { version } = await resolveBenchmarkSource();
+ const { version, cleanup: versionCleanup } = await resolveBenchmarkSource();
  const { wasm, native } = await forkEngines(import.meta.url, process.argv.slice(2));

Then call versionCleanup() before process.exit(0) on line 69.

Contributor Author

Fixed — parent path now captures cleanup as versionCleanup and calls it before process.exit(0), matching how incremental-benchmark.js handles it.

Comment on lines +136 to +189
function forkModel(modelKey) {
  return new Promise((resolve) => {
    console.error(`\n[fork] Spawning ${modelKey} worker (pid isolation)...`);

    const child = fork(scriptPath, process.argv.slice(2), {
      env: { ...process.env, [MODEL_WORKER_KEY]: modelKey },
      stdio: ['ignore', 'pipe', 'inherit', 'ipc'],
      timeout: TIMEOUT_MS,
    });

    let stdout = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });

    const timer = setTimeout(() => {
      console.error(`[fork] ${modelKey} worker timed out after ${TIMEOUT_MS / 1000}s — killing`);
      child.kill('SIGKILL');
    }, TIMEOUT_MS);

    child.on('close', (code, signal) => {
      clearTimeout(timer);

      if (signal) {
        console.error(`[fork] ${modelKey} worker killed by signal ${signal}`);
        resolve(null);
        return;
      }

      if (code !== 0) {
        console.error(`[fork] ${modelKey} worker exited with code ${code}`);
        try {
          const parsed = JSON.parse(stdout);
          console.error(`[fork] ${modelKey} worker produced partial results despite non-zero exit`);
          resolve(parsed);
        } catch {
          resolve(null);
        }
        return;
      }

      try {
        resolve(JSON.parse(stdout));
      } catch (err) {
        console.error(`[fork] ${modelKey} worker produced invalid JSON: ${err.message}`);
        resolve(null);
      }
    });

    child.on('error', (err) => {
      clearTimeout(timer);
      console.error(`[fork] ${modelKey} worker failed to start: ${err.message}`);
      resolve(null);
    });
  });
}
Contributor

P2 forkModel() duplicates runWorker() logic from fork-engine.js

The forkModel() function is a near-identical copy of runWorker() in scripts/lib/fork-engine.js — same stdout accumulation, same dual-timeout pattern, same partial-JSON fallback, same error handlers. The PR description frames fork-engine.js as a shared utility, but embedding-benchmark.js never imports it, maintaining a parallel implementation.

This means any bug fix or improvement to the fork lifecycle (e.g., the double-timeout issue above) must be applied in two places. Consider generalising forkEngines / fork-engine.js to accept an arbitrary worker key and a list of worker names, or extract a lower-level forkWorker(scriptPath, envKey, workerName, argv, opts) helper that both forkEngines and embedding-benchmark.js can call.

Contributor Author

Fixed — extracted a generic forkWorker(scriptPath, envKey, workerName, argv, timeoutMs) helper in fork-engine.js. embedding-benchmark.js now imports and uses it instead of maintaining its own copy. Bug fixes (timeout, settle guard) are in one place.

Comment on lines +109 to +143
child.on('close', (code, signal) => {
  clearTimeout(timer);

  if (signal) {
    console.error(`[fork] ${engine} worker killed by signal ${signal}`);
    resolve(null);
    return;
  }

  if (code !== 0) {
    console.error(`[fork] ${engine} worker exited with code ${code}`);
    // Try to parse partial output anyway
    try {
      const parsed = JSON.parse(stdout);
      console.error(`[fork] ${engine} worker produced partial results despite non-zero exit`);
      resolve(parsed);
    } catch {
      resolve(null);
    }
    return;
  }

  try {
    resolve(JSON.parse(stdout));
  } catch (err) {
    console.error(`[fork] ${engine} worker produced invalid JSON: ${err.message}`);
    resolve(null);
  }
});

child.on('error', (err) => {
  clearTimeout(timer);
  console.error(`[fork] ${engine} worker failed to start: ${err.message}`);
  resolve(null);
});
Contributor

P2 Promise may be resolved twice on spawn failure

Node.js child-process documentation notes that when a process fails to spawn, both the 'error' event and the 'close' event are emitted (in that order, with close carrying code: null, signal: null). Both handlers call resolve(null), so the Promise is resolved twice. Since subsequent resolutions are silently ignored by the Promise spec, this is harmless today, but it could mask future cases where the two events carry different values.

A common guard is a resolved flag:

let resolved = false;
function settle(value) {
  if (resolved) return;
  resolved = true;
  resolve(value);
}

Then replace all resolve(...) calls with settle(...) inside the 'close' and 'error' handlers. The same pattern should be applied to forkModel() in embedding-benchmark.js.
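The guard can also be demonstrated standalone; settleOnce is a hypothetical name wrapping the same idea:

```javascript
// A promise that can be settled exactly once; later calls are ignored.
// Mirrors the `settled` guard suggested above (helper name is illustrative).
function settleOnce() {
  let settled = false;
  let resolveFn;
  const promise = new Promise((resolve) => { resolveFn = resolve; });
  function settle(value) {
    if (settled) return false; // duplicate event: ignored
    settled = true;
    resolveFn(value);
    return true;
  }
  return { promise, settle };
}
```

Wiring both the 'close' and 'error' handlers to settle means a spawn failure that emits both events still yields a single, deterministic result.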

Contributor Author

Fixed — added a settled flag with a settle() wrapper function. Both close and error handlers call settle() instead of resolve() directly, preventing double resolution on spawn failure.

…#512)

- Extract forkWorker() as a generic reusable subprocess helper
- Remove fork()'s built-in timeout option so the manual SIGKILL timer
  is the sole timeout mechanism (they previously fired simultaneously)
- Add a settled flag to prevent double Promise resolution on spawn failure
- forkEngines() now delegates to forkWorker() instead of inline runWorker()
…512)

The first resolveBenchmarkSource() call (for version extraction) never
had its cleanup invoked, potentially leaking temporary resources.
embedding-benchmark.js had a near-identical copy of runWorker() from
fork-engine.js (including the dual-timeout bug). Now imports and uses
the shared forkWorker() helper instead.
@carlos-alm
Contributor Author

@greptileai

Comment on lines 128 to 150
@@ -130,95 +145,35 @@ function benchDiffImpact(hubName) {
    affectedFiles: lastResult?.affectedFiles?.length || 0,
  };
} finally {
  // Restore: unstage + revert content
  execFileSync('git', ['restore', '--staged', hubFile], { cwd: root, stdio: 'pipe' });
  fs.writeFileSync(hubFile, original);
}
Contributor

P1 Segfault in worker leaves git staging area dirty

benchDiffImpact() runs git add hubFile (line 132), then benchmarks, then restores via git restore --staged hubFile in a finally block (line 148). However, a SIGSEGV from the native addon — exactly the crash scenario this PR is designed to handle — kills the child process at the OS level before any JavaScript finally block can execute. The finally cleanup is therefore not crash-safe.

If the worker is killed (by signal or SIGKILL timeout) after git add but before git restore --staged, the parent process survives (as intended) but the working tree's index is permanently left with the hub file staged. This can corrupt the developer's git state and interfere with subsequent benchmark runs or unrelated git operations.

One practical mitigation is for forkWorker (or the benchmark orchestrator) to run git restore --staged in the parent when the child exits abnormally. Alternatively, benchDiffImpact could record the staged file in a temp file before git add, allowing a parent-side recovery step.

Contributor Author

Fixed in two commits:

  1. Parent cleanup capture: resolveBenchmarkSource() return value is now destructured to capture cleanup as versionCleanup, called on both the success and error exit paths.

  2. Git staging safety net: After forkEngines() returns, the parent now checks git diff --cached --name-only. If any staged files are left over from a crashed worker (segfault during benchDiffImpact's git add), the parent runs git restore --staged . and git checkout . to reset both the index and working tree. This ensures a worker crash never corrupts the developer's git state.
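The detection half of that safety net can be sketched as follows; the git invocations shown in comments are assumptions about the implementation, and only the parsing helper is concrete:

```javascript
// Parse `git diff --cached --name-only` output into a list of staged paths.
function parseStagedList(gitDiffOutput) {
  return gitDiffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean);
}

// Parent-side usage sketch (execFileSync calls are illustrative):
//   const out = execFileSync('git', ['diff', '--cached', '--name-only'],
//                            { cwd: root, encoding: 'utf8' });
//   if (parseStagedList(out).length > 0) {
//     execFileSync('git', ['restore', '--staged', '.'], { cwd: root });
//     execFileSync('git', ['checkout', '.'], { cwd: root });
//   }
```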

versionCleanup() was only called on the success path. When both engines
fail and the parent exits early via process.exit(1), temporary resources
from resolveBenchmarkSource() were leaked.
Two issues addressed:

1. resolveBenchmarkSource() cleanup was never captured or called in the
   parent process, leaking temporary resources on every run.

2. benchDiffImpact() runs git-add inside the worker — if a segfault
   kills the worker mid-execution, the finally block never runs and
   the git staging area is left dirty. The parent now checks for
   leftover staged files after workers exit and cleans them up.
@carlos-alm
Contributor Author

@greptileai

Comment on lines +150 to +156
cleanup();

if (!hasWasm && !hasNative) {
  console.error('Error: Neither WASM grammars nor native engine are available.');
  console.error('Run "npm run build:wasm" to build WASM grammars, or install the native platform package.');
  process.exit(1);
}
Contributor

P2 process.exit(1) bypasses callers' cleanup callbacks

forkEngines correctly calls its own cleanup() (line 150) before exiting, but process.exit(1) on line 155 aborts the call-stack immediately — the caller never regains control. This means the versionCleanup / parentCleanup captured by each benchmark script before calling forkEngines is never invoked when neither engine is available:

  • benchmark.js:23 (versionCleanup leaked)
  • query-benchmark.js:27 (versionCleanup leaked)
  • incremental-benchmark.js:22 (parentCleanup leaked)

Rather than exiting internally, forkEngines should signal the error to the caller and let it handle cleanup before exiting:

if (!hasWasm && !hasNative) {
  console.error('Error: Neither WASM grammars nor native engine are available.');
  console.error('Run "npm run build:wasm" to build WASM grammars, or install the native platform package.');
  return null; // or throw an Error
}

Each caller can then check for null / catch the error, call its own cleanup, and exit.
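The caller-side shape of that contract might look like this (names are hypothetical; the real scripts exit after cleanup, which is only noted in a comment so the sketch stays side-effect free):

```javascript
// Run a forking step that may throw (e.g. when no engine is available),
// guaranteeing the caller-owned cleanup runs on the failure path.
async function runWithCleanup(forkFn, cleanup) {
  try {
    return await forkFn();
  } catch (err) {
    console.error(err.message);
    await cleanup();
    // Real callers would set process.exitCode = 1 here before returning.
    return null;
  }
}
```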

Contributor Author

Fixed — forkEngines() now throws an Error instead of calling process.exit(1). All three callers (benchmark.js, query-benchmark.js, incremental-benchmark.js) catch the error, invoke their own cleanup callback, and exit gracefully.

forkEngines() called process.exit(1) when no engines were available,
which aborted the call-stack immediately and prevented callers from
invoking their own cleanup callbacks (versionCleanup, parentCleanup).

Now throws an Error instead; all three callers catch it, run cleanup,
and exit gracefully.

Impact: 1 function changed, 3 affected
@carlos-alm
Contributor Author

@greptileai

carlos-alm merged commit 8199db7 into main Mar 19, 2026
19 checks passed
carlos-alm deleted the feat/bench-subprocess-isolation branch March 19, 2026 06:09
github-actions bot locked and limited conversation to collaborators Mar 19, 2026