Inject spawn seam in RunWorkloadDetached to stop orphan test processes#5346
Merged
Conversation
DefaultManager.RunWorkloadDetached re-execs os.Executable() with `start <basename> --foreground`. Under `go test`, os.Executable() resolves to the test binary, so the unit test for updateSingleWorkload spawned `workloads.test start test-workload --foreground` as a detached child via Setsid. Go's testing.Main ignores the positional args and re-ran the entire test suite, which called RunWorkloadDetached again — recursive spawn. Each child was orphaned to launchd/init and survived after the test binary exited, eventually consuming high CPU on developer machines. Extract the spawn step into a function-typed field on DefaultManager (detachedProcessSpawner) with a withDetachedSpawner option, following the same pattern as withRetryConfig and withRunnerFactory. Production uses spawnDetached (unchanged behavior); tests inject a no-op spawner that returns a fake PID. Closes #5344
- MEDIUM: Preserve original WorkloadStatus ordering. Move SetWorkloadStatus(Starting) and the Error rollback into spawnDetached, so failures during pre-exec setup (os.Executable, xdg.DataFile, GetSecretsPassword) return without any status change — matching the pre-refactor behavior. - MEDIUM: Add TestDefaultManager_RunWorkloadDetached_SpawnerError to cover the new spawner-error path. - MEDIUM: Add TestDefaultManager_RunWorkloadDetached_SpawnerSuccess to verify the spawner is invoked and its PID is forwarded. - LOW: Document that ctx is intentionally not propagated to exec.Command, so the child can outlive the parent. - LOW: Expand detachedProcessSpawner doc to call out the platform-specific detachment that makes orphan children possible.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5346 +/- ##
==========================================
- Coverage 68.41% 68.40% -0.01%
==========================================
Files 621 624 +3
Lines 63278 63442 +164
==========================================
+ Hits 43293 43399 +106
- Misses 16757 16807 +50
- Partials 3228 3236 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ChrisJBurns
approved these changes
May 20, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pkg/workloadsunit tests were spawning orphanworkloads.testchild processes that survivedgo testand accumulated on developer machines (one observed at 81% CPU after 11 hours). The cause wasDefaultManager.RunWorkloadDetachedcallingos.Executable()+exec.Command(...).Start()directly withSetsid: true, which undergo testre-execs the test binary itself; Go'stesting.Mainignores the unknown positional args and reruns the whole suite, recursively spawning more detached children.This PR introduces a
detachedProcessSpawnerseam onDefaultManagerso tests can inject a no-op spawner instead of re-execing the test binary. Production behavior is unchanged — the default spawner still re-execs viaos.Executable()withsetsid. The triggering test (TestDefaultManager_updateSingleWorkload"successful stop and delete operations complete correctly") now uses the injected spawner, and two new tests cover the seam's success and error paths.Fixes #5344
Type of change
Test plan
task test)task lint-fix)Manual verification:
task testagainst thepkg/workloadspackage onmainand confirmed orphanworkloads.test start test-workload --foregroundprocesses withPPID 1viaps -eo pid,ppid,etime,args | grep workloads.test.TestDefaultManager_RunWorkloadDetached_SpawnerSuccessandTestDefaultManager_RunWorkloadDetached_SpawnerErrorto lock in the seam contract (PID forwarding, error wrapping, no PID written on spawn failure).Changes
pkg/workloads/manager.godetachedProcessSpawnertype,withDetachedSpawneroption, anddetachedSpawnerOrDefaultaccessor onDefaultManager. SplitRunWorkloadDetachedso the re-exec logic lives inspawnDetached(the production spawner) andRunWorkloadDetachedonly validates inputs, calls the spawner, and records the returned PID. Move theWorkloadStatusStartingcommit point into the spawner so it sits immediately before the actual exec, and have the spawner own theWorkloadStatusErrorrollback onStart()failure.pkg/workloads/manager_test.goTestDefaultManager_updateSingleWorkloadso the success-path subtest no longer re-execs the test binary. Tighten itsSetWorkloadPIDexpectation fromgomock.Any()to a fixed fake PID. AddTestDefaultManager_RunWorkloadDetached_SpawnerSuccessandTestDefaultManager_RunWorkloadDetached_SpawnerErrorto cover the seam directly.Does this introduce a user-facing change?
No.
Implementation plan
Approved implementation plan
Use option 1 from the issue: introduce a spawn seam.
detachedProcessSpawner func(ctx, *runner.RunConfig) (pid int, err error)and store it onDefaultManageralongside the existingmcpRunnerFactoryseam.withDetachedSpawnermanagerOptionand adetachedSpawnerOrDefaultaccessor mirroring thenewRunner/retryConfigpattern already used in this file.spawnDetached) bit-for-bit equivalent to the current behavior, includingSetsid, log file redirection, secrets password env var, and the--debug/--foregroundargs.WorkloadStatusStartingset call intospawnDetachedso it remains the commit point right beforedetachedCmd.Start(). KeepWorkloadStatusErrorrollback there too.RunWorkloadDetacheditself should only handle validation, spawner invocation, and PID recording — pre-spawn failures must not mutate workload status.TestDefaultManager_updateSingleWorkload, inject a no-op spawner that still emits theStartingstatus transition so mock expectations match production.SetWorkloadPID.Special notes for reviewers
spawnDetacheddeliberately does not propagatectxtoexec.Command— the spawned process must outlive the parent (that is the point of "detached");exec.CommandContextwould kill the child on parent cancellation.ctxis still threaded through for status manager calls. This is called out in a comment onspawnDetached.Startingis set immediately beforedetachedCmd.Start(),Erroris set ifStart()fails, and failures during pre-exec setup (e.g.os.Executable()or log file creation) return without touching workload status — matching the previous behavior.RunWorkloadDetachedcall sites in tests (pkg/api/v1/...,pkg/mcp/server/...) usemocks.MockManager.RunWorkloadDetached, which never spawns, so they are unaffected. The validation-failure-only testTestDefaultManager_RunWorkloadDetachedreturns before reaching the spawner and also remains unchanged.