Make workerkey the canonical worker identity #494
Merged
PID is not a stable worker identity. The OS recycles low PIDs across container restarts, and the unique index on `(pid, server)` collides whenever a containerized worker is replaced before its row is cleaned. `workerkey` is already a UUID-style hash generated per process - the correct identity, and already uniquely indexed on its own.

Schema:
- New migration drops the unique index on `(pid, server)` and re-adds it as a non-unique index for query performance.
- Test fixture mirrors the change.

Behavior:
- `update()` and `remove()` now look up by `workerkey`.
- `Processor` stores the `workerkey` alongside the pid and passes it on heartbeat / shutdown.
- `endProcess()` picks the most recently heartbeated row when multiple rows share a PID.

`terminateProcess()` already wipes by pid, which remains correct - the OS guarantees only one live process per pid at any moment.

Open questions for review:
- Should `worker end` / `worker kill` accept `--workerkey` as an alternative to PID for precise targeting?
- Can the eviction logic added in PR #493 be removed once this lands, or kept as belt-and-braces?
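The collision described above can be reproduced in isolation. The following is a minimal sketch in Python with an in-memory SQLite database, not the project's actual migration or table code: the table name `queue_processes`, the index names, and the helpers `make_db`, `add`, and `remove` are assumptions made for illustration. It shows that a unique `(pid, server)` index rejects a second worker that reuses a PID, while a non-unique index plus lookup by `workerkey` lets both rows coexist and still removes exactly one of them.

```python
import sqlite3
import uuid

# Hypothetical, simplified stand-in for the worker table (the real schema has more columns).
SCHEMA = """
CREATE TABLE queue_processes (
    id INTEGER PRIMARY KEY,
    pid INTEGER NOT NULL,
    server TEXT NOT NULL,
    workerkey TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_workerkey ON queue_processes (workerkey);
"""


def make_db(unique_pid_server: bool) -> sqlite3.Connection:
    """Build the table with either the old (unique) or new (non-unique) index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    kind = "UNIQUE INDEX" if unique_pid_server else "INDEX"
    conn.execute(f"CREATE {kind} idx_pid_server ON queue_processes (pid, server)")
    return conn


def add(conn: sqlite3.Connection, pid: int, server: str) -> str:
    """Register a worker and return its freshly generated workerkey."""
    workerkey = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO queue_processes (pid, server, workerkey) VALUES (?, ?, ?)",
        (pid, server, workerkey),
    )
    return workerkey


def remove(conn: sqlite3.Connection, workerkey: str) -> int:
    """Delete by workerkey and return the number of rows removed."""
    cur = conn.execute("DELETE FROM queue_processes WHERE workerkey = ?", (workerkey,))
    return cur.rowcount


# Old schema: a replacement worker that reuses PID 7 is rejected.
old = make_db(unique_pid_server=True)
add(old, 7, "web-1")
try:
    add(old, 7, "web-1")
    collided = False
except sqlite3.IntegrityError:
    collided = True

# New schema: both rows coexist, and removal by workerkey targets only one of them.
new = make_db(unique_pid_server=False)
stale_key = add(new, 7, "web-1")
live_key = add(new, 7, "web-1")
removed = remove(new, stale_key)
remaining = [row[0] for row in new.execute("SELECT workerkey FROM queue_processes")]
```

Because `workerkey` keeps its own unique index, dropping uniqueness on `(pid, server)` does not weaken identity; it only stops PID reuse from being treated as a conflict.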
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files

| Coverage Diff | master | #494 | +/- |
|---|---|---|---|
| Coverage | 77.51% | 77.48% | -0.03% |
| Complexity | 971 | 971 | |
| Files | 45 | 45 | |
| Lines | 3255 | 3251 | -4 |
| Hits | 2523 | 2519 | -4 |
| Misses | 732 | 732 | |
…canonical-identity
dereuromark added a commit that referenced this pull request May 8, 2026
…ue index (#495)

testAddDoesNotEvictRecentRowOnSamePidServer was asserting QueryException, which fired before #494 because the unique (pid, server) index rejected the second insert. With the unique constraint dropped, the second insert succeeds, so the test premise no longer holds and master CI failed.

Reframe the test to verify what it actually cares about: the eviction guard (`modified <` threshold) inside add() must NOT remove a fresh row. After the second add() on the same (pid, server), assert the pre-existing fresh row still exists.

Coverage of the threshold check is preserved; the duplicate-allowed behavior is already covered by testAddAllowsDuplicatePidServer.
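The reframed assertion can be illustrated with a small model of the eviction guard. This is a hedged sketch in Python, not the plugin's `add()` implementation: the five-minute `STALE_AFTER` threshold, the in-memory list of dict rows, and the `add` signature are all assumptions chosen for the example. It shows the property the commit message describes: a second `add()` on the same (pid, server) evicts only rows whose `modified` timestamp is older than the threshold, so a fresh row survives.

```python
from datetime import datetime, timedelta

# Hypothetical staleness threshold; the real value is configured in the plugin.
STALE_AFTER = timedelta(minutes=5)


def add(rows: list, pid: int, server: str, workerkey: str, now: datetime) -> None:
    """Register a worker, evicting only stale rows on the same (pid, server)."""
    threshold = now - STALE_AFTER
    # Eviction guard: a row is dropped only if it matches (pid, server)
    # AND its last heartbeat (`modified`) is older than the threshold.
    rows[:] = [
        r for r in rows
        if not (r["pid"] == pid and r["server"] == server and r["modified"] < threshold)
    ]
    rows.append({"pid": pid, "server": server, "workerkey": workerkey, "modified": now})


now = datetime(2026, 1, 1, 12, 0, 0)
rows = [
    # Stale leftover: last heartbeat an hour ago.
    {"pid": 42, "server": "web-1", "workerkey": "stale", "modified": now - timedelta(hours=1)},
    # Fresh row: last heartbeat 30 seconds ago, so it must not be evicted.
    {"pid": 42, "server": "web-1", "workerkey": "fresh", "modified": now - timedelta(seconds=30)},
]

add(rows, 42, "web-1", "new", now)
surviving_keys = [r["workerkey"] for r in rows]
```

With the unique index gone, the interesting outcome is no longer an exception but the set of rows that remain after the second insert.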
Status: DRAFT — for discussion
Summary
- Drop the unique index on `(pid, server)` (re-added as non-unique for query performance).
- `update()` / `remove()` now look up by `workerkey` (already uniquely indexed).
- `Processor` now stores its `workerkey` alongside `pid` and uses it for heartbeat and shutdown.
- `endProcess()` picks the most recently heartbeated row when multiple share a PID, so operator commands target the live worker rather than a stale leftover.

Why
PID is not a stable worker identity. The OS recycles low PIDs across container restarts, and the unique `(pid, server)` index collides whenever a containerized worker is replaced before its row is cleaned. `workerkey` is already a UUID-style hash generated per process, which makes it the correct identity.

Relationship to other PRs
- … (`--force` flag on `queue worker clean`) is the operator escape hatch and lands first.
- … (`feature/heartbeat-aware-pid-slot`) is the auto-recovery fix and lands second. Once this PR is merged, the eviction logic there becomes belt-and-braces and could be kept or removed.

Open questions
- Should `worker end` / `worker kill` accept `--workerkey` as a more precise alternative to PID?
- Should `update()` / `remove()` keep PID-based overloads for BC (deprecation path) or be a hard cut?

Notes
- `terminateProcess()` still wipes by PID. That is correct because the OS only ever has one live process per PID at any instant, so every row sharing that PID is either the live process or a stale leftover from an earlier process that held the same PID, and all of them are safe to remove once the kill succeeds.
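The difference between the two operator paths can be sketched as follows. This is an illustrative Python model under stated assumptions, not the plugin's code: the `WorkerRow` dataclass, the `terminate` flag, and the snake_case helpers `end_process` and `terminate_process` are hypothetical stand-ins, and the server column is ignored for brevity. It shows a graceful end targeting only the most recently heartbeated row for a PID, while a hard terminate wipes every row that shares the PID.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class WorkerRow:
    pid: int
    server: str
    workerkey: str
    modified: datetime       # last heartbeat
    terminate: bool = False  # assumed flag the worker checks to shut down gracefully


def end_process(rows: List[WorkerRow], pid: int) -> Optional[WorkerRow]:
    """Flag only the most recently heartbeated row for this PID."""
    candidates = [r for r in rows if r.pid == pid]
    if not candidates:
        return None
    live = max(candidates, key=lambda r: r.modified)
    live.terminate = True
    return live


def terminate_process(rows: List[WorkerRow], pid: int) -> List[WorkerRow]:
    """After a successful kill, drop every row sharing the PID (live and stale)."""
    return [r for r in rows if r.pid != pid]


rows = [
    WorkerRow(7, "web-1", "old", datetime(2026, 1, 1, 10, 0)),
    WorkerRow(7, "web-1", "new", datetime(2026, 1, 1, 10, 5)),
    WorkerRow(9, "web-1", "other", datetime(2026, 1, 1, 10, 1)),
]

target = end_process(rows, 7)
after_kill = terminate_process(rows, 7)
```

Selecting by latest heartbeat matters only for the graceful path, where flagging a stale leftover would leave the live worker running; the hard path can safely remove everything for that PID.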