feat: cross-platform file locking and atomic writes for token tracker #103
SalesTeamToolbox wants to merge 1 commit into jgravelle:main
Conversation
- Add cross-process file lock (fcntl on Unix, msvcrt on Windows) to prevent concurrent read-modify-write races that silently drop savings
- Add in-process threading lock for async task safety
- Replace naive path.write_text() with atomic temp-file + os.replace()
- Extract _read_locked() and _atomic_write() helpers
- Early return on zero delta to avoid unnecessary I/O

Fixes token savings regression where concurrent MCP server instances could overwrite each other's accumulated totals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jgravelle
left a comment
Solid fix for a real race condition. The approach is correct: threading lock for in-process safety, fcntl/msvcrt file lock for cross-process safety, and atomic rename to prevent corruption on crash.
One minor note: the lock file grows by one byte per record_savings call (the 'L' write in _lock_file). Not a blocker — lock files are typically tiny and this path is not high-frequency — but worth knowing if anyone inspects the file later.
No tests added, but that's consistent with the existing module. Merging.
Thanks for the careful analysis of the race condition in record_savings — the problem is real and the approach (cross-process file lock + atomic write) is correct.

Unfortunately this PR can't merge: token_tracker.py was fully rewritten in v1.2.7 (in-memory accumulator, flush-on-interval, atexit/SIGTERM handlers), so the diff no longer applies cleanly.

The current code has a different form of the same race: _flush_locked writes self._total (this process's full in-memory total) directly to disk, which clobbers whatever a concurrent process flushed. The fix there isn't a lock around the existing write; it's changing the flush to be additive (data['total_tokens_saved'] = data.get('total_tokens_saved', 0) + self._unflushed) rather than an overwrite. That's a clean single-line change in _flush_locked, not the same patch this PR contains.

If you want to rebase against current main and submit a fix for the additive-flush problem, it would be welcomed. Otherwise I'll track it as a known issue.
_flush_locked was overwriting the on-disk total with self._total (this
process's in-memory view), so concurrent MCP instances clobbered each
other's accumulated savings at flush time. Changed to additive write:
data["total_tokens_saved"] = data.get("total_tokens_saved", 0) + self._unflushed
Each process now only contributes its own unflushed delta, making
concurrent flushes safe in the common case. Reported in PR #103.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Cross-process file lock (fcntl on Unix, msvcrt on Windows) prevents concurrent MCP server instances from silently dropping accumulated token savings
- Atomic writes via tempfile.mkstemp() + os.replace() instead of naive path.write_text() — prevents corruption on crash/kill

Problem

When multiple MCP server instances run concurrently (e.g., parallel Claude Code sessions), the _savings.json file experiences read-modify-write races. Two processes read the same total, each adds their delta independently, and the last writer wins — silently losing the other's accumulated savings. The naive path.write_text() also risks corruption if the process is killed mid-write.

Test plan
🤖 Generated with Claude Code