
feat: cross-platform file locking and atomic writes for token tracker#103

Closed
SalesTeamToolbox wants to merge 1 commit into jgravelle:main from SalesTeamToolbox:feat/atomic-savings-tracker

Conversation

@SalesTeamToolbox

Summary

  • Cross-process file lock (fcntl on Unix, msvcrt on Windows) prevents concurrent MCP server instances from silently dropping accumulated token savings
  • In-process threading lock for async task safety
  • Atomic writes via tempfile.mkstemp() + os.replace() instead of naive path.write_text() — prevents corruption on crash/kill
  • Early return on zero delta to avoid unnecessary I/O

Problem

When multiple MCP server instances run concurrently (e.g., parallel Claude Code sessions), the _savings.json file experiences read-modify-write races. Two processes read the same total, each adds their delta independently, and the last writer wins — silently losing the other's accumulated savings.

The naive path.write_text() also risks corruption if the process is killed mid-write.
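The lost update is easy to reproduce in a few lines — here in a single process with the interleaving made explicit (the file name follows the description above):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "_savings.json")
with open(path, "w") as f:
    json.dump({"total_tokens_saved": 100}, f)

# Both "processes" read the same starting total:
with open(path) as f:
    a = json.load(f)
with open(path) as f:
    b = json.load(f)

a["total_tokens_saved"] += 40  # process A adds its delta (140)
b["total_tokens_saved"] += 25  # process B adds its delta (125)

with open(path, "w") as f:
    json.dump(a, f)            # A writes 140
with open(path, "w") as f:
    json.dump(b, f)            # B writes 125 — A's delta is silently lost

with open(path) as f:
    final = json.load(f)["total_tokens_saved"]
print(final)  # 125, not the correct 165
```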

Test plan

  • Verify savings accumulate correctly with single MCP server
  • Verify no savings loss with 2+ concurrent MCP server instances
  • Verify atomic write handles process kill during write (no corruption)
  • Verify cross-platform: works on both Unix (fcntl) and Windows (msvcrt)

🤖 Generated with Claude Code

- Add cross-process file lock (fcntl on Unix, msvcrt on Windows) to
  prevent concurrent read-modify-write races that silently drop savings
- Add in-process threading lock for async task safety
- Replace naive path.write_text() with atomic temp-file + os.replace()
- Extract _read_locked() and _atomic_write() helpers
- Early return on zero delta to avoid unnecessary I/O

Fixes token savings regression where concurrent MCP server instances
could overwrite each other's accumulated totals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@jgravelle (Owner) left a comment

Solid fix for a real race condition. The approach is correct: threading lock for in-process safety, fcntl/msvcrt file lock for cross-process safety, and atomic rename to prevent corruption on crash.

One minor note: the lock file grows by one byte per record_savings call (the 'L' write in _lock_file). Not a blocker — lock files are typically tiny and this path is not high-frequency — but worth knowing if anyone checks.

No tests added, but that's consistent with the existing module. Merging.
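The growth the review describes comes from appending a marker byte on each acquisition; overwriting a fixed offset keeps the lock file at a constant size. A platform-neutral illustration of the difference (helper names are hypothetical, and this isn't a change the maintainer required):

```python
import os
import tempfile

def mark_appending(path: str) -> None:
    with open(path, "a") as fh:   # append mode: the file grows one byte per call
        fh.write("L")

def mark_in_place(path: str) -> None:
    with open(path, "a"):         # ensure the file exists
        pass
    with open(path, "r+") as fh:  # overwrite offset 0: size stays constant
        fh.seek(0)
        fh.write("L")

lock_path = os.path.join(tempfile.mkdtemp(), "savings.lock")
for _ in range(100):
    mark_in_place(lock_path)
print(os.path.getsize(lock_path))  # 1
```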

@jgravelle (Owner)

Thanks for the careful analysis of the race condition in record_savings — the problem is real and the approach (cross-process file lock + atomic write) is correct.

Unfortunately this PR can't merge: token_tracker.py was fully rewritten in v1.2.7 (in-memory accumulator, flush-on-interval, atexit/SIGTERM handlers), so the diff no longer applies cleanly.

The current code has a different form of the same race: _flush_locked writes self._total (this process's full in-memory total) directly to disk, which clobbers whatever a concurrent process flushed. The fix there isn't a lock around the existing write; it's changing the flush to be additive (data['total_tokens_saved'] = data.get('total_tokens_saved', 0) + self._unflushed) rather than an overwrite. That's a clean single-line change in _flush_locked, not the same patch this PR contains.

If you want to rebase against current main and submit a fix for the additive-flush problem, it would be welcomed. Otherwise I'll track it as a known issue.

@jgravelle jgravelle closed this Mar 15, 2026
jgravelle added a commit that referenced this pull request Mar 15, 2026
_flush_locked was overwriting the on-disk total with self._total (this
process's in-memory view), so concurrent MCP instances clobbered each
other's accumulated savings at flush time. Changed to additive write:

    data["total_tokens_saved"] = data.get("total_tokens_saved", 0) + self._unflushed

Each process now only contributes its own unflushed delta, making
concurrent flushes safe in the common case. Reported in PR #103.
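The additive flush from the commit can be sketched end to end. The field name `total_tokens_saved` comes from the commit; the function name and surrounding structure are assumptions, and the real `_flush_locked` also runs under the tracker's lock:

```python
import json
from pathlib import Path

def flush_unflushed(path: Path, unflushed: int) -> int:
    """Add only this process's unflushed delta to the on-disk total."""
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        data = {}  # missing or corrupt file: start from zero
    data["total_tokens_saved"] = data.get("total_tokens_saved", 0) + unflushed
    path.write_text(json.dumps(data))
    return data["total_tokens_saved"]
```

Two processes flushing deltas of 40 and 25 against a starting total of 100 now converge on 165 in either order — provided the flushes themselves don't interleave mid read-modify-write, which is why the cross-process lock from this PR still matters.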

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@SalesTeamToolbox SalesTeamToolbox deleted the feat/atomic-savings-tracker branch March 15, 2026 23:15
