Summary
Magic Context's reasoning cleanup and stripClearedReasoning stages change the conversation body on every turn for Claude thinking models, causing a complete server-side prompt cache miss (0% hit rate → read=0) even when the account, endpoint, fingerprint, and system prompt are all identical.
Environment
- Magic Context: @cortexkit/opencode-magic-context@0.21.8
- Provider: Antigravity proxy (Google Cloud Code Assist) with implicit prefix caching
- Model: claude-opus-4-6-thinking (any Claude thinking model)
- Plugin: @expiren/opencode-antigravity-auth@1.6.49
Root Cause
Claude thinking models generate thinking blocks in every assistant response. On the next turn, MC's Phase 1 transform pipeline runs:
- reasoning replay → clears N thinking blocks (N grows each turn)
- stripClearedReasoning → strips the cleared parts
- sentinel replay → neutralizes stripped messages
Because N changes every turn (each response adds new thinking blocks), the conversation body differs from the previous turn's body. Google's implicit prefix cache is keyed on an exact prefix hash, so any change in the conversation messages results in a complete cache miss.
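The sketch below shows the mechanism with a toy model. It assumes a hypothetical watermark-based strip: thinking parts are replaced only in messages before the watermark, and the watermark advances between turns. It also assumes the cache can be reused only up to the first message that differs. `Part`, `Message`, `stripBelowWatermark` and `sharedPrefix` are illustrative names, not Magic Context's API. The real server matches on its own token-level prefix, which this message-level comparison only approximates.

```typescript
// Simplified stand-ins for OpenCode message types (assumption, not the real types).
type Part = { type: "text" | "thinking"; text: string };
type Message = { role: "user" | "assistant"; parts: Part[] };

// Hypothetical watermark strip: replace thinking parts only in messages before `watermark`.
function stripBelowWatermark(history: Message[], watermark: number): Message[] {
  return history.map((msg, i): Message => ({
    role: msg.role,
    parts: msg.parts.map((p): Part =>
      i < watermark && p.type === "thinking" ? { type: "text", text: "[cleared]" } : p,
    ),
  }));
}

// Number of leading messages that are identical between two request bodies.
function sharedPrefix(a: Message[], b: Message[]): number {
  let n = 0;
  while (n < a.length && n < b.length && JSON.stringify(a[n]) === JSON.stringify(b[n])) n++;
  return n;
}

const history: Message[] = [
  { role: "user", parts: [{ type: "text", text: "q1" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t1" }, { type: "text", text: "a1" }] },
  { role: "user", parts: [{ type: "text", text: "q2" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t2" }, { type: "text", text: "a2" }] },
];

// Turn N: the watermark covers only the first assistant message.
const turnN = stripBelowWatermark(history, 2);
// Turn N+1: one new user message, and the watermark has advanced past message 3.
const nextHistory: Message[] = [...history, { role: "user", parts: [{ type: "text", text: "q3" }] }];
const turnN1 = stripBelowWatermark(nextHistory, 4);

// Message 3 was rewritten (t2 -> [cleared]), so reuse stops there even though
// the only genuinely new content is the appended q3.
console.log(sharedPrefix(turnN, turnN1)); // 3
```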
Evidence from MC Logs
Three consecutive transforms on the same account, no account switch, <2 minutes apart:
| Field | Transform 1 | Transform 2 | Transform 3 |
|---|---|---|---|
| reasoning replay: cleared= | 8 | 8 | 10 |
| reasoning cleanup: | (none) | cleared=2 watermark=2437→2467 | (none) |
| stripClearedReasoning: strippedParts= | 8 | 8 | 10 |
| sentinel replay: neutralized= | 3 | 3 | 7 |
| Output messages | 44 | 40 | 41 |
Transform 2 has reasoning cleanup: cleared=2 watermark=2437→2467 — two new thinking blocks from the previous assistant response were cleared, advancing the watermark. Transform 3 then strips 10 parts (was 8) because the watermark advanced.
Corresponding Plugin Cache Stats
These are from the Antigravity plugin's debug output for the same session, same account (idx=18, no account switch):
Request 1: Cache HIT read=148055 total=148996 hitRate=99% ← previous turn's prefix matched
Request 2: Cache MISS read=0 total=149615 hitRate=0% ← complete miss after MC changed the body
Total tokens only grew by 619 (148996 → 149615) — a single user message. Yet read dropped from 148055 to 0. The entire prefix was invalidated because MC's reasoning cleanup changed the conversation content.
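The arithmetic above can be checked directly from the two stats lines. This sketch assumes the `Cache HIT|MISS read=<n> total=<n> hitRate=<n>%` shape shown in the excerpt. `parseCacheStat` is a hypothetical helper, not part of the Antigravity plugin.

```typescript
// Parsed form of one cache-stat line (shape assumed from the excerpt above).
interface CacheStat { hit: boolean; read: number; total: number; hitRate: number }

function parseCacheStat(line: string): CacheStat | null {
  const m = /Cache (HIT|MISS) read=(\d+) total=(\d+) hitRate=(\d+)%/.exec(line);
  if (!m) return null;
  return { hit: m[1] === "HIT", read: Number(m[2]), total: Number(m[3]), hitRate: Number(m[4]) };
}

const req1 = parseCacheStat("Request 1: Cache HIT read=148055 total=148996 hitRate=99%");
const req2 = parseCacheStat("Request 2: Cache MISS read=0 total=149615 hitRate=0%");

if (req1 && req2) {
  console.log(req2.total - req1.total);                  // 619 new tokens between the requests
  console.log(Math.round((100 * req1.read) / req1.total)); // 99, matching the reported hitRate
  console.log(req1.read - req2.read);                    // 148055 previously cached tokens lost
}
```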
Impact
- Claude thinking model turns suffer complete cache misses (~150K uncached tokens re-processed per miss)
- This wastes significant compute quota on the Antigravity proxy
- Cache warmup probes become ineffective (probe seeds the cache, MC immediately invalidates it on the next turn)
- Hit rate cannot exceed ~50% on average because every other turn is guaranteed to miss
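The ~50% ceiling follows from a token-weighted average over alternating requests. This sketch uses sum(read) / sum(total) as the aggregate, which is an assumption about how the overall rate is measured. Its inputs are the two requests from the plugin excerpt.

```typescript
// One request's cache accounting, as reported by the plugin.
type Req = { read: number; total: number };

// Token-weighted hit rate across requests: sum(read) / sum(total).
function aggregateHitRate(reqs: Req[]): number {
  const read = reqs.reduce((s, r) => s + r.read, 0);
  const total = reqs.reduce((s, r) => s + r.total, 0);
  return total === 0 ? 0 : read / total;
}

const observed: Req[] = [
  { read: 148055, total: 148996 }, // HIT
  { read: 0, total: 149615 },      // MISS
];

// A near-perfect hit followed by a full miss averages to just under one half.
console.log(aggregateHitRate(observed).toFixed(3)); // "0.496"
```

Repeating the same HIT/MISS pair keeps the ratio at the same value, because both sums scale together.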
Expected Behavior
The reasoning stripping result should be idempotent across turns — if thinking blocks are stripped to the same sentinel structure regardless of how many new blocks were added, the prefix hash would remain stable and the server-side cache would be reused.
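The expected behavior can be stated as a checkable property: the transformed body sent on turn N-1 must be a message-for-message prefix of the body sent on turn N. This is a minimal sketch with simplified `Message`/`Part` shapes (assumptions, not OpenCode's types). `isStablePrefix` is a hypothetical helper. The comparison uses `JSON.stringify`, so it assumes both bodies build their objects with the same key order.

```typescript
type Part = { type: string; text: string };
type Message = { role: string; parts: Part[] };

// True if every message in `prev` appears unchanged, in order, at the start of `next`.
function isStablePrefix(prev: Message[], next: Message[]): boolean {
  if (prev.length > next.length) return false;
  return prev.every((m, i) => JSON.stringify(m) === JSON.stringify(next[i]));
}

const prev: Message[] = [
  { role: "user", parts: [{ type: "text", text: "q1" }] },
  { role: "assistant", parts: [{ type: "text", text: "." }, { type: "text", text: "a1" }] },
];
// Next turn that only appends: the cache-friendly case.
const grown: Message[] = [...prev, { role: "user", parts: [{ type: "text", text: "q2" }] }];
// Next turn that also rewrites an earlier assistant message: the case in this issue.
const rewritten: Message[] = [prev[0], { role: "assistant", parts: [{ type: "text", text: "a1" }] }, grown[2]];

console.log(isStablePrefix(prev, grown));     // true
console.log(isStablePrefix(prev, rewritten)); // false
```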
Possible Fixes
- Stable sentinel replacement: Replace all thinking blocks with a fixed-content sentinel (e.g., { text: "." }) so the stripped result is identical regardless of which/how many blocks were cleared. The count of sentinels and their content must be deterministic from turn to turn.
- Watermark-stable stripping: Apply reasoning cleanup at a fixed watermark position rather than advancing it each turn, so parts already stripped remain in the same sentinel form.
- One-time strip at generation time: Strip thinking blocks immediately when the assistant response is received (before it enters OpenCode's history), rather than re-stripping on every subsequent turn. This way the history content is already clean and doesn't change.
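The first fix can be sketched as a transform that replaces every thinking part with the same fixed sentinel whenever it is seen. Each thinking part becomes exactly one sentinel in place, so the sentinel count is deterministic and earlier messages never change. The three-turn simulation checks the append-only property. The sentinel `{ type: "text", text: "." }` follows the example above. `stripAllThinking` and the message shapes are hypothetical. Whether the provider accepts a text sentinel in place of a thinking block is an assumption not verified here.

```typescript
type Part = { type: "text" | "thinking"; text: string };
type Message = { role: "user" | "assistant"; parts: Part[] };

// Fixed sentinel: identical content every time, so stripped output is deterministic.
const SENTINEL: Part = { type: "text", text: "." };

// Replace every thinking part one-for-one with the sentinel; everything else is untouched.
function stripAllThinking(history: Message[]): Message[] {
  return history.map((msg): Message => ({
    role: msg.role,
    parts: msg.parts.map((p): Part => (p.type === "thinking" ? { ...SENTINEL } : p)),
  }));
}

function isStablePrefix(prev: Message[], next: Message[]): boolean {
  return prev.length <= next.length &&
    prev.every((m, i) => JSON.stringify(m) === JSON.stringify(next[i]));
}

// Simulate three turns: each turn sends the stripped history, then receives a reply with thinking.
let history: Message[] = [];
let lastBody: Message[] = [];
let stable = true;
for (let turn = 1; turn <= 3; turn++) {
  history = [...history, { role: "user", parts: [{ type: "text", text: `q${turn}` }] }];
  const body = stripAllThinking(history); // request body for this turn
  stable = stable && isStablePrefix(lastBody, body);
  lastBody = body;
  history = [
    ...history,
    { role: "assistant", parts: [{ type: "thinking", text: `t${turn}` }, { type: "text", text: `a${turn}` }] },
  ];
}
console.log(stable); // true: each turn's body only extends the previous one
```

Stripping at generation time (the third fix) would yield the same bodies. The only difference is that the sentinel replacement happens once, when the reply arrives, rather than in every transform.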
Reproduction
- Use any Claude thinking model (e.g., claude-opus-4-6-thinking) with Magic Context enabled
- Send 3+ messages in a conversation
- Observe MC logs: reasoning replay: cleared=N, where N grows as the conversation proceeds
- Observe provider cache stats: an alternating HIT/MISS pattern, with a MISS on every other turn
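To check the log symptom without reading lines by eye, a small scanner can pull the N out of each `reasoning replay: cleared=N` line and report whether it grows. The line format is assumed from the excerpts quoted in this issue. Real MC log lines may carry extra fields, and both function names are hypothetical.

```typescript
// Collect N from every "reasoning replay: cleared=N" line, in order.
function clearedCounts(lines: string[]): number[] {
  const counts: number[] = [];
  for (const line of lines) {
    const m = /reasoning replay: cleared=(\d+)/.exec(line);
    if (m) counts.push(Number(m[1]));
  }
  return counts;
}

// True if N increases at least once between consecutive transforms.
function grows(counts: number[]): boolean {
  return counts.some((n, i) => i > 0 && n > counts[i - 1]);
}

const sample = [
  "reasoning replay: cleared=8",
  "reasoning cleanup: cleared=2 watermark=2437→2467", // different stage, not matched
  "reasoning replay: cleared=8",
  "reasoning replay: cleared=10",
];
const counts = clearedCounts(sample);
console.log(counts, grows(counts)); // counts = [8, 8, 10], grows = true
```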
Related
MC diagnostics counting message.updated events with hasUsageTokens=false as cache BUSTs is a separate but related issue: it inflates the BUST count in MC's own metrics for Antigravity Claude models.
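One way to keep those events out of the BUST metric is to count an event only when it actually carries usage tokens. This sketch rests on two assumptions: the `UsageEvent` shape (including `cacheRead`) is invented for illustration, not MC's real type, and "BUST" is taken to mean a usage-bearing update with zero cache reads. `countBusts` is a hypothetical helper.

```typescript
// Hypothetical event shape; field names are illustrative only.
interface UsageEvent {
  type: string;
  hasUsageTokens: boolean;
  cacheRead: number; // tokens served from cache; 0 when no usage was reported
}

// Count BUSTs only from message.updated events that carry usage tokens.
// Events with hasUsageTokens=false have no cache data, so they are skipped.
function countBusts(events: UsageEvent[]): number {
  return events.filter(
    (e) => e.type === "message.updated" && e.hasUsageTokens && e.cacheRead === 0,
  ).length;
}

const events: UsageEvent[] = [
  { type: "message.updated", hasUsageTokens: false, cacheRead: 0 },     // no usage: ignored
  { type: "message.updated", hasUsageTokens: true, cacheRead: 148055 }, // cache hit
  { type: "message.updated", hasUsageTokens: true, cacheRead: 0 },      // genuine bust
];
console.log(countBusts(events)); // 1
```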