Per-turn reasoning cleanup causes server-side prompt cache bust on every Claude thinking model turn

## Summary

Magic Context's (MC) `reasoning cleanup` and `stripClearedReasoning` stages change the conversation body on **every turn** for Claude thinking models. This causes a complete server-side prompt cache miss (`read=0`, 0% hit rate) even when the account, endpoint, fingerprint, and system prompt are all identical.

## Environment

- Magic Context: `@cortexkit/opencode-magic-context@0.21.8`
- Provider: Antigravity proxy (Google Cloud Code Assist) with implicit prefix caching
- Model: `claude-opus-4-6-thinking` (any Claude thinking model)
- Plugin: `@expiren/opencode-antigravity-auth@1.6.49`

## Root Cause

Claude thinking models generate `thinking` blocks in every assistant response. On the **next** turn, MC's Phase 1 transform pipeline runs:

1. `reasoning replay` → clears N thinking blocks (N grows each turn)
2. `stripClearedReasoning` → strips the cleared parts
3. `sentinel replay` → neutralizes stripped messages

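Because N changes whenever new thinking blocks from the latest response are cleared, the conversation body sent on this turn is **different from the previous turn's body**, including messages that were already sent. Google's implicit prefix cache appears to be keyed on an exact prefix hash, so a change to earlier conversation messages invalidates everything from that point on. In the stats below this shows up as a complete cache miss.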
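The effect can be illustrated with a minimal sketch. It assumes a simplified cache model (a SHA-256 hash over the serialized message prefix) and a hypothetical `stripUpTo` stand-in for the strip stage; neither the message shape nor the function names come from MC or the provider.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape, for illustration only.
type Part = { type: "text" | "thinking"; text: string };
type Message = { role: "user" | "assistant"; parts: Part[] };

// Assumed cache model: the key is a hash over the first `upTo` serialized messages.
function prefixKey(messages: Message[], upTo: number): string {
  const h = createHash("sha256");
  for (const m of messages.slice(0, upTo)) h.update(JSON.stringify(m));
  return h.digest("hex");
}

// Hypothetical stand-in for the strip stage: removes thinking parts from
// every message whose index is below the watermark.
function stripUpTo(messages: Message[], watermark: number): Message[] {
  return messages.map((m, i) =>
    i < watermark
      ? { ...m, parts: m.parts.filter((p) => p.type !== "thinking") }
      : m,
  );
}

const history: Message[] = [
  { role: "user", parts: [{ type: "text", text: "q1" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t1" }, { type: "text", text: "a1" }] },
  { role: "user", parts: [{ type: "text", text: "q2" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t2" }, { type: "text", text: "a2" }] },
];

// Previous turn: watermark at 2, so the second assistant message keeps its thinking part.
const sentPrev = stripUpTo(history, 2);
// Next turn: one new user message, and the watermark has advanced to 4.
const sentNext = stripUpTo(
  [...history, { role: "user", parts: [{ type: "text", text: "q3" }] }],
  4,
);

// The first two messages are identical in both requests...
const earlyPrefixStable = prefixKey(sentPrev, 2) === prefixKey(sentNext, 2);
// ...but the four-message prefix that was cached last turn no longer matches.
const fullPrefixStable = prefixKey(sentPrev, 4) === prefixKey(sentNext, 4);

console.log({ earlyPrefixStable, fullPrefixStable });
```

In this model only one already-sent message changed, yet the key for the previously cached prefix is different, which is consistent with the `read=0` observation.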

## Evidence from MC Logs

Three consecutive transforms on the **same account** (no account switch), less than 2 minutes apart:

| Field | Transform 1 | Transform 2 | Transform 3 |
|---|---|---|---|
| `reasoning replay: cleared=` | 8 | 8 | **10** |
| `reasoning cleanup:` | *(none)* | **cleared=2 watermark=2437→2467** | *(none)* |
| `stripClearedReasoning: strippedParts=` | 8 | 8 | **10** |
| `sentinel replay: neutralized=` | 3 | 3 | **7** |
| Output messages | 44 | 40 | 41 |

Transform 2 has `reasoning cleanup: cleared=2 watermark=2437→2467` — two new thinking blocks from the previous assistant response were cleared, advancing the watermark. Transform 3 then strips 10 parts (was 8) because the watermark advanced.

## Corresponding Plugin Cache Stats

These are from the Antigravity plugin's debug output for the same session, same account (`idx=18`, no account switch):

```
Request 1: Cache HIT  read=148055 total=148996 hitRate=99%   ← previous turn's prefix matched
Request 2: Cache MISS read=0      total=149615 hitRate=0%    ← complete miss after MC changed the body
```

Total tokens only grew by 619 (148996 → 149615) — a single user message. Yet `read` dropped from 148055 to **0**. The entire prefix was invalidated because MC's reasoning cleanup changed the conversation content.
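The arithmetic behind these two requests can be checked with a short sketch. The `CacheStats` type and the rounding in `hitRatePercent` are assumptions; the plugin's actual formula is not shown in its debug output.

```typescript
// Hypothetical stats shape mirroring the `read=` and `total=` fields above.
type CacheStats = { read: number; total: number };

// Assumed formula: cached share of the prompt, rounded to a whole percent.
function hitRatePercent(s: CacheStats): number {
  return s.total === 0 ? 0 : Math.round((s.read / s.total) * 100);
}

const request1: CacheStats = { read: 148055, total: 148996 };
const request2: CacheStats = { read: 0, total: 149615 };

// Prompt growth between the two requests.
const grownBy = request2.total - request1.total;
// Tokens that had to be processed without cache help.
const uncached1 = request1.total - request1.read;
const uncached2 = request2.total - request2.read;

console.log({
  hit1: hitRatePercent(request1),
  hit2: hitRatePercent(request2),
  grownBy,
  uncached1,
  uncached2,
});
```

The prompt grew by 619 tokens, while the uncached portion went from 941 tokens to the full 149615.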

## Impact

- Every Claude thinking model turn suffers a complete cache miss (~150K uncached tokens re-processed)
- This wastes significant compute quota on the Antigravity proxy
- Cache warmup probes become ineffective (probe seeds the cache, MC immediately invalidates it on the next turn)
- Even in the best case, where cleanup runs only on every other turn, the average hit rate cannot exceed ~50% because each of those turns is guaranteed to miss

## Expected Behavior

The reasoning stripping result should be **stable** across turns (and idempotent when re-applied). If already-sent messages are always stripped to the same sentinel structure, regardless of how many new blocks were added since, the prefix hash would remain stable and the server-side cache would be reused.

## Possible Fixes

1. **Stable sentinel replacement**: Replace all thinking blocks with a fixed-content sentinel (e.g., `{ text: "." }`) so the stripped result is identical regardless of which/how many blocks were cleared. The count of sentinels and their content must be deterministic from turn to turn.

2. **Watermark-stable stripping**: Apply reasoning cleanup at a fixed watermark position rather than advancing it each turn, so parts already stripped remain in the same sentinel form.

3. **One-time strip at generation time**: Strip thinking blocks immediately when the assistant response is received (before it enters OpenCode's history), rather than re-stripping on every subsequent turn. This way the history content is already clean and doesn't change.
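A minimal sketch of fix 1, assuming a hypothetical message shape and a hypothetical `neutralizeReasoning` helper (neither is MC's actual API). Each message is transformed independently of any watermark, so the output for an already-sent message never changes. The sketch ignores any provider constraints on when thinking blocks may be removed.

```typescript
// Hypothetical message shape, for illustration only.
type Part = { type: "text" | "thinking"; text: string };
type Message = { role: "user" | "assistant"; parts: Part[] };

// Fixed sentinel content, taken from the example in fix 1.
const SENTINEL_TEXT = ".";

// Replace every thinking part with the same sentinel. There is no watermark:
// the result for a message depends only on that message.
function neutralizeReasoning(messages: Message[]): Message[] {
  return messages.map((m): Message => ({
    role: m.role,
    parts: m.parts.map((p): Part =>
      p.type === "thinking" ? { type: "text", text: SENTINEL_TEXT } : p,
    ),
  }));
}

const historyA: Message[] = [
  { role: "user", parts: [{ type: "text", text: "q1" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t1" }, { type: "text", text: "a1" }] },
];
// One turn later: the same history plus a new exchange.
const historyB: Message[] = [
  ...historyA,
  { role: "user", parts: [{ type: "text", text: "q2" }] },
  { role: "assistant", parts: [{ type: "thinking", text: "t2" }, { type: "text", text: "a2" }] },
];

const sentA = neutralizeReasoning(historyA);
const sentB = neutralizeReasoning(historyB);

// The part of the later request that overlaps the earlier one serializes identically.
const prefixUnchanged =
  JSON.stringify(sentB.slice(0, sentA.length)) === JSON.stringify(sentA);
// Applying the transform a second time changes nothing further.
const idempotent =
  JSON.stringify(neutralizeReasoning(sentB)) === JSON.stringify(sentB);

console.log({ prefixUnchanged, idempotent });
```

The same property would hold for fix 3 if stripping happened once before the response entered history; the point in both cases is that older messages are never rewritten on later turns.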

## Reproduction

1. Use any Claude thinking model (e.g., `claude-opus-4-6-thinking`) with Magic Context enabled
2. Send 3+ messages in a conversation
3. Observe MC logs: `reasoning replay: cleared=N` where N increases across turns (8 → 8 → 10 in the table above)
4. Observe provider cache stats: an alternating HIT/MISS pattern, or a MISS on every turn
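To spot the changing count without reading logs by hand, something like the following could be used. The log line shape is an assumption reconstructed from the field names in the evidence table, and `clearedCounts` and `changedTransforms` are hypothetical helpers, not part of MC.

```typescript
// Assumed log fragment, based on the `reasoning replay: cleared=` field in the table.
const CLEARED_RE = /reasoning replay: cleared=(\d+)/;

// Extract the cleared count from every matching log line, in order.
function clearedCounts(logLines: string[]): number[] {
  const counts: number[] = [];
  for (const line of logLines) {
    const m = CLEARED_RE.exec(line);
    if (m) counts.push(Number(m[1]));
  }
  return counts;
}

// Indices (0-based) of transforms whose count differs from the previous transform.
function changedTransforms(counts: number[]): number[] {
  const changed: number[] = [];
  for (let i = 1; i < counts.length; i++) {
    if (counts[i] !== counts[i - 1]) changed.push(i);
  }
  return changed;
}

// Sample lines built from the values in the evidence table, not copied from a real log file.
const sample = [
  "reasoning replay: cleared=8",
  "reasoning replay: cleared=8",
  "reasoning replay: cleared=10",
];

const counts = clearedCounts(sample);
const changed = changedTransforms(counts);
console.log({ counts, changed });
```

Each index in `changed` marks a transform whose body differs from the previous one and is therefore a candidate for a cache MISS in the provider stats.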

## Related

Separately, MC diagnostics count `message.updated` events with `hasUsageTokens=false` as cache BUSTs. This is a related but distinct issue: it inflates the BUST count in MC's own metrics for Antigravity Claude models.
