fix(mountsync): independent rootCtx-derived deadline for outbox/writeback flush#270
Conversation
…eadline
The mount wraps BOTH the mirror pull and the durable writeback/outbox push
in one per-cycle ctx bounded by RELAYFILE_MOUNT_TIMEOUT (default 15s). On a
large mounted tree a single cloud round-trip routinely exceeds 15s, so the
writeback push is cancelled mid-flight and the record sits in the outbox
retrying across cycles for minutes — surfacing as repeated
mount local change failed: context deadline exceeded
mount sync cycle failed: context deadline exceeded
and provider writebacks (e.g. a Slack reply draft) landing minutes late or
not at all before the ephemeral sandbox tears down.
A small outbound write must not share — and be starved by — the same budget
as a full-tree mirror. This is exactly the problem bootstrapContext already
solves for the heavy initial/periodic pull ("derive from rootCtx, not the
per-cycle ctx, so a tiny RELAYFILE_MOUNT_TIMEOUT cannot starve it"). Apply
the symmetric treatment to the outbox flush:
- New outboxContext(): derives the flush deadline from s.rootCtx with its own
budget (default 60s, env RELAYFILE_OUTBOX_TIMEOUT). rootCtx cancellation
(process shutdown) still propagates.
- flushOutboxRecords() now runs the cloud upload under that deadline instead
of the inbound per-cycle ctx.
Test: TestOutboxFlushUsesIndependentDeadlineNotPerCycleCtx hands the flush an
already-expired per-cycle ctx and asserts the upload still completes under the
rootCtx-derived deadline (proven red without the fix: the expired ctx leaks
through and the upload is cancelled). Full internal/mountsync suite green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR isolates outbox/writeback flush operations from per-cycle mount timeouts by introducing a configurable, root context-derived deadline. The feature prevents in-flight uploads from being cancelled when the per-cycle ChangesOutbox flush independent deadline
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request decouples the outbox flush/writeback process from the per-cycle mount timeout by introducing an independent OutboxFlushTimeout and deriving the flush context from the root context. This prevents small outbound writes from being starved or canceled by short per-cycle deadlines. A test has also been added to verify this behavior. The review feedback points out a potential panic in the defensive context-handling logic if both the root context and the parent context are nil, suggesting a fallback to context.Background().
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| root := s.rootCtx | ||
| if root == nil { | ||
| // Defensive: NewSyncer always sets rootCtx, but never derive from a nil | ||
| // parent — fall back to the inbound ctx so behaviour degrades to the | ||
| // pre-fix per-cycle deadline rather than panicking. | ||
| root = parent | ||
| } |
There was a problem hiding this comment.
If both s.rootCtx and parent are nil (which can happen if Syncer is constructed manually in tests and a nil context is passed), root will remain nil. Passing a nil parent context to context.WithTimeout will cause a panic. To prevent this, we should defensively fall back to context.Background() if both are nil.
| root := s.rootCtx | |
| if root == nil { | |
| // Defensive: NewSyncer always sets rootCtx, but never derive from a nil | |
| // parent — fall back to the inbound ctx so behaviour degrades to the | |
| // pre-fix per-cycle deadline rather than panicking. | |
| root = parent | |
| } | |
| root := s.rootCtx | |
| if root == nil { | |
| if parent != nil { | |
| root = parent | |
| } else { | |
| root = context.Background() | |
| } | |
| } |
Relayfile Eval ReviewRun: Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0 Human Review CasesNo reviewable human-review cases captured Relayfile output. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/mountsync/syncer.go`:
- Around line 1779-1783: outboxContext currently ignores the caller's ctx
(breaking FlushOutboxOnce timeout); change outboxContext to take the caller ctx
as the parent and return a derived ctx that still applies any root-cycle
timeout/values from s.rootCtx but does not bypass caller cancellation—e.g., if
s.rootCtx has a deadline, compute the earlier deadline between caller ctx and
s.rootCtx and use context.WithDeadline on the caller ctx (or otherwise compose
cancellations) so FlushOutboxOnce(ctx) and
flushOutboxRecords/flushOutboxRecordChunk calls will observe the caller
cancellation while cycle-path callers still get the root-derived timeout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: bfd5699c-e6da-4491-a19f-5a4fe9cc2d2d
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (2)
internal/mountsync/syncer.gointernal/mountsync/syncer_test.go
| flushCtx, cancel := s.outboxContext(ctx) | ||
| defer cancel() | ||
| for _, chunk := range chunkOutboxRecords(due, maxWritebackBatchBytes()) { | ||
| if err := s.flushOutboxRecordChunk(ctx, chunk, conflicted); err != nil { | ||
| if err := s.flushOutboxRecordChunk(flushCtx, chunk, conflicted); err != nil { | ||
| return err |
There was a problem hiding this comment.
outboxContext currently bypasses explicit caller cancellation in FlushOutboxOnce.
Because outboxContext always derives from s.rootCtx when present, flushOutboxRecords no longer observes the inbound ctx cancellation. That is fine for per-cycle sync, but it can break FlushOutboxOnce(ctx) timeout intent (caller supplies a bounded ctx). Please preserve parent cancellation for one-shot flush call paths while keeping rootCtx-derived timeout for cycle-path flushes.
Suggested direction
-func (s *Syncer) flushOutboxRecords(ctx context.Context, conflicted map[string]struct{}, forceDue bool) error {
+func (s *Syncer) flushOutboxRecords(ctx context.Context, conflicted map[string]struct{}, forceDue bool) error {
...
- flushCtx, cancel := s.outboxContext(ctx)
+ flushCtx, cancel := s.outboxContext(ctx, forceDue /* respectParentCancel */)
defer cancel()
...
}
-func (s *Syncer) outboxContext(parent context.Context) (context.Context, context.CancelFunc) {
+func (s *Syncer) outboxContext(parent context.Context, respectParentCancel bool) (context.Context, context.CancelFunc) {
root := s.rootCtx
if root == nil {
root = parent
}
timeout := s.outboxFlushTimeout
if timeout <= 0 {
timeout = defaultOutboxFlushTimeout
}
- return context.WithTimeout(root, timeout)
+ ctx, cancel := context.WithTimeout(root, timeout)
+ if !respectParentCancel || parent == nil {
+ return ctx, cancel
+ }
+ stop := context.AfterFunc(parent, cancel)
+ return ctx, func() {
+ stop()
+ cancel()
+ }
}Also applies to: 2854-2867
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/mountsync/syncer.go` around lines 1779 - 1783, outboxContext
currently ignores the caller's ctx (breaking FlushOutboxOnce timeout); change
outboxContext to take the caller ctx as the parent and return a derived ctx that
still applies any root-cycle timeout/values from s.rootCtx but does not bypass
caller cancellation—e.g., if s.rootCtx has a deadline, compute the earlier
deadline between caller ctx and s.rootCtx and use context.WithDeadline on the
caller ctx (or otherwise compose cancellations) so FlushOutboxOnce(ctx) and
flushOutboxRecords/flushOutboxRecordChunk calls will observe the caller
cancellation while cycle-path callers still get the root-derived timeout.
|
Implemented the PR fixes locally. Changed:
Addressed comments
VerificationPassed locally:
The conditional workers typecheck job is skipped in this checkout because |
|
Implemented the PR review fixes in the checkout. Changed:
Addressed comments
Advisory Notes
Local verification passed:
|
Problem
The mount wraps both the mirror pull and the durable writeback/outbox push in one per-cycle context bounded by
RELAYFILE_MOUNT_TIMEOUT(default 15s). On a large mounted tree (a real churn-digest box carries ~2,829 files / 1,252 Slack across 40 channels), a single cloud round-trip routinely exceeds 15s, so the writeback push is cancelled mid-flight and the outbox record retries across cycles for minutes. Symptoms in prod mount logs:…and provider writebacks (e.g. a Slack reply draft) landing ~10 min late — or not at all before the ephemeral sandbox tears down (cleanup
flushExitCode:1).Fix
A small outbound write must not share — and be starved by — the same budget as a full-tree mirror.
bootstrapContextalready solves exactly this for the heavy initial/periodic pull ("derive from rootCtx, not the per-cycle ctx, so a tinyRELAYFILE_MOUNT_TIMEOUTcannot starve it"). This applies the symmetric treatment to the outbox flush:outboxContext()— derives the flush deadline froms.rootCtxwith its own budget (default 60s, envRELAYFILE_OUTBOX_TIMEOUT). Process-shutdown cancellation viarootCtxstill propagates.flushOutboxRecords()runs the cloud upload under that deadline instead of the inbound per-cycle ctx.Test
TestOutboxFlushUsesIndependentDeadlineNotPerCycleCtxhands the flush an already-expired per-cycle ctx and asserts the upload still completes under the rootCtx-derived deadline.saw -5.78ms remaining) and the upload is cancelled; with it, the upload runs under the ~60s outbox deadline.internal/mountsyncsuite green (no regressions);gofmtclean.Rollout note
This reaches prod only via a Daytona snapshot rebuild (the
RELAYFILE_MOUNT_VERSIONpin) — merge alone does not deploy it.Companion (cloud repo, separate PR)
sandbox-orchestrator.tsflushTimeoutSeconds, currently 20s) above the new 60s outbox budget so writebacks drain before SIGKILL.🤖 Generated with Claude Code