fix(mountsync): independent rootCtx-derived deadline for outbox/writeback flush #270

Merged
khaliqgant merged 2 commits into main from fix/mount-outbox-independent-deadline
Jun 11, 2026

@khaliqgant

Member

Problem

The mount wraps both the mirror pull and the durable writeback/outbox push in one per-cycle context bounded by RELAYFILE_MOUNT_TIMEOUT (default 15s). On a large mounted tree (a real churn-digest box carries ~2,829 files / 1,252 Slack across 40 channels), a single cloud round-trip routinely exceeds 15s, so the writeback push is cancelled mid-flight and the outbox record retries across cycles for minutes. Symptoms in prod mount logs:

mount local change failed: context deadline exceeded
mount sync cycle failed: context deadline exceeded

…and provider writebacks (e.g. a Slack reply draft) landing ~10 min late — or not at all before the ephemeral sandbox tears down (cleanup flushExitCode:1).

Fix

A small outbound write must not share — and be starved by — the same budget as a full-tree mirror. bootstrapContext already solves exactly this for the heavy initial/periodic pull ("derive from rootCtx, not the per-cycle ctx, so a tiny RELAYFILE_MOUNT_TIMEOUT cannot starve it"). This applies the symmetric treatment to the outbox flush:

  • outboxContext() — derives the flush deadline from s.rootCtx with its own budget (default 60s, env RELAYFILE_OUTBOX_TIMEOUT). Process-shutdown cancellation via rootCtx still propagates.
  • flushOutboxRecords() runs the cloud upload under that deadline instead of the inbound per-cycle ctx.
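
The derivation described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `syncer` type is a hypothetical stand-in that models only `rootCtx` and `outboxFlushTimeout`, and the two helper functions exist purely to show the behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// syncer is a hypothetical stand-in for the real Syncer; only the two fields
// this sketch needs are modeled.
type syncer struct {
	rootCtx            context.Context
	outboxFlushTimeout time.Duration
}

// outboxContext deliberately ignores the inbound per-cycle context and derives
// the flush deadline from rootCtx, so only process shutdown or the outbox
// budget can cancel the flush.
func (s *syncer) outboxContext(_ context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(s.rootCtx, s.outboxFlushTimeout)
}

// flushSurvivesExpiredCycle reports whether the flush context is still live
// when the per-cycle context has already passed its deadline.
func flushSurvivesExpiredCycle() bool {
	root, stop := context.WithCancel(context.Background())
	defer stop()
	s := &syncer{rootCtx: root, outboxFlushTimeout: 60 * time.Second}

	// Per-cycle context whose deadline is already in the past.
	cycleCtx, cancelCycle := context.WithDeadline(root, time.Now().Add(-time.Second))
	defer cancelCycle()

	flushCtx, cancelFlush := s.outboxContext(cycleCtx)
	defer cancelFlush()

	return errors.Is(cycleCtx.Err(), context.DeadlineExceeded) && flushCtx.Err() == nil
}

// shutdownStillPropagates reports whether cancelling rootCtx cancels the flush.
func shutdownStillPropagates() bool {
	root, stop := context.WithCancel(context.Background())
	s := &syncer{rootCtx: root, outboxFlushTimeout: 60 * time.Second}

	flushCtx, cancelFlush := s.outboxContext(context.Background())
	defer cancelFlush()

	stop() // simulate process shutdown
	return errors.Is(flushCtx.Err(), context.Canceled)
}

func main() {
	fmt.Println("flush survives expired cycle:", flushSurvivesExpiredCycle())
	fmt.Println("shutdown still propagates:", shutdownStillPropagates())
}
```

The point of the design is that the flush context and the per-cycle context are siblings under the root rather than parent and child, so an exhausted cycle budget has no path to the upload.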

Test

TestOutboxFlushUsesIndependentDeadlineNotPerCycleCtx hands the flush an already-expired per-cycle ctx and asserts the upload still completes under the rootCtx-derived deadline.

  • Proven red→green: without the fix the expired ctx leaks through (saw -5.78ms remaining) and the upload is cancelled; with it, the upload runs under the ~60s outbox deadline.
  • Full internal/mountsync suite green (no regressions); gofmt clean.
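
A rough sketch of the red/green shape of that test, under stated assumptions: `fakeUpload`, `flush`, and `redGreen` are hypothetical names invented for this illustration, and the real test exercises the actual Syncer and outbox records rather than a stub.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeUpload stands in for the cloud bulk upload: it fails immediately if the
// context it is handed is already done, which is how the pre-fix bug surfaced.
func fakeUpload(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("upload cancelled: %w", err)
	}
	return nil
}

// flush runs the upload either under the inbound per-cycle ctx (pre-fix) or
// under a fresh deadline derived from root (post-fix).
func flush(root, cycleCtx context.Context, independent bool) error {
	if !independent {
		return fakeUpload(cycleCtx)
	}
	flushCtx, cancel := context.WithTimeout(root, 60*time.Second)
	defer cancel()
	return fakeUpload(flushCtx)
}

// redGreen returns the pre-fix and post-fix results for an expired cycle ctx.
func redGreen() (red, green error) {
	root := context.Background()
	cycleCtx, cancel := context.WithDeadline(root, time.Now().Add(-5*time.Millisecond))
	defer cancel()
	return flush(root, cycleCtx, false), flush(root, cycleCtx, true)
}

func main() {
	red, green := redGreen()
	fmt.Println("pre-fix: ", red)
	fmt.Println("post-fix:", green)
}
```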

Rollout note

This reaches prod only via a Daytona snapshot rebuild (the RELAYFILE_MOUNT_VERSION pin) — merge alone does not deploy it.

Companion (cloud repo, separate PR)

  • Raise the cloud teardown flush wrap (sandbox-orchestrator.ts flushTimeoutSeconds, currently 20s) above the new 60s outbox budget so writebacks drain before SIGKILL.
  • Narrow over-broad persona mount scope (churn-digest mounts all 40 Slack channels) — pending verification that scope drives the mounted tree.

🤖 Generated with Claude Code

…eadline

The mount wraps BOTH the mirror pull and the durable writeback/outbox push
in one per-cycle ctx bounded by RELAYFILE_MOUNT_TIMEOUT (default 15s). On a
large mounted tree a single cloud round-trip routinely exceeds 15s, so the
writeback push is cancelled mid-flight and the record sits in the outbox
retrying across cycles for minutes — surfacing as repeated

    mount local change failed: context deadline exceeded
    mount sync cycle failed: context deadline exceeded

and provider writebacks (e.g. a Slack reply draft) landing minutes late or
not at all before the ephemeral sandbox tears down.

A small outbound write must not share — and be starved by — the same budget
as a full-tree mirror. This is exactly the problem bootstrapContext already
solves for the heavy initial/periodic pull ("derive from rootCtx, not the
per-cycle ctx, so a tiny RELAYFILE_MOUNT_TIMEOUT cannot starve it"). Apply
the symmetric treatment to the outbox flush:

- New outboxContext(): derives the flush deadline from s.rootCtx with its own
  budget (default 60s, env RELAYFILE_OUTBOX_TIMEOUT). rootCtx cancellation
  (process shutdown) still propagates.
- flushOutboxRecords() now runs the cloud upload under that deadline instead
  of the inbound per-cycle ctx.

Test: TestOutboxFlushUsesIndependentDeadlineNotPerCycleCtx hands the flush an
already-expired per-cycle ctx and asserts the upload still completes under the
rootCtx-derived deadline (proven red without the fix: the expired ctx leaks
through and the upload is cancelled). Full internal/mountsync suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

📝 Walkthrough

This PR isolates outbox/writeback flush operations from per-cycle mount timeouts by introducing a configurable, root context-derived deadline. The feature prevents in-flight uploads from being cancelled when the per-cycle RELAYFILE_MOUNT_TIMEOUT is exhausted, ensuring durable record flushes complete independently.

Changes

Outbox flush independent deadline

| Layer / File(s) | Summary |
| --- | --- |
| Timeout configuration and constant: internal/mountsync/syncer.go | Defines defaultOutboxFlushTimeout constant documenting that outbox flushes use a dedicated rootCtx-derived deadline, and extends SyncerOptions with a new OutboxFlushTimeout field defaulting from RELAYFILE_OUTBOX_TIMEOUT environment variable. |
| Syncer storage and initialization: internal/mountsync/syncer.go | Adds outboxFlushTimeout field to Syncer struct, resolves the configured timeout value with environment variable fallback and non-positive value handling, and wires the computed timeout into the constructed Syncer instance. |
| Outbox flush context derivation and usage: internal/mountsync/syncer.go | Introduces outboxContext(parent) helper method that derives a timeout context from rootCtx (with defensive fallback to parent context) using the configured outbox flush timeout. Updates flushOutboxRecords to run outbox record uploads using this independent flush context instead of the inbound per-cycle context. |
| Regression test for independent deadline: internal/mountsync/syncer_test.go | Comprehensive test TestOutboxFlushUsesIndependentDeadlineNotPerCycleCtx that seeds a pending outbox record via cancelled upload, then verifies that a subsequent flush called with an already-expired per-cycle context still executes the bulk upload under the independent outbox-root deadline and clears the pending records. |
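
The timeout resolution summarized above (explicit option, then environment fallback, then default, with non-positive values rejected) might look roughly like this. It is a sketch only: `resolveOutboxTimeout` is a hypothetical helper, and the assumption that RELAYFILE_OUTBOX_TIMEOUT holds a Go duration string such as "90s" is not confirmed by the PR.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// defaultOutboxFlushTimeout mirrors the 60s default stated in the PR.
const defaultOutboxFlushTimeout = 60 * time.Second

// resolveOutboxTimeout prefers a positive explicit option, then a positive
// value from the environment, then the default.
// Assumption: the env value is a Go duration string (e.g. "90s"); the real
// parser may accept a different format.
func resolveOutboxTimeout(opt time.Duration, getenv func(string) string) time.Duration {
	if opt > 0 {
		return opt
	}
	if raw := getenv("RELAYFILE_OUTBOX_TIMEOUT"); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
			return d
		}
	}
	return defaultOutboxFlushTimeout
}

func main() {
	fmt.Println(resolveOutboxTimeout(0, os.Getenv))
}
```

Passing the lookup function as a parameter keeps the helper testable without mutating the process environment.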

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

no-agent-relay-review

Poem

🐰 Timeouts that timeout? Not on my watch!
Outbox flushes now chart their own path,
Root context steady, per-cycle loose—
Writes finish strong, no more abuse! 📦✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: introducing an independent deadline for outbox/writeback flush derived from rootCtx instead of per-cycle context, which is the core fix. |
| Description check | ✅ Passed | The description provides comprehensive context about the problem (writeback cancellations on large trees), the specific fix (outboxContext with independent deadline), test coverage, and rollout notes, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request decouples the outbox flush/writeback process from the per-cycle mount timeout by introducing an independent OutboxFlushTimeout and deriving the flush context from the root context. This prevents small outbound writes from being starved or canceled by short per-cycle deadlines. A test has also been added to verify this behavior. The review feedback points out a potential panic in the defensive context-handling logic if both the root context and the parent context are nil, suggesting a fallback to context.Background().

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2855 to +2861
root := s.rootCtx
if root == nil {
    // Defensive: NewSyncer always sets rootCtx, but never derive from a nil
    // parent — fall back to the inbound ctx so behaviour degrades to the
    // pre-fix per-cycle deadline rather than panicking.
    root = parent
}

medium

If both s.rootCtx and parent are nil (which can happen if Syncer is constructed manually in tests and a nil context is passed), root will remain nil. Passing a nil parent context to context.WithTimeout will cause a panic. To prevent this, we should defensively fall back to context.Background() if both are nil.

Suggested change
-root := s.rootCtx
-if root == nil {
-    // Defensive: NewSyncer always sets rootCtx, but never derive from a nil
-    // parent — fall back to the inbound ctx so behaviour degrades to the
-    // pre-fix per-cycle deadline rather than panicking.
-    root = parent
-}
+root := s.rootCtx
+if root == nil {
+    if parent != nil {
+        root = parent
+    } else {
+        root = context.Background()
+    }
+}
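
To make the failure mode concrete, here is a small self-contained sketch showing that `context.WithTimeout` panics on a nil parent and that a Background fallback avoids it. `pickParent` and `withTimeoutPanics` are hypothetical helpers written for this illustration, not functions from the PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pickParent returns the first non-nil context, falling back to Background
// when both are nil.
func pickParent(root, parent context.Context) context.Context {
	switch {
	case root != nil:
		return root
	case parent != nil:
		return parent
	default:
		return context.Background()
	}
}

// withTimeoutPanics reports whether deriving a timeout context from p panics.
func withTimeoutPanics(p context.Context) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_, cancel := context.WithTimeout(p, time.Second)
	cancel()
	return false
}

func main() {
	fmt.Println("nil parent panics:", withTimeoutPanics(nil))
	fmt.Println("fallback panics:  ", withTimeoutPanics(pickParent(nil, nil)))
}
```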

@github-actions

Relayfile Eval Review

Run: .relayfile/evals/runs/2026-06-11T12-13-29-119Z-HEAD-provider
Mode: provider
Git SHA: b17d29b

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/mountsync/syncer.go`:
- Around line 1779-1783: outboxContext currently ignores the caller's ctx
(breaking FlushOutboxOnce timeout); change outboxContext to take the caller ctx
as the parent and return a derived ctx that still applies any root-cycle
timeout/values from s.rootCtx but does not bypass caller cancellation—e.g., if
s.rootCtx has a deadline, compute the earlier deadline between caller ctx and
s.rootCtx and use context.WithDeadline on the caller ctx (or otherwise compose
cancellations) so FlushOutboxOnce(ctx) and
flushOutboxRecords/flushOutboxRecordChunk calls will observe the caller
cancellation while cycle-path callers still get the root-derived timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bfd5699c-e6da-4491-a19f-5a4fe9cc2d2d

📥 Commits

Reviewing files that changed from the base of the PR and between 123190c and 6ba3f39.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go

Comment on lines +1779 to 1783
+flushCtx, cancel := s.outboxContext(ctx)
+defer cancel()
 for _, chunk := range chunkOutboxRecords(due, maxWritebackBatchBytes()) {
-    if err := s.flushOutboxRecordChunk(ctx, chunk, conflicted); err != nil {
+    if err := s.flushOutboxRecordChunk(flushCtx, chunk, conflicted); err != nil {
         return err

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

outboxContext currently bypasses explicit caller cancellation in FlushOutboxOnce.

Because outboxContext always derives from s.rootCtx when present, flushOutboxRecords no longer observes the inbound ctx cancellation. That is fine for per-cycle sync, but it can break FlushOutboxOnce(ctx) timeout intent (caller supplies a bounded ctx). Please preserve parent cancellation for one-shot flush call paths while keeping rootCtx-derived timeout for cycle-path flushes.

Suggested direction
-func (s *Syncer) flushOutboxRecords(ctx context.Context, conflicted map[string]struct{}, forceDue bool) error {
+func (s *Syncer) flushOutboxRecords(ctx context.Context, conflicted map[string]struct{}, forceDue bool) error {
   ...
-  flushCtx, cancel := s.outboxContext(ctx)
+  flushCtx, cancel := s.outboxContext(ctx, forceDue /* respectParentCancel */)
   defer cancel()
   ...
 }

-func (s *Syncer) outboxContext(parent context.Context) (context.Context, context.CancelFunc) {
+func (s *Syncer) outboxContext(parent context.Context, respectParentCancel bool) (context.Context, context.CancelFunc) {
   root := s.rootCtx
   if root == nil {
     root = parent
   }
   timeout := s.outboxFlushTimeout
   if timeout <= 0 {
     timeout = defaultOutboxFlushTimeout
   }
-  return context.WithTimeout(root, timeout)
+  ctx, cancel := context.WithTimeout(root, timeout)
+  if !respectParentCancel || parent == nil {
+    return ctx, cancel
+  }
+  stop := context.AfterFunc(parent, cancel)
+  return ctx, func() {
+    stop()
+    cancel()
+  }
 }

Also applies to: 2854-2867
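
The composition suggested above can be tried in isolation. The sketch below is a runnable approximation, assuming Go 1.21 or later for `context.AfterFunc`; `linkedTimeout` and `callerCancelReachesFlush` are hypothetical names, and the real change would live inside outboxContext.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// linkedTimeout derives a timeout context from root and additionally cancels
// it once parent is done. Requires Go 1.21+ for context.AfterFunc.
func linkedTimeout(root, parent context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithTimeout(root, d)
	stop := context.AfterFunc(parent, cancel)
	return ctx, func() {
		stop() // detach from parent so the callback cannot fire later
		cancel()
	}
}

// callerCancelReachesFlush reports whether cancelling parent ends the derived
// context even though it was created from an unrelated root.
func callerCancelReachesFlush() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	ctx, cancel := linkedTimeout(context.Background(), parent, time.Minute)
	defer cancel()

	cancelParent()
	select {
	case <-ctx.Done(): // AfterFunc runs cancel in its own goroutine
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("caller cancel reaches flush:", callerCancelReachesFlush())
}
```

One consequence of this approach is that the derived context reports context.Canceled when the caller's context ends, whatever the caller's own error was, because the link is made through the derived context's cancel function.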

@khaliqgant khaliqgant merged commit ca90b87 into main Jun 11, 2026
9 checks passed
@khaliqgant khaliqgant deleted the fix/mount-outbox-independent-deadline branch June 11, 2026 12:27
@agent-relay-code

Contributor

Implemented the PR fixes locally.

Changed:

  • Preserved the new root-derived outbox deadline for normal sync-cycle flushes.
  • Made forced one-shot flushes (FlushOutboxOnce(ctx)) respect caller cancellation.
  • Added a regression test for canceled FlushOutboxOnce.
  • Added defensive context.Background() fallback if both root and parent contexts are nil.

Addressed comments

  • CodeRabbit inline review: outboxContext bypassed FlushOutboxOnce(ctx) cancellation. Fixed in internal/mountsync/syncer.go:1780 and internal/mountsync/syncer.go:2874; regression test added at internal/mountsync/syncer_test.go:2626.
  • CodeRabbit review summary: repeated the same FlushOutboxOnce(ctx) cancellation finding. Fixed in internal/mountsync/syncer.go:1780.
  • Gemini Code Assist: potential panic if both rootCtx and parent context are nil. Fixed with context.Background() fallback in internal/mountsync/syncer.go:2863.
  • CodeRabbit walkthrough/pre-merge note: docstring coverage warning. No separate code change beyond updating the outboxContext doc at internal/mountsync/syncer.go:2847; the warning was repo/check advisory, not a validated PR breakage.
  • github-actions relayfile eval review: reported 4 passed, 0 failed, no reviewable cases. No code change needed.

Verification

Passed locally:

  • go test ./internal/mountsync
  • go test ./...
  • Go binary builds for cmd/relayfile, cmd/relayfile-mount, cmd/relayfile-cli
  • npm ci
  • npm run build
  • npm run test
  • npm run typecheck
  • ./scripts/check-contract-surface.sh
  • npx tsx scripts/e2e.ts --ci

The conditional workers typecheck job is skipped in this checkout because packages/server/tsconfig.json is absent. GitHub metadata currently reports PR #270 is already merged/closed, so I’m not ending with READY.

@agent-relay-code

Contributor

Implemented the PR review fixes in the checkout.

Changed:

  • internal/mountsync/syncer.go: normal sync-cycle outbox flushes keep the independent root-derived timeout, while forced FlushOutboxOnce(ctx) flushes also respect caller cancellation.
  • internal/mountsync/syncer.go: outboxContext now falls back to context.Background() if both root and parent contexts are nil.
  • internal/mountsync/syncer_test.go: added regressions for nil-context fallback and FlushOutboxOnce caller cancellation.

Addressed comments

  • gemini-code-assist[bot]: flagged panic when both s.rootCtx and parent context are nil. Fixed in internal/mountsync/syncer.go:2861; regression added in internal/mountsync/syncer_test.go:2118.
  • coderabbitai[bot]: flagged outboxContext bypassing explicit FlushOutboxOnce(ctx) cancellation. Fixed in internal/mountsync/syncer.go:1779 and internal/mountsync/syncer.go:2868; regression added in internal/mountsync/syncer_test.go:2640.
  • coderabbitai[bot] walkthrough comment: informational summary of the PR, no separate validated code finding beyond the inline cancellation issue above.
  • github-actions[bot]: relayfile eval review reported 4 passed, 0 failed, 0 needing human review; no code change needed.
  • agent-relay-code[bot]: listed the same local fixes needed for CodeRabbit/Gemini findings; implemented and validated in the files above.

Advisory Notes

Local verification passed:

  • go test ./internal/mountsync -count=1 -timeout 180s
  • go test ./... -count=1 -timeout 240s
  • Go binary builds for cmd/relayfile, cmd/relayfile-mount, cmd/relayfile-cli
  • npm ci
  • npm run build
  • npm run typecheck
  • npm test
  • SDK CI sequence: package npm ci, core build, SDK build, npx tsc --noEmit, SDK tests
  • ./scripts/check-contract-surface.sh
  • npx tsx scripts/e2e.ts --ci
