Merged
65 changes: 65 additions & 0 deletions internal/mountsync/syncer.go
@@ -1412,6 +1412,30 @@ func (s *Syncer) HTTPClient() (*HTTPClient, bool) {

func (s *Syncer) pullRemote(ctx context.Context, conflicted map[string]struct{}) error {
	if s.state.EventsCursor != "" {
		// Skip-if-no-events short-circuit. Most reconcile cycles on a
		// quiet workspace have nothing to pull; turning that into a
		// single cheap ListEvents probe avoids the worst-case full-tree
		// fetch (sequential ReadFile per entry) that times out on
		// workspaces with hundreds of files. If the events feed reports
		// no new events since our last cursor, the local mirror is
		// already up-to-date and we can return immediately. The caller
		// (sync) then bumps LastSuccessfulReconcileAt so the stall
		// detector stays clear.
		//
		// If the probe fails for any reason, including a 404 from a
		// backend without an events feed, we fall through to the
		// existing incremental/full-pull path. On a 404 that path hits
		// the 404 again and degrades to the full-tree fetch, which
		// preserves pre-fix behaviour for backends without an events feed.
		feed, err := s.client.ListEvents(ctx, s.workspace, s.eventProvider, s.state.EventsCursor, 1)
		if err == nil && len(feed.Events) == 0 {
			// Quiet cycle. Return before incrementalCycles is advanced
			// below, so empty cycles do not count toward the periodic
			// "trust but verify" full-pull threshold. Counting them
			// would trigger the full pull on a quiet workspace, which
			// is the very condition the short-circuit is designed to
			// avoid.
			return nil
		}
// "Trust but verify": every Nth incremental cycle, force a full
// tree pull regardless of cursor health. This self-heals any stale
// state caused by cloud-side revision reuse — applyRemoteFile
@@ -1450,6 +1474,47 @@ func (s *Syncer) pullRemote(ctx context.Context, conflicted map[string]struct{})
s.state.EventsCursor = ""
}

	// Restart fast-path. When EventsCursor is empty but the state file
	// already records tracked files AND a prior LastEventAt (meaning a
	// previous daemon successfully observed events from this workspace),
	// this is a daemon restart against a workspace we have synced
	// before. The full-tree fetch (export or per-file ReadFile loop) on
	// workspaces with hundreds of files routinely exceeds the per-cycle
	// deadline (RELAYFILE_MOUNT_TIMEOUT, default 15s), trapping the
	// daemon in a permanent stall:
	//
	//     mount sync cycle failed: context deadline exceeded
	//     mount stalled: no successful reconcile for 10m
	//
	// Skip the bootstrap full pull: seed the events cursor against the
	// current tip and trust the existing on-disk state. Any drift between
	// local and remote will be caught either by the next incremental
	// cycle (if events fired during downtime) or by the periodic full
	// pull cadence (every fullPullEvery cycles). If resolving the cursor
	// fails (including on backends without an events feed), fall
	// through to the full pull as before, so this is purely additive on
	// supported backends.
	//
	// Gating on LastEventAt (in addition to len(Files) > 0) keeps the
	// fast-path opt-in: callers and tests that hand-seed a state file
	// without ever observing live events still go through the full pull
	// (which is necessary for e.g. the denied-file teardown path).
	if len(s.state.Files) > 0 && strings.TrimSpace(s.state.LastEventAt) != "" {
		cursor, err := s.resolveLatestEventCursor(ctx)
		if err == nil {
			s.state.EventsCursor = cursor
Comment on lines +1503 to +1505

P1: Avoid advancing cursor past unseen downtime events

When a restarted daemon has tracked files and LastEventAt but no EventsCursor, resolving the cursor to the current tip and returning skips every event that occurred after the persisted mirror was last updated. In that scenario (for example, a remote file is updated or deleted while the daemon is down), the next reconcile probes from this newly seeded tip, sees no events, and short-circuits; because quiet cycles no longer advance the full-pull counter, the local mirror can remain stale indefinitely until some later non-empty cycle happens to trigger verification.
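
To make the failure mode concrete, here is a toy model of the reconcile loop. It is a sketch under stated assumptions, not the relayfile code: `feed`, `mirror`, `simulateRestart`, and the integer cursors are all hypothetical stand-ins, and the "resume from the last processed position" branch is included only as a contrast. Nothing in this diff confirms that the state file can supply such a position.

```go
package main

import "fmt"

// event is a hypothetical stand-in for one entry in the events feed.
type event struct {
	seq  int
	path string
}

// feed is a toy in-memory events feed (not the real client API).
type feed struct{ events []event }

// tip returns the sequence number of the newest event, or 0 when empty.
func (f *feed) tip() int {
	if len(f.events) == 0 {
		return 0
	}
	return f.events[len(f.events)-1].seq
}

// since returns every event with a sequence number greater than cursor.
func (f *feed) since(cursor int) []event {
	var out []event
	for _, e := range f.events {
		if e.seq > cursor {
			out = append(out, e)
		}
	}
	return out
}

// mirror models the local state relevant to this comment.
type mirror struct {
	cursor            int
	incrementalCycles int // only advanced on non-empty cycles
	applied           map[string]bool
}

// reconcile models one cycle with the quiet-cycle short-circuit.
func (m *mirror) reconcile(f *feed) {
	evs := f.since(m.cursor)
	if len(evs) == 0 {
		return // quiet cycle: the full-pull counter is not advanced
	}
	m.incrementalCycles++
	for _, e := range evs {
		m.applied[e.path] = true
		m.cursor = e.seq
	}
}

// simulateRestart replays a restart after one remote edit happened during
// downtime and reports whether the mirror ever applied that edit.
func simulateRestart(seedFromTip bool) (sawDowntimeEdit bool, cycles int) {
	f := &feed{events: []event{{seq: 1, path: "a.txt"}}}
	lastProcessed := f.tip() // what the old daemon had handled before exit

	// Remote edit while the daemon is down.
	f.events = append(f.events, event{seq: 2, path: "b.txt"})

	m := &mirror{applied: map[string]bool{}}
	if seedFromTip {
		m.cursor = f.tip() // fast-path behaviour described above
	} else {
		m.cursor = lastProcessed // hypothetical alternative, for contrast
	}
	for i := 0; i < 100; i++ {
		m.reconcile(f)
	}
	return m.applied["b.txt"], m.incrementalCycles
}

func main() {
	for _, seed := range []bool{true, false} {
		saw, cycles := simulateRestart(seed)
		fmt.Printf("seedFromTip=%v sawDowntimeEdit=%v incrementalCycles=%d\n", seed, saw, cycles)
	}
}
```

In this model, seeding from the tip leaves `b.txt` unapplied after 100 cycles and the counter never moves, so a periodic full pull keyed on that counter would not fire either.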


			s.logf("restart fast-path: seeded events cursor %q from %d tracked files; skipping bootstrap full pull", cursor, len(s.state.Files))
			return nil
		}
		var httpErr *HTTPError
		if errors.As(err, &httpErr) && httpErr.StatusCode == http.StatusNotFound {
			// No events feed on this backend: fall through to the
			// full-pull bootstrap path. (Pre-fix behaviour.)
		} else {
			s.logf("restart fast-path: cursor resolution failed (%v); falling through to full pull", err)
		}
	}

	if err := s.pullRemoteFull(ctx, conflicted); err != nil {
		return err
	}