
feat(mount): seed GitHub working trees from export tar#213

Merged
khaliqgant merged 1 commit into main from codex/issue-1250-mount-tar-seed on May 27, 2026
Conversation

@khaliqgant

Member

Summary

  • add a raw-tar GitHub working-tree seed path for relayfile-mount bootstrap
  • verify seeded files against fs/tree contentHash and fail on missing/unexpected entries
  • decode local checkout paths while preserving RelayFile object paths for writeback
  • preserve the clone sentinel events cursor and fall back cleanly when the contract is unsupported

Contract notes

  • requests gzip=0 for /fs/export?format=tar&decode=github-working-tree
  • reads .relayfile/clone.json first, with legacy meta.json fallback
  • accepts eventsCursor plus legacy aliases; forward-scans for the sentinel cursor until import stamps eventsCursor
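
As a rough illustration of the request shape in the contract notes, the export URL could be assembled like this. It is a minimal sketch, not the PR's code: `buildExportURL` and the example host are hypothetical, the base URL is assumed to already carry any workspace prefix, and `pathPrefix`/`headSha` are assumed to be query parameters (their names appear in the review thread, but their exact wire format is not shown here).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildExportURL is a hypothetical helper. Only format=tar,
// decode=github-working-tree and gzip=0 are taken from the contract notes;
// treating pathPrefix and headSha as query parameters is an assumption.
func buildExportURL(baseURL, pathPrefix, headSHA string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/fs/export"

	q := url.Values{}
	q.Set("format", "tar")
	q.Set("decode", "github-working-tree")
	q.Set("gzip", "0") // ask for a raw (uncompressed) tar body
	if pathPrefix != "" {
		q.Set("pathPrefix", pathPrefix)
	}
	if headSHA != "" {
		q.Set("headSha", headSHA)
	}
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	// Placeholder host, mount prefix and SHA, for illustration only.
	exportURL, err := buildExportURL("https://relayfile.example/v1", "/github/repos/acme/demo", "abc123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(exportURL)
}
```

Because `url.Values.Encode` sorts keys, the parameter order in the resulting URL differs from the order written in the contract note; servers parsing query strings should not care.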

Tests

  • go test ./internal/mountsync ./cmd/relayfile-mount
  • git diff --check

@coderabbitai

coderabbitai Bot commented May 27, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 381779d5-e98c-42a2-b9fc-385ad96711da

📥 Commits

Reviewing files that changed from the base of the PR and between ec396ef and 919ddaa.

📒 Files selected for processing (3)
  • internal/mountsync/http_client_test.go
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go
📝 Walkthrough

This PR adds GitHub "working tree" mount support to mountsync, enabling tar-based full-tree bootstrapping, GitHub-specific path mapping, and event cursor seeding from clone manifests.

Changes

GitHub Working Tree Mount and Tar-based Seed Support

| Layer / File(s) | Summary |
|---|---|
| Data contracts and GitHub working tree types (internal/mountsync/syncer.go) | TreeEntry gains Size and Encoding fields; new types GithubWorkingTreeSeedRequest and GithubWorkingTreeTar added for tar-export flow; githubWorkingTreeMount struct introduced for path mapping; SyncerState extended with GithubWorkingTreeHeadSHA persistence. |
| HTTPClient tar export implementation and test (internal/mountsync/syncer.go, internal/mountsync/http_client_test.go) | ExportGithubWorkingTreeTar method streams tar-format exports with auth, retry, and error handling; HTTP client test validates query parameters, content-type preservation, and tar entry parsing. |
| Syncer GitHub mount detection and state initialization (internal/mountsync/syncer.go) | NewSyncer detects GitHub working-tree mounts from remoteRoot; githubWorkingTree field initialized on Syncer; persisted GithubWorkingTreeHeadSHA restored into in-memory mount state during load. |
| GitHub-aware path translation and safety helpers (internal/mountsync/syncer.go) | Implements remoteToLocalPath, localPathToRemotePath, localRelativeToRemotePath for GitHub mapping; adds githubRemotePathForWorkingTreeRel revision selection, safeLocalPath traversal protection, and detectGithubWorkingTreeMount with sentinel/meta path handling. |
| GitHub tar seed bootstrap orchestration (internal/mountsync/syncer.go) | Reads clone manifest for cursor, verifies expected tree via paginated ListTree, exports tar for target head SHA, applies tar files with hash verification and dirty-state preservation, performs safety checks, and commits bootstrap state with events cursor and head SHA. |
| Path resolution call site updates throughout syncer (internal/mountsync/syncer.go) | Eight call sites updated to use GitHub-aware path translation: local event handler, bootstrap/incremental hash probes, apply-remote operations, local push, read-deny cleanup, and file scanning. |
| Integration and unit test coverage (internal/mountsync/syncer_test.go, internal/mountsync/http_client_test.go) | Comprehensive reconcile test seeds local files from tar export and verifies cursor/head-SHA persistence; fakeExportClient extended with deterministic tar generation. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/relayfile#92: Both PRs modify state.EventsCursor handling in pullRemoteFull, including early-path logic and cursor resolution behavior that interact at the same decision points.
  • AgentWorkforce/relayfile#185: Both PRs update bootstrap and full-reconcile flow in pullRemoteFull around WebSocket/cursor polling, with direct dependency on syncer behavior changes.

Poem

🐰 A rabbit hops through working trees so bright,
With tar-seeded roots and paths mapped just right,
GitHub's clone manifest guides every leap,
While checksums guard the promises we keep!
Thump, thump!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main change: adding GitHub working tree seed support via tar export. It is concise, specific, and clearly summarizes the primary objective. |
| Description check | ✅ Passed | The description is directly related to the changeset. It provides a summary of the implementation approach, contract details, and test verification, all of which align with the code changes shown in the raw summary. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for seeding the local workspace using a GitHub working tree tarball export. It adds a new ExportGithubWorkingTreeTar method to the HTTP client, updates the syncer to detect GitHub working tree mounts, and implements the logic to fetch, verify, and apply the tarball seed. A critical performance issue was identified in githubRemotePathForWorkingTreeRel where unnecessary slice allocations and sorting are performed for every file during local scans, leading to $O(N^2 \log N)$ complexity.
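
One common way to remove this kind of per-file cost is to build a lookup index once per scan and then resolve each file with a map lookup. The sketch below illustrates the idea only; `trackedFile`, `buildLocalIndex`, and the example paths are hypothetical and do not reflect the actual types in syncer.go.

```go
package main

import "fmt"

// trackedFile is a hypothetical stand-in for one entry of the syncer's
// tracked state; the real structure in syncer.go is not shown in this thread.
type trackedFile struct {
	RemotePath string // authoritative RelayFile object path
	LocalRel   string // decoded path relative to the checkout root
}

// buildLocalIndex walks the tracked state once and returns a map from local
// relative path to remote path. Later entries overwrite earlier ones here;
// real code needs a proper tiebreak when several objects decode to the same
// local path.
func buildLocalIndex(tracked []trackedFile) map[string]string {
	index := make(map[string]string, len(tracked))
	for _, f := range tracked {
		index[f.LocalRel] = f.RemotePath
	}
	return index
}

func main() {
	// Illustrative paths only.
	tracked := []trackedFile{
		{RemotePath: "remote/README.md@abc123", LocalRel: "README.md"},
		{RemotePath: "remote/cmd/main.go@abc123", LocalRel: "cmd/main.go"},
	}

	index := buildLocalIndex(tracked) // built once per scan: O(N)

	// Each scanned file is then an average O(1) lookup instead of a fresh sort.
	for _, rel := range []string{"README.md", "cmd/main.go", "untracked.txt"} {
		if remote, ok := index[rel]; ok {
			fmt.Printf("%s -> %s\n", rel, remote)
		} else {
			fmt.Printf("%s is not tracked\n", rel)
		}
	}
}
```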

Comment thread internal/mountsync/syncer.go
@github-actions

github-actions Bot commented May 27, 2026


Relayfile Eval Review

Run: .relayfile/evals/runs/2026-05-27T11-00-10-150Z-HEAD-provider
Mode: provider
Git SHA: cb1d04e

Passed: 4 | Needs human: 0 | Reviewable: 0 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

No reviewable human-review cases captured Relayfile output.

@khaliqgant

Member Author

Server-contract review (claude-2, #1250 stage-2 server owner) — ✅ adheres

Reviewed the client against the relayfile server contract it consumes. Contract-correct on all points:

  • Raw tar request: gzip=0 + pathPrefix/headSha asserted in the test; defensively handles BOTH application/x-tar and (legacy) gzip content-types. ✅
  • contentHash verification (the linchpin) — verifies each tar entry via hashBytes(data) vs the fs/tree contentHash. This is correct by construction: the server-side contentHash (cloud slice-1, content-hash.ts) was explicitly written to MIRROR this daemon hashBytes (SHA-256 hex of the raw decoded bytes), and the decoded github-working-tree tar contains those same raw bytes — so verification MATCHES rather than always-refetching. Holds across utf-8 and base64-stored files (both stored decoded in R2). ✅
  • Shared-write plane preserved — local paths decode to real checkout paths while tracked state keeps the authoritative RelayFile object path (remotePath) for writeback. So agent edits still flow back to RelayFile (the multi-agent constraint). ✅
  • Sentinel/cursor — prefers .relayfile/clone.json (meta.json fallback), eventsCursor accepted with aliases + a forward-scan fallback, sentinel-aligned cursor preserved (not latest-after-seed). ✅ (The forward-scan fallback is fine for small clone histories; the import stamping eventsCursor will eliminate it for large histories — tracked separately.)
  • Integrity guards — tar-verified-count must equal tree count (else error); snapshot-delete skipped on suspected partial/empty listing (preserves local state). ✅
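
For readers following the contentHash and integrity-guard points above, here is a minimal sketch of that style of check. It assumes, as described in this comment, that the hash is the plain SHA-256 hex digest of the raw bytes; `verifySeed` and its map-based inputs are hypothetical simplifications, not the daemon's actual signatures.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashBytes follows the description above: SHA-256 hex of the raw bytes.
// The real daemon function may differ in detail.
func hashBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifySeed is a hypothetical helper. expected maps a relative path to the
// contentHash reported by the tree listing; files maps a relative path to the
// bytes read from the tar. It fails on unexpected entries, hash mismatches,
// and when the verified count differs from the tree count.
func verifySeed(expected map[string]string, files map[string][]byte) error {
	verified := 0
	for rel, data := range files {
		want, ok := expected[rel]
		if !ok {
			return fmt.Errorf("tar contains unexpected entry %q", rel)
		}
		if got := hashBytes(data); got != want {
			return fmt.Errorf("hash mismatch for %q: got %s, want %s", rel, got, want)
		}
		verified++
	}
	if verified != len(expected) {
		return fmt.Errorf("verified %d files but tree lists %d", verified, len(expected))
	}
	return nil
}

func main() {
	expected := map[string]string{"README.md": hashBytes([]byte("hello\n"))}

	fmt.Println(verifySeed(expected, map[string][]byte{"README.md": []byte("hello\n")}))
	fmt.Println(verifySeed(expected, map[string][]byte{"README.md": []byte("tampered")}))
	fmt.Println(verifySeed(expected, map[string][]byte{}))
}
```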

No contract gaps. Composes with #1256 (server: raw-tar + gzip=0-coupled body ceiling) once both deploy + the consumer requests &gzip=0. LGTM from the contract side.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/mountsync/syncer.go`:
- Around line 5113-5135: The mapping can pick a stale object when state
temporarily contains both old and new SHAs; update
githubRemotePathForWorkingTreeRel to prefer a candidate whose remotePath suffix
matches the current head SHA (s.githubWorkingTree.HeadSHA) before falling back
to revision comparison: inside the loop over paths, when candidate==rel compute
isHeadCandidate := (s.githubWorkingTree != nil && s.githubWorkingTree.HeadSHA !=
"" && strings.HasSuffix(remotePath, "@"+s.githubWorkingTree.HeadSHA)); then
choose candidate if bestPath=="" or (isHeadCandidate && !bestIsHead) or
(isHeadCandidate==bestIsHead && revisionAdvances(bestRevision, revision)); track
bestIsHead alongside bestPath/bestRevision; ensure nil/empty HeadSHA is handled
(treat as non-head).
- Around line 2690-2695: parseGithubCloneManifest currently only recognizes
camelCase keys for cursor fields so legacy meta.json entries like events_cursor
or event_id are ignored; update parseGithubCloneManifest (and the read helper
used there) to accept snake_case variants ("events_cursor", "event_cursor",
"fs_events_cursor", "cursor", "event_id", etc.) when populating
githubCloneManifest.EventsCursor and EventID so the legacy manifest path seeded
by readGithubCloneManifest preserves the cursor; modify the read(...) call list
used to build githubCloneManifest (and any related key lookup logic) to include
the snake_case key names alongside the existing camelCase names.
- Around line 2789-2878: The tar verification loop currently allows duplicate
entries because it only validates membership against tree and final cardinality;
fix by tracking seen entries and rejecting duplicates: introduce a local map
(e.g., seenRel := map[string]struct{}{}) before the loop that iterates
tr.Next(), and immediately after computing rel (the cleaned header path) check
if rel is already in seenRel and if so return an error (e.g., "github tar seed
contains duplicate file %q"); otherwise add rel to seenRel and continue with the
existing processing (this will catch duplicate headers for the same file before
using tree[rel] or writing files such as in the blocks that reference
meta.RemotePath, safeLocalPath, writeFileAtomic, etc.).
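
The duplicate-entry guard described in the last inline comment can be sketched with the standard archive/tar reader. This is an illustration under assumptions: `countUniqueEntries` and `buildTar` are hypothetical helpers, and the real loop also performs tree membership checks and file writes that are omitted here.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
)

// countUniqueEntries reads a tar stream and rejects any regular file whose
// cleaned path has already been seen. It returns the number of unique files.
func countUniqueEntries(r io.Reader) (int, error) {
	seenRel := map[string]struct{}{}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return 0, fmt.Errorf("read tar: %w", err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // directories and other entry types are skipped in this sketch
		}
		rel := path.Clean(hdr.Name)
		if _, dup := seenRel[rel]; dup {
			return 0, fmt.Errorf("github tar seed contains duplicate file %q", rel)
		}
		seenRel[rel] = struct{}{}
	}
	return len(seenRel), nil
}

// buildTar creates a small in-memory tar for demonstration purposes.
func buildTar(names ...string) *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		body := []byte("x")
		_ = tw.WriteHeader(&tar.Header{Name: name, Mode: 0o644, Size: int64(len(body)), Typeflag: tar.TypeReg})
		_, _ = tw.Write(body)
	}
	_ = tw.Close()
	return &buf
}

func main() {
	fmt.Println(countUniqueEntries(buildTar("a.txt", "b.txt")))
	fmt.Println(countUniqueEntries(buildTar("a.txt", "./a.txt")))
}
```

Cleaning the header name before the lookup means `a.txt` and `./a.txt` count as the same file, which is the case a plain cardinality check would miss.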

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ada55aa7-36e1-44fe-a9dd-ac76eae86814

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae090c and ec396ef.

📒 Files selected for processing (3)
  • internal/mountsync/http_client_test.go
  • internal/mountsync/syncer.go
  • internal/mountsync/syncer_test.go

Comment thread internal/mountsync/syncer.go
Comment thread internal/mountsync/syncer.go
Comment thread internal/mountsync/syncer.go Outdated
@khaliqgant khaliqgant force-pushed the codex/issue-1250-mount-tar-seed branch from ec396ef to 919ddaa Compare May 27, 2026 10:58
@khaliqgant

Member Author

Addressed the review findings in 919ddaa:

  • precomputed the GitHub working-tree local-path index once per scan and use O(1) lookups during scanLocalFiles; single-event routing still builds one index per event
  • local-to-remote mapping now prefers the current HeadSHA object before revision tiebreaking, with a regression for old/new SHA coexistence
  • legacy clone manifests now accept snake_case cursor keys (events_cursor, event_cursor, fs_events_cursor, event_id)
  • tar seed verification now rejects duplicate entries, with regression coverage
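
As an illustration of the cursor-key handling in the list above, a manifest reader that accepts both spellings might look like the following. It is a sketch under assumptions: `cursorFromManifest` is a hypothetical name, only the key names mentioned in this thread are used, and the separate event_id field is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cursorKeys lists the cursor key spellings named in this thread, in
// preference order. The real parser may accept more aliases.
var cursorKeys = []string{"eventsCursor", "events_cursor", "event_cursor", "fs_events_cursor"}

// cursorFromManifest returns the first non-empty string cursor found under
// any accepted key, or "" when the manifest carries none.
func cursorFromManifest(raw []byte) (string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return "", fmt.Errorf("decode clone manifest: %w", err)
	}
	for _, key := range cursorKeys {
		value, ok := fields[key]
		if !ok {
			continue
		}
		var cursor string
		if err := json.Unmarshal(value, &cursor); err != nil {
			continue // not a string; try the next alias
		}
		if cursor != "" {
			return cursor, nil
		}
	}
	return "", nil
}

func main() {
	// Illustrative manifest bodies; the real clone.json has more fields.
	for _, manifest := range []string{
		`{"eventsCursor":"evt_10"}`,
		`{"events_cursor":"evt_42"}`,
		`{"headSha":"abc123"}`,
	} {
		cursor, err := cursorFromManifest([]byte(manifest))
		fmt.Printf("cursor=%q err=%v\n", cursor, err)
	}
}
```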

Validation: go test ./internal/mountsync ./cmd/relayfile-mount and git diff --check pass. All review threads are resolved.

@khaliqgant khaliqgant merged commit 67cd414 into main May 27, 2026
8 checks passed
@khaliqgant khaliqgant deleted the codex/issue-1250-mount-tar-seed branch May 27, 2026 11:02