gl clone: clean partial clone of repos with private subtrees#33
gl clone: clean partial clone of repos with private subtrees#33beardthelion wants to merge 5 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds backend helpers and a new REST endpoint that expose per-caller withheld and reinclude path globs, and a ChangesBackend Withheld Paths Support
CLI Clone Command with Partial Clone Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/visibility.rs`:
- Around line 204-212: The handler currently returns withheld metadata without
verifying the caller can read the repository root; before calling
crate::visibility::withheld_globs or returning the JSON, perform a root-read
gate by calling the existing visibility_check (or equivalent) with path "/"
using the same auth, rules, record.is_public, and record.owner_did; if
visibility_check(..., "/") denies access, return the repo-not-found/unauthorized
response used elsewhere (same error path as other root-read failures) instead of
returning the withheld data. Ensure this check is placed after fetching rules
(list_visibility_rules) but before invoking withheld_globs and constructing the
Json response.
In `@crates/gitlawb-node/src/visibility.rs`:
- Around line 114-119: The current withheld_globs filtering uses a single
representative probe (glob_prefix) and calls visibility_check, which
misclassifies a denied parent when a more-specific child rule allows access;
update the logic in withheld_globs (or change its API contract) to detect
allow-overrides by checking for any more-specific descendant allow before
emitting the parent glob: either (A) when
visibility_check(glob_prefix(&r.path_glob)) == Decision::Deny, scan rules for
any child globs under r.path_glob that yield Decision::Allow (using
visibility_check on those child probes) and skip emitting the parent if any
exist, or (B) change the return structure to emit tuples like (denying_glob,
allowed_exceptions) so the caller can re-include allowed descendants; reference
visibility_check, glob_prefix, Decision::Deny, and the withheld_globs helper to
locate where to implement this.
In `@crates/gl/src/clone.rs`:
- Around line 133-144: The current parsing uses split_once('/') which accepts
inputs like "owner/name/extra"; update the parsing logic in clone.rs (the block
that computes owner and name from stripped) to reject any repo string containing
more than one slash: after trimming trailing slashes and calling
split_once('/'), validate that neither owner nor name contains another '/' (or
alternately check that the count of '/' is exactly one) and bail! with the same
error message if extra slashes are present so malformed inputs like
"owner/name/extra" fail fast instead of producing an invalid repo identifier.
- Around line 147-171: fetch_withheld currently swallows all errors and returns
an empty Vec, which allows clones to proceed when the withheld-paths endpoint
actually failed; change fetch_withheld to return a Result<Vec<String>,
anyhow::Error> (or your crate's error type) instead of Vec<String>, and
propagate any network/HTTP/json errors to the caller except for an explicit
“endpoint unsupported” HTTP status (e.g. 404 Not Found or 501 Not Implemented)
where you should return Ok(Vec::new()). Concretely: update fetch_withheld
signature, stop using unwrap_or_default on resp.json(), treat Err(resp) and
non-success statuses as Err unless status is 404/501 (in which case return
Ok([])), and update callers of fetch_withheld to handle the Result; refer to
load_keypair_from_dir, NodeClient::get_signed/get and the path
"/api/v1/repos/{owner}/{name}/withheld-paths" to locate the code to change.
- Around line 112-121: The current logic in clone.rs extracts the default branch
by parsing stdout from `git remote show origin` (variables `out`, `text`,
`head`) and does not check `out.status.success()`, which breaks for localized
output or non-zero exit; replace this with a refs-based lookup by running `git
symbolic-ref --short refs/remotes/origin/HEAD` (or equivalent) and check the
child process exit status (`out.status.success()`), then read and trim stdout to
produce the branch name, returning an error if the command fails or stdout is
empty instead of relying on localized `HEAD branch:` parsing.
- Around line 99-105: The loop in setup_partial_clone that converts
withheld_globs currently always strips a trailing "**" and emits only a
directory-exclude like "!{dir}/", which fails to exclude the exact path when the
upstream sent "/prefix" (no "/**"). Change the logic inside the for g in
withheld_globs loop: detect whether g originally ended with "**" (or "/**"); if
it did keep the existing behavior (emit "!{dir}/"), but if it did not then emit
two exclude lines for that dir—one for the exact path ("!{dir}") and one for the
subtree ("!{dir}/")—so the sparse-checkout encoding matches visibility_check
semantics; update/add a unit/integration test that calls setup_partial_clone (or
the surrounding code) with a withheld glob like "/docs/private" to assert both
the exact path and subtree are withheld.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 43094733-35b0-4310-8cae-ee86146445f9
📒 Files selected for processing (5)
crates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
Three fixes from the PR Gitlawb#33 review: - withheld_paths now applies the whole-repo "/" read gate (returns repo-not-found when the caller cannot read the root), matching the git read endpoints. Without it the endpoint disclosed a private repo's existence and path layout to unauthorized callers. The withheld_globs doc already assumed this gate existed; now it does. - A nested allow under a denied parent (e.g. "/secret/public/**" allowed, "/secret/**" denied) was over-withheld: the client sparse-excluded the whole parent and hid paths the caller may read. The endpoint now also returns a "reinclude" list (allowed globs strictly under a denied one) and gl clone re-includes them in the sparse spec after the excludes. - Wildcard-free globs like "/docs/private" match both the exact path and a subtree (per glob_matches), but the client only emitted the subtree exclude. sparse_patterns now emits both "/docs/private" and "/docs/private/". Verified the exclude-then-reinclude sparse ordering checks out cleanly with real git, plus unit tests for reincluded_globs, the nested re-include, the exact-path exclude, and sparse_patterns.
…eld-path errors
split_once('/') accepted owner/name/extra, smuggling a path segment into
the repo name that then flowed into the API path and clone URL; reject it.
fetch_withheld swallowed every network/auth/5xx/JSON error into an empty
result, dropping to a stock clone that the node refuses once blobs are
withheld. Now only 404/501 (endpoint unsupported) fall back to empty; the
rest propagate so the real cause surfaces.
Pairs with #28 (subtree content withholding). That PR makes the node withhold private blob bytes from the served pack; the resulting pack is not closed under reachability, so a stock
git cloneis refused at fetch unless the user manually passes--filter. This adds the client-side piece so a non-reader gets a clean checkout with one command.What's here
GET /api/v1/repos/{owner}/{repo}/withheld-paths: returns only the path globs the (optionally authenticated) caller is denied. Not owner-gated and never exposes reader DIDs, unlikelist_visibility. Computed from the existingvisibility_checkvia a new purewithheld_globs.gl clone <gitlawb://owner/name | owner/name> [dir]: asks the node what's withheld for the caller, then clones as a promisor (git clone --filter=blob:none --no-checkout) and sparse-excludes those globs before checkout. Public files land, private paths are absent, the tree entries and SHAs stay visible, no error.A
connect-mode helper can't influence git's own clone logic, so the orchestration lives ingl clonerather than trying to make a baregit clone gitlawb://...work without flags (noted as a follow-up).Notes
withheld-pathsreturns[]andgl clonebehaves like a normal clone. It starts doing real work once feat(node): subtree content withholding (Phase 3) for #18 #28 lands.--filter=blob:noneis the exact client invocation feat(node): subtree content withholding (Phase 3) for #18 #28's node test already proved accepts the withholding pack.--no-conesparse mode is required for the negated!/dir/excludes.Test plan
cargo test -p gl -p gitlawb-node(new:withheld_globs,gl cloneorchestration against afile://bare, repo-arg parsing)/secretmode-B repo,gl cloneit and confirmpublic/present,secret/absent from the worktree, secret path + SHA still ingit ls-tree.Summary by CodeRabbit
New Features
Tests