feat(node): replication enforcement (Phase 2) for #18#34
Conversation
…al clone upload_pack_excluding emitted a v2 packfile section, but info_refs advertises v0, so real clients negotiated v0 and rejected the response with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape instead (NAK, then the pack via side-band-64k when offered). Add an end-to-end test that stands up info_refs + upload_pack_excluding and runs a real git partial clone, asserting the withheld blob's bytes never reach the client while its tree entry and SHA stay visible. A stock full clone cannot consume the pack (it is not closed under reachability, so fetch fails the connectivity check); a partial clone is required.
…tion choice Add a real-git test that partial-clones, pushes a new commit server-side, then fetches: the new object arrives and the withheld blob stays absent. This pins down that ignoring have/want negotiation (always sending a self-contained pack of all refs minus withheld, with NAK) is correct for both clone and fetch; the only cost is a fetch re-sends the full object set. Refactor the real-git tests onto a shared server harness and document the negotiation decision in code and in the plan's follow-ups.
Move the two blocking git shell-outs in the filtered upload-pack path off the async worker thread, matching the tokio::process / spawn_blocking usage already in this file: build_filtered_pack (rev-list + pack-objects) and withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large repo cannot stall the tokio runtime. Behavior is unchanged. Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2 packfile framing, which is the exact path that failed against a real client and was corrected to v0. The block now documents the shipped v0 contract. Drop a stray trailing code fence flagged by markdownlint (MD040). The speculative ls-tree timeout and the public/no-rules fast-path from the review are intentionally left out: the timeout guards against adversarial repos we do not yet host, and the fast-path is a micro-optimization not worth the extra branch right now.
kevincodex1 asked to keep the superpowers planning docs out of the repo. The Phase 3 plan was scaffolding for this change, not something the project needs to carry. Removing it leaves only the code and tests in the PR.
📝 WalkthroughWalkthroughThis PR implements visibility-aware blob withholding for Git read operations and replication. It adds a visibility_pack module to compute which blob OIDs must be withheld based on path-scoped visibility rules, modifies smart-HTTP to serve filtered packs, gates read access through upload-pack, and conditionally gates IPFS/Pinata/P2P dissemination during push operations based on public announce status. ChangesVisibility-aware blob withholding for Git read and replication
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 629-644: The match arm currently calls
crate::git::visibility_pack::withheld_blob_oids(...) directly on the async
worker (using disk_path, rules, record.is_public, &record.owner_did), which must
be moved into a blocking task; replace the direct call with
tokio::task::spawn_blocking(||
crate::git::visibility_pack::withheld_blob_oids(...)).await handling
(propagate/map the Result->Option the same way and keep the tracing::warn! on
errors) so the git ls-tree subprocess runs off the async runtime thread.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a00a4311-c564-4086-b45f-866546839dd1
📒 Files selected for processing (8)
.gitignorecrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/visibility.rs
| match &rules_opt { | ||
| Some(rules) if rules.is_empty() => Some(std::collections::HashSet::new()), | ||
| Some(rules) => crate::git::visibility_pack::withheld_blob_oids( | ||
| &disk_path, | ||
| rules, | ||
| record.is_public, | ||
| &record.owner_did, | ||
| None, | ||
| ) | ||
| .map_err(|e| { | ||
| tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push") | ||
| }) | ||
| .ok(), | ||
| None => None, | ||
| } | ||
| }; |
There was a problem hiding this comment.
Blocking git ls-tree calls on async worker thread.
withheld_blob_oids internally runs blocking git ls-tree -r for each ref via std::process::Command. In git_upload_pack (lines 406-414), this is correctly wrapped in spawn_blocking, but here it's called directly on the async worker, which can stall the tokio runtime for repos with many refs.
Proposed fix: wrap in spawn_blocking
let withheld: Option<std::collections::HashSet<String>> = if !announce {
None
} else {
match &rules_opt {
Some(rules) if rules.is_empty() => Some(std::collections::HashSet::new()),
- Some(rules) => crate::git::visibility_pack::withheld_blob_oids(
- &disk_path,
- rules,
- record.is_public,
- &record.owner_did,
- None,
- )
- .map_err(|e| {
- tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push")
- })
- .ok(),
+ Some(rules) => {
+ let path = disk_path.clone();
+ let rules = rules.clone();
+ let owner_did = record.owner_did.clone();
+ let is_public = record.is_public;
+ tokio::task::spawn_blocking(move || {
+ visibility_pack::withheld_blob_oids(&path, &rules, is_public, &owner_did, None)
+ })
+ .await
+ .map_err(|e| {
+ tracing::warn!(err = %e, "withheld_blob_oids task panicked; skipping replication")
+ })
+ .ok()
+ .and_then(|r| r.map_err(|e| {
+ tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push")
+ }).ok())
+ }
None => None,
}
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/repos.rs` around lines 629 - 644, The match arm
currently calls crate::git::visibility_pack::withheld_blob_oids(...) directly on
the async worker (using disk_path, rules, record.is_public, &record.owner_did),
which must be moved into a blocking task; replace the direct call with
tokio::task::spawn_blocking(||
crate::git::visibility_pack::withheld_blob_oids(...)).await handling
(propagate/map the Result->Option the same way and keep the tracing::warn! on
errors) so the git ls-tree subprocess runs off the async runtime thread.
Phase 2 of path-scoped visibility (#18): stop withheld content from leaving the origin node through replication, and stop fully-private repos from being announced to the network. Phase 1 (#25) gates the git read path and Phase 3 (#28) withholds blobs from served packs, but after a push three paths still copied objects off the node ignoring visibility: local IPFS pinning, Pinata pinning, and the gossip/peer-notify/Arweave announcements.
The whole thing reduces to one decision computed once per push in
git_receive_pack: can an anonymous caller read the repo root, and which blob OIDs are denied to the public. Awithheld: Option<HashSet<String>>drives both pin sites (Nonemeans the repo is private, so nothing replicates, not even commit and tree objects), and anannouncebool gates the network-facing announcements.What changes:
replicable_objectsfilter). For a private repo they pin nothing at all, so file names in tree objects and history in commit objects no longer reach public IPFS.Deferred on purpose, each cheap to add later off the same seam: peer partial-mirrors (peers currently fail closed on repos with withheld content), UCAN-delegated reader sets, and encrypted-at-rest replication of private blobs.
Depends on #28:
withheld_blob_oidslives on that branch. This PR is stacked on it, so until #28 merges the diff here will also show #28's commits. Rebase onto main once #28 lands.Test plan
cargo test -p gitlawb-node(100 pass),cargo clippy --all-targets -D warningsclean,cargo fmt --checkcleanreplicable_objectsfilter, anonymous-caller contract ofwithheld_blob_oids, and the announce gate across public / legacy-private / mode A / mode B/secret/**rule, confirm the secret blob is absent from IPFS/Pinata while public files and the commit/tree are presentSummary by CodeRabbit
Release Notes