
feat(node): replication enforcement (Phase 2) for #18 #34

Open
beardthelion wants to merge 18 commits into Gitlawb:main from beardthelion:feat/phase2-replication-enforcement

Conversation

@beardthelion
Contributor

@beardthelion beardthelion commented Jun 8, 2026

Phase 2 of path-scoped visibility (#18): stop withheld content from leaving the origin node through replication, and stop fully-private repos from being announced to the network. Phase 1 (#25) gates the git read path and Phase 3 (#28) withholds blobs from served packs, but after a push three paths still copied objects off the node ignoring visibility: local IPFS pinning, Pinata pinning, and the gossip/peer-notify/Arweave announcements.

The whole thing reduces to one decision computed once per push in git_receive_pack: can an anonymous caller read the repo root, and which blob OIDs are denied to the public. A withheld: Option<HashSet<String>> drives both pin sites (None means the repo is private, so nothing replicates, not even commit and tree objects), and an announce bool gates the network-facing announcements.
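The per-push decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in git_receive_pack: the `PushDecision` type, the `decide` function, and its closure parameter are hypothetical names, and the sketch assumes that a failure to compute the denied set only clears `withheld` (so nothing is pinned) without touching `announce`, which is how the snippet under review further down appears to behave.

```rust
use std::collections::HashSet;

/// Outcome of the per-push visibility decision (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct PushDecision {
    /// Whether gossip, peer-notify and Arweave announcements may go out.
    pub announce: bool,
    /// `None`: replicate nothing at all.
    /// `Some(set)`: replicate every object except the blob OIDs in `set`.
    pub withheld: Option<HashSet<String>>,
}

/// Compute the decision once per push.
///
/// `anon_can_read_root` is `Err` when visibility could not be determined.
/// `denied_blobs` computes the blob OIDs denied to the public; it is only
/// invoked when the repo root is readable anonymously.
pub fn decide<F>(anon_can_read_root: Result<bool, String>, denied_blobs: F) -> PushDecision
where
    F: FnOnce() -> Result<HashSet<String>, String>,
{
    // Fail closed: an error is treated the same as "not readable".
    let announce = matches!(anon_can_read_root, Ok(true));
    // A private repo, or a failed computation, yields `None` (pin nothing).
    let withheld = if announce { denied_blobs().ok() } else { None };
    PushDecision { announce, withheld }
}
```

Taking the denied-set computation as a closure keeps the expensive per-ref walk from running at all for private repos.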

What changes:

  • IPFS and Pinata pinning skip the withheld blob OIDs (via a small pure replicable_objects filter). For a private repo they pin nothing at all, so file names in tree objects and history in commit objects no longer reach public IPFS.
  • Gossip ref-update publish, the HTTP peer-notify fallback, and Arweave anchoring are suppressed for repos the public cannot read. Mode B repos (public with a private subtree) still announce, since their commit and tree SHAs are public.
  • Fail closed: if visibility can't be determined, the push replicates nothing.
  • The in-process GraphQL subscription broadcast and the local branch->CID write are left alone; they are owner-facing/local, not network leaks.
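As a rough sketch of the pinning behaviour in the list above: the PR names a pure `replicable_objects` filter, but its real signature is not shown here, so the signature below is an assumption, and `objects_to_pin` is a hypothetical helper added only to show how the `None` (private repo or undetermined visibility) case pins nothing.

```rust
use std::collections::HashSet;

/// Keep only the object IDs that are not in `withheld`, preserving input order.
/// (Assumed signature; the PR's actual filter may differ.)
pub fn replicable_objects(all: &[String], withheld: &HashSet<String>) -> Vec<String> {
    all.iter()
        .filter(|oid| !withheld.contains(*oid))
        .cloned()
        .collect()
}

/// Objects a pin site would upload for one push (hypothetical helper).
/// `None` means the repo is private or visibility is unknown: pin nothing,
/// not even commit and tree objects.
pub fn objects_to_pin(all: &[String], withheld: Option<&HashSet<String>>) -> Vec<String> {
    match withheld {
        Some(set) => replicable_objects(all, set),
        None => Vec::new(),
    }
}
```

For a mode B repo the withheld set would hold only the denied blob OIDs, so commits, trees, and public blobs still pass through the filter.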

Deferred on purpose, each cheap to add later off the same seam: peer partial-mirrors (peers currently fail closed on repos with withheld content), UCAN-delegated reader sets, and encrypted-at-rest replication of private blobs.

Depends on #28: withheld_blob_oids lives on that branch. This PR is stacked on it, so until #28 merges the diff here will also show #28's commits. Rebase onto main once #28 lands.

Test plan

  • cargo test -p gitlawb-node (100 pass), cargo clippy --all-targets -D warnings clean, cargo fmt --check clean
  • Unit coverage: replicable_objects filter, anonymous-caller contract of withheld_blob_oids, and the announce gate across public / legacy-private / mode A / mode B
  • Manual: push to a node with a mode B /secret/** rule, confirm the secret blob is absent from IPFS/Pinata while public files and the commit/tree are present
  • Manual: push to a fully-private repo, confirm no objects pinned and no gossip/peer-notify/Arweave anchor

Summary by CodeRabbit

Release Notes

  • New Features
    • Implemented selective content withholding based on repository visibility rules.
    • Enhanced Git pack generation to exclude restricted content during clone and fetch operations.
    • Updated replication and pinning services to respect visibility constraints, ensuring private content is not distributed to unauthorized users.
    • Improved access control enforcement across distributed storage systems.

…al clone

upload_pack_excluding emitted a v2 packfile section, but info_refs
advertises v0, so real clients negotiated v0 and rejected the response
with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape
instead (NAK, then the pack via side-band-64k when offered).

Add an end-to-end test that stands up info_refs + upload_pack_excluding
and runs a real git partial clone, asserting the withheld blob's bytes
never reach the client while its tree entry and SHA stay visible. A stock
full clone cannot consume the pack (it is not closed under reachability,
so fetch fails the connectivity check); a partial clone is required.
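The v0 stateless-rpc response shape described in this commit (NAK, then the pack, multiplexed on side-band channel 1 when the client offered side-band-64k) can be sketched as below. This is an illustrative reconstruction from the git pack protocol, not the code in upload_pack_excluding; `pkt_line` and `frame_v0_response` are hypothetical names, and the sketch ignores progress and error bands.

```rust
/// Largest pkt-line, including its 4-byte length header (git's limit).
const MAX_PKT_LEN: usize = 65520;
/// Pack bytes that fit in one side-band packet after the header and band byte.
const MAX_SIDEBAND_DATA: usize = MAX_PKT_LEN - 4 - 1;

/// Encode one pkt-line: 4 lowercase hex digits of total length, then the payload.
fn pkt_line(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("{:04x}", payload.len() + 4).into_bytes();
    out.extend_from_slice(payload);
    out
}

/// Frame a v0 upload-pack response body for a pack that ignores negotiation.
pub fn frame_v0_response(pack: &[u8], side_band_64k: bool) -> Vec<u8> {
    // No common commits are acknowledged, so the reply opens with NAK.
    let mut out = pkt_line(b"NAK\n");
    if side_band_64k {
        for chunk in pack.chunks(MAX_SIDEBAND_DATA) {
            let mut payload = Vec::with_capacity(chunk.len() + 1);
            payload.push(1u8); // band 1 carries pack data
            payload.extend_from_slice(chunk);
            out.extend_from_slice(&pkt_line(&payload));
        }
        out.extend_from_slice(b"0000"); // flush-pkt ends the side-band stream
    } else {
        // Without side-band the raw pack bytes follow the NAK directly.
        out.extend_from_slice(pack);
    }
    out
}
```

A v2 response would instead open with a `packfile` section header, which is what a client that negotiated v0 rejects with 'expected ACK/NAK, got packfile'.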
…tion choice

Add a real-git test that partial-clones, pushes a new commit server-side,
then fetches: the new object arrives and the withheld blob stays absent.
This pins down that ignoring have/want negotiation (always sending a
self-contained pack of all refs minus withheld, with NAK) is correct for
both clone and fetch; the only cost is a fetch re-sends the full object
set. Refactor the real-git tests onto a shared server harness and document
the negotiation decision in code and in the plan's follow-ups.

Move the two blocking git shell-outs in the filtered upload-pack path off
the async worker thread, matching the tokio::process / spawn_blocking usage
already in this file: build_filtered_pack (rev-list + pack-objects) and
withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large
repo cannot stall the tokio runtime. Behavior is unchanged.

Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2
packfile framing, which is the exact path that failed against a real client
and was corrected to v0. The block now documents the shipped v0 contract.
Drop a stray trailing code fence flagged by markdownlint (MD040).

The speculative ls-tree timeout and the public/no-rules fast-path from the
review are intentionally left out: the timeout guards against adversarial
repos we do not yet host, and the fast-path is a micro-optimization not worth
the extra branch right now.

kevincodex1 asked to keep the superpowers planning docs out of the repo. The
Phase 3 plan was scaffolding for this change, not something the project needs
to carry. Removing it leaves only the code and tests in the PR.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Walkthrough

This PR implements visibility-aware blob withholding for Git read operations and replication. It adds a visibility_pack module to compute which blob OIDs must be withheld based on path-scoped visibility rules, modifies smart-HTTP to serve filtered packs, gates read access through upload-pack, and conditionally gates IPFS/Pinata/P2P dissemination during push operations based on public announce status.

Changes

Visibility-aware blob withholding for Git read and replication

| Layer | File(s) | Summary |
|---|---|---|
| Visibility pack core logic | crates/gitlawb-node/src/git/visibility_pack.rs, crates/gitlawb-node/src/git/mod.rs | New module computes withheld blob OIDs by evaluating visibility rules per (blob, path) pair across all refs. Exports withheld_blob_oids to compute the withheld set and replicable_objects to filter object lists. Includes comprehensive unit tests across anonymous/reader/owner visibility scenarios. |
| Smart HTTP pack filtering and serving | crates/gitlawb-node/src/git/smart_http.rs | Adds build_filtered_pack helper that removes withheld OIDs from pack contents and upload_pack_excluding async handler that serves filtered packs with git protocol v0 framing and optional side-band-64k chunking. Includes unit tests validating filtered pack contents and end-to-end tests verifying git clone --filter=blob:none and incremental git fetch both exclude withheld blob bytes from the client object database. |
| Upload-pack endpoint integration | crates/gitlawb-node/src/api/repos.rs (upload-pack handler) | Integrates visibility_pack into git_upload_pack to conditionally route to upload_pack or upload_pack_excluding based on whether blobs must be withheld. Refines error classification for protocol failures and clarifies that subtree visibility rules do not gate advertisement (withholding occurs during pack build). |
| Receive-pack Phase 2 replication control | crates/gitlawb-node/src/api/repos.rs (receive-pack, pinning, dissemination handlers) | Implements Phase 2 replication enforcement: derives announce flag from visibility rules (whether public may read the repo), computes optional withheld blob set, and conditionally gates IPFS/Pinata pinning, P2P ref publishing, HTTP peer sync, and Arweave anchoring. When announce=false, suppresses all replication; when announce=true, pins with withheld set and performs dissemination. |
| Pinning API withheld set support | crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs | Updates IPFS and Pinata pin_new_objects function signatures to accept withheld: &HashSet<String> parameter. Both functions filter enumerated repository objects through replicable_objects(withheld) before pinning, enabling selective object pinning. |
| Validation test and configuration | crates/gitlawb-node/src/visibility.rs, .gitignore | Adds unit test validating announce-gate behavior: public reads are allowed exactly when visibility rules permit anonymous access to the whole-repo path. Updates .gitignore to exclude docs/superpowers/ directory. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • Gitlawb/node#18: This PR directly implements path-scoped private-read behavior via visibility_pack::withheld_blob_oids and replicable_objects, with gated upload-pack, IPFS/Pinata pinning, and P2P/HTTP dissemination based on public announce status.

Possibly related PRs

  • Gitlawb/node#25: This PR builds on Phase 1's path-scoped visibility model introduced in that PR by implementing blob withholding and replication gating using the visibility rules and visibility_check authorization logic.

Suggested reviewers

  • kevincodex1

Poem

🐰 With withheld blobs now tucked away from prying eyes,
Git packs are filtered, secrets safe from sharing skies.
Visibility rules guide each announce decision true—
P2P and Pinata both know what not to brew!
Phase 2 replication blooms where public reads are blessed.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(node): replication enforcement (Phase 2) for #18' directly addresses the main objective of the PR: implementing Phase 2 of replication enforcement to prevent withheld content from leaving the origin node. It is specific, follows conventional commit format, and clearly summarizes the primary change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 629-644: The match arm currently calls
crate::git::visibility_pack::withheld_blob_oids(...) directly on the async
worker (using disk_path, rules, record.is_public, &record.owner_did), which must
be moved into a blocking task; replace the direct call with
tokio::task::spawn_blocking(||
crate::git::visibility_pack::withheld_blob_oids(...)).await handling
(propagate/map the Result->Option the same way and keep the tracing::warn! on
errors) so the git ls-tree subprocess runs off the async runtime thread.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a00a4311-c564-4086-b45f-866546839dd1

📥 Commits

Reviewing files that changed from the base of the PR and between 6abaf1d and 949d131.

📒 Files selected for processing (8)
  • .gitignore
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/visibility.rs

Comment on lines +629 to +644
        match &rules_opt {
            Some(rules) if rules.is_empty() => Some(std::collections::HashSet::new()),
            Some(rules) => crate::git::visibility_pack::withheld_blob_oids(
                &disk_path,
                rules,
                record.is_public,
                &record.owner_did,
                None,
            )
            .map_err(|e| {
                tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push")
            })
            .ok(),
            None => None,
        }
    };

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Blocking git ls-tree calls on async worker thread.

withheld_blob_oids internally runs blocking git ls-tree -r for each ref via std::process::Command. In git_upload_pack (lines 406-414), this is correctly wrapped in spawn_blocking, but here it's called directly on the async worker, which can stall the tokio runtime for repos with many refs.

Proposed fix: wrap in spawn_blocking
     let withheld: Option<std::collections::HashSet<String>> = if !announce {
         None
     } else {
         match &rules_opt {
             Some(rules) if rules.is_empty() => Some(std::collections::HashSet::new()),
-            Some(rules) => crate::git::visibility_pack::withheld_blob_oids(
-                &disk_path,
-                rules,
-                record.is_public,
-                &record.owner_did,
-                None,
-            )
-            .map_err(|e| {
-                tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push")
-            })
-            .ok(),
+            Some(rules) => {
+                let path = disk_path.clone();
+                let rules = rules.clone();
+                let owner_did = record.owner_did.clone();
+                let is_public = record.is_public;
+                tokio::task::spawn_blocking(move || {
+                    visibility_pack::withheld_blob_oids(&path, &rules, is_public, &owner_did, None)
+                })
+                .await
+                .map_err(|e| {
+                    tracing::warn!(err = %e, "withheld_blob_oids task panicked; skipping replication")
+                })
+                .ok()
+                .and_then(|r| r.map_err(|e| {
+                    tracing::warn!(err = %e, "withheld_blob_oids failed; skipping replication for this push")
+                }).ok())
+            }
             None => None,
         }
     };
