
rvf-federation

License: MIT OR Apache-2.0 | Minimum Rust version: 1.87

Privacy-preserving federated transfer learning for the RVF format.

[dependencies]
rvf-federation = "0.1"

RuVector users independently accumulate learning patterns -- SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters. Today that learning is siloed. rvf-federation implements the inter-user federation layer defined in ADR-057: it strips PII, injects differential privacy noise, packages transferable learning as RVF segments, and merges incoming learning with formal privacy guarantees.

|  | rvf-federation | Siloed learning | Manual sharing |
|---|---|---|---|
| Privacy | 3-stage PII stripping + calibrated DP noise | N/A -- nothing leaves the machine | Trust the sender |
| Knowledge reuse | New users bootstrap from community priors | Every deployment starts cold | Copy-paste config files |
| Integrity | Witness chain + Ed25519/ML-DSA-65 signatures | N/A | No verification |
| Aggregation | FedAvg, FedProx, Byzantine-tolerant averaging | N/A | Manual merge |
| Privacy accounting | RDP composition with formal epsilon budget | N/A | N/A |

Quick Start

use rvf_federation::{
    ExportBuilder, DiffPrivacyEngine, FederationPolicy,
    TransferPriorSet, TransferPriorEntry, BetaParams,
};

// 1. Build an export from local learning
let priors = TransferPriorSet {
    source_domain: "code_review".into(),
    entries: vec![TransferPriorEntry {
        bucket_id: "medium_algorithm".into(),
        arm_id: "arm_0".into(),
        params: BetaParams::new(10.0, 5.0),
        observation_count: 50,
    }],
    cost_ema: 0.85,
};

// 2. Configure differential privacy (epsilon=1.0, delta=1e-5)
let mut dp = DiffPrivacyEngine::gaussian(1.0, 1e-5, 1.0, 1.0).unwrap();

// 3. Build: PII strip -> DP noise -> assemble manifest
let export = ExportBuilder::new("alice_pseudo".into(), "code_review".into())
    .with_policy(FederationPolicy::default())
    .add_priors(priors)
    .add_string_field("config_path".into(), "/home/alice/project/.config".into())
    .build(&mut dp)
    .unwrap();

assert_eq!(export.manifest.format_version, 1);
assert!(export.redaction_log.total_redactions >= 1); // PII was stripped
assert!(export.privacy_proof.epsilon > 0.0);         // DP noise was applied

Key Features

| Feature | What It Does | Why It Matters |
|---|---|---|
| PII stripping | 3-stage pipeline: detect, redact, attest | Detected personal data never leaves the local machine |
| Differential privacy | Gaussian/Laplace noise with RDP accounting | Formal mathematical privacy guarantee per export |
| Gradient clipping | Bound L2 norms before aggregation | Limits any single user's influence on the aggregate |
| FedAvg / FedProx | Federated averaging with optional proximal term | Industry-standard aggregation (FedAvg: McMahan et al. 2017) |
| Byzantine tolerance | Outlier detection by L2-norm z-score | Contributions with anomalous norms are excluded before averaging |
| Version-aware merging | Dampened confidence for cross-version imports | Older learning still helps, with reduced weight |
| Selective sharing | Allowlist/denylist for segments and domains | Users control exactly what they share |

Architecture

  Local Engine                                                             Remote
  +------------------+    +------------+    +---------+    +----------+
  | TransferPriors   |--->|            |--->|         |--->|          |
  | PolicyKernels    |    | PII Strip  |    | DP      |    | RVF      |    Registry
  | CostCurves       |    | (3-stage)  |    | Noise   |    | Export   |--->  (GCS)
  | LoRA Weights     |    |            |    |         |    | Builder  |       |
  +------------------+    +------------+    +---------+    +----------+       |
                                                                              v
  +------------------+    +------------+    +---------+    +----------+   +--------+
  | Merged Learning  |<---| Version-   |<---| Import  |<---| Validate |<--| Import |
  | (local engines)  |    | Aware      |    | Merger  |    | (sig +   |   | (pull) |
  |                  |    | Merge      |    |         |    | witness) |   +--------+
  +------------------+    +------------+    +---------+    +----------+

Modules

| Module | Description |
|---|---|
| types | Four new RVF segment payload types (0x33-0x36) plus federation data structures |
| error | 15 error variants covering privacy, validation, aggregation, and I/O failures |
| pii_strip | Three-stage PII stripping pipeline with 12 built-in detection rules |
| diff_privacy | Gaussian/Laplace noise engines, gradient clipping, RDP privacy accountant |
| federation | ExportBuilder and ImportMerger implementing the ADR-057 transfer protocol |
| aggregate | FederatedAggregator with FedAvg, FedProx, and Byzantine-tolerant strategies |
| policy | FederationPolicy for selective sharing with allowlists, denylists, and rate limits |

Segment Types

Four new RVF segment types extend the 0x30-0x32 domain expansion range:

| Code | Name | Purpose |
|---|---|---|
| 0x33 | FederatedManifest | Describes the export: contributor pseudonym, timestamp, included segments, privacy budget spent |
| 0x34 | DiffPrivacyProof | Privacy attestation: epsilon/delta, mechanism, sensitivity, clipping norm, noise scale |
| 0x35 | RedactionLog | PII stripping attestation: redaction counts by category, pre-redaction content hash, rules fired |
| 0x36 | AggregateWeights | Federated-averaged LoRA deltas with participation count, round number, confidence scores |

Readers that do not recognize these segment types skip them per the RVF forward-compatibility rule. Existing TransferPrior (0x30), PolicyKernel (0x31), CostCurve (0x32), Witness, and Crypto segments are reused as-is.
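
The skip-unknown rule can be sketched as a reader that classifies raw segment codes and drops anything it does not recognize. This is a hypothetical illustration: `FedSegmentKind`, `classify`, `known_segments` and the `(code, payload)` tuple representation are not the crate's API; only the code values 0x30-0x36 come from this README (Witness and Crypto codes are not listed here, so they are left out).

```rust
/// Hypothetical mirror of the segment codes listed above (not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FedSegmentKind {
    TransferPrior,     // 0x30
    PolicyKernel,      // 0x31
    CostCurve,         // 0x32
    FederatedManifest, // 0x33
    DiffPrivacyProof,  // 0x34
    RedactionLog,      // 0x35
    AggregateWeights,  // 0x36
}

/// Map a raw segment code to a known kind; `None` means "unknown to this reader".
fn classify(code: u8) -> Option<FedSegmentKind> {
    match code {
        0x30 => Some(FedSegmentKind::TransferPrior),
        0x31 => Some(FedSegmentKind::PolicyKernel),
        0x32 => Some(FedSegmentKind::CostCurve),
        0x33 => Some(FedSegmentKind::FederatedManifest),
        0x34 => Some(FedSegmentKind::DiffPrivacyProof),
        0x35 => Some(FedSegmentKind::RedactionLog),
        0x36 => Some(FedSegmentKind::AggregateWeights),
        _ => None,
    }
}

/// Keep the segments this reader understands and silently skip the rest,
/// as the forward-compatibility rule requires (no error on unknown codes).
fn known_segments<'a>(segments: &'a [(u8, Vec<u8>)]) -> Vec<(FedSegmentKind, &'a [u8])> {
    segments
        .iter()
        .filter_map(|(code, payload)| classify(*code).map(|kind| (kind, payload.as_slice())))
        .collect()
}
```

An older reader is simply one whose `classify` lacks the 0x33-0x36 arms: the federation segments then fall into the `None` branch and are skipped rather than rejected.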

PII Stripping Pipeline

PiiStripper runs a three-stage pipeline on every string field before it leaves the local machine.

Stage 1 -- Detection. Twelve built-in regex rules scan for:

  • Unix and Windows file paths (/home/user/..., C:\Users\...)
  • IPv4 and IPv6 addresses
  • Email addresses
  • API keys (sk-..., AKIA..., ghp_..., Bearer tokens)
  • Environment variable references ($HOME, %USERPROFILE%)
  • Usernames (@handle)

Custom rules can be registered with add_rule().

Stage 2 -- Redaction. Detected PII is replaced with deterministic pseudonyms (<PATH_1>, <IP_2>, <REDACTED_KEY>). The same original value always maps to the same pseudonym within a single export, preserving structural relationships without revealing content.

Stage 3 -- Attestation. A RedactionLog (0x35) segment is generated containing redaction counts by category, the SHAKE-256 hash of the pre-redaction content (a commitment to the scanned input that does not reveal it), and the rules that fired.

use rvf_federation::PiiStripper;

let mut stripper = PiiStripper::new();
let fields = vec![
    ("config", "/home/alice/project/.env"),
    ("server", "connecting to 10.0.0.1:8080"),
    ("note", "no pii here"),
];
let (redacted, log) = stripper.strip_fields(&fields);
assert_eq!(log.fields_scanned, 3);
assert!(log.total_redactions >= 2);
assert_eq!(redacted[2].1, "no pii here"); // clean fields pass through unchanged
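
The Stage 2 property (the same original value always maps to the same pseudonym within one export) comes down to a per-export lookup table. The std-only sketch below assumes Stage 1 has already produced each matched value and its category; `Pseudonymizer`, its methods and the category labels are hypothetical. It numbers every category, whereas the README shows API keys collapsing to a single `<REDACTED_KEY>`.

```rust
use std::collections::HashMap;

/// Hypothetical per-export pseudonym table. Create one per export so that
/// mappings never carry over from one export to the next.
struct Pseudonymizer {
    // (category, original value) -> placeholder already handed out
    assigned: HashMap<(String, String), String>,
    // next index per category, e.g. "PATH" -> 2 after two distinct paths
    counters: HashMap<String, usize>,
}

impl Pseudonymizer {
    fn new() -> Self {
        Self { assigned: HashMap::new(), counters: HashMap::new() }
    }

    /// Return the placeholder for `original`, minting `<CATEGORY_n>` on first sight.
    fn pseudonym(&mut self, category: &str, original: &str) -> String {
        let key = (category.to_string(), original.to_string());
        if let Some(existing) = self.assigned.get(&key) {
            return existing.clone(); // repeated value: reuse its placeholder
        }
        let n = self.counters.entry(category.to_string()).or_insert(0);
        *n += 1;
        let placeholder = format!("<{}_{}>", category, n);
        self.assigned.insert(key, placeholder.clone());
        placeholder
    }
}
```

Because equal inputs yield equal placeholders, a reader of the export can still tell that two fields referred to the same path or host without learning what it was.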

Differential Privacy

Noise Mechanisms

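| Mechanism | Privacy Model | Noise Distribution | Use Case |
|---|---|---|---|
| Gaussian | (epsilon, delta)-DP | N(0, sigma^2) where sigma = S * sqrt(2 ln(1.25/delta)) / epsilon | Default; tighter for large parameter counts |
| Laplace | Pure epsilon-DP | Laplace(0, S/epsilon) | Stronger guarantee; no delta term |

S is the sensitivity of the released values: the L2 sensitivity for the Gaussian mechanism and the L1 sensitivity for the Laplace mechanism.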
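
Both noise scales can be computed directly from the formulas in the table. A minimal sketch, assuming S is already known (for example, fixed by the clipping bound); the function names are illustrative, not the crate's API.

```rust
/// Gaussian mechanism scale: sigma = S * sqrt(2 ln(1.25 / delta)) / epsilon.
fn gaussian_sigma(sensitivity: f64, epsilon: f64, delta: f64) -> f64 {
    assert!(epsilon > 0.0, "epsilon must be > 0");
    assert!(delta > 0.0 && delta < 1.0, "delta must lie in (0, 1)");
    sensitivity * (2.0 * (1.25 / delta).ln()).sqrt() / epsilon
}

/// Laplace mechanism scale: b = S / epsilon (pure epsilon-DP, no delta).
fn laplace_scale(sensitivity: f64, epsilon: f64) -> f64 {
    assert!(epsilon > 0.0, "epsilon must be > 0");
    sensitivity / epsilon
}
```

With epsilon = 1.0 and delta = 1e-5 as in the Quick Start, and assuming S = 1, `gaussian_sigma` gives roughly 4.84. Note that the textbook proof of this Gaussian calibration (Dwork and Roth) assumes epsilon < 1.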

Gradient Clipping

Before noise injection, all parameter vectors are clipped to a configurable L2 norm bound. This limits the sensitivity of the aggregation to any single user's contribution.
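
Per-vector L2 clipping rescales a vector only when its norm exceeds the bound, so afterwards every contribution has norm at most the bound, which is what caps S. A std-only sketch; the crate's own clipping routine may differ in signature and in whether it clips per vector or per parameter group.

```rust
/// Scale `v` in place so that its L2 norm is at most `max_norm`.
/// Vectors already inside the bound are left unchanged.
/// Returns the norm measured before clipping.
fn clip_l2(v: &mut [f64], max_norm: f64) -> f64 {
    assert!(max_norm > 0.0, "clipping bound must be positive");
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm > max_norm {
        // Uniform rescaling keeps the direction and shrinks only the magnitude.
        let scale = max_norm / norm;
        for x in v.iter_mut() {
            *x *= scale;
        }
    }
    norm
}
```

For example, `[3.0, 4.0]` (norm 5) clipped to a bound of 1.0 becomes `[0.6, 0.8]`.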

Privacy Accountant

PrivacyAccountant tracks cumulative privacy loss using Rényi Differential Privacy (RDP) composition across 16 alpha orders. RDP composition is tighter than naive (epsilon, delta)-DP composition, so more exports fit within the same budget.

use rvf_federation::PrivacyAccountant;

let mut accountant = PrivacyAccountant::new(10.0, 1e-5); // budget: eps=10, delta=1e-5
accountant.record_gaussian(1.0, 1.0, 1e-5, 100);
assert!(accountant.remaining_budget() > 0.0);
assert!(!accountant.is_exhausted());
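
The idea behind RDP accounting can be sketched for the plain (non-subsampled) Gaussian mechanism. At order alpha, one release with noise multiplier z = sigma / S costs alpha / (2 z^2). Costs add across releases, and the total converts to (epsilon, delta)-DP via epsilon = min over alpha of rdp(alpha) + ln(1/delta) / (alpha - 1) (Mironov's conversion). `RdpLedger`, `add_gaussian` and the particular list of orders are hypothetical. The README says the crate tracks 16 orders but does not say which ones, or whether it accounts for subsampling.

```rust
/// Illustrative Renyi orders (alpha > 1); not necessarily the crate's 16 orders.
const ORDERS: [f64; 16] = [
    1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 12.0, 16.0, 20.0, 32.0, 64.0,
];

/// Accumulated RDP epsilon at each order in ORDERS.
struct RdpLedger {
    rdp: [f64; 16],
}

impl RdpLedger {
    fn new() -> Self {
        Self { rdp: [0.0; 16] }
    }

    /// Record `steps` Gaussian releases with noise multiplier z = sigma / sensitivity.
    /// Each release adds alpha / (2 z^2) at order alpha; RDP composes additively.
    fn add_gaussian(&mut self, noise_multiplier: f64, steps: u32) {
        let z2 = noise_multiplier * noise_multiplier;
        for (acc, &alpha) in self.rdp.iter_mut().zip(ORDERS.iter()) {
            *acc += steps as f64 * alpha / (2.0 * z2);
        }
    }

    /// Tightest (epsilon, delta)-DP bound over the tracked orders.
    fn epsilon(&self, delta: f64) -> f64 {
        self.rdp
            .iter()
            .zip(ORDERS.iter())
            .map(|(&r, &alpha)| r + (1.0 / delta).ln() / (alpha - 1.0))
            .fold(f64::INFINITY, f64::min)
    }
}
```

With one release at z = 1 and delta = 1e-5, the minimum over these orders falls at alpha = 6 and gives epsilon of about 5.30. Tracking several orders and taking the minimum is what makes the bound tighter than summing per-release (epsilon, delta) values.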

Federation Strategies

| Strategy | Algorithm | Weighting | When to Use |
|---|---|---|---|
| FedAvg | Federated Averaging (McMahan et al.) | Trajectory count | Default; most scenarios |
| FedProx | Proximal regularization | Trajectory count + mu penalty | Heterogeneous data distributions |
| WeightedAverage | Simple weighted mean | Quality/reputation score | When contributor reputation varies widely |
| Byzantine detection | L2-norm z-score filtering | Outliers > 2 std removed | Always runs before aggregation |

use rvf_federation::{FederatedAggregator, AggregationStrategy};
use rvf_federation::aggregate::Contribution;

let mut agg = FederatedAggregator::new("code_review".into(), AggregationStrategy::FedAvg)
    .with_min_contributions(2)
    .with_byzantine_threshold(2.0);

agg.add_contribution(Contribution {
    contributor: "alice".into(),
    weights: vec![1.0, 2.0, 3.0],
    quality_weight: 0.9,
    trajectory_count: 100,
});
agg.add_contribution(Contribution {
    contributor: "bob".into(),
    weights: vec![1.2, 1.8, 3.1],
    quality_weight: 0.85,
    trajectory_count: 80,
});

let result = agg.aggregate().unwrap();
assert_eq!(result.participation_count, 2);
assert_eq!(result.lora_deltas.len(), 3);
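
The two steps in the strategy table (z-score filtering on L2 norms, then FedAvg weighted by trajectory count) can be sketched with std only. `Update`, `filter_outliers` and `fed_avg` are hypothetical names. This sketch uses the population standard deviation, and the crate's exact statistic is not stated. FedProx and WeightedAverage are not shown.

```rust
/// Hypothetical contribution record: a weight vector plus its trajectory count.
struct Update {
    weights: Vec<f64>,
    trajectory_count: u32,
}

fn l2_norm(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Drop updates whose L2 norm has |z-score| > threshold (population std).
fn filter_outliers(updates: Vec<Update>, threshold: f64) -> Vec<Update> {
    let norms: Vec<f64> = updates.iter().map(|u| l2_norm(&u.weights)).collect();
    if norms.len() < 2 {
        return updates;
    }
    let n = norms.len() as f64;
    let mean = norms.iter().sum::<f64>() / n;
    let std = (norms.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();
    if std == 0.0 {
        return updates; // all norms identical: nothing to flag
    }
    updates
        .into_iter()
        .zip(norms)
        .filter(|(_, norm)| ((norm - mean) / std).abs() <= threshold)
        .map(|(u, _)| u)
        .collect()
}

/// FedAvg: coordinate-wise mean of the weight vectors, each weighted by its trajectory count.
/// Returns None for an empty input, a zero total weight, or mismatched dimensions.
fn fed_avg(updates: &[Update]) -> Option<Vec<f64>> {
    let dim = updates.first()?.weights.len();
    let total: f64 = updates.iter().map(|u| u.trajectory_count as f64).sum();
    if total == 0.0 || updates.iter().any(|u| u.weights.len() != dim) {
        return None;
    }
    let mut avg = vec![0.0; dim];
    for u in updates {
        let w = u.trajectory_count as f64 / total;
        for (a, x) in avg.iter_mut().zip(&u.weights) {
            *a += w * x;
        }
    }
    Some(avg)
}
```

One design consequence is worth knowing. With n contributions, a population-std z-score can never exceed sqrt(n - 1). With the two contributors in the example above, no score exceeds 1, so a threshold of 2.0 flags nothing. In this sketch the filter can only act once there are six or more contributions.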

Performance Benchmarks

Measured on an AMD64 Linux system with Criterion.

| Benchmark | Time |
|---|---|
| PII detect (single string) | 756 ns |
| PII strip (10 fields) | 44 µs |
| PII strip (100 fields) | 303 µs |
| Gaussian noise (100 params) | 4.7 µs |
| Gaussian noise (10k params) | 334 µs |
| Gradient clipping (1k params) | 487 ns |
| Privacy accountant (100 rounds) | 1.0 µs |
| FedAvg (10 contrib, 100 dim) | 3.9 µs |
| FedAvg (100 contrib, 1k dim) | 365 µs |
| Byzantine detection (50 contrib) | 12 µs |
| Full export pipeline | 1.2 ms |
| Merge 100 priors | 28 µs |

Feature Flags

| Flag | Default | What It Enables |
|---|---|---|
| std | Yes | Standard library support (required) |
| serde | No | Derive Serialize/Deserialize on all public types |

[dependencies]
rvf-federation = { version = "0.1", features = ["serde"] }

API Overview

Core Types

| Type | Description |
|---|---|
| FederatedManifest | Export metadata: contributor pseudonym, domain, timestamp, privacy budget spent |
| DiffPrivacyProof | Privacy attestation: epsilon, delta, mechanism, sensitivity, noise scale |
| RedactionLog | PII stripping attestation: entries by category, pre-redaction hash, field count |
| AggregateWeights | Federated-averaged LoRA deltas with round number, participation count, confidences |
| BetaParams | Beta distribution parameters for Thompson Sampling priors (merge, dampen, mean) |
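
BetaParams lists merge, dampen and mean as operations, and version-aware merging is described as dampened confidence for cross-version imports. The sketch below shows one plausible reading of those operations. The struct `Beta`, the uniform Beta(1, 1) base prior and the exact dampening formula are all assumptions, not the crate's implementation.

```rust
/// Illustrative stand-in for BetaParams; the method semantics are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Beta {
    alpha: f64,
    beta: f64,
}

impl Beta {
    /// Expected success probability of the arm: alpha / (alpha + beta).
    fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }

    /// Shrink evidence toward the uniform Beta(1, 1) prior by `factor` in [0, 1].
    /// factor = 1 keeps the prior unchanged; factor = 0 discards all evidence.
    fn dampen(&self, factor: f64) -> Beta {
        let f = factor.clamp(0.0, 1.0);
        Beta {
            alpha: 1.0 + (self.alpha - 1.0) * f,
            beta: 1.0 + (self.beta - 1.0) * f,
        }
    }

    /// Pool evidence from two posteriors that both started from Beta(1, 1):
    /// add the pseudo-counts, counting the shared base prior only once.
    fn merge(&self, other: &Beta) -> Beta {
        Beta {
            alpha: self.alpha + other.alpha - 1.0,
            beta: self.beta + other.beta - 1.0,
        }
    }
}
```

Under this reading, an import from an older format version would be dampened before merging. It still shifts the local prior, but with fewer effective observations, so its mean moves toward 0.5 and its variance widens.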

Transfer Types

| Type | Description |
|---|---|
| TransferPriorEntry | Single context bucket prior: bucket ID, arm ID, Beta params, observation count |
| TransferPriorSet | Collection of priors from a trained domain with cost EMA |
| PolicyKernelSnapshot | Snapshot of tunable policy knob values with fitness score |
| CostCurveSnapshot | Ordered (step, cost) points with acceleration factor |

Aggregation Types

| Type | Description |
|---|---|
| FederatedAggregator | Aggregation server: collects contributions, detects outliers, produces AggregateWeights |
| AggregationStrategy | FedAvg, FedProx { mu }, or WeightedAverage |
| Contribution | Single participant's weight vector with quality and trajectory metadata |

Protocol Types

| Type | Description |
|---|---|
| ExportBuilder | Builder pattern: add priors/kernels/weights, PII-strip, DP-noise, produce FederatedExport |
| ImportMerger | Validate imports, merge priors with version-aware dampening, merge weights |
| FederatedExport | Completed export: manifest + redaction log + privacy proof + learning data |
| FederationPolicy | Selective sharing: allowlists, denylists, quality gate, rate limit, privacy budget |
| PiiStripper | Three-stage PII pipeline: detect, redact, attest |
| DiffPrivacyEngine | Noise injection with Gaussian or Laplace mechanism and gradient clipping |
| PrivacyAccountant | RDP-based cumulative privacy loss tracker |
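
The kind of gate FederationPolicy describes (domain allow/deny lists, a quality floor, a rate limit) can be sketched as a single check run before an export is built. Everything here is hypothetical: `SharingPolicy`, `Decision`, the field names, and the precedence rules (the denylist always wins, and an empty allowlist allows every domain that is not denied). The privacy-budget check is left to an accountant and omitted. The outcomes loosely mirror the QualityBelowThreshold and RateLimited error variants.

```rust
use std::collections::HashSet;

/// Hypothetical policy gate; field names and precedence are assumptions.
struct SharingPolicy {
    allow_domains: HashSet<String>, // empty = allow every domain not denied
    deny_domains: HashSet<String>,  // denylist always takes precedence
    min_quality: f64,
    max_exports_per_hour: u32,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    DomainBlocked,
    QualityTooLow,
    RateLimited,
}

impl SharingPolicy {
    /// Evaluate one prospective export; checks run from cheapest to most stateful.
    fn check(&self, domain: &str, quality: f64, exports_this_hour: u32) -> Decision {
        let not_allowlisted =
            !self.allow_domains.is_empty() && !self.allow_domains.contains(domain);
        if self.deny_domains.contains(domain) || not_allowlisted {
            return Decision::DomainBlocked;
        }
        if quality < self.min_quality {
            return Decision::QualityTooLow;
        }
        if exports_this_hour >= self.max_exports_per_hour {
            return Decision::RateLimited;
        }
        Decision::Allowed
    }
}
```

Letting the denylist override the allowlist means a domain that is accidentally on both lists is never shared, which is the safer failure mode for a privacy control.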

Error Types

FederationError covers 15 variants:

| Variant | Trigger |
|---|---|
| PrivacyBudgetExhausted | Cumulative epsilon exceeds limit |
| InvalidEpsilon | Epsilon <= 0 |
| InvalidDelta | Delta outside (0, 1) |
| SegmentValidation | Malformed segment data |
| VersionMismatch | Incompatible format version |
| SignatureVerification | Ed25519/ML-DSA-65 signature check failed |
| WitnessChainBroken | Witness chain has a gap or tampered entry |
| InsufficientObservations | Prior has too few observations for export |
| QualityBelowThreshold | Trajectory quality below policy minimum |
| RateLimited | Export rate limit exceeded |
| PiiLeakDetected | PII found after stripping (defense-in-depth) |
| ByzantineOutlier | Contribution flagged as adversarial |
| InsufficientContributions | Not enough participants for aggregation round |
| Serialization | Encoding/decoding failure |
| Io | I/O operation failure |

Related Crates

| Crate | Relationship |
|---|---|
| rvf-types | Core RVF segment definitions; rvf-federation defines its own payload types to avoid circular dependencies |
| ruvector-domain-expansion | Source of TransferPrior, PolicyKernel, CostCurve; federation exports these as RVF segments |
| sona | SONA learning engine; its FederatedCoordinator handles intra-deployment aggregation, while rvf-federation handles inter-user federation |
| rvf-crypto | Ed25519 signatures and SHAKE-256 hashing used for witness chains and segment integrity |

Testing

54 tests across all modules:

cargo test -p rvf-federation

Benchmarks:

cargo bench -p rvf-federation

License

MIT OR Apache-2.0


Part of RuVector -- the self-learning vector database.