Privacy-preserving federated transfer learning for the RVF format.
rvf-federation = "0.1"RuVector users independently accumulate learning patterns -- SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters. Today that learning is siloed. rvf-federation implements the inter-user federation layer defined in ADR-057: it strips PII, injects differential privacy noise, packages transferable learning as RVF segments, and merges incoming learning with formal privacy guarantees.
| rvf-federation | Siloed learning | Manual sharing | |
|---|---|---|---|
| Privacy | 3-stage PII stripping + calibrated DP noise | N/A -- nothing leaves the machine | Trust the sender |
| Knowledge reuse | New users bootstrap from community priors | Every deployment starts cold | Copy-paste config files |
| Integrity | Witness chain + Ed25519/ML-DSA-65 signatures | N/A | No verification |
| Aggregation | FedAvg, FedProx, Byzantine-tolerant averaging | N/A | Manual merge |
| Privacy accounting | RDP composition with formal epsilon budget | N/A | N/A |
use rvf_federation::{
ExportBuilder, DiffPrivacyEngine, FederationPolicy,
TransferPriorSet, TransferPriorEntry, BetaParams,
};
// 1. Build an export from local learning
let priors = TransferPriorSet {
source_domain: "code_review".into(),
entries: vec![TransferPriorEntry {
bucket_id: "medium_algorithm".into(),
arm_id: "arm_0".into(),
params: BetaParams::new(10.0, 5.0),
observation_count: 50,
}],
cost_ema: 0.85,
};
// 2. Configure differential privacy (epsilon=1.0, delta=1e-5)
let mut dp = DiffPrivacyEngine::gaussian(1.0, 1e-5, 1.0, 1.0).unwrap();
// 3. Build: PII strip -> DP noise -> assemble manifest
let export = ExportBuilder::new("alice_pseudo".into(), "code_review".into())
.with_policy(FederationPolicy::default())
.add_priors(priors)
.add_string_field("config_path".into(), "/home/alice/project/.config".into())
.build(&mut dp)
.unwrap();
assert_eq!(export.manifest.format_version, 1);
assert!(export.redaction_log.total_redactions >= 1); // PII was stripped
assert!(export.privacy_proof.epsilon > 0.0); // DP noise was applied| Feature | What It Does | Why It Matters |
|---|---|---|
| PII stripping | 3-stage pipeline: detect, redact, attest | No personal data leaves the local machine |
| Differential privacy | Gaussian/Laplace noise with RDP accounting | Formal mathematical privacy guarantee per export |
| Gradient clipping | Bound L2 norms before aggregation | Limits any single user's influence on the aggregate |
| FedAvg / FedProx | Federated averaging with optional proximal term | Industry-standard aggregation (McMahan et al. 2017) |
| Byzantine tolerance | Outlier detection by L2-norm z-score | Malicious contributions are excluded automatically |
| Version-aware merging | Dampened confidence for cross-version imports | Older learning still helps, with reduced weight |
| Selective sharing | Allowlist/denylist for segments and domains | Users control exactly what they share |
Local Engine Remote
+------------------+ +------------+ +---------+ +----------+
| TransferPriors |--->| |--->| |---->| |
| PolicyKernels | | PII Strip | | DP | | RVF | Registry
| CostCurves | | (3-stage) | | Noise | | Export |----> (GCS)
| LoRA Weights | | | | | | Builder | |
+------------------+ +------------+ +---------+ +----------+ |
v
+------------------+ +------------+ +---------+ +----------+ +--------+
| Merged Learning |<---| Version- |<---| Import |<----| Validate |<-| Import |
| (local engines) | | Aware | | Merger | | (sig + | | (pull) |
| | | Merge | | | | witness) | +--------+
+------------------+ +------------+ +---------+ +----------+
| Module | Description |
|---|---|
types |
Four new RVF segment payload types (0x33-0x36) plus federation data structures |
error |
15 error variants covering privacy, validation, aggregation, and I/O failures |
pii_strip |
Three-stage PII stripping pipeline with 12 built-in detection rules |
diff_privacy |
Gaussian/Laplace noise engines, gradient clipping, RDP privacy accountant |
federation |
ExportBuilder and ImportMerger implementing the ADR-057 transfer protocol |
aggregate |
FederatedAggregator with FedAvg, FedProx, and Byzantine-tolerant strategies |
policy |
FederationPolicy for selective sharing with allowlists, denylists, and rate limits |
Four new RVF segment types extend the 0x30-0x32 domain expansion range:
| Code | Name | Purpose |
|---|---|---|
0x33 |
FederatedManifest |
Describes the export: contributor pseudonym, timestamp, included segments, privacy budget spent |
0x34 |
DiffPrivacyProof |
Privacy attestation: epsilon/delta, mechanism, sensitivity, clipping norm, noise scale |
0x35 |
RedactionLog |
PII stripping attestation: redaction counts by category, pre-redaction content hash, rules fired |
0x36 |
AggregateWeights |
Federated-averaged LoRA deltas with participation count, round number, confidence scores |
Readers that do not recognize these segment types skip them per the RVF forward-compatibility rule. Existing TransferPrior (0x30), PolicyKernel (0x31), CostCurve (0x32), Witness, and Crypto segments are reused as-is.
PiiStripper runs a three-stage pipeline on every string field before it leaves the local machine.
Stage 1 -- Detection. Twelve built-in regex rules scan for:
- Unix and Windows file paths (
/home/user/...,C:\Users\...) - IPv4 and IPv6 addresses
- Email addresses
- API keys (
sk-...,AKIA...,ghp_..., Bearer tokens) - Environment variable references (
$HOME,%USERPROFILE%) - Usernames (
@handle)
Custom rules can be registered with add_rule().
Stage 2 -- Redaction. Detected PII is replaced with deterministic pseudonyms (<PATH_1>, <IP_2>, <REDACTED_KEY>). The same original value always maps to the same pseudonym within a single export, preserving structural relationships without revealing content.
Stage 3 -- Attestation. A RedactionLog (0x35) segment is generated containing redaction counts by category, the SHAKE-256 hash of the pre-redaction content (proves scanning happened without revealing it), and the rules that fired.
use rvf_federation::PiiStripper;
let mut stripper = PiiStripper::new();
let fields = vec![
("config", "/home/alice/project/.env"),
("server", "connecting to 10.0.0.1:8080"),
("note", "no pii here"),
];
let (redacted, log) = stripper.strip_fields(&fields);
assert_eq!(log.fields_scanned, 3);
assert!(log.total_redactions >= 2);
assert!(redacted[2].1 == "no pii here"); // clean fields pass through| Mechanism | Privacy Model | Noise Distribution | Use Case |
|---|---|---|---|
| Gaussian | (epsilon, delta)-DP | N(0, sigma^2) where sigma = S * sqrt(2 ln(1.25/delta)) / epsilon | Default; tighter for large parameter counts |
| Laplace | Pure epsilon-DP | Laplace(0, S/epsilon) | Stronger guarantee; no delta term |
Before noise injection, all parameter vectors are clipped to a configurable L2 norm bound. This limits the sensitivity of the aggregation to any single user's contribution.
PrivacyAccountant tracks cumulative privacy loss using Renyi Differential Privacy (RDP) composition across 16 alpha orders. RDP composition is tighter than naive (epsilon, delta)-DP composition, meaning more exports fit within the same budget.
use rvf_federation::PrivacyAccountant;
let mut accountant = PrivacyAccountant::new(10.0, 1e-5); // budget: eps=10, delta=1e-5
accountant.record_gaussian(1.0, 1.0, 1e-5, 100);
assert!(accountant.remaining_budget() > 0.0);
assert!(!accountant.is_exhausted());| Strategy | Algorithm | Weighting | When to Use |
|---|---|---|---|
FedAvg |
Federated Averaging (McMahan et al.) | Trajectory count | Default; most scenarios |
FedProx |
Proximal regularization | Trajectory count + mu penalty | Heterogeneous data distributions |
WeightedAverage |
Simple weighted mean | Quality/reputation score | When contributor reputation varies widely |
| Byzantine detection | L2-norm z-score filtering | Outliers > 2 std removed | Always runs before aggregation |
use rvf_federation::{FederatedAggregator, AggregationStrategy};
use rvf_federation::aggregate::Contribution;
let mut agg = FederatedAggregator::new("code_review".into(), AggregationStrategy::FedAvg)
.with_min_contributions(2)
.with_byzantine_threshold(2.0);
agg.add_contribution(Contribution {
contributor: "alice".into(),
weights: vec![1.0, 2.0, 3.0],
quality_weight: 0.9,
trajectory_count: 100,
});
agg.add_contribution(Contribution {
contributor: "bob".into(),
weights: vec![1.2, 1.8, 3.1],
quality_weight: 0.85,
trajectory_count: 80,
});
let result = agg.aggregate().unwrap();
assert_eq!(result.participation_count, 2);
assert_eq!(result.lora_deltas.len(), 3);Measured on an AMD64 Linux system with Criterion.
| Benchmark | Time |
|---|---|
| PII detect (single string) | 756 ns |
| PII strip (10 fields) | 44 us |
| PII strip (100 fields) | 303 us |
| Gaussian noise (100 params) | 4.7 us |
| Gaussian noise (10k params) | 334 us |
| Gradient clipping (1k params) | 487 ns |
| Privacy accountant (100 rounds) | 1.0 us |
| FedAvg (10 contrib, 100 dim) | 3.9 us |
| FedAvg (100 contrib, 1k dim) | 365 us |
| Byzantine detection (50 contrib) | 12 us |
| Full export pipeline | 1.2 ms |
| Merge 100 priors | 28 us |
| Flag | Default | What It Enables |
|---|---|---|
std |
Yes | Standard library support (required) |
serde |
No | Derive Serialize/Deserialize on all public types |
[dependencies]
rvf-federation = { version = "0.1", features = ["serde"] }| Type | Description |
|---|---|
FederatedManifest |
Export metadata: contributor pseudonym, domain, timestamp, privacy budget spent |
DiffPrivacyProof |
Privacy attestation: epsilon, delta, mechanism, sensitivity, noise scale |
RedactionLog |
PII stripping attestation: entries by category, pre-redaction hash, field count |
AggregateWeights |
Federated-averaged LoRA deltas with round number, participation count, confidences |
BetaParams |
Beta distribution parameters for Thompson Sampling priors (merge, dampen, mean) |
| Type | Description |
|---|---|
TransferPriorEntry |
Single context bucket prior: bucket ID, arm ID, Beta params, observation count |
TransferPriorSet |
Collection of priors from a trained domain with cost EMA |
PolicyKernelSnapshot |
Snapshot of tunable policy knob values with fitness score |
CostCurveSnapshot |
Ordered (step, cost) points with acceleration factor |
| Type | Description |
|---|---|
FederatedAggregator |
Aggregation server: collects contributions, detects outliers, produces AggregateWeights |
AggregationStrategy |
FedAvg, FedProx { mu }, or WeightedAverage |
Contribution |
Single participant's weight vector with quality and trajectory metadata |
| Type | Description |
|---|---|
ExportBuilder |
Builder pattern: add priors/kernels/weights, PII-strip, DP-noise, produce FederatedExport |
ImportMerger |
Validate imports, merge priors with version-aware dampening, merge weights |
FederatedExport |
Completed export: manifest + redaction log + privacy proof + learning data |
FederationPolicy |
Selective sharing: allowlists, denylists, quality gate, rate limit, privacy budget |
PiiStripper |
Three-stage PII pipeline: detect, redact, attest |
DiffPrivacyEngine |
Noise injection with Gaussian or Laplace mechanism and gradient clipping |
PrivacyAccountant |
RDP-based cumulative privacy loss tracker |
FederationError covers 15 variants:
| Variant | Trigger |
|---|---|
PrivacyBudgetExhausted |
Cumulative epsilon exceeds limit |
InvalidEpsilon |
Epsilon <= 0 |
InvalidDelta |
Delta outside (0, 1) |
SegmentValidation |
Malformed segment data |
VersionMismatch |
Incompatible format version |
SignatureVerification |
Ed25519/ML-DSA-65 signature check failed |
WitnessChainBroken |
Witness chain has a gap or tampered entry |
InsufficientObservations |
Prior has too few observations for export |
QualityBelowThreshold |
Trajectory quality below policy minimum |
RateLimited |
Export rate limit exceeded |
PiiLeakDetected |
PII found after stripping (defense-in-depth) |
ByzantineOutlier |
Contribution flagged as adversarial |
InsufficientContributions |
Not enough participants for aggregation round |
Serialization |
Encoding/decoding failure |
Io |
I/O operation failure |
| Crate | Relationship |
|---|---|
rvf-types |
Core RVF segment definitions; rvf-federation defines its own payload types to avoid circular deps |
ruvector-domain-expansion |
Source of TransferPrior, PolicyKernel, CostCurve; federation exports these as RVF segments |
sona |
SONA learning engine; FederatedCoordinator handles intra-deployment aggregation, rvf-federation handles inter-user |
rvf-crypto |
Ed25519 signatures and SHAKE-256 hashing used for witness chains and segment integrity |
54 tests across all modules:
cargo test -p rvf-federationBenchmarks:
cargo bench -p rvf-federationMIT OR Apache-2.0
Part of RuVector -- the self-learning vector database.