# rvf-federation

[![Crates.io](https://img.shields.io/crates/v/rvf-federation.svg)](https://crates.io/crates/rvf-federation)
[![docs.rs](https://img.shields.io/docsrs/rvf-federation)](https://docs.rs/rvf-federation)
[![License: MIT OR Apache-2.0](https://img.shields.io/badge/License-MIT%20OR%20Apache--2.0-blue.svg)](https://opensource.org/licenses/MIT)
[![Rust 1.87+](https://img.shields.io/badge/rust-1.87%2B-orange.svg)](https://www.rust-lang.org)

**Privacy-preserving federated transfer learning for the RVF format.**

```toml
rvf-federation = "0.1"
```

RuVector users independently accumulate learning patterns -- SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters. Today that learning is siloed. `rvf-federation` implements the inter-user federation layer defined in [ADR-057](../../../docs/adr/ADR-057-federated-rvf-transfer-learning.md): it strips PII, injects differential privacy noise, packages transferable learning as RVF segments, and merges incoming learning with formal privacy guarantees.

| | rvf-federation | Siloed learning | Manual sharing |
|---|---|---|---|
| **Privacy** | 3-stage PII stripping + calibrated DP noise | N/A -- nothing leaves the machine | Trust the sender |
| **Knowledge reuse** | New users bootstrap from community priors | Every deployment starts cold | Copy-paste config files |
| **Integrity** | Witness chain + Ed25519/ML-DSA-65 signatures | N/A | No verification |
| **Aggregation** | FedAvg, FedProx, Byzantine-tolerant averaging | N/A | Manual merge |
| **Privacy accounting** | RDP composition with formal epsilon budget | N/A | N/A |

## Quick Start

```rust
use rvf_federation::{
    ExportBuilder, DiffPrivacyEngine, FederationPolicy,
    TransferPriorSet, TransferPriorEntry, BetaParams,
};

// 1. Build an export from local learning
let priors = TransferPriorSet {
    source_domain: "code_review".into(),
    entries: vec![TransferPriorEntry {
        bucket_id: "medium_algorithm".into(),
        arm_id: "arm_0".into(),
        params: BetaParams::new(10.0, 5.0),
        observation_count: 50,
    }],
    cost_ema: 0.85,
};

// 2. Configure differential privacy (epsilon=1.0, delta=1e-5)
let mut dp = DiffPrivacyEngine::gaussian(1.0, 1e-5, 1.0, 1.0).unwrap();

// 3. Build: PII strip -> DP noise -> assemble manifest
let export = ExportBuilder::new("alice_pseudo".into(), "code_review".into())
    .with_policy(FederationPolicy::default())
    .add_priors(priors)
    .add_string_field("config_path".into(), "/home/alice/project/.config".into())
    .build(&mut dp)
    .unwrap();

assert_eq!(export.manifest.format_version, 1);
assert!(export.redaction_log.total_redactions >= 1); // PII was stripped
assert!(export.privacy_proof.epsilon > 0.0);         // DP noise was applied
```

## Key Features

| Feature | What It Does | Why It Matters |
|---|---|---|
| **PII stripping** | 3-stage pipeline: detect, redact, attest | No personal data leaves the local machine |
| **Differential privacy** | Gaussian/Laplace noise with RDP accounting | Formal mathematical privacy guarantee per export |
| **Gradient clipping** | Bound L2 norms before aggregation | Limits any single user's influence on the aggregate |
| **FedAvg / FedProx** | Federated averaging with optional proximal term | Industry-standard aggregation (McMahan et al. 2017) |
| **Byzantine tolerance** | Outlier detection by L2-norm z-score | Malicious contributions are excluded automatically |
| **Version-aware merging** | Dampened confidence for cross-version imports | Older learning still helps, with reduced weight |
| **Selective sharing** | Allowlist/denylist for segments and domains | Users control exactly what they share |

## Architecture

```
Local Engine                                                               Remote
+------------------+    +------------+    +---------+     +----------+
| TransferPriors   |--->|            |--->|         |---->|          |
| PolicyKernels    |    | PII Strip  |    |   DP    |     |   RVF    |     Registry
| CostCurves       |    | (3-stage)  |    |  Noise  |     |  Export  |----> (GCS)
| LoRA Weights     |    |            |    |         |     |  Builder |        |
+------------------+    +------------+    +---------+     +----------+        |
                                                                              v
+------------------+    +------------+    +---------+     +----------+  +--------+
| Merged Learning  |<---| Version-   |<---| Import  |<----| Validate |<-| Import |
| (local engines)  |    | Aware      |    | Merger  |     | (sig +   |  | (pull) |
|                  |    | Merge      |    |         |     | witness) |  +--------+
+------------------+    +------------+    +---------+     +----------+
```

## Modules

| Module | Description |
|---|---|
| `types` | Four new RVF segment payload types (0x33-0x36) plus federation data structures |
| `error` | 15 error variants covering privacy, validation, aggregation, and I/O failures |
| `pii_strip` | Three-stage PII stripping pipeline with 12 built-in detection rules |
| `diff_privacy` | Gaussian/Laplace noise engines, gradient clipping, RDP privacy accountant |
| `federation` | `ExportBuilder` and `ImportMerger` implementing the ADR-057 transfer protocol |
| `aggregate` | `FederatedAggregator` with FedAvg, FedProx, and Byzantine-tolerant strategies |
| `policy` | `FederationPolicy` for selective sharing with allowlists, denylists, and rate limits |

## Segment Types

Four new RVF segment types extend the `0x30-0x32` domain expansion range:

| Code | Name | Purpose |
|---|---|---|
| `0x33` | `FederatedManifest` | Describes the export: contributor pseudonym, timestamp, included segments, privacy budget spent |
| `0x34` | `DiffPrivacyProof` | Privacy attestation: epsilon/delta, mechanism, sensitivity, clipping norm, noise scale |
| `0x35` | `RedactionLog` | PII stripping attestation: redaction counts by category, pre-redaction content hash, rules fired |
| `0x36` | `AggregateWeights` | Federated-averaged LoRA deltas with participation count, round number, confidence scores |

Readers that do not recognize these segment types skip them per the RVF forward-compatibility rule. Existing `TransferPrior (0x30)`, `PolicyKernel (0x31)`, `CostCurve (0x32)`, `Witness`, and `Crypto` segments are reused as-is.

## PII Stripping Pipeline

`PiiStripper` runs a three-stage pipeline on every string field before it leaves the local machine.

**Stage 1 -- Detection.** Twelve built-in regex rules scan for:

- Unix and Windows file paths (`/home/user/...`, `C:\Users\...`)
- IPv4 and IPv6 addresses
- Email addresses
- API keys (`sk-...`, `AKIA...`, `ghp_...`, Bearer tokens)
- Environment variable references (`$HOME`, `%USERPROFILE%`)
- Usernames (`@handle`)

Custom rules can be registered with `add_rule()`.

**Stage 2 -- Redaction.** Detected PII is replaced with deterministic pseudonyms. The same original value always maps to the same pseudonym within a single export, preserving structural relationships without revealing content.

**Stage 3 -- Attestation.** A `RedactionLog (0x35)` segment is generated containing redaction counts by category, the SHAKE-256 hash of the pre-redaction content (proves scanning happened without revealing it), and the rules that fired.
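
The deterministic mapping described in Stage 2 can be illustrated without the crate. The sketch below is not the `PiiStripper` implementation: `Pseudonymizer`, `redact_ipv4`, and the `<IP_1>` placeholder format are hypothetical names chosen for illustration, and detection is reduced to whitespace-separated tokens that parse as IPv4 addresses. It only shows the property the text relies on, namely that a repeated value receives the same placeholder for the lifetime of one export.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical helper (not part of rvf-federation): hands out one stable
/// placeholder per distinct sensitive value for the lifetime of one export.
struct Pseudonymizer {
    // (category, original value) -> placeholder already assigned
    seen: HashMap<(String, String), String>,
    // per-category counter used to number new placeholders
    counters: HashMap<String, usize>,
}

impl Pseudonymizer {
    fn new() -> Self {
        Self { seen: HashMap::new(), counters: HashMap::new() }
    }

    /// Returns the placeholder for `value`, creating one on first sight.
    fn pseudonym(&mut self, category: &str, value: &str) -> String {
        let key = (category.to_string(), value.to_string());
        if let Some(existing) = self.seen.get(&key) {
            return existing.clone();
        }
        let n = self.counters.entry(category.to_string()).or_insert(0);
        *n += 1;
        // Placeholder format is an assumption made for this sketch.
        let placeholder = format!("<{}_{}>", category.to_uppercase(), n);
        self.seen.insert(key, placeholder.clone());
        placeholder
    }
}

/// Toy detector: replaces space-separated tokens that parse as IPv4 addresses.
fn redact_ipv4(text: &str, p: &mut Pseudonymizer) -> String {
    text.split(' ')
        .map(|tok| match tok.parse::<Ipv4Addr>() {
            Ok(_) => p.pseudonym("ip", tok),
            Err(_) => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let mut p = Pseudonymizer::new();
    // The same address in two different fields maps to the same placeholder.
    println!("{}", redact_ipv4("primary 10.0.0.1 replica 10.0.0.2", &mut p));
    println!("{}", redact_ipv4("retry 10.0.0.1", &mut p));
}
```

Keeping the map per export, rather than global, means placeholders cannot be correlated across exports, which is one plausible reading of "within a single export" above.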

```rust
use rvf_federation::PiiStripper;

let mut stripper = PiiStripper::new();
let fields = vec![
    ("config", "/home/alice/project/.env"),
    ("server", "connecting to 10.0.0.1:8080"),
    ("note", "no pii here"),
];

let (redacted, log) = stripper.strip_fields(&fields);
assert_eq!(log.fields_scanned, 3);
assert!(log.total_redactions >= 2);
assert!(redacted[2].1 == "no pii here"); // clean fields pass through
```

## Differential Privacy

### Noise Mechanisms

| Mechanism | Privacy Model | Noise Distribution | Use Case |
|---|---|---|---|
| Gaussian | (epsilon, delta)-DP | N(0, sigma^2) where sigma = S * sqrt(2 ln(1.25/delta)) / epsilon | Default; tighter for large parameter counts |
| Laplace | Pure epsilon-DP | Laplace(0, S/epsilon) | Stronger guarantee; no delta term |

### Gradient Clipping

Before noise injection, all parameter vectors are clipped to a configurable L2 norm bound. This limits the sensitivity of the aggregation to any single user's contribution.

### Privacy Accountant

`PrivacyAccountant` tracks cumulative privacy loss using Renyi Differential Privacy (RDP) composition across 16 alpha orders. RDP composition is tighter than naive (epsilon, delta)-DP composition, meaning more exports fit within the same budget.

```rust
use rvf_federation::PrivacyAccountant;

let mut accountant = PrivacyAccountant::new(10.0, 1e-5); // budget: eps=10, delta=1e-5
accountant.record_gaussian(1.0, 1.0, 1e-5, 100);

assert!(accountant.remaining_budget() > 0.0);
assert!(!accountant.is_exhausted());
```

## Federation Strategies

| Strategy | Algorithm | Weighting | When to Use |
|---|---|---|---|
| `FedAvg` | Federated Averaging (McMahan et al.) | Trajectory count | Default; most scenarios |
| `FedProx` | Proximal regularization | Trajectory count + mu penalty | Heterogeneous data distributions |
| `WeightedAverage` | Simple weighted mean | Quality/reputation score | When contributor reputation varies widely |
| Byzantine detection | L2-norm z-score filtering | Outliers > 2 std removed | Always runs before aggregation |

```rust
use rvf_federation::{FederatedAggregator, AggregationStrategy};
use rvf_federation::aggregate::Contribution;

let mut agg = FederatedAggregator::new("code_review".into(), AggregationStrategy::FedAvg)
    .with_min_contributions(2)
    .with_byzantine_threshold(2.0);

agg.add_contribution(Contribution {
    contributor: "alice".into(),
    weights: vec![1.0, 2.0, 3.0],
    quality_weight: 0.9,
    trajectory_count: 100,
});
agg.add_contribution(Contribution {
    contributor: "bob".into(),
    weights: vec![1.2, 1.8, 3.1],
    quality_weight: 0.85,
    trajectory_count: 80,
});

let result = agg.aggregate().unwrap();
assert_eq!(result.participation_count, 2);
assert_eq!(result.lora_deltas.len(), 3);
```

## Performance Benchmarks

Measured on an AMD64 Linux system with Criterion.


| Benchmark | Time |
|---|---|
| PII detect (single string) | 756 ns |
| PII strip (10 fields) | 44 us |
| PII strip (100 fields) | 303 us |
| Gaussian noise (100 params) | 4.7 us |
| Gaussian noise (10k params) | 334 us |
| Gradient clipping (1k params) | 487 ns |
| Privacy accountant (100 rounds) | 1.0 us |
| FedAvg (10 contrib, 100 dim) | 3.9 us |
| FedAvg (100 contrib, 1k dim) | 365 us |
| Byzantine detection (50 contrib) | 12 us |
| Full export pipeline | 1.2 ms |
| Merge 100 priors | 28 us |

## Feature Flags

| Flag | Default | What It Enables |
|---|---|---|
| `std` | Yes | Standard library support (required) |
| `serde` | No | Derive `Serialize`/`Deserialize` on all public types |

```toml
[dependencies]
rvf-federation = { version = "0.1", features = ["serde"] }
```

## API Overview

### Core Types

| Type | Description |
|---|---|
| `FederatedManifest` | Export metadata: contributor pseudonym, domain, timestamp, privacy budget spent |
| `DiffPrivacyProof` | Privacy attestation: epsilon, delta, mechanism, sensitivity, noise scale |
| `RedactionLog` | PII stripping attestation: entries by category, pre-redaction hash, field count |
| `AggregateWeights` | Federated-averaged LoRA deltas with round number, participation count, confidences |
| `BetaParams` | Beta distribution parameters for Thompson Sampling priors (merge, dampen, mean) |

### Transfer Types

| Type | Description |
|---|---|
| `TransferPriorEntry` | Single context bucket prior: bucket ID, arm ID, Beta params, observation count |
| `TransferPriorSet` | Collection of priors from a trained domain with cost EMA |
| `PolicyKernelSnapshot` | Snapshot of tunable policy knob values with fitness score |
| `CostCurveSnapshot` | Ordered (step, cost) points with acceleration factor |

### Aggregation Types

| Type | Description |
|---|---|
| `FederatedAggregator` | Aggregation server: collects contributions, detects outliers, produces `AggregateWeights` |
| `AggregationStrategy` | `FedAvg`, `FedProx { mu }`, or `WeightedAverage` |
| `Contribution` | Single participant's weight vector with quality and trajectory metadata |

### Protocol Types

| Type | Description |
|---|---|
| `ExportBuilder` | Builder pattern: add priors/kernels/weights, PII-strip, DP-noise, produce `FederatedExport` |
| `ImportMerger` | Validate imports, merge priors with version-aware dampening, merge weights |
| `FederatedExport` | Completed export: manifest + redaction log + privacy proof + learning data |
| `FederationPolicy` | Selective sharing: allowlists, denylists, quality gate, rate limit, privacy budget |
| `PiiStripper` | Three-stage PII pipeline: detect, redact, attest |
| `DiffPrivacyEngine` | Noise injection with Gaussian or Laplace mechanism and gradient clipping |
| `PrivacyAccountant` | RDP-based cumulative privacy loss tracker |

### Error Types

`FederationError` covers 15 variants:

| Variant | Trigger |
|---|---|
| `PrivacyBudgetExhausted` | Cumulative epsilon exceeds limit |
| `InvalidEpsilon` | Epsilon <= 0 |
| `InvalidDelta` | Delta outside (0, 1) |
| `SegmentValidation` | Malformed segment data |
| `VersionMismatch` | Incompatible format version |
| `SignatureVerification` | Ed25519/ML-DSA-65 signature check failed |
| `WitnessChainBroken` | Witness chain has a gap or tampered entry |
| `InsufficientObservations` | Prior has too few observations for export |
| `QualityBelowThreshold` | Trajectory quality below policy minimum |
| `RateLimited` | Export rate limit exceeded |
| `PiiLeakDetected` | PII found after stripping (defense-in-depth) |
| `ByzantineOutlier` | Contribution flagged as adversarial |
| `InsufficientContributions` | Not enough participants for aggregation round |
| `Serialization` | Encoding/decoding failure |
| `Io` | I/O operation failure |

## Related Crates

| Crate | Relationship |
|---|---|
| [`rvf-types`](../rvf-types) | Core RVF segment definitions; `rvf-federation` defines its own payload types to avoid circular deps |
| [`ruvector-domain-expansion`](../../ruvector-domain-expansion) | Source of `TransferPrior`, `PolicyKernel`, `CostCurve`; federation exports these as RVF segments |
| [`sona`](../../sona) | SONA learning engine; `FederatedCoordinator` handles intra-deployment aggregation, `rvf-federation` handles inter-user |
| [`rvf-crypto`](../rvf-crypto) | Ed25519 signatures and SHAKE-256 hashing used for witness chains and segment integrity |

## Testing

54 tests across all modules:

```bash
cargo test -p rvf-federation
```

Benchmarks:

```bash
cargo bench -p rvf-federation
```

## License

MIT OR Apache-2.0

---

Part of [RuVector](https://github.com/ruvnet/ruvector) -- the self-learning vector database.