Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
333
vendor/ruvector/crates/rvf/rvf-federation/README.md
vendored
Normal file
333
vendor/ruvector/crates/rvf/rvf-federation/README.md
vendored
Normal file
@@ -0,0 +1,333 @@
|
||||
# rvf-federation
|
||||
|
||||
[](https://crates.io/crates/rvf-federation)
|
||||
[](https://docs.rs/rvf-federation)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://www.rust-lang.org)
|
||||
|
||||
**Privacy-preserving federated transfer learning for the RVF format.**
|
||||
|
||||
```toml
|
||||
rvf-federation = "0.1"
|
||||
```
|
||||
|
||||
RuVector users independently accumulate learning patterns -- SONA weight trajectories, policy kernel configurations, domain expansion priors, HNSW tuning parameters. Today that learning is siloed. `rvf-federation` implements the inter-user federation layer defined in [ADR-057](../../../docs/adr/ADR-057-federated-rvf-transfer-learning.md): it strips PII, injects differential privacy noise, packages transferable learning as RVF segments, and merges incoming learning with formal privacy guarantees.
|
||||
|
||||
| | rvf-federation | Siloed learning | Manual sharing |
|
||||
|---|---|---|---|
|
||||
| **Privacy** | 3-stage PII stripping + calibrated DP noise | N/A -- nothing leaves the machine | Trust the sender |
|
||||
| **Knowledge reuse** | New users bootstrap from community priors | Every deployment starts cold | Copy-paste config files |
|
||||
| **Integrity** | Witness chain + Ed25519/ML-DSA-65 signatures | N/A | No verification |
|
||||
| **Aggregation** | FedAvg, FedProx, Byzantine-tolerant averaging | N/A | Manual merge |
|
||||
| **Privacy accounting** | RDP composition with formal epsilon budget | N/A | N/A |
|
||||
|
||||
## Quick Start
|
||||
|
||||
```rust
|
||||
use rvf_federation::{
|
||||
ExportBuilder, DiffPrivacyEngine, FederationPolicy,
|
||||
TransferPriorSet, TransferPriorEntry, BetaParams,
|
||||
};
|
||||
|
||||
// 1. Build an export from local learning
|
||||
let priors = TransferPriorSet {
|
||||
source_domain: "code_review".into(),
|
||||
entries: vec![TransferPriorEntry {
|
||||
bucket_id: "medium_algorithm".into(),
|
||||
arm_id: "arm_0".into(),
|
||||
params: BetaParams::new(10.0, 5.0),
|
||||
observation_count: 50,
|
||||
}],
|
||||
cost_ema: 0.85,
|
||||
};
|
||||
|
||||
// 2. Configure differential privacy (epsilon=1.0, delta=1e-5)
|
||||
let mut dp = DiffPrivacyEngine::gaussian(1.0, 1e-5, 1.0, 1.0).unwrap();
|
||||
|
||||
// 3. Build: PII strip -> DP noise -> assemble manifest
|
||||
let export = ExportBuilder::new("alice_pseudo".into(), "code_review".into())
|
||||
.with_policy(FederationPolicy::default())
|
||||
.add_priors(priors)
|
||||
.add_string_field("config_path".into(), "/home/alice/project/.config".into())
|
||||
.build(&mut dp)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(export.manifest.format_version, 1);
|
||||
assert!(export.redaction_log.total_redactions >= 1); // PII was stripped
|
||||
assert!(export.privacy_proof.epsilon > 0.0); // DP noise was applied
|
||||
```
|
||||
|
||||
## Key Features
|
||||
|
||||
| Feature | What It Does | Why It Matters |
|
||||
|---|---|---|
|
||||
| **PII stripping** | 3-stage pipeline: detect, redact, attest | No personal data leaves the local machine |
|
||||
| **Differential privacy** | Gaussian/Laplace noise with RDP accounting | Formal mathematical privacy guarantee per export |
|
||||
| **Gradient clipping** | Bound L2 norms before aggregation | Limits any single user's influence on the aggregate |
|
||||
| **FedAvg / FedProx** | Federated averaging with optional proximal term | Industry-standard aggregation (McMahan et al. 2017) |
|
||||
| **Byzantine tolerance** | Outlier detection by L2-norm z-score | Malicious contributions are excluded automatically |
|
||||
| **Version-aware merging** | Dampened confidence for cross-version imports | Older learning still helps, with reduced weight |
|
||||
| **Selective sharing** | Allowlist/denylist for segments and domains | Users control exactly what they share |
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
Local Engine Remote
|
||||
+------------------+ +------------+ +---------+ +----------+
|
||||
| TransferPriors |--->| |--->| |---->| |
|
||||
| PolicyKernels | | PII Strip | | DP | | RVF | Registry
|
||||
| CostCurves | | (3-stage) | | Noise | | Export |----> (GCS)
|
||||
| LoRA Weights | | | | | | Builder | |
|
||||
+------------------+ +------------+ +---------+ +----------+ |
|
||||
v
|
||||
+------------------+ +------------+ +---------+ +----------+ +--------+
|
||||
| Merged Learning |<---| Version- |<---| Import |<----| Validate |<-| Import |
|
||||
| (local engines) | | Aware | | Merger | | (sig + | | (pull) |
|
||||
| | | Merge | | | | witness) | +--------+
|
||||
+------------------+ +------------+ +---------+ +----------+
|
||||
```
|
||||
|
||||
## Modules
|
||||
|
||||
| Module | Description |
|
||||
|---|---|
|
||||
| `types` | Four new RVF segment payload types (0x33-0x36) plus federation data structures |
|
||||
| `error` | 15 error variants covering privacy, validation, aggregation, and I/O failures |
|
||||
| `pii_strip` | Three-stage PII stripping pipeline with 12 built-in detection rules |
|
||||
| `diff_privacy` | Gaussian/Laplace noise engines, gradient clipping, RDP privacy accountant |
|
||||
| `federation` | `ExportBuilder` and `ImportMerger` implementing the ADR-057 transfer protocol |
|
||||
| `aggregate` | `FederatedAggregator` with FedAvg, FedProx, and Byzantine-tolerant strategies |
|
||||
| `policy` | `FederationPolicy` for selective sharing with allowlists, denylists, and rate limits |
|
||||
|
||||
## Segment Types
|
||||
|
||||
Four new RVF segment types extend the `0x30-0x32` domain expansion range:
|
||||
|
||||
| Code | Name | Purpose |
|
||||
|---|---|---|
|
||||
| `0x33` | `FederatedManifest` | Describes the export: contributor pseudonym, timestamp, included segments, privacy budget spent |
|
||||
| `0x34` | `DiffPrivacyProof` | Privacy attestation: epsilon/delta, mechanism, sensitivity, clipping norm, noise scale |
|
||||
| `0x35` | `RedactionLog` | PII stripping attestation: redaction counts by category, pre-redaction content hash, rules fired |
|
||||
| `0x36` | `AggregateWeights` | Federated-averaged LoRA deltas with participation count, round number, confidence scores |
|
||||
|
||||
Readers that do not recognize these segment types skip them per the RVF forward-compatibility rule. Existing `TransferPrior (0x30)`, `PolicyKernel (0x31)`, `CostCurve (0x32)`, `Witness`, and `Crypto` segments are reused as-is.
|
||||
|
||||
## PII Stripping Pipeline
|
||||
|
||||
`PiiStripper` runs a three-stage pipeline on every string field before it leaves the local machine.
|
||||
|
||||
**Stage 1 -- Detection.** Twelve built-in regex rules scan for:
|
||||
|
||||
- Unix and Windows file paths (`/home/user/...`, `C:\Users\...`)
|
||||
- IPv4 and IPv6 addresses
|
||||
- Email addresses
|
||||
- API keys (`sk-...`, `AKIA...`, `ghp_...`, Bearer tokens)
|
||||
- Environment variable references (`$HOME`, `%USERPROFILE%`)
|
||||
- Usernames (`@handle`)
|
||||
|
||||
Custom rules can be registered with `add_rule()`.
|
||||
|
||||
**Stage 2 -- Redaction.** Detected PII is replaced with deterministic pseudonyms (`<PATH_1>`, `<IP_2>`, `<REDACTED_KEY>`). The same original value always maps to the same pseudonym within a single export, preserving structural relationships without revealing content.
|
||||
|
||||
**Stage 3 -- Attestation.** A `RedactionLog (0x35)` segment is generated containing redaction counts by category, the SHAKE-256 hash of the pre-redaction content (proves scanning happened without revealing it), and the rules that fired.
|
||||
|
||||
```rust
|
||||
use rvf_federation::PiiStripper;
|
||||
|
||||
let mut stripper = PiiStripper::new();
|
||||
let fields = vec![
|
||||
("config", "/home/alice/project/.env"),
|
||||
("server", "connecting to 10.0.0.1:8080"),
|
||||
("note", "no pii here"),
|
||||
];
|
||||
let (redacted, log) = stripper.strip_fields(&fields);
|
||||
assert_eq!(log.fields_scanned, 3);
|
||||
assert!(log.total_redactions >= 2);
|
||||
assert!(redacted[2].1 == "no pii here"); // clean fields pass through
|
||||
```
|
||||
|
||||
## Differential Privacy
|
||||
|
||||
### Noise Mechanisms
|
||||
|
||||
| Mechanism | Privacy Model | Noise Distribution | Use Case |
|
||||
|---|---|---|---|
|
||||
| Gaussian | (epsilon, delta)-DP | N(0, sigma^2) where sigma = S * sqrt(2 ln(1.25/delta)) / epsilon | Default; tighter for large parameter counts |
|
||||
| Laplace | Pure epsilon-DP | Laplace(0, S/epsilon) | Stronger guarantee; no delta term |
|
||||
|
||||
### Gradient Clipping
|
||||
|
||||
Before noise injection, all parameter vectors are clipped to a configurable L2 norm bound. This limits the sensitivity of the aggregation to any single user's contribution.
|
||||
|
||||
### Privacy Accountant
|
||||
|
||||
`PrivacyAccountant` tracks cumulative privacy loss using Renyi Differential Privacy (RDP) composition across 16 alpha orders. RDP composition is tighter than naive (epsilon, delta)-DP composition, meaning more exports fit within the same budget.
|
||||
|
||||
```rust
|
||||
use rvf_federation::PrivacyAccountant;
|
||||
|
||||
let mut accountant = PrivacyAccountant::new(10.0, 1e-5); // budget: eps=10, delta=1e-5
|
||||
accountant.record_gaussian(1.0, 1.0, 1e-5, 100);
|
||||
assert!(accountant.remaining_budget() > 0.0);
|
||||
assert!(!accountant.is_exhausted());
|
||||
```
|
||||
|
||||
## Federation Strategies
|
||||
|
||||
| Strategy | Algorithm | Weighting | When to Use |
|
||||
|---|---|---|---|
|
||||
| `FedAvg` | Federated Averaging (McMahan et al.) | Trajectory count | Default; most scenarios |
|
||||
| `FedProx` | Proximal regularization | Trajectory count + mu penalty | Heterogeneous data distributions |
|
||||
| `WeightedAverage` | Simple weighted mean | Quality/reputation score | When contributor reputation varies widely |
|
||||
| Byzantine detection | L2-norm z-score filtering | Outliers > 2 std removed | Always runs before aggregation |
|
||||
|
||||
```rust
|
||||
use rvf_federation::{FederatedAggregator, AggregationStrategy};
|
||||
use rvf_federation::aggregate::Contribution;
|
||||
|
||||
let mut agg = FederatedAggregator::new("code_review".into(), AggregationStrategy::FedAvg)
|
||||
.with_min_contributions(2)
|
||||
.with_byzantine_threshold(2.0);
|
||||
|
||||
agg.add_contribution(Contribution {
|
||||
contributor: "alice".into(),
|
||||
weights: vec![1.0, 2.0, 3.0],
|
||||
quality_weight: 0.9,
|
||||
trajectory_count: 100,
|
||||
});
|
||||
agg.add_contribution(Contribution {
|
||||
contributor: "bob".into(),
|
||||
weights: vec![1.2, 1.8, 3.1],
|
||||
quality_weight: 0.85,
|
||||
trajectory_count: 80,
|
||||
});
|
||||
|
||||
let result = agg.aggregate().unwrap();
|
||||
assert_eq!(result.participation_count, 2);
|
||||
assert_eq!(result.lora_deltas.len(), 3);
|
||||
```
|
||||
|
||||
## Performance Benchmarks
|
||||
|
||||
Measured on an AMD64 Linux system with Criterion.
|
||||
|
||||
| Benchmark | Time |
|
||||
|---|---|
|
||||
| PII detect (single string) | 756 ns |
|
||||
| PII strip (10 fields) | 44 us |
|
||||
| PII strip (100 fields) | 303 us |
|
||||
| Gaussian noise (100 params) | 4.7 us |
|
||||
| Gaussian noise (10k params) | 334 us |
|
||||
| Gradient clipping (1k params) | 487 ns |
|
||||
| Privacy accountant (100 rounds) | 1.0 us |
|
||||
| FedAvg (10 contrib, 100 dim) | 3.9 us |
|
||||
| FedAvg (100 contrib, 1k dim) | 365 us |
|
||||
| Byzantine detection (50 contrib) | 12 us |
|
||||
| Full export pipeline | 1.2 ms |
|
||||
| Merge 100 priors | 28 us |
|
||||
|
||||
## Feature Flags
|
||||
|
||||
| Flag | Default | What It Enables |
|
||||
|---|---|---|
|
||||
| `std` | Yes | Standard library support (required) |
|
||||
| `serde` | No | Derive `Serialize`/`Deserialize` on all public types |
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
rvf-federation = { version = "0.1", features = ["serde"] }
|
||||
```
|
||||
|
||||
## API Overview
|
||||
|
||||
### Core Types
|
||||
|
||||
| Type | Description |
|
||||
|---|---|
|
||||
| `FederatedManifest` | Export metadata: contributor pseudonym, domain, timestamp, privacy budget spent |
|
||||
| `DiffPrivacyProof` | Privacy attestation: epsilon, delta, mechanism, sensitivity, noise scale |
|
||||
| `RedactionLog` | PII stripping attestation: entries by category, pre-redaction hash, field count |
|
||||
| `AggregateWeights` | Federated-averaged LoRA deltas with round number, participation count, confidences |
|
||||
| `BetaParams` | Beta distribution parameters for Thompson Sampling priors (merge, dampen, mean) |
|
||||
|
||||
### Transfer Types
|
||||
|
||||
| Type | Description |
|
||||
|---|---|
|
||||
| `TransferPriorEntry` | Single context bucket prior: bucket ID, arm ID, Beta params, observation count |
|
||||
| `TransferPriorSet` | Collection of priors from a trained domain with cost EMA |
|
||||
| `PolicyKernelSnapshot` | Snapshot of tunable policy knob values with fitness score |
|
||||
| `CostCurveSnapshot` | Ordered (step, cost) points with acceleration factor |
|
||||
|
||||
### Aggregation Types
|
||||
|
||||
| Type | Description |
|
||||
|---|---|
|
||||
| `FederatedAggregator` | Aggregation server: collects contributions, detects outliers, produces `AggregateWeights` |
|
||||
| `AggregationStrategy` | `FedAvg`, `FedProx { mu }`, or `WeightedAverage` |
|
||||
| `Contribution` | Single participant's weight vector with quality and trajectory metadata |
|
||||
|
||||
### Protocol Types
|
||||
|
||||
| Type | Description |
|
||||
|---|---|
|
||||
| `ExportBuilder` | Builder pattern: add priors/kernels/weights, PII-strip, DP-noise, produce `FederatedExport` |
|
||||
| `ImportMerger` | Validate imports, merge priors with version-aware dampening, merge weights |
|
||||
| `FederatedExport` | Completed export: manifest + redaction log + privacy proof + learning data |
|
||||
| `FederationPolicy` | Selective sharing: allowlists, denylists, quality gate, rate limit, privacy budget |
|
||||
| `PiiStripper` | Three-stage PII pipeline: detect, redact, attest |
|
||||
| `DiffPrivacyEngine` | Noise injection with Gaussian or Laplace mechanism and gradient clipping |
|
||||
| `PrivacyAccountant` | RDP-based cumulative privacy loss tracker |
|
||||
|
||||
### Error Types
|
||||
|
||||
`FederationError` covers 15 variants:
|
||||
|
||||
| Variant | Trigger |
|
||||
|---|---|
|
||||
| `PrivacyBudgetExhausted` | Cumulative epsilon exceeds limit |
|
||||
| `InvalidEpsilon` | Epsilon <= 0 |
|
||||
| `InvalidDelta` | Delta outside (0, 1) |
|
||||
| `SegmentValidation` | Malformed segment data |
|
||||
| `VersionMismatch` | Incompatible format version |
|
||||
| `SignatureVerification` | Ed25519/ML-DSA-65 signature check failed |
|
||||
| `WitnessChainBroken` | Witness chain has a gap or tampered entry |
|
||||
| `InsufficientObservations` | Prior has too few observations for export |
|
||||
| `QualityBelowThreshold` | Trajectory quality below policy minimum |
|
||||
| `RateLimited` | Export rate limit exceeded |
|
||||
| `PiiLeakDetected` | PII found after stripping (defense-in-depth) |
|
||||
| `ByzantineOutlier` | Contribution flagged as adversarial |
|
||||
| `InsufficientContributions` | Not enough participants for aggregation round |
|
||||
| `Serialization` | Encoding/decoding failure |
|
||||
| `Io` | I/O operation failure |
|
||||
|
||||
## Related Crates
|
||||
|
||||
| Crate | Relationship |
|
||||
|---|---|
|
||||
| [`rvf-types`](../rvf-types) | Core RVF segment definitions; `rvf-federation` defines its own payload types to avoid circular deps |
|
||||
| [`ruvector-domain-expansion`](../../ruvector-domain-expansion) | Source of `TransferPrior`, `PolicyKernel`, `CostCurve`; federation exports these as RVF segments |
|
||||
| [`sona`](../../sona) | SONA learning engine; `FederatedCoordinator` handles intra-deployment aggregation, `rvf-federation` handles inter-user |
|
||||
| [`rvf-crypto`](../rvf-crypto) | Ed25519 signatures and SHAKE-256 hashing used for witness chains and segment integrity |
|
||||
|
||||
## Testing
|
||||
|
||||
54 tests across all modules:
|
||||
|
||||
```bash
|
||||
cargo test -p rvf-federation
|
||||
```
|
||||
|
||||
Benchmarks:
|
||||
|
||||
```bash
|
||||
cargo bench -p rvf-federation
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
MIT OR Apache-2.0
|
||||
|
||||
---
|
||||
|
||||
Part of [RuVector](https://github.com/ruvnet/ruvector) -- the self-learning vector database.
|
||||
Reference in New Issue
Block a user