Files
wifi-densepose/vendor/ruvector/docs/research/federated-rvf/PLAN.md

43 KiB

Federated RVF Transfer Learning -- GOAP Implementation Plan

ADR: ADR-057 Date: 2026-02-26 Methodology: Goal-Oriented Action Planning (GOAP)


1. World State Assessment

1.1 Current State (What Is True Now)

State Variable Value Evidence
rvf_segment_types_defined 25 types (0x00-0x32) crates/rvf/rvf-types/src/segment_type.rs
transfer_prior_segment_exists true (0x30) SegmentType::TransferPrior
policy_kernel_segment_exists true (0x31) SegmentType::PolicyKernel
cost_curve_segment_exists true (0x32) SegmentType::CostCurve
rvf_bridge_serialization_works true ruvector-domain-expansion/src/rvf_bridge.rs -- 11 passing tests
witness_chain_support true rvf-crypto/src/witness.rs -- create_witness_chain, verify_witness_chain
ed25519_signing_support true rvf-crypto/src/sign.rs -- feature-gated ed25519
shake256_hashing_support true rvf-crypto/src/hash.rs -- shake256_128, shake256_256
sona_federated_coordinator_exists true crates/sona/src/training/federated.rs -- FederatedCoordinator, EphemeralAgent
sona_agent_export_works true AgentExport, TrajectoryExport with quality gating
sona_lora_weights_accessible true SonaEngine::apply_micro_lora, MicroLoRA, BaseLoRA
sona_ewc_support true crates/sona/src/ewc.rs -- EwcPlusPlus, TaskFisher
domain_expansion_engine_exists true crates/ruvector-domain-expansion/ -- 3 domains, Meta-TS, population search
domain_expansion_transfer_works true MetaThompsonEngine::init_domain_with_transfer with sqrt dampening
beta_params_merge_exists true BetaParams::merge() in transfer.rs
gcloud_example_exists true examples/google-cloud/ -- Cloud Run, axum server
rvf_workspace_defined true crates/rvf/Cargo.toml -- 25 workspace members
no_std_types_core true rvf-types is no_std by default
pii_stripping_exists false No PII detection or redaction crate
differential_privacy_exists false No DP primitives in codebase
federation_protocol_exists false No inter-user export/import protocol
gcloud_pubsub_integration false No Pub/Sub client code
gcloud_gcs_integration false No GCS object store client code
gcloud_firestore_integration false No Firestore client code
federated_manifest_segment false No 0x33 segment type
diff_privacy_proof_segment false No 0x34 segment type
redaction_log_segment false No 0x35 segment type
aggregate_weights_segment false No 0x36 segment type
wasm_export_path false No browser-side federation
aggregation_server false No multi-user aggregation service
mcp_federation_server false No MCP server for AI agent access
rest_api_server false No REST API server for programmatic access

1.2 Goal State (What Should Be True)

State Variable Required Value
federated_manifest_segment true -- 0x33 defined and wire-coded
diff_privacy_proof_segment true -- 0x34 defined and wire-coded
redaction_log_segment true -- 0x35 defined and wire-coded
aggregate_weights_segment true -- 0x36 defined and wire-coded
pii_stripping_exists true -- rvf-pii-strip crate with detection, redaction, attestation
differential_privacy_exists true -- rvf-diff-privacy crate with Gaussian mechanism, RDP accountant
federation_protocol_exists true -- rvf-federation crate with export builder, import validator, merger
gcloud_pubsub_integration true -- rvf-gcloud with Pub/Sub publish/subscribe
gcloud_gcs_integration true -- rvf-gcloud with GCS upload/download
gcloud_firestore_integration true -- rvf-gcloud with Firestore registry
aggregation_server true -- rvf-fed-aggregate with FedAvg, Byzantine tolerance
wasm_export_path true -- rvf-fed-wasm with browser PII strip + export
federation_adapter true -- rvf-adapters/federation connecting SONA + domain expansion
mcp_federation_server true -- mcp-federation crate with 6 tools + 4 resources over JSON-RPC 2.0
rest_api_server true -- rvf-fed-server crate with REST API, SSE events, Prometheus metrics
all_tests_pass true
feature_gated true -- all federation is behind federation feature flag

2. Action Inventory

Each action has: preconditions, effects, estimated cost (story points, 1-13), and dependencies.

Phase 0: Foundation -- Segment Types and Core Types

Action 0.1: Add Federation Segment Types to rvf-types

  • Preconditions: rvf_segment_types_defined == true
  • Effects: federated_manifest_segment = true, diff_privacy_proof_segment = true, redaction_log_segment = true, aggregate_weights_segment = true
  • Cost: 3 SP
  • Dependencies: None
  • Files Modified:
    • crates/rvf/rvf-types/src/segment_type.rs -- Add FederatedManifest = 0x33, DiffPrivacyProof = 0x34, RedactionLog = 0x35, AggregateWeights = 0x36
    • crates/rvf/rvf-types/src/federation.rs -- New module with header structs (FederatedManifestHeader, DiffPrivacyProofHeader, RedactionLogHeader, AggregateWeightsHeader)
    • crates/rvf/rvf-types/src/lib.rs -- Add pub mod federation; (feature-gated behind federation)
    • Tests: round-trip for all 4 new segment types, discriminant values

Action 0.2: Add Federation Segment Wire Codecs to rvf-wire

  • Preconditions: federated_manifest_segment == true
  • Effects: federation_wire_codecs = true
  • Cost: 5 SP
  • Dependencies: [0.1]
  • Files Modified:
    • crates/rvf/rvf-wire/src/federation_codec.rs -- New module: encode_federated_manifest, decode_federated_manifest, and equivalents for 0x34-0x36
    • crates/rvf/rvf-wire/src/lib.rs -- Add pub mod federation_codec; (feature-gated)
    • Tests: encode-decode round-trip for each new segment type, fuzz edge cases (truncated payloads, wrong magic)

Phase 1: PII Stripping

Action 1.1: Create rvf-pii-strip Crate

  • Preconditions: rvf_workspace_defined == true
  • Effects: pii_detection_exists = true
  • Cost: 8 SP
  • Dependencies: [0.1]
  • New Files:
    • crates/rvf/rvf-pii-strip/Cargo.toml -- deps: rvf-types, regex (std feature), serde
    • crates/rvf/rvf-pii-strip/src/lib.rs -- Module structure
    • crates/rvf/rvf-pii-strip/src/detect.rs -- PiiDetector with regex patterns for paths, IPs, emails, API keys, usernames, env refs
    • crates/rvf/rvf-pii-strip/src/redact.rs -- PiiRedactor with pseudonymization (deterministic per-export)
    • crates/rvf/rvf-pii-strip/src/attest.rs -- RedactionAttestor generating RedactionLog segment payload
    • crates/rvf/rvf-pii-strip/src/rules.rs -- RedactionRule config, RuleSet with default + custom rules
    • crates/rvf/rvf-pii-strip/src/pipeline.rs -- StripPipeline::new(rules).detect(payload).redact().attest() fluent API
    • Tests: detection accuracy for each PII type, pseudonym determinism, attest hash correctness, empty input, binary content (should pass through)

Action 1.2: Create rvf-pii-strip no_std Core

  • Preconditions: pii_detection_exists == true
  • Effects: pii_strip_nostd_core = true
  • Cost: 3 SP
  • Dependencies: [1.1]
  • Details: Extract regex-free pattern matching into no_std core that works in WASM. Uses simple byte-scanning for path separators, IP octets, sk- prefixes. Full regex detection remains in std feature.

Phase 2: Differential Privacy

Action 2.1: Create rvf-diff-privacy Crate

  • Preconditions: rvf_workspace_defined == true, diff_privacy_proof_segment == true
  • Effects: differential_privacy_exists = true
  • Cost: 8 SP
  • Dependencies: [0.1]
  • New Files:
    • crates/rvf/rvf-diff-privacy/Cargo.toml -- deps: rvf-types, rand, serde
    • crates/rvf/rvf-diff-privacy/src/lib.rs -- Module structure
    • crates/rvf/rvf-diff-privacy/src/mechanism.rs -- GaussianMechanism, LaplaceMechanism, ExponentialMechanism with calibrated noise
    • crates/rvf/rvf-diff-privacy/src/clipping.rs -- GradientClipper with L2 norm clipping, per-parameter and global
    • crates/rvf/rvf-diff-privacy/src/accountant.rs -- PrivacyAccountant using Renyi Differential Privacy (RDP) composition
    • crates/rvf/rvf-diff-privacy/src/budget.rs -- PrivacyBudget tracking cumulative epsilon/delta spend per contributor
    • crates/rvf/rvf-diff-privacy/src/proof.rs -- DiffPrivacyProofBuilder generating 0x34 segment payload
    • crates/rvf/rvf-diff-privacy/src/config.rs -- DiffPrivacyConfig { epsilon, delta, clipping_norm, noise_multiplier, mechanism }
    • Tests: noise calibration matches theoretical bounds, RDP composition is monotonically increasing, budget tracking, proof generation

Action 2.2: Create rvf-diff-privacy no_std Core

  • Preconditions: differential_privacy_exists == true
  • Effects: diff_privacy_nostd_core = true
  • Cost: 3 SP
  • Dependencies: [2.1]
  • Details: Core noise generation and clipping in no_std (uses rand which supports no_std). RDP accountant requires f64 math but can be no_std with libm.

Phase 3: Federation Protocol

Action 3.1: Create rvf-federation Crate

  • Preconditions: federation_wire_codecs == true, pii_detection_exists == true, differential_privacy_exists == true
  • Effects: federation_protocol_exists = true
  • Cost: 13 SP
  • Dependencies: [0.2, 1.1, 2.1]
  • New Files:
    • crates/rvf/rvf-federation/Cargo.toml -- deps: rvf-types, rvf-wire, rvf-crypto, rvf-pii-strip, rvf-diff-privacy, serde, serde_json
    • crates/rvf/rvf-federation/src/lib.rs -- Module structure
    • crates/rvf/rvf-federation/src/export.rs -- ExportBuilder:
      • .add_transfer_prior(prior) -- adds 0x30 segment
      • .add_policy_kernel(kernel) -- adds 0x31 segment
      • .add_cost_curve(curve) -- adds 0x32 segment
      • .add_sona_weights(weights) -- adds 0x36 segment
      • .set_contributor(pseudonym) -- sets contributor ID
      • .set_privacy_config(config) -- sets epsilon/delta
      • .set_pii_rules(rules) -- sets redaction rules
      • .build() -- runs PII strip pipeline, noise injection, generates manifest + redaction log + proof + witness + signature, returns Vec<u8>
    • crates/rvf/rvf-federation/src/import.rs -- ImportValidator:
      • .validate(data: &[u8]) -- parses segments, verifies signature, witness chain, privacy proof, redaction log
      • .extract_priors(), .extract_kernels(), .extract_curves(), .extract_weights()
      • Returns ValidatedImport with all segments + metadata
    • crates/rvf/rvf-federation/src/merge.rs -- VersionMerger:
      • .merge_transfer_prior(local, remote, weight) -- version-aware Beta parameter merging with dampening
      • .merge_policy_kernel(local_population, remote_kernel) -- inject remote kernel into population
      • .merge_sona_weights(local, remote, ewc_fisher) -- weighted average with EWC regularization
      • .merge_cost_curve(local_scoreboard, remote_curve) -- add as reference curve
    • crates/rvf/rvf-federation/src/policy.rs -- FederationPolicy:
      • Allowlist/denylist for segment types
      • Quality gate threshold
      • Minimum evidence threshold
      • Rate limit configuration
      • Privacy budget limit
    • crates/rvf/rvf-federation/src/manifest.rs -- FederatedManifestBuilder for 0x33 segment
    • crates/rvf/rvf-federation/src/version.rs -- Version compatibility checking and negotiation
    • Tests: full export/import round-trip, merge correctness, policy enforcement, version compatibility, signature verification, error cases

Action 3.2: Create rvf-adapters/federation

  • Preconditions: federation_protocol_exists == true, sona_federated_coordinator_exists == true, domain_expansion_engine_exists == true
  • Effects: federation_adapter = true
  • Cost: 8 SP
  • Dependencies: [3.1]
  • New Files:
    • crates/rvf/rvf-adapters/federation/Cargo.toml -- deps: rvf-federation, sona, ruvector-domain-expansion, rvf-adapter-sona
    • crates/rvf/rvf-adapters/federation/src/lib.rs -- Module structure
    • crates/rvf/rvf-adapters/federation/src/export_coordinator.rs -- FederationExportCoordinator:
      • Takes &SonaEngine and &DomainExpansionEngine
      • Extracts TransferPrior from MetaThompsonEngine
      • Extracts best PolicyKernel from PopulationSearch
      • Extracts CostCurve from AccelerationScoreboard
      • Extracts SONA LoRA weights for AggregateWeights
      • Applies quality gate and minimum evidence filter
      • Passes to rvf-federation::ExportBuilder
    • crates/rvf/rvf-adapters/federation/src/import_coordinator.rs -- FederationImportCoordinator:
      • Takes &mut SonaEngine and &mut DomainExpansionEngine
      • Uses rvf-federation::ImportValidator to validate
      • Uses rvf-federation::VersionMerger to merge
      • Updates local MetaThompsonEngine with merged priors
      • Injects kernels into PopulationSearch
      • Merges SONA weights with EWC protection
    • Tests: end-to-end export from real engines, import into fresh engines, verify acceleration after import

Phase 4: Google Cloud Integration

Action 4.1: Create rvf-gcloud Crate

  • Preconditions: federation_protocol_exists == true
  • Effects: gcloud_pubsub_integration = true, gcloud_gcs_integration = true, gcloud_firestore_integration = true
  • Cost: 13 SP
  • Dependencies: [3.1]
  • New Files:
    • crates/rvf/rvf-gcloud/Cargo.toml -- deps: google-cloud-pubsub, google-cloud-storage, google-cloud-firestore (or gcloud-sdk), tokio, serde, rvf-federation
    • crates/rvf/rvf-gcloud/src/lib.rs -- Module structure
    • crates/rvf/rvf-gcloud/src/pubsub.rs -- FederationPubSub:
      • publish_export_notification(manifest) -- publish FederatedManifest header to topic
      • subscribe_federation_events(filter) -- subscribe with domain/version filter
      • acknowledge(message_id) -- ack after successful import
      • Topic/subscription management
    • crates/rvf/rvf-gcloud/src/gcs.rs -- FederationStorage:
      • upload_export(domain, contributor, data) -- upload RVF to GCS with proper naming
      • download_export(path) -- download RVF from GCS
      • list_exports(domain, since) -- list available exports
      • delete_by_contributor(pseudonym) -- right-to-deletion support
      • Lifecycle policy configuration
    • crates/rvf/rvf-gcloud/src/firestore.rs -- FederationRegistry:
      • register_manifest(manifest) -- store manifest metadata
      • get_contributor_reputation(pseudonym) -- read reputation score
      • update_reputation(pseudonym, delta) -- update reputation
      • get_privacy_budget(pseudonym) -- read remaining budget
      • record_budget_spend(pseudonym, epsilon) -- deduct from budget
      • list_manifests(domain, limit) -- query manifest history
    • crates/rvf/rvf-gcloud/src/auth.rs -- IAM authentication and service account management
    • crates/rvf/rvf-gcloud/src/config.rs -- GCloudConfig { project_id, region, bucket, topic, collection }
    • Tests: mock-based tests for all GCloud operations (no real GCloud calls in unit tests), integration test behind gcloud-integration feature flag

Action 4.2: Extend Google Cloud Example

  • Preconditions: gcloud_pubsub_integration == true
  • Effects: gcloud_example_updated = true
  • Cost: 5 SP
  • Dependencies: [4.1, 3.2]
  • Files Modified:
    • examples/google-cloud/src/server.rs -- Add federation endpoints: POST /federation/submit, GET /federation/pull, POST /federation/aggregate, GET /federation/status
    • examples/google-cloud/src/federation.rs -- New module: handler implementations using rvf-gcloud and rvf-federation
    • examples/google-cloud/Cargo.toml -- Add rvf-federation, rvf-gcloud, rvf-adapters/federation deps
    • examples/google-cloud/cloudrun.yaml -- Add environment variables for federation config

Phase 5: Federated Aggregation

Action 5.1: Create rvf-fed-aggregate Crate

  • Preconditions: federation_protocol_exists == true, differential_privacy_exists == true
  • Effects: aggregation_server = true
  • Cost: 8 SP
  • Dependencies: [3.1, 2.1]
  • New Files:
    • crates/rvf/rvf-fed-aggregate/Cargo.toml -- deps: rvf-federation, rvf-diff-privacy, rvf-types, serde, tokio
    • crates/rvf/rvf-fed-aggregate/src/lib.rs -- Module structure
    • crates/rvf/rvf-fed-aggregate/src/round.rs -- AggregationRound:
      • Round lifecycle: Open -> Collecting -> Aggregating -> Published
      • submit(validated_import) -- add contributor
      • is_ready() -- check if min_participants reached or timeout
      • aggregate() -- trigger aggregation
    • crates/rvf/rvf-fed-aggregate/src/fedavg.rs -- FedAvgAggregator:
      • Weighted average of TransferPrior Beta parameters
      • Weighted average of PolicyKnobs numeric fields
      • Weighted average of SONA LoRA deltas
      • Weight = contributor_reputation * trajectory_count * quality_score
    • crates/rvf/rvf-fed-aggregate/src/fedprox.rs -- FedProxAggregator:
      • FedAvg + proximal term mu/2 * ||w_k - w_global||^2
      • For heterogeneous contributor distributions
    • crates/rvf/rvf-fed-aggregate/src/byzantine.rs -- ByzantineFilter:
      • IQR-based outlier detection on parameter vectors
      • Krum aggregation: select contributor closest to peers
      • Configurable tolerance threshold
    • crates/rvf/rvf-fed-aggregate/src/reputation.rs -- ReputationManager:
      • Score = f(avg_quality, trajectory_count, age, acceptance_rate)
      • Decay over time
      • Penalty for rejected submissions
    • Tests: FedAvg correctness (average of known inputs), Byzantine tolerance (inject outlier, verify exclusion), reputation scoring, round lifecycle

Phase 5B: MCP and API Interfaces

Action 5B.1: Create mcp-federation Crate (MCP Server)

  • Preconditions: federation_protocol_exists == true, gcloud_pubsub_integration == true
  • Effects: mcp_federation_server = true
  • Cost: 8 SP
  • Dependencies: [3.1, 4.1]
  • New Files:
    • crates/mcp-federation/Cargo.toml -- deps: rvf-federation, rvf-gcloud, rvf-pii-strip, rvf-diff-privacy, rvf-adapters/federation, serde, serde_json, tokio
    • crates/mcp-federation/src/lib.rs -- Module structure, McpFederationServer
    • crates/mcp-federation/src/server.rs -- JSON-RPC 2.0 stdio transport (same pattern as mcp-gate/src/server.rs):
      • McpFederationServer::new(config) -- initialize with federation config
      • McpFederationServer::run() -- main event loop: read stdin, dispatch, write stdout
      • Handles initialize, tools/list, tools/call, resources/list, resources/read
    • crates/mcp-federation/src/tools.rs -- McpFederationTools:
      • federation_export -- extracts, strips PII, applies noise, signs, uploads
      • federation_import -- pulls, validates, merges into local engines
      • federation_status -- reads budget, recent activity, reputation
      • federation_search -- queries Firestore manifest registry
      • federation_budget -- reads privacy budget details
      • federation_aggregate -- triggers server-side aggregation round
    • crates/mcp-federation/src/resources.rs -- McpFederationResources:
      • federation://domains -- list of federated domains with stats
      • federation://contributors -- pseudonymized contributor list + reputation
      • federation://rounds/{round_id} -- aggregation round details
      • federation://budget -- privacy budget for current contributor
    • crates/mcp-federation/src/schemas.rs -- JSON Schema definitions for all tool inputs/outputs
    • Tests: tool dispatch, resource resolution, schema validation, error handling

Action 5B.2: Create rvf-fed-server Crate (REST API)

  • Preconditions: federation_protocol_exists == true, gcloud_pubsub_integration == true, aggregation_server == true
  • Effects: rest_api_server = true
  • Cost: 8 SP
  • Dependencies: [3.1, 4.1, 5.1]
  • New Files:
    • crates/rvf/rvf-fed-server/Cargo.toml -- deps: rvf-federation, rvf-gcloud, rvf-fed-aggregate, axum, tower, tower-http, tokio, serde, serde_json, tracing, metrics, metrics-exporter-prometheus
    • crates/rvf/rvf-fed-server/src/lib.rs -- Module structure, FederationServer
    • crates/rvf/rvf-fed-server/src/routes.rs -- axum Router:
      • POST /v1/exports -- accept RVF bytes, validate, store in GCS, publish event
      • GET /v1/exports/{id} -- download RVF export by ID
      • GET /v1/exports?domain=&since=&limit= -- list exports
      • DELETE /v1/exports/{id} -- contributor deletes own export
      • POST /v1/aggregates -- trigger aggregation round
      • GET /v1/aggregates/{round_id} -- round status
      • GET /v1/aggregates/latest?domain= -- latest aggregate RVF
      • GET /v1/domains -- list federated domains
      • GET /v1/contributors/{pseudonym} -- contributor profile
      • GET /v1/contributors/{pseudonym}/budget -- privacy budget
      • GET /v1/health -- health check
      • GET /v1/metrics -- Prometheus metrics
      • GET /v1/events?domain= -- SSE stream
    • crates/rvf/rvf-fed-server/src/auth.rs -- Authentication middleware:
      • Bearer token validation (SHAKE-256 hash lookup in Firestore)
      • Ed25519 signature verification (X-Federation-Signature, X-Federation-PublicKey)
    • crates/rvf/rvf-fed-server/src/rate_limit.rs -- Tower rate limiting middleware:
      • Per-contributor, per-endpoint configurable limits
      • Token bucket algorithm
    • crates/rvf/rvf-fed-server/src/sse.rs -- Server-Sent Events:
      • new_export, aggregation_complete, import_available event types
      • Domain-filtered subscriptions
    • crates/rvf/rvf-fed-server/src/metrics.rs -- Prometheus metrics registration and export
    • Tests: route handler tests with mock backends, auth middleware, rate limiting, SSE stream

Phase 6: WASM Export Path

Action 6.1: Create rvf-fed-wasm Crate

  • Preconditions: pii_strip_nostd_core == true, diff_privacy_nostd_core == true, federation_protocol_exists == true
  • Effects: wasm_export_path = true
  • Cost: 5 SP
  • Dependencies: [1.2, 2.2, 3.1]
  • New Files:
    • crates/rvf/rvf-fed-wasm/Cargo.toml -- deps: rvf-types, rvf-wire, rvf-crypto, rvf-pii-strip (no_std), rvf-diff-privacy (no_std), wasm-bindgen, js-sys
    • crates/rvf/rvf-fed-wasm/src/lib.rs -- wasm-bindgen exports:
      • FederationExporter::new(config) -- create exporter with epsilon/delta/rules
      • FederationExporter::add_transfer_prior(bytes) -- add prior segment
      • FederationExporter::add_policy_kernel(bytes) -- add kernel segment
      • FederationExporter::add_cost_curve(bytes) -- add curve segment
      • FederationExporter::build() -- strip PII, add noise, sign, return Uint8Array
    • crates/rvf/rvf-fed-wasm/src/js_types.rs -- JavaScript-friendly type wrappers
    • npm package config for @ruvector/rvf-fed-wasm
    • Tests: build with wasm-pack test --headless --chrome

Phase 7: Integration and Testing

Action 7.1: Integration Tests

  • Preconditions: All previous actions complete
  • Effects: all_tests_pass = true
  • Cost: 8 SP
  • Dependencies: [All above]
  • New Files:
    • crates/rvf/tests/rvf-integration/src/federation.rs -- Integration tests:
      • Full export/import round-trip with real SONA and DomainExpansion engines
      • PII stripping verification (inject known PII, verify redaction)
      • Differential privacy verification (noise bounds check)
      • Version compatibility matrix (v1 export, v1 import; future v2 considerations)
      • Byzantine tolerance verification (inject poisoned export, verify exclusion)
      • Privacy budget exhaustion (export until budget depleted, verify rejection)
      • Signature verification (tamper with segment, verify rejection)
      • Witness chain verification (reorder segments, verify rejection)
      • Federated averaging correctness (known inputs, verify output)
      • End-to-end acceleration test (import learning, verify faster convergence)

Action 7.2: Update Workspace Configuration

  • Preconditions: All new crates created
  • Effects: feature_gated = true
  • Cost: 2 SP
  • Dependencies: [All new crate creations]
  • Files Modified:
    • crates/rvf/Cargo.toml -- Add new members to workspace, add workspace dependencies
    • Each existing crate that gains federation feature gates

Action 7.3: CLI Extension

  • Preconditions: federation_adapter == true, gcloud_pubsub_integration == true
  • Effects: cli_federation_commands = true
  • Cost: 5 SP
  • Dependencies: [3.2, 4.1]
  • Files Modified:
    • crates/rvf/rvf-cli/ -- Add subcommands:
      • rvf federation export --domain <id> --epsilon <val> --output <path>
      • rvf federation import --input <path>
      • rvf federation subscribe --domains <ids> --gcloud-config <path>
      • rvf federation status -- show privacy budget, contribution history
      • rvf federation revoke --contributor <pseudonym> -- right-to-deletion

3. GOAP Plan: Optimal Action Sequence

Using A* search through the action dependency graph, the optimal implementation order is:

MILESTONE 1: FOUNDATION (Week 1-2)
===================================
Sprint 1 (Week 1):
  [0.1] Add Federation Segment Types        (3 SP)
  [0.2] Add Federation Wire Codecs          (5 SP)
                                      Total: 8 SP

Sprint 2 (Week 2):
  [1.1] Create rvf-pii-strip               (8 SP)
  [2.1] Create rvf-diff-privacy            (8 SP)  -- parallel with 1.1
                                      Total: 16 SP


MILESTONE 2: CORE PROTOCOL (Week 3-4)
======================================
Sprint 3 (Week 3):
  [1.2] PII Strip no_std Core              (3 SP)
  [2.2] Diff Privacy no_std Core           (3 SP)  -- parallel with 1.2
  [3.1] Create rvf-federation (start)      (8 SP of 13)
                                      Total: 14 SP

Sprint 4 (Week 4):
  [3.1] Create rvf-federation (complete)   (5 SP remaining)
  [3.2] Create rvf-adapters/federation     (8 SP)
                                      Total: 13 SP


MILESTONE 3: CLOUD + AGGREGATION (Week 5-6)
=============================================
Sprint 5 (Week 5):
  [4.1] Create rvf-gcloud                  (13 SP)
                                      Total: 13 SP

Sprint 6 (Week 6):
  [5.1] Create rvf-fed-aggregate           (8 SP)
  [4.2] Extend Google Cloud Example        (5 SP)  -- parallel with 5.1
                                      Total: 13 SP


MILESTONE 4: INTERFACES + WASM (Week 7-8)
==========================================
Sprint 7 (Week 7):
  [5B.1] Create mcp-federation (MCP)       (8 SP)
  [5B.2] Create rvf-fed-server (REST API)  (8 SP)  -- parallel with 5B.1
                                      Total: 16 SP

Sprint 8 (Week 8):
  [6.1] Create rvf-fed-wasm               (5 SP)
  [7.2] Update Workspace Configuration    (2 SP)
  [7.3] CLI Extension                      (5 SP)  -- parallel with 6.1
                                      Total: 12 SP


MILESTONE 5: INTEGRATION (Week 9)
==================================
Sprint 9 (Week 9):
  [7.1] Integration Tests                  (8 SP)
                                      Total: 8 SP


TOTAL: 113 SP across 9 weeks (5 milestones)

Dependency Graph (Topological Order)

[0.1] Segment Types
  |
  +---> [0.2] Wire Codecs
  |       |
  |       +---> [3.1] rvf-federation ──────────────────────┐
  |       |       |                                         |
  +---> [1.1] rvf-pii-strip ──> [1.2] no_std core ──> [6.1] rvf-fed-wasm
  |       |                                                 |
  +---> [2.1] rvf-diff-privacy ──> [2.2] no_std core ──────┘
          |                           |
          +---> [5.1] rvf-fed-aggregate ──> [5B.2] rvf-fed-server (REST API)
          |                                    |
          +---> [3.1] ──> [3.2] rvf-adapters/federation
          |                  |
          +---> [4.1] rvf-gcloud ──> [4.2] Example update
          |       |            |
          |       |            +---> [5B.1] mcp-federation (MCP Server)
          |       |            |
          +-------+---> [7.3] CLI Extension
                  |
                  +---> [7.2] Workspace Config
                  |
                  +---> [7.1] Integration Tests

Critical Path

[0.1] -> [0.2] -> [3.1] -> [3.2] -> [7.1]
                     ^
                     |
[1.1] -> [1.2] -----+
                     |
[2.1] -> [2.2] -----+

Interface crates (off critical path, parallel):
[3.1] + [4.1] -> [5B.1] mcp-federation
[3.1] + [4.1] + [5.1] -> [5B.2] rvf-fed-server

The critical path runs through the segment types, wire codecs, and federation protocol. PII stripping and differential privacy can proceed in parallel but must complete before rvf-federation begins its final integration. The MCP server and REST API crates are off the critical path — they depend on rvf-federation and rvf-gcloud but can be built in parallel with WASM and CLI work.


4. Detailed Implementation Notes

4.1 Segment Type Registration

In crates/rvf/rvf-types/src/segment_type.rs, add to the enum:

/// Federated learning manifest (contributor, privacy budget, segment list).
FederatedManifest = 0x33,
/// Differential privacy proof (epsilon, delta, mechanism, noise proof).
DiffPrivacyProof = 0x34,
/// PII redaction attestation (counts, hashes, rules fired).
RedactionLog = 0x35,
/// Federated-averaged weights (LoRA deltas, participation, convergence).
AggregateWeights = 0x36,

Add to TryFrom<u8>:

0x33 => Ok(Self::FederatedManifest),
0x34 => Ok(Self::DiffPrivacyProof),
0x35 => Ok(Self::RedactionLog),
0x36 => Ok(Self::AggregateWeights),

4.2 PII Detection Patterns

Core regex patterns for rvf-pii-strip:

const PATH_UNIX: &str = r"(?:/(?:home|Users|tmp|var|etc|opt)/[^\s\x00-\x1f]+)";
const PATH_WINDOWS: &str = r"(?:[A-Za-z]:\\(?:Users|Windows|Program Files)[^\s\x00-\x1f]*)";
const IPV4: &str = r"\b(?:\d{1,3}\.){3}\d{1,3}\b";
const IPV6: &str = r"\b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b";
const EMAIL: &str = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b";
const API_KEY_OPENAI: &str = r"\bsk-(?:proj-)?[A-Za-z0-9]{20,}\b";
const API_KEY_AWS: &str = r"\bAKIA[A-Z0-9]{16}\b";
const API_KEY_GITHUB: &str = r"\bgh[ps]_[A-Za-z0-9]{36,}\b";
const BEARER_TOKEN: &str = r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*\b";
const ENV_VAR_UNIX: &str = r"\$(?:HOME|USER|PATH|SHELL|TMPDIR|HOSTNAME)\b";
const ENV_VAR_WIN: &str = r"%(?:USERPROFILE|USERNAME|COMPUTERNAME|TEMP|TMP)%";

4.3 Gaussian Mechanism Calibration

For rvf-diff-privacy, the Gaussian mechanism adds noise:

/// Calibrate noise for (epsilon, delta)-differential privacy.
///
/// sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
pub fn calibrate_gaussian(sensitivity: f64, epsilon: f64, delta: f64) -> f64 {
    sensitivity * (2.0 * (1.25 / delta).ln()).sqrt() / epsilon
}

/// Add calibrated Gaussian noise to a parameter vector.
pub fn add_gaussian_noise(
    params: &mut [f32],
    sensitivity: f64,
    epsilon: f64,
    delta: f64,
    rng: &mut impl Rng,
) {
    let sigma = calibrate_gaussian(sensitivity, epsilon, delta) as f32;
    let dist = rand_distr::Normal::new(0.0, sigma).unwrap();
    for p in params.iter_mut() {
        *p += rng.sample(dist);
    }
}

4.4 Renyi Differential Privacy Accountant

/// RDP accountant for privacy budget tracking.
///
/// For the Gaussian mechanism with noise multiplier sigma:
///   RDP(alpha) = alpha / (2 * sigma^2)
///
/// Convert RDP to (epsilon, delta)-DP:
///   epsilon = RDP(alpha) + ln(1/delta) / (alpha - 1) - ln(alpha) / (alpha - 1)
///
/// Composition: RDP values add across multiple queries.
pub struct RdpAccountant {
    /// Accumulated RDP values at each alpha order.
    rdp_values: Vec<(f64, f64)>,  // (alpha, accumulated_rdp)
    /// Alpha orders to track.
    alpha_orders: Vec<f64>,
}

impl RdpAccountant {
    pub fn new() -> Self {
        let alpha_orders: Vec<f64> = (2..=256).map(|a| a as f64).collect();
        let rdp_values = alpha_orders.iter().map(|&a| (a, 0.0)).collect();
        Self { rdp_values, alpha_orders }
    }

    pub fn add_gaussian_query(&mut self, sigma: f64) {
        for (alpha, rdp) in self.rdp_values.iter_mut() {
            *rdp += *alpha / (2.0 * sigma * sigma);
        }
    }

    pub fn get_epsilon(&self, delta: f64) -> f64 {
        self.rdp_values.iter()
            .map(|(alpha, rdp)| rdp + (1.0 / delta).ln() / (alpha - 1.0))
            .fold(f64::INFINITY, f64::min)
    }
}

4.5 Version-Aware Prior Merging

/// Merge a remote TransferPrior into a local one.
///
/// Uses evidence-weighted blending: the source with more training cycles
/// gets higher weight. Then applies sqrt-dampening to prevent over-confidence.
pub fn merge_transfer_priors(
    local: &TransferPrior,
    remote: &TransferPrior,
) -> TransferPrior {
    let total_cycles = local.training_cycles + remote.training_cycles;
    let remote_weight = if total_cycles > 0 {
        remote.training_cycles as f32 / total_cycles as f32
    } else {
        0.5
    };
    let local_weight = 1.0 - remote_weight;

    let mut merged = TransferPrior::uniform(local.source_domain.clone());
    merged.training_cycles = total_cycles;

    // Collect all bucket/arm combinations from both
    let all_buckets: HashSet<_> = local.bucket_priors.keys()
        .chain(remote.bucket_priors.keys())
        .collect();

    for bucket in all_buckets {
        let local_arms = local.bucket_priors.get(bucket);
        let remote_arms = remote.bucket_priors.get(bucket);

        let all_arms: HashSet<_> = local_arms.iter()
            .flat_map(|m| m.keys())
            .chain(remote_arms.iter().flat_map(|m| m.keys()))
            .collect();

        let mut merged_arms = HashMap::new();
        for arm in all_arms {
            let l = local_arms
                .and_then(|m| m.get(arm))
                .unwrap_or(&BetaParams::uniform());
            let r = remote_arms
                .and_then(|m| m.get(arm))
                .unwrap_or(&BetaParams::uniform());

            // Weighted blend
            let alpha = l.alpha * local_weight + r.alpha * remote_weight;
            let beta = l.beta * local_weight + r.beta * remote_weight;

            // Sqrt-dampening (same as MetaThompsonEngine::init_domain_with_transfer)
            let dampened = BetaParams {
                alpha: 1.0 + (alpha - 1.0).sqrt(),
                beta: 1.0 + (beta - 1.0).sqrt(),
            };

            merged_arms.insert(arm.clone(), dampened);
        }
        merged.bucket_priors.insert(bucket.clone(), merged_arms);
    }

    merged
}

4.6 FedAvg Implementation

/// Federated averaging of TransferPriors.
///
/// weight_k = reputation_k * trajectory_count_k * avg_quality_k
/// w_avg = sum(weight_k * prior_k) / sum(weight_k)
pub fn fedavg_priors(
    contributions: &[(TransferPrior, f32)],  // (prior, weight)
) -> TransferPrior {
    let total_weight: f32 = contributions.iter().map(|(_, w)| w).sum();
    if total_weight < 1e-10 || contributions.is_empty() {
        return TransferPrior::uniform(DomainId("aggregate".into()));
    }

    let mut result = TransferPrior::uniform(DomainId("aggregate".into()));

    // Collect all unique bucket/arm combinations
    let all_buckets: HashSet<_> = contributions.iter()
        .flat_map(|(p, _)| p.bucket_priors.keys())
        .collect();

    for bucket in &all_buckets {
        let all_arms: HashSet<_> = contributions.iter()
            .flat_map(|(p, _)| {
                p.bucket_priors.get(*bucket)
                    .map(|m| m.keys().collect::<Vec<_>>())
                    .unwrap_or_default()
            })
            .collect();

        let mut merged_arms = HashMap::new();
        for arm in &all_arms {
            let mut alpha_sum = 0.0;
            let mut beta_sum = 0.0;

            for (prior, weight) in contributions {
                let params = prior.get_prior(bucket, arm);
                let normalized_weight = weight / total_weight;
                alpha_sum += params.alpha * normalized_weight;
                beta_sum += params.beta * normalized_weight;
            }

            merged_arms.insert((*arm).clone(), BetaParams {
                alpha: alpha_sum,
                beta: beta_sum,
            });
        }
        result.bucket_priors.insert((*bucket).clone(), merged_arms);
    }

    result.training_cycles = contributions.iter()
        .map(|(p, _)| p.training_cycles)
        .sum();

    result
}

4.7 Byzantine-Tolerant Aggregation (Krum)

/// Krum aggregation: select the contribution closest to its peers.
///
/// For each contribution k, compute:
///   score_k = sum of distances to (N - f - 1) nearest neighbors
/// where f = ceil(N/3) - 1 (Byzantine tolerance).
///
/// Select the contribution with the minimum score.
pub fn krum_select(
    contributions: &[(TransferPrior, f32)],
) -> Option<usize> {
    let n = contributions.len();
    if n < 4 { return Some(0); }  // Need at least 4 for Byzantine tolerance

    let f = (n as f32 / 3.0).ceil() as usize - 1;
    let neighbors_to_check = n - f - 1;

    // Flatten each prior into a parameter vector for distance computation
    let vectors: Vec<Vec<f32>> = contributions.iter()
        .map(|(p, _)| flatten_prior(p))
        .collect();

    // Compute pairwise distances
    let mut scores = vec![0.0f32; n];
    for i in 0..n {
        let mut distances: Vec<f32> = (0..n)
            .filter(|&j| j != i)
            .map(|j| l2_distance(&vectors[i], &vectors[j]))
            .collect();
        distances.sort_by(|a, b| a.partial_cmp(b).unwrap());
        scores[i] = distances.iter().take(neighbors_to_check).sum();
    }

    scores.iter()
        .enumerate()
        .min_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(idx, _)| idx)
}

5. Risk Mitigations

5.1 Poisoning Attack Prevention

Threat: Malicious contributor submits crafted learning data to degrade other users' models.

Mitigations (defense in depth):

  1. Byzantine-tolerant aggregation (Krum) excludes outlier contributions
  2. Reputation system: new contributors have low weight; weight grows with successful contributions
  3. Signature verification: every export is signed, attributable to a pseudonym
  4. Quality gate: only learning from high-quality trajectories is exported
  5. Differential privacy noise limits the impact of any single contribution
  6. Dampened priors: imports are sqrt-dampened before integration

5.2 Privacy Budget Management

Threat: User exports too frequently, accumulating enough epsilon to allow reconstruction.

Mitigations:

  1. PrivacyBudget tracked per-contributor in Firestore
  2. Each export's DiffPrivacyProof records epsilon spent
  3. Server rejects exports when cumulative epsilon exceeds configurable limit (default: 10.0)
  4. Alert at 80% budget usage
  5. Budget resets annually (configurable)

5.3 Backward Compatibility

Threat: Adding new segment types breaks existing RVF readers.

Mitigations:

  1. RVF's forward compatibility: unknown segment types are skipped by readers that do not recognize them (existing behavior)
  2. New segments use 0x33-0x36 range, which existing TryFrom<u8> returns Err(_) for
  3. All federation code is behind federation Cargo feature flag
  4. FederatedManifest header includes a format version field for future evolution

5.4 Regulatory Compliance

Threat: Federation data subject to GDPR/CCPA despite PII stripping.

Mitigations:

  1. PII stripping is mandatory at the export boundary, not optional
  2. RedactionLog provides auditable proof that stripping occurred
  3. Contributor pseudonym (SHAKE-256 hash) is the only identifier in cloud
  4. Right-to-deletion: revoke pseudonym -> delete all GCS objects -> Firestore cleanup
  5. Differential privacy provides mathematical guarantee that individual contributions cannot be reconstructed

6. Monitoring and Observability

6.1 Metrics

Metric Type Description
federation.exports.total Counter Total exports submitted
federation.imports.total Counter Total imports processed
federation.rejections.total{reason} Counter Imports rejected, labeled by reason
federation.pii.detections{type} Counter PII detections by type
federation.privacy.budget.used{contributor} Gauge Epsilon spent per contributor
federation.aggregate.rounds Counter Aggregation rounds completed
federation.aggregate.participants Histogram Participants per round
federation.acceleration.factor Gauge Last measured acceleration from imports
federation.latency.export_ms Histogram Export build time
federation.latency.import_ms Histogram Import + merge time

6.2 Structured Logging

All federation operations emit structured log events:

  • event=federation_export contributor=<pseudonym> domain=<id> segments=<count> epsilon=<val>
  • event=federation_import source=<pseudonym> domain=<id> valid=<bool> reason=<str>
  • event=federation_aggregate round=<id> participants=<count> method=<fedavg|krum>
  • event=pii_detection type=<path|ip|key> count=<n>
  • event=privacy_budget contributor=<pseudonym> remaining=<epsilon>

7. Testing Strategy

7.1 Unit Tests (per crate)

Crate Test Focus Est. Count
rvf-types (federation) Segment type discriminants, header struct sizes ~10
rvf-wire (federation) Codec round-trips, malformed input handling ~15
rvf-pii-strip Detection patterns, redaction determinism, attestation hashes ~30
rvf-diff-privacy Noise calibration, RDP composition, budget tracking ~25
rvf-federation Export/import round-trip, policy enforcement, version compat ~30
rvf-fed-aggregate FedAvg math, Krum selection, reputation scoring ~20
rvf-gcloud Mock-based GCS/PubSub/Firestore operations ~25
rvf-adapters/federation Coordinator export/import, engine integration ~15

7.2 Integration Tests

  • End-to-end export from real SONA + DomainExpansion -> PII strip -> noise -> sign -> validate -> import -> verify acceleration
  • Multi-contributor aggregation round with FedAvg
  • Byzantine tolerance with injected outlier
  • Privacy budget exhaustion and rejection
  • WASM export path (headless browser test)

7.3 Property-Based Tests

  • PII detector: any string matching a PII pattern must be redacted
  • Differential privacy: output distribution must satisfy (epsilon, delta) bounds
  • Witness chain: reordering segments must fail verification
  • FedAvg: result is a convex combination of inputs

8. Open Questions

  1. ML-DSA-65 vs Ed25519: ADR-057 mentions both. Ed25519 is available now in rvf-crypto. ML-DSA-65 (post-quantum) would require adding pqcrypto-dilithium dependency. Recommendation: start with Ed25519, add ML-DSA-65 as a future optional feature.

  2. Reputation bootstrapping: New contributors start with no reputation. How much weight should their first contribution receive? Recommendation: fixed minimum weight of 0.1 for first 5 contributions, then reputation-based.

  3. Cross-region replication: Should GCS buckets be multi-region? Recommendation: start single-region (us-central1), add multi-region when >100 contributors.

  4. Aggregation trigger: Time-based (hourly) or participant-based (every N submissions)? Recommendation: participant-based with timeout fallback. min_participants=5, max_wait=3600s.

  5. SONA weight granularity: Export full LoRA matrices or just the rank-1/rank-2 deltas? Recommendation: export rank-matched deltas only (typically 2hidden_dimrank floats = 1024 floats for rank=2, dim=256). Full matrices are unnecessary for transfer.