# ADR-001: Anytime-Valid Coherence Gate **Status**: Proposed **Date**: 2026-01-17 **Authors**: ruv.io, RuVector Team **Deciders**: Architecture Review Board **SDK**: Claude-Flow ## Version History | Version | Date | Author | Changes | |---------|------|--------|---------| | 0.1 | 2026-01-17 | ruv.io | Initial draft with three-filter architecture | | 0.2 | 2026-01-17 | ruv.io | Added security hardening, performance optimization | | 0.3 | 2026-01-17 | ruv.io | Added 256-tile WASM fabric mapping | | 0.4 | 2026-01-17 | ruv.io | Added API contract, migration, observability | | 0.5 | 2026-01-17 | ruv.io | Added hybrid agent/human workflow | | 0.6 | 2026-01-17 | ruv.io | Added testing strategy, config format, error recovery | ## Plain Language Summary **What is it?** An Anytime-Valid Coherence Gate is a small control loop that decides, at any moment: > "Is it safe to act right now, or should we pause or escalate?" It does not try to be smart. It tries to be **safe**, **calm**, and **correct** about permission. **Why "anytime-valid"?** Because you can stop the computation at any time and still trust the decision. Like a smoke detector: - It can keep listening forever - The moment it has enough evidence, it triggers - If you stop listening early, whatever it already concluded is still valid You are not waiting for a model to finish thinking. You are continuously monitoring stability. **Why "coherence"?** Coherence means: does the system's current state agree with itself? In RuVector, coherence is measured from structure: - RuVector holds relationships as vectors plus a graph - Min-cut and boundary signals tell you when the graph is becoming fragile or splitting into conflicting regions - If the system is splitting, you do not let it take big actions **What it outputs:** | Decision | Meaning | |----------|---------| | **Permit** | Stable enough, proceed | | **Defer** | Uncertain, escalate to a stronger model or human | | **Deny** | Unstable or policy-violating, block the action | Every decision returns a short "receipt" explaining why. **A concrete example:** An agent wants to push a config change to a network device. - If the dependency graph is stable and similar changes worked before → **Permit** - If signals are weird (new dependencies, new actors, drift) → **Defer** and ask for confirmation - If the change crosses a fragile boundary (touches a partition already unstable) → **Deny** **Why it matters:** It turns autonomy into something enterprises can trust because: - Actions are bounded - Uncertainty is handled explicitly - You get an audit trail *"Attention becomes a permission system, not a popularity contest"* — applied to whole-system actions instead of token attention. --- ## Context The RuVector ecosystem requires a principled mechanism for controlling autonomous agent actions with: - **Formal safety guarantees** under distribution shift - **Computational efficiency** suitable for real-time enforcement - **Auditable decision trails** with cryptographic receipts Current approaches (threshold classifiers, rule-based systems, periodic audits) lack one or more of these properties. This ADR proposes the **Anytime-Valid Coherence Gate (AVCG)** - a 3-way algorithmic combination that converts coherence measurement into a deterministic control loop. ## Decision We will implement an Anytime-Valid Coherence Gate that integrates three cutting-edge algorithmic components: ### 1. Dynamic Min-Cut with Witness Partitions **Source**: El-Hayek, Henzinger, Li (arXiv:2512.13105, December 2025) **Key Innovation**: Exact deterministic n^{o(1)} update time for cuts up to 2^{Θ(log^{3/4-c}n)} **Integration**: - Extends existing `SubpolynomialMinCut` in `ruvector-mincut/src/subpolynomial/mod.rs` - Leverages existing `WitnessTree` for explicit partition certificates - Uses deterministic `LocalKCut` for local cut verification **Role in Gate**: Provides the **structural coherence signal** - identifies minimal intervention points in the agent action graph with explicit witness partitions showing which actions form the critical boundary to unsafe states. ### 2. Online Conformal Prediction with Shift-Awareness **Sources**: - Retrospective Adjustment (arXiv:2511.04275, November 2025) - Conformal Optimistic Prediction (COP) (December 2025) - CORE: RL-based Conformal Regression (October 2025) **Key Innovation**: Distribution-free coverage guarantees that adapt to arbitrary distribution shift with faster recalibration via retrospective adjustment. **Integration**: - New module: `ruvector-mincut/src/conformal/` for prediction sets - Interfaces with existing `GatePolicy` thresholds - Wraps action outcome predictions with calibrated uncertainty **Role in Gate**: Provides the **predictive uncertainty signal** - quantifies confidence in action outcomes, triggering DEFER when prediction sets are too large. ### 3. E-Values and E-Processes for Anytime-Valid Inference **Sources**: - Ramdas & Wang "Hypothesis Testing with E-values" (FnTStA 2025) - ICML 2025 Tutorial on SAVI - Sequential Randomization Tests (arXiv:2512.04366, December 2025) **Key Innovation**: Evidence accumulation that remains valid at any stopping time, with multiplicative composition across experiments. **Definition**: E-value e satisfies E[e] ≤ 1 under null hypothesis. E-processes are nonnegative supermartingales with E_0 = 1. **Integration**: - New module: `ruvector-mincut/src/eprocess/` for evidence tracking - Integrates with existing `CutCertificate` for audit trails - Enables anytime-valid stopping decisions **Role in Gate**: Provides the **evidential validity signal** - accumulates statistical evidence for/against coherence with formal Type I error control at any stopping time. ## Gate Architecture ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ ANYTIME-VALID COHERENCE GATE │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │ │ DYNAMIC MIN-CUT │ │ CONFORMAL │ │ E-PROCESS │ │ │ │ (Structural) │ │ (Predictive) │ │ (Evidential) │ │ │ │ │ │ │ │ │ │ │ │ SubpolynomialMC │ │ ShiftAdaptive │ │ CoherenceTest │ │ │ │ WitnessTree │───▶│ PredictionSet │───▶│ EvidenceAccum │ │ │ │ LocalKCut │ │ COP/CORE │ │ StoppingRule │ │ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌────────────────────────────────────────────────────────────────┐ │ │ │ DECISION LOGIC │ │ │ │ │ │ │ │ PERMIT: E_t > τ_permit ∧ action ∉ CriticalCut ∧ |C_t| small │ │ │ │ DEFER: |C_t| large ∨ τ_deny < E_t < τ_permit │ │ │ │ DENY: E_t < τ_deny ∨ action ∈ WitnessPartition(unsafe) │ │ │ │ │ │ │ └────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ WITNESS RECEIPT │ │ │ │ (cut + conf + e) │ │ │ └─────────────────────┘ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ## Integration with Existing Architecture ### Extension Points | Component | Current Implementation | AVCG Extension | |-----------|----------------------|----------------| | `GatePacket` | λ as point estimate | Add `lambda_confidence_q15`, `e_value_log_q15` | | `GateController` | Rule-based thresholds | Add `AnytimeGatePolicy` with adaptive thresholds | | `WitnessTree` | Cut value only | Add `ConfidenceWitness` with staleness tracking | | `CutCertificate` | Static verification | Add `EvidenceReceipt` with e-value trace | | `TierDecision` | Fixed tiers | Add `required_confidence_for_tier` | ### New Modules ``` ruvector-mincut/ ├── src/ │ ├── conformal/ # NEW: Online conformal prediction │ │ ├── mod.rs │ │ ├── prediction_set.rs │ │ ├── cop.rs # Conformal Optimistic Prediction │ │ ├── retrospective.rs # Retrospective adjustment │ │ └── core.rs # RL-based conformal │ ├── eprocess/ # NEW: E-value and e-process tracking │ │ ├── mod.rs │ │ ├── evalue.rs │ │ ├── evidence_accum.rs │ │ ├── stopping.rs │ │ └── mixture.rs │ ├── anytime_gate/ # NEW: Integrated gate controller │ │ ├── mod.rs │ │ ├── policy.rs │ │ ├── decision.rs │ │ └── receipt.rs │ └── ...existing modules... ``` ## Decision Rules ### Permit Conditions (all must hold) 1. E-process value E_t > τ_permit (sufficient evidence of coherence) 2. Action not in witness partition of critical cut 3. Conformal prediction set |C_t| < θ_confidence (confident prediction) ### Defer Conditions (any triggers) 1. Conformal prediction set |C_t| > θ_uncertainty (uncertain outcome) 2. E-process in indeterminate range: τ_deny < E_t < τ_permit 3. Deadline approaching without sufficient confidence ### Deny Conditions (any triggers) 1. E-process value E_t < τ_deny (strong evidence of incoherence) 2. Action in witness partition crossing to unsafe states 3. Structural impossibility via min-cut topology ## Threshold Configuration | Threshold | Meaning | Recommended Default | |-----------|---------|---------------------| | τ_deny | E-process level indicating incoherence | 0.01 (1% false alarm) | | τ_permit | E-process level indicating coherence | 100 (strong evidence) | | θ_uncertainty | Conformal set size requiring deferral | Task-dependent | | θ_confidence | Conformal set size for confident permit | Task-dependent | ## Witness Receipt Structure ```rust pub struct WitnessReceipt { /// Timestamp of decision pub timestamp: u64, /// Action that was evaluated pub action_id: ActionId, /// Gate decision pub decision: GateDecision, // Structural witness (from min-cut) pub cut_value: f64, pub witness_partition: (Vec, Vec), pub critical_edges: Vec, // Predictive witness (from conformal) pub prediction_set: ConformalSet, pub coverage_target: f32, pub shift_adaptation_rate: f32, // Evidential witness (from e-process) pub e_value: f64, pub e_process_cumulative: f64, pub stopping_valid: bool, // Cryptographic seal pub receipt_hash: [u8; 32], } ``` ## Security Hardening ### Threat Model | Threat Actor | Capabilities | Target | Impact | |--------------|--------------|--------|--------| | **Malicious Agent** | Action injection, timing manipulation | Gate bypass | Unauthorized actions executed | | **Network Adversary** | Message interception, replay | Receipt forgery | False audit trail | | **Insider Threat** | Threshold modification, key access | Policy manipulation | Safety guarantees voided | | **Byzantine Node** | Arbitrary behavior in distributed gate | Consensus corruption | Inconsistent decisions | ### Cryptographic Requirements #### Receipt Signing (CRITICAL) ```rust pub struct WitnessReceipt { // ... existing fields ... // Cryptographic seal (REQUIRED) pub receipt_hash: [u8; 32], // Blake3 hash of serialized content pub signature: Ed25519Signature, // REQUIRED, not optional pub signer_id: PublicKey, // Identity of signing gate pub timestamp_proof: TimestampProof, // Prevents backdating } /// Timestamp proof prevents replay and backdating pub struct TimestampProof { pub timestamp: u64, pub previous_receipt_hash: [u8; 32], // Chain linkage pub merkle_root: [u8; 32], // Batch anchor } impl WitnessReceipt { /// Sign receipt - MUST be called before any external use pub fn sign(&mut self, key: &SigningKey) -> Result<(), CryptoError> { let content = self.serialize_without_signature(); self.receipt_hash = blake3::hash(&content).into(); self.signature = key.sign(&self.receipt_hash); Ok(()) } /// Verify receipt integrity and authenticity pub fn verify(&self, trusted_keys: &KeyStore) -> Result<(), VerifyError> { // 1. Verify hash let expected_hash = blake3::hash(&self.serialize_without_signature()); if self.receipt_hash != expected_hash.into() { return Err(VerifyError::HashMismatch); } // 2. Verify signature let public_key = trusted_keys.get(&self.signer_id)?; public_key.verify(&self.receipt_hash, &self.signature)?; // 3. Verify timestamp chain self.timestamp_proof.verify()?; Ok(()) } } ``` #### Key Management | Key Type | Purpose | Rotation | Storage | |----------|---------|----------|---------| | Gate Signing Key | Sign receipts | 30 days | HSM or secure enclave | | Receipt Verification Keys | Verify receipts | On rotation | Distributed key store | | Threshold Keys | Multi-party signing | 90 days | Shamir secret sharing | ### Attack Mitigations #### E-Value Manipulation Prevention ```rust /// Bounds checking for e-value inputs impl EValue { pub fn from_likelihood_ratio( likelihood_h1: f64, likelihood_h0: f64, ) -> Result { // Prevent division by zero if likelihood_h0 <= f64::EPSILON { return Err(EValueError::InvalidDenominator); } let ratio = likelihood_h1 / likelihood_h0; // Bound extreme values to prevent overflow attacks let bounded = ratio.clamp(E_VALUE_MIN, E_VALUE_MAX); // Log if clamping occurred (potential attack indicator) if (bounded - ratio).abs() > f64::EPSILON { security_log!("E-value clamped: {} -> {}", ratio, bounded); } Ok(Self { value: bounded, ..Default::default() }) } } const E_VALUE_MIN: f64 = 1e-10; const E_VALUE_MAX: f64 = 1e10; ``` #### Race Condition Prevention ```rust /// Atomic gate decision with sequence numbers pub struct AtomicGateDecision { /// Monotonic sequence for ordering sequence: AtomicU64, /// Lock for decision atomicity decision_lock: RwLock<()>, } impl AtomicGateDecision { pub async fn evaluate(&self, action: &Action) -> GateResult { // Acquire exclusive lock for decision let _guard = self.decision_lock.write().await; // Get sequence number BEFORE evaluation let seq = self.sequence.fetch_add(1, Ordering::SeqCst); // Evaluate all three signals atomically let result = self.evaluate_internal(action, seq).await; // Sequence number in receipt ensures ordering result.with_sequence(seq) } } ``` #### Replay Attack Prevention ```rust /// Replay prevention via nonce tracking pub struct ReplayGuard { /// Recent action hashes (bloom filter for efficiency) recent_actions: BloomFilter, /// Sliding window of full hashes for false positive resolution hash_window: VecDeque<[u8; 32]>, /// Maximum age of tracked actions window_duration: Duration, } impl ReplayGuard { pub fn check_and_record(&mut self, action: &Action) -> Result<(), ReplayError> { let hash = action.content_hash(); // Fast path: bloom filter check if self.recent_actions.might_contain(&hash) { // Slow path: verify against full hash window if self.hash_window.contains(&hash) { return Err(ReplayError::DuplicateAction { hash }); } } // Record action self.recent_actions.insert(&hash); self.hash_window.push_back(hash); self.prune_old_entries(); Ok(()) } } ``` ### Trust Boundaries ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ TRUST BOUNDARY: GATE CORE │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ • E-process computation • Min-cut evaluation │ │ │ │ • Conformal prediction • Decision logic │ │ │ │ • Receipt signing • Key material │ │ │ │ │ │ │ │ Invariants: │ │ │ │ - All inputs validated before use │ │ │ │ - All outputs signed before release │ │ │ │ - No external calls during decision │ │ │ └───────────────────────────────────────────────────────────────────┘ │ │ │ │ │ (authenticated channel) │ │ │ │ └────────────────────────────────────┼────────────────────────────────────┘ │ ┌────────────────────────────────────┼────────────────────────────────────┐ │ TRUST BOUNDARY: AGENT INTERFACE │ │ │ │ │ • Action submission (validated) │ • Decision receipt (verified) │ │ • Context provision (sanitized) │ • Witness query (authenticated) │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` --- ## Performance Optimization ### Identified Bottlenecks & Solutions #### 1. E-Process History Management **Problem**: Unbounded history growth in `EProcess.history: Vec` **Solution**: Ring buffer with configurable retention ```rust pub struct EProcess { /// Current accumulated value (always maintained) current: f64, /// Bounded history ring buffer history: RingBuffer, /// Checkpoint for long-term audit (sampled) checkpoints: Vec, } /// Compact summary for history pub struct EValueSummary { value: f32, // Reduced precision for storage timestamp: u32, // Relative to epoch flags: u8, // Metadata bits } impl EProcess { const HISTORY_CAPACITY: usize = 1024; const CHECKPOINT_INTERVAL: usize = 100; pub fn update(&mut self, e: EValue) { // Update current (always) self.current = self.update_rule.apply(self.current, e.value); // Add to ring buffer (bounded) self.history.push(e.to_summary()); // Periodic checkpoint for audit if self.history.len() % Self::CHECKPOINT_INTERVAL == 0 { self.checkpoints.push(self.checkpoint()); } } } ``` #### 2. Min-Cut Hierarchy Updates **Problem**: Sequential iteration over all hierarchy levels **Solution**: Lazy propagation with dirty tracking ```rust pub struct LazyHierarchy { levels: Vec, /// Bitmap of levels needing update dirty_levels: u64, /// Deferred updates queue pending_updates: VecDeque, } impl LazyHierarchy { pub fn insert(&mut self, edge: Edge) { // Only update lowest level immediately self.levels[0].insert(edge); self.dirty_levels |= 1; // Defer higher level updates self.pending_updates.push_back(DeferredUpdate::Insert(edge)); } pub fn get_cut(&mut self) -> CutValue { // Propagate only if needed for query if self.dirty_levels != 0 { self.propagate_lazy(); } self.levels.last().unwrap().cut_value() } fn propagate_lazy(&mut self) { // Process only dirty levels while self.dirty_levels != 0 { let level = self.dirty_levels.trailing_zeros() as usize; self.update_level(level); self.dirty_levels &= !(1 << level); } } } ``` #### 3. SIMD-Optimized E-Value Computation ```rust #[cfg(target_arch = "x86_64")] use std::arch::x86_64::*; /// Batch e-value computation with SIMD pub fn compute_mixture_evalue_simd( likelihoods_h1: &[f64], likelihoods_h0: &[f64], weights: &[f64], ) -> f64 { assert_eq!(likelihoods_h1.len(), likelihoods_h0.len()); assert_eq!(likelihoods_h1.len(), weights.len()); #[cfg(target_feature = "avx2")] unsafe { let mut sum = _mm256_setzero_pd(); for i in (0..likelihoods_h1.len()).step_by(4) { let h1 = _mm256_loadu_pd(likelihoods_h1.as_ptr().add(i)); let h0 = _mm256_loadu_pd(likelihoods_h0.as_ptr().add(i)); let w = _mm256_loadu_pd(weights.as_ptr().add(i)); let ratio = _mm256_div_pd(h1, h0); let weighted = _mm256_mul_pd(ratio, w); sum = _mm256_add_pd(sum, weighted); } // Horizontal sum horizontal_sum_pd(sum) } #[cfg(not(target_feature = "avx2"))] { // Scalar fallback likelihoods_h1.iter() .zip(likelihoods_h0.iter()) .zip(weights.iter()) .map(|((h1, h0), w)| (h1 / h0) * w) .sum() } } ``` #### 4. Receipt Serialization Optimization ```rust /// Zero-copy receipt serialization pub struct ReceiptBuffer { /// Pre-allocated buffer pool pool: BufferPool, /// Current buffer current: Buffer, } impl WitnessReceipt { /// Serialize to pre-allocated buffer (zero-copy) pub fn serialize_into(&self, buffer: &mut [u8]) -> Result { let mut cursor = 0; // Fixed-size header (no allocation) cursor += self.write_header(&mut buffer[cursor..])?; // Structural witness (fixed size) cursor += self.structural.write_to(&mut buffer[cursor..])?; // Predictive witness (bounded size) cursor += self.predictive.write_to(&mut buffer[cursor..])?; // Evidential witness (fixed size) cursor += self.evidential.write_to(&mut buffer[cursor..])?; // Hash and signature (fixed size) buffer[cursor..cursor + 32].copy_from_slice(&self.receipt_hash); cursor += 32; buffer[cursor..cursor + 64].copy_from_slice(&self.signature.to_bytes()); cursor += 64; Ok(cursor) } } ``` ### Latency Budget (Revised) | Component | Budget | Optimization | Measured p99 | |-----------|--------|--------------|--------------| | Min-cut query | 10ms | Lazy propagation | TBD | | Conformal prediction | 15ms | Cached quantiles | TBD | | E-process update | 5ms | SIMD mixture | TBD | | Decision logic | 5ms | Short-circuit | TBD | | Receipt generation | 10ms | Zero-copy serialize | TBD | | Signing | 5ms | Ed25519 batch | TBD | | **Total** | **50ms** | | | --- ## Distributed Coordination ### Multi-Agent Gate Architecture ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ DISTRIBUTED COHERENCE GATE │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ REGIONAL │ │ REGIONAL │ │ REGIONAL │ │ │ │ GATE (Raft) │ │ GATE (Raft) │ │ GATE (Raft) │ │ │ │ │ │ │ │ │ │ │ │ • Local cuts │ │ • Local cuts │ │ • Local cuts │ │ │ │ • Local conf │ │ • Local conf │ │ • Local conf │ │ │ │ • Local e-proc │ │ • Local e-proc │ │ • Local e-proc │ │ │ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ │ │ │ │ │ │ └──────────────────────┼──────────────────────┘ │ │ │ │ │ ┌─────────────▼─────────────┐ │ │ │ GLOBAL COORDINATOR │ │ │ │ (DAG Consensus) │ │ │ │ │ │ │ │ • Cross-region cuts │ │ │ │ • Aggregated e-process │ │ │ │ • Boundary arbitration │ │ │ └───────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ### Hierarchical Decision Protocol ```rust /// Distributed gate with hierarchical coordination pub struct DistributedGateController { /// Local gate for fast-path decisions local_gate: AnytimeGateController, /// Regional coordinator (Raft consensus) regional: RegionalCoordinator, /// Global coordinator (DAG consensus) global: GlobalCoordinator, /// Decision routing policy routing: DecisionRoutingPolicy, } pub enum DecisionScope { /// Action affects only local partition Local, /// Action crosses regional boundary Regional, /// Action has global implications Global, } impl DistributedGateController { pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult { // 1. Determine scope let scope = self.routing.classify(action, context); // 2. Route to appropriate level match scope { DecisionScope::Local => { // Fast path: local decision only self.local_gate.evaluate(action, context) } DecisionScope::Regional => { // Medium path: coordinate with regional peers let local_result = self.local_gate.evaluate(action, context); let regional_result = self.regional.coordinate(action, &local_result).await?; self.merge_results(local_result, regional_result) } DecisionScope::Global => { // Slow path: full coordination let local_result = self.local_gate.evaluate(action, context); let regional_result = self.regional.coordinate(action, &local_result).await?; let global_result = self.global.arbitrate(action, ®ional_result).await?; self.merge_all_results(local_result, regional_result, global_result) } } } } ``` ### Distributed E-Process Aggregation ```rust /// E-process that aggregates across distributed gates pub struct DistributedEProcess { /// Local e-process local: EProcess, /// Peer e-process summaries (received via gossip) peer_summaries: HashMap, /// Aggregation method aggregation: AggregationMethod, } pub enum AggregationMethod { /// Conservative: minimum across all nodes Minimum, /// Average with confidence weighting WeightedAverage, /// Consensus-based (requires agreement) Consensus { threshold: f64 }, } impl DistributedEProcess { /// Get aggregated e-value for distributed decision pub fn aggregated_value(&self) -> f64 { match self.aggregation { AggregationMethod::Minimum => { let local = self.local.current_value(); let peer_min = self.peer_summaries.values() .map(|s| s.current_value) .fold(f64::INFINITY, f64::min); local.min(peer_min) } AggregationMethod::WeightedAverage => { let total_weight: f64 = 1.0 + self.peer_summaries.values() .map(|s| s.confidence_weight) .sum::(); let weighted_sum = self.local.current_value() + self.peer_summaries.values() .map(|s| s.current_value * s.confidence_weight) .sum::(); weighted_sum / total_weight } AggregationMethod::Consensus { threshold } => { // Requires threshold fraction of nodes to agree let values: Vec = std::iter::once(self.local.current_value()) .chain(self.peer_summaries.values().map(|s| s.current_value)) .collect(); // Return median if sufficient agreement, else conservative min if self.check_agreement(&values, threshold) { statistical_median(&values) } else { values.iter().cloned().fold(f64::INFINITY, f64::min) } } } } } ``` ### Fault Tolerance ```rust /// Fault-tolerant gate with automatic failover pub struct FaultTolerantGate { /// Primary gate primary: AnytimeGateController, /// Standby gates (hot standbys) standbys: Vec, /// Health monitor health: HealthMonitor, /// Failover policy failover: FailoverPolicy, } pub struct FailoverPolicy { /// Maximum consecutive failures before failover max_failures: u32, /// Health check interval check_interval: Duration, /// Recovery grace period recovery_grace: Duration, } impl FaultTolerantGate { pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult { // Try primary match self.try_primary(action, context).await { Ok(result) => return Ok(result), Err(e) => { self.health.record_failure(&e); } } // Failover to standbys for (idx, standby) in self.standbys.iter_mut().enumerate() { match standby.evaluate(action, context) { Ok(result) => { // Promote standby if primary unhealthy if self.health.should_failover() { self.promote_standby(idx); } return Ok(result); } Err(e) => { self.health.record_standby_failure(idx, &e); } } } // All gates failed - safe default Ok(GateResult { decision: GateDecision::Deny, reason: "All gates unavailable - failing safe".into(), ..Default::default() }) } } ``` ### Integration with RuVector Consensus | Consensus Layer | RuVector Module | Gate Integration | |-----------------|-----------------|------------------| | Regional (Raft) | `ruvector-raft` | Local cut coordination, leader-based decisions | | Global (DAG) | `ruvector-cluster` | Cross-region boundary arbitration | | State Sync | `ruvector-sync` | E-process summary propagation | | Receipt Chain | `ruvector-merkle` | Distributed receipt verification | --- ## Hardware Mapping: 256-Tile WASM Fabric The coherence gate is an ideal workload for event-driven WASM hardware: **mostly silent, then extremely decisive when boundaries move**. ### Tile Architecture ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ 256-TILE COGNITUM FABRIC │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ TILE ZERO (Arbiter) │ │ │ │ │ │ │ │ • Merge worker reports • Hierarchical min-cut │ │ │ │ • Global gate decision • Permit token issuance │ │ │ │ • Witness receipt log • Hash-chained eventlog │ │ │ └──────────────────────────────┬───────────────────────────────────┘ │ │ │ │ │ ┌────────────────────┼────────────────────┐ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Workers │ │ Workers │ │ Workers │ ... │ │ │ [1-85] │ │ [86-170] │ │ [171-255] │ │ │ │ │ │ │ │ │ │ │ │ Shard A │ │ Shard B │ │ Shard C │ │ │ │ Local cuts │ │ Local cuts │ │ Local cuts │ │ │ │ E-accum │ │ E-accum │ │ E-accum │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ### Worker Tile Responsibilities Each of the 255 worker tiles maintains a **local shard**: ```rust /// Worker tile state (fits in ~64KB WASM memory) #[repr(C)] pub struct WorkerTileState { /// Compact neighborhood graph (edges + weights) graph_shard: CompactGraph, // ~32KB /// Rolling feature window for normality scores feature_window: RingBuffer, // ~8KB /// Local coherence score coherence: f32, /// Local boundary candidates (top-k edges) boundary_edges: [EdgeId; 8], /// Local e-value accumulator e_accumulator: f64, /// Tick counter tick: u64, } /// Per-tick processing: only deltas impl WorkerTileState { /// Process incoming delta (edge add/remove/weight update) pub fn ingest_delta(&mut self, delta: &Delta) -> Status { match delta { Delta::EdgeAdd(e) => self.graph_shard.add_edge(e), Delta::EdgeRemove(e) => self.graph_shard.remove_edge(e), Delta::WeightUpdate(e, w) => self.graph_shard.update_weight(e, *w), Delta::Observation(score) => self.feature_window.push(*score), } self.update_local_coherence(); Status::Ok } /// Tick: compute and emit report pub fn tick(&mut self, now_ns: u64) -> TileReport { self.tick = now_ns; // Tiny math: update e-accumulator self.e_accumulator = self.compute_local_evalue(); TileReport { tile_id: self.id, coherence: self.coherence, boundary_moved: self.detect_boundary_movement(), suspicious_edges: self.top_k_suspicious(), e_value: self.e_accumulator as f32, witness_fragment: self.extract_witness_fragment(), } } } /// Fixed-size report (fits in single cache line) #[repr(C, align(64))] pub struct TileReport { tile_id: u8, coherence: f32, boundary_moved: bool, suspicious_edges: [EdgeId; 4], e_value: f32, witness_fragment: WitnessFragment, } ``` ### TileZero Responsibilities TileZero acts as the **arbiter** that issues final decisions: ```rust /// TileZero: Global gate decision and permit issuance pub struct TileZero { /// Merged supergraph (reduced from worker summaries) supergraph: ReducedGraph, /// Canonical permit token state permit_state: PermitState, /// Hash-chained witness receipt log receipt_log: ReceiptLog, /// Threshold configuration thresholds: GateThresholds, } impl TileZero { /// Collect reports from all worker tiles pub fn collect_reports(&mut self, reports: &[TileReport; 255]) { // Merge worker summaries into supergraph for report in reports { if report.boundary_moved { self.supergraph.update_from_fragment(&report.witness_fragment); } self.supergraph.update_coherence(report.tile_id, report.coherence); } } /// Issue gate decision (microsecond latency) pub fn decide(&mut self, action_ctx: &ActionContext) -> PermitToken { // Three stacked filters: // 1. Structural filter (global cut on reduced graph) let structural_ok = self.supergraph.global_cut() >= self.thresholds.min_cut; // 2. Shift filter (aggregated shift pressure) let shift_pressure = self.aggregate_shift_pressure(); let shift_ok = shift_pressure < self.thresholds.max_shift; // 3. Evidence filter (can stop immediately if enough evidence) let e_aggregate = self.aggregate_evidence(); let evidence_decision = self.evidence_decision(e_aggregate); // Combined decision let decision = match (structural_ok, shift_ok, evidence_decision) { (false, _, _) => GateDecision::Deny, // Structure broken (_, false, _) => GateDecision::Defer, // Shift detected (_, _, EvidenceDecision::Reject) => GateDecision::Deny, (_, _, EvidenceDecision::Continue) => GateDecision::Defer, (true, true, EvidenceDecision::Accept) => GateDecision::Permit, }; // Issue token self.issue_permit_token(action_ctx, decision) } /// Issue permit token (a signed capability) fn issue_permit_token( &mut self, ctx: &ActionContext, decision: GateDecision, ) -> PermitToken { let witness_hash = self.compute_witness_hash(); let token = PermitToken { decision, action_id: ctx.action_id, timestamp: now_ns(), ttl_ns: self.thresholds.permit_ttl, witness_hash, sequence: self.permit_state.next_sequence(), }; // MAC or sign the token let mac = self.permit_state.sign(&token); // Emit receipt self.emit_receipt(&token, &mac); PermitToken { mac, ..token } } /// Emit witness receipt (hash-chained) fn emit_receipt(&mut self, token: &PermitToken, mac: &[u8; 32]) { let receipt = WitnessReceipt { token: token.clone(), mac: *mac, previous_hash: self.receipt_log.last_hash(), witness_summary: self.supergraph.witness_summary(), }; self.receipt_log.append(receipt); } } /// Permit token: a capability that agents must present #[repr(C)] pub struct PermitToken { pub decision: GateDecision, pub action_id: ActionId, pub timestamp: u64, pub ttl_ns: u64, pub witness_hash: [u8; 32], pub sequence: u64, pub mac: [u8; 32], // HMAC or signature } impl PermitToken { /// Agents must present valid token to perform actions pub fn is_valid(&self, verifier: &Verifier) -> bool { // Check TTL if now_ns() > self.timestamp + self.ttl_ns { return false; } // Verify MAC/signature verifier.verify(self, &self.mac) } } ``` ### WASM Kernel API Each tile runs a minimal WASM kernel: ```rust /// Worker tile WASM exports #[no_mangle] pub extern "C" fn ingest_delta(delta_ptr: *const u8, len: usize) -> u32 { let delta = unsafe { core::slice::from_raw_parts(delta_ptr, len) }; TILE_STATE.with(|state| state.borrow_mut().ingest_delta(delta)) } #[no_mangle] pub extern "C" fn tick(now_ns: u64) -> *const TileReport { TILE_STATE.with(|state| state.borrow_mut().tick(now_ns)) } #[no_mangle] pub extern "C" fn get_witness_fragment(id: u32) -> *const u8 { TILE_STATE.with(|state| state.borrow().get_witness_fragment(id)) } /// TileZero WASM/native exports #[no_mangle] pub extern "C" fn collect_reports(reports_ptr: *const TileReport, count: usize) { TILEZERO.with(|tz| tz.borrow_mut().collect_reports(reports_ptr, count)) } #[no_mangle] pub extern "C" fn decide(action_ctx_ptr: *const ActionContext) -> *const PermitToken { TILEZERO.with(|tz| tz.borrow_mut().decide(action_ctx_ptr)) } #[no_mangle] pub extern "C" fn get_receipt(sequence: u64) -> *const WitnessReceipt { TILEZERO.with(|tz| tz.borrow().get_receipt(sequence)) } ``` ### v0 Implementation Strategy Ship fast by layering: | Phase | Components | Skip Initially | |-------|------------|----------------| | **v0.1** | Structural coherence + witness receipt | Shift filter, evidence filter | | **v0.2** | Add shift filter (normality scores) | CORE RL adaptation | | **v0.3** | Add evidence filter (e-values) | Mixture e-values | | **v1.0** | Full three-filter stack | - | ### Rust Deliverables | Crate | Description | Dependencies | |-------|-------------|--------------| | `cognitum-gate-kernel` | `no_std` WASM kernel for worker tiles | `ruvector-mincut` (core algorithms) | | `cognitum-gate-tilezero` | Native arbiter for TileZero | `ruvector-mincut`, `blake3`, `ed25519` | | `mcp-gate` | MCP server for agent integration | `cognitum-gate-tilezero` | ``` cognitum-gate/ ├── cognitum-gate-kernel/ # no_std WASM │ ├── Cargo.toml │ └── src/ │ ├── lib.rs # WASM exports │ ├── shard.rs # Compact graph shard │ ├── evidence.rs # Local e-accumulator │ └── report.rs # TileReport generation │ ├── cognitum-gate-tilezero/ # Native arbiter │ ├── Cargo.toml │ └── src/ │ ├── lib.rs │ ├── merge.rs # Report merging │ ├── supergraph.rs # Reduced global graph │ ├── permit.rs # Token issuance │ └── receipt.rs # Hash-chained log │ └── mcp-gate/ # MCP integration ├── Cargo.toml └── src/ ├── lib.rs ├── tools.rs # permit_action, get_receipt, replay_decision └── server.rs # MCP server ``` ### MCP Gate Tools ```rust /// MCP tool: Request permission for an action #[mcp_tool] pub async fn permit_action( action_id: String, action_type: String, context: serde_json::Value, ) -> Result { let ctx = ActionContext::from_json(&context)?; let token = TILEZERO.decide(&ctx); Ok(PermitResponse { decision: token.decision.to_string(), token: token.encode_base64(), witness_hash: hex::encode(&token.witness_hash), valid_until_ns: token.timestamp + token.ttl_ns, }) } /// MCP tool: Get witness receipt for audit #[mcp_tool] pub async fn get_receipt(sequence: u64) -> Result { let receipt = TILEZERO.get_receipt(sequence)?; Ok(ReceiptResponse { sequence, decision: receipt.token.decision.to_string(), timestamp: receipt.token.timestamp, witness_summary: receipt.witness_summary.to_json(), previous_hash: hex::encode(&receipt.previous_hash), receipt_hash: hex::encode(&receipt.hash()), }) } /// MCP tool: Replay decision for debugging/audit #[mcp_tool] pub async fn replay_decision( sequence: u64, verify_chain: bool, ) -> Result { let receipt = TILEZERO.get_receipt(sequence)?; // Optionally verify hash chain if verify_chain { TILEZERO.verify_chain_to(sequence)?; } // Replay the decision with logged state let replayed = TILEZERO.replay(&receipt)?; Ok(ReplayResponse { original_decision: receipt.token.decision.to_string(), replayed_decision: replayed.decision.to_string(), match_confirmed: receipt.token.decision == replayed.decision, state_snapshot: replayed.state_snapshot.to_json(), }) } ``` ### The Practical Win This gives Cognitum a clear job that buyers understand: > **"We do not just detect issues, we prevent unsafe actions."** > **"We can prove why we blocked or allowed it."** > **"We stay calm until structure breaks."** The permit token as a capability means: - Agents cannot act without presenting a valid token - Tokens expire (TTL-bounded) - Every token is backed by a witness receipt - The entire chain is cryptographically verifiable --- ## API Contract ### Request: Permit Action ```json { "action_id": "cfg-push-7a3f", "action_type": "config_change", "target": { "device": "router-west-03", "path": "/network/interfaces/eth0" }, "context": { "agent_id": "ops-agent-12", "session_id": "sess-abc123", "prior_actions": ["cfg-push-7a3e"], "urgency": "normal" } } ``` ### Response: Permit ```json { "decision": "permit", "token": "eyJ0eXAiOiJQVCIsImFsZyI6IkVkMjU1MTkifQ...", "valid_until_ns": 1737158400000000000, "witness": { "structural": { "cut_value": 12.7, "partition": "stable", "critical_edges": 0 }, "predictive": { "set_size": 3, "coverage": 0.92 }, "evidential": { "e_value": 847.3, "verdict": "accept" } }, "receipt_sequence": 1847392 } ``` ### Response: Defer ```json { "decision": "defer", "reason": "shift_detected", "detail": "Distribution shift pressure 0.73 exceeds threshold 0.5", "escalation": { "to": "human_operator", "context_url": "/receipts/1847393/context", "timeout_ns": 300000000000 }, "witness": { "structural": { "cut_value": 11.2, "partition": "stable" }, "predictive": { "set_size": 18, "coverage": 0.91 }, "evidential": { "e_value": 3.2, "verdict": "continue" } }, "receipt_sequence": 1847393 } ``` ### Response: Deny ```json { "decision": "deny", "reason": "boundary_violation", "detail": "Action crosses fragile partition (cut=2.1 < min=5.0)", "witness": { "structural": { "cut_value": 2.1, "partition": "fragile", "critical_edges": 4, "boundary": ["edge-17", "edge-23", "edge-41", "edge-52"] }, "predictive": { "set_size": 47, "coverage": 0.88 }, "evidential": { "e_value": 0.004, "verdict": "reject" } }, "receipt_sequence": 1847394 } ``` --- ## Migration Path ### Phase M1: Shadow Mode Run AVCG alongside existing `GateController`. Compare decisions, don't enforce. ```rust impl HybridGate { pub fn evaluate(&mut self, action: &Action) -> GateResult { // Existing gate makes the decision let legacy_result = self.legacy_gate.evaluate(action); // AVCG runs in shadow, logs disagreements let avcg_result = self.avcg_gate.evaluate(action); if legacy_result.decision != avcg_result.decision { metrics::counter!("gate.shadow.disagreement").increment(1); log::info!( "Shadow disagreement: legacy={:?} avcg={:?} action={}", legacy_result.decision, avcg_result.decision, action.id ); } legacy_result // Legacy still decides } } ``` **Exit criteria**: <1% disagreement rate over 7 days, zero false denies on known-safe actions. ### Phase M2: Canary Enforcement AVCG enforces for 5% of traffic, legacy handles rest. ```rust impl CanaryGate { pub fn evaluate(&mut self, action: &Action) -> GateResult { let canary = self.canary_selector.select(action); if canary { metrics::counter!("gate.canary.avcg").increment(1); self.avcg_gate.evaluate(action) } else { self.legacy_gate.evaluate(action) } } } ``` **Exit criteria**: No incidents attributed to AVCG decisions over 14 days. ### Phase M3: Majority Rollout AVCG handles 95%, legacy available for fallback. ### Phase M4: Full Cutover Legacy removed. AVCG is the gate. ``` Timeline: M1 (Shadow) → 2-4 weeks M2 (Canary 5%) → 2 weeks M3 (Majority) → 2 weeks M4 (Full) → 1 week ───────── Total → 7-9 weeks ``` --- ## Observability ### Metrics (Prometheus) ``` # Decision counters gate_decisions_total{decision="permit|defer|deny", reason="..."} # Latency histograms gate_latency_seconds{phase="mincut|conformal|eprocess|decision|receipt"} # Signal values gate_cut_value{quantile="0.5|0.9|0.99"} gate_prediction_set_size{quantile="0.5|0.9|0.99"} gate_evalue{quantile="0.5|0.9|0.99"} # Health gate_healthy{component="mincut|conformal|eprocess"} gate_failover_total{from="primary|standby_N"} # Coverage tracking gate_conformal_coverage_rate # Should stay ≥ 0.85 gate_eprocess_power # Evidence accumulation rate ``` ### Alerting Thresholds | Alert | Condition | Severity | |-------|-----------|----------| | `GateHighDenyRate` | deny_rate > 10% for 5m | Warning | | `GateLatencyHigh` | p99 > 100ms for 5m | Warning | | `GateCoverageDrift` | coverage < 0.80 for 15m | Critical | | `GateUnhealthy` | any component unhealthy for 1m | Critical | | `GateReceiptChainBroken` | hash verification fails | Critical | ### Debug Query: Why Was This Denied? ```bash # Get full decision context curl /api/gate/receipts/1847394/explain # Response: { "receipt_sequence": 1847394, "decision": "deny", "explanation": { "primary_reason": "structural", "structural": { "cut_value": 2.1, "threshold": 5.0, "failed": true, "boundary_edges": [ {"id": "edge-17", "weight": 0.3, "endpoints": ["node-a", "node-b"]}, ... ], "partition_context": "Device router-west-03 is in partition P7 which has been unstable since 14:32:07 UTC" }, "predictive": { "failed": false, "detail": "Set size 47 within bounds" }, "evidential": { "failed": true, "detail": "E-value 0.004 < τ_deny 0.01" } }, "suggested_action": "Wait for partition P7 to stabilize or escalate to human approval", "similar_past_decisions": [1847201, 1846998, 1846754] } ``` --- ## Open Questions Resolution ### Q1: Graph model scope — immediate actions or multi-step lookahead? **Decision**: Immediate actions for v0, optional 1-step lookahead for v1. **Rationale**: Multi-step lookahead requires predicting action sequences, which adds latency and complexity. Start simple: evaluate the action being requested *right now*. If the current action is safe but would lead to an unsafe state, the *next* action will be denied when it's requested. ### Q2: E-process null — "action safety" vs "policy consistency"? **Decision**: Action safety as primary null, policy consistency as secondary. **Rationale**: - Primary H₀: P(action leads to unsafe state) ≤ p₀ - Secondary (optional): Current action consistent with established policy Action safety is more fundamental. Policy consistency can be added as a separate e-process that runs in parallel. ### Q3: Threshold learning — fixed or adaptive? **Decision**: Fixed for v0, adaptive via meta-learning for v1. **Rationale**: Fixed thresholds are easier to audit and explain. Once we have production data, we can train a meta-learner to adjust thresholds based on observed false positive/negative rates. ### Q4: Human-in-loop — how are DEFER decisions presented? **Decision**: Structured escalation with timeout and context link. ```json { "escalation": { "to": "human_operator", "channel": "slack|pagerduty|dashboard", "context_url": "/receipts/{seq}/context", "timeout_ns": 300000000000, "default_on_timeout": "deny" } } ``` If human doesn't respond within timeout, default to DENY (fail-safe). See **Hybrid Agent/Human Workflow** section for full details. ### Q5: Adversarial robustness — adaptive adversaries? **Decision**: Defense in depth + rate limiting + anomaly detection. **Mitigations**: 1. Rate limit: Max N decisions per agent per minute 2. Anomaly: Flag agents with unusually high deny rates 3. Honeypots: Inject synthetic "trap" actions to detect probing 4. Rotation: Periodically rotate threshold parameters within safe bounds --- ## Definition of Done ### v0.1 Shippable Criteria | Criterion | Metric | Target | |-----------|--------|--------| | **Structural filter works** | Min-cut correctly identifies fragile partitions | 100% on test suite | | **Receipts are signed** | All receipts have valid Ed25519 signature | 100% | | **Receipts are chained** | Hash chain verifies for all receipts | 100% | | **Latency acceptable** | p99 gate decision time | < 50ms | | **No false denies** | Known-safe actions are permitted | 100% on test suite | | **Demo scenario runs** | Network security control plane demo | End-to-end pass | ### v0.1 Minimum Viable Demo **Scenario**: Agent requests config push to network device. 1. Agent calls `permit_action` with device target 2. Gate evaluates structural coherence (min-cut) 3. Gate returns PERMIT with signed receipt 4. Agent presents token to device 5. Device verifies token, accepts config **Success**: Auditor can replay decision from receipt and get same result. --- ## Cost Model ### Memory per Tile (WASM) | Component | Size | Notes | |-----------|------|-------| | Graph shard | 32 KB | ~2000 edges at 16 bytes each | | Feature window | 8 KB | 2048 f32 values | | E-accumulator | 64 B | f64 + metadata | | Boundary edges | 64 B | 8 × EdgeId | | **Total per worker** | **~41 KB** | Fits in 64KB WASM page | | **Total 255 workers** | **~10.2 MB** | | | TileZero state | ~1 MB | Supergraph + receipt log head | | **Total fabric** | **~12 MB** | | ### Network Bandwidth | Flow | Frequency | Size | Bandwidth | |------|-----------|------|-----------| | Worker → TileZero reports | 1/tick (10ms) | 64 B × 255 | ~1.6 MB/s | | Receipt log append | per decision | ~512 B | Variable | | Gossip (distributed) | 1/100ms | ~1 KB × peers | ~10 KB/s × P | ### Storage Growth | Item | Size | Retention | Growth | |------|------|-----------|--------| | Receipt | ~512 B | 90 days | ~44 MB/day @ 1000 decisions/s | | E-process checkpoint | ~128 B | Forever | ~11 MB/day @ 1000 decisions/s | | Audit log | ~256 B | 1 year | ~22 MB/day @ 1000 decisions/s | **90-day storage**: ~7 GB receipts + ~1 GB checkpoints ≈ **8 GB** --- ## Hybrid Agent/Human Workflow The coherence gate is designed for **bounded autonomy**, not full autonomy. Humans stay in the loop at critical decision points. ### Design Philosophy ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ │ │ "Agents handle the routine. Humans handle the novel." │ │ │ │ PERMIT → Agent proceeds autonomously (low risk, high confidence) │ │ DEFER → Human decides (uncertain, boundary case, policy gap) │ │ DENY → Blocked automatically (structural violation, unsafe) │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` The gate doesn't replace human judgment—it **routes decisions to humans when judgment is needed**. ### Escalation Tiers | Tier | Trigger | Responder | SLA | Example | |------|---------|-----------|-----|---------| | **T0** | PERMIT | None (automated) | 0 | Routine config within stable partition | | **T1** | DEFER (shift) | On-call operator | 5 min | New dependency pattern detected | | **T2** | DEFER (boundary) | Senior engineer | 15 min | Action crosses partition boundary | | **T3** | DEFER (policy gap) | Policy team | 1 hour | No precedent for this action type | | **T4** | DENY override request | Security + Management | 4 hours | Agent requesting exception to denial | ### Human Decision Interface When a DEFER is escalated, humans see: ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ DECISION REQUIRED Timeout: 4:32 │ ├─────────────────────────────────────────────────────────────────────────┤ │ │ │ Agent: ops-agent-12 │ │ Action: Push config to router-west-03 /network/interfaces/eth0 │ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ WHY DEFERRED │ │ │ │ │ │ │ │ • Shift detected: New dependency pattern (0.73 > 0.5 threshold)│ │ │ │ • This device was added to the graph 2 hours ago │ │ │ │ • Similar actions on established devices: 847 permits, 0 denies│ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ CONTEXT │ │ │ │ │ │ │ │ Structural coherence: 11.2 (healthy) │ │ │ │ Prediction set size: 18 outcomes (moderate uncertainty) │ │ │ │ Evidence accumulator: 3.2 (inconclusive) │ │ │ │ │ │ │ │ [View full witness receipt] [View similar past decisions] │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │ │ │ APPROVE │ │ DENY │ │ ESCALATE TO T3 │ │ │ │ (proceed) │ │ (block) │ │ (need policy guidance) │ │ │ └───────────────┘ └───────────────┘ └───────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ### Human Decision Recording Human decisions become part of the audit trail: ```rust pub struct HumanDecision { /// Original deferred receipt pub deferred_receipt_seq: u64, /// Human's decision pub decision: HumanVerdict, /// Human identity (authenticated) pub decider_id: AuthenticatedUserId, /// Reasoning (required for audit) pub rationale: String, /// Timestamp pub decided_at: u64, /// Signature (human signs their decision) pub signature: Ed25519Signature, } pub enum HumanVerdict { /// Approve the action Approve { /// Add to training data for future automation learn_from_this: bool, }, /// Deny the action Deny { /// Reason for denial reason: String, }, /// Escalate to higher tier Escalate { to_tier: EscalationTier, reason: String, }, /// Request more information NeedMoreInfo { questions: Vec, }, } ``` ### Override Protocol Humans can override DENY decisions, but with friction and accountability: ```rust pub struct DenyOverride { /// Which denial is being overridden pub denied_receipt_seq: u64, /// Who is overriding (must be T4 authority) pub overrider_id: AuthenticatedUserId, /// Second approver required pub second_approver_id: AuthenticatedUserId, /// Business justification (required, min 50 chars) pub justification: String, /// Time-bounded: override expires pub valid_until: u64, /// Scope-limited: only this specific action pub action_id: ActionId, /// Both signatures required pub overrider_signature: Ed25519Signature, pub approver_signature: Ed25519Signature, } ``` **Override constraints**: - Two humans required (four-eyes principle) - Must provide written justification - Time-limited (max 24 hours) - Scope-limited (only the specific action) - All overrides flagged for security review ### Learning from Human Decisions Human decisions improve the gate over time: ```rust /// When human approves a DEFER, optionally learn from it pub fn learn_from_approval( deferred: &WitnessReceipt, human: &HumanDecision, ) { if human.decision.learn_from_this() { // Add to calibration data conformal_calibrator.add_observation( deferred.context.clone(), Outcome::Safe, // Human judged it safe ); // Update e-process null hypothesis eprocess_trainer.add_positive_example( deferred.action.clone(), ); // Adjust threshold candidates (for meta-learning in v1) threshold_learner.record_human_permit( deferred.signals.clone(), ); } } ``` ### Workload Distribution Target The goal is **minimal human burden** while maintaining safety: | Decision | Target Rate | Human Workload | |----------|-------------|----------------| | PERMIT | 90-95% | Zero | | DEFER | 4-9% | Human decides | | DENY | 1-2% | Zero (unless override requested) | If DEFER rate exceeds 10%, the gate is too conservative—tune thresholds. If DENY rate exceeds 5%, something is wrong—investigate root cause. ### Integration Channels | Channel | Use Case | Response Format | |---------|----------|-----------------| | **Slack** | On-call escalation | Interactive buttons | | **PagerDuty** | Critical/timed decisions | Acknowledge + decision API | | **Dashboard** | Batch review | Web UI with full context | | **CLI** | Developer/ops workflow | `ruvector gate approve ` | | **API** | Programmatic integration | REST/gRPC | ### Audit Trail for Human Decisions Every human decision is: 1. **Authenticated**: Decider identity verified via SSO/MFA 2. **Signed**: Human signs their decision with personal key 3. **Chained**: Added to the same receipt chain as gate decisions 4. **Timestamped**: Immutable record of when decision was made 5. **Justified**: Rationale captured for later review ``` Receipt Chain: [1847392] PERMIT (automated) → agent executed [1847393] DEFER (automated) → escalated to human [1847393-H] APPROVE (human: alice@corp) → agent executed [1847394] DENY (automated) → blocked [1847394-O] OVERRIDE (humans: bob@corp + carol@corp) → exception granted ``` --- ## Consequences ### Benefits 1. **Formal Guarantees**: Type I error control at any stopping time 2. **Distribution Shift Robustness**: Conformal prediction adapts without retraining 3. **Computational Efficiency**: O(n^{o(1)}) update time from subpolynomial min-cut 4. **Audit Trail**: Every decision has cryptographic witness receipt 5. **Defense in Depth**: Three independent signals must concur for permit 6. **Cryptographic Integrity**: All receipts signed with Ed25519 7. **Attack Resistance**: E-value bounds, replay guards, race condition prevention 8. **Distributed Scalability**: Hierarchical coordination with regional and global tiers 9. **Fault Tolerance**: Automatic failover with safe defaults ### Risks & Mitigations | Risk | Mitigation | |------|------------| | Computational overhead | Lazy evaluation; batch updates; SIMD optimization | | E-value power under uncertainty | Mixture e-values for robustness | | Graph model mismatch | Learn graph structure from trajectories | | Threshold tuning | Adaptive thresholds via meta-learning | | Receipt forgery | Mandatory Ed25519 signing; chain linkage | | E-value manipulation | Input bounds; clamping with security logging | | Race conditions | Atomic decisions with sequence numbers | | Replay attacks | Bloom filter + sliding window guard | | Network partitions | Hierarchical decisions; local autonomy | | Byzantine nodes | Consensus-based aggregation; safe defaults | ### Complexity Analysis | Operation | Current | With AVCG | Distributed AVCG | |-----------|---------|-----------|------------------| | Edge update | O(n^{o(1)}) | O(n^{o(1)}) | O(n^{o(1)}) + network | | Gate evaluation | O(1) | O(k) prediction set | O(k) + O(R) regional | | Witness generation | O(m) | O(m) amortized | O(m) + signing | | Certificate verification | O(n) | O(n + log T) | O(n + log T) + sig verify | | Receipt signing | N/A | O(1) Ed25519 | O(1) + HSM latency | | Distributed consensus | N/A | N/A | O(log N) Raft | | E-process aggregation | N/A | O(1) | O(P) peers | Where: k = prediction set size, T = history length, R = regional peers, N = cluster size, P = peer count ## References ### Dynamic Min-Cut 1. El-Hayek, Henzinger, Li. "Deterministic and Exact Fully-dynamic Minimum Cut of Superpolylogarithmic Size in Subpolynomial Time." arXiv:2512.13105, December 2025. 2. Jin, Sun, Thorup. "Fully Dynamic Exact Minimum Cut in Subpolynomial Time." SODA 2024. ### Online Conformal Prediction 3. "Online Conformal Inference with Retrospective Adjustment for Faster Adaptation to Distribution Shift." arXiv:2511.04275, November 2025. 4. "Distribution-informed Online Conformal Prediction (COP)." December 2025. 5. "CORE: Conformal Regression under Distribution Shift via Reinforcement Learning." October 2025. ### E-Values and E-Processes 6. Ramdas, Wang. "Hypothesis Testing with E-values." Foundations and Trends in Statistics, 2025. 7. ICML 2025 Tutorial: "Game-theoretic Statistics and Sequential Anytime-Valid Inference." 8. "Sequential Randomization Tests Using e-values." arXiv:2512.04366, December 2025. ### AI Agent Control 9. "Bounded Autonomy: A Pragmatic Response to Concerns About Fully Autonomous AI Agents." XMPRO, 2025. 10. "Customizable Runtime Enforcement for Safe and Reliable LLM Agents." arXiv:2503.18666, 2025. ## Testing Strategy ### Unit Tests | Component | Coverage Target | Key Test Cases | |-----------|----------------|----------------| | `CompactGraph` | 95% | Add/remove edges, weight updates, min-cut estimation | | `EvidenceAccumulator` | 95% | Bounds checking, update rules, stopping decisions | | `TileReport` | 90% | Serialization roundtrip, checksum verification | | `PermitToken` | 95% | Signing, verification, TTL expiration | | `ReceiptLog` | 95% | Hash chain integrity, tamper detection | | `ThreeFilterDecision` | 100% | All Permit/Defer/Deny paths | ### Integration Tests | Scenario | Description | Expected Outcome | |----------|-------------|------------------| | Happy path | Stable graph, safe action | PERMIT with valid receipt | | Boundary crossing | Action crosses fragile partition | DENY with boundary edges | | Shift detection | New dependency pattern | DEFER with escalation | | Human approval | DEFER → human approves | Token issued, learning recorded | | Replay verification | Replay historical decision | Deterministic match | | Hash chain audit | Verify 1000 receipts | All hashes valid | ### Property-Based Tests ```rust #[proptest] fn e_value_always_positive(e1: f64, e2: f64) { let result = combine_evalues(e1.abs(), e2.abs()); prop_assert!(result > 0.0); } #[proptest] fn receipt_hash_deterministic(receipt: WitnessReceipt) { let hash1 = receipt.compute_hash(); let hash2 = receipt.compute_hash(); prop_assert_eq!(hash1, hash2); } #[proptest] fn serialization_roundtrip(report: TileReport) { let bytes = report.serialize(); let restored = TileReport::deserialize(&bytes); prop_assert_eq!(report, restored); } ``` ### Security Tests | Test | Attack Vector | Expected Behavior | |------|---------------|-------------------| | Forged signature | Invalid Ed25519 sig | Verification fails | | Replay attack | Duplicate action | ReplayGuard blocks | | E-value overflow | Extreme likelihood ratio | Clamped to bounds | | Race condition | Concurrent evaluations | Sequence numbers ordered | | Tampered receipt | Modified hash | Chain verification fails | ### Benchmark Tests | Metric | Target | Measurement | |--------|--------|-------------| | Gate decision latency | p99 < 50ms | `criterion` benchmark | | Receipt signing | < 5ms | `criterion` benchmark | | 255-tile report merge | < 10ms | `criterion` benchmark | | Hash chain verification (1000) | < 100ms | `criterion` benchmark | | Memory per worker tile | < 64KB | Static analysis | --- ## Configuration Format ### TOML Configuration ```toml # gate-config.toml [gate] # Gate identification gate_id = "gate-west-01" version = "0.1.0" [thresholds] # E-process thresholds tau_deny = 0.01 # E-value below this → DENY tau_permit = 100.0 # E-value above this → PERMIT # Structural thresholds min_cut = 5.0 # Cut value below this → DENY max_shift = 0.5 # Shift pressure above this → DEFER # Conformal thresholds max_prediction_set = 20 # Set size above this → DEFER coverage_target = 0.90 # Target coverage rate [timing] # Permit token TTL permit_ttl_seconds = 300 # Decision timeout decision_timeout_ms = 50 # Tick interval for worker tiles tick_interval_ms = 10 [security] # Key rotation signing_key_rotation_days = 30 threshold_key_rotation_days = 90 # Replay prevention replay_window_seconds = 3600 bloom_filter_size = 1000000 [distributed] # Coordination settings regional_peers = ["gate-west-02", "gate-west-03"] global_coordinator = "coordinator-global-01" raft_heartbeat_ms = 100 consensus_timeout_ms = 1000 [escalation] # Human-in-loop settings default_timeout_seconds = 300 default_on_timeout = "deny" [escalation.channels.slack] webhook_url = "${SLACK_WEBHOOK_URL}" channel = "#gate-escalations" [escalation.channels.pagerduty] api_key = "${PAGERDUTY_API_KEY}" service_id = "gate-critical" [observability] # Metrics endpoint metrics_port = 9090 metrics_path = "/metrics" # Tracing tracing_enabled = true tracing_sample_rate = 0.1 jaeger_endpoint = "http://jaeger:14268/api/traces" [storage] # Receipt storage receipt_backend = "postgresql" receipt_retention_days = 90 checkpoint_interval = 100 [storage.postgresql] host = "${DB_HOST}" port = 5432 database = "gate_receipts" username = "${DB_USER}" password = "${DB_PASSWORD}" ``` ### Environment Variables ```bash # Required export GATE_SIGNING_KEY_PATH=/etc/gate/keys/signing.key export GATE_CONFIG_PATH=/etc/gate/config.toml # Optional overrides export GATE_TAU_DENY=0.01 export GATE_TAU_PERMIT=100.0 export GATE_MIN_CUT=5.0 export GATE_MAX_SHIFT=0.5 export GATE_PERMIT_TTL_SECONDS=300 # Secrets (never in config file) export SLACK_WEBHOOK_URL=https://hooks.slack.com/... export PAGERDUTY_API_KEY=... export DB_PASSWORD=... ``` --- ## Error Recovery Procedures ### Gate Decision Failures | Failure | Detection | Recovery | Fallback | |---------|-----------|----------|----------| | Min-cut timeout | Decision exceeds 50ms | Log, retry once | DEFER | | E-process NaN | `is_nan()` check | Reset accumulator | DENY | | Signing failure | Ed25519 error | Rotate to backup key | DENY (unsigned) | | Receipt log full | Capacity check | Archive, start new segment | DENY | ### Distributed Failures ```rust impl FaultRecovery { pub async fn handle_regional_failure(&mut self, error: RegionalError) -> GateResult { match error { RegionalError::LeaderUnavailable => { // Wait for new leader election tokio::time::sleep(Duration::from_millis(200)).await; self.retry_with_new_leader().await } RegionalError::NetworkPartition => { // Fall back to local-only decision log::warn!("Network partition detected, using local gate"); self.local_gate.evaluate_standalone() } RegionalError::ConsensusTimeout => { // Use conservative decision Ok(GateResult { decision: GateDecision::Defer, reason: "Consensus timeout - escalating to human".into(), ..Default::default() }) } } } } ``` ### Receipt Chain Recovery ```rust impl ReceiptLog { /// Recover from corrupted receipt chain pub fn recover_chain(&mut self, last_known_good: u64) -> Result<(), RecoveryError> { // 1. Truncate corrupted entries self.truncate_after(last_known_good)?; // 2. Rebuild from checkpoint let checkpoint = self.find_nearest_checkpoint(last_known_good)?; self.rebuild_from_checkpoint(checkpoint)?; // 3. Mark recovery in audit log self.append_recovery_marker(last_known_good)?; // 4. Alert operators alert::send("Receipt chain recovery performed", Severity::Warning); Ok(()) } } ``` ### Worker Tile Recovery | Failure | Detection | Recovery Time | Data Loss | |---------|-----------|---------------|-----------| | Single tile crash | Heartbeat timeout | < 100ms | Last tick | | Tile memory corruption | Checksum mismatch | < 500ms | Current shard | | TileZero crash | Primary unavailable | < 1s | None (standbys) | | Full fabric restart | All tiles down | < 5s | Rebuild from checkpoint | ### Runbook: Gate Unresponsive ```bash # 1. Check gate health curl http://gate:9090/health # 2. If unhealthy, check logs kubectl logs -l app=gate --tail=100 # 3. Check for resource exhaustion kubectl top pods -l app=gate # 4. If memory high, trigger GC curl -X POST http://gate:9090/admin/gc # 5. If still unresponsive, rolling restart kubectl rollout restart deployment/gate # 6. Verify recovery curl http://gate:9090/health curl http://gate:9090/metrics | grep gate_healthy ``` --- ## Appendix: Mathematical Foundations ### E-Value Composition For independent e-values e₁, e₂: ``` e_combined = e₁ · e₂ E[e_combined] = E[e₁] · E[e₂] ≤ 1 · 1 = 1 ``` This enables **optional continuation**: evidence accumulates validly across sessions. ### Conformal Coverage Under exchangeability or bounded distribution shift: ``` P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α - δ_t ``` Where δ_t → 0 as the algorithm adapts via retrospective adjustment. ### Anytime-Valid Stopping For any stopping time τ (possibly data-dependent): ``` P_H₀(E_τ ≥ 1/α) ≤ α ``` This holds because E_t is a nonnegative supermartingale with E[E_0] = 1.