
ADR-001: Anytime-Valid Coherence Gate

Status: Proposed
Date: 2026-01-17
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
SDK: Claude-Flow

Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-17 | ruv.io | Initial draft with three-filter architecture |
| 0.2 | 2026-01-17 | ruv.io | Added security hardening, performance optimization |
| 0.3 | 2026-01-17 | ruv.io | Added 256-tile WASM fabric mapping |
| 0.4 | 2026-01-17 | ruv.io | Added API contract, migration, observability |
| 0.5 | 2026-01-17 | ruv.io | Added hybrid agent/human workflow |
| 0.6 | 2026-01-17 | ruv.io | Added testing strategy, config format, error recovery |

Plain Language Summary

What is it?

An Anytime-Valid Coherence Gate is a small control loop that decides, at any moment:

"Is it safe to act right now, or should we pause or escalate?"

It does not try to be smart. It tries to be safe, calm, and correct about permission.

Why "anytime-valid"?

Because you can stop the computation at any time and still trust the decision.

Like a smoke detector:

  • It can keep listening forever
  • The moment it has enough evidence, it triggers
  • If you stop listening early, whatever it already concluded is still valid

You are not waiting for a model to finish thinking. You are continuously monitoring stability.

Why "coherence"?

Coherence means: does the system's current state agree with itself?

In RuVector, coherence is measured from structure:

  • RuVector holds relationships as vectors plus a graph
  • Min-cut and boundary signals tell you when the graph is becoming fragile or splitting into conflicting regions
  • If the system is splitting, you do not let it take big actions

What it outputs:

| Decision | Meaning |
|----------|---------|
| Permit | Stable enough, proceed |
| Defer | Uncertain, escalate to a stronger model or human |
| Deny | Unstable or policy-violating, block the action |

Every decision returns a short "receipt" explaining why.

A concrete example:

An agent wants to push a config change to a network device.

  • If the dependency graph is stable and similar changes worked before → Permit
  • If signals are weird (new dependencies, new actors, drift) → Defer and ask for confirmation
  • If the change crosses a fragile boundary (touches a partition already unstable) → Deny

Why it matters:

It turns autonomy into something enterprises can trust because:

  • Actions are bounded
  • Uncertainty is handled explicitly
  • You get an audit trail

"Attention becomes a permission system, not a popularity contest" — applied to whole-system actions instead of token attention.


Context

The RuVector ecosystem requires a principled mechanism for controlling autonomous agent actions with:

  • Formal safety guarantees under distribution shift
  • Computational efficiency suitable for real-time enforcement
  • Auditable decision trails with cryptographic receipts

Current approaches (threshold classifiers, rule-based systems, periodic audits) each lack at least one of these properties. This ADR proposes the Anytime-Valid Coherence Gate (AVCG), which combines three algorithms to turn coherence measurement into a deterministic control loop.

Decision

We will implement an Anytime-Valid Coherence Gate that integrates three cutting-edge algorithmic components:

1. Dynamic Min-Cut with Witness Partitions

Source: El-Hayek, Henzinger, Li (arXiv:2512.13105, December 2025)

Key Innovation: Exact deterministic n^{o(1)} update time for cuts up to 2^{Θ(log^{3/4-c}n)}

Integration:

  • Extends existing SubpolynomialMinCut in ruvector-mincut/src/subpolynomial/mod.rs
  • Leverages existing WitnessTree for explicit partition certificates
  • Uses deterministic LocalKCut for local cut verification

Role in Gate: Provides the structural coherence signal - identifies minimal intervention points in the agent action graph with explicit witness partitions showing which actions form the critical boundary to unsafe states.
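
To make "cut value plus witness partition" concrete, here is a minimal sketch using the textbook Stoer-Wagner algorithm on a small static graph. It is a stand-in for illustration only: it is not the dynamic subpolynomial algorithm cited above and not the SubpolynomialMinCut or WitnessTree API. The function name and the dense weight-matrix input are assumptions, and weights are assumed to be finite, nonnegative, and symmetric.

```rust
/// Global minimum cut of a small undirected weighted graph (Stoer-Wagner, O(n^3)).
/// `weights[u][v]` is the edge weight between u and v (0.0 if absent).
/// Returns the cut value and one side of the cut as a sorted list of vertex
/// indices (the "witness partition"). Hypothetical helper for illustration.
pub fn min_cut_with_witness(weights: &[Vec<f64>]) -> Option<(f64, Vec<usize>)> {
    let n = weights.len();
    if n < 2 || weights.iter().any(|row| row.len() != n) {
        return None;
    }
    let mut w: Vec<Vec<f64>> = weights.to_vec();
    // groups[v] lists the original vertices merged into super-vertex v.
    let mut groups: Vec<Vec<usize>> = (0..n).map(|v| vec![v]).collect();
    let mut active: Vec<usize> = (0..n).collect();
    let mut best: Option<(f64, Vec<usize>)> = None;

    while active.len() > 1 {
        // One "minimum cut phase": add vertices in maximum-adjacency order.
        let mut attach = vec![0.0_f64; n];
        let mut added = vec![false; n];
        let (mut prev, mut last) = (usize::MAX, usize::MAX);
        for _ in 0..active.len() {
            let mut next = usize::MAX;
            for &v in &active {
                if !added[v] && (next == usize::MAX || attach[v] > attach[next]) {
                    next = v;
                }
            }
            added[next] = true;
            prev = last;
            last = next;
            for &v in &active {
                if !added[v] {
                    attach[v] += w[next][v];
                }
            }
        }
        // The phase cut separates `last` from everything else; its weight is attach[last].
        let cut = attach[last];
        if best.as_ref().map_or(true, |(b, _)| cut < *b) {
            let mut side = groups[last].clone();
            side.sort_unstable();
            best = Some((cut, side));
        }
        // Merge `last` into `prev` and drop it from the active set.
        let moved = std::mem::take(&mut groups[last]);
        groups[prev].extend(moved);
        for &v in &active {
            if v != prev && v != last {
                let add = w[last][v];
                w[prev][v] += add;
                w[v][prev] += add;
            }
        }
        active.retain(|&v| v != last);
    }
    best
}
```

For two tightly connected pairs joined by a single light edge, the returned side is one of the pairs and the cut value is the weight of the bridging edge, which is the kind of fragile boundary the gate is meant to flag.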

2. Online Conformal Prediction with Shift-Awareness

Sources:

  • Retrospective Adjustment (arXiv:2511.04275, November 2025)
  • Conformal Optimistic Prediction (COP) (December 2025)
  • CORE: RL-based Conformal Regression (October 2025)

Key Innovation: Distribution-free coverage guarantees that adapt to arbitrary distribution shift with faster recalibration via retrospective adjustment.

Integration:

  • New module: ruvector-mincut/src/conformal/ for prediction sets
  • Interfaces with existing GatePolicy thresholds
  • Wraps action outcome predictions with calibrated uncertainty

Role in Gate: Provides the predictive uncertainty signal - quantifies confidence in action outcomes, triggering DEFER when prediction sets are too large.
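
As a rough illustration of shift-aware recalibration, the sketch below uses the adaptive conformal inference (ACI) update of Gibbs and Candès as a simpler stand-in. It is not the Retrospective Adjustment, COP, or CORE method cited above, and every name in it (AdaptiveAlpha, score_quantile) is hypothetical. The idea: after each observed outcome, nudge the working miscoverage level so that misses widen future prediction sets and hits tighten them.

```rust
/// Online miscoverage tracker in the style of adaptive conformal inference.
/// Update rule: alpha_{t+1} = alpha_t + gamma * (target - err_t),
/// where err_t = 1 if the last prediction set missed the outcome, else 0.
pub struct AdaptiveAlpha {
    target: f64,  // desired long-run miscoverage, e.g. 0.1
    gamma: f64,   // step size (assumed small and positive)
    alpha_t: f64, // current working miscoverage level
}

impl AdaptiveAlpha {
    pub fn new(target: f64, gamma: f64) -> Self {
        Self { target, gamma, alpha_t: target }
    }

    /// Feed back whether the previous set covered the realized outcome.
    pub fn update(&mut self, covered: bool) -> f64 {
        let err = if covered { 0.0 } else { 1.0 };
        self.alpha_t += self.gamma * (self.target - err);
        self.alpha_t
    }

    /// Quantile level to use for the next prediction set, clamped to [0, 1].
    pub fn quantile_level(&self) -> f64 {
        (1.0 - self.alpha_t).clamp(0.0, 1.0)
    }
}

/// Empirical quantile of calibration scores at `level` (plain order statistic,
/// without the finite-sample correction used in split conformal prediction).
pub fn score_quantile(scores: &[f64], level: f64) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = (level * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

A miss lowers the working level (so the next quantile is higher and the set larger), which is the behavior that would push the gate toward DEFER under drift.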

3. E-Values and E-Processes for Anytime-Valid Inference

Sources:

  • Ramdas & Wang "Hypothesis Testing with E-values" (FnTStA 2025)
  • ICML 2025 Tutorial on SAVI
  • Sequential Randomization Tests (arXiv:2512.04366, December 2025)

Key Innovation: Evidence accumulation that remains valid at any stopping time, with multiplicative composition across experiments.

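Definition: An e-value is a nonnegative random variable e with E[e] ≤ 1 under the null hypothesis. As used here, an e-process is a nonnegative supermartingale (E_t) with E_0 = 1 under the null, so E[E_τ] ≤ 1 at any stopping time τ.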

Integration:

  • New module: ruvector-mincut/src/eprocess/ for evidence tracking
  • Integrates with existing CutCertificate for audit trails
  • Enables anytime-valid stopping decisions

Role in Gate: Provides the evidential validity signal - accumulates statistical evidence for/against coherence with formal Type I error control at any stopping time.
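
A minimal sketch of evidence accumulation, assuming the simplest construction: a running product of per-observation e-values, kept in log space so long runs neither overflow nor underflow. LogEProcess and its methods are illustrative names, not the planned eprocess module API.

```rust
/// Running product of e-values, stored as a logarithm.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone)]
pub struct LogEProcess {
    log_value: f64,
    steps: u64,
}

impl LogEProcess {
    /// Starts at E_0 = 1, i.e. log E_0 = 0.
    pub fn new() -> Self {
        Self { log_value: 0.0, steps: 0 }
    }

    /// Multiplies in one e-value. Returns `None` and leaves the state untouched
    /// if `e` is not finite and strictly positive (a zero would pin the product
    /// at 0 forever, so this sketch rejects it as well).
    pub fn update(&mut self, e: f64) -> Option<f64> {
        if !e.is_finite() || e <= 0.0 {
            return None;
        }
        self.log_value += e.ln();
        self.steps += 1;
        Some(self.value())
    }

    /// Current accumulated value E_t.
    pub fn value(&self) -> f64 {
        self.log_value.exp()
    }

    /// True once E_t is at least `threshold`, compared in log space.
    pub fn at_least(&self, threshold: f64) -> bool {
        threshold > 0.0 && self.log_value >= threshold.ln()
    }

    /// Number of accepted updates so far.
    pub fn steps(&self) -> u64 {
        self.steps
    }
}
```

Because the value is valid after every update, a caller can check at_least after each observation and stop as soon as it returns true.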

Gate Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    ANYTIME-VALID COHERENCE GATE                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐ │
│   │  DYNAMIC MIN-CUT │    │    CONFORMAL     │    │   E-PROCESS      │ │
│   │    (Structural)  │    │   (Predictive)   │    │  (Evidential)    │ │
│   │                  │    │                  │    │                  │ │
│   │  SubpolynomialMC │    │  ShiftAdaptive   │    │  CoherenceTest   │ │
│   │  WitnessTree     │───▶│  PredictionSet   │───▶│  EvidenceAccum   │ │
│   │  LocalKCut       │    │  COP/CORE        │    │  StoppingRule    │ │
│   └──────────────────┘    └──────────────────┘    └──────────────────┘ │
│            │                       │                       │           │
│            ▼                       ▼                       ▼           │
│   ┌────────────────────────────────────────────────────────────────┐   │
│   │                    DECISION LOGIC                              │   │
│   │                                                                │   │
│   │   PERMIT: E_t > τ_permit ∧ action ∉ CriticalCut ∧ |C_t| small │   │
│   │   DEFER:  |C_t| large ∨ τ_deny < E_t < τ_permit               │   │
│   │   DENY:   E_t < τ_deny ∨ action ∈ WitnessPartition(unsafe)    │   │
│   │                                                                │   │
│   └────────────────────────────────────────────────────────────────┘   │
│                               │                                        │
│                               ▼                                        │
│                    ┌─────────────────────┐                            │
│                    │   WITNESS RECEIPT   │                            │
│                    │  (cut + conf + e)   │                            │
│                    └─────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────────┘

Integration with Existing Architecture

Extension Points

| Component | Current Implementation | AVCG Extension |
|-----------|------------------------|----------------|
| GatePacket | λ as point estimate | Add lambda_confidence_q15, e_value_log_q15 |
| GateController | Rule-based thresholds | Add AnytimeGatePolicy with adaptive thresholds |
| WitnessTree | Cut value only | Add ConfidenceWitness with staleness tracking |
| CutCertificate | Static verification | Add EvidenceReceipt with e-value trace |
| TierDecision | Fixed tiers | Add required_confidence_for_tier |

New Modules

ruvector-mincut/
├── src/
│   ├── conformal/           # NEW: Online conformal prediction
│   │   ├── mod.rs
│   │   ├── prediction_set.rs
│   │   ├── cop.rs           # Conformal Optimistic Prediction
│   │   ├── retrospective.rs # Retrospective adjustment
│   │   └── core.rs          # RL-based conformal
│   ├── eprocess/            # NEW: E-value and e-process tracking
│   │   ├── mod.rs
│   │   ├── evalue.rs
│   │   ├── evidence_accum.rs
│   │   ├── stopping.rs
│   │   └── mixture.rs
│   ├── anytime_gate/        # NEW: Integrated gate controller
│   │   ├── mod.rs
│   │   ├── policy.rs
│   │   ├── decision.rs
│   │   └── receipt.rs
│   └── ...existing modules...

Decision Rules

Permit Conditions (all must hold)

  1. E-process value E_t > τ_permit (sufficient evidence of coherence)
  2. Action not in witness partition of critical cut
  3. Conformal prediction set |C_t| < θ_confidence (confident prediction)

Defer Conditions (any triggers)

  1. Conformal prediction set |C_t| > θ_uncertainty (uncertain outcome)
  2. E-process in indeterminate range: τ_deny < E_t < τ_permit
  3. Deadline approaching without sufficient confidence

Deny Conditions (any triggers)

  1. E-process value E_t < τ_deny (strong evidence of incoherence)
  2. Action in witness partition crossing to unsafe states
  3. Structural impossibility via min-cut topology
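
The three rule sets can be sketched as a single function. The evaluation order below (Deny first, then the explicit Defer trigger, then Permit, with Defer as the fallback) is an assumption, since the rules above do not say which wins when several fire. The Signals and Thresholds structs and their field names are likewise hypothetical.

```rust
/// Three-way gate outcome, mirroring the Permit / Defer / Deny rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

/// Thresholds from the configuration table (field names are illustrative).
pub struct Thresholds {
    pub tau_deny: f64,
    pub tau_permit: f64,
    pub theta_uncertainty: usize,
    pub theta_confidence: usize,
}

/// Inputs to one evaluation (hypothetical struct).
pub struct Signals {
    /// Current e-process value E_t.
    pub e_value: f64,
    /// Conformal prediction set size |C_t|.
    pub set_size: usize,
    /// Action lies in a witness partition crossing to unsafe states,
    /// or is structurally impossible according to the min-cut topology.
    pub structurally_unsafe: bool,
}

pub fn decide(s: &Signals, t: &Thresholds) -> GateDecision {
    // Deny: any single condition is enough.
    if s.e_value < t.tau_deny || s.structurally_unsafe {
        return GateDecision::Deny;
    }
    // Defer: the prediction set is too large to act on.
    if s.set_size > t.theta_uncertainty {
        return GateDecision::Defer;
    }
    // Permit: every condition must hold.
    if s.e_value > t.tau_permit && s.set_size < t.theta_confidence {
        return GateDecision::Permit;
    }
    // Indeterminate evidence or a mid-sized set: escalate rather than guess.
    GateDecision::Defer
}
```

Falling through to Defer means any gap between the written rules (for example, E_t exactly equal to τ_permit) escalates instead of silently permitting.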

Threshold Configuration

| Threshold | Meaning | Recommended Default |
|-----------|---------|---------------------|
| τ_deny | E-process level indicating incoherence | 0.01 (1% false alarm) |
| τ_permit | E-process level indicating coherence | 100 (strong evidence) |
| θ_uncertainty | Conformal set size requiring deferral | Task-dependent |
| θ_confidence | Conformal set size for confident permit | Task-dependent |
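
One way to read the τ_permit default, assuming the e-process is a nonnegative supermartingale under the null hypothesis "the system is incoherent". Here α, the tolerated probability of a false permit, is a symbol introduced for this note and not one defined in this ADR. Ville's inequality then bounds the chance that the process ever crosses 1/α:

```latex
\Pr_{H_0}\left( \exists\, t \ge 0 : E_t \ge \frac{1}{\alpha} \right) \le \alpha
\quad\Longrightarrow\quad
\tau_{\text{permit}} = 100 \ \text{corresponds to}\ \alpha = 0.01 .
```

Because the bound holds over all t at once, checking E_t after every update does not inflate the error rate.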

Witness Receipt Structure

pub struct WitnessReceipt {
    /// Timestamp of decision
    pub timestamp: u64,
    /// Action that was evaluated
    pub action_id: ActionId,
    /// Gate decision
    pub decision: GateDecision,

    // Structural witness (from min-cut)
    pub cut_value: f64,
    pub witness_partition: (Vec<VertexId>, Vec<VertexId>),
    pub critical_edges: Vec<EdgeId>,

    // Predictive witness (from conformal)
    pub prediction_set: ConformalSet,
    pub coverage_target: f32,
    pub shift_adaptation_rate: f32,

    // Evidential witness (from e-process)
    pub e_value: f64,
    pub e_process_cumulative: f64,
    pub stopping_valid: bool,

    // Cryptographic seal
    pub receipt_hash: [u8; 32],
}

Security Hardening

Threat Model

| Threat Actor | Capabilities | Target | Impact |
|--------------|--------------|--------|--------|
| Malicious Agent | Action injection, timing manipulation | Gate bypass | Unauthorized actions executed |
| Network Adversary | Message interception, replay | Receipt forgery | False audit trail |
| Insider Threat | Threshold modification, key access | Policy manipulation | Safety guarantees voided |
| Byzantine Node | Arbitrary behavior in distributed gate | Consensus corruption | Inconsistent decisions |

Cryptographic Requirements

Receipt Signing (CRITICAL)

pub struct WitnessReceipt {
    // ... existing fields ...

    // Cryptographic seal (REQUIRED)
    pub receipt_hash: [u8; 32],         // Blake3 hash of serialized content
    pub signature: Ed25519Signature,     // REQUIRED, not optional
    pub signer_id: PublicKey,           // Identity of signing gate
    pub timestamp_proof: TimestampProof, // Prevents backdating
}

/// Timestamp proof prevents replay and backdating
pub struct TimestampProof {
    pub timestamp: u64,
    pub previous_receipt_hash: [u8; 32], // Chain linkage
    pub merkle_root: [u8; 32],           // Batch anchor
}

impl WitnessReceipt {
    /// Sign receipt - MUST be called before any external use
    pub fn sign(&mut self, key: &SigningKey) -> Result<(), CryptoError> {
        let content = self.serialize_without_signature();
        self.receipt_hash = blake3::hash(&content).into();
        self.signature = key.sign(&self.receipt_hash);
        Ok(())
    }

    /// Verify receipt integrity and authenticity
    pub fn verify(&self, trusted_keys: &KeyStore) -> Result<(), VerifyError> {
        // 1. Verify hash (compare against the digest bytes explicitly so the
        //    comparison type is unambiguous)
        let expected_hash = blake3::hash(&self.serialize_without_signature());
        if self.receipt_hash != *expected_hash.as_bytes() {
            return Err(VerifyError::HashMismatch);
        }

        // 2. Verify signature
        let public_key = trusted_keys.get(&self.signer_id)?;
        public_key.verify(&self.receipt_hash, &self.signature)?;

        // 3. Verify timestamp chain
        self.timestamp_proof.verify()?;

        Ok(())
    }
}
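
The chain linkage in TimestampProof can be sketched as follows. This is an illustration only: it uses the standard library's DefaultHasher (64-bit, not cryptographic) purely as a stand-in for Blake3 so the example has no dependencies, and ChainedReceipt, append, and first_broken_link are hypothetical names. The point it shows is that verifying a link needs the previous receipt's hash, and that rewriting an old entry or backdating a timestamp breaks the chain at that entry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified receipt: a payload plus linkage to its predecessor.
#[derive(Debug, Clone)]
pub struct ChainedReceipt {
    pub timestamp: u64,
    pub payload: Vec<u8>,
    pub previous_hash: u64,
    pub hash: u64,
}

/// Stand-in digest (NOT cryptographic; a real gate would use Blake3 here).
fn digest(timestamp: u64, payload: &[u8], previous_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    timestamp.hash(&mut h);
    payload.hash(&mut h);
    previous_hash.hash(&mut h);
    h.finish()
}

/// Appends a receipt linked to the current tail (0 is the genesis link).
pub fn append(chain: &mut Vec<ChainedReceipt>, timestamp: u64, payload: &[u8]) {
    let previous_hash = chain.last().map_or(0, |r| r.hash);
    let hash = digest(timestamp, payload, previous_hash);
    chain.push(ChainedReceipt { timestamp, payload: payload.to_vec(), previous_hash, hash });
}

/// Returns the index of the first receipt whose linkage, timestamp order,
/// or content hash is inconsistent, or `None` if the whole chain checks out.
pub fn first_broken_link(chain: &[ChainedReceipt]) -> Option<usize> {
    let (mut prev_hash, mut prev_ts) = (0u64, 0u64);
    for (i, r) in chain.iter().enumerate() {
        let ok = r.previous_hash == prev_hash
            && r.timestamp >= prev_ts
            && r.hash == digest(r.timestamp, &r.payload, r.previous_hash);
        if !ok {
            return Some(i);
        }
        prev_hash = r.hash;
        prev_ts = r.timestamp;
    }
    None
}
```

In the signed design above, each hash would additionally be covered by the Ed25519 signature, so an attacker cannot simply recompute the chain after tampering.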

Key Management

| Key Type | Purpose | Rotation | Storage |
|----------|---------|----------|---------|
| Gate Signing Key | Sign receipts | 30 days | HSM or secure enclave |
| Receipt Verification Keys | Verify receipts | On rotation | Distributed key store |
| Threshold Keys | Multi-party signing | 90 days | Shamir secret sharing |

Attack Mitigations

E-Value Manipulation Prevention

/// Bounds checking for e-value inputs
impl EValue {
    pub fn from_likelihood_ratio(
        likelihood_h1: f64,
        likelihood_h0: f64,
    ) -> Result<Self, EValueError> {
        // Reject NaN, infinite, or negative numerators; a NaN would otherwise
        // survive the clamp below. `InvalidNumerator` is an assumed variant name.
        if !likelihood_h1.is_finite() || likelihood_h1 < 0.0 {
            return Err(EValueError::InvalidNumerator);
        }

        // Prevent division by zero; the is_finite check also catches NaN,
        // which would pass a plain `<=` comparison
        if !likelihood_h0.is_finite() || likelihood_h0 <= f64::EPSILON {
            return Err(EValueError::InvalidDenominator);
        }

        let ratio = likelihood_h1 / likelihood_h0;

        // Bound extreme values to prevent overflow attacks
        let bounded = ratio.clamp(E_VALUE_MIN, E_VALUE_MAX);

        // Log if clamping occurred (potential attack indicator)
        if bounded != ratio {
            security_log!("E-value clamped: {} -> {}", ratio, bounded);
        }

        Ok(Self { value: bounded, ..Default::default() })
    }
}

const E_VALUE_MIN: f64 = 1e-10;
const E_VALUE_MAX: f64 = 1e10;

Race Condition Prevention

/// Atomic gate decision with sequence numbers
pub struct AtomicGateDecision {
    /// Monotonic sequence for ordering
    sequence: AtomicU64,
    /// Lock for decision atomicity
    decision_lock: RwLock<()>,
}

impl AtomicGateDecision {
    pub async fn evaluate(&self, action: &Action) -> GateResult {
        // Acquire exclusive lock for decision
        let _guard = self.decision_lock.write().await;

        // Get sequence number BEFORE evaluation
        let seq = self.sequence.fetch_add(1, Ordering::SeqCst);

        // Evaluate all three signals atomically
        let result = self.evaluate_internal(action, seq).await;

        // Sequence number in receipt ensures ordering
        result.with_sequence(seq)
    }
}

Replay Attack Prevention

/// Replay prevention via nonce tracking
pub struct ReplayGuard {
    /// Recent action hashes (bloom filter for efficiency)
    recent_actions: BloomFilter,
    /// Sliding window of full hashes for false positive resolution
    hash_window: VecDeque<[u8; 32]>,
    /// Maximum age of tracked actions
    window_duration: Duration,
}

impl ReplayGuard {
    pub fn check_and_record(&mut self, action: &Action) -> Result<(), ReplayError> {
        let hash = action.content_hash();

        // Fast path: bloom filter check
        if self.recent_actions.might_contain(&hash) {
            // Slow path: verify against full hash window
            if self.hash_window.contains(&hash) {
                return Err(ReplayError::DuplicateAction { hash });
            }
        }

        // Record action
        self.recent_actions.insert(&hash);
        self.hash_window.push_back(hash);
        self.prune_old_entries();

        Ok(())
    }
}
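
The ReplayGuard above prunes by age, which requires a timestamp per tracked hash. A minimal runnable sketch of that windowing logic follows, using a HashSet in place of the bloom filter for simplicity. WindowedReplayGuard is a hypothetical name, and the caller is assumed to supply the 32-byte content hash and the current time.

```rust
use std::collections::{HashSet, VecDeque};
use std::time::{Duration, Instant};

/// Rejects any action hash seen within the last `window`.
pub struct WindowedReplayGuard {
    /// Exact membership set (stands in for bloom filter plus hash window).
    seen: HashSet<[u8; 32]>,
    /// Insertion order with timestamps, oldest first, for age-based pruning.
    order: VecDeque<(Instant, [u8; 32])>,
    window: Duration,
}

impl WindowedReplayGuard {
    pub fn new(window: Duration) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), window }
    }

    /// Returns `true` if the hash is fresh and was recorded,
    /// `false` if it is a replay inside the window.
    pub fn check_and_record(&mut self, hash: [u8; 32], now: Instant) -> bool {
        // Drop entries older than the window before checking.
        while let Some(&(seen_at, old_hash)) = self.order.front() {
            if now.saturating_duration_since(seen_at) > self.window {
                self.order.pop_front();
                self.seen.remove(&old_hash);
            } else {
                break;
            }
        }
        // `insert` returns false when the hash is already present.
        if !self.seen.insert(hash) {
            return false;
        }
        self.order.push_back((now, hash));
        true
    }

    /// Number of hashes currently tracked.
    pub fn tracked(&self) -> usize {
        self.order.len()
    }
}
```

Passing the clock in as a parameter keeps the guard deterministic under test; memory stays bounded by the number of distinct actions per window.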

Trust Boundaries

┌─────────────────────────────────────────────────────────────────────────┐
│                         TRUST BOUNDARY: GATE CORE                       │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  • E-process computation    • Min-cut evaluation                 │  │
│  │  • Conformal prediction     • Decision logic                     │  │
│  │  • Receipt signing          • Key material                       │  │
│  │                                                                   │  │
│  │  Invariants:                                                      │  │
│  │  - All inputs validated before use                               │  │
│  │  - All outputs signed before release                             │  │
│  │  - No external calls during decision                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                    │
│                         (authenticated channel)                         │
│                                    │                                    │
└────────────────────────────────────┼────────────────────────────────────┘
                                     │
┌────────────────────────────────────┼────────────────────────────────────┐
│                    TRUST BOUNDARY: AGENT INTERFACE                      │
│                                    │                                    │
│  • Action submission (validated)   │  • Decision receipt (verified)    │
│  • Context provision (sanitized)   │  • Witness query (authenticated)  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Performance Optimization

Identified Bottlenecks & Solutions

1. E-Process History Management

Problem: Unbounded history growth in EProcess.history: Vec<EValue>

Solution: Ring buffer with configurable retention

pub struct EProcess {
    /// Current accumulated value (always maintained)
    current: f64,

    /// Rule for folding a new e-value into `current` (type name assumed)
    update_rule: UpdateRule,

    /// Total number of updates seen; unlike the ring buffer length,
    /// this keeps growing after the buffer is full
    update_count: u64,

    /// Bounded history ring buffer
    history: RingBuffer<EValueSummary>,

    /// Checkpoint for long-term audit (sampled)
    checkpoints: Vec<EProcessCheckpoint>,
}

/// Compact summary for history
pub struct EValueSummary {
    value: f32,           // Reduced precision for storage
    timestamp: u32,       // Relative to epoch
    flags: u8,            // Metadata bits
}

impl EProcess {
    const HISTORY_CAPACITY: usize = 1024;
    const CHECKPOINT_INTERVAL: u64 = 100;

    pub fn update(&mut self, e: EValue) {
        // Update current (always)
        self.current = self.update_rule.apply(self.current, e.value);

        // Add to ring buffer (bounded)
        self.history.push(e.to_summary());

        // Periodic checkpoint for audit. Keyed on the update counter rather
        // than history.len(), which stops changing once the buffer is full.
        self.update_count += 1;
        if self.update_count % Self::CHECKPOINT_INTERVAL == 0 {
            self.checkpoints.push(self.checkpoint());
        }
    }
}

2. Min-Cut Hierarchy Updates

Problem: Sequential iteration over all hierarchy levels

Solution: Lazy propagation with dirty tracking

pub struct LazyHierarchy {
    levels: Vec<HierarchyLevel>,
    /// Bitmap of levels needing update
    dirty_levels: u64,
    /// Deferred updates queue
    pending_updates: VecDeque<DeferredUpdate>,
}

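impl LazyHierarchy {
    pub fn insert(&mut self, edge: Edge) {
        // Only update lowest level immediately
        self.levels[0].insert(edge.clone());

        // Level 0 is now current; the level above it is stale
        if self.levels.len() > 1 {
            self.dirty_levels |= 1u64 << 1;
        }

        // Defer higher level updates
        self.pending_updates.push_back(DeferredUpdate::Insert(edge));
    }

    pub fn get_cut(&mut self) -> CutValue {
        // Propagate only if needed for query
        if self.dirty_levels != 0 {
            self.propagate_lazy();
        }
        self.levels.last().unwrap().cut_value()
    }

    fn propagate_lazy(&mut self) {
        // Process dirty levels bottom-up; refreshing a level marks the next
        // one dirty, so the top level read by get_cut() is never stale
        while self.dirty_levels != 0 {
            let level = self.dirty_levels.trailing_zeros() as usize;
            self.update_level(level);
            self.dirty_levels &= !(1u64 << level);
            if level + 1 < self.levels.len() {
                self.dirty_levels |= 1u64 << (level + 1);
            }
        }
        // All levels have absorbed the deferred updates (assumes update_level
        // applies pending_updates to the level it refreshes)
        self.pending_updates.clear();
    }
}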

3. SIMD-Optimized E-Value Computation

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Batch e-value computation with SIMD
pub fn compute_mixture_evalue_simd(
    likelihoods_h1: &[f64],
    likelihoods_h0: &[f64],
    weights: &[f64],
) -> f64 {
    assert_eq!(likelihoods_h1.len(), likelihoods_h0.len());
    assert_eq!(likelihoods_h1.len(), weights.len());

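    #[cfg(target_feature = "avx2")]
    unsafe {
        // Process only full 4-lane chunks; an unaligned load that runs past
        // the end of a slice would be undefined behavior
        let n = likelihoods_h1.len();
        let simd_len = n - n % 4;
        let mut sum = _mm256_setzero_pd();

        for i in (0..simd_len).step_by(4) {
            let h1 = _mm256_loadu_pd(likelihoods_h1.as_ptr().add(i));
            let h0 = _mm256_loadu_pd(likelihoods_h0.as_ptr().add(i));
            let w = _mm256_loadu_pd(weights.as_ptr().add(i));

            let ratio = _mm256_div_pd(h1, h0);
            let weighted = _mm256_mul_pd(ratio, w);
            sum = _mm256_add_pd(sum, weighted);
        }

        // Horizontal sum of the four lanes (horizontal_sum_pd is a helper
        // not shown here), then the scalar tail for the remaining 0-3 items
        let mut total = horizontal_sum_pd(sum);
        for i in simd_len..n {
            total += (likelihoods_h1[i] / likelihoods_h0[i]) * weights[i];
        }
        total
    }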

    #[cfg(not(target_feature = "avx2"))]
    {
        // Scalar fallback
        likelihoods_h1.iter()
            .zip(likelihoods_h0.iter())
            .zip(weights.iter())
            .map(|((h1, h0), w)| (h1 / h0) * w)
            .sum()
    }
}

4. Receipt Serialization Optimization

/// Zero-copy receipt serialization
pub struct ReceiptBuffer {
    /// Pre-allocated buffer pool
    pool: BufferPool,
    /// Current buffer
    current: Buffer,
}

impl WitnessReceipt {
    /// Serialize to pre-allocated buffer (zero-copy)
    pub fn serialize_into(&self, buffer: &mut [u8]) -> Result<usize, SerializeError> {
        let mut cursor = 0;

        // Fixed-size header (no allocation)
        cursor += self.write_header(&mut buffer[cursor..])?;

        // Structural witness (fixed size)
        cursor += self.structural.write_to(&mut buffer[cursor..])?;

        // Predictive witness (bounded size)
        cursor += self.predictive.write_to(&mut buffer[cursor..])?;

        // Evidential witness (fixed size)
        cursor += self.evidential.write_to(&mut buffer[cursor..])?;

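        // Hash and signature (fixed size). Check the remaining space first so
        // a short buffer returns an error instead of panicking on the slice
        // index. `BufferTooSmall` is an assumed variant name.
        if buffer.len() < cursor + 32 + 64 {
            return Err(SerializeError::BufferTooSmall);
        }
        buffer[cursor..cursor + 32].copy_from_slice(&self.receipt_hash);
        cursor += 32;
        buffer[cursor..cursor + 64].copy_from_slice(&self.signature.to_bytes());
        cursor += 64;

        Ok(cursor)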
    }
}

Latency Budget (Revised)

| Component | Budget | Optimization | Measured p99 |
|-----------|--------|--------------|--------------|
| Min-cut query | 10 ms | Lazy propagation | TBD |
| Conformal prediction | 15 ms | Cached quantiles | TBD |
| E-process update | 5 ms | SIMD mixture | TBD |
| Decision logic | 5 ms | Short-circuit | TBD |
| Receipt generation | 10 ms | Zero-copy serialize | TBD |
| Signing | 5 ms | Ed25519 batch | TBD |
| Total | 50 ms | | |

Distributed Coordination

Multi-Agent Gate Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED COHERENCE GATE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   REGIONAL      │    │   REGIONAL      │    │   REGIONAL      │     │
│  │   GATE (Raft)   │    │   GATE (Raft)   │    │   GATE (Raft)   │     │
│  │                 │    │                 │    │                 │     │
│  │  • Local cuts   │    │  • Local cuts   │    │  • Local cuts   │     │
│  │  • Local conf   │    │  • Local conf   │    │  • Local conf   │     │
│  │  • Local e-proc │    │  • Local e-proc │    │  • Local e-proc │     │
│  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘     │
│           │                      │                      │              │
│           └──────────────────────┼──────────────────────┘              │
│                                  │                                     │
│                    ┌─────────────▼─────────────┐                       │
│                    │   GLOBAL COORDINATOR      │                       │
│                    │   (DAG Consensus)         │                       │
│                    │                           │                       │
│                    │  • Cross-region cuts      │                       │
│                    │  • Aggregated e-process   │                       │
│                    │  • Boundary arbitration   │                       │
│                    └───────────────────────────┘                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Hierarchical Decision Protocol

/// Distributed gate with hierarchical coordination
pub struct DistributedGateController {
    /// Local gate for fast-path decisions
    local_gate: AnytimeGateController,

    /// Regional coordinator (Raft consensus)
    regional: RegionalCoordinator,

    /// Global coordinator (DAG consensus)
    global: GlobalCoordinator,

    /// Decision routing policy
    routing: DecisionRoutingPolicy,
}

pub enum DecisionScope {
    /// Action affects only local partition
    Local,
    /// Action crosses regional boundary
    Regional,
    /// Action has global implications
    Global,
}

impl DistributedGateController {
    pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
        // 1. Determine scope
        let scope = self.routing.classify(action, context);

        // 2. Route to appropriate level
        match scope {
            DecisionScope::Local => {
                // Fast path: local decision only
                self.local_gate.evaluate(action, context)
            }

            DecisionScope::Regional => {
                // Medium path: coordinate with regional peers
                let local_result = self.local_gate.evaluate(action, context);
                let regional_result = self.regional.coordinate(action, &local_result).await?;
                self.merge_results(local_result, regional_result)
            }

            DecisionScope::Global => {
                // Slow path: full coordination
                let local_result = self.local_gate.evaluate(action, context);
                let regional_result = self.regional.coordinate(action, &local_result).await?;
                let global_result = self.global.arbitrate(action, &regional_result).await?;
                self.merge_all_results(local_result, regional_result, global_result)
            }
        }
    }
}

Distributed E-Process Aggregation

/// E-process that aggregates across distributed gates
pub struct DistributedEProcess {
    /// Local e-process
    local: EProcess,

    /// Peer e-process summaries (received via gossip)
    peer_summaries: HashMap<NodeId, EProcessSummary>,

    /// Aggregation method
    aggregation: AggregationMethod,
}

pub enum AggregationMethod {
    /// Conservative: minimum across all nodes
    Minimum,
    /// Average with confidence weighting
    WeightedAverage,
    /// Consensus-based (requires agreement)
    Consensus { threshold: f64 },
}

impl DistributedEProcess {
    /// Get aggregated e-value for distributed decision
    pub fn aggregated_value(&self) -> f64 {
        match self.aggregation {
            AggregationMethod::Minimum => {
                let local = self.local.current_value();
                let peer_min = self.peer_summaries.values()
                    .map(|s| s.current_value)
                    .fold(f64::INFINITY, f64::min);
                local.min(peer_min)
            }

            AggregationMethod::WeightedAverage => {
                let total_weight: f64 = 1.0 + self.peer_summaries.values()
                    .map(|s| s.confidence_weight)
                    .sum::<f64>();

                let weighted_sum = self.local.current_value()
                    + self.peer_summaries.values()
                        .map(|s| s.current_value * s.confidence_weight)
                        .sum::<f64>();

                weighted_sum / total_weight
            }

            AggregationMethod::Consensus { threshold } => {
                // Requires threshold fraction of nodes to agree
                let values: Vec<f64> = std::iter::once(self.local.current_value())
                    .chain(self.peer_summaries.values().map(|s| s.current_value))
                    .collect();

                // Return median if sufficient agreement, else conservative min
                if self.check_agreement(&values, threshold) {
                    statistical_median(&values)
                } else {
                    values.iter().cloned().fold(f64::INFINITY, f64::min)
                }
            }
        }
    }
}

Fault Tolerance

/// Fault-tolerant gate with automatic failover
pub struct FaultTolerantGate {
    /// Primary gate
    primary: AnytimeGateController,

    /// Standby gates (hot standbys)
    standbys: Vec<AnytimeGateController>,

    /// Health monitor
    health: HealthMonitor,

    /// Failover policy
    failover: FailoverPolicy,
}

pub struct FailoverPolicy {
    /// Maximum consecutive failures before failover
    max_failures: u32,
    /// Health check interval
    check_interval: Duration,
    /// Recovery grace period
    recovery_grace: Duration,
}

impl FaultTolerantGate {
    /// Always yields a decision: if every gate errors, the result is Deny.
    /// Here the individual gates are treated as fallible (returning a Result),
    /// while this wrapper itself never fails.
    pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
        // Try primary
        match self.try_primary(action, context).await {
            Ok(result) => return result,
            Err(e) => {
                self.health.record_failure(&e);
            }
        }

        // Failover to standbys. Iterate by index so that `self` is not
        // mutably borrowed by the iterator when promote_standby is called.
        for idx in 0..self.standbys.len() {
            match self.standbys[idx].evaluate(action, context) {
                Ok(result) => {
                    // Promote standby if primary unhealthy
                    if self.health.should_failover() {
                        self.promote_standby(idx);
                    }
                    return result;
                }
                Err(e) => {
                    self.health.record_standby_failure(idx, &e);
                }
            }
        }

        // All gates failed - safe default
        GateResult {
            decision: GateDecision::Deny,
            reason: "All gates unavailable - failing safe".into(),
            ..Default::default()
        }
    }
}

Integration with RuVector Consensus

| Consensus Layer | RuVector Module | Gate Integration |
|-----------------|-----------------|------------------|
| Regional (Raft) | ruvector-raft | Local cut coordination, leader-based decisions |
| Global (DAG) | ruvector-cluster | Cross-region boundary arbitration |
| State Sync | ruvector-sync | E-process summary propagation |
| Receipt Chain | ruvector-merkle | Distributed receipt verification |

Hardware Mapping: 256-Tile WASM Fabric

The coherence gate is an ideal workload for event-driven WASM hardware: mostly silent, then extremely decisive when boundaries move.

Tile Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         256-TILE COGNITUM FABRIC                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                        TILE ZERO (Arbiter)                       │   │
│  │                                                                  │   │
│  │  • Merge worker reports      • Hierarchical min-cut             │   │
│  │  • Global gate decision      • Permit token issuance            │   │
│  │  • Witness receipt log       • Hash-chained eventlog            │   │
│  └──────────────────────────────┬───────────────────────────────────┘   │
│                                 │                                       │
│            ┌────────────────────┼────────────────────┐                 │
│            │                    │                    │                  │
│            ▼                    ▼                    ▼                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐             │
│  │  Workers     │    │  Workers     │    │  Workers     │   ...       │
│  │  [1-85]      │    │  [86-170]    │    │  [171-255]   │             │
│  │              │    │              │    │              │             │
│  │  Shard A     │    │  Shard B     │    │  Shard C     │             │
│  │  Local cuts  │    │  Local cuts  │    │  Local cuts  │             │
│  │  E-accum     │    │  E-accum     │    │  E-accum     │             │
│  └──────────────┘    └──────────────┘    └──────────────┘             │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Worker Tile Responsibilities

Each of the 255 worker tiles maintains a local shard:

/// Worker tile state (fits in ~64KB WASM memory)
#[repr(C)]
pub struct WorkerTileState {
    /// Tile identifier (1..=255; tile 0 is the arbiter), reported in `TileReport`
    id: u8,

    /// Compact neighborhood graph (edges + weights)
    graph_shard: CompactGraph,          // ~32KB

    /// Rolling feature window for normality scores
    feature_window: RingBuffer<f32>,    // ~8KB

    /// Local coherence score
    coherence: f32,

    /// Local boundary candidates (top-k edges)
    boundary_edges: [EdgeId; 8],

    /// Local e-value accumulator
    e_accumulator: f64,

    /// Timestamp of the most recent tick in nanoseconds, set by `tick()`
    tick: u64,
}

/// Per-tick processing: only deltas
impl WorkerTileState {
    /// Process incoming delta (edge add/remove/weight update)
    pub fn ingest_delta(&mut self, delta: &Delta) -> Status {
        match delta {
            Delta::EdgeAdd(e) => self.graph_shard.add_edge(e),
            Delta::EdgeRemove(e) => self.graph_shard.remove_edge(e),
            Delta::WeightUpdate(e, w) => self.graph_shard.update_weight(e, *w),
            Delta::Observation(score) => self.feature_window.push(*score),
        }
        self.update_local_coherence();
        Status::Ok
    }

    /// Tick: compute and emit report
    pub fn tick(&mut self, now_ns: u64) -> TileReport {
        self.tick = now_ns;

        // Tiny math: update e-accumulator
        self.e_accumulator = self.compute_local_evalue();

        TileReport {
            tile_id: self.id,
            coherence: self.coherence,
            boundary_moved: self.detect_boundary_movement(),
            suspicious_edges: self.top_k_suspicious(),
            e_value: self.e_accumulator as f32,
            witness_fragment: self.extract_witness_fragment(),
        }
    }
}

/// Fixed-size report (fits in single cache line)
#[repr(C, align(64))]
pub struct TileReport {
    tile_id: u8,
    coherence: f32,
    boundary_moved: bool,
    suspicious_edges: [EdgeId; 4],
    e_value: f32,
    witness_fragment: WitnessFragment,
}
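
The worker code above calls `compute_local_evalue` without showing it. As a minimal sketch of one way such an accumulator might work, assume a multiplicative update by a per-observation likelihood ratio, kept in log space and clamped. The type `EAccumulator` and the bounds `LOG_E_MIN` / `LOG_E_MAX` are hypothetical; the ADR only states that e-values are bounded and clamped.

```rust
/// Hypothetical clamp bounds on the log e-value (not taken from the ADR).
pub const LOG_E_MIN: f64 = -20.0;
pub const LOG_E_MAX: f64 = 20.0;

/// Running e-value stored in log space so long products neither overflow nor underflow.
#[derive(Debug, Clone, Copy)]
pub struct EAccumulator {
    log_e: f64,
}

impl EAccumulator {
    /// Start at e = 1 (log e = 0): no evidence in either direction.
    pub fn new() -> Self {
        Self { log_e: 0.0 }
    }

    /// Multiply the running e-value by one likelihood ratio.
    /// Non-finite or non-positive ratios are ignored rather than trusted.
    pub fn update(&mut self, likelihood_ratio: f64) {
        if !likelihood_ratio.is_finite() || likelihood_ratio <= 0.0 {
            return;
        }
        self.log_e = (self.log_e + likelihood_ratio.ln()).clamp(LOG_E_MIN, LOG_E_MAX);
    }

    /// Current e-value on the natural scale.
    pub fn value(&self) -> f64 {
        self.log_e.exp()
    }
}
```

Working in log space turns the product into a sum, which keeps a long-running tile numerically stable; the clamp is where a real implementation would presumably also emit the security log entry mentioned under Risks & Mitigations.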

TileZero Responsibilities

TileZero acts as the arbiter that issues final decisions:

/// TileZero: Global gate decision and permit issuance
pub struct TileZero {
    /// Merged supergraph (reduced from worker summaries)
    supergraph: ReducedGraph,

    /// Canonical permit token state
    permit_state: PermitState,

    /// Hash-chained witness receipt log
    receipt_log: ReceiptLog,

    /// Threshold configuration
    thresholds: GateThresholds,
}

impl TileZero {
    /// Collect reports from all worker tiles
    pub fn collect_reports(&mut self, reports: &[TileReport; 255]) {
        // Merge worker summaries into supergraph
        for report in reports {
            if report.boundary_moved {
                self.supergraph.update_from_fragment(&report.witness_fragment);
            }
            self.supergraph.update_coherence(report.tile_id, report.coherence);
        }
    }

    /// Issue gate decision (microsecond latency)
    pub fn decide(&mut self, action_ctx: &ActionContext) -> PermitToken {
        // Three stacked filters:

        // 1. Structural filter (global cut on reduced graph)
        let structural_ok = self.supergraph.global_cut() >= self.thresholds.min_cut;

        // 2. Shift filter (aggregated shift pressure)
        let shift_pressure = self.aggregate_shift_pressure();
        let shift_ok = shift_pressure < self.thresholds.max_shift;

        // 3. Evidence filter (can stop immediately if enough evidence)
        let e_aggregate = self.aggregate_evidence();
        let evidence_decision = self.evidence_decision(e_aggregate);

        // Combined decision
        let decision = match (structural_ok, shift_ok, evidence_decision) {
            (false, _, _) => GateDecision::Deny,  // Structure broken
            (_, false, _) => GateDecision::Defer, // Shift detected
            (_, _, EvidenceDecision::Reject) => GateDecision::Deny,
            (_, _, EvidenceDecision::Continue) => GateDecision::Defer,
            (true, true, EvidenceDecision::Accept) => GateDecision::Permit,
        };

        // Issue token
        self.issue_permit_token(action_ctx, decision)
    }

    /// Issue permit token (a signed capability)
    fn issue_permit_token(
        &mut self,
        ctx: &ActionContext,
        decision: GateDecision,
    ) -> PermitToken {
        let witness_hash = self.compute_witness_hash();

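        let mut token = PermitToken {
            decision,
            action_id: ctx.action_id,
            timestamp: now_ns(),
            ttl_ns: self.thresholds.permit_ttl,
            witness_hash,
            sequence: self.permit_state.next_sequence(),
            mac: [0u8; 32],  // placeholder until the token is signed below
        };

        // MAC or sign the token (computed while the `mac` field is still zeroed)
        let mac = self.permit_state.sign(&token);

        // Emit receipt
        self.emit_receipt(&token, &mac);

        // Attach the MAC and hand the finished token to the caller
        token.mac = mac;
        token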
    }

    /// Emit witness receipt (hash-chained)
    fn emit_receipt(&mut self, token: &PermitToken, mac: &[u8; 32]) {
        let receipt = WitnessReceipt {
            token: token.clone(),
            mac: *mac,
            previous_hash: self.receipt_log.last_hash(),
            witness_summary: self.supergraph.witness_summary(),
        };

        self.receipt_log.append(receipt);
    }
}

/// Permit token: a capability that agents must present
#[derive(Clone)]  // `emit_receipt` stores a copy of the token in the receipt
#[repr(C)]
pub struct PermitToken {
    pub decision: GateDecision,
    pub action_id: ActionId,
    pub timestamp: u64,
    pub ttl_ns: u64,
    pub witness_hash: [u8; 32],
    pub sequence: u64,
    pub mac: [u8; 32],  // HMAC or signature
}

impl PermitToken {
    /// Agents must present valid token to perform actions
    pub fn is_valid(&self, verifier: &Verifier) -> bool {
        // Check TTL (saturating add avoids overflow for a very large `ttl_ns`)
        if now_ns() > self.timestamp.saturating_add(self.ttl_ns) {
            return false;
        }

        // Verify MAC/signature
        verifier.verify(self, &self.mac)
    }
}
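
The priority order inside `TileZero::decide` is easier to check in isolation. The sketch below restates it as two pure functions. The threshold names `tau_deny` and `tau_permit` follow the configuration section; the free functions `evidence_decision` and `combine` are illustrative stand-ins for the methods above, not the crate's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvidenceDecision {
    Accept,
    Continue,
    Reject,
}

/// Map an aggregate e-value to a verdict: below `tau_deny` rejects,
/// above `tau_permit` accepts, anything else (including NaN) keeps waiting.
pub fn evidence_decision(e_value: f64, tau_deny: f64, tau_permit: f64) -> EvidenceDecision {
    if e_value < tau_deny {
        EvidenceDecision::Reject
    } else if e_value > tau_permit {
        EvidenceDecision::Accept
    } else {
        EvidenceDecision::Continue
    }
}

/// Combine the three filters in the same priority order as `decide`:
/// structure first, then shift, then evidence.
pub fn combine(structural_ok: bool, shift_ok: bool, evidence: EvidenceDecision) -> GateDecision {
    match (structural_ok, shift_ok, evidence) {
        (false, _, _) => GateDecision::Deny,
        (_, false, _) => GateDecision::Defer,
        (_, _, EvidenceDecision::Reject) => GateDecision::Deny,
        (_, _, EvidenceDecision::Continue) => GateDecision::Defer,
        (true, true, EvidenceDecision::Accept) => GateDecision::Permit,
    }
}
```

Note one consequence of the ordering: when structure is intact but shift is detected, the result is Defer even if the evidence filter would reject. Only a structural failure or a rejection under stable shift yields Deny.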

WASM Kernel API

Each tile runs a minimal WASM kernel:

/// Worker tile WASM exports
#[no_mangle]
pub extern "C" fn ingest_delta(delta_ptr: *const u8, len: usize) -> u32 {
    let delta = unsafe { core::slice::from_raw_parts(delta_ptr, len) };
    TILE_STATE.with(|state| state.borrow_mut().ingest_delta(delta))
}

#[no_mangle]
pub extern "C" fn tick(now_ns: u64) -> *const TileReport {
    TILE_STATE.with(|state| state.borrow_mut().tick(now_ns))
}

#[no_mangle]
pub extern "C" fn get_witness_fragment(id: u32) -> *const u8 {
    TILE_STATE.with(|state| state.borrow().get_witness_fragment(id))
}

/// TileZero WASM/native exports
#[no_mangle]
pub extern "C" fn collect_reports(reports_ptr: *const TileReport, count: usize) {
    TILEZERO.with(|tz| tz.borrow_mut().collect_reports(reports_ptr, count))
}

#[no_mangle]
pub extern "C" fn decide(action_ctx_ptr: *const ActionContext) -> *const PermitToken {
    TILEZERO.with(|tz| tz.borrow_mut().decide(action_ctx_ptr))
}

#[no_mangle]
pub extern "C" fn get_receipt(sequence: u64) -> *const WitnessReceipt {
    TILEZERO.with(|tz| tz.borrow().get_receipt(sequence))
}
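
The `ingest_delta` export receives raw bytes, while `WorkerTileState::ingest_delta` expects a typed `Delta`, so some decoding step has to sit between them. The ADR does not specify a wire format. The sketch below assumes a purely hypothetical one (one tag byte followed by little-endian fields) and a simplified `Delta` whose variant shapes are illustrative only.

```rust
/// Simplified delta type for illustration; field layout is an assumption.
#[derive(Debug, PartialEq)]
pub enum Delta {
    EdgeAdd { edge: u32, weight: f32 },
    EdgeRemove { edge: u32 },
    WeightUpdate { edge: u32, weight: f32 },
    Observation(f32),
}

/// Decode one delta from a hypothetical wire format:
/// byte 0 is a tag, the remainder is little-endian payload.
/// Returns `None` for unknown tags or payloads of the wrong length.
pub fn decode_delta(bytes: &[u8]) -> Option<Delta> {
    let (&tag, rest) = bytes.split_first()?;
    // Read a little-endian u32 / f32 at byte offset `i` of the payload.
    let u32_at = |i: usize| {
        rest.get(i..i + 4)
            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };
    let f32_at = |i: usize| u32_at(i).map(f32::from_bits);

    match (tag, rest.len()) {
        (0, 8) => Some(Delta::EdgeAdd { edge: u32_at(0)?, weight: f32_at(4)? }),
        (1, 4) => Some(Delta::EdgeRemove { edge: u32_at(0)? }),
        (2, 8) => Some(Delta::WeightUpdate { edge: u32_at(0)?, weight: f32_at(4)? }),
        (3, 4) => Some(Delta::Observation(f32_at(0)?)),
        _ => None,
    }
}
```

Rejecting malformed input with `None` instead of panicking matters at this boundary, since the bytes come from outside the tile and a trap in the kernel would silence that worker.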

v0 Implementation Strategy

Ship fast by layering:

| Phase | Components | Skip Initially |
|---|---|---|
| v0.1 | Structural coherence + witness receipt | Shift filter, evidence filter |
| v0.2 | Add shift filter (normality scores) | CORE RL adaptation |
| v0.3 | Add evidence filter (e-values) | Mixture e-values |
| v1.0 | Full three-filter stack | - |

Rust Deliverables

| Crate | Description | Dependencies |
|---|---|---|
| cognitum-gate-kernel | no_std WASM kernel for worker tiles | ruvector-mincut (core algorithms) |
| cognitum-gate-tilezero | Native arbiter for TileZero | ruvector-mincut, blake3, ed25519 |
| mcp-gate | MCP server for agent integration | cognitum-gate-tilezero |

cognitum-gate/
├── cognitum-gate-kernel/      # no_std WASM
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs             # WASM exports
│       ├── shard.rs           # Compact graph shard
│       ├── evidence.rs        # Local e-accumulator
│       └── report.rs          # TileReport generation
│
├── cognitum-gate-tilezero/    # Native arbiter
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs
│       ├── merge.rs           # Report merging
│       ├── supergraph.rs      # Reduced global graph
│       ├── permit.rs          # Token issuance
│       └── receipt.rs         # Hash-chained log
│
└── mcp-gate/                  # MCP integration
    ├── Cargo.toml
    └── src/
        ├── lib.rs
        ├── tools.rs           # permit_action, get_receipt, replay_decision
        └── server.rs          # MCP server

MCP Gate Tools

/// MCP tool: Request permission for an action
#[mcp_tool]
pub async fn permit_action(
    action_id: String,
    action_type: String,
    context: serde_json::Value,
) -> Result<PermitResponse, McpError> {
    let ctx = ActionContext::from_json(&context)?;
    let token = TILEZERO.decide(&ctx);

    Ok(PermitResponse {
        decision: token.decision.to_string(),
        token: token.encode_base64(),
        witness_hash: hex::encode(&token.witness_hash),
        valid_until_ns: token.timestamp + token.ttl_ns,
    })
}

/// MCP tool: Get witness receipt for audit
#[mcp_tool]
pub async fn get_receipt(sequence: u64) -> Result<ReceiptResponse, McpError> {
    let receipt = TILEZERO.get_receipt(sequence)?;

    Ok(ReceiptResponse {
        sequence,
        decision: receipt.token.decision.to_string(),
        timestamp: receipt.token.timestamp,
        witness_summary: receipt.witness_summary.to_json(),
        previous_hash: hex::encode(&receipt.previous_hash),
        receipt_hash: hex::encode(&receipt.hash()),
    })
}

/// MCP tool: Replay decision for debugging/audit
#[mcp_tool]
pub async fn replay_decision(
    sequence: u64,
    verify_chain: bool,
) -> Result<ReplayResponse, McpError> {
    let receipt = TILEZERO.get_receipt(sequence)?;

    // Optionally verify hash chain
    if verify_chain {
        TILEZERO.verify_chain_to(sequence)?;
    }

    // Replay the decision with logged state
    let replayed = TILEZERO.replay(&receipt)?;

    Ok(ReplayResponse {
        original_decision: receipt.token.decision.to_string(),
        replayed_decision: replayed.decision.to_string(),
        match_confirmed: receipt.token.decision == replayed.decision,
        state_snapshot: replayed.state_snapshot.to_json(),
    })
}

The Practical Win

This gives Cognitum a clear job that buyers understand:

"We do not just detect issues, we prevent unsafe actions."
"We can prove why we blocked or allowed it."
"We stay calm until structure breaks."

The permit token as a capability means:

  • Agents cannot act without presenting a valid token
  • Tokens expire (TTL-bounded)
  • Every token is backed by a witness receipt
  • The entire chain is cryptographically verifiable
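
The chain property in the last bullet can be illustrated with a toy receipt log. This is a sketch only: it uses the standard library's `DefaultHasher` as a stand-in so that it runs without dependencies, whereas the deliverables table names blake3 and Ed25519 for the real log. `DefaultHasher` is not cryptographic and must not be used for actual receipts. The types `Receipt` and `ReceiptLog` here are simplified and hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy receipt: just enough fields to show the linkage.
#[derive(Debug, Clone)]
pub struct Receipt {
    pub sequence: u64,
    pub decision: String,
    pub previous_hash: u64,
}

impl Receipt {
    /// Digest over every field, including the link to the previous receipt.
    pub fn digest(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.sequence.hash(&mut h);
        self.decision.hash(&mut h);
        self.previous_hash.hash(&mut h);
        h.finish()
    }
}

#[derive(Default)]
pub struct ReceiptLog {
    pub receipts: Vec<Receipt>,
}

impl ReceiptLog {
    /// Append a receipt that points at the digest of the current tail (0 for the first).
    pub fn append(&mut self, decision: &str) {
        let previous_hash = self.receipts.last().map_or(0, Receipt::digest);
        let sequence = self.receipts.len() as u64;
        self.receipts.push(Receipt {
            sequence,
            decision: decision.to_string(),
            previous_hash,
        });
    }

    /// Walk the chain and confirm each stored link equals the recomputed digest.
    pub fn verify(&self) -> bool {
        let mut expected = 0u64;
        self.receipts.iter().all(|r| {
            let ok = r.previous_hash == expected;
            expected = r.digest();
            ok
        })
    }
}
```

Editing any receipt other than the last changes its digest and breaks the link stored in its successor. The tail has no successor, so a real log also needs the head hash anchored somewhere outside the log, for example by signing it.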

API Contract

Request: Permit Action

{
  "action_id": "cfg-push-7a3f",
  "action_type": "config_change",
  "target": {
    "device": "router-west-03",
    "path": "/network/interfaces/eth0"
  },
  "context": {
    "agent_id": "ops-agent-12",
    "session_id": "sess-abc123",
    "prior_actions": ["cfg-push-7a3e"],
    "urgency": "normal"
  }
}

Response: Permit

{
  "decision": "permit",
  "token": "eyJ0eXAiOiJQVCIsImFsZyI6IkVkMjU1MTkifQ...",
  "valid_until_ns": 1737158400000000000,
  "witness": {
    "structural": {
      "cut_value": 12.7,
      "partition": "stable",
      "critical_edges": 0
    },
    "predictive": {
      "set_size": 3,
      "coverage": 0.92
    },
    "evidential": {
      "e_value": 847.3,
      "verdict": "accept"
    }
  },
  "receipt_sequence": 1847392
}

Response: Defer

{
  "decision": "defer",
  "reason": "shift_detected",
  "detail": "Distribution shift pressure 0.73 exceeds threshold 0.5",
  "escalation": {
    "to": "human_operator",
    "context_url": "/receipts/1847393/context",
    "timeout_ns": 300000000000
  },
  "witness": {
    "structural": { "cut_value": 11.2, "partition": "stable" },
    "predictive": { "set_size": 18, "coverage": 0.91 },
    "evidential": { "e_value": 3.2, "verdict": "continue" }
  },
  "receipt_sequence": 1847393
}

Response: Deny

{
  "decision": "deny",
  "reason": "boundary_violation",
  "detail": "Action crosses fragile partition (cut=2.1 < min=5.0)",
  "witness": {
    "structural": {
      "cut_value": 2.1,
      "partition": "fragile",
      "critical_edges": 4,
      "boundary": ["edge-17", "edge-23", "edge-41", "edge-52"]
    },
    "predictive": { "set_size": 47, "coverage": 0.88 },
    "evidential": { "e_value": 0.004, "verdict": "reject" }
  },
  "receipt_sequence": 1847394
}

Migration Path

Phase M1: Shadow Mode

Run AVCG alongside existing GateController. Compare decisions, don't enforce.

impl HybridGate {
    pub fn evaluate(&mut self, action: &Action) -> GateResult {
        // Existing gate makes the decision
        let legacy_result = self.legacy_gate.evaluate(action);

        // AVCG runs in shadow, logs disagreements
        let avcg_result = self.avcg_gate.evaluate(action);

        if legacy_result.decision != avcg_result.decision {
            metrics::counter!("gate.shadow.disagreement").increment(1);
            log::info!(
                "Shadow disagreement: legacy={:?} avcg={:?} action={}",
                legacy_result.decision,
                avcg_result.decision,
                action.id
            );
        }

        legacy_result  // Legacy still decides
    }
}

Exit criteria: <1% disagreement rate over 7 days, zero false denies on known-safe actions.
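
The M1 exit check can be expressed over simple counters. The sketch below assumes the counters already cover the 7-day window and that a vetted list of known-safe actions exists; `ShadowStats` and its fields are hypothetical names, not part of the existing `HybridGate`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Permit,
    Defer,
    Deny,
}

/// Counters accumulated over the shadow window (windowing is out of scope here).
#[derive(Debug, Default)]
pub struct ShadowStats {
    pub total: u64,
    pub disagreements: u64,
    pub false_denies: u64,
}

impl ShadowStats {
    /// Record one shadowed evaluation. `known_safe` marks actions from a vetted safe list.
    pub fn record(&mut self, legacy: Decision, avcg: Decision, known_safe: bool) {
        self.total += 1;
        if legacy != avcg {
            self.disagreements += 1;
        }
        if known_safe && avcg == Decision::Deny {
            self.false_denies += 1;
        }
    }

    /// Exit check: disagreement rate strictly below 1% and zero false denies.
    /// Integer arithmetic avoids floating-point edge cases at the boundary.
    pub fn exit_criteria_met(&self) -> bool {
        self.total > 0 && self.disagreements * 100 < self.total && self.false_denies == 0
    }
}
```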

Phase M2: Canary Enforcement

AVCG enforces for 5% of traffic; legacy handles the rest.

impl CanaryGate {
    pub fn evaluate(&mut self, action: &Action) -> GateResult {
        let canary = self.canary_selector.select(action);

        if canary {
            metrics::counter!("gate.canary.avcg").increment(1);
            self.avcg_gate.evaluate(action)
        } else {
            self.legacy_gate.evaluate(action)
        }
    }
}

Exit criteria: No incidents attributed to AVCG decisions over 14 days.
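
The ADR does not say how `canary_selector` picks its 5%. One common approach, sketched here as an assumption, is to hash a stable identifier so the same action always lands on the same side of the split. `CanarySelector::new` and the inline FNV-1a hash are illustrative choices, not the crate's implementation.

```rust
/// FNV-1a 64-bit hash, written out so the sketch needs no external crates.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Routes a fixed percentage of actions to the canary path.
pub struct CanarySelector {
    percent: u64,
}

impl CanarySelector {
    /// `percent` is capped at 100.
    pub fn new(percent: u64) -> Self {
        Self { percent: percent.min(100) }
    }

    /// Deterministic: the same action id always gets the same answer,
    /// so retries of one action never flip between gates.
    pub fn select(&self, action_id: &str) -> bool {
        fnv1a(action_id.as_bytes()) % 100 < self.percent
    }
}
```

Moving from M2 to M3 would then be a matter of raising the percentage from 5 to 95 without changing which bucket any given action id falls into.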

Phase M3: Majority Rollout

AVCG handles 95% of traffic; legacy remains available as a fallback.

Phase M4: Full Cutover

Legacy removed. AVCG is the gate.

Timeline:
M1 (Shadow)     → 2-4 weeks
M2 (Canary 5%)  → 2 weeks
M3 (Majority)   → 2 weeks
M4 (Full)       → 1 week
                  ─────────
Total           → 7-9 weeks

Observability

Metrics (Prometheus)

# Decision counters
gate_decisions_total{decision="permit|defer|deny", reason="..."}

# Latency histograms
gate_latency_seconds{phase="mincut|conformal|eprocess|decision|receipt"}

# Signal values
gate_cut_value{quantile="0.5|0.9|0.99"}
gate_prediction_set_size{quantile="0.5|0.9|0.99"}
gate_evalue{quantile="0.5|0.9|0.99"}

# Health
gate_healthy{component="mincut|conformal|eprocess"}
gate_failover_total{from="primary|standby_N"}

# Coverage tracking
gate_conformal_coverage_rate  # Should stay ≥ 0.85
gate_eprocess_power           # Evidence accumulation rate

Alerting Thresholds

| Alert | Condition | Severity |
|---|---|---|
| GateHighDenyRate | deny_rate > 10% for 5m | Warning |
| GateLatencyHigh | p99 > 100ms for 5m | Warning |
| GateCoverageDrift | coverage < 0.80 for 15m | Critical |
| GateUnhealthy | any component unhealthy for 1m | Critical |
| GateReceiptChainBroken | hash verification fails | Critical |

Debug Query: Why Was This Denied?

# Get full decision context
curl /api/gate/receipts/1847394/explain

# Response:
{
  "receipt_sequence": 1847394,
  "decision": "deny",
  "explanation": {
    "primary_reason": "structural",
    "structural": {
      "cut_value": 2.1,
      "threshold": 5.0,
      "failed": true,
      "boundary_edges": [
        {"id": "edge-17", "weight": 0.3, "endpoints": ["node-a", "node-b"]},
        ...
      ],
      "partition_context": "Device router-west-03 is in partition P7 which has been unstable since 14:32:07 UTC"
    },
    "predictive": { "failed": true, "detail": "Set size 47 > max_prediction_set 20" },
    "evidential": { "failed": true, "detail": "E-value 0.004 < τ_deny 0.01" }
  },
  "suggested_action": "Wait for partition P7 to stabilize or escalate to human approval",
  "similar_past_decisions": [1847201, 1846998, 1846754]
}

Open Questions Resolution

Q1: Graph model scope — immediate actions or multi-step lookahead?

Decision: Immediate actions for v0, optional 1-step lookahead for v1.

Rationale: Multi-step lookahead requires predicting action sequences, which adds latency and complexity. Start simple: evaluate the action being requested right now. If the current action is safe but would lead to an unsafe state, the next action will be denied when it's requested.

Q2: E-process null — "action safety" vs "policy consistency"?

Decision: Action safety as primary null, policy consistency as secondary.

Rationale:

  • Primary H₀: P(action leads to unsafe state) ≤ p₀
  • Secondary (optional): Current action consistent with established policy

Action safety is more fundamental. Policy consistency can be added as a separate e-process that runs in parallel.
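
For reference, the anytime-valid guarantee behind an e-process can be stated generically as follows. This is the standard textbook form for a null $H_0$, not a derivation specific to this gate; reading a configured threshold of 100 as $1/\alpha$ is an interpretation rather than something the ADR states.

```latex
% E-process: nonnegative, starts at 1, and is a supermartingale under H_0
E_0 = 1, \qquad E_t \ge 0, \qquad
\mathbb{E}_{H_0}\!\left[ E_t \mid \mathcal{F}_{t-1} \right] \le E_{t-1}

% Ville's inequality: the bound holds simultaneously over all times t
\mathbb{P}_{H_0}\!\left( \exists\, t \ge 0 : E_t \ge \tfrac{1}{\alpha} \right) \le \alpha
```

With $1/\alpha = 100$, for example, the probability of ever crossing the threshold while $H_0$ holds is at most $0.01$, no matter when monitoring stops. Running a second e-process for policy consistency in parallel leaves this bound for the first process unchanged.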

Q3: Threshold learning — fixed or adaptive?

Decision: Fixed for v0, adaptive via meta-learning for v1.

Rationale: Fixed thresholds are easier to audit and explain. Once we have production data, we can train a meta-learner to adjust thresholds based on observed false positive/negative rates.

Q4: Human-in-loop — how are DEFER decisions presented?

Decision: Structured escalation with timeout and context link.

{
  "escalation": {
    "to": "human_operator",
    "channel": "slack|pagerduty|dashboard",
    "context_url": "/receipts/{seq}/context",
    "timeout_ns": 300000000000,
    "default_on_timeout": "deny"
  }
}

If the human does not respond within the timeout, the gate defaults to DENY (fail-safe).
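
A minimal sketch of that fail-safe rule, assuming a human verdict is only ever recorded before the deadline. `Escalation`, `Resolution`, and `resolve` are hypothetical names; `timeout_ns` mirrors the field in the JSON above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Approved,
    Denied,
    Pending,
}

/// One open escalation for a deferred decision.
pub struct Escalation {
    pub created_ns: u64,
    pub timeout_ns: u64,
}

impl Escalation {
    /// Resolve the escalation at time `now_ns`.
    /// `human_verdict` is `Some(true)` for approve, `Some(false)` for deny,
    /// and `None` while nobody has answered.
    pub fn resolve(&self, human_verdict: Option<bool>, now_ns: u64) -> Resolution {
        let deadline = self.created_ns.saturating_add(self.timeout_ns);
        match human_verdict {
            Some(true) => Resolution::Approved,
            Some(false) => Resolution::Denied,
            // No answer by the deadline: fail safe.
            None if now_ns >= deadline => Resolution::Denied,
            None => Resolution::Pending,
        }
    }
}
```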

See Hybrid Agent/Human Workflow section for full details.

Q5: Adversarial robustness — adaptive adversaries?

Decision: Defense in depth + rate limiting + anomaly detection.

Mitigations:

  1. Rate limit: Max N decisions per agent per minute
  2. Anomaly: Flag agents with unusually high deny rates
  3. Honeypots: Inject synthetic "trap" actions to detect probing
  4. Rotation: Periodically rotate threshold parameters within safe bounds
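
The first mitigation can be sketched as a per-agent sliding-window limiter. The ADR leaves N and the data structure open, so `RateLimiter`, its parameters, and the choice of an in-memory `VecDeque` per agent are assumptions for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Allows at most `max_per_window` decisions per agent within any `window_ns` span.
pub struct RateLimiter {
    max_per_window: usize,
    window_ns: u64,
    history: HashMap<String, VecDeque<u64>>,
}

impl RateLimiter {
    pub fn new(max_per_window: usize, window_ns: u64) -> Self {
        Self {
            max_per_window,
            window_ns,
            history: HashMap::new(),
        }
    }

    /// Returns true and records the request if the agent is under its limit.
    pub fn allow(&mut self, agent_id: &str, now_ns: u64) -> bool {
        let timestamps = self.history.entry(agent_id.to_string()).or_default();

        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = timestamps.front() {
            if now_ns.saturating_sub(oldest) >= self.window_ns {
                timestamps.pop_front();
            } else {
                break;
            }
        }

        if timestamps.len() >= self.max_per_window {
            return false;
        }
        timestamps.push_back(now_ns);
        true
    }
}
```

Requests that are refused are not recorded, so a probing agent cannot extend its own lockout, but the refusals themselves are a natural input for the anomaly flag in mitigation 2.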

Definition of Done

v0.1 Shippable Criteria

| Criterion | Metric | Target |
|---|---|---|
| Structural filter works | Min-cut correctly identifies fragile partitions | 100% on test suite |
| Receipts are signed | All receipts have valid Ed25519 signature | 100% |
| Receipts are chained | Hash chain verifies for all receipts | 100% |
| Latency acceptable | p99 gate decision time | < 50ms |
| No false denies | Known-safe actions are permitted | 100% on test suite |
| Demo scenario runs | Network security control plane demo | End-to-end pass |

v0.1 Minimum Viable Demo

Scenario: Agent requests config push to network device.

  1. Agent calls permit_action with device target
  2. Gate evaluates structural coherence (min-cut)
  3. Gate returns PERMIT with signed receipt
  4. Agent presents token to device
  5. Device verifies token, accepts config

Success: Auditor can replay decision from receipt and get same result.


Cost Model

Memory per Tile (WASM)

| Component | Size | Notes |
|---|---|---|
| Graph shard | 32 KB | ~2000 edges at 16 bytes each |
| Feature window | 8 KB | 2048 f32 values |
| E-accumulator | 64 B | f64 + metadata |
| Boundary edges | 64 B | 8 × EdgeId |
| Total per worker | ~41 KB | Fits in 64KB WASM page |
| Total 255 workers | ~10.2 MB | |
| TileZero state | ~1 MB | Supergraph + receipt log head |
| Total fabric | ~12 MB | |

Network Bandwidth

| Flow | Frequency | Size | Bandwidth |
|---|---|---|---|
| Worker → TileZero reports | 1/tick (10ms) | 64 B × 255 | ~1.6 MB/s |
| Receipt log append | per decision | ~512 B | Variable |
| Gossip (distributed) | 1/100ms | ~1 KB × peers | ~10 KB/s × P |
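
The report bandwidth figure follows directly from the report size, the worker count, and the tick interval. A small sketch of that arithmetic, with a hypothetical helper name:

```rust
/// Bytes per second sent from workers to TileZero, given one fixed-size
/// report per worker per tick. Returns `None` for a zero tick interval.
pub fn report_bandwidth_bytes_per_sec(
    report_bytes: u64,
    workers: u64,
    tick_interval_ms: u64,
) -> Option<u64> {
    // Whole ticks per second; a 10 ms interval gives 100.
    let ticks_per_sec = 1000u64.checked_div(tick_interval_ms)?;
    Some(report_bytes * workers * ticks_per_sec)
}
```

For 64-byte reports from 255 workers on a 10 ms tick this gives 64 × 255 × 100 = 1,632,000 B/s, which is the ~1.6 MB/s in the table.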

Storage Growth

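| Item | Size | Retention | Growth |
|---|---|---|---|
| Receipt | ~512 B | 90 days | ~44 GB/day @ 1000 decisions/s |
| E-process checkpoint | ~128 B | Forever | ~11 GB/day @ 1000 decisions/s |
| Audit log | ~256 B | 1 year | ~22 GB/day @ 1000 decisions/s |

Growth = size × 1000 decisions/s × 86,400 s/day (for receipts, 512 B × 86.4 million ≈ 44 GB/day).

90-day storage: ~4 TB receipts + ~1 TB checkpoints ≈ 5 TB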


Hybrid Agent/Human Workflow

The coherence gate is designed for bounded autonomy, not full autonomy. Humans stay in the loop at critical decision points.

Design Philosophy

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   "Agents handle the routine. Humans handle the novel."                │
│                                                                         │
│   PERMIT  → Agent proceeds autonomously (low risk, high confidence)    │
│   DEFER   → Human decides (uncertain, boundary case, policy gap)       │
│   DENY    → Blocked automatically (structural violation, unsafe)       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The gate doesn't replace human judgment—it routes decisions to humans when judgment is needed.

Escalation Tiers

| Tier | Trigger | Responder | SLA | Example |
|---|---|---|---|---|
| T0 | PERMIT | None (automated) | 0 | Routine config within stable partition |
| T1 | DEFER (shift) | On-call operator | 5 min | New dependency pattern detected |
| T2 | DEFER (boundary) | Senior engineer | 15 min | Action crosses partition boundary |
| T3 | DEFER (policy gap) | Policy team | 1 hour | No precedent for this action type |
| T4 | DENY override request | Security + Management | 4 hours | Agent requesting exception to denial |
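
The DEFER rows of the tier table amount to a lookup from defer reason to responder tier and SLA. A minimal sketch, with hypothetical type names (`DeferReason`, `Tier`) and the SLA expressed in seconds:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeferReason {
    Shift,
    Boundary,
    PolicyGap,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    T1,
    T2,
    T3,
}

/// Route a deferred decision to its escalation tier and response SLA (seconds).
pub fn route(reason: DeferReason) -> (Tier, u64) {
    match reason {
        DeferReason::Shift => (Tier::T1, 5 * 60),
        DeferReason::Boundary => (Tier::T2, 15 * 60),
        DeferReason::PolicyGap => (Tier::T3, 60 * 60),
    }
}
```

T0 and T4 are left out on purpose: T0 involves no human, and T4 is entered through an override request rather than through a DEFER.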

Human Decision Interface

When a DEFER is escalated, humans see:

┌─────────────────────────────────────────────────────────────────────────┐
│  DECISION REQUIRED                                    Timeout: 4:32    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Agent: ops-agent-12                                                   │
│  Action: Push config to router-west-03 /network/interfaces/eth0        │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  WHY DEFERRED                                                    │   │
│  │                                                                  │   │
│  │  • Shift detected: New dependency pattern (0.73 > 0.5 threshold)│   │
│  │  • This device was added to the graph 2 hours ago               │   │
│  │  • Similar actions on established devices: 847 permits, 0 denies│   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  CONTEXT                                                         │   │
│  │                                                                  │   │
│  │  Structural coherence: 11.2 (healthy)                           │   │
│  │  Prediction set size: 18 outcomes (moderate uncertainty)        │   │
│  │  Evidence accumulator: 3.2 (inconclusive)                       │   │
│  │                                                                  │   │
│  │  [View full witness receipt] [View similar past decisions]      │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────────────────┐   │
│  │   APPROVE     │  │    DENY       │  │  ESCALATE TO T3           │   │
│  │   (proceed)   │  │   (block)     │  │  (need policy guidance)   │   │
│  └───────────────┘  └───────────────┘  └───────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Human Decision Recording

Human decisions become part of the audit trail:

pub struct HumanDecision {
    /// Original deferred receipt
    pub deferred_receipt_seq: u64,

    /// Human's decision
    pub decision: HumanVerdict,

    /// Human identity (authenticated)
    pub decider_id: AuthenticatedUserId,

    /// Reasoning (required for audit)
    pub rationale: String,

    /// Timestamp
    pub decided_at: u64,

    /// Signature (human signs their decision)
    pub signature: Ed25519Signature,
}

pub enum HumanVerdict {
    /// Approve the action
    Approve {
        /// Add to training data for future automation
        learn_from_this: bool,
    },
    /// Deny the action
    Deny {
        /// Reason for denial
        reason: String,
    },
    /// Escalate to higher tier
    Escalate {
        to_tier: EscalationTier,
        reason: String,
    },
    /// Request more information
    NeedMoreInfo {
        questions: Vec<String>,
    },
}

Override Protocol

Humans can override DENY decisions, but with friction and accountability:

pub struct DenyOverride {
    /// Which denial is being overridden
    pub denied_receipt_seq: u64,

    /// Who is overriding (must be T4 authority)
    pub overrider_id: AuthenticatedUserId,

    /// Second approver required
    pub second_approver_id: AuthenticatedUserId,

    /// Business justification (required, min 50 chars)
    pub justification: String,

    /// Time-bounded: override expires
    pub valid_until: u64,

    /// Scope-limited: only this specific action
    pub action_id: ActionId,

    /// Both signatures required
    pub overrider_signature: Ed25519Signature,
    pub approver_signature: Ed25519Signature,
}

Override constraints:

  • Two humans required (four-eyes principle)
  • Must provide written justification
  • Time-limited (max 24 hours)
  • Scope-limited (only the specific action)
  • All overrides flagged for security review
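
The constraints above can be checked mechanically before an override is honored. The sketch below is illustrative: it uses plain strings for identities, omits signature verification entirely, and adds an `issued_at_ns` field (not present in `DenyOverride`) so that the 24-hour cap can be checked.

```rust
const MIN_JUSTIFICATION_CHARS: usize = 50;
const MAX_VALIDITY_NS: u64 = 24 * 60 * 60 * 1_000_000_000;

/// Simplified override request; `issued_at_ns` is a hypothetical extra field.
pub struct OverrideRequest {
    pub overrider_id: String,
    pub second_approver_id: String,
    pub justification: String,
    pub action_id: String,
    pub issued_at_ns: u64,
    pub valid_until_ns: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OverrideError {
    SameApprover,
    JustificationTooShort,
    ValidityTooLong,
    Expired,
    WrongAction,
}

/// Check the four-eyes, justification, time-limit, and scope constraints.
pub fn validate(
    o: &OverrideRequest,
    requested_action: &str,
    now_ns: u64,
) -> Result<(), OverrideError> {
    if o.overrider_id == o.second_approver_id {
        return Err(OverrideError::SameApprover);
    }
    if o.justification.chars().count() < MIN_JUSTIFICATION_CHARS {
        return Err(OverrideError::JustificationTooShort);
    }
    if o.valid_until_ns.saturating_sub(o.issued_at_ns) > MAX_VALIDITY_NS {
        return Err(OverrideError::ValidityTooLong);
    }
    if now_ns > o.valid_until_ns {
        return Err(OverrideError::Expired);
    }
    if o.action_id != requested_action {
        return Err(OverrideError::WrongAction);
    }
    Ok(())
}
```

A real check would also verify both Ed25519 signatures and confirm that the overrider holds T4 authority; those steps depend on key management that this ADR section does not describe.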

Learning from Human Decisions

Human decisions improve the gate over time:

/// When human approves a DEFER, optionally learn from it
pub fn learn_from_approval(
    deferred: &WitnessReceipt,
    human: &HumanDecision,
) {
    // Only approvals that opted in contribute training data
    if let HumanVerdict::Approve { learn_from_this: true } = human.decision {
        // Add to calibration data
        conformal_calibrator.add_observation(
            deferred.context.clone(),
            Outcome::Safe,  // Human judged it safe
        );

        // Update e-process null hypothesis
        eprocess_trainer.add_positive_example(
            deferred.action.clone(),
        );

        // Adjust threshold candidates (for meta-learning in v1)
        threshold_learner.record_human_permit(
            deferred.signals.clone(),
        );
    }
}

Workload Distribution Target

The goal is minimal human burden while maintaining safety:

| Decision | Target Rate | Human Workload |
|---|---|---|
| PERMIT | 90-95% | Zero |
| DEFER | 4-9% | Human decides |
| DENY | 1-2% | Zero (unless override requested) |

If DEFER rate exceeds 10%, the gate is too conservative—tune thresholds. If DENY rate exceeds 5%, something is wrong—investigate root cause.
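
Those two rules of thumb translate into a simple check over decision counts. The function and enum names below are hypothetical; the 10% and 5% limits are the ones stated above.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Finding {
    /// DEFER rate above 10%: thresholds are likely too conservative.
    DeferRateHigh,
    /// DENY rate above 5%: investigate the root cause.
    DenyRateHigh,
}

/// Compare observed decision counts against the workload limits.
/// Integer arithmetic keeps the comparison exact at the boundaries.
pub fn assess(permits: u64, defers: u64, denies: u64) -> Vec<Finding> {
    let total = permits + defers + denies;
    let mut findings = Vec::new();
    if total == 0 {
        return findings;
    }
    if defers * 100 > total * 10 {
        findings.push(Finding::DeferRateHigh);
    }
    if denies * 100 > total * 5 {
        findings.push(Finding::DenyRateHigh);
    }
    findings
}
```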

Integration Channels

| Channel | Use Case | Response Format |
|---|---|---|
| Slack | On-call escalation | Interactive buttons |
| PagerDuty | Critical/timed decisions | Acknowledge + decision API |
| Dashboard | Batch review | Web UI with full context |
| CLI | Developer/ops workflow | `ruvector gate approve <seq>` |
| API | Programmatic integration | REST/gRPC |

Audit Trail for Human Decisions

Every human decision is:

  1. Authenticated: Decider identity verified via SSO/MFA
  2. Signed: Human signs their decision with personal key
  3. Chained: Added to the same receipt chain as gate decisions
  4. Timestamped: Immutable record of when decision was made
  5. Justified: Rationale captured for later review

Receipt Chain:
  [1847392] PERMIT (automated) → agent executed
  [1847393] DEFER (automated) → escalated to human
  [1847393-H] APPROVE (human: alice@corp) → agent executed
  [1847394] DENY (automated) → blocked
  [1847394-O] OVERRIDE (humans: bob@corp + carol@corp) → exception granted

Consequences

Benefits

  1. Formal Guarantees: Type I error control at any stopping time
  2. Distribution Shift Robustness: Conformal prediction adapts without retraining
  3. Computational Efficiency: O(n^{o(1)}) update time from subpolynomial min-cut
  4. Audit Trail: Every decision has cryptographic witness receipt
  5. Defense in Depth: Three independent signals must concur for permit
  6. Cryptographic Integrity: All receipts signed with Ed25519
  7. Attack Resistance: E-value bounds, replay guards, race condition prevention
  8. Distributed Scalability: Hierarchical coordination with regional and global tiers
  9. Fault Tolerance: Automatic failover with safe defaults

Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Computational overhead | Lazy evaluation; batch updates; SIMD optimization |
| E-value power under uncertainty | Mixture e-values for robustness |
| Graph model mismatch | Learn graph structure from trajectories |
| Threshold tuning | Adaptive thresholds via meta-learning |
| Receipt forgery | Mandatory Ed25519 signing; chain linkage |
| E-value manipulation | Input bounds; clamping with security logging |
| Race conditions | Atomic decisions with sequence numbers |
| Replay attacks | Bloom filter + sliding window guard |
| Network partitions | Hierarchical decisions; local autonomy |
| Byzantine nodes | Consensus-based aggregation; safe defaults |

Complexity Analysis

| Operation | Current | With AVCG | Distributed AVCG |
|---|---|---|---|
| Edge update | O(n^{o(1)}) | O(n^{o(1)}) | O(n^{o(1)}) + network |
| Gate evaluation | O(1) | O(k) prediction set | O(k) + O(R) regional |
| Witness generation | O(m) | O(m) amortized | O(m) + signing |
| Certificate verification | O(n) | O(n + log T) | O(n + log T) + sig verify |
| Receipt signing | N/A | O(1) Ed25519 | O(1) + HSM latency |
| Distributed consensus | N/A | N/A | O(log N) Raft |
| E-process aggregation | N/A | O(1) | O(P) peers |

Where: k = prediction set size, T = history length, R = regional peers, N = cluster size, P = peer count

References

Dynamic Min-Cut

  1. El-Hayek, Henzinger, Li. "Deterministic and Exact Fully-dynamic Minimum Cut of Superpolylogarithmic Size in Subpolynomial Time." arXiv:2512.13105, December 2025.
  2. Jin, Sun, Thorup. "Fully Dynamic Exact Minimum Cut in Subpolynomial Time." SODA 2024.

Online Conformal Prediction

  1. "Online Conformal Inference with Retrospective Adjustment for Faster Adaptation to Distribution Shift." arXiv:2511.04275, November 2025.
  2. "Distribution-informed Online Conformal Prediction (COP)." December 2025.
  3. "CORE: Conformal Regression under Distribution Shift via Reinforcement Learning." October 2025.

E-Values and E-Processes

  1. Ramdas, Wang. "Hypothesis Testing with E-values." Foundations and Trends in Statistics, 2025.
  2. ICML 2025 Tutorial: "Game-theoretic Statistics and Sequential Anytime-Valid Inference."
  3. "Sequential Randomization Tests Using e-values." arXiv:2512.04366, December 2025.

AI Agent Control

  1. "Bounded Autonomy: A Pragmatic Response to Concerns About Fully Autonomous AI Agents." XMPRO, 2025.
  2. "Customizable Runtime Enforcement for Safe and Reliable LLM Agents." arXiv:2503.18666, 2025.

Testing Strategy

Unit Tests

| Component | Coverage Target | Key Test Cases |
|---|---|---|
| CompactGraph | 95% | Add/remove edges, weight updates, min-cut estimation |
| EvidenceAccumulator | 95% | Bounds checking, update rules, stopping decisions |
| TileReport | 90% | Serialization roundtrip, checksum verification |
| PermitToken | 95% | Signing, verification, TTL expiration |
| ReceiptLog | 95% | Hash chain integrity, tamper detection |
| ThreeFilterDecision | 100% | All Permit/Defer/Deny paths |

Integration Tests

| Scenario | Description | Expected Outcome |
|---|---|---|
| Happy path | Stable graph, safe action | PERMIT with valid receipt |
| Boundary crossing | Action crosses fragile partition | DENY with boundary edges |
| Shift detection | New dependency pattern | DEFER with escalation |
| Human approval | DEFER → human approves | Token issued, learning recorded |
| Replay verification | Replay historical decision | Deterministic match |
| Hash chain audit | Verify 1000 receipts | All hashes valid |

Property-Based Tests

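#[proptest]
fn e_value_always_positive(e1: f64, e2: f64) {
    // Skip zero, NaN, infinite, and extreme magnitudes: their product can be
    // 0.0 (underflow) or non-finite, so strict positivity cannot be asserted.
    prop_assume!((1e-100..=1e100).contains(&e1.abs()));
    prop_assume!((1e-100..=1e100).contains(&e2.abs()));
    let result = combine_evalues(e1.abs(), e2.abs());
    prop_assert!(result > 0.0);
}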

#[proptest]
fn receipt_hash_deterministic(receipt: WitnessReceipt) {
    let hash1 = receipt.compute_hash();
    let hash2 = receipt.compute_hash();
    prop_assert_eq!(hash1, hash2);
}

#[proptest]
fn serialization_roundtrip(report: TileReport) {
    let bytes = report.serialize();
    let restored = TileReport::deserialize(&bytes);
    prop_assert_eq!(report, restored);
}

Security Tests

| Test | Attack Vector | Expected Behavior |
|------|---------------|-------------------|
| Forged signature | Invalid Ed25519 sig | Verification fails |
| Replay attack | Duplicate action | ReplayGuard blocks |
| E-value overflow | Extreme likelihood ratio | Clamped to bounds |
| Race condition | Concurrent evaluations | Sequence numbers ordered |
| Tampered receipt | Modified hash | Chain verification fails |
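The replay-attack row can be exercised against a small guard like the one below. This is a minimal sketch, not the gate's actual `ReplayGuard`: the struct layout, the `check_and_record` method, and the use of an exact `HashMap` instead of the Bloom filter named in the configuration are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical replay guard: remembers action IDs seen within a time window.
/// An exact map is used here for clarity; a Bloom filter would trade exactness
/// for bounded memory.
pub struct ReplayGuard {
    window_secs: u64,
    seen: HashMap<String, u64>, // action ID -> time first seen (seconds)
}

impl ReplayGuard {
    pub fn new(window_secs: u64) -> Self {
        Self { window_secs, seen: HashMap::new() }
    }

    /// Returns true if the action is fresh (and records it), false if it is
    /// a replay of an action seen within the window.
    pub fn check_and_record(&mut self, action_id: &str, now_secs: u64) -> bool {
        let window = self.window_secs;
        // Evict entries that have aged out of the replay window.
        self.seen.retain(|_, t| now_secs.saturating_sub(*t) < window);
        if self.seen.contains_key(action_id) {
            false
        } else {
            self.seen.insert(action_id.to_string(), now_secs);
            true
        }
    }
}
```

With a 3600-second window, a second submission of the same ID at t = 10 is blocked, while the same ID at t = 3600 is accepted again because the original entry has expired.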

Benchmark Tests

| Metric | Target | Measurement |
|--------|--------|-------------|
| Gate decision latency | p99 < 50ms | criterion benchmark |
| Receipt signing | < 5ms | criterion benchmark |
| 255-tile report merge | < 10ms | criterion benchmark |
| Hash chain verification (1000) | < 100ms | criterion benchmark |
| Memory per worker tile | < 64KB | Static analysis |

Configuration Format

TOML Configuration

# gate-config.toml

[gate]
# Gate identification
gate_id = "gate-west-01"
version = "0.1.0"

[thresholds]
# E-process thresholds
tau_deny = 0.01          # E-value below this → DENY
tau_permit = 100.0       # E-value above this → PERMIT

# Structural thresholds
min_cut = 5.0            # Cut value below this → DENY
max_shift = 0.5          # Shift pressure above this → DEFER

# Conformal thresholds
max_prediction_set = 20  # Set size above this → DEFER
coverage_target = 0.90   # Target coverage rate

[timing]
# Permit token TTL
permit_ttl_seconds = 300

# Decision timeout
decision_timeout_ms = 50

# Tick interval for worker tiles
tick_interval_ms = 10

[security]
# Key rotation
signing_key_rotation_days = 30
threshold_key_rotation_days = 90

# Replay prevention
replay_window_seconds = 3600
bloom_filter_size = 1000000

[distributed]
# Coordination settings
regional_peers = ["gate-west-02", "gate-west-03"]
global_coordinator = "coordinator-global-01"
raft_heartbeat_ms = 100
consensus_timeout_ms = 1000

[escalation]
# Human-in-loop settings
default_timeout_seconds = 300
default_on_timeout = "deny"

[escalation.channels.slack]
webhook_url = "${SLACK_WEBHOOK_URL}"
channel = "#gate-escalations"

[escalation.channels.pagerduty]
api_key = "${PAGERDUTY_API_KEY}"
service_id = "gate-critical"

[observability]
# Metrics endpoint
metrics_port = 9090
metrics_path = "/metrics"

# Tracing
tracing_enabled = true
tracing_sample_rate = 0.1
jaeger_endpoint = "http://jaeger:14268/api/traces"

[storage]
# Receipt storage
receipt_backend = "postgresql"
receipt_retention_days = 90
checkpoint_interval = 100

[storage.postgresql]
host = "${DB_HOST}"
port = 5432
database = "gate_receipts"
username = "${DB_USER}"
password = "${DB_PASSWORD}"
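To make the `[thresholds]` comments concrete, the sketch below applies them to one set of signals. The type and function names (`Thresholds`, `Signals`, `decide`) are hypothetical, and the precedence (any DENY condition beats DEFER, which beats PERMIT) is an assumption consistent with the comments in the configuration rather than a statement of the gate's actual control flow.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Permit,
    Defer,
    Deny,
}

/// Mirrors the keys in the [thresholds] table of gate-config.toml.
pub struct Thresholds {
    pub tau_deny: f64,
    pub tau_permit: f64,
    pub min_cut: f64,
    pub max_shift: f64,
    pub max_prediction_set: usize,
}

/// Hypothetical snapshot of the three filters' outputs for one action.
pub struct Signals {
    pub e_value: f64,
    pub cut_value: f64,
    pub shift_pressure: f64,
    pub prediction_set_size: usize,
}

/// Values copied from the example configuration.
pub fn example_thresholds() -> Thresholds {
    Thresholds {
        tau_deny: 0.01,
        tau_permit: 100.0,
        min_cut: 5.0,
        max_shift: 0.5,
        max_prediction_set: 20,
    }
}

pub fn decide(t: &Thresholds, s: &Signals) -> Decision {
    // DENY conditions first; a NaN e-value is treated as DENY, matching the
    // "E-process NaN" fallback.
    if s.e_value.is_nan() || s.e_value < t.tau_deny || s.cut_value < t.min_cut {
        return Decision::Deny;
    }
    // DEFER when shift pressure or conformal uncertainty is too high.
    if s.shift_pressure > t.max_shift || s.prediction_set_size > t.max_prediction_set {
        return Decision::Defer;
    }
    // PERMIT only with enough accumulated evidence; otherwise keep waiting.
    if s.e_value > t.tau_permit {
        Decision::Permit
    } else {
        Decision::Defer
    }
}
```

Under these assumptions, an e-value of 150 with a cut value of 8.0 yields PERMIT, while the same e-value with a cut value of 3.0 yields DENY.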

Environment Variables

# Required
export GATE_SIGNING_KEY_PATH=/etc/gate/keys/signing.key
export GATE_CONFIG_PATH=/etc/gate/config.toml

# Optional overrides
export GATE_TAU_DENY=0.01
export GATE_TAU_PERMIT=100.0
export GATE_MIN_CUT=5.0
export GATE_MAX_SHIFT=0.5
export GATE_PERMIT_TTL_SECONDS=300

# Secrets (never in config file)
export SLACK_WEBHOOK_URL=https://hooks.slack.com/...
export PAGERDUTY_API_KEY=...
export DB_PASSWORD=...
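One way the optional overrides could be layered on top of the TOML values is sketched below. The helper names are hypothetical, and two behaviors are assumptions: that an environment variable takes precedence over the file, and that an unparseable or non-finite override falls back to the configured value (a stricter gate might refuse to start instead).

```rust
use std::env;

/// Parse an optional raw override; fall back to the configured value when the
/// override is absent, unparseable, or not a finite number.
pub fn override_f64(raw: Option<&str>, configured: f64) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite())
        .unwrap_or(configured)
}

/// Resolve tau_deny: GATE_TAU_DENY wins over the value loaded from the file.
pub fn tau_deny_from_env(configured: f64) -> f64 {
    let raw = env::var("GATE_TAU_DENY").ok();
    override_f64(raw.as_deref(), configured)
}
```

Keeping the parsing in `override_f64` separate from the environment lookup makes the rule testable without mutating process state.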

Error Recovery Procedures

Gate Decision Failures

| Failure | Detection | Recovery | Fallback |
|---------|-----------|----------|----------|
| Min-cut timeout | Decision exceeds 50ms | Log, retry once | DEFER |
| E-process NaN | is_nan() check | Reset accumulator | DENY |
| Signing failure | Ed25519 error | Rotate to backup key | DENY (unsigned) |
| Receipt log full | Capacity check | Archive, start new segment | DENY |
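The fallback column can be encoded as a total function so that no failure path is left without a conservative decision. The sketch below is illustrative only: `GateFailure`, `Fallback`, `fallback_for`, and `guard_e_value` are hypothetical names, and resetting the accumulator to 1.0 (the neutral e-value) is an assumption about what "Reset accumulator" means.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateFailure {
    MinCutTimeout,
    EProcessNaN,
    SigningFailure,
    ReceiptLogFull,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fallback {
    Defer,
    Deny,
}

/// Fallback decision per failure, as listed in the table above.
pub fn fallback_for(failure: GateFailure) -> Fallback {
    match failure {
        GateFailure::MinCutTimeout => Fallback::Defer,
        GateFailure::EProcessNaN
        | GateFailure::SigningFailure
        | GateFailure::ReceiptLogFull => Fallback::Deny,
    }
}

/// Detect a NaN accumulator, reset it to the neutral value 1.0 (assumed),
/// and report the failure so the caller can apply the fallback.
pub fn guard_e_value(accumulator: &mut f64) -> Result<f64, GateFailure> {
    if accumulator.is_nan() {
        *accumulator = 1.0;
        Err(GateFailure::EProcessNaN)
    } else {
        Ok(*accumulator)
    }
}
```

Because the `match` has no wildcard arm, adding a new failure variant forces a compile-time decision about its fallback.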

Distributed Failures

use std::time::Duration;

impl FaultRecovery {
    /// Handle a regional coordination failure.
    ///
    /// Every arm yields a `Result`, so the return type is a `Result` as well.
    /// `GateError` is an assumed error type; it is not defined in this section.
    pub async fn handle_regional_failure(
        &mut self,
        error: RegionalError,
    ) -> Result<GateResult, GateError> {
        match error {
            RegionalError::LeaderUnavailable => {
                // Wait for new leader election (two Raft heartbeats at 100ms)
                tokio::time::sleep(Duration::from_millis(200)).await;
                self.retry_with_new_leader().await
            }

            RegionalError::NetworkPartition => {
                // Fall back to local-only decision
                log::warn!("Network partition detected, using local gate");
                self.local_gate.evaluate_standalone()
            }

            RegionalError::ConsensusTimeout => {
                // Use conservative decision: defer and escalate
                Ok(GateResult {
                    decision: GateDecision::Defer,
                    reason: "Consensus timeout - escalating to human".into(),
                    ..Default::default()
                })
            }
        }
    }
}

Receipt Chain Recovery

impl ReceiptLog {
    /// Recover from corrupted receipt chain
    pub fn recover_chain(&mut self, last_known_good: u64) -> Result<(), RecoveryError> {
        // 1. Truncate corrupted entries
        self.truncate_after(last_known_good)?;

        // 2. Rebuild from checkpoint
        let checkpoint = self.find_nearest_checkpoint(last_known_good)?;
        self.rebuild_from_checkpoint(checkpoint)?;

        // 3. Mark recovery in audit log
        self.append_recovery_marker(last_known_good)?;

        // 4. Alert operators
        alert::send("Receipt chain recovery performed", Severity::Warning);

        Ok(())
    }
}
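The `last_known_good` argument has to come from somewhere; one plausible source is a forward scan of the chain that stops at the first broken link. The sketch below assumes a simplified `Receipt` layout and uses the standard library's `DefaultHasher` purely as a stand-in: it is not cryptographic, and a real chain would use a proper hash function. All names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified receipt for illustration; the real WitnessReceipt has more fields.
#[derive(Debug, Clone)]
pub struct Receipt {
    pub sequence: u64,
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in link hash (NOT cryptographic): covers sequence, payload, prev_hash.
fn link_hash(sequence: u64, payload: &str, prev_hash: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    sequence.hash(&mut hasher);
    payload.hash(&mut hasher);
    prev_hash.hash(&mut hasher);
    hasher.finish()
}

/// Append a receipt linked to the current tail (genesis prev_hash is 0).
pub fn append(chain: &mut Vec<Receipt>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |r| r.hash);
    let sequence = chain.len() as u64;
    let hash = link_hash(sequence, payload, prev_hash);
    chain.push(Receipt { sequence, payload: payload.to_string(), prev_hash, hash });
}

/// Sequence number of the last receipt for which every link up to and
/// including it verifies, or None if the first receipt is already invalid.
pub fn last_known_good(chain: &[Receipt]) -> Option<u64> {
    let mut expected_prev = 0u64;
    let mut good = None;
    for r in chain {
        let intact = r.prev_hash == expected_prev
            && r.hash == link_hash(r.sequence, &r.payload, r.prev_hash);
        if !intact {
            break;
        }
        good = Some(r.sequence);
        expected_prev = r.hash;
    }
    good
}
```

If receipt 2 in a five-entry chain is modified, the scan returns `Some(1)`, which is the value that would then be passed to `recover_chain`.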

Worker Tile Recovery

| Failure | Detection | Recovery Time | Data Loss |
|---------|-----------|---------------|-----------|
| Single tile crash | Heartbeat timeout | < 100ms | Last tick |
| Tile memory corruption | Checksum mismatch | < 500ms | Current shard |
| TileZero crash | Primary unavailable | < 1s | None (standbys) |
| Full fabric restart | All tiles down | < 5s | Rebuild from checkpoint |
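Heartbeat-timeout detection for worker tiles can be sketched as follows. The `HeartbeatMonitor` type is hypothetical, timestamps are plain milliseconds so the logic stays deterministic, and the 30ms timeout used in the example (three missed 10ms ticks) is an assumption, not a value taken from this document.

```rust
/// Hypothetical monitor tracking the last heartbeat time of each tile.
pub struct HeartbeatMonitor {
    last_seen_ms: Vec<u64>,
    timeout_ms: u64,
}

impl HeartbeatMonitor {
    /// All tiles are treated as having reported at `now_ms` on startup.
    pub fn new(tiles: usize, timeout_ms: u64, now_ms: u64) -> Self {
        Self { last_seen_ms: vec![now_ms; tiles], timeout_ms }
    }

    /// Record a heartbeat from a tile; unknown tile indices are ignored.
    pub fn record(&mut self, tile: usize, now_ms: u64) {
        if let Some(slot) = self.last_seen_ms.get_mut(tile) {
            *slot = now_ms;
        }
    }

    /// Indices of tiles whose last heartbeat is older than the timeout.
    pub fn timed_out(&self, now_ms: u64) -> Vec<usize> {
        self.last_seen_ms
            .iter()
            .enumerate()
            .filter(|&(_, &last)| now_ms.saturating_sub(last) > self.timeout_ms)
            .map(|(tile, _)| tile)
            .collect()
    }
}
```

A tile flagged by `timed_out` would then enter the single-tile recovery path, losing at most its last tick.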

Runbook: Gate Unresponsive

# 1. Check gate health
curl http://gate:9090/health

# 2. If unhealthy, check logs
kubectl logs -l app=gate --tail=100

# 3. Check for resource exhaustion
kubectl top pods -l app=gate

# 4. If memory high, trigger GC
curl -X POST http://gate:9090/admin/gc

# 5. If still unresponsive, rolling restart
kubectl rollout restart deployment/gate

# 6. Verify recovery
curl http://gate:9090/health
curl http://gate:9090/metrics | grep gate_healthy

Appendix: Mathematical Foundations

E-Value Composition

For independent e-values e₁, e₂, each with expectation at most 1 under the null hypothesis H₀:

e_combined = e₁ · e₂
E[e_combined] = E[e₁] · E[e₂] ≤ 1 · 1 = 1

This enables optional continuation: evidence accumulates validly across sessions.
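A numerically cautious version of the product rule is sketched below. It is an illustration, not the crate's `combine_evalues`: the `Option` return type, the log-space computation, and the clamp bounds (roughly 1e-100 to 1e100) are all assumptions.

```rust
/// Assumed clamp bounds in log space: exp(±230) is roughly 1e±100.
const LOG_MIN: f64 = -230.0;
const LOG_MAX: f64 = 230.0;

/// Combine two e-values by multiplication, computed in log space and clamped
/// so the result is always finite and strictly positive.
/// Returns None for inputs that are not valid e-values (negative, NaN, infinite).
pub fn combine_evalues(e1: f64, e2: f64) -> Option<f64> {
    let valid = |e: f64| e.is_finite() && e >= 0.0;
    if !valid(e1) || !valid(e2) {
        return None;
    }
    // ln(0) is -inf, which the clamp maps to LOG_MIN.
    let log_product = (e1.ln() + e2.ln()).clamp(LOG_MIN, LOG_MAX);
    Some(log_product.exp())
}
```

For example, e₁ = 4 and e₂ = 30 combine to about 120, which would exceed a `tau_permit` of 100.0. The upper clamp only discards evidence and is therefore conservative; the lower clamp slightly inflates extremely small products, trading exactness for numerical safety.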

Conformal Coverage

Under exchangeability or bounded distribution shift:

P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α - δ_t

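Here α is the target miscoverage rate (1 minus the coverage target) and δ_t is the coverage gap caused by distribution shift at step t, with δ_t → 0 as the algorithm adapts via retrospective adjustment.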
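To show how an online method can track a coverage target, the sketch below uses the simpler adaptive conformal inference update of Gibbs and Candès, α_{t+1} = α_t + γ(α − err_t), where err_t is 1 on a miss and 0 otherwise. This is a swapped-in technique for illustration, not the retrospective-adjustment algorithm referenced above; the `AdaptiveAlpha` type and the step size γ = 0.05 are assumptions.

```rust
/// Tracks the working miscoverage level used to build the next prediction set.
pub struct AdaptiveAlpha {
    target: f64,  // desired long-run miscoverage, e.g. 0.10 for 90% coverage
    current: f64, // working level alpha_t
    gamma: f64,   // step size (assumed)
}

impl AdaptiveAlpha {
    pub fn new(target: f64, gamma: f64) -> Self {
        Self { target, current: target, gamma }
    }

    /// Apply alpha_{t+1} = alpha_t + gamma * (target - err_t) and return it.
    /// A miss lowers alpha (wider sets next time); a hit raises it slightly.
    pub fn update(&mut self, covered: bool) -> f64 {
        let err = if covered { 0.0 } else { 1.0 };
        self.current += self.gamma * (self.target - err);
        self.current
    }
}
```

Starting from α = 0.10 with γ = 0.05, one miss moves the working level to 0.055 and a subsequent hit moves it back up to 0.06. The working level is not constrained to [0, 1] here; a caller would need to handle out-of-range values when forming prediction sets.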

Anytime-Valid Stopping

For any stopping time τ (possibly data-dependent):

P_H₀(E_τ ≥ 1/α) ≤ α

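This is Ville's inequality: under H₀, E_t is a nonnegative supermartingale with E[E_0] = 1, so the bound holds at every stopping time simultaneously and the process can be checked after each observation without inflating the error rate.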
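The stopping rule can be illustrated with a likelihood-ratio e-process on binary observations. This is a minimal sketch under stated assumptions: the Bernoulli null (`p0`) and alternative (`p1`) are chosen for illustration and are not the gate's actual evidence model, and `first_rejection` is a hypothetical name.

```rust
/// Run a likelihood-ratio e-process over binary observations and return the
/// 1-based index of the first observation at which E_t >= 1/alpha, if any.
///
/// Under H0 (success probability p0) each factor has expectation 1, so the
/// running product is a nonnegative martingale starting at 1.
pub fn first_rejection(observations: &[bool], p0: f64, p1: f64, alpha: f64) -> Option<usize> {
    let threshold = 1.0 / alpha;
    let mut e = 1.0_f64;
    for (i, &success) in observations.iter().enumerate() {
        // Multiply by the likelihood ratio of the alternative to the null.
        e *= if success { p1 / p0 } else { (1.0 - p1) / (1.0 - p0) };
        // The threshold may be checked after every observation.
        if e >= threshold {
            return Some(i + 1);
        }
    }
    None
}
```

With p0 = 0.5, p1 = 0.75, and α = 0.05, each success multiplies the evidence by 1.5, so a run of successes first crosses the threshold of 20 at the eighth observation (1.5⁷ ≈ 17.1, 1.5⁸ ≈ 25.6). Read this way, the example `tau_permit` of 100.0 would correspond to α = 0.01.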