
ADR-001: Anytime-Valid Coherence Gate

Status: Proposed
Date: 2026-01-17
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
SDK: Claude-Flow

Version History

Version	Date	Author	Changes
0.1	2026-01-17	ruv.io	Initial draft with three-filter architecture
0.2	2026-01-17	ruv.io	Added security hardening, performance optimization
0.3	2026-01-17	ruv.io	Added 256-tile WASM fabric mapping
0.4	2026-01-17	ruv.io	Added API contract, migration, observability
0.5	2026-01-17	ruv.io	Added hybrid agent/human workflow
0.6	2026-01-17	ruv.io	Added testing strategy, config format, error recovery

Plain Language Summary

What is it?

An Anytime-Valid Coherence Gate is a small control loop that decides, at any moment:

"Is it safe to act right now, or should we pause or escalate?"

It does not try to be smart. It tries to be safe, calm, and correct about permission.

Why "anytime-valid"?

Because you can stop the computation at any time and still trust the decision.

Like a smoke detector:

It can keep listening forever
The moment it has enough evidence, it triggers
If you stop listening early, whatever it already concluded is still valid

You are not waiting for a model to finish thinking. You are continuously monitoring stability.

Why "coherence"?

Coherence means: does the system's current state agree with itself?

In RuVector, coherence is measured from structure:

RuVector holds relationships as vectors plus a graph
Min-cut and boundary signals tell you when the graph is becoming fragile or splitting into conflicting regions
If the system is splitting, you do not let it take big actions

What it outputs:

Decision	Meaning
Permit	Stable enough, proceed
Defer	Uncertain, escalate to a stronger model or human
Deny	Unstable or policy-violating, block the action

Every decision returns a short "receipt" explaining why.

A concrete example:

An agent wants to push a config change to a network device.

If the dependency graph is stable and similar changes worked before → Permit
If signals are weird (new dependencies, new actors, drift) → Defer and ask for confirmation
If the change crosses a fragile boundary (touches a partition already unstable) → Deny

Why it matters:

It turns autonomy into something enterprises can trust because:

Actions are bounded
Uncertainty is handled explicitly
You get an audit trail

"Attention becomes a permission system, not a popularity contest" — applied to whole-system actions instead of token attention.

Context

The RuVector ecosystem requires a principled mechanism for controlling autonomous agent actions with:

Formal safety guarantees under distribution shift
Computational efficiency suitable for real-time enforcement
Auditable decision trails with cryptographic receipts

Current approaches (threshold classifiers, rule-based systems, periodic audits) each lack at least one of these properties. This ADR proposes the Anytime-Valid Coherence Gate (AVCG): a combination of three algorithms that converts coherence measurement into a deterministic control loop.

Decision

We will implement an Anytime-Valid Coherence Gate that integrates three recently published algorithmic components:

1. Dynamic Min-Cut with Witness Partitions

Source: El-Hayek, Henzinger, Li (arXiv:2512.13105, December 2025)

Key Innovation: Exact deterministic n^{o(1)} update time for cuts up to 2^{Θ(log^{3/4-c}n)}

Integration:

Extends existing SubpolynomialMinCut in ruvector-mincut/src/subpolynomial/mod.rs
Leverages existing WitnessTree for explicit partition certificates
Uses deterministic LocalKCut for local cut verification

Role in Gate: Provides the structural coherence signal - identifies minimal intervention points in the agent action graph with explicit witness partitions showing which actions form the critical boundary to unsafe states.
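To make the idea of a witness partition concrete, here is a brute-force sketch for tiny graphs. It is not the subpolynomial dynamic algorithm from the cited paper and it does not use the SubpolynomialMinCut, WitnessTree, or LocalKCut APIs; `CutWitness` and `min_cut_with_witness` are hypothetical names chosen for illustration. It enumerates every two-sided partition, so it is only usable for a handful of vertices.

```rust
/// A minimum cut together with its explicit witness (hypothetical type).
pub struct CutWitness {
    pub value: f64,
    pub side_a: Vec<usize>,
    pub side_b: Vec<usize>,
    pub crossing: Vec<(usize, usize)>,
}

/// True when vertex `v` is on side B of the partition encoded by `mask`.
fn on_side_b(mask: u32, v: usize) -> bool {
    (mask >> v) & 1 == 1
}

/// Brute-force global min-cut of an undirected weighted graph with 2..=20 vertices.
/// Edges are (u, v, weight) with vertex indices below `n`.
pub fn min_cut_with_witness(n: usize, edges: &[(usize, usize, f64)]) -> Option<CutWitness> {
    if !(2..=20).contains(&n) || edges.iter().any(|&(u, v, _)| u >= n || v >= n) {
        return None;
    }
    let mut best: Option<(f64, u32)> = None;
    // Vertex 0 is pinned to side A, so each partition is visited exactly once.
    for m in 1u32..(1u32 << (n - 1)) {
        let mask = m << 1;
        let value: f64 = edges
            .iter()
            .filter(|&&(u, v, _)| on_side_b(mask, u) != on_side_b(mask, v))
            .map(|&(_, _, w)| w)
            .sum();
        if best.map_or(true, |(b, _)| value < b) {
            best = Some((value, mask));
        }
    }
    let (value, mask) = best?;
    Some(CutWitness {
        value,
        side_a: (0..n).filter(|&v| !on_side_b(mask, v)).collect(),
        side_b: (0..n).filter(|&v| on_side_b(mask, v)).collect(),
        crossing: edges
            .iter()
            .filter(|&&(u, v, _)| on_side_b(mask, u) != on_side_b(mask, v))
            .map(|&(u, v, _)| (u, v))
            .collect(),
    })
}
```

For two triangles joined by one light edge, the witness is the pair of triangles and the crossing set is that single edge, which is the "critical boundary" the gate would inspect before permitting an action that touches it.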

2. Online Conformal Prediction with Shift-Awareness

Sources:

Retrospective Adjustment (arXiv:2511.04275, November 2025)
Conformal Optimistic Prediction (COP) (December 2025)
CORE: RL-based Conformal Regression (October 2025)

Key Innovation: Distribution-free coverage guarantees that adapt to arbitrary distribution shift with faster recalibration via retrospective adjustment.

Integration:

New module: ruvector-mincut/src/conformal/ for prediction sets
Interfaces with existing GatePolicy thresholds
Wraps action outcome predictions with calibrated uncertainty

Role in Gate: Provides the predictive uncertainty signal - quantifies confidence in action outcomes, triggering DEFER when prediction sets are too large.
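As an illustration of how a shift-aware conformal layer can behave, the sketch below uses adaptive conformal inference (ACI), a simpler published scheme that is swapped in here for the retrospective-adjustment, COP, and CORE methods listed above. The working miscoverage level follows the update α_{t+1} = α_t + γ(α − err_t). The type `AdaptiveConformal` and all of its methods are hypothetical and are not part of the planned `conformal/` module.

```rust
/// Online conformal calibrator in the style of adaptive conformal inference.
pub struct AdaptiveConformal {
    target_alpha: f64, // desired long-run miscoverage rate
    alpha_t: f64,      // working miscoverage level, adapted online
    gamma: f64,        // step size of the adaptation
    scores: Vec<f64>,  // nonconformity scores of past true outcomes
}

impl AdaptiveConformal {
    pub fn new(target_alpha: f64, gamma: f64) -> Self {
        Self { target_alpha, alpha_t: target_alpha, gamma, scores: Vec::new() }
    }

    /// Current working miscoverage level.
    pub fn alpha(&self) -> f64 {
        self.alpha_t
    }

    /// Candidates whose nonconformity score is <= this value are in the prediction set.
    pub fn threshold(&self) -> f64 {
        if self.scores.is_empty() || self.alpha_t <= 0.0 {
            return f64::INFINITY; // no calibration data yet: include everything
        }
        if self.alpha_t >= 1.0 {
            return f64::NEG_INFINITY; // empty prediction set
        }
        let mut sorted = self.scores.clone();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let n = sorted.len();
        // 1-based conformal quantile index: ceil((n + 1) * (1 - alpha)).
        let k = ((n as f64 + 1.0) * (1.0 - self.alpha_t)).ceil() as usize;
        if k > n { f64::INFINITY } else { sorted[k - 1] }
    }

    /// Indices of candidate outcomes that fall inside the prediction set.
    pub fn prediction_set(&self, candidate_scores: &[f64]) -> Vec<usize> {
        let limit = self.threshold();
        (0..candidate_scores.len())
            .filter(|&i| candidate_scores[i] <= limit)
            .collect()
    }

    /// Record the score of the realized outcome and adapt alpha.
    /// Returns whether the outcome was covered by the current set.
    pub fn observe(&mut self, true_score: f64) -> bool {
        let covered = true_score <= self.threshold();
        let err = if covered { 0.0 } else { 1.0 };
        self.alpha_t += self.gamma * (self.target_alpha - err);
        self.scores.push(true_score);
        covered
    }
}
```

A miss lowers the working level, which widens later prediction sets; under sustained shift the sets grow until the gate's set-size check triggers DEFER.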

3. E-Values and E-Processes for Anytime-Valid Inference

Sources:

Ramdas & Wang "Hypothesis Testing with E-values" (FnTStA 2025)
ICML 2025 Tutorial on SAVI
Sequential Randomization Tests (arXiv:2512.04366, December 2025)

Key Innovation: Evidence accumulation that remains valid at any stopping time, with multiplicative composition across experiments.

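Definition: An e-value is a nonnegative random variable e with E[e] ≤ 1 under the null hypothesis. An e-process is a nonnegative process (E_t) with E_0 = 1 whose expectation is at most 1 at every stopping time under the null; nonnegative supermartingales started at 1 are the standard example.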

Integration:

New module: ruvector-mincut/src/eprocess/ for evidence tracking
Integrates with existing CutCertificate for audit trails
Enables anytime-valid stopping decisions

Role in Gate: Provides the evidential validity signal - accumulates statistical evidence for/against coherence with formal Type I error control at any stopping time.
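A minimal sketch of evidence accumulation, assuming the simplest construction: a running product of likelihood ratios, kept in log space to avoid overflow. `LogEProcess` is a hypothetical type, not the planned `eprocess/` API. The product is an e-process only when the alternative likelihood for each step is fixed before that observation is seen.

```rust
/// Likelihood-ratio e-process tracked in log space (hypothetical sketch).
pub struct LogEProcess {
    log_e: f64,
    steps: u64,
}

impl LogEProcess {
    /// Starts at E_0 = 1, i.e. log E_0 = 0.
    pub fn new() -> Self {
        Self { log_e: 0.0, steps: 0 }
    }

    /// Multiply in one likelihood ratio p_alt(x) / p_null(x).
    pub fn update(&mut self, likelihood_alt: f64, likelihood_null: f64) -> Result<(), &'static str> {
        let valid = |p: f64| p.is_finite() && p > 0.0;
        if !valid(likelihood_alt) || !valid(likelihood_null) {
            return Err("likelihoods must be finite and strictly positive");
        }
        self.log_e += (likelihood_alt / likelihood_null).ln();
        self.steps += 1;
        Ok(())
    }

    /// Current value E_t.
    pub fn value(&self) -> f64 {
        self.log_e.exp()
    }

    /// Number of observations folded in so far.
    pub fn steps(&self) -> u64 {
        self.steps
    }

    /// Anytime-valid rejection of the null at level alpha: E_t >= 1 / alpha.
    pub fn rejects_at(&self, alpha: f64) -> bool {
        self.log_e >= -alpha.ln()
    }
}
```

Because the test may be checked after every update, the caller can stop as soon as `rejects_at` returns true, or stop early for any other reason, without invalidating the error guarantee.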

Gate Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    ANYTIME-VALID COHERENCE GATE                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐ │
│   │  DYNAMIC MIN-CUT │    │    CONFORMAL     │    │   E-PROCESS      │ │
│   │    (Structural)  │    │   (Predictive)   │    │  (Evidential)    │ │
│   │                  │    │                  │    │                  │ │
│   │  SubpolynomialMC │    │  ShiftAdaptive   │    │  CoherenceTest   │ │
│   │  WitnessTree     │───▶│  PredictionSet   │───▶│  EvidenceAccum   │ │
│   │  LocalKCut       │    │  COP/CORE        │    │  StoppingRule    │ │
│   └──────────────────┘    └──────────────────┘    └──────────────────┘ │
│            │                       │                       │           │
│            ▼                       ▼                       ▼           │
│   ┌────────────────────────────────────────────────────────────────┐   │
│   │                    DECISION LOGIC                              │   │
│   │                                                                │   │
│   │   PERMIT: E_t > τ_permit ∧ action ∉ CriticalCut ∧ |C_t| small │   │
│   │   DEFER:  |C_t| large ∨ τ_deny < E_t < τ_permit               │   │
│   │   DENY:   E_t < τ_deny ∨ action ∈ WitnessPartition(unsafe)    │   │
│   │                                                                │   │
│   └────────────────────────────────────────────────────────────────┘   │
│                               │                                        │
│                               ▼                                        │
│                    ┌─────────────────────┐                            │
│                    │   WITNESS RECEIPT   │                            │
│                    │  (cut + conf + e)   │                            │
│                    └─────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────────┘

Integration with Existing Architecture

Extension Points

Component	Current Implementation	AVCG Extension
`GatePacket`	λ as point estimate	Add `lambda_confidence_q15`, `e_value_log_q15`
`GateController`	Rule-based thresholds	Add `AnytimeGatePolicy` with adaptive thresholds
`WitnessTree`	Cut value only	Add `ConfidenceWitness` with staleness tracking
`CutCertificate`	Static verification	Add `EvidenceReceipt` with e-value trace
`TierDecision`	Fixed tiers	Add `required_confidence_for_tier`

New Modules

ruvector-mincut/
├── src/
│   ├── conformal/           # NEW: Online conformal prediction
│   │   ├── mod.rs
│   │   ├── prediction_set.rs
│   │   ├── cop.rs           # Conformal Optimistic Prediction
│   │   ├── retrospective.rs # Retrospective adjustment
│   │   └── core.rs          # RL-based conformal
│   ├── eprocess/            # NEW: E-value and e-process tracking
│   │   ├── mod.rs
│   │   ├── evalue.rs
│   │   ├── evidence_accum.rs
│   │   ├── stopping.rs
│   │   └── mixture.rs
│   ├── anytime_gate/        # NEW: Integrated gate controller
│   │   ├── mod.rs
│   │   ├── policy.rs
│   │   ├── decision.rs
│   │   └── receipt.rs
│   └── ...existing modules...

Decision Rules

Permit Conditions (all must hold)

E-process value E_t > τ_permit (sufficient evidence of coherence)
Action not in witness partition of critical cut
Conformal prediction set |C_t| < θ_confidence (confident prediction)

Defer Conditions (any triggers)

Conformal prediction set |C_t| > θ_uncertainty (uncertain outcome)
E-process in indeterminate range: τ_deny < E_t < τ_permit
Deadline approaching without sufficient confidence

Deny Conditions (any triggers)

E-process value E_t < τ_deny (strong evidence of incoherence)
Action in witness partition crossing to unsafe states
Structural impossibility via min-cut topology
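The three rule sets can be read as a single function. The sketch below is one possible encoding, with assumptions that the ADR does not spell out: Deny is checked first, explicit Defer triggers second, Permit third, and every remaining case (for example E_t exactly equal to a threshold, or a set size between θ_confidence and θ_uncertainty) falls through to Defer. `Signals`, `Thresholds`, and `decide` are hypothetical names, and the "structural impossibility" and deadline conditions are not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

/// Gate thresholds (see the Threshold Configuration table).
pub struct Thresholds {
    pub tau_deny: f64,
    pub tau_permit: f64,
    pub theta_confidence: usize,
    pub theta_uncertainty: usize,
}

/// The three signals for one candidate action.
pub struct Signals {
    pub e_value: f64,
    pub in_unsafe_partition: bool,
    pub prediction_set_size: usize,
}

pub fn decide(s: &Signals, t: &Thresholds) -> GateDecision {
    // Deny: any single trigger is enough.
    if s.e_value < t.tau_deny || s.in_unsafe_partition {
        return GateDecision::Deny;
    }
    // Defer: the outcome is too uncertain to act on.
    if s.prediction_set_size > t.theta_uncertainty {
        return GateDecision::Defer;
    }
    // Permit: all conditions must hold.
    if s.e_value > t.tau_permit && s.prediction_set_size < t.theta_confidence {
        return GateDecision::Permit;
    }
    // Indeterminate evidence or a mid-sized prediction set: escalate.
    GateDecision::Defer
}
```

Falling through to Defer keeps the function total: no combination of inputs is left without a decision, and no case is permitted by omission.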

Threshold Configuration

Threshold	Meaning	Recommended Default
τ_deny	E-process level indicating incoherence	0.01 (reciprocal of τ_permit)
τ_permit	E-process level indicating coherence	100 (strong evidence; false-permit probability ≤ 1% by Ville's inequality)
θ_uncertainty	Conformal set size requiring deferral	Task-dependent
θ_confidence	Conformal set size for confident permit	Task-dependent
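The link between an e-process threshold and an error rate comes from Ville's inequality. The worked note below assumes that the null hypothesis H_0 is "the system is incoherent", so that a large E_t is evidence for coherence; the ADR does not state the null explicitly, so this reading is an interpretation.

```latex
\Pr_{H_0}\!\left(\exists\, t \ge 0 : E_t \ge \tfrac{1}{\alpha}\right) \le \alpha
\qquad\Longrightarrow\qquad
\Pr_{H_0}\!\left(\exists\, t \ge 0 : E_t \ge \tau_{\mathrm{permit}}\right) \le \frac{1}{\tau_{\mathrm{permit}}} = \frac{1}{100} = 0.01
```

The bound controls upward crossings only. It gives no corresponding guarantee for E_t falling below τ_deny; a symmetric guarantee would need a second e-process built against the opposite null.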

Witness Receipt Structure

pub struct WitnessReceipt {
    /// Timestamp of decision
    pub timestamp: u64,
    /// Action that was evaluated
    pub action_id: ActionId,
    /// Gate decision
    pub decision: GateDecision,

    // Structural witness (from min-cut)
    pub cut_value: f64,
    pub witness_partition: (Vec<VertexId>, Vec<VertexId>),
    pub critical_edges: Vec<EdgeId>,

    // Predictive witness (from conformal)
    pub prediction_set: ConformalSet,
    pub coverage_target: f32,
    pub shift_adaptation_rate: f32,

    // Evidential witness (from e-process)
    pub e_value: f64,
    pub e_process_cumulative: f64,
    pub stopping_valid: bool,

    // Cryptographic seal
    pub receipt_hash: [u8; 32],
}

Security Hardening

Threat Model

Threat Actor	Capabilities	Target	Impact
Malicious Agent	Action injection, timing manipulation	Gate bypass	Unauthorized actions executed
Network Adversary	Message interception, replay	Receipt forgery	False audit trail
Insider Threat	Threshold modification, key access	Policy manipulation	Safety guarantees voided
Byzantine Node	Arbitrary behavior in distributed gate	Consensus corruption	Inconsistent decisions

Cryptographic Requirements

Receipt Signing (CRITICAL)

pub struct WitnessReceipt {
    // ... existing fields ...

    // Cryptographic seal (REQUIRED)
    pub receipt_hash: [u8; 32],         // Blake3 hash of serialized content
    pub signature: Ed25519Signature,     // REQUIRED, not optional
    pub signer_id: PublicKey,           // Identity of signing gate
    pub timestamp_proof: TimestampProof, // Prevents backdating
}

/// Timestamp proof prevents replay and backdating
pub struct TimestampProof {
    pub timestamp: u64,
    pub previous_receipt_hash: [u8; 32], // Chain linkage
    pub merkle_root: [u8; 32],           // Batch anchor
}

impl WitnessReceipt {
    /// Sign receipt - MUST be called before any external use
    pub fn sign(&mut self, key: &SigningKey) -> Result<(), CryptoError> {
        let content = self.serialize_without_signature();
        self.receipt_hash = blake3::hash(&content).into();
        self.signature = key.sign(&self.receipt_hash);
        Ok(())
    }

    /// Verify receipt integrity and authenticity
    pub fn verify(&self, trusted_keys: &KeyStore) -> Result<(), VerifyError> {
        // 1. Verify hash
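        let expected_hash = blake3::hash(&self.serialize_without_signature());
        // Compare against the raw 32 bytes; `.into()` on the right-hand side
        // of `!=` leaves the target type ambiguous.
        if self.receipt_hash != *expected_hash.as_bytes() {
            return Err(VerifyError::HashMismatch);
        }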

        // 2. Verify signature
        let public_key = trusted_keys.get(&self.signer_id)?;
        public_key.verify(&self.receipt_hash, &self.signature)?;

        // 3. Verify timestamp chain
        self.timestamp_proof.verify()?;

        Ok(())
    }
}
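The body of `timestamp_proof.verify()` is not shown above. As a rough sketch of what chain-linkage verification can involve, the example below checks three things per receipt: the stored previous hash matches the predecessor, the stored hash matches the content, and timestamps do not go backwards. To stay runnable with the standard library only, it uses 64-bit FNV-1a as a stand-in hash; FNV-1a is not collision resistant and must not replace BLAKE3 in a real gate. `ChainedReceipt`, `append`, and `verify_chain` are hypothetical names.

```rust
/// 64-bit FNV-1a. Stand-in only: NOT cryptographically secure.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

pub struct ChainedReceipt {
    pub timestamp: u64,
    pub payload: Vec<u8>,
    pub previous_hash: u64,
    pub hash: u64,
}

/// Hash over timestamp, predecessor hash, and payload.
fn digest(timestamp: u64, previous_hash: u64, payload: &[u8]) -> u64 {
    let mut buf = Vec::with_capacity(16 + payload.len());
    buf.extend_from_slice(&timestamp.to_le_bytes());
    buf.extend_from_slice(&previous_hash.to_le_bytes());
    buf.extend_from_slice(payload);
    toy_hash(&buf)
}

/// Append a receipt linked to the current chain tip (0 for the first entry).
pub fn append(chain: &mut Vec<ChainedReceipt>, timestamp: u64, payload: &[u8]) {
    let previous_hash = chain.last().map_or(0, |r| r.hash);
    let hash = digest(timestamp, previous_hash, payload);
    chain.push(ChainedReceipt { timestamp, payload: payload.to_vec(), previous_hash, hash });
}

/// Returns Err(index) for the first receipt that breaks the chain.
pub fn verify_chain(chain: &[ChainedReceipt]) -> Result<(), usize> {
    let mut prev_hash = 0u64;
    let mut prev_ts = 0u64;
    for (i, r) in chain.iter().enumerate() {
        let linked = r.previous_hash == prev_hash;
        let intact = r.hash == digest(r.timestamp, r.previous_hash, &r.payload);
        let ordered = r.timestamp >= prev_ts;
        if !(linked && intact && ordered) {
            return Err(i);
        }
        prev_hash = r.hash;
        prev_ts = r.timestamp;
    }
    Ok(())
}
```

Because each hash covers the predecessor's hash, editing or backdating one receipt invalidates it and, once its hash is recomputed, every later receipt as well.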

Key Management

Key Type	Purpose	Rotation	Storage
Gate Signing Key	Sign receipts	30 days	HSM or secure enclave
Receipt Verification Keys	Verify receipts	On rotation	Distributed key store
Threshold Keys	Multi-party signing	90 days	Shamir secret sharing

Attack Mitigations

E-Value Manipulation Prevention

/// Bounds checking for e-value inputs
impl EValue {
    pub fn from_likelihood_ratio(
        likelihood_h1: f64,
        likelihood_h0: f64,
    ) -> Result<Self, EValueError> {
        // Reject NaN, infinite, or negative numerators: NaN would otherwise
        // pass through `clamp` unchanged and poison the e-process.
        // (`InvalidNumerator` is an assumed variant name.)
        if !likelihood_h1.is_finite() || likelihood_h1 < 0.0 {
            return Err(EValueError::InvalidNumerator);
        }

        // Prevent division by zero; also rejects NaN and infinite denominators
        if !likelihood_h0.is_finite() || likelihood_h0 <= f64::EPSILON {
            return Err(EValueError::InvalidDenominator);
        }

        let ratio = likelihood_h1 / likelihood_h0;

        // Bound extreme values to prevent overflow attacks
        let bounded = ratio.clamp(E_VALUE_MIN, E_VALUE_MAX);

        // Log if clamping occurred (potential attack indicator)
        if bounded != ratio {
            security_log!("E-value clamped: {} -> {}", ratio, bounded);
        }

        Ok(Self { value: bounded, ..Default::default() })
    }
}

const E_VALUE_MIN: f64 = 1e-10;
const E_VALUE_MAX: f64 = 1e10;

Race Condition Prevention

/// Atomic gate decision with sequence numbers
pub struct AtomicGateDecision {
    /// Monotonic sequence for ordering
    sequence: AtomicU64,
    /// Lock for decision atomicity
    decision_lock: RwLock<()>,
}

impl AtomicGateDecision {
    pub async fn evaluate(&self, action: &Action) -> GateResult {
        // Acquire exclusive lock for decision
        let _guard = self.decision_lock.write().await;

        // Get sequence number BEFORE evaluation
        let seq = self.sequence.fetch_add(1, Ordering::SeqCst);

        // Evaluate all three signals atomically
        let result = self.evaluate_internal(action, seq).await;

        // Sequence number in receipt ensures ordering
        result.with_sequence(seq)
    }
}
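A standard-library sketch of the same ordering guarantee, assuming a synchronous gate: the counter lives inside the mutex, so taking the lock, reading the sequence number, and evaluating the decision form one critical section. `SequencedGate` and `issue_concurrently` are hypothetical names. With an exclusive lock held for the whole evaluation, a plain counter is enough; the atomic in the snippet above is harmless but not required for correctness.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Gate front-end that hands out gap-free, strictly increasing sequence numbers.
pub struct SequencedGate {
    next_seq: Mutex<u64>,
}

impl SequencedGate {
    pub fn new() -> Self {
        Self { next_seq: Mutex::new(0) }
    }

    /// Runs `decide` while holding the lock, so decisions are totally ordered
    /// by the sequence number they receive.
    pub fn evaluate<T>(&self, decide: impl FnOnce(u64) -> T) -> (u64, T) {
        let mut next = self.next_seq.lock().unwrap();
        let seq = *next;
        *next += 1;
        let outcome = decide(seq);
        (seq, outcome)
    }
}

/// Drives the gate from several threads and returns all issued sequence numbers, sorted.
pub fn issue_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let gate = Arc::new(SequencedGate::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let gate = Arc::clone(&gate);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| gate.evaluate(|seq| seq).0)
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut all: Vec<u64> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort_unstable();
    all
}
```

Serializing every decision trades throughput for a simple ordering argument; a production gate would need to check that this fits the latency budget.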

Replay Attack Prevention

use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Replay prevention via action-hash tracking
pub struct ReplayGuard {
    /// Recent action hashes (bloom filter for efficiency)
    recent_actions: BloomFilter,
    /// Sliding window of (arrival time, full hash) for false positive
    /// resolution; the arrival time lets `prune_old_entries` enforce
    /// `window_duration`
    hash_window: VecDeque<(Instant, [u8; 32])>,
    /// Maximum age of tracked actions
    window_duration: Duration,
}

impl ReplayGuard {
    pub fn check_and_record(&mut self, action: &Action) -> Result<(), ReplayError> {
        let hash = action.content_hash();

        // Fast path: bloom filter check
        if self.recent_actions.might_contain(&hash) {
            // Slow path: verify against full hash window. A plain bloom filter
            // cannot forget entries, so pruned hashes also take this path.
            if self.hash_window.iter().any(|(_, seen)| *seen == hash) {
                return Err(ReplayError::DuplicateAction { hash });
            }
        }

        // Record action
        self.recent_actions.insert(&hash);
        self.hash_window.push_back((Instant::now(), hash));
        self.prune_old_entries();

        Ok(())
    }
}

Trust Boundaries

┌─────────────────────────────────────────────────────────────────────────┐
│                         TRUST BOUNDARY: GATE CORE                       │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │  • E-process computation    • Min-cut evaluation                 │  │
│  │  • Conformal prediction     • Decision logic                     │  │
│  │  • Receipt signing          • Key material                       │  │
│  │                                                                   │  │
│  │  Invariants:                                                      │  │
│  │  - All inputs validated before use                               │  │
│  │  - All outputs signed before release                             │  │
│  │  - No external calls during decision                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                    │
│                         (authenticated channel)                         │
│                                    │                                    │
└────────────────────────────────────┼────────────────────────────────────┘
                                     │
┌────────────────────────────────────┼────────────────────────────────────┐
│                    TRUST BOUNDARY: AGENT INTERFACE                      │
│                                    │                                    │
│  • Action submission (validated)   │  • Decision receipt (verified)    │
│  • Context provision (sanitized)   │  • Witness query (authenticated)  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Performance Optimization

Identified Bottlenecks & Solutions

1. E-Process History Management

Problem: Unbounded history growth in EProcess.history: Vec<EValue>

Solution: Ring buffer with configurable retention

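pub struct EProcess {
    /// Current accumulated value (always maintained)
    current: f64,

    /// Rule for folding a new e-value into `current`
    /// (used by `update`; the type name is assumed)
    update_rule: UpdateRule,

    /// Total number of updates seen (monotonic, unlike `history.len()`)
    update_count: u64,

    /// Bounded history ring buffer
    history: RingBuffer<EValueSummary>,

    /// Checkpoint for long-term audit (sampled)
    checkpoints: Vec<EProcessCheckpoint>,
}

/// Compact summary for history
pub struct EValueSummary {
    value: f32,           // Reduced precision for storage
    timestamp: u32,       // Relative to epoch
    flags: u8,            // Metadata bits
}

impl EProcess {
    const HISTORY_CAPACITY: usize = 1024;
    const CHECKPOINT_INTERVAL: u64 = 100;

    pub fn update(&mut self, e: EValue) {
        // Update current (always)
        self.current = self.update_rule.apply(self.current, e.value);

        // Add to ring buffer (bounded)
        self.history.push(e.to_summary());

        // Periodic checkpoint for audit. Count updates instead of reading
        // `history.len()`: once the ring buffer is full its length stays at
        // HISTORY_CAPACITY (1024, not a multiple of 100), so a length-based
        // test would never fire again.
        self.update_count += 1;
        if self.update_count % Self::CHECKPOINT_INTERVAL == 0 {
            self.checkpoints.push(self.checkpoint());
        }
    }
}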
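A runnable miniature of the bounded-history idea, assuming a multiplicative update rule and using `VecDeque` in place of the `RingBuffer` type. `BoundedHistory` is a hypothetical name. It drives checkpoints from a monotonic update counter, because the length of a bounded buffer stops changing once the buffer is full.

```rust
use std::collections::VecDeque;

/// Bounded e-value history with sampled checkpoints (hypothetical sketch).
pub struct BoundedHistory {
    pub capacity: usize,
    pub checkpoint_interval: u64,
    pub current: f64,                 // accumulated product, starts at 1
    pub recent: VecDeque<f32>,        // last `capacity` e-values, reduced precision
    pub total_updates: u64,           // monotonic counter
    pub checkpoints: Vec<(u64, f64)>, // (update index, accumulated value)
}

impl BoundedHistory {
    pub fn new(capacity: usize, checkpoint_interval: u64) -> Self {
        assert!(capacity > 0 && checkpoint_interval > 0);
        Self {
            capacity,
            checkpoint_interval,
            current: 1.0,
            recent: VecDeque::with_capacity(capacity),
            total_updates: 0,
            checkpoints: Vec::new(),
        }
    }

    pub fn update(&mut self, e: f64) {
        // Assumed update rule: running product of e-values.
        self.current *= e;

        // Evict the oldest entry once the window is full.
        if self.recent.len() == self.capacity {
            self.recent.pop_front();
        }
        self.recent.push_back(e as f32);

        // Checkpoint on the monotonic counter, not on the buffer length.
        self.total_updates += 1;
        if self.total_updates % self.checkpoint_interval == 0 {
            self.checkpoints.push((self.total_updates, self.current));
        }
    }
}
```

The checkpoint vector still grows without bound, only much more slowly (one entry per interval); a long-running gate would eventually need to archive or thin it.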

2. Min-Cut Hierarchy Updates

Problem: Sequential iteration over all hierarchy levels

Solution: Lazy propagation with dirty tracking

pub struct LazyHierarchy {
    levels: Vec<HierarchyLevel>,
    /// Bitmap of levels needing update
    dirty_levels: u64,
    /// Deferred updates queue
    pending_updates: VecDeque<DeferredUpdate>,
}

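impl LazyHierarchy {
    pub fn insert(&mut self, edge: Edge) {
        // Only update lowest level immediately
        // (clone so the edge can also be queued below; assumes `Edge: Clone`)
        self.levels[0].insert(edge.clone());
        self.dirty_levels |= 1;

        // Defer higher level updates
        self.pending_updates.push_back(DeferredUpdate::Insert(edge));
    }

    pub fn get_cut(&mut self) -> CutValue {
        // Propagate only if needed for query
        if self.dirty_levels != 0 {
            self.propagate_lazy();
        }
        self.levels.last().unwrap().cut_value()
    }

    fn propagate_lazy(&mut self) {
        // Process dirty levels from lowest to highest. The u64 bitmap limits
        // the hierarchy to 64 levels.
        while self.dirty_levels != 0 {
            let level = self.dirty_levels.trailing_zeros() as usize;
            self.update_level(level);
            self.dirty_levels &= !(1u64 << level);

            // A refreshed level invalidates its parent; without this step the
            // top level read by `get_cut` would never be updated.
            if level + 1 < self.levels.len() {
                self.dirty_levels |= 1u64 << (level + 1);
            }
        }
    }
}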

3. SIMD-Optimized E-Value Computation

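#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
use std::arch::x86_64::*;

/// Batch e-value computation with SIMD
pub fn compute_mixture_evalue_simd(
    likelihoods_h1: &[f64],
    likelihoods_h0: &[f64],
    weights: &[f64],
) -> f64 {
    assert_eq!(likelihoods_h1.len(), likelihoods_h0.len());
    assert_eq!(likelihoods_h1.len(), weights.len());

    // Number of leading elements covered by full 4-lane vectors
    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    let vectorized = likelihoods_h1.len() - likelihoods_h1.len() % 4;
    #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
    let vectorized = 0;

    #[allow(unused_mut)]
    let mut total = 0.0;

    // SAFETY: each load reads 4 elements starting at `i`, and
    // `i + 4 <= vectorized <= len` for all three slices.
    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    unsafe {
        let mut sum = _mm256_setzero_pd();

        for i in (0..vectorized).step_by(4) {
            let h1 = _mm256_loadu_pd(likelihoods_h1.as_ptr().add(i));
            let h0 = _mm256_loadu_pd(likelihoods_h0.as_ptr().add(i));
            let w = _mm256_loadu_pd(weights.as_ptr().add(i));

            let ratio = _mm256_div_pd(h1, h0);
            let weighted = _mm256_mul_pd(ratio, w);
            sum = _mm256_add_pd(sum, weighted);
        }

        // Horizontal sum: spill the four lanes and add them
        let mut lanes = [0.0f64; 4];
        _mm256_storeu_pd(lanes.as_mut_ptr(), sum);
        total = lanes.iter().sum();
    }

    // Scalar path: the whole input without AVX2, otherwise only the tail
    // (the original loop read past the end when len was not a multiple of 4)
    total
        + likelihoods_h1[vectorized..]
            .iter()
            .zip(likelihoods_h0[vectorized..].iter())
            .zip(weights[vectorized..].iter())
            .map(|((h1, h0), w)| (h1 / h0) * w)
            .sum::<f64>()
}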

4. Receipt Serialization Optimization

/// Zero-copy receipt serialization
pub struct ReceiptBuffer {
    /// Pre-allocated buffer pool
    pool: BufferPool,
    /// Current buffer
    current: Buffer,
}

impl WitnessReceipt {
    /// Serialize to pre-allocated buffer (zero-copy)
    pub fn serialize_into(&self, buffer: &mut [u8]) -> Result<usize, SerializeError> {
        let mut cursor = 0;

        // Fixed-size header (no allocation)
        cursor += self.write_header(&mut buffer[cursor..])?;

        // Structural witness (fixed size)
        cursor += self.structural.write_to(&mut buffer[cursor..])?;

        // Predictive witness (bounded size)
        cursor += self.predictive.write_to(&mut buffer[cursor..])?;

        // Evidential witness (fixed size)
        cursor += self.evidential.write_to(&mut buffer[cursor..])?;

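        // Hash and signature (fixed size: 32 + 64 bytes). Check the remaining
        // capacity first so a short buffer returns an error instead of
        // panicking. (`BufferTooSmall` is an assumed variant name.)
        if buffer.len() < cursor + 96 {
            return Err(SerializeError::BufferTooSmall);
        }
        buffer[cursor..cursor + 32].copy_from_slice(&self.receipt_hash);
        cursor += 32;
        buffer[cursor..cursor + 64].copy_from_slice(&self.signature.to_bytes());
        cursor += 64;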

        Ok(cursor)
    }
}

Latency Budget (Revised)

Component	Budget	Optimization	Measured p99
Min-cut query	10ms	Lazy propagation	TBD
Conformal prediction	15ms	Cached quantiles	TBD
E-process update	5ms	SIMD mixture	TBD
Decision logic	5ms	Short-circuit	TBD
Receipt generation	10ms	Zero-copy serialize	TBD
Signing	5ms	Ed25519 batch	TBD
Total	50ms

Distributed Coordination

Multi-Agent Gate Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED COHERENCE GATE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   REGIONAL      │    │   REGIONAL      │    │   REGIONAL      │     │
│  │   GATE (Raft)   │    │   GATE (Raft)   │    │   GATE (Raft)   │     │
│  │                 │    │                 │    │                 │     │
│  │  • Local cuts   │    │  • Local cuts   │    │  • Local cuts   │     │
│  │  • Local conf   │    │  • Local conf   │    │  • Local conf   │     │
│  │  • Local e-proc │    │  • Local e-proc │    │  • Local e-proc │     │
│  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘     │
│           │                      │                      │              │
│           └──────────────────────┼──────────────────────┘              │
│                                  │                                     │
│                    ┌─────────────▼─────────────┐                       │
│                    │   GLOBAL COORDINATOR      │                       │
│                    │   (DAG Consensus)         │                       │
│                    │                           │                       │
│                    │  • Cross-region cuts      │                       │
│                    │  • Aggregated e-process   │                       │
│                    │  • Boundary arbitration   │                       │
│                    └───────────────────────────┘                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Hierarchical Decision Protocol

/// Distributed gate with hierarchical coordination
pub struct DistributedGateController {
    /// Local gate for fast-path decisions
    local_gate: AnytimeGateController,

    /// Regional coordinator (Raft consensus)
    regional: RegionalCoordinator,

    /// Global coordinator (DAG consensus)
    global: GlobalCoordinator,

    /// Decision routing policy
    routing: DecisionRoutingPolicy,
}

pub enum DecisionScope {
    /// Action affects only local partition
    Local,
    /// Action crosses regional boundary
    Regional,
    /// Action has global implications
    Global,
}

impl DistributedGateController {
    // The `?` operators below need a `Result` return type; `GateError` is an
    // assumed error type, and the local gate is assumed to return
    // `Result<GateResult, GateError>` as it does in the failover code.
    pub async fn evaluate(
        &mut self,
        action: &Action,
        context: &Context,
    ) -> Result<GateResult, GateError> {
        // 1. Determine scope
        let scope = self.routing.classify(action, context);

        // 2. Route to appropriate level
        match scope {
            DecisionScope::Local => {
                // Fast path: local decision only
                self.local_gate.evaluate(action, context)
            }

            DecisionScope::Regional => {
                // Medium path: coordinate with regional peers
                let local_result = self.local_gate.evaluate(action, context)?;
                let regional_result = self.regional.coordinate(action, &local_result).await?;
                Ok(self.merge_results(local_result, regional_result))
            }

            DecisionScope::Global => {
                // Slow path: full coordination
                let local_result = self.local_gate.evaluate(action, context)?;
                let regional_result = self.regional.coordinate(action, &local_result).await?;
                let global_result = self.global.arbitrate(action, &regional_result).await?;
                Ok(self.merge_all_results(local_result, regional_result, global_result))
            }
        }
    }
}

Distributed E-Process Aggregation

/// E-process that aggregates across distributed gates
pub struct DistributedEProcess {
    /// Local e-process
    local: EProcess,

    /// Peer e-process summaries (received via gossip)
    peer_summaries: HashMap<NodeId, EProcessSummary>,

    /// Aggregation method
    aggregation: AggregationMethod,
}

pub enum AggregationMethod {
    /// Conservative: minimum across all nodes
    Minimum,
    /// Average with confidence weighting
    WeightedAverage,
    /// Consensus-based (requires agreement)
    Consensus { threshold: f64 },
}

impl DistributedEProcess {
    /// Get aggregated e-value for distributed decision
    pub fn aggregated_value(&self) -> f64 {
        match self.aggregation {
            AggregationMethod::Minimum => {
                let local = self.local.current_value();
                let peer_min = self.peer_summaries.values()
                    .map(|s| s.current_value)
                    .fold(f64::INFINITY, f64::min);
                local.min(peer_min)
            }

            AggregationMethod::WeightedAverage => {
                let total_weight: f64 = 1.0 + self.peer_summaries.values()
                    .map(|s| s.confidence_weight)
                    .sum::<f64>();

                let weighted_sum = self.local.current_value()
                    + self.peer_summaries.values()
                        .map(|s| s.current_value * s.confidence_weight)
                        .sum::<f64>();

                weighted_sum / total_weight
            }

            AggregationMethod::Consensus { threshold } => {
                // Requires threshold fraction of nodes to agree
                let values: Vec<f64> = std::iter::once(self.local.current_value())
                    .chain(self.peer_summaries.values().map(|s| s.current_value))
                    .collect();

                // Return median if sufficient agreement, else conservative min
                if self.check_agreement(&values, threshold) {
                    statistical_median(&values)
                } else {
                    values.iter().cloned().fold(f64::INFINITY, f64::min)
                }
            }
        }
    }
}
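A self-contained sketch of the first two aggregation modes, with peers given as (value, weight) pairs. `Aggregation` and `aggregate` are hypothetical names. Two facts are worth keeping in mind when choosing a mode: the minimum of several e-values is never larger than any single one, so it remains a valid e-value, and a weighted average with weights fixed in advance is also an e-value. The median used in the consensus mode carries no such guarantee in general, so it is left out here.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Aggregation {
    /// Conservative: smallest value across local and peers.
    Minimum,
    /// Weighted mean; the local value has weight 1.
    WeightedAverage,
}

/// Aggregate a local e-value with peer summaries given as (value, weight).
pub fn aggregate(local: f64, peers: &[(f64, f64)], method: Aggregation) -> f64 {
    match method {
        Aggregation::Minimum => peers.iter().map(|&(v, _)| v).fold(local, f64::min),
        Aggregation::WeightedAverage => {
            let total_weight = 1.0 + peers.iter().map(|&(_, w)| w).sum::<f64>();
            let weighted_sum = local + peers.iter().map(|&(v, w)| v * w).sum::<f64>();
            weighted_sum / total_weight
        }
    }
}
```

With no peers, both modes return the local value unchanged, which matches the behavior of the method above when `peer_summaries` is empty.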

Fault Tolerance

/// Fault-tolerant gate with automatic failover
pub struct FaultTolerantGate {
    /// Primary gate
    primary: AnytimeGateController,

    /// Standby gates (hot standbys)
    standbys: Vec<AnytimeGateController>,

    /// Health monitor
    health: HealthMonitor,

    /// Failover policy
    failover: FailoverPolicy,
}

pub struct FailoverPolicy {
    /// Maximum consecutive failures before failover
    max_failures: u32,
    /// Health check interval
    check_interval: Duration,
    /// Recovery grace period
    recovery_grace: Duration,
}

impl FaultTolerantGate {
    /// Always yields a decision: the declared return type is `GateResult`,
    /// so results are returned directly rather than wrapped in `Ok`.
    pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
        // Try primary
        match self.try_primary(action, context).await {
            Ok(result) => return result,
            Err(e) => {
                self.health.record_failure(&e);
            }
        }

        // Failover to standbys
        for (idx, standby) in self.standbys.iter_mut().enumerate() {
            match standby.evaluate(action, context) {
                Ok(result) => {
                    // Promote standby if primary unhealthy
                    if self.health.should_failover() {
                        self.promote_standby(idx);
                    }
                    return result;
                }
                Err(e) => {
                    self.health.record_standby_failure(idx, &e);
                }
            }
        }

        // All gates failed - safe default
        GateResult {
            decision: GateDecision::Deny,
            reason: "All gates unavailable - failing safe".into(),
            ..Default::default()
        }
    }
}
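The fail-safe pattern can be isolated from the gate types. In the sketch below each gate is a closure that either returns a decision or fails; the first success wins, and total failure yields Deny. `Decision` and `evaluate_with_failover` are hypothetical names, and health tracking and standby promotion are omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Permit,
    Defer,
    Deny,
}

/// Try each gate in priority order. Returns the first successful decision and
/// the index of the gate that produced it, or (Deny, None) if every gate failed.
pub fn evaluate_with_failover<E>(
    gates: &[&dyn Fn() -> Result<Decision, E>],
) -> (Decision, Option<usize>) {
    for (idx, gate) in gates.iter().enumerate() {
        if let Ok(decision) = gate() {
            return (decision, Some(idx));
        }
    }
    // No gate could evaluate the action: fail safe.
    (Decision::Deny, None)
}
```

Denying on total failure means an outage can block legitimate work but cannot let an unchecked action through, which is the trade-off the "failing safe" default expresses.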

Integration with RuVector Consensus

Consensus Layer	RuVector Module	Gate Integration
Regional (Raft)	`ruvector-raft`	Local cut coordination, leader-based decisions
Global (DAG)	`ruvector-cluster`	Cross-region boundary arbitration
State Sync	`ruvector-sync`	E-process summary propagation
Receipt Chain	`ruvector-merkle`	Distributed receipt verification

Hardware Mapping: 256-Tile WASM Fabric

The coherence gate is an ideal workload for event-driven WASM hardware: mostly silent, then extremely decisive when boundaries move.

Tile Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         256-TILE COGNITUM FABRIC                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                        TILE ZERO (Arbiter)                       │   │
│  │                                                                  │   │
│  │  • Merge worker reports      • Hierarchical min-cut             │   │
│  │  • Global gate decision      • Permit token issuance            │   │
│  │  • Witness receipt log       • Hash-chained eventlog            │   │
│  └──────────────────────────────┬───────────────────────────────────┘   │
│                                 │                                       │
│            ┌────────────────────┼────────────────────┐                 │
│            │                    │                    │                  │
│            ▼                    ▼                    ▼                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐             │
│  │  Workers     │    │  Workers     │    │  Workers     │   ...       │
│  │  [1-85]      │    │  [86-170]    │    │  [171-255]   │             │
│  │              │    │              │    │              │             │
│  │  Shard A     │    │  Shard B     │    │  Shard C     │             │
│  │  Local cuts  │    │  Local cuts  │    │  Local cuts  │             │
│  │  E-accum     │    │  E-accum     │    │  E-accum     │             │
│  └──────────────┘    └──────────────┘    └──────────────┘             │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Worker Tile Responsibilities

Each of the 255 worker tiles maintains a local shard:

/// Worker tile state (fits in ~64KB WASM memory)
#[repr(C)]
pub struct WorkerTileState {
    /// Tile identifier (workers are 1-255; tile 0 is the arbiter)
    id: u8,

    /// Compact neighborhood graph (edges + weights)
    graph_shard: CompactGraph,          // ~32KB

    /// Rolling feature window for normality scores
    feature_window: RingBuffer<f32>,    // ~8KB

    /// Local coherence score
    coherence: f32,

    /// Local boundary candidates (top-k edges)
    boundary_edges: [EdgeId; 8],

    /// Local e-value accumulator
    e_accumulator: f64,

    /// Time of the most recent tick, in nanoseconds (set from `now_ns` in `tick`)
    tick: u64,
}

/// Per-tick processing: only deltas
impl WorkerTileState {
    /// Process incoming delta (edge add/remove/weight update)
    pub fn ingest_delta(&mut self, delta: &Delta) -> Status {
        match delta {
            Delta::EdgeAdd(e) => self.graph_shard.add_edge(e),
            Delta::EdgeRemove(e) => self.graph_shard.remove_edge(e),
            Delta::WeightUpdate(e, w) => self.graph_shard.update_weight(e, *w),
            Delta::Observation(score) => self.feature_window.push(*score),
        }
        self.update_local_coherence();
        Status::Ok
    }

    /// Tick: compute and emit report
    pub fn tick(&mut self, now_ns: u64) -> TileReport {
        self.tick = now_ns;

        // Tiny math: update e-accumulator
        self.e_accumulator = self.compute_local_evalue();

        TileReport {
            tile_id: self.id,
            coherence: self.coherence,
            boundary_moved: self.detect_boundary_movement(),
            suspicious_edges: self.top_k_suspicious(),
            e_value: self.e_accumulator as f32,
            witness_fragment: self.extract_witness_fragment(),
        }
    }
}

/// Fixed-size report (fits in single cache line)
#[repr(C, align(64))]
pub struct TileReport {
    tile_id: u8,
    coherence: f32,
    boundary_moved: bool,
    suspicious_edges: [EdgeId; 4],
    e_value: f32,
    witness_fragment: WitnessFragment,
}
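
The `RingBuffer<f32>` behind `feature_window` is not defined in this section. The following is a minimal sketch of a fixed-capacity window, assuming 2048 slots (the 8 KB figure used in the cost model) and a hypothetical `mean` helper standing in for whatever normality score the real tile computes. The type and method names are illustrative, not the crate's API.

```rust
/// Hypothetical fixed-capacity sample window (N must be > 0).
/// With N = 2048 the backing array is 8 KiB of f32 samples.
pub struct RingBuffer<const N: usize> {
    data: [f32; N],
    head: usize, // index of the next slot to overwrite
    len: usize,  // number of valid samples, saturating at N
}

impl<const N: usize> RingBuffer<N> {
    pub const fn new() -> Self {
        Self { data: [0.0; N], head: 0, len: 0 }
    }

    /// Append a sample, overwriting the oldest one once the window is full.
    pub fn push(&mut self, value: f32) {
        self.data[self.head] = value;
        self.head = (self.head + 1) % N;
        if self.len < N {
            self.len += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Mean of the samples currently held, or None when the window is empty.
    pub fn mean(&self) -> Option<f32> {
        if self.len == 0 {
            return None;
        }
        // Before the first wrap the valid samples are data[0..len];
        // after it, len == N and the whole array is valid.
        let sum: f32 = self.data[..self.len].iter().sum();
        Some(sum / self.len as f32)
    }
}
```

No allocation is involved, so the same shape should work in a `no_std` kernel.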

TileZero Responsibilities

TileZero acts as the arbiter that issues final decisions:

/// TileZero: Global gate decision and permit issuance
pub struct TileZero {
    /// Merged supergraph (reduced from worker summaries)
    supergraph: ReducedGraph,

    /// Canonical permit token state
    permit_state: PermitState,

    /// Hash-chained witness receipt log
    receipt_log: ReceiptLog,

    /// Threshold configuration
    thresholds: GateThresholds,
}

impl TileZero {
    /// Collect reports from all worker tiles
    pub fn collect_reports(&mut self, reports: &[TileReport; 255]) {
        // Merge worker summaries into supergraph
        for report in reports {
            if report.boundary_moved {
                self.supergraph.update_from_fragment(&report.witness_fragment);
            }
            self.supergraph.update_coherence(report.tile_id, report.coherence);
        }
    }

    /// Issue gate decision (microsecond latency)
    pub fn decide(&mut self, action_ctx: &ActionContext) -> PermitToken {
        // Three stacked filters:

        // 1. Structural filter (global cut on reduced graph)
        let structural_ok = self.supergraph.global_cut() >= self.thresholds.min_cut;

        // 2. Shift filter (aggregated shift pressure)
        let shift_pressure = self.aggregate_shift_pressure();
        let shift_ok = shift_pressure < self.thresholds.max_shift;

        // 3. Evidence filter (can stop immediately if enough evidence)
        let e_aggregate = self.aggregate_evidence();
        let evidence_decision = self.evidence_decision(e_aggregate);

        // Combined decision
        let decision = match (structural_ok, shift_ok, evidence_decision) {
            (false, _, _) => GateDecision::Deny,  // Structure broken
            (_, false, _) => GateDecision::Defer, // Shift detected
            (_, _, EvidenceDecision::Reject) => GateDecision::Deny,
            (_, _, EvidenceDecision::Continue) => GateDecision::Defer,
            (true, true, EvidenceDecision::Accept) => GateDecision::Permit,
        };

        // Issue token
        self.issue_permit_token(action_ctx, decision)
    }

    /// Issue permit token (a signed capability)
    fn issue_permit_token(
        &mut self,
        ctx: &ActionContext,
        decision: GateDecision,
    ) -> PermitToken {
        let witness_hash = self.compute_witness_hash();

        let mut token = PermitToken {
            decision,
            action_id: ctx.action_id,
            timestamp: now_ns(),
            ttl_ns: self.thresholds.permit_ttl,
            witness_hash,
            sequence: self.permit_state.next_sequence(),
            mac: [0u8; 32], // placeholder until the token is signed
        };

        // MAC or sign the token (the signer is assumed to cover every field except `mac`)
        let mac = self.permit_state.sign(&token);
        token.mac = mac;

        // Emit receipt
        self.emit_receipt(&token, &mac);

        token
    }

    /// Emit witness receipt (hash-chained)
    fn emit_receipt(&mut self, token: &PermitToken, mac: &[u8; 32]) {
        let receipt = WitnessReceipt {
            token: token.clone(),
            mac: *mac,
            previous_hash: self.receipt_log.last_hash(),
            witness_summary: self.supergraph.witness_summary(),
        };

        self.receipt_log.append(receipt);
    }
}

/// Permit token: a capability that agents must present
#[repr(C)]
pub struct PermitToken {
    pub decision: GateDecision,
    pub action_id: ActionId,
    pub timestamp: u64,
    pub ttl_ns: u64,
    pub witness_hash: [u8; 32],
    pub sequence: u64,
    pub mac: [u8; 32],  // 32-byte MAC (e.g. HMAC-SHA256); an Ed25519 signature would need [u8; 64]
}

impl PermitToken {
    /// Agents must present valid token to perform actions
    pub fn is_valid(&self, verifier: &Verifier) -> bool {
        // Check TTL (saturating add so a huge TTL cannot overflow and wrap)
        if now_ns() > self.timestamp.saturating_add(self.ttl_ns) {
            return false;
        }

        // Verify MAC/signature
        verifier.verify(self, &self.mac)
    }
}
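
The decision table in `decide` can be exercised on its own. The sketch below reproduces the match arms as a pure function and adds a hypothetical `evidence_decision` helper that maps an aggregated e-value onto a verdict using the `tau_deny` and `tau_permit` thresholds from the configuration format. The enum definitions and both function names are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvidenceDecision {
    Accept,
    Continue,
    Reject,
}

/// Map an e-value onto a verdict. A NaN input fails both comparisons and
/// therefore falls through to Continue, which the gate turns into Defer.
pub fn evidence_decision(e_value: f64, tau_deny: f64, tau_permit: f64) -> EvidenceDecision {
    if e_value < tau_deny {
        EvidenceDecision::Reject
    } else if e_value > tau_permit {
        EvidenceDecision::Accept
    } else {
        EvidenceDecision::Continue
    }
}

/// Same arm order as `TileZero::decide`: structure first, then shift, then evidence.
pub fn combine(structural_ok: bool, shift_ok: bool, evidence: EvidenceDecision) -> GateDecision {
    match (structural_ok, shift_ok, evidence) {
        (false, _, _) => GateDecision::Deny,
        (_, false, _) => GateDecision::Defer,
        (_, _, EvidenceDecision::Reject) => GateDecision::Deny,
        (_, _, EvidenceDecision::Continue) => GateDecision::Defer,
        (true, true, EvidenceDecision::Accept) => GateDecision::Permit,
    }
}
```

Because the arms are evaluated in order, a detected shift yields Defer even when the evidence filter says Reject. Only one of the twelve input combinations produces Permit.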

WASM Kernel API

Each tile runs a minimal WASM kernel:

/// Worker tile WASM exports
#[no_mangle]
pub extern "C" fn ingest_delta(delta_ptr: *const u8, len: usize) -> u32 {
    let delta = unsafe { core::slice::from_raw_parts(delta_ptr, len) };
    TILE_STATE.with(|state| state.borrow_mut().ingest_delta(delta))
}

#[no_mangle]
pub extern "C" fn tick(now_ns: u64) -> *const TileReport {
    TILE_STATE.with(|state| state.borrow_mut().tick(now_ns))
}

#[no_mangle]
pub extern "C" fn get_witness_fragment(id: u32) -> *const u8 {
    TILE_STATE.with(|state| state.borrow().get_witness_fragment(id))
}

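/// TileZero WASM/native exports
#[no_mangle]
pub extern "C" fn collect_reports(reports_ptr: *const TileReport, count: usize) {
    // SAFETY: the host must pass a pointer to `count` initialized reports.
    let reports = unsafe { core::slice::from_raw_parts(reports_ptr, count) };
    // The arbiter expects exactly one report per worker tile; ignore other batch sizes.
    let Ok(reports) = <&[TileReport; 255]>::try_from(reports) else { return };
    TILEZERO.with(|tz| tz.borrow_mut().collect_reports(reports))
}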

#[no_mangle]
pub extern "C" fn decide(action_ctx_ptr: *const ActionContext) -> *const PermitToken {
    TILEZERO.with(|tz| tz.borrow_mut().decide(action_ctx_ptr))
}

#[no_mangle]
pub extern "C" fn get_receipt(sequence: u64) -> *const WitnessReceipt {
    TILEZERO.with(|tz| tz.borrow().get_receipt(sequence))
}

v0 Implementation Strategy

Ship fast by layering:

Phase	Components	Skip Initially
v0.1	Structural coherence + witness receipt	Shift filter, evidence filter
v0.2	Add shift filter (normality scores)	CORE RL adaptation
v0.3	Add evidence filter (e-values)	Mixture e-values
v1.0	Full three-filter stack	-

Rust Deliverables

Crate	Description	Dependencies
`cognitum-gate-kernel`	`no_std` WASM kernel for worker tiles	`ruvector-mincut` (core algorithms)
`cognitum-gate-tilezero`	Native arbiter for TileZero	`ruvector-mincut`, `blake3`, `ed25519`
`mcp-gate`	MCP server for agent integration	`cognitum-gate-tilezero`

cognitum-gate/
├── cognitum-gate-kernel/      # no_std WASM
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs             # WASM exports
│       ├── shard.rs           # Compact graph shard
│       ├── evidence.rs        # Local e-accumulator
│       └── report.rs          # TileReport generation
│
├── cognitum-gate-tilezero/    # Native arbiter
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs
│       ├── merge.rs           # Report merging
│       ├── supergraph.rs      # Reduced global graph
│       ├── permit.rs          # Token issuance
│       └── receipt.rs         # Hash-chained log
│
└── mcp-gate/                  # MCP integration
    ├── Cargo.toml
    └── src/
        ├── lib.rs
        ├── tools.rs           # permit_action, get_receipt, replay_decision
        └── server.rs          # MCP server

MCP Gate Tools

/// MCP tool: Request permission for an action
#[mcp_tool]
pub async fn permit_action(
    action_id: String,
    action_type: String,
    context: serde_json::Value,
) -> Result<PermitResponse, McpError> {
    let ctx = ActionContext::from_json(&context)?;
    let token = TILEZERO.decide(&ctx);

    Ok(PermitResponse {
        decision: token.decision.to_string(),
        token: token.encode_base64(),
        witness_hash: hex::encode(&token.witness_hash),
        valid_until_ns: token.timestamp + token.ttl_ns,
    })
}

/// MCP tool: Get witness receipt for audit
#[mcp_tool]
pub async fn get_receipt(sequence: u64) -> Result<ReceiptResponse, McpError> {
    let receipt = TILEZERO.get_receipt(sequence)?;

    Ok(ReceiptResponse {
        sequence,
        decision: receipt.token.decision.to_string(),
        timestamp: receipt.token.timestamp,
        witness_summary: receipt.witness_summary.to_json(),
        previous_hash: hex::encode(&receipt.previous_hash),
        receipt_hash: hex::encode(&receipt.hash()),
    })
}

/// MCP tool: Replay decision for debugging/audit
#[mcp_tool]
pub async fn replay_decision(
    sequence: u64,
    verify_chain: bool,
) -> Result<ReplayResponse, McpError> {
    let receipt = TILEZERO.get_receipt(sequence)?;

    // Optionally verify hash chain
    if verify_chain {
        TILEZERO.verify_chain_to(sequence)?;
    }

    // Replay the decision with logged state
    let replayed = TILEZERO.replay(&receipt)?;

    Ok(ReplayResponse {
        original_decision: receipt.token.decision.to_string(),
        replayed_decision: replayed.decision.to_string(),
        match_confirmed: receipt.token.decision == replayed.decision,
        state_snapshot: replayed.state_snapshot.to_json(),
    })
}

The Practical Win

This gives Cognitum a clear job that buyers understand:

"We do not just detect issues, we prevent unsafe actions."
"We can prove why we blocked or allowed it."
"We stay calm until structure breaks."

The permit token as a capability means:

Agents cannot act without presenting a valid token
Tokens expire (TTL-bounded)
Every token is backed by a witness receipt
The entire chain is cryptographically verifiable
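
A minimal sketch of the hash-chain idea behind these properties, assuming a simplified receipt with three fields. `DefaultHasher` is used only as a dependency-free stand-in and is not cryptographic; the deliverables table lists `blake3` for the real log. All type and function names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Hash)]
pub struct Receipt {
    pub sequence: u64,
    pub decision: String,
    pub previous_hash: u64,
}

/// Stand-in digest. NOT cryptographic: a real log would use BLAKE3 or similar.
pub fn digest(receipt: &Receipt) -> u64 {
    let mut hasher = DefaultHasher::new();
    receipt.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct ReceiptLog {
    pub receipts: Vec<Receipt>,
}

impl ReceiptLog {
    /// Digest of the newest receipt, or 0 for an empty log.
    pub fn last_hash(&self) -> u64 {
        self.receipts.last().map(digest).unwrap_or(0)
    }

    /// Append a receipt that commits to the digest of its predecessor.
    pub fn append(&mut self, decision: &str) {
        let receipt = Receipt {
            sequence: self.receipts.len() as u64,
            decision: decision.to_string(),
            previous_hash: self.last_hash(),
        };
        self.receipts.push(receipt);
    }

    /// Index of the first receipt whose back-link does not match, if any.
    pub fn first_broken_link(&self) -> Option<usize> {
        let mut expected = 0u64;
        for (i, receipt) in self.receipts.iter().enumerate() {
            if receipt.previous_hash != expected {
                return Some(i);
            }
            expected = digest(receipt);
        }
        None
    }
}
```

Editing receipt `i` is detected at receipt `i + 1`, so the newest receipt is protected only if its digest is also recorded or signed outside the log.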

API Contract

Request: Permit Action

{
  "action_id": "cfg-push-7a3f",
  "action_type": "config_change",
  "target": {
    "device": "router-west-03",
    "path": "/network/interfaces/eth0"
  },
  "context": {
    "agent_id": "ops-agent-12",
    "session_id": "sess-abc123",
    "prior_actions": ["cfg-push-7a3e"],
    "urgency": "normal"
  }
}

Response: Permit

{
  "decision": "permit",
  "token": "eyJ0eXAiOiJQVCIsImFsZyI6IkVkMjU1MTkifQ...",
  "valid_until_ns": 1737158400000000000,
  "witness": {
    "structural": {
      "cut_value": 12.7,
      "partition": "stable",
      "critical_edges": 0
    },
    "predictive": {
      "set_size": 3,
      "coverage": 0.92
    },
    "evidential": {
      "e_value": 847.3,
      "verdict": "accept"
    }
  },
  "receipt_sequence": 1847392
}

Response: Defer

{
  "decision": "defer",
  "reason": "shift_detected",
  "detail": "Distribution shift pressure 0.73 exceeds threshold 0.5",
  "escalation": {
    "to": "human_operator",
    "context_url": "/receipts/1847393/context",
    "timeout_ns": 300000000000
  },
  "witness": {
    "structural": { "cut_value": 11.2, "partition": "stable" },
    "predictive": { "set_size": 18, "coverage": 0.91 },
    "evidential": { "e_value": 3.2, "verdict": "continue" }
  },
  "receipt_sequence": 1847393
}

Response: Deny

{
  "decision": "deny",
  "reason": "boundary_violation",
  "detail": "Action crosses fragile partition (cut=2.1 < min=5.0)",
  "witness": {
    "structural": {
      "cut_value": 2.1,
      "partition": "fragile",
      "critical_edges": 4,
      "boundary": ["edge-17", "edge-23", "edge-41", "edge-52"]
    },
    "predictive": { "set_size": 47, "coverage": 0.88 },
    "evidential": { "e_value": 0.004, "verdict": "reject" }
  },
  "receipt_sequence": 1847394
}

Migration Path

Phase M1: Shadow Mode

Run AVCG alongside the existing GateController: compare decisions, but do not enforce them.

impl HybridGate {
    pub fn evaluate(&mut self, action: &Action) -> GateResult {
        // Existing gate makes the decision
        let legacy_result = self.legacy_gate.evaluate(action);

        // AVCG runs in shadow, logs disagreements
        let avcg_result = self.avcg_gate.evaluate(action);

        if legacy_result.decision != avcg_result.decision {
            metrics::counter!("gate.shadow.disagreement").increment(1);
            log::info!(
                "Shadow disagreement: legacy={:?} avcg={:?} action={}",
                legacy_result.decision,
                avcg_result.decision,
                action.id
            );
        }

        legacy_result  // Legacy still decides
    }
}

Exit criteria: <1% disagreement rate over 7 days, zero false denies on known-safe actions.

Phase M2: Canary Enforcement

AVCG enforces for 5% of traffic; the legacy gate handles the rest.

impl CanaryGate {
    pub fn evaluate(&mut self, action: &Action) -> GateResult {
        let canary = self.canary_selector.select(action);

        if canary {
            metrics::counter!("gate.canary.avcg").increment(1);
            self.avcg_gate.evaluate(action)
        } else {
            self.legacy_gate.evaluate(action)
        }
    }
}

Exit criteria: No incidents attributed to AVCG decisions over 14 days.
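
The `canary_selector` used above is not specified. One possible sketch, assuming selection by hashing the action id so that a given action always lands in the same arm: the `CanarySelector` type and its `percent` field are hypothetical, and `DefaultHasher` is a convenience here (its output is not guaranteed stable across Rust releases, so a production selector would likely pin a specific hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical deterministic canary selector.
pub struct CanarySelector {
    /// Share of traffic routed to the new gate, in percent (0-100).
    pub percent: u64,
}

impl CanarySelector {
    /// Returns true when the action should be evaluated by the new gate.
    pub fn select(&self, action_id: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        action_id.hash(&mut hasher);
        // Bucket the hash into 0..100 and take the lowest `percent` buckets.
        hasher.finish() % 100 < self.percent
    }
}
```

Hash-based selection keeps retries of the same action on the same gate, which makes incidents easier to attribute during the canary window.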

Phase M3: Majority Rollout

AVCG handles 95% of traffic; the legacy gate remains available as a fallback.

Phase M4: Full Cutover

Legacy removed. AVCG is the gate.

Timeline:
M1 (Shadow)     → 2-4 weeks
M2 (Canary 5%)  → 2 weeks
M3 (Majority)   → 2 weeks
M4 (Full)       → 1 week
                  ─────────
Total           → 7-9 weeks

Observability

Metrics (Prometheus)

# Decision counters
gate_decisions_total{decision="permit|defer|deny", reason="..."}

# Latency histograms
gate_latency_seconds{phase="mincut|conformal|eprocess|decision|receipt"}

# Signal values
gate_cut_value{quantile="0.5|0.9|0.99"}
gate_prediction_set_size{quantile="0.5|0.9|0.99"}
gate_evalue{quantile="0.5|0.9|0.99"}

# Health
gate_healthy{component="mincut|conformal|eprocess"}
gate_failover_total{from="primary|standby_N"}

# Coverage tracking
gate_conformal_coverage_rate  # Should stay ≥ 0.85
gate_eprocess_power           # Evidence accumulation rate

Alerting Thresholds

Alert	Condition	Severity
`GateHighDenyRate`	deny_rate > 10% for 5m	Warning
`GateLatencyHigh`	p99 > 100ms for 5m	Warning
`GateCoverageDrift`	coverage < 0.80 for 15m	Critical
`GateUnhealthy`	any component unhealthy for 1m	Critical
`GateReceiptChainBroken`	hash verification fails	Critical

Debug Query: Why Was This Denied?

# Get full decision context
curl /api/gate/receipts/1847394/explain

# Response:
{
  "receipt_sequence": 1847394,
  "decision": "deny",
  "explanation": {
    "primary_reason": "structural",
    "structural": {
      "cut_value": 2.1,
      "threshold": 5.0,
      "failed": true,
      "boundary_edges": [
        {"id": "edge-17", "weight": 0.3, "endpoints": ["node-a", "node-b"]},
        ...
      ],
      "partition_context": "Device router-west-03 is in partition P7 which has been unstable since 14:32:07 UTC"
    },
    "predictive": { "failed": true, "detail": "Set size 47 > max_prediction_set 20" },
    "evidential": { "failed": true, "detail": "E-value 0.004 < τ_deny 0.01" }
  },
  "suggested_action": "Wait for partition P7 to stabilize or escalate to human approval",
  "similar_past_decisions": [1847201, 1846998, 1846754]
}

Open Questions Resolution

Q1: Graph model scope — immediate actions or multi-step lookahead?

Decision: Immediate actions for v0, optional 1-step lookahead for v1.

Rationale: Multi-step lookahead requires predicting action sequences, which adds latency and complexity. Start simple: evaluate the action being requested right now. If the current action is safe but would lead to an unsafe state, the next action will be denied when it's requested.

Q2: E-process null — "action safety" vs "policy consistency"?

Decision: Action safety as primary null, policy consistency as secondary.

Rationale:

Primary H₀: P(action leads to unsafe state) ≤ p₀
Secondary (optional): Current action consistent with established policy

Action safety is more fundamental. Policy consistency can be added as a separate e-process that runs in parallel.

Q3: Threshold learning — fixed or adaptive?

Decision: Fixed for v0, adaptive via meta-learning for v1.

Rationale: Fixed thresholds are easier to audit and explain. Once we have production data, we can train a meta-learner to adjust thresholds based on observed false positive/negative rates.

Q4: Human-in-loop — how are DEFER decisions presented?

Decision: Structured escalation with timeout and context link.

{
  "escalation": {
    "to": "human_operator",
    "channel": "slack|pagerduty|dashboard",
    "context_url": "/receipts/{seq}/context",
    "timeout_ns": 300000000000,
    "default_on_timeout": "deny"
  }
}

If the human does not respond within the timeout, the decision defaults to DENY (fail-safe).

See Hybrid Agent/Human Workflow section for full details.
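
The timeout rule can be sketched as a small pure function. This assumes timestamps in nanoseconds and that any recorded human response arrived before the deadline; `Verdict` and `resolve_escalation` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Approve,
    Deny,
}

/// Resolve a deferred decision as of `now_ns`.
/// Returns None while the escalation is still waiting for a human.
pub fn resolve_escalation(
    human_response: Option<Verdict>,
    escalated_at_ns: u64,
    timeout_ns: u64,
    now_ns: u64,
) -> Option<Verdict> {
    if let Some(verdict) = human_response {
        return Some(verdict);
    }
    // No response yet: fail safe to Deny once the timeout has elapsed.
    if now_ns.saturating_sub(escalated_at_ns) >= timeout_ns {
        Some(Verdict::Deny)
    } else {
        None
    }
}
```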

Q5: Adversarial robustness — adaptive adversaries?

Decision: Defense in depth + rate limiting + anomaly detection.

Mitigations:

Rate limit: Max N decisions per agent per minute
Anomaly: Flag agents with unusually high deny rates
Honeypots: Inject synthetic "trap" actions to detect probing
Rotation: Periodically rotate threshold parameters within safe bounds

Definition of Done

v0.1 Shippable Criteria

Criterion	Metric	Target
Structural filter works	Min-cut correctly identifies fragile partitions	100% on test suite
Receipts are signed	All receipts have valid Ed25519 signature	100%
Receipts are chained	Hash chain verifies for all receipts	100%
Latency acceptable	p99 gate decision time	< 50ms
No false denies	Known-safe actions are permitted	100% on test suite
Demo scenario runs	Network security control plane demo	End-to-end pass

v0.1 Minimum Viable Demo

Scenario: Agent requests config push to network device.

Agent calls permit_action with device target
Gate evaluates structural coherence (min-cut)
Gate returns PERMIT with signed receipt
Agent presents token to device
Device verifies token, accepts config

Success: An auditor can replay the decision from the receipt and get the same result.

Cost Model

Memory per Tile (WASM)

Component	Size	Notes
Graph shard	32 KB	~2000 edges at 16 bytes each
Feature window	8 KB	2048 f32 values
E-accumulator	64 B	f64 + metadata
Boundary edges	64 B	8 × EdgeId
Total per worker	~41 KB	Fits in 64KB WASM page
Total 255 workers	~10.2 MB
TileZero state	~1 MB	Supergraph + receipt log head
Total fabric	~12 MB

Network Bandwidth

Flow	Frequency	Size	Bandwidth
Worker → TileZero reports	1/tick (10ms)	64 B × 255	~1.6 MB/s
Receipt log append	per decision	~512 B	Variable
Gossip (distributed)	1/100ms	~1 KB × peers	~10 KB/s × P
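
The per-worker memory total and the report bandwidth follow directly from the table entries. A sketch of that arithmetic, with hypothetical constant and function names, assuming the component sizes listed above and a tick interval that divides 1000 ms evenly:

```rust
pub const GRAPH_SHARD_BYTES: usize = 32 * 1024;
pub const FEATURE_WINDOW_BYTES: usize = 2048 * core::mem::size_of::<f32>();
pub const E_ACCUMULATOR_BYTES: usize = 64;
pub const BOUNDARY_EDGES_BYTES: usize = 64;
pub const WASM_PAGE_BYTES: usize = 64 * 1024;

/// Sum of the per-worker components from the memory table.
pub const fn worker_bytes() -> usize {
    GRAPH_SHARD_BYTES + FEATURE_WINDOW_BYTES + E_ACCUMULATOR_BYTES + BOUNDARY_EDGES_BYTES
}

// Compile-time check that one worker fits in a single 64 KiB WASM page.
const _: () = assert!(worker_bytes() <= WASM_PAGE_BYTES);

/// Bytes per second flowing from workers to TileZero (tick_ms must be > 0).
pub const fn report_bandwidth_bytes_per_sec(report_bytes: u64, workers: u64, tick_ms: u64) -> u64 {
    report_bytes * workers * (1000 / tick_ms)
}
```

With these inputs a worker needs 41,088 bytes (about 40.1 KiB), and 255 reports of 64 B every 10 ms come to 1,632,000 bytes per second.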

Storage Growth

Item	Size	Retention	Growth
Receipt	~512 B	90 days	~44 GB/day @ 1000 decisions/s
E-process checkpoint	~128 B	Forever	~11 GB/day @ 1000 decisions/s
Audit log	~256 B	1 year	~22 GB/day @ 1000 decisions/s

90-day storage at 1000 decisions/s: ~4 TB receipts + ~1 TB checkpoints ≈ 5 TB (audit log excluded)
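
Daily growth is record size × decision rate × 86,400 seconds per day. A sketch of that calculation, with a hypothetical helper name:

```rust
pub const SECONDS_PER_DAY: u64 = 86_400;

/// Bytes written per day for a record of `record_bytes`
/// emitted once per decision at `decisions_per_sec`.
pub const fn bytes_per_day(record_bytes: u64, decisions_per_sec: u64) -> u64 {
    record_bytes * decisions_per_sec * SECONDS_PER_DAY
}

/// Bytes accumulated over a retention period of `days`.
pub const fn bytes_retained(record_bytes: u64, decisions_per_sec: u64, days: u64) -> u64 {
    bytes_per_day(record_bytes, decisions_per_sec) * days
}
```

For a 512 B receipt this gives 44,236,800 bytes per day (about 44 MB) at 1 decision/s and 44,236,800,000 bytes per day (about 44 GB) at 1000 decisions/s, so the sustained decision rate dominates any capacity plan.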

Hybrid Agent/Human Workflow

The coherence gate is designed for bounded autonomy, not full autonomy. Humans stay in the loop at critical decision points.

Design Philosophy

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   "Agents handle the routine. Humans handle the novel."                │
│                                                                         │
│   PERMIT  → Agent proceeds autonomously (low risk, high confidence)    │
│   DEFER   → Human decides (uncertain, boundary case, policy gap)       │
│   DENY    → Blocked automatically (structural violation, unsafe)       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The gate doesn't replace human judgment—it routes decisions to humans when judgment is needed.

Escalation Tiers

Tier	Trigger	Responder	SLA	Example
T0	PERMIT	None (automated)	0	Routine config within stable partition
T1	DEFER (shift)	On-call operator	5 min	New dependency pattern detected
T2	DEFER (boundary)	Senior engineer	15 min	Action crosses partition boundary
T3	DEFER (policy gap)	Policy team	1 hour	No precedent for this action type
T4	DENY override request	Security + Management	4 hours	Agent requesting exception to denial

Human Decision Interface

When a DEFER is escalated, humans see:

┌─────────────────────────────────────────────────────────────────────────┐
│  DECISION REQUIRED                                    Timeout: 4:32    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Agent: ops-agent-12                                                   │
│  Action: Push config to router-west-03 /network/interfaces/eth0        │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  WHY DEFERRED                                                    │   │
│  │                                                                  │   │
│  │  • Shift detected: New dependency pattern (0.73 > 0.5 threshold)│   │
│  │  • This device was added to the graph 2 hours ago               │   │
│  │  • Similar actions on established devices: 847 permits, 0 denies│   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  CONTEXT                                                         │   │
│  │                                                                  │   │
│  │  Structural coherence: 11.2 (healthy)                           │   │
│  │  Prediction set size: 18 outcomes (moderate uncertainty)        │   │
│  │  Evidence accumulator: 3.2 (inconclusive)                       │   │
│  │                                                                  │   │
│  │  [View full witness receipt] [View similar past decisions]      │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────────────────┐   │
│  │   APPROVE     │  │    DENY       │  │  ESCALATE TO T3           │   │
│  │   (proceed)   │  │   (block)     │  │  (need policy guidance)   │   │
│  └───────────────┘  └───────────────┘  └───────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Human Decision Recording

Human decisions become part of the audit trail:

pub struct HumanDecision {
    /// Original deferred receipt
    pub deferred_receipt_seq: u64,

    /// Human's decision
    pub decision: HumanVerdict,

    /// Human identity (authenticated)
    pub decider_id: AuthenticatedUserId,

    /// Reasoning (required for audit)
    pub rationale: String,

    /// Timestamp
    pub decided_at: u64,

    /// Signature (human signs their decision)
    pub signature: Ed25519Signature,
}

pub enum HumanVerdict {
    /// Approve the action
    Approve {
        /// Add to training data for future automation
        learn_from_this: bool,
    },
    /// Deny the action
    Deny {
        /// Reason for denial
        reason: String,
    },
    /// Escalate to higher tier
    Escalate {
        to_tier: EscalationTier,
        reason: String,
    },
    /// Request more information
    NeedMoreInfo {
        questions: Vec<String>,
    },
}

Override Protocol

Humans can override DENY decisions, but with friction and accountability:

pub struct DenyOverride {
    /// Which denial is being overridden
    pub denied_receipt_seq: u64,

    /// Who is overriding (must be T4 authority)
    pub overrider_id: AuthenticatedUserId,

    /// Second approver required
    pub second_approver_id: AuthenticatedUserId,

    /// Business justification (required, min 50 chars)
    pub justification: String,

    /// Time-bounded: override expires
    pub valid_until: u64,

    /// Scope-limited: only this specific action
    pub action_id: ActionId,

    /// Both signatures required
    pub overrider_signature: Ed25519Signature,
    pub approver_signature: Ed25519Signature,
}

Override constraints:

Two humans required (four-eyes principle)
Must provide written justification
Time-limited (max 24 hours)
Scope-limited (only the specific action)
All overrides flagged for security review
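
The mechanical constraints above can be checked before any signature verification. The sketch below assumes a simplified request with string identifiers and nanosecond timestamps; `OverrideRequest`, `OverrideError`, and `validate_override` are hypothetical, and signature checks and the security-review flag are deliberately left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum OverrideError {
    SameApprover,
    JustificationTooShort,
    ValidityTooLong,
    WrongAction,
}

#[derive(Debug, Clone, Copy)]
pub struct OverrideRequest<'a> {
    pub overrider_id: &'a str,
    pub second_approver_id: &'a str,
    pub justification: &'a str,
    pub requested_at_ns: u64,
    pub valid_until_ns: u64,
    pub action_id: &'a str,
}

const MAX_VALIDITY_NS: u64 = 24 * 60 * 60 * 1_000_000_000; // 24 hours
const MIN_JUSTIFICATION_CHARS: usize = 50;

/// Check the four-eyes, justification, time-limit, and scope constraints.
pub fn validate_override(
    req: &OverrideRequest<'_>,
    denied_action_id: &str,
) -> Result<(), OverrideError> {
    if req.overrider_id == req.second_approver_id {
        return Err(OverrideError::SameApprover);
    }
    if req.justification.chars().count() < MIN_JUSTIFICATION_CHARS {
        return Err(OverrideError::JustificationTooShort);
    }
    if req.valid_until_ns.saturating_sub(req.requested_at_ns) > MAX_VALIDITY_NS {
        return Err(OverrideError::ValidityTooLong);
    }
    if req.action_id != denied_action_id {
        return Err(OverrideError::WrongAction);
    }
    Ok(())
}
```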

Learning from Human Decisions

Human decisions improve the gate over time:

/// When human approves a DEFER, optionally learn from it
pub fn learn_from_approval(
    deferred: &WitnessReceipt,
    human: &HumanDecision,
) {
    if let HumanVerdict::Approve { learn_from_this: true } = &human.decision {
        // Add to calibration data
        conformal_calibrator.add_observation(
            deferred.context.clone(),
            Outcome::Safe,  // Human judged it safe
        );

        // Update e-process null hypothesis
        eprocess_trainer.add_positive_example(
            deferred.action.clone(),
        );

        // Adjust threshold candidates (for meta-learning in v1)
        threshold_learner.record_human_permit(
            deferred.signals.clone(),
        );
    }
}

Workload Distribution Target

The goal is minimal human burden while maintaining safety:

Decision	Target Rate	Human Workload
PERMIT	90-95%	Zero
DEFER	4-9%	Human decides
DENY	1-2%	Zero (unless override requested)

If DEFER rate exceeds 10%, the gate is too conservative—tune thresholds. If DENY rate exceeds 5%, something is wrong—investigate root cause.
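
These two rules of thumb can be turned into a simple health check over decision counts. The sketch below uses the 10% and 5% limits stated above; the `RateFinding` enum and `check_rates` function are hypothetical, and checking the DENY rule first is an assumption about priority.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RateFinding {
    Healthy,
    /// DEFER rate above 10%: thresholds are likely too conservative.
    TooConservative,
    /// DENY rate above 5%: investigate the root cause.
    Investigate,
}

/// Classify a window of decision counts against the target rates.
pub fn check_rates(permits: u64, defers: u64, denies: u64) -> RateFinding {
    let total = permits + defers + denies;
    if total == 0 {
        return RateFinding::Healthy;
    }
    // Integer cross-multiplication avoids float rounding:
    // denies / total > 5%  <=>  denies * 100 > total * 5.
    if denies * 100 > total * 5 {
        RateFinding::Investigate
    } else if defers * 100 > total * 10 {
        RateFinding::TooConservative
    } else {
        RateFinding::Healthy
    }
}
```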

Integration Channels

Channel	Use Case	Response Format
Slack	On-call escalation	Interactive buttons
PagerDuty	Critical/timed decisions	Acknowledge + decision API
Dashboard	Batch review	Web UI with full context
CLI	Developer/ops workflow	`ruvector gate approve <seq>`
API	Programmatic integration	REST/gRPC

Audit Trail for Human Decisions

Every human decision is:

Authenticated: Decider identity verified via SSO/MFA
Signed: Human signs their decision with personal key
Chained: Added to the same receipt chain as gate decisions
Timestamped: Immutable record of when decision was made
Justified: Rationale captured for later review

Receipt Chain:
  [1847392] PERMIT (automated) → agent executed
  [1847393] DEFER (automated) → escalated to human
  [1847393-H] APPROVE (human: alice@corp) → agent executed
  [1847394] DENY (automated) → blocked
  [1847394-O] OVERRIDE (humans: bob@corp + carol@corp) → exception granted

Consequences

Benefits

Formal Guarantees: Type I error control at any stopping time
Distribution Shift Robustness: Conformal prediction adapts without retraining
Computational Efficiency: O(n^{o(1)}) update time from subpolynomial min-cut
Audit Trail: Every decision has cryptographic witness receipt
Defense in Depth: Three independent signals must concur for permit
Cryptographic Integrity: All receipts signed with Ed25519
Attack Resistance: E-value bounds, replay guards, race condition prevention
Distributed Scalability: Hierarchical coordination with regional and global tiers
Fault Tolerance: Automatic failover with safe defaults

Risks & Mitigations

Risk	Mitigation
Computational overhead	Lazy evaluation; batch updates; SIMD optimization
E-value power under uncertainty	Mixture e-values for robustness
Graph model mismatch	Learn graph structure from trajectories
Threshold tuning	Adaptive thresholds via meta-learning
Receipt forgery	Mandatory Ed25519 signing; chain linkage
E-value manipulation	Input bounds; clamping with security logging
Race conditions	Atomic decisions with sequence numbers
Replay attacks	Bloom filter + sliding window guard
Network partitions	Hierarchical decisions; local autonomy
Byzantine nodes	Consensus-based aggregation; safe defaults

Complexity Analysis

Operation	Current	With AVCG	Distributed AVCG
Edge update	O(n^{o(1)})	O(n^{o(1)})	O(n^{o(1)}) + network
Gate evaluation	O(1)	O(k) prediction set	O(k) + O(R) regional
Witness generation	O(m)	O(m) amortized	O(m) + signing
Certificate verification	O(n)	O(n + log T)	O(n + log T) + sig verify
Receipt signing	N/A	O(1) Ed25519	O(1) + HSM latency
Distributed consensus	N/A	N/A	O(log N) Raft
E-process aggregation	N/A	O(1)	O(P) peers

Where: k = prediction set size, T = history length, R = regional peers, N = cluster size, P = peer count

References

Dynamic Min-Cut

El-Hayek, Henzinger, Li. "Deterministic and Exact Fully-dynamic Minimum Cut of Superpolylogarithmic Size in Subpolynomial Time." arXiv:2512.13105, December 2025.
Jin, Sun, Thorup. "Fully Dynamic Exact Minimum Cut in Subpolynomial Time." SODA 2024.

Online Conformal Prediction

"Online Conformal Inference with Retrospective Adjustment for Faster Adaptation to Distribution Shift." arXiv:2511.04275, November 2025.
"Distribution-informed Online Conformal Prediction (COP)." December 2025.
"CORE: Conformal Regression under Distribution Shift via Reinforcement Learning." October 2025.

E-Values and E-Processes

Ramdas, Wang. "Hypothesis Testing with E-values." Foundations and Trends in Statistics, 2025.
ICML 2025 Tutorial: "Game-theoretic Statistics and Sequential Anytime-Valid Inference."
"Sequential Randomization Tests Using e-values." arXiv:2512.04366, December 2025.

AI Agent Control

"Bounded Autonomy: A Pragmatic Response to Concerns About Fully Autonomous AI Agents." XMPRO, 2025.
"Customizable Runtime Enforcement for Safe and Reliable LLM Agents." arXiv:2503.18666, 2025.

Testing Strategy

Unit Tests

Component	Coverage Target	Key Test Cases
`CompactGraph`	95%	Add/remove edges, weight updates, min-cut estimation
`EvidenceAccumulator`	95%	Bounds checking, update rules, stopping decisions
`TileReport`	90%	Serialization roundtrip, checksum verification
`PermitToken`	95%	Signing, verification, TTL expiration
`ReceiptLog`	95%	Hash chain integrity, tamper detection
`ThreeFilterDecision`	100%	All Permit/Defer/Deny paths

Integration Tests

Scenario	Description	Expected Outcome
Happy path	Stable graph, safe action	PERMIT with valid receipt
Boundary crossing	Action crosses fragile partition	DENY with boundary edges
Shift detection	New dependency pattern	DEFER with escalation
Human approval	DEFER → human approves	Token issued, learning recorded
Replay verification	Replay historical decision	Deterministic match
Hash chain audit	Verify 1000 receipts	All hashes valid

Property-Based Tests

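#[proptest]
fn e_value_always_positive(e1: f64, e2: f64) {
    // Arbitrary f64 includes 0.0, NaN, and infinities, which would fail the
    // property spuriously; the clamp range is an assumed stand-in for the gate's bounds.
    prop_assume!(e1.is_finite() && e2.is_finite());
    let (e1, e2) = (e1.abs().clamp(1e-6, 1e6), e2.abs().clamp(1e-6, 1e6));
    let result = combine_evalues(e1, e2);
    prop_assert!(result > 0.0);
}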

#[proptest]
fn receipt_hash_deterministic(receipt: WitnessReceipt) {
    let hash1 = receipt.compute_hash();
    let hash2 = receipt.compute_hash();
    prop_assert_eq!(hash1, hash2);
}

#[proptest]
fn serialization_roundtrip(report: TileReport) {
    let bytes = report.serialize();
    let restored = TileReport::deserialize(&bytes);
    prop_assert_eq!(report, restored);
}

Security Tests

Test	Attack Vector	Expected Behavior
Forged signature	Invalid Ed25519 sig	Verification fails
Replay attack	Duplicate action	ReplayGuard blocks
E-value overflow	Extreme likelihood ratio	Clamped to bounds
Race condition	Concurrent evaluations	Sequence numbers ordered
Tampered receipt	Modified hash	Chain verification fails

Benchmark Tests

Metric	Target	Measurement
Gate decision latency	p99 < 50ms	`criterion` benchmark
Receipt signing	< 5ms	`criterion` benchmark
255-tile report merge	< 10ms	`criterion` benchmark
Hash chain verification (1000)	< 100ms	`criterion` benchmark
Memory per worker tile	< 64KB	Static analysis

Configuration Format

TOML Configuration

# gate-config.toml

[gate]
# Gate identification
gate_id = "gate-west-01"
version = "0.1.0"

[thresholds]
# E-process thresholds
tau_deny = 0.01          # E-value below this → DENY
tau_permit = 100.0       # E-value above this → PERMIT

# Structural thresholds
min_cut = 5.0            # Cut value below this → DENY
max_shift = 0.5          # Shift pressure above this → DEFER

# Conformal thresholds
max_prediction_set = 20  # Set size above this → DEFER
coverage_target = 0.90   # Target coverage rate

[timing]
# Permit token TTL
permit_ttl_seconds = 300

# Decision timeout
decision_timeout_ms = 50

# Tick interval for worker tiles
tick_interval_ms = 10

[security]
# Key rotation
signing_key_rotation_days = 30
threshold_key_rotation_days = 90

# Replay prevention
replay_window_seconds = 3600
bloom_filter_size = 1000000

[distributed]
# Coordination settings
regional_peers = ["gate-west-02", "gate-west-03"]
global_coordinator = "coordinator-global-01"
raft_heartbeat_ms = 100
consensus_timeout_ms = 1000

[escalation]
# Human-in-loop settings
default_timeout_seconds = 300
default_on_timeout = "deny"

[escalation.channels.slack]
webhook_url = "${SLACK_WEBHOOK_URL}"
channel = "#gate-escalations"

[escalation.channels.pagerduty]
api_key = "${PAGERDUTY_API_KEY}"
service_id = "gate-critical"

[observability]
# Metrics endpoint
metrics_port = 9090
metrics_path = "/metrics"

# Tracing
tracing_enabled = true
tracing_sample_rate = 0.1
jaeger_endpoint = "http://jaeger:14268/api/traces"

[storage]
# Receipt storage
receipt_backend = "postgresql"
receipt_retention_days = 90
checkpoint_interval = 100

[storage.postgresql]
host = "${DB_HOST}"
port = 5432
database = "gate_receipts"
username = "${DB_USER}"
password = "${DB_PASSWORD}"
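The threshold comments in the `[thresholds]` table can be read as a small decision function. The following is a minimal sketch under stated assumptions: the `Thresholds`, `Signals`, and `decide` names are hypothetical, the precedence (Deny checks first, then Defer, then Permit) is assumed rather than taken from this section, and treating a NaN signal as Deny is also an assumption.

```rust
/// Mirrors the `[thresholds]` table in the sample config (hypothetical type).
#[derive(Debug, Clone)]
pub struct Thresholds {
    pub tau_deny: f64,
    pub tau_permit: f64,
    pub min_cut: f64,
    pub max_shift: f64,
    pub max_prediction_set: usize,
}

impl Default for Thresholds {
    fn default() -> Self {
        // Values copied from the sample gate-config.toml.
        Self {
            tau_deny: 0.01,
            tau_permit: 100.0,
            min_cut: 5.0,
            max_shift: 0.5,
            max_prediction_set: 20,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

/// Hypothetical snapshot of the three filter outputs for one action.
pub struct Signals {
    pub e_value: f64,
    pub cut_value: f64,
    pub shift_pressure: f64,
    pub prediction_set_size: usize,
}

/// Assumed precedence: Deny, then Defer, then Permit; anything else defers.
pub fn decide(t: &Thresholds, s: &Signals) -> GateDecision {
    // Assumption: a NaN in any signal is treated as unsafe.
    if s.e_value.is_nan() || s.cut_value.is_nan() || s.shift_pressure.is_nan() {
        return GateDecision::Deny;
    }
    if s.e_value < t.tau_deny || s.cut_value < t.min_cut {
        return GateDecision::Deny;
    }
    if s.shift_pressure > t.max_shift || s.prediction_set_size > t.max_prediction_set {
        return GateDecision::Defer;
    }
    if s.e_value > t.tau_permit {
        return GateDecision::Permit;
    }
    // Evidence is between the two e-value thresholds: not enough to permit.
    GateDecision::Defer
}
```

With the default thresholds, an e-value of 150 on a graph with cut value 8.0 and low shift pressure would be permitted, while the same e-value with cut value 2.0 would be denied.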

Environment Variables

# Required
export GATE_SIGNING_KEY_PATH=/etc/gate/keys/signing.key
export GATE_CONFIG_PATH=/etc/gate/config.toml

# Optional overrides
export GATE_TAU_DENY=0.01
export GATE_TAU_PERMIT=100.0
export GATE_MIN_CUT=5.0
export GATE_MAX_SHIFT=0.5
export GATE_PERMIT_TTL_SECONDS=300

# Secrets (never in config file)
export SLACK_WEBHOOK_URL=https://hooks.slack.com/...
export PAGERDUTY_API_KEY=...
export DB_PASSWORD=...
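One way the optional overrides could be applied on top of the file values is sketched below. The helper names `override_f64` and `env_f64` are hypothetical, and the sketch assumes that an environment variable, when set, takes precedence over the config file and that a malformed or non-finite value should be rejected rather than silently ignored.

```rust
use std::env;

/// Apply an optional override string to a default value.
/// Returns an error for values that do not parse as a finite number.
pub fn override_f64(name: &str, raw: Option<&str>, default: f64) -> Result<f64, String> {
    match raw {
        None => Ok(default),
        Some(s) => match s.trim().parse::<f64>() {
            // `parse` accepts "inf" and "NaN", so check finiteness explicitly.
            Ok(v) if v.is_finite() => Ok(v),
            _ => Err(format!("{name} must be a finite number, got {s:?}")),
        },
    }
}

/// Read `name` from the process environment and apply it over `default`.
/// An unset (or non-Unicode) variable leaves the default in place.
pub fn env_f64(name: &str, default: f64) -> Result<f64, String> {
    let raw = env::var(name).ok();
    override_f64(name, raw.as_deref(), default)
}
```

For example, `env_f64("GATE_TAU_DENY", 0.01)` would return the file value `0.01` when the variable is unset and the exported value otherwise.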

Error Recovery Procedures

Gate Decision Failures

Failure	Detection	Recovery	Fallback
Min-cut timeout	Decision exceeds 50ms	Log, retry once	DEFER
E-process NaN	`is_nan()` check	Reset accumulator	DENY
Signing failure	Ed25519 error	Rotate to backup key	DENY (unsigned)
Receipt log full	Capacity check	Archive, start new segment	DENY
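The Fallback column of the table above can be sketched as a small wrapper around a single gate evaluation. This is an illustration only: `GateFailure`, `fallback_for`, and `evaluate_with_fallback` are hypothetical names, and the recovery actions themselves (resetting the accumulator, rotating keys, archiving the log) are not modeled. Only the min-cut timeout is retried, once, as the table describes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Permit,
    Defer,
    Deny,
}

/// Failure kinds from the table (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum GateFailure {
    MinCutTimeout,
    EProcessNan,
    SigningFailure,
    ReceiptLogFull,
}

/// Fallback column of the table: a timeout defers, everything else denies.
pub fn fallback_for(failure: &GateFailure) -> GateDecision {
    match failure {
        GateFailure::MinCutTimeout => GateDecision::Defer,
        GateFailure::EProcessNan
        | GateFailure::SigningFailure
        | GateFailure::ReceiptLogFull => GateDecision::Deny,
    }
}

/// Run one evaluation; retry once on a min-cut timeout, then fall back.
pub fn evaluate_with_fallback<F>(mut attempt: F) -> GateDecision
where
    F: FnMut() -> Result<GateDecision, GateFailure>,
{
    match attempt() {
        Ok(decision) => decision,
        Err(GateFailure::MinCutTimeout) => match attempt() {
            Ok(decision) => decision,
            // Second failure of any kind: use that failure's fallback.
            Err(failure) => fallback_for(&failure),
        },
        Err(failure) => fallback_for(&failure),
    }
}
```

Note that no failure path ever yields Permit: the gate fails closed.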

Distributed Failures

impl FaultRecovery {
    pub async fn handle_regional_failure(&mut self, error: RegionalError) -> GateResult {
        match error {
            RegionalError::LeaderUnavailable => {
                // Wait for new leader election
                tokio::time::sleep(Duration::from_millis(200)).await;
                self.retry_with_new_leader().await
            }

            RegionalError::NetworkPartition => {
                // Fall back to local-only decision
                log::warn!("Network partition detected, using local gate");
                self.local_gate.evaluate_standalone()
            }

            RegionalError::ConsensusTimeout => {
                // Use conservative decision: defer to a human rather than guess
                GateResult {
                    decision: GateDecision::Defer,
                    reason: "Consensus timeout - escalating to human".into(),
                    ..Default::default()
                }
            }
        }
    }
}

Receipt Chain Recovery

impl ReceiptLog {
    /// Recover from corrupted receipt chain
    pub fn recover_chain(&mut self, last_known_good: u64) -> Result<(), RecoveryError> {
        // 1. Truncate corrupted entries
        self.truncate_after(last_known_good)?;

        // 2. Rebuild from checkpoint
        let checkpoint = self.find_nearest_checkpoint(last_known_good)?;
        self.rebuild_from_checkpoint(checkpoint)?;

        // 3. Mark recovery in audit log
        self.append_recovery_marker(last_known_good)?;

        // 4. Alert operators
        alert::send("Receipt chain recovery performed", Severity::Warning);

        Ok(())
    }
}
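`recover_chain` takes a `last_known_good` sequence number. One way to find it is to walk the chain from the start and stop at the first receipt whose links no longer verify. The sketch below assumes a hypothetical `Receipt` layout and uses the standard library's `DefaultHasher` purely to stay dependency-free; it is not a cryptographic hash, and a real receipt log would need one for tamper detection.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical receipt: each entry commits to the previous entry's hash.
#[derive(Debug, Clone)]
pub struct Receipt {
    pub sequence: u64,
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in hash over (sequence, payload, prev_hash). NOT cryptographic.
fn link_hash(sequence: u64, payload: &str, prev_hash: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    sequence.hash(&mut hasher);
    payload.hash(&mut hasher);
    prev_hash.hash(&mut hasher);
    hasher.finish()
}

/// Append a receipt, linking it to the current tail (0 for the first entry).
pub fn append(chain: &mut Vec<Receipt>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |r| r.hash);
    let sequence = chain.len() as u64;
    let hash = link_hash(sequence, payload, prev_hash);
    chain.push(Receipt { sequence, payload: payload.to_string(), prev_hash, hash });
}

/// Sequence number of the last receipt that verifies, scanning from the start.
/// Returns `None` if the chain is empty or the first receipt is already bad.
pub fn last_known_good(chain: &[Receipt]) -> Option<u64> {
    let mut expected_prev = 0u64;
    let mut good = None;
    for (i, r) in chain.iter().enumerate() {
        let ok = r.sequence == i as u64
            && r.prev_hash == expected_prev
            && r.hash == link_hash(r.sequence, &r.payload, r.prev_hash);
        if !ok {
            break;
        }
        good = Some(r.sequence);
        expected_prev = r.hash;
    }
    good
}
```

If the payload of receipt 2 in a four-entry chain is modified, `last_known_good` reports receipt 1, which is the value that would then be handed to `recover_chain`.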

Worker Tile Recovery

Failure	Detection	Recovery Time	Data Loss
Single tile crash	Heartbeat timeout	< 100ms	Last tick
Tile memory corruption	Checksum mismatch	< 500ms	Current shard
TileZero crash	Primary unavailable	< 1s	None (standbys)
Full fabric restart	All tiles down	< 5s	Rebuild from checkpoint

Runbook: Gate Unresponsive

# 1. Check gate health
curl http://gate:9090/health

# 2. If unhealthy, check logs
kubectl logs -l app=gate --tail=100

# 3. Check for resource exhaustion
kubectl top pods -l app=gate

# 4. If memory high, trigger GC
curl -X POST http://gate:9090/admin/gc

# 5. If still unresponsive, rolling restart
kubectl rollout restart deployment/gate

# 6. Verify recovery
curl http://gate:9090/health
curl http://gate:9090/metrics | grep gate_healthy

Appendix: Mathematical Foundations

E-Value Composition

For independent e-values e₁, e₂:

e_combined = e₁ · e₂
E[e_combined] = E[e₁] · E[e₂] ≤ 1 · 1 = 1

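This enables optional continuation: as long as each new e-value is valid given what came before, multiplying it in keeps the combined value a valid e-value, so evidence accumulates across sessions without any correction for repeated looks.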
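The product rule above can overflow or underflow when likelihood ratios are extreme, which is the case the "E-value overflow" security test targets. A minimal sketch of multiplying in log space with clamping follows. The function name `combine_clamped` and the bounds `LOG_E_MIN` and `LOG_E_MAX` are illustrative assumptions, not values taken from this ADR.

```rust
/// Illustrative clamp bounds on the natural log of an e-value (assumed).
const LOG_E_MIN: f64 = -50.0;
const LOG_E_MAX: f64 = 50.0;

/// Multiply two e-values as a sum of logs, clamping each term and the result.
/// Returns `None` for inputs that cannot be e-values (NaN or negative).
pub fn combine_clamped(e1: f64, e2: f64) -> Option<f64> {
    if e1.is_nan() || e2.is_nan() || e1 < 0.0 || e2 < 0.0 {
        return None;
    }
    // ln(0) is -inf and ln(inf) is +inf; clamping each term first avoids
    // the NaN that would result from adding -inf and +inf.
    let log_sum = e1.ln().clamp(LOG_E_MIN, LOG_E_MAX) + e2.ln().clamp(LOG_E_MIN, LOG_E_MAX);
    Some(log_sum.clamp(LOG_E_MIN, LOG_E_MAX).exp())
}
```

Capping from above can only lower the expectation, so it preserves E[e] ≤ 1. The floor is a numerical convenience that technically inflates the expectation by a very small amount, which is worth keeping in mind when choosing the lower bound.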

Conformal Coverage

Under exchangeability or bounded distribution shift:

P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α - δ_t

Where δ_t → 0 as the algorithm adapts via retrospective adjustment.

Anytime-Valid Stopping

For any stopping time τ (possibly data-dependent):

P_H₀(E_τ ≥ 1/α) ≤ α

This follows from Ville's inequality, since under H₀ the process E_t is a nonnegative supermartingale with E[E_0] = 1.
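The reasoning behind the stopping bound can be written out in three steps. The filtration notation F_t is introduced here for the derivation, and the numerical example uses α = 0.01 only because 1/α = 100 happens to equal the sample `tau_permit` value; whether the configuration is meant to be calibrated this way is an assumption.

```latex
\begin{aligned}
&\text{Assumptions under } H_0:\quad E_t \ge 0,\qquad
  \mathbb{E}_{H_0}\!\left[E_t \mid \mathcal{F}_{t-1}\right] \le E_{t-1},\qquad
  \mathbb{E}_{H_0}[E_0] = 1. \\[4pt]
&\text{Ville's inequality:}\quad
  P_{H_0}\!\left(\sup_{t \ge 0} E_t \ge \tfrac{1}{\alpha}\right)
  \le \alpha\,\mathbb{E}_{H_0}[E_0] = \alpha. \\[4pt]
&\text{For any stopping time } \tau:\quad
  \left\{E_\tau \ge \tfrac{1}{\alpha}\right\} \subseteq
  \left\{\sup_{t \ge 0} E_t \ge \tfrac{1}{\alpha}\right\}
  \;\Longrightarrow\;
  P_{H_0}\!\left(E_\tau \ge \tfrac{1}{\alpha}\right) \le \alpha. \\[4pt]
&\text{Example:}\quad \alpha = 0.01 \;\Longrightarrow\; \tfrac{1}{\alpha} = 100.
\end{aligned}
```

Because the bound is on the supremum over all times, it does not matter when or why monitoring stops, which is the property the "anytime-valid" name refers to.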
