# ADR-001: Anytime-Valid Coherence Gate
**Status**: Proposed
**Date**: 2026-01-17
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board
**SDK**: Claude-Flow
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-01-17 | ruv.io | Initial draft with three-filter architecture |
| 0.2 | 2026-01-17 | ruv.io | Added security hardening, performance optimization |
| 0.3 | 2026-01-17 | ruv.io | Added 256-tile WASM fabric mapping |
| 0.4 | 2026-01-17 | ruv.io | Added API contract, migration, observability |
| 0.5 | 2026-01-17 | ruv.io | Added hybrid agent/human workflow |
| 0.6 | 2026-01-17 | ruv.io | Added testing strategy, config format, error recovery |
## Plain Language Summary
**What is it?**
An Anytime-Valid Coherence Gate is a small control loop that decides, at any moment:
> "Is it safe to act right now, or should we pause or escalate?"
It does not try to be smart. It tries to be **safe**, **calm**, and **correct** about permission.
**Why "anytime-valid"?**
Because you can stop the computation at any time and still trust the decision.
Like a smoke detector:
- It can keep listening forever
- The moment it has enough evidence, it triggers
- If you stop listening early, whatever it already concluded is still valid
You are not waiting for a model to finish thinking. You are continuously monitoring stability.
**Why "coherence"?**
Coherence means: does the system's current state agree with itself?
In RuVector, coherence is measured from structure:
- RuVector holds relationships as vectors plus a graph
- Min-cut and boundary signals tell you when the graph is becoming fragile or splitting into conflicting regions
- If the system is splitting, you do not let it take big actions
**What it outputs:**
| Decision | Meaning |
|----------|---------|
| **Permit** | Stable enough, proceed |
| **Defer** | Uncertain, escalate to a stronger model or human |
| **Deny** | Unstable or policy-violating, block the action |
Every decision returns a short "receipt" explaining why.
**A concrete example:**
An agent wants to push a config change to a network device.
- If the dependency graph is stable and similar changes worked before → **Permit**
- If signals are weird (new dependencies, new actors, drift) → **Defer** and ask for confirmation
- If the change crosses a fragile boundary (touches a partition already unstable) → **Deny**
**Why it matters:**
It turns autonomy into something enterprises can trust because:
- Actions are bounded
- Uncertainty is handled explicitly
- You get an audit trail
*"Attention becomes a permission system, not a popularity contest"* — applied to whole-system actions instead of token attention.
---
## Context
The RuVector ecosystem requires a principled mechanism for controlling autonomous agent actions with:
- **Formal safety guarantees** under distribution shift
- **Computational efficiency** suitable for real-time enforcement
- **Auditable decision trails** with cryptographic receipts
Current approaches (threshold classifiers, rule-based systems, periodic audits) each lack at least one of these properties. This ADR proposes the **Anytime-Valid Coherence Gate (AVCG)**, which combines three algorithms to turn coherence measurement into a deterministic control loop.
## Decision
We will implement an Anytime-Valid Coherence Gate that integrates three recent algorithmic components:
### 1. Dynamic Min-Cut with Witness Partitions
**Source**: El-Hayek, Henzinger, Li (arXiv:2512.13105, December 2025)
**Key Innovation**: Exact deterministic n^{o(1)} update time for cuts up to 2^{Θ(log^{3/4-c}n)}
**Integration**:
- Extends existing `SubpolynomialMinCut` in `ruvector-mincut/src/subpolynomial/mod.rs`
- Leverages existing `WitnessTree` for explicit partition certificates
- Uses deterministic `LocalKCut` for local cut verification
**Role in Gate**: Provides the **structural coherence signal** - identifies minimal intervention points in the agent action graph with explicit witness partitions showing which actions form the critical boundary to unsafe states.
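To make the idea of a witness partition concrete, here is a minimal sketch of how a gate might test whether an action's edge crosses a given cut and recompute the cut value from the witness. It assumes the partition is already known (it does not compute a minimum cut), and the names `WitnessPartition`, `crosses`, and `cut_value` are hypothetical, not the existing `WitnessTree` API.

```rust
use std::collections::HashSet;

/// Hypothetical vertex identifier; the real crate defines its own `VertexId`.
pub type VertexId = u64;

/// One side of a two-way partition; every other vertex is on the far side.
pub struct WitnessPartition {
    pub source_side: HashSet<VertexId>,
}

impl WitnessPartition {
    /// An edge crosses the cut when exactly one endpoint is on the source side.
    pub fn crosses(&self, u: VertexId, v: VertexId) -> bool {
        self.source_side.contains(&u) != self.source_side.contains(&v)
    }

    /// Total weight of crossing edges. A verifier can compare this with the
    /// claimed cut value to check the witness without rerunning the algorithm.
    pub fn cut_value(&self, edges: &[(VertexId, VertexId, f64)]) -> f64 {
        edges
            .iter()
            .filter(|(u, v, _)| self.crosses(*u, *v))
            .map(|(_, _, w)| *w)
            .sum()
    }
}
```

The point of an explicit witness is that checking it is cheap: one pass over the edges, independent of how the cut was found.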
### 2. Online Conformal Prediction with Shift-Awareness
**Sources**:
- Retrospective Adjustment (arXiv:2511.04275, November 2025)
- Conformal Optimistic Prediction (COP) (December 2025)
- CORE: RL-based Conformal Regression (October 2025)
**Key Innovation**: Distribution-free coverage guarantees that adapt to arbitrary distribution shift with faster recalibration via retrospective adjustment.
**Integration**:
- New module: `ruvector-mincut/src/conformal/` for prediction sets
- Interfaces with existing `GatePolicy` thresholds
- Wraps action outcome predictions with calibrated uncertainty
**Role in Gate**: Provides the **predictive uncertainty signal** - quantifies confidence in action outcomes, triggering DEFER when prediction sets are too large.
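As a rough illustration of shift-aware online conformal prediction, the sketch below uses the simpler adaptive conformal inference update (adjust the working miscoverage level after each hit or miss). This is a stand-in, not the Retrospective Adjustment, COP, or CORE methods cited above. All names are hypothetical, and clamping the level to [0, 1] is a simplification added here.

```rust
/// Online miscoverage tracker in the style of adaptive conformal inference.
/// Illustrative only; not part of the planned `conformal` module.
pub struct AdaptiveAlpha {
    /// Target miscoverage rate, e.g. 0.1 for 90% coverage.
    target: f64,
    /// Working miscoverage level used to pick the next quantile.
    current: f64,
    /// Step size: larger values react faster to distribution shift.
    step: f64,
}

impl AdaptiveAlpha {
    pub fn new(target: f64, step: f64) -> Self {
        Self { target, current: target, step }
    }

    /// Update after learning whether the last prediction set covered the outcome.
    /// A miss lowers `current` (wider sets next round); a hit raises it slightly.
    pub fn observe(&mut self, covered: bool) {
        let err = if covered { 0.0 } else { 1.0 };
        self.current = (self.current + self.step * (self.target - err)).clamp(0.0, 1.0);
    }

    pub fn current(&self) -> f64 {
        self.current
    }
}

/// Conformal score threshold: the ceil((n + 1)(1 - alpha))-th smallest
/// calibration score. Returns infinity when that rank exceeds n, meaning the
/// prediction set must contain every candidate outcome.
pub fn score_threshold(scores: &[f64], alpha: f64) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let rank = (((n + 1) as f64) * (1.0 - alpha)).ceil() as usize;
    if rank > n {
        return Some(f64::INFINITY);
    }
    Some(sorted[rank.max(1) - 1])
}
```

A growing threshold means larger prediction sets, which is the condition the gate uses to trigger DEFER.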
### 3. E-Values and E-Processes for Anytime-Valid Inference
**Sources**:
- Ramdas & Wang "Hypothesis Testing with E-values" (FnTStA 2025)
- ICML 2025 Tutorial on SAVI
- Sequential Randomization Tests (arXiv:2512.04366, December 2025)
**Key Innovation**: Evidence accumulation that remains valid at any stopping time, with multiplicative composition across experiments.
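**Definition**: An e-value is a nonnegative random variable e with E[e] ≤ 1 under the null hypothesis. An e-process is a nonnegative process (E_t) with E_0 = 1 that satisfies E[E_τ] ≤ 1 at every stopping time τ under the null; nonnegative supermartingales started at 1 are the standard example.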
**Integration**:
- New module: `ruvector-mincut/src/eprocess/` for evidence tracking
- Integrates with existing `CutCertificate` for audit trails
- Enables anytime-valid stopping decisions
**Role in Gate**: Provides the **evidential validity signal** - accumulates statistical evidence for/against coherence with formal Type I error control at any stopping time.
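The following sketch shows one common way to build such a process: multiply likelihood ratios and compare the running product with 1/α. It assumes a simple null against a simple alternative, where the product of likelihood ratios is a nonnegative martingale under the null. The type `LogEProcess` and its methods are hypothetical, not the planned `eprocess` interface.

```rust
/// Running product of e-values, tracked in log space to avoid overflow.
pub struct LogEProcess {
    /// ln(E_t); starts at ln(1) = 0.
    log_value: f64,
}

impl LogEProcess {
    pub fn new() -> Self {
        Self { log_value: 0.0 }
    }

    /// Multiply in one e-value, here the likelihood ratio p1(x) / p0(x).
    /// Both likelihoods are assumed to be finite and strictly positive.
    pub fn update(&mut self, likelihood_h1: f64, likelihood_h0: f64) {
        self.log_value += likelihood_h1.ln() - likelihood_h0.ln();
    }

    pub fn value(&self) -> f64 {
        self.log_value.exp()
    }

    /// Ville's inequality: under the null, P(E_t >= 1/alpha for some t) <= alpha.
    /// Crossing 1/alpha at any time therefore rejects the null at level alpha,
    /// no matter when the caller chooses to look or stop.
    pub fn rejects_null(&self, alpha: f64) -> bool {
        self.log_value >= -alpha.ln()
    }
}
```

Under these assumptions a threshold of 100 corresponds to α = 0.01. With a per-observation ratio of 4, the process reaches 64 after three observations (no rejection) and 256 after four (rejection).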
## Gate Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ ANYTIME-VALID COHERENCE GATE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ DYNAMIC MIN-CUT │ │ CONFORMAL │ │ E-PROCESS │ │
│ │ (Structural) │ │ (Predictive) │ │ (Evidential) │ │
│ │ │ │ │ │ │ │
│ │ SubpolynomialMC │ │ ShiftAdaptive │ │ CoherenceTest │ │
│ │ WitnessTree │───▶│ PredictionSet │───▶│ EvidenceAccum │ │
│ │ LocalKCut │ │ COP/CORE │ │ StoppingRule │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ DECISION LOGIC │ │
│ │ │ │
│ │ PERMIT: E_t > τ_permit ∧ action ∉ CriticalCut ∧ |C_t| small │ │
│ │ DEFER: |C_t| large ∨ τ_deny < E_t < τ_permit │ │
│ │ DENY: E_t < τ_deny ∨ action ∈ WitnessPartition(unsafe) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ WITNESS RECEIPT │ │
│ │ (cut + conf + e) │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```
## Integration with Existing Architecture
### Extension Points
| Component | Current Implementation | AVCG Extension |
|-----------|----------------------|----------------|
| `GatePacket` | λ as point estimate | Add `lambda_confidence_q15`, `e_value_log_q15` |
| `GateController` | Rule-based thresholds | Add `AnytimeGatePolicy` with adaptive thresholds |
| `WitnessTree` | Cut value only | Add `ConfidenceWitness` with staleness tracking |
| `CutCertificate` | Static verification | Add `EvidenceReceipt` with e-value trace |
| `TierDecision` | Fixed tiers | Add `required_confidence_for_tier` |
### New Modules
```
ruvector-mincut/
├── src/
│ ├── conformal/ # NEW: Online conformal prediction
│ │ ├── mod.rs
│ │ ├── prediction_set.rs
│ │ ├── cop.rs # Conformal Optimistic Prediction
│ │ ├── retrospective.rs # Retrospective adjustment
│ │ └── core.rs # RL-based conformal
│ ├── eprocess/ # NEW: E-value and e-process tracking
│ │ ├── mod.rs
│ │ ├── evalue.rs
│ │ ├── evidence_accum.rs
│ │ ├── stopping.rs
│ │ └── mixture.rs
│ ├── anytime_gate/ # NEW: Integrated gate controller
│ │ ├── mod.rs
│ │ ├── policy.rs
│ │ ├── decision.rs
│ │ └── receipt.rs
│ └── ...existing modules...
```
## Decision Rules
### Permit Conditions (all must hold)
1. E-process value E_t > τ_permit (sufficient evidence of coherence)
2. Action not in witness partition of critical cut
3. Conformal prediction set |C_t| < θ_confidence (confident prediction)
### Defer Conditions (any triggers)
1. Conformal prediction set |C_t| > θ_uncertainty (uncertain outcome)
2. E-process in indeterminate range: τ_deny < E_t < τ_permit
3. Deadline approaching without sufficient confidence
### Deny Conditions (any triggers)
1. E-process value E_t < τ_deny (strong evidence of incoherence)
2. Action in witness partition crossing to unsafe states
3. Structural impossibility via min-cut topology
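The rules above can be sketched as a single function. This is one reading of them, under stated assumptions: Deny triggers are checked first, Permit requires all of its conditions, and everything else (including a prediction set between θ_confidence and θ_uncertainty, which the rules leave unspecified) falls through to Defer. The structural conditions are folded into one boolean, the deadline rule is omitted, and all type and field names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Permit,
    Defer,
    Deny,
}

/// Inputs to one gate evaluation.
pub struct Signals {
    /// Current e-process value E_t.
    pub e_value: f64,
    /// Size of the conformal prediction set |C_t|.
    pub prediction_set_size: usize,
    /// True if the action lies in a witness partition crossing to unsafe states.
    pub in_unsafe_partition: bool,
}

pub struct Thresholds {
    pub tau_deny: f64,
    pub tau_permit: f64,
    pub theta_confidence: usize,
}

pub fn decide(s: &Signals, t: &Thresholds) -> Decision {
    // Deny: any single trigger is enough.
    if s.e_value < t.tau_deny || s.in_unsafe_partition {
        return Decision::Deny;
    }
    // Permit: every condition must hold.
    if s.e_value > t.tau_permit && s.prediction_set_size < t.theta_confidence {
        return Decision::Permit;
    }
    // Anything else is uncertain. A NaN e-value also lands here because all
    // comparisons with NaN are false.
    Decision::Defer
}
```

Making Defer the fall-through keeps the function total: an unanticipated combination of signals can never produce a Permit.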
## Threshold Configuration
| Threshold | Meaning | Recommended Default |
|-----------|---------|---------------------|
| τ_deny | E-process level indicating incoherence | 0.01 (1% false alarm) |
| τ_permit | E-process level indicating coherence | 100 (strong evidence) |
| θ_uncertainty | Conformal set size requiring deferral | Task-dependent |
| θ_confidence | Conformal set size for confident permit | Task-dependent |
## Witness Receipt Structure
```rust
pub struct WitnessReceipt {
/// Timestamp of decision
pub timestamp: u64,
/// Action that was evaluated
pub action_id: ActionId,
/// Gate decision
pub decision: GateDecision,
// Structural witness (from min-cut)
pub cut_value: f64,
pub witness_partition: (Vec<VertexId>, Vec<VertexId>),
pub critical_edges: Vec<EdgeId>,
// Predictive witness (from conformal)
pub prediction_set: ConformalSet,
pub coverage_target: f32,
pub shift_adaptation_rate: f32,
// Evidential witness (from e-process)
pub e_value: f64,
pub e_process_cumulative: f64,
pub stopping_valid: bool,
// Cryptographic seal
pub receipt_hash: [u8; 32],
}
```
## Security Hardening
### Threat Model
| Threat Actor | Capabilities | Target | Impact |
|--------------|--------------|--------|--------|
| **Malicious Agent** | Action injection, timing manipulation | Gate bypass | Unauthorized actions executed |
| **Network Adversary** | Message interception, replay | Receipt forgery | False audit trail |
| **Insider Threat** | Threshold modification, key access | Policy manipulation | Safety guarantees voided |
| **Byzantine Node** | Arbitrary behavior in distributed gate | Consensus corruption | Inconsistent decisions |
### Cryptographic Requirements
#### Receipt Signing (CRITICAL)
```rust
pub struct WitnessReceipt {
// ... existing fields ...
// Cryptographic seal (REQUIRED)
pub receipt_hash: [u8; 32], // Blake3 hash of serialized content
pub signature: Ed25519Signature, // REQUIRED, not optional
pub signer_id: PublicKey, // Identity of signing gate
pub timestamp_proof: TimestampProof, // Prevents backdating
}
/// Timestamp proof prevents replay and backdating
pub struct TimestampProof {
pub timestamp: u64,
pub previous_receipt_hash: [u8; 32], // Chain linkage
pub merkle_root: [u8; 32], // Batch anchor
}
impl WitnessReceipt {
/// Sign receipt - MUST be called before any external use
pub fn sign(&mut self, key: &SigningKey) -> Result<(), CryptoError> {
let content = self.serialize_without_signature();
self.receipt_hash = blake3::hash(&content).into();
self.signature = key.sign(&self.receipt_hash);
Ok(())
}
/// Verify receipt integrity and authenticity
pub fn verify(&self, trusted_keys: &KeyStore) -> Result<(), VerifyError> {
// 1. Verify hash
let expected_hash = blake3::hash(&self.serialize_without_signature());
// Compare as [u8; 32]; a bare `.into()` here leaves the target type ambiguous
if self.receipt_hash != *expected_hash.as_bytes() {
return Err(VerifyError::HashMismatch);
}
// 2. Verify signature
let public_key = trusted_keys.get(&self.signer_id)?;
public_key.verify(&self.receipt_hash, &self.signature)?;
// 3. Verify timestamp chain
self.timestamp_proof.verify()?;
Ok(())
}
}
```
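The chain linkage in `TimestampProof` can only be checked against the preceding receipt, so verification in practice walks the log. The sketch below shows that walk. To stay dependency-free it uses the standard library's `DefaultHasher` as a stand-in digest, which is NOT cryptographic; a real chain would use Blake3 as in the struct above. `ChainedReceipt`, `append`, and `first_broken_link` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest over (previous hash, payload). Not collision resistant.
fn digest(prev: u64, payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

pub struct ChainedReceipt {
    pub payload: Vec<u8>,
    pub previous_hash: u64,
    pub hash: u64,
}

/// Append a receipt linked to the current tail (the first entry links to 0).
pub fn append(chain: &mut Vec<ChainedReceipt>, payload: Vec<u8>) {
    let previous_hash = chain.last().map_or(0, |r| r.hash);
    let hash = digest(previous_hash, &payload);
    chain.push(ChainedReceipt { payload, previous_hash, hash });
}

/// Index of the first receipt whose link or content hash fails, if any.
pub fn first_broken_link(chain: &[ChainedReceipt]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, r) in chain.iter().enumerate() {
        if r.previous_hash != expected_prev || r.hash != digest(r.previous_hash, &r.payload) {
            return Some(i);
        }
        expected_prev = r.hash;
    }
    None
}
```

Because each hash covers the previous one, editing or backdating a receipt invalidates it and every later entry unless the whole tail is rewritten and re-signed.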
#### Key Management
| Key Type | Purpose | Rotation | Storage |
|----------|---------|----------|---------|
| Gate Signing Key | Sign receipts | 30 days | HSM or secure enclave |
| Receipt Verification Keys | Verify receipts | On rotation | Distributed key store |
| Threshold Keys | Multi-party signing | 90 days | Shamir secret sharing |
### Attack Mitigations
#### E-Value Manipulation Prevention
```rust
/// Bounds checking for e-value inputs
impl EValue {
pub fn from_likelihood_ratio(
likelihood_h1: f64,
likelihood_h0: f64,
) -> Result<Self, EValueError> {
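// Reject NaN, infinite, or negative likelihoods: every comparison with NaN is
// false, so the zero check below would otherwise let NaN reach `clamp`
if !likelihood_h1.is_finite() || likelihood_h1 < 0.0 || !likelihood_h0.is_finite() {
return Err(EValueError::InvalidLikelihood); // assumed variant, not in the original enum
}
// Prevent division by zero
if likelihood_h0 <= f64::EPSILON {
return Err(EValueError::InvalidDenominator);
}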
let ratio = likelihood_h1 / likelihood_h0;
// Bound extreme values to prevent overflow attacks
let bounded = ratio.clamp(E_VALUE_MIN, E_VALUE_MAX);
// Log if clamping occurred (potential attack indicator)
if (bounded - ratio).abs() > f64::EPSILON {
security_log!("E-value clamped: {} -> {}", ratio, bounded);
}
Ok(Self { value: bounded, ..Default::default() })
}
}
const E_VALUE_MIN: f64 = 1e-10;
const E_VALUE_MAX: f64 = 1e10;
```
#### Race Condition Prevention
```rust
/// Atomic gate decision with sequence numbers
pub struct AtomicGateDecision {
/// Monotonic sequence for ordering
sequence: AtomicU64,
/// Lock for decision atomicity
decision_lock: RwLock<()>,
}
impl AtomicGateDecision {
pub async fn evaluate(&self, action: &Action) -> GateResult {
// Acquire exclusive lock for decision
let _guard = self.decision_lock.write().await;
// Get sequence number BEFORE evaluation
let seq = self.sequence.fetch_add(1, Ordering::SeqCst);
// Evaluate all three signals atomically
let result = self.evaluate_internal(action, seq).await;
// Sequence number in receipt ensures ordering
result.with_sequence(seq)
}
}
```
#### Replay Attack Prevention
```rust
/// Replay prevention via nonce tracking
pub struct ReplayGuard {
/// Recent action hashes (bloom filter for efficiency)
recent_actions: BloomFilter,
/// Sliding window of full hashes for false positive resolution
hash_window: VecDeque<[u8; 32]>,
/// Maximum age of tracked actions
window_duration: Duration,
}
impl ReplayGuard {
pub fn check_and_record(&mut self, action: &Action) -> Result<(), ReplayError> {
let hash = action.content_hash();
// Fast path: bloom filter check
if self.recent_actions.might_contain(&hash) {
// Slow path: verify against full hash window
if self.hash_window.contains(&hash) {
return Err(ReplayError::DuplicateAction { hash });
}
}
// Record action
self.recent_actions.insert(&hash);
self.hash_window.push_back(hash);
self.prune_old_entries();
Ok(())
}
}
```
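Pruning by `window_duration` requires knowing when each hash arrived, which a bare `VecDeque<[u8; 32]>` does not record, and a standard Bloom filter cannot forget entries. One possible shape for a time-windowed variant is sketched below under simplifying assumptions: an exact `HashSet` replaces the Bloom filter, the caller supplies a monotonic millisecond clock, and the name `WindowedReplayGuard` is hypothetical.

```rust
use std::collections::{HashSet, VecDeque};

pub struct WindowedReplayGuard {
    /// Hashes currently inside the window.
    seen: HashSet<[u8; 32]>,
    /// (arrival time in ms, hash), oldest first, used for eviction.
    order: VecDeque<(u64, [u8; 32])>,
    window_ms: u64,
}

impl WindowedReplayGuard {
    pub fn new(window_ms: u64) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), window_ms }
    }

    /// Returns `true` if the action is new, `false` if it replays a hash seen
    /// within the last `window_ms` milliseconds.
    pub fn check_and_record(&mut self, hash: [u8; 32], now_ms: u64) -> bool {
        // Evict entries older than the window before checking.
        while let Some(&(seen_at, old)) = self.order.front() {
            if now_ms.saturating_sub(seen_at) <= self.window_ms {
                break;
            }
            self.seen.remove(&old);
            self.order.pop_front();
        }
        // `insert` returns false when the hash is already present.
        if !self.seen.insert(hash) {
            return false;
        }
        self.order.push_back((now_ms, hash));
        true
    }
}
```

Memory is bounded by the number of distinct actions per window. A Bloom filter in front of the set remains a reasonable optimization once that number is large.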
### Trust Boundaries
```
┌─────────────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: GATE CORE │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ • E-process computation • Min-cut evaluation │ │
│ │ • Conformal prediction • Decision logic │ │
│ │ • Receipt signing • Key material │ │
│ │ │ │
│ │ Invariants: │ │
│ │ - All inputs validated before use │ │
│ │ - All outputs signed before release │ │
│ │ - No external calls during decision │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │ │
│ (authenticated channel) │
│ │ │
└────────────────────────────────────┼────────────────────────────────────┘
┌────────────────────────────────────┼────────────────────────────────────┐
│ TRUST BOUNDARY: AGENT INTERFACE │
│ │ │
│ • Action submission (validated) │ • Decision receipt (verified) │
│ • Context provision (sanitized) │ • Witness query (authenticated) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Performance Optimization
### Identified Bottlenecks & Solutions
#### 1. E-Process History Management
**Problem**: Unbounded history growth in `EProcess.history: Vec<EValue>`
**Solution**: Ring buffer with configurable retention
```rust
pub struct EProcess {
    /// Current accumulated value (always maintained)
    current: f64,
    /// Rule for folding a new e-value into `current` (type name assumed)
    update_rule: UpdateRule,
    /// Total number of updates seen; keeps counting after the buffer fills
    updates: u64,
    /// Bounded history ring buffer
    history: RingBuffer<EValueSummary>,
    /// Checkpoint for long-term audit (sampled)
    checkpoints: Vec<EProcessCheckpoint>,
}

/// Compact summary for history
pub struct EValueSummary {
    value: f32,     // Reduced precision for storage
    timestamp: u32, // Relative to epoch
    flags: u8,      // Metadata bits
}

impl EProcess {
    const HISTORY_CAPACITY: usize = 1024;
    const CHECKPOINT_INTERVAL: u64 = 100;

    pub fn update(&mut self, e: EValue) {
        // Update current (always)
        self.current = self.update_rule.apply(self.current, e.value);
        // Add to ring buffer (bounded)
        self.history.push(e.to_summary());
        // Periodic checkpoint for audit. Key this off the update counter: once
        // the ring buffer is full its length stays at HISTORY_CAPACITY, so
        // `history.len() % CHECKPOINT_INTERVAL` would never be zero again
        self.updates += 1;
        if self.updates % Self::CHECKPOINT_INTERVAL == 0 {
            self.checkpoints.push(self.checkpoint());
        }
    }
}
```
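`RingBuffer` is not a standard library type, so the sketch below shows one minimal way such a buffer could behave. It is an assumption about the intended semantics (fixed capacity, oldest entry overwritten first), not the crate's actual type. Note that `len()` saturates at the capacity, so periodic work should key off `total_pushed()` instead.

```rust
/// Fixed-capacity buffer that overwrites its oldest entry when full.
pub struct RingBuffer<T> {
    slots: Vec<T>,
    capacity: usize,
    /// Slot the next push writes to; once full, this is also the oldest entry.
    next: usize,
    /// Number of pushes ever made, including overwritten entries.
    total_pushed: u64,
}

impl<T> RingBuffer<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { slots: Vec::with_capacity(capacity), capacity, next: 0, total_pushed: 0 }
    }

    pub fn push(&mut self, item: T) {
        if self.slots.len() < self.capacity {
            self.slots.push(item);
        } else {
            self.slots[self.next] = item;
        }
        self.next = (self.next + 1) % self.capacity;
        self.total_pushed += 1;
    }

    /// Number of entries currently stored (never exceeds the capacity).
    pub fn len(&self) -> usize {
        self.slots.len()
    }

    pub fn total_pushed(&self) -> u64 {
        self.total_pushed
    }

    /// Iterate from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        let split = if self.slots.len() < self.capacity { 0 } else { self.next };
        self.slots[split..].iter().chain(self.slots[..split].iter())
    }
}
```

With capacity 3 and pushes of 1 through 5, the buffer holds 3, 4, 5 in that order while `total_pushed()` reports 5.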
#### 2. Min-Cut Hierarchy Updates
**Problem**: Sequential iteration over all hierarchy levels
**Solution**: Lazy propagation with dirty tracking
```rust
pub struct LazyHierarchy {
levels: Vec<HierarchyLevel>,
/// Bitmap of levels needing update
dirty_levels: u64,
/// Deferred updates queue
pending_updates: VecDeque<DeferredUpdate>,
}
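impl LazyHierarchy {
    pub fn insert(&mut self, edge: Edge) {
        // Only update lowest level immediately (assumes `Edge: Clone`, since
        // the same edge is also queued below)
        self.levels[0].insert(edge.clone());
        self.dirty_levels |= 1;
        // Defer higher level updates
        self.pending_updates.push_back(DeferredUpdate::Insert(edge));
    }

    pub fn get_cut(&mut self) -> CutValue {
        // Propagate only if needed for query
        if self.dirty_levels != 0 {
            self.propagate_lazy();
        }
        self.levels.last().unwrap().cut_value()
    }

    fn propagate_lazy(&mut self) {
        // Process dirty levels bottom-up. Each processed level marks its parent
        // dirty so the change reaches the top level that `get_cut` reads
        while self.dirty_levels != 0 {
            let level = self.dirty_levels.trailing_zeros() as usize;
            self.update_level(level);
            self.dirty_levels &= !(1u64 << level);
            if level + 1 < self.levels.len() {
                self.dirty_levels |= 1u64 << (level + 1);
            }
        }
        // Assumes `update_level` has applied the queued updates at every level
        self.pending_updates.clear();
    }
}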
```
#### 3. SIMD-Optimized E-Value Computation
```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Batch e-value computation with SIMD
pub fn compute_mixture_evalue_simd(
    likelihoods_h1: &[f64],
    likelihoods_h0: &[f64],
    weights: &[f64],
) -> f64 {
    assert_eq!(likelihoods_h1.len(), likelihoods_h0.len());
    assert_eq!(likelihoods_h1.len(), weights.len());
    let n = likelihoods_h1.len();

    // Vector path: only full groups of 4 lanes, so no load reads past the slices
    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    let (vector_sum, done) = unsafe {
        let vec_len = n - n % 4;
        let mut sum = _mm256_setzero_pd();
        for i in (0..vec_len).step_by(4) {
            let h1 = _mm256_loadu_pd(likelihoods_h1.as_ptr().add(i));
            let h0 = _mm256_loadu_pd(likelihoods_h0.as_ptr().add(i));
            let w = _mm256_loadu_pd(weights.as_ptr().add(i));
            let ratio = _mm256_div_pd(h1, h0);
            let weighted = _mm256_mul_pd(ratio, w);
            sum = _mm256_add_pd(sum, weighted);
        }
        // Horizontal sum: spill the four lanes and add them
        let mut lanes = [0.0f64; 4];
        _mm256_storeu_pd(lanes.as_mut_ptr(), sum);
        (lanes.iter().sum::<f64>(), vec_len)
    };
    #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
    let (vector_sum, done) = (0.0f64, 0usize);

    // Scalar path: the whole input without AVX2, otherwise only the tail
    let tail: f64 = (done..n)
        .map(|i| (likelihoods_h1[i] / likelihoods_h0[i]) * weights[i])
        .sum();
    vector_sum + tail
}
```
#### 4. Receipt Serialization Optimization
```rust
/// Zero-copy receipt serialization
pub struct ReceiptBuffer {
/// Pre-allocated buffer pool
pool: BufferPool,
/// Current buffer
current: Buffer,
}
impl WitnessReceipt {
/// Serialize to pre-allocated buffer (zero-copy)
pub fn serialize_into(&self, buffer: &mut [u8]) -> Result<usize, SerializeError> {
let mut cursor = 0;
// Fixed-size header (no allocation)
cursor += self.write_header(&mut buffer[cursor..])?;
// Structural witness (fixed size)
cursor += self.structural.write_to(&mut buffer[cursor..])?;
// Predictive witness (bounded size)
cursor += self.predictive.write_to(&mut buffer[cursor..])?;
// Evidential witness (fixed size)
cursor += self.evidential.write_to(&mut buffer[cursor..])?;
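// Hash and signature (fixed size). Check the remaining space first so a short
// buffer returns an error instead of panicking on the slice index
if buffer.len() < cursor + 32 + 64 {
return Err(SerializeError::BufferTooSmall); // assumed variant
}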
buffer[cursor..cursor + 32].copy_from_slice(&self.receipt_hash);
cursor += 32;
buffer[cursor..cursor + 64].copy_from_slice(&self.signature.to_bytes());
cursor += 64;
Ok(cursor)
}
}
```
### Latency Budget (Revised)
| Component | Budget | Optimization | Measured p99 |
|-----------|--------|--------------|--------------|
| Min-cut query | 10ms | Lazy propagation | TBD |
| Conformal prediction | 15ms | Cached quantiles | TBD |
| E-process update | 5ms | SIMD mixture | TBD |
| Decision logic | 5ms | Short-circuit | TBD |
| Receipt generation | 10ms | Zero-copy serialize | TBD |
| Signing | 5ms | Ed25519 batch | TBD |
| **Total** | **50ms** | | |
---
## Distributed Coordination
### Multi-Agent Gate Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ DISTRIBUTED COHERENCE GATE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ REGIONAL │ │ REGIONAL │ │ REGIONAL │ │
│ │ GATE (Raft) │ │ GATE (Raft) │ │ GATE (Raft) │ │
│ │ │ │ │ │ │ │
│ │ • Local cuts │ │ • Local cuts │ │ • Local cuts │ │
│ │ • Local conf │ │ • Local conf │ │ • Local conf │ │
│ │ • Local e-proc │ │ • Local e-proc │ │ • Local e-proc │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └──────────────────────┼──────────────────────┘ │
│ │ │
│ ┌─────────────▼─────────────┐ │
│ │ GLOBAL COORDINATOR │ │
│ │ (DAG Consensus) │ │
│ │ │ │
│ │ • Cross-region cuts │ │
│ │ • Aggregated e-process │ │
│ │ • Boundary arbitration │ │
│ └───────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Hierarchical Decision Protocol
```rust
/// Distributed gate with hierarchical coordination
pub struct DistributedGateController {
/// Local gate for fast-path decisions
local_gate: AnytimeGateController,
/// Regional coordinator (Raft consensus)
regional: RegionalCoordinator,
/// Global coordinator (DAG consensus)
global: GlobalCoordinator,
/// Decision routing policy
routing: DecisionRoutingPolicy,
}
pub enum DecisionScope {
/// Action affects only local partition
Local,
/// Action crosses regional boundary
Regional,
/// Action has global implications
Global,
}
impl DistributedGateController {
pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
// 1. Determine scope
let scope = self.routing.classify(action, context);
// 2. Route to appropriate level
match scope {
DecisionScope::Local => {
// Fast path: local decision only
self.local_gate.evaluate(action, context)
}
DecisionScope::Regional => {
// Medium path: coordinate with regional peers
let local_result = self.local_gate.evaluate(action, context);
let regional_result = self.regional.coordinate(action, &local_result).await?;
self.merge_results(local_result, regional_result)
}
DecisionScope::Global => {
// Slow path: full coordination
let local_result = self.local_gate.evaluate(action, context);
let regional_result = self.regional.coordinate(action, &local_result).await?;
let global_result = self.global.arbitrate(action, &regional_result).await?;
self.merge_all_results(local_result, regional_result, global_result)
}
}
}
}
```
### Distributed E-Process Aggregation
```rust
/// E-process that aggregates across distributed gates
pub struct DistributedEProcess {
/// Local e-process
local: EProcess,
/// Peer e-process summaries (received via gossip)
peer_summaries: HashMap<NodeId, EProcessSummary>,
/// Aggregation method
aggregation: AggregationMethod,
}
pub enum AggregationMethod {
/// Conservative: minimum across all nodes
Minimum,
/// Average with confidence weighting
WeightedAverage,
/// Consensus-based (requires agreement)
Consensus { threshold: f64 },
}
impl DistributedEProcess {
/// Get aggregated e-value for distributed decision
pub fn aggregated_value(&self) -> f64 {
match self.aggregation {
AggregationMethod::Minimum => {
let local = self.local.current_value();
let peer_min = self.peer_summaries.values()
.map(|s| s.current_value)
.fold(f64::INFINITY, f64::min);
local.min(peer_min)
}
AggregationMethod::WeightedAverage => {
let total_weight: f64 = 1.0 + self.peer_summaries.values()
.map(|s| s.confidence_weight)
.sum::<f64>();
let weighted_sum = self.local.current_value()
+ self.peer_summaries.values()
.map(|s| s.current_value * s.confidence_weight)
.sum::<f64>();
weighted_sum / total_weight
}
AggregationMethod::Consensus { threshold } => {
// Requires threshold fraction of nodes to agree
let values: Vec<f64> = std::iter::once(self.local.current_value())
.chain(self.peer_summaries.values().map(|s| s.current_value))
.collect();
// Return median if sufficient agreement, else conservative min
if self.check_agreement(&values, threshold) {
statistical_median(&values)
} else {
values.iter().cloned().fold(f64::INFINITY, f64::min)
}
}
}
}
}
```
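The helpers `statistical_median` and `check_agreement` are not defined in this document. The sketch below gives one plausible definition so the `Consensus` branch is easier to reason about. It treats "agreement" as lying within a multiplicative factor of the median, which is an assumption made here, and the extra `tolerance_factor` parameter does not appear in the original call.

```rust
/// Median of a slice; `None` for empty input.
pub fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 { v[mid] } else { (v[mid - 1] + v[mid]) / 2.0 })
}

/// True if at least `threshold` (a fraction in [0, 1]) of the values lie within
/// a factor of `tolerance_factor` of the median. E-values are compared on a
/// ratio scale because they combine multiplicatively.
pub fn check_agreement(values: &[f64], threshold: f64, tolerance_factor: f64) -> bool {
    let Some(m) = median(values) else {
        return false;
    };
    if m <= 0.0 {
        return false;
    }
    let close = values
        .iter()
        .filter(|&&x| x > 0.0 && (x / m).ln().abs() <= tolerance_factor.ln())
        .count();
    close as f64 >= threshold * values.len() as f64
}
```

For node values 10, 11, 9 and 1000, the median is 10.5 and three of four nodes fall within a factor of 2, so a 0.75 threshold passes while 0.9 falls back to the conservative minimum.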
### Fault Tolerance
```rust
/// Fault-tolerant gate with automatic failover
pub struct FaultTolerantGate {
/// Primary gate
primary: AnytimeGateController,
/// Standby gates (hot standbys)
standbys: Vec<AnytimeGateController>,
/// Health monitor
health: HealthMonitor,
/// Failover policy
failover: FailoverPolicy,
}
pub struct FailoverPolicy {
/// Maximum consecutive failures before failover
max_failures: u32,
/// Health check interval
check_interval: Duration,
/// Recovery grace period
recovery_grace: Duration,
}
impl FaultTolerantGate {
pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
// Try primary
match self.try_primary(action, context).await {
Ok(result) => return Ok(result),
Err(e) => {
self.health.record_failure(&e);
}
}
// Failover to standbys
for (idx, standby) in self.standbys.iter_mut().enumerate() {
match standby.evaluate(action, context) {
Ok(result) => {
// Promote standby if primary unhealthy
if self.health.should_failover() {
self.promote_standby(idx);
}
return Ok(result);
}
Err(e) => {
self.health.record_standby_failure(idx, &e);
}
}
}
// All gates failed - safe default
Ok(GateResult {
decision: GateDecision::Deny,
reason: "All gates unavailable - failing safe".into(),
..Default::default()
})
}
}
```
### Integration with RuVector Consensus
| Consensus Layer | RuVector Module | Gate Integration |
|-----------------|-----------------|------------------|
| Regional (Raft) | `ruvector-raft` | Local cut coordination, leader-based decisions |
| Global (DAG) | `ruvector-cluster` | Cross-region boundary arbitration |
| State Sync | `ruvector-sync` | E-process summary propagation |
| Receipt Chain | `ruvector-merkle` | Distributed receipt verification |
---
## Hardware Mapping: 256-Tile WASM Fabric
The coherence gate is an ideal workload for event-driven WASM hardware: **mostly silent, then extremely decisive when boundaries move**.
### Tile Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ 256-TILE COGNITUM FABRIC │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ TILE ZERO (Arbiter) │ │
│ │ │ │
│ │ • Merge worker reports • Hierarchical min-cut │ │
│ │ • Global gate decision • Permit token issuance │ │
│ │ • Witness receipt log • Hash-chained eventlog │ │
│ └──────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Workers │ │ Workers │ │ Workers │ ... │
│ │ [1-85] │ │ [86-170] │ │ [171-255] │ │
│ │ │ │ │ │ │ │
│ │ Shard A │ │ Shard B │ │ Shard C │ │
│ │ Local cuts │ │ Local cuts │ │ Local cuts │ │
│ │ E-accum │ │ E-accum │ │ E-accum │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Worker Tile Responsibilities
Each of the 255 worker tiles maintains a **local shard**:
```rust
/// Worker tile state (fits in ~64KB WASM memory)
#[repr(C)]
pub struct WorkerTileState {
/// Compact neighborhood graph (edges + weights)
graph_shard: CompactGraph, // ~32KB
/// Rolling feature window for normality scores
feature_window: RingBuffer<f32>, // ~8KB
/// Local coherence score
coherence: f32,
/// Local boundary candidates (top-k edges)
boundary_edges: [EdgeId; 8],
/// Local e-value accumulator
e_accumulator: f64,
/// Tick counter
tick: u64,
}
/// Per-tick processing: only deltas
impl WorkerTileState {
/// Process incoming delta (edge add/remove/weight update)
pub fn ingest_delta(&mut self, delta: &Delta) -> Status {
match delta {
Delta::EdgeAdd(e) => self.graph_shard.add_edge(e),
Delta::EdgeRemove(e) => self.graph_shard.remove_edge(e),
Delta::WeightUpdate(e, w) => self.graph_shard.update_weight(e, *w),
Delta::Observation(score) => self.feature_window.push(*score),
}
self.update_local_coherence();
Status::Ok
}
/// Tick: compute and emit report
pub fn tick(&mut self, now_ns: u64) -> TileReport {
self.tick = now_ns;
// Tiny math: update e-accumulator
self.e_accumulator = self.compute_local_evalue();
TileReport {
tile_id: self.id,
coherence: self.coherence,
boundary_moved: self.detect_boundary_movement(),
suspicious_edges: self.top_k_suspicious(),
e_value: self.e_accumulator as f32,
witness_fragment: self.extract_witness_fragment(),
}
}
}
/// Fixed-size report (fits in single cache line)
#[repr(C, align(64))]
pub struct TileReport {
tile_id: u8,
coherence: f32,
boundary_moved: bool,
suspicious_edges: [EdgeId; 4],
e_value: f32,
witness_fragment: WitnessFragment,
}
```
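`align(64)` fixes the alignment of `TileReport` but does not by itself keep the struct within one cache line; that depends on the field sizes. A compile-time assertion can pin this down. The sketch assumes `EdgeId` is a `u32` and `WitnessFragment` is 16 bytes, neither of which is specified in this document.

```rust
use std::mem::{align_of, size_of};

// Assumed widths for types this document does not define.
pub type EdgeId = u32;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct WitnessFragment {
    pub bytes: [u8; 16],
}

#[repr(C, align(64))]
#[derive(Clone, Copy)]
pub struct TileReport {
    pub tile_id: u8,
    pub coherence: f32,
    pub boundary_moved: bool,
    pub suspicious_edges: [EdgeId; 4],
    pub e_value: f32,
    pub witness_fragment: WitnessFragment,
}

// The build fails if the report ever outgrows a single 64-byte cache line.
const _: () = assert!(size_of::<TileReport>() == 64);
const _: () = assert!(align_of::<TileReport>() == 64);
```

Under these assumptions the fields occupy 48 bytes including padding, and the alignment rounds the size up to exactly 64. A larger `WitnessFragment` would push the size to 128 and trip the assertion.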
### TileZero Responsibilities
TileZero acts as the **arbiter** that issues final decisions:
```rust
/// TileZero: Global gate decision and permit issuance
pub struct TileZero {
/// Merged supergraph (reduced from worker summaries)
supergraph: ReducedGraph,
/// Canonical permit token state
permit_state: PermitState,
/// Hash-chained witness receipt log
receipt_log: ReceiptLog,
/// Threshold configuration
thresholds: GateThresholds,
}
impl TileZero {
/// Collect reports from all worker tiles
pub fn collect_reports(&mut self, reports: &[TileReport; 255]) {
// Merge worker summaries into supergraph
for report in reports {
if report.boundary_moved {
self.supergraph.update_from_fragment(&report.witness_fragment);
}
self.supergraph.update_coherence(report.tile_id, report.coherence);
}
}
/// Issue gate decision (microsecond latency)
pub fn decide(&mut self, action_ctx: &ActionContext) -> PermitToken {
// Three stacked filters:
// 1. Structural filter (global cut on reduced graph)
let structural_ok = self.supergraph.global_cut() >= self.thresholds.min_cut;
// 2. Shift filter (aggregated shift pressure)
let shift_pressure = self.aggregate_shift_pressure();
let shift_ok = shift_pressure < self.thresholds.max_shift;
// 3. Evidence filter (can stop immediately if enough evidence)
let e_aggregate = self.aggregate_evidence();
let evidence_decision = self.evidence_decision(e_aggregate);
// Combined decision: Deny triggers are checked before Defer triggers, so strong
// evidence of incoherence is not downgraded to Defer when shift is also detected
let decision = match (structural_ok, shift_ok, evidence_decision) {
(false, _, _) => GateDecision::Deny, // Structure broken
(_, _, EvidenceDecision::Reject) => GateDecision::Deny, // Evidence of incoherence
(_, false, _) => GateDecision::Defer, // Shift detected
(_, _, EvidenceDecision::Continue) => GateDecision::Defer,
(true, true, EvidenceDecision::Accept) => GateDecision::Permit,
};
// Issue token
self.issue_permit_token(action_ctx, decision)
}
/// Issue permit token (a signed capability)
fn issue_permit_token(
&mut self,
ctx: &ActionContext,
decision: GateDecision,
) -> PermitToken {
let witness_hash = self.compute_witness_hash();
let mut token = PermitToken {
decision,
action_id: ctx.action_id,
timestamp: now_ns(),
ttl_ns: self.thresholds.permit_ttl,
witness_hash,
sequence: self.permit_state.next_sequence(),
mac: [0u8; 32], // placeholder: every field must be set, filled in after signing
};
// MAC or sign the token (computed with `mac` zeroed; the verifier must do the same)
let mac = self.permit_state.sign(&token);
token.mac = mac;
// Emit receipt
self.emit_receipt(&token, &mac);
token
}
/// Emit witness receipt (hash-chained)
fn emit_receipt(&mut self, token: &PermitToken, mac: &[u8; 32]) {
let receipt = WitnessReceipt {
token: token.clone(),
mac: *mac,
previous_hash: self.receipt_log.last_hash(),
witness_summary: self.supergraph.witness_summary(),
};
self.receipt_log.append(receipt);
}
}
/// Permit token: a capability that agents must present
#[repr(C)]
pub struct PermitToken {
pub decision: GateDecision,
pub action_id: ActionId,
pub timestamp: u64,
pub ttl_ns: u64,
pub witness_hash: [u8; 32],
pub sequence: u64,
pub mac: [u8; 32], // 32-byte MAC (e.g. HMAC-SHA256); an Ed25519 signature is 64 bytes and would not fit
}
impl PermitToken {
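/// Agents must present a valid token to perform actions.
/// This checks freshness and authenticity only; callers must still
/// confirm that `decision` is `GateDecision::Permit` before acting.
pub fn is_valid(&self, verifier: &Verifier) -> bool {
// Check TTL (saturating add avoids overflow near u64::MAX)
if now_ns() > self.timestamp.saturating_add(self.ttl_ns) {
return false;
}
// Verify MAC/signature
verifier.verify(self, &self.mac)
}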
}
```
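
The precedence encoded in the `match` inside `decide` can be isolated as a pure function, which makes every Permit/Defer/Deny path easy to test. A minimal sketch, assuming simplified stand-in enums (`Gate`, `Evidence`) and a hypothetical free function `combine`; none of these names come from the actual crate:

```rust
/// Simplified stand-ins for `GateDecision` and `EvidenceDecision` (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Gate {
    Permit,
    Defer,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Evidence {
    Accept,
    Continue,
    Reject,
}

/// Same arm order as the `match` in `decide`: a structural failure denies,
/// then a detected shift defers, and only then does the evidence verdict apply.
fn combine(structural_ok: bool, shift_ok: bool, evidence: Evidence) -> Gate {
    match (structural_ok, shift_ok, evidence) {
        (false, _, _) => Gate::Deny,
        (_, false, _) => Gate::Defer,
        (_, _, Evidence::Reject) => Gate::Deny,
        (_, _, Evidence::Continue) => Gate::Defer,
        (true, true, Evidence::Accept) => Gate::Permit,
    }
}
```

Because arms are tried in order, a detected shift takes precedence over a rejecting evidence verdict: `combine(true, false, Evidence::Reject)` yields `Gate::Defer`.
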
### WASM Kernel API
Each tile runs a minimal WASM kernel:
```rust
/// Worker tile WASM exports
#[no_mangle]
pub extern "C" fn ingest_delta(delta_ptr: *const u8, len: usize) -> u32 {
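// SAFETY: the host must pass a non-null pointer to `len` readable bytes
// that stay valid for the duration of this call
let delta = unsafe { core::slice::from_raw_parts(delta_ptr, len) };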
TILE_STATE.with(|state| state.borrow_mut().ingest_delta(delta))
}
#[no_mangle]
pub extern "C" fn tick(now_ns: u64) -> *const TileReport {
TILE_STATE.with(|state| state.borrow_mut().tick(now_ns))
}
#[no_mangle]
pub extern "C" fn get_witness_fragment(id: u32) -> *const u8 {
TILE_STATE.with(|state| state.borrow().get_witness_fragment(id))
}
/// TileZero WASM/native exports
#[no_mangle]
pub extern "C" fn collect_reports(reports_ptr: *const TileReport, count: usize) {
TILEZERO.with(|tz| tz.borrow_mut().collect_reports(reports_ptr, count))
}
#[no_mangle]
pub extern "C" fn decide(action_ctx_ptr: *const ActionContext) -> *const PermitToken {
TILEZERO.with(|tz| tz.borrow_mut().decide(action_ctx_ptr))
}
#[no_mangle]
pub extern "C" fn get_receipt(sequence: u64) -> *const WitnessReceipt {
TILEZERO.with(|tz| tz.borrow().get_receipt(sequence))
}
```
### v0 Implementation Strategy
Ship fast by layering:
| Phase | Components | Skip Initially |
|-------|------------|----------------|
| **v0.1** | Structural coherence + witness receipt | Shift filter, evidence filter |
| **v0.2** | Add shift filter (normality scores) | CORE RL adaptation |
| **v0.3** | Add evidence filter (e-values) | Mixture e-values |
| **v1.0** | Full three-filter stack | - |
### Rust Deliverables
| Crate | Description | Dependencies |
|-------|-------------|--------------|
| `cognitum-gate-kernel` | `no_std` WASM kernel for worker tiles | `ruvector-mincut` (core algorithms) |
| `cognitum-gate-tilezero` | Native arbiter for TileZero | `ruvector-mincut`, `blake3`, `ed25519` |
| `mcp-gate` | MCP server for agent integration | `cognitum-gate-tilezero` |
```
cognitum-gate/
├── cognitum-gate-kernel/           # no_std WASM
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs                  # WASM exports
│       ├── shard.rs                # Compact graph shard
│       ├── evidence.rs             # Local e-accumulator
│       └── report.rs               # TileReport generation
├── cognitum-gate-tilezero/         # Native arbiter
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs
│       ├── merge.rs                # Report merging
│       ├── supergraph.rs           # Reduced global graph
│       ├── permit.rs               # Token issuance
│       └── receipt.rs              # Hash-chained log
└── mcp-gate/                       # MCP integration
    ├── Cargo.toml
    └── src/
        ├── lib.rs
        ├── tools.rs                # permit_action, get_receipt, replay_decision
        └── server.rs               # MCP server
```
### MCP Gate Tools
```rust
/// MCP tool: Request permission for an action
#[mcp_tool]
pub async fn permit_action(
action_id: String,
action_type: String,
context: serde_json::Value,
) -> Result<PermitResponse, McpError> {
let ctx = ActionContext::from_json(&context)?;
let token = TILEZERO.decide(&ctx);
Ok(PermitResponse {
decision: token.decision.to_string(),
token: token.encode_base64(),
witness_hash: hex::encode(&token.witness_hash),
valid_until_ns: token.timestamp + token.ttl_ns,
})
}
/// MCP tool: Get witness receipt for audit
#[mcp_tool]
pub async fn get_receipt(sequence: u64) -> Result<ReceiptResponse, McpError> {
let receipt = TILEZERO.get_receipt(sequence)?;
Ok(ReceiptResponse {
sequence,
decision: receipt.token.decision.to_string(),
timestamp: receipt.token.timestamp,
witness_summary: receipt.witness_summary.to_json(),
previous_hash: hex::encode(&receipt.previous_hash),
receipt_hash: hex::encode(&receipt.hash()),
})
}
/// MCP tool: Replay decision for debugging/audit
#[mcp_tool]
pub async fn replay_decision(
sequence: u64,
verify_chain: bool,
) -> Result<ReplayResponse, McpError> {
let receipt = TILEZERO.get_receipt(sequence)?;
// Optionally verify hash chain
if verify_chain {
TILEZERO.verify_chain_to(sequence)?;
}
// Replay the decision with logged state
let replayed = TILEZERO.replay(&receipt)?;
Ok(ReplayResponse {
original_decision: receipt.token.decision.to_string(),
replayed_decision: replayed.decision.to_string(),
match_confirmed: receipt.token.decision == replayed.decision,
state_snapshot: replayed.state_snapshot.to_json(),
})
}
```
### The Practical Win
This gives Cognitum a clear job that buyers understand:
> **"We do not just detect issues, we prevent unsafe actions."**
> **"We can prove why we blocked or allowed it."**
> **"We stay calm until structure breaks."**
The permit token as a capability means:
- Agents cannot act without presenting a valid token
- Tokens expire (TTL-bounded)
- Every token is backed by a witness receipt
- The entire chain is cryptographically verifiable
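
How chain linkage makes tampering detectable can be shown with a toy log. This is a sketch under loud assumptions: `ReceiptChain`, `Receipt`, and `first_broken_link` are hypothetical names, and FNV-1a is a non-cryptographic stand-in for the hash a real receipt log would use (the crate table lists `blake3`).

```rust
/// FNV-1a 64-bit: a NON-cryptographic stand-in, used only to keep the sketch std-only.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

#[derive(Debug, Clone)]
struct Receipt {
    sequence: u64,
    decision: String,
    previous_hash: u64,
}

impl Receipt {
    /// The hash covers the payload and the link to the previous receipt.
    fn hash(&self) -> u64 {
        let mut buf = Vec::new();
        buf.extend_from_slice(&self.sequence.to_le_bytes());
        buf.extend_from_slice(self.decision.as_bytes());
        buf.extend_from_slice(&self.previous_hash.to_le_bytes());
        fnv1a(&buf)
    }
}

#[derive(Default)]
struct ReceiptChain {
    receipts: Vec<Receipt>,
}

impl ReceiptChain {
    /// Append a receipt linked to the current head (0 for the first receipt).
    fn append(&mut self, decision: &str) {
        let previous_hash = self.receipts.last().map_or(0, |r| r.hash());
        let sequence = self.receipts.len() as u64;
        self.receipts.push(Receipt {
            sequence,
            decision: decision.to_string(),
            previous_hash,
        });
    }

    /// Index of the first receipt whose `previous_hash` does not match, if any.
    fn first_broken_link(&self) -> Option<usize> {
        let mut expected = 0u64;
        for (i, r) in self.receipts.iter().enumerate() {
            if r.previous_hash != expected {
                return Some(i);
            }
            expected = r.hash();
        }
        None
    }
}
```

Editing receipt 1 changes its hash, so the mismatch surfaces at receipt 2. Link checking alone cannot detect an edit to the newest receipt, since nothing follows it yet, which is one reason each receipt is also signed.
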
---
## API Contract
### Request: Permit Action
```json
{
"action_id": "cfg-push-7a3f",
"action_type": "config_change",
"target": {
"device": "router-west-03",
"path": "/network/interfaces/eth0"
},
"context": {
"agent_id": "ops-agent-12",
"session_id": "sess-abc123",
"prior_actions": ["cfg-push-7a3e"],
"urgency": "normal"
}
}
```
### Response: Permit
```json
{
"decision": "permit",
"token": "eyJ0eXAiOiJQVCIsImFsZyI6IkVkMjU1MTkifQ...",
"valid_until_ns": 1737158400000000000,
"witness": {
"structural": {
"cut_value": 12.7,
"partition": "stable",
"critical_edges": 0
},
"predictive": {
"set_size": 3,
"coverage": 0.92
},
"evidential": {
"e_value": 847.3,
"verdict": "accept"
}
},
"receipt_sequence": 1847392
}
```
### Response: Defer
```json
{
"decision": "defer",
"reason": "shift_detected",
"detail": "Distribution shift pressure 0.73 exceeds threshold 0.5",
"escalation": {
"to": "human_operator",
"context_url": "/receipts/1847393/context",
"timeout_ns": 300000000000
},
"witness": {
"structural": { "cut_value": 11.2, "partition": "stable" },
"predictive": { "set_size": 18, "coverage": 0.91 },
"evidential": { "e_value": 3.2, "verdict": "continue" }
},
"receipt_sequence": 1847393
}
```
### Response: Deny
```json
{
"decision": "deny",
"reason": "boundary_violation",
"detail": "Action crosses fragile partition (cut=2.1 < min=5.0)",
"witness": {
"structural": {
"cut_value": 2.1,
"partition": "fragile",
"critical_edges": 4,
"boundary": ["edge-17", "edge-23", "edge-41", "edge-52"]
},
"predictive": { "set_size": 47, "coverage": 0.88 },
"evidential": { "e_value": 0.004, "verdict": "reject" }
},
"receipt_sequence": 1847394
}
```
---
## Migration Path
### Phase M1: Shadow Mode
Run AVCG alongside the existing `GateController`. Compare the two decisions, but do not enforce AVCG's.
```rust
impl HybridGate {
pub fn evaluate(&mut self, action: &Action) -> GateResult {
// Existing gate makes the decision
let legacy_result = self.legacy_gate.evaluate(action);
// AVCG runs in shadow, logs disagreements
let avcg_result = self.avcg_gate.evaluate(action);
if legacy_result.decision != avcg_result.decision {
metrics::counter!("gate.shadow.disagreement").increment(1);
log::info!(
"Shadow disagreement: legacy={:?} avcg={:?} action={}",
legacy_result.decision,
avcg_result.decision,
action.id
);
}
legacy_result // Legacy still decides
}
}
```
**Exit criteria**: <1% disagreement rate over 7 days, zero false denies on known-safe actions.
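
The exit criteria can be tracked with a small tally. A sketch, assuming decisions are reduced to permit / not permit and that the caller resets the tally for each 7-day window; `ShadowStats` and its methods are hypothetical:

```rust
/// Hypothetical tally for the shadow-mode exit criteria.
#[derive(Default)]
struct ShadowStats {
    total: u64,
    disagreements: u64,
    false_denies: u64,
}

impl ShadowStats {
    /// `known_safe` marks actions drawn from the known-safe set.
    fn record(&mut self, legacy_permits: bool, avcg_permits: bool, known_safe: bool) {
        self.total += 1;
        if legacy_permits != avcg_permits {
            self.disagreements += 1;
        }
        if known_safe && !avcg_permits {
            self.false_denies += 1;
        }
    }

    fn disagreement_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.disagreements as f64 / self.total as f64
        }
    }

    /// Exit criteria from the text: under 1% disagreement and zero false denies.
    fn ready_for_canary(&self) -> bool {
        self.total > 0 && self.disagreement_rate() < 0.01 && self.false_denies == 0
    }
}
```
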
### Phase M2: Canary Enforcement
AVCG enforces for 5% of traffic; the legacy gate handles the rest.
```rust
impl CanaryGate {
pub fn evaluate(&mut self, action: &Action) -> GateResult {
let canary = self.canary_selector.select(action);
if canary {
metrics::counter!("gate.canary.avcg").increment(1);
self.avcg_gate.evaluate(action)
} else {
self.legacy_gate.evaluate(action)
}
}
}
```
**Exit criteria**: No incidents attributed to AVCG decisions over 14 days.
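
`canary_selector.select` is not specified here. One possible approach, sketched under the assumption that selection should be deterministic per action ID, is to hash the ID into 10,000 buckets and route the lowest 500 (5%) to AVCG. The `CanarySelector` below and its FNV-1a bucketing are illustrative choices, not the project's implementation:

```rust
/// FNV-1a 64-bit hash, used here only for stable bucketing (not for security).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical selector: routes a fixed share of actions to the canary path.
struct CanarySelector {
    /// Share of traffic in basis points (500 = 5%).
    basis_points: u64,
}

impl CanarySelector {
    /// The same action ID always lands in the same bucket.
    fn select(&self, action_id: &str) -> bool {
        fnv1a(action_id.as_bytes()) % 10_000 < self.basis_points
    }
}
```

Hashing the ID rather than sampling randomly means a retried action always takes the same path, which keeps incident attribution simple.
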
### Phase M3: Majority Rollout
AVCG handles 95% of traffic; the legacy gate remains available as a fallback.
### Phase M4: Full Cutover
Legacy removed. AVCG is the gate.
```
Timeline:
  M1 (Shadow)     → 2-4 weeks
  M2 (Canary 5%)  → 2 weeks
  M3 (Majority)   → 2 weeks
  M4 (Full)       → 1 week
                    ─────────
  Total           → 7-9 weeks
```
---
## Observability
### Metrics (Prometheus)
```
# Decision counters
gate_decisions_total{decision="permit|defer|deny", reason="..."}
# Latency histograms
gate_latency_seconds{phase="mincut|conformal|eprocess|decision|receipt"}
# Signal values
gate_cut_value{quantile="0.5|0.9|0.99"}
gate_prediction_set_size{quantile="0.5|0.9|0.99"}
gate_evalue{quantile="0.5|0.9|0.99"}
# Health
gate_healthy{component="mincut|conformal|eprocess"}
gate_failover_total{from="primary|standby_N"}
# Coverage tracking
gate_conformal_coverage_rate # Should stay ≥ 0.85
gate_eprocess_power # Evidence accumulation rate
```
### Alerting Thresholds
| Alert | Condition | Severity |
|-------|-----------|----------|
| `GateHighDenyRate` | deny_rate > 10% for 5m | Warning |
| `GateLatencyHigh` | p99 > 100ms for 5m | Warning |
| `GateCoverageDrift` | coverage < 0.80 for 15m | Critical |
| `GateUnhealthy` | any component unhealthy for 1m | Critical |
| `GateReceiptChainBroken` | hash verification fails | Critical |
### Debug Query: Why Was This Denied?
```bash
# Get full decision context
curl /api/gate/receipts/1847394/explain
# Response:
{
"receipt_sequence": 1847394,
"decision": "deny",
"explanation": {
"primary_reason": "structural",
"structural": {
"cut_value": 2.1,
"threshold": 5.0,
"failed": true,
"boundary_edges": [
{"id": "edge-17", "weight": 0.3, "endpoints": ["node-a", "node-b"]},
...
],
"partition_context": "Device router-west-03 is in partition P7 which has been unstable since 14:32:07 UTC"
},
"predictive": { "failed": true, "detail": "Set size 47 > max_prediction_set 20" },
"evidential": { "failed": true, "detail": "E-value 0.004 < τ_deny 0.01" }
},
"suggested_action": "Wait for partition P7 to stabilize or escalate to human approval",
"similar_past_decisions": [1847201, 1846998, 1846754]
}
```
---
## Open Questions Resolution
### Q1: Graph model scope — immediate actions or multi-step lookahead?
**Decision**: Immediate actions for v0, optional 1-step lookahead for v1.
**Rationale**: Multi-step lookahead requires predicting action sequences, which adds latency and complexity. Start simple: evaluate the action being requested *right now*. If the current action is safe but would lead to an unsafe state, the *next* action will be denied when it's requested.
### Q2: E-process null — "action safety" vs "policy consistency"?
**Decision**: Action safety as primary null, policy consistency as secondary.
**Rationale**:
- Primary H₀: P(action leads to unsafe state) ≤ p₀
- Secondary (optional): Current action consistent with established policy
Action safety is more fundamental. Policy consistency can be added as a separate e-process that runs in parallel.
### Q3: Threshold learning — fixed or adaptive?
**Decision**: Fixed for v0, adaptive via meta-learning for v1.
**Rationale**: Fixed thresholds are easier to audit and explain. Once we have production data, we can train a meta-learner to adjust thresholds based on observed false positive/negative rates.
### Q4: Human-in-loop — how are DEFER decisions presented?
**Decision**: Structured escalation with timeout and context link.
```json
{
"escalation": {
"to": "human_operator",
"channel": "slack|pagerduty|dashboard",
"context_url": "/receipts/{seq}/context",
"timeout_ns": 300000000000,
"default_on_timeout": "deny"
}
}
```
If the human does not respond within the timeout, the decision defaults to DENY (fail-safe).
See **Hybrid Agent/Human Workflow** section for full details.
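
The fail-safe timeout rule is small enough to state as a pure function. A sketch, assuming a hypothetical `resolve_defer` helper and simplified `Verdict` / `Resolution` enums; a late human response is treated the same as no response:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Approve,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolution {
    Approved,
    Denied,
    DeniedOnTimeout,
    Pending,
}

/// Resolve a DEFER from an optional human response and the elapsed time.
fn resolve_defer(response: Option<Verdict>, elapsed_ns: u64, timeout_ns: u64) -> Resolution {
    match response {
        // A response only counts if it arrived within the timeout
        Some(Verdict::Approve) if elapsed_ns <= timeout_ns => Resolution::Approved,
        Some(Verdict::Deny) if elapsed_ns <= timeout_ns => Resolution::Denied,
        // Timeout reached: default_on_timeout = "deny"
        _ if elapsed_ns > timeout_ns => Resolution::DeniedOnTimeout,
        // No response yet and still inside the window
        _ => Resolution::Pending,
    }
}
```
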
### Q5: Adversarial robustness — adaptive adversaries?
**Decision**: Defense in depth + rate limiting + anomaly detection.
**Mitigations**:
1. Rate limit: Max N decisions per agent per minute
2. Anomaly: Flag agents with unusually high deny rates
3. Honeypots: Inject synthetic "trap" actions to detect probing
4. Rotation: Periodically rotate threshold parameters within safe bounds
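
The first mitigation can be sketched as a per-agent sliding-window limiter. `RateLimiter` is a hypothetical type; the window length and the limit N are parameters, and the caller supplies the clock so the logic stays deterministic in tests:

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical limiter: at most `max_per_window` decisions per agent per `window_ns`.
struct RateLimiter {
    max_per_window: usize,
    window_ns: u64,
    recent: HashMap<String, VecDeque<u64>>,
}

impl RateLimiter {
    fn new(max_per_window: usize, window_ns: u64) -> Self {
        Self {
            max_per_window,
            window_ns,
            recent: HashMap::new(),
        }
    }

    /// Returns true if the request is admitted at time `now_ns`.
    fn allow(&mut self, agent_id: &str, now_ns: u64) -> bool {
        let q = self.recent.entry(agent_id.to_string()).or_default();
        // Drop timestamps that have left the window
        while let Some(&oldest) = q.front() {
            if now_ns.saturating_sub(oldest) >= self.window_ns {
                q.pop_front();
            } else {
                break;
            }
        }
        if q.len() >= self.max_per_window {
            return false;
        }
        q.push_back(now_ns);
        true
    }
}
```
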
---
## Definition of Done
### v0.1 Shippable Criteria
| Criterion | Metric | Target |
|-----------|--------|--------|
| **Structural filter works** | Min-cut correctly identifies fragile partitions | 100% on test suite |
| **Receipts are signed** | All receipts have valid Ed25519 signature | 100% |
| **Receipts are chained** | Hash chain verifies for all receipts | 100% |
| **Latency acceptable** | p99 gate decision time | < 50ms |
| **No false denies** | Known-safe actions are permitted | 100% on test suite |
| **Demo scenario runs** | Network security control plane demo | End-to-end pass |
### v0.1 Minimum Viable Demo
**Scenario**: Agent requests config push to network device.
1. Agent calls `permit_action` with device target
2. Gate evaluates structural coherence (min-cut)
3. Gate returns PERMIT with signed receipt
4. Agent presents token to device
5. Device verifies token, accepts config
**Success**: Auditor can replay decision from receipt and get same result.
---
## Cost Model
### Memory per Tile (WASM)
| Component | Size | Notes |
|-----------|------|-------|
| Graph shard | 32 KB | ~2000 edges at 16 bytes each |
| Feature window | 8 KB | 2048 f32 values |
| E-accumulator | 64 B | f64 + metadata |
| Boundary edges | 64 B | 8 × EdgeId |
| **Total per worker** | **~40 KB** | Fits in 64KB WASM page |
| **Total 255 workers** | **~10.2 MB** | |
| TileZero state | ~1 MB | Supergraph + receipt log head |
| **Total fabric** | **~11 MB** | |
### Network Bandwidth
| Flow | Frequency | Size | Bandwidth |
|------|-----------|------|-----------|
| Worker → TileZero reports | 1/tick (10ms) | 64 B × 255 | ~1.6 MB/s |
| Receipt log append | per decision | ~512 B | Variable |
| Gossip (distributed) | 1/100ms | ~1 KB × peers | ~10 KB/s × P |
### Storage Growth
| Item | Size | Retention | Growth @ 1000 decisions/s |
|------|------|-----------|---------------------------|
| Receipt | ~512 B | 90 days | ~44 GB/day |
| E-process checkpoint | ~128 B | Forever | ~11 GB/day |
| Audit log | ~256 B | 1 year | ~22 GB/day |

**90-day storage** @ 1000 decisions/s: ~4 TB receipts + ~1 TB checkpoints ≈ **5 TB**. Growth scales linearly with the decision rate: at 1 decision/s the same arithmetic gives ~44 MB/day of receipts and ≈ 5 GB over 90 days.
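
Growth figures of this kind follow from record size × decision rate × 86,400 s/day. A small sketch for checking them at any rate; the function names are illustrative:

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// Bytes written per day for records of `record_bytes` at `per_second` records/s.
fn bytes_per_day(record_bytes: u64, per_second: u64) -> u64 {
    record_bytes * per_second * SECONDS_PER_DAY
}

/// Bytes accumulated over a retention window of `retention_days`.
fn retained_bytes(record_bytes: u64, per_second: u64, retention_days: u64) -> u64 {
    bytes_per_day(record_bytes, per_second) * retention_days
}
```

For a 512 B receipt, `bytes_per_day(512, 1)` is 44,236,800 (about 44 MB) and `bytes_per_day(512, 1000)` is about 44 GB.
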
---
## Hybrid Agent/Human Workflow
The coherence gate is designed for **bounded autonomy**, not full autonomy. Humans stay in the loop at critical decision points.
### Design Philosophy
```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   "Agents handle the routine. Humans handle the novel."                 │
│                                                                         │
│   PERMIT → Agent proceeds autonomously (low risk, high confidence)      │
│   DEFER  → Human decides (uncertain, boundary case, policy gap)         │
│   DENY   → Blocked automatically (structural violation, unsafe)         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
The gate doesn't replace human judgment—it **routes decisions to humans when judgment is needed**.
### Escalation Tiers
| Tier | Trigger | Responder | SLA | Example |
|------|---------|-----------|-----|---------|
| **T0** | PERMIT | None (automated) | 0 | Routine config within stable partition |
| **T1** | DEFER (shift) | On-call operator | 5 min | New dependency pattern detected |
| **T2** | DEFER (boundary) | Senior engineer | 15 min | Action crosses partition boundary |
| **T3** | DEFER (policy gap) | Policy team | 1 hour | No precedent for this action type |
| **T4** | DENY override request | Security + Management | 4 hours | Agent requesting exception to denial |
### Human Decision Interface
When a DEFER is escalated, humans see:
```
┌─────────────────────────────────────────────────────────────────────────┐
│  DECISION REQUIRED                                       Timeout: 4:32  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Agent:  ops-agent-12                                                   │
│  Action: Push config to router-west-03 /network/interfaces/eth0         │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  WHY DEFERRED                                                   │    │
│  │                                                                 │    │
│  │  • Shift detected: New dependency pattern (0.73 > 0.5 threshold)│    │
│  │  • This device was added to the graph 2 hours ago               │    │
│  │  • Similar actions on established devices: 847 permits, 0 denies│    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  CONTEXT                                                        │    │
│  │                                                                 │    │
│  │  Structural coherence: 11.2 (healthy)                           │    │
│  │  Prediction set size: 18 outcomes (moderate uncertainty)        │    │
│  │  Evidence accumulator: 3.2 (inconclusive)                       │    │
│  │                                                                 │    │
│  │  [View full witness receipt]  [View similar past decisions]     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────────────────┐    │
│  │    APPROVE    │  │     DENY      │  │      ESCALATE TO T3       │    │
│  │   (proceed)   │  │    (block)    │  │  (need policy guidance)   │    │
│  └───────────────┘  └───────────────┘  └───────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
### Human Decision Recording
Human decisions become part of the audit trail:
```rust
pub struct HumanDecision {
/// Original deferred receipt
pub deferred_receipt_seq: u64,
/// Human's decision
pub decision: HumanVerdict,
/// Human identity (authenticated)
pub decider_id: AuthenticatedUserId,
/// Reasoning (required for audit)
pub rationale: String,
/// Timestamp
pub decided_at: u64,
/// Signature (human signs their decision)
pub signature: Ed25519Signature,
}
pub enum HumanVerdict {
/// Approve the action
Approve {
/// Add to training data for future automation
learn_from_this: bool,
},
/// Deny the action
Deny {
/// Reason for denial
reason: String,
},
/// Escalate to higher tier
Escalate {
to_tier: EscalationTier,
reason: String,
},
/// Request more information
NeedMoreInfo {
questions: Vec<String>,
},
}
```
### Override Protocol
Humans can override DENY decisions, but with friction and accountability:
```rust
pub struct DenyOverride {
/// Which denial is being overridden
pub denied_receipt_seq: u64,
/// Who is overriding (must be T4 authority)
pub overrider_id: AuthenticatedUserId,
/// Second approver required
pub second_approver_id: AuthenticatedUserId,
/// Business justification (required, min 50 chars)
pub justification: String,
/// Time-bounded: override expires
pub valid_until: u64,
/// Scope-limited: only this specific action
pub action_id: ActionId,
/// Both signatures required
pub overrider_signature: Ed25519Signature,
pub approver_signature: Ed25519Signature,
}
```
**Override constraints**:
- Two humans required (four-eyes principle)
- Must provide written justification
- Time-limited (max 24 hours)
- Scope-limited (only the specific action)
- All overrides flagged for security review
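
Three of the constraints above (two distinct humans, a written justification, a 24-hour limit) can be checked mechanically before any signature is verified. A sketch, assuming a simplified `OverrideRequest` stand-in for `DenyOverride` with signatures omitted and a hypothetical `requested_at_ns` field:

```rust
const NS_PER_HOUR: u64 = 3_600 * 1_000_000_000;
const MAX_OVERRIDE_NS: u64 = 24 * NS_PER_HOUR;
const MIN_JUSTIFICATION_CHARS: usize = 50;

/// Simplified stand-in for `DenyOverride` (illustrative only).
struct OverrideRequest<'a> {
    overrider_id: &'a str,
    second_approver_id: &'a str,
    justification: &'a str,
    requested_at_ns: u64,
    valid_until_ns: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum OverrideError {
    SameApprover,
    JustificationTooShort,
    InvalidWindow,
}

fn validate_override(req: &OverrideRequest) -> Result<(), OverrideError> {
    // Four-eyes principle: the two humans must be different people
    if req.overrider_id == req.second_approver_id {
        return Err(OverrideError::SameApprover);
    }
    // Written justification, minimum 50 characters
    if req.justification.trim().chars().count() < MIN_JUSTIFICATION_CHARS {
        return Err(OverrideError::JustificationTooShort);
    }
    // Time-limited: non-empty window of at most 24 hours
    let window = req.valid_until_ns.saturating_sub(req.requested_at_ns);
    if window == 0 || window > MAX_OVERRIDE_NS {
        return Err(OverrideError::InvalidWindow);
    }
    Ok(())
}
```
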
### Learning from Human Decisions
Human decisions improve the gate over time:
```rust
/// When a human approves a DEFER, optionally learn from it
pub fn learn_from_approval(
    deferred: &WitnessReceipt,
    human: &HumanDecision,
) {
    // Only approvals flagged `learn_from_this` feed back into calibration
    if matches!(human.decision, HumanVerdict::Approve { learn_from_this: true }) {
        // Add to calibration data
        conformal_calibrator.add_observation(
            deferred.context.clone(),
            Outcome::Safe, // Human judged it safe
        );
        // Update e-process null hypothesis
        eprocess_trainer.add_positive_example(
            deferred.action.clone(),
        );
        // Adjust threshold candidates (for meta-learning in v1)
        threshold_learner.record_human_permit(
            deferred.signals.clone(),
        );
    }
}
```
### Workload Distribution Target
The goal is **minimal human burden** while maintaining safety:
| Decision | Target Rate | Human Workload |
|----------|-------------|----------------|
| PERMIT | 90-95% | Zero |
| DEFER | 4-9% | Human decides |
| DENY | 1-2% | Zero (unless override requested) |
If DEFER rate exceeds 10%, the gate is too conservative—tune thresholds.
If DENY rate exceeds 5%, something is wrong—investigate root cause.
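
These two rules translate directly into a health check over a window of decision counts. A sketch; `assess` and `Tuning` are hypothetical names, and the 10% and 5% limits are the ones stated above:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Tuning {
    Healthy,
    /// DEFER rate above 10%: thresholds are too conservative.
    TooConservative,
    /// DENY rate above 5%: investigate the root cause.
    Investigate,
}

/// Classify a window of decision counts against the stated limits.
fn assess(permits: u64, defers: u64, denies: u64) -> Tuning {
    let total = permits + defers + denies;
    if total == 0 {
        return Tuning::Healthy;
    }
    let defer_rate = defers as f64 / total as f64;
    let deny_rate = denies as f64 / total as f64;
    if deny_rate > 0.05 {
        Tuning::Investigate
    } else if defer_rate > 0.10 {
        Tuning::TooConservative
    } else {
        Tuning::Healthy
    }
}
```

The deny check runs first on the assumption that an unexplained surge in denials is the more urgent signal; the text does not prescribe an order.
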
### Integration Channels
| Channel | Use Case | Response Format |
|---------|----------|-----------------|
| **Slack** | On-call escalation | Interactive buttons |
| **PagerDuty** | Critical/timed decisions | Acknowledge + decision API |
| **Dashboard** | Batch review | Web UI with full context |
| **CLI** | Developer/ops workflow | `ruvector gate approve <seq>` |
| **API** | Programmatic integration | REST/gRPC |
### Audit Trail for Human Decisions
Every human decision is:
1. **Authenticated**: Decider identity verified via SSO/MFA
2. **Signed**: Human signs their decision with personal key
3. **Chained**: Added to the same receipt chain as gate decisions
4. **Timestamped**: Immutable record of when decision was made
5. **Justified**: Rationale captured for later review
```
Receipt Chain:
[1847392] PERMIT (automated) → agent executed
[1847393] DEFER (automated) → escalated to human
[1847393-H] APPROVE (human: alice@corp) → agent executed
[1847394] DENY (automated) → blocked
[1847394-O] OVERRIDE (humans: bob@corp + carol@corp) → exception granted
```
---
## Consequences
### Benefits
1. **Formal Guarantees**: Type I error control at any stopping time
2. **Distribution Shift Robustness**: Conformal prediction adapts without retraining
3. **Computational Efficiency**: O(n^{o(1)}) update time from subpolynomial min-cut
4. **Audit Trail**: Every decision has cryptographic witness receipt
5. **Defense in Depth**: Three independent signals must concur for permit
6. **Cryptographic Integrity**: All receipts signed with Ed25519
7. **Attack Resistance**: E-value bounds, replay guards, race condition prevention
8. **Distributed Scalability**: Hierarchical coordination with regional and global tiers
9. **Fault Tolerance**: Automatic failover with safe defaults
### Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| Computational overhead | Lazy evaluation; batch updates; SIMD optimization |
| E-value power under uncertainty | Mixture e-values for robustness |
| Graph model mismatch | Learn graph structure from trajectories |
| Threshold tuning | Adaptive thresholds via meta-learning |
| Receipt forgery | Mandatory Ed25519 signing; chain linkage |
| E-value manipulation | Input bounds; clamping with security logging |
| Race conditions | Atomic decisions with sequence numbers |
| Replay attacks | Bloom filter + sliding window guard |
| Network partitions | Hierarchical decisions; local autonomy |
| Byzantine nodes | Consensus-based aggregation; safe defaults |
### Complexity Analysis
| Operation | Current | With AVCG | Distributed AVCG |
|-----------|---------|-----------|------------------|
| Edge update | O(n^{o(1)}) | O(n^{o(1)}) | O(n^{o(1)}) + network |
| Gate evaluation | O(1) | O(k) prediction set | O(k) + O(R) regional |
| Witness generation | O(m) | O(m) amortized | O(m) + signing |
| Certificate verification | O(n) | O(n + log T) | O(n + log T) + sig verify |
| Receipt signing | N/A | O(1) Ed25519 | O(1) + HSM latency |
| Distributed consensus | N/A | N/A | O(log N) Raft |
| E-process aggregation | N/A | O(1) | O(P) peers |
Where: k = prediction set size, T = history length, R = regional peers, N = cluster size, P = peer count
## References
### Dynamic Min-Cut
1. El-Hayek, Henzinger, Li. "Deterministic and Exact Fully-dynamic Minimum Cut of Superpolylogarithmic Size in Subpolynomial Time." arXiv:2512.13105, December 2025.
2. Jin, Sun, Thorup. "Fully Dynamic Exact Minimum Cut in Subpolynomial Time." SODA 2024.
### Online Conformal Prediction
3. "Online Conformal Inference with Retrospective Adjustment for Faster Adaptation to Distribution Shift." arXiv:2511.04275, November 2025.
4. "Distribution-informed Online Conformal Prediction (COP)." December 2025.
5. "CORE: Conformal Regression under Distribution Shift via Reinforcement Learning." October 2025.
### E-Values and E-Processes
6. Ramdas, Wang. "Hypothesis Testing with E-values." Foundations and Trends in Statistics, 2025.
7. ICML 2025 Tutorial: "Game-theoretic Statistics and Sequential Anytime-Valid Inference."
8. "Sequential Randomization Tests Using e-values." arXiv:2512.04366, December 2025.
### AI Agent Control
9. "Bounded Autonomy: A Pragmatic Response to Concerns About Fully Autonomous AI Agents." XMPRO, 2025.
10. "Customizable Runtime Enforcement for Safe and Reliable LLM Agents." arXiv:2503.18666, 2025.
## Testing Strategy
### Unit Tests
| Component | Coverage Target | Key Test Cases |
|-----------|----------------|----------------|
| `CompactGraph` | 95% | Add/remove edges, weight updates, min-cut estimation |
| `EvidenceAccumulator` | 95% | Bounds checking, update rules, stopping decisions |
| `TileReport` | 90% | Serialization roundtrip, checksum verification |
| `PermitToken` | 95% | Signing, verification, TTL expiration |
| `ReceiptLog` | 95% | Hash chain integrity, tamper detection |
| `ThreeFilterDecision` | 100% | All Permit/Defer/Deny paths |
### Integration Tests
| Scenario | Description | Expected Outcome |
|----------|-------------|------------------|
| Happy path | Stable graph, safe action | PERMIT with valid receipt |
| Boundary crossing | Action crosses fragile partition | DENY with boundary edges |
| Shift detection | New dependency pattern | DEFER with escalation |
| Human approval | DEFER → human approves | Token issued, learning recorded |
| Replay verification | Replay historical decision | Deterministic match |
| Hash chain audit | Verify 1000 receipts | All hashes valid |
### Property-Based Tests
```rust
#[proptest]
fn e_value_always_positive(
    // Restrict inputs to positive, finite e-values: zero, NaN, or
    // underflow would make the product non-positive
    #[strategy(1e-6f64..1e6)] e1: f64,
    #[strategy(1e-6f64..1e6)] e2: f64,
) {
    let result = combine_evalues(e1, e2);
    prop_assert!(result > 0.0);
}

#[proptest]
fn receipt_hash_deterministic(receipt: WitnessReceipt) {
    let hash1 = receipt.compute_hash();
    let hash2 = receipt.compute_hash();
    prop_assert_eq!(hash1, hash2);
}

#[proptest]
fn serialization_roundtrip(report: TileReport) {
    let bytes = report.serialize();
    let restored = TileReport::deserialize(&bytes);
    prop_assert_eq!(report, restored);
}
```
### Security Tests
| Test | Attack Vector | Expected Behavior |
|------|---------------|-------------------|
| Forged signature | Invalid Ed25519 sig | Verification fails |
| Replay attack | Duplicate action | ReplayGuard blocks |
| E-value overflow | Extreme likelihood ratio | Clamped to bounds |
| Race condition | Concurrent evaluations | Sequence numbers ordered |
| Tampered receipt | Modified hash | Chain verification fails |
### Benchmark Tests
| Metric | Target | Measurement |
|--------|--------|-------------|
| Gate decision latency | p99 < 50ms | `criterion` benchmark |
| Receipt signing | < 5ms | `criterion` benchmark |
| 255-tile report merge | < 10ms | `criterion` benchmark |
| Hash chain verification (1000) | < 100ms | `criterion` benchmark |
| Memory per worker tile | < 64KB | Static analysis |
---
## Configuration Format
### TOML Configuration
```toml
# gate-config.toml
[gate]
# Gate identification
gate_id = "gate-west-01"
version = "0.1.0"
[thresholds]
# E-process thresholds
tau_deny = 0.01 # E-value below this → DENY
tau_permit = 100.0 # E-value above this → PERMIT
# Structural thresholds
min_cut = 5.0 # Cut value below this → DENY
max_shift = 0.5 # Shift pressure above this → DEFER
# Conformal thresholds
max_prediction_set = 20 # Set size above this → DEFER
coverage_target = 0.90 # Target coverage rate
[timing]
# Permit token TTL
permit_ttl_seconds = 300
# Decision timeout
decision_timeout_ms = 50
# Tick interval for worker tiles
tick_interval_ms = 10
[security]
# Key rotation
signing_key_rotation_days = 30
threshold_key_rotation_days = 90
# Replay prevention
replay_window_seconds = 3600
bloom_filter_size = 1000000
[distributed]
# Coordination settings
regional_peers = ["gate-west-02", "gate-west-03"]
global_coordinator = "coordinator-global-01"
raft_heartbeat_ms = 100
consensus_timeout_ms = 1000
[escalation]
# Human-in-loop settings
default_timeout_seconds = 300
default_on_timeout = "deny"
[escalation.channels.slack]
webhook_url = "${SLACK_WEBHOOK_URL}"
channel = "#gate-escalations"
[escalation.channels.pagerduty]
api_key = "${PAGERDUTY_API_KEY}"
service_id = "gate-critical"
[observability]
# Metrics endpoint
metrics_port = 9090
metrics_path = "/metrics"
# Tracing
tracing_enabled = true
tracing_sample_rate = 0.1
jaeger_endpoint = "http://jaeger:14268/api/traces"
[storage]
# Receipt storage
receipt_backend = "postgresql"
receipt_retention_days = 90
checkpoint_interval = 100
[storage.postgresql]
host = "${DB_HOST}"
port = 5432
database = "gate_receipts"
username = "${DB_USER}"
password = "${DB_PASSWORD}"
```
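
The two e-process thresholds split the e-value axis into three bands. A sketch of how a loaded `[thresholds]` table might be validated and applied, assuming a hypothetical `EvidenceThresholds` type and that NaN should fail safe to reject:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvidenceDecision {
    Accept,
    Continue,
    Reject,
}

/// Hypothetical holder for `tau_deny` and `tau_permit`.
struct EvidenceThresholds {
    tau_deny: f64,
    tau_permit: f64,
}

impl EvidenceThresholds {
    /// Rejects configurations where the thresholds do not bracket a "continue" band.
    fn new(tau_deny: f64, tau_permit: f64) -> Result<Self, String> {
        if !(tau_deny > 0.0 && tau_deny < tau_permit && tau_permit.is_finite()) {
            return Err(format!(
                "need 0 < tau_deny < tau_permit, got {tau_deny} and {tau_permit}"
            ));
        }
        Ok(Self { tau_deny, tau_permit })
    }

    fn decide(&self, e_value: f64) -> EvidenceDecision {
        if e_value.is_nan() || e_value < self.tau_deny {
            EvidenceDecision::Reject // NaN fails safe
        } else if e_value > self.tau_permit {
            EvidenceDecision::Accept
        } else {
            EvidenceDecision::Continue
        }
    }
}
```

With the defaults above (`tau_deny = 0.01`, `tau_permit = 100.0`), the e-values from the API examples map as expected: 847.3 accepts, 3.2 continues, and 0.004 rejects.
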
### Environment Variables
```bash
# Required
export GATE_SIGNING_KEY_PATH=/etc/gate/keys/signing.key
export GATE_CONFIG_PATH=/etc/gate/config.toml
# Optional overrides
export GATE_TAU_DENY=0.01
export GATE_TAU_PERMIT=100.0
export GATE_MIN_CUT=5.0
export GATE_MAX_SHIFT=0.5
export GATE_PERMIT_TTL_SECONDS=300
# Secrets (never in config file)
export SLACK_WEBHOOK_URL=https://hooks.slack.com/...
export PAGERDUTY_API_KEY=...
export DB_PASSWORD=...
```
---
## Error Recovery Procedures
### Gate Decision Failures
| Failure | Detection | Recovery | Fallback |
|---------|-----------|----------|----------|
| Min-cut timeout | Decision exceeds 50ms | Log, retry once | DEFER |
| E-process NaN | `is_nan()` check | Reset accumulator | DENY |
| Signing failure | Ed25519 error | Rotate to backup key | DENY (unsigned) |
| Receipt log full | Capacity check | Archive, start new segment | DENY |
### Distributed Failures
```rust
impl FaultRecovery {
    pub async fn handle_regional_failure(&mut self, error: RegionalError) -> GateResult {
        match error {
            RegionalError::LeaderUnavailable => {
                // Wait for new leader election
                tokio::time::sleep(Duration::from_millis(200)).await;
                self.retry_with_new_leader().await
            }
            RegionalError::NetworkPartition => {
                // Fall back to local-only decision
                log::warn!("Network partition detected, using local gate");
                self.local_gate.evaluate_standalone()
            }
            RegionalError::ConsensusTimeout => {
                // Use conservative decision; returned directly because the
                // function's return type is `GateResult`, not a `Result`
                GateResult {
                    decision: GateDecision::Defer,
                    reason: "Consensus timeout - escalating to human".into(),
                    ..Default::default()
                }
            }
        }
    }
}
```
### Receipt Chain Recovery
```rust
impl ReceiptLog {
/// Recover from corrupted receipt chain
pub fn recover_chain(&mut self, last_known_good: u64) -> Result<(), RecoveryError> {
// 1. Truncate corrupted entries
self.truncate_after(last_known_good)?;
// 2. Rebuild from checkpoint
let checkpoint = self.find_nearest_checkpoint(last_known_good)?;
self.rebuild_from_checkpoint(checkpoint)?;
// 3. Mark recovery in audit log
self.append_recovery_marker(last_known_good)?;
// 4. Alert operators
alert::send("Receipt chain recovery performed", Severity::Warning);
Ok(())
}
}
```
### Worker Tile Recovery
| Failure | Detection | Recovery Time | Data Loss |
|---------|-----------|---------------|-----------|
| Single tile crash | Heartbeat timeout | < 100ms | Last tick |
| Tile memory corruption | Checksum mismatch | < 500ms | Current shard |
| TileZero crash | Primary unavailable | < 1s | None (standbys) |
| Full fabric restart | All tiles down | < 5s | Rebuild from checkpoint |
### Runbook: Gate Unresponsive
```bash
# 1. Check gate health
curl http://gate:9090/health
# 2. If unhealthy, check logs
kubectl logs -l app=gate --tail=100
# 3. Check for resource exhaustion
kubectl top pods -l app=gate
# 4. If memory high, trigger GC
curl -X POST http://gate:9090/admin/gc
# 5. If still unresponsive, rolling restart
kubectl rollout restart deployment/gate
# 6. Verify recovery
curl http://gate:9090/health
curl http://gate:9090/metrics | grep gate_healthy
```
---
## Appendix: Mathematical Foundations
### E-Value Composition
For independent e-values e₁, e₂ (each with expectation at most 1 under H₀):
```
e_combined = e₁ · e₂
E[e_combined] = E[e₁] · E[e₂] ≤ 1 · 1 = 1
```
This enables **optional continuation**: evidence accumulates validly across sessions.
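
In an implementation the running product is usually kept in log space so that long runs neither overflow nor underflow. A sketch, assuming a hypothetical `LogEvidence` accumulator with caller-supplied clamping bounds (`min_e` must be positive and no greater than `max_e`):

```rust
/// Product of e-values accumulated as a sum of logarithms.
#[derive(Debug, Clone, Copy)]
struct LogEvidence {
    log_e: f64,
}

impl LogEvidence {
    /// Starts at e = 1 (log 0): no evidence yet.
    fn new() -> Self {
        Self { log_e: 0.0 }
    }

    /// Multiply in one e-value, clamped to [min_e, max_e].
    /// Returns false and leaves the state unchanged if `e` is NaN.
    fn update(&mut self, e: f64, min_e: f64, max_e: f64) -> bool {
        if e.is_nan() {
            return false;
        }
        self.log_e += e.clamp(min_e, max_e).ln();
        true
    }

    fn value(&self) -> f64 {
        self.log_e.exp()
    }
}
```

Capping an e-value from above only makes the test more conservative. Raising small values to `min_e` can inflate the product, so the lower bound should be kept tiny and treated as a numerical guard rather than part of the guarantee.
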
### Conformal Coverage
Under exchangeability or bounded distribution shift:
```
P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α - δ_t
```
Where δ_t → 0 as the algorithm adapts via retrospective adjustment.
### Anytime-Valid Stopping
For any stopping time τ (possibly data-dependent):
```
P_H₀(E_τ ≥ 1/α) ≤ α
```
This follows from Ville's inequality, because under H₀ E_t is a nonnegative supermartingale with E[E_0] ≤ 1.
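
As a concrete instance, consider the null H₀: P(unsafe) ≤ p₀ with binary outcomes. A standard betting construction multiplies the running value by 1 + λ(x − p₀) per observation, which is nonnegative for λ in [0, 1/p₀] and has expectation at most 1 under H₀. The sketch below uses hypothetical names (`BettingEProcess`, `observe`, `rejects`) and a fixed λ; it is not the gate's actual e-process:

```rust
/// Betting e-process for H0: P(unsafe) <= p0, fed with binary outcomes.
struct BettingEProcess {
    p0: f64,
    lambda: f64,
    e: f64,
}

impl BettingEProcess {
    /// `lambda` must lie in [0, 1/p0] so every factor stays nonnegative.
    fn new(p0: f64, lambda: f64) -> Self {
        assert!(p0 > 0.0 && p0 < 1.0);
        assert!(lambda >= 0.0 && lambda <= 1.0 / p0);
        Self { p0, lambda, e: 1.0 }
    }

    /// Multiply in the factor 1 + lambda * (x - p0), where x = 1 for an
    /// unsafe outcome and 0 otherwise; returns the updated value.
    fn observe(&mut self, unsafe_outcome: bool) -> f64 {
        let x = if unsafe_outcome { 1.0 } else { 0.0 };
        self.e *= 1.0 + self.lambda * (x - self.p0);
        self.e
    }

    /// Ville's inequality: rejecting H0 once E >= 1/alpha keeps the
    /// type I error at most alpha, whenever the check is made.
    fn rejects(&self, alpha: f64) -> bool {
        self.e >= 1.0 / alpha
    }
}
```

Here a large value is evidence against H₀, that is, evidence that the unsafe rate exceeds p₀. How that maps onto `tau_permit` and `tau_deny` depends on which hypothesis the deployed e-process tests, which this sketch does not assume.
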