git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2224 lines
79 KiB
Markdown
2224 lines
79 KiB
Markdown
# ADR-001: Anytime-Valid Coherence Gate
|
||
|
||
**Status**: Proposed
|
||
**Date**: 2026-01-17
|
||
**Authors**: ruv.io, RuVector Team
|
||
**Deciders**: Architecture Review Board
|
||
**SDK**: Claude-Flow
|
||
|
||
## Version History
|
||
|
||
| Version | Date | Author | Changes |
|
||
|---------|------|--------|---------|
|
||
| 0.1 | 2026-01-17 | ruv.io | Initial draft with three-filter architecture |
|
||
| 0.2 | 2026-01-17 | ruv.io | Added security hardening, performance optimization |
|
||
| 0.3 | 2026-01-17 | ruv.io | Added 256-tile WASM fabric mapping |
|
||
| 0.4 | 2026-01-17 | ruv.io | Added API contract, migration, observability |
|
||
| 0.5 | 2026-01-17 | ruv.io | Added hybrid agent/human workflow |
|
||
| 0.6 | 2026-01-17 | ruv.io | Added testing strategy, config format, error recovery |
|
||
|
||
## Plain Language Summary
|
||
|
||
**What is it?**
|
||
|
||
An Anytime-Valid Coherence Gate is a small control loop that decides, at any moment:
|
||
|
||
> "Is it safe to act right now, or should we pause or escalate?"
|
||
|
||
It does not try to be smart. It tries to be **safe**, **calm**, and **correct** about permission.
|
||
|
||
**Why "anytime-valid"?**
|
||
|
||
Because you can stop the computation at any time and still trust the decision.
|
||
|
||
Like a smoke detector:
|
||
- It can keep listening forever
|
||
- The moment it has enough evidence, it triggers
|
||
- If you stop listening early, whatever it already concluded is still valid
|
||
|
||
You are not waiting for a model to finish thinking. You are continuously monitoring stability.
|
||
|
||
**Why "coherence"?**
|
||
|
||
Coherence means: does the system's current state agree with itself?
|
||
|
||
In RuVector, coherence is measured from structure:
|
||
- RuVector holds relationships as vectors plus a graph
|
||
- Min-cut and boundary signals tell you when the graph is becoming fragile or splitting into conflicting regions
|
||
- If the system is splitting, you do not let it take big actions
|
||
|
||
**What it outputs:**
|
||
|
||
| Decision | Meaning |
|
||
|----------|---------|
|
||
| **Permit** | Stable enough, proceed |
|
||
| **Defer** | Uncertain, escalate to a stronger model or human |
|
||
| **Deny** | Unstable or policy-violating, block the action |
|
||
|
||
Every decision returns a short "receipt" explaining why.
|
||
|
||
**A concrete example:**
|
||
|
||
An agent wants to push a config change to a network device.
|
||
- If the dependency graph is stable and similar changes worked before → **Permit**
|
||
- If signals are weird (new dependencies, new actors, drift) → **Defer** and ask for confirmation
|
||
- If the change crosses a fragile boundary (touches a partition already unstable) → **Deny**
|
||
|
||
**Why it matters:**
|
||
|
||
It turns autonomy into something enterprises can trust because:
|
||
- Actions are bounded
|
||
- Uncertainty is handled explicitly
|
||
- You get an audit trail
|
||
|
||
*"Attention becomes a permission system, not a popularity contest"* — applied to whole-system actions instead of token attention.
|
||
|
||
---
|
||
|
||
## Context
|
||
|
||
The RuVector ecosystem requires a principled mechanism for controlling autonomous agent actions with:
|
||
- **Formal safety guarantees** under distribution shift
|
||
- **Computational efficiency** suitable for real-time enforcement
|
||
- **Auditable decision trails** with cryptographic receipts
|
||
|
||
Current approaches (threshold classifiers, rule-based systems, periodic audits) lack one or more of these properties. This ADR proposes the **Anytime-Valid Coherence Gate (AVCG)** - a 3-way algorithmic combination that converts coherence measurement into a deterministic control loop.
|
||
|
||
## Decision
|
||
|
||
We will implement an Anytime-Valid Coherence Gate that integrates three cutting-edge algorithmic components:
|
||
|
||
### 1. Dynamic Min-Cut with Witness Partitions
|
||
|
||
**Source**: El-Hayek, Henzinger, Li (arXiv:2512.13105, December 2025)
|
||
|
||
**Key Innovation**: Exact deterministic n^{o(1)} update time for cuts up to 2^{Θ(log^{3/4-c}n)}
|
||
|
||
**Integration**:
|
||
- Extends existing `SubpolynomialMinCut` in `ruvector-mincut/src/subpolynomial/mod.rs`
|
||
- Leverages existing `WitnessTree` for explicit partition certificates
|
||
- Uses deterministic `LocalKCut` for local cut verification
|
||
|
||
**Role in Gate**: Provides the **structural coherence signal** - identifies minimal intervention points in the agent action graph with explicit witness partitions showing which actions form the critical boundary to unsafe states.
|
||
|
||
### 2. Online Conformal Prediction with Shift-Awareness
|
||
|
||
**Sources**:
|
||
- Retrospective Adjustment (arXiv:2511.04275, November 2025)
|
||
- Conformal Optimistic Prediction (COP) (December 2025)
|
||
- CORE: RL-based Conformal Regression (October 2025)
|
||
|
||
**Key Innovation**: Distribution-free coverage guarantees that adapt to arbitrary distribution shift with faster recalibration via retrospective adjustment.
|
||
|
||
**Integration**:
|
||
- New module: `ruvector-mincut/src/conformal/` for prediction sets
|
||
- Interfaces with existing `GatePolicy` thresholds
|
||
- Wraps action outcome predictions with calibrated uncertainty
|
||
|
||
**Role in Gate**: Provides the **predictive uncertainty signal** - quantifies confidence in action outcomes, triggering DEFER when prediction sets are too large.
|
||
|
||
### 3. E-Values and E-Processes for Anytime-Valid Inference
|
||
|
||
**Sources**:
|
||
- Ramdas & Wang "Hypothesis Testing with E-values" (FnTStA 2025)
|
||
- ICML 2025 Tutorial on SAVI
|
||
- Sequential Randomization Tests (arXiv:2512.04366, December 2025)
|
||
|
||
**Key Innovation**: Evidence accumulation that remains valid at any stopping time, with multiplicative composition across experiments.
|
||
|
||
**Definition**: E-value e satisfies E[e] ≤ 1 under null hypothesis. E-processes are nonnegative supermartingales with E_0 = 1.
|
||
|
||
**Integration**:
|
||
- New module: `ruvector-mincut/src/eprocess/` for evidence tracking
|
||
- Integrates with existing `CutCertificate` for audit trails
|
||
- Enables anytime-valid stopping decisions
|
||
|
||
**Role in Gate**: Provides the **evidential validity signal** - accumulates statistical evidence for/against coherence with formal Type I error control at any stopping time.
|
||
|
||
## Gate Architecture
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ ANYTIME-VALID COHERENCE GATE │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
||
│ │ DYNAMIC MIN-CUT │ │ CONFORMAL │ │ E-PROCESS │ │
|
||
│ │ (Structural) │ │ (Predictive) │ │ (Evidential) │ │
|
||
│ │ │ │ │ │ │ │
|
||
│ │ SubpolynomialMC │ │ ShiftAdaptive │ │ CoherenceTest │ │
|
||
│ │ WitnessTree │───▶│ PredictionSet │───▶│ EvidenceAccum │ │
|
||
│ │ LocalKCut │ │ COP/CORE │ │ StoppingRule │ │
|
||
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
||
│ │ │ │ │
|
||
│ ▼ ▼ ▼ │
|
||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||
│ │ DECISION LOGIC │ │
|
||
│ │ │ │
|
||
│ │ PERMIT: E_t > τ_permit ∧ action ∉ CriticalCut ∧ |C_t| small │ │
|
||
│ │ DEFER: |C_t| large ∨ τ_deny < E_t < τ_permit │ │
|
||
│ │ DENY: E_t < τ_deny ∨ action ∈ WitnessPartition(unsafe) │ │
|
||
│ │ │ │
|
||
│ └────────────────────────────────────────────────────────────────┘ │
|
||
│ │ │
|
||
│ ▼ │
|
||
│ ┌─────────────────────┐ │
|
||
│ │ WITNESS RECEIPT │ │
|
||
│ │ (cut + conf + e) │ │
|
||
│ └─────────────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
## Integration with Existing Architecture
|
||
|
||
### Extension Points
|
||
|
||
| Component | Current Implementation | AVCG Extension |
|
||
|-----------|----------------------|----------------|
|
||
| `GatePacket` | λ as point estimate | Add `lambda_confidence_q15`, `e_value_log_q15` |
|
||
| `GateController` | Rule-based thresholds | Add `AnytimeGatePolicy` with adaptive thresholds |
|
||
| `WitnessTree` | Cut value only | Add `ConfidenceWitness` with staleness tracking |
|
||
| `CutCertificate` | Static verification | Add `EvidenceReceipt` with e-value trace |
|
||
| `TierDecision` | Fixed tiers | Add `required_confidence_for_tier` |
|
||
|
||
### New Modules
|
||
|
||
```
|
||
ruvector-mincut/
|
||
├── src/
|
||
│ ├── conformal/ # NEW: Online conformal prediction
|
||
│ │ ├── mod.rs
|
||
│ │ ├── prediction_set.rs
|
||
│ │ ├── cop.rs # Conformal Optimistic Prediction
|
||
│ │ ├── retrospective.rs # Retrospective adjustment
|
||
│ │ └── core.rs # RL-based conformal
|
||
│ ├── eprocess/ # NEW: E-value and e-process tracking
|
||
│ │ ├── mod.rs
|
||
│ │ ├── evalue.rs
|
||
│ │ ├── evidence_accum.rs
|
||
│ │ ├── stopping.rs
|
||
│ │ └── mixture.rs
|
||
│ ├── anytime_gate/ # NEW: Integrated gate controller
|
||
│ │ ├── mod.rs
|
||
│ │ ├── policy.rs
|
||
│ │ ├── decision.rs
|
||
│ │ └── receipt.rs
|
||
│ └── ...existing modules...
|
||
```
|
||
|
||
## Decision Rules
|
||
|
||
### Permit Conditions (all must hold)
|
||
1. E-process value E_t > τ_permit (sufficient evidence of coherence)
|
||
2. Action not in witness partition of critical cut
|
||
3. Conformal prediction set |C_t| < θ_confidence (confident prediction)
|
||
|
||
### Defer Conditions (any triggers)
|
||
1. Conformal prediction set |C_t| > θ_uncertainty (uncertain outcome)
|
||
2. E-process in indeterminate range: τ_deny < E_t < τ_permit
|
||
3. Deadline approaching without sufficient confidence
|
||
|
||
### Deny Conditions (any triggers)
|
||
1. E-process value E_t < τ_deny (strong evidence of incoherence)
|
||
2. Action in witness partition crossing to unsafe states
|
||
3. Structural impossibility via min-cut topology
|
||
|
||
## Threshold Configuration
|
||
|
||
| Threshold | Meaning | Recommended Default |
|
||
|-----------|---------|---------------------|
|
||
| τ_deny | E-process level indicating incoherence | 0.01 (1% false alarm) |
|
||
| τ_permit | E-process level indicating coherence | 100 (strong evidence) |
|
||
| θ_uncertainty | Conformal set size requiring deferral | Task-dependent |
|
||
| θ_confidence | Conformal set size for confident permit | Task-dependent |
|
||
|
||
## Witness Receipt Structure
|
||
|
||
```rust
|
||
pub struct WitnessReceipt {
|
||
/// Timestamp of decision
|
||
pub timestamp: u64,
|
||
/// Action that was evaluated
|
||
pub action_id: ActionId,
|
||
/// Gate decision
|
||
pub decision: GateDecision,
|
||
|
||
// Structural witness (from min-cut)
|
||
pub cut_value: f64,
|
||
pub witness_partition: (Vec<VertexId>, Vec<VertexId>),
|
||
pub critical_edges: Vec<EdgeId>,
|
||
|
||
// Predictive witness (from conformal)
|
||
pub prediction_set: ConformalSet,
|
||
pub coverage_target: f32,
|
||
pub shift_adaptation_rate: f32,
|
||
|
||
// Evidential witness (from e-process)
|
||
pub e_value: f64,
|
||
pub e_process_cumulative: f64,
|
||
pub stopping_valid: bool,
|
||
|
||
// Cryptographic seal
|
||
pub receipt_hash: [u8; 32],
|
||
}
|
||
```
|
||
|
||
## Security Hardening
|
||
|
||
### Threat Model
|
||
|
||
| Threat Actor | Capabilities | Target | Impact |
|
||
|--------------|--------------|--------|--------|
|
||
| **Malicious Agent** | Action injection, timing manipulation | Gate bypass | Unauthorized actions executed |
|
||
| **Network Adversary** | Message interception, replay | Receipt forgery | False audit trail |
|
||
| **Insider Threat** | Threshold modification, key access | Policy manipulation | Safety guarantees voided |
|
||
| **Byzantine Node** | Arbitrary behavior in distributed gate | Consensus corruption | Inconsistent decisions |
|
||
|
||
### Cryptographic Requirements
|
||
|
||
#### Receipt Signing (CRITICAL)
|
||
|
||
```rust
|
||
pub struct WitnessReceipt {
|
||
// ... existing fields ...
|
||
|
||
// Cryptographic seal (REQUIRED)
|
||
pub receipt_hash: [u8; 32], // Blake3 hash of serialized content
|
||
pub signature: Ed25519Signature, // REQUIRED, not optional
|
||
pub signer_id: PublicKey, // Identity of signing gate
|
||
pub timestamp_proof: TimestampProof, // Prevents backdating
|
||
}
|
||
|
||
/// Timestamp proof prevents replay and backdating
|
||
pub struct TimestampProof {
|
||
pub timestamp: u64,
|
||
pub previous_receipt_hash: [u8; 32], // Chain linkage
|
||
pub merkle_root: [u8; 32], // Batch anchor
|
||
}
|
||
|
||
impl WitnessReceipt {
|
||
/// Sign receipt - MUST be called before any external use
|
||
pub fn sign(&mut self, key: &SigningKey) -> Result<(), CryptoError> {
|
||
let content = self.serialize_without_signature();
|
||
self.receipt_hash = blake3::hash(&content).into();
|
||
self.signature = key.sign(&self.receipt_hash);
|
||
Ok(())
|
||
}
|
||
|
||
/// Verify receipt integrity and authenticity
|
||
pub fn verify(&self, trusted_keys: &KeyStore) -> Result<(), VerifyError> {
|
||
// 1. Verify hash
|
||
let expected_hash = blake3::hash(&self.serialize_without_signature());
|
||
if self.receipt_hash != expected_hash.into() {
|
||
return Err(VerifyError::HashMismatch);
|
||
}
|
||
|
||
// 2. Verify signature
|
||
let public_key = trusted_keys.get(&self.signer_id)?;
|
||
public_key.verify(&self.receipt_hash, &self.signature)?;
|
||
|
||
// 3. Verify timestamp chain
|
||
self.timestamp_proof.verify()?;
|
||
|
||
Ok(())
|
||
}
|
||
}
|
||
```
|
||
|
||
#### Key Management
|
||
|
||
| Key Type | Purpose | Rotation | Storage |
|
||
|----------|---------|----------|---------|
|
||
| Gate Signing Key | Sign receipts | 30 days | HSM or secure enclave |
|
||
| Receipt Verification Keys | Verify receipts | On rotation | Distributed key store |
|
||
| Threshold Keys | Multi-party signing | 90 days | Shamir secret sharing |
|
||
|
||
### Attack Mitigations
|
||
|
||
#### E-Value Manipulation Prevention
|
||
|
||
```rust
|
||
/// Bounds checking for e-value inputs
|
||
impl EValue {
|
||
pub fn from_likelihood_ratio(
|
||
likelihood_h1: f64,
|
||
likelihood_h0: f64,
|
||
) -> Result<Self, EValueError> {
|
||
// Prevent division by zero
|
||
if likelihood_h0 <= f64::EPSILON {
|
||
return Err(EValueError::InvalidDenominator);
|
||
}
|
||
|
||
let ratio = likelihood_h1 / likelihood_h0;
|
||
|
||
// Bound extreme values to prevent overflow attacks
|
||
let bounded = ratio.clamp(E_VALUE_MIN, E_VALUE_MAX);
|
||
|
||
// Log if clamping occurred (potential attack indicator)
|
||
if (bounded - ratio).abs() > f64::EPSILON {
|
||
security_log!("E-value clamped: {} -> {}", ratio, bounded);
|
||
}
|
||
|
||
Ok(Self { value: bounded, ..Default::default() })
|
||
}
|
||
}
|
||
|
||
const E_VALUE_MIN: f64 = 1e-10;
|
||
const E_VALUE_MAX: f64 = 1e10;
|
||
```
|
||
|
||
#### Race Condition Prevention
|
||
|
||
```rust
|
||
/// Atomic gate decision with sequence numbers
|
||
pub struct AtomicGateDecision {
|
||
/// Monotonic sequence for ordering
|
||
sequence: AtomicU64,
|
||
/// Lock for decision atomicity
|
||
decision_lock: RwLock<()>,
|
||
}
|
||
|
||
impl AtomicGateDecision {
|
||
pub async fn evaluate(&self, action: &Action) -> GateResult {
|
||
// Acquire exclusive lock for decision
|
||
let _guard = self.decision_lock.write().await;
|
||
|
||
// Get sequence number BEFORE evaluation
|
||
let seq = self.sequence.fetch_add(1, Ordering::SeqCst);
|
||
|
||
// Evaluate all three signals atomically
|
||
let result = self.evaluate_internal(action, seq).await;
|
||
|
||
// Sequence number in receipt ensures ordering
|
||
result.with_sequence(seq)
|
||
}
|
||
}
|
||
```
|
||
|
||
#### Replay Attack Prevention
|
||
|
||
```rust
|
||
/// Replay prevention via nonce tracking
|
||
pub struct ReplayGuard {
|
||
/// Recent action hashes (bloom filter for efficiency)
|
||
recent_actions: BloomFilter,
|
||
/// Sliding window of full hashes for false positive resolution
|
||
hash_window: VecDeque<[u8; 32]>,
|
||
/// Maximum age of tracked actions
|
||
window_duration: Duration,
|
||
}
|
||
|
||
impl ReplayGuard {
|
||
pub fn check_and_record(&mut self, action: &Action) -> Result<(), ReplayError> {
|
||
let hash = action.content_hash();
|
||
|
||
// Fast path: bloom filter check
|
||
if self.recent_actions.might_contain(&hash) {
|
||
// Slow path: verify against full hash window
|
||
if self.hash_window.contains(&hash) {
|
||
return Err(ReplayError::DuplicateAction { hash });
|
||
}
|
||
}
|
||
|
||
// Record action
|
||
self.recent_actions.insert(&hash);
|
||
self.hash_window.push_back(hash);
|
||
self.prune_old_entries();
|
||
|
||
Ok(())
|
||
}
|
||
}
|
||
```
|
||
|
||
### Trust Boundaries
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ TRUST BOUNDARY: GATE CORE │
|
||
│ ┌───────────────────────────────────────────────────────────────────┐ │
|
||
│ │ • E-process computation • Min-cut evaluation │ │
|
||
│ │ • Conformal prediction • Decision logic │ │
|
||
│ │ • Receipt signing • Key material │ │
|
||
│ │ │ │
|
||
│ │ Invariants: │ │
|
||
│ │ - All inputs validated before use │ │
|
||
│ │ - All outputs signed before release │ │
|
||
│ │ - No external calls during decision │ │
|
||
│ └───────────────────────────────────────────────────────────────────┘ │
|
||
│ │ │
|
||
│ (authenticated channel) │
|
||
│ │ │
|
||
└────────────────────────────────────┼────────────────────────────────────┘
|
||
│
|
||
┌────────────────────────────────────┼────────────────────────────────────┐
|
||
│ TRUST BOUNDARY: AGENT INTERFACE │
|
||
│ │ │
|
||
│ • Action submission (validated) │ • Decision receipt (verified) │
|
||
│ • Context provision (sanitized) │ • Witness query (authenticated) │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## Performance Optimization
|
||
|
||
### Identified Bottlenecks & Solutions
|
||
|
||
#### 1. E-Process History Management
|
||
|
||
**Problem**: Unbounded history growth in `EProcess.history: Vec<EValue>`
|
||
|
||
**Solution**: Ring buffer with configurable retention
|
||
|
||
```rust
|
||
pub struct EProcess {
|
||
/// Current accumulated value (always maintained)
|
||
current: f64,
|
||
|
||
/// Bounded history ring buffer
|
||
history: RingBuffer<EValueSummary>,
|
||
|
||
/// Checkpoint for long-term audit (sampled)
|
||
checkpoints: Vec<EProcessCheckpoint>,
|
||
}
|
||
|
||
/// Compact summary for history
|
||
pub struct EValueSummary {
|
||
value: f32, // Reduced precision for storage
|
||
timestamp: u32, // Relative to epoch
|
||
flags: u8, // Metadata bits
|
||
}
|
||
|
||
impl EProcess {
|
||
const HISTORY_CAPACITY: usize = 1024;
|
||
const CHECKPOINT_INTERVAL: usize = 100;
|
||
|
||
pub fn update(&mut self, e: EValue) {
|
||
// Update current (always)
|
||
self.current = self.update_rule.apply(self.current, e.value);
|
||
|
||
// Add to ring buffer (bounded)
|
||
self.history.push(e.to_summary());
|
||
|
||
// Periodic checkpoint for audit
|
||
if self.history.len() % Self::CHECKPOINT_INTERVAL == 0 {
|
||
self.checkpoints.push(self.checkpoint());
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
#### 2. Min-Cut Hierarchy Updates
|
||
|
||
**Problem**: Sequential iteration over all hierarchy levels
|
||
|
||
**Solution**: Lazy propagation with dirty tracking
|
||
|
||
```rust
|
||
pub struct LazyHierarchy {
|
||
levels: Vec<HierarchyLevel>,
|
||
/// Bitmap of levels needing update
|
||
dirty_levels: u64,
|
||
/// Deferred updates queue
|
||
pending_updates: VecDeque<DeferredUpdate>,
|
||
}
|
||
|
||
impl LazyHierarchy {
|
||
pub fn insert(&mut self, edge: Edge) {
|
||
// Only update lowest level immediately
|
||
self.levels[0].insert(edge);
|
||
self.dirty_levels |= 1;
|
||
|
||
// Defer higher level updates
|
||
self.pending_updates.push_back(DeferredUpdate::Insert(edge));
|
||
}
|
||
|
||
pub fn get_cut(&mut self) -> CutValue {
|
||
// Propagate only if needed for query
|
||
if self.dirty_levels != 0 {
|
||
self.propagate_lazy();
|
||
}
|
||
self.levels.last().unwrap().cut_value()
|
||
}
|
||
|
||
fn propagate_lazy(&mut self) {
|
||
// Process only dirty levels
|
||
while self.dirty_levels != 0 {
|
||
let level = self.dirty_levels.trailing_zeros() as usize;
|
||
self.update_level(level);
|
||
self.dirty_levels &= !(1 << level);
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
#### 3. SIMD-Optimized E-Value Computation
|
||
|
||
```rust
|
||
#[cfg(target_arch = "x86_64")]
|
||
use std::arch::x86_64::*;
|
||
|
||
/// Batch e-value computation with SIMD
|
||
pub fn compute_mixture_evalue_simd(
|
||
likelihoods_h1: &[f64],
|
||
likelihoods_h0: &[f64],
|
||
weights: &[f64],
|
||
) -> f64 {
|
||
assert_eq!(likelihoods_h1.len(), likelihoods_h0.len());
|
||
assert_eq!(likelihoods_h1.len(), weights.len());
|
||
|
||
#[cfg(target_feature = "avx2")]
|
||
unsafe {
|
||
let mut sum = _mm256_setzero_pd();
|
||
|
||
for i in (0..likelihoods_h1.len()).step_by(4) {
|
||
let h1 = _mm256_loadu_pd(likelihoods_h1.as_ptr().add(i));
|
||
let h0 = _mm256_loadu_pd(likelihoods_h0.as_ptr().add(i));
|
||
let w = _mm256_loadu_pd(weights.as_ptr().add(i));
|
||
|
||
let ratio = _mm256_div_pd(h1, h0);
|
||
let weighted = _mm256_mul_pd(ratio, w);
|
||
sum = _mm256_add_pd(sum, weighted);
|
||
}
|
||
|
||
// Horizontal sum
|
||
horizontal_sum_pd(sum)
|
||
}
|
||
|
||
#[cfg(not(target_feature = "avx2"))]
|
||
{
|
||
// Scalar fallback
|
||
likelihoods_h1.iter()
|
||
.zip(likelihoods_h0.iter())
|
||
.zip(weights.iter())
|
||
.map(|((h1, h0), w)| (h1 / h0) * w)
|
||
.sum()
|
||
}
|
||
}
|
||
```
|
||
|
||
#### 4. Receipt Serialization Optimization
|
||
|
||
```rust
|
||
/// Zero-copy receipt serialization
|
||
pub struct ReceiptBuffer {
|
||
/// Pre-allocated buffer pool
|
||
pool: BufferPool,
|
||
/// Current buffer
|
||
current: Buffer,
|
||
}
|
||
|
||
impl WitnessReceipt {
|
||
/// Serialize to pre-allocated buffer (zero-copy)
|
||
pub fn serialize_into(&self, buffer: &mut [u8]) -> Result<usize, SerializeError> {
|
||
let mut cursor = 0;
|
||
|
||
// Fixed-size header (no allocation)
|
||
cursor += self.write_header(&mut buffer[cursor..])?;
|
||
|
||
// Structural witness (fixed size)
|
||
cursor += self.structural.write_to(&mut buffer[cursor..])?;
|
||
|
||
// Predictive witness (bounded size)
|
||
cursor += self.predictive.write_to(&mut buffer[cursor..])?;
|
||
|
||
// Evidential witness (fixed size)
|
||
cursor += self.evidential.write_to(&mut buffer[cursor..])?;
|
||
|
||
// Hash and signature (fixed size)
|
||
buffer[cursor..cursor + 32].copy_from_slice(&self.receipt_hash);
|
||
cursor += 32;
|
||
buffer[cursor..cursor + 64].copy_from_slice(&self.signature.to_bytes());
|
||
cursor += 64;
|
||
|
||
Ok(cursor)
|
||
}
|
||
}
|
||
```
|
||
|
||
### Latency Budget (Revised)
|
||
|
||
| Component | Budget | Optimization | Measured p99 |
|
||
|-----------|--------|--------------|--------------|
|
||
| Min-cut query | 10ms | Lazy propagation | TBD |
|
||
| Conformal prediction | 15ms | Cached quantiles | TBD |
|
||
| E-process update | 5ms | SIMD mixture | TBD |
|
||
| Decision logic | 5ms | Short-circuit | TBD |
|
||
| Receipt generation | 10ms | Zero-copy serialize | TBD |
|
||
| Signing | 5ms | Ed25519 batch | TBD |
|
||
| **Total** | **50ms** | | |
|
||
|
||
---
|
||
|
||
## Distributed Coordination
|
||
|
||
### Multi-Agent Gate Architecture
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ DISTRIBUTED COHERENCE GATE │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
|
||
│ │ REGIONAL │ │ REGIONAL │ │ REGIONAL │ │
|
||
│ │ GATE (Raft) │ │ GATE (Raft) │ │ GATE (Raft) │ │
|
||
│ │ │ │ │ │ │ │
|
||
│ │ • Local cuts │ │ • Local cuts │ │ • Local cuts │ │
|
||
│ │ • Local conf │ │ • Local conf │ │ • Local conf │ │
|
||
│ │ • Local e-proc │ │ • Local e-proc │ │ • Local e-proc │ │
|
||
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
|
||
│ │ │ │ │
|
||
│ └──────────────────────┼──────────────────────┘ │
|
||
│ │ │
|
||
│ ┌─────────────▼─────────────┐ │
|
||
│ │ GLOBAL COORDINATOR │ │
|
||
│ │ (DAG Consensus) │ │
|
||
│ │ │ │
|
||
│ │ • Cross-region cuts │ │
|
||
│ │ • Aggregated e-process │ │
|
||
│ │ • Boundary arbitration │ │
|
||
│ └───────────────────────────┘ │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### Hierarchical Decision Protocol
|
||
|
||
```rust
|
||
/// Distributed gate with hierarchical coordination
|
||
pub struct DistributedGateController {
|
||
/// Local gate for fast-path decisions
|
||
local_gate: AnytimeGateController,
|
||
|
||
/// Regional coordinator (Raft consensus)
|
||
regional: RegionalCoordinator,
|
||
|
||
/// Global coordinator (DAG consensus)
|
||
global: GlobalCoordinator,
|
||
|
||
/// Decision routing policy
|
||
routing: DecisionRoutingPolicy,
|
||
}
|
||
|
||
pub enum DecisionScope {
|
||
/// Action affects only local partition
|
||
Local,
|
||
/// Action crosses regional boundary
|
||
Regional,
|
||
/// Action has global implications
|
||
Global,
|
||
}
|
||
|
||
impl DistributedGateController {
|
||
pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
|
||
// 1. Determine scope
|
||
let scope = self.routing.classify(action, context);
|
||
|
||
// 2. Route to appropriate level
|
||
match scope {
|
||
DecisionScope::Local => {
|
||
// Fast path: local decision only
|
||
self.local_gate.evaluate(action, context)
|
||
}
|
||
|
||
DecisionScope::Regional => {
|
||
// Medium path: coordinate with regional peers
|
||
let local_result = self.local_gate.evaluate(action, context);
|
||
let regional_result = self.regional.coordinate(action, &local_result).await?;
|
||
self.merge_results(local_result, regional_result)
|
||
}
|
||
|
||
DecisionScope::Global => {
|
||
// Slow path: full coordination
|
||
let local_result = self.local_gate.evaluate(action, context);
|
||
let regional_result = self.regional.coordinate(action, &local_result).await?;
|
||
let global_result = self.global.arbitrate(action, ®ional_result).await?;
|
||
self.merge_all_results(local_result, regional_result, global_result)
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### Distributed E-Process Aggregation
|
||
|
||
```rust
|
||
/// E-process that aggregates across distributed gates
|
||
pub struct DistributedEProcess {
|
||
/// Local e-process
|
||
local: EProcess,
|
||
|
||
/// Peer e-process summaries (received via gossip)
|
||
peer_summaries: HashMap<NodeId, EProcessSummary>,
|
||
|
||
/// Aggregation method
|
||
aggregation: AggregationMethod,
|
||
}
|
||
|
||
pub enum AggregationMethod {
|
||
/// Conservative: minimum across all nodes
|
||
Minimum,
|
||
/// Average with confidence weighting
|
||
WeightedAverage,
|
||
/// Consensus-based (requires agreement)
|
||
Consensus { threshold: f64 },
|
||
}
|
||
|
||
impl DistributedEProcess {
|
||
/// Get aggregated e-value for distributed decision
|
||
pub fn aggregated_value(&self) -> f64 {
|
||
match self.aggregation {
|
||
AggregationMethod::Minimum => {
|
||
let local = self.local.current_value();
|
||
let peer_min = self.peer_summaries.values()
|
||
.map(|s| s.current_value)
|
||
.fold(f64::INFINITY, f64::min);
|
||
local.min(peer_min)
|
||
}
|
||
|
||
AggregationMethod::WeightedAverage => {
|
||
let total_weight: f64 = 1.0 + self.peer_summaries.values()
|
||
.map(|s| s.confidence_weight)
|
||
.sum::<f64>();
|
||
|
||
let weighted_sum = self.local.current_value()
|
||
+ self.peer_summaries.values()
|
||
.map(|s| s.current_value * s.confidence_weight)
|
||
.sum::<f64>();
|
||
|
||
weighted_sum / total_weight
|
||
}
|
||
|
||
AggregationMethod::Consensus { threshold } => {
|
||
// Requires threshold fraction of nodes to agree
|
||
let values: Vec<f64> = std::iter::once(self.local.current_value())
|
||
.chain(self.peer_summaries.values().map(|s| s.current_value))
|
||
.collect();
|
||
|
||
// Return median if sufficient agreement, else conservative min
|
||
if self.check_agreement(&values, threshold) {
|
||
statistical_median(&values)
|
||
} else {
|
||
values.iter().cloned().fold(f64::INFINITY, f64::min)
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### Fault Tolerance
|
||
|
||
```rust
|
||
/// Fault-tolerant gate with automatic failover
|
||
pub struct FaultTolerantGate {
|
||
/// Primary gate
|
||
primary: AnytimeGateController,
|
||
|
||
/// Standby gates (hot standbys)
|
||
standbys: Vec<AnytimeGateController>,
|
||
|
||
/// Health monitor
|
||
health: HealthMonitor,
|
||
|
||
/// Failover policy
|
||
failover: FailoverPolicy,
|
||
}
|
||
|
||
pub struct FailoverPolicy {
|
||
/// Maximum consecutive failures before failover
|
||
max_failures: u32,
|
||
/// Health check interval
|
||
check_interval: Duration,
|
||
/// Recovery grace period
|
||
recovery_grace: Duration,
|
||
}
|
||
|
||
impl FaultTolerantGate {
|
||
pub async fn evaluate(&mut self, action: &Action, context: &Context) -> GateResult {
|
||
// Try primary
|
||
match self.try_primary(action, context).await {
|
||
Ok(result) => return Ok(result),
|
||
Err(e) => {
|
||
self.health.record_failure(&e);
|
||
}
|
||
}
|
||
|
||
// Failover to standbys
|
||
for (idx, standby) in self.standbys.iter_mut().enumerate() {
|
||
match standby.evaluate(action, context) {
|
||
Ok(result) => {
|
||
// Promote standby if primary unhealthy
|
||
if self.health.should_failover() {
|
||
self.promote_standby(idx);
|
||
}
|
||
return Ok(result);
|
||
}
|
||
Err(e) => {
|
||
self.health.record_standby_failure(idx, &e);
|
||
}
|
||
}
|
||
}
|
||
|
||
// All gates failed - safe default
|
||
Ok(GateResult {
|
||
decision: GateDecision::Deny,
|
||
reason: "All gates unavailable - failing safe".into(),
|
||
..Default::default()
|
||
})
|
||
}
|
||
}
|
||
```
|
||
|
||
### Integration with RuVector Consensus
|
||
|
||
| Consensus Layer | RuVector Module | Gate Integration |
|
||
|-----------------|-----------------|------------------|
|
||
| Regional (Raft) | `ruvector-raft` | Local cut coordination, leader-based decisions |
|
||
| Global (DAG) | `ruvector-cluster` | Cross-region boundary arbitration |
|
||
| State Sync | `ruvector-sync` | E-process summary propagation |
|
||
| Receipt Chain | `ruvector-merkle` | Distributed receipt verification |
|
||
|
||
---
|
||
|
||
## Hardware Mapping: 256-Tile WASM Fabric
|
||
|
||
The coherence gate is an ideal workload for event-driven WASM hardware: **mostly silent, then extremely decisive when boundaries move**.
|
||
|
||
### Tile Architecture
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ 256-TILE COGNITUM FABRIC │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
||
│ │ TILE ZERO (Arbiter) │ │
|
||
│ │ │ │
|
||
│ │ • Merge worker reports • Hierarchical min-cut │ │
|
||
│ │ • Global gate decision • Permit token issuance │ │
|
||
│ │ • Witness receipt log • Hash-chained eventlog │ │
|
||
│ └──────────────────────────────┬───────────────────────────────────┘ │
|
||
│ │ │
|
||
│ ┌────────────────────┼────────────────────┐ │
|
||
│ │ │ │ │
|
||
│ ▼ ▼ ▼ │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ Workers │ │ Workers │ │ Workers │ ... │
|
||
│ │ [1-85] │ │ [86-170] │ │ [171-255] │ │
|
||
│ │ │ │ │ │ │ │
|
||
│ │ Shard A │ │ Shard B │ │ Shard C │ │
|
||
│ │ Local cuts │ │ Local cuts │ │ Local cuts │ │
|
||
│ │ E-accum │ │ E-accum │ │ E-accum │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### Worker Tile Responsibilities
|
||
|
||
Each of the 255 worker tiles maintains a **local shard**:
|
||
|
||
```rust
|
||
/// Worker tile state (fits in ~64KB WASM memory)
|
||
#[repr(C)]
|
||
pub struct WorkerTileState {
|
||
/// Compact neighborhood graph (edges + weights)
|
||
graph_shard: CompactGraph, // ~32KB
|
||
|
||
/// Rolling feature window for normality scores
|
||
feature_window: RingBuffer<f32>, // ~8KB
|
||
|
||
/// Local coherence score
|
||
coherence: f32,
|
||
|
||
/// Local boundary candidates (top-k edges)
|
||
boundary_edges: [EdgeId; 8],
|
||
|
||
/// Local e-value accumulator
|
||
e_accumulator: f64,
|
||
|
||
/// Tick counter
|
||
tick: u64,
|
||
}
|
||
|
||
/// Per-tick processing: only deltas
|
||
impl WorkerTileState {
|
||
/// Process incoming delta (edge add/remove/weight update)
|
||
pub fn ingest_delta(&mut self, delta: &Delta) -> Status {
|
||
match delta {
|
||
Delta::EdgeAdd(e) => self.graph_shard.add_edge(e),
|
||
Delta::EdgeRemove(e) => self.graph_shard.remove_edge(e),
|
||
Delta::WeightUpdate(e, w) => self.graph_shard.update_weight(e, *w),
|
||
Delta::Observation(score) => self.feature_window.push(*score),
|
||
}
|
||
self.update_local_coherence();
|
||
Status::Ok
|
||
}
|
||
|
||
/// Tick: compute and emit report
|
||
pub fn tick(&mut self, now_ns: u64) -> TileReport {
|
||
self.tick = now_ns;
|
||
|
||
// Tiny math: update e-accumulator
|
||
self.e_accumulator = self.compute_local_evalue();
|
||
|
||
TileReport {
|
||
tile_id: self.id,
|
||
coherence: self.coherence,
|
||
boundary_moved: self.detect_boundary_movement(),
|
||
suspicious_edges: self.top_k_suspicious(),
|
||
e_value: self.e_accumulator as f32,
|
||
witness_fragment: self.extract_witness_fragment(),
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Fixed-size report (fits in single cache line)
|
||
#[repr(C, align(64))]
|
||
pub struct TileReport {
|
||
tile_id: u8,
|
||
coherence: f32,
|
||
boundary_moved: bool,
|
||
suspicious_edges: [EdgeId; 4],
|
||
e_value: f32,
|
||
witness_fragment: WitnessFragment,
|
||
}
|
||
```
|
||
|
||
### TileZero Responsibilities
|
||
|
||
TileZero acts as the **arbiter** that issues final decisions:
|
||
|
||
```rust
|
||
/// TileZero: Global gate decision and permit issuance
|
||
pub struct TileZero {
|
||
/// Merged supergraph (reduced from worker summaries)
|
||
supergraph: ReducedGraph,
|
||
|
||
/// Canonical permit token state
|
||
permit_state: PermitState,
|
||
|
||
/// Hash-chained witness receipt log
|
||
receipt_log: ReceiptLog,
|
||
|
||
/// Threshold configuration
|
||
thresholds: GateThresholds,
|
||
}
|
||
|
||
impl TileZero {
|
||
/// Collect reports from all worker tiles
|
||
pub fn collect_reports(&mut self, reports: &[TileReport; 255]) {
|
||
// Merge worker summaries into supergraph
|
||
for report in reports {
|
||
if report.boundary_moved {
|
||
self.supergraph.update_from_fragment(&report.witness_fragment);
|
||
}
|
||
self.supergraph.update_coherence(report.tile_id, report.coherence);
|
||
}
|
||
}
|
||
|
||
/// Issue gate decision (microsecond latency)
|
||
pub fn decide(&mut self, action_ctx: &ActionContext) -> PermitToken {
|
||
// Three stacked filters:
|
||
|
||
// 1. Structural filter (global cut on reduced graph)
|
||
let structural_ok = self.supergraph.global_cut() >= self.thresholds.min_cut;
|
||
|
||
// 2. Shift filter (aggregated shift pressure)
|
||
let shift_pressure = self.aggregate_shift_pressure();
|
||
let shift_ok = shift_pressure < self.thresholds.max_shift;
|
||
|
||
// 3. Evidence filter (can stop immediately if enough evidence)
|
||
let e_aggregate = self.aggregate_evidence();
|
||
let evidence_decision = self.evidence_decision(e_aggregate);
|
||
|
||
// Combined decision
|
||
let decision = match (structural_ok, shift_ok, evidence_decision) {
|
||
(false, _, _) => GateDecision::Deny, // Structure broken
|
||
(_, false, _) => GateDecision::Defer, // Shift detected
|
||
(_, _, EvidenceDecision::Reject) => GateDecision::Deny,
|
||
(_, _, EvidenceDecision::Continue) => GateDecision::Defer,
|
||
(true, true, EvidenceDecision::Accept) => GateDecision::Permit,
|
||
};
|
||
|
||
// Issue token
|
||
self.issue_permit_token(action_ctx, decision)
|
||
}
|
||
|
||
/// Issue permit token (a signed capability)
|
||
fn issue_permit_token(
|
||
&mut self,
|
||
ctx: &ActionContext,
|
||
decision: GateDecision,
|
||
) -> PermitToken {
|
||
let witness_hash = self.compute_witness_hash();
|
||
|
||
let token = PermitToken {
|
||
decision,
|
||
action_id: ctx.action_id,
|
||
timestamp: now_ns(),
|
||
ttl_ns: self.thresholds.permit_ttl,
|
||
witness_hash,
|
||
sequence: self.permit_state.next_sequence(),
|
||
};
|
||
|
||
// MAC or sign the token
|
||
let mac = self.permit_state.sign(&token);
|
||
|
||
// Emit receipt
|
||
self.emit_receipt(&token, &mac);
|
||
|
||
PermitToken { mac, ..token }
|
||
}
|
||
|
||
/// Emit witness receipt (hash-chained)
|
||
fn emit_receipt(&mut self, token: &PermitToken, mac: &[u8; 32]) {
|
||
let receipt = WitnessReceipt {
|
||
token: token.clone(),
|
||
mac: *mac,
|
||
previous_hash: self.receipt_log.last_hash(),
|
||
witness_summary: self.supergraph.witness_summary(),
|
||
};
|
||
|
||
self.receipt_log.append(receipt);
|
||
}
|
||
}
|
||
|
||
/// Permit token: a capability that agents must present
|
||
#[repr(C)]
|
||
pub struct PermitToken {
|
||
pub decision: GateDecision,
|
||
pub action_id: ActionId,
|
||
pub timestamp: u64,
|
||
pub ttl_ns: u64,
|
||
pub witness_hash: [u8; 32],
|
||
pub sequence: u64,
|
||
pub mac: [u8; 32], // HMAC or signature
|
||
}
|
||
|
||
impl PermitToken {
|
||
/// Agents must present valid token to perform actions
|
||
pub fn is_valid(&self, verifier: &Verifier) -> bool {
|
||
// Check TTL
|
||
if now_ns() > self.timestamp + self.ttl_ns {
|
||
return false;
|
||
}
|
||
|
||
// Verify MAC/signature
|
||
verifier.verify(self, &self.mac)
|
||
}
|
||
}
|
||
```
|
||
|
||
### WASM Kernel API
|
||
|
||
Each tile runs a minimal WASM kernel:
|
||
|
||
```rust
|
||
/// Worker tile WASM exports
|
||
#[no_mangle]
|
||
pub extern "C" fn ingest_delta(delta_ptr: *const u8, len: usize) -> u32 {
|
||
let delta = unsafe { core::slice::from_raw_parts(delta_ptr, len) };
|
||
TILE_STATE.with(|state| state.borrow_mut().ingest_delta(delta))
|
||
}
|
||
|
||
#[no_mangle]
|
||
pub extern "C" fn tick(now_ns: u64) -> *const TileReport {
|
||
TILE_STATE.with(|state| state.borrow_mut().tick(now_ns))
|
||
}
|
||
|
||
#[no_mangle]
|
||
pub extern "C" fn get_witness_fragment(id: u32) -> *const u8 {
|
||
TILE_STATE.with(|state| state.borrow().get_witness_fragment(id))
|
||
}
|
||
|
||
/// TileZero WASM/native exports
|
||
#[no_mangle]
|
||
pub extern "C" fn collect_reports(reports_ptr: *const TileReport, count: usize) {
|
||
TILEZERO.with(|tz| tz.borrow_mut().collect_reports(reports_ptr, count))
|
||
}
|
||
|
||
#[no_mangle]
|
||
pub extern "C" fn decide(action_ctx_ptr: *const ActionContext) -> *const PermitToken {
|
||
TILEZERO.with(|tz| tz.borrow_mut().decide(action_ctx_ptr))
|
||
}
|
||
|
||
#[no_mangle]
|
||
pub extern "C" fn get_receipt(sequence: u64) -> *const WitnessReceipt {
|
||
TILEZERO.with(|tz| tz.borrow().get_receipt(sequence))
|
||
}
|
||
```
|
||
|
||
### v0 Implementation Strategy
|
||
|
||
Ship fast by layering:
|
||
|
||
| Phase | Components | Skip Initially |
|
||
|-------|------------|----------------|
|
||
| **v0.1** | Structural coherence + witness receipt | Shift filter, evidence filter |
|
||
| **v0.2** | Add shift filter (normality scores) | CORE RL adaptation |
|
||
| **v0.3** | Add evidence filter (e-values) | Mixture e-values |
|
||
| **v1.0** | Full three-filter stack | - |
|
||
|
||
### Rust Deliverables
|
||
|
||
| Crate | Description | Dependencies |
|
||
|-------|-------------|--------------|
|
||
| `cognitum-gate-kernel` | `no_std` WASM kernel for worker tiles | `ruvector-mincut` (core algorithms) |
|
||
| `cognitum-gate-tilezero` | Native arbiter for TileZero | `ruvector-mincut`, `blake3`, `ed25519` |
|
||
| `mcp-gate` | MCP server for agent integration | `cognitum-gate-tilezero` |
|
||
|
||
```
|
||
cognitum-gate/
|
||
├── cognitum-gate-kernel/ # no_std WASM
|
||
│ ├── Cargo.toml
|
||
│ └── src/
|
||
│ ├── lib.rs # WASM exports
|
||
│ ├── shard.rs # Compact graph shard
|
||
│ ├── evidence.rs # Local e-accumulator
|
||
│ └── report.rs # TileReport generation
|
||
│
|
||
├── cognitum-gate-tilezero/ # Native arbiter
|
||
│ ├── Cargo.toml
|
||
│ └── src/
|
||
│ ├── lib.rs
|
||
│ ├── merge.rs # Report merging
|
||
│ ├── supergraph.rs # Reduced global graph
|
||
│ ├── permit.rs # Token issuance
|
||
│ └── receipt.rs # Hash-chained log
|
||
│
|
||
└── mcp-gate/ # MCP integration
|
||
├── Cargo.toml
|
||
└── src/
|
||
├── lib.rs
|
||
├── tools.rs # permit_action, get_receipt, replay_decision
|
||
└── server.rs # MCP server
|
||
```
|
||
|
||
### MCP Gate Tools
|
||
|
||
```rust
|
||
/// MCP tool: Request permission for an action
|
||
#[mcp_tool]
|
||
pub async fn permit_action(
|
||
action_id: String,
|
||
action_type: String,
|
||
context: serde_json::Value,
|
||
) -> Result<PermitResponse, McpError> {
|
||
let ctx = ActionContext::from_json(&context)?;
|
||
let token = TILEZERO.decide(&ctx);
|
||
|
||
Ok(PermitResponse {
|
||
decision: token.decision.to_string(),
|
||
token: token.encode_base64(),
|
||
witness_hash: hex::encode(&token.witness_hash),
|
||
valid_until_ns: token.timestamp + token.ttl_ns,
|
||
})
|
||
}
|
||
|
||
/// MCP tool: Get witness receipt for audit
|
||
#[mcp_tool]
|
||
pub async fn get_receipt(sequence: u64) -> Result<ReceiptResponse, McpError> {
|
||
let receipt = TILEZERO.get_receipt(sequence)?;
|
||
|
||
Ok(ReceiptResponse {
|
||
sequence,
|
||
decision: receipt.token.decision.to_string(),
|
||
timestamp: receipt.token.timestamp,
|
||
witness_summary: receipt.witness_summary.to_json(),
|
||
previous_hash: hex::encode(&receipt.previous_hash),
|
||
receipt_hash: hex::encode(&receipt.hash()),
|
||
})
|
||
}
|
||
|
||
/// MCP tool: Replay decision for debugging/audit
|
||
#[mcp_tool]
|
||
pub async fn replay_decision(
|
||
sequence: u64,
|
||
verify_chain: bool,
|
||
) -> Result<ReplayResponse, McpError> {
|
||
let receipt = TILEZERO.get_receipt(sequence)?;
|
||
|
||
// Optionally verify hash chain
|
||
if verify_chain {
|
||
TILEZERO.verify_chain_to(sequence)?;
|
||
}
|
||
|
||
// Replay the decision with logged state
|
||
let replayed = TILEZERO.replay(&receipt)?;
|
||
|
||
Ok(ReplayResponse {
|
||
original_decision: receipt.token.decision.to_string(),
|
||
replayed_decision: replayed.decision.to_string(),
|
||
match_confirmed: receipt.token.decision == replayed.decision,
|
||
state_snapshot: replayed.state_snapshot.to_json(),
|
||
})
|
||
}
|
||
```
|
||
|
||
### The Practical Win
|
||
|
||
This gives Cognitum a clear job that buyers understand:
|
||
|
||
> **"We do not just detect issues, we prevent unsafe actions."**
|
||
> **"We can prove why we blocked or allowed it."**
|
||
> **"We stay calm until structure breaks."**
|
||
|
||
The permit token as a capability means:
|
||
- Agents cannot act without presenting a valid token
|
||
- Tokens expire (TTL-bounded)
|
||
- Every token is backed by a witness receipt
|
||
- The entire chain is cryptographically verifiable
|
||
|
||
---
|
||
|
||
## API Contract
|
||
|
||
### Request: Permit Action
|
||
|
||
```json
|
||
{
|
||
"action_id": "cfg-push-7a3f",
|
||
"action_type": "config_change",
|
||
"target": {
|
||
"device": "router-west-03",
|
||
"path": "/network/interfaces/eth0"
|
||
},
|
||
"context": {
|
||
"agent_id": "ops-agent-12",
|
||
"session_id": "sess-abc123",
|
||
"prior_actions": ["cfg-push-7a3e"],
|
||
"urgency": "normal"
|
||
}
|
||
}
|
||
```
|
||
|
||
### Response: Permit
|
||
|
||
```json
|
||
{
|
||
"decision": "permit",
|
||
"token": "eyJ0eXAiOiJQVCIsImFsZyI6IkVkMjU1MTkifQ...",
|
||
"valid_until_ns": 1737158400000000000,
|
||
"witness": {
|
||
"structural": {
|
||
"cut_value": 12.7,
|
||
"partition": "stable",
|
||
"critical_edges": 0
|
||
},
|
||
"predictive": {
|
||
"set_size": 3,
|
||
"coverage": 0.92
|
||
},
|
||
"evidential": {
|
||
"e_value": 847.3,
|
||
"verdict": "accept"
|
||
}
|
||
},
|
||
"receipt_sequence": 1847392
|
||
}
|
||
```
|
||
|
||
### Response: Defer
|
||
|
||
```json
|
||
{
|
||
"decision": "defer",
|
||
"reason": "shift_detected",
|
||
"detail": "Distribution shift pressure 0.73 exceeds threshold 0.5",
|
||
"escalation": {
|
||
"to": "human_operator",
|
||
"context_url": "/receipts/1847393/context",
|
||
"timeout_ns": 300000000000
|
||
},
|
||
"witness": {
|
||
"structural": { "cut_value": 11.2, "partition": "stable" },
|
||
"predictive": { "set_size": 18, "coverage": 0.91 },
|
||
"evidential": { "e_value": 3.2, "verdict": "continue" }
|
||
},
|
||
"receipt_sequence": 1847393
|
||
}
|
||
```
|
||
|
||
### Response: Deny
|
||
|
||
```json
|
||
{
|
||
"decision": "deny",
|
||
"reason": "boundary_violation",
|
||
"detail": "Action crosses fragile partition (cut=2.1 < min=5.0)",
|
||
"witness": {
|
||
"structural": {
|
||
"cut_value": 2.1,
|
||
"partition": "fragile",
|
||
"critical_edges": 4,
|
||
"boundary": ["edge-17", "edge-23", "edge-41", "edge-52"]
|
||
},
|
||
"predictive": { "set_size": 47, "coverage": 0.88 },
|
||
"evidential": { "e_value": 0.004, "verdict": "reject" }
|
||
},
|
||
"receipt_sequence": 1847394
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Migration Path
|
||
|
||
### Phase M1: Shadow Mode
|
||
|
||
Run AVCG alongside existing `GateController`. Compare decisions, don't enforce.
|
||
|
||
```rust
|
||
impl HybridGate {
|
||
pub fn evaluate(&mut self, action: &Action) -> GateResult {
|
||
// Existing gate makes the decision
|
||
let legacy_result = self.legacy_gate.evaluate(action);
|
||
|
||
// AVCG runs in shadow, logs disagreements
|
||
let avcg_result = self.avcg_gate.evaluate(action);
|
||
|
||
if legacy_result.decision != avcg_result.decision {
|
||
metrics::counter!("gate.shadow.disagreement").increment(1);
|
||
log::info!(
|
||
"Shadow disagreement: legacy={:?} avcg={:?} action={}",
|
||
legacy_result.decision,
|
||
avcg_result.decision,
|
||
action.id
|
||
);
|
||
}
|
||
|
||
legacy_result // Legacy still decides
|
||
}
|
||
}
|
||
```
|
||
|
||
**Exit criteria**: <1% disagreement rate over 7 days, zero false denies on known-safe actions.
|
||
|
||
### Phase M2: Canary Enforcement
|
||
|
||
AVCG enforces for 5% of traffic, legacy handles rest.
|
||
|
||
```rust
|
||
impl CanaryGate {
|
||
pub fn evaluate(&mut self, action: &Action) -> GateResult {
|
||
let canary = self.canary_selector.select(action);
|
||
|
||
if canary {
|
||
metrics::counter!("gate.canary.avcg").increment(1);
|
||
self.avcg_gate.evaluate(action)
|
||
} else {
|
||
self.legacy_gate.evaluate(action)
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**Exit criteria**: No incidents attributed to AVCG decisions over 14 days.
|
||
|
||
### Phase M3: Majority Rollout
|
||
|
||
AVCG handles 95%, legacy available for fallback.
|
||
|
||
### Phase M4: Full Cutover
|
||
|
||
Legacy removed. AVCG is the gate.
|
||
|
||
```
|
||
Timeline:
|
||
M1 (Shadow) → 2-4 weeks
|
||
M2 (Canary 5%) → 2 weeks
|
||
M3 (Majority) → 2 weeks
|
||
M4 (Full) → 1 week
|
||
─────────
|
||
Total → 7-9 weeks
|
||
```
|
||
|
||
---
|
||
|
||
## Observability
|
||
|
||
### Metrics (Prometheus)
|
||
|
||
```
|
||
# Decision counters
|
||
gate_decisions_total{decision="permit|defer|deny", reason="..."}
|
||
|
||
# Latency histograms
|
||
gate_latency_seconds{phase="mincut|conformal|eprocess|decision|receipt"}
|
||
|
||
# Signal values
|
||
gate_cut_value{quantile="0.5|0.9|0.99"}
|
||
gate_prediction_set_size{quantile="0.5|0.9|0.99"}
|
||
gate_evalue{quantile="0.5|0.9|0.99"}
|
||
|
||
# Health
|
||
gate_healthy{component="mincut|conformal|eprocess"}
|
||
gate_failover_total{from="primary|standby_N"}
|
||
|
||
# Coverage tracking
|
||
gate_conformal_coverage_rate # Should stay ≥ 0.85
|
||
gate_eprocess_power # Evidence accumulation rate
|
||
```
|
||
|
||
### Alerting Thresholds
|
||
|
||
| Alert | Condition | Severity |
|
||
|-------|-----------|----------|
|
||
| `GateHighDenyRate` | deny_rate > 10% for 5m | Warning |
|
||
| `GateLatencyHigh` | p99 > 100ms for 5m | Warning |
|
||
| `GateCoverageDrift` | coverage < 0.80 for 15m | Critical |
|
||
| `GateUnhealthy` | any component unhealthy for 1m | Critical |
|
||
| `GateReceiptChainBroken` | hash verification fails | Critical |
|
||
|
||
### Debug Query: Why Was This Denied?
|
||
|
||
```bash
|
||
# Get full decision context
|
||
curl /api/gate/receipts/1847394/explain
|
||
|
||
# Response:
|
||
{
|
||
"receipt_sequence": 1847394,
|
||
"decision": "deny",
|
||
"explanation": {
|
||
"primary_reason": "structural",
|
||
"structural": {
|
||
"cut_value": 2.1,
|
||
"threshold": 5.0,
|
||
"failed": true,
|
||
"boundary_edges": [
|
||
{"id": "edge-17", "weight": 0.3, "endpoints": ["node-a", "node-b"]},
|
||
...
|
||
],
|
||
"partition_context": "Device router-west-03 is in partition P7 which has been unstable since 14:32:07 UTC"
|
||
},
|
||
"predictive": { "failed": false, "detail": "Set size 47 within bounds" },
|
||
"evidential": { "failed": true, "detail": "E-value 0.004 < τ_deny 0.01" }
|
||
},
|
||
"suggested_action": "Wait for partition P7 to stabilize or escalate to human approval",
|
||
"similar_past_decisions": [1847201, 1846998, 1846754]
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Open Questions Resolution
|
||
|
||
### Q1: Graph model scope — immediate actions or multi-step lookahead?
|
||
|
||
**Decision**: Immediate actions for v0, optional 1-step lookahead for v1.
|
||
|
||
**Rationale**: Multi-step lookahead requires predicting action sequences, which adds latency and complexity. Start simple: evaluate the action being requested *right now*. If the current action is safe but would lead to an unsafe state, the *next* action will be denied when it's requested.
|
||
|
||
### Q2: E-process null — "action safety" vs "policy consistency"?
|
||
|
||
**Decision**: Action safety as primary null, policy consistency as secondary.
|
||
|
||
**Rationale**:
|
||
- Primary H₀: P(action leads to unsafe state) ≤ p₀
|
||
- Secondary (optional): Current action consistent with established policy
|
||
|
||
Action safety is more fundamental. Policy consistency can be added as a separate e-process that runs in parallel.
|
||
|
||
### Q3: Threshold learning — fixed or adaptive?
|
||
|
||
**Decision**: Fixed for v0, adaptive via meta-learning for v1.
|
||
|
||
**Rationale**: Fixed thresholds are easier to audit and explain. Once we have production data, we can train a meta-learner to adjust thresholds based on observed false positive/negative rates.
|
||
|
||
### Q4: Human-in-loop — how are DEFER decisions presented?
|
||
|
||
**Decision**: Structured escalation with timeout and context link.
|
||
|
||
```json
|
||
{
|
||
"escalation": {
|
||
"to": "human_operator",
|
||
"channel": "slack|pagerduty|dashboard",
|
||
"context_url": "/receipts/{seq}/context",
|
||
"timeout_ns": 300000000000,
|
||
"default_on_timeout": "deny"
|
||
}
|
||
}
|
||
```
|
||
|
||
If human doesn't respond within timeout, default to DENY (fail-safe).
|
||
|
||
See **Hybrid Agent/Human Workflow** section for full details.
|
||
|
||
### Q5: Adversarial robustness — adaptive adversaries?
|
||
|
||
**Decision**: Defense in depth + rate limiting + anomaly detection.
|
||
|
||
**Mitigations**:
|
||
1. Rate limit: Max N decisions per agent per minute
|
||
2. Anomaly: Flag agents with unusually high deny rates
|
||
3. Honeypots: Inject synthetic "trap" actions to detect probing
|
||
4. Rotation: Periodically rotate threshold parameters within safe bounds
|
||
|
||
---
|
||
|
||
## Definition of Done
|
||
|
||
### v0.1 Shippable Criteria
|
||
|
||
| Criterion | Metric | Target |
|
||
|-----------|--------|--------|
|
||
| **Structural filter works** | Min-cut correctly identifies fragile partitions | 100% on test suite |
|
||
| **Receipts are signed** | All receipts have valid Ed25519 signature | 100% |
|
||
| **Receipts are chained** | Hash chain verifies for all receipts | 100% |
|
||
| **Latency acceptable** | p99 gate decision time | < 50ms |
|
||
| **No false denies** | Known-safe actions are permitted | 100% on test suite |
|
||
| **Demo scenario runs** | Network security control plane demo | End-to-end pass |
|
||
|
||
### v0.1 Minimum Viable Demo
|
||
|
||
**Scenario**: Agent requests config push to network device.
|
||
|
||
1. Agent calls `permit_action` with device target
|
||
2. Gate evaluates structural coherence (min-cut)
|
||
3. Gate returns PERMIT with signed receipt
|
||
4. Agent presents token to device
|
||
5. Device verifies token, accepts config
|
||
|
||
**Success**: Auditor can replay decision from receipt and get same result.
|
||
|
||
---
|
||
|
||
## Cost Model
|
||
|
||
### Memory per Tile (WASM)
|
||
|
||
| Component | Size | Notes |
|
||
|-----------|------|-------|
|
||
| Graph shard | 32 KB | ~2000 edges at 16 bytes each |
|
||
| Feature window | 8 KB | 2048 f32 values |
|
||
| E-accumulator | 64 B | f64 + metadata |
|
||
| Boundary edges | 64 B | 8 × EdgeId |
|
||
| **Total per worker** | **~41 KB** | Fits in 64KB WASM page |
|
||
| **Total 255 workers** | **~10.2 MB** | |
|
||
| TileZero state | ~1 MB | Supergraph + receipt log head |
|
||
| **Total fabric** | **~12 MB** | |
|
||
|
||
### Network Bandwidth
|
||
|
||
| Flow | Frequency | Size | Bandwidth |
|
||
|------|-----------|------|-----------|
|
||
| Worker → TileZero reports | 1/tick (10ms) | 64 B × 255 | ~1.6 MB/s |
|
||
| Receipt log append | per decision | ~512 B | Variable |
|
||
| Gossip (distributed) | 1/100ms | ~1 KB × peers | ~10 KB/s × P |
|
||
|
||
### Storage Growth
|
||
|
||
| Item | Size | Retention | Growth |
|
||
|------|------|-----------|--------|
|
||
| Receipt | ~512 B | 90 days | ~44 MB/day @ 1000 decisions/s |
|
||
| E-process checkpoint | ~128 B | Forever | ~11 MB/day @ 1000 decisions/s |
|
||
| Audit log | ~256 B | 1 year | ~22 MB/day @ 1000 decisions/s |
|
||
|
||
**90-day storage**: ~7 GB receipts + ~1 GB checkpoints ≈ **8 GB**
|
||
|
||
---
|
||
|
||
## Hybrid Agent/Human Workflow
|
||
|
||
The coherence gate is designed for **bounded autonomy**, not full autonomy. Humans stay in the loop at critical decision points.
|
||
|
||
### Design Philosophy
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ │
|
||
│ "Agents handle the routine. Humans handle the novel." │
|
||
│ │
|
||
│ PERMIT → Agent proceeds autonomously (low risk, high confidence) │
|
||
│ DEFER → Human decides (uncertain, boundary case, policy gap) │
|
||
│ DENY → Blocked automatically (structural violation, unsafe) │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
The gate doesn't replace human judgment—it **routes decisions to humans when judgment is needed**.
|
||
|
||
### Escalation Tiers
|
||
|
||
| Tier | Trigger | Responder | SLA | Example |
|
||
|------|---------|-----------|-----|---------|
|
||
| **T0** | PERMIT | None (automated) | 0 | Routine config within stable partition |
|
||
| **T1** | DEFER (shift) | On-call operator | 5 min | New dependency pattern detected |
|
||
| **T2** | DEFER (boundary) | Senior engineer | 15 min | Action crosses partition boundary |
|
||
| **T3** | DEFER (policy gap) | Policy team | 1 hour | No precedent for this action type |
|
||
| **T4** | DENY override request | Security + Management | 4 hours | Agent requesting exception to denial |
|
||
|
||
### Human Decision Interface
|
||
|
||
When a DEFER is escalated, humans see:
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────┐
|
||
│ DECISION REQUIRED Timeout: 4:32 │
|
||
├─────────────────────────────────────────────────────────────────────────┤
|
||
│ │
|
||
│ Agent: ops-agent-12 │
|
||
│ Action: Push config to router-west-03 /network/interfaces/eth0 │
|
||
│ │
|
||
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
||
│ │ WHY DEFERRED │ │
|
||
│ │ │ │
|
||
│ │ • Shift detected: New dependency pattern (0.73 > 0.5 threshold)│ │
|
||
│ │ • This device was added to the graph 2 hours ago │ │
|
||
│ │ • Similar actions on established devices: 847 permits, 0 denies│ │
|
||
│ └─────────────────────────────────────────────────────────────────┘ │
|
||
│ │
|
||
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
||
│ │ CONTEXT │ │
|
||
│ │ │ │
|
||
│ │ Structural coherence: 11.2 (healthy) │ │
|
||
│ │ Prediction set size: 18 outcomes (moderate uncertainty) │ │
|
||
│ │ Evidence accumulator: 3.2 (inconclusive) │ │
|
||
│ │ │ │
|
||
│ │ [View full witness receipt] [View similar past decisions] │ │
|
||
│ └─────────────────────────────────────────────────────────────────┘ │
|
||
│ │
|
||
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────────────────┐ │
|
||
│ │ APPROVE │ │ DENY │ │ ESCALATE TO T3 │ │
|
||
│ │ (proceed) │ │ (block) │ │ (need policy guidance) │ │
|
||
│ └───────────────┘ └───────────────┘ └───────────────────────────┘ │
|
||
│ │
|
||
└─────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
### Human Decision Recording
|
||
|
||
Human decisions become part of the audit trail:
|
||
|
||
```rust
|
||
pub struct HumanDecision {
|
||
/// Original deferred receipt
|
||
pub deferred_receipt_seq: u64,
|
||
|
||
/// Human's decision
|
||
pub decision: HumanVerdict,
|
||
|
||
/// Human identity (authenticated)
|
||
pub decider_id: AuthenticatedUserId,
|
||
|
||
/// Reasoning (required for audit)
|
||
pub rationale: String,
|
||
|
||
/// Timestamp
|
||
pub decided_at: u64,
|
||
|
||
/// Signature (human signs their decision)
|
||
pub signature: Ed25519Signature,
|
||
}
|
||
|
||
pub enum HumanVerdict {
|
||
/// Approve the action
|
||
Approve {
|
||
/// Add to training data for future automation
|
||
learn_from_this: bool,
|
||
},
|
||
/// Deny the action
|
||
Deny {
|
||
/// Reason for denial
|
||
reason: String,
|
||
},
|
||
/// Escalate to higher tier
|
||
Escalate {
|
||
to_tier: EscalationTier,
|
||
reason: String,
|
||
},
|
||
/// Request more information
|
||
NeedMoreInfo {
|
||
questions: Vec<String>,
|
||
},
|
||
}
|
||
```
|
||
|
||
### Override Protocol
|
||
|
||
Humans can override DENY decisions, but with friction and accountability:
|
||
|
||
```rust
|
||
pub struct DenyOverride {
|
||
/// Which denial is being overridden
|
||
pub denied_receipt_seq: u64,
|
||
|
||
/// Who is overriding (must be T4 authority)
|
||
pub overrider_id: AuthenticatedUserId,
|
||
|
||
/// Second approver required
|
||
pub second_approver_id: AuthenticatedUserId,
|
||
|
||
/// Business justification (required, min 50 chars)
|
||
pub justification: String,
|
||
|
||
/// Time-bounded: override expires
|
||
pub valid_until: u64,
|
||
|
||
/// Scope-limited: only this specific action
|
||
pub action_id: ActionId,
|
||
|
||
/// Both signatures required
|
||
pub overrider_signature: Ed25519Signature,
|
||
pub approver_signature: Ed25519Signature,
|
||
}
|
||
```
|
||
|
||
**Override constraints**:
|
||
- Two humans required (four-eyes principle)
|
||
- Must provide written justification
|
||
- Time-limited (max 24 hours)
|
||
- Scope-limited (only the specific action)
|
||
- All overrides flagged for security review
|
||
|
||
### Learning from Human Decisions
|
||
|
||
Human decisions improve the gate over time:
|
||
|
||
```rust
|
||
/// When human approves a DEFER, optionally learn from it
|
||
pub fn learn_from_approval(
|
||
deferred: &WitnessReceipt,
|
||
human: &HumanDecision,
|
||
) {
|
||
if human.decision.learn_from_this() {
|
||
// Add to calibration data
|
||
conformal_calibrator.add_observation(
|
||
deferred.context.clone(),
|
||
Outcome::Safe, // Human judged it safe
|
||
);
|
||
|
||
// Update e-process null hypothesis
|
||
eprocess_trainer.add_positive_example(
|
||
deferred.action.clone(),
|
||
);
|
||
|
||
// Adjust threshold candidates (for meta-learning in v1)
|
||
threshold_learner.record_human_permit(
|
||
deferred.signals.clone(),
|
||
);
|
||
}
|
||
}
|
||
```
|
||
|
||
### Workload Distribution Target
|
||
|
||
The goal is **minimal human burden** while maintaining safety:
|
||
|
||
| Decision | Target Rate | Human Workload |
|
||
|----------|-------------|----------------|
|
||
| PERMIT | 90-95% | Zero |
|
||
| DEFER | 4-9% | Human decides |
|
||
| DENY | 1-2% | Zero (unless override requested) |
|
||
|
||
If DEFER rate exceeds 10%, the gate is too conservative—tune thresholds.
|
||
If DENY rate exceeds 5%, something is wrong—investigate root cause.
|
||
|
||
### Integration Channels
|
||
|
||
| Channel | Use Case | Response Format |
|
||
|---------|----------|-----------------|
|
||
| **Slack** | On-call escalation | Interactive buttons |
|
||
| **PagerDuty** | Critical/timed decisions | Acknowledge + decision API |
|
||
| **Dashboard** | Batch review | Web UI with full context |
|
||
| **CLI** | Developer/ops workflow | `ruvector gate approve <seq>` |
|
||
| **API** | Programmatic integration | REST/gRPC |
|
||
|
||
### Audit Trail for Human Decisions
|
||
|
||
Every human decision is:
|
||
1. **Authenticated**: Decider identity verified via SSO/MFA
|
||
2. **Signed**: Human signs their decision with personal key
|
||
3. **Chained**: Added to the same receipt chain as gate decisions
|
||
4. **Timestamped**: Immutable record of when decision was made
|
||
5. **Justified**: Rationale captured for later review
|
||
|
||
```
|
||
Receipt Chain:
|
||
[1847392] PERMIT (automated) → agent executed
|
||
[1847393] DEFER (automated) → escalated to human
|
||
[1847393-H] APPROVE (human: alice@corp) → agent executed
|
||
[1847394] DENY (automated) → blocked
|
||
[1847394-O] OVERRIDE (humans: bob@corp + carol@corp) → exception granted
|
||
```
|
||
|
||
---
|
||
|
||
## Consequences
|
||
|
||
### Benefits
|
||
|
||
1. **Formal Guarantees**: Type I error control at any stopping time
|
||
2. **Distribution Shift Robustness**: Conformal prediction adapts without retraining
|
||
3. **Computational Efficiency**: O(n^{o(1)}) update time from subpolynomial min-cut
|
||
4. **Audit Trail**: Every decision has cryptographic witness receipt
|
||
5. **Defense in Depth**: Three independent signals must concur for permit
|
||
6. **Cryptographic Integrity**: All receipts signed with Ed25519
|
||
7. **Attack Resistance**: E-value bounds, replay guards, race condition prevention
|
||
8. **Distributed Scalability**: Hierarchical coordination with regional and global tiers
|
||
9. **Fault Tolerance**: Automatic failover with safe defaults
|
||
|
||
### Risks & Mitigations
|
||
|
||
| Risk | Mitigation |
|
||
|------|------------|
|
||
| Computational overhead | Lazy evaluation; batch updates; SIMD optimization |
|
||
| E-value power under uncertainty | Mixture e-values for robustness |
|
||
| Graph model mismatch | Learn graph structure from trajectories |
|
||
| Threshold tuning | Adaptive thresholds via meta-learning |
|
||
| Receipt forgery | Mandatory Ed25519 signing; chain linkage |
|
||
| E-value manipulation | Input bounds; clamping with security logging |
|
||
| Race conditions | Atomic decisions with sequence numbers |
|
||
| Replay attacks | Bloom filter + sliding window guard |
|
||
| Network partitions | Hierarchical decisions; local autonomy |
|
||
| Byzantine nodes | Consensus-based aggregation; safe defaults |
|
||
|
||
### Complexity Analysis
|
||
|
||
| Operation | Current | With AVCG | Distributed AVCG |
|
||
|-----------|---------|-----------|------------------|
|
||
| Edge update | O(n^{o(1)}) | O(n^{o(1)}) | O(n^{o(1)}) + network |
|
||
| Gate evaluation | O(1) | O(k) prediction set | O(k) + O(R) regional |
|
||
| Witness generation | O(m) | O(m) amortized | O(m) + signing |
|
||
| Certificate verification | O(n) | O(n + log T) | O(n + log T) + sig verify |
|
||
| Receipt signing | N/A | O(1) Ed25519 | O(1) + HSM latency |
|
||
| Distributed consensus | N/A | N/A | O(log N) Raft |
|
||
| E-process aggregation | N/A | O(1) | O(P) peers |
|
||
|
||
Where: k = prediction set size, T = history length, R = regional peers, N = cluster size, P = peer count
|
||
|
||
## References
|
||
|
||
### Dynamic Min-Cut
|
||
1. El-Hayek, Henzinger, Li. "Deterministic and Exact Fully-dynamic Minimum Cut of Superpolylogarithmic Size in Subpolynomial Time." arXiv:2512.13105, December 2025.
|
||
2. Jin, Sun, Thorup. "Fully Dynamic Exact Minimum Cut in Subpolynomial Time." SODA 2024.
|
||
|
||
### Online Conformal Prediction
|
||
3. "Online Conformal Inference with Retrospective Adjustment for Faster Adaptation to Distribution Shift." arXiv:2511.04275, November 2025.
|
||
4. "Distribution-informed Online Conformal Prediction (COP)." December 2025.
|
||
5. "CORE: Conformal Regression under Distribution Shift via Reinforcement Learning." October 2025.
|
||
|
||
### E-Values and E-Processes
|
||
6. Ramdas, Wang. "Hypothesis Testing with E-values." Foundations and Trends in Statistics, 2025.
|
||
7. ICML 2025 Tutorial: "Game-theoretic Statistics and Sequential Anytime-Valid Inference."
|
||
8. "Sequential Randomization Tests Using e-values." arXiv:2512.04366, December 2025.
|
||
|
||
### AI Agent Control
|
||
9. "Bounded Autonomy: A Pragmatic Response to Concerns About Fully Autonomous AI Agents." XMPRO, 2025.
|
||
10. "Customizable Runtime Enforcement for Safe and Reliable LLM Agents." arXiv:2503.18666, 2025.
|
||
|
||
## Testing Strategy
|
||
|
||
### Unit Tests
|
||
|
||
| Component | Coverage Target | Key Test Cases |
|
||
|-----------|----------------|----------------|
|
||
| `CompactGraph` | 95% | Add/remove edges, weight updates, min-cut estimation |
|
||
| `EvidenceAccumulator` | 95% | Bounds checking, update rules, stopping decisions |
|
||
| `TileReport` | 90% | Serialization roundtrip, checksum verification |
|
||
| `PermitToken` | 95% | Signing, verification, TTL expiration |
|
||
| `ReceiptLog` | 95% | Hash chain integrity, tamper detection |
|
||
| `ThreeFilterDecision` | 100% | All Permit/Defer/Deny paths |
|
||
|
||
### Integration Tests
|
||
|
||
| Scenario | Description | Expected Outcome |
|
||
|----------|-------------|------------------|
|
||
| Happy path | Stable graph, safe action | PERMIT with valid receipt |
|
||
| Boundary crossing | Action crosses fragile partition | DENY with boundary edges |
|
||
| Shift detection | New dependency pattern | DEFER with escalation |
|
||
| Human approval | DEFER → human approves | Token issued, learning recorded |
|
||
| Replay verification | Replay historical decision | Deterministic match |
|
||
| Hash chain audit | Verify 1000 receipts | All hashes valid |
|
||
|
||
### Property-Based Tests
|
||
|
||
```rust
|
||
#[proptest]
|
||
fn e_value_always_positive(e1: f64, e2: f64) {
|
||
let result = combine_evalues(e1.abs(), e2.abs());
|
||
prop_assert!(result > 0.0);
|
||
}
|
||
|
||
#[proptest]
|
||
fn receipt_hash_deterministic(receipt: WitnessReceipt) {
|
||
let hash1 = receipt.compute_hash();
|
||
let hash2 = receipt.compute_hash();
|
||
prop_assert_eq!(hash1, hash2);
|
||
}
|
||
|
||
#[proptest]
|
||
fn serialization_roundtrip(report: TileReport) {
|
||
let bytes = report.serialize();
|
||
let restored = TileReport::deserialize(&bytes);
|
||
prop_assert_eq!(report, restored);
|
||
}
|
||
```
|
||
|
||
### Security Tests
|
||
|
||
| Test | Attack Vector | Expected Behavior |
|
||
|------|---------------|-------------------|
|
||
| Forged signature | Invalid Ed25519 sig | Verification fails |
|
||
| Replay attack | Duplicate action | ReplayGuard blocks |
|
||
| E-value overflow | Extreme likelihood ratio | Clamped to bounds |
|
||
| Race condition | Concurrent evaluations | Sequence numbers ordered |
|
||
| Tampered receipt | Modified hash | Chain verification fails |
|
||
|
||
### Benchmark Tests
|
||
|
||
| Metric | Target | Measurement |
|
||
|--------|--------|-------------|
|
||
| Gate decision latency | p99 < 50ms | `criterion` benchmark |
|
||
| Receipt signing | < 5ms | `criterion` benchmark |
|
||
| 255-tile report merge | < 10ms | `criterion` benchmark |
|
||
| Hash chain verification (1000) | < 100ms | `criterion` benchmark |
|
||
| Memory per worker tile | < 64KB | Static analysis |
|
||
|
||
---
|
||
|
||
## Configuration Format
|
||
|
||
### TOML Configuration
|
||
|
||
```toml
|
||
# gate-config.toml
|
||
|
||
[gate]
|
||
# Gate identification
|
||
gate_id = "gate-west-01"
|
||
version = "0.1.0"
|
||
|
||
[thresholds]
|
||
# E-process thresholds
|
||
tau_deny = 0.01 # E-value below this → DENY
|
||
tau_permit = 100.0 # E-value above this → PERMIT
|
||
|
||
# Structural thresholds
|
||
min_cut = 5.0 # Cut value below this → DENY
|
||
max_shift = 0.5 # Shift pressure above this → DEFER
|
||
|
||
# Conformal thresholds
|
||
max_prediction_set = 20 # Set size above this → DEFER
|
||
coverage_target = 0.90 # Target coverage rate
|
||
|
||
[timing]
|
||
# Permit token TTL
|
||
permit_ttl_seconds = 300
|
||
|
||
# Decision timeout
|
||
decision_timeout_ms = 50
|
||
|
||
# Tick interval for worker tiles
|
||
tick_interval_ms = 10
|
||
|
||
[security]
|
||
# Key rotation
|
||
signing_key_rotation_days = 30
|
||
threshold_key_rotation_days = 90
|
||
|
||
# Replay prevention
|
||
replay_window_seconds = 3600
|
||
bloom_filter_size = 1000000
|
||
|
||
[distributed]
|
||
# Coordination settings
|
||
regional_peers = ["gate-west-02", "gate-west-03"]
|
||
global_coordinator = "coordinator-global-01"
|
||
raft_heartbeat_ms = 100
|
||
consensus_timeout_ms = 1000
|
||
|
||
[escalation]
|
||
# Human-in-loop settings
|
||
default_timeout_seconds = 300
|
||
default_on_timeout = "deny"
|
||
|
||
[escalation.channels.slack]
|
||
webhook_url = "${SLACK_WEBHOOK_URL}"
|
||
channel = "#gate-escalations"
|
||
|
||
[escalation.channels.pagerduty]
|
||
api_key = "${PAGERDUTY_API_KEY}"
|
||
service_id = "gate-critical"
|
||
|
||
[observability]
|
||
# Metrics endpoint
|
||
metrics_port = 9090
|
||
metrics_path = "/metrics"
|
||
|
||
# Tracing
|
||
tracing_enabled = true
|
||
tracing_sample_rate = 0.1
|
||
jaeger_endpoint = "http://jaeger:14268/api/traces"
|
||
|
||
[storage]
|
||
# Receipt storage
|
||
receipt_backend = "postgresql"
|
||
receipt_retention_days = 90
|
||
checkpoint_interval = 100
|
||
|
||
[storage.postgresql]
|
||
host = "${DB_HOST}"
|
||
port = 5432
|
||
database = "gate_receipts"
|
||
username = "${DB_USER}"
|
||
password = "${DB_PASSWORD}"
|
||
```
|
||
|
||
### Environment Variables
|
||
|
||
```bash
|
||
# Required
|
||
export GATE_SIGNING_KEY_PATH=/etc/gate/keys/signing.key
|
||
export GATE_CONFIG_PATH=/etc/gate/config.toml
|
||
|
||
# Optional overrides
|
||
export GATE_TAU_DENY=0.01
|
||
export GATE_TAU_PERMIT=100.0
|
||
export GATE_MIN_CUT=5.0
|
||
export GATE_MAX_SHIFT=0.5
|
||
export GATE_PERMIT_TTL_SECONDS=300
|
||
|
||
# Secrets (never in config file)
|
||
export SLACK_WEBHOOK_URL=https://hooks.slack.com/...
|
||
export PAGERDUTY_API_KEY=...
|
||
export DB_PASSWORD=...
|
||
```
|
||
|
||
---
|
||
|
||
## Error Recovery Procedures
|
||
|
||
### Gate Decision Failures
|
||
|
||
| Failure | Detection | Recovery | Fallback |
|
||
|---------|-----------|----------|----------|
|
||
| Min-cut timeout | Decision exceeds 50ms | Log, retry once | DEFER |
|
||
| E-process NaN | `is_nan()` check | Reset accumulator | DENY |
|
||
| Signing failure | Ed25519 error | Rotate to backup key | DENY (unsigned) |
|
||
| Receipt log full | Capacity check | Archive, start new segment | DENY |
|
||
|
||
### Distributed Failures
|
||
|
||
```rust
|
||
impl FaultRecovery {
|
||
pub async fn handle_regional_failure(&mut self, error: RegionalError) -> GateResult {
|
||
match error {
|
||
RegionalError::LeaderUnavailable => {
|
||
// Wait for new leader election
|
||
tokio::time::sleep(Duration::from_millis(200)).await;
|
||
self.retry_with_new_leader().await
|
||
}
|
||
|
||
RegionalError::NetworkPartition => {
|
||
// Fall back to local-only decision
|
||
log::warn!("Network partition detected, using local gate");
|
||
self.local_gate.evaluate_standalone()
|
||
}
|
||
|
||
RegionalError::ConsensusTimeout => {
|
||
// Use conservative decision
|
||
Ok(GateResult {
|
||
decision: GateDecision::Defer,
|
||
reason: "Consensus timeout - escalating to human".into(),
|
||
..Default::default()
|
||
})
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### Receipt Chain Recovery
|
||
|
||
```rust
|
||
impl ReceiptLog {
|
||
/// Recover from corrupted receipt chain
|
||
pub fn recover_chain(&mut self, last_known_good: u64) -> Result<(), RecoveryError> {
|
||
// 1. Truncate corrupted entries
|
||
self.truncate_after(last_known_good)?;
|
||
|
||
// 2. Rebuild from checkpoint
|
||
let checkpoint = self.find_nearest_checkpoint(last_known_good)?;
|
||
self.rebuild_from_checkpoint(checkpoint)?;
|
||
|
||
// 3. Mark recovery in audit log
|
||
self.append_recovery_marker(last_known_good)?;
|
||
|
||
// 4. Alert operators
|
||
alert::send("Receipt chain recovery performed", Severity::Warning);
|
||
|
||
Ok(())
|
||
}
|
||
}
|
||
```
|
||
|
||
### Worker Tile Recovery
|
||
|
||
| Failure | Detection | Recovery Time | Data Loss |
|
||
|---------|-----------|---------------|-----------|
|
||
| Single tile crash | Heartbeat timeout | < 100ms | Last tick |
|
||
| Tile memory corruption | Checksum mismatch | < 500ms | Current shard |
|
||
| TileZero crash | Primary unavailable | < 1s | None (standbys) |
|
||
| Full fabric restart | All tiles down | < 5s | Rebuild from checkpoint |
|
||
|
||
### Runbook: Gate Unresponsive
|
||
|
||
```bash
|
||
# 1. Check gate health
|
||
curl http://gate:9090/health
|
||
|
||
# 2. If unhealthy, check logs
|
||
kubectl logs -l app=gate --tail=100
|
||
|
||
# 3. Check for resource exhaustion
|
||
kubectl top pods -l app=gate
|
||
|
||
# 4. If memory high, trigger GC
|
||
curl -X POST http://gate:9090/admin/gc
|
||
|
||
# 5. If still unresponsive, rolling restart
|
||
kubectl rollout restart deployment/gate
|
||
|
||
# 6. Verify recovery
|
||
curl http://gate:9090/health
|
||
curl http://gate:9090/metrics | grep gate_healthy
|
||
```
|
||
|
||
---
|
||
|
||
## Appendix: Mathematical Foundations
|
||
|
||
### E-Value Composition
|
||
|
||
For independent e-values e₁, e₂:
|
||
```
|
||
e_combined = e₁ · e₂
|
||
E[e_combined] = E[e₁] · E[e₂] ≤ 1 · 1 = 1
|
||
```
|
||
|
||
This enables **optional continuation**: evidence accumulates validly across sessions.
|
||
|
||
### Conformal Coverage
|
||
|
||
Under exchangeability or bounded distribution shift:
|
||
```
|
||
P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α - δ_t
|
||
```
|
||
|
||
Where δ_t → 0 as the algorithm adapts via retrospective adjustment.
|
||
|
||
### Anytime-Valid Stopping
|
||
|
||
For any stopping time τ (possibly data-dependent):
|
||
```
|
||
P_H₀(E_τ ≥ 1/α) ≤ α
|
||
```
|
||
|
||
This holds because E_t is a nonnegative supermartingale with E[E_0] = 1.
|