Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

View File

@@ -0,0 +1,125 @@
# RuVector WASM Integration: Algorithmic Frontiers & Crate Synthesis
**Document ID**: wasm-integration-2026/00-executive-summary
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Strategic Technical Research
**Workspace**: RuVector v2.0.3 (85+ crates, Rust 2021 edition)
---
## Thesis
A convergence of recent algorithmic results (pseudo-deterministic min-cut, storage-based GNN acceleration, sublinear matching bounds) and the maturity of RuVector's existing crate ecosystem (ruvector-mincut, ruvector-solver, ruvector-gnn, cognitum-gate-kernel, ruvector-wasm) creates a narrow window to assemble a Rust-to-WASM microkernel that exhibits witnessable, reproducible, lightweight cognitive primitives. This document series maps each new result onto RuVector's existing crate surface and provides concrete integration paths.
---
## Research Documents
| # | Document | Focus |
|---|----------|-------|
| 01 | [Pseudo-Deterministic Min-Cut](./01-pseudo-deterministic-mincut.md) | Canonical min-cut as coherence gate primitive |
| 02 | [Sublinear Spectral Solvers](./02-sublinear-spectral-solvers.md) | Laplacian solvers, spectral coherence scoring |
| 03 | [Storage-Based GNN Acceleration](./03-storage-gnn-acceleration.md) | AGNES hyperbatch, cold-tier graph streaming |
| 04 | [WASM Microkernel Architecture](./04-wasm-microkernel-architecture.md) | Verifiable cognitive container design |
| 05 | [Cross-Stack Integration Strategy](./05-cross-stack-integration.md) | Unified roadmap, dependency mapping, ADR proposals |
---
## Key Findings
### 1. Canonical Min-Cut as Coherence Gate
The pseudo-deterministic min-cut result (O(m log^2 n) static, polylog dynamic update) provides a structural primitive that is both **reproducible** and **auditable** -- two properties the cognitum-gate-kernel currently lacks for its min-cut witness fragments. The canonical tie-breaking mechanism maps directly to the existing `WitnessReceipt` chain in `cognitum-gate-tilezero`.
**Affected crates**: `ruvector-mincut`, `ruvector-attn-mincut`, `cognitum-gate-kernel`, `cognitum-gate-tilezero`
### 2. Spectral Coherence via Sublinear Solvers
The `ruvector-solver` crate already implements Neumann series, conjugate gradient, forward/backward push, and hybrid random walk solvers at O(log n) for sparse systems. Connecting these to Laplacian eigenvalue estimation enables a **Spectral Coherence Score** -- a real-time signal for HNSW index health, graph drift, and attention mechanism stability.
**Affected crates**: `ruvector-solver`, `ruvector-solver-wasm`, `ruvector-coherence`, `prime-radiant`, `ruvector-math`
### 3. Storage-Efficient GNN Training
The AGNES-style hyperbatch technique (block-aligned I/O, hotset caching) enables GNN training on graphs that exceed RAM -- directly applicable to `ruvector-gnn`'s existing training pipeline. Combined with the mmap infrastructure already in `ruvector-gnn` (behind the `mmap` feature flag), this creates a viable cold-tier for large-scale graph learning.
**Affected crates**: `ruvector-gnn`, `ruvector-gnn-wasm`, `ruvector-gnn-node`, `ruvector-graph`
### 4. WASM Microkernel = Verifiable Cognitive Container
RuVector already has the components for a deterministic WASM microkernel:
- `cognitum-gate-kernel`: no_std, 64KB tiles, bump allocator, delta-based graph updates
- `ruvector-wasm`: kernel-pack system with Ed25519 verification, SHA256, epoch budgets
- `ruvector-solver-wasm`: O(log n) math in WASM
- `ruvector-mincut-wasm`: dynamic min-cut in WASM
The missing piece is **stitching these into a single sealed container** with a canonical witness chain.
### 5. Sublinear Matching Bounds Inform Detector Design
Recent lower bounds on non-adaptive sublinear matching show that **adaptive query patterns** are necessary for practical drift detection. This directly informs the design of anomaly detectors in `ruvector-coherence` and the evidence accumulation in `cognitum-gate-kernel`.
---
## Crate Dependency Map
```
ruvector-core
├── ruvector-graph ──────────────── ruvector-graph-wasm
│ └── ruvector-mincut ─────────── ruvector-mincut-wasm
│ ├── ruvector-attn-mincut
│ └── cognitum-gate-kernel ── (no_std WASM tile)
│ └── cognitum-gate-tilezero (arbiter)
├── ruvector-gnn ────────────────── ruvector-gnn-wasm
├── ruvector-solver ─────────────── ruvector-solver-wasm
├── ruvector-coherence
├── ruvector-sparse-inference ───── ruvector-sparse-inference-wasm
├── prime-radiant
└── ruvector-wasm (unified WASM bindings + kernel-pack)
```
---
## Quantitative Impact Projections
| Primitive | Current State | Post-Integration | Speedup | WASM-Ready |
|-----------|--------------|------------------|---------|------------|
| Min-cut gate | Randomized, non-canonical | Pseudo-deterministic, canonical | 1.5-3x static, 10x dynamic | Yes (cognitum-gate-kernel) |
| Coherence score | Dense Laplacian O(n^2) | Spectral O(log n) | 50-600x at 100K nodes | Yes (ruvector-solver-wasm) |
| GNN training | RAM-bound, batch | Hyperbatch streaming, cold-tier | 3-4x throughput | Partial (mmap not in WASM) |
| Drift detection | Oblivious sketches | Adaptive query patterns | 2-5x precision | Yes |
| Witness chain | Per-tile fragments | Canonical, hash-chained | Deterministic | Yes (kernel-pack Ed25519) |
---
## Strategic Recommendations
1. **Immediate (0-4 weeks)**: Implement canonical min-cut tie-breaker in `ruvector-mincut` behind a `canonical` feature flag. Wire to `cognitum-gate-kernel` witness fragment generation.
2. **Short-term (4-8 weeks)**: Build `SpectralCoherenceScore` in `ruvector-coherence` using `ruvector-solver`'s Neumann/CG solvers against the graph Laplacian. Expose via `ruvector-solver-wasm`.
3. **Medium-term (8-16 weeks)**: Implement hyperbatch I/O layer in `ruvector-gnn` behind a `cold-tier` feature flag. Use block-aligned direct I/O with hotset caching for graphs exceeding available memory.
4. **Medium-term (8-16 weeks)**: Seal the WASM microkernel by composing `cognitum-gate-kernel` + `ruvector-solver-wasm` + `ruvector-mincut-wasm` into a single `ruvector-cognitive-container` crate with deterministic seed, fixed memory slab, and Ed25519 witness chain.
5. **Ongoing**: Track sublinear matching lower bound results to refine adaptive detector design in coherence scoring modules.
---
## Vertical Alignment
| Vertical | Primary Primitive | Differentiator |
|----------|------------------|----------------|
| Finance (fraud, risk) | Canonical min-cut | Auditable structural safety gates |
| Cybersecurity | Spectral coherence | Real-time network fragility detection |
| Medical/Genomics | Cold-tier GNN | Large-scale genomic graph training |
| Regulated AI | WASM container | Deterministic, witnessable decisions |
| Edge/IoT | All four | Sub-10ms on ARM, no server required |
---
## Document Series Navigation
- **Next**: [01 - Pseudo-Deterministic Min-Cut](./01-pseudo-deterministic-mincut.md)
- **Full index**: This document

View File

@@ -0,0 +1,507 @@
# Pseudo-Deterministic Min-Cut as Coherence Gate Primitive
**Document ID**: wasm-integration-2026/01-pseudo-deterministic-mincut
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Algorithmic Research — Graph Theory
**Series**: [Executive Summary](./00-executive-summary.md) | **01** | [02](./02-sublinear-spectral-solvers.md) | [03](./03-storage-gnn-acceleration.md) | [04](./04-wasm-microkernel-architecture.md) | [05](./05-cross-stack-integration.md)
---
## Abstract
This document analyzes the pseudo-deterministic min-cut result — the first algorithm achieving canonical (unique, reproducible) minimum cuts in O(m log² n) time for static graphs and polylogarithmic amortized update time for dynamic graphs — and maps it onto RuVector's existing crate surface. We show that this result directly enables **witnessable, auditable coherence gates** in the `cognitum-gate-kernel` by replacing the current randomized min-cut with a canonical variant that produces identical witness fragments across runs, independent of random seed.
---
## 1. Background: The Min-Cut Problem in Graph Theory
### 1.1 Definition and Classical Results
The **global minimum cut** (min-cut) of an undirected weighted graph G = (V, E, w) is the minimum total weight of edges whose removal disconnects G. Formally:
```
λ(G) = min_{S ⊂ V, S ≠ ∅} w(S, V\S)
```
where w(S, V\S) = Σ_{(u,v)∈E: u∈S, v∈V\S} w(u,v).
Classical results form a rich lineage:
| Year | Authors | Time Complexity | Notes |
|------|---------|----------------|-------|
| 1961 | Gomory-Hu | O(n) max-flow calls | Cut tree construction |
| 1996 | Karger | O(m log³ n) | Randomized contraction |
| 1996 | Stoer-Wagner | O(mn + n² log n) | Deterministic, simple |
| 2000 | Karger | O(m log² n) expected | Near-linear randomized |
| 2022 | Li et al. | Õ(m) | Near-linear deterministic |
| 2024 | Kawarabayashi-Thorup | O(m log² n) | Pseudo-deterministic |
| 2025 | Extended results | Polylog dynamic | Dynamic canonical cuts |
### 1.2 Randomized vs. Deterministic: The Gap
Randomized algorithms (Karger's contraction) run in near-linear time but produce **different outputs across runs**. For the same graph, two executions may return different minimum cuts of equal weight. While mathematically equivalent, this non-determinism is problematic for:
1. **Auditability**: Regulatory frameworks (EU AI Act, FDA SaMD) require reproducible decisions
2. **Witness chains**: Hash-linked proof chains break when intermediate values change
3. **Distributed consensus**: Replicas must agree on cut structure, not just cut value
4. **Testing**: Non-deterministic outputs make regression testing unreliable
Fully deterministic algorithms (Stoer-Wagner, Li et al.) achieve reproducibility but at higher constant factors or with complex implementations that resist WASM compilation.
### 1.3 Pseudo-Deterministic Min-Cut: The Breakthrough
A **pseudo-deterministic** algorithm is a randomized algorithm that, with high probability, produces a **unique canonical output** — the same output across all runs, regardless of random coin flips. Formally:
```
∀G: Pr[A(G) = c*(G)] ≥ 1 - 1/poly(n)
```
where c*(G) is the unique canonical min-cut defined by a deterministic tie-breaking rule.
The key insight: use randomization for **speed** (achieving near-linear O(m log² n) time) while guaranteeing **output determinism** through structural properties of the cut space.
---
## 2. The Algorithm: Structure and Invariants
### 2.1 High-Level Architecture
The pseudo-deterministic min-cut algorithm combines three ingredients:
1. **Cactus representation**: The cactus graph C(G) encodes ALL minimum cuts of G in a compact O(n)-size structure. Every min-cut corresponds to either an edge or a cycle of the cactus.
2. **Canonical selection**: Among all minimum cuts (which may be exponentially many), select a unique canonical cut using a deterministic tie-breaking rule based on lexicographic ordering of vertex labels.
3. **Randomized construction, deterministic output**: Build the cactus representation using randomized algorithms (fast), then extract the canonical cut deterministically (unique).
### 2.2 Cactus Graph Construction
The cactus graph C(G) satisfies:
- |V(C)| = O(n), |E(C)| = O(n)
- Every minimum cut of G corresponds to removing an edge or pair of cycle edges in C
- Construction via tree packing: sample O(log n) spanning trees, compute tree-respecting cuts
```
Algorithm: BuildCactus(G)
1. Sample O(log² n) random spanning trees T₁, ..., T_k
2. For each Tᵢ, compute all tree-respecting minimum cuts
3. Merge into cactus structure via contraction
4. Return C(G) with vertex mapping π: V(G) → V(C)
```
Time: O(m log² n) — dominated by max-flow computations on contracted graphs.
### 2.3 Canonical Tie-Breaking
Given the cactus C(G), the canonical cut is selected by:
```
Algorithm: CanonicalCut(C, π)
1. Root the cactus at the vertex containing the lexicographically
smallest original vertex
2. For each candidate cut (edge or cycle-pair removal):
a. Compute the lexicographically smallest vertex set S on
the root side
b. Define canonical_key(cut) = sort(π⁻¹(S))
3. Return the cut with the lexicographically smallest canonical_key
```
This produces a **unique** canonical cut because:
- The cactus is unique (up to isomorphism)
- The rooting is deterministic (lex-smallest vertex)
- The tie-breaking is deterministic (lex-smallest key)
### 2.4 Dynamic Extension
For dynamic graphs (edge insertions/deletions), maintain the cactus incrementally:
| Operation | Amortized Time | Description |
|-----------|---------------|-------------|
| Edge insertion | O(polylog n) | Update cactus via local restructuring |
| Edge deletion | O(polylog n) | Recompute affected subtrees |
| Cut query | O(1) | Cached canonical cut value |
| Witness extraction | O(k) | k = cut edges in canonical partition |
The dynamic algorithm maintains a hierarchy of expander decompositions, updating the cactus through local perturbations rather than global recomputation.
---
## 3. RuVector Crate Mapping
### 3.1 Current State: `ruvector-mincut`
The existing `ruvector-mincut` crate provides:
```rust
// Current API surface
pub trait DynamicMinCut {
fn min_cut_value(&self) -> f64;
fn insert_edge(&mut self, u: usize, v: usize, w: f64) -> Result<()>;
fn delete_edge(&mut self, u: usize, v: usize) -> Result<()>;
fn min_cut_edges(&self) -> Vec<(usize, usize)>;
}
```
**Feature flags**: `exact` (default), `approximate`, `monitoring`, `integration`, `simd`
**Architecture**: Graph representation → Hierarchical tree decomposition → Link-cut trees → Euler tour trees → Expander decomposition
**Key limitation**: The current `min_cut_edges()` returns **a** minimum cut, not **the** canonical minimum cut. Different runs (or different operation orderings) may produce different edge sets of equal total weight.
### 3.2 Integration Path: Adding Canonical Mode
```rust
// Proposed extension (behind `canonical` feature flag)
pub trait CanonicalMinCut: DynamicMinCut {
/// Returns the unique canonical minimum cut.
/// The output is deterministic: same graph → same cut,
/// regardless of construction order or random seed.
fn canonical_cut(&self) -> CanonicalCutResult;
/// Returns the cactus representation of all minimum cuts.
fn cactus_graph(&self) -> &CactusGraph;
/// Returns a witness receipt for the canonical cut.
/// The receipt includes:
/// - SHA256 hash of the canonical partition
/// - Monotonic epoch counter
/// - Cut value and edge list
fn witness_receipt(&self) -> WitnessReceipt;
}
pub struct CanonicalCutResult {
pub value: f64,
pub partition: (Vec<usize>, Vec<usize>),
pub cut_edges: Vec<(usize, usize, f64)>,
pub canonical_key: Vec<u8>, // SHA256 of sorted partition
}
pub struct CactusGraph {
pub vertices: Vec<CactusVertex>,
pub edges: Vec<CactusEdge>,
pub cycles: Vec<CactusCycle>,
pub vertex_map: HashMap<usize, usize>, // original → cactus
}
pub struct WitnessReceipt {
pub epoch: u64,
pub cut_hash: [u8; 32],
pub cut_value: f64,
pub edge_count: usize,
pub timestamp_ns: u64,
}
```
### 3.3 Implementation Checklist
| Step | Effort | Dependencies | Description |
|------|--------|-------------|-------------|
| 1. Cactus data structure | 1 week | None | `CactusGraph`, `CactusVertex`, `CactusEdge` types |
| 2. Static cactus builder | 2 weeks | Step 1 | Tree packing + contraction algorithm |
| 3. Canonical selection | 1 week | Step 2 | Lex tie-breaking on rooted cactus |
| 4. Dynamic maintenance | 3 weeks | Steps 1-3 | Incremental cactus updates |
| 5. Witness receipt | 1 week | Step 3 | SHA256 hashing, epoch tracking |
| 6. WASM compilation | 1 week | Steps 1-5 | Verify no_std compatibility, test in ruvector-mincut-wasm |
---
## 4. Cognitum Gate Kernel Integration
### 4.1 Current Gate Architecture
The `cognitum-gate-kernel` is a no_std WASM kernel running on 256 tiles, each with ~64KB memory:
```
Tile Architecture (64KB budget):
├── CompactGraph: ~42KB (vertices, edges, adjacency)
├── EvidenceAccumulator: ~2KB (hypotheses, sliding window)
├── TileState: ~1KB (configuration, buffers)
└── Stack/Control: ~19KB (remaining)
```
Each tile:
1. Receives delta updates (edge additions/removals/weight changes)
2. Maintains a local graph shard
3. Produces **witness fragments** for global min-cut aggregation
### 4.2 The Witness Fragment Problem
Currently, witness fragments are **non-canonical**: given the same sequence of deltas, two tiles may produce different witness fragments due to:
1. **Floating-point ordering**: Different reduction orders yield different rounding
2. **Hash collision resolution**: Non-deterministic hash table iteration order
3. **Partial view**: Each tile sees only its shard; global cut depends on aggregation order
This means the aggregated witness chain (in `cognitum-gate-tilezero`) is **not reproducible** — a fatal flaw for auditable AI systems.
### 4.3 Canonical Witness Fragments
With pseudo-deterministic min-cut, each tile produces a **canonical** witness fragment:
```rust
// In cognitum-gate-kernel
pub struct CanonicalWitnessFragment {
pub tile_id: u8,
pub epoch: u64,
pub local_cut_value: f64,
pub canonical_partition_hash: [u8; 32],
pub boundary_edges: Vec<BoundaryEdge>,
pub cactus_digest: [u8; 16], // Truncated hash of local cactus
}
impl TileState {
pub fn canonical_witness(&self) -> CanonicalWitnessFragment {
// 1. Build local cactus from CompactGraph
let cactus = self.graph.build_cactus();
// 2. Select canonical cut via lex tie-breaking
let canonical = cactus.canonical_cut();
// 3. Hash the canonical partition
let hash = sha256(&canonical.sorted_partition());
// 4. Emit fragment
CanonicalWitnessFragment {
tile_id: self.config.tile_id,
epoch: self.epoch,
local_cut_value: canonical.value,
canonical_partition_hash: hash,
boundary_edges: canonical.boundary_edges(),
cactus_digest: truncate_hash(&sha256(&cactus.serialize())),
}
}
}
```
### 4.4 Memory Budget Analysis
Can we fit a cactus representation in the 64KB tile budget?
For a tile managing V_local vertices and E_local edges:
| Component | Current Size | With Cactus | Delta |
|-----------|-------------|-------------|-------|
| CompactGraph | ~42KB | ~42KB | 0 |
| CactusGraph | 0 | ~4KB (V_local ≤ 256) | +4KB |
| CanonicalState | 0 | ~512B | +512B |
| EvidenceAccumulator | ~2KB | ~2KB | 0 |
| TileState | ~1KB | ~1KB | 0 |
| **Total** | **~45KB** | **~49.5KB** | **+4.5KB** |
| **Remaining** | **~19KB** | **~14.5KB** | — |
**Verdict**: Fits within 64KB budget with 14.5KB headroom for stack and control flow. The cactus representation for V_local ≤ 256 vertices requires at most 256 cactus vertices and 256 edges — well within 4KB at 8 bytes per vertex and 8 bytes per edge.
---
## 5. Theoretical Analysis
### 5.1 Complexity Comparison
| Algorithm | Time (static) | Time (dynamic update) | Deterministic Output | Space |
|-----------|--------------|----------------------|---------------------|-------|
| Karger contraction | O(m log³ n) | N/A | No | O(n²) |
| Stoer-Wagner | O(mn + n² log n) | N/A | Yes | O(n²) |
| Current ruvector-mincut | O(n^{o(1)}) amortized | O(n^{o(1)}) | No | O(m) |
| Pseudo-deterministic | O(m log² n) | O(polylog n) | Yes (w.h.p.) | O(m + n) |
### 5.2 Correctness Guarantees
The pseudo-deterministic algorithm guarantees:
1. **Canonical consistency**: For any graph G, the algorithm outputs the same canonical cut c*(G) with probability ≥ 1 - 1/n³
2. **Value correctness**: The canonical cut always has minimum weight: w(c*(G)) = λ(G) with probability 1 (the value is always correct; only the specific partition is canonical)
3. **Dynamic consistency**: After a sequence of k updates, the canonical cut of the resulting graph G_k matches what a fresh computation on G_k would produce, with probability ≥ 1 - k/n³
4. **Composition safety**: When 256 tiles each produce canonical witness fragments, the global aggregation is deterministic provided all tiles agree on the canonical convention
### 5.3 Lower Bounds and Optimality
The O(m log² n) static time is within a log factor of the Ω(m) lower bound for any comparison-based min-cut algorithm. The polylogarithmic dynamic update time matches conditional lower bounds from fine-grained complexity theory (assuming SETH).
---
## 6. WASM-Specific Considerations
### 6.1 No-Alloc Cactus Construction
For the `cognitum-gate-kernel` (no_std, bump allocator), the cactus must be built without heap allocation beyond the pre-allocated arena:
```rust
// Arena-allocated cactus for no_std
pub struct ArenaCactus<'a> {
vertices: &'a mut [CactusVertex; 256], // Max 256 per tile
edges: &'a mut [CactusEdge; 256],
n_vertices: u16,
n_edges: u16,
root: u16,
}
impl<'a> ArenaCactus<'a> {
/// Build cactus from CompactGraph using pre-allocated arena.
/// No heap allocation beyond the provided slices.
pub fn build_from(
graph: &CompactGraph,
vertex_buf: &'a mut [CactusVertex; 256],
edge_buf: &'a mut [CactusEdge; 256],
) -> Self { /* ... */ }
}
```
### 6.2 Floating-Point Determinism in WASM
WASM's floating-point semantics are IEEE 754 compliant but with **non-deterministic NaN bit patterns**. For canonical cuts:
- Use integer arithmetic for weight comparisons where possible
- Represent weights as fixed-point (e.g., `u64` with 32 fractional bits)
- Avoid fused multiply-add (FMA) operations that vary across platforms
```rust
/// Fixed-point weight representation for deterministic comparison.
/// 32.32 format: upper 32 bits = integer part, lower 32 = fractional.
#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct FixedWeight(u64);
impl FixedWeight {
pub fn from_f64(w: f64) -> Self {
FixedWeight((w * (1u64 << 32) as f64) as u64)
}
pub fn to_f64(self) -> f64 {
self.0 as f64 / (1u64 << 32) as f64
}
}
```
### 6.3 SIMD Acceleration
The `ruvector-mincut` crate has a `simd` feature flag. For WASM SIMD (128-bit):
- **Tree packing**: Vectorize spanning tree sampling with SIMD random number generation
- **Weight comparison**: 4-wide f32 or 2-wide f64 comparisons
- **Partition hashing**: SIMD-accelerated SHA256 (or use a simpler hash for performance)
Expected speedup: 1.5-2x for static construction on WASM targets.
---
## 7. Empirical Projections
### 7.1 Benchmark Targets
| Graph Size | Current (randomized) | Projected (canonical) | Overhead |
|-----------|---------------------|-----------------------|----------|
| 1K vertices | 0.3 ms | 0.5 ms | 1.7x |
| 10K vertices | 8 ms | 14 ms | 1.75x |
| 100K vertices | 180 ms | 320 ms | 1.8x |
| 1M vertices | 4.2 s | 7.5 s | 1.8x |
The ~1.8x overhead comes from cactus construction and canonical selection. This is a favorable trade for deterministic output.
### 7.2 Dynamic Update Projections
| Update Rate | Current Amortized | Projected (canonical) | Canonical Overhead |
|-------------|------------------|-----------------------|-------------------|
| 100 updates/s | 0.1 ms/update | 0.15 ms/update | 1.5x |
| 1K updates/s | 0.08 ms/update | 0.12 ms/update | 1.5x |
| 10K updates/s | 0.05 ms/update | 0.08 ms/update | 1.6x |
### 7.3 WASM Tile Projections
Per-tile (V_local ≤ 256, E_local ≤ 1024):
| Operation | Time (native) | Time (WASM) | WASM Overhead |
|-----------|--------------|-------------|---------------|
| Cactus build | 12 μs | 25 μs | 2.1x |
| Canonical select | 3 μs | 6 μs | 2.0x |
| Witness hash | 8 μs | 15 μs | 1.9x |
| **Total per tick** | **23 μs** | **46 μs** | **2.0x** |
At 46 μs per tick, a tile can process ~21,000 ticks/second in WASM — well above the target of 1,000 ticks/second for real-time coherence monitoring.
---
## 8. Vertical Applications
### 8.1 Financial Fraud Detection
- **Use case**: Monitor transaction graphs for structural fragility
- **Canonical min-cut**: Reproducible fragility scores for regulatory reporting
- **Audit trail**: Hash-chained witness fragments provide tamper-evident history
- **Requirement**: SOX compliance demands reproducible computations
### 8.2 Cybersecurity Network Monitoring
- **Use case**: Detect network partitioning attacks in real-time
- **Canonical min-cut**: Deterministic "weakest link" identification
- **Dynamic updates**: Edge insertions (new connections) and deletions (dropped links) at polylog cost
- **WASM deployment**: Run in browser-based SOC dashboards without server dependency
### 8.3 Regulated AI Decision Auditing
- **Use case**: Attention mechanism coherence gates for medical/legal AI
- **Canonical min-cut**: Proves that the coherence gate fired identically across replicated runs
- **Witness chain**: Links gate decisions to input data via canonical partition hashes
- **EU AI Act**: Article 13 (Transparency) requires reproducible explanation artifacts
---
## 9. Open Questions and Future Work
1. **Weighted cactus for heterogeneous edge types**: Can the cactus representation be extended to multigraphs with typed edges (as used in `ruvector-graph`)?
2. **Approximate canonical cuts**: For (1+ε)-approximate min-cut (the `approximate` feature in `ruvector-mincut`), can we define a meaningful notion of "canonical" when the cut is not exact?
3. **Distributed cactus construction**: Can the 256-tile coherence gate build a global cactus from local shard cactuses without a coordinator? This relates to the Gomory-Hu tree merging problem.
4. **Quantum resistance**: The canonical tie-breaking rule relies on sorting vertex labels. Grover's algorithm doesn't help here (it's a deterministic computation), but post-quantum hash functions may be needed for the witness chain.
5. **Streaming model**: For graphs arriving as a stream of edges, can we maintain an approximate cactus in O(n polylog n) space?
---
## 10. Recommendations
### Immediate Actions (0-4 weeks)
1. Add `canonical` feature flag to `ruvector-mincut` Cargo.toml
2. Implement `CactusGraph` data structure with arena allocation
3. Implement `CanonicalCut` trait extending `DynamicMinCut`
4. Add `FixedWeight` type for deterministic comparison
5. Write property-based tests: same graph → same canonical cut across 1000 runs
### Short-Term (4-8 weeks)
6. Implement static cactus builder via tree packing
7. Wire canonical witness fragment into `cognitum-gate-kernel`
8. Benchmark canonical overhead vs. current randomized min-cut
9. Compile and test in `ruvector-mincut-wasm`
### Medium-Term (8-16 weeks)
10. Implement dynamic cactus maintenance
11. Integrate with `cognitum-gate-tilezero` witness aggregation
12. Add canonical mode to `ruvector-attn-mincut` attention gating
13. Publish updated `ruvector-mincut` with `canonical` feature to crates.io
---
## References
1. Kawarabayashi, K., Thorup, M. "Pseudo-Deterministic Minimum Cut." STOC 2024.
2. Karger, D.R. "Minimum Cuts in Near-Linear Time." J. ACM, 2000.
3. Stoer, M., Wagner, F. "A Simple Min-Cut Algorithm." J. ACM, 1997.
4. Li, J., Nanongkai, D., et al. "Deterministic Min-Cut in Almost-Linear Time." STOC 2022.
5. Gomory, R.E., Hu, T.C. "Multi-Terminal Network Flows." SIAM J., 1961.
6. Dinitz, Y., Vainshtein, A., Westbrook, J. "Maintaining the Classes of 4-Edge-Connectivity in a Graph On-Line." Algorithmica, 2000.
7. Goldberg, A.V., Rao, S. "Beyond the Flow Decomposition Barrier." J. ACM, 1998.
---
## Document Navigation
- **Previous**: [00 - Executive Summary](./00-executive-summary.md)
- **Next**: [02 - Sublinear Spectral Solvers](./02-sublinear-spectral-solvers.md)
- **Index**: [Executive Summary](./00-executive-summary.md)

View File

@@ -0,0 +1,726 @@
# Sublinear Spectral Solvers and Coherence Scoring
**Document ID**: wasm-integration-2026/02-sublinear-spectral-solvers
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Algorithmic Research — Numerical Linear Algebra
**Series**: [Executive Summary](./00-executive-summary.md) | [01](./01-pseudo-deterministic-mincut.md) | **02** | [03](./03-storage-gnn-acceleration.md) | [04](./04-wasm-microkernel-architecture.md) | [05](./05-cross-stack-integration.md)
---
## Abstract
This document examines sublinear-time spectral methods — Laplacian solvers, eigenvalue estimators, and spectral sparsifiers — and their integration with RuVector's `ruvector-solver` crate ecosystem. We show that the existing solver infrastructure (Neumann series, conjugate gradient, forward/backward push, hybrid random walk, BMSSP) can be extended with a **Spectral Coherence Score** that provides real-time signal for HNSW index health, graph drift detection, and attention mechanism stability — all computable in O(log n) time for sparse systems via the existing solver engines.
---
## 1. Spectral Graph Theory Primer
### 1.1 The Graph Laplacian
For an undirected weighted graph G = (V, E, w) with n vertices, the **graph Laplacian** is:
```
L = D - A
```
where D = diag(d₁, ..., dₙ) is the degree matrix and A is the adjacency matrix. The **normalized Laplacian** is:
```
L_norm = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}
```
Key spectral properties:
- L is positive semidefinite: all eigenvalues λ₀ ≤ λ₁ ≤ ... ≤ λₙ₋₁ ≥ 0
- λ₀ = 0 always (corresponding eigenvector: all-ones)
- **Algebraic connectivity** λ₁ = Fiedler value: measures how "connected" the graph is
- **Spectral gap** λ₁/λₙ₋₁: measures expansion quality
- Number of zero eigenvalues = number of connected components
### 1.2 Why Spectral Methods Matter for RuVector
RuVector operates on high-dimensional vector databases with HNSW graph indices. The spectral properties of these graphs directly correlate with:
| Spectral Property | RuVector Signal | Meaning |
|------------------|----------------|---------|
| λ₁ (Fiedler value) | Index connectivity | Low λ₁ → fragile index, vulnerable to node removal |
| λ₁/λₙ₋₁ (spectral gap) | Search efficiency | Wide gap → fast random walk convergence → fast search |
| Σ 1/λᵢ (effective resistance) | Redundancy | High total resistance → sparse, fragile structure |
| tr(L⁺) (Laplacian pseudoinverse trace) | Average path length | High trace → slow information propagation |
| λ_{n-1} (largest eigenvalue) | Degree regularity | Large → highly irregular degree distribution |
### 1.3 The Sublinear Revolution
Classical Laplacian solvers (Gaussian elimination, dense eigendecomposition) require O(n³) time. The sublinear revolution has progressively reduced this:
| Year | Result | Time | Notes |
|------|--------|------|-------|
| 2004 | Spielman-Teng | Õ(m) | First near-linear Laplacian solver |
| 2013 | Cohen et al. | O(m√(log n)) | Practical near-linear solver |
| 2014 | Kelner et al. | Õ(m) | Random walk-based |
| 2018 | Schild | Õ(m) | Simplified construction |
| 2022 | Sublinear eigenvalue | O(n polylog n) | Top-k eigenvalues without full matrix |
| 2024 | Streaming spectral | O(n log² n) space | Single-pass Laplacian sketching |
| 2025 | Adaptive spectral | O(log n) per query | Amortized via precomputation |
The key insight: for **monitoring** (not solving), we don't need the full solution — we need **spectral summaries** that can be maintained incrementally.
---
## 2. RuVector Solver Crate Analysis
### 2.1 Existing Solver Engines
The `ruvector-solver` crate provides 7 solver engines:
| Solver | Feature Flag | Method | Complexity | Best For |
|--------|-------------|--------|-----------|----------|
| `NeumannSolver` | `neumann` | Neumann series: x = Σ(I-A)ᵏb | O(κ log(1/ε)) | Diagonally dominant, κ < 10 |
| `CgSolver` | `cg` | Conjugate gradient | O(√κ log(1/ε)) | SPD systems, moderate condition |
| `ForwardPush` | `forward-push` | Local push from source | O(1/ε) per source | Personalized PageRank, local |
| `BackwardPush` | `backward-push` | Reverse local push | O(1/ε) per target | Target-specific solutions |
| `RandomWalkSolver` | `hybrid-random-walk` | Monte Carlo + push | O(log n) amortized | Large sparse graphs |
| `BmsspSolver` | `bmssp` | Bounded multi-source shortest path | O(m·s/n) | s-source reachability |
| `TrueSolver` | `true-solver` | Direct factorization | O(n³) worst case | Small dense systems, ground truth |
### 2.2 Solver Router
The `ruvector-solver` includes a `router` module that automatically selects the optimal solver based on matrix properties:
```rust
pub mod router;
// Routes to optimal solver based on:
// - Matrix size (n)
// - Sparsity pattern
// - Diagonal dominance ratio
// - Condition number estimate
// - Available features
```
### 2.3 WASM Variants
- `ruvector-solver-wasm`: Full solver suite compiled to WASM via wasm-bindgen
- `ruvector-solver-node`: Node.js bindings via NAPI-RS
Both variants expose the same solver API with WASM-compatible memory management.
### 2.4 Supporting Infrastructure
```rust
pub mod arena; // Arena allocator for scratch space
pub mod audit; // Computation audit trails
pub mod budget; // Compute budget tracking
pub mod events; // Solver event system
pub mod simd; // SIMD-accelerated operations
pub mod traits; // SolverEngine trait
pub mod types; // CsrMatrix, ComputeBudget
pub mod validation; // Input validation
```
---
## 3. Spectral Coherence Score Design
### 3.1 Definition
The **Spectral Coherence Score** (SCS) is a composite metric measuring the structural health of a graph index:
```
SCS(G) = α · normalized_fiedler(G)
+ β · spectral_gap_ratio(G)
+ γ · effective_resistance_score(G)
+ δ · degree_regularity_score(G)
```
where α + β + γ + δ = 1 and each component is normalized to [0, 1]:
```
normalized_fiedler(G) = λ₁ / d_avg
spectral_gap_ratio(G) = λ₁ / λ_{n-1}
effective_resistance_score(G) = 1 - (n·R_avg / (n-1))
degree_regularity_score(G) = 1 - σ(d) / μ(d)
```
### 3.2 Sublinear Computation via Existing Solvers
Each component can be estimated in O(log n) amortized time using the existing solver engines:
#### Fiedler Value Estimation
Use the **inverse power method** with the CG solver:
```rust
/// Estimate λ₁ (Fiedler value) via inverse iteration.
/// Each iteration solves L·x = b using CgSolver.
/// Convergence: O(log(n/ε)) iterations for ε-approximation.
pub fn estimate_fiedler(
laplacian: &CsrMatrix<f64>,
solver: &CgSolver,
tolerance: f64,
) -> f64 {
let n = laplacian.rows();
let mut x = random_unit_vector(n);
// Deflate: project out the all-ones eigenvector
let ones = vec![1.0 / (n as f64).sqrt(); n];
for _ in 0..50 { // Max 50 iterations
// Project out null space
let proj = dot(&x, &ones);
for i in 0..n { x[i] -= proj * ones[i]; }
normalize(&mut x);
// Solve L·y = x (inverse iteration)
let result = solver.solve(laplacian, &x).unwrap();
x = result.solution;
// Rayleigh quotient = 1/λ₁ estimate
let rayleigh = dot(&x, &matvec(laplacian, &x)) / dot(&x, &x);
if (rayleigh - 1.0/result.residual_norm).abs() < tolerance {
return rayleigh;
}
}
// Return last Rayleigh quotient
dot(&x, &matvec(laplacian, &x)) / dot(&x, &x)
}
```
#### Spectral Gap via Random Walk
Use the `RandomWalkSolver` to estimate mixing time, which relates to the spectral gap:
```rust
/// Estimate spectral gap via random walk mixing time.
/// Mixing time τ ≈ 1/λ₁ · ln(n), so λ₁ ≈ ln(n)/τ.
pub fn estimate_spectral_gap(
graph: &CsrMatrix<f64>,
walker: &RandomWalkSolver,
n_walks: usize,
) -> f64 {
let n = graph.rows();
let mut mixing_times = Vec::with_capacity(n_walks);
for _ in 0..n_walks {
let start = random_vertex(n);
let mixing_time = walker.estimate_mixing_time(graph, start);
mixing_times.push(mixing_time);
}
let avg_mixing = mean(&mixing_times);
let ln_n = (n as f64).ln();
// λ₁ ≈ ln(n) / τ_mix
ln_n / avg_mixing
}
```
#### Effective Resistance via Forward Push
Use `ForwardPush` to compute personalized PageRank vectors, which approximate effective resistances:
```rust
/// Estimate average effective resistance via local push.
/// R_eff(u,v) ≈ (p_u(u) - p_u(v)) / d_u where p_u is PPR from u.
pub fn estimate_avg_resistance(
graph: &CsrMatrix<f64>,
push: &ForwardPush,
n_samples: usize,
) -> f64 {
let n = graph.rows();
let mut total_resistance = 0.0;
for _ in 0..n_samples {
let u = random_vertex(n);
let v = random_vertex(n);
if u == v { continue; }
let ppr_u = push.personalized_pagerank(graph, u, 0.15);
let r_uv = (ppr_u[u] - ppr_u[v]).abs() / degree(graph, u) as f64;
total_resistance += r_uv;
}
total_resistance / n_samples as f64
}
```
### 3.3 Incremental Maintenance
The SCS can be maintained incrementally as the graph changes:
```rust
pub struct SpectralCoherenceTracker {
/// Cached Fiedler value estimate
fiedler_estimate: f64,
/// Cached spectral gap estimate
gap_estimate: f64,
/// Cached effective resistance estimate
resistance_estimate: f64,
/// Cached degree regularity
regularity: f64,
/// Number of updates since last full recomputation
updates_since_refresh: usize,
/// Threshold for triggering full recomputation
refresh_threshold: usize,
/// Weights for score components
weights: [f64; 4],
}
impl SpectralCoherenceTracker {
/// O(1) amortized: update after edge insertion/deletion.
/// Uses perturbation theory to adjust estimates.
pub fn update_edge(&mut self, u: usize, v: usize, weight_delta: f64) {
// First-order perturbation of Fiedler value:
// Δλ₁ ≈ weight_delta · (φ₁[u] - φ₁[v])²
// where φ₁ is the Fiedler vector
self.updates_since_refresh += 1;
if self.updates_since_refresh >= self.refresh_threshold {
self.full_recompute();
} else {
self.perturbation_update(u, v, weight_delta);
}
}
/// O(log n): full recomputation using solver engines.
pub fn full_recompute(&mut self) { /* ... */ }
/// O(1): perturbation-based update.
fn perturbation_update(&mut self, u: usize, v: usize, delta: f64) { /* ... */ }
/// Get the current Spectral Coherence Score.
pub fn score(&self) -> f64 {
self.weights[0] * self.fiedler_estimate
+ self.weights[1] * self.gap_estimate
+ self.weights[2] * self.resistance_estimate
+ self.weights[3] * self.regularity
}
}
```
---
## 4. Integration with Existing Crates
### 4.1 ruvector-coherence Extension
The existing `ruvector-coherence` crate provides:
- `contradiction_rate`: Measures contradictions in attention outputs
- `delta_behavior`: Tracks behavioral drift
- `entailment_consistency`: Measures logical consistency
- `compare_attention_masks`: Compares attention patterns
- `cosine_similarity`, `l2_distance`: Vector quality metrics
- `quality_check`: Composite quality assessment
- `evaluate_batch`: Batched evaluation
**Proposed extension**: Add a `spectral` module behind a feature flag:
```rust
// ruvector-coherence/src/spectral.rs
// Feature: "spectral" (depends on ruvector-solver)
/// Spectral Coherence Score for graph index health.
pub struct SpectralCoherenceScore {
pub fiedler: f64,
pub spectral_gap: f64,
pub effective_resistance: f64,
pub degree_regularity: f64,
pub composite: f64,
}
/// Compute spectral coherence for a graph.
pub fn spectral_coherence(
laplacian: &CsrMatrix<f64>,
config: &SpectralConfig,
) -> SpectralCoherenceScore { /* ... */ }
/// Track spectral coherence incrementally.
pub struct SpectralTracker { /* ... */ }
```
### 4.2 ruvector-solver Integration Points
| Coherence Component | Solver Engine | Feature Flag | Iterations |
|--------------------|---------------|-------------|------------|
| Fiedler value | `CgSolver` | `cg` | O(log n) |
| Spectral gap | `RandomWalkSolver` | `hybrid-random-walk` | O(log n) |
| Effective resistance | `ForwardPush` | `forward-push` | O(1/ε) per sample |
| Degree regularity | Direct computation | None | O(n) one-pass |
| Full SCS refresh | Router (auto-select) | All | O(log n) amortized |
### 4.3 prime-radiant Connection
The `prime-radiant` crate implements attention mechanisms. Spectral coherence provides a **health signal** for these mechanisms:
```
Attention output → ruvector-coherence (behavioral metrics)
↓ ↓
Graph index → ruvector-solver (spectral metrics)
↓ ↓
Combined → SpectralCoherenceScore + QualityResult
Gate decision (cognitum-gate-kernel)
```
### 4.4 HNSW Index Health Monitoring
The HNSW graph in `ruvector-core` can be monitored for structural health:
```rust
/// Monitor HNSW graph health via spectral properties.
pub struct HnswHealthMonitor {
tracker: SpectralTracker,
alert_thresholds: AlertThresholds,
}
pub struct AlertThresholds {
/// Minimum acceptable Fiedler value (below = fragile index)
pub min_fiedler: f64, // Default: 0.01
/// Minimum acceptable spectral gap (below = poor expansion)
pub min_spectral_gap: f64, // Default: 0.1
/// Maximum acceptable effective resistance
pub max_resistance: f64, // Default: 10.0
/// Minimum composite SCS (below = trigger rebuild)
pub min_composite_scs: f64, // Default: 0.3
}
pub enum HealthAlert {
FragileIndex { fiedler: f64 },
PoorExpansion { gap: f64 },
HighResistance { resistance: f64 },
LowCoherence { scs: f64 },
RebuildRecommended { reason: String },
}
```
---
## 5. WASM Deployment Strategy
### 5.1 ruvector-solver-wasm Capability
The `ruvector-solver-wasm` crate already compiles all 7 solver engines to WASM. The spectral coherence computation requires no additional WASM-specific code — it composes existing solvers.
### 5.2 Memory Considerations
For a graph with n vertices and m edges in WASM:
| Component | Memory | At n=10K, m=100K |
|-----------|--------|------------------|
| CSR matrix (Laplacian) | 12m + 4(n+1) bytes | 1.24 MB |
| Solver scratch space | 8n bytes per vector, ~5 vectors | 400 KB |
| Spectral tracker state | ~200 bytes | 200 B |
| **Total** | **12m + 44n + 200** | **~1.64 MB** |
WASM linear memory starts at 1 page (64KB) and grows on demand. For 10K-vertex graphs, ~26 WASM pages suffice.
### 5.3 Web Worker Integration
For browser deployment, spectral computation runs in a Web Worker to avoid blocking the main thread:
```typescript
// spectral-worker.ts
import init, { SpectralTracker } from 'ruvector-solver-wasm';
await init();
const tracker = new SpectralTracker(config);
self.onmessage = (event) => {
switch (event.data.type) {
case 'update_edge':
tracker.update_edge(event.data.u, event.data.v, event.data.weight);
self.postMessage({ type: 'scs', value: tracker.score() });
break;
case 'full_recompute':
tracker.recompute();
self.postMessage({ type: 'scs', value: tracker.score() });
break;
}
};
```
### 5.4 Streaming Spectral Sketches
For WASM environments with limited memory, use spectral sketches that maintain O(n polylog n) space:
```rust
/// Streaming spectral sketch for memory-constrained WASM.
/// Maintains ε-approximate spectral properties in O(n log² n / ε²) space.
pub struct SpectralSketch {
/// Johnson-Lindenstrauss projection of Fiedler vector
fiedler_sketch: Vec<f64>, // O(log n / ε²) entries
/// Degree histogram for regularity
degree_histogram: Vec<u32>, // O(√n) bins
/// Running statistics
edge_count: usize,
vertex_count: usize,
weight_sum: f64,
}
```
---
## 6. Spectral Sparsification
### 6.1 Background
A **spectral sparsifier** H of G is a sparse graph (O(n log n / ε²) edges) such that:
```
(1-ε) · x^T L_G x ≤ x^T L_H x ≤ (1+ε) · x^T L_G x ∀x ∈ R^n
```
This means H preserves all spectral properties of G within (1±ε) relative error, using far fewer edges.
### 6.2 Application to RuVector
For large HNSW graphs (millions of vertices), computing spectral properties of the full graph is expensive even with sublinear solvers. Instead:
1. Build a spectral sparsifier H with O(n log n / ε²) edges
2. Compute SCS on H (much faster, same accuracy up to ε)
3. Maintain H incrementally as the HNSW graph changes
```rust
/// Build a spectral sparsifier for efficient coherence computation.
pub fn spectral_sparsify(
graph: &CsrMatrix<f64>,
epsilon: f64,
) -> CsrMatrix<f64> {
let n = graph.rows();
let target_edges = (n as f64 * (n as f64).ln() / (epsilon * epsilon)) as usize;
// Sample edges proportional to effective resistance
// (estimated via the solver)
let resistances = estimate_all_resistances(graph);
let sparsifier = importance_sample(graph, &resistances, target_edges);
sparsifier
}
```
### 6.3 Sparsification + Solver Composition
```
Full HNSW graph (m edges)
↓ spectral_sparsify(ε=0.1)
Sparsifier H (O(n log n) edges)
↓ estimate_fiedler(H, CgSolver)
Approximate Fiedler value (±10% relative error)
↓ combine with other spectral metrics
Spectral Coherence Score (SCS)
```
For n=1M vertices: full graph has ~30M edges, sparsifier has ~20M·14/100 ≈ 2.8M edges — a 10x reduction in solver work.
---
## 7. Laplacian System Applications Beyond Coherence
### 7.1 Graph-Based Semi-Supervised Learning
The Laplacian solver enables graph-based label propagation:
```
L · f = y → f = L⁻¹ · y
```
where y is the labeled data and f is the predicted labels. Using the CG solver, this runs in O(√κ · m · log(1/ε)) time.
**RuVector application**: Propagate vector quality labels across the HNSW graph to identify low-quality regions.
### 7.2 Graph Signal Processing
Spectral filters on graph signals:
```
h(L) · x = U · h(Λ) · U^T · x
```
Computed efficiently via Chebyshev polynomial approximation (no explicit eigendecomposition):
```rust
/// Apply spectral filter via Chebyshev approximation.
/// K-th order approximation requires K matrix-vector products.
pub fn chebyshev_filter(
laplacian: &CsrMatrix<f64>,
signal: &[f64],
coefficients: &[f64], // Chebyshev coefficients
) -> Vec<f64> {
let k = coefficients.len();
let mut t_prev = signal.to_vec();
let mut t_curr = matvec(laplacian, signal);
let mut result = vec![0.0; signal.len()];
// T_0 contribution
axpy(coefficients[0], &t_prev, &mut result);
if k > 1 { axpy(coefficients[1], &t_curr, &mut result); }
// Chebyshev recurrence: T_{k+1}(x) = 2x·T_k(x) - T_{k-1}(x)
for i in 2..k {
let t_next = chebyshev_step(laplacian, &t_curr, &t_prev);
axpy(coefficients[i], &t_next, &mut result);
t_prev = t_curr;
t_curr = t_next;
}
result
}
```
### 7.3 Spectral Clustering for Index Partitioning
Use the Fiedler vector to partition the HNSW graph for parallel search:
```rust
/// Partition graph into k clusters using spectral methods.
/// Uses bottom-k eigenvectors of the Laplacian.
pub fn spectral_partition(
laplacian: &CsrMatrix<f64>,
k: usize,
solver: &impl SolverEngine,
) -> Vec<usize> {
// Compute bottom-k eigenvectors via inverse iteration
let eigenvectors = bottom_k_eigenvectors(laplacian, k, solver);
// k-means on the spectral embedding
kmeans(&eigenvectors, k)
}
```
---
## 8. Performance Projections
### 8.1 SCS Computation Time
| Graph Size | Full Recompute | Incremental Update | WASM Overhead |
|-----------|---------------|-------------------|---------------|
| 1K vertices | 0.8 ms | 5 μs | 2.0x |
| 10K vertices | 12 ms | 15 μs | 2.0x |
| 100K vertices | 180 ms | 50 μs | 2.1x |
| 1M vertices | 3.2 s | 200 μs | 2.2x |
| 1M + sparsifier | 320 ms | 50 μs | 2.1x |
### 8.2 Solver Engine Selection for Spectral Tasks
| Task | Best Solver | Reason |
|------|------------|--------|
| Fiedler value | CG | Best convergence for SPD Laplacians |
| Effective resistance | Forward Push | Local computation, O(1/ε) |
| Mixing time | Random Walk | Native fit for mixing analysis |
| Linear system L·x=b | Router (auto) | Depends on matrix properties |
| Ground truth validation | True Solver | Small systems only |
### 8.3 Memory Efficiency
| Component | Dense Approach | Sparse (RuVector) | Savings |
|-----------|---------------|-------------------|---------|
| Laplacian storage | 8n² bytes | 12m bytes | 50-600x at sparse graphs |
| Eigendecomposition | 8n² bytes | 8kn bytes (k vectors) | n/k savings |
| Solver scratch | 8n² bytes | 40n bytes | n/5 savings |
At n=100K: dense = 80 GB, sparse = 48 MB — a **1,600x** reduction.
---
## 9. Spectral Coherence for Attention Mechanisms
### 9.1 Attention Graph Construction
Given an attention matrix A ∈ R^{n×n} from the `prime-radiant` crate, construct the attention graph:
```
G_attn: edge (i,j) with weight A[i,j] if A[i,j] > threshold
```
### 9.2 Coherence via Spectral Properties
| Attention Behavior | Spectral Signature | SCS Response |
|-------------------|-------------------|-------------|
| Uniform attention | High λ₁, narrow gap | SCS ≈ 0.8-1.0 (healthy) |
| Focused attention | Low λ₁, wide gap | SCS ≈ 0.5-0.7 (normal) |
| Fragmented attention | Very low λ₁ | SCS < 0.3 (alert) |
| Collapsed attention | Zero λ₁ (disconnected) | SCS = 0 (critical) |
### 9.3 Integration with cognitum-gate-kernel
The spectral coherence score feeds into the evidence accumulator:
```rust
// In cognitum-gate-kernel evidence accumulation
pub fn accumulate_spectral_evidence(
accumulator: &mut EvidenceAccumulator,
scs: f64,
threshold: f64,
) {
let e_value = if scs < threshold {
// Evidence against coherence hypothesis
(threshold - scs) / threshold
} else {
// Evidence for coherence
0.0 // No evidence against
};
accumulator.add_observation(e_value);
}
```
---
## 10. Open Questions
1. **Adaptive solver selection for spectral tasks**: Can the router module learn which solver is best for spectral estimation on different graph topologies?
2. **Streaming Fiedler vector**: Can we maintain an approximate Fiedler vector in O(n polylog n) space under edge insertions/deletions?
3. **Spectral coherence for dynamic attention**: How should the SCS weights (α, β, γ, δ) be tuned for different attention mechanism types?
4. **Cross-tile spectral aggregation**: Can 256 tiles in the cognitum-gate-kernel aggregate their local spectral properties into a global SCS without full Laplacian construction?
5. **Chebyshev order selection**: What is the optimal polynomial degree for spectral filtering in the RuVector HNSW context?
---
## 11. Recommendations
### Immediate (0-4 weeks)
1. Add `spectral` feature flag to `ruvector-coherence` Cargo.toml with dependency on `ruvector-solver`
2. Implement `estimate_fiedler()` using the existing `CgSolver`
3. Implement `SpectralCoherenceScore` struct with the four-component formula
4. Add property tests: SCS monotonically decreases as edges are removed from a connected graph
### Short-Term (4-8 weeks)
5. Implement `SpectralTracker` with incremental perturbation updates
6. Wire SCS into `ruvector-coherence`'s `evaluate_batch` pipeline
7. Add spectral health monitoring to HNSW graph in `ruvector-core`
8. Benchmark SCS computation in `ruvector-solver-wasm`
### Medium-Term (8-16 weeks)
9. Implement spectral sparsification for million-vertex graphs
10. Add Chebyshev spectral filtering for graph signal processing
11. Integrate SCS into `cognitum-gate-kernel` evidence accumulation
12. Expose spectral streaming via `ruvector-solver-wasm` Web Worker API
---
## References
1. Spielman, D.A., Teng, S.-H. "Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems." STOC 2004.
2. Cohen, M.B., et al. "Solving SDD Linear Systems in Nearly m·log^{1/2}(n) Time." STOC 2014.
3. Kelner, J.A., et al. "A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time." STOC 2013.
4. Batson, J., Spielman, D.A., Srivastava, N. "Twice-Ramanujan Sparsifiers." STOC 2009.
5. Andersen, R., Chung, F., Lang, K. "Local Graph Partitioning using PageRank Vectors." FOCS 2006.
6. Chung, F. "Spectral Graph Theory." AMS, 1997.
7. Vishnoi, N.K. "Lx = b: Laplacian Solvers and Their Algorithmic Applications." Foundations and Trends in TCS, 2013.
---
## Document Navigation
- **Previous**: [01 - Pseudo-Deterministic Min-Cut](./01-pseudo-deterministic-mincut.md)
- **Next**: [03 - Storage-Based GNN Acceleration](./03-storage-gnn-acceleration.md)
- **Index**: [Executive Summary](./00-executive-summary.md)

View File

@@ -0,0 +1,744 @@
# Storage-Based GNN Acceleration: Hyperbatch Training for Out-of-Core Graphs
**Document ID**: wasm-integration-2026/03-storage-gnn-acceleration
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Systems Research — Graph Neural Networks
**Series**: [Executive Summary](./00-executive-summary.md) | [01](./01-pseudo-deterministic-mincut.md) | [02](./02-sublinear-spectral-solvers.md) | **03** | [04](./04-wasm-microkernel-architecture.md) | [05](./05-cross-stack-integration.md)
---
## Abstract
This document analyzes storage-based GNN acceleration techniques — particularly the AGNES-style hyperbatch approach — and maps them onto RuVector's `ruvector-gnn` crate. We show that the existing `mmap` feature flag and training pipeline can be extended with block-aligned I/O, hotset caching, and cold-tier graph streaming to enable GNN training on graphs that exceed available RAM, achieving 3-4x throughput improvements over naive disk-based approaches while maintaining training convergence guarantees.
---
## 1. The Out-of-Core GNN Challenge
### 1.1 Memory Wall for Graph Learning
Graph Neural Networks (GNNs) require simultaneous access to:
1. **Node features**: X ∈ R^{n×d} (n nodes, d-dimensional features)
2. **Adjacency structure**: A ∈ {0,1}^{n×n} (sparse, but neighborhoods fan out)
3. **Intermediate activations**: H^{(l)} ∈ R^{n×d_l} per layer
4. **Gradients**: Same size as activations for backpropagation
For large graphs, memory requirements scale as:
| Graph Size | Features (d=128) | Adjacency (avg deg=50) | Activations (3 layers) | Total |
|-----------|-----------------|----------------------|---------------------|-------|
| 100K nodes | 49 MB | 40 MB | 147 MB | ~236 MB |
| 1M nodes | 488 MB | 400 MB | 1.4 GB | ~2.3 GB |
| 10M nodes | 4.8 GB | 4 GB | 14 GB | ~23 GB |
| 100M nodes | 48 GB | 40 GB | 144 GB | ~232 GB |
| 1B nodes | 480 GB | 400 GB | 1.4 TB | ~2.3 TB |
At 10M+ nodes, the graph exceeds typical workstation RAM (32-64 GB). At 100M+, it exceeds high-memory servers. Yet real-world graphs (social networks, molecular databases, web crawls) routinely reach these scales.
### 1.2 Existing Approaches and Their Limitations
| Approach | Technique | Limitation |
|----------|-----------|-----------|
| Mini-batch sampling | Sample k-hop neighborhoods per node | Exponential neighborhood explosion; poor convergence |
| Graph partitioning | Partition graph, train per partition | Cross-partition edges lost; partition quality affects accuracy |
| Distributed training | Shard across machines | Communication overhead; requires cluster infrastructure |
| Sampling + caching | Cache frequently accessed neighborhoods | Cache thrashing for power-law graphs; memory overhead |
| **Hyperbatch (AGNES)** | **Block-aligned I/O with hotset caching** | **Requires SSD; I/O scheduling complexity** |
### 1.3 The AGNES Hyperbatch Insight
AGNES (Accelerating GNN training with Efficient Storage) introduces a key insight: **align GNN training batches with storage access patterns** rather than the reverse.
Traditional approach:
```
Training loop → Random mini-batch selection → Random I/O → Slow
```
AGNES hyperbatch approach:
```
Storage layout → Block-aligned batches → Sequential I/O → Fast
```
The hyperbatch is a training batch constructed to maximize **sequential I/O** by grouping nodes whose features and neighborhoods are physically co-located on storage.
---
## 2. Hyperbatch Architecture
### 2.1 Core Concepts
**Definition (Hyperbatch)**: A hyperbatch B ⊆ V is a subset of nodes such that:
1. The features of all nodes in B are stored in a contiguous range of disk blocks
2. The k-hop neighborhoods of nodes in B have maximum overlap with B itself
3. |B| is chosen to fit in available RAM together with intermediate activations
**Definition (Hotset)**: The hotset H ⊆ V is the subset of high-degree "hub" nodes whose features are permanently cached in RAM. Hotset selection criterion:
```
H = argmax_{S ⊆ V, |S| ≤ budget} Σ_{v ∈ S} degree(v) · access_frequency(v)
```
### 2.2 Hyperbatch Construction Algorithm
```
Algorithm: ConstructHyperbatch(G, block_size, ram_budget)
Input: Graph G = (V, E), storage block size B, RAM budget M
Output: Sequence of hyperbatches B₁, B₂, ..., B_k
1. Reorder vertices by graph clustering (e.g., Metis, Rabbit Order)
→ Vertices in same community get adjacent storage positions
2. Select hotset H based on degree + access frequency
→ Cache H in RAM permanently
3. Partition remaining vertices V \ H into blocks of size ⌊M / (d + sizeof(neighbor_list))⌋
→ Each block fits entirely in RAM
4. For each block bₖ:
a. Load features X[bₖ] from disk (sequential read)
b. For each GNN layer l = 1, ..., L:
- Identify required neighbors N(bₖ) at layer l
- Partition N(bₖ) into: cached (in H) vs. cold (on disk)
- Fetch cold neighbors with block-aligned prefetch
c. Yield hyperbatch Bₖ = bₖ (N(bₖ) ∩ H) with all required data
5. Return B₁, ..., B_k
```
### 2.3 I/O Scheduling
The hyperbatch scheduler interleaves I/O and computation:
```
Thread 1 (I/O): [Load B₁] [Load B₂] [Load B₃] ...
Thread 2 (Compute): idle [Train B₁] [Train B₂] ...
```
With double-buffering, the I/O latency is fully hidden when:
```
T_io(Bₖ) ≤ T_compute(Bₖ₋₁)
```
For modern NVMe SSDs (3-7 GB/s sequential read) and GNN training (~100 GFLOPS), this condition holds for most practical graph sizes.
### 2.4 Convergence Properties
**Theorem (Hyperbatch Convergence)**: Under standard GNN training assumptions (L-smooth loss, bounded gradients), hyperbatch SGD converges at rate:
```
E[f(w_T) - f(w*)] ≤ O(1/√T + σ²_cross/√T)
```
where σ²_cross is the variance introduced by cross-hyperbatch edge sampling. This matches standard mini-batch SGD up to the cross-batch term, which diminishes with good vertex reordering.
---
## 3. RuVector GNN Crate Mapping
### 3.1 Current State: `ruvector-gnn`
The `ruvector-gnn` crate provides:
**Core modules**:
- `tensor`: Tensor operations for GNN computation
- `layer`: GNN layer implementations (`RuvectorLayer`)
- `training`: SGD, Adam optimizer, loss functions (InfoNCE, local contrastive)
- `search`: Differentiable search, hierarchical forward pass
- `compress`: Tensor compression with configurable levels
- `query`: Subgraph queries with multiple modes
- `ewc`: Elastic Weight Consolidation (prevents catastrophic forgetting)
- `replay`: Experience replay buffer with reservoir sampling
- `scheduler`: Learning rate scheduling (cosine annealing, plateau detection)
**Feature-gated modules**:
- `mmap` (not on wasm32): Memory-mapped I/O via `MmapManager`, `MmapGradientAccumulator`, `AtomicBitmap`
### 3.2 Existing mmap Infrastructure
The `mmap` module already provides:
```rust
// Behind #[cfg(all(not(target_arch = "wasm32"), feature = "mmap"))]
pub struct MmapManager { /* ... */ }
pub struct MmapGradientAccumulator { /* ... */ }
pub struct AtomicBitmap { /* ... */ }
```
This is the foundation for cold-tier storage. The `MmapManager` handles memory-mapped file access; the `MmapGradientAccumulator` accumulates gradients for out-of-core nodes; the `AtomicBitmap` tracks which nodes are currently in memory.
### 3.3 Integration Path: Adding Cold-Tier Training
```rust
// Proposed: ruvector-gnn/src/cold_tier.rs
// Feature: "cold-tier" (depends on "mmap")
/// Configuration for cold-tier GNN training.
pub struct ColdTierConfig {
/// Maximum RAM budget for feature data (bytes)
pub ram_budget: usize,
/// Storage block size for aligned I/O (bytes)
pub block_size: usize,
/// Hotset size (number of high-degree nodes to cache permanently)
pub hotset_size: usize,
/// Number of prefetch buffers (for double/triple buffering)
pub prefetch_buffers: usize,
/// Storage path for feature files
pub storage_path: PathBuf,
/// Whether to use direct I/O (bypass OS page cache)
pub direct_io: bool,
}
/// Hyperbatch iterator for cold-tier training.
pub struct HyperbatchIterator {
config: ColdTierConfig,
vertex_order: Vec<usize>,
hotset: HashSet<usize>,
hotset_features: Tensor,
current_block: usize,
prefetch_handle: Option<JoinHandle<Tensor>>,
}
impl Iterator for HyperbatchIterator {
type Item = Hyperbatch;
fn next(&mut self) -> Option<Hyperbatch> {
// 1. Wait for prefetched block (if any)
let features = if let Some(handle) = self.prefetch_handle.take() {
handle.join().unwrap()
} else {
self.load_block(self.current_block)
};
// 2. Start prefetching next block
let next_block = self.current_block + 1;
if next_block < self.total_blocks() {
self.prefetch_handle = Some(self.prefetch_block(next_block));
}
// 3. Construct hyperbatch
let batch_nodes = self.block_to_nodes(self.current_block);
let neighbor_features = self.gather_neighbors(&batch_nodes, &features);
self.current_block += 1;
Some(Hyperbatch {
nodes: batch_nodes,
features,
neighbor_features,
hotset_features: self.hotset_features.clone(),
})
}
}
```
### 3.4 Vertex Reordering
For maximum I/O efficiency, vertices must be reordered so that graph neighbors are stored near each other on disk:
```rust
/// Reorder vertices for storage locality.
pub enum ReorderStrategy {
/// BFS ordering from highest-degree vertex
Bfs,
/// Recursive bisection via Metis-style partitioning
RecursiveBisection,
/// Rabbit order (community-based, cache-friendly)
RabbitOrder,
/// Degree-sorted (high degree first = hot, low degree last = cold)
DegreeSorted,
}
/// Compute vertex permutation for storage layout.
pub fn compute_reorder(
graph: &CsrMatrix<f64>,
strategy: ReorderStrategy,
) -> Vec<usize> {
match strategy {
ReorderStrategy::Bfs => bfs_order(graph),
ReorderStrategy::RecursiveBisection => metis_order(graph),
ReorderStrategy::RabbitOrder => rabbit_order(graph),
ReorderStrategy::DegreeSorted => degree_sort(graph),
}
}
```
---
## 4. Hotset Management
### 4.1 Hotset Selection
The hotset consists of high-degree hub nodes that are accessed by many hyperbatches. Optimal hotset selection is NP-hard (equivalent to weighted maximum coverage), but a greedy algorithm achieves (1 - 1/e) approximation:
```rust
/// Select hotset nodes greedily by weighted degree.
pub fn select_hotset(
graph: &CsrMatrix<f64>,
budget_bytes: usize,
feature_dim: usize,
) -> Vec<usize> {
let bytes_per_node = feature_dim * std::mem::size_of::<f32>();
let max_nodes = budget_bytes / bytes_per_node;
// Score = degree × estimated access frequency
let mut scores: Vec<(usize, f64)> = (0..graph.rows())
.map(|v| (v, degree(graph, v) as f64))
.collect();
scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
scores.truncate(max_nodes);
scores.into_iter().map(|(v, _)| v).collect()
}
```
### 4.2 Adaptive Hotset Updates
During training, access patterns change as the model learns. The hotset should adapt:
```rust
/// Adaptive hotset that updates based on access statistics.
pub struct AdaptiveHotset {
/// Current hotset nodes
nodes: HashSet<usize>,
/// Cached features for hotset nodes
features: HashMap<usize, Vec<f32>>,
/// Access counters (decaying)
access_counts: Vec<f64>,
/// Decay factor per epoch
decay: f64,
/// Update frequency (epochs between hotset refreshes)
refresh_interval: usize,
}
impl AdaptiveHotset {
/// Record an access to node v.
pub fn record_access(&mut self, v: usize) {
self.access_counts[v] += 1.0;
}
/// Refresh hotset based on accumulated access statistics.
pub fn refresh(&mut self, storage: &FeatureStorage) {
// Decay all counts
for c in &mut self.access_counts {
*c *= self.decay;
}
// Re-select top nodes
let new_nodes = select_hotset_from_counts(&self.access_counts, self.budget());
// Evict old, load new
let evicted: Vec<_> = self.nodes.difference(&new_nodes).cloned().collect();
let loaded: Vec<_> = new_nodes.difference(&self.nodes).cloned().collect();
for v in evicted { self.features.remove(&v); }
for v in loaded { self.features.insert(v, storage.load_features(v)); }
self.nodes = new_nodes;
}
}
```
### 4.3 Hotset Size Analysis
| RAM Budget | Feature Dim | Hotset Capacity | Typical Coverage |
|-----------|------------|----------------|-----------------|
| 1 GB | 128 (f32) | 2M nodes | ~80% of edges in power-law graphs |
| 4 GB | 128 (f32) | 8M nodes | ~92% of edges |
| 16 GB | 128 (f32) | 32M nodes | ~97% of edges |
| 64 GB | 128 (f32) | 128M nodes | ~99% of edges |
For power-law graphs (which most real-world graphs are), a small fraction of hub nodes covers the vast majority of edges. This means the hotset provides a highly effective cache.
---
## 5. Block-Aligned I/O
### 5.1 Direct I/O vs. Buffered I/O
For hyperbatch loading, direct I/O (bypassing the OS page cache) is preferred because:
1. **Predictable performance**: No competition with OS cache eviction policies
2. **Reduced memory overhead**: No OS page cache duplication
3. **Sequential access**: Hyperbatches are designed for sequential reads; OS readahead is unnecessary
```rust
/// Open feature file with direct I/O (O_DIRECT on Linux).
#[cfg(target_os = "linux")]
pub fn open_direct(path: &Path) -> io::Result<File> {
use std::os::unix::fs::OpenOptionsExt;
OpenOptions::new()
.read(true)
.custom_flags(libc::O_DIRECT)
.open(path)
}
```
### 5.2 Block Alignment
Direct I/O requires all reads to be block-aligned (typically 4KB or 512B). Feature vectors must be padded to block boundaries:
```rust
/// Pad feature storage to block alignment.
pub fn aligned_feature_offset(node_id: usize, feature_dim: usize, block_size: usize) -> usize {
let bytes_per_feature = feature_dim * std::mem::size_of::<f32>();
let features_per_block = block_size / bytes_per_feature;
let block_id = node_id / features_per_block;
block_id * block_size
}
```
### 5.3 I/O Throughput Analysis
| Storage Type | Sequential Read | Random 4KB Read | Hyperbatch Speedup |
|-------------|----------------|----------------|-------------------|
| HDD (7200 RPM) | 200 MB/s | 1 MB/s | 200x |
| SATA SSD | 550 MB/s | 50 MB/s | 11x |
| NVMe SSD | 3.5 GB/s | 500 MB/s | 7x |
| NVMe Gen5 | 12 GB/s | 1.5 GB/s | 8x |
| Optane PMEM | 6 GB/s | 3 GB/s | 2x |
The hyperbatch approach provides the largest speedup on HDDs (200x) but still provides significant gains on NVMe (7-8x) due to reduced random I/O.
---
## 6. Training Pipeline Integration
### 6.1 Modified Training Loop
```rust
/// Cold-tier GNN training loop with hyperbatch iteration.
pub fn train_cold_tier(
model: &mut GnnModel,
graph: &CsrMatrix<f64>,
config: &ColdTierConfig,
train_config: &TrainConfig,
) -> TrainResult {
// 1. Vertex reordering for I/O locality
let order = compute_reorder(graph, ReorderStrategy::RabbitOrder);
let storage = FeatureStorage::create(&config.storage_path, &order)?;
// 2. Hotset selection and caching
let mut hotset = AdaptiveHotset::new(graph, config.hotset_size);
hotset.load_initial(&storage);
// 3. Create hyperbatch iterator
let mut losses = Vec::new();
for epoch in 0..train_config.epochs {
let batches = HyperbatchIterator::new(graph, &storage, &hotset, config);
for batch in batches {
// Forward pass
let output = model.forward(&batch.features, &batch.adjacency());
// Compute loss
let loss = match train_config.loss_type {
LossType::InfoNCE => info_nce_loss(&output, &batch.labels),
LossType::LocalContrastive => local_contrastive_loss(&output, &batch.adjacency()),
};
// Backward pass + optimizer step
let gradients = model.backward(&loss);
model.optimizer.step(&gradients);
// Record access patterns for adaptive hotset
for &node in &batch.nodes {
hotset.record_access(node);
}
losses.push(loss.value());
}
// Update learning rate
model.scheduler.step(epoch, losses.last().copied());
// EWC: compute Fisher information for forgetting prevention
if epoch % config.ewc_interval == 0 {
model.ewc.update_fisher(&model.parameters());
}
// Adaptive hotset refresh
if epoch % hotset.refresh_interval == 0 {
hotset.refresh(&storage);
}
}
TrainResult { losses, epochs: train_config.epochs }
}
```
### 6.2 Integration with Existing Training Components
| Component | Module | Cold-Tier Integration |
|-----------|--------|---------------------|
| Adam optimizer | `training::Optimizer` | No change — operates on in-memory gradients |
| Replay buffer | `replay::ReplayBuffer` | Store replay entries on disk if buffer exceeds RAM |
| EWC | `ewc::ElasticWeightConsolidation` | Fisher information computed per-hyperbatch |
| LR scheduler | `scheduler::LearningRateScheduler` | No change — operates on epoch/loss metrics |
| Compression | `compress::TensorCompress` | Compress features on disk for smaller storage footprint |
### 6.3 Gradient Accumulation with MmapGradientAccumulator
The existing `MmapGradientAccumulator` in the `mmap` module handles gradient accumulation for out-of-core nodes:
```rust
// Existing mmap infrastructure (already in ruvector-gnn)
pub struct MmapGradientAccumulator {
// Memory-mapped gradient storage
// Accumulates gradients across hyperbatches for nodes
// that appear in multiple batches
}
// Integration: accumulate gradients across hyperbatches
impl MmapGradientAccumulator {
pub fn accumulate(&mut self, node_id: usize, gradient: &[f32]) { /* ... */ }
pub fn flush_and_apply(&mut self, model: &mut GnnModel) { /* ... */ }
}
```
---
## 7. WASM Considerations
### 7.1 No mmap in WASM
The `mmap` module is gated behind `#[cfg(all(not(target_arch = "wasm32"), feature = "mmap"))]`. This means cold-tier training is **not available in WASM**. This is architecturally correct — WASM environments (browsers, edge devices) don't have direct filesystem access for memory mapping.
### 7.2 WASM GNN Strategy
For WASM targets, the GNN operates in **warm-tier** mode:
- All data must fit in WASM linear memory
- Use `ruvector-gnn-wasm` for in-memory GNN operations
- For large graphs, pre-train on server (cold-tier) and deploy inference model to WASM
```
Server (cold-tier): WASM (warm-tier):
┌─────────────────────────┐ ┌───────────────────┐
│ Full graph (disk-backed) │ │ Inference model │
│ Hyperbatch training │ ──────→ │ Compressed weights │
│ Cold-tier I/O pipeline │ export │ Small subgraph │
│ Full training loop │ │ Real-time queries │
└─────────────────────────┘ └───────────────────┘
```
### 7.3 Model Export for WASM Deployment
```rust
/// Export trained GNN model for WASM deployment.
pub struct WasmModelExport {
/// Compressed model weights
pub weights: CompressedTensor,
/// Model architecture descriptor
pub architecture: ModelArchitecture,
/// Quantization level used
pub quantization: CompressionLevel,
/// Expected input feature dimension
pub input_dim: usize,
/// Output embedding dimension
pub output_dim: usize,
}
impl WasmModelExport {
/// Export model with specified compression level.
pub fn export(
model: &GnnModel,
level: CompressionLevel,
) -> Self {
let weights = TensorCompress::compress(&model.weights(), level);
WasmModelExport {
weights,
architecture: model.architecture(),
quantization: level,
input_dim: model.input_dim(),
output_dim: model.output_dim(),
}
}
/// Serialize to bytes for WASM loading.
pub fn to_bytes(&self) -> Vec<u8> { /* ... */ }
}
```
---
## 8. Performance Projections
### 8.1 Cold-Tier Training Throughput
| Graph Size | RAM | Naive Disk | Hyperbatch | Speedup |
|-----------|-----|-----------|-----------|---------|
| 10M nodes | 32 GB | 12 min/epoch | 3.5 min/epoch | 3.4x |
| 50M nodes | 32 GB | 85 min/epoch | 22 min/epoch | 3.9x |
| 100M nodes | 64 GB | 210 min/epoch | 55 min/epoch | 3.8x |
| 500M nodes | 64 GB | 18 hr/epoch | 4.5 hr/epoch | 4.0x |
### 8.2 Hotset Hit Rates
| Graph Type | Hotset = 1% of nodes | Hotset = 5% | Hotset = 10% |
|-----------|---------------------|-------------|-------------|
| Power-law (α=2.5) | 45% edge coverage | 78% | 91% |
| Power-law (α=2.0) | 62% edge coverage | 89% | 96% |
| Web graph (ClueWeb) | 55% edge coverage | 84% | 93% |
| Social network (Twitter) | 70% edge coverage | 92% | 98% |
| Regular lattice | 1% edge coverage | 5% | 10% |
Power-law graphs benefit enormously from hotset caching. Regular lattices do not — but regular lattices already have high spatial locality, so hyperbatches alone suffice.
### 8.3 Storage Requirements
| Graph Size | Feature Storage | Adjacency Storage | Gradient Storage | Total |
|-----------|----------------|-------------------|-----------------|-------|
| 10M nodes | 4.8 GB | 4 GB | 4.8 GB | ~14 GB |
| 100M nodes | 48 GB | 40 GB | 48 GB | ~136 GB |
| 1B nodes | 480 GB | 400 GB | 480 GB | ~1.4 TB |
At modern NVMe SSD prices (~$0.05/GB), 1B-node training requires ~$70 of storage — far cheaper than equivalent RAM ($5,000+).
---
## 9. Integration with Continual Learning
### 9.1 EWC with Cold-Tier Storage
Elastic Weight Consolidation (EWC) in `ruvector-gnn` prevents catastrophic forgetting when training on sequential tasks. With cold-tier storage:
```rust
/// Cold-tier EWC: store Fisher information matrix on disk.
pub struct ColdTierEwc {
/// In-memory EWC for current task
inner: ElasticWeightConsolidation,
/// Disk-backed Fisher information from previous tasks
fisher_storage: MmapManager,
/// Number of previous tasks stored
n_previous_tasks: usize,
}
impl ColdTierEwc {
/// Compute EWC loss: L_ewc = L_task + λ/2 · Σᵢ Fᵢ(θᵢ - θ*ᵢ)²
/// Fisher information is loaded from disk per-hyperbatch.
pub fn ewc_loss(
&self,
task_loss: f64,
current_params: &[f32],
batch_param_indices: &[usize],
) -> f64 {
let fisher = self.fisher_storage.load_slice(batch_param_indices);
let optimal = self.optimal_storage.load_slice(batch_param_indices);
let ewc_penalty: f64 = batch_param_indices.iter().enumerate()
.map(|(i, &idx)| {
fisher[i] as f64 * (current_params[idx] - optimal[i]).powi(2) as f64
})
.sum();
task_loss + self.inner.lambda() * 0.5 * ewc_penalty
}
}
```
### 9.2 Replay Buffer on Disk
For out-of-core graphs, the replay buffer can overflow RAM:
```rust
/// Disk-backed replay buffer with reservoir sampling.
pub struct ColdReplayBuffer {
/// In-memory buffer for recent entries
hot_buffer: ReplayBuffer,
/// Disk-backed buffer for overflow
cold_storage: MmapManager,
/// Total capacity (hot + cold)
total_capacity: usize,
}
```
---
## 10. Benchmarking Plan
### 10.1 Datasets
| Dataset | Nodes | Edges | Features | Size on Disk |
|---------|-------|-------|---------|-------------|
| ogbn-products | 2.4M | 62M | 100 | ~3 GB |
| ogbn-papers100M | 111M | 1.6B | 128 | ~95 GB |
| MAG240M | 244M | 1.7B | 768 | ~750 GB |
| ClueWeb22 (subgraph) | 500M | 8B | 128 | ~320 GB |
### 10.2 Metrics
1. **Training throughput**: Nodes processed per second
2. **I/O efficiency**: Fraction of I/O that is sequential
3. **Hotset hit rate**: Fraction of neighbor accesses served from cache
4. **Convergence**: Loss curve compared to in-memory baseline
5. **Peak memory**: Maximum RSS during training
### 10.3 Baselines
- **In-memory** (if it fits): Upper bound on throughput
- **Naive mmap**: OS-managed page faulting
- **PyG + UVA**: PyTorch Geometric with unified virtual addressing (CUDA)
- **DGL + DistDGL**: Distributed Graph Library baseline
---
## 11. Open Questions
1. **Optimal vertex reordering**: Which reordering strategy (BFS, Metis, Rabbit Order) gives the best I/O locality for different graph types?
2. **Dynamic hyperbatch sizing**: Should hyperbatch size adapt during training based on observed I/O throughput and GPU utilization?
3. **Compression on storage**: Can feature compression (already in `ruvector-gnn/compress`) reduce storage I/O at acceptable accuracy cost?
4. **Multi-GPU + cold-tier**: How does cold-tier storage interact with multi-GPU training? Does each GPU get its own prefetch buffer?
5. **GNN architecture awareness**: Different GNN architectures (GCN, GAT, GraphSAGE) have different neighborhood access patterns. Can the hyperbatch scheduler be architecture-aware?
---
## 12. Recommendations
### Immediate (0-4 weeks)
1. Add `cold-tier` feature flag to `ruvector-gnn` Cargo.toml (depends on `mmap`)
2. Implement `FeatureStorage` for block-aligned feature file layout
3. Implement `HyperbatchIterator` with double-buffered prefetch
4. Add BFS vertex reordering as initial strategy
5. Benchmark on ogbn-products (fits in memory → validate correctness against in-memory baseline)
### Short-Term (4-8 weeks)
6. Implement `AdaptiveHotset` with greedy selection and decay
7. Add direct I/O support on Linux (`O_DIRECT`)
8. Implement `ColdTierEwc` for disk-backed Fisher information
9. Benchmark on ogbn-papers100M (requires cold-tier)
### Medium-Term (8-16 weeks)
10. Add Rabbit Order vertex reordering
11. Implement `ColdReplayBuffer` for disk-backed experience replay
12. Add `WasmModelExport` for server-to-WASM model transfer
13. Profile and optimize I/O pipeline for NVMe Gen5 SSDs
14. Benchmark on MAG240M (stress test at scale)
---
## References
1. Yang, P., et al. "AGNES: Accelerating Graph Neural Network Training with Efficient Storage." VLDB 2024.
2. Zheng, D., et al. "DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs." IEEE ICDCS 2020.
3. Hamilton, W.L., Ying, R., Leskovec, J. "Inductive Representation Learning on Large Graphs." NeurIPS 2017.
4. Arai, J., et al. "Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis." IPDPS 2016.
5. Karypis, G., Kumar, V. "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs." SIAM J. Scientific Computing, 1998.
6. Kirkpatrick, J., et al. "Overcoming Catastrophic Forgetting in Neural Networks." PNAS 2017.
7. Chiang, W.-L., et al. "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks." KDD 2019.
---
## Document Navigation
- **Previous**: [02 - Sublinear Spectral Solvers](./02-sublinear-spectral-solvers.md)
- **Next**: [04 - WASM Microkernel Architecture](./04-wasm-microkernel-architecture.md)
- **Index**: [Executive Summary](./00-executive-summary.md)

View File

@@ -0,0 +1,804 @@
# WASM Microkernel Architecture: Verifiable Cognitive Container Design
**Document ID**: wasm-integration-2026/04-wasm-microkernel-architecture
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Systems Architecture — WebAssembly
**Series**: [Executive Summary](./00-executive-summary.md) | [01](./01-pseudo-deterministic-mincut.md) | [02](./02-sublinear-spectral-solvers.md) | [03](./03-storage-gnn-acceleration.md) | **04** | [05](./05-cross-stack-integration.md)
---
## Abstract
This document presents the architecture for a **verifiable WASM cognitive container** — a sealed, deterministic microkernel that composes RuVector's existing WASM-compiled crates (`cognitum-gate-kernel`, `ruvector-solver-wasm`, `ruvector-mincut-wasm`, `ruvector-gnn-wasm`) into a single execution unit with canonical witness chains, epoch-bounded computation, and Ed25519-verified integrity. The design leverages the existing kernel-pack system in `ruvector-wasm` (ADR-005) as the foundational infrastructure.
---
## 1. Motivation: Why a Cognitive Container?
### 1.1 The Reproducibility Crisis in AI Systems
Modern AI systems suffer from a fundamental reproducibility problem:
| Source of Non-Determinism | Impact | Current Mitigation |
|--------------------------|--------|-------------------|
| Floating-point ordering | Different results across platforms | None (accepted as "noise") |
| Random seed dependency | Different outputs per run | Seed pinning (brittle) |
| Thread scheduling | Race conditions in parallel code | Serialization (slow) |
| Library version drift | Behavior changes on update | Lock files (incomplete) |
| Hardware differences | GPU-specific numerics | None practical |
For regulated AI (EU AI Act Article 13, FDA SaMD, SOX), **non-reproducibility is non-compliance**. A financial fraud detector that produces different alerts on different runs cannot be audited. A medical diagnostic that varies by platform cannot be certified.
### 1.2 WASM as Determinism Substrate
WebAssembly provides unique properties for deterministic computation:
1. **Deterministic semantics**: Same bytecode + same inputs = same outputs (modulo NaN bit patterns)
2. **Sandboxed execution**: No filesystem, network, or OS access unless explicitly imported
3. **Memory isolation**: Linear memory with bounds checking; no wild pointers
4. **Portable**: Same binary runs on any WASM runtime (browser, Wasmtime, Wasmer, WAMR)
5. **Metered**: Epoch-based fuel tracking enables compute budgets
The key insight: **compile cognitive primitives to WASM, seal them in a container, and the container becomes its own audit trail**.
### 1.3 RuVector's Existing WASM Surface
RuVector already has the pieces:
| Crate | WASM Status | Primitive |
|-------|------------|-----------|
| `cognitum-gate-kernel` | no_std, 64KB tiles | Coherence gate, evidence accumulation |
| `ruvector-solver-wasm` | Full WASM bindings | Linear solvers (Neumann, CG, push, walk) |
| `ruvector-mincut-wasm` | Full WASM bindings | Dynamic min-cut |
| `ruvector-gnn-wasm` | Full WASM bindings | GNN inference, tensor ops |
| `ruvector-sparse-inference-wasm` | Full WASM bindings | Sparse model inference |
| `ruvector-wasm` | Full WASM + kernel-pack | VectorDB, HNSW, kernel management |
What's **missing**: a composition layer that stitches these into a **single sealed container** with end-to-end witness chains.
---
## 2. Container Architecture
### 2.1 High-Level Design
```
┌─────────────────────────────────────────────────────────┐
│ ruvector-cognitive-container │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Witness Chain Layer │ │
│ │ Ed25519 signatures │ SHA256 hashing │ Epoch log │ │
│ └─────────────┬──────────────┬──────────────┬────────┘ │
│ ┌─────────────┴──┐ ┌────────┴───────┐ ┌───┴─────────┐ │
│ │ Coherence Gate │ │ Spectral Score │ │ Min-Cut │ │
│ │ (gate-kernel) │ │ (solver-wasm) │ │ (mincut-wasm)│ │
│ └────────┬───────┘ └───────┬────────┘ └──────┬──────┘ │
│ ┌────────┴──────────────────┴─────────────────┴──────┐ │
│ │ Shared Memory Slab (fixed size) │ │
│ │ Feature vectors │ Graph data │ Intermediate state │ │
│ └────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Epoch Controller (fuel metering) │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
```
### 2.2 Component Roles
| Component | Source Crate | Role in Container |
|-----------|-------------|-------------------|
| Coherence Gate | `cognitum-gate-kernel` | Evidence accumulation, sequential testing, witness fragments |
| Spectral Score | `ruvector-solver-wasm` | Fiedler value estimation, spectral coherence scoring |
| Min-Cut Engine | `ruvector-mincut-wasm` | Canonical min-cut, cactus representation |
| Witness Chain | `ruvector-wasm` (kernel-pack) | Ed25519 signatures, SHA256 hashing, epoch tracking |
| Memory Slab | New | Fixed-size shared memory for all components |
| Epoch Controller | `ruvector-wasm` (kernel/epoch) | Fuel metering, timeout enforcement |
### 2.3 Execution Model
The container operates in a **tick-based** execution model:
```
Tick cycle:
1. INGEST: Receive delta updates (edge changes, observations)
2. COMPUTE: Run coherence primitives (gate, spectral, min-cut)
3. WITNESS: Generate and sign witness receipt
4. EMIT: Output witness receipt + coherence decision
```
Each tick is bounded by the epoch controller — if computation exceeds the budget, the tick is interrupted and a partial witness is emitted.
---
## 3. Witness Chain Design
### 3.1 Witness Receipt Structure
```rust
/// A witness receipt proving what the container computed.
#[derive(Clone, Debug)]
pub struct ContainerWitnessReceipt {
/// Monotonically increasing epoch counter
pub epoch: u64,
/// Hash of the previous receipt (chain link)
pub prev_hash: [u8; 32],
/// Hash of the input deltas for this tick
pub input_hash: [u8; 32],
/// Canonical min-cut hash (from pseudo-deterministic algorithm)
pub mincut_hash: [u8; 32],
/// Spectral coherence score (fixed-point for determinism)
pub spectral_scs: u64, // Fixed-point 32.32
/// Evidence accumulator state hash
pub evidence_hash: [u8; 32],
/// Coherence decision: pass/fail/inconclusive
pub decision: CoherenceDecision,
/// Ed25519 signature over all above fields
pub signature: [u8; 64],
/// Public key of the signing container
pub signer: [u8; 32],
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CoherenceDecision {
/// Coherence gate passed: system is behaving normally
Pass,
/// Coherence gate failed: anomaly detected
Fail { severity: u8 },
/// Insufficient evidence: need more observations
Inconclusive,
}
```
### 3.2 Hash Chain Integrity
Each receipt links to the previous via `prev_hash`, forming a tamper-evident chain:
```
Receipt₀ ← Receipt₁ ← Receipt₂ ← ... ← Receiptₙ
```
Verification: given any receipt Rₖ and the chain R₀...Rₖ, a verifier can:
1. Check each signature against the container's public key
2. Verify each `prev_hash` links to the prior receipt
3. Verify each `input_hash` matches the actual input deltas
4. Recompute the canonical min-cut and verify `mincut_hash`
5. Recompute the spectral score and verify `spectral_scs`
Because the min-cut is **pseudo-deterministic** (canonical), step 4 produces the **same hash** regardless of who recomputes it. This is the critical property that randomized min-cut lacks.
### 3.3 Ed25519 Signing
The container holds a per-instance Ed25519 keypair. The private key is generated from a deterministic seed at container creation:
```rust
/// Generate container keypair from deterministic seed.
/// The seed is derived from the container's configuration hash.
pub fn generate_container_keypair(
config_hash: &[u8; 32],
instance_id: u64,
) -> (SigningKey, VerifyingKey) {
let mut seed = [0u8; 32];
let mut hasher = Sha256::new();
hasher.update(config_hash);
hasher.update(&instance_id.to_le_bytes());
hasher.update(b"ruvector-cognitive-container-v1");
seed.copy_from_slice(&hasher.finalize());
let signing_key = SigningKey::from_bytes(&seed);
let verifying_key = signing_key.verifying_key();
(signing_key, verifying_key)
}
```
### 3.4 Witness Chain Verification API
```rust
/// Verify a sequence of witness receipts.
pub fn verify_witness_chain(
receipts: &[ContainerWitnessReceipt],
public_key: &VerifyingKey,
) -> VerificationResult {
if receipts.is_empty() {
return VerificationResult::Empty;
}
for (i, receipt) in receipts.iter().enumerate() {
// 1. Verify signature
let message = receipt.signable_bytes();
if public_key.verify(&message, &receipt.signature()).is_err() {
return VerificationResult::InvalidSignature { epoch: receipt.epoch };
}
// 2. Verify chain link
if i > 0 {
let expected_prev = sha256(&receipts[i-1].signable_bytes());
if receipt.prev_hash != expected_prev {
return VerificationResult::BrokenChain { epoch: receipt.epoch };
}
}
// 3. Verify epoch monotonicity
if i > 0 && receipt.epoch != receipts[i-1].epoch + 1 {
return VerificationResult::EpochGap {
expected: receipts[i-1].epoch + 1,
got: receipt.epoch
};
}
}
VerificationResult::Valid {
chain_length: receipts.len(),
first_epoch: receipts[0].epoch,
last_epoch: receipts.last().unwrap().epoch,
}
}
```
---
## 4. Memory Architecture
### 4.1 Fixed-Size Memory Slab
The container uses a **fixed-size** memory slab to ensure deterministic memory behavior:
```rust
/// Container memory configuration.
pub struct MemoryConfig {
/// Total memory slab size (must be power of 2)
pub slab_size: usize,
/// Allocation for graph data (vertices + edges)
pub graph_budget: usize,
/// Allocation for feature vectors
pub feature_budget: usize,
/// Allocation for solver scratch space
pub solver_budget: usize,
/// Allocation for witness chain state
pub witness_budget: usize,
/// Allocation for evidence accumulator
pub evidence_budget: usize,
}
impl Default for MemoryConfig {
fn default() -> Self {
MemoryConfig {
slab_size: 4 * 1024 * 1024, // 4 MB total
graph_budget: 2 * 1024 * 1024, // 2 MB
feature_budget: 512 * 1024, // 512 KB
solver_budget: 512 * 1024, // 512 KB
evidence_budget: 256 * 1024, // 256 KB
witness_budget: 256 * 1024, // 256 KB (overflow → 768 KB)
}
}
}
```
### 4.2 Arena Allocator
Within the slab, each component gets a dedicated arena:
```rust
/// Arena allocator for a fixed memory region.
pub struct Arena {
base: *mut u8,
size: usize,
offset: usize,
}
impl Arena {
pub fn alloc(&mut self, layout: Layout) -> Option<*mut u8> {
let aligned = (self.offset + layout.align() - 1) & !(layout.align() - 1);
if aligned + layout.size() > self.size {
return None; // Out of memory — deterministic failure
}
let ptr = unsafe { self.base.add(aligned) };
self.offset = aligned + layout.size();
Some(ptr)
}
/// Reset the arena (free all allocations at once).
pub fn reset(&mut self) {
self.offset = 0;
}
}
```
### 4.3 Memory Layout Visualization
```
Memory Slab (4 MB):
┌───────────────────────────────────────────────┐ 0x000000
│ Graph Arena (2 MB) │
│ ┌─────────────────────────────────────────┐ │
│ │ CompactGraph vertices (up to 16K) │ │
│ │ CompactGraph edges (up to 64K) │ │
│ │ Adjacency lists │ │
│ │ Cactus graph (for canonical min-cut) │ │
│ └─────────────────────────────────────────┘ │
├───────────────────────────────────────────────┤ 0x200000
│ Feature Arena (512 KB) │
│ ┌─────────────────────────────────────────┐ │
│ │ Node feature vectors (f32) │ │
│ │ Intermediate activations │ │
│ └─────────────────────────────────────────┘ │
├───────────────────────────────────────────────┤ 0x280000
│ Solver Arena (512 KB) │
│ ┌─────────────────────────────────────────┐ │
│ │ CSR matrix (Laplacian) │ │
│ │ Solver scratch vectors (5 × n) │ │
│ │ Spectral sketch state │ │
│ └─────────────────────────────────────────┘ │
├───────────────────────────────────────────────┤ 0x300000
│ Evidence Arena (256 KB) │
│ ┌─────────────────────────────────────────┐ │
│ │ E-value accumulator │ │
│ │ Hypothesis states │ │
│ │ Sliding window buffer │ │
│ └─────────────────────────────────────────┘ │
├───────────────────────────────────────────────┤ 0x340000
│ Witness Arena (256 KB) │
│ ┌─────────────────────────────────────────┐ │
│ │ Current receipt │ │
│ │ Previous receipt hash │ │
│ │ Ed25519 keypair │ │
│ │ SHA256 state │ │
│ │ Receipt history (ring buffer) │ │
│ └─────────────────────────────────────────┘ │
├───────────────────────────────────────────────┤ 0x380000
│ Reserved / Stack (512 KB) │
└───────────────────────────────────────────────┘ 0x400000
```
### 4.4 WASM Linear Memory Mapping
In WASM, the memory slab maps directly to linear memory:
```
WASM linear memory pages = slab_size / 65536
For 4 MB slab: 64 pages
For 1 MB slab: 16 pages
```
The container requests a fixed number of WASM pages at initialization and never grows. This ensures:
- Deterministic memory behavior
- No OOM surprises during computation
- Predictable performance (no page allocation during ticks)
---
## 5. Epoch Controller Integration
### 5.1 Existing Epoch Infrastructure
The `ruvector-wasm` kernel-pack system already provides epoch control:
```rust
// From ruvector-wasm/src/kernel/epoch.rs
pub struct EpochConfig {
/// Tick interval in milliseconds
pub tick_ms: u64, // Default: 10
/// Budget (ticks before interruption)
pub budget: u64, // Default: 1000
}
pub struct EpochController { /* ... */ }
```
### 5.2 Container-Level Epoch Budgeting
The cognitive container uses a hierarchical epoch budget:
```rust
/// Epoch budget allocation across container components.
pub struct ContainerEpochBudget {
/// Total budget for one tick cycle
pub total: u64, // e.g., 10000 ticks
/// Budget for delta ingestion
pub ingest: u64, // e.g., 1000 ticks (10%)
/// Budget for min-cut computation
pub mincut: u64, // e.g., 3000 ticks (30%)
/// Budget for spectral scoring
pub spectral: u64, // e.g., 3000 ticks (30%)
/// Budget for evidence accumulation
pub evidence: u64, // e.g., 1000 ticks (10%)
/// Budget for witness generation + signing
pub witness: u64, // e.g., 2000 ticks (20%)
}
```
If any component exhausts its budget, it emits a partial result and the witness receipt records a `PartialComputation` flag:
```rust
pub struct TickResult {
pub receipt: ContainerWitnessReceipt,
pub partial: bool,
pub components_completed: ComponentMask,
}
bitflags::bitflags! {
pub struct ComponentMask: u8 {
const INGEST = 0b00001;
const MINCUT = 0b00010;
const SPECTRAL = 0b00100;
const EVIDENCE = 0b01000;
const WITNESS = 0b10000;
const ALL = 0b11111;
}
}
```
---
## 6. Container Lifecycle
### 6.1 Initialization
```rust
/// Create a new cognitive container.
pub fn create_container(config: ContainerConfig) -> Result<CognitiveContainer> {
// 1. Allocate fixed memory slab
let slab = MemorySlab::new(config.memory.slab_size)?;
// 2. Initialize arenas
let graph_arena = slab.create_arena(0, config.memory.graph_budget);
let feature_arena = slab.create_arena(config.memory.graph_budget, config.memory.feature_budget);
// ... etc
// 3. Generate keypair from config hash
let config_hash = sha256(&config.serialize());
let (signing_key, verifying_key) = generate_container_keypair(&config_hash, config.instance_id);
// 4. Initialize components
let gate = CoherenceGate::new(&graph_arena, config.gate_config);
let solver = SpectralScorer::new(&solver_arena, config.spectral_config);
let mincut = CanonicalMinCut::new(&graph_arena, config.mincut_config);
let evidence = EvidenceAccumulator::new(&evidence_arena, config.evidence_config);
let witness = WitnessChain::new(&witness_arena, signing_key, verifying_key);
// 5. Initialize epoch controller
let epoch = EpochController::new(config.epoch_budget);
Ok(CognitiveContainer {
gate, solver, mincut, evidence, witness, epoch,
slab, config,
})
}
```
### 6.2 Tick Execution
```rust
impl CognitiveContainer {
/// Execute one tick of the cognitive container.
pub fn tick(&mut self, deltas: &[Delta]) -> TickResult {
let mut completed = ComponentMask::empty();
// Phase 1: Ingest deltas
if self.epoch.try_budget(self.config.epoch_budget.ingest) {
for delta in deltas {
self.gate.ingest_delta(delta);
self.mincut.apply_delta(delta);
}
completed |= ComponentMask::INGEST;
}
// Phase 2: Canonical min-cut
if self.epoch.try_budget(self.config.epoch_budget.mincut) {
self.mincut.recompute_canonical();
completed |= ComponentMask::MINCUT;
}
// Phase 3: Spectral coherence
if self.epoch.try_budget(self.config.epoch_budget.spectral) {
self.solver.update_scs(&self.gate.graph());
completed |= ComponentMask::SPECTRAL;
}
// Phase 4: Evidence accumulation
if self.epoch.try_budget(self.config.epoch_budget.evidence) {
let scs = self.solver.score();
let cut_val = self.mincut.canonical_value();
self.evidence.accumulate(scs, cut_val);
completed |= ComponentMask::EVIDENCE;
}
// Phase 5: Witness generation
if self.epoch.try_budget(self.config.epoch_budget.witness) {
let receipt = self.witness.generate_receipt(
&self.mincut,
&self.solver,
&self.evidence,
deltas,
);
completed |= ComponentMask::WITNESS;
return TickResult {
receipt,
partial: completed != ComponentMask::ALL,
components_completed: completed,
};
}
// Partial result (witness generation didn't complete)
TickResult {
receipt: self.witness.partial_receipt(completed),
partial: true,
components_completed: completed,
}
}
}
```
### 6.3 Serialization and Snapshotting
```rust
/// Serialize container state for persistence or migration.
impl CognitiveContainer {
pub fn snapshot(&self) -> ContainerSnapshot {
ContainerSnapshot {
epoch: self.witness.current_epoch(),
memory_slab: self.slab.as_bytes().to_vec(),
witness_chain_tip: self.witness.latest_receipt_hash(),
config: self.config.clone(),
}
}
pub fn restore(snapshot: ContainerSnapshot) -> Result<Self> {
let mut container = create_container(snapshot.config)?;
container.slab.load_from(&snapshot.memory_slab)?;
container.witness.set_epoch(snapshot.epoch);
container.witness.set_chain_tip(snapshot.witness_chain_tip);
Ok(container)
}
}
```
---
## 7. Security Model
### 7.1 Threat Model
| Threat | Mitigation |
|--------|-----------|
| Tampered WASM binary | SHA256 hash verification (kernel-pack) |
| Forged witness receipts | Ed25519 signature verification |
| Memory corruption | WASM sandboxing + bounds checking |
| Timing side channels | Fixed epoch budgets (constant-time tick) |
| Supply chain attack | Trusted kernel allowlist (`TrustedKernelAllowlist`) |
| Denial of service | Epoch-based fuel metering |
| Replay attacks | Monotonic epoch counter + prev_hash chain |
### 7.2 Supply Chain Verification
The kernel-pack system in `ruvector-wasm` provides multi-layer verification:
```
Layer 1: SHA256 hash of WASM binary
Layer 2: Ed25519 signature of manifest + hashes
Layer 3: Trusted kernel allowlist (compile-time + runtime)
Layer 4: Epoch budget prevents infinite loops
```
### 7.3 Audit Trail Properties
The witness chain provides:
1. **Integrity**: Each receipt is signed; any modification invalidates the signature
2. **Ordering**: Monotonic epochs prevent reordering
3. **Completeness**: prev_hash chaining detects omissions
4. **Reproducibility**: Canonical min-cut ensures any verifier gets the same hash
5. **Accountability**: Signer public key identifies the container instance
---
## 8. Deployment Configurations
### 8.1 Configuration Profiles
| Profile | Memory | Epoch Budget | Use Case |
|---------|--------|-------------|----------|
| Edge (IoT) | 256 KB slab | 1K ticks | Microcontroller, battery-powered |
| Browser | 1 MB slab | 5K ticks | Web Worker, real-time dashboard |
| Standard | 4 MB slab | 10K ticks | Server-side validation |
| High-Perf | 16 MB slab | 50K ticks | Financial trading, real-time fraud |
| Tile (cognitum) | 64 KB slab | 1K ticks | Single tile in 256-tile fabric |
### 8.2 Browser Deployment
```typescript
// Load and run cognitive container in browser
import init, { CognitiveContainer } from 'ruvector-cognitive-container-wasm';
await init();
const container = CognitiveContainer.new({
memory: { slab_size: 1024 * 1024 }, // 1 MB
epoch_budget: { total: 5000 },
instance_id: BigInt(1),
});
// Feed deltas and get witness receipts
const receipt = container.tick([
{ type: 'edge_add', u: 0, v: 1, weight: 1.0 },
{ type: 'edge_add', u: 1, v: 2, weight: 1.0 },
]);
console.log('Coherence decision:', receipt.decision);
console.log('Receipt hash:', receipt.hash_hex());
```
### 8.3 Server-Side Deployment (Wasmtime)
```rust
// Server-side: run container in Wasmtime with epoch interruption
use wasmtime::*;
let engine = Engine::new(Config::new().epoch_interruption(true))?;
let module = Module::from_file(&engine, "ruvector-cognitive-container.wasm")?;
let mut store = Store::new(&engine, ());
store.set_epoch_deadline(10000); // 10K ticks
let instance = Instance::new(&mut store, &module, &[])?;
let tick = instance.get_typed_func::<(i32, i32), i32>(&mut store, "tick")?;
// Run tick
let result = tick.call(&mut store, (deltas_ptr, deltas_len))?;
```
### 8.4 Multi-Container Orchestration
For the 256-tile cognitum fabric, each tile runs its own container:
```
Orchestrator (cognitum-gate-tilezero)
├── Container[0] (tile 0, 64KB slab)
├── Container[1] (tile 1, 64KB slab)
├── ...
├── Container[255] (tile 255, 64KB slab)
└── Aggregator: collects 256 witness receipts → global coherence decision
```
The aggregator verifies all 256 witness chains independently. Because each container uses pseudo-deterministic min-cut, the aggregated result is **reproducible** — any auditor can verify the global decision by replaying all 256 containers with the same input deltas.
---
## 9. Performance Analysis
### 9.1 Tick Latency Breakdown
| Phase | Time (native) | Time (WASM) | WASM Overhead |
|-------|--------------|-------------|---------------|
| Delta ingestion (10 deltas) | 5 μs | 10 μs | 2.0x |
| Canonical min-cut | 23 μs | 46 μs | 2.0x |
| Spectral coherence | 15 μs | 32 μs | 2.1x |
| Evidence accumulation | 3 μs | 6 μs | 2.0x |
| Witness generation + sign | 45 μs | 95 μs | 2.1x |
| **Total per tick** | **91 μs** | **189 μs** | **2.1x** |
At 189 μs per tick in WASM, the container achieves ~5,300 ticks/second — well above the 1,000 ticks/second target.
### 9.2 Memory Efficiency
| Configuration | WASM Pages | Total Memory | Waste |
|--------------|-----------|-------------|-------|
| Tile (64KB) | 1 page | 64 KB | 0% |
| Browser (1MB) | 16 pages | 1 MB | 0% |
| Standard (4MB) | 64 pages | 4 MB | 0% |
| High-Perf (16MB) | 256 pages | 16 MB | 0% |
Zero waste because the slab is pre-allocated and never grows.
### 9.3 Signing Overhead
Ed25519 signature generation dominates the witness phase:
| Operation | Time (native) | Time (WASM) |
|-----------|--------------|-------------|
| SHA256 (256 bytes) | 1.2 μs | 2.5 μs |
| Ed25519 sign | 38 μs | 80 μs |
| Ed25519 verify | 72 μs | 150 μs |
For latency-critical applications, the signing can be deferred to a batch operation:
```rust
/// Deferred signing: accumulate receipts, sign in batch.
pub struct DeferredWitnessChain {
unsigned_receipts: Vec<UnsignedReceipt>,
batch_size: usize,
}
impl DeferredWitnessChain {
pub fn add_unsigned(&mut self, receipt: UnsignedReceipt) {
self.unsigned_receipts.push(receipt);
if self.unsigned_receipts.len() >= self.batch_size {
self.sign_batch();
}
}
}
```
---
## 10. Relationship to Existing ADRs
### 10.1 ADR-005: Kernel Pack System
The cognitive container **extends** ADR-005:
- Uses the same manifest format and verification pipeline
- Adds a new kernel category: `cognitive` (alongside `positional`, `normalization`, `activation`, etc.)
- Reuses `EpochController`, `SharedMemoryProtocol`, `KernelPackVerifier`
### 10.2 Proposed ADR: Cognitive Container Standard
A new ADR should formalize:
1. Container manifest schema (extending kernel-pack manifest)
2. Witness receipt format (binary encoding, versioning)
3. Determinism requirements (no floating-point non-determinism, fixed-point arithmetic)
4. Memory budget allocation rules
5. Epoch budget allocation rules
6. Multi-container orchestration protocol
---
## 11. Open Questions
1. **Cross-container communication**: Should containers communicate directly (shared memory) or only via the orchestrator? Direct communication is faster but introduces non-determinism.
2. **Witness chain pruning**: As the chain grows, storage becomes a concern. What is the optimal pruning strategy that maintains verifiability? (Merkle tree checkpointing?)
3. **Container migration**: Can a container snapshot be migrated between different WASM runtimes (Wasmtime → Wasmer) and produce identical subsequent receipts?
4. **Post-quantum signatures**: Should the container support lattice-based signatures (e.g., Dilithium) for post-quantum scenarios? What is the performance impact in WASM?
5. **Nested containers**: Can a container embed another container (e.g., a cognitive container containing a solver container)? What are the implications for epoch budgeting?
---
## 12. Recommendations
### Immediate (0-4 weeks)
1. Create `ruvector-cognitive-container` crate with no_std support
2. Implement `MemorySlab` with fixed-size arena allocation
3. Define `ContainerWitnessReceipt` struct and serialization
4. Implement hash chain (SHA256) and Ed25519 signing
5. Wire `cognitum-gate-kernel` as the first container component
### Short-Term (4-8 weeks)
6. Integrate `ruvector-solver-wasm` spectral scoring into the container
7. Integrate `ruvector-mincut-wasm` canonical min-cut into the container
8. Implement epoch-budgeted tick execution
9. Build WASM compilation pipeline (wasm-pack or cargo-component)
10. Test in browser via wasm-bindgen
### Medium-Term (8-16 weeks)
11. Implement multi-container orchestration for 256-tile fabric
12. Add witness chain verification API
13. Implement container snapshotting and restoration
14. Benchmark against native cognitum-gate-kernel baseline
15. Draft ADR for cognitive container standard
---
## References
1. Haas, A., et al. "Bringing the Web Up to Speed with WebAssembly." PLDI 2017.
2. Bytecode Alliance. "Wasmtime: A Fast and Secure Runtime for WebAssembly." 2024.
3. Bernstein, D.J., et al. "Ed25519: High-Speed High-Security Signatures." 2012.
4. NIST. "SHA-256: Secure Hash Standard." FIPS 180-4, 2015.
5. European Commission. "EU AI Act." Regulation 2024/1689, 2024.
6. W3C. "WebAssembly Core Specification 2.0." 2024.
7. Clark, L. "Standardizing WASI: A System Interface to Run WebAssembly Outside the Web." 2019.
---
## Document Navigation
- **Previous**: [03 - Storage-Based GNN Acceleration](./03-storage-gnn-acceleration.md)
- **Next**: [05 - Cross-Stack Integration Strategy](./05-cross-stack-integration.md)
- **Index**: [Executive Summary](./00-executive-summary.md)

View File

@@ -0,0 +1,559 @@
# Cross-Stack Integration Strategy: Unified Roadmap and Dependency Mapping
**Document ID**: wasm-integration-2026/05-cross-stack-integration
**Date**: 2026-02-22
**Status**: Research Complete
**Classification**: Engineering Strategy — Integration Architecture
**Series**: [Executive Summary](./00-executive-summary.md) | [01](./01-pseudo-deterministic-mincut.md) | [02](./02-sublinear-spectral-solvers.md) | [03](./03-storage-gnn-acceleration.md) | [04](./04-wasm-microkernel-architecture.md) | **05**
---
## Abstract
This document synthesizes the four preceding research documents into a unified integration roadmap for RuVector's WASM-compiled cognitive stack. It maps all inter-crate dependencies, identifies critical path items, proposes Architecture Decision Records (ADRs), and provides a phased execution timeline with concrete milestones. The goal is to move from the current state (independent WASM crates) to the target state (sealed cognitive container with canonical witness chains) in 16 weeks.
---
## 1. Current State Assessment
### 1.1 Crate Inventory
RuVector's workspace contains 85+ crates. The following are directly relevant to the WASM cognitive stack:
| Crate | Version | WASM | no_std | Key Primitive |
|-------|---------|------|--------|--------------|
| `ruvector-core` | 2.0.3 | No | No | VectorDB, HNSW index |
| `ruvector-graph` | 0.1.x | Via -wasm | No | Graph representation |
| `ruvector-mincut` | 0.1.x | Via -wasm | No | Dynamic min-cut (exact + approx) |
| `ruvector-mincut-wasm` | 0.1.x | Yes | No | WASM bindings for min-cut |
| `ruvector-attn-mincut` | 0.1.x | No | No | Attention-gated min-cut |
| `ruvector-solver` | 0.1.x | Via -wasm | No | 7 iterative solvers |
| `ruvector-solver-wasm` | 0.1.x | Yes | No | WASM solver bindings |
| `ruvector-solver-node` | 0.1.x | No (NAPI) | No | Node.js solver bindings |
| `ruvector-gnn` | 0.1.x | Via -wasm | No | GNN layers, training, EWC |
| `ruvector-gnn-wasm` | 0.1.x | Yes | No | WASM GNN bindings |
| `ruvector-gnn-node` | 0.1.x | No (NAPI) | No | Node.js GNN bindings |
| `ruvector-coherence` | 0.1.x | No | No | Coherence metrics |
| `ruvector-sparse-inference` | 0.1.x | Via -wasm | No | Sparse model inference |
| `ruvector-sparse-inference-wasm` | 0.1.x | Yes | No | WASM inference bindings |
| `ruvector-wasm` | 0.1.x | Yes | No | Unified WASM + kernel-pack |
| `ruvector-math` | 0.1.x | No | Partial | Math primitives |
| `cognitum-gate-kernel` | 0.1.x | Yes | Yes | no_std tile kernel |
| `cognitum-gate-tilezero` | 0.1.x | No | No | Tile arbiter / aggregator |
| `prime-radiant` | 0.1.x | No | No | Attention mechanisms |
### 1.2 Dependency Graph (Current)
```
ruvector-core
├── ruvector-graph
│ ├── ruvector-graph-wasm
│ └── ruvector-mincut
│ ├── ruvector-mincut-wasm
│ ├── ruvector-attn-mincut
│ └── cognitum-gate-kernel ←── no_std WASM tile
│ └── cognitum-gate-tilezero
├── ruvector-gnn
│ ├── ruvector-gnn-wasm
│ └── ruvector-gnn-node
├── ruvector-solver
│ ├── ruvector-solver-wasm
│ └── ruvector-solver-node
├── ruvector-coherence
├── ruvector-sparse-inference
│ └── ruvector-sparse-inference-wasm
├── prime-radiant
├── ruvector-math
└── ruvector-wasm ←── unified WASM bindings + kernel-pack
```
### 1.3 Gap Analysis
| Capability | Current State | Target State | Gap |
|-----------|--------------|-------------|-----|
| Min-cut output | Randomized (non-canonical) | Pseudo-deterministic (canonical) | Cactus graph + lex tie-breaking |
| Spectral coherence | Not implemented | O(log n) SCS via solver engines | New module in ruvector-coherence |
| Cold-tier GNN | mmap infrastructure exists | Hyperbatch training pipeline | New cold-tier module |
| Cognitive container | Components exist independently | Sealed WASM container with witness | New composition crate |
| Witness chain | Per-tile fragments (non-canonical) | Hash-chained Ed25519 receipts | New witness layer |
| Epoch metering | Exists in kernel-pack | Extended to cognitive container | Integration work |
---
## 2. Dependency Mapping
### 2.1 New Feature Flags
| Crate | New Feature | Depends On | Purpose |
|-------|------------|-----------|---------|
| `ruvector-mincut` | `canonical` | None | Cactus graph, canonical tie-breaking |
| `ruvector-coherence` | `spectral` | `ruvector-solver` | Spectral coherence scoring |
| `ruvector-gnn` | `cold-tier` | `mmap` | Hyperbatch training pipeline |
| `cognitum-gate-kernel` | `canonical-witness` | `ruvector-mincut/canonical` | Canonical witness fragments |
### 2.2 New Crates
| Crate | Dependencies | Purpose |
|-------|-------------|---------|
| `ruvector-cognitive-container` | `cognitum-gate-kernel`, `ruvector-solver-wasm`, `ruvector-mincut-wasm`, `ruvector-wasm/kernel-pack` | Sealed cognitive container |
| `ruvector-cognitive-container-wasm` | `ruvector-cognitive-container` | WASM bindings for container |
### 2.3 Target Dependency Graph
```
ruvector-core
├── ruvector-graph
│ └── ruvector-mincut
│ ├── [NEW] canonical feature (cactus + lex tie-break)
│ ├── ruvector-mincut-wasm
│ ├── ruvector-attn-mincut
│ └── cognitum-gate-kernel
│ ├── [NEW] canonical-witness feature
│ └── cognitum-gate-tilezero
├── ruvector-gnn
│ ├── [NEW] cold-tier feature (hyperbatch + hotset)
│ ├── ruvector-gnn-wasm
│ └── ruvector-gnn-node
├── ruvector-solver
│ ├── ruvector-solver-wasm
│ └── ruvector-solver-node
├── ruvector-coherence
│ └── [NEW] spectral feature (SCS via solver)
├── ruvector-wasm (kernel-pack)
└── [NEW] ruvector-cognitive-container
├── cognitum-gate-kernel (canonical-witness)
├── ruvector-solver-wasm (spectral scoring)
├── ruvector-mincut-wasm (canonical min-cut)
└── ruvector-wasm/kernel-pack (epoch + signing)
└── [NEW] ruvector-cognitive-container-wasm
```
---
## 3. Critical Path Analysis
### 3.1 Dependency Order
The integration must proceed in dependency order:
```
Phase 1 (Foundations):
ruvector-mincut/canonical ───→ No dependencies
ruvector-coherence/spectral ──→ ruvector-solver (exists)
ruvector-gnn/cold-tier ───────→ ruvector-gnn/mmap (exists)
Phase 2 (Integration):
cognitum-gate-kernel/canonical-witness ──→ ruvector-mincut/canonical
Phase 3 (Composition):
ruvector-cognitive-container ──→ All Phase 1-2 outputs
Phase 4 (WASM Packaging):
ruvector-cognitive-container-wasm ──→ Phase 3 output
```
### 3.2 Critical Path
The longest dependency chain determines the minimum timeline:
```
ruvector-mincut/canonical (4 weeks)
→ cognitum-gate-kernel/canonical-witness (2 weeks)
→ ruvector-cognitive-container (4 weeks)
→ ruvector-cognitive-container-wasm (2 weeks)
= 12 weeks critical path
```
With 4 weeks of buffer and parallel work on spectral/cold-tier: **16 weeks total**.
### 3.3 Parallel Work Streams
| Stream | Weeks 0-4 | Weeks 4-8 | Weeks 8-12 | Weeks 12-16 |
|--------|-----------|-----------|-----------|------------|
| **A: Min-Cut** | Cactus data structure + builder | Canonical selection + dynamic | Wire to gate-kernel | Container integration |
| **B: Spectral** | Fiedler estimator via CG | SCS tracker + incremental | WASM benchmark | Container integration |
| **C: GNN Cold-Tier** | Feature storage + hyperbatch iter | Hotset + direct I/O | EWC cold-tier | WASM model export |
| **D: Container** | Memory slab + arena design | Witness chain + signing | Tick execution + epoch | WASM packaging + test |
Streams A-C are independent in Phase 1, enabling full parallelism.
---
## 4. Proposed Architecture Decision Records
### 4.1 ADR-011: Canonical Min-Cut Feature
**Status**: Proposed
**Context**: The current `ruvector-mincut` produces non-deterministic cut outputs.
**Decision**: Add a `canonical` feature flag implementing pseudo-deterministic min-cut via cactus representation and lexicographic tie-breaking.
**Consequences**:
- ~1.8x overhead for canonical mode vs. randomized
- Enables reproducible witness fragments in cognitum-gate-kernel
- Cactus representation adds ~4KB per tile (within 64KB budget)
### 4.2 ADR-012: Spectral Coherence Scoring
**Status**: Proposed
**Context**: No real-time structural health metric exists for HNSW graphs.
**Decision**: Add a `spectral` feature to `ruvector-coherence` that computes a composite Spectral Coherence Score (SCS) using existing `ruvector-solver` engines.
**Consequences**:
- New dependency: `ruvector-coherence``ruvector-solver`
- O(log n) amortized SCS updates via perturbation theory
- Enables proactive index health monitoring
### 4.3 ADR-013: Cold-Tier GNN Training
**Status**: Proposed
**Context**: `ruvector-gnn` cannot train on graphs exceeding available RAM.
**Decision**: Add a `cold-tier` feature (depending on `mmap`) implementing hyperbatch training with block-aligned I/O, hotset caching, and double-buffered prefetch.
**Consequences**:
- 3-4x throughput improvement over naive disk-based training
- Not available on WASM targets (mmap not supported)
- Server-to-WASM model export path for deployment
### 4.4 ADR-014: Cognitive Container Standard
**Status**: Proposed
**Context**: RuVector's WASM-compiled cognitive primitives exist independently without a unified execution model.
**Decision**: Create `ruvector-cognitive-container` crate that composes gate-kernel + solver + mincut into a sealed WASM container with fixed memory slab, epoch budgeting, and Ed25519 witness chains.
**Consequences**:
- New crate (not a modification of existing crates)
- 4 MB default memory slab, 64 WASM pages
- ~189 μs per tick in WASM (~5,300 ticks/second)
- Enables regulatory compliance for auditable AI systems
---
## 5. Integration Test Strategy
### 5.1 Unit Tests (Per Feature)
| Feature | Test Category | Key Properties |
|---------|-------------|----------------|
| `canonical` min-cut | Determinism | Same graph → same canonical cut across 1000 runs |
| `canonical` min-cut | Correctness | Canonical cut value = true min-cut value |
| `canonical` min-cut | Dynamic | Insert/delete sequence → canonical cut matches static recomputation |
| `spectral` SCS | Monotonicity | Removing edges decreases SCS for connected graphs |
| `spectral` SCS | Bounds | 0 ≤ SCS ≤ 1 for all valid graphs |
| `spectral` SCS | Incremental accuracy | Incremental SCS within 5% of full recomputation |
| `cold-tier` training | Convergence | Cold-tier loss curve within 2% of in-memory baseline |
| `cold-tier` training | Correctness | Gradient accumulation matches in-memory computation |
| Container | Determinism | Same deltas → same witness receipt across runs |
| Container | Chain integrity | verify_witness_chain succeeds for valid chains |
| Container | Epoch budgeting | Tick completes within allocated budget |
### 5.2 Integration Tests (Cross-Crate)
| Test | Crates Involved | Description |
|------|----------------|-------------|
| Canonical gate coherence | mincut + gate-kernel | Canonical witness fragments aggregate correctly |
| Spectral + behavioral | coherence + solver | SCS correlates with behavioral coherence metrics |
| Container end-to-end | All container crates | Full tick cycle produces valid witness receipt |
| WASM determinism | container-wasm | Same input deltas → identical WASM output across runtimes |
| Multi-tile aggregation | container + tilezero | 256 containers produce reproducible global decision |
### 5.3 Performance Benchmarks
| Benchmark | Target | Measurement |
|-----------|--------|-------------|
| Canonical min-cut overhead | < 2x vs. randomized | Criterion.rs microbenchmark |
| SCS full recompute (10K vertices) | < 15 ms | Criterion.rs |
| SCS incremental update | < 100 μs | Criterion.rs |
| Container tick (WASM) | < 200 μs | wasm-bench |
| Container tick (native) | < 100 μs | Criterion.rs |
| Cold-tier throughput (10M nodes, NVMe) | > 3x naive disk | Custom benchmark |
| Ed25519 sign (WASM) | < 100 μs | wasm-bench |
---
## 6. Risk Assessment
### 6.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|-----------|
| Cactus construction too slow for WASM tiles | Medium | High | Pre-compute cactus on delta ingestion, not per-tick |
| Floating-point non-determinism in spectral scoring | Medium | High | Use fixed-point arithmetic (FixedWeight type) |
| Cold-tier I/O latency exceeds compute time | Low | Medium | Triple-buffering, larger hyperbatches |
| WASM memory growth needed beyond initial slab | Low | High | Conservative slab sizing, fail-fast on OOM |
| Ed25519 signing too slow for real-time ticks | Low | Medium | Deferred batch signing option |
### 6.2 Organizational Risks
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|-----------|
| Parallel streams create merge conflicts | Medium | Medium | Clear crate boundaries, feature flags |
| Scope creep in container design | High | Medium | ADR-014 locks scope; feature flags for extensions |
| Testing infrastructure insufficient | Low | High | Invest in WASM test harness early (Week 1) |
---
## 7. Publishing Strategy
### 7.1 Crate Publication Order
Following the existing publish order rule (`ruvector-solver` first, then `-wasm` and `-node`):
```
Phase 1 Publications (after Week 4):
1. ruvector-mincut (with canonical feature)
2. ruvector-mincut-wasm (updated)
3. ruvector-solver (unchanged, but verify compatibility)
4. ruvector-solver-wasm (unchanged)
Phase 2 Publications (after Week 8):
5. ruvector-coherence (with spectral feature)
6. cognitum-gate-kernel (with canonical-witness feature)
7. ruvector-gnn (with cold-tier feature)
8. ruvector-gnn-wasm (updated)
Phase 3 Publications (after Week 16):
9. ruvector-cognitive-container (new)
10. ruvector-cognitive-container-wasm (new)
```
### 7.2 Pre-Publication Checklist
For each crate publication:
- [ ] `cargo publish --dry-run --allow-dirty` passes
- [ ] All tests pass: `cargo test --all-features`
- [ ] WASM compilation succeeds: `wasm-pack build --target web`
- [ ] No new security advisories: `cargo audit`
- [ ] Documentation builds: `cargo doc --no-deps`
- [ ] Version bump follows semver (feature additions = minor bump)
- [ ] CHANGELOG.md updated
- [ ] npm publish for `-wasm` and `-node` variants (`npm whoami` = `ruvnet`)
### 7.3 Version Strategy
| Crate | Current | After Phase 1 | After Phase 2 | After Phase 3 |
|-------|---------|--------------|--------------|--------------|
| ruvector-mincut | 0.1.x | 0.2.0 | 0.2.x | 0.2.x |
| ruvector-coherence | 0.1.x | 0.1.x | 0.2.0 | 0.2.x |
| ruvector-gnn | 0.1.x | 0.1.x | 0.2.0 | 0.2.x |
| cognitum-gate-kernel | 0.1.x | 0.1.x | 0.2.0 | 0.2.x |
| ruvector-cognitive-container | — | — | — | 0.1.0 |
---
## 8. Milestone Schedule
### 8.1 Phase 1: Foundations (Weeks 0-4)
**Week 1**:
- [ ] Create `canonical` feature flag in `ruvector-mincut/Cargo.toml`
- [ ] Implement `CactusGraph`, `CactusVertex`, `CactusEdge` data structures
- [ ] Create `spectral` feature flag in `ruvector-coherence/Cargo.toml`
- [ ] Implement `estimate_fiedler()` using existing `CgSolver`
- [ ] Set up WASM test harness for container integration testing
**Week 2**:
- [ ] Implement static cactus builder via tree packing algorithm
- [ ] Implement `SpectralCoherenceScore` struct with four-component formula
- [ ] Create `cold-tier` feature flag in `ruvector-gnn/Cargo.toml`
- [ ] Implement `FeatureStorage` for block-aligned feature file layout
**Week 3**:
- [ ] Implement canonical lex tie-breaking on rooted cactus
- [ ] Implement `SpectralTracker` with perturbation-based incremental updates
- [ ] Implement `HyperbatchIterator` with double-buffered prefetch
- [ ] Write property-based tests for canonical min-cut determinism
**Week 4**:
- [ ] Implement `FixedWeight` type for deterministic comparison
- [ ] Benchmark SCS computation in `ruvector-solver-wasm`
- [ ] Implement BFS vertex reordering for cold-tier
- [ ] **Milestone**: All three feature flags working with unit tests passing
### 8.2 Phase 2: Integration (Weeks 4-8)
**Week 5**:
- [ ] Implement dynamic cactus maintenance (incremental updates)
- [ ] Wire SCS into `ruvector-coherence` `evaluate_batch` pipeline
- [ ] Implement `AdaptiveHotset` with greedy selection and decay
**Week 6**:
- [ ] Wire canonical witness fragment into `cognitum-gate-kernel`
- [ ] Add spectral health monitoring to HNSW graph
- [ ] Add direct I/O support on Linux (`O_DIRECT`) for cold-tier
**Week 7**:
- [ ] Implement `ColdTierEwc` for disk-backed Fisher information
- [ ] Compile and test canonical min-cut in `ruvector-mincut-wasm`
- [ ] Benchmark canonical overhead vs. randomized min-cut
**Week 8**:
- [ ] Integration tests: canonical gate coherence, spectral + behavioral
- [ ] Benchmark cold-tier on ogbn-products dataset
- [ ] **Milestone**: All integration tests passing, Phase 1-2 crates publishable
### 8.3 Phase 3: Composition (Weeks 8-12)
**Week 9**:
- [ ] Create `ruvector-cognitive-container` crate skeleton
- [ ] Implement `MemorySlab` with fixed-size arena allocation
- [ ] Define `ContainerWitnessReceipt` struct and serialization
**Week 10**:
- [ ] Implement hash chain (SHA256) and Ed25519 signing
- [ ] Wire `cognitum-gate-kernel` as first container component
- [ ] Implement epoch-budgeted tick execution
**Week 11**:
- [ ] Integrate `ruvector-solver-wasm` spectral scoring
- [ ] Integrate `ruvector-mincut-wasm` canonical min-cut
- [ ] Implement witness chain verification API
**Week 12**:
- [ ] End-to-end container tests (determinism, chain integrity)
- [ ] Performance benchmarks (tick latency, memory usage)
- [ ] **Milestone**: Cognitive container working in native mode
### 8.4 Phase 4: WASM Packaging (Weeks 12-16)
**Week 13**:
- [ ] Build WASM compilation pipeline (wasm-pack)
- [ ] Test container in browser via wasm-bindgen
- [ ] Implement container snapshotting and restoration
**Week 14**:
- [ ] Multi-container orchestration for 256-tile fabric
- [ ] Cross-runtime determinism testing (Wasmtime, Wasmer, browser)
- [ ] `WasmModelExport` for server-to-WASM GNN model transfer
**Week 15**:
- [ ] Final performance optimization pass
- [ ] Security audit of witness chain and signing
- [ ] Documentation and API reference generation
**Week 16**:
- [ ] Publish all Phase 1-3 crates to crates.io
- [ ] Publish WASM packages to npm
- [ ] **Milestone**: Full cognitive container stack published and deployable
---
## 9. Success Criteria
### 9.1 Quantitative Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Canonical min-cut determinism | 100% across 10,000 runs | Property test |
| SCS computation (10K vertices, WASM) | < 30 ms | Benchmark |
| Container tick (WASM) | < 200 μs | Benchmark |
| Container ticks/second (WASM) | > 5,000 | Benchmark |
| Cold-tier throughput improvement | > 3x vs. naive | Benchmark on ogbn-products |
| Witness chain verification | < 1 ms per receipt | Benchmark |
| WASM binary size (container) | < 2 MB | wasm-opt -Os |
| Memory usage (standard config) | 4 MB fixed | Runtime measurement |
### 9.2 Qualitative Targets
- All new features behind feature flags (no breaking changes to existing API)
- All crates maintain existing test coverage + new tests
- WASM binaries pass the same test suite as native
- Documentation for all public APIs
- ADRs approved and merged
---
## 10. Vertical Deployment Roadmap
### 10.1 Immediate Applications (Post-Phase 4)
| Vertical | Product | Cognitive Container Role |
|----------|---------|------------------------|
| Finance | Fraud detection dashboard | Browser WASM: real-time transaction graph monitoring with auditable witness chain |
| Cybersecurity | SOC network monitor | Browser WASM: spectral coherence for network fragility detection |
| Healthcare | Diagnostic AI audit | Server WASM: deterministic decision replay for FDA SaMD compliance |
| Edge/IoT | Anomaly detector | 256KB WASM: minimal cognitive container on ARM microcontrollers |
### 10.2 SDK and API Surface
```typescript
// @ruvector/cognitive-container (npm package)
// Browser usage
import { CognitiveContainer, verify_chain } from '@ruvector/cognitive-container';
const container = await CognitiveContainer.create({
profile: 'browser', // 1MB slab, 5K epoch budget
});
// Feed data, get auditable decisions
const receipt = container.tick([
{ type: 'edge_add', u: 0, v: 1, weight: 1.0 },
{ type: 'observation', node: 0, value: 0.95 },
]);
// Verify audit trail
const chain = container.get_receipt_chain();
const valid = verify_chain(chain, container.public_key());
```
```rust
// Rust server usage
use ruvector_cognitive_container::prelude::*;
let container = ContainerBuilder::new()
.profile(Profile::Standard) // 4MB slab, 10K epoch budget
.build()?;
let receipt = container.tick(&deltas)?;
assert_eq!(receipt.decision, CoherenceDecision::Pass);
// Verify chain
let chain = container.receipt_chain();
assert!(verify_witness_chain(&chain, container.public_key()).is_valid());
```
---
## 11. Open Questions (Cross-Cutting)
1. **Feature flag combinatorics**: With 4 new features across 4 crates, how do we ensure all valid combinations compile and test correctly? (Consider feature-flag CI matrix.)
2. **WASM Component Model**: Should the cognitive container adopt the WASM Component Model (WIT interfaces) for inter-component communication instead of shared linear memory? Trade-off: isolation vs. performance.
3. **Backwards compatibility**: The `canonical` feature in `ruvector-mincut` adds new types. Should the existing `DynamicMinCut` trait be extended or should `CanonicalMinCut` be a separate trait? (Separate trait recommended to avoid breaking changes.)
4. **Monitoring integration**: Should the cognitive container expose Prometheus-compatible metrics via WASM imports? Or should monitoring be handled entirely by the host?
5. **Multi-language bindings**: Beyond Rust, WASM, and Node.js — should we generate Python bindings (via PyO3) for the cognitive container? (Deferred to post-Phase 4.)
---
## 12. Summary
The RuVector WASM cognitive stack integration is a 16-week effort that:
1. **Adds canonical min-cut** to `ruvector-mincut` via cactus representation (Doc 01)
2. **Adds spectral coherence scoring** to `ruvector-coherence` via existing solvers (Doc 02)
3. **Adds cold-tier GNN training** to `ruvector-gnn` via hyperbatch I/O (Doc 03)
4. **Creates a sealed WASM cognitive container** composing all primitives with witness chains (Doc 04)
5. **Follows a phased roadmap** with clear milestones and dependency ordering (this document)
The integration is designed to be **non-breaking** (all new features behind feature flags), **publishable** (following existing crates.io/npm publishing conventions), and **deployable** (browser, server, edge, and IoT configurations).
The end result is a **verifiable, auditable, deterministic cognitive computation unit** — deployable as a single WASM binary — that produces tamper-evident witness chains suitable for regulated AI environments.
---
## References
1. Documents 01-04 in this series
2. RuVector Workspace Cargo.toml (85+ crate definitions)
3. ADR-005: Kernel Pack System (existing)
4. EU AI Act, Article 13: Transparency Requirements
5. FDA SaMD Guidance: Software as a Medical Device
6. WebAssembly Component Model Specification (W3C Draft)
7. Semantic Versioning 2.0.0 (semver.org)
---
## Document Navigation
- **Previous**: [04 - WASM Microkernel Architecture](./04-wasm-microkernel-architecture.md)
- **Index**: [Executive Summary](./00-executive-summary.md)