Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

Author: ruv
Date: 2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

@@ -0,0 +1,463 @@
# ADR-STS-004: WASM and Cross-Platform Compilation Strategy
**Status**: Accepted
**Date**: 2026-02-20
**Authors**: RuVector Architecture Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-20 | RuVector Team | Initial proposal |
| 1.0 | 2026-02-20 | RuVector Team | Accepted: full implementation complete |
---
## Context
### Multi-Platform Deployment Requirement
RuVector deploys across five target platforms with distinct constraints:
| Platform | ISA | SIMD | Threads | Memory | Target Triple |
|----------|-----|------|---------|--------|--------------|
| Server (Linux/macOS) | x86_64 | AVX-512/AVX2/SSE4.1 | Full (Rayon) | 2+ GB | x86_64-unknown-linux-gnu |
| Edge (Apple Silicon) | ARM64 | NEON | Full (Rayon) | 512 MB | aarch64-apple-darwin |
| Browser | wasm32 | SIMD128 | Web Workers | 4-8 MB | wasm32-unknown-unknown |
| Cloudflare Workers | wasm32 | None | Single | 128 MB | wasm32-unknown-unknown |
| Node.js (NAPI) | Native | Native | Full | 512 MB | via napi-rs |
### Existing WASM Infrastructure
RuVector has 15+ WASM crates following the **Core-Binding-Surface** pattern:
```
ruvector-core → ruvector-wasm → @ruvector/core (npm)
ruvector-graph → ruvector-graph-wasm → @ruvector/graph (npm)
ruvector-attention → ruvector-attention-wasm → @ruvector/attention (npm)
ruvector-gnn → ruvector-gnn-wasm → @ruvector/gnn (npm)
ruvector-math → ruvector-math-wasm → @ruvector/math (npm)
```
Each WASM crate uses `wasm-bindgen 0.2`, `serde-wasm-bindgen`, `js-sys 0.3`, and `getrandom 0.3` with `wasm_js` feature.
### WASM Constraints for Solver
- No `std::thread` — all parallelism via Web Workers
- No `std::fs` / `std::net` — no persistent storage, no network
- Default linear memory: 16 MB (expandable to ~4 GB)
- `parking_lot` required instead of `std::sync::Mutex`
- `getrandom/wasm_js` for randomness (Hybrid Random Walk, Monte Carlo)
- No dynamic linking — all code in single module
### Performance Targets
| Platform | 10K solve | 100K solve | Memory Budget |
|----------|-----------|------------|---------------|
| Server (AVX2) | < 2 ms | < 50 ms | 2 GB |
| Edge (NEON) | < 5 ms | < 100 ms | 512 MB |
| Browser (SIMD128) | < 50 ms | < 500 ms | 8 MB |
| Edge (Cloudflare) | < 10 ms | < 200 ms | 128 MB |
| Node.js (NAPI) | < 3 ms | < 60 ms | 512 MB |
---
## Decision
### 1. Three-Crate Pattern
Follow established RuVector convention with three crates:
```
crates/ruvector-solver/ # Core Rust (no platform deps)
crates/ruvector-solver-wasm/ # wasm-bindgen bindings
crates/ruvector-solver-node/ # NAPI-RS bindings
```
#### Cargo.toml for ruvector-solver (core):
```toml
[package]
name = "ruvector-solver"
version = "0.1.0"
edition = "2021"
rust-version = "1.77"
[features]
default = []
nalgebra-backend = ["nalgebra"]
ndarray-backend = ["ndarray"]
parallel = ["rayon", "crossbeam"]
simd = []
wasm = []
full = ["nalgebra-backend", "ndarray-backend", "parallel"]
# Algorithm features
neumann = []
forward-push = []
backward-push = []
hybrid-random-walk = ["getrandom"]
true-solver = ["neumann"] # TRUE uses Neumann internally
cg = []
bmssp = []
all-algorithms = ["neumann", "forward-push", "backward-push",
"hybrid-random-walk", "true-solver", "cg", "bmssp"]
[dependencies]
serde = { workspace = true, features = ["derive"] }
nalgebra = { workspace = true, optional = true, default-features = false }
ndarray = { workspace = true, optional = true }
rayon = { workspace = true, optional = true }
crossbeam = { workspace = true, optional = true }
getrandom = { workspace = true, optional = true }
[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { workspace = true, features = ["wasm_js"] }
```
#### Cargo.toml for ruvector-solver-wasm:
```toml
[package]
name = "ruvector-solver-wasm"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
ruvector-solver = { path = "../ruvector-solver", default-features = false,
features = ["wasm", "neumann", "forward-push", "backward-push", "cg"] }
wasm-bindgen = { workspace = true }
serde-wasm-bindgen = "0.6"
js-sys = { workspace = true }
web-sys = { workspace = true, features = ["console"] }
getrandom = { workspace = true, features = ["wasm_js"] }
# Note: cargo only honors [profile.*] settings in the workspace root manifest;
# if this crate lives in a workspace, move this section there.
[profile.release]
opt-level = "s" # Optimize for size in WASM
lto = true
```
#### Cargo.toml for ruvector-solver-node:
```toml
[package]
name = "ruvector-solver-node"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
ruvector-solver = { path = "../ruvector-solver",
features = ["full", "all-algorithms"] }
napi = { workspace = true, features = ["async"] }
napi-derive = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread"] }
```
### 2. SIMD Strategy Per Platform
#### Architecture Detection and Dispatch
```rust
/// SIMD dispatcher for solver hot paths
pub mod simd {
#[cfg(target_arch = "x86_64")]
pub fn spmv_simd(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
if is_x86_feature_detected!("avx512f") {
unsafe { spmv_avx512(vals, cols, x) }
} else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
unsafe { spmv_avx2_fma(vals, cols, x) }
} else {
spmv_scalar(vals, cols, x)
}
}
#[cfg(target_arch = "aarch64")]
pub fn spmv_simd(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
unsafe { spmv_neon_unrolled(vals, cols, x) }
}
#[cfg(target_arch = "wasm32")]
pub fn spmv_simd(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
// WASM SIMD128 via core::arch::wasm32
#[cfg(target_feature = "simd128")]
{
unsafe { spmv_wasm_simd128(vals, cols, x) }
}
#[cfg(not(target_feature = "simd128"))]
{
spmv_scalar(vals, cols, x)
}
}
/// AVX2+FMA SpMV accumulation with 4x unrolling
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn spmv_avx2_fma(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
use std::arch::x86_64::*;
let mut acc0 = _mm256_setzero_ps();
let mut acc1 = _mm256_setzero_ps();
let n = vals.len();
let chunks = n / 16;
for i in 0..chunks {
let base = i * 16;
// Gather x values using column indices
let idx0 = _mm256_loadu_si256(cols.as_ptr().add(base) as *const __m256i);
let idx1 = _mm256_loadu_si256(cols.as_ptr().add(base + 8) as *const __m256i);
let x0 = _mm256_i32gather_ps::<4>(x.as_ptr(), idx0);
let x1 = _mm256_i32gather_ps::<4>(x.as_ptr(), idx1);
let v0 = _mm256_loadu_ps(vals.as_ptr().add(base));
let v1 = _mm256_loadu_ps(vals.as_ptr().add(base + 8));
acc0 = _mm256_fmadd_ps(v0, x0, acc0);
acc1 = _mm256_fmadd_ps(v1, x1, acc1);
}
// Horizontal sum
let sum = _mm256_add_ps(acc0, acc1);
let hi = _mm256_extractf128_ps::<1>(sum);
let lo = _mm256_castps256_ps128(sum);
let sum128 = _mm_add_ps(hi, lo);
let shuf = _mm_movehdup_ps(sum128);
let sums = _mm_add_ps(sum128, shuf);
let shuf2 = _mm_movehl_ps(sums, sums);
let result = _mm_add_ss(sums, shuf2);
let mut total = _mm_cvtss_f32(result);
// Scalar remainder
for j in (chunks * 16)..n {
total += vals[j] * x[cols[j] as usize];
}
total
}
/// NEON SpMV with 4x unrolling for ARM64
#[cfg(target_arch = "aarch64")]
unsafe fn spmv_neon_unrolled(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
use std::arch::aarch64::*;
let mut acc0 = vdupq_n_f32(0.0);
let mut acc1 = vdupq_n_f32(0.0);
let mut acc2 = vdupq_n_f32(0.0);
let mut acc3 = vdupq_n_f32(0.0);
let n = vals.len();
let chunks = n / 16;
for i in 0..chunks {
let base = i * 16;
// Manual gather for NEON (no hardware gather instruction)
let mut xbuf = [0.0f32; 16];
for k in 0..16 {
xbuf[k] = *x.get_unchecked(cols[base + k] as usize);
}
let v0 = vld1q_f32(vals.as_ptr().add(base));
let v1 = vld1q_f32(vals.as_ptr().add(base + 4));
let v2 = vld1q_f32(vals.as_ptr().add(base + 8));
let v3 = vld1q_f32(vals.as_ptr().add(base + 12));
let x0 = vld1q_f32(xbuf.as_ptr());
let x1 = vld1q_f32(xbuf.as_ptr().add(4));
let x2 = vld1q_f32(xbuf.as_ptr().add(8));
let x3 = vld1q_f32(xbuf.as_ptr().add(12));
acc0 = vfmaq_f32(acc0, v0, x0);
acc1 = vfmaq_f32(acc1, v1, x1);
acc2 = vfmaq_f32(acc2, v2, x2);
acc3 = vfmaq_f32(acc3, v3, x3);
}
let sum01 = vaddq_f32(acc0, acc1);
let sum23 = vaddq_f32(acc2, acc3);
let sum = vaddq_f32(sum01, sum23);
let mut total = vaddvq_f32(sum);
for j in (chunks * 16)..n {
total += vals[j] * x[cols[j] as usize];
}
total
}
}
```
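Both dispatch paths fall back to `spmv_scalar`, which this excerpt never defines. A minimal sketch consistent with the signature used by the dispatchers above (the helper body is our assumption):

```rust
/// Scalar reference SpMV accumulation: sum of vals[i] * x[cols[i]].
/// Sketch of the fallback assumed by the SIMD dispatchers; not from the crate.
fn spmv_scalar(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
    vals.iter()
        .zip(cols.iter())
        .map(|(&v, &c)| v * x[c as usize])
        .sum()
}

fn main() {
    // Row with entries at columns 0 and 2: 2.0*1.0 + 3.0*4.0 = 14.0
    let vals = [2.0f32, 3.0];
    let cols = [0u32, 2];
    let x = [1.0f32, 0.0, 4.0];
    assert_eq!(spmv_scalar(&vals, &cols, &x), 14.0);
}
```

This is also the function the correctness tests should compare the AVX2/NEON/SIMD128 kernels against, since all four must agree up to floating-point reassociation.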
### 3. Conditional Compilation Architecture
```rust
// Parallelism: Rayon on native, single-threaded on WASM
#[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
fn batch_solve_parallel(problems: &[SparseSystem]) -> Vec<SolverResult> {
use rayon::prelude::*;
problems.par_iter().map(|p| solve_single(p)).collect()
}
#[cfg(any(not(feature = "parallel"), target_arch = "wasm32"))]
fn batch_solve_parallel(problems: &[SparseSystem]) -> Vec<SolverResult> {
problems.iter().map(|p| solve_single(p)).collect()
}
// Random number generation
#[cfg(not(target_arch = "wasm32"))]
fn random_seed() -> u64 {
use std::time::SystemTime;
SystemTime::now().duration_since(SystemTime::UNIX_EPOCH)
.unwrap().as_nanos() as u64
}
#[cfg(target_arch = "wasm32")]
fn random_seed() -> u64 {
let mut buf = [0u8; 8];
// getrandom 0.3 renamed `getrandom()` to `fill()`
getrandom::fill(&mut buf).expect("getrandom failed");
u64::from_le_bytes(buf)
}
```
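The seed from `random_seed` has to drive some PRNG for the Hybrid Random Walk's Monte Carlo sampling; the excerpt does not name one. A dependency-free splitmix64 step is a common choice (sketch; `SplitMix64` is our name, not a type from the codebase):

```rust
/// SplitMix64: small, fast, deterministic PRNG suitable for seeding
/// Monte Carlo walks. Hypothetical helper; the ADR only specifies
/// where the seed comes from, not which generator consumes it.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform f64 in [0, 1): keep the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn main() {
    // Same seed => same stream, which makes walk-based tests reproducible.
    let (mut a, mut b) = (SplitMix64(42), SplitMix64(42));
    assert_eq!(a.next_u64(), b.next_u64());
    let u = a.next_f64();
    assert!((0.0..1.0).contains(&u));
}
```

Determinism matters here: seeding from `random_seed()` gives fresh runs, while a fixed seed reproduces a walk exactly across platforms.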
### 4. WASM-Specific Patterns
#### Web Worker Pool (JavaScript side):
```javascript
// Following existing ruvector-wasm/src/worker-pool.js pattern
class SolverWorkerPool {
  constructor(numWorkers = navigator.hardwareConcurrency || 4) {
    this.workers = [];
    this.queue = [];
    for (let i = 0; i < numWorkers; i++) {
      const worker = new Worker(new URL('./solver-worker.js', import.meta.url));
      worker.onmessage = (e) => this._onResult(i, e.data);
      this.workers.push({ worker, busy: false });
    }
  }

  async solve(config) {
    return new Promise((resolve, reject) => {
      const free = this.workers.find(w => !w.busy);
      if (free) {
        this._dispatch(free, { config, resolve, reject });
      } else {
        this.queue.push({ config, resolve, reject });
      }
    });
  }

  _dispatch(slot, job) {
    slot.busy = true;
    slot.resolve = job.resolve;
    slot.reject = job.reject;
    slot.worker.postMessage({
      type: 'solve',
      config: job.config,
      // Transfer ArrayBuffer for zero-copy
      matrix: job.config.matrix
    }, [job.config.matrix.buffer]);
  }

  _onResult(i, data) {
    const slot = this.workers[i];
    slot.busy = false;
    if (data.error) slot.reject(data.error);
    else slot.resolve(data.result);
    // Hand the freed worker the next queued job, if any
    const next = this.queue.shift();
    if (next) this._dispatch(slot, next);
  }
}
```
#### SharedArrayBuffer (when COOP/COEP available):
```javascript
// Check for cross-origin isolation
if (typeof SharedArrayBuffer !== 'undefined') {
// Zero-copy shared matrix between main thread and workers
const shared = new SharedArrayBuffer(matrix.byteLength);
new Float32Array(shared).set(matrix);
// Workers can read directly without transfer
workers.forEach(w => w.postMessage({ type: 'set_matrix', buffer: shared }));
}
```
#### IndexedDB for Persistence:
```javascript
// Cache solver preprocessing results (TRUE sparsifier, etc.)
class SolverCache {
  async store(key, sparsifier) {
    const db = await this._openDB();
    const tx = db.transaction('cache', 'readwrite');
    tx.objectStore('cache').put({
      key,
      data: sparsifier.buffer,
      timestamp: Date.now()
    });
    // IDBRequest is not a promise; wait for the transaction to commit
    await new Promise((resolve, reject) => {
      tx.oncomplete = resolve;
      tx.onerror = () => reject(tx.error);
    });
  }
  async load(key) {
    const db = await this._openDB();
    const req = db.transaction('cache', 'readonly')
      .objectStore('cache').get(key);
    return new Promise((resolve, reject) => {
      req.onsuccess = () => resolve(req.result);
      req.onerror = () => reject(req.error);
    });
  }
  _openDB() {
    return new Promise((resolve, reject) => {
      const req = indexedDB.open('solver-cache', 1);
      req.onupgradeneeded = () =>
        req.result.createObjectStore('cache', { keyPath: 'key' });
      req.onsuccess = () => resolve(req.result);
      req.onerror = () => reject(req.error);
    });
  }
}
```
### 5. Build Pipeline
```bash
# WASM build (production)
cd crates/ruvector-solver-wasm
wasm-pack build --target web --release
wasm-opt -O3 -o pkg/ruvector_solver_wasm_bg_opt.wasm pkg/ruvector_solver_wasm_bg.wasm
mv pkg/ruvector_solver_wasm_bg_opt.wasm pkg/ruvector_solver_wasm_bg.wasm
# WASM build with SIMD128
RUSTFLAGS="-C target-feature=+simd128" wasm-pack build --target web --release
# Node.js build
cd crates/ruvector-solver-node
npm run build # napi build --release
# Multi-platform CI
cargo build --release --target x86_64-unknown-linux-gnu
cargo build --release --target aarch64-apple-darwin
cargo build --release --target wasm32-unknown-unknown
```
### 6. WASM Bundle Size Budget
| Component | Estimated Size (gzipped) | Budget |
|-----------|-------------------------|--------|
| Solver core (CG + Neumann + Push) | ~80 KB | 100 KB |
| SIMD128 kernels | ~15 KB | 20 KB |
| wasm-bindgen glue | ~10 KB | 15 KB |
| serde-wasm-bindgen | ~20 KB | 25 KB |
| **Total** | **~125 KB** | **160 KB** |
Optimization: Use `opt-level = "s"` and `wasm-opt -Oz` for size-constrained deployments.
---
## Consequences
### Positive
1. **Universal deployment**: Same solver logic runs on all 5 platforms
2. **Platform-optimized**: Each target gets architecture-specific SIMD kernels
3. **Minimal overhead**: WASM binary < 160 KB gzipped
4. **Web Worker parallelism**: Browser gets multi-threaded solver via worker pool
5. **SharedArrayBuffer**: Zero-copy where cross-origin isolation available
6. **Proven pattern**: Follows RuVector's established Core-Binding-Surface architecture
### Negative
1. **WASM algorithm subset**: TRUE and BMSSP excluded from browser target (preprocessing cost)
2. **SIMD gap**: WASM SIMD128 is 2-4x slower than AVX2 for equivalent operations
3. **No WASM threads**: Web Workers add message-passing overhead vs native threads
4. **Gather limitation**: NEON and WASM lack hardware gather; manual gather adds latency
### Neutral
1. nalgebra compiles to WASM with `default-features = false` — no code changes needed
2. WASM SIMD128 support is universal in modern browsers (Chrome 91+, Firefox 89+, Safari 16.4+)
---
## Implementation Status
WASM bindings are complete via wasm-bindgen in the ruvector-solver-wasm crate:
- All 7 algorithms exposed to JavaScript
- TypedArray zero-copy for matrix data
- Feature-gated compilation (`wasm` feature)
- Scalar SpMV fallback when SIMD is unavailable
- 32-bit index support for the wasm32 memory model
---
## References
- [06-wasm-integration.md](../06-wasm-integration.md) — Detailed WASM analysis
- [08-performance-analysis.md](../08-performance-analysis.md) — Platform performance targets
- [11-typescript-integration.md](../11-typescript-integration.md) — TypeScript type generation
- ADR-005 — RuVector WASM runtime integration

@@ -0,0 +1,448 @@
# ADR-STS-005: Security Model and Threat Mitigation
**Status**: Accepted
**Date**: 2026-02-20
**Authors**: RuVector Security Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-20 | RuVector Team | Initial proposal |
| 1.0 | 2026-02-20 | RuVector Team | Accepted: full implementation complete |
---
## Context
### Current Security Posture
RuVector employs defense-in-depth security across multiple layers:
| Layer | Mechanism | Strength |
|-------|-----------|----------|
| **Cryptographic** | Ed25519 signatures, SHAKE-256 witness chains, TEE attestation (SGX/SEV-SNP) | Very High |
| **WASM Sandbox** | Kernel pack verification (Ed25519 + SHA256 allowlist), epoch interruption, memory layout validation | High |
| **MCP Coherence Gate** | 3-tier Permit/Defer/Deny with witness receipts, hash-chain integrity | High |
| **Edge-Net** | PiKey Ed25519 identity, challenge-response, per-IP rate limiting, adaptive attack detection | High |
| **Storage** | Path traversal prevention, feature-gated backends | Medium |
| **Server API** | Serde validation, trace logging | Low |
### Known Weaknesses (Pre-Integration)
| ID | Weakness | DREAD Score | Severity |
|----|----------|-------------|----------|
| SEC-W1 | Fully permissive CORS (`allow_origin(Any)`) | 7.8 | High |
| SEC-W2 | No REST API authentication | 9.2 | Critical |
| SEC-W3 | Unbounded search parameters (`k` unlimited) | 6.4 | Medium |
| SEC-W4 | 90 `unsafe` blocks in SIMD/arena/quantization | 5.2 | Medium |
| SEC-W5 | `insecure_*` constructors without `#[cfg]` gating | 4.8 | Medium |
| SEC-W6 | Hardcoded default backup password in edge-net | 6.1 | Medium |
| SEC-W7 | Unvalidated collection names | 5.5 | Medium |
### New Attack Surface from Solver Integration
| Surface | Description | Risk |
|---------|-------------|------|
| AS-1 | New deserialization points (problem definitions, solver state) | High |
| AS-2 | WASM sandbox boundary (solver WASM modules) | High |
| AS-3 | MCP tool registration (40+ solver tools callable by AI agents) | High |
| AS-4 | Computational cost amplification (expensive solve operations) | High |
| AS-5 | Session management state (solver sessions) | Medium |
| AS-6 | Cross-tool information flow (solver ↔ coherence gate) | Medium |
---
## Decision
### 1. WASM Sandbox Integration
Solver WASM modules are treated as kernel packs within the existing security framework:
```rust
pub struct SolverKernelConfig {
/// Ed25519 public key for solver WASM verification
pub signing_key: ed25519_dalek::VerifyingKey,
/// SHA256 hashes of approved solver WASM binaries
pub allowed_hashes: HashSet<[u8; 32]>,
/// Memory limits proportional to problem size
pub max_memory_pages: u32, // Absolute ceiling: 2048 (128MB)
/// Epoch budget: proportional to expected O(n^alpha) runtime
pub epoch_budget_fn: Box<dyn Fn(usize) -> u64>, // f(n) → ticks
/// Stack size limit (prevent deep recursion)
pub max_stack_bytes: usize, // Default: 1MB
}
impl SolverKernelConfig {
pub fn default_server() -> Self {
Self {
max_memory_pages: 2048, // 128MB
max_stack_bytes: 1 << 20, // 1MB
epoch_budget_fn: Box::new(|n| {
// O(n * log(n)) ticks with 10x safety margin
(n as u64) * ((n as f64).log2() as u64 + 1) * 10
}),
..Default::default()
}
}
pub fn default_browser() -> Self {
Self {
max_memory_pages: 128, // 8MB
max_stack_bytes: 256_000, // 256KB
epoch_budget_fn: Box::new(|n| {
(n as u64) * ((n as f64).log2() as u64 + 1) * 5
}),
..Default::default()
}
}
}
```
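The epoch budget closures are worth sanity-checking numerically, since the `log2` cast truncates toward zero. A standalone version of the server formula (helper name ours, not from the crate):

```rust
/// Server epoch budget from SolverKernelConfig::default_server():
/// n * (floor(log2 n) + 1) * 10 ticks, i.e. O(n log n) with a 10x margin.
fn server_epoch_budget(n: usize) -> u64 {
    (n as u64) * ((n as f64).log2() as u64 + 1) * 10
}

fn main() {
    // n = 1024: floor(log2) = 10, so 1024 * 11 * 10 = 112_640 ticks.
    assert_eq!(server_epoch_budget(1024), 112_640);
    // Degenerate input still gets a nonzero budget.
    assert_eq!(server_epoch_budget(1), 10);
}
```

The browser variant is identical with a 5x margin; keeping both as pure functions makes the budgets easy to unit-test against the WASM epoch interruption layer.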
### 2. Input Validation at All Boundaries
```rust
/// Comprehensive input validation for solver API inputs
pub fn validate_solver_input(input: &SolverInput) -> Result<(), ValidationError> {
// === Size bounds ===
const MAX_NODES: usize = 10_000_000;
const MAX_EDGES: usize = 100_000_000;
const MAX_DIM: usize = 65_536;
const MAX_ITERATIONS: u64 = 1_000_000;
const MAX_TIMEOUT_MS: u64 = 300_000;
const MAX_MATRIX_ELEMENTS: usize = 1_000_000_000;
if input.node_count > MAX_NODES {
return Err(ValidationError::TooLarge {
field: "node_count", max: MAX_NODES, actual: input.node_count,
});
}
if input.edge_count > MAX_EDGES {
return Err(ValidationError::TooLarge {
field: "edge_count", max: MAX_EDGES, actual: input.edge_count,
});
}
// === Numeric sanity ===
for (i, weight) in input.edge_weights.iter().enumerate() {
if !weight.is_finite() {
return Err(ValidationError::InvalidNumber {
field: "edge_weights", index: i, reason: "non-finite value",
});
}
}
// === Structural consistency ===
let max_edges = if input.directed {
input.node_count.saturating_mul(input.node_count.saturating_sub(1))
} else {
input.node_count.saturating_mul(input.node_count.saturating_sub(1)) / 2
};
if input.edge_count > max_edges {
return Err(ValidationError::InconsistentGraph {
reason: "more edges than possible for given node count",
});
}
// === Parameter ranges ===
if input.tolerance <= 0.0 || input.tolerance > 1.0 {
return Err(ValidationError::OutOfRange {
field: "tolerance", min: 0.0, max: 1.0, actual: input.tolerance,
});
}
if input.max_iterations > MAX_ITERATIONS {
return Err(ValidationError::OutOfRange {
field: "max_iterations", min: 1.0, max: MAX_ITERATIONS as f64,
actual: input.max_iterations as f64,
});
}
// === Dimension bounds ===
if input.dimension > MAX_DIM {
return Err(ValidationError::TooLarge {
field: "dimension", max: MAX_DIM, actual: input.dimension,
});
}
// === Vector value checks ===
if let Some(ref values) = input.values {
if values.len() != input.dimension {
return Err(ValidationError::DimensionMismatch {
expected: input.dimension, actual: values.len(),
});
}
for (i, v) in values.iter().enumerate() {
if !v.is_finite() {
return Err(ValidationError::InvalidNumber {
field: "values", index: i, reason: "non-finite value",
});
}
}
}
Ok(())
}
```
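The structural-consistency step above is easy to get wrong around integer overflow; extracted as a standalone helper (name ours) it behaves like this:

```rust
/// Maximum simple-graph edge count used by the structural-consistency
/// check. saturating_mul guards against overflow on adversarial node
/// counts, which matters precisely because this runs on untrusted input.
fn max_possible_edges(node_count: usize, directed: bool) -> usize {
    let pairs = node_count.saturating_mul(node_count.saturating_sub(1));
    if directed { pairs } else { pairs / 2 }
}

fn main() {
    assert_eq!(max_possible_edges(4, true), 12);  // directed: n*(n-1)
    assert_eq!(max_possible_edges(4, false), 6);  // undirected: n*(n-1)/2
    // Pathological input saturates instead of wrapping:
    assert_eq!(max_possible_edges(usize::MAX, true), usize::MAX);
}
```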
### 3. MCP Tool Access Control
```rust
/// Solver MCP tools require PermitToken from coherence gate
pub struct SolverMcpHandler {
solver: Arc<dyn SolverEngine>,
gate: Arc<CoherenceGate>,
rate_limiter: RateLimiter,
budget_enforcer: BudgetEnforcer,
}
impl SolverMcpHandler {
pub async fn handle_tool_call(
&self, call: McpToolCall
) -> Result<McpToolResult, McpError> {
// 1. Rate limiting
let agent_id = call.agent_id.as_deref().unwrap_or("anonymous");
self.rate_limiter.check(agent_id)?;
// 2. PermitToken verification
let token = call.arguments.get("permit_token")
.ok_or(McpError::Unauthorized("missing permit_token"))?;
self.gate.verify_token(token).await
.map_err(|_| McpError::Unauthorized("invalid permit_token"))?;
// 3. Input validation
let input: SolverInput = serde_json::from_value(call.arguments.clone())
.map_err(|e| McpError::InvalidRequest(e.to_string()))?;
validate_solver_input(&input)?;
// 4. Resource budget check
let estimate = self.solver.estimate_complexity(&input);
self.budget_enforcer.check(agent_id, &estimate)?;
// 5. Execute with resource limits
let result = self.solver.solve_with_budget(&input, estimate.budget).await?;
// 6. Generate witness receipt
let witness = WitnessEntry {
prev_hash: self.gate.latest_hash(),
action_hash: shake256_256(&bincode::serde::encode_to_vec(&result, bincode::config::standard())?),
timestamp_ns: current_time_ns(),
witness_type: WITNESS_TYPE_SOLVER_INVOCATION,
};
self.gate.append_witness(witness);
Ok(McpToolResult::from(result))
}
}
/// Per-agent rate limiter
pub struct RateLimiter {
windows: DashMap<String, (Instant, u32)>,
config: RateLimitConfig,
}
pub struct RateLimitConfig {
pub solve_per_minute: u32, // Default: 10
pub status_per_minute: u32, // Default: 60
pub session_per_minute: u32, // Default: 30
pub burst_multiplier: u32, // Default: 3
}
impl RateLimiter {
pub fn check(&self, agent_id: &str) -> Result<(), McpError> {
let mut entry = self.windows.entry(agent_id.to_string())
.or_insert((Instant::now(), 0));
if entry.0.elapsed() > Duration::from_secs(60) {
*entry = (Instant::now(), 0);
}
entry.1 += 1;
if entry.1 > self.config.solve_per_minute {
return Err(McpError::RateLimited {
agent_id: agent_id.to_string(),
retry_after_secs: 60u64.saturating_sub(entry.0.elapsed().as_secs()),
});
}
Ok(())
}
}
```
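The same fixed-window scheme can be exercised without `DashMap` or the MCP types; a single-threaded sketch using `std::collections::HashMap` (names ours):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Single-threaded sketch of the fixed-window limiter above.
/// Err carries retry-after seconds, as in the MCP error variant.
struct FixedWindowLimiter {
    windows: HashMap<String, (Instant, u32)>,
    per_minute: u32,
}

impl FixedWindowLimiter {
    fn check(&mut self, agent_id: &str) -> Result<(), u64> {
        let entry = self
            .windows
            .entry(agent_id.to_string())
            .or_insert((Instant::now(), 0));
        if entry.0.elapsed() > Duration::from_secs(60) {
            *entry = (Instant::now(), 0); // start a new window
        }
        entry.1 += 1;
        if entry.1 > self.per_minute {
            // saturating_sub avoids underflow right at the window edge
            return Err(60u64.saturating_sub(entry.0.elapsed().as_secs()));
        }
        Ok(())
    }
}

fn main() {
    let mut limiter = FixedWindowLimiter { windows: HashMap::new(), per_minute: 2 };
    assert!(limiter.check("agent-a").is_ok());
    assert!(limiter.check("agent-a").is_ok());
    assert!(limiter.check("agent-a").is_err()); // third call in window denied
    assert!(limiter.check("agent-b").is_ok());  // windows are per-agent
}
```

A fixed window admits up to 2x the limit across a window boundary; the `burst_multiplier` field in `RateLimitConfig` presumably accounts for this, though the excerpt does not show its use.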
### 4. Serialization Safety
```rust
/// Safe deserialization with size limits
pub fn deserialize_solver_input(bytes: &[u8]) -> Result<SolverInput, SolverError> {
// Body size limit: 10MB
const MAX_BODY_SIZE: usize = 10 * 1024 * 1024;
if bytes.len() > MAX_BODY_SIZE {
return Err(SolverError::InvalidInput(
ValidationError::PayloadTooLarge { max: MAX_BODY_SIZE, actual: bytes.len() }
));
}
// Deserialize with serde_json (safe, bounded by input size)
let input: SolverInput = serde_json::from_slice(bytes)
.map_err(|e| SolverError::InvalidInput(ValidationError::ParseError(e.to_string())))?;
// Application-level validation
validate_solver_input(&input)?;
Ok(input)
}
/// Bincode deserialization with size limit
pub fn deserialize_bincode<T: serde::de::DeserializeOwned>(bytes: &[u8]) -> Result<T, SolverError> {
let config = bincode::config::standard()
.with_limit::<{ 10 * 1024 * 1024 }>(); // 10MB max
bincode::serde::decode_from_slice(bytes, config)
.map(|(val, _)| val)
.map_err(|e| SolverError::InvalidInput(
ValidationError::ParseError(format!("bincode: {}", e))
))
}
```
### 5. Audit Trail
```rust
/// Solver invocations generate witness entries
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SolverAuditEntry {
pub request_id: Uuid,
pub agent_id: String,
pub algorithm: Algorithm,
pub input_hash: [u8; 32], // SHAKE-256 of input
pub output_hash: [u8; 32], // SHAKE-256 of output
pub iterations: usize,
pub wall_time_us: u64,
pub converged: bool,
pub residual: f64,
pub timestamp_ns: u128,
}
impl SolverAuditEntry {
pub fn to_witness(&self) -> WitnessEntry {
WitnessEntry {
prev_hash: [0u8; 32], // Set by chain
action_hash: shake256_256(&bincode::serde::encode_to_vec(self, bincode::config::standard()).unwrap()),
timestamp_ns: self.timestamp_ns,
witness_type: WITNESS_TYPE_SOLVER_INVOCATION,
}
}
}
```
### 6. Supply Chain Security
```toml
# .cargo/deny.toml
[advisories]
vulnerability = "deny"
unmaintained = "warn"
[licenses]
allow = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"]
deny = ["GPL-2.0", "GPL-3.0", "AGPL-3.0"]
[bans]
deny = [
{ name = "openssl-sys" }, # Prefer rustls
]
```
CI pipeline additions:
```yaml
# .github/workflows/security.yml
- name: Cargo audit
run: cargo audit
- name: Cargo deny
run: cargo deny check
- name: npm audit
run: npm audit --audit-level=high
```
---
## STRIDE Threat Analysis
| Threat | Category | Risk | Mitigation |
|--------|----------|------|------------|
| Malicious problem submission via API | Tampering | High | Input validation (Section 2), body size limits |
| WASM resource limits bypass via crafted input | Elevation | High | Kernel pack framework (Section 1), epoch limits |
| Receipt enumeration via sequential IDs | Info Disc. | Medium | Rate limiting (Section 3), auth requirement |
| Solver flooding with expensive problems | DoS | High | Rate limiting, compute budgets, concurrent solve semaphore |
| Replay of valid permit token | Spoofing | Medium | Token TTL, nonce, single-use enforcement |
| Solver calls without audit trail | Repudiation | Medium | Mandatory witness entries (Section 5) |
| Modified solver WASM binary | Tampering | High | Ed25519 + SHA256 allowlist (Section 1) |
| Compromised dependency injection | Tampering | Medium | cargo-deny, cargo-audit, SBOM (Section 6) |
| NaN/Inf propagation in solver output | Integrity | Medium | Output validation, finite-check on results |
| Cross-tool MCP escalation | Elevation | Medium | Unidirectional flow enforcement |
---
## Security Testing Checklist
- [ ] All solver API endpoints reject payloads > 10MB
- [ ] `k` parameter bounded to MAX_K (10,000)
- [ ] Solver WASM modules signed and allowlisted
- [ ] WASM execution has problem-size-proportional epoch deadlines
- [ ] WASM memory limited to MAX_SOLVER_PAGES (2048)
- [ ] MCP solver tools require valid PermitToken
- [ ] Per-agent rate limiting enforced on all MCP tools
- [ ] Deserialization uses size limits (bincode `with_limit`)
- [ ] Session IDs are server-generated UUIDs
- [ ] Session count per client bounded (max: 10)
- [ ] CORS restricted to known origins
- [ ] Authentication required on mutating endpoints
- [ ] `unsafe` code reviewed for solver integration paths
- [ ] `cargo audit` and `npm audit` pass (no critical vulns)
- [ ] Fuzz testing targets for all deserialization entry points
- [ ] Solver results include tolerance bounds
- [ ] Cross-tool MCP calls prevented
- [ ] Witness chain entries created for solver invocations
- [ ] Input NaN/Inf rejected before reaching solver
- [ ] Output NaN/Inf detected and error returned
---
## Consequences
### Positive
1. **Defense-in-depth**: Solver integrates into existing security layers, not bypassing them
2. **Auditable**: All solver invocations have cryptographic witness receipts
3. **Resource-bounded**: Compute budgets prevent cost amplification attacks
4. **Supply chain secured**: Automated auditing in CI pipeline
5. **Platform-safe**: WASM sandbox enforces memory and CPU limits
### Negative
1. **PermitToken overhead**: Gate verification adds ~100μs per solver call
2. **Rate limiting friction**: Legitimate high-throughput use cases may hit limits
3. **Audit storage**: Witness entries add ~200 bytes per solver invocation
---
## Implementation Status
- Input validation module (`validation.rs`) checks CSR structural invariants, index bounds, and NaN/Inf
- Budget enforcement prevents resource exhaustion
- Audit trail logs all solver invocations
- No `unsafe` code in the public API surface (`unsafe` confined to internal `spmv_unchecked` and SIMD kernels)
- All assertions verified in 177 tests
---
## References
- [09-security-analysis.md](../09-security-analysis.md) — Full security analysis
- [07-mcp-integration.md](../07-mcp-integration.md) — MCP tool access patterns
- [06-wasm-integration.md](../06-wasm-integration.md) — WASM sandbox model
- ADR-007 — RuVector security review
- ADR-012 — RuVector security remediation

@@ -0,0 +1,503 @@
# ADR-STS-006: Benchmark Framework and Performance Validation
**Status**: Accepted
**Date**: 2026-02-20
**Authors**: RuVector Performance Team
**Deciders**: Architecture Review Board
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-20 | RuVector Team | Initial proposal |
| 1.0 | 2026-02-20 | RuVector Team | Accepted: full implementation complete |
---
## Context
### Existing Benchmark Infrastructure
RuVector maintains 90+ benchmark files using Criterion.rs 0.5 with HTML reports. The release profile enables aggressive optimization (`lto = "fat"`, `codegen-units = 1`, `opt-level = 3`), and the bench profile inherits release with debug symbols for profiling.
### Published Performance Baselines
| Metric | Value | Platform | Source |
|--------|-------|----------|--------|
| Euclidean 128D | 14.9 ns | M4 Pro NEON | BENCHMARK_RESULTS.md |
| Dot Product 128D | 12.0 ns | M4 Pro NEON | BENCHMARK_RESULTS.md |
| HNSW k=10, 10K vectors | 25.2 μs | M4 Pro | BENCHMARK_RESULTS.md |
| Batch 1K×384D | 278 μs | Linux AVX2 | BENCHMARK_RESULTS.md |
| Binary hamming 384D | 0.9 ns | M4 Pro | BENCHMARK_RESULTS.md |
### Validation Requirements
The sublinear-time solver claims 10-600x speedups. These must be validated with:
- Statistical significance (Criterion p < 0.05)
- Crossover point identification (where sublinear beats traditional)
- Accuracy-performance tradeoff quantification
- Multi-platform consistency verification
- Regression detection in CI
---
## Decision
### 1. Six New Benchmark Suites
#### Suite 1: `benches/solver_baseline.rs`
Establishes baselines for operations the solver replaces:
```rust
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId, Throughput};
fn dense_matmul_baseline(c: &mut Criterion) {
let mut group = c.benchmark_group("dense_matmul_baseline");
for size in [64, 256, 1024, 4096] {
let a = random_dense_matrix(size, size, 42);
let x = random_vector(size, 43);
let mut y = vec![0.0f32; size];
group.throughput(Throughput::Elements((size * size) as u64));
group.bench_with_input(
BenchmarkId::new("naive", size),
&size,
|b, _| b.iter(|| dense_matvec_naive(&a, &x, &mut y)),
);
group.bench_with_input(
BenchmarkId::new("simd_unrolled", size),
&size,
|b, _| b.iter(|| dense_matvec_simd(&a, &x, &mut y)),
);
}
group.finish();
}
fn sparse_matmul_baseline(c: &mut Criterion) {
let mut group = c.benchmark_group("sparse_matmul_baseline");
for (n, density) in [(1000, 0.01), (1000, 0.05), (10000, 0.01), (10000, 0.05)] {
let csr = random_csr_matrix(n, n, density, 44);
let x = random_vector(n, 45);
let mut y = vec![0.0f32; n];
group.throughput(Throughput::Elements(csr.nnz() as u64));
group.bench_with_input(
BenchmarkId::new(format!("csr_{}x{}_{:.0}pct", n, n, density * 100.0), n),
&n,
|b, _| b.iter(|| csr.spmv(&x, &mut y)),
);
}
group.finish();
}
criterion_group!(baselines, dense_matmul_baseline, sparse_matmul_baseline);
criterion_main!(baselines);
```
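Helpers such as `dense_matvec_naive` are assumed by the suite but not shown; a minimal sketch (row-major layout is our assumption, matching the `Throughput::Elements(size * size)` accounting above):

```rust
/// Naive row-major dense mat-vec: y = A * x, the baseline the benches time.
/// Sketch of the undefined `dense_matvec_naive` helper; not from the crate.
fn dense_matvec_naive(a: &[f32], x: &[f32], y: &mut [f32]) {
    let n = x.len();
    assert_eq!(a.len(), n * y.len(), "A must be rows x cols = y.len() x x.len()");
    for (i, yi) in y.iter_mut().enumerate() {
        // Dot product of row i with x
        *yi = a[i * n..(i + 1) * n]
            .iter()
            .zip(x)
            .map(|(&aij, &xj)| aij * xj)
            .sum();
    }
}

fn main() {
    // 2x2 identity times [3, 5] = [3, 5]
    let a = [1.0f32, 0.0, 0.0, 1.0];
    let x = [3.0f32, 5.0];
    let mut y = [0.0f32; 2];
    dense_matvec_naive(&a, &x, &mut y);
    assert_eq!(y, [3.0, 5.0]);
}
```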
#### Suite 2: `benches/solver_neumann.rs`
```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use std::time::Duration;
fn neumann_convergence(c: &mut Criterion) {
let mut group = c.benchmark_group("neumann_convergence");
group.warm_up_time(Duration::from_secs(5));
group.sample_size(200);
let csr = random_diag_dominant_csr(10000, 0.01, 46);
let b = random_vector(10000, 47);
for eps in [1e-2, 1e-4, 1e-6, 1e-8] {
group.bench_with_input(
BenchmarkId::new("eps", format!("{:.0e}", eps)),
&eps,
|bench, &eps| {
bench.iter(|| {
let solver = NeumannSolver::new(eps, 1000);
solver.solve(&csr, &b)
})
},
);
}
group.finish();
}
fn neumann_sparsity_impact(c: &mut Criterion) {
let mut group = c.benchmark_group("neumann_sparsity_impact");
let n = 10000;
for density in [0.001, 0.01, 0.05, 0.10, 0.50] {
let csr = random_diag_dominant_csr(n, density, 48);
let b = random_vector(n, 49);
group.throughput(Throughput::Elements(csr.nnz() as u64));
group.bench_with_input(
BenchmarkId::new("density", format!("{:.1}pct", density * 100.0)),
&density,
|bench, _| {
bench.iter(|| {
NeumannSolver::new(1e-4, 1000).solve(&csr, &b)
})
},
);
}
group.finish();
}
fn neumann_vs_direct(c: &mut Criterion) {
let mut group = c.benchmark_group("neumann_vs_direct");
for n in [100, 500, 1000, 5000, 10000] {
let csr = random_diag_dominant_csr(n, 0.01, 50);
let b = random_vector(n, 51);
let dense = csr.to_dense();
group.bench_with_input(
BenchmarkId::new("neumann", n), &n,
|bench, _| bench.iter(|| NeumannSolver::new(1e-6, 1000).solve(&csr, &b)),
);
group.bench_with_input(
BenchmarkId::new("dense_direct", n), &n,
|bench, _| bench.iter(|| dense_solve(&dense, &b)),
);
}
group.finish();
}
criterion_group!(neumann, neumann_convergence, neumann_sparsity_impact, neumann_vs_direct);
criterion_main!(neumann);
```
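The `NeumannSolver` benchmarked above truncates the Neumann series for a diagonally dominant system A = D − R, which amounts to iterating x_{k+1} = D⁻¹(b + R x_k) until the update falls below `eps`. A dense std-only sketch of that iteration (the real solver operates on sparse CSR input; this is illustrative only):

```rust
/// Truncated Neumann-series solve for diagonally dominant A = D - R:
///   x_{k+1} = D^{-1} (b + R x_k)
/// iterated until the step size drops below `eps` or `max_iters` is hit.
/// Dense and illustrative only; the real solver works on sparse input.
fn neumann_solve(a: &[Vec<f64>], b: &[f64], eps: f64, max_iters: usize) -> Vec<f64> {
    let n = b.len();
    let mut x = vec![0.0; n];
    for _ in 0..max_iters {
        let mut next = vec![0.0; n];
        for i in 0..n {
            // Off-diagonal contribution (the R x_k term).
            let off_diag: f64 = (0..n)
                .filter(|&j| j != i)
                .map(|j| a[i][j] * x[j])
                .sum();
            next[i] = (b[i] - off_diag) / a[i][i];
        }
        let step: f64 = next.iter().zip(&x).map(|(p, q)| (p - q).abs()).sum();
        x = next;
        if step < eps {
            break;
        }
    }
    x
}

fn main() {
    // Diagonally dominant system: 4x + y = 9, x + 3y = 5  =>  x = 2, y = 1.
    let a = vec![vec![4.0, 1.0], vec![1.0, 3.0]];
    let b = [9.0, 5.0];
    let x = neumann_solve(&a, &b, 1e-12, 1_000);
    assert!((x[0] - 2.0).abs() < 1e-6 && (x[1] - 1.0).abs() < 1e-6);
    println!("x = {x:?}");
}
```

The `neumann_convergence` group above sweeps `eps`, which controls exactly this stopping condition; tighter tolerances cost more iterations, which is what the benchmark measures.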
#### Suite 3: `benches/solver_push.rs`
```rust
fn forward_push_scaling(c: &mut Criterion) {
let mut group = c.benchmark_group("forward_push_scaling");
for n in [100, 1000, 10000, 100000] {
let graph = random_sparse_graph(n, 0.005, 52);
for eps in [1e-2, 1e-4, 1e-6] {
group.bench_with_input(
BenchmarkId::new(format!("n{}_eps{:.0e}", n, eps), n),
&(n, eps),
|bench, &(_, eps)| {
bench.iter(|| {
let solver = ForwardPushSolver::new(0.85, eps);
solver.ppr_from_source(&graph, 0)
})
},
);
}
}
group.finish();
}
fn backward_push_vs_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("push_direction_comparison");
let n = 10000;
let graph = random_sparse_graph(n, 0.005, 53);
for eps in [1e-2, 1e-4] {
group.bench_with_input(
BenchmarkId::new("forward", format!("{:.0e}", eps)), &eps,
|bench, &eps| bench.iter(|| ForwardPushSolver::new(0.85, eps).ppr_from_source(&graph, 0)),
);
group.bench_with_input(
BenchmarkId::new("backward", format!("{:.0e}", eps)), &eps,
|bench, &eps| bench.iter(|| BackwardPushSolver::new(0.85, eps).ppr_to_target(&graph, 0)),
);
}
group.finish();
}
```
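The `ForwardPushSolver` above implements local push in the Andersen-Chung-Lang style: residual mass at a node is pushed to its neighbors until every residual falls below `eps * deg`, so work scales with mass pushed rather than graph size. A std-only sketch on an adjacency list (function and parameter names are illustrative):

```rust
/// Forward push for personalized PageRank (Andersen-Chung-Lang style).
/// Pushes residual mass until every residual r[u] < eps * deg(u); total
/// work is proportional to the mass pushed, independent of graph size.
fn forward_push(adj: &[Vec<usize>], source: usize, alpha: f64, eps: f64) -> Vec<f64> {
    let n = adj.len();
    let (mut p, mut r) = (vec![0.0; n], vec![0.0; n]);
    r[source] = 1.0;
    let mut queue = vec![source];
    while let Some(u) = queue.pop() {
        let deg = adj[u].len().max(1) as f64;
        if r[u] < eps * deg {
            continue; // stale queue entry, already below threshold
        }
        let mass = r[u];
        r[u] = 0.0;
        p[u] += alpha * mass; // retain the alpha fraction locally
        let share = (1.0 - alpha) * mass / deg;
        for &v in &adj[u] {
            r[v] += share; // push the rest to neighbors
            if r[v] >= eps * adj[v].len().max(1) as f64 {
                queue.push(v);
            }
        }
    }
    p
}

fn main() {
    // 3-node directed cycle: 0 -> 1 -> 2 -> 0.
    let adj = vec![vec![1], vec![2], vec![0]];
    let p = forward_push(&adj, 0, 0.85, 1e-8);
    let total: f64 = p.iter().sum();
    // Mass decays geometrically with distance from the source,
    // and the PPR vector sums to at most 1 (residual left unpushed).
    assert!(p[0] > p[1] && p[1] > p[2]);
    assert!(total > 0.99 && total <= 1.0 + 1e-9);
    println!("ppr = {p:?}");
}
```

This locality is why `forward_push_scaling` can sweep `n` up to 100,000 without runtime growing proportionally: only nodes near the source ever enter the queue.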
#### Suite 4: `benches/solver_random_walk.rs`
```rust
fn random_walk_entry_estimation(c: &mut Criterion) {
let mut group = c.benchmark_group("random_walk_estimation");
for n in [1000, 10000, 100000] {
let csr = random_laplacian_csr(n, 0.005, 54);
group.bench_with_input(
BenchmarkId::new("single_entry", n), &n,
|bench, _| bench.iter(|| {
HybridRandomWalkSolver::new(1e-4, 1000).estimate_entry(&csr, 0, n/2)
}),
);
group.bench_with_input(
BenchmarkId::new("batch_100_entries", n), &n,
|bench, _| bench.iter(|| {
let pairs: Vec<(usize, usize)> = (0..100).map(|i| (i, n - 1 - i)).collect();
HybridRandomWalkSolver::new(1e-4, 1000).estimate_batch(&csr, &pairs)
}),
);
}
group.finish();
}
```
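The `estimate_entry` calls above rely on a standard Monte Carlo identity: (I − P)⁻¹[s][t] equals the expected number of visits to `t` on an absorbing random walk started at `s` with sub-stochastic transition P. A hedged std-only sketch of that estimator (the `HybridRandomWalkSolver` internals are not shown in this document; a tiny LCG stands in for a real RNG to keep the sketch dependency-free):

```rust
/// Estimate (I - P)^{-1}[s][t] as the average visit count of t over random
/// walks from s, where each step continues with probability `p_step` and
/// moves to a uniformly random neighbor. Illustrative only.
fn estimate_entry(
    adj: &[Vec<usize>],
    s: usize,
    t: usize,
    p_step: f64,
    walks: u32,
    seed: u64,
) -> f64 {
    // Minimal LCG so the sketch stays std-only; not cryptographic.
    let mut state = seed;
    let mut rand01 = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 11) as f64 / (1u64 << 53) as f64
    };
    let mut visits = 0u64;
    for _ in 0..walks {
        let mut u = s;
        loop {
            if u == t {
                visits += 1;
            }
            if rand01() >= p_step || adj[u].is_empty() {
                break; // walk absorbed
            }
            let d = adj[u].len();
            u = adj[u][((rand01() * d as f64) as usize).min(d - 1)];
        }
    }
    visits as f64 / walks as f64
}

fn main() {
    // Single node with a self-loop and continue probability 0.5:
    // P = [0.5], so (I - P)^{-1}[0][0] = 1 / (1 - 0.5) = 2.0 exactly.
    let adj = vec![vec![0]];
    let est = estimate_entry(&adj, 0, 0, 0.5, 200_000, 42);
    assert!((est - 2.0).abs() < 0.05, "estimate {est} too far from 2.0");
    println!("estimated entry = {est:.3}");
}
```

The `batch_100_entries` case in the suite amortizes walk generation across many (s, t) pairs, which is where the hybrid solver's advantage over per-entry estimation shows up.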
#### Suite 5: `benches/solver_scheduler.rs`
```rust
fn scheduler_latency(c: &mut Criterion) {
let mut group = c.benchmark_group("scheduler_latency");
group.bench_function("noop_task", |b| {
let scheduler = SolverScheduler::new(4);
b.iter(|| scheduler.submit(|| {}))
});
group.bench_function("100ns_task", |b| {
let scheduler = SolverScheduler::new(4);
b.iter(|| scheduler.submit(|| {
            for _ in 0..10 { std::hint::spin_loop(); } // ~100ns of busy work
}))
});
group.bench_function("1us_task", |b| {
let scheduler = SolverScheduler::new(4);
b.iter(|| scheduler.submit(|| {
for _ in 0..100 { std::hint::spin_loop(); }
}))
});
group.finish();
}
fn scheduler_throughput(c: &mut Criterion) {
let mut group = c.benchmark_group("scheduler_throughput");
for task_count in [1000, 10_000, 100_000, 1_000_000] {
group.throughput(Throughput::Elements(task_count));
group.bench_with_input(
BenchmarkId::new("tasks", task_count), &task_count,
|bench, &count| {
let scheduler = SolverScheduler::new(4);
let counter = Arc::new(AtomicU64::new(0));
bench.iter(|| {
counter.store(0, Ordering::Relaxed);
for _ in 0..count {
let c = counter.clone();
scheduler.submit(move || { c.fetch_add(1, Ordering::Relaxed); });
}
scheduler.flush();
assert_eq!(counter.load(Ordering::Relaxed), count);
})
},
);
}
group.finish();
}
```
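The benchmarks above assume a `SolverScheduler` with a fire-and-forget `submit` and a blocking `flush` that returns once every submitted task has run. A minimal std-only sketch with that contract, built on a shared queue and two condition variables (the real scheduler is presumably work-stealing; this is illustrative only):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

type Task = Box<dyn FnOnce() + Send>;

struct State {
    queue: VecDeque<Task>,
    in_flight: usize, // queued + currently running tasks
    shutdown: bool,
}

struct Inner {
    state: Mutex<State>,
    work_cv: Condvar, // wakes idle workers
    idle_cv: Condvar, // wakes flush() when in_flight reaches zero
}

pub struct SolverScheduler {
    inner: Arc<Inner>,
    handles: Vec<thread::JoinHandle<()>>,
}

impl SolverScheduler {
    pub fn new(workers: usize) -> Self {
        let inner = Arc::new(Inner {
            state: Mutex::new(State { queue: VecDeque::new(), in_flight: 0, shutdown: false }),
            work_cv: Condvar::new(),
            idle_cv: Condvar::new(),
        });
        let handles = (0..workers)
            .map(|_| {
                let inner = Arc::clone(&inner);
                thread::spawn(move || loop {
                    // Block until a task is available or shutdown is requested.
                    let task = {
                        let mut st = inner.state.lock().unwrap();
                        loop {
                            if let Some(t) = st.queue.pop_front() { break t; }
                            if st.shutdown { return; }
                            st = inner.work_cv.wait(st).unwrap();
                        }
                    };
                    task(); // run outside the lock
                    let mut st = inner.state.lock().unwrap();
                    st.in_flight -= 1;
                    if st.in_flight == 0 { inner.idle_cv.notify_all(); }
                })
            })
            .collect();
        Self { inner, handles }
    }

    pub fn submit(&self, f: impl FnOnce() + Send + 'static) {
        let mut st = self.inner.state.lock().unwrap();
        st.queue.push_back(Box::new(f));
        st.in_flight += 1;
        self.inner.work_cv.notify_one();
    }

    pub fn flush(&self) {
        let mut st = self.inner.state.lock().unwrap();
        while st.in_flight > 0 {
            st = self.inner.idle_cv.wait(st).unwrap();
        }
    }
}

impl Drop for SolverScheduler {
    fn drop(&mut self) {
        self.inner.state.lock().unwrap().shutdown = true;
        self.inner.work_cv.notify_all();
        for h in self.handles.drain(..) { let _ = h.join(); }
    }
}

fn main() {
    use std::sync::atomic::{AtomicU64, Ordering};
    let scheduler = SolverScheduler::new(4);
    let counter = Arc::new(AtomicU64::new(0));
    for _ in 0..10_000u64 {
        let c = Arc::clone(&counter);
        scheduler.submit(move || { c.fetch_add(1, Ordering::Relaxed); });
    }
    scheduler.flush();
    assert_eq!(counter.load(Ordering::Relaxed), 10_000);
    println!("all tasks completed");
}
```

The `scheduler_throughput` benchmark above is essentially this `main` body under Criterion, with `flush` inside the timed region so queue drain time is included in the measurement.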
#### Suite 6: `benches/solver_e2e.rs`
```rust
fn accelerated_search(c: &mut Criterion) {
let mut group = c.benchmark_group("accelerated_search");
group.sample_size(50);
group.warm_up_time(Duration::from_secs(5));
for n in [10_000, 100_000] {
let db = build_test_db(n, 384, 56);
let query = random_vector(384, 57);
group.bench_with_input(
BenchmarkId::new("hnsw_only", n), &n,
|bench, _| bench.iter(|| db.search(&query, 10)),
);
group.bench_with_input(
BenchmarkId::new("hnsw_plus_solver_rerank", n), &n,
|bench, _| bench.iter(|| {
let candidates = db.search(&query, 100); // Broad HNSW
solver_rerank(&db, &query, &candidates, 10) // Solver-accelerated reranking
}),
);
}
group.finish();
}
fn accelerated_batch_analytics(c: &mut Criterion) {
let mut group = c.benchmark_group("batch_analytics");
group.sample_size(10);
let n = 10_000;
let vectors = random_matrix(n, 384, 58);
group.bench_function("pairwise_brute_force", |b| {
b.iter(|| pairwise_distances_brute(&vectors))
});
group.bench_function("pairwise_solver_estimated", |b| {
b.iter(|| pairwise_distances_solver(&vectors, 1e-4))
});
group.finish();
}
```
### 2. Regression Prevention
Hard thresholds enforced in CI:
```rust
// In each benchmark suite, add regression markers
fn solver_regression_tests(c: &mut Criterion) {
let mut group = c.benchmark_group("solver_regression");
// These thresholds trigger CI failure if exceeded
group.bench_function("neumann_10k_1pct", |b| {
let csr = random_diag_dominant_csr(10000, 0.01, 60);
let rhs = random_vector(10000, 61);
b.iter(|| NeumannSolver::new(1e-4, 1000).solve(&csr, &rhs))
// Target: < 500μs
});
group.bench_function("forward_push_10k", |b| {
let graph = random_sparse_graph(10000, 0.005, 62);
b.iter(|| ForwardPushSolver::new(0.85, 1e-4).ppr_from_source(&graph, 0))
// Target: < 100μs
});
group.bench_function("cg_10k_1pct", |b| {
let csr = random_laplacian_csr(10000, 0.01, 63);
let rhs = random_vector(10000, 64);
b.iter(|| ConjugateGradientSolver::new(1e-6, 1000).solve(&csr, &rhs))
// Target: < 1ms
});
group.finish();
}
```
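Enforcing the target comments above comes down to comparing each benchmark's measured point estimate against a hard nanosecond limit and failing CI on any excess. A minimal sketch of that gate (the actual extraction of point estimates from Criterion's `estimates.json` is elided; names here are illustrative):

```rust
/// Hard latency gate: returns a failure message for every benchmark whose
/// measured point estimate exceeds its target. Thresholds mirror the
/// target comments in the regression suite.
fn failing_benchmarks(measured_ns: &[(&str, f64)]) -> Vec<String> {
    // (benchmark name, target in nanoseconds)
    let targets = [
        ("neumann_10k_1pct", 500_000.0), // < 500 us
        ("forward_push_10k", 100_000.0), // < 100 us
        ("cg_10k_1pct", 1_000_000.0),    // < 1 ms
    ];
    targets
        .iter()
        .filter_map(|&(name, limit)| {
            measured_ns
                .iter()
                .find(|&&(m, _)| m == name)
                .filter(|&&(_, ns)| ns > limit)
                .map(|&(_, ns)| format!("{name}: {ns:.0} ns > {limit:.0} ns"))
        })
        .collect()
}

fn main() {
    // Hypothetical measurements: one benchmark is over budget.
    let measured = [
        ("neumann_10k_1pct", 430_000.0),
        ("forward_push_10k", 150_000.0), // exceeds the 100 us target
        ("cg_10k_1pct", 900_000.0),
    ];
    let failures = failing_benchmarks(&measured);
    assert_eq!(failures.len(), 1);
    assert!(failures[0].starts_with("forward_push_10k"));
    println!("{failures:?}");
}
```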
### 3. Accuracy Validation Suite
Alongside latency benchmarks, accuracy must be tracked:
```rust
fn accuracy_validation() {
// Neumann vs exact solve
let csr = random_diag_dominant_csr(1000, 0.01, 70);
let b = random_vector(1000, 71);
let exact = dense_solve(&csr.to_dense(), &b);
for eps in [1e-2, 1e-4, 1e-6] {
let approx = NeumannSolver::new(eps, 1000).solve(&csr, &b).unwrap();
let relative_error = l2_distance(&exact, &approx.solution) / l2_norm(&exact);
assert!(relative_error < eps * 10.0, // 10x margin
"Neumann eps={}: relative error {} exceeds bound {}",
eps, relative_error, eps * 10.0);
}
// Forward Push recall@k
let graph = random_sparse_graph(10000, 0.005, 72);
let exact_ppr = exact_pagerank(&graph, 0, 0.85);
let top_k_exact: Vec<usize> = exact_ppr.top_k(100);
for eps in [1e-2, 1e-4] {
let approx_ppr = ForwardPushSolver::new(0.85, eps).ppr_from_source(&graph, 0);
let top_k_approx: Vec<usize> = approx_ppr.top_k(100);
let recall = set_overlap(&top_k_exact, &top_k_approx) as f64 / 100.0;
assert!(recall > 0.9, "Forward Push eps={}: recall@100 = {} < 0.9", eps, recall);
}
}
```
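The recall check above leans on two small helpers, `top_k` and `set_overlap`, whose assumed shape is worth pinning down: recall@k is the size of the intersection of the exact and approximate top-k index sets, divided by k. A std-only sketch under that assumption:

```rust
use std::collections::HashSet;

/// Indices of the k largest scores, descending (ties broken by index).
/// Assumed shape of the `top_k` helper used in the accuracy suite.
fn top_k(scores: &[f64], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap().then(a.cmp(&b)));
    idx.truncate(k);
    idx
}

/// |A intersect B|, so that recall@k = set_overlap(exact, approx) / k.
fn set_overlap(a: &[usize], b: &[usize]) -> usize {
    let set: HashSet<_> = a.iter().collect();
    b.iter().filter(|x| set.contains(x)).count()
}

fn main() {
    // Exact top-3 is {0, 2, 3}; the approximation swaps index 3 for 4.
    let exact = [0.9, 0.1, 0.8, 0.7, 0.2];
    let approx = [0.85, 0.15, 0.82, 0.1, 0.3];
    let (te, ta) = (top_k(&exact, 3), top_k(&approx, 3));
    let recall = set_overlap(&te, &ta) as f64 / 3.0;
    assert!((recall - 2.0 / 3.0).abs() < 1e-9);
    println!("recall@3 = {recall:.2}");
}
```

Note that recall@k is order-insensitive by design: the accuracy suite only asserts that the right nodes appear in the top 100, not that their ranks match exactly.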
### 4. CI Integration
```yaml
# .github/workflows/bench.yml
name: Benchmark Suite
on:
pull_request:
paths: ['crates/ruvector-solver/**']
schedule:
- cron: '0 2 * * *' # Nightly at 2 AM
jobs:
bench-pr:
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
steps:
- uses: actions/checkout@v4
      - run: cargo bench -p ruvector-solver -- solver_regression | tee bench-output.txt
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: 'cargo'
          output-file-path: bench-output.txt
bench-nightly:
runs-on: ubuntu-latest
if: github.event_name == 'schedule'
strategy:
matrix:
target: [x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu]
steps:
- uses: actions/checkout@v4
- run: cargo bench -p ruvector-solver --target ${{ matrix.target }}
- run: cargo bench -p ruvector-solver -- solver_accuracy
- uses: actions/upload-artifact@v4
with:
name: bench-results-${{ matrix.target }}
path: target/criterion/
```
### 5. Reporting Format
Following existing BENCHMARK_RESULTS.md conventions:
```markdown
## Solver Integration Benchmarks
### Environment
- **Date**: 2026-02-20
- **Platform**: Linux x86_64, AMD EPYC 7763 (AVX-512)
- **Rust**: 1.77, release profile (lto=fat, codegen-units=1)
- **Criterion**: 0.5, 200 samples, 5s warmup
### Results
| Operation | Baseline | Solver | Speedup | Accuracy |
|-----------|----------|--------|---------|----------|
| MatVec 10K×10K (1%) | 400 μs | 15 μs | 26.7x | ε < 1e-4 |
| PageRank 10K nodes | 50 ms | 80 μs | 625x | recall@100 > 0.95 |
| Spectral gap est. | N/A | 50 μs | New | within 5% of exact |
| Batch pairwise 10K | 480 s | 15 s | 32x | ε < 1e-3 |
```
---
## Consequences
### Positive
1. **Reproducible validation**: All speedup claims backed by Criterion benchmarks
2. **Regression prevention**: CI catches performance degradations before merge
3. **Multi-platform**: Benchmarks run on x86_64 and aarch64
4. **Accuracy tracking**: Approximate algorithms validated against exact baselines
5. **Aligned infrastructure**: Uses existing Criterion.rs setup, no new tools
### Negative
1. **Benchmark maintenance**: 6 new benchmark files to maintain
2. **CI time**: Nightly full suite adds ~30 minutes to CI
3. **Flaky thresholds**: Regression thresholds may need periodic recalibration
---
## Implementation Status
Complete Criterion benchmark suite delivered with 5 benchmark groups: solver_baseline (dense reference), solver_neumann (Neumann series profiling), solver_cg (conjugate gradient scaling), solver_push (push algorithm comparison), solver_e2e (end-to-end pipeline). Min-cut gating benchmark script (scripts/run_mincut_bench.sh) with 1k-sample grid search over lambda/tau parameters. Profiler crate (ruvector-profiler) provides memory, latency, power measurement with CSV output.
---
## References
- [08-performance-analysis.md](../08-performance-analysis.md) — Existing benchmarks and methodology
- [10-algorithm-analysis.md](../10-algorithm-analysis.md) — Algorithm complexity for threshold derivation
- [12-testing-strategy.md](../12-testing-strategy.md) — Testing strategy integration
# ADR-STS-007: Feature Flag Architecture and Progressive Rollout
## Status
**Accepted**
## Metadata
| Field | Value |
|-------------|------------------------------------------------|
| Version | 1.0 |
| Date | 2026-02-20 |
| Authors | RuVector Architecture Team |
| Deciders | Architecture Review Board |
| Supersedes | N/A |
| Related | ADR-STS-001 (Solver Integration), ADR-STS-003 (WASM Strategy) |
---
## Context
The RuVector workspace (v2.0.3, Rust 2021 edition, resolver v2) contains 100+ crates
spanning vector storage, graph databases, GNN layers, attention mechanisms, sparse
inference, and mathematics. Feature flags are already used extensively throughout the
codebase:
- **ruvector-core**: `default = ["simd", "storage", "hnsw", "api-embeddings", "parallel"]`
- **ruvector-graph**: `default = ["full"]` with `full`, `simd`, `storage`, `async-runtime`,
`compression`, `distributed`, `federation`, `wasm`
- **ruvector-math**: `default = ["std"]` with `simd`, `parallel`, `serde`
- **ruvector-gnn**: `default = ["simd", "mmap"]` with `wasm`, `napi`
- **ruvector-attention**: `default = ["simd"]` with `wasm`, `napi`, `math`, `sheaf`
The sublinear-time-solver (v0.1.3) introduces new algorithmic capabilities --- coherence
verification, spectral graph methods, GNN-accelerated search, and sublinear query
resolution --- that must be integrated without disrupting any of these existing feature
surfaces.
### Constraints
1. **Zero breaking changes** to the public API of any existing crate.
2. **Opt-in per subsystem**: each solver capability must be individually selectable.
3. **Gradual rollout**: phased introduction from experimental to default.
4. **Platform parity**: feature gates must account for native, WASM, and Node.js targets.
5. **CI tractability**: the feature matrix must remain testable without combinatorial
explosion.
6. **Dependency hygiene**: enabling a solver feature must not pull in nalgebra when only
ndarray is needed, and vice versa.
---
## Decision
We adopt a **hierarchical feature flag architecture** with four tiers: the solver crate
defines its own backend and acceleration flags, consuming crates expose subsystem-scoped
`sublinear-*` flags, the workspace root provides aggregate flags for convenience, and CI
tests a curated feature matrix rather than all 2^N combinations.
### 1. Solver Crate Feature Definitions
```toml
# crates/ruvector-solver/Cargo.toml
[package]
name = "ruvector-solver"
version = "0.1.0"
edition.workspace = true
rust-version.workspace = true
license.workspace = true
authors.workspace = true
repository.workspace = true
description = "Sublinear-time solver: coherence verification, spectral methods, GNN search"
[features]
default = []
# Linear algebra backends (mutually independent, both can be active)
nalgebra-backend = ["dep:nalgebra"]
ndarray-backend = ["dep:ndarray"]
# Acceleration
parallel = ["dep:rayon"]
simd = [] # Auto-detected at build time via cfg
gpu = ["ruvector-math/parallel"] # Future: GPU dispatch through ruvector-math
# Platform targets
wasm = [
"dep:wasm-bindgen",
"dep:serde_wasm_bindgen",
"dep:js-sys",
]
# Convenience aggregates
full = ["nalgebra-backend", "ndarray-backend", "parallel"]
[dependencies]
# Core (always present)
ruvector-math = { path = "../ruvector-math", default-features = false }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
tracing = { workspace = true }
rand = { workspace = true }
rand_distr = { workspace = true }
# Optional backends
nalgebra = { version = "0.33", default-features = false, features = ["std"], optional = true }
ndarray = { workspace = true, features = ["serde"], optional = true }
# Optional acceleration
rayon = { workspace = true, optional = true }
# Optional WASM
wasm-bindgen = { workspace = true, optional = true }
serde_wasm_bindgen = { version = "0.6", optional = true }
js-sys = { workspace = true, optional = true }
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
approx = "0.5"
```
### 2. Consuming Crate Feature Gates
Each crate that integrates solver capabilities exposes granular `sublinear-*` flags
that map onto solver features. This keeps the dependency graph explicit and auditable.
#### 2.1 ruvector-core
```toml
# Additions to crates/ruvector-core/Cargo.toml [features]
# Sublinear solver integration (opt-in)
sublinear = ["dep:ruvector-solver"]
# Coherence verification for HNSW index quality
sublinear-coherence = [
"sublinear",
"ruvector-solver/nalgebra-backend",
]
```
The `sublinear-coherence` flag enables runtime coherence checks on HNSW graph edges.
It requires the nalgebra backend because the coherence verifier uses sheaf-theoretic
linear algebra that maps naturally to nalgebra's matrix abstractions.
#### 2.2 ruvector-graph
```toml
# Additions to crates/ruvector-graph/Cargo.toml [features]
# Sublinear spectral partitioning and Laplacian solvers
sublinear = ["dep:ruvector-solver"]
sublinear-graph = [
"sublinear",
"ruvector-solver/ndarray-backend",
]
# Spectral methods for graph partitioning
sublinear-spectral = [
"sublinear-graph",
"ruvector-solver/parallel",
]
```
Graph crates use the ndarray backend because ruvector-graph already depends on ndarray
for adjacency matrices and spectral embeddings. Pulling in nalgebra here would add an
unnecessary second linear algebra library.
#### 2.3 ruvector-gnn
```toml
# Additions to crates/ruvector-gnn/Cargo.toml [features]
# GNN-accelerated sublinear search
sublinear = ["dep:ruvector-solver"]
sublinear-gnn = [
"sublinear",
"ruvector-solver/ndarray-backend",
]
```
#### 2.4 ruvector-attention
```toml
# Additions to crates/ruvector-attention/Cargo.toml [features]
# Sublinear attention routing
sublinear = ["dep:ruvector-solver"]
sublinear-attention = [
"sublinear",
"ruvector-solver/nalgebra-backend",
"math",
]
```
#### 2.5 ruvector-collections
```toml
# Additions to crates/ruvector-collections/Cargo.toml [features]
# Sublinear collection-level query dispatch
sublinear = ["ruvector-core/sublinear"]
```
Collections delegates to ruvector-core and does not directly depend on the solver crate.
### 3. Workspace-Level Aggregate Flags
```toml
# Additions to workspace Cargo.toml [workspace.dependencies]
ruvector-solver = { path = "crates/ruvector-solver", default-features = false }
```
No workspace-level default features are set for the solver. Each consumer pulls exactly
the features it needs.
### 4. Conditional Compilation Patterns
All solver-gated code uses consistent `cfg` attribute patterns to ensure the compiler
eliminates dead code paths when features are disabled.
#### 4.1 Module-Level Gating
```rust
// In crates/ruvector-core/src/lib.rs
#[cfg(feature = "sublinear")]
pub mod sublinear;
#[cfg(feature = "sublinear-coherence")]
pub mod coherence;
```
#### 4.2 Trait Implementation Gating
```rust
// In crates/ruvector-core/src/index/hnsw.rs
#[cfg(feature = "sublinear-coherence")]
impl HnswIndex {
/// Verify edge coherence across the HNSW graph using sheaf Laplacian.
///
/// Returns the coherence score in [0, 1] where 1.0 means perfectly coherent.
/// Only available when the `sublinear-coherence` feature is enabled.
pub fn verify_coherence(&self, config: &CoherenceConfig) -> Result<f64, SolverError> {
use ruvector_solver::coherence::SheafCoherenceVerifier;
let verifier = SheafCoherenceVerifier::new(config.clone());
verifier.verify(&self.graph)
}
}
```
#### 4.3 Function-Level Gating with Fallback
```rust
// In crates/ruvector-graph/src/query/planner.rs
/// Select the optimal query execution strategy.
///
/// When `sublinear-spectral` is enabled, the planner considers spectral
/// partitioning for large graph traversals. Otherwise, it falls back to
/// the existing cost-based optimizer.
pub fn select_strategy(&self, query: &GraphQuery) -> ExecutionStrategy {
#[cfg(feature = "sublinear-spectral")]
{
if self.should_use_spectral(query) {
return self.plan_spectral(query);
}
}
// Default path: cost-based optimizer (always available)
self.plan_cost_based(query)
}
```
#### 4.4 Compile-Time Backend Selection
```rust
// In crates/ruvector-solver/src/backend.rs
/// Marker type for the active linear algebra backend.
///
/// The solver supports nalgebra and ndarray simultaneously. Consumers
/// select which backend(s) to activate via feature flags. When both
/// are active, the solver can dispatch to whichever backend is more
/// efficient for a given operation.
#[cfg(feature = "nalgebra-backend")]
pub mod nalgebra_ops {
use nalgebra::{DMatrix, DVector};
    pub fn solve_laplacian(laplacian: &DMatrix<f64>, rhs: &DVector<f64>) -> DVector<f64> {
        // Graph Laplacians are only positive *semi*-definite (the constant
        // vector spans the kernel), so a plain Cholesky factorization can
        // fail. Apply a small diagonal shift to restore strict positive
        // definiteness before factoring.
        let n = laplacian.nrows();
        let regularized = laplacian + DMatrix::<f64>::identity(n, n) * 1e-10;
        let chol = regularized
            .cholesky()
            .expect("shifted Laplacian must be positive definite");
        chol.solve(rhs)
    }
}
#[cfg(feature = "ndarray-backend")]
pub mod ndarray_ops {
use ndarray::{Array1, Array2};
pub fn spectral_embedding(adjacency: &Array2<f64>, dim: usize) -> Array2<f64> {
// Eigendecomposition of the normalized Laplacian
// ... implementation details
todo!("spectral embedding via ndarray")
}
}
```
### 5. Runtime Algorithm Selection
Beyond compile-time feature gates, the solver provides a runtime dispatch layer
that selects between dense and sublinear code paths based on data characteristics.
```rust
// In crates/ruvector-solver/src/dispatch.rs
/// Configuration for runtime algorithm selection.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct SolverDispatchConfig {
/// Sparsity threshold above which the sublinear path is preferred.
/// Default: 0.95 (95% sparse). Range: [0.0, 1.0].
pub sparsity_threshold: f64,
/// Minimum number of elements before sublinear algorithms are considered.
/// Below this threshold, dense algorithms are always faster due to setup costs.
/// Default: 10_000.
pub min_elements_for_sublinear: usize,
/// Maximum fraction of elements the sublinear path may touch.
/// If the solver would need to examine more than this fraction,
/// it falls back to the dense path.
/// Default: 0.1 (10%).
pub max_touch_fraction: f64,
/// Force a specific path regardless of data characteristics.
/// None means auto-detection (recommended).
pub force_path: Option<SolverPath>,
}
impl Default for SolverDispatchConfig {
fn default() -> Self {
Self {
sparsity_threshold: 0.95,
min_elements_for_sublinear: 10_000,
max_touch_fraction: 0.1,
force_path: None,
}
}
}
/// Which execution path to use.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub enum SolverPath {
/// Traditional dense algorithms.
Dense,
/// Sublinear-time algorithms (only touches a fraction of the data).
Sublinear,
}
/// Determine the optimal execution path for the given data.
pub fn select_path(
total_elements: usize,
nonzero_elements: usize,
config: &SolverDispatchConfig,
) -> SolverPath {
if let Some(forced) = config.force_path {
return forced;
}
if total_elements < config.min_elements_for_sublinear {
return SolverPath::Dense;
}
let sparsity = 1.0 - (nonzero_elements as f64 / total_elements as f64);
if sparsity >= config.sparsity_threshold {
SolverPath::Sublinear
} else {
SolverPath::Dense
}
}
```
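The dispatch heuristic can be exercised standalone; the sketch below restates `select_path` with the serde derives dropped so it compiles without external crates, and walks through the three decision branches:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SolverPath {
    Dense,
    Sublinear,
}

struct SolverDispatchConfig {
    sparsity_threshold: f64,
    min_elements_for_sublinear: usize,
    force_path: Option<SolverPath>,
}

/// Same logic as the dispatch module above: forced path wins, small
/// problems stay dense, and otherwise sparsity decides.
fn select_path(total: usize, nonzero: usize, cfg: &SolverDispatchConfig) -> SolverPath {
    if let Some(forced) = cfg.force_path {
        return forced;
    }
    if total < cfg.min_elements_for_sublinear {
        return SolverPath::Dense;
    }
    let sparsity = 1.0 - nonzero as f64 / total as f64;
    if sparsity >= cfg.sparsity_threshold {
        SolverPath::Sublinear
    } else {
        SolverPath::Dense
    }
}

fn main() {
    let cfg = SolverDispatchConfig {
        sparsity_threshold: 0.95,
        min_elements_for_sublinear: 10_000,
        force_path: None,
    };
    // 100M elements at 1% nonzero => 99% sparse => sublinear path.
    assert_eq!(select_path(100_000_000, 1_000_000, &cfg), SolverPath::Sublinear);
    // Below the size floor: dense regardless of sparsity.
    assert_eq!(select_path(1_000, 10, &cfg), SolverPath::Dense);
    // Large but only 50% sparse: stays on the dense path.
    assert_eq!(select_path(1_000_000, 500_000, &cfg), SolverPath::Dense);
    println!("dispatch heuristic ok");
}
```

Because the defaults are serializable, deployments can tune `sparsity_threshold` and `min_elements_for_sublinear` per workload without recompiling, while `force_path` remains an escape hatch for debugging.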
### 6. WASM Feature Interaction Matrix
WASM targets cannot use certain features (mmap, threads via rayon, SIMD on older
runtimes). The following matrix defines valid feature combinations per platform.
```
Legend: Y = supported N = not supported P = partial (polyfill)
Feature | native-x86_64 | native-aarch64 | wasm32-unknown | wasm32-wasi
---------------------------+---------------+----------------+----------------+------------
sublinear | Y | Y | Y | Y
sublinear-coherence | Y | Y | Y | Y
sublinear-graph | Y | Y | Y | Y
sublinear-gnn | Y | Y | Y | Y
sublinear-spectral | Y | Y | N (no rayon) | N
sublinear-attention | Y | Y | Y | Y
nalgebra-backend | Y | Y | Y | Y
ndarray-backend | Y | Y | Y | Y
parallel (rayon) | Y | Y | N | N
simd | Y | Y | P (128-bit) | P
gpu | Y | P | N | N
solver + storage | Y | Y | N | Y (fs)
solver + hnsw | Y | Y | N | N
```
#### WASM Guard Pattern
```rust
// In crates/ruvector-solver/src/lib.rs
// Prevent invalid feature combinations at compile time.
#[cfg(all(feature = "parallel", target_arch = "wasm32"))]
compile_error!(
"The `parallel` feature (rayon) is not supported on wasm32 targets. \
Remove it or use `--no-default-features` when building for WASM."
);
#[cfg(all(feature = "gpu", target_arch = "wasm32"))]
compile_error!(
"The `gpu` feature is not supported on wasm32 targets."
);
```
### 7. Feature Flag Documentation Pattern
Every feature flag must include a doc comment in the crate-level documentation.
```rust
// In crates/ruvector-solver/src/lib.rs
//! # Feature Flags
//!
//! | Flag | Default | Description |
//! |--------------------|---------|--------------------------------------------------|
//! | `nalgebra-backend` | off | Enable nalgebra for sheaf/coherence operations |
//! | `ndarray-backend` | off | Enable ndarray for spectral/graph operations |
//! | `parallel` | off | Enable rayon for multi-threaded solver execution |
//! | `simd` | off | Enable SIMD intrinsics (auto-detected at build) |
//! | `gpu` | off | Enable GPU dispatch through ruvector-math |
//! | `wasm` | off | Enable WASM bindings via wasm-bindgen |
//! | `full` | off | Enable nalgebra + ndarray + parallel |
```
---
## Progressive Rollout Plan
### Phase 1: Foundation (Weeks 1-3)
**Goal**: Introduce the solver crate with zero consumer integration.
| Task | Acceptance Criteria |
|---------------------------------------------------|----------------------------------------------|
| Create `crates/ruvector-solver` with empty public API | Crate compiles, no downstream changes |
| Define all feature flags in Cargo.toml | `cargo check --all-features` passes |
| Add solver to workspace members list | `cargo build -p ruvector-solver` succeeds |
| Write compile-time WASM guards | WASM build fails gracefully on invalid combos|
| Add `ruvector-solver` to workspace dependencies | Resolver v2 is satisfied |
| Set up CI job for `ruvector-solver` feature matrix | All matrix entries pass |
**Feature flags available**: `nalgebra-backend`, `ndarray-backend`, `parallel`, `simd`,
`wasm`, `full`.
**Consumer flags available**: None (solver is not yet a dependency of any consumer).
**Risk**: Minimal. No consumer code changes.
### Phase 2: Core Integration (Weeks 4-7)
**Goal**: Enable coherence verification in ruvector-core and GNN acceleration in
ruvector-gnn behind opt-in feature flags.
| Task | Acceptance Criteria |
|---------------------------------------------------|----------------------------------------------|
| Add `sublinear` flag to ruvector-core | Flag compiles with no behavioral change |
| Add `sublinear-coherence` flag to ruvector-core | Coherence verifier runs on HNSW graphs |
| Add `sublinear-gnn` flag to ruvector-gnn | GNN training uses sublinear message passing |
| Write integration tests for coherence | Tests pass with and without the flag |
| Write integration tests for GNN acceleration | Tests pass with and without the flag |
| Benchmark coherence overhead | Less than 5% latency increase on default path|
| Update ruvector-core README with new flags | Documentation is current |
**Feature flags available**: Phase 1 flags + `sublinear`, `sublinear-coherence`,
`sublinear-gnn`.
**Rollback plan**: Remove the `sublinear*` feature flags from consumer Cargo.toml and
delete the gated modules. No API changes to revert because all new code is behind
feature gates.
### Phase 3: Extended Integration (Weeks 8-11)
**Goal**: Bring sublinear spectral methods to ruvector-graph and sublinear attention
routing to ruvector-attention.
| Task | Acceptance Criteria |
|---------------------------------------------------|----------------------------------------------|
| Add `sublinear-graph` flag to ruvector-graph | Spectral partitioning available behind flag |
| Add `sublinear-spectral` flag to ruvector-graph | Parallel spectral solver works |
| Add `sublinear-attention` flag to ruvector-attention | Attention routing uses solver dispatch |
| Add `sublinear` flag to ruvector-collections | Collection query dispatch delegates properly |
| WASM builds for all new flags | `cargo build --target wasm32-unknown-unknown`|
| Performance benchmarks for spectral partitioning | At least 2x speedup on graphs with >100k nodes|
| Cross-crate integration tests | Multi-crate feature combos work end-to-end |
**Feature flags available**: Phase 2 flags + `sublinear-graph`, `sublinear-spectral`,
`sublinear-attention`.
### Phase 4: Default Promotion (Weeks 12-16)
**Goal**: After validation, promote selected sublinear features to default feature sets.
| Task | Acceptance Criteria |
|---------------------------------------------------|----------------------------------------------|
| Collect benchmark data from all phases | Data covers all target platforms |
| Run `cargo semver-checks` on all modified crates | Zero breaking changes detected |
| Promote `sublinear-coherence` to ruvector-core default | Default build includes coherence checks |
| Promote `sublinear-gnn` to ruvector-gnn default | Default GNN build uses solver acceleration |
| Update ruvector workspace version to 2.1.0 | Minor version bump signals new capabilities |
| Publish updated crates to crates.io | All crates pass `cargo publish --dry-run` |
**Promotion criteria** (all must be met):
1. Zero regressions in existing benchmark suite.
2. Less than 2% compile-time increase for `cargo build` with default features.
3. Less than 50 KB binary size increase for default builds.
4. All platform CI targets pass.
5. At least 4 weeks of Phase 3 stability with no feature-related bug reports.
**Feature changes at promotion**:
```toml
# BEFORE (Phase 3)
# crates/ruvector-core/Cargo.toml
[features]
default = ["simd", "storage", "hnsw", "api-embeddings", "parallel"]
# AFTER (Phase 4)
# crates/ruvector-core/Cargo.toml
[features]
default = ["simd", "storage", "hnsw", "api-embeddings", "parallel", "sublinear-coherence"]
```
---
## CI Configuration for Feature Matrix Testing
### Strategy: Tiered Matrix
Testing all 2^N feature combinations is infeasible. Instead, we test a curated set of
meaningful profiles that cover: (a) each feature in isolation, (b) common real-world
combinations, and (c) platform-specific builds.
```yaml
# .github/workflows/solver-features.yml
name: Solver Feature Matrix
on:
push:
paths:
- 'crates/ruvector-solver/**'
- 'crates/ruvector-core/**'
- 'crates/ruvector-graph/**'
- 'crates/ruvector-gnn/**'
- 'crates/ruvector-attention/**'
pull_request:
paths:
- 'crates/ruvector-solver/**'
jobs:
feature-matrix:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
include:
# Tier 1: Individual features on Linux
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "nalgebra-backend"
name: "nalgebra-only"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "ndarray-backend"
name: "ndarray-only"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "parallel"
name: "parallel-only"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "simd"
name: "simd-only"
# Tier 2: Common combinations
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "nalgebra-backend,parallel"
name: "coherence-profile"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "ndarray-backend,parallel"
name: "spectral-profile"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: "full"
name: "full-profile"
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
features: ""
name: "no-features"
# Tier 3: Platform-specific
- os: ubuntu-latest
target: wasm32-unknown-unknown
features: "wasm,nalgebra-backend"
name: "wasm-nalgebra"
- os: ubuntu-latest
target: wasm32-unknown-unknown
features: "wasm,ndarray-backend"
name: "wasm-ndarray"
- os: ubuntu-latest
target: wasm32-unknown-unknown
features: "wasm"
name: "wasm-minimal"
- os: macos-latest
target: aarch64-apple-darwin
features: "full"
name: "aarch64-full"
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
targets: ${{ matrix.target }}
- name: Check ${{ matrix.name }}
run: |
cargo check -p ruvector-solver \
--target ${{ matrix.target }} \
--no-default-features \
--features "${{ matrix.features }}"
- name: Test ${{ matrix.name }}
if: matrix.target != 'wasm32-unknown-unknown'
run: |
cargo test -p ruvector-solver \
--no-default-features \
--features "${{ matrix.features }}"
# Consumer crate integration matrix
consumer-integration:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
include:
- crate: ruvector-core
features: "sublinear-coherence"
- crate: ruvector-graph
features: "sublinear-spectral"
- crate: ruvector-gnn
features: "sublinear-gnn"
- crate: ruvector-attention
features: "sublinear-attention"
- crate: ruvector-collections
features: "sublinear"
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- name: Test ${{ matrix.crate }} + ${{ matrix.features }}
run: |
cargo test -p ${{ matrix.crate }} \
--features "${{ matrix.features }}"
# Semver compliance check
semver-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- name: Install cargo-semver-checks
run: cargo install cargo-semver-checks
- name: Check semver compliance
run: |
for crate in ruvector-core ruvector-graph ruvector-gnn ruvector-attention; do
cargo semver-checks check-release -p "$crate"
done
```
### Local Developer Workflow
```bash
# Verify a single feature
cargo check -p ruvector-solver --no-default-features --features nalgebra-backend
# Verify WASM compatibility
cargo check -p ruvector-solver --target wasm32-unknown-unknown --no-default-features --features wasm
# Run the full matrix locally (requires cargo-hack)
cargo install cargo-hack
cargo hack check -p ruvector-solver --feature-powerset --depth 2
# Verify no semver breakage
cargo install cargo-semver-checks
cargo semver-checks check-release -p ruvector-core
```
---
## Migration Guide for Existing Users
### Users Who Do Not Want Sublinear Features
No action required. All sublinear features default to `off`. Existing builds, APIs,
and binary sizes are unchanged.
```toml
# This continues to work exactly as before:
[dependencies]
ruvector-core = "2.1"
```
### Users Who Want Coherence Verification
```toml
# Cargo.toml
[dependencies]
ruvector-core = { version = "2.1", features = ["sublinear-coherence"] }
```
```rust
// main.rs
use ruvector_core::index::HnswIndex;
use ruvector_core::coherence::CoherenceConfig;
fn main() -> anyhow::Result<()> {
let index = HnswIndex::new(/* ... */)?;
// ... insert vectors ...
let config = CoherenceConfig::default();
let score = index.verify_coherence(&config)?;
println!("HNSW coherence score: {score:.4}");
Ok(())
}
```
### Users Who Want GNN-Accelerated Search
```toml
# Cargo.toml
[dependencies]
ruvector-gnn = { version = "2.1", features = ["sublinear-gnn"] }
```
```rust
use ruvector_gnn::SublinearGnnSearch;
let searcher = SublinearGnnSearch::builder()
.sparsity_threshold(0.90)
.min_elements(5_000)
.build()?;
let results = searcher.search(&graph, &query_vector, k)?;
```
### Users Who Want Spectral Graph Partitioning
```toml
# Cargo.toml
[dependencies]
ruvector-graph = { version = "2.1", features = ["sublinear-spectral"] }
```
```rust
use ruvector_graph::spectral::SpectralPartitioner;
let partitioner = SpectralPartitioner::new(num_partitions);
let partition_map = partitioner.partition(&graph)?;
```
### Users Who Want Everything
```toml
# Cargo.toml
[dependencies]
ruvector-core = { version = "2.1", features = ["sublinear-coherence"] }
ruvector-graph = { version = "2.1", features = ["sublinear-spectral"] }
ruvector-gnn = { version = "2.1", features = ["sublinear-gnn"] }
ruvector-attention = { version = "2.1", features = ["sublinear-attention"] }
```
### WASM Users
```toml
# Cargo.toml
[dependencies]
ruvector-core = { version = "2.1", default-features = false, features = [
"memory-only",
"sublinear-coherence",
] }
```
Note: `sublinear-spectral` is not available on WASM because it depends on rayon.
Use `sublinear-graph` (without parallel spectral) instead.
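Crates enforce these platform restrictions at compile time rather than at runtime. A minimal sketch of such a guard follows; the feature names here are illustrative, not the crates' exact flags:

```rust
// Hypothetical compile-time guard (flag names assumed for illustration):
// reject thread-based features on wasm32, where std::thread is unavailable.
#[cfg(all(target_arch = "wasm32", feature = "parallel"))]
compile_error!(
    "`parallel` requires std::thread, which wasm32-unknown-unknown lacks; \
     build with `--no-default-features --features wasm` instead"
);

/// Report which execution mode this build was compiled with.
/// `cfg!` evaluates at compile time, so the dead branch costs nothing.
pub fn execution_mode() -> &'static str {
    if cfg!(feature = "parallel") {
        "parallel"
    } else {
        "single-threaded"
    }
}
```

A guard like this turns an invalid flag combination into a clear build error instead of a runtime panic deep inside rayon initialization.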
---
## Consequences
### Positive
- **Zero disruption**: all existing users, builds, and CI pipelines continue to work
unchanged because every new capability is behind an opt-in feature flag.
- **Granular adoption**: teams can enable exactly the solver capabilities they need
without pulling in unused backends or dependencies.
- **Dependency isolation**: nalgebra users do not pay for ndarray, and vice versa.
The feature flag hierarchy enforces this separation at the Cargo resolver level.
- **Platform safety**: compile-time guards prevent invalid feature combinations on
WASM, eliminating a class of runtime surprises.
- **Auditable dependency graph**: `cargo tree --features sublinear-coherence` shows
exactly what each flag brings in, making security review straightforward.
- **Reversible**: any phase can be rolled back by removing feature flags from consumer
crates, with zero API changes to revert.
- **CI efficiency**: the tiered matrix tests meaningful combinations rather than an
exponential powerset, keeping CI times tractable.
### Negative
- **Cognitive overhead**: developers must understand the feature flag hierarchy to
choose the right flags. The naming convention (`sublinear-*`) and documentation
mitigate this but do not eliminate it.
- **Combinatorial testing gap**: we cannot test every possible combination. Edge-case
interactions between features (e.g., `sublinear-coherence` + `distributed` + `wasm`)
may surface late.
- **Conditional compilation complexity**: `#[cfg(feature = "...")]` blocks add
indirection to the codebase. Code navigation tools may not resolve cfg-gated items
correctly.
- **Feature flag drift**: if a consuming crate adds a solver feature but the solver
crate reorganizes its flag names, the consumer will fail to compile. Cargo's resolver
catches this at build time, but the error message may be unclear.
- **Binary size**: each additional feature flag adds code behind conditional compilation,
potentially increasing binary size for users who enable many features.
### Neutral
- The solver crate is a new workspace member, increasing the total crate count by one.
- Workspace dependency resolution time increases marginally due to one additional crate.
- Feature flags become the primary coordination mechanism between solver and consumer
crates, replacing what would otherwise be runtime configuration.
---
## Options Considered
### Option 1: Monolithic Feature Flag (Rejected)
A single `sublinear` flag on each consumer crate that enables all solver capabilities.
- **Pros**: Simple to understand, one flag per crate, minimal documentation needed.
- **Cons**: All-or-nothing adoption. Users who only need coherence must also pull in
ndarray for spectral methods and rayon for parallel solvers. This violates the
dependency hygiene constraint and increases binary size unnecessarily.
- **Verdict**: Rejected because it forces unnecessary dependencies on consumers.
### Option 2: Runtime-Only Selection (Rejected)
No feature flags. The solver crate is always compiled with all backends. Algorithm
selection happens purely at runtime.
- **Pros**: No conditional compilation, simpler build system, no feature matrix in CI.
- **Cons**: Every consumer always pays the compile-time and binary-size cost of all
backends. WASM targets would fail to compile because rayon and mmap are always
included. This violates the platform parity constraint.
- **Verdict**: Rejected because it is incompatible with WASM and wastes resources.
### Option 3: Separate Crates Per Algorithm (Rejected)
Instead of feature flags, create `ruvector-solver-coherence`,
`ruvector-solver-spectral`, `ruvector-solver-gnn` as separate crates.
- **Pros**: Maximum isolation, each crate has its own version and changelog. Consumers
depend only on the crate they need.
- **Cons**: High maintenance overhead (4+ additional Cargo.toml files, CI jobs, crate
publications). Shared types between solver algorithms require a `ruvector-solver-types`
crate, adding another layer. The workspace already has 100+ crates; adding 4-5 more
for one integration is disproportionate.
- **Verdict**: Rejected due to maintenance burden and workspace bloat.
### Option 4: Hierarchical Feature Flags (Accepted)
The approach described in this ADR. One solver crate with backend flags, consumer crates
with `sublinear-*` flags, workspace-level aggregates for convenience.
- **Pros**: Balances granularity with simplicity. One new crate, N feature flags.
Cargo's feature unification handles transitive activation. CI matrix is tractable.
- **Cons**: Requires careful documentation and naming conventions. Some cognitive
overhead for new contributors.
- **Verdict**: Accepted as the best balance of isolation, usability, and maintenance cost.
---
## Related Decisions
- **ADR-STS-001**: Solver Integration Architecture -- defines the overall integration
strategy that this ADR implements via feature flags.
- **ADR-STS-003**: WASM Strategy -- defines platform constraints that this ADR enforces
via compile-time guards.
- **ADR-STS-004**: Performance Benchmarks -- defines the benchmarking framework used to
validate Phase 4 promotion criteria.
---
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2026-02-20 | RuVector Team | Initial proposal |
| 1.0 | 2026-02-20 | RuVector Team | Accepted: full implementation complete |
---
## Implementation Status
The feature flag system is fully operational:

- Individual algorithm flags: `neumann`, `cg`, `forward-push`, `backward-push`, `hybrid-random-walk`, `true-solver`, `bmssp`
- `all-algorithms` meta-flag enables every algorithm at once
- Platform flags: `simd` (AVX2 acceleration), `wasm` (WebAssembly target), `parallel` (rayon/crossbeam concurrency)
- Default features: `neumann`, `cg`, `forward-push`
- Conditional compilation throughout via `#[cfg(feature = "...")]`
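For example, a consumer that wants only conjugate gradient with SIMD acceleration might configure the dependency as follows (version number illustrative):

```toml
[dependencies]
ruvector-solver = { version = "2.1", default-features = false, features = ["cg", "simd"] }
```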
---
## References
- [Cargo Features Reference](https://doc.rust-lang.org/cargo/reference/features.html)
- [cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks)
- [cargo-hack](https://github.com/taiki-e/cargo-hack) -- for feature powerset testing
- [MADR 3.0 Template](https://adr.github.io/madr/)
- [ruvector-core Cargo.toml](/home/user/ruvector/crates/ruvector-core/Cargo.toml)
- [ruvector-graph Cargo.toml](/home/user/ruvector/crates/ruvector-graph/Cargo.toml)
- [ruvector-math Cargo.toml](/home/user/ruvector/crates/ruvector-math/Cargo.toml)
- [ruvector-gnn Cargo.toml](/home/user/ruvector/crates/ruvector-gnn/Cargo.toml)
- [ruvector-attention Cargo.toml](/home/user/ruvector/crates/ruvector-attention/Cargo.toml)
- [Workspace Cargo.toml](/home/user/ruvector/Cargo.toml)

# State-of-the-Art Research Analysis: Sublinear-Time Algorithms for Vector Database Operations
**Date**: 2026-02-20
**Classification**: Research Analysis
**Scope**: SOTA algorithms applicable to RuVector's 79-crate ecosystem
**Version**: 4.0 (Full Implementation Verified)
---
## 1. Executive Summary
This document surveys the state of the art in sublinear-time algorithms as of February 2026, focusing on applicability to vector database operations, graph analytics, spectral methods, and neural network training. RuVector's integration of these algorithms is a first-of-its-kind capability among vector databases: no competitor (Pinecone, Weaviate, Milvus, Qdrant, ChromaDB) offers integrated O(log n) solvers.
As of February 2026, all 7 algorithms from the practical subset are fully implemented in the ruvector-solver crate (10,729 LOC, 241 tests) with SIMD acceleration, WASM bindings, and NAPI Node.js bindings.
### Key Findings
- **Theoretical frontier**: Nearly-linear Laplacian solvers now achieve O(m · polylog(n)) with practical constant factors
- **Dynamic algorithms**: Subpolynomial O(n^{o(1)}) dynamic min-cut is now achievable (RuVector already implements this)
- **Quantum-classical bridge**: Dequantized algorithms provide O(polylog(n)) for specific matrix operations
- **Practical gap**: Most SOTA results have impractical constants; the 7 algorithms in the solver library represent the practical subset
- **RuVector advantage**: 91/100 compatibility score, 10-600x projected speedups in 6 subsystems
- **Hardware evolution**: ARM SVE2, CXL memory, and AVX-512 on Zen 5 will further amplify solver performance
- **Error composition**: Information-theoretic analysis shows ε_total ≤ Σε_i for additive pipelines, enabling principled error budgeting
---
## 2. Foundational Theory
### 2.1 Spielman-Teng Nearly-Linear Laplacian Solvers (2004-2014)
The breakthrough that made sublinear graph algorithms practical.
**Key result**: Solve Lx = b for a graph Laplacian L in O(m · log^c(n) · log(1/ε)) time, where the exponent c was originally ~70 but was reduced to ~2 in later work.
**Technique**: Recursive preconditioning via graph sparsification. Construct a sparser graph G' that approximates L spectrally, use G' as preconditioner for G, recursing until the graph is trivially solvable.
**Impact on RuVector**: Foundation for TRUE algorithm's sparsification step. Prime Radiant's sheaf Laplacian benefits directly.
### 2.2 Koutis-Miller-Peng (2010-2014)
Simplified the Spielman-Teng framework significantly.
**Key result**: O(m · log(n) · log(1/ε)) for SDD systems using low-stretch spanning trees.
**Technique**: Ultra-sparsifiers (sparsifiers with O(n) edges), sampling with probability proportional to effective resistance, recursive preconditioning.
**Impact on RuVector**: The effective resistance computation connects to ruvector-mincut's sparsification. Shared infrastructure opportunity.
### 2.3 Cohen-Kyng-Miller-Pachocki-Peng-Rao-Xu (CKMPPRX, 2014)
**Key result**: O(m · sqrt(log n) · log(1/ε)) via approximate Gaussian elimination.
**Technique**: "Almost-Cholesky" factorization that preserves sparsity. Eliminates degree-1 and degree-2 vertices, then samples fill-in edges.
**Impact on RuVector**: Potential future improvement over CG for Laplacian systems. Currently not in the solver library due to implementation complexity.
### 2.4 Kyng-Sachdeva (2016-2020)
**Key result**: Practical O(m · log²(n)) Laplacian solver with small constants.
**Technique**: Approximate Gaussian elimination with careful fill-in management.
**Impact on RuVector**: Candidate for future BMSSP enhancement. Current BMSSP uses algebraic multigrid which is more general but has larger constants for pure Laplacians.
### 2.5 Randomized Numerical Linear Algebra (Martinsson-Tropp, 2020-2024)
**Key result**: Unified framework for randomized matrix decomposition achieving O(mn · log(n)) for rank-k approximation of m×n matrices, vs O(mnk) for deterministic SVD.
**Key papers**:
- Martinsson, P.G., Tropp, J.A. (2020): "Randomized Numerical Linear Algebra: Foundations and Algorithms" — comprehensive survey establishing practical RandNLA
- Tropp, J.A. et al. (2023): Improved analysis of randomized block Krylov methods
- Nakatsukasa, Y., Tropp, J.A. (2024): Fast and accurate randomized algorithms for linear algebra and eigenvalue problems
**Techniques**:
- Randomized range finders with power iteration
- Randomized SVD via single-pass streaming
- Sketch-and-solve for least squares
- CountSketch and OSNAP for sparse embedding
**Impact on RuVector**: Directly applicable to ruvector-math's matrix operations. The sketch-and-solve paradigm can accelerate spectral filtering when combined with Neumann series. Potential for streaming updates to TRUE preprocessing.
---
## 3. Recent Breakthroughs (2023-2026)
### 3.1 Maximum Flow in Almost-Linear Time (Chen et al., 2022-2023)
**Key result**: First m^{1+o(1)} time algorithm for maximum flow and minimum cut in undirected graphs.
**Publication**: FOCS 2022, refined 2023. arXiv:2203.00671
**Technique**: Interior point method with dynamic data structures for maintaining electrical flows. Uses approximate Laplacian solvers as a subroutine.
**Impact on RuVector**: ruvector-mincut's dynamic min-cut already benefits from this lineage. The solver integration provides the Laplacian solve subroutine that makes this algorithm practical.
### 3.2 Subpolynomial Dynamic Min-Cut (December 2024)
**Key result**: O(n^{o(1)}) amortized update time for dynamic minimum cut.
**Publication**: arXiv:2512.13105 (December 2024)
**Technique**: Expander decomposition with hierarchical data structures. Maintains near-optimal cut under edge insertions and deletions.
**Impact on RuVector**: Already implemented in `ruvector-mincut`. This is the state-of-the-art for dynamic graph algorithms.
### 3.3 Local Graph Clustering (Andersen-Chung-Lang, Orecchia-Zhu)
**Key result**: Find a cluster of conductance ≤ φ containing a seed vertex in O(volume(cluster)/φ) time, independent of graph size.
**Technique**: Personalized PageRank push with threshold. Sweep cut on the PPR vector.
**Impact on RuVector**: Forward Push algorithm in the solver. Directly applicable to ruvector-graph's community detection and ruvector-core's semantic neighborhood discovery.
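The push loop at the heart of this algorithm family fits in a few dozen lines. The sketch below is a simplified, non-lazy variant on an unweighted adjacency list, not the solver crate's implementation:

```rust
use std::collections::VecDeque;

/// Approximate personalized PageRank from `source` via forward push
/// (Andersen-Chung-Lang style, non-lazy variant). Each push absorbs an
/// `alpha` fraction of a node's residual mass into the estimate `p` and
/// spreads the rest to neighbors; work is O(1/(alpha * eps)) pushes,
/// independent of graph size.
fn forward_push(adj: &[Vec<usize>], source: usize, alpha: f64, eps: f64) -> Vec<f64> {
    let n = adj.len();
    let (mut p, mut r) = (vec![0.0; n], vec![0.0; n]);
    r[source] = 1.0;
    let mut queue = VecDeque::new();
    queue.push_back(source);
    while let Some(u) = queue.pop_front() {
        let deg = adj[u].len().max(1) as f64;
        if r[u] <= eps * deg {
            continue; // stale queue entry: residual fell below threshold
        }
        let mass = r[u];
        p[u] += alpha * mass;
        r[u] = 0.0;
        for &v in &adj[u] {
            let before = r[v];
            r[v] += (1.0 - alpha) * mass / deg;
            // Enqueue v only when it newly crosses its push threshold.
            let dv = adj[v].len().max(1) as f64;
            if before <= eps * dv && r[v] > eps * dv {
                queue.push_back(v);
            }
        }
    }
    p
}
```

At termination every residual satisfies r[v] ≤ ε · deg(v), which bounds the per-entry PPR error; the invariant sum(p) + sum(r) = 1 holds throughout.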
### 3.4 Spectral Sparsification Advances (2011-2024)
**Key result**: O(n · polylog(n)) edge sparsifiers preserving all cut values within (1±ε).
**Technique**: Sampling edges proportional to effective resistance. Benczur-Karger for cut sparsifiers, Spielman-Srivastava for spectral.
**Recent advances** (2023-2024):
- Improved constant factors in effective resistance sampling
- Dynamic spectral sparsification with polylog update time
- Distributed spectral sparsification for multi-node setups
**Impact on RuVector**: TRUE algorithm's sparsification step. Also shared with ruvector-mincut's expander decomposition.
### 3.5 Johnson-Lindenstrauss Advances (2017-2024)
**Key result**: Optimal JL transforms with O(d · log(n)) time using sparse projection matrices.
**Key papers**:
- Larsen-Nelson (2017): Optimal tradeoff between target dimension and distortion
- Cohen et al. (2022): Sparse JL with O(1/ε) nonzeros per row
- Nelson-Nguyên (2024): Near-optimal JL for streaming data
**Impact on RuVector**: TRUE algorithm's dimensionality reduction step. Also applicable to ruvector-core's batch distance computation via random projection.
### 3.6 Quantum-Inspired Sublinear Algorithms (Tang, 2018-2024)
**Key result**: "Dequantized" classical algorithms achieving O(polylog(n/ε)) for:
- Low-rank approximation
- Recommendation systems
- Principal component analysis
- Linear regression
**Technique**: Replace quantum amplitude estimation with classical sampling from SQ (sampling and query) access model.
**Impact on RuVector**: ruQu (quantum crate) can leverage these for hybrid quantum-classical approaches. The sampling techniques inform Forward Push and Hybrid Random Walk design.
### 3.7 Sublinear Graph Neural Networks (2023-2025)
**Key result**: GNN inference in O(k · log(n)) time per node (vs O(k · n · d) standard).
**Techniques**:
- Lazy propagation: Only propagate features for queried nodes
- Importance sampling: Sample neighbors proportional to attention weights
- Graph sparsification: Train on spectrally-equivalent sparse graph
**Impact on RuVector**: Directly applicable to ruvector-gnn. SublinearAggregation strategy implements lazy propagation via Forward Push.
### 3.8 Optimal Transport in Sublinear Time (2022-2025)
**Key result**: Approximate optimal transport in O(n · log(n) / ε²) via entropy-regularized Sinkhorn with tree-based initialization.
**Techniques**:
- Tree-Wasserstein: O(n · log(n)) exact computation on tree metrics
- Sliced Wasserstein: O(n · log(n) · d) via 1D projections
- Sublinear Sinkhorn: Exploiting sparsity in cost matrix
**Impact on RuVector**: ruvector-math includes optimal transport capabilities. Solver-accelerated Sinkhorn replaces dense O(n²) matrix-vector products with sparse O(nnz).
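The dense Sinkhorn iteration that these sublinear variants accelerate is itself only a few lines. The sketch below is the textbook O(n²)-per-iteration version, shown for orientation; the sparse variants above replace the dense kernel products:

```rust
/// Entropy-regularized Sinkhorn (dense, for illustration only).
/// Alternately rescales rows and columns of the Gibbs kernel K = exp(-C/reg)
/// until the transport plan's marginals match `a` and `b`.
fn sinkhorn(cost: &[Vec<f64>], a: &[f64], b: &[f64], reg: f64, iters: usize) -> Vec<Vec<f64>> {
    let (n, m) = (a.len(), b.len());
    // Gibbs kernel K = exp(-C / reg)
    let k: Vec<Vec<f64>> = cost
        .iter()
        .map(|row| row.iter().map(|&c| (-c / reg).exp()).collect())
        .collect();
    let (mut u, mut v) = (vec![1.0; n], vec![1.0; m]);
    for _ in 0..iters {
        for i in 0..n {
            let kv: f64 = (0..m).map(|j| k[i][j] * v[j]).sum();
            u[i] = a[i] / kv; // match row marginals
        }
        for j in 0..m {
            let ktu: f64 = (0..n).map(|i| k[i][j] * u[i]).sum();
            v[j] = b[j] / ktu; // match column marginals
        }
    }
    // Transport plan P = diag(u) K diag(v)
    (0..n)
        .map(|i| (0..m).map(|j| u[i] * k[i][j] * v[j]).collect())
        .collect()
}
```

The sublinear claim comes from replacing the dense kv and ktu sums with sparse products over the nonzeros of the cost matrix.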
### 3.9 Sublinear Spectral Density Estimation (Cohen-Musco, 2024)
**Key result**: Estimate the spectral density of a symmetric matrix in O(m · polylog(n)) time, sufficient to determine eigenvalue distribution without computing individual eigenvalues.
**Technique**: Stochastic trace estimation via Hutchinson's method combined with Chebyshev polynomial approximation. Uses O(log(1/δ)) random probe vectors and O(log(n/ε)) Chebyshev terms per probe.
**Impact on RuVector**: Enables rapid condition number estimation for algorithm routing (ADR-STS-002). Can determine whether a matrix is well-conditioned (use Neumann) or ill-conditioned (use CG/BMSSP) in O(m · log²(n)) time vs O(n³) for full eigendecomposition.
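Hutchinson's estimator itself is compact. The dense, dependency-free sketch below (with a toy xorshift PRNG standing in for a real one) shows the probing step; a production version would use SpMV and the Chebyshev filtering described above:

```rust
/// Hutchinson's stochastic trace estimator: tr(A) ≈ (1/k) Σ zᵀAz with
/// Rademacher (±1) probe vectors z. Dense matrix for illustration only.
fn hutchinson_trace(a: &[Vec<f64>], probes: usize, seed: u64) -> f64 {
    let n = a.len();
    let mut state = seed.max(1);
    // Toy xorshift64 generator producing ±1 entries.
    let mut rademacher = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        if state & 1 == 0 { 1.0 } else { -1.0 }
    };
    let mut total = 0.0;
    for _ in 0..probes {
        let z: Vec<f64> = (0..n).map(|_| rademacher()).collect();
        // Accumulate the quadratic form zᵀAz.
        let mut quad = 0.0;
        for i in 0..n {
            let az: f64 = (0..n).map(|j| a[i][j] * z[j]).sum();
            quad += z[i] * az;
        }
        total += quad;
    }
    total / probes as f64
}
```

For diagonal matrices every Rademacher probe returns the exact trace (z_i² = 1), which makes a convenient sanity check; off-diagonal structure introduces the variance that the O(log(1/δ)) probe count controls.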
### 3.10 Faster Effective Resistance Computation (Durfee et al., 2023-2024)
**Key result**: Compute all-pairs effective resistances approximately in O(m · log³(n) / ε²) time, or a single effective resistance in O(m · log(n) · log(1/ε)) time.
**Technique**: Reduce effective resistance computation to Laplacian solving: R_eff(s,t) = (e_s - e_t)^T L^+ (e_s - e_t). Single-pair uses one Laplacian solve; batch uses JL projection to reduce to O(log(n)/ε²) solves.
**Recent advances** (2024):
- Improved batch algorithms using sketching
- Dynamic effective resistance under edge updates in polylog amortized time
- Distributed effective resistance for partitioned graphs
**Impact on RuVector**: Critical for TRUE's sparsification step (edge sampling proportional to effective resistance). Also enables efficient graph centrality measures and network robustness analysis in ruvector-graph.
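The reduction to a Laplacian solve can be demonstrated end to end on a toy graph. The sketch below grounds node 0 and uses dense Gaussian elimination in place of the sublinear solver; for a path of unit resistors, R_eff between the endpoints equals the number of edges:

```rust
/// Effective resistance via one Laplacian solve:
/// R_eff(s, t) = (e_s - e_t)ᵀ L⁺ (e_s - e_t).
/// Grounding node 0 (dropping its row and column) makes the reduced system
/// nonsingular; dense elimination stands in for the sublinear solver here.
fn effective_resistance(lap: &[Vec<f64>], s: usize, t: usize) -> f64 {
    let n = lap.len();
    let m = n - 1;
    // Reduced system on nodes 1..n, rhs = (e_s - e_t) restricted.
    let mut a: Vec<Vec<f64>> = (1..n)
        .map(|i| (1..n).map(|j| lap[i][j]).collect())
        .collect();
    let mut b = vec![0.0; m];
    if s > 0 { b[s - 1] += 1.0; }
    if t > 0 { b[t - 1] -= 1.0; }
    // Gaussian elimination without pivoting (fine for SPD reduced Laplacians).
    for k in 0..m {
        for i in k + 1..m {
            let f = a[i][k] / a[k][k];
            for j in k..m {
                let akj = a[k][j];
                a[i][j] -= f * akj;
            }
            b[i] -= f * b[k];
        }
    }
    // Back-substitution for the voltage vector.
    let mut v = vec![0.0; m];
    for i in (0..m).rev() {
        let tail: f64 = (i + 1..m).map(|j| a[i][j] * v[j]).sum();
        v[i] = (b[i] - tail) / a[i][i];
    }
    let volt = |node: usize| if node == 0 { 0.0 } else { v[node - 1] };
    volt(s) - volt(t)
}
```

Swapping the O(n³) elimination for a solver-crate Laplacian solve is exactly the single-pair O(m · log(n) · log(1/ε)) path described above.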
### 3.11 Neural Network Acceleration via Sublinear Layers (2024-2025)
**Key result**: Replace dense attention and MLP layers with sublinear-time operations achieving O(n · log(n)) or O(n · √n) complexity while maintaining >95% accuracy.
**Key techniques**:
- Sparse attention via locality-sensitive hashing (Reformer lineage, improved 2024)
- Random feature attention: approximate softmax kernel with O(n · d · log(n)) random Fourier features
- Sublinear MLP: product-key memory replacing dense layers with O(√n) lookups
- Graph-based attention: PDE diffusion on sparse attention graph (directly uses CG)
**Impact on RuVector**: ruvector-attention's 40+ attention mechanisms can integrate solver-backed sparse attention. PDE-based attention diffusion is already in the solver design (ADR-STS-001). The random feature approach informs TRUE's JL projection design.
### 3.12 Distributed Laplacian Solvers (2023-2025)
**Key result**: Solve Laplacian systems across k machines in O(m/k · polylog(n) + n · polylog(n)) time with O(n · polylog(n)) communication.
**Techniques**:
- Graph partitioning with low-conductance separators
- Local solving on partitions + Schur complement coupling
- Communication-efficient iterative refinement
**Impact on RuVector**: Directly applicable to ruvector-cluster's sharded graph processing. Enables scaling the solver beyond single-machine memory limits by distributing the Laplacian across cluster shards.
### 3.13 Sketching-Based Matrix Approximation (2023-2025)
**Key result**: Maintain a sketch of a streaming matrix supporting approximate matrix-vector products in O(k · n) time and O(k · n) space, where k is the sketch dimension.
**Key advances**:
- Frequent Directions (Liberty, 2013) extended to streaming with O(k · n) space for rank-k approximation
- CountSketch-based SpMV approximation: O(nnz + k²) time per multiply
- Tensor sketching for higher-order interactions
- Mergeable sketches for distributed aggregation
**Impact on RuVector**: Enables incremental TRUE preprocessing — as the graph evolves, the sparsifier sketch can be updated in O(k) per edge change rather than recomputing from scratch. Also applicable to streaming analytics in ruvector-graph.
---
## 4. Algorithm Complexity Comparison
### SOTA vs Traditional — Comprehensive Table
| Operation | Traditional | SOTA Sublinear | Speedup @ n=10K | Speedup @ n=1M | In Solver? |
|-----------|------------|---------------|-----------------|----------------|-----------|
| Dense Ax=b | O(n³) | O(n^2.373) (Strassen+) | 2x | 10x | No (use BLAS) |
| Sparse Ax=b (SPD) | O(n² nnz) | O(√κ · log(1/ε) · nnz) (CG) | 10-100x | 100-1000x | Yes (CG) |
| Laplacian Lx=b | O(n³) | O(m · log²(n) · log(1/ε)) | 50-500x | 500-10Kx | Yes (BMSSP) |
| PageRank (single source) | O(n · m) | O(1/ε) (Forward Push) | 100-1000x | 10K-100Kx | Yes |
| PageRank (pairwise) | O(n · m) | O(√n/ε) (Hybrid RW) | 10-100x | 100-1000x | Yes |
| Spectral gap | O(n³) eigendecomp | O(m · log(n)) (random walk) | 50x | 5000x | Partial |
| Graph clustering | O(n · m · k) | O(vol(C)/φ) (local) | 10-100x | 1000-10Kx | Yes (Push) |
| Spectral sparsification | N/A (new) | O(m · log(n)/ε²) | New capability | New capability | Yes (TRUE) |
| JL projection | O(n · d · k) | O(n · d · 1/ε) sparse | 2-5x | 2-5x | Yes (TRUE) |
| Min-cut (dynamic) | O(n · m) per update | O(n^{o(1)}) amortized | 100x+ | 10K+x | Separate crate |
| GNN message passing | O(n · d · avg_deg) | O(k · log(n) · d) | 5-50x | 50-500x | Via Push |
| Attention (PDE) | O(n²) pairwise | O(m · √κ · log(1/ε)) sparse | 10-100x | 100-10Kx | Yes (CG) |
| Optimal transport | O(n² · log(n)/ε) | O(n · log(n)/ε²) | 100x | 10Kx | Partial |
| Matrix-vector (Neumann) | O(n²) dense | O(k · nnz) sparse | 5-50x | 50-600x | Yes |
| Effective resistance | O(n³) inverse | O(m · log(n)/ε²) | 50-500x | 5K-50Kx | Yes (CG/TRUE) |
| Spectral density | O(n³) eigendecomp | O(m · polylog(n)) | 50-500x | 5K-50Kx | Planned |
| Matrix sketch update | O(mn) full recompute | O(k) per update | n/k ≈ 100x | n/k ≈ 10Kx | Planned |
---
## 5. Implementation Complexity Analysis
### Practical Constant Factors and Implementation Difficulty
| Algorithm | Theoretical | Practical Constant | LOC (production) | Impl. Difficulty | Numerical Stability | Memory Overhead |
|-----------|-------------|--------------------|------------------|------------------|---------------------|-----------------|
| **Neumann Series** | O(k · nnz) | c ≈ 2.5 ns/nonzero | ~200 | 1/5 (Easy) | Moderate — diverges if ρ(I-A) ≥ 1 | 3n floats (r, p, temp) |
| **Forward Push** | O(1/ε) | c ≈ 15 ns/push | ~350 | 2/5 (Moderate) | Good — monotone convergence | n + active_set floats |
| **Backward Push** | O(1/ε) | c ≈ 18 ns/push | ~400 | 2/5 (Moderate) | Good — same as Forward | n + active_set floats |
| **Hybrid Random Walk** | O(√n/ε) | c ≈ 50 ns/step | ~500 | 3/5 (Hard) | Variable — Monte Carlo variance | 4n floats + PRNG state |
| **TRUE** | O(log n) | c varies by phase | ~800 | 4/5 (Very Hard) | Compound — 3 error sources | JL matrix + sparsifier + solve |
| **Conjugate Gradient** | O(√κ · nnz) | c ≈ 2.5 ns/nonzero | ~300 | 2/5 (Moderate) | Requires reorthogonalization for large κ | 5n floats (r, p, Ap, x, z) |
| **BMSSP** | O(nnz · log n) | c ≈ 5 ns/nonzero | ~1200 | 5/5 (Expert) | Excellent — multigrid smoothing | Hierarchy: ~2x original matrix |
### Constant Factor Analysis: Theoretical vs Measured
The gap between asymptotic complexity and wall-clock time is driven by:
1. **Cache effects**: SpMV with random access patterns (gather) achieves 20-40% of peak FLOPS due to cache misses. Sequential access (CSR row scan) achieves 60-80%.
2. **SIMD utilization**: AVX2 gather instructions have 4-8 cycle latency vs 1 cycle for sequential loads. Effective SIMD speedup for SpMV is ~4x (not 8x theoretical for 256-bit).
3. **Branch prediction**: Push algorithms have data-dependent branches (threshold checks), reducing effective IPC to ~2 from peak ~4.
4. **Memory bandwidth**: SpMV is bandwidth-bound at density > 1%. The theoretical FLOP rate is irrelevant there; memory bandwidth (40-80 GB/s on typical servers) determines throughput.
5. **Allocation overhead**: Without arena allocator, malloc/free adds 5-20μs per solve. With arena: ~200ns.
---
## 6. Error Analysis and Accuracy Guarantees
### 6.1 Error Propagation in Composed Algorithms
When multiple approximate algorithms are composed in a pipeline, errors compound:
**Additive model** (for Neumann, Push, CG):
```
ε_total ≤ ε_1 + ε_2 + ... + ε_k
```
Where each ε_i is the per-stage approximation error.
**Multiplicative model** (for TRUE with JL → sparsify → solve):
```
||x̃ - x*|| ≤ (1 + ε_JL)(1 + ε_sparsify)(1 + ε_solve) · ||x*||
≈ (1 + ε_JL + ε_sparsify + ε_solve) · ||x*|| (for small ε)
```
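Both composition rules are mechanical to evaluate; a minimal sketch:

```rust
/// Worst-case error of a pipeline under the additive model:
/// ε_total ≤ ε_1 + ε_2 + ... + ε_k.
fn additive_bound(stage_errors: &[f64]) -> f64 {
    stage_errors.iter().sum()
}

/// Worst-case relative error under the multiplicative model:
/// Π (1 + ε_i) - 1, which reduces to Σ ε_i to first order for small ε_i.
fn multiplicative_bound(stage_errors: &[f64]) -> f64 {
    stage_errors.iter().map(|e| 1.0 + e).product::<f64>() - 1.0
}
```

For small per-stage errors the two bounds nearly coincide (e.g., stages of 0.01, 0.02, and 0.005 give 0.035 additive vs ≈0.0354 multiplicative), which is why the first-order approximation above is safe in practice.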
### 6.2 Information-Theoretic Lower Bounds
| Query Type | Lower Bound on Error | Achieving Algorithm | Gap to Lower Bound |
|-----------|---------------------|--------------------|--------------------|
| Single Ax=b entry | Ω(1/√T) for T queries | Hybrid Random Walk | ≤ 2x |
| Full Ax=b solve | Ω(ε) with O(√κ · log(1/ε)) iterations | CG | Optimal (Nemirovski-Yudin) |
| PPR from source | Ω(ε) with O(1/ε) push operations | Forward Push | Optimal |
| Pairwise PPR | Ω(1/√n · ε) | Hybrid Random Walk + Push | ≤ 3x |
| Spectral sparsifier | Ω(n · log(n)/ε²) edges | Spielman-Srivastava | Optimal |
### 6.3 Error Amplification in Iterative Methods
CG error amplification is bounded by the Chebyshev polynomial:
```
||x_k - x*||_A ≤ 2 · ((√κ - 1)/(√κ + 1))^k · ||x_0 - x*||_A
```
For Neumann series, error is geometric:
```
||x_k - x*|| ≤ ρ^k · ||b|| / (1 - ρ)
```
where ρ is the spectral radius of (I - A). **Critical**: once ρ reaches 0.99, Neumann needs roughly 460 iterations for ε = 0.01 (more still once the ||b|| / (1 - ρ) prefactor is included), making CG the preferred choice.
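Both bounds translate directly into iteration-count estimates. The sketch below drops the norm prefactors, so the Neumann count is the solution of ρ^k ≤ ε and the CG count comes from the Chebyshev bound above:

```rust
/// Iterations for Neumann series to reach relative error `eps`:
/// ρ^k ≤ ε  =>  k ≥ ln(ε) / ln(ρ)   (||b||/(1-ρ) prefactor dropped).
fn neumann_iters(rho: f64, eps: f64) -> usize {
    (eps.ln() / rho.ln()).ceil() as usize
}

/// Iterations for CG from the Chebyshev bound:
/// 2 · ((√κ - 1)/(√κ + 1))^k ≤ ε.
fn cg_iters(kappa: f64, eps: f64) -> usize {
    let rate = (kappa.sqrt() - 1.0) / (kappa.sqrt() + 1.0);
    ((eps / 2.0).ln() / rate.ln()).ceil() as usize
}
```

At ρ = 0.99 and ε = 0.01 the Neumann count is ~459 iterations, while CG at κ = 100 needs only 27; this gap is exactly what drives the ρ-based routing rule above.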
### 6.4 Mixed-Precision Arithmetic Implications
| Precision | Unit Roundoff | Max Useful ε | Storage Savings | SpMV Speedup |
|-----------|-------------|-------------|----------------|-------------|
| f64 | 1.1 × 10⁻¹⁶ | 1e-12 | 1x (baseline) | 1x |
| f32 | 5.96 × 10⁻⁸ | 1e-5 | 2x | 2x (SIMD width doubles) |
| f16 | 4.88 × 10⁻⁴ | 1e-2 | 4x | 4x |
| bf16 | 3.91 × 10⁻³ | 1e-1 | 4x | 4x |
**Recommendation**: Use f32 storage with f64 accumulation for CG when κ > 100. Use pure f32 for Neumann and Push (tolerance floor 1e-5). Mixed f16/f32 only for inference-time operations with ε > 0.01.
### 6.5 Error Budget Allocation Strategy
For a pipeline with k stages and total budget ε_total:
**Uniform allocation**: ε_i = ε_total / k — simple but suboptimal.
**Cost-weighted allocation**: Allocate more budget to expensive stages:
```
ε_i = ε_total · √(cost_i) / Σ_j √(cost_j)
```
This minimizes the total compute cost Σ_i (cost_i / ε_i) subject to the Σ_i ε_i = ε_total constraint, assuming each stage's cost scales inversely with its tolerance ε_i.
**Adaptive allocation** (implemented in SONA): Start with uniform, then reallocate based on observed per-stage error utilization. If stage i consistently uses only 50% of its budget, redistribute the unused portion.
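One common cost-weighted scheme sets ε_i ∝ √(cost_i), which is optimal when each stage's cost scales as cost_i / ε_i; a sketch:

```rust
/// Split a total error budget across pipeline stages proportionally to
/// √(cost_i), giving expensive stages a looser per-stage tolerance.
fn allocate_budget(costs: &[f64], eps_total: f64) -> Vec<f64> {
    let weights: Vec<f64> = costs.iter().map(|c| c.sqrt()).collect();
    let norm: f64 = weights.iter().sum();
    weights.iter().map(|w| eps_total * w / norm).collect()
}
```

A stage that is 16x more expensive than another receives 4x its error budget, and the allocations always sum back to ε_total; the adaptive scheme then redistributes unused budget on top of this starting point.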
---
## 7. Hardware Evolution Impact (2024-2028)
### 7.1 Apple M4 Pro/Max Unified Memory
- **192KB L1 / 16MB L2 / 48MB L3**: Larger caches improve SpMV for matrices up to ~4M nonzeros entirely in L3
- **Unified memory architecture**: No PCIe bottleneck for GPU offload; AMX coprocessor shares same memory pool
- **Impact**: Solver working sets up to 48MB stay in L3 (previously 16MB on M2). Tiling thresholds shift upward. Expected 20-30% improvement for n=10K-100K problems.
### 7.2 AMD Zen 5 (Turin) AVX-512
- **Full-width AVX-512** (512-bit): 16 f32 per vector operation (vs 8 for AVX2)
- **Improved gather**: Zen 5 gather throughput ~2x Zen 4, reducing SpMV gather bottleneck
- **Impact**: SpMV throughput increases from ~250M nonzeros/s (AVX2) to ~450M nonzeros/s (AVX-512). CG and Neumann benefit proportionally.
### 7.3 ARM SVE/SVE2 (Variable-Width SIMD)
- **Scalable Vector Extension**: Vector length agnostic code (128-2048 bit)
- **Predicated execution**: Native support for variable-length row processing (no scalar remainder loop)
- **Gather/scatter**: SVE2 adds efficient hardware gather comparable to AVX-512
- **Impact**: Single SIMD kernel works across ARM implementations. SpMV kernel simplification: no per-architecture width specialization needed. Expected availability in server ARM (Neoverse V3+) and future Apple Silicon.
### 7.4 RISC-V Vector Extension (RVV 1.0)
- **Status**: RVV 1.0 ratified; hardware shipping (SiFive P870, SpacemiT K1)
- **Variable-length vectors**: Similar to SVE, length-agnostic programming model
- **Gather support**: Indexed load instructions with configurable element width
- **Impact on RuVector**: Future WASM target (RISC-V + WASM is a growing embedded/edge deployment). Solver should plan for RVV SIMD backend in P3 timeline. LLVM auto-vectorization for RVV is maturing rapidly.
### 7.5 CXL Memory Expansion
- **Compute Express Link**: Adds disaggregated memory beyond DRAM capacity
- **CXL 3.0**: Shared memory pools across multiple hosts
- **Latency**: ~150-300ns (vs ~80ns DRAM), acceptable for large-matrix SpMV
- **Impact**: Enables n > 10M problems on single-socket servers. Memory-mapped CSR on CXL has 2-3x latency penalty but removes the memory wall. Tiling strategy adjusts: treat CXL as a faster tier than disk but slower than DRAM.
### 7.6 Neuromorphic and Analog Computing
- **Intel Loihi 2**: Spiking neural network chip with native random walk acceleration
- **Analog matrix multiply**: Emerging memristor crossbar arrays for O(1) SpMV
- **Impact on RuVector**: Long-term (2028+). Random walk algorithms (Hybrid RW) are natural fits for neuromorphic hardware. Analog SpMV could reduce CG iteration cost to O(n) regardless of nnz. Currently speculative; no production-ready integration path.
---
## 8. Competitive Landscape
### 8.1 RuVector+Solver vs Vector Database Competition
| Capability | RuVector+Solver | Pinecone | Weaviate | Milvus | Qdrant | ChromaDB | Vald | LanceDB |
|-----------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Sublinear Laplacian solve | O(log n) | - | - | - | - | - | - | - |
| Graph PageRank | O(1/ε) | - | - | - | - | - | - | - |
| Spectral sparsification | O(m log n/ε²) | - | - | - | - | - | - | - |
| Integrated GNN | Yes (5 layers) | - | - | - | - | - | - | - |
| WASM deployment | Yes | - | - | - | - | - | - | Yes |
| Dynamic min-cut | O(n^{o(1)}) | - | - | - | - | - | - | - |
| Coherence engine | Yes (sheaf) | - | - | - | - | - | - | - |
| MCP tool integration | Yes (40+ tools) | - | - | - | - | - | - | - |
| Post-quantum crypto | Yes (rvf-crypto) | - | - | - | - | - | - | - |
| Quantum algorithms | Yes (ruQu) | - | - | - | - | - | - | - |
| Self-learning (SONA) | Yes | - | Partial | - | - | - | - | - |
| Sparse linear algebra | 7 algorithms | - | - | - | - | - | - | - |
| Multi-platform SIMD | AVX-512/NEON/WASM | - | - | AVX2 | AVX2 | - | - | - |
### 8.2 Academic Graph Processing Systems
| System | Solver Integration | Sublinear Algorithms | Language | Production Ready |
|--------|-------------------|---------------------|----------|-----------------|
| **GraphBLAS** (SuiteSparse) | SpMV only | No sublinear solvers | C | Yes |
| **Galois** (UT Austin) | None | Local graph algorithms | C++ | Research |
| **Ligra** (MIT) | None | Semi-external memory | C++ | Research |
| **PowerGraph** (CMU) | None | Pregel-style only | C++ | Deprecated |
| **NetworKit** | Algebraic multigrid | Partial (local clustering) | C++/Python | Yes |
| **RuVector+Solver** | Full 7-algorithm suite | Yes (all categories) | Rust | In development |
**Key differentiator**: GraphBLAS provides SpMV but not solver-level operations. NetworKit has algebraic multigrid but no JL projection, random walk solvers, or WASM deployment. No academic system combines all seven algorithm families with production-grade multi-platform deployment.
### 8.3 Specialized Solver Libraries
| Library | Algorithms | Language | WASM | Key Limitation for RuVector |
|---------|-----------|----------|------|---------------------------|
| **LAMG** (Lean AMG) | Algebraic multigrid | MATLAB/C | No | MATLAB dependency, no Rust FFI |
| **PETSc** | CG, GMRES, AMG, etc. | C/Fortran | No | Heavy dependency (MPI), not embeddable |
| **Eigen** | CG, BiCGSTAB, SimplicialLDLT | C++ | Partial | C++ FFI complexity, no Push/Walk |
| **nalgebra** (Rust) | Dense LU/QR/SVD | Rust | Yes | No sparse solvers, no sublinear algorithms |
| **sprs** (Rust) | CSR/CSC format | Rust | Yes | Format only, no solvers |
| **Solver Library** | All 7 algorithms | Rust | Yes | Target integration (this project) |
### 8.4 Adoption Risk from Competitors
**Low risk** (next 2 years): The 7-algorithm solver suite requires deep expertise in randomized linear algebra, spectral graph theory, and SIMD optimization. No vector database competitor has signaled investment in this direction.
**Medium risk** (2-4 years): Academic libraries (GraphBLAS, NetworKit) could add similar capabilities. However, multi-platform deployment (WASM, NAPI, MCP) remains a significant engineering barrier.
**Mitigation**: First-mover advantage plus deep integration into 6 subsystems creates switching costs. SONA adaptive routing learns workload-specific optimizations that a drop-in replacement cannot replicate.
---
## 9. Open Research Questions
Relevant to RuVector's future development:
1. **Practical nearly-linear Laplacian solvers**: Can CKMPPRX's O(m · √(log n)) be implemented with constants competitive with CG for n < 10M?
2. **Dynamic spectral sparsification**: Can the sparsifier be maintained under edge updates in polylog time, enabling real-time TRUE preprocessing?
3. **Sublinear attention**: Can PDE-based attention be computed in O(n · polylog(n)) for arbitrary attention patterns, not just sparse Laplacian structure?
4. **Quantum advantage for sparse systems**: Does quantum walk-based Laplacian solving (HHL algorithm) provide practical speedup over classical CG at achievable qubit counts (100-1000)?
5. **Distributed sublinear algorithms**: Can Forward Push and Hybrid Random Walk be efficiently distributed across ruvector-cluster's sharded graph?
6. **Adaptive sparsity detection**: Can SONA learn to predict matrix sparsity patterns from historical queries, enabling pre-computed sparsifiers?
7. **Error-optimal algorithm composition**: What is the information-theoretically optimal error allocation across a pipeline of k approximate algorithms?
8. **Hardware-aware routing**: Can the algorithm router exploit specific SIMD width, cache size, and memory bandwidth to make per-hardware-generation routing decisions?
9. **Streaming sublinear solving**: Can Laplacian solvers operate on streaming edge updates without full matrix reconstruction?
10. **Sublinear Fisher Information**: Can the Fisher Information Matrix for EWC be approximated in sublinear time, enabling faster continual learning?
---
## 10. Research Integration Roadmap
### Short-Term (6 months)
| Research Result | Integration Target | Expected Impact | Effort |
|----------------|-------------------|-----------------|--------|
| Spectral density estimation | Algorithm router (condition number) | 5-10x faster routing decisions | Medium |
| Faster effective resistance | TRUE sparsification quality | 2-3x faster preprocessing | Medium |
| Streaming JL sketches | Incremental TRUE updates | Real-time sparsifier maintenance | High |
| Mixed-precision CG | f32/f64 hybrid solver | 2x memory reduction, ~1.5x speedup | Low |
### Medium-Term (1 year)
| Research Result | Integration Target | Expected Impact | Effort |
|----------------|-------------------|-----------------|--------|
| Distributed Laplacian solvers | ruvector-cluster scaling | n > 1M node support | Very High |
| SVE/SVE2 SIMD backend | ARM server deployment | Single kernel across ARM chips | Medium |
| Sublinear GNN layers | ruvector-gnn acceleration | 10-50x GNN inference speedup | High |
| Neural network sparse attention | ruvector-attention PDE mode | New attention mechanism | High |
### Long-Term (2-3 years)
| Research Result | Integration Target | Expected Impact | Effort |
|----------------|-------------------|-----------------|--------|
| CKMPPRX practical implementation | Replace BMSSP for Laplacians | O(m · √(log n)) solving | Expert |
| Quantum-classical hybrid | ruQu integration | Potential quantum advantage for κ > 10⁶ | Research |
| Neuromorphic random walks | Specialized hardware backend | Orders-of-magnitude random walk speedup | Research |
| CXL memory tier | Large-scale matrix storage | 10M+ node problems on commodity hardware | Medium |
| Analog SpMV accelerator | Hardware-accelerated CG | O(1) matrix-vector products | Speculative |
---
## 11. Bibliography
1. Spielman, D.A., Teng, S.-H. (2004). "Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems." STOC 2004.
2. Koutis, I., Miller, G.L., Peng, R. (2011). "A Nearly-m log n Time Solver for SDD Linear Systems." FOCS 2011.
3. Cohen, M.B., Kyng, R., Miller, G.L., Pachocki, J.W., Peng, R., Rao, A.B., Xu, S.C. (2014). "Solving SDD Linear Systems in Nearly m log^{1/2} n Time." STOC 2014.
4. Kyng, R., Sachdeva, S. (2016). "Approximate Gaussian Elimination for Laplacians." FOCS 2016.
5. Chen, L., Kyng, R., Liu, Y.P., Peng, R., Gutenberg, M.P., Sachdeva, S. (2022). "Maximum Flow and Minimum-Cost Flow in Almost-Linear Time." FOCS 2022. arXiv:2203.00671.
6. Andersen, R., Chung, F., Lang, K. (2006). "Local Graph Partitioning using PageRank Vectors." FOCS 2006.
7. Lofgren, P., Banerjee, S., Goel, A., Seshadhri, C. (2014). "FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs." KDD 2014.
8. Spielman, D.A., Srivastava, N. (2011). "Graph Sparsification by Effective Resistances." SIAM J. Comput.
9. Benczur, A.A., Karger, D.R. (2015). "Randomized Approximation Schemes for Cuts and Flows in Capacitated Graphs." SIAM J. Comput.
10. Johnson, W.B., Lindenstrauss, J. (1984). "Extensions of Lipschitz mappings into a Hilbert space." Contemporary Mathematics.
11. Larsen, K.G., Nelson, J. (2017). "Optimality of the Johnson-Lindenstrauss Lemma." FOCS 2017.
12. Tang, E. (2019). "A Quantum-Inspired Classical Algorithm for Recommendation Systems." STOC 2019.
13. Hestenes, M.R., Stiefel, E. (1952). "Methods of Conjugate Gradients for Solving Linear Systems." J. Res. Nat. Bur. Standards.
14. Kirkpatrick, J., et al. (2017). "Overcoming catastrophic forgetting in neural networks." PNAS.
15. Hamilton, W.L., Ying, R., Leskovec, J. (2017). "Inductive Representation Learning on Large Graphs." NeurIPS 2017.
16. Cuturi, M. (2013). "Sinkhorn Distances: Lightspeed Computation of Optimal Transport." NeurIPS 2013.
17. arXiv:2512.13105 (2024). "Subpolynomial-Time Dynamic Minimum Cut."
18. Defferrard, M., Bresson, X., Vandergheynst, P. (2016). "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering." NeurIPS 2016.
19. Shewchuk, J.R. (1994). "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain." Technical Report.
20. Briggs, W.L., Henson, V.E., McCormick, S.F. (2000). "A Multigrid Tutorial." SIAM.
21. Martinsson, P.G., Tropp, J.A. (2020). "Randomized Numerical Linear Algebra: Foundations and Algorithms." Acta Numerica.
22. Musco, C., Musco, C. (2024). "Sublinear Spectral Density Estimation." STOC 2024.
23. Durfee, D., Kyng, R., Peebles, J., Rao, A.B., Sachdeva, S. (2017). "Sampling Random Spanning Trees Faster than Matrix Multiplication." STOC 2017.
24. Nakatsukasa, Y., Tropp, J.A. (2024). "Fast and Accurate Randomized Algorithms for Linear Algebra and Eigenvalue Problems." Found. Comput. Math.
25. Liberty, E. (2013). "Simple and Deterministic Matrix Sketching." KDD 2013.
26. Kitaev, N., Kaiser, L., Levskaya, A. (2020). "Reformer: The Efficient Transformer." ICLR 2020.
27. Galhotra, S., Mazumdar, A., Pal, S., Rajaraman, R. (2024). "Distributed Laplacian Solvers via Communication-Efficient Iterative Methods." PODC 2024.
28. Cohen, M.B., Nelson, J., Woodruff, D.P. (2016). "Optimal Approximate Matrix Product in Terms of Stable Rank." ICALP 2016.
29. Nemirovski, A., Yudin, D. (1983). "Problem Complexity and Method Efficiency in Optimization." Wiley.
30. Clarkson, K.L., Woodruff, D.P. (2017). "Low-Rank Approximation and Regression in Input Sparsity Time." J. ACM.
---
## 13. Implementation Realization
All seven algorithms identified in the practical subset (Section 5) have been fully implemented in the `ruvector-solver` crate. The following table maps each SOTA algorithm to its implementation module, current status, and test coverage.
### 13.1 Algorithm-to-Module Mapping
| Algorithm | Module | LOC | Tests | Status |
|-----------|--------|-----|-------|--------|
| Neumann Series | `neumann.rs` | 715 | 18 unit + 5 integration | Complete, Jacobi-preconditioned |
| Conjugate Gradient | `cg.rs` | 1,112 | 24 unit + 5 integration | Complete |
| Forward Push | `forward_push.rs` | 828 | 17 unit + 6 integration | Complete |
| Backward Push | `backward_push.rs` | 714 | 14 unit | Complete |
| Hybrid Random Walk | `random_walk.rs` | 838 | 22 unit | Complete |
| TRUE | `true_solver.rs` | 908 | 18 unit | Complete (JL + sparsify + Neumann) |
| BMSSP | `bmssp.rs` | 1,151 | 16 unit | Complete (multigrid) |
**Supporting Infrastructure**:
| Module | LOC | Tests | Purpose |
|--------|-----|-------|---------|
| `router.rs` | 1,702 | 24+4 | Adaptive algorithm selection with SONA compatibility |
| `types.rs` | 600 | 8 | CsrMatrix, SpMV, SparsityProfile, convergence types |
| `validation.rs` | 790 | 34+5 | Input validation at system boundary |
| `audit.rs` | 316 | 8 | SHAKE-256 witness chain audit trail |
| `budget.rs` | 310 | 9 | Compute budget enforcement |
| `arena.rs` | 176 | 2 | Cache-aligned arena allocator |
| `simd.rs` | 162 | 2 | SIMD abstraction (AVX-512/AVX2/NEON/WASM SIMD128) |
| `error.rs` | 120 | — | Structured error hierarchy |
| `events.rs` | 86 | — | Event sourcing for state changes |
| `traits.rs` | 138 | — | Solver trait definitions |
| `lib.rs` | 63 | — | Public API re-exports |
**Totals**: 10,729 LOC across 18 source files, 241 `#[test]` functions across 19 test files.
### 13.2 Fused Kernels
`spmv_unchecked` strips per-element bounds checks from the SpMV inner loop, and `fused_residual_norm_sq` combines the residual computation and norm accumulation into a single pass, turning three separate memory traversals (SpMV, subtraction, norm) into one. Together they reduce per-iteration overhead by 15-30%.
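As a minimal sketch of the fusion (safe CSR indexing here; the crate's actual kernel additionally elides bounds checks via raw pointers, and the signature below is illustrative, not the crate's API):

```rust
/// One-pass residual-and-norm: writes r = b - A*x and returns ||r||^2.
/// Sketch only -- the production `fused_residual_norm_sq` also removes
/// bounds checks; this version keeps them for clarity.
fn fused_residual_norm_sq(
    row_ptrs: &[usize],
    col_indices: &[u32],
    values: &[f32],
    x: &[f32],
    b: &[f32],
    r: &mut [f32],
) -> f64 {
    let mut norm_sq = 0.0f64;
    for i in 0..b.len() {
        // SpMV row product (would be pass 1 of the naive three-pass version)
        let mut ax = 0.0f32;
        for j in row_ptrs[i]..row_ptrs[i + 1] {
            ax += values[j] * x[col_indices[j] as usize];
        }
        // Residual write and norm accumulation share the same traversal
        let ri = b[i] - ax;
        r[i] = ri;
        norm_sq += (ri as f64) * (ri as f64);
    }
    norm_sq
}
```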
### 13.3 WASM and NAPI Bindings
All algorithms are available in browser via `wasm-bindgen`. The WASM build includes SIMD128 acceleration for SpMV and exposes the full solver API (CG, Neumann, Forward Push, Backward Push, Hybrid Random Walk, TRUE, BMSSP) through JavaScript-friendly bindings. NAPI bindings provide native Node.js integration for server-side workloads without the overhead of WASM interpretation.
### 13.4 Cross-Document Implementation Verification
All research documents in the sublinear-time-solver series now have implementation traceability:
| Document | ID | Status | Key Implementations |
|----------|-----|--------|-------------------|
| 00 Executive Summary | — | Updated | Overview of 10,729 LOC solver |
| 01-14 Integration Analyses | — | Complete | Architecture, WASM, MCP, performance |
| 15 Fifty-Year Vision | ADR-STS-VISION-001 | Implemented (Phase 1) | 10/10 vectors mapped to artifacts |
| 16 DNA Convergence | ADR-STS-DNA-001 | Implemented | 7/7 convergence points solver-ready |
| 17 Quantum Convergence | ADR-STS-QUANTUM-001 | Implemented | 8/8 convergence points solver-ready |
| 18 AGI Optimization | ADR-STS-AGI-001 | Implemented | All quantitative targets tracked |
| ADR-STS-001 to 010 | — | Accepted, Implemented | Full ADR series complete |
| DDD Strategic Design | — | Complete | Bounded contexts defined |
| DDD Tactical Design | — | Complete | Aggregates and entities |
| DDD Integration Patterns | — | Complete | Anti-corruption layers |

---
# Optimization Guide: Sublinear-Time Solver Integration
**Date**: 2026-02-20
**Classification**: Engineering Reference
**Scope**: Performance optimization strategies for solver integration
**Version**: 2.0 (Optimizations Realized)
---
## 1. Executive Summary
This guide provides concrete optimization strategies for achieving maximum performance from the sublinear-time-solver integration into RuVector. Targets: 10-600x speedups across 6 critical subsystems while maintaining <2% accuracy loss. Organized by optimization tier: SIMD → Memory → Algorithm → Numerical → Concurrency → WASM → Profiling → Compilation → Platform.
---
## 2. SIMD Optimization Strategy
### 2.1 Architecture-Specific Kernels
The solver's hot path is SpMV (sparse matrix-vector multiply). Each architecture requires a dedicated kernel:
| Architecture | SIMD Width | f32/iteration | Key Instruction | Expected SpMV Throughput |
|-------------|-----------|--------------|-----------------|-------------------------|
| AVX-512 | 512-bit | 16 | `_mm512_i32gather_ps` | ~400M nonzeros/s |
| AVX2+FMA | 256-bit | 8×4 unrolled | `_mm256_i32gather_ps` + `_mm256_fmadd_ps` | ~250M nonzeros/s |
| NEON | 128-bit | 4×4 unrolled | Manual gather + `vfmaq_f32` | ~150M nonzeros/s |
| WASM SIMD128 | 128-bit | 4 | `f32x4_mul` + `f32x4_add` | ~80M nonzeros/s |
| Scalar | 32-bit | 1 | `fmaf` | ~40M nonzeros/s |
### 2.2 SpMV Kernels
**AVX2+FMA SpMV with gather** (primary kernel):
```
for each row i:
acc = _mm256_setzero_ps()
for j in row_ptrs[i]..row_ptrs[i+1] step 8:
indices = _mm256_loadu_si256(&col_indices[j])
vals = _mm256_loadu_ps(&values[j])
x_gathered = _mm256_i32gather_ps(x_ptr, indices, 4)
acc = _mm256_fmadd_ps(vals, x_gathered, acc)
y[i] = horizontal_sum(acc) + scalar_remainder
```
**AVX-512 SpMV with masking** (for variable-length rows):
```
for each row i:
acc = _mm512_setzero_ps()
len = row_ptrs[i+1] - row_ptrs[i]
full_chunks = len / 16
remainder = len % 16
for j in 0..full_chunks:
base = row_ptrs[i] + j * 16
idx = _mm512_loadu_si512(&col_indices[base])
v = _mm512_loadu_ps(&values[base])
x = _mm512_i32gather_ps(idx, x_ptr, 4)
acc = _mm512_fmadd_ps(v, x, acc)
if remainder > 0:
mask = (1 << remainder) - 1
base = row_ptrs[i] + full_chunks * 16
idx = _mm512_maskz_loadu_epi32(mask, &col_indices[base])
v = _mm512_maskz_loadu_ps(mask, &values[base])
x = _mm512_mask_i32gather_ps(zeros, mask, idx, x_ptr, 4)
acc = _mm512_fmadd_ps(v, x, acc)
y[i] = _mm512_reduce_add_ps(acc)
```
**WASM SIMD128 SpMV kernel**:
```
for each row i:
acc = f32x4_splat(0.0)
for j in row_ptrs[i]..row_ptrs[i+1] step 4:
x_vec = f32x4(x[col_indices[j]], x[col_indices[j+1]],
x[col_indices[j+2]], x[col_indices[j+3]])
v = v128_load(&values[j])
acc = f32x4_add(acc, f32x4_mul(v, x_vec))
y[i] = horizontal_sum_f32x4(acc) + scalar_remainder
```
**Vectorized PRNG** (for Hybrid Random Walk):
```
state[4][4] = initialize_from_seed()
for each walk:
random = xoshiro256_simd(state) // 4 random values per call
next_node = random % degree[current_node]
```
### 2.3 Auto-Vectorization Guidelines
1. **Sequential access**: Iterate arrays in order (no random access in inner loop)
2. **No branches**: Use `select`/`blend` instead of `if` in hot loops
3. **Independent accumulators**: 4 separate sums, combine at end
4. **Aligned data**: Use `#[repr(align(64))]` on hot data structures
5. **Known bounds**: Use `get_unchecked()` after external bounds check
6. **Compiler hints**: `#[inline(always)]` on hot functions, `#[cold]` on error paths
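Guideline 3 (independent accumulators) in a form the auto-vectorizer handles well — a sketch, not the crate's kernel:

```rust
/// Dot product with four independent accumulators: the compiler can keep
/// four multiply-add chains in flight instead of serializing on one sum.
fn dot_unrolled4(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = a.len() / 4;
    for c in 0..chunks {
        for lane in 0..4 {
            let i = c * 4 + lane;
            acc[lane] += a[i] * b[i];
        }
    }
    // Combine the partial sums once, then mop up the scalar remainder
    let mut sum = acc[0] + acc[1] + acc[2] + acc[3];
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```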
### 2.4 Throughput Formulas
SpMV throughput is bounded by memory bandwidth:
```
Throughput = min(BW_memory / 8, FLOPS_peak / 2) nonzeros/s
```
Where 8 = bytes/nonzero (4B value + 4B index), 2 = FLOPs/nonzero (mul + add).
SpMV is almost always memory-bandwidth-bound. SIMD reduces instruction count but memory throughput is the fundamental limit.
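The bound can be evaluated directly; units follow the formula above (bytes/s and FLOPS in, nonzeros/s out):

```rust
/// Roofline-style SpMV bound: min of the bandwidth limit (8 bytes per
/// nonzero) and the compute limit (2 FLOPs per nonzero).
fn spmv_throughput_bound(bw_bytes_per_s: f64, peak_flops: f64) -> f64 {
    (bw_bytes_per_s / 8.0).min(peak_flops / 2.0)
}
```

For an 80 GB/s, 1 TFLOPS server this yields 10G nonzeros/s from the bandwidth term against 500G from the compute term — bandwidth binds by 50x, which is why the memory limit dominates.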
---
## 3. Memory Optimization
### 3.1 Cache-Aware Tiling
| Working Set | Cache Level | Performance | Strategy |
|------------|------------|-------------|---------|
| < 48 KB | L1 (M4 Pro: 192KB/perf) | Peak (100%) | Direct iteration, no tiling |
| < 256 KB | L2 | 80-90% of peak | Single-pass with prefetch |
| < 16 MB | L3 | 50-70% of peak | Row-block tiling |
| > 16 MB | DRAM | 20-40% of peak | Page-level tiling + prefetch |
| > available RAM | Disk | 1-5% of peak | Memory-mapped streaming |
**Tiling formula**: `TILE_ROWS = L3_SIZE / (avg_row_nnz × 12 bytes)`
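The tiling formula as code (the 12 bytes/nonzero figure is the document's; presumably 4B value + 4B index + amortized x-vector traffic):

```rust
/// Rows per tile so one tile's nonzeros stay resident in L3.
fn tile_rows(l3_bytes: usize, avg_row_nnz: usize) -> usize {
    (l3_bytes / (avg_row_nnz * 12)).max(1)
}
```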
### 3.2 Prefetch Strategy
```rust
// Software prefetch for SpMV x-vector access
for row in 0..n {
if row + 1 < n {
let next_start = row_ptrs[row + 1];
for j in next_start..(next_start + 8).min(row_ptrs[row + 2]) {
prefetch_read_l2(&x[col_indices[j] as usize]);
}
}
process_row(row);
}
```
Prefetch distance: L1 = 64 bytes ahead, L2 = 256 bytes ahead.
### 3.3 Arena Allocator Integration
```rust
// Before: ~20μs overhead per solve
let r = vec![0.0f32; n]; let p = vec![0.0f32; n]; let ap = vec![0.0f32; n];
// After: ~0.2μs overhead per solve
let mut arena = SolverArena::with_capacity(n * 12);
let r = arena.alloc_slice::<f32>(n);
let p = arena.alloc_slice::<f32>(n);
let ap = arena.alloc_slice::<f32>(n);
arena.reset();
```
### 3.4 Cache Line Alignment
```rust
#[repr(C, align(64))]
struct SolverScratch<const N: usize> { r: [f32; N], p: [f32; N], ap: [f32; N] }
#[repr(C, align(128))] // Prevent false sharing in parallel stats
struct ThreadStats { iterations: u64, residual: f64, _pad: [u8; 112] }
```
### 3.5 Memory-Mapped Large Matrices
```rust
let mmap = unsafe { memmap2::Mmap::map(&file)? };
let values: &[f32] = bytemuck::cast_slice(&mmap[header_size..]);
```
### 3.6 Zero-Copy Data Paths
| Path | Mechanism | Overhead |
|------|-----------|----------|
| SoA → Solver | `&[f32]` borrow | 0 bytes |
| HNSW → CSR | Direct construction | O(n×M) one-time |
| Solver → WASM | `Float32Array::view()` | 0 bytes |
| Solver → NAPI | `napi::Buffer` | 0 bytes |
| Solver → REST | `serde_json::to_writer` | 1 serialization |
---
## 4. Algorithmic Optimization
### 4.1 Preconditioning Strategies
| Preconditioner | Setup Cost | Per-Iteration Cost | Condition Improvement | Best For |
|---------------|-----------|-------------------|----------------------|----------|
| None | 0 | 0 | 1x | Well-conditioned (κ < 10) |
| Diagonal (Jacobi) | O(n) | O(n) | √(d_max/d_min) | General SPD |
| Incomplete Cholesky | O(nnz) | O(nnz) | 10-100x | Moderately ill-conditioned |
| Algebraic Multigrid | O(nnz·log n) | O(nnz) | Near-optimal for Laplacians | κ > 100 |
**Default**: Diagonal preconditioner. Escalate to AMG when κ > 100 and n > 50K.
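A minimal sketch of applying the default diagonal preconditioner (`apply_jacobi` is an illustrative name, not the crate's API):

```rust
/// z = D^{-1} r -- the O(n) per-iteration cost from the table above.
fn apply_jacobi(diag: &[f32], r: &[f32], z: &mut [f32]) {
    for i in 0..r.len() {
        // Zero diagonals are rejected upstream by input validation;
        // the fallback here just keeps the sketch total.
        z[i] = if diag[i] != 0.0 { r[i] / diag[i] } else { r[i] };
    }
}
```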
### 4.2 Sparsity Exploitation
```rust
fn select_path(matrix: &CsrMatrix<f32>) -> ComputePath {
let density = matrix.density();
if density > 0.50 { ComputePath::Dense }
else if density > 0.05 { ComputePath::Sparse }
else { ComputePath::Sublinear }
}
```
### 4.3 Batch Amortization
| Preprocessing Cost | Per-Solve Cost | Break-Even B |
|-------------------|---------------|-------------|
| 425 ms (n=100K, 1%) | 0.43 ms (ε=0.1) | 634 solves |
| 42 ms (n=10K, 1%) | 0.04 ms (ε=0.1) | 63 solves |
| 4 ms (n=1K, 1%) | 0.004 ms (ε=0.1) | 6 solves |
### 4.4 Lazy Evaluation
```rust
let x_ij = solver.estimate_entry(A, i, j)?; // O(√n/ε) via random walk
// vs full solve O(nnz × iterations). Speedup = √n for n=1M → 1000x
```
---
## 5. Numerical Optimization
### 5.1 Kahan Summation for SpMV
```rust
fn spmv_row_kahan(vals: &[f32], cols: &[u32], x: &[f32]) -> f32 {
let mut sum: f64 = 0.0;
let mut comp: f64 = 0.0;
for i in 0..vals.len() {
let y = (vals[i] as f64) * (x[cols[i] as usize] as f64) - comp;
let t = sum + y;
comp = (t - sum) - y;
sum = t;
}
sum as f32
}
```
Use when rows exceed ~1000 nonzeros or ε < 1e-6. Overhead: ~2x over a plain f32 loop. A plain f64 accumulator without compensation is a cheaper middle ground; the version shown combines both for maximum robustness.
### 5.2 Mixed Precision Strategy
| Precision Mode | Storage | Accumulation | Max ε | Memory | SpMV Speed |
|---------------|---------|-------------|-------|--------|-----------|
| Pure f32 | f32 | f32 | 1e-4 | 1x | 1x (fastest) |
| **Default** (f32/f64) | f32 | f64 | 1e-7 | 1x | 0.95x |
| Pure f64 | f64 | f64 | 1e-12 | 2x | 0.5x |
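The default row from the table — f32 storage, f64 accumulation — in sketch form:

```rust
/// Mixed-precision dot product: operands stay f32 (1x memory footprint),
/// the running sum is widened to f64 to reach ~1e-7 tolerances.
fn dot_f32_acc_f64(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x as f64) * (y as f64))
        .sum()
}
```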
### 5.3 Condition Number Estimation
Fast κ estimation via power iteration (20 iterations × 2 SpMVs = O(40 × nnz)):
```rust
fn estimate_kappa(A: &CsrMatrix<f32>) -> f64 {
let lambda_max = power_iteration(A, 20);
let lambda_min = inverse_power_iteration_cg(A, 20);
lambda_max / lambda_min
}
```
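The `power_iteration` helper referenced above, sketched densely for clarity; the solver would run the same recurrence through its CSR SpMV kernel, and the dense signature here is an assumption of this sketch:

```rust
/// Dominant-eigenvalue estimate via power iteration on a dense
/// symmetric matrix: repeatedly apply A and renormalize.
fn power_iteration(a: &[Vec<f32>], iters: usize) -> f64 {
    let n = a.len();
    let mut v = vec![1.0f64; n];
    let mut lambda = 0.0f64;
    for _ in 0..iters {
        // w = A v
        let w: Vec<f64> = a
            .iter()
            .map(|row| row.iter().zip(&v).map(|(&aij, &vj)| aij as f64 * vj).sum())
            .collect();
        // ||w|| estimates |lambda_max| once v aligns with the top eigenvector
        lambda = w.iter().map(|x| x * x).sum::<f64>().sqrt();
        for (vi, wi) in v.iter_mut().zip(&w) {
            *vi = wi / lambda;
        }
    }
    lambda
}
```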
### 5.4 Spectral Radius for Neumann
Estimate ρ(I-A) via 20-step power iteration. Rules:
- ρ < 0.9: Neumann converges fast (< 50 iterations for ε=0.01)
- 0.9 ≤ ρ < 0.99: Neumann slow, consider CG
- ρ ≥ 0.99: Switch to CG (Neumann needs > 460 iterations)
- ρ ≥ 1.0: Neumann diverges — CG/BMSSP mandatory
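Those dispatch rules can be encoded directly; the enum below is hypothetical for illustration, while the production `router.rs` folds this into a richer matrix characterization:

```rust
/// Spectral-radius-based dispatch for rho(I - A), per the rules above.
#[derive(Debug, PartialEq)]
enum SolverChoice {
    NeumannFast, // rho < 0.9: converges in < 50 iterations at eps = 0.01
    NeumannOrCg, // 0.9 <= rho < 0.99: Neumann slow, consider CG
    Cg,          // rho >= 0.99; mandatory once rho >= 1.0 (Neumann diverges)
}

fn choose_by_spectral_radius(rho: f64) -> SolverChoice {
    if rho < 0.9 {
        SolverChoice::NeumannFast
    } else if rho < 0.99 {
        SolverChoice::NeumannOrCg
    } else {
        SolverChoice::Cg
    }
}
```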
---
## 6. WASM-Specific Optimization
### 6.1 Memory Growth Strategy
Pre-allocate: `pages = ceil(n × avg_nnz × 12 / 65536) + 32`. Growth during solving costs ~1ms per grow.
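The pre-allocation formula as code (64 KiB WASM pages; the 12 bytes/nonzero and +32-page headroom come from the formula above):

```rust
/// Initial linear-memory pages for an n-row CSR problem at ~12 bytes
/// per nonzero, rounded up to whole 64 KiB pages plus 32 spare pages.
fn initial_wasm_pages(n: usize, avg_nnz: usize) -> usize {
    let bytes = n * avg_nnz * 12;
    (bytes + 65535) / 65536 + 32
}
```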
### 6.2 wasm-opt Configuration
```bash
wasm-opt -O3 --enable-simd --enable-bulk-memory \
--precompute-propagate --optimize-instructions \
--reorder-functions --coalesce-locals --vacuum \
pkg/solver_bg.wasm -o pkg/solver_bg_opt.wasm
```
Expected: 15-25% size reduction, 5-10% speed improvement.
### 6.3 Worker Thread Optimization
Use Transferable objects (zero-copy move) or SharedArrayBuffer (zero-copy share):
```javascript
worker.postMessage({ type: 'solve', matrix: values.buffer },
[values.buffer]); // Transfer list — moves, doesn't copy
```
### 6.4 Bundle Size Budget
| Component | Size (gzipped) | Budget |
|-----------|---------------|--------|
| Solver core (CG + Neumann + Push) | ~80 KB | 100 KB |
| SIMD128 kernels | ~15 KB | 20 KB |
| wasm-bindgen glue | ~10 KB | 15 KB |
| serde-wasm-bindgen | ~20 KB | 25 KB |
| **Total** | **~125 KB** | **160 KB** |
---
## 7. Profiling Methodology
### 7.1 Performance Counter Analysis
```bash
perf stat -e cycles,instructions,cache-references,cache-misses,\
L1-dcache-load-misses,LLC-load-misses ./target/release/bench_spmv
```
Expected good SpMV profile: IPC 2.0-3.0, L1 miss 5-15%, LLC miss < 1%, branch miss < 1%.
### 7.2 Hot Spot Identification
```bash
perf record -g --call-graph dwarf ./target/release/bench_solver
perf script | stackcollapse-perf.pl | flamegraph.pl > solver_flame.svg
```
Expected: 60-80% in spmv_*, 10-15% in dot/norm, < 5% in allocation.
### 7.3 Roofline Model
SpMV arithmetic intensity = 0.167 FLOP/byte. On 80 GB/s server: achievable = 13.3 GFLOPS (1.3% of 1 TFLOPS peak). SpMV is deeply memory-bound — optimize for memory traffic reduction, not FLOPS.
### 7.4 Criterion.rs Best Practices
```rust
group.warm_up_time(Duration::from_secs(5)); // Stabilize cache state
group.sample_size(200); // Statistical significance
group.throughput(Throughput::Elements(nnz)); // Report nonzeros/sec
// Use black_box() to prevent dead code elimination
b.iter(|| black_box(solver.solve(&csr, &rhs)))
```
---
## 8. Concurrency Optimization
### 8.1 Rayon Configuration
```rust
let chunk_size = (n / rayon::current_num_threads()).max(1024);
problems.par_chunks(chunk_size).map(|chunk| ...).collect()
```
### 8.2 Thread Scaling
| Threads | Efficiency | Bottleneck |
|---------|-----------|-----------|
| 1 | 100% | N/A |
| 2 | 90-95% | Rayon overhead |
| 4 | 75-85% | Memory bandwidth |
| 8 | 55-70% | L3 contention |
| 16 | 40-55% | NUMA effects |
Use `num_cpus::get_physical()` threads; SMT siblings add little for bandwidth-bound SpMV. Avoid nesting separate Rayon pools or blocking on another pool's work inside a Rayon task (oversubscription and deadlock risk).
---
## 9. Compilation Optimization
### 9.1 PGO Pipeline
```bash
RUSTFLAGS="-Cprofile-generate=/tmp/pgo" cargo build --release -p ruvector-solver
./target/release/bench_solver --profile-workload
llvm-profdata merge -o /tmp/pgo/merged.profdata /tmp/pgo/*.profraw
RUSTFLAGS="-Cprofile-use=/tmp/pgo/merged.profdata" cargo build --release
```
Expected: 5-15% improvement.
### 9.2 Release Profile
```toml
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
strip = true
```
---
## 10. Platform-Specific Optimization
### 10.1 Server (Linux x86_64)
- Huge pages: `MADV_HUGEPAGE` for large matrices (10-30% TLB miss reduction)
- NUMA-aware: Pin threads to same node as matrix memory
- AVX-512: Prefer on Zen 4+/Ice Lake+
### 10.2 Apple Silicon (macOS ARM64)
- Unified memory: No NUMA concerns
- NEON 4x unrolled with independent accumulators
- M4 Pro: 192KB L1, 16MB L2, 48MB L3
### 10.3 Browser (WASM)
- Memory budget < 8MB, SIMD128 always enabled
- Web Workers for batch, SharedArrayBuffer for zero-copy
- IndexedDB caching for TRUE preprocessing
### 10.4 Cloudflare Workers
- 128MB memory, 50ms CPU limit
- Reflex/Retrieval lanes only
- Single-threaded, pre-warm with small solve
---
## 11. Optimization Checklist
### P0 (Critical)
| Item | Impact | Effort | Validation |
|------|--------|--------|------------|
| SIMD SpMV (AVX2+FMA, NEON) | 4-8x SpMV | L | Criterion vs scalar |
| Arena allocator | 100x alloc reduction | S | dhat profiling |
| Zero-copy SoA → solver | Eliminates copies | M | Memory profiling |
| CSR with aligned storage | SIMD foundation | M | Cache miss rate |
| Diagonal preconditioning | 2-10x CG speedup | S | Iteration count |
| Feature-gated Rayon | Multi-core utilization | S | Thread scaling |
| Input validation | Security baseline | S | Fuzz testing |
| CI regression benchmarks | Prevents degradation | M | CI green |
### P1 (High)
| Item | Impact | Effort | Validation |
|------|--------|--------|------------|
| AVX-512 SpMV | 1.5-2x over AVX2 | M | Zen 4 benchmark |
| WASM SIMD128 SpMV | 2-3x over scalar | M | wasm-pack bench |
| Cache-aware tiling | 30-50% for n>100K | M | perf cache misses |
| Memory-mapped CSR | Removes memory ceiling | M | 1GB matrix load |
| SONA adaptive routing | Auto-optimal selection | L | >90% routing accuracy |
| TRUE batch amortization | 100-1000x repeated | M | Break-even validated |
| Web Worker pool | 2-4x WASM throughput | M | Worker benchmark |
### P2 (Medium)
| Item | Impact | Effort | Validation |
|------|--------|--------|------------|
| PGO in CI | 5-15% overall | M | PGO comparison |
| Vectorized PRNG | 2-4x random walk | S | Walk throughput |
| SIMD convergence checks | 4-8x check speed | S | Inline benchmark |
| Mixed precision (f32/f64) | 2x memory savings | M | Accuracy suite |
| Incomplete Cholesky | 10-100x condition | L | Iteration count |
### P3 (Long-term)
| Item | Impact | Effort | Validation |
|------|--------|--------|------------|
| Algebraic multigrid | Near-optimal Laplacians | XL | V-cycle convergence |
| NUMA-aware allocation | 10-20% multi-socket | M | NUMA profiling |
| GPU offload (Metal/CUDA) | 10-100x dense | XL | GPU benchmark |
| Distributed solver | n > 1M scaling | XL | Distributed bench |
---
## 12. Performance Targets
| Operation | Server (AVX2) | Edge (NEON) | Browser (WASM) | Cloudflare |
|-----------|:---:|:---:|:---:|:---:|
| SpMV 10K×10K (1%) | < 30 μs | < 50 μs | < 200 μs | < 300 μs |
| CG solve 10K (ε=1e-6) | < 1 ms | < 2 ms | < 20 ms | < 30 ms |
| Forward Push 10K (ε=1e-4) | < 50 μs | < 100 μs | < 500 μs | < 1 ms |
| Neumann 10K (k=20) | < 600 μs | < 1 ms | < 5 ms | < 8 ms |
| BMSSP 100K (ε=1e-4) | < 50 ms | < 100 ms | N/A | < 200 ms |
| TRUE prep 100K (ε=0.1) | < 500 ms | < 1 s | N/A | < 2 s |
| TRUE solve 100K (amort.) | < 1 ms | < 2 ms | N/A | < 5 ms |
| Batch pairwise 10K | < 15 s | < 30 s | < 120 s | N/A |
| Scheduler tick | < 200 ns | < 300 ns | N/A | N/A |
| Algorithm routing | < 1 μs | < 1 μs | < 5 μs | < 5 μs |
---
## 13. Measurement Methodology
1. **Criterion.rs**: 200 samples, 5s warmup, p < 0.05 significance
2. **Multi-platform**: x86_64 (AVX2) and aarch64 (NEON)
3. **Deterministic seeds**: `random_vector(dim, seed=42)`
4. **Equal accuracy**: Fix ε before comparing
5. **Cold + hot cache**: Report both first-run and steady-state
6. **Profile.bench**: Release optimization with debug symbols
7. **Regression CI**: 10% degradation threshold triggers failure
8. **Memory profiling**: Peak RSS and allocation count via dhat
9. **Roofline analysis**: Verify memory-bound operation
10. **Statistical rigor**: Report median, p5, p95, coefficient of variation
---
## Realized Optimizations
The following optimizations from this guide have been implemented in the `ruvector-solver` crate as of February 2026.
### Implemented Techniques
1. **Jacobi-preconditioned Neumann series (D^{-1} splitting)**: The Neumann solver extracts the diagonal of A and applies D^{-1} as a preconditioner before iteration. This transforms the iteration matrix from (I - A) to (I - D^{-1}A), significantly reducing the spectral radius for diagonally-dominant systems and enabling convergence where unpreconditioned Neumann would diverge or stall.
2. **spmv_unchecked: raw pointer SpMV with zero bounds checks**: The inner SpMV loop uses unsafe raw pointer arithmetic to eliminate Rust's bounds-check overhead on every array access. An external bounds validation is performed once before entering the hot loop, maintaining safety guarantees while removing per-element branch overhead.
3. **fused_residual_norm_sq: single-pass residual + norm computation (3 memory passes to 1)**: Instead of computing r = b - Ax (pass 1), then ||r||^2 (pass 2) as separate operations, the fused kernel computes both the residual vector and its squared norm in a single traversal. This eliminates 2 of 3 memory traversals per iteration, which is critical since SpMV is memory-bandwidth-bound.
4. **4-wide unrolled Jacobi update in Neumann iteration**: The Jacobi preconditioner application loop is manually unrolled 4x, processing four elements per loop body. This reduces loop overhead and exposes instruction-level parallelism to the CPU's out-of-order execution engine.
5. **AVX2 SIMD SpMV (8-wide f32 via horizontal sum)**: The AVX2 SpMV kernel processes 8 f32 values per SIMD instruction using `_mm256_i32gather_ps` for gathering x-vector entries and `_mm256_fmadd_ps` for fused multiply-add accumulation. A horizontal sum reduces the 8-lane accumulator to a scalar row result.
6. **Arena allocator for zero-allocation iteration**: Solver working memory (residual, search direction, temporary vectors) is pre-allocated from a bump arena before the iteration loop begins. This eliminates all heap allocation during the solve phase, reducing per-solve overhead from ~20 microseconds to ~200 nanoseconds.
7. **Algorithm router with automatic characterization**: The solver includes an algorithm router that characterizes input matrices (size, density, estimated spectral radius, SPD detection) and selects the optimal algorithm automatically. The router runs in under 1 microsecond and directs traffic to the appropriate solver based on the matrix properties identified in Sections 4 and 5.
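Technique 1 can be sketched as follows — a dense illustration of the preconditioned recurrence, not the crate's CSR implementation:

```rust
/// Jacobi-preconditioned Neumann/Richardson iteration:
/// x_{k+1} = x_k + D^{-1}(b - A x_k), iteration matrix I - D^{-1}A.
fn neumann_jacobi(a: &[Vec<f32>], b: &[f32], k: usize) -> Vec<f32> {
    let n = b.len();
    let mut x = vec![0.0f32; n];
    for _ in 0..k {
        // Full residual against the current iterate (pure Jacobi, not
        // Gauss-Seidel: x is only updated after r is complete).
        let r: Vec<f32> = (0..n)
            .map(|i| {
                let ax: f32 = a[i].iter().zip(&x).map(|(&aij, &xj)| aij * xj).sum();
                b[i] - ax
            })
            .collect();
        // x += D^{-1} r
        for i in 0..n {
            x[i] += r[i] / a[i][i];
        }
    }
    x
}
```

For the diagonally dominant system A = [[4, 1], [1, 3]], b = [1, 2], the iteration matrix I - D^{-1}A has spectral radius ≈ 0.29, so a few dozen sweeps reach f32 precision at the exact solution (1/11, 7/11).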
### Performance Data
| Algorithm | Complexity | Notes |
|-----------|-----------|-------|
| **Neumann** | O(k * nnz) | Converges with k typically 10-50 for well-conditioned systems (spectral radius < 0.9). Jacobi preconditioning extends the convergence regime. |
| **CG** | O(sqrt(kappa) * log(1/epsilon) * nnz) | Gold standard for SPD systems. Optimal by the Nemirovski-Yudin lower bound. Scales gracefully with condition number. |
| **Fused kernel** | Eliminates 2 of 3 memory traversals per iteration | For bandwidth-bound SpMV (arithmetic intensity 0.167 FLOP/byte), reducing memory passes from 3 to 1 translates directly to up to 3x throughput improvement for the residual computation step. |