wifi-densepose/docs/research/sublinear-time-solver/05-architecture-analysis.md

# Architecture Analysis: Sublinear-Time Solver Integration with ruvector

**Agent**: 5 -- Architecture & System Design
**Date**: 2026-02-20
**Status**: Complete
**Scope**: Full-stack architectural mapping, compatibility analysis, and integration strategy

---

## Table of Contents

1. [ruvector's Current Architecture Patterns](#1-ruvectors-current-architecture-patterns)
2. [Architectural Compatibility with Sublinear-Time Solver](#2-architectural-compatibility-with-sublinear-time-solver)
3. [Layered Integration Strategy (Rust -> WASM -> JS -> API)](#3-layered-integration-strategy)
4. [Module Boundary Recommendations](#4-module-boundary-recommendations)
5. [Dependency Injection Points](#5-dependency-injection-points)
6. [Event-Driven Integration Patterns](#6-event-driven-integration-patterns)
7. [Performance Architecture Considerations](#7-performance-architecture-considerations)

---

## 1. ruvector's Current Architecture Patterns

### 1.1 Macro-Architecture: Rust Workspace Monorepo

ruvector is organized as a Cargo workspace monorepo with approximately 75+ crates under
`/crates`. The workspace configuration in `Cargo.toml` lists roughly 100 workspace members
spanning core database functionality, mathematical engines, neural systems, governance layers,
and multiple deployment targets.

**Topology**: The codebase follows a layered architecture with a clear separation between
computational cores and their platform bindings:

```
Layer 0: Mathematical Foundations
  ruvector-math, ruvector-mincut, ruqu-core, ruqu-algorithms

Layer 1: Core Engines
  ruvector-core, ruvector-graph, ruvector-dag, ruvector-sparse-inference,
  prime-radiant, sona, cognitum-gate-kernel, cognitum-gate-tilezero

Layer 2: Platform Bindings
  *-wasm crates (wasm-bindgen), *-node crates (NAPI-RS), *-ffi crates

Layer 3: Integration Services
  ruvector-server (axum REST), mcp-gate (MCP/JSON-RPC), ruvector-cli (clap)

Layer 4: Distribution & Orchestration
  ruvector-cluster, ruvector-raft, ruvector-replication, ruvector-delta-consensus
```

### 1.2 The Core-Binding-Surface Pattern

Every major subsystem in ruvector follows a consistent three-part decomposition:

| Component | Purpose | Example |
|-----------|---------|---------|
| **Core** (pure Rust) | Algorithms, data structures, business logic | `ruvector-core`, `ruvector-graph`, `ruvector-math` |
| **WASM binding** | Browser/edge deployment via `wasm-bindgen` | `ruvector-wasm`, `ruvector-graph-wasm`, `ruvector-math-wasm` |
| **Node binding** | Server-side deployment via NAPI-RS | `ruvector-node`, `ruvector-graph-node`, `ruvector-gnn-node` |

This pattern is the primary architectural convention in ruvector. It appears in at least
15 subsystems: core, graph, GNN, attention, mincut, DAG, sparse-inference, math,
domain-expansion, economy, exotic, learning, nervous-system, tiny-dancer, and the
prime-radiant advanced WASM.

Key characteristics observed in the codebase:

- **Pure Rust cores** use `no_std`-compatible patterns where possible, avoiding I/O and
  platform-specific code.
- **WASM crates** wrap core types in `#[wasm_bindgen]`-annotated structs with `JsValue`
  serialization via `serde_wasm_bindgen`. They handle browser-specific concerns like
  IndexedDB persistence, Web Worker pool management, and Float32Array interop.
- **Node crates** use `#[napi]` macros with `tokio::task::spawn_blocking` for async I/O,
  leveraging zero-copy `Float32Array` buffers through NAPI-RS.

### 1.3 Dependency Management Strategy

The workspace `Cargo.toml` centralizes all shared dependencies. Critical shared dependencies
relevant to the sublinear-time solver integration:

- **Linear algebra**: `ndarray 0.16` (ruvector-math uses this extensively)
- **Numerics**: `rand 0.8`, `rand_distr 0.4`
- **WASM**: `wasm-bindgen 0.2`, `js-sys 0.3`, `web-sys 0.3`
- **Node.js**: `napi 2.16`, `napi-derive 2.16`
- **Async**: `tokio 1.41` (multi-thread runtime), `futures 0.3`
- **SIMD**: `simsimd 5.9` (distance calculations)
- **Serialization**: `serde 1.0`, `rkyv 0.8`, `bincode 2.0.0-rc.3`
- **Concurrency**: `rayon 1.10`, `crossbeam 0.8`, `dashmap 6.1`, `parking_lot 0.12`

Notable **absence**: `nalgebra` is not currently a workspace dependency. The sublinear-time
solver uses `nalgebra` as its linear algebra backend. This is a significant compatibility
consideration (analyzed in Section 2).

### 1.4 Feature Flag Architecture

ruvector makes extensive use of Cargo feature flags for conditional compilation:

- `storage` / `storage-memory`: Toggle between REDB-backed and in-memory storage
- `parallel`: Enables lock-free structures and rayon parallelism (disabled on `wasm32`)
- `collections`: Multi-collection support (requires file I/O, so conditionally excluded in WASM)
- `kernel-pack`: ADR-005 compliant secure WASM kernel execution
- `full`: Enables async-dependent modules (healing, qudag, sona) in the DAG crate
- `api-embeddings` / `real-embeddings`: External embedding model support

### 1.5 Event Sourcing and Domain Events

The `prime-radiant` crate implements a comprehensive event sourcing pattern through its
`events.rs` module. Domain events are defined as a tagged enum (`DomainEvent`) covering:

- Substrate events (NodeCreated, NodeUpdated, NodeRemoved, EdgeCreated, EdgeRemoved)
- Coherence computation events (energy calculations, residual updates)
- Governance events (policy changes, witness records)

Events are serialized with `serde` using `#[serde(tag = "type")]` for deterministic replay
and tamper detection via content hashes. This aligns well with the sublinear-time solver's
potential need for computation provenance tracking.

### 1.6 MCP Integration Pattern

The `mcp-gate` crate provides a Model Context Protocol server using JSON-RPC 2.0 over stdio.
Tools are defined declaratively with JSON Schema input specifications. The architecture uses
`Arc<RwLock<TileZero>>` for shared state with the coherence gate engine. This existing MCP
infrastructure provides a natural extension point for exposing solver capabilities to AI agents.

### 1.7 Server Architecture

`ruvector-server` uses `axum` with tower middleware layers (compression, CORS, tracing).
Routes are modular (health, collections, points). The server shares application state via
`AppState` and uses the standard Rust web service pattern with `Router` composition.

---

## 2. Architectural Compatibility with Sublinear-Time Solver

### 2.1 Structural Alignment Matrix

| Solver Component | ruvector Equivalent | Compatibility | Notes |
|-----------------|--------------------|----|-------|
| Rust core library (`sublinear_solver`) | `ruvector-core`, `ruvector-math` | **HIGH** | Both are pure Rust crates with algorithm-focused design |
| WASM layer (`wasm-bindgen`) | `ruvector-wasm`, `*-wasm` crates | **HIGH** | Identical binding technology, identical patterns |
| JS bridge (`solver.js`, etc.) | `npm/core/src/index.ts` | **HIGH** | Both provide platform-detection loaders and typed APIs |
| Express server | `ruvector-server` (axum) | **MEDIUM** | Different frameworks (Express vs axum) but compatible at API level |
| MCP integration (40+ tools) | `mcp-gate` (3 tools) | **HIGH** | Same protocol, ruvector has established patterns |
| CLI (NPX) | `ruvector-cli` (clap) | **MEDIUM** | Different CLI paradigms; ruvector uses native Rust CLI |
| TypeScript types | `npm/core/src/index.ts` | **HIGH** | ruvector already publishes TypeScript definitions |
| 9 workspace crates | ~75+ workspace crates | **HIGH** | Same Cargo workspace model |

### 2.2 Linear Algebra Backend Divergence

**This is the single most significant architectural tension.**

- **Sublinear-time solver**: Uses `nalgebra` for matrix operations, linear algebra, and
  numerical computation.
- **ruvector**: Uses `ndarray 0.16` in `ruvector-math` and raw `Vec<f32>` with SIMD intrinsics
  in `ruvector-core`.

**Resolution strategy**: Introduce `nalgebra` as a workspace dependency and create an
adapter layer. The two libraries can coexist. The adapter should provide zero-cost conversions
between `nalgebra::DMatrix<f32>` and `ndarray::Array2<f32>` views using shared memory backing.
Specifically:

```rust
// Proposed adapter in crates/ruvector-math/src/nalgebra_bridge.rs
use nalgebra::DMatrix;
use ndarray::Array2;

/// Zero-copy view conversion from nalgebra DMatrix to ndarray Array2
pub fn dmatrix_to_ndarray_view(m: &DMatrix<f32>) -> ndarray::ArrayView2<f32> {
    let (rows, cols) = m.shape();
    let slice = m.as_slice();
    ndarray::ArrayView2::from_shape((rows, cols), slice)
        .expect("nalgebra DMatrix is always contiguous column-major")
}
```

Note: `nalgebra` uses column-major storage while `ndarray` defaults to row-major. The adapter
must handle layout transposition or use `.reversed_axes()` for correct interpretation.

### 2.3 Server Framework Compatibility

The sublinear-time solver uses Express.js with session management and streaming. ruvector
uses axum (Rust). These are not in conflict because they serve different layers:

- **Solver Express server**: JS-level API for browser and Node clients, session management,
  streaming results.
- **ruvector axum server**: Rust-level REST API for database operations.

The integration should layer the solver's Express functionality as a separate API surface,
or preferably, expose solver endpoints through axum with the same streaming semantics using
axum's SSE (Server-Sent Events) or WebSocket support.

### 2.4 WASM Compilation Target Compatibility

Both projects target `wasm32-unknown-unknown` via `wasm-bindgen`. ruvector already manages
the WASM-specific constraints:

- No `std::fs`, `std::net` in WASM builds
- `parking_lot::Mutex` instead of `std::sync::Mutex` (which does not panic on web)
- `getrandom` with `wasm_js` feature for random number generation
- Console error panic hooks for debugging

The sublinear-time solver's WASM layer should be able to reuse these patterns directly. The
existing `ruvector-wasm` crate demonstrates the complete pattern including IndexedDB persistence,
Web Worker pools, Float32Array interop, and SIMD detection.

---

## 3. Layered Integration Strategy

### 3.1 Layer Architecture Overview

```
+===========================================================================+
|                        APPLICATION CONSUMERS                               |
|  MCP Agents | REST Clients | Browser Apps | CLI Users | Edge Devices       |
+===========================================================================+
         |              |             |            |            |
+===========================================================================+
|                        API SURFACE (Layer 4)                               |
|  mcp-gate          | ruvector-server | solver-server | ruvector-cli        |
|  (JSON-RPC/stdio)  | (axum REST)     | (axum SSE)    | (clap binary)      |
+===========================================================================+
         |              |             |            |
+===========================================================================+
|                        JS/TS BRIDGE (Layer 3)                              |
|  npm/core/index.ts | solver-bridge.ts | solver-worker.ts                  |
|  Platform detection, typed wrappers, async coordination                    |
+===========================================================================+
         |              |             |
+===========================================================================+
|                        WASM SURFACE (Layer 2)                              |
|  ruvector-wasm     | ruvector-solver-wasm | ruvector-math-wasm            |
|  wasm-bindgen, Float32Array, Web Workers, IndexedDB                       |
+===========================================================================+
         |              |
+===========================================================================+
|                        RUST CORE (Layer 1)                                 |
|  ruvector-core     | ruvector-solver | ruvector-math | ruvector-dag       |
|  Pure algorithms, nalgebra/ndarray, SIMD, rayon                           |
+===========================================================================+
         |
+===========================================================================+
|                        MATH FOUNDATION (Layer 0)                           |
|  nalgebra | ndarray | simsimd | ndarray-linalg (optional)                 |
+===========================================================================+
```

### 3.2 Layer 0 -> Layer 1: Rust Core Integration

**New crate**: `crates/ruvector-solver` (or `crates/sublinear-solver` if preserving the
upstream name is preferred).

Structure:

```
crates/ruvector-solver/
  Cargo.toml
  src/
    lib.rs           # Public API: traits, types, re-exports
    algorithms/
      mod.rs         # Algorithm registry
      bmssp.rs       # Bounded Max-Sum Subarray Problem solver
      fast.rs        # Fast solver variants
      sublinear.rs   # Core sublinear-time algorithms
    backend/
      mod.rs         # Backend abstraction
      nalgebra.rs    # nalgebra-backed implementation
      ndarray.rs     # ndarray bridge for ruvector interop
    config.rs        # Solver configuration
    error.rs         # Error types
    types.rs         # Core domain types (matrices, results, bounds)
```

Integration points with existing ruvector crates:

- **`ruvector-math`**: The solver's mathematical operations (optimal transport, spectral
  methods, tropical algebra) overlap with `ruvector-math`. Common abstractions should be
  extracted into shared traits.
- **`ruvector-dag`**: Sublinear graph algorithms can be applied to DAG bottleneck analysis.
  The `DagMinCutEngine` already uses subpolynomial O(n^0.12) bottleneck detection; solver
  algorithms could provide alternative or improved implementations.
- **`ruvector-sparse-inference`**: Sparse matrix operations and activation-locality patterns
  in the inference engine are natural consumers of sublinear-time solvers.

### 3.3 Layer 1 -> Layer 2: WASM Compilation

**New crate**: `crates/ruvector-solver-wasm`

This follows the established ruvector pattern exactly:

```rust
// crates/ruvector-solver-wasm/src/lib.rs
use wasm_bindgen::prelude::*;
use ruvector_solver::{SublinearSolver, SolverConfig, SolverResult};

#[wasm_bindgen(start)]
pub fn init() {
    console_error_panic_hook::set_once();
}

#[wasm_bindgen]
pub struct JsSolver {
    inner: SublinearSolver,
}

#[wasm_bindgen]
impl JsSolver {
    #[wasm_bindgen(constructor)]
    pub fn new(config: JsValue) -> Result<JsSolver, JsValue> {
        let config: SolverConfig = serde_wasm_bindgen::from_value(config)?;
        let solver = SublinearSolver::new(config)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(JsSolver { inner: solver })
    }

    #[wasm_bindgen]
    pub fn solve(&self, input: Float32Array) -> Result<JsValue, JsValue> {
        let data = input.to_vec();
        let result = self.inner.solve(&data)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        serde_wasm_bindgen::to_value(&result)
            .map_err(|e| JsValue::from_str(&e.to_string()))
    }
}
```

Critical WASM considerations:

1. **nalgebra WASM compatibility**: `nalgebra` compiles to WASM without issues. Ensure
   `default-features = false` if the `std` feature pulls in incompatible dependencies.
2. **Memory limits**: WASM linear memory is limited (default 256 pages = 16MB). Sublinear
   algorithms are inherently memory-efficient, which is an advantage. However, large matrix
   operations may need chunked processing.
3. **No threads by default**: WASM does not support `std::thread`. Use the existing
   `worker-pool.js` and `worker.js` patterns from `ruvector-wasm` for parallelism.

### 3.4 Layer 2 -> Layer 3: JavaScript Bridge

**New package**: `npm/solver/` (or extension of `npm/core/`)

```typescript
// npm/solver/src/index.ts
import { SublinearSolver as WasmSolver } from '../pkg/ruvector_solver_wasm';

export interface SolverConfig {
  algorithm: 'bmssp' | 'fast' | 'sublinear';
  tolerance?: number;
  maxIterations?: number;
  dimensions?: number;
}

export interface SolverResult {
  solution: Float32Array;
  iterations: number;
  converged: boolean;
  residualNorm: number;
  wallTimeMs: number;
}

export class SublinearSolver {
  private inner: WasmSolver;

  constructor(config: SolverConfig) {
    this.inner = new WasmSolver(config);
  }

  solve(input: Float32Array): SolverResult {
    return this.inner.solve(input);
  }

  async solveAsync(input: Float32Array): Promise<SolverResult> {
    // Offload to Web Worker for non-blocking execution
    return workerPool.dispatch('solve', { input, config: this.config });
  }
}
```

### 3.5 Layer 3 -> Layer 4: API Surface

For the axum-based server integration, add a new route module:

```rust
// crates/ruvector-server/src/routes/solver.rs
use axum::{extract::State, Json, response::sse::Event};
use ruvector_solver::{SublinearSolver, SolverConfig};

pub fn routes() -> Router<AppState> {
    Router::new()
        .route("/solver/solve", post(solve))
        .route("/solver/solve/stream", post(solve_stream))
        .route("/solver/config", get(get_config).put(update_config))
}
```

For the MCP integration, add new tools to `mcp-gate`:

```rust
McpTool {
    name: "solve_sublinear".to_string(),
    description: "Execute a sublinear-time solver on the provided input data".to_string(),
    input_schema: serde_json::json!({
        "type": "object",
        "properties": {
            "algorithm": { "type": "string", "enum": ["bmssp", "fast", "sublinear"] },
            "input": { "type": "array", "items": { "type": "number" } },
            "tolerance": { "type": "number", "default": 1e-6 }
        },
        "required": ["algorithm", "input"]
    }),
}
```

---

## 4. Module Boundary Recommendations

### 4.1 Boundary Principles

The following boundaries should be enforced through Cargo crate visibility and trait-based
abstraction:

```
                    PUBLIC API BOUNDARY
                    ===================
                           |
            +--------------+--------------+
            |                             |
    Solver Core Trait              ruvector Core Trait
    (SolverEngine)                 (VectorDB, SearchEngine)
            |                             |
            +------+------+       +-------+------+
            |      |      |       |       |      |
         BMSSP   Fast  Sublin   HNSW   Graph   DAG
```

### 4.2 Recommended Trait Boundaries

**Solver engine trait** (new, in `ruvector-solver`):

```rust
pub trait SolverEngine: Send + Sync {
    type Input;
    type Output;
    type Error: std::error::Error;

    fn solve(&self, input: &Self::Input) -> Result<Self::Output, Self::Error>;
    fn solve_with_budget(
        &self,
        input: &Self::Input,
        budget: ComputeBudget,
    ) -> Result<Self::Output, Self::Error>;
    fn estimate_complexity(&self, input: &Self::Input) -> ComplexityEstimate;
}
```

**Numeric backend trait** (new, in `ruvector-math` or `ruvector-solver`):

```rust
pub trait NumericBackend: Send + Sync {
    type Matrix;
    type Vector;

    fn mat_mul(&self, a: &Self::Matrix, b: &Self::Matrix) -> Self::Matrix;
    fn svd(&self, m: &Self::Matrix) -> (Self::Matrix, Self::Vector, Self::Matrix);
    fn eigenvalues(&self, m: &Self::Matrix) -> Self::Vector;
    fn norm(&self, v: &Self::Vector) -> f64;
}
```

This trait allows the solver to abstract over `nalgebra` and `ndarray` backends, and also
enables future GPU-accelerated backends (the `prime-radiant` crate already has a GPU module
with buffer management and kernel dispatch).

### 4.3 Crate Dependency Graph (Proposed)

```
ruvector-solver-wasm -----> ruvector-solver -----> ruvector-math
        |                         |                     |
        |                         |                     +---> nalgebra (new dep)
        |                         |                     +---> ndarray  (existing)
        |                         |
        |                         +---> ruvector-core (optional, for VectorDB integration)
        |
        +---> wasm-bindgen, serde_wasm_bindgen (existing workspace deps)

ruvector-solver-node -----> ruvector-solver
        |
        +---> napi, napi-derive (existing workspace deps)

mcp-gate -----> ruvector-solver (optional feature)
ruvector-server -----> ruvector-solver (optional feature)
ruvector-dag -----> ruvector-solver (optional feature for bottleneck algorithms)
```

### 4.4 Feature Flag Recommendations

```toml
[features]
default = []
nalgebra-backend = ["nalgebra"]
ndarray-backend = ["ndarray"]
wasm = ["wasm-bindgen", "serde_wasm_bindgen", "js-sys"]
parallel = ["rayon"]
simd = []  # Auto-detected via cfg(target_feature)
gpu = ["ruvector-math/gpu"]
full = ["nalgebra-backend", "ndarray-backend", "parallel"]
```

---

## 5. Dependency Injection Points

### 5.1 Core DI Architecture

ruvector uses a combination of generic type parameters and `Arc<dyn Trait>` for dependency
injection. The following injection points are relevant for the sublinear-time solver:

#### 5.1.1 Numeric Backend Injection

The solver's core algorithm implementations should accept a generic numeric backend:

```rust
pub struct SublinearSolver<B: NumericBackend = NalgebraBackend> {
    backend: B,
    config: SolverConfig,
}

impl<B: NumericBackend> SublinearSolver<B> {
    pub fn with_backend(backend: B, config: SolverConfig) -> Self {
        Self { backend, config }
    }
}
```

This allows ruvector consumers who already have `ndarray` matrices to use the solver
without conversion overhead.

#### 5.1.2 Distance Function Injection

ruvector-core's `DistanceMetric` enum defines four distance functions (Euclidean, Cosine,
DotProduct, Manhattan). The solver may need additional distance metrics or custom distance
functions. Injection point:

```rust
pub trait DistanceFunction: Send + Sync {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
    fn name(&self) -> &str;
}

// Adapt ruvector's existing DistanceMetric
impl DistanceFunction for DistanceMetric {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        match self {
            DistanceMetric::Euclidean => simsimd_euclidean(a, b),
            DistanceMetric::Cosine => simsimd_cosine(a, b),
            // ...
        }
    }
}
```

#### 5.1.3 Storage Backend Injection

ruvector-core already has conditional compilation for storage backends (`storage` vs
`storage_memory`). The solver should use a similar pattern for result caching:

```rust
pub trait SolverCache: Send + Sync {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn invalidate(&self, key: &[u8]);
}
```

Implementations could include:
- `InMemoryCache` (default, using `DashMap`)
- `VectorDBCache` (using ruvector-core's VectorDB for nearest-neighbor result caching)
- `WasmCache` (using IndexedDB, following the `ruvector-wasm/src/indexeddb.js` pattern)

#### 5.1.4 Compute Budget Injection

Following `prime-radiant`'s compute ladder pattern (Lane 0 Reflex through Lane 3 Human),
the solver should accept compute budgets:

```rust
pub struct ComputeBudget {
    pub max_wall_time: Duration,
    pub max_iterations: usize,
    pub max_memory_bytes: usize,
    pub lane: ComputeLane,
}

pub enum ComputeLane {
    Reflex,     // < 1ms, local only
    Retrieval,  // ~ 10ms, can fetch cached results
    Heavy,      // ~ 100ms, full solver execution
    Deliberate, // unbounded, with streaming progress
}
```

### 5.2 WASM-Specific Injection Points

In the WASM layer, dependency injection occurs through JavaScript configuration objects:

```typescript
interface SolverOptions {
  // Backend selection
  backend?: 'wasm-simd' | 'wasm-baseline' | 'js-fallback';

  // Worker pool configuration
  workerCount?: number;
  workerUrl?: string;

  // Memory management
  maxMemoryMB?: number;
  useSharedArrayBuffer?: boolean;

  // Progress callback (for streaming)
  onProgress?: (progress: SolverProgress) => void;
}
```

### 5.3 Server-Level Injection

At the API layer, the solver should be injected into the axum `AppState`:

```rust
pub struct AppState {
    // Existing
    pub vector_db: Arc<RwLock<CoreVectorDB>>,
    pub collection_manager: Arc<RwLock<CoreCollectionManager>>,

    // New: solver engine injection
    pub solver: Arc<dyn SolverEngine<Input = SolverInput, Output = SolverOutput, Error = SolverError>>,
}
```

---

## 6. Event-Driven Integration Patterns

### 6.1 Alignment with Prime-Radiant Event Sourcing

The `prime-radiant` crate's `DomainEvent` enum provides a proven event-sourcing pattern.
The solver should emit analogous events for computation provenance:

```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum SolverEvent {
    /// A solve request was received
    SolveRequested {
        request_id: String,
        algorithm: String,
        input_dimensions: (usize, usize),
        timestamp: Timestamp,
    },

    /// An iteration completed
    IterationCompleted {
        request_id: String,
        iteration: usize,
        residual_norm: f64,
        wall_time_us: u64,
        timestamp: Timestamp,
    },

    /// The solver converged to a solution
    SolveConverged {
        request_id: String,
        total_iterations: usize,
        final_residual: f64,
        total_wall_time_us: u64,
        timestamp: Timestamp,
    },

    /// The solver exceeded its compute budget
    BudgetExhausted {
        request_id: String,
        budget: ComputeBudget,
        best_residual: f64,
        timestamp: Timestamp,
    },

    /// A complexity estimate was computed
    ComplexityEstimated {
        request_id: String,
        estimated_flops: u64,
        estimated_memory_bytes: u64,
        recommended_lane: ComputeLane,
        timestamp: Timestamp,
    },
}
```

### 6.2 Event Bus Integration

The solver events should be published to the same event infrastructure that prime-radiant
uses. The recommended pattern is a channel-based event bus:

```rust
pub struct SolverWithEvents<S: SolverEngine> {
    solver: S,
    event_tx: tokio::sync::broadcast::Sender<SolverEvent>,
}

impl<S: SolverEngine> SolverWithEvents<S> {
    pub fn subscribe(&self) -> tokio::sync::broadcast::Receiver<SolverEvent> {
        self.event_tx.subscribe()
    }
}
```

This enables:
- **Coherence gate integration**: Prime-radiant can subscribe to solver events and include
  solver stability in its coherence energy calculations.
- **Streaming API responses**: The axum server can convert the event stream to SSE.
- **MCP progress notifications**: The MCP server can emit JSON-RPC notifications for
  long-running solve operations.
- **Telemetry and monitoring**: The `ruvector-metrics` crate can subscribe and export
  Prometheus metrics for solver operations.

### 6.3 Coherence Gate as Solver Governor

A powerful integration pattern connects the solver to prime-radiant's coherence gate:

```
Solve Request --> Complexity Estimate --> Gate Decision --> Execute or Escalate
                                              |
                                     Prime-Radiant evaluates:
                                     - Energy budget available?
                                     - System coherence stable?
                                     - Resource contention low?
```

The `cognitum-gate-tilezero` crate's `permit_action` tool can govern solver execution:

```rust
// Before executing a solver, request permission from the gate
let action = ActionContext {
    action_id: format!("solve-{}", request_id),
    action_type: "heavy_compute".into(),
    target: ActionTarget {
        device: "solver-engine".into(),
        path: format!("/solver/{}", algorithm),
    },
    metadata: ActionMetadata {
        estimated_cost: complexity.estimated_flops as f64,
        estimated_duration_ms: complexity.estimated_wall_time_ms,
    },
};

match gate.permit_action(action).await {
    GateDecision::Permit(token) => solver.solve_with_token(input, token),
    GateDecision::Defer(info) => escalate_to_queue(input, info),
    GateDecision::Deny(reason) => Err(SolverError::Denied(reason)),
}
```

### 6.4 DAG Integration Events

The `ruvector-dag` crate's query plan optimizer can emit events when bottleneck analysis
identifies nodes that would benefit from sublinear-time solving:

```rust
// In ruvector-dag when a bottleneck is detected
SolverEvent::BottleneckSolverRequested {
    dag_id: dag.id(),
    bottleneck_nodes: bottlenecks.iter().map(|b| b.node_id).collect(),
    estimated_speedup: bottlenecks.iter().map(|b| b.speedup_potential).sum(),
    timestamp: now(),
}
```

---

## 7. Performance Architecture Considerations

### 7.1 Memory Architecture

#### Current ruvector Memory Model

ruvector-core uses several memory optimization strategies:

- **Arena allocator** (`arena.rs`): Cache-aligned vector allocation with `CACHE_LINE_SIZE`
  awareness and batch allocation via `BatchVectorAllocator`.
- **SoA storage** (`cache_optimized.rs`): Structure-of-Arrays layout for cache-friendly
  sequential access to vector components.
- **Memory pools** (`memory.rs`): Basic allocation tracking with optional limits.
- **Paged memory** (ADR-006): 2MB page-granular allocation with LRU eviction and
  Hot/Warm/Cold residency tiers.

#### Solver Memory Requirements

Sublinear-time algorithms are inherently memory-efficient (often O(n^alpha) for alpha < 1),
but the nalgebra backend may allocate large intermediate matrices. Recommendations:

1. **Use ruvector's arena allocator** for solver-internal scratch space. Wrap nalgebra
   allocations in arena-backed storage:

   ```rust
   pub struct SolverArena {
       inner: Arena,
       scratch_matrices: Vec<DMatrix<f32>>,
   }
   ```

2. **Integrate with ADR-006 paged memory** for large problem instances. The solver should
   respect the memory pool's limit and request pages through the established interface rather
   than allocating directly.

3. **WASM memory budget**: In WASM, limit solver memory to a configurable fraction of the
   linear memory. The default WASM memory of 16MB is tight; ensure the solver can operate
   within 4-8MB for typical problem sizes, using the `ComputeBudget.max_memory_bytes` field.

### 7.2 SIMD Optimization Strategy

ruvector uses `simsimd 5.9` for distance calculations, achieving approximately 16M ops/sec
for 512-dimensional vectors. The solver should leverage SIMD at two levels:

1. **Auto-vectorization**: Write inner loops in a SIMD-friendly style (sequential access,
   no branches, aligned data). Rust's LLVM backend will auto-vectorize these for both native
   and WASM targets.

2. **Explicit SIMD**: For hot paths, use `std::arch` intrinsics with runtime detection:

   ```rust
   #[cfg(target_arch = "x86_64")]
   use std::arch::x86_64::*;

   #[cfg(target_arch = "wasm32")]
   use std::arch::wasm32::*;
   ```

   The existing `ruvector-core/src/simd_intrinsics.rs` provides patterns for this.

3. **WASM SIMD128**: The `ruvector-wasm` crate already detects SIMD support via
   `detect_simd()`. Ensure the solver WASM crate is compiled with `-C target-feature=+simd128`
   for WASM SIMD support, with a non-SIMD fallback.

### 7.3 Concurrency Architecture

#### Native (Server) Concurrency

ruvector uses a rich concurrency toolkit:

- **Rayon** for data-parallel operations (conditional on `feature = "parallel"`)
- **Crossbeam** for lock-free data structures
- **DashMap** for concurrent hash maps
- **Parking_lot** for efficient mutexes and RwLocks
- **Tokio** for async I/O and task scheduling
- **Lock-free structures** (`lockfree.rs`): `AtomicVectorPool`, `LockFreeWorkQueue`,
  `LockFreeBatchProcessor`

The solver should integrate with this concurrency model:

```rust
impl SublinearSolver {
    pub fn solve_parallel(&self, input: &[f32]) -> Result<SolverResult> {
        #[cfg(feature = "parallel")]
        {
            input.par_chunks(self.config.chunk_size)
                .map(|chunk| self.solve_chunk(chunk))
                .reduce_with(|a, b| self.merge_results(a?, b?))
                .unwrap_or(Err(SolverError::EmptyInput))
        }
        #[cfg(not(feature = "parallel"))]
        {
            self.solve_sequential(input)
        }
    }
}
```

#### WASM Concurrency

WASM does not support native threads. The solver must use Web Workers for parallelism:

- Follow the `ruvector-wasm/src/worker-pool.js` pattern
- Use `SharedArrayBuffer` for zero-copy data sharing between workers (requires
  `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`)
- Fall back to `postMessage` with transferable `ArrayBuffer` when SAB is unavailable

### 7.4 Latency Targets by Deployment Context

| Context | Target Latency | Memory Budget | Strategy |
|---------|---------------|---------------|----------|
| **WASM (browser)** | < 50ms for 10K elements | 4-8 MB | SIMD128, single-threaded, streaming |
| **WASM (edge/Cloudflare)** | < 10ms for 10K elements | 128 MB | SIMD128, limited workers |
| **Node.js (NAPI)** | < 5ms for 10K elements | 512 MB | Native SIMD, Rayon parallel |
| **Server (axum)** | < 2ms for 10K elements | 2 GB | Full SIMD, Rayon, memory-mapped |
| **MCP (agent)** | Budget-dependent | Configurable | Gate-governed, compute ladder |

### 7.5 Benchmarking Integration

ruvector uses `criterion 0.5` for benchmarking with HTML reports. The solver should integrate
into the existing benchmark infrastructure:

```rust
// benches/solver_benchmarks.rs
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use ruvector_solver::{SublinearSolver, SolverConfig};

fn bench_sublinear_solve(c: &mut Criterion) {
    let mut group = c.benchmark_group("sublinear_solver");

    for size in [100, 1_000, 10_000, 100_000] {
        group.bench_with_input(
            BenchmarkId::new("bmssp", size),
            &size,
            |b, &size| {
                let solver = SublinearSolver::new(SolverConfig::default());
                let input: Vec<f32> = (0..size).map(|i| i as f32).collect();
                b.iter(|| solver.solve(&input));
            },
        );
    }
    group.finish();
}
```

The benchmark results should be stored in the existing `bench_results/` directory in JSON
format, matching the schema used by `comparison_benchmark.json` and `latency_benchmark.json`.

### 7.6 Profile-Guided Optimization

The workspace `Cargo.toml` already configures aggressive release optimizations:

```toml
[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
strip = true
```

These settings are critical for solver performance. Additional considerations:

- **PGO (Profile-Guided Optimization)**: For the NAPI binary, consider adding a PGO training
  step using representative solver workloads.
- **WASM opt**: Run `wasm-opt -O3` on the solver WASM output (the existing build scripts
  in `ruvector-wasm` likely already do this).
- **Link-time optimization across crates**: The `lto = "fat"` setting enables cross-crate
  LTO, which is essential for inlining nalgebra operations into solver hot paths.

### 7.7 Zero-Copy Data Path

The critical performance path for the solver is the data pipeline from API input to solver
core and back. Minimize copies:

```
API (axum): body bytes  --deserialize-->  SolverInput
                                               |
                            +---------borrow-----------+
                            |                          |
                      nalgebra::DMatrixSlice      result buffer
                            |                          |
                            +------solve-------->------+
                                                       |
                                          --serialize-->  API response bytes
```

For the WASM path:

```
JS Float32Array --view (no copy)--> wasm linear memory --solve--> wasm linear memory
                                                                       |
                                                         --view (no copy)--> JS Float32Array
```

The key is to use `Float32Array::view()` in wasm-bindgen rather than `Float32Array::copy_from()`
wherever the solver does not need to retain ownership of the input data.

---

## Summary of Key Recommendations

1. **Create `crates/ruvector-solver`** as a new pure-Rust workspace member, following the
   established core-binding-surface pattern.

2. **Add `nalgebra` as a workspace dependency** and create a bridge module in `ruvector-math`
   for zero-cost conversions between nalgebra and ndarray representations.

3. **Follow the existing three-crate pattern** exactly: `ruvector-solver` (core),
   `ruvector-solver-wasm` (browser), `ruvector-solver-node` (server).

4. **Integrate with prime-radiant's event sourcing** by emitting `SolverEvent`s through
   a broadcast channel, enabling coherence gate governance and streaming API responses.

5. **Use the coherence gate as a solver governor** to prevent runaway computation and
   integrate with the compute ladder (Lane 0-3).

6. **Inject the solver into `AppState`** for axum server integration, and add new MCP
   tools to `mcp-gate` for AI agent access.

7. **Respect ruvector's memory architecture** by integrating with the arena allocator,
   SoA storage patterns, and ADR-006 paged memory management.

8. **Target WASM SIMD128** for browser performance, with graceful fallback to scalar code
   detected at runtime via the existing `detect_simd()` mechanism.

9. **Use Rayon with feature gating** for native parallelism, and Web Workers for WASM
   parallelism, following the patterns already established in `ruvector-wasm`.

10. **Integrate benchmarks into the existing `criterion` infrastructure** and store results
    in the `bench_results/` directory for regression tracking.