wifi-densepose/docs/research/sublinear-time-solver/06-wasm-integration.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00


# 06 - WebAssembly Integration Analysis
**Agent**: 6 (WASM Integration Specialist)

**Date**: 2026-02-20

**Scope**: ruvector codebase WASM capabilities, build pipeline, SIMD acceleration, memory management, deployment strategies, module loading, and benchmarking framework

---
## Table of Contents
1. [Existing WASM Usage in ruvector](#1-existing-wasm-usage-in-ruvector)
2. [WASM Build Pipeline Compatibility](#2-wasm-build-pipeline-compatibility)
3. [SIMD Acceleration Opportunities](#3-simd-acceleration-opportunities)
4. [Memory Management Patterns](#4-memory-management-patterns)
5. [Browser vs Node.js Deployment Strategies](#5-browser-vs-nodejs-deployment-strategies)
6. [WASM Module Loading and Initialization Patterns](#6-wasm-module-loading-and-initialization-patterns)
7. [Performance Benchmarking Framework for WASM](#7-performance-benchmarking-framework-for-wasm)
8. [Recommendations for the Sublinear-Time Solver](#8-recommendations-for-the-sublinear-time-solver)
---
## 1. Existing WASM Usage in ruvector
### 1.1 Scale of WASM Infrastructure
The ruvector project treats WASM as a first-class deployment target, not an experimental feature. The Cargo workspace defines **27 dedicated WASM crates**, spanning vector database operations, attention mechanisms, graph algorithms, ML inference, and self-learning solvers.
#### WASM Crate Inventory (27 crates)
| Crate | Description | Target | Size |
|-------|-------------|--------|------|
| `ruvector-wasm` | Core vector DB bindings (HNSW, insert, search, delete) | `wasm32-unknown-unknown` (wasm-bindgen) | ~28 KB src |
| `rvf-solver-wasm` | Self-learning temporal solver (Thompson Sampling, PolicyKernel) | `wasm32-unknown-unknown` (no_std + alloc, `extern "C"`) | ~160 KB compiled |
| `rvf-wasm` | RVF format microkernel for browser/edge vector ops | `wasm32-unknown-unknown` | - |
| `micro-hnsw-wasm` | Neuromorphic HNSW with spiking neural nets | `wasm32-unknown-unknown` | 11.8 KB compiled |
| `ruvector-attention-wasm` | 18+ attention mechanisms (Flash, MoE, Hyperbolic) | `wasm32-unknown-unknown` (wasm-bindgen) | - |
| `ruvector-attention-unified-wasm` | Unified attention API | `wasm32-unknown-unknown` | 339 KB compiled |
| `ruvector-learning-wasm` | MicroLoRA adaptation (<100us latency) | `wasm32-unknown-unknown` | 39 KB compiled |
| `ruvector-nervous-system-wasm` | Bio-inspired neural simulation | `wasm32-unknown-unknown` | 178 KB compiled |
| `ruvector-economy-wasm` | Compute credit management | `wasm32-unknown-unknown` | 181 KB compiled |
| `ruvector-exotic-wasm` | Quantum, hyperbolic, topological | `wasm32-unknown-unknown` | 149 KB compiled |
| `ruvector-sparse-inference-wasm` | Sparse matrix inference with WASM SIMD | `wasm32-unknown-unknown` | - |
| `ruvector-delta-wasm` | Delta operations with SIMD | `wasm32-unknown-unknown` | - |
| `ruvector-mincut-wasm` | Subpolynomial-time dynamic min-cut | `wasm32-unknown-unknown` | - |
| `ruvector-mincut-gated-transformer-wasm` | Gated transformer min-cut | `wasm32-unknown-unknown` | - |
| `ruvector-graph-wasm` | Graph operations | `wasm32-unknown-unknown` | - |
| `ruvector-gnn-wasm` | Graph neural networks | `wasm32-unknown-unknown` | - |
| `ruvector-dag-wasm` | Minimal DAG for browser/embedded | `wasm32-unknown-unknown` | - |
| `ruvector-math-wasm` | Math operations (Wasserstein, manifolds, spherical) | `wasm32-unknown-unknown` | - |
| `ruvector-router-wasm` | Query routing | `wasm32-unknown-unknown` | - |
| `ruvector-fpga-transformer-wasm` | FPGA transformer simulation | `wasm32-unknown-unknown` | - |
| `ruvector-temporal-tensor-wasm` | Temporal tensor operations | `wasm32-unknown-unknown` | - |
| `ruvector-tiny-dancer-wasm` | Lightweight operations | `wasm32-unknown-unknown` | - |
| `ruvector-hyperbolic-hnsw-wasm` | Hyperbolic HNSW | `wasm32-unknown-unknown` | - |
| `ruvector-domain-expansion-wasm` | Cross-domain transfer learning | `wasm32-unknown-unknown` | - |
| `ruvllm-wasm` | LLM inference | `wasm32-unknown-unknown` | - |
| `ruqu-wasm` | Quantum operations | `wasm32-unknown-unknown` | - |
| `exo-wasm` (example) | Exo AI experiment | `wasm32-unknown-unknown` | - |
### 1.2 Two Distinct WASM Binding Strategies
The codebase employs **two fundamentally different WASM integration patterns**:
#### Pattern A: wasm-bindgen + wasm-pack (High-Level, Browser-First)
Used by: `ruvector-wasm`, `ruvector-attention-wasm`, `ruvector-math-wasm`, most `-wasm` crates.
```rust
// crates/ruvector-wasm/src/lib.rs
use wasm_bindgen::prelude::*;
use js_sys::{Float32Array, Object, Promise};
use web_sys::{console, IdbDatabase, IdbFactory};

#[wasm_bindgen(start)]
pub fn init() {
    console_error_panic_hook::set_once();
    tracing_wasm::set_as_global_default();
}

#[wasm_bindgen]
pub struct VectorDB { /* ... */ }

#[wasm_bindgen]
impl VectorDB {
    #[wasm_bindgen(constructor)]
    pub fn new(dimensions: usize, metric: Option<String>, use_hnsw: Option<bool>)
        -> Result<VectorDB, JsValue> { /* ... */ }
}
```
Key dependencies: `wasm-bindgen`, `wasm-bindgen-futures`, `js-sys`, `web-sys`, `serde-wasm-bindgen`, `console_error_panic_hook`.
Advantages: Rich JS interop, automatic TypeScript type generation, Promise support, access to Web APIs (IndexedDB, Workers, console).
#### Pattern B: no_std + extern "C" ABI (Low-Level, Minimal)
Used by: `rvf-solver-wasm`, `rvf-wasm`, `micro-hnsw-wasm`.
```rust
// crates/rvf/rvf-solver-wasm/src/lib.rs
#![no_std]
extern crate alloc;

#[no_mangle]
pub extern "C" fn rvf_solver_create() -> i32 {
    registry().create()
}

#[no_mangle]
pub extern "C" fn rvf_solver_train(handle: i32, count: i32, /* ... */) -> i32 { /* ... */ }
```
Key dependencies: `dlmalloc` (global allocator), `libm`, `serde` (no_std + alloc). No wasm-bindgen.
Advantages: Minimal binary size (~160 KB for rvf-solver-wasm, 11.8 KB for micro-hnsw-wasm), no JS runtime dependency, runs on bare wasm32-unknown-unknown, suitable for self-bootstrapping RVF files.
### 1.3 Kernel Pack System (ADR-005)
The `ruvector-wasm` crate includes a sophisticated **Kernel Pack System** (`/crates/ruvector-wasm/src/kernel/`) for secure, sandboxed execution of ML compute kernels via Wasmtime:
- **Manifest parsing** (`manifest.rs`): Declares kernel categories (Positional/RoPE, Normalization/RMSNorm, Activation/SwiGLU, KV-Cache, Adapter/LoRA), tensor specs, resource limits
- **Ed25519 signature verification** (`signature.rs`): Supply chain security for kernel packs
- **SHA256 hash verification** (`hash.rs`): Content integrity
- **Epoch-based execution budgets** (`epoch.rs`): Coarse-grained interruption with configurable tick intervals (10ms server, 1ms embedded)
- **Shared memory protocol** (`memory.rs`): 16-byte aligned allocation, region overlap validation, tensor layout management
- **Kernel runtime** (`runtime.rs`): `KernelRuntime` trait with compile/instantiate/execute lifecycle, mock runtime for testing
- **Trusted allowlist** (`allowlist.rs`): Restricts which kernel IDs may execute
This kernel pack system is directly relevant to the sublinear-time solver because it provides a ready-made infrastructure for sandboxed execution of solver kernels with resource limits.
### 1.4 Self-Bootstrapping WASM (RVF Format)
The `rvf-types` crate defines a `WasmHeader` (`/crates/rvf/rvf-types/src/wasm_bootstrap.rs`) for embedding WASM modules directly inside `.rvf` data files:
```
.rvf file
+-- WASM_SEG (role=Interpreter, ~50 KB)
+-- WASM_SEG (role=Microkernel, ~5.5 KB)
+-- VEC_SEG (data)
```
Roles: `Microkernel`, `Interpreter`, `Combined`, `Extension`, `ControlPlane`.
Targets: `Wasm32`, `WasiP1`, `WasiP2`, `Browser`, `BareTile`.
Feature flags: `WASM_FEAT_SIMD`, `WASM_FEAT_BULK_MEMORY`, `WASM_FEAT_MULTI_VALUE`, `WASM_FEAT_REFERENCE_TYPES`, `WASM_FEAT_THREADS`, `WASM_FEAT_TAIL_CALL`, `WASM_FEAT_GC`, `WASM_FEAT_EXCEPTION_HANDLING`.
### 1.5 Unified WASM TypeScript API
The `@ruvector/wasm-unified` npm package (`/npm/packages/ruvector-wasm-unified/src/index.ts`) provides a high-level TypeScript surface combining all WASM modules:
```typescript
export interface UnifiedEngine {
  attention: AttentionEngine; // 14+ mechanisms
  learning: LearningEngine;   // MicroLoRA, SONA, BTSP, RL
  nervous: NervousEngine;     // Bio-inspired neural simulation
  economy: EconomyEngine;     // Compute credits
  exotic: ExoticEngine;       // Quantum, hyperbolic, topological
  version(): string;
  getStats(): UnifiedStats;
  init(): Promise<void>;
  dispose(): void;
}
```
---
## 2. WASM Build Pipeline Compatibility
### 2.1 Workspace-Level Configuration
The root `Cargo.toml` defines workspace-level WASM dependencies:
```toml
# /Cargo.toml (workspace)
[workspace.dependencies]
wasm-bindgen = "0.2"
wasm-bindgen-futures = "0.4"
js-sys = "0.3"
web-sys = { version = "0.3", features = ["Worker", "MessagePort", "console"] }
getrandom = { version = "0.3", features = ["wasm_js"] }
```
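The `ruvector-wasm` crate additionally depends on getrandom 0.2 under the renamed key `getrandom02` with its `js` feature enabled, next to the 0.3 `wasm_js` feature. This appears intended to keep transitive dependencies that still use getrandom 0.2 working on WASM: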
```toml
# In ruvector-wasm/Cargo.toml
getrandom02 = { package = "getrandom", version = "0.2", features = ["js"] }
[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { workspace = true, features = ["wasm_js"] }
```
And a workspace-level patch for hnsw_rs to use rand 0.8 for WASM compatibility:
```toml
[patch.crates-io]
hnsw_rs = { path = "./patches/hnsw_rs" }
```
### 2.2 Build Profiles
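Two WASM-specific build profiles exist (Profiles 1 and 2 below); Profile 3 is the workspace's native release profile, shown for comparison. Note that Cargo honors `[profile.*]` sections only in the workspace root manifest, so a profile declared in a member crate's `Cargo.toml` takes effect only when that crate is built as its own root: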
#### Profile 1: Size-Optimized (for wasm-bindgen crates)
```toml
# crates/ruvector-wasm/Cargo.toml
[profile.release]
opt-level = "z" # Optimize for size
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit
panic = "abort" # No unwind tables
[profile.release.package."*"]
opt-level = "z"
[package.metadata.wasm-pack.profile.release]
wasm-opt = false # Disable wasm-opt (already optimized by LTO)
```
#### Profile 2: Size-Optimized + Strip (for no_std crates)
```toml
# crates/rvf/rvf-solver-wasm/Cargo.toml
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true # Also strips debug symbols
```
#### Profile 3: Workspace Default Release (native)
```toml
# Root Cargo.toml
[profile.release]
opt-level = 3 # Optimize for speed
lto = "fat"
codegen-units = 1
strip = true
panic = "unwind" # Keeps unwind tables (unlike WASM profile)
```
### 2.3 Build Tooling
The test script at `/scripts/test/test-wasm.mjs` demonstrates the build command:
```bash
wasm-pack build crates/ruvector-attention-wasm --target web --release
```
For no_std crates like rvf-solver-wasm, the standard cargo command with WASM target is used:
```bash
cargo build --target wasm32-unknown-unknown --release -p rvf-solver-wasm
```
### 2.4 Sublinear-Time Solver Build Compatibility
The rvf-solver-wasm crate provides the closest precedent for a sublinear-time solver WASM build:
- **Target**: `wasm32-unknown-unknown` (no WASI dependency)
- **Allocator**: `dlmalloc` (global allocator for `alloc`)
- **Math**: `libm` (no_std-compatible math functions)
- **Serialization**: `serde` + `serde_json` (no_std + alloc features)
- **Crypto**: `rvf-crypto` (SHAKE-256 witness chain)
- **Panic handler**: `core::arch::wasm32::unreachable()`
- **ABI**: `extern "C"` exports (no wasm-bindgen overhead)
- **Crate type**: `cdylib` only (no rlib)
This approach produces binaries in the ~160 KB range, which is excellent for edge deployment.

---
## 3. SIMD Acceleration Opportunities
### 3.1 Existing WASM SIMD Infrastructure
The codebase has **extensive WASM SIMD128 support** across multiple crates, all using `core::arch::wasm32::*` intrinsics. Every SIMD function provides dual implementations: a `#[cfg(target_feature = "simd128")]` version using WASM SIMD intrinsics and a `#[cfg(not(target_feature = "simd128"))]` scalar fallback.
#### WASM SIMD Operations Already Implemented
| Crate | File | Operations |
|-------|------|------------|
| `ruvector-delta-wasm` | `src/simd.rs` | `f32x4` add, sub, scale, dot, L2 norm, diff, abs, clamp, count_nonzero |
| `ruvector-sparse-inference` | `src/backend/wasm.rs` | `f32x4` dot product, ReLU, vector add, AXPY |
| `ruvector-mincut` | `src/wasm/simd.rs` | `v128` popcount (table lookup method), XOR, boundary computation, batch membership |
| `ruvector-core` | `src/simd_intrinsics.rs` | x86_64 (AVX2, AVX-512, FMA), aarch64 (NEON, unrolled), INT8 quantized, batch operations |
#### SIMD Operations in ruvector-delta-wasm/src/simd.rs (Representative)
```rust
use core::arch::wasm32::*;

#[cfg(target_feature = "simd128")]
pub fn simd_dot(a: &[f32], b: &[f32]) -> f32 {
    let chunks = a.len() / 4;
    let mut sum_vec = f32x4_splat(0.0);
    for i in 0..chunks {
        let offset = i * 4;
        unsafe {
            let a_vec = v128_load(a.as_ptr().add(offset) as *const v128);
            let b_vec = v128_load(b.as_ptr().add(offset) as *const v128);
            let prod = f32x4_mul(a_vec, b_vec);
            sum_vec = f32x4_add(sum_vec, prod);
        }
    }
    // Horizontal sum + remainder handling
    let sum_array: [f32; 4] = unsafe { core::mem::transmute(sum_vec) };
    let mut sum = sum_array[0] + sum_array[1] + sum_array[2] + sum_array[3];
    for i in (chunks * 4)..a.len() { sum += a[i] * b[i]; }
    sum
}
```
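For parity testing it can help to have a scalar routine that accumulates in the same four-lane order as `simd_dot` above, since reordering floating-point additions can change the low bits of the result. The following is a minimal sketch and not code from the ruvector repository; `dot_lanes4` is a hypothetical name, and unlike the excerpt it rejects slices of different lengths.

```rust
/// Scalar reference that mirrors the lane-wise accumulation order of a
/// 4-wide SIMD dot product. `dot_lanes4` is a hypothetical helper name.
pub fn dot_lanes4(a: &[f32], b: &[f32]) -> f32 {
    // The SIMD excerpt indexes `b` with offsets derived from `a.len()`,
    // so mismatched lengths are rejected up front here.
    assert_eq!(a.len(), b.len(), "slices must have equal length");

    let chunks = a.len() / 4;
    let mut lanes = [0.0f32; 4];
    for i in 0..chunks {
        let base = i * 4;
        for lane in 0..4 {
            lanes[lane] += a[base + lane] * b[base + lane];
        }
    }
    // Horizontal sum in the same order as the excerpt, then the tail.
    let mut sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for i in (chunks * 4)..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```

Comparing this routine against the SIMD build isolates differences caused by the instruction set from differences caused by summation order.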
#### SIMD Operations in ruvector-sparse-inference/src/backend/wasm.rs (Backend Trait)
```rust
pub struct WasmBackend;

// Method bodies are elided in this excerpt.
impl Backend for WasmBackend {
    fn dot_product(&self, a: &[f32], b: &[f32]) -> f32 { /* SIMD dispatch */ }
    fn sparse_matmul(&self, matrix: &Array2<f32>, input: &[f32], rows: &[usize]) -> Vec<f32> { /* ... */ }
    fn sparse_matmul_accumulate(&self, matrix: &Array2<f32>, input: &[f32], cols: &[usize], output: &mut [f32]) { /* ... */ }
    fn activation(&self, data: &mut [f32], activation_type: ActivationType) { /* ReLU via SIMD */ }
    fn add(&self, a: &mut [f32], b: &[f32]) { /* ... */ }
    fn axpy(&self, a: &mut [f32], b: &[f32], scalar: f32) { /* ... */ }
    fn name(&self) -> &'static str { "WASM-SIMD" }
    fn simd_width(&self) -> usize { 4 } // 128-bit = 4 x f32
}
```
### 3.2 SIMD Acceleration Opportunities for the Sublinear-Time Solver
Based on the sublinear-time solver's core operations, the following SIMD acceleration points are identified:
| Operation | SIMD Strategy | Expected Speedup | Existing Pattern |
|-----------|---------------|-------------------|------------------|
| Distance computation (dot, cosine, euclidean) | `f32x4_mul` + `f32x4_add` accumulation | 2-4x | `ruvector-delta-wasm/src/simd.rs` |
| Vector normalization | `f32x4_mul` (scale) + `f32x4_add` (L2 norm) | 2-4x | `simd_l2_norm_squared`, `simd_scale` |
| Bitset operations (partition tracking) | `v128_xor`, `v128_and`, popcount via lookup | 4-8x | `ruvector-mincut/src/wasm/simd.rs` |
| Sparse matrix-vector multiply | SIMD dot + sparse row selection | 2-4x | `WasmBackend::sparse_matmul` |
| Activation functions (ReLU, GELU) | `f32x4_max` with zero splat | 2-4x | `relu_wasm_simd` |
| Thompson Sampling bandit updates | Scalar (branching-heavy) | 1x (no benefit) | N/A |
| Sort/selection (top-k) | Scalar (comparison-heavy) | 1x (no benefit) | N/A |
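
The speedup figures in the table are estimates, so each SIMD kernel needs a scalar baseline to be measured against. As an illustration for the bitset row, here is a minimal scalar sketch of XOR plus popcount over 64-bit words; `membership_diff` is a hypothetical name and the word layout is an assumption, not the representation used in `ruvector-mincut`.

```rust
/// Number of elements whose partition membership differs between two
/// bitsets stored as 64-bit words (hypothetical scalar reference).
pub fn membership_diff(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "bitsets must have the same word count");
    a.iter()
        .zip(b.iter())
        // XOR marks the differing bits; count_ones is the popcount.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

A SIMD variant would process two such words per `v128` and should return the same count for every input, which makes this routine usable as a test oracle.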
### 3.3 SIMD Feature Detection
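The `ruvector-wasm` crate exports a function that tells JS whether the module was compiled with SIMD128. Because `cfg(target_feature)` is evaluated at compile time, the result describes the build, not the capabilities of the host engine:
```rust
#[wasm_bindgen(js_name = detectSIMD)]
pub fn detect_simd() -> bool {
    #[cfg(target_feature = "simd128")]
    { true }
    #[cfg(not(target_feature = "simd128"))]
    { false }
}
```
For the sublinear-time solver, SIMD should be compiled in via `RUSTFLAGS="-C target-feature=+simd128"` at build time. A module built this way contains SIMD instructions and fails validation on an engine without SIMD support, so environments that lack it need a separately compiled scalar build, not an in-module fallback.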
### 3.4 Native SIMD Comparison
The native codebase (`ruvector-core/src/simd_intrinsics.rs`) supports:
- **x86_64**: AVX2 (256-bit, 8 x f32), AVX-512 (512-bit, 16 x f32), FMA, INT8 quantized
- **aarch64**: NEON (128-bit, 4 x f32), 4x loop unrolling, FMA via `vfmaq_f32`
- **WASM**: SIMD128 (128-bit, 4 x f32)
WASM SIMD128 provides the same width as NEON (4 x f32), but the fixed-width SIMD128 instruction set has no fused multiply-add (there is no `f32x4_fma`); fused operations exist only in the separate relaxed-SIMD extension. As a rough, unmeasured estimate, the sublinear-time solver WASM build may be 2-3x slower than a native NEON build for distance computations and 4-8x slower than an AVX-512 build, while still outperforming the scalar fallback.

---
## 4. Memory Management Patterns
### 4.1 Shared Memory Protocol (Kernel Pack System)
The kernel pack system at `/crates/ruvector-wasm/src/kernel/memory.rs` defines a mature shared memory protocol:
```rust
pub struct SharedMemoryProtocol {
    total_size: usize,     // Total memory in bytes
    current_offset: usize, // Bump allocator position
    alignment: usize,      // Typically 16 bytes
}

impl SharedMemoryProtocol {
    pub fn default_settings() -> Self {
        Self::new(256, 16) // 256 pages = 16 MB, 16-byte alignment
    }

    pub fn allocate(&mut self, size: usize) -> Result<usize, KernelError> {
        let aligned_offset = self.align_offset(self.current_offset);
        // ...bounds check...
        self.current_offset = aligned_offset + size;
        Ok(aligned_offset)
    }
}
```
The `KernelInvocationDescriptor` manages tensor memory layout:
```rust
pub struct KernelInvocationDescriptor {
    pub descriptor: KernelDescriptor, // input_a, input_b, output, scratch, params offsets+sizes
    protocol: SharedMemoryProtocol,
}
```
The `MemoryLayoutValidator` prevents region overlap and bounds violations.
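
The validator's source is not reproduced in this document, so the following is only a sketch of the kind of check it describes: every region must lie inside the arena, and no two regions may share a byte. `Region` and `layout_is_valid` are hypothetical names, not the crate's API.

```rust
/// A byte range inside linear memory (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub offset: usize,
    pub size: usize,
}

impl Region {
    /// Exclusive end offset, or `None` if `offset + size` overflows.
    fn end(&self) -> Option<usize> {
        self.offset.checked_add(self.size)
    }
}

/// Returns true when every region fits inside `total_size` and no two
/// non-empty regions share a byte.
pub fn layout_is_valid(regions: &[Region], total_size: usize) -> bool {
    // Bounds check first; collect [start, end) pairs for non-empty regions.
    let mut spans = Vec::with_capacity(regions.len());
    for r in regions {
        match r.end() {
            Some(end) if end <= total_size => {
                if r.size > 0 {
                    spans.push((r.offset, end));
                }
            }
            _ => return false,
        }
    }
    // After sorting by start, an overlap exists exactly when some span
    // starts before the previous one ends.
    spans.sort_unstable();
    spans.windows(2).all(|w| w[1].0 >= w[0].1)
}
```

Sorting keeps the check at O(n log n) for n regions; for the five regions of a kernel descriptor a pairwise comparison would work just as well.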
### 4.2 Typed Arrays / Zero-Copy Transfer
The wasm-bindgen crates pass vectors across the JS/WASM boundary as `Float32Array` values, which are copied on each crossing:
```rust
// Input: JS Float32Array -> Rust Vec<f32>
pub fn insert(&self, vector: Float32Array, ...) -> Result<String, JsValue> {
    let vector_data: Vec<f32> = vector.to_vec(); // Copy from JS typed array
    // ...
}

// Output: Rust Vec<f32> -> JS Float32Array
pub fn vector(&self) -> Float32Array {
    Float32Array::from(&self.inner.vector[..]) // Copy to JS typed array
}
```
Note: `Float32Array::to_vec()` and `Float32Array::from()` perform copies. True zero-copy requires accessing WASM linear memory directly from JS, which is demonstrated in the pwa-loader:
```javascript
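// Write into WASM memory: one copy, with no intermediate buffer
function wasmWrite(data) {
  const ptr = wasmInstance.exports.rvf_alloc(data.length);
  // Create the view after allocating: growing memory detaches older views
  const mem = new Uint8Array(wasmMemory.buffer, ptr, data.length);
  mem.set(data); // Direct memory write
  return ptr;
}
// Read from WASM memory: the view itself is zero-copy, while slice()
// copies the bytes out so the result stays valid if memory later grows
function wasmRead(ptr, len) {
  return new Uint8Array(wasmMemory.buffer, ptr, len).slice();
}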
```
### 4.3 Memory Patterns in rvf-solver-wasm (no_std)
The no_std solver uses `dlmalloc` as global allocator and manages its own instance registry:
```rust
// Global mutable registry - safe in single-threaded WASM
static mut REGISTRY: Registry = Registry::new();
const MAX_INSTANCES: usize = 8;
struct SolverInstance {
    solver: AdaptiveSolver,
    last_result_json: Vec<u8>, // Heap-allocated via dlmalloc
    policy_json: Vec<u8>,
    witness_chain: Vec<u8>,
}
```
Memory export for external reads uses raw pointer copies:
```rust
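#[no_mangle]
pub extern "C" fn rvf_solver_result_read(handle: i32, out_ptr: i32) -> i32 {
    // (lookup of `inst` from `handle` is not shown in this excerpt)
    let data = &inst.last_result_json;
    // The caller must have reserved at least `data.len()` bytes at `out_ptr`.
    unsafe {
        core::ptr::copy_nonoverlapping(data.as_ptr(), out_ptr as *mut u8, data.len());
    }
    data.len() as i32
}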
```
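The export above trusts the host to have reserved enough space at `out_ptr`. A new solver could take the destination capacity as an extra argument and refuse to write past it. The sketch below shows only the safe core of such a function; `copy_result` and the capacity parameter are assumptions for illustration and are not part of the existing `rvf-solver-wasm` ABI.

```rust
/// Copies `data` into `out` only if it fits; returns the number of bytes
/// written, or -1 when the destination is too small.
/// Hypothetical helper, not part of the existing rvf-solver-wasm exports.
pub fn copy_result(data: &[u8], out: &mut [u8]) -> i32 {
    // Also reject lengths that cannot be represented in the i32 return value.
    if data.len() > out.len() || data.len() > i32::MAX as usize {
        return -1;
    }
    out[..data.len()].copy_from_slice(data);
    data.len() as i32
}
```

An `extern "C"` wrapper would build the `out` slice from a pointer and a capacity supplied by the host, keeping the `unsafe` part to a single line.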
### 4.4 Memory Limits
| Configuration | Max Pages | Memory Limit | Context |
|---------------|-----------|--------------|---------|
| Server runtime | 1024 | 64 MB | `RuntimeConfig::server()` |
| Embedded runtime | 64 | 4 MB | `RuntimeConfig::embedded()` |
| Default shared memory | 256 | 16 MB | `SharedMemoryProtocol::default_settings()` |
| Microkernel (RVF) | 2-4 | 128-256 KB | `WasmHeader` min/max pages |
| WASM page size | 1 | 64 KB | `WASM_PAGE_SIZE = 65536` |
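
The byte limits in the table follow directly from the page counts. A small sketch of the arithmetic, assuming the `WASM_PAGE_SIZE` value quoted in the table (`pages_to_bytes` and `pages_for` are hypothetical helper names):

```rust
/// Size of one WebAssembly linear-memory page in bytes (64 KiB).
pub const WASM_PAGE_SIZE: usize = 65536;

/// Converts a page count to bytes, returning `None` on overflow.
pub fn pages_to_bytes(pages: usize) -> Option<usize> {
    pages.checked_mul(WASM_PAGE_SIZE)
}

/// Smallest number of pages that can hold `bytes` bytes.
pub fn pages_for(bytes: usize) -> usize {
    // Round up without risking overflow in `bytes + WASM_PAGE_SIZE - 1`.
    bytes / WASM_PAGE_SIZE + usize::from(bytes % WASM_PAGE_SIZE != 0)
}
```

For example, a 160 KB payload occupies three pages, which is consistent with the 2-4 page range listed for the RVF microkernel.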
### 4.5 Security Boundary Validation
The `ruvector-wasm` crate enforces input validation at the WASM boundary:
```rust
const MAX_VECTOR_DIMENSIONS: usize = 65536;
#[wasm_bindgen(constructor)]
pub fn new(vector: Float32Array, ...) -> Result<JsVectorEntry, JsValue> {
    let vec_len = vector.length() as usize;
    if vec_len == 0 {
        return Err(JsValue::from_str("Vector cannot be empty"));
    }
    if vec_len > MAX_VECTOR_DIMENSIONS {
        return Err(JsValue::from_str(&format!(
            "Vector dimensions {} exceed maximum allowed {}", vec_len, MAX_VECTOR_DIMENSIONS
        )));
    }
    // ...
}
```
---
## 5. Browser vs Node.js Deployment Strategies
### 5.1 Browser Deployment (Primary)
The ruvector-wasm crate is browser-first, using:
- **IndexedDB persistence**: `web-sys` features include `IdbDatabase`, `IdbFactory`, `IdbObjectStore`, `IdbRequest`, `IdbTransaction`, `IdbOpenDbRequest` (`/crates/ruvector-wasm/Cargo.toml`)
- **Web Workers**: Embedded JavaScript worker pool (`/crates/ruvector-wasm/src/worker-pool.js`, `/crates/ruvector-wasm/src/worker.js`) for parallel operations
- **Tracing via console**: `tracing-wasm` sends logs to browser dev tools
- **Promise-based async**: `wasm-bindgen-futures` for async operations
- **getrandom via JS**: `getrandom` with `wasm_js` feature uses `crypto.getRandomValues()`
- **PWA support**: The pwa-loader example (`/examples/pwa-loader/app.js`) demonstrates offline-capable WASM loading
#### Browser Loading Pattern
```javascript
// From examples/pwa-loader/app.js
async function loadWasm() {
  const response = await fetch(WASM_PATH);
  const bytes = await response.arrayBuffer();
  const importObject = { env: {} };
  const result = await WebAssembly.instantiate(bytes, importObject);
  wasmInstance = result.instance;
  wasmMemory = wasmInstance.exports.memory;
}
```
#### Browser SIMD Support
WASM SIMD128 is supported in Chrome 91+, Firefox 89+, Safari 16.4+, and Edge 91+. This is estimated to cover >95% of active browsers as of 2026. Feature detection can be done by validating a tiny module that uses a SIMD instruction:
```javascript
const simdSupported = WebAssembly.validate(
  new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11])
);
```
### 5.2 Node.js Deployment
The project supports Node.js via:
- **wasm-pack `--target nodejs`**: Generates CommonJS bindings
- **Direct instantiation** from test scripts (`/scripts/test/test-wasm.mjs`):
```javascript
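import { readFileSync } from 'fs';
import { join } from 'path';
// `pkgPath` and `wasmPath` are assumed to be defined earlier in the script
const wasmBuffer = readFileSync(wasmPath);
const mathWasm = await import(join(pkgPath, 'ruvector_math_wasm.js'));
await mathWasm.default(wasmBuffer);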
```
- **Edge-net example**: `/examples/edge-net/pkg/node/` provides Node-specific WASM packages
Node.js has had WASM SIMD support since v16.4 (V8 9.1+). For the sublinear-time solver, Node.js deployment enables server-side and CLI usage with the same WASM binary.
### 5.3 Edge / Embedded Deployment
The `micro-hnsw-wasm` crate (11.8 KB) and `rvf-solver-wasm` (~160 KB) demonstrate ultra-compact deployment:
- **iOS/Swift**: `/examples/wasm/ios/` includes Swift resources with embedded WASM
- **Self-bootstrapping**: The WASM_SEG system embeds WASM interpreters inside data files
- **Target platforms**: `WasmTarget::Wasm32`, `WasiP1`, `WasiP2`, `Browser`, `BareTile`
### 5.4 Deployment Target Matrix
| Target | WASM Format | Binding | SIMD | Size Budget | Persistence |
|--------|-------------|---------|------|-------------|-------------|
| Browser (Chrome/FF/Safari) | wasm-bindgen | JS glue + TS types | SIMD128 | <500 KB | IndexedDB |
| Node.js (>= 16.4) | wasm-bindgen (nodejs) or raw | CommonJS/ESM | SIMD128 | <1 MB | fs |
| Cloudflare Workers | wasm-bindgen (web) | ESM | SIMD128 | <1 MB | KV |
| iOS/Swift | raw wasm32 | C FFI | Optional | <200 KB | CoreData |
| Bare-metal / RVF | no_std cdylib | extern "C" | Optional | <200 KB | None |
---
## 6. WASM Module Loading and Initialization Patterns
### 6.1 Pattern 1: wasm-bindgen Auto-Init
Used by most WASM crates. The `#[wasm_bindgen(start)]` attribute runs initialization automatically:
```rust
#[wasm_bindgen(start)]
pub fn init() {
    console_error_panic_hook::set_once();
    tracing_wasm::set_as_global_default();
}
```
JS side (generated by wasm-pack):
```javascript
import init, { VectorDB } from './ruvector_wasm.js';
await init(); // Loads + instantiates + runs start function
const db = new VectorDB(384, 'cosine', true);
```
### 6.2 Pattern 2: Manual WebAssembly.instantiate
Used by the pwa-loader and no_std modules:
```javascript
const response = await fetch(WASM_PATH);
const bytes = await response.arrayBuffer();
const importObject = { env: {} };
const result = await WebAssembly.instantiate(bytes, importObject);
wasmInstance = result.instance;
wasmMemory = wasmInstance.exports.memory;
```
This pattern offers maximum control: the host can inspect exports before calling any function, handle errors granularly, and manage memory directly.
### 6.3 Pattern 3: Streaming Instantiation
For large modules, `WebAssembly.instantiateStreaming` should be used (not currently in the codebase but recommended):
```javascript
const result = await WebAssembly.instantiateStreaming(
  fetch(WASM_PATH),
  importObject
);
```
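This starts compiling while bytes are still downloading, which can noticeably reduce load time for large modules. Note that `instantiateStreaming` requires the server to send the response with the `application/wasm` MIME type.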
### 6.4 Pattern 4: Unified Engine Lazy Init
The `@ruvector/wasm-unified` uses lazy initialization:
```typescript
let defaultEngine: UnifiedEngine | null = null;
export async function getDefaultEngine(): Promise<UnifiedEngine> {
  if (!defaultEngine) {
    defaultEngine = await createUnifiedEngine();
    await defaultEngine.init();
  }
return defaultEngine;
}
```
### 6.5 Pattern 5: Instance Registry (rvf-solver-wasm)
The solver WASM uses a handle-based instance registry:
```javascript
// Rust side (rvf-solver-wasm):
//   static mut REGISTRY: Registry = Registry::new(); // Max 8 concurrent solvers
// JS creates solver:
let handle = wasmInstance.exports.rvf_solver_create();
// JS uses solver:
wasmInstance.exports.rvf_solver_train(handle, 100, 1, 10, seedLo, seedHi);
// JS reads result:
let len = wasmInstance.exports.rvf_solver_result_len(handle);
let ptr = wasmInstance.exports.rvf_solver_alloc(len);
wasmInstance.exports.rvf_solver_result_read(handle, ptr);
let json = new TextDecoder().decode(new Uint8Array(wasmMemory.buffer, ptr, len));
// JS destroys:
wasmInstance.exports.rvf_solver_destroy(handle);
```
This is the recommended pattern for the sublinear-time solver because it:
- Supports multiple concurrent solver instances
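- Keeps each solver's state behind an opaque integer handle (the registry itself is a `static mut`, which is sound only while the module runs single-threaded)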
- Enables resource cleanup
- Works across all deployment targets (browser, Node, bare-metal)
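
The registry's implementation is not reproduced in this document. The following is a minimal sketch of a fixed-capacity, handle-based registry under stated assumptions: it uses `std` and a generic payload `T` for brevity, whereas the real crate is `no_std` and stores `SolverInstance` values, and every name apart from `MAX_INSTANCES` is hypothetical. Handles are slot indices, and -1 signals that all slots are in use.

```rust
/// Maximum live instances, mirroring `MAX_INSTANCES` in the excerpt.
const MAX_INSTANCES: usize = 8;

/// Fixed-capacity registry that hands out integer handles.
/// `T` stands in for the solver state; the names here are hypothetical.
pub struct Registry<T> {
    slots: [Option<T>; MAX_INSTANCES],
}

impl<T> Registry<T> {
    pub fn new() -> Self {
        Self { slots: std::array::from_fn(|_| None) }
    }

    /// Stores `value` in the first free slot and returns its handle,
    /// or -1 when all slots are occupied.
    pub fn create(&mut self, value: T) -> i32 {
        match self.slots.iter().position(|s| s.is_none()) {
            Some(i) => {
                self.slots[i] = Some(value);
                i as i32
            }
            None => -1,
        }
    }

    /// Looks up a handle, rejecting negative, out-of-range, or freed ones.
    pub fn get_mut(&mut self, handle: i32) -> Option<&mut T> {
        if handle < 0 {
            return None;
        }
        self.slots.get_mut(handle as usize)?.as_mut()
    }

    /// Frees the slot; returns true if the handle was live.
    pub fn destroy(&mut self, handle: i32) -> bool {
        if handle < 0 {
            return false;
        }
        match self.slots.get_mut(handle as usize) {
            Some(slot) => slot.take().is_some(),
            None => false,
        }
    }
}
```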
---
## 7. Performance Benchmarking Framework for WASM
### 7.1 Existing Benchmark Infrastructure
#### In-WASM Benchmark Function
The `ruvector-wasm` crate includes a built-in benchmark export:
```rust
#[wasm_bindgen(js_name = benchmark)]
pub fn benchmark(name: &str, iterations: usize, dimensions: usize) -> Result<f64, JsValue> {
    // (setup of `db` is not shown in this excerpt)
    let start = Instant::now();
    for i in 0..iterations {
        let vector: Vec<f32> = (0..dimensions)
            .map(|_| js_sys::Math::random() as f32)
            .collect();
        let vector_arr = Float32Array::from(&vector[..]);
        db.insert(vector_arr, Some(format!("vec_{}", i)), None)?;
    }
    let duration = start.elapsed();
    // Returns throughput in inserts per second
    Ok(iterations as f64 / duration.as_secs_f64())
}
```
#### WASM Solver Benchmark Binary
The `/examples/benchmarks/src/bin/wasm_solver_bench.rs` provides a native vs WASM comparison framework:
```
WASM vs Native AGI Solver Benchmark
Config: holdout=50, training=50, cycles=3, budget=200
NATIVE SOLVER RESULTS
Mode Acc% Cost Noise% Time Pass
A baseline xx.x% xxx.x xx.x% xxxms PASS
B compiler xx.x% xxx.x xx.x% xxxms PASS
C learned xx.x% xxx.x xx.x% xxxms PASS
WASM REFERENCE METRICS
Native total time: xxxms
WASM expected: ~xxxms (2-5x native)
```
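The benchmark reports an expected WASM overhead of **2-5x slower than native** for the self-learning solver workload. As the output format suggests, this figure is an estimate derived from the native timing, not a measured WASM run.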
#### SIMD Benchmarks
The `/crates/prime-radiant/benches/simd_benchmarks.rs` and `/crates/ruvector-sparse-inference/benches/simd_kernels.rs` provide Criterion benchmarks for SIMD operations that can be adapted for WASM SIMD.
### 7.2 Recommended Benchmarking Framework for the Sublinear-Time Solver
```
sublinear-time-solver/benches/
wasm_bench.rs -- In-Rust Criterion benchmarks (native baseline)
wasm_bench.mjs -- Node.js WASM performance runner
wasm_bench.html -- Browser WASM performance runner
bench_harness.rs -- Shared benchmark harness (puzzle generation)
```
#### Metrics to Track
| Metric | Description | Measurement |
|--------|-------------|-------------|
| `solve_throughput` | Puzzles solved per second | `iterations / elapsed_secs` |
| `solve_latency_p50` | Median solve time | Percentile of individual solve times |
| `solve_latency_p99` | 99th percentile solve time | Percentile of individual solve times |
| `memory_peak_bytes` | Peak WASM linear memory usage | `memory.buffer.byteLength` |
| `module_load_ms` | Time to instantiate WASM module | `performance.now()` around `WebAssembly.instantiate` |
| `simd_speedup` | SIMD vs scalar performance ratio | Compare SIMD build vs non-SIMD build |
| `wasm_native_ratio` | WASM-to-native performance overhead | Compare WASM throughput vs native Criterion results |
| `binary_size_bytes` | Compiled .wasm file size | `wc -c *.wasm` |
| `accuracy_parity` | Solver accuracy matches native | Bit-exact or epsilon comparison of results |
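
The latency rows require a percentile definition. The sketch below uses the nearest-rank method, which is one common choice and an assumption here (the document does not specify one); `percentile` is a hypothetical helper name.

```rust
/// Nearest-rank percentile of a set of latency samples.
/// `p` must lie in 0.0..=100.0; returns `None` for empty input or bad `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order even if a NaN sneaks into the data.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: rank = ceil(p/100 * n), clamped to at least 1.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With this definition `solve_latency_p50` is `percentile(&times, 50.0)` and `solve_latency_p99` is `percentile(&times, 99.0)`, where `times` holds the individual solve durations.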
#### Benchmark Protocol
1. **Native baseline**: Run the solver natively with Criterion (3+ iterations, warm-up)
2. **WASM baseline**: Load the same solver as WASM, run identical workload in Node.js
3. **WASM SIMD**: Build with `RUSTFLAGS="-C target-feature=+simd128"`, measure speedup
4. **Browser measurement**: Run in Chrome with `performance.now()`, measure real-world latency
5. **Size budget**: Track .wasm binary size across commits (regression alerts if >200 KB)
6. **Accuracy validation**: Compare solver output JSON between native and WASM (must match to f64 epsilon)
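
Step 6 needs a concrete comparison rule. One option, sketched below, is a relative tolerance with an absolute floor near zero. The helper names are hypothetical and the tolerance is a parameter to be tuned: a SIMD build that reorders floating-point additions may differ from native by more than a single epsilon.

```rust
/// Returns true when `a` and `b` agree to within a relative tolerance,
/// with an absolute floor for values near zero.
pub fn approx_eq(a: f64, b: f64, rel_tol: f64, abs_tol: f64) -> bool {
    if a == b {
        return true; // exact match, including equal infinities
    }
    if !a.is_finite() || !b.is_finite() {
        return false; // NaN or mismatched infinities never match
    }
    let diff = (a - b).abs();
    let scale = a.abs().max(b.abs());
    diff <= abs_tol.max(rel_tol * scale)
}

/// Compares two result vectors element-wise; lengths must match.
pub fn results_match(native: &[f64], wasm: &[f64], rel_tol: f64) -> bool {
    native.len() == wasm.len()
        && native
            .iter()
            .zip(wasm)
            .all(|(x, y)| approx_eq(*x, *y, rel_tol, f64::MIN_POSITIVE))
}
```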
---
## 8. Recommendations for the Sublinear-Time Solver
### 8.1 Binding Strategy: Use no_std + extern "C" (Pattern B)
For the sublinear-time solver WASM module, adopt the `rvf-solver-wasm` pattern:
- **no_std + alloc**: Minimizes binary size, avoids JS runtime dependency
- **dlmalloc global allocator**: Proven in rvf-solver-wasm
- **extern "C" exports**: Maximum portability (browser, Node, embedded, bare-metal)
- **Handle-based instance registry**: Supports concurrent solver instances
- **Result reads via pointer+length**: JSON serialization of results into WASM memory, host reads via typed array view
Do not use wasm-bindgen for the core solver. A thin wasm-bindgen wrapper can be created separately if a richer JS API is needed.
### 8.2 SIMD Strategy: Conditional Compilation
```rust
// In the solver crate
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod simd_wasm {
    use core::arch::wasm32::*;
    pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* SIMD128 */ }
}

#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod simd_wasm {
    pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* scalar fallback */ }
}
```
Build two variants:
- `solver.wasm` -- scalar fallback (maximum compatibility)
- `solver-simd.wasm` -- SIMD128 enabled (Chrome 91+, FF 89+, Safari 16.4+, Node 16.4+)
### 8.3 Memory Strategy: Bump Allocator + Shared Memory Protocol
Adopt the `SharedMemoryProtocol` pattern from the kernel pack system:
1. Allocate a fixed arena at solver creation (e.g., 256 pages = 16 MB)
2. Use 16-byte aligned bump allocation for tensor data
3. Reset the allocator between solve invocations (an O(1) rewind of the bump pointer)
4. Validate memory regions before kernel execution
5. Export `memory` so the host can directly view/write typed arrays without copying
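
Steps 1 to 3 can be sketched as follows. This is a minimal illustration modeled on the `SharedMemoryProtocol` excerpt in Section 4.1, not the crate's actual code: `BumpArena` and `ArenaError` are hypothetical names, and the arena tracks offsets only, leaving the backing memory to the caller.

```rust
/// Error type for the sketch below (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum ArenaError {
    OutOfMemory,
}

/// Bump allocator over a fixed-size arena; it tracks offsets only.
pub struct BumpArena {
    total_size: usize,
    current_offset: usize,
    alignment: usize, // must be a power of two
}

impl BumpArena {
    pub fn new(total_size: usize, alignment: usize) -> Self {
        assert!(alignment.is_power_of_two(), "alignment must be a power of two");
        Self { total_size, current_offset: 0, alignment }
    }

    /// Rounds `offset` up to the next multiple of the alignment.
    fn align_up(&self, offset: usize) -> Option<usize> {
        let mask = self.alignment - 1;
        offset.checked_add(mask).map(|v| v & !mask)
    }

    /// Reserves `size` bytes and returns the aligned start offset.
    pub fn allocate(&mut self, size: usize) -> Result<usize, ArenaError> {
        let start = self.align_up(self.current_offset).ok_or(ArenaError::OutOfMemory)?;
        let end = start.checked_add(size).ok_or(ArenaError::OutOfMemory)?;
        if end > self.total_size {
            return Err(ArenaError::OutOfMemory);
        }
        self.current_offset = end;
        Ok(start)
    }

    /// Releases everything at once by rewinding the bump pointer.
    pub fn reset(&mut self) {
        self.current_offset = 0;
    }
}
```

Calling `reset()` at the start of each solve invalidates every offset handed out earlier, so the host must not keep typed-array views from a previous invocation.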
### 8.4 Build Profile
```toml
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true
panic = "abort"
```
Target binary size: <200 KB (consistent with existing rvf-solver-wasm at ~160 KB).
### 8.5 Feature Detection Export
```rust
#[no_mangle]
pub extern "C" fn solver_capabilities() -> u32 {
    let mut caps = 0u32;
    #[cfg(target_feature = "simd128")]
    { caps |= 0x01; } // SIMD available
    #[cfg(feature = "thompson-sampling")]
    { caps |= 0x02; } // Thompson Sampling enabled
    #[cfg(feature = "witness-chain")]
    { caps |= 0x04; } // Witness chain enabled
    caps
}
```
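On the host side the returned bitmask has to be decoded. A minimal sketch for a Rust host (for example one embedding the module through a WASM runtime), using the three bit values from the example above; the `Capabilities` type and `decode_capabilities` function are hypothetical:

```rust
/// Bit assignments taken from the `solver_capabilities` example.
pub const CAP_SIMD: u32 = 0x01;
pub const CAP_THOMPSON_SAMPLING: u32 = 0x02;
pub const CAP_WITNESS_CHAIN: u32 = 0x04;

/// Decoded view of the capability bitmask (hypothetical host-side type).
#[derive(Debug, PartialEq, Eq)]
pub struct Capabilities {
    pub simd: bool,
    pub thompson_sampling: bool,
    pub witness_chain: bool,
    /// Bits this host does not recognize, kept for diagnostics.
    pub unknown_bits: u32,
}

/// Splits the raw bitmask into named flags plus any unrecognized bits.
pub fn decode_capabilities(caps: u32) -> Capabilities {
    let known = CAP_SIMD | CAP_THOMPSON_SAMPLING | CAP_WITNESS_CHAIN;
    Capabilities {
        simd: (caps & CAP_SIMD) != 0,
        thompson_sampling: (caps & CAP_THOMPSON_SAMPLING) != 0,
        witness_chain: (caps & CAP_WITNESS_CHAIN) != 0,
        unknown_bits: caps & !known,
    }
}
```

Keeping the unrecognized bits lets an older host log that a newer module advertises features it does not understand, without failing outright.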
### 8.6 Testing Strategy
- Use `wasm-bindgen-test` with `run_in_browser` for browser tests (existing pattern)
- Use the Node.js test harness at `/scripts/test/test-wasm.mjs` as a template
- Validate accuracy parity with native build via `wasm_solver_bench`
- Run SIMD-specific tests with `RUSTFLAGS="-C target-feature=+simd128"` in CI
---
## Appendix A: File Reference
### Core WASM Source Files
| File | Purpose |
|------|---------|
| `/crates/ruvector-wasm/src/lib.rs` | Main VectorDB WASM bindings (wasm-bindgen) |
| `/crates/ruvector-wasm/src/kernel/mod.rs` | Kernel pack system entry point |
| `/crates/ruvector-wasm/src/kernel/memory.rs` | Shared memory protocol, bump allocator |
| `/crates/ruvector-wasm/src/kernel/runtime.rs` | Kernel runtime trait, mock runtime, manager |
| `/crates/ruvector-wasm/src/kernel/epoch.rs` | Epoch-based execution budgets |
| `/crates/ruvector-wasm/src/kernel/signature.rs` | Ed25519 kernel pack verification |
| `/crates/ruvector-wasm/src/kernel/manifest.rs` | Kernel manifest parsing |
| `/crates/ruvector-wasm/Cargo.toml` | WASM dependency configuration |
### SIMD Source Files
| File | Purpose |
|------|---------|
| `/crates/ruvector-delta-wasm/src/simd.rs` | WASM SIMD128 f32x4 operations |
| `/crates/ruvector-sparse-inference/src/backend/wasm.rs` | WASM SIMD backend with Backend trait |
| `/crates/ruvector-mincut/src/wasm/simd.rs` | WASM SIMD128 bitset operations |
| `/crates/ruvector-core/src/simd_intrinsics.rs` | Native SIMD (AVX2/AVX-512/NEON) reference |
### Solver WASM Source Files
| File | Purpose |
|------|---------|
| `/crates/rvf/rvf-solver-wasm/src/lib.rs` | Self-learning solver WASM exports (no_std) |
| `/crates/rvf/rvf-solver-wasm/src/engine.rs` | Adaptive solver engine |
| `/crates/rvf/rvf-solver-wasm/src/policy.rs` | PolicyKernel with Thompson Sampling |
| `/crates/rvf/rvf-solver-wasm/Cargo.toml` | no_std WASM build configuration |
### Build and Test Files
| File | Purpose |
|------|---------|
| `/Cargo.toml` | Workspace WASM dependencies and build profiles |
| `/scripts/test/test-wasm.mjs` | Node.js WASM test runner |
| `/examples/benchmarks/src/bin/wasm_solver_bench.rs` | Native vs WASM benchmark comparison |
| `/examples/pwa-loader/app.js` | Browser WASM loading and memory management |
### RVF Self-Bootstrap Files
| File | Purpose |
|------|---------|
| `/crates/rvf/rvf-types/src/wasm_bootstrap.rs` | WasmHeader, WasmRole, WasmTarget, feature flags |
### TypeScript/npm Files
| File | Purpose |
|------|---------|
| `/npm/packages/ruvector-wasm-unified/src/index.ts` | Unified WASM engine TypeScript API |
---
## Appendix B: WASM Binary Size Inventory
| Binary | Size | Strategy |
|--------|------|----------|
| `micro_hnsw.wasm` | 11.8 KB | no_std, bare minimum |
| `ruvector_learning_wasm_bg.wasm` | 39 KB | wasm-bindgen |
| `ruvector_exotic_wasm_bg.wasm` | 149 KB | wasm-bindgen |
| `ruvector_nervous_system_wasm_bg.wasm` | 178 KB | wasm-bindgen |
| `ruvector_economy_wasm_bg.wasm` | 181 KB | wasm-bindgen |
| `ruvector_attention_unified_wasm_bg.wasm` | 339 KB | wasm-bindgen |
| `rvf-solver-wasm` (estimated) | ~160 KB | no_std + dlmalloc |
The sublinear-time solver should target the **<200 KB** range using the no_std approach, consistent with `rvf-solver-wasm`.