06 - WebAssembly Integration Analysis
Agent: 6 (WASM Integration Specialist) Date: 2026-02-20 Scope: ruvector codebase WASM capabilities, build pipeline, SIMD acceleration, memory management, deployment strategies, module loading, and benchmarking framework
Table of Contents
- Existing WASM Usage in ruvector
- WASM Build Pipeline Compatibility
- SIMD Acceleration Opportunities
- Memory Management Patterns
- Browser vs Node.js Deployment Strategies
- WASM Module Loading and Initialization Patterns
- Performance Benchmarking Framework for WASM
- Recommendations for the Sublinear-Time Solver
1. Existing WASM Usage in ruvector
1.1 Scale of WASM Infrastructure
The ruvector project has a large, mature WASM infrastructure. The Cargo workspace defines 27 dedicated WASM crates, spanning vector database operations, attention mechanisms, graph algorithms, ML inference, and self-learning solvers. This is not an experimental feature -- it is a first-class deployment target.
WASM Crate Inventory (27 crates)
| Crate | Description | Target | Size |
|---|---|---|---|
| ruvector-wasm | Core vector DB bindings (HNSW, insert, search, delete) | wasm32-unknown-unknown (wasm-bindgen) | ~28 KB src |
| rvf-solver-wasm | Self-learning temporal solver (Thompson Sampling, PolicyKernel) | wasm32-unknown-unknown (no_std + alloc, extern "C") | ~160 KB compiled |
| rvf-wasm | RVF format microkernel for browser/edge vector ops | wasm32-unknown-unknown | - |
| micro-hnsw-wasm | Neuromorphic HNSW with spiking neural nets | wasm32-unknown-unknown | 11.8 KB compiled |
| ruvector-attention-wasm | 18+ attention mechanisms (Flash, MoE, Hyperbolic) | wasm32-unknown-unknown (wasm-bindgen) | - |
| ruvector-attention-unified-wasm | Unified attention API | wasm32-unknown-unknown | 339 KB compiled |
| ruvector-learning-wasm | MicroLoRA adaptation (<100 us latency) | wasm32-unknown-unknown | 39 KB compiled |
| ruvector-nervous-system-wasm | Bio-inspired neural simulation | wasm32-unknown-unknown | 178 KB compiled |
| ruvector-economy-wasm | Compute credit management | wasm32-unknown-unknown | 181 KB compiled |
| ruvector-exotic-wasm | Quantum, hyperbolic, topological operations | wasm32-unknown-unknown | 149 KB compiled |
| ruvector-sparse-inference-wasm | Sparse matrix inference with WASM SIMD | wasm32-unknown-unknown | - |
| ruvector-delta-wasm | Delta operations with SIMD | wasm32-unknown-unknown | - |
| ruvector-mincut-wasm | Subpolynomial-time dynamic min-cut | wasm32-unknown-unknown | - |
| ruvector-mincut-gated-transformer-wasm | Gated transformer min-cut | wasm32-unknown-unknown | - |
| ruvector-graph-wasm | Graph operations | wasm32-unknown-unknown | - |
| ruvector-gnn-wasm | Graph neural networks | wasm32-unknown-unknown | - |
| ruvector-dag-wasm | Minimal DAG for browser/embedded | wasm32-unknown-unknown | - |
| ruvector-math-wasm | Math operations (Wasserstein, manifolds, spherical) | wasm32-unknown-unknown | - |
| ruvector-router-wasm | Query routing | wasm32-unknown-unknown | - |
| ruvector-fpga-transformer-wasm | FPGA transformer simulation | wasm32-unknown-unknown | - |
| ruvector-temporal-tensor-wasm | Temporal tensor operations | wasm32-unknown-unknown | - |
| ruvector-tiny-dancer-wasm | Lightweight operations | wasm32-unknown-unknown | - |
| ruvector-hyperbolic-hnsw-wasm | Hyperbolic HNSW | wasm32-unknown-unknown | - |
| ruvector-domain-expansion-wasm | Cross-domain transfer learning | wasm32-unknown-unknown | - |
| ruvllm-wasm | LLM inference | wasm32-unknown-unknown | - |
| ruqu-wasm | Quantum operations | wasm32-unknown-unknown | - |
| exo-wasm (example) | Exo AI experiment | wasm32-unknown-unknown | - |
1.2 Two Distinct WASM Binding Strategies
The codebase employs two fundamentally different WASM integration patterns:
Pattern A: wasm-bindgen + wasm-pack (High-Level, Browser-First)
Used by: ruvector-wasm, ruvector-attention-wasm, ruvector-math-wasm, most -wasm crates.
// crates/ruvector-wasm/src/lib.rs
use wasm_bindgen::prelude::*;
use js_sys::{Float32Array, Object, Promise};
use web_sys::{console, IdbDatabase, IdbFactory};
#[wasm_bindgen(start)]
pub fn init() {
console_error_panic_hook::set_once();
tracing_wasm::set_as_global_default();
}
#[wasm_bindgen]
pub struct VectorDB { /* ... */ }
#[wasm_bindgen]
impl VectorDB {
#[wasm_bindgen(constructor)]
pub fn new(dimensions: usize, metric: Option<String>, use_hnsw: Option<bool>)
-> Result<VectorDB, JsValue> { /* ... */ }
}
Key dependencies: wasm-bindgen, wasm-bindgen-futures, js-sys, web-sys, serde-wasm-bindgen, console_error_panic_hook.
Advantages: Rich JS interop, automatic TypeScript type generation, Promise support, access to Web APIs (IndexedDB, Workers, console).
Pattern B: no_std + extern "C" ABI (Low-Level, Minimal)
Used by: rvf-solver-wasm, rvf-wasm, micro-hnsw-wasm.
// crates/rvf/rvf-solver-wasm/src/lib.rs
#![no_std]
extern crate alloc;
#[no_mangle]
pub extern "C" fn rvf_solver_create() -> i32 {
registry().create()
}
#[no_mangle]
pub extern "C" fn rvf_solver_train(handle: i32, count: i32, /* ... */) -> i32 { /* ... */ }
Key dependencies: dlmalloc (global allocator), libm, serde (no_std + alloc). No wasm-bindgen.
Advantages: Minimal binary size (~160 KB for rvf-solver-wasm, 11.8 KB for micro-hnsw-wasm), no JS runtime dependency, runs on bare wasm32-unknown-unknown, suitable for self-bootstrapping RVF files.
1.3 Kernel Pack System (ADR-005)
The ruvector-wasm crate includes a sophisticated Kernel Pack System (/crates/ruvector-wasm/src/kernel/) for secure, sandboxed execution of ML compute kernels via Wasmtime:
- Manifest parsing (manifest.rs): Declares kernel categories (Positional/RoPE, Normalization/RMSNorm, Activation/SwiGLU, KV-Cache, Adapter/LoRA), tensor specs, and resource limits
- Ed25519 signature verification (signature.rs): Supply chain security for kernel packs
- SHA256 hash verification (hash.rs): Content integrity
- Epoch-based execution budgets (epoch.rs): Coarse-grained interruption with configurable tick intervals (10 ms server, 1 ms embedded)
- Shared memory protocol (memory.rs): 16-byte aligned allocation, region overlap validation, tensor layout management
- Kernel runtime (runtime.rs): KernelRuntime trait with compile/instantiate/execute lifecycle, plus a mock runtime for testing
- Trusted allowlist (allowlist.rs): Restricts which kernel IDs may execute
This kernel pack system is directly relevant to the sublinear-time solver because it provides a ready-made infrastructure for sandboxed execution of solver kernels with resource limits.
1.4 Self-Bootstrapping WASM (RVF Format)
The rvf-types crate defines a WasmHeader (/crates/rvf/rvf-types/src/wasm_bootstrap.rs) for embedding WASM modules directly inside .rvf data files:
.rvf file
+-- WASM_SEG (role=Interpreter, ~50 KB)
+-- WASM_SEG (role=Microkernel, ~5.5 KB)
+-- VEC_SEG (data)
Roles: Microkernel, Interpreter, Combined, Extension, ControlPlane.
Targets: Wasm32, WasiP1, WasiP2, Browser, BareTile.
Feature flags: WASM_FEAT_SIMD, WASM_FEAT_BULK_MEMORY, WASM_FEAT_MULTI_VALUE, WASM_FEAT_REFERENCE_TYPES, WASM_FEAT_THREADS, WASM_FEAT_TAIL_CALL, WASM_FEAT_GC, WASM_FEAT_EXCEPTION_HANDLING.
1.5 Unified WASM TypeScript API
The @ruvector/wasm-unified npm package (/npm/packages/ruvector-wasm-unified/src/index.ts) provides a high-level TypeScript surface combining all WASM modules:
export interface UnifiedEngine {
attention: AttentionEngine; // 14+ mechanisms
learning: LearningEngine; // MicroLoRA, SONA, BTSP, RL
nervous: NervousEngine; // Bio-inspired neural simulation
economy: EconomyEngine; // Compute credits
exotic: ExoticEngine; // Quantum, hyperbolic, topological
version(): string;
getStats(): UnifiedStats;
init(): Promise<void>;
dispose(): void;
}
2. WASM Build Pipeline Compatibility
2.1 Workspace-Level Configuration
The root Cargo.toml defines workspace-level WASM dependencies:
# /Cargo.toml (workspace)
[workspace.dependencies]
wasm-bindgen = "0.2"
wasm-bindgen-futures = "0.4"
js-sys = "0.3"
web-sys = { version = "0.3", features = ["Worker", "MessagePort", "console"] }
getrandom = { version = "0.3", features = ["wasm_js"] }
There is also a getrandom compatibility patch for WASM:
# In ruvector-wasm/Cargo.toml
getrandom02 = { package = "getrandom", version = "0.2", features = ["js"] }
[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { workspace = true, features = ["wasm_js"] }
And a workspace-level patch for hnsw_rs to use rand 0.8 for WASM compatibility:
[patch.crates-io]
hnsw_rs = { path = "./patches/hnsw_rs" }
2.2 Build Profiles
Two distinct WASM build profiles exist:
Profile 1: Size-Optimized (for wasm-bindgen crates)
# crates/ruvector-wasm/Cargo.toml
[profile.release]
opt-level = "z" # Optimize for size
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit
panic = "abort" # No unwind tables
[profile.release.package."*"]
opt-level = "z"
[package.metadata.wasm-pack.profile.release]
wasm-opt = false # Disable wasm-opt (already optimized by LTO)
Profile 2: Size-Optimized + Strip (for no_std crates)
# crates/rvf/rvf-solver-wasm/Cargo.toml
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true # Also strips debug symbols
Profile 3: Workspace Default Release (native)
# Root Cargo.toml
[profile.release]
opt-level = 3 # Optimize for speed
lto = "fat"
codegen-units = 1
strip = true
panic = "unwind" # Keeps unwind tables (unlike WASM profile)
2.3 Build Tooling
The test script at /scripts/test/test-wasm.mjs demonstrates the build command:
wasm-pack build crates/ruvector-attention-wasm --target web --release
For no_std crates like rvf-solver-wasm, the standard cargo command with WASM target is used:
cargo build --target wasm32-unknown-unknown --release -p rvf-solver-wasm
2.4 Sublinear-Time Solver Build Compatibility
The rvf-solver-wasm crate provides the closest precedent for a sublinear-time solver WASM build:
- Target: wasm32-unknown-unknown (no WASI dependency)
- Allocator: dlmalloc (global allocator for alloc)
- Math: libm (no_std-compatible math functions)
- Serialization: serde + serde_json (no_std + alloc features)
- Crypto: rvf-crypto (SHAKE-256 witness chain)
- Panic handler: core::arch::wasm32::unreachable()
- ABI: extern "C" exports (no wasm-bindgen overhead)
- Crate type: cdylib only (no rlib)
This approach produces binaries in the ~160 KB range, which is excellent for edge deployment.
3. SIMD Acceleration Opportunities
3.1 Existing WASM SIMD Infrastructure
The codebase has extensive WASM SIMD128 support across multiple crates, all using core::arch::wasm32::* intrinsics. Every SIMD function provides dual implementations: a #[cfg(target_feature = "simd128")] version using WASM SIMD intrinsics and a #[cfg(not(target_feature = "simd128"))] scalar fallback.
WASM SIMD Operations Already Implemented
| Crate | File | Operations |
|---|---|---|
| ruvector-delta-wasm | src/simd.rs | f32x4 add, sub, scale, dot, L2 norm, diff, abs, clamp, count_nonzero |
| ruvector-sparse-inference | src/backend/wasm.rs | f32x4 dot product, ReLU, vector add, AXPY |
| ruvector-mincut | src/wasm/simd.rs | v128 popcount (table lookup method), XOR, boundary computation, batch membership |
| ruvector-core | src/simd_intrinsics.rs | x86_64 (AVX2, AVX-512, FMA), aarch64 (NEON, unrolled), INT8 quantized, batch operations |
SIMD Operations in ruvector-delta-wasm/src/simd.rs (Representative)
use core::arch::wasm32::*;
#[cfg(target_feature = "simd128")]
pub fn simd_dot(a: &[f32], b: &[f32]) -> f32 {
let chunks = a.len() / 4;
let mut sum_vec = f32x4_splat(0.0);
for i in 0..chunks {
let offset = i * 4;
unsafe {
let a_vec = v128_load(a.as_ptr().add(offset) as *const v128);
let b_vec = v128_load(b.as_ptr().add(offset) as *const v128);
let prod = f32x4_mul(a_vec, b_vec);
sum_vec = f32x4_add(sum_vec, prod);
}
}
// Horizontal sum + remainder handling
let sum_array: [f32; 4] = unsafe { core::mem::transmute(sum_vec) };
let mut sum = sum_array[0] + sum_array[1] + sum_array[2] + sum_array[3];
for i in (chunks * 4)..a.len() { sum += a[i] * b[i]; }
sum
}
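As noted in Section 3.1, every SIMD function ships with a scalar twin behind #[cfg(not(target_feature = "simd128"))], which the excerpt above omits. A sketch of such a fallback (illustrative, not the crate's actual code) that mirrors the SIMD path's 4-lane accumulation order, so SIMD and scalar builds produce identical floating-point results:

```rust
// Scalar fallback sketch. Accumulating into four lanes and summing them
// in the same order as the SIMD horizontal sum keeps results
// bit-identical across both builds.
pub fn scalar_dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 4;
    let mut acc = [0.0f32; 4];
    for i in 0..chunks {
        let o = i * 4;
        for lane in 0..4 {
            acc[lane] += a[o + lane] * b[o + lane];
        }
    }
    // Same summation order as the SIMD horizontal sum above.
    let mut sum = acc[0] + acc[1] + acc[2] + acc[3];
    // Remainder elements that do not fill a full 4-lane chunk.
    for i in (chunks * 4)..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```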
SIMD Operations in ruvector-sparse-inference/src/backend/wasm.rs (Backend Trait)
pub struct WasmBackend;
impl Backend for WasmBackend {
fn dot_product(&self, a: &[f32], b: &[f32]) -> f32 { /* SIMD dispatch */ }
fn sparse_matmul(&self, matrix: &Array2<f32>, input: &[f32], rows: &[usize]) -> Vec<f32>;
fn sparse_matmul_accumulate(&self, matrix: &Array2<f32>, input: &[f32], cols: &[usize], output: &mut [f32]);
fn activation(&self, data: &mut [f32], activation_type: ActivationType); // ReLU via SIMD
fn add(&self, a: &mut [f32], b: &[f32]);
fn axpy(&self, a: &mut [f32], b: &[f32], scalar: f32);
fn name(&self) -> &'static str { "WASM-SIMD" }
fn simd_width(&self) -> usize { 4 } // 128-bit = 4 x f32
}
3.2 SIMD Acceleration Opportunities for the Sublinear-Time Solver
Based on the sublinear-time solver's core operations, the following SIMD acceleration points are identified:
| Operation | SIMD Strategy | Expected Speedup | Existing Pattern |
|---|---|---|---|
| Distance computation (dot, cosine, euclidean) | f32x4_mul + f32x4_add accumulation | 2-4x | ruvector-delta-wasm/src/simd.rs |
| Vector normalization | f32x4_mul (scale) + f32x4_add (L2 norm) | 2-4x | simd_l2_norm_squared, simd_scale |
| Bitset operations (partition tracking) | v128_xor, v128_and, popcount via lookup | 4-8x | ruvector-mincut/src/wasm/simd.rs |
| Sparse matrix-vector multiply | SIMD dot + sparse row selection | 2-4x | WasmBackend::sparse_matmul |
| Activation functions (ReLU, GELU) | f32x4_max with zero splat | 2-4x | relu_wasm_simd |
| Thompson Sampling bandit updates | Scalar (branching-heavy) | 1x (no benefit) | N/A |
| Sort/selection (top-k) | Scalar (comparison-heavy) | 1x (no benefit) | N/A |
3.3 SIMD Feature Detection
The ruvector-wasm crate exposes SIMD detection to JS:
#[wasm_bindgen(js_name = detectSIMD)]
pub fn detect_simd() -> bool {
#[cfg(target_feature = "simd128")]
{ true }
#[cfg(not(target_feature = "simd128"))]
{ false }
}
For the sublinear-time solver, SIMD should be compiled in via RUSTFLAGS="-C target-feature=+simd128" at build time, with scalar fallbacks for environments that do not support it.
3.4 Native SIMD Comparison
The native codebase (ruvector-core/src/simd_intrinsics.rs) supports:
- x86_64: AVX2 (256-bit, 8 x f32), AVX-512 (512-bit, 16 x f32), FMA, INT8 quantized
- aarch64: NEON (128-bit, 4 x f32), 4x loop unrolling, FMA via
vfmaq_f32 - WASM: SIMD128 (128-bit, 4 x f32)
WASM SIMD128 provides the same width as NEON (4 x f32) but lacks FMA (f32x4_fma is not available in stable WASM SIMD). This means the sublinear-time solver WASM build will be approximately 2-3x slower than a native NEON build for distance computations, and 4-8x slower than an AVX-512 build. However, it will still be significantly faster than scalar fallback.
4. Memory Management Patterns
4.1 Shared Memory Protocol (Kernel Pack System)
The kernel pack system at /crates/ruvector-wasm/src/kernel/memory.rs defines a mature shared memory protocol:
pub struct SharedMemoryProtocol {
total_size: usize, // Total memory in bytes
current_offset: usize, // Bump allocator position
alignment: usize, // Typically 16 bytes
}
impl SharedMemoryProtocol {
pub fn default_settings() -> Self {
Self::new(256, 16) // 256 pages = 16 MB, 16-byte alignment
}
pub fn allocate(&mut self, size: usize) -> Result<usize, KernelError> {
let aligned_offset = self.align_offset(self.current_offset);
// ...bounds check...
self.current_offset = aligned_offset + size;
Ok(aligned_offset)
}
}
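The elided bounds check above is the whole trick; a self-contained sketch of the align-and-bump logic (illustrative names, assuming a power-of-two alignment as in the 16-byte default):

```rust
// Standalone sketch of the bump-allocation logic (names are
// illustrative, not the actual ruvector API).
struct BumpArena {
    total_size: usize,
    current_offset: usize,
    alignment: usize, // must be a power of two
}

impl BumpArena {
    fn new(total_size: usize, alignment: usize) -> Self {
        assert!(alignment.is_power_of_two());
        Self { total_size, current_offset: 0, alignment }
    }

    /// Returns the aligned offset of the new region, or None if the
    /// arena is exhausted.
    fn allocate(&mut self, size: usize) -> Option<usize> {
        // Round the cursor up to the next aligned offset.
        let aligned = (self.current_offset + self.alignment - 1) & !(self.alignment - 1);
        // Bounds check before committing the bump.
        let end = aligned.checked_add(size)?;
        if end > self.total_size {
            return None;
        }
        self.current_offset = end;
        Some(aligned)
    }

    /// Resetting between invocations makes per-solve allocation
    /// amortized O(1), as recommended in Section 8.3.
    fn reset(&mut self) {
        self.current_offset = 0;
    }
}
```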
The KernelInvocationDescriptor manages tensor memory layout:
pub struct KernelInvocationDescriptor {
pub descriptor: KernelDescriptor, // input_a, input_b, output, scratch, params offsets+sizes
protocol: SharedMemoryProtocol,
}
The MemoryLayoutValidator prevents region overlap and bounds violations.
4.2 Typed Arrays / Zero-Copy Transfer
The wasm-bindgen crates use Float32Array for zero-copy data transfer between JS and WASM:
// Input: JS Float32Array -> Rust Vec<f32>
pub fn insert(&self, vector: Float32Array, ...) -> Result<String, JsValue> {
let vector_data: Vec<f32> = vector.to_vec(); // Copy from JS typed array
// ...
}
// Output: Rust Vec<f32> -> JS Float32Array
pub fn vector(&self) -> Float32Array {
Float32Array::from(&self.inner.vector[..]) // Copy to JS typed array
}
Note: Float32Array::to_vec() and Float32Array::from() perform copies. True zero-copy requires accessing WASM linear memory directly from JS, which is demonstrated in the pwa-loader:
// Zero-copy write into WASM memory
function wasmWrite(data) {
const ptr = wasmInstance.exports.rvf_alloc(data.length);
const mem = new Uint8Array(wasmMemory.buffer, ptr, data.length);
mem.set(data); // Direct memory write
return ptr;
}
// Read from WASM memory (`.slice()` copies the bytes out, so the result
// stays valid even if WASM memory later grows and detaches the buffer)
function wasmRead(ptr, len) {
return new Uint8Array(wasmMemory.buffer, ptr, len).slice();
}
4.3 Memory Patterns in rvf-solver-wasm (no_std)
The no_std solver uses dlmalloc as global allocator and manages its own instance registry:
// Global mutable registry - safe in single-threaded WASM
static mut REGISTRY: Registry = Registry::new();
const MAX_INSTANCES: usize = 8;
struct SolverInstance {
solver: AdaptiveSolver,
last_result_json: Vec<u8>, // Heap-allocated via dlmalloc
policy_json: Vec<u8>,
witness_chain: Vec<u8>,
}
Memory export for external reads uses raw pointer copies:
#[no_mangle]
pub extern "C" fn rvf_solver_result_read(handle: i32, out_ptr: i32) -> i32 {
let data = &inst.last_result_json;
unsafe {
core::ptr::copy_nonoverlapping(data.as_ptr(), out_ptr as *mut u8, data.len());
}
data.len() as i32
}
4.4 Memory Limits
| Configuration | Max Pages | Memory Limit | Context |
|---|---|---|---|
| Server runtime | 1024 | 64 MB | RuntimeConfig::server() |
| Embedded runtime | 64 | 4 MB | RuntimeConfig::embedded() |
| Default shared memory | 256 | 16 MB | SharedMemoryProtocol::default_settings() |
| Microkernel (RVF) | 2-4 | 128-256 KB | WasmHeader min/max pages |
| WASM page size | 1 | 64 KB | WASM_PAGE_SIZE = 65536 |
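All of the limits above are multiples of the fixed 64 KiB page size; the conversion is simple enough to encode directly:

```rust
// WASM linear memory grows in fixed 64 KiB pages; the table's byte
// limits are just page counts multiplied out.
const WASM_PAGE_SIZE: usize = 65536;

fn pages_to_bytes(pages: usize) -> usize {
    pages * WASM_PAGE_SIZE
}
```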
4.5 Security Boundary Validation
The ruvector-wasm crate enforces input validation at the WASM boundary:
const MAX_VECTOR_DIMENSIONS: usize = 65536;
#[wasm_bindgen(constructor)]
pub fn new(vector: Float32Array, ...) -> Result<JsVectorEntry, JsValue> {
let vec_len = vector.length() as usize;
if vec_len == 0 {
return Err(JsValue::from_str("Vector cannot be empty"));
}
if vec_len > MAX_VECTOR_DIMENSIONS {
return Err(JsValue::from_str(&format!(
"Vector dimensions {} exceed maximum allowed {}", vec_len, MAX_VECTOR_DIMENSIONS
)));
}
// ...
}
5. Browser vs Node.js Deployment Strategies
5.1 Browser Deployment (Primary)
The ruvector-wasm crate is browser-first, using:
- IndexedDB persistence: web-sys features include IdbDatabase, IdbFactory, IdbObjectStore, IdbRequest, IdbTransaction, IdbOpenDbRequest (/crates/ruvector-wasm/Cargo.toml)
- Web Workers: Embedded JavaScript worker pool (/crates/ruvector-wasm/src/worker-pool.js, /crates/ruvector-wasm/src/worker.js) for parallel operations
- Tracing via console: tracing-wasm sends logs to browser dev tools
- Promise-based async: wasm-bindgen-futures for async operations
- getrandom via JS: getrandom with the wasm_js feature uses crypto.getRandomValues()
- PWA support: The pwa-loader example (/examples/pwa-loader/app.js) demonstrates offline-capable WASM loading
Browser Loading Pattern
// From examples/pwa-loader/app.js
async function loadWasm() {
const response = await fetch(WASM_PATH);
const bytes = await response.arrayBuffer();
const importObject = { env: {} };
const result = await WebAssembly.instantiate(bytes, importObject);
wasmInstance = result.instance;
wasmMemory = wasmInstance.exports.memory;
}
Browser SIMD Support
WASM SIMD128 is supported in Chrome 91+, Firefox 89+, Safari 16.4+, and Edge 91+. This covers >95% of active browsers as of 2026. Feature detection can be done via:
const simdSupported = WebAssembly.validate(
new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11])
);
5.2 Node.js Deployment
The project supports Node.js via:
- wasm-pack --target nodejs: Generates CommonJS bindings
- Direct instantiation from test scripts (/scripts/test/test-wasm.mjs):
import { readFileSync } from 'fs';
const wasmBuffer = readFileSync(wasmPath);
const mathWasm = await import(join(pkgPath, 'ruvector_math_wasm.js'));
await mathWasm.default(wasmBuffer);
- Edge-net example: /examples/edge-net/pkg/node/ provides Node-specific WASM packages
Node.js has had WASM SIMD support since v16.4 (V8 9.1+). For the sublinear-time solver, Node.js deployment enables server-side and CLI usage with the same WASM binary.
5.3 Edge / Embedded Deployment
The micro-hnsw-wasm crate (11.8 KB) and rvf-solver-wasm (~160 KB) demonstrate ultra-compact deployment:
- iOS/Swift: /examples/wasm/ios/ includes Swift resources with embedded WASM
- Self-bootstrapping: The WASM_SEG system embeds WASM interpreters inside data files
- Target platforms: WasmTarget::Wasm32, WasiP1, WasiP2, Browser, BareTile
5.4 Deployment Target Matrix
| Target | WASM Format | Binding | SIMD | Size Budget | Persistence |
|---|---|---|---|---|---|
| Browser (Chrome/FF/Safari) | wasm-bindgen | JS glue + TS types | SIMD128 | <500 KB | IndexedDB |
| Node.js (>= 16.4) | wasm-bindgen (nodejs) or raw | CommonJS/ESM | SIMD128 | <1 MB | fs |
| Cloudflare Workers | wasm-bindgen (web) | ESM | SIMD128 | <1 MB | KV |
| iOS/Swift | raw wasm32 | C FFI | Optional | <200 KB | CoreData |
| Bare-metal / RVF | no_std cdylib | extern "C" | Optional | <200 KB | None |
6. WASM Module Loading and Initialization Patterns
6.1 Pattern 1: wasm-bindgen Auto-Init
Used by most WASM crates. The #[wasm_bindgen(start)] attribute runs initialization automatically:
#[wasm_bindgen(start)]
pub fn init() {
console_error_panic_hook::set_once();
tracing_wasm::set_as_global_default();
}
JS side (generated by wasm-pack):
import init, { VectorDB } from './ruvector_wasm.js';
await init(); // Loads + instantiates + runs start function
const db = new VectorDB(384, 'cosine', true);
6.2 Pattern 2: Manual WebAssembly.instantiate
Used by the pwa-loader and no_std modules:
const response = await fetch(WASM_PATH);
const bytes = await response.arrayBuffer();
const importObject = { env: {} };
const result = await WebAssembly.instantiate(bytes, importObject);
wasmInstance = result.instance;
wasmMemory = wasmInstance.exports.memory;
This pattern offers maximum control: the host can inspect exports before calling any function, handle errors granularly, and manage memory directly.
6.3 Pattern 3: Streaming Instantiation
For large modules, WebAssembly.instantiateStreaming should be used (not currently in the codebase but recommended):
const result = await WebAssembly.instantiateStreaming(
fetch(WASM_PATH),
importObject
);
This starts compiling while bytes are still downloading, reducing load time by up to 50%.
6.4 Pattern 4: Unified Engine Lazy Init
The @ruvector/wasm-unified uses lazy initialization:
let defaultEngine: UnifiedEngine | null = null;
export async function getDefaultEngine(): Promise<UnifiedEngine> {
if (!defaultEngine) {
defaultEngine = await createUnifiedEngine();
await defaultEngine.init();
}
return defaultEngine;
}
6.5 Pattern 5: Instance Registry (rvf-solver-wasm)
The solver WASM uses a handle-based instance registry:
static mut REGISTRY: Registry = Registry::new(); // Max 8 concurrent solvers
// JS creates solver:
let handle = wasmInstance.exports.rvf_solver_create();
// JS uses solver:
wasmInstance.exports.rvf_solver_train(handle, 100, 1, 10, seedLo, seedHi);
// JS reads result:
let len = wasmInstance.exports.rvf_solver_result_len(handle);
let ptr = wasmInstance.exports.rvf_solver_alloc(len);
wasmInstance.exports.rvf_solver_result_read(handle, ptr);
let json = new TextDecoder().decode(new Uint8Array(wasmMemory.buffer, ptr, len));
// JS destroys:
wasmInstance.exports.rvf_solver_destroy(handle);
This is the recommended pattern for the sublinear-time solver because it:
- Supports multiple concurrent solver instances
- Avoids global state issues
- Enables resource cleanup
- Works across all deployment targets (browser, Node, bare-metal)
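A safe-Rust sketch of such a registry (the real rvf-solver-wasm version lives behind static mut, which is sound in single-threaded WASM; Solver here is a placeholder type, not the project's actual struct):

```rust
use std::array;

const MAX_INSTANCES: usize = 8;

#[derive(Default)]
struct Solver; // stand-in for the real solver state

struct Registry {
    slots: [Option<Solver>; MAX_INSTANCES],
}

impl Registry {
    fn new() -> Self {
        Self { slots: array::from_fn(|_| None) }
    }

    /// Returns a handle in 0..MAX_INSTANCES, or -1 when all slots are in use.
    fn create(&mut self) -> i32 {
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.is_none() {
                *slot = Some(Solver::default());
                return i as i32;
            }
        }
        -1
    }

    /// Frees the slot; returns false for an unknown or already-freed handle.
    fn destroy(&mut self, handle: i32) -> bool {
        // Negative handles wrap on the cast and fail the bounds check.
        let idx = handle as usize;
        if idx < MAX_INSTANCES && self.slots[idx].is_some() {
            self.slots[idx] = None;
            true
        } else {
            false
        }
    }
}
```

Reusing freed slots keeps handles small and bounded, which matters when the host addresses the registry through a flat extern "C" ABI.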
7. Performance Benchmarking Framework for WASM
7.1 Existing Benchmark Infrastructure
In-WASM Benchmark Function
The ruvector-wasm crate includes a built-in benchmark export:
#[wasm_bindgen(js_name = benchmark)]
pub fn benchmark(name: &str, iterations: usize, dimensions: usize) -> Result<f64, JsValue> {
let start = Instant::now();
for i in 0..iterations {
let vector: Vec<f32> = (0..dimensions)
.map(|_| js_sys::Math::random() as f32)
.collect();
let vector_arr = Float32Array::from(&vector[..]);
db.insert(vector_arr, Some(format!("vec_{}", i)), None)?;
}
let duration = start.elapsed();
Ok(iterations as f64 / duration.as_secs_f64())
}
WASM Solver Benchmark Binary
The /examples/benchmarks/src/bin/wasm_solver_bench.rs provides a native vs WASM comparison framework:
WASM vs Native AGI Solver Benchmark
Config: holdout=50, training=50, cycles=3, budget=200
NATIVE SOLVER RESULTS
Mode Acc% Cost Noise% Time Pass
A baseline xx.x% xxx.x xx.x% xxxms PASS
B compiler xx.x% xxx.x xx.x% xxxms PASS
C learned xx.x% xxx.x xx.x% xxxms PASS
WASM REFERENCE METRICS
Native total time: xxxms
WASM expected: ~xxxms (2-5x native)
This establishes the expected WASM overhead: 2-5x slower than native for the self-learning solver workload.
SIMD Benchmarks
The /crates/prime-radiant/benches/simd_benchmarks.rs and /crates/ruvector-sparse-inference/benches/simd_kernels.rs provide Criterion benchmarks for SIMD operations that can be adapted for WASM SIMD.
7.2 Recommended Benchmarking Framework for the Sublinear-Time Solver
sublinear-time-solver/benches/
wasm_bench.rs -- In-Rust Criterion benchmarks (native baseline)
wasm_bench.mjs -- Node.js WASM performance runner
wasm_bench.html -- Browser WASM performance runner
bench_harness.rs -- Shared benchmark harness (puzzle generation)
Metrics to Track
| Metric | Description | Measurement |
|---|---|---|
| solve_throughput | Puzzles solved per second | iterations / elapsed_secs |
| solve_latency_p50 | Median solve time | Percentile of individual solve times |
| solve_latency_p99 | 99th percentile solve time | Percentile of individual solve times |
| memory_peak_bytes | Peak WASM linear memory usage | memory.buffer.byteLength |
| module_load_ms | Time to instantiate WASM module | performance.now() around WebAssembly.instantiate |
| simd_speedup | SIMD vs scalar performance ratio | Compare SIMD build vs non-SIMD build |
| wasm_native_ratio | WASM-to-native performance overhead | Compare WASM throughput vs native Criterion results |
| binary_size_bytes | Compiled .wasm file size | wc -c *.wasm |
| accuracy_parity | Solver accuracy matches native | Bit-exact or epsilon comparison of results |
Benchmark Protocol
- Native baseline: Run the solver natively with Criterion (3+ iterations, warm-up)
- WASM baseline: Load the same solver as WASM, run identical workload in Node.js
- WASM SIMD: Build with RUSTFLAGS="-C target-feature=+simd128", measure speedup
- Browser measurement: Run in Chrome with performance.now(), measure real-world latency
- Size budget: Track .wasm binary size across commits (regression alerts if >200 KB)
- Accuracy validation: Compare solver output JSON between native and WASM (must match to f64 epsilon)
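For the latency percentiles in the metrics table, the nearest-rank method is easy to reproduce identically in the native and WASM harnesses; a sketch:

```rust
/// Nearest-rank percentile over per-solve latencies (e.g. milliseconds):
/// returns the smallest sample such that at least p percent of the data
/// is at or below it. Sorts the slice in place.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.max(1) - 1]
}
```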
8. Recommendations for the Sublinear-Time Solver
8.1 Binding Strategy: Use no_std + extern "C" (Pattern B)
For the sublinear-time solver WASM module, adopt the rvf-solver-wasm pattern:
- no_std + alloc: Minimizes binary size, avoids JS runtime dependency
- dlmalloc global allocator: Proven in rvf-solver-wasm
- extern "C" exports: Maximum portability (browser, Node, embedded, bare-metal)
- Handle-based instance registry: Supports concurrent solver instances
- Result reads via pointer+length: JSON serialization of results into WASM memory, host reads via typed array view
Do not use wasm-bindgen for the core solver. A thin wasm-bindgen wrapper can be created separately if a richer JS API is needed.
8.2 SIMD Strategy: Conditional Compilation
// In the solver crate
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod simd_wasm {
use core::arch::wasm32::*;
pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* SIMD128 */ }
}
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod simd_wasm {
pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* scalar fallback */ }
}
Build two variants:
- solver.wasm -- scalar fallback (maximum compatibility)
- solver-simd.wasm -- SIMD128 enabled (Chrome 91+, FF 89+, Safari 16.4+, Node 16.4+)
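The scalar module's body can stay trivially simple; an illustrative fallback implementation (a sketch under the naming of the cfg example above, not the solver's actual code):

```rust
// Scalar L2 distance: squared differences summed, then square-rooted.
pub fn distance_l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum::<f32>()
        .sqrt()
}
```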
8.3 Memory Strategy: Bump Allocator + Shared Memory Protocol
Adopt the SharedMemoryProtocol pattern from the kernel pack system:
- Allocate a fixed arena at solver creation (e.g., 256 pages = 16 MB)
- Use 16-byte aligned bump allocation for tensor data
- Reset the allocator between solve invocations (amortized O(1))
- Validate memory regions before kernel execution
- Export memory so the host can directly view/write typed arrays without copying
8.4 Build Profile
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true
panic = "abort"
Target binary size: <200 KB (consistent with existing rvf-solver-wasm at ~160 KB).
8.5 Feature Detection Export
#[no_mangle]
pub extern "C" fn solver_capabilities() -> u32 {
let mut caps = 0u32;
#[cfg(target_feature = "simd128")]
{ caps |= 0x01; } // SIMD available
#[cfg(feature = "thompson-sampling")]
{ caps |= 0x02; } // Thompson Sampling enabled
#[cfg(feature = "witness-chain")]
{ caps |= 0x04; } // Witness chain enabled
caps
}
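On the host side (or in tests), the bitfield decodes with plain masks; the flag values below are the ones assumed in the export sketch above:

```rust
// Capability flags matching the solver_capabilities() sketch above
// (assumed values, not a published ABI).
const CAP_SIMD: u32 = 0x01;
const CAP_THOMPSON_SAMPLING: u32 = 0x02;
const CAP_WITNESS_CHAIN: u32 = 0x04;

/// Decodes the capability bitfield into human-readable feature names.
fn describe_caps(caps: u32) -> Vec<&'static str> {
    let mut out = Vec::new();
    if caps & CAP_SIMD != 0 { out.push("simd128"); }
    if caps & CAP_THOMPSON_SAMPLING != 0 { out.push("thompson-sampling"); }
    if caps & CAP_WITNESS_CHAIN != 0 { out.push("witness-chain"); }
    out
}
```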
8.6 Testing Strategy
- Use wasm-bindgen-test with run_in_browser for browser tests (existing pattern)
- Use the Node.js test harness at /scripts/test/test-wasm.mjs as a template
- Validate accuracy parity with the native build via wasm_solver_bench
- Run SIMD-specific tests with RUSTFLAGS="-C target-feature=+simd128" in CI
Appendix A: File Reference
Core WASM Source Files
| File | Purpose |
|---|---|
| /crates/ruvector-wasm/src/lib.rs | Main VectorDB WASM bindings (wasm-bindgen) |
| /crates/ruvector-wasm/src/kernel/mod.rs | Kernel pack system entry point |
| /crates/ruvector-wasm/src/kernel/memory.rs | Shared memory protocol, bump allocator |
| /crates/ruvector-wasm/src/kernel/runtime.rs | Kernel runtime trait, mock runtime, manager |
| /crates/ruvector-wasm/src/kernel/epoch.rs | Epoch-based execution budgets |
| /crates/ruvector-wasm/src/kernel/signature.rs | Ed25519 kernel pack verification |
| /crates/ruvector-wasm/src/kernel/manifest.rs | Kernel manifest parsing |
| /crates/ruvector-wasm/Cargo.toml | WASM dependency configuration |
SIMD Source Files
| File | Purpose |
|---|---|
| /crates/ruvector-delta-wasm/src/simd.rs | WASM SIMD128 f32x4 operations |
| /crates/ruvector-sparse-inference/src/backend/wasm.rs | WASM SIMD backend with Backend trait |
| /crates/ruvector-mincut/src/wasm/simd.rs | WASM SIMD128 bitset operations |
| /crates/ruvector-core/src/simd_intrinsics.rs | Native SIMD (AVX2/AVX-512/NEON) reference |
Solver WASM Source Files
| File | Purpose |
|---|---|
| /crates/rvf/rvf-solver-wasm/src/lib.rs | Self-learning solver WASM exports (no_std) |
| /crates/rvf/rvf-solver-wasm/src/engine.rs | Adaptive solver engine |
| /crates/rvf/rvf-solver-wasm/src/policy.rs | PolicyKernel with Thompson Sampling |
| /crates/rvf/rvf-solver-wasm/Cargo.toml | no_std WASM build configuration |
Build and Test Files
| File | Purpose |
|---|---|
| /Cargo.toml | Workspace WASM dependencies and build profiles |
| /scripts/test/test-wasm.mjs | Node.js WASM test runner |
| /examples/benchmarks/src/bin/wasm_solver_bench.rs | Native vs WASM benchmark comparison |
| /examples/pwa-loader/app.js | Browser WASM loading and memory management |
RVF Self-Bootstrap Files
| File | Purpose |
|---|---|
| /crates/rvf/rvf-types/src/wasm_bootstrap.rs | WasmHeader, WasmRole, WasmTarget, feature flags |
TypeScript/npm Files
| File | Purpose |
|---|---|
| /npm/packages/ruvector-wasm-unified/src/index.ts | Unified WASM engine TypeScript API |
Appendix B: WASM Binary Size Inventory
| Binary | Size | Strategy |
|---|---|---|
| micro_hnsw.wasm | 11.8 KB | no_std, bare minimum |
| ruvector_learning_wasm_bg.wasm | 39 KB | wasm-bindgen |
| ruvector_exotic_wasm_bg.wasm | 149 KB | wasm-bindgen |
| ruvector_nervous_system_wasm_bg.wasm | 178 KB | wasm-bindgen |
| ruvector_economy_wasm_bg.wasm | 181 KB | wasm-bindgen |
| ruvector_attention_unified_wasm_bg.wasm | 339 KB | wasm-bindgen |
| rvf-solver-wasm (estimated) | ~160 KB | no_std + dlmalloc |
The sublinear-time solver should target the <200 KB range using the no_std approach, consistent with rvf-solver-wasm.