git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
912 lines
35 KiB
Markdown
912 lines
35 KiB
Markdown
# 06 - WebAssembly Integration Analysis
|
|
|
|
**Agent**: 6 (WASM Integration Specialist)
|
|
**Date**: 2026-02-20
|
|
**Scope**: ruvector codebase WASM capabilities, build pipeline, SIMD acceleration, memory management, deployment strategies, module loading, and benchmarking framework
|
|
|
|
---
|
|
|
|
## Table of Contents
|
|
|
|
1. [Existing WASM Usage in ruvector](#1-existing-wasm-usage-in-ruvector)
|
|
2. [WASM Build Pipeline Compatibility](#2-wasm-build-pipeline-compatibility)
|
|
3. [SIMD Acceleration Opportunities](#3-simd-acceleration-opportunities)
|
|
4. [Memory Management Patterns](#4-memory-management-patterns)
|
|
5. [Browser vs Node.js Deployment Strategies](#5-browser-vs-nodejs-deployment-strategies)
|
|
6. [WASM Module Loading and Initialization Patterns](#6-wasm-module-loading-and-initialization-patterns)
|
|
7. [Performance Benchmarking Framework for WASM](#7-performance-benchmarking-framework-for-wasm)
|
|
8. [Recommendations for the Sublinear-Time Solver](#8-recommendations-for-the-sublinear-time-solver)
|
|
|
|
---
|
|
|
|
## 1. Existing WASM Usage in ruvector
|
|
|
|
### 1.1 Scale of WASM Infrastructure
|
|
|
|
The ruvector project has a **massive, mature WASM infrastructure**. The workspace defines **27 dedicated WASM crates** in the Cargo workspace, spanning vector database operations, attention mechanisms, graph algorithms, ML inference, and self-learning solvers. This is not an experimental feature -- it is a first-class deployment target.
|
|
|
|
#### WASM Crate Inventory (27 crates)
|
|
|
|
| Crate | Description | Target | Size |
|
|
|-------|-------------|--------|------|
|
|
| `ruvector-wasm` | Core vector DB bindings (HNSW, insert, search, delete) | `wasm32-unknown-unknown` (wasm-bindgen) | ~28 KB src |
|
|
| `rvf-solver-wasm` | Self-learning temporal solver (Thompson Sampling, PolicyKernel) | `wasm32-unknown-unknown` (no_std + alloc, `extern "C"`) | ~160 KB compiled |
|
|
| `rvf-wasm` | RVF format microkernel for browser/edge vector ops | `wasm32-unknown-unknown` | - |
|
|
| `micro-hnsw-wasm` | Neuromorphic HNSW with spiking neural nets | `wasm32-unknown-unknown` | 11.8 KB compiled |
|
|
| `ruvector-attention-wasm` | 18+ attention mechanisms (Flash, MoE, Hyperbolic) | `wasm32-unknown-unknown` (wasm-bindgen) | - |
|
|
| `ruvector-attention-unified-wasm` | Unified attention API | `wasm32-unknown-unknown` | 339 KB compiled |
|
|
| `ruvector-learning-wasm` | MicroLoRA adaptation (<100us latency) | `wasm32-unknown-unknown` | 39 KB compiled |
|
|
| `ruvector-nervous-system-wasm` | Bio-inspired neural simulation | `wasm32-unknown-unknown` | 178 KB compiled |
|
|
| `ruvector-economy-wasm` | Compute credit management | `wasm32-unknown-unknown` | 181 KB compiled |
|
|
| `ruvector-exotic-wasm` | Quantum, hyperbolic, topological | `wasm32-unknown-unknown` | 149 KB compiled |
|
|
| `ruvector-sparse-inference-wasm` | Sparse matrix inference with WASM SIMD | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-delta-wasm` | Delta operations with SIMD | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-mincut-wasm` | Subpolynomial-time dynamic min-cut | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-mincut-gated-transformer-wasm` | Gated transformer min-cut | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-graph-wasm` | Graph operations | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-gnn-wasm` | Graph neural networks | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-dag-wasm` | Minimal DAG for browser/embedded | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-math-wasm` | Math operations (Wasserstein, manifolds, spherical) | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-router-wasm` | Query routing | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-fpga-transformer-wasm` | FPGA transformer simulation | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-temporal-tensor-wasm` | Temporal tensor operations | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-tiny-dancer-wasm` | Lightweight operations | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-hyperbolic-hnsw-wasm` | Hyperbolic HNSW | `wasm32-unknown-unknown` | - |
|
|
| `ruvector-domain-expansion-wasm` | Cross-domain transfer learning | `wasm32-unknown-unknown` | - |
|
|
| `ruvllm-wasm` | LLM inference | `wasm32-unknown-unknown` | - |
|
|
| `ruqu-wasm` | Quantum operations | `wasm32-unknown-unknown` | - |
|
|
| `exo-wasm` (example) | Exo AI experiment | `wasm32-unknown-unknown` | - |
|
|
|
|
### 1.2 Two Distinct WASM Binding Strategies
|
|
|
|
The codebase employs **two fundamentally different WASM integration patterns**:
|
|
|
|
#### Pattern A: wasm-bindgen + wasm-pack (High-Level, Browser-First)
|
|
|
|
Used by: `ruvector-wasm`, `ruvector-attention-wasm`, `ruvector-math-wasm`, most `-wasm` crates.
|
|
|
|
```rust
|
|
// crates/ruvector-wasm/src/lib.rs
|
|
use wasm_bindgen::prelude::*;
|
|
use js_sys::{Float32Array, Object, Promise};
|
|
use web_sys::{console, IdbDatabase, IdbFactory};
|
|
|
|
#[wasm_bindgen(start)]
|
|
pub fn init() {
|
|
console_error_panic_hook::set_once();
|
|
tracing_wasm::set_as_global_default();
|
|
}
|
|
|
|
#[wasm_bindgen]
|
|
pub struct VectorDB { /* ... */ }
|
|
|
|
#[wasm_bindgen]
|
|
impl VectorDB {
|
|
#[wasm_bindgen(constructor)]
|
|
pub fn new(dimensions: usize, metric: Option<String>, use_hnsw: Option<bool>)
|
|
-> Result<VectorDB, JsValue> { /* ... */ }
|
|
}
|
|
```
|
|
|
|
Key dependencies: `wasm-bindgen`, `wasm-bindgen-futures`, `js-sys`, `web-sys`, `serde-wasm-bindgen`, `console_error_panic_hook`.
|
|
|
|
Advantages: Rich JS interop, automatic TypeScript type generation, Promise support, access to Web APIs (IndexedDB, Workers, console).
|
|
|
|
#### Pattern B: no_std + extern "C" ABI (Low-Level, Minimal)
|
|
|
|
Used by: `rvf-solver-wasm`, `rvf-wasm`, `micro-hnsw-wasm`.
|
|
|
|
```rust
|
|
// crates/rvf/rvf-solver-wasm/src/lib.rs
|
|
#![no_std]
|
|
extern crate alloc;
|
|
|
|
#[no_mangle]
|
|
pub extern "C" fn rvf_solver_create() -> i32 {
|
|
registry().create()
|
|
}
|
|
|
|
#[no_mangle]
|
|
pub extern "C" fn rvf_solver_train(handle: i32, count: i32, /* ... */) -> i32 { /* ... */ }
|
|
```
|
|
|
|
Key dependencies: `dlmalloc` (global allocator), `libm`, `serde` (no_std + alloc). No wasm-bindgen.
|
|
|
|
Advantages: Minimal binary size (~160 KB for rvf-solver-wasm, 11.8 KB for micro-hnsw-wasm), no JS runtime dependency, runs on bare wasm32-unknown-unknown, suitable for self-bootstrapping RVF files.
|
|
|
|
### 1.3 Kernel Pack System (ADR-005)
|
|
|
|
The `ruvector-wasm` crate includes a sophisticated **Kernel Pack System** (`/crates/ruvector-wasm/src/kernel/`) for secure, sandboxed execution of ML compute kernels via Wasmtime:
|
|
|
|
- **Manifest parsing** (`manifest.rs`): Declares kernel categories (Positional/RoPE, Normalization/RMSNorm, Activation/SwiGLU, KV-Cache, Adapter/LoRA), tensor specs, resource limits
|
|
- **Ed25519 signature verification** (`signature.rs`): Supply chain security for kernel packs
|
|
- **SHA256 hash verification** (`hash.rs`): Content integrity
|
|
- **Epoch-based execution budgets** (`epoch.rs`): Coarse-grained interruption with configurable tick intervals (10ms server, 1ms embedded)
|
|
- **Shared memory protocol** (`memory.rs`): 16-byte aligned allocation, region overlap validation, tensor layout management
|
|
- **Kernel runtime** (`runtime.rs`): `KernelRuntime` trait with compile/instantiate/execute lifecycle, mock runtime for testing
|
|
- **Trusted allowlist** (`allowlist.rs`): Restricts which kernel IDs may execute
|
|
|
|
This kernel pack system is directly relevant to the sublinear-time solver because it provides a ready-made infrastructure for sandboxed execution of solver kernels with resource limits.
|
|
|
|
### 1.4 Self-Bootstrapping WASM (RVF Format)
|
|
|
|
The `rvf-types` crate defines a `WasmHeader` (`/crates/rvf/rvf-types/src/wasm_bootstrap.rs`) for embedding WASM modules directly inside `.rvf` data files:
|
|
|
|
```
|
|
.rvf file
|
|
+-- WASM_SEG (role=Interpreter, ~50 KB)
|
|
+-- WASM_SEG (role=Microkernel, ~5.5 KB)
|
|
+-- VEC_SEG (data)
|
|
```
|
|
|
|
Roles: `Microkernel`, `Interpreter`, `Combined`, `Extension`, `ControlPlane`.
|
|
Targets: `Wasm32`, `WasiP1`, `WasiP2`, `Browser`, `BareTile`.
|
|
Feature flags: `WASM_FEAT_SIMD`, `WASM_FEAT_BULK_MEMORY`, `WASM_FEAT_MULTI_VALUE`, `WASM_FEAT_REFERENCE_TYPES`, `WASM_FEAT_THREADS`, `WASM_FEAT_TAIL_CALL`, `WASM_FEAT_GC`, `WASM_FEAT_EXCEPTION_HANDLING`.
|
|
|
|
### 1.5 Unified WASM TypeScript API
|
|
|
|
The `@ruvector/wasm-unified` npm package (`/npm/packages/ruvector-wasm-unified/src/index.ts`) provides a high-level TypeScript surface combining all WASM modules:
|
|
|
|
```typescript
|
|
export interface UnifiedEngine {
|
|
attention: AttentionEngine; // 14+ mechanisms
|
|
learning: LearningEngine; // MicroLoRA, SONA, BTSP, RL
|
|
nervous: NervousEngine; // Bio-inspired neural simulation
|
|
economy: EconomyEngine; // Compute credits
|
|
exotic: ExoticEngine; // Quantum, hyperbolic, topological
|
|
version(): string;
|
|
getStats(): UnifiedStats;
|
|
init(): Promise<void>;
|
|
dispose(): void;
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## 2. WASM Build Pipeline Compatibility
|
|
|
|
### 2.1 Workspace-Level Configuration
|
|
|
|
The root `Cargo.toml` defines workspace-level WASM dependencies:
|
|
|
|
```toml
|
|
# /Cargo.toml (workspace)
|
|
[workspace.dependencies]
|
|
wasm-bindgen = "0.2"
|
|
wasm-bindgen-futures = "0.4"
|
|
js-sys = "0.3"
|
|
web-sys = { version = "0.3", features = ["Worker", "MessagePort", "console"] }
|
|
getrandom = { version = "0.3", features = ["wasm_js"] }
|
|
```
|
|
|
|
There is also a getrandom compatibility patch for WASM:
|
|
|
|
```toml
|
|
# In ruvector-wasm/Cargo.toml
|
|
getrandom02 = { package = "getrandom", version = "0.2", features = ["js"] }
|
|
[target.'cfg(target_arch = "wasm32")'.dependencies]
|
|
getrandom = { workspace = true, features = ["wasm_js"] }
|
|
```
|
|
|
|
And a workspace-level patch for hnsw_rs to use rand 0.8 for WASM compatibility:
|
|
|
|
```toml
|
|
[patch.crates-io]
|
|
hnsw_rs = { path = "./patches/hnsw_rs" }
|
|
```
|
|
|
|
### 2.2 Build Profiles
|
|
|
|
Two distinct WASM build profiles exist:
|
|
|
|
#### Profile 1: Size-Optimized (for wasm-bindgen crates)
|
|
|
|
```toml
|
|
# crates/ruvector-wasm/Cargo.toml
|
|
[profile.release]
|
|
opt-level = "z" # Optimize for size
|
|
lto = true # Link-time optimization
|
|
codegen-units = 1 # Single codegen unit
|
|
panic = "abort" # No unwind tables
|
|
|
|
[profile.release.package."*"]
|
|
opt-level = "z"
|
|
|
|
[package.metadata.wasm-pack.profile.release]
|
|
wasm-opt = false # Disable wasm-opt (already optimized by LTO)
|
|
```
|
|
|
|
#### Profile 2: Size-Optimized + Strip (for no_std crates)
|
|
|
|
```toml
|
|
# crates/rvf/rvf-solver-wasm/Cargo.toml
|
|
[profile.release]
|
|
opt-level = "z"
|
|
lto = true
|
|
codegen-units = 1
|
|
strip = true # Also strips debug symbols
|
|
```
|
|
|
|
#### Profile 3: Workspace Default Release (native)
|
|
|
|
```toml
|
|
# Root Cargo.toml
|
|
[profile.release]
|
|
opt-level = 3 # Optimize for speed
|
|
lto = "fat"
|
|
codegen-units = 1
|
|
strip = true
|
|
panic = "unwind" # Keeps unwind tables (unlike WASM profile)
|
|
```
|
|
|
|
### 2.3 Build Tooling
|
|
|
|
The test script at `/scripts/test/test-wasm.mjs` demonstrates the build command:
|
|
|
|
```bash
|
|
wasm-pack build crates/ruvector-attention-wasm --target web --release
|
|
```
|
|
|
|
For no_std crates like rvf-solver-wasm, the standard cargo command with WASM target is used:
|
|
|
|
```bash
|
|
cargo build --target wasm32-unknown-unknown --release -p rvf-solver-wasm
|
|
```
|
|
|
|
### 2.4 Sublinear-Time Solver Build Compatibility
|
|
|
|
The rvf-solver-wasm crate provides the closest precedent for a sublinear-time solver WASM build:
|
|
|
|
- **Target**: `wasm32-unknown-unknown` (no WASI dependency)
|
|
- **Allocator**: `dlmalloc` (global allocator for `alloc`)
|
|
- **Math**: `libm` (no_std-compatible math functions)
|
|
- **Serialization**: `serde` + `serde_json` (no_std + alloc features)
|
|
- **Crypto**: `rvf-crypto` (SHAKE-256 witness chain)
|
|
- **Panic handler**: `core::arch::wasm32::unreachable()`
|
|
- **ABI**: `extern "C"` exports (no wasm-bindgen overhead)
|
|
- **Crate type**: `cdylib` only (no rlib)
|
|
|
|
This approach produces binaries in the ~160 KB range, which is excellent for edge deployment.
|
|
|
|
---
|
|
|
|
## 3. SIMD Acceleration Opportunities
|
|
|
|
### 3.1 Existing WASM SIMD Infrastructure
|
|
|
|
The codebase has **extensive WASM SIMD128 support** across multiple crates, all using `core::arch::wasm32::*` intrinsics. Every SIMD function provides dual implementations: a `#[cfg(target_feature = "simd128")]` version using WASM SIMD intrinsics and a `#[cfg(not(target_feature = "simd128"))]` scalar fallback.
|
|
|
|
#### WASM SIMD Operations Already Implemented
|
|
|
|
| Crate | File | Operations |
|
|
|-------|------|------------|
|
|
| `ruvector-delta-wasm` | `src/simd.rs` | `f32x4` add, sub, scale, dot, L2 norm, diff, abs, clamp, count_nonzero |
|
|
| `ruvector-sparse-inference` | `src/backend/wasm.rs` | `f32x4` dot product, ReLU, vector add, AXPY |
|
|
| `ruvector-mincut` | `src/wasm/simd.rs` | `v128` popcount (table lookup method), XOR, boundary computation, batch membership |
|
|
| `ruvector-core` | `src/simd_intrinsics.rs` | x86_64 (AVX2, AVX-512, FMA), aarch64 (NEON, unrolled), INT8 quantized, batch operations |
|
|
|
|
#### SIMD Operations in ruvector-delta-wasm/src/simd.rs (Representative)
|
|
|
|
```rust
|
|
use core::arch::wasm32::*;
|
|
|
|
#[cfg(target_feature = "simd128")]
|
|
pub fn simd_dot(a: &[f32], b: &[f32]) -> f32 {
|
|
let chunks = a.len() / 4;
|
|
let mut sum_vec = f32x4_splat(0.0);
|
|
for i in 0..chunks {
|
|
let offset = i * 4;
|
|
unsafe {
|
|
let a_vec = v128_load(a.as_ptr().add(offset) as *const v128);
|
|
let b_vec = v128_load(b.as_ptr().add(offset) as *const v128);
|
|
let prod = f32x4_mul(a_vec, b_vec);
|
|
sum_vec = f32x4_add(sum_vec, prod);
|
|
}
|
|
}
|
|
// Horizontal sum + remainder handling
|
|
let sum_array: [f32; 4] = unsafe { core::mem::transmute(sum_vec) };
|
|
let mut sum = sum_array[0] + sum_array[1] + sum_array[2] + sum_array[3];
|
|
for i in (chunks * 4)..a.len() { sum += a[i] * b[i]; }
|
|
sum
|
|
}
|
|
```
|
|
|
|
#### SIMD Operations in ruvector-sparse-inference/src/backend/wasm.rs (Backend Trait)
|
|
|
|
```rust
|
|
pub struct WasmBackend;
|
|
|
|
impl Backend for WasmBackend {
|
|
fn dot_product(&self, a: &[f32], b: &[f32]) -> f32 { /* SIMD dispatch */ }
|
|
fn sparse_matmul(&self, matrix: &Array2<f32>, input: &[f32], rows: &[usize]) -> Vec<f32>;
|
|
fn sparse_matmul_accumulate(&self, matrix: &Array2<f32>, input: &[f32], cols: &[usize], output: &mut [f32]);
|
|
fn activation(&self, data: &mut [f32], activation_type: ActivationType); // ReLU via SIMD
|
|
fn add(&self, a: &mut [f32], b: &[f32]);
|
|
fn axpy(&self, a: &mut [f32], b: &[f32], scalar: f32);
|
|
fn name(&self) -> &'static str { "WASM-SIMD" }
|
|
fn simd_width(&self) -> usize { 4 } // 128-bit = 4 x f32
|
|
}
|
|
```
|
|
|
|
### 3.2 SIMD Acceleration Opportunities for the Sublinear-Time Solver
|
|
|
|
Based on the sublinear-time solver's core operations, the following SIMD acceleration points are identified:
|
|
|
|
| Operation | SIMD Strategy | Expected Speedup | Existing Pattern |
|
|
|-----------|---------------|-------------------|------------------|
|
|
| Distance computation (dot, cosine, euclidean) | `f32x4_mul` + `f32x4_add` accumulation | 2-4x | `ruvector-delta-wasm/src/simd.rs` |
|
|
| Vector normalization | `f32x4_mul` (scale) + `f32x4_add` (L2 norm) | 2-4x | `simd_l2_norm_squared`, `simd_scale` |
|
|
| Bitset operations (partition tracking) | `v128_xor`, `v128_and`, popcount via lookup | 4-8x | `ruvector-mincut/src/wasm/simd.rs` |
|
|
| Sparse matrix-vector multiply | SIMD dot + sparse row selection | 2-4x | `WasmBackend::sparse_matmul` |
|
|
| Activation functions (ReLU, GELU) | `f32x4_max` with zero splat | 2-4x | `relu_wasm_simd` |
|
|
| Thompson Sampling bandit updates | Scalar (branching-heavy) | 1x (no benefit) | N/A |
|
|
| Sort/selection (top-k) | Scalar (comparison-heavy) | 1x (no benefit) | N/A |
|
|
|
|
### 3.3 SIMD Feature Detection
|
|
|
|
The `ruvector-wasm` crate exposes SIMD detection to JS:
|
|
|
|
```rust
|
|
#[wasm_bindgen(js_name = detectSIMD)]
|
|
pub fn detect_simd() -> bool {
|
|
#[cfg(target_feature = "simd128")]
|
|
{ true }
|
|
#[cfg(not(target_feature = "simd128"))]
|
|
{ false }
|
|
}
|
|
```
|
|
|
|
For the sublinear-time solver, SIMD should be compiled in via `RUSTFLAGS="-C target-feature=+simd128"` at build time, with scalar fallbacks for environments that do not support it.
|
|
|
|
### 3.4 Native SIMD Comparison
|
|
|
|
The native codebase (`ruvector-core/src/simd_intrinsics.rs`) supports:
|
|
- **x86_64**: AVX2 (256-bit, 8 x f32), AVX-512 (512-bit, 16 x f32), FMA, INT8 quantized
|
|
- **aarch64**: NEON (128-bit, 4 x f32), 4x loop unrolling, FMA via `vfmaq_f32`
|
|
- **WASM**: SIMD128 (128-bit, 4 x f32)
|
|
|
|
WASM SIMD128 provides the same width as NEON (4 x f32) but lacks FMA (`f32x4_fma` is not available in stable WASM SIMD). This means the sublinear-time solver WASM build will be approximately 2-3x slower than a native NEON build for distance computations, and 4-8x slower than an AVX-512 build. However, it will still be significantly faster than scalar fallback.
|
|
|
|
---
|
|
|
|
## 4. Memory Management Patterns
|
|
|
|
### 4.1 Shared Memory Protocol (Kernel Pack System)
|
|
|
|
The kernel pack system at `/crates/ruvector-wasm/src/kernel/memory.rs` defines a mature shared memory protocol:
|
|
|
|
```rust
|
|
pub struct SharedMemoryProtocol {
|
|
total_size: usize, // Total memory in bytes
|
|
current_offset: usize, // Bump allocator position
|
|
alignment: usize, // Typically 16 bytes
|
|
}
|
|
|
|
impl SharedMemoryProtocol {
|
|
pub fn default_settings() -> Self {
|
|
Self::new(256, 16) // 256 pages = 16 MB, 16-byte alignment
|
|
}
|
|
|
|
pub fn allocate(&mut self, size: usize) -> Result<usize, KernelError> {
|
|
let aligned_offset = self.align_offset(self.current_offset);
|
|
// ...bounds check...
|
|
self.current_offset = aligned_offset + size;
|
|
Ok(aligned_offset)
|
|
}
|
|
}
|
|
```
|
|
|
|
The `KernelInvocationDescriptor` manages tensor memory layout:
|
|
|
|
```rust
|
|
pub struct KernelInvocationDescriptor {
|
|
pub descriptor: KernelDescriptor, // input_a, input_b, output, scratch, params offsets+sizes
|
|
protocol: SharedMemoryProtocol,
|
|
}
|
|
```
|
|
|
|
The `MemoryLayoutValidator` prevents region overlap and bounds violations.
|
|
|
|
### 4.2 Typed Arrays / Zero-Copy Transfer
|
|
|
|
The wasm-bindgen crates use `Float32Array` for zero-copy data transfer between JS and WASM:
|
|
|
|
```rust
|
|
// Input: JS Float32Array -> Rust Vec<f32>
|
|
pub fn insert(&self, vector: Float32Array, ...) -> Result<String, JsValue> {
|
|
let vector_data: Vec<f32> = vector.to_vec(); // Copy from JS typed array
|
|
// ...
|
|
}
|
|
|
|
// Output: Rust Vec<f32> -> JS Float32Array
|
|
pub fn vector(&self) -> Float32Array {
|
|
Float32Array::from(&self.inner.vector[..]) // Copy to JS typed array
|
|
}
|
|
```
|
|
|
|
Note: `Float32Array::to_vec()` and `Float32Array::from()` perform copies. True zero-copy requires accessing WASM linear memory directly from JS, which is demonstrated in the pwa-loader:
|
|
|
|
```javascript
|
|
// Zero-copy write into WASM memory
|
|
function wasmWrite(data) {
|
|
const ptr = wasmInstance.exports.rvf_alloc(data.length);
|
|
const mem = new Uint8Array(wasmMemory.buffer, ptr, data.length);
|
|
mem.set(data); // Direct memory write
|
|
return ptr;
|
|
}
|
|
|
|
// Zero-copy read from WASM memory
|
|
function wasmRead(ptr, len) {
|
|
return new Uint8Array(wasmMemory.buffer, ptr, len).slice();
|
|
}
|
|
```
|
|
|
|
### 4.3 Memory Patterns in rvf-solver-wasm (no_std)
|
|
|
|
The no_std solver uses `dlmalloc` as global allocator and manages its own instance registry:
|
|
|
|
```rust
|
|
// Global mutable registry - safe in single-threaded WASM
|
|
static mut REGISTRY: Registry = Registry::new();
|
|
const MAX_INSTANCES: usize = 8;
|
|
|
|
struct SolverInstance {
|
|
solver: AdaptiveSolver,
|
|
last_result_json: Vec<u8>, // Heap-allocated via dlmalloc
|
|
policy_json: Vec<u8>,
|
|
witness_chain: Vec<u8>,
|
|
}
|
|
```
|
|
|
|
Memory export for external reads uses raw pointer copies:
|
|
|
|
```rust
|
|
#[no_mangle]
|
|
pub extern "C" fn rvf_solver_result_read(handle: i32, out_ptr: i32) -> i32 {
|
|
let data = &inst.last_result_json;
|
|
unsafe {
|
|
core::ptr::copy_nonoverlapping(data.as_ptr(), out_ptr as *mut u8, data.len());
|
|
}
|
|
data.len() as i32
|
|
}
|
|
```
|
|
|
|
### 4.4 Memory Limits
|
|
|
|
| Configuration | Max Pages | Memory Limit | Context |
|
|
|---------------|-----------|--------------|---------|
|
|
| Server runtime | 1024 | 64 MB | `RuntimeConfig::server()` |
|
|
| Embedded runtime | 64 | 4 MB | `RuntimeConfig::embedded()` |
|
|
| Default shared memory | 256 | 16 MB | `SharedMemoryProtocol::default_settings()` |
|
|
| Microkernel (RVF) | 2-4 | 128-256 KB | `WasmHeader` min/max pages |
|
|
| WASM page size | 1 | 64 KB | `WASM_PAGE_SIZE = 65536` |
|
|
|
|
### 4.5 Security Boundary Validation
|
|
|
|
The `ruvector-wasm` crate enforces input validation at the WASM boundary:
|
|
|
|
```rust
|
|
const MAX_VECTOR_DIMENSIONS: usize = 65536;
|
|
|
|
#[wasm_bindgen(constructor)]
|
|
pub fn new(vector: Float32Array, ...) -> Result<JsVectorEntry, JsValue> {
|
|
let vec_len = vector.length() as usize;
|
|
if vec_len == 0 {
|
|
return Err(JsValue::from_str("Vector cannot be empty"));
|
|
}
|
|
if vec_len > MAX_VECTOR_DIMENSIONS {
|
|
return Err(JsValue::from_str(&format!(
|
|
"Vector dimensions {} exceed maximum allowed {}", vec_len, MAX_VECTOR_DIMENSIONS
|
|
)));
|
|
}
|
|
// ...
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## 5. Browser vs Node.js Deployment Strategies
|
|
|
|
### 5.1 Browser Deployment (Primary)
|
|
|
|
The ruvector-wasm crate is browser-first, using:
|
|
|
|
- **IndexedDB persistence**: `web-sys` features include `IdbDatabase`, `IdbFactory`, `IdbObjectStore`, `IdbRequest`, `IdbTransaction`, `IdbOpenDbRequest` (`/crates/ruvector-wasm/Cargo.toml`)
|
|
- **Web Workers**: Embedded JavaScript worker pool (`/crates/ruvector-wasm/src/worker-pool.js`, `/crates/ruvector-wasm/src/worker.js`) for parallel operations
|
|
- **Tracing via console**: `tracing-wasm` sends logs to browser dev tools
|
|
- **Promise-based async**: `wasm-bindgen-futures` for async operations
|
|
- **getrandom via JS**: `getrandom` with `wasm_js` feature uses `crypto.getRandomValues()`
|
|
- **PWA support**: The pwa-loader example (`/examples/pwa-loader/app.js`) demonstrates offline-capable WASM loading
|
|
|
|
#### Browser Loading Pattern
|
|
|
|
```javascript
|
|
// From examples/pwa-loader/app.js
|
|
async function loadWasm() {
|
|
const response = await fetch(WASM_PATH);
|
|
const bytes = await response.arrayBuffer();
|
|
const importObject = { env: {} };
|
|
const result = await WebAssembly.instantiate(bytes, importObject);
|
|
wasmInstance = result.instance;
|
|
wasmMemory = wasmInstance.exports.memory;
|
|
}
|
|
```
|
|
|
|
#### Browser SIMD Support
|
|
|
|
WASM SIMD128 is supported in Chrome 91+, Firefox 89+, Safari 16.4+, and Edge 91+. This covers >95% of active browsers as of 2026. Feature detection can be done via:
|
|
|
|
```javascript
|
|
const simdSupported = WebAssembly.validate(
|
|
new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11])
|
|
);
|
|
```
|
|
|
|
### 5.2 Node.js Deployment
|
|
|
|
The project supports Node.js via:
|
|
|
|
- **wasm-pack `--target nodejs`**: Generates CommonJS bindings
|
|
- **Direct instantiation** from test scripts (`/scripts/test/test-wasm.mjs`):
|
|
|
|
```javascript
|
|
import { readFileSync } from 'fs';
|
|
const wasmBuffer = readFileSync(wasmPath);
|
|
const mathWasm = await import(join(pkgPath, 'ruvector_math_wasm.js'));
|
|
await mathWasm.default(wasmBuffer);
|
|
```
|
|
|
|
- **Edge-net example**: `/examples/edge-net/pkg/node/` provides Node-specific WASM packages
|
|
|
|
Node.js has had WASM SIMD support since v16.4 (V8 9.1+). For the sublinear-time solver, Node.js deployment enables server-side and CLI usage with the same WASM binary.
|
|
|
|
### 5.3 Edge / Embedded Deployment
|
|
|
|
The `micro-hnsw-wasm` crate (11.8 KB) and `rvf-solver-wasm` (~160 KB) demonstrate ultra-compact deployment:
|
|
|
|
- **iOS/Swift**: `/examples/wasm/ios/` includes Swift resources with embedded WASM
|
|
- **Self-bootstrapping**: The WASM_SEG system embeds WASM interpreters inside data files
|
|
- **Target platforms**: `WasmTarget::Wasm32`, `WasiP1`, `WasiP2`, `Browser`, `BareTile`
|
|
|
|
### 5.4 Deployment Target Matrix
|
|
|
|
| Target | WASM Format | Binding | SIMD | Size Budget | Persistence |
|
|
|--------|-------------|---------|------|-------------|-------------|
|
|
| Browser (Chrome/FF/Safari) | wasm-bindgen | JS glue + TS types | SIMD128 | <500 KB | IndexedDB |
|
|
| Node.js (>= 16.4) | wasm-bindgen (nodejs) or raw | CommonJS/ESM | SIMD128 | <1 MB | fs |
|
|
| Cloudflare Workers | wasm-bindgen (web) | ESM | SIMD128 | <1 MB | KV |
|
|
| iOS/Swift | raw wasm32 | C FFI | Optional | <200 KB | CoreData |
|
|
| Bare-metal / RVF | no_std cdylib | extern "C" | Optional | <200 KB | None |
|
|
|
|
---
|
|
|
|
## 6. WASM Module Loading and Initialization Patterns
|
|
|
|
### 6.1 Pattern 1: wasm-bindgen Auto-Init
|
|
|
|
Used by most WASM crates. The `#[wasm_bindgen(start)]` attribute runs initialization automatically:
|
|
|
|
```rust
|
|
#[wasm_bindgen(start)]
|
|
pub fn init() {
|
|
console_error_panic_hook::set_once();
|
|
tracing_wasm::set_as_global_default();
|
|
}
|
|
```
|
|
|
|
JS side (generated by wasm-pack):
|
|
|
|
```javascript
|
|
import init, { VectorDB } from './ruvector_wasm.js';
|
|
await init(); // Loads + instantiates + runs start function
|
|
const db = new VectorDB(384, 'cosine', true);
|
|
```
|
|
|
|
### 6.2 Pattern 2: Manual WebAssembly.instantiate
|
|
|
|
Used by the pwa-loader and no_std modules:
|
|
|
|
```javascript
|
|
const response = await fetch(WASM_PATH);
|
|
const bytes = await response.arrayBuffer();
|
|
const importObject = { env: {} };
|
|
const result = await WebAssembly.instantiate(bytes, importObject);
|
|
wasmInstance = result.instance;
|
|
wasmMemory = wasmInstance.exports.memory;
|
|
```
|
|
|
|
This pattern offers maximum control: the host can inspect exports before calling any function, handle errors granularly, and manage memory directly.
|
|
|
|
### 6.3 Pattern 3: Streaming Instantiation
|
|
|
|
For large modules, `WebAssembly.instantiateStreaming` should be used (not currently in the codebase but recommended):
|
|
|
|
```javascript
|
|
const result = await WebAssembly.instantiateStreaming(
|
|
fetch(WASM_PATH),
|
|
importObject
|
|
);
|
|
```
|
|
|
|
This starts compiling while bytes are still downloading, reducing load time by up to 50%.
|
|
|
|
### 6.4 Pattern 4: Unified Engine Lazy Init
|
|
|
|
The `@ruvector/wasm-unified` uses lazy initialization:
|
|
|
|
```typescript
|
|
let defaultEngine: UnifiedEngine | null = null;
|
|
|
|
export async function getDefaultEngine(): Promise<UnifiedEngine> {
|
|
if (!defaultEngine) {
|
|
defaultEngine = await createUnifiedEngine();
|
|
await defaultEngine.init();
|
|
}
|
|
return defaultEngine;
|
|
}
|
|
```
|
|
|
|
### 6.5 Pattern 5: Instance Registry (rvf-solver-wasm)
|
|
|
|
The solver WASM uses a handle-based instance registry:
|
|
|
|
```rust
|
|
static mut REGISTRY: Registry = Registry::new(); // Max 8 concurrent solvers
|
|
|
|
// JS creates solver:
|
|
let handle = wasmInstance.exports.rvf_solver_create();
|
|
// JS uses solver:
|
|
wasmInstance.exports.rvf_solver_train(handle, 100, 1, 10, seedLo, seedHi);
|
|
// JS reads result:
|
|
let len = wasmInstance.exports.rvf_solver_result_len(handle);
|
|
let ptr = wasmInstance.exports.rvf_solver_alloc(len);
|
|
wasmInstance.exports.rvf_solver_result_read(handle, ptr);
|
|
let json = new TextDecoder().decode(new Uint8Array(wasmMemory.buffer, ptr, len));
|
|
// JS destroys:
|
|
wasmInstance.exports.rvf_solver_destroy(handle);
|
|
```
|
|
|
|
This is the recommended pattern for the sublinear-time solver because it:
|
|
- Supports multiple concurrent solver instances
|
|
- Avoids global state issues
|
|
- Enables resource cleanup
|
|
- Works across all deployment targets (browser, Node, bare-metal)
|
|
|
|
---
|
|
|
|
## 7. Performance Benchmarking Framework for WASM
|
|
|
|
### 7.1 Existing Benchmark Infrastructure
|
|
|
|
#### In-WASM Benchmark Function
|
|
|
|
The `ruvector-wasm` crate includes a built-in benchmark export:
|
|
|
|
```rust
|
|
#[wasm_bindgen(js_name = benchmark)]
|
|
pub fn benchmark(name: &str, iterations: usize, dimensions: usize) -> Result<f64, JsValue> {
|
|
let start = Instant::now();
|
|
for i in 0..iterations {
|
|
let vector: Vec<f32> = (0..dimensions)
|
|
.map(|_| js_sys::Math::random() as f32)
|
|
.collect();
|
|
let vector_arr = Float32Array::from(&vector[..]);
|
|
db.insert(vector_arr, Some(format!("vec_{}", i)), None)?;
|
|
}
|
|
let duration = start.elapsed();
|
|
Ok(iterations as f64 / duration.as_secs_f64())
|
|
}
|
|
```
|
|
|
|
#### WASM Solver Benchmark Binary
|
|
|
|
The `/examples/benchmarks/src/bin/wasm_solver_bench.rs` provides a native vs WASM comparison framework:
|
|
|
|
```
|
|
WASM vs Native AGI Solver Benchmark
|
|
Config: holdout=50, training=50, cycles=3, budget=200
|
|
|
|
NATIVE SOLVER RESULTS
|
|
Mode Acc% Cost Noise% Time Pass
|
|
A baseline xx.x% xxx.x xx.x% xxxms PASS
|
|
B compiler xx.x% xxx.x xx.x% xxxms PASS
|
|
C learned xx.x% xxx.x xx.x% xxxms PASS
|
|
|
|
WASM REFERENCE METRICS
|
|
Native total time: xxxms
|
|
WASM expected: ~xxxms (2-5x native)
|
|
```
|
|
|
|
This establishes the expected WASM overhead: **2-5x slower than native** for the self-learning solver workload.
|
|
|
|
#### SIMD Benchmarks
|
|
|
|
The `/crates/prime-radiant/benches/simd_benchmarks.rs` and `/crates/ruvector-sparse-inference/benches/simd_kernels.rs` provide Criterion benchmarks for SIMD operations that can be adapted for WASM SIMD.
|
|
|
|
### 7.2 Recommended Benchmarking Framework for the Sublinear-Time Solver
|
|
|
|
```
|
|
sublinear-time-solver/benches/
|
|
wasm_bench.rs -- In-Rust Criterion benchmarks (native baseline)
|
|
wasm_bench.mjs -- Node.js WASM performance runner
|
|
wasm_bench.html -- Browser WASM performance runner
|
|
bench_harness.rs -- Shared benchmark harness (puzzle generation)
|
|
```
|
|
|
|
#### Metrics to Track
|
|
|
|
| Metric | Description | Measurement |
|
|
|--------|-------------|-------------|
|
|
| `solve_throughput` | Puzzles solved per second | `iterations / elapsed_secs` |
|
|
| `solve_latency_p50` | Median solve time | Percentile of individual solve times |
|
|
| `solve_latency_p99` | 99th percentile solve time | Percentile of individual solve times |
|
|
| `memory_peak_bytes` | Peak WASM linear memory usage | `memory.buffer.byteLength` |
|
|
| `module_load_ms` | Time to instantiate WASM module | `performance.now()` around `WebAssembly.instantiate` |
|
|
| `simd_speedup` | SIMD vs scalar performance ratio | Compare SIMD build vs non-SIMD build |
|
|
| `wasm_native_ratio` | WASM-to-native performance overhead | Compare WASM throughput vs native Criterion results |
|
|
| `binary_size_bytes` | Compiled .wasm file size | `wc -c *.wasm` |
|
|
| `accuracy_parity` | Solver accuracy matches native | Bit-exact or epsilon comparison of results |
|
|
|
|
#### Benchmark Protocol
|
|
|
|
1. **Native baseline**: Run the solver natively with Criterion (3+ iterations, warm-up)
|
|
2. **WASM baseline**: Load the same solver as WASM, run identical workload in Node.js
|
|
3. **WASM SIMD**: Build with `RUSTFLAGS="-C target-feature=+simd128"`, measure speedup
|
|
4. **Browser measurement**: Run in Chrome with `performance.now()`, measure real-world latency
|
|
5. **Size budget**: Track .wasm binary size across commits (regression alerts if >200 KB)
|
|
6. **Accuracy validation**: Compare solver output JSON between native and WASM (must match to f64 epsilon)
|
|
|
|
---
|
|
|
|
## 8. Recommendations for the Sublinear-Time Solver
|
|
|
|
### 8.1 Binding Strategy: Use no_std + extern "C" (Pattern B)
|
|
|
|
For the sublinear-time solver WASM module, adopt the `rvf-solver-wasm` pattern:
|
|
|
|
- **no_std + alloc**: Minimizes binary size, avoids JS runtime dependency
|
|
- **dlmalloc global allocator**: Proven in rvf-solver-wasm
|
|
- **extern "C" exports**: Maximum portability (browser, Node, embedded, bare-metal)
|
|
- **Handle-based instance registry**: Supports concurrent solver instances
|
|
- **Result reads via pointer+length**: JSON serialization of results into WASM memory, host reads via typed array view
|
|
|
|
Do not use wasm-bindgen for the core solver. A thin wasm-bindgen wrapper can be created separately if a richer JS API is needed.
|
|
|
|
### 8.2 SIMD Strategy: Conditional Compilation
|
|
|
|
```rust
|
|
// In the solver crate
|
|
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
|
|
mod simd_wasm {
|
|
use core::arch::wasm32::*;
|
|
pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* SIMD128 */ }
|
|
}
|
|
|
|
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
|
|
mod simd_wasm {
|
|
pub fn distance_l2_simd(a: &[f32], b: &[f32]) -> f32 { /* scalar fallback */ }
|
|
}
|
|
```
|
|
|
|
Build two variants:
|
|
- `solver.wasm` -- scalar fallback (maximum compatibility)
|
|
- `solver-simd.wasm` -- SIMD128 enabled (Chrome 91+, FF 89+, Safari 16.4+, Node 16.4+)
|
|
|
|
### 8.3 Memory Strategy: Bump Allocator + Shared Memory Protocol
|
|
|
|
Adopt the `SharedMemoryProtocol` pattern from the kernel pack system:
|
|
|
|
1. Allocate a fixed arena at solver creation (e.g., 256 pages = 16 MB)
|
|
2. Use 16-byte aligned bump allocation for tensor data
|
|
3. Reset the allocator between solve invocations (amortized O(1))
|
|
4. Validate memory regions before kernel execution
|
|
5. Export `memory` so the host can directly view/write typed arrays without copying
|
|
|
|
### 8.4 Build Profile
|
|
|
|
```toml
|
|
[profile.release]
|
|
opt-level = "z"
|
|
lto = true
|
|
codegen-units = 1
|
|
strip = true
|
|
panic = "abort"
|
|
```
|
|
|
|
Target binary size: <200 KB (consistent with existing rvf-solver-wasm at ~160 KB).
|
|
|
|
### 8.5 Feature Detection Export
|
|
|
|
```rust
|
|
#[no_mangle]
|
|
pub extern "C" fn solver_capabilities() -> u32 {
|
|
let mut caps = 0u32;
|
|
#[cfg(target_feature = "simd128")]
|
|
{ caps |= 0x01; } // SIMD available
|
|
#[cfg(feature = "thompson-sampling")]
|
|
{ caps |= 0x02; } // Thompson Sampling enabled
|
|
#[cfg(feature = "witness-chain")]
|
|
{ caps |= 0x04; } // Witness chain enabled
|
|
caps
|
|
}
|
|
```
|
|
|
|
### 8.6 Testing Strategy
|
|
|
|
- Use `wasm-bindgen-test` with `run_in_browser` for browser tests (existing pattern)
|
|
- Use the Node.js test harness at `/scripts/test/test-wasm.mjs` as a template
|
|
- Validate accuracy parity with native build via `wasm_solver_bench`
|
|
- Run SIMD-specific tests with `RUSTFLAGS="-C target-feature=+simd128"` in CI
|
|
|
|
---
|
|
|
|
## Appendix A: File Reference
|
|
|
|
### Core WASM Source Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/crates/ruvector-wasm/src/lib.rs` | Main VectorDB WASM bindings (wasm-bindgen) |
|
|
| `/crates/ruvector-wasm/src/kernel/mod.rs` | Kernel pack system entry point |
|
|
| `/crates/ruvector-wasm/src/kernel/memory.rs` | Shared memory protocol, bump allocator |
|
|
| `/crates/ruvector-wasm/src/kernel/runtime.rs` | Kernel runtime trait, mock runtime, manager |
|
|
| `/crates/ruvector-wasm/src/kernel/epoch.rs` | Epoch-based execution budgets |
|
|
| `/crates/ruvector-wasm/src/kernel/signature.rs` | Ed25519 kernel pack verification |
|
|
| `/crates/ruvector-wasm/src/kernel/manifest.rs` | Kernel manifest parsing |
|
|
| `/crates/ruvector-wasm/Cargo.toml` | WASM dependency configuration |
|
|
|
|
### SIMD Source Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/crates/ruvector-delta-wasm/src/simd.rs` | WASM SIMD128 f32x4 operations |
|
|
| `/crates/ruvector-sparse-inference/src/backend/wasm.rs` | WASM SIMD backend with Backend trait |
|
|
| `/crates/ruvector-mincut/src/wasm/simd.rs` | WASM SIMD128 bitset operations |
|
|
| `/crates/ruvector-core/src/simd_intrinsics.rs` | Native SIMD (AVX2/AVX-512/NEON) reference |
|
|
|
|
### Solver WASM Source Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/crates/rvf/rvf-solver-wasm/src/lib.rs` | Self-learning solver WASM exports (no_std) |
|
|
| `/crates/rvf/rvf-solver-wasm/src/engine.rs` | Adaptive solver engine |
|
|
| `/crates/rvf/rvf-solver-wasm/src/policy.rs` | PolicyKernel with Thompson Sampling |
|
|
| `/crates/rvf/rvf-solver-wasm/Cargo.toml` | no_std WASM build configuration |
|
|
|
|
### Build and Test Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/Cargo.toml` | Workspace WASM dependencies and build profiles |
|
|
| `/scripts/test/test-wasm.mjs` | Node.js WASM test runner |
|
|
| `/examples/benchmarks/src/bin/wasm_solver_bench.rs` | Native vs WASM benchmark comparison |
|
|
| `/examples/pwa-loader/app.js` | Browser WASM loading and memory management |
|
|
|
|
### RVF Self-Bootstrap Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/crates/rvf/rvf-types/src/wasm_bootstrap.rs` | WasmHeader, WasmRole, WasmTarget, feature flags |
|
|
|
|
### TypeScript/npm Files
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `/npm/packages/ruvector-wasm-unified/src/index.ts` | Unified WASM engine TypeScript API |
|
|
|
|
---
|
|
|
|
## Appendix B: WASM Binary Size Inventory
|
|
|
|
| Binary | Size | Strategy |
|
|
|--------|------|----------|
|
|
| `micro_hnsw.wasm` | 11.8 KB | no_std, bare minimum |
|
|
| `ruvector_learning_wasm_bg.wasm` | 39 KB | wasm-bindgen |
|
|
| `ruvector_exotic_wasm_bg.wasm` | 149 KB | wasm-bindgen |
|
|
| `ruvector_nervous_system_wasm_bg.wasm` | 178 KB | wasm-bindgen |
|
|
| `ruvector_economy_wasm_bg.wasm` | 181 KB | wasm-bindgen |
|
|
| `ruvector_attention_unified_wasm_bg.wasm` | 339 KB | wasm-bindgen |
|
|
| `rvf-solver-wasm` (estimated) | ~160 KB | no_std + dlmalloc |
|
|
|
|
The sublinear-time solver should target the **<200 KB** range using the no_std approach, consistent with `rvf-solver-wasm`.
|