# ADR-QE-003: WebAssembly Compilation Strategy

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

## Context

### Problem Statement

ruVector targets browsers, embedded/edge runtimes, and IoT devices via
WebAssembly. The quantum simulation engine must compile to
`wasm32-unknown-unknown` and run correctly in these constrained environments.
WASM introduces fundamental constraints that differ significantly from native
execution and must be addressed at the architectural level rather than
worked around at runtime.

### WASM Execution Environment Constraints

| Constraint | Detail | Impact on Quantum Simulation |
|------------|--------|------------------------------|
| 32-bit address space | ~4 GB theoretical max, ~2 GB practical | Hard ceiling on state vector size |
| Memory model | Linear memory, grows in 64 KB pages | Allocation must be page-aware |
| No native threads | Web Workers required for parallelism | Requires SharedArrayBuffer + COOP/COEP headers |
| No direct GPU | WebGPU is a separate API, not WASM-native | GPU acceleration unavailable in WASM path |
| No OS syscalls | Sandboxed execution, no file/network | All I/O must go through host bindings |
| JIT compilation | V8/SpiderMonkey JIT, not AOT | ~1.5-3x slower than native, variable warmup |
| SIMD support | 128-bit SIMD proposal (widely supported since 2021) | 4 f32 or 2 f64 lanes per 128-bit vector |
| Stack size | Default ~1 MB, configurable | Deep recursion limited |

### Memory Budget Analysis for Quantum Simulation

The critical constraint is WASM's 32-bit address space. With a practical
usable limit of approximately 2 GB (due to browser memory allocation
behavior and address space fragmentation), the maximum feasible state vector
size is bounded:

```
Available WASM Memory Budget:

  Total addressable:      4,294,967,296 bytes (4 GB theoretical)
  Practical usable:      ~2,147,483,648 bytes (2 GB, browser-dependent)
  WASM overhead:           ~100,000,000 bytes (module, stack, heap metadata)
  Application overhead:     ~50,000,000 bytes (circuit data, scratch buffers)
  -------------------------------------------------
  Available for state:   ~2,000,000,000 bytes (1.86 GB)

State vector sizes (complex f64, 16 bytes per amplitude):
  24 qubits:      268,435,456 bytes (256 MB)  -- comfortable
  25 qubits:      536,870,912 bytes (512 MB)  -- feasible
  25 + scratch: ~1,073,741,824 bytes          -- tight but within budget
  26 qubits:    1,073,741,824 bytes (1 GB)    -- state alone, no scratch room
  27 qubits:    2,147,483,648 bytes (2 GB)    -- exceeds practical limit
```

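The budget arithmetic above can be reproduced with a few lines of Rust. This is a minimal sketch, not code from `ruqu-wasm`: the function names `state_vector_bytes` and `max_qubits_within` are hypothetical, and the `copies` parameter is an assumption that models "state plus an equal-sized scratch buffer" as a factor of 2.

```rust
/// Bytes needed for a dense state vector of `n_qubits` qubits.
/// Returns `None` if the size does not fit in a `u64`.
/// (Hypothetical helper, not part of the ruqu API.)
pub fn state_vector_bytes(n_qubits: u32, bytes_per_amplitude: u64) -> Option<u64> {
    // 2^n amplitudes, each `bytes_per_amplitude` wide.
    1u64.checked_shl(n_qubits)?.checked_mul(bytes_per_amplitude)
}

/// Largest qubit count whose state vector, multiplied by `copies`
/// (1 = state only, 2 = state plus an equal-sized scratch buffer),
/// fits within `budget_bytes`. Returns 0 if not even one qubit fits.
pub fn max_qubits_within(budget_bytes: u64, bytes_per_amplitude: u64, copies: u64) -> u32 {
    let mut n = 0;
    while let Some(next) =
        state_vector_bytes(n + 1, bytes_per_amplitude).and_then(|b| b.checked_mul(copies))
    {
        if next > budget_bytes {
            break;
        }
        n += 1;
    }
    n
}
```

With the ~2,000,000,000-byte budget from the table, `max_qubits_within(2_000_000_000, 16, 2)` yields 25 for complex f64 with scratch space, and the same call with 8 bytes per amplitude yields 26, which matches the limits chosen later in this ADR.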
### Existing WASM Patterns in ruVector

The `ruvector-router-wasm` crate establishes conventions for WASM compilation:

- `wasm-pack build` as the compilation tool
- `wasm-bindgen` for JavaScript interop
- TypeScript definition generation
- Feature-flag controlled inclusion/exclusion of capabilities
- Dedicated test suites using `wasm-bindgen-test`

## Decision

### 1. Target and Toolchain

**Target triple**: `wasm32-unknown-unknown`

**Build toolchain**: `wasm-pack` with `wasm-bindgen`

```bash
# Development build
wasm-pack build crates/ruqu-wasm --target web --dev

# Release build with size optimization
wasm-pack build crates/ruqu-wasm --target web --release

# Node.js target (for server-side WASM)
wasm-pack build crates/ruqu-wasm --target nodejs --release
```

**Cargo profile for WASM release**:

```toml
[profile.wasm-release]
inherits = "release"
opt-level = "z"       # Optimize for binary size
lto = true            # Link-time optimization
codegen-units = 1     # Single codegen unit for maximum optimization
strip = true          # Strip debug symbols
panic = "abort"       # Smaller panic handling
```

### 2. Memory Limit Enforcement

`ruqu-wasm` enforces qubit limits before any allocation occurs. This is a hard
gate, not a soft warning.

**Enforcement strategy**:

```
User requests N qubits
        |
        v
  [N <= 25?] ---NO---> Return WasmLimitError {
        |                requested: N,
       YES               maximum: 25,
        |                estimated_memory: 16 * 2^N,
        v                suggestion: "Use native build for >25 qubits"
  [Estimate total      }
   memory needed]
        |
        v
  [< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
        |
       YES
        |
        v
  Proceed with allocation
```

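The two gates in the flow above can be sketched as a single pure function. This is an illustration under stated assumptions, not the `ruqu-wasm` implementation: `LimitError`, `check_request`, and the constants are hypothetical names, "1.5 GB" is read as 1,500,000,000 bytes, and the total estimate is assumed to be the state vector plus an equal-sized scratch buffer plus caller-supplied overhead.

```rust
// Hypothetical constants mirroring the limits in this section.
pub const MAX_QUBITS: u32 = 25;
pub const BYTES_PER_AMPLITUDE: u64 = 16; // complex f64
pub const MEMORY_CEILING_BYTES: u64 = 1_500_000_000; // "1.5 GB", assumed decimal

#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    QubitOverflow { requested: u32, maximum: u32 },
    InsufficientMemory { required_bytes: u64, ceiling_bytes: u64 },
}

/// Returns the estimated total bytes if the request passes both gates.
/// `extra_bytes` stands in for circuit data and other overhead.
pub fn check_request(n_qubits: u32, extra_bytes: u64) -> Result<u64, LimitError> {
    // Gate 1: qubit count, checked before any size arithmetic.
    if n_qubits > MAX_QUBITS {
        return Err(LimitError::QubitOverflow {
            requested: n_qubits,
            maximum: MAX_QUBITS,
        });
    }
    // Gate 2: state vector plus an equal-sized scratch buffer (assumption).
    // The shift cannot overflow because n_qubits <= 25 at this point.
    let state = BYTES_PER_AMPLITUDE << n_qubits;
    let required = (2 * state).saturating_add(extra_bytes);
    if required >= MEMORY_CEILING_BYTES {
        return Err(LimitError::InsufficientMemory {
            required_bytes: required,
            ceiling_bytes: MEMORY_CEILING_BYTES,
        });
    }
    Ok(required)
}
```

Running the check before allocation keeps the failure a structured error rather than an out-of-memory trap inside the allocator.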
**Qubit limits by precision**:

| Precision | Max Qubits (WASM) | State Size | With Scratch |
|-----------|--------------------|------------|--------------|
| Complex f64 (default) | 25 | 512 MB | ~1.07 GB |
| Complex f32 (optional) | 26 | 512 MB | ~1.07 GB |

**Error reporting**:

```rust
use wasm_bindgen::prelude::*;

// `getter_with_clone` is required because `String` is not `Copy`.
#[wasm_bindgen(getter_with_clone)]
#[derive(Debug)]
pub struct WasmLimitError {
    pub requested_qubits: usize,
    pub maximum_qubits: usize,
    /// Exposed to JavaScript as a BigInt; saturates at `u64::MAX`.
    pub estimated_bytes: u64,
    pub message: String,
}

impl WasmLimitError {
    pub fn qubit_overflow(requested: usize) -> Self {
        let max = if cfg!(feature = "f32") { 26 } else { 25 };
        let bytes_per_amplitude: u64 = if cfg!(feature = "f32") { 8 } else { 16 };
        // `usize` is 32 bits on wasm32, so `16 * (1usize << requested)` would
        // overflow for large requests. Compute in u64 and saturate instead.
        let estimated_bytes = u32::try_from(requested)
            .ok()
            .and_then(|n| 1u64.checked_shl(n))
            .and_then(|amplitudes| amplitudes.checked_mul(bytes_per_amplitude))
            .unwrap_or(u64::MAX);
        Self {
            requested_qubits: requested,
            maximum_qubits: max,
            estimated_bytes,
            message: format!(
                "Cannot simulate {} qubits in WASM: requires {} bytes, \
                 exceeds the WASM memory budget. Maximum: {} qubits. \
                 Use native build for larger simulations.",
                requested, estimated_bytes, max
            ),
        }
    }
}
```

### 3. Threading Strategy

WASM multi-threading requires SharedArrayBuffer, which in turn requires
specific HTTP security headers (Cross-Origin-Opener-Policy and
Cross-Origin-Embedder-Policy). Not all deployment environments support these.

**Strategy**: Optional multi-threading with graceful fallback.

```
                ruqu-wasm execution
                        |
                        v
                [SharedArrayBuffer
                    available?]
                   /          \
                 YES           NO
                 /               \
  [wasm-bindgen-rayon]       [single-threaded
   parallel execution]        execution]
           |                      |
   Split state vector        Sequential gate
   across Web Workers        application
           |                      |
           v                      v
    Fast (N cores)           Slower (1 core)
```

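The "split state vector across Web Workers" branch amounts to partitioning the amplitude index range. The sketch below shows only that partitioning step in plain Rust; `chunk_ranges` and `plan_execution` are hypothetical names, and the sketch ignores the fact that a real gate kernel must keep paired amplitudes (indices differing in the target qubit bit) in the same unit of work.

```rust
use std::ops::Range;

/// Split `len` amplitudes into at most `workers` contiguous, near-equal ranges.
pub fn chunk_ranges(len: usize, workers: usize) -> Vec<Range<usize>> {
    // Never use zero workers, and never more workers than elements.
    let workers = workers.max(1).min(len.max(1));
    let base = len / workers;
    let extra = len % workers; // the first `extra` chunks get one more element
    let mut ranges = Vec::with_capacity(workers);
    let mut start = 0;
    for i in 0..workers {
        let size = base + usize::from(i < extra);
        ranges.push(start..start + size);
        start += size;
    }
    ranges
}

/// One range per worker when threading is available, otherwise a single
/// range covering the whole state vector (the single-threaded fallback).
pub fn plan_execution(threading_available: bool, cores: usize, len: usize) -> Vec<Range<usize>> {
    if threading_available {
        chunk_ranges(len, cores)
    } else {
        chunk_ranges(len, 1)
    }
}
```

Because the fallback is just the one-range case of the same plan, the gate kernel can be written once against a list of ranges and reused on both paths.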
**Compile-time configuration**:

```toml
# In ruqu-wasm/Cargo.toml
[features]
default = []
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]
```

**Runtime detection**:

```rust
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn threading_available() -> bool {
    // Look up SharedArrayBuffer on the global object rather than calling
    // `eval`, which strict Content-Security-Policy settings block.
    js_sys::Reflect::get(&js_sys::global(), &JsValue::from_str("SharedArrayBuffer"))
        .map(|v| !v.is_undefined())
        .unwrap_or(false)
}
```

**Required HTTP headers for threading**:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

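A deployment check for these two headers can be sketched as a small predicate, for example in a server-side smoke test. The function name `is_cross_origin_isolated` and the slice-of-pairs header representation are assumptions for illustration; the sketch accepts only the two values listed above.

```rust
/// Returns true when the response headers carry both values listed in
/// this section. Hypothetical helper, not part of any ruqu crate.
pub fn is_cross_origin_isolated(headers: &[(&str, &str)]) -> bool {
    // Header names are case-insensitive; values are compared after trimming.
    let has = |name: &str, value: &str| {
        headers
            .iter()
            .any(|(k, v)| k.eq_ignore_ascii_case(name) && v.trim() == value)
    };
    has("Cross-Origin-Opener-Policy", "same-origin")
        && has("Cross-Origin-Embedder-Policy", "require-corp")
}
```

If either header is missing, the runtime check for SharedArrayBuffer is expected to fail in the browser and the single-threaded path applies.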
### 4. SIMD Utilization

The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers
and runtimes. The quantum engine uses SIMD for amplitude manipulation when
available.

**WASM SIMD capabilities**:

| Operation | WASM SIMD Instruction | Use in Quantum Sim |
|-----------|-----------------------|--------------------|
| f64x2 multiply | `f64x2.mul` | Complex multiplication (real part) |
| f64x2 add | `f64x2.add` | Amplitude accumulation |
| f64x2 sub | `f64x2.sub` | Complex multiplication (cross terms) |
| f64x2 shuffle | `i8x16.shuffle` (Rust intrinsic: `i64x2_shuffle`) | Swapping real/imaginary parts |
| f32x4 multiply | `f32x4.mul` | f32 mode complex multiply |
| f32x4 fma | emulated (no FMA in baseline SIMD128) | Fused multiply-add for accuracy |

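**Conditional compilation**:

```rust
// In ruqu-core, WASM SIMD path
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod wasm_simd {
    use core::arch::wasm32::*;

    /// Multiply the packed complex value `z = (re, im)` by `u_re + i*u_im`.
    #[inline(always)]
    fn complex_mul(u_re: f64, u_im: f64, z: v128) -> v128 {
        // (u_re*re, u_re*im)
        let direct = f64x2_mul(f64x2_splat(u_re), z);
        // Swap lanes to (im, re), then form (-u_im*im, u_im*re)
        let swapped = i64x2_shuffle::<1, 0>(z, z);
        let cross = f64x2_mul(f64x2(-u_im, u_im), swapped);
        f64x2_add(direct, cross)
    }

    /// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD.
    /// Returns `(c0_re, c0_im, c1_re, c1_im)`.
    #[inline(always)]
    pub fn apply_gate_2x2_simd(
        a_re: f64, a_im: f64,
        b_re: f64, b_im: f64,
        u00_re: f64, u00_im: f64,
        u01_re: f64, u01_im: f64,
        u10_re: f64, u10_im: f64,
        u11_re: f64, u11_im: f64,
    ) -> (f64, f64, f64, f64) {
        // Pack amplitude pair into SIMD lanes
        let a = f64x2(a_re, a_im);
        let b = f64x2(b_re, b_im);

        // Complex multiply-accumulate for output amplitudes
        // (one possible expansion of the complex arithmetic):
        // c0 = u00*a + u01*b
        let c0 = f64x2_add(complex_mul(u00_re, u00_im, a), complex_mul(u01_re, u01_im, b));
        // c1 = u10*a + u11*b
        let c1 = f64x2_add(complex_mul(u10_re, u10_im, a), complex_mul(u11_re, u11_im, b));

        (
            f64x2_extract_lane::<0>(c0),
            f64x2_extract_lane::<1>(c0),
            f64x2_extract_lane::<0>(c1),
            f64x2_extract_lane::<1>(c1),
        )
    }
}

// Fallback scalar path
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod scalar {
    // Pure scalar complex arithmetic
}
```
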
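The scalar fallback module above is left empty in this ADR. A minimal sketch of what it could contain follows, using the same argument order as `apply_gate_2x2_simd`; the name `apply_gate_2x2_scalar` and the return order `(c0_re, c0_im, c1_re, c1_im)` are assumptions. It doubles as a reference implementation for testing a SIMD path.

```rust
/// Apply a 2x2 complex matrix `u` to the amplitude pair `(a, b)`.
/// Returns `(c0_re, c0_im, c1_re, c1_im)` where
/// c0 = u00*a + u01*b and c1 = u10*a + u11*b.
/// (Hypothetical scalar counterpart of the SIMD routine.)
#[allow(clippy::too_many_arguments)]
pub fn apply_gate_2x2_scalar(
    a_re: f64, a_im: f64,
    b_re: f64, b_im: f64,
    u00_re: f64, u00_im: f64,
    u01_re: f64, u01_im: f64,
    u10_re: f64, u10_im: f64,
    u11_re: f64, u11_im: f64,
) -> (f64, f64, f64, f64) {
    // Complex product: (x + iy)(p + iq) = (xp - yq) + i(xq + yp)
    let mul = |x: f64, y: f64, p: f64, q: f64| (x * p - y * q, x * q + y * p);

    let (t0_re, t0_im) = mul(u00_re, u00_im, a_re, a_im);
    let (t1_re, t1_im) = mul(u01_re, u01_im, b_re, b_im);
    let (t2_re, t2_im) = mul(u10_re, u10_im, a_re, a_im);
    let (t3_re, t3_im) = mul(u11_re, u11_im, b_re, b_im);

    (t0_re + t1_re, t0_im + t1_im, t2_re + t3_re, t2_im + t3_im)
}
```

For example, passing the Pauli-X matrix with `a = 1` and `b = 0` moves the amplitude from the first output to the second.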
**Comparison of SIMD widths across targets**:

```
Native (AVX-512):  512-bit = 8 f64 = 4 complex f64 per instruction
Native (AVX2):     256-bit = 4 f64 = 2 complex f64 per instruction
Native (NEON):     128-bit = 2 f64 = 1 complex f64 per instruction
WASM SIMD:         128-bit = 2 f64 = 1 complex f64 per instruction
```

WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine
uses the same algorithmic structure as the NEON path, adapted for WASM SIMD
intrinsics.

### 5. No GPU in WASM

GPU acceleration is exclusively available in native builds. The WASM path
uses CPU-only simulation.

**Rationale**:
- WebGPU is a separate browser API that WASM code can reach only through JavaScript bindings; it does not operate on WASM linear memory directly
- Bridging WASM to WebGPU would require complex JavaScript glue code
- WebGPU compute shader support varies across browsers
- The performance benefit is uncertain for the 25-qubit WASM ceiling

**Future consideration**: If WebGPU stabilizes and WASM-WebGPU interop matures,
a `ruqu-webgpu` crate could provide browser-side GPU acceleration. This is out
of scope for the initial release.

### 6. API Parity

`ruqu-wasm` exposes an API that is functionally identical to `ruqu-core` native.
The same circuit description produces the same measurement results (within
floating-point tolerance). Only performance and capacity differ.

**Parity guarantee**:

```
                   Same Circuit
                        |
           +------------+------------+
           |                         |
   ruqu-core (native)       ruqu-wasm (browser)
           |                         |
   - 30+ qubits             - 25 qubits max
   - AVX2/AVX-512 SIMD      - WASM SIMD128
   - Rayon threading        - Optional Web Workers
   - Optional GPU           - CPU only
   - ~17.5M gates/sec       - ~5-12M gates/sec
           |                         |
           +------------+------------+
                        |
                   Same Results
               (within fp tolerance)
```

**Verified by**: Shared test suite that runs against both native and WASM targets,
comparing outputs within floating-point tolerance (for deterministic operations,
since the native and WASM SIMD paths may round differently) or statistically (for
measurement sampling).

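The two comparison modes used by the shared test suite can be sketched as follows. Both helpers are hypothetical: `states_match` assumes state vectors stored as flat `f64` slices of interleaved real and imaginary parts, and `total_variation` uses total variation distance as one possible statistic for comparing measurement histograms; the ADR does not specify which statistic the suite uses.

```rust
/// True when two state vectors agree component-wise within `tol`.
/// NaN anywhere makes the comparison fail, since `NaN <= tol` is false.
pub fn states_match(a: &[f64], b: &[f64], tol: f64) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| (x - y).abs() <= tol)
}

/// Total variation distance between two measurement histograms
/// (counts per outcome). Returns `None` for mismatched lengths or
/// empty samples. 0.0 means identical distributions, 1.0 means disjoint.
pub fn total_variation(counts_a: &[u64], counts_b: &[u64]) -> Option<f64> {
    let total_a: u64 = counts_a.iter().sum();
    let total_b: u64 = counts_b.iter().sum();
    if counts_a.len() != counts_b.len() || total_a == 0 || total_b == 0 {
        return None;
    }
    let sum: f64 = counts_a
        .iter()
        .zip(counts_b)
        .map(|(&x, &y)| (x as f64 / total_a as f64 - y as f64 / total_b as f64).abs())
        .sum();
    Some(0.5 * sum)
}
```

A parity test would run the same circuit on both targets, call `states_match` on the final amplitudes, and bound `total_variation` on sampled outcomes by a threshold chosen for the shot count.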
### 7. Module Size Target

Target `.wasm` binary size: **< 2 MB** for the default feature set.

**Size budget**:

| Component | Estimated Size |
|-----------|---------------|
| Core simulation engine | ~800 KB |
| Gate implementations | ~200 KB |
| Measurement and sampling | ~100 KB |
| wasm-bindgen glue | ~50 KB |
| Circuit optimization | ~150 KB |
| Error handling and validation | ~50 KB |
| **Total (default features)** | **~1.35 MB** |
| + noise-model feature | +200 KB |
| + tensor-network feature | +400 KB |
| **Total (all features)** | **~1.95 MB** |

**Size reduction techniques**:
- `opt-level = "z"` for size-optimized compilation
- LTO (Link-Time Optimization) for dead code elimination
- `wasm-opt` post-processing pass (binaryen)
- Feature flags to exclude unused capabilities
- `panic = "abort"` to eliminate unwinding machinery
- Avoid `format!` and `std::fmt` where possible in hot paths

**Build pipeline**:

```bash
# Build with wasm-pack
wasm-pack build crates/ruqu-wasm --target web --release

# Post-process with wasm-opt for additional size reduction
wasm-opt -Oz --enable-simd \
  crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
  -o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm

# Verify size
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Expected: < 2 MB
```

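The manual `ls -lh` check at the end of the pipeline can be turned into an automated gate. The sketch below is one way to do that from a Rust build script or test; `check_wasm_size` and `MAX_WASM_BYTES` are hypothetical names, and "2 MB" is assumed here to mean 2 MiB (2,097,152 bytes).

```rust
use std::fs;
use std::path::Path;

/// Size ceiling for the `.wasm` artifact. Assumption: "2 MB" means 2 MiB.
pub const MAX_WASM_BYTES: u64 = 2 * 1024 * 1024;

/// Returns the file size in bytes if it is below `limit`,
/// or a human-readable error message otherwise.
pub fn check_wasm_size(path: &Path, limit: u64) -> Result<u64, String> {
    let size = fs::metadata(path)
        .map_err(|e| format!("cannot stat {}: {e}", path.display()))?
        .len();
    if size < limit {
        Ok(size)
    } else {
        Err(format!(
            "{} is {size} bytes, limit is {limit} bytes",
            path.display()
        ))
    }
}
```

A CI step could call this on `crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm` and fail the job when it returns `Err`.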
### 8. Future: wasm64 (Memory64 Proposal)

The WebAssembly Memory64 proposal extends the address space to 64 bits,
removing the 4 GB limitation. When this proposal reaches broad runtime support:

- Recompile `ruqu-wasm` targeting `wasm64-unknown-unknown`
- Lift the 25-qubit ceiling to match native limits
- Maintain backward compatibility with wasm32 via conditional compilation

**Current status**: Memory64 is at Phase 4 (standardized) in the WASM
specification process. Browser support is emerging but not yet universal.

**Migration path**:

```toml
# Future Cargo.toml
[features]
wasm64 = []  # Enable when targeting wasm64
```

```rust
// In code
#[cfg(feature = "wasm64")]
const MAX_QUBITS_WASM: usize = 30;

#[cfg(not(feature = "wasm64"))]
const MAX_QUBITS_WASM: usize = 25;
```

## Trade-offs Accepted

| Trade-off | Accepted Limitation | Justification |
|-----------|---------------------|---------------|
| Performance | ~1.5-3x slower than native | Universal deployment outweighs raw speed |
| Qubit ceiling | 25 qubits in WASM vs 30+ native | Sufficient for most educational and research workloads |
| Threading | Requires specific browser headers | Graceful fallback ensures always-works baseline |
| No GPU | CPU-only in browser | GPU benefit at the 25-qubit ceiling is uncertain |
| Binary size | ~1.35 MB module | Acceptable for a quantum simulation library |

## Consequences

### Positive

- **Universal deployment**: Any modern browser or WASM runtime can execute
  quantum simulations without installation
- **Security sandboxing**: WASM's memory isolation prevents quantum simulation
  code from accessing host resources
- **Edge-aligned**: Matches ruVector's philosophy of computation at the edge
- **Testable**: WASM builds can be tested in CI via headless browsers and
  wasm-bindgen-test
- **Progressive enhancement**: Single-threaded baseline with optional threading
  ensures broad compatibility

### Negative

- **Performance ceiling**: JIT overhead and narrower SIMD limit throughput
- **Memory limits**: 25-qubit hard ceiling until wasm64 adoption
- **Threading complexity**: SharedArrayBuffer requirement adds deployment
  configuration burden
- **Debugging difficulty**: WASM debugging tools are less mature than native
  debuggers

### Mitigations

| Issue | Mitigation |
|-------|------------|
| Performance gap | Document native vs WASM trade-offs; recommend native for >20 qubits |
| Memory exhaustion | Hard limit enforcement with informative error messages |
| Threading failures | Automatic fallback to single-threaded; no silent degradation |
| Debug difficulty | Source maps via wasm-pack; comprehensive logging to console |
| Binary size creep | CI size gate: fail build if .wasm exceeds 2 MB |

## References

- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
- [ruvector-router-wasm crate](/crates/ruvector-router-wasm/)
- [WebAssembly SIMD Proposal](https://github.com/WebAssembly/simd)
- [WebAssembly Memory64 Proposal](https://github.com/WebAssembly/memory64)
- [wasm-bindgen-rayon](https://github.com/RReverser/wasm-bindgen-rayon)
- [Cross-Origin Isolation Guide (MDN)](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated)