Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
docs/adr/quantum-engine/ADR-QE-003-wasm-compilation-strategy.md
@@ -0,0 +1,459 @@
# ADR-QE-003: WebAssembly Compilation Strategy

**Status**: Proposed
**Date**: 2026-02-06
**Authors**: ruv.io, RuVector Team
**Deciders**: Architecture Review Board

## Context

### Problem Statement

ruVector targets browsers, embedded/edge runtimes, and IoT devices via
WebAssembly. The quantum simulation engine must compile to
`wasm32-unknown-unknown` and run correctly in these constrained environments.
WASM introduces fundamental constraints that differ significantly from native
execution and must be addressed at the architectural level rather than
worked around at runtime.

### WASM Execution Environment Constraints

| Constraint | Detail | Impact on Quantum Simulation |
|------------|--------|------------------------------|
| 32-bit address space | ~4 GB theoretical max, ~2 GB practical | Hard ceiling on state vector size |
| Memory model | Linear memory, grows in 64 KB pages | Allocation must be page-aware |
| No native threads | Web Workers required for parallelism | Requires SharedArrayBuffer + COOP/COEP headers |
| No direct GPU | WebGPU is separate API, not WASM-native | GPU acceleration unavailable in WASM path |
| No OS syscalls | Sandboxed execution, no file/network | All I/O must go through host bindings |
| JIT compilation | V8/SpiderMonkey JIT, not AOT | ~1.5-3x slower than native, variable warmup |
| SIMD support | 128-bit SIMD proposal (widely supported since 2021) | 4 f32 or 2 f64 lanes per 128-bit vector |
| Stack size | Default ~1 MB, configurable | Deep recursion limited |

### Memory Budget Analysis for Quantum Simulation

The critical constraint is WASM's 32-bit address space. With a practical
usable limit of approximately 2 GB (due to browser memory allocation
behavior and address space fragmentation), the maximum feasible state vector
size is bounded. An n-qubit state vector holds 2^n complex amplitudes, so the
sizes below assume complex f64 at 16 bytes per amplitude:

```
Available WASM Memory Budget:

  Total addressable:       4,294,967,296 bytes  (4 GB theoretical)
  Practical usable:       ~2,147,483,648 bytes  (2 GB, browser-dependent)
  WASM overhead:            ~100,000,000 bytes  (module, stack, heap metadata)
  Application overhead:      ~50,000,000 bytes  (circuit data, scratch buffers)
  -------------------------------------------------
  Available for state:    ~2,000,000,000 bytes  (1.86 GB)

State vector sizes:
  24 qubits:       268,435,456 bytes  (256 MB)  -- comfortable
  25 qubits:       536,870,912 bytes  (512 MB)  -- feasible
  25 + scratch: ~1,073,741,824 bytes            -- tight but within budget
  26 qubits:     1,073,741,824 bytes  (1 GB)    -- state alone, no scratch room
  27 qubits:     2,147,483,648 bytes  (2 GB)    -- exceeds practical limit
```
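
The arithmetic behind the budget can be sketched in a few lines of plain Rust. This is only an illustration, not code from `ruqu-wasm`: the function names are hypothetical, and it assumes that the scratch buffer is the same size as the state vector, which is how the "25 + scratch" row reads. It also shows the page count a state vector occupies, given the 64 KB page size listed in the constraints table.

```rust
/// Size of one WebAssembly linear-memory page (64 KiB).
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Bytes needed for a dense state vector of `qubits` qubits.
/// Returns `None` if the size does not fit in a `u64`.
fn state_vector_bytes(qubits: u32, bytes_per_amplitude: u64) -> Option<u64> {
    1u64.checked_shl(qubits)?.checked_mul(bytes_per_amplitude)
}

/// Number of 64 KiB pages required to hold `bytes`, rounded up.
fn pages_needed(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES
}

/// Largest qubit count whose state vector plus an equally sized scratch
/// buffer (an assumption of this sketch) fits within `budget_bytes`.
fn max_qubits_for_budget(budget_bytes: u64, bytes_per_amplitude: u64) -> u32 {
    let mut qubits = 0;
    while let Some(state) = state_vector_bytes(qubits + 1, bytes_per_amplitude) {
        match state.checked_mul(2) {
            Some(total) if total <= budget_bytes => qubits += 1,
            _ => break,
        }
    }
    qubits
}
```

With the roughly 2,000,000,000-byte budget above, `max_qubits_for_budget` yields 25 for 16-byte amplitudes and 26 for 8-byte amplitudes, and a 25-qubit f64 state vector occupies 8,192 pages.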

### Existing WASM Patterns in ruVector

The `ruvector-router-wasm` crate establishes conventions for WASM compilation:

- `wasm-pack build` as the compilation tool
- `wasm-bindgen` for JavaScript interop
- TypeScript definition generation
- Feature-flag controlled inclusion/exclusion of capabilities
- Dedicated test suites using `wasm-bindgen-test`

## Decision

### 1. Target and Toolchain

**Target triple**: `wasm32-unknown-unknown`

**Build toolchain**: `wasm-pack` with `wasm-bindgen`

```bash
# Development build
wasm-pack build crates/ruqu-wasm --target web --dev

# Release build with size optimization
wasm-pack build crates/ruqu-wasm --target web --release

# Node.js target (for server-side WASM)
wasm-pack build crates/ruqu-wasm --target nodejs --release
```

**Cargo profile for WASM release**:

```toml
[profile.wasm-release]
inherits = "release"
opt-level = "z"       # Optimize for binary size
lto = true            # Link-time optimization
codegen-units = 1     # Single codegen unit for maximum optimization
strip = true          # Strip debug symbols
panic = "abort"       # Smaller panic handling
```

### 2. Memory Limit Enforcement

`ruqu-wasm` enforces qubit limits before any allocation occurs. This is a hard
gate, not a soft warning.

**Enforcement strategy**:

```
User requests N qubits
        |
        v
   [N <= 25?] ---NO---> Return WasmLimitError {
        |                 requested: N,
       YES                maximum: 25,
        |                 estimated_memory: 16 * 2^N,
        v                 suggestion: "Use native build for >25 qubits"
 [Estimate total        }
  memory needed]
        |
        v
  [< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
        |
       YES
        |
        v
Proceed with allocation
```

**Qubit limits by precision**:

| Precision | Max Qubits (WASM) | State Size | With Scratch |
|-----------|--------------------|------------|--------------|
| Complex f64 (default) | 25 | 512 MB | ~1.07 GB |
| Complex f32 (optional) | 26 | 512 MB | ~1.07 GB |

**Error reporting**:

```rust
use wasm_bindgen::prelude::*;

// `getter_with_clone` is required because `message` is a public,
// non-`Copy` field.
#[wasm_bindgen(getter_with_clone)]
#[derive(Debug, Clone)]
pub struct WasmLimitError {
    pub requested_qubits: usize,
    pub maximum_qubits: usize,
    /// `u64` (a BigInt in JS): the estimate can exceed the 32-bit `usize`
    /// of wasm32, for example 28 qubits at 16 bytes per amplitude.
    pub estimated_bytes: u64,
    pub message: String,
}

impl WasmLimitError {
    pub fn qubit_overflow(requested: usize) -> Self {
        let max = if cfg!(feature = "f32") { 26 } else { 25 };
        let bytes_per_amplitude: u64 = if cfg!(feature = "f32") { 8 } else { 16 };
        // Checked arithmetic: a plain `1usize << requested` overflows on
        // wasm32 for large requests. Saturate at `u64::MAX` instead.
        let estimated_bytes = u32::try_from(requested)
            .ok()
            .and_then(|n| 1u64.checked_shl(n))
            .and_then(|amplitudes| amplitudes.checked_mul(bytes_per_amplitude))
            .unwrap_or(u64::MAX);
        Self {
            requested_qubits: requested,
            maximum_qubits: max,
            estimated_bytes,
            message: format!(
                "Cannot simulate {} qubits in WASM: requires {} bytes, \
                 exceeds WASM address space. Maximum: {} qubits. \
                 Use native build for larger simulations.",
                requested, estimated_bytes, max
            ),
        }
    }
}
```
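
The two-stage gate in the enforcement flowchart can be sketched without any WASM dependencies. This is a hedged illustration rather than the crate's implementation: `LimitError`, `check_request`, and the `extra_bytes` parameter are hypothetical names, the 1.5 GB threshold is read as a decimal value, and the scratch buffer is assumed to equal the state vector in size.

```rust
/// Illustrative error type mirroring the two rejection paths in the flowchart.
#[derive(Debug, PartialEq)]
enum LimitError {
    QubitOverflow { requested: u32, maximum: u32 },
    InsufficientMemory { required_bytes: u64, limit_bytes: u64 },
}

const MAX_QUBITS: u32 = 25; // complex f64 ceiling from the table above
const BYTES_PER_AMPLITUDE: u64 = 16; // complex f64
const MEMORY_LIMIT_BYTES: u64 = 1_500_000_000; // "1.5 GB", read as decimal (assumption)

/// Returns the estimated total bytes when the request passes both checks.
/// `extra_bytes` stands in for circuit data and other per-run overhead.
fn check_request(qubits: u32, extra_bytes: u64) -> Result<u64, LimitError> {
    // Stage 1: hard qubit ceiling, checked before any size arithmetic.
    if qubits > MAX_QUBITS {
        return Err(LimitError::QubitOverflow {
            requested: qubits,
            maximum: MAX_QUBITS,
        });
    }
    // Stage 2: estimate state + scratch + overhead. The shift cannot
    // overflow because `qubits <= 25` at this point.
    let state_bytes = BYTES_PER_AMPLITUDE << qubits;
    let total = state_bytes.saturating_mul(2).saturating_add(extra_bytes);
    if total >= MEMORY_LIMIT_BYTES {
        return Err(LimitError::InsufficientMemory {
            required_bytes: total,
            limit_bytes: MEMORY_LIMIT_BYTES,
        });
    }
    Ok(total)
}
```

Under these assumptions a 25-qubit request with no extra overhead passes with an estimate of 1,073,741,824 bytes, a 26-qubit request is rejected at the first stage, and a 25-qubit request carrying 500 MB of additional data is rejected at the second.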

### 3. Threading Strategy

WASM multi-threading requires SharedArrayBuffer, which in turn requires
specific HTTP security headers (Cross-Origin-Opener-Policy and
Cross-Origin-Embedder-Policy). Not all deployment environments support these.

**Strategy**: Optional multi-threading with graceful fallback.

```
        ruqu-wasm execution
                |
                v
       [SharedArrayBuffer
          available?]
          /          \
        YES           NO
        /              \
[wasm-bindgen-rayon    [single-threaded
 parallel execution]    execution]
        |                   |
Split state vector     Sequential gate
across Web Workers     application
        |                   |
        v                   v
 Fast (N cores)        Slower (1 core)
```

**Compile-time configuration**:

```toml
# In ruqu-wasm/Cargo.toml
[features]
default = []
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]
```

**Runtime detection**:

```rust
use js_sys::Reflect;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn threading_available() -> bool {
    // Check if SharedArrayBuffer is available in this environment.
    // A property lookup on the global object avoids `eval`, which a strict
    // Content-Security-Policy would block.
    Reflect::has(&js_sys::global(), &JsValue::from_str("SharedArrayBuffer"))
        .unwrap_or(false)
}
```

**Required HTTP headers for threading**:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
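
The "split state vector across Web Workers" step, together with the single-threaded fallback, amounts to choosing a set of index ranges. The sketch below shows one way to do that in plain Rust. `partition` and `plan_chunks` are hypothetical helpers, not part of `wasm-bindgen-rayon` or `ruqu-core`, and a real gate kernel would also need chunk boundaries that keep each amplitude pair inside one chunk.

```rust
use std::ops::Range;

/// Split `len` amplitudes into at most `workers` contiguous ranges whose
/// sizes differ by at most one element.
fn partition(len: usize, workers: usize) -> Vec<Range<usize>> {
    // Never create more chunks than elements, and always at least one.
    let workers = workers.max(1).min(len.max(1));
    let base = len / workers;
    let extra = len % workers;
    let mut chunks = Vec::with_capacity(workers);
    let mut start = 0;
    for w in 0..workers {
        // The first `extra` chunks take one additional element.
        let size = base + usize::from(w < extra);
        chunks.push(start..start + size);
        start += size;
    }
    chunks
}

/// Pick the chunk layout: one chunk per core when threading is available,
/// otherwise a single chunk covering the whole state vector.
fn plan_chunks(len: usize, threading_available: bool, cores: usize) -> Vec<Range<usize>> {
    partition(len, if threading_available { cores } else { 1 })
}
```

When threading is unavailable the plan degenerates to a single range, so the same gate-application loop can serve both paths.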

### 4. SIMD Utilization

The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers
and runtimes. The quantum engine uses SIMD for amplitude manipulation when
available.

**WASM SIMD capabilities**:

| Operation | WASM SIMD Instruction | Use in Quantum Sim |
|-----------|-----------------------|--------------------|
| f64x2 multiply | `f64x2.mul` | Complex multiplication (real part) |
| f64x2 add | `f64x2.add` | Amplitude accumulation |
| f64x2 sub | `f64x2.sub` | Complex multiplication (cross terms) |
| f64x2 shuffle | `i8x16.shuffle` (Rust intrinsic `i64x2_shuffle`) | Swapping real/imaginary parts |
| f32x4 multiply | `f32x4.mul` | f32 mode complex multiply |
| f32x4 fma | emulated | Fused multiply-add for accuracy |

**Conditional compilation**:

```rust
// In ruqu-core, WASM SIMD path
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod wasm_simd {
    use core::arch::wasm32::*;

    /// Complex product `u * z`, with `z` packed as (re, im) in one vector.
    #[inline(always)]
    fn complex_mul(u_re: f64, u_im: f64, z: v128) -> v128 {
        let z_swapped = i64x2_shuffle::<1, 0>(z, z); // (z_im, z_re)
        let re_terms = f64x2_mul(f64x2_splat(u_re), z); // (u_re*z_re, u_re*z_im)
        let im_terms = f64x2_mul(f64x2(-u_im, u_im), z_swapped); // (-u_im*z_im, u_im*z_re)
        f64x2_add(re_terms, im_terms)
    }

    /// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD.
    /// Returns `(c0_re, c0_im, c1_re, c1_im)`.
    #[inline(always)]
    pub fn apply_gate_2x2_simd(
        a_re: f64, a_im: f64,
        b_re: f64, b_im: f64,
        u00_re: f64, u00_im: f64,
        u01_re: f64, u01_im: f64,
        u10_re: f64, u10_im: f64,
        u11_re: f64, u11_im: f64,
    ) -> (f64, f64, f64, f64) {
        // Pack amplitude pair into SIMD lanes
        let a = f64x2(a_re, a_im);
        let b = f64x2(b_re, b_im);

        // Complex multiply-accumulate for output amplitudes
        //   c0 = u00*a + u01*b
        //   c1 = u10*a + u11*b
        let c0 = f64x2_add(complex_mul(u00_re, u00_im, a), complex_mul(u01_re, u01_im, b));
        let c1 = f64x2_add(complex_mul(u10_re, u10_im, a), complex_mul(u11_re, u11_im, b));

        (
            f64x2_extract_lane::<0>(c0),
            f64x2_extract_lane::<1>(c0),
            f64x2_extract_lane::<0>(c1),
            f64x2_extract_lane::<1>(c1),
        )
    }
}

// Fallback scalar path
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod scalar {
    // Pure scalar complex arithmetic
}
```
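
The scalar fallback module is left empty above. A minimal sketch of what it could contain follows, assuming a small hand-rolled complex type. `C64` and `apply_gate_2x2_scalar` are illustrative names rather than the crate's API, and the matrix is passed as a 2x2 array instead of eight separate scalars to keep the example short. The formulas are the same ones the SIMD path computes: `c0 = u00*a + u01*b` and `c1 = u10*a + u11*b`.

```rust
/// Minimal complex number for the sketch; a real build would more likely
/// reuse an existing complex type (assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

impl C64 {
    /// Complex product: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
    fn mul(self, o: C64) -> C64 {
        C64 {
            re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re,
        }
    }

    /// Component-wise complex sum.
    fn add(self, o: C64) -> C64 {
        C64 {
            re: self.re + o.re,
            im: self.im + o.im,
        }
    }
}

/// Apply the 2x2 matrix `u` (row-major) to the amplitude pair `(a, b)`:
/// c0 = u00*a + u01*b and c1 = u10*a + u11*b.
fn apply_gate_2x2_scalar(a: C64, b: C64, u: [[C64; 2]; 2]) -> (C64, C64) {
    let c0 = u[0][0].mul(a).add(u[0][1].mul(b));
    let c1 = u[1][0].mul(a).add(u[1][1].mul(b));
    (c0, c1)
}
```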

**Comparison of SIMD widths across targets**:

```
Native (AVX-512):  512-bit = 8 f64 = 4 complex f64 per instruction
Native (AVX2):     256-bit = 4 f64 = 2 complex f64 per instruction
Native (NEON):     128-bit = 2 f64 = 1 complex f64 per instruction
WASM SIMD:         128-bit = 2 f64 = 1 complex f64 per instruction
```

WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine
uses the same algorithmic structure as the NEON path, adapted for WASM SIMD
intrinsics.

### 5. No GPU in WASM

GPU acceleration is exclusively available in native builds. The WASM path
uses CPU-only simulation.

**Rationale**:
- WebGPU is a separate browser API, not accessible from WASM linear memory
- Bridging WASM to WebGPU would require complex JavaScript glue code
- WebGPU compute shader support varies across browsers
- The performance benefit is uncertain for the 25-qubit WASM ceiling

**Future consideration**: If WebGPU stabilizes and WASM-WebGPU interop matures,
a `ruqu-webgpu` crate could provide browser-side GPU acceleration. This is out
of scope for the initial release.

### 6. API Parity

`ruqu-wasm` exposes an API that is functionally identical to `ruqu-core` native.
The same circuit description produces the same measurement results (within
floating-point tolerance). Only performance and capacity differ.

**Parity guarantee**:

```
                 Same Circuit
                      |
         +------------+------------+
         |                         |
  ruqu-core (native)      ruqu-wasm (browser)
         |                         |
  - 30+ qubits             - 25 qubits max
  - AVX2/AVX-512 SIMD      - WASM SIMD128
  - Rayon threading        - Optional Web Workers
  - Optional GPU           - CPU only
  - ~17.5M gates/sec       - ~5-12M gates/sec
         |                         |
         +------------+------------+
                      |
                 Same Results
            (within fp tolerance)
```

**Verified by**: Shared test suite that runs against both native and WASM targets,
comparing outputs bitwise (for deterministic operations) or statistically (for
measurement sampling).
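
The two comparison modes could be expressed as small helper functions. The sketch below is an assumption about how such a shared test suite might compare results, not a description of the actual suite: `amplitudes_match` checks amplitudes element-wise against a tolerance, and `total_variation` measures how far apart two sampled measurement histograms are. Both names, and the choice of total variation distance as the statistic, are illustrative.

```rust
/// True when two amplitude lists, stored as (re, im) pairs, agree
/// element-wise within `tol`. A `tol` of 0.0 requires equal values.
fn amplitudes_match(native: &[(f64, f64)], wasm: &[(f64, f64)], tol: f64) -> bool {
    native.len() == wasm.len()
        && native
            .iter()
            .zip(wasm)
            .all(|(a, b)| (a.0 - b.0).abs() <= tol && (a.1 - b.1).abs() <= tol)
}

/// Total variation distance between two measurement histograms, given as
/// counts per basis state. 0.0 means identical empirical distributions,
/// 1.0 means completely disjoint ones.
fn total_variation(counts_a: &[u64], counts_b: &[u64]) -> f64 {
    assert_eq!(counts_a.len(), counts_b.len(), "histograms must cover the same outcomes");
    let total_a: u64 = counts_a.iter().sum();
    let total_b: u64 = counts_b.iter().sum();
    assert!(total_a > 0 && total_b > 0, "histograms must not be empty");
    let sum_abs_diff: f64 = counts_a
        .iter()
        .zip(counts_b)
        .map(|(&a, &b)| (a as f64 / total_a as f64 - b as f64 / total_b as f64).abs())
        .sum();
    0.5 * sum_abs_diff
}
```

A test would then assert `amplitudes_match(..)` for deterministic circuits and require `total_variation(..)` to stay below a threshold chosen for the shot count; that threshold is a tuning decision this sketch does not fix.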

### 7. Module Size Target

Target `.wasm` binary size: **< 2 MB** for the default feature set.

**Size budget**:

| Component | Estimated Size |
|-----------|---------------|
| Core simulation engine | ~800 KB |
| Gate implementations | ~200 KB |
| Measurement and sampling | ~100 KB |
| wasm-bindgen glue | ~50 KB |
| Circuit optimization | ~150 KB |
| Error handling and validation | ~50 KB |
| **Total (default features)** | **~1.35 MB** |
| + noise-model feature | +200 KB |
| + tensor-network feature | +400 KB |
| **Total (all features)** | **~1.95 MB** |

**Size reduction techniques**:
- `opt-level = "z"` for size-optimized compilation
- LTO (Link-Time Optimization) for dead code elimination
- `wasm-opt` post-processing pass (binaryen)
- Feature flags to exclude unused capabilities
- `panic = "abort"` to eliminate unwinding machinery
- Avoid `format!` and `std::fmt` where possible in hot paths

**Build pipeline**:

```bash
# Build with wasm-pack
wasm-pack build crates/ruqu-wasm --target web --release

# Post-process with wasm-opt for additional size reduction
wasm-opt -Oz --enable-simd \
  crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
  -o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm

# Verify size
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Expected: < 2 MB
```
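
The manual `ls -lh` check could be turned into an automated gate. The following is one possible sketch of such a check as a small Rust helper, for example called from a build script or a test. The function names are hypothetical, and the 2 MB limit is interpreted here as 2 MiB, which is an assumption; the gate passes when the file is at or below the limit.

```rust
use std::{fs, io, path::Path};

/// Size ceiling for the default-feature module; "2 MB" read as 2 MiB (assumption).
const SIZE_LIMIT_BYTES: u64 = 2 * 1024 * 1024;

/// Size of the file at `path` in bytes.
fn wasm_size_bytes(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// True when `size_bytes` does not exceed `limit_bytes`.
fn within_budget(size_bytes: u64, limit_bytes: u64) -> bool {
    size_bytes <= limit_bytes
}

/// Returns the module size on success, or a human-readable reason to fail
/// the build when the file is missing or too large.
fn size_gate(path: &Path) -> Result<u64, String> {
    let size = wasm_size_bytes(path).map_err(|e| format!("cannot stat {}: {e}", path.display()))?;
    if within_budget(size, SIZE_LIMIT_BYTES) {
        Ok(size)
    } else {
        Err(format!(
            "{} is {size} bytes, over the {SIZE_LIMIT_BYTES}-byte limit",
            path.display()
        ))
    }
}
```

A CI step would call `size_gate` on `crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm` after the `wasm-opt` pass and fail the job on `Err`.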

### 8. Future: wasm64 (Memory64 Proposal)

The WebAssembly Memory64 proposal extends the address space to 64 bits,
removing the 4 GB limitation. When this proposal reaches broad runtime support:

- Recompile `ruqu-wasm` targeting `wasm64-unknown-unknown`
- Lift the 25-qubit ceiling to match native limits
- Maintain backward compatibility with wasm32 via conditional compilation

**Current status**: Memory64 is at Phase 4 (standardized) in the WASM
specification process. Browser support is emerging but not yet universal.

**Migration path**:

```toml
# Future Cargo.toml
[features]
wasm64 = []  # Enable when targeting wasm64
```

```rust
// In code
#[cfg(feature = "wasm64")]
const MAX_QUBITS_WASM: usize = 30;

#[cfg(not(feature = "wasm64"))]
const MAX_QUBITS_WASM: usize = 25;
```

## Trade-offs Accepted

| Trade-off | Accepted Limitation | Justification |
|-----------|---------------------|---------------|
| Performance | ~1.5-3x slower than native | Universal deployment outweighs raw speed |
| Qubit ceiling | 25 qubits in WASM vs 30+ native | Sufficient for most educational and research workloads |
| Threading | Requires specific browser headers | Graceful fallback ensures always-works baseline |
| No GPU | CPU-only in browser | GPU simulation at 25 qubits shows minimal benefit |
| Binary size | ~1.35 MB module | Acceptable for a quantum simulation library |

## Consequences

### Positive

- **Universal deployment**: Any modern browser or WASM runtime can execute
  quantum simulations without installation
- **Security sandboxing**: WASM's memory isolation prevents quantum simulation
  code from accessing host resources
- **Edge-aligned**: Matches ruVector's philosophy of computation at the edge
- **Testable**: WASM builds can be tested in CI via headless browsers and
  wasm-bindgen-test
- **Progressive enhancement**: Single-threaded baseline with optional threading
  ensures broad compatibility

### Negative

- **Performance ceiling**: JIT overhead and narrower SIMD limit throughput
- **Memory limits**: 25-qubit hard ceiling until wasm64 adoption
- **Threading complexity**: SharedArrayBuffer requirement adds deployment
  configuration burden
- **Debugging difficulty**: WASM debugging tools are less mature than native
  debuggers

### Mitigations

| Issue | Mitigation |
|-------|------------|
| Performance gap | Document native vs WASM trade-offs; recommend native for >20 qubits |
| Memory exhaustion | Hard limit enforcement with informative error messages |
| Threading failures | Automatic fallback to single-threaded; no silent degradation |
| Debug difficulty | Source maps via wasm-pack; comprehensive logging to console |
| Binary size creep | CI size gate: fail build if .wasm exceeds 2 MB |

## References

- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
- [ruvector-router-wasm crate](/crates/ruvector-router-wasm/)
- [WebAssembly SIMD Proposal](https://github.com/WebAssembly/simd)
- [WebAssembly Memory64 Proposal](https://github.com/WebAssembly/memory64)
- [wasm-bindgen-rayon](https://github.com/RReverser/wasm-bindgen-rayon)
- [Cross-Origin Isolation Guide (MDN)](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated)