# ruvector-mincut-gated-transformer-wasm WebAssembly bindings for the mincut-gated transformer - ultra-low-latency inference with coherence control. ## Overview This crate provides JavaScript-friendly WASM bindings for the `ruvector-mincut-gated-transformer` crate, enabling browser-based transformer inference with deterministic latency bounds and explainable decision making. ## Features - **Zero-copy inference**: Direct memory access from JavaScript - **Deterministic bounds**: Predictable p99 latency guarantees - **Explainable decisions**: Every inference produces a witness - **Coherence control**: Integration with dynamic minimum cut signals - **Event-driven scheduling**: Optional spike-based compute tier selection ## Installation ### NPM ```bash npm install ruvector-mincut-gated-transformer-wasm ``` ### Build from source ```bash wasm-pack build --target web ``` ## Quick Start ```javascript import init, { WasmTransformer, WasmGatePacket } from './pkg'; async function run() { await init(); // Create transformer with micro config (optimized for WASM) const transformer = new WasmTransformer(); // Create gate packet from coherence signals const gate = new WasmGatePacket(); gate.lambda = 100; gate.lambda_prev = 95; gate.boundary_edges = 5; gate.boundary_concentration_q15 = 8192; gate.partition_count = 3; // Run inference const tokens = new Uint32Array([1, 2, 3, 4]); const result = transformer.infer(tokens, gate); console.log('Decision:', result.decision); console.log('Reason:', result.reason); console.log('Tier:', result.tier); console.log('KV writes enabled:', result.kv_writes_enabled); console.log('External writes enabled:', result.external_writes_enabled); console.log('Logits:', result.logits); } run(); ``` ## API Reference ### WasmTransformer Main transformer class for inference. #### Constructor ```javascript const transformer = new WasmTransformer(); ``` Creates a transformer with micro config (sequence length: 32, hidden: 128, heads: 4, layers: 2). #### Methods - `new_baseline()`: Create with baseline config (larger model) - `with_config(config)`: Create with custom configuration - `infer(tokens, gate)`: Run inference with gate packet - `infer_with_spikes(tokens, gate, spikes)`: Run inference with gate and spike packets - `reset()`: Reset all state (KV cache, cached logits) - `buffer_size()`: Get logits buffer size - `set_policy(policy)`: Update gate policy ### WasmGatePacket Gate packet carrying coherence control signals. #### Constructor ```javascript const gate = new WasmGatePacket(); ``` #### Properties - `lambda`: Current coherence metric (minimum cut value) - `lambda_prev`: Previous lambda for trend detection - `boundary_edges`: Number of edges crossing partition boundaries - `boundary_concentration_q15`: Boundary concentration (Q15: 0-32767) - `partition_count`: Number of partitions in graph - `flags`: Policy flags (force safe mode, etc.) ### WasmSpikePacket Spike packet for event-driven scheduling. #### Constructor ```javascript const spike = new WasmSpikePacket(); ``` #### Properties - `fired`: Spike fired indicator (0 = skip, 1 = active) - `rate_q15`: Spike rate (Q15: 0-32767) - `novelty_q15`: Novelty metric (Q15: 0-32767) - `flags`: Spike flags ### WasmInferResult Inference result with logits and witness information. #### Properties - `logits`: Output logits (Int32Array) - `decision`: Gate decision ("Allow", "ReduceScope", "FlushKv", "FreezeWrites", "QuarantineUpdates") - `reason`: Decision reason ("None", "LambdaBelowMin", "LambdaDroppedFast", etc.) - `tier`: Compute tier used (0-3) - `kv_writes_enabled`: Whether KV writes were enabled - `external_writes_enabled`: Whether external writes are enabled - `effective_seq_len`: Effective sequence length used - `effective_window`: Effective window size used - `lambda`: Current lambda value - `lambda_prev`: Previous lambda value - `boundary_edges`: Boundary edges count - `partition_count`: Partition count ## Configuration ### Micro Config (Default) Optimized for WASM and edge gateways: ```javascript { seq_len_max: 32, hidden: 128, heads: 4, layers: 2, window_normal: 8, window_degraded: 4, ffn_mult: 4, logits: 256 } ``` ### Baseline Config Larger model for more capacity: ```javascript const transformer = WasmTransformer.new_baseline(); // seq_len_max: 64, hidden: 256, heads: 4, layers: 4, logits: 1024 ``` ### Custom Config ```javascript const config = { seq_len_max: 32, hidden: 128, heads: 4, layers: 2, window_normal: 8, window_degraded: 4, ffn_mult: 4, logits: 256, layers_degraded: 1, seq_len_degraded: 16, seq_len_safe: 4, enable_kv_cache: true, enable_external_writes: true }; const transformer = WasmTransformer.with_config(config); ``` ## Gate Policy Control when the gate intervenes: ```javascript const policy = { lambda_min: 30, drop_ratio_q15_max: 12288, // ~37.5% boundary_edges_max: 20, boundary_concentration_q15_max: 20480, // ~62.5% partitions_max: 10, spike_rate_q15_max: 16384, spike_novelty_q15_min: 2048, allow_kv_write_when_unstable: true, allow_external_write_when_unstable: false }; transformer.set_policy(policy); ``` ## Decision Types ### Gate Decisions - **Allow**: Proceed normally with full capabilities - **ReduceScope**: Reduce sequence length and window size - **FlushKv**: Flush KV cache before proceeding - **FreezeWrites**: Run in read-only mode (no KV updates) - **QuarantineUpdates**: Run compute but discard all state changes ### Decision Reasons - **None**: No intervention needed - **LambdaBelowMin**: Lambda below minimum threshold - **LambdaDroppedFast**: Lambda dropped too quickly - **BoundarySpike**: Boundary edge count exceeded threshold - **BoundaryConcentrationSpike**: Boundary concentration too high - **PartitionDrift**: Partition count indicates drift - **SpikeStorm**: Spike rate indicates overload - **ForcedByFlag**: Forced by flag in gate packet ## Examples ### Basic Inference ```javascript const transformer = new WasmTransformer(); const gate = new WasmGatePacket(); const tokens = new Uint32Array([1, 2, 3, 4]); const result = transformer.infer(tokens, gate); console.log(result.decision); ``` ### With Spike Scheduling ```javascript const transformer = new WasmTransformer(); const gate = new WasmGatePacket(); const spike = new WasmSpikePacket(); spike.fired = 1; spike.novelty_q15 = 8192; const tokens = new Uint32Array([1, 2, 3, 4]); const result = transformer.infer_with_spikes(tokens, gate, spike); ``` ### Handling Interventions ```javascript const transformer = new WasmTransformer(); const gate = new WasmGatePacket(); gate.lambda = 10; // Low coherence gate.lambda_prev = 100; const tokens = new Uint32Array([1, 2, 3, 4]); const result = transformer.infer(tokens, gate); if (result.decision !== 'Allow') { console.log('Intervention triggered:', result.reason); console.log('Effective seq_len:', result.effective_seq_len); console.log('KV writes:', result.kv_writes_enabled); } ``` ## Building ### Development ```bash wasm-pack build --dev --target web ``` ### Release (optimized) ```bash wasm-pack build --release --target web ``` ### For Node.js ```bash wasm-pack build --target nodejs ``` ### For Bundlers ```bash wasm-pack build --target bundler ``` ## Testing ### Browser tests ```bash wasm-pack test --headless --firefox wasm-pack test --headless --chrome ``` ### Node.js tests ```bash wasm-pack test --node ``` ## Performance The WASM bindings maintain the core performance characteristics: - **Allocation-free hot path**: Zero heap allocations during inference - **Predictable latency**: Bounded p99 latency guarantees - **Small binary size**: ~50KB compressed (micro config) - **Low memory footprint**: ~128KB runtime state (micro config) ## Integration with RuVector This transformer integrates with the RuVector ecosystem: - **ruvector-mincut**: Provides coherence signals via gate packets - **ruvector-core**: Vector search and semantic retrieval - **ruvector-router**: Query routing and orchestration ## License MIT OR Apache-2.0 ## Links - [GitHub Repository](https://github.com/ruvnet/ruvector) - [Core Library](../ruvector-mincut-gated-transformer) - [RuVector Documentation](../../README.md)