# ruvector-mincut-gated-transformer-wasm

WebAssembly bindings for the mincut-gated transformer - ultra-low-latency inference with coherence control.

## Overview

This crate provides JavaScript-friendly WASM bindings for the `ruvector-mincut-gated-transformer` crate, enabling browser-based transformer inference with deterministic latency bounds and explainable decision making.

## Features

- **Zero-copy inference**: Direct memory access from JavaScript
- **Deterministic bounds**: Predictable p99 latency guarantees
- **Explainable decisions**: Every inference produces a witness
- **Coherence control**: Integration with dynamic minimum cut signals
- **Event-driven scheduling**: Optional spike-based compute tier selection

## Installation

### NPM

```bash
npm install ruvector-mincut-gated-transformer-wasm
```

### Build from source

```bash
wasm-pack build --target web
```

## Quick Start

```javascript
import init, { WasmTransformer, WasmGatePacket } from './pkg';

async function run() {
  await init();

  // Create transformer with micro config (optimized for WASM)
  const transformer = new WasmTransformer();

  // Create gate packet from coherence signals
  const gate = new WasmGatePacket();
  gate.lambda = 100;
  gate.lambda_prev = 95;
  gate.boundary_edges = 5;
  gate.boundary_concentration_q15 = 8192;
  gate.partition_count = 3;

  // Run inference
  const tokens = new Uint32Array([1, 2, 3, 4]);
  const result = transformer.infer(tokens, gate);

  console.log('Decision:', result.decision);
  console.log('Reason:', result.reason);
  console.log('Tier:', result.tier);
  console.log('KV writes enabled:', result.kv_writes_enabled);
  console.log('External writes enabled:', result.external_writes_enabled);
  console.log('Logits:', result.logits);
}

run();
```

## API Reference

### WasmTransformer

Main transformer class for inference.

#### Constructor

```javascript
const transformer = new WasmTransformer();
```

Creates a transformer with micro config (sequence length: 32, hidden: 128, heads: 4, layers: 2).

#### Methods

- `new_baseline()`: Static constructor; creates a transformer with the baseline config (larger model)
- `with_config(config)`: Static constructor; creates a transformer with a custom configuration
- `infer(tokens, gate)`: Run inference with gate packet
- `infer_with_spikes(tokens, gate, spikes)`: Run inference with gate and spike packets
- `reset()`: Reset all state (KV cache, cached logits)
- `buffer_size()`: Get logits buffer size
- `set_policy(policy)`: Update gate policy

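
As a rough sketch of how `infer` and `reset` might be combined, the helper below runs several unrelated token sequences through one transformer and clears state between them. `processSequences` is a hypothetical name, not part of the crate; it only assumes an object exposing the `infer(tokens, gate)` and `reset()` methods listed above.

```javascript
// Hypothetical helper (not part of the crate): runs independent token
// sequences through one transformer, resetting state after each one so the
// KV cache from one sequence does not carry over into the next.
function processSequences(transformer, gate, sequences) {
  const decisions = [];
  for (const tokens of sequences) {
    const result = transformer.infer(tokens, gate);
    decisions.push(result.decision);
    transformer.reset(); // clears KV cache and cached logits
  }
  return decisions;
}
```

If the sequences are continuations of one another, the `reset()` call should probably be dropped so that cached state is reused.
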
### WasmGatePacket

Gate packet carrying coherence control signals.

#### Constructor

```javascript
const gate = new WasmGatePacket();
```

#### Properties

- `lambda`: Current coherence metric (minimum cut value)
- `lambda_prev`: Previous lambda for trend detection
- `boundary_edges`: Number of edges crossing partition boundaries
- `boundary_concentration_q15`: Boundary concentration (Q15: 0-32767)
- `partition_count`: Number of partitions in graph
- `flags`: Policy flags (force safe mode, etc.)

### WasmSpikePacket

Spike packet for event-driven scheduling.

#### Constructor

```javascript
const spike = new WasmSpikePacket();
```

#### Properties

- `fired`: Spike fired indicator (0 = skip, 1 = active)
- `rate_q15`: Spike rate (Q15: 0-32767)
- `novelty_q15`: Novelty metric (Q15: 0-32767)
- `flags`: Spike flags

### WasmInferResult

Inference result with logits and witness information.

#### Properties

- `logits`: Output logits (Int32Array)
- `decision`: Gate decision ("Allow", "ReduceScope", "FlushKv", "FreezeWrites", "QuarantineUpdates")
- `reason`: Decision reason ("None", "LambdaBelowMin", "LambdaDroppedFast", etc.)
- `tier`: Compute tier used (0-3)
- `kv_writes_enabled`: Whether KV writes were enabled
- `external_writes_enabled`: Whether external writes were enabled
- `effective_seq_len`: Effective sequence length used
- `effective_window`: Effective window size used
- `lambda`: Current lambda value
- `lambda_prev`: Previous lambda value
- `boundary_edges`: Boundary edges count
- `partition_count`: Partition count

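
Since `logits` is an `Int32Array`, picking the highest-scoring output index is a plain scan. The sketch below is illustrative only: `argmax` is a hypothetical helper, not part of the bindings, and how an index maps to a token or class depends on your model.

```javascript
// Hypothetical helper: index of the largest logit (greedy selection).
// Returns -1 for an empty array; ties resolve to the lowest index.
function argmax(logits) {
  let best = -1;
  for (let i = 0; i < logits.length; i++) {
    if (best === -1 || logits[i] > logits[best]) best = i;
  }
  return best;
}
```
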
## Configuration

### Micro Config (Default)

Optimized for WASM and edge gateways:

```javascript
{
  seq_len_max: 32,
  hidden: 128,
  heads: 4,
  layers: 2,
  window_normal: 8,
  window_degraded: 4,
  ffn_mult: 4,
  logits: 256
}
```

### Baseline Config

Larger model for more capacity:

```javascript
const transformer = WasmTransformer.new_baseline();
// seq_len_max: 64, hidden: 256, heads: 4, layers: 4, logits: 1024
```

### Custom Config

```javascript
const config = {
  seq_len_max: 32,
  hidden: 128,
  heads: 4,
  layers: 2,
  window_normal: 8,
  window_degraded: 4,
  ffn_mult: 4,
  logits: 256,
  layers_degraded: 1,
  seq_len_degraded: 16,
  seq_len_safe: 4,
  enable_kv_cache: true,
  enable_external_writes: true
};

const transformer = WasmTransformer.with_config(config);
```

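
The crate's actual validation rules are not documented here. As a sketch under assumptions, the checks below encode constraints that are typical for this kind of config (heads dividing the hidden size, degraded values not exceeding their normal counterparts). `checkConfig` is a hypothetical helper, and the crate may enforce different or additional rules.

```javascript
// Hypothetical pre-flight check. These rules are assumptions, not the
// crate's documented validation. Returns a list of problem descriptions.
function checkConfig(c) {
  const problems = [];
  if (c.hidden % c.heads !== 0) problems.push('hidden must be divisible by heads');
  if (c.window_degraded > c.window_normal) problems.push('window_degraded exceeds window_normal');
  if (c.seq_len_degraded > c.seq_len_max) problems.push('seq_len_degraded exceeds seq_len_max');
  if (c.seq_len_safe > c.seq_len_degraded) problems.push('seq_len_safe exceeds seq_len_degraded');
  if (c.layers_degraded > c.layers) problems.push('layers_degraded exceeds layers');
  return problems;
}
```

An empty array means none of the assumed rules were violated; it does not guarantee that `with_config` will accept the object.
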
## Gate Policy

Control when the gate intervenes:

```javascript
const policy = {
  lambda_min: 30,
  drop_ratio_q15_max: 12288, // ~37.5%
  boundary_edges_max: 20,
  boundary_concentration_q15_max: 20480, // ~62.5%
  partitions_max: 10,
  spike_rate_q15_max: 16384,
  spike_novelty_q15_min: 2048,
  allow_kv_write_when_unstable: true,
  allow_external_write_when_unstable: false
};

transformer.set_policy(policy);
```

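
The `_q15` thresholds are fixed-point fractions. The percentages in the policy comments are consistent with a scale of 32768 (12288 / 32768 = 0.375), with values capped at 32767. The helpers below are a sketch under that assumption; `toQ15` and `fromQ15` are hypothetical names, not part of the bindings.

```javascript
const Q15_SCALE = 32768; // assumed scale: 1.0 corresponds to 2^15
const Q15_MAX = 32767;   // largest value documented for Q15 fields

// Convert a fraction in [0, 1] to a Q15 integer, clamped to the valid range.
function toQ15(fraction) {
  const scaled = Math.round(fraction * Q15_SCALE);
  return Math.min(Q15_MAX, Math.max(0, scaled));
}

// Convert a Q15 integer back to a fraction.
function fromQ15(value) {
  return value / Q15_SCALE;
}
```

For example, `toQ15(0.375)` yields 12288 and `toQ15(0.625)` yields 20480, matching the two commented thresholds in the policy.
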
## Decision Types

### Gate Decisions

- **Allow**: Proceed normally with full capabilities
- **ReduceScope**: Reduce sequence length and window size
- **FlushKv**: Flush KV cache before proceeding
- **FreezeWrites**: Run in read-only mode (no KV updates)
- **QuarantineUpdates**: Run compute but discard all state changes

### Decision Reasons

- **None**: No intervention needed
- **LambdaBelowMin**: Lambda below minimum threshold
- **LambdaDroppedFast**: Lambda dropped too quickly
- **BoundarySpike**: Boundary edge count exceeded threshold
- **BoundaryConcentrationSpike**: Boundary concentration too high
- **PartitionDrift**: Partition count indicates drift
- **SpikeStorm**: Spike rate indicates overload
- **ForcedByFlag**: Forced by flag in gate packet

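
One way to use these strings in application code is a small lookup that turns a result into a log line. This is only a sketch: `DECISION_NOTES` and `summarize` are hypothetical names, the notes paraphrase the decision list above, and the function assumes only that `decision` and `reason` are strings as described under `WasmInferResult`.

```javascript
// Hypothetical lookup; the notes paraphrase the gate decision list.
const DECISION_NOTES = new Map([
  ['Allow', 'full capabilities'],
  ['ReduceScope', 'shorter sequence length and window'],
  ['FlushKv', 'KV cache flushed before proceeding'],
  ['FreezeWrites', 'read-only, no KV updates'],
  ['QuarantineUpdates', 'state changes discarded'],
]);

// Build a one-line message from the `decision` and `reason` strings.
function summarize(result) {
  const note = DECISION_NOTES.get(result.decision);
  if (note === undefined) return `unknown decision: ${result.decision}`;
  if (result.decision === 'Allow') return `Allow: ${note}`;
  return `${result.decision} (${result.reason}): ${note}`;
}
```
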
## Examples

### Basic Inference

```javascript
const transformer = new WasmTransformer();
const gate = new WasmGatePacket();
const tokens = new Uint32Array([1, 2, 3, 4]);
const result = transformer.infer(tokens, gate);
console.log(result.decision);
```

### With Spike Scheduling

```javascript
const transformer = new WasmTransformer();
const gate = new WasmGatePacket();
const spike = new WasmSpikePacket();
spike.fired = 1;
spike.novelty_q15 = 8192;

const tokens = new Uint32Array([1, 2, 3, 4]);
const result = transformer.infer_with_spikes(tokens, gate, spike);
```

### Handling Interventions

```javascript
const transformer = new WasmTransformer();
const gate = new WasmGatePacket();
gate.lambda = 10; // Low coherence
gate.lambda_prev = 100;

const tokens = new Uint32Array([1, 2, 3, 4]);
const result = transformer.infer(tokens, gate);

if (result.decision !== 'Allow') {
  console.log('Intervention triggered:', result.reason);
  console.log('Effective seq_len:', result.effective_seq_len);
  console.log('KV writes:', result.kv_writes_enabled);
}
```

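
To see why a packet with `lambda = 10` and `lambda_prev = 100` is likely to trigger an intervention, it can help to estimate the relative drop in lambda and compare it with `drop_ratio_q15_max`. The formula below is an assumption about how the drop ratio is measured (a Q15 fraction of `lambda_prev`); the crate's exact computation may differ, and `dropRatioQ15` is a hypothetical helper.

```javascript
// Assumed definition: (lambda_prev - lambda) / lambda_prev as a Q15 integer.
// Returns 0 when lambda did not fall or lambda_prev is not positive.
function dropRatioQ15(lambdaPrev, lambda) {
  if (lambdaPrev <= 0 || lambda >= lambdaPrev) return 0;
  const ratio = Math.floor(((lambdaPrev - lambda) * 32768) / lambdaPrev);
  return Math.min(32767, ratio); // keep within the documented Q15 range
}
```

Under this assumption, `dropRatioQ15(100, 10)` is 29491 (about 90%), well above the 12288 threshold in the Gate Policy example. That `lambda` is also below the example policy's `lambda_min` of 30, so which reason is reported depends on the order in which the crate evaluates its checks.
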
## Building

### Development

```bash
wasm-pack build --dev --target web
```

### Release (optimized)

```bash
wasm-pack build --release --target web
```

### For Node.js

```bash
wasm-pack build --target nodejs
```

### For Bundlers

```bash
wasm-pack build --target bundler
```

## Testing

### Browser tests

```bash
wasm-pack test --headless --firefox
wasm-pack test --headless --chrome
```

### Node.js tests

```bash
wasm-pack test --node
```

## Performance

The WASM bindings maintain the core performance characteristics:

- **Allocation-free hot path**: Zero heap allocations during inference
- **Predictable latency**: Bounded p99 latency guarantees
- **Small binary size**: ~50KB compressed (micro config)
- **Low memory footprint**: ~128KB runtime state (micro config)

## Integration with RuVector

This transformer integrates with the RuVector ecosystem:

- **ruvector-mincut**: Provides coherence signals via gate packets
- **ruvector-core**: Vector search and semantic retrieval
- **ruvector-router**: Query routing and orchestration

## License

MIT OR Apache-2.0

## Links

- [GitHub Repository](https://github.com/ruvnet/ruvector)
- [Core Library](../ruvector-mincut-gated-transformer)
- [RuVector Documentation](../../README.md)