Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/crates/ruvector-mincut-gated-transformer-wasm/README.md
+++ b/vendor/ruvector/crates/ruvector-mincut-gated-transformer-wasm/README.md
@@ -0,0 +1,344 @@
+# ruvector-mincut-gated-transformer-wasm
+
+WebAssembly bindings for the mincut-gated transformer: ultra-low-latency inference with coherence control.
+
+## Overview
+
+This crate provides JavaScript-friendly WASM bindings for the `ruvector-mincut-gated-transformer` crate, enabling browser-based transformer inference with deterministic latency bounds and explainable decision making.
+
+## Features
+
+- **Zero-copy inference**: Direct memory access from JavaScript
+- **Deterministic bounds**: Predictable p99 latency guarantees
+- **Explainable decisions**: Every inference produces a witness
+- **Coherence control**: Integration with dynamic minimum cut signals
+- **Event-driven scheduling**: Optional spike-based compute tier selection
+
+## Installation
+
+### NPM
+
+```bash
+npm install ruvector-mincut-gated-transformer-wasm
+```
+
+### Build from source
+
+```bash
+wasm-pack build --target web
+```
+
+## Quick Start
+
+```javascript
+import init, { WasmTransformer, WasmGatePacket } from './pkg';
+
+async function run() {
+  await init();
+
+  // Create transformer with micro config (optimized for WASM)
+  const transformer = new WasmTransformer();
+
+  // Create gate packet from coherence signals
+  const gate = new WasmGatePacket();
+  gate.lambda = 100;
+  gate.lambda_prev = 95;
+  gate.boundary_edges = 5;
+  gate.boundary_concentration_q15 = 8192;
+  gate.partition_count = 3;
+
+  // Run inference
+  const tokens = new Uint32Array([1, 2, 3, 4]);
+  const result = transformer.infer(tokens, gate);
+
+  console.log('Decision:', result.decision);
+  console.log('Reason:', result.reason);
+  console.log('Tier:', result.tier);
+  console.log('KV writes enabled:', result.kv_writes_enabled);
+  console.log('External writes enabled:', result.external_writes_enabled);
+  console.log('Logits:', result.logits);
+}
+
+run();
+```
+
+## API Reference
+
+### WasmTransformer
+
+Main transformer class for inference.
+
+#### Constructor
+
+```javascript
+const transformer = new WasmTransformer();
+```
+
+Creates a transformer with micro config (sequence length: 32, hidden: 128, heads: 4, layers: 2).
+
+#### Methods
+
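+- `WasmTransformer.new_baseline()`: Static factory; creates a transformer with the baseline config (larger model)
+- `WasmTransformer.with_config(config)`: Static factory; creates a transformer with a custom configuration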
+- `infer(tokens, gate)`: Run inference with gate packet
+- `infer_with_spikes(tokens, gate, spikes)`: Run inference with gate and spike packets
+- `reset()`: Reset all state (KV cache, cached logits)
+- `buffer_size()`: Get logits buffer size
+- `set_policy(policy)`: Update gate policy
+
+### WasmGatePacket
+
+Gate packet carrying coherence control signals.
+
+#### Constructor
+
+```javascript
+const gate = new WasmGatePacket();
+```
+
+#### Properties
+
+- `lambda`: Current coherence metric (minimum cut value)
+- `lambda_prev`: Previous lambda for trend detection
+- `boundary_edges`: Number of edges crossing partition boundaries
+- `boundary_concentration_q15`: Boundary concentration (Q15: 0-32767)
+- `partition_count`: Number of partitions in graph
+- `flags`: Policy flags (force safe mode, etc.)
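The `*_q15` fields use Q15 fixed point, where an integer `v` in 0-32767 stands for the fraction `v / 32768`. A minimal sketch of converting between the two representations follows; `toQ15` and `fromQ15` are hypothetical helper names and are not part of the crate's API.

```javascript
// Hypothetical helpers (not part of the WASM API) for Q15 fixed-point fields.
const Q15_ONE = 32768; // 1.0 in Q15; not itself representable
const Q15_MAX = 32767; // largest value documented for the packet fields

// Convert a fraction in [0, 1] to a Q15 integer, clamping out-of-range input.
function toQ15(fraction) {
  const scaled = Math.round(fraction * Q15_ONE);
  return Math.min(Q15_MAX, Math.max(0, scaled));
}

// Convert a Q15 integer back to a fraction.
function fromQ15(value) {
  return value / Q15_ONE;
}
```

For instance, `toQ15(0.25)` gives 8192, the `boundary_concentration_q15` value used in the Quick Start.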
+
+### WasmSpikePacket
+
+Spike packet for event-driven scheduling.
+
+#### Constructor
+
+```javascript
+const spike = new WasmSpikePacket();
+```
+
+#### Properties
+
+- `fired`: Spike fired indicator (0 = skip, 1 = active)
+- `rate_q15`: Spike rate (Q15: 0-32767)
+- `novelty_q15`: Novelty metric (Q15: 0-32767)
+- `flags`: Spike flags
+
+### WasmInferResult
+
+Inference result with logits and witness information.
+
+#### Properties
+
+- `logits`: Output logits (Int32Array)
+- `decision`: Gate decision ("Allow", "ReduceScope", "FlushKv", "FreezeWrites", "QuarantineUpdates")
+- `reason`: Decision reason ("None", "LambdaBelowMin", "LambdaDroppedFast", etc.)
+- `tier`: Compute tier used (0-3)
+- `kv_writes_enabled`: Whether KV writes were enabled
+- `external_writes_enabled`: Whether external writes were enabled
+- `effective_seq_len`: Effective sequence length used
+- `effective_window`: Effective window size used
+- `lambda`: Current lambda value
+- `lambda_prev`: Previous lambda value
+- `boundary_edges`: Boundary edges count
+- `partition_count`: Partition count
+
+## Configuration
+
+### Micro Config (Default)
+
+Optimized for WASM and edge gateways:
+
+```javascript
+{
+  seq_len_max: 32,
+  hidden: 128,
+  heads: 4,
+  layers: 2,
+  window_normal: 8,
+  window_degraded: 4,
+  ffn_mult: 4,
+  logits: 256
+}
+```
+
+### Baseline Config
+
+Larger model for more capacity:
+
+```javascript
+const transformer = WasmTransformer.new_baseline();
+// seq_len_max: 64, hidden: 256, heads: 4, layers: 4, logits: 1024
+```
+
+### Custom Config
+
+```javascript
+const config = {
+  seq_len_max: 32,
+  hidden: 128,
+  heads: 4,
+  layers: 2,
+  window_normal: 8,
+  window_degraded: 4,
+  ffn_mult: 4,
+  logits: 256,
+  layers_degraded: 1,
+  seq_len_degraded: 16,
+  seq_len_safe: 4,
+  enable_kv_cache: true,
+  enable_external_writes: true
+};
+
+const transformer = WasmTransformer.with_config(config);
+```
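When choosing custom values, it can help to check the widths they imply. The sketch below assumes the conventional transformer meaning of the fields (per-head width is `hidden / heads`, feed-forward width is `hidden * ffn_mult`); the crate may define or validate them differently, and `derivedDims` is a hypothetical helper.

```javascript
// Hypothetical helper: derive per-head and FFN widths from a config object,
// assuming the conventional meaning of `heads` and `ffn_mult`.
function derivedDims(config) {
  if (config.hidden % config.heads !== 0) {
    throw new RangeError('hidden must be divisible by heads');
  }
  return {
    head_dim: config.hidden / config.heads,
    ffn_hidden: config.hidden * config.ffn_mult,
  };
}
```

Under those assumptions, the micro config (`hidden: 128`, `heads: 4`, `ffn_mult: 4`) yields a head width of 32 and a feed-forward width of 512.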
+
+## Gate Policy
+
+Control when the gate intervenes:
+
+```javascript
+const policy = {
+  lambda_min: 30,
+  drop_ratio_q15_max: 12288,  // ~37.5%
+  boundary_edges_max: 20,
+  boundary_concentration_q15_max: 20480,  // ~62.5%
+  partitions_max: 10,
+  spike_rate_q15_max: 16384,
+  spike_novelty_q15_min: 2048,
+  allow_kv_write_when_unstable: true,
+  allow_external_write_when_unstable: false
+};
+
+transformer.set_policy(policy);
+```
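This README does not spell out how the drop ratio compared against `drop_ratio_q15_max` is computed. One plausible reading, shown here purely as an assumption, is the relative fall from `lambda_prev` to `lambda` expressed in Q15; `dropRatioQ15` is a hypothetical helper, not the crate's implementation.

```javascript
// Assumed formula (not confirmed by this README): relative drop of lambda,
// expressed in Q15. A rising or unchanged lambda counts as zero drop.
function dropRatioQ15(lambda, lambdaPrev) {
  if (lambdaPrev <= 0 || lambda >= lambdaPrev) return 0;
  const ratio = Math.floor(((lambdaPrev - lambda) * 32768) / lambdaPrev);
  return Math.min(32767, ratio); // clamp to the Q15 range
}
```

Under that assumption, `lambda = 10` with `lambda_prev = 100` gives 29491, well above a `drop_ratio_q15_max` of 12288, while the Quick Start values (`lambda = 100`, `lambda_prev = 95`) give 0 because lambda rose.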
+
+## Decision Types
+
+### Gate Decisions
+
+- **Allow**: Proceed normally with full capabilities
+- **ReduceScope**: Reduce sequence length and window size
+- **FlushKv**: Flush KV cache before proceeding
+- **FreezeWrites**: Run in read-only mode (no KV updates)
+- **QuarantineUpdates**: Run compute but discard all state changes
+
+### Decision Reasons
+
+- **None**: No intervention needed
+- **LambdaBelowMin**: Lambda below minimum threshold
+- **LambdaDroppedFast**: Lambda dropped too quickly
+- **BoundarySpike**: Boundary edge count exceeded threshold
+- **BoundaryConcentrationSpike**: Boundary concentration too high
+- **PartitionDrift**: Partition count indicates drift
+- **SpikeStorm**: Spike rate indicates overload
+- **ForcedByFlag**: Forced by flag in gate packet
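On the application side, the decision string can drive what the caller does with the result. The sketch below is a hypothetical mapping: the decision names are the documented ones, but the `degraded` and `persistState` flags are illustrative choices rather than anything the crate returns.

```javascript
// Hypothetical application-side mapping from a gate decision to caller behavior.
// `degraded`: the run used reduced context or restricted writes.
// `persistState`: whether the caller should treat state updates as kept.
function planForDecision(decision) {
  switch (decision) {
    case 'Allow':
      return { degraded: false, persistState: true };
    case 'ReduceScope':
    case 'FlushKv':
      return { degraded: true, persistState: true };
    case 'FreezeWrites':
    case 'QuarantineUpdates':
      return { degraded: true, persistState: false };
    default:
      // Fail loudly on strings this sketch does not know about.
      throw new Error(`Unknown gate decision: ${decision}`);
  }
}
```

A caller might use it as `planForDecision(result.decision)` and log `result.reason` whenever `degraded` is true.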
+
+## Examples
+
+### Basic Inference
+
+```javascript
+// Assumes `await init()` has already resolved (see Quick Start)
+const transformer = new WasmTransformer();
+const gate = new WasmGatePacket();
+const tokens = new Uint32Array([1, 2, 3, 4]);
+const result = transformer.infer(tokens, gate);
+console.log(result.decision);
+```
+
+### With Spike Scheduling
+
+```javascript
+const transformer = new WasmTransformer();
+const gate = new WasmGatePacket();
+const spike = new WasmSpikePacket();
+spike.fired = 1;
+spike.novelty_q15 = 8192;
+
+const tokens = new Uint32Array([1, 2, 3, 4]);
+const result = transformer.infer_with_spikes(tokens, gate, spike);
+```
+
+### Handling Interventions
+
+```javascript
+const transformer = new WasmTransformer();
+const gate = new WasmGatePacket();
+gate.lambda = 10;  // Low coherence
+gate.lambda_prev = 100;
+
+const tokens = new Uint32Array([1, 2, 3, 4]);
+const result = transformer.infer(tokens, gate);
+
+if (result.decision !== 'Allow') {
+  console.log('Intervention triggered:', result.reason);
+  console.log('Effective seq_len:', result.effective_seq_len);
+  console.log('KV writes:', result.kv_writes_enabled);
+}
+```
+
+## Building
+
+### Development
+
+```bash
+wasm-pack build --dev --target web
+```
+
+### Release (optimized)
+
+```bash
+wasm-pack build --release --target web
+```
+
+### For Node.js
+
+```bash
+wasm-pack build --target nodejs
+```
+
+### For Bundlers
+
+```bash
+wasm-pack build --target bundler
+```
+
+## Testing
+
+### Browser tests
+
+```bash
+wasm-pack test --headless --firefox
+wasm-pack test --headless --chrome
+```
+
+### Node.js tests
+
+```bash
+wasm-pack test --node
+```
+
+## Performance
+
+The WASM bindings preserve the performance characteristics of the core crate:
+
+- **Allocation-free hot path**: Zero heap allocations during inference
+- **Predictable latency**: Bounded p99 latency guarantees
+- **Small binary size**: ~50KB compressed (micro config)
+- **Low memory footprint**: ~128KB runtime state (micro config)
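To check the p99 figure in a given environment, latency samples can be collected around `infer` calls and summarized with a percentile. The sketch below uses the nearest-rank method; `percentile` is a hypothetical helper and the sample collection itself is left to the caller.

```javascript
// Nearest-rank percentile over latency samples (e.g. milliseconds).
// Copies the input so the caller's array is not reordered.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = Float64Array.from(samples).sort(); // typed arrays sort numerically
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

Samples could be gathered by recording `performance.now()` before and after each `transformer.infer(tokens, gate)` call and passing the differences to `percentile(samples, 99)`.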
+
+## Integration with RuVector
+
+This transformer integrates with the RuVector ecosystem:
+
+- **ruvector-mincut**: Provides coherence signals via gate packets
+- **ruvector-core**: Vector search and semantic retrieval
+- **ruvector-router**: Query routing and orchestration
+
+## License
+
+MIT OR Apache-2.0
+
+## Links
+
+- [GitHub Repository](https://github.com/ruvnet/ruvector)
+- [Core Library](../ruvector-mincut-gated-transformer)
+- [RuVector Documentation](../../README.md)