# rvf-solver-wasm

Self-learning temporal reasoning engine compiled to WebAssembly -- Thompson Sampling, three-loop adaptive solver, and cryptographic witness chains in ~160 KB.

## Overview

`rvf-solver-wasm` compiles the complete AGI temporal puzzle solver to `wasm32-unknown-unknown` for use in browsers, Node.js, and edge runtimes. It is a `no_std + alloc` crate (same architecture as `rvf-wasm`) with a pure C ABI export surface -- no `wasm-bindgen` required.

The solver learns which solving strategy works best for each problem context using Thompson Sampling, compiles successful patterns into a signature cache, and proves its learning through a three-mode ablation test with SHAKE-256 witness chains.

### Key Design Choices

| Choice | Rationale |
|--------|-----------|
| **no_std + alloc** | Matches `rvf-wasm` pattern; runs in any WASM runtime |
| **Pure-integer `Date` type** | Howard Hinnant algorithm replaces `chrono`; no std required |
| **`BTreeMap` over `HashMap`** | Available in `alloc`; deterministic iteration order |
| **`libm` for float math** | `sqrt`, `log`, `cos`, `pow` -- pure Rust, no_std compatible |
| **xorshift64 RNG** | Deterministic, zero dependencies, identical to benchmarks RNG |
| **C ABI exports** | Maximum compatibility -- works with any WASM host |
| **Handle-based API** | Up to 8 concurrent solver instances |

## Build

```bash
# Build the WASM module
cargo build --target wasm32-unknown-unknown --release -p rvf-solver-wasm

# Optimize with wasm-opt (optional, ~80-100 KB output)
wasm-opt -Oz target/wasm32-unknown-unknown/release/rvf_solver_wasm.wasm \
  -o rvf_solver_wasm.opt.wasm
```

### Binary Size

| Build | Size |
|-------|------|
| Release (`wasm32-unknown-unknown`) | ~160 KB |
| After `wasm-opt -Oz` | ~80-100 KB |

## Architecture

### Three-Loop Adaptive Solver

The engine uses a three-loop architecture where each loop operates on a different timescale:

```
 Fast loop (per puzzle)          Medium loop (per batch)       Slow loop (per cycle)
 ┌──────────────────────┐       ┌──────────────────────┐     ┌──────────────────────┐
 │ Constraint propagation│──────▶│ PolicyKernel selects  │────▶│ ReasoningBank tracks │
 │ Range narrowing       │       │ skip mode via Thompson│     │ trajectories         │
 │ Date enumeration      │       │ Sampling (two-signal) │     │ KnowledgeCompiler    │
 │ Solution validation   │       │ Speculative dual-path │     │ compiles patterns    │
 └──────────────────────┘       └──────────────────────┘     │ Checkpoint/rollback  │
                                                              └──────────────────────┘
```

| Loop | Frequency | What it does |
|------|-----------|--------------|
| **Fast** | Every puzzle | Constraint propagation, range narrowing, date enumeration, solution check |
| **Medium** | Every puzzle | Thompson Sampling selects `None`/`Weekday`/`Hybrid` skip mode per context bucket |
| **Slow** | Per training cycle | ReasoningBank promotes successful trajectories; KnowledgeCompiler caches signatures |

### Five AGI Capabilities

| # | Capability | Description |
|---|-----------|-------------|
| 1 | **Thompson Sampling** | Two-signal model: Beta posterior for safety (correct + no early-commit) + EMA for cost |
| 2 | **18 Context Buckets** | 3 range (small/medium/large) x 3 distractor (clean/some/heavy) x 2 noise = 18 independent bandits |
| 3 | **Speculative Dual-Path** | When top-2 arms within delta 0.15 and variance > 0.02, speculatively execute secondary arm |
| 4 | **KnowledgeCompiler** | Constraint signature cache (`v1:{difficulty}:{sorted_types}`); compiled skip-mode, step budget, confidence |
| 5 | **Acceptance Test** | Multi-cycle training/holdout with A/B/C ablation and checkpoint/rollback on regression |

### Ablation Modes

| Mode | Compiler | Router | Purpose |
|------|----------|--------|---------|
| **A** (Baseline) | Off | Off | Fixed heuristic policy; establishes cost/accuracy baseline |
| **B** (Compiler) | On | Off | KnowledgeCompiler active; must show >= 15% cost decrease vs A |
| **C** (Full) | On | On | Thompson Sampling + speculation; must show robustness gain vs B |

## WASM Export Surface

### Memory Management (2 exports)

| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_solver_alloc` | `(size: i32) -> i32` | Allocate WASM memory; returns pointer or 0 |
| `rvf_solver_free` | `(ptr: i32, size: i32)` | Free previously allocated memory |

### Lifecycle (2 exports)

| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_solver_create` | `() -> i32` | Create solver instance; returns handle (>0) or -1 |
| `rvf_solver_destroy` | `(handle: i32) -> i32` | Destroy solver; returns 0 on success |

### Training (1 export)

| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_solver_train` | `(handle, count, min_diff, max_diff, seed_lo, seed_hi) -> i32` | Train on `count` generated puzzles using three-loop learning; returns correct count |

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `handle` | `i32` | Solver instance handle |
| `count` | `i32` | Number of puzzles to generate and solve |
| `min_diff` | `i32` | Minimum puzzle difficulty (1-10) |
| `max_diff` | `i32` | Maximum puzzle difficulty (1-10) |
| `seed_lo` | `i32` | Lower 32 bits of RNG seed |
| `seed_hi` | `i32` | Upper 32 bits of RNG seed |

### Acceptance Test (1 export)

| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_solver_acceptance` | `(handle, holdout, training, cycles, budget, seed_lo, seed_hi) -> i32` | Run full A/B/C ablation test; returns 1 = passed, 0 = failed, -1 = error |

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `handle` | `i32` | Solver instance handle |
| `holdout` | `i32` | Number of holdout puzzles per evaluation |
| `training` | `i32` | Training puzzles per cycle |
| `cycles` | `i32` | Number of training/evaluation cycles |
| `budget` | `i32` | Maximum steps per puzzle solve |
| `seed_lo` | `i32` | Lower 32 bits of RNG seed |
| `seed_hi` | `i32` | Upper 32 bits of RNG seed |

### Result / Policy / Witness Reads (6 exports)

| Export | Signature | Description |
|--------|-----------|-------------|
| `rvf_solver_result_len` | `(handle: i32) -> i32` | Byte length of last result JSON |
| `rvf_solver_result_read` | `(handle: i32, out_ptr: i32) -> i32` | Copy result JSON to `out_ptr`; returns bytes written |
| `rvf_solver_policy_len` | `(handle: i32) -> i32` | Byte length of policy state JSON |
| `rvf_solver_policy_read` | `(handle: i32, out_ptr: i32) -> i32` | Copy policy JSON to `out_ptr`; returns bytes written |
| `rvf_solver_witness_len` | `(handle: i32) -> i32` | Byte length of witness chain (73 bytes/entry) |
| `rvf_solver_witness_read` | `(handle: i32, out_ptr: i32) -> i32` | Copy raw witness chain to `out_ptr`; returns bytes written |

## Usage from JavaScript

### Node.js / Browser

```javascript
import { readFile } from 'fs/promises';

// Load WASM module
const wasmBytes = await readFile('rvf_solver_wasm.wasm');
const { instance } = await WebAssembly.instantiate(wasmBytes);
const wasm = instance.exports;

// Create a solver instance
const handle = wasm.rvf_solver_create();
console.log('Solver handle:', handle); // 1

// Train on 500 puzzles (difficulty 1-8, seed 42)
const correct = wasm.rvf_solver_train(handle, 500, 1, 8, 42, 0);
console.log(`Training: ${correct}/500 correct`);

// Run full acceptance test (A/B/C ablation)
const passed = wasm.rvf_solver_acceptance(
  handle,
  100,  // holdout puzzles
  100,  // training per cycle
  5,    // cycles
  400,  // step budget
  42, 0 // seed
);
console.log('Acceptance test:', passed === 1 ? 'PASSED' : 'FAILED');

// Read the result manifest (JSON)
const resultLen = wasm.rvf_solver_result_len(handle);
const resultPtr = wasm.rvf_solver_alloc(resultLen);
wasm.rvf_solver_result_read(handle, resultPtr);
const resultJson = new TextDecoder().decode(
  new Uint8Array(wasm.memory.buffer, resultPtr, resultLen)
);
const manifest = JSON.parse(resultJson);
console.log('Mode A accuracy:', manifest.mode_a.cycles.at(-1).accuracy);
console.log('Mode B accuracy:', manifest.mode_b.cycles.at(-1).accuracy);
console.log('Mode C accuracy:', manifest.mode_c.cycles.at(-1).accuracy);
wasm.rvf_solver_free(resultPtr, resultLen);

// Read policy state (Thompson Sampling internals)
const policyLen = wasm.rvf_solver_policy_len(handle);
const policyPtr = wasm.rvf_solver_alloc(policyLen);
wasm.rvf_solver_policy_read(handle, policyPtr);
const policyJson = new TextDecoder().decode(
  new Uint8Array(wasm.memory.buffer, policyPtr, policyLen)
);
const policy = JSON.parse(policyJson);
console.log('Context buckets:', Object.keys(policy.context_stats).length);
console.log('Early commit rate:', (policy.early_commits_wrong / policy.early_commits_total * 100).toFixed(1) + '%');
wasm.rvf_solver_free(policyPtr, policyLen);

// Read witness chain (verifiable by rvf-wasm)
const witnessLen = wasm.rvf_solver_witness_len(handle);
const witnessPtr = wasm.rvf_solver_alloc(witnessLen);
wasm.rvf_solver_witness_read(handle, witnessPtr);
const witnessChain = new Uint8Array(
  wasm.memory.buffer, witnessPtr, witnessLen
).slice(); // copy out of WASM memory
console.log('Witness entries:', witnessLen / 73);
wasm.rvf_solver_free(witnessPtr, witnessLen);

// Clean up
wasm.rvf_solver_destroy(handle);
```

### Verify Witness Chain with rvf-wasm

```javascript
// Load both WASM modules
const solver = await WebAssembly.instantiate(solverWasmBytes);
const verifier = await WebAssembly.instantiate(rvfWasmBytes);

// Run acceptance test in solver
const handle = solver.instance.exports.rvf_solver_create();
solver.instance.exports.rvf_solver_acceptance(handle, 100, 100, 5, 400, 42, 0);

// Extract witness chain
const wLen = solver.instance.exports.rvf_solver_witness_len(handle);
const wPtr = solver.instance.exports.rvf_solver_alloc(wLen);
solver.instance.exports.rvf_solver_witness_read(handle, wPtr);
const chain = new Uint8Array(solver.instance.exports.memory.buffer, wPtr, wLen).slice();

// Copy into verifier memory and verify
const vPtr = verifier.instance.exports.rvf_alloc(wLen);
new Uint8Array(verifier.instance.exports.memory.buffer, vPtr, wLen).set(chain);
const entryCount = verifier.instance.exports.rvf_witness_verify(vPtr, wLen);

if (entryCount > 0) {
  console.log(`Witness chain verified: ${entryCount} entries`);
} else {
  console.error('Witness chain verification failed:', entryCount);
  // -2 = truncated, -3 = hash mismatch
}

verifier.instance.exports.rvf_free(vPtr, wLen);
solver.instance.exports.rvf_solver_destroy(handle);
```

## Module Structure

```
crates/rvf/rvf-solver-wasm/
├── Cargo.toml           # no_std + alloc, dlmalloc, libm, serde_json
├── README.md            # This file
└── src/
    ├── lib.rs           # 12 WASM exports, instance registry, panic handler
    ├── alloc_setup.rs   # dlmalloc global allocator, rvf_solver_alloc/free
    ├── types.rs         # Date arithmetic, Constraint, Puzzle, Rng64
    ├── policy.rs        # PolicyKernel, Thompson Sampling, KnowledgeCompiler
    └── engine.rs        # AdaptiveSolver, ReasoningBank, PuzzleGenerator, acceptance test
```

| File | Lines | Purpose |
|------|-------|---------|
| `types.rs` | 239 | Pure-integer date math (Howard Hinnant algorithm), 10 constraint types, puzzle checking, xorshift64 RNG |
| `policy.rs` | 505 | Thompson Sampling two-signal model, Marsaglia gamma sampling, 18 context buckets, KnowledgeCompiler signature cache |
| `engine.rs` | 690 | Three-loop solver, constraint propagation, ReasoningBank trajectory tracking, PuzzleGenerator, acceptance test runner |
| `lib.rs` | 396 | 12 C ABI WASM exports, handle-based registry (8 slots), SHAKE-256 witness chain, panic handler |
| `alloc_setup.rs` | 45 | dlmalloc global allocator, `rvf_solver_alloc`/`rvf_solver_free` interop |

## Temporal Constraint Types

The solver handles 10 constraint types for temporal puzzle solving:

| Constraint | Example | Description |
|------------|---------|-------------|
| `Exact(date)` | `2025-03-15` | Must be this exact date |
| `After(date)` | `> 2025-01-01` | Must be strictly after date |
| `Before(date)` | `< 2025-12-31` | Must be strictly before date |
| `Between(a, b)` | `2025-01-01..2025-06-30` | Must fall within range (inclusive) |
| `DayOfWeek(w)` | `Monday` | Must fall on this weekday |
| `DaysAfter(ref, n)` | `5 days after "meeting"` | Relative to named reference date |
| `DaysBefore(ref, n)` | `3 days before "deadline"` | Relative to named reference date |
| `InMonth(m)` | `March` | Must be in this month |
| `InYear(y)` | `2025` | Must be in this year |
| `DayOfMonth(d)` | `15th` | Must be this day of month |

## Thompson Sampling Details

### Two-Signal Model

Each skip-mode arm (`None`, `Weekday`, `Hybrid`) maintains two signals per context bucket:

| Signal | Distribution | Update Rule |
|--------|-------------|-------------|
| **Safety** | Beta(alpha, beta) | alpha += 1 on correct & no early-commit; beta += 1 on failure, beta += 1.5 on early-commit wrong |
| **Cost** | EMA (alpha = 0.1) | Normalized step count (steps / 200), exponentially weighted |

**Composite score:** `sample_beta(alpha, beta) - 0.3 * cost_ema`

### Context Bucketing

| Dimension | Levels | Thresholds |
|-----------|--------|------------|
| **Range** | small, medium, large | 0-60, 61-180, 181+ days |
| **Distractors** | clean, some, heavy | 0, 1, 2+ duplicate constraint types |
| **Noise** | clean, noisy | Whether puzzle has injected noise |

Total: 3 x 3 x 2 = **18 independent bandit contexts**

### Speculative Dual-Path

When the top-2 arms are within delta 0.15 of each other and the leading arm's variance exceeds 0.02, the solver speculatively executes the secondary arm. This accelerates convergence in uncertain contexts.

## Integration with RVF Ecosystem

```
┌──────────────────────┐         ┌──────────────────────┐
│   rvf-solver-wasm    │         │     rvf-wasm         │
│   (self-learning     │ ──────▶ │   (verification)     │
│    AGI engine)       │ witness │                      │
│                      │ chain   │ rvf_witness_verify   │
│ rvf_solver_train     │         │ rvf_witness_count    │
│ rvf_solver_acceptance│         │                      │
│ rvf_solver_witness_* │         │ rvf_store_*          │
└──────────┬───────────┘         └──────────────────────┘
           │ uses
    ┌──────▼──────┐
    │  rvf-crypto  │
    │  SHAKE-256   │
    │  witness     │
    │  chain       │
    └─────────────┘
```

- **rvf-solver-wasm** produces witness chains via `rvf-crypto::create_witness_chain`
- **rvf-wasm** verifies those chains via `rvf_witness_verify` (73 bytes per entry)
- Both modules run in the browser -- no backend required

## Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| `rvf-types` | 0.1.0 | Shared RVF type definitions |
| `rvf-crypto` | 0.1.0 | SHAKE-256 hashing and witness chain creation |
| `dlmalloc` | 0.2 | Global allocator for WASM heap |
| `libm` | 0.2 | `no_std` float math (`sqrt`, `log`, `cos`, `pow`) |
| `serde` | 1.0 | Serialization (no_std, alloc features) |
| `serde_json` | 1.0 | JSON output for result/policy manifests (no_std, alloc) |

## Determinism

Given identical seeds, the WASM module produces identical results:

- Same seed produces same puzzles (xorshift64 RNG)
- Same puzzles produce same learning trajectory
- Same trajectory produces same witness chain hashes

Minor float precision differences between native and WASM (due to `libm` vs std `f64` methods) may cause Thompson Sampling to diverge over many iterations, but acceptance test outcomes should converge.

## Benchmarks

Run the native reference benchmark:

```bash
cargo run --bin wasm-solver-bench -- --holdout 50 --training 50 --cycles 3
```

Reference results (native):

| Mode | Accuracy | Cost/Solve | Noise Accuracy | Pass |
|------|----------|------------|----------------|------|
| A (baseline) | 100% | ~43 | ~100% | PASS |
| B (compiler) | 100% | ~10 | ~100% | PASS |
| C (learned) | 100% | ~10 | ~100% | PASS |

- B vs A cost decrease: ~76% (threshold: >= 15%)
- Thompson Sampling converges across 13+ context buckets with 3 unique skip modes

## Related ADRs

- [ADR-032](../../../docs/adr/ADR-032-rvf-wasm-integration.md) -- RVF WASM integration
- [ADR-037](../../../docs/adr/ADR-037-publishable-rvf-acceptance-test.md) -- Publishable RVF acceptance test
- [ADR-038](../../../docs/adr/ADR-038-npx-rvlite-witness-verification.md) -- npx/rvlite witness verification
- [ADR-039](../../../docs/adr/ADR-039-rvf-solver-wasm-agi-integration.md) -- RVF solver WASM AGI integration (this crate)

## License

MIT OR Apache-2.0