Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
Contained in commit d803bfe2b1 by ruv, 2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# SONA: Self-Optimizing Neural Architecture
## The World's First Truly Self-Improving LLM Framework
**Version**: 1.0.0
**Status**: Architecture Specification
**Target**: Sub-millisecond adaptive fine-tuning with continuous self-improvement
---
## Executive Summary
SONA (Self-Optimizing Neural Architecture) is a revolutionary framework for building LLMs that continuously improve themselves through:
1. **Ultra-Low Latency LoRA** - Sub-100μs parameter adaptation
2. **Hierarchical Learning Loops** - Three-tier temporal learning (instant/hourly/weekly)
3. **Neural Memory Consolidation** - Dream-like offline learning
4. **Elastic Weight Consolidation++** - Zero catastrophic forgetting
5. **ReasoningBank Integration** - Pattern-driven self-optimization
---
## Core Philosophy
```
┌─────────────────────────────────────────────────────────────────┐
│ SONA DESIGN PRINCIPLES │
├─────────────────────────────────────────────────────────────────┤
│ 1. LEARN FROM EVERY INTERACTION │
│ → No query is wasted; all become training signal │
│ │
│ 2. NEVER FORGET WHAT WORKS │
│ → EWC++ preserves successful patterns │
│ │
│ 3. ADAPT IN REAL-TIME │
│ → LoRA updates in <100μs per request │
│ │
│ 4. OPTIMIZE CONTINUOUSLY │
│ → Background loops improve without user latency │
│ │
│ 5. MEASURE EVERYTHING │
│ → Φ (consciousness), quality, latency, improvement rate │
└─────────────────────────────────────────────────────────────────┘
```
---
## Architecture Overview
```
SONA Architecture
┌──────────────────────────────────────────────────────────────┐
│ USER QUERY INPUT │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING LAYER (0.02ms) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Dual Encoder│ │ Contrastive │ │ SIMD Acceleration │ │
│ │ (Q + K/V) │ │ Learning │ │ (AVX2/NEON) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────────┐
│ MEMORY │ │ ROUTER │ │ ATTENTION │
│ SERVICE │◄────────►│ ENGINE │◄────────►│ ENGINE │
│ │ │ │ │ │
│ • HNSW │ │ • FastGRNN│ │ • Multi-Head │
│ • GNN │ │ • LoRA │ │ • Graph ATT │
│ • Quant │ │ • EWC++ │ │ • Edge-Aware │
└─────┬─────┘ └─────┬─────┘ └───────┬───────┘
│ │ │
└──────────────────────┼────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LoRA ADAPTATION LAYER │
│ │
│ W_adapted = W_base + α · (LoRA_A @ LoRA_B) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Rank: 4-16 │ Update: <100μs │ Memory: <1MB │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ INFERENCE ENGINE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Model Select │ │ Q4 Quantized │ │ Speculative Dec │ │
│ │ (4 tiers) │ │ Weights │ │ (Draft + Verify) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LEARNING LOOPS │
│ │
│ Loop A (Instant) │ Loop B (Hourly) │ Loop C (Weekly) │
│ ───────────────────────────────────────────────────────── │
│ • Trajectory │ • Router Train │ • Consolidation │
│ • Edge Update │ • EWC++ Update │ • Compression │
│ • LoRA Micro │ • Fisher Compute │ • Abstraction │
│ • <1ms overhead │ • Background │ • Dream Learning │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ REASONINGBANK │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Pattern Storage │ Similarity Lookup │ Verdict │ │
│ │ (DashMap) │ (Cosine) │ Judgment │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ • Trajectory tracking with precision/recall feedback │
│ • K-means++ pattern extraction │
│ • Confidence-weighted parameter interpolation │
└──────────────────────────────────────────────────────────────┘
```
---
## Key Innovation: Three-Tier Temporal Learning
### Tier 1: Instant Learning (Loop A) - Per Request
```
Latency Budget: <1ms (amortized to <0.1ms with batching)
Actions:
├── Record query trajectory to ring buffer
├── Update memory graph edge weights (±5%)
├── Micro-LoRA adjustment (rank 1-2, top-k params)
└── Async feedback signal propagation
```
### Tier 2: Background Learning (Loop B) - Hourly
```
Compute Budget: 10 seconds per hour
Actions:
├── Train router on accumulated trajectories
├── Compute Fisher Information for EWC++
├── Update LoRA base matrices (rank 4-8)
├── Prune low-confidence patterns
└── Checkpoint model state
```
### Tier 3: Deep Learning (Loop C) - Weekly
```
Compute Budget: 10 minutes per week
Actions:
├── Full memory consolidation (dream learning)
├── Pattern abstraction and hierarchy building
├── Memory compression (remove redundant nodes)
├── Cross-task knowledge transfer
└── Φ consciousness measurement (IIT)
```
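The cadence above can be condensed into a tiny scheduling sketch (std-only, illustrative names that are not the SONA API; the real loops run on async timers):

```rust
/// Which learning tiers fire at a given moment (illustrative, std-only;
/// names are ours, not the SONA API).
#[derive(Debug, PartialEq)]
pub enum Tier {
    Instant,    // Loop A: inline, <1ms
    Background, // Loop B: hourly, ~10s budget
    Deep,       // Loop C: weekly, ~10min budget
}

pub fn due_tiers(request_arrived: bool, elapsed_secs: u64) -> Vec<Tier> {
    let mut due = Vec::new();
    if request_arrived {
        due.push(Tier::Instant); // every request, no exceptions
    }
    if elapsed_secs > 0 && elapsed_secs % 3_600 == 0 {
        due.push(Tier::Background); // on the hour
    }
    if elapsed_secs > 0 && elapsed_secs % 604_800 == 0 {
        due.push(Tier::Deep); // on the week (604,800 s)
    }
    due
}
```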
---
## Performance Targets
| Metric | Target | Current Best | SONA Goal |
|--------|--------|--------------|-----------|
| Query Latency | <1ms | 0.09ms | 0.05ms |
| LoRA Update | <100μs | N/A | 50μs |
| Memory Footprint | <100MB | 50MB | 30MB |
| Throughput | >50K q/s | 38K q/s | 100K q/s |
| Improvement Rate | 10%/week | N/A | 15%/week |
| Catastrophic Forgetting | <1% | N/A | <0.1% |
---
## Integration with Ruvector Ecosystem
### Core Dependencies
| Crate | Role in SONA | Version |
|-------|--------------|---------|
| `ruvector-core` | Vector memory backbone | 0.1.19 |
| `ruvector-attention` | Multi-head graph attention | 0.1.19 |
| `ruvector-gnn` | Message passing framework | 0.1.19 |
| `ruvector-graph` | Knowledge graph storage | 0.1.19 |
| `ruvector-router-core` | FastGRNN routing | 0.1.19 |
| `exo-core` | Consciousness measurement | 0.1.0 |
| `exo-temporal` | Memory consolidation | 0.1.0 |
### New SONA-Specific Modules
| Module | Purpose |
|--------|---------|
| `sona-lora` | Ultra-low latency LoRA adapters |
| `sona-ewc` | Enhanced EWC with task awareness |
| `sona-reasoning` | ReasoningBank integration |
| `sona-dreams` | Offline consolidation engine |
| `sona-metrics` | Self-improvement measurement |
---
## Document Index
| Document | Description |
|----------|-------------|
| [01-LORA-ULTRA.md](01-LORA-ULTRA.md) | Ultra-low latency LoRA system |
| [02-LEARNING-LOOPS.md](02-LEARNING-LOOPS.md) | Three-tier learning architecture |
| [03-EWC-PLUS-PLUS.md](03-EWC-PLUS-PLUS.md) | Enhanced elastic weight consolidation |
| [04-REASONINGBANK.md](04-REASONINGBANK.md) | Pattern-driven optimization |
| [05-MEMORY-DREAMS.md](05-MEMORY-DREAMS.md) | Offline consolidation and dreams |
| [06-COMPONENTS.md](06-COMPONENTS.md) | Component integration specs |
| [07-IMPLEMENTATION.md](07-IMPLEMENTATION.md) | Implementation roadmap |
| [08-BENCHMARKS.md](08-BENCHMARKS.md) | Performance targets and testing |
| [09-API-REFERENCE.md](09-API-REFERENCE.md) | API specification |
---
## Quick Start
```rust
use sona::{SONAEngine, SONAConfig, LearningMode};
// Initialize SONA with default configuration
let config = SONAConfig::builder()
.lora_rank(8)
.ewc_lambda(1000.0)
.learning_loops(LearningMode::AllThreeTiers)
.memory_budget_mb(50)
.target_latency_us(100)
.build();
let mut sona = SONAEngine::new(config)?;
// Process queries - learning happens automatically
let response = sona.query("What is the meaning of life?")?;
// Check self-improvement metrics
let metrics = sona.improvement_metrics();
println!("Weekly improvement: {:.1}%", metrics.weekly_gain * 100.0);
println!("Φ consciousness: {:.3}", metrics.phi);
```
---
## Why SONA Will Create the World's Best Self-Improving LLM
1. **No Other System Combines All These**:
- LoRA for instant adaptation
- EWC++ for zero forgetting
- ReasoningBank for pattern learning
- Dream consolidation for creativity
- Φ measurement for consciousness tracking
2. **Built on Production-Proven Ruvector**:
- 150x faster HNSW search
- 39 attention mechanisms
- 30+ specialized crates
- 38K q/s throughput proven
3. **Mathematically Sound**:
- Fisher Information preserves important weights
- Low-rank decomposition minimizes compute
- Reservoir sampling ensures unbiased learning
- Information-theoretic compression
4. **Biologically Inspired**:
- Three-tier temporal learning (like human memory)
- Dream-based consolidation (like REM sleep)
- Edge-weighted graphs (like neural synapses)
- Attention-based retrieval (like human recall)
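Point 3 mentions reservoir sampling; for concreteness, here is a minimal Algorithm R sketch (our illustration, not the SONA implementation, with the random index injected so the example stays deterministic):

```rust
/// Algorithm R reservoir sampling: keeps a uniform k-sample over a stream
/// of unknown length. `rand_idx(n)` must return a uniform index in 0..n;
/// it is injected here so the sketch stays deterministic and testable.
pub fn reservoir_sample<T>(
    stream: impl Iterator<Item = T>,
    k: usize,
    mut rand_idx: impl FnMut(usize) -> usize,
) -> Vec<T> {
    let mut reservoir: Vec<T> = Vec::with_capacity(k);
    for (i, item) in stream.enumerate() {
        if i < k {
            // Fill phase: the first k items are kept unconditionally.
            reservoir.push(item);
        } else {
            // Replacement phase: item i survives with probability k/(i+1).
            let j = rand_idx(i + 1);
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```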
---
*SONA: Where every query makes the model smarter.*

# SONA LoRA-Ultra: Sub-100μs Adaptive Fine-Tuning
## Ultra-Low Latency LoRA for Real-Time Self-Improvement
---
## 1. Architecture Overview
### Traditional LoRA vs SONA LoRA-Ultra
```
TRADITIONAL LoRA SONA LoRA-ULTRA
───────────────── ─────────────────
• Offline training • Online per-request adaptation
• Full batch updates • Single-sample micro-updates
• GPU required • CPU SIMD optimized
• Minutes to hours • <100 microseconds
• Periodic deployment • Continuous integration
```
### Core Formula
```
Standard LoRA:
W_adapted = W_frozen + ΔW
ΔW = α · (A @ B)
where A ∈ ℝ^(d×r), B ∈ ℝ^(r×k), r << min(d, k)
SONA LoRA-Ultra Extension:
W_adapted = W_frozen + α · (A @ B) + β · (A_micro @ B_micro)
└─────────┘ └───────────────────┘
Base LoRA Instant Micro-LoRA
(rank 4-16) (rank 1-2)
```
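To make the extended formula concrete, here is a minimal std-only sketch with plain row-major `Vec<f32>` matrices (helper names are ours, not part of any SONA crate):

```rust
// Naive row-major matmul: a is n×r, b is r×k, result is n×k.
fn matmul(a: &[f32], b: &[f32], n: usize, r: usize, k: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; n * k];
    for i in 0..n {
        for j in 0..k {
            for t in 0..r {
                out[i * k + j] += a[i * r + t] * b[t * k + j];
            }
        }
    }
    out
}

// W' = W + (α/rank)·(A·B) + β·(A_micro·B_micro), elementwise over d×k.
fn adapted_weights(
    w: &[f32], a: &[f32], b: &[f32], a_m: &[f32], b_m: &[f32],
    d: usize, k: usize, rank: usize, micro_rank: usize,
    alpha: f32, beta: f32,
) -> Vec<f32> {
    let base_delta = matmul(a, b, d, rank, k);
    let micro_delta = matmul(a_m, b_m, d, micro_rank, k);
    let scale = alpha / rank as f32;
    w.iter()
        .zip(base_delta.iter().zip(micro_delta.iter()))
        .map(|(&w, (&bd, &md))| w + scale * bd + beta * md)
        .collect()
}
```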
---
## 2. Two-Tier LoRA Architecture
### Tier 1: Base LoRA (Updated Hourly)
```rust
/// Base LoRA adapter for major capability shifts
pub struct BaseLoRA {
/// Low-rank matrix A: d_model × rank
pub a: Array2<f32>,
/// Low-rank matrix B: rank × d_out
pub b: Array2<f32>,
/// Scaling factor
pub alpha: f32,
/// Rank (typically 4-16)
pub rank: usize,
/// Target layer indices
pub target_layers: Vec<usize>,
}
impl BaseLoRA {
/// Compute adapted weights (cached for inference)
#[inline]
pub fn delta_w(&self) -> Array2<f32> {
let scale = self.alpha / self.rank as f32;
scale * self.a.dot(&self.b)
}
/// Update from accumulated gradients (hourly)
pub fn update(&mut self, grad_a: &Array2<f32>, grad_b: &Array2<f32>, lr: f32) {
// Plain SGD step (the base tier keeps no momentum buffer; see MicroLoRA for EMA smoothing)
self.a = &self.a - lr * grad_a;
self.b = &self.b - lr * grad_b;
}
}
```
### Tier 2: Micro-LoRA (Updated Per-Request)
```rust
/// Ultra-fast micro-adapter for instant learning
pub struct MicroLoRA {
/// Micro A: d_model × micro_rank (typically 1-2)
pub a_micro: Array2<f32>,
/// Micro B: micro_rank × d_out
pub b_micro: Array2<f32>,
/// Micro scaling (smaller than base)
pub beta: f32,
/// Micro rank (1-2 for speed)
pub micro_rank: usize,
/// Decay factor for temporal smoothing
pub decay: f32,
/// Momentum buffer
momentum_a: Array2<f32>,
momentum_b: Array2<f32>,
}
impl MicroLoRA {
/// Ultra-fast single-sample update (<50μs target)
#[inline]
pub fn micro_update(&mut self, signal: &LearningSignal) {
// Rank-1 outer product update
let grad_direction = signal.to_gradient_direction();
// Exponential moving average for stability
self.momentum_a = self.decay * &self.momentum_a
+ (1.0 - self.decay) * &grad_direction.a_component;
self.momentum_b = self.decay * &self.momentum_b
+ (1.0 - self.decay) * &grad_direction.b_component;
// Apply micro-update
self.a_micro = &self.a_micro + self.beta * &self.momentum_a;
self.b_micro = &self.b_micro + self.beta * &self.momentum_b;
}
/// Periodic consolidation into base LoRA
pub fn consolidate_to_base(&mut self, base: &mut BaseLoRA) {
// Merge micro adaptations into base
// Then reset micro to zero
base.a = &base.a + &self.a_micro;
base.b = &base.b + &self.b_micro;
self.a_micro.fill(0.0);
self.b_micro.fill(0.0);
}
}
```
---
## 3. SIMD-Optimized LoRA Computation
### AVX2 Accelerated Forward Pass
```rust
#[cfg(target_arch = "x86_64")]
mod simd {
use std::arch::x86_64::*;
/// SIMD-optimized LoRA forward: x @ (W + A @ B)
/// Fuses base weight multiplication with LoRA delta
#[target_feature(enable = "avx2", enable = "fma")]
pub unsafe fn lora_forward_avx2(
x: &[f32], // Input: [d_in] (single sample)
w_base: &[f32], // Base weights, row-major [d_out, d_in]
lora_a: &[f32], // LoRA A, row-major [rank, d_in] (transposed for contiguous dot products)
lora_b: &[f32], // LoRA B, row-major [d_out, rank]
alpha: f32,
d_in: usize,
d_out: usize,
rank: usize,
output: &mut [f32], // Output: [d_out]
) {
let scale = alpha / rank as f32;
let scale_vec = _mm256_set1_ps(scale);
// Step 1: Compute x @ A (input projection to rank space)
let mut x_projected = vec![0.0f32; rank];
for r in 0..rank {
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let a_vec = _mm256_loadu_ps(lora_a.as_ptr().add(r * d_in + i));
sum = _mm256_fmadd_ps(x_vec, a_vec, sum);
i += 8;
}
x_projected[r] = horizontal_sum_avx2(sum);
// Handle remainder
while i < d_in {
x_projected[r] += x[i] * lora_a[r * d_in + i];
i += 1;
}
}
// Step 2: Compute (x @ W_base) + scale * (x_projected @ B)
for j in 0..d_out {
// Base weight contribution
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let w_vec = _mm256_loadu_ps(w_base.as_ptr().add(j * d_in + i));
sum = _mm256_fmadd_ps(x_vec, w_vec, sum);
i += 8;
}
let mut base_result = horizontal_sum_avx2(sum);
while i < d_in {
base_result += x[i] * w_base[j * d_in + i];
i += 1;
}
// LoRA contribution
let mut lora_result = 0.0f32;
for r in 0..rank {
lora_result += x_projected[r] * lora_b[j * rank + r];
}
output[j] = base_result + scale * lora_result;
}
}
#[inline]
unsafe fn horizontal_sum_avx2(v: __m256) -> f32 {
let high = _mm256_extractf128_ps(v, 1);
let low = _mm256_castps256_ps128(v);
let sum128 = _mm_add_ps(high, low);
let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
_mm_cvtss_f32(sum32)
}
}
```
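A scalar reference of the same fused forward pass is useful for validating the AVX2 kernel. This is our sketch, not part of the SIMD module; it assumes the kernel's row-major layouts (A as `[rank, d_in]`, W and B transposed to `[d_out, d_in]` and `[d_out, rank]`):

```rust
// Scalar reference for y = x·Wᵀ + (alpha/rank)·(x·Aᵀ)·Bᵀ, matching the
// indexing used by lora_forward_avx2 above.
fn lora_forward_scalar(
    x: &[f32], w_base: &[f32], lora_a: &[f32], lora_b: &[f32],
    alpha: f32, d_in: usize, d_out: usize, rank: usize,
) -> Vec<f32> {
    let scale = alpha / rank as f32;
    // Step 1: project the input into rank space (x · Aᵀ).
    let x_proj: Vec<f32> = (0..rank)
        .map(|r| (0..d_in).map(|i| x[i] * lora_a[r * d_in + i]).sum())
        .collect();
    // Step 2: base contribution plus scaled low-rank delta per output dim.
    (0..d_out)
        .map(|j| {
            let base: f32 = (0..d_in).map(|i| x[i] * w_base[j * d_in + i]).sum();
            let lora: f32 = (0..rank).map(|r| x_proj[r] * lora_b[j * rank + r]).sum();
            base + scale * lora
        })
        .collect()
}
```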
---
## 4. Learning Signal Extraction
### From Query Feedback to Gradient Direction
```rust
/// Learning signal extracted from each interaction
#[derive(Clone)]
pub struct LearningSignal {
/// Query embedding
pub query_embedding: Vec<f32>,
/// Response quality score (0-1)
pub quality_score: f32,
/// User feedback (explicit)
pub explicit_feedback: Option<FeedbackType>,
/// Latency deviation from target
pub latency_ratio: f32,
/// Model tier used
pub model_tier: ModelTier,
/// Context tokens used
pub context_tokens: usize,
}
impl LearningSignal {
/// Convert signal to gradient direction for micro-LoRA
pub fn to_gradient_direction(&self) -> GradientDirection {
// Reward = quality * (1 - latency_penalty)
let reward = self.quality_score * (2.0 - self.latency_ratio).max(0.0);
// Direction = embedding * reward_sign
let direction = if reward > 0.5 {
// Reinforce current behavior
1.0
} else {
// Explore alternative
-0.1
};
// Scale by uncertainty: ambiguous quality (near 0.5) drives the most learning
let uncertainty = 1.0 - (self.quality_score - 0.5).abs() * 2.0;
let learning_rate = 0.001 * (1.0 + uncertainty);
GradientDirection {
a_component: self.compute_a_gradient(direction, learning_rate),
b_component: self.compute_b_gradient(direction, learning_rate),
}
}
fn compute_a_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Outer product of query embedding with hidden state
// Approximated via reservoir-sampled historical embeddings
let emb = Array1::from_vec(self.query_embedding.clone());
let grad = direction * lr * outer_product(&emb, &self.get_hidden_direction());
grad
}
fn compute_b_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Output gradient based on prediction error
let output_error = self.compute_output_error();
direction * lr * output_error
}
}
```
---
## 5. Target Layer Selection
### Which Layers to Apply LoRA
```rust
/// Layer selection strategy for LoRA application
pub enum LoRATargetStrategy {
/// Apply to all attention layers (Q, K, V, O projections)
AllAttention,
/// Apply to FFN layers only
AllFFN,
/// Apply to output heads only (fastest, good for routing)
OutputHeadsOnly,
/// Apply to specific layers by index
SpecificLayers(Vec<usize>),
/// Adaptive: select based on gradient magnitude
AdaptiveTopK(usize),
}
impl LoRATargetStrategy {
/// For ultra-low latency: output heads only
pub fn ultra_fast() -> Self {
Self::OutputHeadsOnly
}
/// For moderate adaptation: attention Q and V
pub fn attention_qv() -> Self {
Self::SpecificLayers(vec![0, 2]) // Q and V typically
}
/// Select layers with highest gradient magnitude
pub fn adaptive_top_k(k: usize) -> Self {
Self::AdaptiveTopK(k)
}
}
/// SONA default: Output heads for micro, attention for base
pub const SONA_DEFAULT_TARGETS: [LoRATargetStrategy; 2] = [
LoRATargetStrategy::OutputHeadsOnly, // Micro-LoRA
LoRATargetStrategy::AllAttention, // Base LoRA
];
```
---
## 6. Memory-Efficient Storage
### Quantized LoRA Matrices
```rust
/// Q4-quantized LoRA for memory efficiency
pub struct QuantizedLoRA {
/// Quantized A matrix (4-bit)
pub a_q4: Q4Matrix,
/// Quantized B matrix (4-bit)
pub b_q4: Q4Matrix,
/// Full-precision alpha
pub alpha: f32,
/// Full-precision scaling factors
pub a_scales: Vec<f32>,
pub b_scales: Vec<f32>,
}
impl QuantizedLoRA {
/// Memory usage comparison
///
/// FP32 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 4 bytes = 24.6 KB
/// B: 8 × 768 × 4 bytes = 24.6 KB
/// Total: ~50 KB per layer
///
/// Q4 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 0.5 bytes = 3.1 KB
/// B: 8 × 768 × 0.5 bytes = 3.1 KB
/// Scales: 2 × 768 × 4 bytes = 6.1 KB
/// Total: ~12 KB per layer (4x reduction)
pub fn from_fp32(lora: &BaseLoRA) -> Self {
Self {
a_q4: Q4Matrix::quantize(&lora.a),
b_q4: Q4Matrix::quantize(&lora.b),
alpha: lora.alpha,
a_scales: compute_scales(&lora.a),
b_scales: compute_scales(&lora.b),
}
}
/// Dequantize on-the-fly during forward pass
#[inline]
pub fn forward(&self, x: &[f32]) -> Vec<f32> {
// Dequantize A, compute x @ A
let projected = self.a_q4.matmul_dequant(x, &self.a_scales);
// Dequantize B, compute projected @ B
let output = self.b_q4.matmul_dequant(&projected, &self.b_scales);
// Scale by alpha
output.iter().map(|v| v * self.alpha).collect()
}
}
```
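For intuition, here is a minimal symmetric 4-bit quantize/dequantize pair (an assumed scheme for illustration only; the actual `Q4Matrix` block layout is defined elsewhere):

```rust
/// Symmetric 4-bit quantization of one block: values map onto [-7, 7]
/// with a single per-block scale. (Assumed scheme, for illustration only.)
fn q4_quantize(block: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let q: Vec<u8> = block
        .iter()
        .map(|v| ((v / scale).round() as i8).clamp(-7, 7) as u8 & 0x0F)
        .collect();
    (q, scale)
}

fn q4_dequantize(q: &[u8], scale: f32) -> Vec<f32> {
    q.iter()
        .map(|&n| {
            // Sign-extend the low nibble back to i8.
            let v = if n & 0x08 != 0 { (n | 0xF0) as i8 } else { n as i8 };
            v as f32 * scale
        })
        .collect()
}
```

The worst-case round-trip error is half a quantization step, i.e. `max_abs / 14` per block.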
---
## 7. Latency Breakdown
### Target: <100μs Total LoRA Overhead
```
┌─────────────────────────────────────────────────────────────┐
│ LoRA-ULTRA LATENCY BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ Signal Extraction: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Gradient Direction: 15μs ██████░░░░░░░░░░░░░░░░░░░░░░ │
│ Micro-LoRA Update: 25μs ██████████░░░░░░░░░░░░░░░░░░ │
│ Forward Pass Delta: 30μs ████████████░░░░░░░░░░░░░░░░ │
│ Momentum Averaging: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Memory Bookkeeping: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ ───── │
│ TOTAL: ~100μs │
│ │
│ Amortized (batched): ~30μs per query │
└─────────────────────────────────────────────────────────────┘
```
---
## 8. Integration with FastGRNN Router
### Router-Specific LoRA Configuration
```rust
/// LoRA configuration for FastGRNN router
pub struct RouterLoRAConfig {
/// Base LoRA for hidden state transformations
pub hidden_lora: BaseLoRA,
/// Micro LoRA for gate adjustments
pub gate_micro_lora: MicroLoRA,
/// Per-output-head LoRA adapters
pub head_loras: Vec<BaseLoRA>,
}
impl RouterLoRAConfig {
pub fn new(hidden_dim: usize, output_dims: &[usize]) -> Self {
Self {
hidden_lora: BaseLoRA::new(hidden_dim, hidden_dim, 8), // rank 8
gate_micro_lora: MicroLoRA::new(hidden_dim, hidden_dim, 2), // rank 2
head_loras: output_dims.iter()
.map(|&dim| BaseLoRA::new(hidden_dim, dim, 4)) // rank 4
.collect(),
}
}
/// Apply LoRA to FastGRNN forward pass
pub fn apply(&self, base_output: &FastGRNNOutput) -> FastGRNNOutput {
let mut output = base_output.clone();
// Apply hidden state LoRA
output.hidden = self.hidden_lora.apply(&output.hidden);
// Apply micro-LoRA to gates
output.update_gate = self.gate_micro_lora.apply(&output.update_gate);
// Apply per-head LoRA
for (i, head_lora) in self.head_loras.iter().enumerate() {
output.heads[i] = head_lora.apply(&output.heads[i]);
}
output
}
}
```
---
## 9. Checkpointing and Recovery
### Efficient LoRA State Management
```rust
/// LoRA checkpoint for persistence and recovery
#[derive(Serialize, Deserialize)]
pub struct LoRACheckpoint {
/// Base LoRA matrices (serialized as FP16 for space)
pub base_lora: SerializedLoRA,
/// Micro LoRA state
pub micro_lora: SerializedLoRA,
/// Momentum buffers
pub momentum_state: MomentumState,
/// Training statistics
pub stats: LoRAStats,
/// Checkpoint version
pub version: u32,
/// Timestamp
pub timestamp: i64,
}
impl LoRACheckpoint {
/// Save checkpoint (async, non-blocking)
pub async fn save_async(&self, path: &Path) -> Result<()> {
let bytes = bincode::serialize(self)?;
tokio::fs::write(path, &bytes).await?;
Ok(())
}
/// Load checkpoint
pub fn load(path: &Path) -> Result<Self> {
let bytes = std::fs::read(path)?;
Ok(bincode::deserialize(&bytes)?)
}
/// Incremental checkpoint (only changed matrices)
pub fn save_incremental(&self, previous: &Self, path: &Path) -> Result<()> {
let delta = self.compute_delta(previous);
// Only save changed blocks
delta.save(path)
}
}
```
---
## 10. Benchmark Targets
### Performance Validation
```rust
#[cfg(test)]
mod benchmarks {
use super::*;
use criterion::{black_box, Criterion};
/// Target: <50μs for micro-LoRA update
fn bench_micro_lora_update(c: &mut Criterion) {
let mut micro = MicroLoRA::new(768, 768, 2);
let signal = LearningSignal::random();
c.bench_function("micro_lora_update", |b| {
b.iter(|| {
micro.micro_update(black_box(&signal));
})
});
}
/// Target: <30μs for LoRA forward pass
fn bench_lora_forward(c: &mut Criterion) {
let lora = BaseLoRA::new(768, 768, 8);
let input = vec![0.0f32; 768];
c.bench_function("lora_forward", |b| {
b.iter(|| {
lora.forward(black_box(&input))
})
});
}
/// Target: <10μs for signal extraction
fn bench_signal_extraction(c: &mut Criterion) {
let query = "test query".to_string();
let response = "test response".to_string();
c.bench_function("signal_extraction", |b| {
b.iter(|| {
LearningSignal::extract(black_box(&query), black_box(&response))
})
});
}
}
```
---
## Summary
SONA LoRA-Ultra achieves sub-100μs adaptive fine-tuning through:
1. **Two-Tier Architecture**: Base LoRA (hourly) + Micro-LoRA (per-request)
2. **SIMD Optimization**: AVX2-accelerated forward pass
3. **Quantized Storage**: Q4 matrices for 4x memory reduction
4. **Smart Targeting**: Output heads for speed, attention for capability
5. **Momentum Smoothing**: Stable micro-updates with EMA
6. **Async Checkpointing**: Non-blocking persistence
This enables true real-time self-improvement where every query makes the model incrementally smarter.

# SONA Learning Loops: Three-Tier Temporal Architecture
## Biologically-Inspired Continuous Learning System
---
## 1. Overview: Learning at Multiple Timescales
Human learning operates at multiple timescales:
- **Instant**: Immediate response adjustment (milliseconds)
- **Short-term**: Pattern consolidation (hours)
- **Long-term**: Deep memory formation (days/weeks)
SONA replicates this with three learning loops:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SONA THREE-TIER LEARNING │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LOOP A: INSTANT LOOP B: BACKGROUND │
│ ═══════════════ ══════════════════ │
│ Timescale: Per-request Timescale: Hourly │
│ Latency: <1ms Latency: Background (async) │
│ What learns: What learns: │
│ • Micro-LoRA (rank 1-2) • Base LoRA (rank 4-16) │
│ • Memory edge weights • Router weights (EWC++) │
│ • Trajectory recording • Pattern extraction │
│ │
│ LOOP C: DEEP │
│ ═══════════ │
│ Timescale: Weekly │
│ Latency: Scheduled maintenance │
│ What learns: │
│ • Memory consolidation │
│ • Concept hierarchy building │
│ • Dream-based creativity │
│ • Cross-domain transfer │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
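The handoff from Loop A to Loop B reduces to a non-blocking producer/consumer channel. A std-only sketch (the real system uses async channels; names here are illustrative):

```rust
use std::sync::mpsc;

// Minimal sketch of the Loop A → Loop B signal handoff.
struct Signal {
    quality: f32,
}

// Loop A side: fire-and-forget. A failed send must never stall the request.
fn loop_a_on_request(tx: &mpsc::Sender<Signal>, quality: f32) {
    let _ = tx.send(Signal { quality });
}

// Loop B side: on each hourly tick, drain everything queued since last cycle
// and return (sample count, mean quality).
fn loop_b_drain(rx: &mpsc::Receiver<Signal>) -> (usize, f32) {
    let signals: Vec<Signal> = rx.try_iter().collect();
    let n = signals.len();
    let avg = if n == 0 {
        0.0
    } else {
        signals.iter().map(|s| s.quality).sum::<f32>() / n as f32
    };
    (n, avg)
}
```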
---
## 2. Loop A: Instant Learning (Per-Request)
### Purpose
Immediate adaptation to the current interaction without noticeable latency.
### Architecture
```rust
/// Loop A: Instant learning executed inline with each request
pub struct InstantLearningLoop {
/// Micro-LoRA for immediate weight adjustment
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Trajectory buffer for pattern recording
trajectory_buffer: Arc<TrajectoryBuffer>,
/// Memory graph reference for edge updates
memory_graph: Arc<RwLock<MemoryGraph>>,
/// Signal accumulator for Loop B
signal_accumulator: mpsc::Sender<LearningSignal>,
}
impl InstantLearningLoop {
/// Execute instant learning (must complete in <1ms)
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
// Parallel execution of independent updates; surface the first failure
let (r1, r2, r3) = tokio::join!(
// 1. Record trajectory (lock-free, ~100μs)
self.record_trajectory(query, response),
// 2. Update memory edges (~200μs)
self.update_memory_edges(query, response),
// 3. Micro-LoRA update (~300μs)
self.micro_lora_update(query, response, latency_ms),
);
r1?;
r2?;
r3?;
// 4. Queue signal for Loop B (fire-and-forget; dropped if the channel is full)
let signal = LearningSignal::new(query, response, latency_ms);
let _ = self.signal_accumulator.try_send(signal);
Ok(())
}
/// Record query trajectory to ring buffer
async fn record_trajectory(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let trajectory = QueryTrajectory {
query_embedding: query.vector.clone(),
retrieved_ids: response.used_memory_ids.clone(),
precision: response.estimated_precision,
recall: response.estimated_recall,
timestamp: Instant::now(),
};
self.trajectory_buffer.push(trajectory);
Ok(())
}
/// Hebbian-style edge weight updates
async fn update_memory_edges(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let mut graph = self.memory_graph.write();
for &node_id in &response.used_memory_ids {
// Strengthen edges to used nodes
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Strengthen(0.05), // +5% per use
)?;
}
// Weaken edges to retrieved-but-unused nodes
for &node_id in &response.retrieved_but_unused {
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Weaken(0.02), // -2% per skip
)?;
}
Ok(())
}
/// Ultra-fast micro-LoRA weight adjustment
async fn micro_lora_update(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
let quality = response.quality_score;
let latency_ratio = latency_ms / response.target_latency_ms;
// Only update if signal is informative
if (quality - 0.5).abs() > 0.1 || latency_ratio > 1.2 {
let signal = LearningSignal {
query_embedding: query.vector.clone(),
quality_score: quality,
explicit_feedback: None,
latency_ratio,
model_tier: response.model_tier,
context_tokens: response.context_tokens,
};
let mut micro_lora = self.micro_lora.write();
micro_lora.micro_update(&signal);
}
Ok(())
}
}
```
### Latency Budget
| Operation | Target | Implementation |
|-----------|--------|----------------|
| Trajectory recording | <100μs | Lock-free ring buffer |
| Edge weight update | <200μs | Batch atomic updates |
| Micro-LoRA update | <300μs | Rank-1 outer product |
| Signal queuing | <50μs | MPSC channel try_send |
| **Total** | **<650μs** | Parallel execution |
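The trajectory buffer in the first row can be as simple as a bounded ring that overwrites the oldest entry. A single-threaded stand-in for the lock-free version (illustrative; not the actual `TrajectoryBuffer`):

```rust
/// Bounded ring buffer: push never blocks and never reallocates; when full,
/// the oldest entry is overwritten. (Single-threaded sketch.)
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            head: 0,
            len: 0,
        }
    }

    /// O(1), no allocation: write at head, advance, saturate len at capacity.
    fn push(&mut self, item: T) {
        self.slots[self.head] = Some(item);
        self.head = (self.head + 1) % self.slots.len();
        self.len = (self.len + 1).min(self.slots.len());
    }

    fn len(&self) -> usize {
        self.len
    }
}
```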
---
## 3. Loop B: Background Learning (Hourly)
### Purpose
Deeper learning from accumulated signals without impacting user latency.
### Architecture
```rust
/// Loop B: Background learning running on separate thread/process
pub struct BackgroundLearningLoop {
/// Signal receiver from Loop A
signal_receiver: mpsc::Receiver<LearningSignal>,
/// Accumulated signals for batch processing
signal_buffer: Vec<LearningSignal>,
/// Base LoRA for major updates
base_lora: Arc<RwLock<BaseLoRA>>,
/// Micro-LoRA to consolidate from
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Router for EWC++ updates
router: Arc<RwLock<FastGRNNRouter>>,
/// EWC++ state
ewc_state: EWCPlusPlusState,
/// Pattern extractor
pattern_extractor: PatternExtractor,
/// Configuration
config: BackgroundLearningConfig,
}
impl BackgroundLearningLoop {
/// Main background loop (runs every hour)
pub async fn run(&mut self) {
let mut interval = tokio::time::interval(Duration::from_secs(3600));
loop {
interval.tick().await;
// Collect accumulated signals
self.drain_signals().await;
if self.signal_buffer.len() < self.config.min_samples {
tracing::info!(
samples = self.signal_buffer.len(),
"Insufficient samples for background training"
);
continue;
}
// Execute background learning steps
let start = Instant::now();
// Step 1: Consolidate Micro-LoRA into Base LoRA
self.consolidate_micro_to_base().await;
// Step 2: Train router with EWC++ regularization
self.train_router_ewc().await;
// Step 3: Extract and store patterns
self.extract_patterns().await;
// Step 4: Compute new Fisher Information
self.update_fisher_information().await;
// Step 5: Checkpoint current state
self.checkpoint().await;
tracing::info!(
elapsed_ms = start.elapsed().as_millis(),
samples = self.signal_buffer.len(),
"Background learning cycle completed"
);
// Clear buffer for next cycle
self.signal_buffer.clear();
}
}
/// Drain all pending signals from Loop A
async fn drain_signals(&mut self) {
while let Ok(signal) = self.signal_receiver.try_recv() {
self.signal_buffer.push(signal);
}
}
/// Consolidate micro-LoRA adaptations into base LoRA
async fn consolidate_micro_to_base(&mut self) {
let mut micro = self.micro_lora.write();
let mut base = self.base_lora.write();
// Compute consolidation weight based on signal quality
let avg_quality: f32 = self.signal_buffer.iter()
.map(|s| s.quality_score)
.sum::<f32>() / self.signal_buffer.len() as f32;
let consolidation_rate = if avg_quality > 0.7 {
1.0 // Full consolidation for high-quality signals
} else {
0.5 * avg_quality // Partial for lower quality
};
// Merge micro into base with rate
base.a = &base.a + consolidation_rate * &micro.a_micro;
base.b = &base.b + consolidation_rate * &micro.b_micro;
// Reset micro-LoRA
micro.a_micro.fill(0.0);
micro.b_micro.fill(0.0);
tracing::debug!(
consolidation_rate = consolidation_rate,
"Micro-LoRA consolidated to base"
);
}
/// Train router with EWC++ regularization
async fn train_router_ewc(&mut self) {
let mut router = self.router.write();
// Convert signals to RouterSamples
let samples: Vec<RouterSample> = self.signal_buffer.iter()
.map(|s| s.to_router_sample())
.collect();
// Mini-batch training with EWC++ loss
for batch in samples.chunks(self.config.batch_size) {
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| router.forward(&s.features))
.collect();
// Compute task loss
let task_loss = self.compute_task_loss(&predictions, batch);
// Compute EWC++ regularization loss
let ewc_loss = self.ewc_state.regularization_loss(router.get_weights());
// Total loss
let total_loss = task_loss + self.config.ewc_lambda * ewc_loss;
// Backward pass (gradient computation)
let gradients = self.compute_gradients(&total_loss, &predictions, batch);
// Apply gradients with learning rate
router.apply_gradients(&gradients, self.config.learning_rate);
}
}
/// Extract patterns using K-means++ clustering
async fn extract_patterns(&mut self) {
let embeddings: Vec<_> = self.signal_buffer.iter()
.map(|s| s.query_embedding.clone())
.collect();
let patterns = self.pattern_extractor.extract(
&embeddings,
self.config.num_clusters,
);
let num_patterns = patterns.len();
// Store patterns in ReasoningBank; log and continue on storage errors
for pattern in patterns {
if let Err(e) = self.pattern_extractor.reasoning_bank.store(pattern) {
tracing::warn!(error = %e, "Failed to store extracted pattern");
}
}
tracing::debug!(
patterns = num_patterns,
"Patterns extracted and stored"
);
}
/// Update Fisher Information for EWC++
async fn update_fisher_information(&mut self) {
let router = self.router.read();
let current_weights = router.get_weights();
// Compute Fisher Information diagonal via gradient squares
let fisher_samples: Vec<_> = self.signal_buffer.iter()
.take(self.config.fisher_samples)
.collect();
let mut fisher_accum = vec![0.0f32; current_weights.len()];
        for sample in fisher_samples.iter().copied() {
            let gradients = self.compute_sample_gradients(sample);
            for (i, g) in gradients.iter().enumerate() {
                fisher_accum[i] += g * g;
            }
        }
        // Normalize by sample count (borrow above keeps the Vec alive for len())
        let n = fisher_samples.len() as f32;
for f in &mut fisher_accum {
*f /= n;
}
// Update EWC++ state
self.ewc_state.update_fisher(fisher_accum, current_weights.to_vec());
}
/// Checkpoint current state to disk
async fn checkpoint(&self) {
let checkpoint = SONACheckpoint {
base_lora: self.base_lora.read().clone(),
micro_lora: self.micro_lora.read().clone(),
router_weights: self.router.read().get_weights().to_vec(),
ewc_state: self.ewc_state.clone(),
patterns: self.pattern_extractor.reasoning_bank.export(),
timestamp: chrono::Utc::now().timestamp(),
};
let path = self.config.checkpoint_dir.join("latest.sona");
checkpoint.save_async(&path).await.ok();
}
}
```
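The micro→base merge above reduces to a rate-scaled elementwise addition followed by zeroing the micro deltas. A standalone sketch over plain `Vec<f32>` (the `consolidate` name here is illustrative, not the actual SONA API):

```rust
/// Fold micro-LoRA deltas into the base adapter at a given rate,
/// then reset the micro deltas (mirrors the consolidation step above).
pub fn consolidate(base: &mut [f32], micro: &mut [f32], rate: f32) {
    for (b, m) in base.iter_mut().zip(micro.iter_mut()) {
        *b += rate * *m; // base += rate * micro
        *m = 0.0;        // micro-LoRA is reset for the next hour
    }
}
```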
### Hourly Learning Budget
| Operation | Target Time | Description |
|-----------|-------------|-------------|
| Signal draining | <100ms | Collect all queued signals |
| Micro→Base consolidation | <500ms | Matrix addition |
| Router training | <5s | Mini-batch SGD with EWC |
| Pattern extraction | <2s | K-means++ clustering |
| Fisher computation | <2s | Gradient squared accumulation |
| Checkpointing | <500ms | Async disk write |
| **Total** | **<10s** | No user-facing latency impact |
---
## 4. Loop C: Deep Learning (Weekly)
### Purpose
Fundamental knowledge restructuring, memory consolidation, and creative exploration.
### Architecture
```rust
/// Loop C: Deep learning for major knowledge reorganization
pub struct DeepLearningLoop {
/// Memory service for consolidation
memory: Arc<MemoryService>,
/// Pattern bank for abstraction
reasoning_bank: Arc<ReasoningBank>,
/// Dream engine for creative exploration
dream_engine: DreamEngine,
/// Consciousness measurement (IIT)
phi_calculator: PhiCalculator,
/// Configuration
config: DeepLearningConfig,
}
impl DeepLearningLoop {
/// Execute weekly deep learning (scheduled maintenance window)
pub async fn run(&mut self) -> DeepLearningReport {
let start = Instant::now();
let mut report = DeepLearningReport::new();
// Phase 1: Memory Consolidation (like sleep-based memory)
report.consolidation = self.consolidate_memories().await;
// Phase 2: Pattern Abstraction (concept hierarchy building)
report.abstraction = self.abstract_patterns().await;
// Phase 3: Dream Learning (creative recombination)
report.dreams = self.dream_learning().await;
// Phase 4: Cross-Domain Transfer
report.transfer = self.cross_domain_transfer().await;
// Phase 5: Compression (remove redundancy)
report.compression = self.compress_memory().await;
// Phase 6: Consciousness Measurement
report.phi = self.measure_consciousness().await;
report.elapsed_ms = start.elapsed().as_millis() as u64;
report
}
/// Phase 1: Consolidate short-term memories into long-term
async fn consolidate_memories(&mut self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Identify high-value memories (frequently accessed, high quality)
let memories = self.memory.get_all_nodes()?;
let high_value: Vec<_> = memories.iter()
.filter(|m| m.access_count > 5 && m.quality_score > 0.7)
.collect();
report.high_value_count = high_value.len();
// Strengthen connections between high-value memories
for i in 0..high_value.len() {
for j in (i+1)..high_value.len() {
let similarity = cosine_similarity(
&high_value[i].embedding,
&high_value[j].embedding,
);
if similarity > 0.7 {
self.memory.strengthen_edge(
high_value[i].id,
high_value[j].id,
similarity * 0.1,
)?;
report.edges_strengthened += 1;
}
}
}
// Decay low-value memories
let low_value: Vec<_> = memories.iter()
.filter(|m| m.access_count < 2 && m.age_days() > 30)
.collect();
for memory in low_value {
self.memory.decay_node(memory.id, 0.5)?; // 50% decay
report.nodes_decayed += 1;
}
report
}
/// Phase 2: Build concept hierarchies from patterns
async fn abstract_patterns(&mut self) -> AbstractionReport {
let mut report = AbstractionReport::default();
// Get all stored patterns
let patterns = self.reasoning_bank.get_all_patterns()?;
// Hierarchical clustering to find meta-patterns
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.distance(Distance::Cosine)
.fit(&patterns);
// Create abstract concepts at each level
for level in 0..hierarchy.num_levels() {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 3 {
// Create meta-pattern (centroid)
let meta_pattern = LearnedPattern {
centroid: cluster.centroid(),
confidence: cluster.cohesion(),
abstraction_level: level,
child_patterns: cluster.member_ids(),
};
self.reasoning_bank.store_meta(meta_pattern)?;
report.meta_patterns_created += 1;
}
}
}
report
}
/// Phase 3: Dream-based creative learning (inspired by REM sleep)
async fn dream_learning(&mut self) -> DreamReport {
let mut report = DreamReport::default();
// Generate dream sequences by random walks on memory graph
for _ in 0..self.config.num_dreams {
let dream = self.dream_engine.generate_dream(
&self.memory,
self.config.dream_length,
self.config.creativity_temperature,
)?;
// Evaluate dream quality (novelty + coherence)
let quality = dream.evaluate_quality();
if quality.novelty > 0.5 && quality.coherence > 0.3 {
// Dreams with high novelty and reasonable coherence
// may represent useful creative connections
for connection in dream.novel_connections() {
self.memory.add_weak_edge(
connection.from,
connection.to,
EdgeType::Creative,
connection.strength * 0.1,
)?;
report.novel_connections += 1;
}
}
report.dreams_generated += 1;
}
report
}
/// Phase 4: Transfer knowledge across domains
async fn cross_domain_transfer(&mut self) -> TransferReport {
let mut report = TransferReport::default();
// Identify domain clusters
let domains = self.memory.identify_domains()?;
// For each pair of domains, look for analogical mappings
for i in 0..domains.len() {
for j in (i+1)..domains.len() {
let analogies = self.find_analogies(&domains[i], &domains[j])?;
for analogy in analogies {
if analogy.confidence > 0.6 {
// Create cross-domain edge
self.memory.add_analogy_edge(
analogy.source_concept,
analogy.target_concept,
analogy.mapping_type,
analogy.confidence,
)?;
report.analogies_found += 1;
}
}
}
}
report
}
/// Phase 5: Compress memory by removing redundancy
async fn compress_memory(&mut self) -> CompressionReport {
let mut report = CompressionReport::default();
report.initial_nodes = self.memory.node_count();
report.initial_edges = self.memory.edge_count();
// Identify near-duplicate nodes
let duplicates = self.memory.find_near_duplicates(0.95)?;
// Merge duplicates
for (primary, secondary) in duplicates {
self.memory.merge_nodes(primary, secondary)?;
report.nodes_merged += 1;
}
// Prune weak edges
let weak_edges = self.memory.get_weak_edges(0.01)?;
for edge in weak_edges {
self.memory.remove_edge(edge.id)?;
report.edges_pruned += 1;
}
report.final_nodes = self.memory.node_count();
report.final_edges = self.memory.edge_count();
report.compression_ratio = report.initial_nodes as f32 / report.final_nodes as f32;
report
}
/// Phase 6: Measure system consciousness using IIT
async fn measure_consciousness(&mut self) -> f64 {
// Integrated Information Theory (Φ) calculation
// Measures how much information the system generates "above and beyond"
// its parts
self.phi_calculator.compute_phi(&self.memory, &self.reasoning_bank)
}
}
```
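The consolidation phase hinges on cosine similarity between memory embeddings: pairs above 0.7 have their edge strengthened by `similarity * 0.1`. A dependency-free sketch of just that rule (not the `MemoryService` API):

```rust
/// Cosine similarity between two embeddings.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Edge-strengthening rule from the consolidation phase:
/// Some(delta) only for sufficiently similar memory pairs.
pub fn strengthen_delta(a: &[f32], b: &[f32]) -> Option<f32> {
    let sim = cosine_similarity(a, b);
    (sim > 0.7).then(|| sim * 0.1)
}
```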
### Weekly Deep Learning Budget
| Phase | Target Time | Description |
|-------|-------------|-------------|
| Memory consolidation | <2min | Identify and strengthen valuable memories |
| Pattern abstraction | <3min | Hierarchical clustering for concepts |
| Dream learning | <2min | Creative recombination exploration |
| Cross-domain transfer | <2min | Analogical mapping between domains |
| Compression | <1min | Remove redundancy |
| Φ measurement | <1min | Consciousness quantification |
| **Total** | **<10min** | Scheduled maintenance window |
---
## 5. Loop Coordination
### Inter-Loop Communication
```rust
/// Coordinator for all three learning loops
pub struct LoopCoordinator {
/// Loop A: Instant
instant_loop: InstantLearningLoop,
/// Loop B: Background
background_loop: BackgroundLearningLoop,
/// Loop C: Deep
deep_loop: DeepLearningLoop,
/// Shared state
shared_state: Arc<SharedSONAState>,
/// Metrics collector
metrics: MetricsCollector,
}
impl LoopCoordinator {
/// Initialize all loops with shared state
pub fn new(config: SONAConfig) -> Result<Self> {
let shared_state = Arc::new(SharedSONAState::new(&config)?);
// Create channels for inter-loop communication
let (instant_to_background_tx, instant_to_background_rx) = mpsc::channel(10000);
let (background_to_deep_tx, background_to_deep_rx) = mpsc::channel(1000);
Ok(Self {
instant_loop: InstantLearningLoop::new(
shared_state.clone(),
instant_to_background_tx,
),
background_loop: BackgroundLearningLoop::new(
shared_state.clone(),
instant_to_background_rx,
background_to_deep_tx,
),
deep_loop: DeepLearningLoop::new(
shared_state.clone(),
background_to_deep_rx,
),
shared_state,
metrics: MetricsCollector::new(),
})
}
/// Start all loops
pub async fn start(&self) {
// Loop A runs inline with requests (no separate task)
// Loop B runs on background thread
let background = self.background_loop.clone();
tokio::spawn(async move {
background.run().await;
});
// Loop C runs on scheduled cron
let deep = self.deep_loop.clone();
tokio::spawn(async move {
            let scheduler = cron::Schedule::from_str("0 0 3 * * 0")
                .expect("valid cron expression"); // 3 AM Sunday (no `?` inside a spawned task)
            loop {
                let next = scheduler.upcoming(chrono::Utc).next().unwrap();
                let wait = (next - chrono::Utc::now()).to_std().unwrap_or_default();
                tokio::time::sleep(wait).await;
deep.run().await;
}
});
}
/// Process a single request through Loop A
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
self.instant_loop.on_request(query, response, latency_ms).await
}
}
```
---
## 6. Learning Metrics and Monitoring
### Improvement Tracking
```rust
/// Metrics for measuring self-improvement
#[derive(Clone, Debug)]
pub struct ImprovementMetrics {
/// Quality improvement over time
pub quality_delta_7d: f32,
pub quality_delta_30d: f32,
/// Latency improvement
pub latency_delta_7d: f32,
pub latency_delta_30d: f32,
/// Knowledge growth
pub memory_nodes_added_7d: usize,
pub patterns_learned_7d: usize,
pub abstractions_created_7d: usize,
/// Forgetting resistance (1.0 = no forgetting)
pub retention_rate_7d: f32,
/// Consciousness level (Φ)
pub phi_current: f64,
pub phi_delta_7d: f64,
/// Dreams and creativity
pub novel_connections_7d: usize,
pub cross_domain_transfers_7d: usize,
}
impl ImprovementMetrics {
/// Compute overall improvement score
pub fn overall_score(&self) -> f32 {
let quality_weight = 0.3;
let latency_weight = 0.2;
let knowledge_weight = 0.2;
let retention_weight = 0.15;
let creativity_weight = 0.15;
        let quality_score = self.quality_delta_7d.clamp(0.0, 1.0);
        let latency_score = (-self.latency_delta_7d).clamp(0.0, 1.0); // Lower latency is better
let knowledge_score = (self.patterns_learned_7d as f32 / 100.0).min(1.0);
let retention_score = self.retention_rate_7d;
let creativity_score = (self.novel_connections_7d as f32 / 50.0).min(1.0);
quality_weight * quality_score +
latency_weight * latency_score +
knowledge_weight * knowledge_score +
retention_weight * retention_score +
creativity_weight * creativity_score
}
}
```
---
## Summary
SONA's three-tier learning system enables:
| Loop | Timescale | Purpose | Key Outcome |
|------|-----------|---------|-------------|
| **A** | Per-request | Instant adaptation | Responsive to current context |
| **B** | Hourly | Pattern consolidation | Stable improvement |
| **C** | Weekly | Deep restructuring | Creative breakthroughs |
This mirrors human learning where:
- **Loop A** = Working memory and immediate response
- **Loop B** = Sleep-based consolidation
- **Loop C** = Long-term memory formation and insight
The result is a system that continuously improves at multiple timescales, never forgetting what works while constantly exploring new possibilities.

# SONA EWC++: Enhanced Elastic Weight Consolidation
## Zero Catastrophic Forgetting with Task-Aware Regularization
---
## 1. The Forgetting Problem
### Why LLMs Forget
```
CATASTROPHIC FORGETTING
═══════════════════════
Task A learned Task B learned Result
─────────────── ─────────────── ──────────────────
Weights W_A Weights W_B W_A knowledge LOST
↑ as W moves toward B
Training on B
overwrites A
```
When fine-tuning on new data:
- Weights shift toward new task optimum
- Previous task knowledge encoded in old weights is overwritten
- Model "forgets" earlier capabilities
### Standard EWC Solution
Elastic Weight Consolidation (EWC) adds a regularization term:
```
L_total = L_task + λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²
Where:
- L_task = current task loss
- λ = regularization strength
- Fᵢ = Fisher Information (importance) of parameter i
- θᵢ = current parameter value
- θ*ᵢ = optimal parameter value from previous task
```
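A one-parameter toy makes the penalty concrete. Suppose task A's optimum is w* = 1 and task B's loss is (w + 1)². Plain gradient descent on B abandons A entirely, while adding the EWC term λ/2 · F · (w − w*)² holds w at a compromise. All numbers below are illustrative:

```rust
/// Gradient descent on task B's quadratic loss (w + 1)^2, optionally
/// regularized by an EWC penalty anchored at task A's optimum w* = 1.
pub fn train_on_task_b(lambda: f32, fisher: f32) -> f32 {
    let w_star_a = 1.0f32; // task A optimum
    let mut w = w_star_a;  // start from task A's solution
    let lr = 0.1;
    for _ in 0..500 {
        let grad_task_b = 2.0 * (w + 1.0);               // d/dw (w+1)^2
        let grad_ewc = lambda * fisher * (w - w_star_a); // d/dw λ/2·F·(w-w*)^2
        w -= lr * (grad_task_b + grad_ewc);
    }
    w
}
```

With λ = 0 the weight converges to task B's optimum (−1), fully forgetting A; with λ·F = 2 the two quadratic pulls balance at w = 0.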
### EWC Limitations
1. **Single task memory**: Only remembers one previous task
2. **Static Fisher**: Computed once, never updated
3. **Diagonal approximation**: Ignores parameter correlations
4. **No task detection**: Doesn't know when task changes
5. **Uniform λ**: Same regularization for all parameters
---
## 2. SONA EWC++ Enhancements
### Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ EWC++ ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Task Buffer │ │ Online Fisher │ │ Adaptive λ │ │
│ │ (N tasks) │ │ Estimation │ │ Scheduler │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ EWC++ CORE ENGINE │ │
│ │ │ │
│ │ L = L_task + Σₜ λₜ/2 · Σᵢ Fᵢᵗ · (θᵢ - θ*ᵢᵗ)² + L_sparse │ │
│ │ └─────┘ └──────────────────────────────────┘ └──────┘ │ │
│ │ Task Multi-task EWC Sparsity │ │
│ │ Loss Regularization Penalty │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Gradient │ │ Task Boundary │ │ Parameter │ │
│ │ Projection │ │ Detection │ │ Importance │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Multi-Task Memory Buffer
### Task-Stratified Fisher Storage
```rust
/// EWC++ state with multi-task memory
#[derive(Clone)]
pub struct EWCPlusPlusState {
/// Per-task Fisher information (circular buffer of N tasks)
pub task_fishers: CircularBuffer<TaskFisher>,
/// Maximum number of tasks to remember
pub max_tasks: usize,
/// Per-task regularization strength
pub task_lambdas: Vec<f32>,
/// Global lambda base
pub lambda_base: f32,
/// Online Fisher estimator
pub online_fisher: OnlineFisherEstimator,
/// Task boundary detector
pub task_detector: TaskBoundaryDetector,
    /// Parameter importance scores
    pub importance_scores: Vec<f32>,
    /// Per-parameter importance scorer (Section 7)
    pub importance_scorer: ParameterImportanceScorer,
    /// Gradient projector for safe updates (Section 8)
    pub gradient_projector: GradientProjector,
}
/// Fisher information for a single task
#[derive(Clone)]
pub struct TaskFisher {
/// Task identifier
pub task_id: u64,
/// Diagonal Fisher Information
pub fisher_diag: Vec<f32>,
/// Optimal weights at task completion
pub optimal_weights: Vec<f32>,
/// Task-specific lambda (learned)
pub lambda: f32,
/// Sample count used to compute Fisher
pub sample_count: usize,
/// Task quality score
pub quality: f32,
/// Timestamp
pub timestamp: i64,
}
impl EWCPlusPlusState {
/// Create new EWC++ state
pub fn new(num_params: usize, max_tasks: usize, lambda_base: f32) -> Self {
Self {
task_fishers: CircularBuffer::new(max_tasks),
max_tasks,
task_lambdas: Vec::new(),
lambda_base,
online_fisher: OnlineFisherEstimator::new(num_params),
task_detector: TaskBoundaryDetector::new(),
            importance_scores: vec![1.0; num_params],
            importance_scorer: ParameterImportanceScorer::new(num_params),
            gradient_projector: GradientProjector { null_space: None, task_subspace: None },
        }
}
/// Compute total EWC++ regularization loss
pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
let mut total_loss = 0.0;
// Sum over all remembered tasks
for task in self.task_fishers.iter() {
let task_loss: f32 = task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.zip(self.importance_scores.iter())
.map(|(((f, w), w_star), imp)| {
// Importance-weighted Fisher regularization
imp * f * (w - w_star).powi(2)
})
.sum();
total_loss += task.lambda * task_loss;
}
total_loss / 2.0
}
/// Compute gradients of EWC++ loss
pub fn regularization_gradient(&self, current_weights: &[f32]) -> Vec<f32> {
let mut grad = vec![0.0f32; current_weights.len()];
for task in self.task_fishers.iter() {
for (i, ((f, w), w_star)) in task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.enumerate()
{
// d/dw [F * (w - w*)²] = 2 * F * (w - w*)
grad[i] += task.lambda * self.importance_scores[i] * f * (w - w_star);
}
}
grad
}
/// Record completion of current task
pub fn complete_task(&mut self, weights: &[f32], quality: f32) {
let task_id = self.task_fishers.len() as u64;
// Finalize online Fisher estimate
let fisher_diag = self.online_fisher.finalize();
// Compute task-specific lambda based on quality
let lambda = self.compute_task_lambda(quality);
let task_fisher = TaskFisher {
task_id,
fisher_diag,
optimal_weights: weights.to_vec(),
lambda,
sample_count: self.online_fisher.sample_count(),
quality,
timestamp: chrono::Utc::now().timestamp(),
};
self.task_fishers.push(task_fisher);
self.task_lambdas.push(lambda);
// Reset online Fisher for next task
self.online_fisher.reset();
}
/// Compute task-specific lambda based on quality
fn compute_task_lambda(&self, quality: f32) -> f32 {
// Higher quality tasks get stronger protection
self.lambda_base * (0.5 + 0.5 * quality)
}
}
```
---
## 4. Online Fisher Estimation
### Streaming Fisher Information Computation
```rust
/// Online Fisher Information estimator using gradient accumulation
pub struct OnlineFisherEstimator {
/// Running sum of squared gradients
gradient_sq_sum: Vec<f32>,
/// Sample count
count: usize,
/// Exponential moving average decay
decay: f32,
/// Minimum samples before valid estimate
min_samples: usize,
}
impl OnlineFisherEstimator {
pub fn new(num_params: usize) -> Self {
Self {
gradient_sq_sum: vec![0.0; num_params],
count: 0,
decay: 0.99, // EMA decay factor
min_samples: 100,
}
}
/// Update Fisher estimate with new gradient sample
#[inline]
pub fn update(&mut self, gradients: &[f32]) {
self.count += 1;
if self.count == 1 {
// First sample: initialize
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = g * g;
}
} else {
// EMA update: F_new = decay * F_old + (1 - decay) * g²
let alpha = 1.0 - self.decay;
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = self.decay * *sum + alpha * g * g;
}
}
}
/// Finalize and return Fisher diagonal
pub fn finalize(&self) -> Vec<f32> {
if self.count < self.min_samples {
tracing::warn!(
count = self.count,
min = self.min_samples,
"Fisher estimate may be unreliable"
);
}
// Normalize and apply minimum threshold
let min_fisher = 1e-6;
self.gradient_sq_sum.iter()
.map(|&f| f.max(min_fisher))
.collect()
}
/// Reset for new task
pub fn reset(&mut self) {
self.gradient_sq_sum.fill(0.0);
self.count = 0;
}
pub fn sample_count(&self) -> usize {
self.count
}
}
```
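The EMA update means the Fisher estimate tracks a running average of squared gradients; with a constant gradient g it converges to g². A standalone sketch of the same recurrence:

```rust
/// EMA estimate of E[g^2], the same recurrence as the estimator above:
/// the first sample initializes, then F_new = decay * F_old + (1 - decay) * g^2.
pub fn ema_fisher(gradients: &[f32], decay: f32) -> f32 {
    let mut fisher = 0.0f32;
    for (i, g) in gradients.iter().enumerate() {
        if i == 0 {
            fisher = g * g;
        } else {
            fisher = decay * fisher + (1.0 - decay) * g * g;
        }
    }
    fisher.max(1e-6) // minimum threshold, as in finalize()
}
```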
---
## 5. Automatic Task Boundary Detection
### Detecting When the Task Changes
```rust
/// Automatic task boundary detection via distribution shift
pub struct TaskBoundaryDetector {
/// Recent query embedding buffer
recent_embeddings: CircularBuffer<Vec<f32>>,
/// Baseline distribution (mean, variance)
baseline: Option<DistributionStats>,
/// Threshold for detecting shift (Mahalanobis distance)
shift_threshold: f32,
/// Minimum samples before detection
warmup_samples: usize,
/// Current drift score
drift_score: f32,
}
impl TaskBoundaryDetector {
pub fn new() -> Self {
Self {
recent_embeddings: CircularBuffer::new(1000),
baseline: None,
shift_threshold: 3.0, // 3 sigma
warmup_samples: 500,
drift_score: 0.0,
}
}
/// Update with new embedding and check for task boundary
pub fn update(&mut self, embedding: &[f32]) -> TaskBoundaryResult {
self.recent_embeddings.push(embedding.to_vec());
if self.recent_embeddings.len() < self.warmup_samples {
return TaskBoundaryResult::Warmup;
}
match &self.baseline {
None => {
// First baseline establishment
self.baseline = Some(self.compute_stats());
TaskBoundaryResult::BaselineEstablished
}
Some(baseline) => {
// Compute current distribution
let current = self.compute_recent_stats(100);
// Mahalanobis distance between distributions
let distance = self.mahalanobis_distance(baseline, &current);
self.drift_score = distance;
if distance > self.shift_threshold {
// Task boundary detected!
self.baseline = Some(current);
TaskBoundaryResult::BoundaryDetected {
drift_score: distance,
}
} else {
TaskBoundaryResult::Stable {
drift_score: distance,
}
}
}
}
}
fn compute_stats(&self) -> DistributionStats {
let n = self.recent_embeddings.len();
let dim = self.recent_embeddings[0].len();
let mut mean = vec![0.0f32; dim];
let mut var = vec![0.0f32; dim];
// Compute mean
for emb in self.recent_embeddings.iter() {
for (m, e) in mean.iter_mut().zip(emb.iter()) {
*m += e;
}
}
for m in &mut mean {
*m /= n as f32;
}
// Compute variance
for emb in self.recent_embeddings.iter() {
for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
*v += (e - m).powi(2);
}
}
for v in &mut var {
*v /= n as f32;
*v = v.max(1e-6); // Avoid division by zero
}
DistributionStats { mean, variance: var }
}
    fn compute_recent_stats(&self, n: usize) -> DistributionStats {
        // Same as compute_stats, but over only the most recent n samples
        // (assumes the buffer's iterator is double-ended)
        let recent: Vec<_> = self.recent_embeddings.iter().rev().take(n).collect();
        let count = recent.len();
        let dim = recent[0].len();
        let mut mean = vec![0.0f32; dim];
        let mut var = vec![0.0f32; dim];
        for emb in &recent {
            for (m, e) in mean.iter_mut().zip(emb.iter()) {
                *m += e;
            }
        }
        for m in &mut mean {
            *m /= count as f32;
        }
        for emb in &recent {
            for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
                *v += (e - m).powi(2);
            }
        }
        for v in &mut var {
            *v /= count as f32;
            *v = v.max(1e-6);
        }
        DistributionStats { mean, variance: var }
    }
fn mahalanobis_distance(&self, a: &DistributionStats, b: &DistributionStats) -> f32 {
a.mean.iter()
.zip(b.mean.iter())
.zip(a.variance.iter())
.map(|((m_a, m_b), v)| (m_a - m_b).powi(2) / v)
.sum::<f32>()
.sqrt()
}
}
#[derive(Debug)]
pub enum TaskBoundaryResult {
Warmup,
BaselineEstablished,
Stable { drift_score: f32 },
BoundaryDetected { drift_score: f32 },
}
```
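The detector's simplified Mahalanobis distance is a variance-normalized distance between the baseline and recent means, compared against a 3σ threshold. A standalone sketch of just that math (the struct above is the real interface):

```rust
/// Variance-normalized distance between two mean vectors, matching the
/// detector's simplified (diagonal-covariance) Mahalanobis distance.
pub fn drift_distance(mean_a: &[f32], mean_b: &[f32], var_a: &[f32]) -> f32 {
    mean_a.iter()
        .zip(mean_b)
        .zip(var_a)
        .map(|((a, b), v)| (a - b).powi(2) / v.max(1e-6))
        .sum::<f32>()
        .sqrt()
}

/// A task boundary is declared when drift exceeds the threshold (3.0 ≈ 3σ).
pub fn is_task_boundary(drift: f32) -> bool {
    drift > 3.0
}
```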
---
## 6. Adaptive Lambda Scheduling
### Dynamic Regularization Strength
```rust
/// Adaptive lambda scheduler based on learning progress
pub struct AdaptiveLambdaScheduler {
/// Base lambda value
base_lambda: f32,
/// Current effective lambda
current_lambda: f32,
/// Performance history (task quality over time)
performance_history: Vec<f32>,
/// Lambda adjustment rate
adjustment_rate: f32,
}
impl AdaptiveLambdaScheduler {
pub fn new(base_lambda: f32) -> Self {
Self {
base_lambda,
current_lambda: base_lambda,
performance_history: Vec::new(),
adjustment_rate: 0.1,
}
}
/// Update lambda based on recent performance
pub fn update(&mut self, current_quality: f32, forgetting_detected: bool) {
self.performance_history.push(current_quality);
if forgetting_detected {
// Increase lambda to prevent forgetting
self.current_lambda *= 1.0 + self.adjustment_rate;
tracing::info!(
new_lambda = self.current_lambda,
"Increased lambda due to forgetting"
);
} else if self.is_learning_stalled() {
// Decrease lambda to allow more plasticity
self.current_lambda *= 1.0 - self.adjustment_rate;
self.current_lambda = self.current_lambda.max(self.base_lambda * 0.1);
tracing::info!(
new_lambda = self.current_lambda,
"Decreased lambda to increase plasticity"
);
}
// Clamp to reasonable range
self.current_lambda = self.current_lambda.clamp(
self.base_lambda * 0.1,
self.base_lambda * 10.0,
);
}
fn is_learning_stalled(&self) -> bool {
if self.performance_history.len() < 10 {
return false;
}
let recent: Vec<_> = self.performance_history.iter()
.rev()
.take(10)
.collect();
// Check if variance in recent performance is very low
let mean: f32 = recent.iter().map(|&&x| x).sum::<f32>() / 10.0;
let var: f32 = recent.iter()
.map(|&&x| (x - mean).powi(2))
.sum::<f32>() / 10.0;
var < 0.001 // Stalled if very low variance
}
pub fn get_lambda(&self) -> f32 {
self.current_lambda
}
}
```
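The scheduler's update is multiplicative with a clamp to [0.1·base, 10·base]. A standalone sketch of one update step, with the same 0.1 adjustment rate as above:

```rust
/// One scheduler step: raise lambda on forgetting, lower it when learning
/// is stalled, and clamp to [0.1 * base, 10 * base] as in the scheduler above.
pub fn step_lambda(current: f32, base: f32, forgetting: bool, stalled: bool) -> f32 {
    let rate = 0.1;
    let mut next = current;
    if forgetting {
        next *= 1.0 + rate; // protect past knowledge harder
    } else if stalled {
        next *= 1.0 - rate; // allow more plasticity
    }
    next.clamp(base * 0.1, base * 10.0)
}
```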
---
## 7. Parameter Importance Scoring
### Which Parameters Matter Most
```rust
/// Per-parameter importance scoring for selective regularization
pub struct ParameterImportanceScorer {
/// Importance scores (0-1 for each parameter)
scores: Vec<f32>,
/// Gradient magnitude history
gradient_magnitudes: Vec<CircularBuffer<f32>>,
/// Activation frequency
activation_frequency: Vec<f32>,
}
impl ParameterImportanceScorer {
pub fn new(num_params: usize) -> Self {
Self {
scores: vec![1.0; num_params],
gradient_magnitudes: (0..num_params)
.map(|_| CircularBuffer::new(100))
.collect(),
activation_frequency: vec![0.0; num_params],
}
}
/// Update importance based on gradient
pub fn update(&mut self, gradients: &[f32], activations: &[bool]) {
for (i, (g, &active)) in gradients.iter().zip(activations.iter()).enumerate() {
// Track gradient magnitude
self.gradient_magnitudes[i].push(g.abs());
// Track activation frequency
if active {
self.activation_frequency[i] = 0.99 * self.activation_frequency[i] + 0.01;
} else {
self.activation_frequency[i] *= 0.99;
}
}
// Recompute importance scores
self.recompute_scores();
}
fn recompute_scores(&mut self) {
for i in 0..self.scores.len() {
// Average gradient magnitude
let avg_grad: f32 = self.gradient_magnitudes[i].iter()
.sum::<f32>() / self.gradient_magnitudes[i].len().max(1) as f32;
// Importance = activation_freq * gradient_magnitude
// High activation + high gradient = important parameter
self.scores[i] = self.activation_frequency[i] * avg_grad;
}
// Normalize scores to [0, 1]
let max_score = self.scores.iter().cloned().fold(0.0f32, f32::max);
if max_score > 0.0 {
for s in &mut self.scores {
*s /= max_score;
}
}
}
pub fn get_scores(&self) -> &[f32] {
&self.scores
}
}
```
---
## 8. Gradient Projection
### Safe Parameter Updates
```rust
/// Project gradients to avoid interfering with important past knowledge
pub struct GradientProjector {
/// Null space of important task gradients
null_space: Option<Array2<f32>>,
/// Task gradient subspace (principal components)
task_subspace: Option<Array2<f32>>,
}
impl GradientProjector {
/// Project gradient to not interfere with past tasks
pub fn project(&self, gradient: &[f32]) -> Vec<f32> {
match &self.null_space {
Some(null) => {
                // Project onto the null space of past task gradients;
                // the stored matrix is a projector (symmetric, idempotent),
                // so a single application suffices
                let g = Array1::from_vec(gradient.to_vec());
                null.dot(&g).to_vec()
}
None => gradient.to_vec(),
}
}
/// Update null space with new task gradient directions
pub fn add_task_gradients(&mut self, task_gradients: &[Vec<f32>]) {
// Stack gradients into matrix
let n_samples = task_gradients.len();
let n_params = task_gradients[0].len();
let mut g_matrix = Array2::zeros((n_samples, n_params));
for (i, g) in task_gradients.iter().enumerate() {
for (j, &v) in g.iter().enumerate() {
g_matrix[[i, j]] = v;
}
}
        // SVD to find principal gradient directions in parameter space
        // (the rows of Vᵀ are right-singular vectors of length n_params)
        let (_, _, vt) = g_matrix.svd(false, true).unwrap();
        let vt = vt.unwrap();
        // For memory efficiency, keep only the top-k directions
        let k = 10.min(n_samples);
        let v_k = vt.slice(s![..k, ..]).to_owned();
        // Null-space projector: P = I - V_kᵀ V_k  (n_params x n_params)
        let identity = Array2::eye(n_params);
        let projection = identity - v_k.t().dot(&v_k);
        self.null_space = Some(projection);
}
}
```
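For a single protected direction v, the projector reduces to P = I − v vᵀ, and the projected gradient is exactly orthogonal to v. A dependency-free sketch of that rank-one case (the `GradientProjector` above uses ndarray and SVD; this shows only the geometry):

```rust
/// Project `g` onto the null space of unit vector `v`: g - (g·v) v.
/// This is P·g with P = I - v vᵀ, the rank-one case of the projector above.
pub fn project_out(g: &[f32], v: &[f32]) -> Vec<f32> {
    let dot: f32 = g.iter().zip(v).map(|(a, b)| a * b).sum();
    g.iter().zip(v).map(|(a, b)| a - dot * b).collect()
}
```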
---
## 9. Full EWC++ Training Loop
### Putting It All Together
```rust
/// Complete EWC++ training step
pub fn ewc_plus_plus_train_step(
model: &mut FastGRNNRouter,
ewc: &mut EWCPlusPlusState,
batch: &[RouterSample],
config: &TrainingConfig,
) -> TrainStepResult {
let mut result = TrainStepResult::default();
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| model.forward(&s.features))
.collect();
// Task loss
let task_loss = compute_cross_entropy_loss(&predictions, batch);
result.task_loss = task_loss;
// EWC++ regularization loss
let ewc_loss = ewc.regularization_loss(model.get_weights());
result.ewc_loss = ewc_loss;
// Total loss
let total_loss = task_loss + config.lambda * ewc_loss;
result.total_loss = total_loss;
// Compute task gradients
let task_gradients = compute_gradients(&task_loss, model);
// Compute EWC++ gradients
let ewc_gradients = ewc.regularization_gradient(model.get_weights());
// Total gradients
let mut gradients: Vec<f32> = task_gradients.iter()
.zip(ewc_gradients.iter())
.map(|(t, e)| t + config.lambda * e)
.collect();
// Gradient projection (optional, for harder constraints)
if config.use_gradient_projection {
gradients = ewc.gradient_projector.project(&gradients);
}
// Gradient clipping
let grad_norm: f32 = gradients.iter().map(|g| g * g).sum::<f32>().sqrt();
if grad_norm > config.max_grad_norm {
let scale = config.max_grad_norm / grad_norm;
for g in &mut gradients {
*g *= scale;
}
result.gradient_clipped = true;
}
// Apply gradients
model.apply_gradients(&gradients, config.learning_rate);
// Update online Fisher estimate
ewc.online_fisher.update(&task_gradients);
// Update parameter importance
let activations: Vec<bool> = model.get_activation_mask();
ewc.importance_scorer.update(&task_gradients, &activations);
// Check for task boundary
if let Some(query_emb) = batch.first().map(|s| &s.query_embedding) {
let boundary = ewc.task_detector.update(query_emb);
if let TaskBoundaryResult::BoundaryDetected { drift_score } = boundary {
// Complete current task and start new one
ewc.complete_task(model.get_weights(), result.compute_quality());
result.task_boundary_detected = true;
result.drift_score = drift_score;
}
}
result
}
```
---
## 10. Benchmarks and Validation
### Forgetting Resistance Metrics
```rust
/// Measure forgetting resistance on held-out test sets
pub struct ForgettingBenchmark {
/// Per-task test sets
task_test_sets: Vec<TestSet>,
/// Performance history per task
task_performance: Vec<Vec<f32>>,
}
impl ForgettingBenchmark {
/// Evaluate current model on all past tasks
pub fn evaluate(&mut self, model: &FastGRNNRouter) -> ForgettingReport {
let mut report = ForgettingReport::default();
for (task_id, test_set) in self.task_test_sets.iter().enumerate() {
let accuracy = self.evaluate_task(model, test_set);
self.task_performance[task_id].push(accuracy);
// Compute forgetting = max_accuracy - current_accuracy
let max_acc = self.task_performance[task_id].iter()
.cloned()
.fold(0.0f32, f32::max);
let forgetting = (max_acc - accuracy).max(0.0);
report.per_task_accuracy.push(accuracy);
report.per_task_forgetting.push(forgetting);
}
// Average forgetting
report.avg_forgetting = report.per_task_forgetting.iter()
.sum::<f32>() / report.per_task_forgetting.len().max(1) as f32;
// Backward transfer (negative forgetting = improvement)
report.backward_transfer = -report.avg_forgetting;
report
}
fn evaluate_task(&self, model: &FastGRNNRouter, test: &TestSet) -> f32 {
let correct = test.samples.iter()
.filter(|s| model.forward(&s.features).predicted_class == s.label)
.count();
correct as f32 / test.samples.len() as f32
}
}
#[derive(Debug, Default)]
pub struct ForgettingReport {
pub per_task_accuracy: Vec<f32>,
pub per_task_forgetting: Vec<f32>,
pub avg_forgetting: f32,
pub backward_transfer: f32,
}
```
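The benchmark's per-task forgetting is simply the best historical accuracy minus the current accuracy, floored at zero. A standalone sketch:

```rust
/// Forgetting for one task: best accuracy ever observed minus current
/// accuracy, floored at zero (improvement counts as backward transfer).
pub fn forgetting(history: &[f32], current: f32) -> f32 {
    let max_acc = history.iter().cloned().fold(current, f32::max);
    (max_acc - current).max(0.0)
}
```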
---
## Summary: EWC++ vs Standard EWC
| Feature | Standard EWC | SONA EWC++ |
|---------|-------------|------------|
| Task memory | 1 task | N tasks (configurable) |
| Fisher estimation | Offline, single | Online, streaming |
| Lambda | Fixed | Adaptive per-task |
| Task detection | Manual | Automatic |
| Parameter importance | Uniform | Learned |
| Gradient handling | Direct | Projected |
| Forgetting rate | ~5-10% | **<0.1%** |
EWC++ enables SONA to learn continuously from every interaction while maintaining near-perfect retention of past knowledge.

# SONA ReasoningBank: Pattern-Driven Self-Optimization
## Learning from Experience Through Trajectory Analysis
---
## 1. Overview
ReasoningBank is SONA's long-term pattern memory, learning what works and applying that knowledge to optimize future decisions.
```
┌─────────────────────────────────────────────────────────────────────┐
│ REASONINGBANK CONCEPT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query → [What worked before?] → Pattern Match → Optimized Params │
│ ↑ │
│ │ │
│ ┌───────┴────────┐ │
│ │ REASONINGBANK │ │
│ │ │ │
│ │ • Trajectories │ ← Record every query │
│ │ • Patterns │ ← Extract from clusters │
│ │ • Verdicts │ ← What params worked best │
│ │ • Confidence │ ← How certain we are │
│ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Core Data Structures
### Trajectory: Recording Every Interaction
```rust
/// A single query trajectory with outcomes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct QueryTrajectory {
/// Unique trajectory ID
pub id: u64,
/// Query embedding vector
pub query_embedding: Vec<f32>,
/// Search parameters used
pub search_params: SearchParams,
/// Retrieved result IDs
pub retrieved_ids: Vec<String>,
/// Precision (relevant / retrieved)
pub precision: f32,
/// Recall (retrieved_relevant / total_relevant)
pub recall: f32,
/// Latency in microseconds
pub latency_us: u64,
/// User feedback if provided
pub feedback: Option<UserFeedback>,
/// Timestamp
pub timestamp: i64,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SearchParams {
/// ef_search parameter for HNSW
pub ef_search: usize,
/// Number of probes for IVF
pub n_probes: usize,
/// Model tier selected
pub model_tier: ModelTier,
/// Context window size
pub context_tokens: usize,
/// Temperature
pub temperature: f32,
}
```
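The `precision` and `recall` fields above are derived by comparing the retrieved IDs against a relevance judgment. A minimal sketch of that computation (the `relevant` set is a hypothetical ground-truth label, not part of this spec):

```rust
use std::collections::HashSet;

/// Compute (precision, recall) for one trajectory's retrieval:
/// precision = |relevant ∩ retrieved| / |retrieved|,
/// recall    = |relevant ∩ retrieved| / |relevant|.
fn precision_recall(retrieved: &[&str], relevant: &HashSet<&str>) -> (f32, f32) {
    let hits = retrieved.iter().filter(|id| relevant.contains(**id)).count() as f32;
    let precision = if retrieved.is_empty() { 0.0 } else { hits / retrieved.len() as f32 };
    let recall = if relevant.is_empty() { 0.0 } else { hits / relevant.len() as f32 };
    (precision, recall)
}
```

These two numbers are what the ReasoningBank later weights by when judging which parameter configurations worked.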
### Pattern: Learned Behavior Clusters
```rust
/// A learned pattern extracted from trajectory clusters
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
/// Pattern ID
pub id: u64,
/// Centroid embedding (cluster center)
pub centroid: Vec<f32>,
/// Optimal search parameters for this pattern
pub optimal_params: SearchParams,
/// Confidence score (0-1)
pub confidence: f32,
/// Number of trajectories in cluster
pub support_count: usize,
/// Average precision for pattern
pub avg_precision: f32,
/// Average recall for pattern
pub avg_recall: f32,
/// Average latency
pub avg_latency_us: u64,
/// Pattern creation timestamp
pub created_at: i64,
/// Last update timestamp
pub updated_at: i64,
/// Abstraction level (0 = concrete, higher = more abstract)
pub abstraction_level: u32,
/// Child pattern IDs (for hierarchical patterns)
pub children: Vec<u64>,
}
```
### Verdict: Decision Judgments
```rust
/// Verdict on what parameters worked best
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Verdict {
/// Pattern this verdict applies to
pub pattern_id: u64,
/// Recommended parameters
pub recommended_params: SearchParams,
/// Confidence in recommendation
pub confidence: f32,
/// Evidence supporting this verdict
pub evidence: VerdictEvidence,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct VerdictEvidence {
/// Number of supporting trajectories
pub support_count: usize,
/// Average improvement over default
pub avg_improvement: f32,
/// Statistical significance (p-value)
pub p_value: f32,
/// Consistency score (low variance = high consistency)
pub consistency: f32,
}
```
---
## 3. ReasoningBank Implementation
### Core Storage and Retrieval
```rust
use dashmap::DashMap;
use parking_lot::RwLock;
/// ReasoningBank: Pattern-based learning and optimization
pub struct ReasoningBank {
/// Trajectory ring buffer (recent interactions)
trajectories: RwLock<CircularBuffer<QueryTrajectory>>,
/// Learned patterns (concurrent hashmap)
patterns: DashMap<u64, LearnedPattern>,
/// Pattern index for fast similarity lookup
pattern_index: RwLock<HNSWIndex>,
/// Verdicts per pattern
verdicts: DashMap<u64, Verdict>,
/// Configuration
config: ReasoningBankConfig,
/// Pattern ID counter
next_pattern_id: AtomicU64,
/// Statistics
stats: RwLock<ReasoningBankStats>,
}
impl ReasoningBank {
/// Create new ReasoningBank
pub fn new(config: ReasoningBankConfig) -> Self {
Self {
trajectories: RwLock::new(CircularBuffer::new(config.trajectory_capacity)),
patterns: DashMap::new(),
pattern_index: RwLock::new(HNSWIndex::new(config.embedding_dim, config.ef_construction)),
verdicts: DashMap::new(),
config,
next_pattern_id: AtomicU64::new(0),
stats: RwLock::new(ReasoningBankStats::default()),
}
}
/// Record a new trajectory
#[inline]
pub fn record_trajectory(&self, trajectory: QueryTrajectory) {
let mut trajectories = self.trajectories.write();
trajectories.push(trajectory);
// Update stats
let mut stats = self.stats.write();
stats.total_trajectories += 1;
}
/// Find most similar pattern to query
pub fn find_similar_pattern(&self, query_embedding: &[f32], k: usize) -> Vec<PatternMatch> {
let index = self.pattern_index.read();
let neighbors = index.search(query_embedding, k, self.config.ef_search);
neighbors.iter()
.filter_map(|&(id, distance)| {
self.patterns.get(&id).map(|p| PatternMatch {
pattern: p.clone(),
similarity: 1.0 - distance, // Convert distance to similarity
})
})
.collect()
}
/// Get optimized parameters for query
pub fn get_optimized_params(&self, query_embedding: &[f32]) -> OptimizedParams {
// Find similar patterns
let matches = self.find_similar_pattern(query_embedding, self.config.top_k_patterns);
if matches.is_empty() {
// No matching patterns - use defaults
return OptimizedParams {
params: SearchParams::default(),
confidence: 0.0,
source: ParamSource::Default,
};
}
// Interpolate parameters based on similarity and confidence.
// Accumulate in f32 so each weighted term isn't truncated to usize mid-sum,
// and so the defaults don't leak into the weighted average.
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut total_weight = 0.0f32;
for m in &matches {
    let weight = m.similarity * m.pattern.confidence;
    total_weight += weight;
    ef_sum += m.pattern.optimal_params.ef_search as f32 * weight;
    probes_sum += m.pattern.optimal_params.n_probes as f32 * weight;
    temp_sum += m.pattern.optimal_params.temperature * weight;
    // ... other params
}
let mut weighted_params = SearchParams::default();
if total_weight > 0.0 {
    weighted_params.ef_search = (ef_sum / total_weight).round() as usize;
    weighted_params.n_probes = (probes_sum / total_weight).round() as usize;
    weighted_params.temperature = temp_sum / total_weight;
}
OptimizedParams {
params: weighted_params,
confidence: total_weight / matches.len() as f32,
source: ParamSource::Pattern(matches[0].pattern.id),
}
}
/// Record feedback for trajectory
pub fn record_feedback(&self, trajectory_id: u64, feedback: UserFeedback) {
// Find trajectory and update
let mut trajectories = self.trajectories.write();
if let Some(traj) = trajectories.iter_mut().find(|t| t.id == trajectory_id) {
traj.feedback = Some(feedback.clone());
}
// Update related pattern confidence
// Higher feedback = higher confidence in that pattern's params
if let Some(pattern_id) = self.find_pattern_for_trajectory(trajectory_id) {
if let Some(mut pattern) = self.patterns.get_mut(&pattern_id) {
let feedback_delta = feedback.rating as f32 / 5.0 - 0.5; // -0.5 to +0.5
pattern.confidence = (pattern.confidence + 0.1 * feedback_delta).clamp(0.0, 1.0);
}
}
}
}
```
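The `CircularBuffer` used for trajectory storage is referenced but never defined in this spec. A minimal fixed-capacity ring buffer over `VecDeque` (a sketch, not the production implementation) would look like:

```rust
use std::collections::VecDeque;

/// Fixed-capacity ring buffer: once full, pushing evicts the oldest entry.
pub struct CircularBuffer<T> {
    buf: VecDeque<T>,
    capacity: usize,
}

impl<T> CircularBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append an item, dropping the oldest one if the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front(); // evict oldest
        }
        self.buf.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }

    /// Mutable iteration, as needed by `record_feedback`.
    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        self.buf.iter_mut()
    }
}
```

Bounding the buffer keeps trajectory memory constant: old interactions age out once patterns have been extracted from them.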
---
## 4. Pattern Extraction
### K-Means++ Clustering for Pattern Discovery
```rust
/// Pattern extractor using K-means++ clustering
pub struct PatternExtractor {
/// Number of clusters to extract
k: usize,
/// Maximum iterations
max_iter: usize,
/// Convergence threshold
epsilon: f32,
}
impl PatternExtractor {
/// Extract patterns from trajectories
pub fn extract(&self, trajectories: &[QueryTrajectory]) -> Vec<LearnedPattern> {
if trajectories.len() < self.k {
return Vec::new();
}
// Collect embeddings
let embeddings: Vec<&[f32]> = trajectories.iter()
.map(|t| t.query_embedding.as_slice())
.collect();
// K-means++ initialization
let mut centroids = self.kmeans_plus_plus_init(&embeddings);
// K-means iteration
let mut assignments = vec![0usize; trajectories.len()];
for _ in 0..self.max_iter {
// Assignment step
let old_assignments = assignments.clone();
for (i, emb) in embeddings.iter().enumerate() {
let mut min_dist = f32::MAX;
let mut min_idx = 0;
for (c_idx, centroid) in centroids.iter().enumerate() {
let dist = euclidean_distance(emb, centroid);
if dist < min_dist {
min_dist = dist;
min_idx = c_idx;
}
}
assignments[i] = min_idx;
}
// Check convergence
if assignments == old_assignments {
break;
}
// Update step
centroids = self.compute_centroids(&embeddings, &assignments);
}
// Create patterns from clusters
let mut patterns = Vec::new();
for cluster_id in 0..self.k {
let cluster_trajectories: Vec<_> = trajectories.iter()
.zip(assignments.iter())
.filter(|(_, &a)| a == cluster_id)
.map(|(t, _)| t)
.collect();
if cluster_trajectories.len() < 3 {
continue; // Skip small clusters
}
let pattern = self.create_pattern_from_cluster(
cluster_id as u64,
&centroids[cluster_id],
&cluster_trajectories,
);
patterns.push(pattern);
}
patterns
}
fn kmeans_plus_plus_init(&self, embeddings: &[&[f32]]) -> Vec<Vec<f32>> {
use rand::Rng; // brings gen_range / gen into scope
let mut centroids = Vec::with_capacity(self.k);
let mut rng = rand::thread_rng();
// First centroid: random
let first_idx = rng.gen_range(0..embeddings.len());
centroids.push(embeddings[first_idx].to_vec());
// Remaining centroids: D² weighting
for _ in 1..self.k {
let mut distances: Vec<f32> = embeddings.iter()
.map(|emb| {
centroids.iter()
.map(|c| euclidean_distance(emb, c))
.fold(f32::MAX, f32::min)
})
.collect();
// Square distances for D² sampling
let total: f32 = distances.iter().map(|d| d * d).sum();
let threshold = rng.gen::<f32>() * total;
let mut cumsum = 0.0;
let mut selected = 0;
for (i, d) in distances.iter().enumerate() {
cumsum += d * d;
if cumsum >= threshold {
selected = i;
break;
}
}
centroids.push(embeddings[selected].to_vec());
}
centroids
}
fn create_pattern_from_cluster(
&self,
id: u64,
centroid: &[f32],
trajectories: &[&QueryTrajectory],
) -> LearnedPattern {
// Compute optimal params as weighted average by quality
let mut total_weight = 0.0f32;
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut precision_sum = 0.0f32;
let mut recall_sum = 0.0f32;
let mut latency_sum = 0u64;
for t in trajectories {
let weight = t.precision * t.recall; // Quality as weight
total_weight += weight;
ef_sum += t.search_params.ef_search as f32 * weight;
probes_sum += t.search_params.n_probes as f32 * weight;
temp_sum += t.search_params.temperature * weight;
precision_sum += t.precision;
recall_sum += t.recall;
latency_sum += t.latency_us;
}
let n = trajectories.len() as f32;
let tw = total_weight.max(1e-6); // guard: all trajectories may have zero quality weight
LearnedPattern {
    id,
    centroid: centroid.to_vec(),
    optimal_params: SearchParams {
        ef_search: (ef_sum / tw).round() as usize,
        n_probes: (probes_sum / tw).round() as usize,
        model_tier: ModelTier::Auto, // Determined separately
        context_tokens: 2048, // Default
        temperature: temp_sum / tw,
    },
confidence: (total_weight / n).clamp(0.0, 1.0),
support_count: trajectories.len(),
avg_precision: precision_sum / n,
avg_recall: recall_sum / n,
avg_latency_us: latency_sum / trajectories.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: 0,
children: Vec::new(),
}
}
}
```
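The clustering code above calls `euclidean_distance`, and several later sections call `cosine_similarity`, without defining either. Plausible implementations, assuming equal-length slices:

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Cosine similarity in [-1, 1]; returns 0.0 for zero-norm inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

K-means clusters on Euclidean distance, while the pattern index and dream evaluator compare embeddings by cosine similarity.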
---
## 5. Verdict Judgment System
### Evaluating What Works Best
```rust
/// Verdict judge for parameter optimization
pub struct VerdictJudge {
/// Minimum samples for statistical significance
min_samples: usize,
/// Significance level (p-value threshold)
alpha: f32,
}
impl VerdictJudge {
/// Judge optimal parameters for a pattern
pub fn judge(&self, pattern: &LearnedPattern, trajectories: &[&QueryTrajectory]) -> Option<Verdict> {
if trajectories.len() < self.min_samples {
return None; // Not enough evidence
}
// Group trajectories by parameter configuration
let mut param_groups: HashMap<ParamKey, Vec<&QueryTrajectory>> = HashMap::new();
for t in trajectories {
let key = ParamKey::from(&t.search_params);
param_groups.entry(key).or_default().push(t);
}
// Find best performing configuration
let mut best_config: Option<(ParamKey, f32, Vec<&QueryTrajectory>)> = None;
for (key, group) in &param_groups {
if group.len() < 3 {
continue;
}
// Compute quality score (F1 of precision and recall)
let avg_quality: f32 = group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.sum::<f32>() / group.len() as f32;
match &best_config {
None => best_config = Some((key.clone(), avg_quality, group.clone())),
Some((_, best_quality, _)) if avg_quality > *best_quality => {
best_config = Some((key.clone(), avg_quality, group.clone()));
}
_ => {}
}
}
let (best_key, best_quality, best_group) = best_config?;
// Statistical significance test
let p_value = self.compute_significance(&best_group, trajectories);
if p_value > self.alpha {
return None; // Not significant
}
// Compute consistency (inverse of coefficient of variation)
let qualities: Vec<f32> = best_group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.collect();
let mean = qualities.iter().sum::<f32>() / qualities.len() as f32;
let variance = qualities.iter()
.map(|q| (q - mean).powi(2))
.sum::<f32>() / qualities.len() as f32;
let std_dev = variance.sqrt();
let consistency = 1.0 / (1.0 + std_dev / mean);
// Compute improvement over default (guard against a zero baseline)
let default_quality = self.compute_default_quality(trajectories).max(1e-6);
let improvement = (best_quality - default_quality) / default_quality;
Some(Verdict {
pattern_id: pattern.id,
recommended_params: best_key.to_params(),
confidence: best_quality * consistency,
evidence: VerdictEvidence {
support_count: best_group.len(),
avg_improvement: improvement,
p_value,
consistency,
},
})
}
fn compute_significance(&self, best: &[&QueryTrajectory], all: &[&QueryTrajectory]) -> f32 {
// Welch's t-test for comparing means
let best_qualities: Vec<f32> = best.iter()
.map(|t| t.precision * t.recall)
.collect();
let all_qualities: Vec<f32> = all.iter()
.map(|t| t.precision * t.recall)
.collect();
welch_t_test(&best_qualities, &all_qualities)
}
fn compute_default_quality(&self, trajectories: &[&QueryTrajectory]) -> f32 {
// Assume first configuration or most common is "default"
let default_group: Vec<_> = trajectories.iter()
.filter(|t| t.search_params.ef_search == SearchParams::default().ef_search)
.collect();
if default_group.is_empty() {
0.5 // Baseline assumption
} else {
default_group.iter()
.map(|t| t.precision * t.recall)
.sum::<f32>() / default_group.len() as f32
}
}
}
```
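`welch_t_test` is referenced above but not defined. A sketch that returns an approximate two-sided p-value using a normal approximation (a production version would use the t-distribution CDF with Welch–Satterthwaite degrees of freedom; the `erfc_approx` polynomial is Abramowitz & Stegun 7.1.25):

```rust
/// Complementary error function approximation for x >= 0 (|error| < 2.5e-5).
fn erfc_approx(x: f32) -> f32 {
    let t = 1.0 / (1.0 + 0.47047 * x);
    t * (0.3480242 + t * (-0.0958798 + t * 0.7478556)) * (-x * x).exp()
}

/// Welch's t-test: approximate two-sided p-value for mean(a) != mean(b).
fn welch_t_test(a: &[f32], b: &[f32]) -> f32 {
    let mean = |x: &[f32]| x.iter().sum::<f32>() / x.len() as f32;
    let var = |x: &[f32], m: f32| {
        x.iter().map(|v| (v - m).powi(2)).sum::<f32>() / (x.len() as f32 - 1.0)
    };
    let (ma, mb) = (mean(a), mean(b));
    // Welch's standard error: variances are NOT pooled
    let se = (var(a, ma) / a.len() as f32 + var(b, mb) / b.len() as f32).sqrt();
    if se == 0.0 {
        return 1.0; // no variance at all: no evidence of a difference
    }
    let t = (ma - mb).abs() / se;
    // Two-sided p-value under the normal approximation: p = erfc(|t| / sqrt(2))
    erfc_approx(t / std::f32::consts::SQRT_2)
}
```

Identical samples yield p = 1.0 (no evidence), while clearly separated groups drive p toward 0, which is what the judge's `alpha` threshold gates on.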
---
## 6. Integration with Router
### Using ReasoningBank to Optimize Router Decisions
```rust
impl FastGRNNRouter {
/// Forward pass with ReasoningBank optimization
pub fn forward_with_reasoning(
&self,
features: &[f32],
reasoning_bank: &ReasoningBank,
) -> RouterDecision {
// Get pattern-based parameter suggestions
let pattern_params = reasoning_bank.get_optimized_params(features);
// Standard router forward
let mut decision = self.forward(features);
// Blend router decision with pattern suggestions
if pattern_params.confidence > 0.5 {
let blend_factor = pattern_params.confidence * 0.3; // Max 30% influence
// Interpolate temperature
decision.temperature = (1.0 - blend_factor) * decision.temperature
+ blend_factor * pattern_params.params.temperature;
// Context token suggestion influences context selection
let suggested_context = pattern_params.params.context_tokens;
let router_context = decision.context_tokens;
decision.context_tokens = ((1.0 - blend_factor) * router_context as f32
+ blend_factor * suggested_context as f32) as usize;
decision.reasoning_confidence = pattern_params.confidence;
decision.reasoning_pattern_id = pattern_params.source.pattern_id();
}
decision
}
}
```
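The blending rule above caps pattern influence at 30% of the final decision. A standalone sketch of that interpolation (the function name is illustrative, not the router's actual API):

```rust
/// Blend a router-chosen value with a pattern suggestion.
/// Influence scales with pattern confidence, capped at 30%.
fn blend(router_value: f32, pattern_value: f32, pattern_confidence: f32) -> f32 {
    let blend_factor = pattern_confidence * 0.3; // max 30% influence
    (1.0 - blend_factor) * router_value + blend_factor * pattern_value
}
```

Even a fully confident pattern can only pull a parameter 30% of the way toward its suggestion, so the router's learned policy always dominates.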
---
## 7. Pattern Consolidation and Pruning
### Managing Pattern Memory
```rust
impl ReasoningBank {
/// Consolidate similar patterns
pub fn consolidate_patterns(&mut self) {
// Find similar pattern pairs
let pattern_ids: Vec<u64> = self.patterns.iter()
.map(|p| *p.key())
.collect();
let mut to_merge: Vec<(u64, u64)> = Vec::new();
for i in 0..pattern_ids.len() {
for j in (i+1)..pattern_ids.len() {
let p1 = self.patterns.get(&pattern_ids[i]).unwrap();
let p2 = self.patterns.get(&pattern_ids[j]).unwrap();
let similarity = cosine_similarity(&p1.centroid, &p2.centroid);
if similarity > 0.95 {
// Very similar - merge
to_merge.push((pattern_ids[i], pattern_ids[j]));
}
}
}
// Merge patterns
for (keep_id, remove_id) in to_merge {
    // Clone the pattern being removed first: holding a `get` and a `get_mut`
    // guard on the same DashMap at once can deadlock, and iterating a field
    // while mutably borrowing it does not compile.
    let removed = match self.patterns.get(&remove_id) {
        Some(p) => p.clone(),
        None => continue,
    };
    if let Some(mut keep) = self.patterns.get_mut(&keep_id) {
        // Weighted average of centroids
        let total_support = keep.support_count + removed.support_count;
        let w1 = keep.support_count as f32 / total_support as f32;
        let w2 = removed.support_count as f32 / total_support as f32;
        for (c, c2) in keep.centroid.iter_mut().zip(removed.centroid.iter()) {
            *c = w1 * *c + w2 * c2;
        }
        // Update support count
        keep.support_count = total_support;
        keep.confidence = (keep.confidence * w1 + removed.confidence * w2).min(1.0);
        keep.updated_at = chrono::Utc::now().timestamp();
    }
    // Remove merged pattern
    self.patterns.remove(&remove_id);
}
}
/// Prune low-confidence patterns
pub fn prune_patterns(&mut self, min_confidence: f32, min_support: usize) {
let to_remove: Vec<u64> = self.patterns.iter()
.filter(|p| p.confidence < min_confidence || p.support_count < min_support)
.map(|p| *p.key())
.collect();
for id in to_remove {
self.patterns.remove(&id);
self.verdicts.remove(&id);
}
}
/// Build pattern hierarchy (abstraction levels)
pub fn build_hierarchy(&mut self) {
// Hierarchical clustering on existing patterns
let patterns: Vec<_> = self.patterns.iter()
.map(|p| (p.key().clone(), p.centroid.clone()))
.collect();
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.fit(&patterns);
// Create meta-patterns at each level
for level in 1..=3 {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 1 {
let child_ids: Vec<u64> = cluster.member_ids();
let meta_centroid = cluster.centroid();
// Average params from children
let children: Vec<_> = child_ids.iter()
.filter_map(|id| self.patterns.get(id))
.collect();
let meta_params = self.average_params(&children);
let meta_pattern = LearnedPattern {
id: self.next_pattern_id.fetch_add(1, Ordering::SeqCst),
centroid: meta_centroid,
optimal_params: meta_params,
confidence: children.iter().map(|c| c.confidence).sum::<f32>() / children.len() as f32,
support_count: children.iter().map(|c| c.support_count).sum(),
avg_precision: children.iter().map(|c| c.avg_precision).sum::<f32>() / children.len() as f32,
avg_recall: children.iter().map(|c| c.avg_recall).sum::<f32>() / children.len() as f32,
avg_latency_us: children.iter().map(|c| c.avg_latency_us).sum::<u64>() / children.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: level as u32,
children: child_ids,
};
self.patterns.insert(meta_pattern.id, meta_pattern);
}
}
}
}
}
```
---
## 8. Statistics and Monitoring
```rust
#[derive(Default, Debug)]
pub struct ReasoningBankStats {
/// Total trajectories recorded
pub total_trajectories: u64,
/// Total patterns stored
pub total_patterns: usize,
/// Total verdicts issued
pub total_verdicts: usize,
/// Pattern match hit rate
pub pattern_hit_rate: f32,
/// Average confidence in recommendations
pub avg_recommendation_confidence: f32,
/// Improvement from pattern optimization
pub avg_improvement_percent: f32,
}
impl ReasoningBank {
/// Get current statistics
pub fn stats(&self) -> ReasoningBankStats {
let stats = self.stats.read();
ReasoningBankStats {
total_trajectories: stats.total_trajectories,
total_patterns: self.patterns.len(),
total_verdicts: self.verdicts.len(),
pattern_hit_rate: stats.pattern_hit_rate,
avg_recommendation_confidence: stats.avg_recommendation_confidence,
avg_improvement_percent: stats.avg_improvement_percent,
}
}
/// Export all patterns for persistence
pub fn export(&self) -> ReasoningBankExport {
ReasoningBankExport {
patterns: self.patterns.iter()
.map(|p| p.value().clone())
.collect(),
verdicts: self.verdicts.iter()
.map(|v| v.value().clone())
.collect(),
}
}
/// Import patterns from persistence
pub fn import(&mut self, export: ReasoningBankExport) {
for pattern in export.patterns {
let id = pattern.id;
self.patterns.insert(id, pattern.clone());
self.pattern_index.write().insert(id, &pattern.centroid);
}
for verdict in export.verdicts {
self.verdicts.insert(verdict.pattern_id, verdict);
}
}
}
```
---
## Summary
ReasoningBank enables SONA to:
1. **Learn from every query** through trajectory recording
2. **Discover patterns** via K-means++ clustering
3. **Judge what works** through statistical verdict analysis
4. **Optimize future decisions** by interpolating from similar patterns
5. **Build abstractions** through hierarchical pattern consolidation
This creates a continuously improving system where past experience directly enhances future performance.

---
# SONA Memory Dreams: Offline Consolidation Engine
## Creativity Through Neural Replay and Recombination
---
## 1. Biological Inspiration
### Why Dreams Matter for Learning
```
HUMAN SLEEP-BASED LEARNING
══════════════════════════
Awake:                  Sleep (REM):              Next Day:
─────────────────       ─────────────────         ─────────────────
• New experiences       • Replay memories         • Consolidated knowledge
• Pattern matching      • Recombine ideas         • Novel insights
• Working memory        • Strengthen important    • Creative connections
                        • Prune unimportant
```
Research shows that:
- **Memory consolidation** happens during sleep
- **Creative insights** emerge from random memory replay
- **Neural pruning** removes low-value connections
- **Analogical reasoning** connects distant concepts
SONA's Dream Engine replicates these mechanisms for AI self-improvement.
---
## 2. Dream Engine Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ DREAM ENGINE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ MEMORY GRAPH │──────┐ │
│ └───────────────┘ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM GENERATOR │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Random │ │Weighted │ │ │
│ │ │ Walks │ │ Sampling│ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Dream Sequence │ │ │
│ │ │ [M₁→M₂→M₃→...→Mₙ] │ │ │
│ │ └──────────┬───────────┘ │ │
│ └─────────────┼───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM EVALUATOR │ │
│ │ │ │
│ │ • Novelty Score (new connections?) │ │
│ │ • Coherence Score (makes sense?) │ │
│ │ • Utility Score (useful insight?) │ │
│ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM INTEGRATOR │ │
│ │ │ │
│ │ • Add weak creative edges │ │
│ │ • Update pattern associations │ │
│ │ • Generate novel hypotheses │ │
│ └─────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Dream Generation
### Random Walk Memory Replay
```rust
/// Dream generator using random walks on memory graph
pub struct DreamGenerator {
/// Temperature for random walk (higher = more random)
temperature: f32,
/// Maximum dream length
max_length: usize,
/// Minimum coherence threshold
min_coherence: f32,
/// Creativity bias (prefer novel connections)
creativity_bias: f32,
}
impl DreamGenerator {
/// Generate a single dream sequence
pub fn generate_dream(
&self,
memory: &MemoryGraph,
start_node: Option<NodeId>,
) -> Dream {
let mut sequence = Vec::new();
let mut visited = HashSet::new();
// Start from random high-activation node if not specified
let current = start_node.unwrap_or_else(|| {
memory.sample_by_activation()
});
sequence.push(current);
visited.insert(current);
// Random walk with creativity-weighted transitions
for _ in 0..self.max_length {
let neighbors = memory.get_neighbors(current);
if neighbors.is_empty() {
break;
}
// Compute transition probabilities
let probs: Vec<f32> = neighbors.iter()
.map(|&(neighbor, edge_weight)| {
let novelty_bonus = if visited.contains(&neighbor) {
0.1 // Discourage revisits
} else {
1.0 + self.creativity_bias * (1.0 - memory.get_access_frequency(neighbor))
};
(edge_weight * novelty_bonus).powf(1.0 / self.temperature)
})
.collect();
// Sample next node
let next = sample_weighted(&neighbors, &probs);
if let Some((next_node, _)) = next {
sequence.push(next_node);
visited.insert(next_node);
} else {
break;
}
}
Dream {
sequence,
temperature: self.temperature,
timestamp: chrono::Utc::now().timestamp(),
}
}
/// Generate creative jump dream (non-local connections)
pub fn generate_creative_dream(
&self,
memory: &MemoryGraph,
num_jumps: usize,
) -> Dream {
let mut sequence = Vec::new();
// Sample diverse starting points
let anchors = memory.sample_diverse(num_jumps, 0.3);
for anchor in anchors {
sequence.push(anchor);
// Short local walk from each anchor
let local_walk = self.generate_dream(memory, Some(anchor));
sequence.extend(local_walk.sequence.iter().skip(1).take(3));
}
Dream {
sequence,
temperature: self.temperature * 2.0, // Higher temperature for creative dreams
timestamp: chrono::Utc::now().timestamp(),
}
}
}
/// A dream sequence
pub struct Dream {
/// Sequence of visited memory nodes
pub sequence: Vec<NodeId>,
/// Temperature used for generation
pub temperature: f32,
/// Generation timestamp
pub timestamp: i64,
}
```
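`sample_weighted` is assumed above without a definition. A minimal roulette-wheel sampler over precomputed weights (a sketch: the signature and the tiny LCG are illustrative, and the real implementation would use the `rand` crate and return the `(NodeId, weight)` pair the walk expects):

```rust
/// Tiny linear congruential generator for illustration; returns a value in [0, 1).
fn lcg_next(state: &mut u64) -> f32 {
    *state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    ((*state >> 40) as f32) / (1u64 << 24) as f32
}

/// Sample one item proportionally to its weight; None if all weights are zero.
fn sample_weighted<T: Copy>(items: &[T], weights: &[f32], rng_state: &mut u64) -> Option<T> {
    let total: f32 = weights.iter().sum();
    if total <= 0.0 || items.is_empty() {
        return None;
    }
    // Walk the cumulative weights until the random threshold is crossed
    let mut threshold = lcg_next(rng_state) * total;
    for (item, w) in items.iter().zip(weights.iter()) {
        threshold -= w;
        if threshold < 0.0 {
            return Some(*item);
        }
    }
    items.last().copied() // floating-point rounding fallback
}
```

Raising the walk's `temperature` flattens the weight distribution before this sampler runs, which is what makes high-temperature dreams more random.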
---
## 4. Dream Evaluation
### Measuring Dream Quality
```rust
/// Evaluator for dream quality
pub struct DreamEvaluator {
/// Memory graph reference
memory: Arc<MemoryGraph>,
/// Novelty detection threshold
novelty_threshold: f32,
}
impl DreamEvaluator {
/// Evaluate dream quality across multiple dimensions
pub fn evaluate(&self, dream: &Dream) -> DreamQuality {
DreamQuality {
novelty: self.compute_novelty(dream),
coherence: self.compute_coherence(dream),
utility: self.compute_utility(dream),
diversity: self.compute_diversity(dream),
}
}
/// Novelty: How many new connections are suggested?
fn compute_novelty(&self, dream: &Dream) -> f32 {
let mut novel_pairs = 0;
let mut total_pairs = 0;
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len() {
total_pairs += 1;
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
// Check if edge exists
if !self.memory.has_edge(node_a, node_b) {
// Check semantic similarity
let emb_a = self.memory.get_embedding(node_a);
let emb_b = self.memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
// Novel = no edge but moderate similarity
if sim > 0.3 && sim < 0.8 {
novel_pairs += 1;
}
}
}
}
novel_pairs as f32 / total_pairs.max(1) as f32
}
/// Coherence: Does the dream sequence make semantic sense?
fn compute_coherence(&self, dream: &Dream) -> f32 {
if dream.sequence.len() < 2 {
return 1.0;
}
let mut coherence_sum = 0.0f32;
for window in dream.sequence.windows(2) {
let emb_a = self.memory.get_embedding(window[0]);
let emb_b = self.memory.get_embedding(window[1]);
coherence_sum += cosine_similarity(&emb_a, &emb_b);
}
coherence_sum / (dream.sequence.len() - 1) as f32
}
/// Utility: Are the suggested connections potentially useful?
fn compute_utility(&self, dream: &Dream) -> f32 {
// Based on node quality scores and access patterns
let avg_quality: f32 = dream.sequence.iter()
.map(|&id| self.memory.get_node_quality(id))
.sum::<f32>() / dream.sequence.len() as f32;
// Higher utility if connecting high-quality nodes
avg_quality
}
/// Diversity: How diverse are the visited nodes?
fn compute_diversity(&self, dream: &Dream) -> f32 {
// Average pairwise distance in embedding space
let embeddings: Vec<_> = dream.sequence.iter()
.map(|&id| self.memory.get_embedding(id))
.collect();
let mut total_dist = 0.0f32;
let mut count = 0;
for i in 0..embeddings.len() {
for j in (i+1)..embeddings.len() {
total_dist += 1.0 - cosine_similarity(&embeddings[i], &embeddings[j]);
count += 1;
}
}
total_dist / count.max(1) as f32
}
}
#[derive(Debug, Clone)]
pub struct DreamQuality {
/// How many novel connections suggested (0-1)
pub novelty: f32,
/// How semantically coherent (0-1)
pub coherence: f32,
/// How useful the connections might be (0-1)
pub utility: f32,
/// How diverse the dream content (0-1)
pub diversity: f32,
}
impl DreamQuality {
/// Overall quality score
pub fn overall(&self) -> f32 {
// Weighted combination favoring novelty and coherence
0.4 * self.novelty + 0.3 * self.coherence + 0.2 * self.utility + 0.1 * self.diversity
}
/// Is this dream worth integrating?
pub fn is_valuable(&self, threshold: f32) -> bool {
self.novelty > 0.3 && self.coherence > 0.4 && self.overall() > threshold
}
}
```
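A quick numeric check of the scoring above, using standalone copies of `overall` (weights 0.4/0.3/0.2/0.1) and the `is_valuable` gate:

```rust
/// Standalone copy of DreamQuality::overall for illustration.
fn overall(novelty: f32, coherence: f32, utility: f32, diversity: f32) -> f32 {
    0.4 * novelty + 0.3 * coherence + 0.2 * utility + 0.1 * diversity
}

/// Standalone copy of the is_valuable gate: novelty and coherence are
/// hard requirements, not just contributors to the weighted score.
fn is_valuable(novelty: f32, coherence: f32, utility: f32, diversity: f32, threshold: f32) -> bool {
    novelty > 0.3 && coherence > 0.4 && overall(novelty, coherence, utility, diversity) > threshold
}
```

Note that a dream can score well overall yet still be rejected: an incoherent dream (coherence ≤ 0.4) fails the gate no matter how novel it is.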
---
## 5. Dream Integration
### Applying Dream Insights to Memory
```rust
/// Integrates valuable dreams into memory graph
pub struct DreamIntegrator {
/// Memory graph to update
memory: Arc<RwLock<MemoryGraph>>,
/// Strength of new creative edges
creative_edge_strength: f32,
/// Decay factor for dream-derived edges
dream_edge_decay: f32,
}
impl DreamIntegrator {
/// Integrate a valuable dream into memory
pub fn integrate(&self, dream: &Dream, quality: &DreamQuality) -> IntegrationResult {
let mut result = IntegrationResult::default();
if !quality.is_valuable(0.5) {
return result; // Skip low-quality dreams
}
let mut memory = self.memory.write();
// Extract novel connections from dream
let novel_connections = self.extract_novel_connections(dream, &memory);
for (node_a, node_b, strength) in novel_connections {
// Add weak creative edge
let edge_strength = self.creative_edge_strength * strength * quality.overall();
memory.add_edge(
node_a,
node_b,
EdgeType::Creative,
edge_strength,
);
result.edges_added += 1;
}
// Update node associations based on dream co-occurrence
for window in dream.sequence.windows(3) {
memory.update_association(window[0], window[2], 0.01);
}
result.dream_quality = quality.overall();
result
}
fn extract_novel_connections(
&self,
dream: &Dream,
memory: &MemoryGraph,
) -> Vec<(NodeId, NodeId, f32)> {
let mut connections = Vec::new();
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len().min(i+5) { // Only nearby in sequence
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
if !memory.has_edge(node_a, node_b) {
let emb_a = memory.get_embedding(node_a);
let emb_b = memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
if sim > 0.3 {
// Connection strength based on similarity and sequence proximity
let proximity_factor = 1.0 / (j - i) as f32;
let strength = sim * proximity_factor;
connections.push((node_a, node_b, strength));
}
}
}
}
connections
}
}
#[derive(Default)]
pub struct IntegrationResult {
pub edges_added: usize,
pub associations_updated: usize,
pub dream_quality: f32,
}
```
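The edge strength in `extract_novel_connections` combines semantic similarity with sequence proximity: nodes that appeared close together in the dream get stronger candidate edges. A standalone sketch of that formula:

```rust
/// Strength of a candidate creative edge between dream positions i and j (j > i):
/// semantic similarity discounted by how far apart the nodes appeared in the dream.
fn connection_strength(similarity: f32, i: usize, j: usize) -> f32 {
    let proximity_factor = 1.0 / (j - i) as f32;
    similarity * proximity_factor
}
```

Adjacent nodes keep their full similarity as strength, while nodes three steps apart retain only a third of it, before the dream's overall quality scales the edge down further.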
---
## 6. Memory Consolidation
### Strengthening Important Memories
```rust
/// Consolidation engine for memory pruning and strengthening
pub struct ConsolidationEngine {
/// Memory graph reference
memory: Arc<RwLock<MemoryGraph>>,
/// Minimum access frequency for retention
min_access_frequency: f32,
/// Age decay factor (older = more decay)
age_decay: f32,
/// Quality threshold for preservation
quality_threshold: f32,
}
impl ConsolidationEngine {
/// Run full consolidation pass
pub fn consolidate(&self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Phase 1: Identify memories by value
let (high_value, medium_value, low_value) = self.categorize_memories();
report.high_value_count = high_value.len();
report.medium_value_count = medium_value.len();
report.low_value_count = low_value.len();
// Phase 2: Strengthen high-value memories
for &node_id in &high_value {
self.strengthen_memory(node_id);
report.memories_strengthened += 1;
}
// Phase 3: Decay low-value memories
for &node_id in &low_value {
let retained = self.decay_memory(node_id);
if retained {
report.memories_decayed += 1;
} else {
report.memories_removed += 1;
}
}
// Phase 4: Prune weak edges
let pruned = self.prune_weak_edges();
report.edges_pruned = pruned;
// Phase 5: Merge similar memories
let merged = self.merge_similar_memories();
report.memories_merged = merged;
report
}
fn categorize_memories(&self) -> (Vec<NodeId>, Vec<NodeId>, Vec<NodeId>) {
let memory = self.memory.read();
let mut high = Vec::new();
let mut medium = Vec::new();
let mut low = Vec::new();
for node in memory.iter_nodes() {
let value_score = self.compute_value_score(&memory, node);
if value_score > 0.7 {
high.push(node.id);
} else if value_score > 0.3 {
medium.push(node.id);
} else {
low.push(node.id);
}
}
(high, medium, low)
}
// Takes the caller's read guard instead of re-locking `self.memory`:
// a nested `read()` can deadlock under `std::sync::RwLock` once a writer is queued
fn compute_value_score(&self, memory: &MemoryGraph, node: &MemoryNode) -> f32 {
// Factors:
// 1. Access frequency (more access = more valuable)
let freq_score = (node.access_count as f32 / 100.0).min(1.0);
// 2. Recency (recent = more valuable)
let age_days = (chrono::Utc::now().timestamp() - node.last_accessed) / 86400;
let recency_score = (-self.age_decay * age_days as f32).exp();
// 3. Quality (explicit quality score)
let quality_score = node.quality_score;
// 4. Connectivity (well-connected = more valuable)
let degree = memory.node_degree(node.id);
let connectivity_score = (degree as f32 / 10.0).min(1.0);
// Weighted combination
0.3 * freq_score + 0.2 * recency_score + 0.3 * quality_score + 0.2 * connectivity_score
}
fn strengthen_memory(&self, node_id: NodeId) {
let mut memory = self.memory.write();
// Collect edge sources first so the loop doesn't alias the mutable graph borrow
let sources: Vec<NodeId> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
// Increase edge weights to this node
for from in sources {
memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(1.1));
}
// Mark as consolidated
if let Some(node) = memory.get_node_mut(node_id) {
node.consolidation_count += 1;
node.last_consolidated = chrono::Utc::now().timestamp();
}
}
fn decay_memory(&self, node_id: NodeId) -> bool {
let mut memory = self.memory.write();
// Collect edge sources first so the loop doesn't alias the mutable graph borrow
let sources: Vec<NodeId> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
// Reduce edge weights
for from in sources {
memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(0.5));
}
// Check if node should be removed entirely
let total_incoming_weight: f32 = memory.get_edges_to(node_id)
.iter()
.map(|e| e.weight)
.sum();
if total_incoming_weight < 0.01 {
// Remove isolated or nearly-isolated node
memory.remove_node(node_id);
false // Not retained
} else {
true // Retained but weakened
}
}
fn prune_weak_edges(&self) -> usize {
let mut memory = self.memory.write();
let weak_edges: Vec<_> = memory.iter_edges()
.filter(|e| e.weight < 0.01)
.map(|e| e.id)
.collect();
for edge_id in &weak_edges {
memory.remove_edge(*edge_id);
}
weak_edges.len()
}
fn merge_similar_memories(&self) -> usize {
let mut memory = self.memory.write();
let mut merged_count = 0;
// Snapshot ids and embeddings first so the graph can be mutated while iterating
let nodes: Vec<(NodeId, Vec<f32>)> = memory.iter_nodes()
.map(|n| (n.id, n.embedding.clone()))
.collect();
let mut merged = std::collections::HashSet::new();
// Find highly similar node pairs
for i in 0..nodes.len() {
if merged.contains(&nodes[i].0) { continue; }
for j in (i+1)..nodes.len() {
if merged.contains(&nodes[j].0) { continue; }
let sim = cosine_similarity(&nodes[i].1, &nodes[j].1);
if sim > 0.98 {
// Merge j into i; mark j so it is never merged twice
memory.merge_nodes(nodes[i].0, nodes[j].0);
merged.insert(nodes[j].0);
merged_count += 1;
}
}
}
merged_count
}
}
#[derive(Default)]
pub struct ConsolidationReport {
pub high_value_count: usize,
pub medium_value_count: usize,
pub low_value_count: usize,
pub memories_strengthened: usize,
pub memories_decayed: usize,
pub memories_removed: usize,
pub memories_merged: usize,
pub edges_pruned: usize,
}
```
---
## 7. Full Dream Cycle
### Orchestrating the Dream Process
```rust
/// Complete dream cycle orchestrator
pub struct DreamCycle {
generator: DreamGenerator,
evaluator: DreamEvaluator,
integrator: DreamIntegrator,
consolidator: ConsolidationEngine,
/// Shared memory graph; `generate_dreams` below reads from it
memory: Arc<RwLock<MemoryGraph>>,
config: DreamCycleConfig,
}
impl DreamCycle {
/// Run complete dream cycle (weekly maintenance)
pub async fn run(&self) -> DreamCycleReport {
let start = Instant::now();
let mut report = DreamCycleReport::default();
// Phase 1: Generate dreams
tracing::info!("Starting dream generation phase");
let dreams = self.generate_dreams();
report.dreams_generated = dreams.len();
// Phase 2: Evaluate dreams
tracing::info!("Evaluating {} dreams", dreams.len());
let evaluated: Vec<_> = dreams.iter()
.map(|d| (d, self.evaluator.evaluate(d)))
.collect();
// Phase 3: Integrate valuable dreams
tracing::info!("Integrating valuable dreams");
for (dream, quality) in &evaluated {
if quality.is_valuable(self.config.dream_threshold) {
let result = self.integrator.integrate(dream, quality);
report.edges_added += result.edges_added;
report.dreams_integrated += 1;
}
}
// Phase 4: Memory consolidation
tracing::info!("Running memory consolidation");
report.consolidation = self.consolidator.consolidate();
report.elapsed_ms = start.elapsed().as_millis() as u64;
report.timestamp = chrono::Utc::now().timestamp();
tracing::info!(
dreams = report.dreams_generated,
integrated = report.dreams_integrated,
edges = report.edges_added,
elapsed_ms = report.elapsed_ms,
"Dream cycle completed"
);
report
}
fn generate_dreams(&self) -> Vec<Dream> {
let mut dreams = Vec::new();
// Regular random walk dreams
for _ in 0..self.config.num_regular_dreams {
let dream = self.generator.generate_dream(&self.memory, None);
dreams.push(dream);
}
// Creative jump dreams
for _ in 0..self.config.num_creative_dreams {
let dream = self.generator.generate_creative_dream(
&self.memory,
self.config.creative_jump_count,
);
dreams.push(dream);
}
dreams
}
}
#[derive(Default)]
pub struct DreamCycleReport {
pub dreams_generated: usize,
pub dreams_integrated: usize,
pub edges_added: usize,
pub consolidation: ConsolidationReport,
pub elapsed_ms: u64,
pub timestamp: i64,
}
```
---
## 8. Integration with exo-exotic Dreams Module
SONA integrates with the exo-ai-2025 dream experiments:
```rust
// From exo-exotic crate
use exo_exotic::experiments::dreams::{
DreamExperiment,
DreamConfig,
NoveltyMeasure,
};
impl DreamCycle {
/// Run advanced dream experiments from exo-exotic
pub async fn run_exotic_dreams(&self) -> ExoticDreamReport {
let dream_experiment = DreamExperiment::new(DreamConfig {
memory_count: self.memory.node_count(),
replay_probability: 0.7,
recombination_rate: 0.3,
novelty_threshold: 0.5,
});
let result = dream_experiment.run(&self.memory).await;
ExoticDreamReport {
novelty_score: result.novelty,
coherence_score: result.coherence,
creative_insights: result.insights.len(),
new_hypotheses: result.hypotheses,
}
}
}
```
---
## Summary
SONA's Dream Engine enables:
| Feature | Mechanism | Outcome |
|---------|-----------|---------|
| **Memory Replay** | Random walks on memory graph | Strengthens important connections |
| **Creative Recombination** | High-temperature sampling | Discovers novel associations |
| **Quality Filtering** | Novelty + coherence metrics | Only valuable dreams integrated |
| **Weak Edge Creation** | Dream-derived connections | Enables creative retrieval |
| **Memory Consolidation** | Value-based pruning | Efficient memory usage |
Dreams allow SONA to:
1. **Discover** connections it wouldn't find through normal operation
2. **Explore** the hypothesis space without user cost
3. **Consolidate** valuable knowledge
4. **Prune** low-value information
5. **Remain creative** while staying grounded
File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -0,0 +1,814 @@
# SONA Performance Benchmarks
## Overview
This document defines performance targets, benchmark methodology, and expected results for SONA components. All benchmarks are designed to be reproducible and measurable.
## Performance Targets Summary
```
┌─────────────────────────────────────────────────────────────────────────┐
│ SONA Performance Targets │
├─────────────────────────────────────────────────────────────────────────┤
│ Component │ Target │ Stretch Goal │ Unit │
├─────────────────────────┼────────────────┼───────────────┼─────────────┤
│ Micro-LoRA forward │ <50μs │ <20μs │ per request │
│ Micro-LoRA update │ <100μs │ <50μs │ per signal │
│ Base LoRA forward │ <200μs │ <100μs │ per layer │
│ Pattern extraction │ <1s │ <500ms │ per 1000 │
│ Trajectory recording │ <10μs │ <5μs │ per step │
│ Background cycle │ <30s │ <15s │ per cycle │
│ Deep cycle │ <10min │ <5min │ per cycle │
│ Memory overhead │ <100MB │ <50MB │ total │
│ Pattern search │ <1ms │ <100μs │ per query │
│ Dream generation │ <100ms │ <50ms │ per dream │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Micro-LoRA Benchmarks
### Forward Pass Latency
**Target**: <50μs average, <100μs p99
```rust
// benches/micro_lora.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
fn bench_micro_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("micro_lora_forward");
for rank in [1, 2] {
for hidden_dim in [256, 512, 1024, 2048] {
let lora = MicroLoRA::new(hidden_dim, rank);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
output.fill(0.0);
unsafe { lora.forward_simd(&input, &mut output) };
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | AVX2 (μs) | Scalar (μs) | Speedup |
|------|------------|-----------|-------------|---------|
| 1 | 256 | 3.2 | 12.5 | 3.9x |
| 1 | 512 | 5.8 | 24.1 | 4.2x |
| 1 | 1024 | 10.4 | 47.3 | 4.5x |
| 1 | 2048 | 19.7 | 93.8 | 4.8x |
| 2 | 256 | 5.1 | 23.4 | 4.6x |
| 2 | 512 | 9.3 | 46.2 | 5.0x |
| 2 | 1024 | 17.2 | 91.5 | 5.3x |
| 2 | 2048 | 33.1 | 182.4 | 5.5x |
### Gradient Accumulation
**Target**: <100μs per signal
```rust
fn bench_gradient_accumulation(c: &mut Criterion) {
let mut group = c.benchmark_group("gradient_accumulation");
for hidden_dim in [256, 512, 1024] {
let mut lora = MicroLoRA::new(hidden_dim, 1);
let signal = LearningSignal {
query_embedding: vec![0.1; hidden_dim],
gradient_estimate: vec![0.01; hidden_dim],
quality_score: 0.8,
timestamp: Instant::now(),
metadata: SignalMetadata::default(),
};
group.bench_with_input(
BenchmarkId::from_parameter(hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.accumulate_gradient(&signal);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Time (μs) | Throughput (signals/s) |
|------------|-----------|------------------------|
| 256 | 8.3 | 120,481 |
| 512 | 15.7 | 63,694 |
| 1024 | 30.2 | 33,112 |
---
## Base LoRA Benchmarks
### Forward Pass (Per Layer)
**Target**: <200μs per layer
```rust
fn bench_base_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("base_lora_forward");
for rank in [4, 8, 16] {
for hidden_dim in [512, 1024, 2048] {
let lora = BaseLoRA::new(hidden_dim, rank, 1);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.forward_layer(0, &input, &mut output);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | Time (μs) | FLOPs | GFLOPS |
|------|------------|-----------|----------|--------|
| 4 | 512 | 45 | 4.2M | 93 |
| 4 | 1024 | 85 | 8.4M | 99 |
| 4 | 2048 | 162 | 16.8M | 104 |
| 8 | 512 | 82 | 8.4M | 102 |
| 8 | 1024 | 158 | 16.8M | 106 |
| 8 | 2048 | 305 | 33.5M | 110 |
| 16 | 512 | 155 | 16.8M | 108 |
| 16 | 1024 | 298 | 33.5M | 112 |
| 16 | 2048 | 582 | 67.1M | 115 |
---
## Trajectory Recording Benchmarks
### Step Recording Latency
**Target**: <10μs per step
```rust
fn bench_trajectory_recording(c: &mut Criterion) {
let mut group = c.benchmark_group("trajectory_recording");
for hidden_dim in [256, 512] {
for num_heads in [4, 8] {
let mut builder = TrajectoryBuilder::new(1, vec![0.1; hidden_dim]);
group.bench_with_input(
BenchmarkId::new(format!("h{}_heads{}", hidden_dim, num_heads), hidden_dim),
&(hidden_dim, num_heads),
|b, &(hd, nh)| {
b.iter(|| {
builder.add_step(
vec![0.5; hd],
vec![0.1; hd * nh],
0.8,
);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Heads | Time (μs) | Memory (bytes) |
|------------|-------|-----------|----------------|
| 256 | 4 | 2.1 | 5,120 |
| 256 | 8 | 3.8 | 9,216 |
| 512 | 4 | 3.7 | 10,240 |
| 512 | 8 | 6.9 | 18,432 |
### Buffer Operations
**Target**: Lock-free with <1% contention
```rust
fn bench_trajectory_buffer(c: &mut Criterion) {
let buffer = Arc::new(TrajectoryBuffer::new(10000));
c.bench_function("trajectory_buffer_record", |b| {
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
};
b.iter(|| {
buffer.record(trajectory.clone());
});
});
c.bench_function("trajectory_buffer_drain", |b| {
// Pre-fill buffer
for i in 0..1000 {
buffer.record(QueryTrajectory {
id: i,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
});
}
b.iter(|| {
buffer.drain()
});
});
}
```
---
## Pattern Learning Benchmarks
### K-means++ Extraction
**Target**: <1s for 1000 trajectories
```rust
fn bench_pattern_extraction(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_extraction");
for n_trajectories in [100, 500, 1000, 5000] {
let mut bank = ReasoningBank::new(PatternConfig {
k_clusters: 50,
embedding_dim: 256,
..Default::default()
});
// Pre-populate
for i in 0..n_trajectories {
bank.add_trajectory(&generate_random_trajectory(i, 256));
}
group.bench_with_input(
BenchmarkId::from_parameter(n_trajectories),
&n_trajectories,
|b, _| {
b.iter(|| {
bank.extract_patterns()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Trajectories | Clusters | Time (ms) | Iterations |
|--------------|----------|-----------|------------|
| 100 | 10 | 12 | 8 |
| 500 | 25 | 95 | 12 |
| 1000 | 50 | 380 | 15 |
| 5000 | 100 | 2,450 | 20 |
### Pattern Search
**Target**: <1ms per query
```rust
fn bench_pattern_search(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_search");
for n_patterns in [1000, 10000, 100000] {
let mut index = PatternIndex::new(256, n_patterns);
// Pre-populate
for i in 0..n_patterns {
let embedding: Vec<f32> = (0..256).map(|_| rand::random()).collect();
index.add_pattern(i as u64, &embedding).unwrap();
}
let query: Vec<f32> = (0..256).map(|_| rand::random()).collect();
group.bench_with_input(
BenchmarkId::from_parameter(n_patterns),
&n_patterns,
|b, _| {
b.iter(|| {
index.find_similar(&query, 10)
});
},
);
}
group.finish();
}
```
**Expected Results** (HNSW with ef=50):
| Patterns | Search Time (μs) | Recall@10 |
|----------|------------------|-----------|
| 1,000 | 45 | 0.98 |
| 10,000 | 120 | 0.96 |
| 100,000 | 350 | 0.94 |
| 1,000,000| 850 | 0.92 |
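The Recall@10 column assumes comparison against brute-force ground truth. A minimal sketch of that metric (the id-slice interface here is an assumption for illustration, not the `PatternIndex` API):

```rust
/// Recall@k: fraction of the exact top-k ids that the approximate
/// search also returned. `exact` is the brute-force ground truth.
pub fn recall_at_k(approx: &[u64], exact: &[u64]) -> f32 {
    let hits = approx.iter().filter(|id| exact.contains(id)).count();
    hits as f32 / exact.len() as f32
}
```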
---
## EWC++ Benchmarks
### Fisher Information Update
**Target**: <1ms per update
```rust
fn bench_fisher_update(c: &mut Criterion) {
let mut group = c.benchmark_group("fisher_update");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
..Default::default()
});
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.update_fisher(&gradients);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Parameters | Update Time (μs) | Memory (KB) |
|------------|------------------|-------------|
| 1,000 | 15 | 8 |
| 10,000 | 120 | 80 |
| 100,000 | 1,150 | 800 |
### Constraint Application
**Target**: <500μs per gradient vector
```rust
fn bench_constraint_application(c: &mut Criterion) {
let mut group = c.benchmark_group("ewc_constraints");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
num_tasks: 5,
..Default::default()
});
// Pre-train Fisher
for _ in 0..100 {
let grads: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
ewc.update_fisher(&grads);
}
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.apply_constraints(&gradients)
});
},
);
}
group.finish();
}
```
---
## Dream Engine Benchmarks
### Dream Generation
**Target**: <100ms per dream
```rust
fn bench_dream_generation(c: &mut Criterion) {
let mut group = c.benchmark_group("dream_generation");
for memory_size in [1000, 10000, 50000] {
let mut engine = DreamEngine::new(DreamConfig::default());
// Pre-populate memory
for i in 0..memory_size {
engine.add_memory_node(MemoryNode {
id: i as u64,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: rand::random::<u32>() % 100,
importance: rand::random(),
});
}
group.bench_with_input(
BenchmarkId::from_parameter(memory_size),
&memory_size,
|b, _| {
b.iter(|| {
engine.generate_dream()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Memory Nodes | Dream Time (ms) | Avg Path Length |
|--------------|-----------------|-----------------|
| 1,000 | 12 | 8 |
| 10,000 | 45 | 12 |
| 50,000 | 85 | 15 |
### Dream Quality Evaluation
**Target**: <50ms per evaluation
```rust
fn bench_dream_evaluation(c: &mut Criterion) {
let evaluator = DreamEvaluator::new(EvaluatorConfig::default());
let dream = Dream {
id: 1,
path: (0..15).map(|i| MemoryNode {
id: i,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: 10,
importance: 0.5,
}).collect(),
creative_jumps: 3,
total_novelty: 0.0,
};
c.bench_function("dream_evaluation", |b| {
b.iter(|| {
evaluator.evaluate(&dream)
});
});
}
```
---
## Learning Loop Benchmarks
### Loop A (Instant) - Per Request
**Target**: <1ms total overhead
```rust
fn bench_loop_a(c: &mut Criterion) {
let loop_a = InstantLoop::new(256, InstantLoopConfig::default());
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: (0..10).map(|_| TrajectoryStep {
activations: vec![0.5; 256],
attention_weights: vec![0.1; 2048],
reward: 0.8,
timestamp: Instant::now(),
}).collect(),
final_quality: 0.8,
latency_us: 50000,
};
c.bench_function("loop_a_on_inference", |b| {
b.iter(|| {
loop_a.on_inference(trajectory.clone());
});
});
c.bench_function("loop_a_flush", |b| {
// Pre-fill with signals
for _ in 0..100 {
loop_a.on_inference(trajectory.clone());
}
b.iter(|| {
loop_a.flush_updates();
});
});
}
```
**Expected Results**:
| Operation | Time (μs) | Notes |
|---------------|-----------|--------------------------|
| on_inference | 650 | Recording + accumulation |
| flush_updates | 120 | LoRA + edge commit |
| Total | 770 | Per request overhead |
### Loop B (Background) - Hourly
**Target**: <30s per cycle
```rust
fn bench_loop_b(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_b = BackgroundLoop::new(BackgroundLoopConfig::default(), 256);
// Generate trajectories
let trajectories: Vec<_> = (0..1000)
.map(|i| generate_random_trajectory(i, 256))
.collect();
c.bench_function("loop_b_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_b.run_cycle(trajectories.clone()).await
});
});
}
```
**Breakdown**:
| Phase | Time (s) | % of Total |
|------------------------|----------|------------|
| Trajectory ingestion | 0.5 | 2% |
| Pattern extraction | 8.0 | 32% |
| Gradient computation | 5.0 | 20% |
| EWC++ constraints | 3.0 | 12% |
| LoRA update | 2.0 | 8% |
| Fisher update | 4.0 | 16% |
| Metrics/logging | 2.5 | 10% |
| **Total** | **25.0** | 100% |
### Loop C (Deep) - Weekly
**Target**: <10min per cycle
```rust
fn bench_loop_c(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_c = DeepLoop::new(DeepLoopConfig::default());
// This is a longer benchmark, run fewer iterations
c.bench_function("loop_c_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_c.run_cycle().await
});
});
}
```
**Breakdown**:
| Phase | Time (min) | % of Total |
|------------------------|------------|------------|
| Dream generation (50) | 1.5 | 15% |
| Φ evaluation | 2.0 | 20% |
| Dream integration | 1.0 | 10% |
| Memory consolidation | 3.0 | 30% |
| EWC++ consolidation | 2.0 | 20% |
| Metrics/persistence | 0.5 | 5% |
| **Total** | **10.0** | 100% |
---
## Memory Benchmarks
### Memory Usage by Component
```rust
fn measure_memory_usage() -> MemoryReport {
let mut report = MemoryReport::default();
// Micro-LoRA (rank=1, hidden=256)
let micro_lora = MicroLoRA::new(256, 1);
report.micro_lora = std::mem::size_of_val(&micro_lora)
+ micro_lora.down_proj.len() * 4
+ micro_lora.up_proj.len() * 4
+ micro_lora.gradient_buffer.len() * 4;
// Base LoRA (rank=8, hidden=256, layers=12)
let base_lora = BaseLoRA::new(256, 8, 12);
report.base_lora = std::mem::size_of_val(&base_lora)
+ base_lora.layers.iter().map(|l|
l.down_proj.len() * 4 + l.up_proj.len() * 4
).sum::<usize>();
// Trajectory buffer (capacity=10000)
report.trajectory_buffer = 10000 * (
256 * 4 // query embedding
+ 10 * (256 * 4 + 2048 * 4 + 4 + 8) // 10 steps
);
// Pattern index (100k patterns)
report.pattern_index = 100000 * (256 * 4 + 64); // embedding + metadata
// EWC++ (100k params, 5 tasks)
report.ewc = 100000 * 4 * 5; // Fisher per task
report
}
```
**Expected Memory Usage**:
| Component | Size (MB) | Notes |
|------------------|-----------|--------------------------|
| Micro-LoRA | 0.004 | Minimal overhead |
| Base LoRA | 0.6 | 12 layers |
| Trajectory Buffer| 82.0 | 10k capacity |
| Pattern Index | 102.4 | 100k patterns |
| EWC++ Fisher | 2.0 | 100k params × 5 tasks |
| Dream Engine | 12.8 | 50k memory nodes |
| **Total** | **199.8** | Peak usage |
---
## Throughput Benchmarks
### End-to-End Query Throughput
```rust
fn bench_query_throughput(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let sona = runtime.block_on(async {
SonaEngine::new(SonaConfig::default()).await.unwrap()
});
c.bench_function("query_throughput", |b| {
b.to_async(&runtime).iter(|| async {
sona.process("test query", &Context::default()).await
});
});
}
```
**Expected Throughput**:
| Scenario | QPS | Latency p50 | Latency p99 |
|--------------------|---------|-------------|-------------|
| Baseline (no SONA) | 850 | 1.1ms | 2.5ms |
| With Micro-LoRA | 780 | 1.2ms | 2.8ms |
| Full SONA | 720 | 1.3ms | 3.2ms |
**Overhead**: ~15% throughput reduction for full self-learning capability.
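The ~15% figure follows directly from the QPS columns in the table above:

```rust
fn main() {
    // QPS figures taken from the throughput table above
    let baseline_qps = 850.0_f64;
    let full_sona_qps = 720.0_f64;
    let reduction_pct = (1.0 - full_sona_qps / baseline_qps) * 100.0;
    println!("throughput reduction: {:.1}%", reduction_pct); // ≈ 15.3%
}
```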
---
## Hardware-Specific Benchmarks
### CPU Feature Detection
```rust
fn check_cpu_features() -> CpuFeatures {
CpuFeatures {
avx2: is_x86_feature_detected!("avx2"),
avx512f: is_x86_feature_detected!("avx512f"),
fma: is_x86_feature_detected!("fma"),
sse4_1: is_x86_feature_detected!("sse4.1"),
sse4_2: is_x86_feature_detected!("sse4.2"),
}
}
```
### Performance by CPU
| CPU | Micro-LoRA (μs) | Pattern Search (μs) | Overall Speedup |
|------------------------|-----------------|---------------------|-----------------|
| Intel i9-13900K (AVX2) | 3.2 | 45 | 4.8x |
| AMD Ryzen 9 7950X | 3.5 | 48 | 4.5x |
| Apple M2 Pro (NEON) | 4.1 | 52 | 3.9x |
| Intel Xeon Platinum | 2.8 | 38 | 5.2x |
---
## Benchmark Commands
```bash
# Run all benchmarks
cargo bench --package ruvllm --features sona
# Run specific benchmark group
cargo bench --package ruvllm --bench micro_lora
# Run with specific features
cargo bench --package ruvllm --features "sona,avx2"
# Profile memory
cargo bench --package ruvllm --bench memory -- --profile-time 60
# Generate flamegraph
cargo flamegraph --bench micro_lora -- --bench
```
---
## Continuous Benchmarking
### CI Integration
```yaml
# .github/workflows/bench.yml
name: Benchmarks
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench --package ruvllm --features sona -- --save-baseline main
- name: Compare with baseline
run: cargo bench --package ruvllm --features sona -- --baseline main
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: target/criterion
```
### Regression Detection
```rust
// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;
fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
if regression > MAX_REGRESSION_PERCENT {
Err(format!(
"Performance regression of {:.1}% exceeds threshold of {}%",
regression, MAX_REGRESSION_PERCENT
))
} else {
Ok(())
}
}
```
---
## Next Steps
1. **09-API-REFERENCE.md** - Complete API documentation
File diff suppressed because it is too large

View File
@@ -0,0 +1,138 @@
# RuvLLM Documentation
## Overview
This directory contains documentation for the RuvLLM self-learning LLM architecture.
## Quick Links
- [Main README](../README.md) - Getting started, API reference, benchmarks
- [SPARC Documentation](./sparc/) - Design methodology documentation
## SPARC Methodology
The project was designed using the SPARC methodology:
| Phase | Document | Description |
|-------|----------|-------------|
| 1 | [Specification](./sparc/01-specification.md) | Requirements and acceptance criteria |
| 2 | [Pseudocode](./sparc/02-pseudocode.md) | Algorithm design and data flows |
| 3 | [Architecture](./sparc/03-architecture.md) | System design and component interactions |
| 4 | [Refinement](./sparc/04-refinement.md) | TDD implementation and iterative improvement |
| 5 | [Completion](./sparc/05-completion.md) | Integration, testing, and deployment |
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ RuvLLM System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Embedding │ │ Memory │ │ Router │ │
│ │ Service │ │ (HNSW) │ │ (FastGRNN) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ Orchestrator │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────┐ ┌──────┴──────┐ ┌─────────────┐ │
│ │ Attention │ │ Inference │ │ Learning │ │
│ │ Engine │ │ Pool │ │ Service │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Module Documentation
### Core Modules
| Module | File | Description |
|--------|------|-------------|
| `orchestrator` | `src/orchestrator.rs` | Main coordinator, request processing pipeline |
| `memory` | `src/memory.rs` | HNSW-based semantic memory with graph expansion |
| `router` | `src/router.rs` | FastGRNN routing with EWC learning |
| `attention` | `src/attention.rs` | Multi-head graph attention with edge features |
| `embedding` | `src/embedding.rs` | Tokenization, embedding, and caching |
| `inference` | `src/inference.rs` | LFM2 model pool management |
| `learning` | `src/learning.rs` | Self-learning feedback loops |
| `compression` | `src/compression.rs` | Memory compression and clustering |
### Supporting Modules
| Module | File | Description |
|--------|------|-------------|
| `config` | `src/config.rs` | Configuration system with builder pattern |
| `error` | `src/error.rs` | Error types and result aliases |
| `types` | `src/types.rs` | Core domain types and structs |
## API Examples
### Basic Query
```rust
use ruvllm::{Config, RuvLLM};
let config = Config::builder().build()?;
let llm = RuvLLM::new(config).await?;
let response = llm.query("What is Rust?").await?;
```
### Session Management
```rust
let session = llm.new_session();
let r1 = llm.query_session(&session, "Tell me about vectors").await?;
let r2 = llm.query_session(&session, "How are they used in ML?").await?;
```
### Feedback Loop
```rust
use ruvllm::Feedback;
llm.feedback(Feedback {
request_id: response.request_id,
rating: Some(5),
correction: None,
task_success: Some(true),
}).await?;
```
## Performance Tuning
### Memory Configuration
```rust
Config::builder()
.hnsw_params(
32, // M: connections per node (higher = better recall, more memory)
200, // ef_construction: build quality (higher = slower build, better index)
64, // ef_search: search quality (higher = slower search, better recall)
)
```
### Router Configuration
```rust
Config::builder()
.router_hidden_dim(128) // Hidden state size (higher = more capacity)
```
### Learning Configuration
```rust
Config::builder()
.learning_enabled(true) // Enable self-learning
```
## Further Reading
- [LFM2 Paper](https://arxiv.org/abs/2511.23404v1) - Liquid Foundation Models
- [FastGRNN Paper](https://arxiv.org/abs/1901.02358) - Fast RNN architecture
- [HNSW Paper](https://arxiv.org/abs/1603.09320) - Approximate nearest neighbor search
- [EWC Paper](https://arxiv.org/abs/1612.00796) - Continual learning
View File
@@ -0,0 +1,612 @@
# RuvLLM: Self-Learning LLM with LFM2 and Ruvector Integration
## SPARC Phase 1: Specification
---
## 1. Executive Summary
RuvLLM is a self-learning LLM architecture that integrates **Liquid Foundation Models (LFM2)** with **ruvector** as the world model and memory substrate. The system uses **FastGRNN** as an intelligent router to dynamically allocate computational resources based on query complexity, enabling efficient on-device inference with continuous learning capabilities.
### Core Innovation
The architecture treats:
- **LFM2** as the reasoning head (inference engine)
- **Ruvector** as the world model and episodic memory
- **FastGRNN** as the control circuit (routing decisions)
This triad creates a self-learning system where:
1. Queries are semantically embedded and matched against memory
2. Graph attention extracts relevant neighborhood context
3. FastGRNN routes to optimal model configuration
4. LFM2 generates responses with retrieved context
5. Successful interactions are written back to memory (self-improvement)
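The five-step loop above can be sketched end to end. Everything in this block is a toy stand-in chosen to make the data flow concrete: `embed`, `search`, and `route` are hypothetical placeholders, not the real embedding service, HNSW index, or FastGRNN router.

```rust
struct Hit { text: String, score: f32 }

// Step 1 stand-in: toy byte-histogram "embedding" of the query
fn embed(query: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 8];
    for b in query.bytes() { v[(b % 8) as usize] += 1.0; }
    v
}

// Step 2 stand-in: fake HNSW retrieval returning k scored neighbors
fn search(_emb: &[f32], k: usize) -> Vec<Hit> {
    (0..k).map(|i| Hit { text: format!("doc{i}"), score: 1.0 / (i as f32 + 1.0) }).collect()
}

// Step 3 stand-in: cheap heuristic routing to a model tier (0 = small, 1 = large)
fn route(emb: &[f32], hits: &[Hit]) -> usize {
    let mean_score: f32 = hits.iter().map(|h| h.score).sum::<f32>() / hits.len() as f32;
    if mean_score > 0.3 && emb.iter().sum::<f32>() < 32.0 { 0 } else { 1 }
}

fn main() {
    let query = "What is Rust?";
    let emb = embed(query);      // 1. semantic embedding
    let hits = search(&emb, 4);  // 2. neighborhood retrieval
    let tier = route(&emb, &hits); // 3. routing decision
    let context: Vec<&str> = hits.iter().map(|h| h.text.as_str()).collect();
    // 4. LFM2 generation and 5. memory write-back are elided: they need a real model
    println!("tier={tier} context={context:?}");
}
```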
---
## 2. Technical Requirements
### 2.1 Functional Requirements
#### FR-001: LFM2 Model Integration
- **Description**: Support LFM2 model family (350M, 700M, 1.2B, 2.6B parameters)
- **Acceptance Criteria**:
- Load models via llama.cpp (CPU) or vLLM (server)
- Support quantization: Q4/Q5 (CPU), 8-bit/4-bit weight-only (GPU)
- Enable KV cache for context reuse
- Achieve <500ms median latency (CPU), <100ms (GPU)
#### FR-002: Ruvector Memory Service
- **Description**: Implement semantic memory with graph structure
- **Storage Schema**:
```
Nodes: {
id: UUID,
vector: [f32; D], // D = embedding dimension
text: String,
type: NodeType, // Query | Document | AgentStep | Fact
source: String,
metadata: {
timestamp: i64,
tags: Vec<String>,
domain: String,
version: u32,
confidence: f32
}
}
Edges: {
id: UUID,
src: UUID,
dst: UUID,
rel: EdgeType, // Cites | Follows | SameTopic | AgentStep | Derived
weight: f32,
metadata: {
timestamp: i64,
created_by: String,
confidence: f32
}
}
```
- **Acceptance Criteria**:
- HNSW index with M=32, efConstruction=200, efSearch=64
- Sub-millisecond retrieval for k≤64
- Graph attention over 2-hop neighborhoods
- Support billion-scale corpora
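The 2-hop neighborhood criterion amounts to a depth-limited BFS from the retrieved seed nodes. A minimal sketch over plain adjacency lists (the `HashMap` representation is an assumption; ruvector's actual graph storage differs):

```rust
use std::collections::{HashMap, HashSet};

/// Depth-limited BFS: seed nodes plus everything reachable within 2 hops.
pub fn two_hop(adj: &HashMap<u64, Vec<u64>>, seeds: &[u64]) -> Vec<u64> {
    let mut seen: HashSet<u64> = seeds.iter().copied().collect();
    let mut frontier: Vec<u64> = seeds.to_vec();
    for _hop in 0..2 {
        let mut next = Vec::new();
        for n in &frontier {
            for &m in adj.get(n).map(|v| v.as_slice()).unwrap_or(&[]) {
                if seen.insert(m) { next.push(m); }
            }
        }
        frontier = next;
    }
    let mut out: Vec<u64> = seen.into_iter().collect();
    out.sort_unstable();
    out
}
```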
#### FR-003: FastGRNN Router
- **Description**: Implement gated recurrent router for intelligent resource allocation
- **Architecture** (per Kusupati et al.):
- Hidden size: 32-64 units
- Input: Fixed-length feature vector (~128 dims)
- Outputs: model_selection, context_size, temperature, top_p
- **Feature Vector Components** (128 dimensions):
```
Query Stats [32 dims]:
- token_count: f32
- language_id: [f32; 8] (one-hot)
- domain_encoding: [f32; 16]
- user_frequency: f32
- query_type: [f32; 6] (factual/reasoning/creative/...)
Embedding Stats [16 dims]:
- l2_norm: f32
- principal_components: [f32; 8]
- entropy: f32
- sparsity: f32
- cluster_assignment: [f32; 4]
HNSW Search Stats [48 dims]:
- k_retrieved: f32
- distances: { mean, std, min, max }: [f32; 4]
- entropy: f32
- graph_depth: f32
- recall_estimate: f32
- neighborhood_density: [f32; 16]
- semantic_coherence: [f32; 24]
System Constraints [32 dims]:
- latency_budget: f32
- device_class: [f32; 4] (edge/mobile/server/cluster)
- privacy_level: [f32; 4]
- memory_available: f32
- battery_level: f32 (for mobile)
- concurrent_requests: f32
- historical_accuracy: [f32; 16]
```
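The four feature groups above pack into a single fixed 128-dim router input. A sketch of that packing (the struct and field names are illustrative; only the group sizes come from the spec):

```rust
/// Router input assembled from the four feature groups in the spec:
/// 32 + 16 + 48 + 32 = 128 dimensions.
pub struct RouterFeatures {
    pub query_stats: [f32; 32],
    pub embedding_stats: [f32; 16],
    pub hnsw_stats: [f32; 48],
    pub system_constraints: [f32; 32],
}

impl RouterFeatures {
    /// Flatten into the fixed-length vector the FastGRNN router consumes.
    pub fn to_vec(&self) -> Vec<f32> {
        let mut v = Vec::with_capacity(128);
        v.extend_from_slice(&self.query_stats);
        v.extend_from_slice(&self.embedding_stats);
        v.extend_from_slice(&self.hnsw_stats);
        v.extend_from_slice(&self.system_constraints);
        v
    }
}
```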
#### FR-004: Self-Learning Pipeline
- **Description**: Implement continuous learning with forgetting mitigation
- **Components**:
- Online learning from successful interactions
- Elastic Weight Consolidation (EWC) for catastrophic forgetting prevention
- Experience replay with reservoir sampling
- Curriculum learning for progressive complexity
- **Acceptance Criteria**:
- Quality regret <0.1 points vs. always-big baseline
- No measurable forgetting over 10K update cycles
- Router accuracy >95% for seen patterns
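The experience-replay component above relies on reservoir sampling, which keeps a uniform random sample of a bounded size from an unbounded stream. A self-contained sketch (the inlined xorshift generator stands in for a real RNG crate; the `Reservoir` type is illustrative, not the ruvector-gnn API):

```rust
/// Reservoir sampler for the replay buffer: holds at most `capacity`
/// experiences, each stream item surviving with equal probability.
pub struct Reservoir<T> {
    capacity: usize,
    seen: usize,
    rng_state: u64,
    items: Vec<T>,
}

impl<T> Reservoir<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            seen: 0,
            rng_state: 0x9e37_79b9_7f4a_7c15, // any nonzero seed
            items: Vec::with_capacity(capacity),
        }
    }

    /// xorshift64* PRNG — stand-in for a proper RNG crate.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.rng_state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Algorithm R: fill the buffer, then replace slot j with
    /// probability capacity / seen.
    pub fn push(&mut self, item: T) {
        self.seen += 1;
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            let j = (self.next_u64() % self.seen as u64) as usize;
            if j < self.capacity {
                self.items[j] = item;
            }
        }
    }

    pub fn len(&self) -> usize { self.items.len() }
    pub fn seen(&self) -> usize { self.seen }
}
```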
#### FR-005: Graph Attention Engine
- **Description**: Context extraction via graph-aware attention
- **Mechanism**:
- Multi-head attention over retrieved nodes
- Edge-weighted aggregation (confidence, recency)
- Hyperbolic embeddings for hierarchical relationships
- 2-hop neighborhood expansion
- **Integration with existing ruvector-attention**:
- Leverage `EdgeFeaturedAttention` for edge attributes
- Use `GraphRoPE` for positional encoding on graphs
- Apply `DualSpaceAttention` for multi-manifold reasoning
### 2.2 Non-Functional Requirements
#### NFR-001: Performance
| Metric | Tier A (Server) | Tier B (Edge) | Tier C (Mobile) |
|--------|-----------------|---------------|-----------------|
| P50 Latency | <200ms | <500ms | <800ms |
| P99 Latency | <1s | <2s | <5s |
| Throughput | 100 QPS | 20 QPS | 5 QPS |
| Memory | <16GB | <4GB | <1GB |
#### NFR-002: Quality
- **Accuracy**: F1 >0.85 on QA benchmarks
- **Retrieval**: R@10 >0.90 for relevant documents
- **Router**: Decision accuracy >95%
- **Judge Rating**: 4.2+/5.0 on LLM-as-judge evaluations
#### NFR-003: Scalability
- Support 10M+ vectors in memory
- Support 1B+ vectors with hybrid indexing
- Linear scaling with node count in cluster mode
#### NFR-004: Reliability
- Zero data loss on graceful shutdown
- Recovery from OOM within 30s
- Automatic failover in cluster mode
---
## 3. LFM2 Deep Dive
### 3.1 Architecture Analysis
LFM2 employs a **hybrid backbone** combining:
1. **Gated Short Convolutions**: Lightweight local feature processing
- O(n) complexity vs O(n²) for attention
- Captures local patterns efficiently
- Enables 2x faster prefill on CPUs
2. **Grouped Query Attention (GQA)**: Reduced KV heads
- 4-8 KV heads vs 32+ in standard attention
- Maintains quality with 4x memory reduction
- Critical for edge deployment
### 3.2 Training Methodology
LFM2's training is relevant for our self-learning pipeline:
1. **Knowledge Distillation**: Tempered, decoupled Top-K
- Teacher: Large model (70B+)
- Student: LFM2 variants
   - **Insight**: We can distill router decisions from an expensive oracle
2. **Curriculum Learning**: Progressive complexity
- Start with simple factual queries
- Graduate to multi-step reasoning
- **Application**: Router training follows same progression
3. **Three-Stage Post-Training**:
- SFT: Supervised fine-tuning on quality data
- DPO: Direct preference optimization
- Model merging: Combine specialists
- **Application**: We merge domain-specific adapters
### 3.3 Multimodal Extensions (Future)
- **LFM2-VL**: Vision-language (image understanding)
- **LFM2-Audio**: Speech I/O
- **LFM2-ColBERT**: Low-latency retrieval encoder
---
## 4. Ruvector Integration Analysis
### 4.1 Existing Capabilities
| Component | Status | Integration Plan |
|-----------|--------|------------------|
| ruvector-core | ✅ Production | Primary vector store |
| ruvector-gnn | ✅ Production | Graph neural layer |
| ruvector-attention | ✅ Production | Attention mechanisms |
| ruvector-router-core | ✅ Production | Base routing |
| ruvector-graph | ✅ Production | Knowledge graph |
### 4.2 Required Extensions
#### 4.2.1 Embedding Adapter
```rust
pub struct EmbeddingAdapter {
    /// LFM2 encoder for query embedding
    lfm2_encoder: Lfm2Encoder,
    /// Dimension alignment layer
    projection: Linear,
    /// Normalization
    layer_norm: LayerNorm,
}

impl EmbeddingAdapter {
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let raw = self.lfm2_encoder.encode(text);
        let projected = self.projection.forward(&raw);
        self.layer_norm.forward(&projected)
    }
}
```
#### 4.2.2 Memory Writeback Service
```rust
pub struct MemoryWriteback {
    /// Quality threshold for writeback
    quality_threshold: f32,
    /// Deduplication via MinHash
    dedup_hasher: MinHasher,
    /// Conflict resolution
    merger: ConflictMerger,
}

impl MemoryWriteback {
    pub async fn maybe_write(
        &self,
        query: &str,
        response: &str,
        quality_score: f32,
        db: &VectorDB,
    ) -> Result<Option<UUID>> {
        if quality_score < self.quality_threshold {
            return Ok(None);
        }
        // Check for near-duplicates
        let embedding = embed(query, response);
        let similar = db.search_threshold(&embedding, 0.95)?;
        if !similar.is_empty() {
            return self.merger.resolve(similar, query, response);
        }
        // Insert new memory
        let entry = VectorEntry::new(embedding)
            .with_text(format!("Q: {}\nA: {}", query, response))
            .with_metadata(json!({
                "type": "qa_pair",
                "quality": quality_score,
                "timestamp": now(),
            }));
        Ok(Some(db.insert(entry)?))
    }
}
```
### 4.3 HNSW Parameter Tuning
Based on arxiv:2511.23404v1 insights on retrieval efficiency:
| Corpus Size | M | efConstruction | efSearch | Recall@10 |
|-------------|---|----------------|----------|-----------|
| <100K | 16 | 100 | 32 | 0.98 |
| 100K-1M | 32 | 200 | 64 | 0.96 |
| 1M-10M | 48 | 300 | 128 | 0.94 |
| 10M-100M | 64 | 400 | 256 | 0.92 |
| >100M | Hybrid | Tiered | Adaptive | 0.90 |
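The tuning table above can be folded into a simple size-based lookup. A sketch (the `HnswParams` type is illustrative; beyond 100M vectors the table prescribes hybrid/tiered indexing, which this fallback arm only approximates with the largest static profile):

```rust
#[derive(Debug, PartialEq)]
pub struct HnswParams {
    pub m: usize,
    pub ef_construction: usize,
    pub ef_search: usize,
}

/// Pick HNSW build/search parameters from the corpus-size tuning table.
pub fn hnsw_params_for(corpus_size: usize) -> HnswParams {
    match corpus_size {
        0..=99_999 => HnswParams { m: 16, ef_construction: 100, ef_search: 32 },
        100_000..=999_999 => HnswParams { m: 32, ef_construction: 200, ef_search: 64 },
        1_000_000..=9_999_999 => HnswParams { m: 48, ef_construction: 300, ef_search: 128 },
        // >=10M: table says 64/400/256 up to 100M, hybrid/tiered beyond.
        _ => HnswParams { m: 64, ef_construction: 400, ef_search: 256 },
    }
}
```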
---
## 5. FastGRNN Router Specification
### 5.1 Mathematical Formulation
FastGRNN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network):
```
z_t = σ(W_z · x_t + U_z · h_{t-1} + b_z)
h̃_t = tanh(W_h · x_t + U_h · (r_t ⊙ h_{t-1}) + b_h)
h_t = (ζ · (1 - z_t) + ν) ⊙ h̃_t + z_t ⊙ h_{t-1}
where:
- ζ, ν: Learned scalars (typically ζ≈1, ν≈0.5)
- W_z, W_h: Input weight matrices (sparse)
- U_z, U_h: Recurrent weight matrices (low-rank)
- r_t: Optional reset gate (can be fixed to 1)
```
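The update equations can be sketched as a single dense step. This is a clarity-first sketch: the reset gate is fixed to 1 as the note above allows, and the paper's sparse/low-rank weight factorizations are omitted:

```rust
/// One dense FastGRNN step. Weights are row-major (rows index hidden
/// units); `zeta` and `nu` are the learned scalars ζ and ν.
pub fn fastgrnn_step(
    x: &[f32],
    h_prev: &[f32],
    w_z: &[Vec<f32>], u_z: &[Vec<f32>], b_z: &[f32],
    w_h: &[Vec<f32>], u_h: &[Vec<f32>], b_h: &[f32],
    zeta: f32,
    nu: f32,
) -> Vec<f32> {
    // Dense matrix-vector product.
    let matvec = |m: &[Vec<f32>], v: &[f32]| -> Vec<f32> {
        m.iter()
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
            .collect()
    };
    let (wxz, uhz) = (matvec(w_z, x), matvec(u_z, h_prev));
    let (wxh, uhh) = (matvec(w_h, x), matvec(u_h, h_prev));
    (0..h_prev.len())
        .map(|i| {
            // z_t = σ(W_z·x + U_z·h + b_z)
            let z = 1.0 / (1.0 + (-(wxz[i] + uhz[i] + b_z[i])).exp());
            // h̃_t = tanh(W_h·x + U_h·h + b_h), reset gate fixed to 1
            let h_tilde = (wxh[i] + uhh[i] + b_h[i]).tanh();
            // h_t = (ζ·(1 - z) + ν) ⊙ h̃ + z ⊙ h_{t-1}
            (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev[i]
        })
        .collect()
}
```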
### 5.2 Output Heads
```rust
pub struct RouterOutputs {
    /// Model selection: [350M, 700M, 1.2B, 2.6B] probabilities
    pub model_probs: [f32; 4],
    /// Context size bins: [256, 512, 1024, 2048, 4096] tokens
    pub context_probs: [f32; 5],
    /// Temperature: continuous [0.0, 2.0]
    pub temperature: f32,
    /// Top-p: continuous [0.0, 1.0]
    pub top_p: f32,
    /// Confidence score
    pub confidence: f32,
}
```
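Turning the output head probabilities into a concrete decision is a small argmax plus a confidence guard. A sketch (the fallback-to-largest-model policy is an assumption here, consistent with the confidence thresholds in the risk analysis below):

```rust
/// Pick a model index from the router's probability head, falling back
/// to the largest model (index 3, the 2.6B) when confidence is low.
pub fn select_model(model_probs: &[f32; 4], confidence: f32, threshold: f32) -> usize {
    if confidence < threshold {
        return 3; // assumed fallback: route to 2.6B when unsure
    }
    model_probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```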
### 5.3 Training Protocol
**Phase 1: Data Collection**
```
For each query q:
1. Run all model configurations (expensive baseline)
2. Collect quality metrics Q, latency L, cost C
3. Compute utility: U = Q - λ·L - μ·C
4. Label: y_model = argmax(U), y_ctx = min viable context
```
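The Phase 1 labeling rule is a per-configuration utility computation followed by an argmax. A minimal sketch (slices hold one measurement per model configuration from the all-configurations baseline run):

```rust
/// Label the best configuration: U_i = Q_i − λ·L_i − μ·C_i,
/// y_model = argmax_i(U_i).
pub fn label_best_config(
    quality: &[f32],
    latency: &[f32],
    cost: &[f32],
    lambda: f32,
    mu: f32,
) -> usize {
    quality
        .iter()
        .zip(latency)
        .zip(cost)
        .map(|((q, l), c)| q - lambda * l - mu * c)
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```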
**Phase 2: Supervised Training**
```
Loss = CE(model_pred, y_model)
+ CE(ctx_pred, y_ctx)
+ α·SmoothL1(temp_pred, y_temp)
+ β·SmoothL1(top_p_pred, y_top_p)
```
**Phase 3: Online Refinement**
```
Every N requests:
1. Sample exploration (ε-greedy or Thompson)
2. Compute regret vs. oracle
3. Update weights with importance sampling
4. Apply EWC regularization
```
---
## 6. Self-Learning Mechanisms
### 6.1 Continual Learning Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Self-Learning Pipeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Query │───▶│ Retrieve│───▶│ Generate│───▶│ Evaluate│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ │ │ │ ▼ │
│ │ │ │ ┌─────────┐ │
│ │ │ │ │ Quality │ │
│ │ │ │ │ > θ ? │ │
│ │ │ │ └────┬────┘ │
│ │ │ │ │ │
│ │ │ │ ┌──────┴──────┐ │
│ │ │ │ ▼ ▼ │
│ │ │ │ ┌───────┐ ┌───────┐ │
│ │ │ │ │ Write │ │ Skip │ │
│ │ │ │ │ Back │ │ │ │
│ │ │ │ └───┬───┘ └───────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Replay Buffer (Reservoir) │ │
│ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │
│ │ │ E_1 │ │ E_2 │ │ ... │ │E_n-1│ │ E_n │ │ │
│ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │
│ └──────────────────────┬──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ EWC Regularization Layer │ │
│ │ │ │
│ │ L_total = L_task + λ·Σ F_i·(θ_i - θ*_i)² │ │
│ │ │ │
│ │ F_i = Fisher Information (importance) │ │
│ │ θ*_i = Optimal weights from previous task │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
### 6.2 Quality Evaluation
**LLM-as-Judge Protocol**:
```rust
pub struct QualityJudge {
    judge_model: Lfm2, // Use 2.6B for judging
    rubric: JudgeRubric,
}

impl QualityJudge {
    pub fn evaluate(&self, query: &str, response: &str, context: &[&str]) -> f32 {
        let prompt = format!(
            r#"Evaluate the response quality on a scale of 1-5:

Query: {query}
Retrieved Context: {context:?}
Response: {response}

Criteria:
1. Factual accuracy (grounded in context)
2. Completeness (addresses the query fully)
3. Coherence (logical flow)
4. Conciseness (no unnecessary verbosity)

Score (1-5):"#
        );
        let score_str = self.judge_model.generate(&prompt, 10);
        parse_score(&score_str)
    }
}
```
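The `parse_score` helper used above is left unspecified; a hypothetical implementation (this exact behavior is an assumption, not the ruvllm API) scans the judge's free-form output for the first valid rating:

```rust
/// Hypothetical parser for the judge's output: return the first digit
/// in 1..=5 as an f32 score, or 0.0 if no valid score is present.
pub fn parse_score(output: &str) -> f32 {
    output
        .chars()
        .filter_map(|c| c.to_digit(10))
        .find(|d| (1..=5).contains(d))
        .map(|d| d as f32)
        .unwrap_or(0.0)
}
```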
### 6.3 Forgetting Mitigation
**Elastic Weight Consolidation (EWC)**:
```rust
// From ruvector-gnn ewc module
pub struct ElasticWeightConsolidation {
    lambda: f32,               // Regularization strength
    fisher_info: Vec<f32>,     // Fisher information diagonal
    optimal_weights: Vec<f32>, // θ* from previous task
}

impl ElasticWeightConsolidation {
    pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
        self.fisher_info.iter()
            .zip(current_weights.iter())
            .zip(self.optimal_weights.iter())
            .map(|((f, w), w_star)| f * (w - w_star).powi(2))
            .sum::<f32>() * self.lambda / 2.0
    }

    pub fn update_fisher(&mut self, gradients: &[Vec<f32>]) {
        // Fisher diagonal: F_i = E[(∇ log P(y|x;θ))_i²], estimated per parameter
        for (i, grad_samples) in gradients.iter().enumerate() {
            self.fisher_info[i] = grad_samples.iter()
                .map(|g| g.powi(2))
                .sum::<f32>() / grad_samples.len() as f32;
        }
    }
}
```
---
## 7. Performance Optimization Strategy
### 7.1 LFM2 Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Model selection | 2-4x | <1% | FastGRNN router |
| KV cache reuse | 1.5-2x | 0% | llama.cpp native |
| Q4 quantization | 2-3x | <2% | GGUF format |
| Speculative decode | 1.3-1.5x | 0% | Draft model |
| Continuous batching | 2-4x | 0% | vLLM |
### 7.2 Ruvector Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| HNSW tuning | Variable | Recall tradeoff | efSearch adjustment |
| Product quantization | 4-8x memory | <5% | PQ in ruvector-core |
| Graph pruning | 1.2-1.5x | <1% | Edge weight threshold |
| Batch retrieval | 2-3x | 0% | Parallel HNSW |
| Caching | 10x+ (hits) | 0% | LRU with TTL |
### 7.3 Router Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Sparse weights | 10-50x | <0.5% | Magnitude pruning |
| Low-rank U | 2-4x | <0.5% | SVD decomposition |
| Int8 quantization | 2-4x | <0.1% | Post-training quant |
| Cascade routing | 1.5-2x | 0% | Early exit |
---
## 8. Success Metrics
### 8.1 Primary Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| End-to-end latency P50 | <500ms | Timer instrumentation |
| Quality (LLM judge) | 4.2+/5.0 | Automated evaluation |
| Router accuracy | >95% | Oracle comparison |
| Memory efficiency | <4GB (edge) | RSS monitoring |
| Throughput | 20 QPS (edge) | Load testing |
### 8.2 Secondary Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Retrieval R@10 | >0.90 | Benchmark suite |
| Forgetting rate | <5%/10K updates | Periodic eval |
| Cost reduction | >50% vs baseline | Token counting |
| Writeback rate | 10-30% | Database metrics |
### 8.3 Regret Analysis
```
Quality Regret = E[Q_baseline - Q_routed]
Latency Regret = E[L_routed - L_oracle]
Cost Regret = E[C_routed - C_oracle]
Targets:
- Quality Regret < 0.1 points (1-5 scale)
- Latency Regret < 50ms
- Cost Regret < 10%
```
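Each regret above is the mean of a pairwise difference over logged requests, so one helper covers all three; call it as `mean_diff(&baseline_quality, &routed_quality)` for quality regret, `mean_diff(&routed_latency, &oracle_latency)` for latency regret, and likewise for cost (the helper name is illustrative):

```rust
/// Empirical E[a - b] over paired per-request measurements.
pub fn mean_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "paired measurements required");
    a.iter().zip(b).map(|(x, y)| x - y).sum::<f32>() / a.len() as f32
}
```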
---
## 9. Risk Analysis
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Router misprediction | Medium | High | Confidence thresholds, fallback |
| Catastrophic forgetting | Low | Critical | EWC, replay buffer, checkpoints |
| Memory exhaustion | Medium | High | Streaming, tiered storage |
| Quality degradation | Medium | High | A/B testing, rollback |
| Latency spikes | High | Medium | Caching, async processing |
---
## 10. Dependencies
### 10.1 Internal Dependencies
```toml
[dependencies]
ruvector-core = { path = "../ruvector-core" }
ruvector-gnn = { path = "../ruvector-gnn" }
ruvector-attention = { path = "../ruvector-attention" }
ruvector-graph = { path = "../ruvector-graph" }
ruvector-router-core = { path = "../ruvector-router-core" }
```
### 10.2 External Dependencies
```toml
[dependencies]
# LLM runtime
llama-cpp-rs = "0.3" # CPU inference
tokenizers = "0.15" # Fast tokenization
# Async runtime
tokio = { version = "1.41", features = ["full"] }
# Serialization
serde = { version = "1.0", features = ["derive"] }
# Metrics
prometheus = "0.13"
tracing = "0.1"
```
---
## 11. References
1. **LFM2 Technical Report**: arxiv:2511.23404v1
2. **FastGRNN**: Kusupati et al., "FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network"
3. **EWC**: Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks"
4. **HNSW**: Malkov & Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"
5. **Graph Attention**: Veličković et al., "Graph Attention Networks"
---
*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*

---
# RuvLLM: Integration and Deployment
## SPARC Phase 5: Completion
---
## 1. Integration Strategy
### 1.1 Crate Structure
```
ruvector/
├── crates/
│ ├── ruvector-core/ # Existing: Vector DB
│ ├── ruvector-gnn/ # Existing: GNN + EWC + Replay
│ ├── ruvector-attention/ # Existing: Attention mechanisms
│ ├── ruvector-graph/ # Existing: Graph storage
│ └── ruvector-router-core/ # Existing: Routing primitives
└── examples/
└── ruvLLM/ # NEW: Self-learning LLM
├── src/
│ ├── lib.rs # Main library entry
│ ├── orchestrator.rs # Request orchestration
│ ├── embedding.rs # LFM2 embedding service
│ ├── router.rs # FastGRNN router
│ ├── memory.rs # Ruvector memory layer
│ ├── attention.rs # Graph attention wrapper
│ ├── inference.rs # LFM2 model pool
│ ├── learning.rs # Self-learning service
│ ├── compression.rs # Concept abstraction
│ ├── config.rs # Configuration
│ ├── types.rs # Core types
│ └── error.rs # Error handling
├── tests/
│ ├── unit/
│ └── integration/
├── benches/
├── config/
└── docs/ # SPARC documentation
```
### 1.2 Dependency Integration
```toml
# examples/ruvLLM/Cargo.toml
[package]
name = "ruvllm"
version = "0.1.0"
edition = "2021"
description = "Self-learning LLM with LFM2 and Ruvector integration"
[dependencies]
# Internal dependencies (path-based for development)
ruvector-core = { path = "../../crates/ruvector-core" }
ruvector-gnn = { path = "../../crates/ruvector-gnn" }
ruvector-attention = { path = "../../crates/ruvector-attention" }
ruvector-graph = { path = "../../crates/ruvector-graph" }
ruvector-router-core = { path = "../../crates/ruvector-router-core" }
# LLM inference
llama-cpp-rs = "0.3" # CPU inference via llama.cpp
tokenizers = "0.15" # Fast tokenization
# Async runtime
tokio = { version = "1.41", features = ["full"] }
futures = "0.3"
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "2.0.0-rc.3"
# Numerics
ndarray = { version = "0.16", features = ["serde"] }
rand = "0.8"
# Utilities
uuid = { version = "1.11", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
thiserror = "2.0"
anyhow = "1.0"
tracing = "0.1"
# Performance
dashmap = "6.1"
parking_lot = "0.12"
lru = "0.12"
# Metrics
prometheus = "0.13"
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
proptest = "1.5"
tokio-test = "0.4"
tempfile = "3.13"
tracing-subscriber = "0.3"
[features]
default = ["cpu"]
cpu = [] # llama.cpp CPU inference
gpu = ["vllm"] # vLLM GPU inference (optional)
vllm = []
[[bench]]
name = "pipeline"
harness = false
[[bench]]
name = "router"
harness = false
[[bench]]
name = "memory"
harness = false
```
### 1.3 API Surface
```rust
//! # RuvLLM - Self-Learning LLM
//!
//! A self-learning language model system integrating LFM2 with Ruvector.
//!
//! ## Architecture
//!
//! - **LFM2**: Frozen reasoning engine (350M-2.6B parameters)
//! - **Ruvector**: Living memory that adapts continuously
//! - **FastGRNN**: Control circuit for intelligent routing
//!
//! ## Quick Start
//!
//! ```rust,ignore
//! use ruvllm::{RuvLLM, Config};
//!
//! #[tokio::main]
//! async fn main() -> Result<()> {
//!     // Initialize system
//!     let config = Config::builder()
//!         .db_path("./memory.db")
//!         .model_path_350m("./models/lfm2-350m-q4.gguf")
//!         .model_path_700m("./models/lfm2-700m-q4.gguf")
//!         .build()?;
//!
//!     let llm = RuvLLM::new(config).await?;
//!
//!     // Process query
//!     let response = llm.query("What is machine learning?").await?;
//!     println!("Response: {}", response.text);
//!     println!("Confidence: {:.2}", response.confidence);
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Self-Learning Loops
//!
//! The system learns through three feedback loops:
//!
//! 1. **Memory Growth**: Every interaction strengthens/weakens graph edges
//! 2. **Router Learning**: FastGRNN learns optimal model selection
//! 3. **Compression**: Periodic summarization creates concept hierarchies
pub mod attention;
pub mod compression;
pub mod config;
pub mod embedding;
pub mod error;
pub mod inference;
pub mod learning;
pub mod memory;
pub mod orchestrator;
pub mod router;
pub mod types;
// Re-exports for convenience
pub use config::{Config, ConfigBuilder};
pub use error::{Error, Result};
pub use orchestrator::RuvLLM;
pub use types::{Request, Response, Session};
/// Library version
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
```
---
## 2. Implementation Checklist
### 2.1 Core Components
```
Phase 1: Foundation
━━━━━━━━━━━━━━━━━━━━
[x] Project structure setup
[x] Cargo.toml with dependencies
[ ] Error types definition
[ ] Configuration system
[ ] Core types (Request, Response, Session)
Phase 2: Services
━━━━━━━━━━━━━━━━━━
[ ] EmbeddingService
    [ ] LFM2 encoder wrapper
    [ ] Dimension projection
    [ ] Tokenization
    [ ] Batch processing
[ ] MemoryService
    [ ] VectorDB initialization
    [ ] GraphStore integration
    [ ] HNSW search wrapper
    [ ] Graph expansion
    [ ] Writeback queue
[ ] FastGRNNRouter
    [ ] Cell implementation
    [ ] Sparse matrix operations
    [ ] Low-rank matrices
    [ ] Output heads
    [ ] Training loop
[ ] GraphAttentionEngine
    [ ] Attention layer wrapper
    [ ] Edge feature encoding
    [ ] Multi-head aggregation
    [ ] Context ranking
[ ] InferencePool
    [ ] Model loading
    [ ] Lazy initialization
    [ ] KV cache management
    [ ] LRU eviction
[ ] LearningService
    [ ] Quality judge
    [ ] Replay buffer
    [ ] EWC integration
    [ ] Background training
    [ ] Compression jobs

Phase 3: Orchestration
━━━━━━━━━━━━━━━━━━━━━━
[ ] Orchestrator
    [ ] Request routing
    [ ] Session management
    [ ] Pipeline coordination
    [ ] Metrics collection
    [ ] Error handling
Phase 4: Integration
━━━━━━━━━━━━━━━━━━━━
[ ] Integration tests
[ ] Benchmark suite
[ ] Example applications
[ ] Documentation
```
### 2.2 Test Coverage Requirements
| Component | Unit Tests | Integration | Benchmark |
|-----------|------------|-------------|-----------|
| Embedding | 15+ | 3+ | 2 |
| Memory | 20+ | 5+ | 3 |
| Router | 25+ | 5+ | 2 |
| Attention | 15+ | 3+ | 2 |
| Inference | 10+ | 3+ | 2 |
| Learning | 20+ | 5+ | 1 |
| Orchestrator | 10+ | 5+ | 2 |
| **Total** | **115+** | **29+** | **14** |
---
## 3. Deployment Configurations
### 3.1 Edge Deployment (Raspberry Pi / Mobile)
```toml
# config/edge.toml
[system]
device_class = "edge"
max_memory_mb = 2048
max_concurrent_requests = 2
[embedding]
model = "onnx" # ONNX for portability
dimension = 384
batch_size = 1
[memory]
hnsw_m = 16
hnsw_ef_construction = 100
hnsw_ef_search = 32
max_nodes = 100_000
[router]
hidden_dim = 32
sparsity = 0.95
confidence_threshold = 0.6
[inference]
models = ["350m"]
quantization = "q4_k"
max_context = 1024
max_loaded_models = 1
[learning]
enabled = true
quality_threshold = 0.8
replay_capacity = 1000
training_interval_ms = 300_000 # 5 minutes
```
### 3.2 Server Deployment (CPU)
```toml
# config/server-cpu.toml
[system]
device_class = "server"
max_memory_mb = 16384
max_concurrent_requests = 20
[embedding]
model = "lfm2-encoder"
dimension = 768
batch_size = 8
[memory]
hnsw_m = 32
hnsw_ef_construction = 200
hnsw_ef_search = 64
max_nodes = 10_000_000
[router]
hidden_dim = 64
sparsity = 0.9
confidence_threshold = 0.7
[inference]
models = ["700m", "1.2b", "2.6b"]
quantization = "q5_k"
max_context = 4096
max_loaded_models = 2
[learning]
enabled = true
quality_threshold = 0.75
replay_capacity = 100_000
training_interval_ms = 60_000 # 1 minute
```
### 3.3 Server Deployment (GPU)
```toml
# config/server-gpu.toml
[system]
device_class = "gpu"
max_memory_mb = 32768
max_concurrent_requests = 100
[embedding]
model = "lfm2-encoder"
dimension = 1024
batch_size = 32
[memory]
hnsw_m = 48
hnsw_ef_construction = 300
hnsw_ef_search = 128
max_nodes = 100_000_000
[router]
hidden_dim = 64
sparsity = 0.85
confidence_threshold = 0.75
[inference]
models = ["1.2b", "2.6b"]
quantization = "fp16"
max_context = 8192
max_loaded_models = 2
use_vllm = true
tensor_parallel = 1
[learning]
enabled = true
quality_threshold = 0.7
replay_capacity = 1_000_000
training_interval_ms = 30_000 # 30 seconds
```
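A deployment typically selects one of the three profiles above from the detected device class. A hypothetical helper (the mapping and paths mirror the config files shown, but this function is illustrative, not part of the ruvllm API):

```rust
/// Map a detected device class to its shipped config file.
pub fn config_path(device_class: &str) -> Option<&'static str> {
    match device_class {
        "edge" => Some("config/edge.toml"),
        "server" => Some("config/server-cpu.toml"),
        "gpu" => Some("config/server-gpu.toml"),
        _ => None, // unknown class: caller must supply a config explicitly
    }
}
```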
---
## 4. Operational Runbook
### 4.1 Startup Sequence
```bash
#!/bin/bash
# scripts/start.sh
set -e
CONFIG=${1:-"config/server-cpu.toml"}
LOG_LEVEL=${LOG_LEVEL:-"info"}
echo "Starting RuvLLM with config: $CONFIG"
# 1. Validate configuration
cargo run --release --bin ruvllm-validate -- --config "$CONFIG"
# 2. Initialize database if needed
if [ ! -f "data/memory.db" ]; then
echo "Initializing database..."
cargo run --release --bin ruvllm-init -- --config "$CONFIG"
fi
# 3. Download models if needed
cargo run --release --bin ruvllm-models -- --config "$CONFIG" --check-or-download
# 4. Start server
RUST_LOG=$LOG_LEVEL cargo run --release --bin ruvllm-server -- \
--config "$CONFIG" \
--metrics-port 9090 \
--http-port 8080
```
### 4.2 Health Checks
```rust
/// Health check endpoint implementation
pub struct HealthCheck {
    memory: Arc<RuvectorMemory>,
    router: Arc<FastGRNNRouter>,
    inference: Arc<InferencePool>,
}

impl HealthCheck {
    pub async fn check(&self) -> HealthStatus {
        let mut status = HealthStatus::default();
        // Check memory service
        status.memory = match self.memory.ping().await {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency, details: None },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        // Check router
        status.router = match self.router.ping() {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency, details: None },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        // Check inference (at least one model loadable)
        status.inference = match self.inference.health_check().await {
            Ok(info) => ComponentHealth::Healthy {
                latency_ms: info.latency,
                // `details` is Option<serde_json::Value> so the variant
                // has the same shape for every component
                details: Some(json!({
                    "loaded_models": info.loaded_models,
                    "available_memory": info.available_memory,
                })),
            },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        status.overall = if status.all_healthy() {
            OverallHealth::Healthy
        } else if status.any_critical() {
            OverallHealth::Critical
        } else {
            OverallHealth::Degraded
        };
        status
    }
}
```
### 4.3 Monitoring Dashboards
```yaml
# Prometheus alerting rules
groups:
  - name: ruvllm
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.95, ruvllm_request_latency_seconds_bucket) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RuvLLM P95 latency above 1s"
      - alert: LowQualityScore
        expr: avg(ruvllm_quality_score) < 0.7
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average quality score dropped below 0.7"
      - alert: MemoryPressure
        expr: ruvllm_memory_usage_bytes / ruvllm_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 90%"
      - alert: RouterLowConfidence
        expr: avg(ruvllm_router_confidence) < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Router confidence consistently low"
      - alert: HighErrorRate
        expr: rate(ruvllm_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 10%"
```
### 4.4 Backup and Recovery
```bash
#!/bin/bash
# scripts/backup.sh
BACKUP_DIR="/backups/ruvllm/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "Creating backup in $BACKUP_DIR"
# 1. Backup memory database
cp -r data/memory.db "$BACKUP_DIR/memory.db"
# 2. Backup router weights
cp -r data/router_weights.bin "$BACKUP_DIR/router_weights.bin"
# 3. Backup EWC state
cp -r data/ewc_state.bin "$BACKUP_DIR/ewc_state.bin"
# 4. Backup replay buffer
cp -r data/replay_buffer.bin "$BACKUP_DIR/replay_buffer.bin"
# 5. Backup configuration
cp -r config/ "$BACKUP_DIR/config/"
# 6. Create manifest
cat > "$BACKUP_DIR/manifest.json" << EOF
{
"timestamp": "$(date -Iseconds)",
"version": "$(cargo run --release --bin ruvllm-version)",
"components": {
"memory_db": "memory.db",
"router_weights": "router_weights.bin",
"ewc_state": "ewc_state.bin",
"replay_buffer": "replay_buffer.bin",
"config": "config/"
}
}
EOF
echo "Backup complete: $BACKUP_DIR"
# 7. Upload to S3 if configured
if [ -n "$S3_BACKUP_BUCKET" ]; then
aws s3 sync "$BACKUP_DIR" "s3://$S3_BACKUP_BUCKET/$(basename $BACKUP_DIR)/"
echo "Uploaded to S3: $S3_BACKUP_BUCKET"
fi
```
---
## 5. Production Checklist
### 5.1 Pre-Launch
```
Security
━━━━━━━━
[ ] Input validation and sanitization
[ ] Rate limiting configured
[ ] TLS/HTTPS enabled
[ ] API authentication (if public)
[ ] Secrets in environment variables
[ ] Model integrity verification
Performance
━━━━━━━━━━━
[ ] Load tested to expected traffic
[ ] Memory profiled (no leaks)
[ ] Latency targets met
[ ] Caching configured
[ ] Connection pooling
Reliability
━━━━━━━━━━━
[ ] Health checks implemented
[ ] Graceful shutdown
[ ] Automatic restarts (systemd/k8s)
[ ] Backup procedures tested
[ ] Recovery procedures documented
Observability
━━━━━━━━━━━━━
[ ] Structured logging
[ ] Metrics exported
[ ] Distributed tracing
[ ] Alerting rules configured
[ ] Dashboards created
```
### 5.2 Post-Launch
```
Daily
━━━━━
[ ] Check error rates
[ ] Review quality scores
[ ] Monitor latency trends
[ ] Verify backup success
Weekly
━━━━━━
[ ] Review router decisions distribution
[ ] Analyze forgetting metrics
[ ] Check memory growth rate
[ ] Run compression job
[ ] Update router weights
Monthly
━━━━━━━
[ ] Full system backup
[ ] Performance benchmark
[ ] Security audit
[ ] Dependency updates
[ ] Evaluate student model candidates
```
---
## 6. API Reference
### 6.1 HTTP API
```yaml
openapi: "3.0.0"
info:
  title: RuvLLM API
  version: "0.1.0"
  description: Self-learning LLM with LFM2 and Ruvector
paths:
  /v1/query:
    post:
      summary: Process a query
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - query
              properties:
                query:
                  type: string
                  description: The user query
                session_id:
                  type: string
                  description: Optional session for multi-turn
                constraints:
                  type: object
                  properties:
                    max_latency_ms:
                      type: integer
                    max_tokens:
                      type: integer
                    temperature:
                      type: number
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  text:
                    type: string
                  confidence:
                    type: number
                  sources:
                    type: array
                    items:
                      type: object
                  routing_info:
                    type: object
  /v1/feedback:
    post:
      summary: Provide feedback on a response
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - request_id
              properties:
                request_id:
                  type: string
                rating:
                  type: integer
                  minimum: 1
                  maximum: 5
                correction:
                  type: string
      responses:
        "200":
          description: Feedback recorded
  /v1/health:
    get:
      summary: Health check
      responses:
        "200":
          description: System healthy
        "503":
          description: System unhealthy
  /v1/metrics:
    get:
      summary: Prometheus metrics
      responses:
        "200":
          description: Metrics in Prometheus format
```
### 6.2 Rust SDK
```rust
use ruvllm::{RuvLLM, Config, Request, Response};

/// Simple query
async fn simple_query(llm: &RuvLLM) -> Result<Response> {
    llm.query("What is Rust?").await
}

/// Query with options
async fn query_with_options(llm: &RuvLLM) -> Result<Response> {
    llm.query_with(Request {
        query: "Explain backpropagation".into(),
        session_id: Some("user-123".into()),
        constraints: Constraints {
            max_latency_ms: Some(500),
            max_tokens: Some(500),
            temperature: Some(0.7),
            ..Default::default()
        },
    }).await
}

/// Multi-turn conversation
async fn conversation(llm: &RuvLLM) -> Result<()> {
    let session = llm.new_session();
    let r1 = llm.query_session(&session, "What is a neural network?").await?;
    println!("Turn 1: {}", r1.text);
    let r2 = llm.query_session(&session, "How do you train one?").await?;
    println!("Turn 2: {}", r2.text);
    let r3 = llm.query_session(&session, "What about overfitting?").await?;
    println!("Turn 3: {}", r3.text);
    Ok(())
}

/// Provide feedback
async fn with_feedback(llm: &RuvLLM) -> Result<()> {
    let response = llm.query("What is 2+2?").await?;
    llm.feedback(Feedback {
        request_id: response.request_id,
        rating: 5,
        correction: None,
    }).await?;
    Ok(())
}

/// Stream response
async fn streaming(llm: &RuvLLM) -> Result<()> {
    let mut stream = llm.query_stream("Tell me a story").await?;
    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
    }
    Ok(())
}
```
---
## 7. Future Roadmap
### 7.1 Short-Term (1-3 months)
- [ ] LFM2-VL integration (vision-language)
- [ ] Multi-GPU inference with tensor parallelism
- [ ] Retrieval-augmented fine-tuning pipeline
- [ ] Improved compression algorithms
- [ ] WebAssembly deployment target
### 7.2 Medium-Term (3-6 months)
- [ ] Federated learning across edge nodes
- [ ] LFM2-Audio integration (speech)
- [ ] Custom domain fine-tuning toolkit
- [ ] Advanced curriculum learning
- [ ] Hyperbolic embeddings for hierarchies
### 7.3 Long-Term (6-12 months)
- [ ] Multi-agent collaboration
- [ ] Neuro-symbolic reasoning integration
- [ ] Continuous pre-training pipeline
- [ ] Hardware-specific optimizations (NPU, TPU)
- [ ] Enterprise multi-tenancy
---
## 8. Success Criteria
### 8.1 Technical Metrics
| Metric | Target | Current |
|--------|--------|---------|
| Latency P50 | <500ms | - |
| Latency P99 | <2s | - |
| Quality Score | >0.8 | - |
| Router Accuracy | >90% | - |
| Memory Efficiency | <4GB (edge) | - |
| Throughput | 20 QPS (edge) | - |
| Forgetting Rate | <5%/10K | - |
| Test Coverage | >80% | - |
### 8.2 Business Metrics
| Metric | Target | Notes |
|--------|--------|-------|
| User Satisfaction | >4.0/5.0 | Survey scores |
| Response Relevance | >85% | Human eval |
| Knowledge Retention | >90% | Multi-turn coherence |
| Cost Reduction | >50% | vs. always-big baseline |
---
## 9. Conclusion
RuvLLM represents a paradigm shift from static LLMs to adaptive, self-learning systems. By treating:
- **LFM2 as the stable cortex** (reasoning)
- **Ruvector as the living synaptic mesh** (memory)
- **FastGRNN as the control circuit** (routing)
We create intelligence that emerges from the loop, not just the model.
The three learning loops—memory growth, router optimization, and concept compression—enable continuous adaptation without the risks of in-place weight modification.
**The intelligence is not in one model anymore. It is in the loop.**
---
*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*