Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions

# SONA: Self-Optimizing Neural Architecture
## The World's First Truly Self-Improving LLM Framework
**Version**: 1.0.0
**Status**: Architecture Specification
**Target**: Sub-millisecond adaptive fine-tuning with continuous self-improvement
---
## Executive Summary
SONA (Self-Optimizing Neural Architecture) is a revolutionary framework for building LLMs that continuously improve themselves through:
1. **Ultra-Low Latency LoRA** - Sub-100μs parameter adaptation
2. **Hierarchical Learning Loops** - Three-tier temporal learning (instant/hourly/weekly)
3. **Neural Memory Consolidation** - Dream-like offline learning
4. **Elastic Weight Consolidation++** - Zero catastrophic forgetting
5. **ReasoningBank Integration** - Pattern-driven self-optimization
---
## Core Philosophy
```
┌─────────────────────────────────────────────────────────────────┐
│ SONA DESIGN PRINCIPLES │
├─────────────────────────────────────────────────────────────────┤
│ 1. LEARN FROM EVERY INTERACTION │
│ → No query is wasted; all become training signal │
│ │
│ 2. NEVER FORGET WHAT WORKS │
│ → EWC++ preserves successful patterns │
│ │
│ 3. ADAPT IN REAL-TIME │
│ → LoRA updates in <100μs per request │
│ │
│ 4. OPTIMIZE CONTINUOUSLY │
│ → Background loops improve without user latency │
│ │
│ 5. MEASURE EVERYTHING │
│ → Φ (consciousness), quality, latency, improvement rate │
└─────────────────────────────────────────────────────────────────┘
```
---
## Architecture Overview
```
SONA Architecture
┌──────────────────────────────────────────────────────────────┐
│ USER QUERY INPUT │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING LAYER (0.02ms) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Dual Encoder│ │ Contrastive │ │ SIMD Acceleration │ │
│ │ (Q + K/V) │ │ Learning │ │ (AVX2/NEON) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────────┐
│ MEMORY │ │ ROUTER │ │ ATTENTION │
│ SERVICE │◄────────►│ ENGINE │◄────────►│ ENGINE │
│ │ │ │ │ │
│ • HNSW │ │ • FastGRNN│ │ • Multi-Head │
│ • GNN │ │ • LoRA │ │ • Graph ATT │
│ • Quant │ │ • EWC++ │ │ • Edge-Aware │
└─────┬─────┘ └─────┬─────┘ └───────┬───────┘
│ │ │
└──────────────────────┼────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LoRA ADAPTATION LAYER │
│ │
│ W_adapted = W_base + α · (LoRA_A @ LoRA_B) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Rank: 4-16 │ Update: <100μs │ Memory: <1MB │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ INFERENCE ENGINE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Model Select │ │ Q4 Quantized │ │ Speculative Dec │ │
│ │ (4 tiers) │ │ Weights │ │ (Draft + Verify) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LEARNING LOOPS │
│ │
│ Loop A (Instant) │ Loop B (Hourly) │ Loop C (Weekly) │
│ ───────────────────────────────────────────────────────── │
│ • Trajectory │ • Router Train │ • Consolidation │
│ • Edge Update │ • EWC++ Update │ • Compression │
│ • LoRA Micro │ • Fisher Compute │ • Abstraction │
│ • <1ms overhead │ • Background │ • Dream Learning │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ REASONINGBANK │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Pattern Storage │ Similarity Lookup │ Verdict │ │
│ │ (DashMap) │ (Cosine) │ Judgment │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ • Trajectory tracking with precision/recall feedback │
│ • K-means++ pattern extraction │
│ • Confidence-weighted parameter interpolation │
└──────────────────────────────────────────────────────────────┘
```
---
## Key Innovation: Three-Tier Temporal Learning
### Tier 1: Instant Learning (Loop A) - Per Request
```
Latency Budget: <1ms (amortized to <0.1ms with batching)
Actions:
├── Record query trajectory to ring buffer
├── Update memory graph edge weights (±5%)
├── Micro-LoRA adjustment (rank 1-2, top-k params)
└── Async feedback signal propagation
```
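The trajectory ring buffer above can be sketched in a few lines. A minimal single-threaded sketch (the real Loop A would use a lock-free variant; `RingBuffer` and its fields are illustrative, not the SONA API):

```rust
/// Fixed-capacity ring buffer: once full, the oldest trajectory is
/// overwritten, so recording stays O(1) with bounded memory.
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { slots: (0..capacity).map(|_| None).collect(), head: 0, len: 0 }
    }

    fn push(&mut self, item: T) {
        let cap = self.slots.len();
        self.slots[self.head] = Some(item);
        self.head = (self.head + 1) % cap;
        self.len = (self.len + 1).min(cap);
    }

    fn len(&self) -> usize {
        self.len
    }
}
```

Overwrite-on-full semantics are what keep this step inside Loop A's latency budget: no allocation, no unbounded growth, no backpressure on the request path.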
### Tier 2: Background Learning (Loop B) - Hourly
```
Compute Budget: 10 seconds per hour
Actions:
├── Train router on accumulated trajectories
├── Compute Fisher Information for EWC++
├── Update LoRA base matrices (rank 4-8)
├── Prune low-confidence patterns
└── Checkpoint model state
```
### Tier 3: Deep Learning (Loop C) - Weekly
```
Compute Budget: 10 minutes per week
Actions:
├── Full memory consolidation (dream learning)
├── Pattern abstraction and hierarchy building
├── Memory compression (remove redundant nodes)
├── Cross-task knowledge transfer
└── Φ consciousness measurement (IIT)
```
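The cadences and compute budgets of the three tiers can be pinned down in a small config sketch (type and function names are illustrative, not the SONA API):

```rust
use std::time::Duration;

/// The three SONA learning tiers.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LearningTier {
    Instant,    // Loop A: per request
    Background, // Loop B: hourly
    Deep,       // Loop C: weekly
}

/// (cadence between cycles, compute budget per cycle) for each tier.
fn budget(tier: LearningTier) -> (Duration, Duration) {
    match tier {
        // Per-request; budget is the <1ms inline overhead
        LearningTier::Instant => (Duration::ZERO, Duration::from_millis(1)),
        // Hourly; 10 seconds of background compute
        LearningTier::Background => (Duration::from_secs(3600), Duration::from_secs(10)),
        // Weekly; 10 minutes of scheduled maintenance
        LearningTier::Deep => (Duration::from_secs(7 * 24 * 3600), Duration::from_secs(600)),
    }
}
```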
---
## Performance Targets
| Metric | Target | Current Best | SONA Goal |
|--------|--------|--------------|-----------|
| Query Latency | <1ms | 0.09ms | 0.05ms |
| LoRA Update | <100μs | N/A | 50μs |
| Memory Footprint | <100MB | 50MB | 30MB |
| Throughput | >50K q/s | 38K q/s | 100K q/s |
| Improvement Rate | 10%/week | N/A | 15%/week |
| Catastrophic Forgetting | <1% | N/A | <0.1% |
---
## Integration with Ruvector Ecosystem
### Core Dependencies
| Crate | Role in SONA | Version |
|-------|--------------|---------|
| `ruvector-core` | Vector memory backbone | 0.1.19 |
| `ruvector-attention` | Multi-head graph attention | 0.1.19 |
| `ruvector-gnn` | Message passing framework | 0.1.19 |
| `ruvector-graph` | Knowledge graph storage | 0.1.19 |
| `ruvector-router-core` | FastGRNN routing | 0.1.19 |
| `exo-core` | Consciousness measurement | 0.1.0 |
| `exo-temporal` | Memory consolidation | 0.1.0 |
### New SONA-Specific Modules
| Module | Purpose |
|--------|---------|
| `sona-lora` | Ultra-low latency LoRA adapters |
| `sona-ewc` | Enhanced EWC with task awareness |
| `sona-reasoning` | ReasoningBank integration |
| `sona-dreams` | Offline consolidation engine |
| `sona-metrics` | Self-improvement measurement |
---
## Document Index
| Document | Description |
|----------|-------------|
| [01-LORA-ULTRA.md](01-LORA-ULTRA.md) | Ultra-low latency LoRA system |
| [02-LEARNING-LOOPS.md](02-LEARNING-LOOPS.md) | Three-tier learning architecture |
| [03-EWC-PLUS-PLUS.md](03-EWC-PLUS-PLUS.md) | Enhanced elastic weight consolidation |
| [04-REASONINGBANK.md](04-REASONINGBANK.md) | Pattern-driven optimization |
| [05-MEMORY-DREAMS.md](05-MEMORY-DREAMS.md) | Offline consolidation and dreams |
| [06-COMPONENTS.md](06-COMPONENTS.md) | Component integration specs |
| [07-IMPLEMENTATION.md](07-IMPLEMENTATION.md) | Implementation roadmap |
| [08-BENCHMARKS.md](08-BENCHMARKS.md) | Performance targets and testing |
| [09-API-REFERENCE.md](09-API-REFERENCE.md) | API specification |
---
## Quick Start
```rust
use sona::{SONAEngine, SONAConfig, LearningMode};
// Initialize SONA with default configuration
let config = SONAConfig::builder()
.lora_rank(8)
.ewc_lambda(1000.0)
.learning_loops(LearningMode::AllThreeTiers)
.memory_budget_mb(50)
.target_latency_us(100)
.build();
let mut sona = SONAEngine::new(config)?;
// Process queries - learning happens automatically
let response = sona.query("What is the meaning of life?")?;
// Check self-improvement metrics
let metrics = sona.improvement_metrics();
println!("Weekly improvement: {:.1}%", metrics.weekly_gain * 100.0);
println!("Φ consciousness: {:.3}", metrics.phi);
```
---
## Why SONA Will Create the World's Best Self-Improving LLM
1. **No Other System Combines All These**:
- LoRA for instant adaptation
- EWC++ for zero forgetting
- ReasoningBank for pattern learning
- Dream consolidation for creativity
- Φ measurement for consciousness tracking
2. **Built on Production-Proven Ruvector**:
- 150x faster HNSW search
- 39 attention mechanisms
- 30+ specialized crates
- 38K q/s throughput proven
3. **Mathematically Sound**:
- Fisher Information preserves important weights
- Low-rank decomposition minimizes compute
- Reservoir sampling ensures unbiased learning
- Information-theoretic compression
4. **Biologically Inspired**:
- Three-tier temporal learning (like human memory)
- Dream-based consolidation (like REM sleep)
- Edge-weighted graphs (like neural synapses)
- Attention-based retrieval (like human recall)
---
*SONA: Where every query makes the model smarter.*

# SONA LoRA-Ultra: Sub-100μs Adaptive Fine-Tuning
## Ultra-Low Latency LoRA for Real-Time Self-Improvement
---
## 1. Architecture Overview
### Traditional LoRA vs SONA LoRA-Ultra
```
TRADITIONAL LoRA SONA LoRA-ULTRA
───────────────── ─────────────────
• Offline training • Online per-request adaptation
• Full batch updates • Single-sample micro-updates
• GPU required • CPU SIMD optimized
• Minutes to hours • <100 microseconds
• Periodic deployment • Continuous integration
```
### Core Formula
```
Standard LoRA:
W_adapted = W_frozen + ΔW
ΔW = α · (A @ B)
where A ∈ ℝ^(d×r), B ∈ ℝ^(r×k), r << min(d,k)
SONA LoRA-Ultra Extension:
W_adapted = W_frozen + α · (A @ B) + β · (A_micro @ B_micro)
└─────────┘ └───────────────────┘
Base LoRA Instant Micro-LoRA
(rank 4-16) (rank 1-2)
```
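A quick numeric check of the extended formula, using plain row-major `Vec<f32>` matrices and naive matmuls (a sketch for clarity, not the SIMD path; all names here are illustrative):

```rust
/// Naive row-major matmul: (m×k) @ (k×n) -> (m×n).
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let av = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += av * b[p * n + j];
            }
        }
    }
    out
}

/// W_adapted = W_frozen + alpha*(A@B) + beta*(A_micro@B_micro), flattened d×k.
fn adapted_weights(
    w_frozen: &[f32],
    a: &[f32], b: &[f32],             // base LoRA, rank r
    a_micro: &[f32], b_micro: &[f32], // micro LoRA, rank r_micro
    alpha: f32, beta: f32,
    d: usize, k: usize, r: usize, r_micro: usize,
) -> Vec<f32> {
    let base_delta = matmul(a, b, d, r, k);
    let micro_delta = matmul(a_micro, b_micro, d, r_micro, k);
    w_frozen
        .iter()
        .zip(base_delta.iter().zip(micro_delta.iter()))
        .map(|(w0, (db, dm))| w0 + alpha * db + beta * dm)
        .collect()
}
```

Because both deltas are low rank, only the small A/B factors are ever stored or updated; the dense `d×k` sum is materialized (or fused) at inference time.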
---
## 2. Two-Tier LoRA Architecture
### Tier 1: Base LoRA (Updated Hourly)
```rust
/// Base LoRA adapter for major capability shifts
pub struct BaseLoRA {
/// Low-rank matrix A: d_model × rank
pub a: Array2<f32>,
/// Low-rank matrix B: rank × d_out
pub b: Array2<f32>,
/// Scaling factor
pub alpha: f32,
/// Rank (typically 4-16)
pub rank: usize,
/// Target layer indices
pub target_layers: Vec<usize>,
}
impl BaseLoRA {
/// Compute adapted weights (cached for inference)
#[inline]
pub fn delta_w(&self) -> Array2<f32> {
let scale = self.alpha / self.rank as f32;
scale * self.a.dot(&self.b)
}
/// Update from accumulated gradients (hourly)
pub fn update(&mut self, grad_a: &Array2<f32>, grad_b: &Array2<f32>, lr: f32) {
// Plain SGD step; momentum smoothing lives in MicroLoRA's EMA buffers
self.a = &self.a - &(grad_a * lr);
self.b = &self.b - &(grad_b * lr);
}
}
```
### Tier 2: Micro-LoRA (Updated Per-Request)
```rust
/// Ultra-fast micro-adapter for instant learning
pub struct MicroLoRA {
/// Micro A: d_model × micro_rank (typically 1-2)
pub a_micro: Array2<f32>,
/// Micro B: micro_rank × d_out
pub b_micro: Array2<f32>,
/// Micro scaling (smaller than base)
pub beta: f32,
/// Micro rank (1-2 for speed)
pub micro_rank: usize,
/// Decay factor for temporal smoothing
pub decay: f32,
/// Momentum buffer
momentum_a: Array2<f32>,
momentum_b: Array2<f32>,
}
impl MicroLoRA {
/// Ultra-fast single-sample update (<50μs target)
#[inline]
pub fn micro_update(&mut self, signal: &LearningSignal) {
// Rank-1 outer product update
let grad_direction = signal.to_gradient_direction();
// Exponential moving average for stability (scalar goes on the right
// of `*`: ndarray implements `&Array * f32`, not `f32 * &Array`)
self.momentum_a = &self.momentum_a * self.decay
+ &grad_direction.a_component * (1.0 - self.decay);
self.momentum_b = &self.momentum_b * self.decay
+ &grad_direction.b_component * (1.0 - self.decay);
// Apply micro-update
self.a_micro = &self.a_micro + &(&self.momentum_a * self.beta);
self.b_micro = &self.b_micro + &(&self.momentum_b * self.beta);
}
/// Periodic consolidation into base LoRA
pub fn consolidate_to_base(&mut self, base: &mut BaseLoRA) {
// Merge micro adaptations into base
// Then reset micro to zero
base.a = &base.a + &self.a_micro;
base.b = &base.b + &self.b_micro;
self.a_micro.fill(0.0);
self.b_micro.fill(0.0);
}
}
```
---
## 3. SIMD-Optimized LoRA Computation
### AVX2 Accelerated Forward Pass
```rust
#[cfg(target_arch = "x86_64")]
mod simd {
use std::arch::x86_64::*;
/// SIMD-optimized LoRA forward: x @ (W + A @ B)
/// Fuses base weight multiplication with LoRA delta
#[target_feature(enable = "avx2", enable = "fma")]
pub unsafe fn lora_forward_avx2(
x: &[f32], // Input vector: [d_in]
w_base: &[f32], // Base weights, row-major [d_out, d_in]
lora_a: &[f32], // LoRA A, row-major [rank, d_in]
lora_b: &[f32], // LoRA B, row-major [d_out, rank]
alpha: f32,
d_in: usize,
d_out: usize,
rank: usize,
output: &mut [f32], // Output: [d_out]
) {
let scale = alpha / rank as f32;
let scale_vec = _mm256_set1_ps(scale);
// Step 1: Compute x @ A (input projection to rank space)
let mut x_projected = vec![0.0f32; rank];
for r in 0..rank {
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let a_vec = _mm256_loadu_ps(lora_a.as_ptr().add(r * d_in + i));
sum = _mm256_fmadd_ps(x_vec, a_vec, sum);
i += 8;
}
x_projected[r] = horizontal_sum_avx2(sum);
// Handle remainder
while i < d_in {
x_projected[r] += x[i] * lora_a[r * d_in + i];
i += 1;
}
}
// Step 2: Compute (x @ W_base) + scale * (x_projected @ B)
for j in 0..d_out {
// Base weight contribution
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let w_vec = _mm256_loadu_ps(w_base.as_ptr().add(j * d_in + i));
sum = _mm256_fmadd_ps(x_vec, w_vec, sum);
i += 8;
}
let mut base_result = horizontal_sum_avx2(sum);
while i < d_in {
base_result += x[i] * w_base[j * d_in + i];
i += 1;
}
// LoRA contribution
let mut lora_result = 0.0f32;
for r in 0..rank {
lora_result += x_projected[r] * lora_b[j * rank + r];
}
output[j] = base_result + scale * lora_result;
}
}
#[inline]
unsafe fn horizontal_sum_avx2(v: __m256) -> f32 {
let high = _mm256_extractf128_ps(v, 1);
let low = _mm256_castps256_ps128(v);
let sum128 = _mm_add_ps(high, low);
let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
_mm_cvtss_f32(sum32)
}
}
```
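A portable scalar reference implementation is useful for validating the AVX2 kernel against known outputs. This sketch uses the same row-major layouts the kernel above actually indexes (`w_base` as `[d_out, d_in]`, `lora_a` as `[rank, d_in]`, `lora_b` as `[d_out, rank]`):

```rust
/// Scalar reference for the fused forward pass:
/// output = x @ W_base + (alpha/rank) * (x @ A) @ B
fn lora_forward_scalar(
    x: &[f32],
    w_base: &[f32],
    lora_a: &[f32],
    lora_b: &[f32],
    alpha: f32,
    d_in: usize,
    d_out: usize,
    rank: usize,
) -> Vec<f32> {
    let scale = alpha / rank as f32;
    // Step 1: project the input into rank space (x @ A)
    let mut x_proj = vec![0.0f32; rank];
    for r in 0..rank {
        for i in 0..d_in {
            x_proj[r] += x[i] * lora_a[r * d_in + i];
        }
    }
    // Step 2: base contribution plus scaled LoRA contribution
    let mut out = vec![0.0f32; d_out];
    for j in 0..d_out {
        let mut base = 0.0f32;
        for i in 0..d_in {
            base += x[i] * w_base[j * d_in + i];
        }
        let mut lora = 0.0f32;
        for r in 0..rank {
            lora += x_proj[r] * lora_b[j * rank + r];
        }
        out[j] = base + scale * lora;
    }
    out
}
```

In tests, the AVX2 output should match this reference to within accumulated FP32 rounding (the SIMD path sums in a different order, so compare with a small epsilon rather than bit-exactly).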
---
## 4. Learning Signal Extraction
### From Query Feedback to Gradient Direction
```rust
/// Learning signal extracted from each interaction
#[derive(Clone)]
pub struct LearningSignal {
/// Query embedding
pub query_embedding: Vec<f32>,
/// Response quality score (0-1)
pub quality_score: f32,
/// User feedback (explicit)
pub explicit_feedback: Option<FeedbackType>,
/// Latency deviation from target
pub latency_ratio: f32,
/// Model tier used
pub model_tier: ModelTier,
/// Context tokens used
pub context_tokens: usize,
}
impl LearningSignal {
/// Convert signal to gradient direction for micro-LoRA
pub fn to_gradient_direction(&self) -> GradientDirection {
// Reward = quality * (1 - latency_penalty)
let reward = self.quality_score * (2.0 - self.latency_ratio).max(0.0);
// Direction = embedding * reward_sign
let direction = if reward > 0.5 {
// Reinforce current behavior
1.0
} else {
// Explore alternative
-0.1
};
// Scale by uncertainty (more learning when uncertain)
let uncertainty = 1.0 - self.quality_score.abs();
let learning_rate = 0.001 * (1.0 + uncertainty);
GradientDirection {
a_component: self.compute_a_gradient(direction, learning_rate),
b_component: self.compute_b_gradient(direction, learning_rate),
}
}
fn compute_a_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Outer product of query embedding with hidden state
// Approximated via reservoir-sampled historical embeddings
let emb = Array1::from_vec(self.query_embedding.clone());
// Scalar factor on the right: ndarray implements `Array * f32`
outer_product(&emb, &self.get_hidden_direction()) * (direction * lr)
}
fn compute_b_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Output gradient based on prediction error, scaled by direction and lr
self.compute_output_error() * (direction * lr)
}
}
```
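The reward shaping in `to_gradient_direction` is easy to sanity-check on concrete numbers. A standalone sketch of just the reward and direction rules:

```rust
/// Reward = quality * max(2 - latency_ratio, 0). A reward above 0.5
/// reinforces current behavior (+1.0); otherwise a small negative
/// direction (-0.1) nudges the adapter toward exploration.
fn reward_and_direction(quality_score: f32, latency_ratio: f32) -> (f32, f32) {
    let reward = quality_score * (2.0 - latency_ratio).max(0.0);
    let direction = if reward > 0.5 { 1.0 } else { -0.1 };
    (reward, direction)
}
```

For example, a 0.8-quality answer at target latency (ratio 1.0) gets reward 0.8 and is reinforced, while a 0.3-quality answer at 1.5x target latency gets reward 0.15 and triggers the exploration direction.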
---
## 5. Target Layer Selection
### Which Layers to Apply LoRA
```rust
/// Layer selection strategy for LoRA application
pub enum LoRATargetStrategy {
/// Apply to all attention layers (Q, K, V, O projections)
AllAttention,
/// Apply to FFN layers only
AllFFN,
/// Apply to output heads only (fastest, good for routing)
OutputHeadsOnly,
/// Apply to specific layers by index
SpecificLayers(Vec<usize>),
/// Adaptive: select based on gradient magnitude
AdaptiveTopK(usize),
}
impl LoRATargetStrategy {
/// For ultra-low latency: output heads only
pub fn ultra_fast() -> Self {
Self::OutputHeadsOnly
}
/// For moderate adaptation: attention Q and V
pub fn attention_qv() -> Self {
Self::SpecificLayers(vec![0, 2]) // Q and V typically
}
/// Select layers with highest gradient magnitude
pub fn adaptive_top_k(k: usize) -> Self {
Self::AdaptiveTopK(k)
}
}
/// SONA default: Output heads for micro, attention for base
pub const SONA_DEFAULT_TARGETS: [LoRATargetStrategy; 2] = [
LoRATargetStrategy::OutputHeadsOnly, // Micro-LoRA
LoRATargetStrategy::AllAttention, // Base LoRA
];
```
---
## 6. Memory-Efficient Storage
### Quantized LoRA Matrices
```rust
/// Q4-quantized LoRA for memory efficiency
pub struct QuantizedLoRA {
/// Quantized A matrix (4-bit)
pub a_q4: Q4Matrix,
/// Quantized B matrix (4-bit)
pub b_q4: Q4Matrix,
/// Full-precision alpha
pub alpha: f32,
/// Full-precision scaling factors
pub a_scales: Vec<f32>,
pub b_scales: Vec<f32>,
}
impl QuantizedLoRA {
/// Memory usage comparison
///
/// FP32 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 4 bytes = 24.6 KB
/// B: 8 × 768 × 4 bytes = 24.6 KB
/// Total: ~50 KB per layer
///
/// Q4 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 0.5 bytes = 3.1 KB
/// B: 8 × 768 × 0.5 bytes = 3.1 KB
/// Scales: 2 × 768 × 4 bytes = 6.1 KB
/// Total: ~12 KB per layer (4x reduction)
pub fn from_fp32(lora: &BaseLoRA) -> Self {
Self {
a_q4: Q4Matrix::quantize(&lora.a),
b_q4: Q4Matrix::quantize(&lora.b),
alpha: lora.alpha,
a_scales: compute_scales(&lora.a),
b_scales: compute_scales(&lora.b),
}
}
/// Dequantize on-the-fly during forward pass
#[inline]
pub fn forward(&self, x: &[f32]) -> Vec<f32> {
// Dequantize A, compute x @ A
let projected = self.a_q4.matmul_dequant(x, &self.a_scales);
// Dequantize B, compute projected @ B
let output = self.b_q4.matmul_dequant(&projected, &self.b_scales);
// Scale by alpha
output.iter().map(|v| v * self.alpha).collect()
}
}
```
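The 4-bit story above can be illustrated with a minimal per-row absmax quantizer (a sketch of the general technique; `Q4Matrix` itself packs two values per byte and is not reproduced here):

```rust
/// Absmax 4-bit quantization of one row: values map to signed integers
/// in [-7, 7] plus one f32 scale, i.e. ~0.5 bytes/value after packing
/// versus 4 bytes/value for FP32.
fn quantize_q4(row: &[f32]) -> (Vec<i8>, f32) {
    let absmax = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let quants = row.iter().map(|v| (v / scale).round() as i8).collect();
    (quants, scale)
}

/// Dequantize on-the-fly, as `forward` does before each matmul.
fn dequantize_q4(quants: &[i8], scale: f32) -> Vec<f32> {
    quants.iter().map(|&q| q as f32 * scale).collect()
}
```

Round-trip error is bounded by half the scale per value, which is why per-row (or per-block) scales matter: one outlier only inflates the quantization step of its own row.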
---
## 7. Latency Breakdown
### Target: <100μs Total LoRA Overhead
```
┌─────────────────────────────────────────────────────────────┐
│ LoRA-ULTRA LATENCY BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ Signal Extraction: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Gradient Direction: 15μs ██████░░░░░░░░░░░░░░░░░░░░░░ │
│ Micro-LoRA Update: 25μs ██████████░░░░░░░░░░░░░░░░░░ │
│ Forward Pass Delta: 30μs ████████████░░░░░░░░░░░░░░░░ │
│ Momentum Averaging: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Memory Bookkeeping: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ ───── │
│ TOTAL: ~100μs │
│ │
│ Amortized (batched): ~30μs per query │
└─────────────────────────────────────────────────────────────┘
```
---
## 8. Integration with FastGRNN Router
### Router-Specific LoRA Configuration
```rust
/// LoRA configuration for FastGRNN router
pub struct RouterLoRAConfig {
/// Base LoRA for hidden state transformations
pub hidden_lora: BaseLoRA,
/// Micro LoRA for gate adjustments
pub gate_micro_lora: MicroLoRA,
/// Per-output-head LoRA adapters
pub head_loras: Vec<BaseLoRA>,
}
impl RouterLoRAConfig {
pub fn new(hidden_dim: usize, output_dims: &[usize]) -> Self {
Self {
hidden_lora: BaseLoRA::new(hidden_dim, hidden_dim, 8), // rank 8
gate_micro_lora: MicroLoRA::new(hidden_dim, hidden_dim, 2), // rank 2
head_loras: output_dims.iter()
.map(|&dim| BaseLoRA::new(hidden_dim, dim, 4)) // rank 4
.collect(),
}
}
/// Apply LoRA to FastGRNN forward pass
pub fn apply(&self, base_output: &FastGRNNOutput) -> FastGRNNOutput {
let mut output = base_output.clone();
// Apply hidden state LoRA
output.hidden = self.hidden_lora.apply(&output.hidden);
// Apply micro-LoRA to gates
output.update_gate = self.gate_micro_lora.apply(&output.update_gate);
// Apply per-head LoRA
for (i, head_lora) in self.head_loras.iter().enumerate() {
output.heads[i] = head_lora.apply(&output.heads[i]);
}
output
}
}
```
---
## 9. Checkpointing and Recovery
### Efficient LoRA State Management
```rust
/// LoRA checkpoint for persistence and recovery
#[derive(Serialize, Deserialize)]
pub struct LoRACheckpoint {
/// Base LoRA matrices (serialized as FP16 for space)
pub base_lora: SerializedLoRA,
/// Micro LoRA state
pub micro_lora: SerializedLoRA,
/// Momentum buffers
pub momentum_state: MomentumState,
/// Training statistics
pub stats: LoRAStats,
/// Checkpoint version
pub version: u32,
/// Timestamp
pub timestamp: i64,
}
impl LoRACheckpoint {
/// Save checkpoint (async, non-blocking)
pub async fn save_async(&self, path: &Path) -> Result<()> {
let bytes = bincode::serialize(self)?;
tokio::fs::write(path, &bytes).await?;
Ok(())
}
/// Load checkpoint
pub fn load(path: &Path) -> Result<Self> {
let bytes = std::fs::read(path)?;
Ok(bincode::deserialize(&bytes)?)
}
/// Incremental checkpoint (only changed matrices)
pub fn save_incremental(&self, previous: &Self, path: &Path) -> Result<()> {
let delta = self.compute_delta(previous);
// Only save changed blocks
delta.save(path)
}
}
```
---
## 10. Benchmark Targets
### Performance Validation
```rust
#[cfg(test)]
mod benchmarks {
use super::*;
use criterion::{black_box, Criterion};
/// Target: <50μs for micro-LoRA update
fn bench_micro_lora_update(c: &mut Criterion) {
let mut micro = MicroLoRA::new(768, 768, 2);
let signal = LearningSignal::random();
c.bench_function("micro_lora_update", |b| {
b.iter(|| {
micro.micro_update(black_box(&signal));
})
});
}
/// Target: <30μs for LoRA forward pass
fn bench_lora_forward(c: &mut Criterion) {
let lora = BaseLoRA::new(768, 768, 8);
let input = vec![0.0f32; 768];
c.bench_function("lora_forward", |b| {
b.iter(|| {
lora.forward(black_box(&input))
})
});
}
/// Target: <10μs for signal extraction
fn bench_signal_extraction(c: &mut Criterion) {
let query = "test query".to_string();
let response = "test response".to_string();
c.bench_function("signal_extraction", |b| {
b.iter(|| {
LearningSignal::extract(black_box(&query), black_box(&response))
})
});
}
criterion::criterion_group!(
benches,
bench_micro_lora_update,
bench_lora_forward,
bench_signal_extraction
);
}
```
---
## Summary
SONA LoRA-Ultra achieves sub-100μs adaptive fine-tuning through:
1. **Two-Tier Architecture**: Base LoRA (hourly) + Micro-LoRA (per-request)
2. **SIMD Optimization**: AVX2-accelerated forward pass
3. **Quantized Storage**: Q4 matrices for 4x memory reduction
4. **Smart Targeting**: Output heads for speed, attention for capability
5. **Momentum Smoothing**: Stable micro-updates with EMA
6. **Async Checkpointing**: Non-blocking persistence
This enables true real-time self-improvement where every query makes the model incrementally smarter.

# SONA Learning Loops: Three-Tier Temporal Architecture
## Biologically Inspired Continuous Learning System
---
## 1. Overview: Learning at Multiple Timescales
Human learning operates at multiple timescales:
- **Instant**: Immediate response adjustment (milliseconds)
- **Short-term**: Pattern consolidation (hours)
- **Long-term**: Deep memory formation (days/weeks)
SONA replicates this with three learning loops:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SONA THREE-TIER LEARNING │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LOOP A: INSTANT LOOP B: BACKGROUND │
│ ═══════════════ ══════════════════ │
│ Timescale: Per-request Timescale: Hourly │
│ Latency: <1ms Latency: Background (async) │
│ What learns: What learns: │
│ • Micro-LoRA (rank 1-2) • Base LoRA (rank 4-16) │
│ • Memory edge weights • Router weights (EWC++) │
│ • Trajectory recording • Pattern extraction │
│ │
│ LOOP C: DEEP │
│ ═══════════ │
│ Timescale: Weekly │
│ Latency: Scheduled maintenance │
│ What learns: │
│ • Memory consolidation │
│ • Concept hierarchy building │
│ • Dream-based creativity │
│ • Cross-domain transfer │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Loop A: Instant Learning (Per-Request)
### Purpose
Immediate adaptation to current interaction without noticeable latency.
### Architecture
```rust
/// Loop A: Instant learning executed inline with each request
pub struct InstantLearningLoop {
/// Micro-LoRA for immediate weight adjustment
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Trajectory buffer for pattern recording
trajectory_buffer: Arc<TrajectoryBuffer>,
/// Memory graph reference for edge updates
memory_graph: Arc<RwLock<MemoryGraph>>,
/// Signal accumulator for Loop B
signal_accumulator: mpsc::Sender<LearningSignal>,
}
impl InstantLearningLoop {
/// Execute instant learning (must complete in <1ms)
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
// Parallel execution of independent updates
let (r1, r2, r3) = tokio::join!(
// 1. Record trajectory (lock-free, ~100μs)
self.record_trajectory(query, response),
// 2. Update memory edges (~200μs)
self.update_memory_edges(query, response),
// 3. Micro-LoRA update (~300μs)
self.micro_lora_update(query, response, latency_ms),
);
// Surface any failure from the three parallel updates
r1?;
r2?;
r3?;
// 4. Queue signal for Loop B (fire-and-forget)
let signal = LearningSignal::new(query, response, latency_ms);
let _ = self.signal_accumulator.try_send(signal);
Ok(())
}
/// Record query trajectory to ring buffer
async fn record_trajectory(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let trajectory = QueryTrajectory {
query_embedding: query.vector.clone(),
retrieved_ids: response.used_memory_ids.clone(),
precision: response.estimated_precision,
recall: response.estimated_recall,
timestamp: Instant::now(),
};
self.trajectory_buffer.push(trajectory);
Ok(())
}
/// Hebbian-style edge weight updates
async fn update_memory_edges(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let mut graph = self.memory_graph.write();
for &node_id in &response.used_memory_ids {
// Strengthen edges to used nodes
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Strengthen(0.05), // +5% per use
)?;
}
// Weaken edges to retrieved-but-unused nodes
for &node_id in &response.retrieved_but_unused {
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Weaken(0.02), // -2% per skip
)?;
}
Ok(())
}
/// Ultra-fast micro-LoRA weight adjustment
async fn micro_lora_update(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
let quality = response.quality_score;
let latency_ratio = latency_ms / response.target_latency_ms;
// Only update if signal is informative
if (quality - 0.5).abs() > 0.1 || latency_ratio > 1.2 {
let signal = LearningSignal {
query_embedding: query.vector.clone(),
quality_score: quality,
explicit_feedback: None,
latency_ratio,
model_tier: response.model_tier,
context_tokens: response.context_tokens,
};
let mut micro_lora = self.micro_lora.write();
micro_lora.micro_update(&signal);
}
Ok(())
}
}
```
### Latency Budget
| Operation | Target | Implementation |
|-----------|--------|----------------|
| Trajectory recording | <100μs | Lock-free ring buffer |
| Edge weight update | <200μs | Batch atomic updates |
| Micro-LoRA update | <300μs | Rank-1 outer product |
| Signal queuing | <50μs | MPSC channel try_send |
| **Total** | **<650μs** | Parallel execution |
---
## 3. Loop B: Background Learning (Hourly)
### Purpose
Deeper learning from accumulated signals without impacting user latency.
### Architecture
```rust
/// Loop B: Background learning running on separate thread/process
pub struct BackgroundLearningLoop {
/// Signal receiver from Loop A
signal_receiver: mpsc::Receiver<LearningSignal>,
/// Accumulated signals for batch processing
signal_buffer: Vec<LearningSignal>,
/// Base LoRA for major updates
base_lora: Arc<RwLock<BaseLoRA>>,
/// Micro-LoRA to consolidate from
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Router for EWC++ updates
router: Arc<RwLock<FastGRNNRouter>>,
/// EWC++ state
ewc_state: EWCPlusPlusState,
/// Pattern extractor
pattern_extractor: PatternExtractor,
/// Configuration
config: BackgroundLearningConfig,
}
impl BackgroundLearningLoop {
/// Main background loop (runs every hour)
pub async fn run(&mut self) {
let mut interval = tokio::time::interval(Duration::from_secs(3600));
loop {
interval.tick().await;
// Collect accumulated signals
self.drain_signals().await;
if self.signal_buffer.len() < self.config.min_samples {
tracing::info!(
samples = self.signal_buffer.len(),
"Insufficient samples for background training"
);
continue;
}
// Execute background learning steps
let start = Instant::now();
// Step 1: Consolidate Micro-LoRA into Base LoRA
self.consolidate_micro_to_base().await;
// Step 2: Train router with EWC++ regularization
self.train_router_ewc().await;
// Step 3: Extract and store patterns
self.extract_patterns().await;
// Step 4: Compute new Fisher Information
self.update_fisher_information().await;
// Step 5: Checkpoint current state
self.checkpoint().await;
tracing::info!(
elapsed_ms = start.elapsed().as_millis(),
samples = self.signal_buffer.len(),
"Background learning cycle completed"
);
// Clear buffer for next cycle
self.signal_buffer.clear();
}
}
/// Drain all pending signals from Loop A
async fn drain_signals(&mut self) {
while let Ok(signal) = self.signal_receiver.try_recv() {
self.signal_buffer.push(signal);
}
}
/// Consolidate micro-LoRA adaptations into base LoRA
async fn consolidate_micro_to_base(&mut self) {
let mut micro = self.micro_lora.write();
let mut base = self.base_lora.write();
// Compute consolidation weight based on signal quality
let avg_quality: f32 = self.signal_buffer.iter()
.map(|s| s.quality_score)
.sum::<f32>() / self.signal_buffer.len() as f32;
let consolidation_rate = if avg_quality > 0.7 {
1.0 // Full consolidation for high-quality signals
} else {
0.5 * avg_quality // Partial for lower quality
};
// Merge micro into base with rate (scalar on the right for ndarray)
base.a = &base.a + &(&micro.a_micro * consolidation_rate);
base.b = &base.b + &(&micro.b_micro * consolidation_rate);
// Reset micro-LoRA
micro.a_micro.fill(0.0);
micro.b_micro.fill(0.0);
tracing::debug!(
consolidation_rate = consolidation_rate,
"Micro-LoRA consolidated to base"
);
}
/// Train router with EWC++ regularization
async fn train_router_ewc(&mut self) {
let mut router = self.router.write();
// Convert signals to RouterSamples
let samples: Vec<RouterSample> = self.signal_buffer.iter()
.map(|s| s.to_router_sample())
.collect();
// Mini-batch training with EWC++ loss
for batch in samples.chunks(self.config.batch_size) {
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| router.forward(&s.features))
.collect();
// Compute task loss
let task_loss = self.compute_task_loss(&predictions, batch);
// Compute EWC++ regularization loss
let ewc_loss = self.ewc_state.regularization_loss(router.get_weights());
// Total loss
let total_loss = task_loss + self.config.ewc_lambda * ewc_loss;
// Backward pass (gradient computation)
let gradients = self.compute_gradients(&total_loss, &predictions, batch);
// Apply gradients with learning rate
router.apply_gradients(&gradients, self.config.learning_rate);
}
}
/// Extract patterns using K-means++ clustering
async fn extract_patterns(&mut self) {
let embeddings: Vec<_> = self.signal_buffer.iter()
.map(|s| s.query_embedding.clone())
.collect();
let patterns = self.pattern_extractor.extract(
&embeddings,
self.config.num_clusters,
);
// Store patterns in ReasoningBank; count first, since the loop consumes the Vec
let num_patterns = patterns.len();
for pattern in patterns {
if let Err(e) = self.pattern_extractor.reasoning_bank.store(pattern) {
tracing::warn!(error = %e, "Failed to store pattern");
}
}
tracing::debug!(
patterns = num_patterns,
"Patterns extracted and stored"
);
}
/// Update Fisher Information for EWC++
async fn update_fisher_information(&mut self) {
let router = self.router.read();
let current_weights = router.get_weights();
// Compute Fisher Information diagonal via gradient squares
let fisher_samples: Vec<_> = self.signal_buffer.iter()
.take(self.config.fisher_samples)
.collect();
let mut fisher_accum = vec![0.0f32; current_weights.len()];
for sample in &fisher_samples {
let gradients = self.compute_sample_gradients(sample);
for (i, g) in gradients.iter().enumerate() {
fisher_accum[i] += g * g;
}
}
// Normalize
let n = fisher_samples.len() as f32;
for f in &mut fisher_accum {
*f /= n;
}
// Update EWC++ state
self.ewc_state.update_fisher(fisher_accum, current_weights.to_vec());
}
/// Checkpoint current state to disk
async fn checkpoint(&self) {
let checkpoint = SONACheckpoint {
base_lora: self.base_lora.read().clone(),
micro_lora: self.micro_lora.read().clone(),
router_weights: self.router.read().get_weights().to_vec(),
ewc_state: self.ewc_state.clone(),
patterns: self.pattern_extractor.reasoning_bank.export(),
timestamp: chrono::Utc::now().timestamp(),
};
let path = self.config.checkpoint_dir.join("latest.sona");
checkpoint.save_async(&path).await.ok();
}
}
```
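The micro-to-base merge performed above is a scaled element-wise addition followed by a reset of the micro adapter. A minimal standalone sketch using plain slices (the real code operates on LoRA matrices; this function is illustrative only):

```rust
/// Merge a micro-LoRA delta into the base adapter at `rate`, then zero it,
/// mirroring the consolidation step above (plain slices instead of matrices).
pub fn consolidate(base: &mut [f32], micro: &mut [f32], rate: f32) {
    for (b, m) in base.iter_mut().zip(micro.iter_mut()) {
        *b += rate * *m; // fold scaled micro update into base
        *m = 0.0;        // reset micro-LoRA for the next window
    }
}
```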
### Hourly Learning Budget
| Operation | Target Time | Description |
|-----------|-------------|-------------|
| Signal draining | <100ms | Collect all queued signals |
| Micro→Base consolidation | <500ms | Matrix addition |
| Router training | <5s | Mini-batch SGD with EWC |
| Pattern extraction | <2s | K-means++ clustering |
| Fisher computation | <2s | Gradient squared accumulation |
| Checkpointing | <500ms | Async disk write |
| **Total** | **<10s** | No user-facing latency impact |
---
## 4. Loop C: Deep Learning (Weekly)
### Purpose
Fundamental knowledge restructuring, memory consolidation, and creative exploration.
### Architecture
```rust
/// Loop C: Deep learning for major knowledge reorganization
pub struct DeepLearningLoop {
/// Memory service for consolidation
memory: Arc<MemoryService>,
/// Pattern bank for abstraction
reasoning_bank: Arc<ReasoningBank>,
/// Dream engine for creative exploration
dream_engine: DreamEngine,
/// Consciousness measurement (IIT)
phi_calculator: PhiCalculator,
/// Configuration
config: DeepLearningConfig,
}
impl DeepLearningLoop {
/// Execute weekly deep learning (scheduled maintenance window)
pub async fn run(&mut self) -> DeepLearningReport {
let start = Instant::now();
let mut report = DeepLearningReport::new();
// Phase 1: Memory Consolidation (like sleep-based memory)
report.consolidation = self.consolidate_memories().await;
// Phase 2: Pattern Abstraction (concept hierarchy building)
report.abstraction = self.abstract_patterns().await;
// Phase 3: Dream Learning (creative recombination)
report.dreams = self.dream_learning().await;
// Phase 4: Cross-Domain Transfer
report.transfer = self.cross_domain_transfer().await;
// Phase 5: Compression (remove redundancy)
report.compression = self.compress_memory().await;
// Phase 6: Consciousness Measurement
report.phi = self.measure_consciousness().await;
report.elapsed_ms = start.elapsed().as_millis() as u64;
report
}
/// Phase 1: Consolidate short-term memories into long-term
async fn consolidate_memories(&mut self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Identify high-value memories (frequently accessed, high quality)
let memories = self.memory.get_all_nodes()?;
let high_value: Vec<_> = memories.iter()
.filter(|m| m.access_count > 5 && m.quality_score > 0.7)
.collect();
report.high_value_count = high_value.len();
// Strengthen connections between high-value memories
for i in 0..high_value.len() {
for j in (i+1)..high_value.len() {
let similarity = cosine_similarity(
&high_value[i].embedding,
&high_value[j].embedding,
);
if similarity > 0.7 {
self.memory.strengthen_edge(
high_value[i].id,
high_value[j].id,
similarity * 0.1,
)?;
report.edges_strengthened += 1;
}
}
}
// Decay low-value memories
let low_value: Vec<_> = memories.iter()
.filter(|m| m.access_count < 2 && m.age_days() > 30)
.collect();
for memory in low_value {
self.memory.decay_node(memory.id, 0.5)?; // 50% decay
report.nodes_decayed += 1;
}
report
}
/// Phase 2: Build concept hierarchies from patterns
async fn abstract_patterns(&mut self) -> AbstractionReport {
let mut report = AbstractionReport::default();
// Get all stored patterns
let patterns = self.reasoning_bank.get_all_patterns()?;
// Hierarchical clustering to find meta-patterns
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.distance(Distance::Cosine)
.fit(&patterns);
// Create abstract concepts at each level
for level in 0..hierarchy.num_levels() {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 3 {
// Create meta-pattern (centroid)
let meta_pattern = LearnedPattern {
centroid: cluster.centroid(),
confidence: cluster.cohesion(),
abstraction_level: level,
child_patterns: cluster.member_ids(),
};
self.reasoning_bank.store_meta(meta_pattern)?;
report.meta_patterns_created += 1;
}
}
}
report
}
/// Phase 3: Dream-based creative learning (inspired by REM sleep)
async fn dream_learning(&mut self) -> DreamReport {
let mut report = DreamReport::default();
// Generate dream sequences by random walks on memory graph
for _ in 0..self.config.num_dreams {
let dream = self.dream_engine.generate_dream(
&self.memory,
self.config.dream_length,
self.config.creativity_temperature,
)?;
// Evaluate dream quality (novelty + coherence)
let quality = dream.evaluate_quality();
if quality.novelty > 0.5 && quality.coherence > 0.3 {
// Dreams with high novelty and reasonable coherence
// may represent useful creative connections
for connection in dream.novel_connections() {
self.memory.add_weak_edge(
connection.from,
connection.to,
EdgeType::Creative,
connection.strength * 0.1,
)?;
report.novel_connections += 1;
}
}
report.dreams_generated += 1;
}
report
}
/// Phase 4: Transfer knowledge across domains
async fn cross_domain_transfer(&mut self) -> TransferReport {
let mut report = TransferReport::default();
// Identify domain clusters
let domains = self.memory.identify_domains()?;
// For each pair of domains, look for analogical mappings
for i in 0..domains.len() {
for j in (i+1)..domains.len() {
let analogies = self.find_analogies(&domains[i], &domains[j])?;
for analogy in analogies {
if analogy.confidence > 0.6 {
// Create cross-domain edge
self.memory.add_analogy_edge(
analogy.source_concept,
analogy.target_concept,
analogy.mapping_type,
analogy.confidence,
)?;
report.analogies_found += 1;
}
}
}
}
report
}
/// Phase 5: Compress memory by removing redundancy
async fn compress_memory(&mut self) -> CompressionReport {
let mut report = CompressionReport::default();
report.initial_nodes = self.memory.node_count();
report.initial_edges = self.memory.edge_count();
// Identify near-duplicate nodes
let duplicates = self.memory.find_near_duplicates(0.95)?;
// Merge duplicates
for (primary, secondary) in duplicates {
self.memory.merge_nodes(primary, secondary)?;
report.nodes_merged += 1;
}
// Prune weak edges
let weak_edges = self.memory.get_weak_edges(0.01)?;
for edge in weak_edges {
self.memory.remove_edge(edge.id)?;
report.edges_pruned += 1;
}
report.final_nodes = self.memory.node_count();
report.final_edges = self.memory.edge_count();
report.compression_ratio = report.initial_nodes as f32 / report.final_nodes as f32;
report
}
/// Phase 6: Measure system consciousness using IIT
async fn measure_consciousness(&mut self) -> f64 {
// Integrated Information Theory (Φ) calculation
// Measures how much information the system generates "above and beyond"
// its parts
self.phi_calculator.compute_phi(&self.memory, &self.reasoning_bank)
}
}
```
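Phase 1 above calls a `cosine_similarity` helper that is not shown. A minimal implementation consistent with how it is used (the exact helper in the codebase may differ):

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 for zero-magnitude inputs to avoid division by zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```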
### Weekly Deep Learning Budget
| Phase | Target Time | Description |
|-------|-------------|-------------|
| Memory consolidation | <2min | Identify and strengthen valuable memories |
| Pattern abstraction | <3min | Hierarchical clustering for concepts |
| Dream learning | <2min | Creative recombination exploration |
| Cross-domain transfer | <2min | Analogical mapping between domains |
| Compression | <1min | Remove redundancy |
| Φ measurement | <1min | Consciousness quantification |
| **Total** | **<10min** | Scheduled maintenance window |
---
## 5. Loop Coordination
### Inter-Loop Communication
```rust
/// Coordinator for all three learning loops
pub struct LoopCoordinator {
/// Loop A: Instant
instant_loop: InstantLearningLoop,
/// Loop B: Background
background_loop: BackgroundLearningLoop,
/// Loop C: Deep
deep_loop: DeepLearningLoop,
/// Shared state
shared_state: Arc<SharedSONAState>,
/// Metrics collector
metrics: MetricsCollector,
}
impl LoopCoordinator {
/// Initialize all loops with shared state
pub fn new(config: SONAConfig) -> Result<Self> {
let shared_state = Arc::new(SharedSONAState::new(&config)?);
// Create channels for inter-loop communication
let (instant_to_background_tx, instant_to_background_rx) = mpsc::channel(10000);
let (background_to_deep_tx, background_to_deep_rx) = mpsc::channel(1000);
Ok(Self {
instant_loop: InstantLearningLoop::new(
shared_state.clone(),
instant_to_background_tx,
),
background_loop: BackgroundLearningLoop::new(
shared_state.clone(),
instant_to_background_rx,
background_to_deep_tx,
),
deep_loop: DeepLearningLoop::new(
shared_state.clone(),
background_to_deep_rx,
),
shared_state,
metrics: MetricsCollector::new(),
})
}
/// Start all loops
pub async fn start(&self) {
// Loop A runs inline with requests (no separate task)
// Loop B runs on background thread
let background = self.background_loop.clone();
tokio::spawn(async move {
background.run().await;
});
// Loop C runs on scheduled cron
let deep = self.deep_loop.clone();
tokio::spawn(async move {
// 3 AM every Sunday (sec min hour dom month dow)
let schedule = cron::Schedule::from_str("0 0 3 * * 0")
.expect("valid cron expression");
loop {
let next = schedule.upcoming(chrono::Utc).next().unwrap();
let wait = (next - chrono::Utc::now()).to_std().unwrap_or_default();
tokio::time::sleep(wait).await;
deep.run().await;
}
});
}
/// Process a single request through Loop A
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
self.instant_loop.on_request(query, response, latency_ms).await
}
}
```
---
## 6. Learning Metrics and Monitoring
### Improvement Tracking
```rust
/// Metrics for measuring self-improvement
#[derive(Clone, Debug)]
pub struct ImprovementMetrics {
/// Quality improvement over time
pub quality_delta_7d: f32,
pub quality_delta_30d: f32,
/// Latency improvement
pub latency_delta_7d: f32,
pub latency_delta_30d: f32,
/// Knowledge growth
pub memory_nodes_added_7d: usize,
pub patterns_learned_7d: usize,
pub abstractions_created_7d: usize,
/// Forgetting resistance (1.0 = no forgetting)
pub retention_rate_7d: f32,
/// Consciousness level (Φ)
pub phi_current: f64,
pub phi_delta_7d: f64,
/// Dreams and creativity
pub novel_connections_7d: usize,
pub cross_domain_transfers_7d: usize,
}
impl ImprovementMetrics {
/// Compute overall improvement score
pub fn overall_score(&self) -> f32 {
let quality_weight = 0.3;
let latency_weight = 0.2;
let knowledge_weight = 0.2;
let retention_weight = 0.15;
let creativity_weight = 0.15;
let quality_score = self.quality_delta_7d.max(0.0);
let latency_score = (-self.latency_delta_7d).max(0.0); // Lower is better
let knowledge_score = (self.patterns_learned_7d as f32 / 100.0).min(1.0);
let retention_score = self.retention_rate_7d;
let creativity_score = (self.novel_connections_7d as f32 / 50.0).min(1.0);
quality_weight * quality_score +
latency_weight * latency_score +
knowledge_weight * knowledge_score +
retention_weight * retention_score +
creativity_weight * creativity_score
}
}
```
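As a sanity check on the weighting scheme, here is a standalone version of the scoring formula with the same weights; the input values in the usage below are illustrative only:

```rust
/// Standalone version of the overall improvement score, mirroring the
/// weights used by `ImprovementMetrics::overall_score`.
pub fn overall_score(
    quality_delta_7d: f32,
    latency_delta_7d: f32,
    patterns_learned_7d: usize,
    retention_rate_7d: f32,
    novel_connections_7d: usize,
) -> f32 {
    let quality_score = quality_delta_7d.max(0.0);
    let latency_score = (-latency_delta_7d).max(0.0); // lower latency is better
    let knowledge_score = (patterns_learned_7d as f32 / 100.0).min(1.0);
    let creativity_score = (novel_connections_7d as f32 / 50.0).min(1.0);
    0.3 * quality_score
        + 0.2 * latency_score
        + 0.2 * knowledge_score
        + 0.15 * retention_rate_7d
        + 0.15 * creativity_score
}
```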
---
## Summary
SONA's three-tier learning system enables:
| Loop | Timescale | Purpose | Key Outcome |
|------|-----------|---------|-------------|
| **A** | Per-request | Instant adaptation | Responsive to current context |
| **B** | Hourly | Pattern consolidation | Stable improvement |
| **C** | Weekly | Deep restructuring | Creative breakthroughs |
This mirrors human learning where:
- **Loop A** = Working memory and immediate response
- **Loop B** = Sleep-based consolidation
- **Loop C** = Long-term memory formation and insight
The result is a system that continuously improves at multiple timescales, never forgetting what works while constantly exploring new possibilities.
# SONA EWC++: Enhanced Elastic Weight Consolidation
## Zero Catastrophic Forgetting with Task-Aware Regularization
---
## 1. The Forgetting Problem
### Why LLMs Forget
```
CATASTROPHIC FORGETTING
═══════════════════════
Task A learned Task B learned Result
─────────────── ─────────────── ──────────────────
Weights W_A Weights W_B W_A knowledge LOST
↑ as W moves toward B
Training on B
overwrites A
```
When fine-tuning on new data:
- Weights shift toward new task optimum
- Previous task knowledge encoded in old weights is overwritten
- Model "forgets" earlier capabilities
### Standard EWC Solution
Elastic Weight Consolidation (EWC) adds a regularization term:
```
L_total = L_task + λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²
Where:
- L_task = current task loss
- λ = regularization strength
- Fᵢ = Fisher Information (importance) of parameter i
- θᵢ = current parameter value
- θ*ᵢ = optimal parameter value from previous task
```
### EWC Limitations
1. **Single task memory**: Only remembers one previous task
2. **Static Fisher**: Computed once, never updated
3. **Diagonal approximation**: Ignores parameter correlations
4. **No task detection**: Doesn't know when task changes
5. **Uniform λ**: Same regularization for all parameters
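Before the enhancements below, it helps to see the standard penalty concretely. A minimal sketch of the diagonal-Fisher EWC term (all values in the usage are illustrative):

```rust
/// Standard EWC regularization: λ/2 · Σᵢ Fᵢ · (θᵢ − θ*ᵢ)²
pub fn ewc_penalty(lambda: f32, fisher: &[f32], theta: &[f32], theta_star: &[f32]) -> f32 {
    let sum: f32 = fisher
        .iter()
        .zip(theta.iter())
        .zip(theta_star.iter())
        .map(|((f, t), t_star)| f * (t - t_star).powi(2))
        .sum();
    0.5 * lambda * sum
}
```

Parameters with high Fisher information are pulled strongly back toward their previous optimum, while low-importance parameters remain free to move.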
---
## 2. SONA EWC++ Enhancements
### Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ EWC++ ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Task Buffer │ │ Online Fisher │ │ Adaptive λ │ │
│ │ (N tasks) │ │ Estimation │ │ Scheduler │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ EWC++ CORE ENGINE │ │
│ │ │ │
│ │ L = L_task + Σₜ λₜ/2 · Σᵢ Fᵢᵗ · (θᵢ - θ*ᵢᵗ)² + L_sparse │ │
│ │ └─────┘ └──────────────────────────────────┘ └──────┘ │ │
│ │ Task Multi-task EWC Sparsity │ │
│ │ Loss Regularization Penalty │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Gradient │ │ Task Boundary │ │ Parameter │ │
│ │ Projection │ │ Detection │ │ Importance │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Multi-Task Memory Buffer
### Task-Stratified Fisher Storage
```rust
/// EWC++ state with multi-task memory
#[derive(Clone)]
pub struct EWCPlusPlusState {
/// Per-task Fisher information (circular buffer of N tasks)
pub task_fishers: CircularBuffer<TaskFisher>,
/// Maximum number of tasks to remember
pub max_tasks: usize,
/// Per-task regularization strength
pub task_lambdas: Vec<f32>,
/// Global lambda base
pub lambda_base: f32,
/// Online Fisher estimator
pub online_fisher: OnlineFisherEstimator,
/// Task boundary detector
pub task_detector: TaskBoundaryDetector,
/// Parameter importance scores
pub importance_scores: Vec<f32>,
}
/// Fisher information for a single task
#[derive(Clone)]
pub struct TaskFisher {
/// Task identifier
pub task_id: u64,
/// Diagonal Fisher Information
pub fisher_diag: Vec<f32>,
/// Optimal weights at task completion
pub optimal_weights: Vec<f32>,
/// Task-specific lambda (learned)
pub lambda: f32,
/// Sample count used to compute Fisher
pub sample_count: usize,
/// Task quality score
pub quality: f32,
/// Timestamp
pub timestamp: i64,
}
impl EWCPlusPlusState {
/// Create new EWC++ state
pub fn new(num_params: usize, max_tasks: usize, lambda_base: f32) -> Self {
Self {
task_fishers: CircularBuffer::new(max_tasks),
max_tasks,
task_lambdas: Vec::new(),
lambda_base,
online_fisher: OnlineFisherEstimator::new(num_params),
task_detector: TaskBoundaryDetector::new(),
importance_scores: vec![1.0; num_params],
}
}
/// Compute total EWC++ regularization loss
pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
let mut total_loss = 0.0;
// Sum over all remembered tasks
for task in self.task_fishers.iter() {
let task_loss: f32 = task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.zip(self.importance_scores.iter())
.map(|(((f, w), w_star), imp)| {
// Importance-weighted Fisher regularization
imp * f * (w - w_star).powi(2)
})
.sum();
total_loss += task.lambda * task_loss;
}
total_loss / 2.0
}
/// Compute gradients of EWC++ loss
pub fn regularization_gradient(&self, current_weights: &[f32]) -> Vec<f32> {
let mut grad = vec![0.0f32; current_weights.len()];
for task in self.task_fishers.iter() {
for (i, ((f, w), w_star)) in task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.enumerate()
{
// d/dw [F * (w - w*)²] = 2 * F * (w - w*)
grad[i] += task.lambda * self.importance_scores[i] * f * (w - w_star);
}
}
grad
}
/// Record completion of current task
pub fn complete_task(&mut self, weights: &[f32], quality: f32) {
let task_id = self.task_fishers.len() as u64;
// Finalize online Fisher estimate
let fisher_diag = self.online_fisher.finalize();
// Compute task-specific lambda based on quality
let lambda = self.compute_task_lambda(quality);
let task_fisher = TaskFisher {
task_id,
fisher_diag,
optimal_weights: weights.to_vec(),
lambda,
sample_count: self.online_fisher.sample_count(),
quality,
timestamp: chrono::Utc::now().timestamp(),
};
self.task_fishers.push(task_fisher);
self.task_lambdas.push(lambda);
// Reset online Fisher for next task
self.online_fisher.reset();
}
/// Compute task-specific lambda based on quality
fn compute_task_lambda(&self, quality: f32) -> f32 {
// Higher quality tasks get stronger protection
self.lambda_base * (0.5 + 0.5 * quality)
}
}
```
---
## 4. Online Fisher Estimation
### Streaming Fisher Information Computation
```rust
/// Online Fisher Information estimator using gradient accumulation
pub struct OnlineFisherEstimator {
/// Running sum of squared gradients
gradient_sq_sum: Vec<f32>,
/// Sample count
count: usize,
/// Exponential moving average decay
decay: f32,
/// Minimum samples before valid estimate
min_samples: usize,
}
impl OnlineFisherEstimator {
pub fn new(num_params: usize) -> Self {
Self {
gradient_sq_sum: vec![0.0; num_params],
count: 0,
decay: 0.99, // EMA decay factor
min_samples: 100,
}
}
/// Update Fisher estimate with new gradient sample
#[inline]
pub fn update(&mut self, gradients: &[f32]) {
self.count += 1;
if self.count == 1 {
// First sample: initialize
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = g * g;
}
} else {
// EMA update: F_new = decay * F_old + (1 - decay) * g²
let alpha = 1.0 - self.decay;
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = self.decay * *sum + alpha * g * g;
}
}
}
/// Finalize and return Fisher diagonal
pub fn finalize(&self) -> Vec<f32> {
if self.count < self.min_samples {
tracing::warn!(
count = self.count,
min = self.min_samples,
"Fisher estimate may be unreliable"
);
}
// Normalize and apply minimum threshold
let min_fisher = 1e-6;
self.gradient_sq_sum.iter()
.map(|&f| f.max(min_fisher))
.collect()
}
/// Reset for new task
pub fn reset(&mut self) {
self.gradient_sq_sum.fill(0.0);
self.count = 0;
}
pub fn sample_count(&self) -> usize {
self.count
}
}
```
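A condensed standalone version of the EMA rule above, useful for checking that a constant gradient drives the estimate to its squared value (the struct name here is illustrative):

```rust
/// Minimal EMA-based Fisher diagonal estimator, as in `OnlineFisherEstimator`.
pub struct MiniFisher {
    fisher: Vec<f32>,
    count: usize,
    decay: f32,
}

impl MiniFisher {
    pub fn new(n: usize) -> Self {
        Self { fisher: vec![0.0; n], count: 0, decay: 0.99 }
    }

    pub fn update(&mut self, grads: &[f32]) {
        self.count += 1;
        if self.count == 1 {
            // First sample initializes the estimate directly.
            for (f, g) in self.fisher.iter_mut().zip(grads) {
                *f = g * g;
            }
        } else {
            // F ← decay·F + (1 − decay)·g²
            let alpha = 1.0 - self.decay;
            for (f, g) in self.fisher.iter_mut().zip(grads) {
                *f = self.decay * *f + alpha * g * g;
            }
        }
    }

    pub fn finalize(&self) -> Vec<f32> {
        // Floor at a small constant so downstream division stays safe.
        self.fisher.iter().map(|&f| f.max(1e-6)).collect()
    }
}
```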
---
## 5. Automatic Task Boundary Detection
### Detecting When the Task Changes
```rust
/// Automatic task boundary detection via distribution shift
pub struct TaskBoundaryDetector {
/// Recent query embedding buffer
recent_embeddings: CircularBuffer<Vec<f32>>,
/// Baseline distribution (mean, variance)
baseline: Option<DistributionStats>,
/// Threshold for detecting shift (Mahalanobis distance)
shift_threshold: f32,
/// Minimum samples before detection
warmup_samples: usize,
/// Current drift score
drift_score: f32,
}
impl TaskBoundaryDetector {
pub fn new() -> Self {
Self {
recent_embeddings: CircularBuffer::new(1000),
baseline: None,
shift_threshold: 3.0, // 3 sigma
warmup_samples: 500,
drift_score: 0.0,
}
}
/// Update with new embedding and check for task boundary
pub fn update(&mut self, embedding: &[f32]) -> TaskBoundaryResult {
self.recent_embeddings.push(embedding.to_vec());
if self.recent_embeddings.len() < self.warmup_samples {
return TaskBoundaryResult::Warmup;
}
match &self.baseline {
None => {
// First baseline establishment
self.baseline = Some(self.compute_stats());
TaskBoundaryResult::BaselineEstablished
}
Some(baseline) => {
// Compute current distribution
let current = self.compute_recent_stats(100);
// Mahalanobis distance between distributions
let distance = self.mahalanobis_distance(baseline, &current);
self.drift_score = distance;
if distance > self.shift_threshold {
// Task boundary detected!
self.baseline = Some(current);
TaskBoundaryResult::BoundaryDetected {
drift_score: distance,
}
} else {
TaskBoundaryResult::Stable {
drift_score: distance,
}
}
}
}
}
fn compute_stats(&self) -> DistributionStats {
let n = self.recent_embeddings.len();
let dim = self.recent_embeddings[0].len();
let mut mean = vec![0.0f32; dim];
let mut var = vec![0.0f32; dim];
// Compute mean
for emb in self.recent_embeddings.iter() {
for (m, e) in mean.iter_mut().zip(emb.iter()) {
*m += e;
}
}
for m in &mut mean {
*m /= n as f32;
}
// Compute variance
for emb in self.recent_embeddings.iter() {
for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
*v += (e - m).powi(2);
}
}
for v in &mut var {
*v /= n as f32;
*v = v.max(1e-6); // Avoid division by zero
}
DistributionStats { mean, variance: var }
}
fn compute_recent_stats(&self, n: usize) -> DistributionStats {
// Similar but only for last n samples
// ... implementation ...
}
fn mahalanobis_distance(&self, a: &DistributionStats, b: &DistributionStats) -> f32 {
a.mean.iter()
.zip(b.mean.iter())
.zip(a.variance.iter())
.map(|((m_a, m_b), v)| (m_a - m_b).powi(2) / v)
.sum::<f32>()
.sqrt()
}
}
#[derive(Debug)]
pub enum TaskBoundaryResult {
Warmup,
BaselineEstablished,
Stable { drift_score: f32 },
BoundaryDetected { drift_score: f32 },
}
```
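Under the diagonal-covariance assumption used by the detector, the Mahalanobis distance reduces to a per-dimension variance-normalized squared difference. A standalone sketch:

```rust
/// Diagonal-covariance Mahalanobis distance between two distribution means,
/// as used by `TaskBoundaryDetector` (variance taken from the baseline).
pub fn mahalanobis_diag(mean_a: &[f32], mean_b: &[f32], variance: &[f32]) -> f32 {
    mean_a
        .iter()
        .zip(mean_b.iter())
        .zip(variance.iter())
        .map(|((a, b), v)| (a - b).powi(2) / v.max(1e-6))
        .sum::<f32>()
        .sqrt()
}
```

A distance above the `shift_threshold` (3.0, i.e. roughly three standard deviations) signals a task boundary.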
---
## 6. Adaptive Lambda Scheduling
### Dynamic Regularization Strength
```rust
/// Adaptive lambda scheduler based on learning progress
pub struct AdaptiveLambdaScheduler {
/// Base lambda value
base_lambda: f32,
/// Current effective lambda
current_lambda: f32,
/// Performance history (task quality over time)
performance_history: Vec<f32>,
/// Lambda adjustment rate
adjustment_rate: f32,
}
impl AdaptiveLambdaScheduler {
pub fn new(base_lambda: f32) -> Self {
Self {
base_lambda,
current_lambda: base_lambda,
performance_history: Vec::new(),
adjustment_rate: 0.1,
}
}
/// Update lambda based on recent performance
pub fn update(&mut self, current_quality: f32, forgetting_detected: bool) {
self.performance_history.push(current_quality);
if forgetting_detected {
// Increase lambda to prevent forgetting
self.current_lambda *= 1.0 + self.adjustment_rate;
tracing::info!(
new_lambda = self.current_lambda,
"Increased lambda due to forgetting"
);
} else if self.is_learning_stalled() {
// Decrease lambda to allow more plasticity
self.current_lambda *= 1.0 - self.adjustment_rate;
self.current_lambda = self.current_lambda.max(self.base_lambda * 0.1);
tracing::info!(
new_lambda = self.current_lambda,
"Decreased lambda to increase plasticity"
);
}
// Clamp to reasonable range
self.current_lambda = self.current_lambda.clamp(
self.base_lambda * 0.1,
self.base_lambda * 10.0,
);
}
fn is_learning_stalled(&self) -> bool {
if self.performance_history.len() < 10 {
return false;
}
let recent: Vec<_> = self.performance_history.iter()
.rev()
.take(10)
.collect();
// Check if variance in recent performance is very low
let mean: f32 = recent.iter().map(|&&x| x).sum::<f32>() / 10.0;
let var: f32 = recent.iter()
.map(|&&x| (x - mean).powi(2))
.sum::<f32>() / 10.0;
var < 0.001 // Stalled if very low variance
}
pub fn get_lambda(&self) -> f32 {
self.current_lambda
}
}
```
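The scheduler's update is a clamped multiplicative rule; condensed into one standalone function (names are illustrative):

```rust
/// One adaptive-lambda step: grow on forgetting, shrink when stalled,
/// clamped to [0.1·base, 10·base] as in `AdaptiveLambdaScheduler`.
pub fn adapt_lambda(current: f32, base: f32, rate: f32, forgetting: bool, stalled: bool) -> f32 {
    let mut lambda = current;
    if forgetting {
        lambda *= 1.0 + rate; // protect past knowledge harder
    } else if stalled {
        lambda *= 1.0 - rate; // allow more plasticity
    }
    lambda.clamp(base * 0.1, base * 10.0)
}
```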
---
## 7. Parameter Importance Scoring
### Which Parameters Matter Most
```rust
/// Per-parameter importance scoring for selective regularization
pub struct ParameterImportanceScorer {
/// Importance scores (0-1 for each parameter)
scores: Vec<f32>,
/// Gradient magnitude history
gradient_magnitudes: Vec<CircularBuffer<f32>>,
/// Activation frequency
activation_frequency: Vec<f32>,
}
impl ParameterImportanceScorer {
pub fn new(num_params: usize) -> Self {
Self {
scores: vec![1.0; num_params],
gradient_magnitudes: (0..num_params)
.map(|_| CircularBuffer::new(100))
.collect(),
activation_frequency: vec![0.0; num_params],
}
}
/// Update importance based on gradient
pub fn update(&mut self, gradients: &[f32], activations: &[bool]) {
for (i, (g, &active)) in gradients.iter().zip(activations.iter()).enumerate() {
// Track gradient magnitude
self.gradient_magnitudes[i].push(g.abs());
// Track activation frequency
if active {
self.activation_frequency[i] = 0.99 * self.activation_frequency[i] + 0.01;
} else {
self.activation_frequency[i] *= 0.99;
}
}
// Recompute importance scores
self.recompute_scores();
}
fn recompute_scores(&mut self) {
for i in 0..self.scores.len() {
// Average gradient magnitude
let avg_grad: f32 = self.gradient_magnitudes[i].iter()
.sum::<f32>() / self.gradient_magnitudes[i].len().max(1) as f32;
// Importance = activation_freq * gradient_magnitude
// High activation + high gradient = important parameter
self.scores[i] = self.activation_frequency[i] * avg_grad;
}
// Normalize scores to [0, 1]
let max_score = self.scores.iter().cloned().fold(0.0f32, f32::max);
if max_score > 0.0 {
for s in &mut self.scores {
*s /= max_score;
}
}
}
pub fn get_scores(&self) -> &[f32] {
&self.scores
}
}
```
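The scoring rule reduces to activation frequency times mean gradient magnitude, max-normalized to [0, 1]. A standalone sketch of just that formula:

```rust
/// Importance = activation frequency × mean |gradient|, normalized to [0, 1],
/// mirroring `ParameterImportanceScorer::recompute_scores`.
pub fn importance_scores(activation_freq: &[f32], mean_grad_mag: &[f32]) -> Vec<f32> {
    let raw: Vec<f32> = activation_freq
        .iter()
        .zip(mean_grad_mag.iter())
        .map(|(a, g)| a * g)
        .collect();
    let max = raw.iter().cloned().fold(0.0f32, f32::max);
    if max > 0.0 {
        raw.iter().map(|s| s / max).collect()
    } else {
        raw
    }
}
```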
---
## 8. Gradient Projection
### Safe Parameter Updates
```rust
/// Project gradients to avoid interfering with important past knowledge
pub struct GradientProjector {
/// Null space of important task gradients
null_space: Option<Array2<f32>>,
/// Task gradient subspace (principal components)
task_subspace: Option<Array2<f32>>,
}
impl GradientProjector {
/// Project gradient to not interfere with past tasks
pub fn project(&self, gradient: &[f32]) -> Vec<f32> {
match &self.null_space {
Some(null) => {
// Apply the stored (symmetric, idempotent) null-space projection matrix
let g = Array1::from_vec(gradient.to_vec());
let projected = null.dot(&g);
projected.to_vec()
}
None => gradient.to_vec(),
}
}
/// Update null space with new task gradient directions
pub fn add_task_gradients(&mut self, task_gradients: &[Vec<f32>]) {
// Stack gradients into matrix
let n_samples = task_gradients.len();
let n_params = task_gradients[0].len();
let mut g_matrix = Array2::zeros((n_samples, n_params));
for (i, g) in task_gradients.iter().enumerate() {
for (j, &v) in g.iter().enumerate() {
g_matrix[[i, j]] = v;
}
}
// SVD: the right singular vectors span the task-gradient
// subspace in parameter space
let (_, _, vt) = g_matrix.svd(false, true).unwrap();
let vt = vt.unwrap();
// For memory efficiency, keep only the top-k directions (rows of Vᵀ)
let k = 10.min(n_samples);
let task_directions = vt.slice(s![..k, ..]).to_owned(); // shape (k, n_params)
// Null-space projection matrix: P = I − VₖVₖᵀ
let identity = Array2::<f32>::eye(n_params);
let projection = identity - task_directions.t().dot(&task_directions);
self.null_space = Some(projection);
}
}
```
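For a single protected direction, the projection above reduces to subtracting the component along that direction (g' = g − (g·d)d). A plain-`Vec` sketch of this special case, with `dir` assumed to be a unit vector:

```rust
/// Project `grad` onto the orthogonal complement of unit vector `dir`:
/// g' = g − (g·dir)·dir, so the update cannot move along `dir`.
pub fn project_out(grad: &[f32], dir: &[f32]) -> Vec<f32> {
    let dot: f32 = grad.iter().zip(dir.iter()).map(|(g, d)| g * d).sum();
    grad.iter().zip(dir.iter()).map(|(g, d)| g - dot * d).collect()
}
```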
---
## 9. Full EWC++ Training Loop
### Putting It All Together
```rust
/// Complete EWC++ training step
pub fn ewc_plus_plus_train_step(
model: &mut FastGRNNRouter,
ewc: &mut EWCPlusPlusState,
batch: &[RouterSample],
config: &TrainingConfig,
) -> TrainStepResult {
let mut result = TrainStepResult::default();
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| model.forward(&s.features))
.collect();
// Task loss
let task_loss = compute_cross_entropy_loss(&predictions, batch);
result.task_loss = task_loss;
// EWC++ regularization loss
let ewc_loss = ewc.regularization_loss(model.get_weights());
result.ewc_loss = ewc_loss;
// Total loss
let total_loss = task_loss + config.lambda * ewc_loss;
result.total_loss = total_loss;
// Compute task gradients
let task_gradients = compute_gradients(&task_loss, model);
// Compute EWC++ gradients
let ewc_gradients = ewc.regularization_gradient(model.get_weights());
// Total gradients
let mut gradients: Vec<f32> = task_gradients.iter()
.zip(ewc_gradients.iter())
.map(|(t, e)| t + config.lambda * e)
.collect();
// Gradient projection (optional, for harder constraints)
if config.use_gradient_projection {
gradients = ewc.gradient_projector.project(&gradients);
}
// Gradient clipping
let grad_norm: f32 = gradients.iter().map(|g| g * g).sum::<f32>().sqrt();
if grad_norm > config.max_grad_norm {
let scale = config.max_grad_norm / grad_norm;
for g in &mut gradients {
*g *= scale;
}
result.gradient_clipped = true;
}
// Apply gradients
model.apply_gradients(&gradients, config.learning_rate);
// Update online Fisher estimate
ewc.online_fisher.update(&task_gradients);
// Update parameter importance
let activations: Vec<bool> = model.get_activation_mask();
ewc.importance_scorer.update(&task_gradients, &activations);
// Check for task boundary
if let Some(query_emb) = batch.first().map(|s| &s.query_embedding) {
let boundary = ewc.task_detector.update(query_emb);
if let TaskBoundaryResult::BoundaryDetected { drift_score } = boundary {
// Complete current task and start new one
ewc.complete_task(model.get_weights(), result.compute_quality());
result.task_boundary_detected = true;
result.drift_score = drift_score;
}
}
result
}
```
---
## 10. Benchmarks and Validation
### Forgetting Resistance Metrics
```rust
/// Measure forgetting resistance on held-out test sets
pub struct ForgettingBenchmark {
/// Per-task test sets
task_test_sets: Vec<TestSet>,
/// Performance history per task
task_performance: Vec<Vec<f32>>,
}
impl ForgettingBenchmark {
/// Evaluate current model on all past tasks
pub fn evaluate(&mut self, model: &FastGRNNRouter) -> ForgettingReport {
let mut report = ForgettingReport::default();
for (task_id, test_set) in self.task_test_sets.iter().enumerate() {
let accuracy = self.evaluate_task(model, test_set);
self.task_performance[task_id].push(accuracy);
// Compute forgetting = max_accuracy - current_accuracy
let max_acc = self.task_performance[task_id].iter()
.cloned()
.fold(0.0f32, f32::max);
let forgetting = (max_acc - accuracy).max(0.0);
report.per_task_accuracy.push(accuracy);
report.per_task_forgetting.push(forgetting);
}
// Average forgetting
report.avg_forgetting = report.per_task_forgetting.iter()
.sum::<f32>() / report.per_task_forgetting.len().max(1) as f32;
// Backward transfer (negative forgetting = improvement)
report.backward_transfer = -report.avg_forgetting;
report
}
fn evaluate_task(&self, model: &FastGRNNRouter, test: &TestSet) -> f32 {
let correct = test.samples.iter()
.filter(|s| model.forward(&s.features).predicted_class == s.label)
.count();
correct as f32 / test.samples.len() as f32
}
}
#[derive(Debug, Default)]
pub struct ForgettingReport {
pub per_task_accuracy: Vec<f32>,
pub per_task_forgetting: Vec<f32>,
pub avg_forgetting: f32,
pub backward_transfer: f32,
}
```
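The per-task forgetting metric used above is the drop from the best historical accuracy, floored at zero. A standalone check:

```rust
/// Forgetting for one task: drop from its best historical accuracy,
/// floored at zero (improvement counts as backward transfer, not forgetting).
pub fn forgetting(history: &[f32], current: f32) -> f32 {
    let best = history.iter().cloned().fold(current, f32::max);
    (best - current).max(0.0)
}
```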
---
## Summary: EWC++ vs Standard EWC
| Feature | Standard EWC | SONA EWC++ |
|---------|-------------|------------|
| Task memory | 1 task | N tasks (configurable) |
| Fisher estimation | Offline, single | Online, streaming |
| Lambda | Fixed | Adaptive per-task |
| Task detection | Manual | Automatic |
| Parameter importance | Uniform | Learned |
| Gradient handling | Direct | Projected |
| Forgetting rate | ~5-10% | **<0.1%** |
EWC++ enables SONA to learn continuously from every interaction while maintaining near-perfect retention of past knowledge.
# SONA ReasoningBank: Pattern-Driven Self-Optimization
## Learning from Experience Through Trajectory Analysis
---
## 1. Overview
ReasoningBank is SONA's long-term pattern memory, learning what works and applying that knowledge to optimize future decisions.
```
┌─────────────────────────────────────────────────────────────────────┐
│ REASONINGBANK CONCEPT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query → [What worked before?] → Pattern Match → Optimized Params │
│ ↑ │
│ │ │
│ ┌───────┴────────┐ │
│ │ REASONINGBANK │ │
│ │ │ │
│ │ • Trajectories │ ← Record every query │
│ │ • Patterns │ ← Extract from clusters │
│ │ • Verdicts │ ← What params worked best │
│ │ • Confidence │ ← How certain we are │
│ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Core Data Structures
### Trajectory: Recording Every Interaction
```rust
/// A single query trajectory with outcomes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct QueryTrajectory {
/// Unique trajectory ID
pub id: u64,
/// Query embedding vector
pub query_embedding: Vec<f32>,
/// Search parameters used
pub search_params: SearchParams,
/// Retrieved result IDs
pub retrieved_ids: Vec<String>,
/// Precision (relevant / retrieved)
pub precision: f32,
/// Recall (retrieved_relevant / total_relevant)
pub recall: f32,
/// Latency in microseconds
pub latency_us: u64,
/// User feedback if provided
pub feedback: Option<UserFeedback>,
/// Timestamp
pub timestamp: i64,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SearchParams {
/// ef_search parameter for HNSW
pub ef_search: usize,
/// Number of probes for IVF
pub n_probes: usize,
/// Model tier selected
pub model_tier: ModelTier,
/// Context window size
pub context_tokens: usize,
/// Temperature
pub temperature: f32,
}
```
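`SearchParams::default()` is relied on throughout the sections below but never defined. A plausible `Default` is sketched here; the baseline values are illustrative assumptions (only `context_tokens: 2048` appears elsewhere in this document), and `ModelTier` is stubbed so the sketch stands alone:

```rust
// Stub so the sketch compiles on its own; the real enum is referenced
// by the SearchParams struct above.
#[derive(Clone, Debug, PartialEq)]
pub enum ModelTier { Auto }

#[derive(Clone, Debug)]
pub struct SearchParams {
    pub ef_search: usize,
    pub n_probes: usize,
    pub model_tier: ModelTier,
    pub context_tokens: usize,
    pub temperature: f32,
}

impl Default for SearchParams {
    fn default() -> Self {
        Self {
            ef_search: 64,        // moderate HNSW beam width (assumption)
            n_probes: 8,          // IVF probe count (assumption)
            model_tier: ModelTier::Auto,
            context_tokens: 2048, // matches the default used in Section 4
            temperature: 0.7,     // common sampling default (assumption)
        }
    }
}
```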
### Pattern: Learned Behavior Clusters
```rust
/// A learned pattern extracted from trajectory clusters
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
/// Pattern ID
pub id: u64,
/// Centroid embedding (cluster center)
pub centroid: Vec<f32>,
/// Optimal search parameters for this pattern
pub optimal_params: SearchParams,
/// Confidence score (0-1)
pub confidence: f32,
/// Number of trajectories in cluster
pub support_count: usize,
/// Average precision for pattern
pub avg_precision: f32,
/// Average recall for pattern
pub avg_recall: f32,
/// Average latency
pub avg_latency_us: u64,
/// Pattern creation timestamp
pub created_at: i64,
/// Last update timestamp
pub updated_at: i64,
/// Abstraction level (0 = concrete, higher = more abstract)
pub abstraction_level: u32,
/// Child pattern IDs (for hierarchical patterns)
pub children: Vec<u64>,
}
```
### Verdict: Decision Judgments
```rust
/// Verdict on what parameters worked best
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Verdict {
/// Pattern this verdict applies to
pub pattern_id: u64,
/// Recommended parameters
pub recommended_params: SearchParams,
/// Confidence in recommendation
pub confidence: f32,
/// Evidence supporting this verdict
pub evidence: VerdictEvidence,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct VerdictEvidence {
/// Number of supporting trajectories
pub support_count: usize,
/// Average improvement over default
pub avg_improvement: f32,
/// Statistical significance (p-value)
pub p_value: f32,
/// Consistency score (low variance = high consistency)
pub consistency: f32,
}
```
---
## 3. ReasoningBank Implementation
### Core Storage and Retrieval
```rust
use dashmap::DashMap;
use parking_lot::RwLock;
/// ReasoningBank: Pattern-based learning and optimization
pub struct ReasoningBank {
/// Trajectory ring buffer (recent interactions)
trajectories: RwLock<CircularBuffer<QueryTrajectory>>,
/// Learned patterns (concurrent hashmap)
patterns: DashMap<u64, LearnedPattern>,
/// Pattern index for fast similarity lookup
pattern_index: RwLock<HNSWIndex>,
/// Verdicts per pattern
verdicts: DashMap<u64, Verdict>,
/// Configuration
config: ReasoningBankConfig,
/// Pattern ID counter
next_pattern_id: AtomicU64,
/// Statistics
stats: RwLock<ReasoningBankStats>,
}
impl ReasoningBank {
/// Create new ReasoningBank
pub fn new(config: ReasoningBankConfig) -> Self {
Self {
trajectories: RwLock::new(CircularBuffer::new(config.trajectory_capacity)),
patterns: DashMap::new(),
pattern_index: RwLock::new(HNSWIndex::new(config.embedding_dim, config.ef_construction)),
verdicts: DashMap::new(),
config,
next_pattern_id: AtomicU64::new(0),
stats: RwLock::new(ReasoningBankStats::default()),
}
}
/// Record a new trajectory
#[inline]
pub fn record_trajectory(&self, trajectory: QueryTrajectory) {
let mut trajectories = self.trajectories.write();
trajectories.push(trajectory);
// Update stats
let mut stats = self.stats.write();
stats.total_trajectories += 1;
}
/// Find most similar pattern to query
pub fn find_similar_pattern(&self, query_embedding: &[f32], k: usize) -> Vec<PatternMatch> {
let index = self.pattern_index.read();
let neighbors = index.search(query_embedding, k, self.config.ef_search);
neighbors.iter()
.filter_map(|&(id, distance)| {
self.patterns.get(&id).map(|p| PatternMatch {
pattern: p.clone(),
similarity: 1.0 - distance, // Convert distance to similarity
})
})
.collect()
}
/// Get optimized parameters for query
pub fn get_optimized_params(&self, query_embedding: &[f32]) -> OptimizedParams {
// Find similar patterns
let matches = self.find_similar_pattern(query_embedding, self.config.top_k_patterns);
if matches.is_empty() {
// No matching patterns - use defaults
return OptimizedParams {
params: SearchParams::default(),
confidence: 0.0,
source: ParamSource::Default,
};
}
        // Interpolate parameters via a similarity- and confidence-weighted
        // average. Accumulators start at zero so the default values do not
        // skew the blend.
        let mut weighted_params = SearchParams {
            ef_search: 0,
            n_probes: 0,
            temperature: 0.0,
            ..SearchParams::default()
        };
        let mut total_weight = 0.0f32;
        for m in &matches {
            let weight = m.similarity * m.pattern.confidence;
            total_weight += weight;
            weighted_params.ef_search += (m.pattern.optimal_params.ef_search as f32 * weight) as usize;
            weighted_params.n_probes += (m.pattern.optimal_params.n_probes as f32 * weight) as usize;
            weighted_params.temperature += m.pattern.optimal_params.temperature * weight;
            // ... other params
        }
        if total_weight > 0.0 {
            weighted_params.ef_search = (weighted_params.ef_search as f32 / total_weight) as usize;
            weighted_params.n_probes = (weighted_params.n_probes as f32 / total_weight) as usize;
            weighted_params.temperature /= total_weight;
        } else {
            weighted_params = SearchParams::default(); // every match had zero confidence
        }
OptimizedParams {
params: weighted_params,
confidence: total_weight / matches.len() as f32,
source: ParamSource::Pattern(matches[0].pattern.id),
}
}
/// Record feedback for trajectory
pub fn record_feedback(&self, trajectory_id: u64, feedback: UserFeedback) {
// Find trajectory and update
let mut trajectories = self.trajectories.write();
if let Some(traj) = trajectories.iter_mut().find(|t| t.id == trajectory_id) {
traj.feedback = Some(feedback.clone());
}
// Update related pattern confidence
// Higher feedback = higher confidence in that pattern's params
if let Some(pattern_id) = self.find_pattern_for_trajectory(trajectory_id) {
if let Some(mut pattern) = self.patterns.get_mut(&pattern_id) {
let feedback_delta = feedback.rating as f32 / 5.0 - 0.5; // -0.5 to +0.5
pattern.confidence = (pattern.confidence + 0.1 * feedback_delta).clamp(0.0, 1.0);
}
}
}
}
```
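The trajectory store above is declared as a `CircularBuffer<QueryTrajectory>` but that type is not defined in this document. A minimal fixed-capacity ring buffer satisfying the operations used here (`push`, `iter_mut`, `len`) might look like:

```rust
use std::collections::VecDeque;

/// Minimal fixed-capacity ring buffer backing the trajectory store.
/// A sketch: the real type only needs push-with-eviction plus iteration.
pub struct CircularBuffer<T> {
    buf: VecDeque<T>,
    capacity: usize,
}

impl<T> CircularBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push a new item, evicting the oldest when at capacity.
    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front();
        }
        self.buf.push_back(item);
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        self.buf.iter_mut()
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }
}
```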
---
## 4. Pattern Extraction
### K-Means++ Clustering for Pattern Discovery
```rust
/// Pattern extractor using K-means++ clustering
pub struct PatternExtractor {
/// Number of clusters to extract
k: usize,
/// Maximum iterations
max_iter: usize,
/// Convergence threshold
epsilon: f32,
}
impl PatternExtractor {
/// Extract patterns from trajectories
pub fn extract(&self, trajectories: &[QueryTrajectory]) -> Vec<LearnedPattern> {
if trajectories.len() < self.k {
return Vec::new();
}
// Collect embeddings
let embeddings: Vec<&[f32]> = trajectories.iter()
.map(|t| t.query_embedding.as_slice())
.collect();
// K-means++ initialization
let mut centroids = self.kmeans_plus_plus_init(&embeddings);
// K-means iteration
let mut assignments = vec![0usize; trajectories.len()];
for _ in 0..self.max_iter {
// Assignment step
let old_assignments = assignments.clone();
for (i, emb) in embeddings.iter().enumerate() {
let mut min_dist = f32::MAX;
let mut min_idx = 0;
for (c_idx, centroid) in centroids.iter().enumerate() {
let dist = euclidean_distance(emb, centroid);
if dist < min_dist {
min_dist = dist;
min_idx = c_idx;
}
}
assignments[i] = min_idx;
}
// Check convergence
if assignments == old_assignments {
break;
}
// Update step
centroids = self.compute_centroids(&embeddings, &assignments);
}
// Create patterns from clusters
let mut patterns = Vec::new();
for cluster_id in 0..self.k {
let cluster_trajectories: Vec<_> = trajectories.iter()
.zip(assignments.iter())
.filter(|(_, &a)| a == cluster_id)
.map(|(t, _)| t)
.collect();
if cluster_trajectories.len() < 3 {
continue; // Skip small clusters
}
let pattern = self.create_pattern_from_cluster(
cluster_id as u64,
&centroids[cluster_id],
&cluster_trajectories,
);
patterns.push(pattern);
}
patterns
}
fn kmeans_plus_plus_init(&self, embeddings: &[&[f32]]) -> Vec<Vec<f32>> {
let mut centroids = Vec::with_capacity(self.k);
let mut rng = rand::thread_rng();
// First centroid: random
let first_idx = rng.gen_range(0..embeddings.len());
centroids.push(embeddings[first_idx].to_vec());
// Remaining centroids: D² weighting
for _ in 1..self.k {
            let distances: Vec<f32> = embeddings.iter()
.map(|emb| {
centroids.iter()
.map(|c| euclidean_distance(emb, c))
.fold(f32::MAX, f32::min)
})
.collect();
// Square distances for D² sampling
let total: f32 = distances.iter().map(|d| d * d).sum();
let threshold = rng.gen::<f32>() * total;
let mut cumsum = 0.0;
let mut selected = 0;
for (i, d) in distances.iter().enumerate() {
cumsum += d * d;
if cumsum >= threshold {
selected = i;
break;
}
}
centroids.push(embeddings[selected].to_vec());
}
centroids
}
fn create_pattern_from_cluster(
&self,
id: u64,
centroid: &[f32],
trajectories: &[&QueryTrajectory],
) -> LearnedPattern {
// Compute optimal params as weighted average by quality
let mut total_weight = 0.0f32;
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut precision_sum = 0.0f32;
let mut recall_sum = 0.0f32;
let mut latency_sum = 0u64;
for t in trajectories {
let weight = t.precision * t.recall; // Quality as weight
total_weight += weight;
ef_sum += t.search_params.ef_search as f32 * weight;
probes_sum += t.search_params.n_probes as f32 * weight;
temp_sum += t.search_params.temperature * weight;
precision_sum += t.precision;
recall_sum += t.recall;
latency_sum += t.latency_us;
}
        let n = trajectories.len() as f32;
        let total_weight = total_weight.max(1e-6); // guard: all-zero quality weights
LearnedPattern {
id,
centroid: centroid.to_vec(),
optimal_params: SearchParams {
ef_search: (ef_sum / total_weight).round() as usize,
n_probes: (probes_sum / total_weight).round() as usize,
model_tier: ModelTier::Auto, // Determined separately
context_tokens: 2048, // Default
temperature: temp_sum / total_weight,
},
confidence: (total_weight / n).clamp(0.0, 1.0),
support_count: trajectories.len(),
avg_precision: precision_sum / n,
avg_recall: recall_sum / n,
avg_latency_us: latency_sum / trajectories.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: 0,
children: Vec::new(),
}
}
}
```
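The extractor assumes a few geometry helpers (`euclidean_distance`, `cosine_similarity`, and the centroid-update step). Minimal versions are sketched below; `compute_centroids` is written in free-function form with explicit `k` and `dim`, whereas the method above derives both from `self.k` and the embeddings:

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Cosine similarity, used by pattern consolidation and dream evaluation.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-12) // epsilon avoids division by zero
}

/// K-means update step: each centroid becomes the mean of its
/// assigned embeddings (empty clusters keep a zero centroid).
fn compute_centroids(
    embeddings: &[&[f32]],
    assignments: &[usize],
    k: usize,
    dim: usize,
) -> Vec<Vec<f32>> {
    let mut centroids = vec![vec![0.0f32; dim]; k];
    let mut counts = vec![0usize; k];
    for (emb, &c) in embeddings.iter().zip(assignments) {
        counts[c] += 1;
        for (acc, v) in centroids[c].iter_mut().zip(emb.iter()) {
            *acc += *v;
        }
    }
    for (centroid, &n) in centroids.iter_mut().zip(&counts) {
        if n > 0 {
            for v in centroid.iter_mut() {
                *v /= n as f32;
            }
        }
    }
    centroids
}
```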
---
## 5. Verdict Judgment System
### Evaluating What Works Best
```rust
/// Verdict judge for parameter optimization
pub struct VerdictJudge {
/// Minimum samples for statistical significance
min_samples: usize,
/// Significance level (p-value threshold)
alpha: f32,
}
impl VerdictJudge {
/// Judge optimal parameters for a pattern
pub fn judge(&self, pattern: &LearnedPattern, trajectories: &[&QueryTrajectory]) -> Option<Verdict> {
if trajectories.len() < self.min_samples {
return None; // Not enough evidence
}
// Group trajectories by parameter configuration
let mut param_groups: HashMap<ParamKey, Vec<&QueryTrajectory>> = HashMap::new();
for t in trajectories {
let key = ParamKey::from(&t.search_params);
param_groups.entry(key).or_default().push(t);
}
// Find best performing configuration
let mut best_config: Option<(ParamKey, f32, Vec<&QueryTrajectory>)> = None;
for (key, group) in &param_groups {
if group.len() < 3 {
continue;
}
// Compute quality score (F1 of precision and recall)
let avg_quality: f32 = group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.sum::<f32>() / group.len() as f32;
match &best_config {
None => best_config = Some((key.clone(), avg_quality, group.clone())),
Some((_, best_quality, _)) if avg_quality > *best_quality => {
best_config = Some((key.clone(), avg_quality, group.clone()));
}
_ => {}
}
}
let (best_key, best_quality, best_group) = best_config?;
// Statistical significance test
let p_value = self.compute_significance(&best_group, trajectories);
if p_value > self.alpha {
return None; // Not significant
}
// Compute consistency (inverse of coefficient of variation)
let qualities: Vec<f32> = best_group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.collect();
let mean = qualities.iter().sum::<f32>() / qualities.len() as f32;
let variance = qualities.iter()
.map(|q| (q - mean).powi(2))
.sum::<f32>() / qualities.len() as f32;
let std_dev = variance.sqrt();
        let consistency = 1.0 / (1.0 + std_dev / mean.max(1e-6));
        // Compute relative improvement over the default configuration
        let default_quality = self.compute_default_quality(trajectories).max(1e-6);
        let improvement = (best_quality - default_quality) / default_quality;
Some(Verdict {
pattern_id: pattern.id,
recommended_params: best_key.to_params(),
confidence: best_quality * consistency,
evidence: VerdictEvidence {
support_count: best_group.len(),
avg_improvement: improvement,
p_value,
consistency,
},
})
}
fn compute_significance(&self, best: &[&QueryTrajectory], all: &[&QueryTrajectory]) -> f32 {
// Welch's t-test for comparing means
let best_qualities: Vec<f32> = best.iter()
.map(|t| t.precision * t.recall)
.collect();
let all_qualities: Vec<f32> = all.iter()
.map(|t| t.precision * t.recall)
.collect();
welch_t_test(&best_qualities, &all_qualities)
}
fn compute_default_quality(&self, trajectories: &[&QueryTrajectory]) -> f32 {
// Assume first configuration or most common is "default"
let default_group: Vec<_> = trajectories.iter()
.filter(|t| t.search_params.ef_search == SearchParams::default().ef_search)
.collect();
if default_group.is_empty() {
0.5 // Baseline assumption
} else {
default_group.iter()
.map(|t| t.precision * t.recall)
.sum::<f32>() / default_group.len() as f32
}
}
}
```
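The significance test calls a `welch_t_test` helper that is not defined in this document. A self-contained sketch follows: the t statistic is exact, but the p-value uses a normal approximation to the t distribution (reasonable at the sample counts gated by `min_samples`; a full implementation would use the t CDF with Welch-Satterthwaite degrees of freedom):

```rust
/// Welch's t-test between two samples, returning an approximate
/// two-sided p-value under a normal approximation.
fn welch_t_test(a: &[f32], b: &[f32]) -> f32 {
    if a.len() < 2 || b.len() < 2 {
        return 1.0; // not enough data to claim significance
    }
    let mean = |x: &[f32]| x.iter().sum::<f32>() / x.len() as f32;
    let var = |x: &[f32], m: f32| {
        x.iter().map(|v| (v - m).powi(2)).sum::<f32>() / (x.len() - 1) as f32
    };
    let (ma, mb) = (mean(a), mean(b));
    let se = (var(a, ma) / a.len() as f32 + var(b, mb) / b.len() as f32).sqrt();
    if se == 0.0 {
        return 1.0; // zero variance: groups are indistinguishable
    }
    let t = (ma - mb).abs() / se;
    2.0 * (1.0 - normal_cdf(t))
}

/// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation.
fn normal_cdf(z: f32) -> f32 {
    let x = z / std::f32::consts::SQRT_2;
    let t = 1.0 / (1.0 + 0.3275911 * x.abs());
    let poly = t * (0.254829592
        + t * (-0.284496736
        + t * (1.421413741
        + t * (-1.453152027 + t * 1.061405429))));
    let erf = (1.0 - poly * (-x * x).exp()) * x.signum();
    0.5 * (1.0 + erf)
}
```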
---
## 6. Integration with Router
### Using ReasoningBank to Optimize Router Decisions
```rust
impl FastGRNNRouter {
/// Forward pass with ReasoningBank optimization
pub fn forward_with_reasoning(
&self,
features: &[f32],
reasoning_bank: &ReasoningBank,
) -> RouterDecision {
// Get pattern-based parameter suggestions
let pattern_params = reasoning_bank.get_optimized_params(features);
// Standard router forward
let mut decision = self.forward(features);
// Blend router decision with pattern suggestions
if pattern_params.confidence > 0.5 {
let blend_factor = pattern_params.confidence * 0.3; // Max 30% influence
// Interpolate temperature
decision.temperature = (1.0 - blend_factor) * decision.temperature
+ blend_factor * pattern_params.params.temperature;
// Context token suggestion influences context selection
let suggested_context = pattern_params.params.context_tokens;
let router_context = decision.context_tokens;
decision.context_tokens = ((1.0 - blend_factor) * router_context as f32
+ blend_factor * suggested_context as f32) as usize;
decision.reasoning_confidence = pattern_params.confidence;
decision.reasoning_pattern_id = pattern_params.source.pattern_id();
}
decision
}
}
```
---
## 7. Pattern Consolidation and Pruning
### Managing Pattern Memory
```rust
impl ReasoningBank {
/// Consolidate similar patterns
pub fn consolidate_patterns(&mut self) {
// Find similar pattern pairs
let pattern_ids: Vec<u64> = self.patterns.iter()
.map(|p| *p.key())
.collect();
let mut to_merge: Vec<(u64, u64)> = Vec::new();
for i in 0..pattern_ids.len() {
for j in (i+1)..pattern_ids.len() {
let p1 = self.patterns.get(&pattern_ids[i]).unwrap();
let p2 = self.patterns.get(&pattern_ids[j]).unwrap();
let similarity = cosine_similarity(&p1.centroid, &p2.centroid);
if similarity > 0.95 {
// Very similar - merge
to_merge.push((pattern_ids[i], pattern_ids[j]));
}
}
}
// Merge patterns
        for (keep_id, remove_id) in to_merge {
            // Clone the pattern being merged away first: holding a `get_mut`
            // and a `get` guard on the same DashMap at once risks deadlock,
            // and zipping `keep.centroid.iter_mut()` with `keep.centroid.iter()`
            // would not borrow-check.
            let removed = match self.patterns.get(&remove_id) {
                Some(p) => p.clone(),
                None => continue, // already merged away in an earlier pair
            };
            if let Some(mut keep) = self.patterns.get_mut(&keep_id) {
                // Support-weighted average of centroids
                let total_support = keep.support_count + removed.support_count;
                let w1 = keep.support_count as f32 / total_support as f32;
                let w2 = removed.support_count as f32 / total_support as f32;
                for (c1, c2) in keep.centroid.iter_mut().zip(removed.centroid.iter()) {
                    *c1 = w1 * *c1 + w2 * c2;
                }
                // Update support count and confidence
                keep.support_count = total_support;
                keep.confidence = (keep.confidence * w1 + removed.confidence * w2).min(1.0);
                keep.updated_at = chrono::Utc::now().timestamp();
            }
            // Remove merged pattern
            self.patterns.remove(&remove_id);
        }
}
/// Prune low-confidence patterns
pub fn prune_patterns(&mut self, min_confidence: f32, min_support: usize) {
let to_remove: Vec<u64> = self.patterns.iter()
.filter(|p| p.confidence < min_confidence || p.support_count < min_support)
.map(|p| *p.key())
.collect();
for id in to_remove {
self.patterns.remove(&id);
self.verdicts.remove(&id);
}
}
/// Build pattern hierarchy (abstraction levels)
pub fn build_hierarchy(&mut self) {
// Hierarchical clustering on existing patterns
let patterns: Vec<_> = self.patterns.iter()
            .map(|p| (*p.key(), p.centroid.clone()))
.collect();
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.fit(&patterns);
// Create meta-patterns at each level
for level in 1..=3 {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 1 {
let child_ids: Vec<u64> = cluster.member_ids();
let meta_centroid = cluster.centroid();
// Average params from children
let children: Vec<_> = child_ids.iter()
.filter_map(|id| self.patterns.get(id))
.collect();
let meta_params = self.average_params(&children);
let meta_pattern = LearnedPattern {
id: self.next_pattern_id.fetch_add(1, Ordering::SeqCst),
centroid: meta_centroid,
optimal_params: meta_params,
confidence: children.iter().map(|c| c.confidence).sum::<f32>() / children.len() as f32,
support_count: children.iter().map(|c| c.support_count).sum(),
avg_precision: children.iter().map(|c| c.avg_precision).sum::<f32>() / children.len() as f32,
avg_recall: children.iter().map(|c| c.avg_recall).sum::<f32>() / children.len() as f32,
avg_latency_us: children.iter().map(|c| c.avg_latency_us).sum::<u64>() / children.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: level as u32,
children: child_ids,
};
self.patterns.insert(meta_pattern.id, meta_pattern);
}
}
}
}
}
```
---
## 8. Statistics and Monitoring
```rust
#[derive(Default, Debug)]
pub struct ReasoningBankStats {
/// Total trajectories recorded
pub total_trajectories: u64,
/// Total patterns stored
pub total_patterns: usize,
/// Total verdicts issued
pub total_verdicts: usize,
/// Pattern match hit rate
pub pattern_hit_rate: f32,
/// Average confidence in recommendations
pub avg_recommendation_confidence: f32,
/// Improvement from pattern optimization
pub avg_improvement_percent: f32,
}
impl ReasoningBank {
/// Get current statistics
pub fn stats(&self) -> ReasoningBankStats {
let stats = self.stats.read();
ReasoningBankStats {
total_trajectories: stats.total_trajectories,
total_patterns: self.patterns.len(),
total_verdicts: self.verdicts.len(),
pattern_hit_rate: stats.pattern_hit_rate,
avg_recommendation_confidence: stats.avg_recommendation_confidence,
avg_improvement_percent: stats.avg_improvement_percent,
}
}
/// Export all patterns for persistence
pub fn export(&self) -> ReasoningBankExport {
ReasoningBankExport {
patterns: self.patterns.iter()
.map(|p| p.value().clone())
.collect(),
verdicts: self.verdicts.iter()
.map(|v| v.value().clone())
.collect(),
}
}
/// Import patterns from persistence
pub fn import(&mut self, export: ReasoningBankExport) {
for pattern in export.patterns {
let id = pattern.id;
self.patterns.insert(id, pattern.clone());
self.pattern_index.write().insert(id, &pattern.centroid);
}
for verdict in export.verdicts {
self.verdicts.insert(verdict.pattern_id, verdict);
}
}
}
```
---
## Summary
ReasoningBank enables SONA to:
1. **Learn from every query** through trajectory recording
2. **Discover patterns** via K-means++ clustering
3. **Judge what works** through statistical verdict analysis
4. **Optimize future decisions** by interpolating from similar patterns
5. **Build abstractions** through hierarchical pattern consolidation
This creates a continuously improving system where past experience directly enhances future performance.
---
# SONA Memory Dreams: Offline Consolidation Engine
## Creativity Through Neural Replay and Recombination
---
## 1. Biological Inspiration
### Why Dreams Matter for Learning
```
HUMAN SLEEP-BASED LEARNING
══════════════════════════
Awake: Sleep (REM): Next Day:
───────────────── ───────────────── ─────────────────
• New experiences • Replay memories • Consolidated knowledge
• Pattern matching • Recombine ideas • Novel insights
• Working memory • Strengthen important • Creative connections
• Prune unimportant
```
Research shows that:
- **Memory consolidation** happens during sleep
- **Creative insights** emerge from random memory replay
- **Neural pruning** removes low-value connections
- **Analogical reasoning** connects distant concepts
SONA's Dream Engine replicates these mechanisms for AI self-improvement.
---
## 2. Dream Engine Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ DREAM ENGINE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ MEMORY GRAPH │──────┐ │
│ └───────────────┘ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM GENERATOR │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Random │ │Weighted │ │ │
│ │ │ Walks │ │ Sampling│ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Dream Sequence │ │ │
│ │ │ [M₁→M₂→M₃→...→Mₙ] │ │ │
│ │ └──────────┬───────────┘ │ │
│ └─────────────┼───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM EVALUATOR │ │
│ │ │ │
│ │ • Novelty Score (new connections?) │ │
│ │ • Coherence Score (makes sense?) │ │
│ │ • Utility Score (useful insight?) │ │
│ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM INTEGRATOR │ │
│ │ │ │
│ │ • Add weak creative edges │ │
│ │ • Update pattern associations │ │
│ │ • Generate novel hypotheses │ │
│ └─────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Dream Generation
### Random Walk Memory Replay
```rust
/// Dream generator using random walks on memory graph
pub struct DreamGenerator {
/// Temperature for random walk (higher = more random)
temperature: f32,
/// Maximum dream length
max_length: usize,
/// Minimum coherence threshold
min_coherence: f32,
/// Creativity bias (prefer novel connections)
creativity_bias: f32,
}
impl DreamGenerator {
/// Generate a single dream sequence
pub fn generate_dream(
&self,
memory: &MemoryGraph,
start_node: Option<NodeId>,
) -> Dream {
let mut sequence = Vec::new();
let mut visited = HashSet::new();
// Start from random high-activation node if not specified
let current = start_node.unwrap_or_else(|| {
memory.sample_by_activation()
});
sequence.push(current);
visited.insert(current);
// Random walk with creativity-weighted transitions
for _ in 0..self.max_length {
let neighbors = memory.get_neighbors(current);
if neighbors.is_empty() {
break;
}
// Compute transition probabilities
let probs: Vec<f32> = neighbors.iter()
.map(|&(neighbor, edge_weight)| {
let novelty_bonus = if visited.contains(&neighbor) {
0.1 // Discourage revisits
} else {
1.0 + self.creativity_bias * (1.0 - memory.get_access_frequency(neighbor))
};
(edge_weight * novelty_bonus).powf(1.0 / self.temperature)
})
.collect();
// Sample next node
let next = sample_weighted(&neighbors, &probs);
if let Some((next_node, _)) = next {
sequence.push(next_node);
visited.insert(next_node);
} else {
break;
}
}
Dream {
sequence,
temperature: self.temperature,
timestamp: chrono::Utc::now().timestamp(),
}
}
/// Generate creative jump dream (non-local connections)
pub fn generate_creative_dream(
&self,
memory: &MemoryGraph,
num_jumps: usize,
) -> Dream {
let mut sequence = Vec::new();
// Sample diverse starting points
let anchors = memory.sample_diverse(num_jumps, 0.3);
for anchor in anchors {
sequence.push(anchor);
// Short local walk from each anchor
let local_walk = self.generate_dream(memory, Some(anchor));
            sequence.extend(local_walk.sequence.iter().skip(1).take(3).copied()); // NodeId is Copy
}
Dream {
sequence,
temperature: self.temperature * 2.0, // Higher temperature for creative dreams
timestamp: chrono::Utc::now().timestamp(),
}
}
}
/// A dream sequence
pub struct Dream {
/// Sequence of visited memory nodes
pub sequence: Vec<NodeId>,
/// Temperature used for generation
pub temperature: f32,
/// Generation timestamp
pub timestamp: i64,
}
```
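The walk above relies on a `sample_weighted` helper that is not defined here. A deterministic sketch follows; for testability it threads the uniform draw `u ∈ [0,1)` in explicitly, whereas the call site above would draw it internally (e.g. via `rng.gen::<f32>()`):

```rust
/// Sample one item proportionally to `weights`, given a uniform draw
/// u in [0,1). Returns None for empty input or non-positive total weight.
fn sample_weighted<T: Copy>(items: &[T], weights: &[f32], u: f32) -> Option<T> {
    let total: f32 = weights.iter().sum();
    if items.is_empty() || total <= 0.0 {
        return None;
    }
    // Walk the cumulative distribution until the threshold is crossed.
    let mut threshold = u * total;
    for (item, &w) in items.iter().zip(weights) {
        threshold -= w;
        if threshold < 0.0 {
            return Some(*item);
        }
    }
    items.last().copied() // u at the upper edge of the range
}
```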
---
## 4. Dream Evaluation
### Measuring Dream Quality
```rust
/// Evaluator for dream quality
pub struct DreamEvaluator {
/// Memory graph reference
memory: Arc<MemoryGraph>,
/// Novelty detection threshold
novelty_threshold: f32,
}
impl DreamEvaluator {
/// Evaluate dream quality across multiple dimensions
pub fn evaluate(&self, dream: &Dream) -> DreamQuality {
DreamQuality {
novelty: self.compute_novelty(dream),
coherence: self.compute_coherence(dream),
utility: self.compute_utility(dream),
diversity: self.compute_diversity(dream),
}
}
/// Novelty: How many new connections are suggested?
fn compute_novelty(&self, dream: &Dream) -> f32 {
let mut novel_pairs = 0;
let mut total_pairs = 0;
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len() {
total_pairs += 1;
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
// Check if edge exists
if !self.memory.has_edge(node_a, node_b) {
// Check semantic similarity
let emb_a = self.memory.get_embedding(node_a);
let emb_b = self.memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
// Novel = no edge but moderate similarity
if sim > 0.3 && sim < 0.8 {
novel_pairs += 1;
}
}
}
}
novel_pairs as f32 / total_pairs.max(1) as f32
}
/// Coherence: Does the dream sequence make semantic sense?
fn compute_coherence(&self, dream: &Dream) -> f32 {
if dream.sequence.len() < 2 {
return 1.0;
}
let mut coherence_sum = 0.0f32;
for window in dream.sequence.windows(2) {
let emb_a = self.memory.get_embedding(window[0]);
let emb_b = self.memory.get_embedding(window[1]);
coherence_sum += cosine_similarity(&emb_a, &emb_b);
}
coherence_sum / (dream.sequence.len() - 1) as f32
}
/// Utility: Are the suggested connections potentially useful?
fn compute_utility(&self, dream: &Dream) -> f32 {
// Based on node quality scores and access patterns
let avg_quality: f32 = dream.sequence.iter()
.map(|&id| self.memory.get_node_quality(id))
.sum::<f32>() / dream.sequence.len() as f32;
// Higher utility if connecting high-quality nodes
avg_quality
}
/// Diversity: How diverse are the visited nodes?
fn compute_diversity(&self, dream: &Dream) -> f32 {
// Average pairwise distance in embedding space
let embeddings: Vec<_> = dream.sequence.iter()
.map(|&id| self.memory.get_embedding(id))
.collect();
let mut total_dist = 0.0f32;
let mut count = 0;
for i in 0..embeddings.len() {
for j in (i+1)..embeddings.len() {
total_dist += 1.0 - cosine_similarity(&embeddings[i], &embeddings[j]);
count += 1;
}
}
total_dist / count.max(1) as f32
}
}
#[derive(Debug, Clone)]
pub struct DreamQuality {
/// How many novel connections suggested (0-1)
pub novelty: f32,
/// How semantically coherent (0-1)
pub coherence: f32,
/// How useful the connections might be (0-1)
pub utility: f32,
/// How diverse the dream content (0-1)
pub diversity: f32,
}
impl DreamQuality {
/// Overall quality score
pub fn overall(&self) -> f32 {
// Weighted combination favoring novelty and coherence
0.4 * self.novelty + 0.3 * self.coherence + 0.2 * self.utility + 0.1 * self.diversity
}
/// Is this dream worth integrating?
pub fn is_valuable(&self, threshold: f32) -> bool {
self.novelty > 0.3 && self.coherence > 0.4 && self.overall() > threshold
}
}
```
---
## 5. Dream Integration
### Applying Dream Insights to Memory
```rust
/// Integrates valuable dreams into memory graph
pub struct DreamIntegrator {
/// Memory graph to update
memory: Arc<RwLock<MemoryGraph>>,
/// Strength of new creative edges
creative_edge_strength: f32,
/// Decay factor for dream-derived edges
dream_edge_decay: f32,
}
impl DreamIntegrator {
/// Integrate a valuable dream into memory
pub fn integrate(&self, dream: &Dream, quality: &DreamQuality) -> IntegrationResult {
let mut result = IntegrationResult::default();
if !quality.is_valuable(0.5) {
return result; // Skip low-quality dreams
}
let mut memory = self.memory.write();
// Extract novel connections from dream
let novel_connections = self.extract_novel_connections(dream, &memory);
for (node_a, node_b, strength) in novel_connections {
// Add weak creative edge
let edge_strength = self.creative_edge_strength * strength * quality.overall();
memory.add_edge(
node_a,
node_b,
EdgeType::Creative,
edge_strength,
);
result.edges_added += 1;
}
// Update node associations based on dream co-occurrence
for window in dream.sequence.windows(3) {
memory.update_association(window[0], window[2], 0.01);
}
result.dream_quality = quality.overall();
result
}
fn extract_novel_connections(
&self,
dream: &Dream,
memory: &MemoryGraph,
) -> Vec<(NodeId, NodeId, f32)> {
let mut connections = Vec::new();
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len().min(i+5) { // Only nearby in sequence
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
if !memory.has_edge(node_a, node_b) {
let emb_a = memory.get_embedding(node_a);
let emb_b = memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
if sim > 0.3 {
// Connection strength based on similarity and sequence proximity
let proximity_factor = 1.0 / (j - i) as f32;
let strength = sim * proximity_factor;
connections.push((node_a, node_b, strength));
}
}
}
}
connections
}
}
#[derive(Default)]
pub struct IntegrationResult {
pub edges_added: usize,
pub associations_updated: usize,
pub dream_quality: f32,
}
```
---
## 6. Memory Consolidation
### Strengthening Important Memories
```rust
/// Consolidation engine for memory pruning and strengthening
pub struct ConsolidationEngine {
/// Memory graph reference
memory: Arc<RwLock<MemoryGraph>>,
/// Minimum access frequency for retention
min_access_frequency: f32,
/// Age decay factor (older = more decay)
age_decay: f32,
/// Quality threshold for preservation
quality_threshold: f32,
}
impl ConsolidationEngine {
/// Run full consolidation pass
pub fn consolidate(&self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Phase 1: Identify memories by value
let (high_value, medium_value, low_value) = self.categorize_memories();
report.high_value_count = high_value.len();
report.medium_value_count = medium_value.len();
report.low_value_count = low_value.len();
// Phase 2: Strengthen high-value memories
for &node_id in &high_value {
self.strengthen_memory(node_id);
report.memories_strengthened += 1;
}
// Phase 3: Decay low-value memories
for &node_id in &low_value {
let retained = self.decay_memory(node_id);
if retained {
report.memories_decayed += 1;
} else {
report.memories_removed += 1;
}
}
// Phase 4: Prune weak edges
let pruned = self.prune_weak_edges();
report.edges_pruned = pruned;
// Phase 5: Merge similar memories
let merged = self.merge_similar_memories();
report.memories_merged = merged;
report
}
    fn categorize_memories(&self) -> (Vec<NodeId>, Vec<NodeId>, Vec<NodeId>) {
        let memory = self.memory.read();
        let mut high = Vec::new();
        let mut medium = Vec::new();
        let mut low = Vec::new();
        for node in memory.iter_nodes() {
            let value_score = self.compute_value_score(&memory, node);
            if value_score > 0.7 {
                high.push(node.id);
            } else if value_score > 0.3 {
                medium.push(node.id);
            } else {
                low.push(node.id);
            }
        }
        (high, medium, low)
    }
    /// Takes the already-held read guard so we never re-acquire the lock mid-iteration
    fn compute_value_score(&self, memory: &MemoryGraph, node: &MemoryNode) -> f32 {
// Factors:
// 1. Access frequency (more access = more valuable)
let freq_score = (node.access_count as f32 / 100.0).min(1.0);
// 2. Recency (recent = more valuable)
let age_days = (chrono::Utc::now().timestamp() - node.last_accessed) / 86400;
let recency_score = (-self.age_decay * age_days as f32).exp();
// 3. Quality (explicit quality score)
let quality_score = node.quality_score;
// 4. Connectivity (well-connected = more valuable)
let degree = memory.node_degree(node.id);
let connectivity_score = (degree as f32 / 10.0).min(1.0);
// Weighted combination
0.3 * freq_score + 0.2 * recency_score + 0.3 * quality_score + 0.2 * connectivity_score
}
    fn strengthen_memory(&self, node_id: NodeId) {
        let mut memory = self.memory.write();
        // Collect edge sources first so the mutable updates below don't alias the edge iterator
        let sources: Vec<_> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
        // Increase edge weights to this node
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(1.1));
        }
// Mark as consolidated
if let Some(node) = memory.get_node_mut(node_id) {
node.consolidation_count += 1;
node.last_consolidated = chrono::Utc::now().timestamp();
}
}
    fn decay_memory(&self, node_id: NodeId) -> bool {
        let mut memory = self.memory.write();
        // Collect edge sources first so the mutable updates below don't alias the edge iterator
        let sources: Vec<_> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
        // Reduce edge weights
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(0.5));
        }
// Check if node should be removed entirely
let total_incoming_weight: f32 = memory.get_edges_to(node_id)
.iter()
.map(|e| e.weight)
.sum();
if total_incoming_weight < 0.01 {
// Remove isolated or nearly-isolated node
memory.remove_node(node_id);
false // Not retained
} else {
true // Retained but weakened
}
}
fn prune_weak_edges(&self) -> usize {
let mut memory = self.memory.write();
let weak_edges: Vec<_> = memory.iter_edges()
.filter(|e| e.weight < 0.01)
.map(|e| e.id)
.collect();
for edge_id in &weak_edges {
memory.remove_edge(*edge_id);
}
weak_edges.len()
}
    fn merge_similar_memories(&self) -> usize {
        let mut memory = self.memory.write();
        let mut merged_count = 0;
        // Snapshot ids and embeddings so merging doesn't invalidate the node iterator
        let nodes: Vec<(NodeId, Vec<f32>)> = memory.iter_nodes()
            .map(|n| (n.id, n.embedding.clone()))
            .collect();
        let mut absorbed = vec![false; nodes.len()];
        // Find highly similar node pairs (O(n^2); acceptable at weekly cadence)
        for i in 0..nodes.len() {
            if absorbed[i] { continue; }
            for j in (i + 1)..nodes.len() {
                if absorbed[j] { continue; }
                let sim = cosine_similarity(&nodes[i].1, &nodes[j].1);
                if sim > 0.98 {
                    // Merge j into i; mark j so it is never merged twice
                    memory.merge_nodes(nodes[i].0, nodes[j].0);
                    absorbed[j] = true;
                    merged_count += 1;
                }
            }
        }
        merged_count
    }
}
#[derive(Default)]
pub struct ConsolidationReport {
pub high_value_count: usize,
pub medium_value_count: usize,
pub low_value_count: usize,
pub memories_strengthened: usize,
pub memories_decayed: usize,
pub memories_removed: usize,
pub memories_merged: usize,
pub edges_pruned: usize,
}
```
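The weighted combination in `compute_value_score` is easy to exercise with toy numbers. A standalone sketch using the same weights and caps (the factor helpers are inlined here, and the `age_decay` of 0.1/day is an illustrative assumption):

```rust
/// Standalone value score: 0.3*freq + 0.2*recency + 0.3*quality + 0.2*connectivity,
/// with frequency capped at 100 accesses and connectivity at degree 10.
fn value_score(access_count: u32, age_days: f32, age_decay: f32, quality: f32, degree: usize) -> f32 {
    let freq_score = (access_count as f32 / 100.0).min(1.0);
    let recency_score = (-age_decay * age_days).exp();
    let connectivity_score = (degree as f32 / 10.0).min(1.0);
    0.3 * freq_score + 0.2 * recency_score + 0.3 * quality + 0.2 * connectivity_score
}
```

A fresh, heavily accessed, high-quality, well-connected node saturates at 1.0; a stale, unused, isolated node decays toward 0, landing it in the low-value bucket for pruning.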
---
## 7. Full Dream Cycle
### Orchestrating the Dream Process
```rust
/// Complete dream cycle orchestrator
pub struct DreamCycle {
    /// Memory graph reference (used by `generate_dreams` below)
    memory: Arc<RwLock<MemoryGraph>>,
    generator: DreamGenerator,
    evaluator: DreamEvaluator,
    integrator: DreamIntegrator,
    consolidator: ConsolidationEngine,
    config: DreamCycleConfig,
}
impl DreamCycle {
/// Run complete dream cycle (weekly maintenance)
pub async fn run(&self) -> DreamCycleReport {
let start = Instant::now();
let mut report = DreamCycleReport::default();
// Phase 1: Generate dreams
tracing::info!("Starting dream generation phase");
let dreams = self.generate_dreams();
report.dreams_generated = dreams.len();
// Phase 2: Evaluate dreams
tracing::info!("Evaluating {} dreams", dreams.len());
let evaluated: Vec<_> = dreams.iter()
.map(|d| (d, self.evaluator.evaluate(d)))
.collect();
// Phase 3: Integrate valuable dreams
tracing::info!("Integrating valuable dreams");
for (dream, quality) in &evaluated {
if quality.is_valuable(self.config.dream_threshold) {
let result = self.integrator.integrate(dream, quality);
report.edges_added += result.edges_added;
report.dreams_integrated += 1;
}
}
// Phase 4: Memory consolidation
tracing::info!("Running memory consolidation");
report.consolidation = self.consolidator.consolidate();
report.elapsed_ms = start.elapsed().as_millis() as u64;
report.timestamp = chrono::Utc::now().timestamp();
tracing::info!(
dreams = report.dreams_generated,
integrated = report.dreams_integrated,
edges = report.edges_added,
elapsed_ms = report.elapsed_ms,
"Dream cycle completed"
);
report
}
fn generate_dreams(&self) -> Vec<Dream> {
let mut dreams = Vec::new();
// Regular random walk dreams
for _ in 0..self.config.num_regular_dreams {
let dream = self.generator.generate_dream(&self.memory, None);
dreams.push(dream);
}
// Creative jump dreams
for _ in 0..self.config.num_creative_dreams {
let dream = self.generator.generate_creative_dream(
&self.memory,
self.config.creative_jump_count,
);
dreams.push(dream);
}
dreams
}
}
#[derive(Default)]
pub struct DreamCycleReport {
pub dreams_generated: usize,
pub dreams_integrated: usize,
pub edges_added: usize,
pub consolidation: ConsolidationReport,
pub elapsed_ms: u64,
pub timestamp: i64,
}
```
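`DreamCycleConfig` is referenced above but not defined in this section. A plausible shape, reconstructed from its usages in `run` and `generate_dreams`; the default values are illustrative assumptions, not normative (the 40/10 split is chosen only to sum to the 50 dreams per cycle cited in the deep-loop breakdown):

```rust
/// Hypothetical configuration for the weekly dream cycle.
pub struct DreamCycleConfig {
    /// Random-walk dreams per cycle
    pub num_regular_dreams: usize,
    /// High-temperature creative dreams per cycle
    pub num_creative_dreams: usize,
    /// Long-range jumps allowed in each creative dream
    pub creative_jump_count: usize,
    /// Minimum quality for a dream to be integrated
    pub dream_threshold: f32,
}

impl Default for DreamCycleConfig {
    fn default() -> Self {
        Self {
            num_regular_dreams: 40,
            num_creative_dreams: 10,
            creative_jump_count: 3,
            dream_threshold: 0.5,
        }
    }
}
```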
---
## 8. Integration with exo-exotic Dreams Module
SONA integrates with the exo-ai-2025 dream experiments:
```rust
// From exo-exotic crate
use exo_exotic::experiments::dreams::{
DreamExperiment,
DreamConfig,
NoveltyMeasure,
};
impl DreamCycle {
/// Run advanced dream experiments from exo-exotic
pub async fn run_exotic_dreams(&self) -> ExoticDreamReport {
let dream_experiment = DreamExperiment::new(DreamConfig {
memory_count: self.memory.node_count(),
replay_probability: 0.7,
recombination_rate: 0.3,
novelty_threshold: 0.5,
});
let result = dream_experiment.run(&self.memory).await;
ExoticDreamReport {
novelty_score: result.novelty,
coherence_score: result.coherence,
creative_insights: result.insights.len(),
new_hypotheses: result.hypotheses,
}
}
}
```
---
## Summary
SONA's Dream Engine enables:
| Feature | Mechanism | Outcome |
|---------|-----------|---------|
| **Memory Replay** | Random walks on memory graph | Strengthens important connections |
| **Creative Recombination** | High-temperature sampling | Discovers novel associations |
| **Quality Filtering** | Novelty + coherence metrics | Only valuable dreams integrated |
| **Weak Edge Creation** | Dream-derived connections | Enables creative retrieval |
| **Memory Consolidation** | Value-based pruning | Efficient memory usage |
Dreams allow SONA to:
1. **Discover** connections it wouldn't find through normal operation
2. **Explore** the hypothesis space without user cost
3. **Consolidate** valuable knowledge
4. **Prune** low-value information
5. **Remain creative** while staying grounded

---
# SONA Performance Benchmarks
## Overview
This document defines performance targets, benchmark methodology, and expected results for SONA components. All benchmarks are designed to be reproducible and measurable.
## Performance Targets Summary
```
┌─────────────────────────────────────────────────────────────────────────┐
│ SONA Performance Targets │
├─────────────────────────────────────────────────────────────────────────┤
│ Component │ Target │ Stretch Goal │ Unit │
├─────────────────────────┼────────────────┼───────────────┼─────────────┤
│ Micro-LoRA forward │ <50μs │ <20μs │ per request │
│ Micro-LoRA update │ <100μs │ <50μs │ per signal │
│ Base LoRA forward │ <200μs │ <100μs │ per layer │
│ Pattern extraction │ <1s │ <500ms │ per 1000 │
│ Trajectory recording │ <10μs │ <5μs │ per step │
│ Background cycle │ <30s │ <15s │ per cycle │
│ Deep cycle │ <10min │ <5min │ per cycle │
│ Memory overhead │ <100MB │ <50MB │ total │
│ Pattern search │ <1ms │ <100μs │ per query │
│ Dream generation │ <100ms │ <50ms │ per dream │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Micro-LoRA Benchmarks
### Forward Pass Latency
**Target**: <50μs average, <100μs p99
```rust
// benches/micro_lora.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
fn bench_micro_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("micro_lora_forward");
for rank in [1, 2] {
for hidden_dim in [256, 512, 1024, 2048] {
let lora = MicroLoRA::new(hidden_dim, rank);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
output.fill(0.0);
unsafe { lora.forward_simd(&input, &mut output) };
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | AVX2 (μs) | Scalar (μs) | Speedup |
|------|------------|-----------|-------------|---------|
| 1 | 256 | 3.2 | 12.5 | 3.9x |
| 1 | 512 | 5.8 | 24.1 | 4.2x |
| 1 | 1024 | 10.4 | 47.3 | 4.5x |
| 1 | 2048 | 19.7 | 93.8 | 4.8x |
| 2 | 256 | 5.1 | 23.4 | 4.6x |
| 2 | 512 | 9.3 | 46.2 | 5.0x |
| 2 | 1024 | 17.2 | 91.5 | 5.3x |
| 2 | 2048 | 33.1 | 182.4 | 5.5x |
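For reference, the scalar baseline those AVX2 numbers are measured against is essentially two dot products. A minimal rank-1 sketch (`forward_simd` itself is not reproduced here; the `alpha` scaling factor is an assumption):

```rust
/// Scalar rank-1 LoRA forward: output += up * (down . input) * alpha.
/// This is the O(2*d) baseline that the SIMD path accelerates.
fn micro_lora_forward_scalar(down: &[f32], up: &[f32], alpha: f32, input: &[f32], output: &mut [f32]) {
    // Down projection: scalar h = down . input
    let h: f32 = down.iter().zip(input).map(|(d, x)| d * x).sum();
    // Up projection: output += up * h * alpha
    for (o, u) in output.iter_mut().zip(up) {
        *o += u * h * alpha;
    }
}
```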
### Gradient Accumulation
**Target**: <100μs per signal
```rust
fn bench_gradient_accumulation(c: &mut Criterion) {
let mut group = c.benchmark_group("gradient_accumulation");
for hidden_dim in [256, 512, 1024] {
let mut lora = MicroLoRA::new(hidden_dim, 1);
let signal = LearningSignal {
query_embedding: vec![0.1; hidden_dim],
gradient_estimate: vec![0.01; hidden_dim],
quality_score: 0.8,
timestamp: Instant::now(),
metadata: SignalMetadata::default(),
};
group.bench_with_input(
BenchmarkId::from_parameter(hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.accumulate_gradient(&signal);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Time (μs) | Throughput (signals/s) |
|------------|-----------|------------------------|
| 256 | 8.3 | 120,481 |
| 512 | 15.7 | 63,694 |
| 1024 | 30.2 | 33,112 |
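The accumulation path being timed is a quality-weighted add into a persistent buffer, which explains the linear scaling with hidden dimension. A hedged sketch of what `accumulate_gradient` plausibly does per signal (the exact weighting scheme is an assumption):

```rust
/// Quality-weighted gradient accumulation into a persistent buffer.
/// Higher-quality signals contribute proportionally more to the pending update.
fn accumulate_gradient(buffer: &mut [f32], gradient: &[f32], quality: f32) {
    for (b, g) in buffer.iter_mut().zip(gradient) {
        *b += quality * g;
    }
}
```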
---
## Base LoRA Benchmarks
### Forward Pass (Per Layer)
**Target**: <200μs per layer
```rust
fn bench_base_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("base_lora_forward");
for rank in [4, 8, 16] {
for hidden_dim in [512, 1024, 2048] {
let lora = BaseLoRA::new(hidden_dim, rank, 1);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.forward_layer(0, &input, &mut output);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | Time (μs) | FLOPs | GFLOPS |
|------|------------|-----------|----------|--------|
| 4 | 512 | 45 | 4.2M | 93 |
| 4 | 1024 | 85 | 8.4M | 99 |
| 4 | 2048 | 162 | 16.8M | 104 |
| 8 | 512 | 82 | 8.4M | 102 |
| 8 | 1024 | 158 | 16.8M | 106 |
| 8 | 2048 | 305 | 33.5M | 110 |
| 16 | 512 | 155 | 16.8M | 108 |
| 16 | 1024 | 298 | 33.5M | 112 |
| 16 | 2048 | 582 | 67.1M | 115 |
---
## Trajectory Recording Benchmarks
### Step Recording Latency
**Target**: <10μs per step
```rust
fn bench_trajectory_recording(c: &mut Criterion) {
let mut group = c.benchmark_group("trajectory_recording");
for hidden_dim in [256, 512] {
for num_heads in [4, 8] {
let mut builder = TrajectoryBuilder::new(1, vec![0.1; hidden_dim]);
group.bench_with_input(
BenchmarkId::new(format!("h{}_heads{}", hidden_dim, num_heads), hidden_dim),
&(hidden_dim, num_heads),
|b, &(hd, nh)| {
b.iter(|| {
builder.add_step(
vec![0.5; hd],
vec![0.1; hd * nh],
0.8,
);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Heads | Time (μs) | Memory (bytes) |
|------------|-------|-----------|----------------|
| 256 | 4 | 2.1 | 5,120 |
| 256 | 8 | 3.8 | 9,216 |
| 512 | 4 | 3.7 | 10,240 |
| 512 | 8 | 6.9 | 18,432 |
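The memory column follows directly from the step layout: 4 bytes per f32, `hidden_dim` activations plus `hidden_dim * num_heads` flattened attention weights. A quick check against the table:

```rust
/// Per-step trajectory memory: f32 activations (hidden_dim)
/// plus f32 attention weights (hidden_dim * num_heads).
fn step_bytes(hidden_dim: usize, num_heads: usize) -> usize {
    4 * hidden_dim * (1 + num_heads)
}
```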
### Buffer Operations
**Target**: Lock-free with <1% contention
```rust
fn bench_trajectory_buffer(c: &mut Criterion) {
let buffer = Arc::new(TrajectoryBuffer::new(10000));
c.bench_function("trajectory_buffer_record", |b| {
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
};
b.iter(|| {
buffer.record(trajectory.clone());
});
});
c.bench_function("trajectory_buffer_drain", |b| {
// Pre-fill buffer
for i in 0..1000 {
buffer.record(QueryTrajectory {
id: i,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
});
}
b.iter(|| {
buffer.drain()
});
});
}
```
---
## Pattern Learning Benchmarks
### K-means++ Extraction
**Target**: <1s for 1000 trajectories
```rust
fn bench_pattern_extraction(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_extraction");
for n_trajectories in [100, 500, 1000, 5000] {
let mut bank = ReasoningBank::new(PatternConfig {
k_clusters: 50,
embedding_dim: 256,
..Default::default()
});
// Pre-populate
for i in 0..n_trajectories {
bank.add_trajectory(&generate_random_trajectory(i, 256));
}
group.bench_with_input(
BenchmarkId::from_parameter(n_trajectories),
&n_trajectories,
|b, _| {
b.iter(|| {
bank.extract_patterns()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Trajectories | Clusters | Time (ms) | Iterations |
|--------------|----------|-----------|------------|
| 100 | 10 | 12 | 8 |
| 500 | 25 | 95 | 12 |
| 1000 | 50 | 380 | 15 |
| 5000 | 100 | 2,450 | 20 |
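Extraction cost is dominated by seeding and assignment. As an illustration of the seeding idea, here is a deterministic farthest-point variant of k-means++ initialization; the real extractor presumably uses the standard randomized D² sampling, so this sketch trades randomness for testability:

```rust
/// Deterministic farthest-point seeding: start from the first point,
/// then repeatedly pick the point farthest from every chosen center.
fn farthest_point_seeds(points: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut seeds = vec![0usize];
    let dist2 = |a: &[f32], b: &[f32]| -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    };
    while seeds.len() < k.min(points.len()) {
        let (best, _) = points.iter().enumerate()
            .map(|(i, p)| {
                // Distance to the nearest already-chosen center
                let d = seeds.iter()
                    .map(|&s| dist2(p, &points[s]))
                    .fold(f32::INFINITY, f32::min);
                (i, d)
            })
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .unwrap();
        seeds.push(best);
    }
    seeds
}
```

Already-chosen points have distance zero to themselves, so they are never re-selected.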
### Pattern Search
**Target**: <1ms per query
```rust
fn bench_pattern_search(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_search");
for n_patterns in [1000, 10000, 100000] {
let mut index = PatternIndex::new(256, n_patterns);
// Pre-populate
for i in 0..n_patterns {
let embedding: Vec<f32> = (0..256).map(|_| rand::random()).collect();
index.add_pattern(i as u64, &embedding).unwrap();
}
let query: Vec<f32> = (0..256).map(|_| rand::random()).collect();
group.bench_with_input(
BenchmarkId::from_parameter(n_patterns),
&n_patterns,
|b, _| {
b.iter(|| {
index.find_similar(&query, 10)
});
},
);
}
group.finish();
}
```
**Expected Results** (HNSW with ef=50):
| Patterns | Search Time (μs) | Recall@10 |
|----------|------------------|-----------|
| 1,000 | 45 | 0.98 |
| 10,000 | 120 | 0.96 |
| 100,000 | 350 | 0.94 |
| 1,000,000| 850 | 0.92 |
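Recall@10 in the table is simply the overlap between the approximate result list and the exact nearest neighbors. A minimal computation of the metric, independent of the index internals:

```rust
/// Recall@k: fraction of the true top-k ids that appear in the retrieved list.
fn recall_at_k(retrieved: &[u64], ground_truth: &[u64]) -> f32 {
    let hits = ground_truth.iter().filter(|&id| retrieved.contains(id)).count();
    hits as f32 / ground_truth.len() as f32
}
```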
---
## EWC++ Benchmarks
### Fisher Information Update
**Target**: <1ms per update
```rust
fn bench_fisher_update(c: &mut Criterion) {
let mut group = c.benchmark_group("fisher_update");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
..Default::default()
});
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.update_fisher(&gradients);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Parameters | Update Time (μs) | Memory (KB) |
|------------|------------------|-------------|
| 1,000 | 15 | 8 |
| 10,000 | 120 | 80 |
| 100,000 | 1,150 | 800 |
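The per-parameter Fisher update is one multiply-add over the gradient vector, which explains the linear scaling above. A sketch under the assumption of a decay-based online diagonal estimate (the decay value is illustrative):

```rust
/// Online diagonal Fisher estimate: F <- decay*F + (1 - decay)*g^2.
/// One multiply-add per parameter, hence linear cost in param_count.
fn update_fisher(fisher: &mut [f32], gradients: &[f32], decay: f32) {
    for (f, g) in fisher.iter_mut().zip(gradients) {
        *f = decay * *f + (1.0 - decay) * g * g;
    }
}
```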
### Constraint Application
**Target**: <500μs per gradient vector
```rust
fn bench_constraint_application(c: &mut Criterion) {
let mut group = c.benchmark_group("ewc_constraints");
for param_count in [1000, 10000, 100000] {
let ewc = EwcPlusPlus::new(EwcConfig {
param_count,
num_tasks: 5,
..Default::default()
});
// Pre-train Fisher
for _ in 0..100 {
let grads: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
ewc.update_fisher(&grads);
}
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.apply_constraints(&gradients)
});
},
);
}
group.finish();
}
```
---
## Dream Engine Benchmarks
### Dream Generation
**Target**: <100ms per dream
```rust
fn bench_dream_generation(c: &mut Criterion) {
let mut group = c.benchmark_group("dream_generation");
for memory_size in [1000, 10000, 50000] {
let mut engine = DreamEngine::new(DreamConfig::default());
// Pre-populate memory
for i in 0..memory_size {
engine.add_memory_node(MemoryNode {
id: i as u64,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: rand::random::<u32>() % 100,
importance: rand::random(),
});
}
group.bench_with_input(
BenchmarkId::from_parameter(memory_size),
&memory_size,
|b, _| {
b.iter(|| {
engine.generate_dream()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Memory Nodes | Dream Time (ms) | Avg Path Length |
|--------------|-----------------|-----------------|
| 1,000 | 12 | 8 |
| 10,000 | 45 | 12 |
| 50,000 | 85 | 15 |
### Dream Quality Evaluation
**Target**: <50ms per evaluation
```rust
fn bench_dream_evaluation(c: &mut Criterion) {
let evaluator = DreamEvaluator::new(EvaluatorConfig::default());
let dream = Dream {
id: 1,
path: (0..15).map(|i| MemoryNode {
id: i,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: 10,
importance: 0.5,
}).collect(),
creative_jumps: 3,
total_novelty: 0.0,
};
c.bench_function("dream_evaluation", |b| {
b.iter(|| {
evaluator.evaluate(&dream)
});
});
}
```
---
## Learning Loop Benchmarks
### Loop A (Instant) - Per Request
**Target**: <1ms total overhead
```rust
fn bench_loop_a(c: &mut Criterion) {
let loop_a = InstantLoop::new(256, InstantLoopConfig::default());
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: (0..10).map(|_| TrajectoryStep {
activations: vec![0.5; 256],
attention_weights: vec![0.1; 2048],
reward: 0.8,
timestamp: Instant::now(),
}).collect(),
final_quality: 0.8,
latency_us: 50000,
};
c.bench_function("loop_a_on_inference", |b| {
b.iter(|| {
loop_a.on_inference(trajectory.clone());
});
});
c.bench_function("loop_a_flush", |b| {
// Pre-fill with signals
for _ in 0..100 {
loop_a.on_inference(trajectory.clone());
}
b.iter(|| {
loop_a.flush_updates();
});
});
}
```
**Expected Results**:
| Operation | Time (μs) | Notes |
|---------------|-----------|--------------------------|
| on_inference | 650 | Recording + accumulation |
| flush_updates | 120 | LoRA + edge commit |
| Total | 770 | Per request overhead |
### Loop B (Background) - Hourly
**Target**: <30s per cycle
```rust
fn bench_loop_b(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_b = BackgroundLoop::new(BackgroundLoopConfig::default(), 256);
// Generate trajectories
let trajectories: Vec<_> = (0..1000)
.map(|i| generate_random_trajectory(i, 256))
.collect();
c.bench_function("loop_b_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_b.run_cycle(trajectories.clone()).await
});
});
}
```
**Breakdown**:
| Phase | Time (s) | % of Total |
|------------------------|----------|------------|
| Trajectory ingestion | 0.5 | 2% |
| Pattern extraction | 8.0 | 32% |
| Gradient computation | 5.0 | 20% |
| EWC++ constraints | 3.0 | 12% |
| LoRA update | 2.0 | 8% |
| Fisher update | 4.0 | 16% |
| Metrics/logging | 2.5 | 10% |
| **Total** | **25.0** | 100% |
### Loop C (Deep) - Weekly
**Target**: <10min per cycle
```rust
fn bench_loop_c(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_c = DeepLoop::new(DeepLoopConfig::default());
// This is a longer benchmark, run fewer iterations
c.bench_function("loop_c_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_c.run_cycle().await
});
});
}
```
**Breakdown**:
| Phase | Time (min) | % of Total |
|------------------------|------------|------------|
| Dream generation (50) | 1.5 | 15% |
| Φ evaluation | 2.0 | 20% |
| Dream integration | 1.0 | 10% |
| Memory consolidation | 3.0 | 30% |
| EWC++ consolidation | 2.0 | 20% |
| Metrics/persistence | 0.5 | 5% |
| **Total** | **10.0** | 100% |
---
## Memory Benchmarks
### Memory Usage by Component
```rust
fn measure_memory_usage() -> MemoryReport {
let mut report = MemoryReport::default();
// Micro-LoRA (rank=1, hidden=256)
let micro_lora = MicroLoRA::new(256, 1);
report.micro_lora = std::mem::size_of_val(&micro_lora)
+ micro_lora.down_proj.len() * 4
+ micro_lora.up_proj.len() * 4
+ micro_lora.gradient_buffer.len() * 4;
// Base LoRA (rank=8, hidden=256, layers=12)
let base_lora = BaseLoRA::new(256, 8, 12);
report.base_lora = std::mem::size_of_val(&base_lora)
+ base_lora.layers.iter().map(|l|
l.down_proj.len() * 4 + l.up_proj.len() * 4
).sum::<usize>();
// Trajectory buffer (capacity=10000)
report.trajectory_buffer = 10000 * (
256 * 4 // query embedding
+ 10 * (256 * 4 + 2048 * 4 + 4 + 8) // 10 steps
);
// Pattern index (100k patterns)
report.pattern_index = 100000 * (256 * 4 + 64); // embedding + metadata
// EWC++ (100k params, 5 tasks)
report.ewc = 100000 * 4 * 5; // Fisher per task
report
}
```
**Expected Memory Usage**:
| Component | Size (MB) | Notes |
|------------------|-----------|--------------------------|
| Micro-LoRA | 0.004 | Minimal overhead |
| Base LoRA | 0.6 | 12 layers |
| Trajectory Buffer| 82.0 | 10k capacity |
| Pattern Index | 102.4 | 100k patterns |
| EWC++ Fisher | 2.0 | 100k params × 5 tasks |
| Dream Engine | 12.8 | 50k memory nodes |
| **Total** | **199.8** | Peak usage |
---
## Throughput Benchmarks
### End-to-End Query Throughput
```rust
fn bench_query_throughput(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let sona = runtime.block_on(async {
SonaEngine::new(SonaConfig::default()).await.unwrap()
});
c.bench_function("query_throughput", |b| {
b.to_async(&runtime).iter(|| async {
sona.process("test query", &Context::default()).await
});
});
}
```
**Expected Throughput**:
| Scenario | QPS | Latency p50 | Latency p99 |
|--------------------|---------|-------------|-------------|
| Baseline (no SONA) | 850 | 1.1ms | 2.5ms |
| With Micro-LoRA | 780 | 1.2ms | 2.8ms |
| Full SONA | 720 | 1.3ms | 3.2ms |
**Overhead**: ~15% throughput reduction for full self-learning capability.
---
## Hardware-Specific Benchmarks
### CPU Feature Detection
```rust
fn check_cpu_features() -> CpuFeatures {
CpuFeatures {
avx2: is_x86_feature_detected!("avx2"),
avx512f: is_x86_feature_detected!("avx512f"),
fma: is_x86_feature_detected!("fma"),
sse4_1: is_x86_feature_detected!("sse4.1"),
sse4_2: is_x86_feature_detected!("sse4.2"),
}
}
```
### Performance by CPU
| CPU | Micro-LoRA (μs) | Pattern Search (μs) | Overall Speedup |
|------------------------|-----------------|---------------------|-----------------|
| Intel i9-13900K (AVX2) | 3.2 | 45 | 4.8x |
| AMD Ryzen 9 7950X | 3.5 | 48 | 4.5x |
| Apple M2 Pro (NEON) | 4.1 | 52 | 3.9x |
| Intel Xeon Platinum | 2.8 | 38 | 5.2x |
---
## Benchmark Commands
```bash
# Run all benchmarks
cargo bench --package ruvllm --features sona
# Run specific benchmark group
cargo bench --package ruvllm --bench micro_lora
# Run with specific features
cargo bench --package ruvllm --features "sona,avx2"
# Profile memory
cargo bench --package ruvllm --bench memory -- --profile-time 60
# Generate flamegraph
cargo flamegraph --bench micro_lora -- --bench
```
---
## Continuous Benchmarking
### CI Integration
```yaml
# .github/workflows/bench.yml
name: Benchmarks
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench --package ruvllm --features sona -- --save-baseline main
- name: Compare with baseline
run: cargo bench --package ruvllm --features sona -- --baseline main
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: target/criterion
```
### Regression Detection
```rust
// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;
fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
if regression > MAX_REGRESSION_PERCENT {
Err(format!(
"Performance regression of {:.1}% exceeds threshold of {}%",
regression, MAX_REGRESSION_PERCENT
))
} else {
Ok(())
}
}
```
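Using the check above: a 115 ms run against a 100 ms baseline is a 15% regression and fails, while 105 ms passes and improvements always pass. The function is repeated here so the example is self-contained:

```rust
use std::time::Duration;

// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;

fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
    let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
    if regression > MAX_REGRESSION_PERCENT {
        Err(format!(
            "Performance regression of {:.1}% exceeds threshold of {}%",
            regression, MAX_REGRESSION_PERCENT
        ))
    } else {
        Ok(())
    }
}
```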
---
## Next Steps
1. **09-API-REFERENCE.md** - Complete API documentation
