Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions

# SONA: Self-Optimizing Neural Architecture
## The World's First Truly Self-Improving LLM Framework
**Version**: 1.0.0
**Status**: Architecture Specification
**Target**: Sub-millisecond adaptive fine-tuning with continuous self-improvement
---
## Executive Summary
SONA (Self-Optimizing Neural Architecture) is a revolutionary framework for building LLMs that continuously improve themselves through:
1. **Ultra-Low Latency LoRA** - Sub-100μs parameter adaptation
2. **Hierarchical Learning Loops** - Three-tier temporal learning (instant/hourly/weekly)
3. **Neural Memory Consolidation** - Dream-like offline learning
4. **Elastic Weight Consolidation++** - Zero catastrophic forgetting
5. **ReasoningBank Integration** - Pattern-driven self-optimization
---
## Core Philosophy
```
┌─────────────────────────────────────────────────────────────────┐
│ SONA DESIGN PRINCIPLES │
├─────────────────────────────────────────────────────────────────┤
│ 1. LEARN FROM EVERY INTERACTION │
│ → No query is wasted; all become training signal │
│ │
│ 2. NEVER FORGET WHAT WORKS │
│ → EWC++ preserves successful patterns │
│ │
│ 3. ADAPT IN REAL-TIME │
│ → LoRA updates in <100μs per request │
│ │
│ 4. OPTIMIZE CONTINUOUSLY │
│ → Background loops improve without user latency │
│ │
│ 5. MEASURE EVERYTHING │
│ → Φ (consciousness), quality, latency, improvement rate │
└─────────────────────────────────────────────────────────────────┘
```
---
## Architecture Overview
```
SONA Architecture
┌──────────────────────────────────────────────────────────────┐
│ USER QUERY INPUT │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING LAYER (0.02ms) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Dual Encoder│ │ Contrastive │ │ SIMD Acceleration │ │
│ │ (Q + K/V) │ │ Learning │ │ (AVX2/NEON) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────────┐
│ MEMORY │ │ ROUTER │ │ ATTENTION │
│ SERVICE │◄────────►│ ENGINE │◄────────►│ ENGINE │
│ │ │ │ │ │
│ • HNSW │ │ • FastGRNN│ │ • Multi-Head │
│ • GNN │ │ • LoRA │ │ • Graph ATT │
│ • Quant │ │ • EWC++ │ │ • Edge-Aware │
└─────┬─────┘ └─────┬─────┘ └───────┬───────┘
│ │ │
└──────────────────────┼────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LoRA ADAPTATION LAYER │
│ │
│ W_adapted = W_base + α · (LoRA_A @ LoRA_B) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Rank: 4-16 │ Update: <100μs │ Memory: <1MB │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ INFERENCE ENGINE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Model Select │ │ Q4 Quantized │ │ Speculative Dec │ │
│ │ (4 tiers) │ │ Weights │ │ (Draft + Verify) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LEARNING LOOPS │
│ │
│ Loop A (Instant) │ Loop B (Hourly) │ Loop C (Weekly) │
│ ───────────────────────────────────────────────────────── │
│ • Trajectory │ • Router Train │ • Consolidation │
│ • Edge Update │ • EWC++ Update │ • Compression │
│ • LoRA Micro │ • Fisher Compute │ • Abstraction │
│ • <1ms overhead │ • Background │ • Dream Learning │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ REASONINGBANK │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Pattern Storage │ Similarity Lookup │ Verdict │ │
│ │ (DashMap) │ (Cosine) │ Judgment │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ • Trajectory tracking with precision/recall feedback │
│ • K-means++ pattern extraction │
│ • Confidence-weighted parameter interpolation │
└──────────────────────────────────────────────────────────────┘
```
---
## Key Innovation: Three-Tier Temporal Learning
### Tier 1: Instant Learning (Loop A) - Per Request
```
Latency Budget: <1ms (amortized to <0.1ms with batching)
Actions:
├── Record query trajectory to ring buffer
├── Update memory graph edge weights (±5%)
├── Micro-LoRA adjustment (rank 1-2, top-k params)
└── Async feedback signal propagation
```
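The trajectory ring buffer above can be sketched in a few lines. A minimal single-threaded sketch (the real Loop A would use a lock-free variant; `RingBuffer` and its fields are illustrative, not the SONA API):

```rust
/// Fixed-capacity ring buffer: once full, the oldest trajectory is
/// overwritten, so recording stays O(1) with bounded memory.
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { slots: (0..capacity).map(|_| None).collect(), head: 0, len: 0 }
    }

    fn push(&mut self, item: T) {
        let cap = self.slots.len();
        self.slots[self.head] = Some(item);
        self.head = (self.head + 1) % cap;
        self.len = (self.len + 1).min(cap);
    }

    fn len(&self) -> usize {
        self.len
    }
}
```

Overwrite-on-full semantics are what keep this step inside Loop A's latency budget: no allocation, no unbounded growth, no backpressure on the request path.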
### Tier 2: Background Learning (Loop B) - Hourly
```
Compute Budget: 10 seconds per hour
Actions:
├── Train router on accumulated trajectories
├── Compute Fisher Information for EWC++
├── Update LoRA base matrices (rank 4-8)
├── Prune low-confidence patterns
└── Checkpoint model state
```
### Tier 3: Deep Learning (Loop C) - Weekly
```
Compute Budget: 10 minutes per week
Actions:
├── Full memory consolidation (dream learning)
├── Pattern abstraction and hierarchy building
├── Memory compression (remove redundant nodes)
├── Cross-task knowledge transfer
└── Φ consciousness measurement (IIT)
```
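The cadences and compute budgets of the three tiers can be pinned down in a small config sketch (type and function names are illustrative, not the SONA API):

```rust
use std::time::Duration;

/// The three SONA learning tiers.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LearningTier {
    Instant,    // Loop A: per request
    Background, // Loop B: hourly
    Deep,       // Loop C: weekly
}

/// (cadence between cycles, compute budget per cycle) for each tier.
fn budget(tier: LearningTier) -> (Duration, Duration) {
    match tier {
        // Per-request; budget is the <1ms inline overhead
        LearningTier::Instant => (Duration::ZERO, Duration::from_millis(1)),
        // Hourly; 10 seconds of background compute
        LearningTier::Background => (Duration::from_secs(3600), Duration::from_secs(10)),
        // Weekly; 10 minutes of scheduled maintenance
        LearningTier::Deep => (Duration::from_secs(7 * 24 * 3600), Duration::from_secs(600)),
    }
}
```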
---
## Performance Targets
| Metric | Target | Current Best | SONA Goal |
|--------|--------|--------------|-----------|
| Query Latency | <1ms | 0.09ms | 0.05ms |
| LoRA Update | <100μs | N/A | 50μs |
| Memory Footprint | <100MB | 50MB | 30MB |
| Throughput | >50K q/s | 38K q/s | 100K q/s |
| Improvement Rate | 10%/week | N/A | 15%/week |
| Catastrophic Forgetting | <1% | N/A | <0.1% |
---
## Integration with Ruvector Ecosystem
### Core Dependencies
| Crate | Role in SONA | Version |
|-------|--------------|---------|
| `ruvector-core` | Vector memory backbone | 0.1.19 |
| `ruvector-attention` | Multi-head graph attention | 0.1.19 |
| `ruvector-gnn` | Message passing framework | 0.1.19 |
| `ruvector-graph` | Knowledge graph storage | 0.1.19 |
| `ruvector-router-core` | FastGRNN routing | 0.1.19 |
| `exo-core` | Consciousness measurement | 0.1.0 |
| `exo-temporal` | Memory consolidation | 0.1.0 |
### New SONA-Specific Modules
| Module | Purpose |
|--------|---------|
| `sona-lora` | Ultra-low latency LoRA adapters |
| `sona-ewc` | Enhanced EWC with task awareness |
| `sona-reasoning` | ReasoningBank integration |
| `sona-dreams` | Offline consolidation engine |
| `sona-metrics` | Self-improvement measurement |
---
## Document Index
| Document | Description |
|----------|-------------|
| [01-LORA-ULTRA.md](01-LORA-ULTRA.md) | Ultra-low latency LoRA system |
| [02-LEARNING-LOOPS.md](02-LEARNING-LOOPS.md) | Three-tier learning architecture |
| [03-EWC-PLUS-PLUS.md](03-EWC-PLUS-PLUS.md) | Enhanced elastic weight consolidation |
| [04-REASONINGBANK.md](04-REASONINGBANK.md) | Pattern-driven optimization |
| [05-MEMORY-DREAMS.md](05-MEMORY-DREAMS.md) | Offline consolidation and dreams |
| [06-COMPONENTS.md](06-COMPONENTS.md) | Component integration specs |
| [07-IMPLEMENTATION.md](07-IMPLEMENTATION.md) | Implementation roadmap |
| [08-BENCHMARKS.md](08-BENCHMARKS.md) | Performance targets and testing |
| [09-API-REFERENCE.md](09-API-REFERENCE.md) | API specification |
---
## Quick Start
```rust
use sona::{SONAEngine, SONAConfig, LearningMode};
// Initialize SONA with default configuration
let config = SONAConfig::builder()
.lora_rank(8)
.ewc_lambda(1000.0)
.learning_loops(LearningMode::AllThreeTiers)
.memory_budget_mb(50)
.target_latency_us(100)
.build();
let mut sona = SONAEngine::new(config)?;
// Process queries - learning happens automatically
let response = sona.query("What is the meaning of life?")?;
// Check self-improvement metrics
let metrics = sona.improvement_metrics();
println!("Weekly improvement: {:.1}%", metrics.weekly_gain * 100.0);
println!("Φ consciousness: {:.3}", metrics.phi);
```
---
## Why SONA Will Create the World's Best Self-Improving LLM
1. **No Other System Combines All These**:
- LoRA for instant adaptation
- EWC++ for zero forgetting
- ReasoningBank for pattern learning
- Dream consolidation for creativity
- Φ measurement for consciousness tracking
2. **Built on Production-Proven Ruvector**:
- 150x faster HNSW search
- 39 attention mechanisms
- 30+ specialized crates
- 38K q/s throughput proven
3. **Mathematically Sound**:
- Fisher Information preserves important weights
- Low-rank decomposition minimizes compute
- Reservoir sampling ensures unbiased learning
- Information-theoretic compression
4. **Biologically Inspired**:
- Three-tier temporal learning (like human memory)
- Dream-based consolidation (like REM sleep)
- Edge-weighted graphs (like neural synapses)
- Attention-based retrieval (like human recall)
---
*SONA: Where every query makes the model smarter.*

# SONA LoRA-Ultra: Sub-100μs Adaptive Fine-Tuning
## Ultra-Low Latency LoRA for Real-Time Self-Improvement
---
## 1. Architecture Overview
### Traditional LoRA vs SONA LoRA-Ultra
```
TRADITIONAL LoRA SONA LoRA-ULTRA
───────────────── ─────────────────
• Offline training • Online per-request adaptation
• Full batch updates • Single-sample micro-updates
• GPU required • CPU SIMD optimized
• Minutes to hours • <100 microseconds
• Periodic deployment • Continuous integration
```
### Core Formula
```
Standard LoRA:
W_adapted = W_frozen + ΔW
ΔW = α · (A @ B)
where A ∈ ℝ^(d×r), B ∈ ℝ^(r×k), r << min(d,k)
SONA LoRA-Ultra Extension:
W_adapted = W_frozen + α · (A @ B) + β · (A_micro @ B_micro)
└─────────┘ └───────────────────┘
Base LoRA Instant Micro-LoRA
(rank 4-16) (rank 1-2)
```
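A quick numeric check of the extended formula, using plain row-major `Vec<f32>` matrices and naive matmuls (a sketch for clarity, not the SIMD path; all names here are illustrative):

```rust
/// Naive row-major matmul: (m×k) @ (k×n) -> (m×n).
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let av = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += av * b[p * n + j];
            }
        }
    }
    out
}

/// W_adapted = W_frozen + alpha*(A@B) + beta*(A_micro@B_micro), flattened d×k.
fn adapted_weights(
    w_frozen: &[f32],
    a: &[f32], b: &[f32],             // base LoRA, rank r
    a_micro: &[f32], b_micro: &[f32], // micro LoRA, rank r_micro
    alpha: f32, beta: f32,
    d: usize, k: usize, r: usize, r_micro: usize,
) -> Vec<f32> {
    let base_delta = matmul(a, b, d, r, k);
    let micro_delta = matmul(a_micro, b_micro, d, r_micro, k);
    w_frozen
        .iter()
        .zip(base_delta.iter().zip(micro_delta.iter()))
        .map(|(w0, (db, dm))| w0 + alpha * db + beta * dm)
        .collect()
}
```

Because both deltas are low rank, only the small A/B factors are ever stored or updated; the dense `d×k` sum is materialized (or fused) at inference time.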
---
## 2. Two-Tier LoRA Architecture
### Tier 1: Base LoRA (Updated Hourly)
```rust
/// Base LoRA adapter for major capability shifts
pub struct BaseLoRA {
/// Low-rank matrix A: d_model × rank
pub a: Array2<f32>,
/// Low-rank matrix B: rank × d_out
pub b: Array2<f32>,
/// Scaling factor
pub alpha: f32,
/// Rank (typically 4-16)
pub rank: usize,
/// Target layer indices
pub target_layers: Vec<usize>,
}
impl BaseLoRA {
/// Compute adapted weights (cached for inference)
#[inline]
pub fn delta_w(&self) -> Array2<f32> {
let scale = self.alpha / self.rank as f32;
scale * self.a.dot(&self.b)
}
/// Update from accumulated gradients (hourly)
pub fn update(&mut self, grad_a: &Array2<f32>, grad_b: &Array2<f32>, lr: f32) {
// Plain SGD step; momentum smoothing lives in MicroLoRA's EMA buffers
self.a = &self.a - &(grad_a * lr);
self.b = &self.b - &(grad_b * lr);
}
}
```
### Tier 2: Micro-LoRA (Updated Per-Request)
```rust
/// Ultra-fast micro-adapter for instant learning
pub struct MicroLoRA {
/// Micro A: d_model × micro_rank (typically 1-2)
pub a_micro: Array2<f32>,
/// Micro B: micro_rank × d_out
pub b_micro: Array2<f32>,
/// Micro scaling (smaller than base)
pub beta: f32,
/// Micro rank (1-2 for speed)
pub micro_rank: usize,
/// Decay factor for temporal smoothing
pub decay: f32,
/// Momentum buffer
momentum_a: Array2<f32>,
momentum_b: Array2<f32>,
}
impl MicroLoRA {
/// Ultra-fast single-sample update (<50μs target)
#[inline]
pub fn micro_update(&mut self, signal: &LearningSignal) {
// Rank-1 outer product update
let grad_direction = signal.to_gradient_direction();
// Exponential moving average for stability (scalar goes on the right
// of `*`: ndarray implements `&Array * f32`, not `f32 * &Array`)
self.momentum_a = &self.momentum_a * self.decay
+ &grad_direction.a_component * (1.0 - self.decay);
self.momentum_b = &self.momentum_b * self.decay
+ &grad_direction.b_component * (1.0 - self.decay);
// Apply micro-update
self.a_micro = &self.a_micro + &(&self.momentum_a * self.beta);
self.b_micro = &self.b_micro + &(&self.momentum_b * self.beta);
}
/// Periodic consolidation into base LoRA
pub fn consolidate_to_base(&mut self, base: &mut BaseLoRA) {
// Merge micro adaptations into base
// Then reset micro to zero
base.a = &base.a + &self.a_micro;
base.b = &base.b + &self.b_micro;
self.a_micro.fill(0.0);
self.b_micro.fill(0.0);
}
}
```
---
## 3. SIMD-Optimized LoRA Computation
### AVX2 Accelerated Forward Pass
```rust
#[cfg(target_arch = "x86_64")]
mod simd {
use std::arch::x86_64::*;
/// SIMD-optimized LoRA forward: x @ (W + A @ B)
/// Fuses base weight multiplication with LoRA delta
#[target_feature(enable = "avx2", enable = "fma")]
pub unsafe fn lora_forward_avx2(
x: &[f32], // Input vector: [d_in]
w_base: &[f32], // Base weights, row-major [d_out, d_in]
lora_a: &[f32], // LoRA A, row-major [rank, d_in]
lora_b: &[f32], // LoRA B, row-major [d_out, rank]
alpha: f32,
d_in: usize,
d_out: usize,
rank: usize,
output: &mut [f32], // Output: [d_out]
) {
let scale = alpha / rank as f32;
let scale_vec = _mm256_set1_ps(scale);
// Step 1: Compute x @ A (input projection to rank space)
let mut x_projected = vec![0.0f32; rank];
for r in 0..rank {
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let a_vec = _mm256_loadu_ps(lora_a.as_ptr().add(r * d_in + i));
sum = _mm256_fmadd_ps(x_vec, a_vec, sum);
i += 8;
}
x_projected[r] = horizontal_sum_avx2(sum);
// Handle remainder
while i < d_in {
x_projected[r] += x[i] * lora_a[r * d_in + i];
i += 1;
}
}
// Step 2: Compute (x @ W_base) + scale * (x_projected @ B)
for j in 0..d_out {
// Base weight contribution
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let w_vec = _mm256_loadu_ps(w_base.as_ptr().add(j * d_in + i));
sum = _mm256_fmadd_ps(x_vec, w_vec, sum);
i += 8;
}
let mut base_result = horizontal_sum_avx2(sum);
while i < d_in {
base_result += x[i] * w_base[j * d_in + i];
i += 1;
}
// LoRA contribution
let mut lora_result = 0.0f32;
for r in 0..rank {
lora_result += x_projected[r] * lora_b[j * rank + r];
}
output[j] = base_result + scale * lora_result;
}
}
#[inline]
unsafe fn horizontal_sum_avx2(v: __m256) -> f32 {
let high = _mm256_extractf128_ps(v, 1);
let low = _mm256_castps256_ps128(v);
let sum128 = _mm_add_ps(high, low);
let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
_mm_cvtss_f32(sum32)
}
}
```
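A portable scalar reference implementation is useful for validating the AVX2 kernel against known outputs. This sketch uses the same row-major layouts the kernel above actually indexes (`w_base` as `[d_out, d_in]`, `lora_a` as `[rank, d_in]`, `lora_b` as `[d_out, rank]`):

```rust
/// Scalar reference for the fused forward pass:
/// output = x @ W_base + (alpha/rank) * (x @ A) @ B
fn lora_forward_scalar(
    x: &[f32],
    w_base: &[f32],
    lora_a: &[f32],
    lora_b: &[f32],
    alpha: f32,
    d_in: usize,
    d_out: usize,
    rank: usize,
) -> Vec<f32> {
    let scale = alpha / rank as f32;
    // Step 1: project the input into rank space (x @ A)
    let mut x_proj = vec![0.0f32; rank];
    for r in 0..rank {
        for i in 0..d_in {
            x_proj[r] += x[i] * lora_a[r * d_in + i];
        }
    }
    // Step 2: base contribution plus scaled LoRA contribution
    let mut out = vec![0.0f32; d_out];
    for j in 0..d_out {
        let mut base = 0.0f32;
        for i in 0..d_in {
            base += x[i] * w_base[j * d_in + i];
        }
        let mut lora = 0.0f32;
        for r in 0..rank {
            lora += x_proj[r] * lora_b[j * rank + r];
        }
        out[j] = base + scale * lora;
    }
    out
}
```

In tests, the AVX2 output should match this reference to within accumulated FP32 rounding (the SIMD path sums in a different order, so compare with a small epsilon rather than bit-exactly).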
---
## 4. Learning Signal Extraction
### From Query Feedback to Gradient Direction
```rust
/// Learning signal extracted from each interaction
#[derive(Clone)]
pub struct LearningSignal {
/// Query embedding
pub query_embedding: Vec<f32>,
/// Response quality score (0-1)
pub quality_score: f32,
/// User feedback (explicit)
pub explicit_feedback: Option<FeedbackType>,
/// Latency deviation from target
pub latency_ratio: f32,
/// Model tier used
pub model_tier: ModelTier,
/// Context tokens used
pub context_tokens: usize,
}
impl LearningSignal {
/// Convert signal to gradient direction for micro-LoRA
pub fn to_gradient_direction(&self) -> GradientDirection {
// Reward = quality * (1 - latency_penalty)
let reward = self.quality_score * (2.0 - self.latency_ratio).max(0.0);
// Direction = embedding * reward_sign
let direction = if reward > 0.5 {
// Reinforce current behavior
1.0
} else {
// Explore alternative
-0.1
};
// Scale by uncertainty (more learning when uncertain)
let uncertainty = 1.0 - self.quality_score.abs();
let learning_rate = 0.001 * (1.0 + uncertainty);
GradientDirection {
a_component: self.compute_a_gradient(direction, learning_rate),
b_component: self.compute_b_gradient(direction, learning_rate),
}
}
fn compute_a_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Outer product of query embedding with hidden state
// Approximated via reservoir-sampled historical embeddings
let emb = Array1::from_vec(self.query_embedding.clone());
// Scalar factor on the right: ndarray implements `Array * f32`
outer_product(&emb, &self.get_hidden_direction()) * (direction * lr)
}
fn compute_b_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Output gradient based on prediction error, scaled by direction and lr
self.compute_output_error() * (direction * lr)
}
}
```
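The reward shaping in `to_gradient_direction` is easy to sanity-check on concrete numbers. A standalone sketch of just the reward and direction rules:

```rust
/// Reward = quality * max(2 - latency_ratio, 0). A reward above 0.5
/// reinforces current behavior (+1.0); otherwise a small negative
/// direction (-0.1) nudges the adapter toward exploration.
fn reward_and_direction(quality_score: f32, latency_ratio: f32) -> (f32, f32) {
    let reward = quality_score * (2.0 - latency_ratio).max(0.0);
    let direction = if reward > 0.5 { 1.0 } else { -0.1 };
    (reward, direction)
}
```

For example, a 0.8-quality answer at target latency (ratio 1.0) gets reward 0.8 and is reinforced, while a 0.3-quality answer at 1.5x target latency gets reward 0.15 and triggers the exploration direction.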
---
## 5. Target Layer Selection
### Which Layers to Apply LoRA
```rust
/// Layer selection strategy for LoRA application
pub enum LoRATargetStrategy {
/// Apply to all attention layers (Q, K, V, O projections)
AllAttention,
/// Apply to FFN layers only
AllFFN,
/// Apply to output heads only (fastest, good for routing)
OutputHeadsOnly,
/// Apply to specific layers by index
SpecificLayers(Vec<usize>),
/// Adaptive: select based on gradient magnitude
AdaptiveTopK(usize),
}
impl LoRATargetStrategy {
/// For ultra-low latency: output heads only
pub fn ultra_fast() -> Self {
Self::OutputHeadsOnly
}
/// For moderate adaptation: attention Q and V
pub fn attention_qv() -> Self {
Self::SpecificLayers(vec![0, 2]) // Q and V typically
}
/// Select layers with highest gradient magnitude
pub fn adaptive_top_k(k: usize) -> Self {
Self::AdaptiveTopK(k)
}
}
/// SONA default: Output heads for micro, attention for base
pub const SONA_DEFAULT_TARGETS: [LoRATargetStrategy; 2] = [
LoRATargetStrategy::OutputHeadsOnly, // Micro-LoRA
LoRATargetStrategy::AllAttention, // Base LoRA
];
```
---
## 6. Memory-Efficient Storage
### Quantized LoRA Matrices
```rust
/// Q4-quantized LoRA for memory efficiency
pub struct QuantizedLoRA {
/// Quantized A matrix (4-bit)
pub a_q4: Q4Matrix,
/// Quantized B matrix (4-bit)
pub b_q4: Q4Matrix,
/// Full-precision alpha
pub alpha: f32,
/// Full-precision scaling factors
pub a_scales: Vec<f32>,
pub b_scales: Vec<f32>,
}
impl QuantizedLoRA {
/// Memory usage comparison
///
/// FP32 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 4 bytes = 24.6 KB
/// B: 8 × 768 × 4 bytes = 24.6 KB
/// Total: ~50 KB per layer
///
/// Q4 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 0.5 bytes = 3.1 KB
/// B: 8 × 768 × 0.5 bytes = 3.1 KB
/// Scales: 2 × 768 × 4 bytes = 6.1 KB
/// Total: ~12 KB per layer (4x reduction)
pub fn from_fp32(lora: &BaseLoRA) -> Self {
Self {
a_q4: Q4Matrix::quantize(&lora.a),
b_q4: Q4Matrix::quantize(&lora.b),
alpha: lora.alpha,
a_scales: compute_scales(&lora.a),
b_scales: compute_scales(&lora.b),
}
}
/// Dequantize on-the-fly during forward pass
#[inline]
pub fn forward(&self, x: &[f32]) -> Vec<f32> {
// Dequantize A, compute x @ A
let projected = self.a_q4.matmul_dequant(x, &self.a_scales);
// Dequantize B, compute projected @ B
let output = self.b_q4.matmul_dequant(&projected, &self.b_scales);
// Scale by alpha
output.iter().map(|v| v * self.alpha).collect()
}
}
```
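The 4-bit story above can be illustrated with a minimal per-row absmax quantizer (a sketch of the general technique; `Q4Matrix` itself packs two values per byte and is not reproduced here):

```rust
/// Absmax 4-bit quantization of one row: values map to signed integers
/// in [-7, 7] plus one f32 scale, i.e. ~0.5 bytes/value after packing
/// versus 4 bytes/value for FP32.
fn quantize_q4(row: &[f32]) -> (Vec<i8>, f32) {
    let absmax = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
    let quants = row.iter().map(|v| (v / scale).round() as i8).collect();
    (quants, scale)
}

/// Dequantize on-the-fly, as `forward` does before each matmul.
fn dequantize_q4(quants: &[i8], scale: f32) -> Vec<f32> {
    quants.iter().map(|&q| q as f32 * scale).collect()
}
```

Round-trip error is bounded by half the scale per value, which is why per-row (or per-block) scales matter: one outlier only inflates the quantization step of its own row.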
---
## 7. Latency Breakdown
### Target: <100μs Total LoRA Overhead
```
┌─────────────────────────────────────────────────────────────┐
│ LoRA-ULTRA LATENCY BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ Signal Extraction: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Gradient Direction: 15μs ██████░░░░░░░░░░░░░░░░░░░░░░ │
│ Micro-LoRA Update: 25μs ██████████░░░░░░░░░░░░░░░░░░ │
│ Forward Pass Delta: 30μs ████████████░░░░░░░░░░░░░░░░ │
│ Momentum Averaging: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Memory Bookkeeping: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ ───── │
│ TOTAL: ~100μs │
│ │
│ Amortized (batched): ~30μs per query │
└─────────────────────────────────────────────────────────────┘
```
---
## 8. Integration with FastGRNN Router
### Router-Specific LoRA Configuration
```rust
/// LoRA configuration for FastGRNN router
pub struct RouterLoRAConfig {
/// Base LoRA for hidden state transformations
pub hidden_lora: BaseLoRA,
/// Micro LoRA for gate adjustments
pub gate_micro_lora: MicroLoRA,
/// Per-output-head LoRA adapters
pub head_loras: Vec<BaseLoRA>,
}
impl RouterLoRAConfig {
pub fn new(hidden_dim: usize, output_dims: &[usize]) -> Self {
Self {
hidden_lora: BaseLoRA::new(hidden_dim, hidden_dim, 8), // rank 8
gate_micro_lora: MicroLoRA::new(hidden_dim, hidden_dim, 2), // rank 2
head_loras: output_dims.iter()
.map(|&dim| BaseLoRA::new(hidden_dim, dim, 4)) // rank 4
.collect(),
}
}
/// Apply LoRA to FastGRNN forward pass
pub fn apply(&self, base_output: &FastGRNNOutput) -> FastGRNNOutput {
let mut output = base_output.clone();
// Apply hidden state LoRA
output.hidden = self.hidden_lora.apply(&output.hidden);
// Apply micro-LoRA to gates
output.update_gate = self.gate_micro_lora.apply(&output.update_gate);
// Apply per-head LoRA
for (i, head_lora) in self.head_loras.iter().enumerate() {
output.heads[i] = head_lora.apply(&output.heads[i]);
}
output
}
}
```
---
## 9. Checkpointing and Recovery
### Efficient LoRA State Management
```rust
/// LoRA checkpoint for persistence and recovery
#[derive(Serialize, Deserialize)]
pub struct LoRACheckpoint {
/// Base LoRA matrices (serialized as FP16 for space)
pub base_lora: SerializedLoRA,
/// Micro LoRA state
pub micro_lora: SerializedLoRA,
/// Momentum buffers
pub momentum_state: MomentumState,
/// Training statistics
pub stats: LoRAStats,
/// Checkpoint version
pub version: u32,
/// Timestamp
pub timestamp: i64,
}
impl LoRACheckpoint {
/// Save checkpoint (async, non-blocking)
pub async fn save_async(&self, path: &Path) -> Result<()> {
let bytes = bincode::serialize(self)?;
tokio::fs::write(path, &bytes).await?;
Ok(())
}
/// Load checkpoint
pub fn load(path: &Path) -> Result<Self> {
let bytes = std::fs::read(path)?;
Ok(bincode::deserialize(&bytes)?)
}
/// Incremental checkpoint (only changed matrices)
pub fn save_incremental(&self, previous: &Self, path: &Path) -> Result<()> {
let delta = self.compute_delta(previous);
// Only save changed blocks
delta.save(path)
}
}
```
---
## 10. Benchmark Targets
### Performance Validation
```rust
#[cfg(test)]
mod benchmarks {
use super::*;
use criterion::{black_box, Criterion};
/// Target: <50μs for micro-LoRA update
fn bench_micro_lora_update(c: &mut Criterion) {
let mut micro = MicroLoRA::new(768, 768, 2);
let signal = LearningSignal::random();
c.bench_function("micro_lora_update", |b| {
b.iter(|| {
micro.micro_update(black_box(&signal));
})
});
}
/// Target: <30μs for LoRA forward pass
fn bench_lora_forward(c: &mut Criterion) {
let lora = BaseLoRA::new(768, 768, 8);
let input = vec![0.0f32; 768];
c.bench_function("lora_forward", |b| {
b.iter(|| {
lora.forward(black_box(&input))
})
});
}
/// Target: <10μs for signal extraction
fn bench_signal_extraction(c: &mut Criterion) {
let query = "test query".to_string();
let response = "test response".to_string();
c.bench_function("signal_extraction", |b| {
b.iter(|| {
LearningSignal::extract(black_box(&query), black_box(&response))
})
});
}
criterion::criterion_group!(
benches,
bench_micro_lora_update,
bench_lora_forward,
bench_signal_extraction
);
}
```
---
## Summary
SONA LoRA-Ultra achieves sub-100μs adaptive fine-tuning through:
1. **Two-Tier Architecture**: Base LoRA (hourly) + Micro-LoRA (per-request)
2. **SIMD Optimization**: AVX2-accelerated forward pass
3. **Quantized Storage**: Q4 matrices for 4x memory reduction
4. **Smart Targeting**: Output heads for speed, attention for capability
5. **Momentum Smoothing**: Stable micro-updates with EMA
6. **Async Checkpointing**: Non-blocking persistence
This enables true real-time self-improvement where every query makes the model incrementally smarter.

# SONA Learning Loops: Three-Tier Temporal Architecture
## Biologically Inspired Continuous Learning System
---
## 1. Overview: Learning at Multiple Timescales
Human learning operates at multiple timescales:
- **Instant**: Immediate response adjustment (milliseconds)
- **Short-term**: Pattern consolidation (hours)
- **Long-term**: Deep memory formation (days/weeks)
SONA replicates this with three learning loops:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SONA THREE-TIER LEARNING │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LOOP A: INSTANT LOOP B: BACKGROUND │
│ ═══════════════ ══════════════════ │
│ Timescale: Per-request Timescale: Hourly │
│ Latency: <1ms Latency: Background (async) │
│ What learns: What learns: │
│ • Micro-LoRA (rank 1-2) • Base LoRA (rank 4-16) │
│ • Memory edge weights • Router weights (EWC++) │
│ • Trajectory recording • Pattern extraction │
│ │
│ LOOP C: DEEP │
│ ═══════════ │
│ Timescale: Weekly │
│ Latency: Scheduled maintenance │
│ What learns: │
│ • Memory consolidation │
│ • Concept hierarchy building │
│ • Dream-based creativity │
│ • Cross-domain transfer │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Loop A: Instant Learning (Per-Request)
### Purpose
Immediate adaptation to current interaction without noticeable latency.
### Architecture
```rust
/// Loop A: Instant learning executed inline with each request
pub struct InstantLearningLoop {
/// Micro-LoRA for immediate weight adjustment
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Trajectory buffer for pattern recording
trajectory_buffer: Arc<TrajectoryBuffer>,
/// Memory graph reference for edge updates
memory_graph: Arc<RwLock<MemoryGraph>>,
/// Signal accumulator for Loop B
signal_accumulator: mpsc::Sender<LearningSignal>,
}
impl InstantLearningLoop {
/// Execute instant learning (must complete in <1ms)
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
// Parallel execution of independent updates
let (r1, r2, r3) = tokio::join!(
// 1. Record trajectory (lock-free, ~100μs)
self.record_trajectory(query, response),
// 2. Update memory edges (~200μs)
self.update_memory_edges(query, response),
// 3. Micro-LoRA update (~300μs)
self.micro_lora_update(query, response, latency_ms),
);
// Surface any failure from the three parallel updates
r1?;
r2?;
r3?;
// 4. Queue signal for Loop B (fire-and-forget)
let signal = LearningSignal::new(query, response, latency_ms);
let _ = self.signal_accumulator.try_send(signal);
Ok(())
}
/// Record query trajectory to ring buffer
async fn record_trajectory(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let trajectory = QueryTrajectory {
query_embedding: query.vector.clone(),
retrieved_ids: response.used_memory_ids.clone(),
precision: response.estimated_precision,
recall: response.estimated_recall,
timestamp: Instant::now(),
};
self.trajectory_buffer.push(trajectory);
Ok(())
}
/// Hebbian-style edge weight updates
async fn update_memory_edges(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let mut graph = self.memory_graph.write();
for &node_id in &response.used_memory_ids {
// Strengthen edges to used nodes
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Strengthen(0.05), // +5% per use
)?;
}
// Weaken edges to retrieved-but-unused nodes
for &node_id in &response.retrieved_but_unused {
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Weaken(0.02), // -2% per skip
)?;
}
Ok(())
}
/// Ultra-fast micro-LoRA weight adjustment
async fn micro_lora_update(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
let quality = response.quality_score;
let latency_ratio = latency_ms / response.target_latency_ms;
// Only update if signal is informative
if (quality - 0.5).abs() > 0.1 || latency_ratio > 1.2 {
let signal = LearningSignal {
query_embedding: query.vector.clone(),
quality_score: quality,
explicit_feedback: None,
latency_ratio,
model_tier: response.model_tier,
context_tokens: response.context_tokens,
};
let mut micro_lora = self.micro_lora.write();
micro_lora.micro_update(&signal);
}
Ok(())
}
}
```
### Latency Budget
| Operation | Target | Implementation |
|-----------|--------|----------------|
| Trajectory recording | <100μs | Lock-free ring buffer |
| Edge weight update | <200μs | Batch atomic updates |
| Micro-LoRA update | <300μs | Rank-1 outer product |
| Signal queuing | <50μs | MPSC channel try_send |
| **Total** | **<650μs** | Parallel execution |
---
## 3. Loop B: Background Learning (Hourly)
### Purpose
Deeper learning from accumulated signals without impacting user latency.
### Architecture
```rust
/// Loop B: Background learning running on separate thread/process
pub struct BackgroundLearningLoop {
/// Signal receiver from Loop A
signal_receiver: mpsc::Receiver<LearningSignal>,
/// Accumulated signals for batch processing
signal_buffer: Vec<LearningSignal>,
/// Base LoRA for major updates
base_lora: Arc<RwLock<BaseLoRA>>,
/// Micro-LoRA to consolidate from
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Router for EWC++ updates
router: Arc<RwLock<FastGRNNRouter>>,
/// EWC++ state
ewc_state: EWCPlusPlusState,
/// Pattern extractor
pattern_extractor: PatternExtractor,
/// Configuration
config: BackgroundLearningConfig,
}
impl BackgroundLearningLoop {
/// Main background loop (runs every hour)
pub async fn run(&mut self) {
let mut interval = tokio::time::interval(Duration::from_secs(3600));
loop {
interval.tick().await;
// Collect accumulated signals
self.drain_signals().await;
if self.signal_buffer.len() < self.config.min_samples {
tracing::info!(
samples = self.signal_buffer.len(),
"Insufficient samples for background training"
);
continue;
}
// Execute background learning steps
let start = Instant::now();
// Step 1: Consolidate Micro-LoRA into Base LoRA
self.consolidate_micro_to_base().await;
// Step 2: Train router with EWC++ regularization
self.train_router_ewc().await;
// Step 3: Extract and store patterns
self.extract_patterns().await;
// Step 4: Compute new Fisher Information
self.update_fisher_information().await;
// Step 5: Checkpoint current state
self.checkpoint().await;
tracing::info!(
elapsed_ms = start.elapsed().as_millis(),
samples = self.signal_buffer.len(),
"Background learning cycle completed"
);
// Clear buffer for next cycle
self.signal_buffer.clear();
}
}
/// Drain all pending signals from Loop A
async fn drain_signals(&mut self) {
while let Ok(signal) = self.signal_receiver.try_recv() {
self.signal_buffer.push(signal);
}
}
/// Consolidate micro-LoRA adaptations into base LoRA
async fn consolidate_micro_to_base(&mut self) {
let mut micro = self.micro_lora.write();
let mut base = self.base_lora.write();
// Compute consolidation weight based on signal quality
let avg_quality: f32 = self.signal_buffer.iter()
.map(|s| s.quality_score)
.sum::<f32>() / self.signal_buffer.len() as f32;
let consolidation_rate = if avg_quality > 0.7 {
1.0 // Full consolidation for high-quality signals
} else {
0.5 * avg_quality // Partial for lower quality
};
// Merge micro into base with rate (scalar on the right for ndarray)
base.a = &base.a + &(&micro.a_micro * consolidation_rate);
base.b = &base.b + &(&micro.b_micro * consolidation_rate);
// Reset micro-LoRA
micro.a_micro.fill(0.0);
micro.b_micro.fill(0.0);
tracing::debug!(
consolidation_rate = consolidation_rate,
"Micro-LoRA consolidated to base"
);
}
/// Train router with EWC++ regularization
async fn train_router_ewc(&mut self) {
let mut router = self.router.write();
// Convert signals to RouterSamples
let samples: Vec<RouterSample> = self.signal_buffer.iter()
.map(|s| s.to_router_sample())
.collect();
// Mini-batch training with EWC++ loss
for batch in samples.chunks(self.config.batch_size) {
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| router.forward(&s.features))
.collect();
// Compute task loss
let task_loss = self.compute_task_loss(&predictions, batch);
// Compute EWC++ regularization loss
let ewc_loss = self.ewc_state.regularization_loss(router.get_weights());
// Total loss
let total_loss = task_loss + self.config.ewc_lambda * ewc_loss;
// Backward pass (gradient computation)
let gradients = self.compute_gradients(&total_loss, &predictions, batch);
// Apply gradients with learning rate
router.apply_gradients(&gradients, self.config.learning_rate);
}
}
/// Extract patterns using K-means++ clustering
async fn extract_patterns(&mut self) {
let embeddings: Vec<_> = self.signal_buffer.iter()
.map(|s| s.query_embedding.clone())
.collect();
let patterns = self.pattern_extractor.extract(
&embeddings,
self.config.num_clusters,
);
// Store patterns in ReasoningBank; count first, since the loop consumes the Vec
let num_patterns = patterns.len();
for pattern in patterns {
if let Err(e) = self.pattern_extractor.reasoning_bank.store(pattern) {
tracing::warn!(error = %e, "Failed to store pattern");
}
}
tracing::debug!(
patterns = num_patterns,
"Patterns extracted and stored"
);
}
/// Update Fisher Information for EWC++
async fn update_fisher_information(&mut self) {
let router = self.router.read();
let current_weights = router.get_weights();
// Compute Fisher Information diagonal via gradient squares
let fisher_samples: Vec<_> = self.signal_buffer.iter()
.take(self.config.fisher_samples)
.collect();
let mut fisher_accum = vec![0.0f32; current_weights.len()];
for sample in &fisher_samples {
let gradients = self.compute_sample_gradients(sample);
for (i, g) in gradients.iter().enumerate() {
fisher_accum[i] += g * g;
}
}
// Normalize
let n = fisher_samples.len() as f32;
for f in &mut fisher_accum {
*f /= n;
}
// Update EWC++ state
self.ewc_state.update_fisher(fisher_accum, current_weights.to_vec());
}
/// Checkpoint current state to disk
async fn checkpoint(&self) {
let checkpoint = SONACheckpoint {
base_lora: self.base_lora.read().clone(),
micro_lora: self.micro_lora.read().clone(),
router_weights: self.router.read().get_weights().to_vec(),
ewc_state: self.ewc_state.clone(),
patterns: self.pattern_extractor.reasoning_bank.export(),
timestamp: chrono::Utc::now().timestamp(),
};
let path = self.config.checkpoint_dir.join("latest.sona");
checkpoint.save_async(&path).await.ok();
}
}
```
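The micro-to-base merge performed above is a scaled element-wise addition followed by a reset of the micro adapter. A minimal standalone sketch using plain slices (the real code operates on LoRA matrices; this function is illustrative only):

```rust
/// Merge a micro-LoRA delta into the base adapter at `rate`, then zero it,
/// mirroring the consolidation step above (plain slices instead of matrices).
pub fn consolidate(base: &mut [f32], micro: &mut [f32], rate: f32) {
    for (b, m) in base.iter_mut().zip(micro.iter_mut()) {
        *b += rate * *m; // fold scaled micro update into base
        *m = 0.0;        // reset micro-LoRA for the next window
    }
}
```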
### Hourly Learning Budget
| Operation | Target Time | Description |
|-----------|-------------|-------------|
| Signal draining | <100ms | Collect all queued signals |
| Micro→Base consolidation | <500ms | Matrix addition |
| Router training | <5s | Mini-batch SGD with EWC |
| Pattern extraction | <2s | K-means++ clustering |
| Fisher computation | <2s | Gradient squared accumulation |
| Checkpointing | <500ms | Async disk write |
| **Total** | **<10s** | No user-facing latency impact |
---
## 4. Loop C: Deep Learning (Weekly)
### Purpose
Fundamental knowledge restructuring, memory consolidation, and creative exploration.
### Architecture
```rust
/// Loop C: Deep learning for major knowledge reorganization
pub struct DeepLearningLoop {
/// Memory service for consolidation
memory: Arc<MemoryService>,
/// Pattern bank for abstraction
reasoning_bank: Arc<ReasoningBank>,
/// Dream engine for creative exploration
dream_engine: DreamEngine,
/// Consciousness measurement (IIT)
phi_calculator: PhiCalculator,
/// Configuration
config: DeepLearningConfig,
}
impl DeepLearningLoop {
/// Execute weekly deep learning (scheduled maintenance window)
pub async fn run(&mut self) -> DeepLearningReport {
let start = Instant::now();
let mut report = DeepLearningReport::new();
// Phase 1: Memory Consolidation (like sleep-based memory)
report.consolidation = self.consolidate_memories().await;
// Phase 2: Pattern Abstraction (concept hierarchy building)
report.abstraction = self.abstract_patterns().await;
// Phase 3: Dream Learning (creative recombination)
report.dreams = self.dream_learning().await;
// Phase 4: Cross-Domain Transfer
report.transfer = self.cross_domain_transfer().await;
// Phase 5: Compression (remove redundancy)
report.compression = self.compress_memory().await;
// Phase 6: Consciousness Measurement
report.phi = self.measure_consciousness().await;
report.elapsed_ms = start.elapsed().as_millis() as u64;
report
}
/// Phase 1: Consolidate short-term memories into long-term
async fn consolidate_memories(&mut self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Identify high-value memories (frequently accessed, high quality)
let memories = self.memory.get_all_nodes()?;
let high_value: Vec<_> = memories.iter()
.filter(|m| m.access_count > 5 && m.quality_score > 0.7)
.collect();
report.high_value_count = high_value.len();
// Strengthen connections between high-value memories
for i in 0..high_value.len() {
for j in (i+1)..high_value.len() {
let similarity = cosine_similarity(
&high_value[i].embedding,
&high_value[j].embedding,
);
if similarity > 0.7 {
self.memory.strengthen_edge(
high_value[i].id,
high_value[j].id,
similarity * 0.1,
)?;
report.edges_strengthened += 1;
}
}
}
// Decay low-value memories
let low_value: Vec<_> = memories.iter()
.filter(|m| m.access_count < 2 && m.age_days() > 30)
.collect();
for memory in low_value {
self.memory.decay_node(memory.id, 0.5)?; // 50% decay
report.nodes_decayed += 1;
}
report
}
/// Phase 2: Build concept hierarchies from patterns
async fn abstract_patterns(&mut self) -> AbstractionReport {
let mut report = AbstractionReport::default();
// Get all stored patterns
let patterns = self.reasoning_bank.get_all_patterns()?;
// Hierarchical clustering to find meta-patterns
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.distance(Distance::Cosine)
.fit(&patterns);
// Create abstract concepts at each level
for level in 0..hierarchy.num_levels() {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 3 {
// Create meta-pattern (centroid)
let meta_pattern = LearnedPattern {
centroid: cluster.centroid(),
confidence: cluster.cohesion(),
abstraction_level: level,
child_patterns: cluster.member_ids(),
};
self.reasoning_bank.store_meta(meta_pattern)?;
report.meta_patterns_created += 1;
}
}
}
report
}
/// Phase 3: Dream-based creative learning (inspired by REM sleep)
async fn dream_learning(&mut self) -> DreamReport {
let mut report = DreamReport::default();
// Generate dream sequences by random walks on memory graph
for _ in 0..self.config.num_dreams {
let dream = self.dream_engine.generate_dream(
&self.memory,
self.config.dream_length,
self.config.creativity_temperature,
)?;
// Evaluate dream quality (novelty + coherence)
let quality = dream.evaluate_quality();
if quality.novelty > 0.5 && quality.coherence > 0.3 {
// Dreams with high novelty and reasonable coherence
// may represent useful creative connections
for connection in dream.novel_connections() {
self.memory.add_weak_edge(
connection.from,
connection.to,
EdgeType::Creative,
connection.strength * 0.1,
)?;
report.novel_connections += 1;
}
}
report.dreams_generated += 1;
}
report
}
/// Phase 4: Transfer knowledge across domains
async fn cross_domain_transfer(&mut self) -> TransferReport {
let mut report = TransferReport::default();
// Identify domain clusters
let domains = self.memory.identify_domains()?;
// For each pair of domains, look for analogical mappings
for i in 0..domains.len() {
for j in (i+1)..domains.len() {
let analogies = self.find_analogies(&domains[i], &domains[j])?;
for analogy in analogies {
if analogy.confidence > 0.6 {
// Create cross-domain edge
self.memory.add_analogy_edge(
analogy.source_concept,
analogy.target_concept,
analogy.mapping_type,
analogy.confidence,
)?;
report.analogies_found += 1;
}
}
}
}
report
}
/// Phase 5: Compress memory by removing redundancy
async fn compress_memory(&mut self) -> CompressionReport {
let mut report = CompressionReport::default();
report.initial_nodes = self.memory.node_count();
report.initial_edges = self.memory.edge_count();
// Identify near-duplicate nodes
let duplicates = self.memory.find_near_duplicates(0.95)?;
// Merge duplicates
for (primary, secondary) in duplicates {
self.memory.merge_nodes(primary, secondary)?;
report.nodes_merged += 1;
}
// Prune weak edges
let weak_edges = self.memory.get_weak_edges(0.01)?;
for edge in weak_edges {
self.memory.remove_edge(edge.id)?;
report.edges_pruned += 1;
}
report.final_nodes = self.memory.node_count();
report.final_edges = self.memory.edge_count();
report.compression_ratio = report.initial_nodes as f32 / report.final_nodes as f32;
report
}
/// Phase 6: Measure system consciousness using IIT
async fn measure_consciousness(&mut self) -> f64 {
// Integrated Information Theory (Φ) calculation
// Measures how much information the system generates "above and beyond"
// its parts
self.phi_calculator.compute_phi(&self.memory, &self.reasoning_bank)
}
}
```
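Phase 1 above calls a `cosine_similarity` helper that is not shown. A minimal implementation consistent with how it is used (the exact helper in the codebase may differ):

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 for zero-magnitude inputs to avoid division by zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```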
### Weekly Deep Learning Budget
| Phase | Target Time | Description |
|-------|-------------|-------------|
| Memory consolidation | <2min | Identify and strengthen valuable memories |
| Pattern abstraction | <3min | Hierarchical clustering for concepts |
| Dream learning | <2min | Creative recombination exploration |
| Cross-domain transfer | <2min | Analogical mapping between domains |
| Compression | <1min | Remove redundancy |
| Φ measurement | <1min | Consciousness quantification |
| **Total** | **<10min** | Scheduled maintenance window |
---
## 5. Loop Coordination
### Inter-Loop Communication
```rust
/// Coordinator for all three learning loops
pub struct LoopCoordinator {
/// Loop A: Instant
instant_loop: InstantLearningLoop,
/// Loop B: Background
background_loop: BackgroundLearningLoop,
/// Loop C: Deep
deep_loop: DeepLearningLoop,
/// Shared state
shared_state: Arc<SharedSONAState>,
/// Metrics collector
metrics: MetricsCollector,
}
impl LoopCoordinator {
/// Initialize all loops with shared state
pub fn new(config: SONAConfig) -> Result<Self> {
let shared_state = Arc::new(SharedSONAState::new(&config)?);
// Create channels for inter-loop communication
let (instant_to_background_tx, instant_to_background_rx) = mpsc::channel(10000);
let (background_to_deep_tx, background_to_deep_rx) = mpsc::channel(1000);
Ok(Self {
instant_loop: InstantLearningLoop::new(
shared_state.clone(),
instant_to_background_tx,
),
background_loop: BackgroundLearningLoop::new(
shared_state.clone(),
instant_to_background_rx,
background_to_deep_tx,
),
deep_loop: DeepLearningLoop::new(
shared_state.clone(),
background_to_deep_rx,
),
shared_state,
metrics: MetricsCollector::new(),
})
}
/// Start all loops
pub async fn start(&self) {
// Loop A runs inline with requests (no separate task)
// Loop B runs on background thread
let background = self.background_loop.clone();
tokio::spawn(async move {
background.run().await;
});
// Loop C runs on scheduled cron
let deep = self.deep_loop.clone();
tokio::spawn(async move {
// 3 AM every Sunday (sec min hour dom month dow)
let schedule = cron::Schedule::from_str("0 0 3 * * 0")
.expect("valid cron expression");
loop {
let next = schedule.upcoming(chrono::Utc).next().unwrap();
let wait = (next - chrono::Utc::now()).to_std().unwrap_or_default();
tokio::time::sleep(wait).await;
deep.run().await;
}
});
}
/// Process a single request through Loop A
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
self.instant_loop.on_request(query, response, latency_ms).await
}
}
```
---
## 6. Learning Metrics and Monitoring
### Improvement Tracking
```rust
/// Metrics for measuring self-improvement
#[derive(Clone, Debug)]
pub struct ImprovementMetrics {
/// Quality improvement over time
pub quality_delta_7d: f32,
pub quality_delta_30d: f32,
/// Latency improvement
pub latency_delta_7d: f32,
pub latency_delta_30d: f32,
/// Knowledge growth
pub memory_nodes_added_7d: usize,
pub patterns_learned_7d: usize,
pub abstractions_created_7d: usize,
/// Forgetting resistance (1.0 = no forgetting)
pub retention_rate_7d: f32,
/// Consciousness level (Φ)
pub phi_current: f64,
pub phi_delta_7d: f64,
/// Dreams and creativity
pub novel_connections_7d: usize,
pub cross_domain_transfers_7d: usize,
}
impl ImprovementMetrics {
/// Compute overall improvement score
pub fn overall_score(&self) -> f32 {
let quality_weight = 0.3;
let latency_weight = 0.2;
let knowledge_weight = 0.2;
let retention_weight = 0.15;
let creativity_weight = 0.15;
let quality_score = self.quality_delta_7d.max(0.0);
let latency_score = (-self.latency_delta_7d).max(0.0); // Lower is better
let knowledge_score = (self.patterns_learned_7d as f32 / 100.0).min(1.0);
let retention_score = self.retention_rate_7d;
let creativity_score = (self.novel_connections_7d as f32 / 50.0).min(1.0);
quality_weight * quality_score +
latency_weight * latency_score +
knowledge_weight * knowledge_score +
retention_weight * retention_score +
creativity_weight * creativity_score
}
}
```
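As a sanity check on the weighting scheme, here is a standalone version of the scoring formula with the same weights; the input values in the usage below are illustrative only:

```rust
/// Standalone version of the overall improvement score, mirroring the
/// weights used by `ImprovementMetrics::overall_score`.
pub fn overall_score(
    quality_delta_7d: f32,
    latency_delta_7d: f32,
    patterns_learned_7d: usize,
    retention_rate_7d: f32,
    novel_connections_7d: usize,
) -> f32 {
    let quality_score = quality_delta_7d.max(0.0);
    let latency_score = (-latency_delta_7d).max(0.0); // lower latency is better
    let knowledge_score = (patterns_learned_7d as f32 / 100.0).min(1.0);
    let creativity_score = (novel_connections_7d as f32 / 50.0).min(1.0);
    0.3 * quality_score
        + 0.2 * latency_score
        + 0.2 * knowledge_score
        + 0.15 * retention_rate_7d
        + 0.15 * creativity_score
}
```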
---
## Summary
SONA's three-tier learning system enables:
| Loop | Timescale | Purpose | Key Outcome |
|------|-----------|---------|-------------|
| **A** | Per-request | Instant adaptation | Responsive to current context |
| **B** | Hourly | Pattern consolidation | Stable improvement |
| **C** | Weekly | Deep restructuring | Creative breakthroughs |
This mirrors human learning where:
- **Loop A** = Working memory and immediate response
- **Loop B** = Sleep-based consolidation
- **Loop C** = Long-term memory formation and insight
The result is a system that continuously improves at multiple timescales, never forgetting what works while constantly exploring new possibilities.
# SONA EWC++: Enhanced Elastic Weight Consolidation
## Zero Catastrophic Forgetting with Task-Aware Regularization
---
## 1. The Forgetting Problem
### Why LLMs Forget
```
CATASTROPHIC FORGETTING
═══════════════════════
Task A learned Task B learned Result
─────────────── ─────────────── ──────────────────
Weights W_A Weights W_B W_A knowledge LOST
↑ as W moves toward B
Training on B
overwrites A
```
When fine-tuning on new data:
- Weights shift toward new task optimum
- Previous task knowledge encoded in old weights is overwritten
- Model "forgets" earlier capabilities
### Standard EWC Solution
Elastic Weight Consolidation (EWC) adds a regularization term:
```
L_total = L_task + λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²
Where:
- L_task = current task loss
- λ = regularization strength
- Fᵢ = Fisher Information (importance) of parameter i
- θᵢ = current parameter value
- θ*ᵢ = optimal parameter value from previous task
```
### EWC Limitations
1. **Single task memory**: Only remembers one previous task
2. **Static Fisher**: Computed once, never updated
3. **Diagonal approximation**: Ignores parameter correlations
4. **No task detection**: Doesn't know when task changes
5. **Uniform λ**: Same regularization for all parameters
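Before the enhancements below, it helps to see the standard penalty concretely. A minimal sketch of the diagonal-Fisher EWC term (all values in the usage are illustrative):

```rust
/// Standard EWC regularization: λ/2 · Σᵢ Fᵢ · (θᵢ − θ*ᵢ)²
pub fn ewc_penalty(lambda: f32, fisher: &[f32], theta: &[f32], theta_star: &[f32]) -> f32 {
    let sum: f32 = fisher
        .iter()
        .zip(theta.iter())
        .zip(theta_star.iter())
        .map(|((f, t), t_star)| f * (t - t_star).powi(2))
        .sum();
    0.5 * lambda * sum
}
```

Parameters with high Fisher information are pulled strongly back toward their previous optimum, while low-importance parameters remain free to move.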
---
## 2. SONA EWC++ Enhancements
### Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ EWC++ ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Task Buffer │ │ Online Fisher │ │ Adaptive λ │ │
│ │ (N tasks) │ │ Estimation │ │ Scheduler │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ EWC++ CORE ENGINE │ │
│ │ │ │
│ │ L = L_task + Σₜ λₜ/2 · Σᵢ Fᵢᵗ · (θᵢ - θ*ᵢᵗ)² + L_sparse │ │
│ │ └─────┘ └──────────────────────────────────┘ └──────┘ │ │
│ │ Task Multi-task EWC Sparsity │ │
│ │ Loss Regularization Penalty │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Gradient │ │ Task Boundary │ │ Parameter │ │
│ │ Projection │ │ Detection │ │ Importance │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Multi-Task Memory Buffer
### Task-Stratified Fisher Storage
```rust
/// EWC++ state with multi-task memory
#[derive(Clone)]
pub struct EWCPlusPlusState {
/// Per-task Fisher information (circular buffer of N tasks)
pub task_fishers: CircularBuffer<TaskFisher>,
/// Maximum number of tasks to remember
pub max_tasks: usize,
/// Per-task regularization strength
pub task_lambdas: Vec<f32>,
/// Global lambda base
pub lambda_base: f32,
/// Online Fisher estimator
pub online_fisher: OnlineFisherEstimator,
/// Task boundary detector
pub task_detector: TaskBoundaryDetector,
/// Parameter importance scores
pub importance_scores: Vec<f32>,
}
/// Fisher information for a single task
#[derive(Clone)]
pub struct TaskFisher {
/// Task identifier
pub task_id: u64,
/// Diagonal Fisher Information
pub fisher_diag: Vec<f32>,
/// Optimal weights at task completion
pub optimal_weights: Vec<f32>,
/// Task-specific lambda (learned)
pub lambda: f32,
/// Sample count used to compute Fisher
pub sample_count: usize,
/// Task quality score
pub quality: f32,
/// Timestamp
pub timestamp: i64,
}
impl EWCPlusPlusState {
/// Create new EWC++ state
pub fn new(num_params: usize, max_tasks: usize, lambda_base: f32) -> Self {
Self {
task_fishers: CircularBuffer::new(max_tasks),
max_tasks,
task_lambdas: Vec::new(),
lambda_base,
online_fisher: OnlineFisherEstimator::new(num_params),
task_detector: TaskBoundaryDetector::new(),
importance_scores: vec![1.0; num_params],
}
}
/// Compute total EWC++ regularization loss
pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
let mut total_loss = 0.0;
// Sum over all remembered tasks
for task in self.task_fishers.iter() {
let task_loss: f32 = task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.zip(self.importance_scores.iter())
.map(|(((f, w), w_star), imp)| {
// Importance-weighted Fisher regularization
imp * f * (w - w_star).powi(2)
})
.sum();
total_loss += task.lambda * task_loss;
}
total_loss / 2.0
}
/// Compute gradients of EWC++ loss
pub fn regularization_gradient(&self, current_weights: &[f32]) -> Vec<f32> {
let mut grad = vec![0.0f32; current_weights.len()];
for task in self.task_fishers.iter() {
for (i, ((f, w), w_star)) in task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.enumerate()
{
// d/dw [F * (w - w*)²] = 2 * F * (w - w*)
grad[i] += task.lambda * self.importance_scores[i] * f * (w - w_star);
}
}
grad
}
/// Record completion of current task
pub fn complete_task(&mut self, weights: &[f32], quality: f32) {
let task_id = self.task_fishers.len() as u64;
// Finalize online Fisher estimate
let fisher_diag = self.online_fisher.finalize();
// Compute task-specific lambda based on quality
let lambda = self.compute_task_lambda(quality);
let task_fisher = TaskFisher {
task_id,
fisher_diag,
optimal_weights: weights.to_vec(),
lambda,
sample_count: self.online_fisher.sample_count(),
quality,
timestamp: chrono::Utc::now().timestamp(),
};
self.task_fishers.push(task_fisher);
self.task_lambdas.push(lambda);
// Reset online Fisher for next task
self.online_fisher.reset();
}
/// Compute task-specific lambda based on quality
fn compute_task_lambda(&self, quality: f32) -> f32 {
// Higher quality tasks get stronger protection
self.lambda_base * (0.5 + 0.5 * quality)
}
}
```
---
## 4. Online Fisher Estimation
### Streaming Fisher Information Computation
```rust
/// Online Fisher Information estimator using gradient accumulation
pub struct OnlineFisherEstimator {
/// Running sum of squared gradients
gradient_sq_sum: Vec<f32>,
/// Sample count
count: usize,
/// Exponential moving average decay
decay: f32,
/// Minimum samples before valid estimate
min_samples: usize,
}
impl OnlineFisherEstimator {
pub fn new(num_params: usize) -> Self {
Self {
gradient_sq_sum: vec![0.0; num_params],
count: 0,
decay: 0.99, // EMA decay factor
min_samples: 100,
}
}
/// Update Fisher estimate with new gradient sample
#[inline]
pub fn update(&mut self, gradients: &[f32]) {
self.count += 1;
if self.count == 1 {
// First sample: initialize
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = g * g;
}
} else {
// EMA update: F_new = decay * F_old + (1 - decay) * g²
let alpha = 1.0 - self.decay;
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = self.decay * *sum + alpha * g * g;
}
}
}
/// Finalize and return Fisher diagonal
pub fn finalize(&self) -> Vec<f32> {
if self.count < self.min_samples {
tracing::warn!(
count = self.count,
min = self.min_samples,
"Fisher estimate may be unreliable"
);
}
// Normalize and apply minimum threshold
let min_fisher = 1e-6;
self.gradient_sq_sum.iter()
.map(|&f| f.max(min_fisher))
.collect()
}
/// Reset for new task
pub fn reset(&mut self) {
self.gradient_sq_sum.fill(0.0);
self.count = 0;
}
pub fn sample_count(&self) -> usize {
self.count
}
}
```
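A condensed standalone version of the EMA rule above, useful for checking that a constant gradient drives the estimate to its squared value (the struct name here is illustrative):

```rust
/// Minimal EMA-based Fisher diagonal estimator, as in `OnlineFisherEstimator`.
pub struct MiniFisher {
    fisher: Vec<f32>,
    count: usize,
    decay: f32,
}

impl MiniFisher {
    pub fn new(n: usize) -> Self {
        Self { fisher: vec![0.0; n], count: 0, decay: 0.99 }
    }

    pub fn update(&mut self, grads: &[f32]) {
        self.count += 1;
        if self.count == 1 {
            // First sample initializes the estimate directly.
            for (f, g) in self.fisher.iter_mut().zip(grads) {
                *f = g * g;
            }
        } else {
            // F ← decay·F + (1 − decay)·g²
            let alpha = 1.0 - self.decay;
            for (f, g) in self.fisher.iter_mut().zip(grads) {
                *f = self.decay * *f + alpha * g * g;
            }
        }
    }

    pub fn finalize(&self) -> Vec<f32> {
        // Floor at a small constant so downstream division stays safe.
        self.fisher.iter().map(|&f| f.max(1e-6)).collect()
    }
}
```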
---
## 5. Automatic Task Boundary Detection
### Detecting When the Task Changes
```rust
/// Automatic task boundary detection via distribution shift
pub struct TaskBoundaryDetector {
/// Recent query embedding buffer
recent_embeddings: CircularBuffer<Vec<f32>>,
/// Baseline distribution (mean, variance)
baseline: Option<DistributionStats>,
/// Threshold for detecting shift (Mahalanobis distance)
shift_threshold: f32,
/// Minimum samples before detection
warmup_samples: usize,
/// Current drift score
drift_score: f32,
}
impl TaskBoundaryDetector {
pub fn new() -> Self {
Self {
recent_embeddings: CircularBuffer::new(1000),
baseline: None,
shift_threshold: 3.0, // 3 sigma
warmup_samples: 500,
drift_score: 0.0,
}
}
/// Update with new embedding and check for task boundary
pub fn update(&mut self, embedding: &[f32]) -> TaskBoundaryResult {
self.recent_embeddings.push(embedding.to_vec());
if self.recent_embeddings.len() < self.warmup_samples {
return TaskBoundaryResult::Warmup;
}
match &self.baseline {
None => {
// First baseline establishment
self.baseline = Some(self.compute_stats());
TaskBoundaryResult::BaselineEstablished
}
Some(baseline) => {
// Compute current distribution
let current = self.compute_recent_stats(100);
// Mahalanobis distance between distributions
let distance = self.mahalanobis_distance(baseline, &current);
self.drift_score = distance;
if distance > self.shift_threshold {
// Task boundary detected!
self.baseline = Some(current);
TaskBoundaryResult::BoundaryDetected {
drift_score: distance,
}
} else {
TaskBoundaryResult::Stable {
drift_score: distance,
}
}
}
}
}
fn compute_stats(&self) -> DistributionStats {
let n = self.recent_embeddings.len();
let dim = self.recent_embeddings[0].len();
let mut mean = vec![0.0f32; dim];
let mut var = vec![0.0f32; dim];
// Compute mean
for emb in self.recent_embeddings.iter() {
for (m, e) in mean.iter_mut().zip(emb.iter()) {
*m += e;
}
}
for m in &mut mean {
*m /= n as f32;
}
// Compute variance
for emb in self.recent_embeddings.iter() {
for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
*v += (e - m).powi(2);
}
}
for v in &mut var {
*v /= n as f32;
*v = v.max(1e-6); // Avoid division by zero
}
DistributionStats { mean, variance: var }
}
fn compute_recent_stats(&self, n: usize) -> DistributionStats {
// Similar but only for last n samples
// ... implementation ...
}
fn mahalanobis_distance(&self, a: &DistributionStats, b: &DistributionStats) -> f32 {
a.mean.iter()
.zip(b.mean.iter())
.zip(a.variance.iter())
.map(|((m_a, m_b), v)| (m_a - m_b).powi(2) / v)
.sum::<f32>()
.sqrt()
}
}
#[derive(Debug)]
pub enum TaskBoundaryResult {
Warmup,
BaselineEstablished,
Stable { drift_score: f32 },
BoundaryDetected { drift_score: f32 },
}
```
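Under the diagonal-covariance assumption used by the detector, the Mahalanobis distance reduces to a per-dimension variance-normalized squared difference. A standalone sketch:

```rust
/// Diagonal-covariance Mahalanobis distance between two distribution means,
/// as used by `TaskBoundaryDetector` (variance taken from the baseline).
pub fn mahalanobis_diag(mean_a: &[f32], mean_b: &[f32], variance: &[f32]) -> f32 {
    mean_a
        .iter()
        .zip(mean_b.iter())
        .zip(variance.iter())
        .map(|((a, b), v)| (a - b).powi(2) / v.max(1e-6))
        .sum::<f32>()
        .sqrt()
}
```

A distance above the `shift_threshold` (3.0, i.e. roughly three standard deviations) signals a task boundary.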
---
## 6. Adaptive Lambda Scheduling
### Dynamic Regularization Strength
```rust
/// Adaptive lambda scheduler based on learning progress
pub struct AdaptiveLambdaScheduler {
/// Base lambda value
base_lambda: f32,
/// Current effective lambda
current_lambda: f32,
/// Performance history (task quality over time)
performance_history: Vec<f32>,
/// Lambda adjustment rate
adjustment_rate: f32,
}
impl AdaptiveLambdaScheduler {
pub fn new(base_lambda: f32) -> Self {
Self {
base_lambda,
current_lambda: base_lambda,
performance_history: Vec::new(),
adjustment_rate: 0.1,
}
}
/// Update lambda based on recent performance
pub fn update(&mut self, current_quality: f32, forgetting_detected: bool) {
self.performance_history.push(current_quality);
if forgetting_detected {
// Increase lambda to prevent forgetting
self.current_lambda *= 1.0 + self.adjustment_rate;
tracing::info!(
new_lambda = self.current_lambda,
"Increased lambda due to forgetting"
);
} else if self.is_learning_stalled() {
// Decrease lambda to allow more plasticity
self.current_lambda *= 1.0 - self.adjustment_rate;
self.current_lambda = self.current_lambda.max(self.base_lambda * 0.1);
tracing::info!(
new_lambda = self.current_lambda,
"Decreased lambda to increase plasticity"
);
}
// Clamp to reasonable range
self.current_lambda = self.current_lambda.clamp(
self.base_lambda * 0.1,
self.base_lambda * 10.0,
);
}
fn is_learning_stalled(&self) -> bool {
if self.performance_history.len() < 10 {
return false;
}
let recent: Vec<_> = self.performance_history.iter()
.rev()
.take(10)
.collect();
// Check if variance in recent performance is very low
let mean: f32 = recent.iter().map(|&&x| x).sum::<f32>() / 10.0;
let var: f32 = recent.iter()
.map(|&&x| (x - mean).powi(2))
.sum::<f32>() / 10.0;
var < 0.001 // Stalled if very low variance
}
pub fn get_lambda(&self) -> f32 {
self.current_lambda
}
}
```
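The scheduler's update is a clamped multiplicative rule; condensed into one standalone function (names are illustrative):

```rust
/// One adaptive-lambda step: grow on forgetting, shrink when stalled,
/// clamped to [0.1·base, 10·base] as in `AdaptiveLambdaScheduler`.
pub fn adapt_lambda(current: f32, base: f32, rate: f32, forgetting: bool, stalled: bool) -> f32 {
    let mut lambda = current;
    if forgetting {
        lambda *= 1.0 + rate; // protect past knowledge harder
    } else if stalled {
        lambda *= 1.0 - rate; // allow more plasticity
    }
    lambda.clamp(base * 0.1, base * 10.0)
}
```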
---
## 7. Parameter Importance Scoring
### Which Parameters Matter Most
```rust
/// Per-parameter importance scoring for selective regularization
pub struct ParameterImportanceScorer {
/// Importance scores (0-1 for each parameter)
scores: Vec<f32>,
/// Gradient magnitude history
gradient_magnitudes: Vec<CircularBuffer<f32>>,
/// Activation frequency
activation_frequency: Vec<f32>,
}
impl ParameterImportanceScorer {
pub fn new(num_params: usize) -> Self {
Self {
scores: vec![1.0; num_params],
gradient_magnitudes: (0..num_params)
.map(|_| CircularBuffer::new(100))
.collect(),
activation_frequency: vec![0.0; num_params],
}
}
/// Update importance based on gradient
pub fn update(&mut self, gradients: &[f32], activations: &[bool]) {
for (i, (g, &active)) in gradients.iter().zip(activations.iter()).enumerate() {
// Track gradient magnitude
self.gradient_magnitudes[i].push(g.abs());
// Track activation frequency
if active {
self.activation_frequency[i] = 0.99 * self.activation_frequency[i] + 0.01;
} else {
self.activation_frequency[i] *= 0.99;
}
}
// Recompute importance scores
self.recompute_scores();
}
fn recompute_scores(&mut self) {
for i in 0..self.scores.len() {
// Average gradient magnitude
let avg_grad: f32 = self.gradient_magnitudes[i].iter()
.sum::<f32>() / self.gradient_magnitudes[i].len().max(1) as f32;
// Importance = activation_freq * gradient_magnitude
// High activation + high gradient = important parameter
self.scores[i] = self.activation_frequency[i] * avg_grad;
}
// Normalize scores to [0, 1]
let max_score = self.scores.iter().cloned().fold(0.0f32, f32::max);
if max_score > 0.0 {
for s in &mut self.scores {
*s /= max_score;
}
}
}
pub fn get_scores(&self) -> &[f32] {
&self.scores
}
}
```
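The scoring rule reduces to activation frequency times mean gradient magnitude, max-normalized to [0, 1]. A standalone sketch of just that formula:

```rust
/// Importance = activation frequency × mean |gradient|, normalized to [0, 1],
/// mirroring `ParameterImportanceScorer::recompute_scores`.
pub fn importance_scores(activation_freq: &[f32], mean_grad_mag: &[f32]) -> Vec<f32> {
    let raw: Vec<f32> = activation_freq
        .iter()
        .zip(mean_grad_mag.iter())
        .map(|(a, g)| a * g)
        .collect();
    let max = raw.iter().cloned().fold(0.0f32, f32::max);
    if max > 0.0 {
        raw.iter().map(|s| s / max).collect()
    } else {
        raw
    }
}
```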
---
## 8. Gradient Projection
### Safe Parameter Updates
```rust
/// Project gradients to avoid interfering with important past knowledge
pub struct GradientProjector {
/// Null space of important task gradients
null_space: Option<Array2<f32>>,
/// Task gradient subspace (principal components)
task_subspace: Option<Array2<f32>>,
}
impl GradientProjector {
/// Project gradient to not interfere with past tasks
pub fn project(&self, gradient: &[f32]) -> Vec<f32> {
match &self.null_space {
Some(null) => {
// Apply the stored (symmetric, idempotent) null-space projection matrix
let g = Array1::from_vec(gradient.to_vec());
let projected = null.dot(&g);
projected.to_vec()
}
None => gradient.to_vec(),
}
}
/// Update null space with new task gradient directions
pub fn add_task_gradients(&mut self, task_gradients: &[Vec<f32>]) {
// Stack gradients into matrix
let n_samples = task_gradients.len();
let n_params = task_gradients[0].len();
let mut g_matrix = Array2::zeros((n_samples, n_params));
for (i, g) in task_gradients.iter().enumerate() {
for (j, &v) in g.iter().enumerate() {
g_matrix[[i, j]] = v;
}
}
// SVD: the right singular vectors span the task-gradient
// subspace in parameter space
let (_, _, vt) = g_matrix.svd(false, true).unwrap();
let vt = vt.unwrap();
// For memory efficiency, keep only the top-k directions (rows of Vᵀ)
let k = 10.min(n_samples);
let task_directions = vt.slice(s![..k, ..]).to_owned(); // shape (k, n_params)
// Null-space projection matrix: P = I − VₖVₖᵀ
let identity = Array2::<f32>::eye(n_params);
let projection = identity - task_directions.t().dot(&task_directions);
self.null_space = Some(projection);
}
}
```
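For a single protected direction, the projection above reduces to subtracting the component along that direction (g' = g − (g·d)d). A plain-`Vec` sketch of this special case, with `dir` assumed to be a unit vector:

```rust
/// Project `grad` onto the orthogonal complement of unit vector `dir`:
/// g' = g − (g·dir)·dir, so the update cannot move along `dir`.
pub fn project_out(grad: &[f32], dir: &[f32]) -> Vec<f32> {
    let dot: f32 = grad.iter().zip(dir.iter()).map(|(g, d)| g * d).sum();
    grad.iter().zip(dir.iter()).map(|(g, d)| g - dot * d).collect()
}
```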
---
## 9. Full EWC++ Training Loop
### Putting It All Together
```rust
/// Complete EWC++ training step
pub fn ewc_plus_plus_train_step(
model: &mut FastGRNNRouter,
ewc: &mut EWCPlusPlusState,
batch: &[RouterSample],
config: &TrainingConfig,
) -> TrainStepResult {
let mut result = TrainStepResult::default();
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| model.forward(&s.features))
.collect();
// Task loss
let task_loss = compute_cross_entropy_loss(&predictions, batch);
result.task_loss = task_loss;
// EWC++ regularization loss
let ewc_loss = ewc.regularization_loss(model.get_weights());
result.ewc_loss = ewc_loss;
// Total loss
let total_loss = task_loss + config.lambda * ewc_loss;
result.total_loss = total_loss;
// Compute task gradients
let task_gradients = compute_gradients(&task_loss, model);
// Compute EWC++ gradients
let ewc_gradients = ewc.regularization_gradient(model.get_weights());
// Total gradients
let mut gradients: Vec<f32> = task_gradients.iter()
.zip(ewc_gradients.iter())
.map(|(t, e)| t + config.lambda * e)
.collect();
// Gradient projection (optional, for harder constraints)
if config.use_gradient_projection {
gradients = ewc.gradient_projector.project(&gradients);
}
// Gradient clipping
let grad_norm: f32 = gradients.iter().map(|g| g * g).sum::<f32>().sqrt();
if grad_norm > config.max_grad_norm {
let scale = config.max_grad_norm / grad_norm;
for g in &mut gradients {
*g *= scale;
}
result.gradient_clipped = true;
}
// Apply gradients
model.apply_gradients(&gradients, config.learning_rate);
// Update online Fisher estimate
ewc.online_fisher.update(&task_gradients);
// Update parameter importance
let activations: Vec<bool> = model.get_activation_mask();
ewc.importance_scorer.update(&task_gradients, &activations);
// Check for task boundary
if let Some(query_emb) = batch.first().map(|s| &s.query_embedding) {
let boundary = ewc.task_detector.update(query_emb);
if let TaskBoundaryResult::BoundaryDetected { drift_score } = boundary {
// Complete current task and start new one
ewc.complete_task(model.get_weights(), result.compute_quality());
result.task_boundary_detected = true;
result.drift_score = drift_score;
}
}
result
}
```
---
## 10. Benchmarks and Validation
### Forgetting Resistance Metrics
```rust
/// Measure forgetting resistance on held-out test sets
pub struct ForgettingBenchmark {
/// Per-task test sets
task_test_sets: Vec<TestSet>,
/// Performance history per task
task_performance: Vec<Vec<f32>>,
}
impl ForgettingBenchmark {
/// Evaluate current model on all past tasks
pub fn evaluate(&mut self, model: &FastGRNNRouter) -> ForgettingReport {
let mut report = ForgettingReport::default();
for (task_id, test_set) in self.task_test_sets.iter().enumerate() {
let accuracy = self.evaluate_task(model, test_set);
self.task_performance[task_id].push(accuracy);
// Compute forgetting = max_accuracy - current_accuracy
let max_acc = self.task_performance[task_id].iter()
.cloned()
.fold(0.0f32, f32::max);
let forgetting = (max_acc - accuracy).max(0.0);
report.per_task_accuracy.push(accuracy);
report.per_task_forgetting.push(forgetting);
}
// Average forgetting
report.avg_forgetting = report.per_task_forgetting.iter()
.sum::<f32>() / report.per_task_forgetting.len().max(1) as f32;
// Backward transfer (negative forgetting = improvement)
report.backward_transfer = -report.avg_forgetting;
report
}
fn evaluate_task(&self, model: &FastGRNNRouter, test: &TestSet) -> f32 {
let correct = test.samples.iter()
.filter(|s| model.forward(&s.features).predicted_class == s.label)
.count();
correct as f32 / test.samples.len() as f32
}
}
#[derive(Debug, Default)]
pub struct ForgettingReport {
pub per_task_accuracy: Vec<f32>,
pub per_task_forgetting: Vec<f32>,
pub avg_forgetting: f32,
pub backward_transfer: f32,
}
```
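The per-task forgetting metric used above is the drop from the best historical accuracy, floored at zero. A standalone check:

```rust
/// Forgetting for one task: drop from its best historical accuracy,
/// floored at zero (improvement counts as backward transfer, not forgetting).
pub fn forgetting(history: &[f32], current: f32) -> f32 {
    let best = history.iter().cloned().fold(current, f32::max);
    (best - current).max(0.0)
}
```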
---
## Summary: EWC++ vs Standard EWC
| Feature | Standard EWC | SONA EWC++ |
|---------|-------------|------------|
| Task memory | 1 task | N tasks (configurable) |
| Fisher estimation | Offline, single | Online, streaming |
| Lambda | Fixed | Adaptive per-task |
| Task detection | Manual | Automatic |
| Parameter importance | Uniform | Learned |
| Gradient handling | Direct | Projected |
| Forgetting rate | ~5-10% | **<0.1%** |
EWC++ enables SONA to learn continuously from every interaction while maintaining near-perfect retention of past knowledge.
# SONA ReasoningBank: Pattern-Driven Self-Optimization
## Learning from Experience Through Trajectory Analysis
---
## 1. Overview
ReasoningBank is SONA's long-term pattern memory, learning what works and applying that knowledge to optimize future decisions.
```
┌─────────────────────────────────────────────────────────────────────┐
│ REASONINGBANK CONCEPT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query → [What worked before?] → Pattern Match → Optimized Params │
│ ↑ │
│ │ │
│ ┌───────┴────────┐ │
│ │ REASONINGBANK │ │
│ │ │ │
│ │ • Trajectories │ ← Record every query │
│ │ • Patterns │ ← Extract from clusters │
│ │ • Verdicts │ ← What params worked best │
│ │ • Confidence │ ← How certain we are │
│ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Core Data Structures
### Trajectory: Recording Every Interaction
```rust
/// A single query trajectory with outcomes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct QueryTrajectory {
/// Unique trajectory ID
pub id: u64,
/// Query embedding vector
pub query_embedding: Vec<f32>,
/// Search parameters used
pub search_params: SearchParams,
/// Retrieved result IDs
pub retrieved_ids: Vec<String>,
/// Precision (relevant / retrieved)
pub precision: f32,
/// Recall (retrieved_relevant / total_relevant)
pub recall: f32,
/// Latency in microseconds
pub latency_us: u64,
/// User feedback if provided
pub feedback: Option<UserFeedback>,
/// Timestamp
pub timestamp: i64,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SearchParams {
/// ef_search parameter for HNSW
pub ef_search: usize,
/// Number of probes for IVF
pub n_probes: usize,
/// Model tier selected
pub model_tier: ModelTier,
/// Context window size
pub context_tokens: usize,
/// Temperature
pub temperature: f32,
}
```
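`SearchParams::default()` is relied on throughout the sections below but never defined. A plausible `Default` is sketched here; the baseline values are illustrative assumptions (only `context_tokens: 2048` appears elsewhere in this document), and `ModelTier` is stubbed so the sketch stands alone:

```rust
// Stub so the sketch compiles on its own; the real enum is referenced
// by the SearchParams struct above.
#[derive(Clone, Debug, PartialEq)]
pub enum ModelTier { Auto }

#[derive(Clone, Debug)]
pub struct SearchParams {
    pub ef_search: usize,
    pub n_probes: usize,
    pub model_tier: ModelTier,
    pub context_tokens: usize,
    pub temperature: f32,
}

impl Default for SearchParams {
    fn default() -> Self {
        Self {
            ef_search: 64,        // moderate HNSW beam width (assumption)
            n_probes: 8,          // IVF probe count (assumption)
            model_tier: ModelTier::Auto,
            context_tokens: 2048, // matches the default used in Section 4
            temperature: 0.7,     // common sampling default (assumption)
        }
    }
}
```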
### Pattern: Learned Behavior Clusters
```rust
/// A learned pattern extracted from trajectory clusters
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
/// Pattern ID
pub id: u64,
/// Centroid embedding (cluster center)
pub centroid: Vec<f32>,
/// Optimal search parameters for this pattern
pub optimal_params: SearchParams,
/// Confidence score (0-1)
pub confidence: f32,
/// Number of trajectories in cluster
pub support_count: usize,
/// Average precision for pattern
pub avg_precision: f32,
/// Average recall for pattern
pub avg_recall: f32,
/// Average latency
pub avg_latency_us: u64,
/// Pattern creation timestamp
pub created_at: i64,
/// Last update timestamp
pub updated_at: i64,
/// Abstraction level (0 = concrete, higher = more abstract)
pub abstraction_level: u32,
/// Child pattern IDs (for hierarchical patterns)
pub children: Vec<u64>,
}
```
### Verdict: Decision Judgments
```rust
/// Verdict on what parameters worked best
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Verdict {
/// Pattern this verdict applies to
pub pattern_id: u64,
/// Recommended parameters
pub recommended_params: SearchParams,
/// Confidence in recommendation
pub confidence: f32,
/// Evidence supporting this verdict
pub evidence: VerdictEvidence,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct VerdictEvidence {
/// Number of supporting trajectories
pub support_count: usize,
/// Average improvement over default
pub avg_improvement: f32,
/// Statistical significance (p-value)
pub p_value: f32,
/// Consistency score (low variance = high consistency)
pub consistency: f32,
}
```
---
## 3. ReasoningBank Implementation
### Core Storage and Retrieval
```rust
use dashmap::DashMap;
use parking_lot::RwLock;
/// ReasoningBank: Pattern-based learning and optimization
pub struct ReasoningBank {
/// Trajectory ring buffer (recent interactions)
trajectories: RwLock<CircularBuffer<QueryTrajectory>>,
/// Learned patterns (concurrent hashmap)
patterns: DashMap<u64, LearnedPattern>,
/// Pattern index for fast similarity lookup
pattern_index: RwLock<HNSWIndex>,
/// Verdicts per pattern
verdicts: DashMap<u64, Verdict>,
/// Configuration
config: ReasoningBankConfig,
/// Pattern ID counter
next_pattern_id: AtomicU64,
/// Statistics
stats: RwLock<ReasoningBankStats>,
}
impl ReasoningBank {
/// Create new ReasoningBank
pub fn new(config: ReasoningBankConfig) -> Self {
Self {
trajectories: RwLock::new(CircularBuffer::new(config.trajectory_capacity)),
patterns: DashMap::new(),
pattern_index: RwLock::new(HNSWIndex::new(config.embedding_dim, config.ef_construction)),
verdicts: DashMap::new(),
config,
next_pattern_id: AtomicU64::new(0),
stats: RwLock::new(ReasoningBankStats::default()),
}
}
/// Record a new trajectory
#[inline]
pub fn record_trajectory(&self, trajectory: QueryTrajectory) {
let mut trajectories = self.trajectories.write();
trajectories.push(trajectory);
// Update stats
let mut stats = self.stats.write();
stats.total_trajectories += 1;
}
/// Find most similar pattern to query
pub fn find_similar_pattern(&self, query_embedding: &[f32], k: usize) -> Vec<PatternMatch> {
let index = self.pattern_index.read();
let neighbors = index.search(query_embedding, k, self.config.ef_search);
neighbors.iter()
.filter_map(|&(id, distance)| {
self.patterns.get(&id).map(|p| PatternMatch {
pattern: p.clone(),
similarity: 1.0 - distance, // Convert distance to similarity
})
})
.collect()
}
/// Get optimized parameters for query
pub fn get_optimized_params(&self, query_embedding: &[f32]) -> OptimizedParams {
// Find similar patterns
let matches = self.find_similar_pattern(query_embedding, self.config.top_k_patterns);
if matches.is_empty() {
// No matching patterns - use defaults
return OptimizedParams {
params: SearchParams::default(),
confidence: 0.0,
source: ParamSource::Default,
};
}
        // Interpolate parameters via a similarity- and confidence-weighted
        // average. Accumulators start at zero so the default values do not
        // skew the blend.
        let mut weighted_params = SearchParams {
            ef_search: 0,
            n_probes: 0,
            temperature: 0.0,
            ..SearchParams::default()
        };
        let mut total_weight = 0.0f32;
        for m in &matches {
            let weight = m.similarity * m.pattern.confidence;
            total_weight += weight;
            weighted_params.ef_search += (m.pattern.optimal_params.ef_search as f32 * weight) as usize;
            weighted_params.n_probes += (m.pattern.optimal_params.n_probes as f32 * weight) as usize;
            weighted_params.temperature += m.pattern.optimal_params.temperature * weight;
            // ... other params
        }
        if total_weight > 0.0 {
            weighted_params.ef_search = (weighted_params.ef_search as f32 / total_weight) as usize;
            weighted_params.n_probes = (weighted_params.n_probes as f32 / total_weight) as usize;
            weighted_params.temperature /= total_weight;
        } else {
            weighted_params = SearchParams::default(); // every match had zero confidence
        }
OptimizedParams {
params: weighted_params,
confidence: total_weight / matches.len() as f32,
source: ParamSource::Pattern(matches[0].pattern.id),
}
}
/// Record feedback for trajectory
pub fn record_feedback(&self, trajectory_id: u64, feedback: UserFeedback) {
// Find trajectory and update
let mut trajectories = self.trajectories.write();
if let Some(traj) = trajectories.iter_mut().find(|t| t.id == trajectory_id) {
traj.feedback = Some(feedback.clone());
}
// Update related pattern confidence
// Higher feedback = higher confidence in that pattern's params
if let Some(pattern_id) = self.find_pattern_for_trajectory(trajectory_id) {
if let Some(mut pattern) = self.patterns.get_mut(&pattern_id) {
let feedback_delta = feedback.rating as f32 / 5.0 - 0.5; // -0.5 to +0.5
pattern.confidence = (pattern.confidence + 0.1 * feedback_delta).clamp(0.0, 1.0);
}
}
}
}
```
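The trajectory store above is declared as a `CircularBuffer<QueryTrajectory>` but that type is not defined in this document. A minimal fixed-capacity ring buffer satisfying the operations used here (`push`, `iter_mut`, `len`) might look like:

```rust
use std::collections::VecDeque;

/// Minimal fixed-capacity ring buffer backing the trajectory store.
/// A sketch: the real type only needs push-with-eviction plus iteration.
pub struct CircularBuffer<T> {
    buf: VecDeque<T>,
    capacity: usize,
}

impl<T> CircularBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push a new item, evicting the oldest when at capacity.
    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front();
        }
        self.buf.push_back(item);
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        self.buf.iter_mut()
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }
}
```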
---
## 4. Pattern Extraction
### K-Means++ Clustering for Pattern Discovery
```rust
/// Pattern extractor using K-means++ clustering
pub struct PatternExtractor {
/// Number of clusters to extract
k: usize,
/// Maximum iterations
max_iter: usize,
/// Convergence threshold
epsilon: f32,
}
impl PatternExtractor {
/// Extract patterns from trajectories
pub fn extract(&self, trajectories: &[QueryTrajectory]) -> Vec<LearnedPattern> {
if trajectories.len() < self.k {
return Vec::new();
}
// Collect embeddings
let embeddings: Vec<&[f32]> = trajectories.iter()
.map(|t| t.query_embedding.as_slice())
.collect();
// K-means++ initialization
let mut centroids = self.kmeans_plus_plus_init(&embeddings);
// K-means iteration
let mut assignments = vec![0usize; trajectories.len()];
for _ in 0..self.max_iter {
// Assignment step
let old_assignments = assignments.clone();
for (i, emb) in embeddings.iter().enumerate() {
let mut min_dist = f32::MAX;
let mut min_idx = 0;
for (c_idx, centroid) in centroids.iter().enumerate() {
let dist = euclidean_distance(emb, centroid);
if dist < min_dist {
min_dist = dist;
min_idx = c_idx;
}
}
assignments[i] = min_idx;
}
// Check convergence
if assignments == old_assignments {
break;
}
// Update step
centroids = self.compute_centroids(&embeddings, &assignments);
}
// Create patterns from clusters
let mut patterns = Vec::new();
for cluster_id in 0..self.k {
let cluster_trajectories: Vec<_> = trajectories.iter()
.zip(assignments.iter())
.filter(|(_, &a)| a == cluster_id)
.map(|(t, _)| t)
.collect();
if cluster_trajectories.len() < 3 {
continue; // Skip small clusters
}
let pattern = self.create_pattern_from_cluster(
cluster_id as u64,
&centroids[cluster_id],
&cluster_trajectories,
);
patterns.push(pattern);
}
patterns
}
fn kmeans_plus_plus_init(&self, embeddings: &[&[f32]]) -> Vec<Vec<f32>> {
let mut centroids = Vec::with_capacity(self.k);
let mut rng = rand::thread_rng();
// First centroid: random
let first_idx = rng.gen_range(0..embeddings.len());
centroids.push(embeddings[first_idx].to_vec());
// Remaining centroids: D² weighting
for _ in 1..self.k {
            let distances: Vec<f32> = embeddings.iter()
.map(|emb| {
centroids.iter()
.map(|c| euclidean_distance(emb, c))
.fold(f32::MAX, f32::min)
})
.collect();
// Square distances for D² sampling
let total: f32 = distances.iter().map(|d| d * d).sum();
let threshold = rng.gen::<f32>() * total;
let mut cumsum = 0.0;
let mut selected = 0;
for (i, d) in distances.iter().enumerate() {
cumsum += d * d;
if cumsum >= threshold {
selected = i;
break;
}
}
centroids.push(embeddings[selected].to_vec());
}
centroids
}
fn create_pattern_from_cluster(
&self,
id: u64,
centroid: &[f32],
trajectories: &[&QueryTrajectory],
) -> LearnedPattern {
// Compute optimal params as weighted average by quality
let mut total_weight = 0.0f32;
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut precision_sum = 0.0f32;
let mut recall_sum = 0.0f32;
let mut latency_sum = 0u64;
for t in trajectories {
let weight = t.precision * t.recall; // Quality as weight
total_weight += weight;
ef_sum += t.search_params.ef_search as f32 * weight;
probes_sum += t.search_params.n_probes as f32 * weight;
temp_sum += t.search_params.temperature * weight;
precision_sum += t.precision;
recall_sum += t.recall;
latency_sum += t.latency_us;
}
        let n = trajectories.len() as f32;
        let total_weight = total_weight.max(1e-6); // guard: all-zero quality weights
LearnedPattern {
id,
centroid: centroid.to_vec(),
optimal_params: SearchParams {
ef_search: (ef_sum / total_weight).round() as usize,
n_probes: (probes_sum / total_weight).round() as usize,
model_tier: ModelTier::Auto, // Determined separately
context_tokens: 2048, // Default
temperature: temp_sum / total_weight,
},
confidence: (total_weight / n).clamp(0.0, 1.0),
support_count: trajectories.len(),
avg_precision: precision_sum / n,
avg_recall: recall_sum / n,
avg_latency_us: latency_sum / trajectories.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: 0,
children: Vec::new(),
}
}
}
```
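The extractor assumes a few geometry helpers (`euclidean_distance`, `cosine_similarity`, and the centroid-update step). Minimal versions are sketched below; `compute_centroids` is written in free-function form with explicit `k` and `dim`, whereas the method above derives both from `self.k` and the embeddings:

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Cosine similarity, used by pattern consolidation and dream evaluation.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-12) // epsilon avoids division by zero
}

/// K-means update step: each centroid becomes the mean of its
/// assigned embeddings (empty clusters keep a zero centroid).
fn compute_centroids(
    embeddings: &[&[f32]],
    assignments: &[usize],
    k: usize,
    dim: usize,
) -> Vec<Vec<f32>> {
    let mut centroids = vec![vec![0.0f32; dim]; k];
    let mut counts = vec![0usize; k];
    for (emb, &c) in embeddings.iter().zip(assignments) {
        counts[c] += 1;
        for (acc, v) in centroids[c].iter_mut().zip(emb.iter()) {
            *acc += *v;
        }
    }
    for (centroid, &n) in centroids.iter_mut().zip(&counts) {
        if n > 0 {
            for v in centroid.iter_mut() {
                *v /= n as f32;
            }
        }
    }
    centroids
}
```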
---
## 5. Verdict Judgment System
### Evaluating What Works Best
```rust
/// Verdict judge for parameter optimization
pub struct VerdictJudge {
/// Minimum samples for statistical significance
min_samples: usize,
/// Significance level (p-value threshold)
alpha: f32,
}
impl VerdictJudge {
/// Judge optimal parameters for a pattern
pub fn judge(&self, pattern: &LearnedPattern, trajectories: &[&QueryTrajectory]) -> Option<Verdict> {
if trajectories.len() < self.min_samples {
return None; // Not enough evidence
}
// Group trajectories by parameter configuration
let mut param_groups: HashMap<ParamKey, Vec<&QueryTrajectory>> = HashMap::new();
for t in trajectories {
let key = ParamKey::from(&t.search_params);
param_groups.entry(key).or_default().push(t);
}
// Find best performing configuration
let mut best_config: Option<(ParamKey, f32, Vec<&QueryTrajectory>)> = None;
for (key, group) in &param_groups {
if group.len() < 3 {
continue;
}
// Compute quality score (F1 of precision and recall)
let avg_quality: f32 = group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.sum::<f32>() / group.len() as f32;
match &best_config {
None => best_config = Some((key.clone(), avg_quality, group.clone())),
Some((_, best_quality, _)) if avg_quality > *best_quality => {
best_config = Some((key.clone(), avg_quality, group.clone()));
}
_ => {}
}
}
let (best_key, best_quality, best_group) = best_config?;
// Statistical significance test
let p_value = self.compute_significance(&best_group, trajectories);
if p_value > self.alpha {
return None; // Not significant
}
// Compute consistency (inverse of coefficient of variation)
let qualities: Vec<f32> = best_group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.collect();
let mean = qualities.iter().sum::<f32>() / qualities.len() as f32;
let variance = qualities.iter()
.map(|q| (q - mean).powi(2))
.sum::<f32>() / qualities.len() as f32;
let std_dev = variance.sqrt();
        let consistency = 1.0 / (1.0 + std_dev / mean.max(1e-6));
        // Compute relative improvement over the default configuration
        let default_quality = self.compute_default_quality(trajectories).max(1e-6);
        let improvement = (best_quality - default_quality) / default_quality;
Some(Verdict {
pattern_id: pattern.id,
recommended_params: best_key.to_params(),
confidence: best_quality * consistency,
evidence: VerdictEvidence {
support_count: best_group.len(),
avg_improvement: improvement,
p_value,
consistency,
},
})
}
fn compute_significance(&self, best: &[&QueryTrajectory], all: &[&QueryTrajectory]) -> f32 {
// Welch's t-test for comparing means
let best_qualities: Vec<f32> = best.iter()
.map(|t| t.precision * t.recall)
.collect();
let all_qualities: Vec<f32> = all.iter()
.map(|t| t.precision * t.recall)
.collect();
welch_t_test(&best_qualities, &all_qualities)
}
fn compute_default_quality(&self, trajectories: &[&QueryTrajectory]) -> f32 {
// Assume first configuration or most common is "default"
let default_group: Vec<_> = trajectories.iter()
.filter(|t| t.search_params.ef_search == SearchParams::default().ef_search)
.collect();
if default_group.is_empty() {
0.5 // Baseline assumption
} else {
default_group.iter()
.map(|t| t.precision * t.recall)
.sum::<f32>() / default_group.len() as f32
}
}
}
```
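The significance test calls a `welch_t_test` helper that is not defined in this document. A self-contained sketch follows: the t statistic is exact, but the p-value uses a normal approximation to the t distribution (reasonable at the sample counts gated by `min_samples`; a full implementation would use the t CDF with Welch-Satterthwaite degrees of freedom):

```rust
/// Welch's t-test between two samples, returning an approximate
/// two-sided p-value under a normal approximation.
fn welch_t_test(a: &[f32], b: &[f32]) -> f32 {
    if a.len() < 2 || b.len() < 2 {
        return 1.0; // not enough data to claim significance
    }
    let mean = |x: &[f32]| x.iter().sum::<f32>() / x.len() as f32;
    let var = |x: &[f32], m: f32| {
        x.iter().map(|v| (v - m).powi(2)).sum::<f32>() / (x.len() - 1) as f32
    };
    let (ma, mb) = (mean(a), mean(b));
    let se = (var(a, ma) / a.len() as f32 + var(b, mb) / b.len() as f32).sqrt();
    if se == 0.0 {
        return 1.0; // zero variance: groups are indistinguishable
    }
    let t = (ma - mb).abs() / se;
    2.0 * (1.0 - normal_cdf(t))
}

/// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation.
fn normal_cdf(z: f32) -> f32 {
    let x = z / std::f32::consts::SQRT_2;
    let t = 1.0 / (1.0 + 0.3275911 * x.abs());
    let poly = t * (0.254829592
        + t * (-0.284496736
        + t * (1.421413741
        + t * (-1.453152027 + t * 1.061405429))));
    let erf = (1.0 - poly * (-x * x).exp()) * x.signum();
    0.5 * (1.0 + erf)
}
```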
---
## 6. Integration with Router
### Using ReasoningBank to Optimize Router Decisions
```rust
impl FastGRNNRouter {
/// Forward pass with ReasoningBank optimization
pub fn forward_with_reasoning(
&self,
features: &[f32],
reasoning_bank: &ReasoningBank,
) -> RouterDecision {
// Get pattern-based parameter suggestions
let pattern_params = reasoning_bank.get_optimized_params(features);
// Standard router forward
let mut decision = self.forward(features);
// Blend router decision with pattern suggestions
if pattern_params.confidence > 0.5 {
let blend_factor = pattern_params.confidence * 0.3; // Max 30% influence
// Interpolate temperature
decision.temperature = (1.0 - blend_factor) * decision.temperature
+ blend_factor * pattern_params.params.temperature;
// Context token suggestion influences context selection
let suggested_context = pattern_params.params.context_tokens;
let router_context = decision.context_tokens;
decision.context_tokens = ((1.0 - blend_factor) * router_context as f32
+ blend_factor * suggested_context as f32) as usize;
decision.reasoning_confidence = pattern_params.confidence;
decision.reasoning_pattern_id = pattern_params.source.pattern_id();
}
decision
}
}
```
---
## 7. Pattern Consolidation and Pruning
### Managing Pattern Memory
```rust
impl ReasoningBank {
/// Consolidate similar patterns
pub fn consolidate_patterns(&mut self) {
// Find similar pattern pairs
let pattern_ids: Vec<u64> = self.patterns.iter()
.map(|p| *p.key())
.collect();
let mut to_merge: Vec<(u64, u64)> = Vec::new();
for i in 0..pattern_ids.len() {
for j in (i+1)..pattern_ids.len() {
let p1 = self.patterns.get(&pattern_ids[i]).unwrap();
let p2 = self.patterns.get(&pattern_ids[j]).unwrap();
let similarity = cosine_similarity(&p1.centroid, &p2.centroid);
if similarity > 0.95 {
// Very similar - merge
to_merge.push((pattern_ids[i], pattern_ids[j]));
}
}
}
// Merge patterns
        for (keep_id, remove_id) in to_merge {
            // Clone the pattern being merged away first: holding a `get_mut`
            // and a `get` guard on the same DashMap at once risks deadlock,
            // and zipping `keep.centroid.iter_mut()` with `keep.centroid.iter()`
            // would not borrow-check.
            let removed = match self.patterns.get(&remove_id) {
                Some(p) => p.clone(),
                None => continue, // already merged away in an earlier pair
            };
            if let Some(mut keep) = self.patterns.get_mut(&keep_id) {
                // Support-weighted average of centroids
                let total_support = keep.support_count + removed.support_count;
                let w1 = keep.support_count as f32 / total_support as f32;
                let w2 = removed.support_count as f32 / total_support as f32;
                for (c1, c2) in keep.centroid.iter_mut().zip(removed.centroid.iter()) {
                    *c1 = w1 * *c1 + w2 * c2;
                }
                // Update support count and confidence
                keep.support_count = total_support;
                keep.confidence = (keep.confidence * w1 + removed.confidence * w2).min(1.0);
                keep.updated_at = chrono::Utc::now().timestamp();
            }
            // Remove merged pattern
            self.patterns.remove(&remove_id);
        }
}
/// Prune low-confidence patterns
pub fn prune_patterns(&mut self, min_confidence: f32, min_support: usize) {
let to_remove: Vec<u64> = self.patterns.iter()
.filter(|p| p.confidence < min_confidence || p.support_count < min_support)
.map(|p| *p.key())
.collect();
for id in to_remove {
self.patterns.remove(&id);
self.verdicts.remove(&id);
}
}
/// Build pattern hierarchy (abstraction levels)
pub fn build_hierarchy(&mut self) {
// Hierarchical clustering on existing patterns
let patterns: Vec<_> = self.patterns.iter()
            .map(|p| (*p.key(), p.centroid.clone()))
.collect();
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.fit(&patterns);
// Create meta-patterns at each level
for level in 1..=3 {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 1 {
let child_ids: Vec<u64> = cluster.member_ids();
let meta_centroid = cluster.centroid();
// Average params from children
let children: Vec<_> = child_ids.iter()
.filter_map(|id| self.patterns.get(id))
.collect();
let meta_params = self.average_params(&children);
let meta_pattern = LearnedPattern {
id: self.next_pattern_id.fetch_add(1, Ordering::SeqCst),
centroid: meta_centroid,
optimal_params: meta_params,
confidence: children.iter().map(|c| c.confidence).sum::<f32>() / children.len() as f32,
support_count: children.iter().map(|c| c.support_count).sum(),
avg_precision: children.iter().map(|c| c.avg_precision).sum::<f32>() / children.len() as f32,
avg_recall: children.iter().map(|c| c.avg_recall).sum::<f32>() / children.len() as f32,
avg_latency_us: children.iter().map(|c| c.avg_latency_us).sum::<u64>() / children.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: level as u32,
children: child_ids,
};
self.patterns.insert(meta_pattern.id, meta_pattern);
}
}
}
}
}
```
---
## 8. Statistics and Monitoring
```rust
#[derive(Default, Debug)]
pub struct ReasoningBankStats {
/// Total trajectories recorded
pub total_trajectories: u64,
/// Total patterns stored
pub total_patterns: usize,
/// Total verdicts issued
pub total_verdicts: usize,
/// Pattern match hit rate
pub pattern_hit_rate: f32,
/// Average confidence in recommendations
pub avg_recommendation_confidence: f32,
/// Improvement from pattern optimization
pub avg_improvement_percent: f32,
}
impl ReasoningBank {
/// Get current statistics
pub fn stats(&self) -> ReasoningBankStats {
let stats = self.stats.read();
ReasoningBankStats {
total_trajectories: stats.total_trajectories,
total_patterns: self.patterns.len(),
total_verdicts: self.verdicts.len(),
pattern_hit_rate: stats.pattern_hit_rate,
avg_recommendation_confidence: stats.avg_recommendation_confidence,
avg_improvement_percent: stats.avg_improvement_percent,
}
}
/// Export all patterns for persistence
pub fn export(&self) -> ReasoningBankExport {
ReasoningBankExport {
patterns: self.patterns.iter()
.map(|p| p.value().clone())
.collect(),
verdicts: self.verdicts.iter()
.map(|v| v.value().clone())
.collect(),
}
}
/// Import patterns from persistence
pub fn import(&mut self, export: ReasoningBankExport) {
for pattern in export.patterns {
let id = pattern.id;
self.patterns.insert(id, pattern.clone());
self.pattern_index.write().insert(id, &pattern.centroid);
}
for verdict in export.verdicts {
self.verdicts.insert(verdict.pattern_id, verdict);
}
}
}
```
---
## Summary
ReasoningBank enables SONA to:
1. **Learn from every query** through trajectory recording
2. **Discover patterns** via K-means++ clustering
3. **Judge what works** through statistical verdict analysis
4. **Optimize future decisions** by interpolating from similar patterns
5. **Build abstractions** through hierarchical pattern consolidation
This creates a continuously improving system where past experience directly enhances future performance.
---
# SONA Memory Dreams: Offline Consolidation Engine
## Creativity Through Neural Replay and Recombination
---
## 1. Biological Inspiration
### Why Dreams Matter for Learning
```
HUMAN SLEEP-BASED LEARNING
══════════════════════════
Awake: Sleep (REM): Next Day:
───────────────── ───────────────── ─────────────────
• New experiences • Replay memories • Consolidated knowledge
• Pattern matching • Recombine ideas • Novel insights
• Working memory • Strengthen important • Creative connections
• Prune unimportant
```
Research shows that:
- **Memory consolidation** happens during sleep
- **Creative insights** emerge from random memory replay
- **Neural pruning** removes low-value connections
- **Analogical reasoning** connects distant concepts
SONA's Dream Engine replicates these mechanisms for AI self-improvement.
---
## 2. Dream Engine Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ DREAM ENGINE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ MEMORY GRAPH │──────┐ │
│ └───────────────┘ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM GENERATOR │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Random │ │Weighted │ │ │
│ │ │ Walks │ │ Sampling│ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Dream Sequence │ │ │
│ │ │ [M₁→M₂→M₃→...→Mₙ] │ │ │
│ │ └──────────┬───────────┘ │ │
│ └─────────────┼───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM EVALUATOR │ │
│ │ │ │
│ │ • Novelty Score (new connections?) │ │
│ │ • Coherence Score (makes sense?) │ │
│ │ • Utility Score (useful insight?) │ │
│ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM INTEGRATOR │ │
│ │ │ │
│ │ • Add weak creative edges │ │
│ │ • Update pattern associations │ │
│ │ • Generate novel hypotheses │ │
│ └─────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Dream Generation
### Random Walk Memory Replay
```rust
/// Dream generator using random walks on memory graph
pub struct DreamGenerator {
/// Temperature for random walk (higher = more random)
temperature: f32,
/// Maximum dream length
max_length: usize,
/// Minimum coherence threshold
min_coherence: f32,
/// Creativity bias (prefer novel connections)
creativity_bias: f32,
}
impl DreamGenerator {
/// Generate a single dream sequence
pub fn generate_dream(
&self,
memory: &MemoryGraph,
start_node: Option<NodeId>,
) -> Dream {
let mut sequence = Vec::new();
let mut visited = HashSet::new();
// Start from random high-activation node if not specified
let current = start_node.unwrap_or_else(|| {
memory.sample_by_activation()
});
sequence.push(current);
visited.insert(current);
// Random walk with creativity-weighted transitions
for _ in 0..self.max_length {
let neighbors = memory.get_neighbors(current);
if neighbors.is_empty() {
break;
}
// Compute transition probabilities
let probs: Vec<f32> = neighbors.iter()
.map(|&(neighbor, edge_weight)| {
let novelty_bonus = if visited.contains(&neighbor) {
0.1 // Discourage revisits
} else {
1.0 + self.creativity_bias * (1.0 - memory.get_access_frequency(neighbor))
};
(edge_weight * novelty_bonus).powf(1.0 / self.temperature)
})
.collect();
// Sample next node
let next = sample_weighted(&neighbors, &probs);
if let Some((next_node, _)) = next {
sequence.push(next_node);
visited.insert(next_node);
} else {
break;
}
}
Dream {
sequence,
temperature: self.temperature,
timestamp: chrono::Utc::now().timestamp(),
}
}
/// Generate creative jump dream (non-local connections)
pub fn generate_creative_dream(
&self,
memory: &MemoryGraph,
num_jumps: usize,
) -> Dream {
let mut sequence = Vec::new();
// Sample diverse starting points
let anchors = memory.sample_diverse(num_jumps, 0.3);
for anchor in anchors {
sequence.push(anchor);
// Short local walk from each anchor
let local_walk = self.generate_dream(memory, Some(anchor));
            sequence.extend(local_walk.sequence.iter().skip(1).take(3).copied()); // NodeId is Copy
}
Dream {
sequence,
temperature: self.temperature * 2.0, // Higher temperature for creative dreams
timestamp: chrono::Utc::now().timestamp(),
}
}
}
/// A dream sequence
pub struct Dream {
/// Sequence of visited memory nodes
pub sequence: Vec<NodeId>,
/// Temperature used for generation
pub temperature: f32,
/// Generation timestamp
pub timestamp: i64,
}
```
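The walk above relies on a `sample_weighted` helper that is not defined here. A deterministic sketch follows; for testability it threads the uniform draw `u ∈ [0,1)` in explicitly, whereas the call site above would draw it internally (e.g. via `rng.gen::<f32>()`):

```rust
/// Sample one item proportionally to `weights`, given a uniform draw
/// u in [0,1). Returns None for empty input or non-positive total weight.
fn sample_weighted<T: Copy>(items: &[T], weights: &[f32], u: f32) -> Option<T> {
    let total: f32 = weights.iter().sum();
    if items.is_empty() || total <= 0.0 {
        return None;
    }
    // Walk the cumulative distribution until the threshold is crossed.
    let mut threshold = u * total;
    for (item, &w) in items.iter().zip(weights) {
        threshold -= w;
        if threshold < 0.0 {
            return Some(*item);
        }
    }
    items.last().copied() // u at the upper edge of the range
}
```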
---
## 4. Dream Evaluation
### Measuring Dream Quality
```rust
/// Evaluator for dream quality
pub struct DreamEvaluator {
/// Memory graph reference
memory: Arc<MemoryGraph>,
/// Novelty detection threshold
novelty_threshold: f32,
}
impl DreamEvaluator {
/// Evaluate dream quality across multiple dimensions
pub fn evaluate(&self, dream: &Dream) -> DreamQuality {
DreamQuality {
novelty: self.compute_novelty(dream),
coherence: self.compute_coherence(dream),
utility: self.compute_utility(dream),
diversity: self.compute_diversity(dream),
}
}
/// Novelty: How many new connections are suggested?
fn compute_novelty(&self, dream: &Dream) -> f32 {
let mut novel_pairs = 0;
let mut total_pairs = 0;
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len() {
total_pairs += 1;
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
// Check if edge exists
if !self.memory.has_edge(node_a, node_b) {
// Check semantic similarity
let emb_a = self.memory.get_embedding(node_a);
let emb_b = self.memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
// Novel = no edge but moderate similarity
if sim > 0.3 && sim < 0.8 {
novel_pairs += 1;
}
}
}
}
novel_pairs as f32 / total_pairs.max(1) as f32
}
/// Coherence: Does the dream sequence make semantic sense?
fn compute_coherence(&self, dream: &Dream) -> f32 {
if dream.sequence.len() < 2 {
return 1.0;
}
let mut coherence_sum = 0.0f32;
for window in dream.sequence.windows(2) {
let emb_a = self.memory.get_embedding(window[0]);
let emb_b = self.memory.get_embedding(window[1]);
coherence_sum += cosine_similarity(&emb_a, &emb_b);
}
coherence_sum / (dream.sequence.len() - 1) as f32
}
/// Utility: Are the suggested connections potentially useful?
fn compute_utility(&self, dream: &Dream) -> f32 {
// Based on node quality scores and access patterns
let avg_quality: f32 = dream.sequence.iter()
.map(|&id| self.memory.get_node_quality(id))
.sum::<f32>() / dream.sequence.len() as f32;
// Higher utility if connecting high-quality nodes
avg_quality
}
/// Diversity: How diverse are the visited nodes?
fn compute_diversity(&self, dream: &Dream) -> f32 {
// Average pairwise distance in embedding space
let embeddings: Vec<_> = dream.sequence.iter()
.map(|&id| self.memory.get_embedding(id))
.collect();
let mut total_dist = 0.0f32;
let mut count = 0;
for i in 0..embeddings.len() {
for j in (i+1)..embeddings.len() {
total_dist += 1.0 - cosine_similarity(&embeddings[i], &embeddings[j]);
count += 1;
}
}
total_dist / count.max(1) as f32
}
}
#[derive(Debug, Clone)]
pub struct DreamQuality {
/// How many novel connections suggested (0-1)
pub novelty: f32,
/// How semantically coherent (0-1)
pub coherence: f32,
/// How useful the connections might be (0-1)
pub utility: f32,
/// How diverse the dream content (0-1)
pub diversity: f32,
}
impl DreamQuality {
/// Overall quality score
pub fn overall(&self) -> f32 {
// Weighted combination favoring novelty and coherence
0.4 * self.novelty + 0.3 * self.coherence + 0.2 * self.utility + 0.1 * self.diversity
}
/// Is this dream worth integrating?
pub fn is_valuable(&self, threshold: f32) -> bool {
self.novelty > 0.3 && self.coherence > 0.4 && self.overall() > threshold
}
}
```
---
## 5. Dream Integration
### Applying Dream Insights to Memory
```rust
/// Integrates valuable dreams into memory graph
pub struct DreamIntegrator {
/// Memory graph to update
memory: Arc<RwLock<MemoryGraph>>,
/// Strength of new creative edges
creative_edge_strength: f32,
/// Decay factor for dream-derived edges
dream_edge_decay: f32,
}
impl DreamIntegrator {
/// Integrate a valuable dream into memory
pub fn integrate(&self, dream: &Dream, quality: &DreamQuality) -> IntegrationResult {
let mut result = IntegrationResult::default();
if !quality.is_valuable(0.5) {
return result; // Skip low-quality dreams
}
let mut memory = self.memory.write();
// Extract novel connections from dream
let novel_connections = self.extract_novel_connections(dream, &memory);
for (node_a, node_b, strength) in novel_connections {
// Add weak creative edge
let edge_strength = self.creative_edge_strength * strength * quality.overall();
memory.add_edge(
node_a,
node_b,
EdgeType::Creative,
edge_strength,
);
result.edges_added += 1;
}
// Update node associations based on dream co-occurrence
for window in dream.sequence.windows(3) {
memory.update_association(window[0], window[2], 0.01);
}
result.dream_quality = quality.overall();
result
}
fn extract_novel_connections(
&self,
dream: &Dream,
memory: &MemoryGraph,
) -> Vec<(NodeId, NodeId, f32)> {
let mut connections = Vec::new();
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len().min(i+5) { // Only nearby in sequence
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
if !memory.has_edge(node_a, node_b) {
let emb_a = memory.get_embedding(node_a);
let emb_b = memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
if sim > 0.3 {
// Connection strength based on similarity and sequence proximity
let proximity_factor = 1.0 / (j - i) as f32;
let strength = sim * proximity_factor;
connections.push((node_a, node_b, strength));
}
}
}
}
connections
}
}
#[derive(Default)]
pub struct IntegrationResult {
pub edges_added: usize,
pub associations_updated: usize,
pub dream_quality: f32,
}
```
---
## 6. Memory Consolidation
### Strengthening Important Memories
```rust
/// Consolidation engine for memory pruning and strengthening
pub struct ConsolidationEngine {
/// Memory graph reference
memory: Arc<RwLock<MemoryGraph>>,
/// Minimum access frequency for retention
min_access_frequency: f32,
/// Age decay factor (older = more decay)
age_decay: f32,
/// Quality threshold for preservation
quality_threshold: f32,
}
impl ConsolidationEngine {
/// Run full consolidation pass
pub fn consolidate(&self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Phase 1: Identify memories by value
let (high_value, medium_value, low_value) = self.categorize_memories();
report.high_value_count = high_value.len();
report.medium_value_count = medium_value.len();
report.low_value_count = low_value.len();
// Phase 2: Strengthen high-value memories
for &node_id in &high_value {
self.strengthen_memory(node_id);
report.memories_strengthened += 1;
}
// Phase 3: Decay low-value memories
for &node_id in &low_value {
let retained = self.decay_memory(node_id);
if retained {
report.memories_decayed += 1;
} else {
report.memories_removed += 1;
}
}
// Phase 4: Prune weak edges
let pruned = self.prune_weak_edges();
report.edges_pruned = pruned;
// Phase 5: Merge similar memories
let merged = self.merge_similar_memories();
report.memories_merged = merged;
report
}
    fn categorize_memories(&self) -> (Vec<NodeId>, Vec<NodeId>, Vec<NodeId>) {
        let memory = self.memory.read();
        let mut high = Vec::new();
        let mut medium = Vec::new();
        let mut low = Vec::new();
        for node in memory.iter_nodes() {
            let value_score = self.compute_value_score(&memory, node);
            if value_score > 0.7 {
                high.push(node.id);
            } else if value_score > 0.3 {
                medium.push(node.id);
            } else {
                low.push(node.id);
            }
        }
        (high, medium, low)
    }
    /// Takes the already-held read guard so we never re-acquire the lock mid-iteration
    fn compute_value_score(&self, memory: &MemoryGraph, node: &MemoryNode) -> f32 {
// Factors:
// 1. Access frequency (more access = more valuable)
let freq_score = (node.access_count as f32 / 100.0).min(1.0);
// 2. Recency (recent = more valuable)
let age_days = (chrono::Utc::now().timestamp() - node.last_accessed) / 86400;
let recency_score = (-self.age_decay * age_days as f32).exp();
// 3. Quality (explicit quality score)
let quality_score = node.quality_score;
// 4. Connectivity (well-connected = more valuable)
let degree = memory.node_degree(node.id);
let connectivity_score = (degree as f32 / 10.0).min(1.0);
// Weighted combination
0.3 * freq_score + 0.2 * recency_score + 0.3 * quality_score + 0.2 * connectivity_score
}
    fn strengthen_memory(&self, node_id: NodeId) {
        let mut memory = self.memory.write();
        // Collect edge sources first so the mutable updates below don't alias the edge iterator
        let sources: Vec<_> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
        // Increase edge weights to this node
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(1.1));
        }
// Mark as consolidated
if let Some(node) = memory.get_node_mut(node_id) {
node.consolidation_count += 1;
node.last_consolidated = chrono::Utc::now().timestamp();
}
}
    fn decay_memory(&self, node_id: NodeId) -> bool {
        let mut memory = self.memory.write();
        // Collect edge sources first so the mutable updates below don't alias the edge iterator
        let sources: Vec<_> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
        // Reduce edge weights
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(0.5));
        }
// Check if node should be removed entirely
let total_incoming_weight: f32 = memory.get_edges_to(node_id)
.iter()
.map(|e| e.weight)
.sum();
if total_incoming_weight < 0.01 {
// Remove isolated or nearly-isolated node
memory.remove_node(node_id);
false // Not retained
} else {
true // Retained but weakened
}
}
fn prune_weak_edges(&self) -> usize {
let mut memory = self.memory.write();
let weak_edges: Vec<_> = memory.iter_edges()
.filter(|e| e.weight < 0.01)
.map(|e| e.id)
.collect();
for edge_id in &weak_edges {
memory.remove_edge(*edge_id);
}
weak_edges.len()
}
    fn merge_similar_memories(&self) -> usize {
        let mut memory = self.memory.write();
        let mut merged_count = 0;
        // Snapshot ids and embeddings so merging doesn't invalidate the node iterator
        let nodes: Vec<(NodeId, Vec<f32>)> = memory.iter_nodes()
            .map(|n| (n.id, n.embedding.clone()))
            .collect();
        let mut absorbed = vec![false; nodes.len()];
        // Find highly similar node pairs (O(n^2); acceptable at weekly cadence)
        for i in 0..nodes.len() {
            if absorbed[i] { continue; }
            for j in (i + 1)..nodes.len() {
                if absorbed[j] { continue; }
                let sim = cosine_similarity(&nodes[i].1, &nodes[j].1);
                if sim > 0.98 {
                    // Merge j into i; mark j so it is never merged twice
                    memory.merge_nodes(nodes[i].0, nodes[j].0);
                    absorbed[j] = true;
                    merged_count += 1;
                }
            }
        }
        merged_count
    }
}
#[derive(Default)]
pub struct ConsolidationReport {
pub high_value_count: usize,
pub medium_value_count: usize,
pub low_value_count: usize,
pub memories_strengthened: usize,
pub memories_decayed: usize,
pub memories_removed: usize,
pub memories_merged: usize,
pub edges_pruned: usize,
}
```
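The weighted combination in `compute_value_score` is easy to exercise with toy numbers. A standalone sketch using the same weights and caps (the factor helpers are inlined here, and the `age_decay` of 0.1/day is an illustrative assumption):

```rust
/// Standalone value score: 0.3*freq + 0.2*recency + 0.3*quality + 0.2*connectivity,
/// with frequency capped at 100 accesses and connectivity at degree 10.
fn value_score(access_count: u32, age_days: f32, age_decay: f32, quality: f32, degree: usize) -> f32 {
    let freq_score = (access_count as f32 / 100.0).min(1.0);
    let recency_score = (-age_decay * age_days).exp();
    let connectivity_score = (degree as f32 / 10.0).min(1.0);
    0.3 * freq_score + 0.2 * recency_score + 0.3 * quality + 0.2 * connectivity_score
}
```

A fresh, heavily accessed, high-quality, well-connected node saturates at 1.0; a stale, unused, isolated node decays toward 0, landing it in the low-value bucket for pruning.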
---
## 7. Full Dream Cycle
### Orchestrating the Dream Process
```rust
/// Complete dream cycle orchestrator
pub struct DreamCycle {
    /// Memory graph reference (used by `generate_dreams` below)
    memory: Arc<RwLock<MemoryGraph>>,
    generator: DreamGenerator,
    evaluator: DreamEvaluator,
    integrator: DreamIntegrator,
    consolidator: ConsolidationEngine,
    config: DreamCycleConfig,
}
impl DreamCycle {
/// Run complete dream cycle (weekly maintenance)
pub async fn run(&self) -> DreamCycleReport {
let start = Instant::now();
let mut report = DreamCycleReport::default();
// Phase 1: Generate dreams
tracing::info!("Starting dream generation phase");
let dreams = self.generate_dreams();
report.dreams_generated = dreams.len();
// Phase 2: Evaluate dreams
tracing::info!("Evaluating {} dreams", dreams.len());
let evaluated: Vec<_> = dreams.iter()
.map(|d| (d, self.evaluator.evaluate(d)))
.collect();
// Phase 3: Integrate valuable dreams
tracing::info!("Integrating valuable dreams");
for (dream, quality) in &evaluated {
if quality.is_valuable(self.config.dream_threshold) {
let result = self.integrator.integrate(dream, quality);
report.edges_added += result.edges_added;
report.dreams_integrated += 1;
}
}
// Phase 4: Memory consolidation
tracing::info!("Running memory consolidation");
report.consolidation = self.consolidator.consolidate();
report.elapsed_ms = start.elapsed().as_millis() as u64;
report.timestamp = chrono::Utc::now().timestamp();
tracing::info!(
dreams = report.dreams_generated,
integrated = report.dreams_integrated,
edges = report.edges_added,
elapsed_ms = report.elapsed_ms,
"Dream cycle completed"
);
report
}
fn generate_dreams(&self) -> Vec<Dream> {
let mut dreams = Vec::new();
// Regular random walk dreams
for _ in 0..self.config.num_regular_dreams {
let dream = self.generator.generate_dream(&self.memory, None);
dreams.push(dream);
}
// Creative jump dreams
for _ in 0..self.config.num_creative_dreams {
let dream = self.generator.generate_creative_dream(
&self.memory,
self.config.creative_jump_count,
);
dreams.push(dream);
}
dreams
}
}
#[derive(Default)]
pub struct DreamCycleReport {
pub dreams_generated: usize,
pub dreams_integrated: usize,
pub edges_added: usize,
pub consolidation: ConsolidationReport,
pub elapsed_ms: u64,
pub timestamp: i64,
}
```
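`DreamCycleConfig` is referenced above but not defined in this section. A plausible shape, reconstructed from its usages in `run` and `generate_dreams`; the default values are illustrative assumptions, not normative (the 40/10 split is chosen only to sum to the 50 dreams per cycle cited in the deep-loop breakdown):

```rust
/// Hypothetical configuration for the weekly dream cycle.
pub struct DreamCycleConfig {
    /// Random-walk dreams per cycle
    pub num_regular_dreams: usize,
    /// High-temperature creative dreams per cycle
    pub num_creative_dreams: usize,
    /// Long-range jumps allowed in each creative dream
    pub creative_jump_count: usize,
    /// Minimum quality for a dream to be integrated
    pub dream_threshold: f32,
}

impl Default for DreamCycleConfig {
    fn default() -> Self {
        Self {
            num_regular_dreams: 40,
            num_creative_dreams: 10,
            creative_jump_count: 3,
            dream_threshold: 0.5,
        }
    }
}
```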
---
## 8. Integration with exo-exotic Dreams Module
SONA integrates with the exo-ai-2025 dream experiments:
```rust
// From exo-exotic crate
use exo_exotic::experiments::dreams::{
DreamExperiment,
DreamConfig,
NoveltyMeasure,
};
impl DreamCycle {
/// Run advanced dream experiments from exo-exotic
pub async fn run_exotic_dreams(&self) -> ExoticDreamReport {
let dream_experiment = DreamExperiment::new(DreamConfig {
memory_count: self.memory.node_count(),
replay_probability: 0.7,
recombination_rate: 0.3,
novelty_threshold: 0.5,
});
let result = dream_experiment.run(&self.memory).await;
ExoticDreamReport {
novelty_score: result.novelty,
coherence_score: result.coherence,
creative_insights: result.insights.len(),
new_hypotheses: result.hypotheses,
}
}
}
```
---
## Summary
SONA's Dream Engine enables:
| Feature | Mechanism | Outcome |
|---------|-----------|---------|
| **Memory Replay** | Random walks on memory graph | Strengthens important connections |
| **Creative Recombination** | High-temperature sampling | Discovers novel associations |
| **Quality Filtering** | Novelty + coherence metrics | Only valuable dreams integrated |
| **Weak Edge Creation** | Dream-derived connections | Enables creative retrieval |
| **Memory Consolidation** | Value-based pruning | Efficient memory usage |
Dreams allow SONA to:
1. **Discover** connections it wouldn't find through normal operation
2. **Explore** the hypothesis space without user cost
3. **Consolidate** valuable knowledge
4. **Prune** low-value information
5. **Remain creative** while staying grounded

---
# SONA Performance Benchmarks
## Overview
This document defines performance targets, benchmark methodology, and expected results for SONA components. All benchmarks are designed to be reproducible and measurable.
## Performance Targets Summary
```
┌─────────────────────────────────────────────────────────────────────────┐
│ SONA Performance Targets │
├─────────────────────────────────────────────────────────────────────────┤
│ Component │ Target │ Stretch Goal │ Unit │
├─────────────────────────┼────────────────┼───────────────┼─────────────┤
│ Micro-LoRA forward │ <50μs │ <20μs │ per request │
│ Micro-LoRA update │ <100μs │ <50μs │ per signal │
│ Base LoRA forward │ <200μs │ <100μs │ per layer │
│ Pattern extraction │ <1s │ <500ms │ per 1000 │
│ Trajectory recording │ <10μs │ <5μs │ per step │
│ Background cycle │ <30s │ <15s │ per cycle │
│ Deep cycle │ <10min │ <5min │ per cycle │
│ Memory overhead │ <100MB │ <50MB │ total │
│ Pattern search │ <1ms │ <100μs │ per query │
│ Dream generation │ <100ms │ <50ms │ per dream │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Micro-LoRA Benchmarks
### Forward Pass Latency
**Target**: <50μs average, <100μs p99
```rust
// benches/micro_lora.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
fn bench_micro_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("micro_lora_forward");
for rank in [1, 2] {
for hidden_dim in [256, 512, 1024, 2048] {
let lora = MicroLoRA::new(hidden_dim, rank);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
output.fill(0.0);
unsafe { lora.forward_simd(&input, &mut output) };
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | AVX2 (μs) | Scalar (μs) | Speedup |
|------|------------|-----------|-------------|---------|
| 1 | 256 | 3.2 | 12.5 | 3.9x |
| 1 | 512 | 5.8 | 24.1 | 4.2x |
| 1 | 1024 | 10.4 | 47.3 | 4.5x |
| 1 | 2048 | 19.7 | 93.8 | 4.8x |
| 2 | 256 | 5.1 | 23.4 | 4.6x |
| 2 | 512 | 9.3 | 46.2 | 5.0x |
| 2 | 1024 | 17.2 | 91.5 | 5.3x |
| 2 | 2048 | 33.1 | 182.4 | 5.5x |
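For reference, the scalar baseline those AVX2 numbers are measured against is essentially two dot products. A minimal rank-1 sketch (`forward_simd` itself is not reproduced here; the `alpha` scaling factor is an assumption):

```rust
/// Scalar rank-1 LoRA forward: output += up * (down . input) * alpha.
/// This is the O(2*d) baseline that the SIMD path accelerates.
fn micro_lora_forward_scalar(down: &[f32], up: &[f32], alpha: f32, input: &[f32], output: &mut [f32]) {
    // Down projection: scalar h = down . input
    let h: f32 = down.iter().zip(input).map(|(d, x)| d * x).sum();
    // Up projection: output += up * h * alpha
    for (o, u) in output.iter_mut().zip(up) {
        *o += u * h * alpha;
    }
}
```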
### Gradient Accumulation
**Target**: <100μs per signal
```rust
fn bench_gradient_accumulation(c: &mut Criterion) {
let mut group = c.benchmark_group("gradient_accumulation");
for hidden_dim in [256, 512, 1024] {
let mut lora = MicroLoRA::new(hidden_dim, 1);
let signal = LearningSignal {
query_embedding: vec![0.1; hidden_dim],
gradient_estimate: vec![0.01; hidden_dim],
quality_score: 0.8,
timestamp: Instant::now(),
metadata: SignalMetadata::default(),
};
group.bench_with_input(
BenchmarkId::from_parameter(hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.accumulate_gradient(&signal);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Time (μs) | Throughput (signals/s) |
|------------|-----------|------------------------|
| 256 | 8.3 | 120,481 |
| 512 | 15.7 | 63,694 |
| 1024 | 30.2 | 33,112 |
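The accumulation path being timed is a quality-weighted add into a persistent buffer, which explains the linear scaling with hidden dimension. A hedged sketch of what `accumulate_gradient` plausibly does per signal (the exact weighting scheme is an assumption):

```rust
/// Quality-weighted gradient accumulation into a persistent buffer.
/// Higher-quality signals contribute proportionally more to the pending update.
fn accumulate_gradient(buffer: &mut [f32], gradient: &[f32], quality: f32) {
    for (b, g) in buffer.iter_mut().zip(gradient) {
        *b += quality * g;
    }
}
```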
---
## Base LoRA Benchmarks
### Forward Pass (Per Layer)
**Target**: <200μs per layer
```rust
fn bench_base_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("base_lora_forward");
for rank in [4, 8, 16] {
for hidden_dim in [512, 1024, 2048] {
let lora = BaseLoRA::new(hidden_dim, rank, 1);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.forward_layer(0, &input, &mut output);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | Time (μs) | FLOPs | GFLOPS |
|------|------------|-----------|----------|--------|
| 4 | 512 | 45 | 4.2M | 93 |
| 4 | 1024 | 85 | 8.4M | 99 |
| 4 | 2048 | 162 | 16.8M | 104 |
| 8 | 512 | 82 | 8.4M | 102 |
| 8 | 1024 | 158 | 16.8M | 106 |
| 8 | 2048 | 305 | 33.5M | 110 |
| 16 | 512 | 155 | 16.8M | 108 |
| 16 | 1024 | 298 | 33.5M | 112 |
| 16 | 2048 | 582 | 67.1M | 115 |
---
## Trajectory Recording Benchmarks
### Step Recording Latency
**Target**: <10μs per step
```rust
fn bench_trajectory_recording(c: &mut Criterion) {
let mut group = c.benchmark_group("trajectory_recording");
for hidden_dim in [256, 512] {
for num_heads in [4, 8] {
let mut builder = TrajectoryBuilder::new(1, vec![0.1; hidden_dim]);
group.bench_with_input(
BenchmarkId::new(format!("h{}_heads{}", hidden_dim, num_heads), hidden_dim),
&(hidden_dim, num_heads),
|b, &(hd, nh)| {
b.iter(|| {
builder.add_step(
vec![0.5; hd],
vec![0.1; hd * nh],
0.8,
);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Heads | Time (μs) | Memory (bytes) |
|------------|-------|-----------|----------------|
| 256 | 4 | 2.1 | 5,120 |
| 256 | 8 | 3.8 | 9,216 |
| 512 | 4 | 3.7 | 10,240 |
| 512 | 8 | 6.9 | 18,432 |
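The memory column follows directly from the step layout: 4 bytes per f32, `hidden_dim` activations plus `hidden_dim * num_heads` flattened attention weights. A quick check against the table:

```rust
/// Per-step trajectory memory: f32 activations (hidden_dim)
/// plus f32 attention weights (hidden_dim * num_heads).
fn step_bytes(hidden_dim: usize, num_heads: usize) -> usize {
    4 * hidden_dim * (1 + num_heads)
}
```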
### Buffer Operations
**Target**: Lock-free with <1% contention
```rust
fn bench_trajectory_buffer(c: &mut Criterion) {
let buffer = Arc::new(TrajectoryBuffer::new(10000));
c.bench_function("trajectory_buffer_record", |b| {
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
};
b.iter(|| {
buffer.record(trajectory.clone());
});
});
c.bench_function("trajectory_buffer_drain", |b| {
// Pre-fill buffer
for i in 0..1000 {
buffer.record(QueryTrajectory {
id: i,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
});
}
b.iter(|| {
buffer.drain()
});
});
}
```
---
## Pattern Learning Benchmarks
### K-means++ Extraction
**Target**: <1s for 1000 trajectories
```rust
fn bench_pattern_extraction(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_extraction");
for n_trajectories in [100, 500, 1000, 5000] {
let mut bank = ReasoningBank::new(PatternConfig {
k_clusters: 50,
embedding_dim: 256,
..Default::default()
});
// Pre-populate
for i in 0..n_trajectories {
bank.add_trajectory(&generate_random_trajectory(i, 256));
}
group.bench_with_input(
BenchmarkId::from_parameter(n_trajectories),
&n_trajectories,
|b, _| {
b.iter(|| {
bank.extract_patterns()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Trajectories | Clusters | Time (ms) | Iterations |
|--------------|----------|-----------|------------|
| 100 | 10 | 12 | 8 |
| 500 | 25 | 95 | 12 |
| 1000 | 50 | 380 | 15 |
| 5000 | 100 | 2,450 | 20 |
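Extraction cost is dominated by seeding and assignment. As an illustration of the seeding idea, here is a deterministic farthest-point variant of k-means++ initialization; the real extractor presumably uses the standard randomized D² sampling, so this sketch trades randomness for testability:

```rust
/// Deterministic farthest-point seeding: start from the first point,
/// then repeatedly pick the point farthest from every chosen center.
fn farthest_point_seeds(points: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut seeds = vec![0usize];
    let dist2 = |a: &[f32], b: &[f32]| -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    };
    while seeds.len() < k.min(points.len()) {
        let (best, _) = points.iter().enumerate()
            .map(|(i, p)| {
                // Distance to the nearest already-chosen center
                let d = seeds.iter()
                    .map(|&s| dist2(p, &points[s]))
                    .fold(f32::INFINITY, f32::min);
                (i, d)
            })
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .unwrap();
        seeds.push(best);
    }
    seeds
}
```

Already-chosen points have distance zero to themselves, so they are never re-selected.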
### Pattern Search
**Target**: <1ms per query
```rust
fn bench_pattern_search(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_search");
for n_patterns in [1000, 10000, 100000] {
let mut index = PatternIndex::new(256, n_patterns);
// Pre-populate
for i in 0..n_patterns {
let embedding: Vec<f32> = (0..256).map(|_| rand::random()).collect();
index.add_pattern(i as u64, &embedding).unwrap();
}
let query: Vec<f32> = (0..256).map(|_| rand::random()).collect();
group.bench_with_input(
BenchmarkId::from_parameter(n_patterns),
&n_patterns,
|b, _| {
b.iter(|| {
index.find_similar(&query, 10)
});
},
);
}
group.finish();
}
```
**Expected Results** (HNSW with ef=50):
| Patterns | Search Time (μs) | Recall@10 |
|----------|------------------|-----------|
| 1,000 | 45 | 0.98 |
| 10,000 | 120 | 0.96 |
| 100,000 | 350 | 0.94 |
| 1,000,000| 850 | 0.92 |
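Recall@10 in the table is simply the overlap between the approximate result list and the exact nearest neighbors. A minimal computation of the metric, independent of the index internals:

```rust
/// Recall@k: fraction of the true top-k ids that appear in the retrieved list.
fn recall_at_k(retrieved: &[u64], ground_truth: &[u64]) -> f32 {
    let hits = ground_truth.iter().filter(|&id| retrieved.contains(id)).count();
    hits as f32 / ground_truth.len() as f32
}
```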
---
## EWC++ Benchmarks
### Fisher Information Update
**Target**: <1ms per update
```rust
fn bench_fisher_update(c: &mut Criterion) {
let mut group = c.benchmark_group("fisher_update");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
..Default::default()
});
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.update_fisher(&gradients);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Parameters | Update Time (μs) | Memory (KB) |
|------------|------------------|-------------|
| 1,000 | 15 | 8 |
| 10,000 | 120 | 80 |
| 100,000 | 1,150 | 800 |
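The per-parameter Fisher update is one multiply-add over the gradient vector, which explains the linear scaling above. A sketch under the assumption of a decay-based online diagonal estimate (the decay value is illustrative):

```rust
/// Online diagonal Fisher estimate: F <- decay*F + (1 - decay)*g^2.
/// One multiply-add per parameter, hence linear cost in param_count.
fn update_fisher(fisher: &mut [f32], gradients: &[f32], decay: f32) {
    for (f, g) in fisher.iter_mut().zip(gradients) {
        *f = decay * *f + (1.0 - decay) * g * g;
    }
}
```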
### Constraint Application
**Target**: <500μs per gradient vector
```rust
fn bench_constraint_application(c: &mut Criterion) {
let mut group = c.benchmark_group("ewc_constraints");
for param_count in [1000, 10000, 100000] {
let ewc = EwcPlusPlus::new(EwcConfig {
param_count,
num_tasks: 5,
..Default::default()
});
// Pre-train Fisher
for _ in 0..100 {
let grads: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
ewc.update_fisher(&grads);
}
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.apply_constraints(&gradients)
});
},
);
}
group.finish();
}
```
---
## Dream Engine Benchmarks
### Dream Generation
**Target**: <100ms per dream
```rust
fn bench_dream_generation(c: &mut Criterion) {
let mut group = c.benchmark_group("dream_generation");
for memory_size in [1000, 10000, 50000] {
let mut engine = DreamEngine::new(DreamConfig::default());
// Pre-populate memory
for i in 0..memory_size {
engine.add_memory_node(MemoryNode {
id: i as u64,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: rand::random::<u32>() % 100,
importance: rand::random(),
});
}
group.bench_with_input(
BenchmarkId::from_parameter(memory_size),
&memory_size,
|b, _| {
b.iter(|| {
engine.generate_dream()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Memory Nodes | Dream Time (ms) | Avg Path Length |
|--------------|-----------------|-----------------|
| 1,000 | 12 | 8 |
| 10,000 | 45 | 12 |
| 50,000 | 85 | 15 |
### Dream Quality Evaluation
**Target**: <50ms per evaluation
```rust
fn bench_dream_evaluation(c: &mut Criterion) {
let evaluator = DreamEvaluator::new(EvaluatorConfig::default());
let dream = Dream {
id: 1,
path: (0..15).map(|i| MemoryNode {
id: i,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: 10,
importance: 0.5,
}).collect(),
creative_jumps: 3,
total_novelty: 0.0,
};
c.bench_function("dream_evaluation", |b| {
b.iter(|| {
evaluator.evaluate(&dream)
});
});
}
```
---
## Learning Loop Benchmarks
### Loop A (Instant) - Per Request
**Target**: <1ms total overhead
```rust
fn bench_loop_a(c: &mut Criterion) {
let loop_a = InstantLoop::new(256, InstantLoopConfig::default());
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: (0..10).map(|_| TrajectoryStep {
activations: vec![0.5; 256],
attention_weights: vec![0.1; 2048],
reward: 0.8,
timestamp: Instant::now(),
}).collect(),
final_quality: 0.8,
latency_us: 50000,
};
c.bench_function("loop_a_on_inference", |b| {
b.iter(|| {
loop_a.on_inference(trajectory.clone());
});
});
c.bench_function("loop_a_flush", |b| {
// Pre-fill with signals
for _ in 0..100 {
loop_a.on_inference(trajectory.clone());
}
b.iter(|| {
loop_a.flush_updates();
});
});
}
```
**Expected Results**:
| Operation | Time (μs) | Notes |
|---------------|-----------|--------------------------|
| on_inference | 650 | Recording + accumulation |
| flush_updates | 120 | LoRA + edge commit |
| Total | 770 | Per request overhead |
### Loop B (Background) - Hourly
**Target**: <30s per cycle
```rust
fn bench_loop_b(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_b = BackgroundLoop::new(BackgroundLoopConfig::default(), 256);
// Generate trajectories
let trajectories: Vec<_> = (0..1000)
.map(|i| generate_random_trajectory(i, 256))
.collect();
c.bench_function("loop_b_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_b.run_cycle(trajectories.clone()).await
});
});
}
```
**Breakdown**:
| Phase | Time (s) | % of Total |
|------------------------|----------|------------|
| Trajectory ingestion | 0.5 | 2% |
| Pattern extraction | 8.0 | 32% |
| Gradient computation | 5.0 | 20% |
| EWC++ constraints | 3.0 | 12% |
| LoRA update | 2.0 | 8% |
| Fisher update | 4.0 | 16% |
| Metrics/logging | 2.5 | 10% |
| **Total** | **25.0** | 100% |
### Loop C (Deep) - Weekly
**Target**: <10min per cycle
```rust
fn bench_loop_c(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_c = DeepLoop::new(DeepLoopConfig::default());
// This is a longer benchmark, run fewer iterations
c.bench_function("loop_c_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_c.run_cycle().await
});
});
}
```
**Breakdown**:
| Phase | Time (min) | % of Total |
|------------------------|------------|------------|
| Dream generation (50) | 1.5 | 15% |
| Φ evaluation | 2.0 | 20% |
| Dream integration | 1.0 | 10% |
| Memory consolidation | 3.0 | 30% |
| EWC++ consolidation | 2.0 | 20% |
| Metrics/persistence | 0.5 | 5% |
| **Total** | **10.0** | 100% |
---
## Memory Benchmarks
### Memory Usage by Component
```rust
fn measure_memory_usage() -> MemoryReport {
let mut report = MemoryReport::default();
// Micro-LoRA (rank=1, hidden=256)
let micro_lora = MicroLoRA::new(256, 1);
report.micro_lora = std::mem::size_of_val(&micro_lora)
+ micro_lora.down_proj.len() * 4
+ micro_lora.up_proj.len() * 4
+ micro_lora.gradient_buffer.len() * 4;
// Base LoRA (rank=8, hidden=256, layers=12)
let base_lora = BaseLoRA::new(256, 8, 12);
report.base_lora = std::mem::size_of_val(&base_lora)
+ base_lora.layers.iter().map(|l|
l.down_proj.len() * 4 + l.up_proj.len() * 4
).sum::<usize>();
// Trajectory buffer (capacity=10000)
report.trajectory_buffer = 10000 * (
256 * 4 // query embedding
+ 10 * (256 * 4 + 2048 * 4 + 4 + 8) // 10 steps
);
// Pattern index (100k patterns)
report.pattern_index = 100000 * (256 * 4 + 64); // embedding + metadata
// EWC++ (100k params, 5 tasks)
report.ewc = 100000 * 4 * 5; // Fisher per task
report
}
```
**Expected Memory Usage**:
| Component | Size (MB) | Notes |
|------------------|-----------|--------------------------|
| Micro-LoRA | 0.004 | Minimal overhead |
| Base LoRA | 0.6 | 12 layers |
| Trajectory Buffer| 82.0 | 10k capacity |
| Pattern Index | 102.4 | 100k patterns |
| EWC++ Fisher | 2.0 | 100k params × 5 tasks |
| Dream Engine | 12.8 | 50k memory nodes |
| **Total** | **199.8** | Peak usage |
---
## Throughput Benchmarks
### End-to-End Query Throughput
```rust
fn bench_query_throughput(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let sona = runtime.block_on(async {
SonaEngine::new(SonaConfig::default()).await.unwrap()
});
c.bench_function("query_throughput", |b| {
b.to_async(&runtime).iter(|| async {
sona.process("test query", &Context::default()).await
});
});
}
```
**Expected Throughput**:
| Scenario | QPS | Latency p50 | Latency p99 |
|--------------------|---------|-------------|-------------|
| Baseline (no SONA) | 850 | 1.1ms | 2.5ms |
| With Micro-LoRA | 780 | 1.2ms | 2.8ms |
| Full SONA | 720 | 1.3ms | 3.2ms |
**Overhead**: ~15% throughput reduction for full self-learning capability.
---
## Hardware-Specific Benchmarks
### CPU Feature Detection
```rust
fn check_cpu_features() -> CpuFeatures {
CpuFeatures {
avx2: is_x86_feature_detected!("avx2"),
avx512f: is_x86_feature_detected!("avx512f"),
fma: is_x86_feature_detected!("fma"),
sse4_1: is_x86_feature_detected!("sse4.1"),
sse4_2: is_x86_feature_detected!("sse4.2"),
}
}
```
### Performance by CPU
| CPU | Micro-LoRA (μs) | Pattern Search (μs) | Overall Speedup |
|------------------------|-----------------|---------------------|-----------------|
| Intel i9-13900K (AVX2) | 3.2 | 45 | 4.8x |
| AMD Ryzen 9 7950X | 3.5 | 48 | 4.5x |
| Apple M2 Pro (NEON) | 4.1 | 52 | 3.9x |
| Intel Xeon Platinum | 2.8 | 38 | 5.2x |
---
## Benchmark Commands
```bash
# Run all benchmarks
cargo bench --package ruvllm --features sona
# Run specific benchmark group
cargo bench --package ruvllm --bench micro_lora
# Run with specific features
cargo bench --package ruvllm --features "sona,avx2"
# Profile memory
cargo bench --package ruvllm --bench memory -- --profile-time 60
# Generate flamegraph
cargo flamegraph --bench micro_lora -- --bench
```
---
## Continuous Benchmarking
### CI Integration
```yaml
# .github/workflows/bench.yml
name: Benchmarks
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench --package ruvllm --features sona -- --save-baseline main
- name: Compare with baseline
run: cargo bench --package ruvllm --features sona -- --baseline main
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: target/criterion
```
### Regression Detection
```rust
// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;
fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
if regression > MAX_REGRESSION_PERCENT {
Err(format!(
"Performance regression of {:.1}% exceeds threshold of {}%",
regression, MAX_REGRESSION_PERCENT
))
} else {
Ok(())
}
}
```
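Using the check above: a 115 ms run against a 100 ms baseline is a 15% regression and fails, while 105 ms passes and improvements always pass. The function is repeated here so the example is self-contained:

```rust
use std::time::Duration;

// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;

fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
    let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
    if regression > MAX_REGRESSION_PERCENT {
        Err(format!(
            "Performance regression of {:.1}% exceeds threshold of {}%",
            regression, MAX_REGRESSION_PERCENT
        ))
    } else {
        Ok(())
    }
}
```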
---
## Next Steps
1. **09-API-REFERENCE.md** - Complete API documentation
