Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
Contained in commit d803bfe2b1 by ruv, 2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# SONA: Self-Optimizing Neural Architecture
## The World's First Truly Self-Improving LLM Framework
**Version**: 1.0.0
**Status**: Architecture Specification
**Target**: Sub-millisecond adaptive fine-tuning with continuous self-improvement
---
## Executive Summary
SONA (Self-Optimizing Neural Architecture) is a revolutionary framework for building LLMs that continuously improve themselves through:
1. **Ultra-Low Latency LoRA** - Sub-100μs parameter adaptation
2. **Hierarchical Learning Loops** - Three-tier temporal learning (instant/hourly/weekly)
3. **Neural Memory Consolidation** - Dream-like offline learning
4. **Elastic Weight Consolidation++** - Zero catastrophic forgetting
5. **ReasoningBank Integration** - Pattern-driven self-optimization
---
## Core Philosophy
```
┌─────────────────────────────────────────────────────────────────┐
│ SONA DESIGN PRINCIPLES │
├─────────────────────────────────────────────────────────────────┤
│ 1. LEARN FROM EVERY INTERACTION │
│ → No query is wasted; all become training signal │
│ │
│ 2. NEVER FORGET WHAT WORKS │
│ → EWC++ preserves successful patterns │
│ │
│ 3. ADAPT IN REAL-TIME │
│ → LoRA updates in <100μs per request │
│ │
│ 4. OPTIMIZE CONTINUOUSLY │
│ → Background loops improve without user latency │
│ │
│ 5. MEASURE EVERYTHING │
│ → Φ (consciousness), quality, latency, improvement rate │
└─────────────────────────────────────────────────────────────────┘
```
---
## Architecture Overview
```
SONA Architecture
┌──────────────────────────────────────────────────────────────┐
│ USER QUERY INPUT │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING LAYER (0.02ms) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Dual Encoder│ │ Contrastive │ │ SIMD Acceleration │ │
│ │ (Q + K/V) │ │ Learning │ │ (AVX2/NEON) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────────┐
│ MEMORY │ │ ROUTER │ │ ATTENTION │
│ SERVICE │◄────────►│ ENGINE │◄────────►│ ENGINE │
│ │ │ │ │ │
│ • HNSW │ │ • FastGRNN│ │ • Multi-Head │
│ • GNN │ │ • LoRA │ │ • Graph ATT │
│ • Quant │ │ • EWC++ │ │ • Edge-Aware │
└─────┬─────┘ └─────┬─────┘ └───────┬───────┘
│ │ │
└──────────────────────┼────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LoRA ADAPTATION LAYER │
│ │
│ W_adapted = W_base + α · (LoRA_A @ LoRA_B) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Rank: 4-16 │ Update: <100μs │ Memory: <1MB │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ INFERENCE ENGINE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Model Select │ │ Q4 Quantized │ │ Speculative Dec │ │
│ │ (4 tiers) │ │ Weights │ │ (Draft + Verify) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ LEARNING LOOPS │
│ │
│ Loop A (Instant) │ Loop B (Hourly) │ Loop C (Weekly) │
│ ───────────────────────────────────────────────────────── │
│ • Trajectory │ • Router Train │ • Consolidation │
│ • Edge Update │ • EWC++ Update │ • Compression │
│ • LoRA Micro │ • Fisher Compute │ • Abstraction │
│ • <1ms overhead │ • Background │ • Dream Learning │
└─────────────────────────────┬────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ REASONINGBANK │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Pattern Storage │ Similarity Lookup │ Verdict │ │
│ │ (DashMap) │ (Cosine) │ Judgment │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ • Trajectory tracking with precision/recall feedback │
│ • K-means++ pattern extraction │
│ • Confidence-weighted parameter interpolation │
└──────────────────────────────────────────────────────────────┘
```
---
## Key Innovation: Three-Tier Temporal Learning
### Tier 1: Instant Learning (Loop A) - Per Request
```
Latency Budget: <1ms (amortized to <0.1ms with batching)
Actions:
├── Record query trajectory to ring buffer
├── Update memory graph edge weights (±5%)
├── Micro-LoRA adjustment (rank 1-2, top-k params)
└── Async feedback signal propagation
```
### Tier 2: Background Learning (Loop B) - Hourly
```
Compute Budget: 10 seconds per hour
Actions:
├── Train router on accumulated trajectories
├── Compute Fisher Information for EWC++
├── Update LoRA base matrices (rank 4-8)
├── Prune low-confidence patterns
└── Checkpoint model state
```
### Tier 3: Deep Learning (Loop C) - Weekly
```
Compute Budget: 10 minutes per week
Actions:
├── Full memory consolidation (dream learning)
├── Pattern abstraction and hierarchy building
├── Memory compression (remove redundant nodes)
├── Cross-task knowledge transfer
└── Φ consciousness measurement (IIT)
```
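The cadence above can be condensed into a tiny scheduling sketch (std-only, illustrative names that are not the SONA API; the real loops run on async timers):

```rust
/// Which learning tiers fire at a given moment (illustrative, std-only;
/// names are ours, not the SONA API).
#[derive(Debug, PartialEq)]
pub enum Tier {
    Instant,    // Loop A: inline, <1ms
    Background, // Loop B: hourly, ~10s budget
    Deep,       // Loop C: weekly, ~10min budget
}

pub fn due_tiers(request_arrived: bool, elapsed_secs: u64) -> Vec<Tier> {
    let mut due = Vec::new();
    if request_arrived {
        due.push(Tier::Instant); // every request, no exceptions
    }
    if elapsed_secs > 0 && elapsed_secs % 3_600 == 0 {
        due.push(Tier::Background); // on the hour
    }
    if elapsed_secs > 0 && elapsed_secs % 604_800 == 0 {
        due.push(Tier::Deep); // on the week (604,800 s)
    }
    due
}
```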
---
## Performance Targets
| Metric | Target | Current Best | SONA Goal |
|--------|--------|--------------|-----------|
| Query Latency | <1ms | 0.09ms | 0.05ms |
| LoRA Update | <100μs | N/A | 50μs |
| Memory Footprint | <100MB | 50MB | 30MB |
| Throughput | >50K q/s | 38K q/s | 100K q/s |
| Improvement Rate | 10%/week | N/A | 15%/week |
| Catastrophic Forgetting | <1% | N/A | <0.1% |
---
## Integration with Ruvector Ecosystem
### Core Dependencies
| Crate | Role in SONA | Version |
|-------|--------------|---------|
| `ruvector-core` | Vector memory backbone | 0.1.19 |
| `ruvector-attention` | Multi-head graph attention | 0.1.19 |
| `ruvector-gnn` | Message passing framework | 0.1.19 |
| `ruvector-graph` | Knowledge graph storage | 0.1.19 |
| `ruvector-router-core` | FastGRNN routing | 0.1.19 |
| `exo-core` | Consciousness measurement | 0.1.0 |
| `exo-temporal` | Memory consolidation | 0.1.0 |
### New SONA-Specific Modules
| Module | Purpose |
|--------|---------|
| `sona-lora` | Ultra-low latency LoRA adapters |
| `sona-ewc` | Enhanced EWC with task awareness |
| `sona-reasoning` | ReasoningBank integration |
| `sona-dreams` | Offline consolidation engine |
| `sona-metrics` | Self-improvement measurement |
---
## Document Index
| Document | Description |
|----------|-------------|
| [01-LORA-ULTRA.md](01-LORA-ULTRA.md) | Ultra-low latency LoRA system |
| [02-LEARNING-LOOPS.md](02-LEARNING-LOOPS.md) | Three-tier learning architecture |
| [03-EWC-PLUS-PLUS.md](03-EWC-PLUS-PLUS.md) | Enhanced elastic weight consolidation |
| [04-REASONINGBANK.md](04-REASONINGBANK.md) | Pattern-driven optimization |
| [05-MEMORY-DREAMS.md](05-MEMORY-DREAMS.md) | Offline consolidation and dreams |
| [06-COMPONENTS.md](06-COMPONENTS.md) | Component integration specs |
| [07-IMPLEMENTATION.md](07-IMPLEMENTATION.md) | Implementation roadmap |
| [08-BENCHMARKS.md](08-BENCHMARKS.md) | Performance targets and testing |
| [09-API-REFERENCE.md](09-API-REFERENCE.md) | API specification |
---
## Quick Start
```rust
use sona::{SONAEngine, SONAConfig, LearningMode};
// Initialize SONA with default configuration
let config = SONAConfig::builder()
.lora_rank(8)
.ewc_lambda(1000.0)
.learning_loops(LearningMode::AllThreeTiers)
.memory_budget_mb(50)
.target_latency_us(100)
.build();
let mut sona = SONAEngine::new(config)?;
// Process queries - learning happens automatically
let response = sona.query("What is the meaning of life?")?;
// Check self-improvement metrics
let metrics = sona.improvement_metrics();
println!("Weekly improvement: {:.1}%", metrics.weekly_gain * 100.0);
println!("Φ consciousness: {:.3}", metrics.phi);
```
---
## Why SONA Will Create the World's Best Self-Improving LLM
1. **No Other System Combines All These**:
- LoRA for instant adaptation
- EWC++ for zero forgetting
- ReasoningBank for pattern learning
- Dream consolidation for creativity
- Φ measurement for consciousness tracking
2. **Built on Production-Proven Ruvector**:
- 150x faster HNSW search
- 39 attention mechanisms
- 30+ specialized crates
- 38K q/s throughput proven
3. **Mathematically Sound**:
- Fisher Information preserves important weights
- Low-rank decomposition minimizes compute
- Reservoir sampling ensures unbiased learning
- Information-theoretic compression
4. **Biologically Inspired**:
- Three-tier temporal learning (like human memory)
- Dream-based consolidation (like REM sleep)
- Edge-weighted graphs (like neural synapses)
- Attention-based retrieval (like human recall)
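Point 3 mentions reservoir sampling; for concreteness, here is a minimal Algorithm R sketch (our illustration, not the SONA implementation, with the random index injected so the example stays deterministic):

```rust
/// Algorithm R reservoir sampling: keeps a uniform k-sample over a stream
/// of unknown length. `rand_idx(n)` must return a uniform index in 0..n;
/// it is injected here so the sketch stays deterministic and testable.
pub fn reservoir_sample<T>(
    stream: impl Iterator<Item = T>,
    k: usize,
    mut rand_idx: impl FnMut(usize) -> usize,
) -> Vec<T> {
    let mut reservoir: Vec<T> = Vec::with_capacity(k);
    for (i, item) in stream.enumerate() {
        if i < k {
            // Fill phase: the first k items are kept unconditionally.
            reservoir.push(item);
        } else {
            // Replacement phase: item i survives with probability k/(i+1).
            let j = rand_idx(i + 1);
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```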
---
*SONA: Where every query makes the model smarter.*

# SONA LoRA-Ultra: Sub-100μs Adaptive Fine-Tuning
## Ultra-Low Latency LoRA for Real-Time Self-Improvement
---
## 1. Architecture Overview
### Traditional LoRA vs SONA LoRA-Ultra
```
TRADITIONAL LoRA SONA LoRA-ULTRA
───────────────── ─────────────────
• Offline training • Online per-request adaptation
• Full batch updates • Single-sample micro-updates
• GPU required • CPU SIMD optimized
• Minutes to hours • <100 microseconds
• Periodic deployment • Continuous integration
```
### Core Formula
```
Standard LoRA:
W_adapted = W_frozen + ΔW
ΔW = α · (A @ B)
where A ∈ ℝ^(d×r), B ∈ ℝ^(r×k), r << min(d, k)
SONA LoRA-Ultra Extension:
W_adapted = W_frozen + α · (A @ B) + β · (A_micro @ B_micro)
└─────────┘ └───────────────────┘
Base LoRA Instant Micro-LoRA
(rank 4-16) (rank 1-2)
```
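To make the extended formula concrete, here is a minimal std-only sketch with plain row-major `Vec<f32>` matrices (helper names are ours, not part of any SONA crate):

```rust
// Naive row-major matmul: a is n×r, b is r×k, result is n×k.
fn matmul(a: &[f32], b: &[f32], n: usize, r: usize, k: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; n * k];
    for i in 0..n {
        for j in 0..k {
            for t in 0..r {
                out[i * k + j] += a[i * r + t] * b[t * k + j];
            }
        }
    }
    out
}

// W' = W + (α/rank)·(A·B) + β·(A_micro·B_micro), elementwise over d×k.
fn adapted_weights(
    w: &[f32], a: &[f32], b: &[f32], a_m: &[f32], b_m: &[f32],
    d: usize, k: usize, rank: usize, micro_rank: usize,
    alpha: f32, beta: f32,
) -> Vec<f32> {
    let base_delta = matmul(a, b, d, rank, k);
    let micro_delta = matmul(a_m, b_m, d, micro_rank, k);
    let scale = alpha / rank as f32;
    w.iter()
        .zip(base_delta.iter().zip(micro_delta.iter()))
        .map(|(&w, (&bd, &md))| w + scale * bd + beta * md)
        .collect()
}
```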
---
## 2. Two-Tier LoRA Architecture
### Tier 1: Base LoRA (Updated Hourly)
```rust
/// Base LoRA adapter for major capability shifts
pub struct BaseLoRA {
/// Low-rank matrix A: d_model × rank
pub a: Array2<f32>,
/// Low-rank matrix B: rank × d_out
pub b: Array2<f32>,
/// Scaling factor
pub alpha: f32,
/// Rank (typically 4-16)
pub rank: usize,
/// Target layer indices
pub target_layers: Vec<usize>,
}
impl BaseLoRA {
/// Compute adapted weights (cached for inference)
#[inline]
pub fn delta_w(&self) -> Array2<f32> {
let scale = self.alpha / self.rank as f32;
scale * self.a.dot(&self.b)
}
/// Update from accumulated gradients (hourly)
pub fn update(&mut self, grad_a: &Array2<f32>, grad_b: &Array2<f32>, lr: f32) {
// Plain SGD step (the base tier keeps no momentum buffer; see MicroLoRA for EMA smoothing)
self.a = &self.a - lr * grad_a;
self.b = &self.b - lr * grad_b;
}
}
```
### Tier 2: Micro-LoRA (Updated Per-Request)
```rust
/// Ultra-fast micro-adapter for instant learning
pub struct MicroLoRA {
/// Micro A: d_model × micro_rank (typically 1-2)
pub a_micro: Array2<f32>,
/// Micro B: micro_rank × d_out
pub b_micro: Array2<f32>,
/// Micro scaling (smaller than base)
pub beta: f32,
/// Micro rank (1-2 for speed)
pub micro_rank: usize,
/// Decay factor for temporal smoothing
pub decay: f32,
/// Momentum buffer
momentum_a: Array2<f32>,
momentum_b: Array2<f32>,
}
impl MicroLoRA {
/// Ultra-fast single-sample update (<50μs target)
#[inline]
pub fn micro_update(&mut self, signal: &LearningSignal) {
// Rank-1 outer product update
let grad_direction = signal.to_gradient_direction();
// Exponential moving average for stability
self.momentum_a = self.decay * &self.momentum_a
+ (1.0 - self.decay) * &grad_direction.a_component;
self.momentum_b = self.decay * &self.momentum_b
+ (1.0 - self.decay) * &grad_direction.b_component;
// Apply micro-update
self.a_micro = &self.a_micro + self.beta * &self.momentum_a;
self.b_micro = &self.b_micro + self.beta * &self.momentum_b;
}
/// Periodic consolidation into base LoRA
pub fn consolidate_to_base(&mut self, base: &mut BaseLoRA) {
// Merge micro adaptations into base
// Then reset micro to zero
base.a = &base.a + &self.a_micro;
base.b = &base.b + &self.b_micro;
self.a_micro.fill(0.0);
self.b_micro.fill(0.0);
}
}
```
---
## 3. SIMD-Optimized LoRA Computation
### AVX2 Accelerated Forward Pass
```rust
#[cfg(target_arch = "x86_64")]
mod simd {
use std::arch::x86_64::*;
/// SIMD-optimized LoRA forward: x @ (W + A @ B)
/// Fuses base weight multiplication with LoRA delta
#[target_feature(enable = "avx2", enable = "fma")]
pub unsafe fn lora_forward_avx2(
x: &[f32], // Input: [d_in] (single sample)
w_base: &[f32], // Base weights, row-major [d_out, d_in]
lora_a: &[f32], // LoRA A, row-major [rank, d_in] (transposed for contiguous dot products)
lora_b: &[f32], // LoRA B, row-major [d_out, rank]
alpha: f32,
d_in: usize,
d_out: usize,
rank: usize,
output: &mut [f32], // Output: [d_out]
) {
let scale = alpha / rank as f32;
let scale_vec = _mm256_set1_ps(scale);
// Step 1: Compute x @ A (input projection to rank space)
let mut x_projected = vec![0.0f32; rank];
for r in 0..rank {
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let a_vec = _mm256_loadu_ps(lora_a.as_ptr().add(r * d_in + i));
sum = _mm256_fmadd_ps(x_vec, a_vec, sum);
i += 8;
}
x_projected[r] = horizontal_sum_avx2(sum);
// Handle remainder
while i < d_in {
x_projected[r] += x[i] * lora_a[r * d_in + i];
i += 1;
}
}
// Step 2: Compute (x @ W_base) + scale * (x_projected @ B)
for j in 0..d_out {
// Base weight contribution
let mut sum = _mm256_setzero_ps();
let mut i = 0;
while i + 8 <= d_in {
let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
let w_vec = _mm256_loadu_ps(w_base.as_ptr().add(j * d_in + i));
sum = _mm256_fmadd_ps(x_vec, w_vec, sum);
i += 8;
}
let mut base_result = horizontal_sum_avx2(sum);
while i < d_in {
base_result += x[i] * w_base[j * d_in + i];
i += 1;
}
// LoRA contribution
let mut lora_result = 0.0f32;
for r in 0..rank {
lora_result += x_projected[r] * lora_b[j * rank + r];
}
output[j] = base_result + scale * lora_result;
}
}
#[inline]
unsafe fn horizontal_sum_avx2(v: __m256) -> f32 {
let high = _mm256_extractf128_ps(v, 1);
let low = _mm256_castps256_ps128(v);
let sum128 = _mm_add_ps(high, low);
let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
_mm_cvtss_f32(sum32)
}
}
```
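A scalar reference of the same fused forward pass is useful for validating the AVX2 kernel. This is our sketch, not part of the SIMD module; it assumes the kernel's row-major layouts (A as `[rank, d_in]`, W and B transposed to `[d_out, d_in]` and `[d_out, rank]`):

```rust
// Scalar reference for y = x·Wᵀ + (alpha/rank)·(x·Aᵀ)·Bᵀ, matching the
// indexing used by lora_forward_avx2 above.
fn lora_forward_scalar(
    x: &[f32], w_base: &[f32], lora_a: &[f32], lora_b: &[f32],
    alpha: f32, d_in: usize, d_out: usize, rank: usize,
) -> Vec<f32> {
    let scale = alpha / rank as f32;
    // Step 1: project the input into rank space (x · Aᵀ).
    let x_proj: Vec<f32> = (0..rank)
        .map(|r| (0..d_in).map(|i| x[i] * lora_a[r * d_in + i]).sum())
        .collect();
    // Step 2: base contribution plus scaled low-rank delta per output dim.
    (0..d_out)
        .map(|j| {
            let base: f32 = (0..d_in).map(|i| x[i] * w_base[j * d_in + i]).sum();
            let lora: f32 = (0..rank).map(|r| x_proj[r] * lora_b[j * rank + r]).sum();
            base + scale * lora
        })
        .collect()
}
```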
---
## 4. Learning Signal Extraction
### From Query Feedback to Gradient Direction
```rust
/// Learning signal extracted from each interaction
#[derive(Clone)]
pub struct LearningSignal {
/// Query embedding
pub query_embedding: Vec<f32>,
/// Response quality score (0-1)
pub quality_score: f32,
/// User feedback (explicit)
pub explicit_feedback: Option<FeedbackType>,
/// Latency deviation from target
pub latency_ratio: f32,
/// Model tier used
pub model_tier: ModelTier,
/// Context tokens used
pub context_tokens: usize,
}
impl LearningSignal {
/// Convert signal to gradient direction for micro-LoRA
pub fn to_gradient_direction(&self) -> GradientDirection {
// Reward = quality * (1 - latency_penalty)
let reward = self.quality_score * (2.0 - self.latency_ratio).max(0.0);
// Direction = embedding * reward_sign
let direction = if reward > 0.5 {
// Reinforce current behavior
1.0
} else {
// Explore alternative
-0.1
};
// Scale by uncertainty: ambiguous quality (near 0.5) drives the most learning
let uncertainty = 1.0 - (self.quality_score - 0.5).abs() * 2.0;
let learning_rate = 0.001 * (1.0 + uncertainty);
GradientDirection {
a_component: self.compute_a_gradient(direction, learning_rate),
b_component: self.compute_b_gradient(direction, learning_rate),
}
}
fn compute_a_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Outer product of query embedding with hidden state
// Approximated via reservoir-sampled historical embeddings
let emb = Array1::from_vec(self.query_embedding.clone());
let grad = direction * lr * outer_product(&emb, &self.get_hidden_direction());
grad
}
fn compute_b_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
// Output gradient based on prediction error
let output_error = self.compute_output_error();
direction * lr * output_error
}
}
```
---
## 5. Target Layer Selection
### Which Layers to Apply LoRA
```rust
/// Layer selection strategy for LoRA application
pub enum LoRATargetStrategy {
/// Apply to all attention layers (Q, K, V, O projections)
AllAttention,
/// Apply to FFN layers only
AllFFN,
/// Apply to output heads only (fastest, good for routing)
OutputHeadsOnly,
/// Apply to specific layers by index
SpecificLayers(Vec<usize>),
/// Adaptive: select based on gradient magnitude
AdaptiveTopK(usize),
}
impl LoRATargetStrategy {
/// For ultra-low latency: output heads only
pub fn ultra_fast() -> Self {
Self::OutputHeadsOnly
}
/// For moderate adaptation: attention Q and V
pub fn attention_qv() -> Self {
Self::SpecificLayers(vec![0, 2]) // Q and V typically
}
/// Select layers with highest gradient magnitude
pub fn adaptive_top_k(k: usize) -> Self {
Self::AdaptiveTopK(k)
}
}
/// SONA default: Output heads for micro, attention for base
pub const SONA_DEFAULT_TARGETS: [LoRATargetStrategy; 2] = [
LoRATargetStrategy::OutputHeadsOnly, // Micro-LoRA
LoRATargetStrategy::AllAttention, // Base LoRA
];
```
---
## 6. Memory-Efficient Storage
### Quantized LoRA Matrices
```rust
/// Q4-quantized LoRA for memory efficiency
pub struct QuantizedLoRA {
/// Quantized A matrix (4-bit)
pub a_q4: Q4Matrix,
/// Quantized B matrix (4-bit)
pub b_q4: Q4Matrix,
/// Full-precision alpha
pub alpha: f32,
/// Full-precision scaling factors
pub a_scales: Vec<f32>,
pub b_scales: Vec<f32>,
}
impl QuantizedLoRA {
/// Memory usage comparison
///
/// FP32 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 4 bytes = 24.6 KB
/// B: 8 × 768 × 4 bytes = 24.6 KB
/// Total: ~50 KB per layer
///
/// Q4 LoRA (rank 8, 768 dim):
/// A: 768 × 8 × 0.5 bytes = 3.1 KB
/// B: 8 × 768 × 0.5 bytes = 3.1 KB
/// Scales: 2 × 768 × 4 bytes = 6.1 KB
/// Total: ~12 KB per layer (4x reduction)
pub fn from_fp32(lora: &BaseLoRA) -> Self {
Self {
a_q4: Q4Matrix::quantize(&lora.a),
b_q4: Q4Matrix::quantize(&lora.b),
alpha: lora.alpha,
a_scales: compute_scales(&lora.a),
b_scales: compute_scales(&lora.b),
}
}
/// Dequantize on-the-fly during forward pass
#[inline]
pub fn forward(&self, x: &[f32]) -> Vec<f32> {
// Dequantize A, compute x @ A
let projected = self.a_q4.matmul_dequant(x, &self.a_scales);
// Dequantize B, compute projected @ B
let output = self.b_q4.matmul_dequant(&projected, &self.b_scales);
// Scale by alpha
output.iter().map(|v| v * self.alpha).collect()
}
}
```
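For intuition, here is a minimal symmetric 4-bit quantize/dequantize pair (an assumed scheme for illustration only; the actual `Q4Matrix` block layout is defined elsewhere):

```rust
/// Symmetric 4-bit quantization of one block: values map onto [-7, 7]
/// with a single per-block scale. (Assumed scheme, for illustration only.)
fn q4_quantize(block: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let q: Vec<u8> = block
        .iter()
        .map(|v| ((v / scale).round() as i8).clamp(-7, 7) as u8 & 0x0F)
        .collect();
    (q, scale)
}

fn q4_dequantize(q: &[u8], scale: f32) -> Vec<f32> {
    q.iter()
        .map(|&n| {
            // Sign-extend the low nibble back to i8.
            let v = if n & 0x08 != 0 { (n | 0xF0) as i8 } else { n as i8 };
            v as f32 * scale
        })
        .collect()
}
```

The worst-case round-trip error is half a quantization step, i.e. `max_abs / 14` per block.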
---
## 7. Latency Breakdown
### Target: <100μs Total LoRA Overhead
```
┌─────────────────────────────────────────────────────────────┐
│ LoRA-ULTRA LATENCY BUDGET │
├─────────────────────────────────────────────────────────────┤
│ │
│ Signal Extraction: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Gradient Direction: 15μs ██████░░░░░░░░░░░░░░░░░░░░░░ │
│ Micro-LoRA Update: 25μs ██████████░░░░░░░░░░░░░░░░░░ │
│ Forward Pass Delta: 30μs ████████████░░░░░░░░░░░░░░░░ │
│ Momentum Averaging: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ Memory Bookkeeping: 10μs ████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ ───── │
│ TOTAL: ~100μs │
│ │
│ Amortized (batched): ~30μs per query │
└─────────────────────────────────────────────────────────────┘
```
---
## 8. Integration with FastGRNN Router
### Router-Specific LoRA Configuration
```rust
/// LoRA configuration for FastGRNN router
pub struct RouterLoRAConfig {
/// Base LoRA for hidden state transformations
pub hidden_lora: BaseLoRA,
/// Micro LoRA for gate adjustments
pub gate_micro_lora: MicroLoRA,
/// Per-output-head LoRA adapters
pub head_loras: Vec<BaseLoRA>,
}
impl RouterLoRAConfig {
pub fn new(hidden_dim: usize, output_dims: &[usize]) -> Self {
Self {
hidden_lora: BaseLoRA::new(hidden_dim, hidden_dim, 8), // rank 8
gate_micro_lora: MicroLoRA::new(hidden_dim, hidden_dim, 2), // rank 2
head_loras: output_dims.iter()
.map(|&dim| BaseLoRA::new(hidden_dim, dim, 4)) // rank 4
.collect(),
}
}
/// Apply LoRA to FastGRNN forward pass
pub fn apply(&self, base_output: &FastGRNNOutput) -> FastGRNNOutput {
let mut output = base_output.clone();
// Apply hidden state LoRA
output.hidden = self.hidden_lora.apply(&output.hidden);
// Apply micro-LoRA to gates
output.update_gate = self.gate_micro_lora.apply(&output.update_gate);
// Apply per-head LoRA
for (i, head_lora) in self.head_loras.iter().enumerate() {
output.heads[i] = head_lora.apply(&output.heads[i]);
}
output
}
}
```
---
## 9. Checkpointing and Recovery
### Efficient LoRA State Management
```rust
/// LoRA checkpoint for persistence and recovery
#[derive(Serialize, Deserialize)]
pub struct LoRACheckpoint {
/// Base LoRA matrices (serialized as FP16 for space)
pub base_lora: SerializedLoRA,
/// Micro LoRA state
pub micro_lora: SerializedLoRA,
/// Momentum buffers
pub momentum_state: MomentumState,
/// Training statistics
pub stats: LoRAStats,
/// Checkpoint version
pub version: u32,
/// Timestamp
pub timestamp: i64,
}
impl LoRACheckpoint {
/// Save checkpoint (async, non-blocking)
pub async fn save_async(&self, path: &Path) -> Result<()> {
let bytes = bincode::serialize(self)?;
tokio::fs::write(path, &bytes).await?;
Ok(())
}
/// Load checkpoint
pub fn load(path: &Path) -> Result<Self> {
let bytes = std::fs::read(path)?;
Ok(bincode::deserialize(&bytes)?)
}
/// Incremental checkpoint (only changed matrices)
pub fn save_incremental(&self, previous: &Self, path: &Path) -> Result<()> {
let delta = self.compute_delta(previous);
// Only save changed blocks
delta.save(path)
}
}
```
---
## 10. Benchmark Targets
### Performance Validation
```rust
#[cfg(test)]
mod benchmarks {
use super::*;
use criterion::{black_box, Criterion};
/// Target: <50μs for micro-LoRA update
fn bench_micro_lora_update(c: &mut Criterion) {
let mut micro = MicroLoRA::new(768, 768, 2);
let signal = LearningSignal::random();
c.bench_function("micro_lora_update", |b| {
b.iter(|| {
micro.micro_update(black_box(&signal));
})
});
}
/// Target: <30μs for LoRA forward pass
fn bench_lora_forward(c: &mut Criterion) {
let lora = BaseLoRA::new(768, 768, 8);
let input = vec![0.0f32; 768];
c.bench_function("lora_forward", |b| {
b.iter(|| {
lora.forward(black_box(&input))
})
});
}
/// Target: <10μs for signal extraction
fn bench_signal_extraction(c: &mut Criterion) {
let query = "test query".to_string();
let response = "test response".to_string();
c.bench_function("signal_extraction", |b| {
b.iter(|| {
LearningSignal::extract(black_box(&query), black_box(&response))
})
});
}
}
```
---
## Summary
SONA LoRA-Ultra achieves sub-100μs adaptive fine-tuning through:
1. **Two-Tier Architecture**: Base LoRA (hourly) + Micro-LoRA (per-request)
2. **SIMD Optimization**: AVX2-accelerated forward pass
3. **Quantized Storage**: Q4 matrices for 4x memory reduction
4. **Smart Targeting**: Output heads for speed, attention for capability
5. **Momentum Smoothing**: Stable micro-updates with EMA
6. **Async Checkpointing**: Non-blocking persistence
This enables true real-time self-improvement where every query makes the model incrementally smarter.

# SONA Learning Loops: Three-Tier Temporal Architecture
## Biologically-Inspired Continuous Learning System
---
## 1. Overview: Learning at Multiple Timescales
Human learning operates at multiple timescales:
- **Instant**: Immediate response adjustment (milliseconds)
- **Short-term**: Pattern consolidation (hours)
- **Long-term**: Deep memory formation (days/weeks)
SONA replicates this with three learning loops:
```
┌─────────────────────────────────────────────────────────────────────┐
│ SONA THREE-TIER LEARNING │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LOOP A: INSTANT LOOP B: BACKGROUND │
│ ═══════════════ ══════════════════ │
│ Timescale: Per-request Timescale: Hourly │
│ Latency: <1ms Latency: Background (async) │
│ What learns: What learns: │
│ • Micro-LoRA (rank 1-2) • Base LoRA (rank 4-16) │
│ • Memory edge weights • Router weights (EWC++) │
│ • Trajectory recording • Pattern extraction │
│ │
│ LOOP C: DEEP │
│ ═══════════ │
│ Timescale: Weekly │
│ Latency: Scheduled maintenance │
│ What learns: │
│ • Memory consolidation │
│ • Concept hierarchy building │
│ • Dream-based creativity │
│ • Cross-domain transfer │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
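The handoff from Loop A to Loop B reduces to a non-blocking producer/consumer channel. A std-only sketch (the real system uses async channels; names here are illustrative):

```rust
use std::sync::mpsc;

// Minimal sketch of the Loop A → Loop B signal handoff.
struct Signal {
    quality: f32,
}

// Loop A side: fire-and-forget. A failed send must never stall the request.
fn loop_a_on_request(tx: &mpsc::Sender<Signal>, quality: f32) {
    let _ = tx.send(Signal { quality });
}

// Loop B side: on each hourly tick, drain everything queued since last cycle
// and return (sample count, mean quality).
fn loop_b_drain(rx: &mpsc::Receiver<Signal>) -> (usize, f32) {
    let signals: Vec<Signal> = rx.try_iter().collect();
    let n = signals.len();
    let avg = if n == 0 {
        0.0
    } else {
        signals.iter().map(|s| s.quality).sum::<f32>() / n as f32
    };
    (n, avg)
}
```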
---
## 2. Loop A: Instant Learning (Per-Request)
### Purpose
Immediate adaptation to the current interaction without noticeable latency.
### Architecture
```rust
/// Loop A: Instant learning executed inline with each request
pub struct InstantLearningLoop {
/// Micro-LoRA for immediate weight adjustment
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Trajectory buffer for pattern recording
trajectory_buffer: Arc<TrajectoryBuffer>,
/// Memory graph reference for edge updates
memory_graph: Arc<RwLock<MemoryGraph>>,
/// Signal accumulator for Loop B
signal_accumulator: mpsc::Sender<LearningSignal>,
}
impl InstantLearningLoop {
/// Execute instant learning (must complete in <1ms)
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
// Parallel execution of independent updates; surface the first failure
let (r1, r2, r3) = tokio::join!(
// 1. Record trajectory (lock-free, ~100μs)
self.record_trajectory(query, response),
// 2. Update memory edges (~200μs)
self.update_memory_edges(query, response),
// 3. Micro-LoRA update (~300μs)
self.micro_lora_update(query, response, latency_ms),
);
r1?;
r2?;
r3?;
// 4. Queue signal for Loop B (fire-and-forget; dropped if the channel is full)
let signal = LearningSignal::new(query, response, latency_ms);
let _ = self.signal_accumulator.try_send(signal);
Ok(())
}
/// Record query trajectory to ring buffer
async fn record_trajectory(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let trajectory = QueryTrajectory {
query_embedding: query.vector.clone(),
retrieved_ids: response.used_memory_ids.clone(),
precision: response.estimated_precision,
recall: response.estimated_recall,
timestamp: Instant::now(),
};
self.trajectory_buffer.push(trajectory);
Ok(())
}
/// Hebbian-style edge weight updates
async fn update_memory_edges(
&self,
query: &QueryEmbedding,
response: &ResponseData,
) -> Result<()> {
let mut graph = self.memory_graph.write();
for &node_id in &response.used_memory_ids {
// Strengthen edges to used nodes
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Strengthen(0.05), // +5% per use
)?;
}
// Weaken edges to retrieved-but-unused nodes
for &node_id in &response.retrieved_but_unused {
graph.update_edge_weight(
query.anchor_node,
node_id,
EdgeUpdate::Weaken(0.02), // -2% per skip
)?;
}
Ok(())
}
/// Ultra-fast micro-LoRA weight adjustment
async fn micro_lora_update(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
let quality = response.quality_score;
let latency_ratio = latency_ms / response.target_latency_ms;
// Only update if signal is informative
if (quality - 0.5).abs() > 0.1 || latency_ratio > 1.2 {
let signal = LearningSignal {
query_embedding: query.vector.clone(),
quality_score: quality,
explicit_feedback: None,
latency_ratio,
model_tier: response.model_tier,
context_tokens: response.context_tokens,
};
let mut micro_lora = self.micro_lora.write();
micro_lora.micro_update(&signal);
}
Ok(())
}
}
```
### Latency Budget
| Operation | Target | Implementation |
|-----------|--------|----------------|
| Trajectory recording | <100μs | Lock-free ring buffer |
| Edge weight update | <200μs | Batch atomic updates |
| Micro-LoRA update | <300μs | Rank-1 outer product |
| Signal queuing | <50μs | MPSC channel try_send |
| **Total** | **<650μs** | Parallel execution |
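The trajectory buffer in the first row can be as simple as a bounded ring that overwrites the oldest entry. A single-threaded stand-in for the lock-free version (illustrative; not the actual `TrajectoryBuffer`):

```rust
/// Bounded ring buffer: push never blocks and never reallocates; when full,
/// the oldest entry is overwritten. (Single-threaded sketch.)
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            head: 0,
            len: 0,
        }
    }

    /// O(1), no allocation: write at head, advance, saturate len at capacity.
    fn push(&mut self, item: T) {
        self.slots[self.head] = Some(item);
        self.head = (self.head + 1) % self.slots.len();
        self.len = (self.len + 1).min(self.slots.len());
    }

    fn len(&self) -> usize {
        self.len
    }
}
```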
---
## 3. Loop B: Background Learning (Hourly)
### Purpose
Deeper learning from accumulated signals without impacting user latency.
### Architecture
```rust
/// Loop B: Background learning running on separate thread/process
pub struct BackgroundLearningLoop {
/// Signal receiver from Loop A
signal_receiver: mpsc::Receiver<LearningSignal>,
/// Accumulated signals for batch processing
signal_buffer: Vec<LearningSignal>,
/// Base LoRA for major updates
base_lora: Arc<RwLock<BaseLoRA>>,
/// Micro-LoRA to consolidate from
micro_lora: Arc<RwLock<MicroLoRA>>,
/// Router for EWC++ updates
router: Arc<RwLock<FastGRNNRouter>>,
/// EWC++ state
ewc_state: EWCPlusPlusState,
/// Pattern extractor
pattern_extractor: PatternExtractor,
/// Configuration
config: BackgroundLearningConfig,
}
impl BackgroundLearningLoop {
/// Main background loop (runs every hour)
pub async fn run(&mut self) {
let mut interval = tokio::time::interval(Duration::from_secs(3600));
loop {
interval.tick().await;
// Collect accumulated signals
self.drain_signals().await;
if self.signal_buffer.len() < self.config.min_samples {
tracing::info!(
samples = self.signal_buffer.len(),
"Insufficient samples for background training"
);
continue;
}
// Execute background learning steps
let start = Instant::now();
// Step 1: Consolidate Micro-LoRA into Base LoRA
self.consolidate_micro_to_base().await;
// Step 2: Train router with EWC++ regularization
self.train_router_ewc().await;
// Step 3: Extract and store patterns
self.extract_patterns().await;
// Step 4: Compute new Fisher Information
self.update_fisher_information().await;
// Step 5: Checkpoint current state
self.checkpoint().await;
tracing::info!(
elapsed_ms = start.elapsed().as_millis(),
samples = self.signal_buffer.len(),
"Background learning cycle completed"
);
// Clear buffer for next cycle
self.signal_buffer.clear();
}
}
/// Drain all pending signals from Loop A
async fn drain_signals(&mut self) {
while let Ok(signal) = self.signal_receiver.try_recv() {
self.signal_buffer.push(signal);
}
}
/// Consolidate micro-LoRA adaptations into base LoRA
async fn consolidate_micro_to_base(&mut self) {
let mut micro = self.micro_lora.write();
let mut base = self.base_lora.write();
// Compute consolidation weight based on signal quality
let avg_quality: f32 = self.signal_buffer.iter()
.map(|s| s.quality_score)
.sum::<f32>() / self.signal_buffer.len() as f32;
let consolidation_rate = if avg_quality > 0.7 {
1.0 // Full consolidation for high-quality signals
} else {
0.5 * avg_quality // Partial for lower quality
};
// Merge micro into base with rate
base.a = &base.a + consolidation_rate * &micro.a_micro;
base.b = &base.b + consolidation_rate * &micro.b_micro;
// Reset micro-LoRA
micro.a_micro.fill(0.0);
micro.b_micro.fill(0.0);
tracing::debug!(
consolidation_rate = consolidation_rate,
"Micro-LoRA consolidated to base"
);
}
/// Train router with EWC++ regularization
async fn train_router_ewc(&mut self) {
let mut router = self.router.write();
// Convert signals to RouterSamples
let samples: Vec<RouterSample> = self.signal_buffer.iter()
.map(|s| s.to_router_sample())
.collect();
// Mini-batch training with EWC++ loss
for batch in samples.chunks(self.config.batch_size) {
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| router.forward(&s.features))
.collect();
// Compute task loss
let task_loss = self.compute_task_loss(&predictions, batch);
// Compute EWC++ regularization loss
let ewc_loss = self.ewc_state.regularization_loss(router.get_weights());
// Total loss
let total_loss = task_loss + self.config.ewc_lambda * ewc_loss;
// Backward pass (gradient computation)
let gradients = self.compute_gradients(&total_loss, &predictions, batch);
// Apply gradients with learning rate
router.apply_gradients(&gradients, self.config.learning_rate);
}
}
/// Extract patterns using K-means++ clustering
async fn extract_patterns(&mut self) {
let embeddings: Vec<_> = self.signal_buffer.iter()
.map(|s| s.query_embedding.clone())
.collect();
let patterns = self.pattern_extractor.extract(
&embeddings,
self.config.num_clusters,
);
let num_patterns = patterns.len();
// Store patterns in ReasoningBank; log and continue on storage errors
for pattern in patterns {
if let Err(e) = self.pattern_extractor.reasoning_bank.store(pattern) {
tracing::warn!(error = %e, "Failed to store extracted pattern");
}
}
tracing::debug!(
patterns = num_patterns,
"Patterns extracted and stored"
);
}
/// Update Fisher Information for EWC++
async fn update_fisher_information(&mut self) {
let router = self.router.read();
let current_weights = router.get_weights();
// Compute Fisher Information diagonal via gradient squares
let fisher_samples: Vec<_> = self.signal_buffer.iter()
.take(self.config.fisher_samples)
.collect();
let mut fisher_accum = vec![0.0f32; current_weights.len()];
        for sample in fisher_samples.iter().copied() {
            let gradients = self.compute_sample_gradients(sample);
            for (i, g) in gradients.iter().enumerate() {
                fisher_accum[i] += g * g;
            }
        }
        // Normalize by sample count (borrow above keeps the Vec alive for len())
        let n = fisher_samples.len() as f32;
for f in &mut fisher_accum {
*f /= n;
}
// Update EWC++ state
self.ewc_state.update_fisher(fisher_accum, current_weights.to_vec());
}
/// Checkpoint current state to disk
async fn checkpoint(&self) {
let checkpoint = SONACheckpoint {
base_lora: self.base_lora.read().clone(),
micro_lora: self.micro_lora.read().clone(),
router_weights: self.router.read().get_weights().to_vec(),
ewc_state: self.ewc_state.clone(),
patterns: self.pattern_extractor.reasoning_bank.export(),
timestamp: chrono::Utc::now().timestamp(),
};
let path = self.config.checkpoint_dir.join("latest.sona");
checkpoint.save_async(&path).await.ok();
}
}
```
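The micro→base merge above reduces to a rate-scaled elementwise addition followed by zeroing the micro deltas. A standalone sketch over plain `Vec<f32>` (the `consolidate` name here is illustrative, not the actual SONA API):

```rust
/// Fold micro-LoRA deltas into the base adapter at a given rate,
/// then reset the micro deltas (mirrors the consolidation step above).
pub fn consolidate(base: &mut [f32], micro: &mut [f32], rate: f32) {
    for (b, m) in base.iter_mut().zip(micro.iter_mut()) {
        *b += rate * *m; // base += rate * micro
        *m = 0.0;        // micro-LoRA is reset for the next hour
    }
}
```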
### Hourly Learning Budget
| Operation | Target Time | Description |
|-----------|-------------|-------------|
| Signal draining | <100ms | Collect all queued signals |
| Micro→Base consolidation | <500ms | Matrix addition |
| Router training | <5s | Mini-batch SGD with EWC |
| Pattern extraction | <2s | K-means++ clustering |
| Fisher computation | <2s | Gradient squared accumulation |
| Checkpointing | <500ms | Async disk write |
| **Total** | **<10s** | No user-facing latency impact |
---
## 4. Loop C: Deep Learning (Weekly)
### Purpose
Fundamental knowledge restructuring, memory consolidation, and creative exploration.
### Architecture
```rust
/// Loop C: Deep learning for major knowledge reorganization
pub struct DeepLearningLoop {
/// Memory service for consolidation
memory: Arc<MemoryService>,
/// Pattern bank for abstraction
reasoning_bank: Arc<ReasoningBank>,
/// Dream engine for creative exploration
dream_engine: DreamEngine,
/// Consciousness measurement (IIT)
phi_calculator: PhiCalculator,
/// Configuration
config: DeepLearningConfig,
}
impl DeepLearningLoop {
/// Execute weekly deep learning (scheduled maintenance window)
pub async fn run(&mut self) -> DeepLearningReport {
let start = Instant::now();
let mut report = DeepLearningReport::new();
// Phase 1: Memory Consolidation (like sleep-based memory)
report.consolidation = self.consolidate_memories().await;
// Phase 2: Pattern Abstraction (concept hierarchy building)
report.abstraction = self.abstract_patterns().await;
// Phase 3: Dream Learning (creative recombination)
report.dreams = self.dream_learning().await;
// Phase 4: Cross-Domain Transfer
report.transfer = self.cross_domain_transfer().await;
// Phase 5: Compression (remove redundancy)
report.compression = self.compress_memory().await;
// Phase 6: Consciousness Measurement
report.phi = self.measure_consciousness().await;
report.elapsed_ms = start.elapsed().as_millis() as u64;
report
}
/// Phase 1: Consolidate short-term memories into long-term
async fn consolidate_memories(&mut self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Identify high-value memories (frequently accessed, high quality)
let memories = self.memory.get_all_nodes()?;
let high_value: Vec<_> = memories.iter()
.filter(|m| m.access_count > 5 && m.quality_score > 0.7)
.collect();
report.high_value_count = high_value.len();
// Strengthen connections between high-value memories
for i in 0..high_value.len() {
for j in (i+1)..high_value.len() {
let similarity = cosine_similarity(
&high_value[i].embedding,
&high_value[j].embedding,
);
if similarity > 0.7 {
self.memory.strengthen_edge(
high_value[i].id,
high_value[j].id,
similarity * 0.1,
)?;
report.edges_strengthened += 1;
}
}
}
// Decay low-value memories
let low_value: Vec<_> = memories.iter()
.filter(|m| m.access_count < 2 && m.age_days() > 30)
.collect();
for memory in low_value {
self.memory.decay_node(memory.id, 0.5)?; // 50% decay
report.nodes_decayed += 1;
}
report
}
/// Phase 2: Build concept hierarchies from patterns
async fn abstract_patterns(&mut self) -> AbstractionReport {
let mut report = AbstractionReport::default();
// Get all stored patterns
let patterns = self.reasoning_bank.get_all_patterns()?;
// Hierarchical clustering to find meta-patterns
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.distance(Distance::Cosine)
.fit(&patterns);
// Create abstract concepts at each level
for level in 0..hierarchy.num_levels() {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 3 {
// Create meta-pattern (centroid)
let meta_pattern = LearnedPattern {
centroid: cluster.centroid(),
confidence: cluster.cohesion(),
abstraction_level: level,
child_patterns: cluster.member_ids(),
};
self.reasoning_bank.store_meta(meta_pattern)?;
report.meta_patterns_created += 1;
}
}
}
report
}
/// Phase 3: Dream-based creative learning (inspired by REM sleep)
async fn dream_learning(&mut self) -> DreamReport {
let mut report = DreamReport::default();
// Generate dream sequences by random walks on memory graph
for _ in 0..self.config.num_dreams {
let dream = self.dream_engine.generate_dream(
&self.memory,
self.config.dream_length,
self.config.creativity_temperature,
)?;
// Evaluate dream quality (novelty + coherence)
let quality = dream.evaluate_quality();
if quality.novelty > 0.5 && quality.coherence > 0.3 {
// Dreams with high novelty and reasonable coherence
// may represent useful creative connections
for connection in dream.novel_connections() {
self.memory.add_weak_edge(
connection.from,
connection.to,
EdgeType::Creative,
connection.strength * 0.1,
)?;
report.novel_connections += 1;
}
}
report.dreams_generated += 1;
}
report
}
/// Phase 4: Transfer knowledge across domains
async fn cross_domain_transfer(&mut self) -> TransferReport {
let mut report = TransferReport::default();
// Identify domain clusters
let domains = self.memory.identify_domains()?;
// For each pair of domains, look for analogical mappings
for i in 0..domains.len() {
for j in (i+1)..domains.len() {
let analogies = self.find_analogies(&domains[i], &domains[j])?;
for analogy in analogies {
if analogy.confidence > 0.6 {
// Create cross-domain edge
self.memory.add_analogy_edge(
analogy.source_concept,
analogy.target_concept,
analogy.mapping_type,
analogy.confidence,
)?;
report.analogies_found += 1;
}
}
}
}
report
}
/// Phase 5: Compress memory by removing redundancy
async fn compress_memory(&mut self) -> CompressionReport {
let mut report = CompressionReport::default();
report.initial_nodes = self.memory.node_count();
report.initial_edges = self.memory.edge_count();
// Identify near-duplicate nodes
let duplicates = self.memory.find_near_duplicates(0.95)?;
// Merge duplicates
for (primary, secondary) in duplicates {
self.memory.merge_nodes(primary, secondary)?;
report.nodes_merged += 1;
}
// Prune weak edges
let weak_edges = self.memory.get_weak_edges(0.01)?;
for edge in weak_edges {
self.memory.remove_edge(edge.id)?;
report.edges_pruned += 1;
}
report.final_nodes = self.memory.node_count();
report.final_edges = self.memory.edge_count();
report.compression_ratio = report.initial_nodes as f32 / report.final_nodes as f32;
report
}
/// Phase 6: Measure system consciousness using IIT
async fn measure_consciousness(&mut self) -> f64 {
// Integrated Information Theory (Φ) calculation
// Measures how much information the system generates "above and beyond"
// its parts
self.phi_calculator.compute_phi(&self.memory, &self.reasoning_bank)
}
}
```
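The consolidation phase hinges on cosine similarity between memory embeddings: pairs above 0.7 have their edge strengthened by `similarity * 0.1`. A dependency-free sketch of just that rule (not the `MemoryService` API):

```rust
/// Cosine similarity between two embeddings.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Edge-strengthening rule from the consolidation phase:
/// Some(delta) only for sufficiently similar memory pairs.
pub fn strengthen_delta(a: &[f32], b: &[f32]) -> Option<f32> {
    let sim = cosine_similarity(a, b);
    (sim > 0.7).then(|| sim * 0.1)
}
```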
### Weekly Deep Learning Budget
| Phase | Target Time | Description |
|-------|-------------|-------------|
| Memory consolidation | <2min | Identify and strengthen valuable memories |
| Pattern abstraction | <3min | Hierarchical clustering for concepts |
| Dream learning | <2min | Creative recombination exploration |
| Cross-domain transfer | <2min | Analogical mapping between domains |
| Compression | <1min | Remove redundancy |
| Φ measurement | <1min | Consciousness quantification |
| **Total** | **<10min** | Scheduled maintenance window |
---
## 5. Loop Coordination
### Inter-Loop Communication
```rust
/// Coordinator for all three learning loops
pub struct LoopCoordinator {
/// Loop A: Instant
instant_loop: InstantLearningLoop,
/// Loop B: Background
background_loop: BackgroundLearningLoop,
/// Loop C: Deep
deep_loop: DeepLearningLoop,
/// Shared state
shared_state: Arc<SharedSONAState>,
/// Metrics collector
metrics: MetricsCollector,
}
impl LoopCoordinator {
/// Initialize all loops with shared state
pub fn new(config: SONAConfig) -> Result<Self> {
let shared_state = Arc::new(SharedSONAState::new(&config)?);
// Create channels for inter-loop communication
let (instant_to_background_tx, instant_to_background_rx) = mpsc::channel(10000);
let (background_to_deep_tx, background_to_deep_rx) = mpsc::channel(1000);
Ok(Self {
instant_loop: InstantLearningLoop::new(
shared_state.clone(),
instant_to_background_tx,
),
background_loop: BackgroundLearningLoop::new(
shared_state.clone(),
instant_to_background_rx,
background_to_deep_tx,
),
deep_loop: DeepLearningLoop::new(
shared_state.clone(),
background_to_deep_rx,
),
shared_state,
metrics: MetricsCollector::new(),
})
}
/// Start all loops
pub async fn start(&self) {
// Loop A runs inline with requests (no separate task)
// Loop B runs on background thread
let background = self.background_loop.clone();
tokio::spawn(async move {
background.run().await;
});
// Loop C runs on scheduled cron
let deep = self.deep_loop.clone();
tokio::spawn(async move {
            let scheduler = cron::Schedule::from_str("0 0 3 * * 0")
                .expect("valid cron expression"); // 3 AM Sunday (no `?` inside a spawned task)
            loop {
                let next = scheduler.upcoming(chrono::Utc).next().unwrap();
                let wait = (next - chrono::Utc::now()).to_std().unwrap_or_default();
                tokio::time::sleep(wait).await;
deep.run().await;
}
});
}
/// Process a single request through Loop A
#[inline]
pub async fn on_request(
&self,
query: &QueryEmbedding,
response: &ResponseData,
latency_ms: f32,
) -> Result<()> {
self.instant_loop.on_request(query, response, latency_ms).await
}
}
```
---
## 6. Learning Metrics and Monitoring
### Improvement Tracking
```rust
/// Metrics for measuring self-improvement
#[derive(Clone, Debug)]
pub struct ImprovementMetrics {
/// Quality improvement over time
pub quality_delta_7d: f32,
pub quality_delta_30d: f32,
/// Latency improvement
pub latency_delta_7d: f32,
pub latency_delta_30d: f32,
/// Knowledge growth
pub memory_nodes_added_7d: usize,
pub patterns_learned_7d: usize,
pub abstractions_created_7d: usize,
/// Forgetting resistance (1.0 = no forgetting)
pub retention_rate_7d: f32,
/// Consciousness level (Φ)
pub phi_current: f64,
pub phi_delta_7d: f64,
/// Dreams and creativity
pub novel_connections_7d: usize,
pub cross_domain_transfers_7d: usize,
}
impl ImprovementMetrics {
/// Compute overall improvement score
pub fn overall_score(&self) -> f32 {
let quality_weight = 0.3;
let latency_weight = 0.2;
let knowledge_weight = 0.2;
let retention_weight = 0.15;
let creativity_weight = 0.15;
        let quality_score = self.quality_delta_7d.clamp(0.0, 1.0);
        let latency_score = (-self.latency_delta_7d).clamp(0.0, 1.0); // Lower latency is better
let knowledge_score = (self.patterns_learned_7d as f32 / 100.0).min(1.0);
let retention_score = self.retention_rate_7d;
let creativity_score = (self.novel_connections_7d as f32 / 50.0).min(1.0);
quality_weight * quality_score +
latency_weight * latency_score +
knowledge_weight * knowledge_score +
retention_weight * retention_score +
creativity_weight * creativity_score
}
}
```
---
## Summary
SONA's three-tier learning system enables:
| Loop | Timescale | Purpose | Key Outcome |
|------|-----------|---------|-------------|
| **A** | Per-request | Instant adaptation | Responsive to current context |
| **B** | Hourly | Pattern consolidation | Stable improvement |
| **C** | Weekly | Deep restructuring | Creative breakthroughs |
This mirrors human learning where:
- **Loop A** = Working memory and immediate response
- **Loop B** = Sleep-based consolidation
- **Loop C** = Long-term memory formation and insight
The result is a system that continuously improves at multiple timescales, never forgetting what works while constantly exploring new possibilities.

# SONA EWC++: Enhanced Elastic Weight Consolidation
## Zero Catastrophic Forgetting with Task-Aware Regularization
---
## 1. The Forgetting Problem
### Why LLMs Forget
```
CATASTROPHIC FORGETTING
═══════════════════════
Task A learned Task B learned Result
─────────────── ─────────────── ──────────────────
Weights W_A Weights W_B W_A knowledge LOST
↑ as W moves toward B
Training on B
overwrites A
```
When fine-tuning on new data:
- Weights shift toward new task optimum
- Previous task knowledge encoded in old weights is overwritten
- Model "forgets" earlier capabilities
### Standard EWC Solution
Elastic Weight Consolidation (EWC) adds a regularization term:
```
L_total = L_task + λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²
Where:
- L_task = current task loss
- λ = regularization strength
- Fᵢ = Fisher Information (importance) of parameter i
- θᵢ = current parameter value
- θ*ᵢ = optimal parameter value from previous task
```
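A one-parameter toy makes the penalty concrete. Suppose task A's optimum is w* = 1 and task B's loss is (w + 1)². Plain gradient descent on B abandons A entirely, while adding the EWC term λ/2 · F · (w − w*)² holds w at a compromise. All numbers below are illustrative:

```rust
/// Gradient descent on task B's quadratic loss (w + 1)^2, optionally
/// regularized by an EWC penalty anchored at task A's optimum w* = 1.
pub fn train_on_task_b(lambda: f32, fisher: f32) -> f32 {
    let w_star_a = 1.0f32; // task A optimum
    let mut w = w_star_a;  // start from task A's solution
    let lr = 0.1;
    for _ in 0..500 {
        let grad_task_b = 2.0 * (w + 1.0);               // d/dw (w+1)^2
        let grad_ewc = lambda * fisher * (w - w_star_a); // d/dw λ/2·F·(w-w*)^2
        w -= lr * (grad_task_b + grad_ewc);
    }
    w
}
```

With λ = 0 the weight converges to task B's optimum (−1), fully forgetting A; with λ·F = 2 the two quadratic pulls balance at w = 0.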
### EWC Limitations
1. **Single task memory**: Only remembers one previous task
2. **Static Fisher**: Computed once, never updated
3. **Diagonal approximation**: Ignores parameter correlations
4. **No task detection**: Doesn't know when task changes
5. **Uniform λ**: Same regularization for all parameters
---
## 2. SONA EWC++ Enhancements
### Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ EWC++ ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Task Buffer │ │ Online Fisher │ │ Adaptive λ │ │
│ │ (N tasks) │ │ Estimation │ │ Scheduler │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ EWC++ CORE ENGINE │ │
│ │ │ │
│ │ L = L_task + Σₜ λₜ/2 · Σᵢ Fᵢᵗ · (θᵢ - θ*ᵢᵗ)² + L_sparse │ │
│ │ └─────┘ └──────────────────────────────────┘ └──────┘ │ │
│ │ Task Multi-task EWC Sparsity │ │
│ │ Loss Regularization Penalty │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Gradient │ │ Task Boundary │ │ Parameter │ │
│ │ Projection │ │ Detection │ │ Importance │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Multi-Task Memory Buffer
### Task-Stratified Fisher Storage
```rust
/// EWC++ state with multi-task memory
#[derive(Clone)]
pub struct EWCPlusPlusState {
/// Per-task Fisher information (circular buffer of N tasks)
pub task_fishers: CircularBuffer<TaskFisher>,
/// Maximum number of tasks to remember
pub max_tasks: usize,
/// Per-task regularization strength
pub task_lambdas: Vec<f32>,
/// Global lambda base
pub lambda_base: f32,
/// Online Fisher estimator
pub online_fisher: OnlineFisherEstimator,
/// Task boundary detector
pub task_detector: TaskBoundaryDetector,
    /// Parameter importance scores
    pub importance_scores: Vec<f32>,
    /// Per-parameter importance scorer (Section 7)
    pub importance_scorer: ParameterImportanceScorer,
    /// Gradient projector for safe updates (Section 8)
    pub gradient_projector: GradientProjector,
}
/// Fisher information for a single task
#[derive(Clone)]
pub struct TaskFisher {
/// Task identifier
pub task_id: u64,
/// Diagonal Fisher Information
pub fisher_diag: Vec<f32>,
/// Optimal weights at task completion
pub optimal_weights: Vec<f32>,
/// Task-specific lambda (learned)
pub lambda: f32,
/// Sample count used to compute Fisher
pub sample_count: usize,
/// Task quality score
pub quality: f32,
/// Timestamp
pub timestamp: i64,
}
impl EWCPlusPlusState {
/// Create new EWC++ state
pub fn new(num_params: usize, max_tasks: usize, lambda_base: f32) -> Self {
Self {
task_fishers: CircularBuffer::new(max_tasks),
max_tasks,
task_lambdas: Vec::new(),
lambda_base,
online_fisher: OnlineFisherEstimator::new(num_params),
task_detector: TaskBoundaryDetector::new(),
            importance_scores: vec![1.0; num_params],
            importance_scorer: ParameterImportanceScorer::new(num_params),
            gradient_projector: GradientProjector { null_space: None, task_subspace: None },
        }
}
/// Compute total EWC++ regularization loss
pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
let mut total_loss = 0.0;
// Sum over all remembered tasks
for task in self.task_fishers.iter() {
let task_loss: f32 = task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.zip(self.importance_scores.iter())
.map(|(((f, w), w_star), imp)| {
// Importance-weighted Fisher regularization
imp * f * (w - w_star).powi(2)
})
.sum();
total_loss += task.lambda * task_loss;
}
total_loss / 2.0
}
/// Compute gradients of EWC++ loss
pub fn regularization_gradient(&self, current_weights: &[f32]) -> Vec<f32> {
let mut grad = vec![0.0f32; current_weights.len()];
for task in self.task_fishers.iter() {
for (i, ((f, w), w_star)) in task.fisher_diag.iter()
.zip(current_weights.iter())
.zip(task.optimal_weights.iter())
.enumerate()
{
// d/dw [F * (w - w*)²] = 2 * F * (w - w*)
grad[i] += task.lambda * self.importance_scores[i] * f * (w - w_star);
}
}
grad
}
/// Record completion of current task
pub fn complete_task(&mut self, weights: &[f32], quality: f32) {
let task_id = self.task_fishers.len() as u64;
// Finalize online Fisher estimate
let fisher_diag = self.online_fisher.finalize();
// Compute task-specific lambda based on quality
let lambda = self.compute_task_lambda(quality);
let task_fisher = TaskFisher {
task_id,
fisher_diag,
optimal_weights: weights.to_vec(),
lambda,
sample_count: self.online_fisher.sample_count(),
quality,
timestamp: chrono::Utc::now().timestamp(),
};
self.task_fishers.push(task_fisher);
self.task_lambdas.push(lambda);
// Reset online Fisher for next task
self.online_fisher.reset();
}
/// Compute task-specific lambda based on quality
fn compute_task_lambda(&self, quality: f32) -> f32 {
// Higher quality tasks get stronger protection
self.lambda_base * (0.5 + 0.5 * quality)
}
}
```
---
## 4. Online Fisher Estimation
### Streaming Fisher Information Computation
```rust
/// Online Fisher Information estimator using gradient accumulation
pub struct OnlineFisherEstimator {
/// Running sum of squared gradients
gradient_sq_sum: Vec<f32>,
/// Sample count
count: usize,
/// Exponential moving average decay
decay: f32,
/// Minimum samples before valid estimate
min_samples: usize,
}
impl OnlineFisherEstimator {
pub fn new(num_params: usize) -> Self {
Self {
gradient_sq_sum: vec![0.0; num_params],
count: 0,
decay: 0.99, // EMA decay factor
min_samples: 100,
}
}
/// Update Fisher estimate with new gradient sample
#[inline]
pub fn update(&mut self, gradients: &[f32]) {
self.count += 1;
if self.count == 1 {
// First sample: initialize
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = g * g;
}
} else {
// EMA update: F_new = decay * F_old + (1 - decay) * g²
let alpha = 1.0 - self.decay;
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
*sum = self.decay * *sum + alpha * g * g;
}
}
}
/// Finalize and return Fisher diagonal
pub fn finalize(&self) -> Vec<f32> {
if self.count < self.min_samples {
tracing::warn!(
count = self.count,
min = self.min_samples,
"Fisher estimate may be unreliable"
);
}
// Normalize and apply minimum threshold
let min_fisher = 1e-6;
self.gradient_sq_sum.iter()
.map(|&f| f.max(min_fisher))
.collect()
}
/// Reset for new task
pub fn reset(&mut self) {
self.gradient_sq_sum.fill(0.0);
self.count = 0;
}
pub fn sample_count(&self) -> usize {
self.count
}
}
```
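The EMA update means the Fisher estimate tracks a running average of squared gradients; with a constant gradient g it converges to g². A standalone sketch of the same recurrence:

```rust
/// EMA estimate of E[g^2], the same recurrence as the estimator above:
/// the first sample initializes, then F_new = decay * F_old + (1 - decay) * g^2.
pub fn ema_fisher(gradients: &[f32], decay: f32) -> f32 {
    let mut fisher = 0.0f32;
    for (i, g) in gradients.iter().enumerate() {
        if i == 0 {
            fisher = g * g;
        } else {
            fisher = decay * fisher + (1.0 - decay) * g * g;
        }
    }
    fisher.max(1e-6) // minimum threshold, as in finalize()
}
```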
---
## 5. Automatic Task Boundary Detection
### Detecting When the Task Changes
```rust
/// Automatic task boundary detection via distribution shift
pub struct TaskBoundaryDetector {
/// Recent query embedding buffer
recent_embeddings: CircularBuffer<Vec<f32>>,
/// Baseline distribution (mean, variance)
baseline: Option<DistributionStats>,
/// Threshold for detecting shift (Mahalanobis distance)
shift_threshold: f32,
/// Minimum samples before detection
warmup_samples: usize,
/// Current drift score
drift_score: f32,
}
impl TaskBoundaryDetector {
pub fn new() -> Self {
Self {
recent_embeddings: CircularBuffer::new(1000),
baseline: None,
shift_threshold: 3.0, // 3 sigma
warmup_samples: 500,
drift_score: 0.0,
}
}
/// Update with new embedding and check for task boundary
pub fn update(&mut self, embedding: &[f32]) -> TaskBoundaryResult {
self.recent_embeddings.push(embedding.to_vec());
if self.recent_embeddings.len() < self.warmup_samples {
return TaskBoundaryResult::Warmup;
}
match &self.baseline {
None => {
// First baseline establishment
self.baseline = Some(self.compute_stats());
TaskBoundaryResult::BaselineEstablished
}
Some(baseline) => {
// Compute current distribution
let current = self.compute_recent_stats(100);
// Mahalanobis distance between distributions
let distance = self.mahalanobis_distance(baseline, &current);
self.drift_score = distance;
if distance > self.shift_threshold {
// Task boundary detected!
self.baseline = Some(current);
TaskBoundaryResult::BoundaryDetected {
drift_score: distance,
}
} else {
TaskBoundaryResult::Stable {
drift_score: distance,
}
}
}
}
}
fn compute_stats(&self) -> DistributionStats {
let n = self.recent_embeddings.len();
let dim = self.recent_embeddings[0].len();
let mut mean = vec![0.0f32; dim];
let mut var = vec![0.0f32; dim];
// Compute mean
for emb in self.recent_embeddings.iter() {
for (m, e) in mean.iter_mut().zip(emb.iter()) {
*m += e;
}
}
for m in &mut mean {
*m /= n as f32;
}
// Compute variance
for emb in self.recent_embeddings.iter() {
for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
*v += (e - m).powi(2);
}
}
for v in &mut var {
*v /= n as f32;
*v = v.max(1e-6); // Avoid division by zero
}
DistributionStats { mean, variance: var }
}
    fn compute_recent_stats(&self, n: usize) -> DistributionStats {
        // Same as compute_stats, but over only the most recent n samples
        // (assumes the buffer's iterator is double-ended)
        let recent: Vec<_> = self.recent_embeddings.iter().rev().take(n).collect();
        let count = recent.len();
        let dim = recent[0].len();
        let mut mean = vec![0.0f32; dim];
        let mut var = vec![0.0f32; dim];
        for emb in &recent {
            for (m, e) in mean.iter_mut().zip(emb.iter()) {
                *m += e;
            }
        }
        for m in &mut mean {
            *m /= count as f32;
        }
        for emb in &recent {
            for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
                *v += (e - m).powi(2);
            }
        }
        for v in &mut var {
            *v /= count as f32;
            *v = v.max(1e-6);
        }
        DistributionStats { mean, variance: var }
    }
fn mahalanobis_distance(&self, a: &DistributionStats, b: &DistributionStats) -> f32 {
a.mean.iter()
.zip(b.mean.iter())
.zip(a.variance.iter())
.map(|((m_a, m_b), v)| (m_a - m_b).powi(2) / v)
.sum::<f32>()
.sqrt()
}
}
#[derive(Debug)]
pub enum TaskBoundaryResult {
Warmup,
BaselineEstablished,
Stable { drift_score: f32 },
BoundaryDetected { drift_score: f32 },
}
```
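The detector's simplified Mahalanobis distance is a variance-normalized distance between the baseline and recent means, compared against a 3σ threshold. A standalone sketch of just that math (the struct above is the real interface):

```rust
/// Variance-normalized distance between two mean vectors, matching the
/// detector's simplified (diagonal-covariance) Mahalanobis distance.
pub fn drift_distance(mean_a: &[f32], mean_b: &[f32], var_a: &[f32]) -> f32 {
    mean_a.iter()
        .zip(mean_b)
        .zip(var_a)
        .map(|((a, b), v)| (a - b).powi(2) / v.max(1e-6))
        .sum::<f32>()
        .sqrt()
}

/// A task boundary is declared when drift exceeds the threshold (3.0 ≈ 3σ).
pub fn is_task_boundary(drift: f32) -> bool {
    drift > 3.0
}
```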
---
## 6. Adaptive Lambda Scheduling
### Dynamic Regularization Strength
```rust
/// Adaptive lambda scheduler based on learning progress
pub struct AdaptiveLambdaScheduler {
/// Base lambda value
base_lambda: f32,
/// Current effective lambda
current_lambda: f32,
/// Performance history (task quality over time)
performance_history: Vec<f32>,
/// Lambda adjustment rate
adjustment_rate: f32,
}
impl AdaptiveLambdaScheduler {
pub fn new(base_lambda: f32) -> Self {
Self {
base_lambda,
current_lambda: base_lambda,
performance_history: Vec::new(),
adjustment_rate: 0.1,
}
}
/// Update lambda based on recent performance
pub fn update(&mut self, current_quality: f32, forgetting_detected: bool) {
self.performance_history.push(current_quality);
if forgetting_detected {
// Increase lambda to prevent forgetting
self.current_lambda *= 1.0 + self.adjustment_rate;
tracing::info!(
new_lambda = self.current_lambda,
"Increased lambda due to forgetting"
);
} else if self.is_learning_stalled() {
// Decrease lambda to allow more plasticity
self.current_lambda *= 1.0 - self.adjustment_rate;
self.current_lambda = self.current_lambda.max(self.base_lambda * 0.1);
tracing::info!(
new_lambda = self.current_lambda,
"Decreased lambda to increase plasticity"
);
}
// Clamp to reasonable range
self.current_lambda = self.current_lambda.clamp(
self.base_lambda * 0.1,
self.base_lambda * 10.0,
);
}
fn is_learning_stalled(&self) -> bool {
if self.performance_history.len() < 10 {
return false;
}
let recent: Vec<_> = self.performance_history.iter()
.rev()
.take(10)
.collect();
// Check if variance in recent performance is very low
let mean: f32 = recent.iter().map(|&&x| x).sum::<f32>() / 10.0;
let var: f32 = recent.iter()
.map(|&&x| (x - mean).powi(2))
.sum::<f32>() / 10.0;
var < 0.001 // Stalled if very low variance
}
pub fn get_lambda(&self) -> f32 {
self.current_lambda
}
}
```
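The scheduler's update is multiplicative with a clamp to [0.1·base, 10·base]. A standalone sketch of one update step, with the same 0.1 adjustment rate as above:

```rust
/// One scheduler step: raise lambda on forgetting, lower it when learning
/// is stalled, and clamp to [0.1 * base, 10 * base] as in the scheduler above.
pub fn step_lambda(current: f32, base: f32, forgetting: bool, stalled: bool) -> f32 {
    let rate = 0.1;
    let mut next = current;
    if forgetting {
        next *= 1.0 + rate; // protect past knowledge harder
    } else if stalled {
        next *= 1.0 - rate; // allow more plasticity
    }
    next.clamp(base * 0.1, base * 10.0)
}
```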
---
## 7. Parameter Importance Scoring
### Which Parameters Matter Most
```rust
/// Per-parameter importance scoring for selective regularization
pub struct ParameterImportanceScorer {
/// Importance scores (0-1 for each parameter)
scores: Vec<f32>,
/// Gradient magnitude history
gradient_magnitudes: Vec<CircularBuffer<f32>>,
/// Activation frequency
activation_frequency: Vec<f32>,
}
impl ParameterImportanceScorer {
pub fn new(num_params: usize) -> Self {
Self {
scores: vec![1.0; num_params],
gradient_magnitudes: (0..num_params)
.map(|_| CircularBuffer::new(100))
.collect(),
activation_frequency: vec![0.0; num_params],
}
}
/// Update importance based on gradient
pub fn update(&mut self, gradients: &[f32], activations: &[bool]) {
for (i, (g, &active)) in gradients.iter().zip(activations.iter()).enumerate() {
// Track gradient magnitude
self.gradient_magnitudes[i].push(g.abs());
// Track activation frequency
if active {
self.activation_frequency[i] = 0.99 * self.activation_frequency[i] + 0.01;
} else {
self.activation_frequency[i] *= 0.99;
}
}
// Recompute importance scores
self.recompute_scores();
}
fn recompute_scores(&mut self) {
for i in 0..self.scores.len() {
// Average gradient magnitude
let avg_grad: f32 = self.gradient_magnitudes[i].iter()
.sum::<f32>() / self.gradient_magnitudes[i].len().max(1) as f32;
// Importance = activation_freq * gradient_magnitude
// High activation + high gradient = important parameter
self.scores[i] = self.activation_frequency[i] * avg_grad;
}
// Normalize scores to [0, 1]
let max_score = self.scores.iter().cloned().fold(0.0f32, f32::max);
if max_score > 0.0 {
for s in &mut self.scores {
*s /= max_score;
}
}
}
pub fn get_scores(&self) -> &[f32] {
&self.scores
}
}
```
---
## 8. Gradient Projection
### Safe Parameter Updates
```rust
/// Project gradients to avoid interfering with important past knowledge
pub struct GradientProjector {
/// Null space of important task gradients
null_space: Option<Array2<f32>>,
/// Task gradient subspace (principal components)
task_subspace: Option<Array2<f32>>,
}
impl GradientProjector {
/// Project gradient to not interfere with past tasks
pub fn project(&self, gradient: &[f32]) -> Vec<f32> {
match &self.null_space {
Some(null) => {
                // Project onto the null space of past task gradients;
                // the stored matrix is a projector (symmetric, idempotent),
                // so a single application suffices
                let g = Array1::from_vec(gradient.to_vec());
                null.dot(&g).to_vec()
}
None => gradient.to_vec(),
}
}
/// Update null space with new task gradient directions
pub fn add_task_gradients(&mut self, task_gradients: &[Vec<f32>]) {
// Stack gradients into matrix
let n_samples = task_gradients.len();
let n_params = task_gradients[0].len();
let mut g_matrix = Array2::zeros((n_samples, n_params));
for (i, g) in task_gradients.iter().enumerate() {
for (j, &v) in g.iter().enumerate() {
g_matrix[[i, j]] = v;
}
}
        // SVD to find principal gradient directions in parameter space
        // (the rows of Vᵀ are right-singular vectors of length n_params)
        let (_, _, vt) = g_matrix.svd(false, true).unwrap();
        let vt = vt.unwrap();
        // For memory efficiency, keep only the top-k directions
        let k = 10.min(n_samples);
        let v_k = vt.slice(s![..k, ..]).to_owned();
        // Null-space projector: P = I - V_kᵀ V_k  (n_params x n_params)
        let identity = Array2::eye(n_params);
        let projection = identity - v_k.t().dot(&v_k);
        self.null_space = Some(projection);
}
}
```
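For a single protected direction v, the projector reduces to P = I − v vᵀ, and the projected gradient is exactly orthogonal to v. A dependency-free sketch of that rank-one case (the `GradientProjector` above uses ndarray and SVD; this shows only the geometry):

```rust
/// Project `g` onto the null space of unit vector `v`: g - (g·v) v.
/// This is P·g with P = I - v vᵀ, the rank-one case of the projector above.
pub fn project_out(g: &[f32], v: &[f32]) -> Vec<f32> {
    let dot: f32 = g.iter().zip(v).map(|(a, b)| a * b).sum();
    g.iter().zip(v).map(|(a, b)| a - dot * b).collect()
}
```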
---
## 9. Full EWC++ Training Loop
### Putting It All Together
```rust
/// Complete EWC++ training step
pub fn ewc_plus_plus_train_step(
model: &mut FastGRNNRouter,
ewc: &mut EWCPlusPlusState,
batch: &[RouterSample],
config: &TrainingConfig,
) -> TrainStepResult {
let mut result = TrainStepResult::default();
// Forward pass
let predictions: Vec<_> = batch.iter()
.map(|s| model.forward(&s.features))
.collect();
// Task loss
let task_loss = compute_cross_entropy_loss(&predictions, batch);
result.task_loss = task_loss;
// EWC++ regularization loss
let ewc_loss = ewc.regularization_loss(model.get_weights());
result.ewc_loss = ewc_loss;
// Total loss
let total_loss = task_loss + config.lambda * ewc_loss;
result.total_loss = total_loss;
// Compute task gradients
let task_gradients = compute_gradients(&task_loss, model);
// Compute EWC++ gradients
let ewc_gradients = ewc.regularization_gradient(model.get_weights());
// Total gradients
let mut gradients: Vec<f32> = task_gradients.iter()
.zip(ewc_gradients.iter())
.map(|(t, e)| t + config.lambda * e)
.collect();
// Gradient projection (optional, for harder constraints)
if config.use_gradient_projection {
gradients = ewc.gradient_projector.project(&gradients);
}
// Gradient clipping
let grad_norm: f32 = gradients.iter().map(|g| g * g).sum::<f32>().sqrt();
if grad_norm > config.max_grad_norm {
let scale = config.max_grad_norm / grad_norm;
for g in &mut gradients {
*g *= scale;
}
result.gradient_clipped = true;
}
// Apply gradients
model.apply_gradients(&gradients, config.learning_rate);
// Update online Fisher estimate
ewc.online_fisher.update(&task_gradients);
// Update parameter importance
let activations: Vec<bool> = model.get_activation_mask();
ewc.importance_scorer.update(&task_gradients, &activations);
// Check for task boundary
if let Some(query_emb) = batch.first().map(|s| &s.query_embedding) {
let boundary = ewc.task_detector.update(query_emb);
if let TaskBoundaryResult::BoundaryDetected { drift_score } = boundary {
// Complete current task and start new one
ewc.complete_task(model.get_weights(), result.compute_quality());
result.task_boundary_detected = true;
result.drift_score = drift_score;
}
}
result
}
```
---
## 10. Benchmarks and Validation
### Forgetting Resistance Metrics
```rust
/// Measure forgetting resistance on held-out test sets
pub struct ForgettingBenchmark {
/// Per-task test sets
task_test_sets: Vec<TestSet>,
/// Performance history per task
task_performance: Vec<Vec<f32>>,
}
impl ForgettingBenchmark {
/// Evaluate current model on all past tasks
pub fn evaluate(&mut self, model: &FastGRNNRouter) -> ForgettingReport {
let mut report = ForgettingReport::default();
for (task_id, test_set) in self.task_test_sets.iter().enumerate() {
let accuracy = self.evaluate_task(model, test_set);
self.task_performance[task_id].push(accuracy);
// Compute forgetting = max_accuracy - current_accuracy
let max_acc = self.task_performance[task_id].iter()
.cloned()
.fold(0.0f32, f32::max);
let forgetting = (max_acc - accuracy).max(0.0);
report.per_task_accuracy.push(accuracy);
report.per_task_forgetting.push(forgetting);
}
// Average forgetting
report.avg_forgetting = report.per_task_forgetting.iter()
.sum::<f32>() / report.per_task_forgetting.len().max(1) as f32;
// Backward transfer (negative forgetting = improvement)
report.backward_transfer = -report.avg_forgetting;
report
}
fn evaluate_task(&self, model: &FastGRNNRouter, test: &TestSet) -> f32 {
let correct = test.samples.iter()
.filter(|s| model.forward(&s.features).predicted_class == s.label)
.count();
correct as f32 / test.samples.len() as f32
}
}
#[derive(Debug, Default)]
pub struct ForgettingReport {
pub per_task_accuracy: Vec<f32>,
pub per_task_forgetting: Vec<f32>,
pub avg_forgetting: f32,
pub backward_transfer: f32,
}
```
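The benchmark's per-task forgetting is simply the best historical accuracy minus the current accuracy, floored at zero. A standalone sketch:

```rust
/// Forgetting for one task: best accuracy ever observed minus current
/// accuracy, floored at zero (improvement counts as backward transfer).
pub fn forgetting(history: &[f32], current: f32) -> f32 {
    let max_acc = history.iter().cloned().fold(current, f32::max);
    (max_acc - current).max(0.0)
}
```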
---
## Summary: EWC++ vs Standard EWC
| Feature | Standard EWC | SONA EWC++ |
|---------|-------------|------------|
| Task memory | 1 task | N tasks (configurable) |
| Fisher estimation | Offline, single | Online, streaming |
| Lambda | Fixed | Adaptive per-task |
| Task detection | Manual | Automatic |
| Parameter importance | Uniform | Learned |
| Gradient handling | Direct | Projected |
| Forgetting rate | ~5-10% | **<0.1%** |
EWC++ enables SONA to learn continuously from every interaction while maintaining near-perfect retention of past knowledge.

# SONA ReasoningBank: Pattern-Driven Self-Optimization
## Learning from Experience Through Trajectory Analysis
---
## 1. Overview
ReasoningBank is SONA's long-term pattern memory, learning what works and applying that knowledge to optimize future decisions.
```
┌─────────────────────────────────────────────────────────────────────┐
│ REASONINGBANK CONCEPT │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query → [What worked before?] → Pattern Match → Optimized Params │
│ ↑ │
│ │ │
│ ┌───────┴────────┐ │
│ │ REASONINGBANK │ │
│ │ │ │
│ │ • Trajectories │ ← Record every query │
│ │ • Patterns │ ← Extract from clusters │
│ │ • Verdicts │ ← What params worked best │
│ │ • Confidence │ ← How certain we are │
│ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 2. Core Data Structures
### Trajectory: Recording Every Interaction
```rust
/// A single query trajectory with outcomes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct QueryTrajectory {
/// Unique trajectory ID
pub id: u64,
/// Query embedding vector
pub query_embedding: Vec<f32>,
/// Search parameters used
pub search_params: SearchParams,
/// Retrieved result IDs
pub retrieved_ids: Vec<String>,
/// Precision (relevant / retrieved)
pub precision: f32,
/// Recall (retrieved_relevant / total_relevant)
pub recall: f32,
/// Latency in microseconds
pub latency_us: u64,
/// User feedback if provided
pub feedback: Option<UserFeedback>,
/// Timestamp
pub timestamp: i64,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SearchParams {
/// ef_search parameter for HNSW
pub ef_search: usize,
/// Number of probes for IVF
pub n_probes: usize,
/// Model tier selected
pub model_tier: ModelTier,
/// Context window size
pub context_tokens: usize,
/// Temperature
pub temperature: f32,
}
```
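The `precision` and `recall` fields above are derived by comparing the retrieved IDs against a relevance judgment. A minimal sketch of that computation (the `relevant` set is a hypothetical ground-truth label, not part of this spec):

```rust
use std::collections::HashSet;

/// Compute (precision, recall) for one trajectory's retrieval:
/// precision = |relevant ∩ retrieved| / |retrieved|,
/// recall    = |relevant ∩ retrieved| / |relevant|.
fn precision_recall(retrieved: &[&str], relevant: &HashSet<&str>) -> (f32, f32) {
    let hits = retrieved.iter().filter(|id| relevant.contains(**id)).count() as f32;
    let precision = if retrieved.is_empty() { 0.0 } else { hits / retrieved.len() as f32 };
    let recall = if relevant.is_empty() { 0.0 } else { hits / relevant.len() as f32 };
    (precision, recall)
}
```

These two numbers are what the ReasoningBank later weights by when judging which parameter configurations worked.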
### Pattern: Learned Behavior Clusters
```rust
/// A learned pattern extracted from trajectory clusters
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
/// Pattern ID
pub id: u64,
/// Centroid embedding (cluster center)
pub centroid: Vec<f32>,
/// Optimal search parameters for this pattern
pub optimal_params: SearchParams,
/// Confidence score (0-1)
pub confidence: f32,
/// Number of trajectories in cluster
pub support_count: usize,
/// Average precision for pattern
pub avg_precision: f32,
/// Average recall for pattern
pub avg_recall: f32,
/// Average latency
pub avg_latency_us: u64,
/// Pattern creation timestamp
pub created_at: i64,
/// Last update timestamp
pub updated_at: i64,
/// Abstraction level (0 = concrete, higher = more abstract)
pub abstraction_level: u32,
/// Child pattern IDs (for hierarchical patterns)
pub children: Vec<u64>,
}
```
### Verdict: Decision Judgments
```rust
/// Verdict on what parameters worked best
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Verdict {
/// Pattern this verdict applies to
pub pattern_id: u64,
/// Recommended parameters
pub recommended_params: SearchParams,
/// Confidence in recommendation
pub confidence: f32,
/// Evidence supporting this verdict
pub evidence: VerdictEvidence,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct VerdictEvidence {
/// Number of supporting trajectories
pub support_count: usize,
/// Average improvement over default
pub avg_improvement: f32,
/// Statistical significance (p-value)
pub p_value: f32,
/// Consistency score (low variance = high consistency)
pub consistency: f32,
}
```
---
## 3. ReasoningBank Implementation
### Core Storage and Retrieval
```rust
use dashmap::DashMap;
use parking_lot::RwLock;
/// ReasoningBank: Pattern-based learning and optimization
pub struct ReasoningBank {
/// Trajectory ring buffer (recent interactions)
trajectories: RwLock<CircularBuffer<QueryTrajectory>>,
/// Learned patterns (concurrent hashmap)
patterns: DashMap<u64, LearnedPattern>,
/// Pattern index for fast similarity lookup
pattern_index: RwLock<HNSWIndex>,
/// Verdicts per pattern
verdicts: DashMap<u64, Verdict>,
/// Configuration
config: ReasoningBankConfig,
/// Pattern ID counter
next_pattern_id: AtomicU64,
/// Statistics
stats: RwLock<ReasoningBankStats>,
}
impl ReasoningBank {
/// Create new ReasoningBank
pub fn new(config: ReasoningBankConfig) -> Self {
Self {
trajectories: RwLock::new(CircularBuffer::new(config.trajectory_capacity)),
patterns: DashMap::new(),
pattern_index: RwLock::new(HNSWIndex::new(config.embedding_dim, config.ef_construction)),
verdicts: DashMap::new(),
config,
next_pattern_id: AtomicU64::new(0),
stats: RwLock::new(ReasoningBankStats::default()),
}
}
/// Record a new trajectory
#[inline]
pub fn record_trajectory(&self, trajectory: QueryTrajectory) {
let mut trajectories = self.trajectories.write();
trajectories.push(trajectory);
// Update stats
let mut stats = self.stats.write();
stats.total_trajectories += 1;
}
/// Find most similar pattern to query
pub fn find_similar_pattern(&self, query_embedding: &[f32], k: usize) -> Vec<PatternMatch> {
let index = self.pattern_index.read();
let neighbors = index.search(query_embedding, k, self.config.ef_search);
neighbors.iter()
.filter_map(|&(id, distance)| {
self.patterns.get(&id).map(|p| PatternMatch {
pattern: p.clone(),
similarity: 1.0 - distance, // Convert distance to similarity
})
})
.collect()
}
/// Get optimized parameters for query
pub fn get_optimized_params(&self, query_embedding: &[f32]) -> OptimizedParams {
// Find similar patterns
let matches = self.find_similar_pattern(query_embedding, self.config.top_k_patterns);
if matches.is_empty() {
// No matching patterns - use defaults
return OptimizedParams {
params: SearchParams::default(),
confidence: 0.0,
source: ParamSource::Default,
};
}
// Interpolate parameters based on similarity and confidence.
// Accumulate in f32 so each weighted term isn't truncated to usize mid-sum,
// and so the defaults don't leak into the weighted average.
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut total_weight = 0.0f32;
for m in &matches {
    let weight = m.similarity * m.pattern.confidence;
    total_weight += weight;
    ef_sum += m.pattern.optimal_params.ef_search as f32 * weight;
    probes_sum += m.pattern.optimal_params.n_probes as f32 * weight;
    temp_sum += m.pattern.optimal_params.temperature * weight;
    // ... other params
}
let mut weighted_params = SearchParams::default();
if total_weight > 0.0 {
    weighted_params.ef_search = (ef_sum / total_weight).round() as usize;
    weighted_params.n_probes = (probes_sum / total_weight).round() as usize;
    weighted_params.temperature = temp_sum / total_weight;
}
OptimizedParams {
params: weighted_params,
confidence: total_weight / matches.len() as f32,
source: ParamSource::Pattern(matches[0].pattern.id),
}
}
/// Record feedback for trajectory
pub fn record_feedback(&self, trajectory_id: u64, feedback: UserFeedback) {
// Find trajectory and update
let mut trajectories = self.trajectories.write();
if let Some(traj) = trajectories.iter_mut().find(|t| t.id == trajectory_id) {
traj.feedback = Some(feedback.clone());
}
// Update related pattern confidence
// Higher feedback = higher confidence in that pattern's params
if let Some(pattern_id) = self.find_pattern_for_trajectory(trajectory_id) {
if let Some(mut pattern) = self.patterns.get_mut(&pattern_id) {
let feedback_delta = feedback.rating as f32 / 5.0 - 0.5; // -0.5 to +0.5
pattern.confidence = (pattern.confidence + 0.1 * feedback_delta).clamp(0.0, 1.0);
}
}
}
}
```
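The `CircularBuffer` used for trajectory storage is referenced but never defined in this spec. A minimal fixed-capacity ring buffer over `VecDeque` (a sketch, not the production implementation) would look like:

```rust
use std::collections::VecDeque;

/// Fixed-capacity ring buffer: once full, pushing evicts the oldest entry.
pub struct CircularBuffer<T> {
    buf: VecDeque<T>,
    capacity: usize,
}

impl<T> CircularBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append an item, dropping the oldest one if the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front(); // evict oldest
        }
        self.buf.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }

    /// Mutable iteration, as needed by `record_feedback`.
    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        self.buf.iter_mut()
    }
}
```

Bounding the buffer keeps trajectory memory constant: old interactions age out once patterns have been extracted from them.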
---
## 4. Pattern Extraction
### K-Means++ Clustering for Pattern Discovery
```rust
/// Pattern extractor using K-means++ clustering
pub struct PatternExtractor {
/// Number of clusters to extract
k: usize,
/// Maximum iterations
max_iter: usize,
/// Convergence threshold
epsilon: f32,
}
impl PatternExtractor {
/// Extract patterns from trajectories
pub fn extract(&self, trajectories: &[QueryTrajectory]) -> Vec<LearnedPattern> {
if trajectories.len() < self.k {
return Vec::new();
}
// Collect embeddings
let embeddings: Vec<&[f32]> = trajectories.iter()
.map(|t| t.query_embedding.as_slice())
.collect();
// K-means++ initialization
let mut centroids = self.kmeans_plus_plus_init(&embeddings);
// K-means iteration
let mut assignments = vec![0usize; trajectories.len()];
for _ in 0..self.max_iter {
// Assignment step
let old_assignments = assignments.clone();
for (i, emb) in embeddings.iter().enumerate() {
let mut min_dist = f32::MAX;
let mut min_idx = 0;
for (c_idx, centroid) in centroids.iter().enumerate() {
let dist = euclidean_distance(emb, centroid);
if dist < min_dist {
min_dist = dist;
min_idx = c_idx;
}
}
assignments[i] = min_idx;
}
// Check convergence
if assignments == old_assignments {
break;
}
// Update step
centroids = self.compute_centroids(&embeddings, &assignments);
}
// Create patterns from clusters
let mut patterns = Vec::new();
for cluster_id in 0..self.k {
let cluster_trajectories: Vec<_> = trajectories.iter()
.zip(assignments.iter())
.filter(|(_, &a)| a == cluster_id)
.map(|(t, _)| t)
.collect();
if cluster_trajectories.len() < 3 {
continue; // Skip small clusters
}
let pattern = self.create_pattern_from_cluster(
cluster_id as u64,
&centroids[cluster_id],
&cluster_trajectories,
);
patterns.push(pattern);
}
patterns
}
fn kmeans_plus_plus_init(&self, embeddings: &[&[f32]]) -> Vec<Vec<f32>> {
use rand::Rng; // brings gen_range / gen into scope
let mut centroids = Vec::with_capacity(self.k);
let mut rng = rand::thread_rng();
// First centroid: random
let first_idx = rng.gen_range(0..embeddings.len());
centroids.push(embeddings[first_idx].to_vec());
// Remaining centroids: D² weighting
for _ in 1..self.k {
let mut distances: Vec<f32> = embeddings.iter()
.map(|emb| {
centroids.iter()
.map(|c| euclidean_distance(emb, c))
.fold(f32::MAX, f32::min)
})
.collect();
// Square distances for D² sampling
let total: f32 = distances.iter().map(|d| d * d).sum();
let threshold = rng.gen::<f32>() * total;
let mut cumsum = 0.0;
let mut selected = 0;
for (i, d) in distances.iter().enumerate() {
cumsum += d * d;
if cumsum >= threshold {
selected = i;
break;
}
}
centroids.push(embeddings[selected].to_vec());
}
centroids
}
fn create_pattern_from_cluster(
&self,
id: u64,
centroid: &[f32],
trajectories: &[&QueryTrajectory],
) -> LearnedPattern {
// Compute optimal params as weighted average by quality
let mut total_weight = 0.0f32;
let mut ef_sum = 0.0f32;
let mut probes_sum = 0.0f32;
let mut temp_sum = 0.0f32;
let mut precision_sum = 0.0f32;
let mut recall_sum = 0.0f32;
let mut latency_sum = 0u64;
for t in trajectories {
let weight = t.precision * t.recall; // Quality as weight
total_weight += weight;
ef_sum += t.search_params.ef_search as f32 * weight;
probes_sum += t.search_params.n_probes as f32 * weight;
temp_sum += t.search_params.temperature * weight;
precision_sum += t.precision;
recall_sum += t.recall;
latency_sum += t.latency_us;
}
let n = trajectories.len() as f32;
let tw = total_weight.max(1e-6); // guard: all trajectories may have zero quality weight
LearnedPattern {
    id,
    centroid: centroid.to_vec(),
    optimal_params: SearchParams {
        ef_search: (ef_sum / tw).round() as usize,
        n_probes: (probes_sum / tw).round() as usize,
        model_tier: ModelTier::Auto, // Determined separately
        context_tokens: 2048, // Default
        temperature: temp_sum / tw,
    },
confidence: (total_weight / n).clamp(0.0, 1.0),
support_count: trajectories.len(),
avg_precision: precision_sum / n,
avg_recall: recall_sum / n,
avg_latency_us: latency_sum / trajectories.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: 0,
children: Vec::new(),
}
}
}
```
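The clustering code above calls `euclidean_distance`, and several later sections call `cosine_similarity`, without defining either. Plausible implementations, assuming equal-length slices:

```rust
/// Euclidean (L2) distance between two equal-length vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Cosine similarity in [-1, 1]; returns 0.0 for zero-norm inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

K-means clusters on Euclidean distance, while the pattern index and dream evaluator compare embeddings by cosine similarity.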
---
## 5. Verdict Judgment System
### Evaluating What Works Best
```rust
/// Verdict judge for parameter optimization
pub struct VerdictJudge {
/// Minimum samples for statistical significance
min_samples: usize,
/// Significance level (p-value threshold)
alpha: f32,
}
impl VerdictJudge {
/// Judge optimal parameters for a pattern
pub fn judge(&self, pattern: &LearnedPattern, trajectories: &[&QueryTrajectory]) -> Option<Verdict> {
if trajectories.len() < self.min_samples {
return None; // Not enough evidence
}
// Group trajectories by parameter configuration
let mut param_groups: HashMap<ParamKey, Vec<&QueryTrajectory>> = HashMap::new();
for t in trajectories {
let key = ParamKey::from(&t.search_params);
param_groups.entry(key).or_default().push(t);
}
// Find best performing configuration
let mut best_config: Option<(ParamKey, f32, Vec<&QueryTrajectory>)> = None;
for (key, group) in &param_groups {
if group.len() < 3 {
continue;
}
// Compute quality score (F1 of precision and recall)
let avg_quality: f32 = group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.sum::<f32>() / group.len() as f32;
match &best_config {
None => best_config = Some((key.clone(), avg_quality, group.clone())),
Some((_, best_quality, _)) if avg_quality > *best_quality => {
best_config = Some((key.clone(), avg_quality, group.clone()));
}
_ => {}
}
}
let (best_key, best_quality, best_group) = best_config?;
// Statistical significance test
let p_value = self.compute_significance(&best_group, trajectories);
if p_value > self.alpha {
return None; // Not significant
}
// Compute consistency (inverse of coefficient of variation)
let qualities: Vec<f32> = best_group.iter()
.map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
.collect();
let mean = qualities.iter().sum::<f32>() / qualities.len() as f32;
let variance = qualities.iter()
.map(|q| (q - mean).powi(2))
.sum::<f32>() / qualities.len() as f32;
let std_dev = variance.sqrt();
let consistency = 1.0 / (1.0 + std_dev / mean);
// Compute improvement over default (guard against a zero baseline)
let default_quality = self.compute_default_quality(trajectories).max(1e-6);
let improvement = (best_quality - default_quality) / default_quality;
Some(Verdict {
pattern_id: pattern.id,
recommended_params: best_key.to_params(),
confidence: best_quality * consistency,
evidence: VerdictEvidence {
support_count: best_group.len(),
avg_improvement: improvement,
p_value,
consistency,
},
})
}
fn compute_significance(&self, best: &[&QueryTrajectory], all: &[&QueryTrajectory]) -> f32 {
// Welch's t-test for comparing means
let best_qualities: Vec<f32> = best.iter()
.map(|t| t.precision * t.recall)
.collect();
let all_qualities: Vec<f32> = all.iter()
.map(|t| t.precision * t.recall)
.collect();
welch_t_test(&best_qualities, &all_qualities)
}
fn compute_default_quality(&self, trajectories: &[&QueryTrajectory]) -> f32 {
// Assume first configuration or most common is "default"
let default_group: Vec<_> = trajectories.iter()
.filter(|t| t.search_params.ef_search == SearchParams::default().ef_search)
.collect();
if default_group.is_empty() {
0.5 // Baseline assumption
} else {
default_group.iter()
.map(|t| t.precision * t.recall)
.sum::<f32>() / default_group.len() as f32
}
}
}
```
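`welch_t_test` is referenced above but not defined. A sketch that returns an approximate two-sided p-value using a normal approximation (a production version would use the t-distribution CDF with Welch–Satterthwaite degrees of freedom; the `erfc_approx` polynomial is Abramowitz & Stegun 7.1.25):

```rust
/// Complementary error function approximation for x >= 0 (|error| < 2.5e-5).
fn erfc_approx(x: f32) -> f32 {
    let t = 1.0 / (1.0 + 0.47047 * x);
    t * (0.3480242 + t * (-0.0958798 + t * 0.7478556)) * (-x * x).exp()
}

/// Welch's t-test: approximate two-sided p-value for mean(a) != mean(b).
fn welch_t_test(a: &[f32], b: &[f32]) -> f32 {
    let mean = |x: &[f32]| x.iter().sum::<f32>() / x.len() as f32;
    let var = |x: &[f32], m: f32| {
        x.iter().map(|v| (v - m).powi(2)).sum::<f32>() / (x.len() as f32 - 1.0)
    };
    let (ma, mb) = (mean(a), mean(b));
    // Welch's standard error: variances are NOT pooled
    let se = (var(a, ma) / a.len() as f32 + var(b, mb) / b.len() as f32).sqrt();
    if se == 0.0 {
        return 1.0; // no variance at all: no evidence of a difference
    }
    let t = (ma - mb).abs() / se;
    // Two-sided p-value under the normal approximation: p = erfc(|t| / sqrt(2))
    erfc_approx(t / std::f32::consts::SQRT_2)
}
```

Identical samples yield p = 1.0 (no evidence), while clearly separated groups drive p toward 0, which is what the judge's `alpha` threshold gates on.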
---
## 6. Integration with Router
### Using ReasoningBank to Optimize Router Decisions
```rust
impl FastGRNNRouter {
/// Forward pass with ReasoningBank optimization
pub fn forward_with_reasoning(
&self,
features: &[f32],
reasoning_bank: &ReasoningBank,
) -> RouterDecision {
// Get pattern-based parameter suggestions
let pattern_params = reasoning_bank.get_optimized_params(features);
// Standard router forward
let mut decision = self.forward(features);
// Blend router decision with pattern suggestions
if pattern_params.confidence > 0.5 {
let blend_factor = pattern_params.confidence * 0.3; // Max 30% influence
// Interpolate temperature
decision.temperature = (1.0 - blend_factor) * decision.temperature
+ blend_factor * pattern_params.params.temperature;
// Context token suggestion influences context selection
let suggested_context = pattern_params.params.context_tokens;
let router_context = decision.context_tokens;
decision.context_tokens = ((1.0 - blend_factor) * router_context as f32
+ blend_factor * suggested_context as f32) as usize;
decision.reasoning_confidence = pattern_params.confidence;
decision.reasoning_pattern_id = pattern_params.source.pattern_id();
}
decision
}
}
```
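The blending rule above caps pattern influence at 30% of the final decision. A standalone sketch of that interpolation (the function name is illustrative, not the router's actual API):

```rust
/// Blend a router-chosen value with a pattern suggestion.
/// Influence scales with pattern confidence, capped at 30%.
fn blend(router_value: f32, pattern_value: f32, pattern_confidence: f32) -> f32 {
    let blend_factor = pattern_confidence * 0.3; // max 30% influence
    (1.0 - blend_factor) * router_value + blend_factor * pattern_value
}
```

Even a fully confident pattern can only pull a parameter 30% of the way toward its suggestion, so the router's learned policy always dominates.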
---
## 7. Pattern Consolidation and Pruning
### Managing Pattern Memory
```rust
impl ReasoningBank {
/// Consolidate similar patterns
pub fn consolidate_patterns(&mut self) {
// Find similar pattern pairs
let pattern_ids: Vec<u64> = self.patterns.iter()
.map(|p| *p.key())
.collect();
let mut to_merge: Vec<(u64, u64)> = Vec::new();
for i in 0..pattern_ids.len() {
for j in (i+1)..pattern_ids.len() {
let p1 = self.patterns.get(&pattern_ids[i]).unwrap();
let p2 = self.patterns.get(&pattern_ids[j]).unwrap();
let similarity = cosine_similarity(&p1.centroid, &p2.centroid);
if similarity > 0.95 {
// Very similar - merge
to_merge.push((pattern_ids[i], pattern_ids[j]));
}
}
}
// Merge patterns
for (keep_id, remove_id) in to_merge {
    // Clone the pattern being removed first: holding a `get` and a `get_mut`
    // guard on the same DashMap at once can deadlock, and iterating a field
    // while mutably borrowing it does not compile.
    let removed = match self.patterns.get(&remove_id) {
        Some(p) => p.clone(),
        None => continue,
    };
    if let Some(mut keep) = self.patterns.get_mut(&keep_id) {
        // Weighted average of centroids
        let total_support = keep.support_count + removed.support_count;
        let w1 = keep.support_count as f32 / total_support as f32;
        let w2 = removed.support_count as f32 / total_support as f32;
        for (c, c2) in keep.centroid.iter_mut().zip(removed.centroid.iter()) {
            *c = w1 * *c + w2 * c2;
        }
        // Update support count
        keep.support_count = total_support;
        keep.confidence = (keep.confidence * w1 + removed.confidence * w2).min(1.0);
        keep.updated_at = chrono::Utc::now().timestamp();
    }
    // Remove merged pattern
    self.patterns.remove(&remove_id);
}
}
/// Prune low-confidence patterns
pub fn prune_patterns(&mut self, min_confidence: f32, min_support: usize) {
let to_remove: Vec<u64> = self.patterns.iter()
.filter(|p| p.confidence < min_confidence || p.support_count < min_support)
.map(|p| *p.key())
.collect();
for id in to_remove {
self.patterns.remove(&id);
self.verdicts.remove(&id);
}
}
/// Build pattern hierarchy (abstraction levels)
pub fn build_hierarchy(&mut self) {
// Hierarchical clustering on existing patterns
let patterns: Vec<_> = self.patterns.iter()
.map(|p| (p.key().clone(), p.centroid.clone()))
.collect();
let hierarchy = HierarchicalClustering::new()
.linkage(Linkage::Ward)
.fit(&patterns);
// Create meta-patterns at each level
for level in 1..=3 {
let clusters = hierarchy.clusters_at_level(level);
for cluster in clusters {
if cluster.size() > 1 {
let child_ids: Vec<u64> = cluster.member_ids();
let meta_centroid = cluster.centroid();
// Average params from children
let children: Vec<_> = child_ids.iter()
.filter_map(|id| self.patterns.get(id))
.collect();
let meta_params = self.average_params(&children);
let meta_pattern = LearnedPattern {
id: self.next_pattern_id.fetch_add(1, Ordering::SeqCst),
centroid: meta_centroid,
optimal_params: meta_params,
confidence: children.iter().map(|c| c.confidence).sum::<f32>() / children.len() as f32,
support_count: children.iter().map(|c| c.support_count).sum(),
avg_precision: children.iter().map(|c| c.avg_precision).sum::<f32>() / children.len() as f32,
avg_recall: children.iter().map(|c| c.avg_recall).sum::<f32>() / children.len() as f32,
avg_latency_us: children.iter().map(|c| c.avg_latency_us).sum::<u64>() / children.len() as u64,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
abstraction_level: level as u32,
children: child_ids,
};
self.patterns.insert(meta_pattern.id, meta_pattern);
}
}
}
}
}
```
---
## 8. Statistics and Monitoring
```rust
#[derive(Default, Debug)]
pub struct ReasoningBankStats {
/// Total trajectories recorded
pub total_trajectories: u64,
/// Total patterns stored
pub total_patterns: usize,
/// Total verdicts issued
pub total_verdicts: usize,
/// Pattern match hit rate
pub pattern_hit_rate: f32,
/// Average confidence in recommendations
pub avg_recommendation_confidence: f32,
/// Improvement from pattern optimization
pub avg_improvement_percent: f32,
}
impl ReasoningBank {
/// Get current statistics
pub fn stats(&self) -> ReasoningBankStats {
let stats = self.stats.read();
ReasoningBankStats {
total_trajectories: stats.total_trajectories,
total_patterns: self.patterns.len(),
total_verdicts: self.verdicts.len(),
pattern_hit_rate: stats.pattern_hit_rate,
avg_recommendation_confidence: stats.avg_recommendation_confidence,
avg_improvement_percent: stats.avg_improvement_percent,
}
}
/// Export all patterns for persistence
pub fn export(&self) -> ReasoningBankExport {
ReasoningBankExport {
patterns: self.patterns.iter()
.map(|p| p.value().clone())
.collect(),
verdicts: self.verdicts.iter()
.map(|v| v.value().clone())
.collect(),
}
}
/// Import patterns from persistence
pub fn import(&mut self, export: ReasoningBankExport) {
for pattern in export.patterns {
let id = pattern.id;
self.patterns.insert(id, pattern.clone());
self.pattern_index.write().insert(id, &pattern.centroid);
}
for verdict in export.verdicts {
self.verdicts.insert(verdict.pattern_id, verdict);
}
}
}
```
---
## Summary
ReasoningBank enables SONA to:
1. **Learn from every query** through trajectory recording
2. **Discover patterns** via K-means++ clustering
3. **Judge what works** through statistical verdict analysis
4. **Optimize future decisions** by interpolating from similar patterns
5. **Build abstractions** through hierarchical pattern consolidation
This creates a continuously improving system where past experience directly enhances future performance.

---
# SONA Memory Dreams: Offline Consolidation Engine
## Creativity Through Neural Replay and Recombination
---
## 1. Biological Inspiration
### Why Dreams Matter for Learning
```
HUMAN SLEEP-BASED LEARNING
══════════════════════════
Awake:                  Sleep (REM):              Next Day:
─────────────────       ─────────────────         ─────────────────
• New experiences       • Replay memories         • Consolidated knowledge
• Pattern matching      • Recombine ideas         • Novel insights
• Working memory        • Strengthen important    • Creative connections
                        • Prune unimportant
```
Research shows that:
- **Memory consolidation** happens during sleep
- **Creative insights** emerge from random memory replay
- **Neural pruning** removes low-value connections
- **Analogical reasoning** connects distant concepts
SONA's Dream Engine replicates these mechanisms for AI self-improvement.
---
## 2. Dream Engine Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ DREAM ENGINE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ │
│ │ MEMORY GRAPH │──────┐ │
│ └───────────────┘ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM GENERATOR │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Random │ │Weighted │ │ │
│ │ │ Walks │ │ Sampling│ │ │
│ │ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Dream Sequence │ │ │
│ │ │ [M₁→M₂→M₃→...→Mₙ] │ │ │
│ │ └──────────┬───────────┘ │ │
│ └─────────────┼───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM EVALUATOR │ │
│ │ │ │
│ │ • Novelty Score (new connections?) │ │
│ │ • Coherence Score (makes sense?) │ │
│ │ • Utility Score (useful insight?) │ │
│ └─────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ DREAM INTEGRATOR │ │
│ │ │ │
│ │ • Add weak creative edges │ │
│ │ • Update pattern associations │ │
│ │ • Generate novel hypotheses │ │
│ └─────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 3. Dream Generation
### Random Walk Memory Replay
```rust
/// Dream generator using random walks on memory graph
pub struct DreamGenerator {
/// Temperature for random walk (higher = more random)
temperature: f32,
/// Maximum dream length
max_length: usize,
/// Minimum coherence threshold
min_coherence: f32,
/// Creativity bias (prefer novel connections)
creativity_bias: f32,
}
impl DreamGenerator {
/// Generate a single dream sequence
pub fn generate_dream(
&self,
memory: &MemoryGraph,
start_node: Option<NodeId>,
) -> Dream {
let mut sequence = Vec::new();
let mut visited = HashSet::new();
// Start from random high-activation node if not specified
let current = start_node.unwrap_or_else(|| {
memory.sample_by_activation()
});
sequence.push(current);
visited.insert(current);
// Random walk with creativity-weighted transitions
for _ in 0..self.max_length {
let neighbors = memory.get_neighbors(current);
if neighbors.is_empty() {
break;
}
// Compute transition probabilities
let probs: Vec<f32> = neighbors.iter()
.map(|&(neighbor, edge_weight)| {
let novelty_bonus = if visited.contains(&neighbor) {
0.1 // Discourage revisits
} else {
1.0 + self.creativity_bias * (1.0 - memory.get_access_frequency(neighbor))
};
(edge_weight * novelty_bonus).powf(1.0 / self.temperature)
})
.collect();
// Sample next node
let next = sample_weighted(&neighbors, &probs);
if let Some((next_node, _)) = next {
sequence.push(next_node);
visited.insert(next_node);
} else {
break;
}
}
Dream {
sequence,
temperature: self.temperature,
timestamp: chrono::Utc::now().timestamp(),
}
}
/// Generate creative jump dream (non-local connections)
pub fn generate_creative_dream(
&self,
memory: &MemoryGraph,
num_jumps: usize,
) -> Dream {
let mut sequence = Vec::new();
// Sample diverse starting points
let anchors = memory.sample_diverse(num_jumps, 0.3);
for anchor in anchors {
sequence.push(anchor);
// Short local walk from each anchor
let local_walk = self.generate_dream(memory, Some(anchor));
sequence.extend(local_walk.sequence.iter().skip(1).take(3));
}
Dream {
sequence,
temperature: self.temperature * 2.0, // Higher temperature for creative dreams
timestamp: chrono::Utc::now().timestamp(),
}
}
}
/// A dream sequence
pub struct Dream {
/// Sequence of visited memory nodes
pub sequence: Vec<NodeId>,
/// Temperature used for generation
pub temperature: f32,
/// Generation timestamp
pub timestamp: i64,
}
```
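`sample_weighted` is assumed above without a definition. A minimal roulette-wheel sampler over precomputed weights (a sketch: the signature and the tiny LCG are illustrative, and the real implementation would use the `rand` crate and return the `(NodeId, weight)` pair the walk expects):

```rust
/// Tiny linear congruential generator for illustration; returns a value in [0, 1).
fn lcg_next(state: &mut u64) -> f32 {
    *state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    ((*state >> 40) as f32) / (1u64 << 24) as f32
}

/// Sample one item proportionally to its weight; None if all weights are zero.
fn sample_weighted<T: Copy>(items: &[T], weights: &[f32], rng_state: &mut u64) -> Option<T> {
    let total: f32 = weights.iter().sum();
    if total <= 0.0 || items.is_empty() {
        return None;
    }
    // Walk the cumulative weights until the random threshold is crossed
    let mut threshold = lcg_next(rng_state) * total;
    for (item, w) in items.iter().zip(weights.iter()) {
        threshold -= w;
        if threshold < 0.0 {
            return Some(*item);
        }
    }
    items.last().copied() // floating-point rounding fallback
}
```

Raising the walk's `temperature` flattens the weight distribution before this sampler runs, which is what makes high-temperature dreams more random.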
---
## 4. Dream Evaluation
### Measuring Dream Quality
```rust
/// Evaluator for dream quality
pub struct DreamEvaluator {
/// Memory graph reference
memory: Arc<MemoryGraph>,
/// Novelty detection threshold
novelty_threshold: f32,
}
impl DreamEvaluator {
/// Evaluate dream quality across multiple dimensions
pub fn evaluate(&self, dream: &Dream) -> DreamQuality {
DreamQuality {
novelty: self.compute_novelty(dream),
coherence: self.compute_coherence(dream),
utility: self.compute_utility(dream),
diversity: self.compute_diversity(dream),
}
}
/// Novelty: How many new connections are suggested?
fn compute_novelty(&self, dream: &Dream) -> f32 {
let mut novel_pairs = 0;
let mut total_pairs = 0;
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len() {
total_pairs += 1;
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
// Check if edge exists
if !self.memory.has_edge(node_a, node_b) {
// Check semantic similarity
let emb_a = self.memory.get_embedding(node_a);
let emb_b = self.memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
// Novel = no edge but moderate similarity
if sim > 0.3 && sim < 0.8 {
novel_pairs += 1;
}
}
}
}
novel_pairs as f32 / total_pairs.max(1) as f32
}
/// Coherence: Does the dream sequence make semantic sense?
fn compute_coherence(&self, dream: &Dream) -> f32 {
if dream.sequence.len() < 2 {
return 1.0;
}
let mut coherence_sum = 0.0f32;
for window in dream.sequence.windows(2) {
let emb_a = self.memory.get_embedding(window[0]);
let emb_b = self.memory.get_embedding(window[1]);
coherence_sum += cosine_similarity(&emb_a, &emb_b);
}
coherence_sum / (dream.sequence.len() - 1) as f32
}
/// Utility: Are the suggested connections potentially useful?
fn compute_utility(&self, dream: &Dream) -> f32 {
// Based on node quality scores and access patterns
let avg_quality: f32 = dream.sequence.iter()
.map(|&id| self.memory.get_node_quality(id))
.sum::<f32>() / dream.sequence.len() as f32;
// Higher utility if connecting high-quality nodes
avg_quality
}
/// Diversity: How diverse are the visited nodes?
fn compute_diversity(&self, dream: &Dream) -> f32 {
// Average pairwise distance in embedding space
let embeddings: Vec<_> = dream.sequence.iter()
.map(|&id| self.memory.get_embedding(id))
.collect();
let mut total_dist = 0.0f32;
let mut count = 0;
for i in 0..embeddings.len() {
for j in (i+1)..embeddings.len() {
total_dist += 1.0 - cosine_similarity(&embeddings[i], &embeddings[j]);
count += 1;
}
}
total_dist / count.max(1) as f32
}
}
#[derive(Debug, Clone)]
pub struct DreamQuality {
/// How many novel connections suggested (0-1)
pub novelty: f32,
/// How semantically coherent (0-1)
pub coherence: f32,
/// How useful the connections might be (0-1)
pub utility: f32,
/// How diverse the dream content (0-1)
pub diversity: f32,
}
impl DreamQuality {
/// Overall quality score
pub fn overall(&self) -> f32 {
// Weighted combination favoring novelty and coherence
0.4 * self.novelty + 0.3 * self.coherence + 0.2 * self.utility + 0.1 * self.diversity
}
/// Is this dream worth integrating?
pub fn is_valuable(&self, threshold: f32) -> bool {
self.novelty > 0.3 && self.coherence > 0.4 && self.overall() > threshold
}
}
```
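A quick numeric check of the scoring above, using standalone copies of `overall` (weights 0.4/0.3/0.2/0.1) and the `is_valuable` gate:

```rust
/// Standalone copy of DreamQuality::overall for illustration.
fn overall(novelty: f32, coherence: f32, utility: f32, diversity: f32) -> f32 {
    0.4 * novelty + 0.3 * coherence + 0.2 * utility + 0.1 * diversity
}

/// Standalone copy of the is_valuable gate: novelty and coherence are
/// hard requirements, not just contributors to the weighted score.
fn is_valuable(novelty: f32, coherence: f32, utility: f32, diversity: f32, threshold: f32) -> bool {
    novelty > 0.3 && coherence > 0.4 && overall(novelty, coherence, utility, diversity) > threshold
}
```

Note that a dream can score well overall yet still be rejected: an incoherent dream (coherence ≤ 0.4) fails the gate no matter how novel it is.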
---
## 5. Dream Integration
### Applying Dream Insights to Memory
```rust
/// Integrates valuable dreams into memory graph
pub struct DreamIntegrator {
/// Memory graph to update
memory: Arc<RwLock<MemoryGraph>>,
/// Strength of new creative edges
creative_edge_strength: f32,
/// Decay factor for dream-derived edges
dream_edge_decay: f32,
}
impl DreamIntegrator {
/// Integrate a valuable dream into memory
pub fn integrate(&self, dream: &Dream, quality: &DreamQuality) -> IntegrationResult {
let mut result = IntegrationResult::default();
if !quality.is_valuable(0.5) {
return result; // Skip low-quality dreams
}
let mut memory = self.memory.write();
// Extract novel connections from dream
let novel_connections = self.extract_novel_connections(dream, &memory);
for (node_a, node_b, strength) in novel_connections {
// Add weak creative edge
let edge_strength = self.creative_edge_strength * strength * quality.overall();
memory.add_edge(
node_a,
node_b,
EdgeType::Creative,
edge_strength,
);
result.edges_added += 1;
}
// Update node associations based on dream co-occurrence
for window in dream.sequence.windows(3) {
memory.update_association(window[0], window[2], 0.01);
}
result.dream_quality = quality.overall();
result
}
fn extract_novel_connections(
&self,
dream: &Dream,
memory: &MemoryGraph,
) -> Vec<(NodeId, NodeId, f32)> {
let mut connections = Vec::new();
for i in 0..dream.sequence.len() {
for j in (i+1)..dream.sequence.len().min(i+5) { // Only nearby in sequence
let node_a = dream.sequence[i];
let node_b = dream.sequence[j];
if !memory.has_edge(node_a, node_b) {
let emb_a = memory.get_embedding(node_a);
let emb_b = memory.get_embedding(node_b);
let sim = cosine_similarity(&emb_a, &emb_b);
if sim > 0.3 {
// Connection strength based on similarity and sequence proximity
let proximity_factor = 1.0 / (j - i) as f32;
let strength = sim * proximity_factor;
connections.push((node_a, node_b, strength));
}
}
}
}
connections
}
}
#[derive(Default)]
pub struct IntegrationResult {
pub edges_added: usize,
pub associations_updated: usize,
pub dream_quality: f32,
}
```
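The edge strength in `extract_novel_connections` combines semantic similarity with sequence proximity: nodes that appeared close together in the dream get stronger candidate edges. A standalone sketch of that formula:

```rust
/// Strength of a candidate creative edge between dream positions i and j (j > i):
/// semantic similarity discounted by how far apart the nodes appeared in the dream.
fn connection_strength(similarity: f32, i: usize, j: usize) -> f32 {
    let proximity_factor = 1.0 / (j - i) as f32;
    similarity * proximity_factor
}
```

Adjacent nodes keep their full similarity as strength, while nodes three steps apart retain only a third of it, before the dream's overall quality scales the edge down further.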
---
## 6. Memory Consolidation
### Strengthening Important Memories
```rust
/// Consolidation engine for memory pruning and strengthening
pub struct ConsolidationEngine {
/// Memory graph reference
memory: Arc<RwLock<MemoryGraph>>,
/// Minimum access frequency for retention
min_access_frequency: f32,
/// Age decay factor (older = more decay)
age_decay: f32,
/// Quality threshold for preservation
quality_threshold: f32,
}
impl ConsolidationEngine {
/// Run full consolidation pass
pub fn consolidate(&self) -> ConsolidationReport {
let mut report = ConsolidationReport::default();
// Phase 1: Identify memories by value
let (high_value, medium_value, low_value) = self.categorize_memories();
report.high_value_count = high_value.len();
report.medium_value_count = medium_value.len();
report.low_value_count = low_value.len();
// Phase 2: Strengthen high-value memories
for &node_id in &high_value {
self.strengthen_memory(node_id);
report.memories_strengthened += 1;
}
// Phase 3: Decay low-value memories
for &node_id in &low_value {
let retained = self.decay_memory(node_id);
if retained {
report.memories_decayed += 1;
} else {
report.memories_removed += 1;
}
}
// Phase 4: Prune weak edges
let pruned = self.prune_weak_edges();
report.edges_pruned = pruned;
// Phase 5: Merge similar memories
let merged = self.merge_similar_memories();
report.memories_merged = merged;
report
}
fn categorize_memories(&self) -> (Vec<NodeId>, Vec<NodeId>, Vec<NodeId>) {
let memory = self.memory.read();
let mut high = Vec::new();
let mut medium = Vec::new();
let mut low = Vec::new();
for node in memory.iter_nodes() {
let value_score = self.compute_value_score(&memory, node);
if value_score > 0.7 {
high.push(node.id);
} else if value_score > 0.3 {
medium.push(node.id);
} else {
low.push(node.id);
}
}
(high, medium, low)
}
// Takes the caller's read guard instead of re-locking `self.memory`:
// a nested `read()` can deadlock under `std::sync::RwLock` once a writer is queued
fn compute_value_score(&self, memory: &MemoryGraph, node: &MemoryNode) -> f32 {
// Factors:
// 1. Access frequency (more access = more valuable)
let freq_score = (node.access_count as f32 / 100.0).min(1.0);
// 2. Recency (recent = more valuable)
let age_days = (chrono::Utc::now().timestamp() - node.last_accessed) / 86400;
let recency_score = (-self.age_decay * age_days as f32).exp();
// 3. Quality (explicit quality score)
let quality_score = node.quality_score;
// 4. Connectivity (well-connected = more valuable)
let degree = memory.node_degree(node.id);
let connectivity_score = (degree as f32 / 10.0).min(1.0);
// Weighted combination
0.3 * freq_score + 0.2 * recency_score + 0.3 * quality_score + 0.2 * connectivity_score
}
fn strengthen_memory(&self, node_id: NodeId) {
let mut memory = self.memory.write();
// Collect edge sources first so the loop doesn't alias the mutable graph borrow
let sources: Vec<NodeId> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
// Increase edge weights to this node
for from in sources {
memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(1.1));
}
// Mark as consolidated
if let Some(node) = memory.get_node_mut(node_id) {
node.consolidation_count += 1;
node.last_consolidated = chrono::Utc::now().timestamp();
}
}
fn decay_memory(&self, node_id: NodeId) -> bool {
let mut memory = self.memory.write();
// Collect edge sources first so the loop doesn't alias the mutable graph borrow
let sources: Vec<NodeId> = memory.get_edges_to(node_id).iter().map(|e| e.from).collect();
// Reduce edge weights
for from in sources {
memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(0.5));
}
// Check if node should be removed entirely
let total_incoming_weight: f32 = memory.get_edges_to(node_id)
.iter()
.map(|e| e.weight)
.sum();
if total_incoming_weight < 0.01 {
// Remove isolated or nearly-isolated node
memory.remove_node(node_id);
false // Not retained
} else {
true // Retained but weakened
}
}
fn prune_weak_edges(&self) -> usize {
let mut memory = self.memory.write();
let weak_edges: Vec<_> = memory.iter_edges()
.filter(|e| e.weight < 0.01)
.map(|e| e.id)
.collect();
for edge_id in &weak_edges {
memory.remove_edge(*edge_id);
}
weak_edges.len()
}
fn merge_similar_memories(&self) -> usize {
let mut memory = self.memory.write();
let mut merged_count = 0;
// Snapshot ids and embeddings first so the graph can be mutated while iterating
let nodes: Vec<(NodeId, Vec<f32>)> = memory.iter_nodes()
.map(|n| (n.id, n.embedding.clone()))
.collect();
let mut merged = std::collections::HashSet::new();
// Find highly similar node pairs
for i in 0..nodes.len() {
if merged.contains(&nodes[i].0) { continue; }
for j in (i+1)..nodes.len() {
if merged.contains(&nodes[j].0) { continue; }
let sim = cosine_similarity(&nodes[i].1, &nodes[j].1);
if sim > 0.98 {
// Merge j into i; mark j so it is never merged twice
memory.merge_nodes(nodes[i].0, nodes[j].0);
merged.insert(nodes[j].0);
merged_count += 1;
}
}
}
merged_count
}
}
#[derive(Default)]
pub struct ConsolidationReport {
pub high_value_count: usize,
pub medium_value_count: usize,
pub low_value_count: usize,
pub memories_strengthened: usize,
pub memories_decayed: usize,
pub memories_removed: usize,
pub memories_merged: usize,
pub edges_pruned: usize,
}
```
---
## 7. Full Dream Cycle
### Orchestrating the Dream Process
```rust
/// Complete dream cycle orchestrator
pub struct DreamCycle {
generator: DreamGenerator,
evaluator: DreamEvaluator,
integrator: DreamIntegrator,
consolidator: ConsolidationEngine,
/// Shared memory graph; `generate_dreams` below reads from it
memory: Arc<RwLock<MemoryGraph>>,
config: DreamCycleConfig,
}
impl DreamCycle {
/// Run complete dream cycle (weekly maintenance)
pub async fn run(&self) -> DreamCycleReport {
let start = Instant::now();
let mut report = DreamCycleReport::default();
// Phase 1: Generate dreams
tracing::info!("Starting dream generation phase");
let dreams = self.generate_dreams();
report.dreams_generated = dreams.len();
// Phase 2: Evaluate dreams
tracing::info!("Evaluating {} dreams", dreams.len());
let evaluated: Vec<_> = dreams.iter()
.map(|d| (d, self.evaluator.evaluate(d)))
.collect();
// Phase 3: Integrate valuable dreams
tracing::info!("Integrating valuable dreams");
for (dream, quality) in &evaluated {
if quality.is_valuable(self.config.dream_threshold) {
let result = self.integrator.integrate(dream, quality);
report.edges_added += result.edges_added;
report.dreams_integrated += 1;
}
}
// Phase 4: Memory consolidation
tracing::info!("Running memory consolidation");
report.consolidation = self.consolidator.consolidate();
report.elapsed_ms = start.elapsed().as_millis() as u64;
report.timestamp = chrono::Utc::now().timestamp();
tracing::info!(
dreams = report.dreams_generated,
integrated = report.dreams_integrated,
edges = report.edges_added,
elapsed_ms = report.elapsed_ms,
"Dream cycle completed"
);
report
}
fn generate_dreams(&self) -> Vec<Dream> {
let mut dreams = Vec::new();
// Regular random walk dreams
for _ in 0..self.config.num_regular_dreams {
let dream = self.generator.generate_dream(&self.memory, None);
dreams.push(dream);
}
// Creative jump dreams
for _ in 0..self.config.num_creative_dreams {
let dream = self.generator.generate_creative_dream(
&self.memory,
self.config.creative_jump_count,
);
dreams.push(dream);
}
dreams
}
}
#[derive(Default)]
pub struct DreamCycleReport {
pub dreams_generated: usize,
pub dreams_integrated: usize,
pub edges_added: usize,
pub consolidation: ConsolidationReport,
pub elapsed_ms: u64,
pub timestamp: i64,
}
```
---
## 8. Integration with exo-exotic Dreams Module
SONA integrates with the exo-ai-2025 dream experiments:
```rust
// From exo-exotic crate
use exo_exotic::experiments::dreams::{
DreamExperiment,
DreamConfig,
NoveltyMeasure,
};
impl DreamCycle {
/// Run advanced dream experiments from exo-exotic
pub async fn run_exotic_dreams(&self) -> ExoticDreamReport {
let dream_experiment = DreamExperiment::new(DreamConfig {
memory_count: self.memory.node_count(),
replay_probability: 0.7,
recombination_rate: 0.3,
novelty_threshold: 0.5,
});
let result = dream_experiment.run(&self.memory).await;
ExoticDreamReport {
novelty_score: result.novelty,
coherence_score: result.coherence,
creative_insights: result.insights.len(),
new_hypotheses: result.hypotheses,
}
}
}
```
---
## Summary
SONA's Dream Engine enables:
| Feature | Mechanism | Outcome |
|---------|-----------|---------|
| **Memory Replay** | Random walks on memory graph | Strengthens important connections |
| **Creative Recombination** | High-temperature sampling | Discovers novel associations |
| **Quality Filtering** | Novelty + coherence metrics | Only valuable dreams integrated |
| **Weak Edge Creation** | Dream-derived connections | Enables creative retrieval |
| **Memory Consolidation** | Value-based pruning | Efficient memory usage |
Dreams allow SONA to:
1. **Discover** connections it wouldn't find through normal operation
2. **Explore** the hypothesis space without user cost
3. **Consolidate** valuable knowledge
4. **Prune** low-value information
5. **Remain creative** while staying grounded
File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -0,0 +1,814 @@
# SONA Performance Benchmarks
## Overview
This document defines performance targets, benchmark methodology, and expected results for SONA components. All benchmarks are designed to be reproducible and measurable.
## Performance Targets Summary
```
┌─────────────────────────────────────────────────────────────────────────┐
│ SONA Performance Targets │
├─────────────────────────────────────────────────────────────────────────┤
│ Component │ Target │ Stretch Goal │ Unit │
├─────────────────────────┼────────────────┼───────────────┼─────────────┤
│ Micro-LoRA forward │ <50μs │ <20μs │ per request │
│ Micro-LoRA update │ <100μs │ <50μs │ per signal │
│ Base LoRA forward │ <200μs │ <100μs │ per layer │
│ Pattern extraction │ <1s │ <500ms │ per 1000 │
│ Trajectory recording │ <10μs │ <5μs │ per step │
│ Background cycle │ <30s │ <15s │ per cycle │
│ Deep cycle │ <10min │ <5min │ per cycle │
│ Memory overhead │ <100MB │ <50MB │ total │
│ Pattern search │ <1ms │ <100μs │ per query │
│ Dream generation │ <100ms │ <50ms │ per dream │
└─────────────────────────────────────────────────────────────────────────┘
```
---
## Micro-LoRA Benchmarks
### Forward Pass Latency
**Target**: <50μs average, <100μs p99
```rust
// benches/micro_lora.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
fn bench_micro_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("micro_lora_forward");
for rank in [1, 2] {
for hidden_dim in [256, 512, 1024, 2048] {
let lora = MicroLoRA::new(hidden_dim, rank);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
output.fill(0.0);
unsafe { lora.forward_simd(&input, &mut output) };
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | AVX2 (μs) | Scalar (μs) | Speedup |
|------|------------|-----------|-------------|---------|
| 1 | 256 | 3.2 | 12.5 | 3.9x |
| 1 | 512 | 5.8 | 24.1 | 4.2x |
| 1 | 1024 | 10.4 | 47.3 | 4.5x |
| 1 | 2048 | 19.7 | 93.8 | 4.8x |
| 2 | 256 | 5.1 | 23.4 | 4.6x |
| 2 | 512 | 9.3 | 46.2 | 5.0x |
| 2 | 1024 | 17.2 | 91.5 | 5.3x |
| 2 | 2048 | 33.1 | 182.4 | 5.5x |
### Gradient Accumulation
**Target**: <100μs per signal
```rust
fn bench_gradient_accumulation(c: &mut Criterion) {
let mut group = c.benchmark_group("gradient_accumulation");
for hidden_dim in [256, 512, 1024] {
let mut lora = MicroLoRA::new(hidden_dim, 1);
let signal = LearningSignal {
query_embedding: vec![0.1; hidden_dim],
gradient_estimate: vec![0.01; hidden_dim],
quality_score: 0.8,
timestamp: Instant::now(),
metadata: SignalMetadata::default(),
};
group.bench_with_input(
BenchmarkId::from_parameter(hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.accumulate_gradient(&signal);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Time (μs) | Throughput (signals/s) |
|------------|-----------|------------------------|
| 256 | 8.3 | 120,481 |
| 512 | 15.7 | 63,694 |
| 1024 | 30.2 | 33,112 |
---
## Base LoRA Benchmarks
### Forward Pass (Per Layer)
**Target**: <200μs per layer
```rust
fn bench_base_lora_forward(c: &mut Criterion) {
let mut group = c.benchmark_group("base_lora_forward");
for rank in [4, 8, 16] {
for hidden_dim in [512, 1024, 2048] {
let lora = BaseLoRA::new(hidden_dim, rank, 1);
let input = vec![0.1f32; hidden_dim];
let mut output = vec![0.0f32; hidden_dim];
group.bench_with_input(
BenchmarkId::new(format!("rank{}", rank), hidden_dim),
&hidden_dim,
|b, _| {
b.iter(|| {
lora.forward_layer(0, &input, &mut output);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Rank | Hidden Dim | Time (μs) | FLOPs | GFLOPS |
|------|------------|-----------|----------|--------|
| 4 | 512 | 45 | 4.2M | 93 |
| 4 | 1024 | 85 | 8.4M | 99 |
| 4 | 2048 | 162 | 16.8M | 104 |
| 8 | 512 | 82 | 8.4M | 102 |
| 8 | 1024 | 158 | 16.8M | 106 |
| 8 | 2048 | 305 | 33.5M | 110 |
| 16 | 512 | 155 | 16.8M | 108 |
| 16 | 1024 | 298 | 33.5M | 112 |
| 16 | 2048 | 582 | 67.1M | 115 |
---
## Trajectory Recording Benchmarks
### Step Recording Latency
**Target**: <10μs per step
```rust
fn bench_trajectory_recording(c: &mut Criterion) {
let mut group = c.benchmark_group("trajectory_recording");
for hidden_dim in [256, 512] {
for num_heads in [4, 8] {
let mut builder = TrajectoryBuilder::new(1, vec![0.1; hidden_dim]);
group.bench_with_input(
BenchmarkId::new(format!("h{}_heads{}", hidden_dim, num_heads), hidden_dim),
&(hidden_dim, num_heads),
|b, &(hd, nh)| {
b.iter(|| {
builder.add_step(
vec![0.5; hd],
vec![0.1; hd * nh],
0.8,
);
});
},
);
}
}
group.finish();
}
```
**Expected Results**:
| Hidden Dim | Heads | Time (μs) | Memory (bytes) |
|------------|-------|-----------|----------------|
| 256 | 4 | 2.1 | 5,120 |
| 256 | 8 | 3.8 | 9,216 |
| 512 | 4 | 3.7 | 10,240 |
| 512 | 8 | 6.9 | 18,432 |
### Buffer Operations
**Target**: Lock-free with <1% contention
```rust
fn bench_trajectory_buffer(c: &mut Criterion) {
let buffer = Arc::new(TrajectoryBuffer::new(10000));
c.bench_function("trajectory_buffer_record", |b| {
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
};
b.iter(|| {
buffer.record(trajectory.clone());
});
});
c.bench_function("trajectory_buffer_drain", |b| {
// Pre-fill buffer
for i in 0..1000 {
buffer.record(QueryTrajectory {
id: i,
query_embedding: vec![0.1; 256],
steps: vec![],
final_quality: 0.8,
latency_us: 1000,
});
}
b.iter(|| {
buffer.drain()
});
});
}
```
---
## Pattern Learning Benchmarks
### K-means++ Extraction
**Target**: <1s for 1000 trajectories
```rust
fn bench_pattern_extraction(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_extraction");
for n_trajectories in [100, 500, 1000, 5000] {
let mut bank = ReasoningBank::new(PatternConfig {
k_clusters: 50,
embedding_dim: 256,
..Default::default()
});
// Pre-populate
for i in 0..n_trajectories {
bank.add_trajectory(&generate_random_trajectory(i, 256));
}
group.bench_with_input(
BenchmarkId::from_parameter(n_trajectories),
&n_trajectories,
|b, _| {
b.iter(|| {
bank.extract_patterns()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Trajectories | Clusters | Time (ms) | Iterations |
|--------------|----------|-----------|------------|
| 100 | 10 | 12 | 8 |
| 500 | 25 | 95 | 12 |
| 1000 | 50 | 380 | 15 |
| 5000 | 100 | 2,450 | 20 |
### Pattern Search
**Target**: <1ms per query
```rust
fn bench_pattern_search(c: &mut Criterion) {
let mut group = c.benchmark_group("pattern_search");
for n_patterns in [1000, 10000, 100000] {
let mut index = PatternIndex::new(256, n_patterns);
// Pre-populate
for i in 0..n_patterns {
let embedding: Vec<f32> = (0..256).map(|_| rand::random()).collect();
index.add_pattern(i as u64, &embedding).unwrap();
}
let query: Vec<f32> = (0..256).map(|_| rand::random()).collect();
group.bench_with_input(
BenchmarkId::from_parameter(n_patterns),
&n_patterns,
|b, _| {
b.iter(|| {
index.find_similar(&query, 10)
});
},
);
}
group.finish();
}
```
**Expected Results** (HNSW with ef=50):
| Patterns | Search Time (μs) | Recall@10 |
|----------|------------------|-----------|
| 1,000 | 45 | 0.98 |
| 10,000 | 120 | 0.96 |
| 100,000 | 350 | 0.94 |
| 1,000,000| 850 | 0.92 |
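The Recall@10 column assumes comparison against brute-force ground truth. A minimal sketch of that metric (the id-slice interface here is an assumption for illustration, not the `PatternIndex` API):

```rust
/// Recall@k: fraction of the exact top-k ids that the approximate
/// search also returned. `exact` is the brute-force ground truth.
pub fn recall_at_k(approx: &[u64], exact: &[u64]) -> f32 {
    let hits = approx.iter().filter(|id| exact.contains(id)).count();
    hits as f32 / exact.len() as f32
}
```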
---
## EWC++ Benchmarks
### Fisher Information Update
**Target**: <1ms per update
```rust
fn bench_fisher_update(c: &mut Criterion) {
let mut group = c.benchmark_group("fisher_update");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
..Default::default()
});
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.update_fisher(&gradients);
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Parameters | Update Time (μs) | Memory (KB) |
|------------|------------------|-------------|
| 1,000 | 15 | 8 |
| 10,000 | 120 | 80 |
| 100,000 | 1,150 | 800 |
### Constraint Application
**Target**: <500μs per gradient vector
```rust
fn bench_constraint_application(c: &mut Criterion) {
let mut group = c.benchmark_group("ewc_constraints");
for param_count in [1000, 10000, 100000] {
let mut ewc = EwcPlusPlus::new(EwcConfig {
param_count,
num_tasks: 5,
..Default::default()
});
// Pre-train Fisher
for _ in 0..100 {
let grads: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
ewc.update_fisher(&grads);
}
let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
group.bench_with_input(
BenchmarkId::from_parameter(param_count),
&param_count,
|b, _| {
b.iter(|| {
ewc.apply_constraints(&gradients)
});
},
);
}
group.finish();
}
```
---
## Dream Engine Benchmarks
### Dream Generation
**Target**: <100ms per dream
```rust
fn bench_dream_generation(c: &mut Criterion) {
let mut group = c.benchmark_group("dream_generation");
for memory_size in [1000, 10000, 50000] {
let mut engine = DreamEngine::new(DreamConfig::default());
// Pre-populate memory
for i in 0..memory_size {
engine.add_memory_node(MemoryNode {
id: i as u64,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: rand::random::<u32>() % 100,
importance: rand::random(),
});
}
group.bench_with_input(
BenchmarkId::from_parameter(memory_size),
&memory_size,
|b, _| {
b.iter(|| {
engine.generate_dream()
});
},
);
}
group.finish();
}
```
**Expected Results**:
| Memory Nodes | Dream Time (ms) | Avg Path Length |
|--------------|-----------------|-----------------|
| 1,000 | 12 | 8 |
| 10,000 | 45 | 12 |
| 50,000 | 85 | 15 |
### Dream Quality Evaluation
**Target**: <50ms per evaluation
```rust
fn bench_dream_evaluation(c: &mut Criterion) {
let evaluator = DreamEvaluator::new(EvaluatorConfig::default());
let dream = Dream {
id: 1,
path: (0..15).map(|i| MemoryNode {
id: i,
embedding: (0..256).map(|_| rand::random()).collect(),
timestamp: Instant::now(),
access_count: 10,
importance: 0.5,
}).collect(),
creative_jumps: 3,
total_novelty: 0.0,
};
c.bench_function("dream_evaluation", |b| {
b.iter(|| {
evaluator.evaluate(&dream)
});
});
}
```
---
## Learning Loop Benchmarks
### Loop A (Instant) - Per Request
**Target**: <1ms total overhead
```rust
fn bench_loop_a(c: &mut Criterion) {
let loop_a = InstantLoop::new(256, InstantLoopConfig::default());
let trajectory = QueryTrajectory {
id: 1,
query_embedding: vec![0.1; 256],
steps: (0..10).map(|_| TrajectoryStep {
activations: vec![0.5; 256],
attention_weights: vec![0.1; 2048],
reward: 0.8,
timestamp: Instant::now(),
}).collect(),
final_quality: 0.8,
latency_us: 50000,
};
c.bench_function("loop_a_on_inference", |b| {
b.iter(|| {
loop_a.on_inference(trajectory.clone());
});
});
c.bench_function("loop_a_flush", |b| {
// Pre-fill with signals
for _ in 0..100 {
loop_a.on_inference(trajectory.clone());
}
b.iter(|| {
loop_a.flush_updates();
});
});
}
```
**Expected Results**:
| Operation | Time (μs) | Notes |
|---------------|-----------|--------------------------|
| on_inference | 650 | Recording + accumulation |
| flush_updates | 120 | LoRA + edge commit |
| Total | 770 | Per request overhead |
### Loop B (Background) - Hourly
**Target**: <30s per cycle
```rust
fn bench_loop_b(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_b = BackgroundLoop::new(BackgroundLoopConfig::default(), 256);
// Generate trajectories
let trajectories: Vec<_> = (0..1000)
.map(|i| generate_random_trajectory(i, 256))
.collect();
c.bench_function("loop_b_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_b.run_cycle(trajectories.clone()).await
});
});
}
```
**Breakdown**:
| Phase | Time (s) | % of Total |
|------------------------|----------|------------|
| Trajectory ingestion | 0.5 | 2% |
| Pattern extraction | 8.0 | 32% |
| Gradient computation | 5.0 | 20% |
| EWC++ constraints | 3.0 | 12% |
| LoRA update | 2.0 | 8% |
| Fisher update | 4.0 | 16% |
| Metrics/logging | 2.5 | 10% |
| **Total** | **25.0** | 100% |
### Loop C (Deep) - Weekly
**Target**: <10min per cycle
```rust
fn bench_loop_c(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let loop_c = DeepLoop::new(DeepLoopConfig::default());
// This is a longer benchmark, run fewer iterations
c.bench_function("loop_c_cycle", |b| {
b.to_async(&runtime).iter(|| async {
loop_c.run_cycle().await
});
});
}
```
**Breakdown**:
| Phase | Time (min) | % of Total |
|------------------------|------------|------------|
| Dream generation (50) | 1.5 | 15% |
| Φ evaluation | 2.0 | 20% |
| Dream integration | 1.0 | 10% |
| Memory consolidation | 3.0 | 30% |
| EWC++ consolidation | 2.0 | 20% |
| Metrics/persistence | 0.5 | 5% |
| **Total** | **10.0** | 100% |
---
## Memory Benchmarks
### Memory Usage by Component
```rust
fn measure_memory_usage() -> MemoryReport {
let mut report = MemoryReport::default();
// Micro-LoRA (rank=1, hidden=256)
let micro_lora = MicroLoRA::new(256, 1);
report.micro_lora = std::mem::size_of_val(&micro_lora)
+ micro_lora.down_proj.len() * 4
+ micro_lora.up_proj.len() * 4
+ micro_lora.gradient_buffer.len() * 4;
// Base LoRA (rank=8, hidden=256, layers=12)
let base_lora = BaseLoRA::new(256, 8, 12);
report.base_lora = std::mem::size_of_val(&base_lora)
+ base_lora.layers.iter().map(|l|
l.down_proj.len() * 4 + l.up_proj.len() * 4
).sum::<usize>();
// Trajectory buffer (capacity=10000)
report.trajectory_buffer = 10000 * (
256 * 4 // query embedding
+ 10 * (256 * 4 + 2048 * 4 + 4 + 8) // 10 steps
);
// Pattern index (100k patterns)
report.pattern_index = 100000 * (256 * 4 + 64); // embedding + metadata
// EWC++ (100k params, 5 tasks)
report.ewc = 100000 * 4 * 5; // Fisher per task
report
}
```
**Expected Memory Usage**:
| Component | Size (MB) | Notes |
|------------------|-----------|--------------------------|
| Micro-LoRA | 0.004 | Minimal overhead |
| Base LoRA | 0.6 | 12 layers |
| Trajectory Buffer| 82.0 | 10k capacity |
| Pattern Index | 102.4 | 100k patterns |
| EWC++ Fisher | 2.0 | 100k params × 5 tasks |
| Dream Engine | 12.8 | 50k memory nodes |
| **Total** | **199.8** | Peak usage |
---
## Throughput Benchmarks
### End-to-End Query Throughput
```rust
fn bench_query_throughput(c: &mut Criterion) {
let runtime = tokio::runtime::Runtime::new().unwrap();
let sona = runtime.block_on(async {
SonaEngine::new(SonaConfig::default()).await.unwrap()
});
c.bench_function("query_throughput", |b| {
b.to_async(&runtime).iter(|| async {
sona.process("test query", &Context::default()).await
});
});
}
```
**Expected Throughput**:
| Scenario | QPS | Latency p50 | Latency p99 |
|--------------------|---------|-------------|-------------|
| Baseline (no SONA) | 850 | 1.1ms | 2.5ms |
| With Micro-LoRA | 780 | 1.2ms | 2.8ms |
| Full SONA | 720 | 1.3ms | 3.2ms |
**Overhead**: ~15% throughput reduction for full self-learning capability.
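The ~15% figure follows directly from the QPS columns in the table above:

```rust
fn main() {
    // QPS figures taken from the throughput table above
    let baseline_qps = 850.0_f64;
    let full_sona_qps = 720.0_f64;
    let reduction_pct = (1.0 - full_sona_qps / baseline_qps) * 100.0;
    println!("throughput reduction: {:.1}%", reduction_pct); // ≈ 15.3%
}
```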
---
## Hardware-Specific Benchmarks
### CPU Feature Detection
```rust
fn check_cpu_features() -> CpuFeatures {
CpuFeatures {
avx2: is_x86_feature_detected!("avx2"),
avx512f: is_x86_feature_detected!("avx512f"),
fma: is_x86_feature_detected!("fma"),
sse4_1: is_x86_feature_detected!("sse4.1"),
sse4_2: is_x86_feature_detected!("sse4.2"),
}
}
```
### Performance by CPU
| CPU | Micro-LoRA (μs) | Pattern Search (μs) | Overall Speedup |
|------------------------|-----------------|---------------------|-----------------|
| Intel i9-13900K (AVX2) | 3.2 | 45 | 4.8x |
| AMD Ryzen 9 7950X | 3.5 | 48 | 4.5x |
| Apple M2 Pro (NEON) | 4.1 | 52 | 3.9x |
| Intel Xeon Platinum | 2.8 | 38 | 5.2x |
---
## Benchmark Commands
```bash
# Run all benchmarks
cargo bench --package ruvllm --features sona
# Run specific benchmark group
cargo bench --package ruvllm --bench micro_lora
# Run with specific features
cargo bench --package ruvllm --features "sona,avx2"
# Profile memory
cargo bench --package ruvllm --bench memory -- --profile-time 60
# Generate flamegraph
cargo flamegraph --bench micro_lora -- --bench
```
---
## Continuous Benchmarking
### CI Integration
```yaml
# .github/workflows/bench.yml
name: Benchmarks
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench --package ruvllm --features sona -- --save-baseline main
- name: Compare with baseline
run: cargo bench --package ruvllm --features sona -- --baseline main
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: target/criterion
```
### Regression Detection
```rust
// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;
fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;
if regression > MAX_REGRESSION_PERCENT {
Err(format!(
"Performance regression of {:.1}% exceeds threshold of {}%",
regression, MAX_REGRESSION_PERCENT
))
} else {
Ok(())
}
}
```
---
## Next Steps
1. **09-API-REFERENCE.md** - Complete API documentation
File diff suppressed because it is too large

View File
@@ -0,0 +1,138 @@
# RuvLLM Documentation
## Overview
This directory contains documentation for the RuvLLM self-learning LLM architecture.
## Quick Links
- [Main README](../README.md) - Getting started, API reference, benchmarks
- [SPARC Documentation](./sparc/) - Design methodology documentation
## SPARC Methodology
The project was designed using the SPARC methodology:
| Phase | Document | Description |
|-------|----------|-------------|
| 1 | [Specification](./sparc/01-specification.md) | Requirements and acceptance criteria |
| 2 | [Pseudocode](./sparc/02-pseudocode.md) | Algorithm design and data flows |
| 3 | [Architecture](./sparc/03-architecture.md) | System design and component interactions |
| 4 | [Refinement](./sparc/04-refinement.md) | TDD implementation and iterative improvement |
| 5 | [Completion](./sparc/05-completion.md) | Integration, testing, and deployment |
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ RuvLLM System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Embedding │ │ Memory │ │ Router │ │
│ │ Service │ │ (HNSW) │ │ (FastGRNN) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ Orchestrator │ │
│ └──────┬──────┘ │
│ │ │
│ ┌─────────────┐ ┌──────┴──────┐ ┌─────────────┐ │
│ │ Attention │ │ Inference │ │ Learning │ │
│ │ Engine │ │ Pool │ │ Service │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Module Documentation
### Core Modules
| Module | File | Description |
|--------|------|-------------|
| `orchestrator` | `src/orchestrator.rs` | Main coordinator, request processing pipeline |
| `memory` | `src/memory.rs` | HNSW-based semantic memory with graph expansion |
| `router` | `src/router.rs` | FastGRNN routing with EWC learning |
| `attention` | `src/attention.rs` | Multi-head graph attention with edge features |
| `embedding` | `src/embedding.rs` | Tokenization, embedding, and caching |
| `inference` | `src/inference.rs` | LFM2 model pool management |
| `learning` | `src/learning.rs` | Self-learning feedback loops |
| `compression` | `src/compression.rs` | Memory compression and clustering |
### Supporting Modules
| Module | File | Description |
|--------|------|-------------|
| `config` | `src/config.rs` | Configuration system with builder pattern |
| `error` | `src/error.rs` | Error types and result aliases |
| `types` | `src/types.rs` | Core domain types and structs |
## API Examples
### Basic Query
```rust
use ruvllm::{Config, RuvLLM};
let config = Config::builder().build()?;
let llm = RuvLLM::new(config).await?;
let response = llm.query("What is Rust?").await?;
```
### Session Management
```rust
let session = llm.new_session();
let r1 = llm.query_session(&session, "Tell me about vectors").await?;
let r2 = llm.query_session(&session, "How are they used in ML?").await?;
```
### Feedback Loop
```rust
use ruvllm::Feedback;
llm.feedback(Feedback {
request_id: response.request_id,
rating: Some(5),
correction: None,
task_success: Some(true),
}).await?;
```
## Performance Tuning
### Memory Configuration
```rust
Config::builder()
.hnsw_params(
32, // M: connections per node (higher = better recall, more memory)
200, // ef_construction: build quality (higher = slower build, better index)
64, // ef_search: search quality (higher = slower search, better recall)
)
```
### Router Configuration
```rust
Config::builder()
.router_hidden_dim(128) // Hidden state size (higher = more capacity)
```
### Learning Configuration
```rust
Config::builder()
.learning_enabled(true) // Enable self-learning
```
## Further Reading
- [LFM2 Paper](https://arxiv.org/abs/2511.23404v1) - Liquid Foundation Models
- [FastGRNN Paper](https://arxiv.org/abs/1901.02358) - Fast RNN architecture
- [HNSW Paper](https://arxiv.org/abs/1603.09320) - Approximate nearest neighbor search
- [EWC Paper](https://arxiv.org/abs/1612.00796) - Continual learning
View File
@@ -0,0 +1,612 @@
# RuvLLM: Self-Learning LLM with LFM2 and Ruvector Integration
## SPARC Phase 1: Specification
---
## 1. Executive Summary
RuvLLM is a self-learning LLM architecture that integrates **Liquid Foundation Models (LFM2)** with **ruvector** as the world model and memory substrate. The system uses **FastGRNN** as an intelligent router to dynamically allocate computational resources based on query complexity, enabling efficient on-device inference with continuous learning capabilities.
### Core Innovation
The architecture treats:
- **LFM2** as the reasoning head (inference engine)
- **Ruvector** as the world model and episodic memory
- **FastGRNN** as the control circuit (routing decisions)
This triad creates a self-learning system where:
1. Queries are semantically embedded and matched against memory
2. Graph attention extracts relevant neighborhood context
3. FastGRNN routes to optimal model configuration
4. LFM2 generates responses with retrieved context
5. Successful interactions are written back to memory (self-improvement)
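The five-step loop above can be sketched end to end. Everything in this block is a toy stand-in chosen to make the data flow concrete: `embed`, `search`, and `route` are hypothetical placeholders, not the real embedding service, HNSW index, or FastGRNN router.

```rust
struct Hit { text: String, score: f32 }

// Step 1 stand-in: toy byte-histogram "embedding" of the query
fn embed(query: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 8];
    for b in query.bytes() { v[(b % 8) as usize] += 1.0; }
    v
}

// Step 2 stand-in: fake HNSW retrieval returning k scored neighbors
fn search(_emb: &[f32], k: usize) -> Vec<Hit> {
    (0..k).map(|i| Hit { text: format!("doc{i}"), score: 1.0 / (i as f32 + 1.0) }).collect()
}

// Step 3 stand-in: cheap heuristic routing to a model tier (0 = small, 1 = large)
fn route(emb: &[f32], hits: &[Hit]) -> usize {
    let mean_score: f32 = hits.iter().map(|h| h.score).sum::<f32>() / hits.len() as f32;
    if mean_score > 0.3 && emb.iter().sum::<f32>() < 32.0 { 0 } else { 1 }
}

fn main() {
    let query = "What is Rust?";
    let emb = embed(query);      // 1. semantic embedding
    let hits = search(&emb, 4);  // 2. neighborhood retrieval
    let tier = route(&emb, &hits); // 3. routing decision
    let context: Vec<&str> = hits.iter().map(|h| h.text.as_str()).collect();
    // 4. LFM2 generation and 5. memory write-back are elided: they need a real model
    println!("tier={tier} context={context:?}");
}
```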
---
## 2. Technical Requirements
### 2.1 Functional Requirements
#### FR-001: LFM2 Model Integration
- **Description**: Support LFM2 model family (350M, 700M, 1.2B, 2.6B parameters)
- **Acceptance Criteria**:
- Load models via llama.cpp (CPU) or vLLM (server)
- Support quantization: Q4/Q5 (CPU), 8-bit/4-bit weight-only (GPU)
- Enable KV cache for context reuse
- Achieve <500ms median latency (CPU), <100ms (GPU)
#### FR-002: Ruvector Memory Service
- **Description**: Implement semantic memory with graph structure
- **Storage Schema**:
```
Nodes: {
id: UUID,
vector: [f32; D], // D = embedding dimension
text: String,
type: NodeType, // Query | Document | AgentStep | Fact
source: String,
metadata: {
timestamp: i64,
tags: Vec<String>,
domain: String,
version: u32,
confidence: f32
}
}
Edges: {
id: UUID,
src: UUID,
dst: UUID,
rel: EdgeType, // Cites | Follows | SameTopic | AgentStep | Derived
weight: f32,
metadata: {
timestamp: i64,
created_by: String,
confidence: f32
}
}
```
- **Acceptance Criteria**:
- HNSW index with M=32, efConstruction=200, efSearch=64
- Sub-millisecond retrieval for k≤64
- Graph attention over 2-hop neighborhoods
- Support billion-scale corpora
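The 2-hop neighborhood criterion amounts to a depth-limited BFS from the retrieved seed nodes. A minimal sketch over plain adjacency lists (the `HashMap` representation is an assumption; ruvector's actual graph storage differs):

```rust
use std::collections::{HashMap, HashSet};

/// Depth-limited BFS: seed nodes plus everything reachable within 2 hops.
pub fn two_hop(adj: &HashMap<u64, Vec<u64>>, seeds: &[u64]) -> Vec<u64> {
    let mut seen: HashSet<u64> = seeds.iter().copied().collect();
    let mut frontier: Vec<u64> = seeds.to_vec();
    for _hop in 0..2 {
        let mut next = Vec::new();
        for n in &frontier {
            for &m in adj.get(n).map(|v| v.as_slice()).unwrap_or(&[]) {
                if seen.insert(m) { next.push(m); }
            }
        }
        frontier = next;
    }
    let mut out: Vec<u64> = seen.into_iter().collect();
    out.sort_unstable();
    out
}
```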
#### FR-003: FastGRNN Router
- **Description**: Implement gated recurrent router for intelligent resource allocation
- **Architecture** (per Kusupati et al.):
- Hidden size: 32-64 units
- Input: Fixed-length feature vector (~128 dims)
- Outputs: model_selection, context_size, temperature, top_p
- **Feature Vector Components** (128 dimensions):
```
Query Stats [32 dims]:
- token_count: f32
- language_id: [f32; 8] (one-hot)
- domain_encoding: [f32; 16]
- user_frequency: f32
- query_type: [f32; 6] (factual/reasoning/creative/...)
Embedding Stats [16 dims]:
- l2_norm: f32
- principal_components: [f32; 8]
- entropy: f32
- sparsity: f32
- cluster_assignment: [f32; 4]
HNSW Search Stats [48 dims]:
- k_retrieved: f32
- distances: { mean, std, min, max }: [f32; 4]
- entropy: f32
- graph_depth: f32
- recall_estimate: f32
- neighborhood_density: [f32; 16]
- semantic_coherence: [f32; 24]
System Constraints [32 dims]:
- latency_budget: f32
- device_class: [f32; 4] (edge/mobile/server/cluster)
- privacy_level: [f32; 4]
- memory_available: f32
- battery_level: f32 (for mobile)
- concurrent_requests: f32
- historical_accuracy: [f32; 16]
```
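The four feature groups above pack into a single fixed 128-dim router input. A sketch of that packing (the struct and field names are illustrative; only the group sizes come from the spec):

```rust
/// Router input assembled from the four feature groups in the spec:
/// 32 + 16 + 48 + 32 = 128 dimensions.
pub struct RouterFeatures {
    pub query_stats: [f32; 32],
    pub embedding_stats: [f32; 16],
    pub hnsw_stats: [f32; 48],
    pub system_constraints: [f32; 32],
}

impl RouterFeatures {
    /// Flatten into the fixed-length vector the FastGRNN router consumes.
    pub fn to_vec(&self) -> Vec<f32> {
        let mut v = Vec::with_capacity(128);
        v.extend_from_slice(&self.query_stats);
        v.extend_from_slice(&self.embedding_stats);
        v.extend_from_slice(&self.hnsw_stats);
        v.extend_from_slice(&self.system_constraints);
        v
    }
}
```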
#### FR-004: Self-Learning Pipeline
- **Description**: Implement continuous learning with forgetting mitigation
- **Components**:
- Online learning from successful interactions
- Elastic Weight Consolidation (EWC) for catastrophic forgetting prevention
- Experience replay with reservoir sampling
- Curriculum learning for progressive complexity
- **Acceptance Criteria**:
- Quality regret <0.1 points vs. always-big baseline
- No measurable forgetting over 10K update cycles
- Router accuracy >95% for seen patterns
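The experience-replay component above relies on reservoir sampling, which keeps a uniform random sample of a bounded size from an unbounded stream. A self-contained sketch (the inlined xorshift generator stands in for a real RNG crate; the `Reservoir` type is illustrative, not the ruvector-gnn API):

```rust
/// Reservoir sampler for the replay buffer: holds at most `capacity`
/// experiences, each stream item surviving with equal probability.
pub struct Reservoir<T> {
    capacity: usize,
    seen: usize,
    rng_state: u64,
    items: Vec<T>,
}

impl<T> Reservoir<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            seen: 0,
            rng_state: 0x9e37_79b9_7f4a_7c15, // any nonzero seed
            items: Vec::with_capacity(capacity),
        }
    }

    /// xorshift64* PRNG — stand-in for a proper RNG crate.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.rng_state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Algorithm R: fill the buffer, then replace slot j with
    /// probability capacity / seen.
    pub fn push(&mut self, item: T) {
        self.seen += 1;
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            let j = (self.next_u64() % self.seen as u64) as usize;
            if j < self.capacity {
                self.items[j] = item;
            }
        }
    }

    pub fn len(&self) -> usize { self.items.len() }
    pub fn seen(&self) -> usize { self.seen }
}
```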
#### FR-005: Graph Attention Engine
- **Description**: Context extraction via graph-aware attention
- **Mechanism**:
- Multi-head attention over retrieved nodes
- Edge-weighted aggregation (confidence, recency)
- Hyperbolic embeddings for hierarchical relationships
- 2-hop neighborhood expansion
- **Integration with existing ruvector-attention**:
- Leverage `EdgeFeaturedAttention` for edge attributes
- Use `GraphRoPE` for positional encoding on graphs
- Apply `DualSpaceAttention` for multi-manifold reasoning
### 2.2 Non-Functional Requirements
#### NFR-001: Performance
| Metric | Tier A (Server) | Tier B (Edge) | Tier C (Mobile) |
|--------|-----------------|---------------|-----------------|
| P50 Latency | <200ms | <500ms | <800ms |
| P99 Latency | <1s | <2s | <5s |
| Throughput | 100 QPS | 20 QPS | 5 QPS |
| Memory | <16GB | <4GB | <1GB |
#### NFR-002: Quality
- **Accuracy**: F1 >0.85 on QA benchmarks
- **Retrieval**: R@10 >0.90 for relevant documents
- **Router**: Decision accuracy >95%
- **Judge Rating**: 4.2+/5.0 on LLM-as-judge evaluations
#### NFR-003: Scalability
- Support 10M+ vectors in memory
- Support 1B+ vectors with hybrid indexing
- Linear scaling with node count in cluster mode
#### NFR-004: Reliability
- Zero data loss on graceful shutdown
- Recovery from OOM within 30s
- Automatic failover in cluster mode
---
## 3. LFM2 Deep Dive
### 3.1 Architecture Analysis
LFM2 employs a **hybrid backbone** combining:
1. **Gated Short Convolutions**: Lightweight local feature processing
- O(n) complexity vs O(n²) for attention
- Captures local patterns efficiently
- Enables 2x faster prefill on CPUs
2. **Grouped Query Attention (GQA)**: Reduced KV heads
- 4-8 KV heads vs 32+ in standard attention
- Maintains quality with 4x memory reduction
- Critical for edge deployment
### 3.2 Training Methodology
LFM2's training is relevant for our self-learning pipeline:
1. **Knowledge Distillation**: Tempered, decoupled Top-K
- Teacher: Large model (70B+)
- Student: LFM2 variants
   - **Insight**: We can distill router decisions from an expensive oracle
2. **Curriculum Learning**: Progressive complexity
- Start with simple factual queries
- Graduate to multi-step reasoning
- **Application**: Router training follows same progression
3. **Three-Stage Post-Training**:
- SFT: Supervised fine-tuning on quality data
- DPO: Direct preference optimization
- Model merging: Combine specialists
- **Application**: We merge domain-specific adapters
### 3.3 Multimodal Extensions (Future)
- **LFM2-VL**: Vision-language (image understanding)
- **LFM2-Audio**: Speech I/O
- **LFM2-ColBERT**: Low-latency retrieval encoder
---
## 4. Ruvector Integration Analysis
### 4.1 Existing Capabilities
| Component | Status | Integration Plan |
|-----------|--------|------------------|
| ruvector-core | ✅ Production | Primary vector store |
| ruvector-gnn | ✅ Production | Graph neural layer |
| ruvector-attention | ✅ Production | Attention mechanisms |
| ruvector-router-core | ✅ Production | Base routing |
| ruvector-graph | ✅ Production | Knowledge graph |
### 4.2 Required Extensions
#### 4.2.1 Embedding Adapter
```rust
pub struct EmbeddingAdapter {
    /// LFM2 encoder for query embedding
    lfm2_encoder: Lfm2Encoder,
    /// Dimension alignment layer
    projection: Linear,
    /// Normalization
    layer_norm: LayerNorm,
}

impl EmbeddingAdapter {
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let raw = self.lfm2_encoder.encode(text);
        let projected = self.projection.forward(&raw);
        self.layer_norm.forward(&projected)
    }
}
```
#### 4.2.2 Memory Writeback Service
```rust
pub struct MemoryWriteback {
    /// Quality threshold for writeback
    quality_threshold: f32,
    /// Deduplication via MinHash
    dedup_hasher: MinHasher,
    /// Conflict resolution
    merger: ConflictMerger,
}

impl MemoryWriteback {
    pub async fn maybe_write(
        &self,
        query: &str,
        response: &str,
        quality_score: f32,
        db: &VectorDB,
    ) -> Result<Option<UUID>> {
        if quality_score < self.quality_threshold {
            return Ok(None);
        }
        // Check for near-duplicates
        let embedding = embed(query, response);
        let similar = db.search_threshold(&embedding, 0.95)?;
        if !similar.is_empty() {
            return self.merger.resolve(similar, query, response);
        }
        // Insert new memory
        let entry = VectorEntry::new(embedding)
            .with_text(format!("Q: {}\nA: {}", query, response))
            .with_metadata(json!({
                "type": "qa_pair",
                "quality": quality_score,
                "timestamp": now(),
            }));
        Ok(Some(db.insert(entry)?))
    }
}
```
### 4.3 HNSW Parameter Tuning
Based on arxiv:2511.23404v1 insights on retrieval efficiency:
| Corpus Size | M | efConstruction | efSearch | Recall@10 |
|-------------|---|----------------|----------|-----------|
| <100K | 16 | 100 | 32 | 0.98 |
| 100K-1M | 32 | 200 | 64 | 0.96 |
| 1M-10M | 48 | 300 | 128 | 0.94 |
| 10M-100M | 64 | 400 | 256 | 0.92 |
| >100M | Hybrid | Tiered | Adaptive | 0.90 |
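The tuning table above can be folded into a simple size-based lookup. A sketch (the `HnswParams` type is illustrative; beyond 100M vectors the table prescribes hybrid/tiered indexing, which this fallback arm only approximates with the largest static profile):

```rust
#[derive(Debug, PartialEq)]
pub struct HnswParams {
    pub m: usize,
    pub ef_construction: usize,
    pub ef_search: usize,
}

/// Pick HNSW build/search parameters from the corpus-size tuning table.
pub fn hnsw_params_for(corpus_size: usize) -> HnswParams {
    match corpus_size {
        0..=99_999 => HnswParams { m: 16, ef_construction: 100, ef_search: 32 },
        100_000..=999_999 => HnswParams { m: 32, ef_construction: 200, ef_search: 64 },
        1_000_000..=9_999_999 => HnswParams { m: 48, ef_construction: 300, ef_search: 128 },
        // >=10M: table says 64/400/256 up to 100M, hybrid/tiered beyond.
        _ => HnswParams { m: 64, ef_construction: 400, ef_search: 256 },
    }
}
```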
---
## 5. FastGRNN Router Specification
### 5.1 Mathematical Formulation
FastGRNN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network):
```
z_t = σ(W_z · x_t + U_z · h_{t-1} + b_z)
h̃_t = tanh(W_h · x_t + U_h · (r_t ⊙ h_{t-1}) + b_h)
h_t = (ζ · (1 - z_t) + ν) ⊙ h̃_t + z_t ⊙ h_{t-1}
where:
- ζ, ν: Learned scalars (typically ζ≈1, ν≈0.5)
- W_z, W_h: Input weight matrices (sparse)
- U_z, U_h: Recurrent weight matrices (low-rank)
- r_t: Optional reset gate (can be fixed to 1)
```
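The update equations can be sketched as a single dense step. This is a clarity-first sketch: the reset gate is fixed to 1 as the note above allows, and the paper's sparse/low-rank weight factorizations are omitted:

```rust
/// One dense FastGRNN step. Weights are row-major (rows index hidden
/// units); `zeta` and `nu` are the learned scalars ζ and ν.
pub fn fastgrnn_step(
    x: &[f32],
    h_prev: &[f32],
    w_z: &[Vec<f32>], u_z: &[Vec<f32>], b_z: &[f32],
    w_h: &[Vec<f32>], u_h: &[Vec<f32>], b_h: &[f32],
    zeta: f32,
    nu: f32,
) -> Vec<f32> {
    // Dense matrix-vector product.
    let matvec = |m: &[Vec<f32>], v: &[f32]| -> Vec<f32> {
        m.iter()
            .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
            .collect()
    };
    let (wxz, uhz) = (matvec(w_z, x), matvec(u_z, h_prev));
    let (wxh, uhh) = (matvec(w_h, x), matvec(u_h, h_prev));
    (0..h_prev.len())
        .map(|i| {
            // z_t = σ(W_z·x + U_z·h + b_z)
            let z = 1.0 / (1.0 + (-(wxz[i] + uhz[i] + b_z[i])).exp());
            // h̃_t = tanh(W_h·x + U_h·h + b_h), reset gate fixed to 1
            let h_tilde = (wxh[i] + uhh[i] + b_h[i]).tanh();
            // h_t = (ζ·(1 - z) + ν) ⊙ h̃ + z ⊙ h_{t-1}
            (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev[i]
        })
        .collect()
}
```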
### 5.2 Output Heads
```rust
pub struct RouterOutputs {
    /// Model selection: [350M, 700M, 1.2B, 2.6B] probabilities
    pub model_probs: [f32; 4],
    /// Context size bins: [256, 512, 1024, 2048, 4096] tokens
    pub context_probs: [f32; 5],
    /// Temperature: continuous [0.0, 2.0]
    pub temperature: f32,
    /// Top-p: continuous [0.0, 1.0]
    pub top_p: f32,
    /// Confidence score
    pub confidence: f32,
}
```
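Turning the output head probabilities into a concrete decision is a small argmax plus a confidence guard. A sketch (the fallback-to-largest-model policy is an assumption here, consistent with the confidence thresholds in the risk analysis below):

```rust
/// Pick a model index from the router's probability head, falling back
/// to the largest model (index 3, the 2.6B) when confidence is low.
pub fn select_model(model_probs: &[f32; 4], confidence: f32, threshold: f32) -> usize {
    if confidence < threshold {
        return 3; // assumed fallback: route to 2.6B when unsure
    }
    model_probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```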
### 5.3 Training Protocol
**Phase 1: Data Collection**
```
For each query q:
1. Run all model configurations (expensive baseline)
2. Collect quality metrics Q, latency L, cost C
3. Compute utility: U = Q - λ·L - μ·C
4. Label: y_model = argmax(U), y_ctx = min viable context
```
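The Phase 1 labeling rule is a per-configuration utility computation followed by an argmax. A minimal sketch (slices hold one measurement per model configuration from the all-configurations baseline run):

```rust
/// Label the best configuration: U_i = Q_i − λ·L_i − μ·C_i,
/// y_model = argmax_i(U_i).
pub fn label_best_config(
    quality: &[f32],
    latency: &[f32],
    cost: &[f32],
    lambda: f32,
    mu: f32,
) -> usize {
    quality
        .iter()
        .zip(latency)
        .zip(cost)
        .map(|((q, l), c)| q - lambda * l - mu * c)
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```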
**Phase 2: Supervised Training**
```
Loss = CE(model_pred, y_model)
+ CE(ctx_pred, y_ctx)
+ α·SmoothL1(temp_pred, y_temp)
+ β·SmoothL1(top_p_pred, y_top_p)
```
**Phase 3: Online Refinement**
```
Every N requests:
1. Sample exploration (ε-greedy or Thompson)
2. Compute regret vs. oracle
3. Update weights with importance sampling
4. Apply EWC regularization
```
---
## 6. Self-Learning Mechanisms
### 6.1 Continual Learning Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Self-Learning Pipeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Query │───▶│ Retrieve│───▶│ Generate│───▶│ Evaluate│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ │ │ │ ▼ │
│ │ │ │ ┌─────────┐ │
│ │ │ │ │ Quality │ │
│ │ │ │ │ > θ ? │ │
│ │ │ │ └────┬────┘ │
│ │ │ │ │ │
│ │ │ │ ┌──────┴──────┐ │
│ │ │ │ ▼ ▼ │
│ │ │ │ ┌───────┐ ┌───────┐ │
│ │ │ │ │ Write │ │ Skip │ │
│ │ │ │ │ Back │ │ │ │
│ │ │ │ └───┬───┘ └───────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Replay Buffer (Reservoir) │ │
│ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │
│ │ │ E_1 │ │ E_2 │ │ ... │ │E_n-1│ │ E_n │ │ │
│ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │
│ └──────────────────────┬──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ EWC Regularization Layer │ │
│ │ │ │
│ │ L_total = L_task + λ·Σ F_i·(θ_i - θ*_i)² │ │
│ │ │ │
│ │ F_i = Fisher Information (importance) │ │
│ │ θ*_i = Optimal weights from previous task │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
```
### 6.2 Quality Evaluation
**LLM-as-Judge Protocol**:
```rust
pub struct QualityJudge {
    judge_model: Lfm2, // Use 2.6B for judging
    rubric: JudgeRubric,
}

impl QualityJudge {
    pub fn evaluate(&self, query: &str, response: &str, context: &[&str]) -> f32 {
        let prompt = format!(
            r#"Evaluate the response quality on a scale of 1-5:

Query: {query}
Retrieved Context: {context:?}
Response: {response}

Criteria:
1. Factual accuracy (grounded in context)
2. Completeness (addresses the query fully)
3. Coherence (logical flow)
4. Conciseness (no unnecessary verbosity)

Score (1-5):"#
        );
        let score_str = self.judge_model.generate(&prompt, 10);
        parse_score(&score_str)
    }
}
```
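The `parse_score` helper used above is left unspecified; a hypothetical implementation (this exact behavior is an assumption, not the ruvllm API) scans the judge's free-form output for the first valid rating:

```rust
/// Hypothetical parser for the judge's output: return the first digit
/// in 1..=5 as an f32 score, or 0.0 if no valid score is present.
pub fn parse_score(output: &str) -> f32 {
    output
        .chars()
        .filter_map(|c| c.to_digit(10))
        .find(|d| (1..=5).contains(d))
        .map(|d| d as f32)
        .unwrap_or(0.0)
}
```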
### 6.3 Forgetting Mitigation
**Elastic Weight Consolidation (EWC)**:
```rust
// From ruvector-gnn ewc module
pub struct ElasticWeightConsolidation {
    lambda: f32,               // Regularization strength
    fisher_info: Vec<f32>,     // Fisher information diagonal
    optimal_weights: Vec<f32>, // θ* from previous task
}

impl ElasticWeightConsolidation {
    pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
        self.fisher_info.iter()
            .zip(current_weights.iter())
            .zip(self.optimal_weights.iter())
            .map(|((f, w), w_star)| f * (w - w_star).powi(2))
            .sum::<f32>() * self.lambda / 2.0
    }

    pub fn update_fisher(&mut self, gradients: &[Vec<f32>]) {
        // Fisher diagonal: F_i = E[(∇ log P(y|x;θ))_i²], estimated per parameter
        for (i, grad_samples) in gradients.iter().enumerate() {
            self.fisher_info[i] = grad_samples.iter()
                .map(|g| g.powi(2))
                .sum::<f32>() / grad_samples.len() as f32;
        }
    }
}
```
---
## 7. Performance Optimization Strategy
### 7.1 LFM2 Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Model selection | 2-4x | <1% | FastGRNN router |
| KV cache reuse | 1.5-2x | 0% | llama.cpp native |
| Q4 quantization | 2-3x | <2% | GGUF format |
| Speculative decode | 1.3-1.5x | 0% | Draft model |
| Continuous batching | 2-4x | 0% | vLLM |
### 7.2 Ruvector Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| HNSW tuning | Variable | Recall tradeoff | efSearch adjustment |
| Product quantization | 4-8x memory | <5% | PQ in ruvector-core |
| Graph pruning | 1.2-1.5x | <1% | Edge weight threshold |
| Batch retrieval | 2-3x | 0% | Parallel HNSW |
| Caching | 10x+ (hits) | 0% | LRU with TTL |
### 7.3 Router Level
| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Sparse weights | 10-50x | <0.5% | Magnitude pruning |
| Low-rank U | 2-4x | <0.5% | SVD decomposition |
| Int8 quantization | 2-4x | <0.1% | Post-training quant |
| Cascade routing | 1.5-2x | 0% | Early exit |
---
## 8. Success Metrics
### 8.1 Primary Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| End-to-end latency P50 | <500ms | Timer instrumentation |
| Quality (LLM judge) | 4.2+/5.0 | Automated evaluation |
| Router accuracy | >95% | Oracle comparison |
| Memory efficiency | <4GB (edge) | RSS monitoring |
| Throughput | 20 QPS (edge) | Load testing |
### 8.2 Secondary Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Retrieval R@10 | >0.90 | Benchmark suite |
| Forgetting rate | <5%/10K updates | Periodic eval |
| Cost reduction | >50% vs baseline | Token counting |
| Writeback rate | 10-30% | Database metrics |
### 8.3 Regret Analysis
```
Quality Regret = E[Q_baseline - Q_routed]
Latency Regret = E[L_routed - L_oracle]
Cost Regret = E[C_routed - C_oracle]
Targets:
- Quality Regret < 0.1 points (1-5 scale)
- Latency Regret < 50ms
- Cost Regret < 10%
```
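Each regret above is the mean of a pairwise difference over logged requests, so one helper covers all three; call it as `mean_diff(&baseline_quality, &routed_quality)` for quality regret, `mean_diff(&routed_latency, &oracle_latency)` for latency regret, and likewise for cost (the helper name is illustrative):

```rust
/// Empirical E[a - b] over paired per-request measurements.
pub fn mean_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "paired measurements required");
    a.iter().zip(b).map(|(x, y)| x - y).sum::<f32>() / a.len() as f32
}
```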
---
## 9. Risk Analysis
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Router misprediction | Medium | High | Confidence thresholds, fallback |
| Catastrophic forgetting | Low | Critical | EWC, replay buffer, checkpoints |
| Memory exhaustion | Medium | High | Streaming, tiered storage |
| Quality degradation | Medium | High | A/B testing, rollback |
| Latency spikes | High | Medium | Caching, async processing |
---
## 10. Dependencies
### 10.1 Internal Dependencies
```toml
[dependencies]
ruvector-core = { path = "../ruvector-core" }
ruvector-gnn = { path = "../ruvector-gnn" }
ruvector-attention = { path = "../ruvector-attention" }
ruvector-graph = { path = "../ruvector-graph" }
ruvector-router-core = { path = "../ruvector-router-core" }
```
### 10.2 External Dependencies
```toml
[dependencies]
# LLM runtime
llama-cpp-rs = "0.3" # CPU inference
tokenizers = "0.15" # Fast tokenization
# Async runtime
tokio = { version = "1.41", features = ["full"] }
# Serialization
serde = { version = "1.0", features = ["derive"] }
# Metrics
prometheus = "0.13"
tracing = "0.1"
```
---
## 11. References
1. **LFM2 Technical Report**: arxiv:2511.23404v1
2. **FastGRNN**: Kusupati et al., "FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network"
3. **EWC**: Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks"
4. **HNSW**: Malkov & Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"
5. **Graph Attention**: Veličković et al., "Graph Attention Networks"
---
*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*

---
# RuvLLM: Integration and Deployment
## SPARC Phase 5: Completion
---
## 1. Integration Strategy
### 1.1 Crate Structure
```
ruvector/
├── crates/
│ ├── ruvector-core/ # Existing: Vector DB
│ ├── ruvector-gnn/ # Existing: GNN + EWC + Replay
│ ├── ruvector-attention/ # Existing: Attention mechanisms
│ ├── ruvector-graph/ # Existing: Graph storage
│ └── ruvector-router-core/ # Existing: Routing primitives
└── examples/
└── ruvLLM/ # NEW: Self-learning LLM
├── src/
│ ├── lib.rs # Main library entry
│ ├── orchestrator.rs # Request orchestration
│ ├── embedding.rs # LFM2 embedding service
│ ├── router.rs # FastGRNN router
│ ├── memory.rs # Ruvector memory layer
│ ├── attention.rs # Graph attention wrapper
│ ├── inference.rs # LFM2 model pool
│ ├── learning.rs # Self-learning service
│ ├── compression.rs # Concept abstraction
│ ├── config.rs # Configuration
│ ├── types.rs # Core types
│ └── error.rs # Error handling
├── tests/
│ ├── unit/
│ └── integration/
├── benches/
├── config/
└── docs/ # SPARC documentation
```
### 1.2 Dependency Integration
```toml
# examples/ruvLLM/Cargo.toml
[package]
name = "ruvllm"
version = "0.1.0"
edition = "2021"
description = "Self-learning LLM with LFM2 and Ruvector integration"
[dependencies]
# Internal dependencies (path-based for development)
ruvector-core = { path = "../../crates/ruvector-core" }
ruvector-gnn = { path = "../../crates/ruvector-gnn" }
ruvector-attention = { path = "../../crates/ruvector-attention" }
ruvector-graph = { path = "../../crates/ruvector-graph" }
ruvector-router-core = { path = "../../crates/ruvector-router-core" }
# LLM inference
llama-cpp-rs = "0.3" # CPU inference via llama.cpp
tokenizers = "0.15" # Fast tokenization
# Async runtime
tokio = { version = "1.41", features = ["full"] }
futures = "0.3"
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "2.0.0-rc.3"
# Numerics
ndarray = { version = "0.16", features = ["serde"] }
rand = "0.8"
# Utilities
uuid = { version = "1.11", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
thiserror = "2.0"
anyhow = "1.0"
tracing = "0.1"
# Performance
dashmap = "6.1"
parking_lot = "0.12"
lru = "0.12"
# Metrics
prometheus = "0.13"
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
proptest = "1.5"
tokio-test = "0.4"
tempfile = "3.13"
tracing-subscriber = "0.3"
[features]
default = ["cpu"]
cpu = [] # llama.cpp CPU inference
gpu = ["vllm"] # vLLM GPU inference (optional)
vllm = []
[[bench]]
name = "pipeline"
harness = false
[[bench]]
name = "router"
harness = false
[[bench]]
name = "memory"
harness = false
```
### 1.3 API Surface
```rust
//! # RuvLLM - Self-Learning LLM
//!
//! A self-learning language model system integrating LFM2 with Ruvector.
//!
//! ## Architecture
//!
//! - **LFM2**: Frozen reasoning engine (350M-2.6B parameters)
//! - **Ruvector**: Living memory that adapts continuously
//! - **FastGRNN**: Control circuit for intelligent routing
//!
//! ## Quick Start
//!
//! ```rust,ignore
//! use ruvllm::{RuvLLM, Config};
//!
//! #[tokio::main]
//! async fn main() -> Result<()> {
//!     // Initialize system
//!     let config = Config::builder()
//!         .db_path("./memory.db")
//!         .model_path_350m("./models/lfm2-350m-q4.gguf")
//!         .model_path_700m("./models/lfm2-700m-q4.gguf")
//!         .build()?;
//!
//!     let llm = RuvLLM::new(config).await?;
//!
//!     // Process query
//!     let response = llm.query("What is machine learning?").await?;
//!     println!("Response: {}", response.text);
//!     println!("Confidence: {:.2}", response.confidence);
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Self-Learning Loops
//!
//! The system learns through three feedback loops:
//!
//! 1. **Memory Growth**: Every interaction strengthens/weakens graph edges
//! 2. **Router Learning**: FastGRNN learns optimal model selection
//! 3. **Compression**: Periodic summarization creates concept hierarchies
pub mod attention;
pub mod compression;
pub mod config;
pub mod embedding;
pub mod error;
pub mod inference;
pub mod learning;
pub mod memory;
pub mod orchestrator;
pub mod router;
pub mod types;
// Re-exports for convenience
pub use config::{Config, ConfigBuilder};
pub use error::{Error, Result};
pub use orchestrator::RuvLLM;
pub use types::{Request, Response, Session};
/// Library version
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
```
---
## 2. Implementation Checklist
### 2.1 Core Components
```
Phase 1: Foundation
━━━━━━━━━━━━━━━━━━━━
[x] Project structure setup
[x] Cargo.toml with dependencies
[ ] Error types definition
[ ] Configuration system
[ ] Core types (Request, Response, Session)
Phase 2: Services
━━━━━━━━━━━━━━━━━━
[ ] EmbeddingService
    [ ] LFM2 encoder wrapper
    [ ] Dimension projection
    [ ] Tokenization
    [ ] Batch processing
[ ] MemoryService
    [ ] VectorDB initialization
    [ ] GraphStore integration
    [ ] HNSW search wrapper
    [ ] Graph expansion
    [ ] Writeback queue
[ ] FastGRNNRouter
    [ ] Cell implementation
    [ ] Sparse matrix operations
    [ ] Low-rank matrices
    [ ] Output heads
    [ ] Training loop
[ ] GraphAttentionEngine
    [ ] Attention layer wrapper
    [ ] Edge feature encoding
    [ ] Multi-head aggregation
    [ ] Context ranking
[ ] InferencePool
    [ ] Model loading
    [ ] Lazy initialization
    [ ] KV cache management
    [ ] LRU eviction
[ ] LearningService
    [ ] Quality judge
    [ ] Replay buffer
    [ ] EWC integration
    [ ] Background training
    [ ] Compression jobs

Phase 3: Orchestration
━━━━━━━━━━━━━━━━━━━━━━
[ ] Orchestrator
    [ ] Request routing
    [ ] Session management
    [ ] Pipeline coordination
    [ ] Metrics collection
    [ ] Error handling
Phase 4: Integration
━━━━━━━━━━━━━━━━━━━━
[ ] Integration tests
[ ] Benchmark suite
[ ] Example applications
[ ] Documentation
```
### 2.2 Test Coverage Requirements
| Component | Unit Tests | Integration | Benchmark |
|-----------|------------|-------------|-----------|
| Embedding | 15+ | 3+ | 2 |
| Memory | 20+ | 5+ | 3 |
| Router | 25+ | 5+ | 2 |
| Attention | 15+ | 3+ | 2 |
| Inference | 10+ | 3+ | 2 |
| Learning | 20+ | 5+ | 1 |
| Orchestrator | 10+ | 5+ | 2 |
| **Total** | **115+** | **29+** | **14** |
---
## 3. Deployment Configurations
### 3.1 Edge Deployment (Raspberry Pi / Mobile)
```toml
# config/edge.toml
[system]
device_class = "edge"
max_memory_mb = 2048
max_concurrent_requests = 2
[embedding]
model = "onnx" # ONNX for portability
dimension = 384
batch_size = 1
[memory]
hnsw_m = 16
hnsw_ef_construction = 100
hnsw_ef_search = 32
max_nodes = 100_000
[router]
hidden_dim = 32
sparsity = 0.95
confidence_threshold = 0.6
[inference]
models = ["350m"]
quantization = "q4_k"
max_context = 1024
max_loaded_models = 1
[learning]
enabled = true
quality_threshold = 0.8
replay_capacity = 1000
training_interval_ms = 300_000 # 5 minutes
```
### 3.2 Server Deployment (CPU)
```toml
# config/server-cpu.toml
[system]
device_class = "server"
max_memory_mb = 16384
max_concurrent_requests = 20
[embedding]
model = "lfm2-encoder"
dimension = 768
batch_size = 8
[memory]
hnsw_m = 32
hnsw_ef_construction = 200
hnsw_ef_search = 64
max_nodes = 10_000_000
[router]
hidden_dim = 64
sparsity = 0.9
confidence_threshold = 0.7
[inference]
models = ["700m", "1.2b", "2.6b"]
quantization = "q5_k"
max_context = 4096
max_loaded_models = 2
[learning]
enabled = true
quality_threshold = 0.75
replay_capacity = 100_000
training_interval_ms = 60_000 # 1 minute
```
### 3.3 Server Deployment (GPU)
```toml
# config/server-gpu.toml
[system]
device_class = "gpu"
max_memory_mb = 32768
max_concurrent_requests = 100
[embedding]
model = "lfm2-encoder"
dimension = 1024
batch_size = 32
[memory]
hnsw_m = 48
hnsw_ef_construction = 300
hnsw_ef_search = 128
max_nodes = 100_000_000
[router]
hidden_dim = 64
sparsity = 0.85
confidence_threshold = 0.75
[inference]
models = ["1.2b", "2.6b"]
quantization = "fp16"
max_context = 8192
max_loaded_models = 2
use_vllm = true
tensor_parallel = 1
[learning]
enabled = true
quality_threshold = 0.7
replay_capacity = 1_000_000
training_interval_ms = 30_000 # 30 seconds
```
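A deployment typically selects one of the three profiles above from the detected device class. A hypothetical helper (the mapping and paths mirror the config files shown, but this function is illustrative, not part of the ruvllm API):

```rust
/// Map a detected device class to its shipped config file.
pub fn config_path(device_class: &str) -> Option<&'static str> {
    match device_class {
        "edge" => Some("config/edge.toml"),
        "server" => Some("config/server-cpu.toml"),
        "gpu" => Some("config/server-gpu.toml"),
        _ => None, // unknown class: caller must supply a config explicitly
    }
}
```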
---
## 4. Operational Runbook
### 4.1 Startup Sequence
```bash
#!/bin/bash
# scripts/start.sh
set -e
CONFIG=${1:-"config/server-cpu.toml"}
LOG_LEVEL=${LOG_LEVEL:-"info"}
echo "Starting RuvLLM with config: $CONFIG"
# 1. Validate configuration
cargo run --release --bin ruvllm-validate -- --config "$CONFIG"
# 2. Initialize database if needed
if [ ! -f "data/memory.db" ]; then
echo "Initializing database..."
cargo run --release --bin ruvllm-init -- --config "$CONFIG"
fi
# 3. Download models if needed
cargo run --release --bin ruvllm-models -- --config "$CONFIG" --check-or-download
# 4. Start server
RUST_LOG=$LOG_LEVEL cargo run --release --bin ruvllm-server -- \
--config "$CONFIG" \
--metrics-port 9090 \
--http-port 8080
```
### 4.2 Health Checks
```rust
/// Health check endpoint implementation
pub struct HealthCheck {
    memory: Arc<RuvectorMemory>,
    router: Arc<FastGRNNRouter>,
    inference: Arc<InferencePool>,
}

impl HealthCheck {
    pub async fn check(&self) -> HealthStatus {
        let mut status = HealthStatus::default();
        // Check memory service
        status.memory = match self.memory.ping().await {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency, details: None },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        // Check router
        status.router = match self.router.ping() {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency, details: None },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        // Check inference (at least one model loadable)
        status.inference = match self.inference.health_check().await {
            Ok(info) => ComponentHealth::Healthy {
                latency_ms: info.latency,
                // `details` is Option<serde_json::Value> so the variant
                // has the same shape for every component
                details: Some(json!({
                    "loaded_models": info.loaded_models,
                    "available_memory": info.available_memory,
                })),
            },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };
        status.overall = if status.all_healthy() {
            OverallHealth::Healthy
        } else if status.any_critical() {
            OverallHealth::Critical
        } else {
            OverallHealth::Degraded
        };
        status
    }
}
```
### 4.3 Monitoring Dashboards
```yaml
# Prometheus alerting rules
groups:
  - name: ruvllm
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.95, ruvllm_request_latency_seconds_bucket) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RuvLLM P95 latency above 1s"
      - alert: LowQualityScore
        expr: avg(ruvllm_quality_score) < 0.7
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average quality score dropped below 0.7"
      - alert: MemoryPressure
        expr: ruvllm_memory_usage_bytes / ruvllm_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 90%"
      - alert: RouterLowConfidence
        expr: avg(ruvllm_router_confidence) < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Router confidence consistently low"
      - alert: HighErrorRate
        expr: rate(ruvllm_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 10%"
```
### 4.4 Backup and Recovery
```bash
#!/bin/bash
# scripts/backup.sh
BACKUP_DIR="/backups/ruvllm/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
echo "Creating backup in $BACKUP_DIR"
# 1. Backup memory database
cp -r data/memory.db "$BACKUP_DIR/memory.db"
# 2. Backup router weights
cp -r data/router_weights.bin "$BACKUP_DIR/router_weights.bin"
# 3. Backup EWC state
cp -r data/ewc_state.bin "$BACKUP_DIR/ewc_state.bin"
# 4. Backup replay buffer
cp -r data/replay_buffer.bin "$BACKUP_DIR/replay_buffer.bin"
# 5. Backup configuration
cp -r config/ "$BACKUP_DIR/config/"
# 6. Create manifest
cat > "$BACKUP_DIR/manifest.json" << EOF
{
"timestamp": "$(date -Iseconds)",
"version": "$(cargo run --release --bin ruvllm-version)",
"components": {
"memory_db": "memory.db",
"router_weights": "router_weights.bin",
"ewc_state": "ewc_state.bin",
"replay_buffer": "replay_buffer.bin",
"config": "config/"
}
}
EOF
echo "Backup complete: $BACKUP_DIR"
# 7. Upload to S3 if configured
if [ -n "$S3_BACKUP_BUCKET" ]; then
aws s3 sync "$BACKUP_DIR" "s3://$S3_BACKUP_BUCKET/$(basename $BACKUP_DIR)/"
echo "Uploaded to S3: $S3_BACKUP_BUCKET"
fi
```
---
## 5. Production Checklist
### 5.1 Pre-Launch
```
Security
━━━━━━━━
[ ] Input validation and sanitization
[ ] Rate limiting configured
[ ] TLS/HTTPS enabled
[ ] API authentication (if public)
[ ] Secrets in environment variables
[ ] Model integrity verification
Performance
━━━━━━━━━━━
[ ] Load tested to expected traffic
[ ] Memory profiled (no leaks)
[ ] Latency targets met
[ ] Caching configured
[ ] Connection pooling
Reliability
━━━━━━━━━━━
[ ] Health checks implemented
[ ] Graceful shutdown
[ ] Automatic restarts (systemd/k8s)
[ ] Backup procedures tested
[ ] Recovery procedures documented
Observability
━━━━━━━━━━━━━
[ ] Structured logging
[ ] Metrics exported
[ ] Distributed tracing
[ ] Alerting rules configured
[ ] Dashboards created
```
### 5.2 Post-Launch
```
Daily
━━━━━
[ ] Check error rates
[ ] Review quality scores
[ ] Monitor latency trends
[ ] Verify backup success
Weekly
━━━━━━
[ ] Review router decisions distribution
[ ] Analyze forgetting metrics
[ ] Check memory growth rate
[ ] Run compression job
[ ] Update router weights
Monthly
━━━━━━━
[ ] Full system backup
[ ] Performance benchmark
[ ] Security audit
[ ] Dependency updates
[ ] Evaluate student model candidates
```
---
## 6. API Reference
### 6.1 HTTP API
```yaml
openapi: "3.0.0"
info:
  title: RuvLLM API
  version: "0.1.0"
  description: Self-learning LLM with LFM2 and Ruvector
paths:
  /v1/query:
    post:
      summary: Process a query
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - query
              properties:
                query:
                  type: string
                  description: The user query
                session_id:
                  type: string
                  description: Optional session for multi-turn
                constraints:
                  type: object
                  properties:
                    max_latency_ms:
                      type: integer
                    max_tokens:
                      type: integer
                    temperature:
                      type: number
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  text:
                    type: string
                  confidence:
                    type: number
                  sources:
                    type: array
                    items:
                      type: object
                  routing_info:
                    type: object
  /v1/feedback:
    post:
      summary: Provide feedback on a response
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - request_id
              properties:
                request_id:
                  type: string
                rating:
                  type: integer
                  minimum: 1
                  maximum: 5
                correction:
                  type: string
      responses:
        "200":
          description: Feedback recorded
  /v1/health:
    get:
      summary: Health check
      responses:
        "200":
          description: System healthy
        "503":
          description: System unhealthy
  /v1/metrics:
    get:
      summary: Prometheus metrics
      responses:
        "200":
          description: Metrics in Prometheus format
```
### 6.2 Rust SDK
```rust
use ruvllm::{RuvLLM, Config, Request, Response};

/// Simple query
async fn simple_query(llm: &RuvLLM) -> Result<Response> {
    llm.query("What is Rust?").await
}

/// Query with options
async fn query_with_options(llm: &RuvLLM) -> Result<Response> {
    llm.query_with(Request {
        query: "Explain backpropagation".into(),
        session_id: Some("user-123".into()),
        constraints: Constraints {
            max_latency_ms: Some(500),
            max_tokens: Some(500),
            temperature: Some(0.7),
            ..Default::default()
        },
    }).await
}

/// Multi-turn conversation
async fn conversation(llm: &RuvLLM) -> Result<()> {
    let session = llm.new_session();
    let r1 = llm.query_session(&session, "What is a neural network?").await?;
    println!("Turn 1: {}", r1.text);
    let r2 = llm.query_session(&session, "How do you train one?").await?;
    println!("Turn 2: {}", r2.text);
    let r3 = llm.query_session(&session, "What about overfitting?").await?;
    println!("Turn 3: {}", r3.text);
    Ok(())
}

/// Provide feedback
async fn with_feedback(llm: &RuvLLM) -> Result<()> {
    let response = llm.query("What is 2+2?").await?;
    llm.feedback(Feedback {
        request_id: response.request_id,
        rating: 5,
        correction: None,
    }).await?;
    Ok(())
}

/// Stream response
async fn streaming(llm: &RuvLLM) -> Result<()> {
    let mut stream = llm.query_stream("Tell me a story").await?;
    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
    }
    Ok(())
}
```
---
## 7. Future Roadmap
### 7.1 Short-Term (1-3 months)
- [ ] LFM2-VL integration (vision-language)
- [ ] Multi-GPU inference with tensor parallelism
- [ ] Retrieval-augmented fine-tuning pipeline
- [ ] Improved compression algorithms
- [ ] WebAssembly deployment target
### 7.2 Medium-Term (3-6 months)
- [ ] Federated learning across edge nodes
- [ ] LFM2-Audio integration (speech)
- [ ] Custom domain fine-tuning toolkit
- [ ] Advanced curriculum learning
- [ ] Hyperbolic embeddings for hierarchies
### 7.3 Long-Term (6-12 months)
- [ ] Multi-agent collaboration
- [ ] Neuro-symbolic reasoning integration
- [ ] Continuous pre-training pipeline
- [ ] Hardware-specific optimizations (NPU, TPU)
- [ ] Enterprise multi-tenancy
---
## 8. Success Criteria
### 8.1 Technical Metrics
| Metric | Target | Current |
|--------|--------|---------|
| Latency P50 | <500ms | - |
| Latency P99 | <2s | - |
| Quality Score | >0.8 | - |
| Router Accuracy | >90% | - |
| Memory Efficiency | <4GB (edge) | - |
| Throughput | 20 QPS (edge) | - |
| Forgetting Rate | <5%/10K | - |
| Test Coverage | >80% | - |
### 8.2 Business Metrics
| Metric | Target | Notes |
|--------|--------|-------|
| User Satisfaction | >4.0/5.0 | Survey scores |
| Response Relevance | >85% | Human eval |
| Knowledge Retention | >90% | Multi-turn coherence |
| Cost Reduction | >50% | vs. always-big baseline |
---
## 9. Conclusion
RuvLLM represents a paradigm shift from static LLMs to adaptive, self-learning systems. By treating:
- **LFM2 as the stable cortex** (reasoning)
- **Ruvector as the living synaptic mesh** (memory)
- **FastGRNN as the control circuit** (routing)
We create intelligence that emerges from the loop, not just the model.
The three learning loops—memory growth, router optimization, and concept compression—enable continuous adaptation without the risks of in-place weight modification.
**The intelligence is not in one model anymore. It is in the loop.**
---
*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*