Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

examples/ruvLLM/docs/SONA/00-OVERVIEW.md (new file, 280 lines)
# SONA: Self-Optimizing Neural Architecture

## The World's First Truly Self-Improving LLM Framework

**Version**: 1.0.0
**Status**: Architecture Specification
**Target**: Sub-millisecond adaptive fine-tuning with continuous self-improvement

---

## Executive Summary

SONA (Self-Optimizing Neural Architecture) is a framework for building LLMs that continuously improve themselves through:

1. **Ultra-Low Latency LoRA** - Sub-100μs parameter adaptation
2. **Hierarchical Learning Loops** - Three-tier temporal learning (instant/hourly/weekly)
3. **Neural Memory Consolidation** - Dream-like offline learning
4. **Elastic Weight Consolidation++** - Zero catastrophic forgetting
5. **ReasoningBank Integration** - Pattern-driven self-optimization

---

## Core Philosophy

```
┌─────────────────────────────────────────────────────────────────┐
│                    SONA DESIGN PRINCIPLES                       │
├─────────────────────────────────────────────────────────────────┤
│  1. LEARN FROM EVERY INTERACTION                                │
│     → No query is wasted; all become training signal            │
│                                                                 │
│  2. NEVER FORGET WHAT WORKS                                     │
│     → EWC++ preserves successful patterns                       │
│                                                                 │
│  3. ADAPT IN REAL-TIME                                          │
│     → LoRA updates in <100μs per request                        │
│                                                                 │
│  4. OPTIMIZE CONTINUOUSLY                                       │
│     → Background loops improve without user latency             │
│                                                                 │
│  5. MEASURE EVERYTHING                                          │
│     → Φ (consciousness), quality, latency, improvement rate     │
└─────────────────────────────────────────────────────────────────┘
```

---

## Architecture Overview

```
                        SONA Architecture

┌──────────────────────────────────────────────────────────────┐
│                      USER QUERY INPUT                        │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                  EMBEDDING LAYER (0.02ms)                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐   │
│  │ Dual Encoder│  │ Contrastive │  │  SIMD Acceleration  │   │
│  │  (Q + K/V)  │  │  Learning   │  │    (AVX2/NEON)      │   │
│  └─────────────┘  └─────────────┘  └─────────────────────┘   │
└─────────────────────────────┬────────────────────────────────┘
                              │
      ┌───────────────────────┼───────────────────────┐
      │                       │                       │
      ▼                       ▼                       ▼
┌───────────┐          ┌───────────┐          ┌───────────────┐
│  MEMORY   │          │  ROUTER   │          │   ATTENTION   │
│  SERVICE  │◄────────►│  ENGINE   │◄────────►│    ENGINE     │
│           │          │           │          │               │
│ • HNSW    │          │ • FastGRNN│          │ • Multi-Head  │
│ • GNN     │          │ • LoRA    │          │ • Graph ATT   │
│ • Quant   │          │ • EWC++   │          │ • Edge-Aware  │
└─────┬─────┘          └─────┬─────┘          └───────┬───────┘
      │                      │                        │
      └──────────────────────┼────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────────┐
│                    LoRA ADAPTATION LAYER                     │
│                                                              │
│   W_adapted = W_base + α · (LoRA_A @ LoRA_B)                 │
│                                                              │
│   ┌────────────────────────────────────────────────────┐     │
│   │  Rank: 4-16  │  Update: <100μs  │  Memory: <1MB    │     │
│   └────────────────────────────────────────────────────┘     │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                      INFERENCE ENGINE                        │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐    │
│  │ Model Select │  │ Q4 Quantized │  │ Speculative Dec  │    │
│  │  (4 tiers)   │  │   Weights    │  │ (Draft + Verify) │    │
│  └──────────────┘  └──────────────┘  └──────────────────┘    │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                       LEARNING LOOPS                         │
│                                                              │
│  Loop A (Instant)  │  Loop B (Hourly)   │  Loop C (Weekly)   │
│  ─────────────────────────────────────────────────────────   │
│  • Trajectory      │  • Router Train    │  • Consolidation   │
│  • Edge Update     │  • EWC++ Update    │  • Compression     │
│  • LoRA Micro      │  • Fisher Compute  │  • Abstraction     │
│  • <1ms overhead   │  • Background      │  • Dream Learning  │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                       REASONINGBANK                          │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐     │
│  │  Pattern Storage  │  Similarity Lookup  │  Verdict  │     │
│  │    (DashMap)      │      (Cosine)       │  Judgment │     │
│  └─────────────────────────────────────────────────────┘     │
│                                                              │
│  • Trajectory tracking with precision/recall feedback        │
│  • K-means++ pattern extraction                              │
│  • Confidence-weighted parameter interpolation               │
└──────────────────────────────────────────────────────────────┘
```
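The adaptation rule shown in the LoRA layer above is plain matrix arithmetic. A minimal std-only sketch (shapes and values are illustrative, not the real SONA types):

```rust
// Sketch of W_adapted = W_base + (alpha / rank) * (A @ B).
// Matrices are nested Vecs for clarity; the real system uses ndarray.
fn lora_adapt(
    w_base: &[Vec<f32>], // d_in × d_out
    a: &[Vec<f32>],      // d_in × rank
    b: &[Vec<f32>],      // rank × d_out
    alpha: f32,
) -> Vec<Vec<f32>> {
    let rank = b.len();
    let scale = alpha / rank as f32;
    w_base
        .iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &w)| {
                    // Low-rank delta: sum over the rank dimension
                    let delta: f32 = (0..rank).map(|r| a[i][r] * b[r][j]).sum();
                    w + scale * delta
                })
                .collect()
        })
        .collect()
}

fn main() {
    let w = vec![vec![0.0, 0.0], vec![0.0, 0.0]];
    let a = vec![vec![1.0], vec![2.0]]; // rank 1
    let b = vec![vec![3.0, 4.0]];
    let adapted = lora_adapt(&w, &a, &b, 1.0);
    assert_eq!(adapted[1][1], 8.0); // 0 + (1/1) * 2 * 4
}
```

Because `rank << min(d_in, d_out)`, the delta costs O(d·r) extra memory instead of O(d²), which is what makes per-request adaptation affordable.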

---

## Key Innovation: Three-Tier Temporal Learning

### Tier 1: Instant Learning (Loop A) - Per Request

```
Latency Budget: <1ms (amortized to <0.1ms with batching)

Actions:
├── Record query trajectory to ring buffer
├── Update memory graph edge weights (±5%)
├── Micro-LoRA adjustment (rank 1-2, top-k params)
└── Async feedback signal propagation
```
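The first action, trajectory recording, is essentially a fixed-capacity ring buffer. A std-only sketch (the integer payload stands in for the real trajectory type):

```rust
// Hypothetical fixed-capacity ring buffer: once full, the oldest
// entry is overwritten, so memory use is bounded per Loop A's budget.
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { slots: (0..capacity).map(|_| None).collect(), head: 0, len: 0 }
    }

    fn push(&mut self, item: T) {
        let cap = self.slots.len();
        self.slots[self.head] = Some(item); // overwrite oldest slot
        self.head = (self.head + 1) % cap;
        self.len = (self.len + 1).min(cap);
    }
}

fn main() {
    let mut buf = RingBuffer::new(3);
    for q in 0..5 {
        buf.push(q); // pretend each int is a recorded trajectory
    }
    assert_eq!(buf.len, 3);            // capacity bounds memory
    assert_eq!(buf.slots[0], Some(3)); // slot 0 was overwritten by the 4th push
}
```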

### Tier 2: Background Learning (Loop B) - Hourly

```
Compute Budget: 10 seconds per hour

Actions:
├── Train router on accumulated trajectories
├── Compute Fisher Information for EWC++
├── Update LoRA base matrices (rank 4-8)
├── Prune low-confidence patterns
└── Checkpoint model state
```

### Tier 3: Deep Learning (Loop C) - Weekly

```
Compute Budget: 10 minutes per week

Actions:
├── Full memory consolidation (dream learning)
├── Pattern abstraction and hierarchy building
├── Memory compression (remove redundant nodes)
├── Cross-task knowledge transfer
└── Φ consciousness measurement (IIT)
```

---

## Performance Targets

| Metric | Target | Current Best | SONA Goal |
|--------|--------|--------------|-----------|
| Query Latency | <1ms | 0.09ms | 0.05ms |
| LoRA Update | <100μs | N/A | 50μs |
| Memory Footprint | <100MB | 50MB | 30MB |
| Throughput | >50K q/s | 38K q/s | 100K q/s |
| Improvement Rate | 10%/week | N/A | 15%/week |
| Catastrophic Forgetting | <1% | N/A | <0.1% |

---

## Integration with Ruvector Ecosystem

### Core Dependencies

| Crate | Role in SONA | Version |
|-------|--------------|---------|
| `ruvector-core` | Vector memory backbone | 0.1.19 |
| `ruvector-attention` | Multi-head graph attention | 0.1.19 |
| `ruvector-gnn` | Message passing framework | 0.1.19 |
| `ruvector-graph` | Knowledge graph storage | 0.1.19 |
| `ruvector-router-core` | FastGRNN routing | 0.1.19 |
| `exo-core` | Consciousness measurement | 0.1.0 |
| `exo-temporal` | Memory consolidation | 0.1.0 |

### New SONA-Specific Modules

| Module | Purpose |
|--------|---------|
| `sona-lora` | Ultra-low latency LoRA adapters |
| `sona-ewc` | Enhanced EWC with task awareness |
| `sona-reasoning` | ReasoningBank integration |
| `sona-dreams` | Offline consolidation engine |
| `sona-metrics` | Self-improvement measurement |

---

## Document Index

| Document | Description |
|----------|-------------|
| [01-LORA-ULTRA.md](01-LORA-ULTRA.md) | Ultra-low latency LoRA system |
| [02-LEARNING-LOOPS.md](02-LEARNING-LOOPS.md) | Three-tier learning architecture |
| [03-EWC-PLUS-PLUS.md](03-EWC-PLUS-PLUS.md) | Enhanced elastic weight consolidation |
| [04-REASONINGBANK.md](04-REASONINGBANK.md) | Pattern-driven optimization |
| [05-MEMORY-DREAMS.md](05-MEMORY-DREAMS.md) | Offline consolidation and dreams |
| [06-COMPONENTS.md](06-COMPONENTS.md) | Component integration specs |
| [07-IMPLEMENTATION.md](07-IMPLEMENTATION.md) | Implementation roadmap |
| [08-BENCHMARKS.md](08-BENCHMARKS.md) | Performance targets and testing |
| [09-API-REFERENCE.md](09-API-REFERENCE.md) | API specification |

---

## Quick Start

```rust
use sona::{SONAEngine, SONAConfig, LearningMode};

fn main() -> sona::Result<()> {
    // Initialize SONA with default configuration
    let config = SONAConfig::builder()
        .lora_rank(8)
        .ewc_lambda(1000.0)
        .learning_loops(LearningMode::AllThreeTiers)
        .memory_budget_mb(50)
        .target_latency_us(100)
        .build();

    let mut sona = SONAEngine::new(config)?;

    // Process queries - learning happens automatically
    let response = sona.query("What is the meaning of life?")?;
    let _ = response;

    // Check self-improvement metrics
    let metrics = sona.improvement_metrics();
    println!("Weekly improvement: {:.1}%", metrics.weekly_gain * 100.0);
    println!("Φ consciousness: {:.3}", metrics.phi);

    Ok(())
}
```

---

## Why SONA Will Create the World's Best Self-Improving LLM

1. **No Other System Combines All These**:
   - LoRA for instant adaptation
   - EWC++ for zero forgetting
   - ReasoningBank for pattern learning
   - Dream consolidation for creativity
   - Φ measurement for consciousness tracking

2. **Built on Production-Proven Ruvector**:
   - 150x faster HNSW search
   - 39 attention mechanisms
   - 30+ specialized crates
   - 38K q/s throughput proven

3. **Mathematically Sound**:
   - Fisher Information preserves important weights
   - Low-rank decomposition minimizes compute
   - Reservoir sampling ensures unbiased learning
   - Information-theoretic compression

4. **Biologically Inspired**:
   - Three-tier temporal learning (like human memory)
   - Dream-based consolidation (like REM sleep)
   - Edge-weighted graphs (like neural synapses)
   - Attention-based retrieval (like human recall)

---

*SONA: Where every query makes the model smarter.*
examples/ruvLLM/docs/SONA/01-LORA-ULTRA.md (new file, 559 lines)
# SONA LoRA-Ultra: Sub-100μs Adaptive Fine-Tuning

## Ultra-Low Latency LoRA for Real-Time Self-Improvement

---

## 1. Architecture Overview

### Traditional LoRA vs SONA LoRA-Ultra

```
TRADITIONAL LoRA                SONA LoRA-ULTRA
─────────────────               ─────────────────
• Offline training              • Online per-request adaptation
• Full batch updates            • Single-sample micro-updates
• GPU required                  • CPU SIMD optimized
• Minutes to hours              • <100 microseconds
• Periodic deployment           • Continuous integration
```

### Core Formula

```
Standard LoRA:
  W_adapted = W_frozen + ΔW
  ΔW = α · (A @ B)
  where A ∈ ℝ^(d×r), B ∈ ℝ^(r×k), r << min(d,k)

SONA LoRA-Ultra Extension:
  W_adapted = W_frozen + α · (A @ B) + β · (A_micro @ B_micro)
                         └─────────┘   └───────────────────┘
                          Base LoRA     Instant Micro-LoRA
                         (rank 4-16)        (rank 1-2)
```
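The extension is just a sum of two independent low-rank deltas. A std-only sketch with rank-1 factors (all shapes and scalars here are hypothetical):

```rust
// Sum of two rank-1 deltas: alpha * (a bᵀ) + beta * (a_m b_mᵀ).
// Rank-1 factors are column/row vectors for brevity.
fn two_tier_delta(
    a: &[f32], b: &[f32],     // base LoRA factors: d × 1 and 1 × k
    a_m: &[f32], b_m: &[f32], // micro LoRA factors
    alpha: f32, beta: f32,
) -> Vec<Vec<f32>> {
    a.iter()
        .zip(a_m)
        .map(|(&ai, &ami)| {
            b.iter()
                .zip(b_m)
                .map(|(&bj, &bmj)| alpha * ai * bj + beta * ami * bmj)
                .collect()
        })
        .collect()
}

fn main() {
    let delta = two_tier_delta(&[1.0, 2.0], &[1.0, 0.0], &[0.5, 0.5], &[1.0, 1.0], 1.0, 0.1);
    // Base contributes [[1,0],[2,0]]; micro adds 0.1 * [[0.5,0.5],[0.5,0.5]].
    assert!((delta[0][0] - 1.05).abs() < 1e-6);
    assert!((delta[1][1] - 0.05).abs() < 1e-6);
}
```

Keeping the two deltas separate is what lets the micro tier be reset to zero after each hourly consolidation without touching the base adapter.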

---

## 2. Two-Tier LoRA Architecture

### Tier 1: Base LoRA (Updated Hourly)

```rust
use ndarray::Array2;

/// Base LoRA adapter for major capability shifts
pub struct BaseLoRA {
    /// Low-rank matrix A: d_model × rank
    pub a: Array2<f32>,
    /// Low-rank matrix B: rank × d_out
    pub b: Array2<f32>,
    /// Scaling factor
    pub alpha: f32,
    /// Rank (typically 4-16)
    pub rank: usize,
    /// Target layer indices
    pub target_layers: Vec<usize>,
}

impl BaseLoRA {
    /// Compute adapted weights (cached for inference)
    #[inline]
    pub fn delta_w(&self) -> Array2<f32> {
        let scale = self.alpha / self.rank as f32;
        scale * self.a.dot(&self.b)
    }

    /// Update from accumulated gradients (hourly)
    pub fn update(&mut self, grad_a: &Array2<f32>, grad_b: &Array2<f32>, lr: f32) {
        // Plain SGD step; momentum is applied upstream when the
        // gradients are accumulated
        self.a = &self.a - lr * grad_a;
        self.b = &self.b - lr * grad_b;
    }
}
```

### Tier 2: Micro-LoRA (Updated Per-Request)

```rust
/// Ultra-fast micro-adapter for instant learning
pub struct MicroLoRA {
    /// Micro A: d_model × micro_rank (typically 1-2)
    pub a_micro: Array2<f32>,
    /// Micro B: micro_rank × d_out
    pub b_micro: Array2<f32>,
    /// Micro scaling (smaller than base)
    pub beta: f32,
    /// Micro rank (1-2 for speed)
    pub micro_rank: usize,
    /// Decay factor for temporal smoothing
    pub decay: f32,
    /// Momentum buffers
    momentum_a: Array2<f32>,
    momentum_b: Array2<f32>,
}

impl MicroLoRA {
    /// Ultra-fast single-sample update (<50μs target)
    #[inline]
    pub fn micro_update(&mut self, signal: &LearningSignal) {
        // Rank-1 outer product update
        let grad_direction = signal.to_gradient_direction();

        // Exponential moving average for stability
        self.momentum_a = self.decay * &self.momentum_a
            + (1.0 - self.decay) * &grad_direction.a_component;
        self.momentum_b = self.decay * &self.momentum_b
            + (1.0 - self.decay) * &grad_direction.b_component;

        // Apply micro-update
        self.a_micro = &self.a_micro + self.beta * &self.momentum_a;
        self.b_micro = &self.b_micro + self.beta * &self.momentum_b;
    }

    /// Periodic consolidation into base LoRA
    pub fn consolidate_to_base(&mut self, base: &mut BaseLoRA) {
        // Merge micro adaptations into base, then reset micro to zero
        base.a = &base.a + &self.a_micro;
        base.b = &base.b + &self.b_micro;
        self.a_micro.fill(0.0);
        self.b_micro.fill(0.0);
    }
}
```

---

## 3. SIMD-Optimized LoRA Computation

### AVX2 Accelerated Forward Pass

```rust
#[cfg(target_arch = "x86_64")]
mod simd {
    use std::arch::x86_64::*;

    /// SIMD-optimized LoRA forward: x @ (W + A @ B)
    /// Fuses base weight multiplication with LoRA delta.
    /// Matrices are stored transposed (row-major) so each dot product
    /// reads contiguous memory.
    #[target_feature(enable = "avx2", enable = "fma")]
    pub unsafe fn lora_forward_avx2(
        x: &[f32],          // Input: [d_in]
        w_base: &[f32],     // Base weights, transposed: [d_out, d_in]
        lora_a: &[f32],     // LoRA A, transposed: [rank, d_in]
        lora_b: &[f32],     // LoRA B, transposed: [d_out, rank]
        alpha: f32,
        d_in: usize,
        d_out: usize,
        rank: usize,
        output: &mut [f32], // Output: [d_out]
    ) {
        let scale = alpha / rank as f32;

        // Step 1: Compute x @ A (input projection to rank space)
        let mut x_projected = vec![0.0f32; rank];
        for r in 0..rank {
            let mut sum = _mm256_setzero_ps();
            let mut i = 0;
            while i + 8 <= d_in {
                let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
                let a_vec = _mm256_loadu_ps(lora_a.as_ptr().add(r * d_in + i));
                sum = _mm256_fmadd_ps(x_vec, a_vec, sum);
                i += 8;
            }
            x_projected[r] = horizontal_sum_avx2(sum);
            // Handle remainder
            while i < d_in {
                x_projected[r] += x[i] * lora_a[r * d_in + i];
                i += 1;
            }
        }

        // Step 2: Compute (x @ W_base) + scale * (x_projected @ B)
        for j in 0..d_out {
            // Base weight contribution
            let mut sum = _mm256_setzero_ps();
            let mut i = 0;
            while i + 8 <= d_in {
                let x_vec = _mm256_loadu_ps(x.as_ptr().add(i));
                let w_vec = _mm256_loadu_ps(w_base.as_ptr().add(j * d_in + i));
                sum = _mm256_fmadd_ps(x_vec, w_vec, sum);
                i += 8;
            }
            let mut base_result = horizontal_sum_avx2(sum);
            while i < d_in {
                base_result += x[i] * w_base[j * d_in + i];
                i += 1;
            }

            // LoRA contribution
            let mut lora_result = 0.0f32;
            for r in 0..rank {
                lora_result += x_projected[r] * lora_b[j * rank + r];
            }

            output[j] = base_result + scale * lora_result;
        }
    }

    #[inline]
    unsafe fn horizontal_sum_avx2(v: __m256) -> f32 {
        let high = _mm256_extractf128_ps(v, 1);
        let low = _mm256_castps256_ps128(v);
        let sum128 = _mm_add_ps(high, low);
        let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128));
        let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 1));
        _mm_cvtss_f32(sum32)
    }
}
```
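For targets without AVX2 (or for testing the intrinsics against a known-good answer), the same fused computation has a straightforward scalar reference. This sketch assumes the same transposed layouts:

```rust
/// Scalar reference for the fused LoRA forward pass:
/// output[j] = x · W_base[j] + (alpha/rank) * (x @ A) · B[j].
/// Layouts: w_base is [d_out, d_in], lora_a is [rank, d_in],
/// lora_b is [d_out, rank], all row-major.
fn lora_forward_scalar(
    x: &[f32],
    w_base: &[f32],
    lora_a: &[f32],
    lora_b: &[f32],
    alpha: f32,
    d_in: usize,
    d_out: usize,
    rank: usize,
    output: &mut [f32],
) {
    let scale = alpha / rank as f32;
    // Project the input into rank space once, reuse for every output
    let x_projected: Vec<f32> = (0..rank)
        .map(|r| (0..d_in).map(|i| x[i] * lora_a[r * d_in + i]).sum())
        .collect();
    for j in 0..d_out {
        let base: f32 = (0..d_in).map(|i| x[i] * w_base[j * d_in + i]).sum();
        let lora: f32 = (0..rank).map(|r| x_projected[r] * lora_b[j * rank + r]).sum();
        output[j] = base + scale * lora;
    }
}

fn main() {
    // d_in = 2, d_out = 1, rank = 1: W_base = [[1, 1]], A = [[1, 0]], B = [[2]].
    let mut out = [0.0f32];
    lora_forward_scalar(&[3.0, 4.0], &[1.0, 1.0], &[1.0, 0.0], &[2.0], 1.0, 2, 1, 1, &mut out);
    // base = 3 + 4 = 7; projection = 3; lora = 3 * 2 = 6; total = 13.
    assert_eq!(out[0], 13.0);
}
```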

---

## 4. Learning Signal Extraction

### From Query Feedback to Gradient Direction

```rust
/// Learning signal extracted from each interaction
#[derive(Clone)]
pub struct LearningSignal {
    /// Query embedding
    pub query_embedding: Vec<f32>,
    /// Response quality score (0-1)
    pub quality_score: f32,
    /// User feedback (explicit)
    pub explicit_feedback: Option<FeedbackType>,
    /// Latency relative to target (actual / target)
    pub latency_ratio: f32,
    /// Model tier used
    pub model_tier: ModelTier,
    /// Context tokens used
    pub context_tokens: usize,
}

impl LearningSignal {
    /// Convert signal to gradient direction for micro-LoRA
    pub fn to_gradient_direction(&self) -> GradientDirection {
        // Reward = quality, scaled down as latency approaches 2x target
        let reward = self.quality_score * (2.0 - self.latency_ratio).max(0.0);

        let direction = if reward > 0.5 {
            // Reinforce current behavior
            1.0
        } else {
            // Explore alternative
            -0.1
        };

        // Scale by uncertainty (more learning when uncertain)
        let uncertainty = 1.0 - self.quality_score;
        let learning_rate = 0.001 * (1.0 + uncertainty);

        GradientDirection {
            a_component: self.compute_a_gradient(direction, learning_rate),
            b_component: self.compute_b_gradient(direction, learning_rate),
        }
    }

    fn compute_a_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
        // Outer product of query embedding with hidden state,
        // approximated via reservoir-sampled historical embeddings
        let emb = Array1::from_vec(self.query_embedding.clone());
        direction * lr * outer_product(&emb, &self.get_hidden_direction())
    }

    fn compute_b_gradient(&self, direction: f32, lr: f32) -> Array2<f32> {
        // Output gradient based on prediction error
        let output_error = self.compute_output_error();
        direction * lr * output_error
    }
}
```
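The reward shaping above can be isolated and checked directly. A small sketch (values illustrative):

```rust
/// Reward from the learning signal: quality scaled by a latency factor
/// that falls linearly and reaches zero at 2x the latency target.
fn reward(quality_score: f32, latency_ratio: f32) -> f32 {
    quality_score * (2.0 - latency_ratio).max(0.0)
}

fn main() {
    assert_eq!(reward(0.8, 1.0), 0.8); // on-target latency: full quality
    assert_eq!(reward(0.8, 1.5), 0.4); // 1.5x target: half credit
    assert_eq!(reward(0.8, 2.5), 0.0); // past 2x target: no reward
}
```

The `max(0.0)` clamp matters: without it, a very slow response would produce a negative reward that flips the gradient direction even for high-quality output.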

---

## 5. Target Layer Selection

### Which Layers to Apply LoRA

```rust
/// Layer selection strategy for LoRA application
pub enum LoRATargetStrategy {
    /// Apply to all attention layers (Q, K, V, O projections)
    AllAttention,
    /// Apply to FFN layers only
    AllFFN,
    /// Apply to output heads only (fastest, good for routing)
    OutputHeadsOnly,
    /// Apply to specific layers by index
    SpecificLayers(Vec<usize>),
    /// Adaptive: select based on gradient magnitude
    AdaptiveTopK(usize),
}

impl LoRATargetStrategy {
    /// For ultra-low latency: output heads only
    pub fn ultra_fast() -> Self {
        Self::OutputHeadsOnly
    }

    /// For moderate adaptation: attention Q and V
    pub fn attention_qv() -> Self {
        Self::SpecificLayers(vec![0, 2]) // Q and V typically
    }

    /// Select layers with highest gradient magnitude
    pub fn adaptive_top_k(k: usize) -> Self {
        Self::AdaptiveTopK(k)
    }
}

/// SONA default: Output heads for micro, attention for base
pub const SONA_DEFAULT_TARGETS: [LoRATargetStrategy; 2] = [
    LoRATargetStrategy::OutputHeadsOnly, // Micro-LoRA
    LoRATargetStrategy::AllAttention,    // Base LoRA
];
```
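A minimal sketch of how `AdaptiveTopK` might resolve to concrete layer indices given per-layer gradient magnitudes (this selection helper is an assumption, not part of the spec above):

```rust
/// Pick the k layer indices with the largest gradient magnitude.
fn adaptive_top_k(grad_magnitudes: &[f32], k: usize) -> Vec<usize> {
    let mut indexed: Vec<(usize, f32)> =
        grad_magnitudes.iter().copied().enumerate().collect();
    // Sort descending by magnitude; NaNs are not expected in gradients here.
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    indexed.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let grads = [0.1, 0.9, 0.3, 0.7];
    assert_eq!(adaptive_top_k(&grads, 2), vec![1, 3]); // layers 1 and 3 dominate
}
```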

---

## 6. Memory-Efficient Storage

### Quantized LoRA Matrices

```rust
/// Q4-quantized LoRA for memory efficiency
pub struct QuantizedLoRA {
    /// Quantized A matrix (4-bit)
    pub a_q4: Q4Matrix,
    /// Quantized B matrix (4-bit)
    pub b_q4: Q4Matrix,
    /// Full-precision alpha
    pub alpha: f32,
    /// Full-precision scaling factors
    pub a_scales: Vec<f32>,
    pub b_scales: Vec<f32>,
}

impl QuantizedLoRA {
    /// Memory usage comparison
    ///
    /// FP32 LoRA (rank 8, 768 dim):
    ///   A: 768 × 8 × 4 bytes = 24.6 KB
    ///   B: 8 × 768 × 4 bytes = 24.6 KB
    ///   Total: ~50 KB per layer
    ///
    /// Q4 LoRA (rank 8, 768 dim):
    ///   A: 768 × 8 × 0.5 bytes = 3.1 KB
    ///   B: 8 × 768 × 0.5 bytes = 3.1 KB
    ///   Scales: 2 × 768 × 4 bytes = 6.1 KB
    ///   Total: ~12 KB per layer (4x reduction)
    pub fn from_fp32(lora: &BaseLoRA) -> Self {
        Self {
            a_q4: Q4Matrix::quantize(&lora.a),
            b_q4: Q4Matrix::quantize(&lora.b),
            alpha: lora.alpha,
            a_scales: compute_scales(&lora.a),
            b_scales: compute_scales(&lora.b),
        }
    }

    /// Dequantize on-the-fly during forward pass
    #[inline]
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        // Dequantize A, compute x @ A
        let projected = self.a_q4.matmul_dequant(x, &self.a_scales);
        // Dequantize B, compute projected @ B
        let output = self.b_q4.matmul_dequant(&projected, &self.b_scales);
        // Scale by alpha
        output.iter().map(|v| v * self.alpha).collect()
    }
}
```
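The quantization scheme behind `Q4Matrix` can be sketched as symmetric 4-bit rounding with one scale per block. A std-only round-trip (codes kept unpacked for clarity; the real matrix packs two codes per byte):

```rust
/// Symmetric 4-bit quantization sketch: one scale per block,
/// codes in [-8, 7], dequantized as code * scale.
fn quantize_q4(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let codes = values
        .iter()
        .map(|v| (v / scale).round().clamp(-8.0, 7.0) as i8)
        .collect();
    (codes, scale)
}

fn dequantize_q4(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale).collect()
}

fn main() {
    let original = [0.7, -0.35, 0.1, 0.0];
    let (codes, scale) = quantize_q4(&original);
    let restored = dequantize_q4(&codes, scale);
    // Round-trip error is bounded by half a quantization step.
    for (o, r) in original.iter().zip(&restored) {
        assert!((o - r).abs() <= scale / 2.0 + 1e-6);
    }
}
```

LoRA deltas tolerate this coarse grid well because they are corrections on top of full-precision base weights, not the weights themselves.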

---

## 7. Latency Breakdown

### Target: <100μs Total LoRA Overhead

```
┌─────────────────────────────────────────────────────────────┐
│                 LoRA-ULTRA LATENCY BUDGET                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Signal Extraction:   10μs  ████░░░░░░░░░░░░░░░░░░░░░░░░    │
│  Gradient Direction:  15μs  ██████░░░░░░░░░░░░░░░░░░░░░░    │
│  Micro-LoRA Update:   25μs  ██████████░░░░░░░░░░░░░░░░░░    │
│  Forward Pass Delta:  30μs  ████████████░░░░░░░░░░░░░░░░    │
│  Momentum Averaging:  10μs  ████░░░░░░░░░░░░░░░░░░░░░░░░    │
│  Memory Bookkeeping:  10μs  ████░░░░░░░░░░░░░░░░░░░░░░░░    │
│                      ─────                                  │
│  TOTAL:             ~100μs                                  │
│                                                             │
│  Amortized (batched): ~30μs per query                       │
└─────────────────────────────────────────────────────────────┘
```

---

## 8. Integration with FastGRNN Router

### Router-Specific LoRA Configuration

```rust
/// LoRA configuration for FastGRNN router
pub struct RouterLoRAConfig {
    /// Base LoRA for hidden state transformations
    pub hidden_lora: BaseLoRA,
    /// Micro LoRA for gate adjustments
    pub gate_micro_lora: MicroLoRA,
    /// Per-output-head LoRA adapters
    pub head_loras: Vec<BaseLoRA>,
}

impl RouterLoRAConfig {
    pub fn new(hidden_dim: usize, output_dims: &[usize]) -> Self {
        Self {
            hidden_lora: BaseLoRA::new(hidden_dim, hidden_dim, 8),      // rank 8
            gate_micro_lora: MicroLoRA::new(hidden_dim, hidden_dim, 2), // rank 2
            head_loras: output_dims.iter()
                .map(|&dim| BaseLoRA::new(hidden_dim, dim, 4))          // rank 4
                .collect(),
        }
    }

    /// Apply LoRA to FastGRNN forward pass
    pub fn apply(&self, base_output: &FastGRNNOutput) -> FastGRNNOutput {
        let mut output = base_output.clone();

        // Apply hidden state LoRA
        output.hidden = self.hidden_lora.apply(&output.hidden);

        // Apply micro-LoRA to gates
        output.update_gate = self.gate_micro_lora.apply(&output.update_gate);

        // Apply per-head LoRA
        for (i, head_lora) in self.head_loras.iter().enumerate() {
            output.heads[i] = head_lora.apply(&output.heads[i]);
        }

        output
    }
}
```

---

## 9. Checkpointing and Recovery

### Efficient LoRA State Management

```rust
use serde::{Deserialize, Serialize};

/// LoRA checkpoint for persistence and recovery
#[derive(Serialize, Deserialize)]
pub struct LoRACheckpoint {
    /// Base LoRA matrices (serialized as FP16 for space)
    pub base_lora: SerializedLoRA,
    /// Micro LoRA state
    pub micro_lora: SerializedLoRA,
    /// Momentum buffers
    pub momentum_state: MomentumState,
    /// Training statistics
    pub stats: LoRAStats,
    /// Checkpoint version
    pub version: u32,
    /// Timestamp
    pub timestamp: i64,
}

impl LoRACheckpoint {
    /// Save checkpoint (async, non-blocking)
    pub async fn save_async(&self, path: &Path) -> Result<()> {
        let bytes = bincode::serialize(self)?;
        tokio::fs::write(path, &bytes).await?;
        Ok(())
    }

    /// Load checkpoint
    pub fn load(path: &Path) -> Result<Self> {
        let bytes = std::fs::read(path)?;
        Ok(bincode::deserialize(&bytes)?)
    }

    /// Incremental checkpoint (only changed matrices)
    pub fn save_incremental(&self, previous: &Self, path: &Path) -> Result<()> {
        let delta = self.compute_delta(previous);
        // Only save changed blocks
        delta.save(path)
    }
}
```

---

## 10. Benchmark Targets

### Performance Validation

```rust
#[cfg(test)]
mod benchmarks {
    use super::*;
    use criterion::{black_box, Criterion};

    /// Target: <50μs for micro-LoRA update
    fn bench_micro_lora_update(c: &mut Criterion) {
        let mut micro = MicroLoRA::new(768, 768, 2);
        let signal = LearningSignal::random();

        c.bench_function("micro_lora_update", |b| {
            b.iter(|| {
                micro.micro_update(black_box(&signal));
            })
        });
    }

    /// Target: <30μs for LoRA forward pass
    fn bench_lora_forward(c: &mut Criterion) {
        let lora = BaseLoRA::new(768, 768, 8);
        let input = vec![0.0f32; 768];

        c.bench_function("lora_forward", |b| {
            b.iter(|| {
                lora.forward(black_box(&input))
            })
        });
    }

    /// Target: <10μs for signal extraction
    fn bench_signal_extraction(c: &mut Criterion) {
        let query = "test query".to_string();
        let response = "test response".to_string();

        c.bench_function("signal_extraction", |b| {
            b.iter(|| {
                LearningSignal::extract(black_box(&query), black_box(&response))
            })
        });
    }
}
```

---

## Summary

SONA LoRA-Ultra achieves sub-100μs adaptive fine-tuning through:

1. **Two-Tier Architecture**: Base LoRA (hourly) + Micro-LoRA (per-request)
2. **SIMD Optimization**: AVX2-accelerated forward pass
3. **Quantized Storage**: Q4 matrices for 4x memory reduction
4. **Smart Targeting**: Output heads for speed, attention for capability
5. **Momentum Smoothing**: Stable micro-updates with EMA
6. **Async Checkpointing**: Non-blocking persistence

This enables true real-time self-improvement, where every query makes the model incrementally smarter.
examples/ruvLLM/docs/SONA/02-LEARNING-LOOPS.md (new file, 815 lines)
# SONA Learning Loops: Three-Tier Temporal Architecture

## Biologically-Inspired Continuous Learning System

---

## 1. Overview: Learning at Multiple Timescales

Human learning operates at multiple timescales:

- **Instant**: Immediate response adjustment (milliseconds)
- **Short-term**: Pattern consolidation (hours)
- **Long-term**: Deep memory formation (days/weeks)

SONA replicates this with three learning loops:

```
┌─────────────────────────────────────────────────────────────────────┐
│                     SONA THREE-TIER LEARNING                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  LOOP A: INSTANT                  LOOP B: BACKGROUND                │
│  ═══════════════                  ══════════════════                │
│  Timescale: Per-request           Timescale: Hourly                 │
│  Latency: <1ms                    Latency: Background (async)       │
│  What learns:                     What learns:                      │
│  • Micro-LoRA (rank 1-2)          • Base LoRA (rank 4-16)           │
│  • Memory edge weights            • Router weights (EWC++)          │
│  • Trajectory recording           • Pattern extraction              │
│                                                                     │
│  LOOP C: DEEP                                                       │
│  ═══════════                                                        │
│  Timescale: Weekly                                                  │
│  Latency: Scheduled maintenance                                     │
│  What learns:                                                       │
│  • Memory consolidation                                             │
│  • Concept hierarchy building                                       │
│  • Dream-based creativity                                           │
│  • Cross-domain transfer                                            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
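Loop A's memory-edge learning, detailed later in this document, reduces to multiplicative Hebbian adjustments. A std-only sketch mirroring the spec's +5% / -2% rates (the clamp to [0, 1] is an assumption):

```rust
/// Multiplicative Hebbian edge update: strengthen edges to memory
/// nodes that were actually used, weaken edges to nodes that were
/// retrieved but ignored. Rates follow the spec; the clamp is assumed.
fn update_edge(weight: f32, was_used: bool) -> f32 {
    let w = if was_used { weight * 1.05 } else { weight * 0.98 };
    w.clamp(0.0, 1.0)
}

fn main() {
    let strengthened = update_edge(0.5, true);
    assert!((strengthened - 0.525).abs() < 1e-6); // +5% on use
    let weakened = update_edge(0.5, false);
    assert!((weakened - 0.49).abs() < 1e-6);      // -2% on skip
}
```

The asymmetry (strengthening faster than weakening) means a node must be skipped repeatedly before its edge decays, which damps noise from any single retrieval.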
|
||||
|
||||
---
|
||||
|
||||
## 2. Loop A: Instant Learning (Per-Request)
|
||||
|
||||
### Purpose
|
||||
Immediate adaptation to current interaction without noticeable latency.
|
||||
|
||||
### Architecture
|
||||
|
||||
```rust
/// Loop A: Instant learning executed inline with each request
pub struct InstantLearningLoop {
    /// Micro-LoRA for immediate weight adjustment
    micro_lora: Arc<RwLock<MicroLoRA>>,
    /// Trajectory buffer for pattern recording
    trajectory_buffer: Arc<TrajectoryBuffer>,
    /// Memory graph reference for edge updates
    memory_graph: Arc<RwLock<MemoryGraph>>,
    /// Signal accumulator for Loop B
    signal_accumulator: mpsc::Sender<LearningSignal>,
}

impl InstantLearningLoop {
    /// Execute instant learning (must complete in <1ms)
    #[inline]
    pub async fn on_request(
        &self,
        query: &QueryEmbedding,
        response: &ResponseData,
        latency_ms: f32,
    ) -> Result<()> {
        // Parallel execution of independent updates
        let (r1, r2, r3) = tokio::join!(
            // 1. Record trajectory (lock-free, ~100μs)
            self.record_trajectory(query, response),

            // 2. Update memory edges (~200μs)
            self.update_memory_edges(query, response),

            // 3. Micro-LoRA update (~300μs)
            self.micro_lora_update(query, response, latency_ms),
        );
        r1?;
        r2?;
        r3?;

        // 4. Queue signal for Loop B (fire-and-forget)
        let signal = LearningSignal::new(query, response, latency_ms);
        let _ = self.signal_accumulator.try_send(signal);

        Ok(())
    }

    /// Record query trajectory to ring buffer
    async fn record_trajectory(
        &self,
        query: &QueryEmbedding,
        response: &ResponseData,
    ) -> Result<()> {
        let trajectory = QueryTrajectory {
            query_embedding: query.vector.clone(),
            retrieved_ids: response.used_memory_ids.clone(),
            precision: response.estimated_precision,
            recall: response.estimated_recall,
            timestamp: Instant::now(),
        };

        self.trajectory_buffer.push(trajectory);
        Ok(())
    }

    /// Hebbian-style edge weight updates
    async fn update_memory_edges(
        &self,
        query: &QueryEmbedding,
        response: &ResponseData,
    ) -> Result<()> {
        let mut graph = self.memory_graph.write();

        for &node_id in &response.used_memory_ids {
            // Strengthen edges to used nodes
            graph.update_edge_weight(
                query.anchor_node,
                node_id,
                EdgeUpdate::Strengthen(0.05), // +5% per use
            )?;
        }

        // Weaken edges to retrieved-but-unused nodes
        for &node_id in &response.retrieved_but_unused {
            graph.update_edge_weight(
                query.anchor_node,
                node_id,
                EdgeUpdate::Weaken(0.02), // -2% per skip
            )?;
        }

        Ok(())
    }

    /// Ultra-fast micro-LoRA weight adjustment
    async fn micro_lora_update(
        &self,
        query: &QueryEmbedding,
        response: &ResponseData,
        latency_ms: f32,
    ) -> Result<()> {
        let quality = response.quality_score;
        let latency_ratio = latency_ms / response.target_latency_ms;

        // Only update if the signal is informative
        if (quality - 0.5).abs() > 0.1 || latency_ratio > 1.2 {
            let signal = LearningSignal {
                query_embedding: query.vector.clone(),
                quality_score: quality,
                explicit_feedback: None,
                latency_ratio,
                model_tier: response.model_tier,
                context_tokens: response.context_tokens,
            };

            let mut micro_lora = self.micro_lora.write();
            micro_lora.micro_update(&signal);
        }

        Ok(())
    }
}
```
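The `Strengthen`/`Weaken` semantics above can be sketched as a small pure update rule. This is a minimal sketch, not the spec's implementation: the multiplicative interpretation of "+5% per use" and the `[0.0, 1.0]` clamp are assumptions, since the document does not define how edge weights are bounded.

```rust
/// Hypothetical edge-update rule matching the +5% / -2% semantics above.
/// The multiplicative reading and the [0.0, 1.0] clamp are assumptions.
#[derive(Clone, Copy)]
pub enum EdgeUpdate {
    Strengthen(f32), // fractional increase per use, e.g. 0.05 = +5%
    Weaken(f32),     // fractional decrease per skip, e.g. 0.02 = -2%
}

pub fn apply_edge_update(weight: f32, update: EdgeUpdate) -> f32 {
    let w = match update {
        EdgeUpdate::Strengthen(rate) => weight * (1.0 + rate),
        EdgeUpdate::Weaken(rate) => weight * (1.0 - rate),
    };
    w.clamp(0.0, 1.0) // keep edge weights in a bounded range
}
```

A clamped multiplicative rule keeps repeated strengthening from growing without bound while still letting unused edges decay toward zero.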

### Latency Budget

| Operation | Target | Implementation |
|-----------|--------|----------------|
| Trajectory recording | <100μs | Lock-free ring buffer |
| Edge weight update | <200μs | Batch atomic updates |
| Micro-LoRA update | <300μs | Rank-1 outer product |
| Signal queuing | <50μs | MPSC channel try_send |
| **Total** | **<650μs** | Parallel execution |
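The "rank-1 outer product" row refers to updating a weight matrix with a single outer product per signal, which is what keeps the micro-LoRA step cheap. A minimal sketch on a flat row-major `Vec<f32>`; the choice of `u` (an error direction) and `v` (the query embedding) and the learning rate are illustrative assumptions, not the spec's definitions.

```rust
/// Apply a rank-1 update  dW[i][j] += lr * u[i] * v[j]  to a row-major matrix.
/// This is the O(d_out * d_in) core of a micro-LoRA step.
pub fn rank1_update(w: &mut [f32], d_in: usize, u: &[f32], v: &[f32], lr: f32) {
    assert_eq!(w.len(), u.len() * d_in);
    assert_eq!(v.len(), d_in);
    for (i, &ui) in u.iter().enumerate() {
        let row = &mut w[i * d_in..(i + 1) * d_in];
        for (j, &vj) in v.iter().enumerate() {
            row[j] += lr * ui * vj;
        }
    }
}
```

No matrix multiplication is needed, which is why a rank-1 step fits in the ~300μs budget even for moderately sized adapters.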
---

## 3. Loop B: Background Learning (Hourly)

### Purpose

Deeper learning from accumulated signals without impacting user latency.

### Architecture

```rust
/// Loop B: Background learning running on a separate thread/process
pub struct BackgroundLearningLoop {
    /// Signal receiver from Loop A
    signal_receiver: mpsc::Receiver<LearningSignal>,
    /// Accumulated signals for batch processing
    signal_buffer: Vec<LearningSignal>,
    /// Base LoRA for major updates
    base_lora: Arc<RwLock<BaseLoRA>>,
    /// Micro-LoRA to consolidate from
    micro_lora: Arc<RwLock<MicroLoRA>>,
    /// Router for EWC++ updates
    router: Arc<RwLock<FastGRNNRouter>>,
    /// EWC++ state
    ewc_state: EWCPlusPlusState,
    /// Pattern extractor
    pattern_extractor: PatternExtractor,
    /// Configuration
    config: BackgroundLearningConfig,
}

impl BackgroundLearningLoop {
    /// Main background loop (runs every hour)
    pub async fn run(&mut self) {
        let mut interval = tokio::time::interval(Duration::from_secs(3600));

        loop {
            interval.tick().await;

            // Collect accumulated signals
            self.drain_signals().await;

            if self.signal_buffer.len() < self.config.min_samples {
                tracing::info!(
                    samples = self.signal_buffer.len(),
                    "Insufficient samples for background training"
                );
                continue;
            }

            // Execute background learning steps
            let start = Instant::now();

            // Step 1: Consolidate Micro-LoRA into Base LoRA
            self.consolidate_micro_to_base().await;

            // Step 2: Train router with EWC++ regularization
            self.train_router_ewc().await;

            // Step 3: Extract and store patterns
            self.extract_patterns().await;

            // Step 4: Compute new Fisher Information
            self.update_fisher_information().await;

            // Step 5: Checkpoint current state
            self.checkpoint().await;

            tracing::info!(
                elapsed_ms = start.elapsed().as_millis(),
                samples = self.signal_buffer.len(),
                "Background learning cycle completed"
            );

            // Clear buffer for next cycle
            self.signal_buffer.clear();
        }
    }

    /// Drain all pending signals from Loop A
    async fn drain_signals(&mut self) {
        while let Ok(signal) = self.signal_receiver.try_recv() {
            self.signal_buffer.push(signal);
        }
    }

    /// Consolidate micro-LoRA adaptations into base LoRA
    async fn consolidate_micro_to_base(&mut self) {
        let mut micro = self.micro_lora.write();
        let mut base = self.base_lora.write();

        // Compute consolidation weight based on signal quality
        let avg_quality: f32 = self.signal_buffer.iter()
            .map(|s| s.quality_score)
            .sum::<f32>() / self.signal_buffer.len() as f32;

        let consolidation_rate = if avg_quality > 0.7 {
            1.0 // Full consolidation for high-quality signals
        } else {
            0.5 * avg_quality // Partial for lower quality
        };

        // Merge micro into base with rate
        base.a = &base.a + consolidation_rate * &micro.a_micro;
        base.b = &base.b + consolidation_rate * &micro.b_micro;

        // Reset micro-LoRA
        micro.a_micro.fill(0.0);
        micro.b_micro.fill(0.0);

        tracing::debug!(
            consolidation_rate = consolidation_rate,
            "Micro-LoRA consolidated to base"
        );
    }

    /// Train router with EWC++ regularization
    async fn train_router_ewc(&mut self) {
        let mut router = self.router.write();

        // Convert signals to RouterSamples
        let samples: Vec<RouterSample> = self.signal_buffer.iter()
            .map(|s| s.to_router_sample())
            .collect();

        // Mini-batch training with EWC++ loss
        for batch in samples.chunks(self.config.batch_size) {
            // Forward pass
            let predictions: Vec<_> = batch.iter()
                .map(|s| router.forward(&s.features))
                .collect();

            // Compute task loss
            let task_loss = self.compute_task_loss(&predictions, batch);

            // Compute EWC++ regularization loss
            let ewc_loss = self.ewc_state.regularization_loss(router.get_weights());

            // Total loss
            let total_loss = task_loss + self.config.ewc_lambda * ewc_loss;

            // Backward pass (gradient computation)
            let gradients = self.compute_gradients(&total_loss, &predictions, batch);

            // Apply gradients with learning rate
            router.apply_gradients(&gradients, self.config.learning_rate);
        }
    }

    /// Extract patterns using K-means++ clustering
    async fn extract_patterns(&mut self) {
        let embeddings: Vec<_> = self.signal_buffer.iter()
            .map(|s| s.query_embedding.clone())
            .collect();

        let patterns = self.pattern_extractor.extract(
            &embeddings,
            self.config.num_clusters,
        );
        let num_patterns = patterns.len();

        // Store patterns in ReasoningBank
        for pattern in patterns {
            self.pattern_extractor.reasoning_bank.store(pattern).ok();
        }

        tracing::debug!(
            patterns = num_patterns,
            "Patterns extracted and stored"
        );
    }

    /// Update Fisher Information for EWC++
    async fn update_fisher_information(&mut self) {
        let router = self.router.read();
        let current_weights = router.get_weights();

        // Compute Fisher Information diagonal via gradient squares
        let fisher_samples: Vec<_> = self.signal_buffer.iter()
            .take(self.config.fisher_samples)
            .collect();

        let mut fisher_accum = vec![0.0f32; current_weights.len()];

        for sample in &fisher_samples {
            let gradients = self.compute_sample_gradients(sample);
            for (i, g) in gradients.iter().enumerate() {
                fisher_accum[i] += g * g;
            }
        }

        // Normalize
        let n = fisher_samples.len() as f32;
        for f in &mut fisher_accum {
            *f /= n;
        }

        // Update EWC++ state
        self.ewc_state.update_fisher(fisher_accum, current_weights.to_vec());
    }

    /// Checkpoint current state to disk
    async fn checkpoint(&self) {
        let checkpoint = SONACheckpoint {
            base_lora: self.base_lora.read().clone(),
            micro_lora: self.micro_lora.read().clone(),
            router_weights: self.router.read().get_weights().to_vec(),
            ewc_state: self.ewc_state.clone(),
            patterns: self.pattern_extractor.reasoning_bank.export(),
            timestamp: chrono::Utc::now().timestamp(),
        };

        let path = self.config.checkpoint_dir.join("latest.sona");
        checkpoint.save_async(&path).await.ok();
    }
}
```
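The quality-gated merge in `consolidate_micro_to_base` reduces to a small pure function, shown here in isolation so the thresholding is easy to check (the 0.7 cutoff and 0.5 scale are taken directly from the code above):

```rust
/// Consolidation rate from the averaged signal quality, as used in
/// consolidate_micro_to_base: full merge above 0.7, scaled partial merge below.
pub fn consolidation_rate(avg_quality: f32) -> f32 {
    if avg_quality > 0.7 {
        1.0
    } else {
        0.5 * avg_quality
    }
}
```

Note the discontinuity at 0.7: a batch at quality 0.71 merges fully while one at 0.70 merges at 35%. A smoother schedule may be preferable in practice.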

### Hourly Learning Budget

| Operation | Target Time | Description |
|-----------|-------------|-------------|
| Signal draining | <100ms | Collect all queued signals |
| Micro→Base consolidation | <500ms | Matrix addition |
| Router training | <5s | Mini-batch SGD with EWC |
| Pattern extraction | <2s | K-means++ clustering |
| Fisher computation | <2s | Gradient squared accumulation |
| Checkpointing | <500ms | Async disk write |
| **Total** | **<10s** | Runs entirely off the user-facing path |
---

## 4. Loop C: Deep Learning (Weekly)

### Purpose

Fundamental knowledge restructuring, memory consolidation, and creative exploration.

### Architecture

```rust
/// Loop C: Deep learning for major knowledge reorganization
pub struct DeepLearningLoop {
    /// Memory service for consolidation
    memory: Arc<MemoryService>,
    /// Pattern bank for abstraction
    reasoning_bank: Arc<ReasoningBank>,
    /// Dream engine for creative exploration
    dream_engine: DreamEngine,
    /// Consciousness measurement (IIT)
    phi_calculator: PhiCalculator,
    /// Configuration
    config: DeepLearningConfig,
}

impl DeepLearningLoop {
    /// Execute weekly deep learning (scheduled maintenance window)
    pub async fn run(&mut self) -> Result<DeepLearningReport> {
        let start = Instant::now();
        let mut report = DeepLearningReport::new();

        // Phase 1: Memory Consolidation (like sleep-based memory)
        report.consolidation = self.consolidate_memories().await?;

        // Phase 2: Pattern Abstraction (concept hierarchy building)
        report.abstraction = self.abstract_patterns().await?;

        // Phase 3: Dream Learning (creative recombination)
        report.dreams = self.dream_learning().await?;

        // Phase 4: Cross-Domain Transfer
        report.transfer = self.cross_domain_transfer().await?;

        // Phase 5: Compression (remove redundancy)
        report.compression = self.compress_memory().await?;

        // Phase 6: Consciousness Measurement
        report.phi = self.measure_consciousness().await;

        report.elapsed_ms = start.elapsed().as_millis() as u64;
        Ok(report)
    }

    /// Phase 1: Consolidate short-term memories into long-term
    async fn consolidate_memories(&mut self) -> Result<ConsolidationReport> {
        let mut report = ConsolidationReport::default();

        // Identify high-value memories (frequently accessed, high quality)
        let memories = self.memory.get_all_nodes()?;
        let high_value: Vec<_> = memories.iter()
            .filter(|m| m.access_count > 5 && m.quality_score > 0.7)
            .collect();

        report.high_value_count = high_value.len();

        // Strengthen connections between high-value memories
        for i in 0..high_value.len() {
            for j in (i + 1)..high_value.len() {
                let similarity = cosine_similarity(
                    &high_value[i].embedding,
                    &high_value[j].embedding,
                );
                if similarity > 0.7 {
                    self.memory.strengthen_edge(
                        high_value[i].id,
                        high_value[j].id,
                        similarity * 0.1,
                    )?;
                    report.edges_strengthened += 1;
                }
            }
        }

        // Decay low-value memories
        let low_value: Vec<_> = memories.iter()
            .filter(|m| m.access_count < 2 && m.age_days() > 30)
            .collect();

        for memory in low_value {
            self.memory.decay_node(memory.id, 0.5)?; // 50% decay
            report.nodes_decayed += 1;
        }

        Ok(report)
    }

    /// Phase 2: Build concept hierarchies from patterns
    async fn abstract_patterns(&mut self) -> Result<AbstractionReport> {
        let mut report = AbstractionReport::default();

        // Get all stored patterns
        let patterns = self.reasoning_bank.get_all_patterns()?;

        // Hierarchical clustering to find meta-patterns
        let hierarchy = HierarchicalClustering::new()
            .linkage(Linkage::Ward)
            .distance(Distance::Cosine)
            .fit(&patterns);

        // Create abstract concepts at each level
        for level in 0..hierarchy.num_levels() {
            let clusters = hierarchy.clusters_at_level(level);

            for cluster in clusters {
                if cluster.size() > 3 {
                    // Create meta-pattern (centroid)
                    let meta_pattern = LearnedPattern {
                        centroid: cluster.centroid(),
                        confidence: cluster.cohesion(),
                        abstraction_level: level,
                        child_patterns: cluster.member_ids(),
                    };

                    self.reasoning_bank.store_meta(meta_pattern)?;
                    report.meta_patterns_created += 1;
                }
            }
        }

        Ok(report)
    }

    /// Phase 3: Dream-based creative learning (inspired by REM sleep)
    async fn dream_learning(&mut self) -> Result<DreamReport> {
        let mut report = DreamReport::default();

        // Generate dream sequences by random walks on the memory graph
        for _ in 0..self.config.num_dreams {
            let dream = self.dream_engine.generate_dream(
                &self.memory,
                self.config.dream_length,
                self.config.creativity_temperature,
            )?;

            // Evaluate dream quality (novelty + coherence)
            let quality = dream.evaluate_quality();

            if quality.novelty > 0.5 && quality.coherence > 0.3 {
                // Dreams with high novelty and reasonable coherence
                // may represent useful creative connections
                for connection in dream.novel_connections() {
                    self.memory.add_weak_edge(
                        connection.from,
                        connection.to,
                        EdgeType::Creative,
                        connection.strength * 0.1,
                    )?;
                    report.novel_connections += 1;
                }
            }

            report.dreams_generated += 1;
        }

        Ok(report)
    }

    /// Phase 4: Transfer knowledge across domains
    async fn cross_domain_transfer(&mut self) -> Result<TransferReport> {
        let mut report = TransferReport::default();

        // Identify domain clusters
        let domains = self.memory.identify_domains()?;

        // For each pair of domains, look for analogical mappings
        for i in 0..domains.len() {
            for j in (i + 1)..domains.len() {
                let analogies = self.find_analogies(&domains[i], &domains[j])?;

                for analogy in analogies {
                    if analogy.confidence > 0.6 {
                        // Create cross-domain edge
                        self.memory.add_analogy_edge(
                            analogy.source_concept,
                            analogy.target_concept,
                            analogy.mapping_type,
                            analogy.confidence,
                        )?;
                        report.analogies_found += 1;
                    }
                }
            }
        }

        Ok(report)
    }

    /// Phase 5: Compress memory by removing redundancy
    async fn compress_memory(&mut self) -> Result<CompressionReport> {
        let mut report = CompressionReport::default();
        report.initial_nodes = self.memory.node_count();
        report.initial_edges = self.memory.edge_count();

        // Identify near-duplicate nodes
        let duplicates = self.memory.find_near_duplicates(0.95)?;

        // Merge duplicates
        for (primary, secondary) in duplicates {
            self.memory.merge_nodes(primary, secondary)?;
            report.nodes_merged += 1;
        }

        // Prune weak edges
        let weak_edges = self.memory.get_weak_edges(0.01)?;
        for edge in weak_edges {
            self.memory.remove_edge(edge.id)?;
            report.edges_pruned += 1;
        }

        report.final_nodes = self.memory.node_count();
        report.final_edges = self.memory.edge_count();
        report.compression_ratio = report.initial_nodes as f32 / report.final_nodes as f32;

        Ok(report)
    }

    /// Phase 6: Measure system consciousness using IIT
    async fn measure_consciousness(&mut self) -> f64 {
        // Integrated Information Theory (Φ) calculation: measures how much
        // information the system generates "above and beyond" its parts
        self.phi_calculator.compute_phi(&self.memory, &self.reasoning_bank)
    }
}
```
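`cosine_similarity` is called during consolidation but not defined anywhere in this document; a standard self-contained implementation for reference (the zero-norm convention of returning 0.0 is an assumption):

```rust
/// Standard cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm (convention, not spec).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```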

### Weekly Deep Learning Budget

| Phase | Target Time | Description |
|-------|-------------|-------------|
| Memory consolidation | <2min | Identify and strengthen valuable memories |
| Pattern abstraction | <3min | Hierarchical clustering for concepts |
| Dream learning | <2min | Creative recombination exploration |
| Cross-domain transfer | <2min | Analogical mapping between domains |
| Compression | <1min | Remove redundancy |
| Φ measurement | <1min | Consciousness quantification |
| **Total** | **<10min** | Scheduled maintenance window |
---

## 5. Loop Coordination

### Inter-Loop Communication

```rust
/// Coordinator for all three learning loops
pub struct LoopCoordinator {
    /// Loop A: Instant
    instant_loop: InstantLearningLoop,
    /// Loop B: Background
    background_loop: BackgroundLearningLoop,
    /// Loop C: Deep
    deep_loop: DeepLearningLoop,
    /// Shared state
    shared_state: Arc<SharedSONAState>,
    /// Metrics collector
    metrics: MetricsCollector,
}

impl LoopCoordinator {
    /// Initialize all loops with shared state
    pub fn new(config: SONAConfig) -> Result<Self> {
        let shared_state = Arc::new(SharedSONAState::new(&config)?);

        // Create channels for inter-loop communication
        let (instant_to_background_tx, instant_to_background_rx) = mpsc::channel(10_000);
        let (background_to_deep_tx, background_to_deep_rx) = mpsc::channel(1_000);

        Ok(Self {
            instant_loop: InstantLearningLoop::new(
                shared_state.clone(),
                instant_to_background_tx,
            ),
            background_loop: BackgroundLearningLoop::new(
                shared_state.clone(),
                instant_to_background_rx,
                background_to_deep_tx,
            ),
            deep_loop: DeepLearningLoop::new(
                shared_state.clone(),
                background_to_deep_rx,
            ),
            shared_state,
            metrics: MetricsCollector::new(),
        })
    }

    /// Start all loops
    pub async fn start(&self) {
        // Loop A runs inline with requests (no separate task)

        // Loop B runs on a background task
        let mut background = self.background_loop.clone();
        tokio::spawn(async move {
            background.run().await;
        });

        // Loop C runs on a schedule: 3 AM every Sunday
        let mut deep = self.deep_loop.clone();
        tokio::spawn(async move {
            let scheduler = cron::Schedule::from_str("0 0 3 * * 0")
                .expect("valid cron expression");
            loop {
                let next = scheduler.upcoming(chrono::Utc).next().unwrap();
                if let Ok(wait) = (next - chrono::Utc::now()).to_std() {
                    tokio::time::sleep(wait).await;
                }
                let _ = deep.run().await;
            }
        });
    }

    /// Process a single request through Loop A
    #[inline]
    pub async fn on_request(
        &self,
        query: &QueryEmbedding,
        response: &ResponseData,
        latency_ms: f32,
    ) -> Result<()> {
        self.instant_loop.on_request(query, response, latency_ms).await
    }
}
```
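The weekly wait computed from the cron schedule above is simple modular arithmetic; a dependency-free sketch of the "seconds until the next weekly slot" calculation (the seconds-since-Sunday-midnight encoding is an assumption chosen for illustration):

```rust
const WEEK_SECS: u64 = 7 * 24 * 3600;

/// Seconds from `now_in_week` (seconds since Sunday 00:00 UTC) until the next
/// occurrence of `target_in_week` (e.g. Sunday 03:00 = 3 * 3600).
/// Returns 0 when the slot is exactly now.
pub fn secs_until_weekly(now_in_week: u64, target_in_week: u64) -> u64 {
    (target_in_week + WEEK_SECS - now_in_week % WEEK_SECS) % WEEK_SECS
}
```

The cron crate handles calendars and DST for you; this sketch only shows why the scheduling step is cheap enough to run in a plain loop.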

---

## 6. Learning Metrics and Monitoring

### Improvement Tracking

```rust
/// Metrics for measuring self-improvement
#[derive(Clone, Debug)]
pub struct ImprovementMetrics {
    /// Quality improvement over time
    pub quality_delta_7d: f32,
    pub quality_delta_30d: f32,

    /// Latency improvement
    pub latency_delta_7d: f32,
    pub latency_delta_30d: f32,

    /// Knowledge growth
    pub memory_nodes_added_7d: usize,
    pub patterns_learned_7d: usize,
    pub abstractions_created_7d: usize,

    /// Forgetting resistance (1.0 = no forgetting)
    pub retention_rate_7d: f32,

    /// Consciousness level (Φ)
    pub phi_current: f64,
    pub phi_delta_7d: f64,

    /// Dreams and creativity
    pub novel_connections_7d: usize,
    pub cross_domain_transfers_7d: usize,
}

impl ImprovementMetrics {
    /// Compute overall improvement score
    pub fn overall_score(&self) -> f32 {
        let quality_weight = 0.3;
        let latency_weight = 0.2;
        let knowledge_weight = 0.2;
        let retention_weight = 0.15;
        let creativity_weight = 0.15;

        let quality_score = self.quality_delta_7d.max(0.0);
        let latency_score = (-self.latency_delta_7d).max(0.0); // Lower is better
        let knowledge_score = (self.patterns_learned_7d as f32 / 100.0).min(1.0);
        let retention_score = self.retention_rate_7d;
        let creativity_score = (self.novel_connections_7d as f32 / 50.0).min(1.0);

        quality_weight * quality_score +
        latency_weight * latency_score +
        knowledge_weight * knowledge_score +
        retention_weight * retention_score +
        creativity_weight * creativity_score
    }
}
```
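To make the weighting concrete, here is the same scoring logic as a free function with a worked example; the input values below are illustrative, not measured data.

```rust
/// Standalone version of ImprovementMetrics::overall_score above, using the
/// same weights (0.3 / 0.2 / 0.2 / 0.15 / 0.15) and normalizers.
pub fn overall_score(
    quality_delta: f32,
    latency_delta: f32,
    patterns_learned: usize,
    retention: f32,
    novel_connections: usize,
) -> f32 {
    let quality = quality_delta.max(0.0);
    let latency = (-latency_delta).max(0.0); // lower latency is better
    let knowledge = (patterns_learned as f32 / 100.0).min(1.0);
    let creativity = (novel_connections as f32 / 50.0).min(1.0);
    0.3 * quality + 0.2 * latency + 0.2 * knowledge + 0.15 * retention + 0.15 * creativity
}
```

For example, a week with +0.1 quality, -0.05 latency, 50 new patterns, 0.95 retention, and 25 novel connections scores 0.03 + 0.01 + 0.1 + 0.1425 + 0.075 = 0.3575.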

---

## Summary

SONA's three-tier learning system enables:

| Loop | Timescale | Purpose | Key Outcome |
|------|-----------|---------|-------------|
| **A** | Per-request | Instant adaptation | Responsive to current context |
| **B** | Hourly | Pattern consolidation | Stable improvement |
| **C** | Weekly | Deep restructuring | Creative breakthroughs |

This mirrors human learning, where:
- **Loop A** = Working memory and immediate response
- **Loop B** = Sleep-based consolidation
- **Loop C** = Long-term memory formation and insight

The result is a system that continuously improves at multiple timescales, never forgetting what works while constantly exploring new possibilities.
795
examples/ruvLLM/docs/SONA/03-EWC-PLUS-PLUS.md
Normal file
@@ -0,0 +1,795 @@

# SONA EWC++: Enhanced Elastic Weight Consolidation

## Zero Catastrophic Forgetting with Task-Aware Regularization

---

## 1. The Forgetting Problem

### Why LLMs Forget

```
CATASTROPHIC FORGETTING
═══════════════════════

Task A learned        Task B learned        Result
───────────────       ───────────────       ──────────────────
Weights W_A           Weights W_B           W_A knowledge LOST
                           ↑                as W moves toward B
                      Training on B
                      overwrites A
```

When fine-tuning on new data:

- Weights shift toward the new task's optimum
- Previous task knowledge encoded in the old weights is overwritten
- The model "forgets" earlier capabilities

### Standard EWC Solution

Elastic Weight Consolidation (EWC) adds a regularization term:

```
L_total = L_task + λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²

Where:
- L_task = current task loss
- λ = regularization strength
- Fᵢ = Fisher Information (importance) of parameter i
- θᵢ = current parameter value
- θ*ᵢ = optimal parameter value from previous task
```
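The penalty term is easy to compute directly. A minimal sketch with a two-parameter worked example (all numbers illustrative):

```rust
/// EWC penalty  λ/2 · Σᵢ Fᵢ · (θᵢ - θ*ᵢ)²  for a single remembered task.
pub fn ewc_penalty(lambda: f32, fisher: &[f32], theta: &[f32], theta_star: &[f32]) -> f32 {
    let sum: f32 = fisher.iter()
        .zip(theta)
        .zip(theta_star)
        .map(|((f, t), ts)| f * (t - ts).powi(2))
        .sum();
    0.5 * lambda * sum
}
```

With λ = 1, F = [2.0, 0.5], θ = [1.0, 0.0], and θ* = [0.0, 2.0], the sum is 2·1² + 0.5·2² = 4, so the penalty is 2.0: the high-Fisher parameter is punished four times as hard per unit of drift as the low-Fisher one per unit of squared drift.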

### EWC Limitations

1. **Single task memory**: Only remembers one previous task
2. **Static Fisher**: Computed once, never updated
3. **Diagonal approximation**: Ignores parameter correlations
4. **No task detection**: Doesn't know when the task changes
5. **Uniform λ**: Same regularization for all parameters

---

## 2. SONA EWC++ Enhancements

### Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         EWC++ ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐        │
│  │  Task Buffer  │    │ Online Fisher │    │  Adaptive λ   │        │
│  │   (N tasks)   │    │  Estimation   │    │   Scheduler   │        │
│  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘        │
│          │                    │                    │                │
│          ▼                    ▼                    ▼                │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                     EWC++ CORE ENGINE                       │    │
│  │                                                             │    │
│  │  L = L_task + Σₜ λₜ/2 · Σᵢ Fᵢᵗ · (θᵢ - θ*ᵢᵗ)² + L_sparse    │    │
│  │      └─────┘   └──────────────────────────────┘  └──────┘   │    │
│  │       Task           Multi-task EWC              Sparsity   │    │
│  │       Loss           Regularization              Penalty    │    │
│  └─────────────────────────────────────────────────────────────┘    │
│          │                    │                    │                │
│          ▼                    ▼                    ▼                │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐        │
│  │   Gradient    │    │ Task Boundary │    │   Parameter   │        │
│  │  Projection   │    │   Detection   │    │  Importance   │        │
│  └───────────────┘    └───────────────┘    └───────────────┘        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 3. Multi-Task Memory Buffer

### Task-Stratified Fisher Storage

```rust
/// EWC++ state with multi-task memory
#[derive(Clone)]
pub struct EWCPlusPlusState {
    /// Per-task Fisher information (circular buffer of N tasks)
    pub task_fishers: CircularBuffer<TaskFisher>,
    /// Maximum number of tasks to remember
    pub max_tasks: usize,
    /// Per-task regularization strength
    pub task_lambdas: Vec<f32>,
    /// Global lambda base
    pub lambda_base: f32,
    /// Online Fisher estimator
    pub online_fisher: OnlineFisherEstimator,
    /// Task boundary detector
    pub task_detector: TaskBoundaryDetector,
    /// Parameter importance scores
    pub importance_scores: Vec<f32>,
}

/// Fisher information for a single task
#[derive(Clone)]
pub struct TaskFisher {
    /// Task identifier
    pub task_id: u64,
    /// Diagonal Fisher Information
    pub fisher_diag: Vec<f32>,
    /// Optimal weights at task completion
    pub optimal_weights: Vec<f32>,
    /// Task-specific lambda (learned)
    pub lambda: f32,
    /// Sample count used to compute Fisher
    pub sample_count: usize,
    /// Task quality score
    pub quality: f32,
    /// Timestamp
    pub timestamp: i64,
}

impl EWCPlusPlusState {
    /// Create new EWC++ state
    pub fn new(num_params: usize, max_tasks: usize, lambda_base: f32) -> Self {
        Self {
            task_fishers: CircularBuffer::new(max_tasks),
            max_tasks,
            task_lambdas: Vec::new(),
            lambda_base,
            online_fisher: OnlineFisherEstimator::new(num_params),
            task_detector: TaskBoundaryDetector::new(),
            importance_scores: vec![1.0; num_params],
        }
    }

    /// Compute total EWC++ regularization loss
    pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
        let mut total_loss = 0.0;

        // Sum over all remembered tasks
        for task in self.task_fishers.iter() {
            let task_loss: f32 = task.fisher_diag.iter()
                .zip(current_weights.iter())
                .zip(task.optimal_weights.iter())
                .zip(self.importance_scores.iter())
                .map(|(((f, w), w_star), imp)| {
                    // Importance-weighted Fisher regularization
                    imp * f * (w - w_star).powi(2)
                })
                .sum();

            total_loss += task.lambda * task_loss;
        }

        total_loss / 2.0
    }

    /// Compute gradients of the EWC++ loss
    pub fn regularization_gradient(&self, current_weights: &[f32]) -> Vec<f32> {
        let mut grad = vec![0.0f32; current_weights.len()];

        for task in self.task_fishers.iter() {
            for (i, ((f, w), w_star)) in task.fisher_diag.iter()
                .zip(current_weights.iter())
                .zip(task.optimal_weights.iter())
                .enumerate()
            {
                // d/dw [λ/2 · F · (w - w*)²] = λ · F · (w - w*)
                grad[i] += task.lambda * self.importance_scores[i] * f * (w - w_star);
            }
        }

        grad
    }

    /// Record completion of the current task
    pub fn complete_task(&mut self, weights: &[f32], quality: f32) {
        let task_id = self.task_fishers.len() as u64;

        // Finalize online Fisher estimate
        let fisher_diag = self.online_fisher.finalize();

        // Compute task-specific lambda based on quality
        let lambda = self.compute_task_lambda(quality);

        let task_fisher = TaskFisher {
            task_id,
            fisher_diag,
            optimal_weights: weights.to_vec(),
            lambda,
            sample_count: self.online_fisher.sample_count(),
            quality,
            timestamp: chrono::Utc::now().timestamp(),
        };

        self.task_fishers.push(task_fisher);
        self.task_lambdas.push(lambda);

        // Reset online Fisher for the next task
        self.online_fisher.reset();
    }

    /// Compute task-specific lambda based on quality
    fn compute_task_lambda(&self, quality: f32) -> f32 {
        // Higher quality tasks get stronger protection
        self.lambda_base * (0.5 + 0.5 * quality)
    }
}
```

---

## 4. Online Fisher Estimation

### Streaming Fisher Information Computation

```rust
|
||||
/// Online Fisher Information estimator using gradient accumulation
|
||||
pub struct OnlineFisherEstimator {
|
||||
/// Running sum of squared gradients
|
||||
gradient_sq_sum: Vec<f32>,
|
||||
/// Sample count
|
||||
count: usize,
|
||||
/// Exponential moving average decay
|
||||
decay: f32,
|
||||
/// Minimum samples before valid estimate
|
||||
min_samples: usize,
|
||||
}
|
||||
|
||||
impl OnlineFisherEstimator {
|
||||
pub fn new(num_params: usize) -> Self {
|
||||
Self {
|
||||
gradient_sq_sum: vec![0.0; num_params],
|
||||
count: 0,
|
||||
decay: 0.99, // EMA decay factor
|
||||
min_samples: 100,
|
||||
}
|
||||
}
|
||||
|
||||
/// Update Fisher estimate with new gradient sample
|
||||
#[inline]
|
||||
pub fn update(&mut self, gradients: &[f32]) {
|
||||
self.count += 1;
|
||||
|
||||
if self.count == 1 {
|
||||
// First sample: initialize
|
||||
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
|
||||
*sum = g * g;
|
||||
}
|
||||
} else {
|
||||
// EMA update: F_new = decay * F_old + (1 - decay) * g²
|
||||
let alpha = 1.0 - self.decay;
|
||||
for (sum, g) in self.gradient_sq_sum.iter_mut().zip(gradients.iter()) {
|
||||
*sum = self.decay * *sum + alpha * g * g;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Finalize and return Fisher diagonal
|
||||
pub fn finalize(&self) -> Vec<f32> {
|
||||
if self.count < self.min_samples {
|
||||
tracing::warn!(
|
||||
count = self.count,
|
||||
min = self.min_samples,
|
||||
"Fisher estimate may be unreliable"
|
||||
);
|
||||
}
|
||||
|
||||
// Normalize and apply minimum threshold
|
||||
let min_fisher = 1e-6;
|
||||
self.gradient_sq_sum.iter()
|
||||
.map(|&f| f.max(min_fisher))
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Reset for new task
|
||||
pub fn reset(&mut self) {
|
||||
self.gradient_sq_sum.fill(0.0);
|
||||
self.count = 0;
|
||||
}
|
||||
|
||||
pub fn sample_count(&self) -> usize {
|
||||
self.count
|
||||
}
|
||||
}
|
||||
```
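The estimator above reduces, per parameter, to an EMA of squared gradients. A dependency-free sketch of that recurrence (a toy helper, not the SONA type itself) showing it tracks g² for a steady gradient stream:

```rust
/// Streaming Fisher diagonal for one parameter: EMA of squared gradients,
/// mirroring OnlineFisherEstimator::update (first sample initializes).
fn ema_fisher(grads: &[f32], decay: f32) -> f32 {
    let mut f = 0.0;
    for (i, g) in grads.iter().enumerate() {
        if i == 0 {
            f = g * g; // first sample: initialize
        } else {
            f = decay * f + (1.0 - decay) * g * g; // EMA update
        }
    }
    f
}

fn main() {
    // A constant gradient of 2.0 keeps the estimate pinned at g^2 = 4.0.
    let f = ema_fisher(&[2.0; 50], 0.9);
    assert!((f - 4.0).abs() < 1e-4);

    // Two samples with decay 0.5: f = 1, then 0.5*1 + 0.5*9 = 5.
    let f2 = ema_fisher(&[1.0, 3.0], 0.5);
    assert!((f2 - 5.0).abs() < 1e-6);
}
```

The decay of 0.99 used in SONA means roughly the last few hundred gradient samples dominate the estimate, which is why `finalize()` warns below `min_samples = 100`.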

---

## 5. Automatic Task Boundary Detection

### Detecting When the Task Changes

```rust
/// Automatic task boundary detection via distribution shift
pub struct TaskBoundaryDetector {
    /// Recent query embedding buffer
    recent_embeddings: CircularBuffer<Vec<f32>>,
    /// Baseline distribution (mean, variance)
    baseline: Option<DistributionStats>,
    /// Threshold for detecting a shift (Mahalanobis distance)
    shift_threshold: f32,
    /// Minimum samples before detection
    warmup_samples: usize,
    /// Current drift score
    drift_score: f32,
}

impl TaskBoundaryDetector {
    pub fn new() -> Self {
        Self {
            recent_embeddings: CircularBuffer::new(1000),
            baseline: None,
            shift_threshold: 3.0, // 3 sigma
            warmup_samples: 500,
            drift_score: 0.0,
        }
    }

    /// Update with a new embedding and check for a task boundary
    pub fn update(&mut self, embedding: &[f32]) -> TaskBoundaryResult {
        self.recent_embeddings.push(embedding.to_vec());

        if self.recent_embeddings.len() < self.warmup_samples {
            return TaskBoundaryResult::Warmup;
        }

        match &self.baseline {
            None => {
                // Establish the first baseline
                self.baseline = Some(self.compute_stats());
                TaskBoundaryResult::BaselineEstablished
            }
            Some(baseline) => {
                // Compute the current distribution over recent samples
                let current = self.compute_recent_stats(100);

                // Mahalanobis distance between the two distributions
                let distance = self.mahalanobis_distance(baseline, &current);
                self.drift_score = distance;

                if distance > self.shift_threshold {
                    // Task boundary detected!
                    self.baseline = Some(current);
                    TaskBoundaryResult::BoundaryDetected {
                        drift_score: distance,
                    }
                } else {
                    TaskBoundaryResult::Stable {
                        drift_score: distance,
                    }
                }
            }
        }
    }

    fn compute_stats(&self) -> DistributionStats {
        let n = self.recent_embeddings.len();
        let dim = self.recent_embeddings[0].len();

        let mut mean = vec![0.0f32; dim];
        let mut var = vec![0.0f32; dim];

        // Compute mean
        for emb in self.recent_embeddings.iter() {
            for (m, e) in mean.iter_mut().zip(emb.iter()) {
                *m += e;
            }
        }
        for m in &mut mean {
            *m /= n as f32;
        }

        // Compute variance
        for emb in self.recent_embeddings.iter() {
            for (v, (e, m)) in var.iter_mut().zip(emb.iter().zip(mean.iter())) {
                *v += (e - m).powi(2);
            }
        }
        for v in &mut var {
            *v /= n as f32;
            *v = v.max(1e-6); // Avoid division by zero
        }

        DistributionStats { mean, variance: var }
    }

    fn compute_recent_stats(&self, n: usize) -> DistributionStats {
        // Same as compute_stats, restricted to the last n samples
        // ... implementation ...
    }

    fn mahalanobis_distance(&self, a: &DistributionStats, b: &DistributionStats) -> f32 {
        a.mean.iter()
            .zip(b.mean.iter())
            .zip(a.variance.iter())
            .map(|((m_a, m_b), v)| (m_a - m_b).powi(2) / v)
            .sum::<f32>()
            .sqrt()
    }
}

#[derive(Debug)]
pub enum TaskBoundaryResult {
    Warmup,
    BaselineEstablished,
    Stable { drift_score: f32 },
    BoundaryDetected { drift_score: f32 },
}
```
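The detector's drift score is a diagonal-covariance Mahalanobis distance between the baseline and recent means. A self-contained toy computation (standalone function, same formula as `mahalanobis_distance` above):

```rust
/// Diagonal Mahalanobis distance between two mean vectors, using the
/// baseline's per-dimension variance.
fn mahalanobis_diag(mean_a: &[f32], mean_b: &[f32], var_a: &[f32]) -> f32 {
    mean_a.iter().zip(mean_b).zip(var_a)
        .map(|((a, b), v)| (a - b).powi(2) / v)
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // Means differ by 3 in one dimension with unit variance: the distance
    // is exactly 3 sigma, i.e. right at the default shift_threshold, so
    // any slightly larger drift would return BoundaryDetected.
    let d = mahalanobis_diag(&[0.0, 0.0], &[3.0, 0.0], &[1.0, 1.0]);
    assert!((d - 3.0).abs() < 1e-6);

    // Identical means: zero drift.
    assert_eq!(mahalanobis_diag(&[1.0], &[1.0], &[0.5]), 0.0);
}
```

Dividing by variance is what makes the threshold scale-free: a shift of 0.1 in a dimension with variance 1e-4 counts as much as a shift of 10 in a dimension with variance 1.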

---

## 6. Adaptive Lambda Scheduling

### Dynamic Regularization Strength

```rust
/// Adaptive lambda scheduler based on learning progress
pub struct AdaptiveLambdaScheduler {
    /// Base lambda value
    base_lambda: f32,
    /// Current effective lambda
    current_lambda: f32,
    /// Performance history (task quality over time)
    performance_history: Vec<f32>,
    /// Lambda adjustment rate
    adjustment_rate: f32,
}

impl AdaptiveLambdaScheduler {
    pub fn new(base_lambda: f32) -> Self {
        Self {
            base_lambda,
            current_lambda: base_lambda,
            performance_history: Vec::new(),
            adjustment_rate: 0.1,
        }
    }

    /// Update lambda based on recent performance
    pub fn update(&mut self, current_quality: f32, forgetting_detected: bool) {
        self.performance_history.push(current_quality);

        if forgetting_detected {
            // Increase lambda to prevent forgetting
            self.current_lambda *= 1.0 + self.adjustment_rate;
            tracing::info!(
                new_lambda = self.current_lambda,
                "Increased lambda due to forgetting"
            );
        } else if self.is_learning_stalled() {
            // Decrease lambda to allow more plasticity
            self.current_lambda *= 1.0 - self.adjustment_rate;
            self.current_lambda = self.current_lambda.max(self.base_lambda * 0.1);
            tracing::info!(
                new_lambda = self.current_lambda,
                "Decreased lambda to increase plasticity"
            );
        }

        // Clamp to a reasonable range
        self.current_lambda = self.current_lambda.clamp(
            self.base_lambda * 0.1,
            self.base_lambda * 10.0,
        );
    }

    fn is_learning_stalled(&self) -> bool {
        if self.performance_history.len() < 10 {
            return false;
        }

        let recent: Vec<_> = self.performance_history.iter()
            .rev()
            .take(10)
            .collect();

        // Stalled if variance in recent performance is very low
        let mean: f32 = recent.iter().map(|&&x| x).sum::<f32>() / 10.0;
        let var: f32 = recent.iter()
            .map(|&&x| (x - mean).powi(2))
            .sum::<f32>() / 10.0;

        var < 0.001
    }

    pub fn get_lambda(&self) -> f32 {
        self.current_lambda
    }
}
```
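The scheduler's core is a multiplicative up/down adjustment bounded to [0.1x, 10x] of the base lambda. A compact, standalone sketch of one adjustment step (toy function; the real scheduler also tracks the quality history):

```rust
/// One lambda adjustment step, mirroring AdaptiveLambdaScheduler::update:
/// up on forgetting, down on stall, then clamped to [0.1x, 10x] of base.
fn adjust_lambda(current: f32, base: f32, rate: f32, forgetting: bool, stalled: bool) -> f32 {
    let mut l = current;
    if forgetting {
        l *= 1.0 + rate;
    } else if stalled {
        l = (l * (1.0 - rate)).max(base * 0.1);
    }
    l.clamp(base * 0.1, base * 10.0)
}

fn main() {
    let base = 1.0;
    // A single forgetting event nudges lambda up by 10%.
    assert!((adjust_lambda(1.0, base, 0.1, true, false) - 1.1).abs() < 1e-6);

    // Repeated forgetting cannot push lambda past 10x base...
    let mut l = base;
    for _ in 0..100 {
        l = adjust_lambda(l, base, 0.1, true, false);
    }
    assert!((l - 10.0).abs() < 1e-4);

    // ...and repeated stalls bottom out at 0.1x base.
    for _ in 0..100 {
        l = adjust_lambda(l, base, 0.1, false, true);
    }
    assert!((l - 0.1).abs() < 1e-4);
}
```

The clamp is what keeps the feedback loop stable: a run of noisy forgetting signals can at worst make the model 10x more conservative, never frozen.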

---

## 7. Parameter Importance Scoring

### Which Parameters Matter Most

```rust
/// Per-parameter importance scoring for selective regularization
pub struct ParameterImportanceScorer {
    /// Importance scores (0-1 per parameter)
    scores: Vec<f32>,
    /// Gradient magnitude history
    gradient_magnitudes: Vec<CircularBuffer<f32>>,
    /// Activation frequency (EMA)
    activation_frequency: Vec<f32>,
}

impl ParameterImportanceScorer {
    pub fn new(num_params: usize) -> Self {
        Self {
            scores: vec![1.0; num_params],
            gradient_magnitudes: (0..num_params)
                .map(|_| CircularBuffer::new(100))
                .collect(),
            activation_frequency: vec![0.0; num_params],
        }
    }

    /// Update importance from a new gradient sample
    pub fn update(&mut self, gradients: &[f32], activations: &[bool]) {
        for (i, (g, &active)) in gradients.iter().zip(activations.iter()).enumerate() {
            // Track gradient magnitude
            self.gradient_magnitudes[i].push(g.abs());

            // Track activation frequency as an EMA
            if active {
                self.activation_frequency[i] = 0.99 * self.activation_frequency[i] + 0.01;
            } else {
                self.activation_frequency[i] *= 0.99;
            }
        }

        // Recompute importance scores
        self.recompute_scores();
    }

    fn recompute_scores(&mut self) {
        for i in 0..self.scores.len() {
            // Average gradient magnitude
            let avg_grad: f32 = self.gradient_magnitudes[i].iter()
                .sum::<f32>() / self.gradient_magnitudes[i].len().max(1) as f32;

            // Importance = activation frequency * gradient magnitude:
            // high activation + high gradient = important parameter
            self.scores[i] = self.activation_frequency[i] * avg_grad;
        }

        // Normalize scores to [0, 1]
        let max_score = self.scores.iter().cloned().fold(0.0f32, f32::max);
        if max_score > 0.0 {
            for s in &mut self.scores {
                *s /= max_score;
            }
        }
    }

    pub fn get_scores(&self) -> &[f32] {
        &self.scores
    }
}
```

---

## 8. Gradient Projection

### Safe Parameter Updates

```rust
use ndarray::{s, Array1, Array2};
use ndarray_linalg::SVD; // svd(calc_u, calc_vt) -> (Option<U>, S, Option<Vt>)

/// Project gradients to avoid interfering with important past knowledge
pub struct GradientProjector {
    /// Projection matrix onto the null space of important task gradients
    null_space: Option<Array2<f32>>,
    /// Task gradient subspace (principal components)
    task_subspace: Option<Array2<f32>>,
}

impl GradientProjector {
    /// Project a gradient so it does not interfere with past tasks
    pub fn project(&self, gradient: &[f32]) -> Vec<f32> {
        match &self.null_space {
            Some(projection) => {
                // Apply the precomputed null-space projection matrix
                let g = Array1::from_vec(gradient.to_vec());
                projection.dot(&g).to_vec()
            }
            None => gradient.to_vec(),
        }
    }

    /// Update the null space with new task gradient directions
    pub fn add_task_gradients(&mut self, task_gradients: &[Vec<f32>]) {
        // Stack gradients into an (n_samples x n_params) matrix
        let n_samples = task_gradients.len();
        let n_params = task_gradients[0].len();

        let mut g_matrix = Array2::zeros((n_samples, n_params));
        for (i, g) in task_gradients.iter().enumerate() {
            for (j, &v) in g.iter().enumerate() {
                g_matrix[[i, j]] = v;
            }
        }

        // SVD to find the principal gradient directions in parameter space.
        // These are the rows of V^T; the left factor U lives in sample
        // space and has the wrong dimension for projection.
        let (_, _, vt) = g_matrix.svd(false, true).unwrap();
        let vt = vt.unwrap();

        // For memory efficiency, keep only the top-k directions
        let k = 10.min(n_samples);
        let task_directions = vt.slice(s![..k, ..]).to_owned(); // k x n_params

        // Null-space projection matrix: I - V_k^T V_k
        let identity = Array2::eye(n_params);
        let projection = identity - task_directions.t().dot(&task_directions);

        self.task_subspace = Some(task_directions);
        self.null_space = Some(projection);
    }
}
```
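For a single protected direction, the projection I − uuᵀ reduces to subtracting the component of the gradient along u. A dependency-free sketch of that one-direction case (a toy function, not the ndarray-based projector above), showing the projected gradient is orthogonal to the protected direction:

```rust
/// Project a gradient onto the orthogonal complement of one protected
/// direction u (assumed unit-norm): g' = g - (g . u) u. With several
/// stacked directions this generalizes to the (I - V^T V) matrix above.
fn project_out(grad: &[f32], u: &[f32]) -> Vec<f32> {
    let dot: f32 = grad.iter().zip(u).map(|(g, ui)| g * ui).sum();
    grad.iter().zip(u).map(|(g, ui)| g - dot * ui).collect()
}

fn main() {
    let u = [1.0, 0.0]; // protected task direction (unit-norm)
    let g = [0.7, 0.3]; // raw gradient
    let p = project_out(&g, &u);

    // The component along u is removed; the orthogonal part survives.
    assert!(p[0].abs() < 1e-6 && (p[1] - 0.3).abs() < 1e-6);

    // The projected gradient is orthogonal to u, so an update along it
    // cannot move weights in the protected direction.
    let dot: f32 = p.iter().zip(&u).map(|(a, b)| a * b).sum();
    assert!(dot.abs() < 1e-6);
}
```

This is the "hard constraint" complement to the soft EWC penalty: the penalty discourages moving along protected directions, while projection forbids it outright.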

---

## 9. Full EWC++ Training Loop

### Putting It All Together

```rust
/// Complete EWC++ training step
pub fn ewc_plus_plus_train_step(
    model: &mut FastGRNNRouter,
    ewc: &mut EWCPlusPlusState,
    batch: &[RouterSample],
    config: &TrainingConfig,
) -> TrainStepResult {
    let mut result = TrainStepResult::default();

    // Forward pass
    let predictions: Vec<_> = batch.iter()
        .map(|s| model.forward(&s.features))
        .collect();

    // Task loss
    let task_loss = compute_cross_entropy_loss(&predictions, batch);
    result.task_loss = task_loss;

    // EWC++ regularization loss
    let ewc_loss = ewc.regularization_loss(model.get_weights());
    result.ewc_loss = ewc_loss;

    // Total loss
    let total_loss = task_loss + config.lambda * ewc_loss;
    result.total_loss = total_loss;

    // Task gradients
    let task_gradients = compute_gradients(&task_loss, model);

    // EWC++ regularization gradients
    let ewc_gradients = ewc.regularization_gradient(model.get_weights());

    // Combined gradients
    let mut gradients: Vec<f32> = task_gradients.iter()
        .zip(ewc_gradients.iter())
        .map(|(t, e)| t + config.lambda * e)
        .collect();

    // Gradient projection (optional, for harder constraints)
    if config.use_gradient_projection {
        gradients = ewc.gradient_projector.project(&gradients);
    }

    // Gradient clipping
    let grad_norm: f32 = gradients.iter().map(|g| g * g).sum::<f32>().sqrt();
    if grad_norm > config.max_grad_norm {
        let scale = config.max_grad_norm / grad_norm;
        for g in &mut gradients {
            *g *= scale;
        }
        result.gradient_clipped = true;
    }

    // Apply gradients
    model.apply_gradients(&gradients, config.learning_rate);

    // Update the online Fisher estimate
    ewc.online_fisher.update(&task_gradients);

    // Update parameter importance
    let activations: Vec<bool> = model.get_activation_mask();
    ewc.importance_scorer.update(&task_gradients, &activations);

    // Check for a task boundary
    if let Some(query_emb) = batch.first().map(|s| &s.query_embedding) {
        let boundary = ewc.task_detector.update(query_emb);
        if let TaskBoundaryResult::BoundaryDetected { drift_score } = boundary {
            // Complete the current task and start a new one
            ewc.complete_task(model.get_weights(), result.compute_quality());
            result.task_boundary_detected = true;
            result.drift_score = drift_score;
        }
    }

    result
}
```

---

## 10. Benchmarks and Validation

### Forgetting Resistance Metrics

```rust
/// Measure forgetting resistance on held-out test sets
pub struct ForgettingBenchmark {
    /// Per-task test sets
    task_test_sets: Vec<TestSet>,
    /// Performance history per task
    task_performance: Vec<Vec<f32>>,
}

impl ForgettingBenchmark {
    /// Evaluate the current model on all past tasks
    pub fn evaluate(&mut self, model: &FastGRNNRouter) -> ForgettingReport {
        let mut report = ForgettingReport::default();

        // Iterate by index so the immutable borrow of task_test_sets
        // does not conflict with the mutable push into task_performance
        for task_id in 0..self.task_test_sets.len() {
            let accuracy = self.evaluate_task(model, &self.task_test_sets[task_id]);
            self.task_performance[task_id].push(accuracy);

            // Forgetting = max historical accuracy - current accuracy
            let max_acc = self.task_performance[task_id].iter()
                .cloned()
                .fold(0.0f32, f32::max);
            let forgetting = (max_acc - accuracy).max(0.0);

            report.per_task_accuracy.push(accuracy);
            report.per_task_forgetting.push(forgetting);
        }

        // Average forgetting across tasks
        report.avg_forgetting = report.per_task_forgetting.iter()
            .sum::<f32>() / report.per_task_forgetting.len().max(1) as f32;

        // Backward transfer (negative forgetting = improvement)
        report.backward_transfer = -report.avg_forgetting;

        report
    }

    fn evaluate_task(&self, model: &FastGRNNRouter, test: &TestSet) -> f32 {
        let correct = test.samples.iter()
            .filter(|s| model.forward(&s.features).predicted_class == s.label)
            .count();
        correct as f32 / test.samples.len() as f32
    }
}

#[derive(Debug, Default)]
pub struct ForgettingReport {
    pub per_task_accuracy: Vec<f32>,
    pub per_task_forgetting: Vec<f32>,
    pub avg_forgetting: f32,
    pub backward_transfer: f32,
}
```
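The per-task forgetting metric is simply "best accuracy ever seen minus current accuracy, floored at zero". A standalone toy computation (helper function for illustration, not part of the benchmark type):

```rust
/// Forgetting for one task given its accuracy history: the gap between the
/// best accuracy ever reached and the latest accuracy, floored at zero
/// (an improvement counts as zero forgetting, i.e. backward transfer).
fn forgetting(history: &[f32]) -> f32 {
    let max_acc = history.iter().cloned().fold(0.0f32, f32::max);
    let current = *history.last().unwrap_or(&0.0);
    (max_acc - current).max(0.0)
}

fn main() {
    // Peaked at 0.92, now at 0.88: forgetting of 4 accuracy points.
    assert!((forgetting(&[0.90, 0.92, 0.88]) - 0.04).abs() < 1e-5);

    // Monotonic improvement means zero forgetting.
    assert_eq!(forgetting(&[0.80, 0.85, 0.91]), 0.0);
}
```

Averaging this quantity over tasks gives `avg_forgetting`, and its negation is the `backward_transfer` reported above.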

---

## Summary: EWC++ vs Standard EWC

| Feature | Standard EWC | SONA EWC++ |
|---------|--------------|------------|
| Task memory | 1 task | N tasks (configurable) |
| Fisher estimation | Offline, single pass | Online, streaming |
| Lambda | Fixed | Adaptive per task |
| Task detection | Manual | Automatic |
| Parameter importance | Uniform | Learned |
| Gradient handling | Direct | Projected |
| Forgetting rate | ~5-10% | **<0.1%** |

EWC++ enables SONA to learn continuously from every interaction while retaining near-perfect recall of past knowledge.
794
examples/ruvLLM/docs/SONA/04-REASONINGBANK.md
Normal file
@@ -0,0 +1,794 @@

# SONA ReasoningBank: Pattern-Driven Self-Optimization

## Learning from Experience Through Trajectory Analysis

---

## 1. Overview

ReasoningBank is SONA's long-term pattern memory: it learns which parameters work and applies that knowledge to optimize future decisions.

```
┌─────────────────────────────────────────────────────────────────────┐
│                        REASONINGBANK CONCEPT                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Query → [What worked before?] → Pattern Match → Optimized Params   │
│                    ↑                                                │
│                    │                                                │
│           ┌────────┴───────┐                                        │
│           │ REASONINGBANK  │                                        │
│           │                │                                        │
│           │ • Trajectories │ ← Record every query                   │
│           │ • Patterns     │ ← Extract from clusters                │
│           │ • Verdicts     │ ← What params worked best              │
│           │ • Confidence   │ ← How certain we are                   │
│           └────────────────┘                                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 2. Core Data Structures

### Trajectory: Recording Every Interaction

```rust
/// A single query trajectory with outcomes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct QueryTrajectory {
    /// Unique trajectory ID
    pub id: u64,
    /// Query embedding vector
    pub query_embedding: Vec<f32>,
    /// Search parameters used
    pub search_params: SearchParams,
    /// Retrieved result IDs
    pub retrieved_ids: Vec<String>,
    /// Precision (relevant / retrieved)
    pub precision: f32,
    /// Recall (retrieved relevant / total relevant)
    pub recall: f32,
    /// Latency in microseconds
    pub latency_us: u64,
    /// User feedback, if provided
    pub feedback: Option<UserFeedback>,
    /// Timestamp
    pub timestamp: i64,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SearchParams {
    /// ef_search parameter for HNSW
    pub ef_search: usize,
    /// Number of probes for IVF
    pub n_probes: usize,
    /// Model tier selected
    pub model_tier: ModelTier,
    /// Context window size
    pub context_tokens: usize,
    /// Temperature
    pub temperature: f32,
}
```

### Pattern: Learned Behavior Clusters

```rust
/// A learned pattern extracted from trajectory clusters
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
    /// Pattern ID
    pub id: u64,
    /// Centroid embedding (cluster center)
    pub centroid: Vec<f32>,
    /// Optimal search parameters for this pattern
    pub optimal_params: SearchParams,
    /// Confidence score (0-1)
    pub confidence: f32,
    /// Number of trajectories in the cluster
    pub support_count: usize,
    /// Average precision for the pattern
    pub avg_precision: f32,
    /// Average recall for the pattern
    pub avg_recall: f32,
    /// Average latency
    pub avg_latency_us: u64,
    /// Pattern creation timestamp
    pub created_at: i64,
    /// Last update timestamp
    pub updated_at: i64,
    /// Abstraction level (0 = concrete, higher = more abstract)
    pub abstraction_level: u32,
    /// Child pattern IDs (for hierarchical patterns)
    pub children: Vec<u64>,
}
```

### Verdict: Decision Judgments

```rust
/// Verdict on which parameters worked best
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Verdict {
    /// Pattern this verdict applies to
    pub pattern_id: u64,
    /// Recommended parameters
    pub recommended_params: SearchParams,
    /// Confidence in the recommendation
    pub confidence: f32,
    /// Evidence supporting this verdict
    pub evidence: VerdictEvidence,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct VerdictEvidence {
    /// Number of supporting trajectories
    pub support_count: usize,
    /// Average improvement over the default parameters
    pub avg_improvement: f32,
    /// Statistical significance (p-value)
    pub p_value: f32,
    /// Consistency score (low variance = high consistency)
    pub consistency: f32,
}
```

---

## 3. ReasoningBank Implementation

### Core Storage and Retrieval

```rust
use std::sync::atomic::AtomicU64;

use dashmap::DashMap;
use parking_lot::RwLock;

/// ReasoningBank: pattern-based learning and optimization
pub struct ReasoningBank {
    /// Trajectory ring buffer (recent interactions)
    trajectories: RwLock<CircularBuffer<QueryTrajectory>>,
    /// Learned patterns (concurrent hashmap)
    patterns: DashMap<u64, LearnedPattern>,
    /// Pattern index for fast similarity lookup
    pattern_index: RwLock<HNSWIndex>,
    /// Verdicts per pattern
    verdicts: DashMap<u64, Verdict>,
    /// Configuration
    config: ReasoningBankConfig,
    /// Pattern ID counter
    next_pattern_id: AtomicU64,
    /// Statistics
    stats: RwLock<ReasoningBankStats>,
}

impl ReasoningBank {
    /// Create a new ReasoningBank
    pub fn new(config: ReasoningBankConfig) -> Self {
        Self {
            trajectories: RwLock::new(CircularBuffer::new(config.trajectory_capacity)),
            patterns: DashMap::new(),
            pattern_index: RwLock::new(HNSWIndex::new(config.embedding_dim, config.ef_construction)),
            verdicts: DashMap::new(),
            config,
            next_pattern_id: AtomicU64::new(0),
            stats: RwLock::new(ReasoningBankStats::default()),
        }
    }

    /// Record a new trajectory
    #[inline]
    pub fn record_trajectory(&self, trajectory: QueryTrajectory) {
        let mut trajectories = self.trajectories.write();
        trajectories.push(trajectory);
        drop(trajectories); // Release before taking the stats lock

        // Update stats
        let mut stats = self.stats.write();
        stats.total_trajectories += 1;
    }

    /// Find the most similar patterns to a query
    pub fn find_similar_pattern(&self, query_embedding: &[f32], k: usize) -> Vec<PatternMatch> {
        let index = self.pattern_index.read();
        let neighbors = index.search(query_embedding, k, self.config.ef_search);

        neighbors.iter()
            .filter_map(|&(id, distance)| {
                self.patterns.get(&id).map(|p| PatternMatch {
                    pattern: p.clone(),
                    similarity: 1.0 - distance, // Convert distance to similarity
                })
            })
            .collect()
    }

    /// Get optimized parameters for a query
    pub fn get_optimized_params(&self, query_embedding: &[f32]) -> OptimizedParams {
        // Find similar patterns
        let matches = self.find_similar_pattern(query_embedding, self.config.top_k_patterns);

        if matches.is_empty() {
            // No matching patterns - fall back to defaults
            return OptimizedParams {
                params: SearchParams::default(),
                confidence: 0.0,
                source: ParamSource::Default,
            };
        }

        // Interpolate parameters weighted by similarity and confidence,
        // accumulating in f32 so integer fields are not truncated per term
        let mut total_weight = 0.0f32;
        let mut ef_sum = 0.0f32;
        let mut probes_sum = 0.0f32;
        let mut temp_sum = 0.0f32;

        for m in &matches {
            let weight = m.similarity * m.pattern.confidence;
            total_weight += weight;

            ef_sum += m.pattern.optimal_params.ef_search as f32 * weight;
            probes_sum += m.pattern.optimal_params.n_probes as f32 * weight;
            temp_sum += m.pattern.optimal_params.temperature * weight;
            // ... other params
        }

        let mut weighted_params = SearchParams::default();
        if total_weight > 0.0 {
            weighted_params.ef_search = (ef_sum / total_weight).round() as usize;
            weighted_params.n_probes = (probes_sum / total_weight).round() as usize;
            weighted_params.temperature = temp_sum / total_weight;
        }

        OptimizedParams {
            params: weighted_params,
            confidence: total_weight / matches.len() as f32,
            source: ParamSource::Pattern(matches[0].pattern.id),
        }
    }

    /// Record feedback for a trajectory
    pub fn record_feedback(&self, trajectory_id: u64, feedback: UserFeedback) {
        // Find the trajectory and attach the feedback
        {
            let mut trajectories = self.trajectories.write();
            if let Some(traj) = trajectories.iter_mut().find(|t| t.id == trajectory_id) {
                traj.feedback = Some(feedback.clone());
            }
        } // Release the write lock before touching patterns

        // Update the related pattern's confidence:
        // better feedback = more confidence in that pattern's params
        if let Some(pattern_id) = self.find_pattern_for_trajectory(trajectory_id) {
            if let Some(mut pattern) = self.patterns.get_mut(&pattern_id) {
                let feedback_delta = feedback.rating as f32 / 5.0 - 0.5; // -0.5 to +0.5
                pattern.confidence = (pattern.confidence + 0.1 * feedback_delta).clamp(0.0, 1.0);
            }
        }
    }
}
```
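The parameter interpolation in `get_optimized_params` is a weighted average where each matched pattern contributes in proportion to similarity x confidence. A standalone sketch of that blend for one numeric field (toy function; the real method blends several fields at once):

```rust
/// Similarity- and confidence-weighted blend of one numeric parameter
/// across matched patterns, accumulating in f32 before the final division.
fn blend(values: &[f32], similarities: &[f32], confidences: &[f32]) -> f32 {
    let mut num = 0.0f32;
    let mut den = 0.0f32;
    for ((v, s), c) in values.iter().zip(similarities).zip(confidences) {
        let w = s * c; // a close but untrusted pattern counts little
        num += v * w;
        den += w;
    }
    if den > 0.0 { num / den } else { 0.0 }
}

fn main() {
    // Two patterns recommend ef_search 64 and 128; the closer, more
    // trusted first pattern dominates the blend.
    // Weights: 0.9*1.0 = 0.9 and 0.3*0.5 = 0.15
    // -> (64*0.9 + 128*0.15) / 1.05 = 76.8 / 1.05 ≈ 73.14
    let ef = blend(&[64.0, 128.0], &[0.9, 0.3], &[1.0, 0.5]);
    assert!((ef - 73.142_86).abs() < 1e-3);
}
```

Multiplying similarity by confidence means a pattern must be both near the query and well-supported to steer the parameters; a perfect match with zero confidence contributes nothing.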

---

## 4. Pattern Extraction

### K-Means++ Clustering for Pattern Discovery

```rust
use rand::Rng;

/// Pattern extractor using K-means++ clustering
pub struct PatternExtractor {
    /// Number of clusters to extract
    k: usize,
    /// Maximum iterations
    max_iter: usize,
    /// Convergence threshold
    epsilon: f32,
}

impl PatternExtractor {
    /// Extract patterns from trajectories
    pub fn extract(&self, trajectories: &[QueryTrajectory]) -> Vec<LearnedPattern> {
        if trajectories.len() < self.k {
            return Vec::new();
        }

        // Collect embeddings
        let embeddings: Vec<&[f32]> = trajectories.iter()
            .map(|t| t.query_embedding.as_slice())
            .collect();

        // K-means++ initialization
        let mut centroids = self.kmeans_plus_plus_init(&embeddings);

        // K-means iteration
        let mut assignments = vec![0usize; trajectories.len()];
        for _ in 0..self.max_iter {
            // Assignment step
            let old_assignments = assignments.clone();
            for (i, emb) in embeddings.iter().enumerate() {
                let mut min_dist = f32::MAX;
                let mut min_idx = 0;
                for (c_idx, centroid) in centroids.iter().enumerate() {
                    let dist = euclidean_distance(emb, centroid);
                    if dist < min_dist {
                        min_dist = dist;
                        min_idx = c_idx;
                    }
                }
                assignments[i] = min_idx;
            }

            // Check convergence
            if assignments == old_assignments {
                break;
            }

            // Update step
            centroids = self.compute_centroids(&embeddings, &assignments);
        }

        // Create patterns from clusters
        let mut patterns = Vec::new();
        for cluster_id in 0..self.k {
            let cluster_trajectories: Vec<_> = trajectories.iter()
                .zip(assignments.iter())
                .filter(|(_, &a)| a == cluster_id)
                .map(|(t, _)| t)
                .collect();

            if cluster_trajectories.len() < 3 {
                continue; // Skip small clusters
            }

            let pattern = self.create_pattern_from_cluster(
                cluster_id as u64,
                &centroids[cluster_id],
                &cluster_trajectories,
            );
            patterns.push(pattern);
        }

        patterns
    }

    fn kmeans_plus_plus_init(&self, embeddings: &[&[f32]]) -> Vec<Vec<f32>> {
        let mut centroids = Vec::with_capacity(self.k);
        let mut rng = rand::thread_rng();

        // First centroid: uniformly random
        let first_idx = rng.gen_range(0..embeddings.len());
        centroids.push(embeddings[first_idx].to_vec());

        // Remaining centroids: D^2 weighting
        for _ in 1..self.k {
            let distances: Vec<f32> = embeddings.iter()
                .map(|emb| {
                    centroids.iter()
                        .map(|c| euclidean_distance(emb, c))
                        .fold(f32::MAX, f32::min)
                })
                .collect();

            // Square distances for D^2 sampling
            let total: f32 = distances.iter().map(|d| d * d).sum();
            let threshold = rng.gen::<f32>() * total;

            let mut cumsum = 0.0;
            let mut selected = 0;
            for (i, d) in distances.iter().enumerate() {
                cumsum += d * d;
                if cumsum >= threshold {
                    selected = i;
                    break;
                }
            }

            centroids.push(embeddings[selected].to_vec());
        }

        centroids
    }

    fn create_pattern_from_cluster(
        &self,
        id: u64,
        centroid: &[f32],
        trajectories: &[&QueryTrajectory],
    ) -> LearnedPattern {
        // Compute optimal params as a quality-weighted average
        let mut total_weight = 0.0f32;
        let mut ef_sum = 0.0f32;
        let mut probes_sum = 0.0f32;
        let mut temp_sum = 0.0f32;
        let mut precision_sum = 0.0f32;
        let mut recall_sum = 0.0f32;
        let mut latency_sum = 0u64;

        for t in trajectories {
            let weight = t.precision * t.recall; // Quality as weight
            total_weight += weight;

            ef_sum += t.search_params.ef_search as f32 * weight;
            probes_sum += t.search_params.n_probes as f32 * weight;
            temp_sum += t.search_params.temperature * weight;
            precision_sum += t.precision;
            recall_sum += t.recall;
            latency_sum += t.latency_us;
        }

        let n = trajectories.len() as f32;
        let total_weight = total_weight.max(1e-6); // Guard against all-zero weights

        LearnedPattern {
            id,
            centroid: centroid.to_vec(),
            optimal_params: SearchParams {
                ef_search: (ef_sum / total_weight).round() as usize,
                n_probes: (probes_sum / total_weight).round() as usize,
                model_tier: ModelTier::Auto, // Determined separately
                context_tokens: 2048, // Default
                temperature: temp_sum / total_weight,
            },
            confidence: (total_weight / n).clamp(0.0, 1.0),
            support_count: trajectories.len(),
            avg_precision: precision_sum / n,
            avg_recall: recall_sum / n,
            avg_latency_us: latency_sum / trajectories.len() as u64,
            created_at: chrono::Utc::now().timestamp(),
            updated_at: chrono::Utc::now().timestamp(),
            abstraction_level: 0,
            children: Vec::new(),
        }
    }
}
```
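The K-means++ seeding step samples each new centroid with probability proportional to its squared distance (D²) from the nearest existing centroid. A deterministic sketch of that selection (toy function that takes the sampled threshold as an argument, so it can be checked without randomness):

```rust
/// K-means++ D^2 selection: given each point's distance to its nearest
/// existing centroid, pick the index whose cumulative squared-distance
/// bucket contains the sampled threshold.
fn d2_select(distances: &[f32], threshold: f32) -> usize {
    let mut cumsum = 0.0;
    for (i, d) in distances.iter().enumerate() {
        cumsum += d * d;
        if cumsum >= threshold {
            return i;
        }
    }
    distances.len() - 1 // fallback for thresholds at the very top
}

fn main() {
    // Squared distances: [1, 4, 9], cumulative [1, 5, 14].
    // A threshold of 4 lands in the second bucket (1..=5) -> index 1;
    // a threshold of 6 lands in the third bucket (5..=14) -> index 2.
    assert_eq!(d2_select(&[1.0, 2.0, 3.0], 4.0), 1);
    assert_eq!(d2_select(&[1.0, 2.0, 3.0], 6.0), 2);
}
```

Squaring the distances biases seeding toward points far from every existing centroid, which is what gives K-means++ its spread-out initial clusters.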

---

## 5. Verdict Judgment System

### Evaluating What Works Best

```rust
/// Verdict judge for parameter optimization
pub struct VerdictJudge {
    /// Minimum samples for statistical significance
    min_samples: usize,
    /// Significance level (p-value threshold)
    alpha: f32,
}

impl VerdictJudge {
    /// Judge optimal parameters for a pattern
    pub fn judge(
        &self,
        pattern: &LearnedPattern,
        trajectories: &[&QueryTrajectory],
    ) -> Option<Verdict> {
        if trajectories.len() < self.min_samples {
            return None; // Not enough evidence
        }

        // Group trajectories by parameter configuration
        let mut param_groups: HashMap<ParamKey, Vec<&QueryTrajectory>> = HashMap::new();
        for t in trajectories {
            let key = ParamKey::from(&t.search_params);
            param_groups.entry(key).or_default().push(t);
        }

        // Find the best-performing configuration
        let mut best_config: Option<(ParamKey, f32, Vec<&QueryTrajectory>)> = None;

        for (key, group) in &param_groups {
            if group.len() < 3 {
                continue;
            }

            // Compute quality score (F1 of precision and recall)
            let avg_quality: f32 = group.iter()
                .map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
                .sum::<f32>() / group.len() as f32;

            match &best_config {
                None => best_config = Some((key.clone(), avg_quality, group.clone())),
                Some((_, best_quality, _)) if avg_quality > *best_quality => {
                    best_config = Some((key.clone(), avg_quality, group.clone()));
                }
                _ => {}
            }
        }

        let (best_key, best_quality, best_group) = best_config?;

        // Statistical significance test
        let p_value = self.compute_significance(&best_group, trajectories);
        if p_value > self.alpha {
            return None; // Not significant
        }

        // Compute consistency (inverse of coefficient of variation)
        let qualities: Vec<f32> = best_group.iter()
            .map(|t| 2.0 * t.precision * t.recall / (t.precision + t.recall + 1e-6))
            .collect();
        let mean = qualities.iter().sum::<f32>() / qualities.len() as f32;
        let variance = qualities.iter()
            .map(|q| (q - mean).powi(2))
            .sum::<f32>() / qualities.len() as f32;
        let std_dev = variance.sqrt();
        let consistency = 1.0 / (1.0 + std_dev / mean);

        // Compute improvement over default
        let default_quality = self.compute_default_quality(trajectories);
        let improvement = (best_quality - default_quality) / default_quality;

        Some(Verdict {
            pattern_id: pattern.id,
            recommended_params: best_key.to_params(),
            confidence: best_quality * consistency,
            evidence: VerdictEvidence {
                support_count: best_group.len(),
                avg_improvement: improvement,
                p_value,
                consistency,
            },
        })
    }

    fn compute_significance(&self, best: &[&QueryTrajectory], all: &[&QueryTrajectory]) -> f32 {
        // Welch's t-test for comparing means
        let best_qualities: Vec<f32> = best.iter()
            .map(|t| t.precision * t.recall)
            .collect();
        let all_qualities: Vec<f32> = all.iter()
            .map(|t| t.precision * t.recall)
            .collect();

        welch_t_test(&best_qualities, &all_qualities)
    }

    fn compute_default_quality(&self, trajectories: &[&QueryTrajectory]) -> f32 {
        // Treat trajectories that used the default `ef_search` as the baseline group
        let default_group: Vec<_> = trajectories.iter()
            .filter(|t| t.search_params.ef_search == SearchParams::default().ef_search)
            .collect();

        if default_group.is_empty() {
            0.5 // Baseline assumption
        } else {
            default_group.iter()
                .map(|t| t.precision * t.recall)
                .sum::<f32>() / default_group.len() as f32
        }
    }
}
```
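
The judge above delegates to `welch_t_test`, which this document does not define. Below is a minimal sketch, assuming the function returns an approximate two-sided p-value; it uses a normal-tail approximation in place of the true t distribution, which is adequate for moderately sized groups (a production version would use the Welch-Satterthwaite degrees of freedom and a proper t CDF):

```rust
/// Approximate two-sided Welch's t-test p-value (illustrative sketch).
fn welch_t_test(a: &[f32], b: &[f32]) -> f32 {
    fn mean_var(x: &[f32]) -> (f32, f32) {
        let n = x.len() as f32;
        let m = x.iter().sum::<f32>() / n;
        // Sample variance (Bessel-corrected)
        let v = x.iter().map(|v| (v - m).powi(2)).sum::<f32>() / (n - 1.0).max(1.0);
        (m, v)
    }

    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let se = (va / a.len() as f32 + vb / b.len() as f32).sqrt();
    if se == 0.0 {
        return 1.0; // No variance at all: no evidence of a difference
    }

    let t = (ma - mb).abs() / se;
    // Normal-tail approximation: P(Z > t) ~ 0.5 * exp(-0.717*t - 0.416*t^2)
    let p_one_sided = 0.5 * (-0.717 * t - 0.416 * t * t).exp();
    (2.0 * p_one_sided).min(1.0)
}
```

Identical groups return 1.0 (no evidence), while clearly separated groups return a p-value near zero, which is the behavior `judge` relies on when comparing `p_value` against `alpha`.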

---

## 6. Integration with Router

### Using ReasoningBank to Optimize Router Decisions

```rust
impl FastGRNNRouter {
    /// Forward pass with ReasoningBank optimization
    pub fn forward_with_reasoning(
        &self,
        features: &[f32],
        reasoning_bank: &ReasoningBank,
    ) -> RouterDecision {
        // Get pattern-based parameter suggestions
        let pattern_params = reasoning_bank.get_optimized_params(features);

        // Standard router forward
        let mut decision = self.forward(features);

        // Blend the router decision with pattern suggestions
        if pattern_params.confidence > 0.5 {
            let blend_factor = pattern_params.confidence * 0.3; // Max 30% influence

            // Interpolate temperature
            decision.temperature = (1.0 - blend_factor) * decision.temperature
                + blend_factor * pattern_params.params.temperature;

            // Context token suggestion influences context selection
            let suggested_context = pattern_params.params.context_tokens;
            let router_context = decision.context_tokens;
            decision.context_tokens = ((1.0 - blend_factor) * router_context as f32
                + blend_factor * suggested_context as f32) as usize;

            decision.reasoning_confidence = pattern_params.confidence;
            decision.reasoning_pattern_id = pattern_params.source.pattern_id();
        }

        decision
    }
}
```
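
For intuition, the 30% cap above means that even a fully confident pattern moves a parameter at most 30% of the way toward its suggestion. A standalone sketch of the interpolation (the `blend` function name is illustrative, not part of the router API):

```rust
/// Linear interpolation capped at 30% pattern influence (illustrative helper).
fn blend(router_value: f32, pattern_value: f32, confidence: f32) -> f32 {
    let blend_factor = confidence * 0.3; // confidence in [0, 1] => factor in [0, 0.3]
    (1.0 - blend_factor) * router_value + blend_factor * pattern_value
}
```

With confidence 0.8 the factor is 0.24, so a router temperature of 0.7 and a pattern suggestion of 0.2 blend to 0.58; at confidence 0.0 the router's value passes through unchanged.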

---

## 7. Pattern Consolidation and Pruning

### Managing Pattern Memory

```rust
impl ReasoningBank {
    /// Consolidate similar patterns
    pub fn consolidate_patterns(&mut self) {
        // Find similar pattern pairs
        let pattern_ids: Vec<u64> = self.patterns.iter()
            .map(|p| *p.key())
            .collect();

        let mut to_merge: Vec<(u64, u64)> = Vec::new();

        for i in 0..pattern_ids.len() {
            for j in (i + 1)..pattern_ids.len() {
                let p1 = self.patterns.get(&pattern_ids[i]).unwrap();
                let p2 = self.patterns.get(&pattern_ids[j]).unwrap();

                let similarity = cosine_similarity(&p1.centroid, &p2.centroid);
                if similarity > 0.95 {
                    // Very similar - merge
                    to_merge.push((pattern_ids[i], pattern_ids[j]));
                }
            }
        }

        // Merge patterns. Remove the absorbed pattern first so we never hold
        // two map guards (or overlapping borrows of the same entry) at once.
        for (keep_id, remove_id) in to_merge {
            let Some((_, removed)) = self.patterns.remove(&remove_id) else {
                continue; // Already merged into an earlier pattern
            };

            if let Some(mut keep) = self.patterns.get_mut(&keep_id) {
                // Weighted average of centroids
                let total_support = keep.support_count + removed.support_count;
                let w1 = keep.support_count as f32 / total_support as f32;
                let w2 = removed.support_count as f32 / total_support as f32;

                for (c, c2) in keep.centroid.iter_mut().zip(removed.centroid.iter()) {
                    *c = w1 * *c + w2 * c2;
                }

                // Update support count
                keep.support_count = total_support;
                keep.confidence = (keep.confidence * w1 + removed.confidence * w2).min(1.0);
                keep.updated_at = chrono::Utc::now().timestamp();
            }
        }
    }

    /// Prune low-confidence patterns
    pub fn prune_patterns(&mut self, min_confidence: f32, min_support: usize) {
        let to_remove: Vec<u64> = self.patterns.iter()
            .filter(|p| p.confidence < min_confidence || p.support_count < min_support)
            .map(|p| *p.key())
            .collect();

        for id in to_remove {
            self.patterns.remove(&id);
            self.verdicts.remove(&id);
        }
    }

    /// Build pattern hierarchy (abstraction levels)
    pub fn build_hierarchy(&mut self) {
        // Hierarchical clustering on existing patterns
        let patterns: Vec<_> = self.patterns.iter()
            .map(|p| (*p.key(), p.centroid.clone()))
            .collect();

        let hierarchy = HierarchicalClustering::new()
            .linkage(Linkage::Ward)
            .fit(&patterns);

        // Create meta-patterns at each level
        for level in 1..=3 {
            let clusters = hierarchy.clusters_at_level(level);

            for cluster in clusters {
                if cluster.size() > 1 {
                    let child_ids: Vec<u64> = cluster.member_ids();
                    let meta_centroid = cluster.centroid();

                    // Average params from children
                    let children: Vec<_> = child_ids.iter()
                        .filter_map(|id| self.patterns.get(id))
                        .collect();

                    let meta_params = self.average_params(&children);

                    let meta_pattern = LearnedPattern {
                        id: self.next_pattern_id.fetch_add(1, Ordering::SeqCst),
                        centroid: meta_centroid,
                        optimal_params: meta_params,
                        confidence: children.iter().map(|c| c.confidence).sum::<f32>()
                            / children.len() as f32,
                        support_count: children.iter().map(|c| c.support_count).sum(),
                        avg_precision: children.iter().map(|c| c.avg_precision).sum::<f32>()
                            / children.len() as f32,
                        avg_recall: children.iter().map(|c| c.avg_recall).sum::<f32>()
                            / children.len() as f32,
                        avg_latency_us: children.iter().map(|c| c.avg_latency_us).sum::<u64>()
                            / children.len() as u64,
                        created_at: chrono::Utc::now().timestamp(),
                        updated_at: chrono::Utc::now().timestamp(),
                        abstraction_level: level as u32,
                        children: child_ids,
                    };

                    drop(children); // Release read guards before inserting
                    self.patterns.insert(meta_pattern.id, meta_pattern);
                }
            }
        }
    }
}
```

---

## 8. Statistics and Monitoring

```rust
#[derive(Default, Debug)]
pub struct ReasoningBankStats {
    /// Total trajectories recorded
    pub total_trajectories: u64,
    /// Total patterns stored
    pub total_patterns: usize,
    /// Total verdicts issued
    pub total_verdicts: usize,
    /// Pattern match hit rate
    pub pattern_hit_rate: f32,
    /// Average confidence in recommendations
    pub avg_recommendation_confidence: f32,
    /// Improvement from pattern optimization
    pub avg_improvement_percent: f32,
}

impl ReasoningBank {
    /// Get current statistics
    pub fn stats(&self) -> ReasoningBankStats {
        let stats = self.stats.read();
        ReasoningBankStats {
            total_trajectories: stats.total_trajectories,
            total_patterns: self.patterns.len(),
            total_verdicts: self.verdicts.len(),
            pattern_hit_rate: stats.pattern_hit_rate,
            avg_recommendation_confidence: stats.avg_recommendation_confidence,
            avg_improvement_percent: stats.avg_improvement_percent,
        }
    }

    /// Export all patterns for persistence
    pub fn export(&self) -> ReasoningBankExport {
        ReasoningBankExport {
            patterns: self.patterns.iter()
                .map(|p| p.value().clone())
                .collect(),
            verdicts: self.verdicts.iter()
                .map(|v| v.value().clone())
                .collect(),
        }
    }

    /// Import patterns from persistence
    pub fn import(&mut self, export: ReasoningBankExport) {
        for pattern in export.patterns {
            let id = pattern.id;
            // Index first (borrows the pattern), then move it into the map
            self.pattern_index.write().insert(id, &pattern.centroid);
            self.patterns.insert(id, pattern);
        }
        for verdict in export.verdicts {
            self.verdicts.insert(verdict.pattern_id, verdict);
        }
    }
}
```

---

## Summary

ReasoningBank enables SONA to:

1. **Learn from every query** through trajectory recording
2. **Discover patterns** via K-means++ clustering
3. **Judge what works** through statistical verdict analysis
4. **Optimize future decisions** by interpolating from similar patterns
5. **Build abstractions** through hierarchical pattern consolidation

This creates a continuously improving system where past experience directly enhances future performance.
---

<!-- examples/ruvLLM/docs/SONA/05-MEMORY-DREAMS.md -->
# SONA Memory Dreams: Offline Consolidation Engine

## Creativity Through Neural Replay and Recombination

---

## 1. Biological Inspiration

### Why Dreams Matter for Learning

```
HUMAN SLEEP-BASED LEARNING
══════════════════════════

Awake:               Sleep (REM):            Next Day:
─────────────────    ─────────────────       ─────────────────
• New experiences    • Replay memories       • Consolidated knowledge
• Pattern matching   • Recombine ideas       • Novel insights
• Working memory     • Strengthen important  • Creative connections
                     • Prune unimportant
```

Research shows that:
- **Memory consolidation** happens during sleep
- **Creative insights** emerge from random memory replay
- **Neural pruning** removes low-value connections
- **Analogical reasoning** connects distant concepts

SONA's Dream Engine replicates these mechanisms for AI self-improvement.

---

## 2. Dream Engine Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                      DREAM ENGINE ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌───────────────┐                                                 │
│   │ MEMORY GRAPH  │──────┐                                          │
│   └───────────────┘      │                                          │
│                          ▼                                          │
│   ┌─────────────────────────────────────┐                           │
│   │           DREAM GENERATOR           │                           │
│   │                                     │                           │
│   │   ┌─────────┐      ┌─────────┐      │                           │
│   │   │ Random  │      │Weighted │      │                           │
│   │   │ Walks   │      │ Sampling│      │                           │
│   │   └────┬────┘      └────┬────┘      │                           │
│   │        │                │           │                           │
│   │        ▼                ▼           │                           │
│   │   ┌──────────────────────┐          │                           │
│   │   │    Dream Sequence    │          │                           │
│   │   │  [M₁→M₂→M₃→...→Mₙ]   │          │                           │
│   │   └──────────┬───────────┘          │                           │
│   └──────────────┼──────────────────────┘                           │
│                  │                                                  │
│                  ▼                                                  │
│   ┌─────────────────────────────────────┐                           │
│   │           DREAM EVALUATOR           │                           │
│   │                                     │                           │
│   │  • Novelty Score (new connections?) │                           │
│   │  • Coherence Score (makes sense?)   │                           │
│   │  • Utility Score (useful insight?)  │                           │
│   └─────────────────────────────────────┘                           │
│                  │                                                  │
│                  ▼                                                  │
│   ┌─────────────────────────────────────┐                           │
│   │           DREAM INTEGRATOR          │                           │
│   │                                     │                           │
│   │  • Add weak creative edges          │                           │
│   │  • Update pattern associations      │                           │
│   │  • Generate novel hypotheses        │                           │
│   └─────────────────────────────────────┘                           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 3. Dream Generation

### Random Walk Memory Replay

```rust
/// Dream generator using random walks on the memory graph
pub struct DreamGenerator {
    /// Temperature for the random walk (higher = more random)
    temperature: f32,
    /// Maximum dream length
    max_length: usize,
    /// Minimum coherence threshold
    min_coherence: f32,
    /// Creativity bias (prefer novel connections)
    creativity_bias: f32,
}

impl DreamGenerator {
    /// Generate a single dream sequence
    pub fn generate_dream(
        &self,
        memory: &MemoryGraph,
        start_node: Option<NodeId>,
    ) -> Dream {
        let mut sequence = Vec::new();
        let mut visited = HashSet::new();

        // Start from a random high-activation node if not specified
        let mut current = start_node.unwrap_or_else(|| memory.sample_by_activation());

        sequence.push(current);
        visited.insert(current);

        // Random walk with creativity-weighted transitions
        for _ in 0..self.max_length {
            let neighbors = memory.get_neighbors(current);

            if neighbors.is_empty() {
                break;
            }

            // Compute transition probabilities
            let probs: Vec<f32> = neighbors.iter()
                .map(|&(neighbor, edge_weight)| {
                    let novelty_bonus = if visited.contains(&neighbor) {
                        0.1 // Discourage revisits
                    } else {
                        1.0 + self.creativity_bias
                            * (1.0 - memory.get_access_frequency(neighbor))
                    };

                    (edge_weight * novelty_bonus).powf(1.0 / self.temperature)
                })
                .collect();

            // Sample the next node and advance the walk
            match sample_weighted(&neighbors, &probs) {
                Some((next_node, _)) => {
                    sequence.push(next_node);
                    visited.insert(next_node);
                    current = next_node;
                }
                None => break,
            }
        }

        Dream {
            sequence,
            temperature: self.temperature,
            timestamp: chrono::Utc::now().timestamp(),
        }
    }

    /// Generate a creative-jump dream (non-local connections)
    pub fn generate_creative_dream(
        &self,
        memory: &MemoryGraph,
        num_jumps: usize,
    ) -> Dream {
        let mut sequence = Vec::new();

        // Sample diverse starting points
        let anchors = memory.sample_diverse(num_jumps, 0.3);

        for anchor in anchors {
            sequence.push(anchor);

            // Short local walk from each anchor
            let local_walk = self.generate_dream(memory, Some(anchor));
            sequence.extend(local_walk.sequence.iter().skip(1).take(3));
        }

        Dream {
            sequence,
            temperature: self.temperature * 2.0, // Higher temperature for creative dreams
            timestamp: chrono::Utc::now().timestamp(),
        }
    }
}

/// A dream sequence
pub struct Dream {
    /// Sequence of visited memory nodes
    pub sequence: Vec<NodeId>,
    /// Temperature used for generation
    pub temperature: f32,
    /// Generation timestamp
    pub timestamp: i64,
}
```
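
`cosine_similarity` and `sample_weighted` are used throughout this chapter but never defined. Minimal sketches follow, assuming `NodeId` is `u64`. Note that this version of `sample_weighted` takes the uniform draw `u` as an explicit argument so it stays deterministic in tests; the walker above presumably draws internally:

```rust
type NodeId = u64; // assumption: the doc's NodeId is an integer id

/// Cosine similarity between two dense embeddings; 0.0 if either is a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Sample one `(node, edge_weight)` pair with probability proportional to `probs`.
/// `u` is a uniform draw in [0, 1), injected for determinism.
fn sample_weighted(
    neighbors: &[(NodeId, f32)],
    probs: &[f32],
    u: f32,
) -> Option<(NodeId, f32)> {
    let total: f32 = probs.iter().sum();
    if total <= 0.0 || neighbors.is_empty() {
        return None;
    }
    let mut acc = 0.0;
    for (item, &p) in neighbors.iter().zip(probs) {
        acc += p / total;
        if u < acc {
            return Some(*item);
        }
    }
    neighbors.last().copied() // Floating-point slack: fall back to the last entry
}
```

The fallback on the last entry guards against accumulated rounding error leaving `acc` just below 1.0.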

---

## 4. Dream Evaluation

### Measuring Dream Quality

```rust
/// Evaluator for dream quality
pub struct DreamEvaluator {
    /// Memory graph reference
    memory: Arc<MemoryGraph>,
    /// Novelty detection threshold
    novelty_threshold: f32,
}

impl DreamEvaluator {
    /// Evaluate dream quality across multiple dimensions
    pub fn evaluate(&self, dream: &Dream) -> DreamQuality {
        DreamQuality {
            novelty: self.compute_novelty(dream),
            coherence: self.compute_coherence(dream),
            utility: self.compute_utility(dream),
            diversity: self.compute_diversity(dream),
        }
    }

    /// Novelty: how many new connections are suggested?
    fn compute_novelty(&self, dream: &Dream) -> f32 {
        let mut novel_pairs = 0;
        let mut total_pairs = 0;

        for i in 0..dream.sequence.len() {
            for j in (i + 1)..dream.sequence.len() {
                total_pairs += 1;

                let node_a = dream.sequence[i];
                let node_b = dream.sequence[j];

                // Check if an edge already exists
                if !self.memory.has_edge(node_a, node_b) {
                    // Check semantic similarity
                    let emb_a = self.memory.get_embedding(node_a);
                    let emb_b = self.memory.get_embedding(node_b);
                    let sim = cosine_similarity(&emb_a, &emb_b);

                    // Novel = no edge but moderate similarity
                    if sim > 0.3 && sim < 0.8 {
                        novel_pairs += 1;
                    }
                }
            }
        }

        novel_pairs as f32 / total_pairs.max(1) as f32
    }

    /// Coherence: does the dream sequence make semantic sense?
    fn compute_coherence(&self, dream: &Dream) -> f32 {
        if dream.sequence.len() < 2 {
            return 1.0;
        }

        let mut coherence_sum = 0.0f32;

        for window in dream.sequence.windows(2) {
            let emb_a = self.memory.get_embedding(window[0]);
            let emb_b = self.memory.get_embedding(window[1]);
            coherence_sum += cosine_similarity(&emb_a, &emb_b);
        }

        coherence_sum / (dream.sequence.len() - 1) as f32
    }

    /// Utility: are the suggested connections potentially useful?
    fn compute_utility(&self, dream: &Dream) -> f32 {
        // Based on node quality scores and access patterns;
        // higher utility when the dream connects high-quality nodes
        dream.sequence.iter()
            .map(|&id| self.memory.get_node_quality(id))
            .sum::<f32>() / dream.sequence.len() as f32
    }

    /// Diversity: how diverse are the visited nodes?
    fn compute_diversity(&self, dream: &Dream) -> f32 {
        // Average pairwise distance in embedding space
        let embeddings: Vec<_> = dream.sequence.iter()
            .map(|&id| self.memory.get_embedding(id))
            .collect();

        let mut total_dist = 0.0f32;
        let mut count = 0;

        for i in 0..embeddings.len() {
            for j in (i + 1)..embeddings.len() {
                total_dist += 1.0 - cosine_similarity(&embeddings[i], &embeddings[j]);
                count += 1;
            }
        }

        total_dist / count.max(1) as f32
    }
}

#[derive(Debug, Clone)]
pub struct DreamQuality {
    /// How many novel connections are suggested (0-1)
    pub novelty: f32,
    /// How semantically coherent the sequence is (0-1)
    pub coherence: f32,
    /// How useful the connections might be (0-1)
    pub utility: f32,
    /// How diverse the dream content is (0-1)
    pub diversity: f32,
}

impl DreamQuality {
    /// Overall quality score
    pub fn overall(&self) -> f32 {
        // Weighted combination favoring novelty and coherence
        0.4 * self.novelty + 0.3 * self.coherence + 0.2 * self.utility + 0.1 * self.diversity
    }

    /// Is this dream worth integrating?
    pub fn is_valuable(&self, threshold: f32) -> bool {
        self.novelty > 0.3 && self.coherence > 0.4 && self.overall() > threshold
    }
}
```
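
As a worked example of the weighting above: a dream with novelty 0.5, coherence 0.6, utility 0.4, and diversity 0.7 scores 0.4·0.5 + 0.3·0.6 + 0.2·0.4 + 0.1·0.7 = 0.53, so at a threshold of 0.5 it passes `is_valuable` (novelty > 0.3, coherence > 0.4, overall > 0.5). A standalone re-computation:

```rust
/// Re-computes the `DreamQuality::overall` weighting for one example dream.
fn example_overall() -> f32 {
    let (novelty, coherence, utility, diversity) = (0.5f32, 0.6, 0.4, 0.7);
    0.4 * novelty + 0.3 * coherence + 0.2 * utility + 0.1 * diversity
}
```

Because novelty carries the largest weight, a dream that is coherent but suggests nothing new is rejected even when its other scores are high.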

---

## 5. Dream Integration

### Applying Dream Insights to Memory

```rust
/// Integrates valuable dreams into the memory graph
pub struct DreamIntegrator {
    /// Memory graph to update
    memory: Arc<RwLock<MemoryGraph>>,
    /// Strength of new creative edges
    creative_edge_strength: f32,
    /// Decay factor for dream-derived edges
    dream_edge_decay: f32,
}

impl DreamIntegrator {
    /// Integrate a valuable dream into memory
    pub fn integrate(&self, dream: &Dream, quality: &DreamQuality) -> IntegrationResult {
        let mut result = IntegrationResult::default();

        if !quality.is_valuable(0.5) {
            return result; // Skip low-quality dreams
        }

        let mut memory = self.memory.write();

        // Extract novel connections from the dream
        let novel_connections = self.extract_novel_connections(dream, &memory);

        for (node_a, node_b, strength) in novel_connections {
            // Add a weak creative edge
            let edge_strength = self.creative_edge_strength * strength * quality.overall();

            memory.add_edge(node_a, node_b, EdgeType::Creative, edge_strength);

            result.edges_added += 1;
        }

        // Update node associations based on dream co-occurrence
        for window in dream.sequence.windows(3) {
            memory.update_association(window[0], window[2], 0.01);
            result.associations_updated += 1;
        }

        result.dream_quality = quality.overall();
        result
    }

    fn extract_novel_connections(
        &self,
        dream: &Dream,
        memory: &MemoryGraph,
    ) -> Vec<(NodeId, NodeId, f32)> {
        let mut connections = Vec::new();

        for i in 0..dream.sequence.len() {
            // Only consider pairs that are nearby in the sequence
            for j in (i + 1)..dream.sequence.len().min(i + 5) {
                let node_a = dream.sequence[i];
                let node_b = dream.sequence[j];

                if !memory.has_edge(node_a, node_b) {
                    let emb_a = memory.get_embedding(node_a);
                    let emb_b = memory.get_embedding(node_b);
                    let sim = cosine_similarity(&emb_a, &emb_b);

                    if sim > 0.3 {
                        // Connection strength based on similarity and sequence proximity
                        let proximity_factor = 1.0 / (j - i) as f32;
                        connections.push((node_a, node_b, sim * proximity_factor));
                    }
                }
            }
        }

        connections
    }
}

#[derive(Default)]
pub struct IntegrationResult {
    pub edges_added: usize,
    pub associations_updated: usize,
    pub dream_quality: f32,
}
```

---

## 6. Memory Consolidation

### Strengthening Important Memories

```rust
/// Consolidation engine for memory pruning and strengthening
pub struct ConsolidationEngine {
    /// Memory graph reference
    memory: Arc<RwLock<MemoryGraph>>,
    /// Minimum access frequency for retention
    min_access_frequency: f32,
    /// Age decay factor (older = more decay)
    age_decay: f32,
    /// Quality threshold for preservation
    quality_threshold: f32,
}

impl ConsolidationEngine {
    /// Run a full consolidation pass
    pub fn consolidate(&self) -> ConsolidationReport {
        let mut report = ConsolidationReport::default();

        // Phase 1: Categorize memories by value
        let (high_value, medium_value, low_value) = self.categorize_memories();
        report.high_value_count = high_value.len();
        report.medium_value_count = medium_value.len();
        report.low_value_count = low_value.len();

        // Phase 2: Strengthen high-value memories
        for &node_id in &high_value {
            self.strengthen_memory(node_id);
            report.memories_strengthened += 1;
        }

        // Phase 3: Decay low-value memories
        for &node_id in &low_value {
            if self.decay_memory(node_id) {
                report.memories_decayed += 1;
            } else {
                report.memories_removed += 1;
            }
        }

        // Phase 4: Prune weak edges
        report.edges_pruned = self.prune_weak_edges();

        // Phase 5: Merge similar memories
        report.memories_merged = self.merge_similar_memories();

        report
    }

    fn categorize_memories(&self) -> (Vec<NodeId>, Vec<NodeId>, Vec<NodeId>) {
        let memory = self.memory.read();
        let mut high = Vec::new();
        let mut medium = Vec::new();
        let mut low = Vec::new();

        for node in memory.iter_nodes() {
            // Pass the read guard down instead of re-locking inside the helper
            let value_score = self.compute_value_score(&memory, node);

            if value_score > 0.7 {
                high.push(node.id);
            } else if value_score > 0.3 {
                medium.push(node.id);
            } else {
                low.push(node.id);
            }
        }

        (high, medium, low)
    }

    fn compute_value_score(&self, memory: &MemoryGraph, node: &MemoryNode) -> f32 {
        // Factors:
        // 1. Access frequency (more access = more valuable)
        let freq_score = (node.access_count as f32 / 100.0).min(1.0);

        // 2. Recency (recent = more valuable)
        let age_days = (chrono::Utc::now().timestamp() - node.last_accessed) / 86400;
        let recency_score = (-self.age_decay * age_days as f32).exp();

        // 3. Quality (explicit quality score)
        let quality_score = node.quality_score;

        // 4. Connectivity (well-connected = more valuable)
        let degree = memory.node_degree(node.id);
        let connectivity_score = (degree as f32 / 10.0).min(1.0);

        // Weighted combination
        0.3 * freq_score + 0.2 * recency_score + 0.3 * quality_score + 0.2 * connectivity_score
    }

    fn strengthen_memory(&self, node_id: NodeId) {
        let mut memory = self.memory.write();

        // Collect edge sources first so we don't mutate while iterating
        let sources: Vec<NodeId> = memory.get_edges_to(node_id)
            .iter()
            .map(|e| e.from)
            .collect();

        // Increase edge weights into this node
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(1.1));
        }

        // Mark as consolidated
        if let Some(node) = memory.get_node_mut(node_id) {
            node.consolidation_count += 1;
            node.last_consolidated = chrono::Utc::now().timestamp();
        }
    }

    fn decay_memory(&self, node_id: NodeId) -> bool {
        let mut memory = self.memory.write();

        // Reduce edge weights (collect sources first, as above)
        let sources: Vec<NodeId> = memory.get_edges_to(node_id)
            .iter()
            .map(|e| e.from)
            .collect();
        for from in sources {
            memory.update_edge_weight(from, node_id, EdgeUpdate::Multiply(0.5));
        }

        // Check if the node should be removed entirely
        let total_incoming_weight: f32 = memory.get_edges_to(node_id)
            .iter()
            .map(|e| e.weight)
            .sum();

        if total_incoming_weight < 0.01 {
            // Remove isolated or nearly-isolated node
            memory.remove_node(node_id);
            false // Not retained
        } else {
            true // Retained but weakened
        }
    }

    fn prune_weak_edges(&self) -> usize {
        let mut memory = self.memory.write();
        let weak_edges: Vec<_> = memory.iter_edges()
            .filter(|e| e.weight < 0.01)
            .map(|e| e.id)
            .collect();

        for edge_id in &weak_edges {
            memory.remove_edge(*edge_id);
        }

        weak_edges.len()
    }

    fn merge_similar_memories(&self) -> usize {
        let mut memory = self.memory.write();
        let mut merged_count = 0;

        // Snapshot ids and embeddings so merging doesn't invalidate the iterator
        let nodes: Vec<(NodeId, Vec<f32>)> = memory.iter_nodes()
            .map(|n| (n.id, n.embedding.clone()))
            .collect();
        let mut merged: HashSet<NodeId> = HashSet::new();

        // Find highly similar node pairs
        for i in 0..nodes.len() {
            if merged.contains(&nodes[i].0) {
                continue;
            }
            for j in (i + 1)..nodes.len() {
                if merged.contains(&nodes[j].0) {
                    continue;
                }

                let sim = cosine_similarity(&nodes[i].1, &nodes[j].1);

                if sim > 0.98 {
                    // Merge j into i
                    memory.merge_nodes(nodes[i].0, nodes[j].0);
                    merged.insert(nodes[j].0);
                    merged_count += 1;
                }
            }
        }

        merged_count
    }
}

#[derive(Default)]
pub struct ConsolidationReport {
    pub high_value_count: usize,
    pub medium_value_count: usize,
    pub low_value_count: usize,
    pub memories_strengthened: usize,
    pub memories_decayed: usize,
    pub memories_removed: usize,
    pub memories_merged: usize,
    pub edges_pruned: usize,
}
```

---

## 7. Full Dream Cycle

### Orchestrating the Dream Process

```rust
/// Complete dream cycle orchestrator
pub struct DreamCycle {
    /// Memory graph shared with the generator, evaluator, and integrator
    memory: Arc<MemoryGraph>,
    generator: DreamGenerator,
    evaluator: DreamEvaluator,
    integrator: DreamIntegrator,
    consolidator: ConsolidationEngine,
    config: DreamCycleConfig,
}

impl DreamCycle {
    /// Run a complete dream cycle (weekly maintenance)
    pub async fn run(&self) -> DreamCycleReport {
        let start = Instant::now();
        let mut report = DreamCycleReport::default();

        // Phase 1: Generate dreams
        tracing::info!("Starting dream generation phase");
        let dreams = self.generate_dreams();
        report.dreams_generated = dreams.len();

        // Phase 2: Evaluate dreams
        tracing::info!("Evaluating {} dreams", dreams.len());
        let evaluated: Vec<_> = dreams.iter()
            .map(|d| (d, self.evaluator.evaluate(d)))
            .collect();

        // Phase 3: Integrate valuable dreams
        tracing::info!("Integrating valuable dreams");
        for (dream, quality) in &evaluated {
            if quality.is_valuable(self.config.dream_threshold) {
                let result = self.integrator.integrate(dream, quality);
                report.edges_added += result.edges_added;
                report.dreams_integrated += 1;
            }
        }

        // Phase 4: Memory consolidation
        tracing::info!("Running memory consolidation");
        report.consolidation = self.consolidator.consolidate();

        report.elapsed_ms = start.elapsed().as_millis() as u64;
        report.timestamp = chrono::Utc::now().timestamp();

        tracing::info!(
            dreams = report.dreams_generated,
            integrated = report.dreams_integrated,
            edges = report.edges_added,
            elapsed_ms = report.elapsed_ms,
            "Dream cycle completed"
        );

        report
    }

    fn generate_dreams(&self) -> Vec<Dream> {
        let mut dreams = Vec::new();

        // Regular random-walk dreams
        for _ in 0..self.config.num_regular_dreams {
            dreams.push(self.generator.generate_dream(&self.memory, None));
        }

        // Creative-jump dreams
        for _ in 0..self.config.num_creative_dreams {
            dreams.push(self.generator.generate_creative_dream(
                &self.memory,
                self.config.creative_jump_count,
            ));
        }

        dreams
    }
}

#[derive(Default)]
pub struct DreamCycleReport {
    pub dreams_generated: usize,
    pub dreams_integrated: usize,
    pub edges_added: usize,
    pub consolidation: ConsolidationReport,
    pub elapsed_ms: u64,
    pub timestamp: i64,
}
```

---

## 8. Integration with exo-exotic Dreams Module

SONA integrates with the exo-ai-2025 dream experiments:

```rust
// From the exo-exotic crate
use exo_exotic::experiments::dreams::{
    DreamExperiment,
    DreamConfig,
    NoveltyMeasure,
};

impl DreamCycle {
    /// Run advanced dream experiments from exo-exotic
    pub async fn run_exotic_dreams(&self) -> ExoticDreamReport {
        let dream_experiment = DreamExperiment::new(DreamConfig {
            memory_count: self.memory.node_count(),
            replay_probability: 0.7,
            recombination_rate: 0.3,
            novelty_threshold: 0.5,
        });

        let result = dream_experiment.run(&self.memory).await;

        ExoticDreamReport {
            novelty_score: result.novelty,
            coherence_score: result.coherence,
            creative_insights: result.insights.len(),
            new_hypotheses: result.hypotheses,
        }
    }
}
```

---

## Summary

SONA's Dream Engine enables:

| Feature | Mechanism | Outcome |
|---------|-----------|---------|
| **Memory Replay** | Random walks on the memory graph | Strengthens important connections |
| **Creative Recombination** | High-temperature sampling | Discovers novel associations |
| **Quality Filtering** | Novelty + coherence metrics | Only valuable dreams are integrated |
| **Weak Edge Creation** | Dream-derived connections | Enables creative retrieval |
| **Memory Consolidation** | Value-based pruning | Efficient memory usage |

Dreams allow SONA to:

1. **Discover** connections it wouldn't find through normal operation
2. **Explore** the hypothesis space without user cost
3. **Consolidate** valuable knowledge
4. **Prune** low-value information
5. **Remain creative** while staying grounded
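
As a concrete illustration of the quality-filtering row above, the sketch below scores a dream path by embedding novelty (distance from existing memories) and coherence (similarity of consecutive steps). The function names, the equal 0.5 weighting, and the threshold are illustrative assumptions, not the SONA API:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Coherence: mean similarity between consecutive steps of a dream path.
fn coherence(path: &[Vec<f32>]) -> f32 {
    if path.len() < 2 { return 1.0; }
    let total: f32 = path.windows(2)
        .map(|w| cosine_similarity(&w[0], &w[1]))
        .sum();
    total / (path.len() - 1) as f32
}

/// Novelty: one minus the best match against existing memory embeddings.
fn novelty(candidate: &[f32], memory: &[Vec<f32>]) -> f32 {
    let best = memory.iter()
        .map(|m| cosine_similarity(candidate, m))
        .fold(f32::MIN, f32::max);
    1.0 - best.max(0.0)
}

/// A dream is integrated only if it is both novel and coherent
/// (equal weighting here is an illustrative choice).
fn is_valuable(nov: f32, coh: f32, threshold: f32) -> bool {
    0.5 * nov + 0.5 * coh > threshold
}
```

A dream that merely replays existing memories scores low on novelty; a dream of unrelated jumps scores low on coherence; only paths that balance both cross the threshold.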
# SONA Performance Benchmarks

## Overview

This document defines performance targets, benchmark methodology, and expected results for SONA components. All benchmarks are designed to be reproducible and measurable.

## Performance Targets Summary

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        SONA Performance Targets                         │
├─────────────────────────────────────────────────────────────────────────┤
│ Component               │ Target         │ Stretch Goal  │ Unit        │
├─────────────────────────┼────────────────┼───────────────┼─────────────┤
│ Micro-LoRA forward      │ <50μs          │ <20μs         │ per request │
│ Micro-LoRA update       │ <100μs         │ <50μs         │ per signal  │
│ Base LoRA forward       │ <200μs         │ <100μs        │ per layer   │
│ Pattern extraction      │ <1s            │ <500ms        │ per 1000    │
│ Trajectory recording    │ <10μs          │ <5μs          │ per step    │
│ Background cycle        │ <30s           │ <15s          │ per cycle   │
│ Deep cycle              │ <10min         │ <5min         │ per cycle   │
│ Memory overhead         │ <100MB         │ <50MB         │ total       │
│ Pattern search          │ <1ms           │ <100μs        │ per query   │
│ Dream generation        │ <100ms         │ <50ms         │ per dream   │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## Micro-LoRA Benchmarks

### Forward Pass Latency

**Target**: <50μs average, <100μs p99

```rust
// benches/micro_lora.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};

fn bench_micro_lora_forward(c: &mut Criterion) {
    let mut group = c.benchmark_group("micro_lora_forward");

    for rank in [1, 2] {
        for hidden_dim in [256, 512, 1024, 2048] {
            let lora = MicroLoRA::new(hidden_dim, rank);
            let input = vec![0.1f32; hidden_dim];
            let mut output = vec![0.0f32; hidden_dim];

            group.bench_with_input(
                BenchmarkId::new(format!("rank{}", rank), hidden_dim),
                &hidden_dim,
                |b, _| {
                    b.iter(|| {
                        output.fill(0.0);
                        unsafe { lora.forward_simd(&input, &mut output) };
                    });
                },
            );
        }
    }

    group.finish();
}
```

**Expected Results**:

| Rank | Hidden Dim | AVX2 (μs) | Scalar (μs) | Speedup |
|------|------------|-----------|-------------|---------|
| 1    | 256        | 3.2       | 12.5        | 3.9x    |
| 1    | 512        | 5.8       | 24.1        | 4.2x    |
| 1    | 1024       | 10.4      | 47.3        | 4.5x    |
| 1    | 2048       | 19.7      | 93.8        | 4.8x    |
| 2    | 256        | 5.1       | 23.4        | 4.6x    |
| 2    | 512        | 9.3       | 46.2        | 5.0x    |
| 2    | 1024      | 17.2      | 91.5        | 5.3x    |
| 2    | 2048       | 33.1      | 182.4       | 5.5x    |
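
For reference, the delta a rank-r LoRA forward pass computes is two thin matrix-vector products, `y += α · B(Ax)`, which is why latency scales roughly linearly in `rank × hidden_dim`. Below is a scalar reference sketch (not the SIMD kernel); the row-major layouts of `A` (r×d) and `B` (d×r) are assumptions for illustration:

```rust
/// Scalar reference for a rank-r LoRA delta: y += alpha * B * (A * x).
/// `a` is the down-projection (r x d, row-major); `b` the up-projection (d x r, row-major).
fn lora_forward(a: &[f32], b: &[f32], x: &[f32], y: &mut [f32], rank: usize, alpha: f32) {
    let d = x.len();

    // h = A * x: r dot products of length d.
    let mut h = vec![0.0f32; rank];
    for r in 0..rank {
        h[r] = a[r * d..(r + 1) * d].iter().zip(x).map(|(w, v)| w * v).sum();
    }

    // y += alpha * B * h: d rows, each a length-r dot product.
    for i in 0..d {
        let delta: f32 = (0..rank).map(|r| b[i * rank + r] * h[r]).sum();
        y[i] += alpha * delta;
    }
}
```

Total work is about `2 · rank · d` multiply-adds, matching the near-doubling of latency from rank 1 to rank 2 in the table.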

### Gradient Accumulation

**Target**: <100μs per signal

```rust
fn bench_gradient_accumulation(c: &mut Criterion) {
    let mut group = c.benchmark_group("gradient_accumulation");

    for hidden_dim in [256, 512, 1024] {
        let mut lora = MicroLoRA::new(hidden_dim, 1);
        let signal = LearningSignal {
            query_embedding: vec![0.1; hidden_dim],
            gradient_estimate: vec![0.01; hidden_dim],
            quality_score: 0.8,
            timestamp: Instant::now(),
            metadata: SignalMetadata::default(),
        };

        group.bench_with_input(
            BenchmarkId::from_parameter(hidden_dim),
            &hidden_dim,
            |b, _| {
                b.iter(|| {
                    lora.accumulate_gradient(&signal);
                });
            },
        );
    }

    group.finish();
}
```

**Expected Results**:

| Hidden Dim | Time (μs) | Throughput (signals/s) |
|------------|-----------|------------------------|
| 256        | 8.3       | 120,481                |
| 512        | 15.7      | 63,694                 |
| 1024       | 30.2      | 33,112                 |

---

## Base LoRA Benchmarks

### Forward Pass (Per Layer)

**Target**: <200μs per layer

```rust
fn bench_base_lora_forward(c: &mut Criterion) {
    let mut group = c.benchmark_group("base_lora_forward");

    for rank in [4, 8, 16] {
        for hidden_dim in [512, 1024, 2048] {
            let lora = BaseLoRA::new(hidden_dim, rank, 1);
            let input = vec![0.1f32; hidden_dim];
            let mut output = vec![0.0f32; hidden_dim];

            group.bench_with_input(
                BenchmarkId::new(format!("rank{}", rank), hidden_dim),
                &hidden_dim,
                |b, _| {
                    b.iter(|| {
                        lora.forward_layer(0, &input, &mut output);
                    });
                },
            );
        }
    }

    group.finish();
}
```

**Expected Results**:

| Rank | Hidden Dim | Time (μs) | FLOPs  | GFLOPS |
|------|------------|-----------|--------|--------|
| 4    | 512        | 45        | 4.2M   | 93     |
| 4    | 1024       | 85        | 8.4M   | 99     |
| 4    | 2048       | 162       | 16.8M  | 104    |
| 8    | 512        | 82        | 8.4M   | 102    |
| 8    | 1024       | 158       | 16.8M  | 106    |
| 8    | 2048       | 305       | 33.5M  | 110    |
| 16   | 512        | 155       | 16.8M  | 108    |
| 16   | 1024       | 298       | 33.5M  | 112    |
| 16   | 2048       | 582       | 67.1M  | 115    |

---

## Trajectory Recording Benchmarks

### Step Recording Latency

**Target**: <10μs per step

```rust
fn bench_trajectory_recording(c: &mut Criterion) {
    let mut group = c.benchmark_group("trajectory_recording");

    for hidden_dim in [256, 512] {
        for num_heads in [4, 8] {
            let mut builder = TrajectoryBuilder::new(1, vec![0.1; hidden_dim]);

            group.bench_with_input(
                BenchmarkId::new(format!("h{}_heads{}", hidden_dim, num_heads), hidden_dim),
                &(hidden_dim, num_heads),
                |b, &(hd, nh)| {
                    b.iter(|| {
                        builder.add_step(
                            vec![0.5; hd],
                            vec![0.1; hd * nh],
                            0.8,
                        );
                    });
                },
            );
        }
    }

    group.finish();
}
```

**Expected Results**:

| Hidden Dim | Heads | Time (μs) | Memory (bytes) |
|------------|-------|-----------|----------------|
| 256        | 4     | 2.1       | 5,120          |
| 256        | 8     | 3.8       | 9,216          |
| 512        | 4     | 3.7       | 10,240         |
| 512        | 8     | 6.9       | 18,432         |
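
The memory column is consistent with 4-byte `f32` storage for the step's activations (`hidden_dim` values) plus its attention weights (`hidden_dim × heads` values). A quick sanity check reproducing the table:

```rust
/// Approximate per-step memory: f32 activations (hidden_dim)
/// plus f32 attention weights (hidden_dim * num_heads).
fn step_bytes(hidden_dim: usize, num_heads: usize) -> usize {
    4 * (hidden_dim + hidden_dim * num_heads)
}
```

The small reward/timestamp fields add a handful of bytes on top, which the table rounds away.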

### Buffer Operations

**Target**: Lock-free with <1% contention

```rust
fn bench_trajectory_buffer(c: &mut Criterion) {
    let buffer = Arc::new(TrajectoryBuffer::new(10000));

    c.bench_function("trajectory_buffer_record", |b| {
        let trajectory = QueryTrajectory {
            id: 1,
            query_embedding: vec![0.1; 256],
            steps: vec![],
            final_quality: 0.8,
            latency_us: 1000,
        };

        b.iter(|| {
            buffer.record(trajectory.clone());
        });
    });

    c.bench_function("trajectory_buffer_drain", |b| {
        // Pre-fill buffer
        for i in 0..1000 {
            buffer.record(QueryTrajectory {
                id: i,
                query_embedding: vec![0.1; 256],
                steps: vec![],
                final_quality: 0.8,
                latency_us: 1000,
            });
        }

        b.iter(|| {
            buffer.drain()
        });
    });
}
```

---

## Pattern Learning Benchmarks

### K-means++ Extraction

**Target**: <1s for 1000 trajectories

```rust
fn bench_pattern_extraction(c: &mut Criterion) {
    let mut group = c.benchmark_group("pattern_extraction");

    for n_trajectories in [100, 500, 1000, 5000] {
        let mut bank = ReasoningBank::new(PatternConfig {
            k_clusters: 50,
            embedding_dim: 256,
            ..Default::default()
        });

        // Pre-populate
        for i in 0..n_trajectories {
            bank.add_trajectory(&generate_random_trajectory(i, 256));
        }

        group.bench_with_input(
            BenchmarkId::from_parameter(n_trajectories),
            &n_trajectories,
            |b, _| {
                b.iter(|| {
                    bank.extract_patterns()
                });
            },
        );
    }

    group.finish();
}
```

**Expected Results**:

| Trajectories | Clusters | Time (ms) | Iterations |
|--------------|----------|-----------|------------|
| 100          | 10       | 12        | 8          |
| 500          | 25       | 95        | 12         |
| 1000         | 50       | 380       | 15         |
| 5000         | 100      | 2,450     | 20         |
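
The seeding step is what distinguishes k-means++ from plain k-means: each new center is chosen to be far from the centers picked so far, which is why iteration counts stay low even at 5000 trajectories. The sketch below uses a deterministic farthest-first rule as a stand-in for k-means++'s randomized D² sampling (a simplification for illustration, not SONA's exact extractor):

```rust
/// Squared Euclidean distance between two embeddings.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Greedy farthest-first seeding: start from point 0, then repeatedly
/// add the point whose nearest chosen center is farthest away.
/// (k-means++ instead samples proportionally to that squared distance.)
fn seed_centers(points: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut centers = vec![0usize];
    while centers.len() < k {
        let next = (0..points.len())
            .max_by(|&i, &j| {
                let di = centers.iter()
                    .map(|&c| dist2(&points[i], &points[c]))
                    .fold(f32::MAX, f32::min);
                let dj = centers.iter()
                    .map(|&c| dist2(&points[j], &points[c]))
                    .fold(f32::MAX, f32::min);
                di.partial_cmp(&dj).unwrap()
            })
            .unwrap();
        centers.push(next);
    }
    centers
}
```

Good seeds place the initial centers in distinct regions of embedding space, so Lloyd iterations mostly refine rather than relocate them.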

### Pattern Search

**Target**: <1ms per query

```rust
fn bench_pattern_search(c: &mut Criterion) {
    let mut group = c.benchmark_group("pattern_search");

    for n_patterns in [1000, 10000, 100000] {
        let mut index = PatternIndex::new(256, n_patterns);

        // Pre-populate
        for i in 0..n_patterns {
            let embedding: Vec<f32> = (0..256).map(|_| rand::random()).collect();
            index.add_pattern(i as u64, &embedding).unwrap();
        }

        let query: Vec<f32> = (0..256).map(|_| rand::random()).collect();

        group.bench_with_input(
            BenchmarkId::from_parameter(n_patterns),
            &n_patterns,
            |b, _| {
                b.iter(|| {
                    index.find_similar(&query, 10)
                });
            },
        );
    }

    group.finish();
}
```

**Expected Results** (HNSW with ef=50):

| Patterns  | Search Time (μs) | Recall@10 |
|-----------|------------------|-----------|
| 1,000     | 45               | 0.98      |
| 10,000    | 120              | 0.96      |
| 100,000   | 350              | 0.94      |
| 1,000,000 | 850              | 0.92      |
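
Recall@10 in the table is measured against the exact brute-force top-10 neighbours: the fraction of the true neighbours that the approximate HNSW search actually returns. A sketch of the metric itself:

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the exact top-k neighbour ids that the
/// approximate search result contains.
fn recall_at_k(approx: &[u64], exact: &[u64]) -> f32 {
    let truth: HashSet<&u64> = exact.iter().collect();
    let hits = approx.iter().filter(|id| truth.contains(id)).count();
    hits as f32 / exact.len() as f32
}
```

Raising `ef` at query time trades search latency for recall; the table fixes ef=50 so the two columns can be compared across index sizes.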

---

## EWC++ Benchmarks

### Fisher Information Update

**Target**: <1ms per update

```rust
fn bench_fisher_update(c: &mut Criterion) {
    let mut group = c.benchmark_group("fisher_update");

    for param_count in [1000, 10000, 100000] {
        let mut ewc = EwcPlusPlus::new(EwcConfig {
            param_count,
            ..Default::default()
        });

        let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();

        group.bench_with_input(
            BenchmarkId::from_parameter(param_count),
            &param_count,
            |b, _| {
                b.iter(|| {
                    ewc.update_fisher(&gradients);
                });
            },
        );
    }

    group.finish();
}
```

**Expected Results**:

| Parameters | Update Time (μs) | Memory (KB) |
|------------|------------------|-------------|
| 1,000      | 15               | 8           |
| 10,000     | 120              | 80          |
| 100,000    | 1,150            | 800         |
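
One common way to realize `apply_constraints` is to damp each gradient component in proportion to its accumulated Fisher information, so parameters that mattered for earlier tasks move less. The damping form below is an illustrative assumption, not necessarily SONA's exact rule:

```rust
/// Damp gradients by Fisher importance: g_i / (1 + lambda * F_i).
/// High-Fisher (important) parameters receive smaller updates,
/// which is what prevents catastrophic forgetting.
fn damp_by_fisher(gradients: &[f32], fisher: &[f32], lambda: f32) -> Vec<f32> {
    gradients.iter()
        .zip(fisher)
        .map(|(g, f)| g / (1.0 + lambda * f))
        .collect()
}
```

With `lambda = 0`, the update reduces to plain SGD; as `lambda` grows, high-Fisher coordinates are effectively frozen. The cost is one multiply-divide per parameter, consistent with the sub-millisecond targets above.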

### Constraint Application

**Target**: <500μs per gradient vector

```rust
fn bench_constraint_application(c: &mut Criterion) {
    let mut group = c.benchmark_group("ewc_constraints");

    for param_count in [1000, 10000, 100000] {
        let mut ewc = EwcPlusPlus::new(EwcConfig {
            param_count,
            num_tasks: 5,
            ..Default::default()
        });

        // Pre-train Fisher
        for _ in 0..100 {
            let grads: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();
            ewc.update_fisher(&grads);
        }

        let gradients: Vec<f32> = (0..param_count).map(|_| rand::random::<f32>() * 0.01).collect();

        group.bench_with_input(
            BenchmarkId::from_parameter(param_count),
            &param_count,
            |b, _| {
                b.iter(|| {
                    ewc.apply_constraints(&gradients)
                });
            },
        );
    }

    group.finish();
}
```

---

## Dream Engine Benchmarks

### Dream Generation

**Target**: <100ms per dream

```rust
fn bench_dream_generation(c: &mut Criterion) {
    let mut group = c.benchmark_group("dream_generation");

    for memory_size in [1000, 10000, 50000] {
        let mut engine = DreamEngine::new(DreamConfig::default());

        // Pre-populate memory
        for i in 0..memory_size {
            engine.add_memory_node(MemoryNode {
                id: i as u64,
                embedding: (0..256).map(|_| rand::random()).collect(),
                timestamp: Instant::now(),
                access_count: rand::random::<u32>() % 100,
                importance: rand::random(),
            });
        }

        group.bench_with_input(
            BenchmarkId::from_parameter(memory_size),
            &memory_size,
            |b, _| {
                b.iter(|| {
                    engine.generate_dream()
                });
            },
        );
    }

    group.finish();
}
```

**Expected Results**:

| Memory Nodes | Dream Time (ms) | Avg Path Length |
|--------------|-----------------|-----------------|
| 1,000        | 12              | 8               |
| 10,000       | 45              | 12              |
| 50,000       | 85              | 15              |

### Dream Quality Evaluation

**Target**: <50ms per evaluation

```rust
fn bench_dream_evaluation(c: &mut Criterion) {
    let evaluator = DreamEvaluator::new(EvaluatorConfig::default());

    let dream = Dream {
        id: 1,
        path: (0..15).map(|i| MemoryNode {
            id: i,
            embedding: (0..256).map(|_| rand::random()).collect(),
            timestamp: Instant::now(),
            access_count: 10,
            importance: 0.5,
        }).collect(),
        creative_jumps: 3,
        total_novelty: 0.0,
    };

    c.bench_function("dream_evaluation", |b| {
        b.iter(|| {
            evaluator.evaluate(&dream)
        });
    });
}
```

---

## Learning Loop Benchmarks

### Loop A (Instant) - Per Request

**Target**: <1ms total overhead

```rust
fn bench_loop_a(c: &mut Criterion) {
    let loop_a = InstantLoop::new(256, InstantLoopConfig::default());

    let trajectory = QueryTrajectory {
        id: 1,
        query_embedding: vec![0.1; 256],
        steps: (0..10).map(|_| TrajectoryStep {
            activations: vec![0.5; 256],
            attention_weights: vec![0.1; 2048],
            reward: 0.8,
            timestamp: Instant::now(),
        }).collect(),
        final_quality: 0.8,
        latency_us: 50000,
    };

    c.bench_function("loop_a_on_inference", |b| {
        b.iter(|| {
            loop_a.on_inference(trajectory.clone());
        });
    });

    c.bench_function("loop_a_flush", |b| {
        // Pre-fill with signals
        for _ in 0..100 {
            loop_a.on_inference(trajectory.clone());
        }

        b.iter(|| {
            loop_a.flush_updates();
        });
    });
}
```

**Expected Results**:

| Operation     | Time (μs) | Notes                    |
|---------------|-----------|--------------------------|
| on_inference  | 650       | Recording + accumulation |
| flush_updates | 120       | LoRA + edge commit       |
| Total         | 770       | Per request overhead     |

### Loop B (Background) - Hourly

**Target**: <30s per cycle

```rust
fn bench_loop_b(c: &mut Criterion) {
    let runtime = tokio::runtime::Runtime::new().unwrap();

    let loop_b = BackgroundLoop::new(BackgroundLoopConfig::default(), 256);

    // Generate trajectories
    let trajectories: Vec<_> = (0..1000)
        .map(|i| generate_random_trajectory(i, 256))
        .collect();

    c.bench_function("loop_b_cycle", |b| {
        b.to_async(&runtime).iter(|| async {
            loop_b.run_cycle(trajectories.clone()).await
        });
    });
}
```

**Breakdown**:

| Phase                  | Time (s) | % of Total |
|------------------------|----------|------------|
| Trajectory ingestion   | 0.5      | 2%         |
| Pattern extraction     | 8.0      | 32%        |
| Gradient computation   | 5.0      | 20%        |
| EWC++ constraints      | 3.0      | 12%        |
| LoRA update            | 2.0      | 8%         |
| Fisher update          | 4.0      | 16%        |
| Metrics/logging        | 2.5      | 10%        |
| **Total**              | **25.0** | 100%       |

### Loop C (Deep) - Weekly

**Target**: <10min per cycle

```rust
fn bench_loop_c(c: &mut Criterion) {
    let runtime = tokio::runtime::Runtime::new().unwrap();

    let loop_c = DeepLoop::new(DeepLoopConfig::default());

    // This is a longer benchmark, run fewer iterations
    c.bench_function("loop_c_cycle", |b| {
        b.to_async(&runtime).iter(|| async {
            loop_c.run_cycle().await
        });
    });
}
```

**Breakdown**:

| Phase                  | Time (min) | % of Total |
|------------------------|------------|------------|
| Dream generation (50)  | 1.5        | 15%        |
| Φ evaluation           | 2.0        | 20%        |
| Dream integration      | 1.0        | 10%        |
| Memory consolidation   | 3.0        | 30%        |
| EWC++ consolidation    | 2.0        | 20%        |
| Metrics/persistence    | 0.5        | 5%         |
| **Total**              | **10.0**   | 100%       |

---

## Memory Benchmarks

### Memory Usage by Component

```rust
fn measure_memory_usage() -> MemoryReport {
    let mut report = MemoryReport::default();

    // Micro-LoRA (rank=1, hidden=256)
    let micro_lora = MicroLoRA::new(256, 1);
    report.micro_lora = std::mem::size_of_val(&micro_lora)
        + micro_lora.down_proj.len() * 4
        + micro_lora.up_proj.len() * 4
        + micro_lora.gradient_buffer.len() * 4;

    // Base LoRA (rank=8, hidden=256, layers=12)
    let base_lora = BaseLoRA::new(256, 8, 12);
    report.base_lora = std::mem::size_of_val(&base_lora)
        + base_lora.layers.iter().map(|l|
            l.down_proj.len() * 4 + l.up_proj.len() * 4
        ).sum::<usize>();

    // Trajectory buffer (capacity=10000)
    report.trajectory_buffer = 10000 * (
        256 * 4                               // query embedding
        + 10 * (256 * 4 + 2048 * 4 + 4 + 8)   // 10 steps
    );

    // Pattern index (100k patterns)
    report.pattern_index = 100000 * (256 * 4 + 64); // embedding + metadata

    // EWC++ (100k params, 5 tasks)
    report.ewc = 100000 * 4 * 5; // Fisher per task

    report
}
```

**Expected Memory Usage**:

| Component         | Size (MB) | Notes                 |
|-------------------|-----------|-----------------------|
| Micro-LoRA        | 0.004     | Minimal overhead      |
| Base LoRA         | 0.6       | 12 layers             |
| Trajectory Buffer | 82.0      | 10k capacity          |
| Pattern Index     | 102.4     | 100k patterns         |
| EWC++ Fisher      | 2.0       | 100k params × 5 tasks |
| Dream Engine      | 12.8      | 50k memory nodes      |
| **Total**         | **199.8** | Peak usage            |

---

## Throughput Benchmarks

### End-to-End Query Throughput

```rust
fn bench_query_throughput(c: &mut Criterion) {
    let runtime = tokio::runtime::Runtime::new().unwrap();

    let sona = runtime.block_on(async {
        SonaEngine::new(SonaConfig::default()).await.unwrap()
    });

    c.bench_function("query_throughput", |b| {
        b.to_async(&runtime).iter(|| async {
            sona.process("test query", &Context::default()).await
        });
    });
}
```

**Expected Throughput**:

| Scenario           | QPS | Latency p50 | Latency p99 |
|--------------------|-----|-------------|-------------|
| Baseline (no SONA) | 850 | 1.1ms       | 2.5ms       |
| With Micro-LoRA    | 780 | 1.2ms       | 2.8ms       |
| Full SONA          | 720 | 1.3ms       | 3.2ms       |

**Overhead**: ~15% throughput reduction for full self-learning capability.
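
The ~15% figure follows directly from the QPS column (850 → 720). A one-line helper makes the arithmetic explicit:

```rust
/// Throughput overhead relative to baseline, in percent.
fn overhead_percent(baseline_qps: f64, measured_qps: f64) -> f64 {
    (baseline_qps - measured_qps) / baseline_qps * 100.0
}
```

Applying it to the table: full SONA costs about 15.3%, while Micro-LoRA alone costs about 8.2%.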

---

## Hardware-Specific Benchmarks

### CPU Feature Detection

```rust
fn check_cpu_features() -> CpuFeatures {
    CpuFeatures {
        avx2: is_x86_feature_detected!("avx2"),
        avx512f: is_x86_feature_detected!("avx512f"),
        fma: is_x86_feature_detected!("fma"),
        sse4_1: is_x86_feature_detected!("sse4.1"),
        sse4_2: is_x86_feature_detected!("sse4.2"),
    }
}
```

### Performance by CPU

| CPU                    | Micro-LoRA (μs) | Pattern Search (μs) | Overall Speedup |
|------------------------|-----------------|---------------------|-----------------|
| Intel i9-13900K (AVX2) | 3.2             | 45                  | 4.8x            |
| AMD Ryzen 9 7950X      | 3.5             | 48                  | 4.5x            |
| Apple M2 Pro (NEON)    | 4.1             | 52                  | 3.9x            |
| Intel Xeon Platinum    | 2.8             | 38                  | 5.2x            |

---

## Benchmark Commands

```bash
# Run all benchmarks
cargo bench --package ruvllm --features sona

# Run specific benchmark group
cargo bench --package ruvllm --bench micro_lora

# Run with specific features
cargo bench --package ruvllm --features "sona,avx2"

# Profile memory
cargo bench --package ruvllm --bench memory -- --profile-time 60

# Generate flamegraph
cargo flamegraph --bench micro_lora -- --bench
```

---

## Continuous Benchmarking

### CI Integration

```yaml
# .github/workflows/bench.yml
name: Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run benchmarks
        run: cargo bench --package ruvllm --features sona -- --save-baseline main

      - name: Compare with baseline
        run: cargo bench --package ruvllm --features sona -- --baseline main

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: target/criterion
```

### Regression Detection

```rust
// Fail CI if performance regresses by more than 10%
const MAX_REGRESSION_PERCENT: f64 = 10.0;

fn check_regression(baseline: Duration, current: Duration) -> Result<(), String> {
    let regression = (current.as_nanos() as f64 / baseline.as_nanos() as f64 - 1.0) * 100.0;

    if regression > MAX_REGRESSION_PERCENT {
        Err(format!(
            "Performance regression of {:.1}% exceeds threshold of {}%",
            regression, MAX_REGRESSION_PERCENT
        ))
    } else {
        Ok(())
    }
}
```

---

## Next Steps

1. **09-API-REFERENCE.md** - Complete API documentation
# RuvLLM Documentation

## Overview

This directory contains documentation for the RuvLLM self-learning LLM architecture.

## Quick Links

- [Main README](../README.md) - Getting started, API reference, benchmarks
- [SPARC Documentation](./sparc/) - Design methodology documentation

## SPARC Methodology

The project was designed using the SPARC methodology:

| Phase | Document | Description |
|-------|----------|-------------|
| 1 | [Specification](./sparc/01-specification.md) | Requirements and acceptance criteria |
| 2 | [Pseudocode](./sparc/02-pseudocode.md) | Algorithm design and data flows |
| 3 | [Architecture](./sparc/03-architecture.md) | System design and component interactions |
| 4 | [Refinement](./sparc/04-refinement.md) | TDD implementation and iterative improvement |
| 5 | [Completion](./sparc/05-completion.md) | Integration, testing, and deployment |

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        RuvLLM System                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│   │  Embedding  │    │   Memory    │    │   Router    │         │
│   │   Service   │    │   (HNSW)    │    │  (FastGRNN) │         │
│   └──────┬──────┘    └──────┬──────┘    └──────┬──────┘         │
│          │                  │                  │                │
│          └─────────────────┼──────────────────┘                 │
│                            │                                    │
│                     ┌──────┴──────┐                             │
│                     │ Orchestrator │                            │
│                     └──────┬──────┘                             │
│                            │                                    │
│   ┌─────────────┐    ┌──────┴──────┐    ┌─────────────┐         │
│   │  Attention  │    │  Inference  │    │  Learning   │         │
│   │   Engine    │    │    Pool     │    │   Service   │         │
│   └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Module Documentation

### Core Modules

| Module | File | Description |
|--------|------|-------------|
| `orchestrator` | `src/orchestrator.rs` | Main coordinator, request processing pipeline |
| `memory` | `src/memory.rs` | HNSW-based semantic memory with graph expansion |
| `router` | `src/router.rs` | FastGRNN routing with EWC learning |
| `attention` | `src/attention.rs` | Multi-head graph attention with edge features |
| `embedding` | `src/embedding.rs` | Tokenization, embedding, and caching |
| `inference` | `src/inference.rs` | LFM2 model pool management |
| `learning` | `src/learning.rs` | Self-learning feedback loops |
| `compression` | `src/compression.rs` | Memory compression and clustering |

### Supporting Modules

| Module | File | Description |
|--------|------|-------------|
| `config` | `src/config.rs` | Configuration system with builder pattern |
| `error` | `src/error.rs` | Error types and result aliases |
| `types` | `src/types.rs` | Core domain types and structs |

## API Examples

### Basic Query

```rust
use ruvllm::{Config, RuvLLM};

let config = Config::builder().build()?;
let llm = RuvLLM::new(config).await?;
let response = llm.query("What is Rust?").await?;
```

### Session Management

```rust
let session = llm.new_session();
let r1 = llm.query_session(&session, "Tell me about vectors").await?;
let r2 = llm.query_session(&session, "How are they used in ML?").await?;
```

### Feedback Loop

```rust
use ruvllm::Feedback;

llm.feedback(Feedback {
    request_id: response.request_id,
    rating: Some(5),
    correction: None,
    task_success: Some(true),
}).await?;
```

## Performance Tuning

### Memory Configuration

```rust
Config::builder()
    .hnsw_params(
        32,   // M: connections per node (higher = better recall, more memory)
        200,  // ef_construction: build quality (higher = slower build, better index)
        64,   // ef_search: search quality (higher = slower search, better recall)
    )
```

### Router Configuration

```rust
Config::builder()
    .router_hidden_dim(128) // Hidden state size (higher = more capacity)
```

### Learning Configuration

```rust
Config::builder()
    .learning_enabled(true) // Enable self-learning
```

## Further Reading

- [LFM2 Paper](https://arxiv.org/abs/2511.23404v1) - Liquid Foundation Models
- [FastGRNN Paper](https://arxiv.org/abs/1901.02358) - Fast RNN architecture
- [HNSW Paper](https://arxiv.org/abs/1603.09320) - Approximate nearest neighbor search
- [EWC Paper](https://arxiv.org/abs/1612.00796) - Continual learning
---

`examples/ruvLLM/docs/sparc/01-specification.md` (new file, 612 lines)

---
# RuvLLM: Self-Learning LLM with LFM2 and Ruvector Integration

## SPARC Phase 1: Specification

---

## 1. Executive Summary

RuvLLM is a self-learning LLM architecture that integrates **Liquid Foundation Models (LFM2)** with **ruvector** as the world model and memory substrate. The system uses **FastGRNN** as an intelligent router to dynamically allocate computational resources based on query complexity, enabling efficient on-device inference with continuous learning capabilities.

### Core Innovation

The architecture treats:
- **LFM2** as the reasoning head (inference engine)
- **Ruvector** as the world model and episodic memory
- **FastGRNN** as the control circuit (routing decisions)

This triad creates a self-learning system where:
1. Queries are semantically embedded and matched against memory
2. Graph attention extracts relevant neighborhood context
3. FastGRNN routes to the optimal model configuration
4. LFM2 generates responses with retrieved context
5. Successful interactions are written back to memory (self-improvement)

---
## 2. Technical Requirements

### 2.1 Functional Requirements

#### FR-001: LFM2 Model Integration
- **Description**: Support the LFM2 model family (350M, 700M, 1.2B, 2.6B parameters)
- **Acceptance Criteria**:
  - Load models via llama.cpp (CPU) or vLLM (server)
  - Support quantization: Q4/Q5 (CPU), 8-bit/4-bit weight-only (GPU)
  - Enable KV cache for context reuse
  - Achieve <500ms median latency (CPU), <100ms (GPU)

#### FR-002: Ruvector Memory Service
- **Description**: Implement semantic memory with graph structure
- **Storage Schema**:
  ```
  Nodes: {
    id: UUID,
    vector: [f32; D],   // D = embedding dimension
    text: String,
    type: NodeType,     // Query | Document | AgentStep | Fact
    source: String,
    metadata: {
      timestamp: i64,
      tags: Vec<String>,
      domain: String,
      version: u32,
      confidence: f32
    }
  }

  Edges: {
    id: UUID,
    src: UUID,
    dst: UUID,
    rel: EdgeType,      // Cites | Follows | SameTopic | AgentStep | Derived
    weight: f32,
    metadata: {
      timestamp: i64,
      created_by: String,
      confidence: f32
    }
  }
  ```
- **Acceptance Criteria**:
  - HNSW index with M=32, efConstruction=200, efSearch=64
  - Sub-millisecond retrieval for k≤64
  - Graph attention over 2-hop neighborhoods
  - Support billion-scale corpora
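As a sketch only, the schema above maps naturally onto plain Rust types. Names follow the spec; the concrete representations (`String` ids instead of a UUID type, `Vec<f32>` vectors) are assumptions for illustration, not the ruvector API:

```rust
// Illustrative Rust mirror of the FR-002 storage schema.
pub enum NodeType { Query, Document, AgentStep, Fact }
pub enum EdgeType { Cites, Follows, SameTopic, AgentStep, Derived }

pub struct NodeMetadata {
    pub timestamp: i64,
    pub tags: Vec<String>,
    pub domain: String,
    pub version: u32,
    pub confidence: f32,
}

pub struct Node {
    pub id: String,          // UUID in the real schema
    pub vector: Vec<f32>,    // length D = embedding dimension
    pub text: String,
    pub node_type: NodeType,
    pub source: String,
    pub metadata: NodeMetadata,
}

pub struct Edge {
    pub id: String,
    pub src: String,
    pub dst: String,
    pub rel: EdgeType,
    pub weight: f32,
}
```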

#### FR-003: FastGRNN Router
- **Description**: Implement a gated recurrent router for intelligent resource allocation
- **Architecture** (per Kusupati et al.):
  - Hidden size: 32-64 units
  - Input: fixed-length feature vector (~128 dims)
  - Outputs: model_selection, context_size, temperature, top_p
- **Feature Vector Components** (128 dimensions):
  ```
  Query Stats [32 dims]:
    - token_count: f32
    - language_id: [f32; 8] (one-hot)
    - domain_encoding: [f32; 16]
    - user_frequency: f32
    - query_type: [f32; 6] (factual/reasoning/creative/...)

  Embedding Stats [16 dims]:
    - l2_norm: f32
    - principal_components: [f32; 8]
    - entropy: f32
    - sparsity: f32
    - cluster_assignment: [f32; 4]

  HNSW Search Stats [48 dims]:
    - k_retrieved: f32
    - distances: { mean, std, min, max }: [f32; 4]
    - entropy: f32
    - graph_depth: f32
    - recall_estimate: f32
    - neighborhood_density: [f32; 16]
    - semantic_coherence: [f32; 24]

  System Constraints [32 dims]:
    - latency_budget: f32
    - device_class: [f32; 4] (edge/mobile/server/cluster)
    - privacy_level: [f32; 4]
    - memory_available: f32
    - battery_level: f32 (for mobile)
    - concurrent_requests: f32
    - historical_accuracy: [f32; 16]
  ```

#### FR-004: Self-Learning Pipeline
- **Description**: Implement continuous learning with forgetting mitigation
- **Components**:
  - Online learning from successful interactions
  - Elastic Weight Consolidation (EWC) for catastrophic-forgetting prevention
  - Experience replay with reservoir sampling
  - Curriculum learning for progressive complexity
- **Acceptance Criteria**:
  - Quality regret <0.1 points vs. always-big baseline
  - No measurable forgetting over 10K update cycles
  - Router accuracy >95% for seen patterns

#### FR-005: Graph Attention Engine
- **Description**: Context extraction via graph-aware attention
- **Mechanism**:
  - Multi-head attention over retrieved nodes
  - Edge-weighted aggregation (confidence, recency)
  - Hyperbolic embeddings for hierarchical relationships
  - 2-hop neighborhood expansion
- **Integration with existing ruvector-attention**:
  - Leverage `EdgeFeaturedAttention` for edge attributes
  - Use `GraphRoPE` for positional encoding on graphs
  - Apply `DualSpaceAttention` for multi-manifold reasoning

### 2.2 Non-Functional Requirements

#### NFR-001: Performance

| Metric | Tier A (Server) | Tier B (Edge) | Tier C (Mobile) |
|--------|-----------------|---------------|-----------------|
| P50 Latency | <200ms | <500ms | <800ms |
| P99 Latency | <1s | <2s | <5s |
| Throughput | 100 QPS | 20 QPS | 5 QPS |
| Memory | <16GB | <4GB | <1GB |

#### NFR-002: Quality
- **Accuracy**: F1 >0.85 on QA benchmarks
- **Retrieval**: R@10 >0.90 for relevant documents
- **Router**: Decision accuracy >95%
- **Judge Rating**: 4.2+/5.0 on LLM-as-judge evaluations

#### NFR-003: Scalability
- Support 10M+ vectors in memory
- Support 1B+ vectors with hybrid indexing
- Linear scaling with node count in cluster mode

#### NFR-004: Reliability
- Zero data loss on graceful shutdown
- Recovery from OOM within 30s
- Automatic failover in cluster mode

---
## 3. LFM2 Deep Dive

### 3.1 Architecture Analysis

LFM2 employs a **hybrid backbone** combining:

1. **Gated Short Convolutions**: Lightweight local feature processing
   - O(n) complexity vs. O(n²) for attention
   - Captures local patterns efficiently
   - Enables 2x faster prefill on CPUs

2. **Grouped Query Attention (GQA)**: Reduced KV heads
   - 4-8 KV heads vs. 32+ in standard attention
   - Maintains quality with 4x memory reduction
   - Critical for edge deployment

### 3.2 Training Methodology

LFM2's training is relevant for our self-learning pipeline:

1. **Knowledge Distillation**: Tempered, decoupled Top-K
   - Teacher: large model (70B+)
   - Student: LFM2 variants
   - **Insight**: We can distill router decisions from an expensive oracle

2. **Curriculum Learning**: Progressive complexity
   - Start with simple factual queries
   - Graduate to multi-step reasoning
   - **Application**: Router training follows the same progression

3. **Three-Stage Post-Training**:
   - SFT: supervised fine-tuning on quality data
   - DPO: direct preference optimization
   - Model merging: combine specialists
   - **Application**: We merge domain-specific adapters

### 3.3 Multimodal Extensions (Future)

- **LFM2-VL**: Vision-language (image understanding)
- **LFM2-Audio**: Speech I/O
- **LFM2-ColBERT**: Low-latency retrieval encoder

---
## 4. Ruvector Integration Analysis

### 4.1 Existing Capabilities

| Component | Status | Integration Plan |
|-----------|--------|------------------|
| ruvector-core | ✅ Production | Primary vector store |
| ruvector-gnn | ✅ Production | Graph neural layer |
| ruvector-attention | ✅ Production | Attention mechanisms |
| ruvector-router-core | ✅ Production | Base routing |
| ruvector-graph | ✅ Production | Knowledge graph |

### 4.2 Required Extensions

#### 4.2.1 Embedding Adapter
```rust
pub struct EmbeddingAdapter {
    /// LFM2 encoder for query embedding
    lfm2_encoder: Lfm2Encoder,
    /// Dimension alignment layer
    projection: Linear,
    /// Normalization
    layer_norm: LayerNorm,
}

impl EmbeddingAdapter {
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let raw = self.lfm2_encoder.encode(text);
        let projected = self.projection.forward(&raw);
        self.layer_norm.forward(&projected)
    }
}
```
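The final `layer_norm.forward` step above can be sketched standalone. This is a plain layer norm without the learned gain/bias parameters (that omission is an assumption for brevity):

```rust
/// Normalizes a vector to zero mean and unit variance, as in the
/// normalization step of `EmbeddingAdapter::embed` (learned scale/shift omitted).
pub fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let denom = (var + eps).sqrt();
    x.iter().map(|v| (v - mean) / denom).collect()
}
```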

#### 4.2.2 Memory Writeback Service
```rust
pub struct MemoryWriteback {
    /// Quality threshold for writeback
    quality_threshold: f32,
    /// Deduplication via MinHash
    dedup_hasher: MinHasher,
    /// Conflict resolution
    merger: ConflictMerger,
}

impl MemoryWriteback {
    pub async fn maybe_write(
        &self,
        query: &str,
        response: &str,
        quality_score: f32,
        db: &VectorDB,
    ) -> Result<Option<UUID>> {
        if quality_score < self.quality_threshold {
            return Ok(None);
        }

        // Check for near-duplicates
        let embedding = embed(query, response);
        let similar = db.search_threshold(&embedding, 0.95)?;
        if !similar.is_empty() {
            return self.merger.resolve(similar, query, response);
        }

        // Insert new memory
        let entry = VectorEntry::new(embedding)
            .with_text(format!("Q: {}\nA: {}", query, response))
            .with_metadata(json!({
                "type": "qa_pair",
                "quality": quality_score,
                "timestamp": now(),
            }));

        Ok(Some(db.insert(entry)?))
    }
}
```
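The near-duplicate check hinges on a similarity threshold. A standalone cosine-similarity helper makes the decision explicit (the 0.95 cutoff comes from the code above; the helper itself is illustrative, not the ruvector API):

```rust
/// Cosine similarity between two embeddings; zero vectors score 0.0.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// A score at or above the writeback threshold marks a near-duplicate.
pub fn is_near_duplicate(a: &[f32], b: &[f32]) -> bool {
    cosine_similarity(a, b) >= 0.95
}
```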

### 4.3 HNSW Parameter Tuning

Based on arXiv:2511.23404v1 insights on retrieval efficiency:

| Corpus Size | M | efConstruction | efSearch | Recall@10 |
|-------------|---|----------------|----------|-----------|
| <100K | 16 | 100 | 32 | 0.98 |
| 100K-1M | 32 | 200 | 64 | 0.96 |
| 1M-10M | 48 | 300 | 128 | 0.94 |
| 10M-100M | 64 | 400 | 256 | 0.92 |
| >100M | Hybrid | Tiered | Adaptive | 0.90 |
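The tiers above translate directly into a selection helper. A sketch only: the `>100M` hybrid/tiered row has no fixed numbers, so falling back to the largest fixed values there is an assumption:

```rust
/// HNSW parameters (M, ef_construction, ef_search) chosen from the
/// corpus-size tiers in the table above.
pub fn hnsw_params(corpus_size: u64) -> (usize, usize, usize) {
    match corpus_size {
        0..=99_999 => (16, 100, 32),
        100_000..=999_999 => (32, 200, 64),
        1_000_000..=9_999_999 => (48, 300, 128),
        10_000_000..=99_999_999 => (64, 400, 256),
        // >100M uses hybrid/tiered indexing in practice; largest fixed
        // values stand in here.
        _ => (64, 400, 256),
    }
}
```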

---
## 5. FastGRNN Router Specification

### 5.1 Mathematical Formulation

FastGRNN (Fast, Accurate, Stable, and Tiny GRU):

```
z_t = σ(W_z · x_t + U_z · h_{t-1} + b_z)
h̃_t = tanh(W_h · x_t + U_h · (r_t ⊙ h_{t-1}) + b_h)
h_t = (ζ · (1 - z_t) + ν) ⊙ h̃_t + z_t ⊙ h_{t-1}

where:
- ζ, ν: learned scalars (typically ζ≈1, ν≈0.5)
- W_z, W_h: input weight matrices (sparse)
- U_z, U_h: recurrent weight matrices (low-rank)
- r_t: optional reset gate (can be fixed to 1)
```
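A dense, single-step sketch of the update above, with the reset gate r_t fixed to 1 as the note allows. Dense matrices are an assumption here; the spec's sparse and low-rank variants change storage and FLOPs, not the math:

```rust
fn sigmoid(x: f32) -> f32 { 1.0 / (1.0 + (-x).exp()) }

/// Row-major matrix-vector product (hidden x input layout).
fn matvec(m: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    m.iter().map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum()).collect()
}

/// One FastGRNN-style step: z_t gates how much old state survives,
/// and (ζ·(1-z_t)+ν) scales the candidate h̃_t.
pub fn fastgrnn_step(
    x: &[f32], h: &[f32],
    w_z: &[Vec<f32>], u_z: &[Vec<f32>], b_z: &[f32],
    w_h: &[Vec<f32>], u_h: &[Vec<f32>], b_h: &[f32],
    zeta: f32, nu: f32,
) -> Vec<f32> {
    let zx = matvec(w_z, x);
    let zh = matvec(u_z, h);
    let cx = matvec(w_h, x);
    let ch = matvec(u_h, h);
    (0..h.len())
        .map(|i| {
            let z = sigmoid(zx[i] + zh[i] + b_z[i]);
            let h_tilde = (cx[i] + ch[i] + b_h[i]).tanh();
            (zeta * (1.0 - z) + nu) * h_tilde + z * h[i]
        })
        .collect()
}
```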

### 5.2 Output Heads

```rust
pub struct RouterOutputs {
    /// Model selection: [350M, 700M, 1.2B, 2.6B] probabilities
    pub model_probs: [f32; 4],
    /// Context size bins: [256, 512, 1024, 2048, 4096] tokens
    pub context_probs: [f32; 5],
    /// Temperature: continuous [0.0, 2.0]
    pub temperature: f32,
    /// Top-p: continuous [0.0, 1.0]
    pub top_p: f32,
    /// Confidence score
    pub confidence: f32,
}
```
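Turning the head outputs into a concrete decision is a pair of argmaxes. A sketch: the model names and context bins come from the comments above, the helper itself is illustrative:

```rust
const MODELS: [&str; 4] = ["350m", "700m", "1.2b", "2.6b"];
const CONTEXTS: [usize; 5] = [256, 512, 1024, 2048, 4096];

fn argmax(xs: &[f32]) -> usize {
    let mut best = 0;
    for (i, v) in xs.iter().enumerate() {
        if *v > xs[best] { best = i; }
    }
    best
}

/// Maps the router head probabilities to a (model, context-size) decision.
pub fn decide(model_probs: &[f32; 4], context_probs: &[f32; 5]) -> (&'static str, usize) {
    (MODELS[argmax(model_probs)], CONTEXTS[argmax(context_probs)])
}
```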

### 5.3 Training Protocol

**Phase 1: Data Collection**
```
For each query q:
  1. Run all model configurations (expensive baseline)
  2. Collect quality metrics Q, latency L, cost C
  3. Compute utility: U = Q - λ·L - μ·C
  4. Label: y_model = argmax(U), y_ctx = min viable context
```
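Steps 3-4 of the collection loop can be sketched directly; λ and μ are the spec's trade-off weights, with the values in the test chosen purely for illustration:

```rust
/// Computes utility U = Q - λ·L - μ·C per configuration and
/// returns the index of the best one (the y_model label).
pub fn label_best(quality: &[f32], latency: &[f32], cost: &[f32], lambda: f32, mu: f32) -> usize {
    let mut best = 0;
    let mut best_u = f32::NEG_INFINITY;
    for i in 0..quality.len() {
        let u = quality[i] - lambda * latency[i] - mu * cost[i];
        if u > best_u {
            best_u = u;
            best = i;
        }
    }
    best
}
```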

**Phase 2: Supervised Training**
```
Loss = CE(model_pred, y_model)
     + CE(ctx_pred, y_ctx)
     + α·SmoothL1(temp_pred, y_temp)
     + β·SmoothL1(top_p_pred, y_top_p)
```

**Phase 3: Online Refinement**
```
Every N requests:
  1. Sample exploration (ε-greedy or Thompson)
  2. Compute regret vs. oracle
  3. Update weights with importance sampling
  4. Apply EWC regularization
```

---
## 6. Self-Learning Mechanisms

### 6.1 Continual Learning Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   Self-Learning Pipeline                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│  │  Query  │──▶│ Retrieve│──▶│ Generate│──▶│ Evaluate│      │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘      │
│       │             │             │             │           │
│       │             │             │             ▼           │
│       │             │             │        ┌─────────┐      │
│       │             │             │        │ Quality │      │
│       │             │             │        │  > θ ?  │      │
│       │             │             │        └────┬────┘      │
│       │             │             │             │           │
│       │             │             │      ┌──────┴──────┐    │
│       │             │             │      ▼             ▼    │
│       │             │             │  ┌───────┐   ┌───────┐  │
│       │             │             │  │ Write │   │ Skip  │  │
│       │             │             │  │ Back  │   │       │  │
│       │             │             │  └───┬───┘   └───────┘  │
│       │             │             │      │                  │
│       ▼             ▼             ▼      ▼                  │
│  ┌─────────────────────────────────────────────┐            │
│  │         Replay Buffer (Reservoir)           │            │
│  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐    │            │
│  │  │ E_1 │ │ E_2 │ │ ... │ │E_n-1│ │ E_n │    │            │
│  │  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘    │            │
│  └──────────────────────┬──────────────────────┘            │
│                         │                                   │
│                         ▼                                   │
│  ┌─────────────────────────────────────────────┐            │
│  │          EWC Regularization Layer           │            │
│  │                                             │            │
│  │  L_total = L_task + λ·Σ F_i·(θ_i - θ*_i)²   │            │
│  │                                             │            │
│  │  F_i  = Fisher Information (importance)     │            │
│  │  θ*_i = Optimal weights from previous task  │            │
│  └─────────────────────────────────────────────┘            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### 6.2 Quality Evaluation

**LLM-as-Judge Protocol**:
```rust
pub struct QualityJudge {
    judge_model: Lfm2, // Use 2.6B for judging
    rubric: JudgeRubric,
}

impl QualityJudge {
    pub fn evaluate(&self, query: &str, response: &str, context: &[&str]) -> f32 {
        let prompt = format!(r#"
Evaluate the response quality on a scale of 1-5:

Query: {query}
Retrieved Context: {context:?}
Response: {response}

Criteria:
1. Factual accuracy (grounded in context)
2. Completeness (addresses the query fully)
3. Coherence (logical flow)
4. Conciseness (no unnecessary verbosity)

Score (1-5):
"#);

        let score_str = self.judge_model.generate(&prompt, 10);
        parse_score(&score_str)
    }
}
```
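The `parse_score` helper is left undefined above. A defensive sketch that scans the generation for the first number and clamps it to the 1-5 rubric; the output format and the neutral 3.0 fallback are assumptions about the judge's behavior:

```rust
/// Extracts the first numeric token from judge output and clamps it to [1, 5].
/// Returns a neutral 3.0 when no number is found.
pub fn parse_score(s: &str) -> f32 {
    for token in s.split(|c: char| !(c.is_ascii_digit() || c == '.')) {
        if let Ok(v) = token.parse::<f32>() {
            return v.clamp(1.0, 5.0);
        }
    }
    3.0
}
```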

### 6.3 Forgetting Mitigation

**Elastic Weight Consolidation (EWC)**:

```rust
// From ruvector-gnn ewc module
pub struct ElasticWeightConsolidation {
    lambda: f32,               // Regularization strength
    fisher_info: Vec<f32>,     // Fisher information diagonal
    optimal_weights: Vec<f32>, // θ* from previous task
}

impl ElasticWeightConsolidation {
    pub fn regularization_loss(&self, current_weights: &[f32]) -> f32 {
        self.fisher_info.iter()
            .zip(current_weights.iter())
            .zip(self.optimal_weights.iter())
            .map(|((f, w), w_star)| f * (w - w_star).powi(2))
            .sum::<f32>() * self.lambda / 2.0
    }

    pub fn update_fisher(&mut self, gradients: &[Vec<f32>]) {
        // Fisher ≈ E[(∇ log P(y|x;θ))²], estimated per parameter
        // as the mean of squared gradient samples
        for (i, grad_samples) in gradients.iter().enumerate() {
            self.fisher_info[i] = grad_samples.iter()
                .map(|g| g.powi(2))
                .sum::<f32>() / grad_samples.len() as f32;
        }
    }
}
```
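A standalone check of the penalty formula L_ewc = (λ/2)·Σ F_i·(θ_i - θ*_i)², mirroring `regularization_loss` above as a free function so the arithmetic can be verified by hand:

```rust
/// EWC quadratic penalty: (λ/2) · Σ F_i (θ_i − θ*_i)².
pub fn ewc_penalty(lambda: f32, fisher: &[f32], theta: &[f32], theta_star: &[f32]) -> f32 {
    fisher.iter()
        .zip(theta)
        .zip(theta_star)
        .map(|((f, w), w_star)| f * (w - w_star).powi(2))
        .sum::<f32>() * lambda / 2.0
}
```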

---
## 7. Performance Optimization Strategy

### 7.1 LFM2 Level

| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Model selection | 2-4x | <1% | FastGRNN router |
| KV cache reuse | 1.5-2x | 0% | llama.cpp native |
| Q4 quantization | 2-3x | <2% | GGUF format |
| Speculative decode | 1.3-1.5x | 0% | Draft model |
| Continuous batching | 2-4x | 0% | vLLM |

### 7.2 Ruvector Level

| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| HNSW tuning | Variable | Recall tradeoff | efSearch adjustment |
| Product quantization | 4-8x memory | <5% | PQ in ruvector-core |
| Graph pruning | 1.2-1.5x | <1% | Edge weight threshold |
| Batch retrieval | 2-3x | 0% | Parallel HNSW |
| Caching | 10x+ (hits) | 0% | LRU with TTL |

### 7.3 Router Level

| Optimization | Speedup | Quality Impact | Implementation |
|--------------|---------|----------------|----------------|
| Sparse weights | 10-50x | <0.5% | Magnitude pruning |
| Low-rank U | 2-4x | <0.5% | SVD decomposition |
| Int8 quantization | 2-4x | <0.1% | Post-training quant |
| Cascade routing | 1.5-2x | 0% | Early exit |

---
## 8. Success Metrics

### 8.1 Primary Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| End-to-end latency P50 | <500ms | Timer instrumentation |
| Quality (LLM judge) | 4.2+/5.0 | Automated evaluation |
| Router accuracy | >95% | Oracle comparison |
| Memory efficiency | <4GB (edge) | RSS monitoring |
| Throughput | 20 QPS (edge) | Load testing |

### 8.2 Secondary Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Retrieval R@10 | >0.90 | Benchmark suite |
| Forgetting rate | <5%/10K updates | Periodic eval |
| Cost reduction | >50% vs baseline | Token counting |
| Writeback rate | 10-30% | Database metrics |

### 8.3 Regret Analysis

```
Quality Regret = E[Q_baseline - Q_routed]
Latency Regret = E[L_routed - L_oracle]
Cost Regret    = E[C_routed - C_oracle]

Targets:
- Quality Regret < 0.1 points (1-5 scale)
- Latency Regret < 50ms
- Cost Regret < 10%
```

---
## 9. Risk Analysis

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Router misprediction | Medium | High | Confidence thresholds, fallback |
| Catastrophic forgetting | Low | Critical | EWC, replay buffer, checkpoints |
| Memory exhaustion | Medium | High | Streaming, tiered storage |
| Quality degradation | Medium | High | A/B testing, rollback |
| Latency spikes | High | Medium | Caching, async processing |

---
## 10. Dependencies

### 10.1 Internal Dependencies

```toml
[dependencies]
ruvector-core = { path = "../ruvector-core" }
ruvector-gnn = { path = "../ruvector-gnn" }
ruvector-attention = { path = "../ruvector-attention" }
ruvector-graph = { path = "../ruvector-graph" }
ruvector-router-core = { path = "../ruvector-router-core" }
```

### 10.2 External Dependencies

```toml
[dependencies]
# LLM runtime
llama-cpp-rs = "0.3" # CPU inference
tokenizers = "0.15"  # Fast tokenization

# Async runtime
tokio = { version = "1.41", features = ["full"] }

# Serialization
serde = { version = "1.0", features = ["derive"] }

# Metrics
prometheus = "0.13"
tracing = "0.1"
```

---
## 11. References

1. **LFM2 Technical Report**: arXiv:2511.23404v1
2. **FastGRNN**: Kusupati et al., "FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network" (arXiv:1901.02358)
3. **EWC**: Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks" (arXiv:1612.00796)
4. **HNSW**: Malkov & Yashunin, "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" (arXiv:1603.09320)
5. **Graph Attention**: Veličković et al., "Graph Attention Networks" (arXiv:1710.10903)

---

*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*
---

`examples/ruvLLM/docs/sparc/02-pseudocode.md` (new file, 1098 lines; diff suppressed because it is too large)
`examples/ruvLLM/docs/sparc/03-architecture.md` (new file, 1353 lines; diff suppressed because it is too large)
`examples/ruvLLM/docs/sparc/04-refinement.md` (new file, 1159 lines; diff suppressed because it is too large)
`examples/ruvLLM/docs/sparc/05-completion.md` (new file, 886 lines)

---
# RuvLLM: Integration and Deployment

## SPARC Phase 5: Completion

---

## 1. Integration Strategy

### 1.1 Crate Structure

```
ruvector/
├── crates/
│   ├── ruvector-core/          # Existing: Vector DB
│   ├── ruvector-gnn/           # Existing: GNN + EWC + Replay
│   ├── ruvector-attention/     # Existing: Attention mechanisms
│   ├── ruvector-graph/         # Existing: Graph storage
│   └── ruvector-router-core/   # Existing: Routing primitives
│
└── examples/
    └── ruvLLM/                 # NEW: Self-learning LLM
        ├── src/
        │   ├── lib.rs          # Main library entry
        │   ├── orchestrator.rs # Request orchestration
        │   ├── embedding.rs    # LFM2 embedding service
        │   ├── router.rs       # FastGRNN router
        │   ├── memory.rs       # Ruvector memory layer
        │   ├── attention.rs    # Graph attention wrapper
        │   ├── inference.rs    # LFM2 model pool
        │   ├── learning.rs     # Self-learning service
        │   ├── compression.rs  # Concept abstraction
        │   ├── config.rs       # Configuration
        │   ├── types.rs        # Core types
        │   └── error.rs        # Error handling
        ├── tests/
        │   ├── unit/
        │   └── integration/
        ├── benches/
        ├── config/
        └── docs/               # SPARC documentation
```
### 1.2 Dependency Integration

```toml
# examples/ruvLLM/Cargo.toml
[package]
name = "ruvllm"
version = "0.1.0"
edition = "2021"
description = "Self-learning LLM with LFM2 and Ruvector integration"

[dependencies]
# Internal dependencies (path-based for development)
ruvector-core = { path = "../../crates/ruvector-core" }
ruvector-gnn = { path = "../../crates/ruvector-gnn" }
ruvector-attention = { path = "../../crates/ruvector-attention" }
ruvector-graph = { path = "../../crates/ruvector-graph" }
ruvector-router-core = { path = "../../crates/ruvector-router-core" }

# LLM inference
llama-cpp-rs = "0.3" # CPU inference via llama.cpp
tokenizers = "0.15"  # Fast tokenization

# Async runtime
tokio = { version = "1.41", features = ["full"] }
futures = "0.3"

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "2.0.0-rc.3"

# Numerics
ndarray = { version = "0.16", features = ["serde"] }
rand = "0.8"

# Utilities
uuid = { version = "1.11", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
thiserror = "2.0"
anyhow = "1.0"
tracing = "0.1"

# Performance
dashmap = "6.1"
parking_lot = "0.12"
lru = "0.12"

# Metrics
prometheus = "0.13"

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
proptest = "1.5"
tokio-test = "0.4"
tempfile = "3.13"
tracing-subscriber = "0.3"

[features]
default = ["cpu"]
cpu = []       # llama.cpp CPU inference
gpu = ["vllm"] # vLLM GPU inference (optional)
vllm = []

[[bench]]
name = "pipeline"
harness = false

[[bench]]
name = "router"
harness = false

[[bench]]
name = "memory"
harness = false
```
### 1.3 API Surface

```rust
//! # RuvLLM - Self-Learning LLM
//!
//! A self-learning language model system integrating LFM2 with Ruvector.
//!
//! ## Architecture
//!
//! - **LFM2**: Frozen reasoning engine (350M-2.6B parameters)
//! - **Ruvector**: Living memory that adapts continuously
//! - **FastGRNN**: Control circuit for intelligent routing
//!
//! ## Quick Start
//!
//! ```rust,ignore
//! use ruvllm::{RuvLLM, Config};
//!
//! #[tokio::main]
//! async fn main() -> Result<()> {
//!     // Initialize system
//!     let config = Config::builder()
//!         .db_path("./memory.db")
//!         .model_path_350m("./models/lfm2-350m-q4.gguf")
//!         .model_path_700m("./models/lfm2-700m-q4.gguf")
//!         .build()?;
//!
//!     let llm = RuvLLM::new(config).await?;
//!
//!     // Process query
//!     let response = llm.query("What is machine learning?").await?;
//!     println!("Response: {}", response.text);
//!     println!("Confidence: {:.2}", response.confidence);
//!
//!     Ok(())
//! }
//! ```
//!
//! ## Self-Learning Loops
//!
//! The system learns through three feedback loops:
//!
//! 1. **Memory Growth**: Every interaction strengthens/weakens graph edges
//! 2. **Router Learning**: FastGRNN learns optimal model selection
//! 3. **Compression**: Periodic summarization creates concept hierarchies

pub mod attention;
pub mod compression;
pub mod config;
pub mod embedding;
pub mod error;
pub mod inference;
pub mod learning;
pub mod memory;
pub mod orchestrator;
pub mod router;
pub mod types;

// Re-exports for convenience
pub use config::{Config, ConfigBuilder};
pub use error::{Error, Result};
pub use orchestrator::RuvLLM;
pub use types::{Request, Response, Session};

/// Library version
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
```

---
## 2. Implementation Checklist

### 2.1 Core Components

```
Phase 1: Foundation
━━━━━━━━━━━━━━━━━━━━
[x] Project structure setup
[x] Cargo.toml with dependencies
[ ] Error types definition
[ ] Configuration system
[ ] Core types (Request, Response, Session)

Phase 2: Services
━━━━━━━━━━━━━━━━━━
[ ] EmbeddingService
    [ ] LFM2 encoder wrapper
    [ ] Dimension projection
    [ ] Tokenization
    [ ] Batch processing

[ ] MemoryService
    [ ] VectorDB initialization
    [ ] GraphStore integration
    [ ] HNSW search wrapper
    [ ] Graph expansion
    [ ] Writeback queue

[ ] FastGRNNRouter
    [ ] Cell implementation
    [ ] Sparse matrix operations
    [ ] Low-rank matrices
    [ ] Output heads
    [ ] Training loop

[ ] GraphAttentionEngine
    [ ] Attention layer wrapper
    [ ] Edge feature encoding
    [ ] Multi-head aggregation
    [ ] Context ranking

[ ] InferencePool
    [ ] Model loading
    [ ] Lazy initialization
    [ ] KV cache management
    [ ] LRU eviction

[ ] LearningService
    [ ] Quality judge
    [ ] Replay buffer
    [ ] EWC integration
    [ ] Background training
    [ ] Compression jobs

Phase 3: Orchestration
━━━━━━━━━━━━━━━━━━━━━━
[ ] Orchestrator
    [ ] Request routing
    [ ] Session management
    [ ] Pipeline coordination
    [ ] Metrics collection
    [ ] Error handling

Phase 4: Integration
━━━━━━━━━━━━━━━━━━━━
[ ] Integration tests
[ ] Benchmark suite
[ ] Example applications
[ ] Documentation
```
### 2.2 Test Coverage Requirements

| Component | Unit Tests | Integration | Benchmark |
|-----------|------------|-------------|-----------|
| Embedding | 15+ | 3+ | 2 |
| Memory | 20+ | 5+ | 3 |
| Router | 25+ | 5+ | 2 |
| Attention | 15+ | 3+ | 2 |
| Inference | 10+ | 3+ | 2 |
| Learning | 20+ | 5+ | 1 |
| Orchestrator | 10+ | 5+ | 2 |
| **Total** | **115+** | **29+** | **14** |

---
## 3. Deployment Configurations

### 3.1 Edge Deployment (Raspberry Pi / Mobile)

```toml
# config/edge.toml
[system]
device_class = "edge"
max_memory_mb = 2048
max_concurrent_requests = 2

[embedding]
model = "onnx" # ONNX for portability
dimension = 384
batch_size = 1

[memory]
hnsw_m = 16
hnsw_ef_construction = 100
hnsw_ef_search = 32
max_nodes = 100_000

[router]
hidden_dim = 32
sparsity = 0.95
confidence_threshold = 0.6

[inference]
models = ["350m"]
quantization = "q4_k"
max_context = 1024
max_loaded_models = 1

[learning]
enabled = true
quality_threshold = 0.8
replay_capacity = 1000
training_interval_ms = 300_000 # 5 minutes
```
### 3.2 Server Deployment (CPU)
|
||||
|
||||
```toml
|
||||
# config/server-cpu.toml
|
||||
[system]
|
||||
device_class = "server"
|
||||
max_memory_mb = 16384
|
||||
max_concurrent_requests = 20
|
||||
|
||||
[embedding]
|
||||
model = "lfm2-encoder"
|
||||
dimension = 768
|
||||
batch_size = 8
|
||||
|
||||
[memory]
|
||||
hnsw_m = 32
|
||||
hnsw_ef_construction = 200
|
||||
hnsw_ef_search = 64
|
||||
max_nodes = 10_000_000
|
||||
|
||||
[router]
|
||||
hidden_dim = 64
|
||||
sparsity = 0.9
|
||||
confidence_threshold = 0.7
|
||||
|
||||
[inference]
|
||||
models = ["700m", "1.2b", "2.6b"]
|
||||
quantization = "q5_k"
|
||||
max_context = 4096
|
||||
max_loaded_models = 2
|
||||
|
||||
[learning]
|
||||
enabled = true
|
||||
quality_threshold = 0.75
|
||||
replay_capacity = 100_000
|
||||
training_interval_ms = 60_000 # 1 minute
|
||||
```
|
||||
|
||||
### 3.3 Server Deployment (GPU)

```toml
# config/server-gpu.toml
[system]
device_class = "gpu"
max_memory_mb = 32768
max_concurrent_requests = 100

[embedding]
model = "lfm2-encoder"
dimension = 1024
batch_size = 32

[memory]
hnsw_m = 48
hnsw_ef_construction = 300
hnsw_ef_search = 128
max_nodes = 100_000_000

[router]
hidden_dim = 64
sparsity = 0.85
confidence_threshold = 0.75

[inference]
models = ["1.2b", "2.6b"]
quantization = "fp16"
max_context = 8192
max_loaded_models = 2
use_vllm = true
tensor_parallel = 1

[learning]
enabled = true
quality_threshold = 0.7
replay_capacity = 1_000_000
training_interval_ms = 30_000 # 30 seconds
```
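Across all three tiers, `confidence_threshold` and the `models` list combine into an escalation policy: a confident routing decision stays on the cheapest configured model, and low confidence falls through to the largest one. A sketch of that selection logic (function name and the two-way policy are illustrative; the real router may pick intermediate tiers):

```rust
/// Illustrative escalation: route to the cheapest model when the router
/// is confident, otherwise escalate to the largest configured model.
fn select_model<'a>(models: &'a [&'a str], confidence: f32, threshold: f32) -> &'a str {
    if confidence >= threshold {
        models[0]                // confident: cheapest model suffices
    } else {
        models[models.len() - 1] // uncertain: largest model
    }
}

fn main() {
    // Models and threshold as in config/server-cpu.toml above.
    let server_models = ["700m", "1.2b", "2.6b"];
    // Confident query stays on the small model.
    assert_eq!(select_model(&server_models, 0.85, 0.7), "700m");
    // Low-confidence query escalates.
    assert_eq!(select_model(&server_models, 0.55, 0.7), "2.6b");
    println!("ok");
}
```

Note the thresholds rise with tier capacity (0.6 edge, 0.7 CPU, 0.75 GPU): larger deployments can afford to escalate more often.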
---

## 4. Operational Runbook

### 4.1 Startup Sequence
```bash
#!/bin/bash
# scripts/start.sh

set -e

CONFIG=${1:-"config/server-cpu.toml"}
LOG_LEVEL=${LOG_LEVEL:-"info"}

echo "Starting RuvLLM with config: $CONFIG"

# 1. Validate configuration
cargo run --release --bin ruvllm-validate -- --config "$CONFIG"

# 2. Initialize database if needed
if [ ! -f "data/memory.db" ]; then
    echo "Initializing database..."
    cargo run --release --bin ruvllm-init -- --config "$CONFIG"
fi

# 3. Download models if needed
cargo run --release --bin ruvllm-models -- --config "$CONFIG" --check-or-download

# 4. Start server
RUST_LOG=$LOG_LEVEL cargo run --release --bin ruvllm-server -- \
    --config "$CONFIG" \
    --metrics-port 9090 \
    --http-port 8080
```
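The production checklist in Section 5 calls for automatic restarts via systemd or Kubernetes; wrapping `scripts/start.sh` in a systemd unit is one way to get them. A hypothetical unit (install path, user, and working directory are placeholders, not part of the project):

```ini
# /etc/systemd/system/ruvllm.service (hypothetical paths)
[Unit]
Description=RuvLLM server
After=network-online.target

[Service]
Type=simple
User=ruvllm
WorkingDirectory=/opt/ruvllm
ExecStart=/opt/ruvllm/scripts/start.sh config/server-cpu.toml
Environment=LOG_LEVEL=info
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` covers crashes while still letting a deliberate `systemctl stop` terminate cleanly, which pairs with the graceful-shutdown item on the checklist.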
### 4.2 Health Checks

```rust
use std::sync::Arc;
use serde_json::json;

/// Health check endpoint implementation
pub struct HealthCheck {
    memory: Arc<RuvectorMemory>,
    router: Arc<FastGRNNRouter>,
    inference: Arc<InferencePool>,
}

impl HealthCheck {
    pub async fn check(&self) -> HealthStatus {
        let mut status = HealthStatus::default();

        // Check memory service
        status.memory = match self.memory.ping().await {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };

        // Check router
        status.router = match self.router.ping() {
            Ok(latency) => ComponentHealth::Healthy { latency_ms: latency },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };

        // Check inference (at least one model loadable)
        status.inference = match self.inference.health_check().await {
            Ok(info) => ComponentHealth::Healthy {
                latency_ms: info.latency,
                details: json!({
                    "loaded_models": info.loaded_models,
                    "available_memory": info.available_memory,
                }),
            },
            Err(e) => ComponentHealth::Unhealthy { error: e.to_string() },
        };

        status.overall = if status.all_healthy() {
            OverallHealth::Healthy
        } else if status.any_critical() {
            OverallHealth::Critical
        } else {
            OverallHealth::Degraded
        };

        status
    }
}
```
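The `all_healthy` / `any_critical` helpers above are not shown; a plausible aggregation policy is sketched below. The struct shape and the "inference down is critical" rule are assumptions for illustration, not the crate's actual definitions:

```rust
/// Illustrative overall-health aggregation behind the health check above.
#[derive(Debug, PartialEq)]
enum Overall {
    Healthy,
    Degraded,
    Critical,
}

struct Components {
    memory_ok: bool,
    router_ok: bool,
    inference_ok: bool,
}

fn overall(c: &Components) -> Overall {
    if c.memory_ok && c.router_ok && c.inference_ok {
        Overall::Healthy
    } else if !c.inference_ok {
        // No loadable model means no responses at all: critical (503).
        Overall::Critical
    } else {
        // Memory or router down degrades quality, but requests still flow.
        Overall::Degraded
    }
}

fn main() {
    let healthy = Components { memory_ok: true, router_ok: true, inference_ok: true };
    assert_eq!(overall(&healthy), Overall::Healthy);

    let no_memory = Components { memory_ok: false, router_ok: true, inference_ok: true };
    assert_eq!(overall(&no_memory), Overall::Degraded);

    let no_inference = Components { memory_ok: true, router_ok: true, inference_ok: false };
    assert_eq!(overall(&no_inference), Overall::Critical);
    println!("ok");
}
```

Under this policy the `/v1/health` endpoint would return 200 for `Healthy` and `Degraded`, and 503 only for `Critical`, matching the API reference in Section 6.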
### 4.3 Monitoring Dashboards

```yaml
# Prometheus alerting rules
groups:
  - name: ruvllm
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.95, sum(rate(ruvllm_request_latency_seconds_bucket[5m])) by (le)) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RuvLLM P95 latency above 1s"

      - alert: LowQualityScore
        expr: avg(ruvllm_quality_score) < 0.7
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average quality score dropped below 0.7"

      - alert: MemoryPressure
        expr: ruvllm_memory_usage_bytes / ruvllm_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 90%"

      - alert: RouterLowConfidence
        expr: avg(ruvllm_router_confidence) < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Router confidence consistently low"

      - alert: HighErrorRate
        expr: rate(ruvllm_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 10%"
```
### 4.4 Backup and Recovery

```bash
#!/bin/bash
# scripts/backup.sh

set -e

BACKUP_DIR="/backups/ruvllm/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "Creating backup in $BACKUP_DIR"

# 1. Backup memory database
cp data/memory.db "$BACKUP_DIR/memory.db"

# 2. Backup router weights
cp data/router_weights.bin "$BACKUP_DIR/router_weights.bin"

# 3. Backup EWC state
cp data/ewc_state.bin "$BACKUP_DIR/ewc_state.bin"

# 4. Backup replay buffer
cp data/replay_buffer.bin "$BACKUP_DIR/replay_buffer.bin"

# 5. Backup configuration
cp -r config/ "$BACKUP_DIR/config/"

# 6. Create manifest
cat > "$BACKUP_DIR/manifest.json" << EOF
{
  "timestamp": "$(date -Iseconds)",
  "version": "$(cargo run --release --bin ruvllm-version)",
  "components": {
    "memory_db": "memory.db",
    "router_weights": "router_weights.bin",
    "ewc_state": "ewc_state.bin",
    "replay_buffer": "replay_buffer.bin",
    "config": "config/"
  }
}
EOF

echo "Backup complete: $BACKUP_DIR"

# 7. Upload to S3 if configured
if [ -n "$S3_BACKUP_BUCKET" ]; then
    aws s3 sync "$BACKUP_DIR" "s3://$S3_BACKUP_BUCKET/$(basename "$BACKUP_DIR")/"
    echo "Uploaded to S3: $S3_BACKUP_BUCKET"
fi
```
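The checklist asks for tested recovery procedures, so the backup script deserves a restore counterpart. A hypothetical sketch (`restore_latest` and the demo directories are illustrative; the real deployment would point `backup_root` at `/backups/ruvllm` and stop the server first):

```shell
#!/bin/bash
# Hypothetical restore flow mirroring backup.sh, demonstrated against a
# temporary directory so the sketch is self-contained.
set -e

restore_latest() {
    local backup_root=$1 data_dir=$2
    local latest
    # Timestamped directory names sort lexicographically, so the last is newest.
    latest=$(ls -1d "$backup_root"/*/ | sort | tail -n 1)
    mkdir -p "$data_dir"
    for f in memory.db router_weights.bin ewc_state.bin replay_buffer.bin; do
        cp "$latest/$f" "$data_dir/$f"
    done
    echo "restored from $latest"
}

# Demo: fabricate a backup, then restore it.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/backups/20250101_000000" "$ROOT/data"
for f in memory.db router_weights.bin ewc_state.bin replay_buffer.bin; do
    echo "stub" > "$ROOT/backups/20250101_000000/$f"
done
restore_latest "$ROOT/backups" "$ROOT/data"
ls "$ROOT/data"
```

Restoring all four artifacts together matters: router weights trained against one replay buffer and EWC state should not be mixed with another's.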
---

## 5. Production Checklist

### 5.1 Pre-Launch
```
Security
━━━━━━━━
[ ] Input validation and sanitization
[ ] Rate limiting configured
[ ] TLS/HTTPS enabled
[ ] API authentication (if public)
[ ] Secrets in environment variables
[ ] Model integrity verification

Performance
━━━━━━━━━━━
[ ] Load tested to expected traffic
[ ] Memory profiled (no leaks)
[ ] Latency targets met
[ ] Caching configured
[ ] Connection pooling

Reliability
━━━━━━━━━━━
[ ] Health checks implemented
[ ] Graceful shutdown
[ ] Automatic restarts (systemd/k8s)
[ ] Backup procedures tested
[ ] Recovery procedures documented

Observability
━━━━━━━━━━━━━
[ ] Structured logging
[ ] Metrics exported
[ ] Distributed tracing
[ ] Alerting rules configured
[ ] Dashboards created
```
### 5.2 Post-Launch

```
Daily
━━━━━
[ ] Check error rates
[ ] Review quality scores
[ ] Monitor latency trends
[ ] Verify backup success

Weekly
━━━━━━
[ ] Review router decisions distribution
[ ] Analyze forgetting metrics
[ ] Check memory growth rate
[ ] Run compression job
[ ] Update router weights

Monthly
━━━━━━━
[ ] Full system backup
[ ] Performance benchmark
[ ] Security audit
[ ] Dependency updates
[ ] Evaluate student model candidates
```
---

## 6. API Reference

### 6.1 HTTP API
```yaml
openapi: "3.0.0"
info:
  title: RuvLLM API
  version: "0.1.0"
  description: Self-learning LLM with LFM2 and Ruvector

paths:
  /v1/query:
    post:
      summary: Process a query
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - query
              properties:
                query:
                  type: string
                  description: The user query
                session_id:
                  type: string
                  description: Optional session for multi-turn
                constraints:
                  type: object
                  properties:
                    max_latency_ms:
                      type: integer
                    max_tokens:
                      type: integer
                    temperature:
                      type: number
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  text:
                    type: string
                  confidence:
                    type: number
                  sources:
                    type: array
                    items:
                      type: object
                  routing_info:
                    type: object

  /v1/feedback:
    post:
      summary: Provide feedback on a response
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - request_id
              properties:
                request_id:
                  type: string
                rating:
                  type: integer
                  minimum: 1
                  maximum: 5
                correction:
                  type: string
      responses:
        "200":
          description: Feedback recorded

  /v1/health:
    get:
      summary: Health check
      responses:
        "200":
          description: System healthy
        "503":
          description: System unhealthy

  /v1/metrics:
    get:
      summary: Prometheus metrics
      responses:
        "200":
          description: Metrics in Prometheus format
```
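For a quick smoke test of the two POST endpoints, the request bodies can be built as JSON files and sent with curl. The payload below follows the schema above; the server address matches the `--http-port 8080` used in `scripts/start.sh`, and the curl invocations are left commented since they need a running server:

```shell
# Build a /v1/query payload per the OpenAPI schema above.
cat > /tmp/query.json <<'EOF'
{
  "query": "What is Rust?",
  "session_id": "user-123",
  "constraints": { "max_latency_ms": 500, "max_tokens": 256, "temperature": 0.7 }
}
EOF

# Send the query (requires a running server):
# curl -s -X POST http://localhost:8080/v1/query \
#   -H 'Content-Type: application/json' -d @/tmp/query.json

# Rate the response afterwards (request_id comes from the query response):
# curl -s -X POST http://localhost:8080/v1/feedback \
#   -H 'Content-Type: application/json' \
#   -d '{"request_id": "req-123", "rating": 5}'

cat /tmp/query.json
```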
### 6.2 Rust SDK

```rust
use ruvllm::{Config, Constraints, Feedback, Request, Response, Result, RuvLLM};
use futures::StreamExt; // for `stream.next()` in `streaming` below

/// Simple query
async fn simple_query(llm: &RuvLLM) -> Result<Response> {
    llm.query("What is Rust?").await
}

/// Query with options
async fn query_with_options(llm: &RuvLLM) -> Result<Response> {
    llm.query_with(Request {
        query: "Explain backpropagation".into(),
        session_id: Some("user-123".into()),
        constraints: Constraints {
            max_latency_ms: Some(500),
            max_tokens: Some(500),
            temperature: Some(0.7),
            ..Default::default()
        },
    }).await
}

/// Multi-turn conversation
async fn conversation(llm: &RuvLLM) -> Result<()> {
    let session = llm.new_session();

    let r1 = llm.query_session(&session, "What is a neural network?").await?;
    println!("Turn 1: {}", r1.text);

    let r2 = llm.query_session(&session, "How do you train one?").await?;
    println!("Turn 2: {}", r2.text);

    let r3 = llm.query_session(&session, "What about overfitting?").await?;
    println!("Turn 3: {}", r3.text);

    Ok(())
}

/// Provide feedback
async fn with_feedback(llm: &RuvLLM) -> Result<()> {
    let response = llm.query("What is 2+2?").await?;

    llm.feedback(Feedback {
        request_id: response.request_id,
        rating: 5,
        correction: None,
    }).await?;

    Ok(())
}

/// Stream response
async fn streaming(llm: &RuvLLM) -> Result<()> {
    let mut stream = llm.query_stream("Tell me a story").await?;

    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
    }

    Ok(())
}
```
---

## 7. Future Roadmap

### 7.1 Short-Term (1-3 months)
- [ ] LFM2-VL integration (vision-language)
- [ ] Multi-GPU inference with tensor parallelism
- [ ] Retrieval-augmented fine-tuning pipeline
- [ ] Improved compression algorithms
- [ ] WebAssembly deployment target
### 7.2 Medium-Term (3-6 months)

- [ ] Federated learning across edge nodes
- [ ] LFM2-Audio integration (speech)
- [ ] Custom domain fine-tuning toolkit
- [ ] Advanced curriculum learning
- [ ] Hyperbolic embeddings for hierarchies
### 7.3 Long-Term (6-12 months)

- [ ] Multi-agent collaboration
- [ ] Neuro-symbolic reasoning integration
- [ ] Continuous pre-training pipeline
- [ ] Hardware-specific optimizations (NPU, TPU)
- [ ] Enterprise multi-tenancy

---
## 8. Success Criteria

### 8.1 Technical Metrics
| Metric | Target | Current |
|--------|--------|---------|
| Latency P50 | <500ms | - |
| Latency P99 | <2s | - |
| Quality Score | >0.8 | - |
| Router Accuracy | >90% | - |
| Memory Efficiency | <4GB (edge) | - |
| Throughput | 20 QPS (edge) | - |
| Forgetting Rate | <5% per 10K updates | - |
| Test Coverage | >80% | - |
### 8.2 Business Metrics

| Metric | Target | Notes |
|--------|--------|-------|
| User Satisfaction | >4.0/5.0 | Survey scores |
| Response Relevance | >85% | Human eval |
| Knowledge Retention | >90% | Multi-turn coherence |
| Cost Reduction | >50% | vs. always-big baseline |

---
## 9. Conclusion

RuvLLM represents a paradigm shift from static LLMs to adaptive, self-learning systems. By treating:

- **LFM2 as the stable cortex** (reasoning)
- **Ruvector as the living synaptic mesh** (memory)
- **FastGRNN as the control circuit** (routing)

we create intelligence that emerges from the loop, not just the model.

The three learning loops—memory growth, router optimization, and concept compression—enable continuous adaptation without the risks of in-place weight modification.

**The intelligence is not in one model anymore. It is in the loop.**

---

*Document Version: 1.0*
*Last Updated: 2025-12-02*
*Author: RuvLLM Architecture Team*