Files

40 KiB

SONA Implementation Roadmap

Overview

This document outlines the optimized, prioritized implementation strategy for SONA (Self-Optimizing Neural Architecture). The roadmap leverages existing ruvLLM infrastructure and focuses on maximum value with minimum disruption.

Gap Analysis: Existing vs Required

┌─────────────────────────────────────────────────────────────────────────┐
│                     EXISTING INFRASTRUCTURE                             │
├─────────────────────────────────────────────────────────────────────────┤
│  ✅ LearningService      │ Has EWC skeleton, replay buffer, feedback   │
│  ✅ FastGRNNRouter       │ Low-rank decomposition, 7 output heads      │
│  ✅ MemoryService        │ HNSW graph, node storage, edge weights      │
│  ✅ SIMD Infrastructure  │ AVX2 softmax, matmul, RMS norm              │
│  ✅ Three-Loop Design    │ Loop A/B/C conceptually defined             │
├─────────────────────────────────────────────────────────────────────────┤
│                         GAPS TO FILL                                    │
├─────────────────────────────────────────────────────────────────────────┤
│  ❌ Micro-LoRA           │ Per-request adaptation (NEW)                │
│  ❌ Trajectory Recording │ Step-by-step inference capture              │
│  ❌ EWC++ Enhancements   │ Online Fisher, task boundary detection      │
│  ❌ ReasoningBank        │ K-means++ pattern extraction                │
│  ❌ Dream Engine         │ Random walk + Φ evaluation                  │
│  ❌ Loop Coordinator     │ Temporal orchestration of A/B/C             │
└─────────────────────────────────────────────────────────────────────────┘

Optimized Priority Matrix

Priority Component Impact Effort Build On
P0 Trajectory Recording High Low types.rs
P0 Micro-LoRA High Medium simd_inference.rs
P1 EWC++ Enhancement High Medium learning.rs (existing)
P1 ReasoningBank High Medium memory.rs
P2 Loop Coordinator Medium Low learning.rs
P2 Dream Engine Medium High exo-ai crates
P3 Φ Measurement Low High exo-core

Implementation Philosophy

┌─────────────────────────────────────────────────────────────────────────┐
│                     Implementation Principles                           │
├─────────────────────────────────────────────────────────────────────────┤
│  1. Leverage Existing    │ Build on learning.rs, router.rs, memory.rs  │
│  2. Incremental Value    │ Each phase delivers working functionality   │
│  3. Test-First           │ TDD with comprehensive coverage             │
│  4. Benchmark-Driven     │ Performance validated at each step          │
│  5. Backward Compatible  │ No breaking changes to existing API         │
│  6. Modular Design       │ Components can be used independently        │
└─────────────────────────────────────────────────────────────────────────┘

OPTIMIZED PHASE STRUCTURE

Sprint 1: Foundation (P0) - Core Data Flow

Goal: Enable trajectory capture and micro-adaptation without breaking existing API.

Files to Create:

  • src/sona/mod.rs - SONA module entry point
  • src/sona/types.rs - Core types (LearningSignal, QueryTrajectory)
  • src/sona/lora.rs - MicroLoRA implementation
  • src/sona/trajectory.rs - Lock-free trajectory buffer

Files to Modify:

  • src/lib.rs - Add pub mod sona;
  • src/orchestrator.rs - Inject trajectory recording hooks

Sprint 2: Learning Enhancement (P1) - EWC++ & Patterns

Goal: Upgrade existing EWC to EWC++, add pattern extraction.

Files to Modify:

  • src/learning.rs - Upgrade EWCState → EwcPlusPlus
  • src/memory.rs - Add pattern extraction methods

Files to Create:

  • src/sona/ewc.rs - Full EWC++ with online Fisher
  • src/sona/reasoning_bank.rs - K-means++ pattern storage

Sprint 3: Loop Orchestration (P2) - Temporal Coordination

Goal: Unify instant/background/deep learning cycles.

Files to Create:

  • src/sona/loops/mod.rs - Loop module
  • src/sona/loops/instant.rs - Loop A
  • src/sona/loops/background.rs - Loop B
  • src/sona/loops/deep.rs - Loop C
  • src/sona/coordinator.rs - LoopCoordinator

Sprint 4: Dream & Φ (P3) - Creative Exploration

Goal: Add dream-based consolidation with quality measurement.

Files to Create:

  • src/sona/dreams.rs - DreamEngine
  • src/sona/phi.rs - Φ evaluator (optional exo-core integration)

SPRINT 1: Foundation (P0) - Detailed Implementation

1.1 Core Data Structures (SIMPLIFIED)

Deliverables:

  • LearningSignal struct with gradient estimation
  • QueryTrajectory for inference recording
  • LearnedPattern for pattern storage
  • SIMD-optimized tensor operations

Implementation:

// src/sona/types.rs

/// Learning signal from inference
#[derive(Clone, Debug)]
pub struct LearningSignal {
    pub query_embedding: Vec<f32>,
    pub gradient_estimate: Vec<f32>,
    pub quality_score: f32,
    pub timestamp: Instant,
    pub metadata: SignalMetadata,
}

impl LearningSignal {
    /// Create from query trajectory
    pub fn from_trajectory(trajectory: &QueryTrajectory) -> Self {
        let gradient = Self::estimate_gradient(trajectory);

        Self {
            query_embedding: trajectory.query_embedding.clone(),
            gradient_estimate: gradient,
            quality_score: trajectory.final_quality,
            timestamp: Instant::now(),
            metadata: SignalMetadata {
                trajectory_id: trajectory.id,
                step_count: trajectory.steps.len(),
            },
        }
    }

    /// Estimate gradient from trajectory using REINFORCE
    fn estimate_gradient(trajectory: &QueryTrajectory) -> Vec<f32> {
        let dim = trajectory.query_embedding.len();
        let mut gradient = vec![0.0; dim];

        let baseline = trajectory.steps.iter()
            .map(|s| s.reward)
            .sum::<f32>() / trajectory.steps.len() as f32;

        for step in &trajectory.steps {
            let advantage = step.reward - baseline;
            for (i, &activation) in step.activations.iter().enumerate() {
                gradient[i] += advantage * activation;
            }
        }

        // Normalize
        let norm: f32 = gradient.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 1e-6 {
            gradient.iter_mut().for_each(|x| *x /= norm);
        }

        gradient
    }
}

/// Query trajectory recording
#[derive(Clone, Debug)]
pub struct QueryTrajectory {
    pub id: u64,
    pub query_embedding: Vec<f32>,
    pub steps: Vec<TrajectoryStep>,
    pub final_quality: f32,
    pub latency_us: u64,
}

#[derive(Clone, Debug)]
pub struct TrajectoryStep {
    pub activations: Vec<f32>,
    pub attention_weights: Vec<f32>,
    pub reward: f32,
    pub timestamp: Instant,
}

/// Learned pattern from pattern extraction
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct LearnedPattern {
    pub id: u64,
    pub centroid: Vec<f32>,
    pub cluster_size: usize,
    pub total_weight: f32,
    pub avg_quality: f32,
    pub created_at: u64,
    pub last_accessed: u64,
    pub access_count: u32,
}

impl LearnedPattern {
    /// Merge two patterns
    pub fn merge(&self, other: &Self) -> Self {
        let total_size = self.cluster_size + other.cluster_size;
        let w1 = self.cluster_size as f32 / total_size as f32;
        let w2 = other.cluster_size as f32 / total_size as f32;

        let centroid: Vec<f32> = self.centroid.iter()
            .zip(&other.centroid)
            .map(|(&a, &b)| a * w1 + b * w2)
            .collect();

        Self {
            id: self.id,  // Keep original ID
            centroid,
            cluster_size: total_size,
            total_weight: self.total_weight + other.total_weight,
            avg_quality: self.avg_quality * w1 + other.avg_quality * w2,
            created_at: self.created_at.min(other.created_at),
            last_accessed: self.last_accessed.max(other.last_accessed),
            access_count: self.access_count + other.access_count,
        }
    }

    /// Decay pattern importance over time
    pub fn decay(&mut self, factor: f32) {
        self.total_weight *= factor;
    }
}

Tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_learning_signal_creation() {
        let trajectory = QueryTrajectory {
            id: 1,
            query_embedding: vec![0.1, 0.2, 0.3],
            steps: vec![
                TrajectoryStep {
                    activations: vec![0.5, 0.3, 0.2],
                    attention_weights: vec![0.4, 0.4, 0.2],
                    reward: 0.8,
                    timestamp: Instant::now(),
                },
            ],
            final_quality: 0.8,
            latency_us: 1000,
        };

        let signal = LearningSignal::from_trajectory(&trajectory);
        assert_eq!(signal.quality_score, 0.8);
        assert_eq!(signal.gradient_estimate.len(), 3);
    }

    #[test]
    fn test_pattern_merge() {
        let p1 = LearnedPattern {
            id: 1,
            centroid: vec![1.0, 0.0],
            cluster_size: 10,
            total_weight: 5.0,
            avg_quality: 0.8,
            created_at: 100,
            last_accessed: 200,
            access_count: 5,
        };

        let p2 = LearnedPattern {
            id: 2,
            centroid: vec![0.0, 1.0],
            cluster_size: 10,
            total_weight: 5.0,
            avg_quality: 0.9,
            created_at: 150,
            last_accessed: 250,
            access_count: 3,
        };

        let merged = p1.merge(&p2);
        assert_eq!(merged.cluster_size, 20);
        assert!((merged.centroid[0] - 0.5).abs() < 1e-6);
        assert!((merged.centroid[1] - 0.5).abs() < 1e-6);
        assert!((merged.avg_quality - 0.85).abs() < 1e-6);
    }
}

1.2 Micro-LoRA Implementation

Deliverables:

  • MicroLoRA struct with rank 1-2 adapters
  • SIMD-optimized forward pass
  • Gradient accumulation buffer
  • Sub-100μs update mechanism

Implementation:

// src/sona/lora.rs

/// Micro-LoRA for per-request adaptation
pub struct MicroLoRA {
    /// Down projection (hidden_dim -> rank)
    pub down_proj: Vec<f32>,
    /// Up projection (rank -> hidden_dim)
    pub up_proj: Vec<f32>,
    /// Rank (1-2 for micro updates)
    pub rank: usize,
    /// Hidden dimension
    pub hidden_dim: usize,
    /// Accumulated gradients
    gradient_buffer: Vec<f32>,
    /// Update count for averaging
    update_count: usize,
    /// Scaling factor
    pub scale: f32,
}

impl MicroLoRA {
    pub fn new(hidden_dim: usize, rank: usize) -> Self {
        assert!(rank <= 2, "MicroLoRA rank should be 1-2");

        // Initialize with small random values
        let mut rng = rand::thread_rng();
        let down_proj: Vec<f32> = (0..hidden_dim * rank)
            .map(|_| rng.gen::<f32>() * 0.01)
            .collect();
        let up_proj = vec![0.0; rank * hidden_dim];  // Initialize to zero

        Self {
            down_proj,
            up_proj,
            rank,
            hidden_dim,
            gradient_buffer: vec![0.0; (hidden_dim * rank) * 2],
            update_count: 0,
            scale: 1.0 / (rank as f32).sqrt(),
        }
    }

    /// SIMD-optimized forward pass
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    pub unsafe fn forward_simd(&self, input: &[f32], output: &mut [f32]) {
        use std::arch::x86_64::*;

        assert_eq!(input.len(), self.hidden_dim);
        assert_eq!(output.len(), self.hidden_dim);

        // Down projection: hidden_dim -> rank
        let mut intermediate = vec![0.0f32; self.rank];

        for r in 0..self.rank {
            let mut sum = _mm256_setzero_ps();
            let down_offset = r * self.hidden_dim;

            let mut i = 0;
            while i + 8 <= self.hidden_dim {
                let inp = _mm256_loadu_ps(input[i..].as_ptr());
                let weight = _mm256_loadu_ps(self.down_proj[down_offset + i..].as_ptr());
                sum = _mm256_fmadd_ps(inp, weight, sum);
                i += 8;
            }

            // Horizontal sum
            let mut result = [0.0f32; 8];
            _mm256_storeu_ps(result.as_mut_ptr(), sum);
            intermediate[r] = result.iter().sum();

            // Handle remaining elements
            for j in i..self.hidden_dim {
                intermediate[r] += input[j] * self.down_proj[down_offset + j];
            }
        }

        // Up projection: rank -> hidden_dim
        let mut i = 0;
        while i + 8 <= self.hidden_dim {
            let mut sum = _mm256_setzero_ps();

            for r in 0..self.rank {
                let up_offset = r * self.hidden_dim;
                let weight = _mm256_loadu_ps(self.up_proj[up_offset + i..].as_ptr());
                let inter = _mm256_set1_ps(intermediate[r]);
                sum = _mm256_fmadd_ps(inter, weight, sum);
            }

            // Scale and add to output
            let scale_vec = _mm256_set1_ps(self.scale);
            sum = _mm256_mul_ps(sum, scale_vec);
            let existing = _mm256_loadu_ps(output[i..].as_ptr());
            let result = _mm256_add_ps(existing, sum);
            _mm256_storeu_ps(output[i..].as_mut_ptr(), result);

            i += 8;
        }

        // Handle remaining elements
        for j in i..self.hidden_dim {
            let mut val = 0.0;
            for r in 0..self.rank {
                val += intermediate[r] * self.up_proj[r * self.hidden_dim + j];
            }
            output[j] += val * self.scale;
        }
    }

    /// Accumulate gradient for later update
    pub fn accumulate_gradient(&mut self, signal: &LearningSignal) {
        assert_eq!(signal.gradient_estimate.len(), self.hidden_dim);

        // Accumulate into buffer (simplified outer product update)
        for r in 0..self.rank {
            for i in 0..self.hidden_dim {
                let grad_idx = r * self.hidden_dim + i;
                self.gradient_buffer[grad_idx] +=
                    signal.gradient_estimate[i] * signal.quality_score;
            }
        }

        self.update_count += 1;
    }

    /// Apply accumulated gradients with learning rate
    pub fn apply_accumulated(&mut self, learning_rate: f32) {
        if self.update_count == 0 {
            return;
        }

        let scale = learning_rate / self.update_count as f32;

        // Update up projection (main adaptation target)
        for (i, grad) in self.gradient_buffer.iter().enumerate() {
            if i < self.up_proj.len() {
                self.up_proj[i] += grad * scale;
            }
        }

        // Reset buffer
        self.gradient_buffer.fill(0.0);
        self.update_count = 0;
    }

    /// Get current parameter count
    pub fn param_count(&self) -> usize {
        self.down_proj.len() + self.up_proj.len()
    }
}

/// Base LoRA for hourly adaptation
pub struct BaseLoRA {
    pub layers: Vec<LoRALayer>,
    pub rank: usize,
    pub hidden_dim: usize,
    pub alpha: f32,
}

#[derive(Clone)]
pub struct LoRALayer {
    pub down_proj: Vec<f32>,
    pub up_proj: Vec<f32>,
    pub layer_idx: usize,
}

impl BaseLoRA {
    pub fn new(hidden_dim: usize, rank: usize, num_layers: usize) -> Self {
        let layers = (0..num_layers)
            .map(|idx| LoRALayer {
                down_proj: vec![0.0; hidden_dim * rank],
                up_proj: vec![0.0; rank * hidden_dim],
                layer_idx: idx,
            })
            .collect();

        Self {
            layers,
            rank,
            hidden_dim,
            alpha: rank as f32,
        }
    }

    /// Merge base LoRA into model weights
    pub fn merge_weights(&self, model_weights: &mut [f32], layer_idx: usize) {
        if layer_idx >= self.layers.len() {
            return;
        }

        let layer = &self.layers[layer_idx];
        let scale = self.alpha / self.rank as f32;

        // W' = W + scale * (down @ up)
        for i in 0..self.hidden_dim {
            for j in 0..self.hidden_dim {
                let mut delta = 0.0;
                for r in 0..self.rank {
                    delta += layer.down_proj[i * self.rank + r]
                           * layer.up_proj[r * self.hidden_dim + j];
                }
                model_weights[i * self.hidden_dim + j] += delta * scale;
            }
        }
    }
}

1.3 Trajectory Recording

Deliverables:

  • Lock-free trajectory buffer
  • Efficient step recording
  • Quality signal extraction

Implementation:

// src/sona/trajectory.rs

use crossbeam::queue::ArrayQueue;

/// Lock-free trajectory buffer
pub struct TrajectoryBuffer {
    buffer: ArrayQueue<QueryTrajectory>,
    capacity: usize,
    dropped: AtomicU64,
}

impl TrajectoryBuffer {
    pub fn new(capacity: usize) -> Self {
        Self {
            buffer: ArrayQueue::new(capacity),
            capacity,
            dropped: AtomicU64::new(0),
        }
    }

    /// Record trajectory (non-blocking)
    pub fn record(&self, trajectory: QueryTrajectory) -> bool {
        match self.buffer.push(trajectory) {
            Ok(()) => true,
            Err(_) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }

    /// Drain all trajectories for processing
    pub fn drain(&self) -> Vec<QueryTrajectory> {
        let mut result = Vec::with_capacity(self.capacity);
        while let Some(t) = self.buffer.pop() {
            result.push(t);
        }
        result
    }

    /// Get dropped count
    pub fn dropped_count(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}

/// Builder for constructing trajectories during inference
pub struct TrajectoryBuilder {
    id: u64,
    query_embedding: Vec<f32>,
    steps: Vec<TrajectoryStep>,
    start_time: Instant,
}

impl TrajectoryBuilder {
    pub fn new(id: u64, query_embedding: Vec<f32>) -> Self {
        Self {
            id,
            query_embedding,
            steps: Vec::with_capacity(16),
            start_time: Instant::now(),
        }
    }

    /// Record a step
    pub fn add_step(&mut self, activations: Vec<f32>, attention_weights: Vec<f32>, reward: f32) {
        self.steps.push(TrajectoryStep {
            activations,
            attention_weights,
            reward,
            timestamp: Instant::now(),
        });
    }

    /// Finalize trajectory
    pub fn build(self, final_quality: f32) -> QueryTrajectory {
        QueryTrajectory {
            id: self.id,
            query_embedding: self.query_embedding,
            steps: self.steps,
            final_quality,
            latency_us: self.start_time.elapsed().as_micros() as u64,
        }
    }
}

Phase 2: Learning Loops

2.1 Loop A (Instant Learning)

Deliverables:

  • Per-request trajectory recording
  • Micro-LoRA gradient accumulation
  • Edge weight updates

Implementation:

// src/sona/loops/instant.rs

/// Instant learning loop (per-request)
pub struct InstantLoop {
    trajectory_buffer: Arc<TrajectoryBuffer>,
    micro_lora: RwLock<MicroLoRA>,
    edge_weights: RwLock<EdgeWeights>,
    config: InstantLoopConfig,
    metrics: InstantLoopMetrics,
}

#[derive(Clone)]
pub struct InstantLoopConfig {
    pub micro_lora_rank: usize,
    pub micro_lora_lr: f32,
    pub edge_update_scale: f32,
    pub max_pending_signals: usize,
}

impl Default for InstantLoopConfig {
    fn default() -> Self {
        Self {
            micro_lora_rank: 1,
            micro_lora_lr: 0.001,
            edge_update_scale: 0.01,
            max_pending_signals: 1000,
        }
    }
}

impl InstantLoop {
    pub fn new(hidden_dim: usize, config: InstantLoopConfig) -> Self {
        Self {
            trajectory_buffer: Arc::new(TrajectoryBuffer::new(config.max_pending_signals)),
            micro_lora: RwLock::new(MicroLoRA::new(hidden_dim, config.micro_lora_rank)),
            edge_weights: RwLock::new(EdgeWeights::new()),
            config,
            metrics: InstantLoopMetrics::default(),
        }
    }

    /// Process inference request (called during forward pass)
    pub fn on_inference(&self, trajectory: QueryTrajectory) {
        // Record trajectory
        self.trajectory_buffer.record(trajectory.clone());

        // Generate learning signal
        let signal = LearningSignal::from_trajectory(&trajectory);

        // Accumulate gradient (non-blocking)
        if let Ok(mut lora) = self.micro_lora.try_write() {
            lora.accumulate_gradient(&signal);
        }

        // Update edge weights (non-blocking)
        if let Ok(mut edges) = self.edge_weights.try_write() {
            edges.update_from_signal(&signal, self.config.edge_update_scale);
        }
    }

    /// Apply accumulated updates (called periodically)
    pub fn flush_updates(&self) {
        // Apply micro-LoRA updates
        if let Ok(mut lora) = self.micro_lora.write() {
            lora.apply_accumulated(self.config.micro_lora_lr);
        }

        // Commit edge weight updates
        if let Ok(mut edges) = self.edge_weights.write() {
            edges.commit();
        }
    }

    /// Get trajectory buffer for background processing
    pub fn drain_trajectories(&self) -> Vec<QueryTrajectory> {
        self.trajectory_buffer.drain()
    }
}

/// Edge weights for knowledge graph
pub struct EdgeWeights {
    weights: HashMap<(NodeId, NodeId), f32>,
    pending_updates: Vec<(NodeId, NodeId, f32)>,
}

impl EdgeWeights {
    pub fn new() -> Self {
        Self {
            weights: HashMap::new(),
            pending_updates: Vec::new(),
        }
    }

    pub fn update_from_signal(&mut self, signal: &LearningSignal, scale: f32) {
        // Extract node pairs from signal (simplified)
        let nodes = Self::extract_activated_nodes(signal);

        for i in 0..nodes.len() {
            for j in i+1..nodes.len() {
                let delta = signal.quality_score * scale;
                self.pending_updates.push((nodes[i], nodes[j], delta));
            }
        }
    }

    pub fn commit(&mut self) {
        for (from, to, delta) in self.pending_updates.drain(..) {
            *self.weights.entry((from, to)).or_insert(0.0) += delta;
        }
    }

    fn extract_activated_nodes(signal: &LearningSignal) -> Vec<NodeId> {
        // Simplified: top-k indices from gradient
        signal.gradient_estimate.iter()
            .enumerate()
            .filter(|(_, &v)| v.abs() > 0.1)
            .take(5)
            .map(|(i, _)| i as NodeId)
            .collect()
    }
}

2.2 Loop B (Background Learning)

Deliverables:

  • Hourly pattern extraction
  • EWC++ gradient constraints
  • Base LoRA updates

Implementation:

// src/sona/loops/background.rs

/// Background learning loop (hourly)
pub struct BackgroundLoop {
    reasoning_bank: Arc<RwLock<ReasoningBank>>,
    ewc: Arc<RwLock<EwcPlusPlus>>,
    base_lora: Arc<RwLock<BaseLoRA>>,
    scheduler: BackgroundScheduler,
    config: BackgroundLoopConfig,
}

#[derive(Clone)]
pub struct BackgroundLoopConfig {
    pub extraction_interval: Duration,
    pub min_trajectories: usize,
    pub base_lora_lr: f32,
    pub ewc_lambda: f32,
}

impl Default for BackgroundLoopConfig {
    fn default() -> Self {
        Self {
            extraction_interval: Duration::from_secs(3600),  // 1 hour
            min_trajectories: 100,
            base_lora_lr: 0.0001,
            ewc_lambda: 1000.0,
        }
    }
}

impl BackgroundLoop {
    pub fn new(config: BackgroundLoopConfig, hidden_dim: usize) -> Self {
        Self {
            reasoning_bank: Arc::new(RwLock::new(ReasoningBank::new(PatternConfig::default()))),
            ewc: Arc::new(RwLock::new(EwcPlusPlus::new(EwcConfig::default()))),
            base_lora: Arc::new(RwLock::new(BaseLoRA::new(hidden_dim, 8, 12))),
            scheduler: BackgroundScheduler::new(config.extraction_interval),
            config,
        }
    }

    /// Run background learning cycle
    pub async fn run_cycle(&self, trajectories: Vec<QueryTrajectory>) -> BackgroundResult {
        if trajectories.len() < self.config.min_trajectories {
            return BackgroundResult::skipped("insufficient trajectories");
        }

        let start = Instant::now();

        // 1. Add trajectories to reasoning bank
        {
            let mut bank = self.reasoning_bank.write().await;
            for trajectory in &trajectories {
                bank.add_trajectory(trajectory);
            }
        }

        // 2. Extract patterns
        let patterns = {
            let mut bank = self.reasoning_bank.write().await;
            bank.extract_patterns()
        };

        // 3. Compute gradients from patterns
        let gradients = self.compute_pattern_gradients(&patterns);

        // 4. Apply EWC++ constraints
        let constrained_gradients = {
            let ewc = self.ewc.read().await;
            ewc.apply_constraints(&gradients)
        };

        // 5. Update base LoRA
        {
            let mut lora = self.base_lora.write().await;
            self.apply_gradients_to_lora(&mut lora, &constrained_gradients);
        }

        // 6. Update EWC++ Fisher information
        {
            let mut ewc = self.ewc.write().await;
            ewc.update_fisher(&constrained_gradients);
        }

        BackgroundResult {
            trajectories_processed: trajectories.len(),
            patterns_extracted: patterns.len(),
            elapsed: start.elapsed(),
            status: "completed".to_string(),
        }
    }

    fn compute_pattern_gradients(&self, patterns: &[LearnedPattern]) -> Vec<f32> {
        // Aggregate pattern centroids weighted by quality
        let mut gradient = vec![0.0f32; patterns.first().map(|p| p.centroid.len()).unwrap_or(0)];
        let mut total_weight = 0.0;

        for pattern in patterns {
            let weight = pattern.avg_quality * pattern.cluster_size as f32;
            for (i, &v) in pattern.centroid.iter().enumerate() {
                gradient[i] += v * weight;
            }
            total_weight += weight;
        }

        if total_weight > 0.0 {
            gradient.iter_mut().for_each(|v| *v /= total_weight);
        }

        gradient
    }

    fn apply_gradients_to_lora(&self, lora: &mut BaseLoRA, gradients: &[f32]) {
        // Distribute gradients across layers
        let per_layer = gradients.len() / lora.layers.len();

        for (layer_idx, layer) in lora.layers.iter_mut().enumerate() {
            let start = layer_idx * per_layer;
            let end = (start + per_layer).min(gradients.len());

            // Update up projection
            for (i, &grad) in gradients[start..end].iter().enumerate() {
                if i < layer.up_proj.len() {
                    layer.up_proj[i] += grad * self.config.base_lora_lr;
                }
            }
        }
    }
}

#[derive(Debug)]
pub struct BackgroundResult {
    pub trajectories_processed: usize,
    pub patterns_extracted: usize,
    pub elapsed: Duration,
    pub status: String,
}

impl BackgroundResult {
    fn skipped(reason: &str) -> Self {
        Self {
            trajectories_processed: 0,
            patterns_extracted: 0,
            elapsed: Duration::ZERO,
            status: format!("skipped: {}", reason),
        }
    }
}

2.3 Loop C (Deep Learning)

Deliverables:

  • Weekly dream generation
  • Memory consolidation
  • Full EWC++ update

Implementation:

// src/sona/loops/deep.rs

/// Deep learning loop (weekly)
pub struct DeepLoop {
    dream_engine: Arc<RwLock<DreamEngine>>,
    memory_consolidator: Arc<RwLock<MemoryConsolidator>>,
    ewc: Arc<RwLock<EwcPlusPlus>>,
    phi_evaluator: Arc<PhiEvaluator>,
    config: DeepLoopConfig,
}

#[derive(Clone)]
pub struct DeepLoopConfig {
    pub dreams_per_cycle: usize,
    pub consolidation_threshold: f32,
    pub phi_threshold: f64,
    pub max_cycle_duration: Duration,
}

impl Default for DeepLoopConfig {
    fn default() -> Self {
        Self {
            dreams_per_cycle: 50,
            consolidation_threshold: 0.7,
            phi_threshold: 0.3,
            max_cycle_duration: Duration::from_secs(600),  // 10 minutes
        }
    }
}

impl DeepLoop {
    pub async fn run_cycle(&self) -> DeepResult {
        let start = Instant::now();
        let deadline = start + self.config.max_cycle_duration;

        // 1. Generate dreams
        let dreams = {
            let engine = self.dream_engine.read().await;
            engine.generate_dreams(self.config.dreams_per_cycle)
        };

        // 2. Evaluate dreams with Φ
        let mut evaluated_dreams = Vec::new();
        for dream in &dreams {
            if Instant::now() > deadline {
                break;
            }

            let phi = self.phi_evaluator.evaluate_dream(dream);
            if phi >= self.config.phi_threshold {
                evaluated_dreams.push((dream.clone(), phi));
            }
        }

        // 3. Integrate high-quality dreams
        {
            let mut engine = self.dream_engine.write().await;
            for (dream, _phi) in &evaluated_dreams {
                engine.integrate_dream(dream);
            }
        }

        // 4. Consolidate memory
        let consolidation_result = {
            let mut consolidator = self.memory_consolidator.write().await;
            consolidator.consolidate(self.config.consolidation_threshold).await
        };

        // 5. Full EWC++ consolidation
        {
            let mut ewc = self.ewc.write().await;
            ewc.consolidate_all_tasks();
        }

        DeepResult {
            dreams_generated: dreams.len(),
            dreams_integrated: evaluated_dreams.len(),
            patterns_strengthened: consolidation_result.strengthened,
            patterns_pruned: consolidation_result.pruned,
            elapsed: start.elapsed(),
        }
    }
}

#[derive(Debug)]
pub struct DeepResult {
    pub dreams_generated: usize,
    pub dreams_integrated: usize,
    pub patterns_strengthened: usize,
    pub patterns_pruned: usize,
    pub elapsed: Duration,
}

Phase 3: Pattern Learning

3.1 ReasoningBank Implementation

Deliverables:

  • Trajectory storage with circular buffer
  • K-means++ pattern extraction
  • Verdict judgment system

3.2 EWC++ Implementation

Deliverables:

  • Online Fisher information estimation
  • Multi-task memory with circular buffer
  • Automatic task boundary detection
  • Adaptive lambda scheduling

3.3 Dream Engine

Deliverables:

  • Random walk dream generation
  • Quality evaluation (novelty, coherence, utility)
  • Dream integration with weak edges

Phase 4: Integration

4.1 Unified Pipeline

Deliverables:

  • SonaEngine main interface
  • Loop coordinator
  • Metrics collection

4.2 ruvector Integration

Deliverables:

  • Pattern index with HNSW
  • Knowledge graph with GNN
  • Persistent storage with PostgreSQL

4.3 exo-ai Integration

Deliverables:

  • Φ measurement for quality
  • Temporal pattern learning
  • Quantum-inspired exploration

Phase 5: Optimization

5.1 SIMD Optimization

Deliverables:

  • AVX2 LoRA forward pass
  • SIMD pattern matching
  • Vectorized gradient computation

5.2 Memory Optimization

Deliverables:

  • Lock-free data structures
  • Memory pooling
  • Gradient checkpointing

5.3 Latency Optimization

Deliverables:

  • Sub-100μs micro-updates
  • Async background processing
  • Batched operations

Testing Strategy

Unit Tests

// Every public function gets a test
#[cfg(test)]
mod tests {
    // Pattern extraction tests
    #[test]
    fn test_pattern_extraction_empty() { }
    #[test]
    fn test_pattern_extraction_single() { }
    #[test]
    fn test_pattern_extraction_multiple() { }

    // LoRA tests
    #[test]
    fn test_micro_lora_forward() { }
    #[test]
    fn test_micro_lora_gradient_accumulation() { }
    #[test]
    fn test_base_lora_merge() { }

    // EWC tests
    #[test]
    fn test_ewc_constraint_application() { }
    #[test]
    fn test_fisher_update() { }
    #[test]
    fn test_task_boundary_detection() { }
}

Integration Tests

#[tokio::test]
async fn test_full_learning_cycle() {
    let sona = SonaEngine::new(SonaConfig::default()).await.unwrap();

    // Simulate queries
    for i in 0..100 {
        let response = sona.process(&format!("query {}", i), &Context::default()).await;
        assert!(response.is_ok());
    }

    // Trigger background learning
    let result = sona.background_learn().await.unwrap();
    assert!(result.patterns_learned > 0);
}

Benchmarks

#[bench]
fn bench_micro_lora_forward(b: &mut Bencher) {
    let lora = MicroLoRA::new(256, 1);
    let input = vec![0.1f32; 256];
    let mut output = vec![0.0f32; 256];

    b.iter(|| {
        unsafe { lora.forward_simd(&input, &mut output) };
    });
}

#[bench]
fn bench_pattern_extraction(b: &mut Bencher) {
    let mut bank = ReasoningBank::new(PatternConfig::default());
    // Pre-populate with trajectories

    b.iter(|| {
        bank.extract_patterns()
    });
}

Success Criteria

Metric Target Measurement
Micro-LoRA latency <50μs Benchmark
Background cycle <30s Benchmark
Deep cycle <10min Benchmark
Pattern quality >0.7 avg Metrics
Memory overhead <100MB Profiling
Φ threshold >0.3 IIT measurement

Risk Mitigation

Risk Mitigation
SIMD portability Feature flags for fallback
Memory pressure Configurable buffer sizes
Learning instability EWC++ constraints
Catastrophic forgetting Multi-task Fisher memory
Latency regression Continuous benchmarking

QUICK-START: Minimal Viable SONA

For immediate value, implement this minimal 3-file addition:

File 1: src/sona/mod.rs

//! SONA - Self-Optimizing Neural Architecture
pub mod types;
pub mod lora;

pub use types::*;
pub use lora::MicroLoRA;

File 2: src/sona/types.rs (Minimal)

use std::time::Instant;

/// Minimal learning signal
#[derive(Clone, Debug)]
pub struct LearningSignal {
    pub embedding: Vec<f32>,
    pub quality: f32,
}

/// Minimal trajectory step
#[derive(Clone, Debug)]
pub struct TrajectoryStep {
    pub hidden_state: Vec<f32>,
    pub reward: f32,
}

/// Query trajectory
#[derive(Clone, Debug)]
pub struct QueryTrajectory {
    pub id: u64,
    pub steps: Vec<TrajectoryStep>,
    pub final_quality: f32,
}

impl LearningSignal {
    pub fn from_trajectory(t: &QueryTrajectory) -> Self {
        // Simple: use last hidden state, weighted by quality
        let embedding = t.steps.last()
            .map(|s| s.hidden_state.clone())
            .unwrap_or_default();
        Self {
            embedding,
            quality: t.final_quality,
        }
    }
}

File 3: src/sona/lora.rs (Minimal MicroLoRA)

/// Minimal Micro-LoRA (rank-1)
pub struct MicroLoRA {
    pub down: Vec<f32>,  // [hidden_dim]
    pub up: Vec<f32>,    // [hidden_dim]
    accum: Vec<f32>,
    count: usize,
}

impl MicroLoRA {
    pub fn new(dim: usize) -> Self {
        Self {
            down: vec![0.01; dim],
            up: vec![0.0; dim],
            accum: vec![0.0; dim],
            count: 0,
        }
    }

    /// Forward: output += scale * (input · down) * up
    pub fn forward(&self, input: &[f32], output: &mut [f32]) {
        let dot: f32 = input.iter().zip(&self.down).map(|(a, b)| a * b).sum();
        let scale = 0.1;
        for (o, &u) in output.iter_mut().zip(&self.up) {
            *o += dot * u * scale;
        }
    }

    /// Accumulate gradient signal
    pub fn accumulate(&mut self, signal: &super::types::LearningSignal) {
        for (a, &e) in self.accum.iter_mut().zip(&signal.embedding) {
            *a += e * signal.quality;
        }
        self.count += 1;
    }

    /// Apply accumulated updates
    pub fn apply(&mut self, lr: f32) {
        if self.count == 0 { return; }
        let scale = lr / self.count as f32;
        for (u, &a) in self.up.iter_mut().zip(&self.accum) {
            *u += a * scale;
        }
        self.accum.fill(0.0);
        self.count = 0;
    }
}

Integration Point: src/learning.rs

Add to LearningService:

use crate::sona::{MicroLoRA, QueryTrajectory, LearningSignal};

impl LearningService {
    // Add field: micro_lora: RwLock<MicroLoRA>

    pub fn on_inference_complete(&self, trajectory: QueryTrajectory) {
        let signal = LearningSignal::from_trajectory(&trajectory);
        if let Ok(mut lora) = self.micro_lora.try_write() {
            lora.accumulate(&signal);
        }
    }

    pub fn flush_micro_updates(&self) {
        if let Ok(mut lora) = self.micro_lora.write() {
            lora.apply(0.001);
        }
    }
}

This gives you:

  • Trajectory recording structure
  • Per-request gradient accumulation
  • Micro-LoRA adaptation
  • No breaking changes to existing API

Total: ~150 lines of new code


Critical Success Metrics

Metric Sprint 1 Sprint 2 Sprint 3 Sprint 4
Micro-LoRA latency <50μs - - -
Trajectory overhead <10μs - - -
EWC++ constraint - <500μs - -
Pattern extraction - <1s/1000 - -
Loop A total - - <1ms -
Loop B cycle - - <30s -
Dream generation - - - <100ms

Risk Mitigation (Updated)

Risk Mitigation Owner
SIMD portability Feature flag #[cfg(target_arch)] with scalar fallback Sprint 1
Memory pressure Circular buffers with configurable capacity Sprint 1
Learning instability Start with conservative lr=0.0001 Sprint 1
Breaking changes All SONA code in separate module All
Integration complexity Inject via trait, not inheritance Sprint 2+

Week 1: Sprint 1 - Foundation
├── Day 1-2: src/sona/types.rs + tests
├── Day 3-4: src/sona/lora.rs + SIMD + benchmarks
└── Day 5: Integration into orchestrator

Week 2: Sprint 2 - Learning
├── Day 1-2: Upgrade EWCState → EwcPlusPlus
├── Day 3-4: ReasoningBank with K-means++
└── Day 5: Integration + benchmarks

Week 3: Sprint 3 - Loops
├── Day 1-2: Loop A (InstantLoop)
├── Day 3-4: Loop B (BackgroundLoop)
└── Day 5: LoopCoordinator

Week 4: Sprint 4 - Dreams (Optional)
├── Day 1-3: DreamEngine
├── Day 4-5: Φ integration (if exo-ai available)

Next Steps

  1. 08-BENCHMARKS.md - Detailed performance targets
  2. 09-API-REFERENCE.md - Complete API documentation