# Neural Self-Learning DAG Architecture

## Overview

The Neural Self-Learning DAG system transforms RuVector-Postgres from a static query executor into an adaptive system that learns optimal configurations from query patterns.

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        NEURAL DAG RUVECTOR-POSTGRES                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         SQL INTERFACE LAYER                         │   │
│  │    ruvector_enable_neural_dag() | ruvector_dag_patterns() | ...     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│  ┌─────────────────────────────────┴───────────────────────────────────┐   │
│  │                        QUERY OPTIMIZER LAYER                        │   │
│  │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐     │   │
│  │ │   Pattern   │ │  Attention  │ │    Cost     │ │    Plan    │     │   │
│  │ │   Matcher   │ │  Selector   │ │  Estimator  │ │  Rewriter  │     │   │
│  │ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│  ┌─────────────────────────────────┴───────────────────────────────────┐   │
│  │                         DAG ATTENTION LAYER                         │   │
│  │ ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐         │   │
│  │ │Topological│  │  Causal   │  │ Critical  │  │  MinCut   │         │   │
│  │ │ Attention │  │   Cone    │  │   Path    │  │   Gated   │         │   │
│  │ └───────────┘  └───────────┘  └───────────┘  └───────────┘         │   │
│  │ ┌───────────┐  ┌───────────┐  ┌───────────┐                        │   │
│  │ │Hierarchic │  │ Parallel  │  │ Temporal  │                        │   │
│  │ │  Lorentz  │  │  Branch   │  │   BTSP    │                        │   │
│  │ └───────────┘  └───────────┘  └───────────┘                        │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│  ┌─────────────────────────────────┴───────────────────────────────────┐   │
│  │                         SONA LEARNING LAYER                         │   │
│  │ ┌─────────────────────────────────────────────────────────────┐    │   │
│  │ │ INSTANT LOOP (<100μs)              BACKGROUND LOOP (hourly) │    │   │
│  │ │ ┌─────────────┐                 ┌─────────────┐             │    │   │
│  │ │ │  MicroLoRA  │                 │  BaseLoRA   │             │    │   │
│  │ │ │ (rank 1-2)  │                 │  (rank 8)   │             │    │   │
│  │ │ └─────────────┘                 └─────────────┘             │    │   │
│  │ │ ┌─────────────┐                 ┌─────────────┐             │    │   │
│  │ │ │ Trajectory  │ ──────────────► │ ReasoningBk │             │    │   │
│  │ │ │   Buffer    │                 │  (K-means)  │             │    │   │
│  │ │ └─────────────┘                 └─────────────┘             │    │   │
│  │ │ ┌─────────────┐                                             │    │   │
│  │ │ │    EWC++    │                                             │    │   │
│  │ │ │ (forgetting)│                                             │    │   │
│  │ │ └─────────────┘                                             │    │   │
│  │ └─────────────────────────────────────────────────────────────┘    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│  ┌─────────────────────────────────┴───────────────────────────────────┐   │
│  │                         OPTIMIZATION LAYER                          │   │
│  │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐     │   │
│  │ │   MinCut    │ │     HDC     │ │    BTSP     │ │   Self-    │     │   │
│  │ │  Analysis   │ │    State    │ │   Memory    │ │  Healing   │     │   │
│  │ │  O(n^0.12)  │ │ Compression │ │  One-Shot   │ │   Engine   │     │   │
│  │ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│  ┌─────────────────────────────────┴───────────────────────────────────┐   │
│  │                            STORAGE LAYER                            │   │
│  │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐     │   │
│  │ │   Pattern   │ │  Embedding  │ │ Trajectory  │ │   Index    │     │   │
│  │ │    Store    │ │    Cache    │ │   History   │ │  Metadata  │     │   │
│  │ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                  OPTIONAL: QUDAG CONSENSUS LAYER                    │   │
│  │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐     │   │
│  │ │  Federated  │ │   Pattern   │ │   ML-DSA    │ │    rUv     │     │   │
│  │ │  Learning   │ │  Consensus  │ │ Signatures  │ │   Tokens   │     │   │
│  │ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Component Descriptions

### 1. SQL Interface Layer

Provides PostgreSQL-native functions for interacting with the Neural DAG system.

**Key Components:**
- `ruvector_enable_neural_dag()` - Enable learning for a table
- `ruvector_dag_patterns()` - View learned patterns
- `ruvector_attention_*()` - DAG attention functions
- `ruvector_dag_learn()` - Trigger learning cycle

**Location:** `crates/ruvector-postgres/src/dag/operators.rs`

### 2. Query Optimizer Layer

Intercepts queries and applies learned optimizations.

**Key Components:**
- **Pattern Matcher**: Finds similar past query patterns via cosine similarity
- **Attention Selector**: UCB bandit for choosing optimal attention type
- **Cost Estimator**: Adaptive cost model with micro-LoRA updates
- **Plan Rewriter**: Applies learned operator ordering and parameters

**Location:** `crates/ruvector-postgres/src/dag/optimizer.rs`
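The Pattern Matcher's core idea can be sketched in a few lines: score stored pattern embeddings against the query-plan embedding by cosine similarity and keep the top-k. This is an illustrative stand-in, not the crate's actual API; function names and shapes are ours.

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return (index, score) for the k most similar stored patterns.
fn top_k_matches(query: &[f32], patterns: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = patterns
        .iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(query, p)))
        .collect();
    // Highest similarity first; embeddings are assumed NaN-free.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```

In the real layer the candidate set comes from the ReasoningBank rather than a plain slice, but the scoring rule is the same.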

### 3. DAG Attention Layer

Seven specialized attention mechanisms for DAG structures.

| Attention Type | Use Case | Complexity |
|----------------|----------|------------|
| Topological | Respect DAG ordering | O(n·k) |
| Causal Cone | Distance-weighted ancestors | O(n·d) |
| Critical Path | Focus on bottlenecks | O(n + critical_len) |
| MinCut Gated | Gate by criticality | O(n^0.12 + n·k) |
| Hierarchical Lorentz | Deep nesting | O(n·d) |
| Parallel Branch | Coordinate branches | O(n·b) |
| Temporal BTSP | Time-correlated patterns | O(n·w) |

**Location:** `crates/ruvector-postgres/src/dag/attention/`
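The masking rule behind Topological attention can be illustrated concretely: a node may only attend to its ancestors, here restricted to at most `k` hops (matching the O(n·k) bound in the table). This is a minimal sketch under our own naming, not the crate's implementation.

```rust
use std::collections::HashSet;

/// Collect the ancestors of `node` reachable within `k` hops.
/// `parents[i]` lists the direct parents of node i in the DAG.
fn ancestor_mask(parents: &[Vec<usize>], node: usize, k: usize) -> Vec<usize> {
    let mut frontier = vec![node];
    let mut seen = HashSet::new();
    for _ in 0..k {
        let mut next = Vec::new();
        for &n in &frontier {
            for &p in &parents[n] {
                if seen.insert(p) {
                    next.push(p); // first visit: expand in the next hop
                }
            }
        }
        frontier = next;
    }
    let mut out: Vec<usize> = seen.into_iter().collect();
    out.sort();
    out
}
```

Attention weights are then computed only over this ancestor set, which is what keeps the mechanism consistent with the DAG's topological order.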

### 4. SONA Learning Layer

Two-tier learning system for continuous optimization.

**Instant Loop (per-query):**
- MicroLoRA adaptation (rank 1-2)
- Trajectory recording
- <100μs overhead

**Background Loop (hourly):**
- K-means++ pattern extraction
- BaseLoRA updates (rank 8)
- EWC++ constraint application

**Location:** `crates/ruvector-postgres/src/dag/learning/`
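The EWC++ constraint amounts to a quadratic penalty that discourages moving weights the Fisher information marks as important: loss += (λ/2) · Σᵢ Fᵢ · (wᵢ − wᵢ*)². A sketch of the penalty and the decayed Fisher update (mirroring the `ewc_fisher_decay` parameter below); the crate's exact formulas may differ.

```rust
/// EWC quadratic penalty around anchor weights `w_star`.
fn ewc_penalty(lambda: f32, fisher: &[f32], w: &[f32], w_star: &[f32]) -> f32 {
    0.5 * lambda
        * fisher
            .iter()
            .zip(w.iter().zip(w_star))
            .map(|(f, (wi, ws))| f * (wi - ws) * (wi - ws))
            .sum::<f32>()
}

/// Online Fisher estimate: exponential decay of old information,
/// blended with the current squared gradients.
fn decay_fisher(fisher: &mut [f32], grad_sq: &[f32], decay: f32) {
    for (f, g) in fisher.iter_mut().zip(grad_sq) {
        *f = decay * *f + (1.0 - decay) * g;
    }
}
```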

### 5. Optimization Layer

Advanced optimization components.

**Key Components:**
- **MinCut Analysis**: O(n^0.12) bottleneck detection
- **HDC State**: 10K-bit hypervector compression
- **BTSP Memory**: One-shot pattern recall
- **Self-Healing**: Proactive index repair

**Location:** `crates/ruvector-postgres/src/dag/optimization/`
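The HDC component's basic operations are easy to show in miniature: states become fixed-width binary hypervectors, binding is element-wise XOR, and similarity is normalized Hamming overlap. 10,000 bits pack into 157 `u64` words; the constants and names here are illustrative, not the crate's API.

```rust
const WORDS: usize = 157; // ceil(10_000 / 64)

/// Bind two hypervectors (XOR is its own inverse, so unbinding is free).
fn bind(a: &[u64; WORDS], b: &[u64; WORDS]) -> [u64; WORDS] {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Similarity in [0, 1]: 1.0 for identical vectors, 0.0 for complements.
fn similarity(a: &[u64; WORDS], b: &[u64; WORDS]) -> f64 {
    let diff: u32 = (0..WORDS).map(|i| (a[i] ^ b[i]).count_ones()).sum();
    1.0 - f64::from(diff) / (WORDS as f64 * 64.0)
}
```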

### 6. Storage Layer

Persistent storage for learned patterns and state.

**Key Components:**
- **Pattern Store**: DashMap + PostgreSQL tables
- **Embedding Cache**: LRU cache for hot embeddings
- **Trajectory History**: Ring buffer for recent queries
- **Index Metadata**: Pattern-to-index mappings

**Location:** `crates/ruvector-postgres/src/dag/storage/`
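The Trajectory History ring buffer can be sketched as follows: a fixed-capacity buffer that overwrites the oldest entry once full, so memory stays bounded no matter how many queries run. This is a single-threaded stand-in for the lock-free queue the real layer uses.

```rust
/// Fixed-capacity ring buffer; `push` overwrites the oldest slot when full.
struct RingBuffer<T> {
    buf: Vec<Option<T>>,
    head: usize,
    len: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { buf: (0..capacity).map(|_| None).collect(), head: 0, len: 0 }
    }

    fn push(&mut self, item: T) {
        self.buf[self.head] = Some(item);
        self.head = (self.head + 1) % self.buf.len();
        self.len = (self.len + 1).min(self.buf.len());
    }

    fn len(&self) -> usize {
        self.len
    }
}
```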

### 7. QuDAG Consensus Layer (Optional)

Distributed learning via quantum-resistant consensus.

**Key Components:**
- **Federated Learning**: Privacy-preserving pattern sharing
- **Pattern Consensus**: QR-Avalanche for pattern validation
- **ML-DSA Signatures**: Quantum-resistant pattern signing
- **rUv Tokens**: Incentivize learning contributions

**Location:** `crates/ruvector-postgres/src/dag/qudag/`

## Data Flow

### Query Execution Flow

```
              SQL Query
                  │
                  ▼
┌─────────────────────────────────────┐
│ 1. Pattern Matching                 │
│    - Embed query plan               │
│    - Find similar patterns in       │
│      ReasoningBank (cosine sim)     │
│    - Return top-k matches           │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 2. Optimization Decision            │
│    - If pattern found (conf > 0.8): │
│      Apply learned configuration    │
│    - Else:                          │
│      Use defaults + micro-LoRA      │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 3. Attention Selection              │
│    - UCB bandit selects attention   │
│    - Based on query pattern type    │
│    - Exploration vs exploitation    │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 4. Plan Execution                   │
│    - Execute with optimized params  │
│    - Record operator timings        │
│    - Track intermediate results     │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 5. Trajectory Recording             │
│    - Store query embedding          │
│    - Store operator activations     │
│    - Store outcome metrics          │
│    - Compute quality score          │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 6. Instant Learning                 │
│    - MicroLoRA gradient accumulate  │
│    - Auto-flush at 100 queries      │
│    - Update attention selector      │
└─────────────────────────────────────┘
```
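Step 3 above is a classic multi-armed-bandit choice: pick the attention type maximizing mean reward plus a UCB1 exploration bonus, with c ≈ 1.414 as in the config defaults. A minimal sketch (our own function names, not the crate's):

```rust
/// UCB1 arm selection over attention types.
/// `mean_reward[i]` and `pulls[i]` track arm i; untried arms go first.
fn ucb1_select(mean_reward: &[f64], pulls: &[u64], c: f64) -> usize {
    let total: u64 = pulls.iter().sum();
    let mut best = 0;
    let mut best_score = f64::MIN;
    for i in 0..pulls.len() {
        let score = if pulls[i] == 0 {
            f64::INFINITY // always try an untested attention type first
        } else {
            mean_reward[i] + c * ((total as f64).ln() / pulls[i] as f64).sqrt()
        };
        if score > best_score {
            best = i;
            best_score = score;
        }
    }
    best
}
```

The bonus term shrinks as an arm accumulates pulls, which is exactly the exploration-vs-exploitation trade-off named in the diagram.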

### Learning Cycle Flow

```
            Hourly Trigger
                  │
                  ▼
┌─────────────────────────────────────┐
│ 1. Drain Trajectory Buffer          │
│    - Collect 1000+ trajectories     │
│    - Filter by quality threshold    │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 2. K-means++ Clustering             │
│    - 100 clusters                   │
│    - Deterministic initialization   │
│    - Max 100 iterations             │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 3. Pattern Extraction               │
│    - Compute cluster centroids      │
│    - Extract optimal parameters     │
│    - Calculate confidence scores    │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 4. EWC++ Constraint Check           │
│    - Compute Fisher information     │
│    - Apply forgetting prevention    │
│    - Detect task boundaries         │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 5. BaseLoRA Update                  │
│    - Apply constrained gradients    │
│    - Update all layers              │
│    - Merge weights if needed        │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ 6. ReasoningBank Update             │
│    - Store new patterns             │
│    - Consolidate similar patterns   │
│    - Evict low-confidence patterns  │
└─────────────────────────────────────┘
```
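The "deterministic initialization" in step 2 can be sketched as a greedy farthest-point variant of k-means++ seeding: fix the first centroid, then repeatedly take the point with the largest squared distance to the chosen set. This is an illustrative stand-in for the crate's seeding (classic k-means++ samples proportionally to D²; the greedy argmax shown here is what makes it deterministic). It assumes `k` does not exceed the number of distinct points.

```rust
fn dist_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Greedy farthest-point seeding; returns indices of the chosen centroids.
fn seed_centroids(points: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut chosen = vec![0usize]; // deterministic: start from point 0
    while chosen.len() < k {
        // Pick the point whose nearest chosen centroid is farthest away.
        let next = (0..points.len())
            .max_by(|&i, &j| {
                let di = chosen.iter().map(|&c| dist_sq(&points[i], &points[c]))
                    .fold(f32::MAX, f32::min);
                let dj = chosen.iter().map(|&c| dist_sq(&points[j], &points[c]))
                    .fold(f32::MAX, f32::min);
                di.partial_cmp(&dj).unwrap()
            })
            .unwrap();
        chosen.push(next);
    }
    chosen
}
```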

## Module Dependencies

```
ruvector-postgres/src/dag/
├── mod.rs                    # Module root, re-exports
├── operators.rs              # SQL function definitions
│
├── attention/
│   ├── mod.rs                # Attention trait and registry
│   ├── topological.rs        # TopologicalAttention
│   ├── causal_cone.rs        # CausalConeAttention
│   ├── critical_path.rs      # CriticalPathAttention
│   ├── mincut_gated.rs       # MinCutGatedAttention
│   ├── hierarchical.rs       # HierarchicalLorentzAttention
│   ├── parallel_branch.rs    # ParallelBranchAttention
│   ├── temporal_btsp.rs      # TemporalBTSPAttention
│   └── ensemble.rs           # EnsembleAttention
│
├── learning/
│   ├── mod.rs                # Learning coordinator
│   ├── sona_engine.rs        # SONA integration wrapper
│   ├── trajectory.rs         # Trajectory buffer
│   ├── patterns.rs           # Pattern extraction
│   ├── reasoning_bank.rs     # Pattern storage
│   ├── ewc.rs                # EWC++ integration
│   └── attention_selector.rs # UCB bandit selector
│
├── optimizer/
│   ├── mod.rs                # Optimizer coordinator
│   ├── pattern_matcher.rs    # Pattern matching
│   ├── cost_estimator.rs     # Adaptive costs
│   └── plan_rewriter.rs      # Plan transformation
│
├── optimization/
│   ├── mod.rs                # Optimization utilities
│   ├── mincut.rs             # Min-cut integration
│   ├── hdc_state.rs          # HDC compression
│   ├── btsp_memory.rs        # BTSP one-shot
│   └── self_healing.rs       # Self-healing engine
│
├── storage/
│   ├── mod.rs                # Storage coordinator
│   ├── pattern_store.rs      # Pattern persistence
│   ├── embedding_cache.rs    # Embedding LRU
│   └── trajectory_store.rs   # Trajectory history
│
├── qudag/
│   ├── mod.rs                # QuDAG integration
│   ├── federated.rs          # Federated learning
│   ├── consensus.rs          # Pattern consensus
│   ├── signatures.rs         # ML-DSA signing
│   └── tokens.rs             # rUv token interface
│
└── types/
    ├── mod.rs                # Type definitions
    ├── neural_plan.rs        # NeuralDagPlan
    ├── trajectory.rs         # DagTrajectory
    ├── pattern.rs            # LearnedDagPattern
    └── metrics.rs            # ExecutionMetrics
```

## Configuration

### Default Configuration

```rust
pub struct NeuralDagConfig {
    // Learning
    pub learning_enabled: bool,              // true
    pub max_trajectories: usize,             // 10000
    pub pattern_clusters: usize,             // 100
    pub quality_threshold: f32,              // 0.3
    pub background_interval_ms: u64,         // 3600000 (1 hour)

    // Attention
    pub default_attention: DagAttentionType, // Topological
    pub attention_exploration: f32,          // 0.1
    pub ucb_exploration_c: f32,              // 1.414

    // SONA
    pub micro_lora_rank: usize,              // 2
    pub micro_lora_lr: f32,                  // 0.002
    pub base_lora_rank: usize,               // 8
    pub base_lora_lr: f32,                   // 0.001

    // EWC++
    pub ewc_lambda: f32,                     // 2000.0
    pub ewc_max_lambda: f32,                 // 15000.0
    pub ewc_fisher_decay: f32,               // 0.999

    // MinCut
    pub mincut_enabled: bool,                // true
    pub mincut_threshold: f32,               // 0.5

    // HDC
    pub hdc_dimensions: usize,               // 10000

    // Self-Healing
    pub healing_enabled: bool,               // true
    pub healing_check_interval_ms: u64,      // 300000 (5 min)
}
```
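The comments above give the defaults; pinning them down with a `Default` impl keeps code and documentation from drifting apart. A sketch for an illustrative subset of the learning fields (not the crate's actual impl):

```rust
/// Trimmed stand-in for the learning section of NeuralDagConfig.
#[derive(Debug)]
struct LearningDefaults {
    learning_enabled: bool,
    max_trajectories: usize,
    pattern_clusters: usize,
    quality_threshold: f32,
}

impl Default for LearningDefaults {
    fn default() -> Self {
        Self {
            learning_enabled: true,
            max_trajectories: 10_000,
            pattern_clusters: 100,
            quality_threshold: 0.3,
        }
    }
}
```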

### PostgreSQL GUC Variables

```sql
-- Enable/disable neural DAG
SET ruvector.neural_dag_enabled = true;

-- Learning parameters
SET ruvector.dag_learning_rate = 0.002;
SET ruvector.dag_pattern_clusters = 100;
SET ruvector.dag_quality_threshold = 0.3;

-- Attention parameters
SET ruvector.dag_attention_type = 'auto';
SET ruvector.dag_attention_exploration = 0.1;

-- EWC parameters
SET ruvector.dag_ewc_lambda = 2000.0;

-- MinCut parameters
SET ruvector.dag_mincut_enabled = true;
SET ruvector.dag_mincut_threshold = 0.5;
```

## Performance Targets

| Operation | Target Latency | Notes |
|-----------|----------------|-------|
| Pattern matching | <1ms | Top-5 similar patterns |
| Attention computation | <500μs | Per operator |
| MicroLoRA forward | <100μs | Per query |
| Trajectory recording | <50μs | Non-blocking |
| Background learning | <5s | 1000 trajectories |
| MinCut analysis | <10ms | O(n^0.12) |
| HDC encoding | <100μs | 10K dimensions |

## Memory Budget

| Component | Budget | Notes |
|-----------|--------|-------|
| Pattern Store | 50MB | ~1000 patterns per table |
| Embedding Cache | 20MB | LRU for hot embeddings |
| Trajectory Buffer | 20MB | 10K trajectories |
| MicroLoRA | 10KB | Per active query |
| BaseLoRA | 400KB | Per table |
| HDC State | 1.2KB | Per state snapshot |

**Total per table:** ~100MB maximum

## Thread Safety

All components use thread-safe primitives:

- `DashMap` for concurrent pattern storage
- `parking_lot::RwLock` for embedding cache
- `crossbeam::ArrayQueue` for trajectory buffer
- `AtomicU64` for counters and statistics
- PostgreSQL background workers for learning cycles
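The `AtomicU64` counters mentioned above need no mutex: statistics can be updated from many threads with relaxed atomics. A small self-contained sketch (the struct and field names are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lock-free statistics: counts are independent, so Relaxed ordering suffices.
struct DagStats {
    queries: AtomicU64,
    pattern_hits: AtomicU64,
}

impl DagStats {
    const fn new() -> Self {
        Self { queries: AtomicU64::new(0), pattern_hits: AtomicU64::new(0) }
    }

    fn record(&self, hit: bool) {
        self.queries.fetch_add(1, Ordering::Relaxed);
        if hit {
            self.pattern_hits.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn hit_rate(&self) -> f64 {
        let q = self.queries.load(Ordering::Relaxed);
        if q == 0 {
            0.0
        } else {
            self.pattern_hits.load(Ordering::Relaxed) as f64 / q as f64
        }
    }
}
```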

## Error Handling

```rust
pub enum NeuralDagError {
    // Configuration errors
    InvalidConfig(String),
    TableNotEnabled(String),

    // Learning errors
    InsufficientTrajectories,
    PatternExtractionFailed,
    EwcConstraintViolation,

    // Attention errors
    AttentionComputationFailed,
    InvalidDagStructure,

    // Storage errors
    PatternStoreFull,
    EmbeddingCacheMiss,

    // MinCut errors
    MinCutComputationFailed,
    GraphDisconnected,

    // QuDAG errors (optional)
    ConsensusTimeout,
    SignatureVerificationFailed,
}
```

All errors are logged and non-fatal: the system falls back to default behavior on error.
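That fallback policy can be sketched as a single match: log the error, then proceed with the default plan. The plan representation and variant subset below are illustrative, not the crate's actual types.

```rust
#[derive(Debug)]
enum NeuralDagError {
    InsufficientTrajectories,
    AttentionComputationFailed,
}

/// Any neural-DAG failure degrades to the default plan instead of aborting.
fn plan_with_fallback(
    optimized: Result<&'static str, NeuralDagError>,
) -> &'static str {
    match optimized {
        Ok(plan) => plan,
        Err(e) => {
            // Logged, non-fatal: the query still runs.
            eprintln!("neural dag disabled for this query: {:?}", e);
            "default_plan"
        }
    }
}
```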