Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/docs/nervous-system/deployment.md
+++ b/vendor/ruvector/docs/nervous-system/deployment.md
@@ -0,0 +1,676 @@
+# RuVector Nervous System: Deployment Mapping & Build Order
+
+## Executive Summary
+
+This document defines the deployment architecture and three-phase build order for the RuVector Nervous System, integrating hyperdimensional computing (HDC), Modern Hopfield networks, and biologically-inspired learning with Cognitum neuromorphic hardware.
+
+**Key Goals:**
+- 10× energy efficiency improvement over baseline HNSW
+- Sub-millisecond inference latency
+- Exponential capacity scaling with dimension
+- Online learning with forgetting prevention
+- Deterministic safety guarantees
+
+---
+
+## Deployment Tiers
+
+### Tier 1: Cognitum Worker Tiles (Reflex Tier)
+
+**Purpose:** Ultra-low-latency event processing and reflexive responses
+
+**Components Deployed:**
+- Event ingestion pipeline
+- K-WTA selection circuits
+- Dendritic coincidence detection
+- BTSP one-shot learning gates
+- Hard safety validators
+- Bounded event queues
+
+**Hardware Constraints:**
+- **Memory:** On-tile SRAM only (no external DRAM access)
+- **Bandwidth:** Zero off-tile memory bandwidth during reflex path
+- **Timing:** Deterministic execution with hard bounds
+- **Queue Depth:** Fixed-size circular buffers (configurable, e.g., 256 events)
+
+**Operational Characteristics:**
+- **Latency Target:** <100μs event→action
+- **Energy Target:** <1μJ per query
+- **Sparsity:** 2-5% neuron activation
+- **Determinism:** Maximum iteration counts enforced
+
+**Safety Mechanisms:**
+- Hard timeout enforcement (circuit breaker)
+- Input validation gates
+- Witness logging for all safety-critical decisions
+- Automatic fallback to safe default state
+
+---
+
+### Tier 2: Cognitum Hub (Coordinator Cores)
+
+**Purpose:** Cross-tile coordination and plasticity consolidation
+
+**Components Deployed:**
+- Routing decision logic
+- Plasticity consolidation engine (EWC, CLS)
+- Workspace coordinator (Global Workspace Theory)
+- Coherence-gated routing
+- Inter-tile communication manager
+
+**Memory Architecture:**
+- **L1/L2:** Per-core cache for hot paths
+- **L3:** Coherent shared cache across hub cores
+- **Access Pattern:** Cache-friendly sequential scans for consolidation
+
+**Operational Characteristics:**
+- **Latency Target:** <10ms for consolidation operations
+- **Bandwidth:** High coherent bandwidth for multi-tile sync
+- **Plasticity Rate:** Capped updates per second (e.g., 1000 updates/sec)
+- **Coordination:** Supports up to 64 worker tiles per hub
+
+**Safety Mechanisms:**
+- Rate limiting on plasticity updates
+- Threshold versioning for rollback capability
+- Coherence validation before routing decisions
+- Circuit breakers for latency spikes
+
+---
+
+### Tier 3: RuVector Server
+
+**Purpose:** Long-horizon learning and associative memory
+
+**Components Deployed:**
+- Modern Hopfield associative memory
+- HDC pattern separation encoding
+- Continuous Learning with Synaptic Intelligence (CLS)
+- Elastic Weight Consolidation (EWC)
+- Cross-collection analytics
+- Predictive residual learner
+
+**Memory Architecture:**
+- **Storage:** Large-scale vector embeddings in memory
+- **Cache:** Hot pattern cache for frequently accessed memories
+- **Compute:** GPU/SIMD acceleration for Hopfield energy minimization
+- **Persistence:** Periodic snapshots to RuVector Postgres
+
+**Operational Characteristics:**
+- **Latency Target:** <10ms for associative retrieval
+- **Capacity:** Exponential(d) with dimension d
+- **Learning:** Online updates with forgetting prevention
+- **Sparsity:** 2-5% activation via K-WTA
+
+**Safety Mechanisms:**
+- Predictive residual thresholds prevent spurious writes
+- EWC prevents catastrophic forgetting
+- Collection versioning for rollback
+- Automatic fallback to baseline HNSW on failures
+
+---
+
+### Tier 4: RuVector Postgres
+
+**Purpose:** Durable storage and collection parameter versioning
+
+**Components Deployed:**
+- Collection metadata and parameters
+- Threshold versioning (predictive residual gates)
+- BTSP one-shot association windows
+- Long-term trajectory logs
+- Performance metrics and analytics
+
+**Storage Schema:**
+```sql
+-- Collection versioning
+collections (
+  id UUID PRIMARY KEY,
+  version INT NOT NULL,
+  created_at TIMESTAMP,
+  hdc_dimension INT,
+  hopfield_beta FLOAT,
+  kWTA_k INT,
+  predictive_threshold FLOAT
+);
+
+-- BTSP association windows
+btsp_windows (
+  collection_id UUID REFERENCES collections(id),
+  window_start TIMESTAMP,
+  window_end TIMESTAMP,
+  max_one_shot_associations INT,
+  associations_used INT
+);
+
+-- Witness logs (safety-critical decisions)
+witness_logs (
+  timestamp TIMESTAMP,
+  component VARCHAR(50),
+  input_hash BYTEA,
+  output_hash BYTEA,
+  decision VARCHAR(20),
+  latency_us INT
+);
+
+-- Performance metrics
+metrics (
+  timestamp TIMESTAMP,
+  tier VARCHAR(20),
+  operation VARCHAR(50),
+  latency_p50_ms FLOAT,
+  latency_p99_ms FLOAT,
+  energy_uj FLOAT,
+  success_rate FLOAT
+);
+```
+
+**Operational Characteristics:**
+- **Write Pattern:** Gated writes via predictive residual
+- **Read Pattern:** Hot parameter cache in RuVector Server
+- **Versioning:** Immutable collection versions with rollback
+- **Analytics:** Aggregated metrics for performance monitoring
+
+**Safety Mechanisms:**
+- Immutable version history
+- Atomic parameter updates
+- Witness log retention for audit trails
+- Circuit breaker configuration persistence
+
+---
+
+## Three-Phase Build Order
+
+### Phase 1: RuVector Foundation (Months 0-3)
+
+**Objective:** Establish core hyperdimensional and Hopfield primitives with 10× energy efficiency
+
+**Deliverables:**
+
+1. **HDC Module Complete**
+   - Hypervector encoding (bundle, bind, permute)
+   - K-WTA selection with configurable k
+   - Similarity measurement (Hamming, cosine)
+   - Integration with ruvector-core Rust API
+
+2. **Modern Hopfield Retrieval**
+   - Energy minimization via softmax attention
+   - Exponential capacity scaling
+   - GPU/SIMD-accelerated inference
+   - Benchmarked against baseline HNSW
+
+3. **K-WTA Selection**
+   - Top-k neuron activation
+   - Sparsity enforcement (2-5% target)
+   - Hardware-friendly implementation
+   - Latency <100μs for d=10000
+
+4. **Pattern Separation Encoding**
+   - Input→hypervector encoding
+   - Collision resistance validation
+   - Dimensionality reduction benchmarks
+
+5. **Integration with ruvector-core**
+   - Rust bindings for HDC and Hopfield
+   - Unified query API (HNSW + HDC + Hopfield lanes)
+   - Performance regression tests
+
+**Success Criteria:**
+- ✅ 10× energy efficiency vs baseline HNSW
+- ✅ <1ms inference latency for d=10000
+- ✅ Exponential capacity demonstrated (>1M patterns)
+- ✅ 95% retrieval accuracy on standard benchmarks
+
+**Demo:**
+Hybrid search system demonstrating:
+- HNSW lane for precise nearest neighbor
+- HDC lane for robust pattern matching
+- Hopfield lane for associative completion
+- Automatic lane selection based on query type
+
+**Risks & Mitigations:**
+- **Risk:** SIMD optimization complexity
+  - **Mitigation:** Start with naive implementation, profile, optimize hot paths
+- **Risk:** Hopfield capacity limits
+  - **Mitigation:** Benchmark capacity scaling empirically, document limits
+- **Risk:** Integration complexity with existing ruvector-core
+  - **Mitigation:** Incremental integration with feature flags
+
+---
+
+### Phase 2: Cognitum Reflex (Months 3-6)
+
+**Objective:** Deploy ultra-low-latency reflex tier on Cognitum neuromorphic tiles
+
+**Deliverables:**
+
+1. **Event Bus with Bounded Queues**
+   - Fixed-size circular buffers (e.g., 256 events)
+   - Priority-based event scheduling
+   - Overflow handling with graceful degradation
+   - Zero dynamic allocation
+
+2. **Dendritic Coincidence Detection**
+   - Multi-branch dendritic computation
+   - Spatial and temporal coincidence detection
+   - Threshold-based gating
+   - On-tile SRAM-only implementation
+
+3. **BTSP One-Shot Learning**
+   - Single-exposure association formation
+   - Time-windowed eligibility traces
+   - Gated by predictive residual
+   - Postgres-backed association windows
+
+4. **Reflex Tier Deployment on Cognitum Tiles**
+   - Tile-local event processing
+   - Deterministic timing enforcement
+   - Hard timeout circuits
+   - Witness logging for safety gates
+
+**Success Criteria:**
+- ✅ <100μs event→action latency
+- ✅ <1μJ energy per query
+- ✅ 100% deterministic timing (no dynamic allocation)
+- ✅ Zero off-tile memory access in reflex path
+
+**Demo:**
+Real-time event processing on simulated Cognitum environment:
+- High-frequency event stream (10kHz)
+- Sub-100μs reflexive responses
+- BTSP one-shot learning demonstration
+- Safety gate validation under adversarial input
+
+**Risks & Mitigations:**
+- **Risk:** Cognitum hardware availability
+  - **Mitigation:** Develop on cycle-accurate simulator, validate on hardware when available
+- **Risk:** SRAM capacity limits
+  - **Mitigation:** Profile memory usage, optimize data structures, prune cold paths
+- **Risk:** Deterministic timing violations
+  - **Mitigation:** Static analysis of loop bounds, hard timeout enforcement
+- **Risk:** BTSP stability under noise
+  - **Mitigation:** Threshold tuning, windowed eligibility traces
+
+---
+
+### Phase 3: Online Learning & Coherence (Months 6-12)
+
+**Objective:** Distributed online learning with forgetting prevention and multi-chip coordination
+
+**Deliverables:**
+
+1. **E-prop Online Learning**
+   - Eligibility trace-based gradient estimation
+   - Event-driven weight updates
+   - Sparse credit assignment
+   - Integrated with reflex tier
+
+2. **EWC Consolidation**
+   - Fisher Information Matrix estimation
+   - Importance-weighted regularization
+   - Per-collection consolidation
+   - Prevents catastrophic forgetting (<5% degradation)
+
+3. **Coherence-Gated Routing**
+   - Global Workspace Theory (GWT) coordination
+   - Multi-tile coherence validation
+   - Routing decisions based on workspace state
+   - Hub-mediated coordination
+
+4. **Global Workspace Coordination**
+   - Cross-tile broadcast of salient events
+   - Winner-take-all workspace selection
+   - Attention-based routing
+   - Coherent state synchronization
+
+5. **Multi-Chip Cognitum Coordination**
+   - Inter-chip communication protocol
+   - Distributed plasticity updates
+   - Fault tolerance and graceful degradation
+   - Scalability to 4+ chips
+
+**Success Criteria:**
+- ✅ Online learning without centralized consolidation
+- ✅ <5% performance degradation over 1M updates
+- ✅ Coherent routing across 64+ tiles
+- ✅ Multi-chip coordination with <1ms sync latency
+
+**Demo:**
+Continuous learning demonstration:
+- 1M+ online updates without catastrophic forgetting
+- Cross-tile coherence maintained under load
+- Multi-chip coordination with graceful degradation
+- EWC prevents forgetting of critical patterns
+
+**Risks & Mitigations:**
+- **Risk:** E-prop stability under distribution shift
+  - **Mitigation:** Adaptive learning rates, eligibility trace decay tuning
+- **Risk:** EWC computational overhead
+  - **Mitigation:** Sparse Fisher approximation, periodic consolidation
+- **Risk:** Coherence protocol deadlocks
+  - **Mitigation:** Timeout-based fallback, formal verification of protocol
+- **Risk:** Multi-chip synchronization overhead
+  - **Mitigation:** Asynchronous updates with eventual consistency
+
+---
+
+## Risk Controls & Safety Mechanisms
+
+### Deterministic Bounds
+
+**Principle:** Every reflex path has a provable maximum execution time
+
+**Implementation:**
+- **Static Loop Bounds:** All loops have compile-time maximum iteration counts
+- **Hard Timeouts:** Circuit breakers enforce timeouts at hardware level
+- **No Dynamic Allocation:** Zero heap allocation in reflex paths
+- **Bounded Queues:** Fixed-size event queues with overflow handling
+
+**Verification:**
+- Static analysis tools verify loop bounds
+- Runtime assertions validate timeout enforcement
+- Continuous integration tests measure worst-case execution time
+
+---
+
+### Witness Logging
+
+**Principle:** All safety-relevant decisions are logged for audit and debugging
+
+**Logged Events:**
+- **Safety Gate Decisions:** Input hash, output hash, decision (accept/reject)
+- **Timestamps:** High-resolution timestamps for causality tracking
+- **Latencies:** Per-operation latency for anomaly detection
+- **Component ID:** Which tier/tile made the decision
+
+**Storage:**
+- Critical decisions → RuVector Postgres (durable)
+- High-frequency events → Ring buffer in RuVector Server (ephemeral)
+- Aggregated metrics → Postgres (hourly rollup)
+
+**Usage:**
+- Post-incident analysis
+- Continuous validation of safety properties
+- Training data for predictive models
+
+---
+
+### Rate Limiting
+
+**Principle:** Plasticity updates are capped to prevent divergence under adversarial input
+
+**Limits:**
+- **Per-Tile:** Max 1000 updates/sec per worker tile
+- **Per-Collection:** Max 10000 updates/sec across all tiles
+- **BTSP Windows:** Max 100 one-shot associations per window (e.g., 1-second windows)
+
+**Enforcement:**
+- Token bucket rate limiter in Cognitum Hub
+- Postgres-backed BTSP window tracking
+- Automatic throttling with graceful degradation
+
+**Monitoring:**
+- Alert on rate limit violations
+- Metrics track throttling frequency
+- Adaptive threshold tuning based on load
+
+---
+
+### Threshold Versioning
+
+**Principle:** Predictive residual thresholds are versioned with collections for rollback
+
+**Implementation:**
+- **Immutable Versions:** Each collection version has frozen thresholds
+- **Rollback Capability:** Revert to previous version on performance degradation
+- **A/B Testing:** Run multiple threshold versions in parallel
+- **Gradual Rollout:** Canary deployments for new thresholds
+
+**Schema:**
+```sql
+collection_thresholds (
+  collection_id UUID,
+  version INT,
+  predictive_residual_threshold FLOAT,
+  btsp_eligibility_threshold FLOAT,
+  kWTA_k INT,
+  PRIMARY KEY (collection_id, version)
+);
+```
+
+**Usage:**
+- Automatic rollback on >10% performance degradation
+- Manual rollback for debugging
+- Threshold evolution tracking over time
+
+---
+
+### Circuit Breakers
+
+**Principle:** Automatic fallback to baseline HNSW on failures or latency spikes
+
+**Triggers:**
+- **Latency:** p99 latency >2× target for 10 consecutive queries
+- **Error Rate:** >5% query failures in 1-second window
+- **Safety Gate:** Any hard safety timeout violation
+- **Resource Exhaustion:** Queue overflow, memory pressure
+
+**Fallback Behavior:**
+- Disable HDC/Hopfield lanes, route all queries to HNSW
+- Log circuit breaker activation with full context
+- Notify monitoring system for manual investigation
+- Automatic reset after cooldown period (e.g., 60 seconds)
+
+**Configuration:**
+- Per-collection circuit breaker settings
+- Stored in RuVector Postgres
+- Hot-reloadable without service restart
+
+---
+
+## Performance Targets Summary
+
+| Metric | Target | Phase | Verification Method |
+|--------|--------|-------|---------------------|
+| **Inference Latency** | <1ms | Phase 1 | Benchmark suite (p99) |
+| **Energy per Query** | <1μJ | Phase 2 | Cognitum power profiler |
+| **One-Shot Learning** | Single exposure | Phase 2 | BTSP accuracy tests |
+| **Forgetting Prevention** | <5% degradation | Phase 3 | EWC consolidation tests |
+| **Capacity Scaling** | Exponential(d) | Phase 1 | Hopfield capacity benchmark |
+| **Sparsity** | 2-5% activation | Phase 1 | K-WTA profiling |
+| **Reflex Latency** | <100μs | Phase 2 | Tile-level timing analysis |
+| **Multi-Tile Coherence** | <1ms sync | Phase 3 | Hub coordination profiler |
+| **Safety Gate Violations** | 0 per 1M queries | All | Witness log analysis |
+| **Circuit Breaker Rate** | <0.1% of queries | All | Monitoring dashboard |
+
+---
+
+## Integration with Cognitum Hardware
+
+### Cognitum v0 (Simulation)
+
+**Capabilities:**
+- Cycle-accurate simulation of tile architecture
+- SRAM modeling with realistic latencies
+- Event bus simulation with timing
+- Power estimation models
+
+**Usage:**
+- Phase 1-2 development and validation
+- Performance profiling before hardware availability
+- Regression testing for deterministic timing
+
+**Limitations:**
+- No real power measurements (estimates only)
+- Simulation overhead limits scale testing
+- May miss hardware-specific edge cases
+
+---
+
+### Cognitum v1 (Hardware)
+
+**Capabilities:**
+- Physical neuromorphic tiles with on-tile SRAM
+- Real power measurements (<1μJ per query target)
+- Hardware-enforced deterministic timing
+- Multi-chip interconnect for scaling
+
+**Usage:**
+- Phase 2-3 deployment and validation
+- Real-world power and latency measurements
+- Multi-chip scaling experiments
+- Safety-critical deployment validation
+
+**Requirements:**
+- Tile firmware with reflex path implementation
+- Hub software for coordination and consolidation
+- Interconnect drivers for multi-chip communication
+- Monitoring and instrumentation infrastructure
+
+---
+
+## Deployment Workflow
+
+### Development Workflow
+
+1. **Local Development**
+   - RuVector Server runs on developer workstation
+   - Mock Cognitum simulator for reflex tier
+   - Local Postgres for persistence
+   - Unit tests + integration tests
+
+2. **Staging Environment**
+   - RuVector Server on dedicated server
+   - Cognitum v0 simulator at scale
+   - Staging Postgres with production-like data
+   - Performance regression tests
+
+3. **Production Deployment**
+   - RuVector Server on high-memory server (128GB+)
+   - Cognitum v1 hardware tiles
+   - Production Postgres with replication
+   - Full monitoring and alerting
+
+---
+
+### Deployment Checklist
+
+**Phase 1 (RuVector Foundation):**
+- [ ] HDC module passes all unit tests
+- [ ] Hopfield capacity scaling validated
+- [ ] K-WTA latency <100μs for d=10000
+- [ ] 10× energy efficiency vs baseline HNSW
+- [ ] Integration tests with ruvector-core pass
+- [ ] Hybrid search demo functional
+
+**Phase 2 (Cognitum Reflex):**
+- [ ] Event bus handles 10kHz input stream
+- [ ] Reflex latency <100μs (p99)
+- [ ] BTSP one-shot learning accuracy >90%
+- [ ] Zero off-tile memory access verified
+- [ ] Witness logging functional
+- [ ] Circuit breakers tested under load
+
+**Phase 3 (Online Learning & Coherence):**
+- [ ] E-prop online learning stable over 1M updates
+- [ ] EWC prevents >5% forgetting
+- [ ] Multi-tile coherence <1ms sync latency
+- [ ] Multi-chip coordination functional
+- [ ] Rate limiting prevents divergence
+- [ ] Threshold versioning and rollback tested
+
+---
+
+## Monitoring & Observability
+
+### Key Metrics
+
+**Latency:**
+- p50, p95, p99, p999 latency per tier
+- Breakdown by operation (encode, retrieve, consolidate)
+- Time-series visualization with anomaly detection
+
+**Throughput:**
+- Queries per second per tier
+- Event processing rate (reflex tier)
+- Plasticity updates per second
+
+**Resource Utilization:**
+- CPU, memory, disk usage per tier
+- SRAM usage on Cognitum tiles
+- Postgres connection pool utilization
+
+**Safety:**
+- Circuit breaker activation rate
+- Safety gate violation count (target: 0)
+- Rate limiter throttling frequency
+
+**Learning:**
+- BTSP association success rate
+- EWC consolidation loss
+- Forgetting rate over time
+
+---
+
+### Alerting Thresholds
+
+**Critical Alerts:**
+- Safety gate violation (immediate page)
+- Circuit breaker activation (immediate notification)
+- p99 latency >10× target (immediate notification)
+- Error rate >5% (immediate notification)
+
+**Warning Alerts:**
+- p99 latency >2× target
+- Rate limiter throttling >1% of requests
+- Memory usage >80%
+- BTSP association success rate <80%
+
+---
+
+## Appendix: Component Mapping Reference
+
+### RuVector Core Components → Deployment Tiers
+
+| Component | Tier | Rationale |
+|-----------|------|-----------|
+| HDC Encoding | Tier 1 (Cognitum Tiles) | Deterministic, SRAM-friendly |
+| K-WTA Selection | Tier 1 (Cognitum Tiles) | Low-latency, sparse activation |
+| Dendritic Coincidence | Tier 1 (Cognitum Tiles) | Event-driven, reflex path |
+| BTSP One-Shot | Tier 1 (Cognitum Tiles) | Single-exposure learning |
+| Hopfield Retrieval | Tier 3 (RuVector Server) | Large memory, GPU acceleration |
+| EWC Consolidation | Tier 2 (Cognitum Hub) | Cross-tile coordination |
+| E-prop Learning | Tier 2 (Cognitum Hub) | Plasticity management |
+| Workspace Coordination | Tier 2 (Cognitum Hub) | Multi-tile routing |
+| Predictive Residual | Tier 3 (RuVector Server) | Requires historical data |
+| Collection Versioning | Tier 4 (Postgres) | Durable storage |
+| Witness Logging | Tier 4 (Postgres) | Audit trail persistence |
+
+---
+
+## Glossary
+
+- **BTSP:** Behavioral Timescale Synaptic Plasticity (one-shot learning)
+- **CLS:** Continuous Learning with Synaptic Intelligence
+- **EWC:** Elastic Weight Consolidation (forgetting prevention)
+- **E-prop:** Eligibility Propagation (online learning)
+- **GWT:** Global Workspace Theory (multi-agent coordination)
+- **HDC:** Hyperdimensional Computing
+- **K-WTA:** K-Winners-Take-All (sparse activation)
+- **SRAM:** Static Random-Access Memory (on-chip memory)
+
+---
+
+## References
+
+1. Cognitum Neuromorphic Hardware Architecture (Internal)
+2. Modern Hopfield Networks: https://arxiv.org/abs/2008.02217
+3. Hyperdimensional Computing: https://arxiv.org/abs/2111.06077
+4. Elastic Weight Consolidation: https://arxiv.org/abs/1612.00796
+5. E-prop Learning: https://www.nature.com/articles/s41467-020-17236-y
+6. Global Workspace Theory: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5924785/
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** 2025-12-28
+**Maintainer:** RuVector Nervous System Architecture Team