Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

@@ -0,0 +1,676 @@
# RuVector Nervous System: Deployment Mapping & Build Order
## Executive Summary
This document defines the deployment architecture and three-phase build order for the RuVector Nervous System, integrating hyperdimensional computing (HDC), Modern Hopfield networks, and biologically-inspired learning with Cognitum neuromorphic hardware.
**Key Goals:**
- 10× energy efficiency improvement over baseline HNSW
- Sub-millisecond inference latency
- Exponential capacity scaling with dimension
- Online learning with forgetting prevention
- Deterministic safety guarantees
---
## Deployment Tiers
### Tier 1: Cognitum Worker Tiles (Reflex Tier)
**Purpose:** Ultra-low-latency event processing and reflexive responses
**Components Deployed:**
- Event ingestion pipeline
- K-WTA selection circuits
- Dendritic coincidence detection
- BTSP one-shot learning gates
- Hard safety validators
- Bounded event queues
**Hardware Constraints:**
- **Memory:** On-tile SRAM only (no external DRAM access)
- **Bandwidth:** Zero off-tile memory bandwidth during reflex path
- **Timing:** Deterministic execution with hard bounds
- **Queue Depth:** Fixed-size circular buffers (configurable, e.g., 256 events)
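
The fixed-size circular buffer constraint can be illustrated with a minimal sketch. The type name `EventQueue`, its methods, and the drop-newest overflow policy are assumptions made for illustration; the design above only fixes the bounded depth (e.g., 256 events).

```rust
/// Fixed-capacity ring buffer: storage lives inline, so no heap allocation occurs.
/// Hypothetical type; the drop-newest overflow policy is an assumption.
pub struct EventQueue<T: Copy + Default, const CAP: usize> {
    buf: [T; CAP],
    head: usize,  // index of the oldest queued event
    len: usize,   // number of queued events
    dropped: u64, // events rejected because the queue was full
}

impl<T: Copy + Default, const CAP: usize> EventQueue<T, CAP> {
    pub fn new() -> Self {
        Self { buf: [T::default(); CAP], head: 0, len: 0, dropped: 0 }
    }

    /// Enqueue in O(1); returns false and counts the drop when the queue is full.
    pub fn push(&mut self, event: T) -> bool {
        if self.len == CAP {
            self.dropped += 1;
            return false;
        }
        let tail = (self.head + self.len) % CAP;
        self.buf[tail] = event;
        self.len += 1;
        true
    }

    /// Dequeue the oldest event in O(1).
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        let event = self.buf[self.head];
        self.head = (self.head + 1) % CAP;
        self.len -= 1;
        Some(event)
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

A tile would instantiate this as, say, `EventQueue<u32, 256>`. Both operations run in constant time, which keeps the queue compatible with the hard timing bounds listed above.
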
**Operational Characteristics:**
- **Latency Target:** <100μs event→action
- **Energy Target:** <1μJ per query
- **Sparsity:** 2-5% neuron activation
- **Determinism:** Maximum iteration counts enforced
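
One way to obtain sparse activation with a fixed iteration count is a repeated-argmax K-WTA that writes into a caller-provided mask. This is a sketch under assumptions: the function names are hypothetical, inputs are assumed finite, and ties resolve to the lower index. It trades speed (O(k·n)) for a loop structure whose bounds are known up front.

```rust
/// Mark the `k` largest activations in `mask` (true = winner) and return the
/// number of winners. Runs exactly min(k, n) passes of n steps and performs
/// no allocation. Assumes finite inputs; ties go to the lower index.
pub fn k_wta(activations: &[f32], k: usize, mask: &mut [bool]) -> usize {
    assert_eq!(activations.len(), mask.len());
    mask.fill(false);
    let k = k.min(activations.len());
    for _ in 0..k {
        let mut best: Option<usize> = None;
        for (i, &a) in activations.iter().enumerate() {
            if mask[i] {
                continue; // already selected in an earlier pass
            }
            let better = match best {
                Some(b) => a > activations[b],
                None => true,
            };
            if better {
                best = Some(i);
            }
        }
        if let Some(b) = best {
            mask[b] = true;
        }
    }
    k
}

/// Winner count for a target sparsity, e.g. 0.02 to 0.05 for 2-5% activation.
pub fn winners_for_sparsity(dim: usize, sparsity: f64) -> usize {
    ((dim as f64) * sparsity).round() as usize
}
```

For d=10000 and 2% sparsity, `winners_for_sparsity` yields k=200. A production tile implementation would likely use a hardware-specific selection circuit instead of this reference loop.
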
**Safety Mechanisms:**
- Hard timeout enforcement (circuit breaker)
- Input validation gates
- Witness logging for all safety-critical decisions
- Automatic fallback to safe default state
---
### Tier 2: Cognitum Hub (Coordinator Cores)
**Purpose:** Cross-tile coordination and plasticity consolidation
**Components Deployed:**
- Routing decision logic
- Plasticity consolidation engine (EWC, CLS)
- Workspace coordinator (Global Workspace Theory)
- Coherence-gated routing
- Inter-tile communication manager
**Memory Architecture:**
- **L1/L2:** Per-core cache for hot paths
- **L3:** Coherent shared cache across hub cores
- **Access Pattern:** Cache-friendly sequential scans for consolidation
**Operational Characteristics:**
- **Latency Target:** <10ms for consolidation operations
- **Bandwidth:** High coherent bandwidth for multi-tile sync
- **Plasticity Rate:** Capped updates per second (e.g., 1000 updates/sec)
- **Coordination:** Supports up to 64 worker tiles per hub
**Safety Mechanisms:**
- Rate limiting on plasticity updates
- Threshold versioning for rollback capability
- Coherence validation before routing decisions
- Circuit breakers for latency spikes
---
### Tier 3: RuVector Server
**Purpose:** Long-horizon learning and associative memory
**Components Deployed:**
- Modern Hopfield associative memory
- HDC pattern separation encoding
- Continuous Learning with Synaptic Intelligence (CLS)
- Elastic Weight Consolidation (EWC)
- Cross-collection analytics
- Predictive residual learner
**Memory Architecture:**
- **Storage:** Large-scale vector embeddings in memory
- **Cache:** Hot pattern cache for frequently accessed memories
- **Compute:** GPU/SIMD acceleration for Hopfield energy minimization
- **Persistence:** Periodic snapshots to RuVector Postgres
**Operational Characteristics:**
- **Latency Target:** <10ms for associative retrieval
- **Capacity:** Scales exponentially with pattern dimension d
- **Learning:** Online updates with forgetting prevention
- **Sparsity:** 2-5% activation via K-WTA
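
Associative retrieval in a Modern Hopfield network reduces to a softmax-weighted average of the stored patterns. The sketch below shows a single update step, assuming dense `f32` patterns stored as rows; the function name and signature are illustrative, not the ruvector-core API.

```rust
/// One Modern Hopfield update step:
/// xi_new = sum_i softmax(beta * <x_i, xi>)_i * x_i
pub fn hopfield_retrieve(patterns: &[Vec<f32>], query: &[f32], beta: f32) -> Vec<f32> {
    // Similarity scores beta * <x_i, query>.
    let scores: Vec<f32> = patterns
        .iter()
        .map(|p| beta * p.iter().zip(query).map(|(a, b)| a * b).sum::<f32>())
        .collect();
    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let z: f32 = exps.iter().sum();
    // Softmax-weighted sum of the stored patterns.
    let mut out = vec![0.0f32; query.len()];
    for (p, e) in patterns.iter().zip(&exps) {
        for (o, x) in out.iter_mut().zip(p) {
            *o += (e / z) * x;
        }
    }
    out
}
```

A larger `beta` sharpens the softmax, so a noisy query snaps to the closest stored pattern in one step. A smaller `beta` blends several patterns together.
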
**Safety Mechanisms:**
- Predictive residual thresholds prevent spurious writes
- EWC prevents catastrophic forgetting
- Collection versioning for rollback
- Automatic fallback to baseline HNSW on failures
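
The predictive residual gate can be pictured as a threshold on prediction error: an observation is written only when it differs enough from what the model already predicts. The sketch below assumes an L2 norm as the residual measure and a hypothetical function name; the document does not specify either.

```rust
/// Return true when the observation is surprising enough to be written.
/// Assumption: the residual is the L2 norm of (observed - predicted).
pub fn should_write(observed: &[f32], predicted: &[f32], threshold: f32) -> bool {
    let squared: f32 = observed
        .iter()
        .zip(predicted)
        .map(|(o, p)| (o - p) * (o - p))
        .sum();
    squared.sqrt() > threshold
}
```

Under this reading, the `predictive_threshold` stored per collection would be passed as `threshold`, so raising it makes writes rarer.
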
---
### Tier 4: RuVector Postgres
**Purpose:** Durable storage and collection parameter versioning
**Components Deployed:**
- Collection metadata and parameters
- Threshold versioning (predictive residual gates)
- BTSP one-shot association windows
- Long-term trajectory logs
- Performance metrics and analytics
**Storage Schema:**
```sql
-- Collection versioning
CREATE TABLE collections (
    id UUID PRIMARY KEY,
    version INT NOT NULL,
    created_at TIMESTAMP,
    hdc_dimension INT,
    hopfield_beta FLOAT,
    kWTA_k INT,
    predictive_threshold FLOAT
);
-- BTSP association windows
CREATE TABLE btsp_windows (
    collection_id UUID REFERENCES collections(id),
    window_start TIMESTAMP,
    window_end TIMESTAMP,
    max_one_shot_associations INT,
    associations_used INT
);
-- Witness logs (safety-critical decisions)
CREATE TABLE witness_logs (
    timestamp TIMESTAMP,
    component VARCHAR(50),
    input_hash BYTEA,
    output_hash BYTEA,
    decision VARCHAR(20),
    latency_us INT
);
-- Performance metrics
CREATE TABLE metrics (
    timestamp TIMESTAMP,
    tier VARCHAR(20),
    operation VARCHAR(50),
    latency_p50_ms FLOAT,
    latency_p99_ms FLOAT,
    energy_uj FLOAT,
    success_rate FLOAT
);
```
**Operational Characteristics:**
- **Write Pattern:** Gated writes via predictive residual
- **Read Pattern:** Hot parameter cache in RuVector Server
- **Versioning:** Immutable collection versions with rollback
- **Analytics:** Aggregated metrics for performance monitoring
**Safety Mechanisms:**
- Immutable version history
- Atomic parameter updates
- Witness log retention for audit trails
- Circuit breaker configuration persistence
---
## Three-Phase Build Order
### Phase 1: RuVector Foundation (Months 0-3)
**Objective:** Establish core hyperdimensional and Hopfield primitives with 10× energy efficiency
**Deliverables:**
1. **HDC Module Complete**
   - Hypervector encoding (bundle, bind, permute)
   - K-WTA selection with configurable k
   - Similarity measurement (Hamming, cosine)
   - Integration with ruvector-core Rust API
2. **Modern Hopfield Retrieval**
   - Energy minimization via softmax attention
   - Exponential capacity scaling
   - GPU/SIMD-accelerated inference
   - Benchmarked against baseline HNSW
3. **K-WTA Selection**
   - Top-k neuron activation
   - Sparsity enforcement (2-5% target)
   - Hardware-friendly implementation
   - Latency <100μs for d=10000
4. **Pattern Separation Encoding**
   - Input→hypervector encoding
   - Collision resistance validation
   - Dimensionality reduction benchmarks
5. **Integration with ruvector-core**
   - Rust bindings for HDC and Hopfield
   - Unified query API (HNSW + HDC + Hopfield lanes)
   - Performance regression tests
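
The HDC primitives named in the first deliverable (bundle, bind, permute, Hamming similarity) can be sketched on binary hypervectors packed into 64-bit words. The `Hypervector` type, the XOR binding, and the majority-vote bundling are common HDC choices assumed here for illustration; the actual ruvector-core representation may differ.

```rust
/// Binary hypervector packed into 64-bit words (dimension = 64 * word count).
/// Hypothetical type; all operands are assumed to have the same length.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Hypervector(pub Vec<u64>);

impl Hypervector {
    /// Bind: element-wise XOR. Self-inverse, so a.bind(b).bind(b) == a.
    pub fn bind(&self, other: &Self) -> Self {
        Hypervector(self.0.iter().zip(&other.0).map(|(a, b)| a ^ b).collect())
    }

    /// Permute: cyclic rotation by whole words (a coarse shift chosen for brevity).
    pub fn permute(&self, words: usize) -> Self {
        let mut v = self.0.clone();
        if !v.is_empty() {
            let n = v.len();
            v.rotate_left(words % n);
        }
        Hypervector(v)
    }

    /// Hamming distance: number of differing bits.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.0.iter().zip(&other.0).map(|(a, b)| (a ^ b).count_ones()).sum()
    }
}

/// Bundle: per-bit majority vote; ties (even input count) resolve to 0.
pub fn bundle(inputs: &[Hypervector]) -> Hypervector {
    let words = inputs.first().map_or(0, |h| h.0.len());
    let mut out = vec![0u64; words];
    for (w, slot) in out.iter_mut().enumerate() {
        for bit in 0..64 {
            let ones = inputs.iter().filter(|h| (h.0[w] >> bit) & 1 == 1).count();
            if 2 * ones > inputs.len() {
                *slot |= 1u64 << bit;
            }
        }
    }
    Hypervector(out)
}
```

XOR and popcount map directly onto SIMD instructions, which is one reason packed binary vectors are a frequent starting point before profiling and optimizing hot paths.
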
**Success Criteria:**
- ✅ 10× energy efficiency vs baseline HNSW
- ✅ <1ms inference latency for d=10000
- ✅ Exponential capacity demonstrated (>1M patterns)
- ✅ 95% retrieval accuracy on standard benchmarks
**Demo:**
Hybrid search system demonstrating:
- HNSW lane for precise nearest neighbor
- HDC lane for robust pattern matching
- Hopfield lane for associative completion
- Automatic lane selection based on query type
**Risks & Mitigations:**
- **Risk:** SIMD optimization complexity
  - **Mitigation:** Start with naive implementation, profile, optimize hot paths
- **Risk:** Hopfield capacity limits
  - **Mitigation:** Benchmark capacity scaling empirically, document limits
- **Risk:** Integration complexity with existing ruvector-core
  - **Mitigation:** Incremental integration with feature flags
---
### Phase 2: Cognitum Reflex (Months 3-6)
**Objective:** Deploy ultra-low-latency reflex tier on Cognitum neuromorphic tiles
**Deliverables:**
1. **Event Bus with Bounded Queues**
   - Fixed-size circular buffers (e.g., 256 events)
   - Priority-based event scheduling
   - Overflow handling with graceful degradation
   - Zero dynamic allocation
2. **Dendritic Coincidence Detection**
   - Multi-branch dendritic computation
   - Spatial and temporal coincidence detection
   - Threshold-based gating
   - On-tile SRAM-only implementation
3. **BTSP One-Shot Learning**
   - Single-exposure association formation
   - Time-windowed eligibility traces
   - Gated by predictive residual
   - Postgres-backed association windows
4. **Reflex Tier Deployment on Cognitum Tiles**
   - Tile-local event processing
   - Deterministic timing enforcement
   - Hard timeout circuits
   - Witness logging for safety gates
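
Dendritic coincidence detection, as listed in the second deliverable, can be approximated by tracking the last spike time on each branch and firing when enough branches were active within a short window. The sketch below is one possible reading, with a hypothetical `CoincidenceDetector` type and microsecond timestamps supplied by the caller.

```rust
/// Coincidence detector over `B` dendritic branches (fixed size, no heap).
/// Hypothetical type; timestamps are assumed to be non-decreasing.
pub struct CoincidenceDetector<const B: usize> {
    last_spike_us: [Option<u64>; B], // most recent input time per branch
    window_us: u64,                  // temporal coincidence window
    threshold: usize,                // branches that must be co-active to fire
}

impl<const B: usize> CoincidenceDetector<B> {
    pub fn new(window_us: u64, threshold: usize) -> Self {
        Self { last_spike_us: [None; B], window_us, threshold }
    }

    /// Record a spike on `branch` at `now_us`. Returns true when at least
    /// `threshold` branches spiked within the window ending at `now_us`.
    pub fn on_spike(&mut self, branch: usize, now_us: u64) -> bool {
        if branch >= B {
            return false; // reject out-of-range input instead of panicking
        }
        self.last_spike_us[branch] = Some(now_us);
        let mut active = 0;
        for t in self.last_spike_us.iter().flatten() {
            if now_us.saturating_sub(*t) <= self.window_us {
                active += 1;
            }
        }
        active >= self.threshold
    }
}
```

The per-spike cost is a single pass over `B` entries, so the worst-case execution time is fixed at compile time.
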
**Success Criteria:**
- ✅ <100μs event→action latency
- ✅ <1μJ energy per query
- ✅ 100% deterministic timing (no dynamic allocation)
- ✅ Zero off-tile memory access in reflex path
**Demo:**
Real-time event processing on simulated Cognitum environment:
- High-frequency event stream (10kHz)
- Sub-100μs reflexive responses
- BTSP one-shot learning demonstration
- Safety gate validation under adversarial input
**Risks & Mitigations:**
- **Risk:** Cognitum hardware availability
  - **Mitigation:** Develop on cycle-accurate simulator, validate on hardware when available
- **Risk:** SRAM capacity limits
  - **Mitigation:** Profile memory usage, optimize data structures, prune cold paths
- **Risk:** Deterministic timing violations
  - **Mitigation:** Static analysis of loop bounds, hard timeout enforcement
- **Risk:** BTSP stability under noise
  - **Mitigation:** Threshold tuning, windowed eligibility traces
---
### Phase 3: Online Learning & Coherence (Months 6-12)
**Objective:** Distributed online learning with forgetting prevention and multi-chip coordination
**Deliverables:**
1. **E-prop Online Learning**
   - Eligibility trace-based gradient estimation
   - Event-driven weight updates
   - Sparse credit assignment
   - Integrated with reflex tier
2. **EWC Consolidation**
   - Fisher Information Matrix estimation
   - Importance-weighted regularization
   - Per-collection consolidation
   - Prevents catastrophic forgetting (<5% degradation)
3. **Coherence-Gated Routing**
   - Global Workspace Theory (GWT) coordination
   - Multi-tile coherence validation
   - Routing decisions based on workspace state
   - Hub-mediated coordination
4. **Global Workspace Coordination**
   - Cross-tile broadcast of salient events
   - Winner-take-all workspace selection
   - Attention-based routing
   - Coherent state synchronization
5. **Multi-Chip Cognitum Coordination**
   - Inter-chip communication protocol
   - Distributed plasticity updates
   - Fault tolerance and graceful degradation
   - Scalability to 4+ chips
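
The importance-weighted regularization in the EWC deliverable corresponds to the standard EWC objective: the task loss plus a quadratic penalty that anchors each parameter to its consolidated value in proportion to its Fisher information. The sketch below assumes a diagonal Fisher approximation; the function name and argument layout are illustrative.

```rust
/// EWC-regularized loss with a diagonal Fisher approximation:
/// task_loss + (lambda / 2) * sum_i F_i * (theta_i - theta_star_i)^2
pub fn ewc_loss(
    task_loss: f32,
    theta: &[f32],      // current parameters
    theta_star: &[f32], // parameters consolidated after the previous task
    fisher: &[f32],     // diagonal Fisher information, one entry per parameter
    lambda: f32,        // regularization strength
) -> f32 {
    let penalty: f32 = theta
        .iter()
        .zip(theta_star)
        .zip(fisher)
        .map(|((t, s), f)| f * (t - s) * (t - s))
        .sum();
    task_loss + 0.5 * lambda * penalty
}
```

Parameters with near-zero Fisher values remain free to move, which is what makes a sparse Fisher approximation a plausible way to reduce consolidation overhead.
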
**Success Criteria:**
- ✅ Online learning without centralized consolidation
- ✅ <5% performance degradation over 1M updates
- ✅ Coherent routing across 64+ tiles
- ✅ Multi-chip coordination with <1ms sync latency
**Demo:**
Continuous learning demonstration:
- 1M+ online updates without catastrophic forgetting
- Cross-tile coherence maintained under load
- Multi-chip coordination with graceful degradation
- EWC prevents forgetting of critical patterns
**Risks & Mitigations:**
- **Risk:** E-prop stability under distribution shift
  - **Mitigation:** Adaptive learning rates, eligibility trace decay tuning
- **Risk:** EWC computational overhead
  - **Mitigation:** Sparse Fisher approximation, periodic consolidation
- **Risk:** Coherence protocol deadlocks
  - **Mitigation:** Timeout-based fallback, formal verification of protocol
- **Risk:** Multi-chip synchronization overhead
  - **Mitigation:** Asynchronous updates with eventual consistency
---
## Risk Controls & Safety Mechanisms
### Deterministic Bounds
**Principle:** Every reflex path has a provable maximum execution time
**Implementation:**
- **Static Loop Bounds:** All loops have compile-time maximum iteration counts
- **Hard Timeouts:** Circuit breakers enforce timeouts at hardware level
- **No Dynamic Allocation:** Zero heap allocation in reflex paths
- **Bounded Queues:** Fixed-size event queues with overflow handling
**Verification:**
- Static analysis tools verify loop bounds
- Runtime assertions validate timeout enforcement
- Continuous integration tests measure worst-case execution time
---
### Witness Logging
**Principle:** All safety-relevant decisions are logged for audit and debugging
**Logged Events:**
- **Safety Gate Decisions:** Input hash, output hash, decision (accept/reject)
- **Timestamps:** High-resolution timestamps for causality tracking
- **Latencies:** Per-operation latency for anomaly detection
- **Component ID:** Which tier/tile made the decision
**Storage:**
- Critical decisions → RuVector Postgres (durable)
- High-frequency events → Ring buffer in RuVector Server (ephemeral)
- Aggregated metrics → Postgres (hourly rollup)
**Usage:**
- Post-incident analysis
- Continuous validation of safety properties
- Training data for predictive models
---
### Rate Limiting
**Principle:** Plasticity updates are capped to prevent divergence under adversarial input
**Limits:**
- **Per-Tile:** Max 1000 updates/sec per worker tile
- **Per-Collection:** Max 10000 updates/sec across all tiles
- **BTSP Windows:** Max 100 one-shot associations per window (e.g., 1-second windows)
**Enforcement:**
- Token bucket rate limiter in Cognitum Hub
- Postgres-backed BTSP window tracking
- Automatic throttling with graceful degradation
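
A token bucket enforces a sustained rate while allowing short bursts. The sketch below takes the current time as an explicit microsecond argument so that behavior is deterministic and testable. The `TokenBucket` type and its fields are hypothetical, not the hub's actual interface.

```rust
/// Token bucket: sustains `rate_per_sec` updates with bursts up to `capacity`.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last_us: u64, // time of the most recent refill
}

impl TokenBucket {
    /// Start with a full bucket at time `now_us`.
    pub fn new(rate_per_sec: f64, capacity: f64, now_us: u64) -> Self {
        Self { capacity, tokens: capacity, rate_per_sec, last_us: now_us }
    }

    /// Refill for the elapsed time, then try to consume one token.
    /// Returns false when the update should be throttled.
    pub fn try_acquire(&mut self, now_us: u64) -> bool {
        let elapsed_s = now_us.saturating_sub(self.last_us) as f64 / 1e6;
        self.tokens = (self.tokens + elapsed_s * self.rate_per_sec).min(self.capacity);
        self.last_us = self.last_us.max(now_us);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A per-tile limiter for the 1000 updates/sec cap would be constructed as `TokenBucket::new(1000.0, burst, now)`, where the burst size is a tuning choice the document leaves open.
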
**Monitoring:**
- Alert on rate limit violations
- Metrics track throttling frequency
- Adaptive threshold tuning based on load
---
### Threshold Versioning
**Principle:** Predictive residual thresholds are versioned with collections for rollback
**Implementation:**
- **Immutable Versions:** Each collection version has frozen thresholds
- **Rollback Capability:** Revert to previous version on performance degradation
- **A/B Testing:** Run multiple threshold versions in parallel
- **Gradual Rollout:** Canary deployments for new thresholds
**Schema:**
```sql
CREATE TABLE collection_thresholds (
    collection_id UUID,
    version INT,
    predictive_residual_threshold FLOAT,
    btsp_eligibility_threshold FLOAT,
    kWTA_k INT,
    PRIMARY KEY (collection_id, version)
);
```
**Usage:**
- Automatic rollback on >10% performance degradation
- Manual rollback for debugging
- Threshold evolution tracking over time
---
### Circuit Breakers
**Principle:** Automatic fallback to baseline HNSW on failures or latency spikes
**Triggers:**
- **Latency:** Query latency >2× the p99 target for 10 consecutive queries
- **Error Rate:** >5% query failures in 1-second window
- **Safety Gate:** Any hard safety timeout violation
- **Resource Exhaustion:** Queue overflow, memory pressure
**Fallback Behavior:**
- Disable HDC/Hopfield lanes, route all queries to HNSW
- Log circuit breaker activation with full context
- Notify monitoring system for manual investigation
- Automatic reset after cooldown period (e.g., 60 seconds)
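
The latency trigger and cooldown reset can be sketched as a small state machine. Only the latency trigger is modeled here; the error-rate, safety-gate, and resource triggers are omitted. The `CircuitBreaker` and `Lane` names are hypothetical, and timestamps are passed in explicitly in microseconds.

```rust
/// Query lane chosen by the breaker.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Lane {
    HdcHopfield,
    BaselineHnsw,
}

pub struct CircuitBreaker {
    target_us: u64,
    max_consecutive_slow: u32,
    cooldown_us: u64,
    consecutive_slow: u32,
    opened_at_us: Option<u64>, // Some(t) while the breaker is open
}

impl CircuitBreaker {
    pub fn new(target_us: u64, max_consecutive_slow: u32, cooldown_us: u64) -> Self {
        Self { target_us, max_consecutive_slow, cooldown_us, consecutive_slow: 0, opened_at_us: None }
    }

    /// Record one query latency; opens after N consecutive queries above 2x target.
    pub fn record(&mut self, latency_us: u64, now_us: u64) {
        if latency_us > 2 * self.target_us {
            self.consecutive_slow = self.consecutive_slow.saturating_add(1);
            if self.consecutive_slow >= self.max_consecutive_slow && self.opened_at_us.is_none() {
                self.opened_at_us = Some(now_us);
            }
        } else {
            self.consecutive_slow = 0;
        }
    }

    /// Choose the lane for the next query; closes the breaker after the cooldown.
    pub fn route(&mut self, now_us: u64) -> Lane {
        match self.opened_at_us {
            Some(t) if now_us.saturating_sub(t) < self.cooldown_us => Lane::BaselineHnsw,
            Some(_) => {
                self.opened_at_us = None;
                self.consecutive_slow = 0;
                Lane::HdcHopfield
            }
            None => Lane::HdcHopfield,
        }
    }
}
```

With `CircuitBreaker::new(1_000, 10, 60_000_000)`, ten consecutive queries slower than 2ms route traffic to HNSW for 60 seconds, matching the example values above.
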
**Configuration:**
- Per-collection circuit breaker settings
- Stored in RuVector Postgres
- Hot-reloadable without service restart
---
## Performance Targets Summary
| Metric | Target | Phase | Verification Method |
|--------|--------|-------|---------------------|
| **Inference Latency** | <1ms | Phase 1 | Benchmark suite (p99) |
| **Energy per Query** | <1μJ | Phase 2 | Cognitum power profiler |
| **One-Shot Learning** | Single exposure | Phase 2 | BTSP accuracy tests |
| **Forgetting Prevention** | <5% degradation | Phase 3 | EWC consolidation tests |
| **Capacity Scaling** | Exponential in d | Phase 1 | Hopfield capacity benchmark |
| **Sparsity** | 2-5% activation | Phase 1 | K-WTA profiling |
| **Reflex Latency** | <100μs | Phase 2 | Tile-level timing analysis |
| **Multi-Tile Coherence** | <1ms sync | Phase 3 | Hub coordination profiler |
| **Safety Gate Violations** | 0 per 1M queries | All | Witness log analysis |
| **Circuit Breaker Rate** | <0.1% of queries | All | Monitoring dashboard |
---
## Integration with Cognitum Hardware
### Cognitum v0 (Simulation)
**Capabilities:**
- Cycle-accurate simulation of tile architecture
- SRAM modeling with realistic latencies
- Event bus simulation with timing
- Power estimation models
**Usage:**
- Phase 1-2 development and validation
- Performance profiling before hardware availability
- Regression testing for deterministic timing
**Limitations:**
- No real power measurements (estimates only)
- Simulation overhead limits scale testing
- May miss hardware-specific edge cases
---
### Cognitum v1 (Hardware)
**Capabilities:**
- Physical neuromorphic tiles with on-tile SRAM
- Real power measurements (<1μJ per query target)
- Hardware-enforced deterministic timing
- Multi-chip interconnect for scaling
**Usage:**
- Phase 2-3 deployment and validation
- Real-world power and latency measurements
- Multi-chip scaling experiments
- Safety-critical deployment validation
**Requirements:**
- Tile firmware with reflex path implementation
- Hub software for coordination and consolidation
- Interconnect drivers for multi-chip communication
- Monitoring and instrumentation infrastructure
---
## Deployment Workflow
### Development Workflow
1. **Local Development**
   - RuVector Server runs on developer workstation
   - Mock Cognitum simulator for reflex tier
   - Local Postgres for persistence
   - Unit tests + integration tests
2. **Staging Environment**
   - RuVector Server on dedicated server
   - Cognitum v0 simulator at scale
   - Staging Postgres with production-like data
   - Performance regression tests
3. **Production Deployment**
   - RuVector Server on high-memory server (128GB+)
   - Cognitum v1 hardware tiles
   - Production Postgres with replication
   - Full monitoring and alerting
---
### Deployment Checklist
**Phase 1 (RuVector Foundation):**
- [ ] HDC module passes all unit tests
- [ ] Hopfield capacity scaling validated
- [ ] K-WTA latency <100μs for d=10000
- [ ] 10× energy efficiency vs baseline HNSW
- [ ] Integration tests with ruvector-core pass
- [ ] Hybrid search demo functional
**Phase 2 (Cognitum Reflex):**
- [ ] Event bus handles 10kHz input stream
- [ ] Reflex latency <100μs (p99)
- [ ] BTSP one-shot learning accuracy >90%
- [ ] Zero off-tile memory access verified
- [ ] Witness logging functional
- [ ] Circuit breakers tested under load
**Phase 3 (Online Learning & Coherence):**
- [ ] E-prop online learning stable over 1M updates
- [ ] EWC keeps forgetting below 5% degradation
- [ ] Multi-tile coherence <1ms sync latency
- [ ] Multi-chip coordination functional
- [ ] Rate limiting prevents divergence
- [ ] Threshold versioning and rollback tested
---
## Monitoring & Observability
### Key Metrics
**Latency:**
- p50, p95, p99, p999 latency per tier
- Breakdown by operation (encode, retrieve, consolidate)
- Time-series visualization with anomaly detection
**Throughput:**
- Queries per second per tier
- Event processing rate (reflex tier)
- Plasticity updates per second
**Resource Utilization:**
- CPU, memory, disk usage per tier
- SRAM usage on Cognitum tiles
- Postgres connection pool utilization
**Safety:**
- Circuit breaker activation rate
- Safety gate violation count (target: 0)
- Rate limiter throttling frequency
**Learning:**
- BTSP association success rate
- EWC consolidation loss
- Forgetting rate over time
---
### Alerting Thresholds
**Critical Alerts:**
- Safety gate violation (immediate page)
- Circuit breaker activation (immediate notification)
- p99 latency >10× target (immediate notification)
- Error rate >5% (immediate notification)
**Warning Alerts:**
- p99 latency >2× target
- Rate limiter throttling >1% of requests
- Memory usage >80%
- BTSP association success rate <80%
---
## Appendix: Component Mapping Reference
### RuVector Core Components → Deployment Tiers
| Component | Tier | Rationale |
|-----------|------|-----------|
| HDC Encoding | Tier 1 (Cognitum Tiles) | Deterministic, SRAM-friendly |
| K-WTA Selection | Tier 1 (Cognitum Tiles) | Low-latency, sparse activation |
| Dendritic Coincidence | Tier 1 (Cognitum Tiles) | Event-driven, reflex path |
| BTSP One-Shot | Tier 1 (Cognitum Tiles) | Single-exposure learning |
| Hopfield Retrieval | Tier 3 (RuVector Server) | Large memory, GPU acceleration |
| EWC Consolidation | Tier 2 (Cognitum Hub) | Cross-tile coordination |
| E-prop Learning | Tier 2 (Cognitum Hub) | Plasticity management |
| Workspace Coordination | Tier 2 (Cognitum Hub) | Multi-tile routing |
| Predictive Residual | Tier 3 (RuVector Server) | Requires historical data |
| Collection Versioning | Tier 4 (Postgres) | Durable storage |
| Witness Logging | Tier 4 (Postgres) | Audit trail persistence |
---
## Glossary
- **BTSP:** Behavioral Timescale Synaptic Plasticity (one-shot learning)
- **CLS:** Continuous Learning with Synaptic Intelligence
- **EWC:** Elastic Weight Consolidation (forgetting prevention)
- **E-prop:** Eligibility Propagation (online learning)
- **GWT:** Global Workspace Theory (cross-tile coordination)
- **HDC:** Hyperdimensional Computing
- **K-WTA:** K-Winners-Take-All (sparse activation)
- **SRAM:** Static Random-Access Memory (on-chip memory)
---
## References
1. Cognitum Neuromorphic Hardware Architecture (Internal)
2. Modern Hopfield Networks: https://arxiv.org/abs/2008.02217
3. Hyperdimensional Computing: https://arxiv.org/abs/2111.06077
4. Elastic Weight Consolidation: https://arxiv.org/abs/1612.00796
5. E-prop Learning: https://www.nature.com/articles/s41467-020-17236-y
6. Global Workspace Theory: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5924785/
---
**Document Version:** 1.0
**Last Updated:** 2025-12-28
**Maintainer:** RuVector Nervous System Architecture Team