# RuVector Nervous System - Comprehensive Test Plan
## Overview
This test plan defines performance targets, quality metrics, and verification strategies for the RuVector Nervous System. All tests are designed to ensure real-time performance, memory efficiency, and biological plausibility.
## 1. Worst-Case Latency Requirements
### Latency Targets
| Component | Target | P50 | P99 | P99.9 | Measurement Method |
|-----------|--------|-----|-----|-------|-------------------|
| **Event Bus** |
| Event publish | <10μs | <5μs | <15μs | <50μs | Criterion benchmark |
| Event delivery (bounded queue) | <5μs | <2μs | <8μs | <20μs | Criterion benchmark |
| Priority routing | <20μs | <10μs | <30μs | <100μs | Criterion benchmark |
| **HDC (Hyperdimensional Computing)** |
| Vector binding (XOR) | <100ns | <50ns | <150ns | <500ns | Criterion benchmark |
| Vector bundling (majority) | <500ns | <200ns | <1μs | <5μs | Criterion benchmark |
| Hamming distance | <100ns | <50ns | <150ns | <500ns | Criterion benchmark |
| Similarity check | <200ns | <100ns | <300ns | <1μs | Criterion benchmark |
| **WTA (Winner-Take-All)** |
| Single winner selection | <1μs | <500ns | <2μs | <10μs | Criterion benchmark |
| k-WTA (k=5) | <5μs | <2μs | <10μs | <50μs | Criterion benchmark |
| Lateral inhibition update | <10μs | <5μs | <20μs | <100μs | Criterion benchmark |
| **Hopfield Networks** |
| Pattern retrieval (100 patterns) | <1ms | <500μs | <2ms | <10ms | Criterion benchmark |
| Pattern storage | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| Energy computation | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |
| **Pattern Separation** |
| Encoding (orthogonalization) | <500μs | <200μs | <1ms | <5ms | Criterion benchmark |
| Collision detection | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| Decorrelation | <200μs | <100μs | <500μs | <2ms | Criterion benchmark |
| **Plasticity** |
| E-prop gradient update | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| BTSP eligibility trace | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |
| EWC Fisher matrix update | <1ms | <500μs | <2ms | <10ms | Criterion benchmark |
| **Cognitum Integration** |
| Reflex event→action | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| v0 adapter dispatch | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |
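
The HDC rows above time three primitive operations on bit-packed hypervectors. As a rough illustration of what is being measured, here is a minimal sketch; the `Hypervector` type and its method names are hypothetical and are not taken from the crate.

```rust
/// A binary hypervector stored 64 dimensions per word (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Hypervector {
    words: Vec<u64>,
    dims: usize,
}

impl Hypervector {
    /// Builds a hypervector from individual bits (true = 1).
    pub fn from_bits(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Self { words, dims: bits.len() }
    }

    /// Binding: element-wise XOR. Commutative and self-inverse.
    pub fn bind(&self, other: &Self) -> Self {
        assert_eq!(self.dims, other.dims);
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a ^ b).collect();
        Self { words, dims: self.dims }
    }

    /// Hamming distance: number of differing bits, via XOR + popcount.
    pub fn hamming(&self, other: &Self) -> u32 {
        assert_eq!(self.dims, other.dims);
        self.words.iter().zip(&other.words).map(|(a, b)| (a ^ b).count_ones()).sum()
    }

    /// Bundling: per-dimension majority vote (ties resolve to 0).
    pub fn bundle(inputs: &[Self]) -> Self {
        assert!(!inputs.is_empty());
        let dims = inputs[0].dims;
        let mut words = vec![0u64; inputs[0].words.len()];
        for d in 0..dims {
            let ones = inputs
                .iter()
                .filter(|hv| ((hv.words[d / 64] >> (d % 64)) & 1) == 1)
                .count();
            if 2 * ones > inputs.len() {
                words[d / 64] |= 1u64 << (d % 64);
            }
        }
        Self { words, dims }
    }
}
```

Binding and Hamming distance touch each word once, which is why their targets sit in the nanosecond range, while this naive bundling loop visits every dimension of every input and is correspondingly slower.
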
### Benchmark Implementation
**Location**: `crates/ruvector-nervous-system/benches/latency_benchmarks.rs`
**Key Features**:
- Uses Criterion for statistical rigor
- Measures P50, P99, P99.9 percentiles
- Includes warm-up runs
- Tests under load (concurrent operations)
- Regression detection with baselines
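
One way to obtain the P50, P99, and P99.9 figures is to collect raw per-operation timings and apply a nearest-rank percentile. The sketch below assumes samples are recorded in nanoseconds; `percentile`, `summarize`, and `LatencySummary` are hypothetical helper names, not part of Criterion or the crate.

```rust
/// Nearest-rank percentile of an ascending-sorted sample set.
/// `per_mille` expresses the percentile in tenths of a percent (999 = P99.9),
/// which keeps the rank computation in integer arithmetic.
pub fn percentile(sorted_ns: &[u64], per_mille: usize) -> u64 {
    assert!(!sorted_ns.is_empty());
    assert!((1..=1000).contains(&per_mille));
    // rank = ceil(n * per_mille / 1000), 1-based
    let rank = (sorted_ns.len() * per_mille + 999) / 1000;
    sorted_ns[rank - 1]
}

/// The three percentiles tracked in the latency table.
#[derive(Debug, PartialEq, Eq)]
pub struct LatencySummary {
    pub p50: u64,
    pub p99: u64,
    pub p999: u64,
}

/// Sorts the raw samples and extracts P50, P99, and P99.9.
pub fn summarize(mut samples_ns: Vec<u64>) -> LatencySummary {
    samples_ns.sort_unstable();
    LatencySummary {
        p50: percentile(&samples_ns, 500),
        p99: percentile(&samples_ns, 990),
        p999: percentile(&samples_ns, 999),
    }
}
```

A meaningful P99.9 needs at least a few thousand samples per benchmark; with fewer, the value is simply the largest sample observed.
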
## 2. Memory Bounds Verification
### Memory Targets
| Component | Target per Instance | Verification Method |
|-----------|-------------------|-------------------|
| **Plasticity** |
| E-prop synapse state | 8-12 bytes | `std::mem::size_of` |
| BTSP eligibility window | 32 bytes | `std::mem::size_of` |
| EWC Fisher matrix (per layer) | O(n²) sparse | Allocation tracking |
| **Event Bus** |
| Bounded queue entry | 16-24 bytes | `std::mem::size_of` |
| Regional shard overhead | <1KB | Allocation tracking |
| **HDC** |
| Hypervector (10K dims) | 1.25KB (bit-packed) | Direct calculation |
| Encoding cache | <100KB | Memory profiler |
| **Hopfield** |
| Weight matrix (1000 neurons) | ~4MB (f32) or ~2MB (f16) | Direct calculation |
| Pattern storage | O(n×d) | Allocation tracking |
| **Workspace** |
| Global workspace capacity | 4-7 items × vector size | Capacity test |
| Coherence gating state | <1KB | `std::mem::size_of` |
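
The "direct calculation" and `std::mem::size_of` entries above can be written down as code. This is a sketch only: `ExampleSynapse` is a hypothetical layout chosen to fit the 12-byte budget and does not describe the crate's actual `EPropSynapse`.

```rust
/// Bytes needed for a bit-packed binary hypervector of `dims` dimensions.
pub const fn bit_packed_bytes(dims: usize) -> usize {
    (dims + 7) / 8
}

/// Bytes needed for a dense `neurons x neurons` weight matrix.
pub const fn dense_matrix_bytes(neurons: usize, bytes_per_weight: usize) -> usize {
    neurons * neurons * bytes_per_weight
}

/// Hypothetical synapse layout: 4 + 4 + 2 + 2 = 12 bytes.
#[allow(dead_code)]
#[repr(C)]
pub struct ExampleSynapse {
    weight: f32,
    eligibility: f32,
    last_spike: u16,
    flags: u16,
}

// Compile-time check: the build fails if the struct outgrows its budget.
const _: () = assert!(std::mem::size_of::<ExampleSynapse>() <= 12);
```

With these helpers, `bit_packed_bytes(10_000)` gives 1250 bytes (the 1.25KB hypervector figure) and `dense_matrix_bytes(1000, 4)` gives 4,000,000 bytes (the ~4MB f32 weight matrix).
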
### Verification Strategy
**Location**: `crates/ruvector-nervous-system/tests/memory_bounds.rs`
**Methods**:
1. **Compile-time checks**: `const` assertions on structure sizes (e.g. `const _: () = assert!(...)` or the `static_assertions` crate; Rust has no built-in `static_assert`)
2. **Runtime verification**: Allocation tracking with custom allocator
3. **Stress tests**: Create maximum capacity scenarios
4. **Leak detection**: Valgrind/Miri integration
**Example**:
```rust
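// `EPropSynapse`, `BTSPLearner`, and `get_allocated_bytes` are assumed to be in
// scope; import paths are omitted here.

#[test]
fn verify_eprop_synapse_size() {
    // Upper end of the 8-12 byte target.
    assert!(std::mem::size_of::<EPropSynapse>() <= 12);
}

#[test]
fn btsp_window_bounded() {
    let mut btsp = BTSPLearner::new(1000, 0.01, 100);
    let initial_mem = get_allocated_bytes();
    btsp.train_episodes(1000);
    let final_mem = get_allocated_bytes();
    // saturating_sub avoids an underflow panic if memory shrank during training.
    assert!(final_mem.saturating_sub(initial_mem) < 100_000); // <100KB growth
}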
```
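
The example above relies on a `get_allocated_bytes` helper that is not shown. One plausible way to provide it is a counting wrapper around the system allocator, sketched below; `TrackingAllocator` is a hypothetical name, and the crate's real allocation tracking may work differently.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Running total of live heap bytes across all threads.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and counts bytes on every alloc/dealloc.
pub struct TrackingAllocator;

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Install the wrapper for the whole test binary.
#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Bytes currently allocated through the global allocator.
pub fn get_allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}
```

Because the counter is process-wide, tests that compare before/after values are most reliable when run single-threaded (`cargo test -- --test-threads=1`).
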
## 3. Retrieval Quality Benchmarks
### Quality Metrics
| Metric | Target | Baseline Comparison | Test Method |
|--------|--------|-------------------|-------------|
| **HDC Recall** |
| Recall@1 vs HNSW | ≥95% of HNSW | Compare on same dataset | Synthetic corpus |
| Recall@10 vs HNSW | ≥90% of HNSW | Compare on same dataset | Synthetic corpus |
| Noise robustness (20% flip) | >80% accuracy | N/A | Bit-flip test |
| **Hopfield Capacity** |
| Pattern capacity (d=512) | Scales as ~2^(d/2) (2^256 for d=512); verified empirically at the largest feasible pattern count | Theoretical limit | Stress test |
| Retrieval accuracy (0.1 noise) | >95% | N/A | Noisy retrieval |
| **Pattern Separation** |
| Collision rate | <1% for 10K patterns | Random encoding | Synthetic corpus |
| Orthogonality score | >0.9 cosine distance | N/A | Correlation test |
| **Associative Memory** |
| One-shot learning accuracy | >90% | N/A | Single-shot test |
| Multi-pattern interference | <5% accuracy drop | Isolated patterns | Capacity test |
### Test Implementation
**Location**: `crates/ruvector-nervous-system/tests/retrieval_quality.rs`
**Datasets**:
1. **Synthetic**: Controlled distributions (uniform, gaussian, clustered)
2. **Real-world proxy**: MNIST embeddings, SIFT features
3. **Adversarial**: Designed to stress collision detection
**Comparison Baselines**:
- HNSW index (via ruvector-core)
- Exact k-NN (brute force)
- Theoretical limits (Hopfield capacity)
**Example**:
```rust
#[test]
fn hdc_recall_vs_hnsw() {
    let vectors: Vec<Vec<f32>> = generate_synthetic_dataset(10000, 512);
    // Borrow the first 100 vectors as queries instead of copying them.
    let queries: &[Vec<f32>] = &vectors[0..100];

    // HDC results
    let mut hdc = HDCIndex::new(512, 10000);
    for (i, v) in vectors.iter().enumerate() {
        hdc.encode_and_store(i, v);
    }
    let hdc_results: Vec<_> = queries.iter().map(|q| hdc.search(q, 10)).collect();

    // HNSW results (ground truth)
    let mut hnsw = HNSWIndex::new(512);
    for (i, v) in vectors.iter().enumerate() {
        hnsw.insert(i, v);
    }
    let hnsw_results: Vec<_> = queries.iter().map(|q| hnsw.search(q, 10)).collect();

    // Compare recall
    let recall = calculate_recall(&hdc_results, &hnsw_results);
    assert!(recall >= 0.90, "HDC recall@10 {} < 90% of HNSW", recall);
}
```
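
The test above calls `calculate_recall` without defining it. A minimal sketch follows, assuming each search returns a list of unique integer IDs; the signature is an assumption and may not match the crate's helper.

```rust
use std::collections::HashSet;

/// Fraction of ground-truth neighbours that also appear in the candidate
/// results, pooled over all queries. IDs within one result list are assumed
/// to be unique.
pub fn calculate_recall(candidate: &[Vec<usize>], ground_truth: &[Vec<usize>]) -> f64 {
    assert_eq!(candidate.len(), ground_truth.len(), "one result list per query");
    let mut hits = 0usize;
    let mut total = 0usize;
    for (cand, truth) in candidate.iter().zip(ground_truth) {
        let truth_set: HashSet<usize> = truth.iter().copied().collect();
        hits += cand.iter().filter(|id| truth_set.contains(*id)).count();
        total += truth_set.len();
    }
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```

Since HNSW is itself approximate, a recall measured this way is relative to HNSW rather than to exact k-NN; swapping in brute-force results as `ground_truth` gives the absolute figure.
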
## 4. Throughput Benchmarks
### Throughput Targets
| Component | Target | Measurement Condition | Test Method |
|-----------|--------|---------------------|-------------|
| **Event Bus** |
| Event throughput | >10,000 events/ms | Sustained load | Load generator |
| Multi-producer scaling | Linear to 8 cores | Concurrent publishers | Parallel bench |
| Backpressure handling | Graceful degradation | Queue saturation | Stress test |
| **Plasticity** |
| Consolidation replay | >100 samples/sec | Batch processing | Batch timer |
| Meta-learning update | >50 tasks/sec | Task distribution | Task timer |
| **HDC** |
| Encoding throughput | >1M ops/sec | Batch encoding | Throughput bench |
| Similarity checks | >10M ops/sec | SIMD acceleration | Throughput bench |
| **Hopfield** |
| Parallel retrieval | >1000 queries/sec | Batch queries | Throughput bench |
### Sustained Load Tests
**Location**: `crates/ruvector-nervous-system/tests/throughput.rs`
**Duration**: Minimum 60 seconds per test
**Metrics**:
- Operations per second (mean, min, max)
- Latency distribution under load
- CPU utilization
- Memory growth rate
**Example**:
```rust
use std::time::{Duration, Instant};

#[test]
fn event_bus_sustained_throughput() {
    let bus = EventBus::new(1000);
    let start = Instant::now();
    let duration = Duration::from_secs(60);
    let mut count = 0u64;
    while start.elapsed() < duration {
        bus.publish(Event::new("test", vec![0.0; 128]));
        count += 1;
    }
    // Divide by the measured elapsed time, which can overshoot `duration`.
    let elapsed_ms = start.elapsed().as_secs_f64() * 1_000.0;
    let events_per_ms = count as f64 / elapsed_ms;
    // Matches the table target of >10,000 events/ms.
    assert!(events_per_ms > 10_000.0,
        "Event bus throughput {} < 10K events/ms", events_per_ms);
}
```
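
The same timing loop can be factored into a reusable helper so that every throughput target is measured the same way. This is a sketch under that assumption; `measure` and `Throughput` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Result of one sustained-load run.
pub struct Throughput {
    pub ops: u64,
    pub elapsed: Duration,
}

impl Throughput {
    /// Operations per millisecond over the measured interval.
    pub fn ops_per_ms(&self) -> f64 {
        self.ops as f64 / (self.elapsed.as_secs_f64() * 1_000.0)
    }

    /// Operations per second over the measured interval.
    pub fn ops_per_sec(&self) -> f64 {
        self.ops as f64 / self.elapsed.as_secs_f64()
    }
}

/// Calls `op` repeatedly for at least `run_for` and reports how many calls
/// completed in the actually elapsed time.
pub fn measure<F: FnMut()>(mut op: F, run_for: Duration) -> Throughput {
    let start = Instant::now();
    let mut ops = 0u64;
    while start.elapsed() < run_for {
        op();
        ops += 1;
    }
    Throughput { ops, elapsed: start.elapsed() }
}
```

Reading the clock on every iteration adds overhead that matters for sub-microsecond operations; checking the clock once per batch of calls reduces the distortion.
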
## 5. Integration Tests
### End-to-End Scenarios
**Location**: `crates/ruvector-nervous-system/tests/integration.rs`
| Scenario | Components Tested | Success Criteria |
|----------|------------------|-----------------|
| **DVS Event Processing** | EventBus → HDC → WTA → Hopfield | <1ms end-to-end latency |
| **Associative Recall** | Hopfield → PatternSeparation → EventBus | >95% retrieval accuracy |
| **Adaptive Learning** | BTSP → E-prop → EWC → Memory | Positive transfer, <10% catastrophic forgetting |
| **Cognitive Routing** | Workspace → Coherence → Attention | Correct priority selection |
| **Reflex Arc** | Cognitum → EventBus → WTA → Action | <100μs reflex latency |
### Integration Test Structure
```rust
#[test]
fn test_dvs_to_classification_pipeline() {
    // Setup
    let event_bus = EventBus::new(1000);
    let hdc_encoder = HDCEncoder::new(10000);
    let wta = WTALayer::new(100, 0.5, 0.1);
    let hopfield = ModernHopfield::new(512, 100.0);

    // Train on patterns (`training_data` is assumed to come from a test fixture)
    for (label, events) in training_data {
        let hv = hdc_encoder.encode_events(&events);
        let sparse = wta.compete(&hv);
        hopfield.store_labeled(label, &sparse);
    }

    // Test retrieval
    let test_events = generate_test_dvs_stream();
    let start = Instant::now();
    let hv = hdc_encoder.encode_events(&test_events);
    let sparse = wta.compete(&hv);
    let retrieved = hopfield.retrieve(&sparse);
    let latency = start.elapsed();

    // Verify
    assert!(latency < Duration::from_millis(1), "Latency {}μs > 1ms", latency.as_micros());
    assert!(retrieved.accuracy > 0.95, "Accuracy {} < 95%", retrieved.accuracy);
}
```
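
The `wta.compete` step in the pipeline is where a dense encoding becomes sparse. As an illustration of the underlying idea, here is a minimal k-winner-take-all sketch; the free function `k_wta` is hypothetical, and the real `WTALayer` (with its inhibition parameters) may behave differently.

```rust
/// Keeps the `k` largest activations and zeroes the rest.
/// Ties are broken in favour of the lower index.
pub fn k_wta(activations: &[f32], k: usize) -> Vec<f32> {
    let mut order: Vec<usize> = (0..activations.len()).collect();
    // Sort indices by descending activation; `total_cmp` gives a total order
    // over f32, so NaN values cannot cause a panic.
    order.sort_by(|&a, &b| activations[b].total_cmp(&activations[a]));

    let mut out = vec![0.0; activations.len()];
    for &i in order.iter().take(k) {
        out[i] = activations[i];
    }
    out
}
```

A full sort costs O(n log n); for the microsecond targets in the latency table, a partial selection such as `select_nth_unstable_by` would likely be the better fit.
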
## 6. Property-Based Testing
### Invariants to Verify
**Location**: Uses `proptest` crate throughout test suite
| Property | Invariant | Verification |
|----------|-----------|--------------|
| **HDC** |
| Binding commutativity | `bind(a, b) == bind(b, a)` | Property test |
| Bundling order invariance | `bundle([a, b, c])` is invariant to input order | Property test |
| Distance symmetry | `distance(a, b) == distance(b, a)` | Property test |
| **Hopfield** |
| Energy monotonic decrease | Energy never increases during retrieval | Property test |
| Fixed point stability | Stored patterns are attractors | Property test |
| **Pattern Separation** |
| Collision bound | Collision rate < theoretical bound | Statistical test |
| Reversibility | `decode(encode(x))` approximates `x` | Property test |
**Example**:
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn hopfield_energy_decreases(
        // Bounded components keep the energy finite; `prop::num::f32::NORMAL`
        // would also generate magnitudes near `f32::MAX`.
        pattern in prop::collection::vec(-1.0f32..1.0f32, 512)
    ) {
        let mut hopfield = ModernHopfield::new(512, 100.0);
        hopfield.store(pattern.clone());
        let mut state = add_noise(&pattern, 0.2);
        let mut prev_energy = hopfield.energy(&state);
        for _ in 0..10 {
            state = hopfield.update(&state);
            let curr_energy = hopfield.energy(&state);
            // Small tolerance absorbs floating-point rounding near a fixed point.
            prop_assert!(curr_energy <= prev_energy + 1e-6,
                "Energy increased: {} -> {}", prev_energy, curr_energy);
            prev_energy = curr_energy;
        }
    }
}

proptest! {
    #[test]
    fn hdc_binding_commutative(
        a in hypervector_strategy(),
        b in hypervector_strategy()
    ) {
        let ab = a.bind(&b);
        let ba = b.bind(&a);
        prop_assert_eq!(ab, ba, "Binding not commutative");
    }
}
```
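
To make the energy invariant concrete, the following sketch implements the standard modern (softmax) Hopfield update and its energy, with additive constants dropped. It is an assumption that `ModernHopfield` follows this formulation; the `Hopfield` type here is hypothetical and uses `f64` for clarity.

```rust
/// Minimal modern Hopfield network over stored patterns x_i with inverse
/// temperature `beta` (hypothetical type, not the crate's `ModernHopfield`).
pub struct Hopfield {
    patterns: Vec<Vec<f64>>,
    beta: f64,
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl Hopfield {
    pub fn new(patterns: Vec<Vec<f64>>, beta: f64) -> Self {
        assert!(!patterns.is_empty() && beta > 0.0);
        Self { patterns, beta }
    }

    /// beta * (x_i . state) for every stored pattern.
    fn scores(&self, state: &[f64]) -> Vec<f64> {
        self.patterns.iter().map(|p| self.beta * dot(p, state)).collect()
    }

    /// E(s) = -(1/beta) * ln(sum_i exp(beta * x_i . s)) + 0.5 * |s|^2
    pub fn energy(&self, state: &[f64]) -> f64 {
        let s = self.scores(state);
        let max = s.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        // Log-sum-exp with the maximum factored out for numerical stability.
        let lse = max + s.iter().map(|v| (v - max).exp()).sum::<f64>().ln();
        -lse / self.beta + 0.5 * dot(state, state)
    }

    /// One retrieval step: s' = sum_i softmax(beta * x_i . s)_i * x_i
    pub fn update(&self, state: &[f64]) -> Vec<f64> {
        let s = self.scores(state);
        let max = s.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let weights: Vec<f64> = s.iter().map(|v| (v - max).exp()).collect();
        let z: f64 = weights.iter().sum();
        let mut next = vec![0.0; state.len()];
        for (w, p) in weights.iter().zip(&self.patterns) {
            for (n, x) in next.iter_mut().zip(p) {
                *n += (w / z) * x;
            }
        }
        next
    }
}
```

With patterns `[1, 0]` and `[0, 1]`, `beta = 4`, and the noisy state `[0.8, 0.1]`, repeated calls to `update` pull the state toward the first pattern while `energy` falls at each step.
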
## 7. Performance Regression Detection
### Baseline Storage
**Location**: `crates/ruvector-nervous-system/benches/baselines/`
**Format**: JSON files with historical results
```json
{
  "benchmark": "hopfield_retrieve_1000_patterns",
  "date": "2025-12-28",
  "commit": "abc123",
  "mean": 874.3,
  "std_dev": 12.1,
  "p99": 920.5
}
```
### CI Integration
**GitHub Actions Workflow**:
```yaml
name: Performance Regression Check
on: [pull_request]
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # full history so `main` can be checked out below
      - name: Run benchmarks
        run: cargo bench --bench latency_benchmarks -- --save-baseline pr
      - name: Compare to main
        run: |
          git checkout main
          cargo bench --bench latency_benchmarks -- --save-baseline main
          # Load the saved PR results and compare them against main without re-running
          cargo bench --bench latency_benchmarks -- --load-baseline pr --baseline main
      - name: Check thresholds
        run: |
          python scripts/check_regression.py --threshold 1.10  # 10% regression limit
```
### Threshold-Based Pass/Fail
| Metric | Warning Threshold | Failure Threshold |
|--------|------------------|------------------|
| Latency increase | +5% | +10% |
| Throughput decrease | -5% | -10% |
| Memory increase | +10% | +20% |
| Accuracy decrease | -2% | -5% |
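
The warning and failure thresholds above reduce to a small classification rule. The sketch below shows one way to encode it, treating the thresholds as exclusive (a regression must exceed the limit to trigger); the function and type names are hypothetical and unrelated to `scripts/check_regression.py`.

```rust
/// Outcome of comparing one metric against its baseline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Relative regression for metrics where higher is worse (latency, memory).
pub fn increase_regression(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline
}

/// Relative regression for metrics where lower is worse (throughput, accuracy).
pub fn decrease_regression(baseline: f64, current: f64) -> f64 {
    (baseline - current) / baseline
}

/// Maps a relative regression (positive = worse) onto the threshold table.
pub fn classify(regression: f64, warn: f64, fail: f64) -> Verdict {
    if regression > fail {
        Verdict::Fail
    } else if regression > warn {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}
```

For latency, `classify(increase_regression(baseline, current), 0.05, 0.10)` applies the +5% / +10% row; memory uses `0.10, 0.20` and accuracy uses `decrease_regression` with `0.02, 0.05`.
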
## 8. Test Execution Matrix
### Local Development
```bash
# Unit tests
cargo test -p ruvector-nervous-system
# Integration tests
cargo test -p ruvector-nervous-system --test integration
# All benchmarks
cargo bench -p ruvector-nervous-system
# Specific benchmark
cargo bench -p ruvector-nervous-system --bench latency_benchmarks
# With profiling
cargo bench -p ruvector-nervous-system -- --profile-time=10
# Memory bounds check
cargo test -p ruvector-nervous-system --test memory_bounds -- --nocapture
```
### CI Pipeline
| Stage | Tests Run | Success Criteria |
|-------|-----------|-----------------|
| **PR Check** | Unit + Integration | 100% pass |
| **Nightly** | Full benchmark suite | No >10% regressions |
| **Release** | Full suite + extended stress | All thresholds met |
### Platform Coverage
- **Linux x86_64**: Primary target (all tests)
- **Linux ARM64**: Throughput + latency (results may differ from x86_64)
- **macOS**: Compatibility check
- **Windows**: Compatibility check
## 9. Test Data Management
### Synthetic Data Generation
**Location**: `crates/ruvector-nervous-system/tests/data/generators.rs`
- **Uniform random**: `generate_uniform(n, d)`
- **Gaussian clusters**: `generate_clusters(n, k, d, sigma)`
- **Temporal sequences**: `generate_spike_trains(n, duration, rate)`
- **Adversarial**: `generate_collisions(n, d, target_rate)`
### Reproducibility
- All tests use fixed seeds: `rand::SeedableRng::seed_from_u64(42)`
- Snapshot testing for golden outputs
- Version-controlled test vectors
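
Fixed seeds make generated datasets identical from run to run. The sketch below illustrates this with a small SplitMix64 generator so that it has no external dependencies; the actual suite is described as using the `rand` crate, and the extra `seed` parameter on `generate_uniform` is an assumption added for illustration.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only to keep the
/// example dependency-free.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits of the output.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// `n` vectors of dimension `d` with components uniform in [0, 1).
/// The same seed always yields the same dataset.
pub fn generate_uniform(n: usize, d: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = SplitMix64::new(seed);
    (0..n)
        .map(|_| (0..d).map(|_| rng.next_f32()).collect())
        .collect()
}
```

Calling `generate_uniform(4, 8, 42)` twice returns equal datasets, which is the property that snapshot tests and golden outputs depend on.
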
## 10. Documentation and Reporting
### Test Reports
**Generated artifacts**:
- `target/criterion/`: HTML benchmark reports
- `target/coverage/`: Code coverage (via `cargo tarpaulin`)
- `target/flamegraph/`: Performance profiles
### Coverage Targets
| Category | Target |
|----------|--------|
| Line coverage | >85% |
| Branch coverage | >75% |
| Function coverage | >90% |
### Continuous Monitoring
- **Benchmark dashboard**: Track trends over time
- **Alerting**: Slack/email on regression detection
- **Historical comparison**: Compare across releases
---
## Appendix: Test Checklist
### Pre-Release Verification
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] All benchmarks meet latency targets (P99)
- [ ] Memory bounds verified
- [ ] Retrieval quality meets targets (Recall@1 ≥95% and Recall@10 ≥90% of HNSW baseline)
- [ ] Throughput targets met under sustained load
- [ ] No performance regressions >5%
- [ ] Property tests pass (10K iterations)
- [ ] Coverage ≥85%
- [ ] Documentation updated
- [ ] CHANGELOG entries added
### Test Maintenance
- [ ] Review and update baselines quarterly
- [ ] Add tests for each new feature
- [ ] Refactor slow tests
- [ ] Archive obsolete benchmarks
- [ ] Update thresholds based on hardware improvements
---
**Version**: 1.0
**Last Updated**: 2025-12-28
**Maintainer**: RuVector Nervous System Team