wifi-densepose/docs/nervous-system/test-plan.md

# RuVector Nervous System - Comprehensive Test Plan

## Overview

This test plan defines performance targets, quality metrics, and verification strategies for the RuVector Nervous System. All tests are designed to ensure real-time performance, memory efficiency, and biological plausibility.

## 1. Worst-Case Latency Requirements

### Latency Targets

| Component | Target | P50 | P99 | P99.9 | Measurement Method |
|-----------|--------|-----|-----|-------|-------------------|
| **Event Bus** |
| Event publish | <10μs | <5μs | <15μs | <50μs | Criterion benchmark |
| Event delivery (bounded queue) | <5μs | <2μs | <8μs | <20μs | Criterion benchmark |
| Priority routing | <20μs | <10μs | <30μs | <100μs | Criterion benchmark |
| **HDC (Hyperdimensional Computing)** |
| Vector binding (XOR) | <100ns | <50ns | <150ns | <500ns | Criterion benchmark |
| Vector bundling (majority) | <500ns | <200ns | <1μs | <5μs | Criterion benchmark |
| Hamming distance | <100ns | <50ns | <150ns | <500ns | Criterion benchmark |
| Similarity check | <200ns | <100ns | <300ns | <1μs | Criterion benchmark |
| **WTA (Winner-Take-All)** |
| Single winner selection | <1μs | <500ns | <2μs | <10μs | Criterion benchmark |
| k-WTA (k=5) | <5μs | <2μs | <10μs | <50μs | Criterion benchmark |
| Lateral inhibition update | <10μs | <5μs | <20μs | <100μs | Criterion benchmark |
| **Hopfield Networks** |
| Pattern retrieval (100 patterns) | <1ms | <500μs | <2ms | <10ms | Criterion benchmark |
| Pattern storage | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| Energy computation | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |
| **Pattern Separation** |
| Encoding (orthogonalization) | <500μs | <200μs | <1ms | <5ms | Criterion benchmark |
| Collision detection | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| Decorrelation | <200μs | <100μs | <500μs | <2ms | Criterion benchmark |
| **Plasticity** |
| E-prop gradient update | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| BTSP eligibility trace | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |
| EWC Fisher matrix update | <1ms | <500μs | <2ms | <10ms | Criterion benchmark |
| **Cognitum Integration** |
| Reflex event→action | <100μs | <50μs | <200μs | <1ms | Criterion benchmark |
| v0 adapter dispatch | <50μs | <20μs | <100μs | <500μs | Criterion benchmark |

### Benchmark Implementation

**Location**: `crates/ruvector-nervous-system/benches/latency_benchmarks.rs`

**Key Features**:
- Uses Criterion for statistical rigor
- Measures P50, P99, P99.9 percentiles
- Includes warm-up runs
- Tests under load (concurrent operations)
- Regression detection with baselines

## 2. Memory Bounds Verification

### Memory Targets

| Component | Target per Instance | Verification Method |
|-----------|-------------------|-------------------|
| **Plasticity** |
| E-prop synapse state | 8-12 bytes | `std::mem::size_of` |
| BTSP eligibility window | 32 bytes | `std::mem::size_of` |
| EWC Fisher matrix (per layer) | O(n²) sparse | Allocation tracking |
| **Event Bus** |
| Bounded queue entry | 16-24 bytes | `std::mem::size_of` |
| Regional shard overhead | <1KB | Allocation tracking |
| **HDC** |
| Hypervector (10K dims) | 1.25KB (bit-packed) | Direct calculation |
| Encoding cache | <100KB | Memory profiler |
| **Hopfield** |
| Weight matrix (1000 neurons) | ~4MB (f32) or ~1MB (f16) | Direct calculation |
| Pattern storage | O(n×d) | Allocation tracking |
| **Workspace** |
| Global workspace capacity | 4-7 items × vector size | Capacity test |
| Coherence gating state | <1KB | `std::mem::size_of` |

### Verification Strategy

**Location**: `crates/ruvector-nervous-system/tests/memory_bounds.rs`

**Methods**:
1. **Compile-time checks**: `static_assert` for structure sizes
2. **Runtime verification**: Allocation tracking with custom allocator
3. **Stress tests**: Create maximum capacity scenarios
4. **Leak detection**: Valgrind/MIRI integration

**Example**:
```rust
#[test]
fn verify_eprop_synapse_size() {
    assert!(std::mem::size_of::<EPropSynapse>() <= 12);
}

#[test]
fn btsp_window_bounded() {
    let btsp = BTSPLearner::new(1000, 0.01, 100);
    let initial_mem = get_allocated_bytes();
    btsp.train_episodes(1000);
    let final_mem = get_allocated_bytes();
    assert!(final_mem - initial_mem < 100_000); // <100KB growth
}
```

## 3. Retrieval Quality Benchmarks

### Quality Metrics

| Metric | Target | Baseline Comparison | Test Method |
|--------|--------|-------------------|-------------|
| **HDC Recall** |
| Recall@1 vs HNSW | ≥95% of HNSW | Compare on same dataset | Synthetic corpus |
| Recall@10 vs HNSW | ≥90% of HNSW | Compare on same dataset | Synthetic corpus |
| Noise robustness (20% flip) | >80% accuracy | N/A | Bit-flip test |
| **Hopfield Capacity** |
| Pattern capacity (d=512) | ≥2^(d/2) = 2^256 patterns | Theoretical limit | Stress test |
| Retrieval accuracy (0.1 noise) | >95% | N/A | Noisy retrieval |
| **Pattern Separation** |
| Collision rate | <1% for 10K patterns | Random encoding | Synthetic corpus |
| Orthogonality score | >0.9 cosine distance | N/A | Correlation test |
| **Associative Memory** |
| One-shot learning accuracy | >90% | N/A | Single-shot test |
| Multi-pattern interference | <5% accuracy drop | Isolated patterns | Capacity test |

### Test Implementation

**Location**: `crates/ruvector-nervous-system/tests/retrieval_quality.rs`

**Datasets**:
1. **Synthetic**: Controlled distributions (uniform, gaussian, clustered)
2. **Real-world proxy**: MNIST embeddings, SIFT features
3. **Adversarial**: Designed to stress collision detection

**Comparison Baselines**:
- HNSW index (via ruvector-core)
- Exact k-NN (brute force)
- Theoretical limits (Hopfield capacity)

**Example**:
```rust
#[test]
fn hdc_recall_vs_hnsw() {
    let vectors: Vec<Vec<f32>> = generate_synthetic_dataset(10000, 512);
    let queries: Vec<Vec<f32>> = &vectors[0..100];

    // HDC results
    let hdc = HDCIndex::new(512, 10000);
    for (i, v) in vectors.iter().enumerate() {
        hdc.encode_and_store(i, v);
    }
    let hdc_results = queries.iter().map(|q| hdc.search(q, 10)).collect();

    // HNSW results (ground truth)
    let hnsw = HNSWIndex::new(512);
    for (i, v) in vectors.iter().enumerate() {
        hnsw.insert(i, v);
    }
    let hnsw_results = queries.iter().map(|q| hnsw.search(q, 10)).collect();

    // Compare recall
    let recall = calculate_recall(&hdc_results, &hnsw_results);
    assert!(recall >= 0.90, "HDC recall@10 {} < 90% of HNSW", recall);
}
```

## 4. Throughput Benchmarks

### Throughput Targets

| Component | Target | Measurement Condition | Test Method |
|-----------|--------|---------------------|-------------|
| **Event Bus** |
| Event throughput | >10,000 events/ms | Sustained load | Load generator |
| Multi-producer scaling | Linear to 8 cores | Concurrent publishers | Parallel bench |
| Backpressure handling | Graceful degradation | Queue saturation | Stress test |
| **Plasticity** |
| Consolidation replay | >100 samples/sec | Batch processing | Batch timer |
| Meta-learning update | >50 tasks/sec | Task distribution | Task timer |
| **HDC** |
| Encoding throughput | >1M ops/sec | Batch encoding | Throughput bench |
| Similarity checks | >10M ops/sec | SIMD acceleration | Throughput bench |
| **Hopfield** |
| Parallel retrieval | >1000 queries/sec | Batch queries | Throughput bench |

### Sustained Load Tests

**Location**: `crates/ruvector-nervous-system/tests/throughput.rs`

**Duration**: Minimum 60 seconds per test
**Metrics**:
- Operations per second (mean, min, max)
- Latency distribution under load
- CPU utilization
- Memory growth rate

**Example**:
```rust
#[test]
fn event_bus_sustained_throughput() {
    let bus = EventBus::new(1000);
    let start = Instant::now();
    let duration = Duration::from_secs(60);
    let mut count = 0u64;

    while start.elapsed() < duration {
        bus.publish(Event::new("test", vec![0.0; 128]));
        count += 1;
    }

    let events_per_sec = count as f64 / duration.as_secs_f64();
    assert!(events_per_sec > 10_000.0,
            "Event bus throughput {} < 10K/sec", events_per_sec);
}
```

## 5. Integration Tests

### End-to-End Scenarios

**Location**: `crates/ruvector-nervous-system/tests/integration.rs`

| Scenario | Components Tested | Success Criteria |
|----------|------------------|-----------------|
| **DVS Event Processing** | EventBus → HDC → WTA → Hopfield | <1ms end-to-end latency |
| **Associative Recall** | Hopfield → PatternSeparation → EventBus | >95% retrieval accuracy |
| **Adaptive Learning** | BTSP → E-prop → EWC → Memory | Positive transfer, <10% catastrophic forgetting |
| **Cognitive Routing** | Workspace → Coherence → Attention | Correct priority selection |
| **Reflex Arc** | Cognitum → EventBus → WTA → Action | <100μs reflex latency |

### Integration Test Structure

```rust
#[test]
fn test_dvs_to_classification_pipeline() {
    // Setup
    let event_bus = EventBus::new(1000);
    let hdc_encoder = HDCEncoder::new(10000);
    let wta = WTALayer::new(100, 0.5, 0.1);
    let hopfield = ModernHopfield::new(512, 100.0);

    // Train on patterns
    for (label, events) in training_data {
        let hv = hdc_encoder.encode_events(&events);
        let sparse = wta.compete(&hv);
        hopfield.store_labeled(label, &sparse);
    }

    // Test retrieval
    let test_events = generate_test_dvs_stream();
    let start = Instant::now();
    let hv = hdc_encoder.encode_events(&test_events);
    let sparse = wta.compete(&hv);
    let retrieved = hopfield.retrieve(&sparse);
    let latency = start.elapsed();

    // Verify
    assert!(latency < Duration::from_millis(1), "Latency {} > 1ms", latency.as_micros());
    assert!(retrieved.accuracy > 0.95, "Accuracy {} < 95%", retrieved.accuracy);
}
```

## 6. Property-Based Testing

### Invariants to Verify

**Location**: Uses `proptest` crate throughout test suite

| Property | Component | Verification |
|----------|-----------|--------------|
| **HDC** |
| Binding commutativity | `bind(a, b) == bind(b, a)` | Property test |
| Bundling associativity | `bundle([a, b, c]) invariant to order` | Property test |
| Distance symmetry | `distance(a, b) == distance(b, a)` | Property test |
| **Hopfield** |
| Energy monotonic decrease | Energy never increases during retrieval | Property test |
| Fixed point stability | Stored patterns are attractors | Property test |
| **Pattern Separation** |
| Collision bound | Collision rate < theoretical bound | Statistical test |
| Reversibility | `decode(encode(x))` approximates `x` | Property test |

**Example**:
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn hopfield_energy_decreases(
        pattern in prop::collection::vec(prop::num::f32::NORMAL, 512)
    ) {
        let mut hopfield = ModernHopfield::new(512, 100.0);
        hopfield.store(pattern.clone());

        let mut state = add_noise(&pattern, 0.2);
        let mut prev_energy = hopfield.energy(&state);

        for _ in 0..10 {
            state = hopfield.update(&state);
            let curr_energy = hopfield.energy(&state);
            prop_assert!(curr_energy <= prev_energy,
                        "Energy increased: {} -> {}", prev_energy, curr_energy);
            prev_energy = curr_energy;
        }
    }
}

proptest! {
    #[test]
    fn hdc_binding_commutative(
        a in hypervector_strategy(),
        b in hypervector_strategy()
    ) {
        let ab = a.bind(&b);
        let ba = b.bind(&a);
        prop_assert_eq!(ab, ba, "Binding not commutative");
    }
}
```

## 7. Performance Regression Detection

### Baseline Storage

**Location**: `crates/ruvector-nervous-system/benches/baselines/`

**Format**: JSON files with historical results
```json
{
  "benchmark": "hopfield_retrieve_1000_patterns",
  "date": "2025-12-28",
  "commit": "abc123",
  "mean": 874.3,
  "std_dev": 12.1,
  "p99": 920.5
}
```

### CI Integration

**GitHub Actions Workflow**:
```yaml
name: Performance Regression Check

on: [pull_request]

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run benchmarks
        run: cargo bench --bench latency_benchmarks -- --save-baseline pr
      - name: Compare to main
        run: |
          git checkout main
          cargo bench --bench latency_benchmarks -- --save-baseline main
          cargo bench --bench latency_benchmarks -- --baseline pr --load-baseline main
      - name: Check thresholds
        run: |
          python scripts/check_regression.py --threshold 1.10 # 10% regression limit
```

### Threshold-Based Pass/Fail

| Metric | Warning Threshold | Failure Threshold |
|--------|------------------|------------------|
| Latency increase | +5% | +10% |
| Throughput decrease | -5% | -10% |
| Memory increase | +10% | +20% |
| Accuracy decrease | -2% | -5% |

## 8. Test Execution Matrix

### Local Development

```bash
# Unit tests
cargo test -p ruvector-nervous-system

# Integration tests
cargo test -p ruvector-nervous-system --test integration

# All benchmarks
cargo bench -p ruvector-nervous-system

# Specific benchmark
cargo bench -p ruvector-nervous-system --bench latency_benchmarks

# With profiling
cargo bench -p ruvector-nervous-system -- --profile-time=10

# Memory bounds check
cargo test -p ruvector-nervous-system --test memory_bounds -- --nocapture
```

### CI Pipeline

| Stage | Tests Run | Success Criteria |
|-------|-----------|-----------------|
| **PR Check** | Unit + Integration | 100% pass |
| **Nightly** | Full benchmark suite | No >10% regressions |
| **Release** | Full suite + extended stress | All thresholds met |

### Platform Coverage

- **Linux x86_64**: Primary target (all tests)
- **Linux ARM64**: Throughput + latency (may differ)
- **macOS**: Compatibility check
- **Windows**: Compatibility check

## 9. Test Data Management

### Synthetic Data Generation

**Location**: `crates/ruvector-nervous-system/tests/data/generators.rs`

- **Uniform random**: `generate_uniform(n, d)`
- **Gaussian clusters**: `generate_clusters(n, k, d, sigma)`
- **Temporal sequences**: `generate_spike_trains(n, duration, rate)`
- **Adversarial**: `generate_collisions(n, d, target_rate)`

### Reproducibility

- All tests use fixed seeds: `rand::SeedableRng::seed_from_u64(42)`
- Snapshot testing for golden outputs
- Version-controlled test vectors

## 10. Documentation and Reporting

### Test Reports

**Generated artifacts**:
- `target/criterion/`: HTML benchmark reports
- `target/coverage/`: Code coverage (via `cargo tarpaulin`)
- `target/flamegraph/`: Performance profiles

### Coverage Targets

| Category | Target |
|----------|--------|
| Line coverage | >85% |
| Branch coverage | >75% |
| Function coverage | >90% |

### Continuous Monitoring

- **Benchmark dashboard**: Track trends over time
- **Alerting**: Slack/email on regression detection
- **Historical comparison**: Compare across releases

---

## Appendix: Test Checklist

### Pre-Release Verification

- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] All benchmarks meet latency targets (P99)
- [ ] Memory bounds verified
- [ ] Retrieval quality ≥95% of baseline
- [ ] Throughput targets met under sustained load
- [ ] No performance regressions >5%
- [ ] Property tests pass (10K iterations)
- [ ] Coverage ≥85%
- [ ] Documentation updated
- [ ] CHANGELOG entries added

### Test Maintenance

- [ ] Review and update baselines quarterly
- [ ] Add tests for each new feature
- [ ] Refactor slow tests
- [ ] Archive obsolete benchmarks
- [ ] Update thresholds based on hardware improvements

---

**Version**: 1.0
**Last Updated**: 2025-12-28
**Maintainer**: RuVector Nervous System Team