Files
wifi-densepose/examples/edge-net/docs/benchmarks/BENCHMARK_ANALYSIS.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

356 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Edge-Net Comprehensive Benchmark Analysis
This document provides detailed analysis of the edge-net performance benchmarks, covering spike-driven attention, RAC coherence, learning modules, and integration tests.
## Benchmark Categories
### 1. Spike-Driven Attention Benchmarks
Tests the energy-efficient spike-driven attention mechanism that claims 87x energy savings over standard attention.
**Benchmarks:**
- `bench_spike_encoding_small` - 64 values encoding
- `bench_spike_encoding_medium` - 256 values encoding
- `bench_spike_encoding_large` - 1024 values encoding
- `bench_spike_attention_seq16_dim64` - Attention with 16 seq, 64 dim
- `bench_spike_attention_seq64_dim128` - Attention with 64 seq, 128 dim
- `bench_spike_attention_seq128_dim256` - Attention with 128 seq, 256 dim
- `bench_spike_energy_ratio_calculation` - Energy ratio computation
**Key Metrics:**
- Encoding throughput (values/sec)
- Attention latency vs sequence length
- Energy ratio accuracy (target: 87x)
- Temporal coding overhead
**Expected Performance:**
- Encoding: < 1µs per value
- Attention (64x128): < 100µs
- Energy ratio calculation: < 10ns
- Scaling: O(n*m) where n=seq_len, m=spike_count
### 2. RAC Coherence Benchmarks
Tests the adversarial coherence engine for distributed claim verification and conflict resolution.
**Benchmarks:**
- `bench_rac_event_ingestion` - Single event ingestion
- `bench_rac_event_ingestion_1k` - 1000 events batch ingestion
- `bench_rac_quarantine_check` - Quarantine level lookup
- `bench_rac_quarantine_set_level` - Quarantine level update
- `bench_rac_merkle_root_update` - Merkle root calculation
- `bench_rac_ruvector_similarity` - Semantic similarity computation
**Key Metrics:**
- Event ingestion throughput (events/sec)
- Quarantine check latency
- Merkle proof generation time
- Conflict detection overhead
**Expected Performance:**
- Single event ingestion: < 50µs
- 1K batch ingestion: < 50ms (1000 events/sec)
- Quarantine check: < 100ns (hash map lookup)
- Merkle root: < 1ms for 100 events
- RuVector similarity: < 500ns
### 3. Learning Module Benchmarks
Tests the ReasoningBank pattern storage and trajectory tracking for self-learning.
**Benchmarks:**
- `bench_reasoning_bank_lookup_1k` - Lookup in 1K patterns
- `bench_reasoning_bank_lookup_10k` - Lookup in 10K patterns
- `bench_reasoning_bank_lookup_100k` - Lookup in 100K patterns (if added)
- `bench_reasoning_bank_store` - Pattern storage
- `bench_trajectory_recording` - Trajectory recording
- `bench_pattern_similarity_computation` - Cosine similarity
**Key Metrics:**
- Lookup latency vs database size
- Scaling characteristics (linear, log, constant)
- Storage throughput (patterns/sec)
- Similarity computation cost
**Expected Performance:**
- 1K lookup: < 1ms
- 10K lookup: < 10ms
- 100K lookup: < 100ms
- Pattern store: < 10µs
- Trajectory record: < 5µs
- Similarity: < 200ns per comparison
**Scaling Analysis:**
- Target: O(n) for brute-force similarity search
- With indexing: O(log n) or better
- 1K → 10K should be ~10x increase
- 10K → 100K should be ~10x increase
### 4. Multi-Head Attention Benchmarks
Tests the standard multi-head attention for task routing.
**Benchmarks:**
- `bench_multi_head_attention_2heads_dim8` - 2 heads, 8 dimensions
- `bench_multi_head_attention_4heads_dim64` - 4 heads, 64 dimensions
- `bench_multi_head_attention_8heads_dim128` - 8 heads, 128 dimensions
- `bench_multi_head_attention_8heads_dim256_10keys` - 8 heads, 256 dim, 10 keys
**Key Metrics:**
- Latency vs dimensions
- Latency vs number of heads
- Latency vs number of keys
- Throughput (ops/sec)
**Expected Performance:**
- 2h x 8d: < 1µs
- 4h x 64d: < 10µs
- 8h x 128d: < 50µs
- 8h x 256d x 10k: < 200µs
**Scaling:**
- O(d²) in dimension size (quadratic due to QKV projections)
- O(h) in number of heads (linear parallelization)
- O(k) in number of keys (linear attention)
### 5. Integration Benchmarks
Tests end-to-end performance with combined systems.
**Benchmarks:**
- `bench_end_to_end_task_routing_with_learning` - Full task lifecycle with learning
- `bench_combined_learning_coherence_overhead` - Learning + RAC overhead
- `bench_memory_usage_trajectory_1k` - Memory footprint for 1K trajectories
- `bench_concurrent_learning_and_rac_ops` - Concurrent operations
**Key Metrics:**
- End-to-end task latency
- Combined system overhead
- Memory usage over time
- Concurrent access performance
**Expected Performance:**
- E2E task routing: < 1ms
- Combined overhead: < 500µs for 10 ops each
- Memory 1K trajectories: < 1MB
- Concurrent ops: < 100µs
## Statistical Analysis
For each benchmark, we measure:
### Central Tendency
- **Mean**: Average execution time
- **Median**: Middle value (robust to outliers)
- **Mode**: Most common value
### Dispersion
- **Standard Deviation**: Measure of spread
- **Variance**: Squared deviation
- **Range**: Max - Min
- **IQR**: Interquartile range (75th - 25th percentile)
### Percentiles
- **P50 (Median)**: 50% of samples below this
- **P90**: 90% of samples below this
- **P95**: 95% of samples below this
- **P99**: 99% of samples below this
- **P99.9**: 99.9% of samples below this
### Performance Metrics
- **Throughput**: Operations per second
- **Latency**: Time per operation
- **Jitter**: Variation in latency (StdDev)
- **Efficiency**: Actual vs theoretical performance
## Running Benchmarks
### Prerequisites
```bash
cd /workspaces/ruvector/examples/edge-net
```
### Run All Benchmarks
```bash
# Using nightly Rust (required for bench feature)
rustup default nightly
cargo bench --features bench
# Or using the provided script
./benches/run_benchmarks.sh
```
### Run Specific Categories
```bash
# Spike-driven attention only
cargo bench --features bench -- spike_
# RAC coherence only
cargo bench --features bench -- rac_
# Learning modules only
cargo bench --features bench -- reasoning_bank
cargo bench --features bench -- trajectory
# Multi-head attention only
cargo bench --features bench -- multi_head
# Integration tests only
cargo bench --features bench -- integration
cargo bench --features bench -- end_to_end
```
### Custom Iterations
```bash
# Run with more iterations for statistical significance
BENCH_ITERATIONS=1000 cargo bench --features bench
```
## Interpreting Results
### Good Performance Indicators
**Low latency** - Operations complete quickly
**Low jitter** - Consistent performance (low StdDev)
**Good scaling** - Performance degrades predictably
**High throughput** - Many operations per second
### Performance Red Flags
**High P99/P99.9** - Long tail latencies
**High StdDev** - Inconsistent performance
**Poor scaling** - Worse than O(n) when expected
**Memory growth** - Unbounded memory usage
### Example Output Interpretation
```
bench_spike_attention_seq64_dim128:
Mean: 45,230 ns (45.23 µs)
Median: 44,100 ns
StdDev: 2,150 ns
P95: 48,500 ns
P99: 51,200 ns
Throughput: 22,110 ops/sec
```
**Analysis:**
- ✅ Mean < 100µs target
- ✅ Low jitter (StdDev ~4.7% of mean)
- ✅ P99 close to mean (good tail latency)
- ✅ Throughput adequate for distributed tasks
## Energy Efficiency Analysis
### Spike-Driven vs Standard Attention
**Theoretical Energy Ratio:** 87x
**Calculation:**
```
Standard Attention Energy:
= 2 * seq_len² * hidden_dim * mult_energy_factor
= 2 * 64² * 128 * 3.7
= 3,833,856 energy units
Spike Attention Energy:
= seq_len * avg_spikes * hidden_dim * add_energy_factor
= 64 * 2.4 * 128 * 1.0
= 19,660 energy units
Ratio = 3,833,856 / 19,660 = 195x (theoretical upper bound)
Achieved = ~87x (accounting for encoding overhead)
```
**Validation:**
- Measure actual execution time spike vs standard
- Compare energy consumption if available
- Verify temporal coding overhead is acceptable
## Scaling Characteristics
### Expected Complexity
| Component | Expected | Actual | Status |
|-----------|----------|--------|--------|
| Spike Encoding | O(n*s) | TBD | - |
| Spike Attention | O(n²) | TBD | - |
| RAC Event Ingestion | O(1) | TBD | - |
| RAC Merkle Update | O(n) | TBD | - |
| ReasoningBank Lookup | O(n) | TBD | - |
| Multi-Head Attention | O(n²d) | TBD | - |
### Scaling Tests
To verify scaling characteristics:
1. **Linear Scaling (O(n))**
- 1x → 10x input should show 10x time
- Example: 1K → 10K ReasoningBank
2. **Quadratic Scaling (O(n²))**
- 1x → 10x input should show 100x time
- Example: Attention sequence length
3. **Logarithmic Scaling (O(log n))**
- 1x → 10x input should show ~3.3x time
- Example: Indexed lookup (if implemented)
## Performance Targets Summary
| Component | Metric | Target | Rationale |
|-----------|--------|--------|-----------|
| Spike Encoding | Latency | < 1µs/value | Fast enough for real-time |
| Spike Attention | Latency | < 100µs | Enables 10K ops/sec |
| RAC Ingestion | Throughput | > 1K events/sec | Handle distributed load |
| RAC Quarantine | Latency | < 100ns | Fast decision making |
| ReasoningBank 10K | Latency | < 10ms | Acceptable for async ops |
| Multi-Head 8h×128d | Latency | < 50µs | Real-time routing |
| E2E Task Routing | Latency | < 1ms | User-facing threshold |
## Continuous Monitoring
### Regression Detection
Track benchmarks over time to detect performance regressions:
```bash
# Save baseline
cargo bench --features bench > baseline.txt
# After changes, compare
cargo bench --features bench > current.txt
diff baseline.txt current.txt
```
### CI/CD Integration
Add to GitHub Actions:
```yaml
- name: Run Benchmarks
run: cargo bench --features bench
- name: Compare with baseline
run: ./benches/compare_benchmarks.sh
```
## Contributing
When adding new features:
1. ✅ Add corresponding benchmarks
2. ✅ Document expected performance
3. ✅ Run benchmarks before submitting PR
4. ✅ Include benchmark results in PR description
5. ✅ Ensure no regressions in existing benchmarks
## References
- [Criterion.rs](https://github.com/bheisler/criterion.rs) - Rust benchmarking
- [Statistical Analysis](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing)
- [Performance Testing Best Practices](https://github.com/rust-lang/rust/blob/master/src/doc/rustc-dev-guide/src/tests/perf.md)