Edge-Net Comprehensive Benchmark Analysis
This document provides detailed analysis of the edge-net performance benchmarks, covering spike-driven attention, RAC coherence, learning modules, and integration tests.
Benchmark Categories
1. Spike-Driven Attention Benchmarks
Tests the energy-efficient spike-driven attention mechanism that claims 87x energy savings over standard attention.
Benchmarks:
- bench_spike_encoding_small - Encoding of 64 values
- bench_spike_encoding_medium - Encoding of 256 values
- bench_spike_encoding_large - Encoding of 1024 values
- bench_spike_attention_seq16_dim64 - Attention with 16 seq, 64 dim
- bench_spike_attention_seq64_dim128 - Attention with 64 seq, 128 dim
- bench_spike_attention_seq128_dim256 - Attention with 128 seq, 256 dim
- bench_spike_energy_ratio_calculation - Energy ratio computation
Key Metrics:
- Encoding throughput (values/sec)
- Attention latency vs sequence length
- Energy ratio accuracy (target: 87x)
- Temporal coding overhead
Expected Performance:
- Encoding: < 1µs per value
- Attention (64x128): < 100µs
- Energy ratio calculation: < 10ns
- Scaling: O(n*m) where n=seq_len, m=spike_count
2. RAC Coherence Benchmarks
Tests the adversarial coherence engine for distributed claim verification and conflict resolution.
Benchmarks:
- bench_rac_event_ingestion - Single event ingestion
- bench_rac_event_ingestion_1k - 1000 events batch ingestion
- bench_rac_quarantine_check - Quarantine level lookup
- bench_rac_quarantine_set_level - Quarantine level update
- bench_rac_merkle_root_update - Merkle root calculation
- bench_rac_ruvector_similarity - Semantic similarity computation
Key Metrics:
- Event ingestion throughput (events/sec)
- Quarantine check latency
- Merkle proof generation time
- Conflict detection overhead
Expected Performance:
- Single event ingestion: < 50µs
- 1K batch ingestion: < 50ms (at least 20K events/sec, consistent with < 50µs per event)
- Quarantine check: < 100ns (hash map lookup)
- Merkle root: < 1ms for 100 events
- RuVector similarity: < 500ns
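The batch and single-event targets above are two views of the same budget: 1,000 events in 50 ms is 50µs per event, or 20,000 events/sec. A minimal sketch of that conversion follows. The helper names are hypothetical and are not part of the edge-net API.

```rust
use std::time::Duration;

/// Events per second implied by ingesting `events` in `elapsed`.
fn events_per_sec(events: u32, elapsed: Duration) -> f64 {
    f64::from(events) / elapsed.as_secs_f64()
}

/// Average latency per event for the same batch measurement.
/// `events` must be non-zero.
fn per_event_latency(events: u32, elapsed: Duration) -> Duration {
    elapsed / events
}
```

Calling both helpers with 1,000 events and a 50 ms duration gives the throughput and per-event figures to compare against the targets in this section.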
3. Learning Module Benchmarks
Tests the ReasoningBank pattern storage and trajectory tracking for self-learning.
Benchmarks:
- bench_reasoning_bank_lookup_1k - Lookup in 1K patterns
- bench_reasoning_bank_lookup_10k - Lookup in 10K patterns
- bench_reasoning_bank_lookup_100k - Lookup in 100K patterns (if added)
- bench_reasoning_bank_store - Pattern storage
- bench_trajectory_recording - Trajectory recording
- bench_pattern_similarity_computation - Cosine similarity
Key Metrics:
- Lookup latency vs database size
- Scaling characteristics (linear, log, constant)
- Storage throughput (patterns/sec)
- Similarity computation cost
Expected Performance:
- 1K lookup: < 1ms
- 10K lookup: < 10ms
- 100K lookup: < 100ms
- Pattern store: < 10µs
- Trajectory record: < 5µs
- Similarity: < 200ns per comparison
Scaling Analysis:
- Target: O(n) for brute-force similarity search
- With indexing: O(log n) or better
- 1K → 10K should be ~10x increase
- 10K → 100K should be ~10x increase
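To make the O(n) expectation concrete, here is a minimal sketch of a brute-force lookup that computes one cosine similarity per stored pattern. It illustrates the technique only. The function names and the `Vec<Vec<f32>>` storage are assumptions and do not reflect the actual ReasoningBank implementation.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force lookup: one similarity computation per stored pattern,
/// so the cost grows linearly with the number of patterns.
/// Returns the index and score of the best match, or None if empty.
fn best_match(query: &[f32], patterns: &[Vec<f32>]) -> Option<(usize, f32)> {
    patterns
        .iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(query, p)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Because every pattern is visited exactly once, a 10x larger pattern set should take roughly 10x longer, which is what the 1K → 10K → 100K benchmarks check.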
4. Multi-Head Attention Benchmarks
Tests the standard multi-head attention for task routing.
Benchmarks:
- bench_multi_head_attention_2heads_dim8 - 2 heads, 8 dimensions
- bench_multi_head_attention_4heads_dim64 - 4 heads, 64 dimensions
- bench_multi_head_attention_8heads_dim128 - 8 heads, 128 dimensions
- bench_multi_head_attention_8heads_dim256_10keys - 8 heads, 256 dim, 10 keys
Key Metrics:
- Latency vs dimensions
- Latency vs number of heads
- Latency vs number of keys
- Throughput (ops/sec)
Expected Performance:
- 2h x 8d: < 1µs
- 4h x 64d: < 10µs
- 8h x 128d: < 50µs
- 8h x 256d x 10 keys: < 200µs
Scaling:
- O(d²) in dimension size (quadratic due to QKV projections)
- O(h) in number of heads (linear parallelization)
- O(k) in number of keys (cost grows linearly with key count)
5. Integration Benchmarks
Tests end-to-end performance with combined systems.
Benchmarks:
- bench_end_to_end_task_routing_with_learning - Full task lifecycle with learning
- bench_combined_learning_coherence_overhead - Learning + RAC overhead
- bench_memory_usage_trajectory_1k - Memory footprint for 1K trajectories
- bench_concurrent_learning_and_rac_ops - Concurrent operations
Key Metrics:
- End-to-end task latency
- Combined system overhead
- Memory usage over time
- Concurrent access performance
Expected Performance:
- E2E task routing: < 1ms
- Combined overhead: < 500µs for 10 ops each
- Memory 1K trajectories: < 1MB
- Concurrent ops: < 100µs
Statistical Analysis
For each benchmark, we measure:
Central Tendency
- Mean: Average execution time
- Median: Middle value (robust to outliers)
- Mode: Most common value
Dispersion
- Standard Deviation: Measure of spread
- Variance: Mean squared deviation from the mean (StdDev²)
- Range: Max - Min
- IQR: Interquartile range (75th - 25th percentile)
Percentiles
- P50 (Median): 50% of samples below this
- P90: 90% of samples below this
- P95: 95% of samples below this
- P99: 99% of samples below this
- P99.9: 99.9% of samples below this
Performance Metrics
- Throughput: Operations per second
- Latency: Time per operation
- Jitter: Variation in latency (StdDev)
- Efficiency: Actual vs theoretical performance
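The following sketch shows one way to derive the statistics listed above from raw latency samples. It assumes samples are collected as `f64` nanoseconds, uses the population standard deviation, and uses the nearest-rank percentile definition. The `Summary` type and function names are illustrative and are not taken from the benchmark harness.

```rust
/// Summary statistics for a set of latency samples (nanoseconds).
#[derive(Debug)]
struct Summary {
    mean: f64,
    median: f64,
    std_dev: f64,
    p95: f64,
    p99: f64,
}

/// Nearest-rank percentile on a sorted, non-empty slice (`p` in 0.0..=100.0).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Computes the summary, or returns None when there are no samples.
fn summarize(samples: &[f64]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let mean = sorted.iter().sum::<f64>() / n as f64;
    // Population variance: mean squared deviation from the mean.
    let variance = sorted.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n as f64;
    let median = if n % 2 == 0 {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    } else {
        sorted[n / 2]
    };
    Some(Summary {
        mean,
        median,
        std_dev: variance.sqrt(),
        p95: percentile(&sorted, 95.0),
        p99: percentile(&sorted, 99.0),
    })
}
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small sample sets, so the method should be kept fixed when comparing runs.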
Running Benchmarks
Prerequisites
cd /workspaces/ruvector/examples/edge-net
Run All Benchmarks
# Using nightly Rust (required for bench feature)
rustup default nightly
cargo bench --features bench
# Or using the provided script
./benches/run_benchmarks.sh
Run Specific Categories
# Spike-driven attention only
cargo bench --features bench -- spike_
# RAC coherence only
cargo bench --features bench -- rac_
# Learning modules only
cargo bench --features bench -- reasoning_bank
cargo bench --features bench -- trajectory
# Multi-head attention only
cargo bench --features bench -- multi_head
# Integration tests only
cargo bench --features bench -- integration
cargo bench --features bench -- end_to_end
Custom Iterations
# Run with more iterations for statistical significance
BENCH_ITERATIONS=1000 cargo bench --features bench
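This document does not show how the harness consumes `BENCH_ITERATIONS`. As a rough sketch, assuming it is read from the process environment, a benchmark could parse it as shown here. The default of 100 and both function names are hypothetical.

```rust
use std::env;

/// Fallback when BENCH_ITERATIONS is unset or invalid (hypothetical value).
const DEFAULT_ITERATIONS: u64 = 100;

/// Parses an optional raw value into a positive iteration count.
fn parse_iterations(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_ITERATIONS)
}

/// Reads BENCH_ITERATIONS from the environment, falling back to the default.
fn bench_iterations() -> u64 {
    parse_iterations(env::var("BENCH_ITERATIONS").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback behaviour easy to test without touching process state.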
Interpreting Results
Good Performance Indicators
- ✅ Low latency - Operations complete quickly
- ✅ Low jitter - Consistent performance (low StdDev)
- ✅ Good scaling - Performance degrades predictably
- ✅ High throughput - Many operations per second
Performance Red Flags
- ❌ High P99/P99.9 - Long tail latencies
- ❌ High StdDev - Inconsistent performance
- ❌ Poor scaling - Growth worse than the expected complexity (for example, superlinear where O(n) is expected)
- ❌ Memory growth - Unbounded memory usage
Example Output Interpretation
bench_spike_attention_seq64_dim128:
Mean: 45,230 ns (45.23 µs)
Median: 44,100 ns
StdDev: 2,150 ns
P95: 48,500 ns
P99: 51,200 ns
Throughput: 22,110 ops/sec
Analysis:
- ✅ Mean < 100µs target
- ✅ Low jitter (StdDev ~4.7% of mean)
- ✅ P99 close to mean (good tail latency)
- ✅ Throughput adequate for distributed tasks
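The derived figures in this analysis follow directly from the reported values. A small sketch of the arithmetic is shown here, with hypothetical helper names. Applied to the sample output above, it gives roughly 22,109 ops/sec, a relative jitter of about 4.8%, and a P99 about 1.13x the mean.

```rust
/// Operations per second implied by a mean latency in nanoseconds.
fn throughput_ops_per_sec(mean_ns: f64) -> f64 {
    1e9 / mean_ns
}

/// Relative jitter: standard deviation as a fraction of the mean.
fn relative_jitter(mean_ns: f64, std_dev_ns: f64) -> f64 {
    std_dev_ns / mean_ns
}

/// Tail ratio: how far P99 sits above the mean (1.0 means no tail).
fn tail_ratio(mean_ns: f64, p99_ns: f64) -> f64 {
    p99_ns / mean_ns
}
```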
Energy Efficiency Analysis
Spike-Driven vs Standard Attention
Theoretical Energy Ratio: 87x
Calculation:
Standard Attention Energy:
= 2 * seq_len² * hidden_dim * mult_energy_factor
= 2 * 64² * 128 * 3.7
= 3,879,731 energy units
Spike Attention Energy:
= seq_len * avg_spikes * hidden_dim * add_energy_factor
= 64 * 2.4 * 128 * 1.0
= 19,661 energy units
Ratio = 3,879,731 / 19,661 ≈ 197x (theoretical upper bound)
Achieved = ~87x (accounting for encoding overhead)
Validation:
- Measure actual execution time spike vs standard
- Compare energy consumption if available
- Verify temporal coding overhead is acceptable
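The energy formulas in the calculation above can be evaluated directly. The sketch below uses the same symbols (`seq_len`, `hidden_dim`, `avg_spikes`, and the two energy factors). The function names are illustrative and are not the edge-net API, and the model is the simplified one from this section rather than a hardware measurement.

```rust
/// Standard attention: 2 * seq_len² * hidden_dim * mult_energy_factor.
fn standard_attention_energy(seq_len: f64, hidden_dim: f64, mult_energy_factor: f64) -> f64 {
    2.0 * seq_len * seq_len * hidden_dim * mult_energy_factor
}

/// Spike attention: seq_len * avg_spikes * hidden_dim * add_energy_factor.
fn spike_attention_energy(
    seq_len: f64,
    avg_spikes: f64,
    hidden_dim: f64,
    add_energy_factor: f64,
) -> f64 {
    seq_len * avg_spikes * hidden_dim * add_energy_factor
}

/// Theoretical ratio of standard to spike energy, before encoding overhead.
fn energy_ratio(
    seq_len: f64,
    hidden_dim: f64,
    avg_spikes: f64,
    mult_energy_factor: f64,
    add_energy_factor: f64,
) -> f64 {
    standard_attention_energy(seq_len, hidden_dim, mult_energy_factor)
        / spike_attention_energy(seq_len, avg_spikes, hidden_dim, add_energy_factor)
}
```

Since `hidden_dim` and one factor of `seq_len` cancel, the ratio reduces to `2 * seq_len * mult_energy_factor / (avg_spikes * add_energy_factor)`. Under this model the theoretical advantage grows linearly with sequence length.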
Scaling Characteristics
Expected Complexity
| Component | Expected | Actual | Status |
|---|---|---|---|
| Spike Encoding | O(n*s) | TBD | - |
| Spike Attention | O(n²) | TBD | - |
| RAC Event Ingestion | O(1) | TBD | - |
| RAC Merkle Update | O(n) | TBD | - |
| ReasoningBank Lookup | O(n) | TBD | - |
| Multi-Head Attention | O(n²d) | TBD | - |
Scaling Tests
To verify scaling characteristics:
1. Linear Scaling (O(n))
   - 1x → 10x input should show 10x time
   - Example: 1K → 10K ReasoningBank
2. Quadratic Scaling (O(n²))
   - 1x → 10x input should show 100x time
   - Example: Attention sequence length
3. Logarithmic Scaling (O(log n))
   - 1x → 10x input should add a roughly constant amount of time (about log2(10) ≈ 3.3 extra steps), e.g. ~1.33x for 1K → 10K
   - Example: Indexed lookup (if implemented)
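One way to check these expectations against measurements is to estimate the exponent k in time ∝ n^k from two data points. An exponent near 1 suggests linear scaling, near 2 quadratic, and well below 1 sublinear (for example logarithmic). The sketch below is illustrative only: the 0.25 tolerance is an arbitrary assumption, and the names are hypothetical.

```rust
/// Estimates k in `time ∝ n^k` from two (input size, time) measurements.
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

#[derive(Debug, PartialEq)]
enum Scaling {
    SubLinear,
    Linear,
    Quadratic,
    Other,
}

/// Buckets an estimated exponent using a fixed tolerance of 0.25 (assumed).
fn classify(exponent: f64) -> Scaling {
    if exponent < 0.75 {
        Scaling::SubLinear
    } else if exponent <= 1.25 {
        Scaling::Linear
    } else if (exponent - 2.0).abs() <= 0.25 {
        Scaling::Quadratic
    } else {
        Scaling::Other
    }
}
```

Two points are enough for a quick check. Fitting across three or more input sizes (1K, 10K, 100K) is more robust against measurement noise.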
Performance Targets Summary
| Component | Metric | Target | Rationale |
|---|---|---|---|
| Spike Encoding | Latency | < 1µs/value | Fast enough for real-time |
| Spike Attention | Latency | < 100µs | Enables 10K ops/sec |
| RAC Ingestion | Throughput | > 1K events/sec | Handle distributed load |
| RAC Quarantine | Latency | < 100ns | Fast decision making |
| ReasoningBank 10K | Latency | < 10ms | Acceptable for async ops |
| Multi-Head 8h×128d | Latency | < 50µs | Real-time routing |
| E2E Task Routing | Latency | < 1ms | User-facing threshold |
Continuous Monitoring
Regression Detection
Track benchmarks over time to detect performance regressions:
# Save baseline
cargo bench --features bench > baseline.txt
# After changes, compare
cargo bench --features bench > current.txt
diff baseline.txt current.txt
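A plain `diff` flags every timing change, including normal run-to-run noise. A threshold-based comparison is usually easier to act on. The sketch below assumes mean latencies have already been extracted from the two output files. The function names and the idea of a percentage threshold are illustrative and do not describe the project's `compare_benchmarks.sh`.

```rust
/// Percentage change from baseline to current; positive means slower.
fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// Flags a regression when the slowdown exceeds `threshold_pct`.
fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    percent_change(baseline_ns, current_ns) > threshold_pct
}
```

The threshold should sit above the benchmark's usual jitter (StdDev relative to the mean), otherwise noise alone will trigger false alarms.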
CI/CD Integration
Add to GitHub Actions:
- name: Run Benchmarks
  run: cargo bench --features bench
- name: Compare with baseline
  run: ./benches/compare_benchmarks.sh
Contributing
When adding new features:
- ✅ Add corresponding benchmarks
- ✅ Document expected performance
- ✅ Run benchmarks before submitting PR
- ✅ Include benchmark results in PR description
- ✅ Ensure no regressions in existing benchmarks