# Edge-Net Comprehensive Benchmark Analysis

This document provides a detailed analysis of the edge-net performance benchmarks, covering spike-driven attention, RAC coherence, learning modules, and integration tests.

## Benchmark Categories

### 1. Spike-Driven Attention Benchmarks

Tests the energy-efficient spike-driven attention mechanism that claims 87x energy savings over standard attention.

**Benchmarks:**

- `bench_spike_encoding_small` - 64 values encoding
- `bench_spike_encoding_medium` - 256 values encoding
- `bench_spike_encoding_large` - 1024 values encoding
- `bench_spike_attention_seq16_dim64` - Attention with 16 seq, 64 dim
- `bench_spike_attention_seq64_dim128` - Attention with 64 seq, 128 dim
- `bench_spike_attention_seq128_dim256` - Attention with 128 seq, 256 dim
- `bench_spike_energy_ratio_calculation` - Energy ratio computation

**Key Metrics:**

- Encoding throughput (values/sec)
- Attention latency vs sequence length
- Energy ratio accuracy (target: 87x)
- Temporal coding overhead

**Expected Performance:**

- Encoding: < 1µs per value
- Attention (64×128): < 100µs
- Energy ratio calculation: < 10ns
- Scaling: O(n*m) where n = seq_len, m = spike_count

### 2. RAC Coherence Benchmarks

Tests the adversarial coherence engine for distributed claim verification and conflict resolution.

**Benchmarks:**

- `bench_rac_event_ingestion` - Single event ingestion
- `bench_rac_event_ingestion_1k` - 1000 events batch ingestion
- `bench_rac_quarantine_check` - Quarantine level lookup
- `bench_rac_quarantine_set_level` - Quarantine level update
- `bench_rac_merkle_root_update` - Merkle root calculation
- `bench_rac_ruvector_similarity` - Semantic similarity computation

**Key Metrics:**

- Event ingestion throughput (events/sec)
- Quarantine check latency
- Merkle proof generation time
- Conflict detection overhead

**Expected Performance:**

- Single event ingestion: < 50µs
- 1K batch ingestion: < 50ms (1000 events/sec)
- Quarantine check: < 100ns (hash map lookup)
- Merkle root: < 1ms for 100 events
- RuVector similarity: < 500ns

### 3. Learning Module Benchmarks

Tests the ReasoningBank pattern storage and trajectory tracking for self-learning.

**Benchmarks:**

- `bench_reasoning_bank_lookup_1k` - Lookup in 1K patterns
- `bench_reasoning_bank_lookup_10k` - Lookup in 10K patterns
- `bench_reasoning_bank_lookup_100k` - Lookup in 100K patterns (if added)
- `bench_reasoning_bank_store` - Pattern storage
- `bench_trajectory_recording` - Trajectory recording
- `bench_pattern_similarity_computation` - Cosine similarity

**Key Metrics:**

- Lookup latency vs database size
- Scaling characteristics (linear, log, constant)
- Storage throughput (patterns/sec)
- Similarity computation cost

**Expected Performance:**

- 1K lookup: < 1ms
- 10K lookup: < 10ms
- 100K lookup: < 100ms
- Pattern store: < 10µs
- Trajectory record: < 5µs
- Similarity: < 200ns per comparison

**Scaling Analysis:**

- Target: O(n) for brute-force similarity search (see the sketch below)
- With indexing: O(log n) or better
- 1K → 10K should be ~10x increase
- 10K → 100K should be ~10x increase
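To make the O(n) target concrete, here is a minimal sketch of a brute-force cosine-similarity lookup. It is illustrative only: the `Pattern` struct, its `embedding` field, and the `best_match` helper are assumptions for this sketch, not the actual ReasoningBank API.

```rust
/// Assumed pattern type for illustration (not the real ReasoningBank struct).
struct Pattern {
    embedding: Vec<f32>, // hypothetical embedding field
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// O(n) scan over the bank: one similarity computation per stored pattern,
/// which is why a 10x larger bank should cost ~10x more lookup time.
fn best_match(bank: &[Pattern], query: &[f32]) -> Option<(usize, f32)> {
    bank.iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(&p.embedding, query)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
}

fn main() {
    let bank = vec![
        Pattern { embedding: vec![1.0, 0.0] },
        Pattern { embedding: vec![0.6, 0.8] },
    ];
    println!("{:?}", best_match(&bank, &[0.0, 1.0])); // -> Some((1, 0.8))
}
```

An approximate-nearest-neighbor index would replace the linear scan and move lookup toward O(log n), matching the "with indexing" row above.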
### 4. Multi-Head Attention Benchmarks

Tests the standard multi-head attention used for task routing.

**Benchmarks:**

- `bench_multi_head_attention_2heads_dim8` - 2 heads, 8 dimensions
- `bench_multi_head_attention_4heads_dim64` - 4 heads, 64 dimensions
- `bench_multi_head_attention_8heads_dim128` - 8 heads, 128 dimensions
- `bench_multi_head_attention_8heads_dim256_10keys` - 8 heads, 256 dim, 10 keys

**Key Metrics:**

- Latency vs dimensions
- Latency vs number of heads
- Latency vs number of keys
- Throughput (ops/sec)

**Expected Performance:**

- 2h × 8d: < 1µs
- 4h × 64d: < 10µs
- 8h × 128d: < 50µs
- 8h × 256d × 10 keys: < 200µs

**Scaling:**

- O(d²) in dimension size (quadratic due to QKV projections)
- O(h) in number of heads (linear parallelization)
- O(k) in number of keys (linear attention)

### 5. Integration Benchmarks

Tests end-to-end performance with combined systems.

**Benchmarks:**

- `bench_end_to_end_task_routing_with_learning` - Full task lifecycle with learning
- `bench_combined_learning_coherence_overhead` - Learning + RAC overhead
- `bench_memory_usage_trajectory_1k` - Memory footprint for 1K trajectories
- `bench_concurrent_learning_and_rac_ops` - Concurrent operations

**Key Metrics:**

- End-to-end task latency
- Combined system overhead
- Memory usage over time
- Concurrent access performance

**Expected Performance:**

- E2E task routing: < 1ms
- Combined overhead: < 500µs for 10 ops each
- Memory for 1K trajectories: < 1MB
- Concurrent ops: < 100µs

## Statistical Analysis

For each benchmark, we measure:

### Central Tendency

- **Mean**: Average execution time
- **Median**: Middle value (robust to outliers)
- **Mode**: Most common value

### Dispersion

- **Standard Deviation**: Measure of spread
- **Variance**: Squared deviation
- **Range**: Max - Min
- **IQR**: Interquartile range (75th - 25th percentile)

### Percentiles

- **P50 (Median)**: 50% of samples below this
- **P90**: 90% of samples below this
- **P95**: 95% of samples below this
- **P99**: 99% of samples below this
- **P99.9**: 99.9% of samples below this

### Performance Metrics

- **Throughput**: Operations per second
- **Latency**: Time per operation
- **Jitter**: Variation in latency (StdDev)
- **Efficiency**: Actual vs theoretical performance

## Running Benchmarks

### Prerequisites

```bash
cd /workspaces/ruvector/examples/edge-net
```

### Run All Benchmarks

```bash
# Using nightly Rust (required for bench feature)
rustup default nightly
cargo bench --features bench

# Or using the provided script
./benches/run_benchmarks.sh
```

### Run Specific Categories

```bash
# Spike-driven attention only
cargo bench --features bench -- spike_

# RAC coherence only
cargo bench --features bench -- rac_

# Learning modules only
cargo bench --features bench -- reasoning_bank
cargo bench --features bench -- trajectory

# Multi-head attention only
cargo bench --features bench -- multi_head

# Integration tests only
cargo bench --features bench -- integration
cargo bench --features bench -- end_to_end
```

### Custom Iterations

```bash
# Run with more iterations for statistical significance
BENCH_ITERATIONS=1000 cargo bench --features bench
```

## Interpreting Results

### Good Performance Indicators

- ✅ **Low latency** - Operations complete quickly
- ✅ **Low jitter** - Consistent performance (low StdDev)
- ✅ **Good scaling** - Performance degrades predictably
- ✅ **High throughput** - Many operations per second

### Performance Red Flags

- ❌ **High P99/P99.9** - Long tail latencies
- ❌ **High StdDev** - Inconsistent performance
- ❌ **Poor scaling** - Worse than the expected complexity (e.g., O(n²) where O(n) is expected)
- ❌ **Memory growth** - Unbounded memory usage
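To check these indicators on your own runs, the following generic sketch derives the summary statistics from the Statistical Analysis section (mean, jitter as StdDev, nearest-rank percentiles) from raw per-iteration timings. It is a standalone illustration, not the edge-net harness code, and the sample values are made up for the example.

```rust
/// Summarize raw per-iteration timings (in nanoseconds):
/// returns (mean, std_dev, p50, p99).
fn summarize(mut samples: Vec<f64>) -> (f64, f64, f64, f64) {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Jitter: standard deviation of the samples.
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std_dev = var.sqrt();
    // Nearest-rank percentile: value below which p% of samples fall.
    let pct = |p: f64| samples[(p / 100.0 * (samples.len() - 1) as f64).round() as usize];
    (mean, std_dev, pct(50.0), pct(99.0))
}

fn main() {
    // Illustrative timings only.
    let samples = vec![44_100.0, 45_230.0, 43_900.0, 51_200.0, 44_800.0];
    let (mean, std_dev, p50, p99) = summarize(samples);
    println!("mean={mean:.0}ns stddev={std_dev:.0}ns p50={p50:.0}ns p99={p99:.0}ns");
    println!("throughput={:.0} ops/sec", 1e9 / mean);
}
```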
### Example Output Interpretation

```
bench_spike_attention_seq64_dim128:
  Mean:       45,230 ns (45.23 µs)
  Median:     44,100 ns
  StdDev:      2,150 ns
  P95:        48,500 ns
  P99:        51,200 ns
  Throughput: 22,110 ops/sec
```

**Analysis:**

- ✅ Mean < 100µs target
- ✅ Low jitter (StdDev ~4.7% of mean)
- ✅ P99 close to mean (good tail latency)
- ✅ Throughput adequate for distributed tasks

## Energy Efficiency Analysis

### Spike-Driven vs Standard Attention

**Theoretical Energy Ratio:** 87x

**Calculation:**

```
Standard Attention Energy:
  = 2 * seq_len² * hidden_dim * mult_energy_factor
  = 2 * 64² * 128 * 3.7
  = 3,879,731 energy units

Spike Attention Energy:
  = seq_len * avg_spikes * hidden_dim * add_energy_factor
  = 64 * 2.4 * 128 * 1.0
  = 19,661 energy units

Ratio    = 3,879,731 / 19,661 ≈ 197x (theoretical upper bound)
Achieved = ~87x (accounting for encoding overhead)
```

**Validation:**

- Measure actual execution time, spike vs standard
- Compare energy consumption if available
- Verify temporal coding overhead is acceptable

## Scaling Characteristics

### Expected Complexity

| Component | Expected | Actual | Status |
|-----------|----------|--------|--------|
| Spike Encoding | O(n*s) | TBD | - |
| Spike Attention | O(n²) | TBD | - |
| RAC Event Ingestion | O(1) | TBD | - |
| RAC Merkle Update | O(n) | TBD | - |
| ReasoningBank Lookup | O(n) | TBD | - |
| Multi-Head Attention | O(n²d) | TBD | - |

### Scaling Tests

To verify scaling characteristics:

1. **Linear Scaling (O(n))**
   - 1x → 10x input should show ~10x time
   - Example: 1K → 10K ReasoningBank
2. **Quadratic Scaling (O(n²))**
   - 1x → 10x input should show ~100x time
   - Example: attention sequence length
3. **Logarithmic Scaling (O(log n))**
   - 1x → 10x input should add only a constant increment (log₂ 10 ≈ 3.3 extra steps), so the time ratio stays small
   - Example: indexed lookup (if implemented)

## Performance Targets Summary

| Component | Metric | Target | Rationale |
|-----------|--------|--------|-----------|
| Spike Encoding | Latency | < 1µs/value | Fast enough for real-time |
| Spike Attention | Latency | < 100µs | Enables 10K ops/sec |
| RAC Ingestion | Throughput | > 1K events/sec | Handle distributed load |
| RAC Quarantine | Latency | < 100ns | Fast decision making |
| ReasoningBank 10K | Latency | < 10ms | Acceptable for async ops |
| Multi-Head 8h×128d | Latency | < 50µs | Real-time routing |
| E2E Task Routing | Latency | < 1ms | User-facing threshold |

## Continuous Monitoring

### Regression Detection

Track benchmarks over time to detect performance regressions:

```bash
# Save baseline
cargo bench --features bench > baseline.txt

# After changes, compare
cargo bench --features bench > current.txt
diff baseline.txt current.txt
```

### CI/CD Integration

Add to GitHub Actions:

```yaml
- name: Run Benchmarks
  run: cargo bench --features bench
- name: Compare with baseline
  run: ./benches/compare_benchmarks.sh
```

## Contributing

When adding new features:

1. ✅ Add corresponding benchmarks (a minimal template sketch follows the References below)
2. ✅ Document expected performance
3. ✅ Run benchmarks before submitting PR
4. ✅ Include benchmark results in PR description
5. ✅ Ensure no regressions in existing benchmarks

## References

- [Criterion.rs](https://github.com/bheisler/criterion.rs) - Rust benchmarking
- [Statistical Analysis](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing)
- [Performance Testing Best Practices](https://github.com/rust-lang/rust/blob/master/src/doc/rustc-dev-guide/src/tests/perf.md)
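As a starting point for Contributing step 1, here is a minimal template for a new benchmark using the nightly `test` crate, gated behind the `bench` feature as in the commands above. The function `my_new_op` and its input setup are placeholders for illustration, not real edge-net APIs.

```rust
// Minimal template for a new benchmark file. `my_new_op` is a placeholder
// standing in for the feature under test.
#![cfg(feature = "bench")]
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

fn my_new_op(input: &[f32]) -> f32 {
    input.iter().sum() // placeholder work
}

#[bench]
fn bench_my_new_op_1k(b: &mut Bencher) {
    let input: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    b.iter(|| {
        // black_box keeps the optimizer from eliding the measured work.
        black_box(my_new_op(black_box(&input)))
    });
}
```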