ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

2026-02-28 14:39:40 -05:00

Edge-Net Comprehensive Benchmark Analysis

This document provides detailed analysis of the edge-net performance benchmarks, covering spike-driven attention, RAC coherence, learning modules, and integration tests.

Benchmark Categories

1. Spike-Driven Attention Benchmarks

Tests the energy-efficient spike-driven attention mechanism that claims 87x energy savings over standard attention.

Benchmarks:

bench_spike_encoding_small - 64 values encoding
bench_spike_encoding_medium - 256 values encoding
bench_spike_encoding_large - 1024 values encoding
bench_spike_attention_seq16_dim64 - Attention with 16 seq, 64 dim
bench_spike_attention_seq64_dim128 - Attention with 64 seq, 128 dim
bench_spike_attention_seq128_dim256 - Attention with 128 seq, 256 dim
bench_spike_energy_ratio_calculation - Energy ratio computation

Key Metrics:

Encoding throughput (values/sec)
Attention latency vs sequence length
Energy ratio accuracy (target: 87x)
Temporal coding overhead

Expected Performance:

Encoding: < 1µs per value
Attention (64x128): < 100µs
Energy ratio calculation: < 10ns
Scaling: O(n*m) where n=seq_len, m=spike_count

2. RAC Coherence Benchmarks

Tests the adversarial coherence engine for distributed claim verification and conflict resolution.

Benchmarks:

bench_rac_event_ingestion - Single event ingestion
bench_rac_event_ingestion_1k - 1000 events batch ingestion
bench_rac_quarantine_check - Quarantine level lookup
bench_rac_quarantine_set_level - Quarantine level update
bench_rac_merkle_root_update - Merkle root calculation
bench_rac_ruvector_similarity - Semantic similarity computation

Key Metrics:

Event ingestion throughput (events/sec)
Quarantine check latency
Merkle proof generation time
Conflict detection overhead

Expected Performance:

Single event ingestion: < 50µs
1K batch ingestion: < 50ms (equivalent to 20,000 events/sec)
Quarantine check: < 100ns (hash map lookup)
Merkle root: < 1ms for 100 events
RuVector similarity: < 500ns

3. Learning Module Benchmarks

Tests the ReasoningBank pattern storage and trajectory tracking for self-learning.

Benchmarks:

bench_reasoning_bank_lookup_1k - Lookup in 1K patterns
bench_reasoning_bank_lookup_10k - Lookup in 10K patterns
bench_reasoning_bank_lookup_100k - Lookup in 100K patterns (if added)
bench_reasoning_bank_store - Pattern storage
bench_trajectory_recording - Trajectory recording
bench_pattern_similarity_computation - Cosine similarity

Key Metrics:

Lookup latency vs database size
Scaling characteristics (linear, log, constant)
Storage throughput (patterns/sec)
Similarity computation cost

Expected Performance:

1K lookup: < 1ms
10K lookup: < 10ms
100K lookup: < 100ms
Pattern store: < 10µs
Trajectory record: < 5µs
Similarity: < 200ns per comparison

Scaling Analysis:

Target: O(n) for brute-force similarity search
With indexing: O(log n) or better
1K → 10K should be ~10x increase
10K → 100K should be ~10x increase
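
As a rough illustration of why brute-force lookup is expected to scale as O(n), the sketch below computes cosine similarity against every stored pattern and keeps the best match. The function names `cosine_similarity` and `best_match` and the `Vec<f32>` pattern representation are assumptions made for this example, not the ReasoningBank API.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

/// Brute-force search: one similarity computation per stored pattern,
/// so the cost grows linearly with the number of patterns.
/// Returns the index and similarity of the first best-scoring pattern.
fn best_match(query: &[f32], patterns: &[Vec<f32>]) -> Option<(usize, f32)> {
    let mut best: Option<(usize, f32)> = None;
    for (i, pattern) in patterns.iter().enumerate() {
        let sim = cosine_similarity(query, pattern);
        match best {
            Some((_, current)) if current >= sim => {}
            _ => best = Some((i, sim)),
        }
    }
    best
}
```

Because every query touches every pattern, a 10x larger pattern set should cost roughly 10x more time unless an index is introduced.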

4. Multi-Head Attention Benchmarks

Tests the standard multi-head attention for task routing.

Benchmarks:

bench_multi_head_attention_2heads_dim8 - 2 heads, 8 dimensions
bench_multi_head_attention_4heads_dim64 - 4 heads, 64 dimensions
bench_multi_head_attention_8heads_dim128 - 8 heads, 128 dimensions
bench_multi_head_attention_8heads_dim256_10keys - 8 heads, 256 dim, 10 keys

Key Metrics:

Latency vs dimensions
Latency vs number of heads
Latency vs number of keys
Throughput (ops/sec)

Expected Performance:

2h x 8d: < 1µs
4h x 64d: < 10µs
8h x 128d: < 50µs
8h x 256d x 10 keys: < 200µs

Scaling:

O(d²) in dimension size (quadratic due to QKV projections)
O(h) in number of heads (linear parallelization)
O(k) in number of keys (linear attention)

5. Integration Benchmarks

Tests end-to-end performance with combined systems.

Benchmarks:

bench_end_to_end_task_routing_with_learning - Full task lifecycle with learning
bench_combined_learning_coherence_overhead - Learning + RAC overhead
bench_memory_usage_trajectory_1k - Memory footprint for 1K trajectories
bench_concurrent_learning_and_rac_ops - Concurrent operations

Key Metrics:

End-to-end task latency
Combined system overhead
Memory usage over time
Concurrent access performance

Expected Performance:

E2E task routing: < 1ms
Combined overhead: < 500µs for 10 ops each
Memory 1K trajectories: < 1MB
Concurrent ops: < 100µs

Statistical Analysis

For each benchmark, we measure:

Central Tendency

Mean: Average execution time
Median: Middle value (robust to outliers)
Mode: Most common value

Dispersion

Standard Deviation: Measure of spread
Variance: Mean squared deviation from the mean (StdDev²)
Range: Max - Min
IQR: Interquartile range (75th - 25th percentile)

Percentiles

P50 (Median): 50% of samples below this
P90: 90% of samples below this
P95: 95% of samples below this
P99: 99% of samples below this
P99.9: 99.9% of samples below this

Performance Metrics

Throughput: Operations per second
Latency: Time per operation
Jitter: Variation in latency (StdDev)
Efficiency: Actual vs theoretical performance
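
The statistics listed above can be derived from raw latency samples in a few lines. The following is a minimal sketch, assuming samples are collected as nanosecond values in a slice; the `Summary` struct, the nearest-rank percentile method, and the use of population (not sample) standard deviation are choices made for this example and may differ from what the benchmark harness reports.

```rust
/// Summary statistics for latency samples, in nanoseconds.
#[derive(Debug)]
struct Summary {
    mean: f64,
    median: f64,
    std_dev: f64,
    p95: f64,
    p99: f64,
    throughput_ops_per_sec: f64,
}

/// Nearest-rank percentile on an already sorted, non-empty slice (p in 0..=100).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Computes the summary, or None when there are no samples.
fn summarize(samples_ns: &[f64]) -> Option<Summary> {
    if samples_ns.is_empty() {
        return None;
    }
    let mut sorted = samples_ns.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let n = sorted.len() as f64;
    let mean = sorted.iter().sum::<f64>() / n;
    // Population variance: mean squared deviation from the mean.
    let variance = sorted.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;

    let mid = sorted.len() / 2;
    let median = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    };

    Some(Summary {
        mean,
        median,
        std_dev: variance.sqrt(),
        p95: percentile(&sorted, 95.0),
        p99: percentile(&sorted, 99.0),
        // One operation takes `mean` nanoseconds on average.
        throughput_ops_per_sec: 1e9 / mean,
    })
}
```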

Running Benchmarks

Prerequisites

cd /workspaces/ruvector/examples/edge-net

Run All Benchmarks

# Using nightly Rust (required for bench feature)
rustup default nightly
cargo bench --features bench

# Or using the provided script
./benches/run_benchmarks.sh

Run Specific Categories

# Spike-driven attention only
cargo bench --features bench -- spike_

# RAC coherence only
cargo bench --features bench -- rac_

# Learning modules only
cargo bench --features bench -- reasoning_bank
cargo bench --features bench -- trajectory

# Multi-head attention only
cargo bench --features bench -- multi_head

# Integration tests only
cargo bench --features bench -- integration
cargo bench --features bench -- end_to_end

Custom Iterations

# Run with more iterations for statistical significance
BENCH_ITERATIONS=1000 cargo bench --features bench
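
To make the role of the iteration count concrete, here is a small timing-loop sketch that collects one latency sample per iteration. It is not the project's harness: `collect_samples` and `iterations_from_env` are hypothetical helpers, and reading `BENCH_ITERATIONS` this way is only an assumption about how such a variable might be consumed.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Reads the iteration count from the BENCH_ITERATIONS environment variable,
/// falling back to `default` when it is unset or not a valid number.
/// (Assumed behavior for illustration only.)
fn iterations_from_env(default: usize) -> usize {
    std::env::var("BENCH_ITERATIONS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

/// Runs `op` `iterations` times and returns one latency sample per run,
/// in nanoseconds. `black_box` keeps the compiler from optimizing the
/// measured work away.
fn collect_samples<R, F: FnMut() -> R>(iterations: usize, mut op: F) -> Vec<u128> {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        black_box(op());
        samples.push(start.elapsed().as_nanos());
    }
    samples
}
```

More iterations give more samples, which tightens the estimates of the mean and especially of the high percentiles (P99 needs at least 100 samples to be meaningful, P99.9 at least 1000).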

Interpreting Results

Good Performance Indicators

✅ Low latency - Operations complete quickly
✅ Low jitter - Consistent performance (low StdDev)
✅ Good scaling - Performance degrades predictably
✅ High throughput - Many operations per second

Performance Red Flags

❌ High P99/P99.9 - Long tail latencies
❌ High StdDev - Inconsistent performance
❌ Poor scaling - Worse than the expected complexity (e.g., superlinear where O(n) is expected)
❌ Memory growth - Unbounded memory usage

Example Output Interpretation

bench_spike_attention_seq64_dim128:
  Mean: 45,230 ns (45.23 µs)
  Median: 44,100 ns
  StdDev: 2,150 ns
  P95: 48,500 ns
  P99: 51,200 ns
  Throughput: 22,110 ops/sec

Analysis:

✅ Mean < 100µs target
✅ Low jitter (StdDev ~4.7% of mean)
✅ P99 within ~13% of the mean (good tail latency)
✅ Throughput adequate for distributed tasks
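
The figures in this analysis follow directly from the reported mean, StdDev, and P99. The helpers below are a sketch of that arithmetic; their names are made up for this example.

```rust
/// Operations per second implied by a mean latency in nanoseconds.
fn throughput_ops_per_sec(mean_ns: f64) -> f64 {
    1e9 / mean_ns
}

/// Jitter relative to the mean (coefficient of variation), as a percentage.
fn jitter_percent(std_dev_ns: f64, mean_ns: f64) -> f64 {
    100.0 * std_dev_ns / mean_ns
}

/// How far a tail percentile sits above the mean, as a percentage.
fn tail_excess_percent(percentile_ns: f64, mean_ns: f64) -> f64 {
    100.0 * (percentile_ns - mean_ns) / mean_ns
}
```

For the example output, a mean of 45,230 ns implies about 22,109 ops/sec, a StdDev of 2,150 ns is about 4.75% of the mean, and the P99 of 51,200 ns sits about 13.2% above the mean.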

Energy Efficiency Analysis

Spike-Driven vs Standard Attention

Target Energy Ratio: 87x

Calculation:

Standard Attention Energy:
  = 2 * seq_len² * hidden_dim * mult_energy_factor
  = 2 * 64² * 128 * 3.7
  = 3,879,731 energy units

Spike Attention Energy:
  = seq_len * avg_spikes * hidden_dim * add_energy_factor
  = 64 * 2.4 * 128 * 1.0
  = 19,661 energy units

Ratio = 3,879,731 / 19,661 = 197x (theoretical upper bound)
Achieved = ~87x (accounting for encoding overhead)

Validation:

Measure actual execution time spike vs standard
Compare energy consumption if available
Verify temporal coding overhead is acceptable
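
The two energy formulas from the calculation can be evaluated directly, which makes it easy to recheck the ratio for other sequence lengths or spike rates. This is a sketch of the formulas as written in this document; the function names are illustrative and are not taken from the edge-net code. Note that `hidden_dim` cancels in the ratio, which simplifies to 2 * seq_len * mult_energy_factor / (avg_spikes * add_energy_factor).

```rust
/// Energy estimate for standard attention:
/// 2 * seq_len^2 * hidden_dim * mult_energy_factor.
fn standard_attention_energy(seq_len: f64, hidden_dim: f64, mult_energy_factor: f64) -> f64 {
    2.0 * seq_len * seq_len * hidden_dim * mult_energy_factor
}

/// Energy estimate for spike-driven attention:
/// seq_len * avg_spikes * hidden_dim * add_energy_factor.
fn spike_attention_energy(
    seq_len: f64,
    avg_spikes: f64,
    hidden_dim: f64,
    add_energy_factor: f64,
) -> f64 {
    seq_len * avg_spikes * hidden_dim * add_energy_factor
}

/// Ratio of standard to spike-driven energy for the same shape.
fn energy_ratio(
    seq_len: f64,
    hidden_dim: f64,
    avg_spikes: f64,
    mult_energy_factor: f64,
    add_energy_factor: f64,
) -> f64 {
    standard_attention_energy(seq_len, hidden_dim, mult_energy_factor)
        / spike_attention_energy(seq_len, avg_spikes, hidden_dim, add_energy_factor)
}
```

With seq_len = 64, hidden_dim = 128, avg_spikes = 2.4, and factors 3.7 and 1.0, `energy_ratio` evaluates to roughly 197.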

Scaling Characteristics

Expected Complexity

Component	Expected	Actual	Status
Spike Encoding	O(n*s)	TBD	-
Spike Attention	O(n²)	TBD	-
RAC Event Ingestion	O(1)	TBD	-
RAC Merkle Update	O(n)	TBD	-
ReasoningBank Lookup	O(n)	TBD	-
Multi-Head Attention	O(n²d)	TBD	-

Scaling Tests

To verify scaling characteristics:

Linear Scaling (O(n))
- 1x → 10x input should show 10x time
- Example: 1K → 10K ReasoningBank
Quadratic Scaling (O(n²))
- 1x → 10x input should show 100x time
- Example: Attention sequence length
Logarithmic Scaling (O(log n))
- 1x → 10x input should add a roughly constant increment (log2(10) ≈ 3.3 extra steps), e.g. ~1.33x time for 1K → 10K
- Example: Indexed lookup (if implemented)
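
One way to turn two measurements into a scaling verdict is to estimate the exponent k in time ∝ n^k. The helper below is a sketch under that power-law assumption; `scaling_exponent` is a name introduced here for illustration.

```rust
/// Estimated exponent k in `time ∝ n^k`, from two (input size, time) pairs.
/// k ≈ 1 suggests linear scaling, k ≈ 2 quadratic, and k close to 0
/// suggests logarithmic or constant behavior.
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}
```

For example, a 10x input that takes 10x longer gives k = 1, one that takes 100x longer gives k = 2, and a lookup that goes from 3 to 4 time units over the same 10x step gives k ≈ 0.12. Two points are noisy, so repeating the estimate across several input sizes gives a more trustworthy picture.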

Performance Targets Summary

Component	Metric	Target	Rationale
Spike Encoding	Latency	< 1µs/value	Fast enough for real-time
Spike Attention	Latency	< 100µs	Enables 10K ops/sec
RAC Ingestion	Throughput	> 1K events/sec	Handle distributed load
RAC Quarantine	Latency	< 100ns	Fast decision making
ReasoningBank 10K	Latency	< 10ms	Acceptable for async ops
Multi-Head 8h×128d	Latency	< 50µs	Real-time routing
E2E Task Routing	Latency	< 1ms	User-facing threshold

Continuous Monitoring

Regression Detection

Track benchmarks over time to detect performance regressions:

# Save baseline
cargo bench --features bench > baseline.txt

# After changes, compare
cargo bench --features bench > current.txt
diff baseline.txt current.txt
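
A plain `diff` reports every change in timing, including run-to-run noise, so a tolerance-based comparison is often more useful. The following is a minimal sketch, assuming the baseline and current mean latencies have already been extracted as numbers; `percent_change`, `is_regression`, and the tolerance value are illustrative choices, not part of the provided scripts.

```rust
/// Percentage change of `current_ns` relative to `baseline_ns`.
/// Positive values mean the current run is slower.
fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    100.0 * (current_ns - baseline_ns) / baseline_ns
}

/// Flags a regression only when the slowdown exceeds the given tolerance,
/// so that ordinary measurement noise does not fail the check.
fn is_regression(baseline_ns: f64, current_ns: f64, tolerance_percent: f64) -> bool {
    percent_change(baseline_ns, current_ns) > tolerance_percent
}
```

With a 10% tolerance, moving from 45,230 ns to 46,000 ns would pass, while moving to 51,200 ns would be flagged.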

CI/CD Integration

Add to GitHub Actions:

- name: Run Benchmarks
  run: cargo bench --features bench
- name: Compare with baseline
  run: ./benches/compare_benchmarks.sh

Contributing

When adding new features:

✅ Add corresponding benchmarks
✅ Document expected performance
✅ Run benchmarks before submitting PR
✅ Include benchmark results in PR description
✅ Ensure no regressions in existing benchmarks
