EXO-AI 2025 Performance Benchmarks

This directory contains comprehensive Criterion-based benchmarks for the EXO-AI cognitive substrate.

Benchmark Suites

1. Manifold Benchmarks (manifold_bench.rs)

Purpose: Measure geometric manifold operations for concept embedding and retrieval.

Benchmarks:

  • manifold_retrieval: Query performance across different concept counts (100-5000)
  • manifold_deformation: Batch embedding throughput (10-500 concepts)
  • manifold_local_adaptation: Adaptive learning speed
  • manifold_curvature: Geometric computation performance

Expected Baselines (on a modern CPU):

  • Retrieval @ 1000 concepts: < 100μs
  • Deformation batch (100): < 1ms
  • Local adaptation: < 50μs
  • Curvature computation: < 10μs
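
As a point of reference, a parameterized retrieval benchmark in this style could be structured as in the sketch below. The Manifold type and its with_concepts/retrieve methods are placeholder stand-ins for the actual EXO-AI API; only the Criterion scaffolding (a benchmark group swept over several concept counts) is meant literally.

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::hint::black_box;

// Placeholder manifold; the real EXO-AI type and methods will differ.
struct Manifold { concepts: Vec<Vec<f32>> }

impl Manifold {
    fn with_concepts(n: usize) -> Self {
        // Populate the manifold with `n` synthetic concept embeddings.
        Manifold { concepts: (0..n).map(|i| vec![i as f32; 64]).collect() }
    }

    fn retrieve(&self, query: &[f32]) -> usize {
        // Stand-in for nearest-neighbour retrieval over the manifold.
        self.concepts.iter().enumerate()
            .min_by(|(_, a), (_, b)| {
                let da: f32 = a.iter().zip(query).map(|(x, y)| (x - y).powi(2)).sum();
                let db: f32 = b.iter().zip(query).map(|(x, y)| (x - y).powi(2)).sum();
                da.partial_cmp(&db).unwrap()
            })
            .map(|(i, _)| i)
            .unwrap()
    }
}

fn manifold_retrieval(c: &mut Criterion) {
    let mut group = c.benchmark_group("manifold_retrieval");
    for &count in &[100, 500, 1000, 5000] {
        let manifold = Manifold::with_concepts(count);
        let query = vec![0.5_f32; 64];
        group.bench_with_input(BenchmarkId::from_parameter(count), &count, |b, _| {
            b.iter(|| manifold.retrieve(black_box(&query)))
        });
    }
    group.finish();
}

criterion_group!(benches, manifold_retrieval);
criterion_main!(benches);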

2. Hypergraph Benchmarks (hypergraph_bench.rs)

Purpose: Measure higher-order relational reasoning performance.

Benchmarks:

  • hypergraph_edge_creation: Hyperedge creation rate (2-50 nodes per edge)
  • hypergraph_query: Incident edge queries (100-5000 edges)
  • hypergraph_pattern_match: Pattern matching latency
  • hypergraph_subgraph_extraction: Subgraph extraction speed

Expected Baselines:

  • Edge creation (5 nodes): < 5μs
  • Query @ 1000 edges: < 50μs
  • Pattern matching: < 100μs
  • Subgraph extraction (depth 2): < 200μs
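
Creation benchmarks mutate the structure they measure, so a sketch like the following uses iter_batched to hand each iteration a fresh graph and keep setup out of the timing. The Hypergraph type here is a simplified stand-in, not the actual EXO-AI implementation.

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
use std::collections::HashMap;

// Placeholder hypergraph; the real EXO-AI type will differ.
struct Hypergraph { edges: Vec<Vec<u64>>, incidence: HashMap<u64, Vec<usize>> }

impl Hypergraph {
    fn new() -> Self {
        Hypergraph { edges: Vec::new(), incidence: HashMap::new() }
    }

    fn add_edge(&mut self, nodes: &[u64]) {
        // Record the hyperedge and update the node-to-edge incidence index.
        let idx = self.edges.len();
        self.edges.push(nodes.to_vec());
        for &n in nodes {
            self.incidence.entry(n).or_default().push(idx);
        }
    }
}

fn hypergraph_edge_creation(c: &mut Criterion) {
    let nodes: Vec<u64> = (0..5).collect();
    c.bench_function("hypergraph_edge_creation/5_nodes", |b| {
        // A fresh graph per iteration keeps the measurement from drifting as
        // the structure grows; the setup closure is excluded from the timing.
        b.iter_batched(
            Hypergraph::new,
            |mut graph| { graph.add_edge(&nodes); graph },
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, hypergraph_edge_creation);
criterion_main!(benches);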

3. Temporal Benchmarks (temporal_bench.rs)

Purpose: Measure temporal coordination and causal reasoning.

Benchmarks:

  • temporal_causal_query: Causal ancestor queries (100-5000 events)
  • temporal_consolidation: Memory consolidation time (100-1000 events)
  • temporal_range_query: Time range query performance
  • temporal_causal_path: Causal path finding
  • temporal_event_pruning: Old event pruning speed

Expected Baselines:

  • Causal query @ 1000 events: < 100μs
  • Consolidation (500 events): < 5ms
  • Range query: < 200μs
  • Path finding (100 hops): < 500μs
  • Pruning (5000 events): < 2ms
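
For consolidation-style benchmarks it can help to report throughput alongside latency; the sketch below shows how a benchmark group might declare Throughput::Elements so Criterion reports events per second. The consolidate function is a placeholder, not the real consolidation operation.

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use std::hint::black_box;

// Placeholder consolidation step; the real EXO-AI operation will differ.
fn consolidate(events: &[u64]) -> u64 {
    events.iter().copied().sum()
}

fn temporal_consolidation(c: &mut Criterion) {
    let mut group = c.benchmark_group("temporal_consolidation");
    for &count in &[100, 500, 1000] {
        let events: Vec<u64> = (0..count as u64).collect();
        // Report throughput in events/second alongside wall-clock time.
        group.throughput(Throughput::Elements(count as u64));
        group.bench_with_input(BenchmarkId::from_parameter(count), &events, |b, ev| {
            b.iter(|| consolidate(black_box(ev.as_slice())))
        });
    }
    group.finish();
}

criterion_group!(benches, temporal_consolidation);
criterion_main!(benches);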

4. Federation Benchmarks (federation_bench.rs)

Purpose: Measure distributed coordination and consensus.

Benchmarks:

  • federation_crdt_merge: CRDT operation throughput (10-500 ops)
  • federation_consensus: Consensus round latency (3-10 nodes)
  • federation_state_sync: State synchronization time
  • federation_crypto_sign: Cryptographic signing speed
  • federation_crypto_verify: Signature verification speed
  • federation_gossip: Gossip propagation performance (5-50 nodes)

Expected Baselines (async operations):

  • CRDT merge (100 ops): < 5ms
  • Consensus (5 nodes): < 50ms
  • State sync (100 items): < 10ms
  • Sign operation: < 100μs
  • Verify operation: < 150μs
  • Gossip (10 nodes): < 20ms
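
Because these operations are async, the corresponding benchmarks typically drive them through Criterion's async support (the async_tokio feature) on a Tokio runtime, roughly as sketched below. The crdt_merge function is a placeholder for the real federation call.

use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

// Placeholder for an async federation operation; the real API will differ.
async fn crdt_merge(ops: usize) -> usize {
    // Simulate merging `ops` CRDT operations.
    tokio::task::yield_now().await;
    ops
}

fn federation_crdt_merge(c: &mut Criterion) {
    // Requires the `async_tokio` feature on the criterion dependency.
    let rt = Runtime::new().expect("tokio runtime");
    c.bench_function("federation_crdt_merge/100_ops", |b| {
        b.to_async(&rt).iter(|| crdt_merge(100))
    });
}

criterion_group!(benches, federation_crdt_merge);
criterion_main!(benches);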

Running Benchmarks

Run All Benchmarks

cargo bench

Run Specific Suite

cargo bench --bench manifold_bench
cargo bench --bench hypergraph_bench
cargo bench --bench temporal_bench
cargo bench --bench federation_bench

Run Specific Benchmark

cargo bench --bench manifold_bench -- manifold_retrieval
cargo bench --bench temporal_bench -- causal_query

Save and Compare Baselines

HTML reports are generated automatically on every run; saving a named baseline lets later runs report regressions against it:

cargo bench -- --save-baseline initial
cargo bench -- --baseline initial

Benchmark Configuration

Criterion is configured with:

  • HTML reports enabled (in target/criterion/)
  • Statistical significance testing
  • Outlier detection
  • Performance regression detection
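
A configuration of this kind is normally attached through criterion_group! with a customized Criterion instance; the values below are illustrative, not the project's actual settings.

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;
use std::time::Duration;

fn configured() -> Criterion {
    Criterion::default()
        .sample_size(100)                         // samples per benchmark
        .measurement_time(Duration::from_secs(5)) // time spent measuring each benchmark
        .significance_level(0.05)                 // threshold for change detection
        .noise_threshold(0.02)                    // ignore changes smaller than 2%
}

fn example_bench(c: &mut Criterion) {
    c.bench_function("example", |b| b.iter(|| black_box(1 + 1)));
}

criterion_group! {
    name = benches;
    config = configured();
    targets = example_bench
}
criterion_main!(benches);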

Performance Targets

Cognitive Operations (Target: Real-time)

  • Single concept retrieval: < 1ms
  • Hypergraph query: < 100μs
  • Causal inference: < 500μs

Batch Operations (Target: High throughput)

  • Embedding batch (100): < 5ms
  • CRDT merges (100): < 10ms
  • Pattern matching: < 1ms

Distributed Operations (Target: Low latency)

  • Consensus round (5 nodes): < 100ms
  • State synchronization: < 50ms
  • Gossip propagation: < 20ms/hop

Analyzing Results

  1. HTML Reports: Open target/criterion/report/index.html
  2. Statistical Analysis: Review the reported confidence intervals and variance for each measurement
  3. Regression Detection: Compare against baselines
  4. Scaling Analysis: Review performance across different input sizes

Optimization Guidelines

When to Optimize

  • Operations exceeding 2x baseline targets
  • Significant performance regressions
  • Poor scaling characteristics
  • High variance in measurements

Optimization Priorities

  1. Critical Path: Manifold retrieval, hypergraph queries
  2. Throughput: Batch operations, CRDT merges
  3. Latency: Consensus, synchronization
  4. Scalability: Large-scale operations

Continuous Benchmarking

Run benchmarks:

  • Before major commits
  • After performance optimizations
  • During release candidates
  • Weekly baseline updates

Hardware Considerations

Benchmarks are hardware-dependent. For consistent results:

  • Use dedicated benchmark machines
  • Disable CPU frequency scaling
  • Close unnecessary applications
  • Run multiple iterations
  • Use --baseline for comparisons
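
On Linux, for example, frequency scaling can be pinned to the performance governor for the duration of a run (assuming the cpupower utility is available):

sudo cpupower frequency-set --governor performance
cargo bench

Restore your usual governor once the run is finished.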

Contributing

When adding new benchmarks:

  1. Follow existing naming conventions
  2. Include multiple input sizes
  3. Document expected baselines
  4. Add to this README
  5. Verify statistical significance
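
New suites also need a [[bench]] entry in Cargo.toml with the default harness disabled so Criterion can drive them (the name below is illustrative):

[[bench]]
name = "new_feature_bench"
harness = false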

Last Updated: 2025-11-29
Benchmark Suite Version: 0.1.0
Criterion Version: 0.5