EXO-AI 2025 Performance Benchmarks
This directory contains comprehensive Criterion-based benchmarks for the EXO-AI cognitive substrate.
Benchmark Suites
1. Manifold Benchmarks (manifold_bench.rs)
Purpose: Measure geometric manifold operations for concept embedding and retrieval.
Benchmarks:
- manifold_retrieval: Query performance across different concept counts (100-5000)
- manifold_deformation: Batch embedding throughput (10-500 concepts)
- manifold_local_adaptation: Adaptive learning speed
- manifold_curvature: Geometric computation performance
Expected Baselines (on a modern CPU):
- Retrieval @ 1000 concepts: < 100μs
- Deformation batch (100): < 1ms
- Local adaptation: < 50μs
- Curvature computation: < 10μs
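A retrieval benchmark with these input sizes can be structured as below. Only the Criterion scaffolding (`benchmark_group`, `Throughput`, `bench_with_input`) is meant literally; the `nearest` function is a hypothetical brute-force stand-in for the real manifold retrieval call.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Stand-in workload: brute-force nearest neighbour over n flat embeddings.
// Swap in the actual EXO-AI manifold query; the scaffolding is the point.
fn nearest(points: &[[f32; 3]], q: [f32; 3]) -> usize {
    points
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| {
            let da: f32 = a.iter().zip(&q).map(|(x, y)| (x - y).powi(2)).sum();
            let db: f32 = b.iter().zip(&q).map(|(x, y)| (x - y).powi(2)).sum();
            da.partial_cmp(&db).unwrap()
        })
        .map(|(i, _)| i)
        .unwrap()
}

fn manifold_retrieval(c: &mut Criterion) {
    let mut group = c.benchmark_group("manifold_retrieval");
    for &n in &[100usize, 500, 1000, 5000] {
        // Build the n-concept store once, outside the measured closure.
        let points: Vec<[f32; 3]> = (0..n).map(|i| [i as f32, 0.0, 0.0]).collect();
        group.throughput(Throughput::Elements(n as u64));
        group.bench_with_input(BenchmarkId::from_parameter(n), &points, |b, pts| {
            b.iter(|| nearest(pts, [0.5, 0.0, 0.0]))
        });
    }
    group.finish();
}

criterion_group!(benches, manifold_retrieval);
criterion_main!(benches);
```

Reporting `Throughput::Elements` makes the HTML reports show elements/second alongside latency, which is useful when comparing the 100- and 5000-concept cases.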
2. Hypergraph Benchmarks (hypergraph_bench.rs)
Purpose: Measure higher-order relational reasoning performance.
Benchmarks:
- hypergraph_edge_creation: Hyperedge creation rate (2-50 nodes per edge)
- hypergraph_query: Incident edge queries (100-5000 edges)
- hypergraph_pattern_match: Pattern matching latency
- hypergraph_subgraph_extraction: Subgraph extraction speed
Expected Baselines:
- Edge creation (5 nodes): < 5μs
- Query @ 1000 edges: < 50μs
- Pattern matching: < 100μs
- Subgraph extraction (depth 2): < 200μs
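The incident-edge query that hypergraph_query measures amounts to a per-node index lookup. A minimal sketch, assuming hypothetical `NodeId`/`EdgeId` types (the real EXO-AI types may differ):

```rust
use std::collections::HashMap;

// Hypothetical identifiers; the real hypergraph types may differ.
type NodeId = u64;
type EdgeId = u64;

/// Minimal incidence index: maps each node to the hyperedges containing it.
#[derive(Default)]
struct IncidenceIndex {
    incident: HashMap<NodeId, Vec<EdgeId>>,
}

impl IncidenceIndex {
    /// Register a hyperedge over an arbitrary node set (2-50 in the benchmarks).
    fn add_edge(&mut self, edge: EdgeId, nodes: &[NodeId]) {
        for &n in nodes {
            self.incident.entry(n).or_default().push(edge);
        }
    }

    /// The O(1) hash lookup that hypergraph_query exercises at 100-5000 edges.
    fn incident_edges(&self, node: NodeId) -> &[EdgeId] {
        self.incident.get(&node).map_or(&[], |v| v.as_slice())
    }
}
```

With this shape, query latency is dominated by the hash lookup and stays flat as the edge count grows, which is the scaling behaviour to look for in the reports.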
3. Temporal Benchmarks (temporal_bench.rs)
Purpose: Measure temporal coordination and causal reasoning.
Benchmarks:
- temporal_causal_query: Causal ancestor queries (100-5000 events)
- temporal_consolidation: Memory consolidation time (100-1000 events)
- temporal_range_query: Time range query performance
- temporal_causal_path: Causal path finding
- temporal_event_pruning: Old event pruning speed
Expected Baselines:
- Causal query @ 1000 events: < 100μs
- Consolidation (500 events): < 5ms
- Range query: < 200μs
- Path finding (100 hops): < 500μs
- Pruning (5000 events): < 2ms
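The causal ancestor query is essentially a graph traversal over parent links. A minimal sketch of the access pattern, assuming a hypothetical `CausalGraph` where each event lists its direct causes (the real EXO-AI event store will differ):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type EventId = u64;

/// Hypothetical causal graph: each event maps to its direct causes (parents).
struct CausalGraph {
    parents: HashMap<EventId, Vec<EventId>>,
}

impl CausalGraph {
    /// Collect every causal ancestor of `event` via breadth-first traversal,
    /// the access pattern temporal_causal_query measures at 100-5000 events.
    fn causal_ancestors(&self, event: EventId) -> HashSet<EventId> {
        let mut seen = HashSet::new();
        let mut queue = VecDeque::from([event]);
        while let Some(e) = queue.pop_front() {
            for &p in self.parents.get(&e).into_iter().flatten() {
                // Only enqueue each ancestor once, so cost is O(edges reached).
                if seen.insert(p) {
                    queue.push_back(p);
                }
            }
        }
        seen
    }
}
```

Since cost is proportional to the reachable edge set rather than total event count, query time should scale with causal-history depth, not store size.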
4. Federation Benchmarks (federation_bench.rs)
Purpose: Measure distributed coordination and consensus.
Benchmarks:
- federation_crdt_merge: CRDT operation throughput (10-500 ops)
- federation_consensus: Consensus round latency (3-10 nodes)
- federation_state_sync: State synchronization time
- federation_crypto_sign: Cryptographic signing speed
- federation_crypto_verify: Signature verification speed
- federation_gossip: Gossip propagation performance (5-50 nodes)
Expected Baselines (async operations):
- CRDT merge (100 ops): < 5ms
- Consensus (5 nodes): < 50ms
- State sync (100 items): < 10ms
- Sign operation: < 100μs
- Verify operation: < 150μs
- Gossip (10 nodes): < 20ms
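The merges measured by federation_crdt_merge are order-independent by construction. As an illustration of why they can be applied in any order, here is a grow-only counter, one of the simplest state-based CRDTs; it is a generic sketch, not the federation layer's actual CRDT type:

```rust
use std::collections::HashMap;

// Hypothetical replica identifier; the federation layer's real IDs may differ.
type ReplicaId = u32;

/// Grow-only counter CRDT: merge takes the per-replica maximum, so merges
/// commute and replicas converge regardless of delivery order.
#[derive(Clone, Default)]
struct GCounter {
    counts: HashMap<ReplicaId, u64>,
}

impl GCounter {
    /// Local increment, attributed to this replica.
    fn increment(&mut self, replica: ReplicaId) {
        *self.counts.entry(replica).or_insert(0) += 1;
    }

    /// The commutative, idempotent merge that the benchmark applies 10-500 times.
    fn merge(&mut self, other: &GCounter) {
        for (&r, &c) in &other.counts {
            let entry = self.counts.entry(r).or_insert(0);
            *entry = (*entry).max(c);
        }
    }

    /// Total count across all replicas.
    fn value(&self) -> u64 {
        self.counts.values().sum()
    }
}
```

Because merge is idempotent and commutative, the benchmark's throughput number is meaningful no matter how the 10-500 operations are batched or ordered.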
Running Benchmarks
Run All Benchmarks
cargo bench
Run Specific Suite
cargo bench --bench manifold_bench
cargo bench --bench hypergraph_bench
cargo bench --bench temporal_bench
cargo bench --bench federation_bench
Run Specific Benchmark
cargo bench --bench manifold_bench -- manifold_retrieval
cargo bench --bench temporal_bench -- causal_query
Generate Detailed Reports
cargo bench -- --save-baseline initial
cargo bench -- --baseline initial
Benchmark Configuration
Criterion is configured with:
- HTML reports enabled (in target/criterion/)
- Statistical significance testing
- Outlier detection
- Performance regression detection
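This configuration can be made explicit when wiring up a criterion_group. The builder methods below are real Criterion API; the specific values and the `placeholder` target are illustrative, not mandated by the suite:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

/// Explicit harness settings; values here are illustrative defaults.
fn configured() -> Criterion {
    Criterion::default()
        .sample_size(100)                         // samples per benchmark
        .measurement_time(Duration::from_secs(5)) // wall time per measurement
        .significance_level(0.05)                 // regression-test threshold
        .noise_threshold(0.02)                    // ignore <2% changes
}

// Illustrative benchmark target; replace with the suite's real functions.
fn placeholder(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| 1 + 1));
}

criterion_group! {
    name = benches;
    config = configured();
    targets = placeholder
}
criterion_main!(benches);
```

HTML report generation and outlier detection are enabled by default in Criterion 0.5 (with the default `html_reports` feature), so they need no explicit configuration here.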
Performance Targets
Cognitive Operations (Target: Real-time)
- Single concept retrieval: < 1ms
- Hypergraph query: < 100μs
- Causal inference: < 500μs
Batch Operations (Target: High throughput)
- Embedding batch (100): < 5ms
- CRDT merges (100): < 10ms
- Pattern matching: < 1ms
Distributed Operations (Target: Low latency)
- Consensus round (5 nodes): < 100ms
- State synchronization: < 50ms
- Gossip propagation: < 20ms/hop
Analyzing Results
- HTML Reports: Open target/criterion/report/index.html
- Statistical Analysis: Check the reported confidence intervals
- Regression Detection: Compare against baselines
- Scaling Analysis: Review performance across different input sizes
Optimization Guidelines
When to Optimize
- Operations exceeding 2x baseline targets
- Significant performance regressions
- Poor scaling characteristics
- High variance in measurements
Optimization Priorities
- Critical Path: Manifold retrieval, hypergraph queries
- Throughput: Batch operations, CRDT merges
- Latency: Consensus, synchronization
- Scalability: Large-scale operations
Continuous Benchmarking
Run benchmarks:
- Before major commits
- After performance optimizations
- During release candidates
- Weekly baseline updates
Hardware Considerations
Benchmarks are hardware-dependent. For consistent results:
- Use dedicated benchmark machines
- Disable CPU frequency scaling
- Close unnecessary applications
- Run multiple iterations
- Use --baseline for comparisons
Contributing
When adding new benchmarks:
- Follow existing naming conventions
- Include multiple input sizes
- Document expected baselines
- Add to this README
- Verify statistical significance
Last Updated: 2025-11-29
Benchmark Suite Version: 0.1.0
Criterion Version: 0.5