# EXO-AI 2025 Performance Benchmarks This directory contains comprehensive criterion-based benchmarks for the EXO-AI cognitive substrate. ## Benchmark Suites ### 1. Manifold Benchmarks (`manifold_bench.rs`) **Purpose**: Measure geometric manifold operations for concept embedding and retrieval. **Benchmarks**: - `manifold_retrieval`: Query performance across different concept counts (100-5000) - `manifold_deformation`: Batch embedding throughput (10-500 concepts) - `manifold_local_adaptation`: Adaptive learning speed - `manifold_curvature`: Geometric computation performance **Expected Baselines** (on modern CPU): - Retrieval @ 1000 concepts: < 100μs - Deformation batch (100): < 1ms - Local adaptation: < 50μs - Curvature computation: < 10μs ### 2. Hypergraph Benchmarks (`hypergraph_bench.rs`) **Purpose**: Measure higher-order relational reasoning performance. **Benchmarks**: - `hypergraph_edge_creation`: Hyperedge creation rate (2-50 nodes per edge) - `hypergraph_query`: Incident edge queries (100-5000 edges) - `hypergraph_pattern_match`: Pattern matching latency - `hypergraph_subgraph_extraction`: Subgraph extraction speed **Expected Baselines**: - Edge creation (5 nodes): < 5μs - Query @ 1000 edges: < 50μs - Pattern matching: < 100μs - Subgraph extraction (depth 2): < 200μs ### 3. Temporal Benchmarks (`temporal_bench.rs`) **Purpose**: Measure temporal coordination and causal reasoning. **Benchmarks**: - `temporal_causal_query`: Causal ancestor queries (100-5000 events) - `temporal_consolidation`: Memory consolidation time (100-1000 events) - `temporal_range_query`: Time range query performance - `temporal_causal_path`: Causal path finding - `temporal_event_pruning`: Old event pruning speed **Expected Baselines**: - Causal query @ 1000 events: < 100μs - Consolidation (500 events): < 5ms - Range query: < 200μs - Path finding (100 hops): < 500μs - Pruning (5000 events): < 2ms ### 4. Federation Benchmarks (`federation_bench.rs`) **Purpose**: Measure distributed coordination and consensus. **Benchmarks**: - `federation_crdt_merge`: CRDT operation throughput (10-500 ops) - `federation_consensus`: Consensus round latency (3-10 nodes) - `federation_state_sync`: State synchronization time - `federation_crypto_sign`: Cryptographic signing speed - `federation_crypto_verify`: Signature verification speed - `federation_gossip`: Gossip propagation performance (5-50 nodes) **Expected Baselines** (async operations): - CRDT merge (100 ops): < 5ms - Consensus (5 nodes): < 50ms - State sync (100 items): < 10ms - Sign operation: < 100μs - Verify operation: < 150μs - Gossip (10 nodes): < 20ms ## Running Benchmarks ### Run All Benchmarks ```bash cargo bench ``` ### Run Specific Suite ```bash cargo bench --bench manifold_bench cargo bench --bench hypergraph_bench cargo bench --bench temporal_bench cargo bench --bench federation_bench ``` ### Run Specific Benchmark ```bash cargo bench --bench manifold_bench -- manifold_retrieval cargo bench --bench temporal_bench -- causal_query ``` ### Generate Detailed Reports ```bash cargo bench -- --save-baseline initial cargo bench -- --baseline initial ``` ## Benchmark Configuration Criterion is configured with: - HTML reports enabled (in `target/criterion/`) - Statistical significance testing - Outlier detection - Performance regression detection ## Performance Targets ### Cognitive Operations (Target: Real-time) - Single concept retrieval: < 1ms - Hypergraph query: < 100μs - Causal inference: < 500μs ### Batch Operations (Target: High throughput) - Embedding batch (100): < 5ms - CRDT merges (100): < 10ms - Pattern matching: < 1ms ### Distributed Operations (Target: Low latency) - Consensus round (5 nodes): < 100ms - State synchronization: < 50ms - Gossip propagation: < 20ms/hop ## Analyzing Results 1. **HTML Reports**: Open `target/criterion/report/index.html` 2. **Statistical Analysis**: Check for confidence intervals 3. **Regression Detection**: Compare against baselines 4. **Scaling Analysis**: Review performance across different input sizes ## Optimization Guidelines ### When to Optimize - Operations exceeding 2x baseline targets - Significant performance regressions - Poor scaling characteristics - High variance in measurements ### Optimization Priorities 1. **Critical Path**: Manifold retrieval, hypergraph queries 2. **Throughput**: Batch operations, CRDT merges 3. **Latency**: Consensus, synchronization 4. **Scalability**: Large-scale operations ## Continuous Benchmarking Run benchmarks: - Before major commits - After performance optimizations - During release candidates - Weekly baseline updates ## Hardware Considerations Benchmarks are hardware-dependent. For consistent results: - Use dedicated benchmark machines - Disable CPU frequency scaling - Close unnecessary applications - Run multiple iterations - Use `--baseline` for comparisons ## Contributing When adding new benchmarks: 1. Follow existing naming conventions 2. Include multiple input sizes 3. Document expected baselines 4. Add to this README 5. Verify statistical significance --- **Last Updated**: 2025-11-29 **Benchmark Suite Version**: 0.1.0 **Criterion Version**: 0.5