# RuVector Graph Database Benchmarks

A comprehensive benchmark suite for RuVector's graph database implementation, comparing its performance against a Neo4j baseline.

## Overview

This benchmark suite validates RuVector's performance claims:

- **10x+ faster** than Neo4j for graph traversals
- **100x+ faster** for simple node/edge lookups
- **Sub-linear scaling** with graph size

## Components

### 1. Rust Benchmarks (`graph_bench.rs`)

Located in `/home/user/ruvector/crates/ruvector-graph/benches/graph_bench.rs`

**Benchmark Categories:**

#### Node Operations
- `node_insertion_single` - Single node insertion (1, 10, 100, 1000 nodes)
- `node_insertion_batch` - Batch insertion (100, 1K, 10K nodes)
- `node_insertion_bulk` - Bulk insertion optimized path (10K, 100K, 1M nodes)

#### Edge Operations
- `edge_creation` - Edge creation benchmarks (100, 1K, 10K edges)

#### Query Operations
- `query_node_lookup` - Simple ID-based node lookup (100K nodes)
- `query_1hop_traversal` - 1-hop neighbor traversal (fan-out: 1, 10, 100)
- `query_2hop_traversal` - 2-hop BFS traversal
- `query_path_finding` - Shortest path algorithms
- `query_aggregation` - Aggregation queries (count, avg, etc.)

#### Concurrency
- `concurrent_operations` - Concurrent read/write (2, 4, 8, 16 threads)

#### Memory
- `memory_usage` - Memory tracking (10K, 100K, 1M nodes)

**Run Rust Benchmarks:**

```bash
cd /home/user/ruvector/crates/ruvector-graph
cargo bench --bench graph_bench

# Run specific benchmark
cargo bench --bench graph_bench -- node_insertion

# Save baseline
cargo bench --bench graph_bench -- --save-baseline my-baseline
```

### 2. TypeScript Test Scenarios (`graph-scenarios.ts`)

Defines high-level benchmark scenarios:

- **Social Network** (1M users, 10M friendships)
  - Friend recommendations
  - Mutual friends
  - Influencer detection

- **Knowledge Graph** (100K entities, 1M relationships)
  - Multi-hop reasoning
  - Path finding
  - Pattern matching

- **Temporal Graph** (500K events)
  - Time-range queries
  - State transitions
  - Event aggregation

- **Recommendation Engine**
  - Collaborative filtering
  - Item recommendations
  - Trending items

- **Fraud Detection**
  - Circular transfer detection
  - Network analysis
  - Risk scoring

### 3. Data Generator (`graph-data-generator.ts`)

Uses `@ruvector/agentic-synth` to generate realistic synthetic graph data.

**Features:**
- AI-powered realistic data generation
- Multiple graph topologies
- Scale-free networks (preferential attachment)
- Temporal event sequences

**Generate Datasets:**

```bash
cd /home/user/ruvector/benchmarks
npm run graph:generate
```

**Datasets Generated:**
- `social_network` - 1M nodes, 10M edges
- `knowledge_graph` - 100K entities, 1M relationships
- `temporal_events` - 500K events with transitions

### 4. Comparison Runner (`comparison-runner.ts`)

Runs the benchmarks on both RuVector and Neo4j and compares the results.

**Run Comparisons:**

```bash
# All scenarios
npm run graph:compare

# Specific scenario
npm run graph:compare:social
npm run graph:compare:knowledge
npm run graph:compare:temporal
```

**Comparison Metrics:**
- Execution time (ms)
- Throughput (ops/sec)
- Memory usage (MB)
- Latency percentiles (p50, p95, p99)
- Speedup calculation
- Pass/fail verdict

### 5. Results Reporter (`results-report.ts`)

Generates comprehensive HTML and Markdown reports.

**Generate Reports:**

```bash
npm run graph:report
```

**Output:**
- `benchmark-report.html` - Interactive HTML dashboard with charts
- `benchmark-report.md` - Markdown summary
- `benchmark-data.json` - Raw JSON data

## Quick Start

### 1. Generate Test Data

```bash
cd /home/user/ruvector/benchmarks
npm run graph:generate
```

### 2. Run Rust Benchmarks

```bash
npm run graph:bench
```

### 3. Run Comparison Tests

```bash
npm run graph:compare
```

### 4. Generate Report

```bash
npm run graph:report
```

### 5. View Results

```bash
npm run dashboard
# Open http://localhost:8000/results/graph/benchmark-report.html
```

## Complete Workflow

Run all benchmarks end-to-end:

```bash
npm run graph:all
```

This will:
1. Generate synthetic datasets using agentic-synth
2. Run Rust criterion benchmarks
3. Compare with Neo4j baseline
4. Generate HTML/Markdown reports

## Performance Targets

### ✅ Target: 10x Faster Traversals
- 1-hop traversal: >10x speedup
- 2-hop traversal: >10x speedup
- Multi-hop reasoning: >10x speedup

### ✅ Target: 100x Faster Lookups
- Node by ID: >100x speedup
- Edge lookup: >100x speedup
- Property access: >100x speedup

### ✅ Target: Sub-linear Scaling
- Performance remains consistent as graph grows
- Memory usage scales efficiently
- Query time independent of total graph size

## Dataset Specifications

### Social Network

```typescript
{
  nodes: 1_000_000,
  edges: 10_000_000,
  labels: ['Person', 'Post', 'Comment', 'Group'],
  avgDegree: 10,
  topology: 'scale-free' // Preferential attachment
}
```

### Knowledge Graph

```typescript
{
  nodes: 100_000,
  edges: 1_000_000,
  labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
  avgDegree: 10,
  topology: 'semantic-network'
}
```

### Temporal Events

```typescript
{
  nodes: 500_000,
  edges: 2_000_000,
  labels: ['Event', 'State', 'Entity'],
  timeRange: '365 days',
  topology: 'temporal-causal'
}
```

## Agentic-Synth Integration

The benchmark suite uses `@ruvector/agentic-synth` for intelligent synthetic data generation:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp'
});

// Generate realistic user profiles
const users = await synth.generateStructured({
  type: 'json',
  count: 10000,
  schema: {
    name: 'string',
    age: 'number',
    location: 'string',
    interests: 'array'
  },
  prompt: 'Generate diverse social media user profiles...'
});
```

## Results Directory Structure

```
benchmarks/
├── data/
│   └── graph/
│       ├── social_network_nodes.json
│       ├── social_network_edges.json
│       ├── knowledge_graph_nodes.json
│       └── temporal_events_nodes.json
├── results/
│   └── graph/
│       ├── social_network_comparison.json
│       ├── benchmark-report.html
│       ├── benchmark-report.md
│       └── benchmark-data.json
└── graph/
    ├── graph-scenarios.ts
    ├── graph-data-generator.ts
    ├── comparison-runner.ts
    └── results-report.ts
```

## CI/CD Integration

Add to GitHub Actions:

```yaml
- name: Run Graph Benchmarks
  run: |
    cd benchmarks
    npm install
    npm run graph:all

- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: graph-benchmarks
    path: benchmarks/results/graph/
```

## Troubleshooting

### Neo4j Not Available

If Neo4j is not installed, the comparison runner falls back to baseline metrics from previous runs or to estimates.

### Memory Issues

For large datasets (>1M nodes), increase the Node.js heap size:

```bash
NODE_OPTIONS="--max-old-space-size=8192" npm run graph:generate
```

### Criterion Baseline

Reset benchmark baselines:

```bash
cd crates/ruvector-graph
cargo bench --bench graph_bench -- --save-baseline new-baseline
```

## Contributing

When adding new benchmarks:
1. Add Rust benchmark to `graph_bench.rs`
2. Create corresponding TypeScript scenario
3. Update data generator if needed
4. Document expected performance targets
5. Update this README

## License

MIT - See LICENSE file
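
## Appendix: Metric Calculation Sketch

The comparison metrics listed for the comparison runner (latency percentiles, speedup, pass/fail verdict) can be illustrated with a minimal sketch. The helper names `percentile` and `verdict`, the nearest-rank percentile method, the `>=` pass rule, and the sample latencies are all assumptions made for illustration; they are not taken from `comparison-runner.ts`, which may compute these values differently.

```typescript
// Hypothetical helpers for illustration only; not the comparison runner's code.

/** Nearest-rank percentile of a latency sample, for p in (0, 100]. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface Verdict {
  speedup: number; // baseline time / candidate time, so larger is better
  target: number;  // required speedup, e.g. 10 for traversals
  passed: boolean;
}

/** Compare a candidate timing against a baseline timing and a target speedup. */
function verdict(baselineMs: number, candidateMs: number, target: number): Verdict {
  if (candidateMs <= 0) throw new Error("candidate time must be positive");
  const speedup = baselineMs / candidateMs;
  // Assumption: meeting the target exactly counts as a pass.
  return { speedup, target, passed: speedup >= target };
}

// Made-up latencies in milliseconds, purely to exercise the helpers.
const neo4jMs = [12, 15, 11, 40, 13, 14, 12, 90, 13, 12];
const ruvectorMs = [1.0, 1.2, 0.9, 3.5, 1.1, 1.0, 1.1, 8.0, 1.0, 1.1];

for (const p of [50, 95, 99]) {
  const v = verdict(percentile(neo4jMs, p), percentile(ruvectorMs, p), 10);
  console.log(`p${p}: ${v.speedup.toFixed(1)}x, ${v.passed ? "PASS" : "FAIL"}`);
}
```

Comparing percentiles rather than means keeps a few slow outliers from hiding or inflating the reported speedup, which matters when the target is a fixed multiple such as 10x or 100x.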