# RuVector Graph Database Benchmarks

Comprehensive benchmark suite for RuVector's graph database implementation, comparing its performance against a Neo4j baseline.

## Overview

This benchmark suite validates RuVector's performance claims:

- **10x+ faster** than Neo4j for graph traversals
- **100x+ faster** for simple node/edge lookups
- **Sub-linear scaling** with graph size

## Components

### 1. Rust Benchmarks (`graph_bench.rs`)

Located in `/home/user/ruvector/crates/ruvector-graph/benches/graph_bench.rs`

**Benchmark Categories:**

#### Node Operations
- `node_insertion_single` - Single node insertion (1, 10, 100, 1000 nodes)
- `node_insertion_batch` - Batch insertion (100, 1K, 10K nodes)
- `node_insertion_bulk` - Bulk insertion optimized path (10K, 100K, 1M nodes)

#### Edge Operations
- `edge_creation` - Edge creation benchmarks (100, 1K, 10K edges)

#### Query Operations
- `query_node_lookup` - Simple ID-based node lookup (100K nodes)
- `query_1hop_traversal` - 1-hop neighbor traversal (fan-out: 1, 10, 100)
- `query_2hop_traversal` - 2-hop BFS traversal
- `query_path_finding` - Shortest path algorithms
- `query_aggregation` - Aggregation queries (count, avg, etc.)
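
To make the traversal benchmarks concrete, the sketch below shows the kind of work a 1-hop or 2-hop BFS traversal performs. It is an illustration only: the `Adjacency` type and `neighborsWithinHops` function are hypothetical names, written in TypeScript over an in-memory map, and are not RuVector's Rust API or the code in `graph_bench.rs`.

```typescript
// Adjacency list keyed by node ID; a stand-in for a real graph store (assumption).
type Adjacency = Map<number, number[]>;

// Return every node reachable within `maxHops` edges of `start`
// (excluding `start` itself), expanding one BFS level per hop.
function neighborsWithinHops(graph: Adjacency, start: number, maxHops: number): Set<number> {
  const visited = new Set<number>([start]);
  let frontier: number[] = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: number[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return visited;
}

// Tiny example graph: 0 -> 1, 2; 1 -> 3; 2 -> 3, 4; 4 -> 5
const exampleGraph: Adjacency = new Map<number, number[]>([
  [0, [1, 2]],
  [1, [3]],
  [2, [3, 4]],
  [4, [5]],
]);

const oneHop = neighborsWithinHops(exampleGraph, 0, 1); // nodes 1 and 2
const twoHop = neighborsWithinHops(exampleGraph, 0, 2); // nodes 1, 2, 3 and 4
console.log(Array.from(oneHop), Array.from(twoHop));
```

The fan-out parameter of `query_1hop_traversal` corresponds to the length of each neighbor array here; the cost of a bounded-hop traversal depends on the size of the visited neighborhood rather than on the total number of nodes.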

#### Concurrency
- `concurrent_operations` - Concurrent read/write (2, 4, 8, 16 threads)

#### Memory
- `memory_usage` - Memory tracking (10K, 100K, 1M nodes)

**Run Rust Benchmarks:**
```bash
cd /home/user/ruvector/crates/ruvector-graph
cargo bench --bench graph_bench

# Run specific benchmark
cargo bench --bench graph_bench -- node_insertion

# Save baseline
cargo bench --bench graph_bench -- --save-baseline my-baseline
```

### 2. TypeScript Test Scenarios (`graph-scenarios.ts`)

Defines high-level benchmark scenarios:

- **Social Network** (1M users, 10M friendships)
  - Friend recommendations
  - Mutual friends
  - Influencer detection

- **Knowledge Graph** (100K entities, 1M relationships)
  - Multi-hop reasoning
  - Path finding
  - Pattern matching

- **Temporal Graph** (500K events)
  - Time-range queries
  - State transitions
  - Event aggregation

- **Recommendation Engine**
  - Collaborative filtering
  - Item recommendations
  - Trending items

- **Fraud Detection**
  - Circular transfer detection
  - Network analysis
  - Risk scoring
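
As an illustration of what one of these scenarios involves, circular transfer detection in the Fraud Detection scenario can be framed as finding a directed cycle that returns to a starting account. The sketch below is one possible approach (a depth-limited DFS), not the query defined in `graph-scenarios.ts`; `TransferGraph` and `findCircularTransfer` are hypothetical names.

```typescript
// Directed transfer graph: account ID -> accounts it sent money to (assumed shape).
type TransferGraph = Map<string, string[]>;

// Depth-limited DFS. Returns the first cycle found through `start` as a list of
// accounts that begins and ends with `start`, or null if no cycle of at most
// `maxLength` transfers exists.
function findCircularTransfer(graph: TransferGraph, start: string, maxLength: number): string[] | null {
  const path: string[] = [start];
  const onPath = new Set<string>([start]);

  const dfs = (current: string): boolean => {
    for (const next of graph.get(current) ?? []) {
      if (next === start) {
        path.push(start); // closing edge back to the starting account
        return true;
      }
      // Skip accounts already on the path and paths that would exceed the limit.
      if (onPath.has(next) || path.length >= maxLength) continue;
      path.push(next);
      onPath.add(next);
      if (dfs(next)) return true;
      path.pop();
      onPath.delete(next);
    }
    return false;
  };

  return dfs(start) ? path : null;
}

const transfers: TransferGraph = new Map<string, string[]>([
  ["A", ["B"]],
  ["B", ["C", "D"]],
  ["C", ["A"]],
  ["D", []],
]);

const cycle = findCircularTransfer(transfers, "A", 4); // A -> B -> C -> A
const noCycle = findCircularTransfer(transfers, "D", 4); // null: D sends nothing
console.log(cycle, noCycle);
```

Bounding the cycle length keeps the search local to an account's neighborhood, which is why this kind of query is a reasonable traversal benchmark on large graphs.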

### 3. Data Generator (`graph-data-generator.ts`)

Uses `@ruvector/agentic-synth` to generate realistic synthetic graph data.

**Features:**
- AI-powered realistic data generation
- Multiple graph topologies
- Scale-free networks (preferential attachment)
- Temporal event sequences
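
For readers unfamiliar with preferential attachment, the following is a minimal sketch of the idea behind a scale-free topology: each new node links to existing nodes with probability proportional to their current degree. It assumes a Barabási–Albert style process and a small seeded PRNG for reproducibility; `generateScaleFreeGraph` and `mulberry32` are illustrative names and do not describe how `graph-data-generator.ts` is actually implemented.

```typescript
// Small seeded PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Edge = [number, number];

// Grow a graph where each new node attaches to `edgesPerNode` distinct
// existing nodes, chosen with probability proportional to their degree.
function generateScaleFreeGraph(nodeCount: number, edgesPerNode: number, seed = 42): Edge[] {
  if (edgesPerNode < 1 || nodeCount <= edgesPerNode) {
    throw new Error("need nodeCount > edgesPerNode >= 1");
  }
  const random = mulberry32(seed);
  const edges: Edge[] = [];
  // Each node appears here once per incident edge, so a uniform draw
  // selects a node with probability proportional to its degree.
  const endpoints: number[] = [];

  // Seed graph: a fully connected core of (edgesPerNode + 1) nodes.
  for (let i = 0; i <= edgesPerNode; i++) {
    for (let j = i + 1; j <= edgesPerNode; j++) {
      edges.push([i, j]);
      endpoints.push(i, j);
    }
  }

  for (let node = edgesPerNode + 1; node < nodeCount; node++) {
    const targets = new Set<number>();
    while (targets.size < edgesPerNode) {
      targets.add(endpoints[Math.floor(random() * endpoints.length)]);
    }
    targets.forEach((target) => {
      edges.push([node, target]);
      endpoints.push(node, target);
    });
  }
  return edges;
}

const scaleFreeEdges = generateScaleFreeGraph(1000, 3);
console.log(scaleFreeEdges.length); // 6 core edges + 996 * 3 = 2994
```

The `endpoints` array trades memory for simplicity: sampling is O(1) per draw, at the cost of storing two entries per edge.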

**Generate Datasets:**
```bash
cd /home/user/ruvector/benchmarks
npm run graph:generate
```

**Datasets Generated:**
- `social_network` - 1M nodes, 10M edges
- `knowledge_graph` - 100K entities, 1M relationships
- `temporal_events` - 500K events with transitions

### 4. Comparison Runner (`comparison-runner.ts`)

Runs the benchmarks on both RuVector and Neo4j and compares the results.

**Run Comparisons:**
```bash
# All scenarios
npm run graph:compare

# Specific scenario
npm run graph:compare:social
npm run graph:compare:knowledge
npm run graph:compare:temporal
```

**Comparison Metrics:**
- Execution time (ms)
- Throughput (ops/sec)
- Memory usage (MB)
- Latency percentiles (p50, p95, p99)
- Speedup calculation
- Pass/fail verdict
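
The metrics above can be derived from raw per-operation latencies. The sketch below shows one plausible way to compute them; it is not the code in `comparison-runner.ts`. It assumes nearest-rank percentiles, throughput computed as if operations ran sequentially, and speedup defined as the ratio of mean latencies. All names (`summarize`, `compareRuns`, `LatencySummary`) and the sample values are hypothetical.

```typescript
interface LatencySummary {
  meanMs: number;
  p50: number;
  p95: number;
  p99: number;
  throughputOpsPerSec: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedMs: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.min(sortedMs.length - 1, Math.max(0, rank - 1))];
}

function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const totalMs = sorted.reduce((sum, ms) => sum + ms, 0);
  return {
    meanMs: totalMs / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    // Assumes operations ran one after another.
    throughputOpsPerSec: (sorted.length / totalMs) * 1000,
  };
}

// Speedup is baseline mean latency divided by candidate mean latency;
// the run passes when it exceeds the target (for example 10 for traversals).
function compareRuns(baselineMs: number[], candidateMs: number[], targetSpeedup: number) {
  const baseline = summarize(baselineMs);
  const candidate = summarize(candidateMs);
  const speedup = baseline.meanMs / candidate.meanMs;
  return { baseline, candidate, speedup, passed: speedup > targetSpeedup };
}

// Made-up sample latencies for illustration only, not measured results.
const comparison = compareRuns([100, 120, 110, 130], [8, 10, 9, 11], 10);
console.log(comparison.speedup.toFixed(1), comparison.passed);
```

Reporting percentiles alongside the mean matters because a speedup in mean latency can hide a slow tail.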

### 5. Results Reporter (`results-report.ts`)

Generates comprehensive HTML and Markdown reports.

**Generate Reports:**
```bash
npm run graph:report
```

**Output:**
- `benchmark-report.html` - Interactive HTML dashboard with charts
- `benchmark-report.md` - Markdown summary
- `benchmark-data.json` - Raw JSON data

## Quick Start

### 1. Generate Test Data
```bash
cd /home/user/ruvector/benchmarks
npm run graph:generate
```

### 2. Run Rust Benchmarks
```bash
npm run graph:bench
```

### 3. Run Comparison Tests
```bash
npm run graph:compare
```

### 4. Generate Report
```bash
npm run graph:report
```

### 5. View Results
```bash
npm run dashboard
# Open http://localhost:8000/results/graph/benchmark-report.html
```

## Complete Workflow

Run all benchmarks end-to-end:
```bash
npm run graph:all
```

This will:
1. Generate synthetic datasets using agentic-synth
2. Run the Rust Criterion benchmarks
3. Compare the results with the Neo4j baseline
4. Generate HTML/Markdown reports

## Performance Targets

### ✅ Target: 10x Faster Traversals
- 1-hop traversal: >10x speedup
- 2-hop traversal: >10x speedup
- Multi-hop reasoning: >10x speedup

### ✅ Target: 100x Faster Lookups
- Node by ID: >100x speedup
- Edge lookup: >100x speedup
- Property access: >100x speedup

### ✅ Target: Sub-linear Scaling
- Performance remains consistent as the graph grows
- Memory usage scales efficiently
- Query time independent of total graph size
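
One way to turn the sub-linear scaling target into a checkable number is to fit query time against graph size on a log-log scale: if time grows roughly as `c * n^k`, the fitted slope `k` is the scaling exponent, and `k < 1` indicates sub-linear growth. The sketch below is an assumption about how such a check could be written, not part of the suite. `scalingExponent` is a hypothetical helper, and the sample values are generated from a formula, not measured.

```typescript
interface ScalingSample {
  nodes: number;  // graph size
  meanMs: number; // mean query time at that size
}

// Least-squares slope of log(meanMs) against log(nodes).
function scalingExponent(samples: ScalingSample[]): number {
  if (samples.length < 2) throw new Error("need at least two graph sizes");
  const xs = samples.map((s) => Math.log(s.nodes));
  const ys = samples.map((s) => Math.log(s.meanMs));
  const meanX = xs.reduce((a, b) => a + b, 0) / xs.length;
  const meanY = ys.reduce((a, b) => a + b, 0) / ys.length;
  let covariance = 0;
  let varianceX = 0;
  for (let i = 0; i < xs.length; i++) {
    covariance += (xs[i] - meanX) * (ys[i] - meanY);
    varianceX += (xs[i] - meanX) * (xs[i] - meanX);
  }
  if (varianceX === 0) throw new Error("graph sizes must differ");
  return covariance / varianceX;
}

// Synthetic timings that follow t = sqrt(n) / 100 exactly (not measurements),
// at the graph sizes used by the memory benchmark.
const sizes = [10_000, 100_000, 1_000_000];
const syntheticSamples = sizes.map((n) => ({ nodes: n, meanMs: Math.sqrt(n) / 100 }));

const exponent = scalingExponent(syntheticSamples);
console.log(exponent.toFixed(3), exponent < 1 ? "sub-linear" : "linear or worse");
```

An exponent near 0 would correspond to query time that is independent of graph size; an exponent near 1 would mean time grows in proportion to the graph.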

## Dataset Specifications

### Social Network
```typescript
{
  nodes: 1_000_000,
  edges: 10_000_000,
  labels: ['Person', 'Post', 'Comment', 'Group'],
  avgDegree: 10,
  topology: 'scale-free' // Preferential attachment
}
```

### Knowledge Graph
```typescript
{
  nodes: 100_000,
  edges: 1_000_000,
  labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
  avgDegree: 10,
  topology: 'semantic-network'
}
```

### Temporal Events
```typescript
{
  nodes: 500_000,
  edges: 2_000_000,
  labels: ['Event', 'State', 'Entity'],
  timeRange: '365 days',
  topology: 'temporal-causal'
}
```
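
The three specifications above share a common shape, which the sketch below captures as a type together with a small consistency check. The `GraphDatasetSpec` interface and `validateSpec` function are hypothetical and not taken from the suite. The check assumes that `avgDegree` means edges per node (out-degree in a directed graph), which matches the social network and knowledge graph figures.

```typescript
// Shape shared by the three dataset specifications (hypothetical type).
interface GraphDatasetSpec {
  nodes: number;
  edges: number;
  labels: string[];
  topology: string;
  avgDegree?: number; // given for social_network and knowledge_graph only
  timeRange?: string; // given for temporal_events only
}

const datasetSpecs: Record<string, GraphDatasetSpec> = {
  social_network: {
    nodes: 1_000_000,
    edges: 10_000_000,
    labels: ['Person', 'Post', 'Comment', 'Group'],
    avgDegree: 10,
    topology: 'scale-free',
  },
  knowledge_graph: {
    nodes: 100_000,
    edges: 1_000_000,
    labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
    avgDegree: 10,
    topology: 'semantic-network',
  },
  temporal_events: {
    nodes: 500_000,
    edges: 2_000_000,
    labels: ['Event', 'State', 'Entity'],
    timeRange: '365 days',
    topology: 'temporal-causal',
  },
};

// Assumption: average degree is counted as edges per node.
function edgesPerNode(spec: GraphDatasetSpec): number {
  return spec.edges / spec.nodes;
}

// Returns a list of problems; an empty list means the spec is self-consistent.
function validateSpec(name: string, spec: GraphDatasetSpec): string[] {
  const problems: string[] = [];
  if (spec.nodes <= 0 || spec.edges < 0) {
    problems.push(`${name}: node count must be positive and edge count non-negative`);
  }
  if (spec.labels.length === 0) {
    problems.push(`${name}: at least one label is required`);
  }
  if (spec.avgDegree !== undefined && Math.abs(edgesPerNode(spec) - spec.avgDegree) > 1e-9) {
    problems.push(`${name}: avgDegree ${spec.avgDegree} does not match edges/nodes`);
  }
  return problems;
}

for (const name of Object.keys(datasetSpecs)) {
  console.log(name, edgesPerNode(datasetSpecs[name]), validateSpec(name, datasetSpecs[name]));
}
```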

## Agentic-Synth Integration

The benchmark suite uses `@ruvector/agentic-synth` for intelligent synthetic data generation:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp'
});

// Generate realistic user profiles
const users = await synth.generateStructured({
  type: 'json',
  count: 10000,
  schema: {
    name: 'string',
    age: 'number',
    location: 'string',
    interests: 'array<string>'
  },
  prompt: 'Generate diverse social media user profiles...'
});
```

## Results Directory Structure

```
benchmarks/
├── data/
│   └── graph/
│       ├── social_network_nodes.json
│       ├── social_network_edges.json
│       ├── knowledge_graph_nodes.json
│       └── temporal_events_nodes.json
├── results/
│   └── graph/
│       ├── social_network_comparison.json
│       ├── benchmark-report.html
│       ├── benchmark-report.md
│       └── benchmark-data.json
└── graph/
    ├── graph-scenarios.ts
    ├── graph-data-generator.ts
    ├── comparison-runner.ts
    └── results-report.ts
```

## CI/CD Integration

Add these steps to a GitHub Actions workflow:
```yaml
- name: Run Graph Benchmarks
  run: |
    cd benchmarks
    npm install
    npm run graph:all

- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: graph-benchmarks
    path: benchmarks/results/graph/
```

## Troubleshooting

### Neo4j Not Available
If Neo4j is not installed, the comparison runner falls back to baseline metrics from previous runs or to estimates.

### Memory Issues
For large datasets (>1M nodes), increase the Node.js heap size:
```bash
NODE_OPTIONS="--max-old-space-size=8192" npm run graph:generate
```

### Criterion Baseline
To reset the benchmark baseline, save a new one:
```bash
cd crates/ruvector-graph
cargo bench --bench graph_bench -- --save-baseline new-baseline
```

## Contributing

When adding new benchmarks:
1. Add Rust benchmark to `graph_bench.rs`
2. Create corresponding TypeScript scenario
3. Update data generator if needed
4. Document expected performance targets
5. Update this README

## License

MIT - See LICENSE file