RuVector Graph Database Benchmarks

Comprehensive benchmark suite for RuVector's graph database implementation, comparing its performance against a Neo4j baseline.

Overview

This benchmark suite is designed to test RuVector's performance claims:

  • 10x+ faster than Neo4j for graph traversals
  • 100x+ faster for simple node/edge lookups
  • Sub-linear scaling with graph size

Components

1. Rust Benchmarks (graph_bench.rs)

Located in /home/user/ruvector/crates/ruvector-graph/benches/graph_bench.rs

Benchmark Categories:

Node Operations

  • node_insertion_single - Single node insertion (1, 10, 100, 1000 nodes)
  • node_insertion_batch - Batch insertion (100, 1K, 10K nodes)
  • node_insertion_bulk - Bulk insertion optimized path (10K, 100K, 1M nodes)

Edge Operations

  • edge_creation - Edge creation benchmarks (100, 1K, 10K edges)

Query Operations

  • query_node_lookup - Simple ID-based node lookup (100K nodes)
  • query_1hop_traversal - 1-hop neighbor traversal (fan-out: 1, 10, 100)
  • query_2hop_traversal - 2-hop BFS traversal
  • query_path_finding - Shortest path algorithms
  • query_aggregation - Aggregation queries (count, avg, etc.)
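
To make the traversal benchmarks concrete, the following is a minimal sketch of a bounded breadth-first traversal of the kind that query_1hop_traversal and query_2hop_traversal measure. It assumes a plain in-memory adjacency list with numeric node IDs; the names `AdjacencyList` and `neighborsWithinHops` are hypothetical and are not part of RuVector's API.

```typescript
// Adjacency list: adjacency[i] holds the IDs of the nodes that node i points to.
// (Hypothetical in-memory representation, used for illustration only.)
type AdjacencyList = number[][];

// Returns every node reachable from `start` within `maxHops` edges,
// excluding `start` itself, in breadth-first order.
function neighborsWithinHops(
  adjacency: AdjacencyList,
  start: number,
  maxHops: number
): number[] {
  const visited: boolean[] = [];
  for (let i = 0; i < adjacency.length; i++) visited.push(false);
  visited[start] = true;

  let frontier: number[] = [start];
  const reached: number[] = [];

  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: number[] = [];
    for (const node of frontier) {
      const neighbors = adjacency[node] || [];
      for (const neighbor of neighbors) {
        if (!visited[neighbor]) {
          visited[neighbor] = true;
          next.push(neighbor);
          reached.push(neighbor);
        }
      }
    }
    frontier = next; // the next hop expands only newly discovered nodes
  }
  return reached;
}

// Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 4
const exampleGraph: AdjacencyList = [[1, 2], [3], [3], [4], []];
const oneHop = neighborsWithinHops(exampleGraph, 0, 1); // [1, 2]
const twoHop = neighborsWithinHops(exampleGraph, 0, 2); // [1, 2, 3]
console.log(oneHop, twoHop);
```

The cost of this traversal depends on the size of the neighborhood it touches rather than on the total node count, which is why the benchmark varies fan-out (1, 10, 100).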

Concurrency

  • concurrent_operations - Concurrent read/write (2, 4, 8, 16 threads)

Memory

  • memory_usage - Memory tracking (10K, 100K, 1M nodes)

Run Rust Benchmarks:

cd /home/user/ruvector/crates/ruvector-graph
cargo bench --bench graph_bench

# Run specific benchmark
cargo bench --bench graph_bench -- node_insertion

# Save baseline
cargo bench --bench graph_bench -- --save-baseline my-baseline

2. TypeScript Test Scenarios (graph-scenarios.ts)

Defines high-level benchmark scenarios:

  • Social Network (1M users, 10M friendships)
    • Friend recommendations
    • Mutual friends
    • Influencer detection
  • Knowledge Graph (100K entities, 1M relationships)
    • Multi-hop reasoning
    • Path finding
    • Pattern matching
  • Temporal Graph (500K events)
    • Time-range queries
    • State transitions
    • Event aggregation
  • Recommendation Engine
    • Collaborative filtering
    • Item recommendations
    • Trending items
  • Fraud Detection
    • Circular transfer detection
    • Network analysis
    • Risk scoring
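
As an illustration of what one of these scenarios computes, here is a rough sketch of circular transfer detection from the Fraud Detection scenario, written as a depth-first search for a cycle in a directed transfer graph. The `Transfer` shape and the `findTransferCycle` function are assumptions made for this example; the actual scenario code in graph-scenarios.ts may work quite differently.

```typescript
// A transfer moves money from one account to another.
// Accounts are numbered 0..accountCount-1 (hypothetical schema).
interface Transfer {
  from: number;
  to: number;
}

// Returns one cycle of accounts (for example [0, 1, 2] for 0 -> 1 -> 2 -> 0),
// or null when the transfer graph contains no cycle.
// Recursive for brevity; a very deep graph would need an explicit stack.
function findTransferCycle(accountCount: number, transfers: Transfer[]): number[] | null {
  const outgoing: number[][] = [];
  // 0 = unvisited, 1 = on the current DFS path, 2 = fully explored
  const state: number[] = [];
  for (let i = 0; i < accountCount; i++) {
    outgoing.push([]);
    state.push(0);
  }
  for (const t of transfers) outgoing[t.from].push(t.to);

  const path: number[] = [];

  function visit(account: number): number[] | null {
    state[account] = 1;
    path.push(account);
    for (const next of outgoing[account]) {
      if (state[next] === 1) {
        // Back edge: the cycle is the part of the path starting at `next`.
        return path.slice(path.indexOf(next));
      }
      if (state[next] === 0) {
        const cycle = visit(next);
        if (cycle !== null) return cycle;
      }
    }
    path.pop();
    state[account] = 2;
    return null;
  }

  for (let account = 0; account < accountCount; account++) {
    if (state[account] === 0) {
      const cycle = visit(account);
      if (cycle !== null) return cycle;
    }
  }
  return null;
}

const circular = findTransferCycle(4, [
  { from: 0, to: 1 },
  { from: 1, to: 2 },
  { from: 2, to: 0 },
  { from: 2, to: 3 },
]);
console.log(circular); // [0, 1, 2]
```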

3. Data Generator (graph-data-generator.ts)

Uses @ruvector/agentic-synth to generate realistic synthetic graph data.

Features:

  • AI-powered realistic data generation
  • Multiple graph topologies
  • Scale-free networks (preferential attachment)
  • Temporal event sequences
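
For readers unfamiliar with preferential attachment, the sketch below shows one common way to build a scale-free edge list (a Barabási–Albert style process). It is an illustration under stated assumptions, not the generator's actual code: `createRng` and `generateScaleFreeEdges` are hypothetical names, and a simple seeded linear congruential generator stands in for whatever randomness source graph-data-generator.ts uses.

```typescript
// Deterministic pseudo-random generator (linear congruential) so that
// repeated runs with the same seed produce the same graph.
function createRng(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Builds an undirected scale-free graph: each new node attaches to
// `edgesPerNode` distinct existing nodes, chosen with probability
// proportional to their current degree.
function generateScaleFreeEdges(
  nodeCount: number,
  edgesPerNode: number,
  seed: number
): Array<[number, number]> {
  if (edgesPerNode < 1 || nodeCount <= edgesPerNode) {
    throw new Error("need edgesPerNode >= 1 and nodeCount > edgesPerNode");
  }
  const rng = createRng(seed);
  const edges: Array<[number, number]> = [];
  // Every node appears in this list once per incident edge, so a uniform
  // pick from it is a degree-proportional pick.
  const endpoints: number[] = [];

  // Seed graph: a small clique of (edgesPerNode + 1) nodes.
  const seedSize = edgesPerNode + 1;
  for (let a = 0; a < seedSize; a++) {
    for (let b = a + 1; b < seedSize; b++) {
      edges.push([a, b]);
      endpoints.push(a, b);
    }
  }

  for (let node = seedSize; node < nodeCount; node++) {
    const targets: number[] = [];
    while (targets.length < edgesPerNode) {
      const candidate = endpoints[Math.floor(rng() * endpoints.length)];
      if (targets.indexOf(candidate) === -1) targets.push(candidate);
    }
    // Record the new edges only after all targets are chosen, so the new
    // node can never pick itself.
    for (const target of targets) {
      edges.push([node, target]);
      endpoints.push(node, target);
    }
  }
  return edges;
}

const sampleEdges = generateScaleFreeEdges(1000, 10, 42);
console.log(sampleEdges.length);
```

With `edgesPerNode` set to 10, the edge count approaches ten times the node count, matching the 1M-node, 10M-edge shape of the social_network dataset.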

Generate Datasets:

cd /home/user/ruvector/benchmarks
npm run graph:generate

Datasets Generated:

  • social_network - 1M nodes, 10M edges
  • knowledge_graph - 100K entities, 1M relationships
  • temporal_events - 500K events with transitions

4. Comparison Runner (comparison-runner.ts)

Runs each benchmark on both RuVector and Neo4j and compares the results.

Run Comparisons:

# All scenarios
npm run graph:compare

# Specific scenario
npm run graph:compare:social
npm run graph:compare:knowledge
npm run graph:compare:temporal

Comparison Metrics:

  • Execution time (ms)
  • Throughput (ops/sec)
  • Memory usage (MB)
  • Latency percentiles (p50, p95, p99)
  • Speedup calculation
  • Pass/fail verdict
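
The metrics above can be derived from raw latency samples roughly as follows. This is a sketch, not the code in comparison-runner.ts: the function names are hypothetical, percentiles use the nearest-rank method, throughput assumes operations ran sequentially, and the verdict treats a speedup equal to the target as a pass.

```typescript
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  throughputOpsPerSec: number;
}

// Nearest-rank percentile over an ascending-sorted array of samples.
function percentile(sortedMs: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedMs.length);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

function summarize(latenciesMs: number[]): LatencySummary {
  if (latenciesMs.length === 0) throw new Error("no samples");
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  let totalMs = 0;
  for (const ms of sorted) totalMs += ms;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    // Operations per second, assuming the operations ran one after another.
    throughputOpsPerSec: (sorted.length / totalMs) * 1000,
  };
}

// Speedup is the baseline time divided by the candidate time.
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

function verdict(actualSpeedup: number, targetSpeedup: number): "pass" | "fail" {
  return actualSpeedup >= targetSpeedup ? "pass" : "fail";
}

// Made-up sample values for illustration; these are not benchmark results.
const summary = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
const ratio = speedup(60, summary.p50);
console.log(summary.p50, summary.p95, summary.p99, ratio, verdict(ratio, 10));
```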

5. Results Reporter (results-report.ts)

Generates comprehensive HTML and Markdown reports.

Generate Reports:

npm run graph:report

Output:

  • benchmark-report.html - Interactive HTML dashboard with charts
  • benchmark-report.md - Markdown summary
  • benchmark-data.json - Raw JSON data

Quick Start

1. Generate Test Data

cd /home/user/ruvector/benchmarks
npm run graph:generate

2. Run Rust Benchmarks

npm run graph:bench

3. Run Comparison Tests

npm run graph:compare

4. Generate Report

npm run graph:report

5. View Results

npm run dashboard
# Open http://localhost:8000/results/graph/benchmark-report.html

Complete Workflow

Run all benchmarks end-to-end:

npm run graph:all

This will:

  1. Generate synthetic datasets using agentic-synth
  2. Run Rust criterion benchmarks
  3. Compare with Neo4j baseline
  4. Generate HTML/Markdown reports

Performance Targets

Target: 10x Faster Traversals

  • 1-hop traversal: >10x speedup
  • 2-hop traversal: >10x speedup
  • Multi-hop reasoning: >10x speedup

Target: 100x Faster Lookups

  • Node by ID: >100x speedup
  • Edge lookup: >100x speedup
  • Property access: >100x speedup

Target: Sub-linear Scaling

  • Query performance remains consistent as the graph grows
  • Memory usage scales efficiently with graph size
  • Query time is largely independent of total graph size
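
One way to check a sub-linear scaling target numerically is to fit the slope of log(query time) against log(graph size): a slope near 0 means time is independent of size, a slope below 1 means sub-linear growth, and a slope of 1 means linear growth. The sketch below shows that calculation; `ScalingSample` and `scalingExponent` are hypothetical names, and the input timings are synthetic values generated from a formula, not measurements.

```typescript
interface ScalingSample {
  nodes: number;       // graph size
  queryTimeMs: number; // measured query time at that size
}

// Least-squares slope of log(time) against log(size).
function scalingExponent(samples: ScalingSample[]): number {
  if (samples.length < 2) throw new Error("need at least two graph sizes");
  let sumX = 0;
  let sumY = 0;
  let sumXY = 0;
  let sumXX = 0;
  for (const s of samples) {
    const x = Math.log(s.nodes);
    const y = Math.log(s.queryTimeMs);
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
  }
  const n = samples.length;
  const denominator = n * sumXX - sumX * sumX;
  if (denominator === 0) throw new Error("graph sizes must differ");
  return (n * sumXY - sumX * sumY) / denominator;
}

// Synthetic timings that follow a square-root curve, for illustration only.
// These are NOT measurements of RuVector or Neo4j.
const syntheticSamples: ScalingSample[] = [10000, 100000, 1000000].map(
  (nodes) => ({ nodes: nodes, queryTimeMs: 0.02 * Math.sqrt(nodes) })
);
const exponent = scalingExponent(syntheticSamples);
console.log(exponent.toFixed(2), exponent < 1 ? "sub-linear" : "not sub-linear");
```

Feeding in the timings from the 10K, 100K, and 1M node runs would give a single number to compare against the target instead of a qualitative judgment.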

Dataset Specifications

Social Network

{
  nodes: 1_000_000,
  edges: 10_000_000,
  labels: ['Person', 'Post', 'Comment', 'Group'],
  avgDegree: 10,
  topology: 'scale-free' // Preferential attachment
}

Knowledge Graph

{
  nodes: 100_000,
  edges: 1_000_000,
  labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
  avgDegree: 10,
  topology: 'semantic-network'
}

Temporal Events

{
  nodes: 500_000,
  edges: 2_000_000,
  labels: ['Event', 'State', 'Entity'],
  timeRange: '365 days',
  topology: 'temporal-causal'
}
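
The three specifications above share a common shape, which could be captured in a type and checked for internal consistency before generation. The sketch below is one possible way to do that; `GraphDatasetSpec`, `impliedAvgDegree`, and `validateSpec` are hypothetical names, and it assumes avgDegree means edges divided by nodes, which is the reading that matches the numbers given.

```typescript
interface GraphDatasetSpec {
  nodes: number;
  edges: number;
  labels: string[];
  topology: string;
  avgDegree?: number; // optional: the temporal spec does not declare it
  timeRange?: string;
}

// Edges per node implied by the node and edge counts.
function impliedAvgDegree(spec: GraphDatasetSpec): number {
  return spec.edges / spec.nodes;
}

// Returns a list of problems; an empty list means the spec is self-consistent.
function validateSpec(name: string, spec: GraphDatasetSpec): string[] {
  const problems: string[] = [];
  if (spec.nodes <= 0) problems.push(name + ": nodes must be positive");
  if (spec.edges < 0) problems.push(name + ": edges must not be negative");
  if (spec.labels.length === 0) problems.push(name + ": at least one label is required");
  if (spec.avgDegree !== undefined && spec.nodes > 0) {
    const implied = impliedAvgDegree(spec);
    if (Math.abs(implied - spec.avgDegree) > 1e-9) {
      problems.push(name + ": avgDegree " + spec.avgDegree + " does not match edges/nodes = " + implied);
    }
  }
  return problems;
}

// Values copied from the specifications above.
const datasetSpecs: { [name: string]: GraphDatasetSpec } = {
  social_network: {
    nodes: 1000000, edges: 10000000, avgDegree: 10, topology: "scale-free",
    labels: ["Person", "Post", "Comment", "Group"],
  },
  knowledge_graph: {
    nodes: 100000, edges: 1000000, avgDegree: 10, topology: "semantic-network",
    labels: ["Person", "Organization", "Location", "Event", "Concept"],
  },
  temporal_events: {
    nodes: 500000, edges: 2000000, timeRange: "365 days", topology: "temporal-causal",
    labels: ["Event", "State", "Entity"],
  },
};

for (const name of Object.keys(datasetSpecs)) {
  console.log(name, validateSpec(name, datasetSpecs[name]));
}
```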

Agentic-Synth Integration

The benchmark suite uses @ruvector/agentic-synth for intelligent synthetic data generation:

import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp'
});

// Generate realistic user profiles
const users = await synth.generateStructured({
  type: 'json',
  count: 10000,
  schema: {
    name: 'string',
    age: 'number',
    location: 'string',
    interests: 'array<string>'
  },
  prompt: 'Generate diverse social media user profiles...'
});

Results Directory Structure

benchmarks/
├── data/
│   └── graph/
│       ├── social_network_nodes.json
│       ├── social_network_edges.json
│       ├── knowledge_graph_nodes.json
│       └── temporal_events_nodes.json
├── results/
│   └── graph/
│       ├── social_network_comparison.json
│       ├── benchmark-report.html
│       ├── benchmark-report.md
│       └── benchmark-data.json
└── graph/
    ├── graph-scenarios.ts
    ├── graph-data-generator.ts
    ├── comparison-runner.ts
    └── results-report.ts

CI/CD Integration

Add to GitHub Actions:

- name: Run Graph Benchmarks
  run: |
    cd benchmarks
    npm install
    npm run graph:all

- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: graph-benchmarks
    path: benchmarks/results/graph/

Troubleshooting

Neo4j Not Available

If Neo4j is not installed, the comparison runner falls back to baseline metrics from previous runs or to estimates.

Memory Issues

For large datasets (>1M nodes), increase the Node.js heap size:

NODE_OPTIONS="--max-old-space-size=8192" npm run graph:generate

Criterion Baseline

To reset benchmark baselines, save a fresh baseline under a new name:

cd crates/ruvector-graph
cargo bench --bench graph_bench -- --save-baseline new-baseline

Contributing

When adding new benchmarks:

  1. Add Rust benchmark to graph_bench.rs
  2. Create corresponding TypeScript scenario
  3. Update data generator if needed
  4. Document expected performance targets
  5. Update this README

License

MIT - See LICENSE file