dearsky/wifi-densepose

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

RuVector Graph Database Benchmarks

Comprehensive benchmark suite for RuVector's graph database implementation, comparing its performance against a Neo4j baseline.

Overview

This benchmark suite is designed to check RuVector's performance claims:

10x+ faster than Neo4j for graph traversals
100x+ faster for simple node/edge lookups
Sub-linear scaling with graph size

Components

1. Rust Benchmarks (`graph_bench.rs`)

Located in /home/user/ruvector/crates/ruvector-graph/benches/graph_bench.rs

Benchmark Categories:

Node Operations

node_insertion_single - Single node insertion (1, 10, 100, 1000 nodes)
node_insertion_batch - Batch insertion (100, 1K, 10K nodes)
node_insertion_bulk - Bulk insertion optimized path (10K, 100K, 1M nodes)

Edge Operations

edge_creation - Edge creation benchmarks (100, 1K, 10K edges)

Query Operations

query_node_lookup - Simple ID-based node lookup (100K nodes)
query_1hop_traversal - 1-hop neighbor traversal (fan-out: 1, 10, 100)
query_2hop_traversal - 2-hop BFS traversal
query_path_finding - Shortest path algorithms
query_aggregation - Aggregation queries (count, avg, etc.)
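
To make the traversal categories concrete, here is a minimal TypeScript sketch of what a 1-hop lookup and a 2-hop BFS compute. It is an illustration only: the actual benchmarks are written in Rust in graph_bench.rs, and the adjacency-map representation and the names `oneHop` and `kHop` are assumptions made for this example, not RuVector's API.

```typescript
// Illustrative only: a plain adjacency map, not RuVector's storage format.
type NodeId = number;
type Adjacency = Map<NodeId, NodeId[]>;

// 1-hop traversal: the direct neighbors of `start`.
function oneHop(graph: Adjacency, start: NodeId): NodeId[] {
  return graph.get(start) ?? [];
}

// k-hop BFS: every node reachable within `maxHops` edges, excluding `start`.
function kHop(graph: Adjacency, start: NodeId, maxHops: number): Set<NodeId> {
  const visited = new Set<NodeId>([start]);
  let frontier: NodeId[] = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: NodeId[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return visited;
}

// Tiny example graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {3, 4}, 3 -> {5}.
const exampleGraph: Adjacency = new Map<NodeId, NodeId[]>([
  [0, [1, 2]],
  [1, [3]],
  [2, [3, 4]],
  [3, [5]],
]);
const twoHopFromZero = kHop(exampleGraph, 0, 2); // contains 1, 2, 3, 4 but not 5
```

The work in a 2-hop traversal grows with the fan-out squared, which is why the fan-out parameter (1, 10, 100) matters more than total graph size for these benchmarks.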

Concurrency

concurrent_operations - Concurrent read/write (2, 4, 8, 16 threads)

Memory

memory_usage - Memory tracking (10K, 100K, 1M nodes)

Run Rust Benchmarks:

cd /home/user/ruvector/crates/ruvector-graph
cargo bench --bench graph_bench

# Run specific benchmark
cargo bench --bench graph_bench -- node_insertion

# Save baseline
cargo bench --bench graph_bench -- --save-baseline my-baseline

2. TypeScript Test Scenarios (`graph-scenarios.ts`)

Defines high-level benchmark scenarios:

Social Network (1M users, 10M friendships)
- Friend recommendations
- Mutual friends
- Influencer detection
Knowledge Graph (100K entities, 1M relationships)
- Multi-hop reasoning
- Path finding
- Pattern matching
Temporal Graph (500K events)
- Time-range queries
- State transitions
- Event aggregation
Recommendation Engine
- Collaborative filtering
- Item recommendations
- Trending items
Fraud Detection
- Circular transfer detection
- Network analysis
- Risk scoring
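
As an illustration of the social-network scenario, the sketch below shows one common way to compute mutual friends and friends-of-friends recommendations on an in-memory graph. The `FriendGraph` type and the function names are hypothetical and are not taken from graph-scenarios.ts; the ranking rule (count of shared friends) is an assumption chosen for simplicity.

```typescript
type UserId = string;
type FriendGraph = Map<UserId, Set<UserId>>;

// Friendships are undirected, so record both directions.
function addFriendship(g: FriendGraph, a: UserId, b: UserId): void {
  if (!g.has(a)) g.set(a, new Set<UserId>());
  if (!g.has(b)) g.set(b, new Set<UserId>());
  g.get(a)!.add(b);
  g.get(b)!.add(a);
}

// Mutual friends: intersection of the two friend sets, sorted for stable output.
function mutualFriends(g: FriendGraph, a: UserId, b: UserId): UserId[] {
  const friendsOfA = g.get(a) ?? new Set<UserId>();
  const friendsOfB = g.get(b) ?? new Set<UserId>();
  return Array.from(friendsOfA).filter((u) => friendsOfB.has(u)).sort();
}

// Recommendations: friends-of-friends ranked by number of shared friends.
function recommendFriends(g: FriendGraph, user: UserId, limit: number): UserId[] {
  const friends = g.get(user) ?? new Set<UserId>();
  const sharedCount = new Map<UserId, number>();
  friends.forEach((friend) => {
    (g.get(friend) ?? new Set<UserId>()).forEach((candidate) => {
      if (candidate === user || friends.has(candidate)) return;
      sharedCount.set(candidate, (sharedCount.get(candidate) ?? 0) + 1);
    });
  });
  return Array.from(sharedCount.entries())
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, limit)
    .map((entry) => entry[0]);
}

const social: FriendGraph = new Map();
addFriendship(social, "alice", "bob");
addFriendship(social, "alice", "carol");
addFriendship(social, "bob", "dave");
addFriendship(social, "carol", "dave");
addFriendship(social, "bob", "erin");

const shared = mutualFriends(social, "alice", "dave");   // ["bob", "carol"]
const suggested = recommendFriends(social, "alice", 2);  // ["dave", "erin"]
```

Both queries are 2-hop traversals in disguise, so they exercise the same access pattern as the `query_2hop_traversal` Rust benchmark.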

3. Data Generator (`graph-data-generator.ts`)

Uses @ruvector/agentic-synth to generate realistic synthetic graph data.

Features:

AI-powered realistic data generation
Multiple graph topologies
Scale-free networks (preferential attachment)
Temporal event sequences
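
For readers unfamiliar with preferential attachment, the following is a minimal sketch of a Barabási–Albert style generator. It is not the code in graph-data-generator.ts: the function name `generateScaleFree`, the seeded `mulberry32` random source, and the choice of an initial clique are all assumptions made to keep the example self-contained and reproducible.

```typescript
type Edge = [number, number];

// Small seeded PRNG (mulberry32) so the example is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Each new node attaches to `m` distinct existing nodes, chosen with
// probability proportional to their current degree.
function generateScaleFree(nodeCount: number, m: number, seed = 42): Edge[] {
  if (m < 1 || nodeCount <= m) throw new Error("need nodeCount > m >= 1");
  const rand = mulberry32(seed);
  const edges: Edge[] = [];
  // Every node appears here once per incident edge, so a uniform pick
  // from this array is a degree-proportional pick of a node.
  const endpoints: number[] = [];

  // Seed graph: a clique on the first m + 1 nodes.
  for (let i = 0; i <= m; i++) {
    for (let j = i + 1; j <= m; j++) {
      edges.push([i, j]);
      endpoints.push(i, j);
    }
  }

  for (let v = m + 1; v < nodeCount; v++) {
    const targets = new Set<number>();
    while (targets.size < m) {
      targets.add(endpoints[Math.floor(rand() * endpoints.length)]);
    }
    targets.forEach((t) => {
      edges.push([v, t]);
      endpoints.push(v, t);
    });
  }
  return edges;
}

const sampleEdges = generateScaleFree(1000, 3);
// Clique of 4 nodes (6 edges) plus 996 new nodes with 3 edges each.
```

Early nodes accumulate edges faster than late ones, which produces the few highly connected hubs that make scale-free graphs a harder traversal workload than uniform random graphs.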

Generate Datasets:

cd /home/user/ruvector/benchmarks
npm run graph:generate

Datasets Generated:

social_network - 1M nodes, 10M edges
knowledge_graph - 100K entities, 1M relationships
temporal_events - 500K events with transitions

4. Comparison Runner (`comparison-runner.ts`)

Runs each benchmark on both RuVector and Neo4j and compares the results.

Run Comparisons:

# All scenarios
npm run graph:compare

# Specific scenario
npm run graph:compare:social
npm run graph:compare:knowledge
npm run graph:compare:temporal

Comparison Metrics:

Execution time (ms)
Throughput (ops/sec)
Memory usage (MB)
Latency percentiles (p50, p95, p99)
Speedup calculation
Pass/fail verdict
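
The metrics above can be derived from a list of per-operation latencies. The sketch below shows one plausible way to do that; it is not the implementation in comparison-runner.ts. The names `summarize` and `compareRuns`, the nearest-rank percentile method, the use of mean latency for the speedup, and the assumption that operations run sequentially (for throughput) are all choices made for this example. The input latencies are synthetic, not measurements.

```typescript
interface LatencySummary {
  count: number;
  meanMs: number;
  p50: number;
  p95: number;
  p99: number;
  throughputOpsPerSec: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no latency samples");
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

function summarize(latenciesMs: number[]): LatencySummary {
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const totalMs = sorted.reduce((sum, x) => sum + x, 0);
  return {
    count: sorted.length,
    meanMs: totalMs / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    // Assumes operations ran one after another, so elapsed time = sum of latencies.
    throughputOpsPerSec: sorted.length / (totalMs / 1000),
  };
}

// Speedup is baseline mean latency divided by candidate mean latency.
function compareRuns(
  candidate: LatencySummary,
  baseline: LatencySummary,
  targetSpeedup: number
): { speedup: number; verdict: "pass" | "fail" } {
  const speedup = baseline.meanMs / candidate.meanMs;
  return { speedup, verdict: speedup >= targetSpeedup ? "pass" : "fail" };
}

// Synthetic inputs: 1..100 ms for the candidate, 20x slower for the baseline.
const candidateLatencies = Array.from({ length: 100 }, (_, i) => i + 1);
const baselineLatencies = candidateLatencies.map((ms) => ms * 20);
const candidateSummary = summarize(candidateLatencies);
const baselineSummary = summarize(baselineLatencies);
const traversalResult = compareRuns(candidateSummary, baselineSummary, 10);
```

Reporting p95 and p99 alongside the mean matters because a run can meet a mean-based speedup target while still having a slow tail.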

5. Results Reporter (`results-report.ts`)

Generates comprehensive HTML and Markdown reports.

Generate Reports:

npm run graph:report

Output:

benchmark-report.html - Interactive HTML dashboard with charts
benchmark-report.md - Markdown summary
benchmark-data.json - Raw JSON data

Quick Start

1. Generate Test Data

cd /home/user/ruvector/benchmarks
npm run graph:generate

2. Run Rust Benchmarks

npm run graph:bench

3. Run Comparison Tests

npm run graph:compare

4. Generate Report

npm run graph:report

5. View Results

npm run dashboard
# Open http://localhost:8000/results/graph/benchmark-report.html

Complete Workflow

Run all benchmarks end-to-end:

npm run graph:all

This will:

Generate synthetic datasets using agentic-synth
Run Rust criterion benchmarks
Compare with Neo4j baseline
Generate HTML/Markdown reports

Performance Targets

✅ Target: 10x Faster Traversals

1-hop traversal: >10x speedup
2-hop traversal: >10x speedup
Multi-hop reasoning: >10x speedup

✅ Target: 100x Faster Lookups

Node by ID: >100x speedup
Edge lookup: >100x speedup
Property access: >100x speedup

✅ Target: Sub-linear Scaling

Performance remains consistent as graph grows
Memory usage scales efficiently
Query time independent of total graph size
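
One way to make "sub-linear" testable is to fit a power law, time ≈ c · nodes^k, to measurements at several graph sizes and check that the exponent k is below 1 (k near 0 means query time is independent of graph size). The sketch below does this with a least-squares fit on log-log data. The function name `scalingExponent` is hypothetical, and the sample points are generated from formulas purely to demonstrate the fit; they are not benchmark results.

```typescript
interface ScalingSample {
  nodes: number;
  queryMs: number;
}

// Least-squares slope of log(queryMs) against log(nodes).
function scalingExponent(samples: ScalingSample[]): number {
  if (samples.length < 2) throw new Error("need at least two samples");
  const xs = samples.map((s) => Math.log(s.nodes));
  const ys = samples.map((s) => Math.log(s.queryMs));
  const meanX = xs.reduce((a, b) => a + b, 0) / xs.length;
  const meanY = ys.reduce((a, b) => a + b, 0) / ys.length;
  let covariance = 0;
  let variance = 0;
  for (let i = 0; i < xs.length; i++) {
    covariance += (xs[i] - meanX) * (ys[i] - meanY);
    variance += (xs[i] - meanX) * (xs[i] - meanX);
  }
  if (variance === 0) throw new Error("samples must cover at least two graph sizes");
  return covariance / variance;
}

// Synthetic curves at the three dataset sizes used by the memory benchmark.
const graphSizes = [10_000, 100_000, 1_000_000];
const sqrtCurve = graphSizes.map((n) => ({ nodes: n, queryMs: 0.01 * Math.sqrt(n) }));
const linearCurve = graphSizes.map((n) => ({ nodes: n, queryMs: 0.001 * n }));
const flatCurve = graphSizes.map((n) => ({ nodes: n, queryMs: 0.2 }));

const isSubLinear = (samples: ScalingSample[]): boolean => scalingExponent(samples) < 1;
```

With real measurements the fit will be noisy, so a threshold somewhat below 1 and more than three graph sizes would give a more trustworthy verdict.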

Dataset Specifications

Social Network

{
  nodes: 1_000_000,
  edges: 10_000_000,
  labels: ['Person', 'Post', 'Comment', 'Group'],
  avgDegree: 10,
  topology: 'scale-free' // Preferential attachment
}

Knowledge Graph

{
  nodes: 100_000,
  edges: 1_000_000,
  labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
  avgDegree: 10,
  topology: 'semantic-network'
}

Temporal Events

{
  nodes: 500_000,
  edges: 2_000_000,
  labels: ['Event', 'State', 'Entity'],
  timeRange: '365 days',
  topology: 'temporal-causal'
}
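
The specifications above share a common shape, which can be captured in a type and sanity-checked before generation. The sketch below is an assumption-laden illustration: the `GraphDatasetSpec` interface and `checkSpec` function are hypothetical, and `avgDegree` is interpreted as edges divided by nodes, which the README does not state explicitly but which matches the listed numbers.

```typescript
interface GraphDatasetSpec {
  nodes: number;
  edges: number;
  labels: string[];
  topology: string;
  avgDegree?: number; // interpreted here as edges / nodes (an assumption)
  timeRange?: string;
}

// Values copied from the specifications in this README.
const datasetSpecs: Record<string, GraphDatasetSpec> = {
  social_network: {
    nodes: 1_000_000,
    edges: 10_000_000,
    labels: ["Person", "Post", "Comment", "Group"],
    avgDegree: 10,
    topology: "scale-free",
  },
  knowledge_graph: {
    nodes: 100_000,
    edges: 1_000_000,
    labels: ["Person", "Organization", "Location", "Event", "Concept"],
    avgDegree: 10,
    topology: "semantic-network",
  },
  temporal_events: {
    nodes: 500_000,
    edges: 2_000_000,
    labels: ["Event", "State", "Entity"],
    timeRange: "365 days",
    topology: "temporal-causal",
  },
};

function edgesPerNode(spec: GraphDatasetSpec): number {
  return spec.edges / spec.nodes;
}

// Returns a list of human-readable problems; an empty list means the spec is consistent.
function checkSpec(name: string, spec: GraphDatasetSpec): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(spec.nodes) || spec.nodes <= 0) problems.push(name + ": nodes must be a positive integer");
  if (!Number.isInteger(spec.edges) || spec.edges < 0) problems.push(name + ": edges must be a non-negative integer");
  if (spec.labels.length === 0) problems.push(name + ": at least one label is required");
  if (spec.avgDegree !== undefined && Math.abs(spec.avgDegree - edgesPerNode(spec)) > 1e-9) {
    problems.push(name + ": avgDegree does not match edges / nodes");
  }
  return problems;
}

const specProblems = Object.keys(datasetSpecs)
  .map((name) => checkSpec(name, datasetSpecs[name]))
  .reduce((all, p) => all.concat(p), [] as string[]);
```

Under this interpretation the temporal dataset works out to 4 edges per node, which is why it carries no `avgDegree: 10` entry.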

Agentic-Synth Integration

The benchmark suite uses @ruvector/agentic-synth for intelligent synthetic data generation:

import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp'
});

// Generate realistic user profiles
const users = await synth.generateStructured({
  type: 'json',
  count: 10000,
  schema: {
    name: 'string',
    age: 'number',
    location: 'string',
    interests: 'array<string>'
  },
  prompt: 'Generate diverse social media user profiles...'
});

Results Directory Structure

benchmarks/
├── data/
│   └── graph/
│       ├── social_network_nodes.json
│       ├── social_network_edges.json
│       ├── knowledge_graph_nodes.json
│       └── temporal_events_nodes.json
├── results/
│   └── graph/
│       ├── social_network_comparison.json
│       ├── benchmark-report.html
│       ├── benchmark-report.md
│       └── benchmark-data.json
└── graph/
    ├── graph-scenarios.ts
    ├── graph-data-generator.ts
    ├── comparison-runner.ts
    └── results-report.ts

CI/CD Integration

Add to GitHub Actions:

- name: Run Graph Benchmarks
  run: |
    cd benchmarks
    npm install
    npm run graph:all

- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: graph-benchmarks
    path: benchmarks/results/graph/

Troubleshooting

Neo4j Not Available

If Neo4j is not installed, the comparison runner falls back to baseline metrics from previous runs or to estimates. In that case the reported speedups are not based on a live Neo4j measurement.
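
The fallback could be structured as in the sketch below, which also records where the baseline came from so that reports can flag non-live comparisons. This is a hypothetical illustration and not the logic in comparison-runner.ts: the type names, the `resolveBaseline` function, and the priority order (live, then previous run, then estimate) are assumptions.

```typescript
type BaselineSource = "live" | "previous-run" | "estimate";

interface BaselineInputs {
  liveMeanMs?: number;        // present only when Neo4j was actually benchmarked
  previousRunMeanMs?: number; // present only when an earlier result file exists
  estimatedMeanMs: number;    // last-resort figure
}

interface ResolvedBaseline {
  source: BaselineSource;
  meanMs: number;
}

// Prefer a live measurement, then a stored result, then the estimate.
function resolveBaseline(inputs: BaselineInputs): ResolvedBaseline {
  if (inputs.liveMeanMs !== undefined) {
    return { source: "live", meanMs: inputs.liveMeanMs };
  }
  if (inputs.previousRunMeanMs !== undefined) {
    return { source: "previous-run", meanMs: inputs.previousRunMeanMs };
  }
  return { source: "estimate", meanMs: inputs.estimatedMeanMs };
}
```

Carrying the `source` field through to the report makes it clear whether a pass/fail verdict rests on a measured or an assumed Neo4j number.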

Memory Issues

For large datasets (around 1M nodes or more), increase the Node.js heap limit. The value is in MiB, so 8192 allows roughly 8 GB:

NODE_OPTIONS="--max-old-space-size=8192" npm run graph:generate

Criterion Baseline

To reset the Criterion baseline, save a new one under a fresh name:

cd crates/ruvector-graph
cargo bench --bench graph_bench -- --save-baseline new-baseline

Contributing

When adding new benchmarks:

Add Rust benchmark to graph_bench.rs
Create corresponding TypeScript scenario
Update data generator if needed
Document expected performance targets
Update this README

License

MIT - See LICENSE file
