Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/benchmarks/graph/docs/README.md
+++ b/vendor/ruvector/benchmarks/graph/docs/README.md
@@ -0,0 +1,329 @@
+# RuVector Graph Database Benchmarks
+
+Comprehensive benchmark suite for RuVector's graph database implementation, comparing performance with Neo4j baseline.
+
+## Overview
+
+This benchmark suite validates RuVector's performance claims:
+- **10x+ faster** than Neo4j for graph traversals
+- **100x+ faster** for simple node/edge lookups
+- **Sub-linear scaling** with graph size
+
+## Components
+
+### 1. Rust Benchmarks (`graph_bench.rs`)
+
+Located in `/home/user/ruvector/crates/ruvector-graph/benches/graph_bench.rs`
+
+**Benchmark Categories:**
+
+#### Node Operations
+- `node_insertion_single` - Single node insertion (1, 10, 100, 1000 nodes)
+- `node_insertion_batch` - Batch insertion (100, 1K, 10K nodes)
+- `node_insertion_bulk` - Bulk insertion optimized path (10K, 100K, 1M nodes)
+
+#### Edge Operations
+- `edge_creation` - Edge creation benchmarks (100, 1K, 10K edges)
+
+#### Query Operations
+- `query_node_lookup` - Simple ID-based node lookup (100K nodes)
+- `query_1hop_traversal` - 1-hop neighbor traversal (fan-out: 1, 10, 100)
+- `query_2hop_traversal` - 2-hop BFS traversal
+- `query_path_finding` - Shortest path algorithms
+- `query_aggregation` - Aggregation queries (count, avg, etc.)
+
+#### Concurrency
+- `concurrent_operations` - Concurrent read/write (2, 4, 8, 16 threads)
+
+#### Memory
+- `memory_usage` - Memory tracking (10K, 100K, 1M nodes)
+
+**Run Rust Benchmarks:**
+```bash
+cd /home/user/ruvector/crates/ruvector-graph
+cargo bench --bench graph_bench
+
+# Run specific benchmark
+cargo bench --bench graph_bench -- node_insertion
+
+# Save baseline
+cargo bench --bench graph_bench -- --save-baseline my-baseline
+```
+
+### 2. TypeScript Test Scenarios (`graph-scenarios.ts`)
+
+Defines high-level benchmark scenarios:
+
+- **Social Network** (1M users, 10M friendships)
+  - Friend recommendations
+  - Mutual friends
+  - Influencer detection
+
+- **Knowledge Graph** (100K entities, 1M relationships)
+  - Multi-hop reasoning
+  - Path finding
+  - Pattern matching
+
+- **Temporal Graph** (500K events)
+  - Time-range queries
+  - State transitions
+  - Event aggregation
+
+- **Recommendation Engine**
+  - Collaborative filtering
+  - Item recommendations
+  - Trending items
+
+- **Fraud Detection**
+  - Circular transfer detection
+  - Network analysis
+  - Risk scoring
+
+### 3. Data Generator (`graph-data-generator.ts`)
+
+Uses `@ruvector/agentic-synth` to generate realistic synthetic graph data.
+
+**Features:**
+- AI-powered realistic data generation
+- Multiple graph topologies
+- Scale-free networks (preferential attachment)
+- Temporal event sequences
+
+**Generate Datasets:**
+```bash
+cd /home/user/ruvector/benchmarks
+npm run graph:generate
+```
+
+**Datasets Generated:**
+- `social_network` - 1M nodes, 10M edges
+- `knowledge_graph` - 100K entities, 1M relationships
+- `temporal_events` - 500K events with transitions
+
+### 4. Comparison Runner (`comparison-runner.ts`)
+
+Runs benchmarks on both RuVector and Neo4j, compares results.
+
+**Run Comparisons:**
+```bash
+# All scenarios
+npm run graph:compare
+
+# Specific scenario
+npm run graph:compare:social
+npm run graph:compare:knowledge
+npm run graph:compare:temporal
+```
+
+**Comparison Metrics:**
+- Execution time (ms)
+- Throughput (ops/sec)
+- Memory usage (MB)
+- Latency percentiles (p50, p95, p99)
+- Speedup calculation
+- Pass/fail verdict
+
+### 5. Results Reporter (`results-report.ts`)
+
+Generates comprehensive HTML and Markdown reports.
+
+**Generate Reports:**
+```bash
+npm run graph:report
+```
+
+**Output:**
+- `benchmark-report.html` - Interactive HTML dashboard with charts
+- `benchmark-report.md` - Markdown summary
+- `benchmark-data.json` - Raw JSON data
+
+## Quick Start
+
+### 1. Generate Test Data
+```bash
+cd /home/user/ruvector/benchmarks
+npm run graph:generate
+```
+
+### 2. Run Rust Benchmarks
+```bash
+npm run graph:bench
+```
+
+### 3. Run Comparison Tests
+```bash
+npm run graph:compare
+```
+
+### 4. Generate Report
+```bash
+npm run graph:report
+```
+
+### 5. View Results
+```bash
+npm run dashboard
+# Open http://localhost:8000/results/graph/benchmark-report.html
+```
+
+## Complete Workflow
+
+Run all benchmarks end-to-end:
+```bash
+npm run graph:all
+```
+
+This will:
+1. Generate synthetic datasets using agentic-synth
+2. Run Rust criterion benchmarks
+3. Compare with Neo4j baseline
+4. Generate HTML/Markdown reports
+
+## Performance Targets
+
+### ✅ Target: 10x Faster Traversals
+- 1-hop traversal: >10x speedup
+- 2-hop traversal: >10x speedup
+- Multi-hop reasoning: >10x speedup
+
+### ✅ Target: 100x Faster Lookups
+- Node by ID: >100x speedup
+- Edge lookup: >100x speedup
+- Property access: >100x speedup
+
+### ✅ Target: Sub-linear Scaling
+- Performance remains consistent as graph grows
+- Memory usage scales efficiently
+- Query time independent of total graph size
+
+## Dataset Specifications
+
+### Social Network
+```typescript
+{
+  nodes: 1_000_000,
+  edges: 10_000_000,
+  labels: ['Person', 'Post', 'Comment', 'Group'],
+  avgDegree: 10,
+  topology: 'scale-free' // Preferential attachment
+}
+```
+
+### Knowledge Graph
+```typescript
+{
+  nodes: 100_000,
+  edges: 1_000_000,
+  labels: ['Person', 'Organization', 'Location', 'Event', 'Concept'],
+  avgDegree: 10,
+  topology: 'semantic-network'
+}
+```
+
+### Temporal Events
+```typescript
+{
+  nodes: 500_000,
+  edges: 2_000_000,
+  labels: ['Event', 'State', 'Entity'],
+  timeRange: '365 days',
+  topology: 'temporal-causal'
+}
+```
+
+## Agentic-Synth Integration
+
+The benchmark suite uses `@ruvector/agentic-synth` for intelligent synthetic data generation:
+
+```typescript
+import { AgenticSynth } from '@ruvector/agentic-synth';
+
+const synth = new AgenticSynth({
+  provider: 'gemini',
+  model: 'gemini-2.0-flash-exp'
+});
+
+// Generate realistic user profiles
+const users = await synth.generateStructured({
+  type: 'json',
+  count: 10000,
+  schema: {
+    name: 'string',
+    age: 'number',
+    location: 'string',
+    interests: 'array<string>'
+  },
+  prompt: 'Generate diverse social media user profiles...'
+});
+```
+
+## Results Directory Structure
+
+```
+benchmarks/
+├── data/
+│   └── graph/
+│       ├── social_network_nodes.json
+│       ├── social_network_edges.json
+│       ├── knowledge_graph_nodes.json
+│       └── temporal_events_nodes.json
+├── results/
+│   └── graph/
+│       ├── social_network_comparison.json
+│       ├── benchmark-report.html
+│       ├── benchmark-report.md
+│       └── benchmark-data.json
+└── graph/
+    ├── graph-scenarios.ts
+    ├── graph-data-generator.ts
+    ├── comparison-runner.ts
+    └── results-report.ts
+```
+
+## CI/CD Integration
+
+Add to GitHub Actions:
+```yaml
+- name: Run Graph Benchmarks
+  run: |
+    cd benchmarks
+    npm install
+    npm run graph:all
+
+- name: Upload Results
+  uses: actions/upload-artifact@v3
+  with:
+    name: graph-benchmarks
+    path: benchmarks/results/graph/
+```
+
+## Troubleshooting
+
+### Neo4j Not Available
+If Neo4j is not installed, the comparison runner will use baseline metrics from previous runs or estimates.
+
+### Memory Issues
+For large datasets (>1M nodes), increase Node.js heap:
+```bash
+NODE_OPTIONS="--max-old-space-size=8192" npm run graph:generate
+```
+
+### Criterion Baseline
+Reset benchmark baselines:
+```bash
+cd crates/ruvector-graph
+cargo bench --bench graph_bench -- --save-baseline new-baseline
+```
+
+## Contributing
+
+When adding new benchmarks:
+1. Add Rust benchmark to `graph_bench.rs`
+2. Create corresponding TypeScript scenario
+3. Update data generator if needed
+4. Document expected performance targets
+5. Update this README
+
+## License
+
+MIT - See LICENSE file