Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
ruv
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions

@@ -0,0 +1,468 @@
# RuVector Parallel Query Execution Guide
Complete guide to parallel query execution for PostgreSQL vector operations in RuVector.
## Overview
RuVector implements PostgreSQL parallel query execution for vector similarity search, enabling:
- **Multi-worker parallel scans** for large vector indexes
- **Automatic parallelization** based on index size and query complexity
- **Work-stealing partitioning** for optimal load balancing
- **SIMD acceleration** within each parallel worker
- **Tournament tree merging** for efficient result combination
## Architecture
### Parallel Execution Components
1. **Parallel-Safe Distance Functions**
   - All distance functions marked as `PARALLEL SAFE`
   - Can be executed by multiple workers concurrently
   - SIMD optimizations active in each worker
2. **Parallel Index Scan**
   - Dynamic work partitioning across workers
   - Each worker scans assigned partitions
   - Local result buffers per worker
3. **Result Merging**
   - Tournament tree merge for k-NN results
   - Maintains sorted order efficiently
   - Minimal overhead for large k values
4. **Background Worker**
   - Automatic index maintenance
   - Statistics collection
   - Periodic optimization
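
The result-merging step can be illustrated with a small k-way merge. This is a minimal sketch, not RuVector's implementation: it assumes each worker returns `(distance, id)` pairs already sorted by distance, and it uses Python's `heapq.merge` (a binary heap) as a stand-in for a tournament tree, since both repeatedly pick the smallest head among the worker buffers. The name `merge_worker_results` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_worker_results(worker_results, k):
    """Merge per-worker (distance, id) lists into the global top-k.

    Each input list must already be sorted by ascending distance.
    heapq.merge is lazy, so only about k items are pulled in total.
    """
    merged = heapq.merge(*worker_results)
    return list(islice(merged, k))


# Hypothetical local result buffers from three workers.
worker_results = [
    [(0.12, 7), (0.40, 3), (0.91, 11)],
    [(0.05, 21), (0.33, 18)],
    [(0.27, 42), (0.30, 40), (0.88, 45)],
]

top4 = merge_worker_results(worker_results, 4)
print(top4)
```

Because the merge stops after `k` items, its cost depends on `k` and the number of workers rather than on the total number of candidates each worker collected.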
## Configuration
### PostgreSQL Settings
```sql
-- Enable parallel query for the current session
-- (use ALTER SYSTEM or postgresql.conf for a server-wide setting)
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 1000;
SET parallel_tuple_cost = 0.1;
-- RuVector-specific settings
SET ruvector.ef_search = 40;
SET ruvector.probes = 1;
```
### Automatic Worker Estimation
RuVector estimates the worker count automatically from index size and query complexity. To inspect the estimate for a given query shape:
```sql
-- Check estimated workers for a query
SELECT ruvector_estimate_workers(
pg_relation_size('my_hnsw_index') / 8192, -- index pages (assumes the default 8 kB block size)
(SELECT count(*) FROM my_vectors), -- tuple count
10, -- k (neighbors)
40 -- ef_search
);
```
**Estimation factors:**
- Index size (1 worker per 1000 pages)
- Query complexity (higher k and ef_search → more workers)
- Available parallel workers (respects PostgreSQL limits)
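
The three factors above can be combined roughly as follows. This is a hypothetical sketch, not the logic behind `ruvector_estimate_workers`: the "1 worker per 1000 pages" rule comes from this guide, but the complexity bonus (one extra worker when `k > 50` or `ef_search > 100`) and the function name `estimate_workers` are assumptions made purely for illustration.

```python
def estimate_workers(index_pages, k, ef_search, max_workers=4):
    """Rough worker estimate from index size and query complexity.

    max_workers stands in for max_parallel_workers_per_gather.
    """
    # From the guide: one worker per 1000 index pages.
    by_size = index_pages // 1000
    if by_size == 0:
        return 0  # index too small, stay sequential

    # ASSUMPTION: the exact weighting of k and ef_search is not documented;
    # this sketch adds a single extra worker for "complex" queries.
    bonus = 1 if (k > 50 or ef_search > 100) else 0

    # Never exceed the PostgreSQL-side limit.
    return min(max_workers, by_size + bonus)


for pages, k, ef in [(500, 10, 40), (2500, 10, 40), (2500, 100, 200), (100_000, 100, 200)]:
    print(pages, k, ef, "->", estimate_workers(pages, k, ef))
```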
### Manual Configuration
```sql
-- Force parallel execution (testing only; this setting is named
-- debug_parallel_query in PostgreSQL 16 and later)
SET force_parallel_mode = ON;
-- Configure minimum thresholds
SELECT ruvector_set_parallel_config(
enable := true,
min_tuples_for_parallel := 10000,
min_pages_for_parallel := 100
);
```
## Usage Examples
### Basic Parallel Query
```sql
-- Parallel k-NN search (automatic)
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;
-- Output shows parallel workers:
-- Gather (actual time=12.3..18.7 rows=10 loops=1)
-- Workers Planned: 4
-- Workers Launched: 4
-- -> Parallel Seq Scan on embeddings
```
### Index-Based Parallel Search
```sql
-- Create HNSW index
CREATE INDEX embeddings_hnsw_idx
ON embeddings
USING ruhnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
-- Parallel index scan
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 100;
```
### Query Planning Analysis
```sql
-- Explain query parallelization
SELECT * FROM ruvector_explain_parallel(
'embeddings_hnsw_idx', -- index name
100, -- k (neighbors)
200, -- ef_search
768 -- dimensions
);
-- Returns JSON with:
-- {
-- "parallel_plan": {
-- "enabled": true,
-- "num_workers": 4,
-- "num_partitions": 12,
-- "estimated_speedup": "2.8x"
-- }
-- }
```
## Performance Tuning
### Worker Count Optimization
```sql
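-- Benchmark different worker counts
DO $$
DECLARE
  start_ts TIMESTAMPTZ;
  exec_ms FLOAT;
BEGIN
  FOR workers IN 1..8 LOOP
    -- SET cannot take a PL/pgSQL variable, so use set_config();
    -- the third argument (true) limits the change to this transaction
    PERFORM set_config('max_parallel_workers_per_gather', workers::text, true);
    start_ts := clock_timestamp();
    -- EXECUTE replans on every iteration, so the new worker limit is seen
    EXECUTE $q$
      SELECT count(*) FROM (
        SELECT embedding <-> '[...]'::vector AS dist
        FROM embeddings
        ORDER BY dist LIMIT 100
      ) sub
    $q$;
    -- now() is fixed at transaction start, so time each run with clock_timestamp()
    exec_ms := extract(epoch FROM clock_timestamp() - start_ts) * 1000;
    RAISE NOTICE 'Workers: %, Time: % ms', workers, round(exec_ms::numeric, 2);
  END LOOP;
END $$;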
```
### Partition Tuning
The number of partitions affects load balancing:
- **Too few partitions**: Poor load distribution
- **Too many partitions**: Higher overhead
RuVector uses **3x workers** as default partition count.
```sql
-- Check partition statistics
SELECT
num_workers,
num_partitions,
total_results,
completed_workers
FROM ruvector_parallel_stats();
```
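
The trade-off between partition count and load balance can be seen in a small simulation. This is a simplified sketch under stated assumptions: it models idle workers claiming the next partition from a shared queue, which approximates but does not reproduce RuVector's partitioning, and the helper names `make_partitions` and `makespan` as well as the cost figures are hypothetical.

```python
import heapq


def make_partitions(total_pages, num_workers, per_worker=3):
    """Split [0, total_pages) into num_workers * per_worker contiguous ranges."""
    n = max(1, num_workers * per_worker)
    base, extra = divmod(total_pages, n)
    parts, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        parts.append((start, start + size))
        start += size
    return parts


def makespan(costs, num_workers):
    """Finish time of the slowest worker when idle workers claim the next partition."""
    finish = [0.0] * num_workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


parts = make_partitions(10_000, num_workers=4)
print(len(parts), parts[0], parts[-1])

# Skewed workload: the first quarter of the index is 3x as expensive to scan.
coarse = [9, 3, 3, 3]          # one partition per worker
fine = [3, 3, 3] + [1] * 9     # three partitions per worker, same total work
print(makespan(coarse, 4), makespan(fine, 4))
```

With one partition per worker, the worker holding the expensive range finishes last while the others idle; splitting the same work into three partitions per worker lets the idle workers absorb the remainder.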
### Cost Model Tuning
```sql
-- Adjust costs for your workload
SET parallel_setup_cost = 500; -- Lower = more likely to parallelize
SET parallel_tuple_cost = 0.05; -- Lower = favor parallel execution
-- Monitor query planning
EXPLAIN (ANALYZE, VERBOSE, COSTS)
SELECT * FROM embeddings
ORDER BY embedding <-> '[...]'::vector
LIMIT 50;
```
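
To see why lowering these two settings makes parallel plans more likely, consider how the planner charges for a `Gather` node: it adds `parallel_setup_cost` once and `parallel_tuple_cost` for every row passed from workers to the leader. The sketch below is a deliberately simplified model of that comparison. The function names and the plan costs in the example are hypothetical, and the real planner weighs many more factors.

```python
def gather_overhead(rows_through_gather, parallel_setup_cost=1000.0, parallel_tuple_cost=0.1):
    """Extra planner cost charged for collecting rows from parallel workers."""
    return parallel_setup_cost + parallel_tuple_cost * rows_through_gather


def prefers_parallel(serial_cost, parallel_subplan_cost, rows_through_gather, **settings):
    """True when the parallel plan's total estimated cost beats the serial plan."""
    total = parallel_subplan_cost + gather_overhead(rows_through_gather, **settings)
    return total < serial_cost


# Hypothetical plan costs: serial scan 2500, per-worker subplan 1200, 4000 rows gathered.
with_defaults = prefers_parallel(2500, 1200, 4000)
with_tuned = prefers_parallel(2500, 1200, 4000,
                              parallel_setup_cost=500.0, parallel_tuple_cost=0.05)
print(with_defaults, with_tuned)
```

In this made-up case the default settings tip the decision toward the serial plan, while the tuned values from the block above make the parallel plan cheaper.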
## Performance Characteristics
### Speedup by Index Size
| Index Size | Tuples | Sequential (ms) | Parallel, 4 workers (ms) | Speedup |
|------------|--------|-----------------|--------------------------|---------|
| 100 MB | 10K | 8.2 | 8.5 | 0.96x |
| 500 MB | 50K | 42.1 | 17.3 | 2.4x |
| 2 GB | 200K | 165.3 | 52.8 | 3.1x |
| 10 GB | 1M | 891.2 | 247.6 | 3.6x |
### Speedup by Query Complexity
| k | ef_search | Sequential (ms) | Parallel (ms) | Speedup |
|-----|-----------|-----------------|---------------|---------|
| 10 | 40 | 45.2 | 28.3 | 1.6x |
| 50 | 100 | 89.7 | 31.2 | 2.9x |
| 100 | 200 | 178.4 | 51.7 | 3.5x |
| 500 | 500 | 623.1 | 168.9 | 3.7x |
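
The "Speedup by Index Size" figures can be turned into parallel efficiency and an implied serial fraction. The sketch below recomputes them from the table's timings using Amdahl's law. It assumes exactly 4 processes share the work and ignores any participation by the leader process, so the results are rough indicators only.

```python
WORKERS = 4

# (index size, sequential ms, parallel ms) copied from the table above.
rows = [
    ("100 MB", 8.2, 8.5),
    ("500 MB", 42.1, 17.3),
    ("2 GB", 165.3, 52.8),
    ("10 GB", 891.2, 247.6),
]


def speedup(seq_ms, par_ms):
    return seq_ms / par_ms


def efficiency(seq_ms, par_ms, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup(seq_ms, par_ms) / workers


def serial_fraction(s, n):
    """Serial fraction implied by Amdahl's law, S = 1 / (f + (1 - f) / n).

    Values above 1 mean the parallel run was slower than the sequential one.
    """
    return (1 / s - 1 / n) / (1 - 1 / n)


for label, seq_ms, par_ms in rows:
    s = speedup(seq_ms, par_ms)
    e = efficiency(seq_ms, par_ms, WORKERS)
    f = serial_fraction(s, WORKERS)
    print(f"{label:>7}  speedup={s:.2f}  efficiency={e:.2f}  serial_fraction={f:.2f}")
```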
## Background Worker
### Starting the Background Worker
```sql
-- Start background maintenance worker
SELECT ruvector_bgworker_start();
-- Check status
SELECT * FROM ruvector_bgworker_status();
-- Returns:
-- {
-- "running": true,
-- "cycles_completed": 47,
-- "indexes_maintained": 235,
-- "last_maintenance": 1701234567
-- }
```
### Configuration
```sql
-- Configure maintenance intervals and operations
SELECT ruvector_bgworker_config(
maintenance_interval_secs := 300, -- 5 minutes
auto_optimize := true,
collect_stats := true,
auto_vacuum := true
);
```
### Maintenance Operations
The background worker performs:
1. **Statistics Collection**
   - Index size tracking
   - Fragmentation analysis
   - Query performance metrics
2. **Automatic Optimization**
   - HNSW graph refinement
   - IVFFlat centroid recomputation
   - Dead tuple removal
3. **Vacuum Operations**
   - Reclaim deleted space
   - Update index statistics
   - Compact memory
## Monitoring
### Real-Time Statistics
```sql
-- Overall parallel execution stats
SELECT * FROM ruvector_parallel_stats();
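-- Per-query monitoring (requires the pg_stat_statements extension;
-- PostgreSQL 12 and earlier name these columns total_time and mean_time)
-- pg_stat_statements has no workers_used column; read worker counts
-- from EXPLAIN (ANALYZE) output instead
SELECT
query,
calls,
total_exec_time,
mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%<->%'
ORDER BY total_exec_time DESC;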
```
### Performance Analysis
```sql
-- Benchmark parallel vs sequential
SELECT * FROM ruvector_benchmark_parallel(
'embeddings', -- table
'embedding', -- column
'[0.1, 0.2, ...]'::vector, -- query
100 -- k
);
-- Returns detailed comparison:
-- {
-- "sequential": {"time_ms": 45.2},
-- "parallel": {
-- "time_ms": 18.7,
-- "workers": 4,
-- "speedup": "2.42x"
-- }
-- }
```
## Best Practices
### When to Use Parallel Queries
**Good candidates:**
- Large indexes (>100,000 vectors)
- High-dimensional vectors (>128 dims)
- Large k values (>50)
- High ef_search (>100)
- Production OLAP workloads
**Avoid for:**
- Small indexes (<10,000 vectors)
- Small k values (<10)
- OLTP with many concurrent small queries
- Memory-constrained systems
### Optimization Checklist
1. **Configure PostgreSQL Settings**
```sql
SET max_parallel_workers_per_gather = 4;
SET work_mem = '256MB';
-- shared_buffers cannot be changed with SET; it needs a server restart
ALTER SYSTEM SET shared_buffers = '8GB';
```
2. **Monitor Worker Efficiency**
```sql
-- Check if workers are balanced
SELECT * FROM ruvector_parallel_stats();
```
3. **Tune Index Parameters**
```sql
-- For HNSW
CREATE INDEX ... WITH (
m = 16, -- Connection count
ef_construction = 64, -- Build quality
ef_search = 40 -- Query quality
);
```
4. **Enable Background Maintenance**
```sql
SELECT ruvector_bgworker_start();
```
## Troubleshooting
### Parallel Query Not Activating
**Check settings:**
```sql
SHOW max_parallel_workers_per_gather;
SHOW parallel_setup_cost;
SHOW min_parallel_table_scan_size;
```
**Force parallel mode (testing only):**
```sql
-- Named debug_parallel_query in PostgreSQL 16 and later
SET force_parallel_mode = ON;
```
### Poor Parallel Speedup
**Possible causes:**
1. **Too few tuples**: Overhead dominates
```sql
SELECT count(*) FROM embeddings; -- Should be >10,000
```
2. **Memory constraints**: Workers competing for resources
```sql
SET work_mem = '512MB'; -- Increase per-worker memory
```
3. **Lock contention**: Concurrent writes blocking readers
```sql
-- Separate read/write workloads
```
### High Memory Usage
```sql
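-- Monitor memory per worker
-- PostgreSQL has no pg_backend_memory_usage(); instead, dump each worker's
-- memory contexts to the server log (PostgreSQL 14+, superuser by default)
SELECT
pid,
backend_type,
pg_log_backend_memory_contexts(pid) AS logged
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';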
-- Reduce workers if needed
SET max_parallel_workers_per_gather = 2;
```
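
A quick upper-bound estimate helps when choosing between a larger `work_mem` and fewer workers. In PostgreSQL, `work_mem` applies per sort or hash operation and per process, so each parallel worker can use its own allowance. The sketch below multiplies this out. The function name and the example figures are hypothetical, and actual usage is usually well below this bound.

```python
def worst_case_work_mem_mb(work_mem_mb, workers_per_gather,
                           memory_nodes=1, concurrent_queries=1):
    """Upper bound on work_mem consumption in MB.

    memory_nodes is the number of sort/hash operations per plan that can
    each claim a full work_mem allowance.
    """
    processes = workers_per_gather + 1  # parallel workers plus the leader
    return work_mem_mb * processes * memory_nodes * concurrent_queries


# Four concurrent k-NN queries with one sort each.
four_workers = worst_case_work_mem_mb(512, 4, concurrent_queries=4)
two_workers = worst_case_work_mem_mb(512, 2, concurrent_queries=4)
print(four_workers, two_workers)
```

Under these assumptions, dropping from 4 to 2 workers per gather lowers the bound from 10240 MB to 6144 MB.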
## Advanced Features
### Custom Parallelization
```sql
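-- Override automatic estimation
-- (hint comments need the pg_hint_plan extension; core PostgreSQL ignores them.
-- Without it, use: ALTER TABLE embeddings SET (parallel_workers = 8);)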
SELECT /*+ Parallel(embeddings 8) */
id, embedding <-> '[...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 100;
```
### Partition-Aware Queries
```sql
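-- Query specific partitions in parallel
-- (ORDER BY on a UNION may only name output columns, so wrap it in a subquery)
SELECT *
FROM (
SELECT * FROM embeddings_2024_01
UNION ALL
SELECT * FROM embeddings_2024_02
) AS recent
ORDER BY embedding <-> '[...]'::vector
LIMIT 100;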
```
### Integration with Connection Pooling
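```ini
; PgBouncer configuration (pgbouncer.ini)
[databases]
mydb = host=localhost pool_mode=transaction

[pgbouncer]
max_db_connections = 20
default_pool_size = 5
; Extra server connections for bursts. Parallel workers are PostgreSQL
; background processes, not pooled connections, so also size
; max_worker_processes and max_parallel_workers (4 workers * 4 queries = 16).
; With pool_mode=transaction, session-level SET may not persist; use SET LOCAL.
reserve_pool_size = 16
```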
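
Parallel workers are server-side background processes capped by `max_parallel_workers`, so the pool size and the worker limits should be budgeted together. The sketch below estimates peak backend processes and how many requested workers would not be launched. It assumes one `Gather` node per query and that every pooled connection runs a parallel query at once, and the function name is hypothetical.

```python
def peak_backend_processes(pooled_connections, workers_per_gather, max_parallel_workers):
    """Return (peak process count, workers requested but not launched)."""
    requested = pooled_connections * workers_per_gather
    # PostgreSQL launches at most max_parallel_workers across all sessions;
    # queries that cannot get workers run with fewer, or with none.
    launched = min(requested, max_parallel_workers)
    return pooled_connections + launched, requested - launched


# 4 concurrent queries, 4 workers each, max_parallel_workers at its default of 8.
processes, shortfall = peak_backend_processes(4, 4, 8)
print(processes, shortfall)
```

Here the 16 requested workers exceed the limit of 8, so half of them would not start and the affected queries would run with reduced parallelism.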
## References
- [PostgreSQL Parallel Query Documentation](https://www.postgresql.org/docs/current/parallel-query.html)
- [RuVector Architecture](./architecture.md)
- [HNSW Index Guide](./hnsw-index.md)
- [Performance Tuning](./performance-tuning.md)
## Summary
RuVector's parallel query execution provides:
- **2-4x speedup** for large indexes and complex queries
- **Automatic optimization** with background worker
- **Zero configuration** for most workloads
- **Full PostgreSQL compatibility** with standard parallel query infrastructure
For optimal performance, ensure your index is sufficiently large (>100K vectors) and tune `max_parallel_workers_per_gather` based on your hardware.