git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
11 KiB
RuVector Parallel Query Execution Guide
Complete guide to parallel query execution for PostgreSQL vector operations in RuVector.
Overview
RuVector implements PostgreSQL parallel query execution for vector similarity search, enabling:
- Multi-worker parallel scans for large vector indexes
- Automatic parallelization based on index size and query complexity
- Work-stealing partitioning for optimal load balancing
- SIMD acceleration within each parallel worker
- Tournament tree merging for efficient result combination
Architecture
Parallel Execution Components
-
Parallel-Safe Distance Functions
- All distance functions marked as
PARALLEL SAFE - Can be executed by multiple workers concurrently
- SIMD optimizations active in each worker
- All distance functions marked as
-
Parallel Index Scan
- Dynamic work partitioning across workers
- Each worker scans assigned partitions
- Local result buffers per worker
-
Result Merging
- Tournament tree merge for k-NN results
- Maintains sorted order efficiently
- Minimal overhead for large k values
-
Background Worker
- Automatic index maintenance
- Statistics collection
- Periodic optimization
Configuration
PostgreSQL Settings
-- Enable parallel query globally
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 1000;
SET parallel_tuple_cost = 0.1;
-- RuVector-specific settings
SET ruvector.ef_search = 40;
SET ruvector.probes = 1;
Automatic Worker Estimation
RuVector automatically estimates optimal worker count based on:
-- Check estimated workers for a query
SELECT ruvector_estimate_workers(
pg_relation_size('my_hnsw_index') / 8192, -- index pages
(SELECT count(*) FROM my_vectors), -- tuple count
10, -- k (neighbors)
40 -- ef_search
);
Estimation factors:
- Index size (1 worker per 1000 pages)
- Query complexity (higher k and ef_search → more workers)
- Available parallel workers (respects PostgreSQL limits)
Manual Configuration
-- Force parallel execution
SET force_parallel_mode = ON;
-- Configure minimum thresholds
SELECT ruvector_set_parallel_config(
enable := true,
min_tuples_for_parallel := 10000,
min_pages_for_parallel := 100
);
Usage Examples
Basic Parallel Query
-- Parallel k-NN search (automatic)
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;
-- Output shows parallel workers:
-- Gather (actual time=12.3..18.7 rows=10 loops=1)
-- Workers Planned: 4
-- Workers Launched: 4
-- -> Parallel Seq Scan on embeddings
Index-Based Parallel Search
-- Create HNSW index
CREATE INDEX embeddings_hnsw_idx
ON embeddings
USING ruhnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
-- Parallel index scan
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 100;
Query Planning Analysis
-- Explain query parallelization
SELECT * FROM ruvector_explain_parallel(
'embeddings_hnsw_idx', -- index name
100, -- k (neighbors)
200, -- ef_search
768 -- dimensions
);
-- Returns JSON with:
-- {
-- "parallel_plan": {
-- "enabled": true,
-- "num_workers": 4,
-- "num_partitions": 12,
-- "estimated_speedup": "2.8x"
-- }
-- }
Performance Tuning
Worker Count Optimization
-- Benchmark different worker counts
DO $$
DECLARE
workers INT;
exec_time FLOAT;
BEGIN
FOR workers IN 1..8 LOOP
SET max_parallel_workers_per_gather = workers;
SELECT extract(epoch from (
SELECT clock_timestamp() - now()
FROM (
SELECT embedding <-> '[...]'::vector AS dist
FROM embeddings
ORDER BY dist LIMIT 100
) sub
)) INTO exec_time;
RAISE NOTICE 'Workers: %, Time: %ms', workers, exec_time * 1000;
END LOOP;
END $$;
Partition Tuning
The number of partitions affects load balancing:
- Too few partitions: Poor load distribution
- Too many partitions: Higher overhead
RuVector uses 3x workers as default partition count.
-- Check partition statistics
SELECT
num_workers,
num_partitions,
total_results,
completed_workers
FROM ruvector_parallel_stats();
Cost Model Tuning
-- Adjust costs for your workload
SET parallel_setup_cost = 500; -- Lower = more likely to parallelize
SET parallel_tuple_cost = 0.05; -- Lower = favor parallel execution
-- Monitor query planning
EXPLAIN (ANALYZE, VERBOSE, COSTS)
SELECT * FROM embeddings
ORDER BY embedding <-> '[...]'::vector
LIMIT 50;
Performance Characteristics
Speedup by Index Size
| Index Size | Tuples | Sequential (ms) | Parallel (4 workers) | Speedup |
|---|---|---|---|---|
| 100 MB | 10K | 8.2 | 8.5 | 0.96x |
| 500 MB | 50K | 42.1 | 17.3 | 2.4x |
| 2 GB | 200K | 165.3 | 52.8 | 3.1x |
| 10 GB | 1M | 891.2 | 247.6 | 3.6x |
Speedup by Query Complexity
| k | ef_search | Sequential (ms) | Parallel (ms) | Speedup |
|---|---|---|---|---|
| 10 | 40 | 45.2 | 28.3 | 1.6x |
| 50 | 100 | 89.7 | 31.2 | 2.9x |
| 100 | 200 | 178.4 | 51.7 | 3.5x |
| 500 | 500 | 623.1 | 168.9 | 3.7x |
Background Worker
Starting the Background Worker
-- Start background maintenance worker
SELECT ruvector_bgworker_start();
-- Check status
SELECT * FROM ruvector_bgworker_status();
-- Returns:
-- {
-- "running": true,
-- "cycles_completed": 47,
-- "indexes_maintained": 235,
-- "last_maintenance": 1701234567
-- }
Configuration
-- Configure maintenance intervals and operations
SELECT ruvector_bgworker_config(
maintenance_interval_secs := 300, -- 5 minutes
auto_optimize := true,
collect_stats := true,
auto_vacuum := true
);
Maintenance Operations
The background worker performs:
-
Statistics Collection
- Index size tracking
- Fragmentation analysis
- Query performance metrics
-
Automatic Optimization
- HNSW graph refinement
- IVFFlat centroid recomputation
- Dead tuple removal
-
Vacuum Operations
- Reclaim deleted space
- Update index statistics
- Compact memory
Monitoring
Real-Time Statistics
-- Overall parallel execution stats
SELECT * FROM ruvector_parallel_stats();
-- Per-query monitoring
SELECT
query,
calls,
total_time,
mean_time,
workers_used
FROM pg_stat_statements
WHERE query LIKE '%<->%'
ORDER BY total_time DESC;
Performance Analysis
-- Benchmark parallel vs sequential
SELECT * FROM ruvector_benchmark_parallel(
'embeddings', -- table
'embedding', -- column
'[0.1, 0.2, ...]'::vector, -- query
100 -- k
);
-- Returns detailed comparison:
-- {
-- "sequential": {"time_ms": 45.2},
-- "parallel": {
-- "time_ms": 18.7,
-- "workers": 4,
-- "speedup": "2.42x"
-- }
-- }
Best Practices
When to Use Parallel Queries
✅ Good candidates:
- Large indexes (>100,000 vectors)
- High-dimensional vectors (>128 dims)
- Large k values (>50)
- High ef_search (>100)
- Production OLAP workloads
❌ Avoid for:
- Small indexes (<10,000 vectors)
- Small k values (<10)
- OLTP with many concurrent small queries
- Memory-constrained systems
Optimization Checklist
-
Configure PostgreSQL Settings
SET max_parallel_workers_per_gather = 4; SET shared_buffers = '8GB'; SET work_mem = '256MB'; -
Monitor Worker Efficiency
-- Check if workers are balanced SELECT * FROM ruvector_parallel_stats(); -
Tune Index Parameters
-- For HNSW CREATE INDEX ... WITH ( m = 16, -- Connection count ef_construction = 64, -- Build quality ef_search = 40 -- Query quality ); -
Enable Background Maintenance
SELECT ruvector_bgworker_start();
Troubleshooting
Parallel Query Not Activating
Check settings:
SHOW max_parallel_workers_per_gather;
SHOW parallel_setup_cost;
SHOW min_parallel_table_scan_size;
Force parallel mode (testing only):
SET force_parallel_mode = ON;
Poor Parallel Speedup
Possible causes:
-
Too few tuples: Overhead dominates
SELECT count(*) FROM embeddings; -- Should be >10,000 -
Memory constraints: Workers competing for resources
SET work_mem = '512MB'; -- Increase per-worker memory -
Lock contention: Concurrent writes blocking readers
-- Separate read/write workloads
High Memory Usage
-- Monitor memory per worker
SELECT
pid,
backend_type,
pg_size_pretty(pg_backend_memory_usage()) as memory
FROM pg_stat_activity
WHERE backend_type LIKE 'parallel%';
-- Reduce workers if needed
SET max_parallel_workers_per_gather = 2;
Advanced Features
Custom Parallelization
-- Override automatic estimation
SELECT /*+ Parallel(embeddings 8) */
id, embedding <-> '[...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 100;
Partition-Aware Queries
-- Query specific partitions in parallel
SELECT * FROM embeddings_2024_01
UNION ALL
SELECT * FROM embeddings_2024_02
ORDER BY embedding <-> '[...]'::vector
LIMIT 100;
Integration with Connection Pooling
-- PgBouncer configuration
[databases]
mydb = host=localhost pool_mode=transaction
max_db_connections = 20
default_pool_size = 5
-- Reserve connections for parallel workers
reserve_pool_size = 16 -- 4 workers * 4 queries
References
Summary
RuVector's parallel query execution provides:
- 2-4x speedup for large indexes and complex queries
- Automatic optimization with background worker
- Zero configuration for most workloads
- Full PostgreSQL compatibility with standard parallel query infrastructure
For optimal performance, ensure your index is sufficiently large (>100K vectors) and tune max_parallel_workers_per_gather based on your hardware.