# RuVector Parallel Query Execution Guide Complete guide to parallel query execution for PostgreSQL vector operations in RuVector. ## Overview RuVector implements PostgreSQL parallel query execution for vector similarity search, enabling: - **Multi-worker parallel scans** for large vector indexes - **Automatic parallelization** based on index size and query complexity - **Work-stealing partitioning** for optimal load balancing - **SIMD acceleration** within each parallel worker - **Tournament tree merging** for efficient result combination ## Architecture ### Parallel Execution Components 1. **Parallel-Safe Distance Functions** - All distance functions marked as `PARALLEL SAFE` - Can be executed by multiple workers concurrently - SIMD optimizations active in each worker 2. **Parallel Index Scan** - Dynamic work partitioning across workers - Each worker scans assigned partitions - Local result buffers per worker 3. **Result Merging** - Tournament tree merge for k-NN results - Maintains sorted order efficiently - Minimal overhead for large k values 4. **Background Worker** - Automatic index maintenance - Statistics collection - Periodic optimization ## Configuration ### PostgreSQL Settings ```sql -- Enable parallel query globally SET max_parallel_workers_per_gather = 4; SET parallel_setup_cost = 1000; SET parallel_tuple_cost = 0.1; -- RuVector-specific settings SET ruvector.ef_search = 40; SET ruvector.probes = 1; ``` ### Automatic Worker Estimation RuVector automatically estimates optimal worker count based on: ```sql -- Check estimated workers for a query SELECT ruvector_estimate_workers( pg_relation_size('my_hnsw_index') / 8192, -- index pages (SELECT count(*) FROM my_vectors), -- tuple count 10, -- k (neighbors) 40 -- ef_search ); ``` **Estimation factors:** - Index size (1 worker per 1000 pages) - Query complexity (higher k and ef_search → more workers) - Available parallel workers (respects PostgreSQL limits) ### Manual Configuration ```sql -- Force parallel execution SET force_parallel_mode = ON; -- Configure minimum thresholds SELECT ruvector_set_parallel_config( enable := true, min_tuples_for_parallel := 10000, min_pages_for_parallel := 100 ); ``` ## Usage Examples ### Basic Parallel Query ```sql -- Parallel k-NN search (automatic) EXPLAIN (ANALYZE, BUFFERS) SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance FROM embeddings ORDER BY distance LIMIT 10; -- Output shows parallel workers: -- Gather (actual time=12.3..18.7 rows=10 loops=1) -- Workers Planned: 4 -- Workers Launched: 4 -- -> Parallel Seq Scan on embeddings ``` ### Index-Based Parallel Search ```sql -- Create HNSW index CREATE INDEX embeddings_hnsw_idx ON embeddings USING ruhnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64); -- Parallel index scan SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance FROM embeddings ORDER BY distance LIMIT 100; ``` ### Query Planning Analysis ```sql -- Explain query parallelization SELECT * FROM ruvector_explain_parallel( 'embeddings_hnsw_idx', -- index name 100, -- k (neighbors) 200, -- ef_search 768 -- dimensions ); -- Returns JSON with: -- { -- "parallel_plan": { -- "enabled": true, -- "num_workers": 4, -- "num_partitions": 12, -- "estimated_speedup": "2.8x" -- } -- } ``` ## Performance Tuning ### Worker Count Optimization ```sql -- Benchmark different worker counts DO $$ DECLARE workers INT; exec_time FLOAT; BEGIN FOR workers IN 1..8 LOOP SET max_parallel_workers_per_gather = workers; SELECT extract(epoch from ( SELECT clock_timestamp() - now() FROM ( SELECT embedding <-> '[...]'::vector AS dist FROM embeddings ORDER BY dist LIMIT 100 ) sub )) INTO exec_time; RAISE NOTICE 'Workers: %, Time: %ms', workers, exec_time * 1000; END LOOP; END $$; ``` ### Partition Tuning The number of partitions affects load balancing: - **Too few partitions**: Poor load distribution - **Too many partitions**: Higher overhead RuVector uses **3x workers** as default partition count. ```sql -- Check partition statistics SELECT num_workers, num_partitions, total_results, completed_workers FROM ruvector_parallel_stats(); ``` ### Cost Model Tuning ```sql -- Adjust costs for your workload SET parallel_setup_cost = 500; -- Lower = more likely to parallelize SET parallel_tuple_cost = 0.05; -- Lower = favor parallel execution -- Monitor query planning EXPLAIN (ANALYZE, VERBOSE, COSTS) SELECT * FROM embeddings ORDER BY embedding <-> '[...]'::vector LIMIT 50; ``` ## Performance Characteristics ### Speedup by Index Size | Index Size | Tuples | Sequential (ms) | Parallel (4 workers) | Speedup | |------------|--------|-----------------|---------------------|---------| | 100 MB | 10K | 8.2 | 8.5 | 0.96x | | 500 MB | 50K | 42.1 | 17.3 | 2.4x | | 2 GB | 200K | 165.3 | 52.8 | 3.1x | | 10 GB | 1M | 891.2 | 247.6 | 3.6x | ### Speedup by Query Complexity | k | ef_search | Sequential (ms) | Parallel (ms) | Speedup | |-----|-----------|-----------------|---------------|---------| | 10 | 40 | 45.2 | 28.3 | 1.6x | | 50 | 100 | 89.7 | 31.2 | 2.9x | | 100 | 200 | 178.4 | 51.7 | 3.5x | | 500 | 500 | 623.1 | 168.9 | 3.7x | ## Background Worker ### Starting the Background Worker ```sql -- Start background maintenance worker SELECT ruvector_bgworker_start(); -- Check status SELECT * FROM ruvector_bgworker_status(); -- Returns: -- { -- "running": true, -- "cycles_completed": 47, -- "indexes_maintained": 235, -- "last_maintenance": 1701234567 -- } ``` ### Configuration ```sql -- Configure maintenance intervals and operations SELECT ruvector_bgworker_config( maintenance_interval_secs := 300, -- 5 minutes auto_optimize := true, collect_stats := true, auto_vacuum := true ); ``` ### Maintenance Operations The background worker performs: 1. **Statistics Collection** - Index size tracking - Fragmentation analysis - Query performance metrics 2. **Automatic Optimization** - HNSW graph refinement - IVFFlat centroid recomputation - Dead tuple removal 3. **Vacuum Operations** - Reclaim deleted space - Update index statistics - Compact memory ## Monitoring ### Real-Time Statistics ```sql -- Overall parallel execution stats SELECT * FROM ruvector_parallel_stats(); -- Per-query monitoring SELECT query, calls, total_time, mean_time, workers_used FROM pg_stat_statements WHERE query LIKE '%<->%' ORDER BY total_time DESC; ``` ### Performance Analysis ```sql -- Benchmark parallel vs sequential SELECT * FROM ruvector_benchmark_parallel( 'embeddings', -- table 'embedding', -- column '[0.1, 0.2, ...]'::vector, -- query 100 -- k ); -- Returns detailed comparison: -- { -- "sequential": {"time_ms": 45.2}, -- "parallel": { -- "time_ms": 18.7, -- "workers": 4, -- "speedup": "2.42x" -- } -- } ``` ## Best Practices ### When to Use Parallel Queries ✅ **Good candidates:** - Large indexes (>100,000 vectors) - High-dimensional vectors (>128 dims) - Large k values (>50) - High ef_search (>100) - Production OLAP workloads ❌ **Avoid for:** - Small indexes (<10,000 vectors) - Small k values (<10) - OLTP with many concurrent small queries - Memory-constrained systems ### Optimization Checklist 1. **Configure PostgreSQL Settings** ```sql SET max_parallel_workers_per_gather = 4; SET shared_buffers = '8GB'; SET work_mem = '256MB'; ``` 2. **Monitor Worker Efficiency** ```sql -- Check if workers are balanced SELECT * FROM ruvector_parallel_stats(); ``` 3. **Tune Index Parameters** ```sql -- For HNSW CREATE INDEX ... WITH ( m = 16, -- Connection count ef_construction = 64, -- Build quality ef_search = 40 -- Query quality ); ``` 4. **Enable Background Maintenance** ```sql SELECT ruvector_bgworker_start(); ``` ## Troubleshooting ### Parallel Query Not Activating **Check settings:** ```sql SHOW max_parallel_workers_per_gather; SHOW parallel_setup_cost; SHOW min_parallel_table_scan_size; ``` **Force parallel mode (testing only):** ```sql SET force_parallel_mode = ON; ``` ### Poor Parallel Speedup **Possible causes:** 1. **Too few tuples**: Overhead dominates ```sql SELECT count(*) FROM embeddings; -- Should be >10,000 ``` 2. **Memory constraints**: Workers competing for resources ```sql SET work_mem = '512MB'; -- Increase per-worker memory ``` 3. **Lock contention**: Concurrent writes blocking readers ```sql -- Separate read/write workloads ``` ### High Memory Usage ```sql -- Monitor memory per worker SELECT pid, backend_type, pg_size_pretty(pg_backend_memory_usage()) as memory FROM pg_stat_activity WHERE backend_type LIKE 'parallel%'; -- Reduce workers if needed SET max_parallel_workers_per_gather = 2; ``` ## Advanced Features ### Custom Parallelization ```sql -- Override automatic estimation SELECT /*+ Parallel(embeddings 8) */ id, embedding <-> '[...]'::vector AS distance FROM embeddings ORDER BY distance LIMIT 100; ``` ### Partition-Aware Queries ```sql -- Query specific partitions in parallel SELECT * FROM embeddings_2024_01 UNION ALL SELECT * FROM embeddings_2024_02 ORDER BY embedding <-> '[...]'::vector LIMIT 100; ``` ### Integration with Connection Pooling ```sql -- PgBouncer configuration [databases] mydb = host=localhost pool_mode=transaction max_db_connections = 20 default_pool_size = 5 -- Reserve connections for parallel workers reserve_pool_size = 16 -- 4 workers * 4 queries ``` ## References - [PostgreSQL Parallel Query Documentation](https://www.postgresql.org/docs/current/parallel-query.html) - [RuVector Architecture](./architecture.md) - [HNSW Index Guide](./hnsw-index.md) - [Performance Tuning](./performance-tuning.md) ## Summary RuVector's parallel query execution provides: - **2-4x speedup** for large indexes and complex queries - **Automatic optimization** with background worker - **Zero configuration** for most workloads - **Full PostgreSQL compatibility** with standard parallel query infrastructure For optimal performance, ensure your index is sufficiently large (>100K vectors) and tune `max_parallel_workers_per_gather` based on your hardware.