HNSW Index - Quick Reference Guide
Installation
# Build and install
cd /home/user/ruvector/crates/ruvector-postgres
cargo pgrx install
-- Enable in database (run in psql)
CREATE EXTENSION ruvector;
Index Creation
-- L2 distance (default)
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops);
-- With custom parameters
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
WITH (m = 32, ef_construction = 128);
-- Cosine distance
CREATE INDEX ON table USING hnsw (column hnsw_cosine_ops);
-- Inner product
CREATE INDEX ON table USING hnsw (column hnsw_ip_ops);
Query Syntax
-- L2 distance
SELECT * FROM table ORDER BY column <-> query_vector LIMIT 10;
-- Cosine distance
SELECT * FROM table ORDER BY column <=> query_vector LIMIT 10;
-- Inner product
SELECT * FROM table ORDER BY column <#> query_vector LIMIT 10;
Parameters
Index Build Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| `m` | 16 | 2-128 | Max connections per layer |
| `ef_construction` | 64 | 4-1000 | Build candidate list size |
Query Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| `ruvector.ef_search` | 40 | 1-1000 | Search candidate list size |
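-- Set globally (takes effect after a configuration reload)
ALTER SYSTEM SET ruvector.ef_search = 100;
SELECT pg_reload_conf();
-- Set per session
SET ruvector.ef_search = 100;
-- Set per transaction (must run inside a transaction block)
SET LOCAL ruvector.ef_search = 100;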
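The documented ranges can be checked on the client before any SQL is sent. The sketch below is one way to do that; `PARAM_RANGES`, `check_param`, and `with_clause` are hypothetical helper names, not part of ruvector, and the limits are copied from the two parameter tables.

```python
# Documented limits from the parameter tables: name -> (min, max, default).
PARAM_RANGES = {
    "m": (2, 128, 16),
    "ef_construction": (4, 1000, 64),
    "ruvector.ef_search": (1, 1000, 40),
}


def check_param(name, value):
    """Return value unchanged if it lies in the documented range, else raise."""
    low, high, _default = PARAM_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the documented range {low}-{high}")
    return value


def with_clause(m=16, ef_construction=64):
    """Build the WITH (...) clause for CREATE INDEX from validated values."""
    m = check_param("m", m)
    ef_construction = check_param("ef_construction", ef_construction)
    return f"WITH (m = {m}, ef_construction = {ef_construction})"


if __name__ == "__main__":
    print(with_clause(32, 128))
```

Validating early gives a readable error message instead of a failed `CREATE INDEX` partway through a deployment script.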
Distance Metrics
| Metric | Operator | Use Case | Formula |
|---|---|---|---|
| L2 | `<->` | General distance | √(Σ(a-b)²) |
| Cosine | `<=>` | Direction similarity | 1-(a·b)/(‖a‖‖b‖) |
| Inner Product | `<#>` | Max similarity | -Σ(a*b) |
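To make the formulas concrete, here is a minimal pure-Python sketch of the three metrics as written in the table. The function names are illustrative only; the extension's own implementation may differ in precision and in how it handles edge cases such as zero vectors.

```python
import math


def l2_distance(a, b):
    """sqrt(sum((a - b)^2)), the <-> formula."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - (a . b) / (|a| |b|), the <=> formula."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """-sum(a * b), the <#> formula; negated so that smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    print(l2_distance(a, b), cosine_distance(a, b), negative_inner_product(a, b))
```

The inner product is negated because `ORDER BY` sorts ascending: the largest raw inner product becomes the smallest value and so sorts first.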
Performance Tuning
For Better Recall
-- Increase ef_search
SET ruvector.ef_search = 100;
-- Rebuild with higher ef_construction
WITH (ef_construction = 200);
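Whether a higher `ef_search` or `ef_construction` actually helps can be checked by comparing index results against an exact scan. The sketch below computes recall@k in plain Python; it assumes the IDs returned by the index query and the vectors themselves have already been fetched into memory, and the helper names are hypothetical.

```python
import math


def exact_top_k(vectors, query, k):
    """Brute-force nearest neighbours by L2 distance; returns row indices."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result also contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


if __name__ == "__main__":
    vectors = [[0, 0], [1, 0], [0, 2], [3, 3], [5, 5]]
    truth = exact_top_k(vectors, [0, 0], k=3)
    # Pretend the index returned rows 0, 1 and 3 instead of 0, 1 and 2.
    print(truth, recall_at_k([0, 1, 3], truth))
```

Averaging this value over a sample of real queries gives a number to track while adjusting the parameters.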
For Faster Build
-- Lower ef_construction
WITH (ef_construction = 32);
-- Increase memory
SET maintenance_work_mem = '4GB';
For Less Memory
-- Lower m
WITH (m = 8);
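A back-of-envelope estimate shows why a lower `m` saves memory. The sketch assumes 4-byte `real` components, up to `2 * m` links per node on the base layer (a common HNSW convention), and 4-byte neighbour IDs; none of these layout details are confirmed for ruvector, and upper layers and page overhead are ignored, so treat the result as a rough lower bound.

```python
def estimate_index_bytes(num_vectors, dims, m, bytes_per_link=4):
    """Rough lower bound: raw vector data plus base-layer graph links.

    Assumptions (not taken from the ruvector source): float32 components,
    2 * m links per node on layer 0, fixed-size neighbour IDs.
    """
    vector_bytes = num_vectors * dims * 4
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


if __name__ == "__main__":
    for m in (8, 16, 32):
        mb = estimate_index_bytes(1_000_000, 128, m) / 1_000_000
        print(f"m={m}: about {mb:.0f} MB")
```

Under these assumptions the vector data dominates at 128 dimensions, so lowering `m` matters most when the vectors are short or the dataset is very large.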
Common Queries
Basic Similarity Search
SELECT id, column <-> query AS dist
FROM table
ORDER BY column <-> query
LIMIT 10;
Filtered Search
SELECT id, column <-> query AS dist
FROM table
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY column <-> query
LIMIT 10;
Hybrid Search
SELECT
id,
0.3 * text_rank + 0.7 * (1/(1+vector_dist)) AS score
FROM table
WHERE text_column @@ search_query
ORDER BY score DESC
LIMIT 10;
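The score expression in the hybrid query can be worked through by hand. The sketch below applies the same weights (0.3 for text, 0.7 for vector similarity) to three made-up rows; `text_rank` and `vector_dist` are placeholders in the query above, so the sample values here are purely illustrative.

```python
def hybrid_score(text_rank, vector_dist, text_weight=0.3, vector_weight=0.7):
    """Weighted sum of text rank and a distance mapped into (0, 1]."""
    return text_weight * text_rank + vector_weight * (1.0 / (1.0 + vector_dist))


# Illustrative rows only; real values come from the text and vector queries.
rows = [
    {"id": 1, "text_rank": 0.9, "vector_dist": 4.0},
    {"id": 2, "text_rank": 0.2, "vector_dist": 0.0},
    {"id": 3, "text_rank": 0.5, "vector_dist": 1.0},
]

ranked = sorted(
    rows,
    key=lambda r: hybrid_score(r["text_rank"], r["vector_dist"]),
    reverse=True,
)

if __name__ == "__main__":
    for r in ranked:
        print(r["id"], round(hybrid_score(r["text_rank"], r["vector_dist"]), 2))
```

The term `1/(1+vector_dist)` maps a distance of 0 to a similarity of 1 and shrinks toward 0 as the distance grows, which keeps both terms on a comparable scale when the text rank is also bounded by 1.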
Maintenance
-- View statistics
SELECT ruvector_memory_stats();
-- Perform maintenance
SELECT ruvector_index_maintenance('index_name');
-- Vacuum
VACUUM ANALYZE table;
-- Rebuild index
REINDEX INDEX index_name;
Monitoring
-- Check index size
SELECT pg_size_pretty(pg_relation_size('index_name'));
-- Explain query
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table ORDER BY column <-> query LIMIT 10;
Operators Reference
-- Distance operators
ARRAY[1,2,3]::real[] <-> ARRAY[4,5,6]::real[] -- L2
ARRAY[1,2,3]::real[] <=> ARRAY[4,5,6]::real[] -- Cosine
ARRAY[1,2,3]::real[] <#> ARRAY[4,5,6]::real[] -- Inner product
-- Vector utilities
vector_normalize(ARRAY[3,4]::real[]) -- Normalize
vector_norm(ARRAY[3,4]::real[]) -- L2 norm
vector_add(a::real[], b::real[]) -- Add vectors
vector_sub(a::real[], b::real[]) -- Subtract
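For reference, the arithmetic behind the vector utilities can be sketched in plain Python. These functions mirror the SQL names for readability, but they are an illustration of the maths, not the extension's code; in particular, how ruvector treats zero vectors or mismatched dimensions is an assumption here.

```python
import math


def vector_norm(v):
    """L2 norm: sqrt of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


def vector_normalize(v):
    """Scale v to unit length (raising on a zero vector is an assumption)."""
    n = vector_norm(v)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def _check_dims(a, b):
    if len(a) != len(b):
        raise ValueError("dimension mismatch")


def vector_add(a, b):
    """Element-wise sum."""
    _check_dims(a, b)
    return [x + y for x, y in zip(a, b)]


def vector_sub(a, b):
    """Element-wise difference."""
    _check_dims(a, b)
    return [x - y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(vector_norm([3.0, 4.0]), vector_normalize([3.0, 4.0]))
```

Normalizing vectors before insertion is useful with the cosine and inner-product operators, since on unit vectors the negative inner product and the cosine distance produce the same ordering.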
Typical Performance
| Dataset | Dimensions | Build Time | Query Time | Memory |
|---|---|---|---|---|
| 10K | 128 | ~1s | <1ms | ~10MB |
| 100K | 128 | ~20s | ~2ms | ~100MB |
| 1M | 128 | ~5min | ~5ms | ~1GB |
| 10M | 128 | ~1hr | ~10ms | ~10GB |
Parameter Recommendations
Small Dataset (<100K vectors)
WITH (m = 16, ef_construction = 64)
SET ruvector.ef_search = 40;
Medium Dataset (100K-1M vectors)
WITH (m = 16, ef_construction = 128)
SET ruvector.ef_search = 64;
Large Dataset (>1M vectors)
WITH (m = 32, ef_construction = 200)
SET ruvector.ef_search = 100;
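The three tiers above can be encoded as a small lookup, which is handy in provisioning scripts. `recommend_params` is a hypothetical helper; the thresholds and values are taken directly from the recommendations, with exactly 100K and exactly 1M vectors treated as medium.

```python
def recommend_params(num_vectors):
    """Map a dataset size to the recommended m, ef_construction and ef_search."""
    if num_vectors < 100_000:
        return {"m": 16, "ef_construction": 64, "ef_search": 40}
    if num_vectors <= 1_000_000:
        return {"m": 16, "ef_construction": 128, "ef_search": 64}
    return {"m": 32, "ef_construction": 200, "ef_search": 100}


if __name__ == "__main__":
    for n in (50_000, 500_000, 5_000_000):
        print(n, recommend_params(n))
```

These are starting points; measuring recall and latency on real queries should drive the final values.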
Troubleshooting
Slow Queries
- ✓ Lower `ef_search` (larger values trade speed for recall)
- ✓ Check index exists: `\d table`
- ✓ Analyze query: `EXPLAIN ANALYZE`
Low Recall
- ✓ Increase `ef_search`
- ✓ Rebuild with higher `ef_construction`
- ✓ Use higher `m` value
Out of Memory
- ✓ Lower `m` value
- ✓ Increase `maintenance_work_mem`
- ✓ Build index in batches
Index Build Fails
- ✓ Check data quality (no NULLs)
- ✓ Verify dimensions match
- ✓ Increase `maintenance_work_mem`
Files and Documentation
- Implementation: `/home/user/ruvector/crates/ruvector-postgres/src/index/hnsw_am.rs`
- SQL: `/home/user/ruvector/crates/ruvector-postgres/sql/hnsw_index.sql`
- Tests: `/home/user/ruvector/crates/ruvector-postgres/tests/hnsw_index_tests.sql`
- Docs: `/home/user/ruvector/docs/HNSW_INDEX.md`
- Examples: `/home/user/ruvector/docs/HNSW_USAGE_EXAMPLE.md`
- Summary: `/home/user/ruvector/docs/HNSW_IMPLEMENTATION_SUMMARY.md`
Version Info
- Implementation Version: 1.0
- PostgreSQL: 14, 15, 16, 17
- Extension: ruvector 0.1.0
- pgrx: 0.12.x
Support
- GitHub: https://github.com/ruvnet/ruvector
- Issues: https://github.com/ruvnet/ruvector/issues
- Docs: `/home/user/ruvector/docs/`
Last Updated: December 2, 2025