Files
wifi-densepose/crates/ruvector-postgres/docs/QUICK_REFERENCE_IVFFLAT.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

3.5 KiB

IVFFlat Index - Quick Reference

Installation

-- 1. Load extension
CREATE EXTENSION ruvector;

-- 2. Create access method (run once)
\i sql/ivfflat_am.sql

-- 3. Verify
SELECT * FROM pg_am WHERE amname = 'ruivfflat';

Create Index

-- Small dataset (< 10K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 50);

-- Medium dataset (10K-100K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- Large dataset (> 100K vectors)
CREATE INDEX idx_name ON table_name
USING ruivfflat (embedding vector_l2_ops)
WITH (lists = 500);

Distance Metrics

-- Euclidean (L2)
CREATE INDEX ON table USING ruivfflat (embedding vector_l2_ops);
SELECT * FROM table ORDER BY embedding <-> '[...]' LIMIT 10;

-- Cosine
CREATE INDEX ON table USING ruivfflat (embedding vector_cosine_ops);
SELECT * FROM table ORDER BY embedding <=> '[...]' LIMIT 10;

-- Inner Product
CREATE INDEX ON table USING ruivfflat (embedding vector_ip_ops);
SELECT * FROM table ORDER BY embedding <#> '[...]' LIMIT 10;

Performance Tuning

-- Fast (70% recall)
SET ruvector.ivfflat_probes = 1;

-- Balanced (85% recall)
SET ruvector.ivfflat_probes = 5;

-- Accurate (95% recall)
SET ruvector.ivfflat_probes = 10;

-- Very accurate (98% recall)
SET ruvector.ivfflat_probes = 20;

Common Operations

-- Get index stats
SELECT * FROM ruvector_ivfflat_stats('idx_name');

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_name'));

-- Rebuild index
REINDEX INDEX idx_name;

-- Drop index
DROP INDEX idx_name;

File Structure

Implementation Files (2,106 lines total):
├── src/index/ivfflat_am.rs (673 lines)      - Access method callbacks
├── src/index/ivfflat_storage.rs (347 lines) - Storage management
├── sql/ivfflat_am.sql (61 lines)            - SQL installation
├── docs/ivfflat_access_method.md (304 lines)- Architecture docs
├── examples/ivfflat_usage.md (472 lines)    - Usage examples
└── tests/ivfflat_am_test.sql (249 lines)    - Test suite

Key Implementation Features

PostgreSQL Access Method: Full IndexAmRoutine with all callbacks Storage Layout: Page 0 (metadata), 1-N (centroids), N+1-M (lists) K-means Clustering: K-means++ init + Lloyd's algorithm Search Algorithm: Probe nearest centroids, re-rank candidates Zero-Copy: Direct heap tuple access GUC Variables: Configurable via ruvector.ivfflat_probes Multiple Metrics: L2, Cosine, Inner Product, Manhattan

Performance Guidelines

Dataset Size Lists Probes Expected QPS Recall
10K 50 5 1000 85%
100K 100 10 500 92%
1M 500 10 250 95%
10M 1000 10 125 95%

Troubleshooting

Slow queries?

SET ruvector.ivfflat_probes = 1;  -- Reduce probes

Low recall?

SET ruvector.ivfflat_probes = 20;  -- Increase probes
-- OR
CREATE INDEX ... WITH (lists = 1000);  -- More lists

Index build fails?

-- Reduce lists if memory constrained
CREATE INDEX ... WITH (lists = 50);

Documentation

  • Architecture: docs/ivfflat_access_method.md
  • Usage Examples: examples/ivfflat_usage.md
  • Test Suite: tests/ivfflat_am_test.sql
  • Overview: README_IVFFLAT.md
  • Summary: IMPLEMENTATION_SUMMARY.md