Files
wifi-densepose/docs/postgres/operator-quick-reference.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

170 lines
4.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# RuVector Distance Operators - Quick Reference
## 🚀 Zero-Copy Operators (Use These!)
All operators use SIMD-optimized zero-copy access automatically.
### SQL Operators
```sql
-- L2 (Euclidean) Distance
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 10;
-- Inner Product (Maximum similarity)
SELECT * FROM items ORDER BY embedding <#> '[1,2,3]' LIMIT 10;
-- Cosine Distance (Semantic similarity)
SELECT * FROM items ORDER BY embedding <=> '[1,2,3]' LIMIT 10;
-- L1 (Manhattan) Distance
SELECT * FROM items ORDER BY embedding <+> '[1,2,3]' LIMIT 10;
```
### Function Forms
```sql
-- When you need the distance value explicitly
SELECT
id,
ruvector_l2_distance(embedding, '[1,2,3]') as l2_dist,
ruvector_ip_distance(embedding, '[1,2,3]') as ip_dist,
ruvector_cosine_distance(embedding, '[1,2,3]') as cos_dist,
ruvector_l1_distance(embedding, '[1,2,3]') as l1_dist
FROM items;
```
## 📊 Operator Comparison
| Operator | Math Formula | Range | Best For |
|----------|--------------|-------|----------|
| `<->` | `√Σ(aᵢ-bᵢ)²` | [0, ∞) | General similarity, geometry |
| `<#>` | `-Σ(aᵢ×bᵢ)` | (-∞, ∞) | MIPS, recommendations |
| `<=>` | `1-(a·b)/(‖a‖‖b‖)` | [0, 2] | Text, semantic search |
| `<+>` | `Σ\|aᵢ-bᵢ\|` | [0, ∞) | Sparse vectors, L1 norm |
## 💡 Common Patterns
### Nearest Neighbors
```sql
-- Find 10 nearest neighbors
SELECT id, content, embedding <-> $query AS dist
FROM documents
ORDER BY embedding <-> $query
LIMIT 10;
```
### Filtered Search
```sql
-- Search within a category
SELECT * FROM products
WHERE category = 'electronics'
ORDER BY embedding <=> $query
LIMIT 20;
```
### Distance Threshold
```sql
-- Find all items within distance 0.5
SELECT * FROM items
WHERE embedding <-> $query < 0.5;
```
### Batch Distances
```sql
-- Compare one vector against many
SELECT id, embedding <-> '[1,2,3]' AS distance
FROM items
WHERE id IN (1, 2, 3, 4, 5);
```
## 🏗️ Index Creation
```sql
-- HNSW index (best for most cases)
CREATE INDEX ON items USING hnsw (embedding ruvector_l2_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat index (good for large datasets)
CREATE INDEX ON items USING ivfflat (embedding ruvector_cosine_ops)
WITH (lists = 100);
```
## ⚡ Performance Tips
1. **Use RuVector type, not arrays**: `ruvector` type enables zero-copy
2. **Create indexes**: Essential for large datasets
3. **Normalize for cosine**: Pre-normalize vectors if using cosine often
4. **Check SIMD**: Run `SELECT ruvector_simd_info()` to verify acceleration
## 🔄 Migration from pgvector
RuVector operators are **drop-in compatible** with pgvector:
```sql
-- pgvector syntax works unchanged
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 10;
-- Just change the type from 'vector' to 'ruvector'
ALTER TABLE items ALTER COLUMN embedding TYPE ruvector(384);
```
## 📏 Dimension Support
- **Maximum**: 16,000 dimensions
- **Recommended**: 128-2048 for most use cases
- **Performance**: Optimal at multiples of 16 (AVX-512) or 8 (AVX2)
## 🐛 Debugging
```sql
-- Check SIMD support
SELECT ruvector_simd_info();
-- Verify vector dimensions
SELECT array_length(embedding::float4[], 1) FROM items LIMIT 1;
-- Test distance calculation
SELECT '[1,2,3]'::ruvector <-> '[4,5,6]'::ruvector;
-- Should return: 5.196152 (≈√27)
```
## 🎯 Choosing the Right Metric
| Your Data | Recommended Operator |
|-----------|---------------------|
| Text embeddings (BERT, OpenAI) | `<=>` (cosine) |
| Image features (ResNet, CLIP) | `<->` (L2) |
| Recommender systems | `<#>` (inner product) |
| Document vectors (TF-IDF) | `<=>` (cosine) |
| Sparse features | `<+>` (L1) |
| General floating-point | `<->` (L2) |
## ✅ Validation
```sql
-- Test basic functionality
CREATE TEMP TABLE test_vectors (v ruvector(3));
INSERT INTO test_vectors VALUES ('[1,2,3]'), ('[4,5,6]');
-- Should return distances
SELECT a.v <-> b.v AS l2,
a.v <#> b.v AS ip,
a.v <=> b.v AS cosine,
a.v <+> b.v AS l1
FROM test_vectors a, test_vectors b
WHERE a.v <> b.v;
```
Expected output:
```
l2 | ip | cosine | l1
---------+---------+----------+------
5.19615 | -32.000 | 0.025368 | 9.00
```
## 📚 Further Reading
- [Complete Documentation](./zero-copy-operators.md)
- [SIMD Implementation](../crates/ruvector-postgres/src/distance/simd.rs)
- [Benchmarks](../benchmarks/distance_bench.md)