# HNSW Index - Quick Reference Guide

## Installation

```bash
# Build and install (run from the repository root)
cd crates/ruvector-postgres
cargo pgrx install

# Enable in the target database
psql -d your_database -c "CREATE EXTENSION ruvector;"
```

## Index Creation

```sql
-- L2 distance (default)
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops);

-- With custom parameters
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (m = 32, ef_construction = 128);

-- Cosine distance
CREATE INDEX ON table USING hnsw (column hnsw_cosine_ops);

-- Inner product
CREATE INDEX ON table USING hnsw (column hnsw_ip_ops);
```

## Query Syntax

```sql
-- L2 distance
SELECT * FROM table ORDER BY column <-> query_vector LIMIT 10;

-- Cosine distance
SELECT * FROM table ORDER BY column <=> query_vector LIMIT 10;

-- Inner product
SELECT * FROM table ORDER BY column <#> query_vector LIMIT 10;
```

## Parameters

### Index Build Parameters

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `m` | 16 | 2-128 | Maximum connections per node in each layer |
| `ef_construction` | 64 | 4-1000 | Candidate list size during index build |

### Query Parameters

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `ruvector.ef_search` | 40 | 1-1000 | Candidate list size during search |

```sql
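-- Set globally (written to postgresql.auto.conf; applies after a reload)
ALTER SYSTEM SET ruvector.ef_search = 100;
SELECT pg_reload_conf();

-- Set per session
SET ruvector.ef_search = 100;

-- Set per transaction (reverts at COMMIT or ROLLBACK)
BEGIN;
SET LOCAL ruvector.ef_search = 100;
-- ... run queries ...
COMMIT;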
```

## Distance Metrics

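| Metric | Operator | Use Case | Formula |
|--------|----------|----------|---------|
| L2 (Euclidean) | `<->` | General-purpose distance | √(Σ(a-b)²) |
| Cosine | `<=>` | Direction similarity (ignores magnitude) | 1-(a·b)/(‖a‖‖b‖) |
| Inner Product | `<#>` | Maximum inner product search | -Σ(a*b) |

All three operators return a distance. Smaller values therefore mean closer vectors, and `ORDER BY ... LIMIT` sorts in ascending order. This is why cosine similarity is expressed as `1 - similarity` and the inner product is negated.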
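
Here is a worked check of the formulas, using the same vectors as the Operators Reference section: a = (1, 2, 3) and b = (4, 5, 6). The values were computed by hand from the tabulated formulas. If the operators implement exactly these definitions, the extension's output should match up to `real` (single-precision) rounding.

$$
\begin{aligned}
\text{L2:}\quad & \sqrt{(1-4)^2 + (2-5)^2 + (3-6)^2} = \sqrt{27} \approx 5.196 \\
\text{Cosine:}\quad & 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = 1 - \frac{32}{\sqrt{14}\,\sqrt{77}} \approx 1 - 0.9746 = 0.0254 \\
\text{Inner product:}\quad & -(1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6) = -32
\end{aligned}
$$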

## Performance Tuning

### For Better Recall

```sql
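-- Increase ef_search (query time only; no rebuild needed)
SET ruvector.ef_search = 100;

-- Rebuild with a higher ef_construction
DROP INDEX IF EXISTS index_name;
CREATE INDEX index_name ON table USING hnsw (column hnsw_l2_ops)
    WITH (ef_construction = 200);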
```

### For Faster Build

```sql
-- Give the build more memory (set before CREATE INDEX)
SET maintenance_work_mem = '4GB';

-- Use a lower ef_construction
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (ef_construction = 32);
```

### For Less Memory

```sql
-- Use a lower m
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (m = 8);
```

## Common Queries

### Basic Similarity Search

```sql
SELECT id, column <-> query_vector AS dist
FROM table
ORDER BY column <-> query_vector
LIMIT 10;
```

### Filtered Search

```sql
SELECT id, column <-> query_vector AS dist
FROM table
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY column <-> query_vector
LIMIT 10;
```

### Hybrid Search

```sql
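-- search_query is a tsquery, e.g. plainto_tsquery('english', 'some words')
-- Ordering is by the combined score, so the HNSW index does not drive the sort
SELECT
    id,
    0.3 * ts_rank(to_tsvector(text_column), search_query)
  + 0.7 * (1 / (1 + (column <-> query_vector))) AS score
FROM table
WHERE text_column @@ search_query
ORDER BY score DESC
LIMIT 10;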
```
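
The hybrid score blends the two signals as shown below, where r is the full-text rank and d is the vector distance. The 1/(1 + d) term turns a distance (smaller is better) into a similarity in (0, 1] (larger is better), so both terms point the same way. The 0.3/0.7 weights are the example's choice, not tuned values.

$$
\text{score} = 0.3\,r + 0.7 \cdot \frac{1}{1 + d}
\qquad \text{e.g. } r = 0.5,\ d = 0.25:\quad 0.3(0.5) + \frac{0.7}{1.25} = 0.15 + 0.56 = 0.71
$$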

## Maintenance

```sql
-- View statistics
SELECT ruvector_memory_stats();

-- Perform maintenance
SELECT ruvector_index_maintenance('index_name');

-- Vacuum
VACUUM ANALYZE table;

-- Rebuild index
REINDEX INDEX index_name;
```

## Monitoring

```sql
-- Check index size
SELECT pg_size_pretty(pg_relation_size('index_name'));

-- Explain query
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table ORDER BY column <-> query_vector LIMIT 10;
```

## Operators Reference

```sql
-- Distance operators
SELECT ARRAY[1,2,3]::real[] <-> ARRAY[4,5,6]::real[];  -- L2
SELECT ARRAY[1,2,3]::real[] <=> ARRAY[4,5,6]::real[];  -- Cosine
SELECT ARRAY[1,2,3]::real[] <#> ARRAY[4,5,6]::real[];  -- Inner product

-- Vector utilities
SELECT vector_normalize(ARRAY[3,4]::real[]);                -- Normalize to unit length
SELECT vector_norm(ARRAY[3,4]::real[]);                     -- L2 norm
SELECT vector_add(ARRAY[1,2]::real[], ARRAY[3,4]::real[]);  -- Element-wise add
SELECT vector_sub(ARRAY[1,2]::real[], ARRAY[3,4]::real[]);  -- Element-wise subtract
```

## Typical Performance

| Vectors | Dimensions | Build Time | Query Time | Memory |
|---------|------------|------------|------------|--------|
| 10K | 128 | ~1s | <1ms | ~10MB |
| 100K | 128 | ~20s | ~2ms | ~100MB |
| 1M | 128 | ~5min | ~5ms | ~1GB |
| 10M | 128 | ~1hr | ~10ms | ~10GB |
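
As a rough sanity check on the Memory column: a PostgreSQL `real` is 4 bytes, so the raw vector payload alone is N × d × 4 bytes. The figures below assume vectors are stored as single-precision floats. Attributing the rest of the listed memory to graph neighbor lists and page overhead is an assumption, not a measured breakdown.

$$
\begin{aligned}
10\text{K} \times 128 \times 4\ \text{B} &= 5.12\ \text{MB} \quad (\text{table: } \sim 10\ \text{MB}) \\
1\text{M} \times 128 \times 4\ \text{B} &= 512\ \text{MB} \quad (\text{table: } \sim 1\ \text{GB}) \\
10\text{M} \times 128 \times 4\ \text{B} &= 5.12\ \text{GB} \quad (\text{table: } \sim 10\ \text{GB})
\end{aligned}
$$

In each row the payload is about half of the listed memory. The remaining half also appears to scale linearly with the number of vectors, which fits per-node graph links whose size depends on `m`.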

## Parameter Recommendations

### Small Dataset (<100K vectors)

```sql
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (m = 16, ef_construction = 64);
SET ruvector.ef_search = 40;
```

### Medium Dataset (100K-1M vectors)

```sql
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (m = 16, ef_construction = 128);
SET ruvector.ef_search = 64;
```

### Large Dataset (>1M vectors)

```sql
CREATE INDEX ON table USING hnsw (column hnsw_l2_ops)
    WITH (m = 32, ef_construction = 200);
SET ruvector.ef_search = 100;
```

## Troubleshooting

### Slow Queries

- ✓ Increase `ef_search`
- ✓ Check index exists: `\d table`
- ✓ Analyze query: `EXPLAIN ANALYZE`

### Low Recall

- ✓ Increase `ef_search`
- ✓ Rebuild with higher `ef_construction`
- ✓ Use higher `m` value
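
To check whether a change such as raising `ef_search` actually improves recall, you can measure recall@10 for a single query by comparing the index results against an exact scan. This is a sketch that uses standard PostgreSQL planner settings (`enable_indexscan`, `enable_indexonlyscan`) together with the same `table`/`column`/`query_vector` placeholders used above. The temporary table names `exact_top10` and `approx_top10` are hypothetical.

```sql
BEGIN;

-- Exact top-10: with index scans disabled, the planner sorts the whole table
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
CREATE TEMP TABLE exact_top10 ON COMMIT DROP AS
SELECT id FROM table ORDER BY column <-> query_vector LIMIT 10;

-- Approximate top-10: re-enable index scans so the HNSW index is used
SET LOCAL enable_indexscan = on;
SET LOCAL enable_indexonlyscan = on;
SET LOCAL ruvector.ef_search = 40;  -- value under test
CREATE TEMP TABLE approx_top10 ON COMMIT DROP AS
SELECT id FROM table ORDER BY column <-> query_vector LIMIT 10;

-- Share of exact neighbors the index also returned (assumes >= 10 rows)
SELECT count(*)::float8 / 10 AS recall_at_10
FROM exact_top10 JOIN approx_top10 USING (id);

COMMIT;
```

Repeat this with several query vectors and `ef_search` values. Recall should rise with `ef_search` at the cost of query time. Running `EXPLAIN` on the approximate query confirms that the index is actually used.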

### Out of Memory

- ✓ Lower `m` value
- ✓ Increase `maintenance_work_mem`
- ✓ Build index in batches

### Index Build Fails

- ✓ Check data quality (no NULLs)
- ✓ Verify dimensions match
- ✓ Increase `maintenance_work_mem`

## Files and Documentation

Paths are relative to the repository root.

- **Implementation**: `crates/ruvector-postgres/src/index/hnsw_am.rs`
- **SQL**: `crates/ruvector-postgres/sql/hnsw_index.sql`
- **Tests**: `crates/ruvector-postgres/tests/hnsw_index_tests.sql`
- **Docs**: `docs/HNSW_INDEX.md`
- **Examples**: `docs/HNSW_USAGE_EXAMPLE.md`
- **Summary**: `docs/HNSW_IMPLEMENTATION_SUMMARY.md`

## Version Info

- **Implementation Version**: 1.0
- **PostgreSQL**: 14, 15, 16, 17
- **Extension**: ruvector 0.1.0
- **pgrx**: 0.12.x

## Support

- GitHub: https://github.com/ruvnet/ruvector
- Issues: https://github.com/ruvnet/ruvector/issues
- Docs: `docs/` in the repository

---

**Last Updated**: December 2, 2025