# ADR-0027: Fix HNSW Index Segmentation Fault with Parameterized Queries
## Status
**Accepted** - 2026-01-28
## Context
### Problem Statement
GitHub Issue #141 reported a **critical (P0)** bug where HNSW indexes on `ruvector(384)` columns cause PostgreSQL to crash with a segmentation fault when executing similarity queries with parameterized query vectors.
### Symptoms
1. **Warning**: `"HNSW: Could not extract query vector, using zeros"`
2. **Warning**: `"HNSW v2: Bitmap scans not supported for k-NN queries"`
3. **Fatal**: `"server process terminated by signal 11: Segmentation fault"`
### Root Cause Analysis
The bug has three contributing factors:
1. **Query Vector Extraction Failure**
   - The `hnsw_rescan` callback extracts the query vector from PostgreSQL's `orderby.sk_argument` datum
   - The extraction code only handles direct `ruvector` datums via `RuVector::from_polymorphic_datum()`
   - **Parameterized queries** (prepared statements, application drivers) pass text representations that require conversion
   - When extraction fails, the code falls back to a zero vector
2. **Invalid Zero Vector Handling**
   - A zero vector is mathematically invalid for similarity search (especially in hyperbolic/Poincaré space)
   - The HNSW search algorithm proceeds with this invalid vector without validation
   - Distance calculations with zero vectors cause undefined behavior
3. **Missing Error Handling**
   - No validation before executing HNSW search
   - Segmentation fault instead of graceful PostgreSQL error
   - No dimension mismatch checking
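To make factor 2 concrete, here is a minimal sketch that assumes the `<=>` operator computes cosine distance (this ADR does not name the metric). `cosine_distance` is a hypothetical helper written for illustration, not the extension's actual distance kernel.

```rust
/// Cosine distance between two equal-length vectors: 1 - (a·b) / (|a| * |b|).
/// Hypothetical helper for illustration only; not the extension's real kernel.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    // If either input is all zeros, dot == 0.0 and the norm product == 0.0,
    // so the division below is 0.0 / 0.0, which is NaN in IEEE 754.
    1.0 - dot / (norm_a * norm_b)
}
```

Under this assumption, a zero query vector yields `NaN` for every candidate. `NaN` compares as neither less than nor greater than any other value, so distance-ordered search logic cannot rank candidates meaningfully. This ADR does not state the exact mechanism that led from the zero vector to the segfault.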
### Impact
- **Production Adoption Blocked**: Modern applications use parameterized queries (ORMs, prepared statements, SQL injection prevention)
- **100% Reproducible**: Any parameterized HNSW query triggers the crash
- **Workaround Required**: Sequential scans with 10-15x performance penalty
## Decision
### Fix Strategy
Implement a comprehensive query vector extraction pipeline with proper validation:
#### 1. Multi-Method Query Vector Extraction
```rust
// Method 1: Direct RuVector extraction (literals, casts)
if let Some(vector) = RuVector::from_polymorphic_datum(datum, false, typoid) {
    state.query_vector = vector.as_slice().to_vec();
    state.query_valid = true;
}

// Method 2: Text parameter conversion (parameterized queries)
if !state.query_valid && is_text_type(typoid) {
    if let Some(vec) = try_convert_text_to_ruvector(datum) {
        state.query_vector = vec;
        state.query_valid = true;
    }
}

// Method 3: Validated varlena fallback
if !state.query_valid {
    // ... with size and dimension validation
}
```
#### 2. Validation Before Search
```rust
// Reject invalid queries with clear error messages
if !state.query_valid || state.query_vector.is_empty() {
    pgrx::error!("HNSW: Could not extract query vector...");
}

if is_zero_vector(&state.query_vector) {
    pgrx::error!("HNSW: Query vector is all zeros...");
}

if state.query_vector.len() != state.dimensions {
    pgrx::error!("HNSW: Dimension mismatch...");
}
```
#### 3. Track Query Validity State
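Add a `query_valid: bool` field to `HnswScanState` that records whether any extraction method succeeded, so the validation step can reject the query instead of searching with a fallback zero vector.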
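The state field and the checks can be exercised outside PostgreSQL with a sketch like the one below. It is an assumption-laden stand-in: the real `HnswScanState` has more fields, `validate_query` is a hypothetical name, and the error strings are illustrative rather than the extension's exact messages. Returning a `Result` keeps the checks testable; the caller would pass the message to `pgrx::error!`.

```rust
/// Simplified stand-in for the scan state. Only the three fields used by the
/// validation snippets in this ADR are modelled; the real struct is larger.
#[derive(Debug, Default)]
struct HnswScanState {
    query_vector: Vec<f32>,
    query_valid: bool,
    dimensions: usize,
}

/// Assumed implementation: true when every component is exactly 0.0.
fn is_zero_vector(v: &[f32]) -> bool {
    v.iter().all(|&x| x == 0.0)
}

/// Hypothetical helper: runs the three checks in the same order as the ADR
/// and returns a message instead of raising a PostgreSQL error directly.
fn validate_query(state: &HnswScanState) -> Result<(), String> {
    if !state.query_valid || state.query_vector.is_empty() {
        return Err("HNSW: could not extract query vector".to_string());
    }
    if is_zero_vector(&state.query_vector) {
        return Err("HNSW: query vector is all zeros".to_string());
    }
    if state.query_vector.len() != state.dimensions {
        return Err(format!(
            "HNSW: query vector has {} dimensions but index expects {}",
            state.query_vector.len(),
            state.dimensions
        ));
    }
    Ok(())
}
```

Because `query_valid` defaults to `false`, a state that no extraction method has touched fails the first check rather than reaching the search.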
### Changes Made
| File | Changes |
|------|---------|
| `crates/ruvector-postgres/src/index/hnsw_am.rs` | Multi-method extraction, validation, zero-vector check |
| `crates/ruvector-postgres/src/index/ivfflat_am.rs` | Same fixes applied for consistency |
### Key Functions Added/Modified
- `hnsw_rescan()` - Complete rewrite of query extraction logic
- `try_convert_text_to_ruvector()` - New function for text→ruvector conversion
- `is_zero_vector()` - New validation helper
- `ivfflat_amrescan()` - Parallel fix for IVFFlat index
- `ivfflat_try_convert_text_to_ruvector()` - IVFFlat text conversion
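As a rough idea of what the text conversion involves, the sketch below parses the `[x1,x2,...]` literal form used in this ADR's SQL examples. `parse_vector_text` is a hypothetical name, and the code works on a plain `&str`; the real `try_convert_text_to_ruvector()` receives a PostgreSQL datum and may differ in both parsing rules and error handling.

```rust
/// Hypothetical parser for the bracketed text form, e.g. "[0.1,0.2,0.3]".
/// Returns None for missing brackets, empty input, or any component that is
/// not a finite number, so the caller never sees a partially parsed vector.
fn parse_vector_text(input: &str) -> Option<Vec<f32>> {
    let inner = input.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return None;
    }
    inner
        .split(',')
        // Rust's f32 parser accepts "NaN" and "inf", so filter those out.
        .map(|part| part.trim().parse::<f32>().ok().filter(|x| x.is_finite()))
        // Collecting into Option<Vec<_>> yields None if any element is None.
        .collect()
}
```

Returning `None` on any malformed component, instead of substituting a default, mirrors the intent of this fix: a failed conversion should surface as an error, not as a silently altered query vector.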
## Consequences
### Positive
- **Parameterized queries work**: Prepared statements, ORMs, application drivers all function correctly
- **Graceful error handling**: PostgreSQL ERROR instead of segfault
- **Clear error messages**: Users understand what went wrong and how to fix it
- **Dimension validation**: Catches mismatched query/index dimensions early
- **Zero-vector protection**: Invalid queries rejected before search execution
### Negative
- **Slight overhead**: Additional validation on each query (negligible, ~1μs)
- **Text parsing**: Manual vector parsing for text parameters (only when other methods fail)
### Neutral
- **No API changes**: Existing queries continue to work unchanged
- **IVFFlat also fixed**: Consistent behavior across both index types
## Test Plan
### Unit Tests
```sql
-- 1. Literal query (baseline - should work)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0.1,0.2,0.3]'::ruvector(3) LIMIT 5;
-- 2. Prepared statement (was crashing, now works)
PREPARE search AS SELECT * FROM test_hnsw ORDER BY embedding <=> $1::ruvector(3) LIMIT 5;
EXECUTE search('[0.1,0.2,0.3]');
-- 3. Function with text parameter (was crashing, now works)
SELECT * FROM search_similar('[0.1,0.2,0.3]');
-- 4. Zero vector (was crashing, now errors gracefully)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0,0,0]'::ruvector(3) LIMIT 5;
-- ERROR: HNSW: Query vector is all zeros...
-- 5. Dimension mismatch (was undefined behavior, now errors)
SELECT * FROM test_hnsw ORDER BY embedding <=> '[0.1,0.2]'::ruvector(2) LIMIT 5;
-- ERROR: HNSW: Query vector has 2 dimensions but index expects 3
```
### Integration Tests
- Node.js pg driver with parameterized queries
- Python psycopg with prepared statements
- Rust sqlx with query parameters
- Load test with 10k concurrent parameterized queries
## Related
- **Issue**: [#141](https://github.com/ruvnet/ruvector/issues/141) - HNSW Segmentation Fault with Parameterized Queries
- **Reporter**: Mark Allen, NexaDental CTO
- **Priority**: P0 (Critical) - Production blocker
## Implementation Checklist
- [x] Fix `hnsw_rescan()` query extraction
- [x] Add `try_convert_text_to_ruvector()` helper
- [x] Add `is_zero_vector()` validation
- [x] Add `query_valid` field to scan state
- [x] Apply same fix to IVFFlat for consistency
- [x] Compile verification
- [ ] Add regression tests
- [ ] Update documentation
- [ ] Build new Docker image
- [ ] Test with production dataset (6,975 rows)
- [ ] Release v2.0.1 patch
## References
- [PostgreSQL Index AM API](https://www.postgresql.org/docs/current/indexam.html)
- [pgrx FromDatum trait](https://docs.rs/pgrx/latest/pgrx/trait.FromDatum.html)
- [pgvector parameter handling](https://github.com/pgvector/pgvector/blob/master/src/hnsw.c)