# RuVector PostgreSQL Extension - Testing Guide

## Overview

This document describes the comprehensive test framework for ruvector-postgres, a high-performance PostgreSQL vector similarity search extension.

## Test Organization

### Test Structure

```
tests/
├── unit_vector_tests.rs             # Unit tests for RuVector type
├── unit_halfvec_tests.rs            # Unit tests for HalfVec type
├── integration_distance_tests.rs    # pgrx integration tests
├── property_based_tests.rs          # Property-based tests with proptest
├── pgvector_compatibility_tests.rs  # pgvector regression tests
├── stress_tests.rs                  # Concurrency and memory stress tests
├── simd_consistency_tests.rs        # SIMD vs scalar consistency
├── quantized_types_test.rs          # Quantized vector types
├── parallel_execution_test.rs       # Parallel query execution
└── hnsw_index_tests.sql             # SQL-level index tests
```

## Test Categories

### 1. Unit Tests

**Purpose**: Test individual components in isolation.

**Files**:
- `unit_vector_tests.rs` - RuVector type
- `unit_halfvec_tests.rs` - HalfVec type

**Coverage**:
- Vector creation and initialization
- Varlena serialization/deserialization
- Vector arithmetic operations
- String parsing and formatting
- Memory layout and alignment
- Edge cases and boundary conditions

**Example**:
```rust
#[test]
fn test_varlena_roundtrip_basic() {
    unsafe {
        let v1 = RuVector::from_slice(&[1.0, 2.0, 3.0]);
        // Serialize to a varlena and read it back.
        let varlena = v1.to_varlena();
        let v2 = RuVector::from_varlena(varlena);
        assert_eq!(v1, v2);
        // Release the varlena allocation once the comparison is done.
        pgrx::pg_sys::pfree(varlena as *mut std::ffi::c_void);
    }
}
```

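
The round-trip idea can also be tried without a PostgreSQL backend. The sketch below is an illustration only: `encode` and `decode` are hypothetical helpers, and the layout (a little-endian `u32` dimension count followed by `f32` values) is an assumption for the example, not RuVector's actual varlena format.

```rust
/// Hypothetical wire layout for illustration only (not RuVector's actual
/// varlena format): a little-endian u32 dimension count, then f32 values.
fn encode(values: &[f32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(4 + values.len() * 4);
    buf.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

/// Returns None when the buffer is truncated or its payload length
/// disagrees with the declared dimension count.
fn decode(buf: &[u8]) -> Option<Vec<f32>> {
    if buf.len() < 4 {
        return None;
    }
    let dims = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let payload = &buf[4..];
    if payload.len() != dims.checked_mul(4)? {
        return None;
    }
    Some(
        payload
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

#[test]
fn roundtrip_preserves_values() {
    let original = [1.0_f32, 2.0, 3.0];
    assert_eq!(decode(&encode(&original)).as_deref(), Some(&original[..]));
}
```

Checking the malformed case (`decode` returning `None`) alongside the happy path covers the "edge cases and boundary conditions" item with the same helper.
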
### 2. pgrx Integration Tests

**Purpose**: Test the extension running inside PostgreSQL.

**File**: `integration_distance_tests.rs`

**Coverage**:
- SQL operators (`<->`, `<=>`, `<#>`, `<+>`)
- Distance functions (L2, cosine, inner product, L1)
- SIMD consistency across vector sizes
- Error handling and validation
- Symmetry properties

**Example**:
```rust
#[pg_test]
fn test_l2_distance_basic() {
    let a = RuVector::from_slice(&[0.0, 0.0, 0.0]);
    let b = RuVector::from_slice(&[3.0, 4.0, 0.0]);
    let dist = ruvector_l2_distance(a, b);
    assert!((dist - 5.0).abs() < 1e-5);
}
```

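
As a reference for what the four operators compute, here is a plain-Rust sketch of the underlying formulas. The operator mapping in the comments follows pgvector's convention, which this extension is assumed to share given its compatibility goal; the function names are illustrative and are not the extension's API.

```rust
/// Dot product. In pgvector, `<#>` returns the *negated* inner product
/// so that smaller values mean "closer".
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance, the `<->` operator in pgvector.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Manhattan (L1) distance, the `<+>` operator in pgvector.
fn l1_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Cosine distance (`<=>`): 1 - cos(angle). Yields NaN if either input
/// has zero norm, because the division is 0/0.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norms = inner_product(a, a).sqrt() * inner_product(b, b).sqrt();
    1.0 - inner_product(a, b) / norms
}

#[test]
fn three_four_five_triangle() {
    let origin = [0.0, 0.0, 0.0];
    let point = [3.0, 4.0, 0.0];
    assert!((l2_distance(&origin, &point) - 5.0).abs() < 1e-5);
    assert!((l1_distance(&origin, &point) - 7.0).abs() < 1e-5);
}
```
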
### 3. Property-Based Tests

**Purpose**: Verify mathematical properties hold for random inputs.

**File**: `property_based_tests.rs`

**Framework**: `proptest`

**Properties Tested**:

#### Distance Functions
- Non-negativity: `d(a,b) ≥ 0`
- Symmetry: `d(a,b) = d(b,a)`
- Identity: `d(a,a) = 0`
- Triangle inequality: `d(a,c) ≤ d(a,b) + d(b,c)`
- Bounded ranges (cosine: [0,2])

#### Vector Operations
- Normalization produces unit vectors
- Addition identity: `v + 0 = v`
- Subtraction inverse: `(a + b) - b = a`
- Scalar multiplication: associativity, identity
- Dot product: commutativity
- Norm squared equals self-dot product

**Example**:
```rust
proptest! {
    #[test]
    fn prop_l2_distance_non_negative(
        v1 in prop::collection::vec(-1000.0f32..1000.0f32, 1..100),
        v2 in prop::collection::vec(-1000.0f32..1000.0f32, 1..100)
    ) {
        // The two lengths are drawn independently, so only pairs that
        // happen to have equal dimensions are checked.
        if v1.len() == v2.len() {
            let dist = euclidean_distance(&v1, &v2);
            prop_assert!(dist >= 0.0);
            prop_assert!(dist.is_finite());
        }
    }
}
```

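
The distance properties listed above can also be spot-checked without `proptest`. The following is a minimal sketch using only the standard library: `Lcg`, `l2`, and `check_l2_properties` are hypothetical names, the generator is a simple linear congruential generator rather than a proptest strategy, and there is no shrinking of failing inputs. It always draws all three vectors with the same dimension, so no trial is skipped.

```rust
/// Minimal linear congruential generator so the sketch needs no crates.
struct Lcg(u64);

impl Lcg {
    /// Next pseudo-random f32, scaled from [0, 1) into [-1000, 1000).
    fn next_f32(&mut self) -> f32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let unit = (self.0 >> 40) as f32 / (1u64 << 24) as f32;
        unit * 2000.0 - 1000.0
    }

    fn vector(&mut self, dims: usize) -> Vec<f32> {
        (0..dims).map(|_| self.next_f32()).collect()
    }
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Checks non-negativity, symmetry, identity, and the triangle inequality
/// on `trials` random triples; returns a description of the first failure.
fn check_l2_properties(seed: u64, trials: usize) -> Result<(), String> {
    let mut rng = Lcg(seed);
    for trial in 0..trials {
        let dims = 1 + trial % 64;
        let (a, b, c) = (rng.vector(dims), rng.vector(dims), rng.vector(dims));
        let (ab, ba, bc, ac) = (l2(&a, &b), l2(&b, &a), l2(&b, &c), l2(&a, &c));
        if !(ab >= 0.0 && ab.is_finite()) {
            return Err(format!("trial {}: non-negativity failed", trial));
        }
        if ab != ba {
            return Err(format!("trial {}: symmetry failed", trial));
        }
        if l2(&a, &a) != 0.0 {
            return Err(format!("trial {}: identity failed", trial));
        }
        // Small relative slack absorbs f32 rounding in the three sums.
        if ac > (ab + bc) * (1.0 + 1e-4) {
            return Err(format!("trial {}: triangle inequality failed", trial));
        }
    }
    Ok(())
}

#[test]
fn l2_behaves_like_a_metric() {
    assert_eq!(check_l2_properties(42, 200), Ok(()));
}
```

Note that the triangle inequality is a property of metrics such as L2 and L1; cosine distance and negated inner product do not satisfy it in general, so this check is written for L2 only.
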
### 4. pgvector Compatibility Tests

**Purpose**: Ensure drop-in compatibility with pgvector.

**File**: `pgvector_compatibility_tests.rs`

**Coverage**:
- Distance calculation parity
- Operator symbol compatibility
- Array conversion functions
- Text format parsing
- Known regression values
- High-dimensional vectors
- Nearest neighbor ordering

**Example**:
```rust
#[pg_test]
fn test_pgvector_example_l2() {
    // Example from pgvector docs
    let a = RuVector::from_slice(&[1.0, 2.0, 3.0]);
    let b = RuVector::from_slice(&[3.0, 2.0, 1.0]);
    let dist = ruvector_l2_distance(a, b);
    // sqrt(8) ≈ 2.828
    assert!((dist - 2.828427).abs() < 0.001);
}
```

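
To make the "text format parsing" item concrete, here is a sketch of parsing and printing the bracketed, comma-separated form that pgvector uses (for example `[1,2,3]`). `parse_vector` and `format_vector` are hypothetical helpers, not the extension's functions, and the choice to reject empty vectors and non-finite elements is an assumption made for this example.

```rust
/// Parses text such as "[1, 2.5, -3]" into a vector of f32.
/// Assumption for this sketch: empty vectors and NaN/inf are rejected.
fn parse_vector(text: &str) -> Result<Vec<f32>, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("expected \"[...]\", got {:?}", text))?;
    if inner.trim().is_empty() {
        return Err("vector must have at least 1 dimension".to_string());
    }
    inner
        .split(',')
        .map(|token| match token.trim().parse::<f32>() {
            Ok(v) if v.is_finite() => Ok(v),
            Ok(_) => Err(format!("non-finite element {:?}", token)),
            Err(e) => Err(format!("invalid element {:?}: {}", token, e)),
        })
        .collect()
}

/// Formats values as "[a,b,c]" using Rust's shortest round-trip float form.
fn format_vector(values: &[f32]) -> String {
    let parts: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    format!("[{}]", parts.join(","))
}

#[test]
fn text_roundtrip() {
    let v = vec![0.1_f32, 2.5, -3.0];
    assert_eq!(parse_vector(&format_vector(&v)), Ok(v));
}
```

A compatibility test would typically feed the same literals to both implementations and compare the results element by element.
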
### 5. Stress Tests

**Purpose**: Verify stability under load and concurrency.

**File**: `stress_tests.rs`

**Coverage**:
- Concurrent vector creation (8 threads × 100 vectors)
- Concurrent distance calculations (16 threads × 1000 ops)
- Large batch allocations (10,000 vectors)
- Memory reuse patterns
- Thread safety (shared read-only access)
- Varlena round-trip stress (10,000 iterations)

**Example**:
```rust
use std::sync::Arc;
use std::thread;

#[test]
fn test_concurrent_distance_calculations() {
    let num_threads = 16;
    let calculations_per_thread = 1000;
    // Both vectors are shared read-only across all threads.
    let v1 = Arc::new(RuVector::from_slice(&[1.0, 2.0, 3.0, 4.0, 5.0]));
    let v2 = Arc::new(RuVector::from_slice(&[5.0, 4.0, 3.0, 2.0, 1.0]));

    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let v1 = Arc::clone(&v1);
            let v2 = Arc::clone(&v2);
            thread::spawn(move || {
                for _ in 0..calculations_per_thread {
                    let _ = v1.dot(&*v2);
                }
            })
        })
        .collect();

    // A panic in any worker surfaces here as a failed join.
    for handle in handles {
        handle.join().unwrap();
    }
}
```

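
The example above discards each result, so it detects panics and crashes but not wrong values. One possible extension, sketched here on plain slices with hypothetical helpers `dot` and `concurrent_dots`, is to return each thread's result and compare it with the known single-threaded answer.

```rust
use std::sync::Arc;
use std::thread;

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Computes the same dot product on `num_threads` threads over shared,
/// read-only inputs and returns every thread's result.
fn concurrent_dots(a: Vec<f32>, b: Vec<f32>, num_threads: usize) -> Vec<f32> {
    let a = Arc::new(a);
    let b = Arc::new(b);
    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let (a, b) = (Arc::clone(&a), Arc::clone(&b));
            thread::spawn(move || dot(&a, &b))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

#[test]
fn every_thread_sees_the_same_answer() {
    let results = concurrent_dots(
        vec![1.0, 2.0, 3.0, 4.0, 5.0],
        vec![5.0, 4.0, 3.0, 2.0, 1.0],
        16,
    );
    // 1*5 + 2*4 + 3*3 + 4*2 + 5*1 = 35, exactly representable in f32.
    assert!(results.iter().all(|&r| r == 35.0));
}
```
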
### 6. SIMD Consistency Tests

**Purpose**: Verify SIMD implementations match scalar fallback.

**File**: `simd_consistency_tests.rs`

**Coverage**:
- AVX-512, AVX2, NEON vs scalar
- Various vector sizes (1, 7, 8, 15, 16, 31, 32, 64, 128, 256)
- Negative values
- Zero vectors
- Small and large values
- Random data (100 iterations)

**Example**:
```rust
#[test]
fn test_euclidean_scalar_vs_simd_various_sizes() {
    for size in [8, 16, 32, 64, 128, 256] {
        let a: Vec<f32> = (0..size).map(|i| i as f32 * 0.1).collect();
        let b: Vec<f32> = (0..size).map(|i| (size - i) as f32 * 0.1).collect();

        let scalar = scalar::euclidean_distance(&a, &b);

        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") {
            let simd = simd::euclidean_distance_avx2_wrapper(&a, &b);
            assert!((scalar - simd).abs() < 1e-5);
        }
    }
}
```

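
Sizes such as 1, 7, 15, and 31 in the coverage list matter because they are not multiples of a typical SIMD width, so the kernel needs a remainder path. The sketch below emulates that structure in portable Rust; it uses no intrinsics, assumes an 8-lane width for illustration, and all function names are hypothetical. It also shows why results are compared with a tolerance: the lane-wise sum adds terms in a different order than the sequential sum.

```rust
/// Sequential sum of squared differences.
fn l2_squared_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Emulates an 8-lane kernel: eight independent accumulators for full
/// chunks, then a scalar loop for the leftover elements.
fn l2_squared_lanes(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let mut acc = [0.0_f32; LANES];
    for (ca, cb) in a.chunks_exact(LANES).zip(b.chunks_exact(LANES)) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }
    let mut total: f32 = acc.iter().sum();
    let full = a.len() - a.len() % LANES;
    for i in full..a.len() {
        let d = a[i] - b[i];
        total += d * d;
    }
    total
}

/// Largest gap between the two implementations over the given sizes,
/// relative to the scalar result (or absolute when that result is below 1).
fn max_relative_gap(sizes: &[usize]) -> f32 {
    let mut worst = 0.0_f32;
    for &size in sizes {
        let a: Vec<f32> = (0..size).map(|i| i as f32 * 0.1).collect();
        let b: Vec<f32> = (0..size).map(|i| (size - i) as f32 * 0.1).collect();
        let scalar = l2_squared_scalar(&a, &b);
        let lanes = l2_squared_lanes(&a, &b);
        worst = worst.max((scalar - lanes).abs() / scalar.abs().max(1.0));
    }
    worst
}

#[test]
fn lane_sum_matches_scalar_for_awkward_sizes() {
    assert!(max_relative_gap(&[1, 7, 8, 15, 16, 31, 32, 64, 128, 256]) < 1e-4);
}
```
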
## Running Tests

### All Tests
```bash
# From the repository root
cd crates/ruvector-postgres
cargo test
```

### Specific Test Suite
```bash
# Unit tests only
cargo test --lib

# Integration tests only
cargo test --test '*'

# Specific test file
cargo test --test unit_vector_tests

# Property-based tests
cargo test --test property_based_tests
```

### pgrx Tests
```bash
# Requires PostgreSQL 14, 15, or 16
cargo pgrx test pg16

# Run specific pgrx test
cargo pgrx test pg16 test_l2_distance_basic
```

### With Coverage
```bash
# Install tarpaulin
cargo install cargo-tarpaulin

# Generate coverage report
cargo tarpaulin --out Html --output-dir coverage
```

## Test Metrics

### Current Coverage

**Overall**: ~85% line coverage

**By Component**:
- Core types: 92%
- Distance functions: 95%
- Operators: 88%
- Index implementations: 75%
- Quantization: 82%

### Performance Benchmarks

**Distance Calculations** (1M pairs, 128 dimensions):
- Scalar: 120ms
- AVX2: 45ms (2.7x faster)
- AVX-512: 32ms (3.8x faster)

**Vector Operations**:
- Normalization: 15μs/vector (1024 dims)
- Varlena roundtrip: 2.5μs/vector
- String parsing: 8μs/vector

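
The speedup factors quoted above follow directly from the timings (120/45 ≈ 2.67 and 120/32 = 3.75, rounded to 2.7x and 3.8x). As a small worked sketch, with hypothetical helper names, the same arithmetic also gives throughput in pairs per second:

```rust
/// Ratio of baseline time to candidate time; above 1.0 means faster.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Throughput for `pairs` distance computations finished in `elapsed_ms`.
fn pairs_per_second(pairs: u64, elapsed_ms: f64) -> f64 {
    pairs as f64 * 1000.0 / elapsed_ms
}

#[test]
fn reported_speedups_match_timings() {
    assert!((speedup(120.0, 45.0) - 2.7).abs() < 0.06);
    assert!((speedup(120.0, 32.0) - 3.8).abs() < 0.06);
}
```

With the figures above, the scalar path works out to roughly 8.3 million pairs per second and the AVX-512 path to about 31 million.
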
## Debugging Failed Tests

### Common Issues

1. **Floating Point Precision**
   ```rust
   // ❌ Too strict
   assert_eq!(result, expected);

   // ✅ Use epsilon
   assert!((result - expected).abs() < 1e-5);
   ```

2. **SIMD Availability**
   ```rust
   #[cfg(target_arch = "x86_64")]
   if is_x86_feature_detected!("avx2") {
       // Run AVX2 test
   }
   ```

3. **PostgreSQL Memory Management**
   ```rust
   unsafe {
       let ptr = v.to_varlena();
       // Use ptr...
       pgrx::pg_sys::pfree(ptr as *mut std::ffi::c_void);
   }
   ```

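
A fixed epsilon such as `1e-5` works for values near 1 but becomes too strict as magnitudes grow, because `f32` carries only about seven significant decimal digits (near 1,000,000 adjacent `f32` values are 0.0625 apart). One common remedy, sketched here with a hypothetical `approx_eq` helper, is to accept either an absolute or a relative tolerance:

```rust
/// True when `a` and `b` agree within `abs_tol` (useful near zero) or
/// within `rel_tol` of the larger magnitude (useful for large values).
fn approx_eq(a: f32, b: f32, abs_tol: f32, rel_tol: f32) -> bool {
    if a == b {
        return true; // exact matches, including equal infinities
    }
    if !a.is_finite() || !b.is_finite() {
        return false; // NaN, or an infinity paired with anything else
    }
    let diff = (a - b).abs();
    diff <= abs_tol || diff <= rel_tol * a.abs().max(b.abs())
}

#[test]
fn tolerance_scales_with_magnitude() {
    assert!(approx_eq(5.0, 5.000001, 1e-5, 1e-5));
    assert!(approx_eq(1.0e6, 1.0e6 + 1.0, 1e-5, 1e-5));
    assert!(!approx_eq(1.0, 1.1, 1e-5, 1e-5));
}
```
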
### Verbose Output
```bash
cargo test -- --nocapture --test-threads=1
```

### Running Single Test
```bash
# --exact matches the full test path, so include any module prefix
# (for example `tests::`) when the test is defined inside a module.
cargo test test_l2_distance_basic -- --exact
```

## CI/CD Integration

### GitHub Actions
```yaml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: cargo test --all-features
      # cargo-pgrx must be installed and initialized before `cargo pgrx test`;
      # its version should match the pgrx dependency in Cargo.toml.
      - name: Install cargo-pgrx
        run: |
          cargo install cargo-pgrx --locked
          cargo pgrx init --pg16 download
      - name: Run pgrx tests
        run: cargo pgrx test pg16
```

## Test Development Guidelines

### 1. Test Naming
- Use descriptive names: `test_l2_distance_basic`
- Group related tests: `test_l2_*`, `test_cosine_*`
- Indicate expected behavior: `test_parse_invalid`

### 2. Test Structure
```rust
#[test]
fn test_feature_scenario() {
    // Arrange (the helper names here are placeholders)
    let input = setup_test_data();
    let expected = expected_result();

    // Act
    let result = perform_operation(input);

    // Assert
    assert_eq!(result, expected);
}
```

### 3. Edge Cases
Always test:
- Empty input
- Single element
- Very large input
- Negative values
- Zero values
- Boundary values

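
The checklist above maps naturally onto one small test per case. As an illustration, the sketch below applies it to a hypothetical `normalize` function; it is a stand-in for this example, not RuVector's API, and returning `None` for inputs that cannot be normalized is just one possible policy.

```rust
/// Scales `v` to unit L2 norm. Returns None when that is impossible:
/// empty input, zero norm, or a norm that overflows to infinity.
fn normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if v.is_empty() || norm == 0.0 || !norm.is_finite() {
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}

#[test]
fn normalize_edge_cases() {
    // Empty input
    assert_eq!(normalize(&[]), None);
    // Single element
    assert_eq!(normalize(&[5.0]), Some(vec![1.0]));
    // Zero values
    assert_eq!(normalize(&[0.0, 0.0, 0.0]), None);
    // Negative values keep their sign
    let n = normalize(&[-3.0, 4.0]).unwrap();
    assert!((n[0] + 0.6).abs() < 1e-6 && (n[1] - 0.8).abs() < 1e-6);
    // Boundary values: squaring f32::MAX overflows, so the norm is infinite
    assert_eq!(normalize(&[f32::MAX, f32::MAX]), None);
    // Very large input: 10,000 ones have norm 100
    let big = normalize(&vec![1.0; 10_000]).unwrap();
    assert!(big.iter().all(|x| (x - 0.01).abs() < 1e-6));
}
```
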
### 4. Error Cases
```rust
#[test]
#[should_panic(expected = "dimension mismatch")]
fn test_invalid_dimensions() {
    let a = RuVector::from_slice(&[1.0, 2.0]);
    let b = RuVector::from_slice(&[1.0, 2.0, 3.0]);
    let _ = a.add(&b); // Should panic
}
```

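
`#[should_panic(expected = "...")]` passes when the panic message contains the given substring. Where an operation can report failure as a value instead, the error can be asserted directly, which also lets the test inspect its details. A minimal sketch, assuming a hypothetical `checked_add` and `VectorError` that are not part of the RuVector API:

```rust
#[derive(Debug, PartialEq)]
enum VectorError {
    DimensionMismatch { left: usize, right: usize },
}

/// Element-wise sum that reports mismatched dimensions as an error
/// instead of panicking.
fn checked_add(a: &[f32], b: &[f32]) -> Result<Vec<f32>, VectorError> {
    if a.len() != b.len() {
        return Err(VectorError::DimensionMismatch {
            left: a.len(),
            right: b.len(),
        });
    }
    Ok(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

#[test]
fn add_rejects_mismatched_dimensions() {
    let result = checked_add(&[1.0, 2.0], &[1.0, 2.0, 3.0]);
    assert_eq!(
        result,
        Err(VectorError::DimensionMismatch { left: 2, right: 3 })
    );
}
```
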
## Future Test Additions

### Planned
- [ ] Fuzzing tests with cargo-fuzz
- [ ] Performance regression tests
- [ ] Index corruption recovery tests
- [ ] Multi-node distributed tests
- [ ] Backup/restore validation

### Nice to Have
- [ ] SQL injection tests
- [ ] Authentication/authorization tests
- [ ] Compatibility matrix (PostgreSQL versions)
- [ ] Platform-specific tests (Windows, macOS, ARM)

## Resources

- [pgrx Testing Documentation](https://github.com/tcdi/pgrx)
- [proptest Book](https://altsysrq.github.io/proptest-book/)
- [Rust Testing Guide](https://doc.rust-lang.org/book/ch11-00-testing.html)
- [pgvector Test Suite](https://github.com/pgvector/pgvector/tree/master/test)

## Support

For test failures or questions:
1. Check existing issues: https://github.com/ruvnet/ruvector/issues
2. Run with verbose output
3. Check PostgreSQL logs
4. Create minimal reproduction case