# RuVector PostgreSQL Extension - Testing Guide

## Overview

This document describes the test framework for ruvector-postgres, a high-performance PostgreSQL extension for vector similarity search.

## Test Organization

### Test Structure

```
tests/
├── unit_vector_tests.rs             # Unit tests for RuVector type
├── unit_halfvec_tests.rs            # Unit tests for HalfVec type
├── integration_distance_tests.rs    # pgrx integration tests
├── property_based_tests.rs          # Property-based tests with proptest
├── pgvector_compatibility_tests.rs  # pgvector regression tests
├── stress_tests.rs                  # Concurrency and memory stress tests
├── simd_consistency_tests.rs        # SIMD vs scalar consistency
├── quantized_types_test.rs          # Quantized vector types
├── parallel_execution_test.rs       # Parallel query execution
└── hnsw_index_tests.sql             # SQL-level index tests
```

## Test Categories

### 1. Unit Tests

**Purpose**: Test individual components in isolation.

**Files**:
- `unit_vector_tests.rs` - RuVector type
- `unit_halfvec_tests.rs` - HalfVec type

**Coverage**:
- Vector creation and initialization
- Varlena serialization/deserialization
- Vector arithmetic operations
- String parsing and formatting
- Memory layout and alignment
- Edge cases and boundary conditions

**Example**:
```rust
#[test]
fn test_varlena_roundtrip_basic() {
    unsafe {
        let v1 = RuVector::from_slice(&[1.0, 2.0, 3.0]);
        // Serialize to a PostgreSQL varlena and read it back.
        let varlena = v1.to_varlena();
        let v2 = RuVector::from_varlena(varlena);
        assert_eq!(v1, v2);
        // The varlena buffer is released with pfree once it is no longer needed.
        pgrx::pg_sys::pfree(varlena as *mut std::ffi::c_void);
    }
}
```
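
String parsing and formatting, from the **Coverage** list above, suits roundtrip tests: format a vector, parse it back, and compare. The sketch below assumes the bracketed `[1,2,3]` text format that pgvector uses; `parse_vector` and `format_vector` are hypothetical std-only stand-ins, not the crate's functions. The roundtrip holds exactly because Rust's `Display` for `f32` prints the shortest string that parses back to the same value.

```rust
/// Hypothetical parser for the bracketed text format, e.g. "[1,2,3]".
fn parse_vector(s: &str) -> Result<Vec<f32>, String> {
    let inner = s
        .trim()
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .ok_or_else(|| format!("malformed vector literal: {s:?}"))?;
    if inner.trim().is_empty() {
        return Err("vector must have at least one dimension".to_string());
    }
    inner
        .split(',')
        .map(|tok| -> Result<f32, String> {
            let x: f32 = tok
                .trim()
                .parse()
                .map_err(|_| format!("invalid element: {tok:?}"))?;
            // Rust's parser accepts "inf" and "NaN", so reject them explicitly.
            if x.is_finite() {
                Ok(x)
            } else {
                Err(format!("non-finite element: {tok:?}"))
            }
        })
        .collect()
}

/// Hypothetical formatter producing the same bracketed format.
fn format_vector(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!("[{}]", parts.join(","))
}

#[test]
fn test_text_format_roundtrip() {
    let v = vec![1.0, -2.5, 0.1, 1e-3];
    let text = format_vector(&v);
    assert_eq!(parse_vector(&text).unwrap(), v);
}

#[test]
fn test_parse_invalid() {
    for bad in ["", "[]", "1,2,3", "[1,,2]", "[1,abc]", "[inf]"] {
        assert!(parse_vector(bad).is_err(), "expected error for {bad:?}");
    }
}
```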

### 2. pgrx Integration Tests

**Purpose**: Test the extension running inside PostgreSQL.

**File**: `integration_distance_tests.rs`

**Coverage**:
- SQL operators (`<->`, `<=>`, `<#>`, `<+>`)
- Distance functions (L2, cosine, inner product, L1)
- SIMD consistency across vector sizes
- Error handling and validation
- Symmetry properties

**Example**:
```rust
#[pg_test]
fn test_l2_distance_basic() {
    let a = RuVector::from_slice(&[0.0, 0.0, 0.0]);
    let b = RuVector::from_slice(&[3.0, 4.0, 0.0]);
    let dist = ruvector_l2_distance(a, b);
    // sqrt(3² + 4² + 0²) = 5
    assert!((dist - 5.0).abs() < 1e-5);
}
```

### 3. Property-Based Tests

**Purpose**: Verify that mathematical properties hold for random inputs.

**File**: `property_based_tests.rs`

**Framework**: `proptest`

**Properties Tested**:

#### Distance Functions
- Non-negativity: `d(a,b) ≥ 0`
- Symmetry: `d(a,b) = d(b,a)`
- Identity: `d(a,a) = 0`
- Triangle inequality: `d(a,c) ≤ d(a,b) + d(b,c)` (holds for L2 and L1; cosine distance can violate it)
- Bounded ranges (cosine distance: [0,2])
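
The first four properties can be checked for L2 without proptest. This std-only sketch uses a small deterministic xorshift generator in place of proptest's strategies, and draws the dimension first so all three vectors always have the same length and no case is discarded (in proptest this is usually done with `prop_flat_map`). `XorShift`, `l2`, and `check_metric_axioms` are hypothetical helpers, not the crate's code. The triangle check allows a small relative slack because f32 rounding can break the exact inequality.

```rust
/// Tiny deterministic xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// f32 in roughly [-1000, 1000], mirroring the proptest range used in this guide.
    fn next_f32(&mut self) -> f32 {
        let unit = (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32; // [0, 1)
        unit * 2000.0 - 1000.0
    }

    fn vector(&mut self, dim: usize) -> Vec<f32> {
        (0..dim).map(|_| self.next_f32()).collect()
    }
}

/// Reference scalar L2 distance (hypothetical helper).
fn l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Checks non-negativity, symmetry, identity and the triangle inequality on one triple.
fn check_metric_axioms(a: &[f32], b: &[f32], c: &[f32]) -> Result<(), String> {
    let (ab, ba, bc, ac) = (l2(a, b), l2(b, a), l2(b, c), l2(a, c));
    if !(ab >= 0.0 && ab.is_finite()) {
        return Err(format!("non-negativity failed: {ab}"));
    }
    if ab != ba {
        return Err(format!("symmetry failed: {ab} vs {ba}"));
    }
    if l2(a, a) != 0.0 {
        return Err("identity failed".into());
    }
    if ac > (ab + bc) * (1.0 + 1e-5) {
        return Err(format!("triangle failed: {ac} > {ab} + {bc}"));
    }
    Ok(())
}

#[test]
fn prop_l2_metric_axioms() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for case in 0..256 {
        // Draw the dimension once (1..=99) so the vectors always match.
        let dim = 1 + (rng.next_u64() % 99) as usize;
        let (a, b, c) = (rng.vector(dim), rng.vector(dim), rng.vector(dim));
        if let Err(msg) = check_metric_axioms(&a, &b, &c) {
            panic!("case {case} (dim {dim}): {msg}");
        }
    }
}
```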

#### Vector Operations
- Normalization produces unit vectors
- Addition identity: `v + 0 = v`
- Subtraction inverse: `(a + b) - b = a`
- Scalar multiplication: associativity, identity
- Dot product: commutativity
- Norm squared equals self-dot product
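
A std-only sketch of three of these properties, assuming hypothetical `dot`, `norm`, and `normalize` helpers (the crate's own methods may differ, for example in how they treat the zero vector). Because f32 rounding means `(a + b) - b` need not equal `a` bit for bit, those checks compare with a tolerance rather than `==`.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

/// Hypothetical normalize: returns None for the zero vector instead of dividing by zero.
fn normalize(v: &[f32]) -> Option<Vec<f32>> {
    let n = norm(v);
    if n == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / n).collect())
}

/// Relative tolerance, floored at an absolute tolerance for magnitudes below 1.
fn approx_eq(a: f32, b: f32, tol: f32) -> bool {
    (a - b).abs() <= tol * a.abs().max(b.abs()).max(1.0)
}

#[test]
fn prop_vector_operations() {
    let samples: [&[f32]; 4] = [&[3.0, 4.0], &[1.0, -2.0, 2.0], &[1e-3, 5e2, -7.25], &[0.5; 16]];
    for v in samples {
        // Normalization produces unit vectors.
        let u = normalize(v).expect("non-zero sample");
        assert!(approx_eq(norm(&u), 1.0, 1e-6));
        // Norm squared equals self-dot product.
        assert!(approx_eq(norm(v) * norm(v), dot(v, v), 1e-5));
        // (a + b) - b = a holds only up to rounding.
        let b: Vec<f32> = v.iter().map(|x| x * 0.3 + 1.0).collect();
        let roundtrip: Vec<f32> = v.iter().zip(&b).map(|(x, y)| (x + y) - y).collect();
        for (orig, back) in v.iter().zip(&roundtrip) {
            assert!(approx_eq(*orig, *back, 1e-5));
        }
    }
    assert!(normalize(&[0.0, 0.0]).is_none());
}
```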

**Example**:
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn prop_l2_distance_non_negative(
        v1 in prop::collection::vec(-1000.0f32..1000.0f32, 1..100),
        v2 in prop::collection::vec(-1000.0f32..1000.0f32, 1..100)
    ) {
        // Pairs with different lengths are skipped, so most generated cases check
        // nothing; drawing the dimension first (e.g. with `prop_flat_map`) avoids this.
        if v1.len() == v2.len() {
            let dist = euclidean_distance(&v1, &v2);
            prop_assert!(dist >= 0.0);
            prop_assert!(dist.is_finite());
        }
    }
}
```

### 4. pgvector Compatibility Tests

**Purpose**: Ensure drop-in compatibility with pgvector.

**File**: `pgvector_compatibility_tests.rs`

**Coverage**:
- Distance calculation parity
- Operator symbol compatibility
- Array conversion functions
- Text format parsing
- Known regression values
- High-dimensional vectors
- Nearest neighbor ordering

**Example**:
```rust
#[pg_test]
fn test_pgvector_example_l2() {
    // Example from pgvector docs
    let a = RuVector::from_slice(&[1.0, 2.0, 3.0]);
    let b = RuVector::from_slice(&[3.0, 2.0, 1.0]);
    let dist = ruvector_l2_distance(a, b);
    // (1-3)² + (2-2)² + (3-1)² = 8, so the distance is sqrt(8) ≈ 2.828427
    assert!((dist - 2.828427).abs() < 0.001);
}
```
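
Known regression values are easiest to maintain when they can be rederived by hand. For the example pair `[1,2,3]` and `[3,2,1]`, the four operators listed under the pgrx integration tests give the values below; pgvector defines `<->` as L2 distance, `<=>` as cosine distance, `<#>` as negative inner product, and `<+>` as L1 distance. The std-only reference functions are hypothetical helpers for cross-checking, not the crate's implementations.

```rust
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn l1_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// pgvector's `<#>` returns the negative inner product, so smaller means more similar.
fn negative_inner_product(a: &[f32], b: &[f32]) -> f32 {
    -dot(a, b)
}

/// Cosine distance = 1 - cos(theta).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

#[test]
fn test_known_values_for_example_pair() {
    let (a, b) = ([1.0f32, 2.0, 3.0], [3.0f32, 2.0, 1.0]);
    // <->: sqrt(4 + 0 + 4) = sqrt(8)
    assert!((l2_distance(&a, &b) - 8f32.sqrt()).abs() < 1e-6);
    // <+>: |1-3| + |2-2| + |3-1| = 4
    assert_eq!(l1_distance(&a, &b), 4.0);
    // <#>: -(3 + 4 + 3) = -10
    assert_eq!(negative_inner_product(&a, &b), -10.0);
    // <=>: 1 - 10 / (sqrt(14) * sqrt(14)) = 2/7 ≈ 0.285714
    assert!((cosine_distance(&a, &b) - 2.0 / 7.0).abs() < 1e-6);
}
```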

### 5. Stress Tests

**Purpose**: Verify stability under load and concurrency.

**File**: `stress_tests.rs`

**Coverage**:
- Concurrent vector creation (8 threads × 100 vectors)
- Concurrent distance calculations (16 threads × 1000 ops)
- Large batch allocations (10,000 vectors)
- Memory reuse patterns
- Thread safety (shared read-only access)
- Varlena round-trip stress (10,000 iterations)

**Example**:
```rust
use std::sync::Arc;
use std::thread;

#[test]
fn test_concurrent_distance_calculations() {
    let num_threads = 16;
    let calculations_per_thread = 1000;
    // Shared, read-only vectors; each thread gets its own Arc handle.
    let v1 = Arc::new(RuVector::from_slice(&[1.0, 2.0, 3.0, 4.0, 5.0]));
    let v2 = Arc::new(RuVector::from_slice(&[5.0, 4.0, 3.0, 2.0, 1.0]));

    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let v1 = Arc::clone(&v1);
            let v2 = Arc::clone(&v2);
            thread::spawn(move || {
                for _ in 0..calculations_per_thread {
                    let _ = v1.dot(&*v2);
                }
            })
        })
        .collect();

    // join() returns Err if a worker panicked; unwrap turns that into a test failure.
    for handle in handles {
        handle.join().unwrap();
    }
}
```
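
The example above discards each result, so it only detects panics. A variant that also asserts every thread computes the same value catches inconsistent results as well. This sketch uses plain `Arc<[f32]>` and a hypothetical `dot` helper in place of `RuVector`, so it needs only the standard library.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for the crate's dot product (hypothetical helper).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runs `per_thread` dot products on each of `threads` threads against shared,
/// read-only data and returns each thread's last result.
fn concurrent_dots(threads: usize, per_thread: usize) -> Vec<f32> {
    let v1: Arc<[f32]> = Arc::from(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
    let v2: Arc<[f32]> = Arc::from(vec![5.0, 4.0, 3.0, 2.0, 1.0]);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (v1, v2) = (Arc::clone(&v1), Arc::clone(&v2));
            thread::spawn(move || {
                let mut last = f32::NAN;
                for _ in 0..per_thread {
                    // black_box keeps the optimizer from removing the repeated work.
                    last = std::hint::black_box(dot(&v1, &v2));
                }
                last
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}

#[test]
fn test_concurrent_dot_products_agree() {
    let results = concurrent_dots(16, 1000);
    assert_eq!(results.len(), 16);
    // 1*5 + 2*4 + 3*3 + 4*2 + 5*1 = 35, and every thread must see the same value.
    assert!(results.iter().all(|&r| r == 35.0));
}
```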

### 6. SIMD Consistency Tests

**Purpose**: Verify that SIMD implementations match the scalar fallback.

**File**: `simd_consistency_tests.rs`

**Coverage**:
- AVX-512, AVX2, NEON vs scalar
- Various vector sizes (1, 7, 8, 15, 16, 31, 32, 64, 128, 256)
- Negative values
- Zero vectors
- Small and large values
- Random data (100 iterations)

**Example**:
```rust
#[test]
fn test_euclidean_scalar_vs_simd_various_sizes() {
    for size in [8, 16, 32, 64, 128, 256] {
        let a: Vec<f32> = (0..size).map(|i| i as f32 * 0.1).collect();
        let b: Vec<f32> = (0..size).map(|i| (size - i) as f32 * 0.1).collect();

        let scalar = scalar::euclidean_distance(&a, &b);

        // AVX2 is detected at runtime, so the test also passes on CPUs without it.
        #[cfg(target_arch = "x86_64")]
        if is_x86_feature_detected!("avx2") {
            let simd = simd::euclidean_distance_avx2_wrapper(&a, &b);
            // SIMD and scalar code sum in different orders, so use a relative tolerance
            // (at size 256 the distance is about 236, where adjacent f32 values differ by ~1.5e-5).
            assert!((scalar - simd).abs() <= 1e-5 * scalar.abs().max(1.0));
        }
    }
}
```
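
The odd sizes in the coverage list (1, 7, 15, 31) matter because a vectorized kernel typically processes full chunks of lanes and handles the remainder in a separate tail loop, a common place for off-by-one bugs. The sketch below is a portable, std-only stand-in for an 8-lane kernel (AVX2 registers hold 8 f32 lanes), not the crate's AVX2 code; `l2_squared_scalar` and `l2_squared_chunked` are hypothetical. It also shows why comparisons need a tolerance: lane-wise accumulation sums in a different order than the scalar loop.

```rust
/// Scalar reference: one accumulator, strictly left to right.
fn l2_squared_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Portable stand-in for an 8-lane SIMD kernel: a main loop over full chunks of 8
/// with per-lane accumulators, then a scalar tail for the remaining `len % 8` elements.
fn l2_squared_chunked(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = a.len() / LANES;
    for c in 0..chunks {
        for lane in 0..LANES {
            let d = a[c * LANES + lane] - b[c * LANES + lane];
            acc[lane] += d * d;
        }
    }
    let mut total: f32 = acc.iter().sum();
    // Tail loop: the path that sizes like 1, 7, 15 and 31 exercise.
    for i in chunks * LANES..a.len() {
        let d = a[i] - b[i];
        total += d * d;
    }
    total
}

#[test]
fn test_chunked_matches_scalar_across_sizes() {
    for size in [1usize, 7, 8, 15, 16, 31, 32, 64, 128, 256] {
        let a: Vec<f32> = (0..size).map(|i| i as f32 * 0.1).collect();
        let b: Vec<f32> = (0..size).map(|i| (size - i) as f32 * 0.1).collect();
        let (s, c) = (l2_squared_scalar(&a, &b), l2_squared_chunked(&a, &b));
        // Different summation order, so compare with a relative tolerance.
        assert!((s - c).abs() <= 1e-4 * s.abs().max(1.0), "size {size}: {s} vs {c}");
    }
}
```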

## Running Tests

### All Tests
```bash
cd crates/ruvector-postgres   # from the repository root
cargo test
```

### Specific Test Suite
```bash
# Unit tests inside the library crate (src/)
cargo test --lib

# All integration test targets under tests/
cargo test --test '*'

# Specific test file
cargo test --test unit_vector_tests

# Property-based tests
cargo test --test property_based_tests
```

### pgrx Tests
```bash
# Requires PostgreSQL 14, 15, or 16 and a prior `cargo pgrx init`
cargo pgrx test pg16

# Run specific pgrx test
cargo pgrx test pg16 test_l2_distance_basic
```

### With Coverage
```bash
# Install tarpaulin
cargo install cargo-tarpaulin

# Generate coverage report
cargo tarpaulin --out Html --output-dir coverage
```

## Test Metrics

### Current Coverage

**Overall**: ~85% line coverage

**By Component**:
- Core types: 92%
- Distance functions: 95%
- Operators: 88%
- Index implementations: 75%
- Quantization: 82%

### Performance Benchmarks

**Distance Calculations** (1M pairs, 128 dimensions):
- Scalar: 120ms
- AVX2: 45ms (2.7x faster)
- AVX-512: 32ms (3.8x faster)

**Vector Operations**:
- Normalization: 15μs/vector (1024 dims)
- Varlena roundtrip: 2.5μs/vector
- String parsing: 8μs/vector
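
Figures like these depend on hardware, compiler flags, and build profile. A minimal std-only timing harness is sketched below as an illustrative assumption, not the project's benchmark setup: fixed inputs, `std::hint::black_box` so the optimizer cannot remove the loop, and an ignored test meant to be run explicitly in release mode. `l2_distance` and `time_distance_pairs` are hypothetical; for statistically robust numbers a dedicated harness such as Criterion is the usual choice.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical scalar L2 kernel used as the thing being timed.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Times `pairs` distance evaluations over `dim`-dimensional vectors and returns the
/// elapsed wall-clock time plus a checksum (so the work cannot be optimized away).
fn time_distance_pairs(pairs: usize, dim: usize) -> (Duration, f32) {
    let a: Vec<f32> = (0..dim).map(|i| (i % 17) as f32 * 0.25).collect();
    let b: Vec<f32> = (0..dim).map(|i| (i % 13) as f32 * 0.5).collect();
    let start = Instant::now();
    let mut checksum = 0.0f32;
    for _ in 0..pairs {
        checksum += l2_distance(black_box(&a), black_box(&b));
    }
    (start.elapsed(), checksum)
}

#[test]
#[ignore] // run explicitly: cargo test --release -- --ignored --nocapture
fn bench_scalar_l2_1m_pairs_128d() {
    let (elapsed, checksum) = time_distance_pairs(1_000_000, 128);
    println!("scalar L2, 1M pairs x 128 dims: {elapsed:?} (checksum {checksum})");
}
```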

## Debugging Failed Tests

### Common Issues

1. **Floating Point Precision**
   ```rust
   // ❌ Too strict
   assert_eq!(result, expected);

   // ✅ Compare within a tolerance (scale it for large magnitudes)
   assert!((result - expected).abs() < 1e-5);
   ```

2. **SIMD Availability**
   ```rust
   #[cfg(target_arch = "x86_64")]
   if is_x86_feature_detected!("avx2") {
       // Run AVX2 test
   }
   ```

3. **PostgreSQL Memory Management**
   ```rust
   let v = RuVector::from_slice(&[1.0, 2.0, 3.0]);
   unsafe {
       let ptr = v.to_varlena();
       // Use ptr...
       // Release the varlena with pfree once it is no longer needed.
       pgrx::pg_sys::pfree(ptr as *mut std::ffi::c_void);
   }
   ```
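
The fixed `1e-5` epsilon in item 1 works for values near 1 but is stricter than f32 precision allows at larger magnitudes: f32 carries about 7 significant decimal digits, so near 236 adjacent values are about 1.5e-5 apart. A common alternative is a mixed absolute/relative comparison; a sketch with hypothetical helper names:

```rust
/// Mixed comparison: `abs_tol` handles values near zero,
/// `rel_tol` scales with the magnitude of the operands.
fn approx_eq(a: f32, b: f32, abs_tol: f32, rel_tol: f32) -> bool {
    if a == b {
        return true; // also covers equal infinities
    }
    let diff = (a - b).abs(); // NaN makes every comparison below false
    diff <= abs_tol || diff <= rel_tol * a.abs().max(b.abs())
}

/// Assertion wrapper that reports both values on failure.
fn assert_approx_eq(actual: f32, expected: f32) {
    assert!(
        approx_eq(actual, expected, 1e-6, 1e-5),
        "expected {expected}, got {actual} (diff {})",
        (actual - expected).abs()
    );
}

#[test]
fn test_large_magnitude_needs_relative_tolerance() {
    let expected = 236.5f32;
    let actual = expected + 3.0e-5; // rounds to two f32 steps above 236.5
    assert!((actual - expected).abs() >= 1e-5); // the absolute check would fail...
    assert_approx_eq(actual, expected); // ...while the relative check passes
}
```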

### Verbose Output
```bash
# Show test stdout/stderr and run tests one at a time
cargo test -- --nocapture --test-threads=1
```

### Running Single Test
```bash
# --exact matches the full test path, including any module prefix (e.g. tests::test_l2_distance_basic)
cargo test test_l2_distance_basic -- --exact
```

## CI/CD Integration

### GitHub Actions
```yaml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # cargo-pgrx must match the crate's pgrx version, and building PostgreSQL
      # needs system packages (see the pgrx README).
      - name: Set up cargo-pgrx
        run: cargo install cargo-pgrx --locked && cargo pgrx init --pg16 download
      # pgrx's pgXX features are mutually exclusive, so avoid --all-features.
      - name: Run tests
        run: cargo test
      - name: Run pgrx tests
        run: cargo pgrx test pg16
```

## Test Development Guidelines

### 1. Test Naming
- Use descriptive names: `test_l2_distance_basic`
- Group related tests: `test_l2_*`, `test_cosine_*`
- Indicate expected behavior: `test_parse_invalid`

### 2. Test Structure
```rust
#[test]
fn test_feature_scenario() {
    // Arrange
    let input = setup_test_data();
    let expected = expected_output();

    // Act
    let result = perform_operation(input);

    // Assert
    assert_eq!(result, expected);
}
```
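
A concrete instance of the Arrange/Act/Assert template, using a hypothetical std-only `l1_distance` so it compiles on its own:

```rust
/// Hypothetical L1 (Manhattan) distance used as the function under test.
fn l1_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

#[test]
fn test_l1_distance_basic() {
    // Arrange: inputs chosen so the expected value is easy to verify by hand.
    let a = [1.0f32, -2.0, 3.0];
    let b = [4.0f32, 2.0, 3.0];
    let expected = 3.0 + 4.0 + 0.0;

    // Act
    let result = l1_distance(&a, &b);

    // Assert: small integers are exact in f32, so assert_eq! is safe here.
    assert_eq!(result, expected);
}
```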

### 3. Edge Cases
Always test:
- Empty input
- Single element
- Very large input
- Negative values
- Zero values
- Boundary values
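
A table-driven test covers the whole list in one place. The sketch exercises a hypothetical `l2_norm` that rescales by the largest component; the "huge components" row shows why boundary values matter, since squaring 3e20 overflows f32 (maximum about 3.4e38) and a naive sum of squares would return infinity.

```rust
/// Hypothetical L2 norm that rescales by the largest magnitude so very large
/// components do not overflow when squared.
fn l2_norm(v: &[f32]) -> f32 {
    if v.iter().any(|x| x.is_nan()) {
        return f32::NAN;
    }
    // Largest magnitude; 0.0 for empty or all-zero input.
    let max = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    if max == 0.0 || max.is_infinite() {
        return max;
    }
    let sum_sq: f32 = v.iter().map(|x| (x / max) * (x / max)).sum();
    max * sum_sq.sqrt()
}

#[test]
fn test_l2_norm_edge_cases() {
    let big = vec![1.0f32; 100_000];
    let cases: Vec<(&str, Vec<f32>, f32)> = vec![
        ("empty input", vec![], 0.0),
        ("single element", vec![-4.0], 4.0),
        ("very large input", big, 100_000f32.sqrt()),
        ("negative values", vec![-3.0, -4.0], 5.0),
        ("zero values", vec![0.0; 8], 0.0),
        ("boundary: f32::MAX", vec![f32::MAX, 0.0], f32::MAX),
        ("huge components", vec![3e20, 4e20], 5e20),
    ];
    for (name, input, expected) in cases {
        let got = l2_norm(&input);
        let tol = 1e-5 * expected.abs().max(1.0);
        assert!((got - expected).abs() <= tol, "{name}: expected {expected}, got {got}");
    }
}
```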

### 4. Error Cases
```rust
#[test]
#[should_panic(expected = "dimension mismatch")]
fn test_invalid_dimensions() {
    let a = RuVector::from_slice(&[1.0, 2.0]);
    let b = RuVector::from_slice(&[1.0, 2.0, 3.0]);
    let _ = a.add(&b); // Should panic
}
```
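
`#[should_panic(expected = "...")]` passes when the panic message contains the given string, so the code under test only needs to include that substring. A sketch with hypothetical std-only helpers, plus a `Result`-returning variant that tests can inspect without unwinding:

```rust
/// Hypothetical element-wise addition that panics on mismatched dimensions.
fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert!(
        a.len() == b.len(),
        "dimension mismatch: {} vs {}",
        a.len(),
        b.len()
    );
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Non-panicking alternative that reports the mismatch as an error value.
fn try_add(a: &[f32], b: &[f32]) -> Result<Vec<f32>, String> {
    if a.len() != b.len() {
        return Err(format!("dimension mismatch: {} vs {}", a.len(), b.len()));
    }
    Ok(a.iter().zip(b).map(|(x, y)| x + y).collect())
}

#[test]
#[should_panic(expected = "dimension mismatch")]
fn test_add_panics_on_mismatch() {
    let _ = add(&[1.0, 2.0], &[1.0, 2.0, 3.0]);
}

#[test]
fn test_try_add_reports_mismatch() {
    let err = try_add(&[1.0, 2.0], &[1.0, 2.0, 3.0]).unwrap_err();
    assert!(err.contains("dimension mismatch"));
}
```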

## Future Test Additions

### Planned
- [ ] Fuzzing tests with cargo-fuzz
- [ ] Performance regression tests
- [ ] Index corruption recovery tests
- [ ] Multi-node distributed tests
- [ ] Backup/restore validation

### Nice to Have
- [ ] SQL injection tests
- [ ] Authentication/authorization tests
- [ ] Compatibility matrix (PostgreSQL versions)
- [ ] Platform-specific tests (Windows, macOS, ARM)

## Resources

- [pgrx Testing Documentation](https://github.com/tcdi/pgrx)
- [proptest Book](https://altsysrq.github.io/proptest-book/)
- [Rust Testing Guide](https://doc.rust-lang.org/book/ch11-00-testing.html)
- [pgvector Test Suite](https://github.com/pgvector/pgvector/tree/master/test)

## Support

For test failures or questions:
1. Check existing issues: https://github.com/ruvnet/ruvector/issues
2. Run with verbose output
3. Check PostgreSQL logs
4. Create a minimal reproduction case