Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
188
vendor/ruvector/crates/ruvector-sparse-inference/tests/README.md
vendored
Normal file
@@ -0,0 +1,188 @@
# Sparse Inference Engine - Test Suite

Comprehensive test suite for the RuVector sparse inference engine with 78+ tests and 10 benchmarks across 1516 lines of test code.

## Test Structure

### Unit Tests (`tests/unit/`)

**Predictor Tests** (`predictor_tests.rs` - 12 tests)
- Low-rank predictor creation and configuration
- Active neuron prediction validation
- Top-K mode functionality
- Calibration effectiveness
- Input validation and edge cases
- Consistency and determinism

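To make the predictor checks above concrete, here is a minimal sketch of a low-rank activation predictor and a determinism test. It assumes the usual factorization `scores = Q * (P * x)` with a fixed threshold; the `LowRankPredictor` struct, its fields, and `predict_active` are hypothetical names for illustration and may not match the crate's actual API.

```rust
/// Hypothetical low-rank predictor (not the crate's real type).
/// `p` is rank x input, `q` is hidden x rank, both stored as rows.
struct LowRankPredictor {
    p: Vec<Vec<f32>>,
    q: Vec<Vec<f32>>,
    threshold: f32,
}

/// Plain matrix-vector product over row-major rows.
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>())
        .collect()
}

impl LowRankPredictor {
    /// Indices of neurons whose predicted score exceeds the threshold, in ascending order.
    fn predict_active(&self, x: &[f32]) -> Vec<usize> {
        let compressed = matvec(&self.p, x); // project input down to `rank` values
        let scores = matvec(&self.q, &compressed); // expand to one score per neuron
        scores
            .iter()
            .enumerate()
            .filter(|&(_, &s)| s > self.threshold)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Small fixed predictor used by the example test (assumed values).
fn example_predictor() -> LowRankPredictor {
    LowRankPredictor {
        p: vec![vec![1.0, 1.0]],
        q: vec![vec![1.0], vec![-1.0], vec![0.5]],
        threshold: 1.0,
    }
}

#[test]
fn prediction_is_valid_and_deterministic() {
    let predictor = example_predictor();
    let x = [1.0, 2.0];
    let first = predictor.predict_active(&x);
    // Scores are [3.0, -3.0, 1.5], so neurons 0 and 2 pass the threshold.
    assert_eq!(first, vec![0, 2]);
    // Same input, same prediction.
    assert_eq!(predictor.predict_active(&x), first);
}
```

The low-rank form keeps prediction cheap: the cost scales with `rank * (input + hidden)` rather than `input * hidden`.
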
**Sparse FFN Tests** (`sparse_ffn_tests.rs` - 14 tests)
- Sparse vs dense computation equivalence
- Different activation functions (ReLU, GeLU, SiLU)
- SwiGLU paired neuron handling
- Empty and partial activation sets
- Out-of-bounds and duplicate neuron handling
- Deterministic output verification

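The sparse/dense equivalence check can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: `ToyFfn` is a hypothetical two-layer ReLU network, and the test relies on the fact that a neuron with a non-positive pre-activation contributes nothing under ReLU, so skipping it should not change the output.

```rust
/// Hypothetical toy FFN. `w1[j]` holds neuron j's input weights,
/// `w2[j]` holds neuron j's contribution to each output.
struct ToyFfn {
    w1: Vec<Vec<f32>>,
    w2: Vec<Vec<f32>>,
}

impl ToyFfn {
    fn output_dim(&self) -> usize {
        self.w2.first().map_or(0, |row| row.len())
    }

    /// ReLU activation of neuron `j` for input `x`.
    fn hidden(&self, j: usize, x: &[f32]) -> f32 {
        let pre: f32 = self.w1[j].iter().zip(x).map(|(w, v)| w * v).sum();
        pre.max(0.0)
    }

    /// Accumulate the contributions of the given neurons only.
    fn forward_sparse(&self, x: &[f32], active: &[usize]) -> Vec<f32> {
        let mut out = vec![0.0f32; self.output_dim()];
        for &j in active {
            let h = self.hidden(j, x);
            for (o, w) in out.iter_mut().zip(&self.w2[j]) {
                *o += h * w;
            }
        }
        out
    }

    /// Dense reference: every neuron is evaluated.
    fn forward_dense(&self, x: &[f32]) -> Vec<f32> {
        let mut out = vec![0.0f32; self.output_dim()];
        for j in 0..self.w1.len() {
            let h = self.hidden(j, x);
            for (o, w) in out.iter_mut().zip(&self.w2[j]) {
                *o += h * w;
            }
        }
        out
    }
}

/// Fixed 2-input, 3-hidden, 2-output network (assumed values).
fn example_ffn() -> ToyFfn {
    ToyFfn {
        w1: vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![-1.0, -1.0]],
        w2: vec![vec![1.0, 0.5], vec![-1.0, 2.0], vec![4.0, 4.0]],
    }
}

#[test]
fn sparse_matches_dense_when_skipped_neurons_are_inactive() {
    let ffn = example_ffn();
    let x = [2.0, 3.0];
    // Neuron 2 has pre-activation -5, so ReLU zeroes it and it can be skipped.
    let dense = ffn.forward_dense(&x);
    let sparse = ffn.forward_sparse(&x, &[0, 1]);
    for (d, s) in dense.iter().zip(&sparse) {
        assert!((d - s).abs() < 1e-6);
    }
    // An empty activation set yields an all-zero output.
    assert_eq!(ffn.forward_sparse(&x, &[]), vec![0.0, 0.0]);
}
```

For GeLU or SiLU the equivalence is only approximate, because those activations are nonzero for negative inputs; a tolerance-based comparison is the natural choice there.
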
**Quantization Tests** (`quantization_tests.rs` - 15 tests)
- INT8 quantization roundtrip accuracy
- INT4 compression ratios
- Different group sizes (16, 32, 64, 128)
- Selective row dequantization
- Range preservation
- Uniform and zero value handling
- Odd-length array support

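A roundtrip accuracy test of the kind listed above might look like the sketch below. It assumes symmetric per-group INT8 quantization with one `f32` scale per group; the crate's actual scheme is not described here, and `quantize_int8` / `dequantize_int8` are hypothetical names. Under this scheme each restored value should differ from the original by at most half a quantization step.

```rust
/// Hypothetical symmetric INT8 quantizer: one scale per group of `group_size` values.
/// `group_size` must be non-zero. The last group may be shorter than the others.
fn quantize_int8(values: &[f32], group_size: usize) -> (Vec<i8>, Vec<f32>) {
    let mut quantized = Vec::with_capacity(values.len());
    let mut scales = Vec::new();
    for group in values.chunks(group_size) {
        let max_abs = group.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        // All-zero groups get a scale of 1.0 to avoid dividing by zero.
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
        scales.push(scale);
        quantized.extend(
            group
                .iter()
                .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8),
        );
    }
    (quantized, scales)
}

/// Inverse of `quantize_int8`: multiply each value by its group's scale.
fn dequantize_int8(quantized: &[i8], scales: &[f32], group_size: usize) -> Vec<f32> {
    quantized
        .iter()
        .enumerate()
        .map(|(i, &q)| q as f32 * scales[i / group_size])
        .collect()
}

#[test]
fn int8_roundtrip_error_is_bounded() {
    // 100 values do not divide evenly into most group sizes, exercising the short tail group.
    let values: Vec<f32> = (0..100).map(|i| (i as f32 * 0.37).sin() * 10.0).collect();
    for &group_size in &[16usize, 32, 64, 128] {
        let (quantized, scales) = quantize_int8(&values, group_size);
        let restored = dequantize_int8(&quantized, &scales, group_size);
        for (i, (orig, back)) in values.iter().zip(&restored).enumerate() {
            // Rounding to the nearest step loses at most half a step.
            let bound = 0.5 * scales[i / group_size] + 1e-5;
            assert!((orig - back).abs() <= bound);
        }
    }
}
```

Smaller groups give tighter scales and therefore lower error, at the cost of storing more scale values.
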
### Integration Tests (`tests/integration/`)

**Model Loading Tests** (`model_loading_tests.rs` - 15 tests)
- GGUF header parsing
- Invalid format detection
- Model structure validation
- Forward pass execution
- Configuration handling
- Multiple model sizes

**Sparse Inference Tests** (`sparse_inference_tests.rs` - 12 tests)
- Full sparse pipeline execution
- Dense vs sparse accuracy comparison
- Batch processing
- Calibration improvements
- Different sparsity levels (10%-90%)
- Consistency verification
- Extreme input handling

### Property-Based Tests (`tests/property/mod.rs` - 10 tests)
Using `proptest` for generative testing:
- Output finiteness invariants
- Valid index generation
- Dense/sparse equivalence
- Quantization ordering preservation
- Top-K constraints
- Dimension correctness
- INT4 roundtrip properties
- Output dimension consistency
- SwiGLU output validation
- Calibration robustness

### Benchmark Tests (`benches/sparse_inference_bench.rs` - 10 benchmarks)

**Performance Comparisons:**
1. **Sparse vs Dense**: Baseline comparison
2. **Sparsity Levels**: 30%, 50%, 70%, 90% sparsity
3. **Predictor Performance**: Prediction latency
4. **Top-K Modes**: K=100, 500, 1000, 2000
5. **Sparse FFN**: Dense vs 10% vs 50% sparse
6. **Activation Functions**: ReLU, GeLU, SiLU comparison
7. **Quantization**: Dequantization of 1, 10, 100 rows
8. **INT4 vs INT8**: Quantization speed and accuracy
9. **Calibration**: Sample sizes 10, 50, 100, 500
10. **SwiGLU**: Dense vs sparse comparison

## Common Test Utilities (`tests/common/mod.rs`)

Helper functions for all tests:
- `random_vector(dim)` - Generate test vectors
- `random_activations(max)` - Generate activation patterns
- `create_test_ffn(input, hidden)` - FFN factory
- `create_calibrated_predictor()` - Pre-calibrated predictor
- `create_quantized_matrix(rows, cols)` - Quantized weights
- `load_test_llama_model()` - Test model loader
- `assert_vectors_close(a, b, tol)` - Approximate equality
- `mse(a, b)` - Mean squared error
- `generate_calibration_data(n)` - Calibration dataset

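As an illustration of the two comparison helpers, the sketch below shows one plausible shape for `mse` and `assert_vectors_close`. Only the names and argument order come from the list above; the `&[f32]` / `f32` parameter types, the panic messages, and the empty-slice behavior are assumptions, and the real definitions in `tests/common/mod.rs` may differ.

```rust
/// Mean squared error between two equal-length slices (assumed signature).
/// Returns 0.0 for empty input rather than dividing by zero.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "length mismatch");
    if a.is_empty() {
        return 0.0;
    }
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    sum / a.len() as f32
}

/// Panics if any pair of elements differs by more than `tol` (assumed signature).
fn assert_vectors_close(a: &[f32], b: &[f32], tol: f32) {
    assert_eq!(a.len(), b.len(), "length mismatch");
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        assert!(
            (x - y).abs() <= tol,
            "index {}: {} vs {} exceeds tolerance {}",
            i,
            x,
            y,
            tol
        );
    }
}

#[test]
fn helpers_agree_on_small_example() {
    let a = [1.0, 2.0];
    let b = [1.0, 4.0];
    // Squared differences are 0 and 4, so the mean is 2.
    assert_eq!(mse(&a, &b), 2.0);
    assert_vectors_close(&a, &b, 2.0);
    assert_vectors_close(&a, &a, 0.0);
}
```

Element-wise tolerance catches a single bad value that an aggregate metric like MSE could average away, which is why a suite like this typically offers both.
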
## Running Tests

```bash
# Run all tests
cargo test -p ruvector-sparse-inference

# Run specific test categories
cargo test -p ruvector-sparse-inference --test unit
cargo test -p ruvector-sparse-inference --test integration
cargo test -p ruvector-sparse-inference --test property

# Run unit tests for a specific module
cargo test -p ruvector-sparse-inference predictor_tests
cargo test -p ruvector-sparse-inference quantization_tests
cargo test -p ruvector-sparse-inference sparse_ffn_tests

# Run benchmarks
cargo bench -p ruvector-sparse-inference

# Run specific benchmark
cargo bench -p ruvector-sparse-inference -- sparse_vs_dense
cargo bench -p ruvector-sparse-inference -- sparsity_levels
cargo bench -p ruvector-sparse-inference -- quantization
```

## Test Coverage Goals

- **Statements**: >80%
- **Branches**: >75%
- **Functions**: >80%
- **Lines**: >80%

## Test Characteristics

Tests follow the **FIRST** principles:
- **Fast**: Unit tests <100ms
- **Isolated**: No dependencies between tests
- **Repeatable**: Same result every time
- **Self-validating**: Clear pass/fail
- **Timely**: Written with implementation

## Property-Based Testing

Tests use `proptest` to verify invariants across wide input ranges:
- Input values: -10.0 to 10.0
- Vector dimensions: 256 to 1024
- Hidden dimensions: 512 to 4096
- Group sizes: 16, 32, 64, 128
- Sample counts: 1 to 100

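The idea behind these property tests can be shown without any dependencies. The sketch below checks Top-K constraints over randomly generated score vectors using the value and dimension ranges listed above. To stay self-contained it swaps `proptest` for a small hand-rolled linear congruential generator, so it has no shrinking or failure persistence; `Lcg`, `select_top_k`, and `check_top_k_properties` are hypothetical names, not part of the crate.

```rust
/// Tiny deterministic generator standing in for proptest's strategies.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }

    /// Integer in `lo..=hi`.
    fn range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u32() as usize) % (hi - lo + 1)
    }

    /// Float in the -10.0 to 10.0 input range.
    fn value(&mut self) -> f32 {
        let unit = (self.next_u32() >> 8) as f32 / (1u32 << 24) as f32;
        unit * 20.0 - 10.0
    }
}

/// Hypothetical Top-K selection: indices of the `k` largest scores, ascending.
fn select_top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Stable sort by score, largest first; `total_cmp` is a total order on f32.
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    order.truncate(k.min(scores.len()));
    order.sort_unstable();
    order
}

/// Runs `cases` random trials and panics on the first violated invariant.
fn check_top_k_properties(seed: u64, cases: usize) {
    let mut rng = Lcg(seed);
    for _ in 0..cases {
        let dim = rng.range(256, 1024);
        let k = rng.range(0, 2 * dim); // deliberately allows k > dim
        let scores: Vec<f32> = (0..dim).map(|_| rng.value()).collect();
        let picked = select_top_k(&scores, k);

        // Count is clamped to the vector length.
        assert_eq!(picked.len(), k.min(dim));
        // Indices are in bounds, unique, and sorted.
        assert!(picked.iter().all(|&i| i < dim));
        assert!(picked.windows(2).all(|w| w[0] < w[1]));
        // No unselected score beats a selected one.
        let mut chosen = vec![false; dim];
        for &i in &picked {
            chosen[i] = true;
        }
        let min_in = picked.iter().map(|&i| scores[i]).fold(f32::INFINITY, f32::min);
        let max_out = (0..dim)
            .filter(|&i| !chosen[i])
            .map(|i| scores[i])
            .fold(f32::NEG_INFINITY, f32::max);
        assert!(min_in >= max_out);
    }
}

#[test]
fn top_k_constraints_hold_for_random_inputs() {
    check_top_k_properties(42, 50);
}
```

With `proptest`, the generator and loop would be replaced by strategies such as ranges over dimensions and values, and a failing case would be shrunk to a minimal counterexample automatically.
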
## Edge Cases Tested

1. **Empty inputs**: Zero-length vectors, no active neurons
2. **Boundary values**: Maximum dimensions, extreme values
3. **Invalid inputs**: Wrong dimensions, out-of-bounds indices
4. **Numerical stability**: Very large/small values, precision loss
5. **Concurrent operations**: Parallel inference requests
6. **Memory efficiency**: Large datasets, quantization compression

## Test Organization

```
tests/
├── common/
│   └── mod.rs                    # Shared test utilities
├── unit/
│   ├── predictor_tests.rs        # Neuron prediction tests
│   ├── sparse_ffn_tests.rs       # Sparse computation tests
│   └── quantization_tests.rs     # Weight compression tests
├── integration/
│   ├── model_loading_tests.rs    # GGUF parsing tests
│   └── sparse_inference_tests.rs # End-to-end pipeline tests
└── property/
    └── mod.rs                    # Property-based tests

benches/
└── sparse_inference_bench.rs     # Performance benchmarks
```

## Future Test Additions

Potential areas for expansion:
1. Stress tests for memory limits
2. Concurrent inference benchmarks
3. Hardware-specific SIMD tests
4. Model-specific accuracy tests
5. Calibration strategy comparisons
6. Cache effectiveness tests
7. Quantization accuracy analysis

---

**Total Test Coverage**: 78+ tests across 1516 lines
- 68 unit/integration tests
- 10 property-based tests
- 10 performance benchmarks