# ruvector-scipix Benchmark Suite

Comprehensive performance benchmarking for the Scipix OCR clone using Criterion.

## Overview

This benchmark suite provides detailed performance analysis across all critical components of the OCR system:

- **OCR Latency**: End-to-end OCR performance metrics
- **Preprocessing**: Image preprocessing pipeline performance
- **LaTeX Generation**: LaTeX AST generation and string building
- **Inference**: Model inference benchmarks (detection, recognition, math)
- **Cache**: Embedding cache and similarity search performance
- **API**: REST API request/response handling
- **Memory**: Memory usage, growth, and fragmentation analysis

## Performance Targets

### Primary Targets

- **Single Image OCR**: < 100ms at P95
- **Batch Processing (16 images)**: < 500ms total
- **Preprocessing Pipeline**: < 20ms
- **LaTeX Generation**: < 5ms

### Secondary Targets

- **Cache Hit Latency**: < 1ms
- **Similarity Search (1000 embeddings)**: < 10ms
- **API Request Parsing**: < 0.5ms
- **Model Warm-up**: < 200ms

## Running Benchmarks

### Run All Benchmarks

```bash
cd examples/scipix
./scripts/run_benchmarks.sh all
```

### Run Specific Benchmark Suite

```bash
# OCR latency benchmarks
./scripts/run_benchmarks.sh latency

# Preprocessing benchmarks
./scripts/run_benchmarks.sh preprocessing

# LaTeX generation benchmarks
./scripts/run_benchmarks.sh latex

# Model inference benchmarks
./scripts/run_benchmarks.sh inference

# Cache benchmarks
./scripts/run_benchmarks.sh cache

# API benchmarks
./scripts/run_benchmarks.sh api

# Memory benchmarks
./scripts/run_benchmarks.sh memory
```

### Quick Benchmark Suite

For rapid iteration during development:

```bash
./scripts/run_benchmarks.sh quick
```

### CI Benchmark Suite

Minimal samples for continuous integration:

```bash
./scripts/run_benchmarks.sh ci
```

## Baseline Tracking

### Save Current Results as Baseline

```bash
BASELINE=v1.0 ./scripts/run_benchmarks.sh all
```

### Compare with Saved Baseline

```bash
./scripts/run_benchmarks.sh compare v1.0
```

### Compare with Main Branch

```bash
BASELINE=main ./scripts/run_benchmarks.sh all
./scripts/run_benchmarks.sh compare main
```

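Under the hood, these modes presumably wrap Criterion's built-in baseline support, so the same comparisons can also be run without the helper script via `cargo bench -- --save-baseline <name>` and `cargo bench -- --baseline <name>`.
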
## Benchmark Details

### 1. OCR Latency Benchmarks (`ocr_latency.rs`)

Tests end-to-end OCR performance across various scenarios:

- **Single Image OCR**: Different image sizes (224x224 to 1024x1024)
- **Batch Processing**: Batch sizes from 1 to 32 images
- **Cold vs Warm Start**: Model initialization overhead
- **Latency Percentiles**: P50, P95, P99 measurements
- **Throughput**: Images per second

**Key Metrics:**

- Mean latency
- P95/P99 latency
- Throughput (images/sec)
- Batch efficiency

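As a rough sketch of how such a size-parameterized Criterion benchmark is typically shaped (the `run_ocr` function and raw byte buffers below are placeholders, not this crate's actual API):

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Placeholder for the real end-to-end OCR entry point.
fn run_ocr(image: &[u8]) -> usize {
    image.iter().map(|&b| b as usize).sum()
}

fn bench_single_image(c: &mut Criterion) {
    let mut group = c.benchmark_group("ocr_single_image");
    for size in [224u32, 512, 1024] {
        // Stand-in grayscale buffer; the real benchmark would load a test image.
        let image = vec![128u8; (size * size) as usize];
        group.throughput(Throughput::Elements(1)); // lets Criterion report images/sec
        group.bench_with_input(BenchmarkId::from_parameter(size), &image, |b, img| {
            b.iter(|| run_ocr(black_box(img)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_single_image);
criterion_main!(benches);
```
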
### 2. Preprocessing Benchmarks (`preprocessing.rs`)

Image preprocessing pipeline performance:

- **Individual Transforms**: Grayscale, blur, threshold, edge detection
- **Full Pipeline**: Sequential preprocessing chain
- **Parallel vs Sequential**: Batch processing comparison
- **Resize Operations**: Nearest neighbor and bilinear interpolation

**Key Metrics:**

- Transform latency
- Pipeline total time
- Parallel speedup
- Memory overhead

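A minimal sketch of the parallel-vs-sequential comparison, assuming `rayon` is available as a dependency and using a trivial stand-in transform:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rayon::prelude::*;

// Stand-in for a real per-image transform (grayscale, blur, ...).
fn transform(img: &[u8]) -> Vec<u8> {
    img.iter().map(|&b| 255 - b).collect()
}

fn bench_batch_preprocessing(c: &mut Criterion) {
    // 16 fake 512x512 grayscale images
    let batch: Vec<Vec<u8>> = (0..16).map(|_| vec![128u8; 512 * 512]).collect();

    c.bench_function("preprocess_sequential", |b| {
        b.iter(|| batch.iter().map(|img| transform(img)).collect::<Vec<_>>())
    });

    c.bench_function("preprocess_parallel", |b| {
        b.iter(|| batch.par_iter().map(|img| transform(img)).collect::<Vec<_>>())
    });
}

criterion_group!(benches, bench_batch_preprocessing);
criterion_main!(benches);
```
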
### 3. LaTeX Generation Benchmarks (`latex_generation.rs`)

LaTeX code generation from AST:

- **Simple Expressions**: Fractions, powers, sums
- **Complex Expressions**: Matrices, integrals, summations
- **AST Traversal**: Tree depth impact on performance
- **String Building**: Optimization strategies
- **Batch Generation**: Multiple expressions

**Key Metrics:**

- Generation latency
- AST traversal time
- String concatenation efficiency

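To illustrate the string-building dimension, one plausible shape for the comparison: per-iteration `format!` allocation versus a buffer pre-sized to the output (the term list is a fabricated stand-in for a real AST):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn build_naive(terms: &[&str]) -> String {
    let mut out = String::new();
    for &t in terms {
        // format! allocates a fresh string on every iteration
        out = format!("{out}{t} + ");
    }
    out
}

fn build_preallocated(terms: &[&str]) -> String {
    // One up-front allocation sized to the final output
    let cap: usize = terms.iter().map(|t| t.len() + 3).sum();
    let mut out = String::with_capacity(cap);
    for &t in terms {
        out.push_str(t);
        out.push_str(" + ");
    }
    out
}

fn bench_string_building(c: &mut Criterion) {
    let terms = vec![r"\frac{a}{b}"; 200];
    c.bench_function("latex_naive", |b| b.iter(|| build_naive(black_box(&terms))));
    c.bench_function("latex_preallocated", |b| {
        b.iter(|| build_preallocated(black_box(&terms)))
    });
}

criterion_group!(benches, bench_string_building);
criterion_main!(benches);
```
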
### 4. Inference Benchmarks (`inference.rs`)

Neural network model inference:

- **Text Detection Model**: Bounding box detection
- **Text Recognition Model**: OCR text extraction
- **Math Model**: Mathematical notation recognition
- **Tensor Preprocessing**: Image to tensor conversion
- **Output Postprocessing**: NMS, confidence filtering, CTC decoding
- **Batch Inference**: Multi-image processing
- **Model Warm-up**: Initialization overhead

**Key Metrics:**

- Inference latency per model
- Batch throughput
- Preprocessing overhead
- Postprocessing time

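A sketch of how cold-start cost can be separated from warm inference; `Model` here is a hypothetical stand-in for the real detection/recognition model types:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical model; the real benchmark would load actual weights.
struct Model {
    weights: Vec<f32>,
}

impl Model {
    fn load() -> Self {
        // Simulates deserializing weights from disk
        Model { weights: vec![0.5; 1 << 16] }
    }

    fn infer(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

fn bench_cold_vs_warm(c: &mut Criterion) {
    let input = vec![1.0f32; 1 << 16];

    // Cold start: construction is timed together with the first inference
    c.bench_function("inference_cold", |b| {
        b.iter(|| {
            let model = Model::load();
            model.infer(black_box(&input))
        })
    });

    // Warm: the model is built once, outside the timing loop
    let model = Model::load();
    c.bench_function("inference_warm", |b| b.iter(|| model.infer(black_box(&input))));
}

criterion_group!(benches, bench_cold_vs_warm);
criterion_main!(benches);
```
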
### 5. Cache Benchmarks (`cache.rs`)

Embedding cache and similarity search:

- **Embedding Generation**: Image to vector embedding
- **Similarity Search**: Linear and approximate nearest neighbor
- **Cache Hit/Miss Latency**: Lookup performance
- **Cache Insertion**: Add new entries
- **Batch Operations**: Multi-query performance
- **Cache Statistics**: Memory and efficiency metrics

**Key Metrics:**

- Embedding generation time
- Search latency (linear vs ANN)
- Hit/miss ratio impact
- Memory per embedding

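For intuition, a brute-force cosine scan over 1,000 embeddings, roughly what the linear-search baseline measures (the dimension and data below are fabricated):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

// Returns the index of the most similar embedding
fn linear_search(query: &[f32], index: &[Vec<f32>]) -> usize {
    let mut best = (0, f32::MIN);
    for (i, emb) in index.iter().enumerate() {
        let score = cosine(query, emb);
        if score > best.1 {
            best = (i, score);
        }
    }
    best.0
}

fn bench_similarity_search(c: &mut Criterion) {
    let dim = 256;
    // Deterministic non-zero fake embeddings
    let index: Vec<Vec<f32>> = (0..1000)
        .map(|i| (0..dim).map(|j| ((i + j) % 13 + 1) as f32).collect())
        .collect();
    let query: Vec<f32> = (0..dim).map(|j| ((j * 7) % 13 + 1) as f32).collect();

    c.bench_function("linear_search_1000", |b| {
        b.iter(|| linear_search(black_box(&query), black_box(&index)))
    });
}

criterion_group!(benches, bench_similarity_search);
criterion_main!(benches);
```
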
### 6. API Benchmarks (`api.rs`)

REST API performance:

- **Request Parsing**: JSON deserialization
- **Response Serialization**: JSON encoding
- **Concurrent Requests**: Multi-client handling
- **Middleware Overhead**: Auth, logging, validation, rate limiting
- **Error Handling**: Error response generation
- **End-to-End Request**: Full request cycle

**Key Metrics:**

- Parse/serialize latency
- Middleware overhead
- Concurrent throughput
- Error handling time

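A sketch of the parse/serialize micro-benchmarks using `serde_json` (assuming serde's `derive` feature); `OcrRequest` is a guessed shape, not the service's real schema:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use serde::{Deserialize, Serialize};

// Guessed request shape for illustration only.
#[derive(Serialize, Deserialize)]
struct OcrRequest {
    image_base64: String,
    formats: Vec<String>,
}

fn bench_api_serde(c: &mut Criterion) {
    let raw = r#"{"image_base64":"aGVsbG8=","formats":["text","latex"]}"#;

    c.bench_function("request_parse", |b| {
        b.iter(|| serde_json::from_str::<OcrRequest>(black_box(raw)).unwrap())
    });

    let req: OcrRequest = serde_json::from_str(raw).unwrap();
    c.bench_function("response_serialize", |b| {
        b.iter(|| serde_json::to_string(black_box(&req)).unwrap())
    });
}

criterion_group!(benches, bench_api_serde);
criterion_main!(benches);
```
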
### 7. Memory Benchmarks (`memory.rs`)

Memory usage and management:

- **Peak Memory**: Maximum usage during inference
- **Memory per Image**: Batch processing memory scaling
- **Model Loading**: Memory required for model initialization
- **Memory Growth**: Leak detection over time
- **Fragmentation**: Allocation/deallocation patterns
- **Cache Memory**: Embedding storage overhead
- **Memory Pools**: Pool vs heap allocation
- **Tensor Layouts**: HWC vs CHW memory impact

**Key Metrics:**

- Peak memory usage
- Memory growth rate
- Fragmentation level
- Pool efficiency

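Criterion measures time, not memory, so memory benchmarks generally need extra instrumentation. One common pattern is a counting wrapper around the system allocator; a minimal sketch:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

// Forwards to the system allocator while tracking live and peak bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            let now = ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let buf = vec![0u8; 10 * 1024 * 1024]; // simulate one decoded image batch
    drop(buf);
    println!("peak heap observed: {} bytes", PEAK.load(Ordering::Relaxed));
}
```
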
## HTML Reports

Criterion automatically generates detailed HTML reports with:

- Performance graphs
- Statistical analysis
- Regression detection
- Historical comparisons

### View Reports

After running benchmarks, open:

```bash
open target/criterion/report/index.html
```

Or for a specific benchmark:

```bash
open target/criterion/ocr_latency/report/index.html
```

## Interpreting Results

### Latency Metrics

- **Mean**: Average latency across all samples
- **Median (P50)**: 50th percentile - half of requests are faster
- **P95**: 95th percentile - 95% of requests are faster
- **P99**: 99th percentile - 99% of requests are faster
- **Standard Deviation**: Spread of latencies around the mean

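For reference, a nearest-rank percentile over raw samples (one common convention; Criterion's reported statistics come from bootstrap estimators, so its numbers may differ slightly):

```rust
/// Nearest-rank percentile: the smallest sample at or above p percent of the data.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[(rank.max(1) - 1).min(sorted.len() - 1)]
}

fn main() {
    let mut samples = vec![12.0, 9.5, 11.2, 30.1, 10.8, 9.9, 13.4, 95.0];
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("P50 = {:.1} ms", percentile(&samples, 50.0)); // 11.2
    println!("P95 = {:.1} ms", percentile(&samples, 95.0)); // 95.0
}
```
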
### Throughput Metrics

- **Images/Second**: Processing rate
- **Batch Efficiency**: Speedup from batching
- **Sustainable Throughput**: Maximum rate while keeping at least 95% of requests successful

### Regression Detection

Criterion detects performance regressions automatically:

- **Green**: Performance improved
- **Yellow**: Minor change (within noise)
- **Red**: Performance regressed

### Memory Metrics

- **Peak Usage**: Maximum memory at any point
- **Growth Rate**: Memory increase over time
- **Fragmentation**: Memory layout efficiency

## Best Practices

### Running Benchmarks

1. **Consistent Environment**: Run on the same hardware
2. **Quiet System**: Close other applications
3. **Multiple Samples**: Use a sufficient sample size (50-100)
4. **Warm-up**: Allow for cache warm-up and lazy initialization
5. **Baseline Tracking**: Save results for comparison

### Analyzing Results

1. **Focus on Percentiles**: P95/P99 matter more than the mean
2. **Check Variance**: High variance indicates instability
3. **Profile Outliers**: Investigate extreme values
4. **Memory Leaks**: Monitor growth rate
5. **Regression Limits**: Set acceptable thresholds

### Optimization Workflow

1. **Baseline**: Establish current performance
2. **Profile**: Identify bottlenecks
3. **Optimize**: Implement improvements
4. **Benchmark**: Measure impact
5. **Compare**: Verify improvement vs baseline
6. **Iterate**: Repeat until targets are met

## Continuous Integration

### CI Benchmark Configuration

```yaml
# .github/workflows/benchmark.yml
name: Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Run benchmarks
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh ci

      - name: Compare with baseline
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh compare main
```

## Troubleshooting

### Benchmarks Running Slowly

- Reduce sample size: `cargo bench -- --sample-size 10`
- Use quick mode: `./scripts/run_benchmarks.sh quick`
- Run specific benchmarks only

### Inconsistent Results

- Ensure the system is idle
- Disable CPU frequency scaling
- Run with a higher sample size
- Check for thermal throttling

### Memory Issues

- Monitor system memory during benchmarks
- Use memory profiling tools (valgrind, heaptrack)
- Check for memory leaks with growth benchmarks

## Contributing

When adding new features:

1. Add corresponding benchmarks
2. Set performance targets
3. Run baselines before and after changes
4. Document any performance impact
5. Update this documentation

## Resources

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Benchmarking Best Practices](https://easyperf.net/blog/)