Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
# ruvector-scipix Benchmark Suite

Comprehensive performance benchmarking for the Scipix OCR clone using Criterion.

## Overview

This benchmark suite provides detailed performance analysis across all critical components of the OCR system:

- **OCR Latency**: End-to-end OCR performance metrics
- **Preprocessing**: Image preprocessing pipeline performance
- **LaTeX Generation**: LaTeX AST generation and string building
- **Inference**: Model inference benchmarks (detection, recognition, math)
- **Cache**: Embedding cache and similarity search performance
- **API**: REST API request/response handling
- **Memory**: Memory usage, growth, and fragmentation analysis

## Performance Targets

### Primary Targets

- **Single Image OCR**: < 100ms at P95
- **Batch Processing (16 images)**: < 500ms total
- **Preprocessing Pipeline**: < 20ms
- **LaTeX Generation**: < 5ms

### Secondary Targets

- **Cache Hit Latency**: < 1ms
- **Similarity Search (1000 embeddings)**: < 10ms
- **API Request Parsing**: < 0.5ms
- **Model Warm-up**: < 200ms

## Running Benchmarks

### Run All Benchmarks

```bash
cd examples/scipix
./scripts/run_benchmarks.sh all
```

### Run a Specific Benchmark Suite

```bash
# OCR latency benchmarks
./scripts/run_benchmarks.sh latency

# Preprocessing benchmarks
./scripts/run_benchmarks.sh preprocessing

# LaTeX generation benchmarks
./scripts/run_benchmarks.sh latex

# Model inference benchmarks
./scripts/run_benchmarks.sh inference

# Cache benchmarks
./scripts/run_benchmarks.sh cache

# API benchmarks
./scripts/run_benchmarks.sh api

# Memory benchmarks
./scripts/run_benchmarks.sh memory
```

### Quick Benchmark Suite

For rapid iteration during development:

```bash
./scripts/run_benchmarks.sh quick
```

### CI Benchmark Suite

Minimal samples for continuous integration:

```bash
./scripts/run_benchmarks.sh ci
```

## Baseline Tracking

### Save Current Results as Baseline

```bash
BASELINE=v1.0 ./scripts/run_benchmarks.sh all
```

### Compare with Saved Baseline

```bash
./scripts/run_benchmarks.sh compare v1.0
```

### Compare with Main Branch

```bash
BASELINE=main ./scripts/run_benchmarks.sh all
./scripts/run_benchmarks.sh compare main
```

## Benchmark Details

### 1. OCR Latency Benchmarks (`ocr_latency.rs`)

Tests end-to-end OCR performance across various scenarios:

- **Single Image OCR**: Different image sizes (224x224 to 1024x1024)
- **Batch Processing**: Batch sizes from 1 to 32 images
- **Cold vs Warm Start**: Model initialization overhead
- **Latency Percentiles**: P50, P95, P99 measurements
- **Throughput**: Images per second

**Key Metrics:**
- Mean latency
- P95/P99 latency
- Throughput (images/sec)
- Batch efficiency

### 2. Preprocessing Benchmarks (`preprocessing.rs`)

Image preprocessing pipeline performance:

- **Individual Transforms**: Grayscale, blur, threshold, edge detection
- **Full Pipeline**: Sequential preprocessing chain
- **Parallel vs Sequential**: Batch processing comparison
- **Resize Operations**: Nearest neighbor and bilinear interpolation

**Key Metrics:**
- Transform latency
- Pipeline total time
- Parallel speedup
- Memory overhead
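
To make the individual-transform measurements concrete, here is a minimal sketch of the kind of per-pixel operation the grayscale benchmark times. It is illustrative only: it assumes interleaved 8-bit RGB input and uses fixed-point ITU-R BT.601 luma weights; the actual pipeline's function names and conventions may differ.

```rust
/// Convert interleaved RGB bytes to single-channel grayscale using
/// fixed-point BT.601 luma weights (0.299 R + 0.587 G + 0.114 B).
/// Hypothetical helper for illustration; not the crate's real API.
fn grayscale(rgb: &[u8]) -> Vec<u8> {
    rgb.chunks_exact(3)
        .map(|p| {
            let (r, g, b) = (p[0] as u32, p[1] as u32, p[2] as u32);
            // Integer arithmetic avoids per-pixel float cost.
            ((299 * r + 587 * g + 114 * b) / 1000) as u8
        })
        .collect()
}
```

A benchmark would wrap a call like `grayscale(&image_bytes)` in Criterion's `bench_function`, varying the input size to match the 224x224 to 1024x1024 range above.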

### 3. LaTeX Generation Benchmarks (`latex_generation.rs`)

LaTeX code generation from AST:

- **Simple Expressions**: Fractions, powers, sums
- **Complex Expressions**: Matrices, integrals, summations
- **AST Traversal**: Tree depth impact on performance
- **String Building**: Optimization strategies
- **Batch Generation**: Multiple expressions

**Key Metrics:**
- Generation latency
- AST traversal time
- String concatenation efficiency
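
The traversal and string-building costs above can be pictured with a toy AST. This is a hedged sketch, not the crate's real types: it shows the common optimization of writing every node into one pre-allocated `String` rather than concatenating a fresh string per node.

```rust
/// Toy expression AST (hypothetical; the real crate's AST differs).
enum Expr {
    Num(i64),
    Frac(Box<Expr>, Box<Expr>),
    Pow(Box<Expr>, Box<Expr>),
}

/// Recursive traversal that appends into a shared output buffer,
/// so generation does one allocation instead of one per node.
fn emit(e: &Expr, out: &mut String) {
    match e {
        Expr::Num(n) => out.push_str(&n.to_string()),
        Expr::Frac(a, b) => {
            out.push_str("\\frac{");
            emit(a, out);
            out.push_str("}{");
            emit(b, out);
            out.push('}');
        }
        Expr::Pow(base, exp) => {
            emit(base, out);
            out.push_str("^{");
            emit(exp, out);
            out.push('}');
        }
    }
}

fn to_latex(e: &Expr) -> String {
    let mut s = String::with_capacity(64); // pre-size the buffer
    emit(&e, &mut s);
    s
}
```

The "tree depth impact" benchmark would feed progressively deeper nests of `Frac`/`Pow` into `to_latex` and watch how latency scales.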

### 4. Inference Benchmarks (`inference.rs`)

Neural network model inference:

- **Text Detection Model**: Bounding box detection
- **Text Recognition Model**: OCR text extraction
- **Math Model**: Mathematical notation recognition
- **Tensor Preprocessing**: Image to tensor conversion
- **Output Postprocessing**: NMS, confidence filtering, CTC decoding
- **Batch Inference**: Multi-image processing
- **Model Warm-up**: Initialization overhead
- 
**Key Metrics:**
- Inference latency per model
- Batch throughput
- Preprocessing overhead
- Postprocessing time
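
Of the postprocessing steps listed, greedy CTC decoding is simple enough to sketch here. This is an illustrative version only (it assumes label `0` is the CTC blank; the real model's alphabet and decoder may differ): collapse consecutive repeats, then drop blanks.

```rust
/// Greedy CTC decode: collapse runs of the same label, then remove
/// the blank symbol. Hypothetical helper for illustration.
fn ctc_greedy_decode(labels: &[u32], blank: u32) -> Vec<u32> {
    let mut out = Vec::new();
    let mut prev = None;
    for &l in labels {
        // Only emit a label when it differs from the previous frame
        // and is not the blank.
        if Some(l) != prev && l != blank {
            out.push(l);
        }
        prev = Some(l);
    }
    out
}
```

Note how the blank separates genuine repeats: `[1, 1, 0, 1]` decodes to `[1, 1]`, not `[1]`, which is exactly the per-frame branching the postprocessing benchmark exercises.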

### 5. Cache Benchmarks (`cache.rs`)

Embedding cache and similarity search:

- **Embedding Generation**: Image to vector embedding
- **Similarity Search**: Linear and approximate nearest neighbor
- **Cache Hit/Miss Latency**: Lookup performance
- **Cache Insertion**: Add new entries
- **Batch Operations**: Multi-query performance
- **Cache Statistics**: Memory and efficiency metrics

**Key Metrics:**
- Embedding generation time
- Search latency (linear vs ANN)
- Hit/miss ratio impact
- Memory per embedding
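
The "linear" baseline in the linear-vs-ANN comparison can be sketched in a few lines. This is a minimal, std-only illustration (cosine similarity over a `Vec` of embeddings); the suite's real cache type, distance metric, and ANN index are assumptions not shown here.

```rust
/// Cosine similarity between two equal-length embeddings.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Brute-force nearest neighbor: O(n * d) per query, the baseline
/// an ANN index is benchmarked against.
fn nearest(query: &[f32], cache: &[Vec<f32>]) -> Option<usize> {
    cache
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| {
            cosine(query, a).partial_cmp(&cosine(query, b)).unwrap()
        })
        .map(|(i, _)| i)
}
```

With 1000 cached embeddings this scan is what the "< 10ms" secondary target above is measuring; an ANN structure trades exactness for sublinear lookup.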

### 6. API Benchmarks (`api.rs`)

REST API performance:

- **Request Parsing**: JSON deserialization
- **Response Serialization**: JSON encoding
- **Concurrent Requests**: Multi-client handling
- **Middleware Overhead**: Auth, logging, validation, rate limiting
- **Error Handling**: Error response generation
- **End-to-End Request**: Full request cycle

**Key Metrics:**
- Parse/serialize latency
- Middleware overhead
- Concurrent throughput
- Error handling time

### 7. Memory Benchmarks (`memory.rs`)

Memory usage and management:

- **Peak Memory**: Maximum usage during inference
- **Memory per Image**: Batch processing memory scaling
- **Model Loading**: Memory required for model initialization
- **Memory Growth**: Leak detection over time
- **Fragmentation**: Allocation/deallocation patterns
- **Cache Memory**: Embedding storage overhead
- **Memory Pools**: Pool vs heap allocation
- **Tensor Layouts**: HWC vs CHW memory impact

**Key Metrics:**
- Peak memory usage
- Memory growth rate
- Fragmentation level
- Pool efficiency
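
One way to turn "leak detection over time" into a number is a least-squares slope over periodic memory samples: a persistently positive slope across iterations flags a leak. This is a sketch under stated assumptions, not code from the suite; sampling resident memory is platform-specific and omitted here.

```rust
/// Least-squares slope of evenly spaced memory samples (e.g. MB per
/// iteration). Near zero means stable; persistently positive suggests
/// a leak. Illustrative only; requires at least two samples.
fn growth_slope(samples_mb: &[f64]) -> f64 {
    let n = samples_mb.len() as f64;
    let mean_x = (n - 1.0) / 2.0;
    let mean_y = samples_mb.iter().sum::<f64>() / n;
    let (mut num, mut den) = (0.0, 0.0);
    for (i, &y) in samples_mb.iter().enumerate() {
        let dx = i as f64 - mean_x;
        num += dx * (y - mean_y);
        den += dx * dx;
    }
    num / den
}
```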

## HTML Reports

Criterion automatically generates detailed HTML reports with:

- Performance graphs
- Statistical analysis
- Regression detection
- Historical comparisons

### View Reports

After running benchmarks, open:

```bash
open target/criterion/report/index.html
```

Or for a specific benchmark:

```bash
open target/criterion/ocr_latency/report/index.html
```

## Interpreting Results

### Latency Metrics

- **Mean**: Average latency across all samples
- **Median (P50)**: 50th percentile - half of requests are faster
- **P95**: 95th percentile - 95% of requests are faster
- **P99**: 99th percentile - 99% of requests are faster
- **Standard Deviation**: Spread of the latency distribution
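
A percentile is read off a sorted sample set; the nearest-rank method below is a minimal sketch of what P50/P95/P99 mean (Criterion's own estimator is more elaborate).

```rust
/// Nearest-rank percentile over an ascending-sorted sample slice.
/// P95 is the smallest sample that at least 95% of samples do not exceed.
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty());
    // Rank is 1-based; ceil rounds up to the nearest whole sample.
    let rank = (p / 100.0 * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}
```

So for 100 sorted samples of 1ms..100ms, P50 is 50ms and P95 is 95ms; a single slow outlier moves P99 but barely touches the mean, which is why the targets above are stated at P95.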

### Throughput Metrics

- **Images/Second**: Processing rate
- **Batch Efficiency**: Speedup gained from batching
- **Sustainable Throughput**: Maximum rate while maintaining at least a 95% success rate
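
These two metrics reduce to simple arithmetic; the sketch below is one reading of how the document uses them, not code from the suite.

```rust
/// Images processed per second given total wall time in milliseconds.
fn throughput(images: usize, total_ms: f64) -> f64 {
    images as f64 * 1000.0 / total_ms
}

/// Batch speedup: how much faster a batch ran than processing the
/// same images one at a time at the single-image latency.
fn batch_speedup(single_ms: f64, batch_ms: f64, batch_size: usize) -> f64 {
    single_ms * batch_size as f64 / batch_ms
}
```

Plugging in the primary targets above (100ms single image, 500ms for a batch of 16) gives 32 images/sec and a 3.2x speedup from batching.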

### Regression Detection

Criterion detects performance regressions automatically:

- **Green**: Performance improved
- **Yellow**: Minor change (within noise)
- **Red**: Performance regressed

### Memory Metrics

- **Peak Usage**: Maximum memory at any point
- **Growth Rate**: Memory increase over time
- **Fragmentation**: Memory layout efficiency

## Best Practices

### Running Benchmarks

1. **Consistent Environment**: Run on the same hardware
2. **Quiet System**: Close other applications
3. **Multiple Samples**: Use sufficient sample size (50-100)
4. **Warm-up**: Allow for JIT compilation and caching
5. **Baseline Tracking**: Save results for comparison

### Analyzing Results

1. **Focus on Percentiles**: P95/P99 are more important than the mean
2. **Check Variance**: High variance indicates instability
3. **Profile Outliers**: Investigate extreme values
4. **Memory Leaks**: Monitor growth rate
5. **Regression Limits**: Set acceptable thresholds

### Optimization Workflow

1. **Baseline**: Establish current performance
2. **Profile**: Identify bottlenecks
3. **Optimize**: Implement improvements
4. **Benchmark**: Measure impact
5. **Compare**: Verify improvement vs baseline
6. **Iterate**: Repeat until targets met

## Continuous Integration

### CI Benchmark Configuration

```yaml
# .github/workflows/benchmark.yml
name: Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Run benchmarks
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh ci

      - name: Compare with baseline
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh compare main
```

## Troubleshooting

### Benchmarks Running Slowly

- Reduce sample size: `cargo bench -- --sample-size 10`
- Use quick mode: `./scripts/run_benchmarks.sh quick`
- Run specific benchmarks only

### Inconsistent Results

- Ensure system is idle
- Disable CPU frequency scaling
- Run with higher sample size
- Check for thermal throttling

### Memory Issues

- Monitor system memory during benchmarks
- Use memory profiling tools (valgrind, heaptrack)
- Check for memory leaks with growth benchmarks

## Contributing

When adding new features:

1. Add corresponding benchmarks
2. Set performance targets
3. Run baseline before/after changes
4. Document any performance impact
5. Update this documentation

## Resources

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Benchmarking Best Practices](https://easyperf.net/blog/)