# ruvector-scipix Benchmark Suite

Comprehensive performance benchmarking for the Scipix OCR clone, using Criterion.

## Overview

This benchmark suite provides detailed performance analysis across all critical components of the OCR system:

- **OCR Latency**: end-to-end OCR performance metrics
- **Preprocessing**: image preprocessing pipeline performance
- **LaTeX Generation**: LaTeX AST generation and string building
- **Inference**: model inference benchmarks (detection, recognition, math)
- **Cache**: embedding cache and similarity search performance
- **API**: REST API request/response handling
- **Memory**: memory usage, growth, and fragmentation analysis
## Performance Targets

### Primary Targets

- **Single Image OCR**: < 100 ms at P95
- **Batch Processing (16 images)**: < 500 ms total
- **Preprocessing Pipeline**: < 20 ms
- **LaTeX Generation**: < 5 ms

### Secondary Targets

- **Cache Hit Latency**: < 1 ms
- **Similarity Search (1000 embeddings)**: < 10 ms
- **API Request Parsing**: < 0.5 ms
- **Model Warm-up**: < 200 ms
## Running Benchmarks

### Run All Benchmarks

```bash
cd examples/scipix
./scripts/run_benchmarks.sh all
```
### Run a Specific Benchmark Suite

```bash
# OCR latency benchmarks
./scripts/run_benchmarks.sh latency

# Preprocessing benchmarks
./scripts/run_benchmarks.sh preprocessing

# LaTeX generation benchmarks
./scripts/run_benchmarks.sh latex

# Model inference benchmarks
./scripts/run_benchmarks.sh inference

# Cache benchmarks
./scripts/run_benchmarks.sh cache

# API benchmarks
./scripts/run_benchmarks.sh api

# Memory benchmarks
./scripts/run_benchmarks.sh memory
```
### Quick Benchmark Suite

For rapid iteration during development:

```bash
./scripts/run_benchmarks.sh quick
```

### CI Benchmark Suite

Minimal sample counts for continuous integration:

```bash
./scripts/run_benchmarks.sh ci
```
## Baseline Tracking

### Save Current Results as a Baseline

```bash
BASELINE=v1.0 ./scripts/run_benchmarks.sh all
```

### Compare with a Saved Baseline

```bash
./scripts/run_benchmarks.sh compare v1.0
```

### Compare with the Main Branch

```bash
BASELINE=main ./scripts/run_benchmarks.sh all
./scripts/run_benchmarks.sh compare main
```
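Under the hood, the wrapper script presumably forwards to Criterion's built-in baseline support; if you prefer to invoke Criterion directly, the equivalent flags are:

```shell
# Save the current run as a named baseline (Criterion built-in flag)
cargo bench -- --save-baseline v1.0

# Run again later, comparing against that saved baseline
cargo bench -- --baseline v1.0
```

Named baselines are stored under `target/criterion` alongside the HTML reports.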
## Benchmark Details

### 1. OCR Latency Benchmarks (`ocr_latency.rs`)

Tests end-to-end OCR performance across various scenarios:

- **Single Image OCR**: different image sizes (224×224 to 1024×1024)
- **Batch Processing**: batch sizes from 1 to 32 images
- **Cold vs. Warm Start**: model initialization overhead
- **Latency Percentiles**: P50, P95, and P99 measurements
- **Throughput**: images per second

**Key Metrics:**

- Mean latency
- P95/P99 latency
- Throughput (images/sec)
- Batch efficiency
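As a rough illustration of how these percentiles are computed, here is a minimal std-only sketch (Criterion does this, and much more, internally); `fake_ocr` is a hypothetical stand-in for the real pipeline:

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real OCR call; replace with the actual pipeline.
fn fake_ocr(pixels: &[u8]) -> usize {
    pixels.iter().filter(|&&p| p > 128).count()
}

/// Return the p-th percentile (p in 0.0..=1.0) of a set of timing samples.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    samples.sort();
    let idx = ((samples.len() as f64 - 1.0) * p).round() as usize;
    samples[idx]
}

fn main() {
    let image = vec![0u8; 224 * 224];
    // Collect 100 timing samples of the stubbed OCR call.
    let mut samples: Vec<Duration> = (0..100)
        .map(|_| {
            let start = Instant::now();
            std::hint::black_box(fake_ocr(&image)); // keep the call from being optimized away
            start.elapsed()
        })
        .collect();

    println!("p50 = {:?}", percentile(&mut samples, 0.50));
    println!("p95 = {:?}", percentile(&mut samples, 0.95));
    println!("p99 = {:?}", percentile(&mut samples, 0.99));
}
```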
### 2. Preprocessing Benchmarks (`preprocessing.rs`)

Image preprocessing pipeline performance:

- **Individual Transforms**: grayscale, blur, threshold, edge detection
- **Full Pipeline**: the sequential preprocessing chain
- **Parallel vs. Sequential**: batch processing comparison
- **Resize Operations**: nearest-neighbor and bilinear interpolation

**Key Metrics:**

- Transform latency
- Pipeline total time
- Parallel speedup
- Memory overhead
### 3. LaTeX Generation Benchmarks (`latex_generation.rs`)

LaTeX code generation from the AST:

- **Simple Expressions**: fractions, powers, sums
- **Complex Expressions**: matrices, integrals, summations
- **AST Traversal**: impact of tree depth on performance
- **String Building**: optimization strategies
- **Batch Generation**: multiple expressions

**Key Metrics:**

- Generation latency
- AST traversal time
- String concatenation efficiency
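The string-building strategies above largely come down to avoiding intermediate allocations. A minimal sketch of the idea, using a hypothetical two-variant AST rather than the project's actual node type:

```rust
/// Hypothetical minimal AST node, for illustration only.
enum Node {
    Symbol(&'static str),
    Frac(Box<Node>, Box<Node>),
}

/// Naive generation: every `format!` allocates a fresh intermediate string.
fn generate_naive(node: &Node) -> String {
    match node {
        Node::Symbol(s) => s.to_string(),
        Node::Frac(num, den) => {
            format!("\\frac{{{}}}{{{}}}", generate_naive(num), generate_naive(den))
        }
    }
}

/// Faster: thread one pre-allocated buffer through the whole traversal.
fn generate_into(node: &Node, out: &mut String) {
    match node {
        Node::Symbol(s) => out.push_str(s),
        Node::Frac(num, den) => {
            out.push_str("\\frac{");
            generate_into(num, out);
            out.push_str("}{");
            generate_into(den, out);
            out.push('}');
        }
    }
}

fn main() {
    let ast = Node::Frac(Box::new(Node::Symbol("x")), Box::new(Node::Symbol("y")));
    let mut buf = String::with_capacity(64);
    generate_into(&ast, &mut buf);
    assert_eq!(buf, generate_naive(&ast));
    println!("{buf}"); // \frac{x}{y}
}
```

Threading one pre-allocated buffer through the traversal turns many small allocations into, at most, a few buffer growths.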
### 4. Inference Benchmarks (`inference.rs`)

Neural network model inference:

- **Text Detection Model**: bounding box detection
- **Text Recognition Model**: OCR text extraction
- **Math Model**: mathematical notation recognition
- **Tensor Preprocessing**: image-to-tensor conversion
- **Output Postprocessing**: NMS, confidence filtering, CTC decoding
- **Batch Inference**: multi-image processing
- **Model Warm-up**: initialization overhead

**Key Metrics:**

- Inference latency per model
- Batch throughput
- Preprocessing overhead
- Postprocessing time
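As an illustration of the postprocessing stage, here is a self-contained sketch of confidence filtering followed by greedy NMS; the `Det` type and the thresholds are hypothetical, not the project's actual types:

```rust
/// Hypothetical detection box: (x1, y1, x2, y2) corners plus a confidence score.
#[derive(Clone, Copy, Debug)]
struct Det { x1: f32, y1: f32, x2: f32, y2: f32, score: f32 }

/// Intersection-over-union of two boxes (0.0 when they do not overlap).
fn iou(a: Det, b: Det) -> f32 {
    let ix = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let iy = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = ix * iy;
    let area = |d: Det| (d.x2 - d.x1) * (d.y2 - d.y1);
    inter / (area(a) + area(b) - inter)
}

/// Confidence filtering followed by greedy non-maximum suppression.
fn nms(mut dets: Vec<Det>, conf_thresh: f32, iou_thresh: f32) -> Vec<Det> {
    dets.retain(|d| d.score >= conf_thresh);
    dets.sort_by(|a, b| b.score.total_cmp(&a.score)); // highest confidence first
    let mut keep: Vec<Det> = Vec::new();
    for d in dets {
        // Keep a box only if it does not overlap an already-kept box too much.
        if keep.iter().all(|k| iou(*k, d) < iou_thresh) {
            keep.push(d);
        }
    }
    keep
}

fn main() {
    let dets = vec![
        Det { x1: 0.0, y1: 0.0, x2: 10.0, y2: 10.0, score: 0.9 },
        Det { x1: 1.0, y1: 1.0, x2: 11.0, y2: 11.0, score: 0.8 }, // overlaps the first
        Det { x1: 50.0, y1: 50.0, x2: 60.0, y2: 60.0, score: 0.7 },
        Det { x1: 0.0, y1: 0.0, x2: 5.0, y2: 5.0, score: 0.2 },   // below threshold
    ];
    let kept = nms(dets, 0.5, 0.5);
    println!("kept {} boxes", kept.len()); // kept 2 boxes
}
```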
### 5. Cache Benchmarks (`cache.rs`)

Embedding cache and similarity search:

- **Embedding Generation**: image-to-vector embedding
- **Similarity Search**: linear scan and approximate nearest neighbor (ANN)
- **Cache Hit/Miss Latency**: lookup performance
- **Cache Insertion**: adding new entries
- **Batch Operations**: multi-query performance
- **Cache Statistics**: memory and efficiency metrics

**Key Metrics:**

- Embedding generation time
- Search latency (linear vs. ANN)
- Hit/miss ratio impact
- Memory per embedding
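The linear similarity-search baseline can be sketched in a few lines of std-only Rust; the embedding layout here is hypothetical:

```rust
/// Cosine similarity between two embedding vectors (assumed equal length).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Linear scan over the cache: O(n·d). This is the baseline that an
/// approximate-nearest-neighbor index is measured against.
fn nearest(query: &[f32], cache: &[Vec<f32>]) -> Option<(usize, f32)> {
    cache
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine(query, e)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

fn main() {
    let cache = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.7, 0.7, 0.0],
    ];
    let (idx, sim) = nearest(&[0.9, 0.1, 0.0], &cache).unwrap();
    println!("best match: index {idx}, similarity {sim:.3}");
}
```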
### 6. API Benchmarks (`api.rs`)

REST API performance:

- **Request Parsing**: JSON deserialization
- **Response Serialization**: JSON encoding
- **Concurrent Requests**: multi-client handling
- **Middleware Overhead**: auth, logging, validation, rate limiting
- **Error Handling**: error response generation
- **End-to-End Request**: the full request cycle

**Key Metrics:**

- Parse/serialize latency
- Middleware overhead
- Concurrent throughput
- Error handling time
### 7. Memory Benchmarks (`memory.rs`)

Memory usage and management:

- **Peak Memory**: maximum usage during inference
- **Memory per Image**: how memory scales with batch size
- **Model Loading**: memory required for model initialization
- **Memory Growth**: leak detection over time
- **Fragmentation**: allocation/deallocation patterns
- **Cache Memory**: embedding storage overhead
- **Memory Pools**: pool vs. heap allocation
- **Tensor Layouts**: HWC vs. CHW memory impact

**Key Metrics:**

- Peak memory usage
- Memory growth rate
- Fragmentation level
- Pool efficiency
## HTML Reports

Criterion automatically generates detailed HTML reports with:

- Performance graphs
- Statistical analysis
- Regression detection
- Historical comparisons

### View Reports

After running benchmarks, open:

```bash
open target/criterion/report/index.html
```

Or, for a specific benchmark:

```bash
open target/criterion/ocr_latency/report/index.html
```
## Interpreting Results

### Latency Metrics

- **Mean**: average latency across all samples
- **Median (P50)**: 50th percentile; half of all requests are faster
- **P95**: 95th percentile; 95% of requests are faster
- **P99**: 99th percentile; 99% of requests are faster
- **Standard Deviation**: spread of the latency distribution

### Throughput Metrics

- **Images/Second**: processing rate
- **Batch Efficiency**: speedup gained from batching
- **Sustainable Throughput**: the maximum rate at which at least 95% of requests still succeed
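Batch efficiency, as used here, can be read as the speedup of one n-image batch over n sequential single-image runs. A small sketch with made-up timings:

```rust
use std::time::Duration;

/// Speedup from batching: the time for n sequential single-image runs
/// divided by the time for one batch of n images. 1.0 means batching
/// gains nothing; values approaching n mean near-perfect scaling.
fn batch_efficiency(single: Duration, batch: Duration, n: u32) -> f64 {
    (single.as_secs_f64() * n as f64) / batch.as_secs_f64()
}

fn main() {
    // Hypothetical measurements: 80 ms per image alone, 500 ms for a 16-image batch.
    let speedup = batch_efficiency(Duration::from_millis(80), Duration::from_millis(500), 16);
    println!("batch speedup: {speedup:.2}x");
}
```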
### Regression Detection

Criterion detects performance regressions automatically:

- **Green**: performance improved
- **Yellow**: minor change (within noise)
- **Red**: performance regressed

### Memory Metrics

- **Peak Usage**: maximum memory at any point
- **Growth Rate**: memory increase over time
- **Fragmentation**: memory layout efficiency
## Best Practices

### Running Benchmarks

- **Consistent Environment**: run on the same hardware every time
- **Quiet System**: close other applications
- **Multiple Samples**: use a sufficient sample size (50-100)
- **Warm-up**: allow for one-time initialization and cache warming
- **Baseline Tracking**: save results for later comparison

### Analyzing Results

- **Focus on Percentiles**: P95/P99 matter more than the mean
- **Check Variance**: high variance indicates instability
- **Profile Outliers**: investigate extreme values
- **Memory Leaks**: monitor the growth rate
- **Regression Limits**: set acceptable thresholds
### Optimization Workflow

1. **Baseline**: establish current performance
2. **Profile**: identify bottlenecks
3. **Optimize**: implement improvements
4. **Benchmark**: measure the impact
5. **Compare**: verify the improvement against the baseline
6. **Iterate**: repeat until targets are met
## Continuous Integration

### CI Benchmark Configuration

```yaml
# .github/workflows/benchmark.yml
name: Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - name: Run benchmarks
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh ci
      - name: Compare with baseline
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh compare main
```
## Troubleshooting

### Benchmarks Running Slowly

- Reduce the sample size: `cargo bench -- --sample-size 10`
- Use quick mode: `./scripts/run_benchmarks.sh quick`
- Run only the specific benchmarks you need
### Inconsistent Results

- Ensure the system is idle
- Disable CPU frequency scaling
- Run with a higher sample size
- Check for thermal throttling
### Memory Issues

- Monitor system memory during benchmark runs
- Use memory profiling tools (e.g., Valgrind, heaptrack)
- Check for leaks with the memory growth benchmarks
## Contributing

When adding new features:

- Add corresponding benchmarks
- Set performance targets
- Run a baseline before and after the change
- Document any performance impact
- Update this documentation