# ruvector-scipix Benchmark Suite

Comprehensive performance benchmarking for the Scipix OCR clone using Criterion.

## Overview

This benchmark suite provides detailed performance analysis across all critical components of the OCR system:

- **OCR Latency**: End-to-end OCR performance metrics
- **Preprocessing**: Image preprocessing pipeline performance
- **LaTeX Generation**: LaTeX AST generation and string building
- **Inference**: Model inference benchmarks (detection, recognition, math)
- **Cache**: Embedding cache and similarity search performance
- **API**: REST API request/response handling
- **Memory**: Memory usage, growth, and fragmentation analysis

## Performance Targets

### Primary Targets

- **Single Image OCR**: < 100ms at P95
- **Batch Processing (16 images)**: < 500ms total
- **Preprocessing Pipeline**: < 20ms
- **LaTeX Generation**: < 5ms

### Secondary Targets

- **Cache Hit Latency**: < 1ms
- **Similarity Search (1000 embeddings)**: < 10ms
- **API Request Parsing**: < 0.5ms
- **Model Warm-up**: < 200ms

## Running Benchmarks

### Run All Benchmarks

```bash
cd examples/scipix
./scripts/run_benchmarks.sh all
```

### Run Specific Benchmark Suite

```bash
# OCR latency benchmarks
./scripts/run_benchmarks.sh latency

# Preprocessing benchmarks
./scripts/run_benchmarks.sh preprocessing

# LaTeX generation benchmarks
./scripts/run_benchmarks.sh latex

# Model inference benchmarks
./scripts/run_benchmarks.sh inference

# Cache benchmarks
./scripts/run_benchmarks.sh cache

# API benchmarks
./scripts/run_benchmarks.sh api

# Memory benchmarks
./scripts/run_benchmarks.sh memory
```

### Quick Benchmark Suite

For rapid iteration during development:

```bash
./scripts/run_benchmarks.sh quick
```

### CI Benchmark Suite

Minimal samples for continuous integration:

```bash
./scripts/run_benchmarks.sh ci
```

## Baseline Tracking

### Save Current Results as Baseline

```bash
BASELINE=v1.0 ./scripts/run_benchmarks.sh all
```

### Compare with Saved Baseline

```bash
./scripts/run_benchmarks.sh compare v1.0
```

### Compare with Main Branch

```bash
BASELINE=main ./scripts/run_benchmarks.sh all
./scripts/run_benchmarks.sh compare main
```

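Under the hood, these modes presumably wrap Criterion's built-in baseline support, so the same comparisons can also be run without the helper script via `cargo bench -- --save-baseline <name>` and `cargo bench -- --baseline <name>`.
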
## Benchmark Details

### 1. OCR Latency Benchmarks (`ocr_latency.rs`)

Tests end-to-end OCR performance across various scenarios:

- **Single Image OCR**: Different image sizes (224x224 to 1024x1024)
- **Batch Processing**: Batch sizes from 1 to 32 images
- **Cold vs Warm Start**: Model initialization overhead
- **Latency Percentiles**: P50, P95, P99 measurements
- **Throughput**: Images per second

**Key Metrics:**

- Mean latency
- P95/P99 latency
- Throughput (images/sec)
- Batch efficiency

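As a rough sketch of how such a size-parameterized Criterion benchmark is typically shaped (the `run_ocr` function and raw byte buffers below are placeholders, not this crate's actual API):

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Placeholder for the real end-to-end OCR entry point.
fn run_ocr(image: &[u8]) -> usize {
    image.iter().map(|&b| b as usize).sum()
}

fn bench_single_image(c: &mut Criterion) {
    let mut group = c.benchmark_group("ocr_single_image");
    for size in [224u32, 512, 1024] {
        // Stand-in grayscale buffer; the real benchmark would load a test image.
        let image = vec![128u8; (size * size) as usize];
        group.throughput(Throughput::Elements(1)); // lets Criterion report images/sec
        group.bench_with_input(BenchmarkId::from_parameter(size), &image, |b, img| {
            b.iter(|| run_ocr(black_box(img)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_single_image);
criterion_main!(benches);
```
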
### 2. Preprocessing Benchmarks (`preprocessing.rs`)

Image preprocessing pipeline performance:

- **Individual Transforms**: Grayscale, blur, threshold, edge detection
- **Full Pipeline**: Sequential preprocessing chain
- **Parallel vs Sequential**: Batch processing comparison
- **Resize Operations**: Nearest neighbor and bilinear interpolation

**Key Metrics:**

- Transform latency
- Pipeline total time
- Parallel speedup
- Memory overhead

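A minimal sketch of the parallel-vs-sequential comparison, assuming `rayon` is available as a dependency and using a trivial stand-in transform:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rayon::prelude::*;

// Stand-in for a real per-image transform (grayscale, blur, ...).
fn transform(img: &[u8]) -> Vec<u8> {
    img.iter().map(|&b| 255 - b).collect()
}

fn bench_batch_preprocessing(c: &mut Criterion) {
    // 16 fake 512x512 grayscale images
    let batch: Vec<Vec<u8>> = (0..16).map(|_| vec![128u8; 512 * 512]).collect();

    c.bench_function("preprocess_sequential", |b| {
        b.iter(|| batch.iter().map(|img| transform(img)).collect::<Vec<_>>())
    });

    c.bench_function("preprocess_parallel", |b| {
        b.iter(|| batch.par_iter().map(|img| transform(img)).collect::<Vec<_>>())
    });
}

criterion_group!(benches, bench_batch_preprocessing);
criterion_main!(benches);
```
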
### 3. LaTeX Generation Benchmarks (`latex_generation.rs`)

LaTeX code generation from AST:

- **Simple Expressions**: Fractions, powers, sums
- **Complex Expressions**: Matrices, integrals, summations
- **AST Traversal**: Tree depth impact on performance
- **String Building**: Optimization strategies
- **Batch Generation**: Multiple expressions

**Key Metrics:**

- Generation latency
- AST traversal time
- String concatenation efficiency

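To illustrate the string-building dimension, one plausible shape for the comparison: per-iteration `format!` allocation versus a buffer pre-sized to the output (the term list is a fabricated stand-in for a real AST):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn build_naive(terms: &[&str]) -> String {
    let mut out = String::new();
    for &t in terms {
        // format! allocates a fresh string on every iteration
        out = format!("{out}{t} + ");
    }
    out
}

fn build_preallocated(terms: &[&str]) -> String {
    // One up-front allocation sized to the final output
    let cap: usize = terms.iter().map(|t| t.len() + 3).sum();
    let mut out = String::with_capacity(cap);
    for &t in terms {
        out.push_str(t);
        out.push_str(" + ");
    }
    out
}

fn bench_string_building(c: &mut Criterion) {
    let terms = vec![r"\frac{a}{b}"; 200];
    c.bench_function("latex_naive", |b| b.iter(|| build_naive(black_box(&terms))));
    c.bench_function("latex_preallocated", |b| {
        b.iter(|| build_preallocated(black_box(&terms)))
    });
}

criterion_group!(benches, bench_string_building);
criterion_main!(benches);
```
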
### 4. Inference Benchmarks (`inference.rs`)

Neural network model inference:

- **Text Detection Model**: Bounding box detection
- **Text Recognition Model**: OCR text extraction
- **Math Model**: Mathematical notation recognition
- **Tensor Preprocessing**: Image to tensor conversion
- **Output Postprocessing**: NMS, confidence filtering, CTC decoding
- **Batch Inference**: Multi-image processing
- **Model Warm-up**: Initialization overhead

**Key Metrics:**

- Inference latency per model
- Batch throughput
- Preprocessing overhead
- Postprocessing time

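A sketch of how cold-start cost can be separated from warm inference; `Model` here is a hypothetical stand-in for the real detection/recognition model types:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical model; the real benchmark would load actual weights.
struct Model {
    weights: Vec<f32>,
}

impl Model {
    fn load() -> Self {
        // Simulates deserializing weights from disk
        Model { weights: vec![0.5; 1 << 16] }
    }

    fn infer(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

fn bench_cold_vs_warm(c: &mut Criterion) {
    let input = vec![1.0f32; 1 << 16];

    // Cold start: construction is timed together with the first inference
    c.bench_function("inference_cold", |b| {
        b.iter(|| {
            let model = Model::load();
            model.infer(black_box(&input))
        })
    });

    // Warm: the model is built once, outside the timing loop
    let model = Model::load();
    c.bench_function("inference_warm", |b| b.iter(|| model.infer(black_box(&input))));
}

criterion_group!(benches, bench_cold_vs_warm);
criterion_main!(benches);
```
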
### 5. Cache Benchmarks (`cache.rs`)

Embedding cache and similarity search:

- **Embedding Generation**: Image to vector embedding
- **Similarity Search**: Linear and approximate nearest neighbor
- **Cache Hit/Miss Latency**: Lookup performance
- **Cache Insertion**: Add new entries
- **Batch Operations**: Multi-query performance
- **Cache Statistics**: Memory and efficiency metrics

**Key Metrics:**

- Embedding generation time
- Search latency (linear vs ANN)
- Hit/miss ratio impact
- Memory per embedding

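For intuition, a brute-force cosine scan over 1,000 embeddings, roughly what the linear-search baseline measures (the dimension and data below are fabricated):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

// Returns the index of the most similar embedding
fn linear_search(query: &[f32], index: &[Vec<f32>]) -> usize {
    let mut best = (0, f32::MIN);
    for (i, emb) in index.iter().enumerate() {
        let score = cosine(query, emb);
        if score > best.1 {
            best = (i, score);
        }
    }
    best.0
}

fn bench_similarity_search(c: &mut Criterion) {
    let dim = 256;
    // Deterministic non-zero fake embeddings
    let index: Vec<Vec<f32>> = (0..1000)
        .map(|i| (0..dim).map(|j| ((i + j) % 13 + 1) as f32).collect())
        .collect();
    let query: Vec<f32> = (0..dim).map(|j| ((j * 7) % 13 + 1) as f32).collect();

    c.bench_function("linear_search_1000", |b| {
        b.iter(|| linear_search(black_box(&query), black_box(&index)))
    });
}

criterion_group!(benches, bench_similarity_search);
criterion_main!(benches);
```
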
### 6. API Benchmarks (`api.rs`)

REST API performance:

- **Request Parsing**: JSON deserialization
- **Response Serialization**: JSON encoding
- **Concurrent Requests**: Multi-client handling
- **Middleware Overhead**: Auth, logging, validation, rate limiting
- **Error Handling**: Error response generation
- **End-to-End Request**: Full request cycle

**Key Metrics:**

- Parse/serialize latency
- Middleware overhead
- Concurrent throughput
- Error handling time

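A sketch of the parse/serialize micro-benchmarks using `serde_json` (assuming serde's `derive` feature); `OcrRequest` is a guessed shape, not the service's real schema:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use serde::{Deserialize, Serialize};

// Guessed request shape for illustration only.
#[derive(Serialize, Deserialize)]
struct OcrRequest {
    image_base64: String,
    formats: Vec<String>,
}

fn bench_api_serde(c: &mut Criterion) {
    let raw = r#"{"image_base64":"aGVsbG8=","formats":["text","latex"]}"#;

    c.bench_function("request_parse", |b| {
        b.iter(|| serde_json::from_str::<OcrRequest>(black_box(raw)).unwrap())
    });

    let req: OcrRequest = serde_json::from_str(raw).unwrap();
    c.bench_function("response_serialize", |b| {
        b.iter(|| serde_json::to_string(black_box(&req)).unwrap())
    });
}

criterion_group!(benches, bench_api_serde);
criterion_main!(benches);
```
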
### 7. Memory Benchmarks (`memory.rs`)

Memory usage and management:

- **Peak Memory**: Maximum usage during inference
- **Memory per Image**: Batch processing memory scaling
- **Model Loading**: Memory required for model initialization
- **Memory Growth**: Leak detection over time
- **Fragmentation**: Allocation/deallocation patterns
- **Cache Memory**: Embedding storage overhead
- **Memory Pools**: Pool vs heap allocation
- **Tensor Layouts**: HWC vs CHW memory impact

**Key Metrics:**

- Peak memory usage
- Memory growth rate
- Fragmentation level
- Pool efficiency

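Criterion measures time, not memory, so memory benchmarks generally need extra instrumentation. One common pattern is a counting wrapper around the system allocator; a minimal sketch:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

// Forwards to the system allocator while tracking live and peak bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            let now = ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let buf = vec![0u8; 10 * 1024 * 1024]; // simulate one decoded image batch
    drop(buf);
    println!("peak heap observed: {} bytes", PEAK.load(Ordering::Relaxed));
}
```
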
## HTML Reports

Criterion automatically generates detailed HTML reports with:

- Performance graphs
- Statistical analysis
- Regression detection
- Historical comparisons

### View Reports

After running benchmarks, open:

```bash
open target/criterion/report/index.html
```

Or for a specific benchmark:

```bash
open target/criterion/ocr_latency/report/index.html
```

## Interpreting Results

### Latency Metrics

- **Mean**: Average latency across all samples
- **Median (P50)**: 50th percentile - half of requests are faster
- **P95**: 95th percentile - 95% of requests are faster
- **P99**: 99th percentile - 99% of requests are faster
- **Standard Deviation**: Spread of latencies around the mean

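For reference, a nearest-rank percentile over raw samples (one common convention; Criterion's reported statistics come from bootstrap estimators, so its numbers may differ slightly):

```rust
/// Nearest-rank percentile: the smallest sample at or above p percent of the data.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[(rank.max(1) - 1).min(sorted.len() - 1)]
}

fn main() {
    let mut samples = vec![12.0, 9.5, 11.2, 30.1, 10.8, 9.9, 13.4, 95.0];
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("P50 = {:.1} ms", percentile(&samples, 50.0)); // 11.2
    println!("P95 = {:.1} ms", percentile(&samples, 95.0)); // 95.0
}
```
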
### Throughput Metrics

- **Images/Second**: Processing rate
- **Batch Efficiency**: Speedup from batching
- **Sustainable Throughput**: Maximum rate while keeping at least 95% of requests successful

### Regression Detection

Criterion detects performance regressions automatically:

- **Green**: Performance improved
- **Yellow**: Minor change (within noise)
- **Red**: Performance regressed

### Memory Metrics

- **Peak Usage**: Maximum memory at any point
- **Growth Rate**: Memory increase over time
- **Fragmentation**: Memory layout efficiency

## Best Practices

### Running Benchmarks

1. **Consistent Environment**: Run on the same hardware
2. **Quiet System**: Close other applications
3. **Multiple Samples**: Use a sufficient sample size (50-100)
4. **Warm-up**: Allow for cache warm-up and lazy initialization
5. **Baseline Tracking**: Save results for comparison

### Analyzing Results

1. **Focus on Percentiles**: P95/P99 matter more than the mean
2. **Check Variance**: High variance indicates instability
3. **Profile Outliers**: Investigate extreme values
4. **Memory Leaks**: Monitor growth rate
5. **Regression Limits**: Set acceptable thresholds

### Optimization Workflow

1. **Baseline**: Establish current performance
2. **Profile**: Identify bottlenecks
3. **Optimize**: Implement improvements
4. **Benchmark**: Measure impact
5. **Compare**: Verify improvement vs baseline
6. **Iterate**: Repeat until targets are met

## Continuous Integration

### CI Benchmark Configuration

```yaml
# .github/workflows/benchmark.yml
name: Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Run benchmarks
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh ci

      - name: Compare with baseline
        run: |
          cd examples/scipix
          ./scripts/run_benchmarks.sh compare main
```

## Troubleshooting

### Benchmarks Running Slowly

- Reduce sample size: `cargo bench -- --sample-size 10`
- Use quick mode: `./scripts/run_benchmarks.sh quick`
- Run specific benchmarks only

### Inconsistent Results

- Ensure the system is idle
- Disable CPU frequency scaling
- Run with a higher sample size
- Check for thermal throttling

### Memory Issues

- Monitor system memory during benchmarks
- Use memory profiling tools (valgrind, heaptrack)
- Check for memory leaks with growth benchmarks

## Contributing

When adding new features:

1. Add corresponding benchmarks
2. Set performance targets
3. Run baselines before and after changes
4. Document any performance impact
5. Update this documentation

## Resources

- [Criterion.rs Documentation](https://bheisler.github.io/criterion.rs/book/)
- [Rust Performance Book](https://nnethercote.github.io/perf-book/)
- [Benchmarking Best Practices](https://easyperf.net/blog/)