
Benchmarking Strategy for Ruvector-Scipix OCR System

Overview

This document outlines a comprehensive benchmarking strategy for the ruvector-scipix OCR system, covering performance metrics, accuracy metrics, datasets, baselines, and implementation details.

1. Performance Metrics

1.1 Latency Metrics

Measure end-to-end processing time from image input to LaTeX output:

  • P50 (Median): 50th percentile latency - typical processing time
  • P95: 95th percentile latency - most requests complete within this time
  • P99: 99th percentile latency - captures tail latency for SLA requirements
  • P99.9: 99.9th percentile latency - extreme outliers

Target Benchmarks:

Single Image Processing:
- P50: < 200ms (small images, <1MB)
- P95: < 500ms
- P99: < 1000ms
- P99.9: < 2000ms

Batch Processing (10 images):
- P50: < 1500ms
- P95: < 3000ms
- P99: < 5000ms

Component-Level Latency:

  • Image preprocessing: < 50ms
  • Model inference: < 150ms (GPU), < 800ms (CPU)
  • Post-processing/formatting: < 20ms
  • NAPI overhead: < 10ms
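
The percentile targets above can be computed from recorded per-request latencies with a simple nearest-rank calculation. A minimal sketch (the `percentile` and `latency_report` helpers are illustrative, not part of the crate):

```rust
/// Nearest-rank percentile over a sorted slice of latencies (ms).
/// `p` is the percentile in [0, 100]; the slice must be non-empty.
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.max(1) - 1]
}

/// Sort the raw samples once, then read off P50/P95/P99.
fn latency_report(mut samples_ms: Vec<f64>) -> (f64, f64, f64) {
    samples_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (
        percentile(&samples_ms, 50.0),
        percentile(&samples_ms, 95.0),
        percentile(&samples_ms, 99.0),
    )
}
```

P99.9 needs on the order of thousands of samples before the estimate stabilizes, which is why the Criterion setups later in this document raise `sample_size` for percentile runs.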

1.2 Throughput Metrics

Measure processing capacity under sustained load:

  • Images per second (img/s): Single-threaded performance
  • Pages per minute (ppm): Batch processing performance
  • Concurrent requests: Multi-threaded throughput
  • GPU utilization: Percentage of GPU compute used

Target Benchmarks:

Single-threaded:
- GPU: 10-15 img/s
- CPU: 2-3 img/s

Multi-threaded (8 cores):
- GPU: 40-50 img/s
- CPU: 8-12 img/s

Batch Processing:
- GPU: 60-80 ppm
- CPU: 15-20 ppm
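
Throughput in img/s falls directly out of a timed run over a fixed workload. A hedged sketch, where the `process` closure stands in for any of the OCR entry points above:

```rust
use std::time::Instant;

/// Run `process` `n_images` times and return images per second.
fn images_per_second<F: FnMut()>(mut process: F, n_images: usize) -> f64 {
    let start = Instant::now();
    for _ in 0..n_images {
        process();
    }
    n_images as f64 / start.elapsed().as_secs_f64()
}
```

For one-page images, pages per minute is simply img/s × 60; sustained-load numbers should be taken after a warm-up pass so model caches are populated.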

1.3 Memory Usage

Track memory consumption patterns:

  • Peak memory: Maximum memory usage during processing
  • Average memory: Typical memory footprint
  • Memory per image: Incremental memory for each image
  • Memory leaks: Long-running stability tests

Target Benchmarks:

Model Loading:
- Peak: < 2GB (GPU), < 1GB (CPU)

Per-Image Processing:
- Peak: < 500MB
- Average: < 200MB

Batch Processing (100 images):
- Peak: < 3GB
- Average: < 1.5GB
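
One way to check the peak-memory targets above is to read `VmHWM` (peak resident set size) from `/proc/self/status` after a run; a sketch, Linux-only by assumption:

```rust
use std::fs;

/// Peak resident set size of the current process in KiB (Linux only).
fn peak_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}
```

`VmHWM` is a high-water mark, so it captures the peak of the whole run; the allocator-based tracker in section 6.2 is better suited for per-image deltas.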

1.4 Model Loading Time

Measure initialization overhead:

  • Cold start: First-time model loading
  • Warm start: Cached model loading
  • Memory mapping: mmap performance for large models

Target Benchmarks:

Cold Start:
- GPU: < 5s
- CPU: < 3s

Warm Start:
- GPU: < 1s
- CPU: < 500ms
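
The cold/warm split comes down to whether the model bytes are already resident in the process. The effect can be sketched with a `OnceLock`-style cache (the `load_model` helper is illustrative, not the crate's API):

```rust
use std::sync::OnceLock;
use std::time::{Duration, Instant};

static MODEL: OnceLock<Vec<u8>> = OnceLock::new();

/// First call pays the one-time "load" (cold start); every later
/// call returns the cached bytes (warm start).
fn load_model() -> &'static [u8] {
    MODEL.get_or_init(|| vec![0u8; 1024 * 1024])
}

/// Time a single load attempt.
fn timed_load() -> Duration {
    let start = Instant::now();
    load_model();
    start.elapsed()
}
```

Benchmark harnesses should report the first `timed_load` separately from subsequent ones; averaging them together hides the cold-start cost entirely.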

2. Accuracy Metrics

2.1 Character Error Rate (CER)

Measures character-level accuracy:

CER = (Substitutions + Deletions + Insertions) / Total_Characters

Target: CER < 2% on standard datasets

Implementation:

fn calculate_cer(reference: &str, hypothesis: &str) -> f64 {
    // Assumes a `levenshtein_distance` helper (a DP implementation
    // appears in section 5.4).
    let ref_chars: Vec<char> = reference.chars().collect();
    let hyp_chars: Vec<char> = hypothesis.chars().collect();

    // Guard against division by zero on an empty reference.
    if ref_chars.is_empty() {
        return if hyp_chars.is_empty() { 0.0 } else { 1.0 };
    }

    let distance = levenshtein_distance(&ref_chars, &hyp_chars);
    distance as f64 / ref_chars.len() as f64
}

2.2 Word Error Rate (WER)

Measures word-level accuracy:

WER = (Substitutions + Deletions + Insertions) / Total_Words

Target: WER < 5% on standard datasets

Implementation:

fn calculate_wer(reference: &str, hypothesis: &str) -> f64 {
    let ref_words: Vec<&str> = reference.split_whitespace().collect();
    let hyp_words: Vec<&str> = hypothesis.split_whitespace().collect();

    // Guard against division by zero on an empty reference.
    if ref_words.is_empty() {
        return if hyp_words.is_empty() { 0.0 } else { 1.0 };
    }

    let distance = levenshtein_distance(&ref_words, &hyp_words);
    distance as f64 / ref_words.len() as f64
}

2.3 BLEU Score for LaTeX Output

Measures LaTeX generation quality (0-100 scale):

BLEU = BP × exp(Σ_n w_n × log p_n)

Target: BLEU > 85 on math expression datasets

Implementation:

fn calculate_bleu(reference: &str, hypothesis: &str, max_n: usize) -> f64 {
    // Geometric mean of n-gram precisions (uniform weights w_n = 1/max_n),
    // times the brevity penalty, matching the formula above.
    let mut log_precision_sum = 0.0;

    for n in 1..=max_n {
        let ref_ngrams = extract_ngrams(reference, n);
        let hyp_ngrams = extract_ngrams(hypothesis, n);
        let matches = count_matches(&ref_ngrams, &hyp_ngrams);
        if hyp_ngrams.is_empty() || matches == 0 {
            return 0.0; // any zero precision zeroes the score
        }
        log_precision_sum += (matches as f64 / hyp_ngrams.len() as f64).ln() / max_n as f64;
    }

    let bp = brevity_penalty(reference.len(), hypothesis.len());
    bp * log_precision_sum.exp() * 100.0 // scaled to the 0-100 range
}

2.4 Expression Recognition Rate (ERR)

Measures mathematical expression correctness:

ERR = Correct_Expressions / Total_Expressions

Target: ERR > 90% on complex mathematical expressions

Categories:

  • Simple expressions: 2+2, x^2
  • Fractions: \frac{a}{b}
  • Matrices: \begin{bmatrix}...\end{bmatrix}
  • Complex equations: integrals, summations, limits
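
A minimal ERR scorer can use exact match after whitespace normalization. Real evaluation might canonicalize the LaTeX first (e.g. parse it to an AST so `\frac{a}{b}` and equivalent forms compare equal), so treat this sketch as a lower bound:

```rust
/// Strip all whitespace; LaTeX rendering is whitespace-insensitive
/// for most constructs, so "a + b" and "a+b" compare equal.
fn normalize(latex: &str) -> String {
    latex.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Fraction of (reference, hypothesis) pairs that match exactly
/// after normalization.
fn expression_recognition_rate(pairs: &[(&str, &str)]) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let correct = pairs
        .iter()
        .filter(|(r, h)| normalize(r) == normalize(h))
        .count();
    correct as f64 / pairs.len() as f64
}
```

Reporting ERR per category (simple, fractions, matrices, complex) makes regressions in one expression class visible even when the aggregate rate holds steady.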

3. Benchmark Datasets

3.1 Im2latex-100k

Source: https://zenodo.org/record/56198

Description:

  • 100,000 LaTeX formula images
  • Rendered from arXiv papers
  • Variety of mathematical expressions

Usage:

# Download dataset
wget https://zenodo.org/record/56198/files/im2latex-100k.tar.gz
tar -xzf im2latex-100k.tar.gz

# Structure:
# im2latex-100k/
#   ├── images/
#   └── formulas.txt

Benchmark Focus:

  • General mathematical notation
  • Diversity of expressions
  • Standard baseline comparison

3.2 Im2latex-230k

Source: Extended Im2latex dataset

Description:

  • 230,000 LaTeX formula images
  • More complex expressions
  • Better coverage of mathematical domains

Usage:

# Download extended dataset (the record URL below is a placeholder)
wget https://zenodo.org/record/1234567/files/im2latex-230k.tar.gz
tar -xzf im2latex-230k.tar.gz

Benchmark Focus:

  • Complex mathematical expressions
  • Edge cases and rare symbols
  • Stress testing

3.3 CROHME (Handwritten Math)

Source: https://www.isical.ac.in/~crohme/

Description:

  • Competition on Recognition of Online Handwritten Mathematical Expressions
  • Handwritten formulas (not typed/rendered)
  • Real-world use case

Usage:

# Download CROHME dataset
wget http://www.isical.ac.in/~crohme/CROHME2019.zip
unzip CROHME2019.zip

Benchmark Focus:

  • Handwritten formula recognition
  • Real-world variability
  • Robustness testing

3.4 Custom Ruvector Test Set

Description:

  • Curated test set specific to ruvector use cases
  • Real user submissions
  • Edge cases discovered in production

Structure:

ruvector-testset/
├── easy/          # Simple expressions (100 samples)
├── medium/        # Moderate complexity (200 samples)
├── hard/          # Complex expressions (150 samples)
├── edge_cases/    # Known difficult cases (50 samples)
└── ground_truth.json

Creation Script:

// examples/scipix/benches/create_testset.rs
use std::fs;
use serde_json::json;

fn create_testset() {
    let testset = json!({
        "easy": [
            {"image": "easy/001.png", "latex": "x^2 + 2x + 1"},
            {"image": "easy/002.png", "latex": "\\frac{1}{2}"},
        ],
        "medium": [
            {"image": "medium/001.png", "latex": "\\int_{0}^{\\infty} e^{-x} dx"},
        ],
        "hard": [
            {"image": "hard/001.png", "latex": "\\sum_{n=1}^{\\infty} \\frac{1}{n^2} = \\frac{\\pi^2}{6}"},
        ]
    });

    fs::write("ground_truth.json", testset.to_string()).unwrap();
}

4. Comparison Baselines

4.1 Scipix API (Commercial Baseline)

Website: https://scipix.com/

Metrics to Compare:

  • Accuracy (CER, WER, BLEU)
  • Latency (API roundtrip time)
  • Cost per image
  • Supported formats

Benchmark Script:

// `ScipixClient` is a hypothetical HTTP client for the commercial API.
async fn benchmark_scipix(image_path: &str, api_key: &str) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    let client = ScipixClient::new(api_key);

    let start = Instant::now();
    let result = client.ocr_image(image_path).await?;
    let latency = start.elapsed();

    Ok(BenchmarkResult {
        provider: "Scipix API",
        latency,
        latex: result.latex,
        confidence: result.confidence,
    })
}

4.2 pix2tex/LaTeX-OCR

Repository: https://github.com/lukas-blecher/LaTeX-OCR

Description:

  • Open-source Python implementation
  • Transformer-based model
  • Academic baseline

Benchmark Script:

# benchmark_pix2tex.py
import time

from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()

def benchmark_pix2tex(image_path):
    # pix2tex expects a PIL image, not a file path
    img = Image.open(image_path)

    start = time.time()
    latex = model(img)
    latency = time.time() - start

    return {
        'provider': 'pix2tex',
        'latency': latency,
        'latex': latex
    }

4.3 ocrs (Rust Native)

Repository: https://github.com/robertknight/ocrs

Description:

  • Rust-native OCR
  • General text OCR (not math-specific)
  • Performance baseline

Benchmark:

use ocrs::{OcrEngine, OcrEngineParams};

// `ocr_image` is a simplified stand-in for ocrs's prepare-input /
// get-text flow.
fn benchmark_ocrs(image_path: &str) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    let engine = OcrEngine::new(OcrEngineParams::default())?;

    let start = Instant::now();
    let result = engine.ocr_image(image_path)?;
    let latency = start.elapsed();

    Ok(BenchmarkResult {
        provider: "ocrs",
        latency,
        text: result.text,
    })
}

4.4 Tesseract

Website: https://github.com/tesseract-ocr/tesseract

Description:

  • Industry standard OCR
  • Not math-specific
  • CPU performance baseline

Benchmark:

use tesseract::Tesseract;

fn benchmark_tesseract(image_path: &str) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    let start = Instant::now();
    let text = Tesseract::new(None, Some("eng"))?
        .set_image(image_path)?
        .get_text()?;
    let latency = start.elapsed();

    Ok(BenchmarkResult {
        provider: "Tesseract",
        latency,
        text,
    })
}

5. Benchmark Implementation

5.1 Criterion.rs Setup

Dependencies:

# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
serde_json = "1.0"
image = "0.24"

[[bench]]
name = "scipix_ocr"
harness = false

5.2 Basic Benchmark Template

// benches/scipix_ocr.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use std::path::Path;

fn benchmark_single_image(c: &mut Criterion) {
    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    let image_path = "testdata/simple_equation.png";

    c.bench_function("ocr_simple_equation", |b| {
        b.iter(|| {
            ocr.process_image(black_box(image_path))
        });
    });
}

fn benchmark_image_sizes(c: &mut Criterion) {
    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    let mut group = c.benchmark_group("image_sizes");

    for size in ["small", "medium", "large"].iter() {
        let image_path = format!("testdata/{}_image.png", size);

        group.bench_with_input(
            BenchmarkId::from_parameter(size),
            &image_path,
            |b, path| {
                b.iter(|| ocr.process_image(black_box(path)));
            },
        );
    }

    group.finish();
}

fn benchmark_batch_processing(c: &mut Criterion) {
    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    let images: Vec<String> = (0..10)
        .map(|i| format!("testdata/batch_{}.png", i))
        .collect();

    c.bench_function("ocr_batch_10_images", |b| {
        b.iter(|| {
            ocr.process_batch(black_box(&images))
        });
    });
}

criterion_group!(benches, benchmark_single_image, benchmark_image_sizes, benchmark_batch_processing);
criterion_main!(benches);

5.3 Advanced Benchmark with Metrics

// benches/comprehensive_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId, Throughput};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use std::path::Path;
use std::time::Duration;

fn benchmark_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("throughput");

    // Configure for throughput measurement
    group.sample_size(100);
    group.measurement_time(Duration::from_secs(30));

    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    for batch_size in [1, 5, 10, 20, 50].iter() {
        let images: Vec<String> = (0..*batch_size)
            .map(|i| format!("testdata/image_{}.png", i % 10))
            .collect();

        group.throughput(Throughput::Elements(*batch_size as u64));

        group.bench_with_input(
            BenchmarkId::new("batch_processing", batch_size),
            &images,
            |b, imgs| {
                b.iter(|| ocr.process_batch(imgs));
            },
        );
    }

    group.finish();
}

fn benchmark_memory_usage(c: &mut Criterion) {
    let mut group = c.benchmark_group("memory");

    let config = OCRConfig::default();

    group.bench_function("model_loading", |b| {
        b.iter(|| {
            let _ocr = ScipixOCR::new(config.clone()).unwrap();
            // Model automatically dropped, measuring allocation overhead
        });
    });

    group.finish();
}

fn benchmark_latency_percentiles(c: &mut Criterion) {
    let mut group = c.benchmark_group("latency_percentiles");

    // Large sample size for accurate percentile calculation
    group.sample_size(1000);

    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    let test_images = vec![
        "testdata/simple.png",
        "testdata/complex.png",
        "testdata/matrix.png",
    ];

    for image_path in test_images {
        group.bench_with_input(
            BenchmarkId::from_parameter(Path::new(image_path).file_stem().unwrap().to_str().unwrap()),
            &image_path,
            |b, path| {
                b.iter(|| ocr.process_image(path));
            },
        );
    }

    group.finish();
}

criterion_group!(
    benches,
    benchmark_throughput,
    benchmark_memory_usage,
    benchmark_latency_percentiles
);
criterion_main!(benches);

5.4 Accuracy Benchmark

// benches/accuracy_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use serde::{Deserialize, Serialize};
use std::fs;

#[derive(Deserialize, Serialize)]
struct GroundTruth {
    image: String,
    latex: String,
}

fn load_ground_truth(path: &str) -> Vec<GroundTruth> {
    // Expects a flat JSON array of {image, latex} entries; the tiered
    // object written in section 3.4 would need per-category loading.
    let content = fs::read_to_string(path).expect("Failed to read ground truth");
    serde_json::from_str(&content).expect("Failed to parse ground truth")
}

fn calculate_cer(reference: &str, hypothesis: &str) -> f64 {
    // Levenshtein distance via dynamic programming
    let ref_chars: Vec<char> = reference.chars().collect();
    let hyp_chars: Vec<char> = hypothesis.chars().collect();

    // Guard against division by zero on an empty reference.
    if ref_chars.is_empty() {
        return if hyp_chars.is_empty() { 0.0 } else { 1.0 };
    }

    let mut dp = vec![vec![0; hyp_chars.len() + 1]; ref_chars.len() + 1];

    for i in 0..=ref_chars.len() {
        dp[i][0] = i;
    }
    for j in 0..=hyp_chars.len() {
        dp[0][j] = j;
    }

    for i in 1..=ref_chars.len() {
        for j in 1..=hyp_chars.len() {
            let cost = if ref_chars[i - 1] == hyp_chars[j - 1] { 0 } else { 1 };
            dp[i][j] = (dp[i - 1][j] + 1)
                .min(dp[i][j - 1] + 1)
                .min(dp[i - 1][j - 1] + cost);
        }
    }

    dp[ref_chars.len()][hyp_chars.len()] as f64 / ref_chars.len() as f64
}

fn benchmark_accuracy(c: &mut Criterion) {
    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");

    let ground_truth = load_ground_truth("testdata/ground_truth.json");

    c.bench_function("accuracy_evaluation", |b| {
        b.iter(|| {
            let mut total_cer = 0.0;
            let mut count = 0;

            for gt in &ground_truth {
                if let Ok(result) = ocr.process_image(&gt.image) {
                    let cer = calculate_cer(&gt.latex, &result.latex);
                    total_cer += cer;
                    count += 1;
                }
            }

            let avg_cer = if count > 0 { total_cer / count as f64 } else { 1.0 };
            println!("Average CER: {:.4}", avg_cer);
        });
    });
}

criterion_group!(benches, benchmark_accuracy);
criterion_main!(benches);

5.5 Automated Benchmark Runner

// examples/scipix/src/benchmark_runner.rs
use std::process::Command;
use std::fs::{self, File};
use std::io::Write;
use serde_json::json;

pub struct BenchmarkRunner {
    output_dir: String,
}

impl BenchmarkRunner {
    pub fn new(output_dir: &str) -> Self {
        fs::create_dir_all(output_dir).expect("Failed to create output directory");
        Self {
            output_dir: output_dir.to_string(),
        }
    }

    pub fn run_all_benchmarks(&self) -> Result<(), Box<dyn std::error::Error>> {
        println!("Running comprehensive benchmarks...");

        // Run Criterion benchmarks
        let criterion_output = Command::new("cargo")
            .args(&["bench", "--bench", "scipix_ocr"])
            .output()?;

        self.save_output("criterion_output.txt", &criterion_output.stdout)?;

        // Run accuracy benchmarks
        let accuracy_output = Command::new("cargo")
            .args(&["bench", "--bench", "accuracy_benchmark"])
            .output()?;

        self.save_output("accuracy_output.txt", &accuracy_output.stdout)?;

        // Run memory profiling
        self.run_memory_profiling()?;

        // Generate summary report
        self.generate_summary_report()?;

        Ok(())
    }

    fn run_memory_profiling(&self) -> Result<(), Box<dyn std::error::Error>> {
        #[cfg(target_os = "linux")]
        {
            let output = Command::new("valgrind")
                .args(&[
                    "--tool=massif",
                    // Follow the bench binary that cargo spawns
                    "--trace-children=yes",
                    "--massif-out-file=massif.out.%p",
                    "cargo", "bench", "--bench", "scipix_ocr"
                ])
                .output()?;

            self.save_output("memory_profile.txt", &output.stdout)?;
        }

        Ok(())
    }

    fn save_output(&self, filename: &str, content: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
        let path = format!("{}/{}", self.output_dir, filename);
        let mut file = File::create(path)?;
        file.write_all(content)?;
        Ok(())
    }

    fn generate_summary_report(&self) -> Result<(), Box<dyn std::error::Error>> {
        let report = json!({
            "timestamp": chrono::Utc::now().to_rfc3339(),
            "benchmarks": {
                "performance": "See criterion_output.txt",
                "accuracy": "See accuracy_output.txt",
                "memory": "See memory_profile.txt"
            },
            "results_dir": self.output_dir
        });

        let path = format!("{}/summary.json", self.output_dir);
        let mut file = File::create(path)?;
        file.write_all(serde_json::to_string_pretty(&report)?.as_bytes())?;

        println!("Benchmark summary saved to {}/summary.json", self.output_dir);

        Ok(())
    }
}

// Main entry point
fn main() {
    let runner = BenchmarkRunner::new("benchmark_results");

    match runner.run_all_benchmarks() {
        Ok(_) => println!("All benchmarks completed successfully"),
        Err(e) => eprintln!("Benchmark failed: {}", e),
    }
}

5.6 CI/CD Integration

# .github/workflows/benchmarks.yml
name: Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Run daily at 2 AM

jobs:
  benchmark:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true

      - name: Cache cargo registry
        uses: actions/cache@v3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

      - name: Download test datasets
        run: |
          mkdir -p testdata
          # Download sample images for benchmarking
          wget -O testdata/simple.png https://example.com/test-images/simple.png

      - name: Run benchmarks
        run: cargo bench --bench scipix_ocr

      - name: Run accuracy benchmarks
        run: cargo bench --bench accuracy_benchmark

      - name: Upload benchmark results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: target/criterion/

      - name: Compare with baseline
        run: |
          cargo install critcmp
          # Assumes named baselines were saved earlier with
          # `cargo bench -- --save-baseline <name>`
          critcmp baseline current

      - name: Check for regressions
        run: |
          python scripts/check_regression.py \
            --baseline benchmark_baseline.json \
            --current target/criterion/results.json \
            --threshold 0.10  # Alert if >10% regression

6. Profiling Tools

6.1 perf and Flamegraph

Installation:

# Install perf (Linux)
sudo apt-get install linux-tools-common linux-tools-generic

# Install flamegraph
cargo install flamegraph

CPU Profiling:

# Profile a benchmark with perf
perf record -F 99 -g cargo bench --bench scipix_ocr

# Generate flamegraph
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg

# Or use cargo-flamegraph directly
cargo flamegraph --bench scipix_ocr

Analysis Script:

// scripts/analyze_perf.rs
use std::process::Command;

fn main() {
    // Run perf stat for detailed metrics
    let output = Command::new("perf")
        .args(&[
            "stat",
            "-e", "cycles,instructions,cache-misses,branch-misses",
            "cargo", "bench", "--bench", "scipix_ocr"
        ])
        .output()
        .expect("Failed to run perf stat");

    println!("Perf statistics:");
    println!("{}", String::from_utf8_lossy(&output.stderr));
}

6.2 Memory Profiling

Valgrind/Massif:

# Profile memory usage; --trace-children follows the benchmark
# binary that cargo spawns (massif writes one file per process)
valgrind --tool=massif \
         --trace-children=yes \
         --massif-out-file=massif.out.%p \
         cargo bench --bench scipix_ocr

# Visualize with massif-visualizer
massif-visualizer massif.out.<pid>

# Or use ms_print
ms_print massif.out.<pid> > memory_report.txt

Heaptrack (Linux):

# Install heaptrack
sudo apt-get install heaptrack

# Profile memory allocations
heaptrack cargo bench --bench scipix_ocr

# Analyze results
heaptrack_gui heaptrack.cargo.*.gz

Custom Memory Tracker:

// src/memory_tracker.rs
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static DEALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        ALLOCATED.fetch_add(size, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        let size = layout.size();
        DEALLOCATED.fetch_add(size, Ordering::SeqCst);
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

pub fn get_memory_stats() -> (usize, usize, usize) {
    let allocated = ALLOCATED.load(Ordering::SeqCst);
    let deallocated = DEALLOCATED.load(Ordering::SeqCst);
    // saturating_sub guards against transient underflow when the two
    // counters are read at slightly different moments.
    let current = allocated.saturating_sub(deallocated);
    (allocated, deallocated, current)
}

// Usage in benchmark:
#[cfg(test)]
mod tests {
    use super::*;
    // Assumed crate-root re-exports for the OCR types used below.
    use ruvector_scipix::{ScipixOCR, OCRConfig};

    #[test]
    fn benchmark_memory() {
        let (before_alloc, _, _) = get_memory_stats();

        // Run OCR operation
        let ocr = ScipixOCR::new(OCRConfig::default()).unwrap();
        ocr.process_image("test.png").unwrap();

        let (after_alloc, _, current) = get_memory_stats();

        println!("Memory allocated: {} bytes", after_alloc - before_alloc);
        println!("Current memory usage: {} bytes", current);
    }
}

6.3 GPU Utilization

NVIDIA GPU Profiling:

# Install NVIDIA profiling tools
# Nsight Systems for timeline profiling
nsys profile --trace=cuda,nvtx cargo bench --bench scipix_ocr

# Nsight Compute for kernel analysis
ncu --set full cargo bench --bench scipix_ocr

GPU Monitoring Script:

// src/gpu_monitor.rs
use std::process::Command;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

pub struct GPUMonitor {
    running: Arc<AtomicBool>,
    samples: Arc<Mutex<Vec<GPUSample>>>,
    handle: Option<JoinHandle<()>>,
}

#[derive(Debug, Clone)]
pub struct GPUSample {
    timestamp: Instant,
    utilization: u32,
    memory_used: u64,
    memory_total: u64,
    temperature: u32,
}

impl GPUMonitor {
    pub fn new() -> Self {
        Self {
            running: Arc::new(AtomicBool::new(false)),
            samples: Arc::new(Mutex::new(Vec::new())),
            handle: None,
        }
    }

    /// Spawns a background thread that polls nvidia-smi every 100ms.
    /// Returns immediately; call `stop` to end sampling. Running the
    /// loop on its own thread keeps the caller (e.g. a benchmark)
    /// free to do work while samples accumulate.
    pub fn start(&mut self) {
        self.running.store(true, Ordering::SeqCst);
        self.samples.lock().unwrap().clear();

        let running = Arc::clone(&self.running);
        let samples = Arc::clone(&self.samples);

        self.handle = Some(thread::spawn(move || {
            while running.load(Ordering::SeqCst) {
                if let Ok(sample) = collect_sample() {
                    samples.lock().unwrap().push(sample);
                }
                thread::sleep(Duration::from_millis(100));
            }
        }));
    }

    pub fn stop(&mut self) {
        self.running.store(false, Ordering::SeqCst);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }

    pub fn get_statistics(&self) -> GPUStatistics {
        let samples = self.samples.lock().unwrap();
        if samples.is_empty() {
            return GPUStatistics::default();
        }

        let avg_utilization = samples.iter()
            .map(|s| s.utilization)
            .sum::<u32>() as f64 / samples.len() as f64;

        let max_utilization = samples.iter()
            .map(|s| s.utilization)
            .max()
            .unwrap_or(0);

        let avg_memory = samples.iter()
            .map(|s| s.memory_used)
            .sum::<u64>() as f64 / samples.len() as f64;

        GPUStatistics {
            avg_utilization,
            max_utilization,
            avg_memory_mb: avg_memory / 1024.0,
            sample_count: samples.len(),
        }
    }
}

fn collect_sample() -> Result<GPUSample, Box<dyn std::error::Error>> {
    let output = Command::new("nvidia-smi")
        .args(&[
            "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
            "--format=csv,noheader,nounits",
        ])
        .output()?;

    let data = String::from_utf8(output.stdout)?;
    let parts: Vec<&str> = data.trim().split(',').collect();
    if parts.len() < 4 {
        return Err("unexpected nvidia-smi output".into());
    }

    Ok(GPUSample {
        timestamp: Instant::now(),
        utilization: parts[0].trim().parse()?,
        memory_used: parts[1].trim().parse()?,
        memory_total: parts[2].trim().parse()?,
        temperature: parts[3].trim().parse()?,
    })
}

#[derive(Debug, Default)]
pub struct GPUStatistics {
    pub avg_utilization: f64,
    pub max_utilization: u32,
    pub avg_memory_mb: f64,
    pub sample_count: usize,
}

6.4 Integrated Profiling Benchmark

// benches/profiling_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use ruvector_scipix::{ScipixOCR, OCRConfig};
// Assumed module paths for the helpers defined in sections 6.2 and 6.3.
use ruvector_scipix::gpu_monitor::GPUMonitor;
use ruvector_scipix::memory_tracker::get_memory_stats;
use std::sync::{Arc, Mutex};
use std::thread;

fn benchmark_with_profiling(c: &mut Criterion) {
    let mut group = c.benchmark_group("profiled");

    group.bench_function("ocr_with_memory_tracking", |b| {
        b.iter_custom(|iters| {
            let config = OCRConfig::default();
            let ocr = ScipixOCR::new(config).unwrap();

            let (start_alloc, _, _) = get_memory_stats();
            let start_time = std::time::Instant::now();

            for _ in 0..iters {
                ocr.process_image("testdata/sample.png").unwrap();
            }

            let duration = start_time.elapsed();
            let (end_alloc, _, current) = get_memory_stats();

            println!("Memory delta: {} bytes", end_alloc - start_alloc);
            println!("Current usage: {} bytes", current);

            duration
        });
    });

    group.bench_function("ocr_with_gpu_monitoring", |b| {
        // Assumes `GPUMonitor::start` samples on a background thread
        // (see section 6.3), leaving this thread free for the workload.
        let mut monitor = GPUMonitor::new();
        monitor.start();

        b.iter(|| {
            let config = OCRConfig::default();
            let ocr = ScipixOCR::new(config).unwrap();
            ocr.process_image("testdata/sample.png").unwrap();
        });

        monitor.stop();
        println!("GPU Statistics: {:?}", monitor.get_statistics());
    });

    group.finish();
}

criterion_group!(benches, benchmark_with_profiling);
criterion_main!(benches);

7. Regression Testing

7.1 Performance Baseline Tracking

Baseline Storage Structure:

{
  "commit": "a1b2c3d4",
  "timestamp": "2024-01-15T10:30:00Z",
  "benchmarks": {
    "ocr_simple_equation": {
      "mean": 185.4,
      "std_dev": 12.3,
      "p50": 182.1,
      "p95": 210.5,
      "p99": 225.8
    },
    "ocr_batch_10_images": {
      "mean": 1420.6,
      "std_dev": 85.2,
      "throughput": 7.04
    }
  },
  "accuracy": {
    "cer": 0.0185,
    "wer": 0.0432,
    "bleu": 87.3
  }
}

Baseline Manager:

// src/baseline_manager.rs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;

#[derive(Serialize, Deserialize, Clone)]
pub struct BenchmarkBaseline {
    pub commit: String,
    pub timestamp: String,
    pub benchmarks: HashMap<String, BenchmarkMetrics>,
    pub accuracy: AccuracyMetrics,
}

#[derive(Serialize, Deserialize, Clone)]
pub struct BenchmarkMetrics {
    pub mean: f64,
    pub std_dev: f64,
    pub p50: f64,
    pub p95: f64,
    pub p99: f64,
}

#[derive(Serialize, Deserialize, Clone)]
pub struct AccuracyMetrics {
    pub cer: f64,
    pub wer: f64,
    pub bleu: f64,
}

pub struct BaselineManager {
    baseline_path: String,
}

impl BaselineManager {
    pub fn new(baseline_path: &str) -> Self {
        Self {
            baseline_path: baseline_path.to_string(),
        }
    }

    pub fn load_baseline(&self) -> Result<BenchmarkBaseline, Box<dyn std::error::Error>> {
        let content = fs::read_to_string(&self.baseline_path)?;
        Ok(serde_json::from_str(&content)?)
    }

    pub fn save_baseline(&self, baseline: &BenchmarkBaseline) -> Result<(), Box<dyn std::error::Error>> {
        let json = serde_json::to_string_pretty(baseline)?;
        fs::write(&self.baseline_path, json)?;
        Ok(())
    }

    pub fn compare_with_baseline(
        &self,
        current: &BenchmarkBaseline,
        threshold: f64
    ) -> Vec<RegressionAlert> {
        let baseline = match self.load_baseline() {
            Ok(b) => b,
            Err(_) => return vec![],
        };

        let mut alerts = Vec::new();

        for (name, current_metrics) in &current.benchmarks {
            if let Some(baseline_metrics) = baseline.benchmarks.get(name) {
                let regression = (current_metrics.mean - baseline_metrics.mean) / baseline_metrics.mean;

                if regression > threshold {
                    alerts.push(RegressionAlert {
                        benchmark: name.clone(),
                        metric: "mean".to_string(),
                        baseline_value: baseline_metrics.mean,
                        current_value: current_metrics.mean,
                        regression_percent: regression * 100.0,
                        severity: if regression > threshold * 2.0 { Severity::High } else { Severity::Medium },
                    });
                }
            }
        }

        // Check accuracy regressions
        if current.accuracy.cer > baseline.accuracy.cer * (1.0 + threshold) {
            alerts.push(RegressionAlert {
                benchmark: "accuracy".to_string(),
                metric: "cer".to_string(),
                baseline_value: baseline.accuracy.cer,
                current_value: current.accuracy.cer,
                regression_percent: ((current.accuracy.cer - baseline.accuracy.cer) / baseline.accuracy.cer) * 100.0,
                severity: Severity::High,
            });
        }

        alerts
    }
}

#[derive(Debug)]
pub struct RegressionAlert {
    pub benchmark: String,
    pub metric: String,
    pub baseline_value: f64,
    pub current_value: f64,
    pub regression_percent: f64,
    pub severity: Severity,
}

#[derive(Debug)]
pub enum Severity {
    Low,
    Medium,
    High,
}

7.2 Automated Regression Detection

// scripts/detect_regression.rs
use ruvector_scipix::baseline_manager::{BaselineManager, BenchmarkBaseline, Severity};
use std::env;
use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();

    if args.len() < 3 {
        eprintln!("Usage: detect_regression <baseline.json> <current.json>");
        process::exit(1);
    }

    let baseline_path = &args[1];
    let current_path = &args[2];
    let threshold = 0.10; // 10% regression threshold

    let manager = BaselineManager::new(baseline_path);

    // Load current results
    let current: BenchmarkBaseline = {
        let content = std::fs::read_to_string(current_path)
            .expect("Failed to read current results");
        serde_json::from_str(&content)
            .expect("Failed to parse current results")
    };

    // Compare with baseline
    let alerts = manager.compare_with_baseline(&current, threshold);

    if alerts.is_empty() {
        println!("✅ No performance regressions detected");
        process::exit(0);
    } else {
        println!("⚠️  Performance regressions detected:");

        let mut has_high_severity = false;

        for alert in &alerts {
            let severity_icon = match alert.severity {
                Severity::Low => "🟡",
                Severity::Medium => "🟠",
                Severity::High => "🔴",
            };

            if matches!(alert.severity, Severity::High) {
                has_high_severity = true;
            }

            println!(
                "{} {} / {}: {:.2}ms → {:.2}ms ({:+.1}%)",
                severity_icon,
                alert.benchmark,
                alert.metric,
                alert.baseline_value,
                alert.current_value,
                alert.regression_percent
            );
        }

        if has_high_severity {
            process::exit(1);
        } else {
            process::exit(0);
        }
    }
}

### 7.3 GitHub Actions Integration

```yaml
# .github/workflows/regression_check.yml
name: Performance Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  regression-check:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Download baseline
        run: |
          # Download baseline from releases or artifacts
          gh release download baseline --pattern 'benchmark_baseline.json'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Run benchmarks
        run: |
          cargo bench --bench scipix_ocr -- --save-baseline current_baseline.json

      - name: Detect regressions
        run: |
          cargo run --bin detect_regression -- benchmark_baseline.json current_baseline.json

      - name: Comment on PR
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '⚠️ Performance regression detected! Please review benchmark results.'
            })
```

### 7.4 Continuous Baseline Updates

```rust
// scripts/update_baseline.rs
use ruvector_scipix::baseline_manager::{BaselineManager, BenchmarkBaseline};
use std::process::Command;

fn main() {
    // Get current git commit
    let commit = Command::new("git")
        .args(&["rev-parse", "HEAD"])
        .output()
        .expect("Failed to get git commit")
        .stdout;
    let commit = String::from_utf8(commit).unwrap().trim().to_string();

    // Run benchmarks
    let benchmark_output = Command::new("cargo")
        .args(&["bench", "--bench", "scipix_ocr", "--", "--save-baseline", "temp.json"])
        .output()
        .expect("Failed to run benchmarks");

    if !benchmark_output.status.success() {
        eprintln!("Benchmark failed");
        std::process::exit(1);
    }

    // Load benchmark results and stamp them with commit and timestamp
    let baseline: BenchmarkBaseline = {
        let content = std::fs::read_to_string("temp.json")
            .expect("Failed to read benchmark results");
        let mut baseline: BenchmarkBaseline = serde_json::from_str(&content)
            .expect("Failed to parse benchmark results");

        baseline.commit = commit;
        baseline.timestamp = chrono::Utc::now().to_rfc3339();
        baseline
    };

    // Save as new baseline
    let manager = BaselineManager::new("benchmark_baseline.json");
    manager.save_baseline(&baseline)
        .expect("Failed to save baseline");

    println!("✅ Baseline updated successfully");
    println!("Commit: {}", baseline.commit);
    println!("Timestamp: {}", baseline.timestamp);
}
```

## Summary

This benchmarking strategy provides:

1. **Comprehensive Performance Metrics**: Latency, throughput, memory, and model loading benchmarks
2. **Accuracy Validation**: CER, WER, BLEU, and ERR metrics with industry-standard datasets
3. **Competitive Analysis**: Baseline comparisons with Scipix, pix2tex, ocrs, and Tesseract
4. **Production-Ready Implementation**: Criterion.rs benchmarks with CI/CD integration
5. **Advanced Profiling**: CPU, memory, and GPU profiling tools
6. **Regression Protection**: Automated detection and alerting for performance degradation

**Next Steps:**

1. Set up test datasets (Im2latex, CROHME)
2. Implement the core benchmark suite
3. Establish performance baselines
4. Integrate into the CI/CD pipeline
5. Configure alerting for regressions
6. Review benchmark results regularly and optimize

**Benchmark Execution:**

```bash
# Run all benchmarks
cargo bench

# Run a specific benchmark
cargo bench --bench scipix_ocr

# Run with profiling
cargo flamegraph --bench scipix_ocr

# Check for regressions
cargo run --bin detect_regression -- baseline.json current.json

# Update the baseline
cargo run --bin update_baseline
```