Benchmarking Strategy for Ruvector-Scipix OCR System
Overview
This document outlines a comprehensive benchmarking strategy for the ruvector-scipix OCR system, covering performance metrics, accuracy metrics, datasets, baselines, and implementation details.
1. Performance Metrics
1.1 Latency Metrics
Measure end-to-end processing time from image input to LaTeX output:
- P50 (Median): 50th percentile latency - typical processing time
- P95: 95th percentile latency - most requests complete within this time
- P99: 99th percentile latency - captures tail latency for SLA requirements
- P99.9: 99.9th percentile latency - extreme outliers
Target Benchmarks:
Single Image Processing:
- P50: < 200ms (small images, <1MB)
- P95: < 500ms
- P99: < 1000ms
- P99.9: < 2000ms
Batch Processing (10 images):
- P50: < 1500ms
- P95: < 3000ms
- P99: < 5000ms
Component-Level Latency:
- Image preprocessing: < 50ms
- Model inference: < 150ms (GPU), < 800ms (CPU)
- Post-processing/formatting: < 20ms
- NAPI overhead: < 10ms
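The percentile targets above can be checked against raw timing samples with a small helper. The following is a minimal sketch using the nearest-rank definition; the function name `percentile` is an assumption for illustration and is not part of ruvector-scipix or Criterion.

```rust
use std::time::Duration;

/// Nearest-rank percentile over latency samples; `p` must be in (0, 100].
/// Returns `None` for an empty sample set or an out-of-range `p`.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: the smallest sample with at least p% of all samples at or below it.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

With fewer than 1,000 samples the P99.9 value collapses to the maximum, so the tail targets are only meaningful for long runs.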
1.2 Throughput Metrics
Measure processing capacity under sustained load:
- Images per second (img/s): Single-threaded performance
- Pages per minute (ppm): Batch processing performance
- Concurrent requests: Multi-threaded throughput
- GPU utilization: Percentage of GPU compute used
Target Benchmarks:
Single-threaded:
- GPU: 10-15 img/s
- CPU: 2-3 img/s
Multi-threaded (8 cores):
- GPU: 40-50 img/s
- CPU: 8-12 img/s
Batch Processing:
- GPU: 60-80 ppm
- CPU: 15-20 ppm
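As a rough illustration of how the throughput figures relate to measured wall-clock time, the sketch below converts an item count and an elapsed duration into img/s and ppm. The function names are hypothetical, and the sketch assumes one page counts as one processed item.

```rust
use std::time::Duration;

/// Items processed per second; returns 0.0 for a zero-length measurement.
pub fn items_per_second(count: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        0.0
    } else {
        count as f64 / secs
    }
}

/// Items processed per minute, derived from the same measurement.
pub fn items_per_minute(count: usize, elapsed: Duration) -> f64 {
    items_per_second(count, elapsed) * 60.0
}
```

For strictly sequential processing, a 200 ms per-image latency works out to 5 img/s, so the 10-15 img/s single-threaded GPU target implies either pipelining or a per-image latency well below the P50 target in Section 1.1.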
1.3 Memory Usage
Track memory consumption patterns:
- Peak memory: Maximum memory usage during processing
- Average memory: Typical memory footprint
- Memory per image: Incremental memory for each image
- Memory leaks: Long-running stability tests
Target Benchmarks:
Model Loading:
- Peak: < 2GB (GPU), < 1GB (CPU)
Per-Image Processing:
- Peak: < 500MB
- Average: < 200MB
Batch Processing (100 images):
- Peak: < 3GB
- Average: < 1.5GB
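One way to read peak and current memory for the benchmark process on Linux is to parse `/proc/self/status`. The sketch below is an assumption about how such a probe could look, not an existing ruvector-scipix API; `VmHWM` is the kernel's peak resident set size and `VmRSS` the current one, both reported in kB.

```rust
/// Extracts a kB-valued field such as `VmHWM` or `VmRSS` from the text of
/// `/proc/self/status` and returns it in bytes.
pub fn parse_status_kb(status: &str, field: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        let rest = line.strip_prefix(field)?.strip_prefix(':')?;
        let kb: u64 = rest.trim().strip_suffix("kB")?.trim().parse().ok()?;
        Some(kb * 1024)
    })
}

/// Peak resident set size of the current process (Linux only).
pub fn peak_rss_bytes() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    parse_status_kb(&status, "VmHWM")
}
```

Sampling `VmRSS` at intervals during a long batch run gives a simple leak check: a steadily rising value after warm-up suggests memory is not being released.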
1.4 Model Loading Time
Measure initialization overhead:
- Cold start: First-time model loading
- Warm start: Cached model loading
- Memory mapping: mmap performance for large models
Target Benchmarks:
Cold Start:
- GPU: < 5s
- CPU: < 3s
Warm Start:
- GPU: < 1s
- CPU: < 500ms
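Cold and warm start times can be separated by timing the first load apart from the following ones. This is a minimal sketch with hypothetical names (`measure_loads`, `LoadTimings`); the `load` closure stands in for whatever constructs the model.

```rust
use std::time::{Duration, Instant};

pub struct LoadTimings {
    /// Duration of the first load in this process.
    pub cold: Duration,
    /// Durations of the subsequent loads.
    pub warm: Vec<Duration>,
}

/// Times one initial load followed by `warm_runs` further loads.
pub fn measure_loads<T>(mut load: impl FnMut() -> T, warm_runs: usize) -> LoadTimings {
    let mut timed = || {
        let start = Instant::now();
        let value = load();
        let elapsed = start.elapsed();
        drop(value); // teardown is excluded from the measurement
        elapsed
    };
    let cold = timed();
    let warm = (0..warm_runs).map(|_| timed()).collect();
    LoadTimings { cold, warm }
}
```

The first load is only truly cold if the model file is not already in the OS page cache; this sketch does not evict it, so a fresh process after a cache drop is needed for a strict cold-start figure.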
2. Accuracy Metrics
2.1 Character Error Rate (CER)
Measures character-level accuracy:
CER = (Substitutions + Deletions + Insertions) / Total_Characters
Target: CER < 2% on standard datasets
Implementation:
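// levenshtein_distance is assumed to be a generic edit-distance helper over slices.
fn calculate_cer(reference: &str, hypothesis: &str) -> f64 {
    let ref_chars: Vec<char> = reference.chars().collect();
    let hyp_chars: Vec<char> = hypothesis.chars().collect();
    // An empty reference would divide by zero.
    if ref_chars.is_empty() {
        return if hyp_chars.is_empty() { 0.0 } else { 1.0 };
    }
    let distance = levenshtein_distance(&ref_chars, &hyp_chars);
    distance as f64 / ref_chars.len() as f64
}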
2.2 Word Error Rate (WER)
Measures word-level accuracy:
WER = (Substitutions + Deletions + Insertions) / Total_Words
Target: WER < 5% on standard datasets
Implementation:
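// levenshtein_distance is assumed to be a generic edit-distance helper over slices.
fn calculate_wer(reference: &str, hypothesis: &str) -> f64 {
    let ref_words: Vec<&str> = reference.split_whitespace().collect();
    let hyp_words: Vec<&str> = hypothesis.split_whitespace().collect();
    // An empty reference would divide by zero.
    if ref_words.is_empty() {
        return if hyp_words.is_empty() { 0.0 } else { 1.0 };
    }
    let distance = levenshtein_distance(&ref_words, &hyp_words);
    distance as f64 / ref_words.len() as f64
}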
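The CER and WER snippets above call `levenshtein_distance` without defining it. One possible implementation, sketched here as an assumption about its intended signature, is generic over the element type so the same function serves characters and words:

```rust
/// Edit distance (substitutions + deletions + insertions) between two
/// sequences, computed with two rolling rows of the dynamic-programming table.
pub fn levenshtein_distance<T: PartialEq>(reference: &[T], hypothesis: &[T]) -> usize {
    // prev[j] holds the distance between the processed reference prefix
    // and the first j hypothesis elements.
    let mut prev: Vec<usize> = (0..=hypothesis.len()).collect();
    let mut curr = vec![0; hypothesis.len() + 1];
    for (i, r) in reference.iter().enumerate() {
        curr[0] = i + 1;
        for (j, h) in hypothesis.iter().enumerate() {
            let cost = if r == h { 0 } else { 1 };
            curr[j + 1] = (prev[j + 1] + 1) // deletion
                .min(curr[j] + 1) // insertion
                .min(prev[j] + cost); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[hypothesis.len()]
}
```

Because insertions count against the hypothesis, both CER and WER can exceed 1.0 when the output is much longer than the reference.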
2.3 BLEU Score for LaTeX Output
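Measures LaTeX generation quality as n-gram overlap with the reference, reported on a 0-100 scale:
BLEU = 100 × BP × exp(Σ_n w_n × log(p_n))
where p_n is the clipped n-gram precision, w_n its weight (typically 1/N for orders 1..N), and BP is the brevity penalty.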
Target: BLEU > 85 on math expression datasets
Implementation:
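// Simplified BLEU using a single n-gram order, scaled to 0-100.
// extract_ngrams, count_matches, and brevity_penalty are assumed helpers.
fn calculate_bleu(reference: &str, hypothesis: &str, n: usize) -> f64 {
    let ref_ngrams = extract_ngrams(reference, n);
    let hyp_ngrams = extract_ngrams(hypothesis, n);
    if hyp_ngrams.is_empty() {
        return 0.0; // avoids dividing by zero for very short hypotheses
    }
    let matches = count_matches(&ref_ngrams, &hyp_ngrams);
    let precision = matches as f64 / hyp_ngrams.len() as f64;
    // The brevity penalty compares token counts, not byte lengths.
    let ref_len = reference.split_whitespace().count();
    let hyp_len = hypothesis.split_whitespace().count();
    100.0 * brevity_penalty(ref_len, hyp_len) * precision
}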
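For comparison with the single-order version above, here is a sketch of BLEU with uniform weights over orders 1 to `max_n`, clipped n-gram counts, and the standard brevity penalty. It assumes the LaTeX strings are already tokenized with whitespace between tokens (for example `\frac { a } { b }`); the names `bleu` and `ngram_counts` are illustrative, and no smoothing is applied.

```rust
use std::collections::HashMap;

/// Counts each n-gram of order `n` in a token sequence.
fn ngram_counts<'a>(tokens: &[&'a str], n: usize) -> HashMap<Vec<&'a str>, usize> {
    let mut counts = HashMap::new();
    if n == 0 || tokens.len() < n {
        return counts;
    }
    for window in tokens.windows(n) {
        *counts.entry(window.to_vec()).or_insert(0) += 1;
    }
    counts
}

/// Sentence-level BLEU on a 0-100 scale with uniform weights 1/max_n.
pub fn bleu(reference: &str, hypothesis: &str, max_n: usize) -> f64 {
    let ref_tokens: Vec<&str> = reference.split_whitespace().collect();
    let hyp_tokens: Vec<&str> = hypothesis.split_whitespace().collect();
    if hyp_tokens.is_empty() || max_n == 0 {
        return 0.0;
    }
    let mut log_precision_sum = 0.0;
    for n in 1..=max_n {
        let ref_counts = ngram_counts(&ref_tokens, n);
        let hyp_counts = ngram_counts(&hyp_tokens, n);
        let total: usize = hyp_counts.values().sum();
        // Clip each hypothesis n-gram count by its count in the reference.
        let clipped: usize = hyp_counts
            .iter()
            .map(|(gram, &count)| count.min(*ref_counts.get(gram).unwrap_or(&0)))
            .sum();
        if total == 0 || clipped == 0 {
            return 0.0; // unsmoothed: any zero precision gives a zero score
        }
        log_precision_sum += (clipped as f64 / total as f64).ln() / max_n as f64;
    }
    let (r, c) = (ref_tokens.len() as f64, hyp_tokens.len() as f64);
    let bp = if c > r { 1.0 } else { (1.0 - r / c).exp() };
    100.0 * bp * log_precision_sum.exp()
}
```

Without smoothing, any expression shorter than `max_n` tokens scores 0, so short formulas may call for a smaller `max_n` or a smoothed variant.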
2.4 Expression Recognition Rate (ERR)
Measures mathematical expression correctness:
ERR = Correct_Expressions / Total_Expressions
Target: ERR > 90% on complex mathematical expressions
Categories:
- Simple expressions: 2+2, x^2
- Fractions: \frac{a}{b}
- Matrices: \begin{bmatrix}...\end{bmatrix}
- Complex equations: integrals, summations, limits
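A simple way to compute ERR is exact string match after normalization. The sketch below assumes that whitespace differences should not count as errors; both function names are hypothetical.

```rust
/// Removes all whitespace so that `x^2 + 1` and `x^2+1` compare equal.
fn normalize(latex: &str) -> String {
    latex.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Fraction of (reference, hypothesis) pairs that match exactly after
/// normalization. Returns 0.0 for an empty input.
pub fn expression_recognition_rate(pairs: &[(&str, &str)]) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let correct = pairs
        .iter()
        .filter(|(reference, hypothesis)| normalize(reference) == normalize(hypothesis))
        .count();
    correct as f64 / pairs.len() as f64
}
```

Stripping all whitespace is a deliberate simplification: it also treats `\alpha x` and `\alphax` as equal, and it does not recognize equivalent but differently written LaTeX (such as `x^{2}` versus `x^2`), so a stricter evaluation would compare parsed or rendered expressions instead.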
3. Benchmark Datasets
3.1 Im2latex-100k
Source: https://zenodo.org/record/56198
Description:
- 100,000 LaTeX formula images
- Rendered from arXiv papers
- Variety of mathematical expressions
Usage:
# Download dataset
wget https://zenodo.org/record/56198/files/im2latex-100k.tar.gz
tar -xzf im2latex-100k.tar.gz
# Structure:
# im2latex-100k/
# ├── images/
# └── formulas.txt
Benchmark Focus:
- General mathematical notation
- Diversity of expressions
- Standard baseline comparison
3.2 Im2latex-230k
Source: Extended Im2latex dataset
Description:
- 230,000 LaTeX formula images
- More complex expressions
- Better coverage of mathematical domains
Usage:
# Download extended dataset (the record ID below looks like a placeholder; substitute the real dataset location)
wget https://zenodo.org/record/1234567/files/im2latex-230k.tar.gz
tar -xzf im2latex-230k.tar.gz
Benchmark Focus:
- Complex mathematical expressions
- Edge cases and rare symbols
- Stress testing
3.3 CROHME (Handwritten Math)
Source: https://www.isical.ac.in/~crohme/
Description:
- Competition on Recognition of Online Handwritten Mathematical Expressions
- Handwritten formulas (not typed/rendered)
- Real-world use case
Usage:
# Download CROHME dataset
wget http://www.isical.ac.in/~crohme/CROHME2019.zip
unzip CROHME2019.zip
Benchmark Focus:
- Handwritten formula recognition
- Real-world variability
- Robustness testing
3.4 Custom Ruvector Test Set
Description:
- Curated test set specific to ruvector use cases
- Real user submissions
- Edge cases discovered in production
Structure:
ruvector-testset/
├── easy/ # Simple expressions (100 samples)
├── medium/ # Moderate complexity (200 samples)
├── hard/ # Complex expressions (150 samples)
├── edge_cases/ # Known difficult cases (50 samples)
└── ground_truth.json
Creation Script:
// examples/scipix/benches/create_testset.rs
use std::fs;
use serde_json::json;
fn create_testset() {
let testset = json!({
"easy": [
{"image": "easy/001.png", "latex": "x^2 + 2x + 1"},
{"image": "easy/002.png", "latex": "\\frac{1}{2}"},
],
"medium": [
{"image": "medium/001.png", "latex": "\\int_{0}^{\\infty} e^{-x} dx"},
],
"hard": [
{"image": "hard/001.png", "latex": "\\sum_{n=1}^{\\infty} \\frac{1}{n^2} = \\frac{\\pi^2}{6}"},
]
});
fs::write("ground_truth.json", testset.to_string()).unwrap();
}
4. Comparison Baselines
4.1 Scipix API (Commercial Baseline)
Website: https://scipix.com/
Metrics to Compare:
- Accuracy (CER, WER, BLEU)
- Latency (API roundtrip time)
- Cost per image
- Supported formats
Benchmark Script:
// ScipixClient and BenchmarkResult are illustrative types; adapt them to the actual client.
async fn benchmark_scipix(
    client: &ScipixClient,
    image_path: &str,
) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    let start = Instant::now();
    let result = client.ocr_image(image_path).await?;
    let latency = start.elapsed(); // includes the network round trip
    Ok(BenchmarkResult {
        provider: "Scipix API",
        latency,
        latex: result.latex,
        confidence: result.confidence,
    })
}
4.2 pix2tex/LaTeX-OCR
Repository: https://github.com/lukas-blecher/LaTeX-OCR
Description:
- Open-source Python implementation
- Transformer-based model
- Academic baseline
Benchmark Script:
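# benchmark_pix2tex.py
import time
from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()  # load the model once, outside the timed region

def benchmark_pix2tex(image_path):
    img = Image.open(image_path)  # LatexOCR takes a PIL image, not a path
    start = time.perf_counter()
    latex = model(img)
    latency = time.perf_counter() - start
    return {
        'provider': 'pix2tex',
        'latency': latency,
        'latex': latex
    }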
4.3 ocrs (Rust Native)
Repository: https://github.com/robertknight/ocrs
Description:
- Rust-native OCR
- General text OCR (not math-specific)
- Performance baseline
Benchmark:
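use ocrs::{ImageSource, OcrEngine};
use std::time::Instant;

// The engine is built by the caller with detection and recognition models loaded;
// OcrEngineParams::default() carries no models. Check these calls against the ocrs version in use.
fn benchmark_ocrs(
    engine: &OcrEngine,
    image_path: &str,
) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    let img = image::open(image_path)?.into_rgb8(); // decode outside the timed region
    let start = Instant::now();
    let source = ImageSource::from_bytes(img.as_raw(), img.dimensions())?;
    let input = engine.prepare_input(source)?;
    let text = engine.get_text(&input)?;
    let latency = start.elapsed();
    Ok(BenchmarkResult { provider: "ocrs", latency, text })
}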
4.4 Tesseract
Website: https://github.com/tesseract-ocr/tesseract
Description:
- Industry standard OCR
- Not math-specific
- CPU performance baseline
Benchmark:
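use std::time::Instant;
use tesseract::Tesseract;

fn benchmark_tesseract(image_path: &str) -> Result<BenchmarkResult, Box<dyn std::error::Error>> {
    // Engine initialization sits inside the timed region, so this measures
    // cold per-call latency rather than steady-state recognition time.
    let start = Instant::now();
    let text = Tesseract::new(None, Some("eng"))?
        .set_image(image_path)?
        .get_text()?;
    let latency = start.elapsed();
    Ok(BenchmarkResult {
        provider: "Tesseract",
        latency,
        text,
    })
}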
5. Benchmark Implementation
5.1 Criterion.rs Setup
Dependencies:
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
serde_json = "1.0"
image = "0.24"
[[bench]]
name = "scipix_ocr"
harness = false
5.2 Basic Benchmark Template
// benches/scipix_ocr.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use std::path::Path;
fn benchmark_single_image(c: &mut Criterion) {
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
let image_path = "testdata/simple_equation.png";
c.bench_function("ocr_simple_equation", |b| {
b.iter(|| {
ocr.process_image(black_box(image_path))
});
});
}
fn benchmark_image_sizes(c: &mut Criterion) {
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
let mut group = c.benchmark_group("image_sizes");
for size in ["small", "medium", "large"].iter() {
let image_path = format!("testdata/{}_image.png", size);
group.bench_with_input(
BenchmarkId::from_parameter(size),
&image_path,
|b, path| {
b.iter(|| ocr.process_image(black_box(path)));
},
);
}
group.finish();
}
fn benchmark_batch_processing(c: &mut Criterion) {
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
let images: Vec<String> = (0..10)
.map(|i| format!("testdata/batch_{}.png", i))
.collect();
c.bench_function("ocr_batch_10_images", |b| {
b.iter(|| {
ocr.process_batch(black_box(&images))
});
});
}
criterion_group!(benches, benchmark_single_image, benchmark_image_sizes, benchmark_batch_processing);
criterion_main!(benches);
5.3 Advanced Benchmark with Metrics
// benches/comprehensive_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId, Throughput};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use std::path::Path; // needed by benchmark_latency_percentiles
use std::time::Duration;
fn benchmark_throughput(c: &mut Criterion) {
let mut group = c.benchmark_group("throughput");
// Configure for throughput measurement
group.sample_size(100);
group.measurement_time(Duration::from_secs(30));
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
for batch_size in [1, 5, 10, 20, 50].iter() {
let images: Vec<String> = (0..*batch_size)
.map(|i| format!("testdata/image_{}.png", i % 10))
.collect();
group.throughput(Throughput::Elements(*batch_size as u64));
group.bench_with_input(
BenchmarkId::new("batch_processing", batch_size),
&images,
|b, imgs| {
b.iter(|| ocr.process_batch(imgs));
},
);
}
group.finish();
}
fn benchmark_memory_usage(c: &mut Criterion) {
let mut group = c.benchmark_group("memory");
let config = OCRConfig::default();
group.bench_function("model_loading", |b| {
b.iter(|| {
let _ocr = ScipixOCR::new(config.clone()).unwrap();
// The model is dropped at the end of each iteration. Criterion measures wall-clock
// time, so this reports load-plus-drop time, not bytes allocated.
});
});
group.finish();
}
fn benchmark_latency_percentiles(c: &mut Criterion) {
let mut group = c.benchmark_group("latency_percentiles");
// Large sample size for accurate percentile calculation
group.sample_size(1000);
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
let test_images = vec![
"testdata/simple.png",
"testdata/complex.png",
"testdata/matrix.png",
];
for image_path in test_images {
group.bench_with_input(
BenchmarkId::from_parameter(Path::new(image_path).file_stem().unwrap().to_str().unwrap()),
&image_path,
|b, path| {
b.iter(|| ocr.process_image(path));
},
);
}
group.finish();
}
criterion_group!(
benches,
benchmark_throughput,
benchmark_memory_usage,
benchmark_latency_percentiles
);
criterion_main!(benches);
5.4 Accuracy Benchmark
// benches/accuracy_benchmark.rs
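use criterion::{criterion_group, criterion_main, Criterion};
use ruvector_scipix::{ScipixOCR, OCRConfig};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;

#[derive(Deserialize, Serialize)]
struct GroundTruth {
    image: String,
    latex: String,
}

fn load_ground_truth(path: &str) -> Vec<GroundTruth> {
    let content = fs::read_to_string(path).expect("Failed to read ground truth");
    // ground_truth.json maps each difficulty tier ("easy", "medium", ...) to a
    // list of samples, as written by create_testset in Section 3.4.
    let tiers: HashMap<String, Vec<GroundTruth>> =
        serde_json::from_str(&content).expect("Failed to parse ground truth");
    tiers.into_values().flatten().collect()
}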
fn calculate_cer(reference: &str, hypothesis: &str) -> f64 {
// Implement Levenshtein distance
let ref_chars: Vec<char> = reference.chars().collect();
let hyp_chars: Vec<char> = hypothesis.chars().collect();
let mut dp = vec![vec![0; hyp_chars.len() + 1]; ref_chars.len() + 1];
for i in 0..=ref_chars.len() {
dp[i][0] = i;
}
for j in 0..=hyp_chars.len() {
dp[0][j] = j;
}
for i in 1..=ref_chars.len() {
for j in 1..=hyp_chars.len() {
let cost = if ref_chars[i - 1] == hyp_chars[j - 1] { 0 } else { 1 };
dp[i][j] = (dp[i - 1][j] + 1)
.min(dp[i][j - 1] + 1)
.min(dp[i - 1][j - 1] + cost);
}
}
dp[ref_chars.len()][hyp_chars.len()] as f64 / ref_chars.len() as f64
}
fn benchmark_accuracy(c: &mut Criterion) {
    let config = OCRConfig::default();
    let ocr = ScipixOCR::new(config).expect("Failed to initialize OCR");
    let ground_truth = load_ground_truth("testdata/ground_truth.json");

    // Average CER over all images that were processed successfully.
    let evaluate = || {
        let mut total_cer = 0.0;
        let mut count = 0;
        for gt in &ground_truth {
            if let Ok(result) = ocr.process_image(&gt.image) {
                total_cer += calculate_cer(&gt.latex, &result.latex);
                count += 1;
            }
        }
        if count > 0 { total_cer / count as f64 } else { 1.0 }
    };

    // Report accuracy once, outside the timed loop, so it is not printed per iteration.
    println!("Average CER: {:.4}", evaluate());

    c.bench_function("accuracy_evaluation", |b| b.iter(|| evaluate()));
}
criterion_group!(benches, benchmark_accuracy);
criterion_main!(benches);
5.5 Automated Benchmark Runner
// examples/scipix/src/benchmark_runner.rs
use std::process::Command;
use std::fs::{self, File};
use std::io::Write;
use serde_json::json;
pub struct BenchmarkRunner {
output_dir: String,
}
impl BenchmarkRunner {
pub fn new(output_dir: &str) -> Self {
fs::create_dir_all(output_dir).expect("Failed to create output directory");
Self {
output_dir: output_dir.to_string(),
}
}
pub fn run_all_benchmarks(&self) -> Result<(), Box<dyn std::error::Error>> {
println!("Running comprehensive benchmarks...");
// Run Criterion benchmarks
let criterion_output = Command::new("cargo")
.args(&["bench", "--bench", "scipix_ocr"])
.output()?;
self.save_output("criterion_output.txt", &criterion_output.stdout)?;
// Run accuracy benchmarks
let accuracy_output = Command::new("cargo")
.args(&["bench", "--bench", "accuracy_benchmark"])
.output()?;
self.save_output("accuracy_output.txt", &accuracy_output.stdout)?;
// Run memory profiling
self.run_memory_profiling()?;
// Generate summary report
self.generate_summary_report()?;
Ok(())
}
fn run_memory_profiling(&self) -> Result<(), Box<dyn std::error::Error>> {
#[cfg(target_os = "linux")]
{
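// cargo runs the benchmark binary as a child process, so Valgrind must
// follow children; %p gives each process its own output file.
let output = Command::new("valgrind")
.args(&[
"--tool=massif",
"--trace-children=yes",
"--massif-out-file=massif.out.%p",
"cargo", "bench", "--bench", "scipix_ocr"
])
.output()?;
// Valgrind logs to stderr; the profile itself is written to massif.out.<pid>.
self.save_output("memory_profile.txt", &output.stderr)?;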
}
Ok(())
}
fn save_output(&self, filename: &str, content: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
let path = format!("{}/{}", self.output_dir, filename);
let mut file = File::create(path)?;
file.write_all(content)?;
Ok(())
}
fn generate_summary_report(&self) -> Result<(), Box<dyn std::error::Error>> {
let report = json!({
"timestamp": chrono::Utc::now().to_rfc3339(),
"benchmarks": {
"performance": "See criterion_output.txt",
"accuracy": "See accuracy_output.txt",
"memory": "See memory_profile.txt"
},
"results_dir": self.output_dir
});
let path = format!("{}/summary.json", self.output_dir);
let mut file = File::create(path)?;
file.write_all(serde_json::to_string_pretty(&report)?.as_bytes())?;
println!("Benchmark summary saved to {}/summary.json", self.output_dir);
Ok(())
}
}
// Main entry point
fn main() {
let runner = BenchmarkRunner::new("benchmark_results");
match runner.run_all_benchmarks() {
Ok(_) => println!("All benchmarks completed successfully"),
Err(e) => eprintln!("Benchmark failed: {}", e),
}
}
5.6 CI/CD Integration
# .github/workflows/benchmarks.yml
name: Benchmarks
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *' # Run daily at 02:00 UTC
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo registry
        uses: actions/cache@v4
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
      - name: Download test datasets
        run: |
          mkdir -p testdata
          # Download sample images for benchmarking
          wget -O testdata/simple.png https://example.com/test-images/simple.png
      - name: Run benchmarks
        # Save results under the Criterion baseline name "current" for critcmp
        run: cargo bench --bench scipix_ocr -- --save-baseline current
      - name: Run accuracy benchmarks
        run: cargo bench --bench accuracy_benchmark
      - name: Upload benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: target/criterion/
      - name: Compare with baseline
        run: |
          cargo install critcmp
          critcmp --group ".*" baseline current
      - name: Check for regressions
        run: |
          python scripts/check_regression.py \
            --baseline benchmark_baseline.json \
            --current target/criterion/results.json \
            --threshold 0.10 # Alert if >10% regression
6. Profiling Tools
6.1 perf and Flamegraph
Installation:
# Install perf (Linux)
sudo apt-get install linux-tools-common linux-tools-generic
# Install flamegraph
cargo install flamegraph
CPU Profiling:
# Profile a benchmark with perf
perf record -F 99 -g cargo bench --bench scipix_ocr
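# Generate flamegraph (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph scripts repository, not from cargo-flamegraph)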
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Or use cargo-flamegraph directly
cargo flamegraph --bench scipix_ocr
Analysis Script:
// scripts/analyze_perf.rs
use std::process::Command;
fn main() {
// Run perf stat for detailed metrics
let output = Command::new("perf")
.args(&[
"stat",
"-e", "cycles,instructions,cache-misses,branch-misses",
"cargo", "bench", "--bench", "scipix_ocr"
])
.output()
.expect("Failed to run perf stat");
println!("Perf statistics:");
println!("{}", String::from_utf8_lossy(&output.stderr));
}
6.2 Memory Profiling
Valgrind/Massif:
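# Profile memory usage. cargo runs the benchmark binary as a child process,
# so Valgrind must follow children; %p writes one output file per process.
valgrind --tool=massif \
    --trace-children=yes \
    --massif-out-file=massif.out.%p \
    cargo bench --bench scipix_ocr
# Visualize with massif-visualizer (pick the file for the benchmark process)
massif-visualizer massif.out.<pid>
# Or use ms_print
ms_print massif.out.<pid> > memory_report.txt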
Heaptrack (Linux):
# Install heaptrack
sudo apt-get install heaptrack
# Profile memory allocations
heaptrack cargo bench --bench scipix_ocr
# Analyze results
heaptrack_gui heaptrack.cargo.*.gz
Custom Memory Tracker:
// src/memory_tracker.rs
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
pub struct TrackingAllocator;
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static DEALLOCATED: AtomicUsize = AtomicUsize::new(0);
unsafe impl GlobalAlloc for TrackingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
let size = layout.size();
ALLOCATED.fetch_add(size, Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
let size = layout.size();
DEALLOCATED.fetch_add(size, Ordering::SeqCst);
System.dealloc(ptr, layout);
}
}
#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;
pub fn get_memory_stats() -> (usize, usize, usize) {
let allocated = ALLOCATED.load(Ordering::SeqCst);
let deallocated = DEALLOCATED.load(Ordering::SeqCst);
// The two loads are not one atomic snapshot, so guard against underflow.
let current = allocated.saturating_sub(deallocated);
(allocated, deallocated, current)
}
// Usage in benchmark:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn benchmark_memory() {
let (before_alloc, _, _) = get_memory_stats();
// Run OCR operation
let ocr = ScipixOCR::new(OCRConfig::default()).unwrap();
ocr.process_image("test.png").unwrap();
let (after_alloc, _, current) = get_memory_stats();
println!("Memory allocated: {} bytes", after_alloc - before_alloc);
println!("Current memory usage: {} bytes", current);
}
}
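The allocator above accumulates totals, while the targets in Section 1.3 are stated as peaks. A peak counter could be added along the following lines; `MemoryCounters` is a hypothetical type, shown on its own here so it can be exercised without installing a global allocator. Its `record_alloc` and `record_dealloc` methods would be called from `alloc` and `dealloc`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Tracks live bytes and the highest value that figure has reached.
pub struct MemoryCounters {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl MemoryCounters {
    pub const fn new() -> Self {
        Self {
            current: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
        }
    }

    /// Call from `alloc` with the layout size.
    pub fn record_alloc(&self, size: usize) {
        let now = self.current.fetch_add(size, Ordering::Relaxed) + size;
        self.peak.fetch_max(now, Ordering::Relaxed);
    }

    /// Call from `dealloc` with the layout size.
    pub fn record_dealloc(&self, size: usize) {
        self.current.fetch_sub(size, Ordering::Relaxed);
    }

    pub fn current(&self) -> usize {
        self.current.load(Ordering::Relaxed)
    }

    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }

    /// Restart peak tracking from the current level, e.g. between benchmarks.
    pub fn reset_peak(&self) {
        self.peak.store(self.current(), Ordering::Relaxed);
    }
}
```

Relaxed ordering is enough here because the counters are independent statistics and do not guard other data.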
6.3 GPU Utilization
NVIDIA GPU Profiling:
# Install NVIDIA profiling tools
# Nsight Systems for timeline profiling
nsys profile --trace=cuda,nvtx cargo bench --bench scipix_ocr
# Nsight Compute for kernel analysis
ncu --set full cargo bench --bench scipix_ocr
GPU Monitoring Script:
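// src/gpu_monitor.rs
use std::process::Command;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

pub struct GPUMonitor {
    // Shared with the sampling thread so that stop() can signal it.
    monitoring: Arc<AtomicBool>,
    samples: Arc<Mutex<Vec<GPUSample>>>,
    worker: Option<JoinHandle<()>>,
}

#[derive(Debug, Clone)]
pub struct GPUSample {
    timestamp: Instant,
    utilization: u32,  // percent
    memory_used: u64,  // MiB (nvidia-smi reports MiB with `nounits`)
    memory_total: u64, // MiB
    temperature: u32,  // degrees Celsius
}

impl GPUMonitor {
    pub fn new() -> Self {
        Self {
            monitoring: Arc::new(AtomicBool::new(false)),
            samples: Arc::new(Mutex::new(Vec::new())),
            worker: None,
        }
    }

    // Samples on a background thread and returns immediately; a blocking
    // loop here would never let the caller reach stop().
    pub fn start(&mut self) {
        if self.monitoring.swap(true, Ordering::SeqCst) {
            return; // already running
        }
        self.samples.lock().unwrap().clear();
        let monitoring = Arc::clone(&self.monitoring);
        let samples = Arc::clone(&self.samples);
        self.worker = Some(thread::spawn(move || {
            while monitoring.load(Ordering::SeqCst) {
                if let Ok(sample) = Self::collect_sample() {
                    samples.lock().unwrap().push(sample);
                }
                thread::sleep(Duration::from_millis(100));
            }
        }));
    }

    // Signals the sampling thread to exit and waits for it to finish.
    pub fn stop(&mut self) {
        self.monitoring.store(false, Ordering::SeqCst);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }

    fn collect_sample() -> Result<GPUSample, Box<dyn std::error::Error>> {
        let output = Command::new("nvidia-smi")
            .args(&[
                "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
                "--format=csv,noheader,nounits",
            ])
            .output()?;
        let data = String::from_utf8(output.stdout)?;
        // nvidia-smi prints one line per GPU; only the first GPU is sampled.
        let line = data.lines().next().ok_or("empty nvidia-smi output")?;
        let parts: Vec<&str> = line.split(',').map(str::trim).collect();
        if parts.len() < 4 {
            return Err("unexpected nvidia-smi output".into());
        }
        Ok(GPUSample {
            timestamp: Instant::now(),
            utilization: parts[0].parse()?,
            memory_used: parts[1].parse()?,
            memory_total: parts[2].parse()?,
            temperature: parts[3].parse()?,
        })
    }

    pub fn get_statistics(&self) -> GPUStatistics {
        let samples = self.samples.lock().unwrap();
        if samples.is_empty() {
            return GPUStatistics::default();
        }
        let count = samples.len() as f64;
        let avg_utilization = samples.iter().map(|s| s.utilization as f64).sum::<f64>() / count;
        let max_utilization = samples.iter().map(|s| s.utilization).max().unwrap_or(0);
        let avg_memory = samples.iter().map(|s| s.memory_used as f64).sum::<f64>() / count;
        GPUStatistics {
            avg_utilization,
            max_utilization,
            avg_memory_mb: avg_memory, // already in MiB, so no unit conversion
            sample_count: samples.len(),
        }
    }
}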
#[derive(Debug, Default)]
pub struct GPUStatistics {
pub avg_utilization: f64,
pub max_utilization: u32,
pub avg_memory_mb: f64,
pub sample_count: usize,
}
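The CSV parsing step is the part of the monitor that is easiest to test without a GPU. The sketch below isolates it as a pure function; `GpuReading` and `parse_gpu_line` are hypothetical names, and the field order assumes the `--query-gpu` list used above (`utilization.gpu,memory.used,memory.total,temperature.gpu`).

```rust
#[derive(Debug, PartialEq)]
pub struct GpuReading {
    pub utilization_pct: u32,
    pub memory_used_mib: u64,
    pub memory_total_mib: u64,
    pub temperature_c: u32,
}

/// Parses one line of `nvidia-smi --format=csv,noheader,nounits` output.
/// Returns `None` if a field is missing, extra, or not numeric (e.g. `[N/A]`).
pub fn parse_gpu_line(line: &str) -> Option<GpuReading> {
    let mut fields = line.split(',').map(str::trim);
    let reading = GpuReading {
        utilization_pct: fields.next()?.parse().ok()?,
        memory_used_mib: fields.next()?.parse().ok()?,
        memory_total_mib: fields.next()?.parse().ok()?,
        temperature_c: fields.next()?.parse().ok()?,
    };
    if fields.next().is_some() {
        return None; // more columns than the query asked for
    }
    Some(reading)
}
```

On a multi-GPU machine `nvidia-smi` prints one such line per device, so the caller would apply this to each line of the output.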
6.4 Integrated Profiling Benchmark
// benches/profiling_benchmark.rs
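use criterion::{criterion_group, criterion_main, Criterion};
use ruvector_scipix::{ScipixOCR, OCRConfig};
// Module paths assumed from the file names in Sections 6.2 and 6.3.
use ruvector_scipix::gpu_monitor::GPUMonitor;
use ruvector_scipix::memory_tracker::get_memory_stats;
use std::sync::{Arc, Mutex};
use std::thread;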
fn benchmark_with_profiling(c: &mut Criterion) {
let mut group = c.benchmark_group("profiled");
group.bench_function("ocr_with_memory_tracking", |b| {
b.iter_custom(|iters| {
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).unwrap();
let (start_alloc, _, _) = get_memory_stats();
let start_time = std::time::Instant::now();
for _ in 0..iters {
ocr.process_image("testdata/sample.png").unwrap();
}
let duration = start_time.elapsed();
let (end_alloc, _, current) = get_memory_stats();
println!("Memory delta: {} bytes", end_alloc - start_alloc);
println!("Current usage: {} bytes", current);
duration
});
});
group.bench_function("ocr_with_gpu_monitoring", |b| {
let monitor = Arc::new(Mutex::new(GPUMonitor::new()));
let monitor_clone = monitor.clone();
// Start GPU monitoring in background thread
let handle = thread::spawn(move || {
monitor_clone.lock().unwrap().start();
});
b.iter(|| {
let config = OCRConfig::default();
let ocr = ScipixOCR::new(config).unwrap();
ocr.process_image("testdata/sample.png").unwrap();
});
// Stop monitoring
monitor.lock().unwrap().stop();
handle.join().unwrap();
let stats = monitor.lock().unwrap().get_statistics();
println!("GPU Statistics: {:?}", stats);
});
group.finish();
}
criterion_group!(benches, benchmark_with_profiling);
criterion_main!(benches);
7. Regression Testing
7.1 Performance Baseline Tracking
Baseline Storage Structure:
{
"commit": "a1b2c3d4",
"timestamp": "2024-01-15T10:30:00Z",
"benchmarks": {
"ocr_simple_equation": {
"mean": 185.4,
"std_dev": 12.3,
"p50": 182.1,
"p95": 210.5,
"p99": 225.8
},
"ocr_batch_10_images": {
"mean": 1420.6,
"std_dev": 85.2,
"throughput": 7.04
}
},
"accuracy": {
"cer": 0.0185,
"wer": 0.0432,
"bleu": 87.3
}
}
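The `mean` and `std_dev` fields in a baseline like the one above can be derived from raw per-iteration timings. A minimal sketch, assuming timings in milliseconds and the sample (n - 1) standard deviation; the function name is illustrative:

```rust
/// Mean and sample standard deviation of a set of measurements.
/// Returns `None` when fewer than two samples are available.
pub fn mean_and_std_dev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Bessel's correction (n - 1) gives an unbiased variance estimate.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, variance.sqrt()))
}
```

Criterion reports its own estimates for these quantities, so a helper like this is mainly useful for timings collected outside Criterion.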
Baseline Manager:
// src/baseline_manager.rs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
#[derive(Serialize, Deserialize, Clone)]
pub struct BenchmarkBaseline {
pub commit: String,
pub timestamp: String,
pub benchmarks: HashMap<String, BenchmarkMetrics>,
pub accuracy: AccuracyMetrics,
}
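#[derive(Serialize, Deserialize, Clone)]
pub struct BenchmarkMetrics {
pub mean: f64,
pub std_dev: f64,
// Percentiles and throughput are not recorded for every benchmark in the
// sample baseline, so they are optional; serde maps missing fields to None.
pub p50: Option<f64>,
pub p95: Option<f64>,
pub p99: Option<f64>,
pub throughput: Option<f64>,
}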
#[derive(Serialize, Deserialize, Clone)]
pub struct AccuracyMetrics {
pub cer: f64,
pub wer: f64,
pub bleu: f64,
}
pub struct BaselineManager {
baseline_path: String,
}
impl BaselineManager {
pub fn new(baseline_path: &str) -> Self {
Self {
baseline_path: baseline_path.to_string(),
}
}
pub fn load_baseline(&self) -> Result<BenchmarkBaseline, Box<dyn std::error::Error>> {
let content = fs::read_to_string(&self.baseline_path)?;
Ok(serde_json::from_str(&content)?)
}
pub fn save_baseline(&self, baseline: &BenchmarkBaseline) -> Result<(), Box<dyn std::error::Error>> {
let json = serde_json::to_string_pretty(baseline)?;
fs::write(&self.baseline_path, json)?;
Ok(())
}
pub fn compare_with_baseline(
&self,
current: &BenchmarkBaseline,
threshold: f64
) -> Vec<RegressionAlert> {
let baseline = match self.load_baseline() {
Ok(b) => b,
Err(_) => return vec![],
};
let mut alerts = Vec::new();
for (name, current_metrics) in &current.benchmarks {
if let Some(baseline_metrics) = baseline.benchmarks.get(name) {
let regression = (current_metrics.mean - baseline_metrics.mean) / baseline_metrics.mean;
if regression > threshold {
alerts.push(RegressionAlert {
benchmark: name.clone(),
metric: "mean".to_string(),
baseline_value: baseline_metrics.mean,
current_value: current_metrics.mean,
regression_percent: regression * 100.0,
severity: if regression > threshold * 2.0 { Severity::High } else { Severity::Medium },
});
}
}
}
// Check accuracy regressions
if current.accuracy.cer > baseline.accuracy.cer * (1.0 + threshold) {
alerts.push(RegressionAlert {
benchmark: "accuracy".to_string(),
metric: "cer".to_string(),
baseline_value: baseline.accuracy.cer,
current_value: current.accuracy.cer,
regression_percent: ((current.accuracy.cer - baseline.accuracy.cer) / baseline.accuracy.cer) * 100.0,
severity: Severity::High,
});
}
alerts
}
}
#[derive(Debug)]
pub struct RegressionAlert {
pub benchmark: String,
pub metric: String,
pub baseline_value: f64,
pub current_value: f64,
pub regression_percent: f64,
pub severity: Severity,
}
#[derive(Debug)]
pub enum Severity {
Low,
Medium,
High,
}
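The threshold rule used by `compare_with_baseline` can be isolated as a pure function, which makes the boundaries easy to test. This is a sketch that mirrors the logic above with its own hypothetical names (`RegressionLevel`, `classify_regression`); it assumes that larger values are worse, as with latency and CER.

```rust
#[derive(Debug, PartialEq)]
pub enum RegressionLevel {
    Medium,
    High,
}

/// Returns the relative change and its level when `current` is worse than
/// `baseline` by more than `threshold` (0.10 means 10%).
pub fn classify_regression(
    baseline: f64,
    current: f64,
    threshold: f64,
) -> Option<(f64, RegressionLevel)> {
    if baseline <= 0.0 {
        return None; // relative change is undefined without a positive baseline
    }
    let change = (current - baseline) / baseline;
    if change <= threshold {
        return None;
    }
    let level = if change > threshold * 2.0 {
        RegressionLevel::High
    } else {
        RegressionLevel::Medium
    };
    Some((change, level))
}
```

For example, with a 10% threshold a mean that moves from 100 ms to 115 ms is a medium regression, and a move to 125 ms is a high one.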
7.2 Automated Regression Detection
// scripts/detect_regression.rs
use ruvector_scipix::baseline_manager::{BaselineManager, BenchmarkBaseline, Severity};
use std::env;
use std::process;
fn main() {
let args: Vec<String> = env::args().collect();
if args.len() < 3 {
eprintln!("Usage: detect_regression <baseline.json> <current.json>");
process::exit(1);
}
let baseline_path = &args[1];
let current_path = &args[2];
let threshold = 0.10; // 10% regression threshold
let manager = BaselineManager::new(baseline_path);
// Load current results
let current: BenchmarkBaseline = {
let content = std::fs::read_to_string(current_path)
.expect("Failed to read current results");
serde_json::from_str(&content)
.expect("Failed to parse current results")
};
// Compare with baseline
let alerts = manager.compare_with_baseline(&current, threshold);
if alerts.is_empty() {
println!("✅ No performance regressions detected");
process::exit(0);
} else {
println!("⚠️ Performance regressions detected:");
let mut has_high_severity = false;
for alert in &alerts {
let severity_icon = match alert.severity {
Severity::Low => "🟡",
Severity::Medium => "🟠",
Severity::High => "🔴",
};
if matches!(alert.severity, Severity::High) {
has_high_severity = true;
}
println!(
"{} {} / {}: {:.2}ms → {:.2}ms ({:+.1}%)",
severity_icon,
alert.benchmark,
alert.metric,
alert.baseline_value,
alert.current_value,
alert.regression_percent
);
}
if has_high_severity {
process::exit(1);
} else {
process::exit(0);
}
}
}
7.3 GitHub Actions Integration
# .github/workflows/regression_check.yml
name: Performance Regression Check
on:
  pull_request:
    branches: [main]
jobs:
  regression-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Download baseline
        run: |
          # Download baseline from releases or artifacts
          gh release download baseline --pattern 'benchmark_baseline.json'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Run benchmarks
        run: |
          # Criterion treats --save-baseline as a name and stores results under
          # target/criterion/<benchmark>/<name>/; current_baseline.json has to be
          # produced from those results by a separate export step.
          cargo bench --bench scipix_ocr -- --save-baseline current
      - name: Detect regressions
        run: |
          cargo run --bin detect_regression -- benchmark_baseline.json current_baseline.json
      - name: Comment on PR
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '⚠️ Performance regression detected! Please review benchmark results.'
            })
7.4 Continuous Baseline Updates
// scripts/update_baseline.rs
use ruvector_scipix::baseline_manager::{BaselineManager, BenchmarkBaseline};
use std::process::Command;
fn main() {
// Get current git commit
let commit = Command::new("git")
.args(&["rev-parse", "HEAD"])
.output()
.expect("Failed to get git commit")
.stdout;
let commit = String::from_utf8(commit).unwrap().trim().to_string();
// Run benchmarks
let benchmark_output = Command::new("cargo")
.args(&["bench", "--bench", "scipix_ocr", "--", "--save-baseline", "temp.json"])
.output()
.expect("Failed to run benchmarks");
if !benchmark_output.status.success() {
eprintln!("Benchmark failed");
std::process::exit(1);
}
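// Load benchmark results. Criterion's --save-baseline takes a baseline name and
// writes under target/criterion/, so temp.json has to come from a separate export step.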
let baseline: BenchmarkBaseline = {
let content = std::fs::read_to_string("temp.json")
.expect("Failed to read benchmark results");
let mut baseline: BenchmarkBaseline = serde_json::from_str(&content)
.expect("Failed to parse benchmark results");
baseline.commit = commit;
baseline.timestamp = chrono::Utc::now().to_rfc3339();
baseline
};
// Save as new baseline
let manager = BaselineManager::new("benchmark_baseline.json");
manager.save_baseline(&baseline)
.expect("Failed to save baseline");
println!("✅ Baseline updated successfully");
println!("Commit: {}", baseline.commit);
println!("Timestamp: {}", baseline.timestamp);
}
Summary
This benchmarking strategy provides:
- Comprehensive Performance Metrics: Latency, throughput, memory, and model loading benchmarks
- Accuracy Validation: CER, WER, BLEU, and ERR metrics with industry-standard datasets
- Competitive Analysis: Baseline comparisons with Scipix, pix2tex, ocrs, and Tesseract
- Production-Ready Implementation: Criterion.rs benchmarks with CI/CD integration
- Advanced Profiling: CPU, memory, and GPU profiling tools
- Regression Protection: Automated detection and alerting for performance degradation
Next Steps:
- Set up test datasets (Im2latex, CROHME)
- Implement core benchmark suite
- Establish performance baselines
- Integrate into CI/CD pipeline
- Configure alerting for regressions
- Regular benchmark reviews and optimization
Benchmark Execution:
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench --bench scipix_ocr
# Run with profiling
cargo flamegraph --bench scipix_ocr
# Check for regressions
cargo run --bin detect_regression -- baseline.json current.json
# Update baseline
cargo run --bin update_baseline