
Rust OCR and ML Ecosystem Analysis for ruvector-scipix

Executive Summary

This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers six OCR libraries and four ML inference engines, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.

Key Finding: The optimal stack for ruvector-scipix combines ort (ONNX Runtime bindings) for inference and image/imageproc for preprocessing, with pure Rust alternatives (tract, candle) for WASM targets.


1. Library Comparison Matrix

OCR Libraries

| Library | Type | Model Support | WASM Support | GPU Support | Maturity | Performance | Dependencies |
|---|---|---|---|---|---|---|---|
| ocrs | Native Rust | ONNX (RTen engine) | Yes | No | 🟡 Preview | Medium | Minimal (Pure Rust) |
| oar-ocr | ONNX Wrapper | PaddleOCR ONNX | Yes | CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| kalosm-ocr | Pure Rust | TrOCR (candle) | Yes | WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| leptess | FFI Bindings | Tesseract C++ | No | No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| paddle-ocr-rs | ONNX Wrapper | PaddleOCR v4/v5 | Yes | CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| pure-onnx-ocr | Pure ONNX | PaddleOCR DBNet+SVTR | Yes | Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |

ML Inference Engines

| Library | Purpose | Model Format | WASM Support | GPU Support | Performance | Maturity |
|---|---|---|---|---|---|---|
| ort | ONNX Runtime | ONNX | Yes | CUDA/TensorRT/OpenVINO | Very High | 🟢 Production |
| candle | ML Framework | Multiple | Yes | CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| tract | ONNX/TF Inference | ONNX, NNEF, TF | Yes | Limited | High (CPU) | 🟢 Mature (Sonos) |
| burn | Deep Learning | Multiple | Yes | CUDA/Metal/WGPU | Very High | 🟢 Active |

Legend: 🟢 Production-ready | 🟡 Active development | 🔴 Experimental

Performance Benchmarks

Based on research findings:

  • ort + PaddleOCR: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
  • ONNX conversion: Up to 5x faster than PaddlePaddle native inference
  • tract: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
  • Tesseract (leptess): Baseline CPU performance, requires preprocessing
  • ocrs: Early preview, moderate performance on clear text

2. ONNX Runtime Integration Options

2.1 Primary: ort (ONNX Runtime)

Overview: ort by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.

Key Features:

  • Hardware Acceleration: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
  • Dynamic Loading: Runtime linking for flexibility (load-dynamic feature)
  • Alternative Backends: Support for tract and candle backends
  • Minimal Builds: RTTI-free, optimized binary sizes for production
  • Float16/BFloat16: Via half crate integration
  • Production Proven: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB

Cargo Features:

[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",           # Float16/BFloat16 support
    "load-dynamic",   # Runtime dynamic linking
    "cuda",           # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",       # TensorRT optimization (requires TensorRT 8.4+)
] }

Performance Characteristics:

  • Significantly faster than PyTorch for inference
  • Supports model quantization (int8, float16)
  • Multi-GPU distribution via NCCL
  • Optimal for batch processing and real-time inference

Integration Example:

use ort::{GraphOptimizationLevel, Session};

// Load ONNX model (exact import paths may shift between ort 2.0 release candidates)
let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;

// Run inference; `input_tensor` is an ndarray holding the preprocessed image,
// and "input" must match the model's declared input name
let outputs = session.run(ort::inputs!["input" => input_tensor]?)?;

2.2 Alternative: tract Backend

Use Case: When ONNX Runtime binaries are problematic or WASM target required

Advantages:

  • Pure Rust implementation
  • No external C++ dependencies
  • Excellent WASM support
  • Passes 85% of ONNX backend tests
  • Lightweight and maintainable

Limitations:

  • No tensor sequences or optional tensors
  • Limited GPU support compared to ort
  • TensorFlow 2 support via ONNX conversion only

2.3 Alternative: candle Backend

Use Case: When integrating with Hugging Face ecosystem or needing pure Rust

Advantages:

  • Minimalist design, fast compilation
  • Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
  • WASM + WebGPU acceleration
  • Small binary size for serverless deployment
  • CUDA, Metal, MKL, Accelerate backends

Limitations:

  • Younger ecosystem than ONNX Runtime
  • Fewer pre-optimized OCR models available
  • Focus on inference over training

3. Pure Rust ML with Candle/Tract

3.1 Candle Framework (Hugging Face)

Architecture: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.

Supported Models:

  • Language Models: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
  • Vision Models: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
  • Speech: Whisper ASR

Backend Support:

| Backend | Platform | Performance | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |

Design Philosophy:

  • Remove Python from production workloads
  • Minimize binary size (critical for edge/serverless)
  • Fast startup times (first token ~120ms on M2 MacBook Air)
  • Rust's safety guarantees for ML workloads

Example Usage:

use std::collections::HashMap;
use candle_core::{Device, Tensor};

// Load model (candle_onnx parses the ONNX protobuf)
let model = candle_onnx::read_file("model.onnx")?;

// Create device (CUDA if available, otherwise CPU)
let device = Device::cuda_if_available(0)?;

// Run inference: candle_onnx evaluates the graph from a map of named inputs;
// the key must match the input name declared in the ONNX graph
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let mut inputs = HashMap::new();
inputs.insert("input".to_string(), input);
let outputs = candle_onnx::simple_eval(&model, inputs)?;

3.2 Tract Framework (Sonos)

Architecture: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.

Key Capabilities:

  • ONNX Support: 85% of ONNX backend tests passing
  • Operator Set: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
  • Proven Models: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
  • Pulsing: Streaming inference for time-series models (e.g., WaveNet)
  • Quantization: Built-in int8 quantization support

Performance Characteristics:

  • Optimized for CPU inference
  • Excellent for edge devices (Raspberry Pi, embedded systems)
  • Minimal memory footprint
  • No RTTI or runtime overhead

Example Usage:

use tract_onnx::prelude::*;

// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;

// Run inference
let input = tract_ndarray::arr4(&[[...]]).into_dyn();
let result = model.run(tvec![input.into()])?;

Quantization Support:

let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .quantize()?  // Automatic int8 quantization
    .into_optimized()?
    .into_runnable()?;

3.3 Comparison: Candle vs Tract vs ort

| Criterion | Candle | Tract | ort |
|---|---|---|---|
| Performance (GPU) | Very High | N/A | Very High |
| Performance (CPU) | High | Very High | Very High |
| Binary Size | Small | Very Small | Large |
| Startup Time | Fast | Very Fast | Medium |
| WASM Support | Excellent | Excellent | Good (with backends) |
| Model Ecosystem | Hugging Face | ONNX/TF | ONNX (largest) |
| GPU Backends | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| Quantization | Manual | Built-in | Excellent (ONNX tools) |
| Maturity | Stable (2024+) | Mature (2018+) | Production (Microsoft) |

Recommendation:

  • ort: Primary choice for maximum performance and hardware acceleration
  • candle: Secondary choice for WASM targets or Hugging Face integration
  • tract: Fallback for pure Rust requirements or extreme size constraints
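
This tiering can be expressed as Cargo feature gates so the backend is chosen at compile time. A minimal sketch, assuming hypothetical feature and module names (nothing below is part of an existing crate):

// Backend selection behind Cargo features; callers use `DefaultBackend`
// without caring which engine was compiled in.
#[cfg(feature = "ort-backend")]
pub use crate::backend_ort::OrtBackend as DefaultBackend;

#[cfg(all(feature = "candle-backend", not(feature = "ort-backend")))]
pub use crate::backend_candle::CandleBackend as DefaultBackend;

#[cfg(all(feature = "tract-backend", not(any(feature = "ort-backend", feature = "candle-backend"))))]
pub use crate::backend_tract::TractBackend as DefaultBackend;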

4. Image Processing in Rust

4.1 The image Crate (Foundation)

Purpose: Core image encoding/decoding and basic manipulation.

Supported Formats:

  • JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF

Key Features:

use image::imageops::{self, FilterType};

// Load image
let img = image::open("input.jpg")?;

// Basic operations (resize/grayscale on DynamicImage, filters in imageops)
let resized = img.resize(800, 600, FilterType::Lanczos3);
let grayscale = img.grayscale();
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);

4.2 The imageproc Crate (Advanced Processing)

Purpose: Advanced image processing algorithms for computer vision.

Modules:

| Module | Capabilities |
|---|---|
| Contrast | Histogram equalization, adaptive thresholding, CLAHE |
| Corners | Harris, FAST, Shi-Tomasi corner detection |
| Distance Transform | Euclidean distance maps, morphological operations |
| Edges | Canny edge detection, Sobel/Scharr operators |
| Filter | Gaussian, median, bilateral filtering |
| Geometric | Rotation, affine, projective transformations |
| Morphology | Erosion, dilation, opening, closing |
| Drawing | Shapes, text, anti-aliased primitives |
| Contours | Border tracing, contour extraction |

Parallelism: CPU-based multithreading via rayon (not GPU acceleration)

OCR Preprocessing Example:

use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};

// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();

    // Denoise with Gaussian blur
    let blurred = gaussian_blur_f32(&gray, 1.0);

    // Adaptive thresholding for varying lighting (block radius of 21 pixels)
    let binary = adaptive_threshold(&blurred, 21);

    // Deskew if needed; detect_skew is an application-specific helper
    let angle = detect_skew(&binary);
    rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]))
}

4.3 GPU Acceleration Options for Image Processing

Current State: imageproc does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:

Option 1: wgpu + Custom Compute Shaders

// GPU compute shader for image processing; `device` is an existing wgpu::Device
// and process.wgsl is an application-provided WGSL compute shader
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl").into()),
});

Option 2: OpenCV-Rust Bindings (if CUDA needed)

  • Provides GPU-accelerated operations via CUDA
  • Requires OpenCV C++ installation
  • Not pure Rust

Option 3: Integrate with ML Framework GPU Ops

  • Use candle/ort tensor operations for preprocessing
  • Leverage existing GPU context
  • Keep preprocessing on same device as inference
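
As a sketch of this third option, the usual normalize step can be written directly as candle tensor ops so it runs on whatever device the model uses. The constants below are the common ImageNet statistics and the input is assumed to be a CHW f32 buffer; both are assumptions, not project requirements:

use candle_core::{Device, Tensor};

// Normalize a CHW f32 image buffer on the same device as the model.
fn normalize_on_device(
    rgb_chw: Vec<f32>,
    height: usize,
    width: usize,
    device: &Device,
) -> candle_core::Result<Tensor> {
    let img = Tensor::from_vec(rgb_chw, (3, height, width), device)?;
    // Per-channel mean/std, broadcast over height and width
    let mean = Tensor::new(&[0.485f32, 0.456, 0.406], device)?.reshape((3, 1, 1))?;
    let std = Tensor::new(&[0.229f32, 0.224, 0.225], device)?.reshape((3, 1, 1))?;
    img.broadcast_sub(&mean)?.broadcast_div(&std)
}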

Recommendation for ruvector-scipix:

  • Use image + imageproc for CPU preprocessing (fast enough for most cases)
  • For GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
  • Leverage rayon parallelism for batch processing

5. GPU Acceleration Options

5.1 Cross-Platform GPU Support in 2025

The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established APIs.

Unified Backend: wgpu (WebGPU Standard)

  • Targets: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
  • Use Case: Portable GPU compute without vendor lock-in
  • Frameworks: Burn, Candle (WGPU backend), kalosm

Performance Profile:

| Backend | Platform | Speedup vs CPU | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |

5.2 CUDA Support

Primary Library: cudarc (Low-level CUDA bindings)

Integration via ONNX Runtime:

[dependencies]
ort = { version = "2.0", features = ["cuda"] }

Requirements:

  • CUDA Toolkit 11.6+ (for ort)
  • NVIDIA GPU: Maxwell architecture (GTX 750/900 series) or newer
  • Compute Capability 5.0+

Benefits:

  • Industry-standard ML acceleration
  • Mature ecosystem and tooling
  • Extensive operator coverage
  • Best-in-class performance for training and inference
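
Since cudarc is mentioned above only as the low-level binding, one way it might be used alongside ort is as a simple device probe before registering the CUDA execution provider. A minimal sketch using cudarc's CudaDevice constructor (adjust to the cudarc version pinned in the workspace):

use cudarc::driver::CudaDevice;

// Returns true if CUDA device 0 can be initialized; callers can use this to
// decide whether to enable the CUDA execution provider or stay on the CPU path.
fn cuda_available() -> bool {
    CudaDevice::new(0).is_ok()
}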

5.3 Metal Support (Apple Silicon)

Framework Integration:

  • Candle: Native Metal backend via metal crate
  • Burn: Metal support through burn-metal backend
  • ONNX Runtime: CoreML execution provider (Metal-accelerated)

Example (Candle):

use candle_core::{Device, Tensor};

let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;

Performance: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips

5.4 WebGPU/WGPU

Purpose: Cross-platform GPU compute for WASM and native

Frameworks with WGPU Support:

  • Burn: First-class WGPU backend
  • Candle: WGPU support for browser deployment
  • Kalosm: WGPU acceleration via Fusor (0.5 release)

Browser Deployment:

// In a wasm32 build, candle falls back to the CPU device; CUDA and Metal are
// not available in the browser, and WebGPU acceleration is wired in via the
// framework backends listed above.
#[cfg(target_arch = "wasm32")]
use candle_core::Device;

#[cfg(target_arch = "wasm32")]
fn wasm_device() -> Device {
    Device::Cpu
}

Benefits:

  • Browser-based ML inference without server
  • Works on AMD GPUs (unlike CUDA)
  • Portable across desktop and web
  • Future-proof standard (W3C specification)

Limitations:

  • Lower performance than native CUDA/Metal
  • Browser memory constraints (typically 2-8GB)
  • First token latency: ~120ms (acceptable for many use cases)

5.5 TensorRT (NVIDIA Optimization)

Purpose: Optimized ONNX model execution on NVIDIA GPUs

Requirements:

  • NVIDIA GPU: GeForce 9xx series or newer
  • TensorRT 8.4+
  • CUDA 11.6+

Integration:

ort = { version = "2.0", features = ["cuda", "tensorrt"] }

Benefits:

  • Automatic kernel fusion and layer optimization
  • Mixed precision (FP32/FP16/INT8)
  • Up to 2-5x faster than standard CUDA
  • Optimal for high-throughput production deployment

5.6 OpenVINO (Intel)

Target: Intel CPUs (6th gen+) and Intel integrated GPUs

Use Case:

  • Intel-based servers without discrete GPU
  • Edge devices with Intel processors
  • Cost-effective acceleration without NVIDIA hardware

Integration:

ort = { version = "2.0", features = ["openvino"] }

Performance: 5-20x CPU speedup depending on model and hardware

5.7 GPU Acceleration Recommendation for ruvector-scipix

Tiered Approach:

  1. Primary (Production): ort with CUDA/TensorRT

    • Maximum performance for server deployment
    • Best operator coverage for PaddleOCR models
    • Production-proven reliability
  2. Secondary (Apple Ecosystem): candle with Metal

    • Native Apple Silicon support
    • Good for macOS/iOS deployment
    • Smaller binary size than ONNX Runtime
  3. Tertiary (WASM/Browser): candle or tract with WGPU

    • Client-side OCR in browser
    • Privacy-preserving (no server upload)
    • Acceptable performance for interactive use
  4. Fallback (CPU-only): tract or ort with optimized CPU execution

    • MKL/OpenBLAS acceleration
    • Rayon parallelism
    • Still faster than Python alternatives
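
For the candle-based tiers, the fallback order can be probed at runtime. A minimal sketch (CUDA, then Metal, then CPU), assuming the corresponding candle features are compiled in; when a backend is absent the constructor simply errors and the next tier is tried:

use candle_core::Device;

// Pick the best available device: CUDA, then Metal, then CPU.
fn pick_device() -> Device {
    if let Ok(d) = Device::new_cuda(0) {
        return d;
    }
    if let Ok(d) = Device::new_metal(0) {
        return d;
    }
    Device::Cpu
}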

6. WebAssembly Compilation Considerations

6.1 WASM for ML: Current State (2025)

Key Finding: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.

Performance Characteristics:

  • Rust compiles to WASM faster than C++
  • Rust produces smaller binaries than C++ WASM
  • Memory efficiency: Rust's ownership model translates well to WASM linear memory
  • Consistent performance across browsers

6.2 Memory Constraints and Optimization

Browser Memory Limits:

  • Typical: 2-4GB per tab (Chrome/Firefox)
  • Maximum: 4-8GB (varies by browser/OS)
  • Critical Issue: Running multiple models can exhaust memory quickly

Memory Optimization Strategies:

1. Model Quantization

// INT8 quantization reduces memory by 4x, FP16 by 2x.
// Illustrative call only; in practice quantization is performed offline with
// the framework's own tooling and the quantized model is loaded normally.
let quantized_model = model.quantize(QuantizationType::QInt8)?;

2. Memory Reuse

// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
    input_buffer: Vec<f32>,
    output_buffer: Vec<f32>,
}

impl InferenceContext {
    fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
        self.input_buffer.copy_from_slice(data);
        model.run(&self.input_buffer, &mut self.output_buffer)?;
        Ok(&self.output_buffer)
    }
}

3. Lazy Loading with Streaming Compile

// Use WebAssembly.instantiateStreaming for faster startup and load models
// on demand, not at initialization. Sketch only: in real code the fetch and
// instantiate calls go through the web-sys / js-sys bindings and JsFuture.
async fn load_model_lazy(url: &str) -> Result<Module> {
    let response = window.fetch(url).await?;
    let module = WebAssembly::instantiate_streaming(&response).await?;
    Ok(module)
}

4. wasm-opt Optimization

# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm

5. Model Cleanup

// Explicit cleanup when switching models
impl Drop for ModelContext {
    fn drop(&mut self) {
        // Free GPU resources
        self.gpu_buffers.clear();
        // Trigger garbage collection hint (if available)
    }
}

6.3 Bundle Size Considerations

Challenge: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.

Mitigation Strategies:

1. Code Splitting

// Load OCR functionality separately from main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
    // Lazy-load OCR model
    let model = load_model("ocr.onnx").await?;
    Ok(OcrEngine::new(model))
}

2. Minimal Features

[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }

3. Compression

# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br

# Gzip fallback
gzip -9 output.wasm

4. Tree Shaking

[profile.release]
opt-level = "z"  # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true

Expected Sizes:

| Configuration | Uncompressed | Brotli | Gzip |
|---|---|---|---|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |

6.4 WASM-Specific Limitations

1. Threading Constraints

  • SharedArrayBuffer required for multi-threading
  • COEP/COOP headers needed for isolation
  • Not all browsers support WASM threads
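
For reference, the isolation headers mentioned above are the standard pair that must be sent with the page:

# Required response headers for cross-origin isolation (SharedArrayBuffer / WASM threads)
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp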

2. SIMD Support

  • WASM SIMD enabled by default in modern browsers
  • Significant performance boost for ML operations
  • Check browser compatibility: wasm-feature-detect

3. No Direct File System Access

  • Use IndexedDB or Cache API for model storage
  • Stream models from network (HTTP/2)
  • Consider embedding small models in binary

4. GPU Access

  • WebGPU required for GPU acceleration
  • Not universally supported (as of 2025, Chrome/Edge primarily)
  • Fallback to CPU inference needed

6.5 Recommended WASM Stack

Primary: candle with WGPU

  • Smallest binary size
  • Native WASM support
  • WebGPU acceleration when available
  • Hugging Face ecosystem

Secondary: tract

  • Pure Rust, no C++ dependencies
  • Excellent WASM support
  • Proven in production (Sonos)
  • CPU-optimized

Alternative: ort with WASM backend

  • Full ONNX operator support
  • Can use tract or candle as backend
  • Larger bundle size

Example WASM Integration:

use wasm_bindgen::prelude::*;
use candle_core::{Device, Tensor};

#[wasm_bindgen]
pub struct OcrEngine {
    model: candle_onnx::Model,
    device: Device,
}

#[wasm_bindgen]
impl OcrEngine {
    // wasm-bindgen constructors cannot be async, so expose an async factory instead
    pub async fn new() -> Result<OcrEngine, JsValue> {
        // Use WebGPU if available, fallback to CPU
        let device = Device::Cpu; // Or Device::new_wgpu(0)?

        // Load model from URL
        let model_bytes = fetch_model("model.onnx").await?;
        let model = candle_onnx::read(&model_bytes)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;

        Ok(OcrEngine { model, device })
    }

    pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
        // Preprocess image
        let tensor = preprocess_image(image_data, &self.device)?;

        // Run inference
        let output = self.model.forward(&[tensor])
            .map_err(|e| JsValue::from_str(&e.to_string()))?;

        // Decode output
        let text = decode_predictions(output)?;
        Ok(text)
    }
}

6.6 WASM Deployment Checklist

  • Enable WASM SIMD in build (RUSTFLAGS='-C target-feature=+simd128')
  • Optimize bundle size (opt-level = "z", LTO, strip)
  • Implement lazy loading for models
  • Set up proper CORS headers for model fetching
  • Add WebGPU feature detection with CPU fallback
  • Configure Brotli/Gzip compression on CDN
  • Test memory usage across browsers (especially mobile)
  • Implement model cleanup on tab close
  • Add loading indicators for async model initialization
  • Consider service worker for model caching
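
A possible build sequence covering the first few checklist items; the package and output file names below are illustrative:

# Build with WASM SIMD, then shrink and pre-compress the artifact
RUSTFLAGS='-C target-feature=+simd128' wasm-pack build --release --target web
wasm-opt -Oz --enable-simd pkg/scipix_bg.wasm -o pkg/scipix_bg.wasm
brotli -q 11 pkg/scipix_bg.wasm -o pkg/scipix_bg.wasm.br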

7. Memory Management for Large Models

7.1 Memory Challenges in ML Inference

Typical OCR Model Sizes:

  • PaddleOCR Detection: 3-10MB (FP32)
  • PaddleOCR Recognition: 5-15MB (FP32)
  • TrOCR: 50-300MB (depending on variant)
  • Tesseract trained data: 10-50MB per language

Memory Consumption Beyond Model Weights:

  • Input tensors: Image size × channels × precision
  • Intermediate activations: Varies by architecture (can exceed model size)
  • Output buffers: Sequence length × vocab size
  • KV cache (for transformers): Context length × hidden size × layers
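
As a back-of-envelope check on the input-tensor term, a single FP32 image tensor already costs several megabytes before any activations are counted:

// One FP32 input of shape [1, 3, 960, 960]:
let elements = 1 * 3 * 960 * 960;                   // 2,764,800 values
let bytes = elements * std::mem::size_of::<f32>();  // 11,059,200 bytes ≈ 11 MB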

7.2 Quantization Strategies

INT8 Quantization (4x memory reduction)

// Quantization is normally performed offline (e.g. with ONNX Runtime's Python
// quantization tooling), producing an INT8 model that the Rust side then loads
// like any other ONNX file:
use ort::Session;

let session = Session::builder()?
    .commit_from_file("model_int8.onnx")?;

Benefits:

  • 75% memory reduction (FP32 → INT8)
  • Minimal accuracy loss (typically <1% for OCR)
  • Faster inference on integer-optimized hardware
  • Reduced cache pressure

FP16 Quantization (2x memory reduction)

// Using the half crate (enabled via ort's "half" feature)
use half::f16;

// Convert an FP32 buffer to FP16 before building the input tensor; the exact
// tensor-construction call depends on the ort version in use
let input_f16: Vec<f16> = input_f32.iter().map(|&x| f16::from_f32(x)).collect();

Benefits:

  • Better accuracy preservation than INT8
  • Native support on modern GPUs (Tensor Cores)
  • Still significant memory savings

Dynamic Quantization (Runtime)

// tract supports dynamic quantization
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .quantize()? // Automatic quantization
    .into_optimized()?
    .into_runnable()?;

7.3 Memory Pooling and Reuse

Tensor Buffer Reuse:

use std::sync::Arc;
use parking_lot::Mutex;

struct TensorPool {
    buffers: Vec<Arc<Mutex<Vec<f32>>>>,
    size: usize,
}

impl TensorPool {
    fn new(pool_size: usize, buffer_size: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
            .collect();
        TensorPool { buffers, size: pool_size }
    }

    fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
        // Round-robin or availability-based selection
        self.buffers.first().cloned()
    }
}

Session Pooling (ONNX Runtime):

use once_cell::sync::Lazy;
use ort::Session;

static SESSION_POOL: Lazy<Vec<Session>> = Lazy::new(|| {
    (0..4).map(|_| {
        Session::builder()
            .unwrap()
            .commit_from_file("model.onnx")
            .unwrap()
    }).collect()
});

fn get_session(thread_id: usize) -> &'static Session {
    &SESSION_POOL[thread_id % 4]
}

7.4 Streaming and Batching

Batch Processing (Amortize overhead):

fn process_batch(images: &[DynamicImage], model: &Session) -> Result<Vec<String>> {
    let batch_size = images.len();

    // Create batched tensor [batch_size, channels, height, width]
    let mut batch_tensor = vec![0.0f32; batch_size * 3 * 224 * 224];

    for (i, img) in images.iter().enumerate() {
        let offset = i * 3 * 224 * 224;
        preprocess_into_buffer(img, &mut batch_tensor[offset..]);
    }

    // Single inference call for entire batch
    let output = model.run(vec![batch_tensor.into()])?;

    // Decode batch results
    decode_batch_predictions(output, batch_size)
}

Streaming Inference (For large documents):

use futures::{Stream, StreamExt};

async fn process_document_streaming(
    pages: impl Stream<Item = Image>,
    model: &Session,
) -> impl Stream<Item = Result<String>> {
    pages.map(|page| {
        // Process one page at a time
        let text = recognize_text(&page, model)?;
        Ok(text)
    })
}

7.5 Model Sharding and Lazy Loading

Lazy Model Loading:

use once_cell::sync::OnceCell;

static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();

fn get_detection_model() -> &'static Session {
    DETECTION_MODEL.get_or_init(|| {
        Session::builder()
            .unwrap()
            .commit_from_file("detection.onnx")
            .unwrap()
    })
}

Conditional Loading:

// Only load language-specific models when needed
struct OcrEngine {
    detection: Session,
    recognition_models: HashMap<Language, OnceCell<Session>>,
}

impl OcrEngine {
    fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
        let boxes = self.detect(img)?;

        let rec_model = self.recognition_models
            .get(&lang)
            .unwrap()
            .get_or_init(|| load_recognition_model(lang));

        self.recognize_boxes(img, &boxes, rec_model)
    }
}

7.6 Memory Mapping (Large Models)

Using memmap2 for Model Files:

use memmap2::Mmap;
use std::fs::File;

fn load_model_mmap(path: &str) -> Result<Mmap> {
    let file = File::open(path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(mmap)
}

// Model data stays on disk, paged in as needed
// Useful for models >100MB

Benefits:

  • Reduced resident memory
  • Faster startup (no full load)
  • Shared memory across processes

Limitations:

  • Not available in WASM
  • Requires file system access
  • May have higher latency on first access

7.7 GPU Memory Management

CUDA Unified Memory:

// ort automatically manages GPU memory once a CUDA execution provider is
// registered (the provider-registration syntax varies across ort versions)
let session = Session::builder()?
    .with_execution_providers([ExecutionProvider::CUDA])?
    .commit_from_file("model.onnx")?;

// Tensors automatically transferred to/from GPU

Manual GPU Memory Control (candle):

use candle_core::{Device, Tensor};

let device = Device::new_cuda(0)?;

// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;

// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;

// Explicit cleanup
drop(tensor_gpu);

7.8 Memory Profiling and Monitoring

Rust Memory Profiling Tools:

  • valgrind --tool=massif: Heap profiling
  • heaptrack: Heap memory profiler (Linux)
  • dhat: Dynamic heap analysis tool
  • tokio-console: Async runtime monitoring

Custom Memory Tracking:

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn get_memory_usage() -> usize {
    ALLOCATED.load(Ordering::SeqCst)
}

7.9 Memory Optimization Recommendations for ruvector-scipix

Priority Strategies:

  1. Quantize Models (INT8 for production)

    • 4x memory reduction
    • Minimal accuracy impact for OCR
    • Use ONNX Runtime quantization tools
  2. Implement Tensor Pooling

    • Reuse buffers for repeated inferences
    • Align with ruvector-core's memory management patterns
    • Use parking_lot for efficient synchronization
  3. Lazy Load Language Models

    • Only load recognition models for requested languages
    • Use OnceCell for thread-safe initialization
    • Share models across threads
  4. Batch Processing

    • Group multiple images into single inference call
    • Amortize overhead, improve GPU utilization
    • Integrate with ruvector's parallel processing
  5. GPU Memory Awareness

    • Monitor GPU memory usage
    • Implement fallback to CPU if GPU OOM
    • Use smaller batch sizes on memory-constrained devices
  6. Profile Real Workloads

    • Measure memory with actual ruvector data
    • Identify bottlenecks (model weights vs activations)
    • Optimize based on data

8. Technology Stack Recommendations

8.1 Primary Stack (Production Deployment)

Inference Engine: ort (ONNX Runtime)

  • Version: 2.0.0-rc or latest stable
  • Features: cuda, tensorrt, half, load-dynamic
  • Rationale:
    • Best-in-class performance (73% latency reduction)
    • Extensive GPU support (CUDA, TensorRT, OpenVINO)
    • Production-proven (Twitter, Google, SurrealDB)
    • Largest ONNX model ecosystem

OCR Models: PaddleOCR v5 (ONNX format)

  • Detection: ch_PP-OCRv5_mobile_det.onnx
  • Recognition: ch_PP-OCRv5_mobile_rec.onnx
  • Rationale:
    • State-of-the-art accuracy
    • Optimized for speed (5x faster in ONNX)
    • Multi-language support (80+ languages)
    • Active development (2025 updates)

Image Processing: image + imageproc

  • Version: Latest stable
  • Rationale:
    • Comprehensive format support
    • CPU parallelism via rayon (already in workspace)
    • Mature, well-tested
    • Pure Rust (no C++ dependencies)

Dependencies Integration:

[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }

# Image processing
image = "0.25"
imageproc = "0.25"

# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }

8.2 Alternative Stack (WASM/Browser Deployment)

Inference Engine: candle with WGPU backend

  • Version: Latest stable from Hugging Face
  • Features: wasm, webgpu
  • Rationale:
    • Smallest WASM bundle size
    • Native WebGPU support
    • Fast startup times
    • Pure Rust

OCR Models: TrOCR (via candle-onnx) or lightweight PaddleOCR

  • Smaller models for browser constraints
  • Quantized INT8 versions

WASM-Specific Stack:

[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }

8.3 Fallback Stack (Pure Rust/No External Dependencies)

Inference Engine: tract

  • Use Case: When ONNX Runtime binaries unavailable or pure Rust required
  • Rationale:
    • No C++ dependencies
    • Excellent WASM support
    • Mature (Sonos production use)
    • Passes 85% ONNX tests

Stack:

[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"

8.4 Architecture Design

┌─────────────────────────────────────────────────────────────┐
│                    ruvector-scipix                          │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────┐     ┌──────────────┐    ┌──────────────┐  │
│  │  Image Input │────▶│ Preprocessing│───▶│   Detection  │  │
│  │   (image)    │     │ (imageproc)  │    │  (ort/ONNX)  │  │
│  └──────────────┘     └──────────────┘    └──────┬───────┘  │
│                                                   │          │
│                                                   ▼          │
│                                          ┌──────────────┐    │
│                                          │  Text Boxes  │    │
│                                          └──────┬───────┘    │
│                                                 │            │
│                       ┌─────────────────────────┘            │
│                       │                                      │
│                       ▼                                      │
│              ┌──────────────┐      ┌──────────────┐          │
│              │ Recognition  │─────▶│  Post-Proc.  │          │
│              │  (ort/ONNX)  │      │   (decode)   │          │
│              └──────────────┘      └──────┬───────┘          │
│                                           │                  │
│                                           ▼                  │
│                                  ┌──────────────┐            │
│                                  │ Vector Store │            │
│                                  │ (ruvector-   │            │
│                                  │    core)     │            │
│                                  └──────────────┘            │
│                                                               │
└─────────────────────────────────────────────────────────────┘

GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
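
The stages in the diagram map onto a single orchestration call. A rough sketch, where all type and method names (OcrEngine, VectorStore, embed_text) are illustrative rather than a final API:

use image::DynamicImage;

// One pass through the pipeline: preprocess -> detect -> recognize -> store.
fn run_pipeline(
    img: &DynamicImage,
    engine: &OcrEngine,
    store: &mut VectorStore,
) -> Result<uuid::Uuid> {
    let pre = engine.preprocess(img)?;              // image + imageproc
    let boxes = engine.detect(&pre)?;               // detection model via ort
    let text = engine.recognize(&pre, &boxes)?;     // recognition model via ort
    let embedding = embed_text(&text)?;             // embedding for vector search
    let id = store.insert(Vector::new(embedding))?; // persist in ruvector-core
    Ok(id)
}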

8.5 Module Structure

examples/scipix/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Public API
│   ├── engine.rs           # OCR engine orchestration
│   ├── detection.rs        # Text detection (ONNX)
│   ├── recognition.rs      # Text recognition (ONNX)
│   ├── preprocessing.rs    # Image preprocessing (imageproc)
│   ├── postprocessing.rs   # Result decoding and formatting
│   ├── models.rs           # Model loading and management
│   └── config.rs           # Configuration
├── models/                 # ONNX model files (gitignored)
│   ├── detection.onnx
│   ├── recognition.onnx
│   └── dict.txt
├── tests/
│   ├── integration_test.rs
│   └── benchmark.rs
└── docs/
    ├── 01_REQUIREMENTS.md
    ├── 02_ARCHITECTURE.md
    └── 03_RUST_ECOSYSTEM.md  # This document
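
The corresponding src/lib.rs would do little more than declare the modules and re-export the entry points; a sketch whose names simply follow the tree above:

// src/lib.rs
pub mod config;
pub mod detection;
pub mod engine;
pub mod models;
pub mod postprocessing;
pub mod preprocessing;
pub mod recognition;

pub use config::OcrConfig;
pub use engine::OcrEngine;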

8.6 Performance Targets

Based on PaddleOCR benchmarks and Rust optimizations:

| Metric | Target | Hardware |
|---|---|---|
| Detection Latency | <50ms | NVIDIA T4 (TensorRT) |
| Recognition Latency | <20ms | NVIDIA T4 (TensorRT) |
| End-to-End (single image) | <100ms | NVIDIA T4 |
| Throughput (batched) | >100 images/sec | NVIDIA T4 |
| CPU Latency | <500ms | Modern multi-core CPU |
| WASM Latency | <1s | Browser (WebGPU) |
| Memory Usage | <500MB | With INT8 quantization |

8.7 Development Phases

Phase 1: Core Implementation (ort + PaddleOCR)

  • Implement detection and recognition pipelines
  • Integrate with ruvector-core storage
  • CPU-only inference initially
  • Basic preprocessing (resize, normalize)

Phase 2: GPU Acceleration

  • Add CUDA/TensorRT support
  • Benchmark and optimize performance
  • Implement batching for throughput
  • Memory pooling and reuse

Phase 3: Production Hardening

  • Model quantization (INT8)
  • Error handling and fallbacks
  • Metrics and monitoring
  • Load testing

Phase 4: WASM Support (Optional)

  • Port to candle or tract
  • Browser deployment
  • WebGPU acceleration
  • Client-side OCR

8.8 Testing Strategy

Unit Tests:

  • Image preprocessing correctness
  • Model loading and initialization
  • Tensor shape validation
  • Output decoding accuracy

Integration Tests:

#[test]
fn test_end_to_end_ocr() {
    let engine = OcrEngine::new(Config::default()).unwrap();
    let img = image::open("tests/fixtures/sample.jpg").unwrap();
    let result = engine.recognize_text(&img).unwrap();
    assert!(result.contains("expected text"));
}

Benchmarks (using Criterion):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_detection(c: &mut Criterion) {
    let engine = setup_engine();
    let img = load_test_image();

    c.bench_function("detection", |b| {
        b.iter(|| engine.detect(black_box(&img)))
    });
}

criterion_group!(benches, benchmark_detection);
criterion_main!(benches);

Performance Tests:

  • Latency under various image sizes
  • Throughput with batching
  • Memory usage over time
  • GPU utilization

9. Integration with ruvector-core Dependencies

9.1 Shared Workspace Dependencies

The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.

Already Available (from workspace):

| Dependency | ruvector Use | scipix Use |
|---|---|---|
| rayon | Parallel distance computation | Batch image preprocessing, parallel OCR |
| ndarray | Vector operations | Tensor manipulation, image arrays |
| parking_lot | Lock-free data structures | Model pool synchronization |
| dashmap | Concurrent hash maps | Model cache, result cache |
| tokio | Async runtime | Async inference, streaming |
| serde / serde_json | Serialization | Config, results serialization |
| thiserror / anyhow | Error handling | OCR error types |
| tracing | Logging | Inference timing, debugging |
| uuid | Unique identifiers | Request tracking |
| chrono | Timestamps | Inference metrics |

Benefits:

  • Minimal new dependencies: Only add OCR-specific crates
  • Consistent patterns: Same error handling, logging, async across codebase
  • Binary size: Shared dependencies not duplicated
  • Maintenance: Updates to workspace deps benefit all crates

9.2 Parallel Processing Integration

Leverage rayon for Batch OCR:

use rayon::prelude::*;

fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Result<Vec<OcrResult>> {
    images.par_iter()
        .map(|img| engine.recognize_text(img))
        .collect() // rayon collects an iterator of Results into Result<Vec<_>>
}

Consistency: Matches ruvector-core's parallel distance computation pattern

9.3 Storage Integration

Store OCR Results in ruvector-core:

use ruvector_core::{VectorStore, Vector};

struct OcrResult {
    text: String,
    embedding: Vec<f32>,  // From embedding model
    bounding_boxes: Vec<BoundingBox>,
}

impl OcrResult {
    fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<uuid::Uuid> {
        let vector = Vector::new(self.embedding.clone());
        let id = store.insert(vector)?;

        // Store metadata
        store.set_metadata(id, "text", &self.text)?;
        store.set_metadata(id, "boxes", &self.bounding_boxes)?;

        Ok(id)
    }
}

Vector Search for OCR Results:

// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;

9.4 WASM Compatibility

ruvector-core WASM Patterns:

  • memory-only feature for WASM targets
  • wasm-bindgen for browser interop
  • getrandom with wasm_js feature

Apply to scipix:

[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }

[features]
default = ["ort-backend"]
ort-backend = ["ort"]
candle-backend = ["candle-core", "candle-onnx"]
wasm = ["candle-backend"]  # WASM uses candle

9.5 Error Handling Patterns

Consistent with ruvector-core:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum OcrError {
    #[error("Model loading failed: {0}")]
    ModelLoadError(String),

    #[error("Inference failed: {0}")]
    InferenceError(String),

    #[error("Image preprocessing failed: {0}")]
    PreprocessingError(#[from] image::ImageError),

    #[error("ONNX Runtime error: {0}")]
    OrtError(#[from] ort::Error),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, OcrError>;

9.6 Configuration Pattern

Similar to ruvector-core config:

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
    /// Path to detection model
    pub detection_model_path: String,

    /// Path to recognition model
    pub recognition_model_path: String,

    /// Use GPU acceleration if available
    pub use_gpu: bool,

    /// Batch size for parallel processing
    pub batch_size: usize,

    /// Detection confidence threshold
    pub detection_threshold: f32,

    /// Number of inference threads
    pub num_threads: usize,
}

impl Default for OcrConfig {
    fn default() -> Self {
        Self {
            detection_model_path: "models/detection.onnx".into(),
            recognition_model_path: "models/recognition.onnx".into(),
            use_gpu: true,
            batch_size: 8,
            detection_threshold: 0.7,
            num_threads: rayon::current_num_threads(),
        }
    }
}

9.7 Async Integration

Use tokio for async OCR:

use std::sync::Arc;
use futures::{Stream, StreamExt};
use tokio::task;

pub struct AsyncOcrEngine {
    engine: Arc<OcrEngine>,
}

impl AsyncOcrEngine {
    pub async fn recognize_text(&self, image: DynamicImage) -> Result<OcrResult> {
        let engine = Arc::clone(&self.engine);

        // Run blocking OCR in tokio's blocking threadpool
        task::spawn_blocking(move || engine.recognize_text_sync(&image))
            .await
            .map_err(|e| OcrError::InferenceError(e.to_string()))?
    }

    pub fn process_stream<'a>(
        &'a self,
        images: impl Stream<Item = DynamicImage> + 'a,
    ) -> impl Stream<Item = Result<OcrResult>> + 'a {
        images.then(move |img| {
            let engine = Arc::clone(&self.engine);
            async move {
                // Same pattern: hop to the blocking pool for each image
                task::spawn_blocking(move || engine.recognize_text_sync(&img))
                    .await
                    .map_err(|e| OcrError::InferenceError(e.to_string()))?
            }
        })
    }
}

9.8 Metrics Integration

Use existing tracing infrastructure:

use tracing::{info, debug, instrument};

#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
    let start = std::time::Instant::now();

    debug!("Starting OCR for image {}x{}", image.width(), image.height());

    let preprocessed = self.preprocess(image)?;
    debug!("Preprocessing took {:?}", start.elapsed());

    let boxes = self.detect(&preprocessed)?;
    debug!("Detection found {} boxes in {:?}", boxes.len(), start.elapsed());

    let text = self.recognize(&preprocessed, &boxes)?;

    info!(
        "OCR completed in {:?}, extracted {} characters",
        start.elapsed(),
        text.len()
    );

    Ok(OcrResult { text, boxes })
}

9.9 Testing Infrastructure Reuse

Use workspace test dependencies:

[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"

Property-Based Testing (like ruvector-core):

use proptest::prelude::*;

proptest! {
    #[test]
    fn test_preprocessing_preserves_aspect_ratio(
        width in 100u32..2000u32,
        height in 100u32..2000u32
    ) {
        let img = DynamicImage::new_rgb8(width, height);
        let processed = preprocess_image(&img).unwrap();

        let original_ratio = width as f32 / height as f32;
        let processed_ratio = processed.width() as f32 / processed.height() as f32;

        prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
    }
}

9.10 Dependency Summary for scipix

New Dependencies Required:

[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"

# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }

# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }

Total new dependencies: 3 (ort, image, imageproc). Reused workspace dependencies: 12.


10. License Compatibility

10.1 ruvector Project License

Current License: MIT (from workspace Cargo.toml)

Requirement: All dependencies must be MIT-compatible for redistribution.

10.2 Dependency License Compatibility

| Crate | License | Compatible? | Notes |
|---|---|---|---|
| ort | MIT OR Apache-2.0 | Yes | Dual-licensed, fully compatible |
| candle | MIT OR Apache-2.0 | Yes | Hugging Face, dual-licensed |
| tract | MIT OR Apache-2.0 | Yes | Dual-licensed (except ONNX protos) |
| image | MIT OR Apache-2.0 | Yes | Pure Rust, dual-licensed |
| imageproc | MIT | Yes | Permissive, MIT-only |
| ndarray | MIT OR Apache-2.0 | Yes | Already in workspace |
| rayon | MIT OR Apache-2.0 | Yes | Already in workspace |
| wasm-bindgen | MIT OR Apache-2.0 | Yes | Already in workspace |

Incompatible Libraries (Avoid):

| Crate | License | Issue |
|---|---|---|
| leptess | MIT (wrapper) | Depends on Tesseract (Apache-2.0 with restrictions) |
| opencv-rust | MIT (wrapper) | Depends on OpenCV (Apache-2.0, complex) |

10.3 ONNX Model Licenses

PaddleOCR models used in ONNX format have Apache-2.0 license.

Compatibility:

  • Apache-2.0 code can be used in MIT-licensed projects
  • ONNX models (weights) are typically considered data, not code
  • Distribution of pre-trained models is permitted
  • ⚠️ Derivative works of Apache-2.0 code require patent grant preservation

Best Practice:

  • Download PaddleOCR ONNX models from official sources
  • Include LICENSE file in models/ directory
  • Document model provenance in README
  • Do not modify Apache-2.0 code (use as-is via ONNX)

10.4 Rust Dual-Licensing Best Practices

Why Rust Uses MIT OR Apache-2.0:

  • MIT: Maximum permissiveness, minimal restrictions
  • Apache-2.0: Patent protection, better for corporate use
  • Dual License: Users choose which applies to them

For ruvector-scipix:

Option 1: Keep MIT-only (Current)

  • Simplest licensing
  • Maximum compatibility
  • Minimal legal overhead
  • All dependencies are MIT-compatible

Option 2: Adopt Dual MIT/Apache-2.0

  • Better patent protection
  • Aligns with Rust ecosystem norms
  • More attractive to enterprise users
  • ⚠️ Slightly more complex

Recommendation: Keep MIT-only for simplicity, unless patent concerns arise.

10.5 License Compliance Checklist

For Production Deployment:

  • Verify all direct dependencies are MIT or MIT/Apache-2.0
  • Check transitive dependencies for license conflicts
  • Include LICENSE file in repository
  • Document third-party licenses in NOTICE file
  • Include PaddleOCR model license in models/LICENSE
  • Add copyright headers to source files (optional for MIT)
  • Review ONNX Runtime's license (MIT, but check binary distribution terms)
  • Ensure no GPL/LGPL dependencies (incompatible with MIT)

Automated License Checking:

# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features

# Fail build on incompatible licenses
cargo deny check licenses

deny.toml Configuration:

[licenses]
unlicensed = "deny"
allow = [
    "MIT",
    "Apache-2.0",
    "Apache-2.0 WITH LLVM-exception",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "ISC",
    "Unicode-DFS-2016",
]
deny = [
    "GPL-2.0",
    "GPL-3.0",
    "AGPL-3.0",
]

10.6 Attribution Requirements

MIT License Requirements:

  • Include copyright notice
  • Include permission notice (LICENSE file)
  • No obligation to disclose source code modifications

For PaddleOCR Models (Apache-2.0):

  • Include NOTICE file if provided
  • Preserve copyright and patent notices
  • Document significant modifications (if any)

Recommended NOTICE File:

ruvector-scipix
Copyright 2025 Ruvector Team

This software includes components from:

1. ONNX Runtime
   Copyright Microsoft Corporation
   Licensed under MIT License

2. PaddleOCR Models
   Copyright PaddlePaddle Authors
   Licensed under Apache License 2.0
   Model files located in models/ directory

3. Candle ML Framework
   Copyright Hugging Face, Inc.
   Licensed under MIT OR Apache-2.0

Complete license texts available in the LICENSE and models/LICENSE files.

10.7 License Compatibility Summary

SAFE TO USE (Recommended Stack):

  • ort - MIT/Apache-2.0
  • image - MIT/Apache-2.0
  • imageproc - MIT
  • candle - MIT/Apache-2.0
  • tract - MIT/Apache-2.0
  • PaddleOCR ONNX models - Apache-2.0 (data)

⚠️ USE WITH CAUTION:

  • leptess - Requires Tesseract C++ library (complex licensing)
  • opencv-rust - Requires OpenCV (large dependency, Apache-2.0)

AVOID:

  • Any GPL/LGPL libraries (incompatible with MIT for proprietary use)
  • Proprietary OCR engines (licensing fees, redistribution restrictions)

Final Recommendation: The proposed stack (ort + PaddleOCR + image/imageproc) is fully compatible with ruvector's MIT license and follows Rust ecosystem best practices.


11. Final Recommendations

11.1 Optimal Technology Stack

Primary Recommendation (Production):

[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }

# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"

# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx

# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }

# Integration
ruvector-core = { path = "../../crates/ruvector-core" }

Rationale:

  1. Performance: ort provides 73% latency reduction vs alternatives
  2. Ecosystem: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
  3. GPU Support: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
  4. Production Ready: Used by Twitter, Google, SurrealDB
  5. License: MIT/Apache-2.0 dual-license (fully compatible)
  6. Maintenance: Active development, Microsoft backing

11.2 Alternative Stacks by Use Case

WASM/Browser Deployment:

candle-core = { version = "0.8", features = ["wasm", "webgpu"] }
candle-onnx = "0.8"
  • Smallest bundle size (~180KB Brotli)
  • WebGPU acceleration
  • Fast startup (120ms first token)

Pure Rust / No External Deps:

tract-onnx = "0.22"
  • No C++ dependencies
  • Excellent for embedded/restrictive environments
  • 85% ONNX compatibility

Edge Devices / Raspberry Pi:

tract-onnx = { version = "0.22", features = ["pulse"] }
  • Optimized for CPU inference
  • Minimal memory footprint
  • Proven on RPi (11μs for CNN models)

11.3 Implementation Roadmap

Week 1-2: Core Infrastructure

  • Set up examples/scipix crate structure
  • Integrate ort and image/imageproc
  • Implement model loading (detection + recognition)
  • Basic end-to-end pipeline (CPU-only)

Week 3-4: GPU Acceleration

  • Enable CUDA/TensorRT support
  • Implement batching for throughput
  • Benchmark performance vs targets
  • Memory pooling and optimization

Week 5-6: Production Hardening

  • Model quantization (INT8)
  • Error handling and recovery
  • Metrics and monitoring (tracing)
  • Integration tests and benchmarks

Week 7-8: ruvector Integration

  • Store OCR results in ruvector-core
  • Implement vector search for documents
  • Async API with tokio
  • Documentation and examples

Optional (Week 9-10): WASM Support

  • Port to candle for browser deployment
  • WebGPU acceleration
  • Client-side OCR demo

11.4 Key Metrics to Track

Performance:

  • Detection latency: Target <50ms (GPU), <200ms (CPU)
  • Recognition latency: Target <20ms (GPU), <100ms (CPU)
  • End-to-end: Target <100ms (GPU), <500ms (CPU)
  • Throughput: Target >100 images/sec (batched, GPU)

Memory:

  • Model size: ~15-30MB (FP32), ~5-10MB (INT8)
  • Runtime memory: Target <500MB
  • GPU memory: Monitor for OOM

Accuracy:

  • Character accuracy: Target >95% (clean text)
  • Word accuracy: Target >90%
  • Benchmark against Tesseract and commercial APIs

11.5 Risk Mitigation

Model Availability:

  • PaddleOCR models freely available
  • Multiple model versions for fallback
  • ⚠️ Verify ONNX export quality (may need custom conversion)

Dependency Stability:

  • ort actively maintained (2.0 rc, stable release expected)
  • image/imageproc mature, widely used
  • ⚠️ Monitor for breaking changes during updates

Performance Variability:

  • ⚠️ GPU performance depends on driver versions
  • ⚠️ WASM performance varies by browser
  • Comprehensive benchmarking before production

License Compliance:

  • All recommended dependencies MIT-compatible
  • PaddleOCR Apache-2.0 (compatible for use)
  • ⚠️ Review licenses before adding new dependencies

11.6 Success Criteria

The ruvector-scipix implementation is successful if:

  1. Performance: Meets or exceeds latency/throughput targets
  2. Accuracy: Character accuracy >95% on clean text
  3. Integration: Seamlessly stores results in ruvector-core
  4. Portability: Runs on Linux/macOS/Windows, CPU and GPU
  5. Memory: Operates within <500MB budget
  6. License: Maintains MIT compatibility
  7. Maintainability: Uses idiomatic Rust, well-documented
  8. Scalability: Handles batch processing efficiently

11.7 Next Steps

  1. Review this document with ruvector team for alignment
  2. Download PaddleOCR models (detection + recognition ONNX)
  3. Set up examples/scipix crate with recommended dependencies
  4. Implement basic OCR pipeline (end-to-end proof of concept)
  5. Benchmark initial implementation against targets
  6. Iterate and optimize based on real-world data
  7. Document API and usage examples
  8. Integrate with ruvector-core for vector storage



Document Version: 1.0
Last Updated: 2025-11-28
Author: Research and Analysis Agent
Status: Complete