Rust OCR and ML Ecosystem Analysis for ruvector-scipix
Executive Summary
This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers six OCR libraries and four ML inference engines, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.
Key Finding: The optimal stack for ruvector-scipix combines ort (ONNX Runtime bindings) for inference, image/imageproc for preprocessing, with optional pure Rust alternatives (tract, candle) for WASM targets.
1. Library Comparison Matrix
OCR Libraries
| Library | Type | Model Support | WASM Support | GPU Support | Maturity | Performance | Dependencies |
|---|---|---|---|---|---|---|---|
| ocrs | Native Rust | ONNX (RTen engine) | ✅ Yes | ❌ No | 🟡 Preview | Medium | Minimal (Pure Rust) |
| oar-ocr | ONNX Wrapper | PaddleOCR ONNX | ✅ Yes | ✅ CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| kalosm-ocr | Pure Rust | TrOCR (candle) | ✅ Yes | ✅ WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| leptess | FFI Bindings | Tesseract C++ | ❌ No | ❌ No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| paddle-ocr-rs | ONNX Wrapper | PaddleOCR v4/v5 | ✅ Yes | ✅ CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| pure-onnx-ocr | Pure ONNX | PaddleOCR DBNet+SVTR | ✅ Yes | ✅ Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |
ML Inference Engines
| Library | Purpose | Model Format | WASM Support | GPU Support | Performance | Maturity |
|---|---|---|---|---|---|---|
| ort | ONNX Runtime | ONNX | ✅ Yes | ✅ CUDA/TensorRT/OpenVINO | Very High | 🟢 Production |
| candle | ML Framework | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| tract | ONNX/TF Inference | ONNX, NNEF, TF | ✅ Yes | ❌ Limited | High (CPU) | 🟢 Mature (Sonos) |
| burn | Deep Learning | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | Very High | 🟢 Active |
Legend: 🟢 Production-ready | 🟡 Active development | 🔴 Experimental
Performance Benchmarks
Based on research findings:
- ort + PaddleOCR: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
- ONNX conversion: Up to 5x faster than PaddlePaddle native inference
- tract: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
- Tesseract (leptess): Baseline CPU performance, requires preprocessing
- ocrs: Early preview, moderate performance on clear text
2. ONNX Runtime Integration Options
2.1 The ort Crate (Recommended)
Overview: ort by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.
Key Features:
- Hardware Acceleration: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
- Dynamic Loading: Runtime linking for flexibility (`load-dynamic` feature)
- Alternative Backends: Support for tract and candle backends
- Minimal Builds: RTTI-free, optimized binary sizes for production
- Float16/BFloat16: Via the `half` crate integration
- Production Proven: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB
Cargo Features:
[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",         # Float16/BFloat16 support
    "load-dynamic", # Runtime dynamic linking
    "cuda",         # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",     # TensorRT optimization (requires TensorRT 8.4+)
] }
Performance Characteristics:
- Significantly faster than PyTorch for inference
- Supports model quantization (int8, float16)
- Multi-GPU distribution via NCCL
- Optimal for batch processing and real-time inference
Integration Example:
use ort::{GraphOptimizationLevel, Session, Value};
// Load ONNX model
let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;
// Run inference (input construction varies slightly across ort 2.0 release candidates)
let input = Value::from_array(session.allocator(), &input_tensor)?;
let outputs = session.run(vec![input])?;
2.2 Alternative: tract Backend
Use Case: When ONNX Runtime binaries are problematic or WASM target required
Advantages:
- Pure Rust implementation
- No external C++ dependencies
- Excellent WASM support
- Passes 85% of ONNX backend tests
- Lightweight and maintainable
Limitations:
- No tensor sequences or optional tensors
- Limited GPU support compared to ort
- TensorFlow 2 support via ONNX conversion only
2.3 Alternative: candle Backend
Use Case: When integrating with Hugging Face ecosystem or needing pure Rust
Advantages:
- Minimalist design, fast compilation
- Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
- WASM + WebGPU acceleration
- Small binary size for serverless deployment
- CUDA, Metal, MKL, Accelerate backends
Limitations:
- Younger ecosystem than ONNX Runtime
- Fewer pre-optimized OCR models available
- Focus on inference over training
3. Pure Rust ML with Candle/Tract
3.1 Candle Framework (Hugging Face)
Architecture: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.
Supported Models:
- Language Models: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
- Vision Models: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
- Speech: Whisper ASR
Backend Support:
| Backend | Platform | Performance | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |
Design Philosophy:
- Remove Python from production workloads
- Minimize binary size (critical for edge/serverless)
- Fast startup times (first token ~120ms on M2 MacBook Air)
- Rust's safety guarantees for ML workloads
Example Usage:
use candle_core::{Device, Tensor};
use std::collections::HashMap;
// Load model
let model = candle_onnx::read_file("model.onnx")?;
let graph = model.graph.as_ref().unwrap();
// Create device (CUDA/Metal/CPU)
let device = Device::cuda_if_available(0)?;
// Run inference: candle-onnx evaluates the graph from named inputs
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let mut inputs = HashMap::new();
inputs.insert(graph.input[0].name.clone(), input);
let outputs = candle_onnx::simple_eval(&model, inputs)?;
3.2 Tract Framework (Sonos)
Architecture: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.
Key Capabilities:
- ONNX Support: 85% of ONNX backend tests passing
- Operator Set: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
- Proven Models: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
- Pulsing: Streaming inference for time-series models (e.g., WaveNet)
- Quantization: Built-in int8 quantization support
Performance Characteristics:
- Optimized for CPU inference
- Excellent for edge devices (Raspberry Pi, embedded systems)
- Minimal memory footprint
- No RTTI or runtime overhead
Example Usage:
use tract_onnx::prelude::*;
// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;
// Run inference
let input = tract_ndarray::arr4(&[[...]]).into_dyn();
let result = model.run(tvec![input.into()])?;
Quantization Support:
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .quantize()? // Illustrative: check the current tract API for its quantization entry point
    .into_optimized()?
    .into_runnable()?;
3.3 Comparison: Candle vs Tract vs ort
| Criterion | Candle | Tract | ort |
|---|---|---|---|
| Performance (GPU) | Very High | N/A | Very High |
| Performance (CPU) | High | Very High | Very High |
| Binary Size | Small | Very Small | Large |
| Startup Time | Fast | Very Fast | Medium |
| WASM Support | Excellent | Excellent | Good (with backends) |
| Model Ecosystem | Hugging Face | ONNX/TF | ONNX (largest) |
| GPU Backends | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| Quantization | Manual | Built-in | Excellent (ONNX tools) |
| Maturity | Stable (2024+) | Mature (2018+) | Production (Microsoft) |
Recommendation:
- ort: Primary choice for maximum performance and hardware acceleration
- candle: Secondary choice for WASM targets or Hugging Face integration
- tract: Fallback for pure Rust requirements or extreme size constraints
4. Image Processing in Rust
4.1 The image Crate (Foundation)
Purpose: Core image encoding/decoding and basic manipulation.
Supported Formats:
- JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF
Key Features:
use image::imageops::{self, FilterType};
// Load image
let img = image::open("input.jpg")?;
// Basic operations (in the imageops module)
let resized = img.resize(800, 600, FilterType::Lanczos3);
let grayscale = img.grayscale();
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);
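Beyond these operations, a decoded image must usually be converted from interleaved 8-bit RGB (HWC layout) into a normalized float NCHW buffer before it can be fed to an ONNX model. A dependency-free sketch of that conversion; the mean/std values in the usage example are the common ImageNet defaults, an assumption — substitute whatever normalization the target model expects:

```rust
/// Convert interleaved RGB8 pixels (HWC) into a normalized NCHW f32 buffer.
/// `mean`/`std` are per-channel; the correct values depend on the model.
fn hwc_to_nchw(pixels: &[u8], width: usize, height: usize, mean: [f32; 3], std: [f32; 3]) -> Vec<f32> {
    let plane = width * height;
    let mut out = vec![0.0f32; 3 * plane];
    for i in 0..plane {
        for c in 0..3 {
            // Scale to [0,1], then normalize; write into the channel-major plane
            let v = pixels[i * 3 + c] as f32 / 255.0;
            out[c * plane + i] = (v - mean[c]) / std[c];
        }
    }
    out
}
```

With mean 0.5 and std 0.5 per channel this maps pixel values into [-1, 1], which is a common convention for OCR recognition models.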
4.2 The imageproc Crate (Advanced Processing)
Purpose: Advanced image processing algorithms for computer vision.
Modules:
| Module | Capabilities |
|---|---|
| Contrast | Histogram equalization, adaptive thresholding, CLAHE |
| Corners | Harris, FAST, Shi-Tomasi corner detection |
| Distance Transform | Euclidean distance maps, morphological operations |
| Edges | Canny edge detection, Sobel/Scharr operators |
| Filter | Gaussian, median, bilateral filtering |
| Geometric | Rotation, affine, projective transformations |
| Morphology | Erosion, dilation, opening, closing |
| Drawing | Shapes, text, anti-aliased primitives |
| Contours | Border tracing, contour extraction |
Parallelism: CPU-based multithreading via rayon (not GPU acceleration)
OCR Preprocessing Example:
use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};
// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();
    // Denoise with Gaussian blur
    let blurred = gaussian_blur_f32(&gray, 1.0);
    // Adaptive thresholding for varying lighting
    let binary = adaptive_threshold(&blurred, 21);
    // Deskew if needed
    let angle = detect_skew(&binary); // Custom function (not shown)
    rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]))
}
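The `detect_skew` helper above is left as a custom function. One common implementation is a projection-profile search: for each candidate angle, shift dark pixels as if the page were rotated and pick the angle that maximizes the variance of the per-row pixel counts (straight text lines produce sharply peaked rows). A dependency-free sketch over a `Vec<Vec<bool>>` bitmap; the ±5° search range and 0.1° step are assumptions:

```rust
/// Score row alignment for a candidate angle: shear dark pixels vertically
/// by tan(angle) * x, then compute the variance of per-row pixel counts.
fn skew_score(rows: &[Vec<bool>], angle_deg: f64) -> f64 {
    let t = angle_deg.to_radians().tan();
    let h = rows.len() as isize;
    let mut counts = vec![0f64; rows.len()];
    for (y, row) in rows.iter().enumerate() {
        for (x, &dark) in row.iter().enumerate() {
            if dark {
                let ny = y as isize + (t * x as f64).round() as isize;
                if ny >= 0 && ny < h {
                    counts[ny as usize] += 1.0;
                }
            }
        }
    }
    let mean = counts.iter().sum::<f64>() / counts.len() as f64;
    counts.iter().map(|c| (c - mean).powi(2)).sum::<f64>() / counts.len() as f64
}

/// Search a small angle range for the deskew angle that best aligns text rows.
fn detect_skew(rows: &[Vec<bool>]) -> f64 {
    let mut best_angle = 0.0;
    let mut best_score = f64::MIN;
    for tenth in -50..=50 {
        let angle = tenth as f64 / 10.0; // -5.0° to +5.0° in 0.1° steps
        let score = skew_score(rows, angle);
        if score > best_score {
            best_score = score;
            best_angle = angle;
        }
    }
    best_angle
}
```

The vertical shear is a cheap small-angle approximation of rotation, which is why such searches are usually limited to a few degrees; larger skews need a true rotation.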
4.3 GPU Acceleration Options for Image Processing
Current State: imageproc does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:
Option 1: wgpu + Custom Compute Shaders
// GPU compute shader for image processing (wgpu sketch)
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl")),
});
Option 2: OpenCV-Rust Bindings (if CUDA needed)
- Provides GPU-accelerated operations via CUDA
- Requires OpenCV C++ installation
- Not pure Rust
Option 3: Integrate with ML Framework GPU Ops
- Use candle/ort tensor operations for preprocessing
- Leverage existing GPU context
- Keep preprocessing on same device as inference
Recommendation for ruvector-scipix:
- Use `image` + `imageproc` for CPU preprocessing (fast enough for most cases)
- For a GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
- Leverage rayon parallelism for batch processing
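Batch-level parallelism is the easiest win here, since each image's preprocessing is independent. A sketch of the pattern using std scoped threads to stay dependency-free (in the real pipeline, `rayon::par_iter` and the actual imageproc pipeline would replace the placeholder `preprocess` used below):

```rust
use std::thread;

/// Placeholder preprocessing step (inverts pixels); stands in for the
/// real grayscale/threshold/deskew pipeline.
fn preprocess(img: &[u8]) -> Vec<u8> {
    img.iter().map(|&p| 255 - p).collect()
}

/// Preprocess a batch of images in parallel across `workers` threads.
fn preprocess_batch(images: &[Vec<u8>], workers: usize) -> Vec<Vec<u8>> {
    if images.is_empty() {
        return Vec::new();
    }
    let chunk = images.len().div_ceil(workers.max(1));
    let mut results: Vec<Vec<u8>> = vec![Vec::new(); images.len()];
    thread::scope(|s| {
        // Each worker gets a disjoint slice of inputs and outputs
        for (in_chunk, out_chunk) in images.chunks(chunk).zip(results.chunks_mut(chunk)) {
            s.spawn(move || {
                for (img, slot) in in_chunk.iter().zip(out_chunk.iter_mut()) {
                    *slot = preprocess(img);
                }
            });
        }
    });
    results
}
```

Because the output slices are disjoint, no locking is needed; rayon's `par_iter().map()` expresses the same idea more concisely.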
5. GPU Acceleration Options
5.1 Cross-Platform GPU Support in 2025
The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established APIs.
Unified Backend: wgpu (WebGPU Standard)
- Targets: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
- Use Case: Portable GPU compute without vendor lock-in
- Frameworks: Burn, Candle (WGPU backend), kalosm
Performance Profile:
| Backend | Platform | Speedup vs CPU | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |
5.2 CUDA Support
Primary Library: cudarc (Low-level CUDA bindings)
Integration via ONNX Runtime:
[dependencies]
ort = { version = "2.0", features = ["cuda"] }
Requirements:
- CUDA Toolkit 11.6+ (for ort)
- NVIDIA GPU: Maxwell architecture (GTX 750/900 series) or newer
- Compute Capability 5.0+
Benefits:
- Industry-standard ML acceleration
- Mature ecosystem and tooling
- Extensive operator coverage
- Best-in-class performance for training and inference
5.3 Metal Support (Apple Silicon)
Framework Integration:
- Candle: Native Metal backend via the `metal` crate
- Burn: Metal support through the `burn-metal` backend
- ONNX Runtime: CoreML execution provider (Metal-accelerated)
Example (Candle):
use candle_core::Device;
let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
Performance: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips
5.4 WebGPU/WGPU
Purpose: Cross-platform GPU compute for WASM and native
Frameworks with WGPU Support:
- Burn: First-class WGPU backend
- Candle: WGPU support for browser deployment
- Kalosm: WGPU acceleration via Fusor (0.5 release)
Browser Deployment:
// WASM-compatible inference
#[cfg(target_arch = "wasm32")]
use candle_core::Device;
let device = Device::Cpu; // WebGPU device selection depends on candle's WGPU backend support
Benefits:
- Browser-based ML inference without server
- Works on AMD GPUs (unlike CUDA)
- Portable across desktop and web
- Future-proof standard (W3C specification)
Limitations:
- Lower performance than native CUDA/Metal
- Browser memory constraints (typically 2-8GB)
- First token latency: ~120ms (acceptable for many use cases)
5.5 TensorRT (NVIDIA Optimization)
Purpose: Optimized ONNX model execution on NVIDIA GPUs
Requirements:
- NVIDIA GPU: GeForce 9xx series or newer
- TensorRT 8.4+
- CUDA 11.6+
Integration:
ort = { version = "2.0", features = ["cuda", "tensorrt"] }
Benefits:
- Automatic kernel fusion and layer optimization
- Mixed precision (FP32/FP16/INT8)
- Up to 2-5x faster than standard CUDA
- Optimal for high-throughput production deployment
5.6 OpenVINO (Intel)
Target: Intel CPUs (6th gen+) and Intel integrated GPUs
Use Case:
- Intel-based servers without discrete GPU
- Edge devices with Intel processors
- Cost-effective acceleration without NVIDIA hardware
Integration:
ort = { version = "2.0", features = ["openvino"] }
Performance: 5-20x CPU speedup depending on model and hardware
5.7 GPU Acceleration Recommendation for ruvector-scipix
Tiered Approach:
1. Primary (Production): `ort` with CUDA/TensorRT
   - Maximum performance for server deployment
   - Best operator coverage for PaddleOCR models
   - Production-proven reliability
2. Secondary (Apple Ecosystem): `candle` with Metal
   - Native Apple Silicon support
   - Good for macOS/iOS deployment
   - Smaller binary size than ONNX Runtime
3. Tertiary (WASM/Browser): `candle` or `tract` with WGPU
   - Client-side OCR in the browser
   - Privacy-preserving (no server upload)
   - Acceptable performance for interactive use
4. Fallback (CPU-only): `tract` or `ort` with optimized CPU execution
   - MKL/OpenBLAS acceleration
   - Rayon parallelism
   - Still faster than Python alternatives
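The tiered fallback above can be expressed as a small capability-driven selection function. A sketch (the type and field names are illustrative, not an actual ruvector-scipix API):

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    TensorRt, // Tier 1: NVIDIA GPU + TensorRT
    Cuda,     // Tier 1: NVIDIA GPU without TensorRT
    Metal,    // Tier 2: Apple Silicon
    Wgpu,     // Tier 3: WebGPU (browser/cross-platform)
    Cpu,      // Tier 4: always available
}

/// Detected (or configured) hardware capabilities.
struct Capabilities {
    nvidia_gpu: bool,
    tensorrt_available: bool,
    apple_silicon: bool,
    webgpu: bool,
}

/// Pick the highest-priority backend the host supports.
fn select_backend(caps: &Capabilities) -> Backend {
    if caps.nvidia_gpu && caps.tensorrt_available {
        Backend::TensorRt
    } else if caps.nvidia_gpu {
        Backend::Cuda
    } else if caps.apple_silicon {
        Backend::Metal
    } else if caps.webgpu {
        Backend::Wgpu
    } else {
        Backend::Cpu
    }
}
```

Keeping this decision in one place makes the CPU fallback explicit and testable, rather than scattering `cfg` checks across the codebase.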
6. WebAssembly Compilation Considerations
6.1 WASM for ML: Current State (2025)
Key Finding: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.
Performance Characteristics:
- Rust compiles to WASM faster than C++
- Rust produces smaller binaries than C++ WASM
- Memory efficiency: Rust's ownership model translates well to WASM linear memory
- Consistent performance across browsers
6.2 Memory Constraints and Optimization
Browser Memory Limits:
- Typical: 2-4GB per tab (Chrome/Firefox)
- Maximum: 4-8GB (varies by browser/OS)
- Critical Issue: Running multiple models can exhaust memory quickly
Memory Optimization Strategies:
1. Model Quantization
// INT8 quantization reduces memory by 4x; FP16 by 2x.
// Illustrative API sketch: quantization is usually performed offline with ONNX tooling
let quantized_model = model.quantize(QuantizationType::QInt8)?;
2. Memory Reuse
// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
input_buffer: Vec<f32>,
output_buffer: Vec<f32>,
}
impl InferenceContext {
fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
self.input_buffer.copy_from_slice(data);
model.run(&self.input_buffer, &mut self.output_buffer)?;
Ok(&self.output_buffer)
}
}
3. Lazy Loading with Streaming Compile
// Use WebAssembly.instantiateStreaming for faster startup;
// load modules on demand, not at initialization (sketch via js-sys/web-sys)
use js_sys::WebAssembly;
use wasm_bindgen::JsValue;
use wasm_bindgen_futures::JsFuture;
async fn load_model_lazy(url: &str) -> Result<JsValue, JsValue> {
    let window = web_sys::window().expect("no window");
    let fetch_promise = window.fetch_with_str(url);
    JsFuture::from(WebAssembly::instantiate_streaming(&fetch_promise)).await
}
4. wasm-opt Optimization
# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm
5. Model Cleanup
// Explicit cleanup when switching models
impl Drop for ModelContext {
fn drop(&mut self) {
// Free GPU resources
self.gpu_buffers.clear();
// Trigger garbage collection hint (if available)
}
}
6.3 Bundle Size Considerations
Challenge: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.
Mitigation Strategies:
1. Code Splitting
// Load OCR functionality separately from main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
// Lazy-load OCR model
let model = load_model("ocr.onnx").await?;
Ok(OcrEngine::new(model))
}
2. Minimal Features
[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }
3. Compression
# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br
# Gzip fallback
gzip -9 output.wasm
4. Tree Shaking
[profile.release]
opt-level = "z" # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true
Expected Sizes:
| Configuration | Uncompressed | Brotli | Gzip |
|---|---|---|---|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |
6.4 WASM-Specific Limitations
1. Threading Constraints
- SharedArrayBuffer required for multi-threading
- COEP/COOP headers needed for isolation
- Not all browsers support WASM threads
2. SIMD Support
- WASM SIMD enabled by default in modern browsers
- Significant performance boost for ML operations
- Check browser compatibility with `wasm-feature-detect`
3. No Direct File System Access
- Use IndexedDB or Cache API for model storage
- Stream models from network (HTTP/2)
- Consider embedding small models in binary
4. GPU Access
- WebGPU required for GPU acceleration
- Not universally supported (as of 2025, Chrome/Edge primarily)
- Fallback to CPU inference needed
6.5 Recommended WASM Frameworks for ruvector-scipix
Primary: candle with WGPU
- Smallest binary size
- Native WASM support
- WebGPU acceleration when available
- Hugging Face ecosystem
Secondary: tract
- Pure Rust, no C++ dependencies
- Excellent WASM support
- Proven in production (Sonos)
- CPU-optimized
Alternative: ort with WASM backend
- Full ONNX operator support
- Can use tract or candle as backend
- Larger bundle size
Example WASM Integration (sketch; exact candle-onnx loading and evaluation APIs vary by version):
use wasm_bindgen::prelude::*;
use candle_core::{Device, Tensor};
#[wasm_bindgen]
pub struct OcrEngine {
model: candle_onnx::Model,
device: Device,
}
#[wasm_bindgen]
impl OcrEngine {
#[wasm_bindgen(constructor)]
pub async fn new() -> Result<OcrEngine, JsValue> {
// Use WebGPU if available, fallback to CPU
let device = Device::Cpu; // Or Device::new_wgpu(0)?
// Load model from URL
let model_bytes = fetch_model("model.onnx").await?;
let model = candle_onnx::read(&model_bytes)
.map_err(|e| JsValue::from_str(&e.to_string()))?;
Ok(OcrEngine { model, device })
}
pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
// Preprocess image
let tensor = preprocess_image(image_data, &self.device)?;
// Run inference
let output = self.model.forward(&[tensor])
.map_err(|e| JsValue::from_str(&e.to_string()))?;
// Decode output
let text = decode_predictions(output)?;
Ok(text)
}
}
6.6 WASM Deployment Checklist
- Enable WASM SIMD in the build (`RUSTFLAGS='-C target-feature=+simd128'`)
- Optimize bundle size (`opt-level = "z"`, LTO, strip)
- Implement lazy loading for models
- Set up proper CORS headers for model fetching
- Add WebGPU feature detection with CPU fallback
- Configure Brotli/Gzip compression on CDN
- Test memory usage across browsers (especially mobile)
- Implement model cleanup on tab close
- Add loading indicators for async model initialization
- Consider service worker for model caching
7. Memory Management for Large Models
7.1 Memory Challenges in ML Inference
Typical OCR Model Sizes:
- PaddleOCR Detection: 3-10MB (FP32)
- PaddleOCR Recognition: 5-15MB (FP32)
- TrOCR: 50-300MB (depending on variant)
- Tesseract trained data: 10-50MB per language
Memory Consumption Beyond Model Weights:
- Input tensors: Image size × channels × precision
- Intermediate activations: Varies by architecture (can exceed model size)
- Output buffers: Sequence length × vocab size
- KV cache (for transformers): Context length × hidden size × layers
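These contributions can be estimated up front to size buffers and choose batch limits before loading anything. A rough calculator; all parameter values in the usage example are placeholders, not measurements of a specific model:

```rust
/// Rough per-inference memory estimate, in bytes.
struct MemoryEstimate {
    weights: usize,
    input: usize,
    kv_cache: usize,
}

/// Estimate the dominant memory terms listed above (activations vary too much
/// by architecture to estimate generically, so they are omitted here).
fn estimate_memory(
    param_count: usize, bytes_per_param: usize,        // model weights
    batch: usize, channels: usize, h: usize, w: usize, // f32 input tensor
    ctx_len: usize, hidden: usize, layers: usize,      // f32 KV cache (K and V)
) -> MemoryEstimate {
    MemoryEstimate {
        weights: param_count * bytes_per_param,
        input: batch * channels * h * w * 4,
        kv_cache: 2 * ctx_len * hidden * layers * 4,
    }
}
```

For example, a 1M-parameter INT8 model with a 3x48x320 input needs about 1 MB of weights plus ~180 KB per input image, which is why quantization (the `bytes_per_param` term) dominates the savings.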
7.2 Quantization Strategies
INT8 Quantization (4x memory reduction)
// ONNX quantization is typically performed offline (e.g. with onnxruntime's
// Python quantization tooling, which exposes per-channel and reduce-range
// options); the resulting INT8 model is then loaded from Rust like any other:
use ort::Session;
let session = Session::builder()?
    .commit_from_file("model_int8.onnx")?;
Benefits:
- 75% memory reduction (FP32 → INT8)
- Minimal accuracy loss (typically <1% for OCR)
- Faster inference on integer-optimized hardware
- Reduced cache pressure
FP16 Quantization (2x memory reduction)
// Using ort with the half crate (FP16 tensor construction varies by ort version)
use half::f16;
let input_f16: Vec<f16> = input_f32.iter().map(|&x| f16::from_f32(x)).collect();
Benefits:
- Better accuracy preservation than INT8
- Native support on modern GPUs (Tensor Cores)
- Still significant memory savings
Dynamic Quantization (Runtime)
// tract can fold quantization into model optimization
// (illustrative; check the current tract API for the exact entry point)
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .quantize()? // Automatic quantization
    .into_optimized()?
    .into_runnable()?;
7.3 Memory Pooling and Reuse
Tensor Buffer Reuse:
use std::sync::Arc;
use parking_lot::Mutex;
struct TensorPool {
buffers: Vec<Arc<Mutex<Vec<f32>>>>,
size: usize,
}
impl TensorPool {
fn new(pool_size: usize, buffer_size: usize) -> Self {
let buffers = (0..pool_size)
.map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
.collect();
TensorPool { buffers, size: pool_size }
}
fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
// Round-robin or availability-based selection
self.buffers.first().cloned()
}
}
Session Pooling (ONNX Runtime):
use once_cell::sync::Lazy;
use ort::Session;
static SESSION_POOL: Lazy<Vec<Session>> = Lazy::new(|| {
(0..4).map(|_| {
Session::builder()
.unwrap()
.commit_from_file("model.onnx")
.unwrap()
}).collect()
});
fn get_session(worker_id: usize) -> &'static Session {
    &SESSION_POOL[worker_id % SESSION_POOL.len()]
}
7.4 Streaming and Batching
Batch Processing (Amortize overhead):
fn process_batch(images: &[DynamicImage], model: &Session) -> Result<Vec<String>> {
let batch_size = images.len();
// Create batched tensor [batch_size, channels, height, width]
let mut batch_tensor = vec![0.0f32; batch_size * 3 * 224 * 224];
for (i, img) in images.iter().enumerate() {
let offset = i * 3 * 224 * 224;
preprocess_into_buffer(img, &mut batch_tensor[offset..]);
}
// Single inference call for entire batch
let output = model.run(vec![batch_tensor.into()])?;
// Decode batch results
decode_batch_predictions(output, batch_size)
}
Streaming Inference (For large documents):
async fn process_document_streaming(
pages: impl Stream<Item = Image>,
model: &Session,
) -> impl Stream<Item = Result<String>> {
pages.map(|page| {
// Process one page at a time
let text = recognize_text(&page, model)?;
Ok(text)
})
}
7.5 Model Sharding and Lazy Loading
Lazy Model Loading:
use once_cell::sync::OnceCell;
static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();
fn get_detection_model() -> &'static Session {
DETECTION_MODEL.get_or_init(|| {
Session::builder()
.unwrap()
.commit_from_file("detection.onnx")
.unwrap()
})
}
Conditional Loading:
// Only load language-specific models when needed
struct OcrEngine {
detection: Session,
recognition_models: HashMap<Language, OnceCell<Session>>,
}
impl OcrEngine {
fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
let boxes = self.detect(img)?;
let rec_model = self.recognition_models
.get(&lang)
.unwrap()
.get_or_init(|| load_recognition_model(lang));
self.recognize_boxes(img, &boxes, rec_model)
}
}
7.6 Memory Mapping (Large Models)
Using memmap2 for Model Files:
use memmap2::Mmap;
use std::fs::File;
fn load_model_mmap(path: &str) -> Result<Mmap> {
let file = File::open(path)?;
let mmap = unsafe { Mmap::map(&file)? };
Ok(mmap)
}
// Model data stays on disk, paged in as needed
// Useful for models >100MB
Benefits:
- Reduced resident memory
- Faster startup (no full load)
- Shared memory across processes
Limitations:
- Not available in WASM
- Requires file system access
- May have higher latency on first access
7.7 GPU Memory Management
CUDA Unified Memory:
// ort manages GPU memory once a CUDA execution provider is registered
// (provider-registration details vary across ort 2.0 release candidates)
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;
// Tensors are transferred to/from the GPU automatically
Manual GPU Memory Control (candle):
use candle_core::{Device, Tensor};
let device = Device::new_cuda(0)?;
// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;
// Explicit cleanup
drop(tensor_gpu);
7.8 Memory Profiling and Monitoring
Rust Memory Profiling Tools:
- `valgrind --tool=massif`: Heap profiling
- `heaptrack`: Heap memory profiler (Linux)
- `dhat`: Dynamic heap analysis tool
- `tokio-console`: Async runtime monitoring
Custom Memory Tracking:
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
struct TrackingAllocator;
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
unsafe impl GlobalAlloc for TrackingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
System.dealloc(ptr, layout)
}
}
#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;
fn get_memory_usage() -> usize {
ALLOCATED.load(Ordering::SeqCst)
}
7.9 Memory Optimization Recommendations for ruvector-scipix
Priority Strategies:
1. Quantize Models (INT8 for production)
   - 4x memory reduction
   - Minimal accuracy impact for OCR
   - Use ONNX Runtime quantization tools
2. Implement Tensor Pooling
   - Reuse buffers for repeated inferences
   - Align with ruvector-core's memory management patterns
   - Use `parking_lot` for efficient synchronization
3. Lazy Load Language Models
   - Only load recognition models for requested languages
   - Use `OnceCell` for thread-safe initialization
   - Share models across threads
4. Batch Processing
   - Group multiple images into a single inference call
   - Amortize overhead, improve GPU utilization
   - Integrate with ruvector's parallel processing
5. GPU Memory Awareness
   - Monitor GPU memory usage
   - Implement fallback to CPU if the GPU runs out of memory
   - Use smaller batch sizes on memory-constrained devices
6. Profile Real Workloads
   - Measure memory with actual ruvector data
   - Identify bottlenecks (model weights vs. activations)
   - Optimize based on measured data
8. Recommended Technology Stack for ruvector-scipix
8.1 Primary Stack (Production Deployment)
Inference Engine: ort (ONNX Runtime)
- Version: `2.0.0-rc` or latest stable
- Features: `cuda`, `tensorrt`, `half`, `load-dynamic`
- Rationale:
- Best-in-class performance (73% latency reduction)
- Extensive GPU support (CUDA, TensorRT, OpenVINO)
- Production-proven (Twitter, Google, SurrealDB)
- Largest ONNX model ecosystem
OCR Models: PaddleOCR v5 (ONNX format)
- Detection: `ch_PP-OCRv5_mobile_det.onnx`
- Recognition: `ch_PP-OCRv5_mobile_rec.onnx`
- Rationale:
- State-of-the-art accuracy
- Optimized for speed (5x faster in ONNX)
- Multi-language support (80+ languages)
- Active development (2025 updates)
Image Processing: image + imageproc
- Version: Latest stable
- Rationale:
- Comprehensive format support
- CPU parallelism via rayon (already in workspace)
- Mature, well-tested
- Pure Rust (no C++ dependencies)
Dependencies Integration:
[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing
image = "0.25"
imageproc = "0.25"
# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
8.2 Alternative Stack (WASM/Browser Deployment)
Inference Engine: candle with WGPU backend
- Version: Latest stable from Hugging Face
- Features: `wasm`, `webgpu`
- Rationale:
- Smallest WASM bundle size
- Native WebGPU support
- Fast startup times
- Pure Rust
OCR Models: TrOCR (via candle-onnx) or lightweight PaddleOCR
- Smaller models for browser constraints
- Quantized INT8 versions
WASM-Specific Stack:
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }
8.3 Fallback Stack (Pure Rust/No External Dependencies)
Inference Engine: tract
- Use Case: When ONNX Runtime binaries unavailable or pure Rust required
- Rationale:
- No C++ dependencies
- Excellent WASM support
- Mature (Sonos production use)
- Passes 85% ONNX tests
Stack:
[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"
8.4 Architecture Design
┌─────────────────────────────────────────────────────────────┐
│ ruvector-scipix │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Image Input │────▶│ Preprocessing│───▶│ Detection │ │
│ │ (image) │ │ (imageproc) │ │ (ort/ONNX) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Text Boxes │ │
│ └──────┬───────┘ │
│ │ │
│ ┌─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Recognition │─────▶│ Post-Proc. │ │
│ │ (ort/ONNX) │ │ (decode) │ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Vector Store │ │
│ │ (ruvector- │ │
│ │ core) │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
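The "Post-Proc. (decode)" stage in the diagram is, for PaddleOCR-style CTC recognition heads, typically a greedy decode: take the argmax class at each timestep, collapse consecutive repeats, and drop the blank symbol. A self-contained sketch (treating class index 0 as the blank and using a tiny character dictionary are assumptions; real models ship a `dict.txt`):

```rust
/// Greedy CTC decode: argmax per timestep, collapse repeats, drop blanks.
/// `logits` is [timesteps][num_classes]; class 0 is assumed to be the blank,
/// and `dict[i]` is the character for class i + 1.
fn ctc_greedy_decode(logits: &[Vec<f32>], dict: &[char]) -> String {
    let mut out = String::new();
    let mut prev = usize::MAX; // sentinel: no previous class
    for step in logits {
        // Argmax over classes for this timestep
        let (idx, _) = step
            .iter()
            .enumerate()
            .fold((0, f32::MIN), |best, (i, &v)| if v > best.1 { (i, v) } else { best });
        // Emit only on a change of class, and never for the blank
        if idx != 0 && idx != prev {
            out.push(dict[idx - 1]);
        }
        prev = idx;
    }
    out
}
```

Greedy decoding is the standard choice for OCR because per-timestep distributions are usually sharp; beam search adds accuracy mainly when a language model is fused in.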
8.5 Module Structure
examples/scipix/
├── Cargo.toml
├── src/
│ ├── lib.rs # Public API
│ ├── engine.rs # OCR engine orchestration
│ ├── detection.rs # Text detection (ONNX)
│ ├── recognition.rs # Text recognition (ONNX)
│ ├── preprocessing.rs # Image preprocessing (imageproc)
│ ├── postprocessing.rs # Result decoding and formatting
│ ├── models.rs # Model loading and management
│ └── config.rs # Configuration
├── models/ # ONNX model files (gitignored)
│ ├── detection.onnx
│ ├── recognition.onnx
│ └── dict.txt
├── tests/
│ ├── integration_test.rs
│ └── benchmark.rs
└── docs/
├── 01_REQUIREMENTS.md
├── 02_ARCHITECTURE.md
└── 03_RUST_ECOSYSTEM.md # This document
8.6 Performance Targets
Based on PaddleOCR benchmarks and Rust optimizations:
| Metric | Target | Hardware |
|---|---|---|
| Detection Latency | <50ms | NVIDIA T4 (TensorRT) |
| Recognition Latency | <20ms | NVIDIA T4 (TensorRT) |
| End-to-End (single image) | <100ms | NVIDIA T4 |
| Throughput (batched) | >100 images/sec | NVIDIA T4 |
| CPU Latency | <500ms | Modern multi-core CPU |
| WASM Latency | <1s | Browser (WebGPU) |
| Memory Usage | <500MB | With INT8 quantization |
8.7 Development Phases
Phase 1: Core Implementation (ort + PaddleOCR)
- Implement detection and recognition pipelines
- Integrate with ruvector-core storage
- CPU-only inference initially
- Basic preprocessing (resize, normalize)
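The resize and normalize step can be made concrete. Below is a minimal std-only sketch of the arithmetic; the helper names are hypothetical, and the multiple-of-32 padding plus ImageNet mean/std constants are conventions that PaddleOCR-style DBNet detectors commonly assume:

```rust
/// Compute a detection-input size: scale the longer side down to `max_side`
/// (never upscale) and round each side up to a multiple of 32, as
/// DBNet-style detectors commonly require. Names/constants are illustrative.
fn detection_input_size(w: u32, h: u32, max_side: u32) -> (u32, u32) {
    let scale = (max_side as f32 / w.max(h) as f32).min(1.0);
    let round32 = |v: f32| ((v / 32.0).ceil() as u32).max(1) * 32;
    (round32(w as f32 * scale), round32(h as f32 * scale))
}

/// Normalize interleaved RGB u8 pixels to a CHW f32 tensor with
/// ImageNet mean/std, the usual PaddleOCR preprocessing.
fn normalize_chw(rgb: &[u8], w: usize, h: usize) -> Vec<f32> {
    const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
    const STD: [f32; 3] = [0.229, 0.224, 0.225];
    let mut out = vec![0.0f32; 3 * w * h];
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        for c in 0..3 {
            out[c * w * h + i] = (px[c] as f32 / 255.0 - MEAN[c]) / STD[c];
        }
    }
    out
}

fn main() {
    // 1920x1080 capped at 960 on the long side, both sides padded to /32
    let (w, h) = detection_input_size(1920, 1080, 960);
    println!("{}x{}", w, h);
    let pixels = vec![128u8; 4 * 4 * 3];
    let tensor = normalize_chw(&pixels, 4, 4);
    assert_eq!(tensor.len(), 48);
}
```

The actual pixel resampling would be done by `image`'s resize functions; the sketch only covers the shape and normalization math the model contract depends on.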
Phase 2: GPU Acceleration
- Add CUDA/TensorRT support
- Benchmark and optimize performance
- Implement batching for throughput
- Memory pooling and reuse
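The "memory pooling and reuse" item can start as a simple free-list of tensor buffers; a std-only sketch (the `BufferPool` type is hypothetical, and a production pool would bound its size):

```rust
use std::sync::Mutex;

/// A minimal pool that recycles f32 tensor buffers so repeated inferences
/// avoid fresh allocations. Illustrative only.
struct BufferPool {
    free: Mutex<Vec<Vec<f32>>>,
}

impl BufferPool {
    fn new() -> Self {
        Self { free: Mutex::new(Vec::new()) }
    }

    /// Take a zeroed buffer of `len` elements, reusing capacity when possible.
    fn take(&self, len: usize) -> Vec<f32> {
        let mut buf = self.free.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        buf.resize(len, 0.0);
        buf
    }

    /// Return a buffer to the pool for reuse.
    fn give(&self, buf: Vec<f32>) {
        self.free.lock().unwrap().push(buf);
    }
}

fn main() {
    let pool = BufferPool::new();
    let a = pool.take(1024);
    let ptr = a.as_ptr();
    pool.give(a);
    // The smaller request reuses the 1024-element allocation.
    let b = pool.take(512);
    assert_eq!(b.as_ptr(), ptr);
    assert_eq!(b.len(), 512);
}
```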
Phase 3: Production Hardening
- Model quantization (INT8)
- Error handling and fallbacks
- Metrics and monitoring
- Load testing
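The INT8 quantization planned above boils down to mapping float tensors to 8-bit integers with a per-tensor scale. A sketch of the symmetric-quantization arithmetic (illustrative; real model quantization would go through ONNX Runtime's tooling rather than hand-rolled code):

```rust
/// Symmetric per-tensor INT8 quantization: scale = max|x| / 127.
fn quantize(data: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = data.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = data
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize(&weights);
    let back = dequantize(&q, scale);
    // INT8 storage is 4x smaller than f32; per-value rounding error is
    // bounded by half the scale.
    for (orig, rec) in weights.iter().zip(&back) {
        assert!((orig - rec).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}");
}
```

This is where the <500MB memory target and the ~3-4x model-size reduction in Section 11.4 come from.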
Phase 4: WASM Support (Optional)
- Port to candle or tract
- Browser deployment
- WebGPU acceleration
- Client-side OCR
8.8 Testing Strategy
Unit Tests:
- Image preprocessing correctness
- Model loading and initialization
- Tensor shape validation
- Output decoding accuracy
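For PaddleOCR-style recognition models, "output decoding" means greedy CTC decoding: take the argmax at each timestep, collapse consecutive repeats, and drop blanks. A std-only sketch (dictionary and shapes are illustrative; PaddleOCR conventionally reserves class index 0 for the blank):

```rust
/// Greedy CTC decode: argmax each timestep, collapse consecutive repeats,
/// skip the blank class (index 0).
fn ctc_greedy_decode(logits: &[Vec<f32>], dict: &[char]) -> String {
    let mut out = String::new();
    let mut prev = 0usize; // start as blank
    for step in logits {
        let (idx, _) = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .unwrap();
        if idx != 0 && idx != prev {
            out.push(dict[idx - 1]); // dict[0] corresponds to class 1
        }
        prev = idx;
    }
    out
}

fn main() {
    // classes: 0 = blank, 1 = 'h', 2 = 'i'
    let dict = ['h', 'i'];
    let logits = vec![
        vec![0.1, 0.8, 0.1],   // 'h'
        vec![0.1, 0.8, 0.1],   // repeated 'h' -> collapsed
        vec![0.9, 0.05, 0.05], // blank
        vec![0.1, 0.1, 0.8],   // 'i'
    ];
    assert_eq!(ctc_greedy_decode(&logits, &dict), "hi");
}
```

Unit tests for the decoder can then assert on hand-built logit sequences without loading any model.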
Integration Tests:
#[test]
fn test_end_to_end_ocr() {
let engine = OcrEngine::new(Config::default()).unwrap();
let img = image::open("tests/fixtures/sample.jpg").unwrap();
let result = engine.recognize_text(&img).unwrap();
assert!(result.text.contains("expected text"));
}
Benchmarks (using Criterion):
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn benchmark_detection(c: &mut Criterion) {
let engine = setup_engine();
let img = load_test_image();
c.bench_function("detection", |b| {
b.iter(|| engine.detect(black_box(&img)))
});
}
criterion_group!(benches, benchmark_detection);
criterion_main!(benches);
Performance Tests:
- Latency under various image sizes
- Throughput with batching
- Memory usage over time
- GPU utilization
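Latency reporting reduces to collecting per-image timings and summarizing percentiles; a std-only nearest-rank helper sketch (function name illustrative):

```rust
/// Nearest-rank percentile over latency samples (e.g. milliseconds).
/// p is in [0, 100]; panics on an empty sample set.
fn percentile(samples: &[f64], p: f64) -> f64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    sorted[rank - 1]
}

fn main() {
    // 100 synthetic latency samples: 1ms..100ms
    let latencies: Vec<f64> = (1..=100).map(|v| v as f64).collect();
    println!(
        "p50 = {}ms, p95 = {}ms",
        percentile(&latencies, 50.0),
        percentile(&latencies, 95.0)
    );
}
```

Reporting p50/p95/p99 rather than means keeps the GPU targets in Section 8.6 honest under tail latency.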
9. Integration with ruvector-core Dependencies
9.1 Shared Workspace Dependencies
The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.
Already Available (from workspace):
| Dependency | ruvector Use | scipix Use |
|---|---|---|
| `rayon` | Parallel distance computation | Batch image preprocessing, parallel OCR |
| `ndarray` | Vector operations | Tensor manipulation, image arrays |
| `parking_lot` | Efficient locking primitives | Model pool synchronization |
| `dashmap` | Concurrent hash maps | Model cache, result cache |
| `tokio` | Async runtime | Async inference, streaming |
| `serde` / `serde_json` | Serialization | Config, results serialization |
| `thiserror` / `anyhow` | Error handling | OCR error types |
| `tracing` | Logging | Inference timing, debugging |
| `uuid` | Unique identifiers | Request tracking |
| `chrono` | Timestamps | Inference metrics |
Benefits:
- Minimal new dependencies: Only add OCR-specific crates
- Consistent patterns: Same error handling, logging, async across codebase
- Binary size: Shared dependencies not duplicated
- Maintenance: Updates to workspace deps benefit all crates
9.2 Parallel Processing Integration
Leverage rayon for Batch OCR:
use rayon::prelude::*;
fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Vec<Result<OcrResult>> {
    images.par_iter()
        .map(|img| engine.recognize_text(img))
        .collect()
}
Consistency: Matches ruvector-core's parallel distance computation pattern
9.3 Storage Integration
Store OCR Results in ruvector-core:
use ruvector_core::{VectorStore, Vector};
struct OcrResult {
text: String,
embedding: Vec<f32>, // From embedding model
bounding_boxes: Vec<BoundingBox>,
}
impl OcrResult {
fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<uuid::Uuid> {
let vector = Vector::new(self.embedding.clone());
let id = store.insert(vector)?;
// Store metadata
store.set_metadata(id, "text", &self.text)?;
store.set_metadata(id, "boxes", &self.bounding_boxes)?;
Ok(id)
}
}
Vector Search for OCR Results:
// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;
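Under the hood, `store.search` ranks stored vectors by a similarity measure. As a stand-in for ruvector-core's internal distance code, a std-only sketch of cosine-similarity top-k:

```rust
/// Cosine similarity between two embeddings; 1.0 means identical direction.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Rank candidates by similarity to `query`, best first, returning indices.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let query = vec![1.0, 0.0];
    let docs = vec![vec![0.0, 1.0], vec![0.9, 0.1], vec![-1.0, 0.0]];
    // doc 1 points nearly the same way as the query, doc 0 is orthogonal
    assert_eq!(top_k(&query, &docs, 2), vec![1, 0]);
}
```

In production this brute-force scan would be replaced by ruvector-core's indexed search; the sketch only shows what "similar" means for OCR embeddings.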
9.4 WASM Compatibility
ruvector-core WASM Patterns:
- `memory-only` feature for WASM targets
- `wasm-bindgen` for browser interop
- `getrandom` with the `wasm_js` feature
Apply to scipix:
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }
[features]
default = ["ort-backend"]
ort-backend = ["ort"]
candle-backend = ["candle-core", "candle-onnx"]
wasm = ["candle-backend"] # WASM uses candle
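A sketch of how these feature flags can gate the backend at compile time (module contents are placeholders, not real backend code):

```rust
// When built with `--features candle-backend` the first module is compiled;
// otherwise the ort path is used, matching `default = ["ort-backend"]` above.
#[cfg(feature = "candle-backend")]
mod backend {
    pub const NAME: &str = "candle";
    pub fn run_inference(input: &[f32]) -> Vec<f32> {
        // candle-based inference would go here
        input.to_vec()
    }
}

#[cfg(not(feature = "candle-backend"))]
mod backend {
    pub const NAME: &str = "ort";
    pub fn run_inference(input: &[f32]) -> Vec<f32> {
        // ort-based inference would go here
        input.to_vec()
    }
}

fn main() {
    let out = backend::run_inference(&[1.0, 2.0]);
    println!("backend = {}, outputs = {}", backend::NAME, out.len());
}
```

Keeping the two backends behind one module boundary means the rest of the crate never mentions `ort` or `candle` directly, which is what makes the WASM port in Phase 4 a drop-in swap.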
9.5 Error Handling Patterns
Consistent with ruvector-core:
use thiserror::Error;
#[derive(Error, Debug)]
pub enum OcrError {
#[error("Model loading failed: {0}")]
ModelLoadError(String),
#[error("Inference failed: {0}")]
InferenceError(String),
#[error("Image preprocessing failed: {0}")]
PreprocessingError(#[from] image::ImageError),
#[error("ONNX Runtime error: {0}")]
OrtError(#[from] ort::Error),
#[error("IO error: {0}")]
IoError(#[from] std::io::Error),
}
pub type Result<T> = std::result::Result<T, OcrError>;
9.6 Configuration Pattern
Similar to ruvector-core config:
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
/// Path to detection model
pub detection_model_path: String,
/// Path to recognition model
pub recognition_model_path: String,
/// Use GPU acceleration if available
pub use_gpu: bool,
/// Batch size for parallel processing
pub batch_size: usize,
/// Detection confidence threshold
pub detection_threshold: f32,
/// Number of inference threads
pub num_threads: usize,
}
impl Default for OcrConfig {
fn default() -> Self {
Self {
detection_model_path: "models/detection.onnx".into(),
recognition_model_path: "models/recognition.onnx".into(),
use_gpu: true,
batch_size: 8,
detection_threshold: 0.7,
num_threads: rayon::current_num_threads(),
}
}
}
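Because `OcrConfig` derives `Serialize`/`Deserialize`, the default configuration round-trips through `serde_json` as something like the following (the `num_threads` value is machine-dependent; 8 is illustrative):

```json
{
  "detection_model_path": "models/detection.onnx",
  "recognition_model_path": "models/recognition.onnx",
  "use_gpu": true,
  "batch_size": 8,
  "detection_threshold": 0.7,
  "num_threads": 8
}
```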
9.7 Async Integration
Use tokio for async OCR:
use std::sync::Arc;
use futures::stream::{Stream, StreamExt};
use image::DynamicImage;
use tokio::task;
pub struct AsyncOcrEngine {
engine: Arc<OcrEngine>,
}
impl AsyncOcrEngine {
pub async fn recognize_text(&self, image: DynamicImage) -> Result<OcrResult> {
let engine = Arc::clone(&self.engine);
// Run blocking OCR on tokio's blocking threadpool; surface a JoinError
// as an inference error instead of panicking
task::spawn_blocking(move || engine.recognize_text_sync(&image))
.await
.map_err(|e| OcrError::InferenceError(e.to_string()))?
}
pub async fn process_stream(
&self,
images: impl Stream<Item = DynamicImage>,
) -> impl Stream<Item = Result<OcrResult>> {
images.then(move |img| {
let engine = Arc::clone(&self.engine);
async move {
engine.recognize_text(img).await
}
})
}
}
9.8 Metrics Integration
Use existing tracing infrastructure:
use tracing::{info, debug, instrument};
#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
let start = std::time::Instant::now();
debug!("Starting OCR for image {}x{}", image.width(), image.height());
let preprocessed = self.preprocess(image)?;
debug!("Preprocessing took {:?}", start.elapsed());
let boxes = self.detect(&preprocessed)?;
debug!("Detection found {} boxes in {:?}", boxes.len(), start.elapsed());
let text = self.recognize(&preprocessed, &boxes)?;
info!(
"OCR completed in {:?}, extracted {} characters",
start.elapsed(),
text.len()
);
Ok(OcrResult { text, boxes })
}
9.9 Testing Infrastructure Reuse
Use workspace test dependencies:
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"
Property-Based Testing (like ruvector-core):
use proptest::prelude::*;
proptest! {
#[test]
fn test_preprocessing_preserves_aspect_ratio(
width in 100u32..2000u32,
height in 100u32..2000u32
) {
let img = DynamicImage::new_rgb8(width, height);
let processed = preprocess_image(&img).unwrap();
let original_ratio = width as f32 / height as f32;
let processed_ratio = processed.width() as f32 / processed.height() as f32;
prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
}
}
9.10 Dependency Summary for scipix
New Dependencies Required:
[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"
# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }
# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }
Total New Dependencies: 3 (`ort`, `image`, `imageproc`)
Reused Dependencies: 12 from workspace
10. License Compatibility
10.1 ruvector Project License
Current License: MIT (from workspace Cargo.toml)
Requirement: All dependencies must be MIT-compatible for redistribution.
10.2 Recommended Dependencies License Analysis
| Crate | License | Compatible? | Notes |
|---|---|---|---|
| ort | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed, fully compatible |
| candle | MIT OR Apache-2.0 | ✅ Yes | Hugging Face, dual-licensed |
| tract | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed (except ONNX protos) |
| image | MIT OR Apache-2.0 | ✅ Yes | Pure Rust, dual-licensed |
| imageproc | MIT | ✅ Yes | Permissive, MIT-only |
| ndarray | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| rayon | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| wasm-bindgen | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
Problematic Libraries (Avoid):
| Crate | License | Issue |
|---|---|---|
| leptess | MIT (wrapper) | ❌ Wraps the Tesseract C++ library; the native dependency complicates builds and redistribution |
| opencv-rust | MIT (wrapper) | ❌ Depends on OpenCV (Apache-2.0); large native dependency with a complex build |
10.3 ONNX Model Licenses
PaddleOCR models used in ONNX format have Apache-2.0 license.
Compatibility:
- ✅ Apache-2.0 code can be used in MIT-licensed projects
- ✅ ONNX models (weights) are typically considered data, not code
- ✅ Distribution of pre-trained models is permitted
- ⚠️ Derivative works of Apache-2.0 code require patent grant preservation
Best Practice:
- Download PaddleOCR ONNX models from official sources
- Include a LICENSE file in the `models/` directory
- Document model provenance in the README
- Do not modify Apache-2.0 code (use as-is via ONNX)
10.4 Rust Dual-Licensing Best Practices
Why Rust Uses MIT OR Apache-2.0:
- MIT: Maximum permissiveness, minimal restrictions
- Apache-2.0: Patent protection, better for corporate use
- Dual License: Users choose which applies to them
For ruvector-scipix:
Option 1: Keep MIT-only (Current)
- ✅ Simplest licensing
- ✅ Maximum compatibility
- ✅ Minimal legal overhead
- ✅ All dependencies are MIT-compatible
Option 2: Adopt Dual MIT/Apache-2.0
- ✅ Better patent protection
- ✅ Aligns with Rust ecosystem norms
- ✅ More attractive to enterprise users
- ⚠️ Slightly more complex
Recommendation: Keep MIT-only for simplicity, unless patent concerns arise.
10.5 License Compliance Checklist
For Production Deployment:
- Verify all direct dependencies are MIT or MIT/Apache-2.0
- Check transitive dependencies for license conflicts
- Include LICENSE file in repository
- Document third-party licenses in NOTICE file
- Include the PaddleOCR model license in `models/LICENSE`
- Add copyright headers to source files (optional for MIT)
- Review ONNX Runtime's license (MIT, but check binary distribution terms)
- Ensure no GPL/LGPL dependencies (incompatible with MIT)
Automated License Checking:
# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features
# Fail build on incompatible licenses (requires cargo-deny)
cargo install cargo-deny
cargo deny check licenses
deny.toml Configuration:
[licenses]
unlicensed = "deny"
allow = [
"MIT",
"Apache-2.0",
"Apache-2.0 WITH LLVM-exception",
"BSD-2-Clause",
"BSD-3-Clause",
"ISC",
"Unicode-DFS-2016",
]
deny = [
"GPL-2.0-only",
"GPL-2.0-or-later",
"GPL-3.0-only",
"GPL-3.0-or-later",
"AGPL-3.0-only",
"AGPL-3.0-or-later",
]
10.6 Attribution Requirements
MIT License Requirements:
- Include copyright notice
- Include permission notice (LICENSE file)
- No obligation to disclose source code modifications
For PaddleOCR Models (Apache-2.0):
- Include NOTICE file if provided
- Preserve copyright and patent notices
- Document significant modifications (if any)
Recommended NOTICE File:
ruvector-scipix
Copyright 2025 Ruvector Team
This software includes components from:
1. ONNX Runtime
Copyright Microsoft Corporation
Licensed under MIT License
2. PaddleOCR Models
Copyright PaddlePaddle Authors
Licensed under Apache License 2.0
Model files located in models/ directory
3. Candle ML Framework
Copyright Hugging Face, Inc.
Licensed under MIT OR Apache-2.0
Complete license texts available in the LICENSE and models/LICENSE files.
10.7 License Compatibility Summary
✅ SAFE TO USE (Recommended Stack):
- `ort` - MIT/Apache-2.0
- `image` - MIT/Apache-2.0
- `imageproc` - MIT
- `candle` - MIT/Apache-2.0
- `tract` - MIT/Apache-2.0
- PaddleOCR ONNX models - Apache-2.0 (data)
⚠️ USE WITH CAUTION:
- `leptess` - Requires the Tesseract C++ library (complex licensing)
- `opencv-rust` - Requires OpenCV (large native dependency, Apache-2.0)
❌ AVOID:
- Any GPL/LGPL libraries (incompatible with MIT for proprietary use)
- Proprietary OCR engines (licensing fees, redistribution restrictions)
Final Recommendation: The proposed stack (ort + PaddleOCR + image/imageproc) is fully compatible with ruvector's MIT license and follows Rust ecosystem best practices.
11. Final Recommendations
11.1 Optimal Technology Stack
Primary Recommendation (Production):
[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"
# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx
# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }
# Integration
ruvector-core = { path = "../../crates/ruvector-core" }
Rationale:
- Performance: `ort` provides 73% latency reduction vs alternatives
- Ecosystem: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
- GPU Support: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
- Production Ready: Used by Twitter, Google, SurrealDB
- License: MIT/Apache-2.0 dual-license (fully compatible)
- Maintenance: Active development, Microsoft backing
11.2 Alternative Stacks by Use Case
WASM/Browser Deployment:
candle-core = { version = "0.8", features = ["wasm", "webgpu"] }
candle-onnx = "0.8"
- Smallest bundle size (~180KB Brotli)
- WebGPU acceleration
- Fast startup (120ms first token)
Pure Rust / No External Deps:
tract-onnx = "0.22"
- No C++ dependencies
- Excellent for embedded/restrictive environments
- 85% ONNX compatibility
Edge Devices / Raspberry Pi:
tract-onnx = { version = "0.22", features = ["pulse"] }
- Optimized for CPU inference
- Minimal memory footprint
- Proven on RPi (11μs for CNN models)
11.3 Implementation Roadmap
Week 1-2: Core Infrastructure
- Set up the `examples/scipix` crate structure
- Integrate `ort` and `image`/`imageproc`
- Implement model loading (detection + recognition)
- Basic end-to-end pipeline (CPU-only)
Week 3-4: GPU Acceleration
- Enable CUDA/TensorRT support
- Implement batching for throughput
- Benchmark performance vs targets
- Memory pooling and optimization
Week 5-6: Production Hardening
- Model quantization (INT8)
- Error handling and recovery
- Metrics and monitoring (tracing)
- Integration tests and benchmarks
Week 7-8: ruvector Integration
- Store OCR results in ruvector-core
- Implement vector search for documents
- Async API with tokio
- Documentation and examples
Optional (Week 9-10): WASM Support
- Port to candle for browser deployment
- WebGPU acceleration
- Client-side OCR demo
11.4 Key Metrics to Track
Performance:
- Detection latency: Target <50ms (GPU), <200ms (CPU)
- Recognition latency: Target <20ms (GPU), <100ms (CPU)
- End-to-end: Target <100ms (GPU), <500ms (CPU)
- Throughput: Target >100 images/sec (batched, GPU)
Memory:
- Model size: ~15-30MB (FP32), ~5-10MB (INT8)
- Runtime memory: Target <500MB
- GPU memory: Monitor for OOM
Accuracy:
- Character accuracy: Target >95% (clean text)
- Word accuracy: Target >90%
- Benchmark against Tesseract and commercial APIs
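Character accuracy is conventionally computed as 1 − edit_distance / reference_length. A std-only Levenshtein sketch usable for the benchmark comparisons above (function names are illustrative):

```rust
/// Levenshtein edit distance between two strings, at the char level.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Classic two-row dynamic programming over the edit lattice.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let sub = prev[j] + usize::from(ca != cb);
            cur.push(sub.min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Character accuracy of OCR output against a reference, clamped at 0.
fn char_accuracy(reference: &str, ocr: &str) -> f64 {
    let n = reference.chars().count().max(1);
    let d = edit_distance(reference, ocr);
    (1.0 - d as f64 / n as f64).max(0.0)
}

fn main() {
    assert_eq!(edit_distance("kitten", "sitting"), 3);
    // One substitution ('0' for 'o') over 11 reference characters.
    let acc = char_accuracy("hello world", "hello w0rld");
    println!("accuracy = {:.3}", acc);
}
```

Running the same metric over Tesseract output and the commercial-API output gives the apples-to-apples comparison the checklist calls for.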
11.5 Risk Mitigation
Model Availability:
- ✅ PaddleOCR models freely available
- ✅ Multiple model versions for fallback
- ⚠️ Verify ONNX export quality (may need custom conversion)
Dependency Stability:
- ✅ `ort` actively maintained (2.0 rc, stable release expected)
- ✅ `image`/`imageproc` mature, widely used
- ⚠️ Monitor for breaking changes during updates
Performance Variability:
- ⚠️ GPU performance depends on driver versions
- ⚠️ WASM performance varies by browser
- ✅ Comprehensive benchmarking before production
License Compliance:
- ✅ All recommended dependencies MIT-compatible
- ✅ PaddleOCR Apache-2.0 (compatible for use)
- ⚠️ Review licenses before adding new dependencies
11.6 Success Criteria
The ruvector-scipix implementation is successful if:
- Performance: Meets or exceeds latency/throughput targets
- Accuracy: Character accuracy >95% on clean text
- Integration: Seamlessly stores results in ruvector-core
- Portability: Runs on Linux/macOS/Windows, CPU and GPU
- Memory: Operates within <500MB budget
- License: Maintains MIT compatibility
- Maintainability: Uses idiomatic Rust, well-documented
- Scalability: Handles batch processing efficiently
11.7 Next Steps
- Review this document with ruvector team for alignment
- Download PaddleOCR models (detection + recognition ONNX)
- Set up the `examples/scipix` crate with recommended dependencies
- Implement basic OCR pipeline (end-to-end proof of concept)
- Benchmark initial implementation against targets
- Iterate and optimize based on real-world data
- Document API and usage examples
- Integrate with ruvector-core for vector storage
References and Resources
Documentation
- ort Documentation - ONNX Runtime Rust bindings by pykeio
- Candle GitHub - Minimalist ML framework for Rust
- tract GitHub - Tiny, no-nonsense ONNX/TF inference
- PaddleOCR GitHub - OCR models and documentation
- imageproc Docs - Rust image processing library
Performance Benchmarks
- Rust at the Metal: GPU Layer Driving Modern AI
- Rust for Machine Learning in 2025
- PaddleOCR 3.0 High-Performance Inference
WASM Resources
- WebAssembly 3.0 Performance: Rust vs C++ Benchmarks
- 3W for In-Browser AI: WebLLM + WASM + WebWorkers
License Information
- Rust API Guidelines: Licensing
- PaddleOCR License - Apache-2.0
- ONNX Runtime License - MIT
Document Version: 1.0
Last Updated: 2025-11-28
Author: Research and Analysis Agent
Status: Complete