# Rust OCR and ML Ecosystem Analysis for ruvector-scipix

## Executive Summary

This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers six OCR libraries and four ML inference engines, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.

**Key Finding**: The optimal stack for ruvector-scipix combines `ort` (ONNX Runtime bindings) for inference and `image`/`imageproc` for preprocessing, with optional pure Rust alternatives (`tract`, `candle`) for WASM targets.

---

## 1. Library Comparison Matrix

### OCR Libraries

| Library | Type | Model Support | WASM Support | GPU Support | Maturity | Performance | Dependencies |
|---------|------|---------------|--------------|-------------|----------|-------------|--------------|
| **ocrs** | Native Rust | ONNX (RTen engine) | ✅ Yes | ❌ No | 🟡 Preview | Medium | Minimal (Pure Rust) |
| **oar-ocr** | ONNX Wrapper | PaddleOCR ONNX | ✅ Yes | ✅ CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| **kalosm-ocr** | Pure Rust | TrOCR (candle) | ✅ Yes | ✅ WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| **leptess** | FFI Bindings | Tesseract C++ | ❌ No | ❌ No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| **paddle-ocr-rs** | ONNX Wrapper | PaddleOCR v4/v5 | ✅ Yes | ✅ CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| **pure-onnx-ocr** | Pure ONNX | PaddleOCR DBNet+SVTR | ✅ Yes | ✅ Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |

### ML Inference Engines

| Library | Purpose | Model Format | WASM Support | GPU Support | Performance | Maturity |
|---------|---------|--------------|--------------|-------------|-------------|----------|
| **ort** | ONNX Runtime | ONNX | ✅ Yes | ✅ CUDA/TensorRT/OpenVINO | **Very High** | 🟢 Production |
| **candle** | ML Framework | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| **tract** | ONNX/TF Inference | ONNX, NNEF, TF | ✅ Yes | ❌ Limited | High (CPU) | 🟢 Mature (Sonos) |
| **burn** | Deep Learning | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | Very High | 🟢 Active |

**Legend**: 🟢 Production-ready | 🟡 Active development | 🔴 Experimental

### Performance Benchmarks

Based on research findings:

- **ort + PaddleOCR**: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
- **ONNX conversion**: Up to 5x faster than PaddlePaddle native inference
- **tract**: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
- **Tesseract (leptess)**: Baseline CPU performance, requires preprocessing
- **ocrs**: Early preview, moderate performance on clear text

---
## 2. ONNX Runtime Integration Options

### 2.1 The `ort` Crate (Recommended)

**Overview**: `ort` by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.

**Key Features**:

- **Hardware Acceleration**: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
- **Dynamic Loading**: Runtime linking for flexibility (`load-dynamic` feature)
- **Alternative Backends**: Support for tract and candle backends
- **Minimal Builds**: RTTI-free, optimized binary sizes for production
- **Float16/BFloat16**: Via `half` crate integration
- **Production Proven**: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB

**Cargo Features**:

```toml
[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",         # Float16/BFloat16 support
    "load-dynamic", # Runtime dynamic linking
    "cuda",         # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",     # TensorRT optimization (requires TensorRT 8.4+)
] }
```

**Performance Characteristics**:

- Significantly faster than PyTorch for inference
- Supports model quantization (int8, float16)
- Multi-GPU distribution via NCCL
- Optimal for batch processing and real-time inference

**Integration Example**:

```rust
use ort::{GraphOptimizationLevel, Session, Value};

// Load ONNX model
let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;

// Run inference
let input = Value::from_array(session.allocator(), &input_tensor)?;
let outputs = session.run(vec![input])?;
```
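To make the hardware-acceleration features concrete, here is a minimal sketch of registering execution providers with a CPU fallback. It assumes the `ort` 2.0 release-candidate API (`ort::execution_providers::{CUDAExecutionProvider, TensorRTExecutionProvider}`); module paths and builder names have shifted between release candidates, so treat this as illustrative rather than definitive.

```rust
use ort::{
    execution_providers::{CUDAExecutionProvider, TensorRTExecutionProvider},
    GraphOptimizationLevel, Session,
};

fn build_accelerated_session(model_path: &str) -> ort::Result<Session> {
    // Providers are tried in the order given; if neither TensorRT nor CUDA
    // is usable at runtime, ort falls back to the CPU provider.
    Session::builder()?
        .with_execution_providers([
            TensorRTExecutionProvider::default().build(),
            CUDAExecutionProvider::default().build(),
        ])?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_intra_threads(4)?
        .commit_from_file(model_path)
}
```

The same builder pattern extends to the other accelerators listed above (OpenVINO, QNN, CANN), each behind its own Cargo feature.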
### 2.2 Alternative: `tract` Backend

**Use Case**: When ONNX Runtime binaries are problematic or a WASM target is required

**Advantages**:

- Pure Rust implementation
- No external C++ dependencies
- Excellent WASM support
- Passes 85% of ONNX backend tests
- Lightweight and maintainable

**Limitations**:

- No tensor sequences or optional tensors
- Limited GPU support compared to ort
- TensorFlow 2 support via ONNX conversion only

### 2.3 Alternative: `candle` Backend

**Use Case**: When integrating with the Hugging Face ecosystem or needing pure Rust

**Advantages**:

- Minimalist design, fast compilation
- Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
- WASM + WebGPU acceleration
- Small binary size for serverless deployment
- CUDA, Metal, MKL, Accelerate backends

**Limitations**:

- Younger ecosystem than ONNX Runtime
- Fewer pre-optimized OCR models available
- Focus on inference over training

---

## 3. Pure Rust ML with Candle/Tract

### 3.1 Candle Framework (Hugging Face)

**Architecture**: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.

**Supported Models**:

- **Language Models**: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
- **Vision Models**: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
- **Speech**: Whisper ASR

**Backend Support**:

| Backend | Platform | Performance | Use Case |
|---------|----------|-------------|----------|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |

**Design Philosophy**:

- Remove Python from production workloads
- Minimize binary size (critical for edge/serverless)
- Fast startup times (first token ~120ms on M2 MacBook Air)
- Rust's safety guarantees for ML workloads

**Example Usage**:

```rust
use candle_core::{Device, Tensor};
use candle_onnx;

// Load model
let model = candle_onnx::read_file("model.onnx")?;
let graph = model.graph.as_ref().unwrap();

// Create device (CUDA/Metal/CPU)
let device = Device::cuda_if_available(0)?;

// Run inference
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let output = model.forward(&[input])?;
```

### 3.2 Tract Framework (Sonos)

**Architecture**: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.

**Key Capabilities**:

- **ONNX Support**: 85% of ONNX backend tests passing
- **Operator Set**: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
- **Proven Models**: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
- **Pulsing**: Streaming inference for time-series models (e.g., WaveNet)
- **Quantization**: Built-in int8 quantization support

**Performance Characteristics**:

- Optimized for CPU inference
- Excellent for edge devices (Raspberry Pi, embedded systems)
- Minimal memory footprint
- No RTTI or runtime overhead

**Example Usage**:

```rust
use tract_onnx::prelude::*;

// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;

// Run inference
let input = tract_ndarray::arr4(&[[...]]).into_dyn();
let result = model.run(tvec![input.into()])?;
```

**Quantization Support**:

```rust
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .quantize()? // Automatic int8 quantization
    .into_optimized()?
    .into_runnable()?;
```

### 3.3 Comparison: Candle vs Tract vs ort

| Criterion | Candle | Tract | ort |
|-----------|--------|-------|-----|
| **Performance (GPU)** | Very High | N/A | Very High |
| **Performance (CPU)** | High | Very High | Very High |
| **Binary Size** | Small | Very Small | Large |
| **Startup Time** | Fast | Very Fast | Medium |
| **WASM Support** | Excellent | Excellent | Good (with backends) |
| **Model Ecosystem** | Hugging Face | ONNX/TF | ONNX (largest) |
| **GPU Backends** | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| **Quantization** | Manual | Built-in | Excellent (ONNX tools) |
| **Maturity** | Stable (2024+) | Mature (2018+) | Production (Microsoft) |

**Recommendation**:

- **ort**: Primary choice for maximum performance and hardware acceleration
- **candle**: Secondary choice for WASM targets or Hugging Face integration
- **tract**: Fallback for pure Rust requirements or extreme size constraints
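One way to keep this tiered recommendation manageable in code is to write the pipeline against a small backend trait and select the engine with Cargo features. The sketch below is a hand-rolled illustration under assumed feature names (`ort-backend`, `tract-backend`, mirroring the feature layout proposed later in Section 9.4); the backends are identity stand-ins, not real engine integrations.

```rust
use ndarray::ArrayD;

/// Minimal abstraction over the candidate inference engines.
pub trait InferenceBackend {
    fn infer(&self, input: ArrayD<f32>) -> anyhow::Result<ArrayD<f32>>;
}

/// Placeholder ort-backed engine; a real implementation would hold an
/// ONNX Runtime session and run it here.
#[cfg(feature = "ort-backend")]
pub struct OrtBackend;

#[cfg(feature = "ort-backend")]
impl InferenceBackend for OrtBackend {
    fn infer(&self, input: ArrayD<f32>) -> anyhow::Result<ArrayD<f32>> {
        Ok(input) // identity stand-in for session.run(...)
    }
}

/// Placeholder tract-backed engine for pure-Rust / WASM builds.
#[cfg(feature = "tract-backend")]
pub struct TractBackend;

#[cfg(feature = "tract-backend")]
impl InferenceBackend for TractBackend {
    fn infer(&self, input: ArrayD<f32>) -> anyhow::Result<ArrayD<f32>> {
        Ok(input) // identity stand-in for model.run(...)
    }
}

/// Pipeline code is written against the trait, so swapping engines is a
/// feature-flag change rather than a code change.
pub fn recognize<B: InferenceBackend>(backend: &B, tensor: ArrayD<f32>) -> anyhow::Result<ArrayD<f32>> {
    backend.infer(tensor)
}
```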
---

## 4. Image Processing in Rust

### 4.1 The `image` Crate (Foundation)

**Purpose**: Core image encoding/decoding and basic manipulation.

**Supported Formats**:

- JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF

**Key Features**:

```rust
use image::imageops;

// Load image
let img = image::open("input.jpg")?;

// Basic operations (in the imageops module)
let resized = img.resize(800, 600, imageops::FilterType::Lanczos3);
let grayscale = img.grayscale();
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);
```

### 4.2 The `imageproc` Crate (Advanced Processing)

**Purpose**: Advanced image processing algorithms for computer vision.

**Modules**:

| Module | Capabilities |
|--------|-------------|
| **Contrast** | Histogram equalization, adaptive thresholding, CLAHE |
| **Corners** | Harris, FAST, Shi-Tomasi corner detection |
| **Distance Transform** | Euclidean distance maps, morphological operations |
| **Edges** | Canny edge detection, Sobel/Scharr operators |
| **Filter** | Gaussian, median, bilateral filtering |
| **Geometric** | Rotation, affine, projective transformations |
| **Morphology** | Erosion, dilation, opening, closing |
| **Drawing** | Shapes, text, anti-aliased primitives |
| **Contours** | Border tracing, contour extraction |

**Parallelism**: CPU-based multithreading via `rayon` (not GPU acceleration)

**OCR Preprocessing Example**:

```rust
use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};

// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();

    // Denoise with Gaussian blur
    let blurred = gaussian_blur_f32(&gray, 1.0);

    // Adaptive thresholding for varying lighting
    let binary = adaptive_threshold(&blurred, 21);

    // Deskew if needed
    let angle = detect_skew(&binary); // Custom function
    let deskewed = rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]));

    deskewed
}
```

### 4.3 GPU Acceleration Options for Image Processing

**Current State**: `imageproc` does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:

**Option 1: `wgpu` + Custom Compute Shaders**

```rust
use wgpu;

// GPU compute shader for image processing
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl")),
});
```

**Option 2: OpenCV-Rust Bindings** (if CUDA needed)

- Provides GPU-accelerated operations via CUDA
- Requires OpenCV C++ installation
- Not pure Rust

**Option 3: Integrate with ML Framework GPU Ops**

- Use candle/ort tensor operations for preprocessing
- Leverage existing GPU context
- Keep preprocessing on same device as inference

**Recommendation for ruvector-scipix**:

- Use `image` + `imageproc` for CPU preprocessing (fast enough for most cases)
- For a GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
- Leverage rayon parallelism for batch processing (see the sketch below)
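Building on the `preprocess_for_ocr` pipeline above, a minimal sketch of the rayon-parallel batch step might look like this (the batch function itself is illustrative, not an existing API):

```rust
use image::{DynamicImage, GrayImage};
use rayon::prelude::*;

/// Preprocess a batch of page images across all CPU cores before inference.
/// `preprocess_for_ocr` is the single-image pipeline sketched in Section 4.2.
fn preprocess_batch(images: &[DynamicImage]) -> Vec<GrayImage> {
    images
        .par_iter()              // rayon splits the batch across worker threads
        .map(preprocess_for_ocr)
        .collect()
}
```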
---

## 5. GPU Acceleration Options

### 5.1 Cross-Platform GPU Support in 2025

The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established vendor APIs.

**Unified Backend: `wgpu` (WebGPU Standard)**

- **Targets**: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
- **Use Case**: Portable GPU compute without vendor lock-in
- **Frameworks**: Burn, Candle (WGPU backend), kalosm

**Performance Profile**:

| Backend | Platform | Speedup vs CPU | Use Case |
|---------|----------|----------------|----------|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |

### 5.2 CUDA Support

**Primary Library**: `cudarc` (low-level CUDA bindings)

**Integration via ONNX Runtime**:

```toml
[dependencies]
ort = { version = "2.0", features = ["cuda"] }
```

**Requirements**:

- CUDA Toolkit 11.6+ (for ort)
- NVIDIA GPU: Maxwell or newer
- Compute Capability 5.0+

**Benefits**:

- Industry-standard ML acceleration
- Mature ecosystem and tooling
- Extensive operator coverage
- Best-in-class performance for training and inference

### 5.3 Metal Support (Apple Silicon)

**Framework Integration**:

- **Candle**: Native Metal backend via `metal` crate
- **Burn**: Metal support through `burn-metal` backend
- **ONNX Runtime**: CoreML execution provider (Metal-accelerated)

**Example (Candle)**:

```rust
use candle_core::{Device, Tensor};

let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
```

**Performance**: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips

### 5.4 WebGPU/WGPU

**Purpose**: Cross-platform GPU compute for WASM and native

**Frameworks with WGPU Support**:

- **Burn**: First-class WGPU backend
- **Candle**: WGPU support for browser deployment
- **Kalosm**: WGPU acceleration via Fusor (0.5 release)

**Browser Deployment**:

```rust
// WASM-compatible inference device selection
#[cfg(target_arch = "wasm32")]
use candle_core::Device;

// CPU is the portable fallback; a WGPU device can be used where both the
// candle build and the browser expose WebGPU support.
let device = Device::Cpu;
```

**Benefits**:

- Browser-based ML inference without a server
- Works on AMD GPUs (unlike CUDA)
- Portable across desktop and web
- Future-proof standard (W3C specification)

**Limitations**:

- Lower performance than native CUDA/Metal
- Browser memory constraints (typically 2-8GB)
- First token latency: ~120ms (acceptable for many use cases)

### 5.5 TensorRT (NVIDIA Optimization)

**Purpose**: Optimized ONNX model execution on NVIDIA GPUs

**Requirements**:

- NVIDIA GPU: GeForce 9xx series or newer
- TensorRT 8.4+
- CUDA 11.6+

**Integration**:

```toml
ort = { version = "2.0", features = ["cuda", "tensorrt"] }
```

**Benefits**:

- Automatic kernel fusion and layer optimization
- Mixed precision (FP32/FP16/INT8)
- Up to 2-5x faster than standard CUDA
- Optimal for high-throughput production deployment

### 5.6 OpenVINO (Intel)

**Target**: Intel CPUs (6th gen+) and Intel integrated GPUs

**Use Case**:

- Intel-based servers without a discrete GPU
- Edge devices with Intel processors
- Cost-effective acceleration without NVIDIA hardware

**Integration**:

```toml
ort = { version = "2.0", features = ["openvino"] }
```

**Performance**: 5-20x CPU speedup depending on model and hardware
### 5.7 GPU Acceleration Recommendation for ruvector-scipix

**Tiered Approach**:

1. **Primary (Production)**: `ort` with CUDA/TensorRT
   - Maximum performance for server deployment
   - Best operator coverage for PaddleOCR models
   - Production-proven reliability

2. **Secondary (Apple Ecosystem)**: `candle` with Metal
   - Native Apple Silicon support
   - Good for macOS/iOS deployment
   - Smaller binary size than ONNX Runtime

3. **Tertiary (WASM/Browser)**: `candle` or `tract` with WGPU
   - Client-side OCR in the browser
   - Privacy-preserving (no server upload)
   - Acceptable performance for interactive use

4. **Fallback (CPU-only)**: `tract` or `ort` with optimized CPU execution
   - MKL/OpenBLAS acceleration
   - Rayon parallelism
   - Still faster than Python alternatives

---

## 6. WebAssembly Compilation Considerations

### 6.1 WASM for ML: Current State (2025)

**Key Finding**: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.

**Performance Characteristics**:

- Rust compiles to WASM **faster** than C++
- Rust produces **smaller binaries** than C++ WASM
- **Memory efficiency**: Rust's ownership model translates well to WASM linear memory
- Consistent performance across browsers

### 6.2 Memory Constraints and Optimization

**Browser Memory Limits**:

- Typical: 2-4GB per tab (Chrome/Firefox)
- Maximum: 4-8GB (varies by browser/OS)
- **Critical Issue**: Running multiple models can exhaust memory quickly

**Memory Optimization Strategies**:

**1. Model Quantization**

```rust
// INT8 quantization reduces memory by 4x
// FP16 quantization reduces memory by 2x
let quantized_model = model.quantize(QuantizationType::QInt8)?;
```

**2. Memory Reuse**

```rust
// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
    input_buffer: Vec<f32>,
    output_buffer: Vec<f32>,
}

impl InferenceContext {
    fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
        self.input_buffer.copy_from_slice(data);
        model.run(&self.input_buffer, &mut self.output_buffer)?;
        Ok(&self.output_buffer)
    }
}
```

**3. Lazy Loading with Streaming Compile**

```rust
// Use WebAssembly.instantiateStreaming for faster startup
// Load models on demand, not at initialization
// (illustrative; in practice this goes through js-sys/web-sys bindings)
async fn load_model_lazy(url: &str) -> Result<JsValue, JsValue> {
    let response = window.fetch(url).await?;
    let module = WebAssembly::instantiate_streaming(response).await?;
    Ok(module)
}
```

**4. wasm-opt Optimization**

```bash
# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm
```

**5. Model Cleanup**

```rust
// Explicit cleanup when switching models
impl Drop for ModelContext {
    fn drop(&mut self) {
        // Free GPU resources
        self.gpu_buffers.clear();
        // Trigger garbage collection hint (if available)
    }
}
```
### 6.3 Bundle Size Considerations

**Challenge**: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.

**Mitigation Strategies**:

**1. Code Splitting**

```rust
// Load OCR functionality separately from the main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
    // Lazy-load OCR model
    let model = load_model("ocr.onnx").await?;
    Ok(OcrEngine::new(model))
}
```

**2. Minimal Features**

```toml
[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }
```

**3. Compression**

```bash
# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br

# Gzip fallback
gzip -9 output.wasm
```

**4. Tree Shaking**

```toml
[profile.release]
opt-level = "z"   # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true
```

**Expected Sizes**:

| Configuration | Uncompressed | Brotli | Gzip |
|---------------|--------------|--------|------|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |

### 6.4 WASM-Specific Limitations

**1. Threading Constraints**

- SharedArrayBuffer required for multi-threading
- COEP/COOP headers needed for isolation
- Not all browsers support WASM threads

**2. SIMD Support**

- WASM SIMD enabled by default in modern browsers
- Significant performance boost for ML operations
- Check browser compatibility: `wasm-feature-detect`

**3. No Direct File System Access**

- Use IndexedDB or the Cache API for model storage
- Stream models from the network (HTTP/2)
- Consider embedding small models in the binary

**4. GPU Access**

- WebGPU required for GPU acceleration
- Not universally supported (as of 2025, Chrome/Edge primarily)
- Fallback to CPU inference needed

### 6.5 Recommended WASM Frameworks for ruvector-scipix

**Primary: `candle` with WGPU**

- Smallest binary size
- Native WASM support
- WebGPU acceleration when available
- Hugging Face ecosystem

**Secondary: `tract`**

- Pure Rust, no C++ dependencies
- Excellent WASM support
- Proven in production (Sonos)
- CPU-optimized

**Alternative: `ort` with WASM backend**

- Full ONNX operator support
- Can use tract or candle as backend
- Larger bundle size

**Example WASM Integration**:

```rust
use wasm_bindgen::prelude::*;
use candle_core::{Device, Tensor};

#[wasm_bindgen]
pub struct OcrEngine {
    model: candle_onnx::Model,
    device: Device,
}

#[wasm_bindgen]
impl OcrEngine {
    #[wasm_bindgen(constructor)]
    pub async fn new() -> Result<OcrEngine, JsValue> {
        // Use WebGPU if available, fall back to CPU
        let device = Device::Cpu; // Or a WGPU device when available

        // Load model from URL (fetch_model is an application helper)
        let model_bytes = fetch_model("model.onnx").await?;
        let model = candle_onnx::read(&model_bytes)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;

        Ok(OcrEngine { model, device })
    }

    pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
        // Preprocess image (application helper)
        let tensor = preprocess_image(image_data, &self.device)?;

        // Run inference
        let output = self.model.forward(&[tensor])
            .map_err(|e| JsValue::from_str(&e.to_string()))?;

        // Decode output (application helper)
        let text = decode_predictions(output)?;
        Ok(text)
    }
}
```

### 6.6 WASM Deployment Checklist

- [ ] Enable WASM SIMD in build (`RUSTFLAGS='-C target-feature=+simd128'`)
- [ ] Optimize bundle size (`opt-level = "z"`, LTO, strip)
- [ ] Implement lazy loading for models
- [ ] Set up proper CORS headers for model fetching
- [ ] Add WebGPU feature detection with CPU fallback
- [ ] Configure Brotli/Gzip compression on the CDN
- [ ] Test memory usage across browsers (especially mobile)
- [ ] Implement model cleanup on tab close
- [ ] Add loading indicators for async model initialization
- [ ] Consider a service worker for model caching

---
## 7. Memory Management for Large Models

### 7.1 Memory Challenges in ML Inference

**Typical OCR Model Sizes**:

- PaddleOCR Detection: 3-10MB (FP32)
- PaddleOCR Recognition: 5-15MB (FP32)
- TrOCR: 50-300MB (depending on variant)
- Tesseract trained data: 10-50MB per language

**Memory Consumption Beyond Model Weights** (a rough estimate follows the list):

- Input tensors: Image size × channels × precision
- Intermediate activations: Varies by architecture (can exceed model size)
- Output buffers: Sequence length × vocab size
- KV cache (for transformers): Context length × hidden size × layers
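As a rough back-of-the-envelope check on the first of those terms, the sketch below estimates the input-tensor footprint for a batch of FP32 images in NCHW layout; the 960×960 detection input size and batch of 8 are illustrative assumptions, not properties of any particular model.

```rust
/// Bytes required for a batch of FP32 image tensors in NCHW layout.
fn input_tensor_bytes(batch: usize, channels: usize, height: usize, width: usize) -> usize {
    batch * channels * height * width * std::mem::size_of::<f32>()
}

fn main() {
    // Example: 8 RGB detection inputs at an assumed 960x960 resolution.
    let bytes = input_tensor_bytes(8, 3, 960, 960);
    println!("input tensors: {:.1} MiB", bytes as f64 / (1024.0 * 1024.0));
    // ~84.4 MiB -- easily several times the detection model's own weights.
}
```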
### 7.2 Quantization Strategies

**INT8 Quantization** (4x memory reduction)

```rust
// ONNX Runtime quantization
use ort::quantization::{QuantizationConfig, QuantizationType};

let config = QuantizationConfig::default()
    .with_per_channel(true)
    .with_reduce_range(true);

let quantized_model = ort::quantize("model.onnx", "model_int8.onnx", config)?;
```

**Benefits**:

- 75% memory reduction (FP32 → INT8)
- Minimal accuracy loss (typically <1% for OCR)
- Faster inference on integer-optimized hardware
- Reduced cache pressure

**FP16 Quantization** (2x memory reduction)

```rust
// Using ort with the half crate
use half::f16;
use ort::tensor::OrtOwnedTensor;

let input_f16: Vec<f16> = input_f32.iter().map(|&x| f16::from_f32(x)).collect();
let tensor = OrtOwnedTensor::from_array(input_f16)?;
```

**Benefits**:

- Better accuracy preservation than INT8
- Native support on modern GPUs (Tensor Cores)
- Still significant memory savings

**Dynamic Quantization** (Runtime)

```rust
// tract supports dynamic quantization
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .quantize()? // Automatic quantization
    .into_optimized()?
    .into_runnable()?;
```

### 7.3 Memory Pooling and Reuse

**Tensor Buffer Reuse**:

```rust
use std::sync::Arc;
use parking_lot::Mutex;

struct TensorPool {
    buffers: Vec<Arc<Mutex<Vec<f32>>>>,
    size: usize,
}

impl TensorPool {
    fn new(pool_size: usize, buffer_size: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
            .collect();
        TensorPool { buffers, size: pool_size }
    }

    fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
        // Round-robin or availability-based selection
        self.buffers.first().cloned()
    }
}
```

**Session Pooling** (ONNX Runtime):

```rust
use once_cell::sync::Lazy;
use ort::Session;

static SESSION_POOL: Lazy<Vec<Session>> = Lazy::new(|| {
    (0..4).map(|_| {
        Session::builder()
            .unwrap()
            .commit_from_file("model.onnx")
            .unwrap()
    }).collect()
});

fn get_session(worker: usize) -> &'static Session {
    // Callers pass a worker/thread index to spread load across sessions
    &SESSION_POOL[worker % 4]
}
```

### 7.4 Streaming and Batching

**Batch Processing** (amortize overhead):

```rust
fn process_batch(images: &[DynamicImage], model: &Session) -> Result<Vec<String>> {
    let batch_size = images.len();

    // Create batched tensor [batch_size, channels, height, width]
    let mut batch_tensor = vec![0.0f32; batch_size * 3 * 224 * 224];
    for (i, img) in images.iter().enumerate() {
        let offset = i * 3 * 224 * 224;
        preprocess_into_buffer(img, &mut batch_tensor[offset..]);
    }

    // Single inference call for the entire batch
    let output = model.run(vec![batch_tensor.into()])?;

    // Decode batch results
    decode_batch_predictions(output, batch_size)
}
```

**Streaming Inference** (for large documents):

```rust
use futures::{Stream, StreamExt};

async fn process_document_streaming(
    pages: impl Stream<Item = DynamicImage>,
    model: &Session,
) -> impl Stream<Item = Result<String>> + '_ {
    pages.map(|page| {
        // Process one page at a time
        let text = recognize_text(&page, model)?;
        Ok(text)
    })
}
```

### 7.5 Model Sharding and Lazy Loading

**Lazy Model Loading**:

```rust
use once_cell::sync::OnceCell;

static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();

fn get_detection_model() -> &'static Session {
    DETECTION_MODEL.get_or_init(|| {
        Session::builder()
            .unwrap()
            .commit_from_file("detection.onnx")
            .unwrap()
    })
}
```

**Conditional Loading**:

```rust
// Only load language-specific models when needed
struct OcrEngine {
    detection: Session,
    recognition_models: HashMap<Language, OnceCell<Session>>,
}

impl OcrEngine {
    fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
        let boxes = self.detect(img)?;
        let rec_model = self.recognition_models
            .get(&lang)
            .unwrap()
            .get_or_init(|| load_recognition_model(lang));
        self.recognize_boxes(img, &boxes, rec_model)
    }
}
```

### 7.6 Memory Mapping (Large Models)

**Using `memmap2` for Model Files**:

```rust
use memmap2::Mmap;
use std::fs::File;

fn load_model_mmap(path: &str) -> Result<Mmap> {
    let file = File::open(path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(mmap)
}

// Model data stays on disk, paged in as needed
// Useful for models >100MB
```

**Benefits**:

- Reduced resident memory
- Faster startup (no full load)
- Shared memory across processes

**Limitations**:

- Not available in WASM
- Requires file system access
- May have higher latency on first access

### 7.7 GPU Memory Management

**CUDA Unified Memory**:

```rust
// ort automatically manages GPU memory
let session = Session::builder()?
    .with_execution_providers([ExecutionProvider::CUDA])?
    .commit_from_file("model.onnx")?;

// Tensors automatically transferred to/from GPU
```

**Manual GPU Memory Control** (candle):

```rust
use candle_core::{Device, Tensor};

let device = Device::new_cuda(0)?;

// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;

// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;

// Explicit cleanup
drop(tensor_gpu);
```

### 7.8 Memory Profiling and Monitoring

**Rust Memory Profiling Tools**:

- `valgrind --tool=massif`: Heap profiling
- `heaptrack`: Heap memory profiler (Linux)
- `dhat`: Dynamic heap analysis tool
- `tokio-console`: Async runtime monitoring

**Custom Memory Tracking**:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn get_memory_usage() -> usize {
    ALLOCATED.load(Ordering::SeqCst)
}
```
### 7.9 Memory Optimization Recommendations for ruvector-scipix

**Priority Strategies**:

1. **Quantize Models** (INT8 for production)
   - 4x memory reduction
   - Minimal accuracy impact for OCR
   - Use ONNX Runtime quantization tools

2. **Implement Tensor Pooling**
   - Reuse buffers for repeated inferences
   - Align with ruvector-core's memory management patterns
   - Use `parking_lot` for efficient synchronization

3. **Lazy Load Language Models**
   - Only load recognition models for requested languages
   - Use `OnceCell` for thread-safe initialization
   - Share models across threads

4. **Batch Processing**
   - Group multiple images into a single inference call
   - Amortize overhead, improve GPU utilization
   - Integrate with ruvector's parallel processing

5. **GPU Memory Awareness**
   - Monitor GPU memory usage
   - Implement fallback to CPU on GPU OOM
   - Use smaller batch sizes on memory-constrained devices

6. **Profile Real Workloads**
   - Measure memory with actual ruvector data
   - Identify bottlenecks (model weights vs activations)
   - Optimize based on data

---

## 8. Recommended Technology Stack for ruvector-scipix

### 8.1 Primary Stack (Production Deployment)

**Inference Engine**: `ort` (ONNX Runtime)

- **Version**: `2.0.0-rc` or latest stable
- **Features**: `cuda`, `tensorrt`, `half`, `load-dynamic`
- **Rationale**:
  - Best-in-class performance (73% latency reduction)
  - Extensive GPU support (CUDA, TensorRT, OpenVINO)
  - Production-proven (Twitter, Google, SurrealDB)
  - Largest ONNX model ecosystem

**OCR Models**: PaddleOCR v5 (ONNX format)

- **Detection**: `ch_PP-OCRv5_mobile_det.onnx`
- **Recognition**: `ch_PP-OCRv5_mobile_rec.onnx`
- **Rationale**:
  - State-of-the-art accuracy
  - Optimized for speed (5x faster in ONNX)
  - Multi-language support (80+ languages)
  - Active development (2025 updates)

**Image Processing**: `image` + `imageproc`

- **Version**: Latest stable
- **Rationale**:
  - Comprehensive format support
  - CPU parallelism via rayon (already in workspace)
  - Mature, well-tested
  - Pure Rust (no C++ dependencies)

**Dependencies Integration**:

```toml
[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }

# Image processing
image = "0.25"
imageproc = "0.25"

# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
```

### 8.2 Alternative Stack (WASM/Browser Deployment)

**Inference Engine**: `candle` with WGPU backend

- **Version**: Latest stable from Hugging Face
- **Features**: `wasm`, `webgpu`
- **Rationale**:
  - Smallest WASM bundle size
  - Native WebGPU support
  - Fast startup times
  - Pure Rust

**OCR Models**: TrOCR (via candle-onnx) or lightweight PaddleOCR

- Smaller models for browser constraints
- Quantized INT8 versions

**WASM-Specific Stack**:

```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }
```

### 8.3 Fallback Stack (Pure Rust/No External Dependencies)

**Inference Engine**: `tract`

- **Use Case**: When ONNX Runtime binaries are unavailable or pure Rust is required
- **Rationale**:
  - No C++ dependencies
  - Excellent WASM support
  - Mature (Sonos production use)
  - Passes 85% of ONNX tests

**Stack**:

```toml
[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"
```
### 8.4 Architecture Design

```
ruvector-scipix OCR pipeline

Image Input ──▶ Preprocessing ──▶ Detection ──▶ Text Boxes
  (image)        (imageproc)      (ort/ONNX)        │
                                                    ▼
Vector Store ◀── Post-Processing ◀── Recognition ◀──┘
(ruvector-core)      (decode)        (ort/ONNX)

GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
```

### 8.5 Module Structure

```
examples/scipix/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Public API
│   ├── engine.rs           # OCR engine orchestration
│   ├── detection.rs        # Text detection (ONNX)
│   ├── recognition.rs      # Text recognition (ONNX)
│   ├── preprocessing.rs    # Image preprocessing (imageproc)
│   ├── postprocessing.rs   # Result decoding and formatting
│   ├── models.rs           # Model loading and management
│   └── config.rs           # Configuration
├── models/                 # ONNX model files (gitignored)
│   ├── detection.onnx
│   ├── recognition.onnx
│   └── dict.txt
├── tests/
│   ├── integration_test.rs
│   └── benchmark.rs
└── docs/
    ├── 01_REQUIREMENTS.md
    ├── 02_ARCHITECTURE.md
    └── 03_RUST_ECOSYSTEM.md  # This document
```

### 8.6 Performance Targets

Based on PaddleOCR benchmarks and Rust optimizations:

| Metric | Target | Hardware |
|--------|--------|----------|
| **Detection Latency** | <50ms | NVIDIA T4 (TensorRT) |
| **Recognition Latency** | <20ms | NVIDIA T4 (TensorRT) |
| **End-to-End (single image)** | <100ms | NVIDIA T4 |
| **Throughput (batched)** | >100 images/sec | NVIDIA T4 |
| **CPU Latency** | <500ms | Modern multi-core CPU |
| **WASM Latency** | <1s | Browser (WebGPU) |
| **Memory Usage** | <500MB | With INT8 quantization |

### 8.7 Development Phases

**Phase 1: Core Implementation (ort + PaddleOCR)**

- Implement detection and recognition pipelines
- Integrate with ruvector-core storage
- CPU-only inference initially
- Basic preprocessing (resize, normalize)

**Phase 2: GPU Acceleration**

- Add CUDA/TensorRT support
- Benchmark and optimize performance
- Implement batching for throughput
- Memory pooling and reuse

**Phase 3: Production Hardening**

- Model quantization (INT8)
- Error handling and fallbacks
- Metrics and monitoring
- Load testing

**Phase 4: WASM Support (Optional)**

- Port to candle or tract
- Browser deployment
- WebGPU acceleration
- Client-side OCR

### 8.8 Testing Strategy

**Unit Tests** (an example follows):

- Image preprocessing correctness
- Model loading and initialization
- Tensor shape validation
- Output decoding accuracy
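As a concrete example of the first bullet, a minimal preprocessing unit test might look like the following; it assumes the `preprocess_for_ocr` helper from Section 4.2 and a hypothetical fixture path.

```rust
#[cfg(test)]
mod tests {
    use super::preprocess_for_ocr;
    use image::GenericImageView;

    #[test]
    fn preprocessing_outputs_single_channel_image() {
        // Hypothetical fixture; any small RGB test image will do.
        let img = image::open("tests/fixtures/sample.jpg").unwrap();
        let processed = preprocess_for_ocr(&img);

        // The grayscale output keeps the original dimensions and is non-empty.
        assert_eq!(processed.dimensions(), (img.width(), img.height()));
        assert!(!processed.as_raw().is_empty());
    }
}
```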
**Integration Tests**:

```rust
#[test]
fn test_end_to_end_ocr() {
    let engine = OcrEngine::new(Config::default()).unwrap();
    let img = image::open("tests/fixtures/sample.jpg").unwrap();

    let result = engine.recognize_text(&img).unwrap();
    assert!(result.contains("expected text"));
}
```

**Benchmarks** (using Criterion):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_detection(c: &mut Criterion) {
    let engine = setup_engine();
    let img = load_test_image();

    c.bench_function("detection", |b| {
        b.iter(|| engine.detect(black_box(&img)))
    });
}

criterion_group!(benches, benchmark_detection);
criterion_main!(benches);
```

**Performance Tests**:

- Latency under various image sizes
- Throughput with batching
- Memory usage over time
- GPU utilization

---

## 9. Integration with ruvector-core Dependencies

### 9.1 Shared Workspace Dependencies

The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.

**Already Available (from workspace)**:

| Dependency | ruvector Use | scipix Use |
|------------|--------------|-------------|
| `rayon` | Parallel distance computation | Batch image preprocessing, parallel OCR |
| `ndarray` | Vector operations | Tensor manipulation, image arrays |
| `parking_lot` | Lock-free data structures | Model pool synchronization |
| `dashmap` | Concurrent hash maps | Model cache, result cache |
| `tokio` | Async runtime | Async inference, streaming |
| `serde` / `serde_json` | Serialization | Config, results serialization |
| `thiserror` / `anyhow` | Error handling | OCR error types |
| `tracing` | Logging | Inference timing, debugging |
| `uuid` | Unique identifiers | Request tracking |
| `chrono` | Timestamps | Inference metrics |

**Benefits**:

- **Minimal new dependencies**: Only add OCR-specific crates
- **Consistent patterns**: Same error handling, logging, and async style across the codebase
- **Binary size**: Shared dependencies are not duplicated
- **Maintenance**: Updates to workspace deps benefit all crates

### 9.2 Parallel Processing Integration

**Leverage rayon for Batch OCR**:

```rust
use rayon::prelude::*;

fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Vec<Result<String>> {
    images.par_iter()
        .map(|img| engine.recognize_text(img))
        .collect()
}
```

**Consistency**: Matches ruvector-core's parallel distance computation pattern

### 9.3 Storage Integration

**Store OCR Results in ruvector-core**:

```rust
use ruvector_core::{VectorStore, Vector};

struct OcrResult {
    text: String,
    embedding: Vec<f32>,              // From embedding model
    bounding_boxes: Vec<BoundingBox>,
}

impl OcrResult {
    fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<VectorId> {
        let vector = Vector::new(self.embedding.clone());
        let id = store.insert(vector)?;

        // Store metadata
        store.set_metadata(id, "text", &self.text)?;
        store.set_metadata(id, "boxes", &self.bounding_boxes)?;

        Ok(id)
    }
}
```

**Vector Search for OCR Results**:

```rust
// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;
```
### 9.4 WASM Compatibility

**ruvector-core WASM Patterns**:

- `memory-only` feature for WASM targets
- `wasm-bindgen` for browser interop
- `getrandom` with the `wasm_js` feature

**Apply to scipix**:

```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }

[features]
default = ["ort-backend"]
ort-backend = ["ort"]
candle-backend = ["candle-core", "candle-onnx"]
wasm = ["candle-backend"]  # WASM uses candle
```

### 9.5 Error Handling Patterns

**Consistent with ruvector-core**:

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum OcrError {
    #[error("Model loading failed: {0}")]
    ModelLoadError(String),

    #[error("Inference failed: {0}")]
    InferenceError(String),

    #[error("Image preprocessing failed: {0}")]
    PreprocessingError(#[from] image::ImageError),

    #[error("ONNX Runtime error: {0}")]
    OrtError(#[from] ort::Error),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, OcrError>;
```

### 9.6 Configuration Pattern

**Similar to ruvector-core config**:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
    /// Path to detection model
    pub detection_model_path: String,
    /// Path to recognition model
    pub recognition_model_path: String,
    /// Use GPU acceleration if available
    pub use_gpu: bool,
    /// Batch size for parallel processing
    pub batch_size: usize,
    /// Detection confidence threshold
    pub detection_threshold: f32,
    /// Number of inference threads
    pub num_threads: usize,
}

impl Default for OcrConfig {
    fn default() -> Self {
        Self {
            detection_model_path: "models/detection.onnx".into(),
            recognition_model_path: "models/recognition.onnx".into(),
            use_gpu: true,
            batch_size: 8,
            detection_threshold: 0.7,
            num_threads: rayon::current_num_threads(),
        }
    }
}
```
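A short usage sketch for this config type, assuming it is stored as JSON on disk (the `scipix.json` path and the serde_json round-trip are illustrative conventions, not something ruvector-core prescribes):

```rust
use std::fs;

fn load_config(path: &str) -> anyhow::Result<OcrConfig> {
    // Fall back to compiled-in defaults when no config file exists yet.
    match fs::read_to_string(path) {
        Ok(raw) => Ok(serde_json::from_str(&raw)?),
        Err(_) => Ok(OcrConfig::default()),
    }
}

fn main() -> anyhow::Result<()> {
    let config = load_config("scipix.json")?;
    println!("batch_size = {}, use_gpu = {}", config.batch_size, config.use_gpu);
    Ok(())
}
```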
### 9.7 Async Integration

**Use tokio for async OCR**:

```rust
use std::sync::Arc;
use futures::{Stream, StreamExt};
use tokio::task;

pub struct AsyncOcrEngine {
    engine: Arc<OcrEngine>,
}

impl AsyncOcrEngine {
    pub async fn recognize_text(&self, image: DynamicImage) -> Result<String> {
        let engine = Arc::clone(&self.engine);

        // Run blocking OCR in the tokio blocking threadpool
        task::spawn_blocking(move || {
            engine.recognize_text_sync(&image)
        }).await?
    }

    pub async fn process_stream(
        &self,
        images: impl Stream<Item = DynamicImage>,
    ) -> impl Stream<Item = Result<String>> + '_ {
        images.then(move |img| {
            let engine = Arc::clone(&self.engine);
            async move { engine.recognize_text(img).await }
        })
    }
}
```

### 9.8 Metrics Integration

**Use existing tracing infrastructure**:

```rust
use tracing::{info, debug, instrument};

#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
    let start = std::time::Instant::now();
    debug!("Starting OCR for image {}x{}", image.width(), image.height());

    let preprocessed = self.preprocess(image)?;
    debug!("Preprocessing took {:?}", start.elapsed());

    let boxes = self.detect(&preprocessed)?;
    debug!("Detection found {} boxes in {:?}", boxes.len(), start.elapsed());

    let text = self.recognize(&preprocessed, &boxes)?;
    info!(
        "OCR completed in {:?}, extracted {} characters",
        start.elapsed(),
        text.len()
    );

    Ok(OcrResult { text, boxes })
}
```

### 9.9 Testing Infrastructure Reuse

**Use workspace test dependencies**:

```toml
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"
```

**Property-Based Testing** (like ruvector-core):

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_preprocessing_preserves_aspect_ratio(
        width in 100u32..2000u32,
        height in 100u32..2000u32
    ) {
        let img = DynamicImage::new_rgb8(width, height);
        let processed = preprocess_image(&img).expect("preprocessing failed");

        let original_ratio = width as f32 / height as f32;
        let processed_ratio = processed.width() as f32 / processed.height() as f32;
        prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
    }
}
```

### 9.10 Dependency Summary for scipix

**New Dependencies Required**:

```toml
[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"

# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }

# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }
```

**Total New Dependencies**: 3 (ort, image, imageproc)
**Reused Dependencies**: 12 from workspace

---

## 10. License Compatibility

### 10.1 ruvector Project License

**Current License**: MIT (from workspace `Cargo.toml`)

**Requirement**: All dependencies must be MIT-compatible for redistribution.

### 10.2 Recommended Dependencies License Analysis

| Crate | License | Compatible? | Notes |
|-------|---------|-------------|-------|
| **ort** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed, fully compatible |
| **candle** | MIT OR Apache-2.0 | ✅ Yes | Hugging Face, dual-licensed |
| **tract** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed (except ONNX protos) |
| **image** | MIT OR Apache-2.0 | ✅ Yes | Pure Rust, dual-licensed |
| **imageproc** | MIT | ✅ Yes | Permissive, MIT-only |
| **ndarray** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **rayon** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **wasm-bindgen** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |

**Libraries Requiring Extra Care (Avoid)**:

| Crate | License | Issue |
|-------|---------|-------|
| **leptess** | MIT (wrapper) | ❌ Depends on the Tesseract C++ library (Apache-2.0 system dependency) |
| **opencv-rust** | MIT (wrapper) | ❌ Depends on OpenCV (Apache-2.0, complex build) |

### 10.3 ONNX Model Licenses

PaddleOCR models used in ONNX format are licensed under **Apache-2.0**.
**Compatibility**:

- ✅ Apache-2.0 code can be used in MIT-licensed projects
- ✅ ONNX models (weights) are typically considered data, not code
- ✅ Distribution of pre-trained models is permitted
- ⚠️ Derivative works of Apache-2.0 code require patent grant preservation

**Best Practice**:

- Download PaddleOCR ONNX models from official sources
- Include LICENSE file in `models/` directory
- Document model provenance in README
- Do not modify Apache-2.0 code (use as-is via ONNX)

### 10.4 Rust Dual-Licensing Best Practices

**Why Rust Uses MIT OR Apache-2.0**:

- **MIT**: Maximum permissiveness, minimal restrictions
- **Apache-2.0**: Patent protection, better for corporate use
- **Dual License**: Users choose which applies to them

**For ruvector-scipix**:

**Option 1: Keep MIT-only (Current)**

- ✅ Simplest licensing
- ✅ Maximum compatibility
- ✅ Minimal legal overhead
- ✅ All dependencies are MIT-compatible

**Option 2: Adopt Dual MIT/Apache-2.0**

- ✅ Better patent protection
- ✅ Aligns with Rust ecosystem norms
- ✅ More attractive to enterprise users
- ⚠️ Slightly more complex

**Recommendation**: Keep MIT-only for simplicity, unless patent concerns arise.

### 10.5 License Compliance Checklist

**For Production Deployment**:

- [ ] Verify all direct dependencies are MIT or MIT/Apache-2.0
- [ ] Check transitive dependencies for license conflicts
- [ ] Include LICENSE file in repository
- [ ] Document third-party licenses in NOTICE file
- [ ] Include PaddleOCR model license in `models/LICENSE`
- [ ] Add copyright headers to source files (optional for MIT)
- [ ] Review ONNX Runtime's license (MIT, but check binary distribution terms)
- [ ] Ensure no GPL/LGPL dependencies (incompatible with MIT)

**Automated License Checking**:

```bash
# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features

# Fail build on incompatible licenses
cargo deny check licenses
```

**`deny.toml` Configuration**:

```toml
[licenses]
unlicensed = "deny"
allow = [
    "MIT",
    "Apache-2.0",
    "Apache-2.0 WITH LLVM-exception",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "ISC",
    "Unicode-DFS-2016",
]
deny = [
    "GPL-2.0",
    "GPL-3.0",
    "AGPL-3.0",
]
```
### 10.6 Attribution Requirements

**MIT License Requirements**:

- Include copyright notice
- Include permission notice (LICENSE file)
- No obligation to disclose source code modifications

**For PaddleOCR Models (Apache-2.0)**:

- Include NOTICE file if provided
- Preserve copyright and patent notices
- Document significant modifications (if any)

**Recommended NOTICE File**:

```
ruvector-scipix
Copyright 2025 Ruvector Team

This software includes components from:

1. ONNX Runtime
   Copyright Microsoft Corporation
   Licensed under MIT License

2. PaddleOCR Models
   Copyright PaddlePaddle Authors
   Licensed under Apache License 2.0
   Model files located in models/ directory

3. Candle ML Framework
   Copyright Hugging Face, Inc.
   Licensed under MIT OR Apache-2.0

Complete license texts available in the LICENSE and models/LICENSE files.
```

### 10.7 License Compatibility Summary

**✅ SAFE TO USE** (Recommended Stack):

- `ort` - MIT/Apache-2.0
- `image` - MIT/Apache-2.0
- `imageproc` - MIT
- `candle` - MIT/Apache-2.0
- `tract` - MIT/Apache-2.0
- PaddleOCR ONNX models - Apache-2.0 (data)

**⚠️ USE WITH CAUTION**:

- `leptess` - Requires the Tesseract C++ library (complex licensing)
- `opencv-rust` - Requires OpenCV (large dependency, Apache-2.0)

**❌ AVOID**:

- Any GPL/LGPL libraries (incompatible with MIT for proprietary use)
- Proprietary OCR engines (licensing fees, redistribution restrictions)

**Final Recommendation**: The proposed stack (`ort` + PaddleOCR + `image`/`imageproc`) is **fully compatible** with ruvector's MIT license and follows Rust ecosystem best practices.

---

## 11. Final Recommendations

### 11.1 Optimal Technology Stack

**Primary Recommendation (Production)**:

```toml
[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }

# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"

# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx

# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }

# Integration
ruvector-core = { path = "../../crates/ruvector-core" }
```

**Rationale**:

1. **Performance**: `ort` provides a 73% latency reduction vs alternatives
2. **Ecosystem**: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
3. **GPU Support**: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
4. **Production Ready**: Used by Twitter, Google, SurrealDB
5. **License**: MIT/Apache-2.0 dual-license (fully compatible)
6. **Maintenance**: Actively developed; the underlying ONNX Runtime is maintained by Microsoft
### 11.2 Alternative Stacks by Use Case

**WASM/Browser Deployment**:

```toml
candle-core = { version = "0.8", features = ["wasm", "webgpu"] }
candle-onnx = "0.8"
```

- Smallest bundle size (~180KB Brotli)
- WebGPU acceleration
- Fast startup (120ms first token)

**Pure Rust / No External Deps**:

```toml
tract-onnx = "0.22"
```

- No C++ dependencies
- Excellent for embedded/restrictive environments
- 85% ONNX compatibility

**Edge Devices / Raspberry Pi**:

```toml
tract-onnx = { version = "0.22", features = ["pulse"] }
```

- Optimized for CPU inference
- Minimal memory footprint
- Proven on RPi (11μs for CNN models)

### 11.3 Implementation Roadmap

**Week 1-2: Core Infrastructure**

- Set up `examples/scipix` crate structure
- Integrate `ort` and `image`/`imageproc`
- Implement model loading (detection + recognition)
- Basic end-to-end pipeline (CPU-only)

**Week 3-4: GPU Acceleration**

- Enable CUDA/TensorRT support
- Implement batching for throughput
- Benchmark performance vs targets
- Memory pooling and optimization

**Week 5-6: Production Hardening**

- Model quantization (INT8)
- Error handling and recovery
- Metrics and monitoring (tracing)
- Integration tests and benchmarks

**Week 7-8: ruvector Integration**

- Store OCR results in ruvector-core
- Implement vector search for documents
- Async API with tokio
- Documentation and examples

**Optional (Week 9-10): WASM Support**

- Port to candle for browser deployment
- WebGPU acceleration
- Client-side OCR demo

### 11.4 Key Metrics to Track

**Performance**:

- Detection latency: Target <50ms (GPU), <200ms (CPU)
- Recognition latency: Target <20ms (GPU), <100ms (CPU)
- End-to-end: Target <100ms (GPU), <500ms (CPU)
- Throughput: Target >100 images/sec (batched, GPU)

**Memory**:

- Model size: ~15-30MB (FP32), ~5-10MB (INT8)
- Runtime memory: Target <500MB
- GPU memory: Monitor for OOM

**Accuracy**:

- Character accuracy: Target >95% (clean text)
- Word accuracy: Target >90%
- Benchmark against Tesseract and commercial APIs

### 11.5 Risk Mitigation

**Model Availability**:

- ✅ PaddleOCR models freely available
- ✅ Multiple model versions for fallback
- ⚠️ Verify ONNX export quality (may need custom conversion)

**Dependency Stability**:

- ✅ `ort` actively maintained (2.0 rc, stable release expected)
- ✅ `image`/`imageproc` mature, widely used
- ⚠️ Monitor for breaking changes during updates

**Performance Variability**:

- ⚠️ GPU performance depends on driver versions
- ⚠️ WASM performance varies by browser
- ✅ Comprehensive benchmarking before production

**License Compliance**:

- ✅ All recommended dependencies MIT-compatible
- ✅ PaddleOCR Apache-2.0 (compatible for use)
- ⚠️ Review licenses before adding new dependencies

### 11.6 Success Criteria

The ruvector-scipix implementation is successful if:

1. **Performance**: Meets or exceeds latency/throughput targets
2. **Accuracy**: Character accuracy >95% on clean text
3. **Integration**: Seamlessly stores results in ruvector-core
4. **Portability**: Runs on Linux/macOS/Windows, CPU and GPU
5. **Memory**: Operates within the <500MB budget
6. **License**: Maintains MIT compatibility
7. **Maintainability**: Uses idiomatic Rust, well-documented
8. **Scalability**: Handles batch processing efficiently
### 11.7 Next Steps

1. **Review this document** with the ruvector team for alignment
2. **Download PaddleOCR models** (detection + recognition ONNX)
3. **Set up the `examples/scipix` crate** with recommended dependencies
4. **Implement a basic OCR pipeline** (end-to-end proof of concept)
5. **Benchmark the initial implementation** against targets
6. **Iterate and optimize** based on real-world data
7. **Document the API** and usage examples
8. **Integrate with ruvector-core** for vector storage

---

## References and Resources

### Documentation

- [ort Documentation](https://ort.pyke.io/) - ONNX Runtime Rust bindings by pykeio
- [Candle GitHub](https://github.com/huggingface/candle) - Minimalist ML framework for Rust
- [tract GitHub](https://github.com/sonos/tract) - Tiny, no-nonsense ONNX/TF inference
- [PaddleOCR GitHub](https://github.com/PaddlePaddle/PaddleOCR) - OCR models and documentation
- [imageproc Docs](https://docs.rs/imageproc) - Rust image processing library

### Performance Benchmarks

- [Rust at the Metal: The GPU Layer Driving Modern AI](https://rustacean.ai/p/issue-2-rust-at-the-metal-the-gpu-layer-driving-modern-ai)
- [Rust for Machine Learning in 2025](https://markaicode.com/rust-machine-learning-framework-comparison-2025/)
- [PaddleOCR 3.0 High-Performance Inference](http://www.paddleocr.ai/main/en/version3.x/deployment/high_performance_inference.html)

### WASM Resources

- [WebAssembly 3.0 Performance: Rust vs C++ Benchmarks](https://markaicode.com/webassembly-3-performance-rust-cpp-benchmarks-2025/)
- [3W for In-Browser AI: WebLLM + WASM + WebWorkers](https://blog.mozilla.ai/3w-for-in-browser-ai-webllm-wasm-webworkers/)

### License Information

- [Rust API Guidelines: Licensing](https://rust-lang.github.io/api-guidelines/necessities.html)
- [PaddleOCR License](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE) - Apache-2.0
- [ONNX Runtime License](https://github.com/microsoft/onnxruntime/blob/main/LICENSE) - MIT

---

**Document Version**: 1.0
**Last Updated**: 2025-11-28
**Author**: Research and Analysis Agent
**Status**: Complete