# Rust OCR and ML Ecosystem Analysis for ruvector-scipix
## Executive Summary
This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers seven primary OCR/ML libraries, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.
**Key Finding**: The optimal stack for ruvector-scipix combines `ort` (ONNX Runtime bindings) for inference, `image`/`imageproc` for preprocessing, with optional pure Rust alternatives (`tract`, `candle`) for WASM targets.
---
## 1. Library Comparison Matrix
### OCR Libraries
| Library | Type | Model Support | WASM Support | GPU Support | Maturity | Performance | Dependencies |
|---------|------|---------------|--------------|-------------|----------|-------------|--------------|
| **ocrs** | Native Rust | ONNX (RTen engine) | ✅ Yes | ❌ No | 🟡 Preview | Medium | Minimal (Pure Rust) |
| **oar-ocr** | ONNX Wrapper | PaddleOCR ONNX | ✅ Yes | ✅ CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| **kalosm-ocr** | Pure Rust | TrOCR (candle) | ✅ Yes | ✅ WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| **leptess** | FFI Bindings | Tesseract C++ | ❌ No | ❌ No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| **paddle-ocr-rs** | ONNX Wrapper | PaddleOCR v4/v5 | ✅ Yes | ✅ CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| **pure-onnx-ocr** | Pure ONNX | PaddleOCR DBNet+SVTR | ✅ Yes | ✅ Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |
### ML Inference Engines
| Library | Purpose | Model Format | WASM Support | GPU Support | Performance | Maturity |
|---------|---------|--------------|--------------|-------------|-------------|----------|
| **ort** | ONNX Runtime | ONNX | ✅ Yes | ✅ CUDA/TensorRT/OpenVINO | **Very High** | 🟢 Production |
| **candle** | ML Framework | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| **tract** | ONNX/TF Inference | ONNX, NNEF, TF | ✅ Yes | ❌ Limited | High (CPU) | 🟢 Mature (Sonos) |
| **burn** | Deep Learning | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | Very High | 🟢 Active |
**Legend**: 🟢 Production-ready | 🟡 Active development | 🔴 Experimental
### Performance Benchmarks
Based on research findings:
- **ort + PaddleOCR**: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
- **ONNX conversion**: Up to 5x faster than PaddlePaddle native inference
- **tract**: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
- **Tesseract (leptess)**: Baseline CPU performance, requires preprocessing
- **ocrs**: Early preview, moderate performance on clear text
---
## 2. ONNX Runtime Integration Options
### 2.1 The `ort` Crate (Recommended)
**Overview**: `ort` by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.
**Key Features**:
- **Hardware Acceleration**: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
- **Dynamic Loading**: Runtime linking for flexibility (`load-dynamic` feature)
- **Alternative Backends**: Support for tract and candle backends
- **Minimal Builds**: RTTI-free, optimized binary sizes for production
- **Float16/BFloat16**: Via `half` crate integration
- **Production Proven**: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB
**Cargo Features**:
```toml
[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",         # Float16/BFloat16 support
    "load-dynamic", # Runtime dynamic linking
    "cuda",         # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",     # TensorRT optimization (requires TensorRT 8.4+)
] }
```
**Performance Characteristics**:
- Significantly faster than PyTorch for inference
- Supports model quantization (int8, float16)
- Multi-GPU distribution via NCCL
- Optimal for batch processing and real-time inference
**Integration Example**:
```rust
use ort::{GraphOptimizationLevel, Session};

// Build a session with graph optimizations and a bounded thread pool
let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;

// Run inference; the exact input-construction API differs between
// ort 2.0 release candidates (e.g. the `ort::inputs!` macro)
let outputs = session.run(ort::inputs!["input" => input_tensor]?)?;
```
### 2.2 Alternative: `tract` Backend
**Use Case**: When ONNX Runtime binaries are problematic or a WASM target is required
**Advantages**:
- Pure Rust implementation
- No external C++ dependencies
- Excellent WASM support
- Passes 85% of ONNX backend tests
- Lightweight and maintainable
**Limitations**:
- No tensor sequences or optional tensors
- Limited GPU support compared to ort
- TensorFlow 2 support via ONNX conversion only
### 2.3 Alternative: `candle` Backend
**Use Case**: When integrating with Hugging Face ecosystem or needing pure Rust
**Advantages**:
- Minimalist design, fast compilation
- Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
- WASM + WebGPU acceleration
- Small binary size for serverless deployment
- CUDA, Metal, MKL, Accelerate backends
**Limitations**:
- Younger ecosystem than ONNX Runtime
- Fewer pre-optimized OCR models available
- Focus on inference over training
---
## 3. Pure Rust ML with Candle/Tract
### 3.1 Candle Framework (Hugging Face)
**Architecture**: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.
**Supported Models**:
- **Language Models**: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
- **Vision Models**: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
- **Speech**: Whisper ASR
**Backend Support**:
| Backend | Platform | Performance | Use Case |
|---------|----------|-------------|----------|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |
**Design Philosophy**:
- Remove Python from production workloads
- Minimize binary size (critical for edge/serverless)
- Fast startup times (first token ~120ms on M2 MacBook Air)
- Rust's safety guarantees for ML workloads
**Example Usage**:
```rust
use std::collections::HashMap;

use candle_core::{Device, Tensor};

// Load model
let model = candle_onnx::read_file("model.onnx")?;
let graph = model.graph.as_ref().unwrap();

// Create device (CUDA/Metal/CPU)
let device = Device::cuda_if_available(0)?;

// Run inference; candle-onnx evaluates the graph from named inputs
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let input_name = graph.input[0].name.clone();
let outputs = candle_onnx::simple_eval(&model, HashMap::from([(input_name, input)]))?;
```
### 3.2 Tract Framework (Sonos)
**Architecture**: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.
**Key Capabilities**:
- **ONNX Support**: 85% of ONNX backend tests passing
- **Operator Set**: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
- **Proven Models**: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
- **Pulsing**: Streaming inference for time-series models (e.g., WaveNet)
- **Quantization**: Built-in int8 quantization support
**Performance Characteristics**:
- Optimized for CPU inference
- Excellent for edge devices (Raspberry Pi, embedded systems)
- Minimal memory footprint
- No RTTI or runtime overhead
**Example Usage**:
```rust
use tract_onnx::prelude::*;

// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;

// Run inference
let input = tract_ndarray::arr4(&[[...]]).into_dyn();
let result = model.run(tvec![input.into()])?;
```
**Quantization Support**:
```rust
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .quantize()? // Automatic int8 quantization
    .into_optimized()?
    .into_runnable()?;
```
### 3.3 Comparison: Candle vs Tract vs ort
| Criterion | Candle | Tract | ort |
|-----------|--------|-------|-----|
| **Performance (GPU)** | Very High | N/A | Very High |
| **Performance (CPU)** | High | Very High | Very High |
| **Binary Size** | Small | Very Small | Large |
| **Startup Time** | Fast | Very Fast | Medium |
| **WASM Support** | Excellent | Excellent | Good (with backends) |
| **Model Ecosystem** | Hugging Face | ONNX/TF | ONNX (largest) |
| **GPU Backends** | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| **Quantization** | Manual | Built-in | Excellent (ONNX tools) |
| **Maturity** | Stable (2024+) | Mature (2018+) | Production (Microsoft) |
**Recommendation**:
- **ort**: Primary choice for maximum performance and hardware acceleration
- **candle**: Secondary choice for WASM targets or Hugging Face integration
- **tract**: Fallback for pure Rust requirements or extreme size constraints
---
## 4. Image Processing in Rust
### 4.1 The `image` Crate (Foundation)
**Purpose**: Core image encoding/decoding and basic manipulation.
**Supported Formats**:
- JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF
**Key Features**:
```rust
use image::{imageops, DynamicImage, GenericImageView};

// Load image
let img = image::open("input.jpg")?;

// Basic operations (in the imageops module)
let resized = img.resize(800, 600, imageops::FilterType::Lanczos3);
let grayscale = img.grayscale();
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);
```
### 4.2 The `imageproc` Crate (Advanced Processing)
**Purpose**: Advanced image processing algorithms for computer vision.
**Modules**:
| Module | Capabilities |
|--------|-------------|
| **Contrast** | Histogram equalization, adaptive thresholding, CLAHE |
| **Corners** | Harris, FAST, Shi-Tomasi corner detection |
| **Distance Transform** | Euclidean distance maps, morphological operations |
| **Edges** | Canny edge detection, Sobel/Scharr operators |
| **Filter** | Gaussian, median, bilateral filtering |
| **Geometric** | Rotation, affine, projective transformations |
| **Morphology** | Erosion, dilation, opening, closing |
| **Drawing** | Shapes, text, anti-aliased primitives |
| **Contours** | Border tracing, contour extraction |
**Parallelism**: CPU-based multithreading via `rayon` (not GPU acceleration)
**OCR Preprocessing Example**:
```rust
use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};

// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();
    // Denoise with Gaussian blur
    let blurred = gaussian_blur_f32(&gray, 1.0);
    // Adaptive thresholding for varying lighting (block radius 21)
    let binary = adaptive_threshold(&blurred, 21);
    // Deskew if needed
    let angle = detect_skew(&binary); // Custom function, not shown
    rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]))
}
```
### 4.3 GPU Acceleration Options for Image Processing
**Current State**: `imageproc` does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:
**Option 1: `wgpu` + Custom Compute Shaders**
```rust
use wgpu;

// GPU compute shader for image processing (`device` is an existing wgpu::Device)
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl")),
});
```
**Option 2: OpenCV-Rust Bindings** (if CUDA needed)
- Provides GPU-accelerated operations via CUDA
- Requires OpenCV C++ installation
- Not pure Rust
**Option 3: Integrate with ML Framework GPU Ops**
- Use candle/ort tensor operations for preprocessing
- Leverage existing GPU context
- Keep preprocessing on same device as inference
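The core of keeping preprocessing with inference is expressing it as tensor math. The transform below — interleaved HWC `u8` pixels to a normalized planar CHW `f32` buffer — is what such a graph node computes, shown here in plain Rust as a hypothetical helper (on GPU the same steps would be candle or ONNX tensor ops; the mean/std parameters are whatever the model was trained with):

```rust
/// Convert interleaved HWC u8 pixels (RGB, row-major) to planar CHW f32,
/// scaled to [0, 1] and normalized with per-channel mean/std.
/// Mirrors what a GPU preprocessing graph would do with tensor ops.
fn hwc_to_normalized_chw(
    pixels: &[u8], // 3 bytes per pixel
    width: usize,
    height: usize,
    mean: [f32; 3],
    std: [f32; 3],
) -> Vec<f32> {
    let plane = width * height;
    let mut out = vec![0.0f32; 3 * plane];
    for i in 0..plane {
        for c in 0..3 {
            let v = pixels[i * 3 + c] as f32 / 255.0;
            // Channel c's plane occupies out[c*plane .. (c+1)*plane]
            out[c * plane + i] = (v - mean[c]) / std[c];
        }
    }
    out
}
```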
**Recommendation for ruvector-scipix**:
- Use `image` + `imageproc` for CPU preprocessing (fast enough for most cases)
- For GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
- Leverage rayon parallelism for batch processing
---
## 5. GPU Acceleration Options
### 5.1 Cross-Platform GPU Support in 2025
The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established APIs.
**Unified Backend: `wgpu` (WebGPU Standard)**
- **Targets**: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
- **Use Case**: Portable GPU compute without vendor lock-in
- **Frameworks**: Burn, Candle (WGPU backend), kalosm
**Performance Profile**:
| Backend | Platform | Speedup vs CPU | Use Case |
|---------|----------|----------------|----------|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |
### 5.2 CUDA Support
**Primary Library**: `cudarc` (Low-level CUDA bindings)
**Integration via ONNX Runtime**:
```toml
[dependencies]
ort = { version = "2.0", features = ["cuda"] }
```
**Requirements**:
- CUDA Toolkit 11.6+ (for ort)
- NVIDIA GPU: Maxwell (7xx series) or newer
- Compute Capability 5.0+
**Benefits**:
- Industry-standard ML acceleration
- Mature ecosystem and tooling
- Extensive operator coverage
- Best-in-class performance for training and inference
### 5.3 Metal Support (Apple Silicon)
**Framework Integration**:
- **Candle**: Native Metal backend via `metal` crate
- **Burn**: Metal support through `burn-metal` backend
- **ONNX Runtime**: CoreML execution provider (Metal-accelerated)
**Example (Candle)**:
```rust
use candle_core::{Device, Tensor};

let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
```
**Performance**: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips
### 5.4 WebGPU/WGPU
**Purpose**: Cross-platform GPU compute for WASM and native
**Frameworks with WGPU Support**:
- **Burn**: First-class WGPU backend
- **Candle**: WGPU support for browser deployment
- **Kalosm**: WGPU acceleration via Fusor (0.5 release)
**Browser Deployment**:
```rust
// WASM-compatible inference: the browser exposes no CUDA or Metal,
// so use the CPU device (or a WGPU device where the build supports it)
#[cfg(target_arch = "wasm32")]
use candle_core::Device;

let device = Device::Cpu;
```
**Benefits**:
- Browser-based ML inference without server
- Works on AMD GPUs (unlike CUDA)
- Portable across desktop and web
- Future-proof standard (W3C specification)
**Limitations**:
- Lower performance than native CUDA/Metal
- Browser memory constraints (typically 2-8GB)
- First token latency: ~120ms (acceptable for many use cases)
### 5.5 TensorRT (NVIDIA Optimization)
**Purpose**: Optimized ONNX model execution on NVIDIA GPUs
**Requirements**:
- NVIDIA GPU: GeForce 9xx series or newer
- TensorRT 8.4+
- CUDA 11.6+
**Integration**:
```toml
ort = { version = "2.0", features = ["cuda", "tensorrt"] }
```
**Benefits**:
- Automatic kernel fusion and layer optimization
- Mixed precision (FP32/FP16/INT8)
- Up to 2-5x faster than standard CUDA
- Optimal for high-throughput production deployment
### 5.6 OpenVINO (Intel)
**Target**: Intel CPUs (6th gen+) and Intel integrated GPUs
**Use Case**:
- Intel-based servers without discrete GPU
- Edge devices with Intel processors
- Cost-effective acceleration without NVIDIA hardware
**Integration**:
```toml
ort = { version = "2.0", features = ["openvino"] }
```
**Performance**: 5-20x CPU speedup depending on model and hardware
### 5.7 GPU Acceleration Recommendation for ruvector-scipix
**Tiered Approach**:
1. **Primary (Production)**: `ort` with CUDA/TensorRT
- Maximum performance for server deployment
- Best operator coverage for PaddleOCR models
- Production-proven reliability
2. **Secondary (Apple Ecosystem)**: `candle` with Metal
- Native Apple Silicon support
- Good for macOS/iOS deployment
- Smaller binary size than ONNX Runtime
3. **Tertiary (WASM/Browser)**: `candle` or `tract` with WGPU
- Client-side OCR in browser
- Privacy-preserving (no server upload)
- Acceptable performance for interactive use
4. **Fallback (CPU-only)**: `tract` or `ort` with optimized CPU execution
- MKL/OpenBLAS acceleration
- Rayon parallelism
- Still faster than Python alternatives
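The tiers above amount to an ordered fallback chain that can be resolved at startup. A minimal sketch (the availability flags stand in for real capability probes, e.g. attempting to create a CUDA or Metal device):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    CudaTensorRt, // Tier 1: production servers
    Metal,        // Tier 2: Apple ecosystem
    Wgpu,         // Tier 3: WASM/browser
    Cpu,          // Tier 4: always-available fallback
}

/// Walk the tiers in priority order and return the first available backend.
/// Availability would normally come from runtime probes; here it is passed in.
fn select_backend(cuda: bool, metal: bool, wgpu: bool) -> Backend {
    if cuda {
        Backend::CudaTensorRt
    } else if metal {
        Backend::Metal
    } else if wgpu {
        Backend::Wgpu
    } else {
        Backend::Cpu // CPU execution is always possible
    }
}
```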
---
## 6. WebAssembly Compilation Considerations
### 6.1 WASM for ML: Current State (2025)
**Key Finding**: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.
**Performance Characteristics**:
- Rust compiles to WASM **faster** than C++
- Rust produces **smaller binaries** than C++ WASM
- **Memory efficiency**: Rust's ownership model translates well to WASM linear memory
- Consistent performance across browsers
### 6.2 Memory Constraints and Optimization
**Browser Memory Limits**:
- Typical: 2-4GB per tab (Chrome/Firefox)
- Maximum: 4-8GB (varies by browser/OS)
- **Critical Issue**: Running multiple models can exhaust memory quickly
**Memory Optimization Strategies**:
**1. Model Quantization**
```rust
// Hypothetical API sketch: quantization is normally applied offline to the
// ONNX file rather than at runtime.
// INT8 quantization reduces memory by 4x; FP16 by 2x.
let quantized_model = model.quantize(QuantizationType::QInt8)?;
```
**2. Memory Reuse**
```rust
// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
    input_buffer: Vec<f32>,
    output_buffer: Vec<f32>,
}

impl InferenceContext {
    fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
        self.input_buffer.copy_from_slice(data);
        model.run(&self.input_buffer, &mut self.output_buffer)?;
        Ok(&self.output_buffer)
    }
}
```
**3. Lazy Loading with Streaming Compile**
```rust
// Use WebAssembly.instantiateStreaming for faster startup;
// load models on demand, not at initialization.
// Sketch only: from Rust this goes through the js-sys/web-sys bindings
// (js_sys::WebAssembly::instantiate_streaming wraps a fetch Promise).
async fn load_model_lazy(url: &str) -> Result<Module> {
    let response = window.fetch(url).await?;
    let module = WebAssembly::instantiate_streaming(response).await?;
    Ok(module)
}
```
**4. wasm-opt Optimization**
```bash
# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm
```
**5. Model Cleanup**
```rust
// Explicit cleanup when switching models
impl Drop for ModelContext {
fn drop(&mut self) {
// Free GPU resources
self.gpu_buffers.clear();
// Trigger garbage collection hint (if available)
}
}
```
### 6.3 Bundle Size Considerations
**Challenge**: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.
**Mitigation Strategies**:
**1. Code Splitting**
```rust
// Load OCR functionality separately from main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
    // Lazy-load OCR model
    let model = load_model("ocr.onnx").await?;
    Ok(OcrEngine::new(model))
}
```
**2. Minimal Features**
```toml
[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }
```
**3. Compression**
```bash
# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br
# Gzip fallback
gzip -9 output.wasm
```
**4. Tree Shaking**
```toml
[profile.release]
opt-level = "z" # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true
```
**Expected Sizes**:
| Configuration | Uncompressed | Brotli | Gzip |
|---------------|--------------|--------|------|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |
### 6.4 WASM-Specific Limitations
**1. Threading Constraints**
- SharedArrayBuffer required for multi-threading
- COEP/COOP headers needed for isolation
- Not all browsers support WASM threads
**2. SIMD Support**
- WASM SIMD enabled by default in modern browsers
- Significant performance boost for ML operations
- Check browser compatibility: `wasm-feature-detect`
**3. No Direct File System Access**
- Use IndexedDB or Cache API for model storage
- Stream models from network (HTTP/2)
- Consider embedding small models in binary
**4. GPU Access**
- WebGPU required for GPU acceleration
- Not universally supported (as of 2025, Chrome/Edge primarily)
- Fallback to CPU inference needed
### 6.5 Recommended WASM Frameworks for ruvector-scipix
**Primary: `candle` with WGPU**
- Smallest binary size
- Native WASM support
- WebGPU acceleration when available
- Hugging Face ecosystem
**Secondary: `tract`**
- Pure Rust, no C++ dependencies
- Excellent WASM support
- Proven in production (Sonos)
- CPU-optimized
**Alternative: `ort` with WASM backend**
- Full ONNX operator support
- Can use tract or candle as backend
- Larger bundle size
**Example WASM Integration**:
```rust
use wasm_bindgen::prelude::*;
use candle_core::{Device, Tensor};

#[wasm_bindgen]
pub struct OcrEngine {
    model: candle_onnx::Model,
    device: Device,
}

#[wasm_bindgen]
impl OcrEngine {
    #[wasm_bindgen(constructor)]
    pub async fn new() -> Result<OcrEngine, JsValue> {
        // Use WebGPU if available, fall back to CPU
        let device = Device::Cpu; // Or Device::new_wgpu(0)?

        // Load model bytes from a URL (fetch_model is a custom helper;
        // the exact candle_onnx loading API may differ by version)
        let model_bytes = fetch_model("model.onnx").await?;
        let model = candle_onnx::read(&model_bytes)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(OcrEngine { model, device })
    }

    pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
        // Preprocess image
        let tensor = preprocess_image(image_data, &self.device)?;
        // Run inference
        let output = self.model.forward(&[tensor])
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        // Decode output
        let text = decode_predictions(output)?;
        Ok(text)
    }
}
```
### 6.6 WASM Deployment Checklist
- [ ] Enable WASM SIMD in build (`RUSTFLAGS='-C target-feature=+simd128'`)
- [ ] Optimize bundle size (`opt-level = "z"`, LTO, strip)
- [ ] Implement lazy loading for models
- [ ] Set up proper CORS headers for model fetching
- [ ] Add WebGPU feature detection with CPU fallback
- [ ] Configure Brotli/Gzip compression on CDN
- [ ] Test memory usage across browsers (especially mobile)
- [ ] Implement model cleanup on tab close
- [ ] Add loading indicators for async model initialization
- [ ] Consider service worker for model caching
---
## 7. Memory Management for Large Models
### 7.1 Memory Challenges in ML Inference
**Typical OCR Model Sizes**:
- PaddleOCR Detection: 3-10MB (FP32)
- PaddleOCR Recognition: 5-15MB (FP32)
- TrOCR: 50-300MB (depending on variant)
- Tesseract trained data: 10-50MB per language
**Memory Consumption Beyond Model Weights**:
- Input tensors: Image size × channels × precision
- Intermediate activations: Varies by architecture (can exceed model size)
- Output buffers: Sequence length × vocab size
- KV cache (for transformers): Context length × hidden size × layers
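Those components can be summed into a rough budget before deployment. A back-of-the-envelope estimator in plain Rust (the activation multiplier is an illustrative assumption, not a measured constant, and I/O tensors are assumed to stay in FP32):

```rust
/// Rough inference memory estimate in bytes.
/// `bytes_per_param`: 4 for FP32, 2 for FP16, 1 for INT8.
/// `activation_factor`: intermediate activations as a multiple of weight
/// size (architecture-dependent; activations can exceed the weights).
fn estimate_memory_bytes(
    params: usize,
    bytes_per_param: usize,
    input_elems: usize,  // e.g. batch * channels * height * width
    output_elems: usize, // e.g. sequence length * vocab size
    activation_factor: usize,
) -> usize {
    let weights = params * bytes_per_param;
    let io = (input_elems + output_elems) * 4; // I/O tensors in FP32
    weights * (1 + activation_factor) + io
}
```

For a 10MB FP32 detection model (2.5M parameters) with a 1x3x224x224 input, this predicts roughly 20MB resident; the same model in INT8 drops to roughly 5.6MB, matching the 4x weight reduction claimed below.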
### 7.2 Quantization Strategies
**INT8 Quantization** (4x memory reduction)
```python
# The ort crate does not itself expose a quantization API; INT8 quantization
# is applied offline with ONNX Runtime's Python tooling, and the quantized
# model is then loaded from Rust like any other ONNX file.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
```
**Benefits**:
- 75% memory reduction (FP32 → INT8)
- Minimal accuracy loss (typically <1% for OCR)
- Faster inference on integer-optimized hardware
- Reduced cache pressure
**FP16 Quantization** (2x memory reduction)
```rust
// Convert FP32 inputs to FP16 with the `half` crate before feeding them to a
// session whose model was exported with float16 inputs
use half::f16;

let input_f16: Vec<f16> = input_f32.iter().map(|&x| f16::from_f32(x)).collect();
```
**Benefits**:
- Better accuracy preservation than INT8
- Native support on modern GPUs (Tensor Cores)
- Still significant memory savings
**Dynamic Quantization** (Runtime)
```rust
// tract supports quantization at model-load time
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .quantize()? // Automatic quantization
    .into_optimized()?
    .into_runnable()?;
```
### 7.3 Memory Pooling and Reuse
**Tensor Buffer Reuse**:
```rust
use std::sync::Arc;
use parking_lot::Mutex;

struct TensorPool {
    buffers: Vec<Arc<Mutex<Vec<f32>>>>,
    size: usize,
}

impl TensorPool {
    fn new(pool_size: usize, buffer_size: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
            .collect();
        TensorPool { buffers, size: pool_size }
    }

    fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
        // Round-robin or availability-based selection
        self.buffers.first().cloned()
    }
}
}
```
**Session Pooling** (ONNX Runtime):
```rust
use once_cell::sync::Lazy;
use ort::Session;

static SESSION_POOL: Lazy<Vec<Session>> = Lazy::new(|| {
    (0..4)
        .map(|_| {
            Session::builder()
                .unwrap()
                .commit_from_file("model.onnx")
                .unwrap()
        })
        .collect()
});

fn get_session(thread_id: usize) -> &'static Session {
    &SESSION_POOL[thread_id % 4]
}
```
### 7.4 Streaming and Batching
**Batch Processing** (Amortize overhead):
```rust
fn process_batch(images: &[DynamicImage], model: &Session) -> Result<Vec<String>> {
    let batch_size = images.len();

    // Create batched tensor [batch_size, channels, height, width]
    let mut batch_tensor = vec![0.0f32; batch_size * 3 * 224 * 224];
    for (i, img) in images.iter().enumerate() {
        let offset = i * 3 * 224 * 224;
        preprocess_into_buffer(img, &mut batch_tensor[offset..]);
    }

    // Single inference call for entire batch
    let output = model.run(vec![batch_tensor.into()])?;

    // Decode batch results
    decode_batch_predictions(output, batch_size)
}
```
**Streaming Inference** (For large documents):
```rust
use futures::{Stream, StreamExt};

async fn process_document_streaming(
    pages: impl Stream<Item = Image>,
    model: &Session,
) -> impl Stream<Item = Result<String>> + '_ {
    pages.map(move |page| {
        // Process one page at a time
        recognize_text(&page, model)
    })
}
```
### 7.5 Model Sharding and Lazy Loading
**Lazy Model Loading**:
```rust
use once_cell::sync::OnceCell;

static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();

fn get_detection_model() -> &'static Session {
    DETECTION_MODEL.get_or_init(|| {
        Session::builder()
            .unwrap()
            .commit_from_file("detection.onnx")
            .unwrap()
    })
}
```
**Conditional Loading**:
```rust
use std::collections::HashMap;

// Only load language-specific models when needed
struct OcrEngine {
    detection: Session,
    recognition_models: HashMap<Language, OnceCell<Session>>,
}

impl OcrEngine {
    fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
        let boxes = self.detect(img)?;
        let rec_model = self.recognition_models
            .get(&lang)
            .unwrap()
            .get_or_init(|| load_recognition_model(lang));
        self.recognize_boxes(img, &boxes, rec_model)
    }
}
```
### 7.6 Memory Mapping (Large Models)
**Using `memmap2` for Model Files**:
```rust
use memmap2::Mmap;
use std::fs::File;

fn load_model_mmap(path: &str) -> Result<Mmap> {
    let file = File::open(path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(mmap)
}

// Model data stays on disk, paged in as needed
// Useful for models >100MB
```
**Benefits**:
- Reduced resident memory
- Faster startup (no full load)
- Shared memory across processes
**Limitations**:
- Not available in WASM
- Requires file system access
- May have higher latency on first access
### 7.7 GPU Memory Management
**CUDA Unified Memory**:
```rust
use ort::{CUDAExecutionProvider, Session};

// ort manages GPU memory once the CUDA execution provider is registered
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;

// Tensors are transferred to/from the GPU automatically
```
**Manual GPU Memory Control** (candle):
```rust
use candle_core::{Device, Tensor};

let device = Device::new_cuda(0)?;

// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;

// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;

// Explicit cleanup
drop(tensor_gpu);
```
### 7.8 Memory Profiling and Monitoring
**Rust Memory Profiling Tools**:
- `valgrind --tool=massif`: Heap profiling
- `heaptrack`: Heap memory profiler (Linux)
- `dhat`: Dynamic heap analysis tool
- `tokio-console`: Async runtime monitoring
**Custom Memory Tracking**:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn get_memory_usage() -> usize {
    ALLOCATED.load(Ordering::SeqCst)
}
```
### 7.9 Memory Optimization Recommendations for ruvector-scipix
**Priority Strategies**:
1. **Quantize Models** (INT8 for production)
- 4x memory reduction
- Minimal accuracy impact for OCR
- Use ONNX Runtime quantization tools
2. **Implement Tensor Pooling**
- Reuse buffers for repeated inferences
- Align with ruvector-core's memory management patterns
- Use `parking_lot` for efficient synchronization
3. **Lazy Load Language Models**
- Only load recognition models for requested languages
- Use `OnceCell` for thread-safe initialization
- Share models across threads
4. **Batch Processing**
- Group multiple images into single inference call
- Amortize overhead, improve GPU utilization
- Integrate with ruvector's parallel processing
5. **GPU Memory Awareness**
- Monitor GPU memory usage
- Implement fallback to CPU if GPU OOM
- Use smaller batch sizes on memory-constrained devices
6. **Profile Real Workloads**
- Measure memory with actual ruvector data
- Identify bottlenecks (model weights vs activations)
- Optimize based on data
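Strategy 5 above (smaller batches on memory-constrained devices) can be implemented as a halving retry loop. A sketch with a hypothetical `try_batch` closure standing in for the real inference call:

```rust
/// Try progressively smaller batch sizes until one fits in memory.
/// `try_batch` returns Err on failure; real code would inspect the error
/// kind and only retry on out-of-memory, not on every failure.
fn run_with_fallback<F>(mut batch_size: usize, mut try_batch: F) -> Result<usize, String>
where
    F: FnMut(usize) -> Result<(), String>,
{
    while batch_size >= 1 {
        match try_batch(batch_size) {
            // This batch size fits: report what was actually used
            Ok(()) => return Ok(batch_size),
            // Presumed OOM: halve and retry
            Err(_) if batch_size > 1 => batch_size /= 2,
            // Even batch size 1 failed: give up
            Err(e) => return Err(e),
        }
    }
    Err("batch size must be at least 1".to_string())
}
```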
---
## 8. Recommended Technology Stack for ruvector-scipix
### 8.1 Primary Stack (Production Deployment)
**Inference Engine**: `ort` (ONNX Runtime)
- **Version**: `2.0.0-rc` or latest stable
- **Features**: `cuda`, `tensorrt`, `half`, `load-dynamic`
- **Rationale**:
- Best-in-class performance (73% latency reduction)
- Extensive GPU support (CUDA, TensorRT, OpenVINO)
- Production-proven (Twitter, Google, SurrealDB)
- Largest ONNX model ecosystem
**OCR Models**: PaddleOCR v5 (ONNX format)
- **Detection**: `ch_PP-OCRv5_mobile_det.onnx`
- **Recognition**: `ch_PP-OCRv5_mobile_rec.onnx`
- **Rationale**:
- State-of-the-art accuracy
- Optimized for speed (5x faster in ONNX)
- Multi-language support (80+ languages)
- Active development (2025 updates)
**Image Processing**: `image` + `imageproc`
- **Version**: Latest stable
- **Rationale**:
- Comprehensive format support
- CPU parallelism via rayon (already in workspace)
- Mature, well-tested
- Pure Rust (no C++ dependencies)
**Dependencies Integration**:
```toml
[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing
image = "0.25"
imageproc = "0.25"
# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
```
### 8.2 Alternative Stack (WASM/Browser Deployment)
**Inference Engine**: `candle` with WGPU backend
- **Version**: Latest stable from Hugging Face
- **Features**: `wasm`, `webgpu`
- **Rationale**:
- Smallest WASM bundle size
- Native WebGPU support
- Fast startup times
- Pure Rust
**OCR Models**: TrOCR (via candle-onnx) or lightweight PaddleOCR
- Smaller models for browser constraints
- Quantized INT8 versions
**WASM-Specific Stack**:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }
```
### 8.3 Fallback Stack (Pure Rust/No External Dependencies)
**Inference Engine**: `tract`
- **Use Case**: When ONNX Runtime binaries unavailable or pure Rust required
- **Rationale**:
- No C++ dependencies
- Excellent WASM support
- Mature (Sonos production use)
- Passes 85% ONNX tests
**Stack**:
```toml
[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"
```
### 8.4 Architecture Design
```
┌─────────────────────────────────────────────────────────────┐
│ ruvector-scipix │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Image Input │────▶│ Preprocessing│───▶│ Detection │ │
│ │ (image) │ │ (imageproc) │ │ (ort/ONNX) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Text Boxes │ │
│ └──────┬───────┘ │
│ │ │
│ ┌─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Recognition │─────▶│ Post-Proc. │ │
│ │ (ort/ONNX) │ │ (decode) │ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Vector Store │ │
│ │ (ruvector- │ │
│ │ core) │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
```
### 8.5 Module Structure
```
examples/scipix/
├── Cargo.toml
├── src/
│ ├── lib.rs # Public API
│ ├── engine.rs # OCR engine orchestration
│ ├── detection.rs # Text detection (ONNX)
│ ├── recognition.rs # Text recognition (ONNX)
│ ├── preprocessing.rs # Image preprocessing (imageproc)
│ ├── postprocessing.rs # Result decoding and formatting
│ ├── models.rs # Model loading and management
│ └── config.rs # Configuration
├── models/ # ONNX model files (gitignored)
│ ├── detection.onnx
│ ├── recognition.onnx
│ └── dict.txt
├── tests/
│ ├── integration_test.rs
│ └── benchmark.rs
└── docs/
├── 01_REQUIREMENTS.md
├── 02_ARCHITECTURE.md
└── 03_RUST_ECOSYSTEM.md # This document
```
### 8.6 Performance Targets
Based on PaddleOCR benchmarks and Rust optimizations:
| Metric | Target | Hardware |
|--------|--------|----------|
| **Detection Latency** | <50ms | NVIDIA T4 (TensorRT) |
| **Recognition Latency** | <20ms | NVIDIA T4 (TensorRT) |
| **End-to-End (single image)** | <100ms | NVIDIA T4 |
| **Throughput (batched)** | >100 images/sec | NVIDIA T4 |
| **CPU Latency** | <500ms | Modern multi-core CPU |
| **WASM Latency** | <1s | Browser (WebGPU) |
| **Memory Usage** | <500MB | With INT8 quantization |
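These targets are mutually consistent: at the default batch size of 8, sustaining >100 images/sec requires each batch to finish within 80 ms. A quick sanity check of that arithmetic:

```rust
/// Per-batch latency budget (ms) implied by a throughput target:
/// budget = batch_size / images_per_sec, in milliseconds.
fn max_batch_latency_ms(images_per_sec: f64, batch_size: usize) -> f64 {
    batch_size as f64 / images_per_sec * 1000.0
}

fn main() {
    // 100 images/sec with batches of 8 → each batch must finish within 80 ms.
    let budget = max_batch_latency_ms(100.0, 8);
    assert!((budget - 80.0).abs() < 1e-9);
}
```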
### 8.7 Development Phases
**Phase 1: Core Implementation (ort + PaddleOCR)**
- Implement detection and recognition pipelines
- Integrate with ruvector-core storage
- CPU-only inference initially
- Basic preprocessing (resize, normalize)
**Phase 2: GPU Acceleration**
- Add CUDA/TensorRT support
- Benchmark and optimize performance
- Implement batching for throughput
- Memory pooling and reuse
**Phase 3: Production Hardening**
- Model quantization (INT8)
- Error handling and fallbacks
- Metrics and monitoring
- Load testing
**Phase 4: WASM Support (Optional)**
- Port to candle or tract
- Browser deployment
- WebGPU acceleration
- Client-side OCR
### 8.8 Testing Strategy
**Unit Tests**:
- Image preprocessing correctness
- Model loading and initialization
- Tensor shape validation
- Output decoding accuracy
**Integration Tests**:
```rust
#[test]
fn test_end_to_end_ocr() {
    let engine = OcrEngine::new(Config::default()).unwrap();
    let img = image::open("tests/fixtures/sample.jpg").unwrap();
    let result = engine.recognize_text(&img).unwrap();
    assert!(result.text.contains("expected text"));
}
```
**Benchmarks** (using Criterion):
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_detection(c: &mut Criterion) {
    let engine = setup_engine();
    let img = load_test_image();
    c.bench_function("detection", |b| {
        b.iter(|| engine.detect(black_box(&img)))
    });
}

criterion_group!(benches, benchmark_detection);
criterion_main!(benches);
```
**Performance Tests**:
- Latency under various image sizes
- Throughput with batching
- Memory usage over time
- GPU utilization
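A std-only sketch of how such latency tests can be driven (the closure stands in for a real `engine.detect` call; the timing and percentile helpers are illustrative, not part of any library here):

```rust
use std::time::Instant;

/// Time a closure `runs` times and return the samples sorted ascending (ms).
fn measure_ms<F: FnMut()>(mut f: F, runs: usize) -> Vec<f64> {
    let mut samples: Vec<f64> = (0..runs)
        .map(|_| {
            let t = Instant::now();
            f();
            t.elapsed().as_secs_f64() * 1000.0
        })
        .collect();
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    samples
}

/// Nearest-rank percentile over a sorted sample set.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((sorted.len() as f64 - 1.0) * p).round() as usize;
    sorted[idx]
}

fn main() {
    // Dummy workload standing in for `engine.detect(&img)`.
    let samples = measure_ms(|| { let _ = (0..10_000u64).sum::<u64>(); }, 50);
    assert_eq!(samples.len(), 50);
    assert!(percentile(&samples, 0.95) >= percentile(&samples, 0.50));
}
```

Tracking p50/p95 rather than the mean makes latency regressions under load visible; memory-over-time and GPU utilization still need external tooling (e.g. `nvidia-smi`).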
---
## 9. Integration with ruvector-core Dependencies
### 9.1 Shared Workspace Dependencies
The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.
**Already Available (from workspace)**:
| Dependency | ruvector Use | scipix Use |
|------------|--------------|-------------|
| `rayon` | Parallel distance computation | Batch image preprocessing, parallel OCR |
| `ndarray` | Vector operations | Tensor manipulation, image arrays |
| `parking_lot` | Lock-free data structures | Model pool synchronization |
| `dashmap` | Concurrent hash maps | Model cache, result cache |
| `tokio` | Async runtime | Async inference, streaming |
| `serde` / `serde_json` | Serialization | Config, results serialization |
| `thiserror` / `anyhow` | Error handling | OCR error types |
| `tracing` | Logging | Inference timing, debugging |
| `uuid` | Unique identifiers | Request tracking |
| `chrono` | Timestamps | Inference metrics |
**Benefits**:
- **Minimal new dependencies**: Only add OCR-specific crates
- **Consistent patterns**: Same error handling, logging, async across codebase
- **Binary size**: Shared dependencies not duplicated
- **Maintenance**: Updates to workspace deps benefit all crates
### 9.2 Parallel Processing Integration
**Leverage rayon for Batch OCR**:
```rust
use rayon::prelude::*;

fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Result<Vec<OcrResult>> {
    // Each item maps to a Result; collecting into Result<Vec<_>> surfaces the first error.
    images.par_iter()
        .map(|img| engine.recognize_text(img))
        .collect()
}
```
**Consistency**: Matches ruvector-core's parallel distance computation pattern
### 9.3 Storage Integration
**Store OCR Results in ruvector-core**:
```rust
use ruvector_core::{VectorStore, Vector};

struct OcrResult {
    text: String,
    embedding: Vec<f32>, // From embedding model
    bounding_boxes: Vec<BoundingBox>,
}

impl OcrResult {
    fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<uuid::Uuid> {
        let vector = Vector::new(self.embedding.clone());
        let id = store.insert(vector)?;
        // Store metadata alongside the vector
        store.set_metadata(id, "text", &self.text)?;
        store.set_metadata(id, "boxes", &self.bounding_boxes)?;
        Ok(id)
    }
}
```
**Vector Search for OCR Results**:
```rust
// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;
```
### 9.4 WASM Compatibility
**ruvector-core WASM Patterns**:
- `memory-only` feature for WASM targets
- `wasm-bindgen` for browser interop
- `getrandom` with `wasm_js` feature
**Apply to scipix**:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }
[features]
default = ["ort-backend"]
ort-backend = ["ort"]
candle-backend = ["candle-core", "candle-onnx"]
wasm = ["candle-backend"] # WASM uses candle
```
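The feature mapping above becomes compile-time dispatch in code. A minimal sketch, keyed off `target_arch` directly rather than custom features so it compiles standalone:

```rust
/// Compile-time backend selection mirroring the `wasm = ["candle-backend"]`
/// feature mapping: wasm32 builds use candle, native builds use ort.
/// (Names are illustrative; a real crate would gate on its own features.)
#[cfg(target_arch = "wasm32")]
fn backend_name() -> &'static str {
    "candle"
}

#[cfg(not(target_arch = "wasm32"))]
fn backend_name() -> &'static str {
    "ort"
}

fn main() {
    // On a native host this is "ort"; in a wasm32 build it would be "candle".
    assert!(matches!(backend_name(), "ort" | "candle"));
    println!("inference backend: {}", backend_name());
}
```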
### 9.5 Error Handling Patterns
**Consistent with ruvector-core**:
```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum OcrError {
    #[error("Model loading failed: {0}")]
    ModelLoadError(String),

    #[error("Inference failed: {0}")]
    InferenceError(String),

    #[error("Image preprocessing failed: {0}")]
    PreprocessingError(#[from] image::ImageError),

    #[error("ONNX Runtime error: {0}")]
    OrtError(#[from] ort::Error),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, OcrError>;
```
### 9.6 Configuration Pattern
**Similar to ruvector-core config**:
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
    /// Path to detection model
    pub detection_model_path: String,
    /// Path to recognition model
    pub recognition_model_path: String,
    /// Use GPU acceleration if available
    pub use_gpu: bool,
    /// Batch size for parallel processing
    pub batch_size: usize,
    /// Detection confidence threshold
    pub detection_threshold: f32,
    /// Number of inference threads
    pub num_threads: usize,
}

impl Default for OcrConfig {
    fn default() -> Self {
        Self {
            detection_model_path: "models/detection.onnx".into(),
            recognition_model_path: "models/recognition.onnx".into(),
            use_gpu: true,
            batch_size: 8,
            detection_threshold: 0.7,
            num_threads: rayon::current_num_threads(),
        }
    }
}
```
### 9.7 Async Integration
**Use tokio for async OCR**:
```rust
use std::sync::Arc;

use futures::{Stream, StreamExt}; // `then` comes from StreamExt
use tokio::task;

pub struct AsyncOcrEngine {
    engine: Arc<OcrEngine>,
}

impl AsyncOcrEngine {
    pub async fn recognize_text(&self, image: DynamicImage) -> Result<OcrResult> {
        let engine = Arc::clone(&self.engine);
        // Run blocking OCR on tokio's blocking threadpool.
        // Note: `?` on the JoinHandle requires `OcrError: From<tokio::task::JoinError>`.
        task::spawn_blocking(move || {
            engine.recognize_text_sync(&image)
        }).await?
    }

    pub async fn process_stream(
        &self,
        images: impl Stream<Item = DynamicImage>,
    ) -> impl Stream<Item = Result<OcrResult>> {
        // Clone the Arc up front so the returned stream does not borrow `self`.
        let engine = Arc::clone(&self.engine);
        images.then(move |img| {
            let this = AsyncOcrEngine { engine: Arc::clone(&engine) };
            async move { this.recognize_text(img).await }
        })
    }
}
```
### 9.8 Metrics Integration
**Use existing tracing infrastructure**:
```rust
use tracing::{debug, info, instrument};

#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
    let start = std::time::Instant::now();
    debug!("Starting OCR for image {}x{}", image.width(), image.height());

    let preprocessed = self.preprocess(image)?;
    debug!("Preprocessing took {:?}", start.elapsed());

    let boxes = self.detect(&preprocessed)?;
    debug!("Detection found {} boxes in {:?}", boxes.len(), start.elapsed());

    let text = self.recognize(&preprocessed, &boxes)?;
    info!(
        "OCR completed in {:?}, extracted {} characters",
        start.elapsed(),
        text.len()
    );

    Ok(OcrResult { text, boxes })
}
```
### 9.9 Testing Infrastructure Reuse
**Use workspace test dependencies**:
```toml
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"
```
**Property-Based Testing** (like ruvector-core):
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_preprocessing_preserves_aspect_ratio(
        width in 100u32..2000u32,
        height in 100u32..2000u32
    ) {
        let img = DynamicImage::new_rgb8(width, height);
        // `?` does not convert image errors to TestCaseError, so unwrap explicitly.
        let processed = preprocess_image(&img).expect("preprocessing should succeed");
        let original_ratio = width as f32 / height as f32;
        let processed_ratio = processed.width() as f32 / processed.height() as f32;
        prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
    }
}
```
### 9.10 Dependency Summary for scipix
**New Dependencies Required**:
```toml
[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"
# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }
# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Total New Dependencies**: 3 (ort, image, imageproc)
**Reused Dependencies**: 12 from workspace
---
## 10. License Compatibility
### 10.1 ruvector Project License
**Current License**: MIT (from workspace `Cargo.toml`)
**Requirement**: All dependencies must be MIT-compatible for redistribution.
### 10.2 Recommended Dependencies License Analysis
| Crate | License | Compatible? | Notes |
|-------|---------|-------------|-------|
| **ort** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed, fully compatible |
| **candle** | MIT OR Apache-2.0 | ✅ Yes | Hugging Face, dual-licensed |
| **tract** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed (except ONNX protos) |
| **image** | MIT OR Apache-2.0 | ✅ Yes | Pure Rust, dual-licensed |
| **imageproc** | MIT | ✅ Yes | Permissive, MIT-only |
| **ndarray** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **rayon** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **wasm-bindgen** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
**Incompatible Libraries (Avoid)**:
| Crate | License | Issue |
|-------|---------|-------|
| **leptess** | MIT (wrapper) | ❌ Depends on Tesseract (Apache-2.0 with restrictions) |
| **opencv-rust** | MIT (wrapper) | ❌ Depends on OpenCV (Apache-2.0, complex) |
### 10.3 ONNX Model Licenses
PaddleOCR models used in ONNX format have **Apache-2.0** license.
**Compatibility**:
- ✅ Apache-2.0 code can be used in MIT-licensed projects
- ✅ ONNX models (weights) are typically considered data, not code
- ✅ Distribution of pre-trained models is permitted
- ⚠️ Derivative works of Apache-2.0 code require patent grant preservation
**Best Practice**:
- Download PaddleOCR ONNX models from official sources
- Include LICENSE file in `models/` directory
- Document model provenance in README
- Do not modify Apache-2.0 code (use as-is via ONNX)
### 10.4 Rust Dual-Licensing Best Practices
**Why Rust Uses MIT OR Apache-2.0**:
- **MIT**: Maximum permissiveness, minimal restrictions
- **Apache-2.0**: Patent protection, better for corporate use
- **Dual License**: Users choose which applies to them
**For ruvector-scipix**:
**Option 1: Keep MIT-only (Current)**
- ✅ Simplest licensing
- ✅ Maximum compatibility
- ✅ Minimal legal overhead
- ✅ All dependencies are MIT-compatible
**Option 2: Adopt Dual MIT/Apache-2.0**
- ✅ Better patent protection
- ✅ Aligns with Rust ecosystem norms
- ✅ More attractive to enterprise users
- ⚠️ Slightly more complex
**Recommendation**: Keep MIT-only for simplicity, unless patent concerns arise.
### 10.5 License Compliance Checklist
**For Production Deployment**:
- [ ] Verify all direct dependencies are MIT or MIT/Apache-2.0
- [ ] Check transitive dependencies for license conflicts
- [ ] Include LICENSE file in repository
- [ ] Document third-party licenses in NOTICE file
- [ ] Include PaddleOCR model license in `models/LICENSE`
- [ ] Add copyright headers to source files (optional for MIT)
- [ ] Review ONNX Runtime's license (MIT, but check binary distribution terms)
- [ ] Ensure no GPL/LGPL dependencies (incompatible with MIT)
**Automated License Checking**:
```bash
# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features
# Fail build on incompatible licenses
cargo deny check licenses
```
**`deny.toml` Configuration**:
```toml
[licenses]
unlicensed = "deny"
allow = [
"MIT",
"Apache-2.0",
"Apache-2.0 WITH LLVM-exception",
"BSD-2-Clause",
"BSD-3-Clause",
"ISC",
"Unicode-DFS-2016",
]
deny = [
"GPL-2.0",
"GPL-3.0",
"AGPL-3.0",
]
```
### 10.6 Attribution Requirements
**MIT License Requirements**:
- Include copyright notice
- Include permission notice (LICENSE file)
- No obligation to disclose source code modifications
**For PaddleOCR Models (Apache-2.0)**:
- Include NOTICE file if provided
- Preserve copyright and patent notices
- Document significant modifications (if any)
**Recommended NOTICE File**:
```
ruvector-scipix
Copyright 2025 Ruvector Team
This software includes components from:
1. ONNX Runtime
Copyright Microsoft Corporation
Licensed under MIT License
2. PaddleOCR Models
Copyright PaddlePaddle Authors
Licensed under Apache License 2.0
Model files located in models/ directory
3. Candle ML Framework
Copyright Hugging Face, Inc.
Licensed under MIT OR Apache-2.0
Complete license texts available in the LICENSE and models/LICENSE files.
```
### 10.7 License Compatibility Summary
**✅ SAFE TO USE** (Recommended Stack):
- `ort` - MIT/Apache-2.0
- `image` - MIT/Apache-2.0
- `imageproc` - MIT
- `candle` - MIT/Apache-2.0
- `tract` - MIT/Apache-2.0
- PaddleOCR ONNX models - Apache-2.0 (data)
**⚠️ USE WITH CAUTION**:
- `leptess` - Requires Tesseract C++ library (complex licensing)
- `opencv-rust` - Requires OpenCV (large dependency, Apache-2.0)
**❌ AVOID**:
- Any GPL/LGPL libraries (incompatible with MIT for proprietary use)
- Proprietary OCR engines (licensing fees, redistribution restrictions)
**Final Recommendation**: The proposed stack (`ort` + PaddleOCR + `image`/`imageproc`) is **fully compatible** with ruvector's MIT license and follows Rust ecosystem best practices.
---
## 11. Final Recommendations
### 11.1 Optimal Technology Stack
**Primary Recommendation (Production)**:
```toml
[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"
# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx
# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }
# Integration
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Rationale**:
1. **Performance**: `ort` provides 73% latency reduction vs alternatives
2. **Ecosystem**: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
3. **GPU Support**: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
4. **Production Ready**: Used by Twitter, Google, SurrealDB
5. **License**: MIT/Apache-2.0 dual-license (fully compatible)
6. **Maintenance**: Active development, Microsoft backing
### 11.2 Alternative Stacks by Use Case
**WASM/Browser Deployment**:
```toml
candle-core = { version = "0.8", features = ["wasm", "webgpu"] }
candle-onnx = "0.8"
```
- Smallest bundle size (~180KB Brotli)
- WebGPU acceleration
- Fast startup (120ms first token)
**Pure Rust / No External Deps**:
```toml
tract-onnx = "0.22"
```
- No C++ dependencies
- Excellent for embedded/restrictive environments
- 85% ONNX compatibility
**Edge Devices / Raspberry Pi**:
```toml
tract-onnx = { version = "0.22", features = ["pulse"] }
```
- Optimized for CPU inference
- Minimal memory footprint
- Proven on RPi (11μs for CNN models)
### 11.3 Implementation Roadmap
**Week 1-2: Core Infrastructure**
- Set up `examples/scipix` crate structure
- Integrate `ort` and `image`/`imageproc`
- Implement model loading (detection + recognition)
- Basic end-to-end pipeline (CPU-only)
**Week 3-4: GPU Acceleration**
- Enable CUDA/TensorRT support
- Implement batching for throughput
- Benchmark performance vs targets
- Memory pooling and optimization
**Week 5-6: Production Hardening**
- Model quantization (INT8)
- Error handling and recovery
- Metrics and monitoring (tracing)
- Integration tests and benchmarks
**Week 7-8: ruvector Integration**
- Store OCR results in ruvector-core
- Implement vector search for documents
- Async API with tokio
- Documentation and examples
**Optional (Week 9-10): WASM Support**
- Port to candle for browser deployment
- WebGPU acceleration
- Client-side OCR demo
### 11.4 Key Metrics to Track
**Performance**:
- Detection latency: Target <50ms (GPU), <200ms (CPU)
- Recognition latency: Target <20ms (GPU), <100ms (CPU)
- End-to-end: Target <100ms (GPU), <500ms (CPU)
- Throughput: Target >100 images/sec (batched, GPU)
**Memory**:
- Model size: ~15-30MB (FP32), ~5-10MB (INT8)
- Runtime memory: Target <500MB
- GPU memory: Monitor for OOM
**Accuracy**:
- Character accuracy: Target >95% (clean text)
- Word accuracy: Target >90%
- Benchmark against Tesseract and commercial APIs
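Character accuracy is typically computed as one minus the normalized edit distance between OCR output and ground truth. A std-only sketch of that metric:

```rust
/// Levenshtein distance via the classic two-row dynamic program.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr.push((prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Character accuracy = 1 − edit_distance / ground_truth_length.
fn char_accuracy(truth: &str, ocr: &str) -> f64 {
    let n = truth.chars().count().max(1);
    1.0 - levenshtein(truth, ocr) as f64 / n as f64
}

fn main() {
    assert_eq!(levenshtein("kitten", "sitting"), 3);
    // One substitution over 11 characters → accuracy 10/11.
    let acc = char_accuracy("hello world", "hello w0rld");
    assert!((acc - 10.0 / 11.0).abs() < 1e-9);
}
```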
### 11.5 Risk Mitigation
**Model Availability**:
- ✅ PaddleOCR models freely available
- ✅ Multiple model versions for fallback
- ⚠️ Verify ONNX export quality (may need custom conversion)
**Dependency Stability**:
- ✅ `ort` actively maintained (2.0 rc, stable release expected)
- ✅ `image`/`imageproc` mature, widely used
- ⚠️ Monitor for breaking changes during updates
**Performance Variability**:
- ⚠️ GPU performance depends on driver versions
- ⚠️ WASM performance varies by browser
- ✅ Comprehensive benchmarking before production
**License Compliance**:
- ✅ All recommended dependencies MIT-compatible
- ✅ PaddleOCR Apache-2.0 (compatible for use)
- ⚠️ Review licenses before adding new dependencies
### 11.6 Success Criteria
The ruvector-scipix implementation is successful if:
1. **Performance**: Meets or exceeds latency/throughput targets
2. **Accuracy**: Character accuracy >95% on clean text
3. **Integration**: Seamlessly stores results in ruvector-core
4. **Portability**: Runs on Linux/macOS/Windows, CPU and GPU
5. **Memory**: Operates within <500MB budget
6. **License**: Maintains MIT compatibility
7. **Maintainability**: Uses idiomatic Rust, well-documented
8. **Scalability**: Handles batch processing efficiently
### 11.7 Next Steps
1. **Review this document** with ruvector team for alignment
2. **Download PaddleOCR models** (detection + recognition ONNX)
3. **Set up `examples/scipix` crate** with recommended dependencies
4. **Implement basic OCR pipeline** (end-to-end proof of concept)
5. **Benchmark initial implementation** against targets
6. **Iterate and optimize** based on real-world data
7. **Document API** and usage examples
8. **Integrate with ruvector-core** for vector storage
---
## References and Resources
### Documentation
- [ort Documentation](https://ort.pyke.io/) - ONNX Runtime Rust bindings by pykeio
- [Candle GitHub](https://github.com/huggingface/candle) - Minimalist ML framework for Rust
- [tract GitHub](https://github.com/sonos/tract) - Tiny, no-nonsense ONNX/TF inference
- [PaddleOCR GitHub](https://github.com/PaddlePaddle/PaddleOCR) - OCR models and documentation
- [imageproc Docs](https://docs.rs/imageproc) - Rust image processing library
### Performance Benchmarks
- [Rust at the Metal: GPU Layer Driving Modern AI](https://rustacean.ai/p/issue-2-rust-at-the-metal-the-gpu-layer-driving-modern-ai)
- [Rust for Machine Learning in 2025](https://markaicode.com/rust-machine-learning-framework-comparison-2025/)
- [PaddleOCR 3.0 High-Performance Inference](http://www.paddleocr.ai/main/en/version3.x/deployment/high_performance_inference.html)
### WASM Resources
- [WebAssembly 3.0 Performance: Rust vs C++ Benchmarks](https://markaicode.com/webassembly-3-performance-rust-cpp-benchmarks-2025/)
- [3W for In-Browser AI: WebLLM + WASM + WebWorkers](https://blog.mozilla.ai/3w-for-in-browser-ai-webllm-wasm-webworkers/)
### License Information
- [Rust API Guidelines: Licensing](https://rust-lang.github.io/api-guidelines/necessities.html)
- [PaddleOCR License](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE) - Apache-2.0
- [ONNX Runtime License](https://github.com/microsoft/onnxruntime/blob/main/LICENSE) - MIT
---
**Document Version**: 1.0
**Last Updated**: 2025-11-28
**Author**: Research and Analysis Agent
**Status**: Complete