# Rust OCR and ML Ecosystem Analysis for ruvector-scipix
## Executive Summary
This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers seven primary OCR/ML libraries, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.
**Key Finding**: The optimal stack for ruvector-scipix combines `ort` (ONNX Runtime bindings) for inference, `image`/`imageproc` for preprocessing, with optional pure Rust alternatives (`tract`, `candle`) for WASM targets.
---
## 1. Library Comparison Matrix
### OCR Libraries
| Library | Type | Model Support | WASM Support | GPU Support | Maturity | Performance | Dependencies |
|---------|------|---------------|--------------|-------------|----------|-------------|--------------|
| **ocrs** | Native Rust | ONNX (RTen engine) | ✅ Yes | ❌ No | 🟡 Preview | Medium | Minimal (Pure Rust) |
| **oar-ocr** | ONNX Wrapper | PaddleOCR ONNX | ✅ Yes | ✅ CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| **kalosm-ocr** | Pure Rust | TrOCR (candle) | ✅ Yes | ✅ WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| **leptess** | FFI Bindings | Tesseract C++ | ❌ No | ❌ No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| **paddle-ocr-rs** | ONNX Wrapper | PaddleOCR v4/v5 | ✅ Yes | ✅ CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| **pure-onnx-ocr** | Pure ONNX | PaddleOCR DBNet+SVTR | ✅ Yes | ✅ Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |
### ML Inference Engines
| Library | Purpose | Model Format | WASM Support | GPU Support | Performance | Maturity |
|---------|---------|--------------|--------------|-------------|-------------|----------|
| **ort** | ONNX Runtime | ONNX | ✅ Yes | ✅ CUDA/TensorRT/OpenVINO | **Very High** | 🟢 Production |
| **candle** | ML Framework | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| **tract** | ONNX/TF Inference | ONNX, NNEF, TF | ✅ Yes | ❌ Limited | High (CPU) | 🟢 Mature (Sonos) |
| **burn** | Deep Learning | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | Very High | 🟢 Active |
**Legend**: 🟢 Production-ready | 🟡 Active development | 🔴 Experimental
### Performance Benchmarks
Based on research findings:
- **ort + PaddleOCR**: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
- **ONNX conversion**: Up to 5x faster than PaddlePaddle native inference
- **tract**: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
- **Tesseract (leptess)**: Baseline CPU performance, requires preprocessing
- **ocrs**: Early preview, moderate performance on clear text
---
## 2. ONNX Runtime Integration Options
### 2.1 The `ort` Crate (Recommended)
**Overview**: `ort` by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.
**Key Features**:
- **Hardware Acceleration**: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
- **Dynamic Loading**: Runtime linking for flexibility (`load-dynamic` feature)
- **Alternative Backends**: Support for tract and candle backends
- **Minimal Builds**: RTTI-free, optimized binary sizes for production
- **Float16/BFloat16**: Via `half` crate integration
- **Production Proven**: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB
**Cargo Features**:
```toml
[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",         # Float16/BFloat16 support
    "load-dynamic", # Runtime dynamic linking
    "cuda",         # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",     # TensorRT optimization (requires TensorRT 8.4+)
] }
```
**Performance Characteristics**:
- Significantly faster than PyTorch for inference
- Supports model quantization (int8, float16)
- Multi-GPU distribution via NCCL
- Optimal for batch processing and real-time inference
**Integration Example**:
```rust
use ort::{GraphOptimizationLevel, Session};

// Build a session with graph optimizations and a bounded thread pool
let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;

// Run inference; the exact input-construction API differs between
// ort 2.0 release candidates (e.g. the `ort::inputs!` macro)
let outputs = session.run(ort::inputs!["input" => input_tensor]?)?;
```
### 2.2 Alternative: `tract` Backend
**Use Case**: When ONNX Runtime binaries are problematic or a WASM target is required
**Advantages**:
- Pure Rust implementation
- No external C++ dependencies
- Excellent WASM support
- Passes 85% of ONNX backend tests
- Lightweight and maintainable
**Limitations**:
- No tensor sequences or optional tensors
- Limited GPU support compared to ort
- TensorFlow 2 support via ONNX conversion only
### 2.3 Alternative: `candle` Backend
**Use Case**: When integrating with Hugging Face ecosystem or needing pure Rust
**Advantages**:
- Minimalist design, fast compilation
- Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
- WASM + WebGPU acceleration
- Small binary size for serverless deployment
- CUDA, Metal, MKL, Accelerate backends
**Limitations**:
- Younger ecosystem than ONNX Runtime
- Fewer pre-optimized OCR models available
- Focus on inference over training
---
## 3. Pure Rust ML with Candle/Tract
### 3.1 Candle Framework (Hugging Face)
**Architecture**: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.
**Supported Models**:
- **Language Models**: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
- **Vision Models**: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
- **Speech**: Whisper ASR
**Backend Support**:
| Backend | Platform | Performance | Use Case |
|---------|----------|-------------|----------|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |
**Design Philosophy**:
- Remove Python from production workloads
- Minimize binary size (critical for edge/serverless)
- Fast startup times (first token ~120ms on M2 MacBook Air)
- Rust's safety guarantees for ML workloads
**Example Usage**:
```rust
use std::collections::HashMap;

use candle_core::{Device, Tensor};

// Load model
let model = candle_onnx::read_file("model.onnx")?;
let graph = model.graph.as_ref().unwrap();

// Create device (CUDA/Metal/CPU)
let device = Device::cuda_if_available(0)?;

// Run inference; candle-onnx evaluates the graph from named inputs
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let input_name = graph.input[0].name.clone();
let outputs = candle_onnx::simple_eval(&model, HashMap::from([(input_name, input)]))?;
```
### 3.2 Tract Framework (Sonos)
**Architecture**: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.
**Key Capabilities**:
- **ONNX Support**: 85% of ONNX backend tests passing
- **Operator Set**: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
- **Proven Models**: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
- **Pulsing**: Streaming inference for time-series models (e.g., WaveNet)
- **Quantization**: Built-in int8 quantization support
**Performance Characteristics**:
- Optimized for CPU inference
- Excellent for edge devices (Raspberry Pi, embedded systems)
- Minimal memory footprint
- No RTTI or runtime overhead
**Example Usage**:
```rust
use tract_onnx::prelude::*;

// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;

// Run inference
let input = tract_ndarray::arr4(&[[...]]).into_dyn();
let result = model.run(tvec![input.into()])?;
```
**Quantization Support**:
```rust
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .quantize()? // Automatic int8 quantization
    .into_optimized()?
    .into_runnable()?;
```
### 3.3 Comparison: Candle vs Tract vs ort
| Criterion | Candle | Tract | ort |
|-----------|--------|-------|-----|
| **Performance (GPU)** | Very High | N/A | Very High |
| **Performance (CPU)** | High | Very High | Very High |
| **Binary Size** | Small | Very Small | Large |
| **Startup Time** | Fast | Very Fast | Medium |
| **WASM Support** | Excellent | Excellent | Good (with backends) |
| **Model Ecosystem** | Hugging Face | ONNX/TF | ONNX (largest) |
| **GPU Backends** | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| **Quantization** | Manual | Built-in | Excellent (ONNX tools) |
| **Maturity** | Stable (2024+) | Mature (2018+) | Production (Microsoft) |
**Recommendation**:
- **ort**: Primary choice for maximum performance and hardware acceleration
- **candle**: Secondary choice for WASM targets or Hugging Face integration
- **tract**: Fallback for pure Rust requirements or extreme size constraints
---
## 4. Image Processing in Rust
### 4.1 The `image` Crate (Foundation)
**Purpose**: Core image encoding/decoding and basic manipulation.
**Supported Formats**:
- JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF
**Key Features**:
```rust
use image::{imageops, DynamicImage, GenericImageView};

// Load image
let img = image::open("input.jpg")?;

// Basic operations (in the imageops module)
let resized = img.resize(800, 600, imageops::FilterType::Lanczos3);
let grayscale = img.grayscale();
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);
```
### 4.2 The `imageproc` Crate (Advanced Processing)
**Purpose**: Advanced image processing algorithms for computer vision.
**Modules**:
| Module | Capabilities |
|--------|-------------|
| **Contrast** | Histogram equalization, adaptive thresholding, CLAHE |
| **Corners** | Harris, FAST, Shi-Tomasi corner detection |
| **Distance Transform** | Euclidean distance maps, morphological operations |
| **Edges** | Canny edge detection, Sobel/Scharr operators |
| **Filter** | Gaussian, median, bilateral filtering |
| **Geometric** | Rotation, affine, projective transformations |
| **Morphology** | Erosion, dilation, opening, closing |
| **Drawing** | Shapes, text, anti-aliased primitives |
| **Contours** | Border tracing, contour extraction |
**Parallelism**: CPU-based multithreading via `rayon` (not GPU acceleration)
**OCR Preprocessing Example**:
```rust
use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};

// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();
    // Denoise with Gaussian blur
    let blurred = gaussian_blur_f32(&gray, 1.0);
    // Adaptive thresholding for varying lighting (block radius 21)
    let binary = adaptive_threshold(&blurred, 21);
    // Deskew if needed
    let angle = detect_skew(&binary); // Custom function, not shown
    rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]))
}
```
### 4.3 GPU Acceleration Options for Image Processing
**Current State**: `imageproc` does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:
**Option 1: `wgpu` + Custom Compute Shaders**
```rust
use wgpu;

// GPU compute shader for image processing (`device` is an existing wgpu::Device)
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl")),
});
```
**Option 2: OpenCV-Rust Bindings** (if CUDA needed)
- Provides GPU-accelerated operations via CUDA
- Requires OpenCV C++ installation
- Not pure Rust
**Option 3: Integrate with ML Framework GPU Ops**
- Use candle/ort tensor operations for preprocessing
- Leverage existing GPU context
- Keep preprocessing on same device as inference
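The core of keeping preprocessing with inference is expressing it as tensor math. The transform below — interleaved HWC `u8` pixels to a normalized planar CHW `f32` buffer — is what such a graph node computes, shown here in plain Rust as a hypothetical helper (on GPU the same steps would be candle or ONNX tensor ops; the mean/std parameters are whatever the model was trained with):

```rust
/// Convert interleaved HWC u8 pixels (RGB, row-major) to planar CHW f32,
/// scaled to [0, 1] and normalized with per-channel mean/std.
/// Mirrors what a GPU preprocessing graph would do with tensor ops.
fn hwc_to_normalized_chw(
    pixels: &[u8], // 3 bytes per pixel
    width: usize,
    height: usize,
    mean: [f32; 3],
    std: [f32; 3],
) -> Vec<f32> {
    let plane = width * height;
    let mut out = vec![0.0f32; 3 * plane];
    for i in 0..plane {
        for c in 0..3 {
            let v = pixels[i * 3 + c] as f32 / 255.0;
            // Channel c's plane occupies out[c*plane .. (c+1)*plane]
            out[c * plane + i] = (v - mean[c]) / std[c];
        }
    }
    out
}
```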
**Recommendation for ruvector-scipix**:
- Use `image` + `imageproc` for CPU preprocessing (fast enough for most cases)
- For GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
- Leverage rayon parallelism for batch processing
---
## 5. GPU Acceleration Options
### 5.1 Cross-Platform GPU Support in 2025
The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established APIs.
**Unified Backend: `wgpu` (WebGPU Standard)**
- **Targets**: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
- **Use Case**: Portable GPU compute without vendor lock-in
- **Frameworks**: Burn, Candle (WGPU backend), kalosm
**Performance Profile**:
| Backend | Platform | Speedup vs CPU | Use Case |
|---------|----------|----------------|----------|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |
### 5.2 CUDA Support
**Primary Library**: `cudarc` (Low-level CUDA bindings)
**Integration via ONNX Runtime**:
```toml
[dependencies]
ort = { version = "2.0", features = ["cuda"] }
```
**Requirements**:
- CUDA Toolkit 11.6+ (for ort)
- NVIDIA GPU: Maxwell (7xx series) or newer
- Compute Capability 5.0+
**Benefits**:
- Industry-standard ML acceleration
- Mature ecosystem and tooling
- Extensive operator coverage
- Best-in-class performance for training and inference
### 5.3 Metal Support (Apple Silicon)
**Framework Integration**:
- **Candle**: Native Metal backend via `metal` crate
- **Burn**: Metal support through `burn-metal` backend
- **ONNX Runtime**: CoreML execution provider (Metal-accelerated)
**Example (Candle)**:
```rust
use candle_core::{Device, Tensor};

let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
```
**Performance**: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips
### 5.4 WebGPU/WGPU
**Purpose**: Cross-platform GPU compute for WASM and native
**Frameworks with WGPU Support**:
- **Burn**: First-class WGPU backend
- **Candle**: WGPU support for browser deployment
- **Kalosm**: WGPU acceleration via Fusor (0.5 release)
**Browser Deployment**:
```rust
// WASM-compatible inference: the browser exposes no CUDA or Metal,
// so use the CPU device (or a WGPU device where the build supports it)
#[cfg(target_arch = "wasm32")]
use candle_core::Device;

let device = Device::Cpu;
```
**Benefits**:
- Browser-based ML inference without server
- Works on AMD GPUs (unlike CUDA)
- Portable across desktop and web
- Future-proof standard (W3C specification)
**Limitations**:
- Lower performance than native CUDA/Metal
- Browser memory constraints (typically 2-8GB)
- First token latency: ~120ms (acceptable for many use cases)
### 5.5 TensorRT (NVIDIA Optimization)
**Purpose**: Optimized ONNX model execution on NVIDIA GPUs
**Requirements**:
- NVIDIA GPU: GeForce 9xx series or newer
- TensorRT 8.4+
- CUDA 11.6+
**Integration**:
```toml
ort = { version = "2.0", features = ["cuda", "tensorrt"] }
```
**Benefits**:
- Automatic kernel fusion and layer optimization
- Mixed precision (FP32/FP16/INT8)
- Up to 2-5x faster than standard CUDA
- Optimal for high-throughput production deployment
### 5.6 OpenVINO (Intel)
**Target**: Intel CPUs (6th gen+) and Intel integrated GPUs
**Use Case**:
- Intel-based servers without discrete GPU
- Edge devices with Intel processors
- Cost-effective acceleration without NVIDIA hardware
**Integration**:
```toml
ort = { version = "2.0", features = ["openvino"] }
```
**Performance**: 5-20x CPU speedup depending on model and hardware
### 5.7 GPU Acceleration Recommendation for ruvector-scipix
**Tiered Approach**:
1. **Primary (Production)**: `ort` with CUDA/TensorRT
- Maximum performance for server deployment
- Best operator coverage for PaddleOCR models
- Production-proven reliability
2. **Secondary (Apple Ecosystem)**: `candle` with Metal
- Native Apple Silicon support
- Good for macOS/iOS deployment
- Smaller binary size than ONNX Runtime
3. **Tertiary (WASM/Browser)**: `candle` or `tract` with WGPU
- Client-side OCR in browser
- Privacy-preserving (no server upload)
- Acceptable performance for interactive use
4. **Fallback (CPU-only)**: `tract` or `ort` with optimized CPU execution
- MKL/OpenBLAS acceleration
- Rayon parallelism
- Still faster than Python alternatives
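The tiers above amount to an ordered fallback chain that can be resolved at startup. A minimal sketch (the availability flags stand in for real capability probes, e.g. attempting to create a CUDA or Metal device):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    CudaTensorRt, // Tier 1: production servers
    Metal,        // Tier 2: Apple ecosystem
    Wgpu,         // Tier 3: WASM/browser
    Cpu,          // Tier 4: always-available fallback
}

/// Walk the tiers in priority order and return the first available backend.
/// Availability would normally come from runtime probes; here it is passed in.
fn select_backend(cuda: bool, metal: bool, wgpu: bool) -> Backend {
    if cuda {
        Backend::CudaTensorRt
    } else if metal {
        Backend::Metal
    } else if wgpu {
        Backend::Wgpu
    } else {
        Backend::Cpu // CPU execution is always possible
    }
}
```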
---
## 6. WebAssembly Compilation Considerations
### 6.1 WASM for ML: Current State (2025)
**Key Finding**: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.
**Performance Characteristics**:
- Rust compiles to WASM **faster** than C++
- Rust produces **smaller binaries** than C++ WASM
- **Memory efficiency**: Rust's ownership model translates well to WASM linear memory
- Consistent performance across browsers
### 6.2 Memory Constraints and Optimization
**Browser Memory Limits**:
- Typical: 2-4GB per tab (Chrome/Firefox)
- Maximum: 4-8GB (varies by browser/OS)
- **Critical Issue**: Running multiple models can exhaust memory quickly
**Memory Optimization Strategies**:
**1. Model Quantization**
```rust
// Hypothetical API sketch: quantization is normally applied offline to the
// ONNX file rather than at runtime.
// INT8 quantization reduces memory by 4x; FP16 by 2x.
let quantized_model = model.quantize(QuantizationType::QInt8)?;
```
**2. Memory Reuse**
```rust
// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
    input_buffer: Vec<f32>,
    output_buffer: Vec<f32>,
}

impl InferenceContext {
    fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
        self.input_buffer.copy_from_slice(data);
        model.run(&self.input_buffer, &mut self.output_buffer)?;
        Ok(&self.output_buffer)
    }
}
```
**3. Lazy Loading with Streaming Compile**
```rust
// Use WebAssembly.instantiateStreaming for faster startup;
// load models on demand, not at initialization.
// Sketch only: from Rust this goes through the js-sys/web-sys bindings
// (js_sys::WebAssembly::instantiate_streaming wraps a fetch Promise).
async fn load_model_lazy(url: &str) -> Result<Module> {
    let response = window.fetch(url).await?;
    let module = WebAssembly::instantiate_streaming(response).await?;
    Ok(module)
}
```
**4. wasm-opt Optimization**
```bash
# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm
```
**5. Model Cleanup**
```rust
// Explicit cleanup when switching models
impl Drop for ModelContext {
fn drop(&mut self) {
// Free GPU resources
self.gpu_buffers.clear();
// Trigger garbage collection hint (if available)
}
}
```
### 6.3 Bundle Size Considerations
**Challenge**: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.
**Mitigation Strategies**:
**1. Code Splitting**
```rust
// Load OCR functionality separately from main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
    // Lazy-load OCR model
    let model = load_model("ocr.onnx").await?;
    Ok(OcrEngine::new(model))
}
```
**2. Minimal Features**
```toml
[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }
```
**3. Compression**
```bash
# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br
# Gzip fallback
gzip -9 output.wasm
```
**4. Tree Shaking**
```toml
[profile.release]
opt-level = "z" # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true
```
**Expected Sizes**:
| Configuration | Uncompressed | Brotli | Gzip |
|---------------|--------------|--------|------|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |
### 6.4 WASM-Specific Limitations
**1. Threading Constraints**
- SharedArrayBuffer required for multi-threading
- COEP/COOP headers needed for isolation
- Not all browsers support WASM threads
**2. SIMD Support**
- WASM SIMD enabled by default in modern browsers
- Significant performance boost for ML operations
- Check browser compatibility: `wasm-feature-detect`
**3. No Direct File System Access**
- Use IndexedDB or Cache API for model storage
- Stream models from network (HTTP/2)
- Consider embedding small models in binary
**4. GPU Access**
- WebGPU required for GPU acceleration
- Not universally supported (as of 2025, Chrome/Edge primarily)
- Fallback to CPU inference needed
### 6.5 Recommended WASM Frameworks for ruvector-scipix
**Primary: `candle` with WGPU**
- Smallest binary size
- Native WASM support
- WebGPU acceleration when available
- Hugging Face ecosystem
**Secondary: `tract`**
- Pure Rust, no C++ dependencies
- Excellent WASM support
- Proven in production (Sonos)
- CPU-optimized
**Alternative: `ort` with WASM backend**
- Full ONNX operator support
- Can use tract or candle as backend
- Larger bundle size
**Example WASM Integration**:
```rust
use wasm_bindgen::prelude::*;
use candle_core::{Device, Tensor};

#[wasm_bindgen]
pub struct OcrEngine {
    model: candle_onnx::Model,
    device: Device,
}

#[wasm_bindgen]
impl OcrEngine {
    #[wasm_bindgen(constructor)]
    pub async fn new() -> Result<OcrEngine, JsValue> {
        // Use WebGPU if available, fall back to CPU
        let device = Device::Cpu; // Or Device::new_wgpu(0)?

        // Load model bytes from a URL (fetch_model is a custom helper;
        // the exact candle_onnx loading API may differ by version)
        let model_bytes = fetch_model("model.onnx").await?;
        let model = candle_onnx::read(&model_bytes)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(OcrEngine { model, device })
    }

    pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
        // Preprocess image
        let tensor = preprocess_image(image_data, &self.device)?;
        // Run inference
        let output = self.model.forward(&[tensor])
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        // Decode output
        let text = decode_predictions(output)?;
        Ok(text)
    }
}
```
### 6.6 WASM Deployment Checklist
- [ ] Enable WASM SIMD in build (`RUSTFLAGS='-C target-feature=+simd128'`)
- [ ] Optimize bundle size (`opt-level = "z"`, LTO, strip)
- [ ] Implement lazy loading for models
- [ ] Set up proper CORS headers for model fetching
- [ ] Add WebGPU feature detection with CPU fallback
- [ ] Configure Brotli/Gzip compression on CDN
- [ ] Test memory usage across browsers (especially mobile)
- [ ] Implement model cleanup on tab close
- [ ] Add loading indicators for async model initialization
- [ ] Consider service worker for model caching
---
## 7. Memory Management for Large Models
### 7.1 Memory Challenges in ML Inference
**Typical OCR Model Sizes**:
- PaddleOCR Detection: 3-10MB (FP32)
- PaddleOCR Recognition: 5-15MB (FP32)
- TrOCR: 50-300MB (depending on variant)
- Tesseract trained data: 10-50MB per language
**Memory Consumption Beyond Model Weights**:
- Input tensors: Image size × channels × precision
- Intermediate activations: Varies by architecture (can exceed model size)
- Output buffers: Sequence length × vocab size
- KV cache (for transformers): Context length × hidden size × layers
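Those components can be summed into a rough budget before deployment. A back-of-the-envelope estimator in plain Rust (the activation multiplier is an illustrative assumption, not a measured constant, and I/O tensors are assumed to stay in FP32):

```rust
/// Rough inference memory estimate in bytes.
/// `bytes_per_param`: 4 for FP32, 2 for FP16, 1 for INT8.
/// `activation_factor`: intermediate activations as a multiple of weight
/// size (architecture-dependent; activations can exceed the weights).
fn estimate_memory_bytes(
    params: usize,
    bytes_per_param: usize,
    input_elems: usize,  // e.g. batch * channels * height * width
    output_elems: usize, // e.g. sequence length * vocab size
    activation_factor: usize,
) -> usize {
    let weights = params * bytes_per_param;
    let io = (input_elems + output_elems) * 4; // I/O tensors in FP32
    weights * (1 + activation_factor) + io
}
```

For a 10MB FP32 detection model (2.5M parameters) with a 1x3x224x224 input, this predicts roughly 20MB resident; the same model in INT8 drops to roughly 5.6MB, matching the 4x weight reduction claimed below.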
### 7.2 Quantization Strategies
**INT8 Quantization** (4x memory reduction)
```python
# The ort crate does not itself expose a quantization API; INT8 quantization
# is applied offline with ONNX Runtime's Python tooling, and the quantized
# model is then loaded from Rust like any other ONNX file.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
```
**Benefits**:
- 75% memory reduction (FP32 → INT8)
- Minimal accuracy loss (typically <1% for OCR)
- Faster inference on integer-optimized hardware
- Reduced cache pressure
**FP16 Quantization** (2x memory reduction)
```rust
// Convert FP32 inputs to FP16 with the `half` crate before feeding them to a
// session whose model was exported with float16 inputs
use half::f16;

let input_f16: Vec<f16> = input_f32.iter().map(|&x| f16::from_f32(x)).collect();
```
**Benefits**:
- Better accuracy preservation than INT8
- Native support on modern GPUs (Tensor Cores)
- Still significant memory savings
**Dynamic Quantization** (Runtime)
```rust
// tract supports quantization at model-load time
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .quantize()? // Automatic quantization
    .into_optimized()?
    .into_runnable()?;
```
### 7.3 Memory Pooling and Reuse
**Tensor Buffer Reuse**:
```rust
use std::sync::Arc;
use parking_lot::Mutex;

struct TensorPool {
    buffers: Vec<Arc<Mutex<Vec<f32>>>>,
    size: usize,
}

impl TensorPool {
    fn new(pool_size: usize, buffer_size: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
            .collect();
        TensorPool { buffers, size: pool_size }
    }

    fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
        // Round-robin or availability-based selection
        self.buffers.first().cloned()
    }
}
}
```
**Session Pooling** (ONNX Runtime):
```rust
use once_cell::sync::Lazy;
use ort::Session;

static SESSION_POOL: Lazy<Vec<Session>> = Lazy::new(|| {
    (0..4)
        .map(|_| {
            Session::builder()
                .unwrap()
                .commit_from_file("model.onnx")
                .unwrap()
        })
        .collect()
});

fn get_session(thread_id: usize) -> &'static Session {
    &SESSION_POOL[thread_id % 4]
}
```
### 7.4 Streaming and Batching
**Batch Processing** (Amortize overhead):
```rust
fn process_batch(images: &[DynamicImage], model: &Session) -> Result<Vec<String>> {
    let batch_size = images.len();

    // Create batched tensor [batch_size, channels, height, width]
    let mut batch_tensor = vec![0.0f32; batch_size * 3 * 224 * 224];
    for (i, img) in images.iter().enumerate() {
        let offset = i * 3 * 224 * 224;
        preprocess_into_buffer(img, &mut batch_tensor[offset..]);
    }

    // Single inference call for entire batch
    let output = model.run(vec![batch_tensor.into()])?;

    // Decode batch results
    decode_batch_predictions(output, batch_size)
}
```
**Streaming Inference** (For large documents):
```rust
use futures::{Stream, StreamExt};

async fn process_document_streaming(
    pages: impl Stream<Item = Image>,
    model: &Session,
) -> impl Stream<Item = Result<String>> + '_ {
    pages.map(move |page| {
        // Process one page at a time
        recognize_text(&page, model)
    })
}
```
### 7.5 Model Sharding and Lazy Loading
**Lazy Model Loading**:
```rust
use once_cell::sync::OnceCell;

static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();

fn get_detection_model() -> &'static Session {
    DETECTION_MODEL.get_or_init(|| {
        Session::builder()
            .unwrap()
            .commit_from_file("detection.onnx")
            .unwrap()
    })
}
```
**Conditional Loading**:
```rust
use std::collections::HashMap;

// Only load language-specific models when needed
struct OcrEngine {
    detection: Session,
    recognition_models: HashMap<Language, OnceCell<Session>>,
}

impl OcrEngine {
    fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
        let boxes = self.detect(img)?;
        let rec_model = self.recognition_models
            .get(&lang)
            .unwrap()
            .get_or_init(|| load_recognition_model(lang));
        self.recognize_boxes(img, &boxes, rec_model)
    }
}
```
### 7.6 Memory Mapping (Large Models)
**Using `memmap2` for Model Files**:
```rust
use memmap2::Mmap;
use std::fs::File;

fn load_model_mmap(path: &str) -> Result<Mmap> {
    let file = File::open(path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(mmap)
}

// Model data stays on disk, paged in as needed
// Useful for models >100MB
```
**Benefits**:
- Reduced resident memory
- Faster startup (no full load)
- Shared memory across processes
**Limitations**:
- Not available in WASM
- Requires file system access
- May have higher latency on first access
### 7.7 GPU Memory Management
**CUDA Unified Memory**:
```rust
use ort::{CUDAExecutionProvider, Session};

// ort manages GPU memory once the CUDA execution provider is registered
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;

// Tensors are transferred to/from the GPU automatically
```
**Manual GPU Memory Control** (candle):
```rust
use candle_core::{Device, Tensor};

let device = Device::new_cuda(0)?;

// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;

// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;

// Explicit cleanup
drop(tensor_gpu);
```
### 7.8 Memory Profiling and Monitoring
**Rust Memory Profiling Tools**:
- `valgrind --tool=massif`: Heap profiling
- `heaptrack`: Heap memory profiler (Linux)
- `dhat`: Dynamic heap analysis tool
- `tokio-console`: Async runtime monitoring
**Custom Memory Tracking**:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn get_memory_usage() -> usize {
    ALLOCATED.load(Ordering::SeqCst)
}
```
### 7.9 Memory Optimization Recommendations for ruvector-scipix
**Priority Strategies**:
1. **Quantize Models** (INT8 for production)
- 4x memory reduction
- Minimal accuracy impact for OCR
- Use ONNX Runtime quantization tools
2. **Implement Tensor Pooling**
- Reuse buffers for repeated inferences
- Align with ruvector-core's memory management patterns
- Use `parking_lot` for efficient synchronization
3. **Lazy Load Language Models**
- Only load recognition models for requested languages
- Use `OnceCell` for thread-safe initialization
- Share models across threads
4. **Batch Processing**
- Group multiple images into single inference call
- Amortize overhead, improve GPU utilization
- Integrate with ruvector's parallel processing
5. **GPU Memory Awareness**
- Monitor GPU memory usage
- Implement fallback to CPU if GPU OOM
- Use smaller batch sizes on memory-constrained devices
6. **Profile Real Workloads**
- Measure memory with actual ruvector data
- Identify bottlenecks (model weights vs activations)
- Optimize based on data
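Strategy 5 above (smaller batches on memory-constrained devices) can be implemented as a halving retry loop. A sketch with a hypothetical `try_batch` closure standing in for the real inference call:

```rust
/// Try progressively smaller batch sizes until one fits in memory.
/// `try_batch` returns Err on failure; real code would inspect the error
/// kind and only retry on out-of-memory, not on every failure.
fn run_with_fallback<F>(mut batch_size: usize, mut try_batch: F) -> Result<usize, String>
where
    F: FnMut(usize) -> Result<(), String>,
{
    while batch_size >= 1 {
        match try_batch(batch_size) {
            // This batch size fits: report what was actually used
            Ok(()) => return Ok(batch_size),
            // Presumed OOM: halve and retry
            Err(_) if batch_size > 1 => batch_size /= 2,
            // Even batch size 1 failed: give up
            Err(e) => return Err(e),
        }
    }
    Err("batch size must be at least 1".to_string())
}
```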
---
## 8. Recommended Technology Stack for ruvector-scipix
### 8.1 Primary Stack (Production Deployment)
**Inference Engine**: `ort` (ONNX Runtime)
- **Version**: `2.0.0-rc` or latest stable
- **Features**: `cuda`, `tensorrt`, `half`, `load-dynamic`
- **Rationale**:
- Best-in-class performance (73% latency reduction)
- Extensive GPU support (CUDA, TensorRT, OpenVINO)
- Production-proven (Twitter, Google, SurrealDB)
- Largest ONNX model ecosystem
**OCR Models**: PaddleOCR v5 (ONNX format)
- **Detection**: `ch_PP-OCRv5_mobile_det.onnx`
- **Recognition**: `ch_PP-OCRv5_mobile_rec.onnx`
- **Rationale**:
- State-of-the-art accuracy
- Optimized for speed (5x faster in ONNX)
- Multi-language support (80+ languages)
- Active development (2025 updates)
**Image Processing**: `image` + `imageproc`
- **Version**: Latest stable
- **Rationale**:
- Comprehensive format support
- CPU parallelism via rayon (already in workspace)
- Mature, well-tested
- Pure Rust (no C++ dependencies)
**Dependencies Integration**:
```toml
[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing
image = "0.25"
imageproc = "0.25"
# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
```
### 8.2 Alternative Stack (WASM/Browser Deployment)
**Inference Engine**: `candle` with WGPU backend
- **Version**: Latest stable from Hugging Face
- **Features**: `wasm`, `webgpu`
- **Rationale**:
- Smallest WASM bundle size
- Native WebGPU support
- Fast startup times
- Pure Rust
**OCR Models**: TrOCR (via candle-onnx) or lightweight PaddleOCR
- Smaller models for browser constraints
- Quantized INT8 versions
**WASM-Specific Stack**:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }
```
### 8.3 Fallback Stack (Pure Rust/No External Dependencies)
**Inference Engine**: `tract`
- **Use Case**: When ONNX Runtime binaries unavailable or pure Rust required
- **Rationale**:
- No C++ dependencies
- Excellent WASM support
- Mature (Sonos production use)
- Passes 85% ONNX tests
**Stack**:
```toml
[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"
```
### 8.4 Architecture Design
```
┌─────────────────────────────────────────────────────────────┐
│ ruvector-scipix │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Image Input │────▶│ Preprocessing│───▶│ Detection │ │
│ │ (image) │ │ (imageproc) │ │ (ort/ONNX) │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Text Boxes │ │
│ └──────┬───────┘ │
│ │ │
│ ┌─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Recognition │─────▶│ Post-Proc. │ │
│ │ (ort/ONNX) │ │ (decode) │ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Vector Store │ │
│ │ (ruvector- │ │
│ │ core) │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
```
### 8.5 Module Structure
```
examples/scipix/
├── Cargo.toml
├── src/
│ ├── lib.rs # Public API
│ ├── engine.rs # OCR engine orchestration
│ ├── detection.rs # Text detection (ONNX)
│ ├── recognition.rs # Text recognition (ONNX)
│ ├── preprocessing.rs # Image preprocessing (imageproc)
│ ├── postprocessing.rs # Result decoding and formatting
│ ├── models.rs # Model loading and management
│ └── config.rs # Configuration
├── models/ # ONNX model files (gitignored)
│ ├── detection.onnx
│ ├── recognition.onnx
│ └── dict.txt
├── tests/
│ ├── integration_test.rs
│ └── benchmark.rs
└── docs/
├── 01_REQUIREMENTS.md
├── 02_ARCHITECTURE.md
└── 03_RUST_ECOSYSTEM.md # This document
```
### 8.6 Performance Targets
Based on PaddleOCR benchmarks and Rust optimizations:
| Metric | Target | Hardware |
|--------|--------|----------|
| **Detection Latency** | <50ms | NVIDIA T4 (TensorRT) |
| **Recognition Latency** | <20ms | NVIDIA T4 (TensorRT) |
| **End-to-End (single image)** | <100ms | NVIDIA T4 |
| **Throughput (batched)** | >100 images/sec | NVIDIA T4 |
| **CPU Latency** | <500ms | Modern multi-core CPU |
| **WASM Latency** | <1s | Browser (WebGPU) |
| **Memory Usage** | <500MB | With INT8 quantization |
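These targets are mutually consistent: at the default batch size of 8, sustaining >100 images/sec requires each batch to finish within 80 ms. A quick sanity check of that arithmetic:

```rust
/// Per-batch latency budget (ms) implied by a throughput target:
/// budget = batch_size / images_per_sec, in milliseconds.
fn max_batch_latency_ms(images_per_sec: f64, batch_size: usize) -> f64 {
    batch_size as f64 / images_per_sec * 1000.0
}

fn main() {
    // 100 images/sec with batches of 8 → each batch must finish within 80 ms.
    let budget = max_batch_latency_ms(100.0, 8);
    assert!((budget - 80.0).abs() < 1e-9);
}
```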
### 8.7 Development Phases
**Phase 1: Core Implementation (ort + PaddleOCR)**
- Implement detection and recognition pipelines
- Integrate with ruvector-core storage
- CPU-only inference initially
- Basic preprocessing (resize, normalize)
**Phase 2: GPU Acceleration**
- Add CUDA/TensorRT support
- Benchmark and optimize performance
- Implement batching for throughput
- Memory pooling and reuse
**Phase 3: Production Hardening**
- Model quantization (INT8)
- Error handling and fallbacks
- Metrics and monitoring
- Load testing
**Phase 4: WASM Support (Optional)**
- Port to candle or tract
- Browser deployment
- WebGPU acceleration
- Client-side OCR
### 8.8 Testing Strategy
**Unit Tests**:
- Image preprocessing correctness
- Model loading and initialization
- Tensor shape validation
- Output decoding accuracy
**Integration Tests**:
```rust
#[test]
fn test_end_to_end_ocr() {
    let engine = OcrEngine::new(Config::default()).unwrap();
    let img = image::open("tests/fixtures/sample.jpg").unwrap();
    let result = engine.recognize_text(&img).unwrap();
    assert!(result.text.contains("expected text"));
}
```
**Benchmarks** (using Criterion):
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_detection(c: &mut Criterion) {
    let engine = setup_engine();
    let img = load_test_image();
    c.bench_function("detection", |b| {
        b.iter(|| engine.detect(black_box(&img)))
    });
}

criterion_group!(benches, benchmark_detection);
criterion_main!(benches);
```
**Performance Tests**:
- Latency under various image sizes
- Throughput with batching
- Memory usage over time
- GPU utilization
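A std-only sketch of how such latency tests can be driven (the closure stands in for a real `engine.detect` call; the timing and percentile helpers are illustrative, not part of any library here):

```rust
use std::time::Instant;

/// Time a closure `runs` times and return the samples sorted ascending (ms).
fn measure_ms<F: FnMut()>(mut f: F, runs: usize) -> Vec<f64> {
    let mut samples: Vec<f64> = (0..runs)
        .map(|_| {
            let t = Instant::now();
            f();
            t.elapsed().as_secs_f64() * 1000.0
        })
        .collect();
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    samples
}

/// Nearest-rank percentile over a sorted sample set.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let idx = ((sorted.len() as f64 - 1.0) * p).round() as usize;
    sorted[idx]
}

fn main() {
    // Dummy workload standing in for `engine.detect(&img)`.
    let samples = measure_ms(|| { let _ = (0..10_000u64).sum::<u64>(); }, 50);
    assert_eq!(samples.len(), 50);
    assert!(percentile(&samples, 0.95) >= percentile(&samples, 0.50));
}
```

Tracking p50/p95 rather than the mean makes latency regressions under load visible; memory-over-time and GPU utilization still need external tooling (e.g. `nvidia-smi`).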
---
## 9. Integration with ruvector-core Dependencies
### 9.1 Shared Workspace Dependencies
The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.
**Already Available (from workspace)**:
| Dependency | ruvector Use | scipix Use |
|------------|--------------|-------------|
| `rayon` | Parallel distance computation | Batch image preprocessing, parallel OCR |
| `ndarray` | Vector operations | Tensor manipulation, image arrays |
| `parking_lot` | Lock-free data structures | Model pool synchronization |
| `dashmap` | Concurrent hash maps | Model cache, result cache |
| `tokio` | Async runtime | Async inference, streaming |
| `serde` / `serde_json` | Serialization | Config, results serialization |
| `thiserror` / `anyhow` | Error handling | OCR error types |
| `tracing` | Logging | Inference timing, debugging |
| `uuid` | Unique identifiers | Request tracking |
| `chrono` | Timestamps | Inference metrics |
**Benefits**:
- **Minimal new dependencies**: Only add OCR-specific crates
- **Consistent patterns**: Same error handling, logging, async across codebase
- **Binary size**: Shared dependencies not duplicated
- **Maintenance**: Updates to workspace deps benefit all crates
### 9.2 Parallel Processing Integration
**Leverage rayon for Batch OCR**:
```rust
use rayon::prelude::*;

fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Result<Vec<OcrResult>> {
    // Each item maps to a Result; collecting into Result<Vec<_>> surfaces the first error.
    images.par_iter()
        .map(|img| engine.recognize_text(img))
        .collect()
}
```
**Consistency**: Matches ruvector-core's parallel distance computation pattern
### 9.3 Storage Integration
**Store OCR Results in ruvector-core**:
```rust
use ruvector_core::{VectorStore, Vector};

struct OcrResult {
    text: String,
    embedding: Vec<f32>, // From embedding model
    bounding_boxes: Vec<BoundingBox>,
}

impl OcrResult {
    fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<uuid::Uuid> {
        let vector = Vector::new(self.embedding.clone());
        let id = store.insert(vector)?;
        // Store metadata alongside the vector
        store.set_metadata(id, "text", &self.text)?;
        store.set_metadata(id, "boxes", &self.bounding_boxes)?;
        Ok(id)
    }
}
```
**Vector Search for OCR Results**:
```rust
// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;
```
### 9.4 WASM Compatibility
**ruvector-core WASM Patterns**:
- `memory-only` feature for WASM targets
- `wasm-bindgen` for browser interop
- `getrandom` with `wasm_js` feature
**Apply to scipix**:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }
[features]
default = ["ort-backend"]
ort-backend = ["ort"]
candle-backend = ["candle-core", "candle-onnx"]
wasm = ["candle-backend"] # WASM uses candle
```
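The feature mapping above becomes compile-time dispatch in code. A minimal sketch, keyed off `target_arch` directly rather than custom features so it compiles standalone:

```rust
/// Compile-time backend selection mirroring the `wasm = ["candle-backend"]`
/// feature mapping: wasm32 builds use candle, native builds use ort.
/// (Names are illustrative; a real crate would gate on its own features.)
#[cfg(target_arch = "wasm32")]
fn backend_name() -> &'static str {
    "candle"
}

#[cfg(not(target_arch = "wasm32"))]
fn backend_name() -> &'static str {
    "ort"
}

fn main() {
    // On a native host this is "ort"; in a wasm32 build it would be "candle".
    assert!(matches!(backend_name(), "ort" | "candle"));
    println!("inference backend: {}", backend_name());
}
```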
### 9.5 Error Handling Patterns
**Consistent with ruvector-core**:
```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum OcrError {
    #[error("Model loading failed: {0}")]
    ModelLoadError(String),

    #[error("Inference failed: {0}")]
    InferenceError(String),

    #[error("Image preprocessing failed: {0}")]
    PreprocessingError(#[from] image::ImageError),

    #[error("ONNX Runtime error: {0}")]
    OrtError(#[from] ort::Error),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, OcrError>;
```
### 9.6 Configuration Pattern
**Similar to ruvector-core config**:
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
    /// Path to detection model
    pub detection_model_path: String,
    /// Path to recognition model
    pub recognition_model_path: String,
    /// Use GPU acceleration if available
    pub use_gpu: bool,
    /// Batch size for parallel processing
    pub batch_size: usize,
    /// Detection confidence threshold
    pub detection_threshold: f32,
    /// Number of inference threads
    pub num_threads: usize,
}

impl Default for OcrConfig {
    fn default() -> Self {
        Self {
            detection_model_path: "models/detection.onnx".into(),
            recognition_model_path: "models/recognition.onnx".into(),
            use_gpu: true,
            batch_size: 8,
            detection_threshold: 0.7,
            num_threads: rayon::current_num_threads(),
        }
    }
}
```
### 9.7 Async Integration
**Use tokio for async OCR**:
```rust
use std::sync::Arc;

use futures::{Stream, StreamExt}; // `then` comes from StreamExt
use tokio::task;

pub struct AsyncOcrEngine {
    engine: Arc<OcrEngine>,
}

impl AsyncOcrEngine {
    pub async fn recognize_text(&self, image: DynamicImage) -> Result<OcrResult> {
        let engine = Arc::clone(&self.engine);
        // Run blocking OCR on tokio's blocking threadpool.
        // Note: `?` on the JoinHandle requires `OcrError: From<tokio::task::JoinError>`.
        task::spawn_blocking(move || {
            engine.recognize_text_sync(&image)
        }).await?
    }

    pub async fn process_stream(
        &self,
        images: impl Stream<Item = DynamicImage>,
    ) -> impl Stream<Item = Result<OcrResult>> {
        // Clone the Arc up front so the returned stream does not borrow `self`.
        let engine = Arc::clone(&self.engine);
        images.then(move |img| {
            let this = AsyncOcrEngine { engine: Arc::clone(&engine) };
            async move { this.recognize_text(img).await }
        })
    }
}
```
### 9.8 Metrics Integration
**Use existing tracing infrastructure**:
```rust
use tracing::{debug, info, instrument};

#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
    let start = std::time::Instant::now();
    debug!("Starting OCR for image {}x{}", image.width(), image.height());

    let preprocessed = self.preprocess(image)?;
    debug!("Preprocessing took {:?}", start.elapsed());

    let boxes = self.detect(&preprocessed)?;
    debug!("Detection found {} boxes in {:?}", boxes.len(), start.elapsed());

    let text = self.recognize(&preprocessed, &boxes)?;
    info!(
        "OCR completed in {:?}, extracted {} characters",
        start.elapsed(),
        text.len()
    );

    Ok(OcrResult { text, boxes })
}
```
### 9.9 Testing Infrastructure Reuse
**Use workspace test dependencies**:
```toml
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"
```
**Property-Based Testing** (like ruvector-core):
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_preprocessing_preserves_aspect_ratio(
        width in 100u32..2000u32,
        height in 100u32..2000u32
    ) {
        let img = DynamicImage::new_rgb8(width, height);
        // `?` does not convert image errors to TestCaseError, so unwrap explicitly.
        let processed = preprocess_image(&img).expect("preprocessing should succeed");
        let original_ratio = width as f32 / height as f32;
        let processed_ratio = processed.width() as f32 / processed.height() as f32;
        prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
    }
}
```
### 9.10 Dependency Summary for scipix
**New Dependencies Required**:
```toml
[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"
# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }
# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Total New Dependencies**: 3 (ort, image, imageproc)
**Reused Dependencies**: 12 from workspace
---
## 10. License Compatibility
### 10.1 ruvector Project License
**Current License**: MIT (from workspace `Cargo.toml`)
**Requirement**: All dependencies must be MIT-compatible for redistribution.
### 10.2 Recommended Dependencies License Analysis
| Crate | License | Compatible? | Notes |
|-------|---------|-------------|-------|
| **ort** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed, fully compatible |
| **candle** | MIT OR Apache-2.0 | ✅ Yes | Hugging Face, dual-licensed |
| **tract** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed (except ONNX protos) |
| **image** | MIT OR Apache-2.0 | ✅ Yes | Pure Rust, dual-licensed |
| **imageproc** | MIT | ✅ Yes | Permissive, MIT-only |
| **ndarray** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **rayon** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **wasm-bindgen** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
**Incompatible Libraries (Avoid)**:
| Crate | License | Issue |
|-------|---------|-------|
| **leptess** | MIT (wrapper) | ❌ Depends on Tesseract (Apache-2.0 with restrictions) |
| **opencv-rust** | MIT (wrapper) | ❌ Depends on OpenCV (Apache-2.0, complex) |
### 10.3 ONNX Model Licenses
PaddleOCR models used in ONNX format have **Apache-2.0** license.
**Compatibility**:
- ✅ Apache-2.0 code can be used in MIT-licensed projects
- ✅ ONNX models (weights) are typically considered data, not code
- ✅ Distribution of pre-trained models is permitted
- ⚠️ Derivative works of Apache-2.0 code require patent grant preservation
**Best Practice**:
- Download PaddleOCR ONNX models from official sources
- Include LICENSE file in `models/` directory
- Document model provenance in README
- Do not modify Apache-2.0 code (use as-is via ONNX)
### 10.4 Rust Dual-Licensing Best Practices
**Why Rust Uses MIT OR Apache-2.0**:
- **MIT**: Maximum permissiveness, minimal restrictions
- **Apache-2.0**: Patent protection, better for corporate use
- **Dual License**: Users choose which applies to them
**For ruvector-scipix**:
**Option 1: Keep MIT-only (Current)**
- ✅ Simplest licensing
- ✅ Maximum compatibility
- ✅ Minimal legal overhead
- ✅ All dependencies are MIT-compatible
**Option 2: Adopt Dual MIT/Apache-2.0**
- ✅ Better patent protection
- ✅ Aligns with Rust ecosystem norms
- ✅ More attractive to enterprise users
- ⚠️ Slightly more complex
**Recommendation**: Keep MIT-only for simplicity, unless patent concerns arise.
### 10.5 License Compliance Checklist
**For Production Deployment**:
- [ ] Verify all direct dependencies are MIT or MIT/Apache-2.0
- [ ] Check transitive dependencies for license conflicts
- [ ] Include LICENSE file in repository
- [ ] Document third-party licenses in NOTICE file
- [ ] Include PaddleOCR model license in `models/LICENSE`
- [ ] Add copyright headers to source files (optional for MIT)
- [ ] Review ONNX Runtime's license (MIT, but check binary distribution terms)
- [ ] Ensure no GPL/LGPL dependencies (incompatible with MIT)
**Automated License Checking**:
```bash
# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features
# Fail build on incompatible licenses
cargo deny check licenses
```
**`deny.toml` Configuration**:
```toml
[licenses]
unlicensed = "deny"
allow = [
"MIT",
"Apache-2.0",
"Apache-2.0 WITH LLVM-exception",
"BSD-2-Clause",
"BSD-3-Clause",
"ISC",
"Unicode-DFS-2016",
]
deny = [
"GPL-2.0",
"GPL-3.0",
"AGPL-3.0",
]
```
### 10.6 Attribution Requirements
**MIT License Requirements**:
- Include copyright notice
- Include permission notice (LICENSE file)
- No obligation to disclose source code modifications
**For PaddleOCR Models (Apache-2.0)**:
- Include NOTICE file if provided
- Preserve copyright and patent notices
- Document significant modifications (if any)
**Recommended NOTICE File**:
```
ruvector-scipix
Copyright 2025 Ruvector Team
This software includes components from:
1. ONNX Runtime
Copyright Microsoft Corporation
Licensed under MIT License
2. PaddleOCR Models
Copyright PaddlePaddle Authors
Licensed under Apache License 2.0
Model files located in models/ directory
3. Candle ML Framework
Copyright Hugging Face, Inc.
Licensed under MIT OR Apache-2.0
Complete license texts available in the LICENSE and models/LICENSE files.
```
### 10.7 License Compatibility Summary
**✅ SAFE TO USE** (Recommended Stack):
- `ort` - MIT/Apache-2.0
- `image` - MIT/Apache-2.0
- `imageproc` - MIT
- `candle` - MIT/Apache-2.0
- `tract` - MIT/Apache-2.0
- PaddleOCR ONNX models - Apache-2.0 (data)
**⚠️ USE WITH CAUTION**:
- `leptess` - Requires Tesseract C++ library (complex licensing)
- `opencv-rust` - Requires OpenCV (large dependency, Apache-2.0)
**❌ AVOID**:
- Any GPL/LGPL libraries (incompatible with MIT for proprietary use)
- Proprietary OCR engines (licensing fees, redistribution restrictions)
**Final Recommendation**: The proposed stack (`ort` + PaddleOCR + `image`/`imageproc`) is **fully compatible** with ruvector's MIT license and follows Rust ecosystem best practices.
---
## 11. Final Recommendations
### 11.1 Optimal Technology Stack
**Primary Recommendation (Production)**:
```toml
[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"
# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx
# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }
# Integration
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Rationale**:
1. **Performance**: `ort` provides 73% latency reduction vs alternatives
2. **Ecosystem**: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
3. **GPU Support**: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
4. **Production Ready**: Used by Twitter, Google, SurrealDB
5. **License**: MIT/Apache-2.0 dual-license (fully compatible)
6. **Maintenance**: Active development, Microsoft backing
### 11.2 Alternative Stacks by Use Case
**WASM/Browser Deployment**:
```toml
candle-core = { version = "0.8", features = ["wasm", "webgpu"] }
candle-onnx = "0.8"
```
- Smallest bundle size (~180KB Brotli)
- WebGPU acceleration
- Fast startup (120ms first token)
**Pure Rust / No External Deps**:
```toml
tract-onnx = "0.22"
```
- No C++ dependencies
- Excellent for embedded/restrictive environments
- 85% ONNX compatibility
**Edge Devices / Raspberry Pi**:
```toml
tract-onnx = { version = "0.22", features = ["pulse"] }
```
- Optimized for CPU inference
- Minimal memory footprint
- Proven on RPi (11μs for CNN models)
### 11.3 Implementation Roadmap
**Week 1-2: Core Infrastructure**
- Set up `examples/scipix` crate structure
- Integrate `ort` and `image`/`imageproc`
- Implement model loading (detection + recognition)
- Basic end-to-end pipeline (CPU-only)
**Week 3-4: GPU Acceleration**
- Enable CUDA/TensorRT support
- Implement batching for throughput
- Benchmark performance vs targets
- Memory pooling and optimization
**Week 5-6: Production Hardening**
- Model quantization (INT8)
- Error handling and recovery
- Metrics and monitoring (tracing)
- Integration tests and benchmarks
**Week 7-8: ruvector Integration**
- Store OCR results in ruvector-core
- Implement vector search for documents
- Async API with tokio
- Documentation and examples
**Optional (Week 9-10): WASM Support**
- Port to candle for browser deployment
- WebGPU acceleration
- Client-side OCR demo
### 11.4 Key Metrics to Track
**Performance**:
- Detection latency: Target <50ms (GPU), <200ms (CPU)
- Recognition latency: Target <20ms (GPU), <100ms (CPU)
- End-to-end: Target <100ms (GPU), <500ms (CPU)
- Throughput: Target >100 images/sec (batched, GPU)
**Memory**:
- Model size: ~15-30MB (FP32), ~5-10MB (INT8)
- Runtime memory: Target <500MB
- GPU memory: Monitor for OOM
**Accuracy**:
- Character accuracy: Target >95% (clean text)
- Word accuracy: Target >90%
- Benchmark against Tesseract and commercial APIs
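Character accuracy is typically computed as one minus the normalized edit distance between OCR output and ground truth. A std-only sketch of that metric:

```rust
/// Levenshtein distance via the classic two-row dynamic program.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr.push((prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Character accuracy = 1 − edit_distance / ground_truth_length.
fn char_accuracy(truth: &str, ocr: &str) -> f64 {
    let n = truth.chars().count().max(1);
    1.0 - levenshtein(truth, ocr) as f64 / n as f64
}

fn main() {
    assert_eq!(levenshtein("kitten", "sitting"), 3);
    // One substitution over 11 characters → accuracy 10/11.
    let acc = char_accuracy("hello world", "hello w0rld");
    assert!((acc - 10.0 / 11.0).abs() < 1e-9);
}
```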
### 11.5 Risk Mitigation
**Model Availability**:
- ✅ PaddleOCR models freely available
- ✅ Multiple model versions for fallback
- ⚠️ Verify ONNX export quality (may need custom conversion)
**Dependency Stability**:
- ✅ `ort` actively maintained (2.0 rc, stable release expected)
- ✅ `image`/`imageproc` mature, widely used
- ⚠️ Monitor for breaking changes during updates
**Performance Variability**:
- ⚠️ GPU performance depends on driver versions
- ⚠️ WASM performance varies by browser
- ✅ Comprehensive benchmarking before production
**License Compliance**:
- ✅ All recommended dependencies MIT-compatible
- ✅ PaddleOCR Apache-2.0 (compatible for use)
- ⚠️ Review licenses before adding new dependencies
### 11.6 Success Criteria
The ruvector-scipix implementation is successful if:
1. **Performance**: Meets or exceeds latency/throughput targets
2. **Accuracy**: Character accuracy >95% on clean text
3. **Integration**: Seamlessly stores results in ruvector-core
4. **Portability**: Runs on Linux/macOS/Windows, CPU and GPU
5. **Memory**: Operates within <500MB budget
6. **License**: Maintains MIT compatibility
7. **Maintainability**: Uses idiomatic Rust, well-documented
8. **Scalability**: Handles batch processing efficiently
### 11.7 Next Steps
1. **Review this document** with ruvector team for alignment
2. **Download PaddleOCR models** (detection + recognition ONNX)
3. **Set up `examples/scipix` crate** with recommended dependencies
4. **Implement basic OCR pipeline** (end-to-end proof of concept)
5. **Benchmark initial implementation** against targets
6. **Iterate and optimize** based on real-world data
7. **Document API** and usage examples
8. **Integrate with ruvector-core** for vector storage
---
## References and Resources
### Documentation
- [ort Documentation](https://ort.pyke.io/) - ONNX Runtime Rust bindings by pykeio
- [Candle GitHub](https://github.com/huggingface/candle) - Minimalist ML framework for Rust
- [tract GitHub](https://github.com/sonos/tract) - Tiny, no-nonsense ONNX/TF inference
- [PaddleOCR GitHub](https://github.com/PaddlePaddle/PaddleOCR) - OCR models and documentation
- [imageproc Docs](https://docs.rs/imageproc) - Rust image processing library
### Performance Benchmarks
- [Rust at the Metal: GPU Layer Driving Modern AI](https://rustacean.ai/p/issue-2-rust-at-the-metal-the-gpu-layer-driving-modern-ai)
- [Rust for Machine Learning in 2025](https://markaicode.com/rust-machine-learning-framework-comparison-2025/)
- [PaddleOCR 3.0 High-Performance Inference](http://www.paddleocr.ai/main/en/version3.x/deployment/high_performance_inference.html)
### WASM Resources
- [WebAssembly 3.0 Performance: Rust vs C++ Benchmarks](https://markaicode.com/webassembly-3-performance-rust-cpp-benchmarks-2025/)
- [3W for In-Browser AI: WebLLM + WASM + WebWorkers](https://blog.mozilla.ai/3w-for-in-browser-ai-webllm-wasm-webworkers/)
### License Information
- [Rust API Guidelines: Licensing](https://rust-lang.github.io/api-guidelines/necessities.html)
- [PaddleOCR License](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE) - Apache-2.0
- [ONNX Runtime License](https://github.com/microsoft/onnxruntime/blob/main/LICENSE) - MIT
---
**Document Version**: 1.0
**Last Updated**: 2025-11-28
**Author**: Research and Analysis Agent
**Status**: Complete