Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# ruvector-scipix Examples
This directory contains comprehensive examples demonstrating various features and use cases of ruvector-scipix.
## Quick Start
All examples can be run using:
```bash
cargo run --example <example_name> -- [arguments]
```
## Examples Overview
### 1. Simple OCR (`simple_ocr.rs`)
**Basic OCR functionality with single image processing.**
Demonstrates:
- Loading and processing a single image
- OCR recognition
- Output in multiple formats (plain text, LaTeX)
- Confidence scores
**Usage:**
```bash
cargo run --example simple_ocr -- path/to/image.png
```
**Example Output:**
```
Plain Text: x² + 2x + 1 = 0
LaTeX: x^{2} + 2x + 1 = 0
Confidence: 95.3%
```
---
### 2. Batch Processing (`batch_processing.rs`)
**Parallel processing of multiple images with progress tracking.**
Demonstrates:
- Directory-based batch processing
- Parallel/concurrent processing
- Progress bar visualization
- Statistics and metrics
- JSON output
**Usage:**
```bash
cargo run --example batch_processing --features ocr -- /path/to/images output.json
```
**Features:**
- Automatic CPU core detection for optimal parallelism
- Real-time progress visualization
- Per-file error handling
- Aggregate statistics
---
### 3. API Server (`api_server.rs`)
**REST API server for OCR processing.**
Demonstrates:
- HTTP server with Axum framework
- Single and batch image processing endpoints
- Health check endpoint
- Graceful shutdown
- CORS support
- Multipart file uploads
**Usage:**
```bash
# Start server
cargo run --example api_server
# In another terminal, test the API
curl -X POST -F "image=@equation.png" http://localhost:8080/ocr
curl http://localhost:8080/health
```
**Endpoints:**
- `GET /health` - Health check
- `POST /ocr` - Process single image
- `POST /batch` - Process multiple images
---
### 4. Streaming Processing (`streaming.rs`)
**Streaming PDF processing with real-time results.**
Demonstrates:
- Large document processing
- Streaming results as pages are processed
- Real-time progress reporting
- Incremental JSON output
- Memory-efficient processing
**Usage:**
```bash
cargo run --example streaming -- document.pdf output/
```
**Features:**
- Processes pages concurrently (4 at a time)
- Saves individual page results immediately
- Generates final document summary
- Per-page timing statistics
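The windowed-concurrency idea above (at most four pages in flight at once) can be sketched with std threads alone. This is a simplified illustration, not the tokio-based `streaming.rs` itself, and `process_page` is a hypothetical stand-in for the real per-page OCR call:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for per-page OCR; the real example calls the engine here.
fn process_page(page: usize) -> String {
    format!("page {} done", page)
}

/// Process pages in windows of `window` concurrent workers,
/// collecting results in page order.
fn stream_pages(total: usize, window: usize) -> Vec<String> {
    let mut results = Vec::with_capacity(total);
    for chunk in (0..total).collect::<Vec<_>>().chunks(window) {
        let (tx, rx) = mpsc::channel();
        for &page in chunk {
            let tx = tx.clone();
            thread::spawn(move || tx.send((page, process_page(page))).unwrap());
        }
        // Drop the coordinator's sender so the receiver ends after this window.
        drop(tx);
        let mut chunk_results: Vec<(usize, String)> = rx.iter().collect();
        chunk_results.sort_by_key(|(p, _)| *p); // restore page order within the window
        results.extend(chunk_results.into_iter().map(|(_, r)| r));
    }
    results
}

fn main() {
    let results = stream_pages(10, 4);
    assert_eq!(results.len(), 10);
    println!("{:?}", results);
}
```

The window boundary is also where incremental output would be flushed, which is what keeps memory usage flat for large documents.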
---
### 5. Custom Pipeline (`custom_pipeline.rs`)
**Custom OCR pipeline with preprocessing and post-processing.**
Demonstrates:
- Image preprocessing (denoising, sharpening, binarization)
- Post-processing filters
- LaTeX validation
- Confidence filtering
- Custom output formatting
- Otsu's thresholding
**Usage:**
```bash
cargo run --example custom_pipeline -- image.png
```
**Pipeline Steps:**
1. **Preprocessing:**
- Denoising
- Contrast enhancement
- Sharpening
- Binarization (Otsu's method)
- Deskewing
2. **Post-processing:**
- Confidence filtering
- LaTeX validation
- Spell checking
- Custom formatting
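Otsu's method, used in the binarization step, picks the threshold that maximizes between-class variance over the grayscale histogram. A minimal self-contained sketch, operating on raw `u8` pixels rather than the `image` crate types used by the example:

```rust
/// Otsu's method: choose the threshold that maximizes between-class variance.
fn otsu_threshold(pixels: &[u8]) -> u8 {
    let mut hist = [0u64; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }
    let total = pixels.len() as f64;
    let sum_all: f64 = hist.iter().enumerate().map(|(i, &c)| i as f64 * c as f64).sum();

    let (mut sum_bg, mut weight_bg) = (0.0_f64, 0.0_f64);
    let (mut best_t, mut best_var) = (0u8, 0.0_f64);
    for t in 0..256 {
        weight_bg += hist[t] as f64;
        if weight_bg == 0.0 { continue; }
        let weight_fg = total - weight_bg;
        if weight_fg == 0.0 { break; }
        sum_bg += t as f64 * hist[t] as f64;
        let mean_bg = sum_bg / weight_bg;
        let mean_fg = (sum_all - sum_bg) / weight_fg;
        // Between-class variance for this candidate threshold
        // (unnormalized; scaling does not change the argmax).
        let var = weight_bg * weight_fg * (mean_bg - mean_fg).powi(2);
        if var > best_var {
            best_var = var;
            best_t = t as u8;
        }
    }
    best_t
}

fn main() {
    // A clearly bimodal "image": dark cluster at 10, bright cluster at 200.
    let mut pixels = vec![10u8; 100];
    pixels.extend(vec![200u8; 100]);
    let t = otsu_threshold(&pixels);
    assert!(t >= 10 && t < 200);
    println!("threshold = {}", t);
}
```

Pixels at or below the returned threshold become background; everything above becomes foreground.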
---
### 6. WASM Browser Demo (`wasm_demo.html`)
**Browser-based OCR demonstration.**
Demonstrates:
- WebAssembly integration
- Browser-based image upload
- Drag-and-drop interface
- Real-time visualization
- Client-side processing
**Setup:**
```bash
# Build WASM module (when available)
wasm-pack build --target web
# Serve the demo
python3 -m http.server 8000
# Open http://localhost:8000/examples/wasm_demo.html
```
**Features:**
- Modern, responsive UI
- Drag-and-drop file upload
- Live preview
- Real-time results
- No server required (runs in browser)
---
### 7. Agent-Based Processing (`lean_agentic.rs`)
**Distributed OCR processing with agent coordination.**
Demonstrates:
- Multi-agent coordination
- Distributed task processing
- Fault tolerance
- Load balancing
- Agent statistics
**Usage:**
```bash
cargo run --example lean_agentic -- /path/to/documents
```
**Features:**
- Spawns multiple OCR agents (default: 4)
- Automatic task distribution
- Per-agent statistics
- Throughput metrics
- JSON result export
**Architecture:**
```
Coordinator
├── Agent 1 (tasks: 12)
├── Agent 2 (tasks: 15)
├── Agent 3 (tasks: 11)
└── Agent 4 (tasks: 13)
```
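The coordinator/agent split shown above can be sketched with std threads and a shared work queue. This is a simplified illustration under the assumption that each agent pulls tasks from a shared queue until it is empty; the real example is async and OCR-specific:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn `n_agents` workers that drain a shared task queue and report
/// per-agent task counts back to the coordinator.
fn run_agents(tasks: Vec<String>, n_agents: usize) -> Vec<usize> {
    let queue = Arc::new(Mutex::new(tasks));
    let (tx, rx) = mpsc::channel();
    for agent_id in 0..n_agents {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        thread::spawn(move || {
            let mut handled = 0;
            // Pulling from a shared queue gives automatic load balancing:
            // faster agents simply take more tasks.
            while let Some(_task) = queue.lock().unwrap().pop() {
                handled += 1;
            }
            tx.send((agent_id, handled)).unwrap();
        });
    }
    drop(tx);
    let mut counts = vec![0; n_agents];
    for (agent_id, handled) in rx {
        counts[agent_id] = handled;
    }
    counts
}

fn main() {
    let tasks: Vec<String> = (0..51).map(|i| format!("doc-{}.png", i)).collect();
    let counts = run_agents(tasks, 4);
    assert_eq!(counts.iter().sum::<usize>(), 51);
    println!("per-agent task counts: {:?}", counts);
}
```

The pull model also gives cheap fault tolerance: a slow or stalled agent never blocks the others, since unclaimed tasks stay on the queue.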
---
### 8. Accuracy Testing (`accuracy_test.rs`)
**OCR accuracy testing against ground truth datasets.**
Demonstrates:
- Dataset-based testing
- Multiple accuracy metrics
- Category-based analysis
- Statistical correlation
- Comprehensive reporting
**Usage:**
```bash
cargo run --example accuracy_test -- dataset.json
```
**Dataset Format:**
```json
[
{
"image_path": "tests/images/quadratic.png",
"ground_truth_text": "x^2 + 2x + 1 = 0",
"ground_truth_latex": "x^{2} + 2x + 1 = 0",
"category": "quadratic"
}
]
```
**Metrics Calculated:**
- **Text Accuracy** - Overall string similarity
- **Character Error Rate (CER)** - Character-level errors
- **Word Error Rate (WER)** - Word-level errors
- **LaTeX Accuracy** - LaTeX format correctness
- **Confidence Correlation** - Pearson correlation between confidence and accuracy
- **Category Breakdown** - Per-category statistics
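For reference, these metrics reduce to the standard edit-distance and correlation definitions, matching what `accuracy_test.rs` computes (with `lev` the character-level and `lev_word` the word-level Levenshtein distance, `t` the ground truth, `\hat{t}` the prediction, and `c_i`, `a_i` the per-case confidence and accuracy):

```latex
\mathrm{CER} = \frac{\mathrm{lev}(\hat{t}, t)}{|t|}
\qquad
\mathrm{WER} = \frac{\mathrm{lev}_{\mathrm{word}}(\hat{t}, t)}{|t|_{\mathrm{word}}}
\qquad
\mathrm{accuracy} = 1 - \frac{\mathrm{lev}(\hat{t}, t)}{\max(|\hat{t}|, |t|)}
```

```latex
r = \frac{\sum_i (c_i - \bar{c})(a_i - \bar{a})}
         {\sqrt{\sum_i (c_i - \bar{c})^2 \, \sum_i (a_i - \bar{a})^2}}
```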
**Example Output:**
```
Total Cases: 100
Successful: 98 (98.0%)
Average Confidence: 92.5%
Average Text Accuracy: 94.2%
Average CER: 3.1%
Average WER: 5.8%
Confidence Correlation: 0.847
Category Breakdown:
quadratic: 25 cases, 96.3% accuracy
linear: 30 cases, 98.1% accuracy
calculus: 20 cases, 89.7% accuracy
```
---
## Common Patterns
### Error Handling
All examples use `anyhow::Result` for error handling:
```rust
use anyhow::{Context, Result};
fn main() -> Result<()> {
let image = image::open(path)
.context("Failed to open image")?;
Ok(())
}
```
### Logging
Examples use `env_logger` for debug output:
```bash
# Run with debug logging
RUST_LOG=debug cargo run --example simple_ocr -- image.png
# Run with info logging (default)
RUST_LOG=info cargo run --example simple_ocr -- image.png
```
### Configuration
OCR engine configuration:
```rust
use ruvector_scipix::OcrConfig;
let config = OcrConfig {
confidence_threshold: 0.7,
max_image_size: 4096,
enable_preprocessing: true,
// ... other options
};
```
## Dependencies
Core dependencies used in examples:
- `anyhow` - Error handling
- `tokio` - Async runtime
- `image` - Image processing
- `serde/serde_json` - Serialization
- `indicatif` - Progress bars
- `axum` - HTTP server (api_server)
- `env_logger` - Logging
## Building Examples
Build all examples:
```bash
cargo build --examples
```
Build specific example:
```bash
cargo build --example simple_ocr
```
Run with optimizations:
```bash
cargo run --release --example batch_processing --features ocr -- images/ output.json
```
## Testing Examples
Create test images:
```bash
# Create test directory
mkdir -p test_images
# Add some test images
cp /path/to/math_equation.png test_images/
```
Run examples:
```bash
# Simple OCR
cargo run --example simple_ocr -- test_images/equation.png
# Batch processing
cargo run --example batch_processing -- test_images/ results.json
# Accuracy test (requires dataset)
cargo run --example accuracy_test -- test_dataset.json
```
## Integration Guide
### Using in Your Project
1. **Add dependency:**
```toml
[dependencies]
ruvector-scipix = "0.1.0"
```
2. **Basic usage:**
```rust
use ruvector_scipix::{OcrEngine, OcrConfig};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let config = OcrConfig::default();
let engine = OcrEngine::new(config).await?;
let image = image::open("equation.png")?;
let result = engine.recognize(&image).await?;
println!("Text: {}", result.text);
Ok(())
}
```
3. **Advanced usage:**
See individual examples for advanced patterns like:
- Custom pipelines
- Batch processing
- API integration
- Agent-based processing
## Performance Tips
1. **Batch Processing:**
- Use parallel processing for multiple images
- Adjust concurrency based on CPU cores
- Enable model caching for repeated runs
2. **Memory Management:**
- Stream large documents instead of loading all at once
- Use appropriate image resolution (downscale if needed)
- Clear cache periodically for long-running processes
3. **Accuracy vs Speed:**
- Higher confidence thresholds = more accuracy, slower processing
- Preprocessing improves accuracy but adds overhead
- Balance based on your use case
## Troubleshooting
### Common Issues
**"Model not found"**
```bash
# Download models first
./scripts/download_models.sh
```
**"Out of memory"**
- Reduce batch size or concurrent workers
- Downscale large images before processing
- Enable streaming for PDFs
**"Low confidence scores"**
- Enable preprocessing pipeline
- Improve image quality (resolution, contrast)
- Check for skewed or rotated images
## Contributing
When adding new examples:
1. Add the `.rs` file to `examples/`
2. Update `Cargo.toml` with example entry
3. Document in this README
4. Include usage examples and expected output
5. Add error handling and logging
6. Keep examples self-contained
## Resources
- [Main Documentation](../README.md)
- [API Reference](../docs/API.md)
- [Model Guide](../docs/MODELS.md)
- [Benchmarks](../benches/README.md)
## License
All examples are provided under the same license as ruvector-scipix.

//! Accuracy testing example
//!
//! This example demonstrates how to test OCR accuracy against a ground truth dataset.
//! It calculates various metrics including WER, CER, and confidence correlations.
//!
//! Usage:
//! ```bash
//! cargo run --example accuracy_test -- dataset.json
//! ```
//!
//! Dataset format (JSON):
//! ```json
//! [
//! {
//! "image_path": "path/to/image.png",
//! "ground_truth_text": "x^2 + 2x + 1 = 0",
//! "ground_truth_latex": "x^{2} + 2x + 1 = 0",
//! "category": "quadratic"
//! }
//! ]
//! ```
use anyhow::{Context, Result};
use ruvector_scipix::{OcrConfig, OcrEngine, OutputFormat};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
#[derive(Debug, Deserialize)]
struct TestCase {
image_path: String,
ground_truth_text: String,
ground_truth_latex: Option<String>,
category: Option<String>,
}
#[derive(Debug, Serialize)]
struct TestResult {
image_path: String,
category: String,
predicted_text: String,
predicted_latex: Option<String>,
ground_truth_text: String,
ground_truth_latex: Option<String>,
confidence: f32,
text_accuracy: f32,
latex_accuracy: Option<f32>,
character_error_rate: f32,
word_error_rate: f32,
}
#[derive(Debug, Serialize)]
struct AccuracyMetrics {
total_cases: usize,
successful_cases: usize,
failed_cases: usize,
average_confidence: f32,
average_text_accuracy: f32,
average_latex_accuracy: f32,
average_cer: f32,
average_wer: f32,
category_breakdown: HashMap<String, CategoryMetrics>,
confidence_correlation: f32,
}
#[derive(Debug, Serialize)]
struct CategoryMetrics {
count: usize,
average_accuracy: f32,
average_confidence: f32,
}
#[tokio::main]
async fn main() -> Result<()> {
let args: Vec<String> = std::env::args().collect();
if args.len() < 2 {
eprintln!("Usage: {} <dataset.json>", args[0]);
eprintln!("\nDataset format:");
eprintln!(
r#"[
{{
"image_path": "path/to/image.png",
"ground_truth_text": "x^2 + 2x + 1 = 0",
"ground_truth_latex": "x^{{2}} + 2x + 1 = 0",
"category": "quadratic"
}}
]"#
);
std::process::exit(1);
}
let dataset_path = &args[1];
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
println!("Loading test dataset: {}", dataset_path);
let dataset_content = std::fs::read_to_string(dataset_path)?;
let test_cases: Vec<TestCase> = serde_json::from_str(&dataset_content)?;
println!("Loaded {} test cases", test_cases.len());
// Initialize OCR engine
println!("Initializing OCR engine...");
let config = OcrConfig::default();
let engine = OcrEngine::new(config).await?;
println!("Running accuracy tests...\n");
let mut results = Vec::new();
for (idx, test_case) in test_cases.iter().enumerate() {
println!(
"[{}/{}] Processing: {}",
idx + 1,
test_cases.len(),
test_case.image_path
);
match run_test_case(&engine, test_case).await {
Ok(result) => {
println!(
" Accuracy: {:.2}%, CER: {:.2}%, WER: {:.2}%",
result.text_accuracy * 100.0,
result.character_error_rate * 100.0,
result.word_error_rate * 100.0
);
results.push(result);
}
Err(e) => {
eprintln!(" Error: {}", e);
}
}
}
// Calculate overall metrics
let metrics = calculate_metrics(&results);
// Display results
println!("\n{}", "=".repeat(80));
println!("Accuracy Test Results");
println!("{}", "=".repeat(80));
println!("Total Cases: {}", metrics.total_cases);
println!(
"Successful: {} ({:.1}%)",
metrics.successful_cases,
(metrics.successful_cases as f32 / metrics.total_cases as f32) * 100.0
);
println!("Failed: {}", metrics.failed_cases);
println!("\n📊 Overall Metrics:");
println!(
" Average Confidence: {:.2}%",
metrics.average_confidence * 100.0
);
println!(
" Average Text Accuracy: {:.2}%",
metrics.average_text_accuracy * 100.0
);
println!(
" Average LaTeX Accuracy: {:.2}%",
metrics.average_latex_accuracy * 100.0
);
println!(" Average CER: {:.2}%", metrics.average_cer * 100.0);
println!(" Average WER: {:.2}%", metrics.average_wer * 100.0);
println!(
" Confidence Correlation: {:.3}",
metrics.confidence_correlation
);
if !metrics.category_breakdown.is_empty() {
println!("\n📂 Category Breakdown:");
for (category, cat_metrics) in &metrics.category_breakdown {
println!(" {}:", category);
println!(" Count: {}", cat_metrics.count);
println!(
" Average Accuracy: {:.2}%",
cat_metrics.average_accuracy * 100.0
);
println!(
" Average Confidence: {:.2}%",
cat_metrics.average_confidence * 100.0
);
}
}
println!("{}", "=".repeat(80));
// Save detailed results
let json = serde_json::to_string_pretty(&serde_json::json!({
"metrics": metrics,
"results": results
}))?;
std::fs::write("accuracy_results.json", json)?;
println!("\nDetailed results saved to: accuracy_results.json");
Ok(())
}
async fn run_test_case(engine: &OcrEngine, test_case: &TestCase) -> Result<TestResult> {
let image = image::open(&test_case.image_path)
.context(format!("Failed to load image: {}", test_case.image_path))?;
let ocr_result = engine.recognize(&image).await?;
let predicted_text = ocr_result.text.clone();
let predicted_latex = ocr_result.to_format(OutputFormat::LaTeX).ok();
let text_accuracy = calculate_accuracy(&predicted_text, &test_case.ground_truth_text);
let latex_accuracy =
if let (Some(pred), Some(gt)) = (&predicted_latex, &test_case.ground_truth_latex) {
Some(calculate_accuracy(pred, gt))
} else {
None
};
let cer = calculate_character_error_rate(&predicted_text, &test_case.ground_truth_text);
let wer = calculate_word_error_rate(&predicted_text, &test_case.ground_truth_text);
Ok(TestResult {
image_path: test_case.image_path.clone(),
category: test_case
.category
.clone()
.unwrap_or_else(|| "uncategorized".to_string()),
predicted_text,
predicted_latex,
ground_truth_text: test_case.ground_truth_text.clone(),
ground_truth_latex: test_case.ground_truth_latex.clone(),
confidence: ocr_result.confidence,
text_accuracy,
latex_accuracy,
character_error_rate: cer,
word_error_rate: wer,
})
}
fn calculate_accuracy(predicted: &str, ground_truth: &str) -> f32 {
let distance = levenshtein_distance(predicted, ground_truth);
// Count characters, not bytes, so multi-byte symbols (e.g. "²") are measured correctly.
let max_len = predicted.chars().count().max(ground_truth.chars().count());
if max_len == 0 {
return 1.0;
}
1.0 - (distance as f32 / max_len as f32)
}
fn calculate_character_error_rate(predicted: &str, ground_truth: &str) -> f32 {
let distance = levenshtein_distance(predicted, ground_truth);
let gt_len = ground_truth.chars().count();
if gt_len == 0 {
return if predicted.is_empty() { 0.0 } else { 1.0 };
}
distance as f32 / gt_len as f32
}
fn calculate_word_error_rate(predicted: &str, ground_truth: &str) -> f32 {
let pred_words: Vec<&str> = predicted.split_whitespace().collect();
let gt_words: Vec<&str> = ground_truth.split_whitespace().collect();
let distance = levenshtein_distance_vec(&pred_words, &gt_words);
if gt_words.is_empty() {
return if pred_words.is_empty() { 0.0 } else { 1.0 };
}
distance as f32 / gt_words.len() as f32
}
fn levenshtein_distance(s1: &str, s2: &str) -> usize {
// Compare char sequences rather than bytes: sizing the DP table with byte
// lengths while iterating chars breaks on multi-byte UTF-8 input.
let chars1: Vec<char> = s1.chars().collect();
let chars2: Vec<char> = s2.chars().collect();
levenshtein_distance_vec(&chars1, &chars2)
}
fn levenshtein_distance_vec<T: Eq>(s1: &[T], s2: &[T]) -> usize {
let len1 = s1.len();
let len2 = s2.len();
let mut matrix = vec![vec![0; len2 + 1]; len1 + 1];
for i in 0..=len1 {
matrix[i][0] = i;
}
for j in 0..=len2 {
matrix[0][j] = j;
}
for i in 0..len1 {
for j in 0..len2 {
let cost = if s1[i] == s2[j] { 0 } else { 1 };
matrix[i + 1][j + 1] = *[
matrix[i][j + 1] + 1,
matrix[i + 1][j] + 1,
matrix[i][j] + cost,
]
.iter()
.min()
.unwrap();
}
}
matrix[len1][len2]
}
fn calculate_metrics(results: &[TestResult]) -> AccuracyMetrics {
// Failed cases are logged and skipped in main(), so `results` only
// contains successful recognitions.
let total_cases = results.len();
let successful_cases = results.len();
let failed_cases = 0;
// Guard against an empty result set so the averages stay finite.
let denom = total_cases.max(1) as f32;
let average_confidence = results.iter().map(|r| r.confidence).sum::<f32>() / denom;
let average_text_accuracy = results.iter().map(|r| r.text_accuracy).sum::<f32>() / denom;
let latex_count = results
.iter()
.filter(|r| r.latex_accuracy.is_some())
.count();
let average_latex_accuracy = if latex_count > 0 {
results.iter().filter_map(|r| r.latex_accuracy).sum::<f32>() / latex_count as f32
} else {
0.0
};
let average_cer = results.iter().map(|r| r.character_error_rate).sum::<f32>() / denom;
let average_wer = results.iter().map(|r| r.word_error_rate).sum::<f32>() / denom;
// Calculate category breakdown
let mut category_breakdown = HashMap::new();
for result in results {
let entry = category_breakdown
.entry(result.category.clone())
.or_insert_with(|| CategoryMetrics {
count: 0,
average_accuracy: 0.0,
average_confidence: 0.0,
});
entry.count += 1;
entry.average_accuracy += result.text_accuracy;
entry.average_confidence += result.confidence;
}
for metrics in category_breakdown.values_mut() {
metrics.average_accuracy /= metrics.count as f32;
metrics.average_confidence /= metrics.count as f32;
}
// Calculate confidence correlation (Pearson correlation)
let confidence_correlation = calculate_pearson_correlation(
&results.iter().map(|r| r.confidence).collect::<Vec<_>>(),
&results.iter().map(|r| r.text_accuracy).collect::<Vec<_>>(),
);
AccuracyMetrics {
total_cases,
successful_cases,
failed_cases,
average_confidence,
average_text_accuracy,
average_latex_accuracy,
average_cer,
average_wer,
category_breakdown,
confidence_correlation,
}
}
fn calculate_pearson_correlation(x: &[f32], y: &[f32]) -> f32 {
// An empty or mismatched sample has no defined correlation; return 0.
if x.is_empty() || x.len() != y.len() {
return 0.0;
}
let n = x.len() as f32;
let mean_x = x.iter().sum::<f32>() / n;
let mean_y = y.iter().sum::<f32>() / n;
let mut numerator = 0.0;
let mut sum_sq_x = 0.0;
let mut sum_sq_y = 0.0;
for i in 0..x.len() {
let diff_x = x[i] - mean_x;
let diff_y = y[i] - mean_y;
numerator += diff_x * diff_y;
sum_sq_x += diff_x * diff_x;
sum_sq_y += diff_y * diff_y;
}
if sum_sq_x == 0.0 || sum_sq_y == 0.0 {
return 0.0;
}
numerator / (sum_sq_x * sum_sq_y).sqrt()
}

//! API server example
//!
//! This example demonstrates how to create a REST API server for OCR processing.
//! It includes model preloading, graceful shutdown, and health checks.
//!
//! Usage:
//! ```bash
//! cargo run --example api_server
//!
//! # Then in another terminal:
//! curl -X POST -F "image=@equation.png" http://localhost:8080/ocr
//! ```
use axum::{
extract::{Multipart, State},
http::StatusCode,
response::{IntoResponse, Json},
routing::{get, post},
Router,
};
use ruvector_scipix::{OcrConfig, OcrEngine, OutputFormat};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use tokio::signal;
use tower_http::cors::CorsLayer;
#[derive(Clone)]
struct AppState {
engine: Arc<OcrEngine>,
}
#[derive(Serialize, Deserialize)]
struct OcrResponse {
success: bool,
text: Option<String>,
latex: Option<String>,
confidence: Option<f32>,
error: Option<String>,
}
#[derive(Serialize)]
struct HealthResponse {
status: String,
version: String,
models_loaded: bool,
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
println!("Initializing OCR engine...");
// Configure OCR engine
let config = OcrConfig::default();
let engine = OcrEngine::new(config).await?;
// Preload models for faster first request
println!("Preloading models...");
// TODO: Add model preloading method to OcrEngine
let state = AppState {
engine: Arc::new(engine),
};
// Build router
let app = Router::new()
.route("/health", get(health_check))
.route("/ocr", post(process_ocr))
.route("/batch", post(process_batch))
.layer(CorsLayer::permissive())
.with_state(state);
let addr = "0.0.0.0:8080";
println!("Starting server on http://{}", addr);
println!("\nEndpoints:");
println!(" GET /health - Health check");
println!(" POST /ocr - Process single image");
println!(" POST /batch - Process multiple images");
println!("\nPress Ctrl+C to shutdown");
let listener = tokio::net::TcpListener::bind(addr).await?;
// Run server with graceful shutdown
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal())
.await?;
println!("\nServer shutdown complete");
Ok(())
}
async fn health_check() -> impl IntoResponse {
Json(HealthResponse {
status: "healthy".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
models_loaded: true,
})
}
async fn process_ocr(State(state): State<AppState>, mut multipart: Multipart) -> impl IntoResponse {
// Treat a malformed multipart stream as "no more fields" rather than panicking.
while let Ok(Some(field)) = multipart.next_field().await {
if field.name() == Some("image") {
let data = match field.bytes().await {
Ok(bytes) => bytes,
Err(e) => {
return (
StatusCode::BAD_REQUEST,
Json(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("Failed to read image: {}", e)),
}),
);
}
};
let image = match image::load_from_memory(&data) {
Ok(img) => img,
Err(e) => {
return (
StatusCode::BAD_REQUEST,
Json(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("Invalid image format: {}", e)),
}),
);
}
};
match state.engine.recognize(&image).await {
Ok(result) => {
return (
StatusCode::OK,
Json(OcrResponse {
success: true,
text: Some(result.text.clone()),
latex: result.to_format(OutputFormat::LaTeX).ok(),
confidence: Some(result.confidence),
error: None,
}),
);
}
Err(e) => {
return (
StatusCode::INTERNAL_SERVER_ERROR,
Json(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("OCR failed: {}", e)),
}),
);
}
}
}
}
(
StatusCode::BAD_REQUEST,
Json(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some("No image field found".to_string()),
}),
)
}
async fn process_batch(
State(state): State<AppState>,
mut multipart: Multipart,
) -> impl IntoResponse {
let mut results = Vec::new();
// Treat a malformed multipart stream as "no more fields" rather than panicking.
while let Ok(Some(field)) = multipart.next_field().await {
if field.name() == Some("images") {
let data = match field.bytes().await {
Ok(bytes) => bytes,
Err(e) => {
results.push(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("Failed to read image: {}", e)),
});
continue;
}
};
let image = match image::load_from_memory(&data) {
Ok(img) => img,
Err(e) => {
results.push(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("Invalid image: {}", e)),
});
continue;
}
};
match state.engine.recognize(&image).await {
Ok(result) => {
results.push(OcrResponse {
success: true,
text: Some(result.text.clone()),
latex: result.to_format(OutputFormat::LaTeX).ok(),
confidence: Some(result.confidence),
error: None,
});
}
Err(e) => {
results.push(OcrResponse {
success: false,
text: None,
latex: None,
confidence: None,
error: Some(format!("OCR failed: {}", e)),
});
}
}
}
}
(StatusCode::OK, Json(results))
}
async fn shutdown_signal() {
let ctrl_c = async {
signal::ctrl_c()
.await
.expect("Failed to install Ctrl+C handler");
};
#[cfg(unix)]
let terminate = async {
signal::unix::signal(signal::unix::SignalKind::terminate())
.expect("Failed to install signal handler")
.recv()
.await;
};
#[cfg(not(unix))]
let terminate = std::future::pending::<()>();
tokio::select! {
_ = ctrl_c => {
println!("\nReceived Ctrl+C, shutting down gracefully...");
},
_ = terminate => {
println!("\nReceived termination signal, shutting down gracefully...");
},
}
}

//! Batch processing example
//!
//! This example demonstrates parallel batch processing of multiple images.
//! It processes all images in a directory concurrently with a progress bar.
//!
//! Note: This example requires the `ocr` feature to be enabled.
//!
//! Usage:
//! ```bash
//! cargo run --example batch_processing --features ocr -- /path/to/images output.json
//! ```
use anyhow::Result;
use indicatif::{ProgressBar, ProgressStyle};
use ruvector_scipix::ocr::OcrEngine;
use ruvector_scipix::OcrConfig;
use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};
use std::sync::Arc;
use tokio::sync::Semaphore;
#[derive(Debug, Serialize, Deserialize)]
struct BatchResult {
file_path: String,
success: bool,
text: Option<String>,
latex: Option<String>,
confidence: Option<f32>,
error: Option<String>,
}
#[tokio::main]
async fn main() -> Result<()> {
let args: Vec<String> = std::env::args().collect();
if args.len() < 3 {
eprintln!("Usage: {} <image_directory> <output_json>", args[0]);
eprintln!("\nExample:");
eprintln!(" {} ./images results.json", args[0]);
std::process::exit(1);
}
let image_dir = Path::new(&args[1]);
let output_file = &args[2];
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
// Collect all image files
let image_files = collect_image_files(image_dir)?;
if image_files.is_empty() {
eprintln!("No image files found in: {}", image_dir.display());
std::process::exit(1);
}
println!("Found {} images to process", image_files.len());
// Initialize OCR engine
let config = OcrConfig::default();
let engine = Arc::new(OcrEngine::new(config).await?);
// Create progress bar
let progress = ProgressBar::new(image_files.len() as u64);
progress.set_style(
ProgressStyle::default_bar()
.template("[{elapsed_precise}] {bar:40.cyan/blue} {pos}/{len} {msg}")
.unwrap()
.progress_chars("=>-"),
);
// Limit concurrent processing to avoid overwhelming the system
let max_concurrent = num_cpus::get();
let semaphore = Arc::new(Semaphore::new(max_concurrent));
// Process images in parallel
let mut tasks = Vec::new();
for image_path in image_files {
let engine = Arc::clone(&engine);
let semaphore = Arc::clone(&semaphore);
let progress = progress.clone();
let task = tokio::spawn(async move {
let _permit = semaphore.acquire().await.unwrap();
let result = process_image(&engine, &image_path).await;
progress.inc(1);
result
});
tasks.push(task);
}
// Wait for all tasks to complete
let mut results = Vec::new();
for task in tasks {
results.push(task.await?);
}
progress.finish_with_message("Complete");
// Calculate statistics
let successful = results.iter().filter(|r| r.success).count();
let failed = results.len() - successful;
// Avoid NaN when every image failed.
let avg_confidence = if successful > 0 {
results.iter().filter_map(|r| r.confidence).sum::<f32>() / successful as f32
} else {
0.0
};
println!("\n{}", "=".repeat(80));
println!("Batch Processing Complete");
println!("{}", "=".repeat(80));
println!("Total: {}", results.len());
println!(
"Successful: {} ({:.1}%)",
successful,
(successful as f32 / results.len() as f32) * 100.0
);
println!("Failed: {}", failed);
println!("Average Confidence: {:.2}%", avg_confidence * 100.0);
println!("{}", "=".repeat(80));
// Save results to JSON
let json = serde_json::to_string_pretty(&results)?;
std::fs::write(output_file, json)?;
println!("\nResults saved to: {}", output_file);
Ok(())
}
fn collect_image_files(dir: &Path) -> Result<Vec<PathBuf>> {
let mut files = Vec::new();
let extensions = ["png", "jpg", "jpeg", "bmp", "tiff", "webp"];
for entry in std::fs::read_dir(dir)? {
let entry = entry?;
let path = entry.path();
if path.is_file() {
if let Some(ext) = path.extension() {
if extensions.contains(&ext.to_str().unwrap_or("").to_lowercase().as_str()) {
files.push(path);
}
}
}
}
Ok(files)
}
async fn process_image(engine: &OcrEngine, path: &Path) -> BatchResult {
let file_path = path.to_string_lossy().to_string();
match image::open(path) {
Ok(img) => match engine.recognize(&img).await {
Ok(result) => BatchResult {
file_path,
success: true,
text: Some(result.text.clone()),
latex: result.to_format(ruvector_scipix::OutputFormat::LaTeX).ok(),
confidence: Some(result.confidence),
error: None,
},
Err(e) => BatchResult {
file_path,
success: false,
text: None,
latex: None,
confidence: None,
error: Some(e.to_string()),
},
},
Err(e) => BatchResult {
file_path,
success: false,
text: None,
latex: None,
confidence: None,
error: Some(e.to_string()),
},
}
}

//! Custom pipeline example
//!
//! This example demonstrates how to create a custom OCR pipeline with:
//! - Custom preprocessing steps
//! - Post-processing filters
//! - Integration with external services
//! - Custom output formatting
//!
//! Usage:
//! ```bash
//! cargo run --example custom_pipeline -- image.png
//! ```
use anyhow::{Context, Result};
use image::{DynamicImage, ImageBuffer, Luma};
use ruvector_scipix::{OcrConfig, OcrEngine, OcrResult, OutputFormat};
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone)]
struct CustomPipeline {
engine: OcrEngine,
preprocessing: Vec<PreprocessStep>,
postprocessing: Vec<PostprocessStep>,
}
#[derive(Debug, Clone)]
enum PreprocessStep {
Denoise,
Sharpen,
ContrastEnhancement,
Binarization,
Deskew,
}
#[derive(Debug, Clone)]
enum PostprocessStep {
SpellCheck,
LatexValidation,
ConfidenceFilter(f32),
CustomFormatter,
}
#[derive(Debug, Serialize, Deserialize)]
struct PipelineResult {
original_result: String,
processed_result: String,
latex: String,
confidence: f32,
preprocessing_steps: Vec<String>,
postprocessing_steps: Vec<String>,
validation_results: ValidationResults,
}
#[derive(Debug, Serialize, Deserialize)]
struct ValidationResults {
latex_valid: bool,
spell_check_corrections: usize,
confidence_threshold_passed: bool,
}
impl CustomPipeline {
async fn new(config: OcrConfig) -> Result<Self> {
let engine = OcrEngine::new(config).await?;
Ok(Self {
engine,
preprocessing: vec![
PreprocessStep::Denoise,
PreprocessStep::ContrastEnhancement,
PreprocessStep::Sharpen,
PreprocessStep::Binarization,
],
postprocessing: vec![
PostprocessStep::ConfidenceFilter(0.7),
PostprocessStep::LatexValidation,
PostprocessStep::SpellCheck,
PostprocessStep::CustomFormatter,
],
})
}
async fn process(&self, image: DynamicImage) -> Result<PipelineResult> {
// Apply preprocessing steps
let mut processed_image = image;
let mut preprocessing_log = Vec::new();
for step in &self.preprocessing {
processed_image = self.apply_preprocessing(processed_image, step)?;
preprocessing_log.push(format!("{:?}", step));
}
// Run OCR
let ocr_result = self.engine.recognize(&processed_image).await?;
// Apply postprocessing steps
let mut result_text = ocr_result.text.clone();
let mut postprocessing_log = Vec::new();
let mut validation = ValidationResults {
latex_valid: false,
spell_check_corrections: 0,
confidence_threshold_passed: false,
};
for step in &self.postprocessing {
let (new_text, step_validation) =
self.apply_postprocessing(result_text.clone(), &ocr_result, step)?;
result_text = new_text;
postprocessing_log.push(format!("{:?}", step));
// Update validation results
match step {
PostprocessStep::LatexValidation => {
validation.latex_valid = step_validation.unwrap_or(false);
}
PostprocessStep::SpellCheck => {
validation.spell_check_corrections = step_validation.unwrap_or(0) as usize;
}
PostprocessStep::ConfidenceFilter(threshold) => {
validation.confidence_threshold_passed = ocr_result.confidence >= *threshold;
}
_ => {}
}
}
Ok(PipelineResult {
original_result: ocr_result.text.clone(),
processed_result: result_text,
latex: ocr_result.to_format(OutputFormat::LaTeX)?,
confidence: ocr_result.confidence,
preprocessing_steps: preprocessing_log,
postprocessing_steps: postprocessing_log,
validation_results: validation,
})
}
fn apply_preprocessing(
&self,
image: DynamicImage,
step: &PreprocessStep,
) -> Result<DynamicImage> {
match step {
PreprocessStep::Denoise => Ok(denoise_image(image)),
PreprocessStep::Sharpen => Ok(sharpen_image(image)),
PreprocessStep::ContrastEnhancement => Ok(enhance_contrast(image)),
PreprocessStep::Binarization => Ok(binarize_image(image)),
PreprocessStep::Deskew => Ok(deskew_image(image)),
}
}
fn apply_postprocessing(
&self,
text: String,
result: &OcrResult,
step: &PostprocessStep,
) -> Result<(String, Option<i32>)> {
match step {
PostprocessStep::SpellCheck => {
let (corrected, corrections) = spell_check(&text);
Ok((corrected, Some(corrections as i32)))
}
PostprocessStep::LatexValidation => {
let valid = validate_latex(&text);
Ok((text, Some(if valid { 1 } else { 0 })))
}
PostprocessStep::ConfidenceFilter(threshold) => {
if result.confidence >= *threshold {
Ok((text, Some(1)))
} else {
Ok((format!("[Low Confidence] {}", text), Some(0)))
}
}
PostprocessStep::CustomFormatter => {
let formatted = custom_format(&text);
Ok((formatted, None))
}
}
}
}
// Preprocessing implementations
fn denoise_image(image: DynamicImage) -> DynamicImage {
    // Simplified denoising using a light Gaussian blur (a median filter would preserve edges better)
image.blur(1.0)
}
fn sharpen_image(image: DynamicImage) -> DynamicImage {
// Simplified sharpening
image.unsharpen(2.0, 1)
}
fn enhance_contrast(image: DynamicImage) -> DynamicImage {
// Simplified contrast enhancement
image.adjust_contrast(20.0)
}
fn binarize_image(image: DynamicImage) -> DynamicImage {
// Otsu's binarization (simplified)
let gray = image.to_luma8();
let threshold = calculate_otsu_threshold(&gray);
let binary = ImageBuffer::from_fn(gray.width(), gray.height(), |x, y| {
let pixel = gray.get_pixel(x, y)[0];
if pixel > threshold {
Luma([255u8])
} else {
Luma([0u8])
}
});
DynamicImage::ImageLuma8(binary)
}
fn deskew_image(image: DynamicImage) -> DynamicImage {
// Simplified deskew - in production, use Hough transform
image
}
fn calculate_otsu_threshold(gray: &ImageBuffer<Luma<u8>, Vec<u8>>) -> u8 {
// Simplified Otsu's method
let mut histogram = [0u32; 256];
for pixel in gray.pixels() {
histogram[pixel[0] as usize] += 1;
}
let total = gray.width() * gray.height();
let mut sum = 0u64;
for (i, &count) in histogram.iter().enumerate() {
sum += i as u64 * count as u64;
}
let mut sum_background = 0u64;
let mut weight_background = 0u32;
let mut max_variance = 0.0f64;
let mut threshold = 0u8;
for (t, &count) in histogram.iter().enumerate() {
weight_background += count;
if weight_background == 0 {
continue;
}
let weight_foreground = total - weight_background;
if weight_foreground == 0 {
break;
}
sum_background += t as u64 * count as u64;
let mean_background = sum_background as f64 / weight_background as f64;
let mean_foreground = (sum - sum_background) as f64 / weight_foreground as f64;
let variance = weight_background as f64
* weight_foreground as f64
* (mean_background - mean_foreground).powi(2);
if variance > max_variance {
max_variance = variance;
threshold = t as u8;
}
}
threshold
}
// Postprocessing implementations
fn spell_check(text: &str) -> (String, usize) {
// Simplified spell check - in production, use a proper library
// For demo, just return the original text
(text.to_string(), 0)
}
fn validate_latex(text: &str) -> bool {
// Simplified LaTeX validation
// Check for balanced braces and common LaTeX patterns
let open_braces = text.matches('{').count();
let close_braces = text.matches('}').count();
open_braces == close_braces
}
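// Note: the brace-count check above accepts out-of-order input such as "}{".
// A stricter, order-aware variant (illustrative only, not wired into the
// pipeline) tracks nesting depth instead:
#[allow(dead_code)]
fn validate_latex_ordered(text: &str) -> bool {
    let mut depth: i64 = 0;
    for c in text.chars() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                // A closing brace before its matching open is invalid.
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}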
fn custom_format(text: &str) -> String {
// Custom formatting - e.g., add proper spacing, formatting
text.lines()
.map(|line| line.trim())
.filter(|line| !line.is_empty())
.collect::<Vec<_>>()
.join("\n")
}
#[tokio::main]
async fn main() -> Result<()> {
let args: Vec<String> = std::env::args().collect();
if args.len() < 2 {
eprintln!("Usage: {} <image_path>", args[0]);
std::process::exit(1);
}
let image_path = &args[1];
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
println!("Loading image: {}", image_path);
let image = image::open(image_path)?;
// Create custom pipeline
let config = OcrConfig::default();
let pipeline = CustomPipeline::new(config).await?;
println!("Processing with custom pipeline...");
let result = pipeline.process(image).await?;
// Display results
println!("\n{}", "=".repeat(80));
println!("Pipeline Results");
println!("{}", "=".repeat(80));
println!("\n📝 Original OCR Result:");
println!("{}", result.original_result);
println!("\n✨ Processed Result:");
println!("{}", result.processed_result);
println!("\n🔢 LaTeX:");
println!("{}", result.latex);
println!("\n📊 Confidence: {:.2}%", result.confidence * 100.0);
println!("\n🔧 Preprocessing Steps:");
for step in &result.preprocessing_steps {
println!(" - {}", step);
}
println!("\n🔄 Postprocessing Steps:");
for step in &result.postprocessing_steps {
println!(" - {}", step);
}
println!("\n✅ Validation:");
println!(" LaTeX Valid: {}", result.validation_results.latex_valid);
println!(
" Spell Corrections: {}",
result.validation_results.spell_check_corrections
);
println!(
" Confidence Passed: {}",
result.validation_results.confidence_threshold_passed
);
println!("\n{}", "=".repeat(80));
// Save full results
let json = serde_json::to_string_pretty(&result)?;
std::fs::write("pipeline_results.json", json)?;
println!("\nFull results saved to: pipeline_results.json");
Ok(())
}


@@ -0,0 +1,303 @@
//! Lean Agentic integration example
//!
//! This example demonstrates distributed OCR processing using agent coordination.
//! Multiple agents work together to process documents in parallel with fault tolerance.
//!
//! Usage:
//! ```bash
//! cargo run --example lean_agentic -- /path/to/documents
//! ```
use anyhow::{Context, Result};
use ruvector_scipix::{OcrConfig, OcrEngine};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};
#[derive(Debug, Clone, Serialize, Deserialize)]
struct OcrTask {
id: String,
file_path: String,
priority: u8,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
struct OcrTaskResult {
task_id: String,
agent_id: String,
success: bool,
text: Option<String>,
latex: Option<String>,
confidence: Option<f32>,
processing_time_ms: u64,
error: Option<String>,
}
#[derive(Debug, Clone)]
struct OcrAgent {
id: String,
engine: Arc<OcrEngine>,
tasks_completed: Arc<RwLock<usize>>,
}
impl OcrAgent {
async fn new(id: String, config: OcrConfig) -> Result<Self> {
let engine = OcrEngine::new(config).await?;
Ok(Self {
id,
engine: Arc::new(engine),
tasks_completed: Arc::new(RwLock::new(0)),
})
}
async fn process_task(&self, task: OcrTask) -> OcrTaskResult {
let start = std::time::Instant::now();
println!("[Agent {}] Processing task: {}", self.id, task.id);
let result = match image::open(&task.file_path) {
Ok(img) => match self.engine.recognize(&img).await {
Ok(ocr_result) => {
let mut count = self.tasks_completed.write().await;
*count += 1;
OcrTaskResult {
task_id: task.id,
agent_id: self.id.clone(),
success: true,
text: Some(ocr_result.text.clone()),
latex: ocr_result
.to_format(ruvector_scipix::OutputFormat::LaTeX)
.ok(),
confidence: Some(ocr_result.confidence),
processing_time_ms: start.elapsed().as_millis() as u64,
error: None,
}
}
Err(e) => OcrTaskResult {
task_id: task.id,
agent_id: self.id.clone(),
success: false,
text: None,
latex: None,
confidence: None,
processing_time_ms: start.elapsed().as_millis() as u64,
error: Some(e.to_string()),
},
},
Err(e) => OcrTaskResult {
task_id: task.id,
agent_id: self.id.clone(),
success: false,
text: None,
latex: None,
confidence: None,
processing_time_ms: start.elapsed().as_millis() as u64,
error: Some(e.to_string()),
},
};
println!(
"[Agent {}] Completed task: {} ({}ms)",
self.id, result.task_id, result.processing_time_ms
);
result
}
async fn get_stats(&self) -> usize {
*self.tasks_completed.read().await
}
}
struct AgentCoordinator {
agents: Vec<Arc<OcrAgent>>,
task_queue: mpsc::Sender<OcrTask>,
result_queue: mpsc::Receiver<OcrTaskResult>,
results: Arc<RwLock<HashMap<String, OcrTaskResult>>>,
}
impl AgentCoordinator {
async fn new(num_agents: usize, config: OcrConfig) -> Result<Self> {
let mut agents = Vec::new();
for i in 0..num_agents {
let agent = OcrAgent::new(format!("agent-{}", i), config.clone()).await?;
agents.push(Arc::new(agent));
}
        let (task_tx, task_rx) = mpsc::channel::<OcrTask>(100);
        let (result_tx, result_rx) = mpsc::channel::<OcrTaskResult>(100);
        // tokio's mpsc receiver cannot be cloned (resubscribe only exists on
        // broadcast channels), so the workers share it behind a mutex and
        // take turns pulling the next task.
        let task_rx = Arc::new(tokio::sync::Mutex::new(task_rx));
        // Spawn agent workers
        for agent in &agents {
            let agent = Arc::clone(agent);
            let task_rx = Arc::clone(&task_rx);
            let result_tx = result_tx.clone();
            tokio::spawn(async move {
                loop {
                    // Hold the lock only while waiting for one task.
                    let task = { task_rx.lock().await.recv().await };
                    match task {
                        Some(task) => {
                            let result = agent.process_task(task).await;
                            let _ = result_tx.send(result).await;
                        }
                        None => break,
                    }
                }
            });
        }
Ok(Self {
agents,
task_queue: task_tx,
result_queue: result_rx,
results: Arc::new(RwLock::new(HashMap::new())),
})
}
async fn submit_task(&self, task: OcrTask) -> Result<()> {
self.task_queue
.send(task)
.await
.context("Failed to submit task")?;
Ok(())
}
async fn collect_results(&mut self, expected: usize) -> Vec<OcrTaskResult> {
let mut collected = Vec::new();
while collected.len() < expected {
if let Some(result) = self.result_queue.recv().await {
let mut results = self.results.write().await;
results.insert(result.task_id.clone(), result.clone());
collected.push(result);
}
}
collected
}
async fn get_agent_stats(&self) -> HashMap<String, usize> {
let mut stats = HashMap::new();
for agent in &self.agents {
let count = agent.get_stats().await;
stats.insert(agent.id.clone(), count);
}
stats
}
}
// Note: This is a simplified implementation. In production, you would integrate with
// an actual agent framework like:
// - lean_agentic crate for agent coordination
// - tokio actors for distributed processing
// - Or a custom agent framework
#[tokio::main]
async fn main() -> Result<()> {
let args: Vec<String> = std::env::args().collect();
if args.len() < 2 {
eprintln!("Usage: {} <documents_directory>", args[0]);
eprintln!("\nExample:");
eprintln!(" {} ./documents", args[0]);
std::process::exit(1);
}
let docs_dir = Path::new(&args[1]);
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
println!("🤖 Initializing Agent Swarm...");
// Create agent coordinator with 4 agents
let num_agents = 4;
let config = OcrConfig::default();
let mut coordinator = AgentCoordinator::new(num_agents, config).await?;
println!("✅ Spawned {} OCR agents", num_agents);
// Collect tasks
let mut tasks = Vec::new();
for entry in std::fs::read_dir(docs_dir)? {
let entry = entry?;
let path = entry.path();
if path.is_file() {
if let Some(ext) = path.extension() {
let ext_str = ext.to_str().unwrap_or("").to_lowercase();
if ["png", "jpg", "jpeg", "bmp", "tiff", "webp"].contains(&ext_str.as_str()) {
let task = OcrTask {
id: format!("task-{}", tasks.len()),
file_path: path.to_string_lossy().to_string(),
priority: 1,
};
tasks.push(task);
}
}
}
}
if tasks.is_empty() {
eprintln!("No image files found in: {}", docs_dir.display());
std::process::exit(1);
}
println!("📋 Queued {} tasks for processing", tasks.len());
// Submit all tasks
for task in &tasks {
coordinator.submit_task(task.clone()).await?;
}
println!("🚀 Processing started...\n");
let start_time = std::time::Instant::now();
// Collect results
let results = coordinator.collect_results(tasks.len()).await;
let total_time = start_time.elapsed();
// Calculate statistics
let successful = results.iter().filter(|r| r.success).count();
let failed = results.len() - successful;
let avg_confidence =
results.iter().filter_map(|r| r.confidence).sum::<f32>() / successful.max(1) as f32;
let avg_time = results.iter().map(|r| r.processing_time_ms).sum::<u64>() / results.len() as u64;
// Display results
println!("\n{}", "=".repeat(80));
println!("Agent Swarm Results");
println!("{}", "=".repeat(80));
println!("Total Tasks: {}", results.len());
println!(
"Successful: {} ({:.1}%)",
successful,
(successful as f32 / results.len() as f32) * 100.0
);
println!("Failed: {}", failed);
println!("Average Confidence: {:.2}%", avg_confidence * 100.0);
println!("Average Processing Time: {}ms", avg_time);
println!("Total Time: {:.2}s", total_time.as_secs_f32());
println!(
"Throughput: {:.2} tasks/sec",
results.len() as f32 / total_time.as_secs_f32()
);
// Agent statistics
println!("\n📊 Agent Statistics:");
let agent_stats = coordinator.get_agent_stats().await;
for (agent_id, count) in agent_stats {
println!(" {}: {} tasks", agent_id, count);
}
println!("{}", "=".repeat(80));
// Save results
let json = serde_json::to_string_pretty(&results)?;
std::fs::write("agent_results.json", json)?;
println!("\nResults saved to: agent_results.json");
Ok(())
}
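// The module doc mentions fault tolerance, but this simplified example does
// not retry failed tasks. An illustrative generic retry helper (a sketch,
// not wired into the coordinator) could look like:
#[allow(dead_code)]
fn retry<T, F: FnMut() -> Result<T, String>>(mut op: F, max_attempts: u32) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match op() {
            Ok(v) => return Ok(v),
            // Give up once the attempt budget is exhausted.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => continue,
        }
    }
}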


@@ -0,0 +1,311 @@
//! Demonstration of performance optimizations in ruvector-scipix
//!
//! This example shows how to use various optimization features:
//! - SIMD operations for image processing
//! - Parallel batch processing
//! - Memory pooling
//! - Model quantization
//! - Dynamic batching
use ruvector_scipix::optimize::*;
use std::sync::Arc;
use std::time::Instant;
fn main() {
println!("=== Ruvector-Scipix Optimization Demo ===\n");
// 1. Feature Detection
demo_feature_detection();
// 2. SIMD Operations
demo_simd_operations();
// 3. Parallel Processing
demo_parallel_processing();
// 4. Memory Optimizations
demo_memory_optimizations();
// 5. Model Quantization
demo_quantization();
println!("\n=== Demo Complete ===");
}
fn demo_feature_detection() {
println!("1. CPU Feature Detection");
println!("------------------------");
let features = detect_features();
    println!("AVX2 Support: {}", if features.avx2 { "✓" } else { "✗" });
    println!(
        "AVX-512 Support: {}",
        if features.avx512f { "✓" } else { "✗" }
    );
    println!("NEON Support: {}", if features.neon { "✓" } else { "✗" });
    println!(
        "SSE4.2 Support: {}",
        if features.sse4_2 { "✓" } else { "✗" }
    );
let opt_level = get_opt_level();
println!("Optimization Level: {:?}", opt_level);
println!();
}
fn demo_simd_operations() {
println!("2. SIMD Operations");
println!("------------------");
// Create test image (512x512 RGBA)
let size = 512;
let rgba: Vec<u8> = (0..size * size * 4).map(|i| (i % 256) as u8).collect();
let mut gray = vec![0u8; size * size];
// Benchmark grayscale conversion
let iterations = 100;
let start = Instant::now();
for _ in 0..iterations {
simd::simd_grayscale(&rgba, &mut gray);
}
let simd_time = start.elapsed();
println!("Grayscale conversion ({} iterations):", iterations);
println!(
" SIMD: {:?} ({:.2} MP/s)",
simd_time,
(iterations as f64 * size as f64 * size as f64 / 1_000_000.0) / simd_time.as_secs_f64()
);
// Benchmark threshold
let mut binary = vec![0u8; size * size];
let start = Instant::now();
for _ in 0..iterations {
simd::simd_threshold(&gray, 128, &mut binary);
}
let threshold_time = start.elapsed();
println!("Threshold operation ({} iterations):", iterations);
println!(
" SIMD: {:?} ({:.2} MP/s)",
threshold_time,
(iterations as f64 * size as f64 * size as f64 / 1_000_000.0)
/ threshold_time.as_secs_f64()
);
// Benchmark normalization
let mut data: Vec<f32> = (0..8192).map(|i| i as f32).collect();
let start = Instant::now();
for _ in 0..iterations {
simd::simd_normalize(&mut data);
}
let normalize_time = start.elapsed();
println!("Normalization ({} iterations):", iterations);
println!(" SIMD: {:?}", normalize_time);
println!();
}
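// For context, a scalar grayscale baseline using the common BT.601 luma
// weights (illustrative; simd::simd_grayscale may use different weights):
#[allow(dead_code)]
fn scalar_grayscale(rgba: &[u8], gray: &mut [u8]) {
    for (px, out) in rgba.chunks_exact(4).zip(gray.iter_mut()) {
        // luma = 0.299 R + 0.587 G + 0.114 B
        let luma = 0.299 * px[0] as f32 + 0.587 * px[1] as f32 + 0.114 * px[2] as f32;
        *out = luma as u8;
    }
}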
fn demo_parallel_processing() {
println!("3. Parallel Processing");
println!("----------------------");
let data: Vec<i32> = (0..10000).collect();
// Sequential processing
let start = Instant::now();
let _seq_result: Vec<i32> = data.iter().map(|&x| expensive_computation(x)).collect();
let seq_time = start.elapsed();
// Parallel processing
let start = Instant::now();
let _par_result =
parallel::parallel_map_chunked(data.clone(), 100, |x| expensive_computation(x));
let par_time = start.elapsed();
println!("Processing 10,000 items:");
println!(" Sequential: {:?}", seq_time);
println!(" Parallel: {:?}", par_time);
println!(
" Speedup: {:.2}x",
seq_time.as_secs_f64() / par_time.as_secs_f64()
);
let threads = parallel::optimal_thread_count();
println!(" Using {} threads", threads);
println!();
}
fn expensive_computation(x: i32) -> i32 {
// Simulate some work
(0..100).fold(x, |acc, i| acc.wrapping_add(i))
}
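// Illustrative sketch of chunked parallel mapping with scoped std threads
// (the real parallel::parallel_map_chunked may be implemented differently,
// e.g. on a thread pool):
#[allow(dead_code)]
fn parallel_map_chunked_sketch<T, U, F>(data: Vec<T>, chunk_size: usize, f: F) -> Vec<U>
where
    T: Send,
    U: Send,
    F: Fn(T) -> U + Sync,
{
    // Split the input into owned chunks so each thread gets its own work.
    let mut chunks: Vec<Vec<T>> = Vec::new();
    let mut it = data.into_iter();
    loop {
        let chunk: Vec<T> = it.by_ref().take(chunk_size).collect();
        if chunk.is_empty() {
            break;
        }
        chunks.push(chunk);
    }
    std::thread::scope(|s| {
        let f = &f;
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|c| s.spawn(move || c.into_iter().map(f).collect::<Vec<U>>()))
            .collect();
        // Joining in spawn order preserves the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}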
fn demo_memory_optimizations() {
println!("4. Memory Optimizations");
println!("-----------------------");
let pools = memory::GlobalPools::get();
// Benchmark buffer pool vs direct allocation
let iterations = 10000;
// Pooled allocation
let start = Instant::now();
for _ in 0..iterations {
let mut buf = pools.acquire_small();
buf.extend_from_slice(&[0u8; 512]);
}
let pooled_time = start.elapsed();
// Direct allocation
let start = Instant::now();
for _ in 0..iterations {
let mut buf = Vec::with_capacity(1024);
buf.extend_from_slice(&[0u8; 512]);
}
let direct_time = start.elapsed();
println!("Buffer allocation ({} iterations):", iterations);
println!(" Pooled: {:?}", pooled_time);
println!(" Direct: {:?}", direct_time);
println!(
" Speedup: {:.2}x",
direct_time.as_secs_f64() / pooled_time.as_secs_f64()
);
// Arena allocation
let mut arena = memory::Arena::with_capacity(1024 * 1024);
let start = Instant::now();
for _ in 0..iterations {
arena.reset();
for _ in 0..10 {
let _slice = arena.alloc(1024, 8);
}
}
let arena_time = start.elapsed();
println!(
"\nArena allocation ({} iterations, 10 allocs each):",
iterations
);
println!(" Time: {:?}", arena_time);
println!();
}
fn demo_quantization() {
println!("5. Model Quantization");
println!("---------------------");
// Create model weights
let size = 100_000;
let weights: Vec<f32> = (0..size)
.map(|i| ((i as f32 / size as f32) * 2.0 - 1.0))
.collect();
println!(
"Original model: {} weights ({:.2} MB)",
weights.len(),
(weights.len() * std::mem::size_of::<f32>()) as f64 / 1_048_576.0
);
// Quantize
let start = Instant::now();
let (quantized, params) = quantize::quantize_weights(&weights);
let quant_time = start.elapsed();
println!(
"Quantized: {} weights ({:.2} MB)",
quantized.len(),
(quantized.len() * std::mem::size_of::<i8>()) as f64 / 1_048_576.0
);
println!(
"Compression: {:.2}x",
(weights.len() * std::mem::size_of::<f32>()) as f64
/ (quantized.len() * std::mem::size_of::<i8>()) as f64
);
println!("Quantization time: {:?}", quant_time);
// Check quality
let error = quantize::quantization_error(&weights, &quantized, params);
let snr = quantize::sqnr(&weights, &quantized, params);
println!("Quality metrics:");
println!(" MSE: {:.6}", error);
println!(" SQNR: {:.2} dB", snr);
// Benchmark dequantization
let iterations = 100;
let start = Instant::now();
for _ in 0..iterations {
let _restored = quantize::dequantize(&quantized, params);
}
let dequant_time = start.elapsed();
println!(
"Dequantization ({} iterations): {:?}",
iterations, dequant_time
);
// Per-channel quantization
let weights_2d: Vec<f32> = (0..10_000).map(|i| i as f32).collect();
let shape = vec![100, 100]; // 100 channels, 100 values each
let start = Instant::now();
let per_channel = quantize::PerChannelQuant::from_f32(&weights_2d, shape);
let per_channel_time = start.elapsed();
println!("\nPer-channel quantization:");
println!(" Channels: {}", per_channel.params.len());
println!(" Time: {:?}", per_channel_time);
println!();
}
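// A minimal symmetric int8 quantization sketch, for intuition about what
// quantize::quantize_weights does (the real module may use a different
// scheme, e.g. asymmetric or per-channel scales):
#[allow(dead_code)]
fn quantize_symmetric(weights: &[f32]) -> (Vec<i8>, f32) {
    // Map the largest magnitude onto the int8 range [-127, 127].
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

#[allow(dead_code)]
fn dequantize_symmetric(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}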
// Async batching demo (would need tokio runtime)
#[allow(dead_code)]
async fn demo_batching() {
println!("6. Dynamic Batching");
println!("-------------------");
use batch::{BatchConfig, DynamicBatcher};
let config = BatchConfig {
max_batch_size: 32,
max_wait_ms: 50,
max_queue_size: 1000,
preferred_batch_size: 16,
};
let batcher = Arc::new(DynamicBatcher::new(config, |items: Vec<i32>| {
// Simulate batch processing
items.into_iter().map(|x| Ok(x * 2)).collect()
}));
// Start processing loop
let batcher_clone = batcher.clone();
tokio::spawn(async move {
batcher_clone.run().await;
});
// Add items
let mut handles = vec![];
for i in 0..100 {
let batcher = batcher.clone();
handles.push(tokio::spawn(async move { batcher.add(i).await }));
}
// Wait for results
for handle in handles {
let _ = handle.await;
}
let stats = batcher.stats().await;
println!("Queue size: {}", stats.queue_size);
println!("Max wait: {:?}", stats.max_wait_time);
batcher.shutdown().await;
}


@@ -0,0 +1,62 @@
[
{
"image_path": "test_images/quadratic_1.png",
"ground_truth_text": "x^2 + 2x + 1 = 0",
"ground_truth_latex": "x^{2} + 2x + 1 = 0",
"category": "quadratic"
},
{
"image_path": "test_images/linear_1.png",
"ground_truth_text": "y = mx + b",
"ground_truth_latex": "y = mx + b",
"category": "linear"
},
{
"image_path": "test_images/integral_1.png",
"ground_truth_text": "∫ x^2 dx = x^3/3 + C",
"ground_truth_latex": "\\int x^{2} dx = \\frac{x^{3}}{3} + C",
"category": "calculus"
},
{
"image_path": "test_images/derivative_1.png",
"ground_truth_text": "d/dx(sin(x)) = cos(x)",
"ground_truth_latex": "\\frac{d}{dx}(\\sin(x)) = \\cos(x)",
"category": "calculus"
},
{
"image_path": "test_images/matrix_1.png",
"ground_truth_text": "[1 2; 3 4]",
"ground_truth_latex": "\\begin{bmatrix} 1 & 2 \\\\ 3 & 4 \\end{bmatrix}",
"category": "linear_algebra"
},
{
"image_path": "test_images/sum_1.png",
"ground_truth_text": "Σ(i=1 to n) i = n(n+1)/2",
"ground_truth_latex": "\\sum_{i=1}^{n} i = \\frac{n(n+1)}{2}",
"category": "summation"
},
{
"image_path": "test_images/fraction_1.png",
"ground_truth_text": "(a + b) / (c - d)",
"ground_truth_latex": "\\frac{a + b}{c - d}",
"category": "fraction"
},
{
"image_path": "test_images/sqrt_1.png",
"ground_truth_text": "√(a^2 + b^2) = c",
"ground_truth_latex": "\\sqrt{a^{2} + b^{2}} = c",
"category": "radical"
},
{
"image_path": "test_images/limit_1.png",
"ground_truth_text": "lim(x→0) sin(x)/x = 1",
"ground_truth_latex": "\\lim_{x \\to 0} \\frac{\\sin(x)}{x} = 1",
"category": "calculus"
},
{
"image_path": "test_images/exponent_1.png",
"ground_truth_text": "e^(iπ) + 1 = 0",
"ground_truth_latex": "e^{i\\pi} + 1 = 0",
"category": "exponential"
}
]


@@ -0,0 +1,75 @@
//! Simple OCR example
//!
//! This example demonstrates basic OCR functionality with ruvector-scipix.
//! It processes a single image and outputs the recognized text and LaTeX.
//!
//! Usage:
//! ```bash
//! cargo run --example simple_ocr -- image.png
//! ```
use anyhow::{Context, Result};
use ruvector_scipix::{OcrConfig, OcrEngine, OutputFormat};
#[tokio::main]
async fn main() -> Result<()> {
// Parse command line arguments
let args: Vec<String> = std::env::args().collect();
if args.len() < 2 {
eprintln!("Usage: {} <image_path>", args[0]);
eprintln!("\nExample:");
eprintln!(" {} equation.png", args[0]);
std::process::exit(1);
}
let image_path = &args[1];
// Initialize logger for debug output
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
println!("Loading image: {}", image_path);
// Create default OCR configuration
let config = OcrConfig::default();
// Initialize OCR engine
println!("Initializing OCR engine...");
let engine = OcrEngine::new(config)
.await
.context("Failed to initialize OCR engine")?;
// Load and process the image
    let image = image::open(image_path)
        .with_context(|| format!("Failed to open image: {}", image_path))?;
println!("Processing image...");
let result = engine
.recognize(&image)
.await
.context("OCR recognition failed")?;
// Display results
println!("\n{}", "=".repeat(80));
println!("OCR Results");
println!("{}", "=".repeat(80));
println!("\n📝 Plain Text:");
println!("{}", result.text);
println!("\n🔢 LaTeX:");
println!("{}", result.to_format(OutputFormat::LaTeX)?);
println!("\n📊 Confidence: {:.2}%", result.confidence * 100.0);
if let Some(metadata) = &result.metadata {
println!("\n📋 Metadata:");
println!(" Language: {:?}", metadata.get("language"));
println!(
" Processing time: {:?}",
metadata.get("processing_time_ms")
);
}
println!("\n{}", "=".repeat(80));
Ok(())
}


@@ -0,0 +1,184 @@
//! Streaming PDF processing example
//!
//! This example demonstrates streaming processing of large PDF documents.
//! Results are streamed as pages are processed, with real-time progress reporting.
//!
//! Usage:
//! ```bash
//! cargo run --example streaming -- document.pdf output/
//! ```
use anyhow::{Context, Result};
use futures::stream::{self, StreamExt};
use indicatif::{ProgressBar, ProgressStyle};
use ruvector_scipix::ocr::OcrEngine;
use ruvector_scipix::output::{OcrResult, OutputFormat};
use ruvector_scipix::OcrConfig;
use serde::{Deserialize, Serialize};
use std::path::Path;
use tokio::fs;
#[derive(Debug, Serialize, Deserialize)]
struct PageResult {
page_number: usize,
text: String,
latex: Option<String>,
confidence: f32,
processing_time_ms: u64,
}
#[derive(Debug, Serialize, Deserialize)]
struct DocumentResult {
total_pages: usize,
pages: Vec<PageResult>,
total_processing_time_ms: u64,
average_confidence: f32,
}
#[tokio::main]
async fn main() -> Result<()> {
let args: Vec<String> = std::env::args().collect();
if args.len() < 3 {
eprintln!("Usage: {} <pdf_path> <output_directory>", args[0]);
eprintln!("\nExample:");
eprintln!(" {} document.pdf ./output", args[0]);
std::process::exit(1);
}
let pdf_path = Path::new(&args[1]);
let output_dir = Path::new(&args[2]);
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
// Create output directory
fs::create_dir_all(output_dir).await?;
println!("Loading PDF: {}", pdf_path.display());
// Extract pages from PDF
let pages = extract_pdf_pages(pdf_path)?;
println!("Extracted {} pages", pages.len());
// Initialize OCR engine
let config = OcrConfig::default();
let engine = OcrEngine::new(config).await?;
// Setup progress bar
let progress = ProgressBar::new(pages.len() as u64);
progress.set_style(
ProgressStyle::default_bar()
.template("[{elapsed_precise}] {bar:40.cyan/blue} {pos}/{len} {msg}")
.unwrap()
.progress_chars("=>-"),
);
let start_time = std::time::Instant::now();
let mut page_results = Vec::new();
// Process pages as a stream
let mut stream = stream::iter(pages.into_iter().enumerate())
.map(|(idx, page_data)| {
let engine = &engine;
async move { process_page(engine, idx + 1, page_data).await }
})
.buffer_unordered(4); // Process 4 pages concurrently
// Stream results and save incrementally
while let Some(result) = stream.next().await {
match result {
Ok(page_result) => {
// Save individual page result
let page_file =
output_dir.join(format!("page_{:04}.json", page_result.page_number));
let json = serde_json::to_string_pretty(&page_result)?;
fs::write(&page_file, json).await?;
progress.set_message(format!(
"Page {} - {:.1}%",
page_result.page_number,
page_result.confidence * 100.0
));
progress.inc(1);
page_results.push(page_result);
}
Err(e) => {
eprintln!("Error processing page: {}", e);
progress.inc(1);
}
}
}
progress.finish_with_message("Complete");
let total_time = start_time.elapsed().as_millis() as u64;
    // Calculate statistics (guard against the case where every page failed)
    let avg_confidence = if page_results.is_empty() {
        0.0
    } else {
        page_results.iter().map(|p| p.confidence).sum::<f32>() / page_results.len() as f32
    };
// Create document result
let doc_result = DocumentResult {
total_pages: page_results.len(),
pages: page_results,
total_processing_time_ms: total_time,
average_confidence: avg_confidence,
};
// Save complete document result
let doc_file = output_dir.join("document.json");
let json = serde_json::to_string_pretty(&doc_result)?;
fs::write(&doc_file, json).await?;
println!("\n{}", "=".repeat(80));
println!("Processing Complete");
println!("{}", "=".repeat(80));
println!("Total pages: {}", doc_result.total_pages);
println!("Total time: {:.2}s", total_time as f32 / 1000.0);
println!(
"Average time per page: {:.2}s",
(total_time as f32 / doc_result.total_pages as f32) / 1000.0
);
println!("Average confidence: {:.2}%", avg_confidence * 100.0);
println!("Results saved to: {}", output_dir.display());
println!("{}", "=".repeat(80));
Ok(())
}
fn extract_pdf_pages(_pdf_path: &Path) -> Result<Vec<Vec<u8>>> {
// TODO: Implement actual PDF extraction using pdf-extract or similar
// For now, this is a placeholder that returns mock data
println!("Note: PDF extraction is not yet implemented");
println!("This example shows the streaming architecture");
// Mock implementation - in real use, this would extract actual PDF pages
Ok(vec![vec![0u8; 100]]) // Placeholder
}
async fn process_page(
engine: &OcrEngine,
page_number: usize,
page_data: Vec<u8>,
) -> Result<PageResult> {
let start = std::time::Instant::now();
// TODO: Convert page_data to image
// For now, using a placeholder
let image = image::DynamicImage::new_rgb8(100, 100);
    let result = engine
        .recognize(&image)
        .await
        .with_context(|| format!("Failed to process page {}", page_number))?;
let processing_time = start.elapsed().as_millis() as u64;
Ok(PageResult {
page_number,
text: result.text.clone(),
latex: result.to_format(OutputFormat::LaTeX).ok(),
confidence: result.confidence,
processing_time_ms: processing_time,
})
}


@@ -0,0 +1,442 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ruvector-scipix Browser Demo</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
padding: 20px;
}
.container {
max-width: 1200px;
margin: 0 auto;
background: white;
border-radius: 20px;
box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
overflow: hidden;
}
header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
text-align: center;
}
h1 {
font-size: 2.5em;
margin-bottom: 10px;
}
.subtitle {
font-size: 1.1em;
opacity: 0.9;
}
.main-content {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 30px;
padding: 30px;
}
.upload-section {
display: flex;
flex-direction: column;
gap: 20px;
}
.upload-area {
border: 3px dashed #667eea;
border-radius: 15px;
padding: 40px;
text-align: center;
cursor: pointer;
transition: all 0.3s ease;
background: #f8f9ff;
}
.upload-area:hover {
border-color: #764ba2;
background: #f0f2ff;
transform: translateY(-2px);
}
.upload-area.dragover {
border-color: #764ba2;
background: #e8ebff;
}
.upload-icon {
font-size: 3em;
margin-bottom: 15px;
}
.file-input {
display: none;
}
.btn {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
padding: 15px 30px;
border-radius: 10px;
font-size: 1em;
cursor: pointer;
transition: all 0.3s ease;
font-weight: 600;
}
.btn:hover {
transform: translateY(-2px);
box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
}
.btn:disabled {
opacity: 0.5;
cursor: not-allowed;
transform: none;
}
.preview-container {
background: #f8f9ff;
border-radius: 15px;
padding: 20px;
min-height: 300px;
display: flex;
align-items: center;
justify-content: center;
}
#preview {
max-width: 100%;
max-height: 400px;
border-radius: 10px;
}
.results-section {
display: flex;
flex-direction: column;
gap: 20px;
}
.result-box {
background: #f8f9ff;
border-radius: 15px;
padding: 20px;
min-height: 150px;
}
.result-box h3 {
color: #667eea;
margin-bottom: 15px;
display: flex;
align-items: center;
gap: 10px;
}
.result-content {
background: white;
padding: 15px;
border-radius: 10px;
font-family: 'Courier New', monospace;
white-space: pre-wrap;
word-break: break-word;
max-height: 300px;
overflow-y: auto;
}
.confidence-bar {
background: #e0e0e0;
height: 30px;
border-radius: 15px;
overflow: hidden;
position: relative;
}
.confidence-fill {
background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
height: 100%;
transition: width 0.5s ease;
display: flex;
align-items: center;
justify-content: center;
color: white;
font-weight: 600;
}
.loading {
display: none;
text-align: center;
padding: 20px;
}
.loading.active {
display: block;
}
.spinner {
border: 4px solid #f3f3f3;
border-top: 4px solid #667eea;
border-radius: 50%;
width: 40px;
height: 40px;
animation: spin 1s linear infinite;
margin: 0 auto 15px;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
.error {
background: #fee;
border: 2px solid #fcc;
color: #c00;
padding: 15px;
border-radius: 10px;
margin-top: 15px;
display: none;
}
.error.active {
display: block;
}
@media (max-width: 768px) {
.main-content {
grid-template-columns: 1fr;
}
}
</style>
</head>
<body>
<div class="container">
<header>
<h1>🔢 ruvector-scipix</h1>
<p class="subtitle">Advanced Math OCR in Your Browser</p>
</header>
<div class="main-content">
<div class="upload-section">
<div class="upload-area" id="uploadArea">
<div class="upload-icon">📸</div>
<h3>Drop an image here or click to upload</h3>
<p>Supports PNG, JPG, JPEG, WebP</p>
<input type="file" id="fileInput" class="file-input" accept="image/*">
</div>
<button class="btn" id="processBtn" disabled>
🚀 Process Image
</button>
<div class="preview-container">
<canvas id="preview"></canvas>
</div>
<div class="loading" id="loading">
<div class="spinner"></div>
<p>Processing image...</p>
</div>
<div class="error" id="error"></div>
</div>
<div class="results-section">
<div class="result-box">
<h3>📝 Plain Text</h3>
<div class="result-content" id="textResult">
Results will appear here...
</div>
</div>
<div class="result-box">
<h3>🔢 LaTeX</h3>
<div class="result-content" id="latexResult">
LaTeX output will appear here...
</div>
</div>
<div class="result-box">
<h3>📊 Confidence</h3>
<div class="confidence-bar">
<div class="confidence-fill" id="confidenceFill" style="width: 0%">
0%
</div>
</div>
</div>
</div>
</div>
</div>
<script type="module">
// Note: This demo requires the WASM build of ruvector-scipix
// Build with: wasm-pack build --target web
let wasmModule = null;
let currentImage = null;
// Initialize WASM module
async function initWasm() {
try {
// In production, this would load the actual WASM module
// import init, { MathpixWasm } from './pkg/ruvector_scipix_wasm.js';
// wasmModule = await init();
console.log('WASM module would be initialized here');
// For demo purposes, we'll simulate the OCR
} catch (error) {
showError('Failed to initialize WASM module: ' + error.message);
}
}
// File upload handling
const uploadArea = document.getElementById('uploadArea');
const fileInput = document.getElementById('fileInput');
const processBtn = document.getElementById('processBtn');
const preview = document.getElementById('preview');
uploadArea.addEventListener('click', () => fileInput.click());
uploadArea.addEventListener('dragover', (e) => {
e.preventDefault();
uploadArea.classList.add('dragover');
});
uploadArea.addEventListener('dragleave', () => {
uploadArea.classList.remove('dragover');
});
uploadArea.addEventListener('drop', (e) => {
e.preventDefault();
uploadArea.classList.remove('dragover');
const file = e.dataTransfer.files[0];
if (file) handleFile(file);
});
fileInput.addEventListener('change', (e) => {
const file = e.target.files[0];
if (file) handleFile(file);
});
function handleFile(file) {
if (!file.type.startsWith('image/')) {
showError('Please select an image file');
return;
}
// Clear any stale error message from a previous attempt
hideError();
const reader = new FileReader();
reader.onload = (e) => {
const img = new Image();
img.onload = () => {
drawPreview(img);
currentImage = img;
processBtn.disabled = false;
};
img.src = e.target.result;
};
reader.readAsDataURL(file);
}
function drawPreview(img) {
const ctx = preview.getContext('2d');
const maxWidth = 500;
const maxHeight = 400;
let width = img.width;
let height = img.height;
if (width > maxWidth) {
height *= maxWidth / width;
width = maxWidth;
}
if (height > maxHeight) {
width *= maxHeight / height;
height = maxHeight;
}
preview.width = width;
preview.height = height;
ctx.drawImage(img, 0, 0, width, height);
}
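// A pure version of the contain-fit math inlined in drawPreview above,
// kept separate so it can be unit-tested without a DOM. Illustrative
// helper only; the demo code above does not call it.
function fitWithin(width, height, maxWidth, maxHeight) {
// Shrink to fit the width first, preserving aspect ratio
if (width > maxWidth) {
height *= maxWidth / width;
width = maxWidth;
}
// Then shrink again if the (possibly scaled) height still overflows
if (height > maxHeight) {
width *= maxHeight / height;
height = maxHeight;
}
return { width, height };
}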
processBtn.addEventListener('click', async () => {
if (!currentImage) return;
showLoading(true);
hideError();
try {
// In production, this would call the actual WASM OCR
// const result = await wasmModule.recognize(imageData);
// For demo, simulate OCR processing
await simulateOcr();
} catch (error) {
showError('OCR processing failed: ' + error.message);
} finally {
showLoading(false);
}
});
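// Hypothetical preprocessing sketch for a real WASM build: many OCR
// pipelines take single-channel input, so this flattens RGBA pixels
// (shaped like the object ctx.getImageData returns) to grayscale with
// the Rec. 601 luma weights. Whether the ruvector-scipix WASM API
// expects this, and the recognize() signature shown below, are
// assumptions; adjust to the actual exported API.
function toGrayscale(imageData) {
const { width, height, data } = imageData;
const gray = new Uint8Array(width * height);
for (let i = 0; i < width * height; i++) {
const r = data[i * 4], g = data[i * 4 + 1], b = data[i * 4 + 2];
// Rec. 601 luma: weights sum to 1, so white stays 255 and black 0
gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
}
return gray;
}
// Example (assumed API):
// const ctx = preview.getContext('2d');
// const result = wasmModule.recognize(toGrayscale(ctx.getImageData(0, 0, preview.width, preview.height)));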
async function simulateOcr() {
// Simulate processing delay
await new Promise(resolve => setTimeout(resolve, 2000));
const mockResults = {
text: 'x² + 2x + 1 = 0',
latex: 'x^{2} + 2x + 1 = 0',
confidence: 0.95
};
displayResults(mockResults);
}
function displayResults(results) {
document.getElementById('textResult').textContent = results.text;
document.getElementById('latexResult').textContent = results.latex;
const confidencePct = (results.confidence * 100).toFixed(1);
const confidenceFill = document.getElementById('confidenceFill');
confidenceFill.style.width = confidencePct + '%';
confidenceFill.textContent = confidencePct + '%';
}
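// Illustrative guard, not wired into displayResults above: clamp a
// backend-reported confidence to [0, 1] (treating NaN as 0) so a
// misbehaving result can never overflow the confidence bar.
function clampConfidence(c) {
return Math.min(1, Math.max(0, Number(c) || 0));
}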
function showLoading(show) {
const loading = document.getElementById('loading');
if (show) {
loading.classList.add('active');
processBtn.disabled = true;
} else {
loading.classList.remove('active');
processBtn.disabled = false;
}
}
function showError(message) {
const error = document.getElementById('error');
error.textContent = message;
error.classList.add('active');
}
function hideError() {
document.getElementById('error').classList.remove('active');
}
// Initialize on page load
initWasm();
</script>
</body>
</html>