Scipix WASM - WebAssembly OCR
High-performance OCR with LaTeX support for the browser, powered by WebAssembly.
Features
- 📸 Image OCR: Recognize text from images
- 🧮 LaTeX Support: Extract mathematical formulas
- ⚡ Web Workers: Off-main-thread processing
- 🎯 TypeScript: Full type definitions
- 🚀 Optimized: <2MB bundle size
- 🔧 Flexible API: Multiple input formats
Quick Start
Installation
npm install ruvector-scipix-wasm
Build from Source
cd examples/scipix
npm run build
Basic Usage
import { createScipix } from 'ruvector-scipix-wasm';
// Initialize
const scipix = await createScipix();
// Recognize from raw image bytes (imageData is a Uint8Array)
const result = await scipix.recognize(imageData);
console.log(result.text);
console.log(result.latex);
Canvas Example
import { recognizeCanvas } from 'ruvector-scipix-wasm';
const canvas = document.getElementById('myCanvas') as HTMLCanvasElement;
const result = await recognizeCanvas(canvas);
Web Worker Example
import { createWorker } from 'ruvector-scipix-wasm';
const worker = createWorker();
// Process in background
const result = await worker.recognize(imageData);
// Batch processing with progress
const results = await worker.recognizeBatch(images, {
onProgress: ({ processed, total }) => {
console.log(`Progress: ${processed}/${total}`);
}
});
worker.terminate();
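The onProgress callback above receives processed and total counts. A minimal sketch of turning those into a display string follows; formatProgress and BatchProgress are hypothetical names for illustration and are not exported by ruvector-scipix-wasm.

```typescript
// Hypothetical shape mirroring the { processed, total } object passed to onProgress.
interface BatchProgress {
  processed: number;
  total: number;
}

// Format progress as "processed/total (percent%)".
// A total of 0 is reported as 0% to avoid dividing by zero.
function formatProgress({ processed, total }: BatchProgress): string {
  const percent = total > 0 ? Math.round((processed / total) * 100) : 0;
  return `${processed}/${total} (${percent}%)`;
}
```

It could be passed straight through, for example onProgress: (p) => console.log(formatProgress(p)).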
API Reference
createScipix(options?)
Create a new Scipix instance.
const scipix = await createScipix({
format: 'both', // 'text' | 'latex' | 'both'
confidenceThreshold: 0.5 // 0.0 - 1.0
});
ScipixWasm
Main API class.
Methods
- recognize(imageData: Uint8Array): Promise<OcrResult>
- recognizeFromCanvas(canvas: HTMLCanvasElement): Promise<OcrResult>
- recognizeBase64(base64: string): Promise<OcrResult>
- recognizeImageData(imageData: ImageData): Promise<OcrResult>
- recognizeBatch(images: Uint8Array[]): Promise<OcrResult[]>
- setFormat(format: RecognitionFormat): void
- setConfidenceThreshold(threshold: number): void
- getVersion(): string
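This README does not say whether recognizeBase64 accepts a data URL prefix such as "data:image/png;base64,". As a sketch under that uncertainty, the hypothetical helper below (not part of the package) strips an optional prefix and decodes the payload into a Uint8Array that could be passed to recognize. It relies on the standard global atob function.

```typescript
// Hypothetical helper: decode a base64 string (optionally a data URL) into bytes.
function base64ToBytes(input: string): Uint8Array {
  // Drop an optional "data:<mime>;base64," prefix.
  const comma = input.indexOf(',');
  const payload =
    input.startsWith('data:') && comma !== -1 ? input.slice(comma + 1) : input;

  // atob returns a binary string with one character per byte.
  const binary = atob(payload);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```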
Helper Functions
import {
recognizeFile, // From File/Blob
recognizeCanvas, // From HTMLCanvasElement
recognizeBase64, // From base64 string
recognizeUrl, // From image URL
recognizeBatch, // Batch processing
imageToCanvas, // Convert image to canvas
} from 'ruvector-scipix-wasm';
Types
OcrResult
interface OcrResult {
text: string; // Recognized text
latex?: string; // LaTeX (if enabled)
confidence: number; // 0.0 - 1.0
metadata?: {
width?: number;
height?: number;
format?: string;
};
}
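The confidence field can also be checked on the caller's side, for example to drop weak results from a batch. A minimal sketch, assuming only the confidence range documented above; OcrResultLike and filterByConfidence are hypothetical names, not package exports.

```typescript
// Local stand-in for the fields of OcrResult used here.
interface OcrResultLike {
  text: string;
  latex?: string;
  confidence: number; // 0.0 - 1.0
}

// Keep only results whose confidence is at or above the threshold.
function filterByConfidence<T extends OcrResultLike>(
  results: T[],
  threshold: number,
): T[] {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError('threshold must be between 0.0 and 1.0');
  }
  return results.filter((result) => result.confidence >= threshold);
}
```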
RecognitionFormat
type RecognitionFormat = 'text' | 'latex' | 'both';
Demo
Run the interactive demo:
npm run dev
Open http://localhost:8080/example.html
Performance Tips
- Use Web Workers for large images or batch processing
- Set confidence threshold to filter low-quality results
- Resize images before processing if possible
- Reuse instances instead of creating new ones
- Use SharedImageBuffer for large image batches
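For the resizing tip, the target dimensions can be computed before drawing the image onto a smaller canvas. A minimal sketch follows; fitWithin and Size are hypothetical names, and the right maximum side length for this OCR engine is not stated in this README.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longest side is at most maxSide,
// preserving aspect ratio. Sizes already within the limit are unchanged.
function fitWithin(size: Size, maxSide: number): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { ...size };
  }
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

The resulting width and height can be used as the canvas size and as the destination size in a 2D context drawImage call before passing the canvas to recognizeCanvas.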
Browser Support
- Chrome 57+
- Firefox 52+
- Safari 11+
- Edge 16+
Requires WebAssembly support.
Bundle Size
- WASM module: ~800KB (gzipped)
- JavaScript wrapper: ~15KB (gzipped)
- Total: <1MB (gzipped)
License
MIT
Credits
Built with: