# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2024-11-28

### Added

#### Core Features
- **Mathematical OCR Engine**: Complete implementation of OCR for mathematical equations and expressions
- **Vector-Based Caching**: Intelligent caching using ruvector-core for image embeddings and similarity search
- **Multi-Format Output**: Support for LaTeX, MathML, AsciiMath, SMILES, HTML, DOCX, JSON, and MMD formats
- **Image Preprocessing Pipeline**: Advanced image enhancement, deskewing, rotation correction, and segmentation
- **Configuration Management**: Flexible TOML-based configuration with presets (default, high-accuracy, high-speed)

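As a rough illustration of the vector-based caching idea, the sketch below returns a cached result when a new image embedding is close enough to a stored one. It uses only the standard library: `SimilarityCache`, `cosine_similarity`, and the 0.95 threshold are hypothetical names and values chosen for this example, not the ruvector-core API.

```rust
/// Cosine similarity of two embeddings; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical in-memory cache keyed by image embedding (linear scan, illustration only).
pub struct SimilarityCache {
    threshold: f32,
    entries: Vec<(Vec<f32>, String)>,
}

impl SimilarityCache {
    pub fn new(threshold: f32) -> Self {
        Self { threshold, entries: Vec::new() }
    }

    /// Stores an OCR result under the embedding of the image it came from.
    pub fn insert(&mut self, embedding: Vec<f32>, result: String) {
        self.entries.push((embedding, result));
    }

    /// Returns the most similar cached result, if its similarity reaches the threshold.
    pub fn lookup(&self, query: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(embedding, result)| (cosine_similarity(embedding, query), result))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, result)| result.as_str())
    }
}
```

A caller would compute the embedding first, call `lookup`, and run the OCR model only on a miss. A real vector index would replace the linear scan.
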
#### API Server
- **REST API Implementation**: Scipix v3 API-compatible endpoints
  - `/v3/text` - Image OCR processing (multipart/base64/URL)
  - `/v3/strokes` - Digital ink recognition
  - `/v3/pdf` - Async PDF processing with job queue
  - `/v3/latex` - Legacy equation recognition
  - `/v3/converter` - Document format conversion
  - `/health` - Health check endpoint
- **Production-Ready Middleware**:
  - Authentication (app_id/app_key validation)
  - Token bucket rate limiting (100 req/min default)
  - Request tracing and structured logging
  - CORS support with configurable origins
  - Gzip compression for responses
- **Async Job Queue**: Background processing for PDF jobs with status tracking and webhook callbacks
- **Result Caching**: Moka-based async caching with TTL
- **Graceful Shutdown**: Proper resource cleanup on termination

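For readers unfamiliar with token bucket rate limiting, here is a minimal sketch of the technique sized to the 100 req/min default. `TokenBucket` and its methods are hypothetical and are not the server's actual middleware types. The clock is passed in explicitly so the behavior is easy to test.

```rust
use std::time::Instant;

/// Hypothetical token bucket: holds up to `capacity` tokens and refills continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Creates a full bucket that sustains `per_minute` requests per minute.
    pub fn per_minute(per_minute: u32, now: Instant) -> Self {
        let capacity = f64::from(per_minute);
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / 60.0,
            last_refill: now,
        }
    }

    /// Adds tokens for the time elapsed since the last call, then tries to spend one.
    /// Returns `true` if the request is allowed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

In a server, one bucket would typically be kept per client identifier, and a rejected request would be answered with HTTP 429.
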
#### WebAssembly Support
- **Browser-Based OCR**: Process images directly in the browser
- **Web Worker Support**: Off-main-thread processing with progress reporting
- **Multiple Input Formats**: File, Canvas, Base64, and URL inputs
- **Optimized Bundle**: <2 MB compressed size with efficient memory management
- **TypeScript Definitions**: Full type safety for JavaScript/TypeScript projects

#### CLI Tool
- **Interactive Commands**:
  - `ocr` - Process single or batch images
  - `serve` - Start API server
  - `batch` - Process multiple images in parallel
  - `config` - Manage configuration files
- **Rich Terminal UI**: Progress bars, colored output, and interactive tables
- **Shell Completions**: Support for bash, zsh, fish, and PowerShell

#### Performance Optimizations
- **SIMD Acceleration**: Vectorized operations for image processing
- **Parallel Processing**: Multi-threaded batch processing with rayon
- **Memory Optimization**: Efficient memory pooling and buffer reuse
- **Quantization Support**: Model quantization for reduced memory footprint
- **Batch Inference**: Optimized batch processing for throughput

#### Math Processing
- **LaTeX Parser**: LaTeX-to-AST parsing with error recovery
- **MathML Generation**: AST-to-MathML conversion with proper semantics
- **AsciiMath Support**: AsciiMath parsing and conversion
- **Symbol Library**: Comprehensive mathematical symbol database
- **Format Conversion**: Convert between LaTeX, MathML, and AsciiMath

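To make the AST-based conversion concrete, the following sketch renders one small expression tree as both LaTeX and MathML. The `Expr` enum is a hypothetical miniature AST invented for this example, since the project's real AST is not described here. It assumes leaf strings are plain alphanumerics and performs no XML escaping.

```rust
/// Hypothetical miniature math AST (illustration only).
pub enum Expr {
    Num(String),
    Ident(String),
    Frac(Box<Expr>, Box<Expr>),
    Sup(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Renders the tree as presentation MathML; every node yields exactly one element,
    /// so `<mfrac>` and `<msup>` always receive the two children they require.
    pub fn to_mathml(&self) -> String {
        match self {
            Expr::Num(n) => format!("<mn>{}</mn>", n),
            Expr::Ident(i) => format!("<mi>{}</mi>", i),
            Expr::Frac(num, den) => {
                format!("<mfrac>{}{}</mfrac>", num.to_mathml(), den.to_mathml())
            }
            Expr::Sup(base, exp) => {
                format!("<msup>{}{}</msup>", base.to_mathml(), exp.to_mathml())
            }
        }
    }

    /// Renders the same tree as LaTeX.
    pub fn to_latex(&self) -> String {
        match self {
            Expr::Num(s) | Expr::Ident(s) => s.clone(),
            Expr::Frac(num, den) => {
                format!("\\frac{{{}}}{{{}}}", num.to_latex(), den.to_latex())
            }
            Expr::Sup(base, exp) => format!("{}^{{{}}}", base.to_latex(), exp.to_latex()),
        }
    }
}
```

Going through one shared tree means each output format needs only its own renderer, rather than a converter for every pair of formats.
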
#### Developer Experience
- **Comprehensive Documentation**: 15+ detailed documentation files covering:
  - Architecture and design decisions
  - OCR research and algorithms
  - Rust ecosystem integration
  - Testing strategies
  - Security best practices
  - Optimization techniques
  - WASM implementation guide
  - Lean/Agentic integration roadmap
- **Example Programs**: 7 example applications demonstrating different use cases
- **Integration Tests**: Comprehensive test suite with >90% coverage target
- **Benchmarks**: Performance benchmarks using Criterion
- **Type Safety**: Strong typing throughout with comprehensive error handling

### Technical Details

#### Architecture
- **Modular Design**: Clean separation of concerns with feature flags
- **Feature Flags**:
  - `default` - Core functionality with preprocessing, caching, and optimization
  - `preprocess` - Image preprocessing pipeline
  - `cache` - Vector-based caching
  - `ocr` - OCR engine (requires ONNX models)
  - `math` - Mathematical parsing and conversion
  - `optimize` - Performance optimizations
  - `wasm` - WebAssembly bindings

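Cargo feature flags such as these are usually consumed through conditional compilation. The sketch below shows the two common forms, the `cfg!` macro and the `#[cfg]` attribute, using the feature names from the list. Both functions are hypothetical helpers written for this example and are not part of the crate's API.

```rust
/// Returns the names of the listed Cargo features that were enabled at compile time.
pub fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` expands to a compile-time `true` or `false` for each feature.
    if cfg!(feature = "preprocess") {
        enabled.push("preprocess");
    }
    if cfg!(feature = "cache") {
        enabled.push("cache");
    }
    if cfg!(feature = "ocr") {
        enabled.push("ocr");
    }
    if cfg!(feature = "math") {
        enabled.push("math");
    }
    if cfg!(feature = "optimize") {
        enabled.push("optimize");
    }
    if cfg!(feature = "wasm") {
        enabled.push("wasm");
    }
    enabled
}

/// Hypothetical item that exists only when the crate is built with the `wasm` feature;
/// without it, the function is removed from the build entirely.
#[cfg(feature = "wasm")]
pub fn wasm_bindings_available() -> bool {
    true
}
```

The `#[cfg]` form is the one to use when the gated code depends on an optional dependency, because code under a disabled `cfg!` branch must still compile.
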
#### Dependencies
- **Core**: ruvector-core, image, imageproc, serde, tokio
- **ML**: ort (ONNX Runtime) for model inference
- **Web**: axum, tower, tower-http for REST API
- **CLI**: clap, indicatif, console for command-line interface
- **Math**: nom for parsing, nalgebra for linear algebra
- **Performance**: rayon, memmap2, SIMD intrinsics
- **Testing**: criterion, proptest, mockall

#### Performance Benchmarks
- **OCR Throughput**: Target >100 images/second (batch mode)
- **API Latency**: <100 ms for typical equations (cached)
- **Memory Usage**: <500 MB baseline, <2 GB peak
- **Cache Hit Rate**: >80% for similar equations
- **WASM Bundle**: <2 MB compressed, <5 MB uncompressed

### Known Limitations

- **ONNX Models**: Models not included in repository (must be downloaded separately)
- **GPU Support**: ONNX Runtime runs CPU-only (GPU support planned)
- **Language Support**: English and mathematical notation only
- **Handwriting**: Limited handwriting recognition (digital ink only)
- **Complex Layouts**: Advanced layout analysis planned for future releases
- **Database**: No persistent storage yet (planned for 0.2.0)

### Security

- **Input Validation**: Comprehensive validation using the `validator` crate
- **Rate Limiting**: Default 100 req/min per client
- **Authentication**: Required for all API endpoints (except health)
- **No Hardcoded Secrets**: All credentials are read from environment variables
- **CORS**: Configurable allowed origins
- **Size Limits**: Configurable max request/file sizes

### Breaking Changes

None (initial release)

### Migration Guide

This is the initial release. No migration required.

### Future Roadmap

#### Version 0.2.0 (Q1 2025)
- [ ] Database persistence (PostgreSQL/SQLite)
- [ ] Horizontal scaling with Redis
- [ ] Prometheus metrics
- [ ] OpenAPI/Swagger documentation
- [ ] Multi-tenancy support

#### Version 0.3.0 (Q2 2025)
- [ ] GPU acceleration via ONNX Runtime
- [ ] Advanced layout analysis
- [ ] Multi-language support
- [ ] Enhanced handwriting recognition
- [ ] Real-time collaborative editing

#### Version 1.0.0 (Q3 2025)
- [ ] Production-grade stability
- [ ] Enterprise features
- [ ] Cloud-native deployment
- [ ] Kubernetes operators
- [ ] Comprehensive monitoring

### Contributors

- Ruvector Team - Initial implementation and architecture
- Community - Testing and feedback

### License

MIT License - See LICENSE file for details

---

## Unreleased

### Added
- Nothing yet

### Changed
- Nothing yet

### Fixed
- Nothing yet

### Deprecated
- Nothing yet

### Removed
- Nothing yet

### Security
- Nothing yet