dearsky/wifi-densepose

Fork 0

Files

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

59 KiB

Raw Blame History

Scipix Clone - System Requirements Specification

Version: 1.0.0 Date: 2025-11-28 Project: ruvector-scipix Methodology: SPARC (Specification Phase)

Project Overview & Goals
Functional Requirements
Non-Functional Requirements
Input/Output Specifications
API Design
Data Models
Use Cases and User Stories
Success Criteria and Acceptance Tests
Constraints and Limitations
Dependencies

1. Project Overview & Goals

1.1 Purpose

This system provides an open-source Rust implementation of mathematical and scientific content recognition, compatible with the Scipix API v3. The system converts images containing mathematical equations, chemical formulas, tables, and diagrams into machine-readable formats (LaTeX, MathML, Markdown, etc.).

1.2 Scope

In Scope:

Mathematical equation recognition (printed and handwritten)
Chemical formula recognition
Table and diagram extraction
Multi-format input support (JPEG, PNG, PDF, etc.)
Multi-format output (LaTeX, MathML, Markdown, HTML, DOCX)
RESTful API compatible with Scipix v3
Vector storage integration via ruvector-core
Confidence scoring and metadata extraction
Line/word segmentation and geometry analysis

Out of Scope:

Real-time video processing
3D model recognition
Audio transcription
Mobile app development (API only)

1.3 Target Users

Researchers: Converting papers to digital format
Students: Digitizing handwritten notes
Educators: Creating accessible educational content
Developers: Building applications requiring math OCR
Publishers: Converting legacy documents to modern formats

1.4 Project Goals

API Compatibility: 95%+ compatibility with Scipix API v3
Performance: <100ms latency for single image processing
Accuracy: 95%+ on printed math, 90%+ on handwritten
Open Source: Fully auditable, extensible, community-driven
Scalability: Handle concurrent requests efficiently
Cost Efficiency: Reduce OCR costs by 10x vs commercial solutions

2. Functional Requirements

2.1 Image Processing

FR-2.1.1: Image Input Support

Priority: High Description: System shall accept images in multiple formats

Acceptance Criteria:

Support JPEG, PNG, GIF, TIFF, WebP, BMP formats
Accept Base64-encoded image data
Accept image URLs (HTTP/HTTPS)
Handle images up to 10MB in size
Support images from 100x100 to 4000x4000 pixels
Auto-rotate based on EXIF orientation

Example:

pub enum ImageInput {
    Base64(String),
    Url(String),
    Binary(Vec<u8>),
}

pub struct ImageConstraints {
    max_size_bytes: usize,      // 10MB
    min_dimension: u32,          // 100px
    max_dimension: u32,          // 4000px
    supported_formats: Vec<ImageFormat>,
}

FR-2.1.2: PDF Processing

Priority: High Description: System shall extract and process mathematical content from PDF documents

Acceptance Criteria:

Support PDF files up to 100 pages
Extract text with position information
Render pages to images for OCR
Preserve page structure and layout
Support both text-based and scanned PDFs
Extract embedded LaTeX if available

FR-2.1.3: Document Processing

Priority: Medium Description: System shall process EPUB, DOCX, PPTX documents

Acceptance Criteria:

Extract text and images from EPUB
Parse DOCX mathematical content (Office Math ML)
Extract slides from PPTX
Maintain document structure metadata
Support password-protected documents (optional)

2.2 Mathematical Recognition

FR-2.2.1: Equation Recognition

Priority: High Description: System shall recognize and convert mathematical equations

Acceptance Criteria:

Recognize inline and display equations
Support basic arithmetic operations (+, -, ×, ÷)
Support algebraic notation (variables, exponents, subscripts)
Support calculus (integrals, derivatives, limits)
Support linear algebra (matrices, vectors)
Support set theory and logic notation
Output confidence scores per equation

Example:

pub struct EquationRecognition {
    detected_math: Vec<MathRegion>,
    confidence: f32,
    latex: String,
    mathml: Option<String>,
    asciimath: Option<String>,
}

pub struct MathRegion {
    bbox: BoundingBox,
    equation_type: EquationType,
    symbols: Vec<Symbol>,
}

pub enum EquationType {
    Inline,
    Display,
    Numbered,
}

FR-2.2.2: Chemical Formula Recognition

Priority: Medium Description: System shall recognize chemical formulas and reactions

Acceptance Criteria:

Recognize molecular formulas (H₂O, C₆H₁₂O₆)
Support chemical equations and reactions
Recognize structural formulas (basic)
Output in SMILES or InChI notation
Support subscripts and superscripts (charges)

FR-2.2.3: Handwritten Math Recognition

Priority: High Description: System shall recognize handwritten mathematical notation

Acceptance Criteria:

Process handwritten equations with 90%+ accuracy
Support various handwriting styles
Handle connected and separated characters
Detect stroke order (if available)
Provide confidence scores per symbol

2.3 Output Formats

FR-2.3.1: LaTeX Output

Priority: High Description: System shall generate valid LaTeX markup

Acceptance Criteria:

Generate compilable LaTeX code
Support standard LaTeX packages (amsmath, amssymb)
Include proper math delimiters (, $, [, ])
Maintain equation structure and alignment
Support custom LaTeX macros (configurable)

Example:

pub struct LatexOutput {
    latex: String,
    packages_required: Vec<String>,
    preamble: Option<String>,
    errors: Vec<LatexValidationError>,
}

impl LatexOutput {
    pub fn validate(&self) -> Result<(), LatexError> {
        // Validate LaTeX syntax
    }

    pub fn compile_test(&self) -> Result<Vec<u8>, CompilationError> {
        // Test compilation to PDF
    }
}

FR-2.3.2: Scipix Markdown (MMD)

Priority: High Description: System shall generate Scipix Markdown format

Acceptance Criteria:

Support MMD syntax extensions
Include metadata blocks
Preserve document structure
Support tables, lists, headings
Include image references and captions

FR-2.3.3: MathML Output

Priority: Medium Description: System shall generate MathML markup

Acceptance Criteria:

Generate valid MathML 3.0
Support both Presentation and Content MathML
Include semantic annotations
Validate against MathML schema

FR-2.3.4: AsciiMath Output

Priority: Low Description: System shall generate AsciiMath notation

Acceptance Criteria:

Generate human-readable AsciiMath
Support basic mathematical operations
Maintain expression structure

FR-2.3.5: HTML/DOCX Export

Priority: Medium Description: System shall export to HTML and DOCX formats

Acceptance Criteria:

Generate semantic HTML with MathJax
Create valid DOCX with Office Math ML
Preserve formatting and structure
Include CSS styling (HTML)

2.4 API Endpoints

FR-2.4.1: Text Recognition Endpoint

Priority: High Description: POST /v3/text endpoint for image-to-text conversion

Acceptance Criteria:

Accept multipart/form-data or JSON
Support batch processing (multiple images)
Return confidence scores
Support async processing for large batches
Implement rate limiting

FR-2.4.2: Strokes Recognition Endpoint

Priority: Medium Description: POST /v3/strokes endpoint for handwritten strokes

Acceptance Criteria:

Accept stroke data (x, y coordinates, timestamps)
Process real-time input
Return incremental results
Support stroke order analysis

FR-2.4.3: LaTeX Rendering Endpoint

Priority: Medium Description: POST /v3/latex endpoint for LaTeX-to-image

Acceptance Criteria:

Render LaTeX to PNG/SVG
Support custom DPI settings
Return rendered image and metadata
Cache rendered results

FR-2.4.4: PDF Conversion Endpoint

Priority: High Description: POST /v3/pdf endpoint for PDF processing

Acceptance Criteria:

Accept PDF uploads
Process multi-page documents
Return page-by-page results
Support partial processing (page ranges)

2.5 Additional Features

FR-2.5.1: Confidence Scoring

Priority: High Description: System shall provide confidence scores for all recognition

Acceptance Criteria:

Score range: 0.0 to 1.0
Per-symbol confidence scores
Overall equation confidence
Calibrated probability estimates

pub struct ConfidenceScores {
    overall: f32,
    per_symbol: Vec<(Symbol, f32)>,
    per_line: Vec<f32>,
    calibrated: bool,
}

FR-2.5.2: Geometry Analysis

Priority: Medium Description: System shall extract geometric information

Acceptance Criteria:

Detect bounding boxes for all elements
Identify text baseline and orientation
Detect equation alignment
Extract line and paragraph structure

pub struct GeometryInfo {
    bounding_boxes: Vec<BoundingBox>,
    baselines: Vec<Line>,
    text_orientation: f32,
    line_spacing: f32,
    columns: Option<Vec<Column>>,
}

pub struct BoundingBox {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
    rotation: f32,
}

FR-2.5.3: Line/Word Segmentation

Priority: Medium Description: System shall segment text into lines and words

Acceptance Criteria:

Detect individual words
Identify line breaks
Separate equations from text
Handle multi-column layouts

3. Non-Functional Requirements

3.1 Performance

NFR-3.1.1: Latency

Priority: High Requirement: Single image processing <100ms (95th percentile)

Measurement:

p50 latency: <50ms
p95 latency: <100ms
p99 latency: <200ms

Test Cases:

#[tokio::test]
async fn test_single_image_latency() {
    let image = load_test_image("simple_equation.png");
    let start = Instant::now();
    let result = processor.process(image).await.unwrap();
    let duration = start.elapsed();
    assert!(duration < Duration::from_millis(100));
}

NFR-3.1.2: Throughput

Priority: High Requirement: Process 100 requests per second per core

Measurement:

Single core: 100 req/s
4 cores: 350+ req/s (accounting for overhead)
8 cores: 650+ req/s

NFR-3.1.3: Batch Processing

Priority: Medium Requirement: Process 100-image batch in <5 seconds

Measurement:

Average time per image in batch: <50ms
Total batch overhead: <500ms

3.2 Accuracy

NFR-3.2.1: Printed Math Accuracy

Priority: High Requirement: 95%+ character-level accuracy on printed equations

Measurement:

Use standard math OCR benchmark datasets
Calculate Character Error Rate (CER)
Test on various fonts and sizes

Validation:

pub fn calculate_accuracy(ground_truth: &str, predicted: &str) -> AccuracyMetrics {
    AccuracyMetrics {
        character_error_rate: calculate_cer(ground_truth, predicted),
        word_error_rate: calculate_wer(ground_truth, predicted),
        equation_match: exact_match(ground_truth, predicted),
    }
}

NFR-3.2.2: Handwritten Math Accuracy

Priority: High Requirement: 90%+ character-level accuracy on handwritten equations

Measurement:

Test on CROHME dataset
Calculate symbol recognition rate
Measure expression recognition rate

NFR-3.2.3: Chemical Formula Accuracy

Priority: Medium Requirement: 93%+ accuracy on chemical formulas

Measurement:

Test on ChemDraw and standard chemistry datasets
Validate SMILES generation
Check stoichiometry preservation

3.3 Scalability

NFR-3.3.1: Concurrent Users

Priority: High Requirement: Support 1000+ concurrent users

Constraints:

Connection pooling
Request queueing
Resource limits per user

NFR-3.3.2: Horizontal Scaling

Priority: High Requirement: Linear scaling up to 10 nodes

Architecture:

Stateless API servers
Shared vector database
Distributed caching

NFR-3.3.3: Memory Usage

Priority: High Requirement: <2GB RAM per worker process

Constraints:

Model size optimization
Efficient image buffering
Memory-mapped model loading

3.4 Reliability

NFR-3.4.1: Availability

Priority: High Requirement: 99.9% uptime (SLA)

Measurement:

Planned downtime excluded
Maximum 8.76 hours downtime per year

NFR-3.4.2: Error Handling

Priority: High Requirement: Graceful degradation for all error cases

Implementation:

pub enum ProcessingError {
    ImageFormatUnsupported(String),
    ImageTooLarge { size: usize, max: usize },
    ImageDimensionInvalid { width: u32, height: u32 },
    OCRProcessingFailed { reason: String },
    LatexGenerationFailed { partial_result: Option<String> },
    TimeoutExceeded { duration: Duration },
}

impl ProcessingError {
    pub fn to_user_message(&self) -> String {
        // User-friendly error messages
    }

    pub fn recovery_action(&self) -> Option<RecoveryAction> {
        // Suggest recovery actions
    }
}

NFR-3.4.3: Data Validation

Priority: High Requirement: Validate all inputs before processing

Checks:

File format validation
Size limits enforcement
Content type verification
Malicious content detection

3.5 Security

NFR-3.5.1: Authentication

Priority: High Requirement: API key-based authentication

Implementation:

SHA-256 hashed API keys
Rate limiting per key
Key rotation support
Expiration policies

pub struct ApiKey {
    id: Uuid,
    key_hash: String,
    created_at: DateTime<Utc>,
    expires_at: Option<DateTime<Utc>>,
    rate_limit: RateLimit,
    permissions: Vec<Permission>,
}

NFR-3.5.2: Data Privacy

Priority: High Requirement: No persistent storage of user images

Policies:

Images processed in memory
Automatic cleanup after processing
Optional temporary storage (user consent)
No logging of image content

NFR-3.5.3: Input Sanitization

Priority: High Requirement: Sanitize all inputs to prevent attacks

Protections:

Image bomb detection
Zip bomb prevention
Path traversal prevention
Script injection prevention

3.6 Usability

NFR-3.6.1: API Design

Priority: High Requirement: RESTful API following OpenAPI 3.0 specification

Standards:

Consistent error responses
Comprehensive documentation
Example code in 5+ languages
Interactive API explorer

NFR-3.6.2: Error Messages

Priority: Medium Requirement: Clear, actionable error messages

Format:

pub struct ApiError {
    code: String,
    message: String,
    details: Option<serde_json::Value>,
    suggestion: Option<String>,
    documentation_url: Option<String>,
}

3.7 Maintainability

NFR-3.7.1: Code Quality

Priority: High Requirements:

80%+ test coverage
Clippy warnings as errors
Rustfmt formatting enforced
Documentation for public APIs

NFR-3.7.2: Logging

Priority: High Requirement: Structured logging at multiple levels

Levels:

ERROR: Processing failures
WARN: Degraded performance
INFO: Request/response logs
DEBUG: Detailed processing steps
TRACE: Symbol-level recognition

use tracing::{info, debug, error};

#[instrument(skip(image_data))]
async fn process_image(image_data: &[u8]) -> Result<Recognition> {
    info!("Starting image processing");
    debug!("Image size: {} bytes", image_data.len());

    let result = recognize(image_data).await?;

    info!(
        confidence = %result.confidence,
        symbols_detected = result.symbols.len(),
        "Processing complete"
    );

    Ok(result)
}

NFR-3.7.3: Monitoring

Priority: High Requirement: Prometheus metrics for all operations

Metrics:

Request rate
Error rate
Processing latency
Model inference time
Memory usage
Queue depth

4. Input/Output Specifications

4.1 Input Specifications

4.1.1 Image Input

Supported Formats:

pub enum ImageFormat {
    Jpeg,
    Png,
    Gif,
    Tiff,
    WebP,
    Bmp,
}

pub struct ImageInput {
    format: ImageFormat,
    data: ImageData,
    metadata: Option<ImageMetadata>,
}

pub enum ImageData {
    Base64(String),
    Binary(Vec<u8>),
    Url(String),
}

pub struct ImageMetadata {
    width: u32,
    height: u32,
    dpi: Option<u32>,
    color_space: ColorSpace,
    exif: Option<ExifData>,
}

Constraints:

pub const MAX_IMAGE_SIZE: usize = 10 * 1024 * 1024; // 10MB
pub const MIN_DIMENSION: u32 = 100;
pub const MAX_DIMENSION: u32 = 4000;
pub const SUPPORTED_MIME_TYPES: &[&str] = &[
    "image/jpeg",
    "image/png",
    "image/gif",
    "image/tiff",
    "image/webp",
    "image/bmp",
];

Example JSON Request:

{
  "src": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
  "formats": ["latex", "mathml", "text"],
  "ocr": ["math", "text"],
  "metadata": {
    "include_geometry": true,
    "include_confidence": true,
    "include_line_data": true
  }
}

4.1.2 PDF Input

pub struct PdfInput {
    data: Vec<u8>,
    options: PdfProcessingOptions,
}

pub struct PdfProcessingOptions {
    page_range: Option<Range<usize>>,
    dpi: u32, // Default: 300
    extract_text: bool,
    extract_images: bool,
    preserve_layout: bool,
}

Example Request:

{
  "pdf": "base64_encoded_pdf_data",
  "conversion_formats": {
    "latex": true,
    "mmd": true
  },
  "page_ranges": [[1, 10]],
  "options": {
    "dpi": 300,
    "extract_text": true
  }
}

4.1.3 Stroke Input (Handwriting)

pub struct StrokeInput {
    strokes: Vec<Stroke>,
    canvas_size: (u32, u32),
}

pub struct Stroke {
    points: Vec<Point>,
    timestamps: Option<Vec<u64>>, // milliseconds
    pressure: Option<Vec<f32>>,   // 0.0 to 1.0
}

pub struct Point {
    x: f32,
    y: f32,
}

Example Request:

{
  "strokes": [
    {
      "points": [[10, 20], [15, 25], [20, 30]],
      "timestamps": [0, 50, 100]
    }
  ],
  "canvas_size": [800, 600],
  "formats": ["latex"]
}

4.2 Output Specifications

4.2.1 Recognition Response

pub struct RecognitionResponse {
    // Core recognition
    text: String,
    latex: Option<String>,
    mathml: Option<String>,
    asciimath: Option<String>,
    mmd: Option<String>,

    // Confidence and quality
    confidence: f32,
    confidence_rate: f32,

    // Geometric information
    line_data: Option<Vec<LineData>>,
    word_data: Option<Vec<WordData>>,
    position: Option<Position>,

    // Metadata
    is_printed: Option<bool>,
    is_handwritten: Option<bool>,
    detected_alphabets: Vec<Alphabet>,

    // Processing info
    processing_time_ms: u64,
    model_version: String,
}

pub struct LineData {
    text: String,
    confidence: f32,
    bbox: BoundingBox,
    type_: LineType,
}

pub enum LineType {
    Text,
    Math,
    ChemicalFormula,
    Table,
    Diagram,
}

pub struct WordData {
    text: String,
    confidence: f32,
    bbox: BoundingBox,
}

pub enum Alphabet {
    Latin,
    Greek,
    Cyrillic,
    Hebrew,
    Arabic,
    Mathematical,
    Chemical,
}

Example JSON Response:

{
  "text": "The quadratic formula is x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}",
  "latex": "x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}",
  "mathml": "<math>...</math>",
  "confidence": 0.97,
  "confidence_rate": 0.95,
  "line_data": [
    {
      "text": "The quadratic formula is",
      "confidence": 0.99,
      "bbox": {"x": 10, "y": 20, "width": 200, "height": 25},
      "type": "text"
    },
    {
      "text": "x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}",
      "confidence": 0.96,
      "bbox": {"x": 10, "y": 50, "width": 300, "height": 40},
      "type": "math"
    }
  ],
  "is_printed": true,
  "is_handwritten": false,
  "detected_alphabets": ["latin", "mathematical"],
  "processing_time_ms": 87,
  "model_version": "1.0.0"
}

4.2.2 Error Response

pub struct ErrorResponse {
    error: String,
    error_code: ErrorCode,
    message: String,
    details: Option<serde_json::Value>,
    suggestion: Option<String>,
    documentation_url: String,
}

pub enum ErrorCode {
    InvalidInput,
    UnsupportedFormat,
    ImageTooLarge,
    ProcessingTimeout,
    InternalError,
    RateLimitExceeded,
    UnauthorizedRequest,
}

Example Error Response:

{
  "error": "invalid_image_format",
  "error_code": "UNSUPPORTED_FORMAT",
  "message": "The provided image format is not supported",
  "details": {
    "detected_format": "image/svg+xml",
    "supported_formats": ["image/jpeg", "image/png", "image/gif"]
  },
  "suggestion": "Convert your image to JPEG or PNG format before uploading",
  "documentation_url": "https://docs.scipix.com/formats"
}

4.2.3 Batch Processing Response

pub struct BatchResponse {
    results: Vec<BatchResult>,
    total_processing_time_ms: u64,
    success_count: usize,
    failure_count: usize,
}

pub struct BatchResult {
    index: usize,
    success: bool,
    result: Option<RecognitionResponse>,
    error: Option<ErrorResponse>,
}

5. API Design

5.1 REST API Specification

Base URL

https://api.scipix.com/v3/

Authentication

Authorization: Bearer <api_key>
Content-Type: application/json

5.2 Endpoints

5.2.1 Text Recognition

Endpoint: POST /v3/text

Description: Convert image to text and mathematical markup

Request:

pub struct TextRecognitionRequest {
    /// Image source (Base64, URL, or binary)
    src: ImageSource,

    /// Output formats to generate
    #[serde(default)]
    formats: Vec<OutputFormat>,

    /// OCR modes to use
    #[serde(default)]
    ocr: Vec<OcrMode>,

    /// Processing options
    #[serde(default)]
    options: ProcessingOptions,

    /// Metadata to include in response
    #[serde(default)]
    metadata: MetadataOptions,
}

pub enum ImageSource {
    Base64(String),
    Url(String),
    Binary(Vec<u8>),
}

pub enum OutputFormat {
    Text,
    Latex,
    MathML,
    AsciiMath,
    MMD,
    HTML,
}

pub enum OcrMode {
    Math,
    Text,
    Chemistry,
    Table,
    Diagram,
}

pub struct ProcessingOptions {
    /// Enable equation numbering
    pub equation_numbers: Option<bool>,

    /// Include LaTeX packages
    pub latex_packages: Option<Vec<String>>,

    /// Custom delimiters for math
    pub math_delimiters: Option<MathDelimiters>,

    /// Confidence threshold (0.0-1.0)
    pub confidence_threshold: Option<f32>,

    /// Enable preprocessing
    pub preprocessing: Option<PreprocessingOptions>,
}

pub struct MetadataOptions {
    pub include_geometry: bool,
    pub include_confidence: bool,
    pub include_line_data: bool,
    pub include_word_data: bool,
}

Example Request:

POST /v3/text HTTP/1.1
Authorization: Bearer sk_live_abc123
Content-Type: application/json

{
  "src": "data:image/png;base64,iVBORw0KGgo...",
  "formats": ["latex", "mathml", "text"],
  "ocr": ["math", "text"],
  "options": {
    "equation_numbers": true,
    "confidence_threshold": 0.8
  },
  "metadata": {
    "include_geometry": true,
    "include_confidence": true
  }
}

Response: 200 OK

{
  "request_id": "req_abc123",
  "text": "Einstein's equation: E = mc^2",
  "latex": "E = mc^2",
  "mathml": "<math><mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></math>",
  "confidence": 0.98,
  "processing_time_ms": 75
}

5.2.2 Stroke Recognition

Endpoint: POST /v3/strokes

Description: Convert handwritten strokes to mathematical notation

Request:

pub struct StrokeRecognitionRequest {
    strokes: Vec<Stroke>,
    canvas_size: (u32, u32),
    formats: Vec<OutputFormat>,
    options: StrokeProcessingOptions,
}

pub struct StrokeProcessingOptions {
    /// Recognize as equation or expression
    pub mode: StrokeMode,

    /// Previous context for incremental recognition
    pub context: Option<String>,

    /// Language/alphabet hint
    pub alphabet_hint: Option<Vec<Alphabet>>,
}

pub enum StrokeMode {
    Expression,
    Equation,
    Text,
}

Example Request:

POST /v3/strokes HTTP/1.1
Authorization: Bearer sk_live_abc123
Content-Type: application/json

{
  "strokes": [
    {
      "points": [[50, 100], [55, 95], [60, 90]],
      "timestamps": [0, 50, 100]
    }
  ],
  "canvas_size": [800, 600],
  "formats": ["latex", "text"]
}

5.2.3 LaTeX Rendering

Endpoint: POST /v3/latex

Description: Render LaTeX to image

Request:

pub struct LatexRenderRequest {
    latex: String,
    format: ImageFormat,
    options: RenderOptions,
}

pub struct RenderOptions {
    pub dpi: u32,              // Default: 300
    pub foreground: String,     // Hex color
    pub background: String,     // Hex color
    pub padding: u32,          // Pixels
    pub font_size: u32,        // Points
}

Example Request:

POST /v3/latex HTTP/1.1
Authorization: Bearer sk_live_abc123
Content-Type: application/json

{
  "latex": "\\int_0^\\infty e^{-x^2} dx = \\frac{\\sqrt{\\pi}}{2}",
  "format": "png",
  "options": {
    "dpi": 300,
    "foreground": "#000000",
    "background": "#FFFFFF"
  }
}

Response: Binary image data or Base64

5.2.4 PDF Processing

Endpoint: POST /v3/pdf

Description: Convert PDF to text and mathematical markup

Request:

pub struct PdfProcessingRequest {
    pdf: Vec<u8>,  // Base64 or binary
    conversion_formats: ConversionFormats,
    page_ranges: Option<Vec<Range<usize>>>,
    options: PdfOptions,
}

pub struct ConversionFormats {
    pub latex: bool,
    pub mathml: bool,
    pub mmd: bool,
    pub docx: bool,
    pub html: bool,
}

pub struct PdfOptions {
    pub dpi: u32,
    pub extract_text: bool,
    pub extract_images: bool,
    pub preserve_layout: bool,
    pub ocr_strategy: OcrStrategy,
}

pub enum OcrStrategy {
    Auto,
    AlwaysOcr,
    TextOnly,
}

Example Request:

POST /v3/pdf HTTP/1.1
Authorization: Bearer sk_live_abc123
Content-Type: multipart/form-data

{
  "pdf": "base64_pdf_data",
  "conversion_formats": {
    "latex": true,
    "mmd": true
  },
  "page_ranges": [[1, 5]],
  "options": {
    "dpi": 300,
    "ocr_strategy": "auto"
  }
}

Response:

{
  "pages": [
    {
      "page_number": 1,
      "text": "...",
      "latex": "...",
      "mmd": "..."
    }
  ],
  "total_pages": 5,
  "processing_time_ms": 2340
}

5.3 Rate Limiting

pub struct RateLimiter {
    requests_per_second: u32,
    requests_per_hour: u32,
    concurrent_requests: u32,
}

impl Default for RateLimiter {
    fn default() -> Self {
        Self {
            requests_per_second: 10,
            requests_per_hour: 1000,
            concurrent_requests: 5,
        }
    }
}

Rate Limit Headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 950
X-RateLimit-Reset: 1640995200

5.4 Versioning

API version in URL: /v3/
Backward compatibility for minor versions
Deprecation notices 6 months before removal

6. Data Models

6.1 Core Models

6.1.1 Mathematical Expression

use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MathExpression {
    pub id: Uuid,
    pub latex: String,
    pub mathml: Option<String>,
    pub asciimath: Option<String>,
    pub expression_tree: ExpressionTree,
    pub symbols: Vec<MathSymbol>,
    pub bounding_box: BoundingBox,
    pub confidence: f32,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExpressionTree {
    pub root: ExpressionNode,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExpressionNode {
    pub node_type: NodeType,
    pub value: Option<String>,
    pub children: Vec<ExpressionNode>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum NodeType {
    Number,
    Variable,
    Operator(Operator),
    Function(Function),
    Fraction,
    Exponent,
    Subscript,
    Matrix,
    Integral,
    Sum,
    Product,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Operator {
    Add,
    Subtract,
    Multiply,
    Divide,
    Equals,
    LessThan,
    GreaterThan,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Function {
    Sin,
    Cos,
    Tan,
    Log,
    Ln,
    Sqrt,
    Custom(String),
}

6.1.2 Symbol Recognition

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MathSymbol {
    pub id: Uuid,
    pub symbol: String,
    pub unicode: u32,
    pub latex_command: String,
    pub category: SymbolCategory,
    pub bounding_box: BoundingBox,
    pub confidence: f32,
    pub alternatives: Vec<SymbolAlternative>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SymbolCategory {
    Digit,
    Letter,
    GreekLetter,
    Operator,
    Relation,
    Delimiter,
    Arrow,
    Accent,
    LargeOperator,
    BinaryOperator,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SymbolAlternative {
    pub symbol: String,
    pub confidence: f32,
}

6.1.3 Document Structure

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Document {
    pub id: Uuid,
    pub pages: Vec<Page>,
    pub metadata: DocumentMetadata,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Page {
    pub page_number: usize,
    pub blocks: Vec<ContentBlock>,
    pub dimensions: (u32, u32),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ContentBlock {
    Text(TextBlock),
    Math(MathBlock),
    Table(TableBlock),
    Image(ImageBlock),
    Diagram(DiagramBlock),
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TextBlock {
    pub text: String,
    pub lines: Vec<TextLine>,
    pub bounding_box: BoundingBox,
    pub font_info: Option<FontInfo>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MathBlock {
    pub expression: MathExpression,
    pub display_mode: bool,
    pub numbered: bool,
    pub equation_number: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TableBlock {
    pub rows: usize,
    pub cols: usize,
    pub cells: Vec<Vec<ContentBlock>>,
    pub bounding_box: BoundingBox,
}

6.2 Processing Models

6.2.1 Recognition Pipeline

#[derive(Debug, Clone)]
pub struct RecognitionPipeline {
    pub stages: Vec<PipelineStage>,
}

#[derive(Debug, Clone)]
pub enum PipelineStage {
    Preprocessing(PreprocessingConfig),
    Detection(DetectionConfig),
    Recognition(RecognitionConfig),
    Postprocessing(PostprocessingConfig),
}

#[derive(Debug, Clone)]
pub struct PreprocessingConfig {
    pub denoise: bool,
    pub deskew: bool,
    pub binarize: bool,
    pub enhance_contrast: bool,
    pub remove_artifacts: bool,
}

#[derive(Debug, Clone)]
pub struct DetectionConfig {
    pub detect_text: bool,
    pub detect_math: bool,
    pub detect_tables: bool,
    pub detect_diagrams: bool,
    pub min_confidence: f32,
}

#[derive(Debug, Clone)]
pub struct RecognitionConfig {
    pub model_type: ModelType,
    pub beam_width: usize,
    pub temperature: f32,
    pub max_length: usize,
}

#[derive(Debug, Clone)]
pub enum ModelType {
    CnnLstm,
    Transformer,
    Hybrid,
}

6.3 Storage Models

6.3.1 Vector Embeddings

use ruvector_core::{Vector, VectorId, VectorMetadata};

#[derive(Debug, Clone)]
pub struct SymbolEmbedding {
    pub symbol_id: Uuid,
    pub vector_id: VectorId,
    pub embedding: Vector,
    pub metadata: SymbolMetadata,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SymbolMetadata {
    pub symbol: String,
    pub category: SymbolCategory,
    pub frequency: u32,
    pub variants: Vec<String>,
    pub created_at: i64,
}

impl From<SymbolEmbedding> for VectorMetadata {
    fn from(embedding: SymbolEmbedding) -> Self {
        VectorMetadata {
            id: embedding.vector_id,
            tags: vec![
                format!("category:{}", embedding.metadata.category.to_string()),
                format!("symbol:{}", embedding.metadata.symbol),
            ],
            ..Default::default()
        }
    }
}

6.3.2 Pattern Cache

#[derive(Debug, Clone)]
pub struct PatternCache {
    pub patterns: HashMap<String, CachedPattern>,
    pub max_size: usize,
}

#[derive(Debug, Clone)]
pub struct CachedPattern {
    pub pattern: String,
    pub latex: String,
    pub confidence: f32,
    pub usage_count: u32,
    pub last_used: DateTime<Utc>,
}

7. Use Cases and User Stories

7.1 Academic Researcher

User Story:

"As an academic researcher, I want to convert my handwritten mathematical derivations into LaTeX so that I can include them in my papers without retyping."

Use Case UC-001: Handwritten Notes Conversion

Actor: Academic Researcher

Preconditions:

User has handwritten mathematical notes
User has photographed or scanned the notes
Image quality is sufficient (300+ DPI)

Main Flow:

User uploads image via API or web interface
System preprocesses image (deskew, denoise)
System detects mathematical regions
System recognizes handwritten symbols
System generates LaTeX code
System returns result with confidence scores
User reviews and makes corrections if needed
User exports to LaTeX document

Postconditions:

LaTeX code generated
Original image preserved
Confidence scores provided

Alternative Flows:

3a. Low confidence: System requests higher quality image
4a. Ambiguous symbols: System provides alternatives
5a. Complex layout: System segments into regions

Acceptance Criteria:

90%+ accuracy on handwritten math
Processing time <5 seconds per page
Confidence scores for all symbols
Alternative suggestions for low-confidence symbols

7.2 Student

User Story:

"As a student, I want to quickly digitize equations from my textbook so that I can solve them in Mathematica or WolframAlpha."

Use Case UC-002: Textbook Equation Extraction

Actor: Student

Preconditions:

User has textbook with equations
User can photograph equations clearly

Main Flow:

Student photographs equation with phone
Student uploads via mobile app or API
System recognizes printed equation
System generates multiple formats (LaTeX, AsciiMath, MathML)
Student copies format of choice
Student pastes into computational tool

Postconditions:

Equation converted to multiple formats
Copy-paste ready output

Alternative Flows:

3a. Image quality issues: System requests retake
4a. Multiple equations: System segments automatically

Acceptance Criteria:

95%+ accuracy on printed equations
Processing time <2 seconds
Support for inline and display equations
Output compatible with major math tools

7.3 Publisher

User Story:

"As a publisher, I want to convert legacy mathematical documents to modern formats so that we can create accessible digital editions."

Use Case UC-003: Legacy Document Conversion

Actor: Publisher

Preconditions:

Publisher has scanned PDFs of legacy documents
Documents contain mathematical content
OCR text layer may be absent or poor quality

Main Flow:

Publisher uploads PDF document
System processes pages in parallel
System extracts text and math separately
System generates Scipix Markdown (MMD)
System generates accessible HTML with MathML
Publisher reviews and exports final format

Postconditions:

Document converted to multiple formats
Accessibility standards met (WCAG 2.1)
Mathematical content preserved

Alternative Flows:

2a. Large document: System provides progress updates
3a. Complex layouts: System preserves structure
4a. Tables and diagrams: System maintains formatting

Acceptance Criteria:

Process 100-page document in <10 minutes
Preserve document structure (headings, lists, etc.)
Generate accessible output (WCAG 2.1 AA)
Support for tables and diagrams

7.4 Developer

User Story:

"As a developer, I want to integrate math OCR into my educational app so that students can solve problems by taking photos."

Use Case UC-004: API Integration

Actor: Application Developer

Preconditions:

Developer has API credentials
Developer's app can capture images
Developer can make HTTP requests

Main Flow:

Developer reads API documentation
Developer implements authentication
Developer captures image in app
Developer sends image to API
API returns recognition results
Developer displays results in app
Developer implements error handling

Postconditions:

Math OCR integrated into app
Users can recognize equations
Errors handled gracefully

Alternative Flows:

4a. Rate limit exceeded: Developer implements backoff
5a. Low confidence: Developer requests user verification
6a. Network error: Developer shows offline message

Acceptance Criteria:

Clear API documentation with examples
SDKs for major languages (Python, JavaScript, etc.)
Comprehensive error codes and messages
Rate limiting with clear headers

7.5 Chemistry Student

User Story:

"As a chemistry student, I want to digitize chemical equations from my lab notebook so that I can maintain a digital record."

Use Case UC-005: Chemical Formula Recognition

Actor: Chemistry Student

Preconditions:

Student has lab notebook with chemical formulas
Formulas include subscripts, superscripts, arrows

Main Flow:

Student photographs chemical equation
System recognizes chemical notation
System generates LaTeX (mhchem package)
System generates SMILES notation
Student exports to digital lab notebook

Postconditions:

Chemical equation digitized
Multiple output formats available

Alternative Flows:

2a. Complex structural formula: System generates SVG
3a. Reaction mechanism: System preserves arrows and conditions

Acceptance Criteria:

93%+ accuracy on chemical formulas
Support for subscripts and superscripts
Recognize reaction arrows and conditions
Generate SMILES for molecules

8. Success Criteria and Acceptance Tests

8.1 Performance Benchmarks

Test Suite 1: Latency Benchmarks

#[cfg(test)]
mod latency_tests {
    use super::*;
    use std::time::Instant;

    #[tokio::test]
    async fn test_single_image_p50_latency() {
        let processor = MathProcessor::new();
        let image = load_test_image("simple_equation.png");

        let mut measurements = vec![];
        for _ in 0..100 {
            let start = Instant::now();
            let _ = processor.process(&image).await.unwrap();
            measurements.push(start.elapsed());
        }

        measurements.sort();
        let p50 = measurements[50];

        assert!(
            p50 < Duration::from_millis(50),
            "P50 latency {} exceeds 50ms target",
            p50.as_millis()
        );
    }

    #[tokio::test]
    async fn test_single_image_p95_latency() {
        let processor = MathProcessor::new();
        let image = load_test_image("complex_equation.png");

        let mut measurements = vec![];
        for _ in 0..100 {
            let start = Instant::now();
            let _ = processor.process(&image).await.unwrap();
            measurements.push(start.elapsed());
        }

        measurements.sort();
        let p95 = measurements[95];

        assert!(
            p95 < Duration::from_millis(100),
            "P95 latency {} exceeds 100ms target",
            p95.as_millis()
        );
    }

    #[tokio::test]
    async fn test_batch_processing_time() {
        let processor = MathProcessor::new();
        let images: Vec<_> = (0..100)
            .map(|i| load_test_image(&format!("equation_{}.png", i)))
            .collect();

        let start = Instant::now();
        let results = processor.process_batch(&images).await.unwrap();
        let duration = start.elapsed();

        assert_eq!(results.len(), 100);
        assert!(
            duration < Duration::from_secs(5),
            "Batch processing took {}s, exceeds 5s target",
            duration.as_secs()
        );
    }
}

Test Suite 2: Accuracy Benchmarks

#[cfg(test)]
mod accuracy_tests {
    use super::*;

    #[tokio::test]
    async fn test_printed_math_accuracy() {
        let processor = MathProcessor::new();
        let test_dataset = load_dataset("printed_math_benchmark");

        let mut total_cer = 0.0;
        let mut count = 0;

        for (image, ground_truth) in test_dataset.iter() {
            let result = processor.process(image).await.unwrap();
            let cer = calculate_character_error_rate(&result.latex, ground_truth);
            total_cer += cer;
            count += 1;
        }

        let avg_cer = total_cer / count as f32;
        let accuracy = 1.0 - avg_cer;

        assert!(
            accuracy >= 0.95,
            "Printed math accuracy {:.2}% is below 95% target",
            accuracy * 100.0
        );
    }

    #[tokio::test]
    async fn test_handwritten_math_accuracy() {
        let processor = MathProcessor::new();
        let test_dataset = load_dataset("crohme_2019");

        let mut correct = 0;
        let mut total = 0;

        for (strokes, ground_truth) in test_dataset.iter() {
            let result = processor.process_strokes(strokes).await.unwrap();
            if normalize_latex(&result.latex) == normalize_latex(ground_truth) {
                correct += 1;
            }
            total += 1;
        }

        let accuracy = correct as f32 / total as f32;

        assert!(
            accuracy >= 0.90,
            "Handwritten math accuracy {:.2}% is below 90% target",
            accuracy * 100.0
        );
    }

    #[tokio::test]
    async fn test_chemical_formula_accuracy() {
        let processor = MathProcessor::new();
        let test_dataset = load_dataset("chemistry_formulas");

        let mut correct = 0;
        let mut total = 0;

        for (image, ground_truth) in test_dataset.iter() {
            let result = processor.process(image).await.unwrap();
            if result.latex == ground_truth.latex {
                correct += 1;
            }
            total += 1;
        }

        let accuracy = correct as f32 / total as f32;

        assert!(
            accuracy >= 0.93,
            "Chemical formula accuracy {:.2}% is below 93% target",
            accuracy * 100.0
        );
    }
}

Test Suite 3: Scalability Tests

#[cfg(test)]
mod scalability_tests {
    use super::*;

    #[tokio::test]
    async fn test_concurrent_requests() {
        let processor = Arc::new(MathProcessor::new());
        let mut handles = vec![];

        for i in 0..1000 {
            let processor = processor.clone();
            let handle = tokio::spawn(async move {
                let image = generate_test_image(i);
                processor.process(&image).await
            });
            handles.push(handle);
        }

        let results: Vec<_> = futures::future::join_all(handles)
            .await
            .into_iter()
            .collect();

        let success_count = results.iter().filter(|r| r.is_ok()).count();
        let success_rate = success_count as f32 / 1000.0;

        assert!(
            success_rate >= 0.99,
            "Success rate {:.2}% below 99% target",
            success_rate * 100.0
        );
    }

    #[tokio::test]
    async fn test_memory_usage() {
        let processor = MathProcessor::new();

        let initial_memory = get_memory_usage();

        // Process 1000 images
        for i in 0..1000 {
            let image = generate_test_image(i);
            let _ = processor.process(&image).await.unwrap();
        }

        let final_memory = get_memory_usage();
        let memory_increase = final_memory - initial_memory;

        assert!(
            memory_increase < 2_000_000_000, // 2GB
            "Memory usage increased by {} bytes, exceeds 2GB limit",
            memory_increase
        );
    }
}

8.2 API Compatibility Tests

#[cfg(test)]
mod api_compatibility_tests {
    use super::*;

    #[tokio::test]
    async fn test_scipix_api_request_format() {
        let client = TestClient::new();

        let request = json!({
            "src": "data:image/png;base64,...",
            "formats": ["latex", "mathml"],
            "ocr": ["math", "text"]
        });

        let response = client
            .post("/v3/text")
            .json(&request)
            .send()
            .await
            .unwrap();

        assert_eq!(response.status(), 200);

        let body: serde_json::Value = response.json().await.unwrap();
        assert!(body.get("latex").is_some());
        assert!(body.get("mathml").is_some());
        assert!(body.get("confidence").is_some());
    }

    #[tokio::test]
    async fn test_error_response_format() {
        let client = TestClient::new();

        let request = json!({
            "src": "invalid_data"
        });

        let response = client
            .post("/v3/text")
            .json(&request)
            .send()
            .await
            .unwrap();

        assert_eq!(response.status(), 400);

        let body: ErrorResponse = response.json().await.unwrap();
        assert!(!body.error.is_empty());
        assert!(!body.message.is_empty());
    }
}

8.3 Acceptance Criteria Checklist

Functional Requirements

Support all specified image formats (JPEG, PNG, GIF, TIFF, WebP, BMP)
Process PDF documents (up to 100 pages)
Recognize printed mathematical equations (95%+ accuracy)
Recognize handwritten equations (90%+ accuracy)
Recognize chemical formulas (93%+ accuracy)
Generate LaTeX output
Generate MathML output
Generate Scipix Markdown
Provide confidence scores
Extract bounding boxes and geometry
Segment lines and words
Support batch processing

Non-Functional Requirements

Single image latency <100ms (p95)
Batch processing: 100 images in <5 seconds
Support 1000+ concurrent users
99.9% uptime SLA
Memory usage <2GB per worker
Horizontal scaling to 10+ nodes

API Requirements

RESTful API following OpenAPI 3.0
API key authentication
Rate limiting
Comprehensive error messages
API documentation with examples
Compatible with Scipix API v3 (95%+)

Quality Requirements

80%+ test coverage
No Clippy warnings
Formatted with Rustfmt
Documentation for all public APIs
Structured logging with tracing
Prometheus metrics

9. Constraints and Limitations

9.1 Technical Constraints

9.1.1 Processing Limitations

Image Size Constraints:

pub const MAX_IMAGE_SIZE: usize = 10 * 1024 * 1024; // 10MB
pub const MIN_IMAGE_DIMENSION: u32 = 100;           // 100px
pub const MAX_IMAGE_DIMENSION: u32 = 4000;          // 4000px
pub const RECOMMENDED_DPI: u32 = 300;               // 300 DPI

Performance Limitations:

Processing time increases with image size
Complex equations may exceed 100ms target
Very low quality images may fail recognition
Batch processing limited to 1000 images per request

Accuracy Limitations:

Handwritten accuracy depends on legibility
Very stylized fonts may reduce accuracy
Mixed languages in same equation may confuse recognition
Structural formulas (chemistry) have limited support

9.1.2 Format Limitations

Input Formats:

SVG not supported (rasterize first)
Animated GIFs (only first frame processed)
HEIC/HEIF require conversion
Password-protected PDFs require password

Output Formats:

LaTeX: Requires standard packages (amsmath, amssymb)
MathML: Version 3.0 only
DOCX: Basic formatting only
HTML: Requires MathJax or KaTeX for rendering

9.1.3 Character Set Limitations

pub enum SupportLevel {
    Full,        // 95%+ accuracy
    Partial,     // 80-95% accuracy
    Limited,     // 60-80% accuracy
    Experimental, // <60% accuracy
}

pub const CHARACTER_SUPPORT: &[(CharacterSet, SupportLevel)] = &[
    (CharacterSet::BasicLatin, SupportLevel::Full),
    (CharacterSet::Greek, SupportLevel::Full),
    (CharacterSet::MathematicalOperators, SupportLevel::Full),
    (CharacterSet::Cyrillic, SupportLevel::Partial),
    (CharacterSet::Hebrew, SupportLevel::Limited),
    (CharacterSet::Arabic, SupportLevel::Limited),
    (CharacterSet::CJK, SupportLevel::Experimental),
];

9.2 Operational Constraints

9.2.1 Resource Requirements

Minimum Hardware:

CPU: 4 cores (2.0 GHz+)
RAM: 8GB
Storage: 20GB (including models)
Network: 100 Mbps

Recommended Hardware:

CPU: 8+ cores (3.0 GHz+)
RAM: 16GB+
Storage: 100GB SSD
Network: 1 Gbps
GPU: Optional (CUDA-capable for acceleration)

9.2.2 Dependency Constraints

[dependencies]
# Core dependencies
ruvector-core = "0.3"        # Vector storage
tokio = { version = "1.0", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }

# Image processing
image = "0.24"
imageproc = "0.23"

# ML models (size constraints)
onnxruntime = "0.0.14"       # Model size: ~500MB
tensorflow = { version = "0.20", optional = true }  # Model size: ~1GB

# Document processing
pdf = "0.8"
lopdf = "0.26"
docx-rs = "0.4"

# Constraints
# - ONNX runtime: Prebuilt binaries required
# - TensorFlow: Optional, adds 1GB+ to binary
# - PDF libraries: Limited to PDF 1.7

9.2.3 Compliance Constraints

Privacy Requirements:

GDPR: No persistent storage of user data
CCPA: User data deletion within 30 days
HIPAA: Not certified (avoid medical documents)

Accessibility Requirements:

WCAG 2.1 AA for HTML output
Screen reader compatible MathML
Alt text for all images

License Constraints:

MIT/Apache-2.0 for core library
Model licenses vary by source
Dataset licenses must be respected

9.3 Design Constraints

9.3.1 API Compatibility

Must Maintain:

URL structure: /v3/{endpoint}
Request/response formats
Error codes and messages
Authentication mechanism
Rate limit headers

May Differ:

Internal implementation
Performance characteristics
Additional features
Model architectures

9.3.2 Extensibility Requirements

// Plugin architecture for custom models
pub trait RecognitionModel: Send + Sync {
    fn recognize(&self, image: &Image) -> Result<Recognition>;
    fn model_info(&self) -> ModelInfo;
}

// Hook system for preprocessing
pub trait PreprocessingHook: Send + Sync {
    fn process(&self, image: Image) -> Result<Image>;
    fn priority(&self) -> i32;
}

// Custom output formatters
pub trait OutputFormatter: Send + Sync {
    fn format(&self, recognition: &Recognition) -> Result<String>;
    fn mime_type(&self) -> &str;
}

9.3.3 Scalability Constraints

Vertical Scaling:

Limited by single-machine resources
Model size limits memory scaling
CPU-bound processing limits throughput

Horizontal Scaling:

Stateless design required
Shared storage for models
Coordinated caching strategy
Load balancer required

10. Dependencies

10.1 Core Dependencies

10.1.1 ruvector-core Integration

Purpose: Vector storage for symbol embeddings and pattern matching

use ruvector_core::{
    VectorDatabase, Vector, VectorId, VectorMetadata,
    SearchOptions, SearchResult,
};

pub struct SymbolDatabase {
    db: VectorDatabase,
}

impl SymbolDatabase {
    pub async fn new(path: &str) -> Result<Self> {
        let db = VectorDatabase::open(path).await?;
        Ok(Self { db })
    }

    pub async fn find_similar_symbols(
        &self,
        embedding: &Vector,
        limit: usize,
    ) -> Result<Vec<SymbolMatch>> {
        let options = SearchOptions {
            limit,
            threshold: 0.8,
            ..Default::default()
        };

        let results = self.db.search(embedding, &options).await?;

        Ok(results
            .into_iter()
            .map(|r| SymbolMatch {
                symbol: r.metadata.get("symbol").unwrap().to_string(),
                confidence: r.score,
            })
            .collect())
    }

    pub async fn add_symbol(
        &self,
        symbol: &str,
        embedding: Vector,
        metadata: SymbolMetadata,
    ) -> Result<VectorId> {
        let vector_metadata = VectorMetadata {
            tags: vec![
                format!("symbol:{}", symbol),
                format!("category:{}", metadata.category.to_string()),
            ],
            ..Default::default()
        };

        self.db.insert(embedding, vector_metadata).await
    }
}

Use Cases:

Symbol recognition via nearest neighbor search
Pattern matching for common equations
Caching of recognized expressions
Similarity-based error correction

Performance Requirements:

Search latency: <10ms for 1M vectors
Insert throughput: 10,000+ vectors/sec
Memory efficiency: Quantization support
Horizontal scaling: Distributed mode

10.1.2 Machine Learning Models

Symbol Recognition Model:

pub struct SymbolRecognitionModel {
    session: onnxruntime::Session,
    embedder: Embedder,
    symbol_db: SymbolDatabase,
}

impl SymbolRecognitionModel {
    pub fn load(model_path: &str, symbol_db: SymbolDatabase) -> Result<Self> {
        let session = onnxruntime::SessionBuilder::new()?
            .with_model_from_file(model_path)?;

        let embedder = Embedder::new(embedding_dim: 512);

        Ok(Self { session, embedder, symbol_db })
    }

    pub async fn recognize(&self, image: &Image) -> Result<Vec<Symbol>> {
        // 1. Extract symbol regions
        let regions = self.detect_symbols(image)?;

        // 2. Generate embeddings
        let embeddings: Vec<_> = regions
            .iter()
            .map(|r| self.embedder.embed(r))
            .collect();

        // 3. Search in vector database
        let mut symbols = vec![];
        for (region, embedding) in regions.iter().zip(embeddings.iter()) {
            let matches = self.symbol_db
                .find_similar_symbols(embedding, 5)
                .await?;

            symbols.push(Symbol {
                bounding_box: region.bbox,
                symbol: matches[0].symbol.clone(),
                confidence: matches[0].confidence,
                alternatives: matches[1..].to_vec(),
            });
        }

        Ok(symbols)
    }
}

Model Requirements:

Format: ONNX Runtime compatible
Size: <500MB per model
Quantization: INT8 support for deployment
Input: 224x224 RGB images (normalized)
Output: 512-dimensional embeddings

10.1.3 Image Processing

Dependencies:

[dependencies]
image = "0.24"           # Image loading/saving
imageproc = "0.23"       # Image processing primitives
fast_image_resize = "2.7" # High-performance resizing

Processing Pipeline:

pub struct ImagePreprocessor {
    config: PreprocessingConfig,
}

impl ImagePreprocessor {
    pub fn preprocess(&self, image: DynamicImage) -> Result<ProcessedImage> {
        let mut img = image;

        // 1. Deskew
        if self.config.deskew {
            img = self.deskew_image(img)?;
        }

        // 2. Denoise
        if self.config.denoise {
            img = self.apply_bilateral_filter(img)?;
        }

        // 3. Binarize
        if self.config.binarize {
            img = self.adaptive_threshold(img)?;
        }

        // 4. Enhance contrast
        if self.config.enhance_contrast {
            img = self.enhance_contrast(img)?;
        }

        Ok(ProcessedImage { image: img })
    }
}

10.2 External Dependencies

10.2.1 Document Processing

PDF Processing:

pdf = "0.8"              # PDF parsing
lopdf = "0.26"           # Low-level PDF operations
pdfium-render = "0.7"    # PDF rendering

DOCX Processing:

docx-rs = "0.4"          # DOCX reading/writing
zip = "0.6"              # DOCX is ZIP-based

10.2.2 Web Framework

axum = "0.6"             # Web framework
tower = "0.4"            # Middleware
tower-http = "0.4"       # HTTP middleware

API Server:

use axum::{
    routing::{post, get},
    Router, Json, extract::State,
};

pub fn create_app(state: AppState) -> Router {
    Router::new()
        .route("/v3/text", post(text_recognition_handler))
        .route("/v3/strokes", post(stroke_recognition_handler))
        .route("/v3/latex", post(latex_render_handler))
        .route("/v3/pdf", post(pdf_processing_handler))
        .route("/health", get(health_check))
        .layer(/* authentication middleware */)
        .layer(/* rate limiting middleware */)
        .layer(/* logging middleware */)
        .with_state(state)
}

10.3 Development Dependencies

[dev-dependencies]
criterion = "0.5"        # Benchmarking
proptest = "1.0"         # Property testing
mockall = "0.11"         # Mocking
tokio-test = "0.4"       # Async testing
insta = "1.26"           # Snapshot testing

10.4 Dependency Version Matrix

Dependency	Minimum Version	Recommended	Notes
ruvector-core	0.3.0	0.3.x	Vector storage
tokio	1.0	1.35+	Async runtime
axum	0.6	0.7+	Web framework
onnxruntime	0.0.14	latest	ML inference
image	0.24	0.24+	Image processing
pdf	0.8	0.8+	PDF parsing

10.5 Build Requirements

System Dependencies:

# Ubuntu/Debian
apt-get install -y \
    build-essential \
    pkg-config \
    libssl-dev \
    cmake

# macOS
brew install cmake openssl

Rust Toolchain:

rustc >= 1.70.0
cargo >= 1.70.0

Appendix A: Glossary

AsciiMath: Simplified mathematical notation for web

Bounding Box: Rectangle enclosing a detected object

CER (Character Error Rate): Metric for OCR accuracy

CROHME: Competition on Recognition of Online Handwritten Mathematical Expressions

LaTeX: Document preparation system for technical content

MathML: Mathematical Markup Language (XML-based)

Scipix Markdown (MMD): Extended Markdown with math support

OCR: Optical Character Recognition

ONNX: Open Neural Network Exchange format

Quantization: Reducing model precision to save memory

SMILES: Simplified Molecular Input Line Entry System

Stroke: Continuous pen/stylus movement

Vector Embedding: Dense numerical representation of data

Appendix B: References

Scipix API Documentation
- https://docs.scipix.com/
CROHME Dataset
- https://www.isical.ac.in/~crohme/
OpenAPI Specification 3.0
- https://swagger.io/specification/
WCAG 2.1 Guidelines
- https://www.w3.org/WAI/WCAG21/quickref/
LaTeX Documentation
- https://www.latex-project.org/help/documentation/
MathML Specification
- https://www.w3.org/TR/MathML3/
ruvector-core Documentation
- https://github.com/ruvnet/ruvector

Document History

Version	Date	Author	Changes
1.0.0	2025-11-28	SPARC Agent	Initial specification

Next Phase: 02_PSEUDOCODE.md - Algorithm design and processing pipelines

59 KiB Raw Blame History Unescape Escape

Scipix Clone - System Requirements Specification

Table of Contents

1. Project Overview & Goals

1.1 Purpose

1.2 Scope

1.3 Target Users

1.4 Project Goals

2. Functional Requirements

2.1 Image Processing

FR-2.1.1: Image Input Support

FR-2.1.2: PDF Processing

FR-2.1.3: Document Processing

2.2 Mathematical Recognition

FR-2.2.1: Equation Recognition

FR-2.2.2: Chemical Formula Recognition

FR-2.2.3: Handwritten Math Recognition

2.3 Output Formats

FR-2.3.1: LaTeX Output

FR-2.3.2: Scipix Markdown (MMD)

FR-2.3.3: MathML Output

FR-2.3.4: AsciiMath Output

FR-2.3.5: HTML/DOCX Export

2.4 API Endpoints

FR-2.4.1: Text Recognition Endpoint

FR-2.4.2: Strokes Recognition Endpoint

FR-2.4.3: LaTeX Rendering Endpoint

FR-2.4.4: PDF Conversion Endpoint

2.5 Additional Features

FR-2.5.1: Confidence Scoring

FR-2.5.2: Geometry Analysis

FR-2.5.3: Line/Word Segmentation

3. Non-Functional Requirements

3.1 Performance

NFR-3.1.1: Latency

NFR-3.1.2: Throughput

NFR-3.1.3: Batch Processing

3.2 Accuracy

NFR-3.2.1: Printed Math Accuracy

NFR-3.2.2: Handwritten Math Accuracy

NFR-3.2.3: Chemical Formula Accuracy

3.3 Scalability

NFR-3.3.1: Concurrent Users

NFR-3.3.2: Horizontal Scaling

NFR-3.3.3: Memory Usage

3.4 Reliability

NFR-3.4.1: Availability

NFR-3.4.2: Error Handling

NFR-3.4.3: Data Validation

3.5 Security

NFR-3.5.1: Authentication

NFR-3.5.2: Data Privacy

NFR-3.5.3: Input Sanitization

3.6 Usability

NFR-3.6.1: API Design

NFR-3.6.2: Error Messages

3.7 Maintainability

NFR-3.7.1: Code Quality

NFR-3.7.2: Logging

NFR-3.7.3: Monitoring

4. Input/Output Specifications

4.1 Input Specifications

4.1.1 Image Input

4.1.2 PDF Input

4.1.3 Stroke Input (Handwriting)

4.2 Output Specifications

4.2.1 Recognition Response

4.2.2 Error Response

4.2.3 Batch Processing Response

5. API Design

5.1 REST API Specification

Base URL

Authentication

5.2 Endpoints

5.2.1 Text Recognition

5.2.2 Stroke Recognition

5.2.3 LaTeX Rendering

5.2.4 PDF Processing

5.3 Rate Limiting

5.4 Versioning

59 KiB

Raw Blame History