# SPARC Pseudocode: Ruvector-Scipix OCR & Math Recognition Pipeline

## Document Overview

This document provides algorithmic pseudocode for the core components of the ruvector-scipix OCR and mathematical expression recognition system. All algorithms use Rust-like syntax and include complexity analysis.

---

## 1. Image Preprocessing Pipeline

### 1.1 Main Preprocessing Algorithm

```
ALGORITHM: PreprocessImage
INPUT: imageBytes (Vec<u8>), config (PreprocessConfig)
OUTPUT: Result<ProcessedImage, Error>

CONSTANTS:
    MAX_IMAGE_DIM = 4096 pixels (per side)
    MIN_DPI = 150
    TARGET_DPI = 300
    NOISE_THRESHOLD = 0.15

DATA STRUCTURES:
    ProcessedImage {
        data: Vec<u8>,
        width: u32,
        height: u32,
        channels: u8,
        metadata: ImageMetadata,
        regions: Vec<TextRegion>
    }

    ImageMetadata {
        dpi: u32,
        rotation: f32,
        quality_score: f32,
        has_math: bool
    }

    TextRegion {
        bbox: BoundingBox,
        confidence: f32,
        region_type: RegionType  // Text, Math, Diagram
    }

BEGIN
    // Phase 1: Image Loading and Validation
    rawImage ← DecodeImage(imageBytes)
    IF rawImage.is_error() THEN
        RETURN Error("Failed to decode image")
    END IF

    IF rawImage.width > MAX_IMAGE_DIM OR rawImage.height > MAX_IMAGE_DIM THEN
        rawImage ← ResizeToFit(rawImage, MAX_IMAGE_DIM)
    END IF

    // Phase 2: Rotation Detection and Correction
    rotationAngle ← DetectRotation(rawImage)
    IF ABS(rotationAngle) > 0.5 THEN
        rawImage ← RotateImage(rawImage, -rotationAngle)
    END IF

    // Phase 3: DPI Normalization
    currentDPI ← EstimateDPI(rawImage)
    IF currentDPI < MIN_DPI THEN
        RETURN Error("Image resolution too low")
    END IF
    IF currentDPI != TARGET_DPI THEN
        scaleFactor ← TARGET_DPI / currentDPI
        rawImage ← ScaleImage(rawImage, scaleFactor)
    END IF

    // Phase 4: Noise Reduction
    noiseLevel ← EstimateNoise(rawImage)
    IF noiseLevel > NOISE_THRESHOLD THEN
        rawImage ← ApplyBilateralFilter(rawImage, sigma: 2.0, radius: 3)
    END IF

    // Phase 5: Contrast Enhancement
    enhancedImage ← AdaptiveHistogramEqualization(rawImage, clip_limit: 2.0)

    // Phase 6: Text Region Detection
    regions ← DetectTextRegions(enhancedImage)

    // Phase 7: Quality Assessment
    qualityScore ← AssessQuality(enhancedImage, regions)

    metadata ← ImageMetadata {
        dpi: TARGET_DPI,
        rotation: rotationAngle,
        quality_score: qualityScore,
        has_math: ContainsMathRegions(regions)
    }

    RETURN Ok(ProcessedImage {
        data: enhancedImage.to_bytes(),
        width: enhancedImage.width,
        height: enhancedImage.height,
        channels: enhancedImage.channels,
        metadata: metadata,
        regions: regions
    })
END

COMPLEXITY ANALYSIS:
Time Complexity:
- Image decoding: O(n) where n = pixel count
- Rotation detection: O(n log n) using Hough transform
- Image rotation: O(n)
- DPI scaling: O(n)
- Bilateral filter: O(n × r²) where r = radius
- CLAHE: O(n)
- Region detection: O(n log n)
Total: O(n log n)

Space Complexity:
- Raw image buffer: O(n)
- Intermediate buffers: O(n)
- Region storage: O(k) where k = region count
Total: O(n)
```
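As a concrete reference for Phases 1 and 3, here is a minimal Rust sketch of the dimension check and DPI-normalization arithmetic. The constants mirror the pseudocode; `ImageInfo`, `RescalePlan`, and `plan_rescale` are hypothetical names standing in for whatever imaging types the implementation actually uses.

```rust
const MAX_IMAGE_DIM: u32 = 4096;
const MIN_DPI: u32 = 150;
const TARGET_DPI: u32 = 300;

/// Hypothetical summary of a decoded image.
struct ImageInfo {
    width: u32,
    height: u32,
    dpi: u32,
}

/// Resize actions the caller should apply, in order.
enum RescalePlan {
    FitWithin(u32), // downscale so both sides fit the limit
    Scale(f32),     // uniform scale factor to reach TARGET_DPI
}

/// Validate the image and plan the rescaling steps from Phases 1 and 3.
fn plan_rescale(img: &ImageInfo) -> Result<Vec<RescalePlan>, String> {
    let mut plan = Vec::new();
    if img.width > MAX_IMAGE_DIM || img.height > MAX_IMAGE_DIM {
        plan.push(RescalePlan::FitWithin(MAX_IMAGE_DIM));
    }
    if img.dpi < MIN_DPI {
        return Err("image resolution too low".to_string());
    }
    if img.dpi != TARGET_DPI {
        plan.push(RescalePlan::Scale(TARGET_DPI as f32 / img.dpi as f32));
    }
    Ok(plan)
}
```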
### 1.2 Rotation Detection Algorithm

```
ALGORITHM: DetectRotation
INPUT: image (Image)
OUTPUT: angle (f32)

BEGIN
    // Convert to grayscale if needed
    grayImage ← ToGrayscale(image)

    // Apply edge detection
    edges ← CannyEdgeDetection(grayImage, low: 50, high: 150)

    // Use Hough Line Transform to detect dominant lines
    lines ← HoughLineTransform(edges, rho: 1.0, theta: PI/180, threshold: 100)

    IF lines.is_empty() THEN
        RETURN 0.0
    END IF

    // Cluster angles into dominant orientations
    angles ← []
    FOR EACH line IN lines DO
        angle ← line.theta * 180 / PI
        // Normalize to [-45, 45] range
        WHILE angle > 45 DO
            angle ← angle - 90
        END WHILE
        WHILE angle < -45 DO
            angle ← angle + 90
        END WHILE
        angles.push(angle)
    END FOR

    // Use the median for robustness against outlier lines
    angles.sort()
    medianAngle ← angles[angles.len() / 2]

    RETURN medianAngle
END

COMPLEXITY ANALYSIS:
Time: O(n log n) for Hough transform
Space: O(n) for edge map
```

### 1.3 Text Region Detection

```
ALGORITHM: DetectTextRegions
INPUT: image (Image)
OUTPUT: regions (Vec<TextRegion>)

DATA STRUCTURES:
    Component {
        pixels: Vec<Point>,
        bbox: BoundingBox,
        area: u32
    }

BEGIN
    // Binarize and extract connected components
    // (MSER, Maximally Stable Extremal Regions, is a common alternative)
    binaryImage ← AdaptiveThreshold(image, window: 15)
    components ← FindConnectedComponents(binaryImage)

    regions ← []
    FOR EACH comp IN components DO
        // Filter by geometric properties
        aspectRatio ← comp.bbox.width / comp.bbox.height
        density ← comp.area / (comp.bbox.width * comp.bbox.height)

        IF aspectRatio > 0.1 AND aspectRatio < 10.0 AND density > 0.3 THEN
            // Classify region type
            features ← ExtractRegionFeatures(comp, image)
            regionType ← ClassifyRegion(features)

            region ← TextRegion {
                bbox: comp.bbox,
                confidence: features.confidence,
                region_type: regionType
            }
            regions.push(region)
        END IF
    END FOR

    // Merge nearby regions
    mergedRegions ← MergeOverlappingRegions(regions, iou_threshold: 0.3)

    RETURN mergedRegions
END

COMPLEXITY ANALYSIS:
Time: O(n × α(n)) where α is the inverse Ackermann function (connected components)
Space: O(k) where k = component count
```
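The geometric filter in `DetectTextRegions` and the IoU test behind `MergeOverlappingRegions` reduce to a few lines of arithmetic. A minimal sketch, assuming integer pixel bounding boxes; `BBox` is a hypothetical stand-in for the document's `BoundingBox`.

```rust
#[derive(Clone, Copy)]
struct BBox { x: u32, y: u32, w: u32, h: u32 }

/// Geometric filter: plausible aspect ratio and enough ink density
/// inside the bounding box, using the thresholds from the pseudocode.
fn keep_component(bbox: BBox, area: u32) -> bool {
    if bbox.w == 0 || bbox.h == 0 {
        return false;
    }
    let aspect = bbox.w as f32 / bbox.h as f32;
    let density = area as f32 / (bbox.w * bbox.h) as f32;
    aspect > 0.1 && aspect < 10.0 && density > 0.3
}

/// Intersection-over-union, as used by the iou_threshold: 0.3 merge step.
fn iou(a: BBox, b: BBox) -> f32 {
    let ix = (a.x + a.w).min(b.x + b.w).saturating_sub(a.x.max(b.x));
    let iy = (a.y + a.h).min(b.y + b.h).saturating_sub(a.y.max(b.y));
    let inter = (ix * iy) as f32;
    let union = (a.w * a.h + b.w * b.h) as f32 - inter;
    if union == 0.0 { 0.0 } else { inter / union }
}
```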
---

## 2. OCR Engine Core

### 2.1 Main OCR Pipeline

```
ALGORITHM: RecognizeText
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: Result<RecognitionResult, Error>

DATA STRUCTURES:
    RecognitionResult {
        lines: Vec<TextLine>,
        confidence: f32,
        processing_time_ms: u64
    }

    TextLine {
        text: String,
        bbox: BoundingBox,
        words: Vec<Word>,
        confidence: f32
    }

    Word {
        text: String,
        bbox: BoundingBox,
        chars: Vec<Character>,
        confidence: f32
    }

    Character {
        char: char,
        bbox: BoundingBox,
        confidence: f32,
        alternatives: Vec<(char, f32)>
    }

BEGIN
    startTime ← GetCurrentTime()

    // Phase 1: Vision Transformer Encoding
    encodedFeatures ← EncodeImageFeatures(image, model)

    // Phase 2: Text Line Detection
    textLines ← DetectTextLines(encodedFeatures, image.regions)

    // Phase 3: Character Recognition
    recognizedLines ← []
    totalConfidence ← 0.0

    FOR EACH lineRegion IN textLines DO
        lineImage ← CropRegion(image, lineRegion.bbox)

        // Run sequence-to-sequence recognition
        words ← RecognizeLineSequence(lineImage, model, encodedFeatures)

        lineText ← words.map(|w| w.text).join(" ")
        lineConfidence ← ComputeLineConfidence(words)

        textLine ← TextLine {
            text: lineText,
            bbox: lineRegion.bbox,
            words: words,
            confidence: lineConfidence
        }
        recognizedLines.push(textLine)
        totalConfidence ← totalConfidence + lineConfidence
    END FOR

    avgConfidence ← totalConfidence / recognizedLines.len()
    processingTime ← GetCurrentTime() - startTime

    RETURN Ok(RecognitionResult {
        lines: recognizedLines,
        confidence: avgConfidence,
        processing_time_ms: processingTime
    })
END

COMPLEXITY ANALYSIS:
Time Complexity:
- Vision Transformer encoding: O(n² × d) where d = embedding dim
- Line detection: O(k × log k) where k = regions
- Character recognition per line: O(m × d²) where m = line length
- Total lines L: O(L × m × d²)
Overall: O(n² × d + L × m × d²)

Space Complexity:
- Feature maps: O(n × d)
- Attention maps: O(n² × h) where h = attention heads
- Output storage: O(L × m)
Total: O(n² × h + n × d)
```

### 2.2 Vision Transformer Encoding

```
ALGORITHM: EncodeImageFeatures
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: features (FeatureMap)

DATA STRUCTURES:
    FeatureMap {
        embeddings: Tensor,         // Shape: [seq_len, embed_dim]
        attention_weights: Tensor,  // Shape: [heads, seq_len, seq_len]
        positions: Vec<Point>
    }

    VisionTransformer {
        patch_size: u32,
        embed_dim: u32,
        num_heads: u32,
        num_layers: u32,
        weights: ModelWeights
    }

BEGIN
    // Phase 1: Patch Extraction
    patchSize ← model.patch_size
    numPatchesH ← image.height / patchSize
    numPatchesW ← image.width / patchSize

    patches ← []
    positions ← []
    FOR h IN 0..numPatchesH DO
        FOR w IN 0..numPatchesW DO
            y ← h * patchSize
            x ← w * patchSize
            patch ← ExtractPatch(image, x, y, patchSize)
            patches.push(patch)
            positions.push(Point{x, y})
        END FOR
    END FOR

    // Phase 2: Patch Embedding
    embeddings ← []
    FOR EACH patch IN patches DO
        // Linear projection of flattened patch
        flatPatch ← Flatten(patch)
        embedding ← MatMul(model.weights.patch_projection, flatPatch)
        embeddings.push(embedding)
    END FOR

    // Phase 3: Positional Encoding
    FOR i IN 0..embeddings.len() DO
        posEncoding ← ComputePositionalEncoding(i, model.embed_dim)
        embeddings[i] ← embeddings[i] + posEncoding
    END FOR

    // Add [CLS] token
    clsToken ← model.weights.cls_token
    embeddings.insert(0, clsToken)

    // Phase 4: Transformer Layers
    x ← Tensor::from(embeddings)
    allAttentionWeights ← []

    FOR layer IN 0..model.num_layers DO
        // Multi-head self-attention
        (x, attentionWeights) ← MultiHeadAttention(
            x,
            model.weights.layers[layer],
            num_heads: model.num_heads
        )
        allAttentionWeights.push(attentionWeights)

        // Feed-forward network
        x ← FeedForward(x, model.weights.layers[layer])

        // Layer normalization
        x ← LayerNorm(x, model.weights.layers[layer])
    END FOR

    RETURN FeatureMap {
        embeddings: x,
        attention_weights: Stack(allAttentionWeights),
        positions: positions
    }
END

COMPLEXITY ANALYSIS:
Time Complexity:
- Patch extraction: O(n) where n = pixels
- Patch embedding: O(p × d²) where p = patches, d = embed_dim
- Attention per layer: O(p² × d)
- Total layers L: O(L × p² × d)
Overall: O(L × p² × d)

Space Complexity:
- Embeddings: O(p × d)
- Attention matrices: O(L × h × p²) where h = heads
Total: O(L × h × p² + p × d)
```
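`ComputePositionalEncoding` is left abstract above. One common concrete choice is the fixed sinusoidal scheme from the original Transformer paper, sketched below; learned positional embeddings would be an equally valid choice, so treat this as one possible instantiation rather than the pipeline's definitive encoding.

```rust
/// Sinusoidal positional encoding: even dimensions use sine, odd use
/// cosine, with wavelengths forming a geometric progression over `dim`.
fn positional_encoding(pos: usize, dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            let exponent = (2 * (i / 2)) as f32 / dim as f32;
            let angle = pos as f32 / 10000f32.powf(exponent);
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}
```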
### 2.3 Character Recognition Sequence

```
ALGORITHM: RecognizeLineSequence
INPUT: lineImage (Image), model (VisionTransformer), features (FeatureMap)
OUTPUT: words (Vec<Word>)

DATA STRUCTURES:
    BeamSearchState {
        sequence: Vec<char>,
        score: f32,
        hidden_state: Tensor
    }

CONSTANTS:
    BEAM_WIDTH = 5
    MAX_SEQUENCE_LENGTH = 256
    END_TOKEN = '<END>'      // special end-of-sequence vocabulary token
    SPACE_TOKEN = '<SPACE>'  // special word-separator vocabulary token

BEGIN
    // Initialize beam search
    initialState ← BeamSearchState {
        sequence: [],
        score: 0.0,
        hidden_state: features.embeddings[0]  // CLS token
    }
    beams ← [initialState]

    // Beam search decoding
    FOR step IN 0..MAX_SEQUENCE_LENGTH DO
        candidates ← []

        FOR EACH beam IN beams DO
            IF beam.sequence.last() == END_TOKEN THEN
                candidates.push(beam)
                CONTINUE
            END IF

            // Get character probabilities from model
            (logits, newHiddenState) ← model.decode_step(
                beam.hidden_state,
                features.embeddings
            )
            probabilities ← Softmax(logits)

            // Get top-k characters
            topK ← GetTopK(probabilities, k: BEAM_WIDTH)

            FOR EACH (char, prob) IN topK DO
                newSequence ← beam.sequence.clone()
                newSequence.push(char)

                // Accumulate log probabilities for numerical stability
                newScore ← beam.score + LOG(prob)

                newBeam ← BeamSearchState {
                    sequence: newSequence,
                    score: newScore,
                    hidden_state: newHiddenState
                }
                candidates.push(newBeam)
            END FOR
        END FOR

        // Keep top BEAM_WIDTH candidates (highest score first)
        candidates.sort_by(|a, b| b.score.cmp(a.score))
        beams ← candidates[0..BEAM_WIDTH]

        // Check if all beams ended
        allEnded ← beams.all(|b| b.sequence.last() == END_TOKEN)
        IF allEnded THEN
            BREAK
        END IF
    END FOR

    // Take best beam
    bestBeam ← beams[0]

    // Split sequence into words
    words ← []
    currentWord ← []
    currentBBox ← BoundingBox::new()

    FOR i IN 0..bestBeam.sequence.len() DO
        char ← bestBeam.sequence[i]

        IF char == SPACE_TOKEN OR char == END_TOKEN THEN
            IF NOT currentWord.is_empty() THEN
                wordText ← currentWord.map(|c| c.char).join("")
                word ← Word {
                    text: wordText,
                    bbox: currentBBox,
                    chars: currentWord.clone(),
                    confidence: EXP(bestBeam.score / bestBeam.sequence.len())
                }
                words.push(word)
                currentWord.clear()
            END IF
        ELSE
            currentWord.push(Character {
                char: char,
                bbox: EstimateCharBBox(lineImage, i),
                confidence: EXP(bestBeam.score / (i + 1)),
                alternatives: []
            })
        END IF
    END FOR

    RETURN words
END

COMPLEXITY ANALYSIS:
Time Complexity:
- Beam search steps: O(T × B × V) where:
    T = max sequence length
    B = beam width
    V = vocabulary size
- Sorting per step: O(B × V × log(B × V))
Overall: O(T × B × V × log(B × V))

Space Complexity:
- Beam storage: O(B × T × d) where d = hidden dim
- Candidate buffer: O(B × V)
Total: O(B × T × d)
```
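One subtlety in the beam-search pseudocode: `b.score.cmp(a.score)` does not compile in real Rust, because `f32` only implements `PartialOrd`. A minimal sketch of the expand-and-prune step with the comparison handled explicitly (a NaN score falls back to `Equal` here):

```rust
use std::cmp::Ordering;

#[derive(Clone)]
struct Beam {
    seq: Vec<char>,
    score: f32, // accumulated log-probability
}

/// Extend one beam with the model's top-k (character, probability) pairs.
fn expand(beam: &Beam, top_k: &[(char, f32)]) -> Vec<Beam> {
    top_k
        .iter()
        .map(|&(c, p)| {
            let mut seq = beam.seq.clone();
            seq.push(c);
            // Log-probability accumulation, as in the pseudocode.
            Beam { seq, score: beam.score + p.ln() }
        })
        .collect()
}

/// Keep the BEAM_WIDTH highest-scoring candidates.
fn prune(mut candidates: Vec<Beam>, width: usize) -> Vec<Beam> {
    candidates.sort_by(|a, b| {
        b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal)
    });
    candidates.truncate(width);
    candidates
}
```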
---

## 3. Mathematical Expression Parser

### 3.1 Math Expression Recognition

```
ALGORITHM: RecognizeMathExpression
INPUT: region (TextRegion), image (ProcessedImage), model (MathModel)
OUTPUT: Result<MathExpression, Error>

DATA STRUCTURES:
    MathExpression {
        latex: String,
        tree: ExpressionTree,
        symbols: Vec<MathSymbol>,
        confidence: f32
    }

    ExpressionTree {
        root: Box<TreeNode>,
        height: u32
    }

    TreeNode {
        symbol: MathSymbol,
        relationship: SpatialRelation,
        children: Vec<Box<TreeNode>>
    }

    MathSymbol {
        symbol_type: SymbolType,  // Digit, Operator, Letter, Special
        value: String,
        bbox: BoundingBox,
        confidence: f32
    }

    SpatialRelation {
        relation_type: RelationType,  // Above, Below, Right, Superscript, Subscript
        distance: f32,
        alignment: f32
    }

BEGIN
    // Phase 1: Extract math region
    mathImage ← CropRegion(image, region.bbox)

    // Phase 2: Symbol Detection and Classification
    symbols ← DetectMathSymbols(mathImage, model)

    IF symbols.is_empty() THEN
        RETURN Error("No mathematical symbols detected")
    END IF

    // Phase 3: Spatial Relationship Analysis
    relationships ← AnalyzeSpatialRelationships(symbols)

    // Phase 4: Expression Tree Construction
    tree ← BuildExpressionTree(symbols, relationships)

    // Phase 5: LaTeX Generation
    latex ← GenerateLaTeX(tree)

    // Calculate overall confidence
    avgConfidence ← symbols.map(|s| s.confidence).average()

    RETURN Ok(MathExpression {
        latex: latex,
        tree: tree,
        symbols: symbols,
        confidence: avgConfidence
    })
END

COMPLEXITY ANALYSIS:
Time: O(n² × log n) where n = symbol count
Space: O(n × h) where h = tree height
```

### 3.2 Symbol Detection and Classification

```
ALGORITHM: DetectMathSymbols
INPUT: mathImage (Image), model (MathModel)
OUTPUT: symbols (Vec<MathSymbol>)

CONSTANTS:
    SYMBOL_MIN_SIZE = 8 pixels
    SYMBOL_MAX_SIZE = 128 pixels
    CONFIDENCE_THRESHOLD = 0.7

BEGIN
    // Phase 1: Connected Component Analysis
    binaryImage ← AdaptiveThreshold(mathImage, window: 11)
    components ← FindConnectedComponents(binaryImage)

    symbols ← []
    FOR EACH comp IN components DO
        // Filter by size
        width ← comp.bbox.width
        height ← comp.bbox.height

        IF width < SYMBOL_MIN_SIZE OR height < SYMBOL_MIN_SIZE THEN
            CONTINUE
        END IF

        IF width > SYMBOL_MAX_SIZE OR height > SYMBOL_MAX_SIZE THEN
            // Might be a compound symbol, try to split
            subComponents ← SplitComponent(comp)
            FOR EACH subComp IN subComponents DO
                ProcessSymbol(subComp, mathImage, model, symbols)
            END FOR
        ELSE
            ProcessSymbol(comp, mathImage, model, symbols)
        END IF
    END FOR

    // Sort symbols left-to-right, top-to-bottom
    symbols.sort_by(|a, b| {
        IF ABS(a.bbox.y - b.bbox.y) < 10 THEN
            a.bbox.x.cmp(b.bbox.x)
        ELSE
            a.bbox.y.cmp(b.bbox.y)
        END IF
    })

    RETURN symbols
END

SUBROUTINE: ProcessSymbol
INPUT: component (Component), image (Image), model (MathModel), symbols (Vec<MathSymbol>)
OUTPUT: None (modifies symbols)

BEGIN
    // Extract symbol image
    symbolImage ← CropRegion(image, component.bbox)

    // Normalize to model input size
    normalizedSymbol ← ResizeImage(symbolImage, 64, 64)

    // Classify symbol
    (symbolClass, confidence) ← model.classify_symbol(normalizedSymbol)

    IF confidence >= CONFIDENCE_THRESHOLD THEN
        symbol ← MathSymbol {
            symbol_type: DetermineSymbolType(symbolClass),
            value: symbolClass.to_string(),
            bbox: component.bbox,
            confidence: confidence
        }
        symbols.push(symbol)
    END IF
END

COMPLEXITY ANALYSIS:
Time: O(n × c) where n = components, c = classification time
Space: O(n) for symbol storage
```

### 3.3 Spatial Relationship Analysis

```
ALGORITHM: AnalyzeSpatialRelationships
INPUT: symbols (Vec<MathSymbol>)
OUTPUT: relationships (Vec<(usize, usize, SpatialRelation)>)

DATA STRUCTURES:
    RelationFeatures {
        horizontal_distance: f32,
        vertical_distance: f32,
        size_ratio: f32,
        vertical_alignment: f32,
        horizontal_alignment: f32
    }

CONSTANTS:
    SUPERSCRIPT_Y_THRESHOLD = 0.6  // Relative to symbol height
    SUBSCRIPT_Y_THRESHOLD = 0.4
    FRACTION_ALIGNMENT_THRESHOLD = 0.8

BEGIN
    relationships ← []

    // Build spatial index for efficient queries
    spatialIndex ← BuildQuadTree(symbols)

    FOR i IN 0..symbols.len() DO
        symbolA ← symbols[i]

        // Find nearby symbols
        nearbySymbols ← spatialIndex.query_radius(
            symbolA.bbox.center(),
            radius: symbolA.bbox.width * 3
        )

        FOR EACH (j, symbolB) IN nearbySymbols DO
            IF i >= j THEN
                CONTINUE  // Avoid duplicate pairs
            END IF

            // Extract relationship features
            features ← ExtractRelationFeatures(symbolA, symbolB)

            // Classify relationship
            relation ← ClassifyRelation(features, symbolA, symbolB)

            IF relation.is_some() THEN
                relationships.push((i, j, relation.unwrap()))
            END IF
        END FOR
    END FOR

    RETURN relationships
END

SUBROUTINE: ClassifyRelation
INPUT: features (RelationFeatures), symbolA (MathSymbol), symbolB (MathSymbol)
OUTPUT: Option<SpatialRelation>

BEGIN
    centerA ← symbolA.bbox.center()
    centerB ← symbolB.bbox.center()

    deltaX ← centerB.x - centerA.x
    deltaY ← centerB.y - centerA.y

    // Determine the dominant relationship

    // Superscript/Subscript detection
    IF deltaX > 0 AND deltaX < symbolA.bbox.width * 1.5 THEN
        relativeY ← deltaY / symbolA.bbox.height

        IF relativeY < -SUPERSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Superscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        ELSE IF relativeY > SUBSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Subscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        END IF
    END IF

    // Fraction detection (vertical alignment)
    IF features.vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD THEN
        IF deltaY < 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Above,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        ELSE IF deltaY > 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Below,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        END IF
    END IF

    // Horizontal sequence (default)
    IF deltaX > 0 AND ABS(deltaY) < symbolA.bbox.height * 0.3 THEN
        RETURN Some(SpatialRelation {
            relation_type: Right,
            distance: deltaX,
            alignment: features.horizontal_alignment
        })
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
Time Complexity:
- QuadTree construction: O(n log n)
- For each symbol, query nearby: O(log n + k) where k = nearby count
- Total: O(n × (log n + k))
Average case: O(n log n) if k is constant

Space Complexity:
- QuadTree: O(n)
- Relationships: O(n²) worst case, O(n) average
Total: O(n²) worst case
```
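The threshold logic in `ClassifyRelation` is self-contained enough to sketch directly in Rust. Deltas are between symbol centers in image coordinates (y grows downward), and the constants mirror the pseudocode; the function and parameter names are illustrative.

```rust
enum Relation { Superscript, Subscript, Above, Below, Right }

const SUPERSCRIPT_Y: f32 = 0.6;
const SUBSCRIPT_Y: f32 = 0.4;
const FRACTION_ALIGN: f32 = 0.8;

/// dx, dy: center of B minus center of A; w_a, h_a: size of A's bbox;
/// v_align: vertical-alignment feature in [0, 1].
fn classify_relation(dx: f32, dy: f32, w_a: f32, h_a: f32, v_align: f32) -> Option<Relation> {
    // Superscript / subscript: B sits just to the right, shifted up or down.
    if dx > 0.0 && dx < w_a * 1.5 {
        let rel_y = dy / h_a;
        if rel_y < -SUPERSCRIPT_Y {
            return Some(Relation::Superscript);
        }
        if rel_y > SUBSCRIPT_Y {
            return Some(Relation::Subscript);
        }
    }
    // Fraction parts: strong vertical alignment, strictly above or below.
    if v_align > FRACTION_ALIGN {
        if dy < 0.0 {
            return Some(Relation::Above);
        }
        if dy > 0.0 {
            return Some(Relation::Below);
        }
    }
    // Default: plain left-to-right sequence.
    if dx > 0.0 && dy.abs() < h_a * 0.3 {
        return Some(Relation::Right);
    }
    None
}
```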
### 3.4 Expression Tree Construction

```
ALGORITHM: BuildExpressionTree
INPUT: symbols (Vec<MathSymbol>), relationships (Vec<(usize, usize, SpatialRelation)>)
OUTPUT: tree (ExpressionTree)

DATA STRUCTURES:
    TreeBuilder {
        nodes: Vec<Box<TreeNode>>,
        parent_map: HashMap<usize, usize>,
        relation_graph: AdjacencyList
    }

BEGIN
    // Phase 1: Build relationship graph
    graph ← BuildRelationGraph(symbols, relationships)

    // Phase 2: Identify root candidates (symbols with no parents)
    rootCandidates ← []
    FOR i IN 0..symbols.len() DO
        IF NOT HasIncomingEdge(graph, i, excludeRight: true) THEN
            rootCandidates.push(i)
        END IF
    END FOR

    // Phase 3: Build tree from leftmost root
    rootCandidates.sort_by(|a, b| {
        symbols[*a].bbox.x.cmp(&symbols[*b].bbox.x)
    })
    rootIdx ← rootCandidates[0]

    // Phase 4: Recursive tree construction
    root ← BuildSubtree(rootIdx, symbols, graph, visited: Set::new())

    // Phase 5: Calculate tree height
    height ← CalculateHeight(root)

    RETURN ExpressionTree {
        root: root,
        height: height
    }
END

SUBROUTINE: BuildSubtree
INPUT: nodeIdx (usize), symbols (Vec<MathSymbol>), graph (AdjacencyList), visited (Set<usize>)
OUTPUT: node (Box<TreeNode>)

BEGIN
    IF visited.contains(nodeIdx) THEN
        RETURN Error("Cycle detected in expression tree")
    END IF
    visited.insert(nodeIdx)

    symbol ← symbols[nodeIdx]
    children ← []

    // Get all outgoing edges sorted by relationship priority
    edges ← graph.get_outgoing(nodeIdx)
    edges.sort_by(|a, b| {
        // Priority: Superscript > Subscript > Above > Below > Right
        GetRelationPriority(a.relation).cmp(GetRelationPriority(b.relation))
    })

    FOR EACH edge IN edges DO
        IF NOT visited.contains(edge.target) THEN
            childNode ← BuildSubtree(edge.target, symbols, graph, visited)
            childNode.relationship ← edge.relation
            children.push(childNode)
        END IF
    END FOR

    node ← TreeNode {
        symbol: symbol.clone(),
        relationship: SpatialRelation::default(),
        children: children
    }

    RETURN Box::new(node)
END

COMPLEXITY ANALYSIS:
Time: O(n × log n) for graph construction and tree building
Space: O(n × h) where h = average tree height
```
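The cycle guard in `BuildSubtree` is the load-bearing detail: spatial relations can form cycles, and the visited set is what keeps the recursion terminating. A minimal sketch over a plain adjacency list, with symbol payloads and relation labels omitted; note it silently skips already-visited nodes where the pseudocode reports an error.

```rust
use std::collections::HashSet;

struct Node {
    id: usize,
    children: Vec<Node>,
}

/// Depth-first subtree construction with a shared visited set.
/// Returns None for a node that was already consumed by another branch,
/// so cycles and shared children are broken rather than recursed into.
fn build_subtree(id: usize, adj: &[Vec<usize>], visited: &mut HashSet<usize>) -> Option<Node> {
    if !visited.insert(id) {
        return None; // already visited: cycle or shared node
    }
    let children = adj[id]
        .iter()
        .filter_map(|&child| build_subtree(child, adj, visited))
        .collect();
    Some(Node { id, children })
}
```

In the full algorithm each edge also carries a `SpatialRelation`, and outgoing edges are sorted by `GetRelationPriority` before recursion, so superscripts and fraction parts are attached before plain left-to-right neighbors.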
### 3.5 LaTeX Generation

```
ALGORITHM: GenerateLaTeX
INPUT: tree (ExpressionTree)
OUTPUT: latex (String)

BEGIN
    latex ← RecursiveGenerateLaTeX(tree.root)

    // Wrap in delimiters
    latex ← "\\(" + latex + "\\)"

    RETURN latex
END

SUBROUTINE: RecursiveGenerateLaTeX
INPUT: node (Box<TreeNode>)
OUTPUT: latex (String)

BEGIN
    symbol ← node.symbol
    baseLatex ← SymbolToLatex(symbol)

    // Group children by relationship type
    superscripts ← []
    subscripts ← []
    numerator ← None
    denominator ← None
    rightChildren ← []

    FOR EACH child IN node.children DO
        MATCH child.relationship.relation_type:
            Superscript → superscripts.push(child)
            Subscript → subscripts.push(child)
            Above → numerator ← Some(child)
            Below → denominator ← Some(child)
            Right → rightChildren.push(child)
        END MATCH
    END FOR

    // Build LaTeX string
    result ← baseLatex

    // Handle fractions (the base symbol is the fraction bar, so it is replaced)
    IF numerator.is_some() AND denominator.is_some() THEN
        numLatex ← RecursiveGenerateLaTeX(numerator.unwrap())
        denomLatex ← RecursiveGenerateLaTeX(denominator.unwrap())
        result ← "\\frac{" + numLatex + "}{" + denomLatex + "}"
    END IF

    // Handle superscripts
    IF NOT superscripts.is_empty() THEN
        superLatex ← superscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "^{" + superLatex + "}"
    END IF

    // Handle subscripts
    IF NOT subscripts.is_empty() THEN
        subLatex ← subscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "_{" + subLatex + "}"
    END IF

    // Handle right children (sequential)
    FOR EACH child IN rightChildren DO
        childLatex ← RecursiveGenerateLaTeX(child)

        // Add spacing for operators
        IF IsOperator(child.symbol) THEN
            result ← result + " " + childLatex + " "
        ELSE
            result ← result + childLatex
        END IF
    END FOR

    RETURN result
END

SUBROUTINE: SymbolToLatex
INPUT: symbol (MathSymbol)
OUTPUT: latex (String)

BEGIN
    MATCH symbol.symbol_type:
        Digit → RETURN symbol.value
        Letter → RETURN symbol.value
        Operator → RETURN OperatorToLatex(symbol.value)
        Special → RETURN SpecialToLatex(symbol.value)
    END MATCH

    RETURN symbol.value
END

COMPLEXITY ANALYSIS:
Time: O(n) where n = nodes in tree
Space: O(h) for recursion stack where h = tree height
```

---

## 4. Output Format Conversion

### 4.1 Multi-Format Generation

```
ALGORITHM: ConvertToFormats
INPUT: mathExpr (MathExpression), formats (Vec<OutputFormat>)
OUTPUT: Result<HashMap<OutputFormat, String>, Error>

DATA STRUCTURES:
    OutputFormat {
        MMD,          // Markdown with delimiters
        LaTeXStyled,  // Standalone LaTeX
        MathML,       // MathML XML
        HTML          // Rendered HTML
    }

BEGIN
    results ← HashMap::new()

    FOR EACH format IN formats DO
        output ← MATCH format:
            MMD → GenerateMMD(mathExpr)
            LaTeXStyled → GenerateStyledLaTeX(mathExpr)
            MathML → GenerateMathML(mathExpr.tree)
            HTML → GenerateHTML(mathExpr)
        END MATCH

        results.insert(format, output)
    END FOR

    RETURN Ok(results)
END

COMPLEXITY ANALYSIS:
Time: O(f × n) where f = format count, n = expression size
Space: O(f × n) for storing all formats
```

### 4.2 MMD Generation

```
ALGORITHM: GenerateMMD
INPUT: mathExpr (MathExpression)
OUTPUT: mmd (String)

CONSTANTS:
    INLINE_DELIMITER = "$"
    DISPLAY_DELIMITER = "$$"

BEGIN
    latex ← mathExpr.latex

    // Determine if the expression should be display or inline
    isDisplayMath ← ShouldBeDisplayMath(mathExpr)

    IF isDisplayMath THEN
        mmd ← DISPLAY_DELIMITER + "\n" + latex + "\n" + DISPLAY_DELIMITER
    ELSE
        mmd ← INLINE_DELIMITER + latex + INLINE_DELIMITER
    END IF

    RETURN mmd
END

SUBROUTINE: ShouldBeDisplayMath
INPUT: mathExpr (MathExpression)
OUTPUT: isDisplay (bool)

BEGIN
    // Display math if:
    // 1. Contains fractions or large operators
    // 2. Tree height > 2
    // 3. Width > threshold
    hasFractions ← mathExpr.latex.contains("\\frac")
    hasLargeOps ← mathExpr.latex.contains("\\sum") OR
                  mathExpr.latex.contains("\\int") OR
                  mathExpr.latex.contains("\\prod")
    isTall ← mathExpr.tree.height > 2
    isWide ← mathExpr.symbols.len() > 10

    RETURN hasFractions OR hasLargeOps OR isTall OR isWide
END

COMPLEXITY ANALYSIS:
Time: O(n) where n = LaTeX string length
Space: O(n) for output string
```

### 4.3 MathML Generation

```
ALGORITHM: GenerateMathML
INPUT: tree (ExpressionTree)
OUTPUT: mathml (String)

BEGIN
    xml ← XMLBuilder::new()
    xml.start_element("math", [("xmlns", "http://www.w3.org/1998/Math/MathML")])

    RecursiveGenerateMathML(tree.root, xml)

    xml.end_element("math")
    RETURN xml.to_string()
END

SUBROUTINE: RecursiveGenerateMathML
INPUT: node (Box<TreeNode>), xml (XMLBuilder)
OUTPUT: None (modifies xml)

BEGIN
    symbol ← node.symbol

    // Determine MathML element type
    MATCH symbol.symbol_type:
        Digit → xml.element("mn", symbol.value)    // <mn> for numbers
        Letter → xml.element("mi", symbol.value)   // <mi> for identifiers
        Operator → xml.element("mo", symbol.value)
        Special → HandleSpecialSymbol(symbol, xml)
    END MATCH

    // Handle relationships
    IF HasSuperscript(node) THEN
        xml.start_element("msup")
        RecursiveGenerateMathML(GetBase(node), xml)
        RecursiveGenerateMathML(GetSuperscript(node), xml)
        xml.end_element("msup")
    ELSE IF HasSubscript(node) THEN
        xml.start_element("msub")
        RecursiveGenerateMathML(GetBase(node), xml)
        RecursiveGenerateMathML(GetSubscript(node), xml)
        xml.end_element("msub")
    ELSE IF HasFraction(node) THEN
        xml.start_element("mfrac")
        RecursiveGenerateMathML(GetNumerator(node), xml)
        RecursiveGenerateMathML(GetDenominator(node), xml)
        xml.end_element("mfrac")
    END IF

    // Process right children
    FOR EACH child IN GetRightChildren(node) DO
        RecursiveGenerateMathML(child, xml)
    END FOR
END

COMPLEXITY ANALYSIS:
Time: O(n) tree traversal
Space: O(n) for XML string
```
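`GenerateMMD` and `ShouldBeDisplayMath` translate almost one-to-one into Rust. A sketch combining both, with the heuristics and thresholds taken directly from the pseudocode; `to_mmd` is an illustrative name.

```rust
/// Wrap LaTeX in Markdown math delimiters, choosing display mode by
/// the ShouldBeDisplayMath heuristics: fractions, large operators,
/// tall trees, or wide expressions force display math.
fn to_mmd(latex: &str, tree_height: u32, symbol_count: usize) -> String {
    let has_large_ops = ["\\sum", "\\int", "\\prod"]
        .iter()
        .any(|&op| latex.contains(op));
    let display = latex.contains("\\frac")
        || has_large_ops
        || tree_height > 2
        || symbol_count > 10;
    if display {
        format!("$$\n{latex}\n$$")
    } else {
        format!("${latex}$")
    }
}
```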
### 4.4 HTML Rendering

```
ALGORITHM: GenerateHTML
INPUT: mathExpr (MathExpression)
OUTPUT: html (String)

BEGIN
    // Embed the LaTeX in markup that a client-side library
    // (KaTeX or MathJax) can render; the LaTeX must be HTML-escaped
    latex ← mathExpr.latex
    html ← "<span class=\"math\" data-latex=\"" + EscapeHTML(latex) + "\">" +
           EscapeHTML(latex) + "</span>"

    // Add accessibility attributes
    html ← AddAriaLabels(html, mathExpr)

    RETURN html
END

COMPLEXITY ANALYSIS:
Time: O(n) string concatenation
Space: O(n) output size
```
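Because the HTML path interpolates LaTeX into markup, escaping is mandatory: raw `<`, `&`, or quotes in the LaTeX would otherwise corrupt the document or open an injection vector. A std-only sketch of the `EscapeHTML` helper assumed above:

```rust
/// Minimal HTML escaper covering the five characters that matter
/// inside element content and quoted attribute values.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}
```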
### 5.3 Result Aggregation

```
ALGORITHM: AggregateResults
INPUT: results (Vec<ProcessResult>)
OUTPUT: aggregated (AggregatedResults)

DATA STRUCTURES:
    AggregatedResults {
        total_count: usize,
        success_count: usize,
        failure_count: usize,
        total_processing_time_ms: u64,
        average_confidence: f32,
        results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>
    }

BEGIN
    totalCount ← results.len()
    successCount ← 0
    failureCount ← 0
    totalTime ← 0
    totalConfidence ← 0.0
    byStatus ← HashMap::new()

    FOR EACH result IN results DO
        totalTime ← totalTime + result.processing_time_ms

        MATCH result.status:
            Success →
                successCount ← successCount + 1
                totalConfidence ← totalConfidence + result.output.confidence
            Failure →
                failureCount ← failureCount + 1
        END MATCH

        // Group by status
        IF NOT byStatus.contains_key(result.status) THEN
            byStatus.insert(result.status, [])
        END IF
        byStatus.get_mut(result.status).push(result)
    END FOR

    avgConfidence ← IF successCount > 0 THEN
        totalConfidence / successCount
    ELSE
        0.0
    END IF

    RETURN AggregatedResults {
        total_count: totalCount,
        success_count: successCount,
        failure_count: failureCount,
        total_processing_time_ms: totalTime,
        average_confidence: avgConfidence,
        results_by_status: byStatus
    }
END

COMPLEXITY ANALYSIS:
Time: O(n) single pass
Space: O(n) for grouping
```

---

## 6. Caching and Memoization

### 6.1 Model Weight Caching

```
ALGORITHM: LoadModelWithCache
INPUT: modelPath (PathBuf), cacheConfig (CacheConfig)
OUTPUT: Result<Model, Error>

DATA STRUCTURES:
    CacheConfig {
        enabled: bool,
        cache_dir: PathBuf,
        max_cache_size_mb: u64,
        ttl_seconds: u64
    }

    CachedModel {
        weights: Vec<u8>,
        metadata: ModelMetadata,
        cached_at: Timestamp,
        access_count: u64
    }

BEGIN
    IF NOT cacheConfig.enabled THEN
        RETURN LoadModelDirect(modelPath)
    END IF

    // Generate cache key from model path and version
    cacheKey ← ComputeHash(modelPath, algorithm: SHA256)
    cachePath ← cacheConfig.cache_dir.join(cacheKey)

    // Check if a cached version exists and is valid
    IF cachePath.exists() THEN
        cachedModel ← DeserializeCachedModel(cachePath)

        // Check TTL
        age ← GetCurrentTime() - cachedModel.cached_at
        IF age < cacheConfig.ttl_seconds THEN
            // Cache hit
            cachedModel.access_count ← cachedModel.access_count + 1
            UpdateCacheMetadata(cachePath, cachedModel.metadata)

            model ← DeserializeModel(cachedModel.weights)
            RETURN Ok(model)
        ELSE
            // Cache expired
            DeleteFile(cachePath)
        END IF
    END IF

    // Cache miss - load from disk
    model ← LoadModelDirect(modelPath)
    IF model.is_error() THEN
        RETURN model
    END IF

    // Serialize and cache
    serializedWeights ← SerializeModel(model.unwrap())
    cachedModel ← CachedModel {
        weights: serializedWeights,
        metadata: model.metadata,
        cached_at: GetCurrentTime(),
        access_count: 1
    }

    // Check cache size limit
    EnsureCacheSize(cacheConfig)

    // Write to cache
    WriteCachedModel(cachePath, cachedModel)

    RETURN model
END

SUBROUTINE: EnsureCacheSize
INPUT: cacheConfig (CacheConfig)
OUTPUT: None

BEGIN
    currentSize ← GetDirectorySize(cacheConfig.cache_dir)
    maxSize ← cacheConfig.max_cache_size_mb * 1024 * 1024

    IF currentSize <= maxSize THEN
        RETURN
    END IF

    // Evict least recently used models
    cachedFiles ← ListFiles(cacheConfig.cache_dir)

    // Sort by last access time
    cachedFiles.sort_by(|a, b| {
        a.metadata.accessed_at.cmp(&b.metadata.accessed_at)
    })

    freedSpace ← 0
    targetFree ← currentSize - maxSize

    FOR EACH file IN cachedFiles DO
        IF freedSpace >= targetFree THEN
            BREAK
        END IF

        fileSize ← GetFileSize(file)
        DeleteFile(file)
        freedSpace ← freedSpace + fileSize
    END FOR
END

COMPLEXITY ANALYSIS:
Time Complexity:
- Cache hit: O(1) for lookup + O(m) for deserialization
- Cache miss: O(m) for model loading + O(m) for serialization
- Eviction: O(k log k) where k = cached files

Space Complexity:
- Cached model: O(m) where m = model size
- LRU tracking: O(k)
```
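`EnsureCacheSize` is an LRU sweep over the cache directory. A sketch of the eviction decision in isolation, operating on plain metadata records rather than the filesystem; `CacheEntry` and `plan_eviction` are illustrative names.

```rust
struct CacheEntry {
    path: String,
    size_bytes: u64,
    accessed_at: u64, // seconds since epoch, or any monotone timestamp
}

/// Pick the oldest entries whose combined size brings the cache back
/// under `max_bytes`. Returns the paths the caller should delete.
fn plan_eviction(mut entries: Vec<CacheEntry>, current_bytes: u64, max_bytes: u64) -> Vec<String> {
    let mut to_free = current_bytes.saturating_sub(max_bytes);
    entries.sort_by_key(|e| e.accessed_at); // least recently used first
    let mut victims = Vec::new();
    for entry in entries {
        if to_free == 0 {
            break;
        }
        to_free = to_free.saturating_sub(entry.size_bytes);
        victims.push(entry.path);
    }
    victims
}
```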
### 6.2 Result Caching with Ruvector

```
ALGORITHM: CacheResultWithVector
INPUT: imageHash (Hash), result (RecognitionResult), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    RuvectorStore {
        index: VectorIndex,
        metadata_db: HashMap<Hash, ResultMetadata>,
        config: VectorConfig
    }

    VectorConfig {
        embedding_dim: usize,
        similarity_threshold: f32,
        max_cache_entries: usize
    }

    ResultMetadata {
        result: RecognitionResult,
        image_hash: Hash,
        cached_at: Timestamp,
        hit_count: u64
    }

BEGIN
    // Phase 1: Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Phase 2: Check if already cached
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        // Update metadata
        metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Ok(())
    END IF

    // Phase 3: Generate embedding for the result
    embedding ← GenerateResultEmbedding(result)

    // Phase 4: Store in vector index
    vectorStore.index.insert(
        id: perceptualHash,
        vector: embedding
    )

    // Phase 5: Store metadata
    metadata ← ResultMetadata {
        result: result,
        image_hash: imageHash,
        cached_at: GetCurrentTime(),
        hit_count: 1
    }
    vectorStore.metadata_db.insert(perceptualHash, metadata)

    // Phase 6: Enforce cache size limit
    IF vectorStore.metadata_db.len() > vectorStore.config.max_cache_entries THEN
        EvictLeastUsedEntry(vectorStore)
    END IF

    RETURN Ok(())
END

ALGORITHM: QuerySimilarCachedResult
INPUT: imageHash (Hash), vectorStore (RuvectorStore)
OUTPUT: Option<RecognitionResult>

BEGIN
    // Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Exact match check
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        metadata ← vectorStore.metadata_db.get(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    // Generate query embedding
    queryEmbedding ← GenerateImageEmbedding(imageHash)

    // Search for similar results
    results ← vectorStore.index.search(
        query: queryEmbedding,
        k: 1,
        threshold: vectorStore.config.similarity_threshold
    )

    IF results.is_empty() THEN
        RETURN None
    END IF

    bestMatch ← results[0]
    IF bestMatch.similarity >= vectorStore.config.similarity_threshold THEN
        metadata ← vectorStore.metadata_db.get(bestMatch.id)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
Caching:
    Time: O(d) for embedding + O(log n) for index insertion
    Space: O(n × d) where n = cached entries, d = embedding dim

Querying:
    Time: O(d) for embedding + O(log n × d) for ANN search
    Space: O(k) for results where k = search parameter
```
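The similarity measure behind the vector index is not pinned down above; cosine similarity is a common default for embedding search, so the sketch below assumes it. The threshold gate matches the `similarity_threshold` check in `QuerySimilarCachedResult`.

```rust
/// Cosine similarity of two equal-length embeddings, in [-1, 1].
/// Returns 0.0 for a zero vector to avoid division by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A hit is then accepted only when `cosine_similarity(query, candidate) >= config.similarity_threshold`, exactly as in the pseudocode's final check.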
### 6.3 Incremental Update Cache

```
ALGORITHM: UpdateCacheIncremental
INPUT: updates (Vec<CacheUpdate>), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    CacheUpdate {
        operation: UpdateOp,  // Insert, Update, Delete
        image_hash: Hash,
        result: Option<RecognitionResult>
    }

    UpdateOp { Insert, Update, Delete }

BEGIN
    // Batch updates for efficiency
    insertBatch ← []
    updateBatch ← []
    deleteBatch ← []

    FOR EACH update IN updates DO
        MATCH update.operation:
            Insert → insertBatch.push(update)
            Update → updateBatch.push(update)
            Delete → deleteBatch.push(update)
        END MATCH
    END FOR

    // Process deletes first
    FOR EACH update IN deleteBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)
        vectorStore.index.remove(perceptualHash)
        vectorStore.metadata_db.remove(perceptualHash)
    END FOR

    // Process updates
    FOR EACH update IN updateBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)

        IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
            // Update existing entry
            embedding ← GenerateResultEmbedding(update.result.unwrap())
            vectorStore.index.update(perceptualHash, embedding)

            metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
            metadata.result ← update.result.unwrap()
            metadata.cached_at ← GetCurrentTime()
        END IF
    END FOR

    // Process inserts in batch
    IF NOT insertBatch.is_empty() THEN
        embeddings ← []
        metadataList ← []

        FOR EACH update IN insertBatch DO
            perceptualHash ← ComputePerceptualHash(update.image_hash)
            embedding ← GenerateResultEmbedding(update.result.unwrap())
            embeddings.push((perceptualHash, embedding))

            metadata ← ResultMetadata {
                result: update.result.unwrap(),
                image_hash: update.image_hash,
                cached_at: GetCurrentTime(),
                hit_count: 1
            }
            metadataList.push((perceptualHash, metadata))
        END FOR

        // Batch insert (id, vector) pairs into the vector index
        vectorStore.index.insert_batch(embeddings)

        // Batch insert metadata
        FOR EACH (hash, metadata) IN metadataList DO
            vectorStore.metadata_db.insert(hash, metadata)
        END FOR
    END IF

    RETURN Ok(())
END

COMPLEXITY ANALYSIS:
Time: O(b × d) where b = batch size, d = embedding dim
Space: O(b × d) for batch processing
```

---

## Summary: Complexity Analysis

### Overall System Complexity

| Component | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Image Preprocessing | O(n log n) | O(n) |
| Vision Transformer | O(L × p² × d) | O(L × h × p²) |
| Text Recognition | O(T × B × V × log(BV)) | O(B × T × d) |
| Math Symbol Detection | O(s × c) | O(s) |
| Spatial Analysis | O(s log s) | O(s²) worst case |
| Tree Construction | O(s log s) | O(s × h) |
| LaTeX Generation | O(s) | O(h) |
| Batch Processing | O(N × T / P) | O(N × R + P × M) |
| Vector Caching | O(d + log n) | O(n × d) |

**Legend:**
- n = pixel count
- L = transformer layers
- p = number of patches
- d = embedding dimension
- h = attention heads
- T = sequence length
- B = beam width
- V = vocabulary size
- s = symbol count
- N = batch size
- P = parallel workers
- R = result size
- M = model size

### Optimization Opportunities

1. **Preprocessing**: Use GPU-accelerated image operations
2. **Transformer**: Implement efficient attention (FlashAttention)
3. **Beam Search**: Prune low-probability beams early
4. **Spatial Analysis**: Use spatial indexing (QuadTree/R-tree)
5. **Caching**: Implement a tiered cache (L1: memory, L2: disk)
6. **Batch Processing**: Dynamic load balancing across workers
7. **Vector Search**: Use approximate nearest neighbor (HNSW)

---

## Design Patterns Used

1. **Pipeline Pattern**: Image preprocessing → OCR → Math parsing → Output
2. **Strategy Pattern**: Multiple output format generators
3. **Observer Pattern**: Progress tracking in batch processing
4. **Factory Pattern**: Model and cache instantiation
5. **Adapter Pattern**: Format conversion layers
6. **Repository Pattern**: Vector store abstraction
7. **Command Pattern**: Cache update operations
8. **Builder Pattern**: Expression tree and XML construction

---

*This pseudocode serves as the algorithmic blueprint for implementation in the Refinement phase.*