SPARC Pseudocode: Ruvector-Scipix OCR & Math Recognition Pipeline
Document Overview
This document provides algorithmic pseudocode for the core components of the ruvector-scipix OCR and mathematical expression recognition system. All algorithms use Rust-like syntax and include complexity analysis.
1. Image Preprocessing Pipeline
1.1 Main Preprocessing Algorithm
ALGORITHM: PreprocessImage
INPUT: imageBytes (Vec<u8>), config (PreprocessConfig)
OUTPUT: Result<ProcessedImage, Error>
CONSTANTS:
MAX_IMAGE_SIZE = 4096 pixels (per dimension)
MIN_DPI = 150
TARGET_DPI = 300
NOISE_THRESHOLD = 0.15
DATA STRUCTURES:
ProcessedImage {
data: Vec<u8>,
width: u32,
height: u32,
channels: u8,
metadata: ImageMetadata,
regions: Vec<TextRegion>
}
ImageMetadata {
dpi: u32,
rotation: f32,
quality_score: f32,
has_math: bool
}
TextRegion {
bbox: BoundingBox,
confidence: f32,
region_type: RegionType // Text, Math, Diagram
}
BEGIN
// Phase 1: Image Loading and Validation
rawImage ← DecodeImage(imageBytes)
IF rawImage.is_error() THEN
RETURN Error("Failed to decode image")
END IF
IF rawImage.width > MAX_IMAGE_SIZE OR rawImage.height > MAX_IMAGE_SIZE THEN
rawImage ← ResizeImage(rawImage, MAX_IMAGE_SIZE)
END IF
// Phase 2: Rotation Detection and Correction
rotationAngle ← DetectRotation(rawImage)
IF ABS(rotationAngle) > 0.5 THEN
rawImage ← RotateImage(rawImage, -rotationAngle)
END IF
// Phase 3: DPI Normalization
currentDPI ← EstimateDPI(rawImage)
IF currentDPI < MIN_DPI THEN
RETURN Error("Image resolution too low")
END IF
IF currentDPI != TARGET_DPI THEN
scaleFactor ← TARGET_DPI / currentDPI
rawImage ← ResizeImage(rawImage, scaleFactor)
END IF
// Phase 4: Noise Reduction
noiseLevel ← EstimateNoise(rawImage)
IF noiseLevel > NOISE_THRESHOLD THEN
rawImage ← ApplyBilateralFilter(rawImage, sigma: 2.0, radius: 3)
END IF
// Phase 5: Contrast Enhancement
enhancedImage ← AdaptiveHistogramEqualization(rawImage, clip_limit: 2.0)
// Phase 6: Text Region Detection
regions ← DetectTextRegions(enhancedImage)
// Phase 7: Quality Assessment
qualityScore ← AssessQuality(enhancedImage, regions)
metadata ← ImageMetadata {
dpi: TARGET_DPI,
rotation: rotationAngle,
quality_score: qualityScore,
has_math: ContainsMathRegions(regions)
}
RETURN Ok(ProcessedImage {
data: enhancedImage.to_bytes(),
width: enhancedImage.width,
height: enhancedImage.height,
channels: enhancedImage.channels,
metadata: metadata,
regions: regions
})
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Image decoding: O(n) where n = pixel count
- Rotation detection: O(n log n) using Hough transform
- Image rotation: O(n)
- DPI scaling: O(n)
- Bilateral filter: O(n × r²) where r = radius
- CLAHE: O(n)
- Region detection: O(n log n)
Total: O(n log n)
Space Complexity:
- Raw image buffer: O(n)
- Intermediate buffers: O(n)
- Region storage: O(k) where k = region count
Total: O(n)
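A minimal Rust sketch of the phase ordering above, assuming the `image` crate for decoding and resizing; `detect_rotation`, `estimate_noise`, and the filter helpers are hypothetical stubs standing in for the algorithms defined in sections 1.2 and 1.3.

```rust
use image::{imageops::FilterType, DynamicImage, GenericImageView};

const MAX_IMAGE_SIZE: u32 = 4096; // per dimension
const NOISE_THRESHOLD: f32 = 0.15;

fn preprocess(bytes: &[u8]) -> Result<DynamicImage, String> {
    // Phase 1: decode and clamp dimensions
    let mut img = image::load_from_memory(bytes).map_err(|e| e.to_string())?;
    if img.width() > MAX_IMAGE_SIZE || img.height() > MAX_IMAGE_SIZE {
        img = img.resize(MAX_IMAGE_SIZE, MAX_IMAGE_SIZE, FilterType::Lanczos3);
    }
    // Phase 2: deskew only when the detected angle is significant
    let angle = detect_rotation(&img);
    if angle.abs() > 0.5 {
        img = rotate(&img, -angle);
    }
    // Phase 4: denoise only noisy inputs, to avoid blurring clean scans
    if estimate_noise(&img) > NOISE_THRESHOLD {
        img = bilateral_filter(&img, 2.0, 3);
    }
    Ok(img)
}

// Hypothetical stubs for the algorithms defined in the following subsections.
fn detect_rotation(_img: &DynamicImage) -> f32 { 0.0 }
fn rotate(img: &DynamicImage, _degrees: f32) -> DynamicImage { img.clone() }
fn estimate_noise(_img: &DynamicImage) -> f32 { 0.0 }
fn bilateral_filter(img: &DynamicImage, _sigma: f32, _radius: u32) -> DynamicImage { img.clone() }
```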
1.2 Rotation Detection Algorithm
ALGORITHM: DetectRotation
INPUT: image (Image)
OUTPUT: angle (f32)
BEGIN
// Convert to grayscale if needed
grayImage ← ToGrayscale(image)
// Apply edge detection
edges ← CannyEdgeDetection(grayImage, low: 50, high: 150)
// Use Hough Line Transform to detect dominant lines
lines ← HoughLineTransform(edges, rho: 1.0, theta: PI/180, threshold: 100)
IF lines.is_empty() THEN
RETURN 0.0
END IF
// Cluster angles into dominant orientations
angles ← []
FOR EACH line IN lines DO
angle ← line.theta * 180 / PI
// Normalize to [-45, 45] range
WHILE angle > 45 DO
angle ← angle - 90
END WHILE
WHILE angle < -45 DO
angle ← angle + 90
END WHILE
angles.push(angle)
END FOR
// Use median for robustness
angles.sort()
medianAngle ← angles[angles.len() / 2]
RETURN medianAngle
END
COMPLEXITY ANALYSIS:
Time: O(n log n) for Hough transform
Space: O(n) for edge map
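The normalization and median step translates to a small self-contained Rust helper; using `partial_cmp` assumes the Hough transform never yields NaN angles.

```rust
/// Normalize Hough line angles (radians) into [-45, 45] degrees and take
/// the median, matching DetectRotation's robustness-by-median strategy.
fn median_skew_angle(line_thetas_rad: &[f32]) -> f32 {
    if line_thetas_rad.is_empty() {
        return 0.0; // no dominant lines: assume no rotation
    }
    let mut angles: Vec<f32> = line_thetas_rad
        .iter()
        .map(|t| {
            let mut a = t.to_degrees();
            while a > 45.0 { a -= 90.0; }
            while a < -45.0 { a += 90.0; }
            a
        })
        .collect();
    angles.sort_by(|a, b| a.partial_cmp(b).unwrap());
    angles[angles.len() / 2]
}
```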
1.3 Text Region Detection
ALGORITHM: DetectTextRegions
INPUT: image (Image)
OUTPUT: regions (Vec<TextRegion>)
DATA STRUCTURES:
Component {
pixels: Vec<Point>,
bbox: BoundingBox,
area: u32
}
BEGIN
// Binarize the image and extract connected components as candidate text regions
binaryImage ← AdaptiveThreshold(image, window: 15)
components ← FindConnectedComponents(binaryImage)
regions ← []
FOR EACH comp IN components DO
// Filter by geometric properties
aspectRatio ← comp.bbox.width / comp.bbox.height
density ← comp.area / (comp.bbox.width * comp.bbox.height)
IF aspectRatio > 0.1 AND aspectRatio < 10.0 AND density > 0.3 THEN
// Classify region type
features ← ExtractRegionFeatures(comp, image)
regionType ← ClassifyRegion(features)
region ← TextRegion {
bbox: comp.bbox,
confidence: features.confidence,
region_type: regionType
}
regions.push(region)
END IF
END FOR
// Merge nearby regions
mergedRegions ← MergeOverlappingRegions(regions, iou_threshold: 0.3)
RETURN mergedRegions
END
COMPLEXITY ANALYSIS:
Time: O(n × α(n)) where α is inverse Ackermann (connected components)
Space: O(k) where k = component count
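The O(n × α(n)) bound comes from union-find-based component labeling. A minimal sketch of the underlying disjoint-set structure, with path compression and union by size:

```rust
/// Disjoint-set (union-find) as used for connected-component labeling.
struct DisjointSet {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Find the set representative, compressing the path along the way.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }

    /// Merge two sets, attaching the smaller tree under the larger one.
    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb { return; }
        if self.size[ra] < self.size[rb] { std::mem::swap(&mut ra, &mut rb); }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
    }
}
```

During labeling, each foreground pixel is unioned with its already-visited neighbors; the bounding boxes and areas of the resulting sets become the Component records.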
2. OCR Engine Core
2.1 Main OCR Pipeline
ALGORITHM: RecognizeText
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: Result<RecognitionResult, Error>
DATA STRUCTURES:
RecognitionResult {
lines: Vec<TextLine>,
confidence: f32,
processing_time_ms: u64
}
TextLine {
text: String,
bbox: BoundingBox,
words: Vec<Word>,
confidence: f32
}
Word {
text: String,
bbox: BoundingBox,
chars: Vec<Character>,
confidence: f32
}
Character {
char: char,
bbox: BoundingBox,
confidence: f32,
alternatives: Vec<(char, f32)>
}
BEGIN
startTime ← GetCurrentTime()
// Phase 1: Vision Transformer Encoding
encodedFeatures ← EncodeImageFeatures(image, model)
// Phase 2: Text Line Detection
textLines ← DetectTextLines(encodedFeatures, image.regions)
// Phase 3: Character Recognition
recognizedLines ← []
totalConfidence ← 0.0
FOR EACH lineRegion IN textLines DO
lineImage ← CropRegion(image, lineRegion.bbox)
// Run sequence-to-sequence recognition
words ← RecognizeLineSequence(lineImage, model, encodedFeatures)
lineText ← words.map(|w| w.text).join(" ")
lineConfidence ← ComputeLineConfidence(words)
textLine ← TextLine {
text: lineText,
bbox: lineRegion.bbox,
words: words,
confidence: lineConfidence
}
recognizedLines.push(textLine)
totalConfidence ← totalConfidence + lineConfidence
END FOR
avgConfidence ← IF recognizedLines.is_empty() THEN
0.0
ELSE
totalConfidence / recognizedLines.len()
END IF
processingTime ← GetCurrentTime() - startTime
RETURN Ok(RecognitionResult {
lines: recognizedLines,
confidence: avgConfidence,
processing_time_ms: processingTime
})
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Vision Transformer encoding: O(n² × d) where d = embedding dim
- Line detection: O(k × log k) where k = regions
- Character recognition per line: O(m × d²) where m = line length
- Total lines L: O(L × m × d²)
Overall: O(n² × d + L × m × d²)
Space Complexity:
- Feature maps: O(n × d)
- Attention maps: O(n² × h) where h = attention heads
- Output storage: O(L × m)
Total: O(n² × h + n × d)
2.2 Vision Transformer Encoding
ALGORITHM: EncodeImageFeatures
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: features (FeatureMap)
DATA STRUCTURES:
FeatureMap {
embeddings: Tensor<f32>, // Shape: [seq_len, embed_dim]
attention_weights: Tensor<f32>, // Shape: [heads, seq_len, seq_len]
positions: Vec<Point>
}
VisionTransformer {
patch_size: u32,
embed_dim: u32,
num_heads: u32,
num_layers: u32,
weights: ModelWeights
}
BEGIN
// Phase 1: Patch Extraction
patchSize ← model.patch_size
numPatchesH ← image.height / patchSize
numPatchesW ← image.width / patchSize
patches ← []
positions ← []
FOR h IN 0..numPatchesH DO
FOR w IN 0..numPatchesW DO
y ← h * patchSize
x ← w * patchSize
patch ← ExtractPatch(image, x, y, patchSize)
patches.push(patch)
positions.push(Point{x, y})
END FOR
END FOR
// Phase 2: Patch Embedding
embeddings ← []
FOR EACH patch IN patches DO
// Linear projection of flattened patch
flatPatch ← Flatten(patch)
embedding ← MatMul(model.weights.patch_projection, flatPatch)
embeddings.push(embedding)
END FOR
// Phase 3: Positional Encoding
FOR i IN 0..embeddings.len() DO
posEncoding ← ComputePositionalEncoding(i, model.embed_dim)
embeddings[i] ← embeddings[i] + posEncoding
END FOR
// Add [CLS] token
clsToken ← model.weights.cls_token
embeddings.insert(0, clsToken)
// Phase 4: Transformer Layers
x ← Tensor::from(embeddings)
allAttentionWeights ← []
FOR layer IN 0..model.num_layers DO
// Multi-head self-attention with residual connection (pre-norm)
(attnOutput, attentionWeights) ← MultiHeadAttention(
LayerNorm(x, model.weights.layers[layer]),
model.weights.layers[layer],
num_heads: model.num_heads
)
x ← x + attnOutput
allAttentionWeights.push(attentionWeights)
// Feed-forward network with residual connection
x ← x + FeedForward(LayerNorm(x, model.weights.layers[layer]), model.weights.layers[layer])
END FOR
RETURN FeatureMap {
embeddings: x,
attention_weights: Stack(allAttentionWeights),
positions: positions
}
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Patch extraction: O(n) where n = pixels
- Patch embedding: O(p × d²) where p = patches, d = embed_dim
- Attention per layer: O(p² × d)
- Total layers L: O(L × p² × d)
Overall: O(L × p² × d)
Space Complexity:
- Embeddings: O(p × d)
- Attention matrices: O(L × h × p²) where h = heads
Total: O(L × h × p² + p × d)
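As one concrete (assumed) choice for ComputePositionalEncoding, the fixed sinusoidal encoding from the original Transformer paper is sketched below; many ViT variants instead learn position embeddings as part of the model weights.

```rust
/// Sinusoidal positional encoding: even dimensions use sin, odd use cos,
/// with wavelengths forming a geometric progression up to 10000.
fn positional_encoding(pos: usize, embed_dim: usize) -> Vec<f32> {
    (0..embed_dim)
        .map(|i| {
            let exponent = (2 * (i / 2)) as f32 / embed_dim as f32;
            let angle = pos as f32 / 10_000f32.powf(exponent);
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}
```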
2.3 Character Recognition Sequence
ALGORITHM: RecognizeLineSequence
INPUT: lineImage (Image), model (VisionTransformer), features (FeatureMap)
OUTPUT: words (Vec<Word>)
DATA STRUCTURES:
BeamSearchState {
sequence: Vec<char>,
score: f32,
hidden_state: Tensor<f32>
}
CONSTANTS:
BEAM_WIDTH = 5
MAX_SEQUENCE_LENGTH = 256
END_TOKEN = '<END>'
SPACE_TOKEN = '<SPACE>'
BEGIN
// Initialize beam search
initialState ← BeamSearchState {
sequence: [],
score: 0.0,
hidden_state: features.embeddings[0] // CLS token
}
beams ← [initialState]
// Beam search decoding
FOR step IN 0..MAX_SEQUENCE_LENGTH DO
candidates ← []
FOR EACH beam IN beams DO
IF beam.sequence.last() == END_TOKEN THEN
candidates.push(beam)
CONTINUE
END IF
// Get character probabilities from model
(logits, newHiddenState) ← model.decode_step(
beam.hidden_state,
features.embeddings
)
probabilities ← Softmax(logits)
// Get top-k characters
topK ← GetTopK(probabilities, k: BEAM_WIDTH)
FOR EACH (char, prob) IN topK DO
newSequence ← beam.sequence.clone()
newSequence.push(char)
// Log probability for numerical stability
newScore ← beam.score + LOG(prob)
newBeam ← BeamSearchState {
sequence: newSequence,
score: newScore,
hidden_state: newHiddenState
}
candidates.push(newBeam)
END FOR
END FOR
// Keep top BEAM_WIDTH candidates
candidates.sort_by(|a, b| b.score.cmp(a.score))
beams ← candidates[0..BEAM_WIDTH]
// Check if all beams ended
allEnded ← beams.all(|b| b.sequence.last() == END_TOKEN)
IF allEnded THEN
BREAK
END IF
END FOR
// Take best beam
bestBeam ← beams[0]
// Split sequence into words
words ← []
currentWord ← []
currentBBox ← BoundingBox::new()
FOR i IN 0..bestBeam.sequence.len() DO
char ← bestBeam.sequence[i]
IF char == SPACE_TOKEN OR char == END_TOKEN THEN
IF NOT currentWord.is_empty() THEN
wordText ← currentWord.map(|c| c.char).join("")
word ← Word {
text: wordText,
bbox: currentBBox,
chars: currentWord.clone(),
confidence: EXP(bestBeam.score / bestBeam.sequence.len())
}
words.push(word)
currentWord.clear()
END IF
ELSE
currentWord.push(Character {
char: char,
bbox: EstimateCharBBox(lineImage, i),
confidence: EXP(bestBeam.score / (i + 1)),
alternatives: []
})
END IF
END FOR
RETURN words
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Beam search steps: O(T × B × V) where:
T = max sequence length
B = beam width
V = vocabulary size
- Sorting per step: O(B × V × log(B × V))
Overall: O(T × B × V × log(B × V))
Space Complexity:
- Beam storage: O(B × T × d) where d = hidden dim
- Candidate buffer: O(B × V)
Total: O(B × T × d)
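A runnable Rust skeleton of the decoding loop, with the model call abstracted as a `decode_step` closure that returns (token, log-probability) candidates for a partial sequence:

```rust
const BEAM_WIDTH: usize = 5;
const MAX_LEN: usize = 256;
const END: char = '\u{0003}'; // placeholder end-of-sequence token

#[derive(Clone)]
struct Beam {
    seq: Vec<char>,
    score: f32, // sum of log-probabilities, for numerical stability
}

fn beam_search(decode_step: impl Fn(&[char]) -> Vec<(char, f32)>) -> Vec<char> {
    let mut beams = vec![Beam { seq: vec![], score: 0.0 }];
    for _ in 0..MAX_LEN {
        let mut candidates = Vec::new();
        for beam in &beams {
            if beam.seq.last() == Some(&END) {
                candidates.push(beam.clone()); // finished beams pass through
                continue;
            }
            for (ch, logp) in decode_step(&beam.seq) {
                let mut seq = beam.seq.clone();
                seq.push(ch);
                candidates.push(Beam { seq, score: beam.score + logp });
            }
        }
        // keep only the BEAM_WIDTH highest-scoring candidates
        candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
        candidates.truncate(BEAM_WIDTH);
        beams = candidates;
        if beams.iter().all(|b| b.seq.last() == Some(&END)) {
            break; // every surviving beam has emitted the end token
        }
    }
    beams.into_iter().next().map(|b| b.seq).unwrap_or_default()
}
```

Because scores are summed log-probabilities, longer sequences score lower; the pseudocode compensates by length-normalizing when converting scores to confidences.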
3. Mathematical Expression Parser
3.1 Math Expression Recognition
ALGORITHM: RecognizeMathExpression
INPUT: region (TextRegion), image (ProcessedImage), model (MathModel)
OUTPUT: Result<MathExpression, Error>
DATA STRUCTURES:
MathExpression {
latex: String,
tree: ExpressionTree,
symbols: Vec<MathSymbol>,
confidence: f32
}
ExpressionTree {
root: Box<TreeNode>,
height: u32
}
TreeNode {
symbol: MathSymbol,
relationship: SpatialRelation,
children: Vec<Box<TreeNode>>
}
MathSymbol {
symbol_type: SymbolType, // Digit, Operator, Letter, Special
value: String,
bbox: BoundingBox,
confidence: f32
}
SpatialRelation {
relation_type: RelationType, // Above, Below, Right, Superscript, Subscript
distance: f32,
alignment: f32
}
BEGIN
// Phase 1: Extract math region
mathImage ← CropRegion(image, region.bbox)
// Phase 2: Symbol Detection and Classification
symbols ← DetectMathSymbols(mathImage, model)
IF symbols.is_empty() THEN
RETURN Error("No mathematical symbols detected")
END IF
// Phase 3: Spatial Relationship Analysis
relationships ← AnalyzeSpatialRelationships(symbols)
// Phase 4: Expression Tree Construction
tree ← BuildExpressionTree(symbols, relationships)
// Phase 5: LaTeX Generation
latex ← GenerateLaTeX(tree)
// Calculate overall confidence
avgConfidence ← symbols.map(|s| s.confidence).average()
RETURN Ok(MathExpression {
latex: latex,
tree: tree,
symbols: symbols,
confidence: avgConfidence
})
END
COMPLEXITY ANALYSIS:
Time: O(n log n) average, O(n²) worst case, where n = symbol count (dominated by spatial relationship analysis)
Space: O(n × h) where h = tree height
3.2 Symbol Detection and Classification
ALGORITHM: DetectMathSymbols
INPUT: mathImage (Image), model (MathModel)
OUTPUT: symbols (Vec<MathSymbol>)
CONSTANTS:
SYMBOL_MIN_SIZE = 8 pixels
SYMBOL_MAX_SIZE = 128 pixels
CONFIDENCE_THRESHOLD = 0.7
BEGIN
// Phase 1: Connected Component Analysis
binaryImage ← AdaptiveThreshold(mathImage, window: 11)
components ← FindConnectedComponents(binaryImage)
symbols ← []
FOR EACH comp IN components DO
// Filter by size
width ← comp.bbox.width
height ← comp.bbox.height
IF width < SYMBOL_MIN_SIZE OR height < SYMBOL_MIN_SIZE THEN
CONTINUE
END IF
IF width > SYMBOL_MAX_SIZE OR height > SYMBOL_MAX_SIZE THEN
// Might be compound symbol, try to split
subComponents ← SplitComponent(comp)
FOR EACH subComp IN subComponents DO
ProcessSymbol(subComp, mathImage, model, symbols)
END FOR
ELSE
ProcessSymbol(comp, mathImage, model, symbols)
END IF
END FOR
// Sort symbols left-to-right, top-to-bottom
symbols.sort_by(|a, b| {
IF ABS(a.bbox.y - b.bbox.y) < 10 THEN
a.bbox.x.cmp(b.bbox.x)
ELSE
a.bbox.y.cmp(b.bbox.y)
END IF
})
RETURN symbols
END
SUBROUTINE: ProcessSymbol
INPUT: component (Component), image (Image), model (MathModel), symbols (Vec<MathSymbol>)
OUTPUT: None (modifies symbols)
BEGIN
// Extract symbol image
symbolImage ← CropRegion(image, component.bbox)
// Normalize to model input size
normalizedSymbol ← ResizeImage(symbolImage, 64, 64)
// Classify symbol
(symbolClass, confidence) ← model.classify_symbol(normalizedSymbol)
IF confidence >= CONFIDENCE_THRESHOLD THEN
symbol ← MathSymbol {
symbol_type: DetermineSymbolType(symbolClass),
value: symbolClass.to_string(),
bbox: component.bbox,
confidence: confidence
}
symbols.push(symbol)
END IF
END
COMPLEXITY ANALYSIS:
Time: O(n × c) where n = components, c = classification time
Space: O(n) for symbol storage
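The reading-order sort translates directly to Rust. A tolerance-based comparator is not strictly transitive, so a production implementation might first bucket symbols into lines; this sketch mirrors the pseudocode:

```rust
#[derive(Clone, Copy)]
struct BBox { x: i32, y: i32, w: i32, h: i32 }

/// Order symbols left-to-right within a line (vertical positions within
/// `y_tol` pixels of each other), and lines top-to-bottom otherwise.
fn sort_reading_order(boxes: &mut [BBox], y_tol: i32) {
    boxes.sort_by(|a, b| {
        if (a.y - b.y).abs() < y_tol {
            a.x.cmp(&b.x)
        } else {
            a.y.cmp(&b.y)
        }
    });
}
```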
3.3 Spatial Relationship Analysis
ALGORITHM: AnalyzeSpatialRelationships
INPUT: symbols (Vec<MathSymbol>)
OUTPUT: relationships (Vec<(usize, usize, SpatialRelation)>)
DATA STRUCTURES:
RelationFeatures {
horizontal_distance: f32,
vertical_distance: f32,
size_ratio: f32,
vertical_alignment: f32,
horizontal_alignment: f32
}
CONSTANTS:
SUPERSCRIPT_Y_THRESHOLD = 0.6 // Relative to symbol height
SUBSCRIPT_Y_THRESHOLD = 0.4
FRACTION_ALIGNMENT_THRESHOLD = 0.8
BEGIN
relationships ← []
// Build spatial index for efficient queries
spatialIndex ← BuildQuadTree(symbols)
FOR i IN 0..symbols.len() DO
symbolA ← symbols[i]
// Find nearby symbols
nearbySymbols ← spatialIndex.query_radius(
symbolA.bbox.center(),
radius: symbolA.bbox.width * 3
)
FOR EACH (j, symbolB) IN nearbySymbols DO
IF i >= j THEN
CONTINUE // Avoid duplicate pairs
END IF
// Extract relationship features
features ← ExtractRelationFeatures(symbolA, symbolB)
// Classify relationship
relation ← ClassifyRelation(features, symbolA, symbolB)
IF relation.is_some() THEN
relationships.push((i, j, relation.unwrap()))
END IF
END FOR
END FOR
RETURN relationships
END
SUBROUTINE: ClassifyRelation
INPUT: features (RelationFeatures), symbolA (MathSymbol), symbolB (MathSymbol)
OUTPUT: Option<SpatialRelation>
BEGIN
centerA ← symbolA.bbox.center()
centerB ← symbolB.bbox.center()
deltaX ← centerB.x - centerA.x
deltaY ← centerB.y - centerA.y
// Determine dominant relationship
// Superscript/Subscript detection
IF deltaX > 0 AND deltaX < symbolA.bbox.width * 1.5 THEN
relativeY ← deltaY / symbolA.bbox.height
IF relativeY < -SUPERSCRIPT_Y_THRESHOLD THEN
RETURN Some(SpatialRelation {
relation_type: Superscript,
distance: SQRT(deltaX² + deltaY²),
alignment: features.horizontal_alignment
})
ELSE IF relativeY > SUBSCRIPT_Y_THRESHOLD THEN
RETURN Some(SpatialRelation {
relation_type: Subscript,
distance: SQRT(deltaX² + deltaY²),
alignment: features.horizontal_alignment
})
END IF
END IF
// Fraction detection (vertical alignment)
IF features.vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD THEN
IF deltaY < 0 THEN
RETURN Some(SpatialRelation {
relation_type: Above,
distance: ABS(deltaY),
alignment: features.vertical_alignment
})
ELSE IF deltaY > 0 THEN
RETURN Some(SpatialRelation {
relation_type: Below,
distance: ABS(deltaY),
alignment: features.vertical_alignment
})
END IF
END IF
// Horizontal sequence (default)
IF deltaX > 0 AND ABS(deltaY) < symbolA.bbox.height * 0.3 THEN
RETURN Some(SpatialRelation {
relation_type: Right,
distance: deltaX,
alignment: features.horizontal_alignment
})
END IF
RETURN None
END
COMPLEXITY ANALYSIS:
Time Complexity:
- QuadTree construction: O(n log n)
- For each symbol, query nearby: O(log n + k) where k = nearby count
- Total: O(n × (log n + k))
Average case: O(n log n) if k is constant
Space Complexity:
- QuadTree: O(n)
- Relationships: O(n²) worst case, O(n) average
Total: O(n²) worst case
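A pure-Rust rendering of ClassifyRelation's threshold logic, with the geometry reduced to center deltas and the left symbol's dimensions:

```rust
#[derive(Debug, PartialEq)]
enum Relation { Superscript, Subscript, Above, Below, Right }

/// Classify the spatial relation from symbol A to symbol B given the
/// center deltas (dx, dy), A's width and height, and the vertical
/// alignment score; thresholds match the constants above.
fn classify_relation(dx: f32, dy: f32, a_w: f32, a_h: f32, v_align: f32) -> Option<Relation> {
    const SUPERSCRIPT_Y_THRESHOLD: f32 = 0.6;
    const SUBSCRIPT_Y_THRESHOLD: f32 = 0.4;
    const FRACTION_ALIGNMENT_THRESHOLD: f32 = 0.8;

    // Super/subscript: close to the right, displaced vertically
    if dx > 0.0 && dx < a_w * 1.5 {
        let rel_y = dy / a_h;
        if rel_y < -SUPERSCRIPT_Y_THRESHOLD { return Some(Relation::Superscript); }
        if rel_y > SUBSCRIPT_Y_THRESHOLD { return Some(Relation::Subscript); }
    }
    // Fraction parts: strong vertical alignment above or below
    if v_align > FRACTION_ALIGNMENT_THRESHOLD {
        if dy < 0.0 { return Some(Relation::Above); }
        if dy > 0.0 { return Some(Relation::Below); }
    }
    // Default: horizontal sequence at roughly the same baseline
    if dx > 0.0 && dy.abs() < a_h * 0.3 {
        return Some(Relation::Right);
    }
    None
}
```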
3.4 Expression Tree Construction
ALGORITHM: BuildExpressionTree
INPUT: symbols (Vec<MathSymbol>), relationships (Vec<(usize, usize, SpatialRelation)>)
OUTPUT: tree (ExpressionTree)
DATA STRUCTURES:
TreeBuilder {
nodes: Vec<Box<TreeNode>>,
parent_map: HashMap<usize, usize>,
relation_graph: AdjacencyList
}
BEGIN
// Phase 1: Build relationship graph
graph ← BuildRelationGraph(symbols, relationships)
// Phase 2: Identify root candidates (symbols with no parents)
rootCandidates ← []
FOR i IN 0..symbols.len() DO
IF NOT HasIncomingEdge(graph, i, excludeRight: true) THEN
rootCandidates.push(i)
END IF
END FOR
// Phase 3: Build tree from leftmost root
rootCandidates.sort_by(|a, b| {
symbols[*a].bbox.x.cmp(&symbols[*b].bbox.x)
})
rootIdx ← rootCandidates[0]
// Phase 4: Recursive tree construction
root ← BuildSubtree(rootIdx, symbols, graph, visited: Set::new())
// Phase 5: Calculate tree height
height ← CalculateHeight(root)
RETURN ExpressionTree {
root: root,
height: height
}
END
SUBROUTINE: BuildSubtree
INPUT: nodeIdx (usize), symbols (Vec<MathSymbol>), graph (AdjacencyList), visited (Set<usize>)
OUTPUT: node (Box<TreeNode>)
BEGIN
IF visited.contains(nodeIdx) THEN
RETURN Error("Cycle detected in expression tree")
END IF
visited.insert(nodeIdx)
symbol ← symbols[nodeIdx]
children ← []
// Get all outgoing edges sorted by relationship priority
edges ← graph.get_outgoing(nodeIdx)
edges.sort_by(|a, b| {
// Priority: Superscript > Subscript > Above > Below > Right
GetRelationPriority(a.relation).cmp(GetRelationPriority(b.relation))
})
FOR EACH edge IN edges DO
IF NOT visited.contains(edge.target) THEN
childNode ← BuildSubtree(edge.target, symbols, graph, visited)
childNode.relationship ← edge.relation
children.push(childNode)
END IF
END FOR
node ← TreeNode {
symbol: symbol.clone(),
relationship: SpatialRelation::default(),
children: children
}
RETURN Box::new(node)
END
COMPLEXITY ANALYSIS:
Time: O(n × log n) for graph construction and tree building
Space: O(n × h) where h = average tree height
3.5 LaTeX Generation
ALGORITHM: GenerateLaTeX
INPUT: tree (ExpressionTree)
OUTPUT: latex (String)
BEGIN
latex ← RecursiveGenerateLaTeX(tree.root)
// Wrap in delimiters
latex ← "\\(" + latex + "\\)"
RETURN latex
END
SUBROUTINE: RecursiveGenerateLaTeX
INPUT: node (Box<TreeNode>)
OUTPUT: latex (String)
BEGIN
symbol ← node.symbol
baseLatex ← SymbolToLatex(symbol)
// Group children by relationship type
superscripts ← []
subscripts ← []
numerator ← None
denominator ← None
rightChildren ← []
FOR EACH child IN node.children DO
MATCH child.relationship.relation_type:
Superscript → superscripts.push(child)
Subscript → subscripts.push(child)
Above → numerator ← Some(child)
Below → denominator ← Some(child)
Right → rightChildren.push(child)
END MATCH
END FOR
// Build LaTeX string
result ← baseLatex
// Handle fractions
IF numerator.is_some() AND denominator.is_some() THEN
numLatex ← RecursiveGenerateLaTeX(numerator.unwrap())
denomLatex ← RecursiveGenerateLaTeX(denominator.unwrap())
result ← "\\frac{" + numLatex + "}{" + denomLatex + "}"
END IF
// Handle superscripts
IF NOT superscripts.is_empty() THEN
superLatex ← superscripts
.map(|c| RecursiveGenerateLaTeX(c))
.join("")
result ← result + "^{" + superLatex + "}"
END IF
// Handle subscripts
IF NOT subscripts.is_empty() THEN
subLatex ← subscripts
.map(|c| RecursiveGenerateLaTeX(c))
.join("")
result ← result + "_{" + subLatex + "}"
END IF
// Handle right children (sequential)
FOR EACH child IN rightChildren DO
childLatex ← RecursiveGenerateLaTeX(child)
// Add spacing for operators
IF IsOperator(child.symbol) THEN
result ← result + " " + childLatex + " "
ELSE
result ← result + childLatex
END IF
END FOR
RETURN result
END
SUBROUTINE: SymbolToLatex
INPUT: symbol (MathSymbol)
OUTPUT: latex (String)
BEGIN
MATCH symbol.symbol_type:
Digit → RETURN symbol.value
Letter → RETURN symbol.value
Operator → RETURN OperatorToLatex(symbol.value)
Special → RETURN SpecialToLatex(symbol.value)
END MATCH
RETURN symbol.value
END
COMPLEXITY ANALYSIS:
Time: O(n) where n = nodes in tree
Space: O(h) for recursion stack where h = tree height
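A compact sketch of the recursive emission over a simplified node type; only the relation kinds used above are handled, and each node's `latex` field holds the output of SymbolToLatex:

```rust
enum Rel { Sup, Sub, Above, Below, Right }

struct Node {
    latex: String,       // SymbolToLatex output for this symbol
    rel: Rel,            // relation to the parent node
    children: Vec<Node>,
}

fn to_latex(node: &Node) -> String {
    let mut out = node.latex.clone();
    // A fraction replaces the base symbol (typically the fraction bar)
    let num = node.children.iter().find(|c| matches!(c.rel, Rel::Above));
    let den = node.children.iter().find(|c| matches!(c.rel, Rel::Below));
    if let (Some(n), Some(d)) = (num, den) {
        out = format!("\\frac{{{}}}{{{}}}", to_latex(n), to_latex(d));
    }
    for c in node.children.iter().filter(|c| matches!(c.rel, Rel::Sup)) {
        out = format!("{}^{{{}}}", out, to_latex(c));
    }
    for c in node.children.iter().filter(|c| matches!(c.rel, Rel::Sub)) {
        out = format!("{}_{{{}}}", out, to_latex(c));
    }
    for c in node.children.iter().filter(|c| matches!(c.rel, Rel::Right)) {
        out.push_str(&to_latex(c)); // operator spacing omitted for brevity
    }
    out
}
```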
4. Output Format Conversion
4.1 Multi-Format Generation
ALGORITHM: ConvertToFormats
INPUT: mathExpr (MathExpression), formats (Vec<OutputFormat>)
OUTPUT: Result<HashMap<OutputFormat, String>, Error>
DATA STRUCTURES:
OutputFormat {
MMD, // Markdown with delimiters
LaTeXStyled, // Standalone LaTeX
MathML, // MathML XML
HTML // Rendered HTML
}
BEGIN
results ← HashMap::new()
FOR EACH format IN formats DO
output ← MATCH format:
MMD → GenerateMMD(mathExpr)
LaTeXStyled → GenerateStyledLaTeX(mathExpr)
MathML → GenerateMathML(mathExpr.tree)
HTML → GenerateHTML(mathExpr)
END MATCH
results.insert(format, output)
END FOR
RETURN Ok(results)
END
COMPLEXITY ANALYSIS:
Time: O(f × n) where f = format count, n = expression size
Space: O(f × n) for storing all formats
4.2 MMD Generation
ALGORITHM: GenerateMMD
INPUT: mathExpr (MathExpression)
OUTPUT: mmd (String)
CONSTANTS:
INLINE_DELIMITER = "$"
DISPLAY_DELIMITER = "$$"
BEGIN
latex ← mathExpr.latex
// Determine if expression should be display or inline
isDisplayMath ← ShouldBeDisplayMath(mathExpr)
IF isDisplayMath THEN
mmd ← DISPLAY_DELIMITER + "\n" + latex + "\n" + DISPLAY_DELIMITER
ELSE
mmd ← INLINE_DELIMITER + latex + INLINE_DELIMITER
END IF
RETURN mmd
END
SUBROUTINE: ShouldBeDisplayMath
INPUT: mathExpr (MathExpression)
OUTPUT: isDisplay (bool)
BEGIN
// Display math if:
// 1. Contains fractions or large operators
// 2. Tree height > 2
// 3. Symbol count exceeds a width threshold
hasFractions ← mathExpr.latex.contains("\\frac")
hasLargeOps ← mathExpr.latex.contains("\\sum") OR
mathExpr.latex.contains("\\int") OR
mathExpr.latex.contains("\\prod")
isTall ← mathExpr.tree.height > 2
isWide ← mathExpr.symbols.len() > 10
RETURN hasFractions OR hasLargeOps OR isTall OR isWide
END
COMPLEXITY ANALYSIS:
Time: O(n) where n = LaTeX string length
Space: O(n) for output string
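The heuristic translates almost directly to Rust:

```rust
/// Promote to display math when the expression is structurally tall or wide.
fn should_be_display(latex: &str, tree_height: u32, symbol_count: usize) -> bool {
    let has_fraction = latex.contains("\\frac");
    let has_large_op = ["\\sum", "\\int", "\\prod"].iter().any(|op| latex.contains(op));
    has_fraction || has_large_op || tree_height > 2 || symbol_count > 10
}
```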
4.3 MathML Generation
ALGORITHM: GenerateMathML
INPUT: tree (ExpressionTree)
OUTPUT: mathml (String)
BEGIN
xml ← XMLBuilder::new()
xml.start_element("math", [("xmlns", "http://www.w3.org/1998/Math/MathML")])
RecursiveGenerateMathML(tree.root, xml)
xml.end_element("math")
RETURN xml.to_string()
END
SUBROUTINE: RecursiveGenerateMathML
INPUT: node (Box<TreeNode>), xml (XMLBuilder)
OUTPUT: None (modifies xml)
BEGIN
symbol ← node.symbol
// Determine MathML element type
MATCH symbol.symbol_type:
Digit →
xml.element("mn", symbol.value) // numbers use <mn>
Letter →
xml.element("mi", symbol.value) // identifiers use <mi>
Operator →
xml.element("mo", symbol.value)
Special →
HandleSpecialSymbol(symbol, xml)
END MATCH
// Handle relationships
IF HasSuperscript(node) THEN
xml.start_element("msup")
RecursiveGenerateMathML(GetBase(node), xml)
RecursiveGenerateMathML(GetSuperscript(node), xml)
xml.end_element("msup")
ELSE IF HasSubscript(node) THEN
xml.start_element("msub")
RecursiveGenerateMathML(GetBase(node), xml)
RecursiveGenerateMathML(GetSubscript(node), xml)
xml.end_element("msub")
ELSE IF HasFraction(node) THEN
xml.start_element("mfrac")
RecursiveGenerateMathML(GetNumerator(node), xml)
RecursiveGenerateMathML(GetDenominator(node), xml)
xml.end_element("mfrac")
END IF
// Process right children
FOR EACH child IN GetRightChildren(node) DO
RecursiveGenerateMathML(child, xml)
END FOR
END
COMPLEXITY ANALYSIS:
Time: O(n) tree traversal
Space: O(n) for XML string
4.4 HTML Rendering
ALGORITHM: GenerateHTML
INPUT: mathExpr (MathExpression)
OUTPUT: html (String)
BEGIN
// Use KaTeX or MathJax for rendering
latex ← mathExpr.latex
html ← """
<div class="math-expression" data-confidence="{mathExpr.confidence}">
<script type="math/tex">
{latex}
</script>
</div>
"""
// Add accessibility attributes
html ← AddAriaLabels(html, mathExpr)
RETURN html
END
COMPLEXITY ANALYSIS:
Time: O(n) string concatenation
Space: O(n) output size
5. Batch Processing
5.1 Parallel Batch Processing
ALGORITHM: ProcessBatch
INPUT: inputs (Vec<InputSource>), config (ProcessConfig)
OUTPUT: Result<Vec<ProcessResult>, Error>
DATA STRUCTURES:
InputSource {
source_type: SourceType, // Image, PDF, Directory
path: PathBuf,
page_range: Option<Range<u32>>
}
ProcessResult {
input: InputSource,
output: RecognitionResult,
processing_time_ms: u64,
status: ResultStatus
}
ProcessConfig {
max_parallel: usize,
timeout_ms: u64,
cache_enabled: bool
}
BEGIN
// Phase 1: Expand inputs (handle PDFs and directories)
expandedInputs ← []
FOR EACH input IN inputs DO
MATCH input.source_type:
PDF →
pages ← ExtractPDFPages(input.path, input.page_range)
expandedInputs.extend(pages)
Directory →
images ← FindImagesInDirectory(input.path)
expandedInputs.extend(images)
Image →
expandedInputs.push(input)
END MATCH
END FOR
// Phase 2: Create processing queue
queue ← WorkQueue::new(expandedInputs)
results ← ConcurrentVec::new()
// Phase 3: Parallel processing
numWorkers ← MIN(config.max_parallel, CPU_COUNT)
PARALLEL FOR worker IN 0..numWorkers DO
LOOP
input ← queue.pop()
IF input.is_none() THEN
BREAK
END IF
startTime ← GetCurrentTime()
// Process single input
result ← ProcessSingleInput(
input.unwrap(),
config,
timeout: config.timeout_ms
)
processingTime ← GetCurrentTime() - startTime
processResult ← ProcessResult {
input: input.unwrap(),
output: result,
processing_time_ms: processingTime,
status: DetermineStatus(result)
}
results.push(processResult)
END LOOP
END PARALLEL
// Phase 4: Aggregate and return
finalResults ← results.into_vec()
finalResults.sort_by(|a, b| a.input.path.cmp(&b.input.path))
RETURN Ok(finalResults)
END
COMPLEXITY ANALYSIS:
Time Complexity:
- With P workers, N inputs, T time per input
- Parallel: O(N × T / P)
- Sequential equivalent: O(N × T)
- Speedup: ~P (linear with worker count)
Space Complexity:
- Queue: O(N)
- Results: O(N × R) where R = result size
- Worker memory: O(P × M) where M = model size
Total: O(N × R + P × M)
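A std-only sketch of the worker pool: a shared queue drained by up to `max_parallel` threads, with results sorted afterward for deterministic output. `process_one` is a hypothetical stand-in for ProcessSingleInput.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

fn process_batch(inputs: Vec<String>, max_parallel: usize) -> Vec<(String, String)> {
    let queue = Arc::new(Mutex::new(VecDeque::from(inputs)));
    let results = Arc::new(Mutex::new(Vec::new()));
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let workers = max_parallel.min(cpus);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Pop under the lock, then release it before the slow work
                let input = queue.lock().unwrap().pop_front();
                match input {
                    Some(path) => {
                        let output = process_one(&path);
                        results.lock().unwrap().push((path, output));
                    }
                    None => break, // queue drained
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort_by(|a, b| a.0.cmp(&b.0)); // deterministic order, as in Phase 4
    out
}

// Hypothetical stand-in for running the full pipeline on one input.
fn process_one(path: &str) -> String {
    format!("processed {}", path)
}
```

In production code a bounded channel (or a work-stealing pool such as rayon's) would replace the hand-rolled queue, but the locking pattern above matches the pseudocode's WorkQueue semantics.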
5.2 PDF Page Extraction
ALGORITHM: ExtractPDFPages
INPUT: pdfPath (PathBuf), pageRange (Option<Range<u32>>)
OUTPUT: pages (Vec<InputSource>)
BEGIN
// Load PDF document
document ← PDFDocument::load(pdfPath)
IF document.is_error() THEN
RETURN Error("Failed to load PDF")
END IF
// Determine page range
totalPages ← document.page_count()
range ← pageRange.unwrap_or(0..totalPages)
pages ← []
FOR pageNum IN range DO
IF pageNum >= totalPages THEN
BREAK
END IF
// Render page to image
page ← document.get_page(pageNum)
// Render at high DPI for quality
image ← page.render(dpi: 300)
// Create temporary file
tempPath ← CreateTempFile(format!("page_{}.png", pageNum))
image.save(tempPath)
inputSource ← InputSource {
source_type: Image,
path: tempPath,
page_range: None
}
pages.push(inputSource)
END FOR
RETURN pages
END
COMPLEXITY ANALYSIS:
Time: O(P × R) where P = pages, R = render time per page
Space: O(P × S) where S = image size
5.3 Result Aggregation
ALGORITHM: AggregateResults
INPUT: results (Vec<ProcessResult>)
OUTPUT: aggregated (AggregatedResults)
DATA STRUCTURES:
AggregatedResults {
total_count: usize,
success_count: usize,
failure_count: usize,
total_processing_time_ms: u64,
average_confidence: f32,
results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>
}
BEGIN
totalCount ← results.len()
successCount ← 0
failureCount ← 0
totalTime ← 0
totalConfidence ← 0.0
byStatus ← HashMap::new()
FOR EACH result IN results DO
totalTime ← totalTime + result.processing_time_ms
MATCH result.status:
Success →
successCount ← successCount + 1
totalConfidence ← totalConfidence + result.output.confidence
Failure →
failureCount ← failureCount + 1
END MATCH
// Group by status
IF NOT byStatus.contains_key(result.status) THEN
byStatus.insert(result.status, [])
END IF
byStatus.get_mut(result.status).push(result)
END FOR
avgConfidence ← IF successCount > 0 THEN
totalConfidence / successCount
ELSE
0.0
END IF
RETURN AggregatedResults {
total_count: totalCount,
success_count: successCount,
failure_count: failureCount,
total_processing_time_ms: totalTime,
average_confidence: avgConfidence,
results_by_status: byStatus
}
END
COMPLEXITY ANALYSIS:
Time: O(n) single pass
Space: O(n) for grouping
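The aggregation is a single pass over the results; a minimal sketch with each result reduced to a (success, confidence) pair:

```rust
/// Single-pass aggregation mirroring AggregateResults. Returns
/// (total, successes, failures, average confidence over successes).
fn aggregate(results: &[(bool, f32)]) -> (usize, usize, usize, f32) {
    let mut success = 0usize;
    let mut conf_sum = 0.0f32;
    for &(ok, conf) in results {
        if ok {
            success += 1;
            conf_sum += conf; // only successful results carry a confidence
        }
    }
    let failure = results.len() - success;
    let avg = if success > 0 { conf_sum / success as f32 } else { 0.0 };
    (results.len(), success, failure, avg)
}
```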
6. Caching and Memoization
6.1 Model Weight Caching
ALGORITHM: LoadModelWithCache
INPUT: modelPath (PathBuf), cacheConfig (CacheConfig)
OUTPUT: Result<Model, Error>
DATA STRUCTURES:
CacheConfig {
enabled: bool,
cache_dir: PathBuf,
max_cache_size_mb: u64,
ttl_seconds: u64
}
CachedModel {
weights: Vec<u8>,
metadata: ModelMetadata,
cached_at: Timestamp,
access_count: u64
}
BEGIN
IF NOT cacheConfig.enabled THEN
RETURN LoadModelDirect(modelPath)
END IF
// Generate cache key from the model path and file modification time,
// so a changed model file invalidates the stale cache entry
cacheKey ← ComputeHash(modelPath + FileModifiedTime(modelPath), algorithm: SHA256)
cachePath ← cacheConfig.cache_dir.join(cacheKey)
// Check if cached version exists and is valid
IF cachePath.exists() THEN
cachedModel ← DeserializeCachedModel(cachePath)
// Check TTL
age ← GetCurrentTime() - cachedModel.cached_at
IF age < cacheConfig.ttl_seconds THEN
// Cache hit
cachedModel.access_count ← cachedModel.access_count + 1
UpdateCacheMetadata(cachePath, cachedModel.metadata)
model ← DeserializeModel(cachedModel.weights)
RETURN Ok(model)
ELSE
// Cache expired
DeleteFile(cachePath)
END IF
END IF
// Cache miss - load from disk
model ← LoadModelDirect(modelPath)
IF model.is_error() THEN
RETURN model
END IF
// Serialize and cache
serializedWeights ← SerializeModel(model.unwrap())
cachedModel ← CachedModel {
weights: serializedWeights,
metadata: model.metadata,
cached_at: GetCurrentTime(),
access_count: 1
}
// Check cache size limit
EnsureCacheSize(cacheConfig)
// Write to cache
WriteCachedModel(cachePath, cachedModel)
RETURN model
END
SUBROUTINE: EnsureCacheSize
INPUT: cacheConfig (CacheConfig)
OUTPUT: None
BEGIN
currentSize ← GetDirectorySize(cacheConfig.cache_dir)
maxSize ← cacheConfig.max_cache_size_mb * 1024 * 1024
IF currentSize <= maxSize THEN
RETURN
END IF
// Evict least recently used models
cachedFiles ← ListFiles(cacheConfig.cache_dir)
// Sort by last access time
cachedFiles.sort_by(|a, b| {
a.metadata.accessed_at.cmp(&b.metadata.accessed_at)
})
freedSpace ← 0
targetFree ← currentSize - maxSize
FOR EACH file IN cachedFiles DO
IF freedSpace >= targetFree THEN
BREAK
END IF
fileSize ← GetFileSize(file)
DeleteFile(file)
freedSpace ← freedSpace + fileSize
END FOR
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Cache hit: O(1) for lookup + O(m) for deserialization
- Cache miss: O(m) for model loading + O(m) for serialization
- Eviction: O(k log k) where k = cached files
Space Complexity:
- Cached model: O(m) where m = model size
- LRU tracking: O(k)
6.2 Result Caching with Ruvector
ALGORITHM: CacheResultWithVector
INPUT: image (Image), result (RecognitionResult), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>
DATA STRUCTURES:
RuvectorStore {
index: VectorIndex,
metadata_db: HashMap<Hash, ResultMetadata>,
config: VectorConfig
}
VectorConfig {
embedding_dim: usize,
similarity_threshold: f32,
max_cache_entries: usize
}
ResultMetadata {
result: RecognitionResult,
image_hash: Hash,
cached_at: Timestamp,
hit_count: u64
}
BEGIN
// Phase 1: Hash the image content and derive a perceptual hash from the image itself
imageHash ← ComputeContentHash(image)
perceptualHash ← ComputePerceptualHash(image)
// Phase 2: Check if already cached
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
// Update metadata
metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
metadata.hit_count ← metadata.hit_count + 1
RETURN Ok(())
END IF
// Phase 3: Generate embedding for the result
embedding ← GenerateResultEmbedding(result)
// Phase 4: Store in vector index
vectorStore.index.insert(
id: perceptualHash,
vector: embedding
)
// Phase 5: Store metadata
metadata ← ResultMetadata {
result: result,
image_hash: imageHash,
cached_at: GetCurrentTime(),
hit_count: 1
}
vectorStore.metadata_db.insert(perceptualHash, metadata)
// Phase 6: Enforce cache size limit
IF vectorStore.metadata_db.len() > vectorStore.config.max_cache_entries THEN
EvictLeastUsedEntry(vectorStore)
END IF
RETURN Ok(())
END
ALGORITHM: QuerySimilarCachedResult
INPUT: image (Image), vectorStore (RuvectorStore)
OUTPUT: Option<RecognitionResult>
BEGIN
// Generate perceptual hash from the image
perceptualHash ← ComputePerceptualHash(image)
// Exact match check
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
metadata ← vectorStore.metadata_db.get(perceptualHash)
metadata.hit_count ← metadata.hit_count + 1
RETURN Some(metadata.result.clone())
END IF
// Generate query embedding from the image
queryEmbedding ← GenerateImageEmbedding(image)
// Search for similar results
results ← vectorStore.index.search(
query: queryEmbedding,
k: 1,
threshold: vectorStore.config.similarity_threshold
)
IF results.is_empty() THEN
RETURN None
END IF
bestMatch ← results[0]
IF bestMatch.similarity >= vectorStore.config.similarity_threshold THEN
metadata ← vectorStore.metadata_db.get(bestMatch.id)
metadata.hit_count ← metadata.hit_count + 1
RETURN Some(metadata.result.clone())
END IF
RETURN None
END
COMPLEXITY ANALYSIS:
Caching:
Time: O(d) for embedding + O(log n) for index insertion
Space: O(n × d) where n = cached entries, d = embedding dim
Querying:
Time: O(d) for embedding + O(log n × d) for ANN search
Space: O(k) for results where k = search parameter
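The quoted O(log n × d) search time assumes an approximate index such as HNSW. For illustration, a brute-force cosine-similarity lookup (O(n × d)) that a real RuvectorStore would replace with its ANN index:

```rust
/// Cosine similarity between two embeddings of equal dimension.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Index of the most similar cached embedding above `threshold`, if any.
fn search(index: &[Vec<f32>], query: &[f32], threshold: f32) -> Option<usize> {
    index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .filter(|&(_, s)| s >= threshold)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
}
```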
6.3 Incremental Update Cache
ALGORITHM: UpdateCacheIncremental
INPUT: updates (Vec<CacheUpdate>), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>
DATA STRUCTURES:
CacheUpdate {
operation: UpdateOp, // Insert, Update, Delete
image: Image, // Source image, needed to recompute the perceptual hash
image_hash: Hash,
result: Option<RecognitionResult>
}
UpdateOp {
Insert,
Update,
Delete
}
BEGIN
// Batch updates for efficiency
insertBatch ← []
updateBatch ← []
deleteBatch ← []
FOR EACH update IN updates DO
MATCH update.operation:
Insert →
insertBatch.push(update)
Update →
updateBatch.push(update)
Delete →
deleteBatch.push(update)
END MATCH
END FOR
// Process deletes first
FOR EACH update IN deleteBatch DO
perceptualHash ← ComputePerceptualHash(update.image)
vectorStore.index.remove(perceptualHash)
vectorStore.metadata_db.remove(perceptualHash)
END FOR
// Process updates
FOR EACH update IN updateBatch DO
perceptualHash ← ComputePerceptualHash(update.image)
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
// Update existing entry
embedding ← GenerateResultEmbedding(update.result.unwrap())
vectorStore.index.update(perceptualHash, embedding)
metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
metadata.result ← update.result.unwrap()
metadata.cached_at ← GetCurrentTime()
END IF
END FOR
// Process inserts in batch
IF NOT insertBatch.is_empty() THEN
embeddings ← []
metadataList ← []
FOR EACH update IN insertBatch DO
perceptualHash ← ComputePerceptualHash(update.image)
embedding ← GenerateResultEmbedding(update.result.unwrap())
embeddings.push((perceptualHash, embedding))
metadata ← ResultMetadata {
result: update.result.unwrap(),
image_hash: update.image_hash,
cached_at: GetCurrentTime(),
hit_count: 1
}
metadataList.push((perceptualHash, metadata))
END FOR
// Batch insert (id, vector) pairs into the vector index
vectorStore.index.insert_batch(embeddings)
// Batch insert metadata
FOR EACH (hash, metadata) IN metadataList DO
vectorStore.metadata_db.insert(hash, metadata)
END FOR
END IF
RETURN Ok(())
END
COMPLEXITY ANALYSIS:
Time: O(b × d) where b = batch size, d = embedding dim
Space: O(b × d) for batch processing
Summary: Complexity Analysis
Overall System Complexity
| Component | Time Complexity | Space Complexity |
|---|---|---|
| Image Preprocessing | O(n log n) | O(n) |
| Vision Transformer | O(L × p² × d) | O(L × h × p²) |
| Text Recognition | O(T × B × V × log(BV)) | O(B × T × d) |
| Math Symbol Detection | O(s × c) | O(s) |
| Spatial Analysis | O(s log s) | O(s²) worst case |
| Tree Construction | O(s log s) | O(s × h) |
| LaTeX Generation | O(s) | O(h) |
| Batch Processing | O(N × T / P) | O(N × R + P × M) |
| Vector Caching | O(d + log n) | O(n × d) |
Legend:
- n = pixel count
- L = transformer layers
- p = number of patches
- d = embedding dimension
- h = attention heads
- T = sequence length
- B = beam width
- V = vocabulary size
- s = symbol count
- N = batch size
- P = parallel workers
- R = result size
- M = model size
Optimization Opportunities
- Preprocessing: Use GPU-accelerated image operations
- Transformer: Implement efficient attention (FlashAttention)
- Beam Search: Prune low-probability beams early
- Spatial Analysis: Use spatial indexing (QuadTree/R-tree)
- Caching: Implement tiered cache (L1: memory, L2: disk)
- Batch Processing: Dynamic load balancing across workers
- Vector Search: Use approximate nearest neighbor (HNSW)
Design Patterns Used
- Pipeline Pattern: Image preprocessing → OCR → Math parsing → Output
- Strategy Pattern: Multiple output format generators
- Observer Pattern: Progress tracking in batch processing
- Factory Pattern: Model and cache instantiation
- Adapter Pattern: Format conversion layers
- Repository Pattern: Vector store abstraction
- Command Pattern: Cache update operations
- Builder Pattern: Expression tree and XML construction
This pseudocode serves as the algorithmic blueprint for implementation in the Refinement phase.