wifi-densepose/examples/scipix/docs/05_PSEUDOCODE.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

# SPARC Pseudocode: Ruvector-Scipix OCR & Math Recognition Pipeline
## Document Overview
This document provides algorithmic pseudocode for the core components of the ruvector-scipix OCR and mathematical expression recognition system. All algorithms use Rust-like syntax and include complexity analysis.
---
## 1. Image Preprocessing Pipeline
### 1.1 Main Preprocessing Algorithm
```
ALGORITHM: PreprocessImage
INPUT: imageBytes (Vec<u8>), config (PreprocessConfig)
OUTPUT: Result<ProcessedImage, Error>
CONSTANTS:
MAX_IMAGE_SIZE = 4096 // maximum width or height, in pixels
MIN_DPI = 150
TARGET_DPI = 300
NOISE_THRESHOLD = 0.15
DATA STRUCTURES:
ProcessedImage {
data: Vec<u8>,
width: u32,
height: u32,
channels: u8,
metadata: ImageMetadata,
regions: Vec<TextRegion>
}
ImageMetadata {
dpi: u32,
rotation: f32,
quality_score: f32,
has_math: bool
}
TextRegion {
bbox: BoundingBox,
confidence: f32,
region_type: RegionType // Text, Math, Diagram
}
BEGIN
// Phase 1: Image Loading and Validation
rawImage ← DecodeImage(imageBytes)
IF rawImage.is_error() THEN
RETURN Error("Failed to decode image")
END IF
IF rawImage.width > MAX_IMAGE_SIZE OR rawImage.height > MAX_IMAGE_SIZE THEN
rawImage ← ResizeImage(rawImage, MAX_IMAGE_SIZE)
END IF
// Phase 2: Rotation Detection and Correction
rotationAngle ← DetectRotation(rawImage)
IF ABS(rotationAngle) > 0.5 THEN
rawImage ← RotateImage(rawImage, -rotationAngle)
END IF
// Phase 3: DPI Normalization
currentDPI ← EstimateDPI(rawImage)
IF currentDPI < MIN_DPI THEN
RETURN Error("Image resolution too low")
END IF
IF currentDPI != TARGET_DPI THEN
scaleFactor ← TARGET_DPI / currentDPI
rawImage ← ResizeImage(rawImage, scaleFactor)
END IF
// Phase 4: Noise Reduction
noiseLevel ← EstimateNoise(rawImage)
IF noiseLevel > NOISE_THRESHOLD THEN
rawImage ← ApplyBilateralFilter(rawImage, sigma: 2.0, radius: 3)
END IF
// Phase 5: Contrast Enhancement
enhancedImage ← AdaptiveHistogramEqualization(rawImage, clip_limit: 2.0)
// Phase 6: Text Region Detection
regions ← DetectTextRegions(enhancedImage)
// Phase 7: Quality Assessment
qualityScore ← AssessQuality(enhancedImage, regions)
metadata ← ImageMetadata {
dpi: TARGET_DPI,
rotation: rotationAngle,
quality_score: qualityScore,
has_math: ContainsMathRegions(regions)
}
RETURN Ok(ProcessedImage {
data: enhancedImage.to_bytes(),
width: enhancedImage.width,
height: enhancedImage.height,
channels: enhancedImage.channels,
metadata: metadata,
regions: regions
})
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Image decoding: O(n) where n = pixel count
- Rotation detection: O(n log n) using Hough transform
- Image rotation: O(n)
- DPI scaling: O(n)
- Bilateral filter: O(n × r²) where r = radius
- CLAHE: O(n)
- Region detection: O(n log n)
Total: O(n log n)
Space Complexity:
- Raw image buffer: O(n)
- Intermediate buffers: O(n)
- Region storage: O(k) where k = region count
Total: O(n)
```
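The Phase 1–3 validation and DPI-normalization logic above can be sketched in plain Rust. This is a minimal illustration, not the pipeline's API: `Image` is a stub carrying only dimensions and DPI, and `normalize_dpi` is a hypothetical helper name.

```rust
const MIN_DPI: u32 = 150;
const TARGET_DPI: u32 = 300;

struct Image {
    width: u32,
    height: u32,
    dpi: u32,
}

/// Reject under-resolved inputs, then rescale to TARGET_DPI.
fn normalize_dpi(img: Image) -> Result<Image, String> {
    if img.dpi < MIN_DPI {
        return Err("Image resolution too low".to_string());
    }
    if img.dpi == TARGET_DPI {
        return Ok(img);
    }
    // Scale both dimensions by TARGET_DPI / current DPI.
    let scale = TARGET_DPI as f32 / img.dpi as f32;
    Ok(Image {
        width: (img.width as f32 * scale).round() as u32,
        height: (img.height as f32 * scale).round() as u32,
        dpi: TARGET_DPI,
    })
}

fn main() {
    let img = normalize_dpi(Image { width: 800, height: 600, dpi: 200 }).unwrap();
    assert_eq!((img.width, img.height), (1200, 900));
    assert!(normalize_dpi(Image { width: 10, height: 10, dpi: 72 }).is_err());
}
```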
### 1.2 Rotation Detection Algorithm
```
ALGORITHM: DetectRotation
INPUT: image (Image)
OUTPUT: angle (f32)
BEGIN
// Convert to grayscale if needed
grayImage ← ToGrayscale(image)
// Apply edge detection
edges ← CannyEdgeDetection(grayImage, low: 50, high: 150)
// Use Hough Line Transform to detect dominant lines
lines ← HoughLineTransform(edges, rho: 1.0, theta: PI/180, threshold: 100)
IF lines.is_empty() THEN
RETURN 0.0
END IF
// Cluster angles into dominant orientations
angles ← []
FOR EACH line IN lines DO
angle ← line.theta * 180 / PI
// Normalize to [-45, 45] range
WHILE angle > 45 DO
angle ← angle - 90
END WHILE
WHILE angle < -45 DO
angle ← angle + 90
END WHILE
angles.push(angle)
END FOR
// Use median for robustness
angles.sort()
medianAngle ← angles[angles.len() / 2]
RETURN medianAngle
END
COMPLEXITY ANALYSIS:
Time: O(n log n) for Hough transform
Space: O(n) for edge map
```
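The angle normalization and median step above translate directly to Rust. The Hough transform itself is assumed to come from an image-processing library; this sketch covers only the post-processing of the detected line angles.

```rust
/// Fold an angle (degrees) into [-45, 45], mirroring the while-loops above.
fn normalize_angle(mut angle: f32) -> f32 {
    while angle > 45.0 {
        angle -= 90.0;
    }
    while angle < -45.0 {
        angle += 90.0;
    }
    angle
}

/// Median of the normalized line angles; robust against outlier lines.
fn median_skew(line_angles: &[f32]) -> f32 {
    if line_angles.is_empty() {
        return 0.0; // no dominant lines: assume no rotation
    }
    let mut angles: Vec<f32> = line_angles.iter().map(|&a| normalize_angle(a)).collect();
    angles.sort_by(|a, b| a.partial_cmp(b).unwrap());
    angles[angles.len() / 2]
}

fn main() {
    // 91° and 89° are near-vertical lines; both normalize to ±1° of skew.
    let skew = median_skew(&[91.0, 89.0, 90.5]);
    println!("estimated skew: {skew:.1}°");
}
```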
### 1.3 Text Region Detection
```
ALGORITHM: DetectTextRegions
INPUT: image (Image)
OUTPUT: regions (Vec<TextRegion>)
DATA STRUCTURES:
Component {
pixels: Vec<Point>,
bbox: BoundingBox,
area: u32
}
BEGIN
// Binarize and extract connected components (an MSER-style region proposal)
binaryImage ← AdaptiveThreshold(image, window: 15)
components ← FindConnectedComponents(binaryImage)
regions ← []
FOR EACH comp IN components DO
// Filter by geometric properties
aspectRatio ← comp.bbox.width / comp.bbox.height
density ← comp.area / (comp.bbox.width * comp.bbox.height)
IF aspectRatio > 0.1 AND aspectRatio < 10.0 AND density > 0.3 THEN
// Classify region type
features ← ExtractRegionFeatures(comp, image)
regionType ← ClassifyRegion(features)
region ← TextRegion {
bbox: comp.bbox,
confidence: features.confidence,
region_type: regionType
}
regions.push(region)
END IF
END FOR
// Merge nearby regions
mergedRegions ← MergeOverlappingRegions(regions, iou_threshold: 0.3)
RETURN mergedRegions
END
COMPLEXITY ANALYSIS:
Time: O(n × α(n)) where α is inverse Ackermann (connected components)
Space: O(k) where k = component count
```
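The geometric filter in the loop above is simple enough to state exactly. A sketch, with `Bbox` as a stand-in for the document's `BoundingBox`:

```rust
struct Bbox {
    width: f32,
    height: f32,
}

/// Keep a connected component as a text-region candidate only if its aspect
/// ratio lies in (0.1, 10.0) and its ink density (area / bbox area) > 0.3.
fn is_text_candidate(bbox: &Bbox, area: f32) -> bool {
    let aspect = bbox.width / bbox.height;
    let density = area / (bbox.width * bbox.height);
    aspect > 0.1 && aspect < 10.0 && density > 0.3
}

fn main() {
    // A 30×10 component with 150 ink pixels: aspect 3.0, density 0.5 → kept.
    assert!(is_text_candidate(&Bbox { width: 30.0, height: 10.0 }, 150.0));
    // A 200×10 horizontal rule: aspect 20 → rejected as non-text.
    assert!(!is_text_candidate(&Bbox { width: 200.0, height: 10.0 }, 1500.0));
}
```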
---
## 2. OCR Engine Core
### 2.1 Main OCR Pipeline
```
ALGORITHM: RecognizeText
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: Result<RecognitionResult, Error>
DATA STRUCTURES:
RecognitionResult {
lines: Vec<TextLine>,
confidence: f32,
processing_time_ms: u64
}
TextLine {
text: String,
bbox: BoundingBox,
words: Vec<Word>,
confidence: f32
}
Word {
text: String,
bbox: BoundingBox,
chars: Vec<Character>,
confidence: f32
}
Character {
char: char,
bbox: BoundingBox,
confidence: f32,
alternatives: Vec<(char, f32)>
}
BEGIN
startTime ← GetCurrentTime()
// Phase 1: Vision Transformer Encoding
encodedFeatures ← EncodeImageFeatures(image, model)
// Phase 2: Text Line Detection
textLines ← DetectTextLines(encodedFeatures, image.regions)
// Phase 3: Character Recognition
recognizedLines ← []
totalConfidence ← 0.0
FOR EACH lineRegion IN textLines DO
lineImage ← CropRegion(image, lineRegion.bbox)
// Run sequence-to-sequence recognition
words ← RecognizeLineSequence(lineImage, model, encodedFeatures)
lineText ← words.map(|w| w.text).join(" ")
lineConfidence ← ComputeLineConfidence(words)
textLine ← TextLine {
text: lineText,
bbox: lineRegion.bbox,
words: words,
confidence: lineConfidence
}
recognizedLines.push(textLine)
totalConfidence ← totalConfidence + lineConfidence
END FOR
avgConfidence ← IF recognizedLines.is_empty() THEN
0.0
ELSE
totalConfidence / recognizedLines.len()
END IF
processingTime ← GetCurrentTime() - startTime
RETURN Ok(RecognitionResult {
lines: recognizedLines,
confidence: avgConfidence,
processing_time_ms: processingTime
})
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Vision Transformer encoding: O(n² × d) where d = embedding dim
- Line detection: O(k × log k) where k = regions
- Character recognition per line: O(m × d²) where m = line length
- Total lines L: O(L × m × d²)
Overall: O(n² × d + L × m × d²)
Space Complexity:
- Feature maps: O(n × d)
- Attention maps: O(n² × h) where h = attention heads
- Output storage: O(L × m)
Total: O(n² × h + n × d)
```
### 2.2 Vision Transformer Encoding
```
ALGORITHM: EncodeImageFeatures
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: features (FeatureMap)
DATA STRUCTURES:
FeatureMap {
embeddings: Tensor<f32>, // Shape: [seq_len, embed_dim]
attention_weights: Tensor<f32>, // Shape: [heads, seq_len, seq_len]
positions: Vec<Point>
}
VisionTransformer {
patch_size: u32,
embed_dim: u32,
num_heads: u32,
num_layers: u32,
weights: ModelWeights
}
BEGIN
// Phase 1: Patch Extraction
patchSize ← model.patch_size
numPatchesH ← image.height / patchSize
numPatchesW ← image.width / patchSize
patches ← []
positions ← []
FOR h IN 0..numPatchesH DO
FOR w IN 0..numPatchesW DO
y ← h * patchSize
x ← w * patchSize
patch ← ExtractPatch(image, x, y, patchSize)
patches.push(patch)
positions.push(Point{x, y})
END FOR
END FOR
// Phase 2: Patch Embedding
embeddings ← []
FOR EACH patch IN patches DO
// Linear projection of flattened patch
flatPatch ← Flatten(patch)
embedding ← MatMul(model.weights.patch_projection, flatPatch)
embeddings.push(embedding)
END FOR
// Phase 3: Positional Encoding
FOR i IN 0..embeddings.len() DO
posEncoding ← ComputePositionalEncoding(i, model.embed_dim)
embeddings[i] ← embeddings[i] + posEncoding
END FOR
// Add [CLS] token
clsToken ← model.weights.cls_token
embeddings.insert(0, clsToken)
// Phase 4: Transformer Layers
x ← Tensor::from(embeddings)
allAttentionWeights ← []
FOR layer IN 0..model.num_layers DO
// Multi-head self-attention
(x, attentionWeights) ← MultiHeadAttention(
x,
model.weights.layers[layer],
num_heads: model.num_heads
)
allAttentionWeights.push(attentionWeights)
// Feed-forward network
x ← FeedForward(x, model.weights.layers[layer])
// Layer normalization
x ← LayerNorm(x, model.weights.layers[layer])
END FOR
RETURN FeatureMap {
embeddings: x,
attention_weights: Stack(allAttentionWeights),
positions: positions
}
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Patch extraction: O(n) where n = pixels
- Patch embedding: O(p × d²) where p = patches, d = embed_dim
- Attention per layer: O(p² × d)
- Total layers L: O(L × p² × d)
Overall: O(L × p² × d)
Space Complexity:
- Embeddings: O(p × d)
- Attention matrices: O(L × h × p²) where h = heads
Total: O(L × h × p² + p × d)
```
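`ComputePositionalEncoding` is left unspecified above. A common choice is the standard sinusoidal encoding, sketched here under that assumption (the actual model may use learned position embeddings instead):

```rust
/// Sinusoidal positional encoding: for position `pos` and dimension index i,
/// even indices get sin(pos / 10000^(2⌊i/2⌋/d)), odd indices get the cosine.
fn positional_encoding(pos: usize, embed_dim: usize) -> Vec<f32> {
    (0..embed_dim)
        .map(|i| {
            let exponent = (2 * (i / 2)) as f32 / embed_dim as f32;
            let angle = pos as f32 / 10000f32.powf(exponent);
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}

fn main() {
    let pe = positional_encoding(0, 8);
    // At position 0 every sine term is 0 and every cosine term is 1.
    assert!(pe.iter().step_by(2).all(|&v| v == 0.0));
    assert!(pe.iter().skip(1).step_by(2).all(|&v| v == 1.0));
}
```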
### 2.3 Character Recognition Sequence
```
ALGORITHM: RecognizeLineSequence
INPUT: lineImage (Image), model (VisionTransformer), features (FeatureMap)
OUTPUT: words (Vec<Word>)
DATA STRUCTURES:
BeamSearchState {
sequence: Vec<char>,
score: f32,
hidden_state: Tensor<f32>
}
CONSTANTS:
BEAM_WIDTH = 5
MAX_SEQUENCE_LENGTH = 256
END_TOKEN = '<END>'
SPACE_TOKEN = '<SPACE>'
BEGIN
// Initialize beam search
initialState ← BeamSearchState {
sequence: [],
score: 0.0,
hidden_state: features.embeddings[0] // CLS token
}
beams ← [initialState]
// Beam search decoding
FOR step IN 0..MAX_SEQUENCE_LENGTH DO
candidates ← []
FOR EACH beam IN beams DO
IF beam.sequence.last() == END_TOKEN THEN
candidates.push(beam)
CONTINUE
END IF
// Get character probabilities from model
(logits, newHiddenState) ← model.decode_step(
beam.hidden_state,
features.embeddings
)
probabilities ← Softmax(logits)
// Get top-k characters
topK ← GetTopK(probabilities, k: BEAM_WIDTH)
FOR EACH (char, prob) IN topK DO
newSequence ← beam.sequence.clone()
newSequence.push(char)
// Log probability for numerical stability
newScore ← beam.score + LOG(prob)
newBeam ← BeamSearchState {
sequence: newSequence,
score: newScore,
hidden_state: newHiddenState
}
candidates.push(newBeam)
END FOR
END FOR
// Keep top BEAM_WIDTH candidates
candidates.sort_by(|a, b| b.score.cmp(a.score))
beams ← candidates[0..BEAM_WIDTH]
// Check if all beams ended
allEnded ← beams.all(|b| b.sequence.last() == END_TOKEN)
IF allEnded THEN
BREAK
END IF
END FOR
// Take best beam
bestBeam ← beams[0]
// Split sequence into words
words ← []
currentWord ← []
currentBBox ← BoundingBox::new()
FOR i IN 0..bestBeam.sequence.len() DO
char ← bestBeam.sequence[i]
IF char == SPACE_TOKEN OR char == END_TOKEN THEN
IF NOT currentWord.is_empty() THEN
wordText ← currentWord.map(|c| c.char).join("")
word ← Word {
text: wordText,
bbox: currentBBox,
chars: currentWord.clone(),
confidence: EXP(bestBeam.score / bestBeam.sequence.len())
}
words.push(word)
currentWord.clear()
END IF
ELSE
currentWord.push(Character {
char: char,
bbox: EstimateCharBBox(lineImage, i),
confidence: EXP(bestBeam.score / (i + 1)),
alternatives: []
})
END IF
END FOR
RETURN words
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Beam search steps: O(T × B × V) where:
T = max sequence length
B = beam width
V = vocabulary size
- Sorting per step: O(B × V × log(B × V))
Overall: O(T × B × V × log(B × V))
Space Complexity:
- Beam storage: O(B × T × d) where d = hidden dim
- Candidate buffer: O(B × V)
Total: O(B × T × d)
```
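The scoring in the beam search above (summed log-probabilities, top-B pruning per step) can be exercised on a toy example. Here the "model" is just a hand-written per-step probability table rather than a real decoder, and end-token handling is omitted:

```rust
const BEAM_WIDTH: usize = 2;

/// Toy beam-search decode: each beam is (sequence, cumulative log-prob);
/// at every step only the BEAM_WIDTH best partial sequences survive.
fn decode(steps: &[Vec<(char, f32)>]) -> (String, f32) {
    let mut beams: Vec<(String, f32)> = vec![(String::new(), 0.0)];
    for dist in steps {
        let mut candidates = Vec::new();
        for (seq, score) in &beams {
            for &(ch, p) in dist {
                let mut next = seq.clone();
                next.push(ch);
                // Log probability for numerical stability, as in the pseudocode.
                candidates.push((next, score + p.ln()));
            }
        }
        // Keep the top BEAM_WIDTH candidates (higher score is better).
        candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        candidates.truncate(BEAM_WIDTH);
        beams = candidates;
    }
    beams.into_iter().next().unwrap()
}

fn main() {
    let steps = vec![
        vec![('c', 0.6), ('e', 0.4)],
        vec![('a', 0.5), ('o', 0.5)],
        vec![('t', 0.9), ('b', 0.1)],
    ];
    let (text, score) = decode(&steps);
    println!("best: {text} (log-prob {score:.3})");
}
```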
---
## 3. Mathematical Expression Parser
### 3.1 Math Expression Recognition
```
ALGORITHM: RecognizeMathExpression
INPUT: region (TextRegion), image (ProcessedImage), model (MathModel)
OUTPUT: Result<MathExpression, Error>
DATA STRUCTURES:
MathExpression {
latex: String,
tree: ExpressionTree,
symbols: Vec<MathSymbol>,
confidence: f32
}
ExpressionTree {
root: Box<TreeNode>,
height: u32
}
TreeNode {
symbol: MathSymbol,
relationship: SpatialRelation,
children: Vec<Box<TreeNode>>
}
MathSymbol {
symbol_type: SymbolType, // Digit, Operator, Letter, Special
value: String,
bbox: BoundingBox,
confidence: f32
}
SpatialRelation {
relation_type: RelationType, // Above, Below, Right, Superscript, Subscript
distance: f32,
alignment: f32
}
BEGIN
// Phase 1: Extract math region
mathImage ← CropRegion(image, region.bbox)
// Phase 2: Symbol Detection and Classification
symbols ← DetectMathSymbols(mathImage, model)
IF symbols.is_empty() THEN
RETURN Error("No mathematical symbols detected")
END IF
// Phase 3: Spatial Relationship Analysis
relationships ← AnalyzeSpatialRelationships(symbols)
// Phase 4: Expression Tree Construction
tree ← BuildExpressionTree(symbols, relationships)
// Phase 5: LaTeX Generation
latex ← GenerateLaTeX(tree)
// Calculate overall confidence
avgConfidence ← symbols.map(|s| s.confidence).average()
RETURN Ok(MathExpression {
latex: latex,
tree: tree,
symbols: symbols,
confidence: avgConfidence
})
END
COMPLEXITY ANALYSIS:
Time: O(n² × log n) where n = symbol count
Space: O(n × h) where h = tree height
```
### 3.2 Symbol Detection and Classification
```
ALGORITHM: DetectMathSymbols
INPUT: mathImage (Image), model (MathModel)
OUTPUT: symbols (Vec<MathSymbol>)
CONSTANTS:
SYMBOL_MIN_SIZE = 8 pixels
SYMBOL_MAX_SIZE = 128 pixels
CONFIDENCE_THRESHOLD = 0.7
BEGIN
// Phase 1: Connected Component Analysis
binaryImage ← AdaptiveThreshold(mathImage, window: 11)
components ← FindConnectedComponents(binaryImage)
symbols ← []
FOR EACH comp IN components DO
// Filter by size
width ← comp.bbox.width
height ← comp.bbox.height
IF width < SYMBOL_MIN_SIZE OR height < SYMBOL_MIN_SIZE THEN
CONTINUE
END IF
IF width > SYMBOL_MAX_SIZE OR height > SYMBOL_MAX_SIZE THEN
// Might be compound symbol, try to split
subComponents ← SplitComponent(comp)
FOR EACH subComp IN subComponents DO
ProcessSymbol(subComp, mathImage, model, symbols)
END FOR
ELSE
ProcessSymbol(comp, mathImage, model, symbols)
END IF
END FOR
// Sort symbols left-to-right, top-to-bottom
symbols.sort_by(|a, b| {
IF ABS(a.bbox.y - b.bbox.y) < 10 THEN
a.bbox.x.cmp(b.bbox.x)
ELSE
a.bbox.y.cmp(b.bbox.y)
END IF
})
RETURN symbols
END
SUBROUTINE: ProcessSymbol
INPUT: component (Component), image (Image), model (MathModel), symbols (Vec<MathSymbol>)
OUTPUT: None (modifies symbols)
BEGIN
// Extract symbol image
symbolImage ← CropRegion(image, component.bbox)
// Normalize to model input size
normalizedSymbol ← ResizeImage(symbolImage, 64, 64)
// Classify symbol
(symbolClass, confidence) ← model.classify_symbol(normalizedSymbol)
IF confidence >= CONFIDENCE_THRESHOLD THEN
symbol ← MathSymbol {
symbol_type: DetermineSymbolType(symbolClass),
value: symbolClass.to_string(),
bbox: component.bbox,
confidence: confidence
}
symbols.push(symbol)
END IF
END
COMPLEXITY ANALYSIS:
Time: O(n × c) where n = components, c = classification time
Space: O(n) for symbol storage
```
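The reading-order sort at the end of `DetectMathSymbols` looks like this in Rust. Note that the comparator is not transitive in general (the 10-px line tolerance can chain), so production code would typically bucket symbols into lines first; for a sketch it suffices:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Bbox {
    x: i32,
    y: i32,
}

/// Symbols within 10 px vertically are treated as the same line and ordered
/// left-to-right; otherwise they are ordered top-to-bottom.
fn sort_reading_order(symbols: &mut Vec<Bbox>) {
    symbols.sort_by(|a, b| {
        if (a.y - b.y).abs() < 10 {
            a.x.cmp(&b.x) // same visual line
        } else {
            a.y.cmp(&b.y) // different lines
        }
    });
}

fn main() {
    let mut syms = vec![
        Bbox { x: 40, y: 52 }, // same line as the next symbol, further right
        Bbox { x: 10, y: 50 },
        Bbox { x: 5, y: 90 }, // next line
    ];
    sort_reading_order(&mut syms);
    assert_eq!(
        syms,
        vec![Bbox { x: 10, y: 50 }, Bbox { x: 40, y: 52 }, Bbox { x: 5, y: 90 }]
    );
}
```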
### 3.3 Spatial Relationship Analysis
```
ALGORITHM: AnalyzeSpatialRelationships
INPUT: symbols (Vec<MathSymbol>)
OUTPUT: relationships (Vec<(usize, usize, SpatialRelation)>)
DATA STRUCTURES:
RelationFeatures {
horizontal_distance: f32,
vertical_distance: f32,
size_ratio: f32,
vertical_alignment: f32,
horizontal_alignment: f32
}
CONSTANTS:
SUPERSCRIPT_Y_THRESHOLD = 0.6 // Relative to symbol height
SUBSCRIPT_Y_THRESHOLD = 0.4
FRACTION_ALIGNMENT_THRESHOLD = 0.8
BEGIN
relationships ← []
// Build spatial index for efficient queries
spatialIndex ← BuildQuadTree(symbols)
FOR i IN 0..symbols.len() DO
symbolA ← symbols[i]
// Find nearby symbols
nearbySymbols ← spatialIndex.query_radius(
symbolA.bbox.center(),
radius: symbolA.bbox.width * 3
)
FOR EACH (j, symbolB) IN nearbySymbols DO
IF i >= j THEN
CONTINUE // Avoid duplicate pairs
END IF
// Extract relationship features
features ← ExtractRelationFeatures(symbolA, symbolB)
// Classify relationship
relation ← ClassifyRelation(features, symbolA, symbolB)
IF relation.is_some() THEN
relationships.push((i, j, relation.unwrap()))
END IF
END FOR
END FOR
RETURN relationships
END
SUBROUTINE: ClassifyRelation
INPUT: features (RelationFeatures), symbolA (MathSymbol), symbolB (MathSymbol)
OUTPUT: Option<SpatialRelation>
BEGIN
centerA ← symbolA.bbox.center()
centerB ← symbolB.bbox.center()
deltaX ← centerB.x - centerA.x
deltaY ← centerB.y - centerA.y
// Determine dominant relationship
// Superscript/Subscript detection
IF deltaX > 0 AND deltaX < symbolA.bbox.width * 1.5 THEN
relativeY ← deltaY / symbolA.bbox.height
IF relativeY < -SUPERSCRIPT_Y_THRESHOLD THEN
RETURN Some(SpatialRelation {
relation_type: Superscript,
distance: SQRT(deltaX² + deltaY²),
alignment: features.horizontal_alignment
})
ELSE IF relativeY > SUBSCRIPT_Y_THRESHOLD THEN
RETURN Some(SpatialRelation {
relation_type: Subscript,
distance: SQRT(deltaX² + deltaY²),
alignment: features.horizontal_alignment
})
END IF
END IF
// Fraction detection (vertical alignment)
IF features.vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD THEN
IF deltaY < 0 THEN
RETURN Some(SpatialRelation {
relation_type: Above,
distance: ABS(deltaY),
alignment: features.vertical_alignment
})
ELSE IF deltaY > 0 THEN
RETURN Some(SpatialRelation {
relation_type: Below,
distance: ABS(deltaY),
alignment: features.vertical_alignment
})
END IF
END IF
// Horizontal sequence (default)
IF deltaX > 0 AND ABS(deltaY) < symbolA.bbox.height * 0.3 THEN
RETURN Some(SpatialRelation {
relation_type: Right,
distance: deltaX,
alignment: features.horizontal_alignment
})
END IF
RETURN None
END
COMPLEXITY ANALYSIS:
Time Complexity:
- QuadTree construction: O(n log n)
- For each symbol, query nearby: O(log n + k) where k = nearby count
- Total: O(n × (log n + k))
Average case: O(n log n) if k is constant
Space Complexity:
- QuadTree: O(n)
- Relationships: O(n²) worst case, O(n) average
Total: O(n²) worst case
```
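The superscript/subscript branch of `ClassifyRelation` reduces to one ratio test. A sketch using the same threshold constants, with the rest of the feature extraction stripped away:

```rust
const SUPERSCRIPT_Y_THRESHOLD: f32 = 0.6;
const SUBSCRIPT_Y_THRESHOLD: f32 = 0.4;

#[derive(Debug, PartialEq)]
enum Attachment {
    Superscript,
    Subscript,
    Inline,
}

/// For a symbol to the right of the base: the vertical offset of its center,
/// relative to the base height, decides the attachment (negative = raised).
fn classify_attachment(delta_y: f32, base_height: f32) -> Attachment {
    let relative_y = delta_y / base_height;
    if relative_y < -SUPERSCRIPT_Y_THRESHOLD {
        Attachment::Superscript
    } else if relative_y > SUBSCRIPT_Y_THRESHOLD {
        Attachment::Subscript
    } else {
        Attachment::Inline
    }
}

fn main() {
    // The "2" in x² sits well above the center of "x".
    assert_eq!(classify_attachment(-14.0, 20.0), Attachment::Superscript);
    // The "i" in xᵢ sits below it.
    assert_eq!(classify_attachment(10.0, 20.0), Attachment::Subscript);
    assert_eq!(classify_attachment(2.0, 20.0), Attachment::Inline);
}
```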
### 3.4 Expression Tree Construction
```
ALGORITHM: BuildExpressionTree
INPUT: symbols (Vec<MathSymbol>), relationships (Vec<(usize, usize, SpatialRelation)>)
OUTPUT: tree (ExpressionTree)
DATA STRUCTURES:
TreeBuilder {
nodes: Vec<Box<TreeNode>>,
parent_map: HashMap<usize, usize>,
relation_graph: AdjacencyList
}
BEGIN
// Phase 1: Build relationship graph
graph ← BuildRelationGraph(symbols, relationships)
// Phase 2: Identify root candidates (symbols with no parents)
rootCandidates ← []
FOR i IN 0..symbols.len() DO
IF NOT HasIncomingEdge(graph, i, excludeRight: true) THEN
rootCandidates.push(i)
END IF
END FOR
// Phase 3: Build tree from leftmost root
rootCandidates.sort_by(|a, b| {
symbols[*a].bbox.x.cmp(&symbols[*b].bbox.x)
})
rootIdx ← rootCandidates[0]
// Phase 4: Recursive tree construction
root ← BuildSubtree(rootIdx, symbols, graph, visited: Set::new())
// Phase 5: Calculate tree height
height ← CalculateHeight(root)
RETURN ExpressionTree {
root: root,
height: height
}
END
SUBROUTINE: BuildSubtree
INPUT: nodeIdx (usize), symbols (Vec<MathSymbol>), graph (AdjacencyList), visited (Set<usize>)
OUTPUT: node (Box<TreeNode>)
BEGIN
IF visited.contains(nodeIdx) THEN
RETURN Error("Cycle detected in expression tree")
END IF
visited.insert(nodeIdx)
symbol ← symbols[nodeIdx]
children ← []
// Get all outgoing edges sorted by relationship priority
edges ← graph.get_outgoing(nodeIdx)
edges.sort_by(|a, b| {
// Priority: Superscript > Subscript > Above > Below > Right
GetRelationPriority(a.relation).cmp(GetRelationPriority(b.relation))
})
FOR EACH edge IN edges DO
IF NOT visited.contains(edge.target) THEN
childNode ← BuildSubtree(edge.target, symbols, graph, visited)
childNode.relationship ← edge.relation
children.push(childNode)
END IF
END FOR
node ← TreeNode {
symbol: symbol.clone(),
relationship: SpatialRelation::default(),
children: children
}
RETURN Box::new(node)
END
COMPLEXITY ANALYSIS:
Time: O(n × log n) for graph construction and tree building
Space: O(n × h) where h = average tree height
```
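The edge ordering in `BuildSubtree` hinges on `GetRelationPriority`, which is not spelled out above. One consistent realization of the stated priority (Superscript > Subscript > Above > Below > Right, where higher priority means handled first):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RelationType {
    Superscript,
    Subscript,
    Above,
    Below,
    Right,
}

/// Lower number = higher priority = resolved earlier during tree construction.
fn relation_priority(rel: RelationType) -> u8 {
    match rel {
        RelationType::Superscript => 0,
        RelationType::Subscript => 1,
        RelationType::Above => 2,
        RelationType::Below => 3,
        RelationType::Right => 4,
    }
}

fn main() {
    let mut edges = vec![RelationType::Right, RelationType::Above, RelationType::Superscript];
    edges.sort_by_key(|&r| relation_priority(r));
    assert_eq!(
        edges,
        vec![RelationType::Superscript, RelationType::Above, RelationType::Right]
    );
}
```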
### 3.5 LaTeX Generation
```
ALGORITHM: GenerateLaTeX
INPUT: tree (ExpressionTree)
OUTPUT: latex (String)
BEGIN
latex ← RecursiveGenerateLaTeX(tree.root)
// Wrap in delimiters
latex ← "\\(" + latex + "\\)"
RETURN latex
END
SUBROUTINE: RecursiveGenerateLaTeX
INPUT: node (Box<TreeNode>)
OUTPUT: latex (String)
BEGIN
symbol ← node.symbol
baseLatex ← SymbolToLatex(symbol)
// Group children by relationship type
superscripts ← []
subscripts ← []
numerator ← None
denominator ← None
rightChildren ← []
FOR EACH child IN node.children DO
MATCH child.relationship.relation_type:
Superscript → superscripts.push(child)
Subscript → subscripts.push(child)
Above → numerator ← Some(child)
Below → denominator ← Some(child)
Right → rightChildren.push(child)
END MATCH
END FOR
// Build LaTeX string
result ← baseLatex
// Handle fractions
IF numerator.is_some() AND denominator.is_some() THEN
numLatex ← RecursiveGenerateLaTeX(numerator.unwrap())
denomLatex ← RecursiveGenerateLaTeX(denominator.unwrap())
result ← "\\frac{" + numLatex + "}{" + denomLatex + "}"
END IF
// Handle superscripts
IF NOT superscripts.is_empty() THEN
superLatex ← superscripts
.map(|c| RecursiveGenerateLaTeX(c))
.join("")
result ← result + "^{" + superLatex + "}"
END IF
// Handle subscripts
IF NOT subscripts.is_empty() THEN
subLatex ← subscripts
.map(|c| RecursiveGenerateLaTeX(c))
.join("")
result ← result + "_{" + subLatex + "}"
END IF
// Handle right children (sequential)
FOR EACH child IN rightChildren DO
childLatex ← RecursiveGenerateLaTeX(child)
// Add spacing for operators
IF IsOperator(child.symbol) THEN
result ← result + " " + childLatex + " "
ELSE
result ← result + childLatex
END IF
END FOR
RETURN result
END
SUBROUTINE: SymbolToLatex
INPUT: symbol (MathSymbol)
OUTPUT: latex (String)
BEGIN
MATCH symbol.symbol_type:
Digit → RETURN symbol.value
Letter → RETURN symbol.value
Operator → RETURN OperatorToLatex(symbol.value)
Special → RETURN SpecialToLatex(symbol.value)
END MATCH
RETURN symbol.value
END
COMPLEXITY ANALYSIS:
Time: O(n) where n = nodes in tree
Space: O(h) for recursion stack where h = tree height
```
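The fraction and sequence cases of `RecursiveGenerateLaTeX` can be made concrete on a tiny tree. This sketch handles only Above/Below (fraction) and Right (sequence) edges and omits the super/subscript branches and operator spacing:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Rel {
    Root,
    Above,
    Below,
    Right,
}

struct Node {
    value: String,
    rel: Rel,
    children: Vec<Node>,
}

fn to_latex(node: &Node) -> String {
    let mut result = node.value.clone();
    let num = node.children.iter().find(|c| c.rel == Rel::Above);
    let den = node.children.iter().find(|c| c.rel == Rel::Below);
    if let (Some(n), Some(d)) = (num, den) {
        // A fraction replaces the base symbol (the fraction bar itself).
        result = format!("\\frac{{{}}}{{{}}}", to_latex(n), to_latex(d));
    }
    for child in node.children.iter().filter(|c| c.rel == Rel::Right) {
        result.push_str(&to_latex(child));
    }
    result
}

fn main() {
    // "1/2 + x": a fraction bar with numerator 1, denominator 2, then "+", "x".
    let tree = Node {
        value: "-".into(),
        rel: Rel::Root,
        children: vec![
            Node { value: "1".into(), rel: Rel::Above, children: vec![] },
            Node { value: "2".into(), rel: Rel::Below, children: vec![] },
            Node {
                value: "+".into(),
                rel: Rel::Right,
                children: vec![Node { value: "x".into(), rel: Rel::Right, children: vec![] }],
            },
        ],
    };
    assert_eq!(to_latex(&tree), "\\frac{1}{2}+x");
}
```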
---
## 4. Output Format Conversion
### 4.1 Multi-Format Generation
```
ALGORITHM: ConvertToFormats
INPUT: mathExpr (MathExpression), formats (Vec<OutputFormat>)
OUTPUT: Result<HashMap<OutputFormat, String>, Error>
DATA STRUCTURES:
OutputFormat {
MMD, // Markdown with delimiters
LaTeXStyled, // Standalone LaTeX
MathML, // MathML XML
HTML // Rendered HTML
}
BEGIN
results ← HashMap::new()
FOR EACH format IN formats DO
output ← MATCH format:
MMD → GenerateMMD(mathExpr)
LaTeXStyled → GenerateStyledLaTeX(mathExpr)
MathML → GenerateMathML(mathExpr.tree)
HTML → GenerateHTML(mathExpr)
END MATCH
results.insert(format, output)
END FOR
RETURN Ok(results)
END
COMPLEXITY ANALYSIS:
Time: O(f × n) where f = format count, n = expression size
Space: O(f × n) for storing all formats
```
### 4.2 MMD Generation
```
ALGORITHM: GenerateMMD
INPUT: mathExpr (MathExpression)
OUTPUT: mmd (String)
CONSTANTS:
INLINE_DELIMITER = "$"
DISPLAY_DELIMITER = "$$"
BEGIN
latex ← mathExpr.latex
// Determine if expression should be display or inline
isDisplayMath ← ShouldBeDisplayMath(mathExpr)
IF isDisplayMath THEN
mmd ← DISPLAY_DELIMITER + "\n" + latex + "\n" + DISPLAY_DELIMITER
ELSE
mmd ← INLINE_DELIMITER + latex + INLINE_DELIMITER
END IF
RETURN mmd
END
SUBROUTINE: ShouldBeDisplayMath
INPUT: mathExpr (MathExpression)
OUTPUT: isDisplay (bool)
BEGIN
// Display math if:
// 1. Contains fractions or large operators
// 2. Tree height > 2
// 3. Width > threshold
hasFractions ← mathExpr.latex.contains("\\frac")
hasLargeOps ← mathExpr.latex.contains("\\sum") OR
mathExpr.latex.contains("\\int") OR
mathExpr.latex.contains("\\prod")
isTall ← mathExpr.tree.height > 2
isWide ← mathExpr.symbols.len() > 10
RETURN hasFractions OR hasLargeOps OR isTall OR isWide
END
COMPLEXITY ANALYSIS:
Time: O(n) where n = LaTeX string length
Space: O(n) for output string
```
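The display-vs-inline heuristic above is a handful of string and structure checks, shown here applied to the LaTeX string plus the two structural signals (tree height and symbol count):

```rust
/// Heuristic from ShouldBeDisplayMath: fractions, large operators, tall
/// trees, or wide expressions are promoted to display math.
fn should_be_display(latex: &str, tree_height: u32, symbol_count: usize) -> bool {
    let has_fractions = latex.contains("\\frac");
    let has_large_ops =
        latex.contains("\\sum") || latex.contains("\\int") || latex.contains("\\prod");
    has_fractions || has_large_ops || tree_height > 2 || symbol_count > 10
}

/// Wrap LaTeX in the MMD delimiters accordingly.
fn wrap_mmd(latex: &str, tree_height: u32, symbol_count: usize) -> String {
    if should_be_display(latex, tree_height, symbol_count) {
        format!("$$\n{latex}\n$$")
    } else {
        format!("${latex}$")
    }
}

fn main() {
    assert_eq!(wrap_mmd("x^2", 2, 2), "$x^2$");
    assert_eq!(wrap_mmd("\\frac{a}{b}", 2, 3), "$$\n\\frac{a}{b}\n$$");
}
```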
### 4.3 MathML Generation
```
ALGORITHM: GenerateMathML
INPUT: tree (ExpressionTree)
OUTPUT: mathml (String)
BEGIN
xml ← XMLBuilder::new()
xml.start_element("math", [("xmlns", "http://www.w3.org/1998/Math/MathML")])
RecursiveGenerateMathML(tree.root, xml)
xml.end_element("math")
RETURN xml.to_string()
END
SUBROUTINE: RecursiveGenerateMathML
INPUT: node (Box<TreeNode>), xml (XMLBuilder)
OUTPUT: None (modifies xml)
BEGIN
symbol ← node.symbol
// Determine MathML element type
MATCH symbol.symbol_type:
Digit →
xml.element("mn", symbol.value) // numbers use <mn>
Letter →
xml.element("mi", symbol.value) // identifiers use <mi>
Operator →
xml.element("mo", symbol.value)
Special →
HandleSpecialSymbol(symbol, xml)
END MATCH
// Handle relationships
IF HasSuperscript(node) THEN
xml.start_element("msup")
RecursiveGenerateMathML(GetBase(node), xml)
RecursiveGenerateMathML(GetSuperscript(node), xml)
xml.end_element("msup")
ELSE IF HasSubscript(node) THEN
xml.start_element("msub")
RecursiveGenerateMathML(GetBase(node), xml)
RecursiveGenerateMathML(GetSubscript(node), xml)
xml.end_element("msub")
ELSE IF HasFraction(node) THEN
xml.start_element("mfrac")
RecursiveGenerateMathML(GetNumerator(node), xml)
RecursiveGenerateMathML(GetDenominator(node), xml)
xml.end_element("mfrac")
END IF
// Process right children
FOR EACH child IN GetRightChildren(node) DO
RecursiveGenerateMathML(child, xml)
END FOR
END
COMPLEXITY ANALYSIS:
Time: O(n) tree traversal
Space: O(n) for XML string
```
### 4.4 HTML Rendering
```
ALGORITHM: GenerateHTML
INPUT: mathExpr (MathExpression)
OUTPUT: html (String)
BEGIN
// Use KaTeX or MathJax for rendering
latex ← mathExpr.latex
html ← """
<div class="math-expression" data-confidence="{mathExpr.confidence}">
<script type="math/tex">
{latex}
</script>
</div>
"""
// Add accessibility attributes
html ← AddAriaLabels(html, mathExpr)
RETURN html
END
COMPLEXITY ANALYSIS:
Time: O(n) string concatenation
Space: O(n) output size
```
---
## 5. Batch Processing
### 5.1 Parallel Batch Processing
```
ALGORITHM: ProcessBatch
INPUT: inputs (Vec<InputSource>), config (ProcessConfig)
OUTPUT: Result<Vec<ProcessResult>, Error>
DATA STRUCTURES:
InputSource {
source_type: SourceType, // Image, PDF, Directory
path: PathBuf,
page_range: Option<Range<u32>>
}
ProcessResult {
input: InputSource,
output: RecognitionResult,
processing_time_ms: u64,
status: ResultStatus
}
ProcessConfig {
max_parallel: usize,
timeout_ms: u64,
cache_enabled: bool
}
BEGIN
// Phase 1: Expand inputs (handle PDFs and directories)
expandedInputs ← []
FOR EACH input IN inputs DO
MATCH input.source_type:
PDF →
pages ← ExtractPDFPages(input.path, input.page_range)
expandedInputs.extend(pages)
Directory →
images ← FindImagesInDirectory(input.path)
expandedInputs.extend(images)
Image →
expandedInputs.push(input)
END MATCH
END FOR
// Phase 2: Create processing queue
queue ← WorkQueue::new(expandedInputs)
results ← ConcurrentVec::new()
// Phase 3: Parallel processing
numWorkers ← MIN(config.max_parallel, CPU_COUNT)
PARALLEL FOR worker IN 0..numWorkers DO
LOOP
input ← queue.pop()
IF input.is_none() THEN
BREAK
END IF
startTime ← GetCurrentTime()
// Process single input
result ← ProcessSingleInput(
input.unwrap(),
config,
timeout: config.timeout_ms
)
processingTime ← GetCurrentTime() - startTime
processResult ← ProcessResult {
input: input.unwrap(),
output: result,
processing_time_ms: processingTime,
status: DetermineStatus(result)
}
results.push(processResult)
END LOOP
END PARALLEL
// Phase 4: Aggregate and return
finalResults ← results.into_vec()
finalResults.sort_by(|a, b| a.input.path.cmp(&b.input.path))
RETURN Ok(finalResults)
END
COMPLEXITY ANALYSIS:
Time Complexity:
- With P workers, N inputs, T time per input
- Parallel: O(N × T / P)
- Sequential equivalent: O(N × T)
- Speedup: ~P (linear with worker count)
Space Complexity:
- Queue: O(N)
- Results: O(N × R) where R = result size
- Worker memory: O(P × M) where M = model size
Total: O(N × R + P × M)
```
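The Phase 3 worker pool maps onto std threads with a mutex-guarded queue. In this sketch `process_batch` is a hypothetical helper and the per-input work is just a string-length placeholder standing in for `ProcessSingleInput`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn process_batch(inputs: Vec<String>, num_workers: usize) -> Vec<(String, usize)> {
    let queue = Arc::new(Mutex::new(inputs));
    let results = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for _ in 0..num_workers {
        let queue = Arc::clone(&queue);
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || loop {
            // Pop the next input, releasing the lock before processing it.
            let input = queue.lock().unwrap().pop();
            match input {
                Some(item) => {
                    let output = item.len(); // placeholder for real OCR work
                    results.lock().unwrap().push((item, output));
                }
                None => break, // queue drained: worker exits
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort(); // deterministic order, like the Phase 4 sort by path
    out
}

fn main() {
    let out = process_batch(vec!["a.png".into(), "doc.pdf".into()], 2);
    assert_eq!(out, vec![("a.png".to_string(), 5), ("doc.pdf".to_string(), 7)]);
}
```

A production version would more likely use a channel or a rayon-style pool, but the queue-pop loop keeps the correspondence with the pseudocode direct.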
### 5.2 PDF Page Extraction
```
ALGORITHM: ExtractPDFPages
INPUT: pdfPath (PathBuf), pageRange (Option<Range<u32>>)
OUTPUT: pages (Vec<InputSource>)
BEGIN
// Load PDF document
document ← PDFDocument::load(pdfPath)
IF document.is_error() THEN
RETURN Error("Failed to load PDF")
END IF
// Determine page range
totalPages ← document.page_count()
range ← pageRange.unwrap_or(0..totalPages)
pages ← []
FOR pageNum IN range DO
IF pageNum >= totalPages THEN
BREAK
END IF
// Render page to image
page ← document.get_page(pageNum)
// Render at high DPI for quality
image ← page.render(dpi: 300)
// Create temporary file
tempPath ← CreateTempFile(format!("page_{}.png", pageNum))
image.save(tempPath)
inputSource ← InputSource {
source_type: Image,
path: tempPath,
page_range: None
}
pages.push(inputSource)
END FOR
RETURN pages
END
COMPLEXITY ANALYSIS:
Time: O(P × R) where P = pages, R = render time per page
Space: O(P × S) where S = image size
```
### 5.3 Result Aggregation
```
ALGORITHM: AggregateResults
INPUT: results (Vec<ProcessResult>)
OUTPUT: aggregated (AggregatedResults)
DATA STRUCTURES:
AggregatedResults {
total_count: usize,
success_count: usize,
failure_count: usize,
total_processing_time_ms: u64,
average_confidence: f32,
results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>
}
BEGIN
totalCount ← results.len()
successCount ← 0
failureCount ← 0
totalTime ← 0
totalConfidence ← 0.0
byStatus ← HashMap::new()
FOR EACH result IN results DO
totalTime ← totalTime + result.processing_time_ms
MATCH result.status:
Success →
successCount ← successCount + 1
totalConfidence ← totalConfidence + result.output.confidence
Failure →
failureCount ← failureCount + 1
END MATCH
// Group by status
IF NOT byStatus.contains_key(result.status) THEN
byStatus.insert(result.status, [])
END IF
byStatus.get_mut(result.status).push(result)
END FOR
avgConfidence ← IF successCount > 0 THEN
totalConfidence / successCount
ELSE
0.0
END IF
RETURN AggregatedResults {
total_count: totalCount,
success_count: successCount,
failure_count: failureCount,
total_processing_time_ms: totalTime,
average_confidence: avgConfidence,
results_by_status: byStatus
}
END
COMPLEXITY ANALYSIS:
Time: O(n) single pass
Space: O(n) for grouping
```
---
## 6. Caching and Memoization
### 6.1 Model Weight Caching
```
ALGORITHM: LoadModelWithCache
INPUT: modelPath (PathBuf), cacheConfig (CacheConfig)
OUTPUT: Result<Model, Error>
DATA STRUCTURES:
CacheConfig {
enabled: bool,
cache_dir: PathBuf,
max_cache_size_mb: u64,
ttl_seconds: u64
}
CachedModel {
weights: Vec<u8>,
metadata: ModelMetadata,
cached_at: Timestamp,
access_count: u64
}
BEGIN
IF NOT cacheConfig.enabled THEN
RETURN LoadModelDirect(modelPath)
END IF
// Generate cache key from model path and version
cacheKey ← ComputeHash(modelPath, algorithm: SHA256)
cachePath ← cacheConfig.cache_dir.join(cacheKey)
// Check if cached version exists and is valid
IF cachePath.exists() THEN
cachedModel ← DeserializeCachedModel(cachePath)
// Check TTL
age ← GetCurrentTime() - cachedModel.cached_at
IF age < cacheConfig.ttl_seconds THEN
// Cache hit
cachedModel.access_count ← cachedModel.access_count + 1
UpdateCacheMetadata(cachePath, cachedModel.metadata)
model ← DeserializeModel(cachedModel.weights)
RETURN Ok(model)
ELSE
// Cache expired
DeleteFile(cachePath)
END IF
END IF
// Cache miss - load from disk
model ← LoadModelDirect(modelPath)
IF model.is_error() THEN
RETURN model
END IF
// Serialize and cache
serializedWeights ← SerializeModel(model.unwrap())
cachedModel ← CachedModel {
weights: serializedWeights,
metadata: model.metadata,
cached_at: GetCurrentTime(),
access_count: 1
}
// Check cache size limit
EnsureCacheSize(cacheConfig)
// Write to cache
WriteCachedModel(cachePath, cachedModel)
RETURN model
END
SUBROUTINE: EnsureCacheSize
INPUT: cacheConfig (CacheConfig)
OUTPUT: None
BEGIN
currentSize ← GetDirectorySize(cacheConfig.cache_dir)
maxSize ← cacheConfig.max_cache_size_mb * 1024 * 1024
IF currentSize <= maxSize THEN
RETURN
END IF
// Evict least recently used models
cachedFiles ← ListFiles(cacheConfig.cache_dir)
// Sort by last access time
cachedFiles.sort_by(|a, b| {
a.metadata.accessed_at.cmp(&b.metadata.accessed_at)
})
freedSpace ← 0
targetFree ← currentSize - maxSize
FOR EACH file IN cachedFiles DO
IF freedSpace >= targetFree THEN
BREAK
END IF
fileSize ← GetFileSize(file)
DeleteFile(file)
freedSpace ← freedSpace + fileSize
END FOR
END
COMPLEXITY ANALYSIS:
Time Complexity:
- Cache hit: O(1) for lookup + O(m) for deserialization
- Cache miss: O(m) for model loading + O(m) for serialization
- Eviction: O(k log k) where k = cached files
Space Complexity:
- Cached model: O(m) where m = model size
- LRU tracking: O(k)
```
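The eviction policy of `EnsureCacheSize` can be isolated into a pure planning function, which makes the least-recently-accessed-first behavior testable without touching the filesystem. This is a minimal in-memory sketch: a real implementation would obtain `size_bytes` and `accessed_at` from `std::fs::metadata` and then delete the returned victims:

```rust
// Hypothetical cache-entry record; in practice these fields come
// from filesystem metadata on the files in cache_dir.
#[derive(Clone, Debug)]
pub struct CacheEntry {
    pub path: String,
    pub size_bytes: u64,
    pub accessed_at: u64, // seconds since epoch
}

/// Returns the entries to delete so that the total size drops to at
/// most `max_bytes`, evicting least-recently-accessed entries first,
/// as EnsureCacheSize does.
pub fn plan_eviction(mut entries: Vec<CacheEntry>, max_bytes: u64) -> Vec<CacheEntry> {
    let current: u64 = entries.iter().map(|e| e.size_bytes).sum();
    if current <= max_bytes {
        return Vec::new(); // already under the limit
    }
    // Oldest access time first.
    entries.sort_by_key(|e| e.accessed_at);
    let target_free = current - max_bytes;
    let mut freed = 0u64;
    let mut victims = Vec::new();
    for e in entries {
        if freed >= target_free {
            break;
        }
        freed += e.size_bytes;
        victims.push(e);
    }
    victims
}
```

Separating planning from deletion also makes the O(k log k) sort cost visible: it is paid only when the cache is actually over its limit.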
### 6.2 Result Caching with Ruvector
```
ALGORITHM: CacheResultWithVector
INPUT: imageHash (Hash), result (RecognitionResult), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>
DATA STRUCTURES:
RuvectorStore {
index: VectorIndex,
metadata_db: HashMap<Hash, ResultMetadata>,
config: VectorConfig
}
VectorConfig {
embedding_dim: usize,
similarity_threshold: f32,
max_cache_entries: usize
}
ResultMetadata {
result: RecognitionResult,
image_hash: Hash,
cached_at: Timestamp,
hit_count: u64
}
BEGIN
// Phase 1: Generate perceptual hash
perceptualHash ← ComputePerceptualHash(imageHash)
// Phase 2: Check if already cached
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
// Update metadata
metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
metadata.hit_count ← metadata.hit_count + 1
RETURN Ok(())
END IF
// Phase 3: Generate embedding for the result
embedding ← GenerateResultEmbedding(result)
// Phase 4: Store in vector index
vectorStore.index.insert(
id: perceptualHash,
vector: embedding
)
// Phase 5: Store metadata
metadata ← ResultMetadata {
result: result,
image_hash: imageHash,
cached_at: GetCurrentTime(),
hit_count: 1
}
vectorStore.metadata_db.insert(perceptualHash, metadata)
// Phase 6: Enforce cache size limit
IF vectorStore.metadata_db.len() > vectorStore.config.max_cache_entries THEN
EvictLeastUsedEntry(vectorStore)
END IF
RETURN Ok(())
END
ALGORITHM: QuerySimilarCachedResult
INPUT: imageHash (Hash), vectorStore (RuvectorStore)
OUTPUT: Option<RecognitionResult>
BEGIN
// Generate perceptual hash
perceptualHash ← ComputePerceptualHash(imageHash)
// Exact match check
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
metadata ← vectorStore.metadata_db.get(perceptualHash)
metadata.hit_count ← metadata.hit_count + 1
RETURN Some(metadata.result.clone())
END IF
// Generate query embedding
queryEmbedding ← GenerateImageEmbedding(imageHash)
// Search for similar results
results ← vectorStore.index.search(
query: queryEmbedding,
k: 1,
threshold: vectorStore.config.similarity_threshold
)
IF results.is_empty() THEN
RETURN None
END IF
bestMatch ← results[0]
IF bestMatch.similarity >= vectorStore.config.similarity_threshold THEN
metadata ← vectorStore.metadata_db.get(bestMatch.id)
metadata.hit_count ← metadata.hit_count + 1
RETURN Some(metadata.result.clone())
END IF
RETURN None
END
COMPLEXITY ANALYSIS:
Caching:
Time: O(d) for embedding + O(log n) for index insertion
Space: O(n × d) where n = cached entries, d = embedding dim
Querying:
Time: O(d) for embedding + O(log n × d) for ANN search
Space: O(k) for results where k = search parameter
```
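The two-stage lookup of `QuerySimilarCachedResult` (exact perceptual-hash hit, then similarity search) can be sketched as below. For brevity the ANN index is replaced by a linear cosine-similarity scan and the cached result is a plain `String`; the hash type, threshold field, and store layout are illustrative assumptions, not the real `RuvectorStore` API:

```rust
use std::collections::HashMap;

/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Toy stand-in for RuvectorStore: hash -> (embedding, cached result).
pub struct VectorCache {
    pub entries: HashMap<u64, (Vec<f32>, String)>,
    pub similarity_threshold: f32,
}

impl VectorCache {
    /// Exact-match lookup first, then nearest-neighbour fallback,
    /// mirroring QuerySimilarCachedResult.
    pub fn query(&self, hash: u64, embedding: &[f32]) -> Option<&String> {
        if let Some((_, result)) = self.entries.get(&hash) {
            return Some(result); // exact perceptual-hash hit
        }
        // Linear scan stands in for the O(log n x d) ANN search.
        let mut best: Option<(f32, &String)> = None;
        for (vec, result) in self.entries.values() {
            let sim = cosine(embedding, vec);
            if best.map_or(true, |(b, _)| sim > b) {
                best = Some((sim, result));
            }
        }
        match best {
            Some((sim, r)) if sim >= self.similarity_threshold => Some(r),
            _ => None,
        }
    }
}
```

The threshold gate at the end is what prevents a nearest neighbour that is merely "least dissimilar" from being returned as a cache hit.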
### 6.3 Incremental Update Cache
```
ALGORITHM: UpdateCacheIncremental
INPUT: updates (Vec<CacheUpdate>), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>
DATA STRUCTURES:
CacheUpdate {
operation: UpdateOp, // Insert, Update, Delete
image_hash: Hash,
result: Option<RecognitionResult>
}
UpdateOp {
Insert,
Update,
Delete
}
BEGIN
// Batch updates for efficiency
insertBatch ← []
updateBatch ← []
deleteBatch ← []
FOR EACH update IN updates DO
MATCH update.operation:
Insert →
insertBatch.push(update)
Update →
updateBatch.push(update)
Delete →
deleteBatch.push(update)
END MATCH
END FOR
// Process deletes first
FOR EACH update IN deleteBatch DO
perceptualHash ← ComputePerceptualHash(update.image_hash)
vectorStore.index.remove(perceptualHash)
vectorStore.metadata_db.remove(perceptualHash)
END FOR
// Process updates
FOR EACH update IN updateBatch DO
perceptualHash ← ComputePerceptualHash(update.image_hash)
IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
// Update existing entry
embedding ← GenerateResultEmbedding(update.result.unwrap())
vectorStore.index.update(perceptualHash, embedding)
metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
metadata.result ← update.result.unwrap()
metadata.cached_at ← GetCurrentTime()
END IF
END FOR
// Process inserts in batch
IF NOT insertBatch.is_empty() THEN
embeddings ← []
metadataList ← []
    FOR EACH update IN insertBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)
        embedding ← GenerateResultEmbedding(update.result.unwrap())
        embeddings.push((perceptualHash, embedding))
        metadata ← ResultMetadata {
            result: update.result.unwrap(),
            image_hash: update.image_hash,
            cached_at: GetCurrentTime(),
            hit_count: 1
        }
        metadataList.push((perceptualHash, metadata))
    END FOR
    // Batch insert (id, vector) pairs into the vector index
    vectorStore.index.insert_batch(embeddings)
// Batch insert metadata
FOR EACH (hash, metadata) IN metadataList DO
vectorStore.metadata_db.insert(hash, metadata)
END FOR
END IF
RETURN Ok(())
END
COMPLEXITY ANALYSIS:
Time: O(b × d) where b = batch size, d = embedding dim
Space: O(b × d) for batch processing
```
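The batching phase of `UpdateCacheIncremental` is a straightforward partition of the update stream by operation, applied in delete-update-insert order. A minimal Rust sketch of the partition step, with `String` standing in for `RecognitionResult`:

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum UpdateOp { Insert, Update, Delete }

// Hypothetical update record mirroring the CacheUpdate structure above.
#[derive(Clone, Debug)]
pub struct CacheUpdate {
    pub op: UpdateOp,
    pub image_hash: u64,
    pub result: Option<String>, // stand-in for RecognitionResult
}

/// Partition a mixed update stream into (inserts, updates, deletes),
/// preserving input order within each batch, as UpdateCacheIncremental
/// does before applying deletes first.
pub fn partition_updates(
    updates: Vec<CacheUpdate>,
) -> (Vec<CacheUpdate>, Vec<CacheUpdate>, Vec<CacheUpdate>) {
    let mut inserts = Vec::new();
    let mut mods = Vec::new();
    let mut deletes = Vec::new();
    for u in updates {
        match u.op {
            UpdateOp::Insert => inserts.push(u),
            UpdateOp::Update => mods.push(u),
            UpdateOp::Delete => deletes.push(u),
        }
    }
    (inserts, mods, deletes)
}
```

Processing deletes before inserts means a delete and a re-insert of the same image hash in one batch leaves the new entry in place rather than removing it.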
---
## Summary: Complexity Analysis
### Overall System Complexity
| Component | Time Complexity | Space Complexity |
|-----------|----------------|------------------|
| Image Preprocessing | O(n log n) | O(n) |
| Vision Transformer | O(L × p² × d) | O(L × h × p²) |
| Text Recognition | O(T × B × V × log(BV)) | O(B × T × d) |
| Math Symbol Detection | O(s × c) | O(s) |
| Spatial Analysis | O(s log s) | O(s²) worst case |
| Tree Construction | O(s log s) | O(s × h) |
| LaTeX Generation | O(s) | O(h) |
| Batch Processing | O(N × T / P) | O(N × R + P × M) |
| Vector Caching | O(d + log n) | O(n × d) |
**Legend:**
- n = pixel count
- L = transformer layers
- p = number of patches
- d = embedding dimension
- h = attention heads
- T = sequence length
- B = beam width
- V = vocabulary size
- s = symbol count
- N = batch size
- P = parallel workers
- R = result size
- M = model size
### Optimization Opportunities
1. **Preprocessing**: Use GPU-accelerated image operations
2. **Transformer**: Implement efficient attention (FlashAttention)
3. **Beam Search**: Prune low-probability beams early
4. **Spatial Analysis**: Use spatial indexing (QuadTree/R-tree)
5. **Caching**: Implement tiered cache (L1: memory, L2: disk)
6. **Batch Processing**: Dynamic load balancing across workers
7. **Vector Search**: Use approximate nearest neighbor (HNSW)
---
## Design Patterns Used
1. **Pipeline Pattern**: Image preprocessing → OCR → Math parsing → Output
2. **Strategy Pattern**: Multiple output format generators
3. **Observer Pattern**: Progress tracking in batch processing
4. **Factory Pattern**: Model and cache instantiation
5. **Adapter Pattern**: Format conversion layers
6. **Repository Pattern**: Vector store abstraction
7. **Command Pattern**: Cache update operations
8. **Builder Pattern**: Expression tree and XML construction
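The Strategy pattern from item 2 is typically a trait object in Rust: each output format implements one trait, and the pipeline selects a strategy at runtime. The formatter names below are illustrative, not the project's actual types:

```rust
/// Strategy interface: one implementation per output format.
pub trait OutputFormatter {
    fn format(&self, latex: &str) -> String;
}

pub struct PlainLatex;
pub struct MarkdownMath;

impl OutputFormatter for PlainLatex {
    fn format(&self, latex: &str) -> String {
        latex.to_string()
    }
}

impl OutputFormatter for MarkdownMath {
    // Wrap the expression in inline-math delimiters.
    fn format(&self, latex: &str) -> String {
        format!("${}$", latex)
    }
}

/// The caller depends only on the trait object, never on a
/// concrete format, so new formats need no pipeline changes.
pub fn render(formatter: &dyn OutputFormatter, latex: &str) -> String {
    formatter.format(latex)
}
```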
---
*This pseudocode serves as the algorithmic blueprint for implementation in the Refinement phase.*