# SPARC Pseudocode: Ruvector-Scipix OCR & Math Recognition Pipeline

## Document Overview

This document provides algorithmic pseudocode for the core components of the ruvector-scipix OCR and mathematical expression recognition system. All algorithms use Rust-like syntax and include complexity analysis.

---

## 1. Image Preprocessing Pipeline

### 1.1 Main Preprocessing Algorithm

```
ALGORITHM: PreprocessImage
INPUT: imageBytes (Vec<u8>), config (PreprocessConfig)
OUTPUT: Result<ProcessedImage, Error>

CONSTANTS:
    MAX_IMAGE_DIM = 4096        // maximum width or height, in pixels
    MIN_DPI = 150
    TARGET_DPI = 300
    NOISE_THRESHOLD = 0.15

DATA STRUCTURES:
    ProcessedImage {
        data: Vec<u8>,
        width: u32,
        height: u32,
        channels: u8,
        metadata: ImageMetadata,
        regions: Vec<TextRegion>
    }

    ImageMetadata {
        dpi: u32,
        rotation: f32,
        quality_score: f32,
        has_math: bool
    }

    TextRegion {
        bbox: BoundingBox,
        confidence: f32,
        region_type: RegionType  // Text, Math, Diagram
    }

BEGIN
    // Phase 1: Image Loading and Validation
    rawImage ← DecodeImage(imageBytes)
    IF rawImage.is_error() THEN
        RETURN Error("Failed to decode image")
    END IF

    IF rawImage.width > MAX_IMAGE_DIM OR rawImage.height > MAX_IMAGE_DIM THEN
        // Downscale so the longer side fits, preserving aspect ratio
        rawImage ← ResizeImage(rawImage, max_dim: MAX_IMAGE_DIM)
    END IF

    // Phase 2: Rotation Detection and Correction
    rotationAngle ← DetectRotation(rawImage)  // degrees
    IF ABS(rotationAngle) > 0.5 THEN
        rawImage ← RotateImage(rawImage, -rotationAngle)
    END IF

    // Phase 3: DPI Normalization
    currentDPI ← EstimateDPI(rawImage)
    IF currentDPI < MIN_DPI THEN
        RETURN Error("Image resolution too low")
    END IF

    IF currentDPI != TARGET_DPI THEN
        scaleFactor ← TARGET_DPI as f32 / currentDPI as f32
        rawImage ← ResizeImage(rawImage, scale: scaleFactor)
    END IF

    // Phase 4: Noise Reduction
    noiseLevel ← EstimateNoise(rawImage)
    IF noiseLevel > NOISE_THRESHOLD THEN
        rawImage ← ApplyBilateralFilter(rawImage, sigma: 2.0, radius: 3)
    END IF

    // Phase 5: Contrast Enhancement
    enhancedImage ← AdaptiveHistogramEqualization(rawImage, clip_limit: 2.0)

    // Phase 6: Text Region Detection
    regions ← DetectTextRegions(enhancedImage)

    // Phase 7: Quality Assessment
    qualityScore ← AssessQuality(enhancedImage, regions)

    metadata ← ImageMetadata {
        dpi: TARGET_DPI,
        rotation: rotationAngle,
        quality_score: qualityScore,
        has_math: ContainsMathRegions(regions)
    }

    RETURN Ok(ProcessedImage {
        data: enhancedImage.to_bytes(),
        width: enhancedImage.width,
        height: enhancedImage.height,
        channels: enhancedImage.channels,
        metadata: metadata,
        regions: regions
    })
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Image decoding: O(n) where n = pixel count
        - Rotation detection: O(n log n) using Hough transform
        - Image rotation: O(n)
        - DPI scaling: O(n)
        - Bilateral filter: O(n × r²) where r = radius (a small constant here)
        - CLAHE: O(n)
        - Region detection: O(n log n)
        Total: O(n log n)

    Space Complexity:
        - Raw image buffer: O(n)
        - Intermediate buffers: O(n)
        - Region storage: O(k) where k = region count
        Total: O(n)
```
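
The size and DPI arithmetic in Phases 1 and 3 can be sketched in Rust as below. This is a minimal illustration, not the project's implementation: `plan_dimensions` is a hypothetical helper, and it assumes the DPI passed in is the value estimated after the size clamp, as in the pseudocode.

```rust
// Constants mirror the pseudocode; the per-side limit of 4096 is an assumption
// about how the size limit is meant to be applied.
const MAX_IMAGE_DIM: u32 = 4096;
const MIN_DPI: u32 = 150;
const TARGET_DPI: u32 = 300;

/// Hypothetical helper: returns the output dimensions after the size clamp
/// (Phase 1) and DPI normalization (Phase 3), or an error for low-DPI input.
pub fn plan_dimensions(width: u32, height: u32, dpi_after_clamp: u32) -> Result<(u32, u32), String> {
    if dpi_after_clamp < MIN_DPI {
        return Err(format!("image resolution too low: {dpi_after_clamp} DPI"));
    }
    // Phase 1: shrink so the longer side fits MAX_IMAGE_DIM, keeping aspect ratio.
    let longest = width.max(height);
    let clamp = if longest > MAX_IMAGE_DIM {
        f64::from(MAX_IMAGE_DIM) / f64::from(longest)
    } else {
        1.0
    };
    // Phase 3: rescale to the target DPI using floating-point division.
    let scale = clamp * f64::from(TARGET_DPI) / f64::from(dpi_after_clamp);
    let scaled = |side: u32| ((f64::from(side) * scale).round() as u32).max(1);
    Ok((scaled(width), scaled(height)))
}
```

Note that upscaling a low-DPI image can push a side past the size limit again; this sketch, like the pseudocode, does not re-clamp after Phase 3.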

### 1.2 Rotation Detection Algorithm

```
ALGORITHM: DetectRotation
INPUT: image (Image)
OUTPUT: angle (f32)

BEGIN
    // Convert to grayscale if needed
    grayImage ← ToGrayscale(image)

    // Apply edge detection
    edges ← CannyEdgeDetection(grayImage, low: 50, high: 150)

    // Use Hough Line Transform to detect dominant lines
    lines ← HoughLineTransform(edges, rho: 1.0, theta: PI/180, threshold: 100)

    IF lines.is_empty() THEN
        RETURN 0.0
    END IF

    // Cluster angles into dominant orientations
    angles ← []
    FOR EACH line IN lines DO
        angle ← line.theta * 180 / PI
        // Normalize to [-45, 45] range
        WHILE angle > 45 DO
            angle ← angle - 90
        END WHILE
        WHILE angle < -45 DO
            angle ← angle + 90
        END WHILE
        angles.push(angle)
    END FOR

    // Use median for robustness
    angles.sort()
    medianAngle ← angles[angles.len() / 2]

    RETURN medianAngle
END

COMPLEXITY ANALYSIS:
    Time: O(n log n) for Hough transform
    Space: O(n) for edge map
```
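
The angle folding and median step can be written as a short Rust sketch. The function names are illustrative, and the input is assumed to be line angles already converted to degrees.

```rust
/// Folds an angle in degrees into [-45, 45] by adding or subtracting
/// multiples of 90, mirroring the two WHILE loops in the pseudocode.
pub fn fold_to_quadrant(mut angle: f32) -> f32 {
    while angle > 45.0 {
        angle -= 90.0;
    }
    while angle < -45.0 {
        angle += 90.0;
    }
    angle
}

/// Median of the folded angles; returns 0.0 when no lines were detected.
/// For an even count this picks the upper of the two middle values,
/// matching `angles[angles.len() / 2]`.
pub fn median_skew(angles_deg: &[f32]) -> f32 {
    if angles_deg.is_empty() {
        return 0.0;
    }
    let mut folded: Vec<f32> = angles_deg.iter().map(|&a| fold_to_quadrant(a)).collect();
    // total_cmp gives a total order on f32, so the sort cannot panic on NaN.
    folded.sort_by(|a, b| a.total_cmp(b));
    folded[folded.len() / 2]
}
```

The median is used instead of the mean so that a few stray lines (for example a table border or a fraction bar) do not pull the skew estimate.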

### 1.3 Text Region Detection

```
ALGORITHM: DetectTextRegions
INPUT: image (Image)
OUTPUT: regions (Vec<TextRegion>)

DATA STRUCTURES:
    Component {
        pixels: Vec<Point>,
        bbox: BoundingBox,
        area: u32
    }

BEGIN
    // Binarize with adaptive thresholding, then extract connected components.
    // A full MSER (Maximally Stable Extremal Regions) detector could be
    // substituted for these two steps.
    binaryImage ← AdaptiveThreshold(image, window: 15)

    components ← FindConnectedComponents(binaryImage)

    regions ← []
    FOR EACH comp IN components DO
        // Filter by geometric properties (floating-point division)
        aspectRatio ← comp.bbox.width as f32 / comp.bbox.height as f32
        density ← comp.area as f32 / (comp.bbox.width * comp.bbox.height) as f32

        IF aspectRatio > 0.1 AND aspectRatio < 10.0 AND density > 0.3 THEN
            // Classify region type
            features ← ExtractRegionFeatures(comp, image)
            regionType ← ClassifyRegion(features)

            region ← TextRegion {
                bbox: comp.bbox,
                confidence: features.confidence,
                region_type: regionType
            }
            regions.push(region)
        END IF
    END FOR

    // Merge overlapping regions
    mergedRegions ← MergeOverlappingRegions(regions, iou_threshold: 0.3)

    RETURN mergedRegions
END

COMPLEXITY ANALYSIS:
    Time: O(n × α(n)) where α is inverse Ackermann (connected components)
    Space: O(n) for the binary image and component pixels, plus O(k) for regions
           where k = component count
```
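
The geometric filter and the IoU measure used for merging can be sketched as follows. The `BoundingBox` layout (top-left corner plus size, in `f32`) is an assumption for illustration; the document does not define it.

```rust
/// Assumed layout: top-left corner plus width and height, in pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundingBox {
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
}

impl BoundingBox {
    pub fn area(&self) -> f32 {
        self.width * self.height
    }
}

/// Geometric filter from the loop above: aspect ratio in (0.1, 10) and
/// foreground density above 0.3. Degenerate boxes are rejected up front,
/// which also avoids a division by zero.
pub fn passes_geometry_filter(bbox: &BoundingBox, pixel_area: f32) -> bool {
    if bbox.width <= 0.0 || bbox.height <= 0.0 {
        return false;
    }
    let aspect_ratio = bbox.width / bbox.height;
    let density = pixel_area / bbox.area();
    aspect_ratio > 0.1 && aspect_ratio < 10.0 && density > 0.3
}

/// Intersection over union of two axis-aligned boxes, in [0, 1].
pub fn iou(a: &BoundingBox, b: &BoundingBox) -> f32 {
    let left = a.x.max(b.x);
    let top = a.y.max(b.y);
    let right = (a.x + a.width).min(b.x + b.width);
    let bottom = (a.y + a.height).min(b.y + b.height);
    let intersection = (right - left).max(0.0) * (bottom - top).max(0.0);
    let union = a.area() + b.area() - intersection;
    if union > 0.0 { intersection / union } else { 0.0 }
}
```

With `iou_threshold: 0.3`, two boxes would be merged when `iou(a, b) > 0.3`; how `MergeOverlappingRegions` combines them is not specified in the document.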

---

## 2. OCR Engine Core

### 2.1 Main OCR Pipeline

```
ALGORITHM: RecognizeText
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: Result<RecognitionResult, Error>

DATA STRUCTURES:
    RecognitionResult {
        lines: Vec<TextLine>,
        confidence: f32,
        processing_time_ms: u64
    }

    TextLine {
        text: String,
        bbox: BoundingBox,
        words: Vec<Word>,
        confidence: f32
    }

    Word {
        text: String,
        bbox: BoundingBox,
        chars: Vec<Character>,
        confidence: f32
    }

    Character {
        char: char,
        bbox: BoundingBox,
        confidence: f32,
        alternatives: Vec<(char, f32)>
    }

BEGIN
    startTime ← GetCurrentTime()

    // Phase 1: Vision Transformer Encoding
    encodedFeatures ← EncodeImageFeatures(image, model)

    // Phase 2: Text Line Detection
    textLines ← DetectTextLines(encodedFeatures, image.regions)

    // Phase 3: Character Recognition
    recognizedLines ← []
    totalConfidence ← 0.0

    FOR EACH lineRegion IN textLines DO
        lineImage ← CropRegion(image, lineRegion.bbox)

        // Run sequence-to-sequence recognition
        words ← RecognizeLineSequence(lineImage, model, encodedFeatures)

        lineText ← words.map(|w| w.text).join(" ")
        lineConfidence ← ComputeLineConfidence(words)

        textLine ← TextLine {
            text: lineText,
            bbox: lineRegion.bbox,
            words: words,
            confidence: lineConfidence
        }

        recognizedLines.push(textLine)
        totalConfidence ← totalConfidence + lineConfidence
    END FOR

    // Guard against division by zero when no text lines were found
    IF recognizedLines.is_empty() THEN
        avgConfidence ← 0.0
    ELSE
        avgConfidence ← totalConfidence / recognizedLines.len()
    END IF
    processingTime ← GetCurrentTime() - startTime  // milliseconds

    RETURN Ok(RecognitionResult {
        lines: recognizedLines,
        confidence: avgConfidence,
        processing_time_ms: processingTime
    })
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Vision Transformer encoding: O(n² × d) where n = number of patch
          tokens, d = embedding dim
        - Line detection: O(k × log k) where k = regions
        - Character recognition per line: O(m × d²) where m = line length
        - Total lines L: O(L × m × d²)
        Overall: O(n² × d + L × m × d²)

    Space Complexity:
        - Feature maps: O(n × d)
        - Attention maps: O(n² × h) where h = attention heads
        - Output storage: O(L × m)
        Total: O(n² × h + n × d)
```

### 2.2 Vision Transformer Encoding

```
ALGORITHM: EncodeImageFeatures
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: features (FeatureMap)

DATA STRUCTURES:
    FeatureMap {
        embeddings: Tensor<f32>,         // Shape: [seq_len, embed_dim]
        attention_weights: Tensor<f32>,  // Shape: [layers, heads, seq_len, seq_len]
        positions: Vec<Point>            // One per patch (the [CLS] token has none)
    }

    VisionTransformer {
        patch_size: u32,
        embed_dim: u32,
        num_heads: u32,
        num_layers: u32,
        weights: ModelWeights
    }

BEGIN
    // Phase 1: Patch Extraction
    // Integer division: pixels beyond the last full patch are dropped
    patchSize ← model.patch_size
    numPatchesH ← image.height / patchSize
    numPatchesW ← image.width / patchSize

    patches ← []
    positions ← []

    FOR h IN 0..numPatchesH DO
        FOR w IN 0..numPatchesW DO
            y ← h * patchSize
            x ← w * patchSize
            patch ← ExtractPatch(image, x, y, patchSize)
            patches.push(patch)
            positions.push(Point{x, y})
        END FOR
    END FOR

    // Phase 2: Patch Embedding
    embeddings ← []
    FOR EACH patch IN patches DO
        // Linear projection of flattened patch
        flatPatch ← Flatten(patch)
        embedding ← MatMul(model.weights.patch_projection, flatPatch)
        embeddings.push(embedding)
    END FOR

    // Phase 3: Positional Encoding
    FOR i IN 0..embeddings.len() DO
        posEncoding ← ComputePositionalEncoding(i, model.embed_dim)
        embeddings[i] ← embeddings[i] + posEncoding
    END FOR

    // Add [CLS] token
    clsToken ← model.weights.cls_token
    embeddings.insert(0, clsToken)

    // Phase 4: Transformer Layers
    x ← Tensor::from(embeddings)
    allAttentionWeights ← []

    FOR layer IN 0..model.num_layers DO
        // Multi-head self-attention
        (x, attentionWeights) ← MultiHeadAttention(
            x,
            model.weights.layers[layer],
            num_heads: model.num_heads
        )

        allAttentionWeights.push(attentionWeights)

        // Feed-forward network
        x ← FeedForward(x, model.weights.layers[layer])

        // Layer normalization
        x ← LayerNorm(x, model.weights.layers[layer])
    END FOR

    RETURN FeatureMap {
        embeddings: x,
        attention_weights: Stack(allAttentionWeights),
        positions: positions
    }
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Patch extraction: O(n) where n = pixels
        - Patch embedding: O(p × d²) where p = patches, d = embed_dim
        - Attention per layer: O(p² × d)
        - Total layers L: O(L × p² × d)
        Overall: O(L × p² × d)

    Space Complexity:
        - Embeddings: O(p × d)
        - Attention matrices: O(L × h × p²) where h = heads
        Total: O(L × h × p² + p × d)
```
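
Phases 1 and 3 can be illustrated with a small Rust sketch. The document does not say which scheme `ComputePositionalEncoding` uses; the sinusoidal encoding below is one common choice and is an assumption, as are the function names.

```rust
/// Top-left corners of the non-overlapping patches, in row-major order.
/// Pixels beyond the last full patch are ignored, as with the integer
/// division in the pseudocode.
pub fn patch_positions(width: u32, height: u32, patch_size: u32) -> Vec<(u32, u32)> {
    assert!(patch_size > 0, "patch_size must be positive");
    let rows = height / patch_size;
    let cols = width / patch_size;
    let mut positions = Vec::with_capacity(rows as usize * cols as usize);
    for r in 0..rows {
        for c in 0..cols {
            positions.push((c * patch_size, r * patch_size));
        }
    }
    positions
}

/// Sinusoidal positional encoding for sequence index `pos` (assumed scheme):
/// even dimensions use sine, odd dimensions use cosine, with wavelengths
/// growing geometrically up to 10000.
pub fn sinusoidal_encoding(pos: usize, embed_dim: usize) -> Vec<f32> {
    (0..embed_dim)
        .map(|i| {
            let exponent = (2 * (i / 2)) as f32 / embed_dim as f32;
            let angle = pos as f32 / 10000f32.powf(exponent);
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}
```

For a 64×32 image with 16-pixel patches this yields a 2×4 grid of eight patches; a learned position table would be an equally plausible reading of the pseudocode.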

### 2.3 Character Recognition Sequence

```
ALGORITHM: RecognizeLineSequence
INPUT: lineImage (Image), model (VisionTransformer), features (FeatureMap)
OUTPUT: words (Vec<Word>)

DATA STRUCTURES:
    BeamSearchState {
        sequence: Vec<Token>,     // Token: a character, SPACE_TOKEN, or END_TOKEN
        score: f32,               // cumulative log-probability
        hidden_state: Tensor<f32>
    }

CONSTANTS:
    BEAM_WIDTH = 5
    MAX_SEQUENCE_LENGTH = 256
    END_TOKEN = '<END>'
    SPACE_TOKEN = '<SPACE>'

BEGIN
    // Initialize beam search
    initialState ← BeamSearchState {
        sequence: [],
        score: 0.0,
        hidden_state: features.embeddings[0]  // CLS token
    }

    beams ← [initialState]

    // Beam search decoding
    FOR step IN 0..MAX_SEQUENCE_LENGTH DO
        candidates ← []

        FOR EACH beam IN beams DO
            // Finished beams carry over unchanged
            IF beam.sequence.last() == Some(END_TOKEN) THEN
                candidates.push(beam)
                CONTINUE
            END IF

            // Get character probabilities from model
            (logits, newHiddenState) ← model.decode_step(
                beam.hidden_state,
                features.embeddings
            )

            probabilities ← Softmax(logits)

            // Get top-k characters
            topK ← GetTopK(probabilities, k: BEAM_WIDTH)

            FOR EACH (token, prob) IN topK DO
                newSequence ← beam.sequence.clone()
                newSequence.push(token)

                // Log probability for numerical stability
                newScore ← beam.score + LOG(prob)

                newBeam ← BeamSearchState {
                    sequence: newSequence,
                    score: newScore,
                    hidden_state: newHiddenState
                }

                candidates.push(newBeam)
            END FOR
        END FOR

        // Keep top BEAM_WIDTH candidates (descending score; floats need a
        // total-order comparison)
        candidates.sort_by(|a, b| b.score.total_cmp(a.score))
        beams ← candidates[0..MIN(BEAM_WIDTH, candidates.len())]

        // Check if all beams ended
        allEnded ← beams.all(|b| b.sequence.last() == Some(END_TOKEN))
        IF allEnded THEN
            BREAK
        END IF
    END FOR

    // Take best beam
    bestBeam ← beams[0]

    // The beam does not keep per-token probabilities, so every word and
    // character reports the length-normalized sequence confidence
    // (geometric mean of the token probabilities).
    seqConfidence ← EXP(bestBeam.score / MAX(bestBeam.sequence.len(), 1))

    // Ensure the final word is flushed when decoding stopped at
    // MAX_SEQUENCE_LENGTH without emitting END_TOKEN
    tokens ← bestBeam.sequence.clone()
    IF tokens.last() != Some(END_TOKEN) THEN
        tokens.push(END_TOKEN)
    END IF

    // Split sequence into words
    words ← []
    currentWord ← []
    currentBBox ← BoundingBox::new()

    FOR i IN 0..tokens.len() DO
        token ← tokens[i]

        IF token == SPACE_TOKEN OR token == END_TOKEN THEN
            IF NOT currentWord.is_empty() THEN
                wordText ← currentWord.map(|c| c.char).join("")
                word ← Word {
                    text: wordText,
                    bbox: currentBBox,
                    chars: currentWord.clone(),
                    confidence: seqConfidence
                }
                words.push(word)
                currentWord.clear()
                currentBBox ← BoundingBox::new()
            END IF
        ELSE
            charBBox ← EstimateCharBBox(lineImage, i)
            currentBBox ← currentBBox.union(charBBox)
            currentWord.push(Character {
                char: token,
                bbox: charBBox,
                confidence: seqConfidence,
                alternatives: []
            })
        END IF
    END FOR

    RETURN words
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Softmax and top-k per step: O(B × V) where:
            B = beam width
            V = vocabulary size
        - Sorting per step: O(B² × log B), since each beam contributes at
          most B candidates
        Overall: O(T × (B × V + B² × log B)) where T = max sequence length,
        excluding the cost of model.decode_step

    Space Complexity:
        - Beam storage: O(B × (T + d)) where d = hidden dim
        - Candidate buffer: O(B² × (T + d))
        Total: O(B² × (T + d))
```
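
The decoding loop can be exercised with a toy model. The sketch below is an assumption-laden illustration: tokens are plain `usize` ids, the decoder is any closure from a prefix to a probability vector (standing in for `model.decode_step` plus softmax), and `toy_step` is a hypothetical three-token model chosen so that greedy decoding and beam search disagree.

```rust
/// One hypothesis: token ids so far and the cumulative log-probability.
#[derive(Debug, Clone)]
pub struct Beam {
    pub tokens: Vec<usize>,
    pub score: f32,
}

/// Beam search over a step function that maps a prefix to next-token
/// probabilities. Returns the highest-scoring hypothesis.
pub fn beam_search<F>(step: F, end_token: usize, beam_width: usize, max_len: usize) -> Beam
where
    F: Fn(&[usize]) -> Vec<f32>,
{
    let mut beams = vec![Beam { tokens: Vec::new(), score: 0.0 }];
    for _ in 0..max_len {
        let mut candidates: Vec<Beam> = Vec::new();
        for beam in &beams {
            // Finished hypotheses carry over unchanged.
            if beam.tokens.last() == Some(&end_token) {
                candidates.push(beam.clone());
                continue;
            }
            let probs = step(&beam.tokens);
            // Rank token ids by descending probability and expand the top ones.
            let mut ranked: Vec<usize> = (0..probs.len()).collect();
            ranked.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
            for &tok in ranked.iter().take(beam_width) {
                if probs[tok] <= 0.0 {
                    continue; // ln(0) is -inf; skip impossible tokens
                }
                let mut tokens = beam.tokens.clone();
                tokens.push(tok);
                candidates.push(Beam { tokens, score: beam.score + probs[tok].ln() });
            }
        }
        if candidates.is_empty() {
            break;
        }
        // Keep the best `beam_width` hypotheses by log-probability.
        candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
        candidates.truncate(beam_width);
        beams = candidates;
        if beams.iter().all(|b| b.tokens.last() == Some(&end_token)) {
            break;
        }
    }
    beams.swap_remove(0)
}

/// Hypothetical toy model over the vocabulary {0: 'a', 1: 'b', 2: END}.
pub fn toy_step(prefix: &[usize]) -> Vec<f32> {
    match prefix {
        [] => vec![0.6, 0.4, 0.0],
        [0] => vec![0.4, 0.3, 0.3],
        [1] => vec![0.05, 0.0, 0.95],
        _ => vec![0.0, 0.0, 1.0],
    }
}
```

With `beam_width = 1` (greedy) the search commits to token 0 first and ends with probability 0.24, while `beam_width = 2` keeps the initially less likely token 1 alive and finds the sequence `[1, 2]` with probability 0.38.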

---

## 3. Mathematical Expression Parser

### 3.1 Math Expression Recognition

```
ALGORITHM: RecognizeMathExpression
INPUT: region (TextRegion), image (ProcessedImage), model (MathModel)
OUTPUT: Result<MathExpression, Error>

DATA STRUCTURES:
    MathExpression {
        latex: String,
        tree: ExpressionTree,
        symbols: Vec<MathSymbol>,
        confidence: f32
    }

    ExpressionTree {
        root: Box<TreeNode>,
        height: u32
    }

    TreeNode {
        symbol: MathSymbol,
        relationship: SpatialRelation,
        children: Vec<Box<TreeNode>>
    }

    MathSymbol {
        symbol_type: SymbolType,  // Digit, Operator, Letter, Special
        value: String,
        bbox: BoundingBox,
        confidence: f32
    }

    SpatialRelation {
        relation_type: RelationType,  // Above, Below, Right, Superscript, Subscript
        distance: f32,
        alignment: f32
    }

BEGIN
    // Phase 1: Extract math region
    mathImage ← CropRegion(image, region.bbox)

    // Phase 2: Symbol Detection and Classification
    symbols ← DetectMathSymbols(mathImage, model)

    IF symbols.is_empty() THEN
        RETURN Error("No mathematical symbols detected")
    END IF

    // Phase 3: Spatial Relationship Analysis
    relationships ← AnalyzeSpatialRelationships(symbols)

    // Phase 4: Expression Tree Construction
    // `?` propagates tree-construction failures (see BuildSubtree)
    tree ← BuildExpressionTree(symbols, relationships)?

    // Phase 5: LaTeX Generation
    latex ← GenerateLaTeX(tree)

    // Calculate overall confidence (symbols is non-empty here)
    avgConfidence ← symbols.map(|s| s.confidence).average()

    RETURN Ok(MathExpression {
        latex: latex,
        tree: tree,
        symbols: symbols,
        confidence: avgConfidence
    })
END

COMPLEXITY ANALYSIS:
    Time: O(n² × log n) where n = symbol count
    Space: O(n × h) where h = tree height
```

### 3.2 Symbol Detection and Classification

```
ALGORITHM: DetectMathSymbols
INPUT: mathImage (Image), model (MathModel)
OUTPUT: symbols (Vec<MathSymbol>)

CONSTANTS:
    SYMBOL_MIN_SIZE = 8 pixels
    SYMBOL_MAX_SIZE = 128 pixels
    CONFIDENCE_THRESHOLD = 0.7
    ROW_BAND_HEIGHT = 10 pixels   // symbols in the same band count as one row

BEGIN
    // Phase 1: Connected Component Analysis
    binaryImage ← AdaptiveThreshold(mathImage, window: 11)
    components ← FindConnectedComponents(binaryImage)

    symbols ← []

    FOR EACH comp IN components DO
        // Filter by size
        width ← comp.bbox.width
        height ← comp.bbox.height

        IF width < SYMBOL_MIN_SIZE OR height < SYMBOL_MIN_SIZE THEN
            CONTINUE
        END IF

        IF width > SYMBOL_MAX_SIZE OR height > SYMBOL_MAX_SIZE THEN
            // Might be compound symbol, try to split
            subComponents ← SplitComponent(comp)
            FOR EACH subComp IN subComponents DO
                ProcessSymbol(subComp, mathImage, model, symbols)
            END FOR
        ELSE
            ProcessSymbol(comp, mathImage, model, symbols)
        END IF
    END FOR

    // Sort symbols into reading order: top-to-bottom by row band, then
    // left-to-right. Bucketing y into fixed bands gives a consistent total
    // order; a pairwise test such as ABS(a.y - b.y) < 10 is not transitive
    // and is unsafe as a sort comparator.
    symbols.sort_by_key(|s| (s.bbox.y / ROW_BAND_HEIGHT, s.bbox.x))

    RETURN symbols
END

SUBROUTINE: ProcessSymbol
INPUT: component (Component), image (Image), model (MathModel), symbols (&mut Vec<MathSymbol>)
OUTPUT: None (modifies symbols)

BEGIN
    // Extract symbol image
    symbolImage ← CropRegion(image, component.bbox)

    // Normalize to model input size
    normalizedSymbol ← ResizeImage(symbolImage, 64, 64)

    // Classify symbol
    (symbolClass, confidence) ← model.classify_symbol(normalizedSymbol)

    IF confidence >= CONFIDENCE_THRESHOLD THEN
        symbol ← MathSymbol {
            symbol_type: DetermineSymbolType(symbolClass),
            value: symbolClass.to_string(),
            bbox: component.bbox,
            confidence: confidence
        }

        symbols.push(symbol)
    END IF
END

COMPLEXITY ANALYSIS:
    Time: O(n × c) where n = components, c = classification time
    Space: O(n) for symbol storage
```
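
A reading-order sort needs a comparator that is a consistent total order. One way to get that, sketched below under the assumption that rows can be approximated by fixed-height bands, is to sort by a `(band, x)` key. The `Symbol` struct and the `row_band` parameter are illustrative, not part of the document's data model.

```rust
/// Minimal stand-in for a detected symbol: value plus top-left corner.
#[derive(Debug, Clone, PartialEq)]
pub struct Symbol {
    pub value: String,
    pub x: u32,
    pub y: u32,
}

/// Sorts symbols top-to-bottom by row band, then left-to-right within a band.
/// Sorting by a key tuple is always a valid total order, unlike comparing
/// "are these two y values within 10 pixels" pair by pair.
pub fn sort_reading_order(symbols: &mut [Symbol], row_band: u32) {
    assert!(row_band > 0, "row_band must be positive");
    symbols.sort_by_key(|s| (s.y / row_band, s.x));
}
```

The trade-off is that two symbols on either side of a band boundary (say y = 9 and y = 11 with a 10-pixel band) land in different rows; clustering baselines first would avoid that at the cost of more code.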

### 3.3 Spatial Relationship Analysis

```
ALGORITHM: AnalyzeSpatialRelationships
INPUT: symbols (Vec<MathSymbol>)
OUTPUT: relationships (Vec<(usize, usize, SpatialRelation)>)

DATA STRUCTURES:
    RelationFeatures {
        horizontal_distance: f32,
        vertical_distance: f32,
        size_ratio: f32,
        vertical_alignment: f32,
        horizontal_alignment: f32
    }

CONSTANTS:
    SUPERSCRIPT_Y_THRESHOLD = 0.6  // Relative to symbol height
    SUBSCRIPT_Y_THRESHOLD = 0.4
    FRACTION_ALIGNMENT_THRESHOLD = 0.8

BEGIN
    relationships ← []

    // Build spatial index for efficient queries
    spatialIndex ← BuildQuadTree(symbols)

    FOR i IN 0..symbols.len() DO
        symbolA ← symbols[i]

        // Find nearby symbols
        nearbySymbols ← spatialIndex.query_radius(
            symbolA.bbox.center(),
            radius: symbolA.bbox.width * 3
        )

        FOR EACH (j, symbolB) IN nearbySymbols DO
            IF i >= j THEN
                CONTINUE  // Avoid duplicate pairs
            END IF

            // Extract relationship features
            features ← ExtractRelationFeatures(symbolA, symbolB)

            // Classify relationship
            relation ← ClassifyRelation(features, symbolA, symbolB)

            IF relation.is_some() THEN
                relationships.push((i, j, relation.unwrap()))
            END IF
        END FOR
    END FOR

    RETURN relationships
END

SUBROUTINE: ClassifyRelation
INPUT: features (RelationFeatures), symbolA (MathSymbol), symbolB (MathSymbol)
OUTPUT: Option<SpatialRelation>

BEGIN
    centerA ← symbolA.bbox.center()
    centerB ← symbolB.bbox.center()

    // Image coordinates: y grows downward, so negative deltaY means
    // symbolB sits above symbolA
    deltaX ← centerB.x - centerA.x
    deltaY ← centerB.y - centerA.y

    // Determine dominant relationship

    // Superscript/Subscript detection
    IF deltaX > 0 AND deltaX < symbolA.bbox.width * 1.5 THEN
        relativeY ← deltaY / symbolA.bbox.height

        IF relativeY < -SUPERSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Superscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        ELSE IF relativeY > SUBSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Subscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        END IF
    END IF

    // Fraction detection (vertical alignment)
    IF features.vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD THEN
        IF deltaY < 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Above,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        ELSE IF deltaY > 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Below,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        END IF
    END IF

    // Horizontal sequence (default)
    IF deltaX > 0 AND ABS(deltaY) < symbolA.bbox.height * 0.3 THEN
        RETURN Some(SpatialRelation {
            relation_type: Right,
            distance: deltaX,
            alignment: features.horizontal_alignment
        })
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - QuadTree construction: O(n log n)
        - For each symbol, query nearby: O(log n + k) where k = nearby count
        - Total: O(n × (log n + k))
        Average case: O(n log n) if k is constant

    Space Complexity:
        - QuadTree: O(n)
        - Relationships: O(n²) worst case, O(n) average
        Total: O(n²) worst case
```
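
The decision order in `ClassifyRelation` translates directly into Rust. In this sketch the `BBox` layout is assumed, only the relation type is returned (distance and alignment are omitted for brevity), and `vertical_alignment` is passed in because the document does not specify how `ExtractRelationFeatures` computes it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RelationType {
    Above,
    Below,
    Right,
    Superscript,
    Subscript,
}

/// Assumed layout: top-left corner plus size; y grows downward.
#[derive(Debug, Clone, Copy)]
pub struct BBox {
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
}

impl BBox {
    fn center(&self) -> (f32, f32) {
        (self.x + self.width / 2.0, self.y + self.height / 2.0)
    }
}

const SUPERSCRIPT_Y_THRESHOLD: f32 = 0.6;
const SUBSCRIPT_Y_THRESHOLD: f32 = 0.4;
const FRACTION_ALIGNMENT_THRESHOLD: f32 = 0.8;

/// Classifies how `b` relates to `a`: scripts first, then fraction-style
/// stacking, then plain horizontal adjacency.
pub fn classify_relation(a: &BBox, b: &BBox, vertical_alignment: f32) -> Option<RelationType> {
    let (ax, ay) = a.center();
    let (bx, by) = b.center();
    let (dx, dy) = (bx - ax, by - ay);

    // Script candidates sit just to the right of the base symbol.
    if dx > 0.0 && dx < a.width * 1.5 {
        let relative_y = dy / a.height;
        if relative_y < -SUPERSCRIPT_Y_THRESHOLD {
            return Some(RelationType::Superscript);
        }
        if relative_y > SUBSCRIPT_Y_THRESHOLD {
            return Some(RelationType::Subscript);
        }
    }
    // Vertically stacked symbols (numerator or denominator).
    if vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD {
        if dy < 0.0 {
            return Some(RelationType::Above);
        }
        if dy > 0.0 {
            return Some(RelationType::Below);
        }
    }
    // Default: next symbol on the same baseline.
    if dx > 0.0 && dy.abs() < a.height * 0.3 {
        return Some(RelationType::Right);
    }
    None
}
```

Because the script test runs first, a small symbol up and to the right of its base is treated as a superscript even when it is also well aligned vertically.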

### 3.4 Expression Tree Construction

```
ALGORITHM: BuildExpressionTree
INPUT: symbols (Vec<MathSymbol>), relationships (Vec<(usize, usize, SpatialRelation)>)
OUTPUT: Result<ExpressionTree, Error>

DATA STRUCTURES:
    TreeBuilder {
        nodes: Vec<Box<TreeNode>>,
        parent_map: HashMap<usize, usize>,
        relation_graph: AdjacencyList
    }

BEGIN
    // Phase 1: Build relationship graph
    graph ← BuildRelationGraph(symbols, relationships)

    // Phase 2: Identify root candidates (symbols with no parents)
    rootCandidates ← []
    FOR i IN 0..symbols.len() DO
        IF NOT HasIncomingEdge(graph, i, excludeRight: true) THEN
            rootCandidates.push(i)
        END IF
    END FOR

    IF rootCandidates.is_empty() THEN
        RETURN Error("No root symbol found")
    END IF

    // Phase 3: Build tree from leftmost root
    rootCandidates.sort_by(|a, b| {
        symbols[*a].bbox.x.cmp(&symbols[*b].bbox.x)
    })

    rootIdx ← rootCandidates[0]

    // Phase 4: Recursive tree construction
    // visited is shared by reference across the whole recursion
    visited ← Set::new()
    root ← BuildSubtree(rootIdx, symbols, graph, &mut visited)?

    // Phase 5: Calculate tree height
    height ← CalculateHeight(root)

    RETURN Ok(ExpressionTree {
        root: root,
        height: height
    })
END

SUBROUTINE: BuildSubtree
INPUT: nodeIdx (usize), symbols (Vec<MathSymbol>), graph (AdjacencyList), visited (&mut Set<usize>)
OUTPUT: Result<Box<TreeNode>, Error>

BEGIN
    // Defensive check: callers skip visited targets, so this should not trigger
    IF visited.contains(nodeIdx) THEN
        RETURN Error("Cycle detected in expression tree")
    END IF

    visited.insert(nodeIdx)

    symbol ← symbols[nodeIdx]
    children ← []

    // Get all outgoing edges sorted by relationship priority
    edges ← graph.get_outgoing(nodeIdx)
    edges.sort_by(|a, b| {
        // Priority: Superscript > Subscript > Above > Below > Right
        GetRelationPriority(a.relation).cmp(GetRelationPriority(b.relation))
    })

    FOR EACH edge IN edges DO
        IF NOT visited.contains(edge.target) THEN
            childNode ← BuildSubtree(edge.target, symbols, graph, visited)?
            childNode.relationship ← edge.relation
            children.push(childNode)
        END IF
    END FOR

    node ← TreeNode {
        symbol: symbol.clone(),
        relationship: SpatialRelation::default(),
        children: children
    }

    RETURN Ok(Box::new(node))
END

COMPLEXITY ANALYSIS:
    Time: O(n × log n) for graph construction and tree building
    Space: O(n × h) where h = average tree height
```

### 3.5 LaTeX Generation

```
ALGORITHM: GenerateLaTeX
INPUT: tree (ExpressionTree)
OUTPUT: latex (String)

BEGIN
    latex ← RecursiveGenerateLaTeX(tree.root)

    // Wrap in delimiters
    latex ← "\\(" + latex + "\\)"

    RETURN latex
END

SUBROUTINE: RecursiveGenerateLaTeX
INPUT: node (Box<TreeNode>)
OUTPUT: latex (String)

BEGIN
    symbol ← node.symbol
    baseLatex ← SymbolToLatex(symbol)

    // Group children by relationship type
    superscripts ← []
    subscripts ← []
    numerator ← None
    denominator ← None
    rightChildren ← []

    FOR EACH child IN node.children DO
        MATCH child.relationship.relation_type:
            Superscript → superscripts.push(child)
            Subscript → subscripts.push(child)
            Above → numerator ← Some(child)
            Below → denominator ← Some(child)
            Right → rightChildren.push(child)
        END MATCH
    END FOR

    // Build LaTeX string
    result ← baseLatex

    // Handle fractions
    IF numerator.is_some() AND denominator.is_some() THEN
        numLatex ← RecursiveGenerateLaTeX(numerator.unwrap())
        denomLatex ← RecursiveGenerateLaTeX(denominator.unwrap())
        result ← "\\frac{" + numLatex + "}{" + denomLatex + "}"
    END IF

    // Handle superscripts
    IF NOT superscripts.is_empty() THEN
        superLatex ← superscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "^{" + superLatex + "}"
    END IF

    // Handle subscripts
    IF NOT subscripts.is_empty() THEN
        subLatex ← subscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "_{" + subLatex + "}"
    END IF

    // Handle right children (sequential)
    FOR EACH child IN rightChildren DO
        childLatex ← RecursiveGenerateLaTeX(child)

        // Add spacing for operators
        IF IsOperator(child.symbol) THEN
            result ← result + " " + childLatex + " "
        ELSE
            result ← result + childLatex
        END IF
    END FOR

    RETURN result
END

SUBROUTINE: SymbolToLatex
INPUT: symbol (MathSymbol)
OUTPUT: latex (String)

BEGIN
    MATCH symbol.symbol_type:
        Digit → RETURN symbol.value
        Letter → RETURN symbol.value
        Operator → RETURN OperatorToLatex(symbol.value)
        Special → RETURN SpecialToLatex(symbol.value)
    END MATCH
END

COMPLEXITY ANALYSIS:
    Time: O(n) where n = nodes in tree
    Space: O(h) for recursion stack where h = tree height
```
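
The recursive generation step can be tried on a small hand-built tree. This sketch uses a simplified, hypothetical `Node` type in place of `TreeNode` (the LaTeX text and an operator flag are stored directly), and it concatenates all `Above` or `Below` children rather than keeping only the last one as the pseudocode does.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Relation {
    Root,
    Above,
    Below,
    Right,
    Superscript,
    Subscript,
}

/// Simplified stand-in for TreeNode.
#[derive(Debug, Clone)]
pub struct Node {
    pub latex: String,
    pub relation: Relation,
    pub is_operator: bool,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(latex: &str, relation: Relation) -> Self {
        Node { latex: latex.to_string(), relation, is_operator: false, children: Vec::new() }
    }
    pub fn operator(latex: &str, relation: Relation) -> Self {
        Node { is_operator: true, ..Node::new(latex, relation) }
    }
    pub fn with(mut self, child: Node) -> Self {
        self.children.push(child);
        self
    }
}

/// Emits LaTeX for a node: fraction (if stacked children exist), then
/// scripts, then right-hand siblings in order.
pub fn to_latex(node: &Node) -> String {
    // Concatenated LaTeX of all children with the given relation.
    let group = |rel: Relation| -> String {
        node.children.iter().filter(|c| c.relation == rel).map(to_latex).collect()
    };
    let (num, den) = (group(Relation::Above), group(Relation::Below));
    let mut out = if !num.is_empty() && !den.is_empty() {
        format!("\\frac{{{num}}}{{{den}}}")
    } else {
        node.latex.clone()
    };
    let sup = group(Relation::Superscript);
    if !sup.is_empty() {
        out.push_str(&format!("^{{{sup}}}"));
    }
    let sub = group(Relation::Subscript);
    if !sub.is_empty() {
        out.push_str(&format!("_{{{sub}}}"));
    }
    for child in node.children.iter().filter(|c| c.relation == Relation::Right) {
        let text = to_latex(child);
        if child.is_operator {
            out.push_str(&format!(" {text} "));
        } else {
            out.push_str(&text);
        }
    }
    out
}
```

As in the pseudocode, a node with both a numerator and a denominator is replaced by the `\frac` form, so the fraction bar itself never appears in the output.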

---

## 4. Output Format Conversion

### 4.1 Multi-Format Generation

```
ALGORITHM: ConvertToFormats
INPUT: mathExpr (MathExpression), formats (Vec<OutputFormat>)
OUTPUT: Result<HashMap<OutputFormat, String>, Error>

DATA STRUCTURES:
    enum OutputFormat {
        MMD,          // Markdown with delimiters
        LaTeXStyled,  // Standalone LaTeX
        MathML,       // MathML XML
        HTML          // Rendered HTML
    }

BEGIN
    results ← HashMap::new()

    FOR EACH format IN formats DO
        output ← MATCH format:
            MMD → GenerateMMD(mathExpr)
            LaTeXStyled → GenerateStyledLaTeX(mathExpr)
            MathML → GenerateMathML(mathExpr.tree)
            HTML → GenerateHTML(mathExpr)
        END MATCH

        results.insert(format, output)
    END FOR

    RETURN Ok(results)
END

COMPLEXITY ANALYSIS:
    Time: O(f × n) where f = format count, n = expression size
    Space: O(f × n) for storing all formats
```

### 4.2 MMD Generation

```
ALGORITHM: GenerateMMD
INPUT: mathExpr (MathExpression)
OUTPUT: mmd (String)

CONSTANTS:
    INLINE_DELIMITER = "$"
    DISPLAY_DELIMITER = "$$"

BEGIN
    // GenerateLaTeX wraps its output in \( ... \); strip that wrapper so the
    // expression is not delimited twice
    latex ← StripMathDelimiters(mathExpr.latex)

    // Determine if expression should be display or inline
    isDisplayMath ← ShouldBeDisplayMath(mathExpr)

    IF isDisplayMath THEN
        mmd ← DISPLAY_DELIMITER + "\n" + latex + "\n" + DISPLAY_DELIMITER
    ELSE
        mmd ← INLINE_DELIMITER + latex + INLINE_DELIMITER
    END IF

    RETURN mmd
END

SUBROUTINE: ShouldBeDisplayMath
INPUT: mathExpr (MathExpression)
OUTPUT: isDisplay (bool)

BEGIN
    // Display math if:
    // 1. Contains fractions or large operators
    // 2. Tree height > 2
    // 3. More than 10 symbols (a proxy for rendered width)

    hasFractions ← mathExpr.latex.contains("\\frac")
    hasLargeOps ← mathExpr.latex.contains("\\sum") OR
                  mathExpr.latex.contains("\\int") OR
                  mathExpr.latex.contains("\\prod")

    isTall ← mathExpr.tree.height > 2
    isWide ← mathExpr.symbols.len() > 10

    RETURN hasFractions OR hasLargeOps OR isTall OR isWide
END

COMPLEXITY ANALYSIS:
    Time: O(n) where n = LaTeX string length
    Space: O(n) for output string
```
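
The inline versus display decision can be sketched in Rust as below. The function names are illustrative; tree height and symbol count are passed as plain numbers instead of a full `MathExpression`, and stripping the `\( ... \)` wrapper is an assumption about how the LaTeX string arrives from the generation step.

```rust
/// Removes a surrounding `\( ... \)` pair if present; otherwise returns the
/// input unchanged.
pub fn strip_math_delimiters(latex: &str) -> &str {
    latex
        .strip_prefix("\\(")
        .and_then(|rest| rest.strip_suffix("\\)"))
        .unwrap_or(latex)
}

/// Display math if the expression has fractions or large operators, a tree
/// taller than 2, or more than 10 symbols.
pub fn should_be_display_math(latex: &str, tree_height: u32, symbol_count: usize) -> bool {
    let has_fractions = latex.contains("\\frac");
    let has_large_ops = ["\\sum", "\\int", "\\prod"].iter().any(|op| latex.contains(*op));
    has_fractions || has_large_ops || tree_height > 2 || symbol_count > 10
}

/// Wraps the expression in `$...$` or in `$$` lines, depending on the checks above.
pub fn generate_mmd(latex: &str, tree_height: u32, symbol_count: usize) -> String {
    let body = strip_math_delimiters(latex);
    if should_be_display_math(body, tree_height, symbol_count) {
        format!("$$\n{body}\n$$")
    } else {
        format!("${body}$")
    }
}
```

The substring checks are deliberately cheap; they would also match longer commands that merely start with the same letters, which only matters if such commands should stay inline.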

### 4.3 MathML Generation

```
ALGORITHM: GenerateMathML
INPUT: tree (ExpressionTree)
OUTPUT: mathml (String)

BEGIN
    xml ← XMLBuilder::new()
    xml.start_element("math", [("xmlns", "http://www.w3.org/1998/Math/MathML")])

    RecursiveGenerateMathML(tree.root, xml)

    xml.end_element("math")

    RETURN xml.to_string()
END

SUBROUTINE: RecursiveGenerateMathML
INPUT: node (Box<TreeNode>), xml (XMLBuilder)
OUTPUT: None (modifies xml)

BEGIN
    // Emit the base token exactly once: inside the script element when the
    // node has a script, or on its own otherwise. A fraction replaces the
    // base (the fraction bar) entirely.
    // msup/msub take exactly two children; wrap a script in <mrow> if it
    // expands to more than one element.
    IF HasSuperscript(node) THEN
        xml.start_element("msup")
        EmitToken(node.symbol, xml)
        RecursiveGenerateMathML(GetSuperscript(node), xml)
        xml.end_element("msup")
    ELSE IF HasSubscript(node) THEN
        xml.start_element("msub")
        EmitToken(node.symbol, xml)
        RecursiveGenerateMathML(GetSubscript(node), xml)
        xml.end_element("msub")
    ELSE IF HasFraction(node) THEN
        xml.start_element("mfrac")
        RecursiveGenerateMathML(GetNumerator(node), xml)
        RecursiveGenerateMathML(GetDenominator(node), xml)
        xml.end_element("mfrac")
    ELSE
        EmitToken(node.symbol, xml)
    END IF

    // Process right children
    FOR EACH child IN GetRightChildren(node) DO
        RecursiveGenerateMathML(child, xml)
    END FOR
END

SUBROUTINE: EmitToken
INPUT: symbol (MathSymbol), xml (XMLBuilder)
OUTPUT: None (modifies xml)

BEGIN
    // Determine MathML token element: mn for numbers, mi for identifiers,
    // mo for operators
    MATCH symbol.symbol_type:
        Digit →
            xml.element("mn", symbol.value)

        Letter →
            xml.element("mi", symbol.value)

        Operator →
            xml.element("mo", symbol.value)

        Special →
            HandleSpecialSymbol(symbol, xml)
    END MATCH
END

COMPLEXITY ANALYSIS:
    Time: O(n) tree traversal
    Space: O(n) for XML string
```
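
Token emission is the part of MathML generation that is easiest to get subtly wrong, so here is a small string-based Rust sketch. It does not use an XML builder; the helper names are hypothetical, and `Special` symbols are left out because the document does not say how `HandleSpecialSymbol` maps them.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SymbolType {
    Digit,
    Letter,
    Operator,
}

/// Escapes the characters that are unsafe in XML text content.
pub fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(ch),
        }
    }
    out
}

/// MathML token element: <mn> for numbers, <mi> for identifiers,
/// <mo> for operators.
pub fn token_element(kind: SymbolType, value: &str) -> String {
    let tag = match kind {
        SymbolType::Digit => "mn",
        SymbolType::Letter => "mi",
        SymbolType::Operator => "mo",
    };
    format!("<{tag}>{}</{tag}>", escape_xml(value))
}

/// <msup> takes exactly two children: the base and the superscript.
pub fn msup(base: &str, exponent: &str) -> String {
    format!("<msup>{base}{exponent}</msup>")
}

/// Wraps a body in the root <math> element with the MathML namespace.
pub fn math_document(body: &str) -> String {
    format!("<math xmlns=\"http://www.w3.org/1998/Math/MathML\">{body}</math>")
}
```

Escaping matters here because relational operators such as `<` and `>` are ordinary math symbols but would otherwise break the markup.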

### 4.4 HTML Rendering

```
ALGORITHM: GenerateHTML
INPUT: mathExpr (MathExpression)
OUTPUT: html (String)

BEGIN
    // Use KaTeX or MathJax for rendering. The <script type="math/tex"> form
    // below is the MathJax 2 convention; other renderers may need an
    // extension or an explicit render call.
    // The script body takes raw TeX, so strip the \( ... \) wrapper added by
    // GenerateLaTeX.
    latex ← StripMathDelimiters(mathExpr.latex)

    html ← """
    <div class="math-expression" data-confidence="{mathExpr.confidence}">
        <script type="math/tex">
            {latex}
        </script>
    </div>
    """

    // Add accessibility attributes
    html ← AddAriaLabels(html, mathExpr)

    RETURN html
END

COMPLEXITY ANALYSIS:
    Time: O(n) string concatenation
    Space: O(n) output size
```

---

## 5. Batch Processing

### 5.1 Parallel Batch Processing

```
ALGORITHM: ProcessBatch
INPUT: inputs (Vec<InputSource>), config (ProcessConfig)
OUTPUT: Result<Vec<ProcessResult>, Error>

DATA STRUCTURES:
    InputSource {
        source_type: SourceType,  // Image, PDF, Directory
        path: PathBuf,
        page_range: Option<Range<u32>>
    }

    ProcessResult {
        input: InputSource,
        output: RecognitionResult,
        processing_time_ms: u64,
        status: ResultStatus
    }

    ProcessConfig {
        max_parallel: usize,
        timeout_ms: u64,
        cache_enabled: bool
    }

BEGIN
    // Phase 1: Expand inputs (handle PDFs and directories)
    expandedInputs ← []
    FOR EACH input IN inputs DO
        MATCH input.source_type:
            PDF →
                // Propagate a PDF loading failure to the caller
                pages ← ExtractPDFPages(input.path, input.page_range)?
                expandedInputs.extend(pages)
            Directory →
                images ← FindImagesInDirectory(input.path)
                expandedInputs.extend(images)
            Image →
                expandedInputs.push(input)
        END MATCH
    END FOR

    // Phase 2: Create processing queue
    queue ← WorkQueue::new(expandedInputs)
    results ← ConcurrentVec::new()

    // Phase 3: Parallel processing (at least one worker)
    numWorkers ← MAX(1, MIN(config.max_parallel, CPU_COUNT))

    PARALLEL FOR worker IN 0..numWorkers DO
        LOOP
            next ← queue.pop()
            IF next.is_none() THEN
                BREAK
            END IF
            input ← next.unwrap()   // unwrap once, reuse below

            startTime ← GetCurrentTime()

            // Process single input
            result ← ProcessSingleInput(
                input,
                config,
                timeout: config.timeout_ms
            )

            processingTime ← GetCurrentTime() - startTime   // milliseconds

            processResult ← ProcessResult {
                input: input,
                output: result,
                processing_time_ms: processingTime,
                status: DetermineStatus(result)
            }

            results.push(processResult)
        END LOOP
    END PARALLEL

    // Phase 4: Aggregate and return
    // Workers finish in arbitrary order, so sort for deterministic output
    finalResults ← results.into_vec()
    finalResults.sort_by(|a, b| a.input.path.cmp(&b.input.path))

    RETURN Ok(finalResults)
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - With P workers, N inputs, T time per input
        - Parallel: O(N × T / P)
        - Sequential equivalent: O(N × T)
        - Speedup: up to P (near-linear while P ≤ CPU_COUNT and inputs are independent)

    Space Complexity:
        - Queue: O(N)
        - Results: O(N × R) where R = result size
        - Worker memory: O(P × M) where M = model size
        Total: O(N × R + P × M)
```
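The worker loop can be sketched with the Rust standard library alone. This is an illustration under stated assumptions, not the project's code: `process_batch` is a hypothetical name, a `Mutex<VecDeque>` stands in for `WorkQueue`, a `Mutex<Vec>` stands in for `ConcurrentVec`, and results are returned in input order rather than sorted by path. Timeouts are omitted.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Run `work` over `inputs` with at most `max_parallel` workers that pull
/// items from a shared queue. Results are returned in input order.
pub fn process_batch<T, R, F>(inputs: Vec<T>, max_parallel: usize, work: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let cpu_count = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let num_workers = max_parallel.min(cpu_count).max(1);

    // Tag each input with its index so the order can be restored later.
    let queue: Mutex<VecDeque<(usize, T)>> =
        Mutex::new(inputs.into_iter().enumerate().collect());
    let results: Mutex<Vec<(usize, R)>> = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..num_workers {
            scope.spawn(|| loop {
                // Hold the queue lock only long enough to take one item.
                let next = queue.lock().unwrap().pop_front();
                let Some((index, item)) = next else { break };
                let output = work(item);
                results.lock().unwrap().push((index, output));
            });
        }
    });

    let mut collected = results.into_inner().unwrap();
    collected.sort_by_key(|(index, _)| *index);
    collected.into_iter().map(|(_, output)| output).collect()
}
```

Because every worker pulls the next item as soon as it is free, slow inputs do not stall the others, which is what keeps the speedup close to the worker count for independent inputs.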

### 5.2 PDF Page Extraction

```
ALGORITHM: ExtractPDFPages
INPUT: pdfPath (PathBuf), pageRange (Option<Range<u32>>)
OUTPUT: Result<Vec<InputSource>, Error>

BEGIN
    // Load PDF document
    document ← PDFDocument::load(pdfPath)

    IF document.is_error() THEN
        RETURN Error("Failed to load PDF")
    END IF

    // Determine page range (zero-based, end exclusive)
    totalPages ← document.page_count()
    range ← pageRange.unwrap_or(0..totalPages)

    pages ← []

    FOR pageNum IN range DO
        // Stop at the last page if the requested range overshoots
        IF pageNum >= totalPages THEN
            BREAK
        END IF

        // Render page to image
        page ← document.get_page(pageNum)

        // Render at high DPI for quality
        image ← page.render(dpi: 300)

        // Create temporary file
        tempPath ← CreateTempFile(format!("page_{}.png", pageNum))
        image.save(tempPath)

        inputSource ← InputSource {
            source_type: Image,
            path: tempPath,
            page_range: None
        }

        pages.push(inputSource)
    END FOR

    RETURN Ok(pages)
END

COMPLEXITY ANALYSIS:
    Time: O(P × R) where P = pages, R = render time per page
    Space: O(S) in memory (one rendered page at a time) and
           O(P × S) on disk for the temporary files, where S = image size
```
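The range handling can be isolated into a small helper, sketched here in Rust. `resolve_page_range` is a hypothetical name; it assumes zero-based page numbers with an exclusive end, as in the loop above, and clamps the request up front instead of breaking out of the loop.

```rust
use std::ops::Range;

/// Resolve an optional zero-based page range against the document length.
/// A missing range selects every page; an overshooting range is clamped.
pub fn resolve_page_range(requested: Option<Range<u32>>, total_pages: u32) -> Range<u32> {
    match requested {
        None => 0..total_pages,
        Some(range) => {
            let end = range.end.min(total_pages);
            // Keep start <= end so the result is a valid (possibly empty) range.
            let start = range.start.min(end);
            start..end
        }
    }
}
```

Clamping before the loop makes the number of pages to render known in advance, which is useful for progress reporting and for preallocating the output vector.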

### 5.3 Result Aggregation

```
ALGORITHM: AggregateResults
INPUT: results (Vec<ProcessResult>)
OUTPUT: aggregated (AggregatedResults)

DATA STRUCTURES:
    AggregatedResults {
        total_count: usize,
        success_count: usize,
        failure_count: usize,
        total_processing_time_ms: u64,
        average_confidence: f32,
        results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>
    }

BEGIN
    totalCount ← results.len()
    successCount ← 0
    failureCount ← 0
    totalTime ← 0
    totalConfidence ← 0.0
    byStatus ← HashMap::new()

    FOR EACH result IN results DO
        totalTime ← totalTime + result.processing_time_ms

        MATCH result.status:
            Success →
                successCount ← successCount + 1
                totalConfidence ← totalConfidence + result.output.confidence
            Failure →
                failureCount ← failureCount + 1
        END MATCH

        // Group by status
        IF NOT byStatus.contains_key(result.status) THEN
            byStatus.insert(result.status, [])
        END IF
        byStatus.get_mut(result.status).push(result)
    END FOR

    avgConfidence ← IF successCount > 0 THEN
        totalConfidence / successCount
    ELSE
        0.0
    END IF

    RETURN AggregatedResults {
        total_count: totalCount,
        success_count: successCount,
        failure_count: failureCount,
        total_processing_time_ms: totalTime,
        average_confidence: avgConfidence,
        results_by_status: byStatus
    }
END

COMPLEXITY ANALYSIS:
    Time: O(n) single pass
    Space: O(n) for grouping
```
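A compact Rust rendering of this single-pass aggregation is sketched below. The `ProcessResult` here is a trimmed, hypothetical version that carries the confidence directly, and `ResultStatus` is assumed to have only the two variants the pseudocode matches on.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ResultStatus {
    Success,
    Failure,
}

/// Hypothetical, trimmed-down result record for illustration.
#[derive(Debug, Clone)]
pub struct ProcessResult {
    pub status: ResultStatus,
    pub confidence: f32,
    pub processing_time_ms: u64,
}

#[derive(Debug, Default)]
pub struct AggregatedResults {
    pub total_count: usize,
    pub success_count: usize,
    pub failure_count: usize,
    pub total_processing_time_ms: u64,
    pub average_confidence: f32,
    pub results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>,
}

/// Aggregate in one pass; confidence is averaged over successes only.
pub fn aggregate(results: Vec<ProcessResult>) -> AggregatedResults {
    let mut agg = AggregatedResults {
        total_count: results.len(),
        ..Default::default()
    };
    let mut confidence_sum = 0.0f32;

    for result in results {
        agg.total_processing_time_ms += result.processing_time_ms;
        match result.status {
            ResultStatus::Success => {
                agg.success_count += 1;
                confidence_sum += result.confidence;
            }
            ResultStatus::Failure => agg.failure_count += 1,
        }
        // The entry API replaces the contains_key / insert / get_mut sequence.
        agg.results_by_status
            .entry(result.status)
            .or_default()
            .push(result);
    }

    if agg.success_count > 0 {
        agg.average_confidence = confidence_sum / agg.success_count as f32;
    }
    agg
}
```

Guarding the division keeps the average at 0.0 for an all-failure batch instead of producing NaN.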

---

## 6. Caching and Memoization

### 6.1 Model Weight Caching

```
ALGORITHM: LoadModelWithCache
INPUT: modelPath (PathBuf), cacheConfig (CacheConfig)
OUTPUT: Result<Model, Error>

DATA STRUCTURES:
    CacheConfig {
        enabled: bool,
        cache_dir: PathBuf,
        max_cache_size_mb: u64,
        ttl_seconds: u64
    }

    CachedModel {
        weights: Vec<u8>,
        metadata: ModelMetadata,
        cached_at: Timestamp,
        access_count: u64
    }

BEGIN
    IF NOT cacheConfig.enabled THEN
        RETURN LoadModelDirect(modelPath)
    END IF

    // Generate cache key from the model path
    cacheKey ← ComputeHash(modelPath, algorithm: SHA256)
    cachePath ← cacheConfig.cache_dir.join(cacheKey)

    // Check if cached version exists and is valid
    IF cachePath.exists() THEN
        cachedModel ← DeserializeCachedModel(cachePath)

        // Check TTL
        age ← GetCurrentTime() - cachedModel.cached_at
        IF age < cacheConfig.ttl_seconds THEN
            // Cache hit: persist the updated access count
            cachedModel.access_count ← cachedModel.access_count + 1
            UpdateCacheMetadata(cachePath, cachedModel)

            model ← DeserializeModel(cachedModel.weights)
            RETURN Ok(model)
        ELSE
            // Cache expired
            DeleteFile(cachePath)
        END IF
    END IF

    // Cache miss - load from disk
    loadResult ← LoadModelDirect(modelPath)

    IF loadResult.is_error() THEN
        RETURN loadResult
    END IF
    model ← loadResult.unwrap()   // unwrap once, reuse below

    // Serialize and cache
    serializedWeights ← SerializeModel(model)

    cachedModel ← CachedModel {
        weights: serializedWeights,
        metadata: model.metadata,
        cached_at: GetCurrentTime(),
        access_count: 1
    }

    // Check cache size limit
    EnsureCacheSize(cacheConfig)

    // Write to cache
    WriteCachedModel(cachePath, cachedModel)

    RETURN Ok(model)
END

SUBROUTINE: EnsureCacheSize
INPUT: cacheConfig (CacheConfig)
OUTPUT: None

BEGIN
    currentSize ← GetDirectorySize(cacheConfig.cache_dir)
    maxSize ← cacheConfig.max_cache_size_mb * 1024 * 1024   // bytes

    IF currentSize <= maxSize THEN
        RETURN
    END IF

    // Evict least recently used models
    cachedFiles ← ListFiles(cacheConfig.cache_dir)

    // Sort by last access time, oldest first
    cachedFiles.sort_by(|a, b| {
        a.metadata.accessed_at.cmp(&b.metadata.accessed_at)
    })

    freedSpace ← 0
    targetFree ← currentSize - maxSize

    FOR EACH file IN cachedFiles DO
        IF freedSpace >= targetFree THEN
            BREAK
        END IF

        fileSize ← GetFileSize(file)
        DeleteFile(file)
        freedSpace ← freedSpace + fileSize
    END FOR
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Cache hit: O(1) for lookup + O(m) for deserialization
        - Cache miss: O(m) for model loading + O(m) for serialization
        - Eviction: O(k log k) where k = cached files

    Space Complexity:
        - Cached model: O(m) where m = model size
        - LRU tracking: O(k)
```
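The eviction step in `EnsureCacheSize` can be sketched as a pure planning function in Rust, which keeps the policy testable without touching the file system. `CacheEntry` and `plan_eviction` are hypothetical names, and `accessed_at` is assumed to be a monotonically comparable timestamp such as seconds since the epoch.

```rust
/// Hypothetical record describing one cached file.
#[derive(Debug, Clone, PartialEq)]
pub struct CacheEntry {
    pub name: String,
    pub size_bytes: u64,
    pub accessed_at: u64,
}

/// Choose which entries to delete, least recently accessed first,
/// until the total size fits within `max_bytes`.
pub fn plan_eviction(mut entries: Vec<CacheEntry>, max_bytes: u64) -> Vec<CacheEntry> {
    let current: u64 = entries.iter().map(|e| e.size_bytes).sum();
    if current <= max_bytes {
        return Vec::new();
    }
    let target_free = current - max_bytes;

    // Oldest access time first, matching the LRU ordering in the pseudocode.
    entries.sort_by_key(|e| e.accessed_at);

    let mut freed = 0u64;
    let mut victims = Vec::new();
    for entry in entries {
        if freed >= target_free {
            break;
        }
        freed += entry.size_bytes;
        victims.push(entry);
    }
    victims
}
```

The caller would then delete each returned entry from disk; separating the decision from the deletion also makes it easy to log what is about to be evicted.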

### 6.2 Result Caching with Ruvector

```
ALGORITHM: CacheResultWithVector
INPUT: imageHash (Hash), result (RecognitionResult), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    RuvectorStore {
        index: VectorIndex,
        metadata_db: HashMap<Hash, ResultMetadata>,
        config: VectorConfig
    }

    VectorConfig {
        embedding_dim: usize,
        similarity_threshold: f32,
        max_cache_entries: usize
    }

    ResultMetadata {
        result: RecognitionResult,
        image_hash: Hash,
        cached_at: Timestamp,
        hit_count: u64
    }

BEGIN
    // Phase 1: Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Phase 2: Check if already cached
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        // Update metadata
        metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Ok(())
    END IF

    // Phase 3: Generate embedding for the result
    embedding ← GenerateResultEmbedding(result)

    // Phase 4: Store in vector index
    vectorStore.index.insert(
        id: perceptualHash,
        vector: embedding
    )

    // Phase 5: Store metadata
    metadata ← ResultMetadata {
        result: result,
        image_hash: imageHash,
        cached_at: GetCurrentTime(),
        hit_count: 1
    }

    vectorStore.metadata_db.insert(perceptualHash, metadata)

    // Phase 6: Enforce cache size limit
    IF vectorStore.metadata_db.len() > vectorStore.config.max_cache_entries THEN
        EvictLeastUsedEntry(vectorStore)
    END IF

    RETURN Ok(())
END

ALGORITHM: QuerySimilarCachedResult
INPUT: imageHash (Hash), vectorStore (RuvectorStore)
OUTPUT: Option<RecognitionResult>

BEGIN
    // Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Exact match check
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        // get_mut: the hit count is modified below
        metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    // Generate query embedding
    queryEmbedding ← GenerateImageEmbedding(imageHash)

    // Search for similar results
    results ← vectorStore.index.search(
        query: queryEmbedding,
        k: 1,
        threshold: vectorStore.config.similarity_threshold
    )

    IF results.is_empty() THEN
        RETURN None
    END IF

    bestMatch ← results[0]

    IF bestMatch.similarity >= vectorStore.config.similarity_threshold THEN
        metadata ← vectorStore.metadata_db.get_mut(bestMatch.id)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
    Caching:
        Time: O(d) for embedding + O(log n) for index insertion
        Space: O(n × d) where n = cached entries, d = embedding dim

    Querying:
        Time: O(d) for embedding + O(log n × d) for ANN search
        Space: O(k) for results where k = search parameter
```
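The thresholded similarity lookup can be illustrated with a brute-force scan in Rust. This sketch swaps the approximate nearest neighbor index for a linear search, so it runs in O(n × d) rather than the O(log n × d) quoted above, and it assumes cosine similarity as the metric, which the pseudocode does not specify. `cosine_similarity` and `best_match` are hypothetical names and do not reflect the ruvector API.

```rust
/// Cosine similarity of two vectors. Returns 0.0 when the lengths differ
/// or when either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Linear scan standing in for the ANN search with k = 1: returns the id of
/// the most similar stored vector if it reaches `threshold`.
pub fn best_match<'a>(
    query: &[f32],
    stored: &'a [(String, Vec<f32>)],
    threshold: f32,
) -> Option<&'a str> {
    let mut best: Option<(&'a str, f32)> = None;
    for (id, vector) in stored {
        let score = cosine_similarity(query, vector);
        if score >= threshold && best.map_or(true, |(_, s)| score > s) {
            best = Some((id.as_str(), score));
        }
    }
    best.map(|(id, _)| id)
}
```

A linear scan like this is also a convenient reference implementation for checking the recall of an approximate index on small test sets.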

### 6.3 Incremental Update Cache

```
ALGORITHM: UpdateCacheIncremental
INPUT: updates (Vec<CacheUpdate>), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    CacheUpdate {
        operation: UpdateOp,  // Insert, Update, Delete
        image_hash: Hash,
        result: Option<RecognitionResult>   // None for Delete
    }

    UpdateOp (enum) {
        Insert,
        Update,
        Delete
    }

BEGIN
    // Batch updates for efficiency
    insertBatch ← []
    updateBatch ← []
    deleteBatch ← []

    FOR EACH update IN updates DO
        MATCH update.operation:
            Insert →
                insertBatch.push(update)
            Update →
                updateBatch.push(update)
            Delete →
                deleteBatch.push(update)
        END MATCH
    END FOR

    // Process deletes first
    FOR EACH update IN deleteBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)
        vectorStore.index.remove(perceptualHash)
        vectorStore.metadata_db.remove(perceptualHash)
    END FOR

    // Process updates
    FOR EACH update IN updateBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)

        IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
            // Update existing entry
            newResult ← update.result.unwrap()   // unwrap once, reuse below
            embedding ← GenerateResultEmbedding(newResult)
            vectorStore.index.update(perceptualHash, embedding)

            metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
            metadata.result ← newResult
            metadata.cached_at ← GetCurrentTime()
        END IF
    END FOR

    // Process inserts in batch
    IF NOT insertBatch.is_empty() THEN
        indexEntries ← []
        metadataList ← []

        FOR EACH update IN insertBatch DO
            newResult ← update.result.unwrap()   // unwrap once, reuse below
            perceptualHash ← ComputePerceptualHash(update.image_hash)

            // Keep the id with its vector so the index can key each entry
            embedding ← GenerateResultEmbedding(newResult)
            indexEntries.push((perceptualHash, embedding))

            metadata ← ResultMetadata {
                result: newResult,
                image_hash: update.image_hash,
                cached_at: GetCurrentTime(),
                hit_count: 1
            }
            metadataList.push((perceptualHash, metadata))
        END FOR

        // Batch insert (id, vector) pairs into vector index
        vectorStore.index.insert_batch(indexEntries)

        // Batch insert metadata
        FOR EACH (hash, metadata) IN metadataList DO
            vectorStore.metadata_db.insert(hash, metadata)
        END FOR
    END IF

    RETURN Ok(())
END

COMPLEXITY ANALYSIS:
    Time: O(b × (d + log n)) where b = batch size, d = embedding dim,
          n = cached entries (embedding plus index update per item)
    Space: O(b × d) for batch processing
```
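The delete, update, insert ordering can be sketched in Rust against a plain `HashMap`. This is a simplified illustration: keys are `u64` and values are `String` in place of perceptual hashes and recognition results, the vector index is left out, and `CacheUpdate` is modeled as an enum that carries its payload per variant, which avoids the `Option` unwrap needed by the struct form above.

```rust
use std::collections::HashMap;

/// Hypothetical update command; each variant carries only what it needs.
#[derive(Debug, Clone, PartialEq)]
pub enum CacheUpdate {
    Insert { key: u64, value: String },
    Update { key: u64, value: String },
    Delete { key: u64 },
}

/// Apply a batch: deletes first, then updates of existing keys, then inserts.
pub fn apply_updates(store: &mut HashMap<u64, String>, updates: Vec<CacheUpdate>) {
    let mut inserts = Vec::new();
    let mut changes = Vec::new();
    let mut deletes = Vec::new();

    // Partition the batch by operation.
    for update in updates {
        match update {
            CacheUpdate::Insert { key, value } => inserts.push((key, value)),
            CacheUpdate::Update { key, value } => changes.push((key, value)),
            CacheUpdate::Delete { key } => deletes.push(key),
        }
    }

    for key in deletes {
        store.remove(&key);
    }
    for (key, value) in changes {
        // Updates to keys that are not cached are ignored.
        if let Some(slot) = store.get_mut(&key) {
            *slot = value;
        }
    }
    // Inserts are applied last, in one bulk operation.
    store.extend(inserts);
}
```

Processing deletes first frees capacity before new entries arrive, and it means a delete followed by an insert of the same key within one batch leaves the inserted value in place.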

---

## Summary: Complexity Analysis

### Overall System Complexity

| Component | Time Complexity | Space Complexity |
|-----------|----------------|------------------|
| Image Preprocessing | O(n log n) | O(n) |
| Vision Transformer | O(L × p² × d) | O(L × h × p²) |
| Text Recognition | O(T × B × V × log(BV)) | O(B × T × d) |
| Math Symbol Detection | O(s × c) | O(s) |
| Spatial Analysis | O(s log s) | O(s²) worst case |
| Tree Construction | O(s log s) | O(s × h) |
| LaTeX Generation | O(s) | O(h) |
| Batch Processing | O(N × T / P) | O(N × R + P × M) |
| Vector Caching | O(d + log n) | O(n × d) |

**Legend:**
- n = pixel count (number of cached entries in the Vector Caching row)
- L = transformer layers
- p = number of patches
- d = embedding dimension
- h = attention heads
- T = sequence length (time per input in the Batch Processing row)
- B = beam width
- V = vocabulary size
- s = symbol count
- N = batch size
- P = parallel workers
- R = result size
- M = model size

### Optimization Opportunities

1. **Preprocessing**: Use GPU-accelerated image operations
2. **Transformer**: Implement efficient attention (FlashAttention)
3. **Beam Search**: Prune low-probability beams early
4. **Spatial Analysis**: Use spatial indexing (QuadTree/R-tree)
5. **Caching**: Implement tiered cache (L1: memory, L2: disk)
6. **Batch Processing**: Dynamic load balancing across workers
7. **Vector Search**: Use approximate nearest neighbor (HNSW)

---

## Design Patterns Used

1. **Pipeline Pattern**: Image preprocessing → OCR → Math parsing → Output
2. **Strategy Pattern**: Multiple output format generators
3. **Observer Pattern**: Progress tracking in batch processing
4. **Factory Pattern**: Model and cache instantiation
5. **Adapter Pattern**: Format conversion layers
6. **Repository Pattern**: Vector store abstraction
7. **Command Pattern**: Cache update operations
8. **Builder Pattern**: Expression tree and XML construction

---

*This pseudocode serves as the algorithmic blueprint for implementation in the Refinement phase.*