Files

ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

2026-02-28 14:39:40 -05:00

20 KiB

Raw Blame History

Bounded Context Map - Genomic Analysis Platform

Context Map Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                         GENOMIC ANALYSIS PLATFORM                            │
└─────────────────────────────────────────────────────────────────────────────┘

    ┌──────────────────┐
    │   Pipeline       │ ◄───────── Orchestration Layer
    │   Context        │
    └────────┬─────────┘
             │ ACL (maps domain events to pipeline commands)
             │
    ┌────────┴─────────────────────────────────────────────┐
    │                                                       │
    ▼                                                       ▼
┌─────────────────┐                               ┌─────────────────┐
│   Sequence      │ Customer-Supplier             │   Alignment     │
│   Context       ├──────────────────────────────►│   Context       │
│                 │ (provides k-mer indices)      │                 │
└────────┬────────┘                               └────────┬────────┘
         │                                                 │
         │ Shared Kernel (GenomicPosition, QualityScore)  │
         │                                                 │
         ▼                                                 ▼
┌─────────────────┐                               ┌─────────────────┐
│   Variant       │                               │   Protein       │
│   Context       │◄──────────────────────────────┤   Context       │
│                 │  Partner (variant→structure)  │                 │
└────────┬────────┘                               └─────────────────┘
         │
         │ ACL (translates variants to epigenetic events)
         │
         ▼
┌─────────────────┐
│  Epigenomic     │
│  Context        │
└────────┬────────┘
         │
         │ Customer-Supplier (epigenetic→drug response)
         │
         ▼
┌─────────────────┐
│ Pharmacogenomic │
│ Context         │
└─────────────────┘

Legend:
  Customer-Supplier: →  (upstream provides services to downstream)
  Shared Kernel:     ├─┤ (shared domain model)
  Partner:           ◄─► (mutual dependency)
  ACL:               [A] (anti-corruption layer)

1. Sequence Context

Module: kmer.rs

Responsibility: K-mer indexing, sequence sketching, and similarity search

Core Aggregates:

KmerIndex - Root aggregate managing k-mer → position mappings
MinHashSketch - Aggregate for approximate sequence similarity

Key Types:

pub struct KmerEncoder {
    k: usize,
    alphabet_size: usize,
}

pub struct KmerIndex {
    k: usize,
    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
}

pub struct MinHashSketch {
    k: usize,
    num_hashes: usize,
    signatures: Vec<u64>,
}

Published Events:

SequenceIndexed { sequence_id: String, kmer_count: usize }
SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }

Domain Language:

K-mer: substring of length k
Minimizer: canonical k-mer representation
Sketch: compressed sequence signature
Jaccard similarity: set overlap metric

Invariants:

K-mer length must be 3 ≤ k ≤ 32
MinHash signature size must be ≥ 1
All k-mers normalized to canonical form (min(kmer, reverse_complement))

2. Alignment Context

Module: alignment.rs

Responsibility: Sequence alignment using attention mechanisms and motif detection

Core Aggregates:

AttentionAligner - Root aggregate for pairwise sequence alignment
MotifScanner - Aggregate for regulatory motif discovery

Key Types:

pub struct AttentionAligner {
    attention_service: Arc<AttentionService>,
    gap_penalty: f64,
    match_bonus: f64,
}

pub struct MotifScanner {
    attention_service: Arc<AttentionService>,
    min_score: f64,
    known_motifs: Vec<MotifPattern>,
}

pub struct AlignmentResult {
    pub score: f64,
    pub aligned_query: String,
    pub aligned_target: String,
    pub attention_weights: Vec<Vec<f64>>,
}

Published Events:

AlignmentCompleted { query_id: String, target_id: String, score: f64 }
MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }

Domain Language:

Alignment: optimal mapping between two sequences
Gap penalty: cost of insertions/deletions
Attention weight: learned similarity between positions
Motif: conserved sequence pattern (e.g., TATA box)
PWM (Position Weight Matrix): motif scoring matrix

Invariants:

Gap penalty must be negative
Match bonus must be positive
Motif minimum score 0.0 ≤ score ≤ 1.0
Alignment score monotonically decreases with gaps

Relationship with Sequence Context:

Type: Customer-Supplier
Direction: Sequence → Alignment
Integration: Alignment consumes k-mer indices for fast seed-and-extend
Translation: None (direct dependency)

3. Variant Context

Module: variant.rs

Responsibility: Variant calling, genotyping, and population genetics

Core Aggregates:

VariantDatabase - Root aggregate managing variant collection
VariantCaller - Service aggregate for variant detection

Key Types:

pub struct VariantCaller {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc<GnnService>,
}

pub struct Variant {
    pub position: GenomicPosition,
    pub reference: String,
    pub alternate: String,
    pub quality: f64,
    pub genotype: Genotype,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
}

pub struct VariantDatabase {
    variants: HashMap<GenomicPosition, Variant>,
    graph_index: Option<GraphIndex>, // GNN-based variant relationships
}

pub enum Genotype {
    Homozygous(Allele),
    Heterozygous(Allele, Allele),
}

Published Events:

VariantCalled { position: GenomicPosition, variant: Variant }
GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }
PopulationFrequencyCalculated { variant_id: String, frequency: f64 }

Domain Language:

SNP (Single Nucleotide Polymorphism): single base change
Indel: insertion or deletion
Genotype: allele combination (0/0, 0/1, 1/1)
Allele frequency: population prevalence
Quality score: confidence in variant call (Phred scale)
Coverage depth: number of reads supporting variant

Invariants:

Quality score ≥ 0 (Phred scale)
Coverage depth ≥ 1
Allele frequency 0.0 ≤ AF ≤ 1.0
Reference and alternate alleles must differ
Genotype alleles must match available alleles

Relationship with Alignment Context:

Type: Customer-Supplier
Direction: Alignment → Variant
Integration: Variant caller uses alignment results to identify mismatches
Translation: Alignment gaps → insertion/deletion variants

Shared Kernel with Sequence Context:

GenomicPosition { chromosome: String, position: usize }
QualityScore(f64) (Phred-scaled)
Nucleotide enum (A, C, G, T)

4. Protein Context

Module: protein.rs

Responsibility: Protein structure prediction and contact map generation

Core Aggregates:

ProteinGraph - Root aggregate representing protein as graph
ContactPredictor - Service aggregate for 3D contact prediction

Key Types:

pub struct ProteinGraph {
    pub sequence: String, // amino acid sequence
    pub nodes: Vec<AminoAcid>,
    pub edges: Vec<(usize, usize, ContactType)>,
}

pub struct ContactPredictor {
    gnn_service: Arc<GnnService>,
    attention_service: Arc<AttentionService>,
    distance_threshold: f64, // Ångströms
}

pub struct ContactPrediction {
    pub residue_i: usize,
    pub residue_j: usize,
    pub probability: f64,
    pub distance: Option<f64>,
}

pub enum ContactType {
    Backbone,
    SideChain,
    HydrogenBond,
    DisulfideBridge,
}

Published Events:

ProteinTranslated { gene_id: String, protein_sequence: String }
StructurePredicted { protein_id: String, contact_count: usize }
FoldingPathwayComputed { protein_id: String, energy: f64 }

Domain Language:

Amino acid: protein building block (20 standard types)
Residue: amino acid position in sequence
Contact: spatial proximity between residues (<8Å)
Secondary structure: local folding patterns (helix, sheet, loop)
Tertiary structure: 3D protein fold
Contact map: matrix of residue-residue distances

Invariants:

Sequence length ≥ 1
Contact probability 0.0 ≤ p ≤ 1.0
Distance threshold > 0.0 (typically 8.0Å)
Contact pairs must be |i - j| ≥ 4 (exclude local contacts)

Relationship with Variant Context:

Type: Partner (bidirectional)
Direction: Variant ↔ Protein
Integration:
- Variant → Protein: coding variants cause amino acid changes
- Protein → Variant: structural changes inform variant pathogenicity
Translation:
- Variant ACL translates nucleotide changes to codon changes
- Protein ACL maps structure disruption to clinical significance

5. Epigenomic Context

Module: epigenomics.rs

Responsibility: DNA methylation analysis and epigenetic age prediction

Core Aggregates:

EpigeneticIndex - Root aggregate managing methylation sites
HorvathClock - Service aggregate for epigenetic age calculation

Key Types:

pub struct MethylationProfile {
    pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
    pub total_sites: usize,
    pub mean_methylation: f64,
}

pub struct HorvathClock {
    pub coefficients: HashMap<String, f64>, // CpG site → weight
    pub intercept: f64,
}

pub struct CpGSite {
    pub position: GenomicPosition,
    pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
    pub coverage: usize,
}

pub struct EpigeneticAge {
    pub chronological_age: Option<f64>,
    pub predicted_age: f64,
    pub acceleration: f64, // predicted - chronological
}

Published Events:

MethylationProfileGenerated { sample_id: String, site_count: usize }
EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }
DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }

Domain Language:

CpG site: cytosine-guanine dinucleotide (methylation target)
Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
Epigenetic clock: age predictor based on methylation
Age acceleration: difference between epigenetic and chronological age
DMR (Differentially Methylated Region): region with changed methylation

Invariants:

Beta value 0.0 ≤ β ≤ 1.0
Coverage ≥ 1
Horvath coefficients sum to meaningful scale
Age ≥ 0.0

Relationship with Variant Context:

Type: Anti-Corruption Layer
Direction: Variant → Epigenomic
Integration: Variants in regulatory regions affect methylation patterns
Translation:
- ACL translates genetic variants to epigenetic effects
- Maps SNPs → methylation quantitative trait loci (mQTL)
- Prevents variant domain concepts from leaking into epigenetic model

6. Pharmacogenomic Context

Module: pharma.rs

Responsibility: Pharmacogenetic analysis and drug-gene interaction prediction

Core Aggregates:

DrugInteractionGraph - Root aggregate representing drug-gene network
StarAlleleCaller - Service aggregate for haplotype phasing

Key Types:

pub struct StarAlleleCaller {
    gene_definitions: HashMap<String, GeneDefinition>,
    min_coverage: usize,
}

pub struct StarAllele {
    pub gene: String,
    pub allele: String, // e.g., "*1", "*2", "*17"
    pub variants: Vec<Variant>,
    pub function: AlleleFunction,
}

pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

pub struct DrugInteractionGraph {
    pub nodes: Vec<DrugGeneNode>,
    pub edges: Vec<(usize, usize, InteractionType)>,
}

pub struct DrugResponse {
    pub drug: String,
    pub diplotype: Diplotype,
    pub phenotype: MetabolizerPhenotype,
    pub recommendation: ClinicalRecommendation,
}

pub enum MetabolizerPhenotype {
    UltraRapid,
    Rapid,
    Normal,
    Intermediate,
    Poor,
}

Published Events:

StarAlleleIdentified { gene: String, allele: String, diplotype: String }
DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }
InteractionDetected { drug1: String, drug2: String, severity: Severity }

Domain Language:

Star allele: named haplotype variant (e.g., CYP2D6*4)
Diplotype: pair of haplotypes (e.g., *1/*4)
Metabolizer phenotype: drug metabolism rate
Pharmacogene: gene affecting drug response
Drug-gene interaction: how genetics modulates drug efficacy/toxicity

Invariants:

Diplotype must have exactly 2 alleles
Phenotype derivable from diplotype
Coverage ≥ minimum threshold for calling
All star allele variants must exist in variant database

Relationship with Epigenomic Context:

Type: Customer-Supplier
Direction: Epigenomic → Pharmacogenomic
Integration: Methylation affects drug metabolism gene expression
Translation: Methylation beta values → gene expression levels → phenotype

7. Pipeline Context

Module: pipeline.rs

Responsibility: Orchestration of multi-stage genomic analysis workflow

Core Aggregates:

GenomicPipeline - Root aggregate orchestrating all contexts

Key Types:

pub struct GenomicPipeline {
    pub kmer_encoder: KmerEncoder,
    pub aligner: AttentionAligner,
    pub variant_caller: VariantCaller,
    pub protein_predictor: ContactPredictor,
    pub methylation_analyzer: MethylationAnalyzer,
    pub pharma_analyzer: StarAlleleCaller,
}

pub struct PipelineConfig {
    pub k: usize,
    pub min_variant_quality: f64,
    pub min_coverage: usize,
    pub enable_protein_prediction: bool,
    pub enable_epigenetic_analysis: bool,
    pub enable_pharmacogenomics: bool,
}

pub struct AnalysisResult {
    pub sequence_stats: SequenceStats,
    pub variants: Vec<Variant>,
    pub protein_structures: Vec<ProteinGraph>,
    pub methylation_profile: Option<MethylationProfile>,
    pub drug_responses: Vec<DrugResponse>,
}

Published Events:

PipelineStarted { sample_id: String, stages: Vec<String> }
StageCompleted { stage: String, duration_ms: u64 }
PipelineCompleted { sample_id: String, total_duration_ms: u64 }
PipelineFailed { stage: String, error: String }

Domain Language:

Pipeline: directed acyclic graph of analysis stages
Stage: atomic analysis unit (alignment, variant calling, etc.)
Workflow: ordered execution of stages
Checkpoint: saved intermediate state
Provenance: lineage tracking of analysis steps

Invariants:

All enabled stages must execute in dependency order
Failed stage halts downstream execution
All results traceable to input data and parameters

Anti-Corruption Layers:

The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:

Sequence ACL: Translates k-mer indices to alignment seeds
Alignment ACL: Converts alignment gaps to variant candidates
Variant ACL: Maps variants to protein mutations
Protein ACL: Translates structure to functional predictions
Epigenetic ACL: Converts methylation to gene expression estimates
Pharmacogenomic ACL: Maps genotypes to clinical recommendations

Context Relationship Matrix

From ↓ / To →	Sequence	Alignment	Variant	Protein	Epigenomic	Pharma	Pipeline
Sequence	-	C-S	SK	SK	-	-	ACL
Alignment	-	-	C-S	-	-	-	ACL
Variant	-	-	-	Partner	ACL	-	ACL
Protein	-	-	Partner	-	-	-	ACL
Epigenomic	-	-	-	-	-	C-S	ACL
Pharma	-	-	-	-	-	-	ACL
Pipeline	C-S	C-S	C-S	C-S	C-S	C-S	-

Legend:

C-S: Customer-Supplier
SK: Shared Kernel
Partner: Partnership
ACL: Anti-Corruption Layer

Integration Patterns

1. Event-Driven Integration

Contexts communicate via domain events to maintain loose coupling:

// Example: Variant Context publishes event
pub enum DomainEvent {
    VariantCalled(VariantCalledEvent),
    ProteinStructurePredicted(ProteinPredictedEvent),
    // ...
}

// Pipeline Context subscribes and translates
impl EventHandler for GenomicPipeline {
    fn handle(&mut self, event: DomainEvent) {
        match event {
            DomainEvent::VariantCalled(e) => {
                if e.variant.is_coding() {
                    self.trigger_protein_analysis(e.variant);
                }
            }
            // ...
        }
    }
}

2. Shared Kernel Components

Core domain types shared across contexts:

// In types.rs (core domain)
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

pub struct QualityScore(pub f64); // Phred-scaled

pub enum Nucleotide { A, C, G, T }

pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize,
    pub end: usize,
}

3. Anti-Corruption Layer Example

// Variant → Protein ACL
pub struct VariantToProteinTranslator {
    codon_table: CodonTable,
}

impl VariantToProteinTranslator {
    pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
        // Prevents protein context from depending on variant implementation
        let codon_change = self.map_to_codon(variant)?;
        let aa_change = self.codon_table.translate(codon_change)?;

        Some(ProteinMutation {
            position: variant.position.position / 3,
            reference_aa: aa_change.reference,
            alternate_aa: aa_change.alternate,
        })
    }
}

Bounded Context Responsibilities Summary

Sequence Context: K-mer indexing and sequence similarity (foundation)
Alignment Context: Pairwise alignment and motif discovery
Variant Context: Variant calling and population genetics
Protein Context: Structure prediction and functional analysis
Epigenomic Context: Methylation profiling and age prediction
Pharmacogenomic Context: Drug-gene interactions and clinical recommendations
Pipeline Context: Workflow orchestration and result aggregation

Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.

20 KiB Raw Blame History

Bounded Context Map - Genomic Analysis Platform

Context Map Overview

1. Sequence Context

2. Alignment Context

3. Variant Context

4. Protein Context

5. Epigenomic Context

6. Pharmacogenomic Context

7. Pipeline Context

Context Relationship Matrix

Integration Patterns

1. Event-Driven Integration

2. Shared Kernel Components

3. Anti-Corruption Layer Example

Bounded Context Responsibilities Summary

20 KiB

Raw Blame History