# Bounded Context Map - Genomic Analysis Platform ## Context Map Overview ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ GENOMIC ANALYSIS PLATFORM │ └─────────────────────────────────────────────────────────────────────────────┘ ┌──────────────────┐ │ Pipeline │ ◄───────── Orchestration Layer │ Context │ └────────┬─────────┘ │ ACL (maps domain events to pipeline commands) │ ┌────────┴─────────────────────────────────────────────┐ │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Sequence │ Customer-Supplier │ Alignment │ │ Context ├──────────────────────────────►│ Context │ │ │ (provides k-mer indices) │ │ └────────┬────────┘ └────────┬────────┘ │ │ │ Shared Kernel (GenomicPosition, QualityScore) │ │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ Variant │ │ Protein │ │ Context │◄──────────────────────────────┤ Context │ │ │ Partner (variant→structure) │ │ └────────┬────────┘ └─────────────────┘ │ │ ACL (translates variants to epigenetic events) │ ▼ ┌─────────────────┐ │ Epigenomic │ │ Context │ └────────┬────────┘ │ │ Customer-Supplier (epigenetic→drug response) │ ▼ ┌─────────────────┐ │ Pharmacogenomic │ │ Context │ └─────────────────┘ Legend: Customer-Supplier: → (upstream provides services to downstream) Shared Kernel: ├─┤ (shared domain model) Partner: ◄─► (mutual dependency) ACL: [A] (anti-corruption layer) ``` ## 1. Sequence Context **Module**: `kmer.rs` **Responsibility**: K-mer indexing, sequence sketching, and similarity search **Core Aggregates**: - `KmerIndex` - Root aggregate managing k-mer → position mappings - `MinHashSketch` - Aggregate for approximate sequence similarity **Key Types**: ```rust pub struct KmerEncoder { k: usize, alphabet_size: usize, } pub struct KmerIndex { k: usize, index: HashMap>, // k-mer hash → positions } pub struct MinHashSketch { k: usize, num_hashes: usize, signatures: Vec, } ``` **Published Events**: - `SequenceIndexed { sequence_id: String, kmer_count: usize }` - `SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }` **Domain Language**: - K-mer: substring of length k - Minimizer: canonical k-mer representation - Sketch: compressed sequence signature - Jaccard similarity: set overlap metric **Invariants**: - K-mer length must be 3 ≤ k ≤ 32 - MinHash signature size must be ≥ 1 - All k-mers normalized to canonical form (min(kmer, reverse_complement)) ## 2. Alignment Context **Module**: `alignment.rs` **Responsibility**: Sequence alignment using attention mechanisms and motif detection **Core Aggregates**: - `AttentionAligner` - Root aggregate for pairwise sequence alignment - `MotifScanner` - Aggregate for regulatory motif discovery **Key Types**: ```rust pub struct AttentionAligner { attention_service: Arc, gap_penalty: f64, match_bonus: f64, } pub struct MotifScanner { attention_service: Arc, min_score: f64, known_motifs: Vec, } pub struct AlignmentResult { pub score: f64, pub aligned_query: String, pub aligned_target: String, pub attention_weights: Vec>, } ``` **Published Events**: - `AlignmentCompleted { query_id: String, target_id: String, score: f64 }` - `MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }` **Domain Language**: - Alignment: optimal mapping between two sequences - Gap penalty: cost of insertions/deletions - Attention weight: learned similarity between positions - Motif: conserved sequence pattern (e.g., TATA box) - PWM (Position Weight Matrix): motif scoring matrix **Invariants**: - Gap penalty must be negative - Match bonus must be positive - Motif minimum score 0.0 ≤ score ≤ 1.0 - Alignment score monotonically decreases with gaps **Relationship with Sequence Context**: - **Type**: Customer-Supplier - **Direction**: Sequence → Alignment - **Integration**: Alignment consumes k-mer indices for fast seed-and-extend - **Translation**: None (direct dependency) ## 3. Variant Context **Module**: `variant.rs` **Responsibility**: Variant calling, genotyping, and population genetics **Core Aggregates**: - `VariantDatabase` - Root aggregate managing variant collection - `VariantCaller` - Service aggregate for variant detection **Key Types**: ```rust pub struct VariantCaller { min_quality: f64, min_depth: usize, gnn_service: Arc, } pub struct Variant { pub position: GenomicPosition, pub reference: String, pub alternate: String, pub quality: f64, pub genotype: Genotype, pub depth: usize, pub allele_frequency: Option, } pub struct VariantDatabase { variants: HashMap, graph_index: Option, // GNN-based variant relationships } pub enum Genotype { Homozygous(Allele), Heterozygous(Allele, Allele), } ``` **Published Events**: - `VariantCalled { position: GenomicPosition, variant: Variant }` - `GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }` - `PopulationFrequencyCalculated { variant_id: String, frequency: f64 }` **Domain Language**: - SNP (Single Nucleotide Polymorphism): single base change - Indel: insertion or deletion - Genotype: allele combination (0/0, 0/1, 1/1) - Allele frequency: population prevalence - Quality score: confidence in variant call (Phred scale) - Coverage depth: number of reads supporting variant **Invariants**: - Quality score ≥ 0 (Phred scale) - Coverage depth ≥ 1 - Allele frequency 0.0 ≤ AF ≤ 1.0 - Reference and alternate alleles must differ - Genotype alleles must match available alleles **Relationship with Alignment Context**: - **Type**: Customer-Supplier - **Direction**: Alignment → Variant - **Integration**: Variant caller uses alignment results to identify mismatches - **Translation**: Alignment gaps → insertion/deletion variants **Shared Kernel with Sequence Context**: - `GenomicPosition { chromosome: String, position: usize }` - `QualityScore(f64)` (Phred-scaled) - `Nucleotide` enum (A, C, G, T) ## 4. Protein Context **Module**: `protein.rs` **Responsibility**: Protein structure prediction and contact map generation **Core Aggregates**: - `ProteinGraph` - Root aggregate representing protein as graph - `ContactPredictor` - Service aggregate for 3D contact prediction **Key Types**: ```rust pub struct ProteinGraph { pub sequence: String, // amino acid sequence pub nodes: Vec, pub edges: Vec<(usize, usize, ContactType)>, } pub struct ContactPredictor { gnn_service: Arc, attention_service: Arc, distance_threshold: f64, // Ångströms } pub struct ContactPrediction { pub residue_i: usize, pub residue_j: usize, pub probability: f64, pub distance: Option, } pub enum ContactType { Backbone, SideChain, HydrogenBond, DisulfideBridge, } ``` **Published Events**: - `ProteinTranslated { gene_id: String, protein_sequence: String }` - `StructurePredicted { protein_id: String, contact_count: usize }` - `FoldingPathwayComputed { protein_id: String, energy: f64 }` **Domain Language**: - Amino acid: protein building block (20 standard types) - Residue: amino acid position in sequence - Contact: spatial proximity between residues (<8Å) - Secondary structure: local folding patterns (helix, sheet, loop) - Tertiary structure: 3D protein fold - Contact map: matrix of residue-residue distances **Invariants**: - Sequence length ≥ 1 - Contact probability 0.0 ≤ p ≤ 1.0 - Distance threshold > 0.0 (typically 8.0Å) - Contact pairs must be |i - j| ≥ 4 (exclude local contacts) **Relationship with Variant Context**: - **Type**: Partner (bidirectional) - **Direction**: Variant ↔ Protein - **Integration**: - Variant → Protein: coding variants cause amino acid changes - Protein → Variant: structural changes inform variant pathogenicity - **Translation**: - Variant ACL translates nucleotide changes to codon changes - Protein ACL maps structure disruption to clinical significance ## 5. Epigenomic Context **Module**: `epigenomics.rs` **Responsibility**: DNA methylation analysis and epigenetic age prediction **Core Aggregates**: - `EpigeneticIndex` - Root aggregate managing methylation sites - `HorvathClock` - Service aggregate for epigenetic age calculation **Key Types**: ```rust pub struct MethylationProfile { pub cpg_sites: HashMap, // position → beta value pub total_sites: usize, pub mean_methylation: f64, } pub struct HorvathClock { pub coefficients: HashMap, // CpG site → weight pub intercept: f64, } pub struct CpGSite { pub position: GenomicPosition, pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated) pub coverage: usize, } pub struct EpigeneticAge { pub chronological_age: Option, pub predicted_age: f64, pub acceleration: f64, // predicted - chronological } ``` **Published Events**: - `MethylationProfileGenerated { sample_id: String, site_count: usize }` - `EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }` - `DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }` **Domain Language**: - CpG site: cytosine-guanine dinucleotide (methylation target) - Beta value: methylation level (0 = unmethylated, 1 = fully methylated) - Epigenetic clock: age predictor based on methylation - Age acceleration: difference between epigenetic and chronological age - DMR (Differentially Methylated Region): region with changed methylation **Invariants**: - Beta value 0.0 ≤ β ≤ 1.0 - Coverage ≥ 1 - Horvath coefficients sum to meaningful scale - Age ≥ 0.0 **Relationship with Variant Context**: - **Type**: Anti-Corruption Layer - **Direction**: Variant → Epigenomic - **Integration**: Variants in regulatory regions affect methylation patterns - **Translation**: - ACL translates genetic variants to epigenetic effects - Maps SNPs → methylation quantitative trait loci (mQTL) - Prevents variant domain concepts from leaking into epigenetic model ## 6. Pharmacogenomic Context **Module**: `pharma.rs` **Responsibility**: Pharmacogenetic analysis and drug-gene interaction prediction **Core Aggregates**: - `DrugInteractionGraph` - Root aggregate representing drug-gene network - `StarAlleleCaller` - Service aggregate for haplotype phasing **Key Types**: ```rust pub struct StarAlleleCaller { gene_definitions: HashMap, min_coverage: usize, } pub struct StarAllele { pub gene: String, pub allele: String, // e.g., "*1", "*2", "*17" pub variants: Vec, pub function: AlleleFunction, } pub enum AlleleFunction { Normal, Increased, Decreased, NoFunction, } pub struct DrugInteractionGraph { pub nodes: Vec, pub edges: Vec<(usize, usize, InteractionType)>, } pub struct DrugResponse { pub drug: String, pub diplotype: Diplotype, pub phenotype: MetabolizerPhenotype, pub recommendation: ClinicalRecommendation, } pub enum MetabolizerPhenotype { UltraRapid, Rapid, Normal, Intermediate, Poor, } ``` **Published Events**: - `StarAlleleIdentified { gene: String, allele: String, diplotype: String }` - `DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }` - `InteractionDetected { drug1: String, drug2: String, severity: Severity }` **Domain Language**: - Star allele: named haplotype variant (e.g., CYP2D6*4) - Diplotype: pair of haplotypes (e.g., *1/*4) - Metabolizer phenotype: drug metabolism rate - Pharmacogene: gene affecting drug response - Drug-gene interaction: how genetics modulates drug efficacy/toxicity **Invariants**: - Diplotype must have exactly 2 alleles - Phenotype derivable from diplotype - Coverage ≥ minimum threshold for calling - All star allele variants must exist in variant database **Relationship with Epigenomic Context**: - **Type**: Customer-Supplier - **Direction**: Epigenomic → Pharmacogenomic - **Integration**: Methylation affects drug metabolism gene expression - **Translation**: Methylation beta values → gene expression levels → phenotype ## 7. Pipeline Context **Module**: `pipeline.rs` **Responsibility**: Orchestration of multi-stage genomic analysis workflow **Core Aggregates**: - `GenomicPipeline` - Root aggregate orchestrating all contexts **Key Types**: ```rust pub struct GenomicPipeline { pub kmer_encoder: KmerEncoder, pub aligner: AttentionAligner, pub variant_caller: VariantCaller, pub protein_predictor: ContactPredictor, pub methylation_analyzer: MethylationAnalyzer, pub pharma_analyzer: StarAlleleCaller, } pub struct PipelineConfig { pub k: usize, pub min_variant_quality: f64, pub min_coverage: usize, pub enable_protein_prediction: bool, pub enable_epigenetic_analysis: bool, pub enable_pharmacogenomics: bool, } pub struct AnalysisResult { pub sequence_stats: SequenceStats, pub variants: Vec, pub protein_structures: Vec, pub methylation_profile: Option, pub drug_responses: Vec, } ``` **Published Events**: - `PipelineStarted { sample_id: String, stages: Vec }` - `StageCompleted { stage: String, duration_ms: u64 }` - `PipelineCompleted { sample_id: String, total_duration_ms: u64 }` - `PipelineFailed { stage: String, error: String }` **Domain Language**: - Pipeline: directed acyclic graph of analysis stages - Stage: atomic analysis unit (alignment, variant calling, etc.) - Workflow: ordered execution of stages - Checkpoint: saved intermediate state - Provenance: lineage tracking of analysis steps **Invariants**: - All enabled stages must execute in dependency order - Failed stage halts downstream execution - All results traceable to input data and parameters **Anti-Corruption Layers**: The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details: 1. **Sequence ACL**: Translates k-mer indices to alignment seeds 2. **Alignment ACL**: Converts alignment gaps to variant candidates 3. **Variant ACL**: Maps variants to protein mutations 4. **Protein ACL**: Translates structure to functional predictions 5. **Epigenetic ACL**: Converts methylation to gene expression estimates 6. **Pharmacogenomic ACL**: Maps genotypes to clinical recommendations ## Context Relationship Matrix | From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline | |---------------|----------|-----------|---------|---------|------------|--------|----------| | Sequence | - | C-S | SK | SK | - | - | ACL | | Alignment | - | - | C-S | - | - | - | ACL | | Variant | - | - | - | Partner | ACL | - | ACL | | Protein | - | - | Partner | - | - | - | ACL | | Epigenomic | - | - | - | - | - | C-S | ACL | | Pharma | - | - | - | - | - | - | ACL | | Pipeline | C-S | C-S | C-S | C-S | C-S | C-S | - | **Legend**: - C-S: Customer-Supplier - SK: Shared Kernel - Partner: Partnership - ACL: Anti-Corruption Layer ## Integration Patterns ### 1. Event-Driven Integration Contexts communicate via domain events to maintain loose coupling: ```rust // Example: Variant Context publishes event pub enum DomainEvent { VariantCalled(VariantCalledEvent), ProteinStructurePredicted(ProteinPredictedEvent), // ... } // Pipeline Context subscribes and translates impl EventHandler for GenomicPipeline { fn handle(&mut self, event: DomainEvent) { match event { DomainEvent::VariantCalled(e) => { if e.variant.is_coding() { self.trigger_protein_analysis(e.variant); } } // ... } } } ``` ### 2. Shared Kernel Components Core domain types shared across contexts: ```rust // In types.rs (core domain) pub struct GenomicPosition { pub chromosome: String, pub position: usize, } pub struct QualityScore(pub f64); // Phred-scaled pub enum Nucleotide { A, C, G, T } pub struct GenomicRegion { pub chromosome: String, pub start: usize, pub end: usize, } ``` ### 3. Anti-Corruption Layer Example ```rust // Variant → Protein ACL pub struct VariantToProteinTranslator { codon_table: CodonTable, } impl VariantToProteinTranslator { pub fn translate_variant(&self, variant: &Variant) -> Option { // Prevents protein context from depending on variant implementation let codon_change = self.map_to_codon(variant)?; let aa_change = self.codon_table.translate(codon_change)?; Some(ProteinMutation { position: variant.position.position / 3, reference_aa: aa_change.reference, alternate_aa: aa_change.alternate, }) } } ``` ## Bounded Context Responsibilities Summary 1. **Sequence Context**: K-mer indexing and sequence similarity (foundation) 2. **Alignment Context**: Pairwise alignment and motif discovery 3. **Variant Context**: Variant calling and population genetics 4. **Protein Context**: Structure prediction and functional analysis 5. **Epigenomic Context**: Methylation profiling and age prediction 6. **Pharmacogenomic Context**: Drug-gene interactions and clinical recommendations 7. **Pipeline Context**: Workflow orchestration and result aggregation Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.