# Domain Model - Genomic Analysis Platform

## Overview

This document defines all entities, value objects, aggregates, and domain events across the seven bounded contexts. Each type is shown with its Rust signature and business rules.

## Core Domain Types (Shared Kernel)

### Value Objects

```rust
/// Genomic coordinate (immutable)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}
// Invariants:
// - chromosome must be valid (1-22, X, Y, MT)
// - position must be ≥ 1

/// Quality score using Phred scale: Q = -10 * log10(P_error)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityScore(pub f64);
// Invariants:
// - score ≥ 0
// - Q=10 means 10% error rate
// - Q=20 means 1% error rate
// - Q=30 means 0.1% error rate

/// Single nucleotide
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nucleotide {
    A, // Adenine
    C, // Cytosine
    G, // Guanine
    T, // Thymine
}

// Operations:
impl Nucleotide {
    pub fn complement(&self) -> Self;
    pub fn to_byte(&self) -> u8;
    pub fn from_byte(b: u8) -> Result<Self, Error>;
}

/// Genomic interval
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize,
    pub end: usize,
}
// Invariants:
// - start < end
// - start ≥ 1
// - Same chromosome validity rules as GenomicPosition

/// Amino acid single-letter code
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AminoAcid {
    A, C, D, E, F, G, H, I, K, L,
    M, N, P, Q, R, S, T, V, W, Y,
    Stop,
}
// Invariants:
// - 20 standard amino acids + stop codon
// - Each has specific properties (hydrophobic, charged, etc.)
```

## 1. Sequence Context Domain Model

### Aggregates

```rust
/// Aggregate Root: K-mer index for fast sequence search
pub struct KmerIndex {
    k: usize,
    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
    sequence_length: usize,
}
// Aggregate boundary: Controls all k-mer operations
// Invariants:
// - 3 ≤ k ≤ 32
// - All positions < sequence_length
// - K-mers stored in canonical form

impl KmerIndex {
    pub fn new(k: usize) -> Result<Self, Error>;
    pub fn index_sequence(&mut self, sequence: &[u8]) -> Result<(), Error>;
    pub fn query(&self, kmer: &[u8]) -> Vec<usize>;
    pub fn contains(&self, kmer: &[u8]) -> bool;
}

/// Aggregate: MinHash sketch for approximate similarity
pub struct MinHashSketch {
    k: usize,
    num_hashes: usize,
    signatures: Vec<u64>,
}
// Invariants:
// - num_hashes ≥ 1 (typically 128-1024)
// - signatures.len() == num_hashes
// - Signatures sorted in ascending order

impl MinHashSketch {
    pub fn new(k: usize, num_hashes: usize) -> Self;
    pub fn add_sequence(&mut self, sequence: &[u8]);
    pub fn jaccard_similarity(&self, other: &Self) -> f64;
}
```

### Entities

```rust
/// Entity: DNA sequence with metadata
#[derive(Debug, Clone)]
pub struct DnaSequence {
    pub id: String, // Identity
    pub sequence: Vec<u8>,
    pub quality_scores: Option<Vec<QualityScore>>,
    pub created_at: DateTime<Utc>,
}
// Invariants:
// - id must be unique
// - sequence contains only A, C, G, T, N
// - if quality_scores.is_some(), length must equal sequence length

impl DnaSequence {
    pub fn reverse_complement(&self) -> Self;
    pub fn gc_content(&self) -> f64;
    pub fn length(&self) -> usize;
}
```

### Value Objects

```rust
/// K-mer encoder configuration
#[derive(Debug, Clone, Copy)]
pub struct KmerConfig {
    pub k: usize,
    pub alphabet_size: usize,
}
// Invariants:
// - k ≥ 3
// - alphabet_size typically 4 (DNA) or 20 (protein)
```

### Domain Events

```rust
pub enum SequenceEvent {
    SequenceIndexed {
        sequence_id: String,
        kmer_count: usize,
        timestamp: DateTime<Utc>,
    },
    SimilarSequenceFound {
        query_id: String,
        match_id: String,
        similarity: f64,
        timestamp: DateTime<Utc>,
    },
}
```

## 2. Alignment Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Attention-based sequence aligner
pub struct AttentionAligner {
    attention_service: Arc,
    gap_penalty: f64,
    match_bonus: f64,
}
// Invariants:
// - gap_penalty < 0
// - match_bonus > 0
// - |gap_penalty| < match_bonus (a gap never outweighs a match)

impl AttentionAligner {
    pub fn align(&self, query: &[u8], target: &[u8]) -> Result<AlignmentResult, Error>;
    pub fn batch_align(&self, pairs: Vec<(&[u8], &[u8])>) -> Result<Vec<AlignmentResult>, Error>;
}

/// Aggregate: Motif scanner for regulatory elements
pub struct MotifScanner {
    attention_service: Arc,
    min_score: f64,
    known_motifs: Vec<MotifPattern>,
}
// Invariants:
// - 0.0 ≤ min_score ≤ 1.0
// - All motif patterns valid (length ≥ 4)

impl MotifScanner {
    pub fn scan(&self, sequence: &[u8]) -> Vec<MotifMatch>;
    pub fn add_motif(&mut self, pattern: MotifPattern);
}
```

### Value Objects

```rust
/// Alignment result (immutable)
#[derive(Debug, Clone)]
pub struct AlignmentResult {
    pub score: f64,
    pub aligned_query: String,
    pub aligned_target: String,
    pub attention_weights: Vec<Vec<f64>>,
    pub identity: f64, // Fraction of exact matches
    pub gaps: usize,
}
// Invariants:
// - aligned_query.len() == aligned_target.len()
// - 0.0 ≤ identity ≤ 1.0
// - attention_weights dimensions match alignment length

/// Motif pattern definition
#[derive(Debug, Clone)]
pub struct MotifPattern {
    pub name: String,
    pub consensus: String,  // IUPAC nucleotide codes
    pub pwm: Vec<[f64; 4]>, // Position Weight Matrix
}
// Invariants:
// - consensus.len() == pwm.len()
// - Each PWM position sums to ~1.0
// - pwm.len() ≥ 4

/// Motif match location
#[derive(Debug, Clone)]
pub struct MotifMatch {
    pub motif_name: String,
    pub position: usize,
    pub score: f64,
    pub strand: Strand,
}

#[derive(Debug, Clone, Copy)]
pub enum Strand {
    Forward,
    Reverse,
}
```

### Domain Events

```rust
pub enum AlignmentEvent {
    AlignmentCompleted {
        query_id: String,
        target_id: String,
        score: f64,
        timestamp: DateTime<Utc>,
    },
    MotifDetected {
        sequence_id: String,
        motif: String,
        position: usize,
        score: f64,
        timestamp: DateTime<Utc>,
    },
}
```

## 3. Variant Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Collection of genetic variants
pub struct VariantDatabase {
    variants: HashMap<GenomicPosition, Variant>,
    graph_index: Option,
    population_frequencies: HashMap<GenomicPosition, f64>,
}
// Aggregate boundary: Ensures variant consistency and relationships
// Invariants:
// - No duplicate positions
// - All frequencies 0.0 ≤ f ≤ 1.0
// - Graph index consistent with variant set

impl VariantDatabase {
    pub fn add_variant(&mut self, variant: Variant) -> Result<(), Error>;
    pub fn get_variant(&self, pos: &GenomicPosition) -> Option<&Variant>;
    pub fn variants_in_region(&self, region: &GenomicRegion) -> Vec<&Variant>;
    pub fn update_frequency(&mut self, pos: &GenomicPosition, freq: f64);
}

/// Service Aggregate: Variant calling engine
pub struct VariantCaller {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc,
}
// Invariants:
// - min_quality ≥ 0
// - min_depth ≥ 1

impl VariantCaller {
    pub fn call_variants(&self, reads: &[Read], reference: &[u8]) -> Result<Vec<Variant>, Error>;
    pub fn genotype(&self, variant: &Variant, reads: &[Read]) -> Result<Genotype, Error>;
}
```

### Entities

```rust
/// Entity: Genetic variant with identity at genomic position
#[derive(Debug, Clone)]
pub struct Variant {
    pub position: GenomicPosition, // Identity (part of)
    pub reference: String,
    pub alternate: String,
    pub variant_type: VariantType,
    pub quality: f64,
    pub genotype: Genotype,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
    pub annotations: Vec<Annotation>,
}
// Invariants:
// - reference != alternate
// - quality ≥ 0
// - depth ≥ 1
// - if allele_frequency.is_some(), 0.0 ≤ f ≤ 1.0
// - variant_type consistent with reference/alternate

impl Variant {
    pub fn is_snp(&self) -> bool;
    pub fn is_indel(&self) -> bool;
    pub fn is_coding(&self) -> bool;
    pub fn clinical_significance(&self) -> ClinicalSignificance;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VariantType {
    SNP,
    Insertion,
    Deletion,
    MNP, // Multi-nucleotide polymorphism
    Complex,
}
```

### Value Objects
```rust
/// Genotype representation
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Genotype {
    Homozygous(Allele),
    Heterozygous(Allele, Allele),
}
// Invariants:
// - Heterozygous alleles must differ
// - Alleles must match variant's reference/alternate

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Allele {
    Reference,
    Alternate,
}

/// Variant annotation
#[derive(Debug, Clone)]
pub struct Annotation {
    pub gene: String,
    pub consequence: Consequence,
    pub impact: Impact,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Consequence {
    Synonymous,
    Missense,
    Nonsense,
    FrameShift,
    SpliceSite,
    Regulatory,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Impact {
    High,
    Moderate,
    Low,
    Modifier,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClinicalSignificance {
    Benign,
    LikelyBenign,
    VUS, // Variant of Uncertain Significance
    LikelyPathogenic,
    Pathogenic,
}
```

### Domain Events

```rust
pub enum VariantEvent {
    VariantCalled {
        position: GenomicPosition,
        variant: Variant,
        timestamp: DateTime<Utc>,
    },
    GenotypeUpdated {
        sample_id: String,
        position: GenomicPosition,
        genotype: Genotype,
        timestamp: DateTime<Utc>,
    },
    PopulationFrequencyCalculated {
        variant_id: String,
        frequency: f64,
        population: String,
        timestamp: DateTime<Utc>,
    },
}
```

## 4. Protein Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Protein represented as graph
pub struct ProteinGraph {
    pub id: String,
    pub sequence: String, // Amino acid sequence
    pub nodes: Vec<Residue>,
    pub edges: Vec<Contact>,
    pub secondary_structure: Vec<SecondaryStructureElement>,
}
// Aggregate boundary: Manages all structural relationships
// Invariants:
// - nodes.len() == sequence.len()
// - All edge indices < nodes.len()
// - No duplicate contacts

impl ProteinGraph {
    pub fn from_sequence(sequence: String) -> Self;
    pub fn add_contact(&mut self, i: usize, j: usize, contact_type: ContactType);
    pub fn contact_map(&self) -> Vec>;
    pub fn fold_energy(&self) -> f64;
}

/// Service Aggregate: 3D contact prediction
pub struct ContactPredictor {
    gnn_service: Arc,
    attention_service: Arc,
    distance_threshold: f64,
}
// Invariants:
// - distance_threshold > 0.0 (typically 8.0 Ångströms)

impl ContactPredictor {
    pub fn predict_contacts(&self, sequence: &str) -> Result<Vec<ContactPrediction>, Error>;
    pub fn predict_structure(&self, sequence: &str) -> Result<ProteinGraph, Error>;
}
```

### Entities

```rust
/// Entity: Amino acid residue in protein
#[derive(Debug, Clone)]
pub struct Residue {
    pub position: usize, // Identity
    pub amino_acid: AminoAcid,
    pub phi_angle: Option<f64>, // Backbone dihedral
    pub psi_angle: Option<f64>, // Backbone dihedral
    pub secondary_structure: Option<SecondaryStructure>,
}
// Invariants:
// - position ≥ 1
// - -180° ≤ phi, psi ≤ 180°
```

### Value Objects

```rust
/// Contact between residues
#[derive(Debug, Clone)]
pub struct Contact {
    pub residue_i: usize,
    pub residue_j: usize,
    pub contact_type: ContactType,
    pub distance: Option<f64>, // Ångströms
}
// Invariants:
// - residue_i < residue_j (ordered)
// - |residue_i - residue_j| ≥ 4 (exclude local contacts)
// - if distance.is_some(), distance > 0.0

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContactType {
    Backbone,
    SideChain,
    HydrogenBond,
    DisulfideBridge,
}

/// Contact prediction with confidence
#[derive(Debug, Clone)]
pub struct ContactPrediction {
    pub residue_i: usize,
    pub residue_j: usize,
    pub probability: f64,
    pub distance: Option<f64>,
}
// Invariants:
// - 0.0 ≤ probability ≤ 1.0

/// Secondary structure element
#[derive(Debug, Clone)]
pub struct SecondaryStructureElement {
    pub start: usize,
    pub end: usize,
    pub structure_type: SecondaryStructure,
}
// Invariants:
// - start < end

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecondaryStructure {
    Helix, // α-helix
    Sheet, // β-sheet
    Loop,  // Random coil
    Turn,  // β-turn
}

/// Protein mutation
#[derive(Debug, Clone)]
pub struct ProteinMutation {
    pub position: usize,
    pub reference_aa: AminoAcid,
    pub alternate_aa: AminoAcid,
    pub structural_impact: f64, // 0.0-1.0
}
```

### Domain Events

```rust
pub enum ProteinEvent {
    ProteinTranslated {
        gene_id: String,
        protein_sequence: String,
        timestamp: DateTime<Utc>,
    },
    StructurePredicted {
        protein_id: String,
        contact_count: usize,
        confidence: f64,
        timestamp: DateTime<Utc>,
    },
}
```

## 5. Epigenomic Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Epigenetic modification index
pub struct EpigeneticIndex {
    cpg_sites: HashMap<GenomicPosition, CpGSite>,
    methylation_profile: MethylationProfile,
}
// Aggregate boundary: Manages methylation data consistency
// Invariants:
// - All CpG sites have valid genomic positions
// - Beta values 0.0 ≤ β ≤ 1.0

impl EpigeneticIndex {
    pub fn add_site(&mut self, site: CpGSite) -> Result<(), Error>;
    pub fn get_profile(&self) -> &MethylationProfile;
    pub fn differential_methylation(&self, other: &Self) -> Vec<DifferentialRegion>;
}

/// Service Aggregate: Epigenetic age calculator
pub struct HorvathClock {
    coefficients: HashMap,
    intercept: f64,
}
// Invariants:
// - At least 353 CpG sites (original Horvath model)
// - Coefficients normalized

impl HorvathClock {
    pub fn predict_age(&self, profile: &MethylationProfile) -> Result<EpigeneticAge, Error>;
}
```

### Entities

```rust
/// Entity: Methylation profile for sample
#[derive(Debug, Clone)]
pub struct MethylationProfile {
    pub sample_id: String, // Identity
    pub cpg_sites: HashMap,
    pub total_sites: usize,
    pub mean_methylation: f64,
    pub created_at: DateTime<Utc>,
}
// Invariants:
// - cpg_sites.len() ≤ total_sites
// - All beta values 0.0 ≤ β ≤ 1.0
// - mean_methylation = average of all beta values

impl MethylationProfile {
    pub fn global_methylation(&self) -> f64;
    pub fn region_methylation(&self, region: &GenomicRegion) -> f64;
}
```

### Value Objects

```rust
/// CpG methylation site
#[derive(Debug, Clone)]
pub struct CpGSite {
    pub position: GenomicPosition,
    pub beta_value: f64, // 0.0 = unmethylated, 1.0 = fully methylated
    pub coverage: usize,
    pub quality: QualityScore,
}
// Invariants:
// - 0.0 ≤ beta_value ≤ 1.0
// - coverage ≥ 1

/// Epigenetic age prediction
#[derive(Debug, Clone)]
pub struct EpigeneticAge {
    pub chronological_age: Option<f64>,
    pub predicted_age: f64,
    pub acceleration: f64, // predicted - chronological
    pub confidence_interval: (f64, f64),
}
// Invariants:
// - predicted_age ≥ 0.0
// - confidence_interval.0 < confidence_interval.1

/// Differentially methylated region
#[derive(Debug, Clone)]
pub struct DifferentialRegion {
    pub region: GenomicRegion,
    pub delta_beta: f64,
    pub p_value: f64,
}
// Invariants:
// - -1.0 ≤ delta_beta ≤ 1.0
// - 0.0 ≤ p_value ≤ 1.0
```

### Domain Events

```rust
pub enum EpigenomicEvent {
    MethylationProfileGenerated {
        sample_id: String,
        site_count: usize,
        timestamp: DateTime<Utc>,
    },
    EpigeneticAgeCalculated {
        sample_id: String,
        age: f64,
        acceleration: f64,
        timestamp: DateTime<Utc>,
    },
}
```

## 6. Pharmacogenomic Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Drug-gene interaction network
pub struct DrugInteractionGraph {
    nodes: Vec,
    edges: Vec,
    phenotype_map: HashMap<Diplotype, MetabolizerPhenotype>,
}
// Aggregate boundary: Manages pharmacogenetic relationships
// Invariants:
// - All edge indices valid
// - All diplotypes map to phenotypes

impl DrugInteractionGraph {
    pub fn add_interaction(&mut self, interaction: Interaction);
    pub fn predict_response(&self, drug: &str, diplotype: &Diplotype) -> DrugResponse;
}

/// Service Aggregate: Star allele haplotype caller
pub struct StarAlleleCaller {
    gene_definitions: HashMap,
    min_coverage: usize,
}
// Invariants:
// - min_coverage ≥ 1
// - All genes have valid definitions

impl StarAlleleCaller {
    pub fn call_alleles(&self, variants: &[Variant], gene: &str) -> Result<Diplotype, Error>;
}
```

### Entities

```rust
/// Entity: Star allele definition
#[derive(Debug, Clone)]
pub struct StarAllele {
    pub id: String, // Identity (e.g., "CYP2D6*4")
    pub gene: String,
    pub allele: String,
    pub variants: Vec,
    pub function: AlleleFunction,
    pub activity_score: f64,
}
// Invariants:
// - id format: "{gene}*{allele_number}"
// - 0.0 ≤ activity_score ≤ 2.0

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}
```

### Value Objects

```rust
/// Diplotype (pair of haplotypes)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Diplotype {
    pub allele1: String,
    pub allele2: String,
}
// Invariants:
// - Both alleles non-empty
// - Canonical ordering (allele1 ≤ allele2 lexicographically)

/// Metabolizer phenotype derived from diplotype
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetabolizerPhenotype {
    UltraRapid,   // Activity score > 2.0
    Rapid,        // Activity score 1.5-2.0
    Normal,       // Activity score 1.0-1.5
    Intermediate, // Activity score 0.5-1.0
    Poor,         // Activity score < 0.5
}

impl MetabolizerPhenotype {
    pub fn from_activity_score(score: f64) -> Self;
}

/// Drug response prediction
#[derive(Debug, Clone)]
pub struct DrugResponse {
    pub drug: String,
    pub diplotype: Diplotype,
    pub phenotype: MetabolizerPhenotype,
    pub recommendation: ClinicalRecommendation,
}

#[derive(Debug, Clone)]
pub struct ClinicalRecommendation {
    pub recommendation_type: RecommendationType,
    pub dosage_adjustment: Option<f64>, // Multiplier
    pub alternative_drug: Option<String>,
    pub cpic_level: CpicLevel, // CPIC evidence level
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecommendationType {
    Standard,
    IncreaseDose,
    DecreaseDose,
    AlternativeDrug,
    Contraindicated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CpicLevel {
    A, // High evidence
    B, // Moderate evidence
    C, // Low evidence
    D, // Preclinical evidence
}

/// Drug-gene interaction
#[derive(Debug, Clone)]
pub struct Interaction {
    pub drug: String,
    pub gene: String,
    pub interaction_type: InteractionType,
    pub strength: f64, // 0.0-1.0
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InteractionType {
    Metabolism,
    Transport,
    Target,
    Toxicity,
}
```

### Domain Events

```rust
pub enum PharmacogenomicEvent {
    StarAlleleIdentified {
        gene: String,
        allele: String,
        diplotype: String,
        timestamp: DateTime<Utc>,
    },
    DrugResponsePredicted {
        drug: String,
        phenotype: MetabolizerPhenotype,
        recommendation: RecommendationType,
        timestamp: DateTime<Utc>,
    },
}
```

## 7. Pipeline Context Domain Model

### Aggregates

```rust
/// Aggregate Root: Complete genomic analysis workflow
pub struct GenomicPipeline {
    pub id: String,
    pub config: PipelineConfig,
    stages: Vec<PipelineStage>,
    state: PipelineState,
    results: AnalysisResult,
}
// Aggregate boundary: Orchestrates all analysis contexts
// Invariants:
// - Stages execute in dependency order
// - No stage runs until dependencies complete
// - Failed stage prevents downstream execution

impl GenomicPipeline {
    pub fn new(config: PipelineConfig) -> Self;
    pub fn run(&mut self, input: SequenceData) -> Result<AnalysisResult, Error>;
    pub fn run_stage(&mut self, stage: &str) -> Result<(), Error>;
    pub fn checkpoint(&self) -> Result<(), Error>;
    pub fn restore(checkpoint_id: &str) -> Result<Self, Error>;
}
```

### Value Objects

```rust
/// Pipeline configuration
#[derive(Debug, Clone)]
pub struct PipelineConfig {
    pub k: usize,
    pub min_variant_quality: f64,
    pub min_coverage: usize,
    pub enable_protein_prediction: bool,
    pub enable_epigenetic_analysis: bool,
    pub enable_pharmacogenomics: bool,
}

/// Analysis stage definition
#[derive(Debug, Clone)]
pub struct PipelineStage {
    pub name: String,
    pub dependencies: Vec<String>,
    pub timeout: Duration,
    pub retries: usize,
}

/// Pipeline execution state
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipelineState {
    Idle,
    Running,
    Completed,
    Failed,
}

/// Complete analysis result
#[derive(Debug, Clone)]
pub struct AnalysisResult {
    pub sequence_stats: SequenceStats,
    pub variants: Vec<Variant>,
    pub protein_structures: Vec<ProteinGraph>,
    pub methylation_profile: Option<MethylationProfile>,
    pub drug_responses: Vec<DrugResponse>,
    pub execution_time: Duration,
}

#[derive(Debug, Clone)]
pub struct SequenceStats {
    pub total_length: usize,
    pub gc_content: f64,
    pub n_count: usize,
    pub quality_mean: f64,
}
```

### Domain Events

```rust
pub enum PipelineEvent {
    PipelineStarted {
        pipeline_id: String,
        stages: Vec<String>,
        timestamp: DateTime<Utc>,
    },
    StageCompleted {
        pipeline_id: String,
        stage: String,
        duration_ms: u64,
        timestamp: DateTime<Utc>,
    },
    PipelineCompleted {
        pipeline_id: String,
        total_duration_ms: u64,
        timestamp: DateTime<Utc>,
    },
    PipelineFailed {
        pipeline_id: String,
        stage: String,
        error: String,
        timestamp: DateTime<Utc>,
    },
}
```

## Business Rules Summary

### Cross-Cutting Rules

1. **Quality Thresholds**: All data must meet minimum quality scores
2. **Validation**: Input data is validated at bounded context entry points
3. **Traceability**: All results are traceable to source data and parameters
4. **Consistency**: Aggregates maintain internal consistency invariants

### Context-Specific Rules

**Sequence Context**:
- K-mer indices use canonical (lexicographically minimal) representation
- MinHash sketches keep exactly `num_hashes` signatures so similarity estimates stay comparable

**Alignment Context**:
- Gap penalty magnitude stays below the match bonus
- Motif matches require a minimum conservation score

**Variant Context**:
- Variants are only called above quality and coverage thresholds
- Allele frequencies at a site sum to 1.0 across all alleles
- Clinical significance is based on ClinVar/evidence database

**Protein Context**:
- Contacts only between residues separated by ≥4 positions
- Secondary structure assignments are mutually exclusive

**Epigenomic Context**:
- Beta values strictly bounded to [0.0, 1.0]
- Epigenetic age is non-negative

**Pharmacogenomic Context**:
- Diplotypes sorted in canonical order
- Phenotypes deterministically derived from diplotype activity scores
- CPIC recommendations follow evidence-based guidelines

**Pipeline Context**:
- Stage execution respects the dependency DAG
- Checkpoints enable recovery from failures
- Configuration is immutable during a pipeline run

## Aggregate Invariants

Each aggregate root enforces these invariants:

1. **Identity**: Unique identifier within bounded context
2. **Completeness**: All required fields populated
3. **Consistency**: Related entities maintain referential integrity
4. **Validity**: All values within acceptable ranges
5. **Atomicity**: Changes commit or roll back as a unit

These invariants ensure domain model correctness across all bounded contexts.
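
The **Validity** invariant can be illustrated with validating constructors. The following is a minimal sketch, not the platform's implementation: it assumes a hypothetical `DomainError` enum (the document only refers to an unspecified `Error`), hypothetical `new` constructors, and private fields so that values cannot be built without passing the checks. It covers three rules stated in this document: chromosome and position validity, the Phred relation Q = -10 * log10(P_error), and canonical diplotype ordering.

```rust
/// Hypothetical error type for this sketch; the document only names `Error`.
#[derive(Debug, Clone, PartialEq)]
pub enum DomainError {
    InvalidChromosome(String),
    InvalidPosition(usize),
    InvalidQuality(f64),
    EmptyAllele,
}

/// Genomic coordinate whose fields are private so the invariants always hold.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    chromosome: String,
    position: usize,
}

impl GenomicPosition {
    /// Accepts chromosomes 1-22, X, Y, MT and 1-based positions only.
    pub fn new(chromosome: &str, position: usize) -> Result<Self, DomainError> {
        // Round-trip through `to_string` so that forms like "01" or "+5" are rejected.
        let is_autosome = chromosome
            .parse::<u8>()
            .ok()
            .filter(|n| (1..=22).contains(n))
            .map_or(false, |n| n.to_string() == chromosome);
        if !(is_autosome || matches!(chromosome, "X" | "Y" | "MT")) {
            return Err(DomainError::InvalidChromosome(chromosome.to_string()));
        }
        if position == 0 {
            return Err(DomainError::InvalidPosition(position));
        }
        Ok(Self { chromosome: chromosome.to_string(), position })
    }
}

/// Phred-scaled quality score.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityScore(f64);

impl QualityScore {
    /// Rejects negative, NaN, and infinite scores.
    pub fn new(score: f64) -> Result<Self, DomainError> {
        if score.is_finite() && score >= 0.0 {
            Ok(Self(score))
        } else {
            Err(DomainError::InvalidQuality(score))
        }
    }

    /// Inverts Q = -10 * log10(P_error) to recover the error probability.
    pub fn error_probability(&self) -> f64 {
        10f64.powf(-self.0 / 10.0)
    }
}

/// Pair of star-allele names stored in canonical (lexicographic) order.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Diplotype {
    allele1: String,
    allele2: String,
}

impl Diplotype {
    /// Sorts the two alleles so that allele1 ≤ allele2; rejects empty names.
    pub fn new(a: &str, b: &str) -> Result<Self, DomainError> {
        if a.is_empty() || b.is_empty() {
            return Err(DomainError::EmptyAllele);
        }
        let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
        Ok(Self { allele1: lo.to_string(), allele2: hi.to_string() })
    }
}
```

With constructors like these, `Diplotype::new("*4", "*1")` and `Diplotype::new("*1", "*4")` compare equal and hash identically, which is what lets a diplotype serve as a stable lookup key. Keeping the fields private is a design choice of the sketch; the signatures above expose them as `pub`.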