git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
20 KiB
Bounded Context Map - Genomic Analysis Platform
Context Map Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ GENOMIC ANALYSIS PLATFORM │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐
│ Pipeline │ ◄───────── Orchestration Layer
│ Context │
└────────┬─────────┘
│ ACL (maps domain events to pipeline commands)
│
┌────────┴─────────────────────────────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Sequence │ Customer-Supplier │ Alignment │
│ Context ├──────────────────────────────►│ Context │
│ │ (provides k-mer indices) │ │
└────────┬────────┘ └────────┬────────┘
│ │
│ Shared Kernel (GenomicPosition, QualityScore) │
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Variant │ │ Protein │
│ Context │◄──────────────────────────────┤ Context │
│ │ Partner (variant→structure) │ │
└────────┬────────┘ └─────────────────┘
│
│ ACL (translates variants to epigenetic events)
│
▼
┌─────────────────┐
│ Epigenomic │
│ Context │
└────────┬────────┘
│
│ Customer-Supplier (epigenetic→drug response)
│
▼
┌─────────────────┐
│ Pharmacogenomic │
│ Context │
└─────────────────┘
Legend:
Customer-Supplier: → (upstream provides services to downstream)
Shared Kernel: ├─┤ (shared domain model)
Partner: ◄─► (mutual dependency)
ACL: [A] (anti-corruption layer)
1. Sequence Context
Module: kmer.rs
Responsibility: K-mer indexing, sequence sketching, and similarity search
Core Aggregates:
KmerIndex- Root aggregate managing k-mer → position mappingsMinHashSketch- Aggregate for approximate sequence similarity
Key Types:
pub struct KmerEncoder {
k: usize,
alphabet_size: usize,
}
pub struct KmerIndex {
k: usize,
index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
}
pub struct MinHashSketch {
k: usize,
num_hashes: usize,
signatures: Vec<u64>,
}
Published Events:
SequenceIndexed { sequence_id: String, kmer_count: usize }SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }
Domain Language:
- K-mer: substring of length k
- Minimizer: canonical k-mer representation
- Sketch: compressed sequence signature
- Jaccard similarity: set overlap metric
Invariants:
- K-mer length must be 3 ≤ k ≤ 32
- MinHash signature size must be ≥ 1
- All k-mers normalized to canonical form (min(kmer, reverse_complement))
2. Alignment Context
Module: alignment.rs
Responsibility: Sequence alignment using attention mechanisms and motif detection
Core Aggregates:
AttentionAligner- Root aggregate for pairwise sequence alignmentMotifScanner- Aggregate for regulatory motif discovery
Key Types:
pub struct AttentionAligner {
attention_service: Arc<AttentionService>,
gap_penalty: f64,
match_bonus: f64,
}
pub struct MotifScanner {
attention_service: Arc<AttentionService>,
min_score: f64,
known_motifs: Vec<MotifPattern>,
}
pub struct AlignmentResult {
pub score: f64,
pub aligned_query: String,
pub aligned_target: String,
pub attention_weights: Vec<Vec<f64>>,
}
Published Events:
AlignmentCompleted { query_id: String, target_id: String, score: f64 }MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }
Domain Language:
- Alignment: optimal mapping between two sequences
- Gap penalty: cost of insertions/deletions
- Attention weight: learned similarity between positions
- Motif: conserved sequence pattern (e.g., TATA box)
- PWM (Position Weight Matrix): motif scoring matrix
Invariants:
- Gap penalty must be negative
- Match bonus must be positive
- Motif minimum score 0.0 ≤ score ≤ 1.0
- Alignment score monotonically decreases with gaps
Relationship with Sequence Context:
- Type: Customer-Supplier
- Direction: Sequence → Alignment
- Integration: Alignment consumes k-mer indices for fast seed-and-extend
- Translation: None (direct dependency)
3. Variant Context
Module: variant.rs
Responsibility: Variant calling, genotyping, and population genetics
Core Aggregates:
VariantDatabase- Root aggregate managing variant collectionVariantCaller- Service aggregate for variant detection
Key Types:
pub struct VariantCaller {
min_quality: f64,
min_depth: usize,
gnn_service: Arc<GnnService>,
}
pub struct Variant {
pub position: GenomicPosition,
pub reference: String,
pub alternate: String,
pub quality: f64,
pub genotype: Genotype,
pub depth: usize,
pub allele_frequency: Option<f64>,
}
pub struct VariantDatabase {
variants: HashMap<GenomicPosition, Variant>,
graph_index: Option<GraphIndex>, // GNN-based variant relationships
}
pub enum Genotype {
Homozygous(Allele),
Heterozygous(Allele, Allele),
}
Published Events:
VariantCalled { position: GenomicPosition, variant: Variant }GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }PopulationFrequencyCalculated { variant_id: String, frequency: f64 }
Domain Language:
- SNP (Single Nucleotide Polymorphism): single base change
- Indel: insertion or deletion
- Genotype: allele combination (0/0, 0/1, 1/1)
- Allele frequency: population prevalence
- Quality score: confidence in variant call (Phred scale)
- Coverage depth: number of reads supporting variant
Invariants:
- Quality score ≥ 0 (Phred scale)
- Coverage depth ≥ 1
- Allele frequency 0.0 ≤ AF ≤ 1.0
- Reference and alternate alleles must differ
- Genotype alleles must match available alleles
Relationship with Alignment Context:
- Type: Customer-Supplier
- Direction: Alignment → Variant
- Integration: Variant caller uses alignment results to identify mismatches
- Translation: Alignment gaps → insertion/deletion variants
Shared Kernel with Sequence Context:
GenomicPosition { chromosome: String, position: usize }QualityScore(f64)(Phred-scaled)Nucleotideenum (A, C, G, T)
4. Protein Context
Module: protein.rs
Responsibility: Protein structure prediction and contact map generation
Core Aggregates:
ProteinGraph- Root aggregate representing protein as graphContactPredictor- Service aggregate for 3D contact prediction
Key Types:
pub struct ProteinGraph {
pub sequence: String, // amino acid sequence
pub nodes: Vec<AminoAcid>,
pub edges: Vec<(usize, usize, ContactType)>,
}
pub struct ContactPredictor {
gnn_service: Arc<GnnService>,
attention_service: Arc<AttentionService>,
distance_threshold: f64, // Ångströms
}
pub struct ContactPrediction {
pub residue_i: usize,
pub residue_j: usize,
pub probability: f64,
pub distance: Option<f64>,
}
pub enum ContactType {
Backbone,
SideChain,
HydrogenBond,
DisulfideBridge,
}
Published Events:
ProteinTranslated { gene_id: String, protein_sequence: String }StructurePredicted { protein_id: String, contact_count: usize }FoldingPathwayComputed { protein_id: String, energy: f64 }
Domain Language:
- Amino acid: protein building block (20 standard types)
- Residue: amino acid position in sequence
- Contact: spatial proximity between residues (<8Å)
- Secondary structure: local folding patterns (helix, sheet, loop)
- Tertiary structure: 3D protein fold
- Contact map: matrix of residue-residue distances
Invariants:
- Sequence length ≥ 1
- Contact probability 0.0 ≤ p ≤ 1.0
- Distance threshold > 0.0 (typically 8.0Å)
- Contact pairs must be |i - j| ≥ 4 (exclude local contacts)
Relationship with Variant Context:
- Type: Partner (bidirectional)
- Direction: Variant ↔ Protein
- Integration:
- Variant → Protein: coding variants cause amino acid changes
- Protein → Variant: structural changes inform variant pathogenicity
- Translation:
- Variant ACL translates nucleotide changes to codon changes
- Protein ACL maps structure disruption to clinical significance
5. Epigenomic Context
Module: epigenomics.rs
Responsibility: DNA methylation analysis and epigenetic age prediction
Core Aggregates:
EpigeneticIndex- Root aggregate managing methylation sitesHorvathClock- Service aggregate for epigenetic age calculation
Key Types:
pub struct MethylationProfile {
pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
pub total_sites: usize,
pub mean_methylation: f64,
}
pub struct HorvathClock {
pub coefficients: HashMap<String, f64>, // CpG site → weight
pub intercept: f64,
}
pub struct CpGSite {
pub position: GenomicPosition,
pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
pub coverage: usize,
}
pub struct EpigeneticAge {
pub chronological_age: Option<f64>,
pub predicted_age: f64,
pub acceleration: f64, // predicted - chronological
}
Published Events:
MethylationProfileGenerated { sample_id: String, site_count: usize }EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }
Domain Language:
- CpG site: cytosine-guanine dinucleotide (methylation target)
- Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
- Epigenetic clock: age predictor based on methylation
- Age acceleration: difference between epigenetic and chronological age
- DMR (Differentially Methylated Region): region with changed methylation
Invariants:
- Beta value 0.0 ≤ β ≤ 1.0
- Coverage ≥ 1
- Horvath coefficients sum to meaningful scale
- Age ≥ 0.0
Relationship with Variant Context:
- Type: Anti-Corruption Layer
- Direction: Variant → Epigenomic
- Integration: Variants in regulatory regions affect methylation patterns
- Translation:
- ACL translates genetic variants to epigenetic effects
- Maps SNPs → methylation quantitative trait loci (mQTL)
- Prevents variant domain concepts from leaking into epigenetic model
6. Pharmacogenomic Context
Module: pharma.rs
Responsibility: Pharmacogenetic analysis and drug-gene interaction prediction
Core Aggregates:
DrugInteractionGraph- Root aggregate representing drug-gene networkStarAlleleCaller- Service aggregate for haplotype phasing
Key Types:
pub struct StarAlleleCaller {
gene_definitions: HashMap<String, GeneDefinition>,
min_coverage: usize,
}
pub struct StarAllele {
pub gene: String,
pub allele: String, // e.g., "*1", "*2", "*17"
pub variants: Vec<Variant>,
pub function: AlleleFunction,
}
pub enum AlleleFunction {
Normal,
Increased,
Decreased,
NoFunction,
}
pub struct DrugInteractionGraph {
pub nodes: Vec<DrugGeneNode>,
pub edges: Vec<(usize, usize, InteractionType)>,
}
pub struct DrugResponse {
pub drug: String,
pub diplotype: Diplotype,
pub phenotype: MetabolizerPhenotype,
pub recommendation: ClinicalRecommendation,
}
pub enum MetabolizerPhenotype {
UltraRapid,
Rapid,
Normal,
Intermediate,
Poor,
}
Published Events:
StarAlleleIdentified { gene: String, allele: String, diplotype: String }DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }InteractionDetected { drug1: String, drug2: String, severity: Severity }
Domain Language:
- Star allele: named haplotype variant (e.g., CYP2D6*4)
- Diplotype: pair of haplotypes (e.g., *1/*4)
- Metabolizer phenotype: drug metabolism rate
- Pharmacogene: gene affecting drug response
- Drug-gene interaction: how genetics modulates drug efficacy/toxicity
Invariants:
- Diplotype must have exactly 2 alleles
- Phenotype derivable from diplotype
- Coverage ≥ minimum threshold for calling
- All star allele variants must exist in variant database
Relationship with Epigenomic Context:
- Type: Customer-Supplier
- Direction: Epigenomic → Pharmacogenomic
- Integration: Methylation affects drug metabolism gene expression
- Translation: Methylation beta values → gene expression levels → phenotype
7. Pipeline Context
Module: pipeline.rs
Responsibility: Orchestration of multi-stage genomic analysis workflow
Core Aggregates:
GenomicPipeline- Root aggregate orchestrating all contexts
Key Types:
pub struct GenomicPipeline {
pub kmer_encoder: KmerEncoder,
pub aligner: AttentionAligner,
pub variant_caller: VariantCaller,
pub protein_predictor: ContactPredictor,
pub methylation_analyzer: MethylationAnalyzer,
pub pharma_analyzer: StarAlleleCaller,
}
pub struct PipelineConfig {
pub k: usize,
pub min_variant_quality: f64,
pub min_coverage: usize,
pub enable_protein_prediction: bool,
pub enable_epigenetic_analysis: bool,
pub enable_pharmacogenomics: bool,
}
pub struct AnalysisResult {
pub sequence_stats: SequenceStats,
pub variants: Vec<Variant>,
pub protein_structures: Vec<ProteinGraph>,
pub methylation_profile: Option<MethylationProfile>,
pub drug_responses: Vec<DrugResponse>,
}
Published Events:
PipelineStarted { sample_id: String, stages: Vec<String> }StageCompleted { stage: String, duration_ms: u64 }PipelineCompleted { sample_id: String, total_duration_ms: u64 }PipelineFailed { stage: String, error: String }
Domain Language:
- Pipeline: directed acyclic graph of analysis stages
- Stage: atomic analysis unit (alignment, variant calling, etc.)
- Workflow: ordered execution of stages
- Checkpoint: saved intermediate state
- Provenance: lineage tracking of analysis steps
Invariants:
- All enabled stages must execute in dependency order
- Failed stage halts downstream execution
- All results traceable to input data and parameters
Anti-Corruption Layers:
The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:
- Sequence ACL: Translates k-mer indices to alignment seeds
- Alignment ACL: Converts alignment gaps to variant candidates
- Variant ACL: Maps variants to protein mutations
- Protein ACL: Translates structure to functional predictions
- Epigenetic ACL: Converts methylation to gene expression estimates
- Pharmacogenomic ACL: Maps genotypes to clinical recommendations
Context Relationship Matrix
| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
|---|---|---|---|---|---|---|---|
| Sequence | - | C-S | SK | SK | - | - | ACL |
| Alignment | - | - | C-S | - | - | - | ACL |
| Variant | - | - | - | Partner | ACL | - | ACL |
| Protein | - | - | Partner | - | - | - | ACL |
| Epigenomic | - | - | - | - | - | C-S | ACL |
| Pharma | - | - | - | - | - | - | ACL |
| Pipeline | C-S | C-S | C-S | C-S | C-S | C-S | - |
Legend:
- C-S: Customer-Supplier
- SK: Shared Kernel
- Partner: Partnership
- ACL: Anti-Corruption Layer
Integration Patterns
1. Event-Driven Integration
Contexts communicate via domain events to maintain loose coupling:
// Example: Variant Context publishes event
pub enum DomainEvent {
VariantCalled(VariantCalledEvent),
ProteinStructurePredicted(ProteinPredictedEvent),
// ...
}
// Pipeline Context subscribes and translates
impl EventHandler for GenomicPipeline {
fn handle(&mut self, event: DomainEvent) {
match event {
DomainEvent::VariantCalled(e) => {
if e.variant.is_coding() {
self.trigger_protein_analysis(e.variant);
}
}
// ...
}
}
}
2. Shared Kernel Components
Core domain types shared across contexts:
// In types.rs (core domain)
pub struct GenomicPosition {
pub chromosome: String,
pub position: usize,
}
pub struct QualityScore(pub f64); // Phred-scaled
pub enum Nucleotide { A, C, G, T }
pub struct GenomicRegion {
pub chromosome: String,
pub start: usize,
pub end: usize,
}
3. Anti-Corruption Layer Example
// Variant → Protein ACL
pub struct VariantToProteinTranslator {
codon_table: CodonTable,
}
impl VariantToProteinTranslator {
pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
// Prevents protein context from depending on variant implementation
let codon_change = self.map_to_codon(variant)?;
let aa_change = self.codon_table.translate(codon_change)?;
Some(ProteinMutation {
position: variant.position.position / 3,
reference_aa: aa_change.reference,
alternate_aa: aa_change.alternate,
})
}
}
Bounded Context Responsibilities Summary
- Sequence Context: K-mer indexing and sequence similarity (foundation)
- Alignment Context: Pairwise alignment and motif discovery
- Variant Context: Variant calling and population genetics
- Protein Context: Structure prediction and functional analysis
- Epigenomic Context: Methylation profiling and age prediction
- Pharmacogenomic Context: Drug-gene interactions and clinical recommendations
- Pipeline Context: Workflow orchestration and result aggregation
Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.