git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
603 lines
20 KiB
Markdown
603 lines
20 KiB
Markdown
# Bounded Context Map - Genomic Analysis Platform
|
|
|
|
## Context Map Overview
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ GENOMIC ANALYSIS PLATFORM │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
┌──────────────────┐
|
|
│ Pipeline │ ◄───────── Orchestration Layer
|
|
│ Context │
|
|
└────────┬─────────┘
|
|
│ ACL (maps domain events to pipeline commands)
|
|
│
|
|
┌────────┴─────────────────────────────────────────────┐
|
|
│ │
|
|
▼ ▼
|
|
┌─────────────────┐ ┌─────────────────┐
|
|
│ Sequence │ Customer-Supplier │ Alignment │
|
|
│ Context ├──────────────────────────────►│ Context │
|
|
│ │ (provides k-mer indices) │ │
|
|
└────────┬────────┘ └────────┬────────┘
|
|
│ │
|
|
│ Shared Kernel (GenomicPosition, QualityScore) │
|
|
│ │
|
|
▼ ▼
|
|
┌─────────────────┐ ┌─────────────────┐
|
|
│ Variant │ │ Protein │
|
|
│ Context │◄──────────────────────────────┤ Context │
|
|
│ │ Partner (variant→structure) │ │
|
|
└────────┬────────┘ └─────────────────┘
|
|
│
|
|
│ ACL (translates variants to epigenetic events)
|
|
│
|
|
▼
|
|
┌─────────────────┐
|
|
│ Epigenomic │
|
|
│ Context │
|
|
└────────┬────────┘
|
|
│
|
|
│ Customer-Supplier (epigenetic→drug response)
|
|
│
|
|
▼
|
|
┌─────────────────┐
|
|
│ Pharmacogenomic │
|
|
│ Context │
|
|
└─────────────────┘
|
|
|
|
Legend:
|
|
Customer-Supplier: → (upstream provides services to downstream)
|
|
Shared Kernel: ├─┤ (shared domain model)
|
|
Partner: ◄─► (mutual dependency)
|
|
ACL: [A] (anti-corruption layer)
|
|
```
|
|
|
|
## 1. Sequence Context
|
|
|
|
**Module**: `kmer.rs`
|
|
|
|
**Responsibility**: K-mer indexing, sequence sketching, and similarity search
|
|
|
|
**Core Aggregates**:
|
|
- `KmerIndex` - Root aggregate managing k-mer → position mappings
|
|
- `MinHashSketch` - Aggregate for approximate sequence similarity
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct KmerEncoder {
|
|
k: usize,
|
|
alphabet_size: usize,
|
|
}
|
|
|
|
pub struct KmerIndex {
|
|
k: usize,
|
|
index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
|
|
}
|
|
|
|
pub struct MinHashSketch {
|
|
k: usize,
|
|
num_hashes: usize,
|
|
signatures: Vec<u64>,
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `SequenceIndexed { sequence_id: String, kmer_count: usize }`
|
|
- `SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }`
|
|
|
|
**Domain Language**:
|
|
- K-mer: substring of length k
|
|
- Minimizer: canonical k-mer representation
|
|
- Sketch: compressed sequence signature
|
|
- Jaccard similarity: set overlap metric
|
|
|
|
**Invariants**:
|
|
- K-mer length must be 3 ≤ k ≤ 32
|
|
- MinHash signature size must be ≥ 1
|
|
- All k-mers normalized to canonical form (min(kmer, reverse_complement))
|
|
|
|
## 2. Alignment Context
|
|
|
|
**Module**: `alignment.rs`
|
|
|
|
**Responsibility**: Sequence alignment using attention mechanisms and motif detection
|
|
|
|
**Core Aggregates**:
|
|
- `AttentionAligner` - Root aggregate for pairwise sequence alignment
|
|
- `MotifScanner` - Aggregate for regulatory motif discovery
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct AttentionAligner {
|
|
attention_service: Arc<AttentionService>,
|
|
gap_penalty: f64,
|
|
match_bonus: f64,
|
|
}
|
|
|
|
pub struct MotifScanner {
|
|
attention_service: Arc<AttentionService>,
|
|
min_score: f64,
|
|
known_motifs: Vec<MotifPattern>,
|
|
}
|
|
|
|
pub struct AlignmentResult {
|
|
pub score: f64,
|
|
pub aligned_query: String,
|
|
pub aligned_target: String,
|
|
pub attention_weights: Vec<Vec<f64>>,
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `AlignmentCompleted { query_id: String, target_id: String, score: f64 }`
|
|
- `MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }`
|
|
|
|
**Domain Language**:
|
|
- Alignment: optimal mapping between two sequences
|
|
- Gap penalty: cost of insertions/deletions
|
|
- Attention weight: learned similarity between positions
|
|
- Motif: conserved sequence pattern (e.g., TATA box)
|
|
- PWM (Position Weight Matrix): motif scoring matrix
|
|
|
|
**Invariants**:
|
|
- Gap penalty must be negative
|
|
- Match bonus must be positive
|
|
- Motif minimum score 0.0 ≤ score ≤ 1.0
|
|
- Alignment score monotonically decreases with gaps
|
|
|
|
**Relationship with Sequence Context**:
|
|
- **Type**: Customer-Supplier
|
|
- **Direction**: Sequence → Alignment
|
|
- **Integration**: Alignment consumes k-mer indices for fast seed-and-extend
|
|
- **Translation**: None (direct dependency)
|
|
|
|
## 3. Variant Context
|
|
|
|
**Module**: `variant.rs`
|
|
|
|
**Responsibility**: Variant calling, genotyping, and population genetics
|
|
|
|
**Core Aggregates**:
|
|
- `VariantDatabase` - Root aggregate managing variant collection
|
|
- `VariantCaller` - Service aggregate for variant detection
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct VariantCaller {
|
|
min_quality: f64,
|
|
min_depth: usize,
|
|
gnn_service: Arc<GnnService>,
|
|
}
|
|
|
|
pub struct Variant {
|
|
pub position: GenomicPosition,
|
|
pub reference: String,
|
|
pub alternate: String,
|
|
pub quality: f64,
|
|
pub genotype: Genotype,
|
|
pub depth: usize,
|
|
pub allele_frequency: Option<f64>,
|
|
}
|
|
|
|
pub struct VariantDatabase {
|
|
variants: HashMap<GenomicPosition, Variant>,
|
|
graph_index: Option<GraphIndex>, // GNN-based variant relationships
|
|
}
|
|
|
|
pub enum Genotype {
|
|
Homozygous(Allele),
|
|
Heterozygous(Allele, Allele),
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `VariantCalled { position: GenomicPosition, variant: Variant }`
|
|
- `GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }`
|
|
- `PopulationFrequencyCalculated { variant_id: String, frequency: f64 }`
|
|
|
|
**Domain Language**:
|
|
- SNP (Single Nucleotide Polymorphism): single base change
|
|
- Indel: insertion or deletion
|
|
- Genotype: allele combination (0/0, 0/1, 1/1)
|
|
- Allele frequency: population prevalence
|
|
- Quality score: confidence in variant call (Phred scale)
|
|
- Coverage depth: number of reads supporting variant
|
|
|
|
**Invariants**:
|
|
- Quality score ≥ 0 (Phred scale)
|
|
- Coverage depth ≥ 1
|
|
- Allele frequency 0.0 ≤ AF ≤ 1.0
|
|
- Reference and alternate alleles must differ
|
|
- Genotype alleles must match available alleles
|
|
|
|
**Relationship with Alignment Context**:
|
|
- **Type**: Customer-Supplier
|
|
- **Direction**: Alignment → Variant
|
|
- **Integration**: Variant caller uses alignment results to identify mismatches
|
|
- **Translation**: Alignment gaps → insertion/deletion variants
|
|
|
|
**Shared Kernel with Sequence Context**:
|
|
- `GenomicPosition { chromosome: String, position: usize }`
|
|
- `QualityScore(f64)` (Phred-scaled)
|
|
- `Nucleotide` enum (A, C, G, T)
|
|
|
|
## 4. Protein Context
|
|
|
|
**Module**: `protein.rs`
|
|
|
|
**Responsibility**: Protein structure prediction and contact map generation
|
|
|
|
**Core Aggregates**:
|
|
- `ProteinGraph` - Root aggregate representing protein as graph
|
|
- `ContactPredictor` - Service aggregate for 3D contact prediction
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct ProteinGraph {
|
|
pub sequence: String, // amino acid sequence
|
|
pub nodes: Vec<AminoAcid>,
|
|
pub edges: Vec<(usize, usize, ContactType)>,
|
|
}
|
|
|
|
pub struct ContactPredictor {
|
|
gnn_service: Arc<GnnService>,
|
|
attention_service: Arc<AttentionService>,
|
|
distance_threshold: f64, // Ångströms
|
|
}
|
|
|
|
pub struct ContactPrediction {
|
|
pub residue_i: usize,
|
|
pub residue_j: usize,
|
|
pub probability: f64,
|
|
pub distance: Option<f64>,
|
|
}
|
|
|
|
pub enum ContactType {
|
|
Backbone,
|
|
SideChain,
|
|
HydrogenBond,
|
|
DisulfideBridge,
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `ProteinTranslated { gene_id: String, protein_sequence: String }`
|
|
- `StructurePredicted { protein_id: String, contact_count: usize }`
|
|
- `FoldingPathwayComputed { protein_id: String, energy: f64 }`
|
|
|
|
**Domain Language**:
|
|
- Amino acid: protein building block (20 standard types)
|
|
- Residue: amino acid position in sequence
|
|
- Contact: spatial proximity between residues (<8Å)
|
|
- Secondary structure: local folding patterns (helix, sheet, loop)
|
|
- Tertiary structure: 3D protein fold
|
|
- Contact map: matrix of residue-residue distances
|
|
|
|
**Invariants**:
|
|
- Sequence length ≥ 1
|
|
- Contact probability 0.0 ≤ p ≤ 1.0
|
|
- Distance threshold > 0.0 (typically 8.0Å)
|
|
- Contact pairs must be |i - j| ≥ 4 (exclude local contacts)
|
|
|
|
**Relationship with Variant Context**:
|
|
- **Type**: Partner (bidirectional)
|
|
- **Direction**: Variant ↔ Protein
|
|
- **Integration**:
|
|
- Variant → Protein: coding variants cause amino acid changes
|
|
- Protein → Variant: structural changes inform variant pathogenicity
|
|
- **Translation**:
|
|
- Variant ACL translates nucleotide changes to codon changes
|
|
- Protein ACL maps structure disruption to clinical significance
|
|
|
|
## 5. Epigenomic Context
|
|
|
|
**Module**: `epigenomics.rs`
|
|
|
|
**Responsibility**: DNA methylation analysis and epigenetic age prediction
|
|
|
|
**Core Aggregates**:
|
|
- `EpigeneticIndex` - Root aggregate managing methylation sites
|
|
- `HorvathClock` - Service aggregate for epigenetic age calculation
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct MethylationProfile {
|
|
pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
|
|
pub total_sites: usize,
|
|
pub mean_methylation: f64,
|
|
}
|
|
|
|
pub struct HorvathClock {
|
|
pub coefficients: HashMap<String, f64>, // CpG site → weight
|
|
pub intercept: f64,
|
|
}
|
|
|
|
pub struct CpGSite {
|
|
pub position: GenomicPosition,
|
|
pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
|
|
pub coverage: usize,
|
|
}
|
|
|
|
pub struct EpigeneticAge {
|
|
pub chronological_age: Option<f64>,
|
|
pub predicted_age: f64,
|
|
pub acceleration: f64, // predicted - chronological
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `MethylationProfileGenerated { sample_id: String, site_count: usize }`
|
|
- `EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }`
|
|
- `DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }`
|
|
|
|
**Domain Language**:
|
|
- CpG site: cytosine-guanine dinucleotide (methylation target)
|
|
- Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
|
|
- Epigenetic clock: age predictor based on methylation
|
|
- Age acceleration: difference between epigenetic and chronological age
|
|
- DMR (Differentially Methylated Region): region with changed methylation
|
|
|
|
**Invariants**:
|
|
- Beta value 0.0 ≤ β ≤ 1.0
|
|
- Coverage ≥ 1
|
|
- Horvath coefficients sum to meaningful scale
|
|
- Age ≥ 0.0
|
|
|
|
**Relationship with Variant Context**:
|
|
- **Type**: Anti-Corruption Layer
|
|
- **Direction**: Variant → Epigenomic
|
|
- **Integration**: Variants in regulatory regions affect methylation patterns
|
|
- **Translation**:
|
|
- ACL translates genetic variants to epigenetic effects
|
|
- Maps SNPs → methylation quantitative trait loci (mQTL)
|
|
- Prevents variant domain concepts from leaking into epigenetic model
|
|
|
|
## 6. Pharmacogenomic Context
|
|
|
|
**Module**: `pharma.rs`
|
|
|
|
**Responsibility**: Pharmacogenetic analysis and drug-gene interaction prediction
|
|
|
|
**Core Aggregates**:
|
|
- `DrugInteractionGraph` - Root aggregate representing drug-gene network
|
|
- `StarAlleleCaller` - Service aggregate for haplotype phasing
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct StarAlleleCaller {
|
|
gene_definitions: HashMap<String, GeneDefinition>,
|
|
min_coverage: usize,
|
|
}
|
|
|
|
pub struct StarAllele {
|
|
pub gene: String,
|
|
pub allele: String, // e.g., "*1", "*2", "*17"
|
|
pub variants: Vec<Variant>,
|
|
pub function: AlleleFunction,
|
|
}
|
|
|
|
pub enum AlleleFunction {
|
|
Normal,
|
|
Increased,
|
|
Decreased,
|
|
NoFunction,
|
|
}
|
|
|
|
pub struct DrugInteractionGraph {
|
|
pub nodes: Vec<DrugGeneNode>,
|
|
pub edges: Vec<(usize, usize, InteractionType)>,
|
|
}
|
|
|
|
pub struct DrugResponse {
|
|
pub drug: String,
|
|
pub diplotype: Diplotype,
|
|
pub phenotype: MetabolizerPhenotype,
|
|
pub recommendation: ClinicalRecommendation,
|
|
}
|
|
|
|
pub enum MetabolizerPhenotype {
|
|
UltraRapid,
|
|
Rapid,
|
|
Normal,
|
|
Intermediate,
|
|
Poor,
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `StarAlleleIdentified { gene: String, allele: String, diplotype: String }`
|
|
- `DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }`
|
|
- `InteractionDetected { drug1: String, drug2: String, severity: Severity }`
|
|
|
|
**Domain Language**:
|
|
- Star allele: named haplotype variant (e.g., CYP2D6*4)
|
|
- Diplotype: pair of haplotypes (e.g., *1/*4)
|
|
- Metabolizer phenotype: drug metabolism rate
|
|
- Pharmacogene: gene affecting drug response
|
|
- Drug-gene interaction: how genetics modulates drug efficacy/toxicity
|
|
|
|
**Invariants**:
|
|
- Diplotype must have exactly 2 alleles
|
|
- Phenotype derivable from diplotype
|
|
- Coverage ≥ minimum threshold for calling
|
|
- All star allele variants must exist in variant database
|
|
|
|
**Relationship with Epigenomic Context**:
|
|
- **Type**: Customer-Supplier
|
|
- **Direction**: Epigenomic → Pharmacogenomic
|
|
- **Integration**: Methylation affects drug metabolism gene expression
|
|
- **Translation**: Methylation beta values → gene expression levels → phenotype
|
|
|
|
## 7. Pipeline Context
|
|
|
|
**Module**: `pipeline.rs`
|
|
|
|
**Responsibility**: Orchestration of multi-stage genomic analysis workflow
|
|
|
|
**Core Aggregates**:
|
|
- `GenomicPipeline` - Root aggregate orchestrating all contexts
|
|
|
|
**Key Types**:
|
|
```rust
|
|
pub struct GenomicPipeline {
|
|
pub kmer_encoder: KmerEncoder,
|
|
pub aligner: AttentionAligner,
|
|
pub variant_caller: VariantCaller,
|
|
pub protein_predictor: ContactPredictor,
|
|
pub methylation_analyzer: MethylationAnalyzer,
|
|
pub pharma_analyzer: StarAlleleCaller,
|
|
}
|
|
|
|
pub struct PipelineConfig {
|
|
pub k: usize,
|
|
pub min_variant_quality: f64,
|
|
pub min_coverage: usize,
|
|
pub enable_protein_prediction: bool,
|
|
pub enable_epigenetic_analysis: bool,
|
|
pub enable_pharmacogenomics: bool,
|
|
}
|
|
|
|
pub struct AnalysisResult {
|
|
pub sequence_stats: SequenceStats,
|
|
pub variants: Vec<Variant>,
|
|
pub protein_structures: Vec<ProteinGraph>,
|
|
pub methylation_profile: Option<MethylationProfile>,
|
|
pub drug_responses: Vec<DrugResponse>,
|
|
}
|
|
```
|
|
|
|
**Published Events**:
|
|
- `PipelineStarted { sample_id: String, stages: Vec<String> }`
|
|
- `StageCompleted { stage: String, duration_ms: u64 }`
|
|
- `PipelineCompleted { sample_id: String, total_duration_ms: u64 }`
|
|
- `PipelineFailed { stage: String, error: String }`
|
|
|
|
**Domain Language**:
|
|
- Pipeline: directed acyclic graph of analysis stages
|
|
- Stage: atomic analysis unit (alignment, variant calling, etc.)
|
|
- Workflow: ordered execution of stages
|
|
- Checkpoint: saved intermediate state
|
|
- Provenance: lineage tracking of analysis steps
|
|
|
|
**Invariants**:
|
|
- All enabled stages must execute in dependency order
|
|
- Failed stage halts downstream execution
|
|
- All results traceable to input data and parameters
|
|
|
|
**Anti-Corruption Layers**:
|
|
|
|
The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:
|
|
|
|
1. **Sequence ACL**: Translates k-mer indices to alignment seeds
|
|
2. **Alignment ACL**: Converts alignment gaps to variant candidates
|
|
3. **Variant ACL**: Maps variants to protein mutations
|
|
4. **Protein ACL**: Translates structure to functional predictions
|
|
5. **Epigenetic ACL**: Converts methylation to gene expression estimates
|
|
6. **Pharmacogenomic ACL**: Maps genotypes to clinical recommendations
|
|
|
|
## Context Relationship Matrix
|
|
|
|
| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
|
|
|---------------|----------|-----------|---------|---------|------------|--------|----------|
|
|
| Sequence | - | C-S | SK | SK | - | - | ACL |
|
|
| Alignment | - | - | C-S | - | - | - | ACL |
|
|
| Variant | - | - | - | Partner | ACL | - | ACL |
|
|
| Protein | - | - | Partner | - | - | - | ACL |
|
|
| Epigenomic | - | - | - | - | - | C-S | ACL |
|
|
| Pharma | - | - | - | - | - | - | ACL |
|
|
| Pipeline | C-S | C-S | C-S | C-S | C-S | C-S | - |
|
|
|
|
**Legend**:
|
|
- C-S: Customer-Supplier
|
|
- SK: Shared Kernel
|
|
- Partner: Partnership
|
|
- ACL: Anti-Corruption Layer
|
|
|
|
## Integration Patterns
|
|
|
|
### 1. Event-Driven Integration
|
|
|
|
Contexts communicate via domain events to maintain loose coupling:
|
|
|
|
```rust
|
|
// Example: Variant Context publishes event
|
|
pub enum DomainEvent {
|
|
VariantCalled(VariantCalledEvent),
|
|
ProteinStructurePredicted(ProteinPredictedEvent),
|
|
// ...
|
|
}
|
|
|
|
// Pipeline Context subscribes and translates
|
|
impl EventHandler for GenomicPipeline {
|
|
fn handle(&mut self, event: DomainEvent) {
|
|
match event {
|
|
DomainEvent::VariantCalled(e) => {
|
|
if e.variant.is_coding() {
|
|
self.trigger_protein_analysis(e.variant);
|
|
}
|
|
}
|
|
// ...
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### 2. Shared Kernel Components
|
|
|
|
Core domain types shared across contexts:
|
|
|
|
```rust
|
|
// In types.rs (core domain)
|
|
pub struct GenomicPosition {
|
|
pub chromosome: String,
|
|
pub position: usize,
|
|
}
|
|
|
|
pub struct QualityScore(pub f64); // Phred-scaled
|
|
|
|
pub enum Nucleotide { A, C, G, T }
|
|
|
|
pub struct GenomicRegion {
|
|
pub chromosome: String,
|
|
pub start: usize,
|
|
pub end: usize,
|
|
}
|
|
```
|
|
|
|
### 3. Anti-Corruption Layer Example
|
|
|
|
```rust
|
|
// Variant → Protein ACL
|
|
pub struct VariantToProteinTranslator {
|
|
codon_table: CodonTable,
|
|
}
|
|
|
|
impl VariantToProteinTranslator {
|
|
pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
|
|
// Prevents protein context from depending on variant implementation
|
|
let codon_change = self.map_to_codon(variant)?;
|
|
let aa_change = self.codon_table.translate(codon_change)?;
|
|
|
|
Some(ProteinMutation {
|
|
position: variant.position.position / 3,
|
|
reference_aa: aa_change.reference,
|
|
alternate_aa: aa_change.alternate,
|
|
})
|
|
}
|
|
}
|
|
```
|
|
|
|
## Bounded Context Responsibilities Summary
|
|
|
|
1. **Sequence Context**: K-mer indexing and sequence similarity (foundation)
|
|
2. **Alignment Context**: Pairwise alignment and motif discovery
|
|
3. **Variant Context**: Variant calling and population genetics
|
|
4. **Protein Context**: Structure prediction and functional analysis
|
|
5. **Epigenomic Context**: Methylation profiling and age prediction
|
|
6. **Pharmacogenomic Context**: Drug-gene interactions and clinical recommendations
|
|
7. **Pipeline Context**: Workflow orchestration and result aggregation
|
|
|
|
Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.
|