Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

View File

@@ -0,0 +1,602 @@
# Bounded Context Map - Genomic Analysis Platform
## Context Map Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ GENOMIC ANALYSIS PLATFORM │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐
│ Pipeline │ ◄───────── Orchestration Layer
│ Context │
└────────┬─────────┘
│ ACL (maps domain events to pipeline commands)
┌────────┴─────────────────────────────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Sequence │ Customer-Supplier │ Alignment │
│ Context ├──────────────────────────────►│ Context │
│ │ (provides k-mer indices) │ │
└────────┬────────┘ └────────┬────────┘
│ │
│ Shared Kernel (GenomicPosition, QualityScore) │
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Variant │ │ Protein │
│ Context │◄──────────────────────────────┤ Context │
│ │ Partner (variant→structure) │ │
└────────┬────────┘ └─────────────────┘
│ ACL (translates variants to epigenetic events)
┌─────────────────┐
│ Epigenomic │
│ Context │
└────────┬────────┘
│ Customer-Supplier (epigenetic→drug response)
┌─────────────────┐
│ Pharmacogenomic │
│ Context │
└─────────────────┘
Legend:
Customer-Supplier: → (upstream provides services to downstream)
Shared Kernel: ├─┤ (shared domain model)
Partner: ◄─► (mutual dependency)
ACL: [A] (anti-corruption layer)
```
## 1. Sequence Context
**Module**: `kmer.rs`
**Responsibility**: K-mer indexing, sequence sketching, and similarity search
**Core Aggregates**:
- `KmerIndex` - Root aggregate managing k-mer → position mappings
- `MinHashSketch` - Aggregate for approximate sequence similarity
**Key Types**:
```rust
pub struct KmerEncoder {
k: usize,
alphabet_size: usize,
}
pub struct KmerIndex {
k: usize,
index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
}
pub struct MinHashSketch {
k: usize,
num_hashes: usize,
signatures: Vec<u64>,
}
```
**Published Events**:
- `SequenceIndexed { sequence_id: String, kmer_count: usize }`
- `SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }`
**Domain Language**:
- K-mer: substring of length k
- Minimizer: canonical k-mer representation
- Sketch: compressed sequence signature
- Jaccard similarity: set overlap metric
**Invariants**:
- K-mer length must be 3 ≤ k ≤ 32
- MinHash signature size must be ≥ 1
- All k-mers normalized to canonical form (min(kmer, reverse_complement))
## 2. Alignment Context
**Module**: `alignment.rs`
**Responsibility**: Sequence alignment using attention mechanisms and motif detection
**Core Aggregates**:
- `AttentionAligner` - Root aggregate for pairwise sequence alignment
- `MotifScanner` - Aggregate for regulatory motif discovery
**Key Types**:
```rust
pub struct AttentionAligner {
attention_service: Arc<AttentionService>,
gap_penalty: f64,
match_bonus: f64,
}
pub struct MotifScanner {
attention_service: Arc<AttentionService>,
min_score: f64,
known_motifs: Vec<MotifPattern>,
}
pub struct AlignmentResult {
pub score: f64,
pub aligned_query: String,
pub aligned_target: String,
pub attention_weights: Vec<Vec<f64>>,
}
```
**Published Events**:
- `AlignmentCompleted { query_id: String, target_id: String, score: f64 }`
- `MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }`
**Domain Language**:
- Alignment: optimal mapping between two sequences
- Gap penalty: cost of insertions/deletions
- Attention weight: learned similarity between positions
- Motif: conserved sequence pattern (e.g., TATA box)
- PWM (Position Weight Matrix): motif scoring matrix
**Invariants**:
- Gap penalty must be negative
- Match bonus must be positive
- Motif minimum score 0.0 ≤ score ≤ 1.0
- Alignment score monotonically decreases with gaps
**Relationship with Sequence Context**:
- **Type**: Customer-Supplier
- **Direction**: Sequence → Alignment
- **Integration**: Alignment consumes k-mer indices for fast seed-and-extend
- **Translation**: None (direct dependency)
## 3. Variant Context
**Module**: `variant.rs`
**Responsibility**: Variant calling, genotyping, and population genetics
**Core Aggregates**:
- `VariantDatabase` - Root aggregate managing variant collection
- `VariantCaller` - Service aggregate for variant detection
**Key Types**:
```rust
pub struct VariantCaller {
min_quality: f64,
min_depth: usize,
gnn_service: Arc<GnnService>,
}
pub struct Variant {
pub position: GenomicPosition,
pub reference: String,
pub alternate: String,
pub quality: f64,
pub genotype: Genotype,
pub depth: usize,
pub allele_frequency: Option<f64>,
}
pub struct VariantDatabase {
variants: HashMap<GenomicPosition, Variant>,
graph_index: Option<GraphIndex>, // GNN-based variant relationships
}
pub enum Genotype {
Homozygous(Allele),
Heterozygous(Allele, Allele),
}
```
**Published Events**:
- `VariantCalled { position: GenomicPosition, variant: Variant }`
- `GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }`
- `PopulationFrequencyCalculated { variant_id: String, frequency: f64 }`
**Domain Language**:
- SNP (Single Nucleotide Polymorphism): single base change
- Indel: insertion or deletion
- Genotype: allele combination (0/0, 0/1, 1/1)
- Allele frequency: population prevalence
- Quality score: confidence in variant call (Phred scale)
- Coverage depth: number of reads supporting variant
**Invariants**:
- Quality score ≥ 0 (Phred scale)
- Coverage depth ≥ 1
- Allele frequency 0.0 ≤ AF ≤ 1.0
- Reference and alternate alleles must differ
- Genotype alleles must match available alleles
**Relationship with Alignment Context**:
- **Type**: Customer-Supplier
- **Direction**: Alignment → Variant
- **Integration**: Variant caller uses alignment results to identify mismatches
- **Translation**: Alignment gaps → insertion/deletion variants
**Shared Kernel with Sequence Context**:
- `GenomicPosition { chromosome: String, position: usize }`
- `QualityScore(f64)` (Phred-scaled)
- `Nucleotide` enum (A, C, G, T)
## 4. Protein Context
**Module**: `protein.rs`
**Responsibility**: Protein structure prediction and contact map generation
**Core Aggregates**:
- `ProteinGraph` - Root aggregate representing protein as graph
- `ContactPredictor` - Service aggregate for 3D contact prediction
**Key Types**:
```rust
pub struct ProteinGraph {
pub sequence: String, // amino acid sequence
pub nodes: Vec<AminoAcid>,
pub edges: Vec<(usize, usize, ContactType)>,
}
pub struct ContactPredictor {
gnn_service: Arc<GnnService>,
attention_service: Arc<AttentionService>,
distance_threshold: f64, // Ångströms
}
pub struct ContactPrediction {
pub residue_i: usize,
pub residue_j: usize,
pub probability: f64,
pub distance: Option<f64>,
}
pub enum ContactType {
Backbone,
SideChain,
HydrogenBond,
DisulfideBridge,
}
```
**Published Events**:
- `ProteinTranslated { gene_id: String, protein_sequence: String }`
- `StructurePredicted { protein_id: String, contact_count: usize }`
- `FoldingPathwayComputed { protein_id: String, energy: f64 }`
**Domain Language**:
- Amino acid: protein building block (20 standard types)
- Residue: amino acid position in sequence
- Contact: spatial proximity between residues (<8Å)
- Secondary structure: local folding patterns (helix, sheet, loop)
- Tertiary structure: 3D protein fold
- Contact map: matrix of residue-residue distances
**Invariants**:
- Sequence length ≥ 1
- Contact probability 0.0 ≤ p ≤ 1.0
- Distance threshold > 0.0 (typically 8.0Å)
- Contact pairs must be |i - j| ≥ 4 (exclude local contacts)
**Relationship with Variant Context**:
- **Type**: Partner (bidirectional)
- **Direction**: Variant ↔ Protein
- **Integration**:
- Variant → Protein: coding variants cause amino acid changes
- Protein → Variant: structural changes inform variant pathogenicity
- **Translation**:
- Variant ACL translates nucleotide changes to codon changes
- Protein ACL maps structure disruption to clinical significance
## 5. Epigenomic Context
**Module**: `epigenomics.rs`
**Responsibility**: DNA methylation analysis and epigenetic age prediction
**Core Aggregates**:
- `EpigeneticIndex` - Root aggregate managing methylation sites
- `HorvathClock` - Service aggregate for epigenetic age calculation
**Key Types**:
```rust
pub struct MethylationProfile {
pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
pub total_sites: usize,
pub mean_methylation: f64,
}
pub struct HorvathClock {
pub coefficients: HashMap<String, f64>, // CpG site → weight
pub intercept: f64,
}
pub struct CpGSite {
pub position: GenomicPosition,
pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
pub coverage: usize,
}
pub struct EpigeneticAge {
pub chronological_age: Option<f64>,
pub predicted_age: f64,
pub acceleration: f64, // predicted - chronological
}
```
**Published Events**:
- `MethylationProfileGenerated { sample_id: String, site_count: usize }`
- `EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }`
- `DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }`
**Domain Language**:
- CpG site: cytosine-guanine dinucleotide (methylation target)
- Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
- Epigenetic clock: age predictor based on methylation
- Age acceleration: difference between epigenetic and chronological age
- DMR (Differentially Methylated Region): region with changed methylation
**Invariants**:
- Beta value 0.0 ≤ β ≤ 1.0
- Coverage ≥ 1
- Horvath coefficients sum to meaningful scale
- Age ≥ 0.0
**Relationship with Variant Context**:
- **Type**: Anti-Corruption Layer
- **Direction**: Variant → Epigenomic
- **Integration**: Variants in regulatory regions affect methylation patterns
- **Translation**:
- ACL translates genetic variants to epigenetic effects
- Maps SNPs → methylation quantitative trait loci (mQTL)
- Prevents variant domain concepts from leaking into epigenetic model
## 6. Pharmacogenomic Context
**Module**: `pharma.rs`
**Responsibility**: Pharmacogenetic analysis and drug-gene interaction prediction
**Core Aggregates**:
- `DrugInteractionGraph` - Root aggregate representing drug-gene network
- `StarAlleleCaller` - Service aggregate for haplotype phasing
**Key Types**:
```rust
pub struct StarAlleleCaller {
gene_definitions: HashMap<String, GeneDefinition>,
min_coverage: usize,
}
pub struct StarAllele {
pub gene: String,
pub allele: String, // e.g., "*1", "*2", "*17"
pub variants: Vec<Variant>,
pub function: AlleleFunction,
}
pub enum AlleleFunction {
Normal,
Increased,
Decreased,
NoFunction,
}
pub struct DrugInteractionGraph {
pub nodes: Vec<DrugGeneNode>,
pub edges: Vec<(usize, usize, InteractionType)>,
}
pub struct DrugResponse {
pub drug: String,
pub diplotype: Diplotype,
pub phenotype: MetabolizerPhenotype,
pub recommendation: ClinicalRecommendation,
}
pub enum MetabolizerPhenotype {
UltraRapid,
Rapid,
Normal,
Intermediate,
Poor,
}
```
**Published Events**:
- `StarAlleleIdentified { gene: String, allele: String, diplotype: String }`
- `DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }`
- `InteractionDetected { drug1: String, drug2: String, severity: Severity }`
**Domain Language**:
- Star allele: named haplotype variant (e.g., CYP2D6*4)
- Diplotype: pair of haplotypes (e.g., *1/*4)
- Metabolizer phenotype: drug metabolism rate
- Pharmacogene: gene affecting drug response
- Drug-gene interaction: how genetics modulates drug efficacy/toxicity
**Invariants**:
- Diplotype must have exactly 2 alleles
- Phenotype derivable from diplotype
- Coverage ≥ minimum threshold for calling
- All star allele variants must exist in variant database
**Relationship with Epigenomic Context**:
- **Type**: Customer-Supplier
- **Direction**: Epigenomic → Pharmacogenomic
- **Integration**: Methylation affects drug metabolism gene expression
- **Translation**: Methylation beta values → gene expression levels → phenotype
## 7. Pipeline Context
**Module**: `pipeline.rs`
**Responsibility**: Orchestration of multi-stage genomic analysis workflow
**Core Aggregates**:
- `GenomicPipeline` - Root aggregate orchestrating all contexts
**Key Types**:
```rust
pub struct GenomicPipeline {
pub kmer_encoder: KmerEncoder,
pub aligner: AttentionAligner,
pub variant_caller: VariantCaller,
pub protein_predictor: ContactPredictor,
pub methylation_analyzer: MethylationAnalyzer,
pub pharma_analyzer: StarAlleleCaller,
}
pub struct PipelineConfig {
pub k: usize,
pub min_variant_quality: f64,
pub min_coverage: usize,
pub enable_protein_prediction: bool,
pub enable_epigenetic_analysis: bool,
pub enable_pharmacogenomics: bool,
}
pub struct AnalysisResult {
pub sequence_stats: SequenceStats,
pub variants: Vec<Variant>,
pub protein_structures: Vec<ProteinGraph>,
pub methylation_profile: Option<MethylationProfile>,
pub drug_responses: Vec<DrugResponse>,
}
```
**Published Events**:
- `PipelineStarted { sample_id: String, stages: Vec<String> }`
- `StageCompleted { stage: String, duration_ms: u64 }`
- `PipelineCompleted { sample_id: String, total_duration_ms: u64 }`
- `PipelineFailed { stage: String, error: String }`
**Domain Language**:
- Pipeline: directed acyclic graph of analysis stages
- Stage: atomic analysis unit (alignment, variant calling, etc.)
- Workflow: ordered execution of stages
- Checkpoint: saved intermediate state
- Provenance: lineage tracking of analysis steps
**Invariants**:
- All enabled stages must execute in dependency order
- Failed stage halts downstream execution
- All results traceable to input data and parameters
**Anti-Corruption Layers**:
The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:
1. **Sequence ACL**: Translates k-mer indices to alignment seeds
2. **Alignment ACL**: Converts alignment gaps to variant candidates
3. **Variant ACL**: Maps variants to protein mutations
4. **Protein ACL**: Translates structure to functional predictions
5. **Epigenetic ACL**: Converts methylation to gene expression estimates
6. **Pharmacogenomic ACL**: Maps genotypes to clinical recommendations
## Context Relationship Matrix
| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
|---------------|----------|-----------|---------|---------|------------|--------|----------|
| Sequence | - | C-S | SK | SK | - | - | ACL |
| Alignment | - | - | C-S | - | - | - | ACL |
| Variant | - | - | - | Partner | ACL | - | ACL |
| Protein | - | - | Partner | - | - | - | ACL |
| Epigenomic | - | - | - | - | - | C-S | ACL |
| Pharma | - | - | - | - | - | - | ACL |
| Pipeline | C-S | C-S | C-S | C-S | C-S | C-S | - |
**Legend**:
- C-S: Customer-Supplier
- SK: Shared Kernel
- Partner: Partnership
- ACL: Anti-Corruption Layer
## Integration Patterns
### 1. Event-Driven Integration
Contexts communicate via domain events to maintain loose coupling:
```rust
// Example: Variant Context publishes event
pub enum DomainEvent {
VariantCalled(VariantCalledEvent),
ProteinStructurePredicted(ProteinPredictedEvent),
// ...
}
// Pipeline Context subscribes and translates
impl EventHandler for GenomicPipeline {
fn handle(&mut self, event: DomainEvent) {
match event {
DomainEvent::VariantCalled(e) => {
if e.variant.is_coding() {
self.trigger_protein_analysis(e.variant);
}
}
// ...
}
}
}
```
### 2. Shared Kernel Components
Core domain types shared across contexts:
```rust
// In types.rs (core domain)
pub struct GenomicPosition {
pub chromosome: String,
pub position: usize,
}
pub struct QualityScore(pub f64); // Phred-scaled
pub enum Nucleotide { A, C, G, T }
pub struct GenomicRegion {
pub chromosome: String,
pub start: usize,
pub end: usize,
}
```
### 3. Anti-Corruption Layer Example
```rust
// Variant → Protein ACL
pub struct VariantToProteinTranslator {
codon_table: CodonTable,
}
impl VariantToProteinTranslator {
pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
// Prevents protein context from depending on variant implementation
let codon_change = self.map_to_codon(variant)?;
let aa_change = self.codon_table.translate(codon_change)?;
Some(ProteinMutation {
position: variant.position.position / 3,
reference_aa: aa_change.reference,
alternate_aa: aa_change.alternate,
})
}
}
```
## Bounded Context Responsibilities Summary
1. **Sequence Context**: K-mer indexing and sequence similarity (foundation)
2. **Alignment Context**: Pairwise alignment and motif discovery
3. **Variant Context**: Variant calling and population genetics
4. **Protein Context**: Structure prediction and functional analysis
5. **Epigenomic Context**: Methylation profiling and age prediction
6. **Pharmacogenomic Context**: Drug-gene interactions and clinical recommendations
7. **Pipeline Context**: Workflow orchestration and result aggregation
Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.