Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/examples/dna/ddd/bounded-context-map.md
+++ b/vendor/ruvector/examples/dna/ddd/bounded-context-map.md
@@ -0,0 +1,602 @@
+# Bounded Context Map - Genomic Analysis Platform
+
+## Context Map Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         GENOMIC ANALYSIS PLATFORM                            │
+└─────────────────────────────────────────────────────────────────────────────┘
+
+    ┌──────────────────┐
+    │   Pipeline       │ ◄───────── Orchestration Layer
+    │   Context        │
+    └────────┬─────────┘
+             │ ACL (maps domain events to pipeline commands)
+             │
+    ┌────────┴─────────────────────────────────────────────┐
+    │                                                       │
+    ▼                                                       ▼
+┌─────────────────┐                               ┌─────────────────┐
+│   Sequence      │ Customer-Supplier             │   Alignment     │
+│   Context       ├──────────────────────────────►│   Context       │
+│                 │ (provides k-mer indices)      │                 │
+└────────┬────────┘                               └────────┬────────┘
+         │                                                 │
+         │ Shared Kernel (GenomicPosition, QualityScore)  │
+         │                                                 │
+         ▼                                                 ▼
+┌─────────────────┐                               ┌─────────────────┐
+│   Variant       │                               │   Protein       │
+│   Context       │◄──────────────────────────────┤   Context       │
+│                 │  Partner (variant→structure)  │                 │
+└────────┬────────┘                               └─────────────────┘
+         │
+         │ ACL (translates variants to epigenetic events)
+         │
+         ▼
+┌─────────────────┐
+│  Epigenomic     │
+│  Context        │
+└────────┬────────┘
+         │
+         │ Customer-Supplier (epigenetic→drug response)
+         │
+         ▼
+┌─────────────────┐
+│ Pharmacogenomic │
+│ Context         │
+└─────────────────┘
+
+Legend:
+  Customer-Supplier: →  (upstream provides services to downstream)
+  Shared Kernel:     ├─┤ (shared domain model)
+  Partner:           ◄─► (mutual dependency)
+  ACL:               [A] (anti-corruption layer)
+```
+
+## 1. Sequence Context
+
+**Module**: `kmer.rs`
+
+**Responsibility**: K-mer indexing, sequence sketching, and similarity search
+
+**Core Aggregates**:
+- `KmerIndex` - Root aggregate managing k-mer → position mappings
+- `MinHashSketch` - Aggregate for approximate sequence similarity
+
+**Key Types**:
+```rust
+pub struct KmerEncoder {
+    k: usize,
+    alphabet_size: usize,
+}
+
+pub struct KmerIndex {
+    k: usize,
+    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
+}
+
+pub struct MinHashSketch {
+    k: usize,
+    num_hashes: usize,
+    signatures: Vec<u64>,
+}
+```
+
+**Published Events**:
+- `SequenceIndexed { sequence_id: String, kmer_count: usize }`
+- `SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }`
+
+**Domain Language**:
+- K-mer: substring of length k
+- Minimizer: canonical k-mer representation
+- Sketch: compressed sequence signature
+- Jaccard similarity: set overlap metric
+
+**Invariants**:
+- K-mer length must be 3 ≤ k ≤ 32
+- MinHash signature size must be ≥ 1
+- All k-mers normalized to canonical form (min(kmer, reverse_complement))
+
+## 2. Alignment Context
+
+**Module**: `alignment.rs`
+
+**Responsibility**: Sequence alignment using attention mechanisms and motif detection
+
+**Core Aggregates**:
+- `AttentionAligner` - Root aggregate for pairwise sequence alignment
+- `MotifScanner` - Aggregate for regulatory motif discovery
+
+**Key Types**:
+```rust
+pub struct AttentionAligner {
+    attention_service: Arc<AttentionService>,
+    gap_penalty: f64,
+    match_bonus: f64,
+}
+
+pub struct MotifScanner {
+    attention_service: Arc<AttentionService>,
+    min_score: f64,
+    known_motifs: Vec<MotifPattern>,
+}
+
+pub struct AlignmentResult {
+    pub score: f64,
+    pub aligned_query: String,
+    pub aligned_target: String,
+    pub attention_weights: Vec<Vec<f64>>,
+}
+```
+
+**Published Events**:
+- `AlignmentCompleted { query_id: String, target_id: String, score: f64 }`
+- `MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }`
+
+**Domain Language**:
+- Alignment: optimal mapping between two sequences
+- Gap penalty: cost of insertions/deletions
+- Attention weight: learned similarity between positions
+- Motif: conserved sequence pattern (e.g., TATA box)
+- PWM (Position Weight Matrix): motif scoring matrix
+
+**Invariants**:
+- Gap penalty must be negative
+- Match bonus must be positive
+- Motif minimum score 0.0 ≤ score ≤ 1.0
+- Alignment score monotonically decreases with gaps
+
+**Relationship with Sequence Context**:
+- **Type**: Customer-Supplier
+- **Direction**: Sequence → Alignment
+- **Integration**: Alignment consumes k-mer indices for fast seed-and-extend
+- **Translation**: None (direct dependency)
+
+## 3. Variant Context
+
+**Module**: `variant.rs`
+
+**Responsibility**: Variant calling, genotyping, and population genetics
+
+**Core Aggregates**:
+- `VariantDatabase` - Root aggregate managing variant collection
+- `VariantCaller` - Service aggregate for variant detection
+
+**Key Types**:
+```rust
+pub struct VariantCaller {
+    min_quality: f64,
+    min_depth: usize,
+    gnn_service: Arc<GnnService>,
+}
+
+pub struct Variant {
+    pub position: GenomicPosition,
+    pub reference: String,
+    pub alternate: String,
+    pub quality: f64,
+    pub genotype: Genotype,
+    pub depth: usize,
+    pub allele_frequency: Option<f64>,
+}
+
+pub struct VariantDatabase {
+    variants: HashMap<GenomicPosition, Variant>,
+    graph_index: Option<GraphIndex>, // GNN-based variant relationships
+}
+
+pub enum Genotype {
+    Homozygous(Allele),
+    Heterozygous(Allele, Allele),
+}
+```
+
+**Published Events**:
+- `VariantCalled { position: GenomicPosition, variant: Variant }`
+- `GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }`
+- `PopulationFrequencyCalculated { variant_id: String, frequency: f64 }`
+
+**Domain Language**:
+- SNP (Single Nucleotide Polymorphism): single base change
+- Indel: insertion or deletion
+- Genotype: allele combination (0/0, 0/1, 1/1)
+- Allele frequency: population prevalence
+- Quality score: confidence in variant call (Phred scale)
+- Coverage depth: number of reads supporting variant
+
+**Invariants**:
+- Quality score ≥ 0 (Phred scale)
+- Coverage depth ≥ 1
+- Allele frequency 0.0 ≤ AF ≤ 1.0
+- Reference and alternate alleles must differ
+- Genotype alleles must match available alleles
+
+**Relationship with Alignment Context**:
+- **Type**: Customer-Supplier
+- **Direction**: Alignment → Variant
+- **Integration**: Variant caller uses alignment results to identify mismatches
+- **Translation**: Alignment gaps → insertion/deletion variants
+
+**Shared Kernel with Sequence Context**:
+- `GenomicPosition { chromosome: String, position: usize }`
+- `QualityScore(f64)` (Phred-scaled)
+- `Nucleotide` enum (A, C, G, T)
+
+## 4. Protein Context
+
+**Module**: `protein.rs`
+
+**Responsibility**: Protein structure prediction and contact map generation
+
+**Core Aggregates**:
+- `ProteinGraph` - Root aggregate representing protein as graph
+- `ContactPredictor` - Service aggregate for 3D contact prediction
+
+**Key Types**:
+```rust
+pub struct ProteinGraph {
+    pub sequence: String, // amino acid sequence
+    pub nodes: Vec<AminoAcid>,
+    pub edges: Vec<(usize, usize, ContactType)>,
+}
+
+pub struct ContactPredictor {
+    gnn_service: Arc<GnnService>,
+    attention_service: Arc<AttentionService>,
+    distance_threshold: f64, // Ångströms
+}
+
+pub struct ContactPrediction {
+    pub residue_i: usize,
+    pub residue_j: usize,
+    pub probability: f64,
+    pub distance: Option<f64>,
+}
+
+pub enum ContactType {
+    Backbone,
+    SideChain,
+    HydrogenBond,
+    DisulfideBridge,
+}
+```
+
+**Published Events**:
+- `ProteinTranslated { gene_id: String, protein_sequence: String }`
+- `StructurePredicted { protein_id: String, contact_count: usize }`
+- `FoldingPathwayComputed { protein_id: String, energy: f64 }`
+
+**Domain Language**:
+- Amino acid: protein building block (20 standard types)
+- Residue: amino acid position in sequence
+- Contact: spatial proximity between residues (<8Å)
+- Secondary structure: local folding patterns (helix, sheet, loop)
+- Tertiary structure: 3D protein fold
+- Contact map: matrix of residue-residue distances
+
+**Invariants**:
+- Sequence length ≥ 1
+- Contact probability 0.0 ≤ p ≤ 1.0
+- Distance threshold > 0.0 (typically 8.0Å)
+- Contact pairs must be |i - j| ≥ 4 (exclude local contacts)
+
+**Relationship with Variant Context**:
+- **Type**: Partner (bidirectional)
+- **Direction**: Variant ↔ Protein
+- **Integration**:
+  - Variant → Protein: coding variants cause amino acid changes
+  - Protein → Variant: structural changes inform variant pathogenicity
+- **Translation**:
+  - Variant ACL translates nucleotide changes to codon changes
+  - Protein ACL maps structure disruption to clinical significance
+
+## 5. Epigenomic Context
+
+**Module**: `epigenomics.rs`
+
+**Responsibility**: DNA methylation analysis and epigenetic age prediction
+
+**Core Aggregates**:
+- `EpigeneticIndex` - Root aggregate managing methylation sites
+- `HorvathClock` - Service aggregate for epigenetic age calculation
+
+**Key Types**:
+```rust
+pub struct MethylationProfile {
+    pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
+    pub total_sites: usize,
+    pub mean_methylation: f64,
+}
+
+pub struct HorvathClock {
+    pub coefficients: HashMap<String, f64>, // CpG site → weight
+    pub intercept: f64,
+}
+
+pub struct CpGSite {
+    pub position: GenomicPosition,
+    pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
+    pub coverage: usize,
+}
+
+pub struct EpigeneticAge {
+    pub chronological_age: Option<f64>,
+    pub predicted_age: f64,
+    pub acceleration: f64, // predicted - chronological
+}
+```
+
+**Published Events**:
+- `MethylationProfileGenerated { sample_id: String, site_count: usize }`
+- `EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }`
+- `DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }`
+
+**Domain Language**:
+- CpG site: cytosine-guanine dinucleotide (methylation target)
+- Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
+- Epigenetic clock: age predictor based on methylation
+- Age acceleration: difference between epigenetic and chronological age
+- DMR (Differentially Methylated Region): region with changed methylation
+
+**Invariants**:
+- Beta value 0.0 ≤ β ≤ 1.0
+- Coverage ≥ 1
+- Horvath coefficients sum to meaningful scale
+- Age ≥ 0.0
+
+**Relationship with Variant Context**:
+- **Type**: Anti-Corruption Layer
+- **Direction**: Variant → Epigenomic
+- **Integration**: Variants in regulatory regions affect methylation patterns
+- **Translation**:
+  - ACL translates genetic variants to epigenetic effects
+  - Maps SNPs → methylation quantitative trait loci (mQTL)
+  - Prevents variant domain concepts from leaking into epigenetic model
+
+## 6. Pharmacogenomic Context
+
+**Module**: `pharma.rs`
+
+**Responsibility**: Pharmacogenetic analysis and drug-gene interaction prediction
+
+**Core Aggregates**:
+- `DrugInteractionGraph` - Root aggregate representing drug-gene network
+- `StarAlleleCaller` - Service aggregate for haplotype phasing
+
+**Key Types**:
+```rust
+pub struct StarAlleleCaller {
+    gene_definitions: HashMap<String, GeneDefinition>,
+    min_coverage: usize,
+}
+
+pub struct StarAllele {
+    pub gene: String,
+    pub allele: String, // e.g., "*1", "*2", "*17"
+    pub variants: Vec<Variant>,
+    pub function: AlleleFunction,
+}
+
+pub enum AlleleFunction {
+    Normal,
+    Increased,
+    Decreased,
+    NoFunction,
+}
+
+pub struct DrugInteractionGraph {
+    pub nodes: Vec<DrugGeneNode>,
+    pub edges: Vec<(usize, usize, InteractionType)>,
+}
+
+pub struct DrugResponse {
+    pub drug: String,
+    pub diplotype: Diplotype,
+    pub phenotype: MetabolizerPhenotype,
+    pub recommendation: ClinicalRecommendation,
+}
+
+pub enum MetabolizerPhenotype {
+    UltraRapid,
+    Rapid,
+    Normal,
+    Intermediate,
+    Poor,
+}
+```
+
+**Published Events**:
+- `StarAlleleIdentified { gene: String, allele: String, diplotype: String }`
+- `DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }`
+- `InteractionDetected { drug1: String, drug2: String, severity: Severity }`
+
+**Domain Language**:
+- Star allele: named haplotype variant (e.g., CYP2D6*4)
+- Diplotype: pair of haplotypes (e.g., *1/*4)
+- Metabolizer phenotype: drug metabolism rate
+- Pharmacogene: gene affecting drug response
+- Drug-gene interaction: how genetics modulates drug efficacy/toxicity
+
+**Invariants**:
+- Diplotype must have exactly 2 alleles
+- Phenotype derivable from diplotype
+- Coverage ≥ minimum threshold for calling
+- All star allele variants must exist in variant database
+
+**Relationship with Epigenomic Context**:
+- **Type**: Customer-Supplier
+- **Direction**: Epigenomic → Pharmacogenomic
+- **Integration**: Methylation affects drug metabolism gene expression
+- **Translation**: Methylation beta values → gene expression levels → phenotype
+
+## 7. Pipeline Context
+
+**Module**: `pipeline.rs`
+
+**Responsibility**: Orchestration of multi-stage genomic analysis workflow
+
+**Core Aggregates**:
+- `GenomicPipeline` - Root aggregate orchestrating all contexts
+
+**Key Types**:
+```rust
+pub struct GenomicPipeline {
+    pub kmer_encoder: KmerEncoder,
+    pub aligner: AttentionAligner,
+    pub variant_caller: VariantCaller,
+    pub protein_predictor: ContactPredictor,
+    pub methylation_analyzer: MethylationAnalyzer,
+    pub pharma_analyzer: StarAlleleCaller,
+}
+
+pub struct PipelineConfig {
+    pub k: usize,
+    pub min_variant_quality: f64,
+    pub min_coverage: usize,
+    pub enable_protein_prediction: bool,
+    pub enable_epigenetic_analysis: bool,
+    pub enable_pharmacogenomics: bool,
+}
+
+pub struct AnalysisResult {
+    pub sequence_stats: SequenceStats,
+    pub variants: Vec<Variant>,
+    pub protein_structures: Vec<ProteinGraph>,
+    pub methylation_profile: Option<MethylationProfile>,
+    pub drug_responses: Vec<DrugResponse>,
+}
+```
+
+**Published Events**:
+- `PipelineStarted { sample_id: String, stages: Vec<String> }`
+- `StageCompleted { stage: String, duration_ms: u64 }`
+- `PipelineCompleted { sample_id: String, total_duration_ms: u64 }`
+- `PipelineFailed { stage: String, error: String }`
+
+**Domain Language**:
+- Pipeline: directed acyclic graph of analysis stages
+- Stage: atomic analysis unit (alignment, variant calling, etc.)
+- Workflow: ordered execution of stages
+- Checkpoint: saved intermediate state
+- Provenance: lineage tracking of analysis steps
+
+**Invariants**:
+- All enabled stages must execute in dependency order
+- Failed stage halts downstream execution
+- All results traceable to input data and parameters
+
+**Anti-Corruption Layers**:
+
+The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:
+
+1. **Sequence ACL**: Translates k-mer indices to alignment seeds
+2. **Alignment ACL**: Converts alignment gaps to variant candidates
+3. **Variant ACL**: Maps variants to protein mutations
+4. **Protein ACL**: Translates structure to functional predictions
+5. **Epigenetic ACL**: Converts methylation to gene expression estimates
+6. **Pharmacogenomic ACL**: Maps genotypes to clinical recommendations
+
+## Context Relationship Matrix
+
+| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
+|---------------|----------|-----------|---------|---------|------------|--------|----------|
+| Sequence      | -        | C-S       | SK      | SK      | -          | -      | ACL      |
+| Alignment     | -        | -         | C-S     | -       | -          | -      | ACL      |
+| Variant       | -        | -         | -       | Partner | ACL        | -      | ACL      |
+| Protein       | -        | -         | Partner | -       | -          | -      | ACL      |
+| Epigenomic    | -        | -         | -       | -       | -          | C-S    | ACL      |
+| Pharma        | -        | -         | -       | -       | -          | -      | ACL      |
+| Pipeline      | C-S      | C-S       | C-S     | C-S     | C-S        | C-S    | -        |
+
+**Legend**:
+- C-S: Customer-Supplier
+- SK: Shared Kernel
+- Partner: Partnership
+- ACL: Anti-Corruption Layer
+
+## Integration Patterns
+
+### 1. Event-Driven Integration
+
+Contexts communicate via domain events to maintain loose coupling:
+
+```rust
+// Example: Variant Context publishes event
+pub enum DomainEvent {
+    VariantCalled(VariantCalledEvent),
+    ProteinStructurePredicted(ProteinPredictedEvent),
+    // ...
+}
+
+// Pipeline Context subscribes and translates
+impl EventHandler for GenomicPipeline {
+    fn handle(&mut self, event: DomainEvent) {
+        match event {
+            DomainEvent::VariantCalled(e) => {
+                if e.variant.is_coding() {
+                    self.trigger_protein_analysis(e.variant);
+                }
+            }
+            // ...
+        }
+    }
+}
+```
+
+### 2. Shared Kernel Components
+
+Core domain types shared across contexts:
+
+```rust
+// In types.rs (core domain)
+pub struct GenomicPosition {
+    pub chromosome: String,
+    pub position: usize,
+}
+
+pub struct QualityScore(pub f64); // Phred-scaled
+
+pub enum Nucleotide { A, C, G, T }
+
+pub struct GenomicRegion {
+    pub chromosome: String,
+    pub start: usize,
+    pub end: usize,
+}
+```
+
+### 3. Anti-Corruption Layer Example
+
+```rust
+// Variant → Protein ACL
+pub struct VariantToProteinTranslator {
+    codon_table: CodonTable,
+}
+
+impl VariantToProteinTranslator {
+    pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
+        // Prevents protein context from depending on variant implementation
+        let codon_change = self.map_to_codon(variant)?;
+        let aa_change = self.codon_table.translate(codon_change)?;
+
+        Some(ProteinMutation {
+            position: variant.position.position / 3,
+            reference_aa: aa_change.reference,
+            alternate_aa: aa_change.alternate,
+        })
+    }
+}
+```
+
+## Bounded Context Responsibilities Summary
+
+1. **Sequence Context**: K-mer indexing and sequence similarity (foundation)
+2. **Alignment Context**: Pairwise alignment and motif discovery
+3. **Variant Context**: Variant calling and population genetics
+4. **Protein Context**: Structure prediction and functional analysis
+5. **Epigenomic Context**: Methylation profiling and age prediction
+6. **Pharmacogenomic Context**: Drug-gene interactions and clinical recommendations
+7. **Pipeline Context**: Workflow orchestration and result aggregation
+
+Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.