Bounded Context Map - Genomic Analysis Platform

Context Map Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                         GENOMIC ANALYSIS PLATFORM                            │
└─────────────────────────────────────────────────────────────────────────────┘

    ┌──────────────────┐
    │   Pipeline       │ ◄───────── Orchestration Layer
    │   Context        │
    └────────┬─────────┘
             │ ACL (maps domain events to pipeline commands)
             │
    ┌────────┴─────────────────────────────────────────────┐
    │                                                       │
    ▼                                                       ▼
┌─────────────────┐                               ┌─────────────────┐
│   Sequence      │ Customer-Supplier             │   Alignment     │
│   Context       ├──────────────────────────────►│   Context       │
│                 │ (provides k-mer indices)      │                 │
└────────┬────────┘                               └────────┬────────┘
         │                                                 │
         │ Shared Kernel (GenomicPosition, QualityScore)  │
         │                                                 │
         ▼                                                 ▼
┌─────────────────┐                               ┌─────────────────┐
│   Variant       │                               │   Protein       │
│   Context       │◄──────────────────────────────┤   Context       │
│                 │  Partner (variant→structure)  │                 │
└────────┬────────┘                               └─────────────────┘
         │
         │ ACL (translates variants to epigenetic events)
         │
         ▼
┌─────────────────┐
│  Epigenomic     │
│  Context        │
└────────┬────────┘
         │
         │ Customer-Supplier (epigenetic→drug response)
         │
         ▼
┌─────────────────┐
│ Pharmacogenomic │
│ Context         │
└─────────────────┘

Legend:
  Customer-Supplier: →  (upstream provides services to downstream)
  Shared Kernel:     ├─┤ (shared domain model)
  Partner:           ◄─► (mutual dependency)
  ACL:               [A] (anti-corruption layer)

1. Sequence Context

Module: kmer.rs

Responsibility: K-mer indexing, sequence sketching, and similarity search

Core Aggregates:

  • KmerIndex - Root aggregate managing k-mer → position mappings
  • MinHashSketch - Aggregate for approximate sequence similarity

Key Types:

pub struct KmerEncoder {
    k: usize,
    alphabet_size: usize,
}

pub struct KmerIndex {
    k: usize,
    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
}

pub struct MinHashSketch {
    k: usize,
    num_hashes: usize,
    signatures: Vec<u64>,
}

Published Events:

  • SequenceIndexed { sequence_id: String, kmer_count: usize }
  • SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }

Domain Language:

  • K-mer: substring of length k
  • Minimizer: smallest k-mer (under a chosen ordering) within a window of consecutive k-mers, used as that window's representative
  • Sketch: compressed sequence signature
  • Jaccard similarity: set overlap metric
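
The relationship between a sketch and Jaccard similarity can be illustrated with a small example. This is a minimal sketch, not the contents of kmer.rs: the function names `minhash_signature` and `estimate_jaccard` are hypothetical, seeding `DefaultHasher` stands in for whatever hash family the module actually uses, and canonicalization of k-mers is omitted for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds a MinHash signature: for each of `num_hashes` seeded hash
/// functions, keep the minimum hash value over all k-mers of `seq`.
pub fn minhash_signature(seq: &str, k: usize, num_hashes: usize) -> Vec<u64> {
    let mut signature = vec![u64::MAX; num_hashes];
    let bytes = seq.as_bytes();
    if k == 0 || bytes.len() < k {
        return signature; // no k-mers to hash
    }
    for kmer in bytes.windows(k) {
        for (seed, slot) in signature.iter_mut().enumerate() {
            let mut hasher = DefaultHasher::new();
            (seed as u64).hash(&mut hasher); // the seed selects the hash function
            kmer.hash(&mut hasher);
            *slot = (*slot).min(hasher.finish());
        }
    }
    signature
}

/// Estimates Jaccard similarity as the fraction of signature slots that agree.
/// Returns None for empty or differently sized signatures.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let matches = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    Some(matches as f64 / a.len() as f64)
}
```

Two sequences with identical k-mer sets produce identical signatures, so the estimate is 1.0; more hash functions reduce the variance of the estimate for partially overlapping sets.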

Invariants:

  • K-mer length must be 3 ≤ k ≤ 32
  • MinHash signature size must be ≥ 1
  • All k-mers normalized to canonical form (min(kmer, reverse_complement))
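
The k-mer length bound and the canonical-form invariant can be sketched as follows. The example assumes a 2-bit encoding (A=0, C=1, G=2, T=3), which is one common reason for the upper bound of 32: 32 bases at 2 bits each fill a `u64`. The function names are hypothetical and the actual `KmerEncoder` may encode differently.

```rust
/// Encodes one nucleotide as 2 bits; returns None for anything but A/C/G/T.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Returns the canonical 2-bit encoding of `kmer`: the smaller of the
/// forward encoding and the encoding of its reverse complement.
/// Enforces the invariant 3 <= k <= 32.
pub fn canonical_kmer(kmer: &str) -> Option<u64> {
    let k = kmer.len();
    if !(3..=32).contains(&k) {
        return None;
    }
    let mut forward: u64 = 0;
    let mut reverse: u64 = 0;
    for (i, b) in kmer.bytes().enumerate() {
        let code = encode_base(b)?;
        forward = (forward << 2) | code;
        // With this encoding the complement is 3 - code (A<->T, C<->G).
        // Placing it at the mirrored position builds the reverse strand.
        reverse |= (3 - code) << (2 * i);
    }
    Some(forward.min(reverse))
}
```

A k-mer and its reverse complement map to the same value (for example, ACG and CGT both encode to 6), so reads from either strand hit the same index entry.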

2. Alignment Context

Module: alignment.rs

Responsibility: Attention-based sequence alignment and motif detection

Core Aggregates:

  • AttentionAligner - Root aggregate for pairwise sequence alignment
  • MotifScanner - Aggregate for regulatory motif discovery

Key Types:

pub struct AttentionAligner {
    attention_service: Arc<AttentionService>,
    gap_penalty: f64,
    match_bonus: f64,
}

pub struct MotifScanner {
    attention_service: Arc<AttentionService>,
    min_score: f64,
    known_motifs: Vec<MotifPattern>,
}

pub struct AlignmentResult {
    pub score: f64,
    pub aligned_query: String,
    pub aligned_target: String,
    pub attention_weights: Vec<Vec<f64>>,
}

Published Events:

  • AlignmentCompleted { query_id: String, target_id: String, score: f64 }
  • MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }

Domain Language:

  • Alignment: optimal mapping between two sequences
  • Gap penalty: cost of insertions/deletions
  • Attention weight: learned similarity between positions
  • Motif: conserved sequence pattern (e.g., TATA box)
  • PWM (Position Weight Matrix): motif scoring matrix

Invariants:

  • Gap penalty must be negative
  • Match bonus must be positive
  • Motif minimum score 0.0 ≤ score ≤ 1.0
  • Alignment score never increases when a gap is added (each gap contributes the negative gap penalty)
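
The sign invariants on the scoring parameters can be enforced at construction time. The following is a minimal sketch under stated assumptions: `ScoringParams` is a hypothetical type, mismatches are scored as 0.0 because the source does not define a mismatch penalty, and the attention-derived part of the score is left out entirely.

```rust
/// Hypothetical holder for the two scoring parameters named in the invariants.
#[derive(Debug)]
pub struct ScoringParams {
    match_bonus: f64,
    gap_penalty: f64,
}

impl ScoringParams {
    /// Rejects parameters that violate the invariants (NaN is rejected too).
    pub fn new(match_bonus: f64, gap_penalty: f64) -> Result<Self, String> {
        if !(match_bonus > 0.0) {
            return Err("match bonus must be positive".to_string());
        }
        if !(gap_penalty < 0.0) {
            return Err("gap penalty must be negative".to_string());
        }
        Ok(Self { match_bonus, gap_penalty })
    }

    /// Scores two gapped strings column by column; '-' marks a gap.
    pub fn score(&self, aligned_query: &str, aligned_target: &str) -> f64 {
        aligned_query
            .bytes()
            .zip(aligned_target.bytes())
            .map(|(q, t)| {
                if q == b'-' || t == b'-' {
                    self.gap_penalty
                } else if q == t {
                    self.match_bonus
                } else {
                    0.0 // assumption: mismatches are neutral
                }
            })
            .sum()
    }
}
```

Because the gap penalty is strictly negative, inserting a gap column into an alignment can only lower the total.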

Relationship with Sequence Context:

  • Type: Customer-Supplier
  • Direction: Sequence → Alignment
  • Integration: Alignment consumes k-mer indices for fast seed-and-extend
  • Translation: None (direct dependency)

3. Variant Context

Module: variant.rs

Responsibility: Variant calling, genotyping, and population genetics

Core Aggregates:

  • VariantDatabase - Root aggregate managing variant collection
  • VariantCaller - Service aggregate for variant detection

Key Types:

pub struct VariantCaller {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc<GnnService>,
}

pub struct Variant {
    pub position: GenomicPosition,
    pub reference: String,
    pub alternate: String,
    pub quality: f64,
    pub genotype: Genotype,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
}

pub struct VariantDatabase {
    variants: HashMap<GenomicPosition, Variant>,
    graph_index: Option<GraphIndex>, // GNN-based variant relationships
}

pub enum Genotype {
    Homozygous(Allele),
    Heterozygous(Allele, Allele),
}

Published Events:

  • VariantCalled { position: GenomicPosition, variant: Variant }
  • GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }
  • PopulationFrequencyCalculated { variant_id: String, frequency: f64 }

Domain Language:

  • SNP (Single Nucleotide Polymorphism): single base change
  • Indel: insertion or deletion
  • Genotype: allele combination (0/0, 0/1, 1/1)
  • Allele frequency: population prevalence
  • Quality score: confidence in variant call (Phred scale)
  • Coverage depth: number of reads covering the variant position
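
For reference, a Phred-scaled quality Q relates to the error probability p by Q = -10 · log10(p). A small sketch of the conversion, with hypothetical function names:

```rust
/// Converts a Phred-scaled quality to the probability that the call is wrong.
pub fn phred_to_error_prob(q: f64) -> f64 {
    10f64.powf(-q / 10.0)
}

/// Converts an error probability in (0, 1] to a Phred-scaled quality.
/// Returns None outside that range, where the score is undefined or negative.
pub fn error_prob_to_phred(p: f64) -> Option<f64> {
    if p > 0.0 && p <= 1.0 {
        Some(-10.0 * p.log10())
    } else {
        None
    }
}
```

A quality of 30 therefore corresponds to a 1-in-1000 chance of an incorrect call, and 20 to 1-in-100.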

Invariants:

  • Quality score ≥ 0 (Phred scale)
  • Coverage depth ≥ 1
  • Allele frequency 0.0 ≤ AF ≤ 1.0
  • Reference and alternate alleles must differ
  • Genotype alleles must match available alleles
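
Most of these invariants can be checked in a single validation step before a variant enters the database. The sketch below uses a simplified, hypothetical `VariantCandidate` rather than the `Variant` type above, and it omits the genotype check because that depends on the `Allele` type, which the source does not define.

```rust
#[derive(Debug, PartialEq)]
pub enum VariantError {
    NegativeQuality,
    ZeroDepth,
    FrequencyOutOfRange,
    IdenticalAlleles,
}

/// Simplified stand-in for `Variant`, carrying only the validated fields.
pub struct VariantCandidate {
    pub reference: String,
    pub alternate: String,
    pub quality: f64,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
}

/// Checks the quality, depth, allele-frequency, and allele-difference invariants.
pub fn validate(v: &VariantCandidate) -> Result<(), VariantError> {
    if !(v.quality >= 0.0) {
        return Err(VariantError::NegativeQuality); // also rejects NaN
    }
    if v.depth < 1 {
        return Err(VariantError::ZeroDepth);
    }
    if let Some(af) = v.allele_frequency {
        if !(0.0..=1.0).contains(&af) {
            return Err(VariantError::FrequencyOutOfRange);
        }
    }
    if v.reference == v.alternate {
        return Err(VariantError::IdenticalAlleles);
    }
    Ok(())
}
```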

Relationship with Alignment Context:

  • Type: Customer-Supplier
  • Direction: Alignment → Variant
  • Integration: Variant caller uses alignment results to identify mismatches
  • Translation: Alignment gaps → insertion/deletion variants
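
The gap-to-indel translation can be sketched as a scan over the two gapped strings of an `AlignmentResult`. The types `IndelKind` and `IndelCandidate` are hypothetical, and the convention is an assumption: a gap in the target is reported as an insertion, a gap in the query as a deletion, both positioned by a 0-based offset into the ungapped target.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndelKind {
    Insertion,
    Deletion,
}

#[derive(Debug, PartialEq)]
pub struct IndelCandidate {
    pub kind: IndelKind,
    /// 0-based offset into the ungapped target where the event starts.
    pub target_offset: usize,
    /// Inserted bases (from the query) or deleted bases (from the target).
    pub bases: String,
}

/// Walks the alignment columns and merges consecutive gap columns of the
/// same kind into one candidate.
pub fn indels_from_alignment(aligned_query: &str, aligned_target: &str) -> Vec<IndelCandidate> {
    let mut out: Vec<IndelCandidate> = Vec::new();
    let mut target_offset = 0usize;
    let mut open: Option<IndelKind> = None; // kind of the gap run in progress
    for (q, t) in aligned_query.chars().zip(aligned_target.chars()) {
        let event = match (q, t) {
            ('-', '-') => None, // malformed column, ignored
            (base, '-') => Some((IndelKind::Insertion, base)),
            ('-', base) => Some((IndelKind::Deletion, base)),
            _ => None,
        };
        match event {
            Some((kind, base)) => {
                if open == Some(kind) {
                    // Extend the run started in the previous column.
                    if let Some(last) = out.last_mut() {
                        last.bases.push(base);
                    }
                } else {
                    out.push(IndelCandidate { kind, target_offset, bases: base.to_string() });
                }
                open = Some(kind);
            }
            None => open = None,
        }
        if t != '-' {
            target_offset += 1; // only target bases advance the coordinate
        }
    }
    out
}
```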

Shared Kernel with Sequence Context:

  • GenomicPosition { chromosome: String, position: usize }
  • QualityScore(f64) (Phred-scaled)
  • Nucleotide enum (A, C, G, T)

4. Protein Context

Module: protein.rs

Responsibility: Protein structure prediction and contact map generation

Core Aggregates:

  • ProteinGraph - Root aggregate representing protein as graph
  • ContactPredictor - Service aggregate for 3D contact prediction

Key Types:

pub struct ProteinGraph {
    pub sequence: String, // amino acid sequence
    pub nodes: Vec<AminoAcid>,
    pub edges: Vec<(usize, usize, ContactType)>,
}

pub struct ContactPredictor {
    gnn_service: Arc<GnnService>,
    attention_service: Arc<AttentionService>,
    distance_threshold: f64, // Ångströms
}

pub struct ContactPrediction {
    pub residue_i: usize,
    pub residue_j: usize,
    pub probability: f64,
    pub distance: Option<f64>,
}

pub enum ContactType {
    Backbone,
    SideChain,
    HydrogenBond,
    DisulfideBridge,
}

Published Events:

  • ProteinTranslated { gene_id: String, protein_sequence: String }
  • StructurePredicted { protein_id: String, contact_count: usize }
  • FoldingPathwayComputed { protein_id: String, energy: f64 }

Domain Language:

  • Amino acid: protein building block (20 standard types)
  • Residue: amino acid position in sequence
  • Contact: spatial proximity between residues (<8Å)
  • Secondary structure: local folding patterns (helix, sheet, loop)
  • Tertiary structure: 3D protein fold
  • Contact map: matrix marking which residue pairs are in contact, derived from residue-residue distances

Invariants:

  • Sequence length ≥ 1
  • Contact probability 0.0 ≤ p ≤ 1.0
  • Distance threshold > 0.0 (typically 8.0Å)
  • Contact pairs must satisfy |i - j| ≥ 4 (excludes local contacts between sequence neighbors)
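
The threshold and separation invariants can be illustrated with a purely geometric contact filter. This is a sketch under assumptions, not the GNN-based `ContactPredictor`: it takes known 3D coordinates (one point per residue, in Ångströms) and the function names are hypothetical.

```rust
/// Euclidean distance between two 3D points.
fn distance(a: [f64; 3], b: [f64; 3]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Returns (i, j, distance) for every residue pair closer than `threshold`
/// whose sequence separation is at least `min_separation`.
pub fn contacts(coords: &[[f64; 3]], threshold: f64, min_separation: usize) -> Vec<(usize, usize, f64)> {
    let mut out = Vec::new();
    if !(threshold > 0.0) {
        return out; // invariant: distance threshold must be positive
    }
    for i in 0..coords.len() {
        // Starting j at i + min_separation enforces |i - j| >= min_separation.
        for j in (i + min_separation)..coords.len() {
            let d = distance(coords[i], coords[j]);
            if d < threshold {
                out.push((i, j, d));
            }
        }
    }
    out
}
```

Calling `contacts(&coords, 8.0, 4)` applies the typical 8.0Å threshold and the |i - j| ≥ 4 rule from the list above.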

Relationship with Variant Context:

  • Type: Partner (bidirectional)
  • Direction: Variant ↔ Protein
  • Integration:
    • Variant → Protein: coding variants cause amino acid changes
    • Protein → Variant: structural changes inform variant pathogenicity
  • Translation:
    • Variant ACL translates nucleotide changes to codon changes
    • Protein ACL maps structure disruption to clinical significance

5. Epigenomic Context

Module: epigenomics.rs

Responsibility: DNA methylation analysis and epigenetic age prediction

Core Aggregates:

  • EpigeneticIndex - Root aggregate managing methylation sites
  • HorvathClock - Service aggregate for epigenetic age calculation

Key Types:

pub struct MethylationProfile {
    pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
    pub total_sites: usize,
    pub mean_methylation: f64,
}

pub struct HorvathClock {
    pub coefficients: HashMap<String, f64>, // CpG site → weight
    pub intercept: f64,
}

pub struct CpGSite {
    pub position: GenomicPosition,
    pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
    pub coverage: usize,
}

pub struct EpigeneticAge {
    pub chronological_age: Option<f64>,
    pub predicted_age: f64,
    pub acceleration: f64, // predicted - chronological
}

Published Events:

  • MethylationProfileGenerated { sample_id: String, site_count: usize }
  • EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }
  • DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }

Domain Language:

  • CpG site: cytosine-guanine dinucleotide (methylation target)
  • Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
  • Epigenetic clock: age predictor based on methylation
  • Age acceleration: difference between epigenetic and chronological age
  • DMR (Differentially Methylated Region): region with changed methylation

Invariants:

  • Beta value 0.0 ≤ β ≤ 1.0
  • Coverage ≥ 1
  • Horvath coefficients, combined with the intercept, must yield a predicted age on a meaningful scale
  • Age ≥ 0.0
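
The clock and the acceleration definition can be sketched as a weighted sum. This is a simplified illustration: it treats the prediction as plain linear (intercept plus coefficient times beta value), whereas published epigenetic clocks may apply an additional calibration transform that is omitted here. The function names and the CpG identifiers in the usage are hypothetical.

```rust
use std::collections::HashMap;

/// Predicts age as intercept + sum(weight * beta) over the clock's CpG sites.
/// Returns None if a required site is missing or a beta value is outside [0, 1].
pub fn predict_age(
    coefficients: &HashMap<String, f64>,
    intercept: f64,
    betas: &HashMap<String, f64>,
) -> Option<f64> {
    let mut age = intercept;
    for (site, weight) in coefficients {
        let beta = *betas.get(site)?;
        if !(0.0..=1.0).contains(&beta) {
            return None; // invariant: 0.0 <= beta <= 1.0
        }
        age += weight * beta;
    }
    Some(age)
}

/// Age acceleration as defined in `EpigeneticAge`: predicted - chronological.
pub fn acceleration(predicted: f64, chronological: f64) -> f64 {
    predicted - chronological
}
```

A positive acceleration means the methylation profile looks older than the sample's chronological age.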

Relationship with Variant Context:

  • Type: Anti-Corruption Layer
  • Direction: Variant → Epigenomic
  • Integration: Variants in regulatory regions affect methylation patterns
  • Translation:
    • ACL translates genetic variants to epigenetic effects
    • Maps SNPs → methylation quantitative trait loci (mQTL)
    • Prevents variant domain concepts from leaking into epigenetic model

6. Pharmacogenomic Context

Module: pharma.rs

Responsibility: Pharmacogenetic analysis and drug-gene interaction prediction

Core Aggregates:

  • DrugInteractionGraph - Root aggregate representing drug-gene network
  • StarAlleleCaller - Service aggregate for haplotype phasing

Key Types:

pub struct StarAlleleCaller {
    gene_definitions: HashMap<String, GeneDefinition>,
    min_coverage: usize,
}

pub struct StarAllele {
    pub gene: String,
    pub allele: String, // e.g., "*1", "*2", "*17"
    pub variants: Vec<Variant>,
    pub function: AlleleFunction,
}

pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

pub struct DrugInteractionGraph {
    pub nodes: Vec<DrugGeneNode>,
    pub edges: Vec<(usize, usize, InteractionType)>,
}

pub struct DrugResponse {
    pub drug: String,
    pub diplotype: Diplotype,
    pub phenotype: MetabolizerPhenotype,
    pub recommendation: ClinicalRecommendation,
}

pub enum MetabolizerPhenotype {
    UltraRapid,
    Rapid,
    Normal,
    Intermediate,
    Poor,
}

Published Events:

  • StarAlleleIdentified { gene: String, allele: String, diplotype: String }
  • DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }
  • InteractionDetected { drug1: String, drug2: String, severity: Severity }

Domain Language:

  • Star allele: named haplotype variant (e.g., CYP2D6*4)
  • Diplotype: pair of haplotypes (e.g., *1/*4)
  • Metabolizer phenotype: drug metabolism rate
  • Pharmacogene: gene affecting drug response
  • Drug-gene interaction: how genetics modulates drug efficacy/toxicity

Invariants:

  • Diplotype must have exactly 2 alleles
  • Phenotype derivable from diplotype
  • Coverage ≥ minimum threshold for calling
  • All star allele variants must exist in variant database
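
The first two invariants can be expressed in the type system and a pure function. In the sketch below the diplotype is a fixed-size array, so it cannot hold anything other than two alleles, and the phenotype is derived from an activity score. The activity values and cut-offs are illustrative assumptions only, not clinical guidance and not taken from pharma.rs; the enums are redeclared so the example is self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

#[derive(Debug, PartialEq)]
pub enum MetabolizerPhenotype {
    UltraRapid,
    Rapid,
    Normal,
    Intermediate,
    Poor,
}

/// Illustrative per-allele activity values (assumed, not authoritative).
fn activity(f: AlleleFunction) -> f64 {
    match f {
        AlleleFunction::NoFunction => 0.0,
        AlleleFunction::Decreased => 0.5,
        AlleleFunction::Normal => 1.0,
        AlleleFunction::Increased => 2.0,
    }
}

/// Derives a phenotype from a diplotype. The `[_; 2]` type guarantees
/// exactly two alleles; the thresholds are hypothetical.
pub fn phenotype(diplotype: [AlleleFunction; 2]) -> MetabolizerPhenotype {
    let score = activity(diplotype[0]) + activity(diplotype[1]);
    if score == 0.0 {
        MetabolizerPhenotype::Poor
    } else if score < 2.0 {
        MetabolizerPhenotype::Intermediate
    } else if score == 2.0 {
        MetabolizerPhenotype::Normal
    } else if score < 4.0 {
        MetabolizerPhenotype::Rapid
    } else {
        MetabolizerPhenotype::UltraRapid
    }
}
```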

Relationship with Epigenomic Context:

  • Type: Customer-Supplier
  • Direction: Epigenomic → Pharmacogenomic
  • Integration: Methylation affects drug metabolism gene expression
  • Translation: Methylation beta values → gene expression levels → phenotype

7. Pipeline Context

Module: pipeline.rs

Responsibility: Orchestration of multi-stage genomic analysis workflow

Core Aggregates:

  • GenomicPipeline - Root aggregate orchestrating all contexts

Key Types:

pub struct GenomicPipeline {
    pub kmer_encoder: KmerEncoder,
    pub aligner: AttentionAligner,
    pub variant_caller: VariantCaller,
    pub protein_predictor: ContactPredictor,
    pub methylation_analyzer: MethylationAnalyzer,
    pub pharma_analyzer: StarAlleleCaller,
}

pub struct PipelineConfig {
    pub k: usize,
    pub min_variant_quality: f64,
    pub min_coverage: usize,
    pub enable_protein_prediction: bool,
    pub enable_epigenetic_analysis: bool,
    pub enable_pharmacogenomics: bool,
}

pub struct AnalysisResult {
    pub sequence_stats: SequenceStats,
    pub variants: Vec<Variant>,
    pub protein_structures: Vec<ProteinGraph>,
    pub methylation_profile: Option<MethylationProfile>,
    pub drug_responses: Vec<DrugResponse>,
}

Published Events:

  • PipelineStarted { sample_id: String, stages: Vec<String> }
  • StageCompleted { stage: String, duration_ms: u64 }
  • PipelineCompleted { sample_id: String, total_duration_ms: u64 }
  • PipelineFailed { stage: String, error: String }

Domain Language:

  • Pipeline: directed acyclic graph of analysis stages
  • Stage: atomic analysis unit (alignment, variant calling, etc.)
  • Workflow: ordered execution of stages
  • Checkpoint: saved intermediate state
  • Provenance: lineage tracking of analysis steps

Invariants:

  • All enabled stages must execute in dependency order
  • Failed stage halts downstream execution
  • All results traceable to input data and parameters
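
The first two invariants can be sketched with a topological sort followed by a run loop that stops at the first failure. This is a minimal illustration under assumptions: stages are identified by plain strings, `execution_order` and `run_stages` are hypothetical names, and the error tuple mirrors the fields of `PipelineFailed`.

```rust
use std::collections::{HashMap, VecDeque};

/// Orders stages so every stage runs after its dependencies (Kahn's algorithm).
/// Returns None on a cycle, a duplicate stage, or an unknown dependency.
pub fn execution_order(stages: &[(&str, Vec<&str>)]) -> Option<Vec<String>> {
    let mut indegree: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (name, deps) in stages {
        indegree.insert(*name, deps.len());
    }
    for (name, deps) in stages {
        for dep in deps {
            if !indegree.contains_key(dep) {
                return None;
            }
            dependents.entry(*dep).or_default().push(*name);
        }
    }
    // Stages without dependencies are ready immediately, in input order.
    let mut ready: VecDeque<&str> = stages
        .iter()
        .map(|(name, _)| *name)
        .filter(|name| indegree[name] == 0)
        .collect();
    let mut order = Vec::with_capacity(stages.len());
    while let Some(stage) = ready.pop_front() {
        order.push(stage.to_string());
        for next in dependents.get(stage).into_iter().flatten() {
            if let Some(remaining) = indegree.get_mut(next) {
                *remaining -= 1;
                if *remaining == 0 {
                    ready.push_back(*next);
                }
            }
        }
    }
    if order.len() == stages.len() { Some(order) } else { None }
}

/// Runs stages in order; the first failure halts everything downstream and
/// is returned as (stage, error).
pub fn run_stages<F>(order: &[String], mut run: F) -> Result<Vec<String>, (String, String)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut completed = Vec::new();
    for stage in order {
        match run(stage.as_str()) {
            Ok(()) => completed.push(stage.clone()),
            Err(error) => return Err((stage.clone(), error)),
        }
    }
    Ok(completed)
}
```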

Anti-Corruption Layers:

The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:

  1. Sequence ACL: Translates k-mer indices to alignment seeds
  2. Alignment ACL: Converts alignment gaps to variant candidates
  3. Variant ACL: Maps variants to protein mutations
  4. Protein ACL: Translates structure to functional predictions
  5. Epigenetic ACL: Converts methylation to gene expression estimates
  6. Pharmacogenomic ACL: Maps genotypes to clinical recommendations

Context Relationship Matrix

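| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
|---------------|----------|-----------|---------|---------|------------|--------|----------|
| Sequence      | -        | C-S       | SK      | SK      | -          | -      | ACL      |
| Alignment     | -        | -         | C-S     | -       | -          | -      | ACL      |
| Variant       | -        | -         | -       | Partner | ACL        | -      | ACL      |
| Protein       | -        | -         | Partner | -       | -          | -      | ACL      |
| Epigenomic    | -        | -         | -       | -       | -          | C-S    | ACL      |
| Pharma        | -        | -         | -       | -       | -          | -      | ACL      |
| Pipeline      | C-S      | C-S       | C-S     | C-S     | C-S        | C-S    | -        |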

Legend:

  • C-S: Customer-Supplier
  • SK: Shared Kernel
  • Partner: Partnership
  • ACL: Anti-Corruption Layer

Integration Patterns

1. Event-Driven Integration

Contexts communicate via domain events to maintain loose coupling:

// Example: Variant Context publishes event
pub enum DomainEvent {
    VariantCalled(VariantCalledEvent),
    ProteinStructurePredicted(ProteinPredictedEvent),
    // ...
}

// Pipeline Context subscribes and translates
impl EventHandler for GenomicPipeline {
    fn handle(&mut self, event: DomainEvent) {
        match event {
            DomainEvent::VariantCalled(e) => {
                if e.variant.is_coding() {
                    self.trigger_protein_analysis(e.variant);
                }
            }
            // ...
        }
    }
}

2. Shared Kernel Components

Core domain types shared across contexts:

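// In types.rs (core domain)
// Eq and Hash are derived because GenomicPosition is used as a HashMap key
// in the Variant and Epigenomic contexts.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

// f64 has no total equality, so only the partial comparison traits are derived.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct QualityScore(pub f64); // Phred-scaled

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Nucleotide { A, C, G, T }

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize,
    pub end: usize,
}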

3. Anti-Corruption Layer Example

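// Variant → Protein ACL
pub struct VariantToProteinTranslator {
    codon_table: CodonTable,
}

impl VariantToProteinTranslator {
    /// Translates a nucleotide-level variant into an amino-acid-level mutation,
    /// so the Protein Context never depends on the Variant Context's model.
    /// Returns None when the variant does not map to a codon change.
    pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
        let codon_change = self.map_to_codon(variant)?;
        let aa_change = self.codon_table.translate(codon_change)?;

        Some(ProteinMutation {
            // Dividing by 3 maps a nucleotide offset to a residue index. This is
            // only correct if `position` is a 0-based offset into the coding
            // sequence; a raw genomic coordinate must first be made CDS-relative.
            position: variant.position.position / 3,
            reference_aa: aa_change.reference,
            alternate_aa: aa_change.alternate,
        })
    }
}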
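
The residue-index arithmetic in the translator depends on the position being relative to the coding sequence. A minimal sketch of that conversion follows, under strong simplifying assumptions: a single-exon gene on the forward strand with 0-based, half-open CDS coordinates. The function `codon_coordinates` and its parameters are hypothetical; real transcripts need exon and strand handling that is out of scope here.

```rust
/// Maps a genomic position to (residue index, position within codon), both
/// 0-based, for a single-exon forward-strand CDS spanning [cds_start, cds_end).
/// Returns None for positions outside the coding sequence.
pub fn codon_coordinates(genomic_pos: usize, cds_start: usize, cds_end: usize) -> Option<(usize, usize)> {
    if genomic_pos < cds_start || genomic_pos >= cds_end {
        return None;
    }
    let offset = genomic_pos - cds_start; // CDS-relative nucleotide offset
    Some((offset / 3, offset % 3))
}
```

For a CDS starting at 1000, position 1007 falls in the third codon (index 2) at its second base (index 1).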

Bounded Context Responsibilities Summary

  1. Sequence Context: K-mer indexing and sequence similarity (foundation)
  2. Alignment Context: Pairwise alignment and motif discovery
  3. Variant Context: Variant calling and population genetics
  4. Protein Context: Structure prediction and functional analysis
  5. Epigenomic Context: Methylation profiling and age prediction
  6. Pharmacogenomic Context: Drug-gene interactions and clinical recommendations
  7. Pipeline Context: Workflow orchestration and result aggregation

Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.