Files
wifi-densepose/vendor/ruvector/examples/dna/ddd/domain-model.md

24 KiB
Raw Permalink Blame History

Domain Model - Genomic Analysis Platform

Overview

This document defines all entities, value objects, aggregates, and domain events across the seven bounded contexts. Each type is shown with its Rust signature and business rules.

Core Domain Types (Shared Kernel)

Value Objects

/// Genomic coordinate (immutable)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

// Invariants:
// - chromosome must be valid (1-22, X, Y, MT)
// - position must be ≥ 1

/// Quality score using Phred scale: Q = -10 * log10(P_error)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityScore(pub f64);

// Invariants:
// - score ≥ 0
// - Q=10 means 10% error rate
// - Q=20 means 1% error rate
// - Q=30 means 0.1% error rate

/// Single nucleotide
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nucleotide {
    A, // Adenine
    C, // Cytosine
    G, // Guanine
    T, // Thymine
}

// Operations:
impl Nucleotide {
    pub fn complement(&self) -> Self;
    pub fn to_byte(&self) -> u8;
    pub fn from_byte(b: u8) -> Result<Self, Error>;
}

/// Genomic interval
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize,
    pub end: usize,
}

// Invariants:
// - start < end
// - start ≥ 1
// - Same chromosome validity rules as GenomicPosition

/// Amino acid single-letter code
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AminoAcid {
    A, C, D, E, F, G, H, I, K, L,
    M, N, P, Q, R, S, T, V, W, Y,
    Stop,
}

// Invariants:
// - 20 standard amino acids + stop codon
// - Each has specific properties (hydrophobic, charged, etc.)

1. Sequence Context Domain Model

Aggregates

/// Aggregate Root: K-mer index for fast sequence search
pub struct KmerIndex {
    k: usize,
    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
    sequence_length: usize,
}

// Aggregate boundary: Controls all k-mer operations
// Invariants:
// - 3 ≤ k ≤ 32
// - All positions < sequence_length
// - K-mers stored in canonical form

impl KmerIndex {
    pub fn new(k: usize) -> Result<Self, Error>;
    pub fn index_sequence(&mut self, sequence: &[u8]) -> Result<(), Error>;
    pub fn query(&self, kmer: &[u8]) -> Vec<usize>;
    pub fn contains(&self, kmer: &[u8]) -> bool;
}

/// Aggregate: MinHash sketch for approximate similarity
pub struct MinHashSketch {
    k: usize,
    num_hashes: usize,
    signatures: Vec<u64>,
}

// Invariants:
// - num_hashes ≥ 1 (typically 128-1024)
// - signatures.len() == num_hashes
// - Signatures sorted in ascending order

impl MinHashSketch {
    pub fn new(k: usize, num_hashes: usize) -> Self;
    pub fn add_sequence(&mut self, sequence: &[u8]);
    pub fn jaccard_similarity(&self, other: &Self) -> f64;
}

Entities

/// Entity: DNA sequence with metadata
#[derive(Debug, Clone)]
pub struct DnaSequence {
    pub id: String, // Identity
    pub sequence: Vec<u8>,
    pub quality_scores: Option<Vec<QualityScore>>,
    pub created_at: DateTime<Utc>,
}

// Invariants:
// - id must be unique
// - sequence contains only A, C, G, T, N
// - if quality_scores.is_some(), length must equal sequence length

impl DnaSequence {
    pub fn reverse_complement(&self) -> Self;
    pub fn gc_content(&self) -> f64;
    pub fn length(&self) -> usize;
}

Value Objects

/// K-mer encoder configuration
#[derive(Debug, Clone, Copy)]
pub struct KmerConfig {
    pub k: usize,
    pub alphabet_size: usize,
}

// Invariants:
// - k ≥ 3
// - alphabet_size typically 4 (DNA) or 20 (protein)

Domain Events

pub enum SequenceEvent {
    SequenceIndexed {
        sequence_id: String,
        kmer_count: usize,
        timestamp: DateTime<Utc>,
    },
    SimilarSequenceFound {
        query_id: String,
        match_id: String,
        similarity: f64,
        timestamp: DateTime<Utc>,
    },
}

2. Alignment Context Domain Model

Aggregates

/// Aggregate Root: Attention-based sequence aligner
pub struct AttentionAligner {
    attention_service: Arc<AttentionService>,
    gap_penalty: f64,
    match_bonus: f64,
}

// Invariants:
// - gap_penalty < 0
// - match_bonus > 0
// - |gap_penalty| < match_bonus (gaps should be costly)

impl AttentionAligner {
    pub fn align(&self, query: &[u8], target: &[u8])
        -> Result<AlignmentResult, Error>;
    pub fn batch_align(&self, pairs: Vec<(&[u8], &[u8])>)
        -> Result<Vec<AlignmentResult>, Error>;
}

/// Aggregate: Motif scanner for regulatory elements
pub struct MotifScanner {
    attention_service: Arc<AttentionService>,
    min_score: f64,
    known_motifs: Vec<MotifPattern>,
}

// Invariants:
// - 0.0 ≤ min_score ≤ 1.0
// - All motif patterns valid (length ≥ 4)

impl MotifScanner {
    pub fn scan(&self, sequence: &[u8]) -> Vec<MotifMatch>;
    pub fn add_motif(&mut self, pattern: MotifPattern);
}

Value Objects

/// Alignment result (immutable)
#[derive(Debug, Clone)]
pub struct AlignmentResult {
    pub score: f64,
    pub aligned_query: String,
    pub aligned_target: String,
    pub attention_weights: Vec<Vec<f64>>,
    pub identity: f64, // % exact matches
    pub gaps: usize,
}

// Invariants:
// - aligned_query.len() == aligned_target.len()
// - 0.0 ≤ identity ≤ 1.0
// - attention_weights dimensions match alignment length

/// Motif pattern definition
#[derive(Debug, Clone)]
pub struct MotifPattern {
    pub name: String,
    pub consensus: String, // IUPAC nucleotide codes
    pub pwm: Vec<[f64; 4]>, // Position Weight Matrix
}

// Invariants:
// - consensus.len() == pwm.len()
// - Each PWM position sums to ~1.0
// - pwm.len() ≥ 4

/// Motif match location
#[derive(Debug, Clone)]
pub struct MotifMatch {
    pub motif_name: String,
    pub position: usize,
    pub score: f64,
    pub strand: Strand,
}

#[derive(Debug, Clone, Copy)]
pub enum Strand {
    Forward,
    Reverse,
}

Domain Events

pub enum AlignmentEvent {
    AlignmentCompleted {
        query_id: String,
        target_id: String,
        score: f64,
        timestamp: DateTime<Utc>,
    },
    MotifDetected {
        sequence_id: String,
        motif: String,
        position: usize,
        score: f64,
        timestamp: DateTime<Utc>,
    },
}

3. Variant Context Domain Model

Aggregates

/// Aggregate Root: Collection of genetic variants
pub struct VariantDatabase {
    variants: HashMap<GenomicPosition, Variant>,
    graph_index: Option<GraphIndex>,
    population_frequencies: HashMap<String, f64>,
}

// Aggregate boundary: Ensures variant consistency and relationships
// Invariants:
// - No duplicate positions
// - All frequencies 0.0 ≤ f ≤ 1.0
// - Graph index consistent with variant set

impl VariantDatabase {
    pub fn add_variant(&mut self, variant: Variant) -> Result<(), Error>;
    pub fn get_variant(&self, pos: &GenomicPosition) -> Option<&Variant>;
    pub fn variants_in_region(&self, region: &GenomicRegion) -> Vec<&Variant>;
    pub fn update_frequency(&mut self, pos: &GenomicPosition, freq: f64);
}

/// Service Aggregate: Variant calling engine
pub struct VariantCaller {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc<GnnService>,
}

// Invariants:
// - min_quality ≥ 0
// - min_depth ≥ 1

impl VariantCaller {
    pub fn call_variants(&self, reads: &[Read], reference: &[u8])
        -> Result<Vec<Variant>, Error>;
    pub fn genotype(&self, variant: &Variant, reads: &[Read])
        -> Result<Genotype, Error>;
}

Entities

/// Entity: Genetic variant with identity at genomic position
#[derive(Debug, Clone)]
pub struct Variant {
    pub position: GenomicPosition, // Identity (part of)
    pub reference: String,
    pub alternate: String,
    pub variant_type: VariantType,
    pub quality: f64,
    pub genotype: Genotype,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
    pub annotations: Vec<Annotation>,
}

// Invariants:
// - reference != alternate
// - quality ≥ 0
// - depth ≥ 1
// - if allele_frequency.is_some(), 0.0 ≤ f ≤ 1.0
// - variant_type consistent with reference/alternate

impl Variant {
    pub fn is_snp(&self) -> bool;
    pub fn is_indel(&self) -> bool;
    pub fn is_coding(&self) -> bool;
    pub fn clinical_significance(&self) -> ClinicalSignificance;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VariantType {
    SNP,
    Insertion,
    Deletion,
    MNP, // Multi-nucleotide polymorphism
    Complex,
}

Value Objects

/// Genotype representation
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Genotype {
    Homozygous(Allele),
    Heterozygous(Allele, Allele),
}

// Invariants:
// - Heterozygous alleles must differ
// - Alleles must match variant's reference/alternate

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Allele {
    Reference,
    Alternate,
}

/// Variant annotation
#[derive(Debug, Clone)]
pub struct Annotation {
    pub gene: String,
    pub consequence: Consequence,
    pub impact: Impact,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Consequence {
    Synonymous,
    Missense,
    Nonsense,
    FrameShift,
    SpliceSite,
    Regulatory,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Impact {
    High,
    Moderate,
    Low,
    Modifier,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClinicalSignificance {
    Benign,
    LikelyBenign,
    VUS, // Variant of Uncertain Significance
    LikelyPathogenic,
    Pathogenic,
}

Domain Events

pub enum VariantEvent {
    VariantCalled {
        position: GenomicPosition,
        variant: Variant,
        timestamp: DateTime<Utc>,
    },
    GenotypeUpdated {
        sample_id: String,
        position: GenomicPosition,
        genotype: Genotype,
        timestamp: DateTime<Utc>,
    },
    PopulationFrequencyCalculated {
        variant_id: String,
        frequency: f64,
        population: String,
        timestamp: DateTime<Utc>,
    },
}

4. Protein Context Domain Model

Aggregates

/// Aggregate Root: Protein represented as graph
pub struct ProteinGraph {
    pub id: String,
    pub sequence: String, // Amino acid sequence
    pub nodes: Vec<Residue>,
    pub edges: Vec<Contact>,
    pub secondary_structure: Vec<SecondaryStructureElement>,
}

// Aggregate boundary: Manages all structural relationships
// Invariants:
// - nodes.len() == sequence.len()
// - All edge indices < nodes.len()
// - No duplicate contacts

impl ProteinGraph {
    pub fn from_sequence(sequence: String) -> Self;
    pub fn add_contact(&mut self, i: usize, j: usize, contact_type: ContactType);
    pub fn contact_map(&self) -> Vec<Vec<f64>>;
    pub fn fold_energy(&self) -> f64;
}

/// Service Aggregate: 3D contact prediction
pub struct ContactPredictor {
    gnn_service: Arc<GnnService>,
    attention_service: Arc<AttentionService>,
    distance_threshold: f64,
}

// Invariants:
// - distance_threshold > 0.0 (typically 8.0 Ångströms)

impl ContactPredictor {
    pub fn predict_contacts(&self, sequence: &str)
        -> Result<Vec<ContactPrediction>, Error>;
    pub fn predict_structure(&self, sequence: &str)
        -> Result<ProteinGraph, Error>;
}

Entities

/// Entity: Amino acid residue in protein
#[derive(Debug, Clone)]
pub struct Residue {
    pub position: usize, // Identity
    pub amino_acid: AminoAcid,
    pub phi_angle: Option<f64>, // Backbone dihedral
    pub psi_angle: Option<f64>, // Backbone dihedral
    pub secondary_structure: Option<SecondaryStructure>,
}

// Invariants:
// - position ≥ 1
// - -180° ≤ phi, psi ≤ 180°

Value Objects

/// Contact between residues
#[derive(Debug, Clone)]
pub struct Contact {
    pub residue_i: usize,
    pub residue_j: usize,
    pub contact_type: ContactType,
    pub distance: Option<f64>, // Ångströms
}

// Invariants:
// - residue_i < residue_j (ordered)
// - |residue_i - residue_j| ≥ 4 (exclude local contacts)
// - if distance.is_some(), distance > 0.0

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContactType {
    Backbone,
    SideChain,
    HydrogenBond,
    DisulfideBridge,
}

/// Contact prediction with confidence
#[derive(Debug, Clone)]
pub struct ContactPrediction {
    pub residue_i: usize,
    pub residue_j: usize,
    pub probability: f64,
    pub distance: Option<f64>,
}

// Invariants:
// - 0.0 ≤ probability ≤ 1.0

/// Secondary structure element
#[derive(Debug, Clone)]
pub struct SecondaryStructureElement {
    pub start: usize,
    pub end: usize,
    pub structure_type: SecondaryStructure,
}

// Invariants:
// - start < end

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SecondaryStructure {
    Helix,      // α-helix
    Sheet,      // β-sheet
    Loop,       // Random coil
    Turn,       // β-turn
}

/// Protein mutation
#[derive(Debug, Clone)]
pub struct ProteinMutation {
    pub position: usize,
    pub reference_aa: AminoAcid,
    pub alternate_aa: AminoAcid,
    pub structural_impact: f64, // 0.0-1.0
}

Domain Events

pub enum ProteinEvent {
    ProteinTranslated {
        gene_id: String,
        protein_sequence: String,
        timestamp: DateTime<Utc>,
    },
    StructurePredicted {
        protein_id: String,
        contact_count: usize,
        confidence: f64,
        timestamp: DateTime<Utc>,
    },
}

5. Epigenomic Context Domain Model

Aggregates

/// Aggregate Root: Epigenetic modification index
pub struct EpigeneticIndex {
    cpg_sites: HashMap<GenomicPosition, CpGSite>,
    methylation_profile: MethylationProfile,
}

// Aggregate boundary: Manages methylation data consistency
// Invariants:
// - All CpG sites have valid genomic positions
// - Beta values 0.0 ≤ β ≤ 1.0

impl EpigeneticIndex {
    pub fn add_site(&mut self, site: CpGSite) -> Result<(), Error>;
    pub fn get_profile(&self) -> &MethylationProfile;
    pub fn differential_methylation(&self, other: &Self)
        -> Vec<DifferentialRegion>;
}

/// Service Aggregate: Epigenetic age calculator
pub struct HorvathClock {
    coefficients: HashMap<String, f64>,
    intercept: f64,
}

// Invariants:
// - At least 353 CpG sites (original Horvath model)
// - Coefficients normalized

impl HorvathClock {
    pub fn predict_age(&self, profile: &MethylationProfile)
        -> Result<EpigeneticAge, Error>;
}

Entities

/// Entity: Methylation profile for sample
#[derive(Debug, Clone)]
pub struct MethylationProfile {
    pub sample_id: String, // Identity
    pub cpg_sites: HashMap<GenomicPosition, f64>,
    pub total_sites: usize,
    pub mean_methylation: f64,
    pub created_at: DateTime<Utc>,
}

// Invariants:
// - cpg_sites.len() ≤ total_sites
// - All beta values 0.0 ≤ β ≤ 1.0
// - mean_methylation = average of all beta values

impl MethylationProfile {
    pub fn global_methylation(&self) -> f64;
    pub fn region_methylation(&self, region: &GenomicRegion) -> f64;
}

Value Objects

/// CpG methylation site
#[derive(Debug, Clone)]
pub struct CpGSite {
    pub position: GenomicPosition,
    pub beta_value: f64, // 0.0 = unmethylated, 1.0 = fully methylated
    pub coverage: usize,
    pub quality: QualityScore,
}

// Invariants:
// - 0.0 ≤ beta_value ≤ 1.0
// - coverage ≥ 1

/// Epigenetic age prediction
#[derive(Debug, Clone)]
pub struct EpigeneticAge {
    pub chronological_age: Option<f64>,
    pub predicted_age: f64,
    pub acceleration: f64, // predicted - chronological
    pub confidence_interval: (f64, f64),
}

// Invariants:
// - predicted_age ≥ 0.0
// - confidence_interval.0 < confidence_interval.1

/// Differentially methylated region
#[derive(Debug, Clone)]
pub struct DifferentialRegion {
    pub region: GenomicRegion,
    pub delta_beta: f64,
    pub p_value: f64,
}

// Invariants:
// - -1.0 ≤ delta_beta ≤ 1.0
// - 0.0 ≤ p_value ≤ 1.0

Domain Events

pub enum EpigenomicEvent {
    MethylationProfileGenerated {
        sample_id: String,
        site_count: usize,
        timestamp: DateTime<Utc>,
    },
    EpigeneticAgeCalculated {
        sample_id: String,
        age: f64,
        acceleration: f64,
        timestamp: DateTime<Utc>,
    },
}

6. Pharmacogenomic Context Domain Model

Aggregates

/// Aggregate Root: Drug-gene interaction network
pub struct DrugInteractionGraph {
    nodes: Vec<DrugGeneNode>,
    edges: Vec<Interaction>,
    phenotype_map: HashMap<Diplotype, MetabolizerPhenotype>,
}

// Aggregate boundary: Manages pharmacogenetic relationships
// Invariants:
// - All edge indices valid
// - All diplotypes map to phenotypes

impl DrugInteractionGraph {
    pub fn add_interaction(&mut self, interaction: Interaction);
    pub fn predict_response(&self, drug: &str, diplotype: &Diplotype)
        -> DrugResponse;
}

/// Service Aggregate: Star allele haplotype caller
pub struct StarAlleleCaller {
    gene_definitions: HashMap<String, GeneDefinition>,
    min_coverage: usize,
}

// Invariants:
// - min_coverage ≥ 1
// - All genes have valid definitions

impl StarAlleleCaller {
    pub fn call_alleles(&self, variants: &[Variant], gene: &str)
        -> Result<Diplotype, Error>;
}

Entities

/// Entity: Star allele definition
#[derive(Debug, Clone)]
pub struct StarAllele {
    pub id: String, // Identity (e.g., "CYP2D6*4")
    pub gene: String,
    pub allele: String,
    pub variants: Vec<Variant>,
    pub function: AlleleFunction,
    pub activity_score: f64,
}

// Invariants:
// - id format: "{gene}*{allele_number}"
// - 0.0 ≤ activity_score ≤ 2.0

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

Value Objects

/// Diplotype (pair of haplotypes)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Diplotype {
    pub allele1: String,
    pub allele2: String,
}

// Invariants:
// - Both alleles non-empty
// - Canonical ordering (allele1 ≤ allele2 lexicographically)

/// Metabolizer phenotype derived from diplotype
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetabolizerPhenotype {
    UltraRapid,   // Activity score > 2.0
    Rapid,        // Activity score 1.5-2.0
    Normal,       // Activity score 1.0-1.5
    Intermediate, // Activity score 0.5-1.0
    Poor,         // Activity score < 0.5
}

impl MetabolizerPhenotype {
    pub fn from_activity_score(score: f64) -> Self;
}

/// Drug response prediction
#[derive(Debug, Clone)]
pub struct DrugResponse {
    pub drug: String,
    pub diplotype: Diplotype,
    pub phenotype: MetabolizerPhenotype,
    pub recommendation: ClinicalRecommendation,
}

#[derive(Debug, Clone)]
pub struct ClinicalRecommendation {
    pub recommendation_type: RecommendationType,
    pub dosage_adjustment: Option<f64>, // Multiplier
    pub alternative_drug: Option<String>,
    pub cpic_level: CpicLevel, // CPIC evidence level
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecommendationType {
    Standard,
    IncreaseDose,
    DecreaseDose,
    AlternativeDrug,
    Contraindicated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CpicLevel {
    A, // High evidence
    B, // Moderate evidence
    C, // Low evidence
    D, // Preclinical evidence
}

/// Drug-gene interaction
#[derive(Debug, Clone)]
pub struct Interaction {
    pub drug: String,
    pub gene: String,
    pub interaction_type: InteractionType,
    pub strength: f64, // 0.0-1.0
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InteractionType {
    Metabolism,
    Transport,
    Target,
    Toxicity,
}

Domain Events

pub enum PharmacogenomicEvent {
    StarAlleleIdentified {
        gene: String,
        allele: String,
        diplotype: String,
        timestamp: DateTime<Utc>,
    },
    DrugResponsePredicted {
        drug: String,
        phenotype: MetabolizerPhenotype,
        recommendation: RecommendationType,
        timestamp: DateTime<Utc>,
    },
}

7. Pipeline Context Domain Model

Aggregates

/// Aggregate Root: Complete genomic analysis workflow
pub struct GenomicPipeline {
    pub id: String,
    pub config: PipelineConfig,
    stages: Vec<PipelineStage>,
    state: PipelineState,
    results: AnalysisResult,
}

// Aggregate boundary: Orchestrates all analysis contexts
// Invariants:
// - Stages execute in dependency order
// - No stage runs until dependencies complete
// - Failed stage prevents downstream execution

impl GenomicPipeline {
    pub fn new(config: PipelineConfig) -> Self;
    pub fn run(&mut self, input: SequenceData) -> Result<AnalysisResult, Error>;
    pub fn run_stage(&mut self, stage: &str) -> Result<(), Error>;
    pub fn checkpoint(&self) -> Result<(), Error>;
    pub fn restore(checkpoint_id: &str) -> Result<Self, Error>;
}

Value Objects

/// Pipeline configuration
#[derive(Debug, Clone)]
pub struct PipelineConfig {
    pub k: usize,
    pub min_variant_quality: f64,
    pub min_coverage: usize,
    pub enable_protein_prediction: bool,
    pub enable_epigenetic_analysis: bool,
    pub enable_pharmacogenomics: bool,
}

/// Analysis stage definition
#[derive(Debug, Clone)]
pub struct PipelineStage {
    pub name: String,
    pub dependencies: Vec<String>,
    pub timeout: Duration,
    pub retries: usize,
}

/// Pipeline execution state
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PipelineState {
    Idle,
    Running,
    Completed,
    Failed,
}

/// Complete analysis result
#[derive(Debug, Clone)]
pub struct AnalysisResult {
    pub sequence_stats: SequenceStats,
    pub variants: Vec<Variant>,
    pub protein_structures: Vec<ProteinGraph>,
    pub methylation_profile: Option<MethylationProfile>,
    pub drug_responses: Vec<DrugResponse>,
    pub execution_time: Duration,
}

#[derive(Debug, Clone)]
pub struct SequenceStats {
    pub total_length: usize,
    pub gc_content: f64,
    pub n_count: usize,
    pub quality_mean: f64,
}

Domain Events

pub enum PipelineEvent {
    PipelineStarted {
        pipeline_id: String,
        stages: Vec<String>,
        timestamp: DateTime<Utc>,
    },
    StageCompleted {
        pipeline_id: String,
        stage: String,
        duration_ms: u64,
        timestamp: DateTime<Utc>,
    },
    PipelineCompleted {
        pipeline_id: String,
        total_duration_ms: u64,
        timestamp: DateTime<Utc>,
    },
    PipelineFailed {
        pipeline_id: String,
        stage: String,
        error: String,
        timestamp: DateTime<Utc>,
    },
}

Business Rules Summary

Cross-Cutting Rules

  1. Quality Thresholds: All data must meet minimum quality scores
  2. Validation: Input data validated at bounded context entry points
  3. Traceability: All results traceable to source data and parameters
  4. Consistency: Aggregates maintain internal consistency invariants

Context-Specific Rules

Sequence Context:

  • K-mer indices use canonical (lexicographically minimal) representation
  • MinHash signatures maintain cardinality for accurate similarity

Alignment Context:

  • Gap penalties never exceed match bonuses
  • Motif matches require minimum conservation score

Variant Context:

  • Variants only called above quality and coverage thresholds
  • Population frequencies sum to 1.0 across all samples
  • Clinical significance based on ClinVar/evidence database

Protein Context:

  • Contacts only between residues separated by ≥4 positions
  • Secondary structure assignments mutually exclusive

Epigenomic Context:

  • Beta values strictly bounded [0.0, 1.0]
  • Epigenetic age non-negative

Pharmacogenomic Context:

  • Diplotypes sorted in canonical order
  • Phenotypes deterministically derived from diplotype activity scores
  • CPIC recommendations follow evidence-based guidelines

Pipeline Context:

  • Stage execution respects dependency DAG
  • Checkpoints enable recovery from failures
  • Configuration immutable during pipeline run

Aggregate Invariants

Each aggregate root enforces these invariants:

  1. Identity: Unique identifier within bounded context
  2. Completeness: All required fields populated
  3. Consistency: Related entities maintain referential integrity
  4. Validity: All values within acceptable ranges
  5. Atomicity: Changes commit or rollback as unit

These invariants ensure domain model correctness across all bounded contexts.