Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
examples/dna/ddd/architecture.md
# Hexagonal Architecture - Genomic Analysis Platform

## Overview

The DNA analyzer follows a hexagonal (ports and adapters) architecture so that domain logic stays independent of infrastructure concerns. The core domain is plain Rust with no infrastructure dependencies, while adapters connect it to ruvector components.

## Hexagonal Architecture Diagram

```
┌────────────────────────────────────────────────────────────────────┐
│                      PRIMARY ACTORS (Inbound)                      │
│     ┌───────────────┐   ┌───────────────┐   ┌───────────────┐      │
│     │  CLI Client   │   │   REST API    │   │    Web UI     │      │
│     └───────┬───────┘   └───────┬───────┘   └───────┬───────┘      │
│             │                   │                   │              │
└─────────────┼───────────────────┼───────────────────┼──────────────┘
              │                   │                   │
              ▼                   ▼                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                      PRIMARY PORTS (Inbound)                       │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ PipelinePort trait                                           │  │
│  │ - run_analysis(input: SequenceData) -> Result                │  │
│  │ - get_status() -> PipelineStatus                             │  │
│  │ - get_results() -> AnalysisResult                            │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                         CORE DOMAIN (Pure)                         │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ Domain Model (types.rs, error.rs)                            │  │
│  │ - GenomicPosition, QualityScore, Nucleotide                  │  │
│  │ - No infrastructure dependencies                             │  │
│  │ - Pure business logic                                        │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ Domain Services (7 Bounded Contexts)                         │  │
│  │  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐   │  │
│  │  │   Sequence    │   │   Alignment   │   │    Variant    │   │  │
│  │  │   (kmer.rs)   │   │  (align.rs)   │   │ (variant.rs)  │   │  │
│  │  └───────────────┘   └───────────────┘   └───────────────┘   │  │
│  │  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐   │  │
│  │  │    Protein    │   │  Epigenomic   │   │    Pharma     │   │  │
│  │  │ (protein.rs)  │   │  (epigen.rs)  │   │  (pharma.rs)  │   │  │
│  │  └───────────────┘   └───────────────┘   └───────────────┘   │  │
│  │                                                              │  │
│  │  ┌────────────────────────────────────────────────────────┐  │  │
│  │  │ Pipeline Orchestrator (pipeline.rs)                    │  │  │
│  │  │ - Coordinates all contexts                             │  │  │
│  │  │ - Manages workflow execution                           │  │  │
│  │  └────────────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                     SECONDARY PORTS (Outbound)                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ VectorStoragePort trait                                      │  │
│  │ - store_embedding(key, vec) -> Result                        │  │
│  │ - search_similar(query, k) -> Vec<Match>                     │  │
│  └──────────────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ AttentionPort trait                                          │  │
│  │ - compute_attention(Q, K, V) -> Tensor                       │  │
│  │ - flash_attention(Q, K, V) -> Tensor                         │  │
│  └──────────────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ GraphNeuralPort trait                                        │  │
│  │ - gnn_inference(graph) -> Predictions                        │  │
│  │ - graph_search(query) -> Vec<Node>                           │  │
│  └──────────────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ PersistencePort trait                                        │  │
│  │ - save(data) -> Result                                       │  │
│  │ - load(id) -> Result<Data>                                   │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                   SECONDARY ADAPTERS (Outbound)                    │
│     ┌───────────────┐   ┌───────────────┐   ┌───────────────┐      │
│     │   RuVector    │   │   RuVector    │   │   RuVector    │      │
│     │     Core      │   │   Attention   │   │      GNN      │      │
│     │    (HNSW)     │   │    (Flash)    │   │    (Graph)    │      │
│     └───────────────┘   └───────────────┘   └───────────────┘      │
│     ┌───────────────┐   ┌───────────────┐   ┌───────────────┐      │
│     │    SQLite     │   │  PostgreSQL   │   │     File      │      │
│     │    Adapter    │   │    Adapter    │   │    System     │      │
│     └───────────────┘   └───────────────┘   └───────────────┘      │
└────────────────────────────────────────────────────────────────────┘

DEPENDENCY RULE: Dependencies point INWARD
  Core Domain ← Secondary Ports ← Secondary Adapters
  Core Domain ← Primary Ports ← Primary Adapters
```

## Layer Definitions

### 1. Core Domain Layer

**Location**: `/src/types.rs`, `/src/error.rs`

**Characteristics**:
- Minimal dependencies: `std`, plus `thiserror` used only to derive error types
- Pure business logic
- No knowledge of infrastructure
- Immutable value objects
- Rich domain model

**Example Types**:

```rust
// types.rs - Pure domain types
use crate::error::DomainError;

/// A 1-based position on a named chromosome.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

impl GenomicPosition {
    pub fn new(chromosome: String, position: usize) -> Result<Self, DomainError> {
        // Coordinates are 1-based, so position 0 is rejected.
        if position == 0 {
            return Err(DomainError::InvalidPosition);
        }
        Ok(Self { chromosome, position })
    }
}

/// Phred-scaled quality score: Q = -10 * log10(P_error).
#[derive(Debug, Clone, Copy)]
pub struct QualityScore(pub f64);

impl QualityScore {
    pub fn from_phred(score: f64) -> Result<Self, DomainError> {
        // Reject NaN and infinities as well as negative scores.
        if !score.is_finite() || score < 0.0 {
            return Err(DomainError::InvalidQuality);
        }
        Ok(Self(score))
    }

    /// Probability that the base call is wrong: P = 10^(-Q/10).
    pub fn error_probability(&self) -> f64 {
        10_f64.powf(-self.0 / 10.0)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nucleotide {
    A,
    C,
    G,
    T,
}

impl Nucleotide {
    /// Watson-Crick complement (A <-> T, C <-> G).
    pub fn complement(self) -> Self {
        match self {
            Nucleotide::A => Nucleotide::T,
            Nucleotide::T => Nucleotide::A,
            Nucleotide::C => Nucleotide::G,
            Nucleotide::G => Nucleotide::C,
        }
    }
}

// error.rs - Domain errors
#[derive(Debug, thiserror::Error)]
pub enum DomainError {
    #[error("Invalid genomic position")]
    InvalidPosition,

    #[error("Invalid quality score")]
    InvalidQuality,

    #[error("Invalid sequence: {0}")]
    InvalidSequence(String),

    // Variants used by the Sequence Context (kmer.rs)
    #[error("k-mer size must be between 3 and 32")]
    InvalidKmerSize,

    #[error("k-mer length does not match the encoder's k")]
    InvalidKmerLength,

    #[error("Invalid nucleotide (expected A, C, G or T)")]
    InvalidNucleotide,
}
```

### 2. Domain Services Layer

**Location**: 7 bounded context modules

**Characteristics**:
- Implements business logic using domain types
- Depends on ports (traits), not implementations
- Orchestrates domain operations
- No infrastructure code

**Example Services**:

```rust
// kmer.rs - Sequence Context service
use crate::error::DomainError;

pub struct KmerEncoder {
    k: usize,
    alphabet_size: usize,
}

impl KmerEncoder {
    pub fn new(k: usize) -> Result<Self, DomainError> {
        // 2 bits per base: k <= 32 keeps every code within a u64.
        if !(3..=32).contains(&k) {
            return Err(DomainError::InvalidKmerSize);
        }
        Ok(Self { k, alphabet_size: 4 })
    }

    // Pure domain logic - no infrastructure
    pub fn encode(&self, kmer: &[u8]) -> Result<u64, DomainError> {
        if kmer.len() != self.k {
            return Err(DomainError::InvalidKmerLength);
        }

        // Base-4 number with the first base in the most significant digit.
        let mut hash = 0u64;
        for &base in kmer {
            let encoded = match base {
                b'A' | b'a' => 0,
                b'C' | b'c' => 1,
                b'G' | b'g' => 2,
                b'T' | b't' => 3,
                _ => return Err(DomainError::InvalidNucleotide),
            };
            hash = hash * self.alphabet_size as u64 + encoded;
        }
        Ok(hash)
    }
}

// variant.rs - Variant Context service (depends on ports)
use std::sync::Arc;

pub struct VariantCaller<G: GraphNeuralPort> {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc<G>, // Port dependency
}

impl<G: GraphNeuralPort> VariantCaller<G> {
    pub fn new(min_quality: f64, min_depth: usize, gnn_service: Arc<G>) -> Self {
        Self { min_quality, min_depth, gnn_service }
    }

    pub fn call_variants(
        &self,
        alignments: &[Alignment],
    ) -> Result<Vec<Variant>, DomainError> {
        // Business logic using port abstraction (identify_candidates not shown)
        let candidates = self.identify_candidates(alignments)?;

        // Use GNN port for variant classification.
        // Assumes `impl From<Error> for DomainError` so `?` converts port errors.
        let predictions = self.gnn_service.classify_variants(candidates)?;

        // Apply business rules
        Ok(predictions
            .into_iter()
            .filter(|v| v.quality >= self.min_quality && v.depth >= self.min_depth)
            .collect())
    }
}
```

### 3. Primary Ports (Inbound)

**Location**: `pipeline.rs` trait definitions

**Characteristics**:
- Define application API
- Trait-based contracts
- Technology-agnostic
- Used by primary adapters (CLI, API, UI)

**Example Ports**:

```rust
// Primary port for pipeline orchestration
pub trait PipelinePort {
    fn run_analysis(&mut self, input: SequenceData) -> Result<AnalysisResult, Error>;
    fn get_status(&self) -> PipelineStatus;
    fn get_results(&self) -> Option<&AnalysisResult>;
    fn checkpoint(&self) -> Result<String, Error>;
    fn restore(&mut self, checkpoint_id: &str) -> Result<(), Error>;
}

// Primary port for variant analysis
pub trait VariantAnalysisPort {
    fn call_variants(&self, sequence: &[u8], reference: &[u8])
        -> Result<Vec<Variant>, Error>;
    fn annotate_variant(&self, variant: &Variant)
        -> Result<Annotation, Error>;
}

// Primary port for pharmacogenomics
pub trait PharmacogenomicsPort {
    fn analyze_drug_response(&self, variants: &[Variant])
        -> Result<Vec<DrugResponse>, Error>;
    fn get_recommendations(&self, drug: &str, diplotype: &Diplotype)
        -> Result<ClinicalRecommendation, Error>;
}
```

### 4. Secondary Ports (Outbound)

**Location**: Trait definitions in each bounded context module

**Characteristics**:
- Define infrastructure abstractions
- Implemented by secondary adapters
- Enable dependency inversion
- Mock-friendly for testing

**Example Ports**:

```rust
// Port for vector storage (HNSW)
pub trait VectorStoragePort: Send + Sync {
    fn store_embedding(&self, key: String, embedding: Vec<f32>)
        -> Result<(), Error>;

    fn search_similar(&self, query: Vec<f32>, k: usize)
        -> Result<Vec<SimilarityMatch>, Error>;

    fn delete_embedding(&self, key: &str) -> Result<(), Error>;
}

#[derive(Debug, Clone)]
pub struct SimilarityMatch {
    pub key: String,
    pub similarity: f64, // higher means more similar
    pub metadata: Option<String>,
}

// Port for attention mechanisms
pub trait AttentionPort: Send + Sync {
    fn compute_attention(
        &self,
        query: &[f32],
        keys: &[Vec<f32>],
        values: &[Vec<f32>],
    ) -> Result<Vec<f32>, Error>;

    fn flash_attention(
        &self,
        query: &[f32],
        keys: &[Vec<f32>],
        values: &[Vec<f32>],
    ) -> Result<Vec<f32>, Error>;
}

// Port for graph neural networks
pub trait GraphNeuralPort: Send + Sync {
    fn gnn_inference(&self, graph: &Graph) -> Result<Vec<Prediction>, Error>;

    fn graph_search(&self, query_node: Node, k: usize)
        -> Result<Vec<Node>, Error>;

    fn classify_variants(&self, candidates: Vec<VariantCandidate>)
        -> Result<Vec<Variant>, Error>;
}

#[derive(Debug, Clone)]
pub struct Graph {
    pub nodes: Vec<Node>,
    pub edges: Vec<(usize, usize, f64)>, // (source index, target index, weight)
}

// Port for persistence
pub trait PersistencePort: Send + Sync {
    fn save_results(&self, results: &AnalysisResult) -> Result<String, Error>;
    fn load_results(&self, id: &str) -> Result<AnalysisResult, Error>;
    fn save_checkpoint(&self, pipeline: &GenomicPipeline) -> Result<String, Error>;
    fn load_checkpoint(&self, id: &str) -> Result<GenomicPipeline, Error>;
}
```

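The ports leave the math behind `compute_attention` unspecified. Assuming it is standard scaled dot-product attention for a single query, a dependency-free reference version might look like the sketch below. The function name and the `String` error type are illustrative and not part of ruvector; a plain implementation like this could also back a `MockAttention`-style test double.

```rust
/// Reference single-query scaled dot-product attention (illustrative sketch):
/// weight_i = softmax(q · k_i / sqrt(d)), output = Σ_i weight_i * v_i.
pub fn scaled_dot_product_attention(
    query: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
) -> Result<Vec<f32>, String> {
    if query.is_empty() || keys.is_empty() || keys.len() != values.len() {
        return Err("need a non-empty query and equal, non-zero numbers of keys and values".to_string());
    }
    if keys.iter().any(|k| k.len() != query.len()) {
        return Err("every key must have the query's dimension".to_string());
    }
    let value_dim = values[0].len();
    if values.iter().any(|v| v.len() != value_dim) {
        return Err("all values must have the same dimension".to_string());
    }

    // Raw scores: dot product divided by sqrt(d).
    let scale = (query.len() as f32).sqrt();
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect();

    // Softmax, subtracting the max score first so exp() cannot overflow.
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Weighted sum of the value vectors.
    let mut output = vec![0.0f32; value_dim];
    for (e, v) in exps.iter().zip(values) {
        let weight = e / total;
        for (o, x) in output.iter_mut().zip(v) {
            *o += weight * x;
        }
    }
    Ok(output)
}
```

With a single key the output is that key's value vector; with identical keys it is the plain average of the values. A flash-attention implementation should produce the same result up to floating-point error, differing only in how it tiles the computation.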
### 5. Primary Adapters (Inbound)

**Location**: Binary crates or API modules

**Characteristics**:
- Convert external requests to domain calls
- Implement framework-specific code
- Handle serialization/deserialization
- Map errors to appropriate responses

**Example Adapters**:

```rust
// CLI adapter
pub struct CliAdapter {
    pipeline: Box<dyn PipelinePort>,
}

impl CliAdapter {
    pub fn run(&mut self, args: CliArgs) -> Result<(), Error> {
        // Convert CLI args to domain input. The file is read verbatim
        // (FASTA headers are not stripped here); assumes
        // `impl From<std::io::Error> for Error`.
        let input = SequenceData {
            sequence: std::fs::read_to_string(&args.input)?,
            quality: None,
        };

        // Call domain through port
        let result = self.pipeline.run_analysis(input)?;

        // Format output for CLI
        self.print_results(&result);
        Ok(())
    }
}

// REST API adapter (hypothetical: `Request` and `Response` stand in for
// the chosen HTTP framework's types)
pub struct RestApiAdapter {
    // `run_analysis` takes `&mut self`, so a shared handler needs a lock.
    pipeline: std::sync::Mutex<Box<dyn PipelinePort + Send>>,
}

impl RestApiAdapter {
    pub async fn analyze_handler(&self, req: Request) -> Response {
        // Parse JSON request
        let input: SequenceData = match serde_json::from_slice(req.body()) {
            Ok(data) => data,
            Err(e) => return Response::error(400, e.to_string()),
        };

        // Call domain (no `.await` happens while the lock is held)
        let mut pipeline = match self.pipeline.lock() {
            Ok(guard) => guard,
            Err(_) => return Response::error(500, "pipeline lock poisoned".to_string()),
        };
        match pipeline.run_analysis(input) {
            Ok(result) => match serde_json::to_string(&result) {
                Ok(body) => Response::ok(body),
                Err(e) => Response::error(500, e.to_string()),
            },
            Err(e) => Response::error(500, e.to_string()),
        }
    }
}
```

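`CliAdapter::run` hands the whole input file to the domain as `sequence`, so a FASTA header line would end up inside the sequence. A minimal FASTA reader that an inbound adapter could apply before building `SequenceData` might look like the following; `FastaRecord` and `parse_fasta` are hypothetical names, and the sketch ignores FASTA details such as legacy `;` comment lines.

```rust
/// One FASTA record: the header text after `>` and the concatenated sequence.
#[derive(Debug, Clone, PartialEq)]
pub struct FastaRecord {
    pub header: String,
    pub sequence: String,
}

/// Minimal FASTA reader: a line starting with `>` opens a new record and the
/// following lines are appended (upper-cased) to its sequence. Blank lines are
/// skipped; sequence data before the first header is rejected.
pub fn parse_fasta(text: &str) -> Result<Vec<FastaRecord>, String> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for (line_no, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            records.push(FastaRecord {
                header: header.trim().to_string(),
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            current.sequence.push_str(&line.to_ascii_uppercase());
        } else {
            return Err(format!(
                "line {}: sequence data before the first '>' header",
                line_no + 1
            ));
        }
    }
    Ok(records)
}
```

An adapter could then build one `SequenceData` per record from `record.sequence`, keeping file-format parsing on the adapter side of the port.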
### 6. Secondary Adapters (Outbound)

**Location**: Infrastructure modules or separate crates

**Characteristics**:
- Implement secondary ports
- Integrate with external libraries (ruvector)
- Handle technical concerns (networking, storage, etc.)
- Isolate infrastructure code

**Example Adapters**:

```rust
use std::sync::Arc;

// RuVector HNSW adapter
pub struct RuVectorAdapter {
    db: Arc<AgentDB>,
}

impl VectorStoragePort for RuVectorAdapter {
    fn store_embedding(&self, key: String, embedding: Vec<f32>)
        -> Result<(), Error>
    {
        self.db.store(&key, &embedding)
            .map_err(|e| Error::StorageError(e.to_string()))
    }

    fn search_similar(&self, query: Vec<f32>, k: usize)
        -> Result<Vec<SimilarityMatch>, Error>
    {
        let results = self.db.search(&query, k)
            .map_err(|e| Error::SearchError(e.to_string()))?;

        Ok(results.into_iter().map(|r| SimilarityMatch {
            key: r.key,
            // The backend reports a distance (lower = closer) while the port
            // promises a similarity (higher = closer). Assuming cosine
            // distance, similarity = 1 - distance.
            similarity: 1.0 - f64::from(r.distance),
            metadata: r.metadata,
        }).collect())
    }

    fn delete_embedding(&self, key: &str) -> Result<(), Error> {
        self.db.delete(key)
            .map_err(|e| Error::StorageError(e.to_string()))
    }
}

// RuVector Attention adapter
pub struct RuVectorAttentionAdapter {
    attention_service: Arc<AttentionService>,
}

impl AttentionPort for RuVectorAttentionAdapter {
    fn compute_attention(
        &self,
        query: &[f32],
        keys: &[Vec<f32>],
        values: &[Vec<f32>],
    ) -> Result<Vec<f32>, Error> {
        // Convert to ruvector tensor format
        let q_tensor = Tensor::from_slice(query);
        let k_tensor = Tensor::from_matrix(keys);
        let v_tensor = Tensor::from_matrix(values);

        // Call ruvector attention
        let output = self.attention_service
            .scaled_dot_product(&q_tensor, &k_tensor, &v_tensor)
            .map_err(|e| Error::AttentionError(e.to_string()))?;

        // Convert back to Vec<f32>
        Ok(output.to_vec())
    }

    fn flash_attention(
        &self,
        query: &[f32],
        keys: &[Vec<f32>],
        values: &[Vec<f32>],
    ) -> Result<Vec<f32>, Error> {
        // Use ruvector flash attention for efficiency
        let q_tensor = Tensor::from_slice(query);
        let k_tensor = Tensor::from_matrix(keys);
        let v_tensor = Tensor::from_matrix(values);

        let output = self.attention_service
            .flash_attention(&q_tensor, &k_tensor, &v_tensor)
            .map_err(|e| Error::AttentionError(e.to_string()))?;

        Ok(output.to_vec())
    }
}

// RuVector GNN adapter
pub struct RuVectorGnnAdapter {
    gnn_service: Arc<GnnService>,
}

impl GraphNeuralPort for RuVectorGnnAdapter {
    fn gnn_inference(&self, graph: &Graph) -> Result<Vec<Prediction>, Error> {
        // Convert domain graph to ruvector format (edge weights are dropped)
        let nodes: Vec<Vec<f32>> = graph.nodes.iter()
            .map(|n| n.features.clone())
            .collect();

        let edges: Vec<(usize, usize)> = graph.edges.iter()
            .map(|(i, j, _)| (*i, *j))
            .collect();

        // Call ruvector GNN
        let predictions = self.gnn_service
            .predict(&nodes, &edges)
            .map_err(|e| Error::GnnError(e.to_string()))?;

        Ok(predictions)
    }

    fn classify_variants(&self, candidates: Vec<VariantCandidate>)
        -> Result<Vec<Variant>, Error>
    {
        // Build graph from variant candidates (one node per candidate)
        let graph = self.build_variant_graph(&candidates);

        // Use GNN to classify
        let predictions = self.gnn_inference(&graph)?;

        // Convert predictions back to variants; `to_variant` is assumed to
        // return Result<Variant, Error>, so collect() yields Result<Vec<_>, _>
        candidates.into_iter()
            .zip(predictions)
            .filter(|(_, pred)| pred.confidence > 0.8)
            .map(|(cand, pred)| self.to_variant(cand, pred))
            .collect()
    }

    // graph_search omitted for brevity (required for this impl to compile)
}

// File system persistence adapter
use std::path::PathBuf;
use uuid::Uuid;

pub struct FileSystemAdapter {
    output_dir: PathBuf,
}

impl PersistencePort for FileSystemAdapter {
    fn save_results(&self, results: &AnalysisResult) -> Result<String, Error> {
        let id = Uuid::new_v4().to_string();
        let path = self.output_dir.join(format!("{}.json", id));

        let json = serde_json::to_string_pretty(results)
            .map_err(|e| Error::SerializationError(e.to_string()))?;

        std::fs::write(&path, json)
            .map_err(|e| Error::IoError(e.to_string()))?;

        Ok(id)
    }

    fn load_results(&self, id: &str) -> Result<AnalysisResult, Error> {
        let path = self.output_dir.join(format!("{}.json", id));

        let json = std::fs::read_to_string(&path)
            .map_err(|e| Error::IoError(e.to_string()))?;

        serde_json::from_str(&json)
            .map_err(|e| Error::DeserializationError(e.to_string()))
    }

    // save_checkpoint / load_checkpoint omitted for brevity
    // (required for this impl to compile)
}
```

## Dependency Injection

**Construction at Application Startup**:

```rust
// main.rs or application initialization
use std::sync::Arc;

pub fn build_pipeline() -> Result<impl PipelinePort, Error> {
    // Create secondary adapters (infrastructure)
    // (vector_store is not passed to any service in this excerpt)
    let vector_store = Arc::new(RuVectorAdapter::new()?);
    let attention = Arc::new(RuVectorAttentionAdapter::new()?);
    let gnn = Arc::new(RuVectorGnnAdapter::new()?);
    let persistence = Arc::new(FileSystemAdapter::new("./output")?);

    // Create domain services with port dependencies
    let kmer_encoder = KmerEncoder::new(21)?;

    let aligner = AttentionAligner::new(
        attention.clone(),
        -1.0, // gap penalty (must be negative)
        2.0,  // match bonus (must be positive)
    );

    let variant_caller = VariantCaller::new(
        30.0, // min quality (Phred)
        10,   // min depth
        gnn.clone(),
    );

    let protein_predictor = ContactPredictor::new(
        gnn.clone(),
        attention.clone(),
        8.0, // distance threshold
    );

    // Create pipeline (aggregates all services)
    let pipeline = GenomicPipeline::new(
        kmer_encoder,
        aligner,
        variant_caller,
        protein_predictor,
        persistence,
    )?;

    Ok(pipeline)
}
```

## Testing Strategy by Layer

### 1. Core Domain Testing

**Strategy**: Pure unit tests, no mocks needed

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_nucleotide_complement() {
        assert_eq!(Nucleotide::A.complement(), Nucleotide::T);
        assert_eq!(Nucleotide::G.complement(), Nucleotide::C);
    }

    #[test]
    fn test_quality_score_error_probability() {
        // Q30 corresponds to a 1-in-1000 error probability
        let q30 = QualityScore::from_phred(30.0).unwrap();
        assert!((q30.error_probability() - 0.001).abs() < 1e-6);
    }

    #[test]
    fn test_genomic_position_validation() {
        let valid = GenomicPosition::new("chr1".to_string(), 1000);
        assert!(valid.is_ok());

        let invalid = GenomicPosition::new("chr1".to_string(), 0);
        assert!(invalid.is_err());
    }
}
```

### 2. Domain Service Testing

**Strategy**: Use mock implementations of ports

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use mockall::mock;
    use std::sync::Arc;

    // Mock GNN port. `mock!` must declare every method of the trait,
    // including the ones this test never calls.
    mock! {
        GnnService {}

        impl GraphNeuralPort for GnnService {
            fn gnn_inference(&self, graph: &Graph) -> Result<Vec<Prediction>, Error>;
            fn graph_search(&self, query_node: Node, k: usize) -> Result<Vec<Node>, Error>;
            fn classify_variants(&self, candidates: Vec<VariantCandidate>)
                -> Result<Vec<Variant>, Error>;
        }
    }

    #[test]
    fn test_variant_caller_filters_low_quality() {
        // Setup mock (assumes `Variant: Default`)
        let mut mock_gnn = MockGnnService::new();
        mock_gnn.expect_classify_variants()
            .returning(|_| Ok(vec![
                Variant { quality: 35.0, depth: 15, ..Default::default() },
                Variant { quality: 20.0, depth: 15, ..Default::default() }, // Below threshold
            ]));

        // Test service; the mock ignores its input, so no alignments are needed
        let alignments: Vec<Alignment> = Vec::new();
        let caller = VariantCaller::new(30.0, 10, Arc::new(mock_gnn));
        let results = caller.call_variants(&alignments).unwrap();

        // Only high-quality variant should pass
        assert_eq!(results.len(), 1);
        assert_eq!(results[0].quality, 35.0);
    }
}
```

### 3. Adapter Testing

**Strategy**: Integration tests with real infrastructure or test doubles

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ruvector_adapter_roundtrip() {
        // Use in-memory ruvector instance
        let adapter = RuVectorAdapter::new_in_memory().unwrap();

        // Store embedding
        let embedding = vec![0.1, 0.2, 0.3, 0.4];
        adapter.store_embedding("test_key".to_string(), embedding.clone()).unwrap();

        // Search should find it
        let results = adapter.search_similar(embedding, 1).unwrap();

        assert_eq!(results.len(), 1);
        assert_eq!(results[0].key, "test_key");
        assert!(results[0].similarity > 0.99);
    }
}
```

### 4. End-to-End Testing

**Strategy**: Full pipeline with real or test infrastructure

```rust
#[cfg(test)]
mod integration_tests {
    use super::*;

    #[test]
    fn test_full_pipeline() {
        // Build pipeline with real adapters (`run_analysis` needs `&mut self`)
        let mut pipeline = build_pipeline().unwrap();

        // Load test data
        let input = SequenceData {
            sequence: include_str!("../test_data/sample.fasta").to_string(),
            quality: None,
        };

        // Run analysis
        let result = pipeline.run_analysis(input).unwrap();

        // Verify results
        assert!(!result.variants.is_empty());
        assert!(!result.protein_structures.is_empty());
    }
}
```

## Benefits of Hexagonal Architecture

### 1. Testability
- Domain logic testable without infrastructure
- Ports enable easy mocking
- Fast unit tests (no I/O)

### 2. Maintainability
- Clear separation of concerns
- Changes to infrastructure don't affect domain
- Easy to understand dependencies

### 3. Flexibility
- Swap implementations without changing domain
- Support multiple adapters (CLI, API, UI)
- Easy to add new infrastructure

### 4. Domain Focus
- Business logic remains pure
- Rich domain model
- Ubiquitous language preserved

## Adapter Implementation Matrix

| Port | RuVector Adapter | Alternative Adapter | Test Adapter |
|------|------------------|---------------------|--------------|
| VectorStoragePort | RuVectorAdapter (HNSW) | PostgreSQL pgvector | InMemoryVectorStore |
| AttentionPort | RuVectorAttentionAdapter | PyTorch bindings | MockAttention |
| GraphNeuralPort | RuVectorGnnAdapter | DGL bindings | MockGNN |
| PersistencePort | FileSystemAdapter | PostgreSQL | InMemoryPersistence |

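The matrix lists `InMemoryVectorStore` as the test adapter for `VectorStoragePort`. A minimal sketch of such an adapter follows, assuming brute-force cosine similarity. To stay self-contained it redeclares a simplified copy of the port whose errors are plain `String`s rather than the crate's `Error` type; `RwLock` supplies the interior mutability that the `&self` methods need while keeping the store `Send + Sync`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, PartialEq)]
pub struct SimilarityMatch {
    pub key: String,
    pub similarity: f64,
    pub metadata: Option<String>,
}

/// Simplified copy of the port for this sketch (String errors).
pub trait VectorStoragePort: Send + Sync {
    fn store_embedding(&self, key: String, embedding: Vec<f32>) -> Result<(), String>;
    fn search_similar(&self, query: Vec<f32>, k: usize) -> Result<Vec<SimilarityMatch>, String>;
    fn delete_embedding(&self, key: &str) -> Result<(), String>;
}

/// Test adapter: exhaustive cosine-similarity search over a HashMap.
#[derive(Default)]
pub struct InMemoryVectorStore {
    vectors: RwLock<HashMap<String, Vec<f32>>>,
}

impl InMemoryVectorStore {
    pub fn new() -> Self {
        Self::default()
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
    let na = a.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl VectorStoragePort for InMemoryVectorStore {
    fn store_embedding(&self, key: String, embedding: Vec<f32>) -> Result<(), String> {
        self.vectors.write().map_err(|e| e.to_string())?.insert(key, embedding);
        Ok(())
    }

    fn search_similar(&self, query: Vec<f32>, k: usize) -> Result<Vec<SimilarityMatch>, String> {
        let vectors = self.vectors.read().map_err(|e| e.to_string())?;
        let mut matches: Vec<SimilarityMatch> = vectors
            .iter()
            .filter(|(_, v)| v.len() == query.len()) // skip mismatched dimensions
            .map(|(key, v)| SimilarityMatch {
                key: key.clone(),
                similarity: cosine(&query, v),
                metadata: None,
            })
            .collect();
        // Highest similarity first; total_cmp gives a total order over f64.
        matches.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
        matches.truncate(k);
        Ok(matches)
    }

    fn delete_embedding(&self, key: &str) -> Result<(), String> {
        self.vectors.write().map_err(|e| e.to_string())?.remove(key);
        Ok(())
    }
}
```

Exhaustive search is O(n) per query, which is acceptable for tests and keeps results exact, so it can also serve as a ground truth when checking an approximate HNSW adapter.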
## Configuration Management

```rust
// Configuration for adapter selection
use std::path::PathBuf;

pub struct AdapterConfig {
    pub vector_backend: VectorBackend,
    pub persistence_backend: PersistenceBackend,
    pub enable_flash_attention: bool,
}

pub enum VectorBackend {
    RuVector,
    // pgvector needs a connection string, mirroring PersistenceBackend::PostgreSQL
    PgVector { connection_string: String },
    InMemory,
}

pub enum PersistenceBackend {
    FileSystem { path: PathBuf },
    PostgreSQL { connection_string: String },
    InMemory,
}

// Factory for building adapters
pub struct AdapterFactory;

impl AdapterFactory {
    pub fn build_vector_storage(config: &AdapterConfig)
        -> Result<Box<dyn VectorStoragePort>, Error>
    {
        // Match on a reference so variant data is borrowed, not moved out of `config`
        match &config.vector_backend {
            VectorBackend::RuVector => {
                Ok(Box::new(RuVectorAdapter::new()?))
            }
            VectorBackend::PgVector { connection_string } => {
                Ok(Box::new(PgVectorAdapter::new(connection_string)?))
            }
            VectorBackend::InMemory => {
                Ok(Box::new(InMemoryVectorStore::new()))
            }
        }
    }

    pub fn build_persistence(config: &AdapterConfig)
        -> Result<Box<dyn PersistencePort>, Error>
    {
        match &config.persistence_backend {
            PersistenceBackend::FileSystem { path } => {
                Ok(Box::new(FileSystemAdapter::new(path)?))
            }
            PersistenceBackend::PostgreSQL { connection_string } => {
                Ok(Box::new(PostgresAdapter::new(connection_string)?))
            }
            PersistenceBackend::InMemory => {
                Ok(Box::new(InMemoryPersistence::new()))
            }
        }
    }
}
```

## Summary

The hexagonal architecture provides:

1. **Pure Core Domain**: Business logic independent of infrastructure (types.rs, error.rs)
2. **Domain Services**: Seven bounded contexts implementing genomic analysis
3. **Primary Ports**: Application API (pipeline.rs traits)
4. **Secondary Ports**: Infrastructure abstractions (VectorStoragePort, AttentionPort, etc.)
5. **Primary Adapters**: CLI, API, UI interfaces
6. **Secondary Adapters**: RuVector integrations (HNSW, Flash Attention, GNN)

All dependencies point inward toward the core domain, enabling testability, maintainability, and flexibility in implementation choices.

examples/dna/ddd/bounded-context-map.md
# Bounded Context Map - Genomic Analysis Platform

## Context Map Overview

```
┌───────────────────────────────────────────────────────────────────────────┐
│                         GENOMIC ANALYSIS PLATFORM                         │
└───────────────────────────────────────────────────────────────────────────┘

                        ┌──────────────────┐
                        │     Pipeline     │ ◄───────── Orchestration Layer
                        │     Context      │
                        └────────┬─────────┘
                                 │ ACL (maps domain events to pipeline commands)
                                 │
         ┌───────────────────────┴─────────────────────────────┐
         │                                                     │
         ▼                                                     ▼
┌─────────────────┐                                   ┌─────────────────┐
│    Sequence     │       Customer-Supplier           │    Alignment    │
│     Context     ├──────────────────────────────────►│     Context     │
│                 │   (provides k-mer indices)        │                 │
└────────┬────────┘                                   └────────┬────────┘
         │                                                     │
         │    Shared Kernel (GenomicPosition, QualityScore)    │
         │                                                     │
         ▼                                                     ▼
┌─────────────────┐                                   ┌─────────────────┐
│     Variant     │                                   │     Protein     │
│     Context     │◄──────────────────────────────────┤     Context     │
│                 │   Partner (variant→structure)     │                 │
└────────┬────────┘                                   └─────────────────┘
         │
         │ ACL (translates variants to epigenetic events)
         │
         ▼
┌─────────────────┐
│   Epigenomic    │
│     Context     │
└────────┬────────┘
         │
         │ Customer-Supplier (epigenetic→drug response)
         │
         ▼
┌─────────────────┐
│ Pharmacogenomic │
│     Context     │
└─────────────────┘

Legend (each link is labeled with its relationship type):
  Customer-Supplier: upstream context provides services to downstream (──►)
  Shared Kernel:     contexts share part of the domain model
  Partner:           mutual dependency between contexts
  ACL:               anti-corruption layer translates between models
```

## 1. Sequence Context

**Module**: `kmer.rs`

**Responsibility**: K-mer indexing, sequence sketching, and similarity search

**Core Aggregates**:
- `KmerIndex` - Root aggregate managing k-mer → position mappings
- `MinHashSketch` - Aggregate for approximate sequence similarity

**Key Types**:
```rust
use std::collections::HashMap;

pub struct KmerEncoder {
    k: usize,
    alphabet_size: usize,
}

pub struct KmerIndex {
    k: usize,
    index: HashMap<u64, Vec<usize>>, // k-mer hash → positions
}

pub struct MinHashSketch {
    k: usize,
    num_hashes: usize,
    signatures: Vec<u64>,
}
```

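`KmerIndex` maps k-mer hashes to positions. One way to populate it, sketched under the assumption that the hash is the same base-4 code `KmerEncoder::encode` produces (A=0, C=1, G=2, T=3), is a rolling 2-bit encoding that updates the code in O(1) per base instead of re-encoding every window. `build_kmer_index` is a hypothetical helper, and skipping windows that contain a non-ACGT base such as `N` is an assumed policy.

```rust
use std::collections::HashMap;

/// Maps an ASCII nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
fn base_code(base: u8) -> Option<u64> {
    match base {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Builds a k-mer code -> start positions index with a rolling encoding.
/// Windows containing a non-ACGT base are skipped.
pub fn build_kmer_index(seq: &[u8], k: usize) -> HashMap<u64, Vec<usize>> {
    assert!((1..=32).contains(&k), "k must fit in 64 bits (2 bits per base)");
    // The mask keeps only the low 2k bits, i.e. the most recent k bases.
    let mask: u64 = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut index: HashMap<u64, Vec<usize>> = HashMap::new();
    let mut code = 0u64;
    let mut valid = 0usize; // consecutive valid bases ending at the current one
    for (i, &b) in seq.iter().enumerate() {
        match base_code(b) {
            Some(c) => {
                // Shift the window left by one base and append the new base.
                code = ((code << 2) | c) & mask;
                valid += 1;
            }
            None => {
                valid = 0;
                code = 0;
                continue;
            }
        }
        if valid >= k {
            // Record the start position of the k-mer that ends at i.
            index.entry(code).or_default().push(i + 1 - k);
        }
    }
    index
}
```

Because the shift-and-mask step is equivalent to `hash * 4 + code` on the last k bases, each key matches what `KmerEncoder::encode` would return for the same window.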
**Published Events**:
- `SequenceIndexed { sequence_id: String, kmer_count: usize }`
- `SimilarSequenceFound { query_id: String, match_id: String, similarity: f64 }`

**Domain Language**:
- K-mer: substring of length k
- Minimizer: the smallest k-mer (under a chosen ordering) within a window of consecutive k-mers, used to subsample k-mers
- Canonical k-mer: the lexicographically smaller of a k-mer and its reverse complement
- Sketch: compressed sequence signature
- Jaccard similarity: set overlap metric, |A ∩ B| / |A ∪ B| for two k-mer sets A and B

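MinHash estimates Jaccard similarity without comparing full k-mer sets: for each of `num_hashes` hash functions, keep the minimum hash over a sequence's k-mers, then count how often two signatures agree, since two minima coincide with probability equal to the Jaccard similarity. The sketch below assumes seeded `DefaultHasher` instances as the hash family (sketching tools usually use a faster seeded hash) and hashes raw rather than canonical k-mers for brevity; the function names are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash of a k-mer under the hash function selected by `seed`.
fn seeded_hash(seed: u64, kmer: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    kmer.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each seed, the minimum hash over all k-mers.
pub fn minhash_signature(seq: &[u8], k: usize, num_hashes: usize) -> Vec<u64> {
    assert!(k > 0, "k must be positive");
    (0..num_hashes as u64)
        .map(|seed| {
            seq.windows(k)
                .map(|kmer| seeded_hash(seed, kmer))
                .min()
                .unwrap_or(u64::MAX) // sequence shorter than k
        })
        .collect()
}

/// Estimated Jaccard similarity: fraction of signature slots that agree.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "signatures must use the same hash functions");
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len() as f64
}

/// Exact Jaccard similarity of the two k-mer sets, for comparison.
pub fn exact_jaccard(a: &[u8], b: &[u8], k: usize) -> f64 {
    let sa: HashSet<&[u8]> = a.windows(k).collect();
    let sb: HashSet<&[u8]> = b.windows(k).collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        0.0
    } else {
        sa.intersection(&sb).count() as f64 / union as f64
    }
}
```

The estimate's standard error shrinks roughly as 1/sqrt(`num_hashes`), which is the trade-off the `MinHashSketch::num_hashes` field controls.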
**Invariants**:
|
||||
- K-mer length must be 3 ≤ k ≤ 32
|
||||
- MinHash signature size must be ≥ 1
|
||||
- All k-mers normalized to canonical form (min(kmer, reverse_complement))
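
The first and last invariants above fit together naturally in a 2-bit encoding. At two bits per base, a k-mer of up to 32 bases fits in a `u64`, which is one plausible reason for the upper bound on k. The following is a minimal sketch of computing the canonical form. It assumes the A=0, C=1, G=2, T=3 encoding (a common convention, not confirmed for `kmer.rs`), and `encode`, `reverse_complement`, and `canonical` are hypothetical helper names:

```rust
/// Encode a DNA k-mer at 2 bits per base (A=0, C=1, G=2, T=3).
/// Returns None for ambiguous bases (e.g. N) or when k is outside 3..=32.
fn encode(kmer: &[u8]) -> Option<u64> {
    if !(3..=32).contains(&kmer.len()) {
        return None;
    }
    kmer.iter().try_fold(0u64, |acc, &b| {
        let code = match b.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        Some((acc << 2) | code)
    })
}

/// Reverse complement in 2-bit space: read bases from the 3' end, complementing each as 3 - code.
fn reverse_complement(code: u64, k: usize) -> u64 {
    let mut rc = 0u64;
    let mut x = code;
    for _ in 0..k {
        rc = (rc << 2) | (3 - (x & 0b11));
        x >>= 2;
    }
    rc
}

/// Canonical k-mer: min(kmer, reverse_complement), so both strands share one index key.
fn canonical(kmer: &[u8]) -> Option<u64> {
    let fwd = encode(kmer)?;
    Some(fwd.min(reverse_complement(fwd, kmer.len())))
}

fn main() {
    // "ACG" and its reverse complement "CGT" map to the same canonical key.
    assert_eq!(canonical(b"ACG"), canonical(b"CGT"));
    println!("{:?}", canonical(b"ACG")); // prints Some(6)
}
```

Because complementing a base is `3 - code` under this encoding, the reverse complement is computed without ever leaving integer space.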

## 2. Alignment Context

**Module**: `alignment.rs`

**Responsibility**: Sequence alignment using attention mechanisms and motif detection

**Core Aggregates**:
- `AttentionAligner` - Root aggregate for pairwise sequence alignment
- `MotifScanner` - Aggregate for regulatory motif discovery

**Key Types**:
```rust
use std::sync::Arc;

pub struct AttentionAligner {
    attention_service: Arc<AttentionService>,
    gap_penalty: f64,
    match_bonus: f64,
}

pub struct MotifScanner {
    attention_service: Arc<AttentionService>,
    min_score: f64,
    known_motifs: Vec<MotifPattern>,
}

pub struct AlignmentResult {
    pub score: f64,
    pub aligned_query: String,
    pub aligned_target: String,
    pub attention_weights: Vec<Vec<f64>>,
}
```
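
The `gap_penalty` and `match_bonus` fields come from classical dynamic-programming alignment scoring. As a point of reference only (this is not the attention-based method `AttentionAligner` uses), the following is a minimal Needleman-Wunsch global alignment score with a linear gap penalty. The `mismatch_penalty` parameter is an assumption, since the key types above define only a gap penalty and a match bonus:

```rust
/// Global (Needleman-Wunsch) alignment score with a linear gap penalty.
/// Requires gap_penalty < 0 and match_bonus > 0, mirroring the Alignment Context invariants;
/// mismatch_penalty is an illustrative extra parameter.
fn global_alignment_score(
    query: &[u8],
    target: &[u8],
    match_bonus: f64,
    mismatch_penalty: f64,
    gap_penalty: f64,
) -> f64 {
    assert!(gap_penalty < 0.0 && match_bonus > 0.0);
    let (n, m) = (query.len(), target.len());
    // dp[i][j] = best score aligning query[..i] with target[..j]
    let mut dp = vec![vec![0.0f64; m + 1]; n + 1];
    for i in 1..=n {
        dp[i][0] = i as f64 * gap_penalty; // leading gaps in target
    }
    for j in 1..=m {
        dp[0][j] = j as f64 * gap_penalty; // leading gaps in query
    }
    for i in 1..=n {
        for j in 1..=m {
            let pair = if query[i - 1] == target[j - 1] { match_bonus } else { mismatch_penalty };
            let diag = dp[i - 1][j - 1] + pair;
            let up = dp[i - 1][j] + gap_penalty; // gap in target
            let left = dp[i][j - 1] + gap_penalty; // gap in query
            dp[i][j] = diag.max(up).max(left);
        }
    }
    dp[n][m]
}

fn main() {
    let exact = global_alignment_score(b"GATTACA", b"GATTACA", 1.0, -1.0, -2.0);
    let one_deletion = global_alignment_score(b"GATTACA", b"GATACA", 1.0, -1.0, -2.0);
    println!("exact = {exact}, one deletion = {one_deletion}");
}
```

With these example weights, a single deleted `T` turns seven matches (7.0) into six matches plus one gap (4.0). Each extra gap lowers the score, which is the behavior the gap invariant below describes.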

**Published Events**:
- `AlignmentCompleted { query_id: String, target_id: String, score: f64 }`
- `MotifDetected { sequence_id: String, motif: String, position: usize, score: f64 }`

**Domain Language**:
- Alignment: optimal mapping between two sequences
- Gap penalty: cost of insertions/deletions
- Attention weight: learned similarity between positions
- Motif: conserved sequence pattern (e.g., TATA box)
- PWM (Position Weight Matrix): motif scoring matrix
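
A position weight matrix scores a window by summing one per-position, per-base weight for each base in that window. The following is a minimal sketch of a PWM scan, assuming log2-odds weights against a uniform 0.25 background and a toy TATA-like motif invented for illustration. `MotifScanner` also uses attention, which this sketch does not model, and its `min_score` (0.0 to 1.0) would require normalizing these raw scores:

```rust
/// Column index of a base in the PWM (A, C, G, T); None for other symbols.
fn base_index(b: u8) -> Option<usize> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Convert per-position base probabilities into log2-odds against a uniform background.
fn to_log_odds(probs: &[[f64; 4]]) -> Vec<[f64; 4]> {
    probs.iter().map(|row| row.map(|p| (p / 0.25).log2())).collect()
}

/// Slide the PWM along `seq`; return (start, score) for windows scoring >= threshold.
/// Windows containing non-ACGT symbols are skipped.
fn scan(seq: &[u8], pwm: &[[f64; 4]], threshold: f64) -> Vec<(usize, f64)> {
    let w = pwm.len();
    if w == 0 || seq.len() < w {
        return Vec::new();
    }
    (0..=seq.len() - w)
        .filter_map(|start| {
            let mut score = 0.0;
            for (offset, row) in pwm.iter().enumerate() {
                score += row[base_index(seq[start + offset])?];
            }
            (score >= threshold).then_some((start, score))
        })
        .collect()
}

fn main() {
    // Toy motif favoring T, A, T, A (columns: A, C, G, T).
    let probs = [
        [0.1, 0.1, 0.1, 0.7],
        [0.7, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.7],
        [0.7, 0.1, 0.1, 0.1],
    ];
    let pwm = to_log_odds(&probs);
    for (pos, score) in scan(b"GGCTATAAGG", &pwm, 4.0) {
        println!("motif hit at {pos}, score {score:.2}");
    }
}
```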

**Invariants**:
- Gap penalty must be negative
- Match bonus must be positive
- Motif minimum score 0.0 ≤ score ≤ 1.0
- All else being equal, each additional gap lowers the alignment score (a consequence of the negative gap penalty)

**Relationship with Sequence Context**:
- **Type**: Customer-Supplier
- **Direction**: Sequence → Alignment
- **Integration**: Alignment consumes k-mer indices for fast seed-and-extend
- **Translation**: None (direct dependency)

## 3. Variant Context

**Module**: `variant.rs`

**Responsibility**: Variant calling, genotyping, and population genetics

**Core Aggregates**:
- `VariantDatabase` - Root aggregate managing variant collection
- `VariantCaller` - Service aggregate for variant detection

**Key Types**:
```rust
use std::collections::HashMap;
use std::sync::Arc;

pub struct VariantCaller {
    min_quality: f64,
    min_depth: usize,
    gnn_service: Arc<GnnService>,
}

pub struct Variant {
    pub position: GenomicPosition,
    pub reference: String,
    pub alternate: String,
    pub quality: f64,
    pub genotype: Genotype,
    pub depth: usize,
    pub allele_frequency: Option<f64>,
}

pub struct VariantDatabase {
    variants: HashMap<GenomicPosition, Variant>,
    graph_index: Option<GraphIndex>, // GNN-based variant relationships
}

pub enum Genotype {
    Homozygous(Allele),
    Heterozygous(Allele, Allele),
}
```

**Published Events**:
- `VariantCalled { position: GenomicPosition, variant: Variant }`
- `GenotypeUpdated { sample_id: String, position: GenomicPosition, genotype: Genotype }`
- `PopulationFrequencyCalculated { variant_id: String, frequency: f64 }`

**Domain Language**:
- SNP (Single Nucleotide Polymorphism): single base change
- Indel: insertion or deletion
- Genotype: allele combination (0/0, 0/1, 1/1)
- Allele frequency: frequency of the alternate allele in a population
- Quality score: confidence in variant call (Phred scale)
- Coverage depth: number of reads covering the variant position

**Invariants**:
- Quality score ≥ 0 (Phred scale)
- Coverage depth ≥ 1
- Allele frequency 0.0 ≤ AF ≤ 1.0
- Reference and alternate alleles must differ
- Genotype alleles must match available alleles
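
The numeric and allele invariants above can be enforced at construction time, so that an invalid variant never reaches `VariantDatabase`. The following is a minimal sketch that omits genotype consistency for brevity. It uses a simplified stand-in struct holding only the fields the checks need, plus a hypothetical `VariantError` type; neither is taken from `variant.rs`:

```rust
#[derive(Debug, PartialEq)]
enum VariantError {
    NegativeQuality(f64),
    ZeroDepth,
    FrequencyOutOfRange(f64),
    IdenticalAlleles,
}

/// Simplified stand-in for `Variant`, keeping only the fields the invariants touch.
#[derive(Debug)]
struct CheckedVariant {
    reference: String,
    alternate: String,
    quality: f64,
    depth: usize,
    allele_frequency: Option<f64>,
}

impl CheckedVariant {
    /// Validate every invariant before the value can exist.
    fn new(
        reference: &str,
        alternate: &str,
        quality: f64,
        depth: usize,
        allele_frequency: Option<f64>,
    ) -> Result<Self, VariantError> {
        if !(quality >= 0.0) {
            return Err(VariantError::NegativeQuality(quality)); // also rejects NaN
        }
        if depth < 1 {
            return Err(VariantError::ZeroDepth);
        }
        if let Some(af) = allele_frequency {
            if !(0.0..=1.0).contains(&af) {
                return Err(VariantError::FrequencyOutOfRange(af));
            }
        }
        if reference == alternate {
            return Err(VariantError::IdenticalAlleles);
        }
        Ok(Self {
            reference: reference.to_string(),
            alternate: alternate.to_string(),
            quality,
            depth,
            allele_frequency,
        })
    }
}

fn main() {
    println!("{:?}", CheckedVariant::new("A", "G", 30.0, 12, Some(0.4)));
    println!("{:?}", CheckedVariant::new("A", "A", 30.0, 12, None));
}
```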

**Relationship with Alignment Context**:
- **Type**: Customer-Supplier
- **Direction**: Alignment → Variant
- **Integration**: Variant caller uses alignment results to identify mismatches
- **Translation**: Alignment gaps → insertion/deletion variants

**Shared Kernel with Sequence Context**:
- `GenomicPosition { chromosome: String, position: usize }`
- `QualityScore(f64)` (Phred-scaled)
- `Nucleotide` enum (A, C, G, T)
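
`QualityScore` is Phred-scaled: Q = -10 · log10(p), where p is the estimated probability that the call is wrong. So Q = 20 means a 1% error probability and Q = 30 means 0.1%. The following is a minimal sketch of the conversion in both directions. It assumes the shared-kernel newtype `QualityScore(pub f64)`, and the two helper methods are hypothetical:

```rust
/// Phred-scaled quality, mirroring the shared kernel's `QualityScore(pub f64)`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityScore(pub f64);

impl QualityScore {
    /// Q = -10 * log10(p_error); p_error = 1 gives Q = 0, so Q >= 0 always holds.
    pub fn from_error_probability(p_error: f64) -> Self {
        assert!(p_error > 0.0 && p_error <= 1.0, "p_error must be in (0, 1]");
        QualityScore(-10.0 * p_error.log10())
    }

    /// p_error = 10^(-Q / 10)
    pub fn error_probability(self) -> f64 {
        10f64.powf(-self.0 / 10.0)
    }
}

fn main() {
    let q = QualityScore::from_error_probability(0.001);
    println!("Q = {:.1}", q.0);
    println!("p(Q20) = {:.4}", QualityScore(20.0).error_probability());
}
```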

## 4. Protein Context

**Module**: `protein.rs`

**Responsibility**: Protein structure prediction and contact map generation

**Core Aggregates**:
- `ProteinGraph` - Root aggregate representing protein as graph
- `ContactPredictor` - Service aggregate for 3D contact prediction

**Key Types**:
```rust
use std::sync::Arc;

pub struct ProteinGraph {
    pub sequence: String, // amino acid sequence
    pub nodes: Vec<AminoAcid>,
    pub edges: Vec<(usize, usize, ContactType)>,
}

pub struct ContactPredictor {
    gnn_service: Arc<GnnService>,
    attention_service: Arc<AttentionService>,
    distance_threshold: f64, // Ångströms
}

pub struct ContactPrediction {
    pub residue_i: usize,
    pub residue_j: usize,
    pub probability: f64,
    pub distance: Option<f64>,
}

pub enum ContactType {
    Backbone,
    SideChain,
    HydrogenBond,
    DisulfideBridge,
}
```

**Published Events**:
- `ProteinTranslated { gene_id: String, protein_sequence: String }`
- `StructurePredicted { protein_id: String, contact_count: usize }`
- `FoldingPathwayComputed { protein_id: String, energy: f64 }`

**Domain Language**:
- Amino acid: protein building block (20 standard types)
- Residue: amino acid position in sequence
- Contact: spatial proximity between residues (<8Å)
- Secondary structure: local folding patterns (helix, sheet, loop)
- Tertiary structure: 3D protein fold
- Contact map: matrix recording which residue pairs are in contact, typically obtained by thresholding residue-residue distances

**Invariants**:
- Sequence length ≥ 1
- Contact probability 0.0 ≤ p ≤ 1.0
- Distance threshold > 0.0 (typically 8.0Å)
- Contact pairs must be |i - j| ≥ 4 (exclude local contacts)
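
Taken together, the distance threshold and the sequence-separation rule define which residue pairs count as contacts. The following is a minimal sketch that derives contacts from known 3D coordinates. It assumes one representative atom per residue (commonly Cβ, or Cα for glycine) and uses a hypothetical `contacts_from_coords` helper. `ContactPredictor` predicts contacts from sequence, so this illustrates how reference contacts could be labeled, not how the predictor works:

```rust
/// Coordinate of one representative atom per residue, in Ångströms.
type Coord = [f64; 3];

fn distance(a: &Coord, b: &Coord) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Return (i, j, distance) for residue pairs with j - i >= min_separation
/// and distance strictly below `threshold`.
fn contacts_from_coords(
    coords: &[Coord],
    threshold: f64,
    min_separation: usize,
) -> Vec<(usize, usize, f64)> {
    assert!(threshold > 0.0, "distance threshold must be positive");
    let mut contacts = Vec::new();
    for i in 0..coords.len() {
        for j in (i + min_separation)..coords.len() {
            let d = distance(&coords[i], &coords[j]);
            if d < threshold {
                contacts.push((i, j, d));
            }
        }
    }
    contacts
}

fn main() {
    // Toy coordinates (not a realistic backbone): residue 5 lies close to residues 0 and 1.
    let coords: Vec<Coord> = vec![
        [0.0, 0.0, 0.0],
        [3.8, 0.0, 0.0],
        [7.6, 0.0, 0.0],
        [11.4, 0.0, 0.0],
        [11.4, 3.8, 0.0],
        [3.0, 4.0, 0.0],
    ];
    for (i, j, d) in contacts_from_coords(&coords, 8.0, 4) {
        println!("contact {i}-{j}: {d:.1} Å");
    }
}
```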

**Relationship with Variant Context**:
- **Type**: Partner (bidirectional)
- **Direction**: Variant ↔ Protein
- **Integration**:
  - Variant → Protein: non-synonymous coding variants change amino acids
  - Protein → Variant: structural changes inform variant pathogenicity
- **Translation**:
  - Variant ACL translates nucleotide changes to codon changes
  - Protein ACL maps structure disruption to clinical significance

## 5. Epigenomic Context

**Module**: `epigenomics.rs`

**Responsibility**: DNA methylation analysis and epigenetic age prediction

**Core Aggregates**:
- `EpigeneticIndex` - Root aggregate managing methylation sites
- `HorvathClock` - Service aggregate for epigenetic age calculation

**Key Types**:
```rust
use std::collections::HashMap;

pub struct MethylationProfile {
    pub cpg_sites: HashMap<GenomicPosition, f64>, // position → beta value
    pub total_sites: usize,
    pub mean_methylation: f64,
}

pub struct HorvathClock {
    pub coefficients: HashMap<String, f64>, // CpG site → weight
    pub intercept: f64,
}

pub struct CpGSite {
    pub position: GenomicPosition,
    pub beta_value: f64, // 0.0 (unmethylated) to 1.0 (methylated)
    pub coverage: usize,
}

pub struct EpigeneticAge {
    pub chronological_age: Option<f64>,
    pub predicted_age: f64,
    pub acceleration: f64, // predicted - chronological
}
```

**Published Events**:
- `MethylationProfileGenerated { sample_id: String, site_count: usize }`
- `EpigeneticAgeCalculated { sample_id: String, age: f64, acceleration: f64 }`
- `DifferentialMethylationDetected { region: GenomicRegion, delta_beta: f64 }`

**Domain Language**:
- CpG site: cytosine-guanine dinucleotide (methylation target)
- Beta value: methylation level (0 = unmethylated, 1 = fully methylated)
- Epigenetic clock: age predictor based on methylation
- Age acceleration: difference between epigenetic and chronological age
- DMR (Differentially Methylated Region): region with changed methylation

**Invariants**:
- Beta value 0.0 ≤ β ≤ 1.0
- Coverage ≥ 1
- Horvath coefficients and intercept must be applied together, on the scale the clock was trained on
- Age ≥ 0.0
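
In Horvath's 2013 multi-tissue clock, the intercept plus the weighted sum of beta values predicts a *transformed* age. An inverse transform then maps that value back to years; the transform is log-linear below an adult age of 20 and linear above it. The following is a minimal sketch under several assumptions: coefficients are keyed by CpG probe ID as in `HorvathClock` (with `&str` keys for brevity), the coefficient values are made up, and missing sites are simply skipped (a production implementation would need a policy for them, such as imputation). `predict_age` and `inverse_transform` are hypothetical names:

```rust
use std::collections::HashMap;

const ADULT_AGE: f64 = 20.0;

/// Inverse of Horvath's age transform: log-linear below ADULT_AGE, linear above.
fn inverse_transform(x: f64) -> f64 {
    if x < 0.0 {
        (1.0 + ADULT_AGE) * x.exp() - 1.0
    } else {
        (1.0 + ADULT_AGE) * x + ADULT_AGE
    }
}

/// Intercept plus weighted beta values gives a transformed age, mapped back to years.
/// Sites missing from `betas` are skipped in this sketch.
fn predict_age(coefficients: &HashMap<&str, f64>, intercept: f64, betas: &HashMap<&str, f64>) -> f64 {
    let transformed: f64 = intercept
        + coefficients
            .iter()
            .filter_map(|(site, w)| betas.get(site).map(|beta| w * beta))
            .sum::<f64>();
    inverse_transform(transformed)
}

fn main() {
    // Made-up probe IDs and coefficients, for illustration only.
    let coefficients = HashMap::from([("cg_site_1", 1.5), ("cg_site_2", -0.5)]);
    let betas = HashMap::from([("cg_site_1", 0.8), ("cg_site_2", 0.4)]);
    let predicted = predict_age(&coefficients, 0.0, &betas);
    let chronological = 35.0;
    // Matches EpigeneticAge::acceleration = predicted - chronological.
    println!("predicted {predicted:.1} y, acceleration {:.1} y", predicted - chronological);
}
```

The two branches of `inverse_transform` meet at x = 0 (both give 20 years), so predicted age is continuous across the adult-age breakpoint.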

**Relationship with Variant Context**:
- **Type**: Anti-Corruption Layer
- **Direction**: Variant → Epigenomic
- **Integration**: Variants in regulatory regions can affect methylation patterns
- **Translation**:
  - ACL translates genetic variants to epigenetic effects
  - Maps SNPs to methylation quantitative trait loci (mQTL)
  - Prevents variant domain concepts from leaking into the epigenetic model

## 6. Pharmacogenomic Context

**Module**: `pharma.rs`

**Responsibility**: Pharmacogenetic analysis and drug-gene interaction prediction

**Core Aggregates**:
- `DrugInteractionGraph` - Root aggregate representing drug-gene network
- `StarAlleleCaller` - Service aggregate for haplotype phasing

**Key Types**:
```rust
use std::collections::HashMap;

pub struct StarAlleleCaller {
    gene_definitions: HashMap<String, GeneDefinition>,
    min_coverage: usize,
}

pub struct StarAllele {
    pub gene: String,
    pub allele: String, // e.g., "*1", "*2", "*17"
    pub variants: Vec<Variant>,
    pub function: AlleleFunction,
}

pub enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

pub struct DrugInteractionGraph {
    pub nodes: Vec<DrugGeneNode>,
    pub edges: Vec<(usize, usize, InteractionType)>,
}

pub struct DrugResponse {
    pub drug: String,
    pub diplotype: Diplotype,
    pub phenotype: MetabolizerPhenotype,
    pub recommendation: ClinicalRecommendation,
}

pub enum MetabolizerPhenotype {
    UltraRapid,
    Rapid,
    Normal,
    Intermediate,
    Poor,
}
```

**Published Events**:
- `StarAlleleIdentified { gene: String, allele: String, diplotype: String }`
- `DrugResponsePredicted { drug: String, phenotype: MetabolizerPhenotype }`
- `InteractionDetected { drug1: String, drug2: String, severity: Severity }`

**Domain Language**:
- Star allele: named haplotype variant (e.g., CYP2D6*4)
- Diplotype: pair of haplotypes (e.g., *1/*4)
- Metabolizer phenotype: drug metabolism rate
- Pharmacogene: gene affecting drug response
- Drug-gene interaction: how genetics modulates drug efficacy/toxicity

**Invariants**:
- Diplotype must have exactly 2 alleles
- Phenotype must be derivable from the diplotype
- Coverage ≥ minimum threshold for calling
- All star allele variants must exist in variant database
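
The two-allele and phenotype-from-diplotype invariants suggest an activity-score style derivation, in the spirit of the CPIC approach used for genes such as CYP2D6. The following is a minimal sketch with illustrative activity values and thresholds; these are assumptions, not clinical reference data. It includes local copies of the `AlleleFunction` and `MetabolizerPhenotype` enums so the block stands alone:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AlleleFunction {
    Normal,
    Increased,
    Decreased,
    NoFunction,
}

#[derive(Debug, PartialEq)]
enum MetabolizerPhenotype {
    UltraRapid,
    Rapid, // not produced by this simplified mapping; used for genes such as CYP2C19
    Normal,
    Intermediate,
    Poor,
}

/// Illustrative per-allele activity values (assumed, not clinical reference data).
fn activity_value(f: AlleleFunction) -> f64 {
    match f {
        AlleleFunction::NoFunction => 0.0,
        AlleleFunction::Decreased => 0.5,
        AlleleFunction::Normal => 1.0,
        AlleleFunction::Increased => 2.0,
    }
}

/// A diplotype is exactly two alleles, so it is modeled as a pair rather than a Vec.
fn phenotype(diplotype: (AlleleFunction, AlleleFunction)) -> MetabolizerPhenotype {
    let score = activity_value(diplotype.0) + activity_value(diplotype.1);
    // Illustrative cut-offs; real guidelines define gene-specific thresholds.
    if score == 0.0 {
        MetabolizerPhenotype::Poor
    } else if score <= 1.0 {
        MetabolizerPhenotype::Intermediate
    } else if score <= 2.25 {
        MetabolizerPhenotype::Normal
    } else {
        MetabolizerPhenotype::UltraRapid
    }
}

fn main() {
    // e.g. a CYP2D6 *1/*4-style diplotype: one normal-function and one no-function allele.
    let p = phenotype((AlleleFunction::Normal, AlleleFunction::NoFunction));
    println!("{p:?}"); // prints Intermediate
}
```

Modeling the diplotype as a tuple makes the "exactly 2 alleles" invariant a property of the type rather than a runtime check.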

**Relationship with Epigenomic Context**:
- **Type**: Customer-Supplier
- **Direction**: Epigenomic → Pharmacogenomic
- **Integration**: Methylation can alter the expression of drug-metabolism genes
- **Translation**: Methylation beta values → gene expression levels → phenotype

## 7. Pipeline Context

**Module**: `pipeline.rs`

**Responsibility**: Orchestration of multi-stage genomic analysis workflow

**Core Aggregates**:
- `GenomicPipeline` - Root aggregate orchestrating all contexts

**Key Types**:
```rust
pub struct GenomicPipeline {
    pub kmer_encoder: KmerEncoder,
    pub aligner: AttentionAligner,
    pub variant_caller: VariantCaller,
    pub protein_predictor: ContactPredictor,
    pub methylation_analyzer: MethylationAnalyzer,
    pub pharma_analyzer: StarAlleleCaller,
}

pub struct PipelineConfig {
    pub k: usize,
    pub min_variant_quality: f64,
    pub min_coverage: usize,
    pub enable_protein_prediction: bool,
    pub enable_epigenetic_analysis: bool,
    pub enable_pharmacogenomics: bool,
}

pub struct AnalysisResult {
    pub sequence_stats: SequenceStats,
    pub variants: Vec<Variant>,
    pub protein_structures: Vec<ProteinGraph>,
    pub methylation_profile: Option<MethylationProfile>,
    pub drug_responses: Vec<DrugResponse>,
}
```

**Published Events**:
- `PipelineStarted { sample_id: String, stages: Vec<String> }`
- `StageCompleted { stage: String, duration_ms: u64 }`
- `PipelineCompleted { sample_id: String, total_duration_ms: u64 }`
- `PipelineFailed { stage: String, error: String }`

**Domain Language**:
- Pipeline: directed acyclic graph of analysis stages
- Stage: atomic analysis unit (alignment, variant calling, etc.)
- Workflow: ordered execution of stages
- Checkpoint: saved intermediate state
- Provenance: lineage tracking of analysis steps

**Invariants**:
- All enabled stages must execute in dependency order
- Failed stage halts downstream execution
- All results traceable to input data and parameters
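
Because the pipeline is a DAG of stages, dependency order can be computed with a topological sort. The following is a minimal sketch using Kahn's algorithm over a hypothetical stage-dependency map, plus a runner that stops at the first failed stage and reports it in the shape of `PipelineFailed { stage, error }`. The stage names are illustrative. This runner is stricter than the invariant requires, because it also skips stages that do not depend on the failed one:

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm: order stages so each runs after all of its dependencies.
/// `deps` maps stage -> stages it depends on. Returns None if there is a cycle (not a DAG).
fn execution_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut pending: BTreeMap<&'a str, usize> = BTreeMap::new(); // unmet dependency counts
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&stage, reqs) in deps {
        *pending.entry(stage).or_insert(0) += reqs.len();
        for &req in reqs {
            pending.entry(req).or_insert(0);
            dependents.entry(req).or_default().push(stage);
        }
    }
    let mut ready: VecDeque<&'a str> =
        pending.iter().filter(|&(_, &n)| n == 0).map(|(&s, _)| s).collect();
    let mut order = Vec::new();
    while let Some(stage) = ready.pop_front() {
        order.push(stage);
        for &next in dependents.get(stage).into_iter().flatten() {
            let n = pending.get_mut(next).expect("every stage has a counter");
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }
    // Stages left with unmet dependencies mean a cycle.
    (order.len() == pending.len()).then_some(order)
}

/// Run stages in order; the first failure stops execution and returns (stage, error).
fn run(order: &[&str], mut run_stage: impl FnMut(&str) -> Result<(), String>) -> Result<(), (String, String)> {
    for &stage in order {
        run_stage(stage).map_err(|e| (stage.to_string(), e))?;
    }
    Ok(())
}

fn main() {
    let deps = BTreeMap::from([
        ("alignment", vec!["kmer_index"]),
        ("variant_calling", vec!["alignment"]),
        ("protein", vec!["variant_calling"]),
        ("epigenomics", vec!["variant_calling"]),
        ("pharmacogenomics", vec!["epigenomics"]),
    ]);
    let order = execution_order(&deps).expect("stage graph must be acyclic");
    println!("{order:?}");
    let result = run(&order, |stage| {
        if stage == "protein" { Err("model unavailable".into()) } else { Ok(()) }
    });
    println!("{result:?}");
}
```

Using `BTreeMap` instead of `HashMap` makes the order deterministic when several stages become ready at once, which helps the provenance invariant.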

**Anti-Corruption Layers**:

The Pipeline Context uses ACLs to prevent downstream contexts from depending on upstream implementation details:

1. **Sequence ACL**: Translates k-mer indices to alignment seeds
2. **Alignment ACL**: Converts alignment gaps to variant candidates
3. **Variant ACL**: Maps variants to protein mutations
4. **Protein ACL**: Translates structure to functional predictions
5. **Epigenetic ACL**: Converts methylation to gene expression estimates
6. **Pharmacogenomic ACL**: Maps genotypes to clinical recommendations

## Context Relationship Matrix

| From ↓ / To → | Sequence | Alignment | Variant | Protein | Epigenomic | Pharma | Pipeline |
|---------------|----------|-----------|---------|---------|------------|--------|----------|
| Sequence      | -        | C-S       | SK      | SK      | -          | -      | ACL      |
| Alignment     | -        | -         | C-S     | -       | -          | -      | ACL      |
| Variant       | -        | -         | -       | Partner | ACL        | -      | ACL      |
| Protein       | -        | -         | Partner | -       | -          | -      | ACL      |
| Epigenomic    | -        | -         | -       | -       | -          | C-S    | ACL      |
| Pharma        | -        | -         | -       | -       | -          | -      | ACL      |
| Pipeline      | C-S      | C-S       | C-S     | C-S     | C-S        | C-S    | -        |

**Legend**:
- C-S: Customer-Supplier
- SK: Shared Kernel
- Partner: Partnership
- ACL: Anti-Corruption Layer

## Integration Patterns

### 1. Event-Driven Integration

Contexts communicate via domain events to maintain loose coupling:

```rust
// Example: Variant Context publishes event
pub enum DomainEvent {
    VariantCalled(VariantCalledEvent),
    ProteinStructurePredicted(ProteinPredictedEvent),
    // ...
}

// Subscribers consume published events through a common handler trait
pub trait EventHandler {
    fn handle(&mut self, event: DomainEvent);
}

// Pipeline Context subscribes and translates
impl EventHandler for GenomicPipeline {
    fn handle(&mut self, event: DomainEvent) {
        match event {
            DomainEvent::VariantCalled(e) => {
                // Only coding variants can change the protein sequence
                if e.variant.is_coding() {
                    self.trigger_protein_analysis(e.variant);
                }
            }
            // Other events are omitted from this example
            _ => {}
        }
    }
}
```

### 2. Shared Kernel Components

Core domain types shared across contexts:

```rust
// In types.rs (core domain)
// Eq + Hash let GenomicPosition key the HashMaps in the Variant and Epigenomic contexts
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityScore(pub f64); // Phred-scaled

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Nucleotide { A, C, G, T }

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize,
    pub end: usize,
}
```
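
The shared kernel does not say whether `GenomicRegion::end` is inclusive, and every context that tests membership or overlap needs to agree on this. The following is a minimal sketch of region helpers, assuming 0-based, half-open `[start, end)` coordinates (the BED convention). If the types use 1-based closed intervals instead, as VCF positions do, the comparisons change accordingly. The methods are hypothetical, not part of `types.rs`:

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicPosition {
    pub chromosome: String,
    pub position: usize,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GenomicRegion {
    pub chromosome: String,
    pub start: usize, // inclusive
    pub end: usize,   // exclusive (assumed half-open)
}

impl GenomicRegion {
    pub fn len(&self) -> usize {
        self.end.saturating_sub(self.start)
    }

    pub fn contains(&self, pos: &GenomicPosition) -> bool {
        self.chromosome == pos.chromosome && (self.start..self.end).contains(&pos.position)
    }

    /// Two half-open intervals overlap iff each starts before the other ends.
    pub fn overlaps(&self, other: &GenomicRegion) -> bool {
        self.chromosome == other.chromosome && self.start < other.end && other.start < self.end
    }
}

fn main() {
    let region = GenomicRegion { chromosome: "chr1".into(), start: 100, end: 200 };
    let inside = GenomicPosition { chromosome: "chr1".into(), position: 150 };
    let at_end = GenomicPosition { chromosome: "chr1".into(), position: 200 };
    // prints: len=100 inside=true at_end=false
    println!("len={} inside={} at_end={}", region.len(), region.contains(&inside), region.contains(&at_end));
}
```

With half-open intervals, adjacent regions such as `[100, 200)` and `[200, 300)` do not overlap, and lengths are simply `end - start`.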

### 3. Anti-Corruption Layer Example

```rust
// Variant → Protein ACL
pub struct VariantToProteinTranslator {
    codon_table: CodonTable,
}

impl VariantToProteinTranslator {
    pub fn translate_variant(&self, variant: &Variant) -> Option<ProteinMutation> {
        // Prevents protein context from depending on variant implementation:
        // only a ProteinMutation crosses the context boundary.
        //
        // The residue number must come from the variant's offset within the coding
        // sequence (CDS), not the raw chromosome coordinate, which ignores the
        // transcript start, strand, and introns. `map_to_cds_offset` is a hypothetical
        // helper returning a 0-based CDS offset, or None for non-coding variants.
        let cds_offset = self.map_to_cds_offset(variant)?;
        let codon_change = self.map_to_codon(variant)?;
        let aa_change = self.codon_table.translate(codon_change)?;

        Some(ProteinMutation {
            position: cds_offset / 3 + 1, // 1-based residue number (assumed convention)
            reference_aa: aa_change.reference,
            alternate_aa: aa_change.alternate,
        })
    }
}
```

## Bounded Context Responsibilities Summary

1. **Sequence Context**: K-mer indexing and sequence similarity (foundation)
2. **Alignment Context**: Pairwise alignment and motif discovery
3. **Variant Context**: Variant calling and population genetics
4. **Protein Context**: Structure prediction and functional analysis
5. **Epigenomic Context**: Methylation profiling and age prediction
6. **Pharmacogenomic Context**: Drug-gene interactions and clinical recommendations
7. **Pipeline Context**: Workflow orchestration and result aggregation

Each context maintains its own ubiquitous language, domain model, and business rules while integrating through well-defined relationships.
1047
examples/dna/ddd/domain-model.md
Normal file