ADR-002: Domain-Driven Design Bounded Contexts
Status
Accepted
Date
2026-01-15
Context
7sense is a bioacoustics analysis platform that transforms bird audio recordings into navigable geometric spaces. The system processes audio through Perch 2.0 embeddings (1536-dimensional vectors), stores them in RuVector with HNSW indexing, and applies GNN learning to discover patterns, motifs, and sequences. The output feeds into RAB (Retrieval-Augmented Bioacoustics) evidence packs for transparent, citation-backed interpretations.
The complexity of this domain requires clear separation of concerns to:
- Enable independent evolution of subsystems
- Maintain clear ownership boundaries
- Reduce coupling between technical and analytical components
- Support distributed team development
- Facilitate testing and validation at context boundaries
Decision
We adopt Domain-Driven Design (DDD) with six bounded contexts that represent distinct subdomains of the bioacoustics analysis pipeline:
- Audio Ingestion Context
- Embedding Context
- Vector Space Context
- Learning Context
- Analysis Context
- Interpretation Context
Bounded Context Definitions
1. Audio Ingestion Context
Purpose: Capture, segment, and preprocess raw audio recordings into analysis-ready call segments.
Ubiquitous Language
| Term | Definition |
|---|---|
| Recording | A continuous audio capture from a sensor at a specific location and time |
| Sensor | A physical audio capture device with known characteristics (sample rate, gain, location) |
| Call Segment | An isolated vocalization extracted from a recording (typically 5 seconds at 32 kHz) |
| Segmentation | The process of detecting and extracting individual vocalizations from continuous audio |
| SNR (Signal-to-Noise Ratio) | Quality metric indicating clarity of vocalization above background noise |
| Preprocessing | Normalization, resampling, and filtering applied before embedding |
| Habitat | Environmental classification of the recording location |
| Soundscape | The full acoustic environment including all sound sources |
Aggregates and Entities
Aggregate: Recording
├── Entity: Recording (Aggregate Root)
│ ├── id: RecordingId (UUID)
│ ├── sensorId: SensorId
│ ├── location: GeoLocation {lat, lon, altitude}
│ ├── startTimestamp: DateTime
│ ├── duration: Duration
│ ├── habitat: HabitatType
│ ├── weather: WeatherConditions
│ ├── format: AudioFormat {sampleRate, channels, bitDepth}
│ └── status: IngestionStatus
│
├── Value Object: AudioFormat
│ ├── sampleRate: u32 (target: 32000 Hz)
│ ├── channels: u8 (target: 1 mono)
│ └── bitDepth: u8
│
└── Value Object: WeatherConditions
├── temperature: f32
├── humidity: f32
├── windSpeed: f32
└── precipitation: PrecipitationType
Aggregate: CallSegment
├── Entity: CallSegment (Aggregate Root)
│ ├── id: SegmentId (UUID)
│ ├── recordingId: RecordingId
│ ├── startOffset: Duration (t0_ms)
│ ├── endOffset: Duration (t1_ms)
│ ├── snr: f32
│ ├── energy: f32
│ ├── clippingScore: f32
│ ├── overlapScore: f32
│ └── qualityGrade: QualityGrade
│
└── Value Object: SegmentMetrics
├── peakAmplitude: f32
├── rmsEnergy: f32
├── zeroCrossingRate: f32
└── spectralCentroid: f32
Aggregate: Sensor
├── Entity: Sensor (Aggregate Root)
│ ├── id: SensorId
│ ├── model: String
│ ├── location: GeoLocation
│ ├── calibration: CalibrationProfile
│ └── status: SensorStatus
│
└── Value Object: CalibrationProfile
├── frequencyResponse: Vec<(f32, f32)>
├── noiseFloor: f32
└── lastCalibrated: DateTime
Domain Events
| Event | Payload | Published When |
|---|---|---|
| RecordingReceived | recordingId, sensorId, timestamp, duration | New audio file uploaded/streamed |
| RecordingValidated | recordingId, format, qualityScore | Format and quality checks pass |
| RecordingRejected | recordingId, reason, details | Recording fails validation |
| SegmentationStarted | recordingId, algorithm, parameters | Segmentation process begins |
| SegmentExtracted | segmentId, recordingId, timeRange, snr | Individual call isolated |
| SegmentationCompleted | recordingId, segmentCount, duration | All segments extracted |
| PreprocessingCompleted | segmentId, normalizedFormat | Segment ready for embedding |
Services
// Domain Services
trait SegmentationService {
fn segment_recording(recording: &Recording, config: SegmentationConfig)
-> Result<Vec<CallSegment>, SegmentationError>;
fn detect_vocalizations(audio: &AudioBuffer) -> Vec<TimeRange>;
}
trait PreprocessingService {
fn normalize(segment: &CallSegment) -> NormalizedAudio;
fn resample(audio: &AudioBuffer, target_rate: u32) -> AudioBuffer;
fn apply_bandpass(audio: &AudioBuffer, low_hz: f32, high_hz: f32) -> AudioBuffer;
}
trait QualityAssessmentService {
fn compute_snr(segment: &CallSegment) -> f32;
fn detect_clipping(segment: &CallSegment) -> f32;
fn assess_quality(segment: &CallSegment) -> QualityGrade;
}
2. Embedding Context
Purpose: Transform preprocessed audio segments into 1536-dimensional Perch 2.0 embeddings suitable for vector space operations.
Ubiquitous Language
| Term | Definition |
|---|---|
| Embedding | A 1536-dimensional vector representation of a call segment |
| Perch 2.0 | Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone) |
| Mel Spectrogram | Time-frequency representation using mel-scaled frequency bins (500 frames x 128 bins) |
| Inference | The process of generating an embedding from audio input |
| Normalization | L2 normalization of embedding vectors for cosine similarity |
| Model Version | Specific checkpoint/version of the embedding model |
| Batch | Collection of segments processed together for efficiency |
| Embedding Stability | Consistency of embeddings for identical/similar inputs |
Aggregates and Entities
Aggregate: Embedding
├── Entity: Embedding (Aggregate Root)
│ ├── id: EmbeddingId (UUID)
│ ├── segmentId: SegmentId
│ ├── vector: Vec<f32> (dim=1536)
│ ├── modelVersion: ModelVersion
│ ├── norm: f32
│ ├── createdAt: DateTime
│ └── metadata: EmbeddingMetadata
│
└── Value Object: EmbeddingMetadata
├── inferenceLatency: Duration
├── batchId: Option<BatchId>
└── gpuUsed: bool
Aggregate: EmbeddingModel
├── Entity: EmbeddingModel (Aggregate Root)
│ ├── id: ModelId
│ ├── name: "perch2"
│ ├── version: SemanticVersion
│ ├── dimensions: u32 (1536)
│ ├── inputSpec: InputSpecification
│ └── status: ModelStatus
│
├── Value Object: InputSpecification
│ ├── sampleRate: 32000
│ ├── windowDuration: 5.0 seconds
│ ├── windowSamples: 160000
│ ├── melBins: 128
│ └── frequencyRange: (60, 16000) Hz
│
└── Value Object: ModelCheckpoint
├── path: String
├── format: ModelFormat (ONNX)
└── checksum: String
Aggregate: EmbeddingBatch
├── Entity: EmbeddingBatch (Aggregate Root)
│ ├── id: BatchId
│ ├── segmentIds: Vec<SegmentId>
│ ├── status: BatchStatus
│ ├── startedAt: DateTime
│ ├── completedAt: Option<DateTime>
│ └── metrics: BatchMetrics
│
└── Value Object: BatchMetrics
├── totalSegments: u32
├── successCount: u32
├── failureCount: u32
├── avgLatencyMs: f32
└── throughput: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| EmbeddingRequested | segmentId, modelVersion, priority | Segment queued for embedding |
| BatchCreated | batchId, segmentIds, modelVersion | Batch assembled for processing |
| InferenceStarted | embeddingId/batchId, modelVersion | Model inference begins |
| EmbeddingGenerated | embeddingId, segmentId, vector, norm | Single embedding computed |
| BatchCompleted | batchId, successCount, failureCount | Batch processing finishes |
| EmbeddingFailed | segmentId, error, retryable | Inference failure |
| ModelVersionChanged | oldVersion, newVersion, migrationRequired | Model updated |
| EmbeddingNormalized | embeddingId, originalNorm, normalizedVector | L2 normalization applied |
Services
// Domain Services
trait EmbeddingService {
fn embed_segment(segment: &NormalizedAudio, model: &EmbeddingModel)
-> Result<Embedding, EmbeddingError>;
fn embed_batch(segments: Vec<&NormalizedAudio>, model: &EmbeddingModel)
-> Vec<Result<Embedding, EmbeddingError>>;
}
trait SpectrogramService {
fn compute_mel_spectrogram(audio: &AudioBuffer) -> MelSpectrogram;
fn validate_spectrogram(spectrogram: &MelSpectrogram) -> ValidationResult;
}
trait NormalizationService {
fn l2_normalize(embedding: &Embedding) -> NormalizedEmbedding;
fn validate_norm_stability(embeddings: &[Embedding]) -> StabilityReport;
}
trait ModelManagementService {
fn load_model(version: &ModelVersion) -> Result<EmbeddingModel, ModelError>;
fn validate_model_output(embedding: &Embedding) -> ValidationResult;
fn compare_model_versions(v1: &ModelVersion, v2: &ModelVersion, samples: &[AudioBuffer])
-> VersionComparisonReport;
}
3. Vector Space Context
Purpose: Index embeddings using HNSW, manage similarity search, and maintain the navigable neighbor graph that forms the geometric foundation.
Ubiquitous Language
| Term | Definition |
|---|---|
| HNSW Index | Hierarchical Navigable Small World graph for approximate nearest neighbor search |
| Neighbor Graph | Network of similarity edges connecting acoustically related embeddings |
| k-NN Query | Search for k nearest neighbors to a query vector |
| Similarity Edge | Weighted connection between two embeddings based on distance |
| Distance Metric | Function measuring dissimilarity (cosine, Euclidean, Poincaré) |
| Index Layer | One level in the HNSW hierarchical structure |
| Entry Point | Starting node for graph traversal in search |
| ef (Search) | Exploration factor controlling search accuracy vs. speed |
| M (Construction) | Maximum number of connections per node per layer |
Aggregates and Entities
Aggregate: VectorIndex
├── Entity: VectorIndex (Aggregate Root)
│ ├── id: IndexId
│ ├── name: String
│ ├── dimensions: u32 (1536)
│ ├── distanceMetric: DistanceMetric
│ ├── hnswConfig: HnswConfiguration
│ ├── vectorCount: u64
│ ├── layerCount: u32
│ └── status: IndexStatus
│
├── Value Object: HnswConfiguration
│ ├── m: u32 (max connections per layer)
│ ├── efConstruction: u32
│ ├── efSearch: u32
│ └── maxLayers: u32
│
└── Value Object: IndexStatistics
├── memoryUsage: u64
├── avgDegree: f32
├── layerDistribution: Vec<u32>
└── searchLatencyP99: Duration
Aggregate: IndexedVector
├── Entity: IndexedVector (Aggregate Root)
│ ├── id: VectorId
│ ├── embeddingId: EmbeddingId
│ ├── indexId: IndexId
│ ├── layerMembership: Vec<u32>
│ ├── neighborIds: Vec<VectorId>
│ └── insertedAt: DateTime
│
└── Value Object: VectorPosition
├── entryDistance: f32
└── layerDistances: Vec<f32>
Aggregate: SimilarityEdge
├── Entity: SimilarityEdge (Aggregate Root)
│ ├── id: EdgeId
│ ├── sourceId: VectorId
│ ├── targetId: VectorId
│ ├── distance: f32
│ ├── edgeType: EdgeType (SIMILAR, HNSW_NEIGHBOR)
│ └── weight: f32
│
└── Value Object: EdgeMetadata
├── createdAt: DateTime
├── lastAccessed: DateTime
└── accessCount: u32
Aggregate: SearchQuery
├── Entity: SearchQuery (Aggregate Root)
│ ├── id: QueryId
│ ├── queryVector: Vec<f32>
│ ├── k: u32
│ ├── efSearch: u32
│ ├── filters: Vec<SearchFilter>
│ └── results: Option<SearchResults>
│
└── Value Object: SearchResults
├── neighbors: Vec<(VectorId, f32)>
├── searchLatency: Duration
├── nodesVisited: u32
└── distanceComputations: u32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| IndexCreated | indexId, config, distanceMetric | New HNSW index initialized |
| VectorInserted | vectorId, embeddingId, indexId, layerAssignment | Embedding added to index |
| VectorRemoved | vectorId, indexId | Embedding removed from index |
| NeighborGraphUpdated | indexId, affectedVectors, newEdges | Graph structure modified |
| SimilarityEdgeCreated | edgeId, sourceId, targetId, distance | New similarity link established |
| SearchExecuted | queryId, k, latency, resultsCount | k-NN search completed |
| IndexRebuildStarted | indexId, reason, estimatedDuration | Index reconstruction begins |
| IndexRebuildCompleted | indexId, vectorCount, duration | Index reconstruction finishes |
| IndexOptimized | indexId, beforeStats, afterStats | Index compaction/optimization |
Services
// Domain Services
trait VectorIndexService {
fn create_index(config: IndexConfiguration) -> Result<VectorIndex, IndexError>;
fn insert_vector(index: &mut VectorIndex, embedding: &Embedding)
-> Result<IndexedVector, InsertionError>;
fn remove_vector(index: &mut VectorIndex, vector_id: VectorId)
-> Result<(), RemovalError>;
fn rebuild_index(index: &mut VectorIndex) -> Result<IndexStatistics, RebuildError>;
}
trait SimilaritySearchService {
fn knn_search(index: &VectorIndex, query: &[f32], k: u32, ef: u32)
-> SearchResults;
fn range_search(index: &VectorIndex, query: &[f32], radius: f32)
-> Vec<(VectorId, f32)>;
fn batch_search(index: &VectorIndex, queries: &[Vec<f32>], k: u32)
-> Vec<SearchResults>;
}
trait NeighborGraphService {
fn get_neighbors(vector_id: VectorId, depth: u32) -> NeighborGraph;
fn compute_similarity_edges(index: &VectorIndex, top_k: u32)
-> Vec<SimilarityEdge>;
fn prune_edges(index: &mut VectorIndex, threshold: f32) -> u32;
}
trait DistanceService {
fn cosine_distance(a: &[f32], b: &[f32]) -> f32;
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32;
fn poincare_distance(a: &[f32], b: &[f32], curvature: f32) -> f32;
}
4. Learning Context
Purpose: Train and apply GNN models to refine embeddings, learn transition patterns, and enable continuous self-improvement of the vector space.
Ubiquitous Language
| Term | Definition |
|---|---|
| GNN (Graph Neural Network) | Neural network operating on graph-structured data |
| Message Passing | GNN mechanism where nodes aggregate information from neighbors |
| Graph Attention (GAT) | Attention-weighted message passing for learnable edge importance |
| Training Epoch | One complete pass through the training data |
| Contrastive Loss | Loss function pulling similar pairs together, pushing dissimilar apart |
| InfoNCE | Information Noise-Contrastive Estimation loss for self-supervised learning |
| Embedding Refinement | GNN-driven adjustment of embedding positions in vector space |
| Transition Edge | Temporal connection between sequential call segments |
| EWC (Elastic Weight Consolidation) | Technique preventing catastrophic forgetting during updates |
Aggregates and Entities
Aggregate: LearningModel
├── Entity: LearningModel (Aggregate Root)
│ ├── id: ModelId
│ ├── architecture: GnnArchitecture (GAT, GraphSAGE, GCN)
│ ├── layers: Vec<LayerConfig>
│ ├── version: SemanticVersion
│ ├── trainedAt: DateTime
│ ├── metrics: TrainingMetrics
│ └── status: ModelStatus
│
├── Value Object: LayerConfig
│ ├── layerType: LayerType
│ ├── inputDim: u32
│ ├── outputDim: u32
│ ├── heads: u32 (for attention)
│ └── dropout: f32
│
└── Value Object: TrainingMetrics
├── epochs: u32
├── finalLoss: f32
├── validationScore: f32
└── trainingDuration: Duration
Aggregate: TrainingSession
├── Entity: TrainingSession (Aggregate Root)
│ ├── id: SessionId
│ ├── modelId: ModelId
│ ├── config: TrainingConfiguration
│ ├── currentEpoch: u32
│ ├── status: SessionStatus
│ └── checkpoints: Vec<Checkpoint>
│
├── Value Object: TrainingConfiguration
│ ├── learningRate: f32
│ ├── batchSize: u32
│ ├── maxEpochs: u32
│ ├── lossFunction: LossType (InfoNCE, Triplet, Contrastive)
│ ├── optimizer: OptimizerType
│ └── ewcEnabled: bool
│
└── Value Object: Checkpoint
├── epoch: u32
├── loss: f32
├── weightsPath: String
└── timestamp: DateTime
Aggregate: TransitionGraph
├── Entity: TransitionGraph (Aggregate Root)
│ ├── id: GraphId
│ ├── nodeCount: u32
│ ├── edgeCount: u32
│ ├── edgeTypes: Vec<EdgeType>
│ └── statistics: GraphStatistics
│
├── Entity: TransitionEdge
│ ├── id: EdgeId
│ ├── sourceSegmentId: SegmentId
│ ├── targetSegmentId: SegmentId
│ ├── edgeType: EdgeType (NEXT, SIMILAR, CO_OCCURRENCE)
│ ├── weight: f32
│ └── metadata: EdgeMetadata
│
└── Value Object: GraphStatistics
├── avgDegree: f32
├── clusteringCoefficient: f32
├── diameter: u32
└── componentCount: u32
Aggregate: RefinedEmbedding
├── Entity: RefinedEmbedding (Aggregate Root)
│ ├── id: RefinedEmbeddingId
│ ├── originalEmbeddingId: EmbeddingId
│ ├── refinedVector: Vec<f32>
│ ├── modelVersion: ModelVersion
│ ├── refinementDelta: f32
│ └── createdAt: DateTime
│
└── Value Object: RefinementMetadata
├── neighborInfluence: Vec<(EmbeddingId, f32)>
├── attentionWeights: Vec<f32>
└── iterations: u32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| TrainingSessionStarted | sessionId, modelId, config | GNN training begins |
| EpochCompleted | sessionId, epoch, loss, metrics | Training epoch finishes |
| CheckpointSaved | sessionId, epoch, path | Model weights saved |
| TrainingCompleted | sessionId, finalMetrics | Training session ends |
| ModelDeployed | modelId, version | New model activated |
| EmbeddingsRefined | batchId, vectorCount, avgDelta | GNN refinement applied |
| TransitionEdgeDiscovered | edgeId, source, target, type | New temporal relationship |
| GraphStructureUpdated | graphId, addedEdges, removedEdges | Transition graph modified |
| LearningRateAdjusted | sessionId, oldLr, newLr | Adaptive LR change |
| EwcConsolidated | sessionId, importantWeights | EWC protection updated |
Services
// Domain Services
trait GnnTrainingService {
fn start_training(model: &LearningModel, graph: &TransitionGraph, config: TrainingConfiguration)
-> Result<TrainingSession, TrainingError>;
fn run_epoch(session: &mut TrainingSession, batch: &GraphBatch)
-> EpochResult;
fn save_checkpoint(session: &TrainingSession) -> Result<Checkpoint, IoError>;
fn apply_ewc(session: &mut TrainingSession, importance_matrix: &ImportanceMatrix);
}
trait EmbeddingRefinementService {
fn refine_embeddings(embeddings: &[Embedding], model: &LearningModel, graph: &TransitionGraph)
-> Vec<RefinedEmbedding>;
fn compute_refinement_delta(original: &Embedding, refined: &RefinedEmbedding) -> f32;
}
trait TransitionGraphService {
fn build_transition_graph(segments: &[CallSegment], recordings: &[Recording])
-> TransitionGraph;
fn add_temporal_edges(graph: &mut TransitionGraph, sequences: &[SegmentSequence]);
fn add_similarity_edges(graph: &mut TransitionGraph, index: &VectorIndex, top_k: u32);
fn compute_graph_statistics(graph: &TransitionGraph) -> GraphStatistics;
}
trait AttentionService {
fn compute_attention_weights(query: &Embedding, neighbors: &[Embedding]) -> Vec<f32>;
fn apply_graph_attention(node: &GraphNode, neighbors: &[GraphNode], model: &GatLayer)
-> AttentionOutput;
}
5. Analysis Context
Purpose: Perform clustering, motif detection, sequence mining, and pattern discovery on the refined vector space.
Ubiquitous Language
| Term | Definition |
|---|---|
| Cluster | Group of acoustically similar call segments |
| Prototype | Representative embedding for a cluster (centroid or medoid) |
| Exemplar | Actual call segment that best represents a cluster |
| Motif | Recurring pattern or phrase in vocalization sequences |
| Sequence | Ordered series of call segments from a recording |
| Transition Matrix | Probability matrix of call-to-call transitions |
| Entropy Rate | Measure of unpredictability in vocalization sequences |
| Call Type | Functional category of vocalization (alarm, contact, song) |
| Dialect | Regional variation in vocalization patterns |
Aggregates and Entities
Aggregate: Cluster
├── Entity: Cluster (Aggregate Root)
│ ├── id: ClusterId
│ ├── method: ClusteringMethod (HDBSCAN, KMeans, Spectral)
│ ├── parameters: ClusteringParameters
│ ├── memberCount: u32
│ ├── cohesion: f32
│ ├── separation: f32
│ └── status: ClusterStatus
│
├── Entity: Prototype
│ ├── id: PrototypeId
│ ├── clusterId: ClusterId
│ ├── centroidVector: Vec<f32>
│ ├── exemplarIds: Vec<SegmentId>
│ └── stability: f32
│
└── Value Object: ClusteringParameters
├── minClusterSize: u32
├── minSamples: u32
├── epsilon: Option<f32>
└── metric: DistanceMetric
Aggregate: ClusterAssignment
├── Entity: ClusterAssignment (Aggregate Root)
│ ├── id: AssignmentId
│ ├── segmentId: SegmentId
│ ├── clusterId: ClusterId
│ ├── confidence: f32
│ ├── distanceToCentroid: f32
│ └── assignedAt: DateTime
│
└── Value Object: SoftAssignment
├── clusterProbabilities: Vec<(ClusterId, f32)>
└── isAmbiguous: bool
Aggregate: Motif
├── Entity: Motif (Aggregate Root)
│ ├── id: MotifId
│ ├── pattern: Vec<ClusterId>
│ ├── occurrenceCount: u32
│ ├── avgDuration: Duration
│ ├── confidence: f32
│ └── context: MotifContext
│
├── Value Object: MotifOccurrence
│ ├── recordingId: RecordingId
│ ├── startSegmentId: SegmentId
│ ├── segmentIds: Vec<SegmentId>
│ └── timestamp: DateTime
│
└── Value Object: MotifContext
├── typicalHabitat: Vec<HabitatType>
├── timeOfDay: Vec<TimeRange>
└── associatedBehavior: Option<String>
Aggregate: SequenceAnalysis
├── Entity: SequenceAnalysis (Aggregate Root)
│ ├── id: AnalysisId
│ ├── recordingId: RecordingId
│ ├── segmentSequence: Vec<SegmentId>
│ ├── clusterSequence: Vec<ClusterId>
│ ├── transitionMatrix: TransitionMatrix
│ └── metrics: SequenceMetrics
│
├── Value Object: TransitionMatrix
│ ├── clusterIds: Vec<ClusterId>
│ ├── probabilities: Vec<Vec<f32>>
│ └── observations: Vec<Vec<u32>>
│
└── Value Object: SequenceMetrics
├── entropyRate: f32
├── stereotypy: f32
├── motifDensity: f32
└── uniqueTransitions: u32
Aggregate: Anomaly
├── Entity: Anomaly (Aggregate Root)
│ ├── id: AnomalyId
│ ├── segmentId: SegmentId
│ ├── anomalyType: AnomalyType (Rare, Novel, Artifact)
│ ├── score: f32
│ ├── nearestCluster: Option<ClusterId>
│ └── detectedAt: DateTime
│
└── Value Object: AnomalyContext
├── neighborDistances: Vec<f32>
├── localDensity: f32
└── globalRarity: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| ClusteringStarted | clusterId, method, parameters | Clustering analysis begins |
| ClusteringCompleted | clusterId, clusterCount, metrics | Clustering finishes |
| ClusterAssigned | assignmentId, segmentId, clusterId, confidence | Segment assigned to cluster |
| PrototypeUpdated | prototypeId, clusterId, newCentroid | Cluster representative changed |
| MotifDiscovered | motifId, pattern, occurrenceCount | New recurring pattern found |
| MotifOccurrenceFound | motifId, recordingId, segmentIds | Motif instance detected |
| SequenceAnalyzed | analysisId, recordingId, entropyRate | Sequence metrics computed |
| AnomalyDetected | anomalyId, segmentId, score, type | Unusual vocalization found |
| TransitionMatrixUpdated | recordingId, entropyChange | Transition probabilities recalculated |
| DialectIdentified | clusterId, region, distinctiveness | Regional variant discovered |
Services
// Domain Services
trait ClusteringService {
fn cluster_embeddings(embeddings: &[Embedding], method: ClusteringMethod, params: ClusteringParameters)
-> ClusteringResult;
fn assign_to_cluster(embedding: &Embedding, clusters: &[Cluster])
-> ClusterAssignment;
fn compute_prototype(cluster: &Cluster, members: &[Embedding]) -> Prototype;
fn evaluate_clustering(clusters: &[Cluster], assignments: &[ClusterAssignment])
-> ClusteringMetrics;
}
trait MotifDetectionService {
fn discover_motifs(sequences: &[SequenceAnalysis], min_support: u32, max_length: u32)
-> Vec<Motif>;
fn find_motif_occurrences(motif: &Motif, sequence: &SequenceAnalysis)
-> Vec<MotifOccurrence>;
fn validate_motif_dtw(motif: &Motif, occurrences: &[MotifOccurrence])
-> ValidationResult;
}
trait SequenceAnalysisService {
fn analyze_sequence(recording: &Recording, segments: &[CallSegment], clusters: &[Cluster])
-> SequenceAnalysis;
fn compute_transition_matrix(cluster_sequence: &[ClusterId]) -> TransitionMatrix;
fn compute_entropy_rate(matrix: &TransitionMatrix) -> f32;
fn compute_stereotypy(matrix: &TransitionMatrix) -> f32;
}
trait AnomalyDetectionService {
fn detect_anomalies(embeddings: &[Embedding], index: &VectorIndex, threshold: f32)
-> Vec<Anomaly>;
fn classify_anomaly(anomaly: &Anomaly, context: &AnalysisContext) -> AnomalyType;
fn compute_local_outlier_factor(embedding: &Embedding, neighbors: &[Embedding]) -> f32;
}
6. Interpretation Context
Purpose: Generate RAB (Retrieval-Augmented Bioacoustics) evidence packs and constrained interpretations with full citation and transparency.
Ubiquitous Language
| Term | Definition |
|---|---|
| Evidence Pack | Structured collection of supporting data for an interpretation |
| Citation | Reference to specific retrieved calls supporting a statement |
| Constrained Generation | Output limited to evidence-backed structural descriptions |
| Structural Descriptor | Objective characterization (pitch contour, rhythm, spectral texture) |
| Interpretation | Evidence-backed analysis of vocalization meaning/context |
| Confidence Level | Certainty measure based on evidence quality and quantity |
| Attribution | Clear link between interpretation statements and source evidence |
| Hypothesis | Testable suggestion generated from pattern analysis |
Aggregates and Entities
Aggregate: EvidencePack
├── Entity: EvidencePack (Aggregate Root)
│ ├── id: EvidencePackId
│ ├── querySegmentId: SegmentId
│ ├── queryType: QueryType (Segment, TimeInterval, Habitat)
│ ├── retrievedNeighbors: Vec<RetrievedNeighbor>
│ ├── clusterExemplars: Vec<Exemplar>
│ ├── sequenceContext: SequenceContext
│ ├── signalQuality: SignalQuality
│ └── generatedAt: DateTime
│
├── Value Object: RetrievedNeighbor
│ ├── segmentId: SegmentId
│ ├── distance: f32
│ ├── clusterId: Option<ClusterId>
│ ├── spectrogramThumbnail: Option<ThumbnailId>
│ └── metadata: SegmentMetadata
│
├── Value Object: SequenceContext
│ ├── previousSegments: Vec<SegmentId>
│ ├── nextSegments: Vec<SegmentId>
│ ├── positionInRecording: f32
│ └── localMotifs: Vec<MotifId>
│
└── Value Object: SignalQuality
├── snr: f32
├── clippingScore: f32
├── overlapScore: f32
└── qualityGrade: QualityGrade
Aggregate: Interpretation
├── Entity: Interpretation (Aggregate Root)
│ ├── id: InterpretationId
│ ├── evidencePackId: EvidencePackId
│ ├── interpretationType: InterpretationType
│ ├── statements: Vec<InterpretationStatement>
│ ├── overallConfidence: f32
│ ├── hypotheses: Vec<Hypothesis>
│ └── generatedAt: DateTime
│
├── Entity: InterpretationStatement
│ ├── id: StatementId
│ ├── content: String
│ ├── statementType: StatementType
│ ├── citations: Vec<Citation>
│ ├── confidence: f32
│ └── constraints: Vec<Constraint>
│
├── Value Object: Citation
│ ├── sourceType: CitationSource (Neighbor, Exemplar, Motif, Cluster)
│ ├── sourceId: String
│ ├── relevance: f32
│ └── excerpt: Option<String>
│
└── Value Object: Hypothesis
├── statement: String
├── testability: TestabilityLevel
├── supportingEvidence: Vec<CitationId>
└── suggestedExperiment: Option<String>
Aggregate: StructuralDescriptor
├── Entity: StructuralDescriptor (Aggregate Root)
│ ├── id: DescriptorId
│ ├── segmentId: SegmentId
│ ├── pitchContour: PitchContourStats
│ ├── rhythmProfile: RhythmProfile
│ ├── spectralTexture: SpectralTexture
│ └── sequenceRole: SequenceRole
│
├── Value Object: PitchContourStats
│ ├── minFrequency: f32
│ ├── maxFrequency: f32
│ ├── meanFrequency: f32
│ ├── contourShape: ContourShape
│ └── bandwidth: f32
│
├── Value Object: RhythmProfile
│ ├── duration: Duration
│ ├── syllableCount: u32
│ ├── interSyllableIntervals: Vec<Duration>
│ └── rhythmRegularity: f32
│
├── Value Object: SpectralTexture
│ ├── harmonicity: f32
│ ├── spectralCentroid: f32
│ ├── spectralFlatness: f32
│ └── wienerEntropy: f32
│
└── Value Object: SequenceRole
├── typicalPredecessors: Vec<ClusterId>
├── typicalSuccessors: Vec<ClusterId>
├── positionDistribution: PositionDistribution
└── contextualFrequency: f32
Aggregate: MonitoringSummary
├── Entity: MonitoringSummary (Aggregate Root)
│ ├── id: SummaryId
│ ├── timeRange: TimeRange
│ ├── location: GeoLocation
│ ├── callCounts: HashMap<ClusterId, u32>
│ ├── diversityMetrics: DiversityMetrics
│ ├── anomalies: Vec<AnomalyId>
│ └── interpretations: Vec<InterpretationId>
│
└── Value Object: DiversityMetrics
├── speciesRichness: u32
├── shannonIndex: f32
├── simpsonIndex: f32
└── evenness: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| EvidencePackRequested | querySegmentId, queryType, parameters | Analysis request initiated |
| EvidencePackAssembled | evidencePackId, neighborCount, exemplarCount | Evidence gathering complete |
| InterpretationGenerated | interpretationId, evidencePackId, statementCount | Interpretation created |
| StatementCited | statementId, citations | Statement linked to evidence |
| HypothesisProposed | hypothesisId, interpretationId, testability | Testable hypothesis generated |
| StructuralDescriptorComputed | descriptorId, segmentId | Acoustic features extracted |
| MonitoringSummaryGenerated | summaryId, timeRange, location | Period summary created |
| AnnotationSuggested | segmentId, suggestedLabel, confidence | Label recommendation made |
| InterpretationValidated | interpretationId, validationResult | Expert review completed |
Services
// Domain Services
trait EvidencePackService {
fn assemble_evidence_pack(
query_segment: &CallSegment,
index: &VectorIndex,
clusters: &[Cluster],
sequences: &[SequenceAnalysis],
config: EvidencePackConfig
) -> EvidencePack;
fn retrieve_neighbors(segment: &CallSegment, index: &VectorIndex, k: u32)
-> Vec<RetrievedNeighbor>;
fn get_sequence_context(segment: &CallSegment, recording: &Recording)
-> SequenceContext;
}
trait InterpretationService {
fn generate_interpretation(evidence_pack: &EvidencePack, constraints: &[Constraint])
-> Interpretation;
fn create_statement(content: &str, citations: &[Citation], statement_type: StatementType)
-> InterpretationStatement;
fn generate_hypotheses(evidence_pack: &EvidencePack, interpretation: &Interpretation)
-> Vec<Hypothesis>;
}
trait StructuralDescriptorService {
fn compute_descriptors(segment: &CallSegment) -> StructuralDescriptor;
fn extract_pitch_contour(audio: &AudioBuffer) -> PitchContourStats;
fn analyze_rhythm(segments: &[CallSegment]) -> RhythmProfile;
fn compute_spectral_texture(spectrogram: &MelSpectrogram) -> SpectralTexture;
}
trait MonitoringService {
fn generate_summary(
recordings: &[Recording],
time_range: TimeRange,
location: GeoLocation
) -> MonitoringSummary;
fn compute_diversity_metrics(cluster_assignments: &[ClusterAssignment])
-> DiversityMetrics;
fn detect_temporal_patterns(summaries: &[MonitoringSummary])
-> Vec<TemporalPattern>;
}
trait CitationService {
fn create_citation(source: CitationSource, source_id: &str, relevance: f32)
-> Citation;
fn validate_citation(citation: &Citation, evidence_pack: &EvidencePack)
-> ValidationResult;
fn format_attribution(statement: &InterpretationStatement) -> String;
}
Context Mapping
Relationships Between Contexts
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CONTEXT MAP │
└─────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐
│ Audio Ingestion │
│ Context │
└────────┬─────────┘
│
│ [U/D] CallSegment
│ Published Language
▼
┌──────────────────┐
│ Embedding │
│ Context │
└────────┬─────────┘
│
│ [U/D] Embedding
│ Published Language
▼
┌──────────────────┐
│ Vector Space │◄──────────────────────┐
│ Context │ │
└────────┬─────────┘ │
│ │
┌────────────────┼────────────────┐ │
│ │ │ │
│ [ACL] │ [ACL] │ [ACL] │
▼ ▼ ▼ │
┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ Learning │ │ Analysis │ │ Interpretation │ │
│ Context │ │ Context │ │ Context │ │
└──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │
│ [Partnership] │ [Customer/ │ [Customer/ │
│ │ Supplier] │ Supplier] │
└────────────────┴───────────────────┘ │
│ │
│ RefinedEmbedding │
└────────────────────────────────────┘
LEGEND:
[U/D] = Upstream/Downstream (Published Language)
[ACL] = Anti-Corruption Layer
[Partnership] = Shared development, mutual dependency
[Customer/Supplier] = Clear provider/consumer relationship
Integration Patterns
| Upstream | Downstream | Pattern | Shared Kernel |
|---|---|---|---|
| Audio Ingestion | Embedding | Published Language | CallSegment, SegmentId, QualityGrade |
| Embedding | Vector Space | Published Language | Embedding, EmbeddingId, Vec<f32> |
| Vector Space | Learning | ACL + Partnership | VectorIndex, NeighborGraph |
| Vector Space | Analysis | ACL + Customer/Supplier | SearchResults, SimilarityEdge |
| Vector Space | Interpretation | ACL + Customer/Supplier | SearchResults, RetrievedNeighbor |
| Learning | Vector Space | Partnership | RefinedEmbedding (feedback loop) |
| Analysis | Interpretation | Customer/Supplier | Cluster, Motif, SequenceAnalysis |
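To make the Published Language rows concrete, here is a minimal sketch of the contract that Audio Ingestion (upstream) could publish for the Embedding context (downstream). The DTO and its field names are illustrative assumptions, not part of the ADR; only the constants (32 kHz, 5 s windows, 160,000 samples) come from the shared kernel.

```rust
// Hypothetical Published Language contract between Audio Ingestion
// (upstream owner) and Embedding (downstream consumer). Downstream
// depends only on this DTO, never on upstream internals.
#[derive(Clone, Debug, PartialEq)]
pub struct CallSegmentMessage {
    pub segment_id: u64,     // illustrative; real type is a UUID newtype
    pub recording_id: u64,
    pub sample_rate_hz: u32, // contract: 32_000 for Perch 2.0
    pub samples: Vec<f32>,   // contract: 5 s window = 160_000 samples
    pub quality: QualityGradeDto,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QualityGradeDto { Excellent, Good, Fair, Poor, Unusable }

impl CallSegmentMessage {
    /// Downstream validation: reject messages that violate the contract
    /// instead of letting malformed segments reach inference.
    pub fn is_valid(&self) -> bool {
        self.sample_rate_hz == 32_000 && self.samples.len() == 160_000
    }
}

fn main() {
    let msg = CallSegmentMessage {
        segment_id: 1,
        recording_id: 1,
        sample_rate_hz: 32_000,
        samples: vec![0.0; 160_000],
        quality: QualityGradeDto::Good,
    };
    // A well-formed message passes the contract check.
    assert!(msg.is_valid());
}
```

Validating at the boundary keeps the downstream context free of defensive checks deeper in its model.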
Anti-Corruption Layers
Learning Context ACL
/// Translates Vector Space concepts to Learning domain
mod learning_acl {
    use std::sync::Arc;
    use crate::vector_space::{VectorIndex, VectorId, IndexedVector, SimilarityEdge};
    use crate::learning::{TransitionGraph, GraphNode, GraphNodeId, GraphEdge, EdgeType};

    pub struct VectorSpaceAdapter {
        index: Arc<VectorIndex>,
    }

    impl VectorSpaceAdapter {
        /// Convert HNSW neighbor graph to GNN-compatible format
        pub fn to_transition_graph(&self, max_neighbors: u32) -> TransitionGraph {
            let nodes: Vec<GraphNode> = self.index
                .iter_vectors()
                .map(|v| GraphNode {
                    id: v.id.into(),
                    embedding: v.embedding_id,
                    features: self.extract_node_features(&v),
                })
                .collect();

            let edges: Vec<GraphEdge> = self.index
                .iter_similarity_edges()
                .filter(|e| e.distance < self.distance_threshold())
                .map(|e| GraphEdge {
                    source: e.source_id.into(),
                    target: e.target_id.into(),
                    edge_type: EdgeType::Similarity,
                    weight: 1.0 - e.distance, // Convert distance to similarity
                })
                .collect();

            TransitionGraph::new(nodes, edges)
        }

        /// Query neighbors without exposing HNSW internals
        pub fn get_trainable_neighbors(&self, vector_id: VectorId, k: u32)
            -> Vec<(GraphNodeId, f32)>
        {
            self.index
                .knn_search_by_id(vector_id, k)
                .map(|(vid, dist)| (vid.into(), 1.0 - dist))
                .collect()
        }
    }
}
Analysis Context ACL
/// Translates Vector Space results to Analysis domain
mod analysis_acl {
    use crate::shared_kernel::EmbeddingId;
    use crate::vector_space::{SearchResults, VectorIndex};
    use crate::analysis::{ClusterCandidate, SimilarityMatrix};

    pub struct SearchResultsAdapter;

    impl SearchResultsAdapter {
        /// Convert k-NN results to clustering input
        pub fn to_similarity_matrix(
            index: &VectorIndex,
            embeddings: &[EmbeddingId],
            k: u32,
        ) -> SimilarityMatrix {
            let n = embeddings.len();
            let mut matrix = SimilarityMatrix::new(n);
            for (i, emb_id) in embeddings.iter().enumerate() {
                let neighbors = index.knn_search_by_embedding_id(*emb_id, k);
                for (neighbor_id, distance) in neighbors {
                    if let Some(j) = embeddings.iter().position(|e| *e == neighbor_id) {
                        matrix.set(i, j, 1.0 - distance);
                    }
                }
            }
            matrix
        }

        /// Extract cluster candidates from dense regions
        pub fn identify_dense_regions(
            index: &VectorIndex,
            min_density: f32,
        ) -> Vec<ClusterCandidate> {
            index.iter_vectors()
                .filter_map(|v| {
                    let local_density = index.compute_local_density(v.id);
                    if local_density >= min_density {
                        Some(ClusterCandidate {
                            center_id: v.embedding_id,
                            density: local_density,
                            estimated_size: (local_density * 100.0) as u32,
                        })
                    } else {
                        None
                    }
                })
                .collect()
        }
    }
}
Interpretation Context ACL
/// Translates Analysis results to Interpretation domain
mod interpretation_acl {
    use std::collections::HashMap;
    use std::sync::Arc;
    use chrono::Utc;
    use crate::analysis::{Cluster, Motif, SequenceAnalysis, ClusterAssignment};
    use crate::interpretation::{
        EvidencePack, RetrievedNeighbor, Exemplar, SequenceContext,
    };

    pub struct AnalysisAdapter {
        clusters: Arc<HashMap<ClusterId, Cluster>>,
        motifs: Arc<HashMap<MotifId, Motif>>,
    }

    impl AnalysisAdapter {
        /// Build evidence pack from analysis artifacts
        pub fn build_evidence_pack(
            &self,
            query_segment: &CallSegment,
            neighbors: Vec<(SegmentId, f32)>,
            sequence: &SequenceAnalysis,
        ) -> EvidencePack {
            let retrieved_neighbors: Vec<RetrievedNeighbor> = neighbors
                .into_iter()
                .map(|(seg_id, distance)| {
                    let cluster_id = self.find_cluster_for_segment(seg_id);
                    RetrievedNeighbor {
                        segment_id: seg_id,
                        distance,
                        cluster_id,
                        spectrogram_thumbnail: self.generate_thumbnail(seg_id),
                        metadata: self.get_segment_metadata(seg_id),
                    }
                })
                .collect();

            let exemplars: Vec<Exemplar> = self.get_relevant_exemplars(
                &retrieved_neighbors,
                5, // top 5 exemplars
            );

            let sequence_context = SequenceContext {
                previous_segments: sequence.get_predecessors(query_segment.id, 3),
                next_segments: sequence.get_successors(query_segment.id, 3),
                position_in_recording: sequence.relative_position(query_segment.id),
                local_motifs: self.find_local_motifs(query_segment.id, sequence),
            };

            EvidencePack {
                id: EvidencePackId::new(),
                query_segment_id: query_segment.id,
                query_type: QueryType::Segment,
                retrieved_neighbors,
                cluster_exemplars: exemplars,
                sequence_context,
                signal_quality: self.assess_quality(query_segment),
                generated_at: Utc::now(),
            }
        }

        /// Convert cluster to citable evidence
        pub fn cluster_to_citation(&self, cluster_id: ClusterId) -> Citation {
            let cluster = self.clusters.get(&cluster_id)
                .expect("cluster not found");
            Citation {
                source_type: CitationSource::Cluster,
                source_id: cluster_id.to_string(),
                relevance: cluster.cohesion,
                excerpt: Some(format!(
                    "Cluster {} with {} members (cohesion: {:.2})",
                    cluster_id, cluster.member_count, cluster.cohesion
                )),
            }
        }
    }
}
Shared Kernel
The following types are shared across multiple contexts and form the ubiquitous language foundation:
/// Shared identifiers
pub mod shared_kernel {
    use chrono::{DateTime, Utc};
    use uuid::Uuid;

    // Core identifiers shared across all contexts
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct RecordingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct SegmentId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct EmbeddingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct ClusterId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct MotifId(Uuid);

    // Shared value objects
    #[derive(Clone, Debug)]
    pub struct GeoLocation {
        pub latitude: f64,
        pub longitude: f64,
        pub altitude: Option<f32>,
    }

    #[derive(Clone, Debug)]
    pub struct TimeRange {
        pub start: DateTime<Utc>,
        pub end: DateTime<Utc>,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum QualityGrade {
        Excellent, // SNR > 20dB, no clipping
        Good,      // SNR > 10dB, minimal issues
        Fair,      // SNR > 5dB, some artifacts
        Poor,      // SNR < 5dB or significant issues
        Unusable,  // Too degraded for analysis
    }

    // Embedding vector type (1536-D for Perch 2.0)
    pub type EmbeddingVector = Vec<f32>;
    pub const EMBEDDING_DIM: usize = 1536;

    // Audio format constants for Perch 2.0
    pub const TARGET_SAMPLE_RATE: u32 = 32000;
    pub const TARGET_WINDOW_SECONDS: f32 = 5.0;
    pub const TARGET_WINDOW_SAMPLES: usize = 160000;
    pub const MEL_BINS: usize = 128;
    pub const MEL_FRAMES: usize = 500;
}
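Because `EmbeddingVector` is a plain `Vec<f32>` alias, nothing in the type system enforces the 1536-D invariant. A minimal sketch of one way to close that gap, assuming a newtype (`ValidatedEmbedding` is an illustrative name, not part of the shared kernel):

```rust
// Hypothetical newtype enforcing the 1536-D invariant at construction,
// so a malformed vector can never enter the Vector Space context.
pub const EMBEDDING_DIM: usize = 1536;

#[derive(Clone, Debug, PartialEq)]
pub struct ValidatedEmbedding(Vec<f32>);

impl ValidatedEmbedding {
    /// Accept only vectors with exactly EMBEDDING_DIM components.
    pub fn new(v: Vec<f32>) -> Result<Self, String> {
        if v.len() == EMBEDDING_DIM {
            Ok(Self(v))
        } else {
            Err(format!("expected {} dims, got {}", EMBEDDING_DIM, v.len()))
        }
    }

    /// Read-only access for index insertion and distance computation.
    pub fn as_slice(&self) -> &[f32] {
        &self.0
    }
}

fn main() {
    // Correct dimensionality is accepted; anything else is rejected.
    assert!(ValidatedEmbedding::new(vec![0.0; 1536]).is_ok());
    assert!(ValidatedEmbedding::new(vec![0.0; 128]).is_err());
}
```

The trade-off is a small conversion cost at the Embedding/Vector Space boundary in exchange for removing dimension checks everywhere downstream.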
Event Flow
Recording Upload
│
▼
┌──────────────────────────────────────────────────────────────┐
│ AUDIO INGESTION CONTEXT │
│ RecordingReceived → RecordingValidated → SegmentExtracted │
└──────────────────────────────────────────────────────────────┘
│ CallSegment (Published Language)
▼
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING CONTEXT │
│ EmbeddingRequested → InferenceStarted → EmbeddingGenerated │
└──────────────────────────────────────────────────────────────┘
│ Embedding (Published Language)
▼
┌──────────────────────────────────────────────────────────────┐
│ VECTOR SPACE CONTEXT │
│ VectorInserted → NeighborGraphUpdated → SimilarityEdgeCreated│
└──────────────────────────────────────────────────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ LEARNING │ │ ANALYSIS │ │ INTERPRETATION │
│ CONTEXT │ │ CONTEXT │ │ CONTEXT │
│ │ │ │ │ │
│ Training- │ │ Clustering- │ │ EvidencePack- │
│ Started │ │ Completed │ │ Assembled │
│ │ │ │ │ │ │ │ │
│ ▼ │ │ ▼ │ │ ▼ │
│ Embeddings- │ │ MotifDiscovered │ │ Interpretation- │
│ Refined │ │ │ │ Generated │
└─────────────┘ └─────────────────┘ └─────────────────────┘
│
│ RefinedEmbedding (feedback to Vector Space)
└──────────────────────────────────────────────────────────┐
▼
┌──────────────────┐
│ VECTOR SPACE │
│ (Index Update) │
└──────────────────┘
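The fan-out in the flow above can be expressed as a single routing function over the cross-context events. This is a sketch under the assumption that events are modeled as one enum; the variant payloads and the `downstream_contexts` helper are illustrative, not prescribed by the ADR.

```rust
// Hypothetical cross-context domain events from the flow diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum DomainEvent {
    SegmentExtracted { segment_id: u64 },
    EmbeddingGenerated { embedding_id: u64 },
    VectorInserted { vector_id: u64 },
    EmbeddingsRefined { count: u32 }, // feedback loop to Vector Space
}

/// Route an event to the contexts that consume it (names illustrative).
pub fn downstream_contexts(event: &DomainEvent) -> Vec<&'static str> {
    match event {
        DomainEvent::SegmentExtracted { .. } => vec!["embedding"],
        DomainEvent::EmbeddingGenerated { .. } => vec!["vector_space"],
        // VectorInserted fans out to all three analytical contexts.
        DomainEvent::VectorInserted { .. } =>
            vec!["learning", "analysis", "interpretation"],
        DomainEvent::EmbeddingsRefined { .. } => vec!["vector_space"],
    }
}

fn main() {
    let e = DomainEvent::VectorInserted { vector_id: 7 };
    assert_eq!(downstream_contexts(&e).len(), 3);
}
```

Centralizing the routing table keeps the fan-out auditable and makes it obvious when a new event type lacks a consumer.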
Consequences
Benefits
- Clear Ownership: Each bounded context has explicit responsibilities and can be developed by independent teams
- Reduced Coupling: Anti-corruption layers prevent domain model pollution across boundaries
- Testability: Each context can be tested in isolation with well-defined interfaces
- Scalability: Contexts can be deployed and scaled independently
- Evolvability: Internal implementations can change without affecting other contexts
- Domain Alignment: Ubiquitous language matches the bioacoustics domain
Risks
- Complexity: Six contexts introduce coordination overhead
- Data Duplication: Some data may be replicated across context boundaries
- Event Consistency: Eventual consistency between contexts requires careful handling
- Learning Curve: Team must understand DDD concepts and context boundaries
Mitigations
- Use event sourcing for cross-context communication
- Implement saga patterns for multi-context transactions
- Maintain comprehensive integration tests at context boundaries
- Document context mappings and keep them updated
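To make the saga mitigation concrete, here is a minimal sketch of a compensating-action saga: each step pairs a forward action with an undo, and a mid-pipeline failure rolls back completed steps in reverse order. The `SagaStep` shape and step names are assumptions for illustration only.

```rust
// Hypothetical saga runner for multi-context transactions.
struct SagaStep {
    name: &'static str,
    action: fn() -> Result<(), String>,
    compensate: fn(),
}

fn run_saga(steps: &[SagaStep]) -> Result<(), String> {
    let mut done: Vec<&SagaStep> = Vec::new();
    for step in steps {
        match (step.action)() {
            Ok(()) => done.push(step),
            Err(e) => {
                // Undo completed steps in reverse order before failing.
                for s in done.iter().rev() {
                    (s.compensate)();
                }
                return Err(format!("{} failed: {}", step.name, e));
            }
        }
    }
    Ok(())
}

fn main() {
    // A failing second step aborts the saga and triggers compensation.
    let steps = [
        SagaStep { name: "insert_vector", action: || Ok(()), compensate: || {} },
        SagaStep { name: "update_graph", action: || Err("timeout".into()), compensate: || {} },
    ];
    assert!(run_saga(&steps).is_err());
}
```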
References
- Evans, Eric. "Domain-Driven Design: Tackling Complexity in the Heart of Software" (2003)
- Vernon, Vaughn. "Implementing Domain-Driven Design" (2013)
- Perch 2.0 Paper: arXiv:2508.04665
- RuVector Documentation: https://github.com/ruvnet/ruvector
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-15 | Architecture Team | Initial ADR |