# ADR-002: Domain-Driven Design Bounded Contexts

## Status

Accepted

## Date

2026-01-15

## Context

7sense is a bioacoustics analysis platform that transforms bird audio recordings into navigable geometric spaces. The system processes audio through Perch 2.0 embeddings (1536-dimensional vectors), stores them in RuVector with HNSW indexing, and applies GNN learning to discover patterns, motifs, and sequences. The output feeds into RAB (Retrieval-Augmented Bioacoustics) evidence packs for transparent, citation-backed interpretations.

The complexity of this domain requires clear separation of concerns to:

- Enable independent evolution of subsystems
- Maintain clear ownership boundaries
- Reduce coupling between technical and analytical components
- Support distributed team development
- Facilitate testing and validation at context boundaries

## Decision

We adopt Domain-Driven Design (DDD) with six bounded contexts that represent distinct subdomains of the bioacoustics analysis pipeline:

1. **Audio Ingestion Context**
2. **Embedding Context**
3. **Vector Space Context**
4. **Learning Context**
5. **Analysis Context**
6. **Interpretation Context**

---

## Bounded Context Definitions

### 1. Audio Ingestion Context

**Purpose**: Capture, segment, and preprocess raw audio recordings into analysis-ready call segments.
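
As a rough illustration of what "analysis-ready" can mean here, the sketch below cuts a mono sample buffer into fixed 5-second windows at 32 kHz (160,000 samples, the figures this ADR gives for call segments and the model input) and computes RMS energy per window. The names `split_into_windows` and `rms_energy` are hypothetical, and fixed non-overlapping windowing is an assumption; the ADR does not prescribe a segmentation algorithm.

```rust
/// Sample rate and window length taken from the figures stated in this ADR.
const SAMPLE_RATE_HZ: usize = 32_000;
const WINDOW_SECONDS: usize = 5;
const WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ * WINDOW_SECONDS; // 160_000

/// Split a mono buffer into non-overlapping fixed-length windows.
/// A trailing partial window is dropped (an assumption; padding is another option).
fn split_into_windows(samples: &[f32]) -> Vec<&[f32]> {
    samples.chunks_exact(WINDOW_SAMPLES).collect()
}

/// Root-mean-square energy of one window; returns 0.0 for an empty slice.
fn rms_energy(window: &[f32]) -> f32 {
    if window.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = window.iter().map(|s| s * s).sum();
    (sum_sq / window.len() as f32).sqrt()
}
```

A real segmenter would more likely detect vocalization onsets than cut on a fixed grid; the fixed grid only makes the window size concrete.
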
#### Ubiquitous Language

| Term | Definition |
|------|------------|
| **Recording** | A continuous audio capture from a sensor at a specific location and time |
| **Sensor** | A physical audio capture device with known characteristics (sample rate, gain, location) |
| **Call Segment** | An isolated vocalization extracted from a recording (typically 5 seconds at 32 kHz) |
| **Segmentation** | The process of detecting and extracting individual vocalizations from continuous audio |
| **SNR (Signal-to-Noise Ratio)** | Quality metric indicating clarity of vocalization above background noise |
| **Preprocessing** | Normalization, resampling, and filtering applied before embedding |
| **Habitat** | Environmental classification of the recording location |
| **Soundscape** | The full acoustic environment including all sound sources |

#### Aggregates and Entities

```
Aggregate: Recording
├── Entity: Recording (Aggregate Root)
│   ├── id: RecordingId (UUID)
│   ├── sensorId: SensorId
│   ├── location: GeoLocation {lat, lon, altitude}
│   ├── startTimestamp: DateTime
│   ├── duration: Duration
│   ├── habitat: HabitatType
│   ├── weather: WeatherConditions
│   ├── format: AudioFormat {sampleRate, channels, bitDepth}
│   └── status: IngestionStatus
│
├── Value Object: AudioFormat
│   ├── sampleRate: u32 (target: 32000 Hz)
│   ├── channels: u8 (target: 1 mono)
│   └── bitDepth: u8
│
└── Value Object: WeatherConditions
    ├── temperature: f32
    ├── humidity: f32
    ├── windSpeed: f32
    └── precipitation: PrecipitationType

Aggregate: CallSegment
├── Entity: CallSegment (Aggregate Root)
│   ├── id: SegmentId (UUID)
│   ├── recordingId: RecordingId
│   ├── startOffset: Duration (t0_ms)
│   ├── endOffset: Duration (t1_ms)
│   ├── snr: f32
│   ├── energy: f32
│   ├── clippingScore: f32
│   ├── overlapScore: f32
│   └── qualityGrade: QualityGrade
│
└── Value Object: SegmentMetrics
    ├── peakAmplitude: f32
    ├── rmsEnergy: f32
    ├── zeroCrossingRate: f32
    └── spectralCentroid: f32

Aggregate: Sensor
├── Entity: Sensor (Aggregate Root)
│   ├── id: SensorId
│   ├── model: String
│   ├── location: GeoLocation
│   ├── calibration: CalibrationProfile
│   └── status: SensorStatus
│
└── Value Object: CalibrationProfile
    ├── frequencyResponse: Vec<(f32, f32)>
    ├── noiseFloor: f32
    └── lastCalibrated: DateTime
```

#### Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| `RecordingReceived` | recordingId, sensorId, timestamp, duration | New audio file uploaded/streamed |
| `RecordingValidated` | recordingId, format, qualityScore | Format and quality checks pass |
| `RecordingRejected` | recordingId, reason, details | Recording fails validation |
| `SegmentationStarted` | recordingId, algorithm, parameters | Segmentation process begins |
| `SegmentExtracted` | segmentId, recordingId, timeRange, snr | Individual call isolated |
| `SegmentationCompleted` | recordingId, segmentCount, duration | All segments extracted |
| `PreprocessingCompleted` | segmentId, normalizedFormat | Segment ready for embedding |

#### Services

```rust
// Domain Services
trait SegmentationService {
    fn segment_recording(
        recording: &Recording,
        config: SegmentationConfig,
    ) -> Result<Vec<CallSegment>, SegmentationError>;
    fn detect_vocalizations(audio: &AudioBuffer) -> Vec;
}

trait PreprocessingService {
    fn normalize(segment: &CallSegment) -> NormalizedAudio;
    fn resample(audio: &AudioBuffer, targetRate: u32) -> AudioBuffer;
    fn apply_bandpass(audio: &AudioBuffer, lowHz: f32, highHz: f32) -> AudioBuffer;
}

trait QualityAssessmentService {
    fn compute_snr(segment: &CallSegment) -> f32;
    fn detect_clipping(segment: &CallSegment) -> f32;
    fn assess_quality(segment: &CallSegment) -> QualityGrade;
}
```

---

### 2. Embedding Context

**Purpose**: Transform preprocessed audio segments into 1536-dimensional Perch 2.0 embeddings suitable for vector space operations.
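
To make "suitable for vector space operations" concrete, the sketch below checks that a vector has the 1536 dimensions this ADR states and L2-normalizes it so that cosine similarity reduces to a dot product. The function `l2_normalize` and the error type `NormalizeError` are hypothetical stand-ins, not the signatures of the services defined in this context.

```rust
/// Embedding dimensionality stated in this ADR for Perch 2.0.
const EMBEDDING_DIM: usize = 1536;

/// Errors for this sketch (hypothetical; not part of the ADR).
#[derive(Debug, PartialEq)]
enum NormalizeError {
    WrongDimension { expected: usize, actual: usize },
    DegenerateNorm,
}

/// Return an L2-normalized copy of `vector` together with its original norm.
fn l2_normalize(vector: &[f32]) -> Result<(Vec<f32>, f32), NormalizeError> {
    if vector.len() != EMBEDDING_DIM {
        return Err(NormalizeError::WrongDimension {
            expected: EMBEDDING_DIM,
            actual: vector.len(),
        });
    }
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    // A zero or non-finite norm cannot be scaled to unit length.
    if norm == 0.0 || !norm.is_finite() {
        return Err(NormalizeError::DegenerateNorm);
    }
    Ok((vector.iter().map(|x| x / norm).collect(), norm))
}
```

Returning the original norm alongside the unit vector mirrors the `norm` field the Embedding aggregate carries.
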
#### Ubiquitous Language

| Term | Definition |
|------|------------|
| **Embedding** | A 1536-dimensional vector representation of a call segment |
| **Perch 2.0** | Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone) |
| **Mel Spectrogram** | Time-frequency representation using mel-scaled frequency bins (500 frames × 128 bins) |
| **Inference** | The process of generating an embedding from audio input |
| **Normalization** | L2 normalization of embedding vectors for cosine similarity |
| **Model Version** | Specific checkpoint/version of the embedding model |
| **Batch** | Collection of segments processed together for efficiency |
| **Embedding Stability** | Consistency of embeddings for identical/similar inputs |

#### Aggregates and Entities

```
Aggregate: Embedding
├── Entity: Embedding (Aggregate Root)
│   ├── id: EmbeddingId (UUID)
│   ├── segmentId: SegmentId
│   ├── vector: Vec<f32> (dim=1536)
│   ├── modelVersion: ModelVersion
│   ├── norm: f32
│   ├── createdAt: DateTime
│   └── metadata: EmbeddingMetadata
│
└── Value Object: EmbeddingMetadata
    ├── inferenceLatency: Duration
    ├── batchId: Option<BatchId>
    └── gpuUsed: bool

Aggregate: EmbeddingModel
├── Entity: EmbeddingModel (Aggregate Root)
│   ├── id: ModelId
│   ├── name: "perch2"
│   ├── version: SemanticVersion
│   ├── dimensions: u32 (1536)
│   ├── inputSpec: InputSpecification
│   └── status: ModelStatus
│
├── Value Object: InputSpecification
│   ├── sampleRate: 32000
│   ├── windowDuration: 5.0 seconds
│   ├── windowSamples: 160000
│   ├── melBins: 128
│   └── frequencyRange: (60, 16000) Hz
│
└── Value Object: ModelCheckpoint
    ├── path: String
    ├── format: ModelFormat (ONNX)
    └── checksum: String

Aggregate: EmbeddingBatch
├── Entity: EmbeddingBatch (Aggregate Root)
│   ├── id: BatchId
│   ├── segmentIds: Vec<SegmentId>
│   ├── status: BatchStatus
│   ├── startedAt: DateTime
│   ├── completedAt: Option<DateTime>
│   └── metrics: BatchMetrics
│
└── Value Object: BatchMetrics
    ├── totalSegments: u32
    ├── successCount: u32
    ├── failureCount: u32
    ├── avgLatencyMs: f32
    └── throughput: f32
```

#### Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| `EmbeddingRequested` | segmentId, modelVersion, priority | Segment queued for embedding |
| `BatchCreated` | batchId, segmentIds, modelVersion | Batch assembled for processing |
| `InferenceStarted` | embeddingId/batchId, modelVersion | Model inference begins |
| `EmbeddingGenerated` | embeddingId, segmentId, vector, norm | Single embedding computed |
| `BatchCompleted` | batchId, successCount, failureCount | Batch processing finishes |
| `EmbeddingFailed` | segmentId, error, retryable | Inference failure |
| `ModelVersionChanged` | oldVersion, newVersion, migrationRequired | Model updated |
| `EmbeddingNormalized` | embeddingId, originalNorm, normalizedVector | L2 normalization applied |

#### Services

```rust
// Domain Services
trait EmbeddingService {
    fn embed_segment(segment: &NormalizedAudio, model: &EmbeddingModel) -> Result;
    fn embed_batch(segments: Vec<&NormalizedAudio>, model: &EmbeddingModel) -> Vec>;
}

trait SpectrogramService {
    fn compute_mel_spectrogram(audio: &AudioBuffer) -> MelSpectrogram;
    fn validate_spectrogram(spectrogram: &MelSpectrogram) -> ValidationResult;
}

trait NormalizationService {
    fn l2_normalize(embedding: &Embedding) -> NormalizedEmbedding;
    fn validate_norm_stability(embeddings: &[Embedding]) -> StabilityReport;
}

trait ModelManagementService {
    fn load_model(version: &ModelVersion) -> Result;
    fn validate_model_output(embedding: &Embedding) -> ValidationResult;
    fn compare_model_versions(
        v1: &ModelVersion,
        v2: &ModelVersion,
        samples: &[AudioBuffer],
    ) -> VersionComparisonReport;
}
```

---

### 3. Vector Space Context

**Purpose**: Index embeddings using HNSW, manage similarity search, and maintain the navigable neighbor graph that forms the geometric foundation.
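
HNSW returns approximate nearest neighbors, so an exact baseline is useful when checking recall at a context boundary. The sketch below is such a baseline: cosine distance plus a brute-force k-NN scan. It does not use RuVector or any HNSW library; `cosine_distance` and `brute_force_knn` are hypothetical free functions written only for illustration, and returning 1.0 for a zero-norm vector is a convention chosen for this sketch.

```rust
/// Cosine distance = 1 - cosine similarity.
/// Returns 1.0 if either vector has zero norm (a convention chosen for this sketch).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Exact k-NN by scanning every stored vector.
/// Returns (position in `vectors`, distance) pairs sorted by ascending distance.
fn brute_force_knn(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_distance(v, query)))
        .collect();
    scored.sort_by(|left, right| left.1.total_cmp(&right.1));
    scored.truncate(k);
    scored
}
```

Comparing the ids returned by an approximate search against this scan on a small sample gives a recall estimate for a given `ef` setting.
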
#### Ubiquitous Language

| Term | Definition |
|------|------------|
| **HNSW Index** | Hierarchical Navigable Small World graph for approximate nearest neighbor search |
| **Neighbor Graph** | Network of similarity edges connecting acoustically related embeddings |
| **k-NN Query** | Search for k nearest neighbors to a query vector |
| **Similarity Edge** | Weighted connection between two embeddings based on distance |
| **Distance Metric** | Function measuring dissimilarity (cosine, Euclidean, Poincaré) |
| **Index Layer** | One level in the HNSW hierarchical structure |
| **Entry Point** | Starting node for graph traversal in search |
| **ef (Search)** | Exploration factor controlling search accuracy vs. speed |
| **M (Construction)** | Maximum number of connections per node per layer |

#### Aggregates and Entities

```
Aggregate: VectorIndex
├── Entity: VectorIndex (Aggregate Root)
│   ├── id: IndexId
│   ├── name: String
│   ├── dimensions: u32 (1536)
│   ├── distanceMetric: DistanceMetric
│   ├── hnswConfig: HnswConfiguration
│   ├── vectorCount: u64
│   ├── layerCount: u32
│   └── status: IndexStatus
│
├── Value Object: HnswConfiguration
│   ├── m: u32 (max connections per layer)
│   ├── efConstruction: u32
│   ├── efSearch: u32
│   └── maxLayers: u32
│
└── Value Object: IndexStatistics
    ├── memoryUsage: u64
    ├── avgDegree: f32
    ├── layerDistribution: Vec
    └── searchLatencyP99: Duration

Aggregate: IndexedVector
├── Entity: IndexedVector (Aggregate Root)
│   ├── id: VectorId
│   ├── embeddingId: EmbeddingId
│   ├── indexId: IndexId
│   ├── layerMembership: Vec
│   ├── neighborIds: Vec<VectorId>
│   └── insertedAt: DateTime
│
└── Value Object: VectorPosition
    ├── entryDistance: f32
    └── layerDistances: Vec<f32>

Aggregate: SimilarityEdge
├── Entity: SimilarityEdge (Aggregate Root)
│   ├── id: EdgeId
│   ├── sourceId: VectorId
│   ├── targetId: VectorId
│   ├── distance: f32
│   ├── edgeType: EdgeType (SIMILAR, HNSW_NEIGHBOR)
│   └── weight: f32
│
└── Value Object: EdgeMetadata
    ├── createdAt: DateTime
    ├── lastAccessed: DateTime
    └── accessCount: u32

Aggregate: SearchQuery
├── Entity: SearchQuery (Aggregate Root)
│   ├── id: QueryId
│   ├── queryVector: Vec<f32>
│   ├── k: u32
│   ├── efSearch: u32
│   ├── filters: Vec
│   └── results: Option<SearchResults>
│
└── Value Object: SearchResults
    ├── neighbors: Vec<(VectorId, f32)>
    ├── searchLatency: Duration
    ├── nodesVisited: u32
    └── distanceComputations: u32
```

#### Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| `IndexCreated` | indexId, config, distanceMetric | New HNSW index initialized |
| `VectorInserted` | vectorId, embeddingId, indexId, layerAssignment | Embedding added to index |
| `VectorRemoved` | vectorId, indexId | Embedding removed from index |
| `NeighborGraphUpdated` | indexId, affectedVectors, newEdges | Graph structure modified |
| `SimilarityEdgeCreated` | edgeId, sourceId, targetId, distance | New similarity link established |
| `SearchExecuted` | queryId, k, latency, resultsCount | k-NN search completed |
| `IndexRebuildStarted` | indexId, reason, estimatedDuration | Index reconstruction begins |
| `IndexRebuildCompleted` | indexId, vectorCount, duration | Index reconstruction finishes |
| `IndexOptimized` | indexId, beforeStats, afterStats | Index compaction/optimization |

#### Services

```rust
// Domain Services
trait VectorIndexService {
    fn create_index(config: IndexConfiguration) -> Result;
    fn insert_vector(index: &mut VectorIndex, embedding: &Embedding) -> Result;
    fn remove_vector(index: &mut VectorIndex, vectorId: VectorId) -> Result<(), RemovalError>;
    fn rebuild_index(index: &mut VectorIndex) -> Result;
}

trait SimilaritySearchService {
    fn knn_search(index: &VectorIndex, query: &[f32], k: u32, ef: u32) -> SearchResults;
    fn range_search(index: &VectorIndex, query: &[f32], radius: f32) -> Vec<(VectorId, f32)>;
    fn batch_search(index: &VectorIndex, queries: &[Vec<f32>], k: u32) -> Vec<SearchResults>;
}

trait NeighborGraphService {
    fn get_neighbors(vectorId: VectorId, depth: u32) -> NeighborGraph;
    fn compute_similarity_edges(index: &VectorIndex, topK: u32) -> Vec<SimilarityEdge>;
    fn prune_edges(index: &mut VectorIndex, threshold: f32) -> u32;
}

trait DistanceService {
    fn cosine_distance(a: &[f32], b: &[f32]) -> f32;
    fn euclidean_distance(a: &[f32], b: &[f32]) -> f32;
    fn poincare_distance(a: &[f32], b: &[f32], curvature: f32) -> f32;
}
```

---

### 4. Learning Context

**Purpose**: Train and apply GNN models to refine embeddings, learn transition patterns, and enable continuous self-improvement of the vector space.

#### Ubiquitous Language

| Term | Definition |
|------|------------|
| **GNN (Graph Neural Network)** | Neural network operating on graph-structured data |
| **Message Passing** | GNN mechanism where nodes aggregate information from neighbors |
| **Graph Attention (GAT)** | Attention-weighted message passing for learnable edge importance |
| **Training Epoch** | One complete pass through the training data |
| **Contrastive Loss** | Loss function pulling similar pairs together and pushing dissimilar pairs apart |
| **InfoNCE** | Information Noise-Contrastive Estimation loss for self-supervised learning |
| **Embedding Refinement** | GNN-driven adjustment of embedding positions in vector space |
| **Transition Edge** | Temporal connection between sequential call segments |
| **EWC (Elastic Weight Consolidation)** | Technique preventing catastrophic forgetting during updates |

#### Aggregates and Entities

```
Aggregate: LearningModel
├── Entity: LearningModel (Aggregate Root)
│   ├── id: ModelId
│   ├── architecture: GnnArchitecture (GAT, GraphSAGE, GCN)
│   ├── layers: Vec<LayerConfig>
│   ├── version: SemanticVersion
│   ├── trainedAt: DateTime
│   ├── metrics: TrainingMetrics
│   └── status: ModelStatus
│
├── Value Object: LayerConfig
│   ├── layerType: LayerType
│   ├── inputDim: u32
│   ├── outputDim: u32
│   ├── heads: u32 (for attention)
│   └── dropout: f32
│
└── Value Object: TrainingMetrics
    ├── epochs: u32
    ├── finalLoss: f32
    ├── validationScore: f32
    └── trainingDuration: Duration

Aggregate: TrainingSession
├── Entity: TrainingSession (Aggregate Root)
│   ├── id: SessionId
│   ├── modelId: ModelId
│   ├── config: TrainingConfiguration
│   ├── currentEpoch: u32
│   ├── status: SessionStatus
│   └── checkpoints: Vec<Checkpoint>
│
├── Value Object: TrainingConfiguration
│   ├── learningRate: f32
│   ├── batchSize: u32
│   ├── maxEpochs: u32
│   ├── lossFunction: LossType (InfoNCE, Triplet, Contrastive)
│   ├── optimizer: OptimizerType
│   └── ewcEnabled: bool
│
└── Value Object: Checkpoint
    ├── epoch: u32
    ├── loss: f32
    ├── weightsPath: String
    └── timestamp: DateTime

Aggregate: TransitionGraph
├── Entity: TransitionGraph (Aggregate Root)
│   ├── id: GraphId
│   ├── nodeCount: u32
│   ├── edgeCount: u32
│   ├── edgeTypes: Vec<EdgeType>
│   └── statistics: GraphStatistics
│
├── Entity: TransitionEdge
│   ├── id: EdgeId
│   ├── sourceSegmentId: SegmentId
│   ├── targetSegmentId: SegmentId
│   ├── edgeType: EdgeType (NEXT, SIMILAR, CO_OCCURRENCE)
│   ├── weight: f32
│   └── metadata: EdgeMetadata
│
└── Value Object: GraphStatistics
    ├── avgDegree: f32
    ├── clusteringCoefficient: f32
    ├── diameter: u32
    └── componentCount: u32

Aggregate: RefinedEmbedding
├── Entity: RefinedEmbedding (Aggregate Root)
│   ├── id: RefinedEmbeddingId
│   ├── originalEmbeddingId: EmbeddingId
│   ├── refinedVector: Vec<f32>
│   ├── modelVersion: ModelVersion
│   ├── refinementDelta: f32
│   └── createdAt: DateTime
│
└── Value Object: RefinementMetadata
    ├── neighborInfluence: Vec<(EmbeddingId, f32)>
    ├── attentionWeights: Vec<f32>
    └── iterations: u32
```

#### Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| `TrainingSessionStarted` | sessionId, modelId, config | GNN training begins |
| `EpochCompleted` | sessionId, epoch, loss, metrics | Training epoch finishes |
| `CheckpointSaved` | sessionId, epoch, path | Model weights saved |
| `TrainingCompleted` | sessionId, finalMetrics | Training session ends |
| `ModelDeployed` | modelId, version | New model activated |
| `EmbeddingsRefined` | batchId, vectorCount, avgDelta | GNN refinement applied |
| `TransitionEdgeDiscovered` | edgeId, source, target, type | New temporal relationship |
| `GraphStructureUpdated` | graphId, addedEdges, removedEdges | Transition graph modified |
| `LearningRateAdjusted` | sessionId, oldLr, newLr | Adaptive LR change |
| `EwcConsolidated` | sessionId, importantWeights | EWC protection updated |

#### Services

```rust
// Domain Services
trait GnnTrainingService {
    fn start_training(
        model: &LearningModel,
        graph: &TransitionGraph,
        config: TrainingConfiguration,
    ) -> Result;
    fn run_epoch(session: &mut TrainingSession, batch: &GraphBatch) -> EpochResult;
    fn save_checkpoint(session: &TrainingSession) -> Result;
    fn apply_ewc(session: &mut TrainingSession, importanceMatrix: &ImportanceMatrix);
}

trait EmbeddingRefinementService {
    fn refine_embeddings(
        embeddings: &[Embedding],
        model: &LearningModel,
        graph: &TransitionGraph,
    ) -> Vec<RefinedEmbedding>;
    fn compute_refinement_delta(original: &Embedding, refined: &RefinedEmbedding) -> f32;
}

trait TransitionGraphService {
    fn build_transition_graph(segments: &[CallSegment], recordings: &[Recording]) -> TransitionGraph;
    fn add_temporal_edges(graph: &mut TransitionGraph, sequences: &[SegmentSequence]);
    fn add_similarity_edges(graph: &mut TransitionGraph, index: &VectorIndex, topK: u32);
    fn compute_graph_statistics(graph: &TransitionGraph) -> GraphStatistics;
}

trait AttentionService {
    fn compute_attention_weights(query: &Embedding, neighbors: &[Embedding]) -> Vec<f32>;
    fn apply_graph_attention(node: &GraphNode, neighbors: &[GraphNode], model: &GatLayer) -> AttentionOutput;
}
```

---

### 5. Analysis Context

**Purpose**: Perform clustering, motif detection, sequence mining, and pattern discovery on the refined vector space.
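
One part of sequence mining can be sketched briefly: counting first-order transitions between consecutive cluster labels and estimating an entropy rate from them. The functions `transition_counts` and `entropy_rate` are hypothetical, cluster ids are plain `u32` values for brevity, and weighting each row by its share of observed transitions (rather than by a stationary distribution) is an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Count first-order transitions between consecutive cluster labels.
fn transition_counts(sequence: &[u32]) -> BTreeMap<(u32, u32), u32> {
    let mut counts = BTreeMap::new();
    for pair in sequence.windows(2) {
        *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
    }
    counts
}

/// Entropy rate in bits per transition:
/// H = -sum_i p(i) * sum_j P(j|i) * log2 P(j|i),
/// where p(i) is the empirical share of transitions leaving state i.
fn entropy_rate(counts: &BTreeMap<(u32, u32), u32>) -> f64 {
    // Total number of observed transitions leaving each source state.
    let mut row_totals: BTreeMap<u32, u32> = BTreeMap::new();
    for (&(from, _), &n) in counts {
        *row_totals.entry(from).or_insert(0) += n;
    }
    let total: u32 = row_totals.values().sum();
    if total == 0 {
        return 0.0;
    }
    let mut h = 0.0;
    for (&(from, _), &n) in counts {
        let p_row = row_totals[&from] as f64 / total as f64;
        let p_cond = n as f64 / row_totals[&from] as f64;
        h -= p_row * p_cond * p_cond.log2();
    }
    h
}
```

A strictly alternating sequence gives an entropy rate of zero, while a sequence in which each state is followed by either state equally often gives one bit per transition.
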
#### Ubiquitous Language

| Term | Definition |
|------|------------|
| **Cluster** | Group of acoustically similar call segments |
| **Prototype** | Representative embedding for a cluster (centroid or medoid) |
| **Exemplar** | Actual call segment that best represents a cluster |
| **Motif** | Recurring pattern or phrase in vocalization sequences |
| **Sequence** | Ordered series of call segments from a recording |
| **Transition Matrix** | Probability matrix of call-to-call transitions |
| **Entropy Rate** | Measure of unpredictability in vocalization sequences |
| **Call Type** | Functional category of vocalization (alarm, contact, song) |
| **Dialect** | Regional variation in vocalization patterns |

#### Aggregates and Entities

```
Aggregate: Cluster
├── Entity: Cluster (Aggregate Root)
│   ├── id: ClusterId
│   ├── method: ClusteringMethod (HDBSCAN, KMeans, Spectral)
│   ├── parameters: ClusteringParameters
│   ├── memberCount: u32
│   ├── cohesion: f32
│   ├── separation: f32
│   └── status: ClusterStatus
│
├── Entity: Prototype
│   ├── id: PrototypeId
│   ├── clusterId: ClusterId
│   ├── centroidVector: Vec<f32>
│   ├── exemplarIds: Vec
│   └── stability: f32
│
└── Value Object: ClusteringParameters
    ├── minClusterSize: u32
    ├── minSamples: u32
    ├── epsilon: Option<f32>
    └── metric: DistanceMetric

Aggregate: ClusterAssignment
├── Entity: ClusterAssignment (Aggregate Root)
│   ├── id: AssignmentId
│   ├── segmentId: SegmentId
│   ├── clusterId: ClusterId
│   ├── confidence: f32
│   ├── distanceToCentroid: f32
│   └── assignedAt: DateTime
│
└── Value Object: SoftAssignment
    ├── clusterProbabilities: Vec<(ClusterId, f32)>
    └── isAmbiguous: bool

Aggregate: Motif
├── Entity: Motif (Aggregate Root)
│   ├── id: MotifId
│   ├── pattern: Vec
│   ├── occurrenceCount: u32
│   ├── avgDuration: Duration
│   ├── confidence: f32
│   └── context: MotifContext
│
├── Value Object: MotifOccurrence
│   ├── recordingId: RecordingId
│   ├── startSegmentId: SegmentId
│   ├── segmentIds: Vec<SegmentId>
│   └── timestamp: DateTime
│
└── Value Object: MotifContext
    ├── typicalHabitat: Vec
    ├── timeOfDay: Vec
    └── associatedBehavior: Option

Aggregate: SequenceAnalysis
├── Entity: SequenceAnalysis (Aggregate Root)
│   ├── id: AnalysisId
│   ├── recordingId: RecordingId
│   ├── segmentSequence: Vec<SegmentId>
│   ├── clusterSequence: Vec<ClusterId>
│   ├── transitionMatrix: TransitionMatrix
│   └── metrics: SequenceMetrics
│
├── Value Object: TransitionMatrix
│   ├── clusterIds: Vec<ClusterId>
│   ├── probabilities: Vec<Vec<f32>>
│   └── observations: Vec>
│
└── Value Object: SequenceMetrics
    ├── entropyRate: f32
    ├── stereotypy: f32
    ├── motifDensity: f32
    └── uniqueTransitions: u32

Aggregate: Anomaly
├── Entity: Anomaly (Aggregate Root)
│   ├── id: AnomalyId
│   ├── segmentId: SegmentId
│   ├── anomalyType: AnomalyType (Rare, Novel, Artifact)
│   ├── score: f32
│   ├── nearestCluster: Option<ClusterId>
│   └── detectedAt: DateTime
│
└── Value Object: AnomalyContext
    ├── neighborDistances: Vec<f32>
    ├── localDensity: f32
    └── globalRarity: f32
```

#### Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| `ClusteringStarted` | clusterId, method, parameters | Clustering analysis begins |
| `ClusteringCompleted` | clusterId, clusterCount, metrics | Clustering finishes |
| `ClusterAssigned` | assignmentId, segmentId, clusterId, confidence | Segment assigned to cluster |
| `PrototypeUpdated` | prototypeId, clusterId, newCentroid | Cluster representative changed |
| `MotifDiscovered` | motifId, pattern, occurrenceCount | New recurring pattern found |
| `MotifOccurrenceFound` | motifId, recordingId, segmentIds | Motif instance detected |
| `SequenceAnalyzed` | analysisId, recordingId, entropyRate | Sequence metrics computed |
| `AnomalyDetected` | anomalyId, segmentId, score, type | Unusual vocalization found |
| `TransitionMatrixUpdated` | recordingId, entropyChange | Transition probabilities recalculated |
| `DialectIdentified` | clusterId, region, distinctiveness | Regional variant discovered |

#### Services

```rust
// Domain Services
trait ClusteringService {
    fn cluster_embeddings(
        embeddings: &[Embedding],
        method: ClusteringMethod,
        params: ClusteringParameters,
    ) -> ClusteringResult;
    fn assign_to_cluster(embedding: &Embedding, clusters: &[Cluster]) -> ClusterAssignment;
    fn compute_prototype(cluster: &Cluster, members: &[Embedding]) -> Prototype;
    fn evaluate_clustering(clusters: &[Cluster], assignments: &[ClusterAssignment]) -> ClusteringMetrics;
}

trait MotifDetectionService {
    fn discover_motifs(sequences: &[SequenceAnalysis], minSupport: u32, maxLength: u32) -> Vec<Motif>;
    fn find_motif_occurrences(motif: &Motif, sequence: &SequenceAnalysis) -> Vec<MotifOccurrence>;
    fn validate_motif_dtw(motif: &Motif, occurrences: &[MotifOccurrence]) -> ValidationResult;
}

trait SequenceAnalysisService {
    fn analyze_sequence(
        recording: &Recording,
        segments: &[CallSegment],
        clusters: &[Cluster],
    ) -> SequenceAnalysis;
    fn compute_transition_matrix(clusterSequence: &[ClusterId]) -> TransitionMatrix;
    fn compute_entropy_rate(matrix: &TransitionMatrix) -> f32;
    fn compute_stereotypy(matrix: &TransitionMatrix) -> f32;
}

trait AnomalyDetectionService {
    fn detect_anomalies(embeddings: &[Embedding], index: &VectorIndex, threshold: f32) -> Vec<Anomaly>;
    fn classify_anomaly(anomaly: &Anomaly, context: &AnalysisContext) -> AnomalyType;
    fn compute_local_outlier_factor(embedding: &Embedding, neighbors: &[Embedding]) -> f32;
}
```

---

### 6. Interpretation Context

**Purpose**: Generate RAB (Retrieval-Augmented Bioacoustics) evidence packs and constrained interpretations with full citation and transparency.
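
The "full citation" requirement can be read as an invariant: every statement cites at least one source, and every cited source is present in the evidence pack. The sketch below checks that invariant. The `Citation` and `Statement` structs are simplified, hypothetical stand-ins for the aggregates defined in this context, and `validate_statement` is not one of the ADR's service signatures.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the ADR's Citation value object (hypothetical shape).
struct Citation {
    source_id: String,
}

/// Simplified stand-in for an interpretation statement (hypothetical shape).
struct Statement {
    content: String,
    citations: Vec<Citation>,
}

#[derive(Debug, PartialEq)]
enum CitationError {
    /// The statement carries no citations at all.
    Uncited,
    /// A citation points at a source that is not in the evidence pack.
    UnknownSource(String),
}

/// Accept a statement only if it cites at least one source and every cited
/// source id is present in the evidence pack.
fn validate_statement(
    statement: &Statement,
    evidence_ids: &HashSet<String>,
) -> Result<(), CitationError> {
    if statement.citations.is_empty() {
        return Err(CitationError::Uncited);
    }
    for citation in &statement.citations {
        if !evidence_ids.contains(&citation.source_id) {
            return Err(CitationError::UnknownSource(citation.source_id.clone()));
        }
    }
    Ok(())
}
```

Running a check of this kind before an interpretation is published keeps uncited or dangling statements from leaving the context.
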
#### Ubiquitous Language | Term | Definition | |------|------------| | **Evidence Pack** | Structured collection of supporting data for an interpretation | | **Citation** | Reference to specific retrieved calls supporting a statement | | **Constrained Generation** | Output limited to evidence-backed structural descriptions | | **Structural Descriptor** | Objective characterization (pitch contour, rhythm, spectral texture) | | **Interpretation** | Evidence-backed analysis of vocalization meaning/context | | **Confidence Level** | Certainty measure based on evidence quality and quantity | | **Attribution** | Clear link between interpretation statements and source evidence | | **Hypothesis** | Testable suggestion generated from pattern analysis | #### Aggregates and Entities ``` Aggregate: EvidencePack ├── Entity: EvidencePack (Aggregate Root) │ ├── id: EvidencePackId │ ├── querySegmentId: SegmentId │ ├── queryType: QueryType (Segment, TimeInterval, Habitat) │ ├── retrievedNeighbors: Vec │ ├── clusterExemplars: Vec │ ├── sequenceContext: SequenceContext │ ├── signalQuality: SignalQuality │ └── generatedAt: DateTime │ ├── Value Object: RetrievedNeighbor │ ├── segmentId: SegmentId │ ├── distance: f32 │ ├── clusterId: Option │ ├── spectrogramThumbnail: Option │ └── metadata: SegmentMetadata │ ├── Value Object: SequenceContext │ ├── previousSegments: Vec │ ├── nextSegments: Vec │ ├── positionInRecording: f32 │ └── localMotifs: Vec │ └── Value Object: SignalQuality ├── snr: f32 ├── clippingScore: f32 ├── overlapScore: f32 └── qualityGrade: QualityGrade Aggregate: Interpretation ├── Entity: Interpretation (Aggregate Root) │ ├── id: InterpretationId │ ├── evidencePackId: EvidencePackId │ ├── interpretationType: InterpretationType │ ├── statements: Vec │ ├── overallConfidence: f32 │ ├── hypotheses: Vec │ └── generatedAt: DateTime │ ├── Entity: InterpretationStatement │ ├── id: StatementId │ ├── content: String │ ├── statementType: StatementType │ ├── citations: Vec │ ├── 
confidence: f32 │ └── constraints: Vec │ ├── Value Object: Citation │ ├── sourceType: CitationSource (Neighbor, Exemplar, Motif, Cluster) │ ├── sourceId: String │ ├── relevance: f32 │ └── excerpt: Option │ └── Value Object: Hypothesis ├── statement: String ├── testability: TestabilityLevel ├── supportingEvidence: Vec └── suggestedExperiment: Option Aggregate: StructuralDescriptor ├── Entity: StructuralDescriptor (Aggregate Root) │ ├── id: DescriptorId │ ├── segmentId: SegmentId │ ├── pitchContour: PitchContourStats │ ├── rhythmProfile: RhythmProfile │ ├── spectralTexture: SpectralTexture │ └── sequenceRole: SequenceRole │ ├── Value Object: PitchContourStats │ ├── minFrequency: f32 │ ├── maxFrequency: f32 │ ├── meanFrequency: f32 │ ├── contourShape: ContourShape │ └── bandwidth: f32 │ ├── Value Object: RhythmProfile │ ├── duration: Duration │ ├── syllableCount: u32 │ ├── interSyllableIntervals: Vec │ └── rhythmRegularity: f32 │ ├── Value Object: SpectralTexture │ ├── harmonicity: f32 │ ├── spectralCentroid: f32 │ ├── spectralFlatness: f32 │ └── wienerEntropy: f32 │ └── Value Object: SequenceRole ├── typicalPredecessors: Vec ├── typicalSuccessors: Vec ├── positionDistribution: PositionDistribution └── contextualFrequency: f32 Aggregate: MonitoringSummary ├── Entity: MonitoringSummary (Aggregate Root) │ ├── id: SummaryId │ ├── timeRange: TimeRange │ ├── location: GeoLocation │ ├── callCounts: HashMap │ ├── diversityMetrics: DiversityMetrics │ ├── anomalies: Vec │ └── interpretations: Vec │ └── Value Object: DiversityMetrics ├── speciesRichness: u32 ├── shannonIndex: f32 ├── simpsonIndex: f32 └── evenness: f32 ``` #### Domain Events | Event | Payload | Published When | |-------|---------|----------------| | `EvidencePackRequested` | querySegmentId, queryType, parameters | Analysis request initiated | | `EvidencePackAssembled` | evidencePackId, neighborCount, exemplarCount | Evidence gathering complete | | `InterpretationGenerated` | interpretationId, evidencePackId, 
statementCount | Interpretation created | | `StatementCited` | statementId, citations | Statement linked to evidence | | `HypothesisProposed` | hypothesisId, interpretationId, testability | Testable hypothesis generated | | `StructuralDescriptorComputed` | descriptorId, segmentId | Acoustic features extracted | | `MonitoringSummaryGenerated` | summaryId, timeRange, location | Period summary created | | `AnnotationSuggested` | segmentId, suggestedLabel, confidence | Label recommendation made | | `InterpretationValidated` | interpretationId, validationResult | Expert review completed | #### Services ```rust // Domain Services trait EvidencePackService { fn assemble_evidence_pack( querySegment: &CallSegment, index: &VectorIndex, clusters: &[Cluster], sequences: &[SequenceAnalysis], config: EvidencePackConfig ) -> EvidencePack; fn retrieve_neighbors(segment: &CallSegment, index: &VectorIndex, k: u32) -> Vec; fn get_sequence_context(segment: &CallSegment, recording: &Recording) -> SequenceContext; } trait InterpretationService { fn generate_interpretation(evidencePack: &EvidencePack, constraints: &[Constraint]) -> Interpretation; fn create_statement(content: &str, citations: &[Citation], statementType: StatementType) -> InterpretationStatement; fn generate_hypotheses(evidencePack: &EvidencePack, interpretation: &Interpretation) -> Vec; } trait StructuralDescriptorService { fn compute_descriptors(segment: &CallSegment) -> StructuralDescriptor; fn extract_pitch_contour(audio: &AudioBuffer) -> PitchContourStats; fn analyze_rhythm(segments: &[CallSegment]) -> RhythmProfile; fn compute_spectral_texture(spectrogram: &MelSpectrogram) -> SpectralTexture; } trait MonitoringService { fn generate_summary( recordings: &[Recording], timeRange: TimeRange, location: GeoLocation ) -> MonitoringSummary; fn compute_diversity_metrics(clusterAssignments: &[ClusterAssignment]) -> DiversityMetrics; fn detect_temporal_patterns(summaries: &[MonitoringSummary]) -> Vec; } trait CitationService { 
fn create_citation(source: CitationSource, sourceId: &str, relevance: f32) -> Citation; fn validate_citation(citation: &Citation, evidencePack: &EvidencePack) -> ValidationResult; fn format_attribution(statement: &InterpretationStatement) -> String; } ``` --- ## Context Mapping ### Relationships Between Contexts ``` ┌─────────────────────────────────────────────────────────────────────────────────┐ │ CONTEXT MAP │ └─────────────────────────────────────────────────────────────────────────────────┘ ┌──────────────────┐ │ Audio Ingestion │ │ Context │ └────────┬─────────┘ │ │ [U/D] CallSegment │ Published Language ▼ ┌──────────────────┐ │ Embedding │ │ Context │ └────────┬─────────┘ │ │ [U/D] Embedding │ Published Language ▼ ┌──────────────────┐ │ Vector Space │◄──────────────────────┐ │ Context │ │ └────────┬─────────┘ │ │ │ ┌────────────────┼────────────────┐ │ │ │ │ │ │ [ACL] │ [ACL] │ [ACL] │ ▼ ▼ ▼ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │ Learning │ │ Analysis │ │ Interpretation │ │ │ Context │ │ Context │ │ Context │ │ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │ │ │ │ │ │ [Partnership] │ [Customer/ │ [Customer/ │ │ │ Supplier] │ Supplier] │ └────────────────┴───────────────────┘ │ │ │ │ RefinedEmbedding │ └────────────────────────────────────┘ LEGEND: [U/D] = Upstream/Downstream (Published Language) [ACL] = Anti-Corruption Layer [Partnership] = Shared development, mutual dependency [Customer/Supplier] = Clear provider/consumer relationship ``` ### Integration Patterns | Upstream | Downstream | Pattern | Shared Kernel | |----------|------------|---------|---------------| | Audio Ingestion | Embedding | Published Language | `CallSegment`, `SegmentId`, `QualityGrade` | | Embedding | Vector Space | Published Language | `Embedding`, `EmbeddingId`, `Vec` | | Vector Space | Learning | ACL + Partnership | `VectorIndex`, `NeighborGraph` | | Vector Space | Analysis | ACL + Customer/Supplier | `SearchResults`, `SimilarityEdge` | | Vector Space | 
Interpretation | ACL + Customer/Supplier | `SearchResults`, `RetrievedNeighbor` |
| Learning | Vector Space | Partnership | `RefinedEmbedding` (feedback loop) |
| Analysis | Interpretation | Customer/Supplier | `Cluster`, `Motif`, `SequenceAnalysis` |

---

## Anti-Corruption Layers

### Learning Context ACL

```rust
/// Translates Vector Space concepts to Learning domain
mod learning_acl {
    use std::sync::Arc;

    use crate::vector_space::{VectorIndex, IndexedVector, SimilarityEdge};
    use crate::learning::{TransitionGraph, GraphNode, GraphEdge};

    pub struct VectorSpaceAdapter {
        index: Arc<VectorIndex>,
    }

    impl VectorSpaceAdapter {
        /// Convert HNSW neighbor graph to GNN-compatible format
        pub fn to_transition_graph(&self, max_neighbors: u32) -> TransitionGraph {
            let nodes: Vec<GraphNode> = self.index
                .iter_vectors()
                .map(|v| GraphNode {
                    id: v.id.into(),
                    embedding: v.embedding_id,
                    features: self.extract_node_features(&v),
                })
                .collect();

            let edges: Vec<GraphEdge> = self.index
                .iter_similarity_edges()
                .filter(|e| e.distance < self.distance_threshold())
                .map(|e| GraphEdge {
                    source: e.source_id.into(),
                    target: e.target_id.into(),
                    edge_type: EdgeType::Similarity,
                    weight: 1.0 - e.distance, // Convert distance to similarity
                })
                .collect();

            TransitionGraph::new(nodes, edges)
        }

        /// Query neighbors without exposing HNSW internals
        pub fn get_trainable_neighbors(&self, vector_id: VectorId, k: u32) -> Vec<(GraphNodeId, f32)> {
            self.index
                .knn_search_by_id(vector_id, k)
                .into_iter() // works whether the search returns a Vec or an iterator
                .map(|(vid, dist)| (vid.into(), 1.0 - dist))
                .collect()
        }
    }
}
```

### Analysis Context ACL

```rust
/// Translates Vector Space results to Analysis domain
mod analysis_acl {
    use crate::vector_space::{SearchResults, VectorIndex};
    use crate::analysis::{ClusterCandidate, SimilarityMatrix};

    pub struct SearchResultsAdapter;

    impl SearchResultsAdapter {
        /// Convert k-NN results to clustering input
        pub fn to_similarity_matrix(
            index: &VectorIndex,
            embeddings: &[EmbeddingId],
            k: u32
        ) -> SimilarityMatrix {
            let n = embeddings.len();
            let mut matrix = SimilarityMatrix::new(n);

            for (i, emb_id)
                in embeddings.iter().enumerate() {
                let neighbors = index.knn_search_by_embedding_id(*emb_id, k);
                for (neighbor_id, distance) in neighbors {
                    // Only neighbors that are part of the input set get a matrix entry
                    if let Some(j) = embeddings.iter().position(|e| *e == neighbor_id) {
                        matrix.set(i, j, 1.0 - distance);
                    }
                }
            }

            matrix
        }

        /// Extract cluster candidates from dense regions
        pub fn identify_dense_regions(
            index: &VectorIndex,
            min_density: f32
        ) -> Vec<ClusterCandidate> {
            index.iter_vectors()
                .filter_map(|v| {
                    let local_density = index.compute_local_density(v.id);
                    if local_density >= min_density {
                        Some(ClusterCandidate {
                            center_id: v.embedding_id,
                            density: local_density,
                            estimated_size: (local_density * 100.0) as u32,
                        })
                    } else {
                        None
                    }
                })
                .collect()
        }
    }
}
```

### Interpretation Context ACL

```rust
/// Translates Analysis results to Interpretation domain
mod interpretation_acl {
    use std::collections::HashMap;
    use std::sync::Arc;

    use chrono::Utc;

    use crate::analysis::{Cluster, Motif, SequenceAnalysis, ClusterAssignment};
    use crate::interpretation::{
        EvidencePack, RetrievedNeighbor, Exemplar, SequenceContext
    };

    pub struct AnalysisAdapter {
        clusters: Arc<HashMap<ClusterId, Cluster>>,
        motifs: Arc>,
    }

    impl AnalysisAdapter {
        /// Build evidence pack from analysis artifacts
        pub fn build_evidence_pack(
            &self,
            query_segment: &CallSegment,
            neighbors: Vec<(SegmentId, f32)>,
            sequence: &SequenceAnalysis,
        ) -> EvidencePack {
            let retrieved_neighbors: Vec<RetrievedNeighbor> = neighbors
                .into_iter()
                .map(|(seg_id, distance)| {
                    let cluster_id = self.find_cluster_for_segment(seg_id);
                    RetrievedNeighbor {
                        segment_id: seg_id,
                        distance,
                        cluster_id,
                        spectogram_thumbnail: self.generate_thumbnail(seg_id),
                        metadata: self.get_segment_metadata(seg_id),
                    }
                })
                .collect();

            let exemplars: Vec<Exemplar> = self.get_relevant_exemplars(
                &retrieved_neighbors,
                5 // top 5 exemplars
            );

            let sequence_context = SequenceContext {
                previous_segments: sequence.get_predecessors(query_segment.id, 3),
                next_segments: sequence.get_successors(query_segment.id, 3),
                position_in_recording: sequence.relative_position(query_segment.id),
                local_motifs: self.find_local_motifs(query_segment.id, sequence),
            };

            EvidencePack {
                id:
                    EvidencePackId::new(),
                query_segment_id: query_segment.id,
                query_type: QueryType::Segment,
                retrieved_neighbors,
                cluster_exemplars: exemplars,
                sequence_context,
                signal_quality: self.assess_quality(query_segment),
                generated_at: Utc::now(),
            }
        }

        /// Convert cluster to citable evidence
        pub fn cluster_to_citation(&self, cluster_id: ClusterId) -> Citation {
            let cluster = self.clusters.get(&cluster_id)
                .expect("Cluster not found");

            Citation {
                source_type: CitationSource::Cluster,
                source_id: cluster_id.to_string(),
                relevance: cluster.cohesion,
                excerpt: Some(format!(
                    "Cluster {} with {} members (cohesion: {:.2})",
                    cluster_id, cluster.member_count, cluster.cohesion
                )),
            }
        }
    }
}
```

---

## Shared Kernel

The following types are shared across multiple contexts and form the ubiquitous language foundation:

```rust
/// Shared identifiers
pub mod shared_kernel {
    use chrono::{DateTime, Utc};
    use uuid::Uuid;

    // Core identifiers shared across all contexts
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct RecordingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct SegmentId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct EmbeddingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct ClusterId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct MotifId(Uuid);

    // Shared value objects
    #[derive(Clone, Debug)]
    pub struct GeoLocation {
        pub latitude: f64,
        pub longitude: f64,
        pub altitude: Option,
    }

    #[derive(Clone, Debug)]
    pub struct TimeRange {
        pub start: DateTime<Utc>,
        pub end: DateTime<Utc>,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum QualityGrade {
        Excellent,  // SNR > 20dB, no clipping
        Good,       // SNR > 10dB, minimal issues
        Fair,       // SNR > 5dB, some artifacts
        Poor,       // SNR < 5dB or significant issues
        Unusable,   // Too degraded for analysis
    }

    // Embedding vector type (1536-D for Perch 2.0)
    pub type EmbeddingVector = Vec<f32>;
    pub const EMBEDDING_DIM: usize = 1536;

    // Audio format constants for Perch 2.0
    pub const TARGET_SAMPLE_RATE: u32 = 32000;
    pub const TARGET_WINDOW_SECONDS: f32 = 5.0;
    pub const TARGET_WINDOW_SAMPLES: usize = 160000;
    pub const MEL_BINS: usize = 128;
    pub const MEL_FRAMES: usize = 500;
}
```

---

## Event Flow

```
Recording Upload
       │
       ▼
┌──────────────────────────────────────────────────────────────┐
│                   AUDIO INGESTION CONTEXT                    │
│  RecordingReceived → RecordingValidated → SegmentExtracted   │
└──────────────────────────────────────────────────────────────┘
       │
       │ CallSegment (Published Language)
       ▼
┌──────────────────────────────────────────────────────────────┐
│                      EMBEDDING CONTEXT                       │
│  EmbeddingRequested → InferenceStarted → EmbeddingGenerated  │
└──────────────────────────────────────────────────────────────┘
       │
       │ Embedding (Published Language)
       ▼
┌──────────────────────────────────────────────────────────────┐
│                     VECTOR SPACE CONTEXT                     │
│ VectorInserted → NeighborGraphUpdated → SimilarityEdgeCreated│
└──────────────────────────────────────────────────────────────┘
       │                   │                        │
       │                   │                        │
       ▼                   ▼                        ▼
┌─────────────┐   ┌─────────────────┐    ┌─────────────────────┐
│  LEARNING   │   │    ANALYSIS     │    │   INTERPRETATION    │
│  CONTEXT    │   │    CONTEXT      │    │      CONTEXT        │
│             │   │                 │    │                     │
│ Training-   │   │ Clustering-     │    │ EvidencePack-       │
│ Started     │   │ Completed       │    │ Assembled           │
│      │      │   │        │        │    │          │          │
│      ▼      │   │        ▼        │    │          ▼          │
│ Embeddings- │   │ MotifDiscovered │    │ Interpretation-     │
│ Refined     │   │                 │    │ Generated           │
└─────────────┘   └─────────────────┘    └─────────────────────┘
       │
       │ RefinedEmbedding (feedback to Vector Space)
       └──────────────────────────────────────────────────────────┐
                                                                  ▼
                                                         ┌──────────────────┐
                                                         │   VECTOR SPACE   │
                                                         │  (Index Update)  │
                                                         └──────────────────┘
```

---

## Consequences

### Benefits

1. **Clear Ownership**: Each bounded context has explicit responsibilities and can be developed by independent teams
2. **Reduced Coupling**: Anti-corruption layers prevent domain model pollution across boundaries
3. **Testability**: Each context can be tested in isolation with well-defined interfaces
4. **Scalability**: Contexts can be deployed and scaled independently
5. 
**Evolvability**: Internal implementations can change without affecting other contexts
6. **Domain Alignment**: Ubiquitous language matches the bioacoustics domain

### Risks

1. **Complexity**: Six contexts introduce coordination overhead
2. **Data Duplication**: Some data may be replicated across context boundaries
3. **Event Consistency**: Eventual consistency between contexts requires careful handling
4. **Learning Curve**: Team must understand DDD concepts and context boundaries

### Mitigations

1. Use event sourcing for cross-context communication
2. Implement saga patterns for multi-context transactions
3. Maintain comprehensive integration tests at context boundaries
4. Document context mappings and keep them updated

---

## References

- Evans, Eric. "Domain-Driven Design: Tackling Complexity in the Heart of Software" (2003)
- Vernon, Vaughn. "Implementing Domain-Driven Design" (2013)
- Perch 2.0 Paper: arXiv:2508.04665
- RuVector Documentation: https://github.com/ruvnet/ruvector

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-15 | Architecture Team | Initial ADR |
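
The `QualityGrade` thresholds and the Perch 2.0 window constants in the Shared Kernel can be exercised with a small self-contained sketch. It is illustrative only: `grade_segment`, `window_samples`, and the `UNUSABLE_SNR_DB` cut-off (0 dB) are hypothetical names and values that this ADR does not define, and downgrading a clipped high-SNR segment to `Good` is likewise an assumption.

```rust
/// Signal quality grades, mirroring the Shared Kernel enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QualityGrade {
    Excellent,
    Good,
    Fair,
    Poor,
    Unusable,
}

// Audio format constants copied from the Shared Kernel.
pub const TARGET_SAMPLE_RATE: u32 = 32_000;
pub const TARGET_WINDOW_SECONDS: f32 = 5.0;
pub const TARGET_WINDOW_SAMPLES: usize = 160_000;

/// Hypothetical cut-off at or below which a segment counts as unusable.
/// The ADR does not specify this value; 0 dB is an assumption.
pub const UNUSABLE_SNR_DB: f32 = 0.0;

/// Hypothetical helper: maps an SNR in dB and a clipping flag to a grade,
/// following the threshold comments on `QualityGrade`.
/// An SNR of exactly 5 dB is not covered by those comments and is
/// treated as `Poor` here.
pub fn grade_segment(snr_db: f32, has_clipping: bool) -> QualityGrade {
    if snr_db.is_nan() || snr_db <= UNUSABLE_SNR_DB {
        QualityGrade::Unusable
    } else if snr_db > 20.0 && !has_clipping {
        QualityGrade::Excellent
    } else if snr_db > 10.0 {
        // Includes clipped segments above 20 dB (assumption).
        QualityGrade::Good
    } else if snr_db > 5.0 {
        QualityGrade::Fair
    } else {
        QualityGrade::Poor
    }
}

/// Samples per analysis window, derived from sample rate and window length.
pub fn window_samples() -> usize {
    (TARGET_SAMPLE_RATE as f32 * TARGET_WINDOW_SECONDS) as usize
}
```

Deriving the window length from the other two constants gives a cheap consistency check for `TARGET_WINDOW_SAMPLES`. Keeping the grading rule in one function would let Audio Ingestion assign grades and downstream contexts read them the same way, though the exact boundaries should be confirmed against the ingestion pipeline.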