ADR-002: Domain-Driven Design Bounded Contexts

Status

Accepted

Date

2026-01-15

Context

7sense is a bioacoustics analysis platform that transforms bird audio recordings into navigable geometric spaces. The system processes audio through Perch 2.0 embeddings (1536-dimensional vectors), stores them in RuVector with HNSW indexing, and applies GNN learning to discover patterns, motifs, and sequences. The output feeds into RAB (Retrieval-Augmented Bioacoustics) evidence packs for transparent, citation-backed interpretations.

The complexity of this domain requires clear separation of concerns to:

  • Enable independent evolution of subsystems
  • Maintain clear ownership boundaries
  • Reduce coupling between technical and analytical components
  • Support distributed team development
  • Facilitate testing and validation at context boundaries

Decision

We adopt Domain-Driven Design (DDD) with six bounded contexts that represent distinct subdomains of the bioacoustics analysis pipeline:

  1. Audio Ingestion Context
  2. Embedding Context
  3. Vector Space Context
  4. Learning Context
  5. Analysis Context
  6. Interpretation Context

Bounded Context Definitions

1. Audio Ingestion Context

Purpose: Capture, segment, and preprocess raw audio recordings into analysis-ready call segments.

Ubiquitous Language

| Term | Definition |
|------|------------|
| Recording | A continuous audio capture from a sensor at a specific location and time |
| Sensor | A physical audio capture device with known characteristics (sample rate, gain, location) |
| Call Segment | An isolated vocalization extracted from a recording (typically 5 seconds at 32 kHz) |
| Segmentation | The process of detecting and extracting individual vocalizations from continuous audio |
| SNR (Signal-to-Noise Ratio) | Quality metric indicating clarity of vocalization above background noise |
| Preprocessing | Normalization, resampling, and filtering applied before embedding |
| Habitat | Environmental classification of the recording location |
| Soundscape | The full acoustic environment including all sound sources |

Aggregates and Entities

Aggregate: Recording
├── Entity: Recording (Aggregate Root)
│   ├── id: RecordingId (UUID)
│   ├── sensorId: SensorId
│   ├── location: GeoLocation {lat, lon, altitude}
│   ├── startTimestamp: DateTime
│   ├── duration: Duration
│   ├── habitat: HabitatType
│   ├── weather: WeatherConditions
│   ├── format: AudioFormat {sampleRate, channels, bitDepth}
│   └── status: IngestionStatus
│
├── Value Object: AudioFormat
│   ├── sampleRate: u32 (target: 32000 Hz)
│   ├── channels: u8 (target: 1 mono)
│   └── bitDepth: u8
│
└── Value Object: WeatherConditions
    ├── temperature: f32
    ├── humidity: f32
    ├── windSpeed: f32
    └── precipitation: PrecipitationType

Aggregate: CallSegment
├── Entity: CallSegment (Aggregate Root)
│   ├── id: SegmentId (UUID)
│   ├── recordingId: RecordingId
│   ├── startOffset: Duration (t0_ms)
│   ├── endOffset: Duration (t1_ms)
│   ├── snr: f32
│   ├── energy: f32
│   ├── clippingScore: f32
│   ├── overlapScore: f32
│   └── qualityGrade: QualityGrade
│
└── Value Object: SegmentMetrics
    ├── peakAmplitude: f32
    ├── rmsEnergy: f32
    ├── zeroCrossingRate: f32
    └── spectralCentroid: f32

Aggregate: Sensor
├── Entity: Sensor (Aggregate Root)
│   ├── id: SensorId
│   ├── model: String
│   ├── location: GeoLocation
│   ├── calibration: CalibrationProfile
│   └── status: SensorStatus
│
└── Value Object: CalibrationProfile
    ├── frequencyResponse: Vec<(f32, f32)>
    ├── noiseFloor: f32
    └── lastCalibrated: DateTime

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| RecordingReceived | recordingId, sensorId, timestamp, duration | New audio file uploaded/streamed |
| RecordingValidated | recordingId, format, qualityScore | Format and quality checks pass |
| RecordingRejected | recordingId, reason, details | Recording fails validation |
| SegmentationStarted | recordingId, algorithm, parameters | Segmentation process begins |
| SegmentExtracted | segmentId, recordingId, timeRange, snr | Individual call isolated |
| SegmentationCompleted | recordingId, segmentCount, duration | All segments extracted |
| PreprocessingCompleted | segmentId, normalizedFormat | Segment ready for embedding |
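
One way to carry these events in Rust is a single enum whose variants mirror the table rows. The sketch below covers only a subset of the events and is illustrative: the `u128` identifier representation, the millisecond fields, and the `recording_id` helper are assumptions, not part of this ADR.

```rust
/// Hypothetical identifier types; the ADR specifies UUIDs but not a representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RecordingId(pub u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegmentId(pub u128);

/// A subset of the Audio Ingestion domain events.
#[derive(Debug, Clone, PartialEq)]
pub enum IngestionEvent {
    RecordingReceived { recording_id: RecordingId, duration_ms: u64 },
    RecordingRejected { recording_id: RecordingId, reason: String },
    SegmentExtracted {
        segment_id: SegmentId,
        recording_id: RecordingId,
        t0_ms: u64,
        t1_ms: u64,
        snr: f32,
    },
    SegmentationCompleted { recording_id: RecordingId, segment_count: u32 },
}

impl IngestionEvent {
    /// Every event in this sketch relates to exactly one recording,
    /// which lets subscribers route events per recording.
    pub fn recording_id(&self) -> RecordingId {
        match self {
            IngestionEvent::RecordingReceived { recording_id, .. }
            | IngestionEvent::RecordingRejected { recording_id, .. }
            | IngestionEvent::SegmentExtracted { recording_id, .. }
            | IngestionEvent::SegmentationCompleted { recording_id, .. } => *recording_id,
        }
    }
}
```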

Services

// Domain Services
trait SegmentationService {
    fn segment_recording(recording: &Recording, config: SegmentationConfig)
        -> Result<Vec<CallSegment>, SegmentationError>;
    fn detect_vocalizations(audio: &AudioBuffer) -> Vec<TimeRange>;
}

trait PreprocessingService {
    fn normalize(segment: &CallSegment) -> NormalizedAudio;
    fn resample(audio: &AudioBuffer, target_rate: u32) -> AudioBuffer;
    fn apply_bandpass(audio: &AudioBuffer, low_hz: f32, high_hz: f32) -> AudioBuffer;
}

trait QualityAssessmentService {
    fn compute_snr(segment: &CallSegment) -> f32;
    fn detect_clipping(segment: &CallSegment) -> f32;
    fn assess_quality(segment: &CallSegment) -> QualityGrade;
}
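
As a rough illustration of what `compute_snr` and `detect_clipping` might compute, the sketch below works on raw sample slices rather than on `CallSegment`. It assumes a separate noise-only buffer is available for the SNR estimate and that samples are normalized so full scale is 1.0; neither assumption comes from this ADR.

```rust
/// Root-mean-square amplitude of a buffer; returns 0.0 for an empty buffer.
pub fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// SNR in decibels: 20 * log10(rms(signal) / rms(noise)).
/// Returns None when the noise estimate is silent, because the ratio is undefined.
pub fn snr_db(signal: &[f32], noise: &[f32]) -> Option<f32> {
    let noise_rms = rms(noise);
    if noise_rms == 0.0 {
        return None;
    }
    Some(20.0 * (rms(signal) / noise_rms).log10())
}

/// Fraction of samples whose magnitude reaches `threshold`
/// (assumed close to full scale, e.g. 0.99).
pub fn clipping_score(samples: &[f32], threshold: f32) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let clipped = samples.iter().filter(|s| s.abs() >= threshold).count();
    clipped as f32 / samples.len() as f32
}
```

A `QualityGrade` could then be derived by thresholding these two values; the thresholds themselves are left open here.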

2. Embedding Context

Purpose: Transform preprocessed audio segments into 1536-dimensional Perch 2.0 embeddings suitable for vector space operations.

Ubiquitous Language

| Term | Definition |
|------|------------|
| Embedding | A 1536-dimensional vector representation of a call segment |
| Perch 2.0 | Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone) |
| Mel Spectrogram | Time-frequency representation using mel-scaled frequency bins (500 frames x 128 bins) |
| Inference | The process of generating an embedding from audio input |
| Normalization | L2 normalization of embedding vectors for cosine similarity |
| Model Version | Specific checkpoint/version of the embedding model |
| Batch | Collection of segments processed together for efficiency |
| Embedding Stability | Consistency of embeddings for identical/similar inputs |

Aggregates and Entities

Aggregate: Embedding
├── Entity: Embedding (Aggregate Root)
│   ├── id: EmbeddingId (UUID)
│   ├── segmentId: SegmentId
│   ├── vector: Vec<f32> (dim=1536)
│   ├── modelVersion: ModelVersion
│   ├── norm: f32
│   ├── createdAt: DateTime
│   └── metadata: EmbeddingMetadata
│
└── Value Object: EmbeddingMetadata
    ├── inferenceLatency: Duration
    ├── batchId: Option<BatchId>
    └── gpuUsed: bool

Aggregate: EmbeddingModel
├── Entity: EmbeddingModel (Aggregate Root)
│   ├── id: ModelId
│   ├── name: "perch2"
│   ├── version: SemanticVersion
│   ├── dimensions: u32 (1536)
│   ├── inputSpec: InputSpecification
│   └── status: ModelStatus
│
├── Value Object: InputSpecification
│   ├── sampleRate: 32000
│   ├── windowDuration: 5.0 seconds
│   ├── windowSamples: 160000
│   ├── melBins: 128
│   └── frequencyRange: (60, 16000) Hz
│
└── Value Object: ModelCheckpoint
    ├── path: String
    ├── format: ModelFormat (ONNX)
    └── checksum: String

Aggregate: EmbeddingBatch
├── Entity: EmbeddingBatch (Aggregate Root)
│   ├── id: BatchId
│   ├── segmentIds: Vec<SegmentId>
│   ├── status: BatchStatus
│   ├── startedAt: DateTime
│   ├── completedAt: Option<DateTime>
│   └── metrics: BatchMetrics
│
└── Value Object: BatchMetrics
    ├── totalSegments: u32
    ├── successCount: u32
    ├── failureCount: u32
    ├── avgLatencyMs: f32
    └── throughput: f32
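
The `InputSpecification` values above are related: 32000 Hz x 5.0 s gives the 160000 `windowSamples`. A minimal sketch of that relationship follows. The `fit_window` helper and its zero-padding behavior are assumptions for illustration; the ADR does not say how short segments are handled.

```rust
/// Reduced form of the InputSpecification value object.
pub struct InputSpecification {
    pub sample_rate_hz: u32,
    pub window_seconds: f32,
    pub mel_bins: u32,
}

impl InputSpecification {
    /// Values taken from the InputSpecification tree in this ADR.
    pub fn from_adr_defaults() -> Self {
        Self { sample_rate_hz: 32_000, window_seconds: 5.0, mel_bins: 128 }
    }

    /// Number of samples the model expects per window.
    pub fn window_samples(&self) -> usize {
        (self.sample_rate_hz as f32 * self.window_seconds).round() as usize
    }

    /// Truncates or zero-pads `samples` to exactly one window (assumed policy).
    pub fn fit_window(&self, samples: &[f32]) -> Vec<f32> {
        let n = self.window_samples();
        let mut out = samples[..samples.len().min(n)].to_vec();
        out.resize(n, 0.0);
        out
    }
}
```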

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| EmbeddingRequested | segmentId, modelVersion, priority | Segment queued for embedding |
| BatchCreated | batchId, segmentIds, modelVersion | Batch assembled for processing |
| InferenceStarted | embeddingId/batchId, modelVersion | Model inference begins |
| EmbeddingGenerated | embeddingId, segmentId, vector, norm | Single embedding computed |
| BatchCompleted | batchId, successCount, failureCount | Batch processing finishes |
| EmbeddingFailed | segmentId, error, retryable | Inference failure |
| ModelVersionChanged | oldVersion, newVersion, migrationRequired | Model updated |
| EmbeddingNormalized | embeddingId, originalNorm, normalizedVector | L2 normalization applied |

Services

// Domain Services
trait EmbeddingService {
    fn embed_segment(segment: &NormalizedAudio, model: &EmbeddingModel)
        -> Result<Embedding, EmbeddingError>;
    fn embed_batch(segments: Vec<&NormalizedAudio>, model: &EmbeddingModel)
        -> Vec<Result<Embedding, EmbeddingError>>;
}

trait SpectrogramService {
    fn compute_mel_spectrogram(audio: &AudioBuffer) -> MelSpectrogram;
    fn validate_spectrogram(spectrogram: &MelSpectrogram) -> ValidationResult;
}

trait NormalizationService {
    fn l2_normalize(embedding: &Embedding) -> NormalizedEmbedding;
    fn validate_norm_stability(embeddings: &[Embedding]) -> StabilityReport;
}

trait ModelManagementService {
    fn load_model(version: &ModelVersion) -> Result<EmbeddingModel, ModelError>;
    fn validate_model_output(embedding: &Embedding) -> ValidationResult;
    fn compare_model_versions(v1: &ModelVersion, v2: &ModelVersion, samples: &[AudioBuffer])
        -> VersionComparisonReport;
}
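
The `l2_normalize` operation scales a vector to unit length so that cosine similarity reduces to a dot product. A minimal sketch over plain slices is shown below; returning the original norm alongside the result mirrors the `EmbeddingNormalized` payload, and rejecting zero or non-finite norms is an assumed policy.

```rust
/// Euclidean (L2) norm of a vector.
pub fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Returns the unit-length vector and the original norm,
/// or None if the norm is zero or not finite.
pub fn l2_normalize(v: &[f32]) -> Option<(Vec<f32>, f32)> {
    let norm = l2_norm(v);
    if norm == 0.0 || !norm.is_finite() {
        return None;
    }
    Some((v.iter().map(|x| x / norm).collect(), norm))
}
```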

3. Vector Space Context

Purpose: Index embeddings using HNSW, manage similarity search, and maintain the navigable neighbor graph that forms the geometric foundation.

Ubiquitous Language

| Term | Definition |
|------|------------|
| HNSW Index | Hierarchical Navigable Small World graph for approximate nearest neighbor search |
| Neighbor Graph | Network of similarity edges connecting acoustically related embeddings |
| k-NN Query | Search for k nearest neighbors to a query vector |
| Similarity Edge | Weighted connection between two embeddings based on distance |
| Distance Metric | Function measuring dissimilarity (cosine, Euclidean, Poincaré) |
| Index Layer | One level in the HNSW hierarchical structure |
| Entry Point | Starting node for graph traversal in search |
| ef (Search) | Exploration factor controlling search accuracy vs. speed |
| M (Construction) | Maximum number of connections per node per layer |

Aggregates and Entities

Aggregate: VectorIndex
├── Entity: VectorIndex (Aggregate Root)
│   ├── id: IndexId
│   ├── name: String
│   ├── dimensions: u32 (1536)
│   ├── distanceMetric: DistanceMetric
│   ├── hnswConfig: HnswConfiguration
│   ├── vectorCount: u64
│   ├── layerCount: u32
│   └── status: IndexStatus
│
├── Value Object: HnswConfiguration
│   ├── m: u32 (max connections per layer)
│   ├── efConstruction: u32
│   ├── efSearch: u32
│   └── maxLayers: u32
│
└── Value Object: IndexStatistics
    ├── memoryUsage: u64
    ├── avgDegree: f32
    ├── layerDistribution: Vec<u32>
    └── searchLatencyP99: Duration

Aggregate: IndexedVector
├── Entity: IndexedVector (Aggregate Root)
│   ├── id: VectorId
│   ├── embeddingId: EmbeddingId
│   ├── indexId: IndexId
│   ├── layerMembership: Vec<u32>
│   ├── neighborIds: Vec<VectorId>
│   └── insertedAt: DateTime
│
└── Value Object: VectorPosition
    ├── entryDistance: f32
    └── layerDistances: Vec<f32>

Aggregate: SimilarityEdge
├── Entity: SimilarityEdge (Aggregate Root)
│   ├── id: EdgeId
│   ├── sourceId: VectorId
│   ├── targetId: VectorId
│   ├── distance: f32
│   ├── edgeType: EdgeType (SIMILAR, HNSW_NEIGHBOR)
│   └── weight: f32
│
└── Value Object: EdgeMetadata
    ├── createdAt: DateTime
    ├── lastAccessed: DateTime
    └── accessCount: u32

Aggregate: SearchQuery
├── Entity: SearchQuery (Aggregate Root)
│   ├── id: QueryId
│   ├── queryVector: Vec<f32>
│   ├── k: u32
│   ├── efSearch: u32
│   ├── filters: Vec<SearchFilter>
│   └── results: Option<SearchResults>
│
└── Value Object: SearchResults
    ├── neighbors: Vec<(VectorId, f32)>
    ├── searchLatency: Duration
    ├── nodesVisited: u32
    └── distanceComputations: u32
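
To make `layerMembership` and the `m` / `maxLayers` fields more concrete, the sketch below uses the level-assignment rule from the original HNSW algorithm: a node's top layer is `floor(-ln(u) * mL)` with `mL = 1 / ln(M)` and `u` drawn uniformly from (0, 1]. Whether RuVector uses exactly this rule is not stated in this ADR, so treat it as an assumption; `u` is passed in explicitly to keep the function deterministic.

```rust
/// Top layer for a newly inserted node.
/// `u` is a uniform random draw in (0, 1]; `m` is the HNSW `M` parameter.
pub fn assign_layer(u: f64, m: u32, max_layers: u32) -> u32 {
    assert!(u > 0.0 && u <= 1.0, "u must lie in (0, 1]");
    assert!(m >= 2 && max_layers >= 1);
    // Normalization factor recommended in the HNSW paper.
    let m_l = 1.0 / (m as f64).ln();
    let level = (-u.ln() * m_l).floor() as u32;
    // Clamp to the configured number of layers (layers are 0-based).
    level.min(max_layers - 1)
}
```

With `M = 16`, most draws land on layer 0 and each higher layer is roughly 16 times less likely than the one below it, which is what gives the index its hierarchical shape.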

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| IndexCreated | indexId, config, distanceMetric | New HNSW index initialized |
| VectorInserted | vectorId, embeddingId, indexId, layerAssignment | Embedding added to index |
| VectorRemoved | vectorId, indexId | Embedding removed from index |
| NeighborGraphUpdated | indexId, affectedVectors, newEdges | Graph structure modified |
| SimilarityEdgeCreated | edgeId, sourceId, targetId, distance | New similarity link established |
| SearchExecuted | queryId, k, latency, resultsCount | k-NN search completed |
| IndexRebuildStarted | indexId, reason, estimatedDuration | Index reconstruction begins |
| IndexRebuildCompleted | indexId, vectorCount, duration | Index reconstruction finishes |
| IndexOptimized | indexId, beforeStats, afterStats | Index compaction/optimization |

Services

// Domain Services
trait VectorIndexService {
    fn create_index(config: IndexConfiguration) -> Result<VectorIndex, IndexError>;
    fn insert_vector(index: &mut VectorIndex, embedding: &Embedding)
        -> Result<IndexedVector, InsertionError>;
    fn remove_vector(index: &mut VectorIndex, vector_id: VectorId)
        -> Result<(), RemovalError>;
    fn rebuild_index(index: &mut VectorIndex) -> Result<IndexStatistics, RebuildError>;
}

trait SimilaritySearchService {
    fn knn_search(index: &VectorIndex, query: &[f32], k: u32, ef: u32)
        -> SearchResults;
    fn range_search(index: &VectorIndex, query: &[f32], radius: f32)
        -> Vec<(VectorId, f32)>;
    fn batch_search(index: &VectorIndex, queries: &[Vec<f32>], k: u32)
        -> Vec<SearchResults>;
}

trait NeighborGraphService {
    fn get_neighbors(vector_id: VectorId, depth: u32) -> NeighborGraph;
    fn compute_similarity_edges(index: &VectorIndex, top_k: u32)
        -> Vec<SimilarityEdge>;
    fn prune_edges(index: &mut VectorIndex, threshold: f32) -> u32;
}

trait DistanceService {
    fn cosine_distance(a: &[f32], b: &[f32]) -> f32;
    fn euclidean_distance(a: &[f32], b: &[f32]) -> f32;
    fn poincare_distance(a: &[f32], b: &[f32], curvature: f32) -> f32;
}
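
The three `DistanceService` functions could be sketched as free functions as follows. Two points are assumptions rather than ADR decisions: `cosine_distance` returns 1.0 when either vector has zero norm, and `curvature` is read as a positive magnitude `c` for a Poincaré ball of curvature `-c`, with inputs required to lie inside the ball (`c * ||x||^2 < 1`).

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 1 - cos(a, b); 0 for identical directions, 1 for orthogonal vectors.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        return 1.0; // assumed convention for zero vectors
    }
    1.0 - dot(a, b) / denom
}

pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// d(a, b) = (1/sqrt(c)) * acosh(1 + 2c||a-b||^2 / ((1 - c||a||^2)(1 - c||b||^2)))
pub fn poincare_distance(a: &[f32], b: &[f32], curvature: f32) -> f32 {
    assert_eq!(a.len(), b.len());
    assert!(curvature > 0.0);
    let c = curvature;
    let diff_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    let denom = (1.0 - c * dot(a, a)) * (1.0 - c * dot(b, b));
    (1.0 + 2.0 * c * diff_sq / denom).acosh() / c.sqrt()
}
```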

4. Learning Context

Purpose: Train and apply GNN models to refine embeddings, learn transition patterns, and enable continuous self-improvement of the vector space.

Ubiquitous Language

| Term | Definition |
|------|------------|
| GNN (Graph Neural Network) | Neural network operating on graph-structured data |
| Message Passing | GNN mechanism where nodes aggregate information from neighbors |
| Graph Attention (GAT) | Attention-weighted message passing for learnable edge importance |
| Training Epoch | One complete pass through the training data |
| Contrastive Loss | Loss function pulling similar pairs together, pushing dissimilar apart |
| InfoNCE | Information Noise-Contrastive Estimation loss for self-supervised learning |
| Embedding Refinement | GNN-driven adjustment of embedding positions in vector space |
| Transition Edge | Temporal connection between sequential call segments |
| EWC (Elastic Weight Consolidation) | Technique preventing catastrophic forgetting during updates |
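
For readers unfamiliar with InfoNCE, the loss for one anchor is `-ln(exp(s_pos/t) / (exp(s_pos/t) + sum_i exp(s_neg_i/t)))`, where `s` are similarities and `t` is a temperature. The sketch below computes it from precomputed similarities. How positives and negatives are sampled from the transition graph is not specified in this ADR, so the inputs here are placeholders.

```rust
/// InfoNCE loss for a single anchor, given its similarity to one positive
/// and to a set of negatives. Uses the log-sum-exp trick for stability.
pub fn info_nce(pos_sim: f32, neg_sims: &[f32], temperature: f32) -> f32 {
    assert!(temperature > 0.0);
    // Logit 0 is the positive; the rest are negatives.
    let logits: Vec<f32> = std::iter::once(pos_sim)
        .chain(neg_sims.iter().copied())
        .map(|s| s / temperature)
        .collect();
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln();
    // -log softmax(logits)[0]
    log_sum_exp - logits[0]
}
```

The loss approaches zero as the positive similarity dominates the negatives, and equals `ln(N + 1)` when the positive is indistinguishable from `N` negatives.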

Aggregates and Entities

Aggregate: LearningModel
├── Entity: LearningModel (Aggregate Root)
│   ├── id: ModelId
│   ├── architecture: GnnArchitecture (GAT, GraphSAGE, GCN)
│   ├── layers: Vec<LayerConfig>
│   ├── version: SemanticVersion
│   ├── trainedAt: DateTime
│   ├── metrics: TrainingMetrics
│   └── status: ModelStatus
│
├── Value Object: LayerConfig
│   ├── layerType: LayerType
│   ├── inputDim: u32
│   ├── outputDim: u32
│   ├── heads: u32 (for attention)
│   └── dropout: f32
│
└── Value Object: TrainingMetrics
    ├── epochs: u32
    ├── finalLoss: f32
    ├── validationScore: f32
    └── trainingDuration: Duration

Aggregate: TrainingSession
├── Entity: TrainingSession (Aggregate Root)
│   ├── id: SessionId
│   ├── modelId: ModelId
│   ├── config: TrainingConfiguration
│   ├── currentEpoch: u32
│   ├── status: SessionStatus
│   └── checkpoints: Vec<Checkpoint>
│
├── Value Object: TrainingConfiguration
│   ├── learningRate: f32
│   ├── batchSize: u32
│   ├── maxEpochs: u32
│   ├── lossFunction: LossType (InfoNCE, Triplet, Contrastive)
│   ├── optimizer: OptimizerType
│   └── ewcEnabled: bool
│
└── Value Object: Checkpoint
    ├── epoch: u32
    ├── loss: f32
    ├── weightsPath: String
    └── timestamp: DateTime

Aggregate: TransitionGraph
├── Entity: TransitionGraph (Aggregate Root)
│   ├── id: GraphId
│   ├── nodeCount: u32
│   ├── edgeCount: u32
│   ├── edgeTypes: Vec<EdgeType>
│   └── statistics: GraphStatistics
│
├── Entity: TransitionEdge
│   ├── id: EdgeId
│   ├── sourceSegmentId: SegmentId
│   ├── targetSegmentId: SegmentId
│   ├── edgeType: EdgeType (NEXT, SIMILAR, CO_OCCURRENCE)
│   ├── weight: f32
│   └── metadata: EdgeMetadata
│
└── Value Object: GraphStatistics
    ├── avgDegree: f32
    ├── clusteringCoefficient: f32
    ├── diameter: u32
    └── componentCount: u32

Aggregate: RefinedEmbedding
├── Entity: RefinedEmbedding (Aggregate Root)
│   ├── id: RefinedEmbeddingId
│   ├── originalEmbeddingId: EmbeddingId
│   ├── refinedVector: Vec<f32>
│   ├── modelVersion: ModelVersion
│   ├── refinementDelta: f32
│   └── createdAt: DateTime
│
└── Value Object: RefinementMetadata
    ├── neighborInfluence: Vec<(EmbeddingId, f32)>
    ├── attentionWeights: Vec<f32>
    └── iterations: u32

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| TrainingSessionStarted | sessionId, modelId, config | GNN training begins |
| EpochCompleted | sessionId, epoch, loss, metrics | Training epoch finishes |
| CheckpointSaved | sessionId, epoch, path | Model weights saved |
| TrainingCompleted | sessionId, finalMetrics | Training session ends |
| ModelDeployed | modelId, version | New model activated |
| EmbeddingsRefined | batchId, vectorCount, avgDelta | GNN refinement applied |
| TransitionEdgeDiscovered | edgeId, source, target, type | New temporal relationship |
| GraphStructureUpdated | graphId, addedEdges, removedEdges | Transition graph modified |
| LearningRateAdjusted | sessionId, oldLr, newLr | Adaptive LR change |
| EwcConsolidated | sessionId, importantWeights | EWC protection updated |

Services

// Domain Services
trait GnnTrainingService {
    fn start_training(model: &LearningModel, graph: &TransitionGraph, config: TrainingConfiguration)
        -> Result<TrainingSession, TrainingError>;
    fn run_epoch(session: &mut TrainingSession, batch: &GraphBatch)
        -> EpochResult;
    fn save_checkpoint(session: &TrainingSession) -> Result<Checkpoint, IoError>;
    fn apply_ewc(session: &mut TrainingSession, importance_matrix: &ImportanceMatrix);
}

trait EmbeddingRefinementService {
    fn refine_embeddings(embeddings: &[Embedding], model: &LearningModel, graph: &TransitionGraph)
        -> Vec<RefinedEmbedding>;
    fn compute_refinement_delta(original: &Embedding, refined: &RefinedEmbedding) -> f32;
}

trait TransitionGraphService {
    fn build_transition_graph(segments: &[CallSegment], recordings: &[Recording])
        -> TransitionGraph;
    fn add_temporal_edges(graph: &mut TransitionGraph, sequences: &[SegmentSequence]);
    fn add_similarity_edges(graph: &mut TransitionGraph, index: &VectorIndex, top_k: u32);
    fn compute_graph_statistics(graph: &TransitionGraph) -> GraphStatistics;
}

trait AttentionService {
    fn compute_attention_weights(query: &Embedding, neighbors: &[Embedding]) -> Vec<f32>;
    fn apply_graph_attention(node: &GraphNode, neighbors: &[GraphNode], model: &GatLayer)
        -> AttentionOutput;
}
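
As a simplified illustration of `compute_attention_weights`, the sketch below scores each neighbor by a scaled dot product with the query and applies a softmax. This is a stand-in, not GAT's learned scoring function (which applies a trained attention vector to transformed node features); it only shows the shape of the output, one weight per neighbor summing to 1.

```rust
/// Softmax over scaled dot-product scores between `query` and each neighbor.
/// Returns an empty vector when there are no neighbors.
pub fn attention_weights(query: &[f32], neighbors: &[Vec<f32>]) -> Vec<f32> {
    assert!(!query.is_empty(), "query must have at least one dimension");
    let scale = (query.len() as f32).sqrt();
    let scores: Vec<f32> = neighbors
        .iter()
        .map(|n| {
            assert_eq!(n.len(), query.len());
            query.iter().zip(n).map(|(q, k)| q * k).sum::<f32>() / scale
        })
        .collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}
```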

5. Analysis Context

Purpose: Perform clustering, motif detection, sequence mining, and pattern discovery on the refined vector space.

Ubiquitous Language

| Term | Definition |
|------|------------|
| Cluster | Group of acoustically similar call segments |
| Prototype | Representative embedding for a cluster (centroid or medoid) |
| Exemplar | Actual call segment that best represents a cluster |
| Motif | Recurring pattern or phrase in vocalization sequences |
| Sequence | Ordered series of call segments from a recording |
| Transition Matrix | Probability matrix of call-to-call transitions |
| Entropy Rate | Measure of unpredictability in vocalization sequences |
| Call Type | Functional category of vocalization (alarm, contact, song) |
| Dialect | Regional variation in vocalization patterns |
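
To illustrate the Motif term, the simplest possible reading is a fixed-length run of cluster IDs that recurs across sequences. The sketch below counts such runs by exact match with a sliding window. It is an assumption-laden simplification: `count_motifs` is a hypothetical helper, overlapping occurrences are all counted, and no time-warped matching is attempted.

```rust
use std::collections::HashMap;

/// Stand-in for the ADR's ClusterId.
pub type ClusterId = u32;

/// Counts every contiguous run of `length` cluster IDs across all sequences
/// and keeps those seen at least `min_support` times, most frequent first.
pub fn count_motifs(
    sequences: &[Vec<ClusterId>],
    length: usize,
    min_support: u32,
) -> Vec<(Vec<ClusterId>, u32)> {
    if length == 0 {
        return Vec::new(); // slice::windows panics on a zero window size
    }
    let mut counts: HashMap<Vec<ClusterId>, u32> = HashMap::new();
    for seq in sequences {
        for window in seq.windows(length) {
            *counts.entry(window.to_vec()).or_insert(0) += 1;
        }
    }
    let mut motifs: Vec<(Vec<ClusterId>, u32)> = counts
        .into_iter()
        .filter(|(_, count)| *count >= min_support)
        .collect();
    // Sort by descending count, then by pattern, for a deterministic order.
    motifs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    motifs
}
```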

Aggregates and Entities

Aggregate: Cluster
├── Entity: Cluster (Aggregate Root)
│   ├── id: ClusterId
│   ├── method: ClusteringMethod (HDBSCAN, KMeans, Spectral)
│   ├── parameters: ClusteringParameters
│   ├── memberCount: u32
│   ├── cohesion: f32
│   ├── separation: f32
│   └── status: ClusterStatus
│
├── Entity: Prototype
│   ├── id: PrototypeId
│   ├── clusterId: ClusterId
│   ├── centroidVector: Vec<f32>
│   ├── exemplarIds: Vec<SegmentId>
│   └── stability: f32
│
└── Value Object: ClusteringParameters
    ├── minClusterSize: u32
    ├── minSamples: u32
    ├── epsilon: Option<f32>
    └── metric: DistanceMetric

Aggregate: ClusterAssignment
├── Entity: ClusterAssignment (Aggregate Root)
│   ├── id: AssignmentId
│   ├── segmentId: SegmentId
│   ├── clusterId: ClusterId
│   ├── confidence: f32
│   ├── distanceToCentroid: f32
│   └── assignedAt: DateTime
│
└── Value Object: SoftAssignment
    ├── clusterProbabilities: Vec<(ClusterId, f32)>
    └── isAmbiguous: bool

Aggregate: Motif
├── Entity: Motif (Aggregate Root)
│   ├── id: MotifId
│   ├── pattern: Vec<ClusterId>
│   ├── occurrenceCount: u32
│   ├── avgDuration: Duration
│   ├── confidence: f32
│   └── context: MotifContext
│
├── Value Object: MotifOccurrence
│   ├── recordingId: RecordingId
│   ├── startSegmentId: SegmentId
│   ├── segmentIds: Vec<SegmentId>
│   └── timestamp: DateTime
│
└── Value Object: MotifContext
    ├── typicalHabitat: Vec<HabitatType>
    ├── timeOfDay: Vec<TimeRange>
    └── associatedBehavior: Option<String>

Aggregate: SequenceAnalysis
├── Entity: SequenceAnalysis (Aggregate Root)
│   ├── id: AnalysisId
│   ├── recordingId: RecordingId
│   ├── segmentSequence: Vec<SegmentId>
│   ├── clusterSequence: Vec<ClusterId>
│   ├── transitionMatrix: TransitionMatrix
│   └── metrics: SequenceMetrics
│
├── Value Object: TransitionMatrix
│   ├── clusterIds: Vec<ClusterId>
│   ├── probabilities: Vec<Vec<f32>>
│   └── observations: Vec<Vec<u32>>
│
└── Value Object: SequenceMetrics
    ├── entropyRate: f32
    ├── stereotypy: f32
    ├── motifDensity: f32
    └── uniqueTransitions: u32

Aggregate: Anomaly
├── Entity: Anomaly (Aggregate Root)
│   ├── id: AnomalyId
│   ├── segmentId: SegmentId
│   ├── anomalyType: AnomalyType (Rare, Novel, Artifact)
│   ├── score: f32
│   ├── nearestCluster: Option<ClusterId>
│   └── detectedAt: DateTime
│
└── Value Object: AnomalyContext
    ├── neighborDistances: Vec<f32>
    ├── localDensity: f32
    └── globalRarity: f32

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| ClusteringStarted | clusterId, method, parameters | Clustering analysis begins |
| ClusteringCompleted | clusterId, clusterCount, metrics | Clustering finishes |
| ClusterAssigned | assignmentId, segmentId, clusterId, confidence | Segment assigned to cluster |
| PrototypeUpdated | prototypeId, clusterId, newCentroid | Cluster representative changed |
| MotifDiscovered | motifId, pattern, occurrenceCount | New recurring pattern found |
| MotifOccurrenceFound | motifId, recordingId, segmentIds | Motif instance detected |
| SequenceAnalyzed | analysisId, recordingId, entropyRate | Sequence metrics computed |
| AnomalyDetected | anomalyId, segmentId, score, type | Unusual vocalization found |
| TransitionMatrixUpdated | recordingId, entropyChange | Transition probabilities recalculated |
| DialectIdentified | clusterId, region, distinctiveness | Regional variant discovered |

Services

// Domain Services
trait ClusteringService {
    fn cluster_embeddings(embeddings: &[Embedding], method: ClusteringMethod, params: ClusteringParameters)
        -> ClusteringResult;
    fn assign_to_cluster(embedding: &Embedding, clusters: &[Cluster])
        -> ClusterAssignment;
    fn compute_prototype(cluster: &Cluster, members: &[Embedding]) -> Prototype;
    fn evaluate_clustering(clusters: &[Cluster], assignments: &[ClusterAssignment])
        -> ClusteringMetrics;
}

trait MotifDetectionService {
    fn discover_motifs(sequences: &[SequenceAnalysis], min_support: u32, max_length: u32)
        -> Vec<Motif>;
    fn find_motif_occurrences(motif: &Motif, sequence: &SequenceAnalysis)
        -> Vec<MotifOccurrence>;
    fn validate_motif_dtw(motif: &Motif, occurrences: &[MotifOccurrence])
        -> ValidationResult;
}

trait SequenceAnalysisService {
    fn analyze_sequence(recording: &Recording, segments: &[CallSegment], clusters: &[Cluster])
        -> SequenceAnalysis;
    fn compute_transition_matrix(cluster_sequence: &[ClusterId]) -> TransitionMatrix;
    fn compute_entropy_rate(matrix: &TransitionMatrix) -> f32;
    fn compute_stereotypy(matrix: &TransitionMatrix) -> f32;
}

trait AnomalyDetectionService {
    fn detect_anomalies(embeddings: &[Embedding], index: &VectorIndex, threshold: f32)
        -> Vec<Anomaly>;
    fn classify_anomaly(anomaly: &Anomaly, context: &AnalysisContext) -> AnomalyType;
    fn compute_local_outlier_factor(embedding: &Embedding, neighbors: &[Embedding]) -> f32;
}
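
A minimal sketch of `compute_transition_matrix` and `compute_entropy_rate` follows, using `u32` in place of `ClusterId`. The entropy rate is computed in bits as the per-row Shannon entropy weighted by how often each source cluster was observed. Using empirical source frequencies instead of the chain's stationary distribution is a simplifying assumption.

```rust
/// Mirrors the TransitionMatrix value object, with u32 standing in for ClusterId.
pub struct TransitionMatrix {
    pub cluster_ids: Vec<u32>,
    pub probabilities: Vec<Vec<f32>>,
    pub observations: Vec<Vec<u32>>,
}

pub fn compute_transition_matrix(cluster_sequence: &[u32]) -> TransitionMatrix {
    // Sorted, de-duplicated IDs define the row/column order.
    let mut cluster_ids = cluster_sequence.to_vec();
    cluster_ids.sort_unstable();
    cluster_ids.dedup();
    let n = cluster_ids.len();

    let observations = {
        let index_of = |c: u32| {
            cluster_ids.binary_search(&c).expect("id was taken from the sequence")
        };
        let mut obs = vec![vec![0u32; n]; n];
        for pair in cluster_sequence.windows(2) {
            obs[index_of(pair[0])][index_of(pair[1])] += 1;
        }
        obs
    };

    // Row-normalize counts; rows with no outgoing transitions stay all-zero.
    let probabilities: Vec<Vec<f32>> = observations
        .iter()
        .map(|row| {
            let total: u32 = row.iter().sum();
            row.iter()
                .map(|&c| if total == 0 { 0.0 } else { c as f32 / total as f32 })
                .collect::<Vec<f32>>()
        })
        .collect();

    TransitionMatrix { cluster_ids, probabilities, observations }
}

/// Entropy rate in bits per transition.
pub fn compute_entropy_rate(matrix: &TransitionMatrix) -> f32 {
    let total: u32 = matrix.observations.iter().flatten().sum();
    if total == 0 {
        return 0.0;
    }
    let mut entropy = 0.0f32;
    for (row_obs, row_p) in matrix.observations.iter().zip(&matrix.probabilities) {
        let weight = row_obs.iter().sum::<u32>() as f32 / total as f32;
        let row_entropy: f32 =
            row_p.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.log2()).sum();
        entropy += weight * row_entropy;
    }
    entropy
}
```

A strictly alternating sequence yields an entropy rate of 0 (fully predictable), while a sequence in which each cluster is followed by either cluster equally often yields 1 bit.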

6. Interpretation Context

Purpose: Generate RAB (Retrieval-Augmented Bioacoustics) evidence packs and constrained interpretations with full citation and transparency.

Ubiquitous Language

| Term | Definition |
|------|------------|
| Evidence Pack | Structured collection of supporting data for an interpretation |
| Citation | Reference to specific retrieved calls supporting a statement |
| Constrained Generation | Output limited to evidence-backed structural descriptions |
| Structural Descriptor | Objective characterization (pitch contour, rhythm, spectral texture) |
| Interpretation | Evidence-backed analysis of vocalization meaning/context |
| Confidence Level | Certainty measure based on evidence quality and quantity |
| Attribution | Clear link between interpretation statements and source evidence |
| Hypothesis | Testable suggestion generated from pattern analysis |

Aggregates and Entities

Aggregate: EvidencePack
├── Entity: EvidencePack (Aggregate Root)
│   ├── id: EvidencePackId
│   ├── querySegmentId: SegmentId
│   ├── queryType: QueryType (Segment, TimeInterval, Habitat)
│   ├── retrievedNeighbors: Vec<RetrievedNeighbor>
│   ├── clusterExemplars: Vec<Exemplar>
│   ├── sequenceContext: SequenceContext
│   ├── signalQuality: SignalQuality
│   └── generatedAt: DateTime
│
├── Value Object: RetrievedNeighbor
│   ├── segmentId: SegmentId
│   ├── distance: f32
│   ├── clusterId: Option<ClusterId>
│   ├── spectrogramThumbnail: Option<ThumbnailId>
│   └── metadata: SegmentMetadata
│
├── Value Object: SequenceContext
│   ├── previousSegments: Vec<SegmentId>
│   ├── nextSegments: Vec<SegmentId>
│   ├── positionInRecording: f32
│   └── localMotifs: Vec<MotifId>
│
└── Value Object: SignalQuality
    ├── snr: f32
    ├── clippingScore: f32
    ├── overlapScore: f32
    └── qualityGrade: QualityGrade

Aggregate: Interpretation
├── Entity: Interpretation (Aggregate Root)
│   ├── id: InterpretationId
│   ├── evidencePackId: EvidencePackId
│   ├── interpretationType: InterpretationType
│   ├── statements: Vec<InterpretationStatement>
│   ├── overallConfidence: f32
│   ├── hypotheses: Vec<Hypothesis>
│   └── generatedAt: DateTime
│
├── Entity: InterpretationStatement
│   ├── id: StatementId
│   ├── content: String
│   ├── statementType: StatementType
│   ├── citations: Vec<Citation>
│   ├── confidence: f32
│   └── constraints: Vec<Constraint>
│
├── Value Object: Citation
│   ├── sourceType: CitationSource (Neighbor, Exemplar, Motif, Cluster)
│   ├── sourceId: String
│   ├── relevance: f32
│   └── excerpt: Option<String>
│
└── Value Object: Hypothesis
    ├── statement: String
    ├── testability: TestabilityLevel
    ├── supportingEvidence: Vec<CitationId>
    └── suggestedExperiment: Option<String>

Aggregate: StructuralDescriptor
├── Entity: StructuralDescriptor (Aggregate Root)
│   ├── id: DescriptorId
│   ├── segmentId: SegmentId
│   ├── pitchContour: PitchContourStats
│   ├── rhythmProfile: RhythmProfile
│   ├── spectralTexture: SpectralTexture
│   └── sequenceRole: SequenceRole
│
├── Value Object: PitchContourStats
│   ├── minFrequency: f32
│   ├── maxFrequency: f32
│   ├── meanFrequency: f32
│   ├── contourShape: ContourShape
│   └── bandwidth: f32
│
├── Value Object: RhythmProfile
│   ├── duration: Duration
│   ├── syllableCount: u32
│   ├── interSyllableIntervals: Vec<Duration>
│   └── rhythmRegularity: f32
│
├── Value Object: SpectralTexture
│   ├── harmonicity: f32
│   ├── spectralCentroid: f32
│   ├── spectralFlatness: f32
│   └── wienerEntropy: f32
│
└── Value Object: SequenceRole
    ├── typicalPredecessors: Vec<ClusterId>
    ├── typicalSuccessors: Vec<ClusterId>
    ├── positionDistribution: PositionDistribution
    └── contextualFrequency: f32

Aggregate: MonitoringSummary
├── Entity: MonitoringSummary (Aggregate Root)
│   ├── id: SummaryId
│   ├── timeRange: TimeRange
│   ├── location: GeoLocation
│   ├── callCounts: HashMap<ClusterId, u32>
│   ├── diversityMetrics: DiversityMetrics
│   ├── anomalies: Vec<AnomalyId>
│   └── interpretations: Vec<InterpretationId>
│
└── Value Object: DiversityMetrics
    ├── speciesRichness: u32
    ├── shannonIndex: f32
    ├── simpsonIndex: f32
    └── evenness: f32
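
The `DiversityMetrics` fields can be derived from `callCounts` as sketched below. Several choices are assumptions: richness counts non-empty clusters (a proxy for species, since `callCounts` is keyed by `ClusterId`), the Simpson index is the Gini-Simpson form `1 - sum(p_i^2)`, and evenness is Pielou's `J = H / ln(S)`, reported as 0 when fewer than two clusters are present.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct DiversityMetrics {
    pub richness: u32,
    pub shannon_index: f32,
    pub simpson_index: f32,
    pub evenness: f32,
}

/// `call_counts` maps a cluster ID (u32 stand-in) to its number of calls.
pub fn compute_diversity(call_counts: &HashMap<u32, u32>) -> DiversityMetrics {
    let counts: Vec<f32> = call_counts
        .values()
        .filter(|&&c| c > 0)
        .map(|&c| c as f32)
        .collect();
    let total: f32 = counts.iter().sum();
    let richness = counts.len() as u32;
    if richness == 0 {
        return DiversityMetrics {
            richness: 0,
            shannon_index: 0.0,
            simpson_index: 0.0,
            evenness: 0.0,
        };
    }
    // Shannon index H = -sum(p * ln p), in nats.
    let shannon_index: f32 = counts
        .iter()
        .map(|c| {
            let p = c / total;
            -p * p.ln()
        })
        .sum();
    let simpson_index = 1.0 - counts.iter().map(|c| (c / total).powi(2)).sum::<f32>();
    let evenness = if richness > 1 {
        shannon_index / (richness as f32).ln()
    } else {
        0.0
    };
    DiversityMetrics { richness, shannon_index, simpson_index, evenness }
}
```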

Domain Events

| Event | Payload | Published When |
|-------|---------|----------------|
| EvidencePackRequested | querySegmentId, queryType, parameters | Analysis request initiated |
| EvidencePackAssembled | evidencePackId, neighborCount, exemplarCount | Evidence gathering complete |
| InterpretationGenerated | interpretationId, evidencePackId, statementCount | Interpretation created |
| StatementCited | statementId, citations | Statement linked to evidence |
| HypothesisProposed | hypothesisId, interpretationId, testability | Testable hypothesis generated |
| StructuralDescriptorComputed | descriptorId, segmentId | Acoustic features extracted |
| MonitoringSummaryGenerated | summaryId, timeRange, location | Period summary created |
| AnnotationSuggested | segmentId, suggestedLabel, confidence | Label recommendation made |
| InterpretationValidated | interpretationId, validationResult | Expert review completed |

Services

// Domain Services
trait EvidencePackService {
    fn assemble_evidence_pack(
        query_segment: &CallSegment,
        index: &VectorIndex,
        clusters: &[Cluster],
        sequences: &[SequenceAnalysis],
        config: EvidencePackConfig
    ) -> EvidencePack;

    fn retrieve_neighbors(segment: &CallSegment, index: &VectorIndex, k: u32)
        -> Vec<RetrievedNeighbor>;
    fn get_sequence_context(segment: &CallSegment, recording: &Recording)
        -> SequenceContext;
}

trait InterpretationService {
    fn generate_interpretation(evidence_pack: &EvidencePack, constraints: &[Constraint])
        -> Interpretation;
    fn create_statement(content: &str, citations: &[Citation], statement_type: StatementType)
        -> InterpretationStatement;
    fn generate_hypotheses(evidence_pack: &EvidencePack, interpretation: &Interpretation)
        -> Vec<Hypothesis>;
}

trait StructuralDescriptorService {
    fn compute_descriptors(segment: &CallSegment) -> StructuralDescriptor;
    fn extract_pitch_contour(audio: &AudioBuffer) -> PitchContourStats;
    fn analyze_rhythm(segments: &[CallSegment]) -> RhythmProfile;
    fn compute_spectral_texture(spectrogram: &MelSpectrogram) -> SpectralTexture;
}

trait MonitoringService {
    fn generate_summary(
        recordings: &[Recording],
        time_range: TimeRange,
        location: GeoLocation
    ) -> MonitoringSummary;
    fn compute_diversity_metrics(cluster_assignments: &[ClusterAssignment])
        -> DiversityMetrics;
    fn detect_temporal_patterns(summaries: &[MonitoringSummary])
        -> Vec<TemporalPattern>;
}

trait CitationService {
    fn create_citation(source: CitationSource, source_id: &str, relevance: f32)
        -> Citation;
    fn validate_citation(citation: &Citation, evidence_pack: &EvidencePack)
        -> ValidationResult;
    fn format_attribution(statement: &InterpretationStatement) -> String;
}
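
The core guarantee behind `validate_citation` is that a statement may only cite items that are actually present in its evidence pack. A minimal sketch of that check is shown below. `EvidenceIndex` is a hypothetical flattened view of an `EvidencePack`, and both the requirement of at least one citation and the `[0, 1]` relevance range are assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CitationSource {
    Neighbor,
    Exemplar,
    Motif,
    Cluster,
}

pub struct Citation {
    pub source_type: CitationSource,
    pub source_id: String,
    pub relevance: f32,
}

/// Hypothetical lookup of everything an evidence pack contains.
pub struct EvidenceIndex {
    ids: HashSet<(CitationSource, String)>,
}

impl EvidenceIndex {
    pub fn new(items: impl IntoIterator<Item = (CitationSource, String)>) -> Self {
        Self { ids: items.into_iter().collect() }
    }

    pub fn contains(&self, citation: &Citation) -> bool {
        self.ids.contains(&(citation.source_type, citation.source_id.clone()))
    }
}

/// Rejects uncited statements, out-of-range relevance, and dangling citations.
pub fn validate_statement(citations: &[Citation], evidence: &EvidenceIndex) -> Result<(), String> {
    if citations.is_empty() {
        return Err("statement has no citations".to_string());
    }
    for c in citations {
        if !(0.0..=1.0).contains(&c.relevance) {
            return Err(format!("relevance {} out of range for {}", c.relevance, c.source_id));
        }
        if !evidence.contains(c) {
            return Err(format!("{:?} {} is not in the evidence pack", c.source_type, c.source_id));
        }
    }
    Ok(())
}
```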

Context Mapping

Relationships Between Contexts

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           CONTEXT MAP                                            │
└─────────────────────────────────────────────────────────────────────────────────┘

                    ┌──────────────────┐
                    │  Audio Ingestion │
                    │     Context      │
                    └────────┬─────────┘
                             │
                             │ [U/D] CallSegment
                             │ Published Language
                             ▼
                    ┌──────────────────┐
                    │    Embedding     │
                    │     Context      │
                    └────────┬─────────┘
                             │
                             │ [U/D] Embedding
                             │ Published Language
                             ▼
                    ┌──────────────────┐
                    │   Vector Space   │◄──────────────────────┐
                    │     Context      │                       │
                    └────────┬─────────┘                       │
                             │                                 │
            ┌────────────────┼────────────────┐                │
            │                │                │                │
            │ [ACL]          │ [ACL]          │ [ACL]          │
            ▼                ▼                ▼                │
   ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐       │
   │   Learning  │  │   Analysis  │  │ Interpretation  │       │
   │   Context   │  │   Context   │  │    Context      │       │
   └──────┬──────┘  └──────┬──────┘  └────────┬────────┘       │
          │                │                   │                │
          │ [Partnership]  │ [Customer/        │ [Customer/     │
          │                │  Supplier]        │  Supplier]     │
          └────────────────┴───────────────────┘                │
                           │                                    │
                           │ RefinedEmbedding                   │
                           └────────────────────────────────────┘


LEGEND:
  [U/D]  = Upstream/Downstream (Published Language)
  [ACL]  = Anti-Corruption Layer
  [Partnership] = Shared development, mutual dependency
  [Customer/Supplier] = Clear provider/consumer relationship

Integration Patterns

| Upstream | Downstream | Pattern | Shared Kernel |
|----------|------------|---------|---------------|
| Audio Ingestion | Embedding | Published Language | CallSegment, SegmentId, QualityGrade |
| Embedding | Vector Space | Published Language | Embedding, EmbeddingId, Vec<f32> |
| Vector Space | Learning | ACL + Partnership | VectorIndex, NeighborGraph |
| Vector Space | Analysis | ACL + Customer/Supplier | SearchResults, SimilarityEdge |
| Vector Space | Interpretation | ACL + Customer/Supplier | SearchResults, RetrievedNeighbor |
| Learning | Vector Space | Partnership | RefinedEmbedding (feedback loop) |
| Analysis | Interpretation | Customer/Supplier | Cluster, Motif, SequenceAnalysis |

Anti-Corruption Layers

Learning Context ACL

/// Translates Vector Space concepts to Learning domain
mod learning_acl {
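    use std::sync::Arc;

    // Module paths below are assumed from the context names used in this ADR.
    use crate::vector_space::{VectorIndex, IndexedVector, SimilarityEdge, VectorId};
    use crate::learning::{TransitionGraph, GraphNode, GraphEdge, GraphNodeId, EdgeType};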

    pub struct VectorSpaceAdapter {
        index: Arc<VectorIndex>,
    }

    impl VectorSpaceAdapter {
        /// Convert HNSW neighbor graph to GNN-compatible format
        pub fn to_transition_graph(&self, max_neighbors: u32) -> TransitionGraph {
            let nodes: Vec<GraphNode> = self.index
                .iter_vectors()
                .map(|v| GraphNode {
                    id: v.id.into(),
                    embedding: v.embedding_id,
                    features: self.extract_node_features(&v),
                })
                .collect();

            let edges: Vec<GraphEdge> = self.index
                .iter_similarity_edges()
                .filter(|e| e.distance < self.distance_threshold())
                .map(|e| GraphEdge {
                    source: e.source_id.into(),
                    target: e.target_id.into(),
                    edge_type: EdgeType::Similarity,
                    weight: 1.0 - e.distance, // Convert distance to similarity
                })
                .collect();

            TransitionGraph::new(nodes, edges)
        }

        /// Query neighbors without exposing HNSW internals
        pub fn get_trainable_neighbors(&self, vector_id: VectorId, k: u32)
            -> Vec<(GraphNodeId, f32)>
        {
            self.index
                .knn_search_by_id(vector_id, k)
                .map(|(vid, dist)| (vid.into(), 1.0 - dist))
                .collect()
        }
    }
}
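
The adapter above turns distances into edge weights with `1.0 - distance`, which only yields a weight in [0, 1] when distances are normalized to that range. The following self-contained sketch isolates that translation step and clamps the result. The struct definitions are simplified stand-ins rather than the actual Vector Space or Learning types, and the normalization assumption is ours, not something this ADR states.

```rust
/// Hypothetical stand-in for a Vector Space similarity edge.
#[derive(Clone, Copy, Debug)]
pub struct SimilarityEdge {
    pub source_id: u64,
    pub target_id: u64,
    pub distance: f32,
}

/// Hypothetical stand-in for a Learning-context graph edge.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GraphEdge {
    pub source: u64,
    pub target: u64,
    pub weight: f32,
}

/// Assumes `distance` is normalized to [0, 1]; out-of-range values are
/// clamped so the resulting weight always stays within [0, 1].
pub fn distance_to_similarity(distance: f32) -> f32 {
    (1.0 - distance).clamp(0.0, 1.0)
}

/// Keeps only edges closer than `distance_threshold` and translates them
/// into the Learning context's edge type.
pub fn to_graph_edges(edges: &[SimilarityEdge], distance_threshold: f32) -> Vec<GraphEdge> {
    edges
        .iter()
        .filter(|e| e.distance < distance_threshold)
        .map(|e| GraphEdge {
            source: e.source_id,
            target: e.target_id,
            weight: distance_to_similarity(e.distance),
        })
        .collect()
}
```

Keeping the conversion in one function means a change of distance metric in the Vector Space context touches a single place in the ACL.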

Analysis Context ACL

/// Translates Vector Space results to Analysis domain
mod analysis_acl {
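    // The shared_kernel path is assumed from the Shared Kernel module in this ADR.
    use crate::shared_kernel::EmbeddingId;
    use crate::vector_space::{SearchResults, VectorIndex};
    use crate::analysis::{ClusterCandidate, SimilarityMatrix};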

    pub struct SearchResultsAdapter;

    impl SearchResultsAdapter {
        /// Convert k-NN results to clustering input
        pub fn to_similarity_matrix(
            index: &VectorIndex,
            embeddings: &[EmbeddingId],
            k: u32
        ) -> SimilarityMatrix {
            let n = embeddings.len();
            let mut matrix = SimilarityMatrix::new(n);

            for (i, emb_id) in embeddings.iter().enumerate() {
                let neighbors = index.knn_search_by_embedding_id(*emb_id, k);
                for (neighbor_id, distance) in neighbors {
                    if let Some(j) = embeddings.iter().position(|e| *e == neighbor_id) {
                        matrix.set(i, j, 1.0 - distance);
                    }
                }
            }

            matrix
        }

        /// Extract cluster candidates from dense regions
        pub fn identify_dense_regions(
            index: &VectorIndex,
            min_density: f32
        ) -> Vec<ClusterCandidate> {
            index.iter_vectors()
                .filter_map(|v| {
                    let local_density = index.compute_local_density(v.id);
                    if local_density >= min_density {
                        Some(ClusterCandidate {
                            center_id: v.embedding_id,
                            density: local_density,
                            estimated_size: (local_density * 100.0) as u32,
                        })
                    } else {
                        None
                    }
                })
                .collect()
        }
    }
}
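
In `to_similarity_matrix` above, each neighbor is located with a linear `position` scan over `embeddings`, so the cost grows with n for every neighbor returned. One possible alternative, sketched below with simplified types, precomputes an ID-to-row map once. The `SimilarityMatrix` layout and the `knn` callback are hypothetical stand-ins for the real matrix type and index query, and the sketch assumes the embedding IDs are unique.

```rust
use std::collections::HashMap;

/// Minimal dense matrix; a hypothetical stand-in for the ADR's `SimilarityMatrix`.
pub struct SimilarityMatrix {
    n: usize,
    data: Vec<f32>,
}

impl SimilarityMatrix {
    pub fn new(n: usize) -> Self {
        Self { n, data: vec![0.0; n * n] }
    }

    pub fn set(&mut self, i: usize, j: usize, value: f32) {
        self.data[i * self.n + j] = value;
    }

    pub fn get(&self, i: usize, j: usize) -> f32 {
        self.data[i * self.n + j]
    }
}

/// Builds the matrix from per-embedding neighbor lists.
/// `knn` is a hypothetical callback standing in for the index's k-NN query.
pub fn to_similarity_matrix<F>(embeddings: &[u64], knn: F) -> SimilarityMatrix
where
    F: Fn(u64) -> Vec<(u64, f32)>,
{
    // Map each embedding ID to its row once, so every neighbor lookup is O(1).
    let row_of: HashMap<u64, usize> = embeddings
        .iter()
        .enumerate()
        .map(|(i, id)| (*id, i))
        .collect();

    let mut matrix = SimilarityMatrix::new(embeddings.len());
    for (i, id) in embeddings.iter().enumerate() {
        for (neighbor_id, distance) in knn(*id) {
            // Neighbors outside the requested subset are ignored.
            if let Some(&j) = row_of.get(&neighbor_id) {
                matrix.set(i, j, 1.0 - distance);
            }
        }
    }
    matrix
}
```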

Interpretation Context ACL

/// Translates Analysis results to Interpretation domain
mod interpretation_acl {
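    use std::collections::HashMap;
    use std::sync::Arc;

    use chrono::Utc;

    // Paths for CallSegment and the shared identifiers are assumed.
    use crate::audio_ingestion::CallSegment;
    use crate::shared_kernel::{ClusterId, MotifId, SegmentId};
    use crate::analysis::{Cluster, Motif, SequenceAnalysis, ClusterAssignment};
    use crate::interpretation::{
        Citation, CitationSource, EvidencePack, EvidencePackId, Exemplar, QueryType,
        RetrievedNeighbor, SequenceContext,
    };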

    pub struct AnalysisAdapter {
        clusters: Arc<HashMap<ClusterId, Cluster>>,
        motifs: Arc<HashMap<MotifId, Motif>>,
    }

    impl AnalysisAdapter {
        /// Build evidence pack from analysis artifacts
        pub fn build_evidence_pack(
            &self,
            query_segment: &CallSegment,
            neighbors: Vec<(SegmentId, f32)>,
            sequence: &SequenceAnalysis,
        ) -> EvidencePack {
            let retrieved_neighbors: Vec<RetrievedNeighbor> = neighbors
                .into_iter()
                .map(|(seg_id, distance)| {
                    let cluster_id = self.find_cluster_for_segment(seg_id);
                    RetrievedNeighbor {
                        segment_id: seg_id,
                        distance,
                        cluster_id,
                        spectrogram_thumbnail: self.generate_thumbnail(seg_id),
                        metadata: self.get_segment_metadata(seg_id),
                    }
                })
                .collect();

            let exemplars: Vec<Exemplar> = self.get_relevant_exemplars(
                &retrieved_neighbors,
                5 // top 5 exemplars
            );

            let sequence_context = SequenceContext {
                previous_segments: sequence.get_predecessors(query_segment.id, 3),
                next_segments: sequence.get_successors(query_segment.id, 3),
                position_in_recording: sequence.relative_position(query_segment.id),
                local_motifs: self.find_local_motifs(query_segment.id, sequence),
            };

            EvidencePack {
                id: EvidencePackId::new(),
                query_segment_id: query_segment.id,
                query_type: QueryType::Segment,
                retrieved_neighbors,
                cluster_exemplars: exemplars,
                sequence_context,
                signal_quality: self.assess_quality(query_segment),
                generated_at: Utc::now(),
            }
        }

        /// Convert cluster to citable evidence
        pub fn cluster_to_citation(&self, cluster_id: ClusterId) -> Citation {
            let cluster = self.clusters.get(&cluster_id)
                .expect("Cluster not found");

            Citation {
                source_type: CitationSource::Cluster,
                source_id: cluster_id.to_string(),
                relevance: cluster.cohesion,
                excerpt: Some(format!(
                    "Cluster {} with {} members (cohesion: {:.2})",
                    cluster_id, cluster.member_count, cluster.cohesion
                )),
            }
        }
    }
}
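
`cluster_to_citation` above panics through `expect` when the cluster ID is unknown. If a stale or missing cluster is a condition callers should handle, a non-panicking variant could return `Option` instead. The sketch below is one way to do that; the `Cluster` and `Citation` structs are trimmed-down, hypothetical versions of the real types, and a `u32` stands in for `ClusterId`.

```rust
use std::collections::HashMap;

/// Trimmed-down, hypothetical version of the Analysis `Cluster`.
#[derive(Debug)]
pub struct Cluster {
    pub member_count: usize,
    pub cohesion: f32,
}

/// Trimmed-down, hypothetical version of the Interpretation `Citation`.
#[derive(Debug, PartialEq)]
pub struct Citation {
    pub source_id: String,
    pub relevance: f32,
    pub excerpt: Option<String>,
}

/// Returns `None` for an unknown cluster instead of panicking.
pub fn cluster_to_citation(clusters: &HashMap<u32, Cluster>, cluster_id: u32) -> Option<Citation> {
    let cluster = clusters.get(&cluster_id)?;
    Some(Citation {
        source_id: cluster_id.to_string(),
        relevance: cluster.cohesion,
        excerpt: Some(format!(
            "Cluster {} with {} members (cohesion: {:.2})",
            cluster_id, cluster.member_count, cluster.cohesion
        )),
    })
}
```

Callers can then decide whether a missing cluster should drop the citation or surface an error.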

Shared Kernel

The following types are shared across multiple contexts and form the ubiquitous language foundation:

/// Shared identifiers
pub mod shared_kernel {
    use chrono::{DateTime, Utc};
    use uuid::Uuid;

    // Core identifiers shared across all contexts
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct RecordingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct SegmentId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct EmbeddingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct ClusterId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct MotifId(Uuid);

    // Shared value objects
    #[derive(Clone, Debug)]
    pub struct GeoLocation {
        pub latitude: f64,
        pub longitude: f64,
        pub altitude: Option<f32>,
    }

    #[derive(Clone, Debug)]
    pub struct TimeRange {
        pub start: DateTime<Utc>,
        pub end: DateTime<Utc>,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum QualityGrade {
        Excellent,  // SNR > 20 dB, no clipping
        Good,       // 10 dB < SNR <= 20 dB, minimal issues
        Fair,       // 5 dB < SNR <= 10 dB, some artifacts
        Poor,       // SNR <= 5 dB or significant issues
        Unusable,   // Too degraded for analysis
    }

    // Embedding vector type (1536-D for Perch 2.0)
    pub type EmbeddingVector = Vec<f32>;
    pub const EMBEDDING_DIM: usize = 1536;

    // Audio format constants for Perch 2.0
    pub const TARGET_SAMPLE_RATE: u32 = 32000;
    pub const TARGET_WINDOW_SECONDS: f32 = 5.0;
    pub const TARGET_WINDOW_SAMPLES: usize = 160_000; // 32_000 Hz * 5 s
    pub const MEL_BINS: usize = 128;
    pub const MEL_FRAMES: usize = 500;
}
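
Because `EmbeddingVector` is a plain `Vec<f32>`, the type system does not enforce the 1536-dimension contract at context boundaries. A minimal sketch of a boundary check follows. `EmbeddingError` and `validate_embedding` are hypothetical names, and `TARGET_WINDOW_SECONDS` is an integer here (the kernel above uses `f32`) so that the sample count can be derived in a `const` expression.

```rust
pub const EMBEDDING_DIM: usize = 1536;
pub const TARGET_SAMPLE_RATE: u32 = 32_000;
/// Integer seconds in this sketch only, to allow const arithmetic.
pub const TARGET_WINDOW_SECONDS: u32 = 5;
/// Derived rather than hard-coded: 32_000 Hz * 5 s = 160_000 samples.
pub const TARGET_WINDOW_SAMPLES: usize = (TARGET_SAMPLE_RATE * TARGET_WINDOW_SECONDS) as usize;

/// Hypothetical error type for embeddings that violate the shared contract.
#[derive(Debug, PartialEq)]
pub enum EmbeddingError {
    WrongDimension { expected: usize, actual: usize },
    NonFinite { index: usize },
}

/// Checks that a vector has the expected length and contains no NaN or
/// infinite components. Reports the first offending index if one exists.
pub fn validate_embedding(vector: &[f32]) -> Result<(), EmbeddingError> {
    if vector.len() != EMBEDDING_DIM {
        return Err(EmbeddingError::WrongDimension {
            expected: EMBEDDING_DIM,
            actual: vector.len(),
        });
    }
    match vector.iter().position(|x| !x.is_finite()) {
        Some(index) => Err(EmbeddingError::NonFinite { index }),
        None => Ok(()),
    }
}
```

A check like this would most naturally run where the Embedding context publishes to the Vector Space context, before a vector is inserted into the index.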

Event Flow

Recording Upload
       │
       ▼
┌──────────────────────────────────────────────────────────────┐
│ AUDIO INGESTION CONTEXT                                       │
│   RecordingReceived → RecordingValidated → SegmentExtracted  │
└──────────────────────────────────────────────────────────────┘
       │ CallSegment (Published Language)
       ▼
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING CONTEXT                                             │
│   EmbeddingRequested → InferenceStarted → EmbeddingGenerated │
└──────────────────────────────────────────────────────────────┘
       │ Embedding (Published Language)
       ▼
┌──────────────────────────────────────────────────────────────┐
│ VECTOR SPACE CONTEXT                                          │
│   VectorInserted → NeighborGraphUpdated → SimilarityEdgeCreated│
└──────────────────────────────────────────────────────────────┘
       │                    │                         │
       │                    │                         │
       ▼                    ▼                         ▼
┌─────────────┐    ┌─────────────────┐    ┌─────────────────────┐
│  LEARNING   │    │    ANALYSIS     │    │   INTERPRETATION    │
│   CONTEXT   │    │     CONTEXT     │    │      CONTEXT        │
│             │    │                 │    │                     │
│ Training-   │    │ Clustering-     │    │ EvidencePack-       │
│ Started     │    │ Completed       │    │ Assembled           │
│     │       │    │     │           │    │     │               │
│     ▼       │    │     ▼           │    │     ▼               │
│ Embeddings- │    │ MotifDiscovered │    │ Interpretation-     │
│ Refined     │    │                 │    │ Generated           │
└─────────────┘    └─────────────────┘    └─────────────────────┘
       │
       │ RefinedEmbedding (feedback to Vector Space)
       └──────────────────────────────────────────────────────────┐
                                                                  ▼
                                                     ┌──────────────────┐
                                                     │ VECTOR SPACE     │
                                                     │ (Index Update)   │
                                                     └──────────────────┘
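
The flow above can be read as a routing table from events to the contexts that consume them. The sketch below encodes one such reading. The event names come from the diagram, but the choice of which event triggers each hand-off is an assumption (for instance, that `SimilarityEdgeCreated` is what fans out to the three downstream contexts), and the `Context` and `DomainEvent` enums and `downstream_of` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Context {
    AudioIngestion,
    Embedding,
    VectorSpace,
    Learning,
    Analysis,
    Interpretation,
}

/// The final event of each stage in the diagram; intermediate events are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DomainEvent {
    SegmentExtracted,
    EmbeddingGenerated,
    SimilarityEdgeCreated,
    EmbeddingsRefined,
    MotifDiscovered,
    InterpretationGenerated,
}

/// Returns the contexts assumed to react to an event.
pub fn downstream_of(event: DomainEvent) -> &'static [Context] {
    match event {
        DomainEvent::SegmentExtracted => &[Context::Embedding],
        DomainEvent::EmbeddingGenerated => &[Context::VectorSpace],
        DomainEvent::SimilarityEdgeCreated => {
            &[Context::Learning, Context::Analysis, Context::Interpretation]
        }
        // Feedback loop: refined embeddings flow back into the index.
        DomainEvent::EmbeddingsRefined => &[Context::VectorSpace],
        DomainEvent::MotifDiscovered => &[Context::Interpretation],
        DomainEvent::InterpretationGenerated => &[],
    }
}
```

Writing the routing as an exhaustive `match` means the compiler flags any newly added event that has no routing decision.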

Consequences

Benefits

  1. Clear Ownership: Each bounded context has explicit responsibilities and can be developed by independent teams
  2. Reduced Coupling: Anti-corruption layers prevent domain model pollution across boundaries
  3. Testability: Each context can be tested in isolation with well-defined interfaces
  4. Scalability: Contexts can be deployed and scaled independently
  5. Evolvability: Internal implementations can change without affecting other contexts
  6. Domain Alignment: Ubiquitous language matches the bioacoustics domain

Risks

  1. Complexity: Six contexts introduce coordination overhead
  2. Data Duplication: Some data may be replicated across context boundaries
  3. Event Consistency: Contexts are only eventually consistent with one another, so consumers must tolerate delayed, duplicated, or out-of-order events
  4. Learning Curve: Team must understand DDD concepts and context boundaries

Mitigations

  1. Use event sourcing for cross-context communication
  2. Implement saga patterns for multi-context transactions
  3. Maintain comprehensive integration tests at context boundaries
  4. Document context mappings and keep them updated

References

  • Evans, Eric. "Domain-Driven Design: Tackling Complexity in the Heart of Software" (2003)
  • Vernon, Vaughn. "Implementing Domain-Driven Design" (2013)
  • Perch 2.0 Paper: arXiv:2508.04665
  • RuVector Documentation: https://github.com/ruvnet/ruvector

Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-15 | Architecture Team | Initial ADR |