ADR-002: Domain-Driven Design Bounded Contexts
Status
Accepted
Date
2026-01-15
Context
7sense is a bioacoustics analysis platform that transforms bird audio recordings into navigable geometric spaces. The system processes audio through Perch 2.0 embeddings (1536-dimensional vectors), stores them in RuVector with HNSW indexing, and applies GNN learning to discover patterns, motifs, and sequences. The output feeds into RAB (Retrieval-Augmented Bioacoustics) evidence packs for transparent, citation-backed interpretations.
The complexity of this domain requires clear separation of concerns to:
- Enable independent evolution of subsystems
- Maintain clear ownership boundaries
- Reduce coupling between technical and analytical components
- Support distributed team development
- Facilitate testing and validation at context boundaries
Decision
We adopt Domain-Driven Design (DDD) with six bounded contexts that represent distinct subdomains of the bioacoustics analysis pipeline:
- Audio Ingestion Context
- Embedding Context
- Vector Space Context
- Learning Context
- Analysis Context
- Interpretation Context
Bounded Context Definitions
1. Audio Ingestion Context
Purpose: Capture, segment, and preprocess raw audio recordings into analysis-ready call segments.
Ubiquitous Language
| Term | Definition |
|---|---|
| Recording | A continuous audio capture from a sensor at a specific location and time |
| Sensor | A physical audio capture device with known characteristics (sample rate, gain, location) |
| Call Segment | An isolated vocalization extracted from a recording (typically 5 seconds at 32 kHz) |
| Segmentation | The process of detecting and extracting individual vocalizations from continuous audio |
| SNR (Signal-to-Noise Ratio) | Quality metric indicating clarity of vocalization above background noise |
| Preprocessing | Normalization, resampling, and filtering applied before embedding |
| Habitat | Environmental classification of the recording location |
| Soundscape | The full acoustic environment including all sound sources |
Aggregates and Entities
Aggregate: Recording
├── Entity: Recording (Aggregate Root)
│ ├── id: RecordingId (UUID)
│ ├── sensorId: SensorId
│ ├── location: GeoLocation {lat, lon, altitude}
│ ├── startTimestamp: DateTime
│ ├── duration: Duration
│ ├── habitat: HabitatType
│ ├── weather: WeatherConditions
│ ├── format: AudioFormat {sampleRate, channels, bitDepth}
│ └── status: IngestionStatus
│
├── Value Object: AudioFormat
│ ├── sampleRate: u32 (target: 32000 Hz)
│ ├── channels: u8 (target: 1 mono)
│ └── bitDepth: u8
│
└── Value Object: WeatherConditions
├── temperature: f32
├── humidity: f32
├── windSpeed: f32
└── precipitation: PrecipitationType
Aggregate: CallSegment
├── Entity: CallSegment (Aggregate Root)
│ ├── id: SegmentId (UUID)
│ ├── recordingId: RecordingId
│ ├── startOffset: Duration (t0_ms)
│ ├── endOffset: Duration (t1_ms)
│ ├── snr: f32
│ ├── energy: f32
│ ├── clippingScore: f32
│ ├── overlapScore: f32
│ └── qualityGrade: QualityGrade
│
└── Value Object: SegmentMetrics
├── peakAmplitude: f32
├── rmsEnergy: f32
├── zeroCrossingRate: f32
└── spectralCentroid: f32
Aggregate: Sensor
├── Entity: Sensor (Aggregate Root)
│ ├── id: SensorId
│ ├── model: String
│ ├── location: GeoLocation
│ ├── calibration: CalibrationProfile
│ └── status: SensorStatus
│
└── Value Object: CalibrationProfile
├── frequencyResponse: Vec<(f32, f32)>
├── noiseFloor: f32
└── lastCalibrated: DateTime
Domain Events
| Event | Payload | Published When |
|---|---|---|
| RecordingReceived | recordingId, sensorId, timestamp, duration | New audio file uploaded/streamed |
| RecordingValidated | recordingId, format, qualityScore | Format and quality checks pass |
| RecordingRejected | recordingId, reason, details | Recording fails validation |
| SegmentationStarted | recordingId, algorithm, parameters | Segmentation process begins |
| SegmentExtracted | segmentId, recordingId, timeRange, snr | Individual call isolated |
| SegmentationCompleted | recordingId, segmentCount, duration | All segments extracted |
| PreprocessingCompleted | segmentId, normalizedFormat | Segment ready for embedding |
Services
// Domain Services
trait SegmentationService {
fn segment_recording(recording: &Recording, config: SegmentationConfig)
-> Result<Vec<CallSegment>, SegmentationError>;
fn detect_vocalizations(audio: &AudioBuffer) -> Vec<TimeRange>;
}
trait PreprocessingService {
fn normalize(segment: &CallSegment) -> NormalizedAudio;
fn resample(audio: &AudioBuffer, target_rate: u32) -> AudioBuffer;
fn apply_bandpass(audio: &AudioBuffer, low_hz: f32, high_hz: f32) -> AudioBuffer;
}
trait QualityAssessmentService {
fn compute_snr(segment: &CallSegment) -> f32;
fn detect_clipping(segment: &CallSegment) -> f32;
fn assess_quality(segment: &CallSegment) -> QualityGrade;
}
2. Embedding Context
Purpose: Transform preprocessed audio segments into 1536-dimensional Perch 2.0 embeddings suitable for vector space operations.
Ubiquitous Language
| Term | Definition |
|---|---|
| Embedding | A 1536-dimensional vector representation of a call segment |
| Perch 2.0 | Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone) |
| Mel Spectrogram | Time-frequency representation using mel-scaled frequency bins (500 frames x 128 bins) |
| Inference | The process of generating an embedding from audio input |
| Normalization | L2 normalization of embedding vectors for cosine similarity |
| Model Version | Specific checkpoint/version of the embedding model |
| Batch | Collection of segments processed together for efficiency |
| Embedding Stability | Consistency of embeddings for identical/similar inputs |
Aggregates and Entities
Aggregate: Embedding
├── Entity: Embedding (Aggregate Root)
│ ├── id: EmbeddingId (UUID)
│ ├── segmentId: SegmentId
│ ├── vector: Vec<f32> (dim=1536)
│ ├── modelVersion: ModelVersion
│ ├── norm: f32
│ ├── createdAt: DateTime
│ └── metadata: EmbeddingMetadata
│
└── Value Object: EmbeddingMetadata
├── inferenceLatency: Duration
├── batchId: Option<BatchId>
└── gpuUsed: bool
Aggregate: EmbeddingModel
├── Entity: EmbeddingModel (Aggregate Root)
│ ├── id: ModelId
│ ├── name: "perch2"
│ ├── version: SemanticVersion
│ ├── dimensions: u32 (1536)
│ ├── inputSpec: InputSpecification
│ └── status: ModelStatus
│
├── Value Object: InputSpecification
│ ├── sampleRate: 32000
│ ├── windowDuration: 5.0 seconds
│ ├── windowSamples: 160000
│ ├── melBins: 128
│ └── frequencyRange: (60, 16000) Hz
│
└── Value Object: ModelCheckpoint
├── path: String
├── format: ModelFormat (ONNX)
└── checksum: String
Aggregate: EmbeddingBatch
├── Entity: EmbeddingBatch (Aggregate Root)
│ ├── id: BatchId
│ ├── segmentIds: Vec<SegmentId>
│ ├── status: BatchStatus
│ ├── startedAt: DateTime
│ ├── completedAt: Option<DateTime>
│ └── metrics: BatchMetrics
│
└── Value Object: BatchMetrics
├── totalSegments: u32
├── successCount: u32
├── failureCount: u32
├── avgLatencyMs: f32
└── throughput: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| EmbeddingRequested | segmentId, modelVersion, priority | Segment queued for embedding |
| BatchCreated | batchId, segmentIds, modelVersion | Batch assembled for processing |
| InferenceStarted | embeddingId/batchId, modelVersion | Model inference begins |
| EmbeddingGenerated | embeddingId, segmentId, vector, norm | Single embedding computed |
| BatchCompleted | batchId, successCount, failureCount | Batch processing finishes |
| EmbeddingFailed | segmentId, error, retryable | Inference failure |
| ModelVersionChanged | oldVersion, newVersion, migrationRequired | Model updated |
| EmbeddingNormalized | embeddingId, originalNorm, normalizedVector | L2 normalization applied |
Services
// Domain Services
trait EmbeddingService {
fn embed_segment(segment: &NormalizedAudio, model: &EmbeddingModel)
-> Result<Embedding, EmbeddingError>;
fn embed_batch(segments: Vec<&NormalizedAudio>, model: &EmbeddingModel)
-> Vec<Result<Embedding, EmbeddingError>>;
}
trait SpectrogramService {
fn compute_mel_spectrogram(audio: &AudioBuffer) -> MelSpectrogram;
fn validate_spectrogram(spectrogram: &MelSpectrogram) -> ValidationResult;
}
trait NormalizationService {
fn l2_normalize(embedding: &Embedding) -> NormalizedEmbedding;
fn validate_norm_stability(embeddings: &[Embedding]) -> StabilityReport;
}
trait ModelManagementService {
fn load_model(version: &ModelVersion) -> Result<EmbeddingModel, ModelError>;
fn validate_model_output(embedding: &Embedding) -> ValidationResult;
fn compare_model_versions(v1: &ModelVersion, v2: &ModelVersion, samples: &[AudioBuffer])
-> VersionComparisonReport;
}
3. Vector Space Context
Purpose: Index embeddings using HNSW, manage similarity search, and maintain the navigable neighbor graph that forms the geometric foundation.
Ubiquitous Language
| Term | Definition |
|---|---|
| HNSW Index | Hierarchical Navigable Small World graph for approximate nearest neighbor search |
| Neighbor Graph | Network of similarity edges connecting acoustically related embeddings |
| k-NN Query | Search for k nearest neighbors to a query vector |
| Similarity Edge | Weighted connection between two embeddings based on distance |
| Distance Metric | Function measuring dissimilarity (cosine, Euclidean, Poincaré) |
| Index Layer | One level in the HNSW hierarchical structure |
| Entry Point | Starting node for graph traversal in search |
| ef (Search) | Exploration factor controlling search accuracy vs. speed |
| M (Construction) | Maximum number of connections per node per layer |
Aggregates and Entities
Aggregate: VectorIndex
├── Entity: VectorIndex (Aggregate Root)
│ ├── id: IndexId
│ ├── name: String
│ ├── dimensions: u32 (1536)
│ ├── distanceMetric: DistanceMetric
│ ├── hnswConfig: HnswConfiguration
│ ├── vectorCount: u64
│ ├── layerCount: u32
│ └── status: IndexStatus
│
├── Value Object: HnswConfiguration
│ ├── m: u32 (max connections per layer)
│ ├── efConstruction: u32
│ ├── efSearch: u32
│ └── maxLayers: u32
│
└── Value Object: IndexStatistics
├── memoryUsage: u64
├── avgDegree: f32
├── layerDistribution: Vec<u32>
└── searchLatencyP99: Duration
Aggregate: IndexedVector
├── Entity: IndexedVector (Aggregate Root)
│ ├── id: VectorId
│ ├── embeddingId: EmbeddingId
│ ├── indexId: IndexId
│ ├── layerMembership: Vec<u32>
│ ├── neighborIds: Vec<VectorId>
│ └── insertedAt: DateTime
│
└── Value Object: VectorPosition
├── entryDistance: f32
└── layerDistances: Vec<f32>
Aggregate: SimilarityEdge
├── Entity: SimilarityEdge (Aggregate Root)
│ ├── id: EdgeId
│ ├── sourceId: VectorId
│ ├── targetId: VectorId
│ ├── distance: f32
│ ├── edgeType: EdgeType (SIMILAR, HNSW_NEIGHBOR)
│ └── weight: f32
│
└── Value Object: EdgeMetadata
├── createdAt: DateTime
├── lastAccessed: DateTime
└── accessCount: u32
Aggregate: SearchQuery
├── Entity: SearchQuery (Aggregate Root)
│ ├── id: QueryId
│ ├── queryVector: Vec<f32>
│ ├── k: u32
│ ├── efSearch: u32
│ ├── filters: Vec<SearchFilter>
│ └── results: Option<SearchResults>
│
└── Value Object: SearchResults
├── neighbors: Vec<(VectorId, f32)>
├── searchLatency: Duration
├── nodesVisited: u32
└── distanceComputations: u32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| IndexCreated | indexId, config, distanceMetric | New HNSW index initialized |
| VectorInserted | vectorId, embeddingId, indexId, layerAssignment | Embedding added to index |
| VectorRemoved | vectorId, indexId | Embedding removed from index |
| NeighborGraphUpdated | indexId, affectedVectors, newEdges | Graph structure modified |
| SimilarityEdgeCreated | edgeId, sourceId, targetId, distance | New similarity link established |
| SearchExecuted | queryId, k, latency, resultsCount | k-NN search completed |
| IndexRebuildStarted | indexId, reason, estimatedDuration | Index reconstruction begins |
| IndexRebuildCompleted | indexId, vectorCount, duration | Index reconstruction finishes |
| IndexOptimized | indexId, beforeStats, afterStats | Index compaction/optimization |
Services
// Domain Services
trait VectorIndexService {
fn create_index(config: IndexConfiguration) -> Result<VectorIndex, IndexError>;
fn insert_vector(index: &mut VectorIndex, embedding: &Embedding)
-> Result<IndexedVector, InsertionError>;
fn remove_vector(index: &mut VectorIndex, vector_id: VectorId)
-> Result<(), RemovalError>;
fn rebuild_index(index: &mut VectorIndex) -> Result<IndexStatistics, RebuildError>;
}
trait SimilaritySearchService {
fn knn_search(index: &VectorIndex, query: &[f32], k: u32, ef: u32)
-> SearchResults;
fn range_search(index: &VectorIndex, query: &[f32], radius: f32)
-> Vec<(VectorId, f32)>;
fn batch_search(index: &VectorIndex, queries: &[Vec<f32>], k: u32)
-> Vec<SearchResults>;
}
trait NeighborGraphService {
fn get_neighbors(vector_id: VectorId, depth: u32) -> NeighborGraph;
fn compute_similarity_edges(index: &VectorIndex, top_k: u32)
-> Vec<SimilarityEdge>;
fn prune_edges(index: &mut VectorIndex, threshold: f32) -> u32;
}
trait DistanceService {
fn cosine_distance(a: &[f32], b: &[f32]) -> f32;
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32;
fn poincare_distance(a: &[f32], b: &[f32], curvature: f32) -> f32;
}
4. Learning Context
Purpose: Train and apply GNN models to refine embeddings, learn transition patterns, and enable continuous self-improvement of the vector space.
Ubiquitous Language
| Term | Definition |
|---|---|
| GNN (Graph Neural Network) | Neural network operating on graph-structured data |
| Message Passing | GNN mechanism where nodes aggregate information from neighbors |
| Graph Attention (GAT) | Attention-weighted message passing for learnable edge importance |
| Training Epoch | One complete pass through the training data |
| Contrastive Loss | Loss function pulling similar pairs together, pushing dissimilar apart |
| InfoNCE | Information Noise-Contrastive Estimation loss for self-supervised learning |
| Embedding Refinement | GNN-driven adjustment of embedding positions in vector space |
| Transition Edge | Temporal connection between sequential call segments |
| EWC (Elastic Weight Consolidation) | Technique preventing catastrophic forgetting during updates |
Aggregates and Entities
Aggregate: LearningModel
├── Entity: LearningModel (Aggregate Root)
│ ├── id: ModelId
│ ├── architecture: GnnArchitecture (GAT, GraphSAGE, GCN)
│ ├── layers: Vec<LayerConfig>
│ ├── version: SemanticVersion
│ ├── trainedAt: DateTime
│ ├── metrics: TrainingMetrics
│ └── status: ModelStatus
│
├── Value Object: LayerConfig
│ ├── layerType: LayerType
│ ├── inputDim: u32
│ ├── outputDim: u32
│ ├── heads: u32 (for attention)
│ └── dropout: f32
│
└── Value Object: TrainingMetrics
├── epochs: u32
├── finalLoss: f32
├── validationScore: f32
└── trainingDuration: Duration
Aggregate: TrainingSession
├── Entity: TrainingSession (Aggregate Root)
│ ├── id: SessionId
│ ├── modelId: ModelId
│ ├── config: TrainingConfiguration
│ ├── currentEpoch: u32
│ ├── status: SessionStatus
│ └── checkpoints: Vec<Checkpoint>
│
├── Value Object: TrainingConfiguration
│ ├── learningRate: f32
│ ├── batchSize: u32
│ ├── maxEpochs: u32
│ ├── lossFunction: LossType (InfoNCE, Triplet, Contrastive)
│ ├── optimizer: OptimizerType
│ └── ewcEnabled: bool
│
└── Value Object: Checkpoint
├── epoch: u32
├── loss: f32
├── weightsPath: String
└── timestamp: DateTime
Aggregate: TransitionGraph
├── Entity: TransitionGraph (Aggregate Root)
│ ├── id: GraphId
│ ├── nodeCount: u32
│ ├── edgeCount: u32
│ ├── edgeTypes: Vec<EdgeType>
│ └── statistics: GraphStatistics
│
├── Entity: TransitionEdge
│ ├── id: EdgeId
│ ├── sourceSegmentId: SegmentId
│ ├── targetSegmentId: SegmentId
│ ├── edgeType: EdgeType (NEXT, SIMILAR, CO_OCCURRENCE)
│ ├── weight: f32
│ └── metadata: EdgeMetadata
│
└── Value Object: GraphStatistics
├── avgDegree: f32
├── clusteringCoefficient: f32
├── diameter: u32
└── componentCount: u32
Aggregate: RefinedEmbedding
├── Entity: RefinedEmbedding (Aggregate Root)
│ ├── id: RefinedEmbeddingId
│ ├── originalEmbeddingId: EmbeddingId
│ ├── refinedVector: Vec<f32>
│ ├── modelVersion: ModelVersion
│ ├── refinementDelta: f32
│ └── createdAt: DateTime
│
└── Value Object: RefinementMetadata
├── neighborInfluence: Vec<(EmbeddingId, f32)>
├── attentionWeights: Vec<f32>
└── iterations: u32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| TrainingSessionStarted | sessionId, modelId, config | GNN training begins |
| EpochCompleted | sessionId, epoch, loss, metrics | Training epoch finishes |
| CheckpointSaved | sessionId, epoch, path | Model weights saved |
| TrainingCompleted | sessionId, finalMetrics | Training session ends |
| ModelDeployed | modelId, version | New model activated |
| EmbeddingsRefined | batchId, vectorCount, avgDelta | GNN refinement applied |
| TransitionEdgeDiscovered | edgeId, source, target, type | New temporal relationship |
| GraphStructureUpdated | graphId, addedEdges, removedEdges | Transition graph modified |
| LearningRateAdjusted | sessionId, oldLr, newLr | Adaptive LR change |
| EwcConsolidated | sessionId, importantWeights | EWC protection updated |
Services
// Domain Services
trait GnnTrainingService {
fn start_training(model: &LearningModel, graph: &TransitionGraph, config: TrainingConfiguration)
-> Result<TrainingSession, TrainingError>;
fn run_epoch(session: &mut TrainingSession, batch: &GraphBatch)
-> EpochResult;
fn save_checkpoint(session: &TrainingSession) -> Result<Checkpoint, IoError>;
fn apply_ewc(session: &mut TrainingSession, importance_matrix: &ImportanceMatrix);
}
trait EmbeddingRefinementService {
fn refine_embeddings(embeddings: &[Embedding], model: &LearningModel, graph: &TransitionGraph)
-> Vec<RefinedEmbedding>;
fn compute_refinement_delta(original: &Embedding, refined: &RefinedEmbedding) -> f32;
}
trait TransitionGraphService {
fn build_transition_graph(segments: &[CallSegment], recordings: &[Recording])
-> TransitionGraph;
fn add_temporal_edges(graph: &mut TransitionGraph, sequences: &[SegmentSequence]);
fn add_similarity_edges(graph: &mut TransitionGraph, index: &VectorIndex, top_k: u32);
fn compute_graph_statistics(graph: &TransitionGraph) -> GraphStatistics;
}
trait AttentionService {
fn compute_attention_weights(query: &Embedding, neighbors: &[Embedding]) -> Vec<f32>;
fn apply_graph_attention(node: &GraphNode, neighbors: &[GraphNode], model: &GatLayer)
-> AttentionOutput;
}
5. Analysis Context
Purpose: Perform clustering, motif detection, sequence mining, and pattern discovery on the refined vector space.
Ubiquitous Language
| Term | Definition |
|---|---|
| Cluster | Group of acoustically similar call segments |
| Prototype | Representative embedding for a cluster (centroid or medoid) |
| Exemplar | Actual call segment that best represents a cluster |
| Motif | Recurring pattern or phrase in vocalization sequences |
| Sequence | Ordered series of call segments from a recording |
| Transition Matrix | Probability matrix of call-to-call transitions |
| Entropy Rate | Measure of unpredictability in vocalization sequences |
| Call Type | Functional category of vocalization (alarm, contact, song) |
| Dialect | Regional variation in vocalization patterns |
Aggregates and Entities
Aggregate: Cluster
├── Entity: Cluster (Aggregate Root)
│ ├── id: ClusterId
│ ├── method: ClusteringMethod (HDBSCAN, KMeans, Spectral)
│ ├── parameters: ClusteringParameters
│ ├── memberCount: u32
│ ├── cohesion: f32
│ ├── separation: f32
│ └── status: ClusterStatus
│
├── Entity: Prototype
│ ├── id: PrototypeId
│ ├── clusterId: ClusterId
│ ├── centroidVector: Vec<f32>
│ ├── exemplarIds: Vec<SegmentId>
│ └── stability: f32
│
└── Value Object: ClusteringParameters
├── minClusterSize: u32
├── minSamples: u32
├── epsilon: Option<f32>
└── metric: DistanceMetric
Aggregate: ClusterAssignment
├── Entity: ClusterAssignment (Aggregate Root)
│ ├── id: AssignmentId
│ ├── segmentId: SegmentId
│ ├── clusterId: ClusterId
│ ├── confidence: f32
│ ├── distanceToCentroid: f32
│ └── assignedAt: DateTime
│
└── Value Object: SoftAssignment
├── clusterProbabilities: Vec<(ClusterId, f32)>
└── isAmbiguous: bool
Aggregate: Motif
├── Entity: Motif (Aggregate Root)
│ ├── id: MotifId
│ ├── pattern: Vec<ClusterId>
│ ├── occurrenceCount: u32
│ ├── avgDuration: Duration
│ ├── confidence: f32
│ └── context: MotifContext
│
├── Value Object: MotifOccurrence
│ ├── recordingId: RecordingId
│ ├── startSegmentId: SegmentId
│ ├── segmentIds: Vec<SegmentId>
│ └── timestamp: DateTime
│
└── Value Object: MotifContext
├── typicalHabitat: Vec<HabitatType>
├── timeOfDay: Vec<TimeRange>
└── associatedBehavior: Option<String>
Aggregate: SequenceAnalysis
├── Entity: SequenceAnalysis (Aggregate Root)
│ ├── id: AnalysisId
│ ├── recordingId: RecordingId
│ ├── segmentSequence: Vec<SegmentId>
│ ├── clusterSequence: Vec<ClusterId>
│ ├── transitionMatrix: TransitionMatrix
│ └── metrics: SequenceMetrics
│
├── Value Object: TransitionMatrix
│ ├── clusterIds: Vec<ClusterId>
│ ├── probabilities: Vec<Vec<f32>>
│ └── observations: Vec<Vec<u32>>
│
└── Value Object: SequenceMetrics
├── entropyRate: f32
├── stereotypy: f32
├── motifDensity: f32
└── uniqueTransitions: u32
Aggregate: Anomaly
├── Entity: Anomaly (Aggregate Root)
│ ├── id: AnomalyId
│ ├── segmentId: SegmentId
│ ├── anomalyType: AnomalyType (Rare, Novel, Artifact)
│ ├── score: f32
│ ├── nearestCluster: Option<ClusterId>
│ └── detectedAt: DateTime
│
└── Value Object: AnomalyContext
├── neighborDistances: Vec<f32>
├── localDensity: f32
└── globalRarity: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| ClusteringStarted | clusterId, method, parameters | Clustering analysis begins |
| ClusteringCompleted | clusterId, clusterCount, metrics | Clustering finishes |
| ClusterAssigned | assignmentId, segmentId, clusterId, confidence | Segment assigned to cluster |
| PrototypeUpdated | prototypeId, clusterId, newCentroid | Cluster representative changed |
| MotifDiscovered | motifId, pattern, occurrenceCount | New recurring pattern found |
| MotifOccurrenceFound | motifId, recordingId, segmentIds | Motif instance detected |
| SequenceAnalyzed | analysisId, recordingId, entropyRate | Sequence metrics computed |
| AnomalyDetected | anomalyId, segmentId, score, type | Unusual vocalization found |
| TransitionMatrixUpdated | recordingId, entropyChange | Transition probabilities recalculated |
| DialectIdentified | clusterId, region, distinctiveness | Regional variant discovered |
Services
// Domain Services
trait ClusteringService {
fn cluster_embeddings(embeddings: &[Embedding], method: ClusteringMethod, params: ClusteringParameters)
-> ClusteringResult;
fn assign_to_cluster(embedding: &Embedding, clusters: &[Cluster])
-> ClusterAssignment;
fn compute_prototype(cluster: &Cluster, members: &[Embedding]) -> Prototype;
fn evaluate_clustering(clusters: &[Cluster], assignments: &[ClusterAssignment])
-> ClusteringMetrics;
}
trait MotifDetectionService {
fn discover_motifs(sequences: &[SequenceAnalysis], min_support: u32, max_length: u32)
-> Vec<Motif>;
fn find_motif_occurrences(motif: &Motif, sequence: &SequenceAnalysis)
-> Vec<MotifOccurrence>;
fn validate_motif_dtw(motif: &Motif, occurrences: &[MotifOccurrence])
-> ValidationResult;
}
trait SequenceAnalysisService {
fn analyze_sequence(recording: &Recording, segments: &[CallSegment], clusters: &[Cluster])
-> SequenceAnalysis;
fn compute_transition_matrix(cluster_sequence: &[ClusterId]) -> TransitionMatrix;
fn compute_entropy_rate(matrix: &TransitionMatrix) -> f32;
fn compute_stereotypy(matrix: &TransitionMatrix) -> f32;
}
trait AnomalyDetectionService {
fn detect_anomalies(embeddings: &[Embedding], index: &VectorIndex, threshold: f32)
-> Vec<Anomaly>;
fn classify_anomaly(anomaly: &Anomaly, context: &AnalysisContext) -> AnomalyType;
fn compute_local_outlier_factor(embedding: &Embedding, neighbors: &[Embedding]) -> f32;
}
6. Interpretation Context
Purpose: Generate RAB (Retrieval-Augmented Bioacoustics) evidence packs and constrained interpretations with full citation and transparency.
Ubiquitous Language
| Term | Definition |
|---|---|
| Evidence Pack | Structured collection of supporting data for an interpretation |
| Citation | Reference to specific retrieved calls supporting a statement |
| Constrained Generation | Output limited to evidence-backed structural descriptions |
| Structural Descriptor | Objective characterization (pitch contour, rhythm, spectral texture) |
| Interpretation | Evidence-backed analysis of vocalization meaning/context |
| Confidence Level | Certainty measure based on evidence quality and quantity |
| Attribution | Clear link between interpretation statements and source evidence |
| Hypothesis | Testable suggestion generated from pattern analysis |
Aggregates and Entities
Aggregate: EvidencePack
├── Entity: EvidencePack (Aggregate Root)
│ ├── id: EvidencePackId
│ ├── querySegmentId: SegmentId
│ ├── queryType: QueryType (Segment, TimeInterval, Habitat)
│ ├── retrievedNeighbors: Vec<RetrievedNeighbor>
│ ├── clusterExemplars: Vec<Exemplar>
│ ├── sequenceContext: SequenceContext
│ ├── signalQuality: SignalQuality
│ └── generatedAt: DateTime
│
├── Value Object: RetrievedNeighbor
│ ├── segmentId: SegmentId
│ ├── distance: f32
│ ├── clusterId: Option<ClusterId>
│ ├── spectrogramThumbnail: Option<ThumbnailId>
│ └── metadata: SegmentMetadata
│
├── Value Object: SequenceContext
│ ├── previousSegments: Vec<SegmentId>
│ ├── nextSegments: Vec<SegmentId>
│ ├── positionInRecording: f32
│ └── localMotifs: Vec<MotifId>
│
└── Value Object: SignalQuality
├── snr: f32
├── clippingScore: f32
├── overlapScore: f32
└── qualityGrade: QualityGrade
Aggregate: Interpretation
├── Entity: Interpretation (Aggregate Root)
│ ├── id: InterpretationId
│ ├── evidencePackId: EvidencePackId
│ ├── interpretationType: InterpretationType
│ ├── statements: Vec<InterpretationStatement>
│ ├── overallConfidence: f32
│ ├── hypotheses: Vec<Hypothesis>
│ └── generatedAt: DateTime
│
├── Entity: InterpretationStatement
│ ├── id: StatementId
│ ├── content: String
│ ├── statementType: StatementType
│ ├── citations: Vec<Citation>
│ ├── confidence: f32
│ └── constraints: Vec<Constraint>
│
├── Value Object: Citation
│ ├── sourceType: CitationSource (Neighbor, Exemplar, Motif, Cluster)
│ ├── sourceId: String
│ ├── relevance: f32
│ └── excerpt: Option<String>
│
└── Value Object: Hypothesis
├── statement: String
├── testability: TestabilityLevel
├── supportingEvidence: Vec<CitationId>
└── suggestedExperiment: Option<String>
Aggregate: StructuralDescriptor
├── Entity: StructuralDescriptor (Aggregate Root)
│ ├── id: DescriptorId
│ ├── segmentId: SegmentId
│ ├── pitchContour: PitchContourStats
│ ├── rhythmProfile: RhythmProfile
│ ├── spectralTexture: SpectralTexture
│ └── sequenceRole: SequenceRole
│
├── Value Object: PitchContourStats
│ ├── minFrequency: f32
│ ├── maxFrequency: f32
│ ├── meanFrequency: f32
│ ├── contourShape: ContourShape
│ └── bandwidth: f32
│
├── Value Object: RhythmProfile
│ ├── duration: Duration
│ ├── syllableCount: u32
│ ├── interSyllableIntervals: Vec<Duration>
│ └── rhythmRegularity: f32
│
├── Value Object: SpectralTexture
│ ├── harmonicity: f32
│ ├── spectralCentroid: f32
│ ├── spectralFlatness: f32
│ └── wienerEntropy: f32
│
└── Value Object: SequenceRole
├── typicalPredecessors: Vec<ClusterId>
├── typicalSuccessors: Vec<ClusterId>
├── positionDistribution: PositionDistribution
└── contextualFrequency: f32
Aggregate: MonitoringSummary
├── Entity: MonitoringSummary (Aggregate Root)
│ ├── id: SummaryId
│ ├── timeRange: TimeRange
│ ├── location: GeoLocation
│ ├── callCounts: HashMap<ClusterId, u32>
│ ├── diversityMetrics: DiversityMetrics
│ ├── anomalies: Vec<AnomalyId>
│ └── interpretations: Vec<InterpretationId>
│
└── Value Object: DiversityMetrics
├── speciesRichness: u32
├── shannonIndex: f32
├── simpsonIndex: f32
└── evenness: f32
Domain Events
| Event | Payload | Published When |
|---|---|---|
| EvidencePackRequested | querySegmentId, queryType, parameters | Analysis request initiated |
| EvidencePackAssembled | evidencePackId, neighborCount, exemplarCount | Evidence gathering complete |
| InterpretationGenerated | interpretationId, evidencePackId, statementCount | Interpretation created |
| StatementCited | statementId, citations | Statement linked to evidence |
| HypothesisProposed | hypothesisId, interpretationId, testability | Testable hypothesis generated |
| StructuralDescriptorComputed | descriptorId, segmentId | Acoustic features extracted |
| MonitoringSummaryGenerated | summaryId, timeRange, location | Period summary created |
| AnnotationSuggested | segmentId, suggestedLabel, confidence | Label recommendation made |
| InterpretationValidated | interpretationId, validationResult | Expert review completed |
Services
// Domain Services
trait EvidencePackService {
fn assemble_evidence_pack(
query_segment: &CallSegment,
index: &VectorIndex,
clusters: &[Cluster],
sequences: &[SequenceAnalysis],
config: EvidencePackConfig
) -> EvidencePack;
fn retrieve_neighbors(segment: &CallSegment, index: &VectorIndex, k: u32)
-> Vec<RetrievedNeighbor>;
fn get_sequence_context(segment: &CallSegment, recording: &Recording)
-> SequenceContext;
}
trait InterpretationService {
fn generate_interpretation(evidence_pack: &EvidencePack, constraints: &[Constraint])
-> Interpretation;
fn create_statement(content: &str, citations: &[Citation], statement_type: StatementType)
-> InterpretationStatement;
fn generate_hypotheses(evidence_pack: &EvidencePack, interpretation: &Interpretation)
-> Vec<Hypothesis>;
}
trait StructuralDescriptorService {
fn compute_descriptors(segment: &CallSegment) -> StructuralDescriptor;
fn extract_pitch_contour(audio: &AudioBuffer) -> PitchContourStats;
fn analyze_rhythm(segments: &[CallSegment]) -> RhythmProfile;
fn compute_spectral_texture(spectrogram: &MelSpectrogram) -> SpectralTexture;
}
trait MonitoringService {
fn generate_summary(
recordings: &[Recording],
time_range: TimeRange,
location: GeoLocation
) -> MonitoringSummary;
fn compute_diversity_metrics(cluster_assignments: &[ClusterAssignment])
-> DiversityMetrics;
fn detect_temporal_patterns(summaries: &[MonitoringSummary])
-> Vec<TemporalPattern>;
}
trait CitationService {
fn create_citation(source: CitationSource, source_id: &str, relevance: f32)
-> Citation;
fn validate_citation(citation: &Citation, evidence_pack: &EvidencePack)
-> ValidationResult;
fn format_attribution(statement: &InterpretationStatement) -> String;
}
Context Mapping
Relationships Between Contexts
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CONTEXT MAP │
└─────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐
│ Audio Ingestion │
│ Context │
└────────┬─────────┘
│
│ [U/D] CallSegment
│ Published Language
▼
┌──────────────────┐
│ Embedding │
│ Context │
└────────┬─────────┘
│
│ [U/D] Embedding
│ Published Language
▼
┌──────────────────┐
│ Vector Space │◄──────────────────────┐
│ Context │ │
└────────┬─────────┘ │
│ │
┌────────────────┼────────────────┐ │
│ │ │ │
│ [ACL] │ [ACL] │ [ACL] │
▼ ▼ ▼ │
┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ Learning │ │ Analysis │ │ Interpretation │ │
│ Context │ │ Context │ │ Context │ │
└──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │
│ [Partnership] │ [Customer/ │ [Customer/ │
│ │ Supplier] │ Supplier] │
└────────────────┴───────────────────┘ │
│ │
│ RefinedEmbedding │
└────────────────────────────────────┘
LEGEND:
[U/D] = Upstream/Downstream (Published Language)
[ACL] = Anti-Corruption Layer
[Partnership] = Shared development, mutual dependency
[Customer/Supplier] = Clear provider/consumer relationship
Integration Patterns
| Upstream | Downstream | Pattern | Shared Kernel |
|---|---|---|---|
| Audio Ingestion | Embedding | Published Language | CallSegment, SegmentId, QualityGrade |
| Embedding | Vector Space | Published Language | Embedding, EmbeddingId, Vec<f32> |
| Vector Space | Learning | ACL + Partnership | VectorIndex, NeighborGraph |
| Vector Space | Analysis | ACL + Customer/Supplier | SearchResults, SimilarityEdge |
| Vector Space | Interpretation | ACL + Customer/Supplier | SearchResults, RetrievedNeighbor |
| Learning | Vector Space | Partnership | RefinedEmbedding (feedback loop) |
| Analysis | Interpretation | Customer/Supplier | Cluster, Motif, SequenceAnalysis |
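To make the Published Language rows concrete, here is a minimal sketch of the contract that Audio Ingestion (upstream) could publish for the Embedding context (downstream). The DTO and its field names are illustrative assumptions, not part of the ADR; only the constants (32 kHz, 5 s windows, 160,000 samples) come from the shared kernel.

```rust
// Hypothetical Published Language contract between Audio Ingestion
// (upstream owner) and Embedding (downstream consumer). Downstream
// depends only on this DTO, never on upstream internals.
#[derive(Clone, Debug, PartialEq)]
pub struct CallSegmentMessage {
    pub segment_id: u64,     // illustrative; real type is a UUID newtype
    pub recording_id: u64,
    pub sample_rate_hz: u32, // contract: 32_000 for Perch 2.0
    pub samples: Vec<f32>,   // contract: 5 s window = 160_000 samples
    pub quality: QualityGradeDto,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QualityGradeDto { Excellent, Good, Fair, Poor, Unusable }

impl CallSegmentMessage {
    /// Downstream validation: reject messages that violate the contract
    /// instead of letting malformed segments reach inference.
    pub fn is_valid(&self) -> bool {
        self.sample_rate_hz == 32_000 && self.samples.len() == 160_000
    }
}

fn main() {
    let msg = CallSegmentMessage {
        segment_id: 1,
        recording_id: 1,
        sample_rate_hz: 32_000,
        samples: vec![0.0; 160_000],
        quality: QualityGradeDto::Good,
    };
    // A well-formed message passes the contract check.
    assert!(msg.is_valid());
}
```

Validating at the boundary keeps the downstream context free of defensive checks deeper in its model.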
Anti-Corruption Layers
Learning Context ACL
/// Translates Vector Space concepts to Learning domain
mod learning_acl {
    use std::sync::Arc;
    use crate::vector_space::{VectorIndex, VectorId, IndexedVector, SimilarityEdge};
    use crate::learning::{TransitionGraph, GraphNode, GraphNodeId, GraphEdge, EdgeType};

    pub struct VectorSpaceAdapter {
        index: Arc<VectorIndex>,
    }

    impl VectorSpaceAdapter {
        /// Convert HNSW neighbor graph to GNN-compatible format
        pub fn to_transition_graph(&self, max_neighbors: u32) -> TransitionGraph {
            let nodes: Vec<GraphNode> = self.index
                .iter_vectors()
                .map(|v| GraphNode {
                    id: v.id.into(),
                    embedding: v.embedding_id,
                    features: self.extract_node_features(&v),
                })
                .collect();

            let edges: Vec<GraphEdge> = self.index
                .iter_similarity_edges()
                .filter(|e| e.distance < self.distance_threshold())
                .map(|e| GraphEdge {
                    source: e.source_id.into(),
                    target: e.target_id.into(),
                    edge_type: EdgeType::Similarity,
                    weight: 1.0 - e.distance, // Convert distance to similarity
                })
                .collect();

            TransitionGraph::new(nodes, edges)
        }

        /// Query neighbors without exposing HNSW internals
        pub fn get_trainable_neighbors(&self, vector_id: VectorId, k: u32)
            -> Vec<(GraphNodeId, f32)>
        {
            self.index
                .knn_search_by_id(vector_id, k)
                .map(|(vid, dist)| (vid.into(), 1.0 - dist))
                .collect()
        }
    }
}
Analysis Context ACL
/// Translates Vector Space results to Analysis domain
mod analysis_acl {
    use crate::shared_kernel::EmbeddingId;
    use crate::vector_space::{SearchResults, VectorIndex};
    use crate::analysis::{ClusterCandidate, SimilarityMatrix};

    pub struct SearchResultsAdapter;

    impl SearchResultsAdapter {
        /// Convert k-NN results to clustering input
        pub fn to_similarity_matrix(
            index: &VectorIndex,
            embeddings: &[EmbeddingId],
            k: u32,
        ) -> SimilarityMatrix {
            let n = embeddings.len();
            let mut matrix = SimilarityMatrix::new(n);
            for (i, emb_id) in embeddings.iter().enumerate() {
                let neighbors = index.knn_search_by_embedding_id(*emb_id, k);
                for (neighbor_id, distance) in neighbors {
                    if let Some(j) = embeddings.iter().position(|e| *e == neighbor_id) {
                        matrix.set(i, j, 1.0 - distance);
                    }
                }
            }
            matrix
        }

        /// Extract cluster candidates from dense regions
        pub fn identify_dense_regions(
            index: &VectorIndex,
            min_density: f32,
        ) -> Vec<ClusterCandidate> {
            index.iter_vectors()
                .filter_map(|v| {
                    let local_density = index.compute_local_density(v.id);
                    if local_density >= min_density {
                        Some(ClusterCandidate {
                            center_id: v.embedding_id,
                            density: local_density,
                            estimated_size: (local_density * 100.0) as u32,
                        })
                    } else {
                        None
                    }
                })
                .collect()
        }
    }
}
Interpretation Context ACL
/// Translates Analysis results to Interpretation domain
mod interpretation_acl {
    use std::collections::HashMap;
    use std::sync::Arc;
    use chrono::Utc;
    use crate::analysis::{Cluster, Motif, SequenceAnalysis, ClusterAssignment};
    use crate::interpretation::{
        EvidencePack, RetrievedNeighbor, Exemplar, SequenceContext,
    };

    pub struct AnalysisAdapter {
        clusters: Arc<HashMap<ClusterId, Cluster>>,
        motifs: Arc<HashMap<MotifId, Motif>>,
    }

    impl AnalysisAdapter {
        /// Build evidence pack from analysis artifacts
        pub fn build_evidence_pack(
            &self,
            query_segment: &CallSegment,
            neighbors: Vec<(SegmentId, f32)>,
            sequence: &SequenceAnalysis,
        ) -> EvidencePack {
            let retrieved_neighbors: Vec<RetrievedNeighbor> = neighbors
                .into_iter()
                .map(|(seg_id, distance)| {
                    let cluster_id = self.find_cluster_for_segment(seg_id);
                    RetrievedNeighbor {
                        segment_id: seg_id,
                        distance,
                        cluster_id,
                        spectrogram_thumbnail: self.generate_thumbnail(seg_id),
                        metadata: self.get_segment_metadata(seg_id),
                    }
                })
                .collect();

            let exemplars: Vec<Exemplar> = self.get_relevant_exemplars(
                &retrieved_neighbors,
                5, // top 5 exemplars
            );

            let sequence_context = SequenceContext {
                previous_segments: sequence.get_predecessors(query_segment.id, 3),
                next_segments: sequence.get_successors(query_segment.id, 3),
                position_in_recording: sequence.relative_position(query_segment.id),
                local_motifs: self.find_local_motifs(query_segment.id, sequence),
            };

            EvidencePack {
                id: EvidencePackId::new(),
                query_segment_id: query_segment.id,
                query_type: QueryType::Segment,
                retrieved_neighbors,
                cluster_exemplars: exemplars,
                sequence_context,
                signal_quality: self.assess_quality(query_segment),
                generated_at: Utc::now(),
            }
        }

        /// Convert cluster to citable evidence
        pub fn cluster_to_citation(&self, cluster_id: ClusterId) -> Citation {
            let cluster = self.clusters.get(&cluster_id)
                .expect("cluster not found");
            Citation {
                source_type: CitationSource::Cluster,
                source_id: cluster_id.to_string(),
                relevance: cluster.cohesion,
                excerpt: Some(format!(
                    "Cluster {} with {} members (cohesion: {:.2})",
                    cluster_id, cluster.member_count, cluster.cohesion
                )),
            }
        }
    }
}
Shared Kernel
The following types are shared across multiple contexts and form the ubiquitous language foundation:
/// Shared identifiers
pub mod shared_kernel {
    use chrono::{DateTime, Utc};
    use uuid::Uuid;

    // Core identifiers shared across all contexts
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct RecordingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct SegmentId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct EmbeddingId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct ClusterId(Uuid);

    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    pub struct MotifId(Uuid);

    // Shared value objects
    #[derive(Clone, Debug)]
    pub struct GeoLocation {
        pub latitude: f64,
        pub longitude: f64,
        pub altitude: Option<f32>,
    }

    #[derive(Clone, Debug)]
    pub struct TimeRange {
        pub start: DateTime<Utc>,
        pub end: DateTime<Utc>,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum QualityGrade {
        Excellent, // SNR > 20dB, no clipping
        Good,      // SNR > 10dB, minimal issues
        Fair,      // SNR > 5dB, some artifacts
        Poor,      // SNR < 5dB or significant issues
        Unusable,  // Too degraded for analysis
    }

    // Embedding vector type (1536-D for Perch 2.0)
    pub type EmbeddingVector = Vec<f32>;
    pub const EMBEDDING_DIM: usize = 1536;

    // Audio format constants for Perch 2.0
    pub const TARGET_SAMPLE_RATE: u32 = 32000;
    pub const TARGET_WINDOW_SECONDS: f32 = 5.0;
    pub const TARGET_WINDOW_SAMPLES: usize = 160000;
    pub const MEL_BINS: usize = 128;
    pub const MEL_FRAMES: usize = 500;
}
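Because `EmbeddingVector` is a plain `Vec<f32>` alias, nothing in the type system enforces the 1536-D invariant. A minimal sketch of one way to close that gap, assuming a newtype (`ValidatedEmbedding` is an illustrative name, not part of the shared kernel):

```rust
// Hypothetical newtype enforcing the 1536-D invariant at construction,
// so a malformed vector can never enter the Vector Space context.
pub const EMBEDDING_DIM: usize = 1536;

#[derive(Clone, Debug, PartialEq)]
pub struct ValidatedEmbedding(Vec<f32>);

impl ValidatedEmbedding {
    /// Accept only vectors with exactly EMBEDDING_DIM components.
    pub fn new(v: Vec<f32>) -> Result<Self, String> {
        if v.len() == EMBEDDING_DIM {
            Ok(Self(v))
        } else {
            Err(format!("expected {} dims, got {}", EMBEDDING_DIM, v.len()))
        }
    }

    /// Read-only access for index insertion and distance computation.
    pub fn as_slice(&self) -> &[f32] {
        &self.0
    }
}

fn main() {
    // Correct dimensionality is accepted; anything else is rejected.
    assert!(ValidatedEmbedding::new(vec![0.0; 1536]).is_ok());
    assert!(ValidatedEmbedding::new(vec![0.0; 128]).is_err());
}
```

The trade-off is a small conversion cost at the Embedding/Vector Space boundary in exchange for removing dimension checks everywhere downstream.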
Event Flow
Recording Upload
│
▼
┌──────────────────────────────────────────────────────────────┐
│ AUDIO INGESTION CONTEXT │
│ RecordingReceived → RecordingValidated → SegmentExtracted │
└──────────────────────────────────────────────────────────────┘
│ CallSegment (Published Language)
▼
┌──────────────────────────────────────────────────────────────┐
│ EMBEDDING CONTEXT │
│ EmbeddingRequested → InferenceStarted → EmbeddingGenerated │
└──────────────────────────────────────────────────────────────┘
│ Embedding (Published Language)
▼
┌──────────────────────────────────────────────────────────────┐
│ VECTOR SPACE CONTEXT │
│ VectorInserted → NeighborGraphUpdated → SimilarityEdgeCreated│
└──────────────────────────────────────────────────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ LEARNING │ │ ANALYSIS │ │ INTERPRETATION │
│ CONTEXT │ │ CONTEXT │ │ CONTEXT │
│ │ │ │ │ │
│ Training- │ │ Clustering- │ │ EvidencePack- │
│ Started │ │ Completed │ │ Assembled │
│ │ │ │ │ │ │ │ │
│ ▼ │ │ ▼ │ │ ▼ │
│ Embeddings- │ │ MotifDiscovered │ │ Interpretation- │
│ Refined │ │ │ │ Generated │
└─────────────┘ └─────────────────┘ └─────────────────────┘
│
│ RefinedEmbedding (feedback to Vector Space)
└──────────────────────────────────────────────────────────┐
▼
┌──────────────────┐
│ VECTOR SPACE │
│ (Index Update) │
└──────────────────┘
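The fan-out in the flow above can be expressed as a single routing function over the cross-context events. This is a sketch under the assumption that events are modeled as one enum; the variant payloads and the `downstream_contexts` helper are illustrative, not prescribed by the ADR.

```rust
// Hypothetical cross-context domain events from the flow diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum DomainEvent {
    SegmentExtracted { segment_id: u64 },
    EmbeddingGenerated { embedding_id: u64 },
    VectorInserted { vector_id: u64 },
    EmbeddingsRefined { count: u32 }, // feedback loop to Vector Space
}

/// Route an event to the contexts that consume it (names illustrative).
pub fn downstream_contexts(event: &DomainEvent) -> Vec<&'static str> {
    match event {
        DomainEvent::SegmentExtracted { .. } => vec!["embedding"],
        DomainEvent::EmbeddingGenerated { .. } => vec!["vector_space"],
        // VectorInserted fans out to all three analytical contexts.
        DomainEvent::VectorInserted { .. } =>
            vec!["learning", "analysis", "interpretation"],
        DomainEvent::EmbeddingsRefined { .. } => vec!["vector_space"],
    }
}

fn main() {
    let e = DomainEvent::VectorInserted { vector_id: 7 };
    assert_eq!(downstream_contexts(&e).len(), 3);
}
```

Centralizing the routing table keeps the fan-out auditable and makes it obvious when a new event type lacks a consumer.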
Consequences
Benefits
- Clear Ownership: Each bounded context has explicit responsibilities and can be developed by independent teams
- Reduced Coupling: Anti-corruption layers prevent domain model pollution across boundaries
- Testability: Each context can be tested in isolation with well-defined interfaces
- Scalability: Contexts can be deployed and scaled independently
- Evolvability: Internal implementations can change without affecting other contexts
- Domain Alignment: Ubiquitous language matches the bioacoustics domain
Risks
- Complexity: Six contexts introduce coordination overhead
- Data Duplication: Some data may be replicated across context boundaries
- Event Consistency: Eventual consistency between contexts requires careful handling
- Learning Curve: Team must understand DDD concepts and context boundaries
Mitigations
- Use event sourcing for cross-context communication
- Implement saga patterns for multi-context transactions
- Maintain comprehensive integration tests at context boundaries
- Document context mappings and keep them updated
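To make the saga mitigation concrete, here is a minimal sketch of a compensating-action saga: each step pairs a forward action with an undo, and a mid-pipeline failure rolls back completed steps in reverse order. The `SagaStep` shape and step names are assumptions for illustration only.

```rust
// Hypothetical saga runner for multi-context transactions.
struct SagaStep {
    name: &'static str,
    action: fn() -> Result<(), String>,
    compensate: fn(),
}

fn run_saga(steps: &[SagaStep]) -> Result<(), String> {
    let mut done: Vec<&SagaStep> = Vec::new();
    for step in steps {
        match (step.action)() {
            Ok(()) => done.push(step),
            Err(e) => {
                // Undo completed steps in reverse order before failing.
                for s in done.iter().rev() {
                    (s.compensate)();
                }
                return Err(format!("{} failed: {}", step.name, e));
            }
        }
    }
    Ok(())
}

fn main() {
    // A failing second step aborts the saga and triggers compensation.
    let steps = [
        SagaStep { name: "insert_vector", action: || Ok(()), compensate: || {} },
        SagaStep { name: "update_graph", action: || Err("timeout".into()), compensate: || {} },
    ];
    assert!(run_saga(&steps).is_err());
}
```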
References
- Evans, Eric. "Domain-Driven Design: Tackling Complexity in the Heart of Software" (2003)
- Vernon, Vaughn. "Implementing Domain-Driven Design" (2013)
- Perch 2.0 Paper: arXiv:2508.04665
- RuVector Documentation: https://github.com/ruvnet/ruvector
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-15 | Architecture Team | Initial ADR |