# ADR-001: System Architecture Overview

**Status:** Accepted

**Date:** 2026-01-15

**Decision Makers:** 7sense Architecture Team

**Technical Area:** System Architecture

---

## Context and Problem Statement

7sense aims to transform bioacoustic signals (primarily bird calls) into a navigable geometric space where meaningful structure emerges. The system must process audio recordings, generate high-dimensional embeddings using Perch 2.0 (1536-D vectors), organize them with HNSW indexing in RuVector, and apply GNN-based learning to surface patterns such as call types, motifs, and behavioral contexts.

The core challenge is designing an architecture that:

1. **Handles diverse data pipelines** - From raw 32 kHz audio to queryable vector embeddings
2. **Scales to millions of call segments** - Real-world bioacoustic monitoring generates vast datasets
3. **Supports scientific workflows** - Researchers need reproducibility, transparency, and evidence-backed interpretations (RAB pattern)
4. **Enables real-time and batch processing** - Field deployments require streaming; research requires bulk analysis
5. **Integrates ML inference efficiently** - ONNX-based Perch 2.0 inference in Rust for performance

### Current State

This is a greenfield project building upon:

- **Perch 2.0**: Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone, 1536-D output)
- **RuVector**: Rust-based vector database with HNSW indexing and self-learning GNN layers
- **RAB Pattern**: Retrieval-Augmented Bioacoustics for evidence-backed interpretation

---

## Decision Drivers

### Performance Requirements

- **Embedding generation**: Process 5-second audio segments at >100 segments/second
- **Vector search**: Sub-millisecond kNN queries on 1M+ vectors (HNSW target: ~100 µs)
- **Batch ingestion**: 1M vectors/minute build speed (RuVector baseline)
- **Memory efficiency**: Support 32x compression for cold data tiers

### Scalability Requirements

- **Data volume**: Support 10K to 10M+ call segments per deployment
- **Concurrent users**: Multiple researchers querying simultaneously
- **Geographic distribution**: Sensor networks across multiple sites
- **Temporal depth**: Years of historical recordings

### Scientific Rigor Requirements

- **Reproducibility**: Deterministic pipelines with versioned models and parameters
- **Transparency**: RAB-style evidence packs citing retrieved calls for any interpretation
- **Auditability**: Full provenance tracking from raw audio to conclusions
- **Validation**: Built-in verification against ground truth labels

### Operational Requirements

- **Deployment flexibility**: Edge (sensor), cloud, and hybrid deployments
- **Monitoring**: Health metrics, processing throughput, index quality
- **Updates**: Hot-swap embedding models without full reindexing
- **Recovery**: Graceful degradation and disaster recovery

---

## Considered Options

### Option A: Monolithic Architecture

A single application handling all concerns: audio processing, embedding generation, vector storage, GNN learning, API serving, and visualization.

**Pros:**

- Simplest deployment model
- No inter-service communication overhead
- Single codebase to maintain

**Cons:**

- Cannot scale components independently
- Single point of failure
- Difficult to update individual components
- Memory pressure from co-located ML models
- Not suitable for distributed sensor networks

### Option B: Microservices Architecture

Fully decomposed services: Audio Ingest Service, Embedding Service, Vector Store Service, GNN Learning Service, Query Service, Visualization Service, etc.

**Pros:**

- Independent scaling per service
- Technology flexibility per service
- Fault isolation
- Team parallelization

**Cons:**

- Significant operational complexity
- Network latency between services
- Data consistency challenges
- Overkill for initial team size
- Complex debugging across service boundaries

### Option C: Modular Monolith Architecture

A single deployable unit with clearly separated internal modules, designed for future extraction into services if needed.

**Pros:**

- Maintains deployment simplicity
- Clear module boundaries enable future splitting
- In-process communication for performance-critical paths
- Easier debugging and testing
- Appropriate for current team/project scale
- Can evolve toward microservices as needs emerge

**Cons:**

- Requires discipline to maintain module boundaries
- All modules share the same runtime resources
- Scaling requires scaling the entire application

---

## Decision Outcome

**Chosen Option: Option C - Modular Monolith Architecture**

We adopt a modular monolith architecture with clearly defined domain boundaries, designed with explicit seams that allow future extraction to services. This balances immediate development velocity with long-term architectural flexibility.

### Rationale

1. **Right-sized for current needs**: A small team building a new product benefits from deployment simplicity
2. **Performance-critical paths stay in-process**: The audio-to-embedding-to-index flow benefits from zero network hops
3. **Scientific workflow alignment**: Researchers prefer reproducible, debuggable systems over distributed complexity
4. **Evolution path preserved**: Module boundaries are designed as potential service boundaries
5. **RuVector integration**: RuVector is designed as an embeddable library, making monolith integration natural

---

## Technical Specifications

### Module Architecture

```
sevensense/
├── core/                  # Domain-agnostic foundations
│   ├── config/            # Configuration management
│   ├── error/             # Error types and handling
│   ├── telemetry/         # Logging, metrics, tracing
│   └── storage/           # Abstract storage interfaces
│
├── audio/                 # Audio Processing Domain
│   ├── ingest/            # Audio file reading, streaming
│   ├── segment/           # Call detection and segmentation
│   ├── features/          # Acoustic feature extraction
│   └── spectrogram/       # Mel spectrogram generation
│
├── embedding/             # Embedding Generation Domain
│   ├── perch/             # Perch 2.0 ONNX inference
│   ├── models/            # Model versioning and registry
│   ├── batch/             # Batch embedding pipelines
│   └── normalize/         # Vector normalization (L2, etc.)
│
├── vectordb/              # Vector Storage Domain (RuVector)
│   ├── index/             # HNSW index management
│   ├── graph/             # Graph structure (nodes, edges)
│   ├── query/             # Similarity search, Cypher queries
│   └── hyperbolic/        # Poincaré ball embeddings
│
├── learning/              # GNN Learning Domain
│   ├── gnn/               # GNN layers (GCN, GAT, GraphSAGE)
│   ├── attention/         # Attention mechanisms
│   ├── training/          # Self-supervised training loops
│   └── refinement/        # Embedding refinement pipelines
│
├── analysis/              # Analysis Domain
│   ├── clustering/        # HDBSCAN, prototype extraction
│   ├── sequence/          # Motif detection, transition analysis
│   ├── entropy/           # Sequence entropy metrics
│   └── validation/        # Ground truth comparison
│
├── rab/                   # Retrieval-Augmented Bioacoustics
│   ├── evidence/          # Evidence pack construction
│   ├── retrieval/         # Adaptive retrieval depth
│   ├── interpretation/    # Constrained interpretation generation
│   └── citation/          # Source attribution
│
├── api/                   # API Layer
│   ├── rest/              # REST endpoints
│   ├── graphql/           # GraphQL schema and resolvers
│   ├── websocket/         # Real-time streaming
│   └── grpc/              # gRPC for inter-service (future)
│
├── visualization/         # Visualization Domain
│   ├── projection/        # UMAP/t-SNE dimensionality reduction
│   ├── graph_viz/         # Network visualization
│   ├── spectrogram_viz/   # Spectrogram rendering
│   └── export/            # Export formats (JSON, PNG, etc.)
│
└── cli/                   # Command Line Interface
    ├── ingest/            # Batch ingestion commands
    ├── query/             # Query commands
    ├── train/             # Training commands
    └── export/            # Export commands
```

### Data Model

#### Core Entities (Graph Nodes)

```rust
use std::collections::HashMap;
use std::path::PathBuf;

use chrono::{DateTime, Utc};
use uuid::Uuid;

/// Raw audio recording from a sensor
struct Recording {
    id: Uuid,
    sensor_id: String,
    location: GeoPoint,              // lat, lon, elevation
    start_timestamp: DateTime<Utc>,
    duration_ms: u32,
    sample_rate: u32,                // 32000 Hz for Perch 2.0
    channels: u8,
    habitat: Option<String>,
    weather: Option<String>,
    file_path: PathBuf,
    checksum: String,                // SHA-256 for reproducibility
}

/// Detected call segment within a recording
struct CallSegment {
    id: Uuid,
    recording_id: Uuid,
    start_ms: u32,
    end_ms: u32,
    snr_db: f32,                     // Signal-to-noise ratio
    peak_frequency_hz: f32,
    energy: f32,
    detection_confidence: f32,
    detection_method: String,        // "energy_threshold", "whisper_seg", etc.
}

/// Embedding vector for a call segment
struct Embedding {
    id: Uuid,
    segment_id: Uuid,
    model_id: String,                // "perch2_v1.0"
    dimensions: u16,                 // 1536 for Perch 2.0
    vector: Vec<f32>,
    normalized: bool,
    created_at: DateTime<Utc>,
}

/// Cluster prototype (centroid of similar calls)
struct Prototype {
    id: Uuid,
    cluster_id: Uuid,
    centroid_vector: Vec<f32>,
    exemplar_ids: Vec<Uuid>,         // Representative segments
    member_count: u32,
    coherence_score: f32,
}

/// Cluster of similar call segments
struct Cluster {
    id: Uuid,
    method: String,                  // "hdbscan", "kmeans", etc.
    parameters: HashMap<String, String>,
    created_at: DateTime<Utc>,
    validation_score: Option<f32>,
}

/// Optional taxonomic reference
struct Taxon {
    id: Uuid,
    scientific_name: String,
    common_name: String,
    inat_id: Option<u64>,            // iNaturalist ID
    ebird_code: Option<String>,      // eBird species code
}
```

#### Relationships (Graph Edges)

```rust
/// Recording contains segments
edge HAS_SEGMENT: Recording -> CallSegment

/// Temporal sequence within recording
edge NEXT: CallSegment -> CallSegment {
    delta_ms: u32,        // Time gap between calls
}

/// Acoustic similarity from HNSW
edge SIMILAR: CallSegment -> CallSegment {
    distance: f32,        // Cosine or Euclidean
    rank: u8,             // kNN rank (1 = nearest)
}

/// Cluster membership
edge ASSIGNED_TO: CallSegment -> Cluster

/// Prototype ownership
edge HAS_PROTOTYPE: Cluster -> Prototype

/// Species identification (when available)
edge IDENTIFIED_AS: CallSegment -> Taxon {
    confidence: f32,
    method: String,       // "manual", "model", "consensus"
}
```

### Processing Pipeline

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             INGESTION PIPELINE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│      ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐       │
│      │   Audio   │───▶│  Segment  │───▶│    Mel    │───▶│ Perch 2.0 │       │
│      │   Input   │    │ Detection │    │Spectrogram│    │   ONNX    │       │
│      │(32kHz, 5s)│    │           │    │ (500x128) │    │           │       │
│      └───────────┘    └───────────┘    └───────────┘    └───────────┘       │
│            │                │                │                │             │
│            │                │                │                ▼             │
│            │                │                │          ┌───────────┐       │
│            │                │                │          │ Embedding │       │
│            │                │                │          │ (1536-D)  │       │
│            │                │                │          └───────────┘       │
│            │                │                │                │             │
└────────────┼────────────────┼────────────────┼────────────────┼─────────────┘
             │                │                │                │
             ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                STORAGE LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│       ┌─────────────────────────────────────────────────────────────┐       │
│       │                          RuVector                           │       │
│       │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │       │
│       │  │    HNSW     │  │    Graph    │  │   Metadata Store    │  │       │
│       │  │    Index    │  │    Store    │  │    (Recordings,     │  │       │
│       │  │             │  │   (Edges)   │  │    Segments, etc.)  │  │       │
│       │  └─────────────┘  └─────────────┘  └─────────────────────┘  │       │
│       └─────────────────────────────────────────────────────────────┘       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
             │                │                │                │
             ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               LEARNING LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│          ┌──────────────┐    ┌──────────────┐    ┌──────────────┐           │
│          │     GNN      │    │  Attention   │    │  Hyperbolic  │           │
│          │   Reranker   │───▶│    Layers    │───▶│  Refinement  │           │
│          │(GCN/GAT/SAGE)│    │              │    │  (Poincaré)  │           │
│          └──────────────┘    └──────────────┘    └──────────────┘           │
│                 │                   │                   │                   │
│                 └───────────────────┼───────────────────┘                   │
│                                     ▼                                       │
│                              ┌──────────────┐                               │
│                              │   Refined    │                               │
│                              │  Embeddings  │                               │
│                              └──────────────┘                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
             │                │                │                │
             ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               ANALYSIS LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │Clustering │  │  Sequence │  │  Anomaly  │  │  Entropy  │  │    RAB    │  │
│  │ (HDBSCAN) │  │   Mining  │  │ Detection │  │  Metrics  │  │ Evidence  │  │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘  └───────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             API / PRESENTATION                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │   REST    │  │  GraphQL  │  │ WebSocket │  │    CLI    │  │   WASM    │  │
│  │    API    │  │    API    │  │(Streaming)│  │           │  │ (Browser) │  │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘  └───────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
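
The first pipeline stages fix a few numbers that every downstream module depends on. The Rust sketch below works through them under stated assumptions: it replaces the energy-based call detector with a fixed-length sliding window driven by the `segment_duration_ms` (5000) and `segment_overlap_ms` (500) values in `sevensense.yaml`; it assumes the 500 × 128 mel spectrogram spans exactly one 5-second window, which implies a 320-sample (10 ms) frame hop at 32 kHz; and it assumes `normalize: true` means L2 normalization, under which cosine distance reduces to one minus the dot product. The names `fixed_windows`, `samples_per_window`, `l2_normalize`, and `cosine_distance_unit` are hypothetical and are not part of 7sense or RuVector.

```rust
// Hypothetical sketch of the numeric contract between the first pipeline
// stages. None of these functions are part of 7sense or RuVector.

/// A fixed-length analysis window within a recording, in milliseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Window {
    start_ms: u32,
    end_ms: u32,
}

/// Fixed-length sliding windows that advance by `segment_ms - overlap_ms`.
/// Windows that would run past the end of the recording are dropped.
/// This stands in for the real call detector, which is not modeled here.
fn fixed_windows(duration_ms: u32, segment_ms: u32, overlap_ms: u32) -> Vec<Window> {
    assert!(overlap_ms < segment_ms, "overlap must be shorter than the segment");
    let hop_ms = segment_ms - overlap_ms;
    let mut windows = Vec::new();
    let mut start: u32 = 0;
    while start + segment_ms <= duration_ms {
        windows.push(Window { start_ms: start, end_ms: start + segment_ms });
        start += hop_ms;
    }
    windows
}

/// Number of PCM samples in one window at the given sample rate.
fn samples_per_window(sample_rate_hz: u32, segment_ms: u32) -> u64 {
    u64::from(sample_rate_hz) * u64::from(segment_ms) / 1000
}

/// Scales `v` to unit L2 norm in place; a zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}

/// Cosine distance for vectors that are already L2-normalized: 1 - dot(a, b).
fn cosine_distance_unit(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    1.0 - dot
}

fn main() {
    // Values taken from the `audio` section of sevensense.yaml.
    let (sample_rate_hz, segment_ms, overlap_ms) = (32_000, 5_000, 500);

    let samples = samples_per_window(sample_rate_hz, segment_ms);
    // Assumes the 500 spectrogram frames span the whole window.
    let hop_samples = samples / 500;
    println!("{samples} samples per window, {hop_samples}-sample frame hop");

    for w in fixed_windows(14_000, segment_ms, overlap_ms) {
        println!("segment {}..{} ms", w.start_ms, w.end_ms);
    }

    // Toy 2-D vectors stand in for 1536-D Perch 2.0 embeddings.
    let mut a = vec![3.0_f32, 4.0];
    let mut b = vec![4.0_f32, 3.0];
    l2_normalize(&mut a);
    l2_normalize(&mut b);
    println!("cosine distance = {:.3}", cosine_distance_unit(&a, &b));
}
```

With these settings consecutive segments start 4.5 s apart, so a 14-second recording yields three full windows; a production segmenter would also need a policy for the trailing partial window. Normalizing once at embedding time lets the index compare vectors with a plain dot product.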

### Key Interfaces Between Modules

```rust
use uuid::Uuid;

// `Result<T>` is assumed to be the crate-wide alias defined in `core/error`.

// Audio -> Embedding interface
trait AudioEmbedder {
    fn embed_segment(&self, audio: &AudioSegment) -> Result<Embedding>;
    fn embed_batch(&self, segments: &[AudioSegment]) -> Result<Vec<Embedding>>;
    fn model_info(&self) -> ModelInfo;
}

// Embedding -> VectorDB interface
trait VectorStore {
    fn insert(&mut self, embedding: &Embedding) -> Result<()>;
    /// (segment id, distance) pairs, nearest first
    fn search_knn(&self, query: &[f32], k: usize) -> Result<Vec<(Uuid, f32)>>;
    fn get_neighbors(&self, id: Uuid) -> Result<Vec<(Uuid, f32)>>;
    /// Number of SIMILAR edges created
    fn build_similarity_edges(&mut self, k: usize) -> Result<usize>;
}

// VectorDB -> Learning interface
trait GraphLearner {
    /// Training loss for this step
    fn train_step(&mut self, graph: &Graph) -> Result<f32>;
    fn refine_embeddings(&self, embeddings: &mut [Embedding]) -> Result<()>;
    fn attention_weights(&self, node_id: Uuid) -> Result<Vec<(Uuid, f32)>>;
}

// Learning -> Analysis interface
trait PatternAnalyzer {
    fn cluster(&self, embeddings: &[Embedding]) -> Result<Vec<Cluster>>;
    fn find_motifs(&self, sequences: &[Sequence]) -> Result<Vec<Motif>>;
    fn compute_entropy(&self, transitions: &TransitionMatrix) -> f32;
}

// Analysis -> RAB interface
trait EvidenceBuilder {
    fn build_pack(&self, query: &Query) -> Result<EvidencePack>;
    fn generate_interpretation(&self, pack: &EvidencePack) -> Result<Interpretation>;
    fn cite_sources(&self, interpretation: &Interpretation) -> Vec<Citation>;
}
```

### Configuration Structure

```yaml
# sevensense.yaml
sevensense:
  # Audio processing settings
  audio:
    sample_rate: 32000            # Perch 2.0 requirement
    segment_duration_ms: 5000     # 5 seconds
    segment_overlap_ms: 500       # Overlap for continuity
    min_snr_db: 10.0              # Minimum signal-to-noise ratio
    detection_method: "energy"    # or "whisper_seg", "tweety"

  # Embedding generation
  embedding:
    model: "perch2_v1.0"
    onnx_path: "./models/perch2.onnx"
    dimensions: 1536
    normalize: true
    batch_size: 32

  # Vector database (RuVector)
  vectordb:
    index_type: "hnsw"
    hnsw:
      m: 16                       # Connections per node
      ef_construction: 200        # Build-time search width
      ef_search: 100              # Query-time search width
    distance_metric: "cosine"     # or "euclidean", "poincare"
    enable_hyperbolic: false      # Experimental
    compression:
      hot_tier: "none"
      warm_tier: "pq_8"           # Product quantization
      cold_tier: "pq_4"           # Aggressive compression

  # GNN learning
  learning:
    enabled: true
    gnn_type: "gat"               # GCN, GAT, or GraphSAGE
    hidden_dim: 256
    num_layers: 2
    attention_heads: 4
    learning_rate: 0.001
    training_interval_hours: 24

  # Analysis settings
  analysis:
    clustering:
      method: "hdbscan"
      min_cluster_size: 10
      min_samples: 5
    sequence:
      max_gap_ms: 2000            # Max silence between calls
      min_motif_length: 3

  # RAB settings
  rab:
    retrieval_k: 10               # Neighbors to retrieve
    min_confidence: 0.7
    cite_exemplars: true

  # API settings
  api:
    host: "0.0.0.0"
    port: 8080
    enable_graphql: true
    enable_websocket: true
    cors_origins: ["*"]

  # Telemetry
  telemetry:
    log_level: "info"
    metrics_port: 9090
    tracing_enabled: true
    tracing_endpoint: "http://localhost:4317"
```

---

## Consequences

### Positive Consequences

1. **Development velocity**: A single deployable unit simplifies CI/CD and local development
2. **Performance**: The critical audio-to-index path has zero network overhead
3. **Debugging**: Stack traces span the entire flow; no distributed tracing is required initially
4. **Testing**: Integration tests run in-process without container orchestration
5. **Scientific reproducibility**: A single binary with pinned dependencies ensures consistent results
6. **Resource efficiency**: Memory pools and caches are shared across modules
7. **Evolution path**: Clear module boundaries allow extraction to services when justified

### Negative Consequences

1. **Scaling limitations**: Embedding generation cannot be scaled independently of query serving
2. **Deployment coupling**: Updates to any module require a full redeployment
3. **Resource contention**: GNN training may compete with query serving for CPU and memory
4. **Technology constraints**: All modules must work within the Rust ecosystem (mitigated by FFI)

### Mitigation Strategies

| Risk | Mitigation |
|------|------------|
| Scaling limitations | Design async job queues that could become external workers |
| Deployment coupling | Blue-green deployments with health checks |
| Resource contention | Configurable resource limits per module; background training scheduling |
| Technology constraints | ONNX runtime for ML; FFI bindings for specialized libraries |

---

## Related Decisions

- **ADR-002**: Perch 2.0 Integration Strategy (ONNX vs. birdnet-onnx crate)
- **ADR-003**: HNSW vs. Hyperbolic Space Configuration
- **ADR-004**: GNN Training Strategy (Online vs. Batch)
- **ADR-005**: RAB Evidence Pack Schema
- **ADR-006**: API Design (REST/GraphQL/gRPC)

---

## Compliance and Standards

### Scientific Standards

- All embeddings include model version and parameters for reproducibility
- Evidence packs include full retrieval citations per RAB methodology
- Validation metrics align with published benchmarks (V-measure, silhouette scores)

### Data Standards

- Audio metadata follows Darwin Core / TDWG standards where applicable
- Taxonomic references link to iNaturalist and eBird identifiers
- Geospatial data uses WGS84 coordinates

### Security Considerations

- No PII in bioacoustic data (sensor IDs are pseudonymous)
- API authentication via JWT tokens
- Audit logging for all data modifications

---

## References

1. Perch 2.0 paper: "The Bittern Lesson for Bioacoustics" (arXiv:2508.04665)
2. RuVector documentation: https://github.com/ruvnet/ruvector
3. HNSW paper: Malkov, Y. A., and Yashunin, D. A. "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs" (IEEE TPAMI)
4. RAB pattern: Retrieval-Augmented Bioacoustics methodology
5. AVN deep learning study: "A deep learning approach for the analysis of birdsong" (eLife 2025)

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-15 | 7sense Architecture Team | Initial version |