ADR-001: System Architecture Overview
Status: Accepted
Date: 2026-01-15
Decision Makers: 7sense Architecture Team
Technical Area: System Architecture
Context and Problem Statement
7sense aims to transform bioacoustic signals (primarily bird calls) into a navigable geometric space where meaningful structure emerges. The system must process audio recordings, generate high-dimensional embeddings using Perch 2.0 (1536-D vectors), organize them with HNSW indexing in RuVector, and apply GNN-based learning to surface patterns such as call types, motifs, and behavioral contexts.
The core challenge is designing an architecture that:
- Handles diverse data pipelines: from raw 32 kHz audio to queryable vector embeddings
- Scales to millions of call segments: real-world bioacoustic monitoring generates vast datasets
- Supports scientific workflows: researchers need reproducibility, transparency, and evidence-backed interpretations (RAB pattern)
- Enables real-time and batch processing: field deployments require streaming; research requires bulk analysis
- Integrates ML inference efficiently: ONNX-based Perch 2.0 inference in Rust for performance
Current State
This is a greenfield project building upon:
- Perch 2.0: Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone, 1536-D output)
- RuVector: Rust-based vector database with HNSW indexing and self-learning GNN layers
- RAB Pattern: Retrieval-Augmented Bioacoustics for evidence-backed interpretation
Decision Drivers
Performance Requirements
- Embedding generation: Process 5-second audio segments at >100 segments/second
- Vector search: Sub-millisecond kNN queries on 1M+ vectors (HNSW target: ~100 µs per query)
- Batch ingestion: 1M vectors/minute build speed (RuVector baseline)
- Memory efficiency: Support 32x compression for cold data tiers
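A back-of-envelope check helps put these targets in perspective. The sketch below computes the raw payload of 1536-D `f32` vectors, the effect of a uniform 32x reduction, and the real-time factor implied by the embedding throughput target. It counts vector payload only (HNSW links and metadata are ignored), ignores segment overlap, and the function names are illustrative rather than part of RuVector.

```rust
/// Bytes needed to store `count` raw f32 vectors of `dims` dimensions
/// (vector payload only; HNSW links and metadata are not counted).
fn raw_vector_bytes(count: u64, dims: u64) -> u64 {
    count * dims * std::mem::size_of::<f32>() as u64
}

/// Bytes after applying a uniform compression ratio (e.g. 32 for the cold tier).
fn compressed_bytes(raw: u64, ratio: u64) -> u64 {
    raw / ratio
}

/// Seconds of audio processed per wall-clock second at a given throughput,
/// assuming non-overlapping segments.
fn realtime_factor(segments_per_sec: u64, segment_secs: u64) -> u64 {
    segments_per_sec * segment_secs
}
```

Under these assumptions a single vector takes 6,144 bytes, 1M vectors take about 6.1 GB uncompressed, a 32x tier stores each vector in 192 bytes, and 100 segments/second of 5-second audio corresponds to 500x real time.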
Scalability Requirements
- Data volume: Support 10K to 10M+ call segments per deployment
- Concurrent users: Multiple researchers querying simultaneously
- Geographic distribution: Sensor networks across multiple sites
- Temporal depth: Years of historical recordings
Scientific Rigor Requirements
- Reproducibility: Deterministic pipelines with versioned models and parameters
- Transparency: RAB-style evidence packs citing retrieved calls for any interpretation
- Auditability: Full provenance tracking from raw audio to conclusions
- Validation: Built-in verification against ground truth labels
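To make the transparency requirement concrete, here is a minimal sketch of an evidence pack that ranks retrieved calls and blocks an interpretation when too few calls can be cited. The `EvidencePack` and `Citation` shapes, the `u64` ids, and the `min_citations` rule are illustrative assumptions; ADR-005 owns the actual schema.

```rust
/// One retrieved call cited as evidence (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Citation {
    segment_id: u64, // stands in for the Uuid used in the data model
    distance: f32,   // distance from the query embedding
    rank: usize,     // 1 = nearest
}

/// Everything needed to reproduce and audit one retrieval.
#[derive(Debug)]
struct EvidencePack {
    query_segment_id: u64,
    model_id: String, // recorded so the retrieval can be repeated
    citations: Vec<Citation>,
}

/// Build a pack from (segment_id, distance) hits, keeping the `k` nearest
/// and ranking them by ascending distance.
fn build_pack(
    query_segment_id: u64,
    model_id: &str,
    hits: &[(u64, f32)],
    k: usize,
) -> EvidencePack {
    let mut sorted: Vec<(u64, f32)> = hits.to_vec();
    sorted.sort_by(|a, b| a.1.total_cmp(&b.1));
    let citations = sorted
        .into_iter()
        .take(k)
        .enumerate()
        .map(|(i, (segment_id, distance))| Citation {
            segment_id,
            distance,
            rank: i + 1,
        })
        .collect();
    EvidencePack {
        query_segment_id,
        model_id: model_id.to_string(),
        citations,
    }
}

/// An interpretation is only allowed when it can cite enough retrieved calls.
fn can_interpret(pack: &EvidencePack, min_citations: usize) -> bool {
    pack.citations.len() >= min_citations
}
```

Recording the model identifier alongside the citations is what ties a conclusion back to a specific, versioned retrieval.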
Operational Requirements
- Deployment flexibility: Edge (sensor), cloud, and hybrid deployments
- Monitoring: Health metrics, processing throughput, index quality
- Updates: Hot-swap embedding models without full reindexing
- Recovery: Graceful degradation and disaster recovery
Considered Options
Option A: Monolithic Architecture
A single application handling all concerns: audio processing, embedding generation, vector storage, GNN learning, API serving, and visualization.
Pros:
- Simplest deployment model
- No inter-service communication overhead
- Single codebase to maintain
Cons:
- Cannot scale components independently
- Single point of failure
- Difficult to update individual components
- Memory pressure from co-located ML models
- Not suitable for distributed sensor networks
Option B: Microservices Architecture
Fully decomposed services: Audio Ingest Service, Embedding Service, Vector Store Service, GNN Learning Service, Query Service, Visualization Service, etc.
Pros:
- Independent scaling per service
- Technology flexibility per service
- Fault isolation
- Team parallelization
Cons:
- Significant operational complexity
- Network latency between services
- Data consistency challenges
- Overkill for initial team size
- Complex debugging across service boundaries
Option C: Modular Monolith Architecture
A single deployable unit with clearly separated internal modules, designed for future extraction into services if needed.
Pros:
- Maintains deployment simplicity
- Clear module boundaries enable future splitting
- In-process communication for performance-critical paths
- Easier debugging and testing
- Appropriate for current team/project scale
- Can evolve toward microservices as needs emerge
Cons:
- Requires discipline to maintain module boundaries
- All modules share the same runtime resources
- Scaling requires scaling the entire application
Decision Outcome
Chosen Option: Option C - Modular Monolith Architecture
We adopt a modular monolith architecture with clearly defined domain boundaries, designed with explicit seams that allow future extraction to services. This balances immediate development velocity with long-term architectural flexibility.
Rationale
- Right-sized for current needs: A small team building a new product benefits from deployment simplicity
- Performance-critical paths stay in-process: Audio-to-embedding-to-index flow benefits from zero network hops
- Scientific workflow alignment: Researchers prefer reproducible, debuggable systems over distributed complexity
- Evolution path preserved: Module boundaries are designed as potential service boundaries
- RuVector integration: RuVector is designed as an embeddable library, making monolith integration natural
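The idea of module boundaries doubling as potential service boundaries can be sketched as a trait seam. In the example below, calling code depends only on a narrow trait, so an in-process store could later be replaced by a remote client without touching callers. The trait signature is deliberately simplified, ids are `u64`, and the brute-force `InMemoryStore` is a hypothetical stand-in, not RuVector's API.

```rust
use std::collections::HashMap;

/// Narrow storage interface: callers depend on this trait, not on a concrete store.
trait VectorStore {
    fn insert(&mut self, id: u64, vector: Vec<f32>);
    fn search_knn(&self, query: &[f32], k: usize) -> Vec<(u64, f32)>;
}

/// In-process implementation: brute-force scan over a HashMap.
/// A real deployment would wrap the embedded index here instead.
#[derive(Default)]
struct InMemoryStore {
    vectors: HashMap<u64, Vec<f32>>,
}

fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl VectorStore for InMemoryStore {
    fn insert(&mut self, id: u64, vector: Vec<f32>) {
        self.vectors.insert(id, vector);
    }

    fn search_knn(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut hits: Vec<(u64, f32)> = self
            .vectors
            .iter()
            .map(|(id, v)| (*id, squared_euclidean(query, v)))
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}

/// Module code is written against the trait, so swapping the in-process
/// store for a remote client later would not change this function.
fn nearest_id(store: &dyn VectorStore, query: &[f32]) -> Option<u64> {
    store.search_knn(query, 1).first().map(|(id, _)| *id)
}
```

The cost of this discipline is one level of indirection per boundary; the benefit is that extraction to a service becomes a change of implementation rather than a change of interface.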
Technical Specifications
Module Architecture
sevensense/
├── core/                   # Domain-agnostic foundations
│   ├── config/             # Configuration management
│   ├── error/              # Error types and handling
│   ├── telemetry/          # Logging, metrics, tracing
│   └── storage/            # Abstract storage interfaces
│
├── audio/                  # Audio Processing Domain
│   ├── ingest/             # Audio file reading, streaming
│   ├── segment/            # Call detection and segmentation
│   ├── features/           # Acoustic feature extraction
│   └── spectrogram/        # Mel spectrogram generation
│
├── embedding/              # Embedding Generation Domain
│   ├── perch/              # Perch 2.0 ONNX inference
│   ├── models/             # Model versioning and registry
│   ├── batch/              # Batch embedding pipelines
│   └── normalize/          # Vector normalization (L2, etc.)
│
├── vectordb/               # Vector Storage Domain (RuVector)
│   ├── index/              # HNSW index management
│   ├── graph/              # Graph structure (nodes, edges)
│   ├── query/              # Similarity search, Cypher queries
│   └── hyperbolic/         # Poincare ball embeddings
│
├── learning/               # GNN Learning Domain
│   ├── gnn/                # GNN layers (GCN, GAT, GraphSAGE)
│   ├── attention/          # Attention mechanisms
│   ├── training/           # Self-supervised training loops
│   └── refinement/         # Embedding refinement pipelines
│
├── analysis/               # Analysis Domain
│   ├── clustering/         # HDBSCAN, prototype extraction
│   ├── sequence/           # Motif detection, transition analysis
│   ├── entropy/            # Sequence entropy metrics
│   └── validation/         # Ground truth comparison
│
├── rab/                    # Retrieval-Augmented Bioacoustics
│   ├── evidence/           # Evidence pack construction
│   ├── retrieval/          # Adaptive retrieval depth
│   ├── interpretation/     # Constrained interpretation generation
│   └── citation/           # Source attribution
│
├── api/                    # API Layer
│   ├── rest/               # REST endpoints
│   ├── graphql/            # GraphQL schema and resolvers
│   ├── websocket/          # Real-time streaming
│   └── grpc/               # gRPC for inter-service (future)
│
├── visualization/          # Visualization Domain
│   ├── projection/         # UMAP/t-SNE dimensionality reduction
│   ├── graph_viz/          # Network visualization
│   ├── spectrogram_viz/    # Spectrogram rendering
│   └── export/             # Export formats (JSON, PNG, etc.)
│
└── cli/                    # Command Line Interface
    ├── ingest/             # Batch ingestion commands
    ├── query/              # Query commands
    ├── train/              # Training commands
    └── export/             # Export commands
Data Model
Core Entities (Graph Nodes)
/// Raw audio recording from a sensor
struct Recording {
    id: Uuid,
    sensor_id: String,
    location: GeoPoint,          // lat, lon, elevation
    start_timestamp: DateTime,
    duration_ms: u32,
    sample_rate: u32,            // 32000 Hz for Perch 2.0
    channels: u8,
    habitat: Option<String>,
    weather: Option<WeatherData>,
    file_path: PathBuf,
    checksum: String,            // SHA-256 for reproducibility
}

/// Detected call segment within a recording
struct CallSegment {
    id: Uuid,
    recording_id: Uuid,
    start_ms: u32,
    end_ms: u32,
    snr_db: f32,                 // Signal-to-noise ratio
    peak_frequency_hz: f32,
    energy: f32,
    detection_confidence: f32,
    detection_method: String,    // "energy_threshold", "whisper_seg", etc.
}

/// Embedding vector for a call segment
struct Embedding {
    id: Uuid,
    segment_id: Uuid,
    model_id: String,            // "perch2_v1.0"
    dimensions: u16,             // 1536 for Perch 2.0
    vector: Vec<f32>,
    normalized: bool,
    created_at: DateTime,
}

/// Cluster prototype (centroid of similar calls)
struct Prototype {
    id: Uuid,
    cluster_id: Uuid,
    centroid_vector: Vec<f32>,
    exemplar_ids: Vec<Uuid>,     // Representative segments
    member_count: u32,
    coherence_score: f32,
}

/// Cluster of similar call segments
struct Cluster {
    id: Uuid,
    method: String,              // "hdbscan", "kmeans", etc.
    parameters: HashMap<String, Value>,
    created_at: DateTime,
    validation_score: Option<f32>,
}

/// Optional taxonomic reference
struct Taxon {
    id: Uuid,
    scientific_name: String,
    common_name: String,
    inat_id: Option<u64>,        // iNaturalist ID
    ebird_code: Option<String>,  // eBird species code
}
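The `normalized` flag on `Embedding` matters because cosine distance reduces to a dot product once vectors have unit length. A minimal sketch of that step follows; the function names are illustrative, and returning `false` for a zero vector is one possible policy, not a documented behavior of the system.

```rust
/// Scale a vector to unit L2 norm in place. Returns false (and leaves the
/// vector untouched) when the norm is zero, so callers can skip such inputs.
fn l2_normalize(v: &mut [f32]) -> bool {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return false;
    }
    for x in v.iter_mut() {
        *x /= norm;
    }
    true
}

/// Cosine distance for vectors that are already unit length: 1 - dot(a, b).
fn cosine_distance_unit(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    1.0 - a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}
```

Normalizing once at write time keeps the per-query cost to a single dot product per candidate.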
Relationships (Graph Edges)
/// Recording contains segments
edge HAS_SEGMENT: Recording -> CallSegment

/// Temporal sequence within recording
edge NEXT: CallSegment -> CallSegment {
    delta_ms: u32,   // Time gap between calls
}

/// Acoustic similarity from HNSW
edge SIMILAR: CallSegment -> CallSegment {
    distance: f32,   // Cosine or Euclidean
    rank: u8,        // kNN rank (1 = nearest)
}

/// Cluster membership
edge ASSIGNED_TO: CallSegment -> Cluster

/// Prototype ownership
edge HAS_PROTOTYPE: Cluster -> Prototype

/// Species identification (when available)
edge IDENTIFIED_AS: CallSegment -> Taxon {
    confidence: f32,
    method: String,  // "manual", "model", "consensus"
}
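As an illustration of how `NEXT` edges might be derived, the sketch below orders the segments of one recording by start time and links each to its successor, recording the silence between them as `delta_ms`. The `Segment` and `NextEdge` types are simplified stand-ins with `u64` ids, and cutting sequences at a maximum gap is an assumption about how a gap threshold would be applied.

```rust
/// Minimal stand-in for CallSegment (ids are u64 rather than Uuid for brevity).
#[derive(Debug, Clone, Copy)]
struct Segment {
    id: u64,
    start_ms: u32,
    end_ms: u32,
}

/// A NEXT edge between consecutive calls in one recording.
#[derive(Debug, PartialEq)]
struct NextEdge {
    from: u64,
    to: u64,
    delta_ms: u32, // silence between the end of `from` and the start of `to`
}

/// Link each segment to its successor in time. Pairs separated by more than
/// `max_gap_ms` are not linked, which splits the recording into sequences.
/// Overlapping segments get a gap of 0.
fn build_next_edges(segments: &[Segment], max_gap_ms: u32) -> Vec<NextEdge> {
    let mut ordered = segments.to_vec();
    ordered.sort_by_key(|s| s.start_ms);
    ordered
        .windows(2)
        .filter_map(|pair| {
            let delta_ms = pair[1].start_ms.saturating_sub(pair[0].end_ms);
            (delta_ms <= max_gap_ms).then(|| NextEdge {
                from: pair[0].id,
                to: pair[1].id,
                delta_ms,
            })
        })
        .collect()
}
```

With a 2,000 ms threshold, calls at 0 to 800 ms and 1,200 to 2,000 ms are linked with `delta_ms = 400`, while a call starting at 9,000 ms begins a new sequence.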
Processing Pipeline
┌─────────────────────────────────────────────────────────────────────────┐
│                           INGESTION PIPELINE                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐      │
│   │   Audio   │───▶│  Segment  │───▶│    Mel    │───▶│ Perch 2.0 │      │
│   │   Input   │    │ Detection │    │Spectrogram│    │   ONNX    │      │
│   │(32kHz, 5s)│    │           │    │ (500x128) │    │           │      │
│   └─────┬─────┘    └─────┬─────┘    └─────┬─────┘    └─────┬─────┘      │
│         │                │                │                ▼            │
│         │                │                │          ┌───────────┐      │
│         │                │                │          │ Embedding │      │
│         │                │                │          │ (1536-D)  │      │
│         │                │                │          └─────┬─────┘      │
│         │                │                │                │            │
└─────────┼────────────────┼────────────────┼────────────────┼────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              STORAGE LAYER                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│    ┌───────────────────────────────────────────────────────────────┐    │
│    │                           RuVector                            │    │
│    │  ┌─────────────┐   ┌─────────────┐   ┌─────────────────────┐  │    │
│    │  │    HNSW     │   │    Graph    │   │   Metadata Store    │  │    │
│    │  │    Index    │   │    Store    │   │    (Recordings,     │  │    │
│    │  │             │   │   (Edges)   │   │   Segments, etc.)   │  │    │
│    │  └─────────────┘   └─────────────┘   └─────────────────────┘  │    │
│    └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             LEARNING LAYER                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│        ┌──────────────┐    ┌──────────────┐    ┌──────────────┐         │
│        │     GNN      │───▶│  Attention   │───▶│  Hyperbolic  │         │
│        │   Reranker   │    │    Layers    │    │  Refinement  │         │
│        │(GCN/GAT/SAGE)│    │              │    │  (Poincare)  │         │
│        └───────┬──────┘    └───────┬──────┘    └───────┬──────┘         │
│                └───────────────────┼───────────────────┘                │
│                                    ▼                                    │
│                            ┌──────────────┐                             │
│                            │   Refined    │                             │
│                            │  Embeddings  │                             │
│                            └──────────────┘                             │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                             ANALYSIS LAYER                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐  │
│  │Clustering │ │ Sequence  │ │  Anomaly  │ │  Entropy  │ │    RAB    │  │
│  │ (HDBSCAN) │ │  Mining   │ │ Detection │ │  Metrics  │ │ Evidence  │  │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           API / PRESENTATION                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐  │
│  │   REST    │ │  GraphQL  │ │ WebSocket │ │    CLI    │ │   WASM    │  │
│  │    API    │ │    API    │ │(Streaming)│ │           │ │ (Browser) │  │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Key Interfaces Between Modules
// Audio -> Embedding interface
trait AudioEmbedder {
    fn embed_segment(&self, audio: &AudioSegment) -> Result<Embedding>;
    fn embed_batch(&self, segments: &[AudioSegment]) -> Result<Vec<Embedding>>;
    fn model_info(&self) -> ModelInfo;
}

// Embedding -> VectorDB interface
trait VectorStore {
    fn insert(&mut self, embedding: &Embedding) -> Result<()>;
    fn search_knn(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    fn get_neighbors(&self, id: Uuid) -> Result<Vec<Neighbor>>;
    fn build_similarity_edges(&mut self, k: usize) -> Result<usize>;
}

// VectorDB -> Learning interface
trait GraphLearner {
    fn train_step(&mut self, graph: &Graph) -> Result<TrainMetrics>;
    fn refine_embeddings(&self, embeddings: &mut [Embedding]) -> Result<()>;
    fn attention_weights(&self, node_id: Uuid) -> Result<Vec<(Uuid, f32)>>;
}

// Learning -> Analysis interface
trait PatternAnalyzer {
    fn cluster(&self, embeddings: &[Embedding]) -> Result<Vec<Cluster>>;
    fn find_motifs(&self, sequences: &[Sequence]) -> Result<Vec<Motif>>;
    fn compute_entropy(&self, transitions: &TransitionMatrix) -> f32;
}

// Analysis -> RAB interface
trait EvidenceBuilder {
    fn build_pack(&self, query: &Query) -> Result<EvidencePack>;
    fn generate_interpretation(&self, pack: &EvidencePack) -> Result<Interpretation>;
    fn cite_sources(&self, interpretation: &Interpretation) -> Vec<Citation>;
}
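The `compute_entropy` method leaves the exact metric open. One plausible reading, sketched below, is the conditional entropy of the next call type given the current one, computed from transition counts. The nested-vector representation is a hypothetical stand-in for `TransitionMatrix`, and the choice of bits as the unit is an assumption.

```rust
/// Conditional entropy H(next | current) in bits.
/// `counts[i][j]` is the number of times call type j followed call type i.
/// Each row is weighted by how often its call type occurs as a predecessor.
/// 0.0 means fully predictable sequences; an empty matrix also yields 0.0.
fn transition_entropy_bits(counts: &[Vec<u32>]) -> f32 {
    let total: u32 = counts.iter().flatten().sum();
    if total == 0 {
        return 0.0;
    }
    let mut entropy = 0.0_f64;
    for row in counts {
        let row_total: u32 = row.iter().sum();
        if row_total == 0 {
            continue; // this call type was never followed by anything
        }
        let row_weight = row_total as f64 / total as f64;
        let row_entropy: f64 = row
            .iter()
            .filter(|&&c| c > 0)
            .map(|&c| {
                let p = c as f64 / row_total as f64;
                -p * p.log2()
            })
            .sum();
        entropy += row_weight * row_entropy;
    }
    entropy as f32
}
```

A strictly alternating A, B, A, B sequence scores 0 bits, while two call types that follow each other at random score 1 bit.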
Configuration Structure
# sevensense.yaml
sevensense:
  # Audio processing settings
  audio:
    sample_rate: 32000            # Perch 2.0 requirement
    segment_duration_ms: 5000     # 5 seconds
    segment_overlap_ms: 500       # Overlap for continuity
    min_snr_db: 10.0              # Minimum signal-to-noise
    detection_method: "energy"    # or "whisper_seg", "tweety"

  # Embedding generation
  embedding:
    model: "perch2_v1.0"
    onnx_path: "./models/perch2.onnx"
    dimensions: 1536
    normalize: true
    batch_size: 32

  # Vector database (RuVector)
  vectordb:
    index_type: "hnsw"
    hnsw:
      m: 16                       # Connections per node
      ef_construction: 200        # Build-time search width
      ef_search: 100              # Query-time search width
    distance_metric: "cosine"     # or "euclidean", "poincare"
    enable_hyperbolic: false      # Experimental
    compression:
      hot_tier: "none"
      warm_tier: "pq_8"           # Product quantization
      cold_tier: "pq_4"           # Aggressive compression

  # GNN learning
  learning:
    enabled: true
    gnn_type: "gat"               # GCN, GAT, or GraphSAGE
    hidden_dim: 256
    num_layers: 2
    attention_heads: 4
    learning_rate: 0.001
    training_interval_hours: 24

  # Analysis settings
  analysis:
    clustering:
      method: "hdbscan"
      min_cluster_size: 10
      min_samples: 5
    sequence:
      max_gap_ms: 2000            # Max silence between calls
      min_motif_length: 3

  # RAB settings
  rab:
    retrieval_k: 10               # Neighbors to retrieve
    min_confidence: 0.7
    cite_exemplars: true

  # API settings
  api:
    host: "0.0.0.0"
    port: 8080
    enable_graphql: true
    enable_websocket: true
    cors_origins: ["*"]

  # Telemetry
  telemetry:
    log_level: "info"
    metrics_port: 9090
    tracing_enabled: true
    tracing_endpoint: "http://localhost:4317"
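To show how the audio settings interact, the sketch below turns `sample_rate`, `segment_duration_ms`, and `segment_overlap_ms` into sample ranges for fixed-length windows. This is only one reading of those settings: the function name is illustrative, detection-driven segmentation would place windows differently, and dropping a trailing partial window is an assumption (padding it would be equally valid).

```rust
/// Sample ranges [start, end) for fixed-length windows over a recording.
/// The hop between windows is `segment_ms - overlap_ms`; a trailing partial
/// window is dropped.
fn window_ranges(
    total_samples: usize,
    sample_rate: usize,
    segment_ms: usize,
    overlap_ms: usize,
) -> Vec<(usize, usize)> {
    assert!(overlap_ms < segment_ms, "overlap must be shorter than the segment");
    let window = sample_rate * segment_ms / 1000;
    let hop = sample_rate * (segment_ms - overlap_ms) / 1000;
    assert!(hop > 0, "hop must cover at least one sample");

    let mut ranges = Vec::new();
    let mut start = 0;
    while start + window <= total_samples {
        ranges.push((start, start + window));
        start += hop;
    }
    ranges
}
```

With the values above, each window holds 160,000 samples and consecutive windows start 144,000 samples apart, so a 15-second recording yields three full windows.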
Consequences
Positive Consequences
- Development velocity: Single deployment simplifies CI/CD and local development
- Performance: Critical audio-to-index path has zero network overhead
- Debugging: Stack traces span the entire flow; no distributed tracing required initially
- Testing: Integration tests run in-process without container orchestration
- Scientific reproducibility: Single binary with pinned dependencies ensures consistent results
- Resource efficiency: Shared memory pools and caches across modules
- Evolution path: Clear module boundaries allow extraction to services when justified
Negative Consequences
- Scaling limitations: Cannot scale embedding generation independently from query serving
- Deployment coupling: Updates to any module require full redeployment
- Resource contention: GNN training may compete with query serving for CPU/memory
- Technology constraints: All modules must work within Rust ecosystem (mitigated by FFI)
Mitigation Strategies
| Risk | Mitigation |
|---|---|
| Scaling limitations | Design async job queues that could become external workers |
| Deployment coupling | Blue-green deployments with health checks |
| Resource contention | Configurable resource limits per module; background training scheduling |
| Technology constraints | ONNX runtime for ML; FFI bindings for specialized libraries |
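The first mitigation can be illustrated with a small job queue. The sketch below uses standard-library threads and channels rather than an async runtime, and the `Job` variants and `handle` function are hypothetical placeholders. The point is that jobs are plain data, so the same messages could later be serialized and sent to an external worker.

```rust
use std::sync::mpsc;
use std::thread;

/// Work items that run on an in-process thread today but carry no
/// references into the application, so they could be shipped elsewhere.
#[derive(Debug)]
enum Job {
    EmbedSegment { segment_id: u64 },
    TrainGnn { epochs: u32 },
}

/// Handle a job and return a short status string (placeholder for real work).
fn handle(job: &Job) -> String {
    match job {
        Job::EmbedSegment { segment_id } => format!("embedded segment {segment_id}"),
        Job::TrainGnn { epochs } => format!("trained for {epochs} epochs"),
    }
}

/// Run all jobs on one background worker and collect results in order.
fn run_jobs(jobs: Vec<Job>) -> Vec<String> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let worker = thread::spawn(move || {
        // The loop ends once every sender for the job channel is dropped.
        for job in job_rx {
            if done_tx.send(handle(&job)).is_err() {
                break;
            }
        }
    });

    for job in jobs {
        job_tx.send(job).expect("worker hung up");
    }
    drop(job_tx); // signal that no more jobs are coming
    worker.join().expect("worker panicked");
    done_rx.iter().collect()
}
```

Because producers only see the channel, moving the worker out of process would replace the channel with a network queue while leaving job definitions unchanged.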
Related Decisions
- ADR-002: Perch 2.0 Integration Strategy (ONNX vs. birdnet-onnx crate)
- ADR-003: HNSW vs. Hyperbolic Space Configuration
- ADR-004: GNN Training Strategy (Online vs. Batch)
- ADR-005: RAB Evidence Pack Schema
- ADR-006: API Design (REST/GraphQL/gRPC)
Compliance and Standards
Scientific Standards
- All embeddings include model version and parameters for reproducibility
- Evidence packs include full retrieval citations per RAB methodology
- Validation metrics align with published benchmarks (V-measure, silhouette scores)
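For reference, V-measure is the harmonic mean of homogeneity (each cluster contains one class) and completeness (each class lands in one cluster). The sketch below is a minimal implementation over integer labels, assuming equal weighting of the two terms (beta = 1); the function names are illustrative and it is not tuned for large label sets.

```rust
use std::collections::HashMap;

/// Shannon entropy (in nats) of a label assignment.
fn entropy(labels: &[u32]) -> f64 {
    let n = labels.len() as f64;
    let mut counts: HashMap<u32, f64> = HashMap::new();
    for &l in labels {
        *counts.entry(l).or_insert(0.0) += 1.0;
    }
    counts.values().map(|&c| -(c / n) * (c / n).ln()).sum()
}

/// Conditional entropy H(a | b) in nats.
fn conditional_entropy(a: &[u32], b: &[u32]) -> f64 {
    let n = a.len() as f64;
    let mut joint: HashMap<(u32, u32), f64> = HashMap::new();
    let mut b_counts: HashMap<u32, f64> = HashMap::new();
    for (&x, &y) in a.iter().zip(b) {
        *joint.entry((x, y)).or_insert(0.0) += 1.0;
        *b_counts.entry(y).or_insert(0.0) += 1.0;
    }
    joint
        .iter()
        .map(|(&(_, y), &c)| -(c / n) * (c / b_counts[&y]).ln())
        .sum()
}

/// V-measure with beta = 1: harmonic mean of homogeneity and completeness.
fn v_measure(truth: &[u32], predicted: &[u32]) -> f64 {
    assert_eq!(truth.len(), predicted.len(), "label count mismatch");
    let (h_c, h_k) = (entropy(truth), entropy(predicted));
    let homogeneity = if h_c == 0.0 {
        1.0
    } else {
        1.0 - conditional_entropy(truth, predicted) / h_c
    };
    let completeness = if h_k == 0.0 {
        1.0
    } else {
        1.0 - conditional_entropy(predicted, truth) / h_k
    };
    if homogeneity + completeness == 0.0 {
        0.0
    } else {
        2.0 * homogeneity * completeness / (homogeneity + completeness)
    }
}
```

The score is invariant to relabeling: a clustering that matches the ground truth under any permutation of cluster ids scores 1.0, while placing every call in a single cluster scores 0.0 when more than one class is present.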
Data Standards
- Audio metadata follows Darwin Core / TDWG standards where applicable
- Taxonomic references link to iNaturalist and eBird identifiers
- Geospatial data uses WGS84 coordinates
Security Considerations
- No PII in bioacoustic data (sensor IDs are pseudonymous)
- API authentication via JWT tokens
- Audit logging for all data modifications
References
- Perch 2.0 Paper: "Perch 2.0: The Bittern Lesson for Bioacoustics" (arXiv:2508.04665)
- RuVector Documentation: https://github.com/ruvnet/ruvector
- HNSW Paper: "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs" (Malkov and Yashunin)
- RAB Pattern: Retrieval-Augmented Bioacoustics methodology
- AVN Deep Learning Study: "A deep learning approach for the analysis of birdsong" (eLife 2025)
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-15 | 7sense Architecture Team | Initial version |