ADR-001: System Architecture Overview

Status: Accepted
Date: 2026-01-15
Decision Makers: 7sense Architecture Team
Technical Area: System Architecture


Context and Problem Statement

7sense aims to transform bioacoustic signals (primarily bird calls) into a navigable geometric space where meaningful structure emerges. The system must process audio recordings, generate high-dimensional embeddings using Perch 2.0 (1536-D vectors), organize them with HNSW indexing in RuVector, and apply GNN-based learning to surface patterns such as call types, motifs, and behavioral contexts.

The core challenge is designing an architecture that:

  1. Handles diverse data pipelines - From raw 32kHz audio to queryable vector embeddings
  2. Scales to millions of call segments - Real-world bioacoustic monitoring generates vast datasets
  3. Supports scientific workflows - Researchers need reproducibility, transparency, and evidence-backed interpretations (RAB pattern)
  4. Enables real-time and batch processing - Field deployments require streaming; research requires bulk analysis
  5. Integrates ML inference efficiently - ONNX-based Perch 2.0 inference in Rust for performance

Current State

This is a greenfield project building upon:

  • Perch 2.0: Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone, 1536-D output)
  • RuVector: Rust-based vector database with HNSW indexing and self-learning GNN layers
  • RAB Pattern: Retrieval-Augmented Bioacoustics for evidence-backed interpretation

Decision Drivers

Performance Requirements

  • Embedding generation: Process 5-second audio segments at >100 segments/second
  • Vector search: Sub-millisecond kNN queries on 1M+ vectors (HNSW target: ~100 µs)
  • Batch ingestion: 1M vectors/minute build speed (RuVector baseline)
  • Memory efficiency: Support 32x compression for cold data tiers

Scalability Requirements

  • Data volume: Support 10K to 10M+ call segments per deployment
  • Concurrent users: Multiple researchers querying simultaneously
  • Geographic distribution: Sensor networks across multiple sites
  • Temporal depth: Years of historical recordings

Scientific Rigor Requirements

  • Reproducibility: Deterministic pipelines with versioned models and parameters
  • Transparency: RAB-style evidence packs citing retrieved calls for any interpretation
  • Auditability: Full provenance tracking from raw audio to conclusions
  • Validation: Built-in verification against ground truth labels

Operational Requirements

  • Deployment flexibility: Edge (sensor), cloud, and hybrid deployments
  • Monitoring: Health metrics, processing throughput, index quality
  • Updates: Hot-swap embedding models without full reindexing
  • Recovery: Graceful degradation and disaster recovery

Considered Options

Option A: Monolithic Architecture

A single application handling all concerns: audio processing, embedding generation, vector storage, GNN learning, API serving, and visualization.

Pros:

  • Simplest deployment model
  • No inter-service communication overhead
  • Single codebase to maintain

Cons:

  • Cannot scale components independently
  • Single point of failure
  • Difficult to update individual components
  • Memory pressure from co-located ML models
  • Not suitable for distributed sensor networks

Option B: Microservices Architecture

Fully decomposed services: Audio Ingest Service, Embedding Service, Vector Store Service, GNN Learning Service, Query Service, Visualization Service, etc.

Pros:

  • Independent scaling per service
  • Technology flexibility per service
  • Fault isolation
  • Team parallelization

Cons:

  • Significant operational complexity
  • Network latency between services
  • Data consistency challenges
  • Overkill for initial team size
  • Complex debugging across service boundaries

Option C: Modular Monolith Architecture

A single deployable unit with clearly separated internal modules, designed for future extraction into services if needed.

Pros:

  • Maintains deployment simplicity
  • Clear module boundaries enable future splitting
  • In-process communication for performance-critical paths
  • Easier debugging and testing
  • Appropriate for current team/project scale
  • Can evolve toward microservices as needs emerge

Cons:

  • Requires discipline to maintain module boundaries
  • All modules share the same runtime resources
  • Scaling requires scaling the entire application

Decision Outcome

Chosen Option: Option C - Modular Monolith Architecture

We adopt a modular monolith architecture with clearly defined domain boundaries, designed with explicit seams that allow future extraction to services. This balances immediate development velocity with long-term architectural flexibility.

Rationale

  1. Right-sized for current needs: A small team building a new product benefits from deployment simplicity
  2. Performance-critical paths stay in-process: Audio-to-embedding-to-index flow benefits from zero network hops
  3. Scientific workflow alignment: Researchers prefer reproducible, debuggable systems over distributed complexity
  4. Evolution path preserved: Module boundaries are designed as potential service boundaries
  5. RuVector integration: RuVector is designed as an embeddable library, making monolith integration natural

Technical Specifications

Module Architecture

sevensense/
├── core/                      # Domain-agnostic foundations
│   ├── config/               # Configuration management
│   ├── error/                # Error types and handling
│   ├── telemetry/            # Logging, metrics, tracing
│   └── storage/              # Abstract storage interfaces
│
├── audio/                     # Audio Processing Domain
│   ├── ingest/               # Audio file reading, streaming
│   ├── segment/              # Call detection and segmentation
│   ├── features/             # Acoustic feature extraction
│   └── spectrogram/          # Mel spectrogram generation
│
├── embedding/                 # Embedding Generation Domain
│   ├── perch/                # Perch 2.0 ONNX inference
│   ├── models/               # Model versioning and registry
│   ├── batch/                # Batch embedding pipelines
│   └── normalize/            # Vector normalization (L2, etc.)
│
├── vectordb/                  # Vector Storage Domain (RuVector)
│   ├── index/                # HNSW index management
│   ├── graph/                # Graph structure (nodes, edges)
│   ├── query/                # Similarity search, Cypher queries
│   └── hyperbolic/           # Poincare ball embeddings
│
├── learning/                  # GNN Learning Domain
│   ├── gnn/                  # GNN layers (GCN, GAT, GraphSAGE)
│   ├── attention/            # Attention mechanisms
│   ├── training/             # Self-supervised training loops
│   └── refinement/           # Embedding refinement pipelines
│
├── analysis/                  # Analysis Domain
│   ├── clustering/           # HDBSCAN, prototype extraction
│   ├── sequence/             # Motif detection, transition analysis
│   ├── entropy/              # Sequence entropy metrics
│   └── validation/           # Ground truth comparison
│
├── rab/                       # Retrieval-Augmented Bioacoustics
│   ├── evidence/             # Evidence pack construction
│   ├── retrieval/            # Adaptive retrieval depth
│   ├── interpretation/       # Constrained interpretation generation
│   └── citation/             # Source attribution
│
├── api/                       # API Layer
│   ├── rest/                 # REST endpoints
│   ├── graphql/              # GraphQL schema and resolvers
│   ├── websocket/            # Real-time streaming
│   └── grpc/                 # gRPC for inter-service (future)
│
├── visualization/             # Visualization Domain
│   ├── projection/           # UMAP/t-SNE dimensionality reduction
│   ├── graph_viz/            # Network visualization
│   ├── spectrogram_viz/      # Spectrogram rendering
│   └── export/               # Export formats (JSON, PNG, etc.)
│
└── cli/                       # Command Line Interface
    ├── ingest/               # Batch ingestion commands
    ├── query/                # Query commands
    ├── train/                # Training commands
    └── export/               # Export commands

Data Model

Core Entities (Graph Nodes)

/// Raw audio recording from a sensor
struct Recording {
    id: Uuid,
    sensor_id: String,
    location: GeoPoint,          // lat, lon, elevation
    start_timestamp: DateTime,
    duration_ms: u32,
    sample_rate: u32,            // 32000 Hz for Perch 2.0
    channels: u8,
    habitat: Option<String>,
    weather: Option<WeatherData>,
    file_path: PathBuf,
    checksum: String,            // SHA-256 for reproducibility
}

/// Detected call segment within a recording
struct CallSegment {
    id: Uuid,
    recording_id: Uuid,
    start_ms: u32,
    end_ms: u32,
    snr_db: f32,                 // Signal-to-noise ratio
    peak_frequency_hz: f32,
    energy: f32,
    detection_confidence: f32,
    detection_method: String,    // "energy_threshold", "whisper_seg", etc.
}

/// Embedding vector for a call segment
struct Embedding {
    id: Uuid,
    segment_id: Uuid,
    model_id: String,            // "perch2_v1.0"
    dimensions: u16,             // 1536 for Perch 2.0
    vector: Vec<f32>,
    normalized: bool,
    created_at: DateTime,
}

/// Cluster prototype (centroid of similar calls)
struct Prototype {
    id: Uuid,
    cluster_id: Uuid,
    centroid_vector: Vec<f32>,
    exemplar_ids: Vec<Uuid>,     // Representative segments
    member_count: u32,
    coherence_score: f32,
}

/// Cluster of similar call segments
struct Cluster {
    id: Uuid,
    method: String,              // "hdbscan", "kmeans", etc.
    parameters: HashMap<String, Value>,
    created_at: DateTime,
    validation_score: Option<f32>,
}

/// Optional taxonomic reference
struct Taxon {
    id: Uuid,
    scientific_name: String,
    common_name: String,
    inat_id: Option<u64>,        // iNaturalist ID
    ebird_code: Option<String>,  // eBird species code
}
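The `normalized` flag on `Embedding` matters because cosine similarity over L2-normalized vectors reduces to a plain dot product, which is what makes the HNSW cosine metric cheap at query time. A minimal sketch of the normalization step (the helper name is illustrative, not the actual API of the `embedding/normalize` module):

```rust
/// L2-normalize an embedding in place so that cosine similarity
/// between two normalized vectors reduces to their dot product.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    // Toy 2-D vector standing in for a 1536-D Perch embedding.
    let mut v = vec![3.0_f32, 4.0];
    l2_normalize(&mut v);
    assert!((v[0] - 0.6).abs() < 1e-6);
    assert!((v[1] - 0.8).abs() < 1e-6);
    println!("normalized: {:?}", v);
}
```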

Relationships (Graph Edges)

/// Recording contains segments
edge HAS_SEGMENT: Recording -> CallSegment

/// Temporal sequence within recording
edge NEXT: CallSegment -> CallSegment {
    delta_ms: u32,               // Time gap between calls
}

/// Acoustic similarity from HNSW
edge SIMILAR: CallSegment -> CallSegment {
    distance: f32,               // Cosine or Euclidean
    rank: u8,                    // kNN rank (1 = nearest)
}

/// Cluster membership
edge ASSIGNED_TO: CallSegment -> Cluster

/// Prototype ownership
edge HAS_PROTOTYPE: Cluster -> Prototype

/// Species identification (when available)
edge IDENTIFIED_AS: CallSegment -> Taxon {
    confidence: f32,
    method: String,              // "manual", "model", "consensus"
}
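Of these edges, `NEXT` can be derived from segment timestamps alone: sort segments by start time and link consecutive pairs whose silence gap stays under the configured maximum (2000 ms in the default configuration below). A hedged sketch, using bare `(start_ms, end_ms)` tuples in place of the full `CallSegment` struct; `build_next_edges` is a hypothetical helper, not part of the actual codebase:

```rust
/// Simplified segment: (start_ms, end_ms) within one recording.
type Segment = (u32, u32);

/// Build NEXT edges between consecutive segments, breaking the sequence
/// whenever the silence gap exceeds `max_gap_ms`.
fn build_next_edges(segments: &mut Vec<Segment>, max_gap_ms: u32) -> Vec<(usize, usize, u32)> {
    segments.sort_by_key(|s| s.0);
    let mut edges = Vec::new();
    for i in 0..segments.len().saturating_sub(1) {
        // delta_ms = gap between end of this call and start of the next
        let gap = segments[i + 1].0.saturating_sub(segments[i].1);
        if gap <= max_gap_ms {
            edges.push((i, i + 1, gap)); // (from, to, delta_ms)
        }
    }
    edges
}

fn main() {
    let mut segs = vec![(0, 500), (800, 1300), (5000, 5400)];
    let edges = build_next_edges(&mut segs, 2000);
    // Only the first pair is linked; the 3700 ms gap breaks the sequence.
    assert_eq!(edges, vec![(0, 1, 300)]);
    println!("{edges:?}");
}
```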

Processing Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│                         INGESTION PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│  │  Audio   │───▶│ Segment  │───▶│   Mel    │───▶│ Perch2.0 │          │
│  │  Input   │    │Detection │    │Spectrogram│   │  ONNX    │          │
│  │(32kHz,5s)│    │          │    │(500x128) │    │          │          │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘          │
│       │               │               │               │                  │
│       │               │               │               ▼                  │
│       │               │               │         ┌──────────┐            │
│       │               │               │         │Embedding │            │
│       │               │               │         │ (1536-D) │            │
│       │               │               │         └──────────┘            │
│       │               │               │               │                  │
└───────┼───────────────┼───────────────┼───────────────┼──────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         STORAGE LAYER                                    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                        RuVector                               │       │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │       │
│  │  │   HNSW      │  │   Graph     │  │   Metadata Store    │  │       │
│  │  │   Index     │  │   Store     │  │   (Recordings,      │  │       │
│  │  │             │  │   (Edges)   │  │    Segments, etc.)  │  │       │
│  │  └─────────────┘  └─────────────┘  └─────────────────────┘  │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         LEARNING LAYER                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│  │    GNN       │    │  Attention   │    │  Hyperbolic  │              │
│  │  Reranker    │───▶│   Layers     │───▶│  Refinement  │              │
│  │(GCN/GAT/SAGE)│    │              │    │  (Poincare)  │              │
│  └──────────────┘    └──────────────┘    └──────────────┘              │
│         │                   │                   │                        │
│         └───────────────────┴───────────────────┘                        │
│                             │                                            │
│                             ▼                                            │
│                    ┌──────────────┐                                     │
│                    │   Refined    │                                     │
│                    │  Embeddings  │                                     │
│                    └──────────────┘                                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         ANALYSIS LAYER                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │Clustering│  │ Sequence │  │ Anomaly  │  │  Entropy │  │   RAB    │ │
│  │(HDBSCAN) │  │  Mining  │  │Detection │  │  Metrics │  │ Evidence │ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         API / PRESENTATION                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │   REST   │  │ GraphQL  │  │WebSocket │  │   CLI    │  │   WASM   │ │
│  │   API    │  │   API    │  │(Streaming)│ │          │  │ (Browser)│ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Key Interfaces Between Modules

// Audio -> Embedding interface
trait AudioEmbedder {
    fn embed_segment(&self, audio: &AudioSegment) -> Result<Embedding>;
    fn embed_batch(&self, segments: &[AudioSegment]) -> Result<Vec<Embedding>>;
    fn model_info(&self) -> ModelInfo;
}

// Embedding -> VectorDB interface
trait VectorStore {
    fn insert(&mut self, embedding: &Embedding) -> Result<()>;
    fn search_knn(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    fn get_neighbors(&self, id: Uuid) -> Result<Vec<Neighbor>>;
    fn build_similarity_edges(&mut self, k: usize) -> Result<usize>;
}

// VectorDB -> Learning interface
trait GraphLearner {
    fn train_step(&mut self, graph: &Graph) -> Result<TrainMetrics>;
    fn refine_embeddings(&self, embeddings: &mut [Embedding]) -> Result<()>;
    fn attention_weights(&self, node_id: Uuid) -> Result<Vec<(Uuid, f32)>>;
}

// Learning -> Analysis interface
trait PatternAnalyzer {
    fn cluster(&self, embeddings: &[Embedding]) -> Result<Vec<Cluster>>;
    fn find_motifs(&self, sequences: &[Sequence]) -> Result<Vec<Motif>>;
    fn compute_entropy(&self, transitions: &TransitionMatrix) -> f32;
}

// Analysis -> RAB interface
trait EvidenceBuilder {
    fn build_pack(&self, query: &Query) -> Result<EvidencePack>;
    fn generate_interpretation(&self, pack: &EvidencePack) -> Result<Interpretation>;
    fn cite_sources(&self, interpretation: &Interpretation) -> Vec<Citation>;
}
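As one concrete reading of `compute_entropy`, sequence entropy can be taken as the weighted Shannon entropy of each cluster's outgoing transition distribution: a repertoire that always follows call A with call B scores low, while unpredictable sequencing scores high. The function below is an illustrative sketch under that assumption, not the actual `analysis/entropy` implementation:

```rust
/// `matrix` is row-stochastic: matrix[i][j] = P(next cluster = j | current = i).
/// `weights` gives each cluster's occupancy; the result is the
/// occupancy-weighted Shannon entropy (in bits) of the outgoing rows.
fn transition_entropy(matrix: &[Vec<f32>], weights: &[f32]) -> f32 {
    matrix
        .iter()
        .zip(weights)
        .map(|(row, w)| {
            let h: f32 = row
                .iter()
                .filter(|p| **p > 0.0) // 0 * log 0 := 0
                .map(|p| -p * p.log2())
                .sum();
            w * h
        })
        .sum()
}

fn main() {
    // Two clusters: one deterministic, one uniform, equally occupied.
    let m = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    let h = transition_entropy(&m, &[0.5, 0.5]);
    assert!((h - 0.5).abs() < 1e-6); // 0.5 * 0 + 0.5 * 1 bit
    println!("entropy: {h} bits");
}
```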

Configuration Structure

# sevensense.yaml
sevensense:
  # Audio processing settings
  audio:
    sample_rate: 32000          # Perch 2.0 requirement
    segment_duration_ms: 5000   # 5 seconds
    segment_overlap_ms: 500     # Overlap for continuity
    min_snr_db: 10.0           # Minimum signal-to-noise
    detection_method: "energy"  # or "whisper_seg", "tweety"

  # Embedding generation
  embedding:
    model: "perch2_v1.0"
    onnx_path: "./models/perch2.onnx"
    dimensions: 1536
    normalize: true
    batch_size: 32

  # Vector database (RuVector)
  vectordb:
    index_type: "hnsw"
    hnsw:
      m: 16                     # Connections per node
      ef_construction: 200      # Build-time search width
      ef_search: 100           # Query-time search width
    distance_metric: "cosine"   # or "euclidean", "poincare"
    enable_hyperbolic: false    # Experimental
    compression:
      hot_tier: "none"
      warm_tier: "pq_8"        # Product quantization
      cold_tier: "pq_4"        # Aggressive compression

  # GNN learning
  learning:
    enabled: true
    gnn_type: "gat"            # GCN, GAT, or GraphSAGE
    hidden_dim: 256
    num_layers: 2
    attention_heads: 4
    learning_rate: 0.001
    training_interval_hours: 24

  # Analysis settings
  analysis:
    clustering:
      method: "hdbscan"
      min_cluster_size: 10
      min_samples: 5
    sequence:
      max_gap_ms: 2000         # Max silence between calls
      min_motif_length: 3

  # RAB settings
  rab:
    retrieval_k: 10            # Neighbors to retrieve
    min_confidence: 0.7
    cite_exemplars: true

  # API settings
  api:
    host: "0.0.0.0"
    port: 8080
    enable_graphql: true
    enable_websocket: true
    cors_origins: ["*"]

  # Telemetry
  telemetry:
    log_level: "info"
    metrics_port: 9090
    tracing_enabled: true
    tracing_endpoint: "http://localhost:4317"

Consequences

Positive Consequences

  1. Development velocity: Single deployment simplifies CI/CD and local development
  2. Performance: Critical audio-to-index path has zero network overhead
  3. Debugging: Stack traces span the entire flow; no distributed tracing required initially
  4. Testing: Integration tests run in-process without container orchestration
  5. Scientific reproducibility: Single binary with pinned dependencies ensures consistent results
  6. Resource efficiency: Shared memory pools and caches across modules
  7. Evolution path: Clear module boundaries allow extraction to services when justified

Negative Consequences

  1. Scaling limitations: Cannot scale embedding generation independently from query serving
  2. Deployment coupling: Updates to any module require full redeployment
  3. Resource contention: GNN training may compete with query serving for CPU/memory
  4. Technology constraints: All modules must work within Rust ecosystem (mitigated by FFI)

Mitigation Strategies

| Risk | Mitigation |
|------|------------|
| Scaling limitations | Design async job queues that could become external workers |
| Deployment coupling | Blue-green deployments with health checks |
| Resource contention | Configurable resource limits per module; background training scheduling |
| Technology constraints | ONNX Runtime for ML; FFI bindings for specialized libraries |
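The first mitigation can be prototyped with nothing more than a standard-library channel: the channel boundary is exactly the seam where an external worker service could later replace the in-process thread. A minimal sketch (the job payload and counts are placeholders):

```rust
use std::sync::mpsc;
use std::thread;

/// In-process async job queue for embedding work. Producers enqueue raw
/// segments; a background worker drains them without blocking query serving.
fn main() {
    let (tx, rx) = mpsc::channel::<Vec<f32>>();

    // Background worker: in a service extraction, this becomes a separate
    // process consuming from an external queue instead of an mpsc receiver.
    let worker = thread::spawn(move || {
        let mut processed = 0;
        for _segment in rx {
            processed += 1; // embed + index insert would happen here
        }
        processed
    });

    for _ in 0..3 {
        tx.send(vec![0.0; 160_000]).unwrap(); // one 5 s segment each
    }
    drop(tx); // close the queue so the worker loop exits

    assert_eq!(worker.join().unwrap(), 3);
    println!("all jobs drained");
}
```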

Related Decisions

  • ADR-002: Perch 2.0 Integration Strategy (ONNX vs. birdnet-onnx crate)
  • ADR-003: HNSW vs. Hyperbolic Space Configuration
  • ADR-004: GNN Training Strategy (Online vs. Batch)
  • ADR-005: RAB Evidence Pack Schema
  • ADR-006: API Design (REST/GraphQL/gRPC)

Compliance and Standards

Scientific Standards

  • All embeddings include model version and parameters for reproducibility
  • Evidence packs include full retrieval citations per RAB methodology
  • Validation metrics align with published benchmarks (V-measure, silhouette scores)

Data Standards

  • Audio metadata follows Darwin Core / TDWG standards where applicable
  • Taxonomic references link to iNaturalist and eBird identifiers
  • Geospatial data uses WGS84 coordinates

Security Considerations

  • No PII in bioacoustic data (sensor IDs are pseudonymous)
  • API authentication via JWT tokens
  • Audit logging for all data modifications

References

  1. Perch 2.0 Paper: "Perch 2.0: The Bittern Lesson for Bioacoustics" (arXiv:2508.04665)
  2. RuVector Documentation: https://github.com/ruvnet/ruvector
  3. HNSW Paper: Malkov & Yashunin, "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs"
  4. RAB Pattern: Retrieval-Augmented Bioacoustics methodology
  5. AVN Deep Learning Study: "A deep learning approach for the analysis of birdsong" (eLife 2025)

Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-01-15 | 7sense Architecture Team | Initial version |