ADR-001: System Architecture Overview

Status: Accepted
Date: 2026-01-15
Decision Makers: 7sense Architecture Team
Technical Area: System Architecture

Context and Problem Statement

7sense aims to transform bioacoustic signals (primarily bird calls) into a navigable geometric space where meaningful structure emerges. The system must process audio recordings, generate high-dimensional embeddings using Perch 2.0 (1536-D vectors), organize them with HNSW indexing in RuVector, and apply GNN-based learning to surface patterns such as call types, motifs, and behavioral contexts.

The core challenge is designing an architecture that:

Handles diverse data pipelines - From raw 32kHz audio to queryable vector embeddings
Scales to millions of call segments - Real-world bioacoustic monitoring generates vast datasets
Supports scientific workflows - Researchers need reproducibility, transparency, and evidence-backed interpretations (RAB pattern)
Enables real-time and batch processing - Field deployments require streaming; research requires bulk analysis
Integrates ML inference efficiently - ONNX-based Perch 2.0 inference in Rust for performance

Current State

This is a greenfield project building upon:

Perch 2.0: Google DeepMind's bioacoustic embedding model (EfficientNet-B3 backbone, 1536-D output)
RuVector: Rust-based vector database with HNSW indexing and self-learning GNN layers
RAB Pattern: Retrieval-Augmented Bioacoustics for evidence-backed interpretation

Decision Drivers

Performance Requirements

Embedding generation: Process 5-second audio segments at >100 segments/second
Vector search: Sub-millisecond kNN queries on 1M+ vectors (HNSW target: ~100 µs)
Batch ingestion: 1M vectors/minute build speed (RuVector baseline)
Memory efficiency: Support 32x compression for cold data tiers
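
As a rough sanity check on the memory-efficiency target, the arithmetic below estimates the raw size of float32 embeddings and the effect of a 32x compression ratio. It is a back-of-the-envelope sketch: the helper names are hypothetical, and it ignores HNSW graph links and metadata, which add to the real footprint.

```rust
/// Bytes needed to store `count` uncompressed f32 vectors of `dims` dimensions.
/// (Hypothetical helper; not part of RuVector.)
fn raw_bytes(count: u64, dims: u64) -> u64 {
    count * dims * std::mem::size_of::<f32>() as u64
}

/// Bytes remaining after applying an integer compression ratio,
/// for example 32 for the cold tier target named above.
fn compressed_bytes(raw: u64, ratio: u64) -> u64 {
    raw / ratio
}
```

For 1M Perch 2.0 vectors (1536-D) this gives about 6.1 GB uncompressed (6,144 bytes per vector) and about 192 MB at 32x (192 bytes per vector).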

Scalability Requirements

Data volume: Support 10K to 10M+ call segments per deployment
Concurrent users: Multiple researchers querying simultaneously
Geographic distribution: Sensor networks across multiple sites
Temporal depth: Years of historical recordings

Scientific Rigor Requirements

Reproducibility: Deterministic pipelines with versioned models and parameters
Transparency: RAB-style evidence packs citing retrieved calls for any interpretation
Auditability: Full provenance tracking from raw audio to conclusions
Validation: Built-in verification against ground truth labels

Operational Requirements

Deployment flexibility: Edge (sensor), cloud, and hybrid deployments
Monitoring: Health metrics, processing throughput, index quality
Updates: Hot-swap embedding models without full reindexing
Recovery: Graceful degradation and disaster recovery

Considered Options

Option A: Monolithic Architecture

A single application handling all concerns: audio processing, embedding generation, vector storage, GNN learning, API serving, and visualization.

Pros:

Simplest deployment model
No inter-service communication overhead
Single codebase to maintain

Cons:

Cannot scale components independently
Single point of failure
Difficult to update individual components
Memory pressure from co-located ML models
Not suitable for distributed sensor networks

Option B: Microservices Architecture

Fully decomposed services: Audio Ingest Service, Embedding Service, Vector Store Service, GNN Learning Service, Query Service, Visualization Service, etc.

Pros:

Independent scaling per service
Technology flexibility per service
Fault isolation
Team parallelization

Cons:

Significant operational complexity
Network latency between services
Data consistency challenges
Overkill for initial team size
Complex debugging across service boundaries

Option C: Modular Monolith Architecture

A single deployable unit with clearly separated internal modules, designed for future extraction into services if needed.

Pros:

Maintains deployment simplicity
Clear module boundaries enable future splitting
In-process communication for performance-critical paths
Easier debugging and testing
Appropriate for current team/project scale
Can evolve toward microservices as needs emerge

Cons:

Requires discipline to maintain module boundaries
All modules share the same runtime resources
Scaling requires scaling the entire application

Decision Outcome

Chosen Option: Option C - Modular Monolith Architecture

We adopt a modular monolith architecture with clearly defined domain boundaries, designed with explicit seams that allow future extraction to services. This balances immediate development velocity with long-term architectural flexibility.

Rationale

Right-sized for current needs: A small team building a new product benefits from deployment simplicity
Performance-critical paths stay in-process: Audio-to-embedding-to-index flow benefits from zero network hops
Scientific workflow alignment: Researchers prefer reproducible, debuggable systems over distributed complexity
Evolution path preserved: Module boundaries are designed as potential service boundaries
RuVector integration: RuVector is designed as an embeddable library, making monolith integration natural

Technical Specifications

Module Architecture

sevensense/
├── core/                      # Domain-agnostic foundations
│   ├── config/               # Configuration management
│   ├── error/                # Error types and handling
│   ├── telemetry/            # Logging, metrics, tracing
│   └── storage/              # Abstract storage interfaces
│
├── audio/                     # Audio Processing Domain
│   ├── ingest/               # Audio file reading, streaming
│   ├── segment/              # Call detection and segmentation
│   ├── features/             # Acoustic feature extraction
│   └── spectrogram/          # Mel spectrogram generation
│
├── embedding/                 # Embedding Generation Domain
│   ├── perch/                # Perch 2.0 ONNX inference
│   ├── models/               # Model versioning and registry
│   ├── batch/                # Batch embedding pipelines
│   └── normalize/            # Vector normalization (L2, etc.)
│
├── vectordb/                  # Vector Storage Domain (RuVector)
│   ├── index/                # HNSW index management
│   ├── graph/                # Graph structure (nodes, edges)
│   ├── query/                # Similarity search, Cypher queries
│   └── hyperbolic/           # Poincare ball embeddings
│
├── learning/                  # GNN Learning Domain
│   ├── gnn/                  # GNN layers (GCN, GAT, GraphSAGE)
│   ├── attention/            # Attention mechanisms
│   ├── training/             # Self-supervised training loops
│   └── refinement/           # Embedding refinement pipelines
│
├── analysis/                  # Analysis Domain
│   ├── clustering/           # HDBSCAN, prototype extraction
│   ├── sequence/             # Motif detection, transition analysis
│   ├── entropy/              # Sequence entropy metrics
│   └── validation/           # Ground truth comparison
│
├── rab/                       # Retrieval-Augmented Bioacoustics
│   ├── evidence/             # Evidence pack construction
│   ├── retrieval/            # Adaptive retrieval depth
│   ├── interpretation/       # Constrained interpretation generation
│   └── citation/             # Source attribution
│
├── api/                       # API Layer
│   ├── rest/                 # REST endpoints
│   ├── graphql/              # GraphQL schema and resolvers
│   ├── websocket/            # Real-time streaming
│   └── grpc/                 # gRPC for inter-service (future)
│
├── visualization/             # Visualization Domain
│   ├── projection/           # UMAP/t-SNE dimensionality reduction
│   ├── graph_viz/            # Network visualization
│   ├── spectrogram_viz/      # Spectrogram rendering
│   └── export/               # Export formats (JSON, PNG, etc.)
│
└── cli/                       # Command Line Interface
    ├── ingest/               # Batch ingestion commands
    ├── query/                # Query commands
    ├── train/                # Training commands
    └── export/               # Export commands

Data Model

Core Entities (Graph Nodes)

/// Raw audio recording from a sensor
struct Recording {
    id: Uuid,
    sensor_id: String,
    location: GeoPoint,          // lat, lon, elevation
    start_timestamp: DateTime,
    duration_ms: u32,
    sample_rate: u32,            // 32000 Hz for Perch 2.0
    channels: u8,
    habitat: Option<String>,
    weather: Option<WeatherData>,
    file_path: PathBuf,
    checksum: String,            // SHA-256 for reproducibility
}

/// Detected call segment within a recording
struct CallSegment {
    id: Uuid,
    recording_id: Uuid,
    start_ms: u32,
    end_ms: u32,
    snr_db: f32,                 // Signal-to-noise ratio
    peak_frequency_hz: f32,
    energy: f32,
    detection_confidence: f32,
    detection_method: String,    // "energy_threshold", "whisper_seg", etc.
}

/// Embedding vector for a call segment
struct Embedding {
    id: Uuid,
    segment_id: Uuid,
    model_id: String,            // "perch2_v1.0"
    dimensions: u16,             // 1536 for Perch 2.0
    vector: Vec<f32>,
    normalized: bool,
    created_at: DateTime,
}

/// Cluster prototype (centroid of similar calls)
struct Prototype {
    id: Uuid,
    cluster_id: Uuid,
    centroid_vector: Vec<f32>,
    exemplar_ids: Vec<Uuid>,     // Representative segments
    member_count: u32,
    coherence_score: f32,
}

/// Cluster of similar call segments
struct Cluster {
    id: Uuid,
    method: String,              // "hdbscan", "kmeans", etc.
    parameters: HashMap<String, Value>,
    created_at: DateTime,
    validation_score: Option<f32>,
}

/// Optional taxonomic reference
struct Taxon {
    id: Uuid,
    scientific_name: String,
    common_name: String,
    inat_id: Option<u64>,        // iNaturalist ID
    ebird_code: Option<String>,  // eBird species code
}

Relationships (Graph Edges)

/// Recording contains segments
edge HAS_SEGMENT: Recording -> CallSegment

/// Temporal sequence within recording
edge NEXT: CallSegment -> CallSegment {
    delta_ms: u32,               // Time gap between calls
}

/// Acoustic similarity from HNSW
edge SIMILAR: CallSegment -> CallSegment {
    distance: f32,               // Cosine or Euclidean
    rank: u8,                    // kNN rank (1 = nearest)
}

/// Cluster membership
edge ASSIGNED_TO: CallSegment -> Cluster

/// Prototype ownership
edge HAS_PROTOTYPE: Cluster -> Prototype

/// Species identification (when available)
edge IDENTIFIED_AS: CallSegment -> Taxon {
    confidence: f32,
    method: String,              // "manual", "model", "consensus"
}
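
The edge declarations above use a schema-style notation rather than Rust. One possible way to carry the same constraints into Rust code is an enum whose variants hold the edge properties, plus a check on which node kinds each edge may connect. This is a minimal sketch: `NodeKind`, `Edge`, `endpoints`, and `connects` are illustrative names, not RuVector types.

```rust
/// Node kinds from the "Core Entities" section. (Illustrative type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeKind {
    Recording,
    CallSegment,
    Cluster,
    Prototype,
    Taxon,
}

/// One variant per edge type, carrying that edge's properties.
#[derive(Debug, Clone, PartialEq)]
enum Edge {
    HasSegment,
    Next { delta_ms: u32 },
    Similar { distance: f32, rank: u8 },
    AssignedTo,
    HasPrototype,
    IdentifiedAs { confidence: f32, method: String },
}

impl Edge {
    /// The (source, target) node kinds this edge type is allowed to connect.
    fn endpoints(&self) -> (NodeKind, NodeKind) {
        match self {
            Edge::HasSegment => (NodeKind::Recording, NodeKind::CallSegment),
            Edge::Next { .. } | Edge::Similar { .. } => {
                (NodeKind::CallSegment, NodeKind::CallSegment)
            }
            Edge::AssignedTo => (NodeKind::CallSegment, NodeKind::Cluster),
            Edge::HasPrototype => (NodeKind::Cluster, NodeKind::Prototype),
            Edge::IdentifiedAs { .. } => (NodeKind::CallSegment, NodeKind::Taxon),
        }
    }

    /// True if an edge of this type may run from `from` to `to`.
    fn connects(&self, from: NodeKind, to: NodeKind) -> bool {
        self.endpoints() == (from, to)
    }
}
```

Validating endpoints at insertion time would reject, for example, a NEXT edge that starts at a Recording.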

Processing Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│                         INGESTION PIPELINE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│  │  Audio   │───▶│ Segment  │───▶│   Mel    │───▶│ Perch2.0 │          │
│  │  Input   │    │Detection │    │Spectrogram│   │  ONNX    │          │
│  │(32kHz,5s)│    │          │    │(500x128) │    │          │          │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘          │
│       │               │               │               │                  │
│       │               │               │               ▼                  │
│       │               │               │         ┌──────────┐            │
│       │               │               │         │Embedding │            │
│       │               │               │         │ (1536-D) │            │
│       │               │               │         └──────────┘            │
│       │               │               │               │                  │
└───────┼───────────────┼───────────────┼───────────────┼──────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         STORAGE LAYER                                    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │                        RuVector                               │       │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │       │
│  │  │   HNSW      │  │   Graph     │  │   Metadata Store    │  │       │
│  │  │   Index     │  │   Store     │  │   (Recordings,      │  │       │
│  │  │             │  │   (Edges)   │  │    Segments, etc.)  │  │       │
│  │  └─────────────┘  └─────────────┘  └─────────────────────┘  │       │
│  └──────────────────────────────────────────────────────────────┘       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         LEARNING LAYER                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐              │
│  │    GNN       │    │  Attention   │    │  Hyperbolic  │              │
│  │  Reranker    │───▶│   Layers     │───▶│  Refinement  │              │
│  │(GCN/GAT/SAGE)│    │              │    │  (Poincare)  │              │
│  └──────────────┘    └──────────────┘    └──────────────┘              │
│         │                   │                   │                        │
│         └───────────────────┴───────────────────┘                        │
│                             │                                            │
│                             ▼                                            │
│                    ┌──────────────┐                                     │
│                    │   Refined    │                                     │
│                    │  Embeddings  │                                     │
│                    └──────────────┘                                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │               │               │               │
        ▼               ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         ANALYSIS LAYER                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │Clustering│  │ Sequence │  │ Anomaly  │  │  Entropy │  │   RAB    │ │
│  │(HDBSCAN) │  │  Mining  │  │Detection │  │  Metrics │  │ Evidence │ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         API / PRESENTATION                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
│  │   REST   │  │ GraphQL  │  │WebSocket │  │   CLI    │  │   WASM   │ │
│  │   API    │  │   API    │  │(Streaming)│ │          │  │ (Browser)│ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
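
The Sequence Mining and Entropy Metrics boxes in the analysis layer can be illustrated with a small example. The sketch below assumes each call segment has already been assigned an integer cluster label, counts label-to-label transitions along a sequence (the NEXT edges), and computes the conditional entropy H(next | current) in bits. The function names and the choice of conditional entropy are assumptions; the ADR does not fix a specific metric.

```rust
use std::collections::HashMap;

/// Count label-to-label transitions along one call sequence.
fn transition_counts(labels: &[usize]) -> HashMap<(usize, usize), u32> {
    let mut counts = HashMap::new();
    for pair in labels.windows(2) {
        *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
    }
    counts
}

/// Conditional entropy H(next | current) in bits.
/// 0.0 means the next call type is fully determined by the current one.
fn transition_entropy(counts: &HashMap<(usize, usize), u32>) -> f64 {
    let total: u32 = counts.values().sum();
    if total == 0 {
        return 0.0;
    }
    // Total outgoing transitions per source label.
    let mut row_totals: HashMap<usize, u32> = HashMap::new();
    for (&(from, _), &n) in counts {
        *row_totals.entry(from).or_insert(0) += n;
    }
    let mut h = 0.0;
    for (&(from, _), &n) in counts {
        let p_joint = n as f64 / total as f64;
        let p_cond = n as f64 / row_totals[&from] as f64;
        h -= p_joint * p_cond.log2();
    }
    h
}
```

A strictly alternating sequence such as 0, 1, 0, 1 yields zero entropy, while a sequence in which each label is followed by either label equally often yields one bit.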

Key Interfaces Between Modules

// Audio -> Embedding interface
trait AudioEmbedder {
    fn embed_segment(&self, audio: &AudioSegment) -> Result<Embedding>;
    fn embed_batch(&self, segments: &[AudioSegment]) -> Result<Vec<Embedding>>;
    fn model_info(&self) -> ModelInfo;
}

// Embedding -> VectorDB interface
trait VectorStore {
    fn insert(&mut self, embedding: &Embedding) -> Result<()>;
    fn search_knn(&self, query: &[f32], k: usize) -> Result<Vec<SearchResult>>;
    fn get_neighbors(&self, id: Uuid) -> Result<Vec<Neighbor>>;
    fn build_similarity_edges(&mut self, k: usize) -> Result<usize>;
}

// VectorDB -> Learning interface
trait GraphLearner {
    fn train_step(&mut self, graph: &Graph) -> Result<TrainMetrics>;
    fn refine_embeddings(&self, embeddings: &mut [Embedding]) -> Result<()>;
    fn attention_weights(&self, node_id: Uuid) -> Result<Vec<(Uuid, f32)>>;
}

// Learning -> Analysis interface
trait PatternAnalyzer {
    fn cluster(&self, embeddings: &[Embedding]) -> Result<Vec<Cluster>>;
    fn find_motifs(&self, sequences: &[Sequence]) -> Result<Vec<Motif>>;
    fn compute_entropy(&self, transitions: &TransitionMatrix) -> f32;
}

// Analysis -> RAB interface
trait EvidenceBuilder {
    fn build_pack(&self, query: &Query) -> Result<EvidencePack>;
    fn generate_interpretation(&self, pack: &EvidencePack) -> Result<Interpretation>;
    fn cite_sources(&self, interpretation: &Interpretation) -> Vec<Citation>;
}
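
To show how a trait seam like `VectorStore` lets implementations be swapped, here is a minimal sketch of an exhaustive-scan store that could stand in for the HNSW index in unit tests. It is not RuVector's API: the trait is a simplified variant of the one above (plain `u64` ids, no `Result`), and `BruteForceStore` and `l2_normalize` are hypothetical names.

```rust
/// Simplified stand-in for the ADR's `VectorStore` trait
/// (assumption: u64 ids instead of Uuid, infallible methods).
trait VectorStore {
    fn insert(&mut self, id: u64, vector: &[f32]);
    fn search_knn(&self, query: &[f32], k: usize) -> Vec<(u64, f32)>;
}

/// L2-normalize a vector; a zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        v.to_vec()
    } else {
        v.iter().map(|x| x / norm).collect()
    }
}

/// Exhaustive-scan store: exact results, but O(n) work per query.
struct BruteForceStore {
    rows: Vec<(u64, Vec<f32>)>,
}

impl VectorStore for BruteForceStore {
    fn insert(&mut self, id: u64, vector: &[f32]) {
        // Store unit vectors so cosine distance reduces to 1 - dot product.
        self.rows.push((id, l2_normalize(vector)));
    }

    fn search_knn(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let q = l2_normalize(query);
        let mut hits: Vec<(u64, f32)> = self
            .rows
            .iter()
            .map(|(id, v)| {
                let dot: f32 = v.iter().zip(&q).map(|(a, b)| a * b).sum();
                (*id, 1.0 - dot)
            })
            .collect();
        // Ascending distance: nearest neighbor first.
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}
```

Because callers depend only on the trait, an exact store like this could also serve as a baseline when measuring the recall of an approximate index.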

Configuration Structure

# sevensense.yaml
sevensense:
  # Audio processing settings
  audio:
    sample_rate: 32000          # Perch 2.0 requirement
    segment_duration_ms: 5000   # 5 seconds
    segment_overlap_ms: 500     # Overlap for continuity
    min_snr_db: 10.0           # Minimum signal-to-noise
    detection_method: "energy"  # or "whisper_seg", "tweety"

  # Embedding generation
  embedding:
    model: "perch2_v1.0"
    onnx_path: "./models/perch2.onnx"
    dimensions: 1536
    normalize: true
    batch_size: 32

  # Vector database (RuVector)
  vectordb:
    index_type: "hnsw"
    hnsw:
      m: 16                     # Connections per node
      ef_construction: 200      # Build-time search width
      ef_search: 100           # Query-time search width
    distance_metric: "cosine"   # or "euclidean", "poincare"
    enable_hyperbolic: false    # Experimental
    compression:
      hot_tier: "none"
      warm_tier: "pq_8"        # Product quantization
      cold_tier: "pq_4"        # Aggressive compression

  # GNN learning
  learning:
    enabled: true
    gnn_type: "gat"            # GCN, GAT, or GraphSAGE
    hidden_dim: 256
    num_layers: 2
    attention_heads: 4
    learning_rate: 0.001
    training_interval_hours: 24

  # Analysis settings
  analysis:
    clustering:
      method: "hdbscan"
      min_cluster_size: 10
      min_samples: 5
    sequence:
      max_gap_ms: 2000         # Max silence between calls
      min_motif_length: 3

  # RAB settings
  rab:
    retrieval_k: 10            # Neighbors to retrieve
    min_confidence: 0.7
    cite_exemplars: true

  # API settings
  api:
    host: "0.0.0.0"
    port: 8080
    enable_graphql: true
    enable_websocket: true
    cors_origins: ["*"]

  # Telemetry
  telemetry:
    log_level: "info"
    metrics_port: 9090
    tracing_enabled: true
    tracing_endpoint: "http://localhost:4317"
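
The `segment_duration_ms` and `segment_overlap_ms` settings above imply a hop of 4500 ms between window starts. The sketch below shows that arithmetic under one assumption: segments are cut as fixed-length sliding windows. The actual segmenter may instead place windows around detected calls, and both function names are hypothetical.

```rust
/// Start/end offsets (ms) of fixed-length windows over a recording.
/// Consecutive windows share `overlap_ms`; a trailing partial window is dropped.
fn segment_windows(duration_ms: u32, segment_ms: u32, overlap_ms: u32) -> Vec<(u32, u32)> {
    assert!(overlap_ms < segment_ms, "overlap must be shorter than the segment");
    let hop = segment_ms - overlap_ms;
    let mut windows = Vec::new();
    let mut start = 0;
    while start + segment_ms <= duration_ms {
        windows.push((start, start + segment_ms));
        start += hop;
    }
    windows
}

/// Number of audio samples in one window at the given sample rate.
fn samples_per_segment(sample_rate: u32, segment_ms: u32) -> u64 {
    sample_rate as u64 * segment_ms as u64 / 1000
}
```

With the configured values, a 15-second recording yields three windows starting at 0, 4500, and 9000 ms, and each window holds 160,000 samples at 32 kHz.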

Consequences

Positive Consequences

Development velocity: Single deployment simplifies CI/CD and local development
Performance: Critical audio-to-index path has zero network overhead
Debugging: Stack traces span the entire flow; no distributed tracing required initially
Testing: Integration tests run in-process without container orchestration
Scientific reproducibility: A single binary with pinned dependencies keeps results consistent across runs and deployments
Resource efficiency: Shared memory pools and caches across modules
Evolution path: Clear module boundaries allow extraction to services when justified

Negative Consequences

Scaling limitations: Cannot scale embedding generation independently from query serving
Deployment coupling: Updates to any module require full redeployment
Resource contention: GNN training may compete with query serving for CPU/memory
Technology constraints: All modules must work within the Rust ecosystem (mitigated by FFI)

Mitigation Strategies

Risk	Mitigation
Scaling limitations	Design async job queues that could become external workers
Deployment coupling	Blue-green deployments with health checks
Resource contention	Configurable resource limits per module; background training scheduling
Technology constraints	ONNX runtime for ML; FFI bindings for specialized libraries
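
The first mitigation, job queues that could later become external workers, can be sketched with the standard library alone. The example below uses a thread and an `mpsc` channel rather than an async runtime, and the `Job` variants and `spawn_worker` are hypothetical names. The point is that producers only see a submit handle, so the consumer could later be moved behind a network queue without changing call sites.

```rust
use std::sync::mpsc;
use std::thread;

/// Work items a background worker can process. (Hypothetical variants.)
#[derive(Debug)]
enum Job {
    EmbedSegment { segment_id: u64 },
    TrainGnn { epochs: u32 },
}

/// Spawn an in-process worker. Returns the submit handle and a join handle
/// that yields a log of completed jobs once the queue is closed.
fn spawn_worker() -> (mpsc::Sender<Job>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        let mut log = Vec::new();
        // The loop ends once every Sender has been dropped.
        for job in rx {
            match job {
                Job::EmbedSegment { segment_id } => {
                    log.push(format!("embedded {segment_id}"))
                }
                Job::TrainGnn { epochs } => {
                    log.push(format!("trained {epochs} epochs"))
                }
            }
        }
        log
    });
    (tx, handle)
}
```

Running long jobs such as GNN training on a dedicated worker also gives a single place to apply the scheduling and resource limits mentioned for resource contention.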

Related Decisions

ADR-002: Perch 2.0 Integration Strategy (ONNX vs. birdnet-onnx crate)
ADR-003: HNSW vs. Hyperbolic Space Configuration
ADR-004: GNN Training Strategy (Online vs. Batch)
ADR-005: RAB Evidence Pack Schema
ADR-006: API Design (REST/GraphQL/gRPC)

Compliance and Standards

Scientific Standards

All embeddings include model version and parameters for reproducibility
Evidence packs include full retrieval citations per RAB methodology
Validation metrics align with published benchmarks (V-measure, silhouette scores)

Data Standards

Audio metadata follows Darwin Core / TDWG standards where applicable
Taxonomic references link to iNaturalist and eBird identifiers
Geospatial data uses WGS84 coordinates

Security Considerations

No PII in bioacoustic data (sensor IDs are pseudonymous)
API authentication via JWT tokens
Audit logging for all data modifications

References

Perch 2.0 Paper: "The Bittern Lesson for Bioacoustics" (arXiv:2508.04665)
RuVector Documentation: https://github.com/ruvnet/ruvector
HNSW Paper: "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs" (Malkov and Yashunin)
RAB Pattern: Retrieval-Augmented Bioacoustics methodology
AVN Deep Learning Study: "A deep learning approach for the analysis of birdsong" (eLife 2025)

Revision History

Version	Date	Author	Changes
1.0	2026-01-15	7sense Architecture Team	Initial version
