ADR-003: Security Architecture for 7sense Bioacoustics Platform

Status

Accepted

Date

2026-01-15

Context

7sense is a bioacoustics platform that processes audio recordings of wildlife vocalizations, generates embeddings using the Perch 2.0 ONNX model, and stores them in a RuVector vector database for similarity search and pattern analysis. The platform implements Retrieval-Augmented Bioacoustics (RAB) for evidence-based interpretation of wildlife communication patterns.

Security-Critical Components

  1. Audio Processing Pipeline: Ingests 5-second mono audio windows at 32 kHz (160,000 samples per window)
  2. Perch 2.0 ONNX Model: Generates 1536-dimensional embeddings from mel spectrograms
  3. RuVector Database: Stores embeddings with HNSW indexing and GNN learning layers
  4. RAB Evidence Packs: Aggregates retrieval results with provenance for interpretations
  5. API Layer: Exposes search, ingestion, and analysis capabilities
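
The window size in item 1 follows from 5 s × 32,000 samples/s = 160,000 samples. A minimal sketch of cutting a longer mono recording into model-sized windows is shown here. The function name and the zero-padding of the last partial window are assumptions for illustration, not something this ADR specifies.

```rust
/// Sample rate and window length as stated in this ADR.
const SAMPLE_RATE_HZ: usize = 32_000;
const WINDOW_SECONDS: usize = 5;
const WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ * WINDOW_SECONDS; // 160_000

/// Hypothetical helper: split mono samples into fixed 5-second windows.
/// The final partial window is zero-padded (assumed policy).
pub fn split_into_windows(samples: &[f32]) -> Vec<Vec<f32>> {
    samples
        .chunks(WINDOW_SAMPLES)
        .map(|chunk| {
            let mut window = chunk.to_vec();
            window.resize(WINDOW_SAMPLES, 0.0);
            window
        })
        .collect()
}
```

Padding keeps every window at the fixed length the model expects; a real pipeline might instead drop or overlap short tails.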

Regulatory Considerations

  • Endangered Species Act (ESA) compliance for protected species data
  • CITES requirements for international wildlife data sharing
  • Research ethics for sensitive habitat location data
  • Data sovereignty for indigenous lands recordings

Decision

We will implement a defense-in-depth security architecture with the following layers:

1. Threat Model

1.1 Primary Threat Actors

| Actor | Motivation | Capability | Risk Level |
|-------|------------|------------|------------|
| Data Exfiltrators | Steal research data, endangered species locations | Moderate-High | Critical |
| Model Poisoners | Corrupt embeddings to degrade analysis quality | Moderate | High |
| Inference Attackers | Extract training data or model internals | High | High |
| Malicious Researchers | Upload harmful content, abuse API | Low-Moderate | Medium |
| Script Kiddies | Automated scanning, opportunistic attacks | Low | Low |

1.2 Attack Vectors

                         ATTACK SURFACE MAP
    +------------------------------------------------------------------+
    |                        API BOUNDARY                               |
    |  [Audio Upload] [Search Query] [Batch Ingestion] [Admin Endpoints]|
    +------------------------------------------------------------------+
              |              |              |              |
              v              v              v              v
    +------------------------------------------------------------------+
    |                    INPUT VALIDATION LAYER                        |
    |  - Audio format validation      - Query sanitization             |
    |  - File size limits             - Rate limiting                  |
    |  - Path traversal prevention    - Authentication check           |
    +------------------------------------------------------------------+
              |              |              |              |
              v              v              v              v
    +------------------------------------------------------------------+
    |                    PROCESSING LAYER                              |
    |  - ONNX model sandboxing        - Memory bounds checking         |
    |  - Embedding normalization      - Resource quotas                |
    +------------------------------------------------------------------+
              |              |              |              |
              v              v              v              v
    +------------------------------------------------------------------+
    |                    STORAGE LAYER                                 |
    |  - Encrypted at rest            - Access control (RBAC)          |
    |  - Audit logging                - Data classification            |
    +------------------------------------------------------------------+

1.3 Threat Scenarios

T1: Model Poisoning via Malicious Audio

  • Attack: Upload crafted audio that produces adversarial embeddings
  • Impact: Corrupts similarity search by clustering benign calls with malicious ones
  • Mitigation: Embedding bounds validation, anomaly detection on insertions
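
One way to read "anomaly detection on insertions" is a running z-score over embedding L2 norms. The sketch below uses Welford's online algorithm; the type name, the choice of the norm as the monitored statistic, and the threshold are all assumptions for illustration.

```rust
/// Running mean/variance of embedding L2 norms (Welford's algorithm).
/// Hypothetical type, not part of the ADR's codebase.
#[derive(Debug, Default)]
pub struct NormStats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl NormStats {
    /// Fold one accepted embedding's norm into the running statistics.
    pub fn observe(&mut self, norm: f64) {
        self.count += 1;
        let delta = norm - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (norm - self.mean);
    }

    /// z-score of a candidate norm; `None` until there is enough history
    /// or when the observed variance is zero.
    pub fn z_score(&self, norm: f64) -> Option<f64> {
        if self.count < 2 {
            return None;
        }
        let std_dev = (self.m2 / (self.count - 1) as f64).sqrt();
        if std_dev == 0.0 {
            return None;
        }
        Some((norm - self.mean) / std_dev)
    }

    /// Flag a candidate whose norm deviates by more than `max_abs_z` sigmas.
    pub fn is_anomalous(&self, norm: f64, max_abs_z: f64) -> bool {
        self.z_score(norm).map_or(false, |z| z.abs() > max_abs_z)
    }
}
```

A norm-only check is cheap but coarse; it complements rather than replaces the per-element bounds in section 2.2.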

T2: Inference Attack on Embeddings

  • Attack: Query embeddings to reconstruct original audio or model weights
  • Impact: Intellectual property theft, privacy breach
  • Mitigation: Differential privacy on query results, rate limiting

T3: Path Traversal on Audio Storage

  • Attack: Manipulate file paths to access system files
  • Impact: System compromise, data exfiltration
  • Mitigation: Strict path canonicalization, chroot-style isolation

T4: Protected Species Location Leakage

  • Attack: Correlate audio metadata to locate endangered species
  • Impact: Poaching risk, regulatory violations
  • Mitigation: Location fuzzing, access tiering, audit logging
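
Location fuzzing can be sketched as deterministic grid snapping: every coordinate inside a cell is reported as the cell centre. The helper names and the cell size are assumptions; the ADR does not say which fuzzing method is used.

```rust
/// Hypothetical helper: coarsen one coordinate (in degrees) by snapping it
/// to the centre of a grid cell of width `cell_deg`.
pub fn fuzz_coordinate(value_deg: f64, cell_deg: f64) -> f64 {
    assert!(cell_deg > 0.0, "cell size must be positive");
    (value_deg / cell_deg).floor() * cell_deg + cell_deg / 2.0
}

/// Apply the same snapping to a (latitude, longitude) pair.
pub fn fuzz_location(lat: f64, lon: f64, cell_deg: f64) -> (f64, f64) {
    (fuzz_coordinate(lat, cell_deg), fuzz_coordinate(lon, cell_deg))
}
```

Snapping is deterministic, so repeated queries for the same recording return the same coarse point and cannot be averaged to recover the true location, which is a known weakness of adding fresh random noise on every request.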

T5: RAB Attribution Manipulation

  • Attack: Forge or modify evidence pack citations
  • Impact: Loss of scientific integrity, misinformation
  • Mitigation: Cryptographic signatures on RAB outputs

2. Input Validation Strategy

2.1 Audio File Validation

// audio_validator.rs

use std::io::{Read, Seek, SeekFrom};

pub struct AudioValidationConfig {
    pub max_file_size: usize,           // 50 MB default
    pub allowed_formats: Vec<String>,    // ["wav", "flac", "ogg"]
    pub required_sample_rate: u32,       // 32000 Hz (Perch 2.0 requirement)
    pub required_channels: u8,           // 1 (mono)
    pub max_duration_seconds: f64,       // 300.0 (5 minutes)
    pub min_duration_seconds: f64,       // 0.5
}

pub enum AudioValidationError {
    FileTooLarge { size: usize, max: usize },
    UnsupportedFormat { format: String },
    InvalidSampleRate { found: u32, expected: u32 },
    InvalidChannels { found: u8, expected: u8 },
    DurationOutOfRange { duration: f64 },
    MalformedHeader,
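    SuspiciousPayload { reason: String },
    Io(std::io::Error),
}

// Lets the `?` operator convert I/O failures from `seek`/`read`/`read_exact`.
impl From<std::io::Error> for AudioValidationError {
    fn from(err: std::io::Error) -> Self {
        AudioValidationError::Io(err)
    }
}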

pub fn validate_audio_file<R: Read + Seek>(
    reader: &mut R,
    config: &AudioValidationConfig,
) -> Result<AudioMetadata, AudioValidationError> {
    // 1. Check file size without loading entire file
    let file_size = reader.seek(SeekFrom::End(0))? as usize;
    reader.seek(SeekFrom::Start(0))?;

    if file_size > config.max_file_size {
        return Err(AudioValidationError::FileTooLarge {
            size: file_size,
            max: config.max_file_size,
        });
    }

    // 2. Validate magic bytes for format detection
    let mut magic = [0u8; 12];
    reader.read_exact(&mut magic)?;
    reader.seek(SeekFrom::Start(0))?;

    let format = detect_audio_format(&magic)?;
    if !config.allowed_formats.contains(&format) {
        return Err(AudioValidationError::UnsupportedFormat { format });
    }

    // 3. Parse and validate header (format-specific)
    let metadata = parse_audio_metadata(reader, &format)?;

    // 4. Validate sample rate matches Perch 2.0 requirement
    if metadata.sample_rate != config.required_sample_rate {
        return Err(AudioValidationError::InvalidSampleRate {
            found: metadata.sample_rate,
            expected: config.required_sample_rate,
        });
    }

    // 5. Validate mono channel requirement
    if metadata.channels != config.required_channels {
        return Err(AudioValidationError::InvalidChannels {
            found: metadata.channels,
            expected: config.required_channels,
        });
    }

    // 6. Validate duration bounds
    if metadata.duration < config.min_duration_seconds
        || metadata.duration > config.max_duration_seconds {
        return Err(AudioValidationError::DurationOutOfRange {
            duration: metadata.duration,
        });
    }

    // 7. Scan for suspicious embedded content
    scan_for_polyglot_attacks(reader)?;

    Ok(metadata)
}

fn scan_for_polyglot_attacks<R: Read + Seek>(reader: &mut R) -> Result<(), AudioValidationError> {
    // Check for embedded executables, scripts, or other dangerous payloads
    // that could exploit audio parser vulnerabilities
    let mut buffer = [0u8; 4096];
    reader.seek(SeekFrom::Start(0))?;

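    // Propagate read errors instead of silently ending the scan early.
    // Note: signatures that straddle a 4096-byte chunk boundary are not
    // detected by this chunk-at-a-time scan.
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 { break; }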

        // Check for common executable signatures
        if contains_executable_signature(&buffer[..n]) {
            return Err(AudioValidationError::SuspiciousPayload {
                reason: "Embedded executable detected".into(),
            });
        }

        // Check for script injection patterns
        if contains_script_patterns(&buffer[..n]) {
            return Err(AudioValidationError::SuspiciousPayload {
                reason: "Script content detected".into(),
            });
        }
    }

    reader.seek(SeekFrom::Start(0))?;
    Ok(())
}
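
The validator above calls `detect_audio_format` without showing it. A minimal sketch of magic-byte detection for the three allowed formats follows. The signature here (returning `Option<&'static str>`) is an assumption and differs from the `Result`-returning helper the validator expects; the byte signatures themselves are the standard RIFF/WAVE, FLAC, and Ogg markers.

```rust
/// Hypothetical sketch: identify the container from the first 12 bytes.
/// Returns the lowercase format name, or `None` if unrecognized.
pub fn detect_audio_format(magic: &[u8; 12]) -> Option<&'static str> {
    if magic.starts_with(b"RIFF") && magic[8..].starts_with(b"WAVE") {
        // RIFF is shared with other containers (e.g. AVI), so the form
        // type at offset 8 must also be checked.
        Some("wav")
    } else if magic.starts_with(b"fLaC") {
        Some("flac")
    } else if magic.starts_with(b"OggS") {
        Some("ogg")
    } else {
        None
    }
}
```

Detecting by content rather than by file extension means a renamed executable is rejected before any audio parser touches it.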

2.2 Embedding Bounds Validation

// embedding_validator.rs

pub struct EmbeddingValidationConfig {
    pub expected_dimensions: usize,      // 1536 for Perch 2.0
    pub max_l2_norm: f32,                // 100.0 (generous bound)
    pub min_l2_norm: f32,                // 0.01 (detect collapsed embeddings)
    pub max_element_value: f32,          // 50.0
    pub min_element_value: f32,          // -50.0
    pub nan_policy: NanPolicy,           // Reject
    pub inf_policy: InfPolicy,           // Reject
}

pub enum EmbeddingValidationError {
    DimensionMismatch { found: usize, expected: usize },
    NormOutOfBounds { norm: f32, min: f32, max: f32 },
    ElementOutOfBounds { index: usize, value: f32 },
    ContainsNaN { indices: Vec<usize> },
    ContainsInf { indices: Vec<usize> },
    SuspiciousPattern { reason: String },
}

pub fn validate_embedding(
    embedding: &[f32],
    config: &EmbeddingValidationConfig,
) -> Result<EmbeddingStats, EmbeddingValidationError> {
    // 1. Dimension check
    if embedding.len() != config.expected_dimensions {
        return Err(EmbeddingValidationError::DimensionMismatch {
            found: embedding.len(),
            expected: config.expected_dimensions,
        });
    }

    let mut nan_indices = Vec::new();
    let mut inf_indices = Vec::new();
    let mut sum_squares = 0.0f64;

    for (i, &val) in embedding.iter().enumerate() {
        // 2. NaN check
        if val.is_nan() {
            nan_indices.push(i);
            continue;
        }

        // 3. Infinity check
        if val.is_infinite() {
            inf_indices.push(i);
            continue;
        }

        // 4. Element bounds check
        if val < config.min_element_value || val > config.max_element_value {
            return Err(EmbeddingValidationError::ElementOutOfBounds {
                index: i,
                value: val,
            });
        }

        sum_squares += (val as f64) * (val as f64);
    }

    // Reject NaN/Inf. This sketch always rejects; `nan_policy` and
    // `inf_policy` are not consulted here.
    if !nan_indices.is_empty() {
        return Err(EmbeddingValidationError::ContainsNaN { indices: nan_indices });
    }
    if !inf_indices.is_empty() {
        return Err(EmbeddingValidationError::ContainsInf { indices: inf_indices });
    }

    // 5. L2 norm bounds check
    // Take the square root in f64 and narrow afterwards to keep precision
    let l2_norm = sum_squares.sqrt() as f32;
    if l2_norm < config.min_l2_norm || l2_norm > config.max_l2_norm {
        return Err(EmbeddingValidationError::NormOutOfBounds {
            norm: l2_norm,
            min: config.min_l2_norm,
            max: config.max_l2_norm,
        });
    }

    // 6. Statistical anomaly detection
    detect_adversarial_patterns(embedding)?;

    Ok(EmbeddingStats {
        l2_norm,
        mean: embedding.iter().sum::<f32>() / embedding.len() as f32,
        variance: compute_variance(embedding),
    })
}

fn detect_adversarial_patterns(embedding: &[f32]) -> Result<(), EmbeddingValidationError> {
    // Detect patterns indicative of adversarial manipulation:
    // - Unusual sparsity (most values zero)
    // - Extreme clustering at specific values
    // - Patterns inconsistent with learned embedding distribution

    let zero_count = embedding.iter().filter(|&&v| v.abs() < 1e-6).count();
    let sparsity = zero_count as f32 / embedding.len() as f32;

    if sparsity > 0.95 {
        return Err(EmbeddingValidationError::SuspiciousPattern {
            reason: format!("Abnormal sparsity: {:.2}%", sparsity * 100.0),
        });
    }

    Ok(())
}
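
`validate_embedding` relies on a `compute_variance` helper that is not shown. One possible version is sketched below; the choice of population variance (dividing by n) and f64 accumulation are assumptions.

```rust
/// Hypothetical sketch: population variance of an embedding.
/// Accumulates in f64 to limit rounding error across 1536 elements.
pub fn compute_variance(values: &[f32]) -> f32 {
    if values.is_empty() {
        return 0.0;
    }
    let n = values.len() as f64;
    let mean = values.iter().map(|&v| v as f64).sum::<f64>() / n;
    let sum_sq_dev = values
        .iter()
        .map(|&v| {
            let d = v as f64 - mean;
            d * d
        })
        .sum::<f64>();
    (sum_sq_dev / n) as f32
}
```

A variance near zero is itself a useful signal: a constant vector passes the sparsity check above unless its value is close to zero.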

3. Path Traversal Prevention

// path_security.rs

use std::path::{Path, PathBuf, Component};

pub struct SecurePathConfig {
    pub audio_root: PathBuf,          // /data/audio
    pub embedding_root: PathBuf,      // /data/embeddings
    pub model_root: PathBuf,          // /models
    pub temp_root: PathBuf,           // /tmp/sevensense
    pub max_path_depth: usize,        // 10
    pub allowed_extensions: Vec<String>,
}

pub enum PathSecurityError {
    PathTraversalAttempt { path: String, reason: String },
    OutsideAllowedRoot { path: String, root: String },
    DisallowedExtension { ext: String },
    SymlinkDetected { path: String },
    PathTooDeep { depth: usize, max: usize },
    InvalidUtf8,
    NullByteDetected,
}

/// Sanitize and validate a user-provided path against traversal attacks.
///
/// CRITICAL: This function MUST be called for ALL user-provided file paths.
pub fn secure_path(
    user_path: &str,
    allowed_root: &Path,
    config: &SecurePathConfig,
) -> Result<PathBuf, PathSecurityError> {
    // 1. Check for null bytes (common bypass technique)
    if user_path.contains('\0') {
        return Err(PathSecurityError::NullByteDetected);
    }

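    // 2. Check for URL encoding bypass attempts
    let decoded = percent_decode(user_path)?;
    // Re-check after decoding so an encoded NUL ("%00") cannot slip past step 1
    if decoded.contains('\0') {
        return Err(PathSecurityError::NullByteDetected);
    }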

    // 3. Reject paths with explicit traversal sequences
    let dangerous_patterns = [
        "..", "..\\", "../", "..%2f", "..%5c",
        "%2e%2e", "%252e%252e",  // Double encoding
        "....//", "....\\\\",    // Variant bypasses
    ];

    let lower = decoded.to_lowercase();
    for pattern in &dangerous_patterns {
        if lower.contains(pattern) {
            return Err(PathSecurityError::PathTraversalAttempt {
                path: user_path.to_string(),
                reason: format!("Contains dangerous pattern: {}", pattern),
            });
        }
    }

    // 4. Parse and canonicalize the path
    let user_path_buf = PathBuf::from(&decoded);

    // 5. Validate each component
    let mut depth = 0;
    for component in user_path_buf.components() {
        match component {
            Component::ParentDir => {
                return Err(PathSecurityError::PathTraversalAttempt {
                    path: user_path.to_string(),
                    reason: "Parent directory reference detected".into(),
                });
            }
            Component::Normal(segment) => {
                depth += 1;
                // Reject hidden files/dirs (dot-prefixed segments). A bare "."
                // is parsed as Component::CurDir and never reaches this arm.
                let seg_str = segment.to_str()
                    .ok_or(PathSecurityError::InvalidUtf8)?;
                if seg_str.starts_with('.') {
                    return Err(PathSecurityError::PathTraversalAttempt {
                        path: user_path.to_string(),
                        reason: "Hidden file/directory not allowed".into(),
                    });
                }
            }
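            Component::RootDir | Component::Prefix(_) => {
                // An absolute path would replace `allowed_root` in `join` below
                return Err(PathSecurityError::PathTraversalAttempt {
                    path: user_path.to_string(),
                    reason: "Absolute path not allowed".into(),
                });
            }
            Component::CurDir => {}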
        }
    }

    // 6. Check path depth
    if depth > config.max_path_depth {
        return Err(PathSecurityError::PathTooDeep {
            depth,
            max: config.max_path_depth,
        });
    }

    // 7. Construct the final path within the allowed root
    let final_path = allowed_root.join(&user_path_buf);

    // 8. Canonicalize and verify it's still under the root
    // Note: We canonicalize the root first to handle symlinks in the root itself
    let canonical_root = allowed_root.canonicalize()
        .map_err(|_| PathSecurityError::PathTraversalAttempt {
            path: user_path.to_string(),
            reason: "Root path resolution failed".into(),
        })?;

    // For new files, canonicalize parent and append filename
    let canonical_final = if final_path.exists() {
        final_path.canonicalize()
            .map_err(|_| PathSecurityError::PathTraversalAttempt {
                path: user_path.to_string(),
                reason: "Path resolution failed".into(),
            })?
    } else {
        let parent = final_path.parent()
            .ok_or(PathSecurityError::PathTraversalAttempt {
                path: user_path.to_string(),
                reason: "Invalid parent path".into(),
            })?;
        let filename = final_path.file_name()
            .ok_or(PathSecurityError::PathTraversalAttempt {
                path: user_path.to_string(),
                reason: "Missing filename".into(),
            })?;

        parent.canonicalize()
            .map_err(|_| PathSecurityError::PathTraversalAttempt {
                path: user_path.to_string(),
                reason: "Parent path resolution failed".into(),
            })?
            .join(filename)
    };

    // 9. Final containment check
    if !canonical_final.starts_with(&canonical_root) {
        return Err(PathSecurityError::OutsideAllowedRoot {
            path: canonical_final.display().to_string(),
            root: canonical_root.display().to_string(),
        });
    }

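    // 10. Check for symlinks (optional, depending on policy).
    // `symlink_metadata` does not follow links, so dangling symlinks are
    // caught too; a missing file simply yields `false`.
    let is_symlink = final_path
        .symlink_metadata()
        .map(|meta| meta.file_type().is_symlink())
        .unwrap_or(false);
    if is_symlink {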
        return Err(PathSecurityError::SymlinkDetected {
            path: user_path.to_string(),
        });
    }

    // 11. Validate extension if applicable
    if let Some(ext) = canonical_final.extension() {
        let ext_str = ext.to_str().ok_or(PathSecurityError::InvalidUtf8)?;
        if !config.allowed_extensions.contains(&ext_str.to_lowercase()) {
            return Err(PathSecurityError::DisallowedExtension {
                ext: ext_str.to_string(),
            });
        }
    }

    Ok(canonical_final)
}
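
`secure_path` calls a `percent_decode` helper that is not shown. A minimal single-pass decoder might look like the sketch below; the error type and its variants are assumptions.

```rust
/// Hypothetical error type for the decoder sketch.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    TruncatedEscape,
    InvalidHex,
    InvalidUtf8,
}

/// Decode `%XX` escapes exactly once and require valid UTF-8 output.
pub fn percent_decode(input: &str) -> Result<String, DecodeError> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Need two hex digits after '%'
            let hex = bytes.get(i + 1..i + 3).ok_or(DecodeError::TruncatedEscape)?;
            let hi = (hex[0] as char).to_digit(16).ok_or(DecodeError::InvalidHex)?;
            let lo = (hex[1] as char).to_digit(16).ok_or(DecodeError::InvalidHex)?;
            out.push((hi * 16 + lo) as u8);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).map_err(|_| DecodeError::InvalidUtf8)
}
```

Because decoding happens once, a double-encoded `%252e%252e` comes out as `%2e%2e`, which is why the pattern list in step 3 still needs the encoded variants.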

4. API Security

4.1 Authentication Architecture

// auth.rs

use jsonwebtoken::{decode, encode, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize}; // required by the derives on Claims, UserRole, Permission
use argon2::{Argon2, PasswordHash, PasswordHasher, PasswordVerifier};
use rand::rngs::OsRng;

/// Authentication configuration - NO HARDCODED CREDENTIALS
pub struct AuthConfig {
    /// JWT signing key - MUST be loaded from environment or secure vault
    pub jwt_secret: String,
    /// Token expiration in seconds
    pub token_expiry_secs: u64,
    /// Refresh token expiration in seconds
    pub refresh_expiry_secs: u64,
    /// Argon2 parameters for password hashing
    pub argon2_params: Argon2Params,
}

pub struct Argon2Params {
    pub memory_cost: u32,      // 65536 (64 MB)
    pub time_cost: u32,        // 3 iterations
    pub parallelism: u32,      // 4 threads
    pub output_length: usize,  // 32 bytes
}

impl Default for Argon2Params {
    fn default() -> Self {
        Self {
            memory_cost: 65536,
            time_cost: 3,
            parallelism: 4,
            output_length: 32,
        }
    }
}

/// Hash password using Argon2id (OWASP recommended)
pub fn hash_password(password: &str, params: &Argon2Params) -> Result<String, AuthError> {
    let salt = argon2::password_hash::SaltString::generate(&mut OsRng);

    let argon2 = Argon2::new(
        argon2::Algorithm::Argon2id,
        argon2::Version::V0x13,
        argon2::Params::new(
            params.memory_cost,
            params.time_cost,
            params.parallelism,
            Some(params.output_length),
        ).map_err(|e| AuthError::HashingError(e.to_string()))?,
    );

    let hash = argon2.hash_password(password.as_bytes(), &salt)
        .map_err(|e| AuthError::HashingError(e.to_string()))?;

    Ok(hash.to_string())
}

/// Verify password against stored hash
pub fn verify_password(password: &str, hash: &str) -> Result<bool, AuthError> {
    let parsed_hash = PasswordHash::new(hash)
        .map_err(|e| AuthError::VerificationError(e.to_string()))?;

    let argon2 = Argon2::default();

    Ok(argon2.verify_password(password.as_bytes(), &parsed_hash).is_ok())
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Claims {
    pub sub: String,           // User ID
    pub role: UserRole,        // Access level
    pub exp: u64,              // Expiration timestamp
    pub iat: u64,              // Issued at
    pub jti: String,           // Unique token ID (for revocation)
    pub permissions: Vec<Permission>,
}

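// PartialOrd follows declaration order. `check_access` compares roles with `<`,
// so variants must stay ordered from least to most privileged.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, PartialOrd)]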
pub enum UserRole {
    Public,          // Read-only access to public data
    Researcher,      // Read/write access to research data
    DataCurator,     // Can modify data classifications
    Administrator,   // Full system access
    Service,         // Machine-to-machine authentication
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum Permission {
    AudioRead,
    AudioWrite,
    AudioDelete,
    EmbeddingRead,
    EmbeddingWrite,
    ProtectedSpeciesRead,  // Requires additional verification
    ProtectedSpeciesWrite,
    ModelExecute,
    AdminAccess,
    AuditLogRead,
}
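
The `exp` and `iat` fields in `Claims` imply a time-window check on every request. The comparison can be sketched as below; the function name, error type, and leeway parameter are assumptions, and in practice a JWT library would normally perform this validation during decoding.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum ClaimTimeError {
    Expired,
    IssuedInFuture,
}

/// Check `iat`/`exp` (Unix seconds) against `now`, tolerating
/// `leeway_secs` of clock skew in either direction.
pub fn check_claim_times(
    iat: u64,
    exp: u64,
    now: u64,
    leeway_secs: u64,
) -> Result<(), ClaimTimeError> {
    if now > exp.saturating_add(leeway_secs) {
        return Err(ClaimTimeError::Expired);
    }
    if iat > now.saturating_add(leeway_secs) {
        return Err(ClaimTimeError::IssuedInFuture);
    }
    Ok(())
}
```

Expiry alone does not cover revocation; that is what the `jti` field is for, checked against a revocation list that this sketch omits.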

4.2 Rate Limiting

// rate_limiter.rs

use std::collections::HashMap;
use std::time::{Duration, Instant};
use parking_lot::RwLock;

pub struct RateLimiterConfig {
    /// Limits per endpoint category
    pub limits: HashMap<EndpointCategory, RateLimit>,
    /// Global limit across all endpoints
    pub global_limit: RateLimit,
    /// Penalty multiplier for repeated violations
    pub violation_penalty: f32,
    /// Max penalty duration
    pub max_penalty_duration: Duration,
}

#[derive(Debug, Clone, Hash, Eq, PartialEq)]
pub enum EndpointCategory {
    AudioUpload,
    EmbeddingQuery,
    BatchIngestion,
    Search,
    Admin,
    ProtectedData,
}

#[derive(Debug, Clone)]
pub struct RateLimit {
    /// Requests allowed per window
    pub requests: u32,
    /// Time window duration
    pub window: Duration,
    /// Burst allowance (token bucket)
    pub burst: u32,
    /// Cost per request (for weighted limiting)
    pub cost: u32,
}

impl Default for RateLimiterConfig {
    fn default() -> Self {
        let mut limits = HashMap::new();

        // Conservative defaults - adjust based on capacity
        limits.insert(EndpointCategory::AudioUpload, RateLimit {
            requests: 100,
            window: Duration::from_secs(3600), // 100/hour
            burst: 10,
            cost: 10,
        });

        limits.insert(EndpointCategory::EmbeddingQuery, RateLimit {
            requests: 1000,
            window: Duration::from_secs(60), // 1000/minute
            burst: 50,
            cost: 1,
        });

        limits.insert(EndpointCategory::Search, RateLimit {
            requests: 500,
            window: Duration::from_secs(60), // 500/minute
            burst: 20,
            cost: 1,
        });

        limits.insert(EndpointCategory::BatchIngestion, RateLimit {
            requests: 10,
            window: Duration::from_secs(3600), // 10/hour
            burst: 2,
            cost: 100,
        });

        limits.insert(EndpointCategory::ProtectedData, RateLimit {
            requests: 50,
            window: Duration::from_secs(3600), // 50/hour
            burst: 5,
            cost: 20,
        });

        limits.insert(EndpointCategory::Admin, RateLimit {
            requests: 100,
            window: Duration::from_secs(60), // 100/minute
            burst: 10,
            cost: 5,
        });

        Self {
            limits,
            global_limit: RateLimit {
                requests: 10000,
                window: Duration::from_secs(60),
                burst: 100,
                cost: 1,
            },
            violation_penalty: 2.0,
            max_penalty_duration: Duration::from_secs(86400), // 24 hours
        }
    }
}

pub struct TokenBucket {
    tokens: f32,
    max_tokens: f32,
    refill_rate: f32,  // tokens per second
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(max_tokens: f32, refill_rate: f32) -> Self {
        Self {
            tokens: max_tokens,
            max_tokens,
            refill_rate,
            last_refill: Instant::now(),
        }
    }

    pub fn try_consume(&mut self, cost: f32) -> bool {
        self.refill();

        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }

    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f32();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.max_tokens);
        self.last_refill = now;
    }
}
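
The configuration above does not show how a `RateLimit` becomes a `TokenBucket`, or how `violation_penalty` is applied. One plausible mapping is sketched here: the refill rate is `requests / window`, and the penalty grows geometrically per violation up to `max_penalty_duration`. Both function names and the exponential interpretation are assumptions.

```rust
use std::time::Duration;

/// Tokens added per second so that `requests` are allowed per `window`.
pub fn refill_rate_per_sec(requests: u32, window: Duration) -> f32 {
    requests as f32 / window.as_secs_f32()
}

/// Assumed policy: the first violation costs `base`, each further violation
/// multiplies the penalty by `multiplier`, capped at `max`.
pub fn penalty_duration(
    base: Duration,
    multiplier: f32,
    violations: u32,
    max: Duration,
) -> Duration {
    if violations == 0 {
        return Duration::ZERO;
    }
    let exponent = i32::try_from(violations - 1).unwrap_or(i32::MAX);
    let secs = base.as_secs_f32() * multiplier.powi(exponent);
    if !secs.is_finite() || secs >= max.as_secs_f32() {
        max
    } else {
        Duration::from_secs_f32(secs)
    }
}
```

With the defaults above, `AudioUpload` (100 per hour) refills at roughly 0.028 tokens per second, and a 60-second base penalty with multiplier 2.0 reaches the 24-hour cap after about a dozen violations.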

5. Data Classification

// data_classification.rs

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::time::Duration; // used by RetentionPolicy

/// Data classification levels following sensitivity hierarchy
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
pub enum ClassificationLevel {
    /// Publicly available data, no restrictions
    Public = 0,

    /// Research data with attribution requirements
    Research = 1,

    /// Internal use only, not for public release
    Internal = 2,

    /// Sensitive habitat or behavioral data
    Sensitive = 3,

    /// Protected species data - regulatory restrictions
    Protected = 4,

    /// Classified/embargoed data - strict access control
    Restricted = 5,
}

/// Classification metadata for audio recordings
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DataClassification {
    /// Primary classification level
    pub level: ClassificationLevel,

    /// Specific classification tags
    pub tags: Vec<ClassificationTag>,

    /// Regulatory frameworks that apply
    pub regulations: Vec<Regulation>,

    /// Access requirements
    pub access_requirements: AccessRequirements,

    /// Retention policy
    pub retention: RetentionPolicy,

    /// Classification reason and justification
    pub rationale: String,

    /// Who assigned the classification
    pub classified_by: String,

    /// When the classification was assigned
    pub classified_at: DateTime<Utc>,

    /// Review date for reclassification
    pub review_date: Option<DateTime<Utc>>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ClassificationTag {
    /// Contains protected species vocalizations
    ProtectedSpecies { species_code: String, conservation_status: ConservationStatus },

    /// Contains precise location data
    PreciseLocation,

    /// Contains indigenous lands recordings
    IndigenousTerritory { territory_code: String },

    /// Contains breeding site information
    BreedingSite,

    /// Contains data under active research embargo
    ResearchEmbargo { lift_date: DateTime<Utc> },

    /// Contains personally identifiable information (researcher voices, etc.)
    PII,

    /// Commercial restrictions apply
    CommercialRestriction,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ConservationStatus {
    LeastConcern,
    NearThreatened,
    Vulnerable,
    Endangered,
    CriticallyEndangered,
    ExtinctInWild,
    Unknown,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Regulation {
    /// US Endangered Species Act
    ESA { permit_required: bool, permit_number: Option<String> },

    /// Convention on International Trade in Endangered Species
    CITES { appendix: u8 },

    /// EU Habitats Directive
    HabitatsDirective,

    /// Migratory Bird Treaty Act
    MBTA,

    /// Institution-specific IRB approval
    IRB { protocol_number: String },

    /// Data sovereignty requirements
    DataSovereignty { jurisdiction: String },

    /// Custom regulatory framework
    Custom { name: String, requirements: String },
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AccessRequirements {
    /// Minimum role required
    pub min_role: crate::auth::UserRole,

    /// Additional permissions required
    pub required_permissions: Vec<crate::auth::Permission>,

    /// Requires signed data use agreement
    pub requires_dua: bool,

    /// Requires institutional affiliation verification
    pub requires_affiliation: bool,

    /// Requires ethics approval
    pub requires_ethics_approval: bool,

    /// Geographic restrictions on access
    pub geographic_restrictions: Option<Vec<String>>,

    /// Time-based access restrictions
    pub time_restrictions: Option<TimeRestrictions>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TimeRestrictions {
    /// Earliest time data can be accessed
    pub not_before: Option<DateTime<Utc>>,

    /// Latest time data can be accessed
    pub not_after: Option<DateTime<Utc>>,

    /// Seasonal restrictions (e.g., no access during breeding season)
    pub seasonal_blackouts: Vec<SeasonalBlackout>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SeasonalBlackout {
    pub name: String,
    pub start_month: u8,
    pub start_day: u8,
    pub end_month: u8,
    pub end_day: u8,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RetentionPolicy {
    /// Minimum retention period
    pub min_retention: Duration,

    /// Maximum retention period (for PII, etc.)
    pub max_retention: Option<Duration>,

    /// Action after retention period
    pub post_retention_action: PostRetentionAction,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum PostRetentionAction {
    Delete,
    Archive,
    Anonymize,
    Review,
}

/// Apply classification-based access control
pub fn check_access(
    classification: &DataClassification,
    user_role: &crate::auth::UserRole,
    user_permissions: &[crate::auth::Permission],
    context: &AccessContext,
) -> Result<(), AccessDeniedReason> {
    // Check role hierarchy
    if *user_role < classification.access_requirements.min_role {
        return Err(AccessDeniedReason::InsufficientRole {
            required: classification.access_requirements.min_role.clone(),
            actual: user_role.clone(),
        });
    }

    // Check required permissions
    for required in &classification.access_requirements.required_permissions {
        if !user_permissions.contains(required) {
            return Err(AccessDeniedReason::MissingPermission {
                required: required.clone(),
            });
        }
    }

    // Check geographic restrictions
    if let Some(ref allowed_regions) = classification.access_requirements.geographic_restrictions {
        if !allowed_regions.contains(&context.requester_region) {
            return Err(AccessDeniedReason::GeographicRestriction {
                requester_region: context.requester_region.clone(),
            });
        }
    }

    // Check time restrictions
    if let Some(ref time_restrictions) = classification.access_requirements.time_restrictions {
        let now = Utc::now();

        if let Some(not_before) = time_restrictions.not_before {
            if now < not_before {
                return Err(AccessDeniedReason::TemporalRestriction {
                    reason: format!("Data not available until {}", not_before),
                });
            }
        }

        if let Some(not_after) = time_restrictions.not_after {
            if now > not_after {
                return Err(AccessDeniedReason::TemporalRestriction {
                    reason: format!("Data access expired at {}", not_after),
                });
            }
        }
    }

    Ok(())
}
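
`check_access` as written evaluates `not_before` and `not_after` but does not consult `seasonal_blackouts`. A sketch of the date-window test for one `SeasonalBlackout` is shown below; the function name and tuple parameters are assumptions, and the window is treated as inclusive on both ends.

```rust
/// Hypothetical helper: is (month, day) inside a blackout that runs from
/// `start` to `end`, both given as (month, day) and both inclusive?
/// Handles windows that wrap across the year boundary.
pub fn in_blackout(month: u8, day: u8, start: (u8, u8), end: (u8, u8)) -> bool {
    let today = (month, day);
    if start <= end {
        // Window within one calendar year, e.g. 1 March to 30 June
        start <= today && today <= end
    } else {
        // Window wraps over New Year, e.g. 15 November to 15 February
        today >= start || today <= end
    }
}
```

Tuples compare lexicographically, so `(month, day)` ordering matches calendar ordering without any date arithmetic. A caller would pass `(start_month, start_day)` and `(end_month, end_day)` from each blackout and deny access if any returns `true`.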

6. Audit Logging and Provenance

// audit.rs

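use chrono::{DateTime, Utc};
// Module path assumed from the data_classification.rs file name above
use crate::data_classification::ClassificationLevel;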
use serde::{Deserialize, Serialize};
use sha2::{Sha256, Digest};
use uuid::Uuid;

/// Immutable audit log entry
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuditEntry {
    /// Unique entry ID
    pub id: Uuid,

    /// Timestamp of the event
    pub timestamp: DateTime<Utc>,

    /// Type of event
    pub event_type: AuditEventType,

    /// User or service that performed the action
    pub actor: Actor,

    /// Resource affected
    pub resource: Resource,

    /// Action performed
    pub action: Action,

    /// Outcome of the action
    pub outcome: Outcome,

    /// Additional context
    pub context: AuditContext,

    /// Hash of previous entry (blockchain-style chain)
    pub previous_hash: String,

    /// Hash of this entry
    pub entry_hash: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum AuditEventType {
    Authentication,
    Authorization,
    DataAccess,
    DataModification,
    DataDeletion,
    ModelExecution,
    ConfigurationChange,
    SecurityEvent,
    SystemEvent,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Actor {
    pub actor_type: ActorType,
    pub id: String,
    pub name: Option<String>,
    pub ip_address: Option<String>,
    pub user_agent: Option<String>,
    pub session_id: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ActorType {
    User,
    Service,
    System,
    Anonymous,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Resource {
    pub resource_type: ResourceType,
    pub id: String,
    pub classification: Option<ClassificationLevel>,
    pub metadata: Option<serde_json::Value>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ResourceType {
    AudioRecording,
    Embedding,
    Model,
    Query,
    Configuration,
    User,
    ApiKey,
    RABEvidencePack,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Action {
    pub action_type: ActionType,
    pub details: String,
    pub parameters: Option<serde_json::Value>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ActionType {
    Create,
    Read,
    Update,
    Delete,
    Query,
    Export,
    Import,
    Execute,
    Authenticate,
    Authorize,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Outcome {
    pub success: bool,
    pub error_code: Option<String>,
    pub error_message: Option<String>,
    pub affected_count: Option<u64>,
    pub duration_ms: Option<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuditContext {
    /// Request correlation ID for tracing
    pub correlation_id: String,

    /// Server that processed the request
    pub server_id: String,

    /// API endpoint or function
    pub endpoint: String,

    /// Request method
    pub method: String,

    /// Query or search terms (sanitized)
    pub query_sanitized: Option<String>,

    /// Data classification of accessed resources
    pub data_classification: Option<ClassificationLevel>,

    /// Regulatory frameworks involved
    pub regulations_involved: Vec<String>,
}

impl AuditEntry {
    pub fn new(
        event_type: AuditEventType,
        actor: Actor,
        resource: Resource,
        action: Action,
        outcome: Outcome,
        context: AuditContext,
        previous_hash: String,
    ) -> Self {
        let mut entry = Self {
            id: Uuid::new_v4(),
            timestamp: Utc::now(),
            event_type,
            actor,
            resource,
            action,
            outcome,
            context,
            previous_hash,
            entry_hash: String::new(),
        };

        entry.entry_hash = entry.compute_hash();
        entry
    }

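    fn compute_hash(&self) -> String {
        let mut hasher = Sha256::new();

        // Hash every field except entry_hash itself, including the context,
        // so that changing any recorded detail changes the digest.
        hasher.update(self.id.to_string().as_bytes());
        hasher.update(self.timestamp.to_rfc3339().as_bytes());
        hasher.update(serde_json::to_string(&self.event_type).unwrap().as_bytes());
        hasher.update(serde_json::to_string(&self.actor).unwrap().as_bytes());
        hasher.update(serde_json::to_string(&self.resource).unwrap().as_bytes());
        hasher.update(serde_json::to_string(&self.action).unwrap().as_bytes());
        hasher.update(serde_json::to_string(&self.outcome).unwrap().as_bytes());
        hasher.update(serde_json::to_string(&self.context).unwrap().as_bytes());
        hasher.update(self.previous_hash.as_bytes());

        format!("{:x}", hasher.finalize())
    }

    /// Returns true only if `previous` is unmodified (its stored hash matches
    /// a hash recomputed from its contents) and this entry links to it.
    /// Comparing stored hashes alone would not detect edits to `previous`.
    pub fn verify_chain(&self, previous: &AuditEntry) -> bool {
        previous.entry_hash == previous.compute_hash()
            && self.previous_hash == previous.entry_hash
    }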
}

/// RAB Evidence Pack Provenance
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RABProvenance {
    /// Unique provenance ID
    pub id: Uuid,

    /// When the evidence pack was generated
    pub generated_at: DateTime<Utc>,

    /// Query that triggered the generation
    pub query_id: String,

    /// Retrieved neighbors with source attribution
    pub retrieved_sources: Vec<RetrievedSource>,

    /// Model version used for embeddings
    pub embedding_model: ModelVersion,

    /// Search parameters used
    pub search_parameters: SearchParameters,

    /// Confidence metrics
    pub confidence: ConfidenceMetrics,

    /// Cryptographic signature for integrity
    pub signature: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RetrievedSource {
    /// Source recording ID
    pub recording_id: String,

    /// Segment within recording
    pub segment_id: String,

    /// Distance/similarity score
    pub similarity_score: f32,

    /// Original data source (dataset name, institution)
    pub data_source: String,

    /// License/usage terms
    pub license: String,

    /// Attribution string
    pub attribution: String,

    /// Timestamp of source recording
    pub source_timestamp: Option<DateTime<Utc>>,

    /// Location (if not restricted)
    pub location: Option<FuzzedLocation>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FuzzedLocation {
    /// Fuzzing applied (for protected species)
    pub fuzzing_radius_km: f32,

    /// Fuzzed coordinates
    pub latitude: f64,
    pub longitude: f64,

    /// Region name (safe to disclose)
    pub region: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ModelVersion {
    pub name: String,
    pub version: String,
    pub hash: String,  // SHA256 of model weights
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SearchParameters {
    pub top_k: usize,
    pub distance_metric: String,
    pub min_similarity: f32,
    pub filters_applied: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ConfidenceMetrics {
    /// Overall retrieval confidence
    pub retrieval_confidence: f32,

    /// Similarity distribution statistics
    pub similarity_mean: f32,
    pub similarity_std: f32,

    /// Number of sources above threshold
    pub high_confidence_count: usize,
}
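
`AuditEntry` links each entry to its predecessor through `previous_hash`. A sketch of how a verifier might walk a whole log and report the first broken link is given below. All names (`LogRecord`, `digest`, `append`, `verify_log`, `GENESIS_HASH`) are hypothetical, the record is reduced to a single payload string, and the standard library's `DefaultHasher` stands in for SHA-256 purely so the example has no dependencies. `DefaultHasher` is not cryptographically secure and should not be used for a real audit log.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified, hypothetical log record; the real AuditEntry has more fields.
#[derive(Debug, Clone)]
pub struct LogRecord {
    pub payload: String,
    pub previous_hash: u64,
    pub entry_hash: u64,
}

/// Hash value that the first record links back to (an assumption of this sketch).
pub const GENESIS_HASH: u64 = 0;

/// NOT cryptographic: DefaultHasher is a placeholder for SHA-256.
fn digest(payload: &str, previous_hash: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    previous_hash.hash(&mut hasher);
    hasher.finish()
}

/// Appends a record whose hash covers its payload and the previous hash.
pub fn append(log: &mut Vec<LogRecord>, payload: &str) {
    let previous_hash = log.last().map_or(GENESIS_HASH, |r| r.entry_hash);
    let entry_hash = digest(payload, previous_hash);
    log.push(LogRecord {
        payload: payload.to_string(),
        previous_hash,
        entry_hash,
    });
}

/// Walks the log from the start. Returns Err(index) for the first record
/// whose back-link or stored hash does not match the recomputed values.
pub fn verify_log(log: &[LogRecord]) -> Result<(), usize> {
    let mut expected_previous = GENESIS_HASH;
    for (index, record) in log.iter().enumerate() {
        let recomputed = digest(&record.payload, record.previous_hash);
        if record.previous_hash != expected_previous || record.entry_hash != recomputed {
            return Err(index);
        }
        expected_previous = record.entry_hash;
    }
    Ok(())
}
```

Recomputing each hash during the walk is what makes the chain tamper-evident: an edit to a stored record is caught at that record's index, even if the attacker leaves every stored hash untouched.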

7. Secure ONNX Model Execution

// model_security.rs

use std::path::Path;
use sha2::{Sha256, Digest};

/// Configuration for secure ONNX model execution
pub struct ONNXSecurityConfig {
    /// Expected model hash (SHA256)
    pub expected_model_hash: String,

    /// Maximum input tensor size (bytes)
    pub max_input_size: usize,

    /// Maximum output tensor size (bytes)
    pub max_output_size: usize,

    /// Execution timeout (milliseconds)
    pub execution_timeout_ms: u64,

    /// Memory limit for inference (bytes)
    pub memory_limit: usize,

    /// Allow GPU execution
    pub allow_gpu: bool,

    /// Allowed execution providers
    pub allowed_providers: Vec<String>,
}

impl Default for ONNXSecurityConfig {
    fn default() -> Self {
        Self {
            expected_model_hash: String::new(), // Must be set explicitly
            max_input_size: 160_000 * 4,        // 160k samples * 4 bytes (f32)
            max_output_size: 1536 * 4,          // 1536-dim embedding * 4 bytes
            execution_timeout_ms: 30_000,        // 30 seconds
            memory_limit: 2 * 1024 * 1024 * 1024, // 2 GB
            allow_gpu: true,
            allowed_providers: vec![
                "CPUExecutionProvider".into(),
                "CUDAExecutionProvider".into(),
            ],
        }
    }
}

pub struct SecureONNXRuntime {
    config: ONNXSecurityConfig,
    model_hash: String,
    // session: ort::Session, // actual ONNX runtime session
}

impl SecureONNXRuntime {
    /// Load and verify ONNX model
    pub fn load(model_path: &Path, config: ONNXSecurityConfig) -> Result<Self, ModelSecurityError> {
        // 1. Verify model file integrity
        let model_bytes = std::fs::read(model_path)
            .map_err(|e| ModelSecurityError::LoadError(e.to_string()))?;

        let mut hasher = Sha256::new();
        hasher.update(&model_bytes);
        let model_hash = format!("{:x}", hasher.finalize());

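        // Fail closed: the default config leaves the expected hash empty, so
        // refuse to load rather than silently skipping verification.
        if config.expected_model_hash.is_empty() {
            return Err(ModelSecurityError::LoadError(
                "expected_model_hash must be set before loading a model".into(),
            ));
        }

        // Hex digests may be written in either case; compare case-insensitively.
        if !model_hash.eq_ignore_ascii_case(&config.expected_model_hash) {
            return Err(ModelSecurityError::IntegrityViolation {
                expected: config.expected_model_hash.clone(),
                actual: model_hash,
            });
        }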

        // 2. Validate model structure (basic sanity checks)
        validate_onnx_structure(&model_bytes)?;

        // 3. Create ONNX runtime session with security constraints
        // let session = create_secure_session(&model_bytes, &config)?;

        Ok(Self {
            config,
            model_hash,
            // session,
        })
    }

    /// Execute inference with security constraints
    pub fn infer(&self, input: &[f32]) -> Result<Vec<f32>, ModelSecurityError> {
        // 1. Validate input size
        let input_bytes = input.len() * std::mem::size_of::<f32>();
        if input_bytes > self.config.max_input_size {
            return Err(ModelSecurityError::InputTooLarge {
                size: input_bytes,
                max: self.config.max_input_size,
            });
        }

        // 2. Validate input dimensions for Perch 2.0 (160,000 samples)
        if input.len() != 160_000 {
            return Err(ModelSecurityError::InvalidInputDimensions {
                expected: 160_000,
                actual: input.len(),
            });
        }

        // 3. Check for NaN/Inf in input
        for (i, &val) in input.iter().enumerate() {
            if val.is_nan() {
                return Err(ModelSecurityError::InvalidInputValue {
                    index: i,
                    reason: "NaN value".into(),
                });
            }
            if val.is_infinite() {
                return Err(ModelSecurityError::InvalidInputValue {
                    index: i,
                    reason: "Infinite value".into(),
                });
            }
        }

        // 4. Execute with timeout
        // let output = tokio::time::timeout(
        //     Duration::from_millis(self.config.execution_timeout_ms),
        //     self.session.run(input)
        // ).await??;

        // 5. Validate output
        // validate_output(&output, &self.config)?;

        // Placeholder - actual implementation uses ort crate
        Ok(vec![0.0; 1536])
    }
}

fn validate_onnx_structure(model_bytes: &[u8]) -> Result<(), ModelSecurityError> {
    // Placeholder sanity check only. ONNX models are serialized protobuf
    // messages (ModelProto) without a magic-byte header, so meaningful
    // validation has to parse the file and inspect the IR version, opset
    // imports, and graph structure.

    if model_bytes.len() < 8 {
        return Err(ModelSecurityError::InvalidFormat("File too small".into()));
    }

    // Production code should parse the model with a proper ONNX/protobuf
    // parser instead of relying on this length check.

    Ok(())
}

#[derive(Debug)]
pub enum ModelSecurityError {
    LoadError(String),
    IntegrityViolation { expected: String, actual: String },
    InvalidFormat(String),
    InputTooLarge { size: usize, max: usize },
    InvalidInputDimensions { expected: usize, actual: usize },
    InvalidInputValue { index: usize, reason: String },
    ExecutionTimeout,
    MemoryExceeded,
    OutputValidationFailed(String),
}
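
The commented-out timeout step in `infer` uses `tokio::time::timeout`, which requires an async context, while `infer` itself is synchronous. One possible alternative for a synchronous caller, sketched here with only the standard library, is to run the work on a worker thread and wait on a channel with a deadline. `run_with_timeout` and `TimeoutError` are hypothetical names, and the closure stands in for the real ONNX session call.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum TimeoutError {
    /// The work did not finish within the deadline.
    Timeout,
    /// The worker thread ended without sending a result (for example, it panicked).
    WorkerFailed,
}

/// Runs `work` on a separate thread and waits at most `timeout` for its result.
/// On timeout the worker is abandoned, not cancelled: it keeps running until
/// `work` returns, so this bounds caller latency but not resource usage.
pub fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Result<T, TimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that case.
        let _ = sender.send(work());
    });

    match receiver.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(TimeoutError::Timeout),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(TimeoutError::WorkerFailed),
    }
}
```

Because the abandoned worker continues to consume CPU and memory, a production design would likely pair this with a bounded worker pool or process-level isolation so that repeated timeouts cannot exhaust the host.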

8. Memory Safety (Rust Advantages)

// memory_safety.rs

//! 7sense leverages Rust's memory safety guarantees to prevent
//! entire classes of vulnerabilities common in systems handling
//! binary data (audio files, embeddings, model weights).

/// Key Memory Safety Features Utilized
///
/// 1. BUFFER OVERFLOW PREVENTION
///    - Rust's bounds checking on array/slice access
///    - No raw pointer arithmetic without unsafe blocks
///    - Example: Audio sample access is always bounds-checked
///
/// 2. USE-AFTER-FREE PREVENTION
///    - Ownership system ensures memory is freed exactly once
///    - Embedding vectors cannot be accessed after transfer
///    - Example: Once an embedding is moved to RuVector, caller cannot access it
///
/// 3. DATA RACE PREVENTION
///    - Send/Sync traits enforce thread-safe data sharing
///    - RuVector's concurrent access is compile-time verified
///    - Example: Concurrent embedding queries are proven race-free
///
/// 4. NULL POINTER PREVENTION
///    - Option<T> explicitly represents nullable values
///    - No null pointer dereferences possible
///    - Example: Missing metadata returns None, not crash
///
/// 5. INTEGER OVERFLOW PROTECTION
///    - Debug builds panic on overflow
///    - Release builds wrap silently unless overflow-checks is enabled,
///      so size calculations should use checked_* methods explicitly
///    - Example: Audio duration calculations use checked arithmetic

/// Safe audio buffer handling
pub struct AudioBuffer {
    samples: Vec<f32>,
    sample_rate: u32,
}

impl AudioBuffer {
    /// Create a new audio buffer with validated dimensions
    pub fn new(samples: Vec<f32>, sample_rate: u32) -> Result<Self, AudioError> {
        // Reject empty buffers before doing any arithmetic on the length
        if samples.is_empty() {
            return Err(AudioError::EmptyBuffer);
        }

        // checked_div returns None for a zero sample rate instead of panicking
        let duration_samples = samples.len();
        let _duration_seconds = duration_samples
            .checked_div(sample_rate as usize)
            .ok_or(AudioError::InvalidSampleRate)?;

        Ok(Self { samples, sample_rate })
    }

    /// Iterate over samples - an iterator cannot step outside the buffer, so no index checks are needed
    pub fn iter(&self) -> impl Iterator<Item = &f32> {
        self.samples.iter()
    }

    /// Slice access - bounds checked at runtime, returns None if out of bounds
    pub fn get_segment(&self, start: usize, end: usize) -> Option<&[f32]> {
        self.samples.get(start..end)
    }
}

/// Safe embedding handling with ownership transfer
pub struct EmbeddingHandle {
    /// Private field prevents external construction
    embedding: Box<[f32; 1536]>,
    /// Metadata stays with the embedding
    metadata: EmbeddingMetadata,
}

impl EmbeddingHandle {
    /// Consume the handle to get the embedding - prevents double-use
    pub fn into_inner(self) -> Box<[f32; 1536]> {
        // self is moved here, cannot be used again
        self.embedding
    }

    /// Borrow for read-only access
    pub fn as_slice(&self) -> &[f32] {
        &self.embedding[..]
    }
}

/// Thread-safe shared state for concurrent embedding operations
pub struct ConcurrentEmbeddingStore {
    /// RwLock allows multiple readers or single writer
    /// Compile-time guaranteed no data races
    store: parking_lot::RwLock<std::collections::HashMap<String, EmbeddingHandle>>,
}

impl ConcurrentEmbeddingStore {
    pub fn new() -> Self {
        Self {
            store: parking_lot::RwLock::new(std::collections::HashMap::new()),
        }
    }

    /// Read access - multiple threads can read simultaneously
    pub fn get(&self, key: &str) -> Option<Vec<f32>> {
        let guard = self.store.read();
        guard.get(key).map(|h| h.as_slice().to_vec())
    }

    /// Write access - exclusive, blocks readers
    pub fn insert(&self, key: String, handle: EmbeddingHandle) {
        let mut guard = self.store.write();
        guard.insert(key, handle);
        // Lock released here, other threads can proceed
    }
}

/// Zeroing sensitive data on drop
pub struct SensitiveBuffer {
    data: Vec<u8>,
}

impl Drop for SensitiveBuffer {
    fn drop(&mut self) {
        // Explicitly zero memory before deallocation
        // Prevents sensitive data from lingering in freed memory
        for byte in &mut self.data {
            unsafe {
                std::ptr::write_volatile(byte, 0);
            }
        }
        // The volatile writes above are what stop the compiler from eliding
        // the zeroing; the compiler fence keeps it from reordering them.
        std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
    }
}

#[derive(Debug)]
pub enum AudioError {
    EmptyBuffer,
    InvalidSampleRate,
}

pub struct EmbeddingMetadata {
    pub source_id: String,
    pub generated_at: chrono::DateTime<chrono::Utc>,
}
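
The notes above state that audio duration calculations use checked arithmetic. A minimal sketch of what that could look like for a millisecond duration, where the multiplication is the step that can actually overflow, is shown here. `duration_ms` and `full_segment_count` are hypothetical helpers and are not part of `AudioBuffer`.

```rust
/// Duration of a buffer in whole milliseconds.
/// Returns None when the sample rate is zero or when `sample_count * 1000`
/// would overflow u64.
pub fn duration_ms(sample_count: u64, sample_rate: u32) -> Option<u64> {
    sample_count
        .checked_mul(1_000)?
        .checked_div(u64::from(sample_rate))
}

/// Number of complete fixed-length segments in a buffer; a trailing partial
/// segment is not counted. Returns None when `segment_len` is zero.
pub fn full_segment_count(sample_count: u64, segment_len: u64) -> Option<u64> {
    sample_count.checked_div(segment_len)
}
```

For the 5-second, 32 kHz window used by the pipeline, `duration_ms(160_000, 32_000)` evaluates to `Some(5000)`, while a zero sample rate or an overflowing sample count yields `None` in both debug and release builds.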

9. OWASP Top 10 Mitigations (Bioacoustics Domain)

| OWASP Category | 7sense-Specific Risk | Mitigation |
|----------------|----------------------|------------|
| A01:2021 Broken Access Control | Unauthorized access to protected species data | RBAC with classification-based access, location fuzzing for sensitive coordinates |
| A02:2021 Cryptographic Failures | Embedding data exposure, weak provenance | AES-256 encryption at rest, Ed25519 signatures on RAB evidence packs |
| A03:2021 Injection | Path traversal in audio storage, query injection in Cypher | Strict path canonicalization (Section 3), parameterized queries only |
| A04:2021 Insecure Design | Model poisoning via adversarial audio | Embedding bounds validation, anomaly detection on insertions |
| A05:2021 Security Misconfiguration | Exposed ONNX model internals, debug endpoints | Hardened default config, model integrity verification (Section 7) |
| A06:2021 Vulnerable Components | Outdated ONNX runtime, RuVector dependencies | Automated dependency scanning, pinned versions with hash verification |
| A07:2021 Auth Failures | Weak API key management, session hijacking | Argon2id hashing, short-lived JWTs, secure session management |
| A08:2021 Data Integrity Failures | Corrupted embeddings, falsified provenance | Hash-chained audit logs, cryptographic RAB signatures |
| A09:2021 Logging Failures | Missing audit trail for protected data access | Comprehensive audit logging (Section 6), immutable log chain |
| A10:2021 SSRF | Model loading from attacker-controlled URLs | Local-only model loading, no remote URL support |
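
The A01 mitigation relies on location fuzzing for sensitive coordinates. One way this could work, sketched below under stated assumptions, is to snap coordinates to the centre of a fixed grid cell. The document's `FuzzedLocation` expresses fuzzing as a radius in kilometres; the degree-based grid, `GridFuzzedLocation`, and `snap_to_grid` are hypothetical and chosen only to keep the example short.

```rust
/// Hypothetical result type: the centre of the grid cell containing the point.
#[derive(Debug, Clone, PartialEq)]
pub struct GridFuzzedLocation {
    pub latitude: f64,
    pub longitude: f64,
    pub cell_size_deg: f64,
}

/// Snaps a coordinate to the centre of its grid cell.
/// Returns None for non-finite or out-of-range input, or a non-positive cell size.
pub fn snap_to_grid(
    latitude: f64,
    longitude: f64,
    cell_size_deg: f64,
) -> Option<GridFuzzedLocation> {
    let valid = latitude.is_finite()
        && longitude.is_finite()
        && (-90.0..=90.0).contains(&latitude)
        && (-180.0..=180.0).contains(&longitude)
        && cell_size_deg.is_finite()
        && cell_size_deg > 0.0;
    if !valid {
        return None;
    }

    // floor() selects the cell; adding 0.5 moves to that cell's centre.
    let snap = |value: f64| ((value / cell_size_deg).floor() + 0.5) * cell_size_deg;

    Some(GridFuzzedLocation {
        // Clamp so cells on the boundary never report an invalid coordinate.
        latitude: snap(latitude).clamp(-90.0, 90.0),
        longitude: snap(longitude).clamp(-180.0, 180.0),
        cell_size_deg,
    })
}
```

Deterministic snapping returns the same answer for every query about a given recording, whereas independent random jitter per request can be averaged out by an attacker who queries the same record repeatedly. The trade-off is that a degree-based cell covers less ground at high latitudes, so a real implementation would probably derive the cell size from the required radius in kilometres.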

10. Security Testing Requirements

// security_tests.rs

#[cfg(test)]
mod security_tests {
    use super::*;

    /// Test: Path traversal attempts must be rejected
    #[test]
    fn test_path_traversal_prevention() {
        let config = SecurePathConfig::default();
        let root = Path::new("/data/audio");

        let malicious_paths = [
            "../../../etc/passwd",
            "..\\..\\..\\windows\\system32\\config\\sam",
            "audio/../../secret",
            "audio%2f..%2f..%2fsecret",
            "audio\x00.wav",  // Null byte injection
            "....//....//etc/passwd",  // Bypass attempts
        ];

        for path in &malicious_paths {
            let result = secure_path(path, root, &config);
            assert!(result.is_err(), "Path should be rejected: {}", path);
        }
    }

    /// Test: Embedding bounds are enforced
    #[test]
    fn test_embedding_bounds_validation() {
        let config = EmbeddingValidationConfig::default();

        // Test NaN rejection
        let mut nan_embedding = vec![0.0f32; 1536];
        nan_embedding[100] = f32::NAN;
        assert!(validate_embedding(&nan_embedding, &config).is_err());

        // Test infinity rejection
        let mut inf_embedding = vec![0.0f32; 1536];
        inf_embedding[500] = f32::INFINITY;
        assert!(validate_embedding(&inf_embedding, &config).is_err());

        // Test dimension mismatch
        let wrong_dim = vec![0.0f32; 512];
        assert!(validate_embedding(&wrong_dim, &config).is_err());

        // Test extreme values
        let mut extreme_embedding = vec![0.0f32; 1536];
        extreme_embedding[0] = 1000.0;  // Way above max
        assert!(validate_embedding(&extreme_embedding, &config).is_err());
    }

    /// Test: Audio validation rejects malformed files
    #[test]
    fn test_audio_validation() {
        let config = AudioValidationConfig::default();

        // Test: Reject files exceeding size limit
        // Test: Reject non-audio files disguised as audio
        // Test: Reject wrong sample rate
        // Test: Reject stereo files (require mono)
        // Test: Detect embedded executables
    }

    /// Test: Rate limiting prevents abuse
    #[test]
    fn test_rate_limiting() {
        let config = RateLimiterConfig::default();
        let limiter = RateLimiter::new(config);

        // Exhaust rate limit
        for _ in 0..1000 {
            let _ = limiter.check("user1", EndpointCategory::Search);
        }

        // Next request should be limited
        let result = limiter.check("user1", EndpointCategory::Search);
        assert!(result.is_err());
    }

    /// Test: Classification access control enforced
    #[test]
    fn test_classification_access() {
        let protected_classification = DataClassification {
            level: ClassificationLevel::Protected,
            access_requirements: AccessRequirements {
                min_role: UserRole::Researcher,
                required_permissions: vec![Permission::ProtectedSpeciesRead],
                requires_dua: true,
                ..Default::default()
            },
            ..Default::default()
        };

        // Public user should be denied
        let public_context = AccessContext {
            requester_region: "US".into(),
            ..Default::default()
        };
        assert!(check_access(
            &protected_classification,
            &UserRole::Public,
            &[Permission::AudioRead],
            &public_context
        ).is_err());

        // Researcher with correct permissions should be allowed
        assert!(check_access(
            &protected_classification,
            &UserRole::Researcher,
            &[Permission::ProtectedSpeciesRead],
            &public_context
        ).is_ok());
    }

    /// Test: Audit log chain integrity
    #[test]
    fn test_audit_chain_integrity() {
        let entry1 = AuditEntry::new(
            AuditEventType::DataAccess,
            Actor { actor_type: ActorType::User, id: "user1".into(), ..Default::default() },
            Resource { resource_type: ResourceType::AudioRecording, id: "rec1".into(), ..Default::default() },
            Action { action_type: ActionType::Read, details: "Query".into(), ..Default::default() },
            Outcome { success: true, ..Default::default() },
            AuditContext::default(),
            "genesis".into(),
        );

        let entry2 = AuditEntry::new(
            AuditEventType::DataAccess,
            Actor { actor_type: ActorType::User, id: "user2".into(), ..Default::default() },
            Resource { resource_type: ResourceType::AudioRecording, id: "rec2".into(), ..Default::default() },
            Action { action_type: ActionType::Read, details: "Query".into(), ..Default::default() },
            Outcome { success: true, ..Default::default() },
            AuditContext::default(),
            entry1.entry_hash.clone(),
        );

        assert!(entry2.verify_chain(&entry1));

        // Tampering should break chain
        let mut tampered = entry1.clone();
        tampered.actor.id = "attacker".into();
        assert!(!entry2.verify_chain(&tampered));
    }

    /// Test: ONNX model integrity verification
    #[test]
    fn test_model_integrity() {
        let config = ONNXSecurityConfig {
            expected_model_hash: "known_good_hash_here".into(),
            ..Default::default()
        };

        // Loading model with wrong hash should fail
        // let result = SecureONNXRuntime::load(Path::new("tampered_model.onnx"), config);
        // assert!(matches!(result, Err(ModelSecurityError::IntegrityViolation { .. })));
    }
}
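
The rate-limiting test above exhausts the limit with 1000 calls and therefore depends on whatever `RateLimiterConfig::default()` allows. A deterministic variant is easier to reason about when the limiter takes the current time as an argument. The sketch below is a simple fixed-window limiter written against the standard library only; `FixedWindowLimiter` and its methods are hypothetical and are not the platform's `RateLimiter`.

```rust
use std::collections::HashMap;

/// Hypothetical fixed-window rate limiter keyed by caller identity.
pub struct FixedWindowLimiter {
    max_requests: u32,
    window_secs: u64,
    /// key -> (start of the current window, requests counted in it)
    windows: HashMap<String, (u64, u32)>,
}

impl FixedWindowLimiter {
    pub fn new(max_requests: u32, window_secs: u64) -> Self {
        Self {
            max_requests,
            // A zero-length window would divide by zero below; use one second.
            window_secs: window_secs.max(1),
            windows: HashMap::new(),
        }
    }

    /// Records one request at `now_secs` (Unix seconds supplied by the caller).
    /// Returns Err(retry_after_secs) when the key has used up its window.
    pub fn check(&mut self, key: &str, now_secs: u64) -> Result<(), u64> {
        let window_start = now_secs - now_secs % self.window_secs;
        let entry = self
            .windows
            .entry(key.to_string())
            .or_insert((window_start, 0));

        // Reset the counter when the request falls into a new window.
        if entry.0 != window_start {
            *entry = (window_start, 0);
        }

        if entry.1 >= self.max_requests {
            return Err(window_start + self.window_secs - now_secs);
        }

        entry.1 += 1;
        Ok(())
    }
}
```

With the clock injected, a test can assert the exact request at which limiting begins, that other keys are unaffected, and that the limit resets at the window boundary, all without sleeping.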

Consequences

Positive

  1. Regulatory Compliance: Classification system enables ESA/CITES compliance
  2. Research Integrity: RAB provenance tracking supports scientific reproducibility
  3. Defense in Depth: Multiple security layers prevent single-point failures
  4. Memory Safety: Safe Rust rules out buffer overflows, use-after-free, and data races; any `unsafe` blocks still need manual review
  5. Auditability: Hash-chained logs provide tamper-evident audit trail
  6. Performance: Security checks are designed for minimal latency impact

Negative

  1. Development Overhead: Security validation adds code complexity
  2. Operational Burden: Classification management requires ongoing curation
  3. Access Friction: Researchers may face additional hurdles for protected data
  4. Storage Overhead: Audit logs and provenance data increase storage requirements

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Security configuration drift | Medium | High | Automated security policy enforcement, regular audits |
| Classification errors | Medium | High | Human review workflow, conservative default classification |
| Key compromise | Low | Critical | Key rotation, HSM for production keys, breach response plan |
| Insider threat | Low | High | Principle of least privilege, comprehensive audit logging |

Appendix A: Security Checklist

Pre-Deployment

  • All dependencies audited and pinned
  • ONNX model hash verified and documented
  • Encryption keys generated and stored in vault
  • Rate limiting configured for production load
  • Audit logging enabled and tested
  • Classification policies defined for all data types
  • Access control policies reviewed by stakeholders
  • Penetration testing completed

Operational

  • Security monitoring dashboards configured
  • Alert thresholds set for anomalous access patterns
  • Incident response runbook documented
  • Key rotation schedule established
  • Audit log retention policy configured
  • Backup encryption verified

Compliance

  • Data classification inventory complete
  • Regulatory framework mapping documented
  • Data use agreements templated and reviewed
  • Privacy impact assessment completed
  • Security training materials prepared