SPARC Specification: ruvector-attention Crate
Version: 1.0.0 Date: 2025-11-30 Status: Draft Authors: RuVector Research Team SPARC Phase: Specification
Table of Contents
- Executive Summary
- Requirements Analysis
- Module Architecture
- API Design
- Performance Targets
- Compatibility Matrix
- Testing Strategy
- Success Criteria
- Constraints and Dependencies
- Risk Assessment
1. Executive Summary
1.1 Vision
Create a modular, high-performance attention mechanism library specifically designed for GNN latent space operations in RuVector. The ruvector-attention crate will implement 10 distinct attention mechanisms from research literature, enabling researchers and practitioners to experiment with different attention strategies for graph-structured data.
Core Mission: Bridge the gap between latent space representations and graph topology through specialized attention mechanisms optimized for HNSW-based vector databases.
1.2 Goals
Primary Goals:
- Modularity: Each attention mechanism is a standalone, composable component
- Performance: Achieve <200ms latency for 95% of attention operations on 1000-neighbor graphs
- Compatibility: Support WASM, NAPI-RS (Node.js), CLI, and Rust SDK environments
- Extensibility: Easy to add new attention mechanisms without modifying core APIs
- Research-Driven: Implement cutting-edge attention mechanisms from academic literature
Secondary Goals:
- Provide benchmarking tools for comparing attention mechanisms
- Enable automatic mechanism selection based on graph properties
- Support distributed/parallel attention computation
- Maintain numerical stability across all implementations
1.3 Performance Targets
| Metric | Target | Stretch Goal |
|---|---|---|
| Latency (p95) | <200ms @ 1K neighbors | <100ms @ 1K neighbors |
| Throughput | 5,000 ops/sec | 10,000 ops/sec |
| Memory (per op) | <50MB @ 1K neighbors | <25MB @ 1K neighbors |
| WASM Binary Size | <2MB (gzipped) | <1MB (gzipped) |
| Compilation Time | <60s (release) | <30s (release) |
| Test Coverage | >90% | >95% |
1.4 Timeline Overview
- Phase 1 (Weeks 1-4): Core attention primitives + Multi-head attention
- Phase 2 (Weeks 5-8): Geometric attention (Hyperbolic, Edge-featured)
- Phase 3 (Weeks 9-12): Sparse and efficient mechanisms (Flash, Linear)
- Phase 4 (Weeks 13-16): Adaptive mechanisms (MoE, Cross-attention)
- Phase 5 (Weeks 17-20): Integration, optimization, documentation
2. Requirements Analysis
2.1 Functional Requirements
FR-001: Core Attention Mechanisms
Priority: CRITICAL Description: Implement foundational attention mechanisms
Acceptance Criteria:
- FR-001.1: Scaled Dot-Product Attention (baseline)
- FR-001.2: Multi-Head Attention (2-16 heads configurable)
- FR-001.3: Supports variable-length input sequences
- FR-001.4: Numerically stable softmax implementation (a sketch follows the test cases below)
- FR-001.5: Gradient computation for backpropagation
Test Cases:
#[test]
fn test_scaled_dot_product_attention() {
    let attn = ScaledDotProductAttention::new(128).unwrap();
    let query = vec![1.0; 128];
    let keys = vec![vec![1.0; 128]; 10];
    let values = vec![vec![1.0; 128]; 10];
    let output = attn.forward(&query, &keys, &values).unwrap();
    assert_eq!(output.len(), 128);
    assert!(output.iter().all(|&x| x.is_finite()));
}
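The numerically stable softmax required by FR-001.4 is typically implemented with the max-subtraction trick. A minimal sketch of the helper that §3.1 places in src/utils/softmax.rs (the exact signature and the uniform fallback for degenerate inputs are assumptions):

/// Numerically stable softmax: subtract the max score before exponentiating
/// so exp() cannot overflow, then normalize to sum to 1.
pub fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    if sum > 0.0 {
        exps.iter().map(|&e| e / sum).collect()
    } else {
        // Degenerate input (e.g. all scores are -inf): fall back to uniform weights
        vec![1.0 / scores.len().max(1) as f32; scores.len()]
    }
}

The stability tests in §7.1.2 exercise exactly the overflow and underflow cases this guards against.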
FR-002: Geometric Attention Mechanisms
Priority: HIGH Description: Implement attention mechanisms aware of geometric structure
Acceptance Criteria:
- FR-002.1: Edge-Featured Attention (GAT-style with edge attributes)
- FR-002.2: Hyperbolic Attention (Poincaré ball model)
- FR-002.3: Mixed-Curvature Attention (Euclidean + Hyperbolic fusion)
- FR-002.4: Manifold-Aware Attention
Edge-Featured Attention:
score(i, j) = LeakyReLU(a^T [W·h_i || W·h_j || W_e·edge_ij])
Hyperbolic Attention:
distance_poincare(x, y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
score(i, j) = -distance_poincare(q_i, k_j)
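§4.5 specifies HyperbolicAttention::poincare_distance with this formula. A standalone sketch with epsilon clipping so points numerically on the ball boundary do not produce infinities (the helper name and epsilon value are assumptions):

/// d(x, y) = arccosh(1 + 2||x - y||² / ((1 - ||x||²)(1 - ||y||²))),
/// with the denominator clipped away from zero for boundary points.
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let eps = 1e-5;
    let sq_norm = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - sq_norm(x)).max(eps) * (1.0 - sq_norm(y)).max(eps);
    let arg = 1.0 + 2.0 * diff_sq / denom;
    arg.max(1.0).acosh()
}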
Test Cases:
#[test]
fn test_edge_featured_attention() {
    let attn = EdgeFeaturedAttention::new(128, 32);
    let query = vec![1.0; 128];
    let keys = vec![vec![1.0; 128]; 10];
    let values = keys.clone();
    let edge_features = vec![vec![1.0; 32]; 10];
    let output = attn
        .forward_with_edges(&query, &keys, &values, &edge_features)
        .unwrap();
    assert_eq!(output.len(), 128);
}
#[test]
fn test_hyperbolic_attention_bounds() {
    let attn = HyperbolicAttention::new(128, -1.0).unwrap();
    // Inside the Poincaré ball: ||query|| = 0.05 * sqrt(128) ≈ 0.57 < 1
    let query = vec![0.05; 128];
    let keys = vec![vec![0.01; 128]; 10];
    let values = keys.clone();
    // Ensure all embeddings stay in the ball (||x|| < 1)
    let output = attn.forward(&query, &keys, &values).unwrap();
    assert!(l2_norm(&output) < 0.99);
}
FR-003: Sparse Attention Patterns
Priority: HIGH Description: Reduce O(n²) complexity through sparsity
Acceptance Criteria:
- FR-003.1: Local + Global Attention (Longformer-style)
- FR-003.2: Linear Attention (Performer/FAVOR+)
- FR-003.3: Flash Attention (memory-efficient tiling)
- FR-003.4: Configurable sparsity patterns
Local + Global Pattern:
Attention Matrix:
[L L L G 0 0 0 0] L = Local (1-hop neighbors)
[L L L L G 0 0 0] G = Global (HNSW higher layers)
[L L L L L G 0 0] 0 = No attention
...
Complexity Requirements:
- Local + Global: O(k_local + k_global) where k << n
- Linear: O(n·d) where d = feature dimension
- Flash: O(n) memory (vs O(n²) standard)
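A sketch of how the Local + Global index set could be assembled for a single query node; the function name, inputs, and the use of HNSW entry points as the global set are assumptions (the crate's version would presumably live in src/sparse/patterns.rs):

/// Indices a query attends to: the first `local_window` one-hop neighbors
/// plus `global_nodes` entry points from higher HNSW layers.
fn sparse_attention_indices(
    one_hop: &[usize],
    hnsw_entry_points: &[usize],
    local_window: usize,
    global_nodes: usize,
) -> Vec<usize> {
    let mut idx: Vec<usize> = one_hop.iter().copied().take(local_window).collect();
    idx.extend(hnsw_entry_points.iter().copied().take(global_nodes));
    idx.sort_unstable();
    idx.dedup(); // a node can appear as both a local and a global target
    idx
}

The resulting index set has at most local_window + global_nodes entries, which is what the complexity test below asserts.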
Test Cases:
#[test]
fn test_sparse_attention_complexity() {
    // local_window = 10, global_nodes = 5
    let sparse_attn = SparseGraphAttention::new(10, 5);
    // Should only attend to 15 nodes, not all 1000
    let num_neighbors = 1000;
    let actual_attention = sparse_attn.get_attention_mask(num_neighbors);
    assert!(actual_attention.count_nonzero() <= 15);
}
FR-004: Graph-Aware Mechanisms
Priority: HIGH Description: Attention specialized for graph structure
Acceptance Criteria:
- FR-004.1: RoPE (Rotary Position Embeddings) for graph distance
- FR-004.2: HNSW-layer encoding in attention
- FR-004.3: Cross-Attention (Dual-Space: graph + latent)
- FR-004.4: Structural feature integration (degree, centrality)
RoPE for Graphs:
// Encode graph distance via rotation of consecutive embedding pairs
θ_i = graph_distance / base^(2i/d)
rotated[2i]   = emb[2i] * cos(θ_i) - emb[2i+1] * sin(θ_i)
rotated[2i+1] = emb[2i] * sin(θ_i) + emb[2i+1] * cos(θ_i)
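A sketch of the rotation above applied pairwise over an embedding, mirroring the intent of GraphRoPE::apply_rotation in §4.6 (the pairwise layout and treatment of odd dimensions are assumptions):

/// Rotate each (even, odd) coordinate pair by an angle derived from the
/// graph distance, analogous to RoPE over token positions.
fn apply_graph_rope(embedding: &[f32], graph_distance: f32, base: f32) -> Vec<f32> {
    let d = embedding.len();
    let mut out = embedding.to_vec();
    for i in (0..d.saturating_sub(1)).step_by(2) {
        // For pair index p = i / 2 this is graph_distance / base^(2p/d)
        let theta = graph_distance / base.powf(i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (x, y) = (embedding[i], embedding[i + 1]);
        out[i] = x * cos - y * sin;
        out[i + 1] = x * sin + y * cos;
    }
    out
}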
Cross-Attention:
graph_attn = Attention(h, N_graph(h), N_graph(h))
latent_attn = Attention(h, N_latent(h), N_latent(h))
cross_attn = Attention(graph_attn, N_latent(h), N_latent(h))
output = Fusion(graph_attn, latent_attn, cross_attn)
FR-005: Adaptive Mechanisms
Priority: MEDIUM Description: Attention that adapts to input patterns
Acceptance Criteria:
- FR-005.1: Mixture of Experts (MoE) Attention
- FR-005.2: Learned routing between attention types
- FR-005.3: RL-based navigation function learning
- FR-005.4: Dynamic head count adjustment
MoE Attention:
router_scores = Router(query)
expert_indices = topk(router_scores, k=2)
output = Σ router_scores[i] * Expert[i](query, keys, values)
Experts:
- Local Expert: Standard attention for 1-hop neighbors
- Hierarchical Expert: Hyperbolic attention for HNSW layers
- Global Expert: Linear attention for distant nodes
- Structural Expert: Edge-featured attention
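A sketch of the routing step above, using the Attention trait and error type from §4.1 and §4.2; renormalizing the kept router weights (rather than using the raw scores) is an assumption:

use ruvector_attention::{Attention, AttentionError};

/// Pick the top-k experts by router score and blend their outputs,
/// with the retained weights renormalized to sum to 1.
fn moe_forward(
    experts: &[Box<dyn Attention>],
    router_scores: &[f32], // one score per expert, e.g. Router(query)
    top_k: usize,
    query: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError> {
    let mut ranked: Vec<(usize, f32)> =
        router_scores.iter().copied().enumerate().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(top_k);
    let norm: f32 = ranked.iter().map(|(_, s)| *s).sum::<f32>().max(f32::EPSILON);

    let mut output = vec![0.0f32; query.len()];
    for (idx, score) in ranked {
        let expert_out = experts[idx].forward(query, keys, values)?;
        for (o, e) in output.iter_mut().zip(expert_out) {
            *o += (score / norm) * e;
        }
    }
    Ok(output)
}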
FR-006: Training and Optimization Utilities
Priority: HIGH Description: Tools for training attention-based models
Acceptance Criteria:
- FR-006.1: Contrastive losses (InfoNCE, Local Contrastive)
- FR-006.2: Spectral regularization (Laplacian smoothness)
- FR-006.3: Multi-objective loss balancing
- FR-006.4: Curriculum learning schedules
- FR-006.5: Hard negative mining
FR-007: Tensor Compression
Priority: MEDIUM Description: Memory-efficient tensor operations
Acceptance Criteria:
- FR-007.1: Quantization (INT8, INT4) (see the sketch after this list)
- FR-007.2: Low-rank factorization
- FR-007.3: Sparse tensor storage
- FR-007.4: Hierarchical compression for HNSW layers
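A minimal sketch of a symmetric per-tensor INT8 scheme for FR-007.1 (struct and function names are assumptions; the crate's version would sit in src/compression/quantization.rs):

/// Symmetric per-tensor INT8 quantization: i8 payload plus one f32 scale.
struct QuantizedTensor {
    data: Vec<i8>,
    scale: f32,
}

fn quantize_int8(values: &[f32]) -> QuantizedTensor {
    let max_abs = values.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let data = values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { data, scale }
}

fn dequantize_int8(q: &QuantizedTensor) -> Vec<f32> {
    q.data.iter().map(|&v| v as f32 * q.scale).collect()
}

INT4 would follow the same pattern with two values packed per byte.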
FR-008: SIMD Optimizations
Priority: MEDIUM Description: Vectorized operations for performance
Acceptance Criteria:
- FR-008.1: AVX2/AVX-512 support for x86_64
- FR-008.2: NEON support for ARM
- FR-008.3: WASM SIMD support
- FR-008.4: Automatic fallback to scalar operations
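The automatic fallback in FR-008.4 is usually handled with runtime feature detection. A sketch for x86_64 only (dot_avx2 is a hypothetical kernel; a real src/simd/dispatch.rs would also cover NEON and WASM SIMD):

/// Dot product with runtime dispatch: use the AVX2 kernel when available,
/// otherwise fall back to the portable scalar loop.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: guarded by the runtime feature check above.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    // Placeholder body: a real kernel would use core::arch::x86_64 intrinsics.
    dot_scalar(a, b)
}

Any unsafe introduced here falls under NFR-005.1 and must be audited.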
2.2 Non-Functional Requirements
NFR-001: Performance
NFR-001.1: Latency
- Requirement: p95 latency <200ms for 1000-neighbor attention
- Measurement: Benchmark suite with synthetic graphs
- Verification: CI/CD performance regression tests
NFR-001.2: Throughput
- Requirement: 5,000 attention operations per second
- Measurement: Batch processing benchmarks
- Verification: Load testing with real HNSW graphs
NFR-001.3: Memory
- Requirement: Peak memory <50MB per operation
- Measurement: Memory profiling with valgrind/heaptrack
- Verification: Memory regression tests in CI
NFR-001.4: Scalability
- Requirement: Linear scaling up to 10K neighbors
- Measurement: Complexity analysis and empirical benchmarks
- Verification: Big-O complexity proofs + empirical validation
NFR-002: Reliability
NFR-002.1: Numerical Stability
- Requirement: All outputs finite (no NaN, Inf) across 10M operations
- Measurement: Fuzzing with random inputs
- Verification: Property-based testing with proptest
NFR-002.2: Error Handling
- Requirement: All errors recoverable, 100% error path coverage
- Measurement: Error injection testing
- Verification: Unit tests for error cases
NFR-002.3: Determinism
- Requirement: Same inputs produce same outputs (no random behavior)
- Measurement: Repeated execution tests
- Verification: Determinism tests in CI
NFR-003: Maintainability
NFR-003.1: Code Quality
- Requirement: Clippy clean (zero warnings), rustfmt formatted
- Measurement: CI linting checks
- Verification: Pre-commit hooks + CI gates
NFR-003.2: Documentation
- Requirement: 100% public API documented with examples
- Measurement: rustdoc coverage tool
- Verification: Doc tests pass, examples compile
NFR-003.3: Test Coverage
- Requirement: >90% line coverage, >95% branch coverage
- Measurement: cargo-tarpaulin
- Verification: CI coverage reports
NFR-004: Portability
NFR-004.1: Platform Support
- Requirement: Linux, macOS, Windows support
- Measurement: CI testing on all platforms
- Verification: Cross-platform integration tests
NFR-004.2: WASM Compatibility
- Requirement: Full functionality in WASM (wasm32-unknown-unknown)
- Measurement: WASM-specific test suite
- Verification: Browser and Node.js WASM tests
NFR-004.3: NAPI-RS Support
- Requirement: All attention mechanisms callable from Node.js
- Measurement: Node.js integration tests
- Verification: NPM package smoke tests
NFR-005: Security
NFR-005.1: Memory Safety
- Requirement: Zero unsafe code blocks (or 100% audited unsafe)
- Measurement: Manual code review
- Verification: MIRI checks, cargo-geiger
NFR-005.2: Dependency Audit
- Requirement: All dependencies audited, no known CVEs
- Measurement: cargo-audit
- Verification: Automated dependency scanning in CI
2.3 Constraints
C-001: Compatibility Constraints
- Rust Version: MSRV 1.77+ (per workspace configuration)
- No GPU: All implementations must run on CPU (WASM/NAPI-RS requirement)
- No Standard Library in WASM: Must support `#![no_std]` for the wasm32 target (see the sketch below)
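A common way to satisfy this constraint in lib.rs, assuming the std feature flag from §6.4 (a sketch, not the crate's final attribute set):

// Compile without the standard library unless the `std` feature is enabled.
#![cfg_attr(not(feature = "std"), no_std)]

// Vec, Box and friends come from `alloc` in no_std builds.
#[cfg(not(feature = "std"))]
extern crate alloc;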
C-002: API Constraints
- Backwards Compatibility: Once 1.0 released, follow SemVer strictly
- Trait Consistency: All attention mechanisms implement the common `Attention` trait
- Builder Pattern: Configuration via builders, not constructors
C-003: Performance Constraints
- Compilation Time: Release build <60s on CI runners
- Binary Size: WASM bundle <2MB gzipped
- Memory Footprint: No global allocators, stack-preferred where possible
C-004: Licensing Constraints
- License: MIT (per workspace)
- Dependency Licenses: MIT/Apache-2.0 only (no GPL/LGPL)
3. Module Architecture
3.1 Crate Structure
ruvector-attention/
├── Cargo.toml
├── README.md
├── LICENSE
│
├── src/
│ ├── lib.rs # Public API, re-exports
│ │
│ ├── core/ # Core attention primitives
│ │ ├── mod.rs # Core module exports
│ │ ├── base.rs # Attention trait definition
│ │ ├── scaled_dot.rs # Scaled dot-product attention
│ │ ├── multi_head.rs # Multi-head attention
│ │ └── config.rs # Configuration structs
│ │
│ ├── geometric/ # Geometric attention
│ │ ├── mod.rs
│ │ ├── hyperbolic.rs # Poincaré ball attention
│ │ ├── edge_featured.rs # GAT-style edge attention
│ │ ├── mixed_curvature.rs # Euclidean + Hyperbolic
│ │ └── manifold.rs # General manifold attention
│ │
│ ├── sparse/ # Sparse patterns
│ │ ├── mod.rs
│ │ ├── local_global.rs # Longformer-style
│ │ ├── linear.rs # Performer/FAVOR+
│ │ ├── flash.rs # Flash Attention (tiled)
│ │ └── patterns.rs # Sparsity pattern utilities
│ │
│ ├── graph/ # Graph-aware attention
│ │ ├── mod.rs
│ │ ├── rope_graph.rs # RoPE for graph distances
│ │ ├── cross_space.rs # Dual-space cross-attention
│ │ ├── hnsw_aware.rs # HNSW layer encoding
│ │ └── structural.rs # Degree/centrality features
│ │
│ ├── adaptive/ # Adaptive/learned mechanisms
│ │ ├── mod.rs
│ │ ├── moe.rs # Mixture of Experts
│ │ ├── learned_routing.rs # Attention routing
│ │ ├── rl_navigator.rs # RL-based graph navigation
│ │ └── dynamic_heads.rs # Adaptive head count
│ │
│ ├── training/ # Training utilities
│ │ ├── mod.rs
│ │ ├── losses.rs # Contrastive, reconstruction
│ │ ├── optimizers.rs # SGD, Adam, etc.
│ │ ├── regularizers.rs # Spectral, L2, etc.
│ │ ├── curriculum.rs # Curriculum learning
│ │ └── hard_negatives.rs # Negative sampling
│ │
│ ├── compression/ # Tensor compression
│ │ ├── mod.rs
│ │ ├── quantization.rs # INT8/INT4 quantization
│ │ ├── low_rank.rs # SVD/Tucker decomposition
│ │ ├── sparse_storage.rs # CSR/COO sparse tensors
│ │ └── hierarchical.rs # Layer-wise compression
│ │
│ ├── simd/ # SIMD optimizations
│ │ ├── mod.rs
│ │ ├── avx2.rs # AVX2 kernels
│ │ ├── avx512.rs # AVX-512 kernels
│ │ ├── neon.rs # ARM NEON kernels
│ │ ├── wasm_simd.rs # WASM SIMD
│ │ └── dispatch.rs # Runtime detection
│ │
│ ├── utils/ # Utilities
│ │ ├── mod.rs
│ │ ├── math.rs # Math primitives
│ │ ├── tensor.rs # Tensor ops
│ │ ├── softmax.rs # Numerically stable softmax
│ │ └── distances.rs # Distance metrics
│ │
│ └── prelude.rs # Common imports
│
├── benches/ # Benchmarks
│ ├── attention_benchmark.rs # Core attention benchmarks
│ ├── geometric_benchmark.rs # Geometric attention
│ ├── sparse_benchmark.rs # Sparse patterns
│ └── comparison_benchmark.rs # Mechanism comparison
│
├── tests/ # Integration tests
│ ├── core_tests.rs # Core attention tests
│ ├── geometric_tests.rs # Geometric tests
│ ├── sparse_tests.rs # Sparse pattern tests
│ ├── numerical_stability.rs # Stability tests
│ └── property_tests.rs # Property-based tests
│
├── ffi/ # Foreign Function Interface
│ ├── wasm/ # WASM bindings
│ │ ├── Cargo.toml
│ │ ├── src/
│ │ │ └── lib.rs # wasm-bindgen exports
│ │ └── tests/
│ │ └── web.rs # Browser tests
│ │
│ └── napi/ # NAPI-RS bindings
│ ├── Cargo.toml
│ ├── src/
│ │ └── lib.rs # napi-derive exports
│ └── index.d.ts # TypeScript definitions
│
├── cli/ # CLI interface
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # CLI entry point
│ ├── commands/ # CLI commands
│ │ ├── benchmark.rs # Run benchmarks
│ │ ├── compare.rs # Compare mechanisms
│ │ └── analyze.rs # Analyze attention patterns
│ └── output.rs # Formatting
│
├── examples/ # Examples
│ ├── basic_attention.rs # Hello world
│ ├── graph_attention.rs # Graph-aware usage
│ ├── hnsw_integration.rs # HNSW integration
│ ├── custom_mechanism.rs # Extending the library
│ └── distributed_attention.rs # Parallel processing
│
└── docs/ # Documentation
├── design/ # Design documents
│ ├── architecture.md # Architecture overview
│ ├── api_design.md # API design rationale
│ └── performance.md # Performance analysis
│
├── guides/ # User guides
│ ├── getting_started.md # Quick start
│ ├── mechanism_guide.md # Choosing mechanisms
│ └── integration.md # Integration guide
│
└── research/ # Research notes
├── attention_mechanisms.md
├── benchmarks.md
└── experiments.md
3.2 Module Responsibilities
Core Module (src/core/)
Responsibility: Foundational attention mechanisms and trait definitions
Key Components:
- `Attention` trait: Common interface for all mechanisms
- `ScaledDotProductAttention`: Baseline implementation
- `MultiHeadAttention`: Standard multi-head decomposition
- `AttentionConfig`: Configuration builders
Dependencies: utils only
Geometric Module (src/geometric/)
Responsibility: Geometry-aware attention mechanisms
Key Components:
- `HyperbolicAttention`: Poincaré ball operations
- `EdgeFeaturedAttention`: GAT-style with edge features
- `MixedCurvatureAttention`: Product space (Euclidean × Hyperbolic)
Dependencies: core, utils
Sparse Module (src/sparse/)
Responsibility: Efficient sparse attention patterns
Key Components:
- `LocalGlobalAttention`: Longformer-style
- `LinearAttention`: Kernel-based approximation
- `FlashAttention`: Memory-efficient tiling
Dependencies: core, utils, simd (optional)
Graph Module (src/graph/)
Responsibility: Graph structure-aware mechanisms
Key Components:
- `GraphRoPE`: Rotary embeddings for graph distance
- `CrossSpaceAttention`: Dual topology + latent space
- `HNSWAwareAttention`: HNSW layer encoding
Dependencies: core, geometric, utils
Adaptive Module (src/adaptive/)
Responsibility: Learned and adaptive attention
Key Components:
- `MoEAttention`: Mixture of experts routing
- `RLNavigator`: Reinforcement learning-based navigation
- `DynamicHeadAttention`: Runtime head count adjustment
Dependencies: core, geometric, sparse, graph, training
Training Module (src/training/)
Responsibility: Loss functions and optimization
Key Components:
- `ContrastiveLoss`: InfoNCE, Triplet
- `SpectralRegularizer`: Laplacian smoothness
- `HardNegativeSampler`: Mining hard negatives
- `CurriculumScheduler`: Loss weight scheduling
Dependencies: utils only
Compression Module (src/compression/)
Responsibility: Memory-efficient tensor storage
Key Components:
- `Quantizer`: INT8/INT4 quantization
- `LowRankFactorizer`: SVD compression
- `SparseStorage`: CSR/COO formats
Dependencies: utils, simd (optional)
SIMD Module (src/simd/)
Responsibility: Vectorized operations
Key Components:
- `SimdDispatcher`: Runtime CPU feature detection
- Platform-specific kernels: AVX2, AVX-512, NEON, WASM SIMD
Dependencies: utils only
3.3 Dependency Graph
┌─────────────────────────────────────────────────────────────┐
│ Public API (lib.rs) │
└─────────────────────────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
v v v
┌──────────┐ ┌──────────┐ ┌──────────┐
│ core │ │ training │ │ utils │
└──────────┘ └──────────┘ └──────────┘
│ │
└────────┬───────────┬─────────┬────────┘
│ │ │
v v v
┌──────────┐ ┌─────────┐ ┌──────┐
│geometric │ │ sparse │ │ simd │
└──────────┘ └─────────┘ └──────┘
│ │
└─────┬─────┘
│
v
┌──────────┐
│ graph │
└──────────┘
│
v
┌──────────┐
│ adaptive │
└──────────┘
│
v
┌─────────────┐
│ compression │
└─────────────┘
Design Principles:
- Acyclic: No circular dependencies
- Layered: Lower layers have fewer dependencies
- Optional Features: SIMD, compression via feature flags
- Core Stability: `core` and `utils` are the most stable modules
4. API Design
4.1 Core Trait: Attention
/// Core trait for all attention mechanisms
pub trait Attention: Send + Sync {
/// Forward pass: compute attention over keys/values given query
///
/// # Arguments
/// * `query` - Query vector (d-dimensional)
/// * `keys` - Key vectors (n × d)
/// * `values` - Value vectors (n × d)
///
/// # Returns
/// Attention-weighted aggregation of values (d-dimensional)
///
/// # Example
/// ```
/// use ruvector_attention::core::ScaledDotProductAttention;
/// use ruvector_attention::Attention;
///
/// let attn = ScaledDotProductAttention::new(128).unwrap();
/// let query = vec![1.0; 128];
/// let keys = vec![vec![0.5; 128]; 10];
/// let values = keys.clone();
///
/// let output = attn.forward(&query, &keys, &values).unwrap();
/// assert_eq!(output.len(), 128);
/// ```
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError>;
/// Get attention weights without computing weighted sum
///
/// Useful for visualization and debugging
fn attention_weights(
&self,
query: &[f32],
keys: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError>;
/// Get hidden dimension
fn hidden_dim(&self) -> usize;
/// Check if mechanism supports variable-length inputs
fn supports_variable_length(&self) -> bool {
true
}
/// Estimated computational complexity (for documentation)
fn complexity(&self) -> Complexity {
Complexity::Quadratic
}
}
/// Computational complexity categories
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity {
Linear, // O(n)
Linearithmic, // O(n log n)
Quadratic, // O(n²)
Custom(&'static str), // Custom complexity description
}
4.2 Error Handling
/// Errors that can occur during attention computation
#[derive(Debug, thiserror::Error)]
pub enum AttentionError {
/// Input dimension mismatch
#[error("Dimension mismatch: expected {expected}, got {actual}")]
DimensionMismatch { expected: usize, actual: usize },
/// Empty input
#[error("Empty input: {context}")]
EmptyInput { context: String },
/// Numerical instability detected
#[error("Numerical instability: {message}")]
NumericalInstability { message: String },
/// Invalid configuration
#[error("Invalid configuration: {message}")]
InvalidConfig { message: String },
/// Out of bounds access
#[error("Index out of bounds: {index} >= {len}")]
OutOfBounds { index: usize, len: usize },
/// Unsupported operation
#[error("Unsupported operation: {operation}")]
Unsupported { operation: String },
/// Internal error
#[error("Internal error: {message}")]
Internal { message: String },
}
pub type Result<T> = std::result::Result<T, AttentionError>;
4.3 Builder Pattern
/// Builder for ScaledDotProductAttention
#[derive(Debug, Clone)]
pub struct ScaledDotProductAttentionBuilder {
hidden_dim: usize,
dropout: Option<f32>,
temperature: f32,
normalize: bool,
}
impl ScaledDotProductAttentionBuilder {
pub fn new(hidden_dim: usize) -> Self {
Self {
hidden_dim,
dropout: None,
temperature: 1.0,
normalize: true,
}
}
pub fn dropout(mut self, rate: f32) -> Self {
assert!((0.0..=1.0).contains(&rate), "Dropout must be in [0, 1]");
self.dropout = Some(rate);
self
}
pub fn temperature(mut self, temp: f32) -> Self {
assert!(temp > 0.0, "Temperature must be positive");
self.temperature = temp;
self
}
pub fn normalize(mut self, normalize: bool) -> Self {
self.normalize = normalize;
self
}
pub fn build(self) -> Result<ScaledDotProductAttention> {
if self.hidden_dim == 0 {
return Err(AttentionError::InvalidConfig {
message: "hidden_dim must be > 0".to_string(),
});
}
Ok(ScaledDotProductAttention {
hidden_dim: self.hidden_dim,
scale: (self.hidden_dim as f32).sqrt().recip(),
dropout: self.dropout,
temperature: self.temperature,
normalize: self.normalize,
})
}
}
// Usage:
let attn = ScaledDotProductAttention::builder(128)
.dropout(0.1)
.temperature(0.07)
.build()?;
4.4 Multi-Head Attention API
/// Multi-head attention with configurable heads
pub struct MultiHeadAttention {
num_heads: usize,
head_dim: usize,
hidden_dim: usize,
w_q: Vec<Linear>, // Query projections per head
w_k: Vec<Linear>, // Key projections per head
w_v: Vec<Linear>, // Value projections per head
w_o: Linear, // Output projection
dropout: Option<f32>,
}
impl MultiHeadAttention {
pub fn builder(hidden_dim: usize, num_heads: usize) -> MultiHeadAttentionBuilder {
MultiHeadAttentionBuilder::new(hidden_dim, num_heads)
}
/// Get attention patterns for all heads
pub fn head_attention_weights(
&self,
query: &[f32],
keys: &[Vec<f32>],
) -> Result<Vec<Vec<f32>>> {
// Returns [num_heads × num_keys] attention weights
// Useful for interpretability
}
/// Get specific head output
pub fn head_output(
&self,
head_idx: usize,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// Get output of a single head (for debugging)
}
}
impl Attention for MultiHeadAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Project to heads: Q_i, K_i, V_i for each head i
// 2. Compute attention per head: head_i = Attention(Q_i, K_i, V_i)
// 3. Concatenate heads: concat(head_1, ..., head_h)
// 4. Output projection: W_o @ concat
}
}
4.5 Geometric Attention API
/// Hyperbolic attention in Poincaré ball
pub struct HyperbolicAttention {
hidden_dim: usize,
curvature: f32, // Negative curvature (e.g., -1.0)
w_q: Linear,
w_k: Linear,
w_v: Linear,
}
impl HyperbolicAttention {
/// Create new hyperbolic attention
///
/// # Arguments
/// * `hidden_dim` - Embedding dimension
/// * `curvature` - Curvature of hyperbolic space (must be negative)
pub fn new(hidden_dim: usize, curvature: f32) -> Result<Self> {
if curvature >= 0.0 {
return Err(AttentionError::InvalidConfig {
message: "Hyperbolic curvature must be negative".to_string(),
});
}
// ...
}
/// Poincaré distance between two points
pub fn poincare_distance(&self, x: &[f32], y: &[f32]) -> f32 {
// d(x,y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
}
/// Möbius addition (hyperbolic vector addition)
pub fn mobius_add(&self, x: &[f32], y: &[f32]) -> Vec<f32> {
// ((1+2⟨x,y⟩+||y||²)x + (1-||x||²)y) / (1+2⟨x,y⟩+||x||²||y||²)
}
/// Project point onto Poincaré ball (clip to ||x|| < 1)
pub fn project_to_ball(&self, x: &mut [f32], eps: f32) {
let norm = l2_norm(x);
if norm >= 1.0 - eps {
let scale = (1.0 - eps) / norm;
for xi in x.iter_mut() {
*xi *= scale;
}
}
}
}
impl Attention for HyperbolicAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Compute hyperbolic similarities: -d_poincare(q, k_j)
// 2. Softmax attention weights
// 3. Aggregate in hyperbolic space via Möbius operations
}
}
4.6 Graph-Aware Attention API
/// Attention with graph-specific features
pub trait GraphAttention: Attention {
/// Forward pass with edge features
fn forward_with_edges(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
edge_features: &[Vec<f32>],
) -> Result<Vec<f32>>;
/// Forward pass with graph metadata
fn forward_with_metadata(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
metadata: &GraphMetadata,
) -> Result<Vec<f32>>;
}
/// Graph metadata for attention
#[derive(Debug, Clone, Default)]
pub struct GraphMetadata {
/// Graph distances (e.g., shortest path lengths)
pub distances: Option<Vec<f32>>,
/// HNSW layer indices
pub hnsw_layers: Option<Vec<usize>>,
/// Edge weights
pub edge_weights: Option<Vec<f32>>,
/// Structural features (degree, centrality, etc.)
pub structural_features: Option<Vec<Vec<f32>>>,
}
/// RoPE-enhanced attention for graphs
pub struct GraphRoPE {
hidden_dim: usize,
base: f32, // Frequency base (default 10000)
w_q: Linear,
w_k: Linear,
w_v: Linear,
}
impl GraphRoPE {
/// Apply rotation based on graph distance
pub fn apply_rotation(&self, embedding: &[f32], distance: f32) -> Vec<f32> {
// Rotate embedding by angle proportional to distance
}
}
impl GraphAttention for GraphRoPE {
fn forward_with_metadata(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
metadata: &GraphMetadata,
) -> Result<Vec<f32>> {
let distances = metadata.distances.as_ref()
.ok_or_else(|| AttentionError::InvalidConfig {
message: "GraphRoPE requires distance metadata".to_string(),
})?;
// Apply rotations based on distances
// Compute attention with rotated embeddings
}
}
4.7 Adaptive Attention API
/// Mixture of Experts attention
pub struct MoEAttention {
router: Linear, // Maps query to expert scores
experts: Vec<Box<dyn Attention>>,
top_k: usize, // Number of experts to activate
}
impl MoEAttention {
pub fn builder() -> MoEAttentionBuilder {
MoEAttentionBuilder::new()
}
/// Get routing decisions
pub fn get_routing(
&self,
query: &[f32],
) -> Result<Vec<(usize, f32)>> {
// Returns (expert_index, weight) pairs
}
/// Add an expert to the mixture
pub fn add_expert(&mut self, expert: Box<dyn Attention>) {
self.experts.push(expert);
}
}
impl Attention for MoEAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Route: scores = Router(query)
// 2. Select top-k experts
// 3. Weighted combination of expert outputs
}
}
/// Builder for MoE attention
pub struct MoEAttentionBuilder {
router_hidden_dim: usize,
experts: Vec<Box<dyn Attention>>,
top_k: usize,
}
impl MoEAttentionBuilder {
pub fn add_local_expert(mut self, hidden_dim: usize) -> Self {
self.experts.push(Box::new(
ScaledDotProductAttention::new(hidden_dim).unwrap()
));
self
}
pub fn add_hyperbolic_expert(mut self, hidden_dim: usize, curvature: f32) -> Self {
self.experts.push(Box::new(
HyperbolicAttention::new(hidden_dim, curvature).unwrap()
));
self
}
pub fn add_sparse_expert(mut self, local_window: usize, global_nodes: usize) -> Self {
self.experts.push(Box::new(
LocalGlobalAttention::new(local_window, global_nodes).unwrap()
));
self
}
pub fn top_k(mut self, k: usize) -> Self {
self.top_k = k;
self
}
pub fn build(self) -> Result<MoEAttention> {
// Validation and construction
}
}
4.8 Training Utilities API
/// Contrastive loss functions
pub mod losses {
/// InfoNCE contrastive loss
pub fn info_nce(
anchor: &[f32],
positives: &[&[f32]],
negatives: &[&[f32]],
temperature: f32,
) -> f32;
/// Triplet loss
pub fn triplet(
anchor: &[f32],
positive: &[f32],
negative: &[f32],
margin: f32,
) -> f32;
/// Local contrastive loss (graph-specific)
pub fn local_contrastive(
node_embedding: &[f32],
neighbor_embeddings: &[Vec<f32>],
non_neighbor_embeddings: &[Vec<f32>],
temperature: f32,
) -> f32;
}
/// Hard negative mining
pub mod hard_negatives {
pub enum SamplingStrategy {
Distance, // Most similar non-neighbors
Degree, // Similar degree distribution
Mixed, // Combination
}
pub fn sample_hard_negatives(
anchor: &[f32],
all_embeddings: &[Vec<f32>],
positive_indices: &[usize],
k: usize,
strategy: SamplingStrategy,
) -> Vec<Vec<f32>>;
}
/// Spectral regularization
pub mod regularizers {
/// Laplacian smoothness
pub fn laplacian(
embeddings: &[Vec<f32>],
edges: &[(usize, usize)],
edge_weights: Option<&[f32]>,
) -> f32;
/// Orthogonality regularization
pub fn orthogonality(embeddings: &[Vec<f32>]) -> f32;
/// Embedding norm regularization
pub fn norm_penalty(embeddings: &[Vec<f32>], target_norm: f32) -> f32;
}
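A minimal sketch of the info_nce signature above; using an unnormalized dot product as the similarity and averaging over the positives are assumptions:

/// InfoNCE: for each positive p,
/// -log( exp(sim(a,p)/t) / (exp(sim(a,p)/t) + Σ_i exp(sim(a,n_i)/t)) ),
/// averaged over the positives.
pub fn info_nce(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32 {
    let dot = |a: &[f32], b: &[f32]| a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>();
    let neg_sum: f32 = negatives
        .iter()
        .map(|&n| (dot(anchor, n) / temperature).exp())
        .sum();
    let mut loss = 0.0;
    for &p in positives {
        let pos = (dot(anchor, p) / temperature).exp();
        loss += -(pos / (pos + neg_sum)).ln();
    }
    loss / positives.len().max(1) as f32
}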
5. Performance Targets
5.1 Latency Targets
| Operation | Input Size | p50 | p95 | p99 |
|---|---|---|---|---|
| Scaled Dot-Product | 100 neighbors | <5ms | <10ms | <20ms |
| Scaled Dot-Product | 1K neighbors | <50ms | <100ms | <150ms |
| Multi-Head (4 heads) | 100 neighbors | <10ms | <20ms | <30ms |
| Multi-Head (4 heads) | 1K neighbors | <80ms | <150ms | <200ms |
| Hyperbolic | 100 neighbors | <15ms | <30ms | <50ms |
| Sparse (Local+Global) | 1K neighbors | <30ms | <60ms | <100ms |
| Flash Attention | 1K neighbors | <40ms | <80ms | <120ms |
| MoE (4 experts, top-2) | 1K neighbors | <100ms | <180ms | <250ms |
Measurement Method: Criterion.rs benchmarks with 1000 iterations, warm cache
5.2 Throughput Targets
| Mechanism | Target (ops/sec) | Stretch (ops/sec) |
|---|---|---|
| Scaled Dot-Product | 10,000 | 20,000 |
| Multi-Head | 5,000 | 10,000 |
| Hyperbolic | 3,000 | 6,000 |
| Sparse | 8,000 | 15,000 |
| Flash | 7,000 | 12,000 |
Measurement: Batch processing of 1000 operations, averaged over 10 runs
5.3 Memory Targets
| Mechanism | Peak Memory (1K neighbors) | Target | Stretch |
|---|---|---|---|
| Scaled Dot-Product | Full attention matrix | <50MB | <25MB |
| Multi-Head (4 heads) | 4× attention matrices | <100MB | <50MB |
| Flash Attention | Tiled computation | <20MB | <10MB |
| Sparse | Sparse patterns only | <15MB | <8MB |
Measurement: Valgrind/heaptrack during benchmark execution
5.4 Compilation Targets
| Configuration | Target | Stretch |
|---|---|---|
| Debug build | <10s | <5s |
| Release build (--release) | <60s | <30s |
| Release with LTO | <120s | <60s |
| WASM build | <90s | <45s |
Measurement: CI build times on GitHub Actions standard runners
5.5 Binary Size Targets
| Build Target | Size (uncompressed) | Size (gzipped) | Target | Stretch |
|---|---|---|---|---|
| WASM | 5-8 MB | 1.5-2 MB | <2MB | <1MB |
| Native (Linux x86_64) | 10-15 MB | N/A | <15MB | <10MB |
| NAPI-RS addon | 8-12 MB | N/A | <12MB | <8MB |
Measurement: wasm-opt for WASM, strip for native
5.6 Scalability Targets
Linear Scaling:
- Operations should scale O(n) or better up to 10K neighbors
- No quadratic blowup in standard use cases
Benchmark:
use std::time::Instant;

#[test]
fn bench_scalability_attention() {
    for n in [100, 500, 1000, 5000, 10000] {
        let attn = ScaledDotProductAttention::new(128).unwrap();
        let query = vec![1.0; 128];
        let keys = vec![vec![1.0; 128]; n];
        let values = keys.clone();
        let start = Instant::now();
        for _ in 0..100 {
            attn.forward(&query, &keys, &values).unwrap();
        }
        println!("n={}: {:?} for 100 iterations", n, start.elapsed());
        // Compare elapsed times across n to assert linear or sub-quadratic scaling
    }
}
6. Compatibility Matrix
6.1 Rust Version Support
| Rust Version | Support Status | Notes |
|---|---|---|
| 1.77.0 (MSRV) | ✅ Supported | Minimum supported version |
| 1.78.x | ✅ Supported | |
| 1.79.x | ✅ Supported | |
| 1.80.x+ | ✅ Supported | Latest stable |
| Nightly | ⚠️ Best-effort | May use unstable features behind flags |
Testing: CI runs on MSRV, stable, and nightly
6.2 Platform Support
Desktop Platforms
| Platform | Tier | Support Status | CI Testing |
|---|---|---|---|
| Linux x86_64 | Tier 1 | ✅ Full support | Yes |
| Linux ARM64 | Tier 2 | ✅ Full support | Yes |
| macOS x86_64 | Tier 1 | ✅ Full support | Yes |
| macOS ARM64 (M1/M2) | Tier 1 | ✅ Full support | Yes |
| Windows x86_64 | Tier 1 | ✅ Full support | Yes |
| Windows ARM64 | Tier 3 | ⚠️ Best-effort | No |
WASM Targets
| Target | Support Status | Notes |
|---|---|---|
| wasm32-unknown-unknown | ✅ Full support | Browser + Node.js |
| wasm32-wasi | ✅ Full support | WASI runtime |
| wasm32-unknown-emscripten | ⚠️ Untested | Should work |
WASM Features:
- ✅ All attention mechanisms
- ✅ SIMD support (where available)
- ✅ Multi-threading via Web Workers
- ❌ File I/O (not needed)
Mobile Platforms
| Platform | Support Status | Notes |
|---|---|---|
| iOS ARM64 | ⚠️ Untested | Should work via FFI |
| Android ARM64 | ⚠️ Untested | Should work via FFI |
6.3 Node.js Support (NAPI-RS)
| Node.js Version | Support Status | Notes |
|---|---|---|
| 18.x LTS | ✅ Supported | NAPI-RS requires N-API 9+ |
| 20.x LTS | ✅ Supported | Recommended |
| 21.x+ Current | ✅ Supported | Latest features |
NAPI-RS Features:
- ✅ All attention mechanisms exposed
- ✅ TypeScript definitions
- ✅ Async operations (Tokio runtime)
- ✅ Buffer zero-copy where possible
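A sketch of what one NAPI-RS export could look like, using plain number[] arguments for brevity (a production binding would prefer Float32Array and zero-copy buffers as noted above; the function name and error mapping are assumptions):

use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::core::ScaledDotProductAttention;
use ruvector_attention::Attention;

/// Expose scaled dot-product attention to Node.js, mapping crate errors
/// to JavaScript exceptions.
#[napi]
pub fn scaled_dot_product_forward(
    query: Vec<f64>,
    keys: Vec<Vec<f64>>,
    values: Vec<Vec<f64>>,
) -> Result<Vec<f64>> {
    let q: Vec<f32> = query.iter().map(|&x| x as f32).collect();
    let ks: Vec<Vec<f32>> = keys.iter().map(|row| row.iter().map(|&x| x as f32).collect()).collect();
    let vs: Vec<Vec<f32>> = values.iter().map(|row| row.iter().map(|&x| x as f32).collect()).collect();
    let attn = ScaledDotProductAttention::new(q.len())
        .map_err(|e| Error::from_reason(e.to_string()))?;
    let out = attn.forward(&q, &ks, &vs)
        .map_err(|e| Error::from_reason(e.to_string()))?;
    Ok(out.into_iter().map(|x| x as f64).collect())
}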
Package Platforms:
{
"napi": {
"triples": {
"defaults": true,
"additional": [
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-gnu",
"aarch64-apple-darwin",
"x86_64-pc-windows-msvc"
]
}
}
}
6.4 Feature Flags
| Feature | Default | Description | Dependencies |
|---|---|---|---|
| `std` | ✅ | Standard library support | None |
| `simd` | ❌ | SIMD optimizations | `std` |
| `rayon` | ❌ | Parallel processing | `std`, `rayon` |
| `compression` | ❌ | Tensor compression | `std` |
| `wasm` | ❌ | WASM-specific bindings | `wasm-bindgen` |
| `napi` | ❌ | Node.js bindings | `napi-rs` |
| `cli` | ❌ | CLI interface | `std`, `clap` |
| `serde` | ✅ | Serialization support | `serde` |
Example:
[dependencies]
ruvector-attention = { version = "0.1", features = ["simd", "rayon"] }
7. Testing Strategy
7.1 Unit Tests
Coverage Target: >90% line coverage, >95% branch coverage
Test Categories:
7.1.1 Correctness Tests
#[cfg(test)]
mod correctness_tests {
use super::*;
#[test]
fn test_attention_output_dimension() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let output = attn.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
#[test]
fn test_attention_weights_sum_to_one() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let weights = attn.attention_weights(&query, &keys).unwrap();
let sum: f32 = weights.iter().sum();
assert!((sum - 1.0).abs() < 1e-5);
}
#[test]
fn test_empty_neighbors_handling() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let result = attn.forward(&query, &[], &[]);
assert!(result.is_err());
assert!(matches!(result, Err(AttentionError::EmptyInput { .. })));
}
}
7.1.2 Numerical Stability Tests
#[cfg(test)]
mod stability_tests {
#[test]
fn test_large_scores_softmax() {
// Test softmax with very large scores (overflow risk)
let scores = vec![1000.0, 999.0, 998.0];
let weights = softmax(&scores);
assert!(weights.iter().all(|&w| w.is_finite()));
}
#[test]
fn test_small_scores_softmax() {
// Test softmax with very small scores (underflow risk)
let scores = vec![-1000.0, -999.0, -998.0];
let weights = softmax(&scores);
assert!(weights.iter().all(|&w| w.is_finite()));
}
#[test]
fn test_hyperbolic_boundary() {
let attn = HyperbolicAttention::new(128, -1.0).unwrap();
let query = vec![0.99 / (128.0f32).sqrt(); 128]; // Near the ball boundary: ||query|| ≈ 0.99
let output = attn.forward(&query, &keys, &values).unwrap();
// Output must stay inside ball
assert!(l2_norm(&output) < 1.0);
}
}
7.1.3 Edge Case Tests
#[cfg(test)]
mod edge_case_tests {
#[test]
fn test_single_neighbor() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let keys = vec![vec![1.0; 128]];
let output = attn.forward(&query, &keys, &keys).unwrap();
// With single neighbor, attention weight should be 1.0
}
#[test]
fn test_identical_keys() {
// All keys identical -> uniform attention
let keys = vec![vec![1.0; 128]; 10];
let weights = attn.attention_weights(&query, &keys).unwrap();
for w in &weights {
assert!((w - 0.1).abs() < 1e-5); // 1/10
}
}
#[test]
fn test_zero_vectors() {
let query = vec![0.0; 128];
let keys = vec![vec![0.0; 128]; 10];
let result = attn.forward(&query, &keys, &keys);
// Should handle gracefully (may return error or uniform weights)
}
}
7.2 Integration Tests
Goal: Test interactions between modules
7.2.1 Multi-Mechanism Pipeline
#[test]
fn test_moe_with_multiple_experts() {
let moe = MoEAttention::builder()
.add_local_expert(128)
.add_hyperbolic_expert(128, -1.0)
.add_sparse_expert(10, 5)
.top_k(2)
.build()
.unwrap();
let output = moe.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
7.2.2 Graph Attention with HNSW
#[test]
fn test_graph_rope_with_hnsw_layers() {
let rope = GraphRoPE::new(128, 10000.0).unwrap();
let metadata = GraphMetadata {
distances: Some(vec![1.0, 2.0, 3.0]),
hnsw_layers: Some(vec![0, 1, 2]),
..Default::default()
};
let output = rope.forward_with_metadata(
&query, &keys, &values, &metadata
).unwrap();
assert_eq!(output.len(), 128);
}
7.3 Property-Based Tests
Tool: proptest
use proptest::prelude::*;
proptest! {
#[test]
fn prop_attention_weights_normalized(
query in prop::collection::vec(-10.0f32..10.0, 128),
keys in prop::collection::vec(
prop::collection::vec(-10.0f32..10.0, 128),
1..100
)
) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let weights = attn.attention_weights(&query, &keys).unwrap();
let sum: f32 = weights.iter().sum();
prop_assert!((sum - 1.0).abs() < 1e-4);
}
#[test]
fn prop_attention_output_finite(
query in prop::collection::vec(-100.0f32..100.0, 128),
keys in prop::collection::vec(
prop::collection::vec(-100.0f32..100.0, 128),
1..100
)
) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let values = keys.clone();
let output = attn.forward(&query, &keys, &values).unwrap();
prop_assert!(output.iter().all(|&x| x.is_finite()));
}
}
7.4 Benchmark Tests
Tool: criterion
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_scaled_dot_product(c: &mut Criterion) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let query = vec![1.0; 128];
let keys = vec![vec![1.0; 128]; 1000];
let values = keys.clone();
c.bench_function("scaled_dot_product_1k", |b| {
b.iter(|| {
attn.forward(
black_box(&query),
black_box(&keys),
black_box(&values)
)
})
});
}
fn bench_multi_head(c: &mut Criterion) {
let mut group = c.benchmark_group("multi_head_attention");
for num_heads in [1, 2, 4, 8] {
let attn = MultiHeadAttention::builder(128, num_heads)
.build()
.unwrap();
group.bench_function(format!("heads_{}", num_heads), |b| {
b.iter(|| {
attn.forward(
black_box(&query),
black_box(&keys),
black_box(&values)
)
})
});
}
group.finish();
}
criterion_group!(benches, bench_scaled_dot_product, bench_multi_head);
criterion_main!(benches);
7.5 Fuzzing
Tool: cargo-fuzz
#![no_main]
use libfuzzer_sys::fuzz_target;
use ruvector_attention::core::ScaledDotProductAttention;
use ruvector_attention::Attention;
fuzz_target!(|data: &[u8]| {
if data.len() < 512 {
return;
}
// Parse fuzzer input into query, keys, values
let query: Vec<f32> = data[0..128]
.chunks(4)
.map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
.collect();
// ... similar for keys and values
let attn = ScaledDotProductAttention::new(32).unwrap();
// Fuzz target: should never panic
let _ = attn.forward(&query, &keys, &values);
});
7.6 WASM Tests
#[cfg(target_arch = "wasm32")]
#[cfg(test)]
mod wasm_tests {
use wasm_bindgen_test::*;
#[wasm_bindgen_test]
fn test_attention_in_wasm() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let output = attn.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
#[wasm_bindgen_test]
fn test_simd_in_wasm() {
#[cfg(feature = "simd")]
{
// Test WASM SIMD operations
}
}
}
7.7 Performance Regression Tests
CI Check: Fail if performance degrades >5% from baseline
#[test]
fn test_performance_regression() {
let baseline_latency_ms = 100.0; // From previous run
let start = Instant::now();
let attn = ScaledDotProductAttention::new(128).unwrap();
for _ in 0..1000 {
attn.forward(&query, &keys, &values).unwrap();
}
let elapsed = start.elapsed().as_secs_f64() * 1000.0;
let current_latency_ms = elapsed / 1000.0;
let regression = (current_latency_ms - baseline_latency_ms) / baseline_latency_ms;
assert!(
regression < 0.05,
"Performance regression detected: {}%", regression * 100.0
);
}
8. Success Criteria
8.1 Quantifiable Metrics
8.1.1 Functional Completeness
- 10/10 attention mechanisms implemented (100%)
- All mechanisms pass unit tests (100% pass rate)
- Integration tests pass (100% pass rate)
8.1.2 Performance
- Latency: p95 <200ms @ 1K neighbors for all mechanisms
- Throughput: >5,000 ops/sec for scaled dot-product
- Memory: Peak usage <50MB per operation
- Scalability: Linear or sub-quadratic up to 10K neighbors
8.1.3 Quality
- Test coverage: >90% line coverage
- Documentation coverage: 100% public APIs documented
- Zero compiler warnings: Clippy clean
- Zero unsafe code: Or 100% audited and justified
8.1.4 Compatibility
- Platforms: Linux, macOS, Windows passing CI
- WASM: All tests pass in wasm32-unknown-unknown
- NAPI-RS: Node.js 18+, all platforms published
- MSRV: Rust 1.77+ supported
8.1.5 Adoption
- Examples: 5+ runnable examples
- Documentation: Getting started guide, API docs, tutorials
- Integration: Used in ruvector-gnn crate
8.2 Acceptance Tests
Phase 1 Acceptance (Weeks 1-4)
✅ Core attention mechanisms (scaled dot-product, multi-head)
✅ Unit tests passing (>80% coverage)
✅ Basic benchmarks established
✅ API design finalized
Phase 2 Acceptance (Weeks 5-8)
✅ Geometric attention (hyperbolic, edge-featured)
✅ Integration tests with graph structures
✅ Performance targets met for core mechanisms
✅ WASM compatibility verified
Phase 3 Acceptance (Weeks 9-12)
✅ Sparse mechanisms (flash, linear, local+global)
✅ Memory targets met
✅ NAPI-RS bindings complete
✅ Documentation 50% complete
Phase 4 Acceptance (Weeks 13-16)
✅ Adaptive mechanisms (MoE, cross-attention)
✅ Training utilities complete
✅ CLI interface functional
✅ All performance targets met
Phase 5 Acceptance (Weeks 17-20)
✅ Full integration with ruvector-gnn
✅ Documentation 100% complete
✅ Optimization passes complete
✅ Ready for 1.0 release
8.3 Release Criteria (v1.0)
Blocker Issues (must fix before release):
- Zero failing tests
- Zero compiler warnings
- All performance targets met
- 100% public API documented
- Security audit complete
- Cross-platform CI passing
Nice-to-Have (can defer to 1.1):
- GPU acceleration (CUDA/Metal)
- Additional attention variants
- Advanced SIMD optimizations
- Distributed attention
9. Constraints and Dependencies
9.1 Technical Constraints
C-001: No GPU Dependency
Constraint: All implementations must run on CPU Rationale: WASM and NAPI-RS environments lack GPU access Impact: May limit performance for very large graphs Mitigation: SIMD optimizations, algorithm choice (sparse/linear attention)
C-002: Memory Constraints in WASM
Constraint: WASM has limited memory (typically 2-4GB) Rationale: Browser and Node.js WASM environments Impact: Cannot materialize large attention matrices Mitigation: Flash Attention, sparse patterns, streaming computation
C-003: Serialization Requirements
Constraint: All types must be serializable (serde) Rationale: Model saving/loading, network transfer Impact: Design complexity, trait object limitations Mitigation: Enum-based polymorphism, careful trait design
9.2 Dependencies
Core Dependencies
[dependencies]
# Math and numerics
ndarray = { version = "0.16", default-features = false }
rand = { version = "0.8", default-features = false }
rand_distr = { version = "0.4", default-features = false }
# Serialization
serde = { version = "1.0", features = ["derive"], optional = true }
rkyv = { version = "0.8", optional = true }
# Error handling
thiserror = "2.0"
# Optional: SIMD
simsimd = { version = "5.9", optional = true, features = ["nightly"] }
# Optional: Parallel processing
rayon = { version = "1.10", optional = true }
# Optional: WASM
wasm-bindgen = { version = "0.2", optional = true }
js-sys = { version = "0.3", optional = true }
# Optional: NAPI-RS
napi = { version = "2.16", optional = true }
napi-derive = { version = "2.16", optional = true }
Dependency Audit: All dependencies must be MIT/Apache-2.0 licensed
9.3 Integration Dependencies
Downstream (Used By)
- `ruvector-gnn`: Uses attention mechanisms in GNN layers
- `ruvector-graph`: Graph construction with attention-based edge selection
Upstream (Depends On)
- `ruvector-core`: Core vector operations, distance metrics
- `hnsw_rs`: HNSW graph structure (optional, for examples)
10. Risk Assessment
10.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Hyperbolic numerical instability | High | Medium | Careful boundary handling, epsilon clipping, extensive testing |
| WASM performance degradation | Medium | High | WASM SIMD, algorithmic optimizations, benchmarking |
| Memory bloat in large graphs | Medium | High | Flash Attention, sparse patterns, streaming |
| API breaking changes | Low | High | Careful API design, SemVer, deprecation warnings |
| Dependency conflicts | Low | Medium | Minimal dependencies, version pinning |
10.2 Schedule Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Hyperbolic implementation complexity | Medium | Medium | 20% buffer time, fallback to Euclidean |
| Performance targets not met | Low | High | Early benchmarking, iterative optimization |
| WASM/NAPI-RS compatibility issues | Low | Medium | Early CI setup, continuous testing |
Buffer: 20% time buffer in each phase for unexpected issues
10.3 Operational Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| CI infrastructure failures | Low | Low | GitHub Actions redundancy, local testing |
| Documentation drift | Medium | Medium | Doc tests, CI doc generation checks |
| Contributor onboarding difficulty | Medium | Low | Comprehensive docs, clear examples |
11. Open Questions
11.1 Design Questions
Q1: Should we support dynamic mechanism selection at runtime? Options:
- A) Enum-based (`AttentionMechanism::ScaledDotProduct`)
- B) Trait objects (`Box<dyn Attention>`)
- C) Both
Q2: How to handle attention visualization? Options:
- A) Return attention weights separately
- B) Integrate with a vis library (e.g., `plotters`)
- C) Export to JSON for external tools
Q3: Should we support distributed attention computation? Options:
- A) In-crate via `rayon`
- B) External crate (e.g., `ruvector-attention-distributed`)
- C) Defer to v2.0
11.2 API Questions
Q4: Naming convention for attention mechanisms? Options:
- A) Descriptive (`ScaledDotProductAttention`)
- B) Abbreviated (`SDPAttention`)
- C) Mixed (long in code, short in docs)
Q5: Should builders be mandatory or optional? Options:
- A) Mandatory (always use builder)
- B) Optional (provide `new()` for defaults)
- C) Hybrid (simple types use `new()`, complex types use a builder)
Appendix A: Glossary
| Term | Definition |
|---|---|
| Attention | Mechanism for weighted aggregation based on learned similarities |
| Scaled Dot-Product | Attention(Q,K,V) = softmax(QK^T/√d) V |
| Multi-Head | Parallel attention mechanisms with different projections |
| Hyperbolic | Non-Euclidean geometry with negative curvature |
| Poincaré Ball | Model of hyperbolic space as unit ball |
| GAT | Graph Attention Networks |
| RoPE | Rotary Position Embeddings |
| Flash Attention | Memory-efficient tiled attention computation |
| MoE | Mixture of Experts (learned routing between mechanisms) |
| InfoNCE | Contrastive loss function |
| HNSW | Hierarchical Navigable Small World graphs |
Appendix B: References
Research Papers
- Attention Mechanism: Vaswani et al. (2017) - "Attention Is All You Need"
- GAT: Veličković et al. (2018) - "Graph Attention Networks"
- Hyperbolic: Chami et al. (2019) - "Hyperbolic Graph Convolutional Neural Networks"
- Flash Attention: Dao et al. (2022) - "FlashAttention: Fast and Memory-Efficient Exact Attention"
- Performer: Choromanski et al. (2020) - "Rethinking Attention with Performers"
- RoPE: Su et al. (2021) - "RoFormer: Enhanced Transformer with Rotary Position Embedding"
- MoE: Shazeer et al. (2017) - "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
RuVector Research Documents
- /docs/latent-space/attention-mechanisms-research.md
- /docs/latent-space/gnn-architecture-analysis.md
- /docs/latent-space/optimization-strategies.md
- /docs/latent-space/implementation-roadmap.md
External Resources
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-11-30 | RuVector Team | Initial specification |
Approvals
| Role | Name | Signature | Date |
|---|---|---|---|
| Technical Lead | |||
| Architecture Review | |||
| QA Lead | |||
| Product Owner |
END OF SPECIFICATION
This document represents the complete specification for the ruvector-attention crate. Implementation should proceed according to the SPARC methodology:
- Specification ✅ (this document)
- Pseudocode (next phase)
- Architecture (detailed design)
- Refinement (iterative TDD)
- Completion (integration and release)