# SPARC Specification: ruvector-attention Crate **Version**: 1.0.0 **Date**: 2025-11-30 **Status**: Draft **Authors**: RuVector Research Team **SPARC Phase**: Specification --- ## Table of Contents 1. [Executive Summary](#1-executive-summary) 2. [Requirements Analysis](#2-requirements-analysis) 3. [Module Architecture](#3-module-architecture) 4. [API Design](#4-api-design) 5. [Performance Targets](#5-performance-targets) 6. [Compatibility Matrix](#6-compatibility-matrix) 7. [Testing Strategy](#7-testing-strategy) 8. [Success Criteria](#8-success-criteria) 9. [Constraints and Dependencies](#9-constraints-and-dependencies) 10. [Risk Assessment](#10-risk-assessment) --- ## 1. Executive Summary ### 1.1 Vision Create a modular, high-performance attention mechanism library specifically designed for GNN latent space operations in RuVector. The `ruvector-attention` crate will implement **10 distinct attention mechanisms** from research literature, enabling researchers and practitioners to experiment with different attention strategies for graph-structured data. **Core Mission**: Bridge the gap between latent space representations and graph topology through specialized attention mechanisms optimized for HNSW-based vector databases. ### 1.2 Goals **Primary Goals**: 1. **Modularity**: Each attention mechanism is a standalone, composable component 2. **Performance**: Achieve <200ms latency for 95% of attention operations on 1000-neighbor graphs 3. **Compatibility**: Support WASM, NAPI-RS (Node.js), CLI, and Rust SDK environments 4. **Extensibility**: Easy to add new attention mechanisms without modifying core APIs 5. **Research-Driven**: Implement cutting-edge attention mechanisms from academic literature **Secondary Goals**: 1. Provide benchmarking tools for comparing attention mechanisms 2. Enable automatic mechanism selection based on graph properties 3. Support distributed/parallel attention computation 4. Maintain numerical stability across all implementations ### 1.3 Performance Targets | Metric | Target | Stretch Goal | |--------|--------|--------------| | **Latency (p95)** | <200ms @ 1K neighbors | <100ms @ 1K neighbors | | **Throughput** | 5,000 ops/sec | 10,000 ops/sec | | **Memory (per op)** | <50MB @ 1K neighbors | <25MB @ 1K neighbors | | **WASM Binary Size** | <2MB (gzipped) | <1MB (gzipped) | | **Compilation Time** | <60s (release) | <30s (release) | | **Test Coverage** | >90% | >95% | ### 1.4 Timeline Overview **Phase 1 (Weeks 1-4)**: Core attention primitives + Multi-head attention **Phase 2 (Weeks 5-8)**: Geometric attention (Hyperbolic, Edge-featured) **Phase 3 (Weeks 9-12)**: Sparse and efficient mechanisms (Flash, Linear) **Phase 4 (Weeks 13-16)**: Adaptive mechanisms (MoE, Cross-attention) **Phase 5 (Weeks 17-20)**: Integration, optimization, documentation --- ## 2. Requirements Analysis ### 2.1 Functional Requirements #### FR-001: Core Attention Mechanisms **Priority**: CRITICAL **Description**: Implement foundational attention mechanisms **Acceptance Criteria**: - [x] FR-001.1: Scaled Dot-Product Attention (baseline) - [ ] FR-001.2: Multi-Head Attention (2-16 heads configurable) - [ ] FR-001.3: Supports variable-length input sequences - [ ] FR-001.4: Numerically stable softmax implementation - [ ] FR-001.5: Gradient computation for backpropagation **Test Cases**: ```rust #[test] fn test_scaled_dot_product_attention() { let attn = ScaledDotProductAttention::new(128); let query = vec![1.0; 128]; let keys = vec![vec![1.0; 128]; 10]; let values = vec![vec![1.0; 128]; 10]; let output = attn.forward(&query, &keys, &values); assert_eq!(output.len(), 128); assert!(output.iter().all(|&x| x.is_finite())); } ``` --- #### FR-002: Geometric Attention Mechanisms **Priority**: HIGH **Description**: Implement attention mechanisms aware of geometric structure **Acceptance Criteria**: - [ ] FR-002.1: Edge-Featured Attention (GAT-style with edge attributes) - [ ] FR-002.2: Hyperbolic Attention (Poincaré ball model) - [ ] FR-002.3: Mixed-Curvature Attention (Euclidean + Hyperbolic fusion) - [ ] FR-002.4: Manifold-Aware Attention **Edge-Featured Attention**: ``` score(i, j) = LeakyReLU(a^T [W·h_i || W·h_j || W_e·edge_ij]) ``` **Hyperbolic Attention**: ``` distance_poincare(x, y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²))) score(i, j) = -distance_poincare(q_i, k_j) ``` **Test Cases**: ```rust #[test] fn test_edge_featured_attention() { let attn = EdgeFeaturedAttention::new(128, 32); let edge_features = vec![vec![1.0; 32]; 10]; let output = attn.forward_with_edges( &query, &keys, &values, Some(&edge_features) ); assert!(output.len() == 128); } #[test] fn test_hyperbolic_attention_bounds() { let attn = HyperbolicAttention::new(128, -1.0); let query = vec![0.5; 128]; // Inside Poincaré ball // Ensure all embeddings stay in ball (||x|| < 1) let output = attn.forward(&query, &keys, &values); assert!(l2_norm(&output) < 0.99); } ``` --- #### FR-003: Sparse Attention Patterns **Priority**: HIGH **Description**: Reduce O(n²) complexity through sparsity **Acceptance Criteria**: - [ ] FR-003.1: Local + Global Attention (Longformer-style) - [ ] FR-003.2: Linear Attention (Performer/FAVOR+) - [ ] FR-003.3: Flash Attention (memory-efficient tiling) - [ ] FR-003.4: Configurable sparsity patterns **Local + Global Pattern**: ``` Attention Matrix: [L L L G 0 0 0 0] L = Local (1-hop neighbors) [L L L L G 0 0 0] G = Global (HNSW higher layers) [L L L L L G 0 0] 0 = No attention ... ``` **Complexity Requirements**: - Local + Global: O(k_local + k_global) where k << n - Linear: O(n·d) where d = feature dimension - Flash: O(n) memory (vs O(n²) standard) **Test Cases**: ```rust #[test] fn test_sparse_attention_complexity() { let sparse_attn = SparseGraphAttention::new( local_window: 10, global_nodes: 5 ); // Should only attend to 15 nodes, not all 1000 let num_neighbors = 1000; let actual_attention = sparse_attn.get_attention_mask(num_neighbors); assert!(actual_attention.count_nonzero() <= 15); } ``` --- #### FR-004: Graph-Aware Mechanisms **Priority**: HIGH **Description**: Attention specialized for graph structure **Acceptance Criteria**: - [ ] FR-004.1: RoPE (Rotary Position Embeddings) for graph distance - [ ] FR-004.2: HNSW-layer encoding in attention - [ ] FR-004.3: Cross-Attention (Dual-Space: graph + latent) - [ ] FR-004.4: Structural feature integration (degree, centrality) **RoPE for Graphs**: ```rust // Encode graph distance via rotation rotation_angle = graph_distance / base^(2i/d) rotated[i] = emb[i] * cos(θ) - emb[i+1] * sin(θ) ``` **Cross-Attention**: ``` graph_attn = Attention(h, N_graph(h), N_graph(h)) latent_attn = Attention(h, N_latent(h), N_latent(h)) cross_attn = Attention(graph_attn, N_latent(h), N_latent(h)) output = Fusion(graph_attn, latent_attn, cross_attn) ``` --- #### FR-005: Adaptive Mechanisms **Priority**: MEDIUM **Description**: Attention that adapts to input patterns **Acceptance Criteria**: - [ ] FR-005.1: Mixture of Experts (MoE) Attention - [ ] FR-005.2: Learned routing between attention types - [ ] FR-005.3: RL-based navigation function learning - [ ] FR-005.4: Dynamic head count adjustment **MoE Attention**: ```rust router_scores = Router(query) expert_indices = topk(router_scores, k=2) output = Σ router_scores[i] * Expert[i](query, keys, values) ``` **Experts**: 1. Local Expert: Standard attention for 1-hop neighbors 2. Hierarchical Expert: Hyperbolic attention for HNSW layers 3. Global Expert: Linear attention for distant nodes 4. Structural Expert: Edge-featured attention --- #### FR-006: Training and Optimization Utilities **Priority**: HIGH **Description**: Tools for training attention-based models **Acceptance Criteria**: - [ ] FR-006.1: Contrastive losses (InfoNCE, Local Contrastive) - [ ] FR-006.2: Spectral regularization (Laplacian smoothness) - [ ] FR-006.3: Multi-objective loss balancing - [ ] FR-006.4: Curriculum learning schedules - [ ] FR-006.5: Hard negative mining --- #### FR-007: Tensor Compression **Priority**: MEDIUM **Description**: Memory-efficient tensor operations **Acceptance Criteria**: - [ ] FR-007.1: Quantization (INT8, INT4) - [ ] FR-007.2: Low-rank factorization - [ ] FR-007.3: Sparse tensor storage - [ ] FR-007.4: Hierarchical compression for HNSW layers --- #### FR-008: SIMD Optimizations **Priority**: MEDIUM **Description**: Vectorized operations for performance **Acceptance Criteria**: - [ ] FR-008.1: AVX2/AVX-512 support for x86_64 - [ ] FR-008.2: NEON support for ARM - [ ] FR-008.3: WASM SIMD support - [ ] FR-008.4: Automatic fallback to scalar operations --- ### 2.2 Non-Functional Requirements #### NFR-001: Performance **NFR-001.1**: Latency - **Requirement**: p95 latency <200ms for 1000-neighbor attention - **Measurement**: Benchmark suite with synthetic graphs - **Verification**: CI/CD performance regression tests **NFR-001.2**: Throughput - **Requirement**: 5,000 attention operations per second - **Measurement**: Batch processing benchmarks - **Verification**: Load testing with real HNSW graphs **NFR-001.3**: Memory - **Requirement**: Peak memory <50MB per operation - **Measurement**: Memory profiling with valgrind/heaptrack - **Verification**: Memory regression tests in CI **NFR-001.4**: Scalability - **Requirement**: Linear scaling up to 10K neighbors - **Measurement**: Complexity analysis and empirical benchmarks - **Verification**: Big-O complexity proofs + empirical validation --- #### NFR-002: Reliability **NFR-002.1**: Numerical Stability - **Requirement**: All outputs finite (no NaN, Inf) across 10M operations - **Measurement**: Fuzzing with random inputs - **Verification**: Property-based testing with proptest **NFR-002.2**: Error Handling - **Requirement**: All errors recoverable, 100% error path coverage - **Measurement**: Error injection testing - **Verification**: Unit tests for error cases **NFR-002.3**: Determinism - **Requirement**: Same inputs produce same outputs (no random behavior) - **Measurement**: Repeated execution tests - **Verification**: Determinism tests in CI --- #### NFR-003: Maintainability **NFR-003.1**: Code Quality - **Requirement**: Clippy clean (zero warnings), rustfmt formatted - **Measurement**: CI linting checks - **Verification**: Pre-commit hooks + CI gates **NFR-003.2**: Documentation - **Requirement**: 100% public API documented with examples - **Measurement**: rustdoc coverage tool - **Verification**: Doc tests pass, examples compile **NFR-003.3**: Test Coverage - **Requirement**: >90% line coverage, >95% branch coverage - **Measurement**: cargo-tarpaulin - **Verification**: CI coverage reports --- #### NFR-004: Portability **NFR-004.1**: Platform Support - **Requirement**: Linux, macOS, Windows support - **Measurement**: CI testing on all platforms - **Verification**: Cross-platform integration tests **NFR-004.2**: WASM Compatibility - **Requirement**: Full functionality in WASM (wasm32-unknown-unknown) - **Measurement**: WASM-specific test suite - **Verification**: Browser and Node.js WASM tests **NFR-004.3**: NAPI-RS Support - **Requirement**: All attention mechanisms callable from Node.js - **Measurement**: Node.js integration tests - **Verification**: NPM package smoke tests --- #### NFR-005: Security **NFR-005.1**: Memory Safety - **Requirement**: Zero unsafe code blocks (or 100% audited unsafe) - **Measurement**: Manual code review - **Verification**: MIRI checks, cargo-geiger **NFR-005.2**: Dependency Audit - **Requirement**: All dependencies audited, no known CVEs - **Measurement**: cargo-audit - **Verification**: Automated dependency scanning in CI --- ### 2.3 Constraints #### C-001: Compatibility Constraints - **Rust Version**: MSRV 1.77+ (per workspace configuration) - **No GPU**: All implementations must run on CPU (WASM/NAPI-RS requirement) - **No Standard Library in WASM**: Must support `#![no_std]` for WASM32 #### C-002: API Constraints - **Backwards Compatibility**: Once 1.0 released, follow SemVer strictly - **Trait Consistency**: All attention mechanisms implement common `Attention` trait - **Builder Pattern**: Configuration via builders, not constructors #### C-003: Performance Constraints - **Compilation Time**: Release build <60s on CI runners - **Binary Size**: WASM bundle <2MB gzipped - **Memory Footprint**: No global allocators, stack-preferred where possible #### C-004: Licensing Constraints - **License**: MIT (per workspace) - **Dependency Licenses**: MIT/Apache-2.0 only (no GPL/LGPL) --- ## 3. Module Architecture ### 3.1 Crate Structure ``` ruvector-attention/ ├── Cargo.toml ├── README.md ├── LICENSE │ ├── src/ │ ├── lib.rs # Public API, re-exports │ │ │ ├── core/ # Core attention primitives │ │ ├── mod.rs # Core module exports │ │ ├── base.rs # Attention trait definition │ │ ├── scaled_dot.rs # Scaled dot-product attention │ │ ├── multi_head.rs # Multi-head attention │ │ └── config.rs # Configuration structs │ │ │ ├── geometric/ # Geometric attention │ │ ├── mod.rs │ │ ├── hyperbolic.rs # Poincaré ball attention │ │ ├── edge_featured.rs # GAT-style edge attention │ │ ├── mixed_curvature.rs # Euclidean + Hyperbolic │ │ └── manifold.rs # General manifold attention │ │ │ ├── sparse/ # Sparse patterns │ │ ├── mod.rs │ │ ├── local_global.rs # Longformer-style │ │ ├── linear.rs # Performer/FAVOR+ │ │ ├── flash.rs # Flash Attention (tiled) │ │ └── patterns.rs # Sparsity pattern utilities │ │ │ ├── graph/ # Graph-aware attention │ │ ├── mod.rs │ │ ├── rope_graph.rs # RoPE for graph distances │ │ ├── cross_space.rs # Dual-space cross-attention │ │ ├── hnsw_aware.rs # HNSW layer encoding │ │ └── structural.rs # Degree/centrality features │ │ │ ├── adaptive/ # Adaptive/learned mechanisms │ │ ├── mod.rs │ │ ├── moe.rs # Mixture of Experts │ │ ├── learned_routing.rs # Attention routing │ │ ├── rl_navigator.rs # RL-based graph navigation │ │ └── dynamic_heads.rs # Adaptive head count │ │ │ ├── training/ # Training utilities │ │ ├── mod.rs │ │ ├── losses.rs # Contrastive, reconstruction │ │ ├── optimizers.rs # SGD, Adam, etc. │ │ ├── regularizers.rs # Spectral, L2, etc. │ │ ├── curriculum.rs # Curriculum learning │ │ └── hard_negatives.rs # Negative sampling │ │ │ ├── compression/ # Tensor compression │ │ ├── mod.rs │ │ ├── quantization.rs # INT8/INT4 quantization │ │ ├── low_rank.rs # SVD/Tucker decomposition │ │ ├── sparse_storage.rs # CSR/COO sparse tensors │ │ └── hierarchical.rs # Layer-wise compression │ │ │ ├── simd/ # SIMD optimizations │ │ ├── mod.rs │ │ ├── avx2.rs # AVX2 kernels │ │ ├── avx512.rs # AVX-512 kernels │ │ ├── neon.rs # ARM NEON kernels │ │ ├── wasm_simd.rs # WASM SIMD │ │ └── dispatch.rs # Runtime detection │ │ │ ├── utils/ # Utilities │ │ ├── mod.rs │ │ ├── math.rs # Math primitives │ │ ├── tensor.rs # Tensor ops │ │ ├── softmax.rs # Numerically stable softmax │ │ └── distances.rs # Distance metrics │ │ │ └── prelude.rs # Common imports │ ├── benches/ # Benchmarks │ ├── attention_benchmark.rs # Core attention benchmarks │ ├── geometric_benchmark.rs # Geometric attention │ ├── sparse_benchmark.rs # Sparse patterns │ └── comparison_benchmark.rs # Mechanism comparison │ ├── tests/ # Integration tests │ ├── core_tests.rs # Core attention tests │ ├── geometric_tests.rs # Geometric tests │ ├── sparse_tests.rs # Sparse pattern tests │ ├── numerical_stability.rs # Stability tests │ └── property_tests.rs # Property-based tests │ ├── ffi/ # Foreign Function Interface │ ├── wasm/ # WASM bindings │ │ ├── Cargo.toml │ │ ├── src/ │ │ │ └── lib.rs # wasm-bindgen exports │ │ └── tests/ │ │ └── web.rs # Browser tests │ │ │ └── napi/ # NAPI-RS bindings │ ├── Cargo.toml │ ├── src/ │ │ └── lib.rs # napi-derive exports │ └── index.d.ts # TypeScript definitions │ ├── cli/ # CLI interface │ ├── Cargo.toml │ └── src/ │ ├── main.rs # CLI entry point │ ├── commands/ # CLI commands │ │ ├── benchmark.rs # Run benchmarks │ │ ├── compare.rs # Compare mechanisms │ │ └── analyze.rs # Analyze attention patterns │ └── output.rs # Formatting │ ├── examples/ # Examples │ ├── basic_attention.rs # Hello world │ ├── graph_attention.rs # Graph-aware usage │ ├── hnsw_integration.rs # HNSW integration │ ├── custom_mechanism.rs # Extending the library │ └── distributed_attention.rs # Parallel processing │ └── docs/ # Documentation ├── design/ # Design documents │ ├── architecture.md # Architecture overview │ ├── api_design.md # API design rationale │ └── performance.md # Performance analysis │ ├── guides/ # User guides │ ├── getting_started.md # Quick start │ ├── mechanism_guide.md # Choosing mechanisms │ └── integration.md # Integration guide │ └── research/ # Research notes ├── attention_mechanisms.md ├── benchmarks.md └── experiments.md ``` ### 3.2 Module Responsibilities #### Core Module (`src/core/`) **Responsibility**: Foundational attention mechanisms and trait definitions **Key Components**: - `Attention` trait: Common interface for all mechanisms - `ScaledDotProductAttention`: Baseline implementation - `MultiHeadAttention`: Standard multi-head decomposition - `AttentionConfig`: Configuration builders **Dependencies**: `utils` only --- #### Geometric Module (`src/geometric/`) **Responsibility**: Geometry-aware attention mechanisms **Key Components**: - `HyperbolicAttention`: Poincaré ball operations - `EdgeFeaturedAttention`: GAT-style with edge features - `MixedCurvatureAttention`: Product space (Euclidean × Hyperbolic) **Dependencies**: `core`, `utils` --- #### Sparse Module (`src/sparse/`) **Responsibility**: Efficient sparse attention patterns **Key Components**: - `LocalGlobalAttention`: Longformer-style - `LinearAttention`: Kernel-based approximation - `FlashAttention`: Memory-efficient tiling **Dependencies**: `core`, `utils`, `simd` (optional) --- #### Graph Module (`src/graph/`) **Responsibility**: Graph structure-aware mechanisms **Key Components**: - `GraphRoPE`: Rotary embeddings for graph distance - `CrossSpaceAttention`: Dual topology + latent space - `HNSWAwareAttention`: HNSW layer encoding **Dependencies**: `core`, `geometric`, `utils` --- #### Adaptive Module (`src/adaptive/`) **Responsibility**: Learned and adaptive attention **Key Components**: - `MoEAttention`: Mixture of experts routing - `RLNavigator`: Reinforcement learning-based navigation - `DynamicHeadAttention`: Runtime head count adjustment **Dependencies**: `core`, `geometric`, `sparse`, `graph`, `training` --- #### Training Module (`src/training/`) **Responsibility**: Loss functions and optimization **Key Components**: - `ContrastiveLoss`: InfoNCE, Triplet - `SpectralRegularizer`: Laplacian smoothness - `HardNegativeSampler`: Mining hard negatives - `CurriculumScheduler`: Loss weight scheduling **Dependencies**: `utils` only --- #### Compression Module (`src/compression/`) **Responsibility**: Memory-efficient tensor storage **Key Components**: - `Quantizer`: INT8/INT4 quantization - `LowRankFactorizer`: SVD compression - `SparseStorage`: CSR/COO formats **Dependencies**: `utils`, `simd` (optional) --- #### SIMD Module (`src/simd/`) **Responsibility**: Vectorized operations **Key Components**: - `SimdDispatcher`: Runtime CPU feature detection - Platform-specific kernels: AVX2, AVX-512, NEON, WASM SIMD **Dependencies**: `utils` only --- ### 3.3 Dependency Graph ``` ┌─────────────────────────────────────────────────────────────┐ │ Public API (lib.rs) │ └─────────────────────────────────────────────────────────────┘ │ ┌────────────────────┼────────────────────┐ │ │ │ v v v ┌──────────┐ ┌──────────┐ ┌──────────┐ │ core │ │ training │ │ utils │ └──────────┘ └──────────┘ └──────────┘ │ │ └────────┬───────────┬─────────┬────────┘ │ │ │ v v v ┌──────────┐ ┌─────────┐ ┌──────┐ │geometric │ │ sparse │ │ simd │ └──────────┘ └─────────┘ └──────┘ │ │ └─────┬─────┘ │ v ┌──────────┐ │ graph │ └──────────┘ │ v ┌──────────┐ │ adaptive │ └──────────┘ │ v ┌─────────────┐ │ compression │ └─────────────┘ ``` **Design Principles**: 1. **Acyclic**: No circular dependencies 2. **Layered**: Lower layers have fewer dependencies 3. **Optional Features**: SIMD, compression via feature flags 4. **Core Stability**: `core` and `utils` are most stable --- ## 4. API Design ### 4.1 Core Trait: `Attention` ```rust /// Core trait for all attention mechanisms pub trait Attention: Send + Sync { /// Forward pass: compute attention over keys/values given query /// /// # Arguments /// * `query` - Query vector (d-dimensional) /// * `keys` - Key vectors (n × d) /// * `values` - Value vectors (n × d) /// /// # Returns /// Attention-weighted aggregation of values (d-dimensional) /// /// # Example /// ``` /// use ruvector_attention::core::ScaledDotProductAttention; /// use ruvector_attention::Attention; /// /// let attn = ScaledDotProductAttention::new(128); /// let query = vec![1.0; 128]; /// let keys = vec![vec![0.5; 128]; 10]; /// let values = keys.clone(); /// /// let output = attn.forward(&query, &keys, &values); /// assert_eq!(output.len(), 128); /// ``` fn forward( &self, query: &[f32], keys: &[Vec], values: &[Vec], ) -> Result, AttentionError>; /// Get attention weights without computing weighted sum /// /// Useful for visualization and debugging fn attention_weights( &self, query: &[f32], keys: &[Vec], ) -> Result, AttentionError>; /// Get hidden dimension fn hidden_dim(&self) -> usize; /// Check if mechanism supports variable-length inputs fn supports_variable_length(&self) -> bool { true } /// Estimated computational complexity (for documentation) fn complexity(&self) -> Complexity { Complexity::Quadratic } } /// Computational complexity categories #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum Complexity { Linear, // O(n) Linearithmic, // O(n log n) Quadratic, // O(n²) Custom(&'static str), // Custom complexity description } ``` --- ### 4.2 Error Handling ```rust /// Errors that can occur during attention computation #[derive(Debug, thiserror::Error)] pub enum AttentionError { /// Input dimension mismatch #[error("Dimension mismatch: expected {expected}, got {actual}")] DimensionMismatch { expected: usize, actual: usize }, /// Empty input #[error("Empty input: {context}")] EmptyInput { context: String }, /// Numerical instability detected #[error("Numerical instability: {message}")] NumericalInstability { message: String }, /// Invalid configuration #[error("Invalid configuration: {message}")] InvalidConfig { message: String }, /// Out of bounds access #[error("Index out of bounds: {index} >= {len}")] OutOfBounds { index: usize, len: usize }, /// Unsupported operation #[error("Unsupported operation: {operation}")] Unsupported { operation: String }, /// Internal error #[error("Internal error: {message}")] Internal { message: String }, } pub type Result = std::result::Result; ``` --- ### 4.3 Builder Pattern ```rust /// Builder for ScaledDotProductAttention #[derive(Debug, Clone)] pub struct ScaledDotProductAttentionBuilder { hidden_dim: usize, dropout: Option, temperature: f32, normalize: bool, } impl ScaledDotProductAttentionBuilder { pub fn new(hidden_dim: usize) -> Self { Self { hidden_dim, dropout: None, temperature: 1.0, normalize: true, } } pub fn dropout(mut self, rate: f32) -> Self { assert!((0.0..=1.0).contains(&rate), "Dropout must be in [0, 1]"); self.dropout = Some(rate); self } pub fn temperature(mut self, temp: f32) -> Self { assert!(temp > 0.0, "Temperature must be positive"); self.temperature = temp; self } pub fn normalize(mut self, normalize: bool) -> Self { self.normalize = normalize; self } pub fn build(self) -> Result { if self.hidden_dim == 0 { return Err(AttentionError::InvalidConfig { message: "hidden_dim must be > 0".to_string(), }); } Ok(ScaledDotProductAttention { hidden_dim: self.hidden_dim, scale: (self.hidden_dim as f32).sqrt().recip(), dropout: self.dropout, temperature: self.temperature, normalize: self.normalize, }) } } // Usage: let attn = ScaledDotProductAttention::builder(128) .dropout(0.1) .temperature(0.07) .build()?; ``` --- ### 4.4 Multi-Head Attention API ```rust /// Multi-head attention with configurable heads pub struct MultiHeadAttention { num_heads: usize, head_dim: usize, hidden_dim: usize, w_q: Vec, // Query projections per head w_k: Vec, // Key projections per head w_v: Vec, // Value projections per head w_o: Linear, // Output projection dropout: Option, } impl MultiHeadAttention { pub fn builder(hidden_dim: usize, num_heads: usize) -> MultiHeadAttentionBuilder { MultiHeadAttentionBuilder::new(hidden_dim, num_heads) } /// Get attention patterns for all heads pub fn head_attention_weights( &self, query: &[f32], keys: &[Vec], ) -> Result>> { // Returns [num_heads × num_keys] attention weights // Useful for interpretability } /// Get specific head output pub fn head_output( &self, head_idx: usize, query: &[f32], keys: &[Vec], values: &[Vec], ) -> Result> { // Get output of a single head (for debugging) } } impl Attention for MultiHeadAttention { fn forward( &self, query: &[f32], keys: &[Vec], values: &[Vec], ) -> Result> { // 1. Project to heads: Q_i, K_i, V_i for each head i // 2. Compute attention per head: head_i = Attention(Q_i, K_i, V_i) // 3. Concatenate heads: concat(head_1, ..., head_h) // 4. Output projection: W_o @ concat } } ``` --- ### 4.5 Geometric Attention API ```rust /// Hyperbolic attention in Poincaré ball pub struct HyperbolicAttention { hidden_dim: usize, curvature: f32, // Negative curvature (e.g., -1.0) w_q: Linear, w_k: Linear, w_v: Linear, } impl HyperbolicAttention { /// Create new hyperbolic attention /// /// # Arguments /// * `hidden_dim` - Embedding dimension /// * `curvature` - Curvature of hyperbolic space (must be negative) pub fn new(hidden_dim: usize, curvature: f32) -> Result { if curvature >= 0.0 { return Err(AttentionError::InvalidConfig { message: "Hyperbolic curvature must be negative".to_string(), }); } // ... } /// Poincaré distance between two points pub fn poincare_distance(&self, x: &[f32], y: &[f32]) -> f32 { // d(x,y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²))) } /// Möbius addition (hyperbolic vector addition) pub fn mobius_add(&self, x: &[f32], y: &[f32]) -> Vec { // (1+2⟨x,y⟩+||y||²)x + (1-||x||²)y / (1+2⟨x,y⟩+||x||²||y||²) } /// Project point onto Poincaré ball (clip to ||x|| < 1) pub fn project_to_ball(&self, x: &mut [f32], eps: f32) { let norm = l2_norm(x); if norm >= 1.0 - eps { let scale = (1.0 - eps) / norm; for xi in x.iter_mut() { *xi *= scale; } } } } impl Attention for HyperbolicAttention { fn forward( &self, query: &[f32], keys: &[Vec], values: &[Vec], ) -> Result> { // 1. Compute hyperbolic similarities: -d_poincare(q, k_j) // 2. Softmax attention weights // 3. Aggregate in hyperbolic space via Möbius operations } } ``` --- ### 4.6 Graph-Aware Attention API ```rust /// Attention with graph-specific features pub trait GraphAttention: Attention { /// Forward pass with edge features fn forward_with_edges( &self, query: &[f32], keys: &[Vec], values: &[Vec], edge_features: &[Vec], ) -> Result>; /// Forward pass with graph metadata fn forward_with_metadata( &self, query: &[f32], keys: &[Vec], values: &[Vec], metadata: &GraphMetadata, ) -> Result>; } /// Graph metadata for attention #[derive(Debug, Clone)] pub struct GraphMetadata { /// Graph distances (e.g., shortest path lengths) pub distances: Option>, /// HNSW layer indices pub hnsw_layers: Option>, /// Edge weights pub edge_weights: Option>, /// Structural features (degree, centrality, etc.) pub structural_features: Option>>, } /// RoPE-enhanced attention for graphs pub struct GraphRoPE { hidden_dim: usize, base: f32, // Frequency base (default 10000) w_q: Linear, w_k: Linear, w_v: Linear, } impl GraphRoPE { /// Apply rotation based on graph distance pub fn apply_rotation(&self, embedding: &[f32], distance: f32) -> Vec { // Rotate embedding by angle proportional to distance } } impl GraphAttention for GraphRoPE { fn forward_with_metadata( &self, query: &[f32], keys: &[Vec], values: &[Vec], metadata: &GraphMetadata, ) -> Result> { let distances = metadata.distances.as_ref() .ok_or_else(|| AttentionError::InvalidConfig { message: "GraphRoPE requires distance metadata".to_string(), })?; // Apply rotations based on distances // Compute attention with rotated embeddings } } ``` --- ### 4.7 Adaptive Attention API ```rust /// Mixture of Experts attention pub struct MoEAttention { router: Linear, // Maps query to expert scores experts: Vec>, top_k: usize, // Number of experts to activate } impl MoEAttention { pub fn builder() -> MoEAttentionBuilder { MoEAttentionBuilder::new() } /// Get routing decisions pub fn get_routing( &self, query: &[f32], ) -> Result> { // Returns (expert_index, weight) pairs } /// Add an expert to the mixture pub fn add_expert(&mut self, expert: Box) { self.experts.push(expert); } } impl Attention for MoEAttention { fn forward( &self, query: &[f32], keys: &[Vec], values: &[Vec], ) -> Result> { // 1. Route: scores = Router(query) // 2. Select top-k experts // 3. Weighted combination of expert outputs } } /// Builder for MoE attention pub struct MoEAttentionBuilder { router_hidden_dim: usize, experts: Vec>, top_k: usize, } impl MoEAttentionBuilder { pub fn add_local_expert(mut self, hidden_dim: usize) -> Self { self.experts.push(Box::new( ScaledDotProductAttention::new(hidden_dim).unwrap() )); self } pub fn add_hyperbolic_expert(mut self, hidden_dim: usize, curvature: f32) -> Self { self.experts.push(Box::new( HyperbolicAttention::new(hidden_dim, curvature).unwrap() )); self } pub fn add_sparse_expert(mut self, local_window: usize, global_nodes: usize) -> Self { self.experts.push(Box::new( LocalGlobalAttention::new(local_window, global_nodes).unwrap() )); self } pub fn top_k(mut self, k: usize) -> Self { self.top_k = k; self } pub fn build(self) -> Result { // Validation and construction } } ``` --- ### 4.8 Training Utilities API ```rust /// Contrastive loss functions pub mod losses { /// InfoNCE contrastive loss pub fn info_nce( anchor: &[f32], positives: &[&[f32]], negatives: &[&[f32]], temperature: f32, ) -> f32; /// Triplet loss pub fn triplet( anchor: &[f32], positive: &[f32], negative: &[f32], margin: f32, ) -> f32; /// Local contrastive loss (graph-specific) pub fn local_contrastive( node_embedding: &[f32], neighbor_embeddings: &[Vec], non_neighbor_embeddings: &[Vec], temperature: f32, ) -> f32; } /// Hard negative mining pub mod hard_negatives { pub enum SamplingStrategy { Distance, // Most similar non-neighbors Degree, // Similar degree distribution Mixed, // Combination } pub fn sample_hard_negatives( anchor: &[f32], all_embeddings: &[Vec], positive_indices: &[usize], k: usize, strategy: SamplingStrategy, ) -> Vec>; } /// Spectral regularization pub mod regularizers { /// Laplacian smoothness pub fn laplacian( embeddings: &[Vec], edges: &[(usize, usize)], edge_weights: Option<&[f32]>, ) -> f32; /// Orthogonality regularization pub fn orthogonality(embeddings: &[Vec]) -> f32; /// Embedding norm regularization pub fn norm_penalty(embeddings: &[Vec], target_norm: f32) -> f32; } ``` --- ## 5. Performance Targets ### 5.1 Latency Targets | Operation | Input Size | p50 | p95 | p99 | |-----------|------------|-----|-----|-----| | Scaled Dot-Product | 100 neighbors | <5ms | <10ms | <20ms | | Scaled Dot-Product | 1K neighbors | <50ms | <100ms | <150ms | | Multi-Head (4 heads) | 100 neighbors | <10ms | <20ms | <30ms | | Multi-Head (4 heads) | 1K neighbors | <80ms | <150ms | <200ms | | Hyperbolic | 100 neighbors | <15ms | <30ms | <50ms | | Sparse (Local+Global) | 1K neighbors | <30ms | <60ms | <100ms | | Flash Attention | 1K neighbors | <40ms | <80ms | <120ms | | MoE (4 experts, top-2) | 1K neighbors | <100ms | <180ms | <250ms | **Measurement Method**: Criterion.rs benchmarks with 1000 iterations, warm cache --- ### 5.2 Throughput Targets | Mechanism | Target (ops/sec) | Stretch (ops/sec) | |-----------|------------------|-------------------| | Scaled Dot-Product | 10,000 | 20,000 | | Multi-Head | 5,000 | 10,000 | | Hyperbolic | 3,000 | 6,000 | | Sparse | 8,000 | 15,000 | | Flash | 7,000 | 12,000 | **Measurement**: Batch processing of 1000 operations, averaged over 10 runs --- ### 5.3 Memory Targets | Mechanism | Peak Memory (1K neighbors) | Target | Stretch | |-----------|---------------------------|--------|---------| | Scaled Dot-Product | Full attention matrix | <50MB | <25MB | | Multi-Head (4 heads) | 4× attention matrices | <100MB | <50MB | | Flash Attention | Tiled computation | <20MB | <10MB | | Sparse | Sparse patterns only | <15MB | <8MB | **Measurement**: Valgrind/heaptrack during benchmark execution --- ### 5.4 Compilation Targets | Configuration | Target | Stretch | |---------------|--------|---------| | Debug build | <10s | <5s | | Release build (--release) | <60s | <30s | | Release with LTO | <120s | <60s | | WASM build | <90s | <45s | **Measurement**: CI build times on GitHub Actions standard runners --- ### 5.5 Binary Size Targets | Target | Size (uncompressed) | Size (gzipped) | Target | Stretch | |--------|---------------------|----------------|--------|---------| | WASM | 5-8 MB | 1.5-2 MB | <2MB | <1MB | | Native (Linux x86_64) | 10-15 MB | N/A | <15MB | <10MB | | NAPI-RS addon | 8-12 MB | N/A | <12MB | <8MB | **Measurement**: `wasm-opt` for WASM, `strip` for native --- ### 5.6 Scalability Targets **Linear Scaling**: - Operations should scale O(n) or better up to 10K neighbors - No quadratic blowup in standard use cases **Benchmark**: ```rust #[bench] fn bench_scalability_attention(b: &mut Bencher) { for n in [100, 500, 1000, 5000, 10000] { let attn = ScaledDotProductAttention::new(128).unwrap(); let query = vec![1.0; 128]; let keys = vec![vec![1.0; 128]; n]; let values = keys.clone(); let start = Instant::now(); b.iter(|| attn.forward(&query, &keys, &values)); let elapsed = start.elapsed(); println!("n={}: {:?}", n, elapsed); // Assert linear or sub-quadratic scaling } } ``` --- ## 6. Compatibility Matrix ### 6.1 Rust Version Support | Rust Version | Support Status | Notes | |--------------|----------------|-------| | 1.77.0 (MSRV) | ✅ Supported | Minimum supported version | | 1.78.x | ✅ Supported | | | 1.79.x | ✅ Supported | | | 1.80.x+ | ✅ Supported | Latest stable | | Nightly | ⚠️ Best-effort | May use unstable features behind flags | **Testing**: CI runs on MSRV, stable, and nightly --- ### 6.2 Platform Support #### Desktop Platforms | Platform | Tier | Support Status | CI Testing | |----------|------|----------------|------------| | Linux x86_64 | Tier 1 | ✅ Full support | Yes | | Linux ARM64 | Tier 2 | ✅ Full support | Yes | | macOS x86_64 | Tier 1 | ✅ Full support | Yes | | macOS ARM64 (M1/M2) | Tier 1 | ✅ Full support | Yes | | Windows x86_64 | Tier 1 | ✅ Full support | Yes | | Windows ARM64 | Tier 3 | ⚠️ Best-effort | No | #### WASM Targets | Target | Support Status | Notes | |--------|----------------|-------| | wasm32-unknown-unknown | ✅ Full support | Browser + Node.js | | wasm32-wasi | ✅ Full support | WASI runtime | | wasm32-unknown-emscripten | ⚠️ Untested | Should work | **WASM Features**: - ✅ All attention mechanisms - ✅ SIMD support (where available) - ✅ Multi-threading via Web Workers - ❌ File I/O (not needed) #### Mobile Platforms | Platform | Support Status | Notes | |----------|----------------|-------| | iOS ARM64 | ⚠️ Untested | Should work via FFI | | Android ARM64 | ⚠️ Untested | Should work via FFI | --- ### 6.3 Node.js Support (NAPI-RS) | Node.js Version | Support Status | Notes | |-----------------|----------------|-------| | 18.x LTS | ✅ Supported | NAPI-RS requires N-API 9+ | | 20.x LTS | ✅ Supported | Recommended | | 21.x+ Current | ✅ Supported | Latest features | **NAPI-RS Features**: - ✅ All attention mechanisms exposed - ✅ TypeScript definitions - ✅ Async operations (Tokio runtime) - ✅ Buffer zero-copy where possible **Package Platforms**: ```json { "napi": { "triples": { "defaults": true, "additional": [ "x86_64-unknown-linux-musl", "aarch64-unknown-linux-gnu", "aarch64-apple-darwin", "x86_64-pc-windows-msvc" ] } } } ``` --- ### 6.4 Feature Flags | Feature | Default | Description | Dependencies | |---------|---------|-------------|--------------| | `std` | ✅ | Standard library support | None | | `simd` | ❌ | SIMD optimizations | `std` | | `rayon` | ❌ | Parallel processing | `std`, `rayon` | | `compression` | ❌ | Tensor compression | `std` | | `wasm` | ❌ | WASM-specific bindings | `wasm-bindgen` | | `napi` | ❌ | Node.js bindings | `napi-rs` | | `cli` | ❌ | CLI interface | `std`, `clap` | | `serde` | ✅ | Serialization support | `serde` | **Example**: ```toml [dependencies] ruvector-attention = { version = "0.1", features = ["simd", "rayon"] } ``` --- ## 7. Testing Strategy ### 7.1 Unit Tests **Coverage Target**: >90% line coverage, >95% branch coverage **Test Categories**: #### 7.1.1 Correctness Tests ```rust #[cfg(test)] mod correctness_tests { use super::*; #[test] fn test_attention_output_dimension() { let attn = ScaledDotProductAttention::new(128).unwrap(); let output = attn.forward(&query, &keys, &values).unwrap(); assert_eq!(output.len(), 128); } #[test] fn test_attention_weights_sum_to_one() { let attn = ScaledDotProductAttention::new(128).unwrap(); let weights = attn.attention_weights(&query, &keys).unwrap(); let sum: f32 = weights.iter().sum(); assert!((sum - 1.0).abs() < 1e-5); } #[test] fn test_empty_neighbors_handling() { let attn = ScaledDotProductAttention::new(128).unwrap(); let result = attn.forward(&query, &[], &[]); assert!(result.is_err()); assert!(matches!(result, Err(AttentionError::EmptyInput { .. }))); } } ``` #### 7.1.2 Numerical Stability Tests ```rust #[cfg(test)] mod stability_tests { #[test] fn test_large_scores_softmax() { // Test softmax with very large scores (overflow risk) let scores = vec![1000.0, 999.0, 998.0]; let weights = softmax(&scores); assert!(weights.iter().all(|&w| w.is_finite())); } #[test] fn test_small_scores_softmax() { // Test softmax with very small scores (underflow risk) let scores = vec![-1000.0, -999.0, -998.0]; let weights = softmax(&scores); assert!(weights.iter().all(|&w| w.is_finite())); } #[test] fn test_hyperbolic_boundary() { let attn = HyperbolicAttention::new(128, -1.0).unwrap(); let query = vec![0.99; 128]; // Near ball boundary let output = attn.forward(&query, &keys, &values).unwrap(); // Output must stay inside ball assert!(l2_norm(&output) < 1.0); } } ``` #### 7.1.3 Edge Case Tests ```rust #[cfg(test)] mod edge_case_tests { #[test] fn test_single_neighbor() { let attn = ScaledDotProductAttention::new(128).unwrap(); let keys = vec![vec![1.0; 128]]; let output = attn.forward(&query, &keys, &keys).unwrap(); // With single neighbor, attention weight should be 1.0 } #[test] fn test_identical_keys() { // All keys identical -> uniform attention let keys = vec![vec![1.0; 128]; 10]; let weights = attn.attention_weights(&query, &keys).unwrap(); for w in &weights { assert!((w - 0.1).abs() < 1e-5); // 1/10 } } #[test] fn test_zero_vectors() { let query = vec![0.0; 128]; let keys = vec![vec![0.0; 128]; 10]; let result = attn.forward(&query, &keys, &keys); // Should handle gracefully (may return error or uniform weights) } } ``` --- ### 7.2 Integration Tests **Goal**: Test interactions between modules #### 7.2.1 Multi-Mechanism Pipeline ```rust #[test] fn test_moe_with_multiple_experts() { let moe = MoEAttention::builder() .add_local_expert(128) .add_hyperbolic_expert(128, -1.0) .add_sparse_expert(10, 5) .top_k(2) .build() .unwrap(); let output = moe.forward(&query, &keys, &values).unwrap(); assert_eq!(output.len(), 128); } ``` #### 7.2.2 Graph Attention with HNSW ```rust #[test] fn test_graph_rope_with_hnsw_layers() { let rope = GraphRoPE::new(128, 10000.0).unwrap(); let metadata = GraphMetadata { distances: Some(vec![1.0, 2.0, 3.0]), hnsw_layers: Some(vec![0, 1, 2]), ..Default::default() }; let output = rope.forward_with_metadata( &query, &keys, &values, &metadata ).unwrap(); assert_eq!(output.len(), 128); } ``` --- ### 7.3 Property-Based Tests **Tool**: `proptest` ```rust use proptest::prelude::*; proptest! { #[test] fn prop_attention_weights_normalized( query in prop::collection::vec(-10.0f32..10.0, 128), keys in prop::collection::vec( prop::collection::vec(-10.0f32..10.0, 128), 1..100 ) ) { let attn = ScaledDotProductAttention::new(128).unwrap(); let weights = attn.attention_weights(&query, &keys).unwrap(); let sum: f32 = weights.iter().sum(); prop_assert!((sum - 1.0).abs() < 1e-4); } #[test] fn prop_attention_output_finite( query in prop::collection::vec(-100.0f32..100.0, 128), keys in prop::collection::vec( prop::collection::vec(-100.0f32..100.0, 128), 1..100 ) ) { let attn = ScaledDotProductAttention::new(128).unwrap(); let values = keys.clone(); let output = attn.forward(&query, &keys, &values).unwrap(); prop_assert!(output.iter().all(|&x| x.is_finite())); } } ``` --- ### 7.4 Benchmark Tests **Tool**: `criterion` ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; fn bench_scaled_dot_product(c: &mut Criterion) { let attn = ScaledDotProductAttention::new(128).unwrap(); let query = vec![1.0; 128]; let keys = vec![vec![1.0; 128]; 1000]; let values = keys.clone(); c.bench_function("scaled_dot_product_1k", |b| { b.iter(|| { attn.forward( black_box(&query), black_box(&keys), black_box(&values) ) }) }); } fn bench_multi_head(c: &mut Criterion) { let mut group = c.benchmark_group("multi_head_attention"); for num_heads in [1, 2, 4, 8] { let attn = MultiHeadAttention::builder(128, num_heads) .build() .unwrap(); group.bench_function(format!("heads_{}", num_heads), |b| { b.iter(|| { attn.forward( black_box(&query), black_box(&keys), black_box(&values) ) }) }); } group.finish(); } criterion_group!(benches, bench_scaled_dot_product, bench_multi_head); criterion_main!(benches); ``` --- ### 7.5 Fuzzing **Tool**: `cargo-fuzz` ```rust #![no_main] use libfuzzer_sys::fuzz_target; use ruvector_attention::core::ScaledDotProductAttention; use ruvector_attention::Attention; fuzz_target!(|data: &[u8]| { if data.len() < 512 { return; } // Parse fuzzer input into query, keys, values let query: Vec = data[0..128] .chunks(4) .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]])) .collect(); // ... similar for keys and values let attn = ScaledDotProductAttention::new(32).unwrap(); // Fuzz target: should never panic let _ = attn.forward(&query, &keys, &values); }); ``` --- ### 7.6 WASM Tests ```rust #[cfg(target_arch = "wasm32")] #[cfg(test)] mod wasm_tests { use wasm_bindgen_test::*; #[wasm_bindgen_test] fn test_attention_in_wasm() { let attn = ScaledDotProductAttention::new(128).unwrap(); let output = attn.forward(&query, &keys, &values).unwrap(); assert_eq!(output.len(), 128); } #[wasm_bindgen_test] fn test_simd_in_wasm() { #[cfg(feature = "simd")] { // Test WASM SIMD operations } } } ``` --- ### 7.7 Performance Regression Tests **CI Check**: Fail if performance degrades >5% from baseline ```rust #[test] fn test_performance_regression() { let baseline_latency_ms = 100.0; // From previous run let start = Instant::now(); let attn = ScaledDotProductAttention::new(128).unwrap(); for _ in 0..1000 { attn.forward(&query, &keys, &values).unwrap(); } let elapsed = start.elapsed().as_secs_f64() * 1000.0; let current_latency_ms = elapsed / 1000.0; let regression = (current_latency_ms - baseline_latency_ms) / baseline_latency_ms; assert!( regression < 0.05, "Performance regression detected: {}%", regression * 100.0 ); } ``` --- ## 8. Success Criteria ### 8.1 Quantifiable Metrics #### 8.1.1 Functional Completeness - [ ] **10/10 attention mechanisms implemented** (100%) - [ ] **All mechanisms pass unit tests** (100% pass rate) - [ ] **Integration tests pass** (100% pass rate) #### 8.1.2 Performance - [ ] **Latency**: p95 <200ms @ 1K neighbors for all mechanisms - [ ] **Throughput**: >5,000 ops/sec for scaled dot-product - [ ] **Memory**: Peak usage <50MB per operation - [ ] **Scalability**: Linear or sub-quadratic up to 10K neighbors #### 8.1.3 Quality - [ ] **Test coverage**: >90% line coverage - [ ] **Documentation coverage**: 100% public APIs documented - [ ] **Zero compiler warnings**: Clippy clean - [ ] **Zero unsafe code**: Or 100% audited and justified #### 8.1.4 Compatibility - [ ] **Platforms**: Linux, macOS, Windows passing CI - [ ] **WASM**: All tests pass in wasm32-unknown-unknown - [ ] **NAPI-RS**: Node.js 18+, all platforms published - [ ] **MSRV**: Rust 1.77+ supported #### 8.1.5 Adoption - [ ] **Examples**: 5+ runnable examples - [ ] **Documentation**: Getting started guide, API docs, tutorials - [ ] **Integration**: Used in ruvector-gnn crate --- ### 8.2 Acceptance Tests #### Phase 1 Acceptance (Weeks 1-4) ``` ✅ Core attention mechanisms (scaled dot-product, multi-head) ✅ Unit tests passing (>80% coverage) ✅ Basic benchmarks established ✅ API design finalized ``` #### Phase 2 Acceptance (Weeks 5-8) ``` ✅ Geometric attention (hyperbolic, edge-featured) ✅ Integration tests with graph structures ✅ Performance targets met for core mechanisms ✅ WASM compatibility verified ``` #### Phase 3 Acceptance (Weeks 9-12) ``` ✅ Sparse mechanisms (flash, linear, local+global) ✅ Memory targets met ✅ NAPI-RS bindings complete ✅ Documentation 50% complete ``` #### Phase 4 Acceptance (Weeks 13-16) ``` ✅ Adaptive mechanisms (MoE, cross-attention) ✅ Training utilities complete ✅ CLI interface functional ✅ All performance targets met ``` #### Phase 5 Acceptance (Weeks 17-20) ``` ✅ Full integration with ruvector-gnn ✅ Documentation 100% complete ✅ Optimization passes complete ✅ Ready for 1.0 release ``` --- ### 8.3 Release Criteria (v1.0) **Blocker Issues** (must fix before release): - [ ] Zero failing tests - [ ] Zero compiler warnings - [ ] All performance targets met - [ ] 100% public API documented - [ ] Security audit complete - [ ] Cross-platform CI passing **Nice-to-Have** (can defer to 1.1): - [ ] GPU acceleration (CUDA/Metal) - [ ] Additional attention variants - [ ] Advanced SIMD optimizations - [ ] Distributed attention --- ## 9. Constraints and Dependencies ### 9.1 Technical Constraints #### C-001: No GPU Dependency **Constraint**: All implementations must run on CPU **Rationale**: WASM and NAPI-RS environments lack GPU access **Impact**: May limit performance for very large graphs **Mitigation**: SIMD optimizations, algorithm choice (sparse/linear attention) #### C-002: Memory Constraints in WASM **Constraint**: WASM has limited memory (typically 2-4GB) **Rationale**: Browser and Node.js WASM environments **Impact**: Cannot materialize large attention matrices **Mitigation**: Flash Attention, sparse patterns, streaming computation #### C-003: Serialization Requirements **Constraint**: All types must be serializable (serde) **Rationale**: Model saving/loading, network transfer **Impact**: Design complexity, trait object limitations **Mitigation**: Enum-based polymorphism, careful trait design --- ### 9.2 Dependencies #### Core Dependencies ```toml [dependencies] # Math and numerics ndarray = { version = "0.16", default-features = false } rand = { version = "0.8", default-features = false } rand_distr = { version = "0.4", default-features = false } # Serialization serde = { version = "1.0", features = ["derive"], optional = true } rkyv = { version = "0.8", optional = true } # Error handling thiserror = "2.0" # Optional: SIMD simsimd = { version = "5.9", optional = true, features = ["nightly"] } # Optional: Parallel processing rayon = { version = "1.10", optional = true } # Optional: WASM wasm-bindgen = { version = "0.2", optional = true } js-sys = { version = "0.3", optional = true } # Optional: NAPI-RS napi = { version = "2.16", optional = true } napi-derive = { version = "2.16", optional = true } ``` **Dependency Audit**: All dependencies must be MIT/Apache-2.0 licensed --- ### 9.3 Integration Dependencies #### Upstream (Used By) - `ruvector-gnn`: Uses attention mechanisms in GNN layers - `ruvector-graph`: Graph construction with attention-based edge selection #### Downstream (Depends On) - `ruvector-core`: Core vector operations, distance metrics - `hnsw_rs`: HNSW graph structure (optional, for examples) --- ## 10. Risk Assessment ### 10.1 Technical Risks | Risk | Probability | Impact | Mitigation | |------|-------------|--------|------------| | **Hyperbolic numerical instability** | High | Medium | Careful boundary handling, epsilon clipping, extensive testing | | **WASM performance degradation** | Medium | High | WASM SIMD, algorithmic optimizations, benchmarking | | **Memory bloat in large graphs** | Medium | High | Flash Attention, sparse patterns, streaming | | **API breaking changes** | Low | High | Careful API design, SemVer, deprecation warnings | | **Dependency conflicts** | Low | Medium | Minimal dependencies, version pinning | --- ### 10.2 Schedule Risks | Risk | Probability | Impact | Mitigation | |------|-------------|--------|------------| | **Hyperbolic implementation complexity** | Medium | Medium | 20% buffer time, fallback to Euclidean | | **Performance targets not met** | Low | High | Early benchmarking, iterative optimization | | **WASM/NAPI-RS compatibility issues** | Low | Medium | Early CI setup, continuous testing | **Buffer**: 20% time buffer in each phase for unexpected issues --- ### 10.3 Operational Risks | Risk | Probability | Impact | Mitigation | |------|-------------|--------|------------| | **CI infrastructure failures** | Low | Low | GitHub Actions redundancy, local testing | | **Documentation drift** | Medium | Medium | Doc tests, CI doc generation checks | | **Contributor onboarding difficulty** | Medium | Low | Comprehensive docs, clear examples | --- ## 11. Open Questions ### 11.1 Design Questions **Q1**: Should we support dynamic mechanism selection at runtime? **Options**: - A) Enum-based (`AttentionMechanism::ScaledDotProduct`) - B) Trait objects (`Box`) - C) Both **Q2**: How to handle attention visualization? **Options**: - A) Return attention weights separately - B) Integrate with vis library (e.g., `plotters`) - C) Export to JSON for external tools **Q3**: Should we support distributed attention computation? **Options**: - A) In-crate via `rayon` - B) External crate (e.g., `ruvector-attention-distributed`) - C) Defer to v2.0 --- ### 11.2 API Questions **Q4**: Naming convention for attention mechanisms? **Options**: - A) Descriptive (`ScaledDotProductAttention`) - B) Abbreviated (`SDPAttention`) - C) Mixed (long in code, short in docs) **Q5**: Should builders be mandatory or optional? **Options**: - A) Mandatory (always use builder) - B) Optional (provide `new()` for defaults) - C) Hybrid (simple types use `new()`, complex use builder) --- ## Appendix A: Glossary | Term | Definition | |------|------------| | **Attention** | Mechanism for weighted aggregation based on learned similarities | | **Scaled Dot-Product** | `Attention(Q,K,V) = softmax(QK^T/√d) V` | | **Multi-Head** | Parallel attention mechanisms with different projections | | **Hyperbolic** | Non-Euclidean geometry with negative curvature | | **Poincaré Ball** | Model of hyperbolic space as unit ball | | **GAT** | Graph Attention Networks | | **RoPE** | Rotary Position Embeddings | | **Flash Attention** | Memory-efficient tiled attention computation | | **MoE** | Mixture of Experts (learned routing between mechanisms) | | **InfoNCE** | Contrastive loss function | | **HNSW** | Hierarchical Navigable Small World graphs | --- ## Appendix B: References ### Research Papers 1. **Attention Mechanism**: Vaswani et al. (2017) - "Attention Is All You Need" 2. **GAT**: Veličković et al. (2018) - "Graph Attention Networks" 3. **Hyperbolic**: Chami et al. (2019) - "Hyperbolic Graph Convolutional Neural Networks" 4. **Flash Attention**: Dao et al. (2022) - "FlashAttention: Fast and Memory-Efficient Exact Attention" 5. **Performer**: Choromanski et al. (2020) - "Rethinking Attention with Performers" 6. **RoPE**: Su et al. (2021) - "RoFormer: Enhanced Transformer with Rotary Position Embedding" 7. **MoE**: Shazeer et al. (2017) - "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" ### RuVector Research Documents - `/docs/latent-space/attention-mechanisms-research.md` - `/docs/latent-space/gnn-architecture-analysis.md` - `/docs/latent-space/optimization-strategies.md` - `/docs/latent-space/implementation-roadmap.md` ### External Resources - [Rust WASM Book](https://rustwasm.github.io/book/) - [NAPI-RS Documentation](https://napi.rs/) - [Criterion.rs Guide](https://bheisler.github.io/criterion.rs/book/) - [Proptest Book](https://proptest-rs.github.io/proptest/) --- ## Document History | Version | Date | Author | Changes | |---------|------|--------|---------| | 1.0.0 | 2025-11-30 | RuVector Team | Initial specification | --- ## Approvals | Role | Name | Signature | Date | |------|------|-----------|------| | **Technical Lead** | | | | | **Architecture Review** | | | | | **QA Lead** | | | | | **Product Owner** | | | | --- **END OF SPECIFICATION** This document represents the complete specification for the `ruvector-attention` crate. Implementation should proceed according to the SPARC methodology: - **S**pecification ✅ (this document) - **P**seudocode (next phase) - **A**rchitecture (detailed design) - **R**efinement (iterative TDD) - **C**ompletion (integration and release)