SPARC Specification: ruvector-attention Crate
Version: 1.0.0 Date: 2025-11-30 Status: Draft Authors: RuVector Research Team SPARC Phase: Specification
Table of Contents
- Executive Summary
- Requirements Analysis
- Module Architecture
- API Design
- Performance Targets
- Compatibility Matrix
- Testing Strategy
- Success Criteria
- Constraints and Dependencies
- Risk Assessment
1. Executive Summary
1.1 Vision
Create a modular, high-performance attention mechanism library specifically designed for GNN latent space operations in RuVector. The ruvector-attention crate will implement 10 distinct attention mechanisms from research literature, enabling researchers and practitioners to experiment with different attention strategies for graph-structured data.
Core Mission: Bridge the gap between latent space representations and graph topology through specialized attention mechanisms optimized for HNSW-based vector databases.
1.2 Goals
Primary Goals:
- Modularity: Each attention mechanism is a standalone, composable component
- Performance: Achieve <200ms latency for 95% of attention operations on 1000-neighbor graphs
- Compatibility: Support WASM, NAPI-RS (Node.js), CLI, and Rust SDK environments
- Extensibility: Easy to add new attention mechanisms without modifying core APIs
- Research-Driven: Implement cutting-edge attention mechanisms from academic literature
Secondary Goals:
- Provide benchmarking tools for comparing attention mechanisms
- Enable automatic mechanism selection based on graph properties
- Support distributed/parallel attention computation
- Maintain numerical stability across all implementations
1.3 Performance Targets
| Metric | Target | Stretch Goal |
|---|---|---|
| Latency (p95) | <200ms @ 1K neighbors | <100ms @ 1K neighbors |
| Throughput | 5,000 ops/sec | 10,000 ops/sec |
| Memory (per op) | <50MB @ 1K neighbors | <25MB @ 1K neighbors |
| WASM Binary Size | <2MB (gzipped) | <1MB (gzipped) |
| Compilation Time | <60s (release) | <30s (release) |
| Test Coverage | >90% | >95% |
1.4 Timeline Overview
- Phase 1 (Weeks 1-4): Core attention primitives + Multi-head attention
- Phase 2 (Weeks 5-8): Geometric attention (Hyperbolic, Edge-featured)
- Phase 3 (Weeks 9-12): Sparse and efficient mechanisms (Flash, Linear)
- Phase 4 (Weeks 13-16): Adaptive mechanisms (MoE, Cross-attention)
- Phase 5 (Weeks 17-20): Integration, optimization, documentation
2. Requirements Analysis
2.1 Functional Requirements
FR-001: Core Attention Mechanisms
Priority: CRITICAL Description: Implement foundational attention mechanisms
Acceptance Criteria:
- FR-001.1: Scaled Dot-Product Attention (baseline)
- FR-001.2: Multi-Head Attention (2-16 heads configurable)
- FR-001.3: Supports variable-length input sequences
- FR-001.4: Numerically stable softmax implementation (a sketch follows the test cases below)
- FR-001.5: Gradient computation for backpropagation
Test Cases:
#[test]
fn test_scaled_dot_product_attention() {
    let attn = ScaledDotProductAttention::new(128).unwrap();
    let query = vec![1.0; 128];
    let keys = vec![vec![1.0; 128]; 10];
    let values = vec![vec![1.0; 128]; 10];
    let output = attn.forward(&query, &keys, &values).unwrap();
    assert_eq!(output.len(), 128);
    assert!(output.iter().all(|&x| x.is_finite()));
}
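The numerically stable softmax required by FR-001.4 is typically implemented with the max-subtraction trick. A minimal sketch of the helper that §3.1 places in src/utils/softmax.rs (the exact signature and the uniform fallback for degenerate inputs are assumptions):

/// Numerically stable softmax: subtract the max score before exponentiating
/// so exp() cannot overflow, then normalize to sum to 1.
pub fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    if sum > 0.0 {
        exps.iter().map(|&e| e / sum).collect()
    } else {
        // Degenerate input (e.g. all scores are -inf): fall back to uniform weights
        vec![1.0 / scores.len().max(1) as f32; scores.len()]
    }
}

The stability tests in §7.1.2 exercise exactly the overflow and underflow cases this guards against.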
FR-002: Geometric Attention Mechanisms
Priority: HIGH Description: Implement attention mechanisms aware of geometric structure
Acceptance Criteria:
- FR-002.1: Edge-Featured Attention (GAT-style with edge attributes)
- FR-002.2: Hyperbolic Attention (Poincaré ball model)
- FR-002.3: Mixed-Curvature Attention (Euclidean + Hyperbolic fusion)
- FR-002.4: Manifold-Aware Attention
Edge-Featured Attention:
score(i, j) = LeakyReLU(a^T [W·h_i || W·h_j || W_e·edge_ij])
Hyperbolic Attention:
distance_poincare(x, y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
score(i, j) = -distance_poincare(q_i, k_j)
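§4.5 specifies HyperbolicAttention::poincare_distance with this formula. A standalone sketch with epsilon clipping so points numerically on the ball boundary do not produce infinities (the helper name and epsilon value are assumptions):

/// d(x, y) = arccosh(1 + 2||x - y||² / ((1 - ||x||²)(1 - ||y||²))),
/// with the denominator clipped away from zero for boundary points.
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let eps = 1e-5;
    let sq_norm = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - sq_norm(x)).max(eps) * (1.0 - sq_norm(y)).max(eps);
    let arg = 1.0 + 2.0 * diff_sq / denom;
    arg.max(1.0).acosh()
}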
Test Cases:
#[test]
fn test_edge_featured_attention() {
    let attn = EdgeFeaturedAttention::new(128, 32);
    let query = vec![1.0; 128];
    let keys = vec![vec![1.0; 128]; 10];
    let values = keys.clone();
    let edge_features = vec![vec![1.0; 32]; 10];
    let output = attn
        .forward_with_edges(&query, &keys, &values, &edge_features)
        .unwrap();
    assert_eq!(output.len(), 128);
}
#[test]
fn test_hyperbolic_attention_bounds() {
    let attn = HyperbolicAttention::new(128, -1.0).unwrap();
    // Inside the Poincaré ball: ||query|| = 0.05 * sqrt(128) ≈ 0.57 < 1
    let query = vec![0.05; 128];
    let keys = vec![vec![0.01; 128]; 10];
    let values = keys.clone();
    // Ensure all embeddings stay in the ball (||x|| < 1)
    let output = attn.forward(&query, &keys, &values).unwrap();
    assert!(l2_norm(&output) < 0.99);
}
FR-003: Sparse Attention Patterns
Priority: HIGH Description: Reduce O(n²) complexity through sparsity
Acceptance Criteria:
- FR-003.1: Local + Global Attention (Longformer-style)
- FR-003.2: Linear Attention (Performer/FAVOR+)
- FR-003.3: Flash Attention (memory-efficient tiling)
- FR-003.4: Configurable sparsity patterns
Local + Global Pattern:
Attention Matrix:
[L L L G 0 0 0 0] L = Local (1-hop neighbors)
[L L L L G 0 0 0] G = Global (HNSW higher layers)
[L L L L L G 0 0] 0 = No attention
...
Complexity Requirements:
- Local + Global: O(k_local + k_global) where k << n
- Linear: O(n·d) where d = feature dimension
- Flash: O(n) memory (vs O(n²) standard)
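A sketch of how the Local + Global index set could be assembled for a single query node; the function name, inputs, and the use of HNSW entry points as the global set are assumptions (the crate's version would presumably live in src/sparse/patterns.rs):

/// Indices a query attends to: the first `local_window` one-hop neighbors
/// plus `global_nodes` entry points from higher HNSW layers.
fn sparse_attention_indices(
    one_hop: &[usize],
    hnsw_entry_points: &[usize],
    local_window: usize,
    global_nodes: usize,
) -> Vec<usize> {
    let mut idx: Vec<usize> = one_hop.iter().copied().take(local_window).collect();
    idx.extend(hnsw_entry_points.iter().copied().take(global_nodes));
    idx.sort_unstable();
    idx.dedup(); // a node can appear as both a local and a global target
    idx
}

The resulting index set has at most local_window + global_nodes entries, which is what the complexity test below asserts.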
Test Cases:
#[test]
fn test_sparse_attention_complexity() {
    // local_window = 10, global_nodes = 5
    let sparse_attn = SparseGraphAttention::new(10, 5);
    // Should only attend to 15 nodes, not all 1000
    let num_neighbors = 1000;
    let actual_attention = sparse_attn.get_attention_mask(num_neighbors);
    assert!(actual_attention.count_nonzero() <= 15);
}
FR-004: Graph-Aware Mechanisms
Priority: HIGH Description: Attention specialized for graph structure
Acceptance Criteria:
- FR-004.1: RoPE (Rotary Position Embeddings) for graph distance
- FR-004.2: HNSW-layer encoding in attention
- FR-004.3: Cross-Attention (Dual-Space: graph + latent)
- FR-004.4: Structural feature integration (degree, centrality)
RoPE for Graphs:
// Encode graph distance via rotation of consecutive embedding pairs
θ_i = graph_distance / base^(2i/d)
rotated[2i]   = emb[2i] * cos(θ_i) - emb[2i+1] * sin(θ_i)
rotated[2i+1] = emb[2i] * sin(θ_i) + emb[2i+1] * cos(θ_i)
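A sketch of the rotation above applied pairwise over an embedding, mirroring the intent of GraphRoPE::apply_rotation in §4.6 (the pairwise layout and treatment of odd dimensions are assumptions):

/// Rotate each (even, odd) coordinate pair by an angle derived from the
/// graph distance, analogous to RoPE over token positions.
fn apply_graph_rope(embedding: &[f32], graph_distance: f32, base: f32) -> Vec<f32> {
    let d = embedding.len();
    let mut out = embedding.to_vec();
    for i in (0..d.saturating_sub(1)).step_by(2) {
        // For pair index p = i / 2 this is graph_distance / base^(2p/d)
        let theta = graph_distance / base.powf(i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (x, y) = (embedding[i], embedding[i + 1]);
        out[i] = x * cos - y * sin;
        out[i + 1] = x * sin + y * cos;
    }
    out
}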
Cross-Attention:
graph_attn = Attention(h, N_graph(h), N_graph(h))
latent_attn = Attention(h, N_latent(h), N_latent(h))
cross_attn = Attention(graph_attn, N_latent(h), N_latent(h))
output = Fusion(graph_attn, latent_attn, cross_attn)
FR-005: Adaptive Mechanisms
Priority: MEDIUM Description: Attention that adapts to input patterns
Acceptance Criteria:
- FR-005.1: Mixture of Experts (MoE) Attention
- FR-005.2: Learned routing between attention types
- FR-005.3: RL-based navigation function learning
- FR-005.4: Dynamic head count adjustment
MoE Attention:
router_scores = Router(query)
expert_indices = topk(router_scores, k=2)
output = Σ router_scores[i] * Expert[i](query, keys, values)
Experts:
- Local Expert: Standard attention for 1-hop neighbors
- Hierarchical Expert: Hyperbolic attention for HNSW layers
- Global Expert: Linear attention for distant nodes
- Structural Expert: Edge-featured attention
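A sketch of the routing step above, using the Attention trait and error type from §4.1 and §4.2; renormalizing the kept router weights (rather than using the raw scores) is an assumption:

use ruvector_attention::{Attention, AttentionError};

/// Pick the top-k experts by router score and blend their outputs,
/// with the retained weights renormalized to sum to 1.
fn moe_forward(
    experts: &[Box<dyn Attention>],
    router_scores: &[f32], // one score per expert, e.g. Router(query)
    top_k: usize,
    query: &[f32],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError> {
    let mut ranked: Vec<(usize, f32)> =
        router_scores.iter().copied().enumerate().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(top_k);
    let norm: f32 = ranked.iter().map(|(_, s)| *s).sum::<f32>().max(f32::EPSILON);

    let mut output = vec![0.0f32; query.len()];
    for (idx, score) in ranked {
        let expert_out = experts[idx].forward(query, keys, values)?;
        for (o, e) in output.iter_mut().zip(expert_out) {
            *o += (score / norm) * e;
        }
    }
    Ok(output)
}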
FR-006: Training and Optimization Utilities
Priority: HIGH Description: Tools for training attention-based models
Acceptance Criteria:
- FR-006.1: Contrastive losses (InfoNCE, Local Contrastive)
- FR-006.2: Spectral regularization (Laplacian smoothness)
- FR-006.3: Multi-objective loss balancing
- FR-006.4: Curriculum learning schedules
- FR-006.5: Hard negative mining
FR-007: Tensor Compression
Priority: MEDIUM Description: Memory-efficient tensor operations
Acceptance Criteria:
- FR-007.1: Quantization (INT8, INT4) (see the sketch after this list)
- FR-007.2: Low-rank factorization
- FR-007.3: Sparse tensor storage
- FR-007.4: Hierarchical compression for HNSW layers
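A minimal sketch of a symmetric per-tensor INT8 scheme for FR-007.1 (struct and function names are assumptions; the crate's version would sit in src/compression/quantization.rs):

/// Symmetric per-tensor INT8 quantization: i8 payload plus one f32 scale.
struct QuantizedTensor {
    data: Vec<i8>,
    scale: f32,
}

fn quantize_int8(values: &[f32]) -> QuantizedTensor {
    let max_abs = values.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let data = values
        .iter()
        .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { data, scale }
}

fn dequantize_int8(q: &QuantizedTensor) -> Vec<f32> {
    q.data.iter().map(|&v| v as f32 * q.scale).collect()
}

INT4 would follow the same pattern with two values packed per byte.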
FR-008: SIMD Optimizations
Priority: MEDIUM Description: Vectorized operations for performance
Acceptance Criteria:
- FR-008.1: AVX2/AVX-512 support for x86_64
- FR-008.2: NEON support for ARM
- FR-008.3: WASM SIMD support
- FR-008.4: Automatic fallback to scalar operations
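The automatic fallback in FR-008.4 is usually handled with runtime feature detection. A sketch for x86_64 only (dot_avx2 is a hypothetical kernel; a real src/simd/dispatch.rs would also cover NEON and WASM SIMD):

/// Dot product with runtime dispatch: use the AVX2 kernel when available,
/// otherwise fall back to the portable scalar loop.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: guarded by the runtime feature check above.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    // Placeholder body: a real kernel would use core::arch::x86_64 intrinsics.
    dot_scalar(a, b)
}

Any unsafe introduced here falls under NFR-005.1 and must be audited.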
2.2 Non-Functional Requirements
NFR-001: Performance
NFR-001.1: Latency
- Requirement: p95 latency <200ms for 1000-neighbor attention
- Measurement: Benchmark suite with synthetic graphs
- Verification: CI/CD performance regression tests
NFR-001.2: Throughput
- Requirement: 5,000 attention operations per second
- Measurement: Batch processing benchmarks
- Verification: Load testing with real HNSW graphs
NFR-001.3: Memory
- Requirement: Peak memory <50MB per operation
- Measurement: Memory profiling with valgrind/heaptrack
- Verification: Memory regression tests in CI
NFR-001.4: Scalability
- Requirement: Linear scaling up to 10K neighbors
- Measurement: Complexity analysis and empirical benchmarks
- Verification: Big-O complexity proofs + empirical validation
NFR-002: Reliability
NFR-002.1: Numerical Stability
- Requirement: All outputs finite (no NaN, Inf) across 10M operations
- Measurement: Fuzzing with random inputs
- Verification: Property-based testing with proptest
NFR-002.2: Error Handling
- Requirement: All errors recoverable, 100% error path coverage
- Measurement: Error injection testing
- Verification: Unit tests for error cases
NFR-002.3: Determinism
- Requirement: Same inputs produce same outputs (no random behavior)
- Measurement: Repeated execution tests
- Verification: Determinism tests in CI
NFR-003: Maintainability
NFR-003.1: Code Quality
- Requirement: Clippy clean (zero warnings), rustfmt formatted
- Measurement: CI linting checks
- Verification: Pre-commit hooks + CI gates
NFR-003.2: Documentation
- Requirement: 100% public API documented with examples
- Measurement: rustdoc coverage tool
- Verification: Doc tests pass, examples compile
NFR-003.3: Test Coverage
- Requirement: >90% line coverage, >95% branch coverage
- Measurement: cargo-tarpaulin
- Verification: CI coverage reports
NFR-004: Portability
NFR-004.1: Platform Support
- Requirement: Linux, macOS, Windows support
- Measurement: CI testing on all platforms
- Verification: Cross-platform integration tests
NFR-004.2: WASM Compatibility
- Requirement: Full functionality in WASM (wasm32-unknown-unknown)
- Measurement: WASM-specific test suite
- Verification: Browser and Node.js WASM tests
NFR-004.3: NAPI-RS Support
- Requirement: All attention mechanisms callable from Node.js
- Measurement: Node.js integration tests
- Verification: NPM package smoke tests
NFR-005: Security
NFR-005.1: Memory Safety
- Requirement: Zero unsafe code blocks (or 100% audited unsafe)
- Measurement: Manual code review
- Verification: MIRI checks, cargo-geiger
NFR-005.2: Dependency Audit
- Requirement: All dependencies audited, no known CVEs
- Measurement: cargo-audit
- Verification: Automated dependency scanning in CI
2.3 Constraints
C-001: Compatibility Constraints
- Rust Version: MSRV 1.77+ (per workspace configuration)
- No GPU: All implementations must run on CPU (WASM/NAPI-RS requirement)
- No Standard Library in WASM: Must support `#![no_std]` for the wasm32 target (see the sketch below)
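A common way to satisfy this constraint in lib.rs, assuming the std feature flag from §6.4 (a sketch, not the crate's final attribute set):

// Compile without the standard library unless the `std` feature is enabled.
#![cfg_attr(not(feature = "std"), no_std)]

// Vec, Box and friends come from `alloc` in no_std builds.
#[cfg(not(feature = "std"))]
extern crate alloc;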
C-002: API Constraints
- Backwards Compatibility: Once 1.0 released, follow SemVer strictly
- Trait Consistency: All attention mechanisms implement the common `Attention` trait
- Builder Pattern: Configuration via builders, not constructors
C-003: Performance Constraints
- Compilation Time: Release build <60s on CI runners
- Binary Size: WASM bundle <2MB gzipped
- Memory Footprint: No global allocators, stack-preferred where possible
C-004: Licensing Constraints
- License: MIT (per workspace)
- Dependency Licenses: MIT/Apache-2.0 only (no GPL/LGPL)
3. Module Architecture
3.1 Crate Structure
ruvector-attention/
├── Cargo.toml
├── README.md
├── LICENSE
│
├── src/
│ ├── lib.rs # Public API, re-exports
│ │
│ ├── core/ # Core attention primitives
│ │ ├── mod.rs # Core module exports
│ │ ├── base.rs # Attention trait definition
│ │ ├── scaled_dot.rs # Scaled dot-product attention
│ │ ├── multi_head.rs # Multi-head attention
│ │ └── config.rs # Configuration structs
│ │
│ ├── geometric/ # Geometric attention
│ │ ├── mod.rs
│ │ ├── hyperbolic.rs # Poincaré ball attention
│ │ ├── edge_featured.rs # GAT-style edge attention
│ │ ├── mixed_curvature.rs # Euclidean + Hyperbolic
│ │ └── manifold.rs # General manifold attention
│ │
│ ├── sparse/ # Sparse patterns
│ │ ├── mod.rs
│ │ ├── local_global.rs # Longformer-style
│ │ ├── linear.rs # Performer/FAVOR+
│ │ ├── flash.rs # Flash Attention (tiled)
│ │ └── patterns.rs # Sparsity pattern utilities
│ │
│ ├── graph/ # Graph-aware attention
│ │ ├── mod.rs
│ │ ├── rope_graph.rs # RoPE for graph distances
│ │ ├── cross_space.rs # Dual-space cross-attention
│ │ ├── hnsw_aware.rs # HNSW layer encoding
│ │ └── structural.rs # Degree/centrality features
│ │
│ ├── adaptive/ # Adaptive/learned mechanisms
│ │ ├── mod.rs
│ │ ├── moe.rs # Mixture of Experts
│ │ ├── learned_routing.rs # Attention routing
│ │ ├── rl_navigator.rs # RL-based graph navigation
│ │ └── dynamic_heads.rs # Adaptive head count
│ │
│ ├── training/ # Training utilities
│ │ ├── mod.rs
│ │ ├── losses.rs # Contrastive, reconstruction
│ │ ├── optimizers.rs # SGD, Adam, etc.
│ │ ├── regularizers.rs # Spectral, L2, etc.
│ │ ├── curriculum.rs # Curriculum learning
│ │ └── hard_negatives.rs # Negative sampling
│ │
│ ├── compression/ # Tensor compression
│ │ ├── mod.rs
│ │ ├── quantization.rs # INT8/INT4 quantization
│ │ ├── low_rank.rs # SVD/Tucker decomposition
│ │ ├── sparse_storage.rs # CSR/COO sparse tensors
│ │ └── hierarchical.rs # Layer-wise compression
│ │
│ ├── simd/ # SIMD optimizations
│ │ ├── mod.rs
│ │ ├── avx2.rs # AVX2 kernels
│ │ ├── avx512.rs # AVX-512 kernels
│ │ ├── neon.rs # ARM NEON kernels
│ │ ├── wasm_simd.rs # WASM SIMD
│ │ └── dispatch.rs # Runtime detection
│ │
│ ├── utils/ # Utilities
│ │ ├── mod.rs
│ │ ├── math.rs # Math primitives
│ │ ├── tensor.rs # Tensor ops
│ │ ├── softmax.rs # Numerically stable softmax
│ │ └── distances.rs # Distance metrics
│ │
│ └── prelude.rs # Common imports
│
├── benches/ # Benchmarks
│ ├── attention_benchmark.rs # Core attention benchmarks
│ ├── geometric_benchmark.rs # Geometric attention
│ ├── sparse_benchmark.rs # Sparse patterns
│ └── comparison_benchmark.rs # Mechanism comparison
│
├── tests/ # Integration tests
│ ├── core_tests.rs # Core attention tests
│ ├── geometric_tests.rs # Geometric tests
│ ├── sparse_tests.rs # Sparse pattern tests
│ ├── numerical_stability.rs # Stability tests
│ └── property_tests.rs # Property-based tests
│
├── ffi/ # Foreign Function Interface
│ ├── wasm/ # WASM bindings
│ │ ├── Cargo.toml
│ │ ├── src/
│ │ │ └── lib.rs # wasm-bindgen exports
│ │ └── tests/
│ │ └── web.rs # Browser tests
│ │
│ └── napi/ # NAPI-RS bindings
│ ├── Cargo.toml
│ ├── src/
│ │ └── lib.rs # napi-derive exports
│ └── index.d.ts # TypeScript definitions
│
├── cli/ # CLI interface
│ ├── Cargo.toml
│ └── src/
│ ├── main.rs # CLI entry point
│ ├── commands/ # CLI commands
│ │ ├── benchmark.rs # Run benchmarks
│ │ ├── compare.rs # Compare mechanisms
│ │ └── analyze.rs # Analyze attention patterns
│ └── output.rs # Formatting
│
├── examples/ # Examples
│ ├── basic_attention.rs # Hello world
│ ├── graph_attention.rs # Graph-aware usage
│ ├── hnsw_integration.rs # HNSW integration
│ ├── custom_mechanism.rs # Extending the library
│ └── distributed_attention.rs # Parallel processing
│
└── docs/ # Documentation
├── design/ # Design documents
│ ├── architecture.md # Architecture overview
│ ├── api_design.md # API design rationale
│ └── performance.md # Performance analysis
│
├── guides/ # User guides
│ ├── getting_started.md # Quick start
│ ├── mechanism_guide.md # Choosing mechanisms
│ └── integration.md # Integration guide
│
└── research/ # Research notes
├── attention_mechanisms.md
├── benchmarks.md
└── experiments.md
3.2 Module Responsibilities
Core Module (src/core/)
Responsibility: Foundational attention mechanisms and trait definitions
Key Components:
- `Attention` trait: Common interface for all mechanisms
- `ScaledDotProductAttention`: Baseline implementation
- `MultiHeadAttention`: Standard multi-head decomposition
- `AttentionConfig`: Configuration builders
Dependencies: utils only
Geometric Module (src/geometric/)
Responsibility: Geometry-aware attention mechanisms
Key Components:
- `HyperbolicAttention`: Poincaré ball operations
- `EdgeFeaturedAttention`: GAT-style with edge features
- `MixedCurvatureAttention`: Product space (Euclidean × Hyperbolic)
Dependencies: core, utils
Sparse Module (src/sparse/)
Responsibility: Efficient sparse attention patterns
Key Components:
- `LocalGlobalAttention`: Longformer-style
- `LinearAttention`: Kernel-based approximation
- `FlashAttention`: Memory-efficient tiling
Dependencies: core, utils, simd (optional)
Graph Module (src/graph/)
Responsibility: Graph structure-aware mechanisms
Key Components:
- `GraphRoPE`: Rotary embeddings for graph distance
- `CrossSpaceAttention`: Dual topology + latent space
- `HNSWAwareAttention`: HNSW layer encoding
Dependencies: core, geometric, utils
Adaptive Module (src/adaptive/)
Responsibility: Learned and adaptive attention
Key Components:
- `MoEAttention`: Mixture of experts routing
- `RLNavigator`: Reinforcement learning-based navigation
- `DynamicHeadAttention`: Runtime head count adjustment
Dependencies: core, geometric, sparse, graph, training
Training Module (src/training/)
Responsibility: Loss functions and optimization
Key Components:
- `ContrastiveLoss`: InfoNCE, Triplet
- `SpectralRegularizer`: Laplacian smoothness
- `HardNegativeSampler`: Mining hard negatives
- `CurriculumScheduler`: Loss weight scheduling
Dependencies: utils only
Compression Module (src/compression/)
Responsibility: Memory-efficient tensor storage
Key Components:
- `Quantizer`: INT8/INT4 quantization
- `LowRankFactorizer`: SVD compression
- `SparseStorage`: CSR/COO formats
Dependencies: utils, simd (optional)
SIMD Module (src/simd/)
Responsibility: Vectorized operations
Key Components:
- `SimdDispatcher`: Runtime CPU feature detection
- Platform-specific kernels: AVX2, AVX-512, NEON, WASM SIMD
Dependencies: utils only
3.3 Dependency Graph
┌─────────────────────────────────────────────────────────────┐
│ Public API (lib.rs) │
└─────────────────────────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
v v v
┌──────────┐ ┌──────────┐ ┌──────────┐
│ core │ │ training │ │ utils │
└──────────┘ └──────────┘ └──────────┘
│ │
└────────┬───────────┬─────────┬────────┘
│ │ │
v v v
┌──────────┐ ┌─────────┐ ┌──────┐
│geometric │ │ sparse │ │ simd │
└──────────┘ └─────────┘ └──────┘
│ │
└─────┬─────┘
│
v
┌──────────┐
│ graph │
└──────────┘
│
v
┌──────────┐
│ adaptive │
└──────────┘
│
v
┌─────────────┐
│ compression │
└─────────────┘
Design Principles:
- Acyclic: No circular dependencies
- Layered: Lower layers have fewer dependencies
- Optional Features: SIMD, compression via feature flags
- Core Stability: `core` and `utils` are the most stable modules
4. API Design
4.1 Core Trait: Attention
/// Core trait for all attention mechanisms
pub trait Attention: Send + Sync {
/// Forward pass: compute attention over keys/values given query
///
/// # Arguments
/// * `query` - Query vector (d-dimensional)
/// * `keys` - Key vectors (n × d)
/// * `values` - Value vectors (n × d)
///
/// # Returns
/// Attention-weighted aggregation of values (d-dimensional)
///
/// # Example
/// ```
/// use ruvector_attention::core::ScaledDotProductAttention;
/// use ruvector_attention::Attention;
///
/// let attn = ScaledDotProductAttention::new(128).unwrap();
/// let query = vec![1.0; 128];
/// let keys = vec![vec![0.5; 128]; 10];
/// let values = keys.clone();
///
/// let output = attn.forward(&query, &keys, &values).unwrap();
/// assert_eq!(output.len(), 128);
/// ```
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError>;
/// Get attention weights without computing weighted sum
///
/// Useful for visualization and debugging
fn attention_weights(
&self,
query: &[f32],
keys: &[Vec<f32>],
) -> Result<Vec<f32>, AttentionError>;
/// Get hidden dimension
fn hidden_dim(&self) -> usize;
/// Check if mechanism supports variable-length inputs
fn supports_variable_length(&self) -> bool {
true
}
/// Estimated computational complexity (for documentation)
fn complexity(&self) -> Complexity {
Complexity::Quadratic
}
}
/// Computational complexity categories
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity {
Linear, // O(n)
Linearithmic, // O(n log n)
Quadratic, // O(n²)
Custom(&'static str), // Custom complexity description
}
4.2 Error Handling
/// Errors that can occur during attention computation
#[derive(Debug, thiserror::Error)]
pub enum AttentionError {
/// Input dimension mismatch
#[error("Dimension mismatch: expected {expected}, got {actual}")]
DimensionMismatch { expected: usize, actual: usize },
/// Empty input
#[error("Empty input: {context}")]
EmptyInput { context: String },
/// Numerical instability detected
#[error("Numerical instability: {message}")]
NumericalInstability { message: String },
/// Invalid configuration
#[error("Invalid configuration: {message}")]
InvalidConfig { message: String },
/// Out of bounds access
#[error("Index out of bounds: {index} >= {len}")]
OutOfBounds { index: usize, len: usize },
/// Unsupported operation
#[error("Unsupported operation: {operation}")]
Unsupported { operation: String },
/// Internal error
#[error("Internal error: {message}")]
Internal { message: String },
}
pub type Result<T> = std::result::Result<T, AttentionError>;
4.3 Builder Pattern
/// Builder for ScaledDotProductAttention
#[derive(Debug, Clone)]
pub struct ScaledDotProductAttentionBuilder {
hidden_dim: usize,
dropout: Option<f32>,
temperature: f32,
normalize: bool,
}
impl ScaledDotProductAttentionBuilder {
pub fn new(hidden_dim: usize) -> Self {
Self {
hidden_dim,
dropout: None,
temperature: 1.0,
normalize: true,
}
}
pub fn dropout(mut self, rate: f32) -> Self {
assert!((0.0..=1.0).contains(&rate), "Dropout must be in [0, 1]");
self.dropout = Some(rate);
self
}
pub fn temperature(mut self, temp: f32) -> Self {
assert!(temp > 0.0, "Temperature must be positive");
self.temperature = temp;
self
}
pub fn normalize(mut self, normalize: bool) -> Self {
self.normalize = normalize;
self
}
pub fn build(self) -> Result<ScaledDotProductAttention> {
if self.hidden_dim == 0 {
return Err(AttentionError::InvalidConfig {
message: "hidden_dim must be > 0".to_string(),
});
}
Ok(ScaledDotProductAttention {
hidden_dim: self.hidden_dim,
scale: (self.hidden_dim as f32).sqrt().recip(),
dropout: self.dropout,
temperature: self.temperature,
normalize: self.normalize,
})
}
}
// Usage:
let attn = ScaledDotProductAttention::builder(128)
.dropout(0.1)
.temperature(0.07)
.build()?;
4.4 Multi-Head Attention API
/// Multi-head attention with configurable heads
pub struct MultiHeadAttention {
num_heads: usize,
head_dim: usize,
hidden_dim: usize,
w_q: Vec<Linear>, // Query projections per head
w_k: Vec<Linear>, // Key projections per head
w_v: Vec<Linear>, // Value projections per head
w_o: Linear, // Output projection
dropout: Option<f32>,
}
impl MultiHeadAttention {
pub fn builder(hidden_dim: usize, num_heads: usize) -> MultiHeadAttentionBuilder {
MultiHeadAttentionBuilder::new(hidden_dim, num_heads)
}
/// Get attention patterns for all heads
pub fn head_attention_weights(
&self,
query: &[f32],
keys: &[Vec<f32>],
) -> Result<Vec<Vec<f32>>> {
// Returns [num_heads × num_keys] attention weights
// Useful for interpretability
}
/// Get specific head output
pub fn head_output(
&self,
head_idx: usize,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// Get output of a single head (for debugging)
}
}
impl Attention for MultiHeadAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Project to heads: Q_i, K_i, V_i for each head i
// 2. Compute attention per head: head_i = Attention(Q_i, K_i, V_i)
// 3. Concatenate heads: concat(head_1, ..., head_h)
// 4. Output projection: W_o @ concat
}
}
4.5 Geometric Attention API
/// Hyperbolic attention in Poincaré ball
pub struct HyperbolicAttention {
hidden_dim: usize,
curvature: f32, // Negative curvature (e.g., -1.0)
w_q: Linear,
w_k: Linear,
w_v: Linear,
}
impl HyperbolicAttention {
/// Create new hyperbolic attention
///
/// # Arguments
/// * `hidden_dim` - Embedding dimension
/// * `curvature` - Curvature of hyperbolic space (must be negative)
pub fn new(hidden_dim: usize, curvature: f32) -> Result<Self> {
if curvature >= 0.0 {
return Err(AttentionError::InvalidConfig {
message: "Hyperbolic curvature must be negative".to_string(),
});
}
// ...
}
/// Poincaré distance between two points
pub fn poincare_distance(&self, x: &[f32], y: &[f32]) -> f32 {
// d(x,y) = arccosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
}
/// Möbius addition (hyperbolic vector addition)
pub fn mobius_add(&self, x: &[f32], y: &[f32]) -> Vec<f32> {
// ((1+2⟨x,y⟩+||y||²)x + (1-||x||²)y) / (1+2⟨x,y⟩+||x||²||y||²)
}
/// Project point onto Poincaré ball (clip to ||x|| < 1)
pub fn project_to_ball(&self, x: &mut [f32], eps: f32) {
let norm = l2_norm(x);
if norm >= 1.0 - eps {
let scale = (1.0 - eps) / norm;
for xi in x.iter_mut() {
*xi *= scale;
}
}
}
}
impl Attention for HyperbolicAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Compute hyperbolic similarities: -d_poincare(q, k_j)
// 2. Softmax attention weights
// 3. Aggregate in hyperbolic space via Möbius operations
}
}
4.6 Graph-Aware Attention API
/// Attention with graph-specific features
pub trait GraphAttention: Attention {
/// Forward pass with edge features
fn forward_with_edges(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
edge_features: &[Vec<f32>],
) -> Result<Vec<f32>>;
/// Forward pass with graph metadata
fn forward_with_metadata(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
metadata: &GraphMetadata,
) -> Result<Vec<f32>>;
}
/// Graph metadata for attention
#[derive(Debug, Clone, Default)]
pub struct GraphMetadata {
/// Graph distances (e.g., shortest path lengths)
pub distances: Option<Vec<f32>>,
/// HNSW layer indices
pub hnsw_layers: Option<Vec<usize>>,
/// Edge weights
pub edge_weights: Option<Vec<f32>>,
/// Structural features (degree, centrality, etc.)
pub structural_features: Option<Vec<Vec<f32>>>,
}
/// RoPE-enhanced attention for graphs
pub struct GraphRoPE {
hidden_dim: usize,
base: f32, // Frequency base (default 10000)
w_q: Linear,
w_k: Linear,
w_v: Linear,
}
impl GraphRoPE {
/// Apply rotation based on graph distance
pub fn apply_rotation(&self, embedding: &[f32], distance: f32) -> Vec<f32> {
// Rotate embedding by angle proportional to distance
}
}
impl GraphAttention for GraphRoPE {
fn forward_with_metadata(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
metadata: &GraphMetadata,
) -> Result<Vec<f32>> {
let distances = metadata.distances.as_ref()
.ok_or_else(|| AttentionError::InvalidConfig {
message: "GraphRoPE requires distance metadata".to_string(),
})?;
// Apply rotations based on distances
// Compute attention with rotated embeddings
}
}
4.7 Adaptive Attention API
/// Mixture of Experts attention
pub struct MoEAttention {
router: Linear, // Maps query to expert scores
experts: Vec<Box<dyn Attention>>,
top_k: usize, // Number of experts to activate
}
impl MoEAttention {
pub fn builder() -> MoEAttentionBuilder {
MoEAttentionBuilder::new()
}
/// Get routing decisions
pub fn get_routing(
&self,
query: &[f32],
) -> Result<Vec<(usize, f32)>> {
// Returns (expert_index, weight) pairs
}
/// Add an expert to the mixture
pub fn add_expert(&mut self, expert: Box<dyn Attention>) {
self.experts.push(expert);
}
}
impl Attention for MoEAttention {
fn forward(
&self,
query: &[f32],
keys: &[Vec<f32>],
values: &[Vec<f32>],
) -> Result<Vec<f32>> {
// 1. Route: scores = Router(query)
// 2. Select top-k experts
// 3. Weighted combination of expert outputs
}
}
/// Builder for MoE attention
pub struct MoEAttentionBuilder {
router_hidden_dim: usize,
experts: Vec<Box<dyn Attention>>,
top_k: usize,
}
impl MoEAttentionBuilder {
pub fn add_local_expert(mut self, hidden_dim: usize) -> Self {
self.experts.push(Box::new(
ScaledDotProductAttention::new(hidden_dim).unwrap()
));
self
}
pub fn add_hyperbolic_expert(mut self, hidden_dim: usize, curvature: f32) -> Self {
self.experts.push(Box::new(
HyperbolicAttention::new(hidden_dim, curvature).unwrap()
));
self
}
pub fn add_sparse_expert(mut self, local_window: usize, global_nodes: usize) -> Self {
self.experts.push(Box::new(
LocalGlobalAttention::new(local_window, global_nodes).unwrap()
));
self
}
pub fn top_k(mut self, k: usize) -> Self {
self.top_k = k;
self
}
pub fn build(self) -> Result<MoEAttention> {
// Validation and construction
}
}
4.8 Training Utilities API
/// Contrastive loss functions
pub mod losses {
/// InfoNCE contrastive loss
pub fn info_nce(
anchor: &[f32],
positives: &[&[f32]],
negatives: &[&[f32]],
temperature: f32,
) -> f32;
/// Triplet loss
pub fn triplet(
anchor: &[f32],
positive: &[f32],
negative: &[f32],
margin: f32,
) -> f32;
/// Local contrastive loss (graph-specific)
pub fn local_contrastive(
node_embedding: &[f32],
neighbor_embeddings: &[Vec<f32>],
non_neighbor_embeddings: &[Vec<f32>],
temperature: f32,
) -> f32;
}
/// Hard negative mining
pub mod hard_negatives {
pub enum SamplingStrategy {
Distance, // Most similar non-neighbors
Degree, // Similar degree distribution
Mixed, // Combination
}
pub fn sample_hard_negatives(
anchor: &[f32],
all_embeddings: &[Vec<f32>],
positive_indices: &[usize],
k: usize,
strategy: SamplingStrategy,
) -> Vec<Vec<f32>>;
}
/// Spectral regularization
pub mod regularizers {
/// Laplacian smoothness
pub fn laplacian(
embeddings: &[Vec<f32>],
edges: &[(usize, usize)],
edge_weights: Option<&[f32]>,
) -> f32;
/// Orthogonality regularization
pub fn orthogonality(embeddings: &[Vec<f32>]) -> f32;
/// Embedding norm regularization
pub fn norm_penalty(embeddings: &[Vec<f32>], target_norm: f32) -> f32;
}
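A minimal sketch of the info_nce signature above; using an unnormalized dot product as the similarity and averaging over the positives are assumptions:

/// InfoNCE: for each positive p,
/// -log( exp(sim(a,p)/t) / (exp(sim(a,p)/t) + Σ_i exp(sim(a,n_i)/t)) ),
/// averaged over the positives.
pub fn info_nce(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32 {
    let dot = |a: &[f32], b: &[f32]| a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>();
    let neg_sum: f32 = negatives
        .iter()
        .map(|&n| (dot(anchor, n) / temperature).exp())
        .sum();
    let mut loss = 0.0;
    for &p in positives {
        let pos = (dot(anchor, p) / temperature).exp();
        loss += -(pos / (pos + neg_sum)).ln();
    }
    loss / positives.len().max(1) as f32
}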
5. Performance Targets
5.1 Latency Targets
| Operation | Input Size | p50 | p95 | p99 |
|---|---|---|---|---|
| Scaled Dot-Product | 100 neighbors | <5ms | <10ms | <20ms |
| Scaled Dot-Product | 1K neighbors | <50ms | <100ms | <150ms |
| Multi-Head (4 heads) | 100 neighbors | <10ms | <20ms | <30ms |
| Multi-Head (4 heads) | 1K neighbors | <80ms | <150ms | <200ms |
| Hyperbolic | 100 neighbors | <15ms | <30ms | <50ms |
| Sparse (Local+Global) | 1K neighbors | <30ms | <60ms | <100ms |
| Flash Attention | 1K neighbors | <40ms | <80ms | <120ms |
| MoE (4 experts, top-2) | 1K neighbors | <100ms | <180ms | <250ms |
Measurement Method: Criterion.rs benchmarks with 1000 iterations, warm cache
5.2 Throughput Targets
| Mechanism | Target (ops/sec) | Stretch (ops/sec) |
|---|---|---|
| Scaled Dot-Product | 10,000 | 20,000 |
| Multi-Head | 5,000 | 10,000 |
| Hyperbolic | 3,000 | 6,000 |
| Sparse | 8,000 | 15,000 |
| Flash | 7,000 | 12,000 |
Measurement: Batch processing of 1000 operations, averaged over 10 runs
5.3 Memory Targets
| Mechanism | Peak Memory (1K neighbors) | Target | Stretch |
|---|---|---|---|
| Scaled Dot-Product | Full attention matrix | <50MB | <25MB |
| Multi-Head (4 heads) | 4× attention matrices | <100MB | <50MB |
| Flash Attention | Tiled computation | <20MB | <10MB |
| Sparse | Sparse patterns only | <15MB | <8MB |
Measurement: Valgrind/heaptrack during benchmark execution
5.4 Compilation Targets
| Configuration | Target | Stretch |
|---|---|---|
| Debug build | <10s | <5s |
| Release build (--release) | <60s | <30s |
| Release with LTO | <120s | <60s |
| WASM build | <90s | <45s |
Measurement: CI build times on GitHub Actions standard runners
5.5 Binary Size Targets
| Build Target | Size (uncompressed) | Size (gzipped) | Target | Stretch |
|---|---|---|---|---|
| WASM | 5-8 MB | 1.5-2 MB | <2MB | <1MB |
| Native (Linux x86_64) | 10-15 MB | N/A | <15MB | <10MB |
| NAPI-RS addon | 8-12 MB | N/A | <12MB | <8MB |
Measurement: wasm-opt for WASM, strip for native
5.6 Scalability Targets
Linear Scaling:
- Operations should scale O(n) or better up to 10K neighbors
- No quadratic blowup in standard use cases
Benchmark:
use std::time::Instant;

#[test]
fn bench_scalability_attention() {
    for n in [100, 500, 1000, 5000, 10000] {
        let attn = ScaledDotProductAttention::new(128).unwrap();
        let query = vec![1.0; 128];
        let keys = vec![vec![1.0; 128]; n];
        let values = keys.clone();
        let start = Instant::now();
        for _ in 0..100 {
            attn.forward(&query, &keys, &values).unwrap();
        }
        println!("n={}: {:?} for 100 iterations", n, start.elapsed());
        // Compare elapsed times across n to assert linear or sub-quadratic scaling
    }
}
6. Compatibility Matrix
6.1 Rust Version Support
| Rust Version | Support Status | Notes |
|---|---|---|
| 1.77.0 (MSRV) | ✅ Supported | Minimum supported version |
| 1.78.x | ✅ Supported | |
| 1.79.x | ✅ Supported | |
| 1.80.x+ | ✅ Supported | Latest stable |
| Nightly | ⚠️ Best-effort | May use unstable features behind flags |
Testing: CI runs on MSRV, stable, and nightly
6.2 Platform Support
Desktop Platforms
| Platform | Tier | Support Status | CI Testing |
|---|---|---|---|
| Linux x86_64 | Tier 1 | ✅ Full support | Yes |
| Linux ARM64 | Tier 2 | ✅ Full support | Yes |
| macOS x86_64 | Tier 1 | ✅ Full support | Yes |
| macOS ARM64 (M1/M2) | Tier 1 | ✅ Full support | Yes |
| Windows x86_64 | Tier 1 | ✅ Full support | Yes |
| Windows ARM64 | Tier 3 | ⚠️ Best-effort | No |
WASM Targets
| Target | Support Status | Notes |
|---|---|---|
| wasm32-unknown-unknown | ✅ Full support | Browser + Node.js |
| wasm32-wasi | ✅ Full support | WASI runtime |
| wasm32-unknown-emscripten | ⚠️ Untested | Should work |
WASM Features:
- ✅ All attention mechanisms
- ✅ SIMD support (where available)
- ✅ Multi-threading via Web Workers
- ❌ File I/O (not needed)
Mobile Platforms
| Platform | Support Status | Notes |
|---|---|---|
| iOS ARM64 | ⚠️ Untested | Should work via FFI |
| Android ARM64 | ⚠️ Untested | Should work via FFI |
6.3 Node.js Support (NAPI-RS)
| Node.js Version | Support Status | Notes |
|---|---|---|
| 18.x LTS | ✅ Supported | NAPI-RS requires N-API 9+ |
| 20.x LTS | ✅ Supported | Recommended |
| 21.x+ Current | ✅ Supported | Latest features |
NAPI-RS Features:
- ✅ All attention mechanisms exposed
- ✅ TypeScript definitions
- ✅ Async operations (Tokio runtime)
- ✅ Buffer zero-copy where possible
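A sketch of what one NAPI-RS export could look like, using plain number[] arguments for brevity (a production binding would prefer Float32Array and zero-copy buffers as noted above; the function name and error mapping are assumptions):

use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::core::ScaledDotProductAttention;
use ruvector_attention::Attention;

/// Expose scaled dot-product attention to Node.js, mapping crate errors
/// to JavaScript exceptions.
#[napi]
pub fn scaled_dot_product_forward(
    query: Vec<f64>,
    keys: Vec<Vec<f64>>,
    values: Vec<Vec<f64>>,
) -> Result<Vec<f64>> {
    let q: Vec<f32> = query.iter().map(|&x| x as f32).collect();
    let ks: Vec<Vec<f32>> = keys.iter().map(|row| row.iter().map(|&x| x as f32).collect()).collect();
    let vs: Vec<Vec<f32>> = values.iter().map(|row| row.iter().map(|&x| x as f32).collect()).collect();
    let attn = ScaledDotProductAttention::new(q.len())
        .map_err(|e| Error::from_reason(e.to_string()))?;
    let out = attn.forward(&q, &ks, &vs)
        .map_err(|e| Error::from_reason(e.to_string()))?;
    Ok(out.into_iter().map(|x| x as f64).collect())
}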
Package Platforms:
{
"napi": {
"triples": {
"defaults": true,
"additional": [
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-gnu",
"aarch64-apple-darwin",
"x86_64-pc-windows-msvc"
]
}
}
}
6.4 Feature Flags
| Feature | Default | Description | Dependencies |
|---|---|---|---|
| `std` | ✅ | Standard library support | None |
| `simd` | ❌ | SIMD optimizations | `std` |
| `rayon` | ❌ | Parallel processing | `std`, `rayon` |
| `compression` | ❌ | Tensor compression | `std` |
| `wasm` | ❌ | WASM-specific bindings | `wasm-bindgen` |
| `napi` | ❌ | Node.js bindings | `napi-rs` |
| `cli` | ❌ | CLI interface | `std`, `clap` |
| `serde` | ✅ | Serialization support | `serde` |
Example:
[dependencies]
ruvector-attention = { version = "0.1", features = ["simd", "rayon"] }
7. Testing Strategy
7.1 Unit Tests
Coverage Target: >90% line coverage, >95% branch coverage
Test Categories:
7.1.1 Correctness Tests
#[cfg(test)]
mod correctness_tests {
use super::*;
#[test]
fn test_attention_output_dimension() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let output = attn.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
#[test]
fn test_attention_weights_sum_to_one() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let weights = attn.attention_weights(&query, &keys).unwrap();
let sum: f32 = weights.iter().sum();
assert!((sum - 1.0).abs() < 1e-5);
}
#[test]
fn test_empty_neighbors_handling() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let result = attn.forward(&query, &[], &[]);
assert!(result.is_err());
assert!(matches!(result, Err(AttentionError::EmptyInput { .. })));
}
}
7.1.2 Numerical Stability Tests
#[cfg(test)]
mod stability_tests {
#[test]
fn test_large_scores_softmax() {
// Test softmax with very large scores (overflow risk)
let scores = vec![1000.0, 999.0, 998.0];
let weights = softmax(&scores);
assert!(weights.iter().all(|&w| w.is_finite()));
}
#[test]
fn test_small_scores_softmax() {
// Test softmax with very small scores (underflow risk)
let scores = vec![-1000.0, -999.0, -998.0];
let weights = softmax(&scores);
assert!(weights.iter().all(|&w| w.is_finite()));
}
#[test]
fn test_hyperbolic_boundary() {
let attn = HyperbolicAttention::new(128, -1.0).unwrap();
let query = vec![0.99 / (128.0f32).sqrt(); 128]; // Near the ball boundary: ||query|| ≈ 0.99
let output = attn.forward(&query, &keys, &values).unwrap();
// Output must stay inside ball
assert!(l2_norm(&output) < 1.0);
}
}
7.1.3 Edge Case Tests
#[cfg(test)]
mod edge_case_tests {
#[test]
fn test_single_neighbor() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let keys = vec![vec![1.0; 128]];
let output = attn.forward(&query, &keys, &keys).unwrap();
// With single neighbor, attention weight should be 1.0
}
#[test]
fn test_identical_keys() {
// All keys identical -> uniform attention
let keys = vec![vec![1.0; 128]; 10];
let weights = attn.attention_weights(&query, &keys).unwrap();
for w in &weights {
assert!((w - 0.1).abs() < 1e-5); // 1/10
}
}
#[test]
fn test_zero_vectors() {
let query = vec![0.0; 128];
let keys = vec![vec![0.0; 128]; 10];
let result = attn.forward(&query, &keys, &keys);
// Should handle gracefully (may return error or uniform weights)
}
}
7.2 Integration Tests
Goal: Test interactions between modules
7.2.1 Multi-Mechanism Pipeline
#[test]
fn test_moe_with_multiple_experts() {
let moe = MoEAttention::builder()
.add_local_expert(128)
.add_hyperbolic_expert(128, -1.0)
.add_sparse_expert(10, 5)
.top_k(2)
.build()
.unwrap();
let output = moe.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
7.2.2 Graph Attention with HNSW
#[test]
fn test_graph_rope_with_hnsw_layers() {
let rope = GraphRoPE::new(128, 10000.0).unwrap();
let metadata = GraphMetadata {
distances: Some(vec![1.0, 2.0, 3.0]),
hnsw_layers: Some(vec![0, 1, 2]),
..Default::default()
};
let output = rope.forward_with_metadata(
&query, &keys, &values, &metadata
).unwrap();
assert_eq!(output.len(), 128);
}
7.3 Property-Based Tests
Tool: proptest
use proptest::prelude::*;
proptest! {
#[test]
fn prop_attention_weights_normalized(
query in prop::collection::vec(-10.0f32..10.0, 128),
keys in prop::collection::vec(
prop::collection::vec(-10.0f32..10.0, 128),
1..100
)
) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let weights = attn.attention_weights(&query, &keys).unwrap();
let sum: f32 = weights.iter().sum();
prop_assert!((sum - 1.0).abs() < 1e-4);
}
#[test]
fn prop_attention_output_finite(
query in prop::collection::vec(-100.0f32..100.0, 128),
keys in prop::collection::vec(
prop::collection::vec(-100.0f32..100.0, 128),
1..100
)
) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let values = keys.clone();
let output = attn.forward(&query, &keys, &values).unwrap();
prop_assert!(output.iter().all(|&x| x.is_finite()));
}
}
7.4 Benchmark Tests
Tool: criterion
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_scaled_dot_product(c: &mut Criterion) {
let attn = ScaledDotProductAttention::new(128).unwrap();
let query = vec![1.0; 128];
let keys = vec![vec![1.0; 128]; 1000];
let values = keys.clone();
c.bench_function("scaled_dot_product_1k", |b| {
b.iter(|| {
attn.forward(
black_box(&query),
black_box(&keys),
black_box(&values)
)
})
});
}
fn bench_multi_head(c: &mut Criterion) {
let mut group = c.benchmark_group("multi_head_attention");
for num_heads in [1, 2, 4, 8] {
let attn = MultiHeadAttention::builder(128, num_heads)
.build()
.unwrap();
group.bench_function(format!("heads_{}", num_heads), |b| {
b.iter(|| {
attn.forward(
black_box(&query),
black_box(&keys),
black_box(&values)
)
})
});
}
group.finish();
}
criterion_group!(benches, bench_scaled_dot_product, bench_multi_head);
criterion_main!(benches);
7.5 Fuzzing
Tool: cargo-fuzz
#![no_main]
use libfuzzer_sys::fuzz_target;
use ruvector_attention::core::ScaledDotProductAttention;
use ruvector_attention::Attention;
fuzz_target!(|data: &[u8]| {
if data.len() < 512 {
return;
}
// Parse fuzzer input into query, keys, values
let query: Vec<f32> = data[0..128]
.chunks(4)
.map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
.collect();
// ... similar for keys and values
let attn = ScaledDotProductAttention::new(32).unwrap();
// Fuzz target: should never panic
let _ = attn.forward(&query, &keys, &values);
});
7.6 WASM Tests
#[cfg(target_arch = "wasm32")]
#[cfg(test)]
mod wasm_tests {
use wasm_bindgen_test::*;
#[wasm_bindgen_test]
fn test_attention_in_wasm() {
let attn = ScaledDotProductAttention::new(128).unwrap();
let output = attn.forward(&query, &keys, &values).unwrap();
assert_eq!(output.len(), 128);
}
#[wasm_bindgen_test]
fn test_simd_in_wasm() {
#[cfg(feature = "simd")]
{
// Test WASM SIMD operations
}
}
}
7.7 Performance Regression Tests
CI Check: Fail if performance degrades >5% from baseline
#[test]
fn test_performance_regression() {
let baseline_latency_ms = 100.0; // From previous run
let start = Instant::now();
let attn = ScaledDotProductAttention::new(128).unwrap();
for _ in 0..1000 {
attn.forward(&query, &keys, &values).unwrap();
}
let elapsed = start.elapsed().as_secs_f64() * 1000.0;
let current_latency_ms = elapsed / 1000.0;
let regression = (current_latency_ms - baseline_latency_ms) / baseline_latency_ms;
assert!(
regression < 0.05,
"Performance regression detected: {}%", regression * 100.0
);
}
8. Success Criteria
8.1 Quantifiable Metrics
8.1.1 Functional Completeness
- 10/10 attention mechanisms implemented (100%)
- All mechanisms pass unit tests (100% pass rate)
- Integration tests pass (100% pass rate)
8.1.2 Performance
- Latency: p95 <200ms @ 1K neighbors for all mechanisms
- Throughput: >5,000 ops/sec for scaled dot-product
- Memory: Peak usage <50MB per operation
- Scalability: Linear or sub-quadratic up to 10K neighbors
8.1.3 Quality
- Test coverage: >90% line coverage
- Documentation coverage: 100% public APIs documented
- Zero compiler warnings: Clippy clean
- Zero unsafe code: Or 100% audited and justified
8.1.4 Compatibility
- Platforms: Linux, macOS, Windows passing CI
- WASM: All tests pass in wasm32-unknown-unknown
- NAPI-RS: Node.js 18+, all platforms published
- MSRV: Rust 1.77+ supported
8.1.5 Adoption
- Examples: 5+ runnable examples
- Documentation: Getting started guide, API docs, tutorials
- Integration: Used in ruvector-gnn crate
8.2 Acceptance Tests
Phase 1 Acceptance (Weeks 1-4)
✅ Core attention mechanisms (scaled dot-product, multi-head)
✅ Unit tests passing (>80% coverage)
✅ Basic benchmarks established
✅ API design finalized
Phase 2 Acceptance (Weeks 5-8)
✅ Geometric attention (hyperbolic, edge-featured)
✅ Integration tests with graph structures
✅ Performance targets met for core mechanisms
✅ WASM compatibility verified
Phase 3 Acceptance (Weeks 9-12)
✅ Sparse mechanisms (flash, linear, local+global)
✅ Memory targets met
✅ NAPI-RS bindings complete
✅ Documentation 50% complete
Phase 4 Acceptance (Weeks 13-16)
✅ Adaptive mechanisms (MoE, cross-attention)
✅ Training utilities complete
✅ CLI interface functional
✅ All performance targets met
Phase 5 Acceptance (Weeks 17-20)
✅ Full integration with ruvector-gnn
✅ Documentation 100% complete
✅ Optimization passes complete
✅ Ready for 1.0 release
8.3 Release Criteria (v1.0)
Blocker Issues (must fix before release):
- Zero failing tests
- Zero compiler warnings
- All performance targets met
- 100% public API documented
- Security audit complete
- Cross-platform CI passing
Nice-to-Have (can defer to 1.1):
- GPU acceleration (CUDA/Metal)
- Additional attention variants
- Advanced SIMD optimizations
- Distributed attention
9. Constraints and Dependencies
9.1 Technical Constraints
C-001: No GPU Dependency
Constraint: All implementations must run on CPU Rationale: WASM and NAPI-RS environments lack GPU access Impact: May limit performance for very large graphs Mitigation: SIMD optimizations, algorithm choice (sparse/linear attention)
C-002: Memory Constraints in WASM
Constraint: WASM has limited memory (typically 2-4GB) Rationale: Browser and Node.js WASM environments Impact: Cannot materialize large attention matrices Mitigation: Flash Attention, sparse patterns, streaming computation
C-003: Serialization Requirements
Constraint: All types must be serializable (serde) Rationale: Model saving/loading, network transfer Impact: Design complexity, trait object limitations Mitigation: Enum-based polymorphism, careful trait design
9.2 Dependencies
Core Dependencies
[dependencies]
# Math and numerics
ndarray = { version = "0.16", default-features = false }
rand = { version = "0.8", default-features = false }
rand_distr = { version = "0.4", default-features = false }
# Serialization
serde = { version = "1.0", features = ["derive"], optional = true }
rkyv = { version = "0.8", optional = true }
# Error handling
thiserror = "2.0"
# Optional: SIMD
simsimd = { version = "5.9", optional = true, features = ["nightly"] }
# Optional: Parallel processing
rayon = { version = "1.10", optional = true }
# Optional: WASM
wasm-bindgen = { version = "0.2", optional = true }
js-sys = { version = "0.3", optional = true }
# Optional: NAPI-RS
napi = { version = "2.16", optional = true }
napi-derive = { version = "2.16", optional = true }
Dependency Audit: All dependencies must be MIT/Apache-2.0 licensed
9.3 Integration Dependencies
Downstream (Used By)
- `ruvector-gnn`: Uses attention mechanisms in GNN layers
- `ruvector-graph`: Graph construction with attention-based edge selection
Upstream (Depends On)
- `ruvector-core`: Core vector operations, distance metrics
- `hnsw_rs`: HNSW graph structure (optional, for examples)
10. Risk Assessment
10.1 Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Hyperbolic numerical instability | High | Medium | Careful boundary handling, epsilon clipping, extensive testing |
| WASM performance degradation | Medium | High | WASM SIMD, algorithmic optimizations, benchmarking |
| Memory bloat in large graphs | Medium | High | Flash Attention, sparse patterns, streaming |
| API breaking changes | Low | High | Careful API design, SemVer, deprecation warnings |
| Dependency conflicts | Low | Medium | Minimal dependencies, version pinning |
10.2 Schedule Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Hyperbolic implementation complexity | Medium | Medium | 20% buffer time, fallback to Euclidean |
| Performance targets not met | Low | High | Early benchmarking, iterative optimization |
| WASM/NAPI-RS compatibility issues | Low | Medium | Early CI setup, continuous testing |
Buffer: 20% time buffer in each phase for unexpected issues
10.3 Operational Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| CI infrastructure failures | Low | Low | GitHub Actions redundancy, local testing |
| Documentation drift | Medium | Medium | Doc tests, CI doc generation checks |
| Contributor onboarding difficulty | Medium | Low | Comprehensive docs, clear examples |
11. Open Questions
11.1 Design Questions
Q1: Should we support dynamic mechanism selection at runtime? Options:
- A) Enum-based (`AttentionMechanism::ScaledDotProduct`)
- B) Trait objects (`Box<dyn Attention>`)
- C) Both
Q2: How to handle attention visualization? Options:
- A) Return attention weights separately
- B) Integrate with a vis library (e.g., `plotters`)
- C) Export to JSON for external tools
Q3: Should we support distributed attention computation? Options:
- A) In-crate via `rayon`
- B) External crate (e.g., `ruvector-attention-distributed`)
- C) Defer to v2.0
11.2 API Questions
Q4: Naming convention for attention mechanisms? Options:
- A) Descriptive (`ScaledDotProductAttention`)
- B) Abbreviated (`SDPAttention`)
- C) Mixed (long in code, short in docs)
Q5: Should builders be mandatory or optional? Options:
- A) Mandatory (always use builder)
- B) Optional (provide `new()` for defaults)
- C) Hybrid (simple types use `new()`, complex types use a builder)
Appendix A: Glossary
| Term | Definition |
|---|---|
| Attention | Mechanism for weighted aggregation based on learned similarities |
| Scaled Dot-Product | Attention(Q,K,V) = softmax(QK^T/√d) V |
| Multi-Head | Parallel attention mechanisms with different projections |
| Hyperbolic | Non-Euclidean geometry with negative curvature |
| Poincaré Ball | Model of hyperbolic space as unit ball |
| GAT | Graph Attention Networks |
| RoPE | Rotary Position Embeddings |
| Flash Attention | Memory-efficient tiled attention computation |
| MoE | Mixture of Experts (learned routing between mechanisms) |
| InfoNCE | Contrastive loss function |
| HNSW | Hierarchical Navigable Small World graphs |
Appendix B: References
Research Papers
- Attention Mechanism: Vaswani et al. (2017) - "Attention Is All You Need"
- GAT: Veličković et al. (2018) - "Graph Attention Networks"
- Hyperbolic: Chami et al. (2019) - "Hyperbolic Graph Convolutional Neural Networks"
- Flash Attention: Dao et al. (2022) - "FlashAttention: Fast and Memory-Efficient Exact Attention"
- Performer: Choromanski et al. (2020) - "Rethinking Attention with Performers"
- RoPE: Su et al. (2021) - "RoFormer: Enhanced Transformer with Rotary Position Embedding"
- MoE: Shazeer et al. (2017) - "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
RuVector Research Documents
- /docs/latent-space/attention-mechanisms-research.md
- /docs/latent-space/gnn-architecture-analysis.md
- /docs/latent-space/optimization-strategies.md
- /docs/latent-space/implementation-roadmap.md
External Resources
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-11-30 | RuVector Team | Initial specification |
Approvals
| Role | Name | Signature | Date |
|---|---|---|---|
| Technical Lead | |||
| Architecture Review | |||
| QA Lead | |||
| Product Owner |
END OF SPECIFICATION
This document represents the complete specification for the ruvector-attention crate. Implementation should proceed according to the SPARC methodology:
- Specification ✅ (this document)
- Pseudocode (next phase)
- Architecture (detailed design)
- Refinement (iterative TDD)
- Completion (integration and release)