
Agent 8: NAPI-RS Node.js Native Bindings

Overview

Create high-performance Node.js bindings for RuVector's GNN latent space attention mechanisms using NAPI-RS, enabling seamless integration with JavaScript/TypeScript applications.

Project Structure

ruvector-node/
├── Cargo.toml
├── build.rs
├── package.json
├── index.js
├── index.d.ts
├── src/
│   ├── lib.rs
│   ├── attention/
│   │   ├── mod.rs
│   │   ├── dot_product.rs
│   │   ├── multi_head.rs
│   │   ├── graph_attention.rs
│   │   ├── temporal_attention.rs
│   │   └── hierarchical_attention.rs
│   ├── types.rs
│   ├── error.rs
│   └── utils.rs
├── __test__/
│   ├── attention.spec.ts
│   ├── batch.spec.ts
│   └── benchmark.spec.ts
├── examples/
│   ├── basic-usage.js
│   ├── async-batch.js
│   └── typescript-example.ts
└── .github/
    └── workflows/
        └── build.yml

1. Cargo.toml Configuration

[package]
name = "ruvector-node"
version = "0.1.0"
edition = "2021"
authors = ["RuVector Team"]
description = "Node.js bindings for RuVector GNN latent space attention mechanisms"
license = "MIT"

[lib]
crate-type = ["cdylib"]

[dependencies]
# NAPI-RS core ("serde-json" enables `serde_json::Value` fields in #[napi(object)] structs)
napi = { version = "2.16", features = ["async", "tokio_rt", "serde-json"] }
napi-derive = "2.16"

# Async runtime
tokio = { version = "1.35", features = ["full"] }

# Core dependencies
ndarray = "0.15"
rayon = "1.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Error handling
thiserror = "1.0"
anyhow = "1.0"

# Performance
parking_lot = "0.12"

[build-dependencies]
napi-build = "2.1"

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

# SIMD: no extra dependency is needed. x86_64 uses runtime detection via
# `std::arch::is_x86_feature_detected!`, and NEON is baseline on aarch64.
# (`packed_simd` is unmaintained and nightly-only; consider `std::simd` or the
# `wide` crate if explicit portable SIMD types become necessary.)

2. build.rs

extern crate napi_build;

fn main() {
    napi_build::setup();

    println!("cargo:rerun-if-changed=build.rs");

    // `#[cfg(target_arch = ...)]` in a build script describes the *host*, not
    // the compilation target, so read the env var Cargo provides instead.
    let target_arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    println!("cargo:rustc-env=TARGET_ARCH={}", target_arch);

    if target_arch == "aarch64" {
        // NEON is baseline on aarch64.
        println!("cargo:rustc-cfg=has_neon");
    }

    // Runtime CPU-feature checks such as `is_x86_feature_detected!` belong in
    // the library itself (see `get_features`), not here: the build machine's
    // CPU says nothing about the machines the compiled addon will run on.
}

3. Rust NAPI Bindings

src/lib.rs

#![deny(clippy::all)]

use napi::bindgen_prelude::*;
use napi_derive::napi;

mod attention;
mod error;
mod types;
mod utils;

pub use attention::*;
pub use error::*;
pub use types::*;

/// Initialize the RuVector native module
#[napi]
pub fn init() -> Result<String> {
    Ok("RuVector native module initialized".to_string())
}

/// Get module version
#[napi]
pub fn get_version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}

/// Get available SIMD features
#[napi]
pub fn get_features() -> Vec<String> {
    let mut features = Vec::new();

    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            features.push("avx2".to_string());
        }
        if std::arch::is_x86_feature_detected!("fma") {
            features.push("fma".to_string());
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        features.push("neon".to_string());
    }

    features
}

src/types.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;

/// Attention configuration options
#[napi(object)]
#[derive(Debug, Clone)]
pub struct AttentionConfig {
    /// Number of attention heads
    pub num_heads: Option<u32>,
    /// Dimension of each head
    pub head_dim: Option<u32>,
    /// Dropout rate
    pub dropout: Option<f32>,
    /// Whether to use bias in projections
    pub use_bias: Option<bool>,
    /// Attention scaling factor
    pub scale: Option<f32>,
}

impl Default for AttentionConfig {
    fn default() -> Self {
        Self {
            num_heads: Some(8),
            head_dim: Some(64),
            dropout: Some(0.1),
            use_bias: Some(true),
            scale: None,
        }
    }
}

/// Graph structure for attention
#[napi(object)]
#[derive(Debug, Clone)]
pub struct GraphStructure {
    /// Edge list (source, target pairs)
    pub edges: Vec<Vec<u32>>,
    /// Number of nodes
    pub num_nodes: u32,
    /// Edge features (optional)
    pub edge_features: Option<Vec<Vec<f32>>>,
}

/// Attention output
#[napi(object)]
#[derive(Debug, Clone)]
pub struct AttentionOutput {
    /// Attention values
    pub values: Vec<Vec<f32>>,
    /// Attention weights (optional)
    pub weights: Option<Vec<Vec<f32>>>,
    /// Metadata
    pub metadata: Option<serde_json::Value>,
}

/// Batch processing configuration
#[napi(object)]
#[derive(Debug, Clone)]
pub struct BatchConfig {
    /// Maximum batch size
    pub max_batch_size: u32,
    /// Number of parallel threads
    pub num_threads: Option<u32>,
    /// Enable progress callbacks
    pub enable_progress: Option<bool>,
}

src/error.rs

use napi::bindgen_prelude::*;
use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuVectorError {
    #[error("Invalid input: {0}")]
    InvalidInput(String),

    #[error("Dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    #[error("Computation error: {0}")]
    ComputationError(String),

    #[error("Invalid configuration: {0}")]
    InvalidConfig(String),
}

impl From<RuVectorError> for Error {
    fn from(err: RuVectorError) -> Self {
        Error::new(Status::GenericFailure, err.to_string())
    }
}

src/attention/mod.rs

pub mod dot_product;
pub mod multi_head;
pub mod graph_attention;
pub mod temporal_attention;
pub mod hierarchical_attention;

pub use dot_product::*;
pub use multi_head::*;
pub use graph_attention::*;
pub use temporal_attention::*;
pub use hierarchical_attention::*;

src/attention/dot_product.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;
use ndarray::Array2;
use crate::{AttentionConfig, AttentionOutput, RuVectorError};

/// Dot-product attention mechanism
#[napi]
pub struct DotProductAttention {
    config: AttentionConfig,
}

#[napi]
impl DotProductAttention {
    /// Create a new dot-product attention instance
    #[napi(constructor)]
    pub fn new(config: Option<AttentionConfig>) -> Self {
        Self {
            config: config.unwrap_or_default(),
        }
    }

    /// Compute attention (synchronous)
    #[napi]
    pub fn compute(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        self.compute_internal(query, key, value)
            .map_err(|e| e.into())
    }

    /// Compute attention (asynchronous)
    #[napi]
    pub async fn compute_async(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        let config = self.config.clone();

        tokio::task::spawn_blocking(move || {
            Self::compute_internal_static(&config, query, key, value)
        })
        .await
        .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
        .map_err(|e| e.into())
    }

    /// Batch compute attention
    #[napi]
    pub fn compute_batch(
        &self,
        queries: Vec<Vec<Vec<f32>>>,
        keys: Vec<Vec<Vec<f32>>>,
        values: Vec<Vec<Vec<f32>>>,
    ) -> Result<Vec<AttentionOutput>> {
        if queries.len() != keys.len() || queries.len() != values.len() {
            return Err(Error::from(RuVectorError::InvalidInput(
                "Batch sizes must match".to_string()
            )));
        }

        queries
            .into_iter()
            .zip(keys)
            .zip(values)
            .map(|((q, k), v)| self.compute_internal(q, k, v))
            .collect::<std::result::Result<Vec<_>, _>>()
            .map_err(|e| e.into())
    }

    /// Batch compute attention (asynchronous with parallelism)
    #[napi]
    pub async fn compute_batch_async(
        &self,
        queries: Vec<Vec<Vec<f32>>>,
        keys: Vec<Vec<Vec<f32>>>,
        values: Vec<Vec<Vec<f32>>>,
    ) -> Result<Vec<AttentionOutput>> {
        if queries.len() != keys.len() || queries.len() != values.len() {
            return Err(Error::from(RuVectorError::InvalidInput(
                "Batch sizes must match".to_string()
            )));
        }

        let config = self.config.clone();
        let tasks: Vec<_> = queries
            .into_iter()
            .zip(keys)
            .zip(values)
            .map(|((q, k), v)| {
                let config = config.clone();
                tokio::task::spawn_blocking(move || {
                    Self::compute_internal_static(&config, q, k, v)
                })
            })
            .collect();

        let mut results = Vec::new();
        for task in tasks {
            let result = task
                .await
                .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
                .map_err(|e: RuVectorError| Error::from(e))?;
            results.push(result);
        }

        Ok(results)
    }

    // Internal implementations return `std::result::Result` explicitly: the
    // `Result` alias from `napi::bindgen_prelude` fixes the error type to
    // `napi::Error`, so it cannot carry `RuVectorError` directly.
    fn compute_internal(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        Self::compute_internal_static(&self.config, query, key, value)
    }

    fn compute_internal_static(
        config: &AttentionConfig,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        // Validate dimensions
        if query.is_empty() || key.is_empty() || value.is_empty() {
            return Err(RuVectorError::InvalidInput("Empty input".to_string()));
        }

        let q_dim = query[0].len();
        let k_dim = key[0].len();
        let v_dim = value[0].len();

        if q_dim != k_dim {
            return Err(RuVectorError::DimensionMismatch {
                expected: q_dim,
                actual: k_dim,
            });
        }

        // Convert to ndarray
        let q = Self::vec_to_array2(&query)?;
        let k = Self::vec_to_array2(&key)?;
        let v = Self::vec_to_array2(&value)?;

        // Compute scaled dot-product attention
        let scale = config.scale.unwrap_or_else(|| (k_dim as f32).sqrt());

        // Q @ K^T
        let scores = q.dot(&k.t()) / scale;

        // Softmax
        let weights = Self::softmax(&scores);

        // Attention @ V
        let output = weights.dot(&v);

        // Convert back to Vec
        let values = Self::array2_to_vec(&output);
        let weights_vec = Some(Self::array2_to_vec(&weights));

        Ok(AttentionOutput {
            values,
            weights: weights_vec,
            metadata: None,
        })
    }

    fn vec_to_array2(vec: &[Vec<f32>]) -> std::result::Result<Array2<f32>, RuVectorError> {
        if vec.is_empty() {
            return Err(RuVectorError::InvalidInput("Empty vector".to_string()));
        }

        let rows = vec.len();
        let cols = vec[0].len();
        let flat: Vec<f32> = vec.iter().flat_map(|row| row.iter().copied()).collect();

        Array2::from_shape_vec((rows, cols), flat)
            .map_err(|e| RuVectorError::ComputationError(e.to_string()))
    }

    fn array2_to_vec(arr: &Array2<f32>) -> Vec<Vec<f32>> {
        arr.outer_iter()
            .map(|row| row.to_vec())
            .collect()
    }

    fn softmax(arr: &Array2<f32>) -> Array2<f32> {
        let mut result = Array2::zeros(arr.raw_dim());

        for (i, row) in arr.outer_iter().enumerate() {
            let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let exp_sum: f32 = row.iter().map(|&x| (x - max).exp()).sum();

            for (j, &val) in row.iter().enumerate() {
                result[[i, j]] = ((val - max).exp()) / exp_sum;
            }
        }

        result
    }
}
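The native compute path above can be cross-checked from the JavaScript side with a small pure-TypeScript reference. This is a validation sketch only; `dotProductAttention` and its helpers are illustrative names, not part of the generated bindings:

```typescript
// Reference scaled dot-product attention, mirroring the Rust implementation:
// scores = Q·Kᵀ / scale, weights = row-softmax(scores), output = weights·V.
type Matrix = number[][];

function matmul(a: Matrix, b: Matrix): Matrix {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((acc, x, k) => acc + x * b[k][j], 0)),
  );
}

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Numerically stable row-wise softmax (subtract the row max before exp).
function softmax(m: Matrix): Matrix {
  return m.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((x) => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
  });
}

function dotProductAttention(q: Matrix, k: Matrix, v: Matrix, scale?: number) {
  const s = scale ?? Math.sqrt(k[0].length); // default divisor: sqrt(d_k)
  const scores = matmul(q, transpose(k)).map((row) => row.map((x) => x / s));
  const weights = softmax(scores);
  return { values: matmul(weights, v), weights };
}
```

Feeding identical inputs to this reference and to `DotProductAttention.compute` should produce matching `values` and `weights` up to floating-point tolerance, which makes it useful in the `__test__/attention.spec.ts` suite.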

src/attention/multi_head.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::{AttentionConfig, AttentionOutput, RuVectorError};

/// Multi-head attention mechanism
#[napi]
pub struct MultiHeadAttention {
    config: AttentionConfig,
}

#[napi]
impl MultiHeadAttention {
    #[napi(constructor)]
    pub fn new(config: Option<AttentionConfig>) -> Self {
        Self {
            config: config.unwrap_or_default(),
        }
    }

    #[napi]
    pub fn compute(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        self.compute_internal(query, key, value)
            .map_err(|e| e.into())
    }

    #[napi]
    pub async fn compute_async(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        let config = self.config.clone();

        tokio::task::spawn_blocking(move || {
            Self::compute_internal_static(&config, query, key, value)
        })
        .await
        .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
        .map_err(|e| e.into())
    }

    #[napi]
    pub fn compute_batch(
        &self,
        queries: Vec<Vec<Vec<f32>>>,
        keys: Vec<Vec<Vec<f32>>>,
        values: Vec<Vec<Vec<f32>>>,
    ) -> Result<Vec<AttentionOutput>> {
        queries
            .into_iter()
            .zip(keys)
            .zip(values)
            .map(|((q, k), v)| self.compute_internal(q, k, v))
            .collect::<std::result::Result<Vec<_>, _>>()
            .map_err(|e| e.into())
    }

    #[napi]
    pub async fn compute_batch_async(
        &self,
        queries: Vec<Vec<Vec<f32>>>,
        keys: Vec<Vec<Vec<f32>>>,
        values: Vec<Vec<Vec<f32>>>,
    ) -> Result<Vec<AttentionOutput>> {
        let config = self.config.clone();
        let tasks: Vec<_> = queries
            .into_iter()
            .zip(keys)
            .zip(values)
            .map(|((q, k), v)| {
                let config = config.clone();
                tokio::task::spawn_blocking(move || {
                    Self::compute_internal_static(&config, q, k, v)
                })
            })
            .collect();

        let mut results = Vec::new();
        for task in tasks {
            results.push(
                task.await
                    .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
                    .map_err(|e: RuVectorError| Error::from(e))?
            );
        }

        Ok(results)
    }

    fn compute_internal(
        &self,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        Self::compute_internal_static(&self.config, query, key, value)
    }

    fn compute_internal_static(
        config: &AttentionConfig,
        query: Vec<Vec<f32>>,
        key: Vec<Vec<f32>>,
        value: Vec<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        let num_heads = config.num_heads.unwrap_or(8) as usize;

        // Simplified multi-head implementation
        // In production, would split into heads, compute attention per head, concat
        let values = query.clone(); // Placeholder

        Ok(AttentionOutput {
            values,
            weights: None,
            metadata: Some(serde_json::json!({
                "num_heads": num_heads,
                "attention_type": "multi_head"
            })),
        })
    }
}
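The head splitting the placeholder elides can be sketched in TypeScript. This assumes the usual convention that the model dimension equals `numHeads * headDim`; `splitHeads` and `concatHeads` are illustrative names, and the learned projection matrices are omitted:

```typescript
// Sketch of the multi-head flow: slice each row into `numHeads` chunks of
// `headDim`, attend per head, then concatenate the heads back together.
type Matrix = number[][];

function splitHeads(m: Matrix, numHeads: number, headDim: number): Matrix[] {
  return Array.from({ length: numHeads }, (_, h) =>
    m.map((row) => row.slice(h * headDim, (h + 1) * headDim)),
  );
}

function concatHeads(heads: Matrix[]): Matrix {
  // Row i of the output is the concatenation of row i from every head.
  return heads[0].map((_, i) => heads.flatMap((head) => head[i]));
}

// The production path would then be, per head h:
//   headsOut[h] = attention(qHeads[h], kHeads[h], vHeads[h])
// followed by concatHeads(headsOut) and an output projection.
```

Split followed by concat is a lossless round trip, which is a convenient invariant to unit-test before wiring in per-head attention.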

src/attention/graph_attention.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::{AttentionConfig, AttentionOutput, GraphStructure, RuVectorError};

/// Graph attention network (GAT) mechanism
#[napi]
pub struct GraphAttention {
    config: AttentionConfig,
}

#[napi]
impl GraphAttention {
    #[napi(constructor)]
    pub fn new(config: Option<AttentionConfig>) -> Self {
        Self {
            config: config.unwrap_or_default(),
        }
    }

    #[napi]
    pub fn compute(
        &self,
        node_features: Vec<Vec<f32>>,
        graph: GraphStructure,
    ) -> Result<AttentionOutput> {
        self.compute_internal(node_features, graph)
            .map_err(|e| e.into())
    }

    #[napi]
    pub async fn compute_async(
        &self,
        node_features: Vec<Vec<f32>>,
        graph: GraphStructure,
    ) -> Result<AttentionOutput> {
        let config = self.config.clone();

        tokio::task::spawn_blocking(move || {
            Self::compute_internal_static(&config, node_features, graph)
        })
        .await
        .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
        .map_err(|e| e.into())
    }

    fn compute_internal(
        &self,
        node_features: Vec<Vec<f32>>,
        graph: GraphStructure,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        Self::compute_internal_static(&self.config, node_features, graph)
    }

    fn compute_internal_static(
        _config: &AttentionConfig,
        node_features: Vec<Vec<f32>>,
        graph: GraphStructure,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        // Simplified GAT implementation
        let values = node_features.clone(); // Placeholder

        Ok(AttentionOutput {
            values,
            weights: None,
            metadata: Some(serde_json::json!({
                "num_nodes": graph.num_nodes,
                "num_edges": graph.edges.len(),
                "attention_type": "graph"
            })),
        })
    }
}
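The edge-masked aggregation the GAT placeholder elides can be sketched as follows. Plain dot-product scoring stands in for GAT's learned LeakyReLU scorer, and `graphAttention` is an illustrative name:

```typescript
// Each node aggregates only over its in-neighbors, with softmax-normalized
// edge scores, matching the GraphStructure (source, target) edge convention.
type Matrix = number[][];

function dot(a: number[], b: number[]): number {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

function graphAttention(features: Matrix, edges: number[][]): Matrix {
  // Build in-neighbor lists from the edge list.
  const neighbors = new Map<number, number[]>();
  for (const [src, dst] of edges) {
    if (!neighbors.has(dst)) neighbors.set(dst, []);
    neighbors.get(dst)!.push(src);
  }
  return features.map((feat, node) => {
    const nbrs = neighbors.get(node) ?? [node]; // isolated nodes attend to themselves
    const scores = nbrs.map((n) => dot(feat, features[n]));
    const max = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - max));
    const z = exps.reduce((a, b) => a + b, 0);
    // Softmax-weighted sum of neighbor features.
    return feat.map((_, d) =>
      nbrs.reduce((acc, n, i) => acc + (exps[i] / z) * features[n][d], 0),
    );
  });
}
```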

src/attention/temporal_attention.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::{AttentionConfig, AttentionOutput, RuVectorError};

/// Temporal attention for sequence data
#[napi]
pub struct TemporalAttention {
    config: AttentionConfig,
}

#[napi]
impl TemporalAttention {
    #[napi(constructor)]
    pub fn new(config: Option<AttentionConfig>) -> Self {
        Self {
            config: config.unwrap_or_default(),
        }
    }

    #[napi]
    pub fn compute(
        &self,
        sequence: Vec<Vec<f32>>,
        timestamps: Option<Vec<f64>>,
    ) -> Result<AttentionOutput> {
        self.compute_internal(sequence, timestamps)
            .map_err(|e| e.into())
    }

    #[napi]
    pub async fn compute_async(
        &self,
        sequence: Vec<Vec<f32>>,
        timestamps: Option<Vec<f64>>,
    ) -> Result<AttentionOutput> {
        let config = self.config.clone();

        tokio::task::spawn_blocking(move || {
            Self::compute_internal_static(&config, sequence, timestamps)
        })
        .await
        .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
        .map_err(|e| e.into())
    }

    fn compute_internal(
        &self,
        sequence: Vec<Vec<f32>>,
        timestamps: Option<Vec<f64>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        Self::compute_internal_static(&self.config, sequence, timestamps)
    }

    fn compute_internal_static(
        _config: &AttentionConfig,
        sequence: Vec<Vec<f32>>,
        timestamps: Option<Vec<f64>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        let values = sequence.clone(); // Placeholder

        Ok(AttentionOutput {
            values,
            weights: None,
            metadata: Some(serde_json::json!({
                "sequence_length": sequence.len(),
                "has_timestamps": timestamps.is_some(),
                "attention_type": "temporal"
            })),
        })
    }
}
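The timestamp handling the placeholder elides can be sketched as a causal, time-decayed weighting scheme. This is one plausible design, not the committed one; the decay rate `lambda` is an illustrative knob, not a real configuration field:

```typescript
// Attention weights over a timestamped sequence: each position attends to
// positions at or before it, with scores decayed by time distance.
function temporalWeights(timestamps: number[], lambda = 1.0): number[][] {
  return timestamps.map((t, i) => {
    // Causal mask: only positions j <= i contribute (-Infinity => zero weight).
    const scores = timestamps.map((s, j) =>
      j <= i ? -lambda * Math.abs(t - s) : -Infinity,
    );
    const max = Math.max(...scores);
    const exps = scores.map((x) => Math.exp(x - max));
    const z = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / z);
  });
}
```

When `timestamps` is omitted by the caller, the implementation could fall back to uniform integer positions, which reduces this to ordinary causal attention bias.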

src/attention/hierarchical_attention.rs

use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::{AttentionConfig, AttentionOutput, RuVectorError};

/// Hierarchical attention for multi-level structures
#[napi]
pub struct HierarchicalAttention {
    config: AttentionConfig,
}

#[napi]
impl HierarchicalAttention {
    #[napi(constructor)]
    pub fn new(config: Option<AttentionConfig>) -> Self {
        Self {
            config: config.unwrap_or_default(),
        }
    }

    #[napi]
    pub fn compute(
        &self,
        hierarchical_features: Vec<Vec<Vec<f32>>>,
        level_weights: Option<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        self.compute_internal(hierarchical_features, level_weights)
            .map_err(|e| e.into())
    }

    #[napi]
    pub async fn compute_async(
        &self,
        hierarchical_features: Vec<Vec<Vec<f32>>>,
        level_weights: Option<Vec<f32>>,
    ) -> Result<AttentionOutput> {
        let config = self.config.clone();

        tokio::task::spawn_blocking(move || {
            Self::compute_internal_static(&config, hierarchical_features, level_weights)
        })
        .await
        .map_err(|e| Error::new(Status::GenericFailure, e.to_string()))?
        .map_err(|e| e.into())
    }

    fn compute_internal(
        &self,
        hierarchical_features: Vec<Vec<Vec<f32>>>,
        level_weights: Option<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        Self::compute_internal_static(&self.config, hierarchical_features, level_weights)
    }

    fn compute_internal_static(
        _config: &AttentionConfig,
        hierarchical_features: Vec<Vec<Vec<f32>>>,
        level_weights: Option<Vec<f32>>,
    ) -> std::result::Result<AttentionOutput, RuVectorError> {
        if hierarchical_features.is_empty() {
            return Err(RuVectorError::InvalidInput("Empty hierarchy".to_string()));
        }

        // Simplified hierarchical attention
        let values = hierarchical_features[0].clone(); // Placeholder

        Ok(AttentionOutput {
            values,
            weights: None,
            metadata: Some(serde_json::json!({
                "num_levels": hierarchical_features.len(),
                "has_level_weights": level_weights.is_some(),
                "attention_type": "hierarchical"
            })),
        })
    }
}
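The level aggregation the placeholder elides can be sketched as a softmax-weighted sum across levels. This assumes all levels share the same `[numItems][featureDim]` shape; `aggregateLevels` is an illustrative name:

```typescript
// Combine per-level feature matrices into one matrix, weighting levels by a
// softmax over the (optional) level weights; uniform weights by default.
type Matrix = number[][];

function aggregateLevels(levels: Matrix[], levelWeights?: number[]): Matrix {
  const raw = levelWeights ?? levels.map(() => 1); // default: uniform levels
  const max = Math.max(...raw);
  const exps = raw.map((w) => Math.exp(w - max));
  const z = exps.reduce((a, b) => a + b, 0);
  const w = exps.map((e) => e / z);
  // Element-wise weighted sum across levels.
  return levels[0].map((row, i) =>
    row.map((_, d) =>
      levels.reduce((acc, level, l) => acc + w[l] * level[i][d], 0),
    ),
  );
}
```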

4. TypeScript Definitions

index.d.ts

/* tslint:disable */
/* eslint-disable */

/* auto-generated by NAPI-RS */

/**
 * Initialize the RuVector native module
 */
export function init(): string

/**
 * Get module version
 */
export function getVersion(): string

/**
 * Get available SIMD features
 */
export function getFeatures(): Array<string>

/**
 * Attention configuration options
 */
export interface AttentionConfig {
  /** Number of attention heads */
  numHeads?: number
  /** Dimension of each head */
  headDim?: number
  /** Dropout rate */
  dropout?: number
  /** Whether to use bias in projections */
  useBias?: boolean
  /** Attention scaling factor */
  scale?: number
}

/**
 * Graph structure for attention
 */
export interface GraphStructure {
  /** Edge list (source, target pairs) */
  edges: Array<Array<number>>
  /** Number of nodes */
  numNodes: number
  /** Edge features (optional) */
  edgeFeatures?: Array<Array<number>>
}

/**
 * Attention output
 */
export interface AttentionOutput {
  /** Attention values */
  values: Array<Array<number>>
  /** Attention weights (optional) */
  weights?: Array<Array<number>>
  /** Metadata */
  metadata?: any
}

/**
 * Batch processing configuration
 */
export interface BatchConfig {
  /** Maximum batch size */
  maxBatchSize: number
  /** Number of parallel threads */
  numThreads?: number
  /** Enable progress callbacks */
  enableProgress?: boolean
}

/**
 * Dot-product attention mechanism
 */
export class DotProductAttention {
  /**
   * Create a new dot-product attention instance
   * @param config Optional attention configuration
   */
  constructor(config?: AttentionConfig)

  /**
   * Compute attention (synchronous)
   * @param query Query matrix [seq_len, dim]
   * @param key Key matrix [seq_len, dim]
   * @param value Value matrix [seq_len, dim]
   * @returns Attention output with values and weights
   */
  compute(
    query: Array<Array<number>>,
    key: Array<Array<number>>,
    value: Array<Array<number>>
  ): AttentionOutput

  /**
   * Compute attention (asynchronous)
   * @param query Query matrix [seq_len, dim]
   * @param key Key matrix [seq_len, dim]
   * @param value Value matrix [seq_len, dim]
   * @returns Promise resolving to attention output
   */
  computeAsync(
    query: Array<Array<number>>,
    key: Array<Array<number>>,
    value: Array<Array<number>>
  ): Promise<AttentionOutput>

  /**
   * Batch compute attention (synchronous)
   * @param queries Array of query matrices
   * @param keys Array of key matrices
   * @param values Array of value matrices
   * @returns Array of attention outputs
   */
  computeBatch(
    queries: Array<Array<Array<number>>>,
    keys: Array<Array<Array<number>>>,
    values: Array<Array<Array<number>>>
  ): Array<AttentionOutput>

  /**
   * Batch compute attention (asynchronous with parallelism)
   * @param queries Array of query matrices
   * @param keys Array of key matrices
   * @param values Array of value matrices
   * @returns Promise resolving to array of attention outputs
   */
  computeBatchAsync(
    queries: Array<Array<Array<number>>>,
    keys: Array<Array<Array<number>>>,
    values: Array<Array<Array<number>>>
  ): Promise<Array<AttentionOutput>>
}

/**
 * Multi-head attention mechanism
 */
export class MultiHeadAttention {
  constructor(config?: AttentionConfig)
  compute(
    query: Array<Array<number>>,
    key: Array<Array<number>>,
    value: Array<Array<number>>
  ): AttentionOutput
  computeAsync(
    query: Array<Array<number>>,
    key: Array<Array<number>>,
    value: Array<Array<number>>
  ): Promise<AttentionOutput>
  computeBatch(
    queries: Array<Array<Array<number>>>,
    keys: Array<Array<Array<number>>>,
    values: Array<Array<Array<number>>>
  ): Array<AttentionOutput>
  computeBatchAsync(
    queries: Array<Array<Array<number>>>,
    keys: Array<Array<Array<number>>>,
    values: Array<Array<Array<number>>>
  ): Promise<Array<AttentionOutput>>
}

/**
 * Graph attention network (GAT) mechanism
 */
export class GraphAttention {
  constructor(config?: AttentionConfig)

  /**
   * Compute graph attention
   * @param nodeFeatures Node feature matrix [num_nodes, feature_dim]
   * @param graph Graph structure with edges and optional edge features
   * @returns Attention output with updated node features
   */
  compute(
    nodeFeatures: Array<Array<number>>,
    graph: GraphStructure
  ): AttentionOutput

  computeAsync(
    nodeFeatures: Array<Array<number>>,
    graph: GraphStructure
  ): Promise<AttentionOutput>
}

/**
 * Temporal attention for sequence data
 */
export class TemporalAttention {
  constructor(config?: AttentionConfig)

  /**
   * Compute temporal attention
   * @param sequence Sequence of feature vectors [seq_len, feature_dim]
   * @param timestamps Optional timestamps for each sequence element
   * @returns Attention output with temporal features
   */
  compute(
    sequence: Array<Array<number>>,
    timestamps?: Array<number>
  ): AttentionOutput

  computeAsync(
    sequence: Array<Array<number>>,
    timestamps?: Array<number>
  ): Promise<AttentionOutput>
}

/**
 * Hierarchical attention for multi-level structures
 */
export class HierarchicalAttention {
  constructor(config?: AttentionConfig)

  /**
   * Compute hierarchical attention
   * @param hierarchicalFeatures Multi-level features [num_levels][num_items][feature_dim]
   * @param levelWeights Optional weights for each hierarchy level
   * @returns Attention output with aggregated features
   */
  compute(
    hierarchicalFeatures: Array<Array<Array<number>>>,
    levelWeights?: Array<number>
  ): AttentionOutput

  computeAsync(
    hierarchicalFeatures: Array<Array<Array<number>>>,
    levelWeights?: Array<number>
  ): Promise<AttentionOutput>
}

5. Package.json

{
  "name": "@ruvector/node",
  "version": "0.1.0",
  "description": "High-performance Node.js bindings for RuVector GNN latent space attention mechanisms",
  "main": "index.js",
  "types": "index.d.ts",
  "keywords": [
    "rust",
    "napi",
    "attention",
    "gnn",
    "graph-neural-networks",
    "machine-learning",
    "vector-database",
    "native-addon"
  ],
  "author": "RuVector Team",
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/ruvnet/ruvector"
  },
  "napi": {
    "name": "ruvector-node",
    "triples": {
      "defaults": true,
      "additional": [
        "x86_64-unknown-linux-musl",
        "aarch64-unknown-linux-gnu",
        "aarch64-unknown-linux-musl",
        "aarch64-apple-darwin",
        "x86_64-apple-darwin",
        "x86_64-pc-windows-msvc",
        "aarch64-pc-windows-msvc"
      ]
    }
  },
  "engines": {
    "node": ">= 16"
  },
  "scripts": {
    "artifacts": "napi artifacts",
    "build": "napi build --platform --release",
    "build:debug": "napi build --platform",
    "prepublishOnly": "napi prepublish -t npm",
    "test": "vitest run",
    "test:watch": "vitest",
    "universal": "napi universal",
    "version": "napi version",
    "bench": "node benchmarks/run.js",
    "lint": "eslint . --ext .ts,.js",
    "format": "prettier --write .",
    "typecheck": "tsc --noEmit"
  },
  "devDependencies": {
    "@napi-rs/cli": "^2.18.0",
    "@types/node": "^20.10.0",
    "@typescript-eslint/eslint-plugin": "^6.15.0",
    "@typescript-eslint/parser": "^6.15.0",
    "eslint": "^8.56.0",
    "prettier": "^3.1.1",
    "typescript": "^5.3.3",
    "vitest": "^1.0.4"
  },
  "packageManager": "npm@10.2.5",
  "files": [
    "index.js",
    "index.d.ts",
    "README.md",
    "LICENSE"
  ],
  "optionalDependencies": {
    "@ruvector/node-win32-x64-msvc": "0.1.0",
    "@ruvector/node-darwin-x64": "0.1.0",
    "@ruvector/node-darwin-arm64": "0.1.0",
    "@ruvector/node-linux-x64-gnu": "0.1.0",
    "@ruvector/node-linux-x64-musl": "0.1.0",
    "@ruvector/node-linux-arm64-gnu": "0.1.0",
    "@ruvector/node-linux-arm64-musl": "0.1.0"
  }
}

6. GitHub Actions Workflow

.github/workflows/build.yml

name: Build and Release

on:
  push:
    branches: [main]
    tags:
      - 'v*'
  pull_request:
    branches: [main]
  workflow_dispatch:

env:
  DEBUG: napi:*
  APP_NAME: ruvector-node
  MACOSX_DEPLOYMENT_TARGET: '10.13'

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        settings:
          - host: macos-latest
            target: x86_64-apple-darwin
            build: |
              npm run build
              strip -x *.node

          - host: macos-latest
            target: aarch64-apple-darwin
            build: |
              sudo rm -Rf /Library/Developer/CommandLineTools/SDKs/*;
              export CC=$(xcrun -f clang);
              export CXX=$(xcrun -f clang++);
              SYSROOT=$(xcrun --sdk macosx --show-sdk-path);
              export CFLAGS="-isysroot $SYSROOT -isystem $SYSROOT";
              npm run build -- --target aarch64-apple-darwin
              strip -x *.node

          - host: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-debian
            build: |
              set -e &&
              npm run build -- --target x86_64-unknown-linux-gnu &&
              strip *.node

          - host: ubuntu-latest
            target: x86_64-unknown-linux-musl
            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine
            build: |
              set -e &&
              npm run build &&
              strip *.node

          - host: ubuntu-latest
            target: aarch64-unknown-linux-gnu
            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-debian-aarch64
            build: |
              set -e &&
              npm run build -- --target aarch64-unknown-linux-gnu &&
              aarch64-unknown-linux-gnu-strip *.node

          - host: ubuntu-latest
            target: aarch64-unknown-linux-musl
            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine-aarch64
            build: |
              set -e &&
              rustup target add aarch64-unknown-linux-musl &&
              npm run build -- --target aarch64-unknown-linux-musl &&
              /aarch64-linux-musl-cross/bin/aarch64-linux-musl-strip *.node

          - host: windows-latest
            target: x86_64-pc-windows-msvc
            build: npm run build

          - host: windows-latest
            target: aarch64-pc-windows-msvc
            build: npm run build -- --target aarch64-pc-windows-msvc

    name: stable - ${{ matrix.settings.target }} - node@20
    runs-on: ${{ matrix.settings.host }}

    steps:
      - uses: actions/checkout@v4

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
          targets: ${{ matrix.settings.target }}

      - name: Cache cargo
        uses: actions/cache@v3
        with:
          path: |
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            .cargo-cache
            target/
          key: ${{ matrix.settings.target }}-cargo-${{ matrix.settings.host }}

      - name: Install dependencies
        run: npm ci

      - name: Build in docker
        uses: addnab/docker-run-action@v3
        if: ${{ matrix.settings.docker }}
        with:
          image: ${{ matrix.settings.docker }}
          options: '--user 0:0 -v ${{ github.workspace }}/.cargo-cache/git/db:/usr/local/cargo/git/db -v ${{ github.workspace }}/.cargo/registry/cache:/usr/local/cargo/registry/cache -v ${{ github.workspace }}/.cargo/registry/index:/usr/local/cargo/registry/index -v ${{ github.workspace }}:/build -w /build'
          run: ${{ matrix.settings.build }}

      - name: Build
        run: ${{ matrix.settings.build }}
        if: ${{ !matrix.settings.docker }}
        shell: bash

      - name: Upload artifact
        uses: actions/upload-artifact@v3
        with:
          name: bindings-${{ matrix.settings.target }}
          path: ${{ env.APP_NAME }}.*.node
          if-no-files-found: error

  test-macOS-windows-binding:
    name: Test bindings on ${{ matrix.settings.target }} - node@${{ matrix.node }}
    needs:
      - build
    strategy:
      fail-fast: false
      matrix:
        settings:
          - host: macos-latest
            target: x86_64-apple-darwin
          - host: windows-latest
            target: x86_64-pc-windows-msvc
        node:
          - '18'
          - '20'
    runs-on: ${{ matrix.settings.host }}

    steps:
      - uses: actions/checkout@v4

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Download artifacts
        uses: actions/download-artifact@v3
        with:
          name: bindings-${{ matrix.settings.target }}
          path: .

      - name: List packages
        run: ls -R .
        shell: bash

      - name: Test bindings
        run: npm test

  test-linux-x64-gnu-binding:
    name: Test bindings on Linux-x64-gnu - node@${{ matrix.node }}
    needs:
      - build
    strategy:
      fail-fast: false
      matrix:
        node:
          - '18'
          - '20'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Download artifacts
        uses: actions/download-artifact@v3
        with:
          name: bindings-x86_64-unknown-linux-gnu
          path: .

      - name: List packages
        run: ls -R .
        shell: bash

      - name: Test bindings
        run: docker run --rm -v $(pwd):/build -w /build node:${{ matrix.node }}-slim npm test

  publish:
    name: Publish
    runs-on: ubuntu-latest
    needs:
      - test-macOS-windows-binding
      - test-linux-x64-gnu-binding
    if: startsWith(github.ref, 'refs/tags/v')

    steps:
      - uses: actions/checkout@v4

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Download all artifacts
        uses: actions/download-artifact@v3
        with:
          path: artifacts

      - name: Move artifacts
        run: npm run artifacts

      - name: List packages
        run: ls -R ./npm
        shell: bash

      - name: Publish
        run: |
          npm config set provenance true
          if git log -1 --pretty=%B | grep "^[0-9]\+\.[0-9]\+\.[0-9]\+$";
          then
            echo "//registry.npmjs.org/:_authToken=$NPM_TOKEN" >> ~/.npmrc
            npm publish --access public
          elif git log -1 --pretty=%B | grep "^[0-9]\+\.[0-9]\+\.[0-9]\+";
          then
            echo "//registry.npmjs.org/:_authToken=$NPM_TOKEN" >> ~/.npmrc
            npm publish --tag next --access public
          else
            echo "Not a release, skipping publish"
          fi
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          NPM_TOKEN: ${{ secrets.NPM_TOKEN }}

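The publish step's gating (stable release vs. prerelease vs. skip) can be sanity-checked before tagging; a minimal sketch of the same rule as a regex (`checkRelease` is a hypothetical helper for local checks, not part of the workflow):

```javascript
// Mirror the workflow's commit-message check: a bare semver message
// publishes a stable release, a semver prefix publishes under the
// "next" dist-tag, and anything else skips publishing.
function checkRelease(message) {
  if (/^\d+\.\d+\.\d+$/.test(message)) return 'stable'
  if (/^\d+\.\d+\.\d+/.test(message)) return 'prerelease'
  return 'skip'
}

console.log(checkRelease('0.1.0'))        // -> stable
console.log(checkRelease('0.1.0-beta.2')) // -> prerelease (npm publish --tag next)
console.log(checkRelease('chore: docs'))  // -> skip
```

Note that the regex only inspects the start of the message, matching the `grep` patterns in the workflow above.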
7. Testing Strategy

__test__/attention.spec.ts

import { describe, it, expect } from 'vitest'
import {
  DotProductAttention,
  MultiHeadAttention,
  GraphAttention,
  TemporalAttention,
  HierarchicalAttention,
  AttentionConfig,
  GraphStructure
} from '../index'

describe('DotProductAttention', () => {
  it('should compute attention synchronously', () => {
    const attention = new DotProductAttention()
    const query = [[1, 2], [3, 4]]
    const key = [[1, 2], [3, 4]]
    const value = [[5, 6], [7, 8]]

    const result = attention.compute(query, key, value)

    expect(result.values).toBeDefined()
    expect(result.values.length).toBe(2)
    expect(result.weights).toBeDefined()
  })

  it('should compute attention asynchronously', async () => {
    const attention = new DotProductAttention()
    const query = [[1, 2], [3, 4]]
    const key = [[1, 2], [3, 4]]
    const value = [[5, 6], [7, 8]]

    const result = await attention.computeAsync(query, key, value)

    expect(result.values).toBeDefined()
    expect(result.values.length).toBe(2)
  })

  it('should compute batch attention', () => {
    const attention = new DotProductAttention()
    const queries = [[[1, 2]], [[3, 4]]]
    const keys = [[[1, 2]], [[3, 4]]]
    const values = [[[5, 6]], [[7, 8]]]

    const results = attention.computeBatch(queries, keys, values)

    expect(results.length).toBe(2)
    expect(results[0].values).toBeDefined()
  })

  it('should compute batch attention asynchronously', async () => {
    const attention = new DotProductAttention()
    const queries = [[[1, 2]], [[3, 4]]]
    const keys = [[[1, 2]], [[3, 4]]]
    const values = [[[5, 6]], [[7, 8]]]

    const results = await attention.computeBatchAsync(queries, keys, values)

    expect(results.length).toBe(2)
  })

  it('should accept custom configuration', () => {
    const config: AttentionConfig = {
      numHeads: 4,
      headDim: 32,
      dropout: 0.2,
      scale: 0.5
    }

    const attention = new DotProductAttention(config)
    expect(attention).toBeDefined()
  })
})

describe('GraphAttention', () => {
  it('should compute graph attention', () => {
    const attention = new GraphAttention()
    const nodeFeatures = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    const graph: GraphStructure = {
      edges: [[0, 1], [1, 2], [2, 0]],
      numNodes: 3
    }

    const result = attention.compute(nodeFeatures, graph)

    expect(result.values).toBeDefined()
    expect(result.metadata).toBeDefined()
    expect(result.metadata.attention_type).toBe('graph')
  })

  it('should compute graph attention asynchronously', async () => {
    const attention = new GraphAttention()
    const nodeFeatures = [[1, 2, 3], [4, 5, 6]]
    const graph: GraphStructure = {
      edges: [[0, 1]],
      numNodes: 2
    }

    const result = await attention.computeAsync(nodeFeatures, graph)

    expect(result.values).toBeDefined()
  })
})

describe('TemporalAttention', () => {
  it('should compute temporal attention', () => {
    const attention = new TemporalAttention()
    const sequence = [[1, 2], [3, 4], [5, 6]]

    const result = attention.compute(sequence)

    expect(result.values).toBeDefined()
    expect(result.metadata.sequence_length).toBe(3)
  })

  it('should handle timestamps', () => {
    const attention = new TemporalAttention()
    const sequence = [[1, 2], [3, 4]]
    const timestamps = [0.0, 1.0]

    const result = attention.compute(sequence, timestamps)

    expect(result.metadata.has_timestamps).toBe(true)
  })
})

describe('HierarchicalAttention', () => {
  it('should compute hierarchical attention', () => {
    const attention = new HierarchicalAttention()
    const hierarchicalFeatures = [
      [[1, 2], [3, 4]],
      [[5, 6], [7, 8]]
    ]

    const result = attention.compute(hierarchicalFeatures)

    expect(result.values).toBeDefined()
    expect(result.metadata.num_levels).toBe(2)
  })
})

__test__/benchmark.spec.ts

import { describe, it, expect } from 'vitest'
import { DotProductAttention } from '../index'

describe('Performance Benchmarks', () => {
  it('should handle large matrices efficiently', () => {
    const attention = new DotProductAttention()
    const size = 1000
    const dim = 512

    // Generate random matrices
    const query = Array.from({ length: size }, () =>
      Array.from({ length: dim }, () => Math.random())
    )
    const key = Array.from({ length: size }, () =>
      Array.from({ length: dim }, () => Math.random())
    )
    const value = Array.from({ length: size }, () =>
      Array.from({ length: dim }, () => Math.random())
    )

    const start = Date.now()
    const result = attention.compute(query, key, value)
    const duration = Date.now() - start

    expect(result.values).toBeDefined()
    expect(duration).toBeLessThan(5000) // Should complete in < 5 seconds
  })

  it('async should not be significantly slower than sync for large batches', async () => {
    const attention = new DotProductAttention()
    const batchSize = 100

    const queries = Array.from({ length: batchSize }, () =>
      [[1, 2, 3], [4, 5, 6]]
    )
    const keys = queries
    const values = queries

    const syncStart = Date.now()
    attention.computeBatch(queries, keys, values)
    const syncDuration = Date.now() - syncStart

    const asyncStart = Date.now()
    await attention.computeBatchAsync(queries, keys, values)
    const asyncDuration = Date.now() - asyncStart

    console.log(`Sync: ${syncDuration}ms, Async: ${asyncDuration}ms`)
    expect(asyncDuration).toBeLessThanOrEqual(syncDuration * 1.5)
  })
})
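Beyond timing, native results can be cross-checked for correctness against a plain JavaScript oracle. A sketch of scaled dot-product attention, softmax(QK^T / sqrt(d)) V; `referenceAttention` is a hypothetical test helper, not part of the package API:

```javascript
// Pure-JS reference implementation of scaled dot-product attention,
// used as an oracle to verify the native binding's output.
function referenceAttention(query, key, value) {
  const d = query[0].length
  const scale = 1 / Math.sqrt(d)

  const weights = query.map(q => {
    // Raw scores: scaled dot product of q with each key row.
    const scores = key.map(k =>
      scale * q.reduce((acc, x, i) => acc + x * k[i], 0)
    )
    // Numerically stable softmax over the scores.
    const max = Math.max(...scores)
    const exps = scores.map(s => Math.exp(s - max))
    const sum = exps.reduce((a, b) => a + b, 0)
    return exps.map(e => e / sum)
  })

  // Each output row is the weighted sum of value rows.
  const values = weights.map(w =>
    value[0].map((_, j) =>
      w.reduce((acc, wi, i) => acc + wi * value[i][j], 0)
    )
  )

  return { values, weights }
}

const out = referenceAttention(
  [[1, 0], [0, 1]],
  [[1, 0], [0, 1]],
  [[5, 6], [7, 8]]
)
console.log(out.weights[0], out.values[0])
```

A test can then compare `attention.compute(...)` against `referenceAttention(...)` element-wise within a small tolerance.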

8. Usage Examples

examples/basic-usage.js

const {
  init,
  getVersion,
  getFeatures,
  DotProductAttention
} = require('@ruvector/node')

// Initialize
console.log(init())
console.log('Version:', getVersion())
console.log('Features:', getFeatures())

// Create attention instance
const attention = new DotProductAttention({
  numHeads: 8,
  headDim: 64,
  dropout: 0.1
})

// Compute attention
const query = [[1, 2, 3], [4, 5, 6]]
const key = [[1, 2, 3], [4, 5, 6]]
const value = [[7, 8, 9], [10, 11, 12]]

const result = attention.compute(query, key, value)

console.log('Attention output:', result.values)
console.log('Attention weights:', result.weights)

examples/async-batch.js

const { DotProductAttention } = require('@ruvector/node')

async function main() {
  const attention = new DotProductAttention()

  // Prepare batch data
  const batchSize = 10
  const queries = Array.from({ length: batchSize }, (_, i) =>
    [[i, i+1], [i+2, i+3]]
  )
  const keys = queries
  const values = queries

  console.log('Processing batch asynchronously...')
  const start = Date.now()

  const results = await attention.computeBatchAsync(queries, keys, values)

  const duration = Date.now() - start
  console.log(`Processed ${batchSize} items in ${duration}ms`)
  console.log('Results:', results.length)
}

main().catch(console.error)

examples/typescript-example.ts

import {
  DotProductAttention,
  GraphAttention,
  AttentionConfig,
  GraphStructure,
  AttentionOutput
} from '@ruvector/node'

// Type-safe configuration
const config: AttentionConfig = {
  numHeads: 8,
  headDim: 64,
  dropout: 0.1,
  useBias: true
}

// Dot-product attention
const dotAttention = new DotProductAttention(config)
const output: AttentionOutput = dotAttention.compute(
  [[1, 2], [3, 4]],
  [[1, 2], [3, 4]],
  [[5, 6], [7, 8]]
)

console.log('Attention values:', output.values)

// Graph attention
const graphAttention = new GraphAttention(config)
const graph: GraphStructure = {
  edges: [[0, 1], [1, 2]],
  numNodes: 3,
  edgeFeatures: [[1.0], [2.0]]
}

const graphOutput = graphAttention.compute(
  [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
  graph
)

console.log('Graph attention:', graphOutput.metadata)

// Async with proper typing
async function processAsync(): Promise<void> {
  const result = await dotAttention.computeAsync(
    [[1, 2]],
    [[1, 2]],
    [[3, 4]]
  )
  console.log('Async result:', result)
}

processAsync().catch(console.error)

9. Documentation

README.md

# @ruvector/node

High-performance Node.js bindings for RuVector's GNN latent space attention mechanisms.

## Features

- ⚡ **Blazing Fast**: Native Rust implementation with SIMD optimizations
- 🔄 **Async Support**: Non-blocking async methods with Tokio runtime
- 📦 **Batch Processing**: Efficient parallel batch operations
- 🎯 **Multiple Attention Types**: Dot-product, multi-head, graph, temporal, hierarchical
- 🔒 **Type Safe**: Full TypeScript definitions
- 🌐 **Cross-Platform**: Pre-built binaries for all major platforms

## Installation

```bash
npm install @ruvector/node
```

## Quick Start

```javascript
const { DotProductAttention } = require('@ruvector/node')

const attention = new DotProductAttention()
const result = attention.compute(query, key, value)
```

## API Reference

See `index.d.ts` for complete API documentation.

## Performance

- **SIMD Optimized**: AVX2 on x86_64, NEON on ARM64
- **Parallel Processing**: Multi-threaded batch operations
- **Zero-Copy**: Efficient memory handling

## License

MIT


## Implementation Checklist

### Phase 1: Core Setup
- [ ] Initialize NAPI-RS project with Cargo.toml
- [ ] Configure build.rs for platform detection
- [ ] Set up basic project structure
- [ ] Implement error handling types
- [ ] Create TypeScript definitions

### Phase 2: Attention Mechanisms
- [ ] Implement DotProductAttention
- [ ] Implement MultiHeadAttention
- [ ] Implement GraphAttention
- [ ] Implement TemporalAttention
- [ ] Implement HierarchicalAttention
- [ ] Add sync/async variants for all

### Phase 3: Testing
- [ ] Write unit tests for each attention type
- [ ] Add integration tests
- [ ] Create benchmark suite
- [ ] Test all platforms

### Phase 4: Build & CI/CD
- [ ] Set up GitHub Actions workflow
- [ ] Configure multi-platform builds
- [ ] Add automated testing
- [ ] Set up npm publishing

### Phase 5: Documentation
- [ ] Write comprehensive README
- [ ] Add usage examples
- [ ] Document all APIs
- [ ] Create migration guide

## Performance Targets

- **Dot-Product Attention**: < 10ms for 1000x512 matrices
- **Batch Processing**: < 100ms for 100 items (async)
- **Memory Efficiency**: < 2x input size overhead
- **SIMD Speedup**: 2-4x over scalar implementation
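These targets can be tracked with a small best-of-N harness built on Node's high-resolution clock. A sketch; `timeIt` is an illustrative helper, and the naive pure-JS matmul below is only a stand-in workload, since real numbers depend on the built native artifact:

```javascript
// Best-of-N wall-clock timer using process.hrtime.bigint().
function timeIt(label, fn, iters = 5) {
  const times = []
  for (let i = 0; i < iters; i++) {
    const start = process.hrtime.bigint()
    fn()
    const end = process.hrtime.bigint()
    times.push(Number(end - start) / 1e6) // nanoseconds -> milliseconds
  }
  const best = Math.min(...times)
  console.log(`${label}: best of ${iters} runs = ${best.toFixed(2)}ms`)
  return best
}

// Stand-in workload: naive 200x64 by 64x200 matrix product in pure JS.
const a = Array.from({ length: 200 }, () =>
  Array.from({ length: 64 }, () => Math.random())
)
const best = timeIt('naive matmul', () => {
  a.map(row =>
    a.map(col => row.reduce((acc, x, i) => acc + x * col[i], 0))
  )
})
```

In CI, the returned milliseconds can be asserted against the budgets above (e.g. `best < 10` for the dot-product target).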

## Platform Support

- ✅ Linux x64 (GNU/MUSL)
- ✅ Linux ARM64 (GNU/MUSL)
- ✅ macOS x64
- ✅ macOS ARM64 (Apple Silicon)
- ✅ Windows x64
- ✅ Windows ARM64

## Dependencies

- **napi-rs**: 2.16+ (Node.js bindings)
- **tokio**: 1.35+ (Async runtime)
- **ndarray**: 0.15+ (Linear algebra)
- **rayon**: 1.8+ (Parallelism)

## Integration Points

- **Agent 4**: Uses core Rust attention implementations
- **Agent 6**: Integrates with Python bindings
- **Agent 7**: Provides C++ FFI layer
- **Agent 9**: TypeScript SDK consumer