Platform Bindings Specification - RuVector Attention

Version: 1.0.0 Date: 2025-11-30 Status: Implementation Ready

Executive Summary

This document specifies the platform bindings for ruvector-attention. It covers native Rust, WebAssembly (browser and server), and Node.js (via NAPI-RS), along with the CLI and SDK interfaces built on top of them.

1. Platform Support Matrix

| Platform | Target | Status | Priority | Notes |
|----------|--------|--------|----------|-------|
| Rust Native | All Tier 1 targets | Primary | P0 | Core implementation |
| WASM (Browser) | wasm32-unknown-unknown | Full | P0 | Browser/Deno runtime |
| WASM (Server) | wasm32-wasi | Full | P1 | Server-side WASM |
| Node.js 18 LTS | x86_64, arm64 | Full | P0 | Long-term support |
| Node.js 20 LTS | x86_64, arm64 | Full | P0 | Long-term support |
| Node.js 22 | x86_64, arm64 | Full | P1 | Current release |
| Windows | x86_64-pc-windows-msvc | Full | P0 | NAPI-RS bindings |
| macOS Intel | x86_64-apple-darwin | Full | P0 | NAPI-RS bindings |
| macOS Apple Silicon | aarch64-apple-darwin | Full | P0 | NAPI-RS bindings |
| Linux x64 | x86_64-unknown-linux-gnu | Full | P0 | NAPI-RS (glibc) |
| Linux x64 (musl) | x86_64-unknown-linux-musl | Full | P1 | Alpine/static linking |
| Linux ARM64 | aarch64-unknown-linux-gnu | Full | P1 | NAPI-RS (glibc) |
| Linux ARM64 (musl) | aarch64-unknown-linux-musl | Full | P2 | Alpine ARM |
| Linux ARMv7 | armv7-unknown-linux-gnueabihf | Partial | P2 | Raspberry Pi |

1.1 Feature Matrix by Platform

Feature Rust WASM Node.js CLI
Scaled Dot-Product
Multi-Head Attention
Hyperbolic Attention
Linear Attention
Cross Attention
Self Attention
SIMD Optimizations ⚠️
Async Processing N/A
Batch Operations
Streaming ⚠️

⚠️ = Requires experimental flags or limited support


2. WASM Bindings

2.1 Project Structure

crates/ruvector-attention-wasm/
├── Cargo.toml
├── README.md
├── src/
│   ├── lib.rs                 # Main WASM exports
│   ├── attention/
│   │   ├── mod.rs
│   │   ├── scaled_dot.rs      # ScaledDotProduct WASM wrapper
│   │   ├── multi_head.rs      # MultiHead WASM wrapper
│   │   ├── hyperbolic.rs      # Hyperbolic WASM wrapper
│   │   ├── linear.rs          # Linear WASM wrapper
│   │   └── cross.rs           # Cross attention WASM wrapper
│   ├── utils.rs               # WASM utilities (panic hook, logging)
│   ├── error.rs               # WASM error handling
│   ├── types.rs               # JS-compatible types
│   └── async_ops.rs           # Async operations for WASM
├── pkg/                       # wasm-pack build output (gitignored)
├── web/                       # Browser examples
│   ├── index.html
│   ├── demo.js
│   └── worker.js             # Web Worker example
├── node/                      # Node.js WASM examples
│   └── example.mjs
├── tests/
│   ├── web.rs                # wasm-bindgen-test
│   └── node.rs
├── benches/
│   └── wasm_bench.rs
└── examples/
    ├── browser_basic.html
    ├── browser_worker.html
    └── node_server.mjs

2.2 Cargo Configuration

# crates/ruvector-attention-wasm/Cargo.toml
[package]
name = "ruvector-attention-wasm"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "WebAssembly bindings for RuVector attention mechanisms"
repository = "https://github.com/yourusername/ruvector"

[lib]
crate-type = ["cdylib", "rlib"]

[features]
default = ["console_error_panic_hook", "wee_alloc"]
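# No `packed_simd` dependency is declared below, so `simd = ["packed_simd"]`
# would fail to resolve. Kept as a plain flag for cfg-gating SIMD code paths.
simd = []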
parallel = ["wasm-bindgen-rayon"]

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
wasm-bindgen = "0.2.92"
wasm-bindgen-futures = "0.4.42"
js-sys = "0.3.69"
web-sys = { version = "0.3.69", features = [
    "console",
    "Performance",
    "PerformanceTiming",
    "Window",
    "Worker",
] }
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.6"
console_error_panic_hook = { version = "0.1.7", optional = true }
wee_alloc = { version = "0.4.5", optional = true }
wasm-bindgen-rayon = { version = "1.2", optional = true }

[dev-dependencies]
wasm-bindgen-test = "0.3.42"
criterion = "0.5"

[profile.release]
# Optimize for size and speed
opt-level = "z"        # Optimize for size
lto = true             # Enable Link Time Optimization
codegen-units = 1      # Reduce parallel codegen for better optimization
panic = "abort"        # Remove panic formatting code
strip = true           # Strip symbols

[profile.release.package."*"]
opt-level = "z"

2.3 WASM API Design

2.3.1 Core Library (lib.rs)

// crates/ruvector-attention-wasm/src/lib.rs
use wasm_bindgen::prelude::*;

mod attention;
mod error;
mod types;
mod utils;

pub use attention::*;
pub use error::WasmError;
pub use types::*;

// Use wee_alloc as the global allocator when the feature is enabled.
// Declared at module level, which is the conventional placement.
#[cfg(feature = "wee_alloc")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

// Initialize WASM module
#[wasm_bindgen(start)]
pub fn init() {
    #[cfg(feature = "console_error_panic_hook")]
    console_error_panic_hook::set_once();
}

// Version info
#[wasm_bindgen]
pub fn version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}

2.3.2 Scaled Dot-Product Attention

// crates/ruvector-attention-wasm/src/attention/scaled_dot.rs
use wasm_bindgen::prelude::*;
use js_sys::{Float32Array, Array};
use ruvector_attention::ScaledDotProduct;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmScaledDotProduct {
    inner: ScaledDotProduct,
}

#[wasm_bindgen]
impl WasmScaledDotProduct {
    /// Create a new scaled dot-product attention layer
    #[wasm_bindgen(constructor)]
    pub fn new(dim: usize) -> Result<WasmScaledDotProduct, WasmError> {
        Ok(Self {
            inner: ScaledDotProduct::new(dim),
        })
    }

    /// Forward pass with single query
    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(
            query,
            &keys_vec,
            &values_vec,
            num_neighbors,
        )?;

        Ok(Float32Array::from(&result[..]))
    }

    /// Forward pass with batched queries
    #[wasm_bindgen]
    pub fn forward_batch(
        &self,
        queries: Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let results = Array::new();
        for i in 0..queries.length() {
            let query_arr = Float32Array::from(queries.get(i));
            let query_vec: Vec<f32> = query_arr.to_vec();

            let result = self.inner.forward(
                &query_vec,
                &keys_vec,
                &values_vec,
                num_neighbors,
            )?;

            results.push(&Float32Array::from(&result[..]));
        }

        Ok(results)
    }

    /// Async forward pass. Exposed to JS as a Promise; the computation itself
    /// still runs on the calling thread, so use a Web Worker to keep the UI responsive.
    #[wasm_bindgen]
    pub async fn forward_async(
        &self,
        query: Vec<f32>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        // `spawn_local` returns `()` and cannot be awaited for a result,
        // so the core is called directly inside this async fn.
        let result = self.inner.forward(&query, &keys_vec, &values_vec, num_neighbors)?;

        Ok(Float32Array::from(&result[..]))
    }
}
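
For orientation, the computation that `forward` is expected to delegate to can be sketched in plain Rust. This is a reference sketch, not the `ruvector_attention` implementation: the function name `scaled_dot_product` and the flat row-major layout (`keys` and `values` each holding `num_neighbors * dim` floats) are assumptions, since the spec does not state how the flat buffers are laid out.

```rust
/// Reference scaled dot-product attention over flat, row-major buffers.
/// Assumed layout (not confirmed by the spec): `keys` and `values` each hold
/// `num_neighbors * dim` floats, one row per neighbor.
pub fn scaled_dot_product(
    query: &[f32],
    keys: &[f32],
    values: &[f32],
    num_neighbors: usize,
) -> Result<Vec<f32>, String> {
    let dim = query.len();
    if dim == 0 || num_neighbors == 0 {
        return Err("query and neighbor set must be non-empty".to_string());
    }
    if keys.len() != num_neighbors * dim || values.len() != num_neighbors * dim {
        return Err(format!(
            "expected {} floats in keys and values, got {} and {}",
            num_neighbors * dim,
            keys.len(),
            values.len()
        ));
    }

    // Scores: (q . k_i) / sqrt(dim)
    let scale = (dim as f32).sqrt();
    let scores: Vec<f32> = keys
        .chunks_exact(dim)
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect();

    // Numerically stable softmax: subtract the max before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Output: softmax-weighted sum of the value rows.
    let mut out = vec![0.0f32; dim];
    for (w, v) in exps.iter().zip(values.chunks_exact(dim)) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / total) * x;
        }
    }
    Ok(out)
}
```

Validating buffer lengths up front matters at the binding boundary, because a `Float32Array` arriving from JS carries no shape information beyond its length.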

2.3.3 Multi-Head Attention

// crates/ruvector-attention-wasm/src/attention/multi_head.rs
use wasm_bindgen::prelude::*;
use js_sys::Float32Array;
use ruvector_attention::MultiHeadAttention;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmMultiHeadAttention {
    inner: MultiHeadAttention,
}

#[wasm_bindgen]
impl WasmMultiHeadAttention {
    #[wasm_bindgen(constructor)]
    pub fn new(
        num_heads: usize,
        hidden_dim: usize,
        dropout: f32,
    ) -> Result<WasmMultiHeadAttention, WasmError> {
        Ok(Self {
            inner: MultiHeadAttention::new(num_heads, hidden_dim, dropout)?,
        })
    }

    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(
            query,
            &keys_vec,
            &values_vec,
            mask.as_deref(),
        )?;

        Ok(Float32Array::from(&result[..]))
    }

    #[wasm_bindgen(getter)]
    pub fn num_heads(&self) -> usize {
        self.inner.num_heads()
    }

    #[wasm_bindgen(getter)]
    pub fn hidden_dim(&self) -> usize {
        self.inner.hidden_dim()
    }
}
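
The `mask` parameter is passed straight through to the core, so its semantics are defined there. As a hedged illustration of how a boolean mask typically enters the softmax step, the sketch below assumes that `true` keeps a position and `false` excludes it; the spec does not say which convention `MultiHeadAttention::forward` follows, and `masked_softmax` is a hypothetical helper name.

```rust
/// Softmax over attention scores with an optional boolean mask.
/// Assumption for this sketch: `mask[i] == true` keeps position `i`,
/// `false` excludes it. The spec does not state the core's convention.
pub fn masked_softmax(scores: &[f32], mask: Option<&[bool]>) -> Result<Vec<f32>, String> {
    if let Some(m) = mask {
        if m.len() != scores.len() {
            return Err(format!(
                "mask length {} != scores length {}",
                m.len(),
                scores.len()
            ));
        }
    }
    let keep = |i: usize| mask.map_or(true, |m| m[i]);

    // Max over the kept positions only, for numerical stability.
    let max = scores
        .iter()
        .enumerate()
        .filter(|(i, _)| keep(*i))
        .map(|(_, s)| *s)
        .fold(f32::NEG_INFINITY, f32::max);
    if max == f32::NEG_INFINITY {
        return Err("no positions left to attend to".to_string());
    }

    // Excluded positions get weight exactly zero.
    let exps: Vec<f32> = scores
        .iter()
        .enumerate()
        .map(|(i, s)| if keep(i) { (s - max).exp() } else { 0.0 })
        .collect();
    let total: f32 = exps.iter().sum();
    Ok(exps.into_iter().map(|e| e / total).collect())
}
```

Returning an error when every position is masked avoids a division by zero that would otherwise surface in JS as an array of `NaN`.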

2.3.4 Hyperbolic Attention

// crates/ruvector-attention-wasm/src/attention/hyperbolic.rs
use wasm_bindgen::prelude::*;
use js_sys::Float32Array;
use ruvector_attention::HyperbolicAttention;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmHyperbolicAttention {
    inner: HyperbolicAttention,
}

#[wasm_bindgen]
impl WasmHyperbolicAttention {
    #[wasm_bindgen(constructor)]
    pub fn new(curvature: f32) -> Result<WasmHyperbolicAttention, WasmError> {
        Ok(Self {
            inner: HyperbolicAttention::new(curvature),
        })
    }

    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(query, &keys_vec, &values_vec)?;

        Ok(Float32Array::from(&result[..]))
    }

    /// Compute Poincaré distance between two points
    #[wasm_bindgen]
    pub fn poincare_distance(&self, x: &[f32], y: &[f32]) -> Result<f32, WasmError> {
        // `?` converts the core error into `WasmError`, as in the other methods
        Ok(self.inner.poincare_distance(x, y)?)
    }

    /// Project Euclidean point to Poincaré ball
    #[wasm_bindgen]
    pub fn to_poincare(&self, x: &[f32]) -> Result<Float32Array, WasmError> {
        let result = self.inner.to_poincare(x)?;
        Ok(Float32Array::from(&result[..]))
    }

    #[wasm_bindgen(getter)]
    pub fn curvature(&self) -> f32 {
        self.inner.curvature()
    }
}
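
As a point of reference for `poincare_distance`, the standard distance on a Poincaré ball of curvature -c can be written directly. The sketch below is an assumption-laden illustration: it takes `c` as the positive magnitude of the curvature, whereas the spec does not say how `HyperbolicAttention::new(curvature)` interprets its argument, and the free function shown here is not part of the documented API.

```rust
/// Poincaré-ball distance for curvature -c (c > 0):
///   d(x, y) = (1/sqrt(c)) * arcosh(1 + 2c*|x-y|^2 / ((1 - c|x|^2)(1 - c|y|^2)))
/// Assumption: `c` is the positive magnitude of the curvature.
pub fn poincare_distance(x: &[f32], y: &[f32], c: f32) -> Result<f32, String> {
    if x.len() != y.len() {
        return Err(format!("dimension mismatch: {} vs {}", x.len(), y.len()));
    }
    if !(c > 0.0) {
        return Err("curvature magnitude c must be positive".to_string());
    }

    let sq_norm = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();

    // Both points must lie strictly inside the ball of radius 1/sqrt(c).
    let fx = 1.0 - c * sq_norm(x);
    let fy = 1.0 - c * sq_norm(y);
    if fx <= 0.0 || fy <= 0.0 {
        return Err("points must lie strictly inside the Poincaré ball".to_string());
    }

    Ok((1.0 + 2.0 * c * diff_sq / (fx * fy)).acosh() / c.sqrt())
}
```

The in-ball check explains why the wrapper methods return `Result`: inputs on or outside the boundary have no finite distance, which is presumably what `to_poincare` exists to prevent.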

2.4 Build Scripts and Configuration

2.4.1 Build Script

#!/bin/bash
# scripts/build-wasm.sh

set -e

echo "Building RuVector Attention WASM..."

# Clean previous builds
rm -rf pkg/

# Build for web (browser)
echo "Building for web target..."
wasm-pack build \
    --target web \
    --out-dir pkg/web \
    --release \
    crates/ruvector-attention-wasm

# Build for Node.js
echo "Building for Node.js target..."
wasm-pack build \
    --target nodejs \
    --out-dir pkg/nodejs \
    --release \
    crates/ruvector-attention-wasm

# Build for bundlers (webpack, vite, etc.)
echo "Building for bundler target..."
wasm-pack build \
    --target bundler \
    --out-dir pkg/bundler \
    --release \
    crates/ruvector-attention-wasm

# Optional: Build with SIMD (requires nightly + flags)
if [ "$BUILD_SIMD" = "1" ]; then
    echo "Building SIMD version (requires nightly)..."
    RUSTFLAGS="-C target-feature=+simd128" \
    wasm-pack build \
        --target web \
        --out-dir pkg/web-simd \
        --release \
        crates/ruvector-attention-wasm \
        -- --features simd
fi

echo "WASM build complete!"
echo "Outputs:"
echo "  - Web:     pkg/web/"
echo "  - Node.js: pkg/nodejs/"
echo "  - Bundler: pkg/bundler/"

2.4.2 Package.json (for NPM publishing)

{
  "name": "ruvector-attention-wasm",
  "version": "0.1.0",
  "description": "WebAssembly bindings for RuVector attention mechanisms",
  "main": "pkg/nodejs/ruvector_attention_wasm.js",
  "module": "pkg/web/ruvector_attention_wasm.js",
  "types": "pkg/web/ruvector_attention_wasm.d.ts",
  "files": [
    "pkg/**/*"
  ],
  "scripts": {
    "build": "./scripts/build-wasm.sh",
    "test": "wasm-pack test --headless --chrome --firefox",
    "test:node": "wasm-pack test --node"
  },
  "keywords": [
    "wasm",
    "attention",
    "machine-learning",
    "rust",
    "webassembly"
  ],
  "license": "MIT OR Apache-2.0",
  "repository": {
    "type": "git",
    "url": "https://github.com/yourusername/ruvector"
  }
}

2.5 TypeScript Definitions (Auto-generated)

// pkg/web/ruvector_attention_wasm.d.ts (generated by wasm-bindgen)
// wasm-bindgen keeps Rust identifiers as written unless `js_name` is set,
// so method, getter, and parameter names are snake_case here.
export function init(): void;
export function version(): string;

export class WasmScaledDotProduct {
    constructor(dim: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: number
    ): Float32Array;
    // `js_sys::Array` is untyped on the JS side; elements are Float32Array
    forward_batch(
        queries: Array<any>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: number
    ): Array<any>;
    forward_async(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: number
    ): Promise<Float32Array>;
}

export class WasmMultiHeadAttention {
    constructor(num_heads: number, hidden_dim: number, dropout: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask?: boolean[]
    ): Float32Array;
    readonly num_heads: number;
    readonly hidden_dim: number;
}

export class WasmHyperbolicAttention {
    constructor(curvature: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array
    ): Float32Array;
    poincare_distance(x: Float32Array, y: Float32Array): number;
    to_poincare(x: Float32Array): Float32Array;
    readonly curvature: number;
}

3. NAPI-RS Bindings (Node.js)

3.1 Project Structure

crates/ruvector-attention-node/
├── Cargo.toml
├── build.rs
├── package.json
├── index.js                   # JS entry point
├── index.d.ts                 # TypeScript definitions
├── README.md
├── src/
│   ├── lib.rs                 # NAPI exports
│   ├── attention/
│   │   ├── mod.rs
│   │   ├── scaled_dot.rs
│   │   ├── multi_head.rs
│   │   ├── hyperbolic.rs
│   │   ├── linear.rs
│   │   └── cross.rs
│   ├── async_ops.rs           # Tokio async operations
│   ├── error.rs               # NAPI error handling
│   ├── buffer.rs              # Buffer conversions
│   └── types.rs               # Type conversions
├── npm/                       # Platform-specific packages
│   ├── darwin-arm64/
│   │   ├── package.json
│   │   └── README.md
│   ├── darwin-x64/
│   ├── linux-arm64-gnu/
│   ├── linux-arm64-musl/
│   ├── linux-x64-gnu/
│   ├── linux-x64-musl/
│   ├── win32-arm64-msvc/
│   └── win32-x64-msvc/
├── __tests__/
│   ├── basic.test.ts
│   ├── async.test.ts
│   └── performance.test.ts
└── examples/
    ├── basic.mjs
    ├── async.mjs
    └── streaming.mjs

3.2 Cargo Configuration

# crates/ruvector-attention-node/Cargo.toml
[package]
name = "ruvector-attention-node"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "Node.js bindings for RuVector attention mechanisms via NAPI-RS"

[lib]
crate-type = ["cdylib"]

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
napi = { version = "2.16", features = ["async", "napi8", "tokio_rt"] }
napi-derive = "2.16"
tokio = { version = "1.37", features = ["rt-multi-thread"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

[build-dependencies]
napi-build = "2.1"

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

3.3 NAPI API Design

3.3.1 Core Library

// crates/ruvector-attention-node/src/lib.rs
#![deny(clippy::all)]

#[macro_use]
extern crate napi_derive;

mod attention;
mod async_ops;
mod buffer;
mod error;
mod types;

pub use attention::*;
pub use error::NapiError;

#[napi]
pub fn get_version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}

#[napi]
pub fn get_supported_features() -> Vec<String> {
    vec![
        "scaled-dot-product".to_string(),
        "multi-head".to_string(),
        "hyperbolic".to_string(),
        "linear".to_string(),
        "cross-attention".to_string(),
        "self-attention".to_string(),
    ]
}

3.3.2 Scaled Dot-Product Attention

// crates/ruvector-attention-node/src/attention/scaled_dot.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::ScaledDotProduct as CoreScaledDotProduct;

#[napi]
pub struct ScaledDotProductAttention {
    inner: CoreScaledDotProduct,
}

#[napi]
impl ScaledDotProductAttention {
    #[napi(constructor)]
    pub fn new(dim: u32) -> Result<Self> {
        Ok(Self {
            inner: CoreScaledDotProduct::new(dim as usize),
        })
    }

    /// Synchronous forward pass
    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    /// Asynchronous forward pass (non-blocking)
    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    /// Batch forward pass
    #[napi]
    pub fn forward_batch(
        &self,
        queries: Vec<Float32Array>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Vec<Float32Array>> {
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        queries
            .into_iter()
            .map(|q| {
                let query_vec = q.to_vec();
                self.inner
                    .forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
                    .map(Float32Array::new)
                    .map_err(|e| Error::from_reason(e.to_string()))
            })
            .collect()
    }

    /// Async batch forward pass
    #[napi]
    pub async fn forward_batch_async(
        &self,
        queries: Vec<Float32Array>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Vec<Float32Array>> {
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        let query_vecs: Vec<Vec<f32>> = queries.into_iter()
            .map(|q| q.to_vec())
            .collect();

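        tokio::task::spawn_blocking(move || {
            query_vecs.into_iter()
                .map(|q| inner.forward(&q, &keys_vec, &values_vec, num_neighbors as usize))
                // napi's prelude shadows `Result`, so name the std type explicitly
                .collect::<std::result::Result<Vec<Vec<f32>>, _>>()
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        // Wrap the outputs in typed arrays only after leaving the blocking task
        .map(|outputs| outputs.into_iter().map(Float32Array::new).collect())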
    }

    #[napi(getter)]
    pub fn dim(&self) -> u32 {
        self.inner.dim() as u32
    }
}
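
The async methods above apply `map_err` twice because awaiting `spawn_blocking` yields a nested result: the outer layer reports a failure of the task itself, the inner layer reports a failure of the computation. The same shape can be reproduced with the standard library alone. In the sketch below, `forward` is a hypothetical stand-in for the core call and `std::thread` stands in for the Tokio blocking pool; neither is part of the documented API.

```rust
use std::thread;

/// Stand-in for the core's `forward`; hypothetical, for illustration only.
fn forward(query: &[f32], scale: f32) -> Result<Vec<f32>, String> {
    if query.is_empty() {
        return Err("empty query".to_string());
    }
    Ok(query.iter().map(|x| x * scale).collect())
}

/// Runs `forward` on another thread and flattens the two error layers
/// (worker failure, then computation failure) into one `String` error.
pub fn forward_off_thread(query: Vec<f32>, scale: f32) -> Result<Vec<f32>, String> {
    thread::spawn(move || forward(&query, scale))
        .join()
        // Outer layer: the worker itself panicked.
        // After `?`, what remains is the inner layer: the Result returned by `forward`.
        .map_err(|_| "worker thread panicked".to_string())?
}
```

Because the inputs are moved into the worker, they have to be owned values, which is why the bindings copy each `Float32Array` with `to_vec()` before spawning.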

3.3.3 Multi-Head Attention

// crates/ruvector-attention-node/src/attention/multi_head.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::MultiHeadAttention as CoreMultiHeadAttention;

#[napi(object)]
pub struct AttentionConfig {
    pub num_heads: u32,
    pub hidden_dim: u32,
    pub dropout: f64,
}

#[napi]
pub struct MultiHeadAttention {
    inner: CoreMultiHeadAttention,
}

#[napi]
impl MultiHeadAttention {
    #[napi(constructor)]
    pub fn new(config: AttentionConfig) -> Result<Self> {
        let inner = CoreMultiHeadAttention::new(
            config.num_heads as usize,
            config.hidden_dim as usize,
            config.dropout as f32,
        )
        .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Self { inner })
    }

    #[napi(factory)]
    pub fn from_params(num_heads: u32, hidden_dim: u32, dropout: Option<f64>) -> Result<Self> {
        Self::new(AttentionConfig {
            num_heads,
            hidden_dim,
            dropout: dropout.unwrap_or(0.0),
        })
    }

    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec, mask.as_deref())
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec, mask.as_deref())
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    #[napi(getter)]
    pub fn num_heads(&self) -> u32 {
        self.inner.num_heads() as u32
    }

    #[napi(getter)]
    pub fn hidden_dim(&self) -> u32 {
        self.inner.hidden_dim() as u32
    }

    #[napi(getter)]
    pub fn head_dim(&self) -> u32 {
        (self.inner.hidden_dim() / self.inner.num_heads()) as u32
    }
}
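
The `head_dim` getter relies on integer division, which would silently truncate if `hidden_dim` were not a multiple of `num_heads`. The core constructor returns a `Result`, so it presumably rejects that case, though the spec does not say so. A minimal sketch of the check and the resulting per-head split follows; `split_heads` is a hypothetical helper, not part of the documented API.

```rust
/// Splits a `hidden_dim`-length vector into `num_heads` contiguous slices,
/// rejecting sizes that do not divide evenly.
pub fn split_heads(x: &[f32], num_heads: usize) -> Result<Vec<&[f32]>, String> {
    if num_heads == 0 {
        return Err("num_heads must be at least 1".to_string());
    }
    if x.is_empty() {
        return Err("input vector must be non-empty".to_string());
    }
    if x.len() % num_heads != 0 {
        return Err(format!(
            "hidden_dim {} is not divisible by num_heads {}",
            x.len(),
            num_heads
        ));
    }

    // Each head sees a contiguous `head_dim`-sized window of the input.
    let head_dim = x.len() / num_heads;
    Ok(x.chunks_exact(head_dim).collect())
}
```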

3.3.4 Hyperbolic Attention

// crates/ruvector-attention-node/src/attention/hyperbolic.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::HyperbolicAttention as CoreHyperbolicAttention;

#[napi]
pub struct HyperbolicAttention {
    inner: CoreHyperbolicAttention,
}

#[napi]
impl HyperbolicAttention {
    #[napi(constructor)]
    pub fn new(curvature: f64) -> Result<Self> {
        Ok(Self {
            inner: CoreHyperbolicAttention::new(curvature as f32),
        })
    }

    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec)
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec)
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    #[napi]
    pub fn poincare_distance(&self, x: Float32Array, y: Float32Array) -> Result<f64> {
        let x_vec = x.to_vec();
        let y_vec = y.to_vec();

        self.inner
            .poincare_distance(&x_vec, &y_vec)
            .map(|d| d as f64)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi]
    pub fn to_poincare(&self, x: Float32Array) -> Result<Float32Array> {
        let x_vec = x.to_vec();

        self.inner
            .to_poincare(&x_vec)
            .map(Float32Array::new)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi]
    pub fn from_poincare(&self, x: Float32Array) -> Result<Float32Array> {
        let x_vec = x.to_vec();

        self.inner
            .from_poincare(&x_vec)
            .map(Float32Array::new)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi(getter)]
    pub fn curvature(&self) -> f64 {
        self.inner.curvature() as f64
    }
}

3.4 Package Configuration

3.4.1 package.json

{
  "name": "ruvector-attention",
  "version": "0.1.0",
  "description": "High-performance attention mechanisms for Node.js via Rust/NAPI-RS",
  "main": "index.js",
  "types": "index.d.ts",
  "keywords": [
    "attention",
    "machine-learning",
    "rust",
    "napi",
    "native",
    "performance"
  ],
  "license": "MIT OR Apache-2.0",
  "author": "RuVector Team",
  "repository": {
    "type": "git",
    "url": "https://github.com/yourusername/ruvector"
  },
  "engines": {
    "node": ">= 18"
  },
  "napi": {
    "name": "ruvector-attention",
    "triples": {
      "defaults": true,
      "additional": [
        "aarch64-apple-darwin",
        "aarch64-unknown-linux-gnu",
        "aarch64-unknown-linux-musl",
        "armv7-unknown-linux-gnueabihf",
        "x86_64-unknown-linux-musl",
        "aarch64-pc-windows-msvc"
      ]
    }
  },
  "scripts": {
    "artifacts": "napi artifacts",
    "build": "napi build --platform --release",
    "build:debug": "napi build --platform",
    "prepublishOnly": "napi prepublish -t npm",
    "test": "jest",
    "test:coverage": "jest --coverage",
    "version": "napi version",
    "bench": "node benches/benchmark.mjs"
  },
  "devDependencies": {
    "@napi-rs/cli": "^2.18.0",
    "@types/node": "^20.11.0",
    "jest": "^29.7.0",
    "typescript": "^5.3.3"
  },
  "optionalDependencies": {
    "ruvector-attention-darwin-arm64": "0.1.0",
    "ruvector-attention-darwin-x64": "0.1.0",
    "ruvector-attention-linux-arm64-gnu": "0.1.0",
    "ruvector-attention-linux-arm64-musl": "0.1.0",
    "ruvector-attention-linux-x64-gnu": "0.1.0",
    "ruvector-attention-linux-x64-musl": "0.1.0",
    "ruvector-attention-win32-arm64-msvc": "0.1.0",
    "ruvector-attention-win32-x64-msvc": "0.1.0"
  }
}
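
The relationship between the Rust target triples in `napi.triples` and the package names in `optionalDependencies` can be made explicit with a small lookup. The sketch below covers only the packages listed above; the function names `npm_suffix` and `npm_package` are illustrative and not part of the NAPI-RS tooling.

```rust
/// Maps a Rust target triple to the platform suffix used by the packages in
/// `optionalDependencies`. Triples without a listed package return `None`.
pub fn npm_suffix(triple: &str) -> Option<&'static str> {
    match triple {
        "aarch64-apple-darwin" => Some("darwin-arm64"),
        "x86_64-apple-darwin" => Some("darwin-x64"),
        "aarch64-unknown-linux-gnu" => Some("linux-arm64-gnu"),
        "aarch64-unknown-linux-musl" => Some("linux-arm64-musl"),
        "x86_64-unknown-linux-gnu" => Some("linux-x64-gnu"),
        "x86_64-unknown-linux-musl" => Some("linux-x64-musl"),
        "aarch64-pc-windows-msvc" => Some("win32-arm64-msvc"),
        "x86_64-pc-windows-msvc" => Some("win32-x64-msvc"),
        _ => None,
    }
}

/// Full package name for a triple, e.g. `ruvector-attention-darwin-arm64`.
pub fn npm_package(triple: &str) -> Option<String> {
    npm_suffix(triple).map(|suffix| format!("ruvector-attention-{suffix}"))
}
```

Note that `armv7-unknown-linux-gnueabihf` appears under `additional` triples but has no matching entry in `optionalDependencies` or under `npm/`, so this lookup returns `None` for it.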

3.4.2 TypeScript Definitions

// index.d.ts
export function getVersion(): string;
export function getSupportedFeatures(): string[];

export interface AttentionConfig {
  numHeads: number;
  hiddenDim: number;
  dropout: number;
}

export class ScaledDotProductAttention {
  constructor(dim: number);

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Promise<Float32Array>;

  forwardBatch(
    queries: Float32Array[],
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Float32Array[];

  forwardBatchAsync(
    queries: Float32Array[],
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Promise<Float32Array[]>;

  readonly dim: number;
}

export class MultiHeadAttention {
  constructor(config: AttentionConfig);

  static fromParams(
    numHeads: number,
    hiddenDim: number,
    dropout?: number
  ): MultiHeadAttention;

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    mask?: boolean[]
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    mask?: boolean[]
  ): Promise<Float32Array>;

  readonly numHeads: number;
  readonly hiddenDim: number;
  readonly headDim: number;
}

export class HyperbolicAttention {
  constructor(curvature: number);

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array
  ): Promise<Float32Array>;

  poincareDistance(x: Float32Array, y: Float32Array): number;
  toPoincare(x: Float32Array): Float32Array;
  fromPoincare(x: Float32Array): Float32Array;

  readonly curvature: number;
}

// Export all attention types
export { ScaledDotProductAttention as ScaledDotProduct };
export { MultiHeadAttention as MultiHead };
export { HyperbolicAttention as Hyperbolic };

3.5 Build and Deployment

3.5.1 GitHub Actions Workflow

# .github/workflows/napi.yml
name: NAPI Build and Release

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        settings:
          - host: macos-latest
            target: x86_64-apple-darwin
            build: pnpm build --target x86_64-apple-darwin
          - host: macos-latest
            target: aarch64-apple-darwin
            build: pnpm build --target aarch64-apple-darwin
          - host: windows-latest
            target: x86_64-pc-windows-msvc
            build: pnpm build --target x86_64-pc-windows-msvc
          - host: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            build: pnpm build --target x86_64-unknown-linux-gnu
          - host: ubuntu-latest
            target: x86_64-unknown-linux-musl
            build: pnpm build --target x86_64-unknown-linux-musl
          - host: ubuntu-latest
            target: aarch64-unknown-linux-gnu
            build: pnpm build --target aarch64-unknown-linux-gnu

    name: Build ${{ matrix.settings.target }}
    runs-on: ${{ matrix.settings.host }}

    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.settings.target }}

      - name: Install dependencies
        run: pnpm install

      - name: Build
        run: ${{ matrix.settings.build }}

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: bindings-${{ matrix.settings.target }}
          path: crates/ruvector-attention-node/*.node

4. CLI Interface

4.1 Project Structure

crates/ruvector-attention-cli/
├── Cargo.toml
├── README.md
├── src/
│   ├── main.rs                # CLI entry point
│   ├── commands/
│   │   ├── mod.rs
│   │   ├── compute.rs         # Compute attention
│   │   ├── benchmark.rs       # Benchmarking
│   │   ├── convert.rs         # Model conversion
│   │   ├── serve.rs           # HTTP server
│   │   └── repl.rs            # Interactive REPL
│   ├── config.rs              # Configuration management
│   ├── format.rs              # Input/output formats
│   ├── server/
│   │   ├── mod.rs
│   │   ├── handlers.rs
│   │   └── middleware.rs
│   └── utils.rs
├── tests/
│   ├── integration.rs
│   └── cli.rs
└── examples/
    ├── config.toml
    └── sample_data/

4.2 Cargo Configuration

# crates/ruvector-attention-cli/Cargo.toml
[package]
name = "ruvector-attention-cli"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "CLI for RuVector attention mechanisms"

[[bin]]
name = "ruvector-attention"
path = "src/main.rs"

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
clap = { version = "4.5", features = ["derive", "cargo"] }
tokio = { version = "1.37", features = ["full"] }
axum = "0.7"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
toml = "0.8"
anyhow = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
indicatif = "0.17"
comfy-table = "7.1"
colored = "2.1"
rustyline = "14.0"
rand = "0.8"  # used by the benchmark command to generate random inputs

[dev-dependencies]
assert_cmd = "2.0"
predicates = "3.1"
tempfile = "3.10"

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

4.3 CLI Design

4.3.1 Main CLI Structure

// src/main.rs
use clap::{Parser, Subcommand};
use anyhow::Result;

mod commands;
mod config;
mod format;
mod server;
mod utils;

#[derive(Parser)]
#[command(name = "ruvector-attention")]
#[command(version, about, long_about = None)]
struct Cli {
    #[command(subcommand)]
    command: Commands,

    /// Enable verbose logging
    #[arg(short, long, global = true)]
    verbose: bool,

    /// Configuration file
    #[arg(short, long, global = true)]
    config: Option<String>,
}

#[derive(Subcommand)]
enum Commands {
    /// Compute attention for given inputs
    Compute(commands::compute::ComputeArgs),

    /// Run benchmarks
    Benchmark(commands::benchmark::BenchmarkArgs),

    /// Convert between model formats
    Convert(commands::convert::ConvertArgs),

    /// Start HTTP server
    Serve(commands::serve::ServeArgs),

    /// Interactive REPL
    Repl(commands::repl::ReplArgs),
}

#[tokio::main]
async fn main() -> Result<()> {
    let cli = Cli::parse();

    // Initialize logging
    let log_level = if cli.verbose { "debug" } else { "info" };
    tracing_subscriber::fmt()
        .with_env_filter(log_level)
        .init();

    // Execute command
    match cli.command {
        Commands::Compute(args) => commands::compute::run(args).await,
        Commands::Benchmark(args) => commands::benchmark::run(args).await,
        Commands::Convert(args) => commands::convert::run(args).await,
        Commands::Serve(args) => commands::serve::run(args).await,
        Commands::Repl(args) => commands::repl::run(args).await,
    }
}

4.3.2 Compute Command

// src/commands/compute.rs
use clap::Args;
use anyhow::{Result, Context};
use ruvector_attention::*;
use std::path::PathBuf;

#[derive(Args)]
pub struct ComputeArgs {
    /// Attention type (scaled-dot-product, multi-head, hyperbolic; linear is not yet handled by this command)
    #[arg(short, long)]
    attention_type: String,

    /// Query vector file (binary f32)
    #[arg(short, long)]
    query: PathBuf,

    /// Keys file (binary f32)
    #[arg(short, long)]
    keys: PathBuf,

    /// Values file (binary f32)
    #[arg(short, long)]
    values: PathBuf,

    /// Output file
    #[arg(short, long)]
    output: PathBuf,

    /// Number of neighbors (used by scaled-dot-product attention)
    #[arg(long, default_value = "100")]
    neighbors: usize,

    /// Number of heads (for multi-head attention)
    #[arg(long, default_value = "8")]
    num_heads: usize,

    /// Hidden dimension
    #[arg(long, default_value = "128")]
    hidden_dim: usize,
}

pub async fn run(args: ComputeArgs) -> Result<()> {
    println!("Computing {} attention...", args.attention_type);

    // Load inputs
    let query = load_f32_binary(&args.query)
        .context("Failed to load query")?;
    let keys = load_f32_binary(&args.keys)
        .context("Failed to load keys")?;
    let values = load_f32_binary(&args.values)
        .context("Failed to load values")?;

    // Compute attention
    let result = match args.attention_type.as_str() {
        "scaled-dot-product" => {
            let attention = ScaledDotProduct::new(args.hidden_dim);
            attention.forward(&query, &keys, &values, args.neighbors)?
        },
        "multi-head" => {
            let attention = MultiHeadAttention::new(
                args.num_heads,
                args.hidden_dim,
                0.0,
            )?;
            attention.forward(&query, &keys, &values, None)?
        },
        "hyperbolic" => {
            let attention = HyperbolicAttention::new(1.0);
            attention.forward(&query, &keys, &values)?
        },
        _ => anyhow::bail!("Unknown attention type: {}", args.attention_type),
    };

    // Save output
    save_f32_binary(&args.output, &result)
        .context("Failed to save output")?;

    println!("✓ Attention computed successfully");
    println!("  Output shape: {}", result.len());
    println!("  Saved to: {}", args.output.display());

    Ok(())
}

fn load_f32_binary(path: &PathBuf) -> Result<Vec<f32>> {
    let bytes = std::fs::read(path)?;
    let floats: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect();
    Ok(floats)
}

fn save_f32_binary(path: &PathBuf, data: &[f32]) -> Result<()> {
    let bytes: Vec<u8> = data.iter()
        .flat_map(|f| f.to_le_bytes())
        .collect();
    std::fs::write(path, bytes)?;
    Ok(())
}
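
Note that `chunks_exact(4)` in `load_f32_binary` silently drops any trailing bytes, so a truncated input file is accepted without complaint. A minimal, std-only sketch of a stricter codec is shown below. The names `decode_f32_le`, `encode_f32_le`, and `TrailingBytes` are hypothetical and not part of the crate; the sketch assumes the same little-endian `f32` layout used above.

```rust
use std::fmt;

/// Hypothetical error: the buffer length is not a multiple of 4 bytes.
#[derive(Debug, PartialEq)]
pub struct TrailingBytes {
    pub total_len: usize,
    pub leftover: usize,
}

impl fmt::Display for TrailingBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "buffer of {} bytes has {} trailing byte(s) that do not form an f32",
            self.total_len, self.leftover
        )
    }
}

impl std::error::Error for TrailingBytes {}

/// Decode little-endian bytes into f32 values, rejecting partial trailing values.
pub fn decode_f32_le(bytes: &[u8]) -> Result<Vec<f32>, TrailingBytes> {
    let chunks = bytes.chunks_exact(4);
    let leftover = chunks.remainder().len();
    if leftover != 0 {
        return Err(TrailingBytes {
            total_len: bytes.len(),
            leftover,
        });
    }
    Ok(chunks
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Encode f32 values as little-endian bytes (inverse of `decode_f32_le`).
pub fn encode_f32_le(data: &[f32]) -> Vec<u8> {
    data.iter().flat_map(|f| f.to_le_bytes()).collect()
}
```

In the compute command, `decode_f32_le(&std::fs::read(path)?)` could stand in for the body of `load_f32_binary`, turning a corrupt file into an explicit error instead of a shorter vector.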

4.3.3 Benchmark Command

// src/commands/benchmark.rs
use clap::Args;
use anyhow::Result;
use ruvector_attention::*;
use std::path::PathBuf;
use std::time::Instant;
use comfy_table::{Table, presets::UTF8_FULL};
use colored::Colorize;

#[derive(Args)]
pub struct BenchmarkArgs {
    /// Attention types to benchmark (comma-separated)
    #[arg(long, default_value = "scaled-dot-product,multi-head,hyperbolic")]
    types: String,

    /// Dimensions to test (comma-separated)
    #[arg(long, default_value = "128,256,512")]
    dims: String,

    /// Number of neighbors to test
    #[arg(long, default_value = "100,500,1000")]
    neighbors: String,

    /// Number of iterations
    #[arg(long, default_value = "1000")]
    iterations: usize,

    /// Output file for results (JSON)
    #[arg(short, long)]
    output: Option<PathBuf>,
}

pub async fn run(args: BenchmarkArgs) -> Result<()> {
    println!("{}", "Running Benchmarks...".bold().green());

    let types: Vec<&str> = args.types.split(',').collect();
    let dims: Vec<usize> = args.dims.split(',')
        .filter_map(|s| s.parse().ok())
        .collect();
    let neighbors: Vec<usize> = args.neighbors.split(',')
        .filter_map(|s| s.parse().ok())
        .collect();

    let mut table = Table::new();
    table.load_preset(UTF8_FULL);
    table.set_header(vec!["Type", "Dim", "Neighbors", "Avg Time (ms)", "Throughput (ops/s)"]);

    for attention_type in &types {
        for &dim in &dims {
            for &k in &neighbors {
                let avg_time = benchmark_attention(
                    attention_type,
                    dim,
                    k,
                    args.iterations,
                )?;

                let throughput = 1000.0 / avg_time;

                table.add_row(vec![
                    attention_type.to_string(),
                    dim.to_string(),
                    k.to_string(),
                    format!("{:.3}", avg_time),
                    format!("{:.0}", throughput),
                ]);
            }
        }
    }

    println!("\n{}", table);

    Ok(())
}

fn benchmark_attention(
    attention_type: &str,
    dim: usize,
    num_neighbors: usize,
    iterations: usize,
) -> Result<f64> {
    // Generate random data
    let query: Vec<f32> = (0..dim).map(|_| rand::random()).collect();
    let keys: Vec<f32> = (0..dim * num_neighbors).map(|_| rand::random()).collect();
    let values: Vec<f32> = (0..dim * num_neighbors).map(|_| rand::random()).collect();

    let start = Instant::now();

    for _ in 0..iterations {
        match attention_type {
            "scaled-dot-product" => {
                let attention = ScaledDotProduct::new(dim);
                let _ = attention.forward(&query, &keys, &values, num_neighbors)?;
            },
            "multi-head" => {
                let attention = MultiHeadAttention::new(8, dim, 0.0)?;
                let _ = attention.forward(&query, &keys, &values, None)?;
            },
            "hyperbolic" => {
                let attention = HyperbolicAttention::new(1.0);
                let _ = attention.forward(&query, &keys, &values)?;
            },
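            // Fail loudly instead of timing an empty loop for a mistyped name
            other => anyhow::bail!("Unknown attention type: {}", other),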
        }
    }

    let elapsed = start.elapsed();
    let avg_ms = elapsed.as_secs_f64() * 1000.0 / iterations as f64;

    Ok(avg_ms)
}
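
Two details of the benchmark command are worth tightening: `filter_map(|s| s.parse().ok())` silently discards mistyped list entries, and the timing loop includes cold-start iterations. The following std-only sketch illustrates one way to address both. The helper names (`parse_usize_list`, `time_avg_ms`, `throughput_ops_per_sec`) are hypothetical, and the warm-up count is an assumption to tune per workload.

```rust
use std::time::Instant;

/// Parse a comma-separated list such as "128,256,512".
/// Any invalid entry is reported as an error rather than skipped.
pub fn parse_usize_list(input: &str) -> Result<Vec<usize>, String> {
    input
        .split(',')
        .map(|s| {
            let s = s.trim();
            s.parse::<usize>()
                .map_err(|e| format!("invalid value {:?}: {}", s, e))
        })
        .collect()
}

/// Run `op` `warmup` times untimed, then `iterations` times timed,
/// and return the average wall-clock time per timed iteration in milliseconds.
pub fn time_avg_ms<F: FnMut()>(warmup: usize, iterations: usize, mut op: F) -> f64 {
    assert!(iterations > 0, "iterations must be greater than zero");
    for _ in 0..warmup {
        op();
    }
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed().as_secs_f64() * 1000.0 / iterations as f64
}

/// Convert an average latency in milliseconds to operations per second.
pub fn throughput_ops_per_sec(avg_ms: f64) -> f64 {
    1000.0 / avg_ms
}
```

Constructing the attention object once outside the closure passed to `time_avg_ms` would also keep setup cost out of the measurement, if the intent is to time `forward` alone.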

4.3.4 Server Command

// src/commands/serve.rs
use clap::Args;
use anyhow::Result;
use axum::{
    routing::{get, post},
    Router, Json,
    http::StatusCode,
};
use serde::{Deserialize, Serialize};
use ruvector_attention::*;
use std::net::SocketAddr;

#[derive(Args)]
pub struct ServeArgs {
    /// Server host
    #[arg(long, default_value = "0.0.0.0")]
    host: String,

    /// Server port
    #[arg(short, long, default_value = "8080")]
    port: u16,

    /// Maximum batch size
    #[arg(long, default_value = "256")]
    max_batch_size: usize,
}

#[derive(Deserialize)]
struct AttentionRequest {
    attention_type: String,
    query: Vec<f32>,
    keys: Vec<f32>,
    values: Vec<f32>,
    num_neighbors: Option<usize>,
    num_heads: Option<usize>,
    hidden_dim: Option<usize>,
}

#[derive(Serialize)]
struct AttentionResponse {
    result: Vec<f32>,
    computation_time_ms: f64,
}

pub async fn run(args: ServeArgs) -> Result<()> {
    let app = Router::new()
        .route("/", get(health_check))
        .route("/attention", post(compute_attention))
        .route("/health", get(health_check));

    let addr: SocketAddr = format!("{}:{}", args.host, args.port).parse()?;

    println!("🚀 Server listening on http://{}", addr);

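    // axum 0.7 removed `axum::Server`; bind a Tokio listener and use `axum::serve`
    let listener = tokio::net::TcpListener::bind(addr).await?;
    axum::serve(listener, app).await?;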

    Ok(())
}

async fn health_check() -> &'static str {
    "OK"
}

async fn compute_attention(
    Json(req): Json<AttentionRequest>,
) -> Result<Json<AttentionResponse>, StatusCode> {
    let start = std::time::Instant::now();

    let result = match req.attention_type.as_str() {
        "scaled-dot-product" => {
            let dim = req.hidden_dim.unwrap_or(128);
            let k = req.num_neighbors.unwrap_or(100);
            let attention = ScaledDotProduct::new(dim);
            attention.forward(&req.query, &req.keys, &req.values, k)
                .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
        },
        _ => return Err(StatusCode::BAD_REQUEST),
    };

    let elapsed = start.elapsed().as_secs_f64() * 1000.0;

    Ok(Json(AttentionResponse {
        result,
        computation_time_ms: elapsed,
    }))
}
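
The handler above maps every failure to `500 Internal Server Error`, including malformed client input. A small shape check run before the computation would let the server answer `400 Bad Request` for those cases. The sketch below is std-only and hypothetical (`ShapeError` and `validate_shapes` are not part of the crate); it assumes the flat, row-major layout implied by the benchmark code, where `keys` and `values` each hold `n * dim` elements.

```rust
/// Hypothetical validation errors for a single-query attention request.
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    ZeroDim,
    QueryLen { expected: usize, actual: usize },
    KeysNotMultipleOfDim { len: usize, dim: usize },
    KeyValueMismatch { keys: usize, values: usize },
}

/// Check that the buffers are consistent with `dim`.
/// On success, returns the number of key/value rows.
pub fn validate_shapes(
    query: &[f32],
    keys: &[f32],
    values: &[f32],
    dim: usize,
) -> Result<usize, ShapeError> {
    if dim == 0 {
        return Err(ShapeError::ZeroDim);
    }
    if query.len() != dim {
        return Err(ShapeError::QueryLen {
            expected: dim,
            actual: query.len(),
        });
    }
    if keys.len() % dim != 0 {
        return Err(ShapeError::KeysNotMultipleOfDim {
            len: keys.len(),
            dim,
        });
    }
    if keys.len() != values.len() {
        return Err(ShapeError::KeyValueMismatch {
            keys: keys.len(),
            values: values.len(),
        });
    }
    Ok(keys.len() / dim)
}
```

In the handler, a failed check could be mapped with `.map_err(|_| StatusCode::BAD_REQUEST)?`, reserving the 500 status for genuine computation failures.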

4.4 Configuration File

# config.toml
[attention]
default_type = "multi-head"
num_heads = 8
hidden_dim = 128
dropout = 0.1

[optimization]
use_simd = true
num_threads = 4
batch_size = 64

[server]
host = "0.0.0.0"
port = 8080
max_batch_size = 256
timeout_ms = 5000
max_connections = 1000

[logging]
level = "info"
format = "json"
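
The values in the `[attention]` table interact: multi-head attention conventionally splits `hidden_dim` evenly across heads, so a configuration is worth validating at load time. The sketch below is a std-only illustration. `AttentionSection` is a hypothetical struct mirroring the file's keys, and the divisibility and dropout-range rules are assumptions based on common practice rather than confirmed crate behavior.

```rust
/// Hypothetical mirror of the `[attention]` table in `config.toml`.
#[derive(Debug, Clone, PartialEq)]
pub struct AttentionSection {
    pub num_heads: usize,
    pub hidden_dim: usize,
    pub dropout: f32,
}

impl AttentionSection {
    /// Check the assumed invariants and return the per-head dimension.
    pub fn validate(&self) -> Result<usize, String> {
        if self.num_heads == 0 {
            return Err("num_heads must be greater than zero".to_string());
        }
        if self.hidden_dim == 0 {
            return Err("hidden_dim must be greater than zero".to_string());
        }
        if self.hidden_dim % self.num_heads != 0 {
            return Err(format!(
                "hidden_dim ({}) must be divisible by num_heads ({})",
                self.hidden_dim, self.num_heads
            ));
        }
        // NaN fails this check as well, because `contains` returns false for it.
        if !(0.0..1.0).contains(&self.dropout) {
            return Err(format!("dropout ({}) must be in [0, 1)", self.dropout));
        }
        Ok(self.hidden_dim / self.num_heads)
    }
}
```

With the values from the file above (`num_heads = 8`, `hidden_dim = 128`, `dropout = 0.1`), `validate` returns a per-head dimension of 16.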

5. SDK Design

5.1 High-Level Rust SDK

// src/sdk/mod.rs
use crate::*;

pub mod prelude {
    pub use super::{
        AttentionBuilder,
        AttentionType,
        AttentionConfig,
    };
    pub use crate::{
        ScaledDotProduct,
        MultiHeadAttention,
        HyperbolicAttention,
    };
}

#[derive(Debug, Clone)]
pub enum AttentionType {
    ScaledDotProduct { neighbors: usize },
    MultiHead { num_heads: usize },
    Hyperbolic { curvature: f32 },
    Linear,
    Auto, // Auto-select based on input
}

#[derive(Debug, Clone)]
pub struct AttentionConfig {
    pub hidden_dim: usize,
    pub dropout: f32,
    pub use_simd: bool,
}

impl Default for AttentionConfig {
    fn default() -> Self {
        Self {
            hidden_dim: 128,
            dropout: 0.0,
            use_simd: true,
        }
    }
}

pub struct AttentionBuilder {
    attention_type: Option<AttentionType>,
    config: AttentionConfig,
}

impl AttentionBuilder {
    pub fn new() -> Self {
        Self {
            attention_type: None,
            config: AttentionConfig::default(),
        }
    }

    pub fn attention_type(mut self, att_type: AttentionType) -> Self {
        self.attention_type = Some(att_type);
        self
    }

    pub fn hidden_dim(mut self, dim: usize) -> Self {
        self.config.hidden_dim = dim;
        self
    }

    pub fn dropout(mut self, dropout: f32) -> Self {
        self.config.dropout = dropout;
        self
    }

    pub fn build(self) -> Result<Box<dyn Attention>> {
        let att_type = self.attention_type
            .ok_or_else(|| anyhow::anyhow!("Attention type not specified"))?;

        let attention: Box<dyn Attention> = match att_type {
            // `neighbors` is not consumed by the constructor, so it is ignored here
            AttentionType::ScaledDotProduct { .. } => {
                Box::new(ScaledDotProduct::new(self.config.hidden_dim))
            },
            AttentionType::MultiHead { num_heads } => {
                Box::new(MultiHeadAttention::new(
                    num_heads,
                    self.config.hidden_dim,
                    self.config.dropout,
                )?)
            },
            AttentionType::Hyperbolic { curvature } => {
                Box::new(HyperbolicAttention::new(curvature))
            },
            AttentionType::Auto => {
                // Auto-select based on configuration
                Box::new(MultiHeadAttention::new(8, self.config.hidden_dim, 0.0)?)
            },
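            // Return an error rather than panicking for variants without a backend yet
            AttentionType::Linear => {
                return Err(anyhow::anyhow!("Linear attention is not yet supported by the builder").into());
            },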
        };

        Ok(attention)
    }
}

// Common trait for all attention mechanisms
pub trait Attention {
    fn forward(
        &self,
        query: &[f32],
        keys: &[f32],
        values: &[f32],
    ) -> Result<Vec<f32>>;
}
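
To make the contract of `Attention::forward` concrete, here is a minimal, std-only reference for single-query scaled dot-product attention over flat buffers. It is a sketch, not the crate's implementation: the free function `scaled_dot_product` is hypothetical, and the row-major layout (`keys` and `values` holding `n * dim` elements for a query of length `dim`) is assumed from the benchmark and test code in this document.

```rust
/// Reference single-query scaled dot-product attention.
/// `query` has `dim` elements; `keys` and `values` each have `n * dim` (row-major).
pub fn scaled_dot_product(
    query: &[f32],
    keys: &[f32],
    values: &[f32],
) -> Result<Vec<f32>, String> {
    let dim = query.len();
    if dim == 0 || keys.is_empty() || keys.len() % dim != 0 || keys.len() != values.len() {
        return Err("inconsistent query/keys/values shapes".to_string());
    }

    // score_i = (q . k_i) / sqrt(dim)
    let scale = 1.0 / (dim as f32).sqrt();
    let scores: Vec<f32> = keys
        .chunks_exact(dim)
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();

    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Output is the softmax-weighted sum of the value rows.
    let mut out = vec![0.0f32; dim];
    for (w, v) in exps.iter().zip(values.chunks_exact(dim)) {
        let w = w / sum;
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    Ok(out)
}
```

When all keys are identical the weights are uniform, so the output is the mean of the value rows; this makes a convenient sanity check for any platform binding.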

5.2 JavaScript/TypeScript SDK

// sdk/typescript/src/index.ts
import {
    ScaledDotProductAttention,
    MultiHeadAttention,
    HyperbolicAttention,
} from 'ruvector-attention';

export enum AttentionType {
    ScaledDotProduct = 'scaled-dot-product',
    MultiHead = 'multi-head',
    Hyperbolic = 'hyperbolic',
    Linear = 'linear',
    Auto = 'auto',
}

export interface AttentionConfig {
    type: AttentionType;
    hiddenDim: number;
    dropout?: number;
    numHeads?: number;
    neighbors?: number;
    curvature?: number;
}

export class Attention {
    private inner: any;
    private config: AttentionConfig;

    constructor(config: AttentionConfig) {
        this.config = config;
        this.inner = this.createAttention(config);
    }

    private createAttention(config: AttentionConfig) {
        switch (config.type) {
            case AttentionType.ScaledDotProduct:
                return new ScaledDotProductAttention(config.hiddenDim);

            case AttentionType.MultiHead:
                return MultiHeadAttention.fromParams(
                    config.numHeads || 8,
                    config.hiddenDim,
                    config.dropout || 0.0
                );

            case AttentionType.Hyperbolic:
                return new HyperbolicAttention(config.curvature || 1.0);

            case AttentionType.Auto:
                // Auto-select based on input size
                return MultiHeadAttention.fromParams(8, config.hiddenDim, 0.0);

            default:
                throw new Error(`Unknown attention type: ${config.type}`);
        }
    }

    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        options?: { mask?: boolean[], neighbors?: number }
    ): Float32Array {
        if (this.config.type === AttentionType.ScaledDotProduct) {
            return this.inner.forward(
                query,
                keys,
                values,
                options?.neighbors ?? this.config.neighbors ?? 100
            );
        } else {
            return this.inner.forward(query, keys, values, options?.mask);
        }
    }

    async forwardAsync(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        options?: { mask?: boolean[], neighbors?: number }
    ): Promise<Float32Array> {
        if (this.config.type === AttentionType.ScaledDotProduct) {
            return this.inner.forwardAsync(
                query,
                keys,
                values,
                options?.neighbors ?? this.config.neighbors ?? 100
            );
        } else {
            return this.inner.forwardAsync(query, keys, values, options?.mask);
        }
    }

    // Streaming API for large inputs
    async *forwardStream(
        queryStream: AsyncIterable<Float32Array>,
        keys: Float32Array,
        values: Float32Array
    ): AsyncGenerator<Float32Array> {
        for await (const query of queryStream) {
            yield await this.forwardAsync(query, keys, values);
        }
    }
}

// Builder pattern
export class AttentionBuilder {
    private config: Partial<AttentionConfig> = {};

    type(type: AttentionType, options?: {
        numHeads?: number,
        neighbors?: number,
        curvature?: number
    }): this {
        this.config.type = type;
        if (options?.numHeads) this.config.numHeads = options.numHeads;
        if (options?.neighbors) this.config.neighbors = options.neighbors;
        if (options?.curvature) this.config.curvature = options.curvature;
        return this;
    }

    hiddenDim(dim: number): this {
        this.config.hiddenDim = dim;
        return this;
    }

    dropout(dropout: number): this {
        this.config.dropout = dropout;
        return this;
    }

    build(): Attention {
        if (!this.config.type) {
            throw new Error('Attention type not specified');
        }
        if (!this.config.hiddenDim) {
            throw new Error('Hidden dimension not specified');
        }

        return new Attention(this.config as AttentionConfig);
    }
}

// Export everything
export {
    ScaledDotProductAttention,
    MultiHeadAttention,
    HyperbolicAttention,
};

6. Testing Strategy

6.1 Platform-Specific Tests

// WASM tests
#[cfg(target_arch = "wasm32")]
mod wasm_tests {
    use wasm_bindgen_test::*;

    #[wasm_bindgen_test]
    fn test_scaled_dot_product() {
        // Test WASM-specific behavior
    }
}

// NAPI tests
#[cfg(all(not(target_arch = "wasm32"), feature = "napi"))]
mod napi_tests {
    use napi::bindgen_prelude::*;

    #[test]
    fn test_napi_conversion() {
        // Test NAPI-specific behavior
    }
}

6.2 Integration Tests

// __tests__/integration.test.ts
import { AttentionBuilder, AttentionType } from 'ruvector-attention';

describe('Attention Integration Tests', () => {
    test('scaled dot-product attention', async () => {
        const attention = new AttentionBuilder()
            .type(AttentionType.ScaledDotProduct, { neighbors: 100 })
            .hiddenDim(128)
            .build();

        const query = new Float32Array(128).fill(1.0);
        const keys = new Float32Array(128 * 100).fill(0.5);
        const values = new Float32Array(128 * 100).fill(0.5);

        const result = await attention.forwardAsync(query, keys, values);

        expect(result).toBeInstanceOf(Float32Array);
        expect(result.length).toBe(128);
    });
});

7. Documentation

7.1 API Documentation

  • Rust: Generated via cargo doc
  • WASM: Auto-generated TypeScript definitions
  • Node.js: TypeScript definitions + JSDoc
  • CLI: Auto-generated from Clap

7.2 Examples

Each platform should include:

  • Basic usage examples
  • Advanced patterns
  • Performance optimization guides
  • Troubleshooting guides

8. Release Process

8.1 Version Management

  • Rust crate: cargo release
  • WASM package: wasm-pack publish
  • Node.js package: npm publish
  • CLI binary: GitHub Releases

8.2 Distribution

  • Rust: crates.io
  • WASM: npm (as ruvector-attention-wasm)
  • Node.js: npm (as ruvector-attention)
  • CLI: GitHub Releases, Homebrew, cargo install

9. Performance Targets

| Platform | Target | Notes |
|----------|--------|-------|
| Rust Native | Baseline | Reference implementation |
| WASM (Browser) | 60-80% of native | JS interop overhead |
| WASM (Server) | 70-90% of native | WASI optimizations |
| Node.js (NAPI) | 95-100% of native | Minimal overhead |
| CLI | 95-100% of native | Direct Rust |

10. Next Steps

  1. Implement WASM bindings (Week 1-2)
  2. Implement NAPI-RS bindings (Week 2-3)
  3. Build CLI interface (Week 3-4)
  4. Create SDK wrappers (Week 4)
  5. Write comprehensive tests (Week 5)
  6. Documentation and examples (Week 6)
  7. Release and distribution (Week 7)

Appendix A: Complete Build Commands

# Rust native
cargo build --release

# WASM (all targets)
./scripts/build-wasm.sh

# Node.js (all platforms)
cd crates/ruvector-attention-node
pnpm build --platform

# CLI
cargo build --release --bin ruvector-attention

# All at once
./scripts/build-all.sh

Appendix B: Platform-Specific Optimizations

WASM Optimizations

  • Enable wasm-opt for size reduction
  • Use SIMD128 where supported
  • Minimize JS/WASM boundary crossings

NAPI Optimizations

  • Use AsyncTask for CPU-intensive operations
  • Minimize allocations in hot paths
  • Leverage native Node.js buffers

CLI Optimizations

  • Use mimalloc for better allocation performance
  • Enable LTO and aggressive optimizations
  • Consider static linking for distribution

Document Status: Implementation Ready
Last Updated: 2025-11-30
Review Date: 2025-12-07