Files

ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

2026-02-28 14:39:40 -05:00

54 KiB

Raw Blame History

Platform Bindings Specification - RuVector Attention

Version: 1.0.0 Date: 2025-11-30 Status: Implementation Ready

Executive Summary

This document specifies comprehensive platform bindings for ruvector-attention, enabling deployment across Rust native, WebAssembly (browser and server), Node.js (via NAPI-RS), and providing CLI/SDK interfaces for maximum accessibility.

1. Platform Support Matrix

Platform	Target	Status	Priority	Notes
Rust Native	All Tier 1 targets	Primary	P0	Core implementation
WASM (Browser)	wasm32-unknown-unknown	Full	P0	Browser/Deno runtime
WASM (Server)	wasm32-wasi	Full	P1	Server-side WASM
Node.js 18 LTS	x86_64, arm64	Full	P0	Long-term support
Node.js 20 LTS	x86_64, arm64	Full	P0	Long-term support
Node.js 22	x86_64, arm64	Full	P1	Current release
Windows	x86_64-pc-windows-msvc	Full	P0	NAPI-RS bindings
macOS Intel	x86_64-apple-darwin	Full	P0	NAPI-RS bindings
macOS Apple Silicon	aarch64-apple-darwin	Full	P0	NAPI-RS bindings
Linux x64	x86_64-unknown-linux-gnu	Full	P0	NAPI-RS (glibc)
Linux x64 (musl)	x86_64-unknown-linux-musl	Full	P1	Alpine/static linking
Linux ARM64	aarch64-unknown-linux-gnu	Full	P1	NAPI-RS (glibc)
Linux ARM64 (musl)	aarch64-unknown-linux-musl	Full	P2	Alpine ARM
Linux ARMv7	armv7-unknown-linux-gnueabihf	Partial	P2	Raspberry Pi

1.1 Feature Matrix by Platform

Feature	Rust	WASM	Node.js	CLI
Scaled Dot-Product	✅	✅	✅	✅
Multi-Head Attention	✅	✅	✅	✅
Hyperbolic Attention	✅	✅	✅	✅
Linear Attention	✅	✅	✅	✅
Cross Attention	✅	✅	✅	✅
Self Attention	✅	✅	✅	✅
SIMD Optimizations	✅	⚠️	✅	✅
Async Processing	✅	✅	✅	N/A
Batch Operations	✅	✅	✅	✅
Streaming	✅	⚠️	✅	✅

⚠️ = Requires experimental flags or limited support

2. WASM Bindings

2.1 Project Structure

crates/ruvector-attention-wasm/
├── Cargo.toml
├── README.md
├── src/
│   ├── lib.rs                 # Main WASM exports
│   ├── attention/
│   │   ├── mod.rs
│   │   ├── scaled_dot.rs      # ScaledDotProduct WASM wrapper
│   │   ├── multi_head.rs      # MultiHead WASM wrapper
│   │   ├── hyperbolic.rs      # Hyperbolic WASM wrapper
│   │   ├── linear.rs          # Linear WASM wrapper
│   │   └── cross.rs           # Cross attention WASM wrapper
│   ├── utils.rs               # WASM utilities (panic hook, logging)
│   ├── error.rs               # WASM error handling
│   ├── types.rs               # JS-compatible types
│   └── async_ops.rs           # Async operations for WASM
├── pkg/                       # wasm-pack build output (gitignored)
├── web/                       # Browser examples
│   ├── index.html
│   ├── demo.js
│   └── worker.js             # Web Worker example
├── node/                      # Node.js WASM examples
│   └── example.mjs
├── tests/
│   ├── web.rs                # wasm-bindgen-test
│   └── node.rs
├── benches/
│   └── wasm_bench.rs
└── examples/
    ├── browser_basic.html
    ├── browser_worker.html
    └── node_server.mjs

2.2 Cargo Configuration

# crates/ruvector-attention-wasm/Cargo.toml
[package]
name = "ruvector-attention-wasm"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "WebAssembly bindings for RuVector attention mechanisms"
repository = "https://github.com/yourusername/ruvector"

[lib]
crate-type = ["cdylib", "rlib"]

[features]
default = ["console_error_panic_hook", "wee_alloc"]
simd = ["packed_simd"]
parallel = ["wasm-bindgen-rayon"]

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
wasm-bindgen = "0.2.92"
wasm-bindgen-futures = "0.4.42"
js-sys = "0.3.69"
web-sys = { version = "0.3.69", features = [
    "console",
    "Performance",
    "PerformanceTiming",
    "Window",
    "Worker",
] }
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.6"
console_error_panic_hook = { version = "0.1.7", optional = true }
wee_alloc = { version = "0.4.5", optional = true }
wasm-bindgen-rayon = { version = "1.2", optional = true }

[dev-dependencies]
wasm-bindgen-test = "0.3.42"
criterion = "0.5"

[profile.release]
# Optimize for size and speed
opt-level = "z"        # Optimize for size
lto = true             # Enable Link Time Optimization
codegen-units = 1      # Reduce parallel codegen for better optimization
panic = "abort"        # Remove panic formatting code
strip = true           # Strip symbols

[profile.release.package."*"]
opt-level = "z"

2.3 WASM API Design

2.3.1 Core Library (lib.rs)

// crates/ruvector-attention-wasm/src/lib.rs
use wasm_bindgen::prelude::*;

mod attention;
mod error;
mod types;
mod utils;

pub use attention::*;
pub use error::WasmError;
pub use types::*;

// Initialize WASM module
#[wasm_bindgen(start)]
pub fn init() {
    #[cfg(feature = "console_error_panic_hook")]
    console_error_panic_hook::set_once();

    #[cfg(feature = "wee_alloc")]
    {
        #[global_allocator]
        static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;
    }
}

// Version info
#[wasm_bindgen]
pub fn version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}

2.3.2 Scaled Dot-Product Attention

// crates/ruvector-attention-wasm/src/attention/scaled_dot.rs
use wasm_bindgen::prelude::*;
use js_sys::{Float32Array, Array};
use ruvector_attention::ScaledDotProduct;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmScaledDotProduct {
    inner: ScaledDotProduct,
}

#[wasm_bindgen]
impl WasmScaledDotProduct {
    /// Create a new scaled dot-product attention layer
    #[wasm_bindgen(constructor)]
    pub fn new(dim: usize) -> Result<WasmScaledDotProduct, WasmError> {
        Ok(Self {
            inner: ScaledDotProduct::new(dim),
        })
    }

    /// Forward pass with single query
    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(
            query,
            &keys_vec,
            &values_vec,
            num_neighbors,
        )?;

        Ok(Float32Array::from(&result[..]))
    }

    /// Forward pass with batched queries
    #[wasm_bindgen]
    pub fn forward_batch(
        &self,
        queries: Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let results = Array::new();
        for i in 0..queries.length() {
            let query_arr = Float32Array::from(queries.get(i));
            let query_vec: Vec<f32> = query_arr.to_vec();

            let result = self.inner.forward(
                &query_vec,
                &keys_vec,
                &values_vec,
                num_neighbors,
            )?;

            results.push(&Float32Array::from(&result[..]));
        }

        Ok(results)
    }

    /// Async forward pass (runs on web worker or async runtime)
    #[wasm_bindgen]
    pub async fn forward_async(
        &self,
        query: Vec<f32>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: usize,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        // Use wasm-bindgen-futures for async execution
        let result = wasm_bindgen_futures::spawn_local(async move {
            self.inner.forward(&query, &keys_vec, &values_vec, num_neighbors)
        })
        .await?;

        Ok(Float32Array::from(&result[..]))
    }
}

2.3.3 Multi-Head Attention

// crates/ruvector-attention-wasm/src/attention/multi_head.rs
use wasm_bindgen::prelude::*;
use js_sys::Float32Array;
use ruvector_attention::MultiHeadAttention;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmMultiHeadAttention {
    inner: MultiHeadAttention,
}

#[wasm_bindgen]
impl WasmMultiHeadAttention {
    #[wasm_bindgen(constructor)]
    pub fn new(
        num_heads: usize,
        hidden_dim: usize,
        dropout: f32,
    ) -> Result<WasmMultiHeadAttention, WasmError> {
        Ok(Self {
            inner: MultiHeadAttention::new(num_heads, hidden_dim, dropout)?,
        })
    }

    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(
            query,
            &keys_vec,
            &values_vec,
            mask.as_deref(),
        )?;

        Ok(Float32Array::from(&result[..]))
    }

    #[wasm_bindgen(getter)]
    pub fn num_heads(&self) -> usize {
        self.inner.num_heads()
    }

    #[wasm_bindgen(getter)]
    pub fn hidden_dim(&self) -> usize {
        self.inner.hidden_dim()
    }
}

2.3.4 Hyperbolic Attention

// crates/ruvector-attention-wasm/src/attention/hyperbolic.rs
use wasm_bindgen::prelude::*;
use js_sys::Float32Array;
use ruvector_attention::HyperbolicAttention;
use crate::error::WasmError;

#[wasm_bindgen]
pub struct WasmHyperbolicAttention {
    inner: HyperbolicAttention,
}

#[wasm_bindgen]
impl WasmHyperbolicAttention {
    #[wasm_bindgen(constructor)]
    pub fn new(curvature: f32) -> Result<WasmHyperbolicAttention, WasmError> {
        Ok(Self {
            inner: HyperbolicAttention::new(curvature),
        })
    }

    #[wasm_bindgen]
    pub fn forward(
        &self,
        query: &[f32],
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array, WasmError> {
        let keys_vec: Vec<f32> = keys.to_vec();
        let values_vec: Vec<f32> = values.to_vec();

        let result = self.inner.forward(query, &keys_vec, &values_vec)?;

        Ok(Float32Array::from(&result[..]))
    }

    /// Compute Poincaré distance between two points
    #[wasm_bindgen]
    pub fn poincare_distance(&self, x: &[f32], y: &[f32]) -> Result<f32, WasmError> {
        self.inner.poincare_distance(x, y)
    }

    /// Project Euclidean point to Poincaré ball
    #[wasm_bindgen]
    pub fn to_poincare(&self, x: &[f32]) -> Result<Float32Array, WasmError> {
        let result = self.inner.to_poincare(x)?;
        Ok(Float32Array::from(&result[..]))
    }

    #[wasm_bindgen(getter)]
    pub fn curvature(&self) -> f32 {
        self.inner.curvature()
    }
}

2.4 Build Scripts and Configuration

2.4.1 Build Script

#!/bin/bash
# scripts/build-wasm.sh

set -e

echo "Building RuVector Attention WASM..."

# Clean previous builds
rm -rf pkg/

# Build for web (browser)
echo "Building for web target..."
wasm-pack build \
    --target web \
    --out-dir pkg/web \
    --release \
    crates/ruvector-attention-wasm

# Build for Node.js
echo "Building for Node.js target..."
wasm-pack build \
    --target nodejs \
    --out-dir pkg/nodejs \
    --release \
    crates/ruvector-attention-wasm

# Build for bundlers (webpack, vite, etc.)
echo "Building for bundler target..."
wasm-pack build \
    --target bundler \
    --out-dir pkg/bundler \
    --release \
    crates/ruvector-attention-wasm

# Optional: Build with SIMD (requires nightly + flags)
if [ "$BUILD_SIMD" = "1" ]; then
    echo "Building SIMD version (requires nightly)..."
    RUSTFLAGS="-C target-feature=+simd128" \
    wasm-pack build \
        --target web \
        --out-dir pkg/web-simd \
        --release \
        -- --features simd \
        crates/ruvector-attention-wasm
fi

echo "WASM build complete!"
echo "Outputs:"
echo "  - Web:     pkg/web/"
echo "  - Node.js: pkg/nodejs/"
echo "  - Bundler: pkg/bundler/"

2.4.2 Package.json (for NPM publishing)

{
  "name": "ruvector-attention-wasm",
  "version": "0.1.0",
  "description": "WebAssembly bindings for RuVector attention mechanisms",
  "main": "pkg/nodejs/ruvector_attention_wasm.js",
  "module": "pkg/web/ruvector_attention_wasm.js",
  "types": "pkg/web/ruvector_attention_wasm.d.ts",
  "files": [
    "pkg/**/*"
  ],
  "scripts": {
    "build": "./scripts/build-wasm.sh",
    "test": "wasm-pack test --headless --chrome --firefox",
    "test:node": "wasm-pack test --node"
  },
  "keywords": [
    "wasm",
    "attention",
    "machine-learning",
    "rust",
    "webassembly"
  ],
  "license": "MIT OR Apache-2.0",
  "repository": {
    "type": "git",
    "url": "https://github.com/yourusername/ruvector"
  }
}

2.5 TypeScript Definitions (Auto-generated)

// pkg/web/ruvector_attention_wasm.d.ts (generated by wasm-bindgen)
export function init(): void;
export function version(): string;

export class WasmScaledDotProduct {
    constructor(dim: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        numNeighbors: number
    ): Float32Array;
    forwardBatch(
        queries: Float32Array[],
        keys: Float32Array,
        values: Float32Array,
        numNeighbors: number
    ): Float32Array[];
    forwardAsync(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        numNeighbors: number
    ): Promise<Float32Array>;
}

export class WasmMultiHeadAttention {
    constructor(numHeads: number, hiddenDim: number, dropout: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask?: boolean[]
    ): Float32Array;
    readonly numHeads: number;
    readonly hiddenDim: number;
}

export class WasmHyperbolicAttention {
    constructor(curvature: number);
    free(): void;
    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array
    ): Float32Array;
    poincareDistance(x: Float32Array, y: Float32Array): number;
    toPoincare(x: Float32Array): Float32Array;
    readonly curvature: number;
}

3. NAPI-RS Bindings (Node.js)

3.1 Project Structure

crates/ruvector-attention-node/
├── Cargo.toml
├── build.rs
├── package.json
├── index.js                   # JS entry point
├── index.d.ts                 # TypeScript definitions
├── README.md
├── src/
│   ├── lib.rs                 # NAPI exports
│   ├── attention/
│   │   ├── mod.rs
│   │   ├── scaled_dot.rs
│   │   ├── multi_head.rs
│   │   ├── hyperbolic.rs
│   │   ├── linear.rs
│   │   └── cross.rs
│   ├── async_ops.rs           # Tokio async operations
│   ├── error.rs               # NAPI error handling
│   ├── buffer.rs              # Buffer conversions
│   └── types.rs               # Type conversions
├── npm/                       # Platform-specific packages
│   ├── darwin-arm64/
│   │   ├── package.json
│   │   └── README.md
│   ├── darwin-x64/
│   ├── linux-arm64-gnu/
│   ├── linux-arm64-musl/
│   ├── linux-x64-gnu/
│   ├── linux-x64-musl/
│   ├── win32-arm64-msvc/
│   └── win32-x64-msvc/
├── __tests__/
│   ├── basic.test.ts
│   ├── async.test.ts
│   └── performance.test.ts
└── examples/
    ├── basic.mjs
    ├── async.mjs
    └── streaming.mjs

3.2 Cargo Configuration

# crates/ruvector-attention-node/Cargo.toml
[package]
name = "ruvector-attention-node"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "Node.js bindings for RuVector attention mechanisms via NAPI-RS"

[lib]
crate-type = ["cdylib"]

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
napi = { version = "2.16", features = ["async", "napi8", "tokio_rt"] }
napi-derive = "2.16"
tokio = { version = "1.37", features = ["rt-multi-thread"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

[build-dependencies]
napi-build = "2.1"

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

3.3 NAPI API Design

3.3.1 Core Library

// crates/ruvector-attention-node/src/lib.rs
#![deny(clippy::all)]

#[macro_use]
extern crate napi_derive;

mod attention;
mod async_ops;
mod buffer;
mod error;
mod types;

pub use attention::*;
pub use error::NapiError;

#[napi]
pub fn get_version() -> String {
    env!("CARGO_PKG_VERSION").to_string()
}

#[napi]
pub fn get_supported_features() -> Vec<String> {
    vec![
        "scaled-dot-product".to_string(),
        "multi-head".to_string(),
        "hyperbolic".to_string(),
        "linear".to_string(),
        "cross-attention".to_string(),
        "self-attention".to_string(),
    ]
}

3.3.2 Scaled Dot-Product Attention

// crates/ruvector-attention-node/src/attention/scaled_dot.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::ScaledDotProduct as CoreScaledDotProduct;

#[napi]
pub struct ScaledDotProductAttention {
    inner: CoreScaledDotProduct,
}

#[napi]
impl ScaledDotProductAttention {
    #[napi(constructor)]
    pub fn new(dim: u32) -> Result<Self> {
        Ok(Self {
            inner: CoreScaledDotProduct::new(dim as usize),
        })
    }

    /// Synchronous forward pass
    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    /// Asynchronous forward pass (non-blocking)
    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    /// Batch forward pass
    #[napi]
    pub fn forward_batch(
        &self,
        queries: Vec<Float32Array>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Vec<Float32Array>> {
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        queries
            .into_iter()
            .map(|q| {
                let query_vec = q.to_vec();
                self.inner
                    .forward(&query_vec, &keys_vec, &values_vec, num_neighbors as usize)
                    .map(Float32Array::new)
                    .map_err(|e| Error::from_reason(e.to_string()))
            })
            .collect()
    }

    /// Async batch forward pass
    #[napi]
    pub async fn forward_batch_async(
        &self,
        queries: Vec<Float32Array>,
        keys: Float32Array,
        values: Float32Array,
        num_neighbors: u32,
    ) -> Result<Vec<Float32Array>> {
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        let query_vecs: Vec<Vec<f32>> = queries.into_iter()
            .map(|q| q.to_vec())
            .collect();

        tokio::task::spawn_blocking(move || {
            query_vecs.into_iter()
                .map(|q| {
                    inner.forward(&q, &keys_vec, &values_vec, num_neighbors as usize)
                        .map(Float32Array::new)
                })
                .collect::<Result<Vec<_>, _>>()
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi(getter)]
    pub fn dim(&self) -> u32 {
        self.inner.dim() as u32
    }
}

3.3.3 Multi-Head Attention

// crates/ruvector-attention-node/src/attention/multi_head.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::MultiHeadAttention as CoreMultiHeadAttention;

#[napi(object)]
pub struct AttentionConfig {
    pub num_heads: u32,
    pub hidden_dim: u32,
    pub dropout: f64,
}

#[napi]
pub struct MultiHeadAttention {
    inner: CoreMultiHeadAttention,
}

#[napi]
impl MultiHeadAttention {
    #[napi(constructor)]
    pub fn new(config: AttentionConfig) -> Result<Self> {
        let inner = CoreMultiHeadAttention::new(
            config.num_heads as usize,
            config.hidden_dim as usize,
            config.dropout as f32,
        )
        .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Self { inner })
    }

    #[napi(factory)]
    pub fn from_params(num_heads: u32, hidden_dim: u32, dropout: Option<f64>) -> Result<Self> {
        Self::new(AttentionConfig {
            num_heads,
            hidden_dim,
            dropout: dropout.unwrap_or(0.0),
        })
    }

    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec, mask.as_deref())
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        mask: Option<Vec<bool>>,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec, mask.as_deref())
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    #[napi(getter)]
    pub fn num_heads(&self) -> u32 {
        self.inner.num_heads() as u32
    }

    #[napi(getter)]
    pub fn hidden_dim(&self) -> u32 {
        self.inner.hidden_dim() as u32
    }

    #[napi(getter)]
    pub fn head_dim(&self) -> u32 {
        (self.inner.hidden_dim() / self.inner.num_heads()) as u32
    }
}

3.3.4 Hyperbolic Attention

// crates/ruvector-attention-node/src/attention/hyperbolic.rs
use napi::bindgen_prelude::*;
use napi_derive::napi;
use ruvector_attention::HyperbolicAttention as CoreHyperbolicAttention;

#[napi]
pub struct HyperbolicAttention {
    inner: CoreHyperbolicAttention,
}

#[napi]
impl HyperbolicAttention {
    #[napi(constructor)]
    pub fn new(curvature: f64) -> Result<Self> {
        Ok(Self {
            inner: CoreHyperbolicAttention::new(curvature as f32),
        })
    }

    #[napi]
    pub fn forward(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();

        let result = self.inner
            .forward(&query_vec, &keys_vec, &values_vec)
            .map_err(|e| Error::from_reason(e.to_string()))?;

        Ok(Float32Array::new(result))
    }

    #[napi]
    pub async fn forward_async(
        &self,
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
    ) -> Result<Float32Array> {
        let query_vec = query.to_vec();
        let keys_vec = keys.to_vec();
        let values_vec = values.to_vec();
        let inner = self.inner.clone();

        tokio::task::spawn_blocking(move || {
            inner.forward(&query_vec, &keys_vec, &values_vec)
        })
        .await
        .map_err(|e| Error::from_reason(e.to_string()))?
        .map_err(|e| Error::from_reason(e.to_string()))
        .map(Float32Array::new)
    }

    #[napi]
    pub fn poincare_distance(&self, x: Float32Array, y: Float32Array) -> Result<f64> {
        let x_vec = x.to_vec();
        let y_vec = y.to_vec();

        self.inner
            .poincare_distance(&x_vec, &y_vec)
            .map(|d| d as f64)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi]
    pub fn to_poincare(&self, x: Float32Array) -> Result<Float32Array> {
        let x_vec = x.to_vec();

        self.inner
            .to_poincare(&x_vec)
            .map(Float32Array::new)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi]
    pub fn from_poincare(&self, x: Float32Array) -> Result<Float32Array> {
        let x_vec = x.to_vec();

        self.inner
            .from_poincare(&x_vec)
            .map(Float32Array::new)
            .map_err(|e| Error::from_reason(e.to_string()))
    }

    #[napi(getter)]
    pub fn curvature(&self) -> f64 {
        self.inner.curvature() as f64
    }
}

3.4 Package Configuration

3.4.1 package.json

{
  "name": "ruvector-attention",
  "version": "0.1.0",
  "description": "High-performance attention mechanisms for Node.js via Rust/NAPI-RS",
  "main": "index.js",
  "types": "index.d.ts",
  "keywords": [
    "attention",
    "machine-learning",
    "rust",
    "napi",
    "native",
    "performance"
  ],
  "license": "MIT OR Apache-2.0",
  "author": "RuVector Team",
  "repository": {
    "type": "git",
    "url": "https://github.com/yourusername/ruvector"
  },
  "engines": {
    "node": ">= 18"
  },
  "napi": {
    "name": "ruvector-attention",
    "triples": {
      "defaults": true,
      "additional": [
        "aarch64-apple-darwin",
        "aarch64-unknown-linux-gnu",
        "aarch64-unknown-linux-musl",
        "armv7-unknown-linux-gnueabihf",
        "x86_64-unknown-linux-musl",
        "aarch64-pc-windows-msvc"
      ]
    }
  },
  "scripts": {
    "artifacts": "napi artifacts",
    "build": "napi build --platform --release",
    "build:debug": "napi build --platform",
    "prepublishOnly": "napi prepublish -t npm",
    "test": "jest",
    "test:coverage": "jest --coverage",
    "version": "napi version",
    "bench": "node benches/benchmark.mjs"
  },
  "devDependencies": {
    "@napi-rs/cli": "^2.18.0",
    "@types/node": "^20.11.0",
    "jest": "^29.7.0",
    "typescript": "^5.3.3"
  },
  "optionalDependencies": {
    "ruvector-attention-darwin-arm64": "0.1.0",
    "ruvector-attention-darwin-x64": "0.1.0",
    "ruvector-attention-linux-arm64-gnu": "0.1.0",
    "ruvector-attention-linux-arm64-musl": "0.1.0",
    "ruvector-attention-linux-x64-gnu": "0.1.0",
    "ruvector-attention-linux-x64-musl": "0.1.0",
    "ruvector-attention-win32-arm64-msvc": "0.1.0",
    "ruvector-attention-win32-x64-msvc": "0.1.0"
  }
}

3.4.2 TypeScript Definitions

// index.d.ts
export function getVersion(): string;
export function getSupportedFeatures(): string[];

export interface AttentionConfig {
  numHeads: number;
  hiddenDim: number;
  dropout: number;
}

export class ScaledDotProductAttention {
  constructor(dim: number);

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Promise<Float32Array>;

  forwardBatch(
    queries: Float32Array[],
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Float32Array[];

  forwardBatchAsync(
    queries: Float32Array[],
    keys: Float32Array,
    values: Float32Array,
    numNeighbors: number
  ): Promise<Float32Array[]>;

  readonly dim: number;
}

export class MultiHeadAttention {
  constructor(config: AttentionConfig);

  static fromParams(
    numHeads: number,
    hiddenDim: number,
    dropout?: number
  ): MultiHeadAttention;

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    mask?: boolean[]
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array,
    mask?: boolean[]
  ): Promise<Float32Array>;

  readonly numHeads: number;
  readonly hiddenDim: number;
  readonly headDim: number;
}

export class HyperbolicAttention {
  constructor(curvature: number);

  forward(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array
  ): Float32Array;

  forwardAsync(
    query: Float32Array,
    keys: Float32Array,
    values: Float32Array
  ): Promise<Float32Array>;

  poincareDistance(x: Float32Array, y: Float32Array): number;
  toPoincare(x: Float32Array): Float32Array;
  fromPoincare(x: Float32Array): Float32Array;

  readonly curvature: number;
}

// Export all attention types
export { ScaledDotProductAttention as ScaledDotProduct };
export { MultiHeadAttention as MultiHead };
export { HyperbolicAttention as Hyperbolic };

3.5 Build and Deployment

3.5.1 GitHub Actions Workflow

# .github/workflows/napi.yml
name: NAPI Build and Release

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        settings:
          - host: macos-latest
            target: x86_64-apple-darwin
            build: pnpm build --target x86_64-apple-darwin
          - host: macos-latest
            target: aarch64-apple-darwin
            build: pnpm build --target aarch64-apple-darwin
          - host: windows-latest
            target: x86_64-pc-windows-msvc
            build: pnpm build --target x86_64-pc-windows-msvc
          - host: ubuntu-latest
            target: x86_64-unknown-linux-gnu
            build: pnpm build --target x86_64-unknown-linux-gnu
          - host: ubuntu-latest
            target: x86_64-unknown-linux-musl
            build: pnpm build --target x86_64-unknown-linux-musl
          - host: ubuntu-latest
            target: aarch64-unknown-linux-gnu
            build: pnpm build --target aarch64-unknown-linux-gnu

    name: Build ${{ matrix.settings.target }}
    runs-on: ${{ matrix.settings.host }}

    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.settings.target }}

      - name: Build
        run: ${{ matrix.settings.build }}

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: bindings-${{ matrix.settings.target }}
          path: crates/ruvector-attention-node/*.node

4. CLI Interface

4.1 Project Structure

crates/ruvector-attention-cli/
├── Cargo.toml
├── README.md
├── src/
│   ├── main.rs                # CLI entry point
│   ├── commands/
│   │   ├── mod.rs
│   │   ├── compute.rs         # Compute attention
│   │   ├── benchmark.rs       # Benchmarking
│   │   ├── convert.rs         # Model conversion
│   │   ├── serve.rs           # HTTP server
│   │   └── repl.rs            # Interactive REPL
│   ├── config.rs              # Configuration management
│   ├── format.rs              # Input/output formats
│   ├── server/
│   │   ├── mod.rs
│   │   ├── handlers.rs
│   │   └── middleware.rs
│   └── utils.rs
├── tests/
│   ├── integration.rs
│   └── cli.rs
└── examples/
    ├── config.toml
    └── sample_data/

4.2 Cargo Configuration

# crates/ruvector-attention-cli/Cargo.toml
[package]
name = "ruvector-attention-cli"
version = "0.1.0"
authors = ["RuVector Team"]
edition = "2021"
license = "MIT OR Apache-2.0"
description = "CLI for RuVector attention mechanisms"

[[bin]]
name = "ruvector-attention"
path = "src/main.rs"

[dependencies]
ruvector-attention = { path = "../ruvector-attention" }
clap = { version = "4.5", features = ["derive", "cargo"] }
tokio = { version = "1.37", features = ["full"] }
axum = "0.7"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
toml = "0.8"
anyhow = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
indicatif = "0.17"
comfy-table = "7.1"
colored = "2.1"
rustyline = "14.0"

[dev-dependencies]
assert_cmd = "2.0"
predicates = "3.1"
tempfile = "3.10"

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

4.3 CLI Design

4.3.1 Main CLI Structure

// src/main.rs
use clap::{Parser, Subcommand};
use anyhow::Result;

mod commands;
mod config;
mod format;
mod server;
mod utils;

#[derive(Parser)]
#[command(name = "ruvector-attention")]
#[command(version, about, long_about = None)]
struct Cli {
    #[command(subcommand)]
    command: Commands,

    /// Enable verbose logging
    #[arg(short, long, global = true)]
    verbose: bool,

    /// Configuration file
    #[arg(short, long, global = true)]
    config: Option<String>,
}

#[derive(Subcommand)]
enum Commands {
    /// Compute attention for given inputs
    Compute(commands::compute::ComputeArgs),

    /// Run benchmarks
    Benchmark(commands::benchmark::BenchmarkArgs),

    /// Convert between model formats
    Convert(commands::convert::ConvertArgs),

    /// Start HTTP server
    Serve(commands::serve::ServeArgs),

    /// Interactive REPL
    Repl(commands::repl::ReplArgs),
}

#[tokio::main]
async fn main() -> Result<()> {
    let cli = Cli::parse();

    // Initialize logging
    let log_level = if cli.verbose { "debug" } else { "info" };
    tracing_subscriber::fmt()
        .with_env_filter(log_level)
        .init();

    // Execute command
    match cli.command {
        Commands::Compute(args) => commands::compute::run(args).await,
        Commands::Benchmark(args) => commands::benchmark::run(args).await,
        Commands::Convert(args) => commands::convert::run(args).await,
        Commands::Serve(args) => commands::serve::run(args).await,
        Commands::Repl(args) => commands::repl::run(args).await,
    }
}

4.3.2 Compute Command

// src/commands/compute.rs
use clap::Args;
use anyhow::{Result, Context};
use ruvector_attention::*;
use std::path::PathBuf;

#[derive(Args)]
pub struct ComputeArgs {
    /// Attention type (scaled-dot-product, multi-head, hyperbolic, linear)
    #[arg(short, long)]
    attention_type: String,

    /// Query vector file (binary f32)
    #[arg(short, long)]
    query: PathBuf,

    /// Keys file (binary f32)
    #[arg(short, long)]
    keys: PathBuf,

    /// Values file (binary f32)
    #[arg(short, long)]
    values: PathBuf,

    /// Output file
    #[arg(short, long)]
    output: PathBuf,

    /// Number of neighbors (for k-NN attention)
    #[arg(long, default_value = "100")]
    neighbors: usize,

    /// Number of heads (for multi-head attention)
    #[arg(long, default_value = "8")]
    num_heads: usize,

    /// Hidden dimension
    #[arg(long, default_value = "128")]
    hidden_dim: usize,
}

pub async fn run(args: ComputeArgs) -> Result<()> {
    println!("Computing {} attention...", args.attention_type);

    // Load inputs
    let query = load_f32_binary(&args.query)
        .context("Failed to load query")?;
    let keys = load_f32_binary(&args.keys)
        .context("Failed to load keys")?;
    let values = load_f32_binary(&args.values)
        .context("Failed to load values")?;

    // Compute attention
    let result = match args.attention_type.as_str() {
        "scaled-dot-product" => {
            let attention = ScaledDotProduct::new(args.hidden_dim);
            attention.forward(&query, &keys, &values, args.neighbors)?
        },
        "multi-head" => {
            let attention = MultiHeadAttention::new(
                args.num_heads,
                args.hidden_dim,
                0.0,
            )?;
            attention.forward(&query, &keys, &values, None)?
        },
        "hyperbolic" => {
            let attention = HyperbolicAttention::new(1.0);
            attention.forward(&query, &keys, &values)?
        },
        _ => anyhow::bail!("Unknown attention type: {}", args.attention_type),
    };

    // Save output
    save_f32_binary(&args.output, &result)
        .context("Failed to save output")?;

    println!("✓ Attention computed successfully");
    println!("  Output shape: {}", result.len());
    println!("  Saved to: {}", args.output.display());

    Ok(())
}

fn load_f32_binary(path: &PathBuf) -> Result<Vec<f32>> {
    let bytes = std::fs::read(path)?;
    let floats: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect();
    Ok(floats)
}

fn save_f32_binary(path: &PathBuf, data: &[f32]) -> Result<()> {
    let bytes: Vec<u8> = data.iter()
        .flat_map(|f| f.to_le_bytes())
        .collect();
    std::fs::write(path, bytes)?;
    Ok(())
}

4.3.3 Benchmark Command

// src/commands/benchmark.rs
use clap::Args;
use anyhow::Result;
use ruvector_attention::*;
use std::time::Instant;
use comfy_table::{Table, presets::UTF8_FULL};
use colored::Colorize;

#[derive(Args)]
pub struct BenchmarkArgs {
    /// Attention types to benchmark (comma-separated)
    #[arg(long, default_value = "scaled-dot-product,multi-head,hyperbolic")]
    types: String,

    /// Dimensions to test (comma-separated)
    #[arg(long, default_value = "128,256,512")]
    dims: String,

    /// Number of neighbors to test
    #[arg(long, default_value = "100,500,1000")]
    neighbors: String,

    /// Number of iterations
    #[arg(long, default_value = "1000")]
    iterations: usize,

    /// Output file for results (JSON)
    #[arg(short, long)]
    output: Option<PathBuf>,
}

pub async fn run(args: BenchmarkArgs) -> Result<()> {
    println!("{}", "Running Benchmarks...".bold().green());

    let types: Vec<&str> = args.types.split(',').collect();
    let dims: Vec<usize> = args.dims.split(',')
        .filter_map(|s| s.parse().ok())
        .collect();
    let neighbors: Vec<usize> = args.neighbors.split(',')
        .filter_map(|s| s.parse().ok())
        .collect();

    let mut table = Table::new();
    table.load_preset(UTF8_FULL);
    table.set_header(vec!["Type", "Dim", "Neighbors", "Avg Time (ms)", "Throughput (ops/s)"]);

    for attention_type in &types {
        for &dim in &dims {
            for &k in &neighbors {
                let avg_time = benchmark_attention(
                    attention_type,
                    dim,
                    k,
                    args.iterations,
                )?;

                let throughput = 1000.0 / avg_time;

                table.add_row(vec![
                    attention_type.to_string(),
                    dim.to_string(),
                    k.to_string(),
                    format!("{:.3}", avg_time),
                    format!("{:.0}", throughput),
                ]);
            }
        }
    }

    println!("\n{}", table);

    Ok(())
}

fn benchmark_attention(
    attention_type: &str,
    dim: usize,
    num_neighbors: usize,
    iterations: usize,
) -> Result<f64> {
    // Generate random data
    let query: Vec<f32> = (0..dim).map(|_| rand::random()).collect();
    let keys: Vec<f32> = (0..dim * num_neighbors).map(|_| rand::random()).collect();
    let values: Vec<f32> = (0..dim * num_neighbors).map(|_| rand::random()).collect();

    let start = Instant::now();

    for _ in 0..iterations {
        match attention_type {
            "scaled-dot-product" => {
                let attention = ScaledDotProduct::new(dim);
                let _ = attention.forward(&query, &keys, &values, num_neighbors)?;
            },
            "multi-head" => {
                let attention = MultiHeadAttention::new(8, dim, 0.0)?;
                let _ = attention.forward(&query, &keys, &values, None)?;
            },
            "hyperbolic" => {
                let attention = HyperbolicAttention::new(1.0);
                let _ = attention.forward(&query, &keys, &values)?;
            },
            _ => {},
        }
    }

    let elapsed = start.elapsed();
    let avg_ms = elapsed.as_secs_f64() * 1000.0 / iterations as f64;

    Ok(avg_ms)
}

4.3.4 Server Command

// src/commands/serve.rs
use clap::Args;
use anyhow::Result;
use axum::{
    routing::{get, post},
    Router, Json,
    http::StatusCode,
};
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;

#[derive(Args)]
pub struct ServeArgs {
    /// Server host
    #[arg(long, default_value = "0.0.0.0")]
    host: String,

    /// Server port
    #[arg(short, long, default_value = "8080")]
    port: u16,

    /// Maximum batch size
    #[arg(long, default_value = "256")]
    max_batch_size: usize,
}

#[derive(Deserialize)]
struct AttentionRequest {
    attention_type: String,
    query: Vec<f32>,
    keys: Vec<f32>,
    values: Vec<f32>,
    num_neighbors: Option<usize>,
    num_heads: Option<usize>,
    hidden_dim: Option<usize>,
}

#[derive(Serialize)]
struct AttentionResponse {
    result: Vec<f32>,
    computation_time_ms: f64,
}

pub async fn run(args: ServeArgs) -> Result<()> {
    let app = Router::new()
        .route("/", get(health_check))
        .route("/attention", post(compute_attention))
        .route("/health", get(health_check));

    let addr: SocketAddr = format!("{}:{}", args.host, args.port).parse()?;

    println!("🚀 Server listening on http://{}", addr);

    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await?;

    Ok(())
}

async fn health_check() -> &'static str {
    "OK"
}

async fn compute_attention(
    Json(req): Json<AttentionRequest>,
) -> Result<Json<AttentionResponse>, StatusCode> {
    let start = std::time::Instant::now();

    let result = match req.attention_type.as_str() {
        "scaled-dot-product" => {
            let dim = req.hidden_dim.unwrap_or(128);
            let k = req.num_neighbors.unwrap_or(100);
            let attention = ScaledDotProduct::new(dim);
            attention.forward(&req.query, &req.keys, &req.values, k)
                .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
        },
        _ => return Err(StatusCode::BAD_REQUEST),
    };

    let elapsed = start.elapsed().as_secs_f64() * 1000.0;

    Ok(Json(AttentionResponse {
        result,
        computation_time_ms: elapsed,
    }))
}

4.4 Configuration File

# config.toml
[attention]
default_type = "multi-head"
num_heads = 8
hidden_dim = 128
dropout = 0.1

[optimization]
use_simd = true
num_threads = 4
batch_size = 64

[server]
host = "0.0.0.0"
port = 8080
max_batch_size = 256
timeout_ms = 5000
max_connections = 1000

[logging]
level = "info"
format = "json"

5. SDK Design

5.1 High-Level Rust SDK

// src/sdk/mod.rs
use crate::*;

pub mod prelude {
    pub use super::{
        AttentionBuilder,
        AttentionType,
        AttentionConfig,
    };
    pub use crate::{
        ScaledDotProduct,
        MultiHeadAttention,
        HyperbolicAttention,
    };
}

#[derive(Debug, Clone)]
pub enum AttentionType {
    ScaledDotProduct { neighbors: usize },
    MultiHead { num_heads: usize },
    Hyperbolic { curvature: f32 },
    Linear,
    Auto, // Auto-select based on input
}

#[derive(Debug, Clone)]
pub struct AttentionConfig {
    pub hidden_dim: usize,
    pub dropout: f32,
    pub use_simd: bool,
}

impl Default for AttentionConfig {
    fn default() -> Self {
        Self {
            hidden_dim: 128,
            dropout: 0.0,
            use_simd: true,
        }
    }
}

pub struct AttentionBuilder {
    attention_type: Option<AttentionType>,
    config: AttentionConfig,
}

impl AttentionBuilder {
    pub fn new() -> Self {
        Self {
            attention_type: None,
            config: AttentionConfig::default(),
        }
    }

    pub fn attention_type(mut self, att_type: AttentionType) -> Self {
        self.attention_type = Some(att_type);
        self
    }

    pub fn hidden_dim(mut self, dim: usize) -> Self {
        self.config.hidden_dim = dim;
        self
    }

    pub fn dropout(mut self, dropout: f32) -> Self {
        self.config.dropout = dropout;
        self
    }

    pub fn build(self) -> Result<Box<dyn Attention>> {
        let att_type = self.attention_type
            .ok_or_else(|| anyhow::anyhow!("Attention type not specified"))?;

        let attention: Box<dyn Attention> = match att_type {
            AttentionType::ScaledDotProduct { neighbors } => {
                Box::new(ScaledDotProduct::new(self.config.hidden_dim))
            },
            AttentionType::MultiHead { num_heads } => {
                Box::new(MultiHeadAttention::new(
                    num_heads,
                    self.config.hidden_dim,
                    self.config.dropout,
                )?)
            },
            AttentionType::Hyperbolic { curvature } => {
                Box::new(HyperbolicAttention::new(curvature))
            },
            AttentionType::Auto => {
                // Auto-select based on configuration
                Box::new(MultiHeadAttention::new(8, self.config.hidden_dim, 0.0)?)
            },
            _ => unimplemented!(),
        };

        Ok(attention)
    }
}

// Common trait for all attention mechanisms
pub trait Attention {
    fn forward(
        &self,
        query: &[f32],
        keys: &[f32],
        values: &[f32],
    ) -> Result<Vec<f32>>;
}

5.2 JavaScript/TypeScript SDK

// sdk/typescript/src/index.ts
import {
    ScaledDotProductAttention,
    MultiHeadAttention,
    HyperbolicAttention,
} from 'ruvector-attention';

export enum AttentionType {
    ScaledDotProduct = 'scaled-dot-product',
    MultiHead = 'multi-head',
    Hyperbolic = 'hyperbolic',
    Linear = 'linear',
    Auto = 'auto',
}

export interface AttentionConfig {
    type: AttentionType;
    hiddenDim: number;
    dropout?: number;
    numHeads?: number;
    neighbors?: number;
    curvature?: number;
}

export class Attention {
    private inner: any;
    private config: AttentionConfig;

    constructor(config: AttentionConfig) {
        this.config = config;
        this.inner = this.createAttention(config);
    }

    private createAttention(config: AttentionConfig) {
        switch (config.type) {
            case AttentionType.ScaledDotProduct:
                return new ScaledDotProductAttention(config.hiddenDim);

            case AttentionType.MultiHead:
                return MultiHeadAttention.fromParams(
                    config.numHeads || 8,
                    config.hiddenDim,
                    config.dropout || 0.0
                );

            case AttentionType.Hyperbolic:
                return new HyperbolicAttention(config.curvature || 1.0);

            case AttentionType.Auto:
                // Auto-select based on input size
                return MultiHeadAttention.fromParams(8, config.hiddenDim);

            default:
                throw new Error(`Unknown attention type: ${config.type}`);
        }
    }

    forward(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        options?: { mask?: boolean[], neighbors?: number }
    ): Float32Array {
        if (this.config.type === AttentionType.ScaledDotProduct) {
            return this.inner.forward(
                query,
                keys,
                values,
                options?.neighbors || 100
            );
        } else {
            return this.inner.forward(query, keys, values, options?.mask);
        }
    }

    async forwardAsync(
        query: Float32Array,
        keys: Float32Array,
        values: Float32Array,
        options?: { mask?: boolean[], neighbors?: number }
    ): Promise<Float32Array> {
        if (this.config.type === AttentionType.ScaledDotProduct) {
            return this.inner.forwardAsync(
                query,
                keys,
                values,
                options?.neighbors || 100
            );
        } else {
            return this.inner.forwardAsync(query, keys, values, options?.mask);
        }
    }

    // Streaming API for large inputs
    async *forwardStream(
        queryStream: AsyncIterable<Float32Array>,
        keys: Float32Array,
        values: Float32Array
    ): AsyncGenerator<Float32Array> {
        for await (const query of queryStream) {
            yield await this.forwardAsync(query, keys, values);
        }
    }
}

// Builder pattern
export class AttentionBuilder {
    private config: Partial<AttentionConfig> = {};

    type(type: AttentionType, options?: {
        numHeads?: number,
        neighbors?: number,
        curvature?: number
    }): this {
        this.config.type = type;
        if (options?.numHeads) this.config.numHeads = options.numHeads;
        if (options?.neighbors) this.config.neighbors = options.neighbors;
        if (options?.curvature) this.config.curvature = options.curvature;
        return this;
    }

    hiddenDim(dim: number): this {
        this.config.hiddenDim = dim;
        return this;
    }

    dropout(dropout: number): this {
        this.config.dropout = dropout;
        return this;
    }

    build(): Attention {
        if (!this.config.type) {
            throw new Error('Attention type not specified');
        }
        if (!this.config.hiddenDim) {
            throw new Error('Hidden dimension not specified');
        }

        return new Attention(this.config as AttentionConfig);
    }
}

// Export everything
export {
    ScaledDotProductAttention,
    MultiHeadAttention,
    HyperbolicAttention,
};

6. Testing Strategy

6.1 Platform-Specific Tests

// WASM tests
#[cfg(target_arch = "wasm32")]
mod wasm_tests {
    use wasm_bindgen_test::*;

    #[wasm_bindgen_test]
    fn test_scaled_dot_product() {
        // Test WASM-specific behavior
    }
}

// NAPI tests
#[cfg(all(not(target_arch = "wasm32"), feature = "napi"))]
mod napi_tests {
    use napi::bindgen_prelude::*;

    #[test]
    fn test_napi_conversion() {
        // Test NAPI-specific behavior
    }
}

6.2 Integration Tests

// __tests__/integration.test.ts
import { AttentionBuilder, AttentionType } from 'ruvector-attention';

describe('Attention Integration Tests', () => {
    test('scaled dot-product attention', async () => {
        const attention = new AttentionBuilder()
            .type(AttentionType.ScaledDotProduct, { neighbors: 100 })
            .hiddenDim(128)
            .build();

        const query = new Float32Array(128).fill(1.0);
        const keys = new Float32Array(128 * 100).fill(0.5);
        const values = new Float32Array(128 * 100).fill(0.5);

        const result = await attention.forwardAsync(query, keys, values);

        expect(result).toBeInstanceOf(Float32Array);
        expect(result.length).toBe(128);
    });
});

7. Documentation

7.1 API Documentation

Rust: Generated via cargo doc
WASM: Auto-generated TypeScript definitions
Node.js: TypeScript definitions + JSDoc
CLI: Auto-generated from Clap

7.2 Examples

Each platform should include:

Basic usage examples
Advanced patterns
Performance optimization guides
Troubleshooting guides

8. Release Process

8.1 Version Management

Rust crate: cargo release
WASM package: wasm-pack publish
Node.js package: npm publish
CLI binary: GitHub Releases

8.2 Distribution

Rust: crates.io
WASM: npm (as ruvector-attention-wasm)
Node.js: npm (as ruvector-attention)
CLI: GitHub Releases, Homebrew, Cargo install

9. Performance Targets

Platform	Target	Notes
Rust Native	Baseline	Reference implementation
WASM (Browser)	60-80% of native	JS interop overhead
WASM (Server)	70-90% of native	WASI optimizations
Node.js (NAPI)	95-100% of native	Minimal overhead
CLI	95-100% of native	Direct Rust

10. Next Steps

Implement WASM bindings (Week 1-2)
Implement NAPI-RS bindings (Week 2-3)
Build CLI interface (Week 3-4)
Create SDK wrappers (Week 4)
Write comprehensive tests (Week 5)
Documentation and examples (Week 6)
Release and distribution (Week 7)

Appendix A: Complete Build Commands

# Rust native
cargo build --release

# WASM (all targets)
./scripts/build-wasm.sh

# Node.js (all platforms)
cd crates/ruvector-attention-node
pnpm build --platform

# CLI
cargo build --release --bin ruvector-attention

# All at once
./scripts/build-all.sh

Appendix B: Platform-Specific Optimizations

WASM Optimizations

Enable wasm-opt for size reduction
Use SIMD128 where supported
Minimize JS/WASM boundary crossings

NAPI Optimizations

Use AsyncTask for CPU-intensive operations
Minimize allocations in hot paths
Leverage native Node.js buffers

CLI Optimizations

Use mimalloc for better allocation performance
Enable LTO and aggressive optimizations
Consider static linking for distribution

Document Status: Implementation Ready Last Updated: 2025-11-30 Review Date: 2025-12-07

54 KiB Raw Blame History