git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
967 lines
26 KiB
Markdown
967 lines
26 KiB
Markdown
# RuVector-WASM: Standalone Vector Database Architecture
|
|
|
|
## 🎯 Vision
|
|
|
|
**A complete, self-contained vector database with graph and semantic capabilities that runs anywhere - browser, Node.js, Deno, Bun, edge workers - without PostgreSQL.**
|
|
|
|
Think **DuckDB-WASM** but for vector/graph/semantic workloads.
|
|
|
|
---
|
|
|
|
## ✨ What You Get
|
|
|
|
### Complete Feature Set
|
|
- ✅ **Vector Operations**: All types (f32, f16, binary, sparse), SIMD-optimized distances
|
|
- ✅ **Graph Database**: Cypher queries, graph traversal, GNN layers
|
|
- ✅ **Semantic Search**: HNSW indexing, quantization, hybrid search
|
|
- ✅ **RDF/SPARQL**: W3C-compliant SPARQL 1.1, triple store
|
|
- ✅ **SQL Interface**: Standard SQL for vector queries
|
|
- ✅ **Self-Learning**: ReasoningBank, attention mechanisms, adaptive patterns
|
|
- ✅ **Hyperbolic Embeddings**: Poincaré, Lorentz spaces
|
|
- ✅ **AI Routing**: Tiny Dancer intelligent routing
|
|
|
|
### Zero Dependencies
|
|
- ❌ No PostgreSQL installation
|
|
- ❌ No Docker required
|
|
- ❌ No native compilation
|
|
- ✅ Pure WASM (~5-10MB bundle)
|
|
- ✅ `npm install @ruvector/wasm` and go!
|
|
|
|
### Universal Runtime
|
|
```bash
|
|
# Browser
|
|
<script type="module" src="https://cdn.jsdelivr.net/npm/@ruvector/wasm"></script>
|
|
|
|
# Node.js
|
|
npm install @ruvector/wasm
|
|
|
|
# Deno
|
|
import { RuVector } from "https://esm.sh/@ruvector/wasm"
|
|
|
|
# Bun
|
|
bun add @ruvector/wasm
|
|
|
|
# Cloudflare Workers (edge)
|
|
import { RuVector } from "@ruvector/wasm"
|
|
```
|
|
|
|
---
|
|
|
|
## 🏗️ Architecture
|
|
|
|
### High-Level Design
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ Query Layer (TypeScript/JavaScript API) │
|
|
│ - SQL parser & executor │
|
|
│ - Cypher parser & executor │
|
|
│ - SPARQL parser & executor │
|
|
│ - REST-like API (insert, query, delete) │
|
|
└──────────────────────┬──────────────────────────────────────┘
|
|
│ wasm-bindgen
|
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
│ RuVector Core (Rust → WASM) │
|
|
│ ┌────────────────────────────────────────────────────────┐ │
|
|
│ │ Storage Engine │ │
|
|
│ │ - In-memory tables (DashMap) │ │
|
|
│ │ - IndexedDB persistence (browser) │ │
|
|
│ │ - OPFS storage (browser) │ │
|
|
│ │ - File system (Node.js/Deno/Bun) │ │
|
|
│ └────────────────────────────────────────────────────────┘ │
|
|
│ ┌────────────────────────────────────────────────────────┐ │
|
|
│ │ Index Layer │ │
|
|
│ │ - HNSW vector index │ │
|
|
│ │ - B-Tree for scalar columns │ │
|
|
│ │ - Triple store (SPO, POS, OSP indexes) │ │
|
|
│ │ - Graph adjacency lists │ │
|
|
│ └────────────────────────────────────────────────────────┘ │
|
|
│ ┌────────────────────────────────────────────────────────┐ │
|
|
│ │ Execution Engine │ │
|
|
│ │ - SQL query planner & executor │ │
|
|
│ │ - Cypher graph pattern matching │ │
|
|
│ │ - SPARQL triple pattern matching │ │
|
|
│ │ - Vector similarity search │ │
|
|
│ │ - GNN computation (GCN, GraphSage, GAT) │ │
|
|
│ │ - ReasoningBank self-learning │ │
|
|
│ └────────────────────────────────────────────────────────┘ │
|
|
│ ┌────────────────────────────────────────────────────────┐ │
|
|
│ │ SIMD & Optimization │ │
|
|
│ │ - WASM SIMD (128-bit) │ │
|
|
│ │ - Quantization (binary, scalar, product) │ │
|
|
│ │ - Parallel query execution (rayon → WASM threads) │ │
|
|
│ └────────────────────────────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Component Breakdown
|
|
|
|
#### 1. Storage Layer (`ruvector-wasm/src/storage/`)
|
|
|
|
**In-Memory Store** (primary):
|
|
```rust
|
|
pub struct Database {
|
|
tables: DashMap<String, Table>,
|
|
triple_store: Arc<TripleStore>,
|
|
graph: Arc<GraphStore>,
|
|
indexes: DashMap<String, Index>,
|
|
}
|
|
|
|
pub struct Table {
|
|
schema: Schema,
|
|
rows: RwLock<Vec<Row>>,
|
|
vector_columns: HashMap<String, VectorColumn>,
|
|
}
|
|
```
|
|
|
|
**Persistence Adapters**:
|
|
```rust
|
|
trait PersistenceBackend {
|
|
async fn save(&self, db: &Database) -> Result<()>;
|
|
async fn load(&self) -> Result<Database>;
|
|
}
|
|
|
|
// Browser: IndexedDB
|
|
struct IndexedDBBackend;
|
|
|
|
// Browser: OPFS (Origin Private File System)
|
|
struct OPFSBackend;
|
|
|
|
// Node/Deno/Bun: File system
|
|
struct FileSystemBackend;
|
|
```
|
|
|
|
#### 2. Query Engines (`ruvector-wasm/src/query/`)
|
|
|
|
**SQL Engine**:
|
|
```rust
|
|
// Reuse existing PostgreSQL SQL functions, but with custom executor
|
|
pub struct SqlExecutor {
|
|
parser: SqlParser,
|
|
planner: QueryPlanner,
|
|
executor: Executor,
|
|
}
|
|
|
|
// Example: Vector similarity in SQL
|
|
// SELECT * FROM docs ORDER BY embedding <=> $1 LIMIT 10
|
|
```
|
|
|
|
**SPARQL Engine** (already exists!):
|
|
```rust
|
|
// From ruvector-postgres/src/graph/sparql/
|
|
pub use crate::graph::sparql::{
|
|
parse_sparql,
|
|
execute_sparql,
|
|
TripleStore,
|
|
};
|
|
|
|
// Example:
|
|
// SELECT ?subject ?label WHERE {
|
|
// ?subject rdfs:label ?label .
|
|
// FILTER vector:similar(?subject, ?query_vector, 0.8)
|
|
// }
|
|
```
|
|
|
|
**Cypher Engine** (already exists!):
|
|
```rust
|
|
// From ruvector-postgres/src/graph/cypher/
|
|
pub use crate::graph::cypher::{
|
|
parse_cypher,
|
|
execute_cypher,
|
|
};
|
|
|
|
// Example:
|
|
// MATCH (a:Person)-[:KNOWS]->(b:Person)
|
|
// WHERE vector.cosine(a.embedding, $query) > 0.8
|
|
// RETURN a, b
|
|
```
|
|
|
|
#### 3. WASM Bindings (`ruvector-wasm/src/lib.rs`)
|
|
|
|
```rust
|
|
use wasm_bindgen::prelude::*;
|
|
|
|
#[wasm_bindgen]
|
|
pub struct RuVector {
|
|
db: Arc<Database>,
|
|
}
|
|
|
|
#[wasm_bindgen]
|
|
impl RuVector {
|
|
/// Create a new in-memory database
|
|
#[wasm_bindgen(constructor)]
|
|
pub fn new() -> Result<RuVector, JsValue> {
|
|
Ok(RuVector {
|
|
db: Arc::new(Database::new()),
|
|
})
|
|
}
|
|
|
|
/// Execute SQL query
|
|
#[wasm_bindgen(js_name = executeSql)]
|
|
pub async fn execute_sql(&self, query: &str) -> Result<JsValue, JsValue> {
|
|
let results = self.db.execute_sql(query).await?;
|
|
Ok(serde_wasm_bindgen::to_value(&results)?)
|
|
}
|
|
|
|
/// Execute SPARQL query
|
|
#[wasm_bindgen(js_name = executeSparql)]
|
|
pub async fn execute_sparql(&self, query: &str) -> Result<JsValue, JsValue> {
|
|
let results = self.db.execute_sparql(query).await?;
|
|
Ok(serde_wasm_bindgen::to_value(&results)?)
|
|
}
|
|
|
|
/// Execute Cypher query
|
|
#[wasm_bindgen(js_name = executeCypher)]
|
|
pub async fn execute_cypher(&self, query: &str) -> Result<JsValue, JsValue> {
|
|
let results = self.db.execute_cypher(query).await?;
|
|
Ok(serde_wasm_bindgen::to_value(&results)?)
|
|
}
|
|
|
|
/// Insert vectors
|
|
#[wasm_bindgen(js_name = insertVectors)]
|
|
pub async fn insert_vectors(
|
|
&self,
|
|
table: &str,
|
|
vectors: Vec<f32>,
|
|
metadata: JsValue,
|
|
) -> Result<(), JsValue> {
|
|
// Implementation
|
|
Ok(())
|
|
}
|
|
|
|
/// Vector similarity search
|
|
#[wasm_bindgen(js_name = searchSimilar)]
|
|
pub async fn search_similar(
|
|
&self,
|
|
table: &str,
|
|
query_vector: Vec<f32>,
|
|
limit: usize,
|
|
) -> Result<JsValue, JsValue> {
|
|
// Implementation
|
|
Ok(serde_wasm_bindgen::to_value(&results)?)
|
|
}
|
|
|
|
/// Save database to storage
|
|
#[wasm_bindgen(js_name = save)]
|
|
pub async fn save(&self) -> Result<(), JsValue> {
|
|
self.db.persist().await?;
|
|
Ok(())
|
|
}
|
|
|
|
/// Load database from storage
|
|
#[wasm_bindgen(js_name = load)]
|
|
pub async fn load() -> Result<RuVector, JsValue> {
|
|
let db = Database::load().await?;
|
|
Ok(RuVector { db: Arc::new(db) })
|
|
}
|
|
}
|
|
```
|
|
|
|
#### 4. TypeScript API (`npm/packages/wasm/src/index.ts`)
|
|
|
|
```typescript
|
|
export class RuVector {
|
|
private db: WasmRuVector;
|
|
|
|
static async create(options?: DatabaseOptions): Promise<RuVector> {
|
|
await init(); // Load WASM
|
|
const db = new WasmRuVector();
|
|
return new RuVector(db);
|
|
}
|
|
|
|
// SQL interface
|
|
async sql<T = any>(query: string, params?: any[]): Promise<T[]> {
|
|
return await this.db.executeSql(query);
|
|
}
|
|
|
|
// SPARQL interface
|
|
async sparql(query: string): Promise<SparqlResults> {
|
|
return await this.db.executeSparql(query);
|
|
}
|
|
|
|
// Cypher interface
|
|
async cypher(query: string): Promise<CypherResults> {
|
|
return await this.db.executeCypher(query);
|
|
}
|
|
|
|
// Convenience methods
|
|
async insertVectors(table: string, data: VectorData[]): Promise<void> {
|
|
// Batch insert
|
|
}
|
|
|
|
async searchSimilar(
|
|
table: string,
|
|
queryVector: Float32Array,
|
|
options?: SearchOptions
|
|
): Promise<SearchResult[]> {
|
|
return await this.db.searchSimilar(table, queryVector, options.limit);
|
|
}
|
|
|
|
// Persistence
|
|
async save(): Promise<void> {
|
|
await this.db.save();
|
|
}
|
|
|
|
static async load(): Promise<RuVector> {
|
|
await init();
|
|
const db = await WasmRuVector.load();
|
|
return new RuVector(db);
|
|
}
|
|
}
|
|
|
|
// Export types
|
|
export type { VectorData, SearchResult, SearchOptions, SparqlResults, CypherResults };
|
|
```
|
|
|
|
---
|
|
|
|
## 📦 Project Structure
|
|
|
|
```
|
|
ruvector/
|
|
├── crates/
|
|
│ ├── ruvector-core/ # Shared core (NO PostgreSQL deps)
|
|
│ │ ├── types/ # Vector types
|
|
│ │ ├── distance/ # Distance metrics
|
|
│ │ ├── index/ # HNSW, IVF
|
|
│ │ ├── quantization/ # Binary, scalar, product
|
|
│ │ └── simd/ # WASM SIMD
|
|
│ │
|
|
│ ├── ruvector-wasm/ # NEW: Standalone WASM database
|
|
│ │ ├── Cargo.toml # wasm-bindgen, wasm-pack
|
|
│ │ ├── src/
|
|
│ │ │ ├── lib.rs # WASM bindings
|
|
│ │ │ ├── storage/ # In-memory + persistence
|
|
│ │ │ ├── query/ # SQL/SPARQL/Cypher engines
|
|
│ │ │ ├── graph/ # Graph operations (from postgres)
|
|
│ │ │ ├── learning/ # ReasoningBank (from postgres)
|
|
│ │ │ ├── gnn/ # GNN layers (from postgres)
|
|
│ │ │ └── hyperbolic/ # Hyperbolic spaces (from postgres)
|
|
│ │ └── tests/
|
|
│ │
|
|
│ └── ruvector-postgres/ # Existing PostgreSQL extension
|
|
│ └── ... (unchanged)
|
|
│
|
|
├── npm/
|
|
│ └── packages/
|
|
│ ├── wasm/ # NEW: @ruvector/wasm
|
|
│ │ ├── package.json
|
|
│ │ ├── src/
|
|
│ │ │ ├── index.ts # TypeScript API
|
|
│ │ │ ├── types.ts # Type definitions
|
|
│ │ │ └── workers/ # Web Worker support
|
|
│ │ ├── dist/
|
|
│ │ │ ├── ruvector_bg.wasm
|
|
│ │ │ ├── index.js
|
|
│ │ │ └── index.d.ts
|
|
│ │ └── examples/
|
|
│ │ ├── browser.html
|
|
│ │ ├── node.js
|
|
│ │ ├── deno.ts
|
|
│ │ └── cloudflare-worker.js
|
|
│ │
|
|
│ └── ... (existing packages)
|
|
│
|
|
└── docs/
|
|
├── RUVECTOR_WASM_QUICKSTART.md
|
|
├── SQL_API.md
|
|
├── SPARQL_API.md
|
|
├── CYPHER_API.md
|
|
└── DEPLOYMENT_GUIDE.md
|
|
```
|
|
|
|
---
|
|
|
|
## 🚀 Implementation Plan
|
|
|
|
### Phase 1: Core Extraction (Week 1-2)
|
|
|
|
**Extract from `ruvector-postgres`** to create standalone `ruvector-core`:
|
|
|
|
```toml
|
|
# crates/ruvector-core/Cargo.toml
|
|
[package]
|
|
name = "ruvector-core"
|
|
version = "0.1.0"
|
|
|
|
[lib]
|
|
crate-type = ["lib"]
|
|
|
|
[features]
|
|
default = ["std"]
|
|
std = [] # Standard library
|
|
wasm = ["wasm-bindgen"] # WASM support
|
|
|
|
[dependencies]
|
|
# NO PostgreSQL dependencies!
|
|
half = { version = "2.4", default-features = false }
|
|
simsimd = { version = "5.9", default-features = false }
|
|
serde = { version = "1.0", default-features = false, features = ["alloc"] }
|
|
dashmap = "6.0"
|
|
|
|
[target.'cfg(target_arch = "wasm32")'.dependencies]
|
|
wasm-bindgen = { version = "0.2", optional = true }
|
|
```
|
|
|
|
**Move these modules** (already written!):
|
|
- `types/` → Vector types (no changes needed)
|
|
- `distance/` → Distance metrics (remove pgrx deps)
|
|
- `index/hnsw.rs` → HNSW index (remove pgrx deps)
|
|
- `quantization/` → Quantization (no changes)
|
|
- `graph/` → SPARQL, Cypher (remove pgrx, add standalone storage)
|
|
- `learning/` → ReasoningBank (remove pgrx deps)
|
|
- `gnn/` → GNN layers (remove pgrx deps)
|
|
- `hyperbolic/` → Hyperbolic spaces (no changes)
|
|
|
|
### Phase 2: Storage Engine (Week 3)
|
|
|
|
**Create in-memory database**:
|
|
|
|
```rust
|
|
// ruvector-wasm/src/storage/database.rs
|
|
use dashmap::DashMap;
|
|
use parking_lot::RwLock;
|
|
|
|
pub struct Database {
|
|
tables: DashMap<String, Table>,
|
|
triple_store: Arc<TripleStore>,
|
|
graph: Arc<GraphStore>,
|
|
}
|
|
|
|
pub struct Table {
|
|
schema: Schema,
|
|
rows: RwLock<Vec<Row>>,
|
|
vector_index: Option<HnswIndex>,
|
|
}
|
|
|
|
impl Database {
|
|
pub fn create_table(&self, name: &str, schema: Schema) {
|
|
self.tables.insert(name.to_string(), Table::new(schema));
|
|
}
|
|
|
|
pub fn insert(&self, table: &str, row: Row) -> Result<()> {
|
|
let table = self.tables.get(table)?;
|
|
table.rows.write().push(row);
|
|
|
|
// Update vector index if present
|
|
if let Some(idx) = &table.vector_index {
|
|
idx.insert(row.vector)?;
|
|
}
|
|
Ok(())
|
|
}
|
|
|
|
pub fn search_similar(
|
|
&self,
|
|
table: &str,
|
|
query: &[f32],
|
|
k: usize,
|
|
) -> Result<Vec<Row>> {
|
|
let table = self.tables.get(table)?;
|
|
let idx = table.vector_index.as_ref()?;
|
|
let ids = idx.search(query, k)?;
|
|
|
|
let rows = table.rows.read();
|
|
Ok(ids.iter().map(|&id| rows[id].clone()).collect())
|
|
}
|
|
}
|
|
```
|
|
|
|
**Add persistence backends**:
|
|
|
|
```rust
|
|
// ruvector-wasm/src/storage/persist.rs
|
|
|
|
#[cfg(target_arch = "wasm32")]
|
|
mod browser {
|
|
use wasm_bindgen::prelude::*;
|
|
use web_sys::{IdbDatabase, IdbObjectStore};
|
|
|
|
pub async fn save_to_indexeddb(db: &Database) -> Result<()> {
|
|
let idb = open_indexeddb("ruvector").await?;
|
|
// Serialize database to IndexedDB
|
|
Ok(())
|
|
}
|
|
}
|
|
|
|
#[cfg(not(target_arch = "wasm32"))]
|
|
mod node {
|
|
use std::fs;
|
|
use bincode;
|
|
|
|
pub async fn save_to_file(db: &Database, path: &str) -> Result<()> {
|
|
let bytes = bincode::serialize(db)?;
|
|
fs::write(path, bytes)?;
|
|
Ok(())
|
|
}
|
|
}
|
|
```
|
|
|
|
### Phase 3: Query Engines (Week 4-5)
|
|
|
|
**SQL Executor**:
|
|
|
|
```rust
|
|
// ruvector-wasm/src/query/sql.rs
|
|
|
|
pub struct SqlExecutor {
|
|
db: Arc<Database>,
|
|
}
|
|
|
|
impl SqlExecutor {
|
|
pub async fn execute(&self, query: &str) -> Result<QueryResult> {
|
|
let ast = parse_sql(query)?;
|
|
|
|
match ast {
|
|
Statement::Select(select) => self.execute_select(select).await,
|
|
Statement::Insert(insert) => self.execute_insert(insert).await,
|
|
Statement::CreateTable(create) => self.execute_create(create).await,
|
|
_ => todo!(),
|
|
}
|
|
}
|
|
|
|
async fn execute_select(&self, select: Select) -> Result<QueryResult> {
|
|
// Vector similarity: ORDER BY embedding <=> $1
|
|
if let Some(order) = &select.order_by {
|
|
if order.op == VectorDistance {
|
|
return self.vector_search(select).await;
|
|
}
|
|
}
|
|
|
|
// Standard SELECT
|
|
self.scan_table(select).await
|
|
}
|
|
}
|
|
```
|
|
|
|
**SPARQL Executor** (already exists, just adapt!):
|
|
|
|
```rust
|
|
// Already in ruvector-postgres/src/graph/sparql/executor.rs
|
|
// Just remove pgrx dependencies and use our Database instead
|
|
|
|
pub async fn execute_sparql(
|
|
db: &Database,
|
|
query: &str,
|
|
) -> Result<SparqlResults> {
|
|
let ast = parse_sparql(query)?;
|
|
|
|
match ast.query_form {
|
|
QueryForm::Select => execute_select(db, ast).await,
|
|
QueryForm::Construct => execute_construct(db, ast).await,
|
|
// ... already implemented!
|
|
}
|
|
}
|
|
```
|
|
|
|
**Cypher Executor** (already exists!):
|
|
|
|
```rust
|
|
// Already in ruvector-postgres/src/graph/cypher/executor.rs
|
|
pub async fn execute_cypher(
|
|
db: &Database,
|
|
query: &str,
|
|
) -> Result<CypherResults> {
|
|
// Already implemented, just adapt to our Database
|
|
}
|
|
```
|
|
|
|
### Phase 4: WASM Compilation (Week 6)
|
|
|
|
**Configure for WASM**:
|
|
|
|
```toml
|
|
# ruvector-wasm/Cargo.toml
|
|
[package]
|
|
name = "ruvector-wasm"
|
|
version = "0.1.0"
|
|
|
|
[lib]
|
|
crate-type = ["cdylib"] # For WASM
|
|
|
|
[dependencies]
|
|
ruvector-core = { path = "../ruvector-core", features = ["wasm"] }
|
|
wasm-bindgen = "0.2"
|
|
wasm-bindgen-futures = "0.4"
|
|
serde-wasm-bindgen = "0.6"
|
|
js-sys = "0.3"
|
|
web-sys = { version = "0.3", features = ["Window", "IdbDatabase", "IdbObjectStore"] }
|
|
|
|
[profile.release]
|
|
opt-level = "z" # Optimize for size
|
|
lto = true # Link-time optimization
|
|
codegen-units = 1 # Single codegen unit
|
|
```
|
|
|
|
**Build script**:
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
# scripts/build-wasm.sh
|
|
|
|
echo "Building ruvector-wasm..."
|
|
|
|
# Install wasm-pack
|
|
cargo install wasm-pack
|
|
|
|
# Build for web
|
|
wasm-pack build crates/ruvector-wasm \
|
|
--target web \
|
|
--out-dir ../../npm/packages/wasm/dist \
|
|
--release
|
|
|
|
# Build for Node.js
|
|
wasm-pack build crates/ruvector-wasm \
|
|
--target nodejs \
|
|
--out-dir ../../npm/packages/wasm/dist-node \
|
|
--release
|
|
|
|
# Size report
|
|
echo "WASM size:"
|
|
du -h npm/packages/wasm/dist/ruvector_bg.wasm
|
|
```
|
|
|
|
### Phase 5: NPM Package (Week 7)
|
|
|
|
**TypeScript wrapper**:
|
|
|
|
```typescript
|
|
// npm/packages/wasm/src/index.ts
|
|
import init, {
|
|
RuVector as WasmRuVector,
|
|
InitOutput
|
|
} from '../dist/ruvector_wasm';
|
|
|
|
let wasmInit: Promise<InitOutput> | null = null;
|
|
|
|
async function ensureInit(): Promise<void> {
|
|
if (!wasmInit) {
|
|
wasmInit = init();
|
|
}
|
|
await wasmInit;
|
|
}
|
|
|
|
export class RuVector {
|
|
private db: WasmRuVector;
|
|
|
|
private constructor(db: WasmRuVector) {
|
|
this.db = db;
|
|
}
|
|
|
|
static async create(options?: DatabaseOptions): Promise<RuVector> {
|
|
await ensureInit();
|
|
const db = new WasmRuVector();
|
|
return new RuVector(db);
|
|
}
|
|
|
|
async sql<T = any>(query: string): Promise<T[]> {
|
|
const result = await this.db.executeSql(query);
|
|
return JSON.parse(result);
|
|
}
|
|
|
|
async sparql(query: string): Promise<SparqlResults> {
|
|
const result = await this.db.executeSparql(query);
|
|
return JSON.parse(result);
|
|
}
|
|
|
|
async cypher(query: string): Promise<any> {
|
|
const result = await this.db.executeCypher(query);
|
|
return JSON.parse(result);
|
|
}
|
|
|
|
async close(): Promise<void> {
|
|
// Cleanup
|
|
}
|
|
}
|
|
|
|
export * from './types';
|
|
```
|
|
|
|
**Package configuration**:
|
|
|
|
```json
|
|
{
|
|
"name": "@ruvector/wasm",
|
|
"version": "0.1.0",
|
|
"description": "Standalone vector database with SQL/SPARQL/Cypher - runs anywhere",
|
|
"type": "module",
|
|
"main": "./dist/index.js",
|
|
"types": "./dist/index.d.ts",
|
|
"exports": {
|
|
".": {
|
|
"types": "./dist/index.d.ts",
|
|
"import": "./dist/index.js",
|
|
"require": "./dist/index.cjs"
|
|
},
|
|
"./node": {
|
|
"types": "./dist-node/index.d.ts",
|
|
"import": "./dist-node/index.js"
|
|
}
|
|
},
|
|
"files": ["dist", "dist-node"],
|
|
"keywords": [
|
|
"vector-database",
|
|
"wasm",
|
|
"graph-database",
|
|
"sparql",
|
|
"cypher",
|
|
"sql",
|
|
"semantic-search",
|
|
"embeddings"
|
|
],
|
|
"engines": {
|
|
"node": ">=18.0.0"
|
|
}
|
|
}
|
|
```
|
|
|
|
### Phase 6: Examples & Documentation (Week 8)
|
|
|
|
**Browser example**:
|
|
|
|
```html
|
|
<!-- examples/browser.html -->
|
|
<!DOCTYPE html>
|
|
<html>
|
|
<head>
|
|
<title>RuVector WASM Demo</title>
|
|
</head>
|
|
<body>
|
|
<h1>Standalone Vector Database in Your Browser!</h1>
|
|
<div id="results"></div>
|
|
|
|
<script type="module">
|
|
import { RuVector } from 'https://cdn.jsdelivr.net/npm/@ruvector/wasm/+esm';
|
|
|
|
const db = await RuVector.create();
|
|
|
|
// Create table
|
|
await db.sql(`
|
|
CREATE TABLE documents (
|
|
id INTEGER PRIMARY KEY,
|
|
content TEXT,
|
|
embedding VECTOR(384)
|
|
)
|
|
`);
|
|
|
|
// Create vector index
|
|
await db.sql(`
|
|
CREATE INDEX ON documents USING hnsw (embedding)
|
|
`);
|
|
|
|
// Insert documents
|
|
const embedding = new Float32Array(384).map(() => Math.random());
|
|
await db.sql(`
|
|
INSERT INTO documents (id, content, embedding)
|
|
VALUES (1, 'Hello world', $1)
|
|
`, [embedding]);
|
|
|
|
// Vector search
|
|
const results = await db.sql(`
|
|
SELECT content, embedding <=> $1 AS distance
|
|
FROM documents
|
|
ORDER BY distance
|
|
LIMIT 5
|
|
`, [embedding]);
|
|
|
|
console.log('Search results:', results);
|
|
|
|
// SPARQL query
|
|
const sparqlResults = await db.sparql(`
|
|
SELECT ?subject ?label WHERE {
|
|
?subject rdfs:label ?label .
|
|
}
|
|
`);
|
|
|
|
// Cypher query
|
|
const cypherResults = await db.cypher(`
|
|
MATCH (a:Person)-[:KNOWS]->(b:Person)
|
|
RETURN a.name, b.name
|
|
`);
|
|
|
|
// Save to IndexedDB
|
|
await db.save();
|
|
</script>
|
|
</body>
|
|
</html>
|
|
```
|
|
|
|
---
|
|
|
|
## 📊 Size Budget
|
|
|
|
| Component | Size | Cumulative |
|
|
|-----------|------|------------|
|
|
| Core WASM runtime | ~500KB | 500KB |
|
|
| Vector operations + SIMD | ~300KB | 800KB |
|
|
| HNSW index | ~400KB | 1.2MB |
|
|
| SQL parser & executor | ~600KB | 1.8MB |
|
|
| SPARQL engine | ~800KB | 2.6MB |
|
|
| Cypher engine | ~600KB | 3.2MB |
|
|
| Graph operations | ~400KB | 3.6MB |
|
|
| GNN layers | ~800KB | 4.4MB |
|
|
| ReasoningBank learning | ~600KB | 5.0MB |
|
|
| Hyperbolic spaces | ~300KB | 5.3MB |
|
|
| **Total (gzipped)** | | **~5-6MB** |
|
|
|
|
**Comparison**:
|
|
- DuckDB-WASM: ~6-8MB
|
|
- SQLite-WASM: ~1MB (no vector/graph features)
|
|
- PGlite: ~3MB (minimal Postgres, no vector by default)
|
|
- **RuVector-WASM: ~5-6MB** (FULL features!)
|
|
|
|
---
|
|
|
|
## 🎯 Success Metrics
|
|
|
|
| Metric | Target |
|
|
|--------|--------|
|
|
| Bundle size (gzipped) | < 6MB |
|
|
| Load time (browser) | < 1s |
|
|
| Query latency (1k vectors) | < 20ms |
|
|
| Memory usage (100k vectors) | < 200MB |
|
|
| Browser support | Chrome 91+, Firefox 89+, Safari 16.4+ |
|
|
| Node.js support | 18+ |
|
|
|
|
---
|
|
|
|
## 🚀 Usage Examples
|
|
|
|
### Browser - Semantic Search
|
|
|
|
```typescript
|
|
import { RuVector } from '@ruvector/wasm';
|
|
|
|
// Create database
|
|
const db = await RuVector.create();
|
|
|
|
// Setup schema
|
|
await db.sql(`
|
|
CREATE TABLE articles (
|
|
id SERIAL PRIMARY KEY,
|
|
title TEXT,
|
|
content TEXT,
|
|
embedding VECTOR(768)
|
|
)
|
|
`);
|
|
|
|
// Insert with embeddings
|
|
await db.insertVectors('articles', [
|
|
{
|
|
title: 'AI in 2025',
|
|
content: '...',
|
|
embedding: await getEmbedding('AI in 2025')
|
|
},
|
|
// ... more articles
|
|
]);
|
|
|
|
// Semantic search
|
|
const results = await db.searchSimilar(
|
|
'articles',
|
|
queryEmbedding,
|
|
{ limit: 10, threshold: 0.7 }
|
|
);
|
|
```
|
|
|
|
### Node.js - Knowledge Graph
|
|
|
|
```typescript
|
|
import { RuVector } from '@ruvector/wasm/node';
|
|
|
|
const db = await RuVector.create();
|
|
|
|
// Cypher: Create knowledge graph
|
|
await db.cypher(`
|
|
CREATE (a:Person {name: 'Alice', embedding: $a_emb})
|
|
CREATE (b:Person {name: 'Bob', embedding: $b_emb})
|
|
CREATE (a)-[:KNOWS]->(b)
|
|
`, { a_emb: aliceEmbedding, b_emb: bobEmbedding });
|
|
|
|
// Cypher: Find similar people
|
|
await db.cypher(`
|
|
MATCH (p:Person)
|
|
WHERE vector.cosine(p.embedding, $query) > 0.8
|
|
RETURN p.name
|
|
`);
|
|
```
|
|
|
|
### Cloudflare Workers - Edge Search
|
|
|
|
```typescript
|
|
import { RuVector } from '@ruvector/wasm';
|
|
|
|
export default {
|
|
async fetch(request: Request, env: Env) {
|
|
const db = await RuVector.create();
|
|
|
|
// Load from Durable Object or KV
|
|
// await db.load();
|
|
|
|
const { query } = await request.json();
|
|
const embedding = await generateEmbedding(query);
|
|
|
|
const results = await db.sql(`
|
|
SELECT * FROM products
|
|
ORDER BY embedding <=> $1
|
|
LIMIT 10
|
|
`, [embedding]);
|
|
|
|
return Response.json(results);
|
|
}
|
|
};
|
|
```
|
|
|
|
### Deno - SPARQL RDF Store
|
|
|
|
```typescript
|
|
import { RuVector } from "https://esm.sh/@ruvector/wasm";
|
|
|
|
const db = await RuVector.create();
|
|
|
|
// SPARQL: Query RDF data
|
|
const results = await db.sparql(`
|
|
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
PREFIX vec: <http://ruvector.dev/vector/>
|
|
|
|
SELECT ?person ?name ?similarity WHERE {
|
|
?person foaf:name ?name .
|
|
BIND(vec:cosine(?person, ?query) AS ?similarity)
|
|
FILTER(?similarity > 0.8)
|
|
}
|
|
ORDER BY DESC(?similarity)
|
|
LIMIT 10
|
|
`);
|
|
```
|
|
|
|
---
|
|
|
|
## 📅 8-Week Timeline
|
|
|
|
| Week | Milestone |
|
|
|------|-----------|
|
|
| 1-2 | Extract `ruvector-core`, remove PostgreSQL deps |
|
|
| 3 | Build storage engine (in-memory + persistence) |
|
|
| 4-5 | Adapt query engines (SQL, SPARQL, Cypher) |
|
|
| 6 | WASM compilation, optimization, testing |
|
|
| 7 | NPM package, TypeScript API, CI/CD |
|
|
| 8 | Documentation, examples, beta release |
|
|
|
|
---
|
|
|
|
## ✅ Advantages Over PGlite Extension
|
|
|
|
| Feature | PGlite Extension | RuVector-WASM |
|
|
|---------|------------------|---------------|
|
|
| PostgreSQL dependency | ✅ Included (3MB) | ❌ None |
|
|
| Full ruvector features | ❌ Limited (size) | ✅ All features |
|
|
| SPARQL/Cypher | ❌ Hard to add | ✅ Built-in |
|
|
| GNN/Learning | ❌ Too large | ✅ Included |
|
|
| Build complexity | ❌ Emscripten + PGlite fork | ✅ wasm-pack only |
|
|
| Maintenance | ❌ Track PGlite updates | ✅ Independent |
|
|
| Size | ~3MB + extension | ~5-6MB total |
|
|
|
|
---
|
|
|
|
**Ready to build the future of vector databases?** 🚀
|
|
|
|
This is the RIGHT architecture for your vision!
|