Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
ruv
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions

@@ -0,0 +1,507 @@
# RvLite Revised Architecture - Maximum WASM Reuse
## 🎯 Critical Discovery
After thorough review, **RvLite can be built as a THIN ORCHESTRATION LAYER** over existing WASM crates!
---
## ✅ What Already Exists (WASM-Ready)
### 1. Vector Operations - **100% Complete**
**Crate**: `ruvector-wasm`
- Vector types (vector, halfvec, binaryvec, sparsevec)
- Distance metrics (L2, cosine, inner product, etc.)
- HNSW indexing
- Quantization
- IndexedDB persistence
- SIMD support
**Reuse Strategy**: **Direct dependency**
```toml
ruvector-wasm = { path = "../ruvector-wasm" }
```
### 2. Graph Database + Cypher - **100% Complete**
**Crates**:
- `ruvector-graph` ✅ (Core graph DB with Cypher parser/executor)
- `ruvector-graph-wasm` ✅ (WASM bindings)
**What's Included**:
- ✅ Cypher parser (`src/cypher/parser.rs`)
- ✅ Cypher executor (`src/executor/`)
- ✅ Graph storage
- ✅ Neo4j compatibility
- ✅ ACID transactions
- ✅ Property graphs
- ✅ Hypergraphs
**Reuse Strategy**: **Direct dependency**
```toml
ruvector-graph-wasm = { path = "../ruvector-graph-wasm" }
```
### 3. Graph Neural Networks - **100% Complete**
**Crates**:
- `ruvector-gnn` ✅ (GNN layers)
- `ruvector-gnn-wasm` ✅ (WASM bindings)
**What's Included**:
- ✅ GCN, GraphSage, GAT, GIN
- ✅ Node embeddings
- ✅ Graph classification
- ✅ Tensor compression
**Reuse Strategy**: **Direct dependency**
```toml
ruvector-gnn-wasm = { path = "../ruvector-gnn-wasm" }
```
### 4. Self-Learning (ReasoningBank) - **100% Complete**
**Crate**: `sona`
**What's Included**:
- ✅ Micro-LoRA (instant learning)
- ✅ Base-LoRA (background learning)
- ✅ EWC++ (prevent catastrophic forgetting)
- ✅ ReasoningBank (pattern extraction)
- ✅ Trajectory tracking
- ✅ WASM support (feature flag)
**Reuse Strategy**: **Direct dependency**
```toml
sona = { path = "../sona", features = ["wasm"] }
```
### 5. Ultra-Lightweight HNSW - **100% Complete**
**Crate**: `micro-hnsw-wasm`
**What's Included**:
- ✅ Neuromorphic HNSW (11.8KB!)
- ✅ Spiking neural networks
- ✅ Ultra-optimized
**Reuse Strategy**: **Optional for size-constrained builds**
```toml
micro-hnsw-wasm = { path = "../micro-hnsw-wasm", optional = true }
```
### 6. Attention Mechanisms - **100% Complete**
**Crate**: `ruvector-attention-wasm`
**Reuse Strategy**: **Optional feature**
```toml
ruvector-attention-wasm = { path = "../ruvector-attention-wasm", optional = true }
```
---
## ❌ What's Missing (Need to Create)
### 1. SQL Query Engine - **NOT IMPLEMENTED**
**Status**: Need to build
**Options**:
- **Option A**: Use `sqlparser-rs` (~200KB)
- **Option B**: Build lightweight SQL subset parser (~50KB)
- **Option C**: Skip SQL, use programmatic API only
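**Recommendation**: Option A (broadest SQL syntax coverage). Note that `sqlparser-rs` only parses SQL into an AST, so the executor still has to be written.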
### 2. SPARQL Engine - **PARTIALLY EXISTS**
**Status**: Exists in `ruvector-postgres` but needs extraction
**Location**: `crates/ruvector-postgres/src/graph/sparql/`
**What Exists**:
- ✅ SPARQL 1.1 parser (`parser.rs`)
- ✅ SPARQL executor (`executor.rs`)
- ✅ Triple store (`triple_store.rs`)
- ✅ Result formatting (`results.rs`)
**Issues**:
- ❌ Uses `pgrx` (PostgreSQL extension framework)
- ❌ Tied to PostgreSQL storage
**Extraction Strategy**:
1. Copy `sparql/` module from ruvector-postgres
2. Remove `pgrx` dependencies
3. Replace PostgreSQL storage with RvLite storage
4. Wrap in WASM bindings
**Effort**: 2-3 days
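Step 3 of the extraction strategy (swapping PostgreSQL storage for RvLite storage) is easier if the executor depends on a small storage trait rather than on `pgrx` types. The following is a minimal sketch under that assumption; `TripleBackend` and `MemoryTripleStore` are hypothetical names, not taken from `ruvector-postgres`.

```rust
/// A (subject, predicate, object) triple stored as plain strings.
pub type Triple = (String, String, String);

/// Hypothetical storage interface the extracted SPARQL executor could
/// target instead of PostgreSQL-backed storage.
pub trait TripleBackend {
    /// Adds a triple; duplicates are ignored.
    fn insert(&mut self, triple: Triple);

    /// Returns all triples matching the pattern. `None` is a wildcard.
    fn find(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple>;
}

/// In-memory backend with linear-scan matching, enough for a sketch.
#[derive(Default)]
pub struct MemoryTripleStore {
    triples: Vec<Triple>,
}

impl TripleBackend for MemoryTripleStore {
    fn insert(&mut self, triple: Triple) {
        if !self.triples.contains(&triple) {
            self.triples.push(triple);
        }
    }

    fn find(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
        self.triples
            .iter()
            .filter(|(ts, tp, tobj)| {
                s.map_or(true, |v| v == ts.as_str())
                    && p.map_or(true, |v| v == tp.as_str())
                    && o.map_or(true, |v| v == tobj.as_str())
            })
            .cloned()
            .collect()
    }
}
```

A pattern such as `store.find(Some("alice"), None, None)` then plays the role of the basic triple pattern `alice ?p ?o`. A production store would index by subject, predicate, and object rather than scanning.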
### 3. Storage Engine - **PARTIALLY EXISTS**
**Status**: Each crate has its own storage
**What Exists**:
- `ruvector-wasm` → In-memory + IndexedDB
- `ruvector-graph` → Graph storage
**What's Missing**: a **unified storage layer** that spans these backends
**Recommendation**: Create thin adapter layer that routes:
- Vector data → `ruvector-wasm`
- Graph data → `ruvector-graph-wasm`
- Triples → SPARQL triple store (extracted)
**Effort**: 1-2 days
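The routing rule above can be sketched as a single `put` entry point that inspects the record kind and forwards it. This is only an illustration: the `Record`, `Backend`, and `StorageAdapter` names are assumptions, and the in-memory collections stand in for the real `ruvector-wasm`, `ruvector-graph-wasm`, and triple-store backends.

```rust
use std::collections::HashMap;

/// Kinds of data the adapter has to place (illustrative shapes only).
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    Vector { id: String, values: Vec<f32> },
    Edge { from: String, label: String, to: String },
    Triple { subject: String, predicate: String, object: String },
}

/// Which backend a record was routed to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    VectorStore,
    GraphStore,
    TripleStore,
}

/// In-memory stand-ins for the three real backends.
#[derive(Default)]
pub struct StorageAdapter {
    vectors: HashMap<String, Vec<f32>>,
    edges: Vec<(String, String, String)>,
    triples: Vec<(String, String, String)>,
}

impl StorageAdapter {
    /// Stores the record in the matching backend and reports where it went.
    pub fn put(&mut self, record: Record) -> Backend {
        match record {
            Record::Vector { id, values } => {
                self.vectors.insert(id, values);
                Backend::VectorStore
            }
            Record::Edge { from, label, to } => {
                self.edges.push((from, label, to));
                Backend::GraphStore
            }
            Record::Triple { subject, predicate, object } => {
                self.triples.push((subject, predicate, object));
                Backend::TripleStore
            }
        }
    }

    /// Number of stored (vectors, edges, triples).
    pub fn counts(&self) -> (usize, usize, usize) {
        (self.vectors.len(), self.edges.len(), self.triples.len())
    }
}
```

Keeping the dispatch in one `match` means adding a new backend is a compile-time-checked change: the compiler flags every place that does not yet handle the new `Record` variant.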
### 4. Orchestration Layer - **NOT IMPLEMENTED**
**Status**: Need to create
**Purpose**: Unified API that routes queries to appropriate engines
**Structure**:
```rust
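use std::sync::Arc;
// Assumes anyhow's Result, since anyhow is among the planned dependencies.
use anyhow::{anyhow, Result};

pub struct RvLite {
    vector_db: Arc<VectorDB>,             // From ruvector-wasm
    graph_db: Arc<GraphDB>,               // From ruvector-graph-wasm
    gnn_engine: Arc<GnnEngine>,           // From ruvector-gnn-wasm
    learning_engine: Arc<SonaEngine>,     // From sona
    sparql_executor: Arc<SparqlExecutor>, // Extracted from ruvector-postgres
    sql_executor: Arc<SqlExecutor>,       // NEW
}

impl RvLite {
    pub async fn query(&self, query: &str) -> Result<QueryResult> {
        // Route to the appropriate engine based on the leading keyword.
        // NOTE: SPARQL queries can also start with SELECT, so a real router
        // needs a stronger check than this case-sensitive prefix match.
        let q = query.trim_start();
        if q.starts_with("SELECT") {
            self.sql_executor.execute(query).await
        } else if q.starts_with("MATCH") {
            self.graph_db.cypher(query).await
        } else if q.starts_with("PREFIX") {
            self.sparql_executor.execute(query).await
        } else {
            // Every branch must produce a value; reject unrecognized queries.
            Err(anyhow!("unsupported query: {query}"))
        }
    }
}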
```
**Effort**: 2-3 days
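Because SQL and SPARQL both use `SELECT`, prefix matching alone cannot tell them apart. One possible heuristic, sketched below with only the standard library, is to compare the first keyword case-insensitively and treat a `SELECT` that contains a `?var` token as SPARQL. `QueryLanguage` and `detect_language` are hypothetical names, and the keyword lists are deliberately small. A string literal containing `?x` would fool it, so a real router would more likely try each parser in turn.

```rust
/// Query languages the orchestration layer can dispatch to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryLanguage {
    Sql,
    Cypher,
    Sparql,
}

/// Returns true if `text` contains a SPARQL-style variable such as `?s`.
fn has_sparql_variable(text: &str) -> bool {
    text.as_bytes()
        .windows(2)
        .any(|w| w[0] == b'?' && (w[1].is_ascii_alphabetic() || w[1] == b'_'))
}

/// Guesses the query language from the first keyword (heuristic only).
pub fn detect_language(query: &str) -> Option<QueryLanguage> {
    let trimmed = query.trim_start();
    // First token, cut at whitespace or an opening bracket, upper-cased.
    let first = trimmed
        .split(|c: char| c.is_whitespace() || c == '(' || c == '{')
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();

    match first.as_str() {
        // PREFIX only appears at the start of SPARQL queries.
        "PREFIX" => Some(QueryLanguage::Sparql),
        // SELECT is shared; a `?var` token marks the query as SPARQL.
        "SELECT" if has_sparql_variable(trimmed) => Some(QueryLanguage::Sparql),
        "SELECT" | "INSERT" | "CREATE" => Some(QueryLanguage::Sql),
        "MATCH" => Some(QueryLanguage::Cypher),
        _ => None,
    }
}
```

Returning `Option` leaves the decision about unknown input to the caller, which can turn `None` into a descriptive error instead of silently picking an engine.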
---
## 📊 Revised Implementation Effort
### Total Estimated Effort
| Component | Status | Effort | Reuse % |
|-----------|--------|--------|---------|
| Vector operations | ✅ Exists | 0 days | 100% |
| Cypher/Graph DB | ✅ Exists | 0 days | 100% |
| GNN layers | ✅ Exists | 0 days | 100% |
| ReasoningBank | ✅ Exists | 0 days | 100% |
| HNSW indexing | ✅ Exists | 0 days | 100% |
| Attention | ✅ Exists | 0 days | 100% |
| **SQL engine** | ❌ Missing | **3-4 days** | 0% |
| **SPARQL extraction** | ⚠️ Partial | **2-3 days** | 80% |
| **Storage adapter** | ⚠️ Partial | **1-2 days** | 60% |
| **Orchestration layer** | ❌ Missing | **2-3 days** | 0% |
| **WASM bindings** | ⚠️ Partial | **2-3 days** | 50% |
| **Testing** | ❌ Missing | **2-3 days** | 0% |
| **Documentation** | ❌ Missing | **2-3 days** | 0% |
**Total New Work**: **14-21 days** (2-3 weeks)
**Reuse Rate**: **~70%**
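As a quick cross-check of the total, the per-row ranges in the table can be summed directly. The sketch below hard-codes the seven rows that carry non-zero effort; `NEW_WORK` and `total_effort` are illustrative names.

```rust
/// (component, min_days, max_days) for the table rows that need new work.
pub const NEW_WORK: [(&str, u32, u32); 7] = [
    ("SQL engine", 3, 4),
    ("SPARQL extraction", 2, 3),
    ("Storage adapter", 1, 2),
    ("Orchestration layer", 2, 3),
    ("WASM bindings", 2, 3),
    ("Testing", 2, 3),
    ("Documentation", 2, 3),
];

/// Sums the lower and upper bounds of every estimate.
pub fn total_effort(items: &[(&str, u32, u32)]) -> (u32, u32) {
    items
        .iter()
        .fold((0, 0), |(lo, hi), (_, min, max)| (lo + min, hi + max))
}
```

`total_effort(&NEW_WORK)` yields `(14, 21)`, matching the 14-21 day range stated above.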
---
## 🏗️ Optimized RvLite Architecture
### Minimal Dependency Graph
```
┌─────────────────────────────────────────┐
│  RvLite (NEW - Orchestration Only)      │
│  ├─ SQL parser & executor (NEW)         │
│  ├─ SPARQL executor (extracted)         │
│  ├─ Storage adapter (NEW)               │
│  └─ Unified WASM API (NEW)              │
└──────────────┬──────────────────────────┘
               │ depends on (100% reuse)
┌─────────────────────────────────────────┐
│  Existing WASM Crates                   │
├─────────────────────────────────────────┤
│  • ruvector-wasm (vectors)              │
│  • ruvector-graph-wasm (Cypher)         │
│  • ruvector-gnn-wasm (GNN)              │
│  • sona (learning)                      │
│  • micro-hnsw-wasm (optional)           │
│  • ruvector-attention-wasm (optional)   │
└─────────────────────────────────────────┘
```
### Simplified File Structure
```
crates/rvlite/
├── Cargo.toml                  # Depends on existing WASM crates
├── src/
│   ├── lib.rs                  # WASM entry point, orchestration
│   ├── storage/
│   │   └── adapter.rs          # Routes to existing storage backends
│   ├── query/
│   │   ├── sql/                # NEW: SQL engine
│   │   │   ├── parser.rs
│   │   │   └── executor.rs
│   │   └── sparql/             # EXTRACTED from ruvector-postgres
│   │       ├── mod.rs          # (remove pgrx deps)
│   │       ├── parser.rs
│   │       ├── executor.rs
│   │       └── triple_store.rs
│   ├── api.rs                  # Unified TypeScript API
│   └── error.rs                # Error handling
├── tests/
│   ├── sql_tests.rs
│   ├── sparql_tests.rs
│   └── integration_tests.rs
└── examples/
    ├── browser.html
    └── nodejs.ts
```
---
## 🚀 Ultra-Fast 2-Week Implementation Plan
### Week 1: Core Integration
**Monday** (Day 1):
- Create `rvlite` crate
- Set up `Cargo.toml` with all existing WASM crate dependencies
- Basic orchestration layer structure
**Tuesday** (Day 2):
- Storage adapter implementation
- Route vector ops to `ruvector-wasm`
- Route graph ops to `ruvector-graph-wasm`
**Wednesday** (Day 3):
- Extract SPARQL from `ruvector-postgres`
- Remove `pgrx` dependencies
- Adapt to RvLite storage
**Thursday** (Day 4):
- Integrate `sona` for learning
- Integrate `ruvector-gnn-wasm` for GNN
- Test basic operations
**Friday** (Day 5):
- SQL parser integration (sqlparser-rs)
- Basic SQL executor
- Week 1 demo
### Week 2: SQL Engine + Polish
**Monday** (Day 6):
- Complete SQL executor
- Vector operators in SQL (<->, <=>, <#>)
- CREATE TABLE, INSERT, SELECT
**Tuesday** (Day 7):
- SQL query planning
- Index support
- JOIN operations (basic)
**Wednesday** (Day 8):
- WASM bindings for unified API
- TypeScript type definitions
- JavaScript examples
**Thursday** (Day 9):
- Testing (unit, integration)
- Performance benchmarking
- Size optimization
**Friday** (Day 10):
- Documentation
- Examples (browser, Node.js, Deno)
- Beta release preparation
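The three vector operators planned for Day 6 (`<->`, `<=>`, `<#>`) are not defined in this document. The sketch below assumes RvLite adopts the pgvector conventions: `<->` is Euclidean distance, `<=>` is cosine distance, and `<#>` is the negative inner product, so that smaller always means closer in an `ORDER BY`. The function names are illustrative and not taken from `ruvector-wasm`.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// `<->`: Euclidean (L2) distance.
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// `<#>`: negative inner product, so larger similarity sorts first.
pub fn negative_inner_product(a: &[f32], b: &[f32]) -> f32 {
    -dot(a, b)
}

/// `<=>`: cosine distance, i.e. 1 minus cosine similarity.
/// Returns NaN if either vector has zero length (norm of 0).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    1.0 - dot(a, b) / (norm_a * norm_b)
}
```

Under these assumptions, `ORDER BY embedding <=> $1` sorts rows by ascending `cosine_distance(embedding, $1)`. The SQL executor would map each operator token to one of these functions.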
---
## 📦 Optimized Cargo.toml
```toml
[package]
name = "rvlite"
version = "0.1.0"
edition = "2021"
description = "Standalone vector database with SQL, SPARQL, and Cypher - powered by RuVector WASM"
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
# ===== 100% REUSE - Existing WASM Crates =====
ruvector-wasm = { path = "../ruvector-wasm" }
ruvector-graph-wasm = { path = "../ruvector-graph-wasm" }
ruvector-gnn-wasm = { path = "../ruvector-gnn-wasm" }
sona = { path = "../sona", features = ["wasm"] }
# Optional features
micro-hnsw-wasm = { path = "../micro-hnsw-wasm", optional = true }
ruvector-attention-wasm = { path = "../ruvector-attention-wasm", optional = true }
# ===== NEW - SQL Engine =====
sqlparser = "0.49" # ~200KB
# ===== WASM Bindings (same as existing crates) =====
wasm-bindgen = { workspace = true }
wasm-bindgen-futures = { workspace = true }
js-sys = { workspace = true }
web-sys = { workspace = true, features = ["console", "IdbDatabase", "Window"] }
serde-wasm-bindgen = "0.6"
console_error_panic_hook = "0.1"
# ===== Standard Dependencies =====
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
[dev-dependencies]
wasm-bindgen-test = "0.3"
criterion = "0.5"
[features]
default = ["sql", "sparql", "cypher"]
sql = []
sparql = []
cypher = [] # Always included via ruvector-graph-wasm
gnn = [] # Always included via ruvector-gnn-wasm
learning = [] # Always included via sona
attention = ["dep:ruvector-attention-wasm"]
micro-hnsw = ["dep:micro-hnsw-wasm"]
full = ["sql", "sparql", "cypher", "gnn", "learning", "attention"]
lite = ["sql"] # Just SQL + vectors
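# NOTE: Cargo only honors [profile.*] sections in the workspace root manifest;
# move these settings there if rvlite is built as a workspace member.
[profile.release]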
opt-level = "z"
lto = true
codegen-units = 1
panic = "abort"
[profile.release.package."*"]
opt-level = "z"
```
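To show how the `sql`, `sparql`, and `cypher` features could surface in code, here is a minimal sketch that reports which engines were compiled in. `enabled_engines` is a hypothetical helper. The `allow` attribute is there only because, compiled outside Cargo, the feature names are unknown to the compiler.

```rust
/// Lists the query engines compiled into this build, based on Cargo features.
#[allow(unexpected_cfgs)] // features are declared in Cargo.toml, not in this snippet
pub fn enabled_engines() -> Vec<&'static str> {
    let mut engines = Vec::new();
    // cfg! evaluates to a compile-time boolean for each feature.
    if cfg!(feature = "sql") {
        engines.push("sql");
    }
    if cfg!(feature = "sparql") {
        engines.push("sparql");
    }
    if cfg!(feature = "cypher") {
        engines.push("cypher");
    }
    engines
}
```

For larger pieces of code, `#[cfg(feature = "sql")]` on a `mod` declaration removes the whole module from builds that do not enable the feature, which is what actually reduces binary size.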
---
## 💡 Key Implementation Insights
### 1. RvLite = Thin Orchestration Layer
**NOT**: Reimplementing everything
**YES**: Composing existing WASM crates
```rust
// RvLite doesn't reimplement - it orchestrates!
#[wasm_bindgen]
pub struct RvLite {
    // Delegate to existing implementations
    vectors: VectorDB,      // From ruvector-wasm
    graph: GraphDB,         // From ruvector-graph-wasm
    gnn: GnnEngine,         // From ruvector-gnn-wasm
    learning: SonaEngine,   // From sona
    // Only NEW components
    sql: SqlExecutor,       // NEW
    sparql: SparqlExecutor, // Extracted
}
```
### 2. Unified API Pattern
```typescript
// Single entry point
const db = await RvLite.create();
// Automatically routes to correct engine
await db.query(`SELECT * FROM docs ORDER BY embedding <=> $1`); // → SQL
await db.query(`MATCH (a)-[:KNOWS]->(b) RETURN a, b`); // → Cypher
await db.query(`SELECT ?s ?p ?o WHERE { ?s ?p ?o }`); // → SPARQL
```
### 3. Zero-Copy Data Sharing
```rust
// Share storage between engines
struct SharedStorage {
    vectors: Arc<VectorStorage>, // From ruvector-wasm
    graph: Arc<GraphStorage>,    // From ruvector-graph
    triples: Arc<TripleStore>,   // From SPARQL
}
// SQL can query vectors stored by vector engine
// Cypher can use vectors from vector engine
// SPARQL can reference graph nodes
```
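A small sketch of what sharing storage through `Arc` could look like in practice: two engines hold clones of the same `Arc`, so a vector written by one is visible to the other without copying the data. All types here are simplified stand-ins, not the real `ruvector-wasm` storage, and `std::sync::RwLock` is used for interior mutability as an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified stand-in for the vector engine's storage.
#[derive(Default)]
pub struct VectorStorage {
    vectors: RwLock<HashMap<String, Vec<f32>>>,
}

impl VectorStorage {
    pub fn insert(&self, id: &str, values: Vec<f32>) {
        self.vectors.write().unwrap().insert(id.to_string(), values);
    }

    pub fn dimension_of(&self, id: &str) -> Option<usize> {
        self.vectors.read().unwrap().get(id).map(|v| v.len())
    }
}

/// Writes vectors into the shared storage.
pub struct VectorEngine {
    storage: Arc<VectorStorage>,
}

/// Reads vectors that another engine stored.
pub struct SqlEngine {
    storage: Arc<VectorStorage>,
}

impl VectorEngine {
    pub fn new(storage: Arc<VectorStorage>) -> Self {
        Self { storage }
    }

    pub fn store(&self, id: &str, values: Vec<f32>) {
        self.storage.insert(id, values);
    }
}

impl SqlEngine {
    pub fn new(storage: Arc<VectorStorage>) -> Self {
        Self { storage }
    }

    pub fn vector_dim(&self, id: &str) -> Option<usize> {
        self.storage.dimension_of(id)
    }
}
```

Cloning an `Arc` copies only a pointer and bumps a reference count, so the vector data itself is stored once no matter how many engines hold a handle.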
---
## 📈 Revised Size Estimate
| Component | Size (gzipped) |
|-----------|----------------|
| ruvector-wasm | 500KB |
| ruvector-graph-wasm (Cypher) | 600KB |
| ruvector-gnn-wasm | 300KB |
| sona (learning) | 300KB |
| SQL engine (sqlparser-rs) | 200KB |
| SPARQL executor (extracted) | 300KB |
| RvLite orchestration | 100KB |
| **Total** | **~2.3MB** |
**Original Estimate**: 5-6MB
**Revised with Reuse**: **2-3MB**
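The size table can be checked against the 3MB bundle target in a few lines. The sketch below copies the per-component figures from the table and treats 1MB as 1000KB, as the table's own total does. `COMPONENT_SIZES_KB`, `total_kb`, and `within_budget` are illustrative names.

```rust
/// Estimated gzipped size per component, in KB, copied from the table.
pub const COMPONENT_SIZES_KB: [(&str, u32); 7] = [
    ("ruvector-wasm", 500),
    ("ruvector-graph-wasm", 600),
    ("ruvector-gnn-wasm", 300),
    ("sona", 300),
    ("SQL engine (sqlparser-rs)", 200),
    ("SPARQL executor", 300),
    ("RvLite orchestration", 100),
];

/// Total estimated bundle size in KB.
pub fn total_kb(components: &[(&str, u32)]) -> u32 {
    components.iter().map(|(_, kb)| kb).sum()
}

/// True if the total stays strictly under the budget.
pub fn within_budget(total_kb: u32, budget_kb: u32) -> bool {
    total_kb < budget_kb
}
```

The components sum to 2300KB, which leaves about 700KB of headroom under a 3000KB budget. These are estimates, so a check like this is most useful once it is fed measured sizes from a real build.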
---
## ✅ Success Metrics (Revised)
### Week 1 Checkpoint
- [ ] All existing WASM crates integrated
- [ ] Storage adapter working
- [ ] SPARQL extracted and functional
- [ ] Basic unified API working
### Week 2 Completion
- [ ] SQL engine complete
- [ ] All query types work (SQL, SPARQL, Cypher)
- [ ] Bundle size < 3MB
- [ ] Test coverage > 80%
- [ ] Documentation complete
---
## 🎯 Recommended Next Steps
1. **Immediate** (Today):
   - Create `rvlite` crate
   - Add dependencies on existing WASM crates
   - Verify all crates compile together
2. **Day 1-2**:
   - Build storage adapter
   - Test vector operations via ruvector-wasm
   - Test Cypher queries via ruvector-graph-wasm
3. **Day 3-5**:
   - Extract SPARQL from ruvector-postgres
   - Integrate SQL parser
   - Build unified API
4. **Day 6-10**:
   - Complete SQL executor
   - Testing and optimization
   - Documentation and examples
---
**Conclusion**: RvLite can be built in **2-3 weeks**, with **~70%** of its functionality reused from existing crates!
**Next**: Create the `rvlite` crate and start integration?