Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
493
crates/rvlite/docs/00_EXISTING_WASM_ANALYSIS.md
Normal file
493
crates/rvlite/docs/00_EXISTING_WASM_ANALYSIS.md
Normal file
@@ -0,0 +1,493 @@
|
||||
# Existing WASM Implementations Analysis
|
||||
|
||||
## Summary
|
||||
|
||||
RuVector already has **extensive WASM implementations** we can learn from and potentially reuse for RvLite!
|
||||
|
||||
---
|
||||
|
||||
## 1. Existing WASM Crates
|
||||
|
||||
### 1.1 ruvector-wasm (Main WASM Package)
|
||||
|
||||
**Location**: `/workspaces/ruvector/crates/ruvector-wasm/`
|
||||
|
||||
**Features**:
|
||||
- ✅ Full VectorDB API (insert, search, delete, batch)
|
||||
- ✅ SIMD acceleration (opt-in feature)
|
||||
- ✅ IndexedDB persistence
|
||||
- ✅ Web Workers support
|
||||
- ✅ Zero-copy transfers
|
||||
- ✅ Security limits (MAX_VECTOR_DIMENSIONS: 65536)
|
||||
|
||||
**Key Dependencies**:
|
||||
```toml
|
||||
ruvector-core = { path = "../ruvector-core", features = ["memory-only"] }
|
||||
wasm-bindgen = "0.2"
|
||||
wasm-bindgen-futures = "0.4"
|
||||
js-sys = "0.3"
|
||||
web-sys = { features = ["IdbDatabase", "IdbObjectStore", ...] }
|
||||
serde-wasm-bindgen = "0.6"
|
||||
console_error_panic_hook = "0.1"
|
||||
tracing-wasm = "0.2"
|
||||
```
|
||||
|
||||
**Release Profile** (Size Optimization):
|
||||
```toml
|
||||
[profile.release]
|
||||
opt-level = "z" # Optimize for size
|
||||
lto = true # Link-time optimization
|
||||
codegen-units = 1 # Single codegen unit
|
||||
panic = "abort" # No unwinding
|
||||
```
|
||||
|
||||
**Architecture Lessons**:
|
||||
```rust
|
||||
// Security: Validate dimensions
|
||||
const MAX_VECTOR_DIMENSIONS: usize = 65536;
|
||||
|
||||
// Error handling across WASM boundary
|
||||
#[derive(Serialize, Deserialize)]
|
||||
pub struct WasmError {
|
||||
pub message: String,
|
||||
pub kind: String,
|
||||
}
|
||||
|
||||
// WASM-friendly API
|
||||
#[wasm_bindgen]
|
||||
impl JsVectorEntry {
|
||||
#[wasm_bindgen(constructor)]
|
||||
pub fn new(
|
||||
vector: Float32Array,
|
||||
id: Option<String>,
|
||||
metadata: Option<JsValue>,
|
||||
) -> Result<JsVectorEntry, JsValue> {
|
||||
// ...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 1.2 sona (Self-Optimizing Neural Architecture)
|
||||
|
||||
**Location**: `/workspaces/ruvector/crates/sona/`
|
||||
|
||||
**Features**:
|
||||
- ✅ Runtime-adaptive learning
|
||||
- ✅ Two-tier LoRA
|
||||
- ✅ EWC++ (Elastic Weight Consolidation)
|
||||
- ✅ ReasoningBank integration
|
||||
- ✅ Dual target support: WASM + NAPI (Node.js native)
|
||||
|
||||
**Feature Flags**:
|
||||
```toml
|
||||
[features]
|
||||
default = ["serde-support"]
|
||||
wasm = ["wasm-bindgen", "wasm-bindgen-futures", "console_error_panic_hook", ...]
|
||||
napi = ["dep:napi", "dep:napi-derive", "serde-support"]
|
||||
serde-support = ["serde", "serde_json"]
|
||||
```
|
||||
|
||||
**Key Insight**: Supports **both WASM and native Node.js** via feature flags!
|
||||
|
||||
### 1.3 micro-hnsw-wasm (Ultra-Lightweight HNSW)
|
||||
|
||||
**Location**: `/workspaces/ruvector/crates/micro-hnsw-wasm/`
|
||||
|
||||
**Features**:
|
||||
- ✅ **Only 11.8KB WASM** (incredibly small!)
|
||||
- ✅ Neuromorphic HNSW with spiking neural networks
|
||||
- ✅ LIF neurons
|
||||
- ✅ STDP learning
|
||||
- ✅ Winner-take-all
|
||||
- ✅ Dendritic computation
|
||||
- ✅ No dependencies (`[dependencies]` section is empty!)
|
||||
|
||||
**Size Optimization** (Maximum):
|
||||
```toml
|
||||
[profile.release]
|
||||
opt-level = "z"
|
||||
lto = true
|
||||
codegen-units = 1
|
||||
panic = "abort"
|
||||
strip = true # Strip debug symbols
|
||||
```
|
||||
|
||||
**Key Insight**: Proof that aggressive optimization can achieve sub-12KB WASM!
|
||||
|
||||
### 1.4 Other WASM Crates
|
||||
|
||||
| Crate | Purpose | Status |
|
||||
|-------|---------|--------|
|
||||
| `ruvector-attention-wasm` | Attention mechanisms | Built (`pkg/ruvector_attention_wasm_bg.wasm`) |
|
||||
| `ruvector-gnn-wasm` | Graph Neural Networks | Exists |
|
||||
| `ruvector-graph-wasm` | Graph operations | Exists |
|
||||
| `ruvector-tiny-dancer-wasm` | Tiny Dancer routing | Exists |
|
||||
| `ruvector-router-wasm` | Router | Exists |
|
||||
|
||||
---
|
||||
|
||||
## 2. Existing Examples
|
||||
|
||||
### 2.1 WASM Examples Directory
|
||||
|
||||
**Location**: `/workspaces/ruvector/examples/wasm/`
|
||||
|
||||
**Structure**:
|
||||
```
|
||||
examples/wasm/
|
||||
└── ios/
|
||||
├── dist/
|
||||
│ └── recommendation.wasm
|
||||
└── swift/
|
||||
└── Resources/
|
||||
└── recommendation.wasm
|
||||
```
|
||||
|
||||
**iOS Integration**: Shows how to use WASM in Swift/iOS apps!
|
||||
|
||||
### 2.2 Other Examples
|
||||
|
||||
- `examples/scipix/wasm_demo.html` - SciFi visualization demo
|
||||
- `npm/tests/unit/wasm.test.js` - WASM unit tests
|
||||
- `docs/guides/wasm-api.md` - WASM API documentation
|
||||
- `docs/guides/wasm-build-guide.md` - Build instructions
|
||||
|
||||
---
|
||||
|
||||
## 3. Key Learnings for RvLite
|
||||
|
||||
### 3.1 Architecture Patterns to Adopt
|
||||
|
||||
#### ✅ Use ruvector-core as Foundation
|
||||
```toml
|
||||
# RvLite can depend on existing ruvector-core
|
||||
[dependencies]
|
||||
ruvector-core = { path = "../ruvector-core", features = ["memory-only"] }
|
||||
```
|
||||
|
||||
**Benefit**: Reuse battle-tested vector operations, SIMD, quantization.
|
||||
|
||||
#### ✅ Security-First API Design
|
||||
```rust
|
||||
// Validate inputs before allocation
|
||||
const MAX_VECTOR_DIMENSIONS: usize = 65536;
|
||||
const MAX_BATCH_SIZE: usize = 10000;
|
||||
|
||||
if vec_len > MAX_VECTOR_DIMENSIONS {
|
||||
return Err("Vector too large");
|
||||
}
|
||||
```
|
||||
|
||||
#### ✅ Error Handling Pattern
|
||||
```rust
|
||||
#[derive(Serialize, Deserialize)]
|
||||
pub struct WasmError {
|
||||
pub message: String,
|
||||
pub kind: String,
|
||||
}
|
||||
|
||||
impl From<WasmError> for JsValue {
|
||||
// Convert to JS-friendly error object
|
||||
}
|
||||
```
|
||||
|
||||
#### ✅ WASM-Friendly Types
|
||||
```rust
|
||||
// Use wasm-bindgen compatible types
|
||||
#[wasm_bindgen]
|
||||
pub struct Database {
|
||||
inner: Arc<Mutex<CoreDatabase>>,
|
||||
}
|
||||
|
||||
#[wasm_bindgen]
|
||||
impl Database {
|
||||
#[wasm_bindgen(constructor)]
|
||||
pub fn new(options: JsValue) -> Result<Database, JsValue> {
|
||||
let opts: DbOptions = from_value(options)?;
|
||||
// ...
|
||||
}
|
||||
|
||||
pub async fn search(
|
||||
&self,
|
||||
query: Float32Array,
|
||||
limit: usize,
|
||||
) -> Result<JsValue, JsValue> {
|
||||
// Return JsValue (serialized results)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Build Configuration Best Practices
|
||||
|
||||
#### ✅ Size Optimization
|
||||
```toml
|
||||
[profile.release]
|
||||
opt-level = "z" # Optimize for size (not speed)
|
||||
lto = true # Link-time optimization
|
||||
codegen-units = 1 # Single codegen unit (better optimization)
|
||||
panic = "abort" # No unwinding (saves space)
|
||||
strip = true # Strip debug symbols
|
||||
|
||||
[profile.release.package."*"]
|
||||
opt-level = "z" # Apply to all dependencies
|
||||
|
||||
[package.metadata.wasm-pack.profile.release]
|
||||
wasm-opt = false # Disable wasm-opt (manual optimization)
|
||||
```
|
||||
|
||||
#### ✅ WASM-Specific Dependencies
|
||||
```toml
|
||||
# Always use wasm_js feature for getrandom
|
||||
[target.'cfg(target_arch = "wasm32")'.dependencies]
|
||||
getrandom = { version = "0.2", features = ["wasm_js"] }
|
||||
```
|
||||
|
||||
### 3.3 Feature Flags Strategy
|
||||
|
||||
```toml
|
||||
[features]
|
||||
default = []
|
||||
simd = ["ruvector-core/simd"] # WASM SIMD
|
||||
sql = ["dep:sql-parser"] # SQL engine
|
||||
sparql = ["dep:sparql-parser"] # SPARQL engine
|
||||
cypher = ["dep:cypher-parser"] # Cypher engine
|
||||
gnn = ["dep:ruvector-gnn-wasm"] # GNN layers
|
||||
learning = ["dep:sona"] # ReasoningBank
|
||||
graph = ["dep:ruvector-graph-wasm"] # Graph operations
|
||||
hyperbolic = ["dep:hyperbolic-embeddings"] # Hyperbolic spaces
|
||||
|
||||
# Feature bundles
|
||||
full = ["sql", "sparql", "cypher", "gnn", "learning", "graph", "hyperbolic"]
|
||||
lite = ["sql"] # Minimal bundle
|
||||
```
|
||||
|
||||
**Benefit**: Users can opt-in to features, reducing bundle size.
|
||||
|
||||
### 3.4 Persistence Strategy
|
||||
|
||||
**From ruvector-wasm**:
|
||||
```rust
|
||||
// IndexedDB persistence
|
||||
async fn save_to_indexeddb(&self) -> Result<(), JsValue> {
|
||||
let window = web_sys::window().unwrap();
|
||||
let idb: IdbFactory = window.indexed_db()?.unwrap();
|
||||
|
||||
// Open database
|
||||
let open_request = idb.open_with_u32("rvlite", 1)?;
|
||||
|
||||
// ... store data
|
||||
}
|
||||
```
|
||||
|
||||
**RvLite Should Support**:
|
||||
1. **IndexedDB** (browser) - 50MB+ quota
|
||||
2. **OPFS** (Origin Private File System) - Larger quota
|
||||
3. **File System** (Node.js) - Unlimited
|
||||
|
||||
### 3.5 Dual-Target Strategy (from sona)
|
||||
|
||||
**Support both WASM and Node.js native**:
|
||||
```toml
|
||||
[features]
|
||||
wasm = ["wasm-bindgen", ...]
|
||||
napi = ["dep:napi", "dep:napi-derive"]
|
||||
```
|
||||
|
||||
**Benefit**:
|
||||
- Browser: Use WASM
|
||||
- Node.js: Use native addon (faster)
|
||||
|
||||
---
|
||||
|
||||
## 4. What We Can Reuse
|
||||
|
||||
### 4.1 Direct Dependencies
|
||||
|
||||
✅ **ruvector-core** - Vector types, distances, SIMD, quantization
|
||||
✅ **ruvector-gnn-wasm** - GNN layers (if exists)
|
||||
✅ **ruvector-graph-wasm** - Graph operations (if exists)
|
||||
✅ **sona** - ReasoningBank, adaptive learning
|
||||
✅ **micro-hnsw-wasm** - Ultra-lightweight HNSW
|
||||
|
||||
### 4.2 Patterns & Code
|
||||
|
||||
✅ **Error handling** - WasmError pattern from ruvector-wasm
|
||||
✅ **IndexedDB persistence** - From ruvector-wasm
|
||||
✅ **Build configuration** - Cargo.toml profiles
|
||||
✅ **Security validation** - Input limits, bounds checking
|
||||
✅ **TypeScript types** - From existing packages
|
||||
|
||||
### 4.3 Testing Infrastructure
|
||||
|
||||
✅ **wasm-bindgen-test** - Browser test runner
|
||||
✅ **Unit tests** - From npm/tests/unit/wasm.test.js
|
||||
✅ **Benchmarks** - From node_modules/agentdb wasm benchmarks
|
||||
|
||||
---
|
||||
|
||||
## 5. RvLite Differentiation
|
||||
|
||||
### What Makes RvLite Different?
|
||||
|
||||
| Feature | Existing WASM Crates | RvLite |
|
||||
|---------|---------------------|---------|
|
||||
| **Scope** | Vector operations only | **Complete database** (SQL/SPARQL/Cypher) |
|
||||
| **Query Languages** | Programmatic API | **3 query languages** |
|
||||
| **Graph Support** | Limited | **Full graph DB** (Cypher, SPARQL) |
|
||||
| **Self-Learning** | sona (separate) | **Built-in ReasoningBank** |
|
||||
| **Standalone** | Needs backend | **Fully standalone** |
|
||||
| **Storage Engine** | Basic persistence | **ACID transactions** |
|
||||
|
||||
**RvLite = All existing WASM crates + Standalone DB + Query engines**
|
||||
|
||||
---
|
||||
|
||||
## 6. Recommended Architecture for RvLite
|
||||
|
||||
### 6.1 Layered Approach
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ RvLite (crates/rvlite/) │
|
||||
│ - SQL/SPARQL/Cypher engines │
|
||||
│ - Storage engine │
|
||||
│ - Transaction manager │
|
||||
│ - WASM bindings │
|
||||
└───────────────┬─────────────────────────┘
|
||||
│ depends on
|
||||
┌───────────────▼─────────────────────────┐
|
||||
│ Existing WASM Crates │
|
||||
│ - ruvector-core (vectors) │
|
||||
│ - sona (learning) │
|
||||
│ - micro-hnsw-wasm (indexing) │
|
||||
│ - ruvector-gnn-wasm (GNN) │
|
||||
│ - ruvector-graph-wasm (graph) │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 6.2 File Structure
|
||||
|
||||
```
|
||||
crates/rvlite/
|
||||
├── Cargo.toml # Similar to ruvector-wasm
|
||||
├── src/
|
||||
│ ├── lib.rs # WASM bindings
|
||||
│ ├── storage/ # NEW: Storage engine
|
||||
│ ├── query/ # NEW: Query engines
|
||||
│ │ ├── sql.rs
|
||||
│ │ ├── sparql.rs
|
||||
│ │ └── cypher.rs
|
||||
│ ├── transaction.rs # NEW: ACID transactions
|
||||
│ └── error.rs # Copy from ruvector-wasm
|
||||
├── tests/
|
||||
│ └── wasm.rs # Similar to ruvector-wasm/tests/wasm.rs
|
||||
└── pkg/ # Built by wasm-pack
|
||||
```
|
||||
|
||||
### 6.3 Dependency Strategy
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
# Reuse existing crates
|
||||
ruvector-core = { path = "../ruvector-core", features = ["memory-only"] }
|
||||
ruvector-gnn-wasm = { path = "../ruvector-gnn-wasm", optional = true }
|
||||
ruvector-graph-wasm = { path = "../ruvector-graph-wasm", optional = true }
|
||||
sona = { path = "../sona", features = ["wasm"], optional = true }
|
||||
|
||||
# New dependencies
|
||||
sql-parser = { version = "0.9", optional = true }
|
||||
sparql-parser = { version = "0.3", optional = true } # Or custom
|
||||
cypher-parser = { version = "0.1", optional = true } # Or custom
|
||||
|
||||
# Standard WASM stack (from ruvector-wasm)
|
||||
wasm-bindgen = "0.2"
|
||||
wasm-bindgen-futures = "0.4"
|
||||
js-sys = "0.3"
|
||||
web-sys = { version = "0.3", features = ["IdbDatabase", ...] }
|
||||
serde-wasm-bindgen = "0.6"
|
||||
console_error_panic_hook = "0.1"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Action Items
|
||||
|
||||
### Immediate Next Steps
|
||||
|
||||
1. **✅ Review existing implementations** (DONE - this document)
|
||||
2. **Create RvLite crate** using ruvector-wasm as template
|
||||
3. **Add dependencies** on existing WASM crates
|
||||
4. **Extract query engines** from ruvector-postgres (remove pgrx)
|
||||
5. **Build storage engine** using patterns from ruvector-wasm
|
||||
6. **Implement WASM bindings** following ruvector-wasm patterns
|
||||
7. **Test with existing WASM test infrastructure**
|
||||
|
||||
### Quick Win: Minimal Viable Product
|
||||
|
||||
**Week 1**: Create ruvector-wasm-lite
|
||||
```rust
|
||||
// Just vector operations + SQL
|
||||
#[dependencies]
|
||||
ruvector-core = { ... }
|
||||
sql-parser = { ... }
|
||||
```
|
||||
|
||||
**Week 2**: Add SPARQL
|
||||
```rust
|
||||
// Reuse ruvector-postgres/src/graph/sparql (remove pgrx)
|
||||
```
|
||||
|
||||
**Week 3**: Add Cypher + GNN
|
||||
```rust
|
||||
ruvector-gnn-wasm = { ... }
|
||||
```
|
||||
|
||||
**Week 4**: Polish and optimize
|
||||
- Size optimization
|
||||
- Performance tuning
|
||||
- Documentation
|
||||
|
||||
---
|
||||
|
||||
## 8. Size Budget Analysis
|
||||
|
||||
### Existing Sizes
|
||||
|
||||
- **micro-hnsw-wasm**: 11.8KB (minimal HNSW)
|
||||
- **ruvector_wasm_bg.wasm**: ~500KB (full vector ops)
|
||||
- **sona_bg.wasm**: ~300KB (learning system)
|
||||
|
||||
### RvLite Target
|
||||
|
||||
| Component | Estimated Size | Cumulative |
|
||||
|-----------|---------------|------------|
|
||||
| ruvector-core | ~500KB | 500KB |
|
||||
| SQL parser | ~200KB | 700KB |
|
||||
| SPARQL parser | ~300KB | 1MB |
|
||||
| Cypher parser | ~200KB | 1.2MB |
|
||||
| sona (learning) | ~300KB | 1.5MB |
|
||||
| micro-hnsw | ~12KB | 1.512MB |
|
||||
| Storage engine | ~200KB | 1.7MB |
|
||||
| **Total (gzipped)** | | **~2-3MB** |
|
||||
|
||||
**Verdict**: Much smaller than original 5-6MB estimate! 🎉
|
||||
|
||||
---
|
||||
|
||||
## 9. Conclusion
|
||||
|
||||
**We have a HUGE head start!**
|
||||
|
||||
- ✅ Battle-tested WASM infrastructure
|
||||
- ✅ Security patterns established
|
||||
- ✅ Build optimization figured out
|
||||
- ✅ Multiple working examples
|
||||
- ✅ Reusable components (ruvector-core, sona, micro-hnsw)
|
||||
|
||||
**RvLite can be built MUCH FASTER** (4-5 weeks instead of 8) by:
|
||||
1. Reusing ruvector-wasm patterns
|
||||
2. Depending on existing WASM crates
|
||||
3. Extracting query engines from ruvector-postgres
|
||||
4. Following established build configs
|
||||
|
||||
**Next**: Continue SPARC documentation with this context!
|
||||
Reference in New Issue
Block a user