Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
vendor/ruvector/crates/rvlite/docs/01_SPECIFICATION.md
# Phase 1: Specification

## S - Specification Phase

**Duration**: Weeks 1-2

**Goal**: Define complete requirements, constraints, and success criteria

---

## 1. Product Vision

### 1.1 Mission Statement

**RvLite** is a standalone, WASM-first vector database that brings the full power of ruvector-postgres to any environment (browser, Node.js, edge workers, mobile apps) without requiring a PostgreSQL installation.

### 1.2 Target Users

1. **Frontend Developers** - Building AI-powered web apps with in-browser vector search
2. **Edge Developers** - Deploying to serverless/edge environments (Cloudflare Workers, Deno Deploy)
3. **Mobile Developers** - Building React Native or Capacitor apps with local vector storage
4. **Data Scientists** - Prototyping rapidly without infrastructure setup
5. **Embedded Developers** - Targeting IoT and embedded devices with limited resources

### 1.3 Use Cases

#### UC-1: In-Browser Semantic Search

```typescript
// User browses documentation site
// All searches happen locally, no backend needed
const db = await RvLite.create();
await db.loadDocuments(docs);
const results = await db.searchSimilar(queryEmbedding);
```

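#### UC-2: Edge AI Search

```typescript
// Cloudflare Worker handles product search
// Vector DB runs at the edge, globally distributed
export default {
  async fetch(request: Request) {
    // Read the search text from the URL, e.g. /search?q=shoes
    // (the `q` parameter name is an assumption for illustration)
    const query = new URL(request.url).searchParams.get('q') ?? '';
    const db = await RvLite.create();
    // searchProducts is an application-defined helper, not part of RvLite
    return searchProducts(db, query);
  }
}
```
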
#### UC-3: Knowledge Graph Exploration

```typescript
// Interactive graph visualization in browser
// SPARQL + Cypher queries run client-side
const db = await RvLite.create();
await db.cypher('MATCH (a)-[r]->(b) RETURN a, r, b');
await db.sparql('SELECT ?s ?p ?o WHERE { ?s ?p ?o }');
```

#### UC-4: Self-Learning Agent

```typescript
// AI agent learns from user interactions
// ReasoningBank stores patterns locally
const db = await RvLite.create();
await db.learning.recordTrajectory(state, action, reward);
const nextAction = await db.learning.predictBest(state);
```

---

## 2. Functional Requirements

### 2.1 Core Database Features

#### FR-1: Vector Operations
- **FR-1.1** Support vector types: `vector(n)`, `halfvec(n)`, `binaryvec(n)`, `sparsevec(n)`
- **FR-1.2** Distance metrics: L2, cosine, inner product, L1, Hamming
- **FR-1.3** Vector operations: add, subtract, scale, normalize
- **FR-1.4** SIMD-optimized computations using WASM SIMD

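
The distance metrics listed in FR-1.2 can be sketched in plain scalar Rust as a reference for what a SIMD-optimized version has to compute. This is a minimal sketch, not RvLite's implementation: the function names are hypothetical, the zero-vector convention in `cosine_distance` is an assumption, and Hamming distance is assumed to operate on bit-packed `u64` words.

```rust
/// Euclidean (L2) distance: square root of the summed squared differences.
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Manhattan (L1) distance: sum of absolute differences.
pub fn l1_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Inner (dot) product of two vectors.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine distance = 1 - cosine similarity.
/// Assumption: a zero-length vector is treated as maximally dissimilar (1.0).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot = inner_product(a, b);
    let norm_a = inner_product(a, a).sqrt();
    let norm_b = inner_product(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Hamming distance over bit-packed binary vectors: count of differing bits.
pub fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

A scalar version like this is also useful as a correctness oracle when testing SIMD code paths.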
#### FR-2: Indexing
- **FR-2.1** HNSW index for approximate nearest neighbor search
- **FR-2.2** Configurable parameters: M (connections), ef_construction, ef_search
- **FR-2.3** Dynamic index updates (insert/delete)
- **FR-2.4** B-Tree index for scalar columns
- **FR-2.5** Triple store indexes (SPO, POS, OSP) for RDF data

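
To illustrate why FR-2.5 asks for three orderings, the sketch below keeps each triple in three sorted sets so that any pattern with a bound prefix becomes a range scan. It is a hypothetical illustration using `std::collections::BTreeSet`; the `TripleStore` type, its method names, and the use of `u32` dictionary ids for terms are all assumptions, not RvLite's actual design.

```rust
use std::collections::BTreeSet;

/// Terms are assumed to be dictionary-encoded as integer ids.
pub type Id = u32;

#[derive(Default)]
pub struct TripleStore {
    spo: BTreeSet<(Id, Id, Id)>, // ordered by subject, predicate, object
    pos: BTreeSet<(Id, Id, Id)>, // ordered by predicate, object, subject
    osp: BTreeSet<(Id, Id, Id)>, // ordered by object, subject, predicate
}

impl TripleStore {
    /// Store one triple in all three orderings.
    pub fn insert(&mut self, s: Id, p: Id, o: Id) {
        self.spo.insert((s, p, o));
        self.pos.insert((p, o, s));
        self.osp.insert((o, s, p));
    }

    /// Pattern (s, ?p, ?o): prefix scan on the SPO index.
    pub fn by_subject(&self, s: Id) -> Vec<(Id, Id, Id)> {
        self.spo
            .range((s, 0, 0)..=(s, Id::MAX, Id::MAX))
            .copied()
            .collect()
    }

    /// Pattern (?s, p, o): prefix scan on the POS index, returning subjects.
    pub fn by_predicate_object(&self, p: Id, o: Id) -> Vec<Id> {
        self.pos
            .range((p, o, 0)..=(p, o, Id::MAX))
            .map(|&(_, _, s)| s)
            .collect()
    }

    /// Pattern (?s, ?p, o): prefix scan on the OSP index, returned as (s, p, o).
    pub fn by_object(&self, o: Id) -> Vec<(Id, Id, Id)> {
        self.osp
            .range((o, 0, 0)..=(o, Id::MAX, Id::MAX))
            .map(|&(o, s, p)| (s, p, o))
            .collect()
    }
}
```

The trade-off is threefold storage per triple in exchange for avoiding full scans on the common single-bound and double-bound patterns.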
#### FR-3: Query Languages

**FR-3.1 SQL Support**
```sql
-- Table creation
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding VECTOR(384)
);

-- Index creation
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Vector search
SELECT *, embedding <=> $1 AS distance
FROM documents
ORDER BY distance
LIMIT 10;

-- Hybrid search
SELECT *
FROM documents
WHERE content ILIKE '%query%'
ORDER BY embedding <=> $1
LIMIT 10;
```

**FR-3.2 SPARQL 1.1 Support**
```sparql
# Prefixes assumed by every query below
# (the default `:` namespace is an illustrative assumption)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example.org/>

# SELECT queries
SELECT ?subject ?label
WHERE {
  ?subject rdfs:label ?label .
  FILTER(lang(?label) = "en")
}

# CONSTRUCT queries
CONSTRUCT { ?s foaf:knows ?o }
WHERE { ?s :similar_to ?o }

# INSERT/DELETE updates
INSERT DATA {
  <http://example.org/person1> foaf:name "Alice" .
}

# Property paths
SELECT ?person ?friend
WHERE {
  ?person foaf:knows+ ?friend .
}
```

**FR-3.3 Cypher Support**
```cypher
// Pattern matching
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 30
RETURN a.name, b.name

// Graph creation
CREATE (a:Person {name: 'Alice', embedding: $emb})
CREATE (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS]->(b)

// Vector-enhanced queries
MATCH (p:Person)
WHERE vector.cosine(p.embedding, $query) > 0.8
RETURN p.name, p.embedding
ORDER BY vector.cosine(p.embedding, $query) DESC
```

#### FR-4: Graph Operations
- **FR-4.1** Graph traversal (BFS, DFS)
- **FR-4.2** Shortest path algorithms (Dijkstra, A*)
- **FR-4.3** Community detection
- **FR-4.4** PageRank and centrality metrics
- **FR-4.5** Vector-enhanced graph search

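
As a reference point for FR-4.2, here is a minimal sketch of Dijkstra's algorithm over an adjacency list, using only the standard library. The graph representation (`graph[u]` holds `(neighbor, weight)` pairs) and the function name are assumptions for illustration; RvLite's graph storage may look quite different.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Single-source shortest paths with non-negative edge weights.
/// Returns `None` for nodes that are unreachable from `source`.
/// Panics if `source` is not a valid node index.
pub fn dijkstra(graph: &[Vec<(usize, u64)>], source: usize) -> Vec<Option<u64>> {
    let mut dist: Vec<Option<u64>> = vec![None; graph.len()];
    // `Reverse` turns the max-heap into a min-heap keyed on distance.
    let mut heap = BinaryHeap::new();
    dist[source] = Some(0);
    heap.push(Reverse((0u64, source)));

    while let Some(Reverse((d, u))) = heap.pop() {
        // Skip stale heap entries that were superseded by a shorter path.
        if dist[u].map_or(false, |best| d > best) {
            continue;
        }
        for &(v, w) in &graph[u] {
            let candidate = d + w;
            if dist[v].map_or(true, |best| candidate < best) {
                dist[v] = Some(candidate);
                heap.push(Reverse((candidate, v)));
            }
        }
    }
    dist
}
```

A* follows the same structure with the heap keyed on distance plus a heuristic estimate to the target.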
#### FR-5: Graph Neural Networks (GNN)
- **FR-5.1** GCN (Graph Convolutional Networks)
- **FR-5.2** GraphSAGE
- **FR-5.3** GAT (Graph Attention Networks)
- **FR-5.4** GIN (Graph Isomorphism Networks)
- **FR-5.5** Node/edge embeddings
- **FR-5.6** Graph classification

#### FR-6: Self-Learning (ReasoningBank)
- **FR-6.1** Trajectory recording (state, action, reward)
- **FR-6.2** Pattern recognition
- **FR-6.3** Memory distillation
- **FR-6.4** Strategy optimization
- **FR-6.5** Verdict judgment
- **FR-6.6** Adaptive learning rates

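
The trajectory recording in FR-6.1 can be pictured with a deliberately simple baseline: store (state, action, reward) observations and pick the action with the highest mean reward for a state. This sketch is not the ReasoningBank algorithm; the `TrajectoryStore` type, its methods, and the use of string keys are hypothetical and only show the shape of the data involved.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct TrajectoryStore {
    /// (state, action) -> (sum of rewards, number of observations)
    stats: HashMap<(String, String), (f64, u32)>,
}

impl TrajectoryStore {
    /// Record one observed (state, action, reward) step.
    pub fn record(&mut self, state: &str, action: &str, reward: f64) {
        let entry = self
            .stats
            .entry((state.to_string(), action.to_string()))
            .or_insert((0.0, 0));
        entry.0 += reward;
        entry.1 += 1;
    }

    /// Return the action with the highest mean reward seen for `state`,
    /// or `None` if the state has never been recorded.
    /// Ties are broken arbitrarily (hash map iteration order).
    pub fn predict_best(&self, state: &str) -> Option<&str> {
        self.stats
            .iter()
            .filter(|((s, _), _)| s.as_str() == state)
            .map(|((_, a), (sum, n))| (a.as_str(), sum / f64::from(*n)))
            .max_by(|x, y| x.1.total_cmp(&y.1))
            .map(|(action, _)| action)
    }
}
```

Pattern recognition, distillation, and adaptive learning rates (FR-6.2 to FR-6.6) would build on top of records like these.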
#### FR-7: Hyperbolic Embeddings
- **FR-7.1** Poincaré disk model
- **FR-7.2** Lorentz/hyperboloid model
- **FR-7.3** Hyperbolic distance metrics
- **FR-7.4** Exponential/logarithmic maps
- **FR-7.5** Hyperbolic neural networks

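
For FR-7.1 and FR-7.3, the standard distance in the Poincaré ball model is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements that formula directly, assuming unit curvature; the function name and the choice to return `None` for invalid inputs are illustrative assumptions rather than RvLite's API.

```rust
/// Hyperbolic distance between two points in the Poincaré ball (curvature -1).
/// Returns `None` if the dimensions differ or a point lies outside the open unit ball.
pub fn poincare_distance(u: &[f64], v: &[f64]) -> Option<f64> {
    if u.len() != v.len() {
        return None;
    }
    let norm_u_sq: f64 = u.iter().map(|x| x * x).sum();
    let norm_v_sq: f64 = v.iter().map(|x| x * x).sum();
    if norm_u_sq >= 1.0 || norm_v_sq >= 1.0 {
        return None;
    }
    let diff_sq: f64 = u.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
    let arg = 1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq));
    Some(arg.acosh())
}
```

Distances grow without bound as points approach the boundary of the ball, which is what lets hierarchies embed with low distortion; it also means inputs very close to norm 1 need care with floating-point precision.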
#### FR-8: Storage & Persistence

**FR-8.1 In-Memory Storage**
- Primary storage: DashMap (concurrent hash maps)
- Fast access: O(1) average-case lookup for primary keys
- Thread-safe concurrent access

**FR-8.2 Persistence Backends**
```typescript
// Browser: IndexedDB
await db.save();                        // Saves to IndexedDB
const restored = await RvLite.load();   // Loads from IndexedDB

// Browser: OPFS (Origin Private File System)
await db.saveToOPFS();
await db.loadFromOPFS();

// Node.js/Deno/Bun: File system
await db.saveToFile('database.rvlite');
await RvLite.loadFromFile('database.rvlite');
```

**FR-8.3 Serialization Formats**
- Binary: rkyv (zero-copy deserialization)
- JSON: For debugging and exports
- Apache Arrow: For data exchange

#### FR-9: Transactions (ACID)
- **FR-9.1** Atomic operations (all-or-nothing)
- **FR-9.2** Consistency (integrity constraints)
- **FR-9.3** Isolation (snapshot isolation)
- **FR-9.4** Durability (write-ahead logging)

#### FR-10: Quantization
- **FR-10.1** Binary quantization (1-bit)
- **FR-10.2** Scalar quantization (8-bit)
- **FR-10.3** Product quantization (configurable)
- **FR-10.4** Automatic quantization selection

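
The first two quantization modes in FR-10 can be sketched as follows. Binary quantization keeps one sign bit per component (a 384-dimension `f32` vector shrinks from 1536 bytes to 48), and 8-bit scalar quantization maps each component linearly onto 0..=255 (1536 bytes to 384, plus two floats of metadata). The names, the sign-based threshold, and the per-vector min/max scaling are assumptions for illustration; the actual scheme may use trained or per-dimension parameters.

```rust
/// 1-bit quantization: bit `i` is set when component `i` is positive.
/// Bits are packed into `u64` words, least significant bit first.
pub fn binary_quantize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// 8-bit codes plus the affine parameters needed to reconstruct values.
pub struct ScalarQuantized {
    pub min: f32,
    pub scale: f32,
    pub codes: Vec<u8>,
}

/// 8-bit scalar quantization using the vector's own min/max range.
pub fn scalar_quantize(v: &[f32]) -> ScalarQuantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a zero range (constant or empty vector).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round() as u8)
        .collect();
    ScalarQuantized { min, scale, codes }
}

/// Approximate reconstruction; error per component is at most about scale / 2.
pub fn scalar_dequantize(q: &ScalarQuantized) -> Vec<f32> {
    q.codes
        .iter()
        .map(|&c| q.min + f32::from(c) * q.scale)
        .collect()
}
```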
---

## 3. Non-Functional Requirements

### 3.1 Performance

| Metric | Target | Measurement |
|--------|--------|-------------|
| WASM bundle size | < 6MB gzipped | `gzip -9 -c rvlite_bg.wasm \| wc -c` |
| Initial load time | < 1s | Performance API |
| Query latency (1k vectors) | < 20ms | Benchmark suite |
| Insert throughput | > 10k/s | Benchmark suite |
| Memory usage (100k vectors) | < 200MB | Chrome DevTools |
| HNSW search recall@10 | > 95% | ANN benchmarks |

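
The recall@10 target in the table compares the ids returned by the approximate index with the true nearest neighbors from an exhaustive search. A minimal sketch of that metric is shown below; the function name and the `u32` id type are assumptions, and ids within each result list are assumed to be unique.

```rust
use std::collections::HashSet;

/// Fraction of the true top-k neighbors that appear in the approximate top-k.
/// `approx` and `exact` are result ids ordered from nearest to farthest.
pub fn recall_at_k(approx: &[u32], exact: &[u32], k: usize) -> f64 {
    let truth: HashSet<u32> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        // Nothing to find, so nothing was missed.
        return 1.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

In a benchmark this would be averaged over many queries, so "> 95%" means that on average more than 9.5 of the 10 true neighbors are returned.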
### 3.2 Scalability

| Dimension | Limit | Rationale |
|-----------|-------|-----------|
| Max table size | 10M rows | Memory constraints |
| Max vector dimensions | 4096 | WASM memory limits |
| Max tables | 1000 | Reasonable use case |
| Max indexes per table | 10 | Performance trade-off |
| Max concurrent queries | 100 | WASM thread pool |

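
A quick back-of-the-envelope check helps relate these limits to the memory targets. The sketch below estimates raw vector storage and a rough HNSW link overhead, then compares the total against the 4GB address space of 32-bit WASM. The helper names are hypothetical, and the link estimate (about 2·M neighbor ids of 4 bytes each per row, ignoring upper layers) is a simplifying assumption.

```rust
/// Upper bound on addressable memory for 32-bit WASM (4 GiB).
pub const WASM32_MAX_BYTES: u64 = 4 * 1024 * 1024 * 1024;

/// Bytes needed for the raw vector data alone.
pub fn raw_vector_bytes(rows: u64, dims: u64, bytes_per_component: u64) -> u64 {
    rows * dims * bytes_per_component
}

/// Rough HNSW link storage: assumes ~2*M neighbor ids (4 bytes each) per row.
pub fn hnsw_link_bytes(rows: u64, m: u64) -> u64 {
    rows * 2 * m * 4
}

/// Whether a byte total fits in the 32-bit WASM address space.
pub fn fits_wasm32(total_bytes: u64) -> bool {
    total_bytes <= WASM32_MAX_BYTES
}
```

Under these assumptions, 100k vectors of 384 `f32` dimensions need about 153.6 MB of raw data plus roughly 12.8 MB of links at M = 16, which is within the 200MB target. Ten million such rows would need about 15.4 GB, so the 10M-row limit is only reachable with aggressive quantization or far fewer dimensions.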
### 3.3 Compatibility

**Browser Support**
- Chrome/Edge 91+ (WASM SIMD)
- Firefox 89+ (WASM SIMD)
- Safari 16.4+ (WASM SIMD)

**Runtime Support**
- Node.js 18+
- Deno 1.30+
- Bun 1.0+
- Cloudflare Workers
- Vercel Edge Functions
- Netlify Edge Functions

**Platform Support**
- x86-64 (Intel/AMD)
- ARM64 (Apple Silicon, AWS Graviton)
- WebAssembly (universal)

### 3.4 Security

- **SEC-1** No arbitrary code execution
- **SEC-2** Memory-safe (Rust guarantees)
- **SEC-3** No SQL injection (prepared statements)
- **SEC-4** Sandboxed WASM execution
- **SEC-5** CORS-compliant (browser)
- **SEC-6** No sensitive data in errors

### 3.5 Usability

- **US-1** Zero-config installation: `npm install @rvlite/wasm`
- **US-2** TypeScript-first API with full type definitions
- **US-3** Comprehensive documentation with examples
- **US-4** Error messages with helpful suggestions
- **US-5** Debug logging (optional, configurable)

### 3.6 Maintainability

- **MAIN-1** Test coverage > 90%
- **MAIN-2** CI/CD pipeline (GitHub Actions)
- **MAIN-3** Semantic versioning (semver)
- **MAIN-4** Automated releases
- **MAIN-5** Deprecation warnings (6-month notice)

---

## 4. Constraints

### 4.1 Technical Constraints

**WASM Limitations**
- Single-threaded by default (multi-threading experimental)
- Limited to 4GB memory (32-bit address space)
- No direct file system access (browser)
- No native threads (use Web Workers)

**Rust/WASM Constraints**
- No usable `std::fs` in `wasm32-unknown-unknown` (file operations fail at runtime)
- No native threading (use `wasm-bindgen-futures` for async concurrency instead)
- Must use `no_std` or WASM-compatible crates
- Size overhead from Rust std library

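
One common way to live with the file system constraint is to select the persistence backend at compile time with `#[cfg(target_arch = "wasm32")]`. The sketch below is a hedged illustration: the `backend` module, its `NAME`, `save`, and `load` items are hypothetical, and the WASM variant only returns an error where a real build would call into IndexedDB or OPFS through JavaScript bindings.

```rust
/// Native targets: persist through the regular file system.
#[cfg(not(target_arch = "wasm32"))]
pub mod backend {
    use std::{fs, io, path::Path};

    pub const NAME: &str = "filesystem";

    pub fn save(path: &Path, bytes: &[u8]) -> io::Result<()> {
        fs::write(path, bytes)
    }

    pub fn load(path: &Path) -> io::Result<Vec<u8>> {
        fs::read(path)
    }
}

/// WASM targets: no usable file system, so this placeholder reports that
/// persistence must be routed to browser storage (IndexedDB/OPFS) instead.
#[cfg(target_arch = "wasm32")]
pub mod backend {
    use std::{io, path::Path};

    pub const NAME: &str = "browser-storage";

    pub fn save(_path: &Path, _bytes: &[u8]) -> io::Result<()> {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "use IndexedDB/OPFS bindings on wasm32",
        ))
    }

    pub fn load(_path: &Path) -> io::Result<Vec<u8>> {
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "use IndexedDB/OPFS bindings on wasm32",
        ))
    }
}
```

Because both modules expose the same items, calling code can use `backend::save` and `backend::load` without its own `cfg` checks.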
### 4.2 Performance Constraints

- WASM is ~2-3x slower than native code
- SIMD limited to 128-bit (vs 512-bit AVX-512)
- Garbage collection overhead (JS interop)
- Copy overhead for large data transfers

### 4.3 Resource Constraints

**Development Team**
- 1 developer (8 weeks)
- Community contributions (optional)

**Timeline**
- 8 weeks total
- 2 weeks per major phase
- Beta release by Week 8

**Budget**
- Open source (no monetary budget)
- CI/CD: GitHub Actions (free tier)
- Hosting: npm registry (free)

---

## 5. Success Criteria

### 5.1 Functional Completeness

- [ ] All vector operations working
- [ ] SQL queries execute correctly
- [ ] SPARQL queries pass W3C test suite
- [ ] Cypher queries compatible with Neo4j syntax
- [ ] GNN layers produce correct outputs
- [ ] ReasoningBank learns from trajectories
- [ ] Hyperbolic operations validated

### 5.2 Performance Benchmarks

- [ ] Bundle size < 6MB gzipped
- [ ] Load time < 1s (browser)
- [ ] Query latency < 20ms (1k vectors)
- [ ] HNSW recall@10 > 95%
- [ ] Memory usage < 200MB (100k vectors)

### 5.3 Quality Metrics

- [ ] Test coverage > 90%
- [ ] Zero clippy warnings
- [ ] All examples working
- [ ] Documentation complete
- [ ] API stable (no breaking changes)

### 5.4 Adoption Metrics (Post-Release)

- [ ] 100+ npm downloads/week
- [ ] 10+ GitHub stars
- [ ] 3+ community contributions
- [ ] Featured in blog posts/articles

---

## 6. Out of Scope (v1.0)

### Not Included in Initial Release

- **Multi-user access** - Single-user database only
- **Distributed queries** - No sharding or replication
- **Advanced SQL** - No JOINs, subqueries, CTEs (future)
- **Full-text search** - Basic `LIKE`/`ILIKE` matching only (no Elasticsearch-level features)
- **Geospatial** - No PostGIS-like features
- **Time series** - No specialized time-series optimizations
- **Streaming queries** - No live query updates
- **Custom UDFs** - No user-defined functions in v1.0

### Future Considerations (v2.0+)

- Multi-threading support (WASM threads)
- Advanced SQL features (JOINs, CTEs)
- Streaming/reactive queries
- Plugin system for extensions
- Custom vector distance metrics
- GPU acceleration (WebGPU)

---

## 7. Dependencies & Licenses

### Rust Crates (MIT/Apache-2.0)

```toml
[dependencies]
wasm-bindgen = "0.2"
serde = { version = "1.0", features = ["derive"] }
serde-wasm-bindgen = "0.6"
js-sys = "0.3"
web-sys = { version = "0.3", features = ["Window", "IdbDatabase"] }
dashmap = "6.0"
parking_lot = "0.12"
simsimd = "5.9"
half = "2.4"
rkyv = "0.8"
once_cell = "1.19"
thiserror = "1.0"

[dev-dependencies]
wasm-bindgen-test = "0.3"
criterion = "0.5"
```

### License

**MIT License** (permissive, compatible with ruvector-postgres)

---

## 8. Risk Analysis

### High Risk

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| WASM size > 10MB | High | Medium | Aggressive tree-shaking, feature gating |
| Performance < 50% of native | High | Medium | WASM SIMD, optimized algorithms |
| Browser compatibility issues | High | Low | Polyfills, fallbacks |

### Medium Risk

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| IndexedDB quota limits | Medium | Medium | OPFS fallback, compression |
| Memory leaks in WASM | Medium | Low | Careful lifetime management |
| Breaking API changes | Medium | Medium | Semver, deprecation warnings |

### Low Risk

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Dependency vulnerabilities | Low | Low | Dependabot, security audits |
| Documentation outdated | Low | Medium | CI checks, automated validation |

---

## 9. Validation & Acceptance

### 9.1 Validation Methods

**Unit Tests**
```rust
#[cfg(test)]
mod tests {
    // Assumes `cosine_distance` is defined in the parent module
    use super::*;

    #[test]
    fn test_vector_cosine_distance() {
        // Orthogonal vectors have cosine similarity 0, so distance 1
        let a = vec![1.0, 0.0, 0.0];
        let b = vec![0.0, 1.0, 0.0];
        let dist = cosine_distance(&a, &b);
        assert!((dist - 1.0).abs() < 0.001);
    }
}
```

**Integration Tests**
```typescript
import { RvLite } from '@rvlite/wasm';

describe('Vector Search', () => {
  it('should find similar vectors', async () => {
    const db = await RvLite.create();
    await db.sql('CREATE TABLE docs (id INT, vec VECTOR(3))');
    await db.sql('INSERT INTO docs VALUES (1, $1)', [[1, 0, 0]]);
    const results = await db.sql('SELECT * FROM docs ORDER BY vec <=> $1', [[1, 0, 0]]);
    expect(results[0].id).toBe(1);
  });
});
```

**Benchmark Tests**
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// `build_hnsw_index` and `random_vector` are project helpers assumed to be in scope
fn bench_hnsw_search(c: &mut Criterion) {
    let index = build_hnsw_index(1000);
    let query = random_vector(384);

    c.bench_function("hnsw_search_1k", |b| {
        b.iter(|| index.search(black_box(&query), 10))
    });
}

// Register the benchmark and generate the harness entry point
criterion_group!(benches, bench_hnsw_search);
criterion_main!(benches);
```

### 9.2 Acceptance Criteria

**Must Have**
- [ ] All functional requirements implemented
- [ ] Performance benchmarks met
- [ ] Test coverage > 90%
- [ ] Documentation complete
- [ ] Examples working in browser, Node.js, Deno

**Should Have**
- [ ] TypeScript types accurate
- [ ] Error messages helpful
- [ ] Debug logging available
- [ ] Migration guide from ruvector-postgres

**Could Have**
- [ ] Interactive playground
- [ ] Video tutorials
- [ ] Community forum

---

## 10. Glossary

| Term | Definition |
|------|------------|
| **WASM** | WebAssembly - binary instruction format for a stack-based virtual machine |
| **HNSW** | Hierarchical Navigable Small World - graph-based ANN algorithm |
| **ANN** | Approximate Nearest Neighbor - fast similarity search |
| **SIMD** | Single Instruction Multiple Data - parallel computation |
| **GNN** | Graph Neural Network - neural networks for graph data |
| **SPARQL** | SPARQL Protocol and RDF Query Language - W3C query language for RDF data |
| **Cypher** | Neo4j's graph query language |
| **ReasoningBank** | Self-learning framework for AI agents |
| **RDF** | Resource Description Framework - semantic web standard |
| **Triple Store** | Database for storing RDF triples (subject-predicate-object) |
| **OPFS** | Origin Private File System - browser file storage API |
| **IndexedDB** | Browser-based NoSQL database |

---

**Next**: [02_API_SPECIFICATION.md](./02_API_SPECIFICATION.md) - Complete API design