RuVector Developer Quickstart
Distilled from 3,135 commits, 91 crates, and 55 ADRs across 99 days of development.
What is RuVector?
A Rust-native computation platform for vectors, graphs, and neural networks. Not just a vector database — a full stack from PostgreSQL extension to WASM microkernel.
91 crates organized in layers:
Applications  ruvector-postgres (230+ SQL), ruvllm (LLM serving), mcp-gate
      |
Compute       ruvector-graph-transformer, ruvector-gnn, ruvector-solver,
              ruvector-mincut, ruvector-attention (39 types), ruvector-coherence
      |
Core          ruvector-core (HNSW + SIMD), ruvector-graph (Cypher),
              ruvector-math, ruvector-verified (proofs)
      |
Format        rvf-types, rvf-wire, rvf-runtime, rvf-crypto (ML-DSA-65)
      |
Bindings      *-wasm (20+), *-node (NAPI-RS), ruvector-cli
First Steps
Build everything
# Prerequisites: Rust 1.83+, Node.js 20+
cargo build --workspace
npm run build # NAPI-RS bindings
npm test
Use the vector database
use ruvector_core::vector_db::VectorDb;
let db = VectorDb::create("my_vectors.db", 384)?; // 384-dim embeddings
db.insert("doc1", &embedding_vector, &metadata)?;
let results = db.search(&query_vector, 10)?; // top-10 nearest
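To make concrete what a top-k search returns, here is a minimal std-only sketch of an exact nearest-neighbour scan. It is not how ruvector-core works internally (the crate uses HNSW with SIMD); it is the brute-force baseline that an approximate index is meant to match. The names `cosine_distance` and `brute_force_search` are hypothetical, and the choice of cosine distance as the metric is an assumption.

```rust
/// Cosine distance: 1 - cosine similarity.
/// Returns 1.0 when either vector has zero norm.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Exact top-k by linear scan over (id, vector) pairs, closest first.
fn brute_force_search<'a>(
    items: &[(&'a str, Vec<f32>)],
    query: &[f32],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = items
        .iter()
        .map(|(id, v)| (*id, cosine_distance(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|x, y| x.1.total_cmp(&y.1));
    scored.truncate(k);
    scored
}

fn main() {
    let items = vec![
        ("doc1", vec![1.0, 0.0]),
        ("doc2", vec![0.0, 1.0]),
        ("doc3", vec![1.0, 1.0]),
    ];
    for (id, dist) in brute_force_search(&items, &[1.0, 0.1], 2) {
        println!("{id}: {dist:.4}");
    }
}
```

A scan like this costs O(n) per query, which is why an index such as HNSW is used once the collection grows.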
Use from PostgreSQL
CREATE EXTENSION ruvector;
CREATE TABLE items (id serial, embedding vector(384));
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
SELECT * FROM items ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10;
-- GNN in SQL
SELECT ruvector_gcn_forward(features, adjacency, weights);
-- Flash attention in SQL
SELECT ruvector_flash_attention(q, k, v);
Use from WASM
import { VectorDb } from '@ruvector/wasm';
const db = new VectorDb(384);
db.insert('doc1', embedding);
const results = db.search(query, 10);
Key Crates to Know
| If you need... | Use this crate | Key fact |
|---|---|---|
| Vector search | ruvector-core | HNSW, SIMD, 2.5K qps on 10K vectors |
| Graph database | ruvector-graph | Neo4j-compatible Cypher, petgraph + roaring |
| GNN training | ruvector-gnn | Message-passing on HNSW topology |
| Graph transformers | ruvector-graph-transformer | 8 verified modules, proof-gated |
| LLM inference | ruvllm | Paged attention, Metal/CUDA/CoreML |
| Sparse solvers | ruvector-solver | O(log n) PageRank, spectral methods |
| Min-cut | ruvector-mincut | First subpolynomial dynamic min-cut |
| PostgreSQL | ruvector-postgres | 230+ SQL functions, pgvector replacement |
| Binary format | rvf-* | 25 segment types, crash-safe, post-quantum |
Architecture Patterns
Feature flags everywhere
[features]
default = ["simd", "storage", "hnsw", "parallel"]
wasm = [] # Disables storage, SIMD, parallel
full = ["simd", "storage", "async-runtime", "compression", "hnsw"]
Every WASM crate mirrors a non-WASM crate. Storage falls back to in-memory.
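The in-memory fallback can be pictured with conditional compilation. The sketch below is an assumption about the general shape, not the actual ruvector-core code: the `VectorStore` trait, `MemoryStore`, `DiskStore`, and `open_store` are all hypothetical names. Only the `storage` feature name comes from the feature table above.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction shared by both backends.
trait VectorStore {
    fn insert(&mut self, id: &str, vector: &[f32]);
    fn count(&self) -> usize;
    fn backend(&self) -> &'static str;
}

/// Always available; the only backend when `storage` is off (e.g. WASM).
#[derive(Default)]
struct MemoryStore {
    vectors: HashMap<String, Vec<f32>>,
}

impl VectorStore for MemoryStore {
    fn insert(&mut self, id: &str, vector: &[f32]) {
        self.vectors.insert(id.to_string(), vector.to_vec());
    }
    fn count(&self) -> usize {
        self.vectors.len()
    }
    fn backend(&self) -> &'static str {
        "in-memory"
    }
}

/// Compiled only when the `storage` feature is enabled.
#[cfg(feature = "storage")]
mod disk {
    use super::VectorStore;
    use std::fs::{File, OpenOptions};
    use std::io::{self, Write};

    pub struct DiskStore {
        file: File,
        count: usize,
    }

    impl DiskStore {
        pub fn open(path: &str) -> io::Result<Self> {
            let file = OpenOptions::new().create(true).append(true).open(path)?;
            Ok(Self { file, count: 0 })
        }
    }

    impl VectorStore for DiskStore {
        fn insert(&mut self, id: &str, vector: &[f32]) {
            // Toy text encoding, for illustration only.
            writeln!(self.file, "{id} {vector:?}").expect("write failed");
            self.count += 1;
        }
        fn count(&self) -> usize {
            self.count
        }
        fn backend(&self) -> &'static str {
            "disk"
        }
    }
}

#[cfg(feature = "storage")]
fn open_store(path: &str) -> Box<dyn VectorStore> {
    Box::new(disk::DiskStore::open(path).expect("open failed"))
}

/// Without `storage`, the path is ignored and data lives in memory.
#[cfg(not(feature = "storage"))]
fn open_store(_path: &str) -> Box<dyn VectorStore> {
    Box::new(MemoryStore::default())
}

fn main() {
    let mut store = open_store("my_vectors.db");
    store.insert("doc1", &[0.1, 0.2, 0.3]);
    println!("{} vector(s) in {} backend", store.count(), store.backend());
}
```

Callers see the same `open_store` signature either way, so application code does not need its own `cfg` branches.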
Concurrency stack
- rayon: data parallelism (map/reduce)
- crossbeam: channels and concurrent queues
- dashmap: concurrent HashMap (never use std::sync::Mutex)
- parking_lot: fast locks when you must lock
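As an illustration of the map/reduce style of data parallelism listed above, here is a std-only sketch using scoped threads. It is a stand-in, not project code: with rayon the same computation would normally be written with `par_iter()` followed by `map` and `sum`. The function name `parallel_sum_of_squares` is hypothetical.

```rust
use std::thread;

/// Splits `data` into roughly `workers` chunks, maps each chunk to a partial
/// sum of squares on its own thread, then reduces the partial sums.
fn parallel_sum_of_squares(data: &[f32], workers: usize) -> f32 {
    let workers = workers.max(1);
    // max(1) keeps chunks() from panicking on empty input.
    let chunk_len = data.len().div_ceil(workers).max(1);

    thread::scope(|scope| {
        // Map phase: spawn every worker before joining any of them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<f32>()))
            .collect();

        // Reduce phase: combine the partial results.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum::<f32>()
    })
}

fn main() {
    let data: Vec<f32> = (1..=8).map(|i| i as f32).collect();
    println!("sum of squares = {}", parallel_sum_of_squares(&data, 4));
}
```

No lock is involved: each worker owns its partial result and the reduction happens after the joins, which is the same property that makes the rayon version safe.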
Testing strategy
- proptest for property-based testing
- criterion for benchmarks
- mockall for mocking
- London-school TDD (mock-first) for new code
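A mock-first test in the London-school style can be sketched without any crates. The example below hand-writes the mock that mockall would otherwise generate; `VectorIndex`, `Retriever`, and `MockIndex` are hypothetical names chosen for illustration, not types from the RuVector codebase.

```rust
use std::cell::Cell;

/// The collaborator's interface, defined before any real implementation exists.
trait VectorIndex {
    fn search(&self, query: &[f32], k: usize) -> Vec<String>;
}

/// The unit under test: depends only on the trait, never on a concrete index.
struct Retriever<I: VectorIndex> {
    index: I,
}

impl<I: VectorIndex> Retriever<I> {
    fn context_for(&self, query: &[f32], k: usize) -> String {
        self.index.search(query, k).join(", ")
    }
}

/// Hand-written mock: returns canned ids and records how it was called.
struct MockIndex {
    canned: Vec<String>,
    calls: Cell<usize>,
    last_k: Cell<Option<usize>>,
}

impl MockIndex {
    fn returning(ids: &[&str]) -> Self {
        Self {
            canned: ids.iter().map(|s| s.to_string()).collect(),
            calls: Cell::new(0),
            last_k: Cell::new(None),
        }
    }
}

impl VectorIndex for MockIndex {
    fn search(&self, _query: &[f32], k: usize) -> Vec<String> {
        self.calls.set(self.calls.get() + 1);
        self.last_k.set(Some(k));
        self.canned.clone()
    }
}

fn main() {
    let retriever = Retriever { index: MockIndex::returning(&["doc1", "doc2"]) };
    let context = retriever.context_for(&[0.0; 4], 2);

    // Verify both the result and the interaction with the collaborator.
    assert_eq!(context, "doc1, doc2");
    assert_eq!(retriever.index.calls.get(), 1);
    assert_eq!(retriever.index.last_k.get(), Some(2));
    println!("mock-first test passed");
}
```

The test pins down how `Retriever` talks to its index before a real index is wired in, which is the point of writing the mock first.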
Publishing order
Leaf crates first, then dependents:
ruvector-solver → ruvector-solver-wasm, ruvector-solver-node
Always run cargo publish --dry-run --allow-dirty before the real publish.
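"Leaf crates first, then dependents" is a topological ordering of the workspace dependency graph. The following is a minimal sketch of computing such an order; the `publish_order` function and the hand-written dependency map are assumptions for illustration (in practice the graph would come from the workspace manifests).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `deps` maps each crate to the workspace crates it depends on.
/// Returns an order in which every crate appears after all of its dependencies,
/// or an error if the graph contains a cycle.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<String>, String> {
    // Unpublished dependencies per crate, including crates that only appear as dependencies.
    let mut remaining: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (krate, krate_deps) in deps {
        remaining.entry(*krate).or_default().extend(krate_deps.iter().copied());
        for dep in krate_deps {
            remaining.entry(*dep).or_default();
        }
    }

    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Crates whose dependencies have all been published already.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, pending)| pending.is_empty())
            .map(|(krate, _)| *krate)
            .collect();
        if ready.is_empty() {
            return Err("dependency cycle detected".to_string());
        }
        for krate in &ready {
            remaining.remove(krate);
            order.push(krate.to_string());
        }
        for pending in remaining.values_mut() {
            for krate in &ready {
                pending.remove(krate);
            }
        }
    }
    Ok(order)
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("ruvector-solver-wasm", vec!["ruvector-solver"]);
    deps.insert("ruvector-solver-node", vec!["ruvector-solver"]);

    for krate in publish_order(&deps).expect("no cycles expected") {
        println!("cargo publish -p {krate}");
    }
}
```

Using `BTreeMap` keeps the output deterministic, so repeated runs give the same order for crates at the same depth.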
RVF Format (The Unifier)
All RuVector libraries converge on RVF — a single binary format with:
- 25 segment types (Vec, Index, Overlay, Journal, Manifest, Quant, Meta, Witness, Crypto, Kernel, WASM, ...)
- Crash-safe without WAL (append-only + two-fsync protocol)
- Progressive indexing (Layer A/B/C — first query in <5ms)
- Post-quantum crypto (ML-DSA-65 signatures)
- 5 domain profiles (.rvf, .rvdna, .rvtext, .rvgraph, .rvvis)
- Self-booting (embedded WASM microkernel <8KB)
use rvf_runtime::RvfStore;
let store = RvfStore::create("knowledge.rvf", options)?;
store.ingest_batch(&embeddings, &ids, Some(&metadata))?;
let results = store.query(&query_vec, 10, &query_options)?;
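One way to read "append-only + two-fsync" is: append the payload and fsync, then append a small commit record and fsync again, so a reader only trusts data that a durable commit record points to. The sketch below illustrates that idea with std only. The byte layout, the `CMT1` marker, and both function names are hypothetical; this is not the actual RVF segment format.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};

const COMMIT_MAGIC: [u8; 4] = *b"CMT1"; // hypothetical marker, not an RVF constant
const FOOTER_LEN: u64 = 4 + 8 + 8; // magic + payload offset + payload length

/// Appends `payload`, then a commit footer, with one fsync after each.
fn append_committed(file: &mut File, payload: &[u8]) -> io::Result<u64> {
    let offset = file.seek(SeekFrom::End(0))?;
    file.write_all(payload)?;
    file.sync_all()?; // fsync 1: payload is durable before anything refers to it

    let mut footer = Vec::with_capacity(FOOTER_LEN as usize);
    footer.extend_from_slice(&COMMIT_MAGIC);
    footer.extend_from_slice(&offset.to_le_bytes());
    footer.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    file.write_all(&footer)?;
    file.sync_all()?; // fsync 2: the commit record itself is durable
    Ok(offset)
}

/// Returns the last payload only if the file ends with a valid commit footer.
/// A fuller recovery routine would scan backwards for an earlier valid footer.
fn read_last_committed(file: &mut File) -> io::Result<Option<Vec<u8>>> {
    let end = file.seek(SeekFrom::End(0))?;
    if end < FOOTER_LEN {
        return Ok(None);
    }
    file.seek(SeekFrom::Start(end - FOOTER_LEN))?;
    let mut footer = [0u8; FOOTER_LEN as usize];
    file.read_exact(&mut footer)?;
    if footer[..4] != COMMIT_MAGIC {
        return Ok(None); // torn or missing commit record
    }
    let mut word = [0u8; 8];
    word.copy_from_slice(&footer[4..12]);
    let offset = u64::from_le_bytes(word);
    word.copy_from_slice(&footer[12..20]);
    let len = u64::from_le_bytes(word);
    if offset.checked_add(len) != Some(end - FOOTER_LEN) {
        return Ok(None); // footer does not describe the bytes just before it
    }
    file.seek(SeekFrom::Start(offset))?;
    let mut payload = vec![0u8; len as usize];
    file.read_exact(&mut payload)?;
    Ok(Some(payload))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("two_fsync_sketch_{}.bin", std::process::id()));
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    append_committed(&mut file, b"segment-1")?;
    append_committed(&mut file, b"segment-2")?;
    let last = read_last_committed(&mut file)?;
    println!("last committed: {:?}", last.map(|p| String::from_utf8_lossy(&p).into_owned()));
    std::fs::remove_file(&path)
}
```

Because nothing is overwritten in place, a crash between the two fsyncs leaves at worst an unreferenced tail, which is why no separate write-ahead log is needed in this scheme.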
Critical ADRs to Read
| ADR | Why it matters |
|---|---|
| ADR-001 | Core architecture — the foundation everything builds on |
| ADR-029 | RVF canonical format — the single most important design decision |
| ADR-015 | Coherence-gated transformer — sheaf attention mechanism |
| ADR-046 | Graph transformer architecture — the unified compute model |
| ADR-044 | PostgreSQL v0.3 — 230+ SQL functions |
| ADR-042 | TEE attestation — confidential computing model |
Common Gotchas
- redb locking: Use the global connection pool; don't open the same DB file twice
- NAPI binaries: git add -f is needed in CI to commit .node files past .gitignore
- WASM size: Microkernel budget is 8KB; CI asserts that the wasm-opt -Oz output is under 8192 bytes
- pgrx: Requires explicit --features pg17 in test commands
- ruvector-profiler: Has publish = false; intentionally not on crates.io
- Rust version: Main workspace needs 1.83+; rvf crates need 1.87+
Project Links
- Repository: https://github.com/ruvnet/ruvector
- ADRs: docs/adr/ (55+ decisions)
- Benchmarks: cargo bench in individual crates
- Knowledge export: docs/research/knowledge-export/ruvector-knowledge.rvf.json