RVF Examples — Learn by Running
Hands-on examples for the unified agentic AI format — store it, send it, run it
Quick Start • Examples • Features • Performance • Comparison
What is RVF?
RVF (RuVector Format) is the unified agentic AI file format. One .rvf file does three jobs:
- Store — vectors, indexes, metadata, and cryptographic proofs live in one file. No database server required.
- Transfer — the same file streams over a network. Query, insert, and delete operations work over the wire with zero conversion.
- Run — pack model weights, graph neural networks, WASM code, or even a bootable OS kernel into the file. Now it's not just data — it's a self-contained intelligence unit you can deploy anywhere.
Why does this matter?
Today, an AI agent's state is scattered: embeddings in one database, model weights in another, graph structure in a third, config in a fourth. Nothing talks to anything else. Moving between tools means re-indexing from scratch. There's no standard way to prove any of it was computed securely — and no way to hand an agent its complete knowledge as a single portable artifact.
RVF solves this. It gives agentic AI a universal substrate — one file that works everywhere:
| What it does | Where it runs | What you get |
|---|---|---|
| Stores vectors | Server (HNSW index) | Sub-millisecond search over millions of vectors |
| Stores vectors | Browser (5.5 KB WASM) | Same file, no backend needed |
| Stores vectors | Edge / IoT / mobile | Lightweight API, tiny footprint |
| Transfers data | Over the network | Batched query/ingest/delete via TCP |
| Runs code | Inside a TEE | Cryptographic proof of secure computation |
| Runs code | Bare metal / VM | File boots itself as a microservice |
| Runs code | Linux kernel (eBPF) | Sub-microsecond hot-path acceleration |
| Runs intelligence | Anywhere | Model + data + graph + trust chain in one file |
Key properties
- Crash-safe — no write-ahead log needed; if power dies mid-write, the file stays consistent
- Self-describing — the schema is in the file; no external catalog required
- Progressive loading — start answering queries before the full index is loaded
- Domain profiles — .rvdna for genomics, .rvtext for language, .rvgraph for networks, .rvvis for vision — same format underneath
- Lineage tracking — every derived file records its parent's hash, like DNA inheritance
- Tamper-evident — witness chains and post-quantum signatures prove nothing was altered
These examples walk you through every major feature, from the simplest "insert and query" to wire format inspection, witness chains, and sealed cognitive engines.
What you can build with RVF
| Use case | What goes in the file | Result |
|---|---|---|
| Semantic search | Vectors + HNSW index | Single-file vector database, no server needed |
| Agent memory | Vectors + metadata + witness chain | Portable, auditable AI agent knowledge base |
| Sealed LoRA distribution | Base embeddings + OVERLAY_SEG adapter deltas | Ship fine-tuned models as one versioned file |
| Portable graph intelligence | Node embeddings + GRAPH_SEG adjacency | GNN state that transfers between systems |
| Self-booting AI service | Vectors + index + KERNEL_SEG unikernel | File boots as a microservice on bare metal or Firecracker |
| Kernel-accelerated cache | Hot vectors + EBPF_SEG XDP program | Sub-microsecond lookups in the Linux kernel data path |
| Confidential AI | Any of the above + TEE attestation | Cryptographic proof everything ran inside a secure enclave |
| Genomic analysis | DNA k-mer embeddings + variant tensors | .rvdna file with lineage tracking across analysis pipeline |
| Firmware-style AI versioning | Full cognitive state + lineage chain | Parent → child derivation with hash verification, like DNA |
Quick Start
# Clone the repo
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
# Run your first example
cargo run --example basic_store
That's it. You'll see a store created, 100 vectors inserted, nearest neighbors found, and persistence verified — all in under a second.
Using the CLI
You can also work with RVF stores from the command line without writing any Rust:
# Build the CLI
cd crates/rvf && cargo build -p rvf-cli
# Create a store, ingest data, and query
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf
rvf compact vectors.rvf
# Derive a child store with lineage tracking
rvf derive parent.rvf child.rvf --type filter
# All commands support --json for machine-readable output
rvf status vectors.rvf --json
Run All 40 Examples
Core (6):
cargo run --example basic_store # Store lifecycle + k-NN
cargo run --example progressive_index # Three-layer HNSW recall
cargo run --example quantization # Scalar / product / binary
cargo run --example wire_format # Raw segment I/O
cargo run --example crypto_signing # Ed25519 + witness chains
cargo run --example filtered_search # Metadata-filtered queries
Agentic AI (6):
cargo run --example agent_memory # Persistent agent memory + witness audit
cargo run --example swarm_knowledge # Multi-agent shared knowledge base
cargo run --example reasoning_trace # Chain-of-thought with lineage derivation
cargo run --example tool_cache # Tool call result cache with TTL
cargo run --example agent_handoff # Transfer agent state between instances
cargo run --example experience_replay # RL experience replay buffer
Practical Production (5):
cargo run --example semantic_search # Document search with metadata filters
cargo run --example recommendation # Item recommendations (collaborative filtering)
cargo run --example rag_pipeline # Retrieval-augmented generation pipeline
cargo run --example embedding_cache # LRU cache with temperature tiering
cargo run --example dedup_detector # Near-duplicate detection + compaction
Vertical Domains (4):
cargo run --example genomic_pipeline # DNA k-mer search (.rvdna profile)
cargo run --example financial_signals # Market signals with TEE attestation
cargo run --example medical_imaging # Radiology search (.rvvis profile)
cargo run --example legal_discovery # Legal doc similarity (.rvtext profile)
Exotic Capabilities (5):
cargo run --example self_booting # RVF with embedded unikernel
cargo run --example ebpf_accelerator # eBPF hot-path acceleration
cargo run --example hyperbolic_taxonomy # Hierarchy-aware search
cargo run --example multimodal_fusion # Cross-modal text + image search
cargo run --example sealed_engine # Full cognitive engine (capstone)
Runtime Targets (4) + Postgres (1):
cargo run --example browser_wasm # Browser-side WASM vector search
cargo run --example edge_iot # IoT device with binary quantization
cargo run --example serverless_function # Cold-start optimized for Lambda
cargo run --example ruvllm_inference # LLM KV cache + LoRA via RVF
cargo run --example postgres_bridge # PostgreSQL ↔ RVF export/import
Network & Security (4):
cargo run --example network_sync # Peer-to-peer vector store sync
cargo run --example tee_attestation # TEE attestation + sealed keys
cargo run --example access_control # Role-based vector access control
cargo run --example zero_knowledge # Zero-knowledge proof integration
Autonomous Agent (1):
cargo run --example ruvbot # Autonomous RVF-powered agent bot
POSIX & Systems (3):
cargo run --example posix_fileops # POSIX file operations with RVF
cargo run --example linux_microkernel # Linux microkernel distribution
cargo run --example mcp_in_rvf # MCP server embedded in RVF
Network Operations (1):
cargo run --example network_interfaces # Network OS telemetry (60 interfaces)
Prerequisites
- Rust 1.87+ — install via rustup
- No other dependencies needed — everything builds from source
- All examples use deterministic pseudo-random data, so results are reproducible across runs
Examples at a Glance (40 examples)
Core
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 1 | basic_store | Beginner | Create, insert, query, persist, reopen |
| 2 | progressive_index | Intermediate | Three-layer HNSW, recall measurement |
| 3 | quantization | Intermediate | Scalar/product/binary quantization, tiering |
| 4 | wire_format | Advanced | Raw segment I/O, hash validation, tail-scan |
| 5 | crypto_signing | Advanced | Ed25519 signing, witness chains, tamper detection |
| 6 | filtered_search | Intermediate | Metadata filters: Eq, Range, AND/OR/IN |
Agentic AI
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 7 | agent_memory | Intermediate | Persistent agent memory, session recall, witness audit |
| 8 | swarm_knowledge | Intermediate | Multi-agent shared knowledge, cross-agent search |
| 9 | reasoning_trace | Advanced | Chain-of-thought lineage (parent → child → grandchild) |
| 10 | tool_cache | Intermediate | Tool call caching, TTL, delete_by_filter, compaction |
| 11 | agent_handoff | Advanced | Transfer agent state, derive clone, lineage verification |
| 12 | experience_replay | Intermediate | RL replay buffer, priority sampling, tiering |
Practical Production
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 13 | semantic_search | Beginner | Document search engine, 4 filter workflows |
| 14 | recommendation | Intermediate | Collaborative filtering, genre/quality filters |
| 15 | rag_pipeline | Advanced | 5-step RAG: chunk, embed, retrieve, rerank, assemble |
| 16 | embedding_cache | Advanced | Zipf access patterns, 3-tier quantization, memory savings |
| 17 | dedup_detector | Intermediate | Near-duplicate detection, clustering, compaction |
Vertical Domains
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 18 | genomic_pipeline | Advanced | DNA k-mer search, .rvdna profile, lineage |
| 19 | financial_signals | Advanced | Market signals, Ed25519 signing, attestation |
| 20 | medical_imaging | Intermediate | Radiology search, .rvvis profile, audit trail |
| 21 | legal_discovery | Intermediate | Legal similarity, .rvtext profile, discovery audit |
Exotic Capabilities
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 22 | self_booting | Advanced | Embed/extract unikernel, kernel header verification |
| 23 | ebpf_accelerator | Advanced | Embed/extract eBPF, XDP program, co-existence |
| 24 | hyperbolic_taxonomy | Intermediate | Hierarchy-aware embeddings, depth-filtered search |
| 25 | multimodal_fusion | Intermediate | Cross-modal text+image search, modality filtering |
| 26 | sealed_engine | Advanced | Capstone: vectors + kernel + eBPF + witness + lineage |
Runtime Targets + Postgres
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 27 | browser_wasm | Intermediate | WASM-compatible API, raw wire segments, size targets |
| 28 | edge_iot | Beginner | Constrained device, binary quantization, memory budget |
| 29 | serverless_function | Intermediate | Cold start, manifest tail-scan, progressive loading |
| 30 | ruvllm_inference | Advanced | KV cache + LoRA adapters + policy store via RVF |
| 31 | postgres_bridge | Intermediate | PG export/import, offline query, lineage, witness audit |
Network & Security
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 32 | network_sync | Advanced | Peer-to-peer sync, vector exchange, conflict resolution |
| 33 | tee_attestation | Advanced | TEE platform attestation, sealed keys, computation proof |
| 34 | access_control | Intermediate | Role-based access, permission checks, audit trails |
| 35 | zero_knowledge | Advanced | ZK proofs for vector operations, privacy-preserving search |
Autonomous Agent
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 36 | ruvbot | Advanced | Autonomous agent with RVF memory, planning, tool use |
POSIX & Systems
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 37 | posix_fileops | Intermediate | Raw I/O, atomic rename, locking, segment random access |
| 38 | linux_microkernel | Advanced | Package management, SSH keys, kernel embed, lineage updates |
| 39 | mcp_in_rvf | Advanced | MCP server runtime embedded in RVF, eBPF filter, tools |
Network Operations
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 40 | network_interfaces | Intermediate | Multi-chassis telemetry, anomaly detection, filtered queries |
Features Covered
Storage — vectors in, answers out
| Feature | Example | Description |
|---|---|---|
| k-NN Search | basic_store | Find nearest neighbors by L2 or cosine distance |
| Persistence | basic_store | Close a store, reopen it, verify results match |
| Metadata Filters | filtered_search | Eq, Ne, Gt, Lt, Range, In, And, Or expressions |
| Combined Filters | filtered_search | Multi-condition queries (category + score range) |
Indexing — speed vs. accuracy trade-offs
| Feature | Example | Description |
|---|---|---|
| Progressive Indexing | progressive_index | Three-tier HNSW: Layer A (fast), B (better), C (best) |
| Recall Measurement | progressive_index | Compare approximate results against brute-force ground truth |
Compression — fit more vectors in less memory
| Feature | Example | Description |
|---|---|---|
| Scalar Quantization | quantization | fp32 → u8 (4x compression, Hot tier) |
| Product Quantization | quantization | fp32 → PQ codes (8-32x compression, Warm tier) |
| Binary Quantization | quantization | fp32 → 1-bit (32x compression, Cold tier) |
| Temperature Tiering | quantization | Count-Min Sketch access tracking + automatic tier assignment |
Wire format — what the bytes look like on disk and over the network
| Feature | Example | Description |
|---|---|---|
| Segment I/O | wire_format | Write/read 64-byte-aligned segments with type/flags/hash |
| Hash Validation | wire_format | CRC32c / XXH3 integrity checks on every segment |
| Tail-Scan | wire_format | Find latest manifest by scanning backward from EOF |
Trust — signatures, audit trails, and tamper detection
| Feature | Example | Description |
|---|---|---|
| Ed25519 Signing | crypto_signing | Sign segments, verify signatures, detect tampering |
| Witness Chains | crypto_signing | SHAKE-256 linked audit trails (73-byte entries) |
| Tamper Detection | crypto_signing | Any byte flip breaks chain verification |
Agentic AI — lineage, domains, and self-booting intelligence
| Feature | Example | Description |
|---|---|---|
| DNA-Style Lineage | (API) | Every derived file records its parent's hash and derivation type |
| Domain Profiles | (API) | .rvdna, .rvtext, .rvgraph, .rvvis — same format, domain-specific hints |
| Computational Container | claude_code_appliance | Embed a WASM microkernel, eBPF program, or bootable unikernel |
| Self-Booting Appliance | claude_code_appliance | 5.1 MB .rvf — boots Linux, serves queries, runs Claude Code |
| Import (JSON/CSV/NumPy) | (API) | Load embeddings from .json, .csv, or .npy files via rvf-import or rvf ingest CLI |
| Unified CLI | rvf | 9 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve |
| Compaction | (API) | Garbage-collect tombstoned vectors and reclaim disk space |
| Batch Delete | (API) | Delete vectors by ID with tombstone markers |
Self-Booting RVF — Claude Code Appliance
The claude_code_appliance example builds a complete self-booting AI development environment as a single .rvf file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.
cd examples/rvf
cargo run --example claude_code_appliance
What it produces (5.1 MB file):
claude_code_appliance.rvf
├── KERNEL_SEG Linux 6.8.12 bzImage (5.2 MB, x86_64)
├── EBPF_SEG Socket filter — allows ports 2222, 8080 only
├── VEC_SEG 20 package embeddings (128-dim)
├── INDEX_SEG HNSW graph for package search
├── WITNESS_SEG 6-entry tamper-evident audit trail
├── CRYPTO_SEG 3 Ed25519 SSH user keys (root, deploy, claude)
├── MANIFEST_SEG 4 KB root with segment directory
└── Snapshot v1 derived image with lineage tracking
Boot and connect:
rvf launch claude_code_appliance.rvf # Boot on QEMU/Firecracker
ssh -p 2222 deploy@localhost # SSH in
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'
Final file: 5.1 MB single .rvf — boots Linux, serves queries, runs Claude Code.
What RVF Contains
An RVF file is built from segments — self-describing blocks that can be combined freely. Here are all 16 types, grouped by purpose:
Data Indexing Compression Runtime
+-----------+ +-----------+ +-----------+ +-----------+
| VEC 0x01 | | INDEX 0x02| | QUANT 0x06| | WASM |
| (vectors) | | (HNSW) | | (SQ/PQ/BQ)| | (5.5 KB) |
+-----------+ +-----------+ +-----------+ +-----------+
| META 0x07 | | META_IDX | | HOT 0x08 | | KERNEL |
| (key-val) | | 0x0D | | (promoted) | | 0x0E |
+-----------+ +-----------+ +-----------+ +-----------+
| JOURNAL | | OVERLAY | | SKETCH | | EBPF |
| 0x04 | | 0x03 | | 0x09 | | 0x0F |
+-----------+ +-----------+ +-----------+ +-----------+
Trust State Domain
+-----------+ +-----------+ +-----------+
| WITNESS | | MANIFEST | | PROFILE |
| 0x0A | | 0x05 | | 0x0B |
+-----------+ +-----------+ +-----------+
| CRYPTO |
| 0x0C |
+-----------+
Any segment you don't need is simply absent. A basic vector store uses VEC + INDEX + MANIFEST. A sealed cognitive engine might use all 16.
RuVector Ecosystem Integration
RVF is the universal substrate for the entire RuVector ecosystem. Here's how the 75+ Rust crates map onto RVF segments:
| Domain | Crates | RVF Segments Used |
|---|---|---|
| LLM inference | ruvllm, ruvllm-cli | VEC (KV cache), OVERLAY (LoRA), WITNESS (audit) |
| Self-optimizing learning | sona | OVERLAY (micro-LoRA), META (EWC++ weights) |
| Graph neural networks | ruvector-gnn, ruvector-graph | INDEX (HNSW topology), META (edge weights) |
| Quantum computing | ruQu, ruqu-core, ruqu-algorithms | SKETCH (VQE snapshots), META (syndrome tables) |
| Attention mechanisms | ruvector-attention, ruvector-mincut-gated-transformer | VEC (attention matrices), QUANT (INT4/FP16) |
| Coherence systems | cognitum-gate-kernel, prime-radiant | WITNESS (tile witnesses), WASM (64 KB tiles) |
| Neuromorphic | ruvector-nervous-system, micro-hnsw-wasm | VEC (spike trains), INDEX (spiking HNSW) |
| Agent memory | agentdb, claude-flow, agentic-flow | VEC + INDEX + WITNESS (full agent state) |
| Edge / browser | rvlite, rvf-wasm | VEC + INDEX via 5.5 KB WASM microkernel |
| Hyperbolic geometry | ruvector-hyperbolic-hnsw, ruvector-math | INDEX (Poincaré ball HNSW) |
| Routing / inference | ruvector-tiny-dancer-core, ruvector-sparse-inference | VEC (feature vectors), META (routing policies) |
| Observation pipeline | ospipe | META (state vectors), WITNESS (provenance) |
Performance & Comparison
RVF is designed for speed at every layer:
| Metric | Value | Example |
|---|---|---|
| Cold boot (4 KB manifest) | < 5 ms | wire_format |
| First query (Layer A only) | recall >= 0.70 | progressive_index |
| Full recall (Layer C) | >= 0.95 | progressive_index |
| WASM binary size | ~5.5 KB | — |
| Segment header | 64 bytes | wire_format |
| Witness chain entry | 73 bytes | crypto_signing |
| Scalar quantization | 4x compression | quantization |
| Product quantization | 8-32x compression | quantization |
| Binary quantization | 32x compression | quantization |
Progressive Loading
Instead of waiting for the full index, RVF serves queries immediately:
Layer A ─────> Layer B ─────> Layer C
(microsecs) (~10 ms) (~50 ms)
recall ~0.70 recall ~0.85 recall ~0.95
The progressive_index example measures this recall progression with brute-force ground truth.
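Recall against brute-force ground truth can be sketched in a few lines of plain Rust. This is a minimal illustration, not the code from progressive_index: the function names `l2_sq`, `brute_force_top_k`, and `recall_at_k` are hypothetical, and the approximate result list is assumed to come from whichever layer is being measured.

```rust
use std::collections::HashSet;

/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k by brute force: score every vector, sort, keep the k closest IDs.
/// Ties are broken by ID so the result is deterministic.
fn brute_force_top_k(data: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> = data
        .iter()
        .map(|(id, v)| (l2_sq(v, query), *id))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}

/// recall@k = |approx ∩ exact| / |exact|
fn recall_at_k(approx: &[u64], exact: &[u64]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let truth: HashSet<u64> = exact.iter().copied().collect();
    let got: HashSet<u64> = approx.iter().copied().collect();
    got.intersection(&truth).count() as f64 / exact.len() as f64
}
```

A recall of 0.70 at k = 10 then means that, on average, 7 of the 10 IDs returned by the approximate layer also appear in the exact top 10.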
Comparison
vs. vector databases
| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---|---|---|---|---|---|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | WAL | WAL |
| Progressive loading | 3 layers | No | No | No | No |
| WASM support | 5.5 KB | No | No | No | No |
| no_std compatible | Yes | No | No | No | No |
| Post-quantum sigs | ML-DSA-65 | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Auto quantization | 3-tier | No | Manual | Yes | Yes |
| Append-only | Yes | Build-once | Build-once | Log | Log |
| Witness chains | Yes | No | No | No | No |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Computational container | Yes (WASM/eBPF/unikernel) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Language bindings | Rust, Node, WASM | C++, Python | C++, Python | Rust, Python | Go, Python |
vs. model registries, graph DBs, and container formats
RVF replaces multiple tools because it carries data, model, graph, runtime, and trust chain together:
| Capability | RVF | GGUF | ONNX | SafeTensors | Neo4j | Docker/OCI |
|---|---|---|---|---|---|---|
| Vector storage + search | Yes | No | No | No | No | No |
| Model weight deltas (LoRA) | OVERLAY_SEG | Full weights | Full graph | Weights only | No | No |
| Graph neural state | GRAPH_SEG | No | No | No | Yes | No |
| Cryptographic audit trail | WITNESS_SEG | No | No | No | No | No |
| Self-booting runtime | KERNEL_SEG | No | No | No | No | Yes |
| Kernel-level acceleration | EBPF_SEG | No | No | No | No | No |
| File lineage / versioning | DNA-style | No | No | No | No | Image layers |
| TEE attestation | Built-in | No | No | No | No | No |
| Single portable file | Yes | Yes | Yes | Yes | No | Image tarball |
| Runs in browser | 5.5 KB WASM | No | ONNX.js | No | No | No |
Usage Patterns (8 patterns)
Pattern 1: Simple Vector Store
The most common use case. Create a store, add embeddings, query nearest neighbors.
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::L2,
    ..Default::default()
};
let mut store = RvfStore::create("vectors.rvf", options)?;
// Insert embeddings
store.ingest_batch(&[&embedding], &[1], None)?;
// Query top-10 nearest neighbors
let results = store.query(&query, 10, &QueryOptions::default())?;
for r in &results {
    println!("id={}, distance={:.4}", r.id, r.distance);
}
See: basic_store.rs
Pattern 2: Filtered Search
Attach metadata to vectors, then filter during queries.
use rvf_runtime::{FilterExpr, MetadataEntry, MetadataValue};
use rvf_runtime::filter::FilterValue;
// Add metadata during ingestion
let metadata = vec![
    MetadataEntry { field_id: 0, value: MetadataValue::String("science".into()) },
    MetadataEntry { field_id: 1, value: MetadataValue::U64(95) },
];
store.ingest_batch(&[&vec], &[42], Some(&metadata))?;
// Query with filter: category == "science" AND score > 80
let filter = FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
]);
let opts = QueryOptions { filter: Some(filter), ..Default::default() };
let results = store.query(&query, 10, &opts)?;
See: filtered_search.rs
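To make the filter semantics concrete, here is a small stand-alone evaluator. It is a sketch under assumptions: `Value`, `Filter`, and `evaluate` are hypothetical stand-ins for `MetadataValue`, `FilterExpr`, and whatever rvf-runtime does internally, and it covers only the Eq, Gt, And, and Or shapes.

```rust
use std::collections::HashMap;

/// Hypothetical metadata value; the real `MetadataValue` may carry more variants.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
}

/// Hypothetical filter tree mirroring the Eq / Gt / And / Or shapes used above.
enum Filter {
    Eq(u16, Value),
    Gt(u16, Value),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Returns true when a record's metadata (field_id -> value) satisfies the filter.
/// A missing field, or a type mismatch, never satisfies a leaf predicate.
fn evaluate(filter: &Filter, meta: &HashMap<u16, Value>) -> bool {
    match filter {
        Filter::Eq(field, want) => meta.get(field) == Some(want),
        Filter::Gt(field, bound) => match (meta.get(field), bound) {
            (Some(Value::U64(have)), Value::U64(b)) => have > b,
            (Some(Value::Str(have)), Value::Str(b)) => have > b,
            _ => false,
        },
        Filter::And(parts) => parts.iter().all(|p| evaluate(p, meta)),
        Filter::Or(parts) => parts.iter().any(|p| evaluate(p, meta)),
    }
}
```

With field 0 set to "science" and field 1 set to 95, the filter from the pattern above (category == "science" AND score > 80) evaluates to true, while score > 95 alone evaluates to false.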
Pattern 3: Progressive Recall
Start serving queries instantly, improve quality as more data loads.
use rvf_index::{build_full_index, build_layer_a, build_layer_c, ProgressiveIndex};
// Build HNSW graph
let graph = build_full_index(&store, n, &config, &rng, &l2_distance);
// Layer A: instant but approximate
let layer_a = build_layer_a(&graph, &centroids, &assignments, n as u64);
let mut idx = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: None };
let fast_results = idx.search(&query, 10, 200, &store); // recall ~0.70
// Layer C: full precision, attached to the same index (layer_a was already moved into it)
idx.layer_c = Some(build_layer_c(&graph));
let precise_results = idx.search(&query, 10, 200, &store); // recall ~0.95
See: progressive_index.rs
Pattern 4: Cryptographic Integrity
Sign segments and build tamper-evident audit trails.
use rvf_crypto::{sign_segment, verify_segment, create_witness_chain, WitnessEntry, shake256_256};
use ed25519_dalek::SigningKey;
// Sign a segment
let footer = sign_segment(&header, &payload, &signing_key);
// Verify signature
assert!(verify_segment(&header, &payload, &footer, &verifying_key));
// Build an audit trail
let entries = vec![WitnessEntry {
    prev_hash: [0; 32],
    action_hash: shake256_256(b"inserted 1000 vectors"),
    timestamp_ns: 1_700_000_000_000_000_000,
    witness_type: 0x01, // PROVENANCE
}];
let chain = create_witness_chain(&entries);
See: crypto_signing.rs
Pattern 5: Import from JSON / CSV / NumPy
Load embeddings from common formats without writing a parser.
use rvf_import::{import_json, import_csv, import_npy};
// From a JSON array of vectors
import_json("embeddings.json", &mut store)?;
// From a CSV file (one vector per row)
import_csv("embeddings.csv", &mut store)?;
// From a NumPy .npy file
import_npy("embeddings.npy", &mut store)?;
Pattern 6: Delete and Compact
Remove vectors by ID, then reclaim disk space.
// Delete specific vectors (marks as tombstones)
store.delete_batch(&[42, 99, 1001])?;
// Compact: rewrite the file without tombstoned data
store.compact()?;
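The tombstone-then-compact idea can be illustrated without RVF at all. The sketch below is a toy in-memory model, not the rvf-runtime implementation: `ToyStore` and its methods are hypothetical, and the real store works on file segments rather than a `Vec`.

```rust
use std::collections::HashSet;

/// Toy append-only store: deletes only record a tombstone; compaction rewrites the rows.
struct ToyStore {
    rows: Vec<(u64, Vec<f32>)>,
    tombstones: HashSet<u64>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { rows: Vec::new(), tombstones: HashSet::new() }
    }

    fn insert(&mut self, id: u64, v: Vec<f32>) {
        self.rows.push((id, v));
    }

    /// Mark IDs as deleted without touching stored rows.
    /// Returns how many existing IDs were newly tombstoned.
    fn delete_batch(&mut self, ids: &[u64]) -> usize {
        let mut marked = 0;
        for &id in ids {
            let exists = self.rows.iter().any(|(row_id, _)| *row_id == id);
            if exists && self.tombstones.insert(id) {
                marked += 1;
            }
        }
        marked
    }

    /// Rows a query would still see.
    fn live_count(&self) -> usize {
        self.rows.iter().filter(|(id, _)| !self.tombstones.contains(id)).count()
    }

    /// Drop tombstoned rows and clear the tombstone set.
    /// Returns the number of rows reclaimed.
    fn compact(&mut self) -> usize {
        let before = self.rows.len();
        let dead = std::mem::take(&mut self.tombstones);
        self.rows.retain(|(id, _)| !dead.contains(id));
        before - self.rows.len()
    }
}
```

The point of the split is that a delete stays cheap (it only appends a marker), while the cost of rewriting data is paid once, when `compact` is called.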
Pattern 7: File Lineage (Parent → Child Derivation)
Create derived files that track their ancestry.
use rvf_types::DerivationType;
// Create a parent store
let parent = RvfStore::create("parent.rvf", options)?;
// Derive a filtered child — records parent's hash automatically
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
// Derive a grandchild
let grandchild = child.derive("grandchild.rvdna", DerivationType::Quantize, None)?;
assert_eq!(grandchild.lineage_depth(), 2);
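What a lineage record has to carry can be sketched as follows. This is an illustration only: the `Lineage` struct, its field names, and the `derive` and `verify_parent` helpers are hypothetical, and the 64-bit digest from the standard library's `DefaultHasher` is a stand-in that is not cryptographic and not the hash RVF uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical lineage header; field names are illustrative, not the RVF layout.
#[derive(Debug, Clone)]
struct Lineage {
    file_id: u64,
    parent_id: Option<u64>,
    parent_hash: Option<u64>, // stand-in 64-bit digest, NOT cryptographic
    depth: u32,
}

/// Stand-in content digest (std SipHash); a real format would use a cryptographic hash.
fn digest(file_id: u64, content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    file_id.hash(&mut h);
    content.hash(&mut h);
    h.finish()
}

/// A root file has no parent and depth 0.
fn root(file_id: u64) -> Lineage {
    Lineage { file_id, parent_id: None, parent_hash: None, depth: 0 }
}

/// Derive a child: record the parent's ID and digest, and extend the depth by one.
fn derive(parent: &Lineage, parent_content: &[u8], child_id: u64) -> Lineage {
    Lineage {
        file_id: child_id,
        parent_id: Some(parent.file_id),
        parent_hash: Some(digest(parent.file_id, parent_content)),
        depth: parent.depth + 1,
    }
}

/// Check that `child` descends directly from `parent` with the given content.
fn verify_parent(child: &Lineage, parent: &Lineage, parent_content: &[u8]) -> bool {
    child.parent_id == Some(parent.file_id)
        && child.parent_hash == Some(digest(parent.file_id, parent_content))
        && child.depth == parent.depth + 1
}
```

Because the child stores a digest of the parent's content, changing the parent after derivation makes `verify_parent` return false, which is the property the parent/child assertions above rely on.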
Pattern 8: Embed a Computational Container
Pack a bootable kernel or eBPF program into the file.
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};
// Embed a unikernel — file can now boot as a standalone service
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &kernel_image, 8080)?;
// Embed an eBPF program — enables kernel-level acceleration
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;
// Extract later
let (hdr, img) = store.extract_kernel()?.unwrap();
let (hdr, prog) = store.extract_ebpf()?.unwrap();
Tutorial: Your First RVF Store (Step by Step)
Step 1: Set Up
Create a new Rust project and add the dependency:
cargo new my_vectors
cd my_vectors
Add to Cargo.toml:
[dependencies]
rvf-runtime = { path = "../crates/rvf/rvf-runtime" }
tempfile = "3"
Step 2: Create a Store
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
use tempfile::TempDir;
fn main() {
    let tmp = TempDir::new().unwrap();
    let path = tmp.path().join("my.rvf");
    let opts = RvfOptions {
        dimension: 128,
        metric: DistanceMetric::L2,
        ..Default::default()
    };
    let mut store = RvfStore::create(&path, opts).unwrap();
Step 3: Insert Vectors
Vectors are inserted in batches. Each vector needs a unique u64 ID.
    let vec_a = vec![0.1f32; 128];
    let vec_b = vec![0.2f32; 128];
    let vecs: Vec<&[f32]> = vec![&vec_a, &vec_b];
    let ids = vec![1u64, 2];
    let result = store.ingest_batch(&vecs, &ids, None).unwrap();
    println!("Accepted: {}, Rejected: {}", result.accepted, result.rejected);
Step 4: Query
    let query = vec![0.15f32; 128];
    let results = store.query(&query, 5, &QueryOptions::default()).unwrap();
    for r in &results {
        println!("  id={}, dist={:.6}", r.id, r.distance);
    }
Step 5: Verify Persistence
    store.close().unwrap();
    let reopened = RvfStore::open(&path).unwrap();
    let results2 = reopened.query(&query, 5, &QueryOptions::default()).unwrap();
    assert_eq!(results.len(), results2.len());
    println!("Persistence verified!");
}
Expected Output
Accepted: 2, Rejected: 0
id=1, dist=0.064000
id=2, dist=0.032000
Persistence verified!
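Treat the distance values in the listing above as illustrative. Every component of the query (0.15) differs from both stored vectors (0.1 and 0.2) by 0.05, so the two results should come out equidistant. The helper below, a hypothetical `l2_squared` that is not part of rvf-runtime, checks that arithmetic; whether the store reports squared L2 or its square root is not stated here, so both values are given.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}
```

For the tutorial's vectors this gives 128 × 0.05² = 0.32 for both IDs (about 0.566 after taking the square root), up to f32 rounding.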
Tutorial: Understanding Quantization Tiers
The Problem
A million 384-dim vectors at full precision (fp32) take about 1.5 GB of RAM (1,000,000 × 384 × 4 bytes). Not all vectors are accessed equally; most are rarely touched. Why keep them all at full precision?
The Solution: Temperature Tiering
RVF assigns vectors to three compression levels based on how often they're accessed:
| Tier | Access Pattern | Compression | Memory per Vector (384d) |
|---|---|---|---|
| Hot | Frequently queried | Scalar (fp32 -> u8) | 384 bytes (4x smaller) |
| Warm | Occasionally queried | Product quantization | 48 bytes (32x smaller) |
| Cold | Rarely accessed | Binary (1-bit) | 48 bytes (32x smaller) |
| Raw | n/a (uncompressed baseline) | None (fp32) | 1,536 bytes |
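The byte counts in the table follow directly from the dimension. The helpers below are a hypothetical sketch of that arithmetic; they count only the per-vector codes and ignore shared overhead such as quantizer ranges or PQ codebooks.

```rust
/// fp32: four bytes per component.
fn raw_bytes(dim: usize) -> usize {
    dim * 4
}

/// Scalar quantization: one u8 per component.
fn scalar_bytes(dim: usize) -> usize {
    dim
}

/// Binary quantization: one bit per component, rounded up to whole bytes.
fn binary_bytes(dim: usize) -> usize {
    (dim + 7) / 8
}

/// Product quantization: one u8 code per subspace (assumes at most 256 centroids each).
fn pq_bytes(subquantizers: usize) -> usize {
    subquantizers
}

/// How many times smaller the compressed form is than fp32.
fn compression_ratio(dim: usize, compressed_bytes: usize) -> f64 {
    raw_bytes(dim) as f64 / compressed_bytes as f64
}
```

For 384 dimensions this reproduces the table: 1,536 bytes raw, 384 bytes scalar (4x), and 48 bytes for both binary and a 48-subquantizer PQ code (32x).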
How It Works
1. Track access patterns using a Count-Min Sketch (a probabilistic counter):
let mut sketch = CountMinSketch::default_sketch();
// Every time a vector is accessed, increment its counter
sketch.increment(vector_id);
// Check how often a vector has been accessed
let count = sketch.estimate(vector_id);
2. Assign tiers based on configurable thresholds:
let tier = assign_tier(count);
// Hot: count >= 100
// Warm: count >= 10
// Cold: count < 10
3. Encode at the appropriate level:
// Hot: Scalar (fast, low error)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode_vec(&vector); // 384 bytes
// Warm: Product (balanced)
let pq = ProductQuantizer::train(&vectors, 48, 64, 20);
let encoded = pq.encode_vec(&vector); // 48 bytes
// Cold: Binary (smallest, approximate)
let bits = encode_binary(&vector); // 48 bytes
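For readers who have not met a Count-Min Sketch before, steps 1 and 2 can be sketched in plain Rust. This is an illustration, not the RVF `CountMinSketch`: the `CountMin` type, its depth and width parameters, and the use of the standard library's `DefaultHasher` are all assumptions; the tier thresholds are the ones listed in step 2.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Minimal Count-Min Sketch: `depth` rows of `width` counters.
/// Estimates can overcount (hash collisions) but never undercount.
struct CountMin {
    width: usize,
    rows: Vec<Vec<u32>>,
}

impl CountMin {
    fn new(depth: usize, width: usize) -> Self {
        CountMin { width, rows: vec![vec![0; width]; depth] }
    }

    /// Counter index for `id` in a given row; the row number salts the hash.
    fn slot(&self, row: usize, id: u64) -> usize {
        let mut h = DefaultHasher::new();
        (row as u64, id).hash(&mut h);
        (h.finish() % self.width as u64) as usize
    }

    /// Record one access to `id` in every row.
    fn increment(&mut self, id: u64) {
        for r in 0..self.rows.len() {
            let s = self.slot(r, id);
            self.rows[r][s] = self.rows[r][s].saturating_add(1);
        }
    }

    /// The minimum across rows is the least-inflated estimate.
    fn estimate(&self, id: u64) -> u32 {
        (0..self.rows.len())
            .map(|r| self.rows[r][self.slot(r, id)])
            .min()
            .unwrap_or(0)
    }
}

/// Thresholds from step 2: Hot >= 100, Warm >= 10, Cold otherwise.
fn assign_tier(count: u32) -> Tier {
    if count >= 100 {
        Tier::Hot
    } else if count >= 10 {
        Tier::Warm
    } else {
        Tier::Cold
    }
}
```

The sketch uses fixed memory (depth × width counters) regardless of how many vector IDs are tracked, which is why it suits access tracking over millions of vectors; the price is that a rarely used vector can occasionally be estimated too high and promoted early.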
Run the Example
cargo run --example quantization
You'll see a comparison table showing compression ratio, reconstruction error (MSE), and bytes per vector for each tier.
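To see where the reconstruction error in that table comes from, here is a self-contained sketch of scalar and binary encoding with an MSE measurement. It is not the quantization example itself: `ScalarQ`, its global min/max range, and the sign-bit rule in `encode_binary` are assumptions chosen for simplicity, and the real quantizers may use per-dimension ranges or a different threshold.

```rust
/// Min/max scalar quantizer: maps each f32 onto 0..=255 over one global range.
/// Assumes the training set is non-empty.
struct ScalarQ {
    min: f32,
    scale: f32,
}

impl ScalarQ {
    fn train(vectors: &[Vec<f32>]) -> Self {
        let mut min = f32::INFINITY;
        let mut max = f32::NEG_INFINITY;
        for x in vectors.iter().flatten() {
            min = min.min(*x);
            max = max.max(*x);
        }
        let range = (max - min).max(f32::EPSILON); // avoid division by zero
        ScalarQ { min, scale: range / 255.0 }
    }

    /// One u8 code per component.
    fn encode(&self, v: &[f32]) -> Vec<u8> {
        v.iter()
            .map(|x| ((x - self.min) / self.scale).round().clamp(0.0, 255.0) as u8)
            .collect()
    }

    /// Approximate reconstruction of the original components.
    fn decode(&self, codes: &[u8]) -> Vec<f32> {
        codes.iter().map(|c| self.min + *c as f32 * self.scale).collect()
    }
}

/// Sign-bit binary code: bit i is set when component i is positive.
fn encode_binary(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, x) in v.iter().enumerate() {
        if *x > 0.0 {
            bits[i / 8] |= 1u8 << (i % 8);
        }
    }
    bits
}

/// Mean squared error between an original vector and its reconstruction.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>() / a.len() as f32
}
```

With values in [-1, 1], each scalar code is off by at most half a step (about 0.004), so the MSE stays tiny; the binary code keeps only the sign of each component, which is why it is the smallest and least precise tier.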
Tutorial: Building Witness Chains for Audit Trails
What Is a Witness Chain?
A witness chain is a tamper-evident log of events. Each entry links to the previous one through a cryptographic hash. If any entry is modified, all subsequent hash links break — making tampering detectable without a blockchain.
Chain Structure
Entry 0 (genesis) Entry 1 Entry 2
+-------------------+ +-------------------+ +-------------------+
| prev_hash: 0x00.. | | prev_hash: H(E0) | | prev_hash: H(E1) |
| action: H(data) | | action: H(data) | | action: H(data) |
| timestamp: T0 | | timestamp: T1 | | timestamp: T2 |
| type: PROVENANCE | | type: COMPUTATION | | type: SEARCH |
+-------------------+ +-------------------+ +-------------------+
73 bytes 73 bytes 73 bytes
- prev_hash: SHAKE-256 hash of the previous entry (zeroed for genesis)
- action_hash: SHAKE-256 hash of whatever action is being recorded
- timestamp_ns: Nanosecond UNIX timestamp
- witness_type: What kind of event (see table below)
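The 73-byte figure is simply 32 + 32 + 8 + 1, and the linking logic fits in a short sketch. Everything below is illustrative rather than the rvf-crypto implementation: `toy_hash` is a stand-in built from the standard library's SipHash (it is not SHAKE-256 and not cryptographically secure), and the field order and little-endian timestamp are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const ENTRY_LEN: usize = 32 + 32 + 8 + 1; // prev_hash + action_hash + timestamp + type = 73

/// Stand-in 32-byte digest. NOT SHAKE-256 and NOT secure; it only makes
/// the chaining logic runnable without external crates.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for lane in 0..4u64 {
        let mut h = DefaultHasher::new();
        lane.hash(&mut h);
        data.hash(&mut h);
        let start = lane as usize * 8;
        out[start..start + 8].copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// What the caller supplies; prev_hash is filled in while building the chain.
struct Entry {
    action_hash: [u8; 32],
    timestamp_ns: u64,
    witness_type: u8,
}

/// Serialize entries back to back, linking each one to the hash of the previous 73 bytes.
fn build_chain(entries: &[Entry]) -> Vec<u8> {
    let mut chain = Vec::with_capacity(entries.len() * ENTRY_LEN);
    let mut prev = [0u8; 32]; // zeroed for genesis
    for e in entries {
        let start = chain.len();
        chain.extend_from_slice(&prev);
        chain.extend_from_slice(&e.action_hash);
        chain.extend_from_slice(&e.timestamp_ns.to_le_bytes());
        chain.push(e.witness_type);
        prev = toy_hash(&chain[start..]);
    }
    chain
}

/// Recompute every link. Returns the entry count, or the index of the first bad entry.
fn verify_chain(chain: &[u8]) -> Result<usize, usize> {
    if chain.len() % ENTRY_LEN != 0 {
        return Err(chain.len() / ENTRY_LEN); // truncated or padded
    }
    let mut prev = [0u8; 32];
    for (i, entry) in chain.chunks(ENTRY_LEN).enumerate() {
        if entry[..32] != prev {
            return Err(i);
        }
        prev = toy_hash(entry);
    }
    Ok(chain.len() / ENTRY_LEN)
}
```

Note one limit of this simplified version: a change to the final entry's action hash or timestamp is not covered by any later link, so catching it needs an additional mechanism such as a signature over the whole chain.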
Witness Types
| Code | Name | When to Use |
|---|---|---|
| 0x01 | PROVENANCE | Data origin tracking (e.g., "loaded from model X") |
| 0x02 | COMPUTATION | Operation recording (e.g., "built HNSW index") |
| 0x03 | SEARCH | Query audit (e.g., "searched for query Q, got results R") |
| 0x04 | DELETION | Deletion audit (e.g., "deleted vectors 1-100") |
| 0x05 | PLATFORM_ATTESTATION | TEE attestation (e.g., "enclave measured as M") |
| 0x06 | KEY_BINDING | Sealed key (e.g., "key K bound to enclave M") |
| 0x07 | COMPUTATION_PROOF | Verified computation (e.g., "search ran inside enclave") |
| 0x08 | DATA_PROVENANCE | Full chain (e.g., "model -> TEE -> RVF file") |
| 0x09 | DERIVATION | File lineage derivation event |
| 0x0A | LINEAGE_MERGE | Multi-parent lineage merge |
| 0x0B | LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
| 0x0C | LINEAGE_TRANSFORM | Lineage transform operation |
| 0x0D | LINEAGE_VERIFY | Lineage verification event |
Creating and Verifying
use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry, shake256_256};
// Record three events
let entries = vec![
    WitnessEntry {
        prev_hash: [0; 32], // genesis
        action_hash: shake256_256(b"loaded embeddings from model-v2"),
        timestamp_ns: 1_700_000_000_000_000_000,
        witness_type: 0x01,
    },
    WitnessEntry {
        prev_hash: [0; 32], // filled by create_witness_chain
        action_hash: shake256_256(b"built HNSW index (M=16, ef=200)"),
        timestamp_ns: 1_700_000_001_000_000_000,
        witness_type: 0x02,
    },
    WitnessEntry {
        prev_hash: [0; 32], // filled by create_witness_chain
        action_hash: shake256_256(b"query: top-10 for user request #42"),
        timestamp_ns: 1_700_000_002_000_000_000,
        witness_type: 0x03,
    },
];
let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes).unwrap();
assert_eq!(verified.len(), 3);
Tamper Detection
Flip any byte in the chain and verification fails:
let mut tampered = chain_bytes.clone();
tampered[100] ^= 0xFF; // flip one byte
assert!(verify_witness_chain(&tampered).is_err()); // detected!
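To see why a flipped byte breaks verification, here is a dependency-free toy chain. It is not the rvf-crypto implementation: FNV-1a (a non-cryptographic 64-bit hash) is swapped in for SHAKE-256, entries are shrunk to 16 bytes, and `build_chain` / `verify_chain` are hypothetical names.

```rust
/// FNV-1a 64-bit: a non-cryptographic stand-in for SHAKE-256, used only
/// to keep this sketch free of external crates.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Toy entry: 8-byte prev hash followed by an 8-byte action hash.
const TOY_ENTRY: usize = 16;

/// Build a chain in which each entry stores the hash of the previous entry.
pub fn build_chain(actions: &[&str]) -> Vec<u8> {
    let mut chain = Vec::with_capacity(actions.len() * TOY_ENTRY);
    let mut prev = 0u64; // genesis entry has a zeroed prev hash
    for action in actions {
        let start = chain.len();
        chain.extend_from_slice(&prev.to_le_bytes());
        chain.extend_from_slice(&fnv1a(action.as_bytes()).to_le_bytes());
        prev = fnv1a(&chain[start..]);
    }
    chain
}

/// Recompute every link; return the entry count if all links hold.
pub fn verify_chain(chain: &[u8]) -> Result<usize, String> {
    if chain.len() % TOY_ENTRY != 0 {
        return Err("truncated chain".to_string());
    }
    let mut expected_prev = 0u64;
    for (i, entry) in chain.chunks(TOY_ENTRY).enumerate() {
        let mut stored = [0u8; 8];
        stored.copy_from_slice(&entry[..8]);
        if u64::from_le_bytes(stored) != expected_prev {
            return Err(format!("entry {} does not link to its predecessor", i));
        }
        expected_prev = fnv1a(entry);
    }
    Ok(chain.len() / TOY_ENTRY)
}
```

In this sketch only the links between entries are checked, so the final entry would need an external anchor (for example a signature over the chain) to be covered as well.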
Run the Example
cargo run --example crypto_signing
The example creates a 5-entry chain, verifies it, then demonstrates tamper and truncation detection.
Tutorial: Wire Format Deep Dive
Segment Header (64 bytes)
Every piece of data in an RVF file is wrapped in a self-describing segment. The header is always exactly 64 bytes:
Offset Size Field Description
------ ---- ----- -----------
0x00 4 magic 0x52564653 ("RVFS")
0x04 1 version Format version (currently 1)
0x05 1 seg_type Segment type (VEC, INDEX, MANIFEST, ...)
0x06 2 flags Bitfield (COMPRESSED, SIGNED, ATTESTED, ...)
0x08 8 segment_id Monotonically increasing ID
0x10 8 payload_length Byte length of payload
0x18 8 timestamp_ns Nanosecond UNIX timestamp
0x20 1 checksum_algo 0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21 1 compression 0=none, 1=LZ4, 2=ZSTD
0x22 2 reserved_0 Must be zero
0x24 4 reserved_1 Must be zero
0x28 16 content_hash First 128 bits of payload hash
0x38 4 uncompressed_len Original size before compression
0x3C 4 alignment_pad Padding to 64-byte boundary
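The offsets above translate directly into a parser. The following is a sketch rather than the rvf-wire reader: `SegmentHeader` and `parse_header` are hypothetical names, and it assumes the magic is stored as a little-endian u32 like the other integer fields.

```rust
pub const SEGMENT_MAGIC: u32 = 0x5256_4653; // "RVFS"
pub const HEADER_SIZE: usize = 64;

/// Hypothetical decoded view of the 64-byte header.
#[derive(Debug, PartialEq)]
pub struct SegmentHeader {
    pub version: u8,
    pub seg_type: u8,
    pub flags: u16,
    pub segment_id: u64,
    pub payload_length: u64,
    pub timestamp_ns: u64,
    pub checksum_algo: u8,
    pub compression: u8,
    pub content_hash: [u8; 16],
    pub uncompressed_len: u32,
}

fn read_u16(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[off..off + 4]);
    u32::from_le_bytes(a)
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(a)
}

/// Decode the fixed header, rejecting bad magic and non-zero reserved fields.
pub fn parse_header(buf: &[u8]) -> Result<SegmentHeader, String> {
    if buf.len() < HEADER_SIZE {
        return Err("header shorter than 64 bytes".to_string());
    }
    if read_u32(buf, 0x00) != SEGMENT_MAGIC {
        return Err("bad magic".to_string());
    }
    if read_u16(buf, 0x22) != 0 || read_u32(buf, 0x24) != 0 {
        return Err("reserved fields must be zero".to_string());
    }
    let mut content_hash = [0u8; 16];
    content_hash.copy_from_slice(&buf[0x28..0x38]);
    Ok(SegmentHeader {
        version: buf[0x04],
        seg_type: buf[0x05],
        flags: read_u16(buf, 0x06),
        segment_id: read_u64(buf, 0x08),
        payload_length: read_u64(buf, 0x10),
        timestamp_ns: read_u64(buf, 0x18),
        checksum_algo: buf[0x20],
        compression: buf[0x21],
        content_hash,
        uncompressed_len: read_u32(buf, 0x38),
    })
}
```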
The 16 Segment Types
| Code | Name | Purpose |
|---|---|---|
| 0x01 | VEC | Raw vector embeddings |
| 0x02 | INDEX | HNSW adjacency and routing tables |
| 0x03 | OVERLAY | Graph overlay deltas |
| 0x04 | JOURNAL | Metadata mutations, deletions |
| 0x05 | MANIFEST | Segment directory, epoch state |
| 0x06 | QUANT | Quantization dictionaries (scalar/PQ/binary) |
| 0x07 | META | Key-value metadata |
| 0x08 | HOT | Temperature-promoted data |
| 0x09 | SKETCH | Access counter sketches (Count-Min) |
| 0x0A | WITNESS | Audit trails, attestation proofs |
| 0x0B | PROFILE | Domain profile declarations |
| 0x0C | CRYPTO | Key material, signature chains |
| 0x0D | META_IDX | Metadata inverted indexes |
| 0x0E | KERNEL | Compressed unikernel image (self-booting) |
| 0x0F | EBPF | eBPF program for kernel-level acceleration |
Segment Flags
| Bit | Name | Description |
|---|---|---|
| 0 | COMPRESSED | Payload is compressed (LZ4 or ZSTD) |
| 1 | ENCRYPTED | Payload is encrypted |
| 2 | SIGNED | Signature footer follows payload |
| 3 | SEALED | Immutable (compaction output) |
| 4 | PARTIAL | Streaming / partial write |
| 5 | TOMBSTONE | Logical deletion marker |
| 6 | HOT | Temperature-promoted |
| 7 | OVERLAY | Contains delta data |
| 8 | SNAPSHOT | Full snapshot |
| 9 | CHECKPOINT | Safe rollback point |
| 10 | ATTESTED | Produced inside attested TEE |
| 11 | HAS_LINEAGE | File carries FileIdentity lineage data |
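Each row maps to one bit of the 16-bit flags field, so testing a flag is a single mask. A small sketch with hypothetical constant and function names:

```rust
pub const COMPRESSED: u16 = 1 << 0;
pub const ENCRYPTED: u16 = 1 << 1;
pub const SIGNED: u16 = 1 << 2;
pub const SEALED: u16 = 1 << 3;
pub const PARTIAL: u16 = 1 << 4;
pub const TOMBSTONE: u16 = 1 << 5;
pub const HOT: u16 = 1 << 6;
pub const OVERLAY: u16 = 1 << 7;
pub const SNAPSHOT: u16 = 1 << 8;
pub const CHECKPOINT: u16 = 1 << 9;
pub const ATTESTED: u16 = 1 << 10;
pub const HAS_LINEAGE: u16 = 1 << 11;

/// True if any bit of `flag` is set in `flags`.
pub fn has_flag(flags: u16, flag: u16) -> bool {
    flags & flag != 0
}

/// List the names of all set flags, lowest bit first.
pub fn flag_names(flags: u16) -> Vec<&'static str> {
    const NAMES: [&str; 12] = [
        "COMPRESSED", "ENCRYPTED", "SIGNED", "SEALED", "PARTIAL", "TOMBSTONE",
        "HOT", "OVERLAY", "SNAPSHOT", "CHECKPOINT", "ATTESTED", "HAS_LINEAGE",
    ];
    NAMES
        .iter()
        .enumerate()
        .filter(|&(bit, _)| flags & (1u16 << bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

For example, a compressed, signed, attested segment carries `COMPRESSED | SIGNED | ATTESTED`, which is 0x0405.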
Crash Safety: Two-fsync Protocol
RVF doesn't need a write-ahead log. Instead:
- Write data segment + payload, then fsync
- Write MANIFEST_SEG with updated state, then fsync
If the process crashes between fsyncs, the incomplete segment has no manifest reference — it's ignored on recovery. Simple, safe, fast.
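The two-step ordering can be sketched with the standard library alone. `append_with_two_fsyncs` is a hypothetical helper, not an RVF API, and it assumes the caller passes fully encoded segments (header, payload, and padding).

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append a data segment, fsync, then append the manifest segment, fsync.
pub fn append_with_two_fsyncs(
    path: &Path,
    data_segment: &[u8],
    manifest_segment: &[u8],
) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // Step 1: the data becomes durable, but no manifest references it yet.
    file.write_all(data_segment)?;
    file.sync_all()?;

    // Step 2: the new manifest makes the data segment visible to readers.
    file.write_all(manifest_segment)?;
    file.sync_all()?;
    Ok(())
}
```

A crash after step 1 leaves bytes at the tail that no manifest points to, which is the case the recovery rule above ignores.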
Tail-Scan
To find the current state, scan backward from the end of the file for the latest MANIFEST_SEG. The root manifest fits in 4 KB, so cold boot takes < 5 ms.
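A backward scan might look like the sketch below. It is not the library's `find_latest_manifest`. It assumes headers sit on 64-byte boundaries, that the magic is stored little-endian, and that a magic plus seg_type match is enough, whereas a real reader would also validate the content hash to rule out payload bytes that happen to match.

```rust
const MAGIC: [u8; 4] = 0x5256_4653u32.to_le_bytes();
const ALIGN: usize = 64;
const MANIFEST_TYPE: u8 = 0x05;

/// Return the offset of the last aligned header that looks like a manifest.
pub fn find_latest_manifest_offset(file: &[u8]) -> Option<usize> {
    if file.len() < ALIGN {
        return None;
    }
    // Start at the last 64-byte slot that fits entirely inside the file.
    let mut offset = (file.len() - ALIGN) / ALIGN * ALIGN;
    loop {
        let header = &file[offset..offset + ALIGN];
        if header[0..4] == MAGIC && header[5] == MANIFEST_TYPE {
            return Some(offset);
        }
        if offset == 0 {
            return None;
        }
        offset -= ALIGN;
    }
}
```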
Run the Example
cargo run --example wire_format
You'll see three segments written, read back, and hash-validated, followed by corruption detection and a tail-scan for the manifest.
Tutorial: Metadata Filtering Patterns
Available Filter Expressions
| Expression | Syntax | Description |
|---|---|---|
| Eq | FilterExpr::Eq(field_id, value) | Exact match |
| Ne | FilterExpr::Ne(field_id, value) | Not equal |
| Gt | FilterExpr::Gt(field_id, value) | Greater than |
| Lt | FilterExpr::Lt(field_id, value) | Less than |
| Range | FilterExpr::Range(field_id, low, high) | Value in [low, high) |
| In | FilterExpr::In(field_id, values) | Value is one of |
| And | FilterExpr::And(vec![...]) | All conditions must match |
| Or | FilterExpr::Or(vec![...]) | Any condition matches |
Metadata Types
| Type | Rust | Use Case |
|---|---|---|
| String | MetadataValue::String("cat".into()) | Categories, labels, tags |
| U64 | MetadataValue::U64(95) | Scores, counts, timestamps |
| Bytes | MetadataValue::Bytes(vec![...]) | Binary data, hashes |
Common Patterns
Category filter:
FilterExpr::Eq(0, FilterValue::String("science".into()))
Score range:
FilterExpr::Range(1, FilterValue::U64(30), FilterValue::U64(90))
Multi-category:
FilterExpr::In(0, vec![
FilterValue::String("science".into()),
FilterValue::String("tech".into()),
])
Combined (AND):
FilterExpr::And(vec![
FilterExpr::Eq(0, FilterValue::String("science".into())),
FilterExpr::Gt(1, FilterValue::U64(80)),
])
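To make the semantics of these expressions concrete, here is a toy evaluator over a single record. `Filter`, `Value`, and `matches` are hypothetical stand-ins, not the RVF `FilterExpr` / `FilterValue` types. The sketch assumes u16 field IDs, orders only U64 values, and treats a missing field as a non-match.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U64(u64),
    Text(String),
}

pub enum Filter {
    Eq(u16, Value),
    Ne(u16, Value),
    Gt(u16, Value),
    Lt(u16, Value),
    Range(u16, Value, Value),
    In(u16, Vec<Value>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Compare a U64 field against a U64 bound; any other combination fails.
fn cmp_u64(
    record: &HashMap<u16, Value>,
    id: &u16,
    bound: &Value,
    op: impl Fn(u64, u64) -> bool,
) -> bool {
    match (record.get(id), bound) {
        (Some(Value::U64(a)), Value::U64(b)) => op(*a, *b),
        _ => false,
    }
}

/// Evaluate `filter` against one record's metadata.
pub fn matches(filter: &Filter, record: &HashMap<u16, Value>) -> bool {
    match filter {
        Filter::Eq(id, v) => record.get(id) == Some(v),
        Filter::Ne(id, v) => record.get(id).map_or(false, |x| x != v),
        Filter::Gt(id, v) => cmp_u64(record, id, v, |a, b| a > b),
        Filter::Lt(id, v) => cmp_u64(record, id, v, |a, b| a < b),
        // Half-open interval: low is inclusive, high is exclusive.
        Filter::Range(id, lo, hi) => {
            cmp_u64(record, id, lo, |a, b| a >= b) && cmp_u64(record, id, hi, |a, b| a < b)
        }
        Filter::In(id, vs) => record.get(id).map_or(false, |x| vs.contains(x)),
        Filter::And(fs) => fs.iter().all(|f| matches(f, record)),
        Filter::Or(fs) => fs.iter().any(|f| matches(f, record)),
    }
}
```

Note the half-open range: a record with score 90 does not match Range(1, 30, 90) in this sketch.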
Run the Example
cargo run --example filtered_search
The example creates 500 vectors with category and score metadata, then runs 7 different filter queries showing selectivity and verification.
Tutorial: Progressive Index Recall Measurement
What Is Recall?
Recall@K is the fraction of the true K nearest neighbors that your approximate algorithm actually returns. A recall of 0.95 means 95% of the true neighbors appear in the results.
recall@K = |approximate_results ∩ exact_results| / K
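The formula is a set intersection divided by K. A minimal sketch, assuming vector IDs are u64 and both result lists are ordered best-first (`recall_at_k` is a hypothetical helper, not part of the example crate):

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k IDs that also appear in the approximate top-k.
pub fn recall_at_k(approximate: &[u64], exact: &[u64], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    let hits = approximate
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / k as f64
}
```

Order within the top K does not matter: if the approximate search returns IDs [1, 2, 3, 4] and the exact neighbors are [1, 2, 5, 6], recall@4 is 2 / 4 = 0.5.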
How Progressive Indexing Achieves This
RVF builds an HNSW (Hierarchical Navigable Small World) graph, then splits it into three loadable layers:
Layer A: Coarse Routing
- Entry points (topmost HNSW nodes)
- Partition centroids for guided search
- Loads in microseconds
- Recall: ~0.40-0.70
Layer B: Hot Region
- Adjacency lists for the most frequently accessed vectors
- Covers the "working set" of your data
- Recall: ~0.70-0.85
Layer C: Full Graph
- Complete HNSW adjacency for all vectors
- Loaded in background while queries are already being served
- Recall: >= 0.95
Measuring Recall in the Example
The progressive_index example:
- Generates 5,000 vectors (128 dims)
- Builds the full HNSW graph (M=16, ef_construction=200)
- Splits into Layer A, B, C
- Runs 50 queries at each stage
- Computes recall@10 against brute-force ground truth
cargo run --example progressive_index
Expected output:
=== Recall Progression Summary ===
Layers Recall@10
A only 0.xxx
A + B 0.xxx
A + B + C 0.9xx
Tuning ef_search
The ef_search parameter controls how many candidates HNSW explores during search. Higher values improve recall at the cost of latency:
| ef_search | Recall@10 | Relative Speed |
|---|---|---|
| 10 | ~0.75 | Fastest |
| 50 | ~0.90 | Balanced |
| 200 | ~0.97 | Most accurate |
Technical Reference: Signature Footer Format
When the SIGNED flag is set on a segment, a signature footer follows the payload:
| Offset | Size | Field |
|---|---|---|
| 0x00 | 2 | sig_algo (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s) |
| 0x02 | 2 | sig_length |
| 0x04 | var | signature (64 to 7,856 bytes) |
| var | 4 | footer_length (for backward scan) |
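The trailing footer_length is what makes a backward scan possible: read the last four bytes, step back by that amount, and decode. The sketch below assumes footer_length counts the whole footer including itself, that fields are little-endian, and that no padding follows the footer in the buffer. `SignatureFooter`, `encode_footer`, and `decode_footer_from_end` are hypothetical names.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
pub struct SignatureFooter {
    pub sig_algo: u16, // 0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s
    pub signature: Vec<u8>,
}

/// Serialize as sig_algo | sig_length | signature | footer_length.
/// Assumes the signature fits in a u16 length (true for the sizes listed).
pub fn encode_footer(f: &SignatureFooter) -> Vec<u8> {
    let total = (2 + 2 + f.signature.len() + 4) as u32;
    let mut out = Vec::with_capacity(total as usize);
    out.extend_from_slice(&f.sig_algo.to_le_bytes());
    out.extend_from_slice(&(f.signature.len() as u16).to_le_bytes());
    out.extend_from_slice(&f.signature);
    out.extend_from_slice(&total.to_le_bytes());
    out
}

/// Find and decode the footer by reading footer_length from the end.
pub fn decode_footer_from_end(segment: &[u8]) -> Option<SignatureFooter> {
    let end = segment.len();
    let len_bytes: [u8; 4] = segment.get(end.checked_sub(4)?..)?.try_into().ok()?;
    let total = u32::from_le_bytes(len_bytes) as usize;
    if total < 8 {
        return None; // too small to hold the fixed fields
    }
    let footer = &segment[end.checked_sub(total)?..end];
    let sig_algo = u16::from_le_bytes([footer[0], footer[1]]);
    let sig_len = u16::from_le_bytes([footer[2], footer[3]]) as usize;
    if 4 + sig_len + 4 != total {
        return None; // inconsistent lengths
    }
    Some(SignatureFooter {
        sig_algo,
        signature: footer[4..4 + sig_len].to_vec(),
    })
}
```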
Supported Algorithms
| Algorithm | Signature Size | Security Level | Standard |
|---|---|---|---|
| Ed25519 | 64 bytes | 128-bit classical | RFC 8032 |
| ML-DSA-65 | 3,309 bytes | NIST Level 3 (post-quantum) | FIPS 204 |
| SLH-DSA-128s | 7,856 bytes | NIST Level 1 (post-quantum, stateless) | FIPS 205 |
Signing Flow
- Serialize the segment header (64 bytes) and payload into a signing buffer
- Compute SHAKE-256 hash of the buffer
- Sign the hash with the chosen algorithm
- Append the signature footer after the payload (before padding)
- Set the SIGNED flag in the header
Verification Flow
- Read segment header and payload
- Recompute SHAKE-256 hash of header + payload
- Read signature footer (scan backward from segment end using footer_length)
- Verify signature against the public key
Technical Reference: Confidential Core Attestation
Overview
RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This provides cryptographic proof that:
- The platform is genuine (e.g., real Intel SGX hardware)
- The code running inside the enclave matches a known measurement
- Encryption keys are sealed to the enclave identity
- Vector operations were computed inside the secure environment
Supported TEE Platforms
| Platform | Enum Value | Quote Format |
|---|---|---|
| Intel SGX | TeePlatform::Sgx (0) | DCAP attestation quote |
| AMD SEV-SNP | TeePlatform::SevSnp (1) | VCEK attestation report |
| Intel TDX | TeePlatform::Tdx (2) | TD quote |
| ARM CCA | TeePlatform::ArmCca (3) | CCA token |
| Software (testing) | TeePlatform::SoftwareTee (0xFE) | Synthetic (no hardware) |
Attestation Header (112 bytes, repr(C))
Offset Size Field
------ ---- -----
0x00 1 platform TeePlatform enum value
0x01 1 attestation_type AttestationWitnessType enum value
0x02 4 quote_length Length of the platform-specific quote
0x06 2 reserved
0x08 32 measurement SHAKE-256 hash of enclave code
0x28 32 signer_id SHAKE-256 hash of signing identity
0x48 8 timestamp_ns Nanosecond UNIX timestamp
0x50 16 nonce Anti-replay nonce
0x60 2 svn Security Version Number
0x62 1 sig_algo Signature algorithm for the quote
0x63 1 flags Attestation flags
0x64 4 report_data_len Length of additional report data
0x68 8 reserved
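Because quote_length sits at offset 0x02, which is not 4-byte aligned, reading fields byte-by-byte from the offsets above is a straightforward way to decode the header without depending on struct layout. A sketch that extracts a few fields follows. `AttestationSummary` and `read_attestation_header` are hypothetical names, and little-endian encoding is assumed.

```rust
pub const ATTESTATION_HEADER_SIZE: usize = 112;

/// Hypothetical subset of the header fields, decoded for inspection.
#[derive(Debug, PartialEq)]
pub struct AttestationSummary {
    pub platform: u8,
    pub attestation_type: u8,
    pub quote_length: u32,
    pub measurement: [u8; 32],
    pub timestamp_ns: u64,
    pub svn: u16,
}

/// Decode selected fields from the fixed offsets in the layout above.
pub fn read_attestation_header(buf: &[u8; ATTESTATION_HEADER_SIZE]) -> AttestationSummary {
    let mut quote_len = [0u8; 4];
    quote_len.copy_from_slice(&buf[0x02..0x06]);
    let mut measurement = [0u8; 32];
    measurement.copy_from_slice(&buf[0x08..0x28]);
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&buf[0x48..0x50]);
    AttestationSummary {
        platform: buf[0x00],
        attestation_type: buf[0x01],
        quote_length: u32::from_le_bytes(quote_len),
        measurement,
        timestamp_ns: u64::from_le_bytes(ts),
        svn: u16::from_le_bytes([buf[0x60], buf[0x61]]),
    }
}
```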
Attestation Types
| Type | Witness Code | Purpose |
|---|---|---|
| Platform Attestation | 0x05 | TEE identity + measurement verification |
| Key Binding | 0x06 | Keys sealed to enclave measurement |
| Computation Proof | 0x07 | Proof that operations ran inside enclave |
| Data Provenance | 0x08 | Full chain: model -> TEE -> RVF file |
ATTESTED Segment Flag
Any segment produced inside a TEE should set bit 10 (ATTESTED) in the segment header flags. This enables fast scanning to identify attested segments without parsing payloads.
QuoteVerifier Trait
The verification interface is pluggable:
pub trait QuoteVerifier {
fn platform(&self) -> TeePlatform;
fn verify_quote(
&self,
quote: &[u8],
report_data: &[u8],
expected_measurement: &[u8; 32],
) -> Result<(), String>;
}
Implement this trait for your TEE platform to enable hardware-backed verification. The SoftwareTee variant allows testing without real hardware.
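As an illustration of plugging in a verifier, the sketch below re-declares the trait and enum locally so that it compiles on its own, then implements a test-only verifier. The synthetic quote layout (a 32-byte measurement followed by the report data) is an assumption made for this example, not the format the RVF crates use for SoftwareTee.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TeePlatform {
    Sgx = 0,
    SevSnp = 1,
    Tdx = 2,
    ArmCca = 3,
    SoftwareTee = 0xFE,
}

pub trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}

/// Hypothetical test verifier: no hardware, no signatures, just byte checks.
pub struct SoftwareVerifier;

impl QuoteVerifier for SoftwareVerifier {
    fn platform(&self) -> TeePlatform {
        TeePlatform::SoftwareTee
    }

    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String> {
        if quote.len() < 32 {
            return Err("quote too short".to_string());
        }
        // Assumed synthetic layout: measurement || report_data.
        let (measurement, rest) = quote.split_at(32);
        if measurement != &expected_measurement[..] {
            return Err("measurement mismatch".to_string());
        }
        if rest != report_data {
            return Err("report data mismatch".to_string());
        }
        Ok(())
    }
}
```

A hardware-backed implementation would replace the byte comparisons with the platform's own quote verification.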
Technical Reference: Computational Container (Self-Booting RVF)
Three-Tier Execution Model
RVF files can optionally carry executable compute alongside vector data:
| Tier | Segment | Size | Environment | Boot Time | Use Case |
|---|---|---|---|---|---|
| 1: WASM | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere |
| 2: eBPF | EBPF_SEG (0x0F) | 10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits |
| 3: Unikernel | KERNEL_SEG (0x0E) | 200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service |
KernelHeader (128 bytes)
| Field | Size | Description |
|---|---|---|
| kernel_magic | 4 | 0x52564B4E ("RVKN") |
| header_version | 2 | Currently 1 |
| kernel_arch | 1 | x86_64 (0), AArch64 (1), RISC-V (2), WASM (3) |
| kernel_type | 1 | HermitOS (0), Unikraft (1), Custom (2), TestStub (0xFE) |
| image_size | 4 | Uncompressed kernel size |
| compressed_size | 4 | Compressed (ZSTD) size |
| image_hash | 32 | SHAKE-256-256 of uncompressed image |
| api_port | 2 | HTTP API port (network byte order) |
| api_transport | 1 | HTTP (0), gRPC (1), virtio-vsock (2) |
| kernel_flags | 8 | Feature flags (read-only, metrics, TEE, etc.) |
| cmdline_len | 2 | Length of kernel command line |
EbpfHeader (64 bytes)
| Field | Size | Description |
|---|---|---|
| ebpf_magic | 4 | 0x52564250 ("RVBP") |
| program_type | 1 | XDP (0), TC (1), Tracepoint (2), Socket (3) |
| attach_type | 1 | XdpIngress (0), TcIngress (1), etc. |
| max_dimension | 4 | Maximum vector dimension (eBPF verifier loop bound) |
| bytecode_size | 4 | Size of BPF ELF object |
| btf_size | 4 | Size of BTF section |
| map_count | 4 | Number of BPF maps |
Embedding and Extracting
use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};
let mut store = RvfStore::open("vectors.rvf")?;
// Embed a kernel
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &image, 8080)?;
// Embed an eBPF program
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;
// Extract later
let (kernel_hdr, kernel_img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, ebpf_prog) = store.extract_ebpf()?.unwrap();
Forward Compatibility
Files with KERNEL_SEG or EBPF_SEG work with older readers -- unknown segment types are skipped per the RVF forward-compatibility rule. The computational capability is purely additive.
See ADR-030 for the full specification.
Technical Reference: DNA-Style Lineage Provenance
How Lineage Works
Every RVF file carries a 68-byte FileIdentity in its root manifest:
| Field | Size | Description |
|---|---|---|
| file_id | 16 | Unique UUID for this file |
| parent_id | 16 | UUID of the parent file (all zeros for root) |
| parent_hash | 32 | SHAKE-256-256 of parent's manifest |
| lineage_depth | 4 | Generation count (0 for root) |
Derivation Chain
Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
file_id: A file_id: B file_id: C
parent_id: [0;16] parent_id: A parent_id: B
parent_hash: [0;32] parent_hash: hash(A) parent_hash: hash(B)
depth: 0 depth: 1 depth: 2
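The chain above can be modelled with a small struct. This is a sketch, not the rvf-types `FileIdentity`: `Identity` and its methods are hypothetical, the 68-byte serialization assumes the field order of the table with a little-endian depth, and the parent manifest hash is supplied by the caller rather than computed here.

```rust
/// Hypothetical mirror of the 68-byte file identity.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub file_id: [u8; 16],
    pub parent_id: [u8; 16],
    pub parent_hash: [u8; 32],
    pub lineage_depth: u32,
}

impl Identity {
    /// A root file has a zeroed parent and depth 0.
    pub fn root(file_id: [u8; 16]) -> Self {
        Identity { file_id, parent_id: [0; 16], parent_hash: [0; 32], lineage_depth: 0 }
    }

    /// Derive a child. `parent_manifest_hash` is the caller-computed
    /// hash of this file's manifest.
    pub fn derive_child(&self, child_id: [u8; 16], parent_manifest_hash: [u8; 32]) -> Self {
        Identity {
            file_id: child_id,
            parent_id: self.file_id,
            parent_hash: parent_manifest_hash,
            lineage_depth: self.lineage_depth + 1,
        }
    }

    /// Check ID, manifest hash, and depth against a claimed parent.
    pub fn is_child_of(&self, parent: &Identity, parent_manifest_hash: &[u8; 32]) -> bool {
        self.parent_id == parent.file_id
            && &self.parent_hash == parent_manifest_hash
            && self.lineage_depth == parent.lineage_depth + 1
    }

    /// 16 + 16 + 32 + 4 = 68 bytes.
    pub fn to_bytes(&self) -> [u8; 68] {
        let mut out = [0u8; 68];
        out[0..16].copy_from_slice(&self.file_id);
        out[16..32].copy_from_slice(&self.parent_id);
        out[32..64].copy_from_slice(&self.parent_hash);
        out[64..68].copy_from_slice(&self.lineage_depth.to_le_bytes());
        out
    }
}
```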
Derivation Types
| Code | Type | Description |
|---|---|---|
| 0 | Clone | Exact copy |
| 1 | Filter | Subset of parent's vectors |
| 2 | Merge | Multi-parent merge |
| 3 | Quantize | Re-quantized version |
| 4 | Reindex | Re-indexed with different parameters |
| 5 | Transform | Transformed embeddings |
| 6 | Snapshot | Point-in-time snapshot |
| 0xFF | UserDefined | Application-specific derivation |
Using the API
use rvf_runtime::RvfStore;
use rvf_types::DerivationType;
let parent = RvfStore::create("parent.rvf", options)?;
// Derive a filtered child
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
Domain Extensions
| Extension | Domain Profile | Optimized For |
|---|---|---|
| .rvf | Generic | General-purpose vectors |
| .rvdna | RVDNA | Genomic sequence embeddings |
| .rvtext | RVText | Language model embeddings |
| .rvgraph | RVGraph | Graph/network node embeddings |
| .rvvis | RVVision | Image/vision model embeddings |
See ADR-029 for the full format specification.
Technical Reference: Crate Architecture
Crate Map
+-----------------------------------------+
| Cognitive Layer |
| ruvllm | gnn | ruQu | attention | sona |
| mincut | prime-radiant | nervous-system |
+---+-------------+---------------+-------+
| | |
+-----------------------------------------+
| Application Layer |
| claude-flow | agentdb | agentic-flow |
| ospipe | rvlite | sona | your-app |
+---+-------------+---------------+-------+
| | |
+---v-------------v---------------v-------+
| RVF SDK Layer |
| rvf-runtime | rvf-index | rvf-quant |
| rvf-manifest | rvf-crypto | rvf-wire |
+---+-------------+---------------+-------+
| | |
+--------v------+ +---v--------+ +----v-------+ +----v------+
| rvf-server | | rvf-node | | rvf-wasm | | rvf-cli |
| HTTP + TCP | | N-API | | ~46 KB | | clap |
+---------------+ +------------+ +------------+ +-----------+
Crate Details
| Crate | Lines | no_std | Purpose |
|---|---|---|---|
| rvf-types | 3,184 | Yes | Segment types, kernel/eBPF headers, lineage, enums |
| rvf-wire | 2,011 | Yes | Wire format read/write, hash validation |
| rvf-manifest | 1,580 | No | Two-level manifest with 4 KB root, FileIdentity codec |
| rvf-index | 2,691 | No | HNSW progressive indexing (Layer A/B/C) |
| rvf-quant | 1,443 | No | Scalar, product, and binary quantization |
| rvf-crypto | 1,725 | Partial | SHAKE-256, Ed25519, witness chains, attestation, lineage |
| rvf-runtime | 3,607 | No | Full store API, compaction, lineage, kernel/eBPF embed |
| rvf-import | 980 | No | JSON, CSV, NumPy (.npy) importers |
| rvf-wasm | 1,616 | Yes | WASM control plane: in-memory store, query, segment inspection |
| rvf-node | 852 | No | Node.js N-API bindings with lineage, kernel/eBPF, inspection |
| rvf-cli | 665 | No | Unified CLI: create, ingest, query, delete, status, inspect, compact, derive, serve |
| rvf-server | 1,165 | No | HTTP REST + TCP streaming server |
Library Adapters
| Adapter | Purpose | Key Feature |
|---|---|---|
| rvf-adapter-claude-flow | AI agent memory | WITNESS_SEG audit trails |
| rvf-adapter-agentdb | Agent vector database | Progressive HNSW indexing |
| rvf-adapter-ospipe | Observation-State pipeline | META_SEG for state vectors |
| rvf-adapter-agentic-flow | Swarm coordination | Inter-agent memory sharing |
| rvf-adapter-rvlite | Lightweight embedded store | Minimal API, edge-friendly |
| rvf-adapter-sona | Neural architecture | Experience replay + trajectories |
Technical Reference: File Format Specification
File Extension
| Extension | Usage |
|---|---|
| .rvf | Standard RuVector Format file |
| .rvf.cold.N | Cold shard N (multi-file mode) |
| .rvf.idx.N | Index shard N (multi-file mode) |
MIME Type
application/x-ruvector-format
Magic Number
0x52564653 (ASCII: "RVFS")
Byte Order
All multi-byte integers are little-endian.
Alignment
All segments are 64-byte aligned (cache-line friendly). Payloads are padded to the next 64-byte boundary.
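Rounding a payload length up to the next 64-byte boundary is a one-line calculation. A small sketch with hypothetical helper names:

```rust
pub const SEGMENT_ALIGN: u64 = 64;

/// Round `len` up to the next multiple of 64 (lengths already aligned are unchanged).
pub fn padded_len(len: u64) -> u64 {
    (len + SEGMENT_ALIGN - 1) / SEGMENT_ALIGN * SEGMENT_ALIGN
}

/// Number of padding bytes that follow a payload of `len` bytes.
pub fn padding_for(len: u64) -> u64 {
    padded_len(len) - len
}
```

For example, a 100-byte payload is padded to 128 bytes, so 28 padding bytes follow it.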
Root Manifest
The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via backward scan:
let (offset, header) = find_latest_manifest(&file_data)?;
The root manifest provides:
- Segment directory (offsets to all segments)
- Hotset pointers (entry points, top layer, centroids, quant dicts)
- Epoch counter
- Vector count and dimension
- Profile identifiers
Domain Profiles
| Profile | Code | Optimized For |
|---|---|---|
| Generic | 0x00 | General-purpose vectors |
| RVDNA | 0x01 | Genomic sequence embeddings |
| RVText | 0x02 | Language model embeddings |
| RVGraph | 0x03 | Graph/network node embeddings |
| RVVision | 0x04 | Image/vision model embeddings |
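Combining this table with the domain extensions listed earlier, a tool could guess the expected profile code from a file name. The helper below is a hypothetical convenience, not part of the RVF crates. The profile identifier stored in the root manifest remains the authoritative value.

```rust
use std::path::Path;

/// Guess the domain profile code from the file extension alone.
pub fn profile_code_for(path: &Path) -> Option<u8> {
    match path.extension()?.to_str()? {
        "rvf" => Some(0x00),     // Generic
        "rvdna" => Some(0x01),   // RVDNA
        "rvtext" => Some(0x02),  // RVText
        "rvgraph" => Some(0x03), // RVGraph
        "rvvis" => Some(0x04),   // RVVision
        _ => None,
    }
}
```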
Building from Source
Prerequisites
- Rust 1.87+ via rustup (rustup update stable)
- For WASM: rustup target add wasm32-unknown-unknown
- For Node.js bindings: Node.js 18+ and npm
Build Examples
cd examples/rvf
cargo build
Build All RVF Crates
cd crates/rvf
cargo build --workspace
Run All Tests
cd crates/rvf
cargo test --workspace
Run Clippy
cd crates/rvf
cargo clippy --all-targets --workspace --exclude rvf-wasm
Build WASM Microkernel
cd crates/rvf
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm
Build Node.js Bindings
cd crates/rvf/rvf-node
npm install && npm run build
Run Benchmarks
cd crates/rvf
cargo bench --bench rvf_benchmarks
Project Structure
examples/rvf/
Cargo.toml # Standalone workspace
src/lib.rs # Shared utilities
examples/
# Core (6)
basic_store.rs # Store lifecycle, insert, query, persistence
progressive_index.rs # Three-layer HNSW, recall measurement
quantization.rs # Scalar, product, binary quantization + tiering
wire_format.rs # Raw segment I/O, hash validation, tail-scan
crypto_signing.rs # Ed25519 signing, witness chains, tamper detection
filtered_search.rs # Metadata-filtered vector search
# Agentic AI (6)
agent_memory.rs # Persistent agent memory + witness audit
swarm_knowledge.rs # Multi-agent shared knowledge base
reasoning_trace.rs # Chain-of-thought with lineage derivation
tool_cache.rs # Tool call result cache with TTL + compaction
agent_handoff.rs # Transfer agent state between instances
experience_replay.rs # RL experience replay buffer
# Practical Production (5)
semantic_search.rs # Document search engine (4 filter workflows)
recommendation.rs # Item recommendations (collaborative filtering)
rag_pipeline.rs # Retrieval-augmented generation pipeline
embedding_cache.rs # LRU cache with temperature tiering
dedup_detector.rs # Near-duplicate detection + compaction
# Vertical Domains (4)
genomic_pipeline.rs # DNA k-mer search (.rvdna profile)
financial_signals.rs # Market signals with attestation
medical_imaging.rs # Radiology embedding search (.rvvis)
legal_discovery.rs # Legal document similarity (.rvtext)
# Exotic Capabilities (5)
self_booting.rs # RVF with embedded unikernel
ebpf_accelerator.rs # eBPF hot-path acceleration
hyperbolic_taxonomy.rs # Hierarchy-aware search
multimodal_fusion.rs # Cross-modal text + image search
sealed_engine.rs # Full cognitive engine (capstone)
# Runtime Targets + Postgres (5)
browser_wasm.rs # Browser-side WASM vector search
edge_iot.rs # IoT device with binary quantization
serverless_function.rs # Cold-start optimized for Lambda
ruvllm_inference.rs # LLM KV cache + LoRA via RVF
postgres_bridge.rs # PostgreSQL ↔ RVF export/import
# Network & Security (4)
network_sync.rs # Peer-to-peer vector store sync
tee_attestation.rs # TEE attestation + sealed keys
access_control.rs # Role-based vector access control
zero_knowledge.rs # Zero-knowledge proof integration
# Autonomous Agent (1)
ruvbot.rs # Autonomous RVF-powered agent bot
# POSIX & Systems (3)
posix_fileops.rs # POSIX file operations with RVF
linux_microkernel.rs # Linux microkernel distribution
mcp_in_rvf.rs # MCP server embedded in RVF
# Network Operations (1)
network_interfaces.rs # Network OS telemetry (60 interfaces)
Learn More
| Resource | Description |
|---|---|
| RVF Format Specification | Full format documentation, architecture, and API reference |
| ADR-029 | Architecture decision record for the canonical format |
| ADR-030 | Computational container (KERNEL_SEG, EBPF_SEG) specification |
| ADR-031 | Example repository design (this collection of 40 examples) |
| Benchmarks | Performance benchmarks (HNSW build, quantization, wire I/O) |
| Integration Tests | E2E test suite (progressive recall, quantization, wire interop) |
Contributing
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
cargo build && cargo run --example basic_store
All contributions must pass cargo clippy with zero warnings and maintain the existing test count (currently 543+).
License
Dual-licensed under MIT or Apache-2.0 at your option.
Built with Rust. One file — store it, send it, run it.