RVF Examples — Learn by Running

Hands-on examples for the unified agentic AI format — store it, send it, run it


---

## What is RVF?

**RVF (RuVector Format)** is the unified agentic AI file format. One `.rvf` file does three jobs:

1. **Store**: vectors, indexes, metadata, and cryptographic proofs live in one file. No database server required.
2. **Transfer**: the same file streams over a network. Query, insert, and delete operations work over the wire with zero conversion.
3. **Run**: pack model weights, graph neural networks, WASM code, or even a bootable OS kernel into the file. Now it's not just data; it's a self-contained intelligence unit you can deploy anywhere.

### Why does this matter?

Today, an AI agent's state is scattered: embeddings in one database, model weights in another, graph structure in a third, config in a fourth. Nothing talks to anything else. Moving between tools means re-indexing from scratch. There's no standard way to prove any of it was computed securely, and no way to hand an agent its complete knowledge as a single portable artifact.

RVF solves this. It gives agentic AI a **universal substrate**, one file that works everywhere:

| What it does | Where it runs | What you get |
|-------------|--------------|-------------|
| Stores vectors | Server (HNSW index) | Sub-millisecond search over millions of vectors |
| Stores vectors | Browser (5.5 KB WASM) | Same file, no backend needed |
| Stores vectors | Edge / IoT / mobile | Lightweight API, tiny footprint |
| Transfers data | Over the network | Batched query/ingest/delete via TCP |
| Runs code | Inside a TEE | Cryptographic proof of secure computation |
| Runs code | Bare metal / VM | File boots itself as a microservice |
| Runs code | Linux kernel (eBPF) | Sub-microsecond hot-path acceleration |
| Runs intelligence | Anywhere | Model + data + graph + trust chain in one file |

### Key properties

- **Crash-safe**: no write-ahead log needed; if power dies mid-write, the file stays consistent
- **Self-describing**: the schema is in the file; no external catalog required
- **Progressive loading**: start answering queries before the full index is loaded
- **Domain profiles**: `.rvdna` for genomics, `.rvtext` for language, `.rvgraph` for networks, `.rvvis` for vision, all the same format underneath
- **Lineage tracking**: every derived file records its parent's hash, like DNA inheritance
- **Tamper-evident**: witness chains and post-quantum signatures prove nothing was altered

These examples walk you through every major feature, from the simplest "insert and query" to wire format inspection, witness chains, and sealed cognitive engines.

### What you can build with RVF

| Use case | What goes in the file | Result |
|----------|----------------------|--------|
| **Semantic search** | Vectors + HNSW index | Single-file vector database, no server needed |
| **Agent memory** | Vectors + metadata + witness chain | Portable, auditable AI agent knowledge base |
| **Sealed LoRA distribution** | Base embeddings + OVERLAY_SEG adapter deltas | Ship fine-tuned models as one versioned file |
| **Portable graph intelligence** | Node embeddings + GRAPH_SEG adjacency | GNN state that transfers between systems |
| **Self-booting AI service** | Vectors + index + KERNEL_SEG unikernel | File boots as a microservice on bare metal or Firecracker |
| **Kernel-accelerated cache** | Hot vectors + EBPF_SEG XDP program | Sub-microsecond lookups in the Linux kernel data path |
| **Confidential AI** | Any of the above + TEE attestation | Cryptographic proof everything ran inside a secure enclave |
| **Genomic analysis** | DNA k-mer embeddings + variant tensors | `.rvdna` file with lineage tracking across analysis pipeline |
| **Firmware-style AI versioning** | Full cognitive state + lineage chain | Parent → child derivation with hash verification, like DNA |

---

## Quick Start

```bash
# Clone the repo
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf

# Run your first example
cargo run --example basic_store
```

That's it. You'll see a store created, 100 vectors inserted, nearest neighbors found, and persistence verified, all in under a second.

### Using the CLI

You can also work with RVF stores from the command line without writing any Rust:

```bash
# Build the CLI
cd crates/rvf && cargo build -p rvf-cli

# Create a store, ingest data, and query
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf
rvf compact vectors.rvf

# Derive a child store with lineage tracking
rvf derive parent.rvf child.rvf --type filter

# All commands support --json for machine-readable output
rvf status vectors.rvf --json
```
### Run All 40 Examples

**Core (6):**

```bash
cargo run --example basic_store          # Store lifecycle + k-NN
cargo run --example progressive_index    # Three-layer HNSW recall
cargo run --example quantization         # Scalar / product / binary
cargo run --example wire_format          # Raw segment I/O
cargo run --example crypto_signing       # Ed25519 + witness chains
cargo run --example filtered_search      # Metadata-filtered queries
```

**Agentic AI (6):**

```bash
cargo run --example agent_memory         # Persistent agent memory + witness audit
cargo run --example swarm_knowledge      # Multi-agent shared knowledge base
cargo run --example reasoning_trace      # Chain-of-thought with lineage derivation
cargo run --example tool_cache           # Tool call result cache with TTL
cargo run --example agent_handoff        # Transfer agent state between instances
cargo run --example experience_replay    # RL experience replay buffer
```

**Practical Production (5):**

```bash
cargo run --example semantic_search      # Document search with metadata filters
cargo run --example recommendation       # Item recommendations (collaborative filtering)
cargo run --example rag_pipeline         # Retrieval-augmented generation pipeline
cargo run --example embedding_cache      # LRU cache with temperature tiering
cargo run --example dedup_detector       # Near-duplicate detection + compaction
```

**Vertical Domains (4):**

```bash
cargo run --example genomic_pipeline     # DNA k-mer search (.rvdna profile)
cargo run --example financial_signals    # Market signals with TEE attestation
cargo run --example medical_imaging      # Radiology search (.rvvis profile)
cargo run --example legal_discovery      # Legal doc similarity (.rvtext profile)
```

**Exotic Capabilities (5):**

```bash
cargo run --example self_booting         # RVF with embedded unikernel
cargo run --example ebpf_accelerator     # eBPF hot-path acceleration
cargo run --example hyperbolic_taxonomy  # Hierarchy-aware search
cargo run --example multimodal_fusion    # Cross-modal text + image search
cargo run --example sealed_engine        # Full cognitive engine (capstone)
```

**Runtime Targets (4) + Postgres (1):**

```bash
cargo run --example browser_wasm         # Browser-side WASM vector search
cargo run --example edge_iot             # IoT device with binary quantization
cargo run --example serverless_function  # Cold-start optimized for Lambda
cargo run --example ruvllm_inference     # LLM KV cache + LoRA via RVF
cargo run --example postgres_bridge      # PostgreSQL ↔ RVF export/import
```

**Network & Security (4):**

```bash
cargo run --example network_sync         # Peer-to-peer vector store sync
cargo run --example tee_attestation      # TEE attestation + sealed keys
cargo run --example access_control       # Role-based vector access control
cargo run --example zero_knowledge       # Zero-knowledge proof integration
```

**Autonomous Agent (1):**

```bash
cargo run --example ruvbot               # Autonomous RVF-powered agent bot
```

**POSIX & Systems (3):**

```bash
cargo run --example posix_fileops        # POSIX file operations with RVF
cargo run --example linux_microkernel    # Linux microkernel distribution
cargo run --example mcp_in_rvf           # MCP server embedded in RVF
```

**Network Operations (1):**

```bash
cargo run --example network_interfaces   # Network OS telemetry (60 interfaces)
```
### Prerequisites

- **Rust 1.87+**: install via [rustup](https://rustup.rs/)
- No other dependencies needed; everything builds from source
- All examples use deterministic pseudo-random data, so results are reproducible across runs

---
## Examples at a Glance (40 examples)

### Core

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 1 | basic_store | Beginner | Create, insert, query, persist, reopen |
| 2 | progressive_index | Intermediate | Three-layer HNSW, recall measurement |
| 3 | quantization | Intermediate | Scalar/product/binary quantization, tiering |
| 4 | wire_format | Advanced | Raw segment I/O, hash validation, tail-scan |
| 5 | crypto_signing | Advanced | Ed25519 signing, witness chains, tamper detection |
| 6 | filtered_search | Intermediate | Metadata filters: Eq, Range, AND/OR/IN |

### Agentic AI

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 7 | agent_memory | Intermediate | Persistent agent memory, session recall, witness audit |
| 8 | swarm_knowledge | Intermediate | Multi-agent shared knowledge, cross-agent search |
| 9 | reasoning_trace | Advanced | Chain-of-thought lineage (parent → child → grandchild) |
| 10 | tool_cache | Intermediate | Tool call caching, TTL, delete_by_filter, compaction |
| 11 | agent_handoff | Advanced | Transfer agent state, derive clone, lineage verification |
| 12 | experience_replay | Intermediate | RL replay buffer, priority sampling, tiering |

### Practical Production

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 13 | semantic_search | Beginner | Document search engine, 4 filter workflows |
| 14 | recommendation | Intermediate | Collaborative filtering, genre/quality filters |
| 15 | rag_pipeline | Advanced | 5-step RAG: chunk, embed, retrieve, rerank, assemble |
| 16 | embedding_cache | Advanced | Zipf access patterns, 3-tier quantization, memory savings |
| 17 | dedup_detector | Intermediate | Near-duplicate detection, clustering, compaction |

### Vertical Domains

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 18 | genomic_pipeline | Advanced | DNA k-mer search, `.rvdna` profile, lineage |
| 19 | financial_signals | Advanced | Market signals, Ed25519 signing, attestation |
| 20 | medical_imaging | Intermediate | Radiology search, `.rvvis` profile, audit trail |
| 21 | legal_discovery | Intermediate | Legal similarity, `.rvtext` profile, discovery audit |

### Exotic Capabilities

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 22 | self_booting | Advanced | Embed/extract unikernel, kernel header verification |
| 23 | ebpf_accelerator | Advanced | Embed/extract eBPF, XDP program, co-existence |
| 24 | hyperbolic_taxonomy | Intermediate | Hierarchy-aware embeddings, depth-filtered search |
| 25 | multimodal_fusion | Intermediate | Cross-modal text+image search, modality filtering |
| 26 | sealed_engine | Advanced | Capstone: vectors + kernel + eBPF + witness + lineage |

### Runtime Targets + Postgres

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 27 | browser_wasm | Intermediate | WASM-compatible API, raw wire segments, size targets |
| 28 | edge_iot | Beginner | Constrained device, binary quantization, memory budget |
| 29 | serverless_function | Intermediate | Cold start, manifest tail-scan, progressive loading |
| 30 | ruvllm_inference | Advanced | KV cache + LoRA adapters + policy store via RVF |
| 31 | postgres_bridge | Intermediate | PG export/import, offline query, lineage, witness audit |

### Network & Security

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 32 | network_sync | Advanced | Peer-to-peer sync, vector exchange, conflict resolution |
| 33 | tee_attestation | Advanced | TEE platform attestation, sealed keys, computation proof |
| 34 | access_control | Intermediate | Role-based access, permission checks, audit trails |
| 35 | zero_knowledge | Advanced | ZK proofs for vector operations, privacy-preserving search |

### Autonomous Agent

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 36 | ruvbot | Advanced | Autonomous agent with RVF memory, planning, tool use |

### POSIX & Systems

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 37 | posix_fileops | Intermediate | Raw I/O, atomic rename, locking, segment random access |
| 38 | linux_microkernel | Advanced | Package management, SSH keys, kernel embed, lineage updates |
| 39 | mcp_in_rvf | Advanced | MCP server runtime embedded in RVF, eBPF filter, tools |

### Network Operations

| # | Example | Difficulty | What You'll Learn |
|---|---------|-----------|-------------------|
| 40 | network_interfaces | Intermediate | Multi-chassis telemetry, anomaly detection, filtered queries |
---
## Features Covered

### Storage: vectors in, answers out

| Feature | Example | Description |
|---------|---------|-------------|
| k-NN Search | basic_store | Find nearest neighbors by L2 or cosine distance |
| Persistence | basic_store | Close a store, reopen it, verify results match |
| Metadata Filters | filtered_search | Eq, Ne, Gt, Lt, Range, In, And, Or expressions |
| Combined Filters | filtered_search | Multi-condition queries (category + score range) |

### Indexing: speed vs. accuracy trade-offs

| Feature | Example | Description |
|---------|---------|-------------|
| Progressive Indexing | progressive_index | Three-tier HNSW: Layer A (fast), B (better), C (best) |
| Recall Measurement | progressive_index | Compare approximate results against brute-force ground truth |

### Compression: fit more vectors in less memory

| Feature | Example | Description |
|---------|---------|-------------|
| Scalar Quantization | quantization | fp32 → u8 (4x compression, Hot tier) |
| Product Quantization | quantization | fp32 → PQ codes (8-32x compression, Warm tier) |
| Binary Quantization | quantization | fp32 → 1-bit (32x compression, Cold tier) |
| Temperature Tiering | quantization | Count-Min Sketch access tracking + automatic tier assignment |

### Wire format: what the bytes look like on disk and over the network

| Feature | Example | Description |
|---------|---------|-------------|
| Segment I/O | wire_format | Write/read 64-byte-aligned segments with type/flags/hash |
| Hash Validation | wire_format | CRC32c / XXH3 integrity checks on every segment |
| Tail-Scan | wire_format | Find latest manifest by scanning backward from EOF |

### Trust: signatures, audit trails, and tamper detection

| Feature | Example | Description |
|---------|---------|-------------|
| Ed25519 Signing | crypto_signing | Sign segments, verify signatures, detect tampering |
| Witness Chains | crypto_signing | SHAKE-256 linked audit trails (73-byte entries) |
| Tamper Detection | crypto_signing | Any byte flip breaks chain verification |

### Agentic AI: lineage, domains, and self-booting intelligence

| Feature | Example | Description |
|---------|---------|-------------|
| DNA-Style Lineage | (API) | Every derived file records its parent's hash and derivation type |
| Domain Profiles | (API) | `.rvdna`, `.rvtext`, `.rvgraph`, `.rvvis`: same format, domain-specific hints |
| Computational Container | `claude_code_appliance` | Embed a WASM microkernel, eBPF program, or bootable unikernel |
| Self-Booting Appliance | `claude_code_appliance` | 5.1 MB `.rvf` that boots Linux, serves queries, runs Claude Code |
| Import (JSON/CSV/NumPy) | (API) | Load embeddings from `.json`, `.csv`, or `.npy` files via `rvf-import` or `rvf ingest` CLI |
| Unified CLI | `rvf` | 9 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve |
| Compaction | (API) | Garbage-collect tombstoned vectors and reclaim disk space |
| Batch Delete | (API) | Delete vectors by ID with tombstone markers |

### Self-Booting RVF: Claude Code Appliance

The `claude_code_appliance` example builds a complete self-booting AI development environment as a single `.rvf` file. It uses real infrastructure: a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.

```bash
cd examples/rvf
cargo run --example claude_code_appliance
```

**What it produces** (5.1 MB file):

```
claude_code_appliance.rvf
├── KERNEL_SEG    Linux 6.8.12 bzImage (5.2 MB, x86_64)
├── EBPF_SEG      Socket filter: allows ports 2222, 8080 only
├── VEC_SEG       20 package embeddings (128-dim)
├── INDEX_SEG     HNSW graph for package search
├── WITNESS_SEG   6-entry tamper-evident audit trail
├── CRYPTO_SEG    3 Ed25519 SSH user keys (root, deploy, claude)
├── MANIFEST_SEG  4 KB root with segment directory
└── Snapshot      v1 derived image with lineage tracking
```

**Boot and connect:**

```bash
rvf launch claude_code_appliance.rvf        # Boot on QEMU/Firecracker
ssh -p 2222 deploy@localhost                # SSH in
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'
```

Final file: **5.1 MB single `.rvf`** that boots Linux, serves queries, and runs Claude Code.
## What RVF Contains

An RVF file is built from **segments**, self-describing blocks that can be combined freely. Here are all 16 types, grouped by purpose:

```
     Data           Indexing        Compression        Runtime
+-----------+    +-----------+    +-----------+    +-----------+
| VEC  0x01 |    | INDEX 0x02|    | QUANT 0x06|    |   WASM    |
| (vectors) |    |  (HNSW)   |    | (SQ/PQ/BQ)|    | (5.5 KB)  |
+-----------+    +-----------+    +-----------+    +-----------+
| META 0x07 |    | META_IDX  |    | HOT  0x08 |    |  KERNEL   |
| (key-val) |    |   0x0D    |    | (promoted)|    |   0x0E    |
+-----------+    +-----------+    +-----------+    +-----------+
| JOURNAL   |    | OVERLAY   |    |  SKETCH   |    |   EBPF    |
|   0x04    |    |   0x03    |    |   0x09    |    |   0x0F    |
+-----------+    +-----------+    +-----------+    +-----------+

    Trust            State           Domain
+-----------+    +-----------+    +-----------+
| WITNESS   |    | MANIFEST  |    | PROFILE   |
|   0x0A    |    |   0x05    |    |   0x0B    |
+-----------+    +-----------+    +-----------+
| CRYPTO    |
|   0x0C    |
+-----------+
```

Any segment you don't need is simply absent. A basic vector store uses VEC + INDEX + MANIFEST. A sealed cognitive engine might use all 16.

### RuVector Ecosystem Integration

RVF is the universal substrate for the entire RuVector ecosystem. Here's how the 75+ Rust crates map onto RVF segments:

| Domain | Crates | RVF Segments Used |
|--------|--------|-------------------|
| **LLM inference** | `ruvllm`, `ruvllm-cli` | VEC (KV cache), OVERLAY (LoRA), WITNESS (audit) |
| **Self-optimizing learning** | `sona` | OVERLAY (micro-LoRA), META (EWC++ weights) |
| **Graph neural networks** | `ruvector-gnn`, `ruvector-graph` | INDEX (HNSW topology), META (edge weights) |
| **Quantum computing** | `ruQu`, `ruqu-core`, `ruqu-algorithms` | SKETCH (VQE snapshots), META (syndrome tables) |
| **Attention mechanisms** | `ruvector-attention`, `ruvector-mincut-gated-transformer` | VEC (attention matrices), QUANT (INT4/FP16) |
| **Coherence systems** | `cognitum-gate-kernel`, `prime-radiant` | WITNESS (tile witnesses), WASM (64 KB tiles) |
| **Neuromorphic** | `ruvector-nervous-system`, `micro-hnsw-wasm` | VEC (spike trains), INDEX (spiking HNSW) |
| **Agent memory** | `agentdb`, `claude-flow`, `agentic-flow` | VEC + INDEX + WITNESS (full agent state) |
| **Edge / browser** | `rvlite`, `rvf-wasm` | VEC + INDEX via 5.5 KB WASM microkernel |
| **Hyperbolic geometry** | `ruvector-hyperbolic-hnsw`, `ruvector-math` | INDEX (Poincaré ball HNSW) |
| **Routing / inference** | `ruvector-tiny-dancer-core`, `ruvector-sparse-inference` | VEC (feature vectors), META (routing policies) |
| **Observation pipeline** | `ospipe` | META (state vectors), WITNESS (provenance) |
## Performance & Comparison

RVF is designed for speed at every layer:

| Metric | Value | Example |
|--------|-------|---------|
| Cold boot (4 KB manifest) | **< 5 ms** | wire_format |
| First query (Layer A only) | **recall >= 0.70** | progressive_index |
| Full recall (Layer C) | **>= 0.95** | progressive_index |
| WASM binary size | **~5.5 KB** | n/a |
| Segment header | **64 bytes** | wire_format |
| Witness chain entry | **73 bytes** | crypto_signing |
| Scalar quantization | **4x compression** | quantization |
| Product quantization | **8-32x compression** | quantization |
| Binary quantization | **32x compression** | quantization |

### Progressive Loading

Instead of waiting for the full index, RVF serves queries immediately:

```
Layer A ─────> Layer B ─────> Layer C
(microsecs)    (~10 ms)       (~50 ms)
recall ~0.70   recall ~0.85   recall ~0.95
```

The `progressive_index` example measures this recall progression with brute-force ground truth.

### Comparison

#### vs. vector databases

| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---------|-----|-------|-------|--------|--------|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | WAL | WAL |
| Progressive loading | 3 layers | No | No | No | No |
| WASM support | 5.5 KB | No | No | No | No |
| `no_std` compatible | Yes | No | No | No | No |
| Post-quantum sigs | ML-DSA-65 | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Auto quantization | 3-tier | No | Manual | Yes | Yes |
| Append-only | Yes | Build-once | Build-once | Log | Log |
| Witness chains | Yes | No | No | No | No |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Computational container | Yes (WASM/eBPF/unikernel) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Language bindings | Rust, Node, WASM | C++, Python | C++, Python | Rust, Python | Go, Python |

#### vs. model registries, graph DBs, and container formats

RVF replaces multiple tools because it carries data, model, graph, runtime, and trust chain together:

| Capability | RVF | GGUF | ONNX | SafeTensors | Neo4j | Docker/OCI |
|-----------|-----|------|------|-------------|-------|------------|
| Vector storage + search | Yes | No | No | No | No | No |
| Model weight deltas (LoRA) | OVERLAY_SEG | Full weights | Full graph | Weights only | No | No |
| Graph neural state | GRAPH_SEG | No | No | No | Yes | No |
| Cryptographic audit trail | WITNESS_SEG | No | No | No | No | No |
| Self-booting runtime | KERNEL_SEG | No | No | No | No | Yes |
| Kernel-level acceleration | EBPF_SEG | No | No | No | No | No |
| File lineage / versioning | DNA-style | No | No | No | No | Image layers |
| TEE attestation | Built-in | No | No | No | No | No |
| Single portable file | Yes | Yes | Yes | Yes | No | Image tarball |
| Runs in browser | 5.5 KB WASM | No | ONNX.js | No | No | No |
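
The recall figures quoted under Progressive Loading compare an approximate result list against exact brute-force results. A minimal, std-only sketch of that measurement follows. The function names are illustrative and are not the `rvf-index` API, and the "approximate" ids are made up for the demonstration.

```rust
use std::collections::HashSet;

/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k ids, found by scanning every vector (the ground truth).
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..data.len()).collect();
    order.sort_by(|&i, &j| l2_sq(&data[i], query).total_cmp(&l2_sq(&data[j], query)));
    order.truncate(k);
    order
}

/// recall@k = |approx ∩ exact| / |exact|
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    let truth: HashSet<usize> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / exact.len() as f64
}

fn main() {
    // Ten 2-d points spaced along the x axis; the query sits at the origin.
    let data: Vec<Vec<f32>> = (0..10).map(|i| vec![i as f32, 0.0]).collect();
    let query = [0.0f32, 0.0];
    let exact = brute_force_top_k(&data, &query, 4);

    // Hypothetical output of an approximate index: three correct ids, one miss.
    let approx = [0usize, 1, 2, 7];
    println!("recall@4 = {:.2}", recall_at_k(&approx, &exact));
}
```

A recall of 0.70 at Layer A therefore means that, on average, 7 of the 10 ids returned for a top-10 query are also in the exact top 10.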
## Usage Patterns (8 patterns)

### Pattern 1: Simple Vector Store

The most common use case. Create a store, add embeddings, query nearest neighbors.

```rust
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;

let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::L2,
    ..Default::default()
};
let mut store = RvfStore::create("vectors.rvf", options)?;

// Insert embeddings
store.ingest_batch(&[&embedding], &[1], None)?;

// Query top-10 nearest neighbors
let results = store.query(&query, 10, &QueryOptions::default())?;
for r in &results {
    println!("id={}, distance={:.4}", r.id, r.distance);
}
```

See: [`basic_store.rs`](examples/basic_store.rs)

### Pattern 2: Filtered Search

Attach metadata to vectors, then filter during queries.

```rust
use rvf_runtime::{FilterExpr, MetadataEntry, MetadataValue};
use rvf_runtime::filter::FilterValue;

// Add metadata during ingestion
let metadata = vec![
    MetadataEntry { field_id: 0, value: MetadataValue::String("science".into()) },
    MetadataEntry { field_id: 1, value: MetadataValue::U64(95) },
];
store.ingest_batch(&[&vec], &[42], Some(&metadata))?;

// Query with filter: category == "science" AND score > 80
let filter = FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
]);
let opts = QueryOptions { filter: Some(filter), ..Default::default() };
let results = store.query(&query, 10, &opts)?;
```

See: [`filtered_search.rs`](examples/filtered_search.rs)

### Pattern 3: Progressive Recall

Start serving queries instantly, improve quality as more data loads.

```rust
use rvf_index::{build_full_index, build_layer_a, build_layer_c, ProgressiveIndex};

// Build HNSW graph
let graph = build_full_index(&store, n, &config, &rng, &l2_distance);

// Layer A: instant but approximate
let layer_a = build_layer_a(&graph, &centroids, &assignments, n as u64);
let mut idx = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: None };
let fast_results = idx.search(&query, 10, 200, &store);  // recall ~0.70

// Layer C: full precision, attached to the same index so layer_a is not moved twice
idx.layer_c = Some(build_layer_c(&graph));
let precise_results = idx.search(&query, 10, 200, &store);  // recall ~0.95
```

See: [`progressive_index.rs`](examples/progressive_index.rs)

### Pattern 4: Cryptographic Integrity

Sign segments and build tamper-evident audit trails.

```rust
use rvf_crypto::{sign_segment, verify_segment, create_witness_chain, WitnessEntry, shake256_256};
use ed25519_dalek::SigningKey;

// Sign a segment
let footer = sign_segment(&header, &payload, &signing_key);

// Verify signature
assert!(verify_segment(&header, &payload, &footer, &verifying_key));

// Build an audit trail
let entries = vec![WitnessEntry {
    prev_hash: [0; 32],
    action_hash: shake256_256(b"inserted 1000 vectors"),
    timestamp_ns: 1_700_000_000_000_000_000,
    witness_type: 0x01,  // PROVENANCE
}];
let chain = create_witness_chain(&entries);
```

See: [`crypto_signing.rs`](examples/crypto_signing.rs)

### Pattern 5: Import from JSON / CSV / NumPy

Load embeddings from common formats without writing a parser.

```rust
use rvf_import::{import_json, import_csv, import_npy};

// From a JSON array of vectors
import_json("embeddings.json", &mut store)?;

// From a CSV file (one vector per row)
import_csv("embeddings.csv", &mut store)?;

// From a NumPy .npy file
import_npy("embeddings.npy", &mut store)?;
```

### Pattern 6: Delete and Compact

Remove vectors by ID, then reclaim disk space.

```rust
// Delete specific vectors (marks as tombstones)
store.delete_batch(&[42, 99, 1001])?;

// Compact: rewrite the file without tombstoned data
store.compact()?;
```

### Pattern 7: File Lineage (Parent → Child Derivation)

Create derived files that track their ancestry.

```rust
use rvf_types::DerivationType;

// Create a parent store
let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child; the parent's hash is recorded automatically
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

// Derive a grandchild
let grandchild = child.derive("grandchild.rvdna", DerivationType::Quantize, None)?;
assert_eq!(grandchild.lineage_depth(), 2);
```

### Pattern 8: Embed a Computational Container

Pack a bootable kernel or eBPF program into the file.

```rust
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

// Embed a unikernel: the file can now boot as a standalone service
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &kernel_image, 8080)?;

// Embed an eBPF program: enables kernel-level acceleration
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later
let (hdr, img) = store.extract_kernel()?.unwrap();
let (hdr, prog) = store.extract_ebpf()?.unwrap();
```
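
The filter in Pattern 2 is an expression tree that is evaluated against each vector's metadata. The sketch below shows one way such a tree could be evaluated recursively. The `Value`, `Filter`, and `matches` names are hypothetical stand-ins written for this illustration; they are not the `rvf-runtime` types, and the real evaluation strategy may differ (for example, it may use the META_IDX inverted index).

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for metadata values and filter expressions.
#[derive(Debug, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
}

enum Filter {
    Eq(u16, Value),
    Gt(u16, Value),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Returns true when the metadata of one vector satisfies the filter.
fn matches(filter: &Filter, meta: &HashMap<u16, Value>) -> bool {
    match filter {
        Filter::Eq(field, want) => meta.get(field) == Some(want),
        // Ordering is only defined between values of the same type.
        Filter::Gt(field, bound) => match (meta.get(field), bound) {
            (Some(Value::U64(a)), Value::U64(b)) => a > b,
            (Some(Value::Str(a)), Value::Str(b)) => a > b,
            _ => false,
        },
        Filter::And(parts) => parts.iter().all(|f| matches(f, meta)),
        Filter::Or(parts) => parts.iter().any(|f| matches(f, meta)),
    }
}

fn main() {
    // Field 0 = category, field 1 = score, mirroring Pattern 2.
    let mut meta = HashMap::new();
    meta.insert(0u16, Value::Str("science".into()));
    meta.insert(1u16, Value::U64(95));

    // category == "science" AND score > 80
    let filter = Filter::And(vec![
        Filter::Eq(0, Value::Str("science".into())),
        Filter::Gt(1, Value::U64(80)),
    ]);
    println!("matches: {}", matches(&filter, &meta));
}
```

A missing field or a type mismatch makes a leaf evaluate to false here, which is one possible convention; check `filtered_search.rs` for the behavior the library actually implements.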
## Tutorial: Your First RVF Store (Step by Step)

### Step 1: Set Up

Create a new Rust project and add the dependency:

```bash
cargo new my_vectors
cd my_vectors
```

Add to `Cargo.toml`:

```toml
[dependencies]
rvf-runtime = { path = "../crates/rvf/rvf-runtime" }
tempfile = "3"
```

### Step 2: Create a Store

```rust
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
use tempfile::TempDir;

fn main() {
    let tmp = TempDir::new().unwrap();
    let path = tmp.path().join("my.rvf");

    let opts = RvfOptions {
        dimension: 128,
        metric: DistanceMetric::L2,
        ..Default::default()
    };
    let mut store = RvfStore::create(&path, opts).unwrap();
```

### Step 3: Insert Vectors

Vectors are inserted in batches. Each vector needs a unique `u64` ID.

```rust
    let vec_a = vec![0.1f32; 128];
    let vec_b = vec![0.2f32; 128];
    let vecs: Vec<&[f32]> = vec![&vec_a, &vec_b];
    let ids = vec![1u64, 2];

    let result = store.ingest_batch(&vecs, &ids, None).unwrap();
    println!("Accepted: {}, Rejected: {}", result.accepted, result.rejected);
```

### Step 4: Query

```rust
    let query = vec![0.15f32; 128];
    let results = store.query(&query, 5, &QueryOptions::default()).unwrap();
    for r in &results {
        println!("  id={}, dist={:.6}", r.id, r.distance);
    }
```

### Step 5: Verify Persistence

```rust
    store.close().unwrap();

    let reopened = RvfStore::open(&path).unwrap();
    let results2 = reopened.query(&query, 5, &QueryOptions::default()).unwrap();
    assert_eq!(results.len(), results2.len());
    println!("Persistence verified!");
}
```

### Expected Output

```
Accepted: 2, Rejected: 0
  id=1, dist=0.320000
  id=2, dist=0.320000
Persistence verified!
```

Both stored vectors differ from the query by 0.05 in every component, so they are equidistant and the order of the two result lines may vary. The value shown assumes the store reports squared L2 distance (128 × 0.05² = 0.32); a store that reports the square root would print about 0.565685 instead.
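
As a sanity check on the distances a query like the one in Step 4 should produce, the arithmetic can be reproduced without RVF at all. This is a plain-Rust sketch; `l2_squared` is a helper written for this illustration, and whether the store reports the squared or the square-rooted value is an assumption to verify against `basic_store.rs`.

```rust
/// Squared L2 distance: the sum of squared component differences.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

fn main() {
    let vec_a = vec![0.1f32; 128];
    let vec_b = vec![0.2f32; 128];
    let query = vec![0.15f32; 128];

    let d_a = l2_squared(&query, &vec_a);
    let d_b = l2_squared(&query, &vec_b);

    // Every component differs by 0.05, so both are about 128 * 0.0025 = 0.32.
    println!("squared L2:   a={d_a:.4}  b={d_b:.4}");
    println!("euclidean L2: a={:.4}  b={:.4}", d_a.sqrt(), d_b.sqrt());
}
```

Because the two stored vectors sit symmetrically around the query, any correct L2 implementation reports the same distance for both, up to floating-point rounding.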
## Tutorial: Understanding Quantization Tiers

### The Problem

A million 384-dim vectors at full precision (fp32) take **1.5 GB** of RAM (1,000,000 × 384 × 4 bytes). Not all vectors are accessed equally; most are rarely touched. Why keep them all at full precision?

### The Solution: Temperature Tiering

RVF assigns vectors to three compression levels based on how often they're accessed:

| Tier | Access Pattern | Compression | Memory per Vector (384d) |
|------|---------------|------------|--------------------------|
| **Hot** | Frequently queried | Scalar (fp32 -> u8) | 384 bytes (4x smaller) |
| **Warm** | Occasionally queried | Product quantization | 48 bytes (32x smaller) |
| **Cold** | Rarely accessed | Binary (1-bit) | 48 bytes (32x smaller) |
| Raw | N/A (baseline) | None (fp32) | 1,536 bytes |

### How It Works

**1. Track access patterns** using a Count-Min Sketch (a probabilistic counter):

```rust
let mut sketch = CountMinSketch::default_sketch();

// Every time a vector is accessed, increment its counter
sketch.increment(vector_id);

// Check how often a vector has been accessed
let count = sketch.estimate(vector_id);
```

**2. Assign tiers** based on configurable thresholds:

```rust
let tier = assign_tier(count);
// Hot:  count >= 100
// Warm: count >= 10
// Cold: count < 10
```

**3. Encode at the appropriate level:**

```rust
// Hot: Scalar (fast, low error)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode_vec(&vector);  // 384 bytes

// Warm: Product (balanced)
let pq = ProductQuantizer::train(&vectors, 48, 64, 20);
let encoded = pq.encode_vec(&vector);  // 48 bytes

// Cold: Binary (smallest, approximate)
let bits = encode_binary(&vector);     // 48 bytes
```

### Run the Example

```bash
cargo run --example quantization
```

You'll see a comparison table showing compression ratio, reconstruction error (MSE), and bytes per vector for each tier.
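
To make the three steps above concrete, here is a std-only sketch of a Count-Min Sketch, the tier thresholds, and the two simplest encoders. Everything in it is a simplified stand-in written for this illustration: the `CountMin` type, its hash mixing, the fixed `[-1, 1]` scalar range, and the sign-bit binary rule are assumptions, not the RVF quantization crate. Product quantization is omitted because it needs trained codebooks.

```rust
const DEPTH: usize = 4;
const WIDTH: usize = 1024;

/// Toy Count-Min Sketch: DEPTH rows of WIDTH counters, one hash per row.
struct CountMin {
    rows: [[u32; WIDTH]; DEPTH],
}

impl CountMin {
    fn new() -> Self {
        Self { rows: [[0; WIDTH]; DEPTH] }
    }
    // Hypothetical per-row hash (multiplicative mixing), chosen for brevity.
    fn slot(row: usize, id: u64) -> usize {
        let key = id ^ (row as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        (key.wrapping_mul(0xBF58_476D_1CE4_E5B9) >> 32) as usize % WIDTH
    }
    fn increment(&mut self, id: u64) {
        for r in 0..DEPTH {
            self.rows[r][Self::slot(r, id)] += 1;
        }
    }
    /// Minimum over rows: collisions can inflate it, but it never undercounts.
    fn estimate(&self, id: u64) -> u32 {
        (0..DEPTH).map(|r| self.rows[r][Self::slot(r, id)]).min().unwrap()
    }
}

#[derive(Debug, PartialEq)]
enum Tier { Hot, Warm, Cold }

/// Thresholds taken from the tutorial text: >= 100 hot, >= 10 warm, else cold.
fn assign_tier(count: u32) -> Tier {
    if count >= 100 { Tier::Hot } else if count >= 10 { Tier::Warm } else { Tier::Cold }
}

/// Scalar quantization: map [min, max] linearly onto 0..=255 (1 byte per dimension).
fn scalar_encode(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = 255.0 / (max - min);
    v.iter().map(|x| ((x.clamp(min, max) - min) * scale).round() as u8).collect()
}

fn scalar_decode(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    codes.iter().map(|&c| min + c as f32 * step).collect()
}

/// Binary quantization: keep one bit per dimension (set when the value is positive).
fn binary_encode(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, x) in v.iter().enumerate() {
        if *x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

fn main() {
    let mut sketch = CountMin::new();
    for _ in 0..150 { sketch.increment(7); } // a frequently accessed vector id
    println!("id 7 -> {:?}", assign_tier(sketch.estimate(7)));

    // A 384-dim vector with components spread evenly over [-1, 1].
    let v: Vec<f32> = (0..384).map(|i| (i as f32 / 383.0) * 2.0 - 1.0).collect();
    let sq = scalar_encode(&v, -1.0, 1.0);
    let bq = binary_encode(&v);
    let restored = scalar_decode(&sq, -1.0, 1.0);
    let max_err = v.iter().zip(&restored).map(|(a, b)| (a - b).abs()).fold(0.0f32, f32::max);

    println!("raw={} B, scalar={} B, binary={} B", v.len() * 4, sq.len(), bq.len());
    println!("scalar max abs error = {max_err:.4}");
}
```

The byte counts match the table above: 1,536 bytes raw, 384 bytes after scalar quantization, and 48 bytes after binary quantization. The scalar round-trip error is bounded by half a quantization step, about 0.004 for a range of width 2.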
Tutorial: Building Witness Chains for Audit Trails

### What Is a Witness Chain?

A witness chain is a tamper-evident log of events. Each entry links to the previous one through a cryptographic hash. If any entry is modified, all subsequent hash links break — making tampering detectable without a blockchain.

### Chain Structure

```
  Entry 0 (genesis)         Entry 1                   Entry 2
  +-------------------+     +-------------------+     +-------------------+
  | prev_hash: 0x00.. |     | prev_hash: H(E0)  |     | prev_hash: H(E1)  |
  | action: H(data)   |     | action: H(data)   |     | action: H(data)   |
  | timestamp: T0     |     | timestamp: T1     |     | timestamp: T2     |
  | type: PROVENANCE  |     | type: COMPUTATION |     | type: SEARCH      |
  +-------------------+     +-------------------+     +-------------------+
        73 bytes                  73 bytes                  73 bytes
```

- **prev_hash**: SHAKE-256 hash of the previous entry (zeroed for genesis)
- **action_hash**: SHAKE-256 hash of whatever action is being recorded
- **timestamp_ns**: Nanosecond UNIX timestamp
- **witness_type**: What kind of event (see table below)

### Witness Types

| Code | Name | When to Use |
|------|------|------------|
| `0x01` | PROVENANCE | Data origin tracking (e.g., "loaded from model X") |
| `0x02` | COMPUTATION | Operation recording (e.g., "built HNSW index") |
| `0x03` | SEARCH | Query audit (e.g., "searched for query Q, got results R") |
| `0x04` | DELETION | Deletion audit (e.g., "deleted vectors 1-100") |
| `0x05` | PLATFORM_ATTESTATION | TEE attestation (e.g., "enclave measured as M") |
| `0x06` | KEY_BINDING | Sealed key (e.g., "key K bound to enclave M") |
| `0x07` | COMPUTATION_PROOF | Verified computation (e.g., "search ran inside enclave") |
| `0x08` | DATA_PROVENANCE | Full chain (e.g., "model -> TEE -> RVF file") |
| `0x09` | DERIVATION | File lineage derivation event |
| `0x0A` | LINEAGE_MERGE | Multi-parent lineage merge |
| `0x0B` | LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
| `0x0C` | LINEAGE_TRANSFORM | Lineage transform operation |
| `0x0D` | LINEAGE_VERIFY | Lineage verification event |

### Creating and Verifying

```rust
use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry, shake256_256};

// Record three events
let entries = vec![
    WitnessEntry {
        prev_hash: [0; 32],  // genesis
        action_hash: shake256_256(b"loaded embeddings from model-v2"),
        timestamp_ns: 1_700_000_000_000_000_000,
        witness_type: 0x01,
    },
    WitnessEntry {
        prev_hash: [0; 32],  // filled by create_witness_chain
        action_hash: shake256_256(b"built HNSW index (M=16, ef=200)"),
        timestamp_ns: 1_700_000_001_000_000_000,
        witness_type: 0x02,
    },
    WitnessEntry {
        prev_hash: [0; 32],
        action_hash: shake256_256(b"query: top-10 for user request #42"),
        timestamp_ns: 1_700_000_002_000_000_000,
        witness_type: 0x03,
    },
];

let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes).unwrap();
assert_eq!(verified.len(), 3);
```

### Tamper Detection

Flip any byte in the chain and verification fails:

```rust
let mut tampered = chain_bytes.clone();
tampered[100] ^= 0xFF;  // flip one byte
assert!(verify_witness_chain(&tampered).is_err());  // detected!
```

### Run the Example

```bash
cargo run --example crypto_signing
```

The example creates a 5-entry chain, verifies it, then demonstrates tamper and truncation detection.
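The linking rule behind the chain can be sketched without any crates. The block below is an illustration under stated assumptions, not the `rvf-crypto` code: `toy_hash32` is a non-cryptographic FNV-1a stand-in for SHAKE-256, and `Entry`, `build_chain`, and `verify_chain` are hypothetical names. The 73-byte entry layout (32 + 32 + 8 + 1) follows the diagram above, with the timestamp assumed to be little-endian.

```rust
const ENTRY_SIZE: usize = 73; // prev_hash 32 + action_hash 32 + timestamp 8 + type 1

/// Stand-in for SHAKE-256. NOT cryptographic: four seeded FNV-1a lanes.
fn toy_hash32(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for lane in 0..4usize {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325 ^ (lane as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        for &b in data {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
        out[lane * 8..lane * 8 + 8].copy_from_slice(&h.to_le_bytes());
    }
    out
}

struct Entry {
    action_hash: [u8; 32],
    timestamp_ns: u64,
    witness_type: u8,
}

/// Serialize entries, filling each prev_hash with the hash of the previous entry.
fn build_chain(entries: &[Entry]) -> Vec<u8> {
    let mut out = Vec::with_capacity(entries.len() * ENTRY_SIZE);
    let mut prev = [0u8; 32]; // genesis entry links to all zeros
    for e in entries {
        let start = out.len();
        out.extend_from_slice(&prev);
        out.extend_from_slice(&e.action_hash);
        out.extend_from_slice(&e.timestamp_ns.to_le_bytes());
        out.push(e.witness_type);
        prev = toy_hash32(&out[start..]);
    }
    out
}

/// Walk the chain and check every prev_hash link; returns the entry count.
fn verify_chain(bytes: &[u8]) -> Result<usize, String> {
    if bytes.len() % ENTRY_SIZE != 0 {
        return Err("truncated chain".to_string());
    }
    let mut expected_prev = [0u8; 32];
    for (i, chunk) in bytes.chunks(ENTRY_SIZE).enumerate() {
        if chunk[..32] != expected_prev[..] {
            return Err(format!("broken link at entry {}", i));
        }
        expected_prev = toy_hash32(chunk);
    }
    Ok(bytes.len() / ENTRY_SIZE)
}
```

In this sketch a modified entry is caught by the link stored in its successor, so the final entry has no link covering it. A production chain needs a real cryptographic hash and some way to anchor the tail (for example a signature); how `rvf-crypto` handles that is not shown here.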
Tutorial: Wire Format Deep Dive

### Segment Header (64 bytes)

Every piece of data in an RVF file is wrapped in a self-describing segment. The header is always exactly 64 bytes:

```
Offset  Size  Field              Description
------  ----  -----              -----------
0x00    4     magic              0x52564653 ("RVFS")
0x04    1     version            Format version (currently 1)
0x05    1     seg_type           Segment type (VEC, INDEX, MANIFEST, ...)
0x06    2     flags              Bitfield (COMPRESSED, SIGNED, ATTESTED, ...)
0x08    8     segment_id         Monotonically increasing ID
0x10    8     payload_length     Byte length of payload
0x18    8     timestamp_ns       Nanosecond UNIX timestamp
0x20    1     checksum_algo      0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21    1     compression        0=none, 1=LZ4, 2=ZSTD
0x22    2     reserved_0         Must be zero
0x24    4     reserved_1         Must be zero
0x28    16    content_hash       First 128 bits of payload hash
0x38    4     uncompressed_len   Original size before compression
0x3C    4     alignment_pad      Padding to 64-byte boundary
```

### The 16 Segment Types

| Code | Name | Purpose |
|------|------|---------|
| `0x01` | VEC | Raw vector embeddings |
| `0x02` | INDEX | HNSW adjacency and routing tables |
| `0x03` | OVERLAY | Graph overlay deltas |
| `0x04` | JOURNAL | Metadata mutations, deletions |
| `0x05` | MANIFEST | Segment directory, epoch state |
| `0x06` | QUANT | Quantization dictionaries (scalar/PQ/binary) |
| `0x07` | META | Key-value metadata |
| `0x08` | HOT | Temperature-promoted data |
| `0x09` | SKETCH | Access counter sketches (Count-Min) |
| `0x0A` | WITNESS | Audit trails, attestation proofs |
| `0x0B` | PROFILE | Domain profile declarations |
| `0x0C` | CRYPTO | Key material, signature chains |
| `0x0D` | META_IDX | Metadata inverted indexes |
| `0x0E` | KERNEL | Compressed unikernel image (self-booting) |
| `0x0F` | EBPF | eBPF program for kernel-level acceleration |

### Segment Flags

| Bit | Name | Description |
|-----|------|-------------|
| 0 | COMPRESSED | Payload is compressed (LZ4 or ZSTD) |
| 1 | ENCRYPTED | Payload is encrypted |
| 2 | SIGNED | Signature footer follows payload |
| 3 | SEALED | Immutable (compaction output) |
| 4 | PARTIAL | Streaming / partial write |
| 5 | TOMBSTONE | Logical deletion marker |
| 6 | HOT | Temperature-promoted |
| 7 | OVERLAY | Contains delta data |
| 8 | SNAPSHOT | Full snapshot |
| 9 | CHECKPOINT | Safe rollback point |
| 10 | ATTESTED | Produced inside attested TEE |
| 11 | HAS_LINEAGE | File carries FileIdentity lineage data |

### Crash Safety: Two-fsync Protocol

RVF doesn't need a write-ahead log. Instead:

1. Write data segment + payload, then `fsync`
2. Write MANIFEST_SEG with updated state, then `fsync`

If the process crashes between fsyncs, the incomplete segment has no manifest reference — it's ignored on recovery. Simple, safe, fast.

### Tail-Scan

To find the current state, scan backward from the end of the file for the latest MANIFEST_SEG. The root manifest fits in 4 KB, so cold boot takes < 5 ms.

### Run the Example

```bash
cargo run --example wire_format
```

You'll see three segments written, read back, hash-validated, corruption detected, and a tail-scan for the manifest.
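As a rough illustration of the header layout and the tail-scan, the sketch below encodes and parses the first 32 bytes of a segment header and then walks backward over 64-byte boundaries looking for a MANIFEST segment. It is not the `rvf-wire` API: `SegmentHeader`, `le_u64`, and `scan_for_manifest` are hypothetical names, the little-endian encoding of every field (including `magic`) is an assumption, and the scan is naive, since it does not validate hashes or guard against payload bytes that happen to look like a header.

```rust
const MAGIC: u32 = 0x5256_4653; // "RVFS"
const HEADER_LEN: usize = 64;
const SEG_MANIFEST: u8 = 0x05;

#[derive(Debug, PartialEq)]
struct SegmentHeader {
    version: u8,
    seg_type: u8,
    flags: u16,
    segment_id: u64,
    payload_length: u64,
    timestamp_ns: u64,
}

fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

impl SegmentHeader {
    /// Write the fields at the offsets listed in the header table.
    /// Checksum, compression, and hash fields are left zeroed in this sketch.
    fn encode(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        b[0x00..0x04].copy_from_slice(&MAGIC.to_le_bytes());
        b[0x04] = self.version;
        b[0x05] = self.seg_type;
        b[0x06..0x08].copy_from_slice(&self.flags.to_le_bytes());
        b[0x08..0x10].copy_from_slice(&self.segment_id.to_le_bytes());
        b[0x10..0x18].copy_from_slice(&self.payload_length.to_le_bytes());
        b[0x18..0x20].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        b
    }

    /// Returns None if the slice is too short or the magic does not match.
    fn parse(b: &[u8]) -> Option<SegmentHeader> {
        if b.len() < HEADER_LEN {
            return None;
        }
        if u32::from_le_bytes([b[0], b[1], b[2], b[3]]) != MAGIC {
            return None;
        }
        Some(SegmentHeader {
            version: b[0x04],
            seg_type: b[0x05],
            flags: u16::from_le_bytes([b[0x06], b[0x07]]),
            segment_id: le_u64(&b[0x08..0x10]),
            payload_length: le_u64(&b[0x10..0x18]),
            timestamp_ns: le_u64(&b[0x18..0x20]),
        })
    }
}

/// Naive tail-scan: step backward over 64-byte boundaries until a
/// header with seg_type == MANIFEST is found.
fn scan_for_manifest(file: &[u8]) -> Option<(usize, SegmentHeader)> {
    let mut off = file.len() / HEADER_LEN * HEADER_LEN;
    while off >= HEADER_LEN {
        off -= HEADER_LEN;
        if let Some(h) = SegmentHeader::parse(&file[off..]) {
            if h.seg_type == SEG_MANIFEST {
                return Some((off, h));
            }
        }
    }
    None
}
```

Because the scan starts at the end of the file, a data segment that was written but never followed by a manifest is simply never reached through a manifest reference, which is the property the two-fsync protocol relies on.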
Tutorial: Metadata Filtering Patterns

### Available Filter Expressions

| Expression | Syntax | Description |
|-----------|--------|-------------|
| `Eq` | `FilterExpr::Eq(field_id, value)` | Exact match |
| `Ne` | `FilterExpr::Ne(field_id, value)` | Not equal |
| `Gt` | `FilterExpr::Gt(field_id, value)` | Greater than |
| `Lt` | `FilterExpr::Lt(field_id, value)` | Less than |
| `Range` | `FilterExpr::Range(field_id, low, high)` | Value in [low, high) |
| `In` | `FilterExpr::In(field_id, values)` | Value is one of |
| `And` | `FilterExpr::And(vec![...])` | All conditions must match |
| `Or` | `FilterExpr::Or(vec![...])` | Any condition matches |

### Metadata Types

| Type | Rust | Use Case |
|------|------|----------|
| `String` | `MetadataValue::String("cat".into())` | Categories, labels, tags |
| `U64` | `MetadataValue::U64(95)` | Scores, counts, timestamps |
| `Bytes` | `MetadataValue::Bytes(vec![...])` | Binary data, hashes |

### Common Patterns

**Category filter:**

```rust
FilterExpr::Eq(0, FilterValue::String("science".into()))
```

**Score range:**

```rust
FilterExpr::Range(1, FilterValue::U64(30), FilterValue::U64(90))
```

**Multi-category:**

```rust
FilterExpr::In(0, vec![
    FilterValue::String("science".into()),
    FilterValue::String("tech".into()),
])
```

**Combined (AND):**

```rust
FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
])
```

### Run the Example

```bash
cargo run --example filtered_search
```

The example creates 500 vectors with category and score metadata, then runs 7 different filter queries showing selectivity and verification.
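To show how these expressions compose, here is a small standalone evaluator that mirrors the shape of the filter table. It is a sketch, not the RVF types: `Value`, `Filter`, `compare`, and `eval` are hypothetical names, field ids are assumed to be `u16`, and the handling of missing fields and mismatched types (every comparison is simply false) is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
}

enum Filter {
    Eq(u16, Value),
    Ne(u16, Value),
    Gt(u16, Value),
    Lt(u16, Value),
    Range(u16, Value, Value), // half-open: [low, high)
    In(u16, Vec<Value>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Compare two values of the same type; mismatched types do not compare.
fn compare(a: &Value, b: &Value) -> Option<Ordering> {
    match (a, b) {
        (Value::U64(x), Value::U64(y)) => Some(x.cmp(y)),
        (Value::Str(x), Value::Str(y)) => Some(x.cmp(y)),
        _ => None,
    }
}

/// Evaluate a filter against one vector's metadata (field id -> value).
/// Assumption: a missing field makes every leaf comparison false, including Ne.
fn eval(f: &Filter, meta: &HashMap<u16, Value>) -> bool {
    let ord = |id: &u16, v: &Value| meta.get(id).and_then(|m| compare(m, v));
    match f {
        Filter::Eq(id, v) => ord(id, v) == Some(Ordering::Equal),
        Filter::Ne(id, v) => ord(id, v).map_or(false, |o| o != Ordering::Equal),
        Filter::Gt(id, v) => ord(id, v) == Some(Ordering::Greater),
        Filter::Lt(id, v) => ord(id, v) == Some(Ordering::Less),
        Filter::Range(id, lo, hi) => {
            ord(id, lo).map_or(false, |o| o != Ordering::Less)
                && ord(id, hi) == Some(Ordering::Less)
        }
        Filter::In(id, vs) => vs.iter().any(|v| ord(id, v) == Some(Ordering::Equal)),
        Filter::And(fs) => fs.iter().all(|sub| eval(sub, meta)),
        Filter::Or(fs) => fs.iter().any(|sub| eval(sub, meta)),
    }
}
```

With metadata `{0: "science", 1: 85}`, the score-range pattern `Range(1, 30, 90)` matches, while `Range(1, 30, 85)` does not, because the upper bound is exclusive.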
Tutorial: Progressive Index Recall Measurement

### What Is Recall?

**Recall@K** measures how many of the true K nearest neighbors your approximate algorithm actually returns. A recall@10 of 0.95 means that, averaged over queries, 9.5 of the 10 true nearest neighbors appear in the returned results.

```
recall@K = |approximate_results ∩ exact_results| / K
```

### How Progressive Indexing Achieves This

RVF builds an HNSW (Hierarchical Navigable Small World) graph, then splits it into three loadable layers:

**Layer A: Coarse Routing**
- Entry points (topmost HNSW nodes)
- Partition centroids for guided search
- Loads in microseconds
- Recall: ~0.40-0.70

**Layer B: Hot Region**
- Adjacency lists for the most frequently accessed vectors
- Covers the "working set" of your data
- Recall: ~0.70-0.85

**Layer C: Full Graph**
- Complete HNSW adjacency for all vectors
- Loaded in background while queries are already being served
- Recall: >= 0.95

### Measuring Recall in the Example

The `progressive_index` example:

1. Generates 5,000 vectors (128 dims)
2. Builds the full HNSW graph (M=16, ef_construction=200)
3. Splits into Layer A, B, C
4. Runs 50 queries at each stage
5. Computes recall@10 against brute-force ground truth

```bash
cargo run --example progressive_index
```

Expected output:

```
=== Recall Progression Summary ===
  Layers       Recall@10
  A only       0.xxx
  A + B        0.xxx
  A + B + C    0.9xx
```

### Tuning ef_search

The `ef_search` parameter controls how many candidates HNSW explores during search. Higher values improve recall at the cost of latency:

| ef_search | Recall@10 | Relative Speed |
|-----------|-----------|---------------|
| 10 | ~0.75 | Fastest |
| 50 | ~0.90 | Balanced |
| 200 | ~0.97 | Most accurate |
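The recall formula is short enough to write out directly. This is a minimal sketch with hypothetical names (`recall_at_k`, `mean_recall`), not the code used by the `progressive_index` example; it assumes result lists hold distinct vector ids ordered best-first.

```rust
use std::collections::HashSet;

/// recall@K = |approx ∩ exact| / K, using the first K ids of each list.
/// Assumes ids within each list are distinct.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / k as f64
}

/// Average recall@K over a batch of (approximate, exact) result pairs.
fn mean_recall(pairs: &[(Vec<u64>, Vec<u64>)], k: usize) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let total: f64 = pairs.iter().map(|(a, e)| recall_at_k(a, e, k)).sum();
    total / pairs.len() as f64
}
```

Averaging over many queries is what turns per-query values such as 0.9 or 1.0 into summary figures like the ones in the recall progression table.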
Technical Reference: Signature Footer Format

When the `SIGNED` flag is set on a segment, a signature footer follows the payload:

| Offset | Size | Field |
|--------|------|-------|
| 0x00 | 2 | `sig_algo` (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s) |
| 0x02 | 2 | `sig_length` |
| 0x04 | var | `signature` (64 to 7,856 bytes) |
| var | 4 | `footer_length` (for backward scan) |

### Supported Algorithms

| Algorithm | Signature Size | Security Level | Standard |
|-----------|---------------|---------------|----------|
| Ed25519 | 64 bytes | 128-bit classical | RFC 8032 |
| ML-DSA-65 | 3,309 bytes | NIST Level 3 (post-quantum) | FIPS 204 |
| SLH-DSA-128s | 7,856 bytes | NIST Level 1 (post-quantum, stateless) | FIPS 205 |

### Signing Flow

1. Serialize the segment header (64 bytes) and payload into a signing buffer
2. Compute SHAKE-256 hash of the buffer
3. Sign the hash with the chosen algorithm
4. Append the signature footer after the payload (before padding)
5. Set the `SIGNED` flag in the header

### Verification Flow

1. Read segment header and payload
2. Recompute SHAKE-256 hash of header + payload
3. Read signature footer (scan backward from segment end using `footer_length`)
4. Verify signature against the public key
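The footer layout and the backward scan in step 3 of the verification flow can be sketched as follows. This is an illustration only: `encode_footer` and `read_footer` are hypothetical names, all integers are assumed to be little-endian, `footer_length` is assumed to count the whole footer (2 + 2 + signature + 4 bytes), and the input slice is assumed to end exactly at the footer with any alignment padding already stripped. No signature is actually computed or verified here.

```rust
/// Fixed overhead of the footer: sig_algo (2) + sig_length (2) + footer_length (4).
const FOOTER_OVERHEAD: usize = 8;

/// Build a footer for an already-computed signature.
fn encode_footer(sig_algo: u16, signature: &[u8]) -> Vec<u8> {
    let total = FOOTER_OVERHEAD + signature.len();
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(&sig_algo.to_le_bytes());
    out.extend_from_slice(&(signature.len() as u16).to_le_bytes());
    out.extend_from_slice(signature);
    out.extend_from_slice(&(total as u32).to_le_bytes());
    out
}

/// Scan backward from the end of `body` (payload followed by footer).
/// Returns (sig_algo, signature bytes, offset where the payload ends).
fn read_footer(body: &[u8]) -> Option<(u16, &[u8], usize)> {
    let n = body.len();
    if n < FOOTER_OVERHEAD {
        return None;
    }
    // The last four bytes say how far back the footer starts.
    let footer_len =
        u32::from_le_bytes([body[n - 4], body[n - 3], body[n - 2], body[n - 1]]) as usize;
    if footer_len < FOOTER_OVERHEAD || footer_len > n {
        return None;
    }
    let start = n - footer_len;
    let sig_algo = u16::from_le_bytes([body[start], body[start + 1]]);
    let sig_len = u16::from_le_bytes([body[start + 2], body[start + 3]]) as usize;
    // The two length fields must agree with each other.
    if sig_len + FOOTER_OVERHEAD != footer_len {
        return None;
    }
    Some((sig_algo, &body[start + 4..start + 4 + sig_len], start))
}
```

Storing the length at the end is what lets a reader locate a variable-size signature (64 bytes for Ed25519, up to 7,856 bytes for SLH-DSA-128s) without first knowing the algorithm.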
Technical Reference: Confidential Core Attestation

### Overview

RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This provides cryptographic proof that:

- The platform is genuine (e.g., real Intel SGX hardware)
- The code running inside the enclave matches a known measurement
- Encryption keys are sealed to the enclave identity
- Vector operations were computed inside the secure environment

### Supported TEE Platforms

| Platform | Enum Value | Quote Format |
|----------|-----------|--------------|
| Intel SGX | `TeePlatform::Sgx` (0) | DCAP attestation quote |
| AMD SEV-SNP | `TeePlatform::SevSnp` (1) | VCEK attestation report |
| Intel TDX | `TeePlatform::Tdx` (2) | TD quote |
| ARM CCA | `TeePlatform::ArmCca` (3) | CCA token |
| Software (testing) | `TeePlatform::SoftwareTee` (0xFE) | Synthetic (no hardware) |

### Attestation Header (112 bytes, `repr(C)`)

```
Offset  Size  Field               Description
------  ----  -----               -----------
0x00    1     platform            TeePlatform enum value
0x01    1     attestation_type    AttestationWitnessType enum value
0x02    4     quote_length        Length of the platform-specific quote
0x06    2     reserved
0x08    32    measurement         SHAKE-256 hash of enclave code
0x28    32    signer_id           SHAKE-256 hash of signing identity
0x48    8     timestamp_ns        Nanosecond UNIX timestamp
0x50    16    nonce               Anti-replay nonce
0x60    2     svn                 Security Version Number
0x62    1     sig_algo            Signature algorithm for the quote
0x63    1     flags               Attestation flags
0x64    4     report_data_len     Length of additional report data
0x68    8     reserved
```

### Attestation Types

| Type | Witness Code | Purpose |
|------|-------------|---------|
| Platform Attestation | `0x05` | TEE identity + measurement verification |
| Key Binding | `0x06` | Keys sealed to enclave measurement |
| Computation Proof | `0x07` | Proof that operations ran inside enclave |
| Data Provenance | `0x08` | Full chain: model -> TEE -> RVF file |

### ATTESTED Segment Flag

Any segment produced inside a TEE should set bit 10 (`ATTESTED`) in the segment header flags. This enables fast scanning to identify attested segments without parsing payloads.

### QuoteVerifier Trait

The verification interface is pluggable:

```rust
pub trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}
```

Implement this trait for your TEE platform to enable hardware-backed verification. The `SoftwareTee` variant allows testing without real hardware.
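A test-only implementation of the trait might look like the sketch below. The trait and enum are redeclared locally so the block stands alone; `SoftwareVerifier` and its synthetic quote layout (32-byte measurement followed by the report data) are hypothetical and chosen purely for illustration. Real verifiers must validate the platform's own quote format and certificate chain, which this sketch does not attempt.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum TeePlatform {
    Sgx = 0,
    SevSnp = 1,
    Tdx = 2,
    ArmCca = 3,
    SoftwareTee = 0xFE,
}

trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}

/// Hypothetical software verifier for tests.
/// Assumed synthetic quote layout: measurement (32 bytes) || report_data.
struct SoftwareVerifier;

impl QuoteVerifier for SoftwareVerifier {
    fn platform(&self) -> TeePlatform {
        TeePlatform::SoftwareTee
    }

    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String> {
        if quote.len() < 32 {
            return Err("quote too short".to_string());
        }
        // The enclave measurement must match the value the caller expects.
        if quote[..32] != expected_measurement[..] {
            return Err("measurement mismatch".to_string());
        }
        // The remainder must echo the caller-supplied report data (e.g. a nonce).
        if quote[32..] != *report_data {
            return Err("report data mismatch".to_string());
        }
        Ok(())
    }
}
```

Keeping verification behind a trait means application code can be exercised against a software verifier in CI and switched to a hardware-backed implementation at deployment.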
Technical Reference: Computational Container (Self-Booting RVF)

### Three-Tier Execution Model

RVF files can optionally carry executable compute alongside vector data:

| Tier | Segment | Size | Environment | Boot Time | Use Case |
|------|---------|------|-------------|-----------|----------|
| **1: WASM** | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere |
| **2: eBPF** | EBPF_SEG (`0x0F`) | 10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits |
| **3: Unikernel** | KERNEL_SEG (`0x0E`) | 200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service |

### KernelHeader (128 bytes)

| Field | Size | Description |
|-------|------|-------------|
| `kernel_magic` | 4 | `0x52564B4E` ("RVKN") |
| `header_version` | 2 | Currently 1 |
| `kernel_arch` | 1 | x86_64 (0), AArch64 (1), RISC-V (2), WASM (3) |
| `kernel_type` | 1 | HermitOS (0), Unikraft (1), Custom (2), TestStub (0xFE) |
| `image_size` | 4 | Uncompressed kernel size |
| `compressed_size` | 4 | Compressed (ZSTD) size |
| `image_hash` | 32 | SHAKE-256-256 of uncompressed image |
| `api_port` | 2 | HTTP API port (network byte order) |
| `api_transport` | 1 | HTTP (0), gRPC (1), virtio-vsock (2) |
| `kernel_flags` | 8 | Feature flags (read-only, metrics, TEE, etc.) |
| `cmdline_len` | 2 | Length of kernel command line |

### EbpfHeader (64 bytes)

| Field | Size | Description |
|-------|------|-------------|
| `ebpf_magic` | 4 | `0x52564250` ("RVBP") |
| `program_type` | 1 | XDP (0), TC (1), Tracepoint (2), Socket (3) |
| `attach_type` | 1 | XdpIngress (0), TcIngress (1), etc. |
| `max_dimension` | 4 | Maximum vector dimension (eBPF verifier loop bound) |
| `bytecode_size` | 4 | Size of BPF ELF object |
| `btf_size` | 4 | Size of BTF section |
| `map_count` | 4 | Number of BPF maps |

### Embedding and Extracting

```rust
use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

let mut store = RvfStore::open("vectors.rvf")?;

// Embed a kernel
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &image, 8080)?;

// Embed an eBPF program
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later
let (kernel_hdr, kernel_img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, ebpf_prog) = store.extract_ebpf()?.unwrap();
```

### Forward Compatibility

Files with KERNEL_SEG or EBPF_SEG work with older readers -- unknown segment types are skipped per the RVF forward-compatibility rule. The computational capability is purely additive.

See [ADR-030](../../docs/adr/ADR-030-rvf-computational-container.md) for the full specification.
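The forward-compatibility rule can be pictured as a reader that walks segment headers and steps over any type it does not recognise. The sketch below is an assumption-laden illustration, not the RVF reader: `padded` and `walk_known` are hypothetical names, it reads only `seg_type` (offset 0x05) and `payload_length` (offset 0x10, assumed little-endian) from each 64-byte header, assumes payloads are padded to the next 64-byte boundary, and skips magic and hash validation for brevity.

```rust
const HEADER_LEN: usize = 64;
const ALIGN: usize = 64;

/// Round a payload length up to the next 64-byte boundary.
fn padded(len: usize) -> usize {
    (len + ALIGN - 1) / ALIGN * ALIGN
}

/// Walk the file front to back and return (seg_type, offset) for every
/// segment whose type appears in `known`. Unknown types are skipped,
/// using payload_length to find the next header.
fn walk_known(file: &[u8], known: &[u8]) -> Vec<(u8, usize)> {
    let mut found = Vec::new();
    let mut off = 0usize;
    while off + HEADER_LEN <= file.len() {
        let seg_type = file[off + 0x05];
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&file[off + 0x10..off + 0x18]);
        let payload_len = u64::from_le_bytes(len_bytes) as usize;
        // Stop on an implausible length rather than running past the file.
        if payload_len > file.len() {
            break;
        }
        if known.contains(&seg_type) {
            found.push((seg_type, off));
        }
        off += HEADER_LEN + padded(payload_len);
    }
    found
}
```

A reader that only knows types `0x01` through `0x0D` still reaches the MANIFEST segment in a file that also carries a KERNEL segment, because the unknown segment's header tells it how many bytes to skip.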
Technical Reference: DNA-Style Lineage Provenance

### How Lineage Works

Every RVF file carries a 68-byte `FileIdentity` in its root manifest:

| Field | Size | Description |
|-------|------|-------------|
| `file_id` | 16 | Unique UUID for this file |
| `parent_id` | 16 | UUID of the parent file (all zeros for root) |
| `parent_hash` | 32 | SHAKE-256-256 of parent's manifest |
| `lineage_depth` | 4 | Generation count (0 for root) |

### Derivation Chain

```
Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
  file_id: A               file_id: B               file_id: C
  parent_id: [0;16]        parent_id: A             parent_id: B
  parent_hash: [0;32]      parent_hash: hash(A)     parent_hash: hash(B)
  depth: 0                 depth: 1                 depth: 2
```

### Derivation Types

| Code | Type | Description |
|------|------|-------------|
| 0 | Clone | Exact copy |
| 1 | Filter | Subset of parent's vectors |
| 2 | Merge | Multi-parent merge |
| 3 | Quantize | Re-quantized version |
| 4 | Reindex | Re-indexed with different parameters |
| 5 | Transform | Transformed embeddings |
| 6 | Snapshot | Point-in-time snapshot |
| 0xFF | UserDefined | Application-specific derivation |

### Using the API

```rust
use rvf_runtime::RvfStore;
use rvf_types::DerivationType;

let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
```

### Domain Extensions

| Extension | Domain Profile | Optimized For |
|-----------|---------------|---------------|
| `.rvf` | Generic | General-purpose vectors |
| `.rvdna` | RVDNA | Genomic sequence embeddings |
| `.rvtext` | RVText | Language model embeddings |
| `.rvgraph` | RVGraph | Graph/network node embeddings |
| `.rvvis` | RVVision | Image/vision model embeddings |

See [ADR-029](../../docs/adr/ADR-029-rvf-canonical-format.md) for the full format specification.
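The 68-byte identity record and the depth bookkeeping in the derivation chain can be modelled in a few lines. This is a standalone sketch with hypothetical names (`FileIdentity` here is a local struct, and `root`, `derive_child`, `to_bytes`, and `from_bytes` are illustrative methods, not the `rvf-types` API). The field order follows the table above; the little-endian encoding of `lineage_depth` is an assumption, and the parent manifest hash is passed in rather than computed.

```rust
const FILE_IDENTITY_LEN: usize = 68; // 16 + 16 + 32 + 4

#[derive(Debug, Clone, PartialEq)]
struct FileIdentity {
    file_id: [u8; 16],
    parent_id: [u8; 16],
    parent_hash: [u8; 32],
    lineage_depth: u32,
}

impl FileIdentity {
    /// A root file has a zeroed parent id and hash, and depth 0.
    fn root(file_id: [u8; 16]) -> Self {
        FileIdentity {
            file_id,
            parent_id: [0; 16],
            parent_hash: [0; 32],
            lineage_depth: 0,
        }
    }

    /// A child records its parent's id and manifest hash, one level deeper.
    fn derive_child(&self, child_id: [u8; 16], parent_manifest_hash: [u8; 32]) -> Self {
        FileIdentity {
            file_id: child_id,
            parent_id: self.file_id,
            parent_hash: parent_manifest_hash,
            lineage_depth: self.lineage_depth + 1,
        }
    }

    fn to_bytes(&self) -> [u8; FILE_IDENTITY_LEN] {
        let mut b = [0u8; FILE_IDENTITY_LEN];
        b[0..16].copy_from_slice(&self.file_id);
        b[16..32].copy_from_slice(&self.parent_id);
        b[32..64].copy_from_slice(&self.parent_hash);
        b[64..68].copy_from_slice(&self.lineage_depth.to_le_bytes());
        b
    }

    fn from_bytes(b: &[u8; FILE_IDENTITY_LEN]) -> Self {
        let mut id = FileIdentity::root([0; 16]);
        id.file_id.copy_from_slice(&b[0..16]);
        id.parent_id.copy_from_slice(&b[16..32]);
        id.parent_hash.copy_from_slice(&b[32..64]);
        id.lineage_depth = u32::from_le_bytes([b[64], b[65], b[66], b[67]]);
        id
    }
}
```

Verifying a lineage then amounts to checking, for each generation, that `parent_id` matches the parent's `file_id` and that `parent_hash` matches a freshly computed hash of the parent's manifest.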
Technical Reference: Crate Architecture

### Crate Map

```
         +-----------------------------------------+
         |             Cognitive Layer             |
         |  ruvllm | gnn | ruQu | attention | sona |
         |  mincut | prime-radiant | nervous-system|
         +---+-------------+---------------+-------+
             |             |               |
         +-----------------------------------------+
         |            Application Layer            |
         |  claude-flow | agentdb | agentic-flow   |
         |  ospipe | rvlite | sona | your-app      |
         +---+-------------+---------------+-------+
             |             |               |
         +---v-------------v---------------v-------+
         |              RVF SDK Layer              |
         |  rvf-runtime | rvf-index | rvf-quant    |
         |  rvf-manifest | rvf-crypto | rvf-wire   |
         +---+-------------+---------------+-------+
             |             |               |
    +--------v------+ +---v--------+ +----v-------+ +----v------+
    | rvf-server    | | rvf-node   | | rvf-wasm   | | rvf-cli   |
    | HTTP + TCP    | | N-API      | | ~46 KB     | | clap      |
    +---------------+ +------------+ +------------+ +-----------+
```

### Crate Details

| Crate | Lines | no_std | Purpose |
|-------|------:|:------:|---------|
| `rvf-types` | 3,184 | Yes | Segment types, kernel/eBPF headers, lineage, enums |
| `rvf-wire` | 2,011 | Yes | Wire format read/write, hash validation |
| `rvf-manifest` | 1,580 | No | Two-level manifest with 4 KB root, FileIdentity codec |
| `rvf-index` | 2,691 | No | HNSW progressive indexing (Layer A/B/C) |
| `rvf-quant` | 1,443 | No | Scalar, product, and binary quantization |
| `rvf-crypto` | 1,725 | Partial | SHAKE-256, Ed25519, witness chains, attestation, lineage |
| `rvf-runtime` | 3,607 | No | Full store API, compaction, lineage, kernel/eBPF embed |
| `rvf-import` | 980 | No | JSON, CSV, NumPy (.npy) importers |
| `rvf-wasm` | 1,616 | Yes | WASM control plane: in-memory store, query, segment inspection |
| `rvf-node` | 852 | No | Node.js N-API bindings with lineage, kernel/eBPF, inspection |
| `rvf-cli` | 665 | No | Unified CLI: create, ingest, query, delete, status, inspect, compact, derive, serve |
| `rvf-server` | 1,165 | No | HTTP REST + TCP streaming server |

### Library Adapters

| Adapter | Purpose | Key Feature |
|---------|---------|-------------|
| `rvf-adapter-claude-flow` | AI agent memory | WITNESS_SEG audit trails |
| `rvf-adapter-agentdb` | Agent vector database | Progressive HNSW indexing |
| `rvf-adapter-ospipe` | Observation-State pipeline | META_SEG for state vectors |
| `rvf-adapter-agentic-flow` | Swarm coordination | Inter-agent memory sharing |
| `rvf-adapter-rvlite` | Lightweight embedded store | Minimal API, edge-friendly |
| `rvf-adapter-sona` | Neural architecture | Experience replay + trajectories |
Technical Reference: File Format Specification

### File Extension

| Extension | Usage |
|-----------|-------|
| `.rvf` | Standard RuVector Format file |
| `.rvf.cold.N` | Cold shard N (multi-file mode) |
| `.rvf.idx.N` | Index shard N (multi-file mode) |

### MIME Type

`application/x-ruvector-format`

### Magic Number

`0x52564653` (ASCII: "RVFS")

### Byte Order

All multi-byte integers are **little-endian**.

### Alignment

All segments are **64-byte aligned** (cache-line friendly). Payloads are padded to the next 64-byte boundary.

### Root Manifest

The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via backward scan:

```rust
let (offset, header) = find_latest_manifest(&file_data)?;
```

The root manifest provides:

- Segment directory (offsets to all segments)
- Hotset pointers (entry points, top layer, centroids, quant dicts)
- Epoch counter
- Vector count and dimension
- Profile identifiers

### Domain Profiles

| Profile | Code | Optimized For |
|---------|------|---------------|
| Generic | `0x00` | General-purpose vectors |
| RVDNA | `0x01` | Genomic sequence embeddings |
| RVText | `0x02` | Language model embeddings |
| RVGraph | `0x03` | Graph/network node embeddings |
| RVVision | `0x04` | Image/vision model embeddings |
Building from Source

### Prerequisites

- **Rust 1.87+** via [rustup](https://rustup.rs/) (`rustup update stable`)
- For WASM: `rustup target add wasm32-unknown-unknown`
- For Node.js bindings: Node.js 18+ and `npm`

### Build Examples

```bash
cd examples/rvf
cargo build
```

### Build All RVF Crates

```bash
cd crates/rvf
cargo build --workspace
```

### Run All Tests

```bash
cd crates/rvf
cargo test --workspace
```

### Run Clippy

```bash
cd crates/rvf
cargo clippy --all-targets --workspace --exclude rvf-wasm
```

### Build WASM Microkernel

```bash
cd crates/rvf
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm
```

### Build Node.js Bindings

```bash
cd crates/rvf/rvf-node
npm install && npm run build
```

### Run Benchmarks

```bash
cd crates/rvf
cargo bench --bench rvf_benchmarks
```
---
Project Structure

```
examples/rvf/
  Cargo.toml                     # Standalone workspace
  src/lib.rs                     # Shared utilities
  examples/
    # Core (6)
    basic_store.rs               # Store lifecycle, insert, query, persistence
    progressive_index.rs         # Three-layer HNSW, recall measurement
    quantization.rs              # Scalar, product, binary quantization + tiering
    wire_format.rs               # Raw segment I/O, hash validation, tail-scan
    crypto_signing.rs            # Ed25519 signing, witness chains, tamper detection
    filtered_search.rs           # Metadata-filtered vector search
    # Agentic AI (6)
    agent_memory.rs              # Persistent agent memory + witness audit
    swarm_knowledge.rs           # Multi-agent shared knowledge base
    reasoning_trace.rs           # Chain-of-thought with lineage derivation
    tool_cache.rs                # Tool call result cache with TTL + compaction
    agent_handoff.rs             # Transfer agent state between instances
    experience_replay.rs         # RL experience replay buffer
    # Practical Production (5)
    semantic_search.rs           # Document search engine (4 filter workflows)
    recommendation.rs            # Item recommendations (collaborative filtering)
    rag_pipeline.rs              # Retrieval-augmented generation pipeline
    embedding_cache.rs           # LRU cache with temperature tiering
    dedup_detector.rs            # Near-duplicate detection + compaction
    # Vertical Domains (4)
    genomic_pipeline.rs          # DNA k-mer search (.rvdna profile)
    financial_signals.rs         # Market signals with attestation
    medical_imaging.rs           # Radiology embedding search (.rvvis)
    legal_discovery.rs           # Legal document similarity (.rvtext)
    # Exotic Capabilities (5)
    self_booting.rs              # RVF with embedded unikernel
    ebpf_accelerator.rs          # eBPF hot-path acceleration
    hyperbolic_taxonomy.rs       # Hierarchy-aware search
    multimodal_fusion.rs         # Cross-modal text + image search
    sealed_engine.rs             # Full cognitive engine (capstone)
    # Runtime Targets + Postgres (5)
    browser_wasm.rs              # Browser-side WASM vector search
    edge_iot.rs                  # IoT device with binary quantization
    serverless_function.rs       # Cold-start optimized for Lambda
    ruvllm_inference.rs          # LLM KV cache + LoRA via RVF
    postgres_bridge.rs           # PostgreSQL ↔ RVF export/import
    # Network & Security (4)
    network_sync.rs              # Peer-to-peer vector store sync
    tee_attestation.rs           # TEE attestation + sealed keys
    access_control.rs            # Role-based vector access control
    zero_knowledge.rs            # Zero-knowledge proof integration
    # Autonomous Agent (1)
    ruvbot.rs                    # Autonomous RVF-powered agent bot
    # POSIX & Systems (3)
    posix_fileops.rs             # POSIX file operations with RVF
    linux_microkernel.rs         # Linux microkernel distribution
    mcp_in_rvf.rs                # MCP server embedded in RVF
    # Network Operations (1)
    network_interfaces.rs        # Network OS telemetry (60 interfaces)
```
## Learn More

| Resource | Description |
|----------|-------------|
| [RVF Format Specification](../../crates/rvf/README.md) | Full format documentation, architecture, and API reference |
| [ADR-029](../../docs/adr/ADR-029-rvf-canonical-format.md) | Architecture decision record for the canonical format |
| [ADR-030](../../docs/adr/ADR-030-rvf-computational-container.md) | Computational container (KERNEL_SEG, EBPF_SEG) specification |
| [ADR-031](../../docs/adr/ADR-031-rvf-example-repository.md) | Example repository design (this collection of 40 examples) |
| [Benchmarks](../../crates/rvf/benches/) | Performance benchmarks (HNSW build, quantization, wire I/O) |
| [Integration Tests](../../crates/rvf/tests/rvf-integration/) | E2E test suite (progressive recall, quantization, wire interop) |

## Contributing

```bash
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
cargo build && cargo run --example basic_store
```

All contributions must pass `cargo clippy` with zero warnings and maintain the existing test count (currently 543+).

## License

Dual-licensed under [MIT](../../LICENSE-MIT) or [Apache-2.0](../../LICENSE-APACHE) at your option.

---

Built with Rust. One file — store it, send it, run it.