RVF — RuVector Format
One file. Store vectors. Ship models. Boot services. Prove everything.
🚀 Quick Start • 📦 What It Contains • 🧠 Cognitive Engines • 🏗️ Architecture • ⚡ Performance • 📊 Comparison
---

## 🧠 What is RVF? A Cognitive Container

**RVF (RuVector Format)** is a universal binary substrate that merges database, model, graph engine, kernel, and attestation into a single deployable file.

A `.rvf` file can store vector embeddings, carry LoRA adapter deltas, embed GNN graph state, include a bootable Linux microkernel, run queries in a 5.5 KB WASM runtime, and prove every operation through a cryptographic witness chain, all in one file that runs anywhere from a browser to bare metal.

This is not a database format. It is an **executable knowledge unit**.

#### 🖥️ Compute & Execution

| Capability | How | Segment |
|------------|-----|---------|
| 🖥️ **Self-boot as a microservice** | The file contains a real Linux kernel. Drop it on a VM and it boots as a running service in under 125 ms. No install, no dependencies. | `KERNEL_SEG` (0x0E) |
| ⚡ **Hardware-speed lookups via eBPF** | Hot vectors are served directly in the Linux kernel data path, bypassing userspace entirely. Three real C programs handle distance, filtering, and routing. | `EBPF_SEG` (0x0F) |
| 🌐 **Runs in any browser** | A 5.5 KB WebAssembly runtime lets the same file serve queries in a browser tab with zero backend. | `WASM_SEG` |

#### 🧠 AI & Data Storage

| Capability | How | Segment |
|------------|-----|---------|
| 🧠 **Ship models, graphs, and quantum state** | One file carries LoRA fine-tune weights, graph neural network state, and quantum circuit snapshots alongside vectors. No separate model registry needed. | `OVERLAY` / `GRAPH` / `SKETCH` |
| 🌿 **Git-like branching** | Create a child file that shares all parent data. Only changed vectors are copied. A 1M-vector parent with 100 edits produces a ~2.5 MB child instead of a 512 MB copy. | `COW_MAP` / `MEMBERSHIP` (0x20-0x23) |
| 📊 **Instant queries while loading** | Start answering queries at 70% accuracy immediately. Accuracy improves to 95%+ as the full index loads in the background. No waiting. | `INDEX_SEG` |
| 🔍 **Search with filters** | Combine vector similarity with metadata conditions like "genre = sci-fi AND year > 2020" in a single query. | `META_IDX_SEG` (0x0D) |
| 💥 **Never corrupts on crash** | Power loss mid-write? The file is always readable. Append-only design means incomplete writes are simply ignored on recovery. No write-ahead log needed. | Format rule |

#### 🔐 Security & Trust

RVF treats security as a structural property of the format, not an afterthought. Every segment can be individually signed, every operation is hash-chained into a tamper-evident ledger, and every derived file carries a cryptographic link to its parent. The result: you can hand someone a `.rvf` file and they can independently verify what data is inside, who produced it, what operations were performed, and whether anything was altered, all without trusting the sender.

| Capability | How | Segment |
|------------|-----|---------|
| 🔗 **Tamper-evident audit trail** | Every insert, query, and deletion is recorded in a SHAKE-256 hash-linked chain. Change one byte anywhere and the entire chain fails verification. | `WITNESS_SEG` (0x0A) |
| 🔐 **Kernel locked to its data** | A 128-byte `KernelBinding` footer ties each signed kernel to its manifest hash. Prevents segment-swap attacks: the kernel only boots if the data it was built for is present and unmodified. | `KERNEL_SEG` + `CRYPTO_SEG` |
| 🛡️ **Quantum-safe signatures** | Segments can be signed with ML-DSA-65 (FIPS 204) and SLH-DSA-128s alongside Ed25519. Dual-signing means files stay trustworthy even after quantum computers break classical crypto. | `CRYPTO_SEG` (0x0C) |
| 🧬 **Track where data came from** | Every file records its parent, grandparent, and full derivation history with cryptographic hashes (DNA-style lineage). Verify that a child was legitimately derived from its parent without accessing the parent file. | `MANIFEST_SEG` |
| 🏛️ **TEE attestation** | Record hardware attestation quotes from Intel SGX, AMD SEV-SNP, Intel TDX, and ARM CCA. Proves vector operations ran inside a verified secure enclave. | `CRYPTO_SEG` |
| 🛡️ **Adversarial hardening** | Input validation, rate limiting, and resource exhaustion guards. Declarative `SecurityPolicy` configuration prevents denial-of-service and malformed-input attacks. | Runtime |

#### 📦 Ecosystem & Tooling

| Capability | How | Segment |
|------------|-----|---------|
| 🤖 **Plug into AI agents** | An MCP server lets Claude Code, Cursor, and other AI tools create, query, and manage vector stores directly. | npm package |
| 📦 **Use from any language** | Published as 14 Rust crates, 6 adapters, 4 npm packages, a CLI tool, and an HTTP server. Works from Rust, Node.js, browsers, and the command line. | 14 crates + 6 adapters + 4 npm |
| ♻️ **Always backward-compatible** | Old tools skip new segment types they don't understand. A file with COW branching still works in a reader that only knows basic vectors. | Format rule |

```
📦 Anatomy of a .rvf Cognitive Container (24 segment types)

┌─────────────────────────────────────────────────────────────┐
│                          .rvf file                          │
├──────────────────────────┬──────────────────────────────────┤
│ 📋 Core Data             │ 🧠 AI & Models                   │
│ MANIFEST   (4 KB root)   │ OVERLAY  (LoRA deltas)           │
│ VEC_SEG    (embeddings)  │ GRAPH    (GNN state)             │
│ INDEX_SEG  (HNSW graph)  │ SKETCH   (quantum / VQE)         │
│ QUANT      (codebooks)   │ META     (key-value)             │
│ HOT        (promoted)    │ PROFILE  (domain config)         │
│ META_IDX   (filter idx)  │ JOURNAL  (mutations)             │
├──────────────────────────┼──────────────────────────────────┤
│ 🌿 COW Branching         │ 🔐 Security & Trust              │
│ COW_MAP    (ownership)   │ WITNESS  (audit chain)           │
│ REFCOUNT   (ref counts)  │ CRYPTO   (signatures)            │
│ MEMBERSHIP (visibility)  │ KERNEL   (Linux + binding)       │
│ DELTA      (sparse patch)│ EBPF     (XDP / TC / socket)     │
│                          │ WASM     (5.5 KB runtime)        │
├──────────────────────────┴──────────────────────────────────┤
│                                                             │
│  Store it ─── single-file vector DB, no external deps       │
│  Ship it  ─── wire-format streaming, one file = one unit    │
│  Run it   ─── boots Linux, runs in browser, eBPF in kernel  │
│  Trust it ─── witness chain + attestation + PQ signatures   │
│  Branch it ── COW at cluster granularity, <3 ms             │
│  Track it ─── DNA-style lineage from parent to child        │
│                                                             │
│              ┌──────────┐         ┌──────────┐              │
│              │ 🖥️ Boots │         │ 🌐 Runs  │              │
│              │ as Linux │         │ in any   │              │
│              │ microVM  │         │ browser  │              │
│              │ <125 ms  │         │ 5.5 KB   │              │
│              └──────────┘         └──────────┘              │
└─────────────────────────────────────────────────────────────┘
```

The same `.rvf` file runs on servers, browsers (WASM), edge devices, TEE enclaves, Firecracker microVMs, and in the Linux kernel data path (eBPF), with no conversion, no re-indexing, and no external dependencies.
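
To make the tamper-evident audit trail from the Security & Trust table more concrete, here is a minimal sketch of a hash-linked log in Rust. It is not the RVF implementation: the `WitnessEntry` struct, its field names, and the helper functions are hypothetical, and the standard library's `DefaultHasher` stands in for SHAKE-256 only so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One entry in a hash-linked log.
/// Hypothetical layout for illustration, not the WITNESS_SEG wire format.
#[derive(Debug, Clone)]
struct WitnessEntry {
    prev_hash: u64, // digest of the previous entry (0 for the first entry)
    action: String, // what happened, e.g. "insert id=1"
    hash: u64,      // digest over (prev_hash, action)
}

/// Stand-in digest. RVF is described as using SHAKE-256; DefaultHasher is
/// NOT cryptographic and is used here only to keep the sketch dependency-free.
fn digest(prev_hash: u64, action: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    action.hash(&mut h);
    h.finish()
}

/// Append an entry whose hash commits to the previous entry's hash.
fn append(chain: &mut Vec<WitnessEntry>, action: &str) {
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    let hash = digest(prev_hash, action);
    chain.push(WitnessEntry { prev_hash, action: action.to_string(), hash });
}

/// Return the index of the first entry that fails verification,
/// or None if every link checks out.
fn first_broken_link(chain: &[WitnessEntry]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, e) in chain.iter().enumerate() {
        if e.prev_hash != expected_prev || e.hash != digest(e.prev_hash, &e.action) {
            return Some(i);
        }
        expected_prev = e.hash;
    }
    None
}

fn main() {
    let mut chain = Vec::new();
    for action in ["insert id=1", "query k=10", "delete id=1"] {
        append(&mut chain, action);
    }
    assert_eq!(first_broken_link(&chain), None);

    // Tamper with the middle entry: its stored hash no longer matches.
    chain[1].action = "query k=1000".to_string();
    assert_eq!(first_broken_link(&chain), Some(1));
    println!("tampering detected at entry 1");
}
```

Because each hash covers the previous hash, an attacker who rewrites one entry would also have to recompute every later entry, which is what a signature over the chain head is meant to prevent.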
---

## 📦 Published Packages

### Rust Crates (crates.io)

| Crate | Version | Description |
|-------|---------|-------------|
| [`rvf-types`](https://crates.io/crates/rvf-types) | 0.2.0 | Segment types, 24 headers, quality, security, AGI container types (`no_std`) |
| [`rvf-wire`](https://crates.io/crates/rvf-wire) | 0.1.0 | Wire format read/write (`no_std`) |
| [`rvf-manifest`](https://crates.io/crates/rvf-manifest) | 0.1.0 | Two-level manifest, FileIdentity, COW pointers |
| [`rvf-quant`](https://crates.io/crates/rvf-quant) | 0.1.0 | Scalar, product, and binary quantization |
| [`rvf-index`](https://crates.io/crates/rvf-index) | 0.1.0 | HNSW progressive indexing (Layer A/B/C) |
| [`rvf-crypto`](https://crates.io/crates/rvf-crypto) | 0.2.0 | SHAKE-256, Ed25519, witness chains, seed crypto |
| [`rvf-runtime`](https://crates.io/crates/rvf-runtime) | 0.2.0 | Full store API, COW engine, AGI containers, QR seeds, safety net |
| [`rvf-kernel`](https://crates.io/crates/rvf-kernel) | 0.1.0 | Linux kernel builder, initramfs, Docker pipeline |
| [`rvf-ebpf`](https://crates.io/crates/rvf-ebpf) | 0.1.0 | BPF C compiler (XDP, socket filter, TC) |
| [`rvf-launch`](https://crates.io/crates/rvf-launch) | 0.1.0 | QEMU microvm launcher, KVM/TCG, QMP |
| [`rvf-server`](https://crates.io/crates/rvf-server) | 0.1.0 | HTTP REST + TCP streaming server |
| [`rvf-import`](https://crates.io/crates/rvf-import) | 0.1.0 | JSON, CSV, NumPy importers |
| [`rvf-cli`](https://crates.io/crates/rvf-cli) | 0.1.0 | Unified CLI with 17 subcommands |
| [`rvf-solver-wasm`](https://crates.io/crates/rvf-solver-wasm) | 0.1.0 | Thompson Sampling temporal solver (WASM, `no_std`) |

### npm Packages (npmjs.org)

| Package | Version | Description |
|---------|---------|-------------|
| [`@ruvector/rvf`](https://www.npmjs.com/package/@ruvector/rvf) | 0.1.0 | Unified TypeScript SDK |
| [`@ruvector/rvf-node`](https://www.npmjs.com/package/@ruvector/rvf-node) | 0.1.0 | Node.js N-API native bindings |
| [`@ruvector/rvf-wasm`](https://www.npmjs.com/package/@ruvector/rvf-wasm) | 0.1.0 | WASM browser package |
| [`@ruvector/rvf-mcp-server`](https://www.npmjs.com/package/@ruvector/rvf-mcp-server) | 0.1.0 | MCP server for AI agents |

### Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| **Linux** (x86_64, aarch64) | Full | KVM acceleration, eBPF, SIMD (AVX2/NEON) |
| **macOS** (x86_64, Apple Silicon) | Full | TCG fallback for QEMU, NEON SIMD on ARM |
| **Windows** (x86_64) | Core | Store, query, index, crypto work. QEMU launcher requires WSL or Windows QEMU. |
| **WASM** (browser, edge) | Full | 5.5 KB microkernel, ~46 KB control plane |
| **no_std** (embedded) | Types only | `rvf-types` and `rvf-wire` are `no_std` compatible |

---

## 🚀 Quick Start

### Install

```bash
# Rust crate (library)
cargo add rvf-runtime

# CLI tool
cargo install rvf-cli
# or build from source:
cd crates/rvf && cargo build -p rvf-cli --release

# Node.js / npm
npm install @ruvector/rvf-node

# WASM (browser / edge)
rustup target add wasm32-unknown-unknown
cargo build -p rvf-wasm --target wasm32-unknown-unknown --release
# → target/wasm32-unknown-unknown/release/rvf_wasm.wasm (~46 KB)

# MCP Server (for Claude Code, Cursor, etc.)
npx @ruvector/rvf-mcp-server --transport stdio
```

### Rust Crate

```toml
# Cargo.toml
[dependencies]
rvf-runtime = "0.2"   # full store API
rvf-types = "0.2"     # types only (no_std)
rvf-wire = "0.1"      # wire format (no_std)
rvf-crypto = "0.2"    # signatures + witness chains
rvf-import = "0.1"    # JSON/CSV/NumPy importers
```

```rust
use rvf_runtime::{RvfStore, options::{RvfOptions, QueryOptions, DistanceMetric}};
use rvf_types::DerivationType; // needed for `derive` below

let mut store = RvfStore::create("vectors.rvf", RvfOptions {
    dimension: 384,
    metric: DistanceMetric::Cosine,
    ..Default::default()
})?;

// Insert
store.ingest_batch(&[&embedding], &[1], None)?;

// Query
let results = store.query(&query, 10, &QueryOptions::default())?;

// Derive a child with lineage tracking
let child = store.derive("child.rvf", DerivationType::Filter, None)?;

// Embed a kernel: the file now boots as a microservice
store.embed_kernel(0x00, 0x01, 0, &kernel_image, 8080, None)?;

store.close()?;
```

### Node.js / npm

```bash
npm install @ruvector/rvf-node
```

```javascript
const { RvfDatabase } = require('@ruvector/rvf-node');

// Create, insert, query
const db = RvfDatabase.create('vectors.rvf', { dimension: 384 });
db.ingestBatch(new Float32Array(384), [1]);
const results = db.query(new Float32Array(384), 10);

// Lineage & inspection
console.log(db.fileId());     // unique file UUID
console.log(db.dimension());  // 384
console.log(db.segments());   // [{ type, id, size }]

db.close();
```

### WASM (Browser / Edge)

The WASM binary is **~46 KB** (control plane with in-memory store) or **~5.5 KB** (tile microkernel for Cognitum). No backend required.

### CLI

```bash
# Full lifecycle from the command line
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf                      # show all segments
rvf compact vectors.rvf                      # reclaim deleted space
rvf derive parent.rvf child.rvf --type filter
rvf serve vectors.rvf --port 8080

# Machine-readable output
rvf status vectors.rvf --json
```

### Lightweight (rvlite)

```rust
use rvf_adapter_rvlite::{RvliteCollection, RvliteConfig};

let mut col = RvliteCollection::create(RvliteConfig::new("vectors.rvf", 128))?;
col.add(1, &[0.1; 128])?;
let matches = col.search(&[0.15; 128], 5);
```

### Generate Sample Files

```bash
cd examples/rvf
cargo run --example generate_all
ls output/   # 46 .rvf files ready to inspect
rvf status output/sealed_engine.rvf
rvf inspect output/linux_microkernel.rvf
```

---

## 📋 What RVF Contains

An RVF file is a sequence of typed segments. Each segment is self-describing, 64-byte aligned, and independently integrity-checked. The format supports 24 segment types that together constitute a complete cognitive runtime:

```
.rvf file (Sealed Cognitive Engine)
 |
 +-- MANIFEST_SEG .... 4 KB root manifest, segment directory, instant boot
 +-- VEC_SEG ......... Vector embeddings (fp16/fp32/int8/int4/binary)
 +-- INDEX_SEG ....... HNSW progressive index (Layer A/B/C)
 +-- OVERLAY_SEG ..... LoRA adapter deltas, incremental updates
 +-- GRAPH_SEG ....... GNN adjacency, edge weights, graph state
 +-- QUANT_SEG ....... Quantization codebooks (scalar/PQ/binary)
 +-- SKETCH_SEG ...... Access sketches, VQE snapshots, quantum state
 +-- META_SEG ........ Key-value metadata, observation-state
 +-- WITNESS_SEG ..... Tamper-evident audit trails, attestation records
 +-- CRYPTO_SEG ...... ML-DSA-65 / Ed25519 signatures, sealed keys
 +-- WASM_SEG ........ 5.5 KB query microkernel (Tier 1: browser/edge)
 +-- EBPF_SEG ........ eBPF fast-path program (Tier 2: kernel acceleration)
 +-- KERNEL_SEG ...... Compressed unikernel (Tier 3: self-booting service)
 +-- PROFILE_SEG ..... Domain profile (RVDNA/RVText/RVGraph/RVVision)
 +-- HOT_SEG ......... Temperature-promoted hot data
 +-- META_IDX_SEG .... Metadata inverted indexes for filtered search
 +-- COW_MAP_SEG ..... Cluster ownership map for COW branching (0x20)
 +-- REFCOUNT_SEG .... Cluster reference counts, rebuildable (0x21)
 +-- MEMBERSHIP_SEG .. Vector visibility filter for branches (0x22)
 +-- DELTA_SEG ....... Sparse delta patches / LoRA overlays (0x23)
 +-- TRANSFER_PRIOR .. Transfer learning priors (0x30)
 +-- POLICY_KERNEL ... Thompson Sampling policy state (0x31)
 +-- COST_CURVE ...... Cost/reward curves for solver (0x32)
```

---

## 🧠 Sealed Cognitive Engines

When an RVF file combines vectors, models, compute, and trust segments, it becomes a **deployable intelligence capsule**:

### Example: Domain Intelligence Unit

```
ClinicalOncologyEngine.rvdna (one file, ~50 MB)
  Contains:
  -- Medical corpus embeddings       VEC_SEG      384-dim, 2M vectors
  -- MicroLoRA oncology fine-tune    OVERLAY_SEG  adapter deltas
  -- Biological pathway GNN          GRAPH_SEG    pathway modeling
  -- Molecular similarity state      SKETCH_SEG   quantum-enhanced
  -- Linux microkernel service       KERNEL_SEG   boots on Firecracker
  -- Browser query runtime           WASM_SEG     5.5 KB, no backend
  -- eBPF drug lookup accelerator    EBPF_SEG     sub-microsecond
  -- Attested execution proof        WITNESS_SEG  tamper-evident chain
  -- Post-quantum signature          CRYPTO_SEG   ML-DSA-65
```

This is not a database. It is a **sealed, auditable, self-booting domain expert**. Copy it to a Firecracker VM and it boots a Linux service. Open it in a browser and WASM serves queries locally. Ship it air-gapped and it produces identical results under audit.
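
A capsule like the one above works in older or smaller readers because every segment is typed and 64-byte aligned, so a reader can step over anything it does not understand. The following sketch shows that idea under stated assumptions: the field offsets follow the Magic/Ver/Type/Flags/SegmentID/Size order that this README gives for the segment header, but the `RVFS` magic value, little-endian byte order, and the reading of `Size` as the unpadded payload length are guesses made for illustration, not taken from the specification.

```rust
const HEADER_LEN: usize = 64; // segment header size stated in this README
const ALIGN: usize = 64; // segments are 64-byte aligned
const MAGIC: [u8; 4] = *b"RVFS"; // hypothetical magic value

fn align_up(n: usize) -> usize {
    (n + ALIGN - 1) / ALIGN * ALIGN
}

fn read_u64_le(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

/// Build one segment for the demo: header, payload, zero padding to 64 bytes.
fn build_segment(seg_type: u8, seg_id: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; HEADER_LEN];
    out[0..4].copy_from_slice(&MAGIC);
    out[4] = 1; // version
    out[5] = seg_type; // bytes 6..8 are flags, left as zero
    out[8..16].copy_from_slice(&seg_id.to_le_bytes());
    out[16..24].copy_from_slice(&(payload.len() as u64).to_le_bytes());
    // bytes 24..64 would hold the hash and remaining fields; zero here
    out.extend_from_slice(payload);
    out.resize(align_up(out.len()), 0);
    out
}

/// Walk the file. Returns (type, id, payload_len) for every segment whose
/// type is in `known`, plus the number of complete segments skipped.
fn scan(file: &[u8], known: &[u8]) -> (Vec<(u8, u64, usize)>, usize) {
    let mut found = Vec::new();
    let mut skipped = 0;
    let mut pos = 0;
    while pos + HEADER_LEN <= file.len() {
        let h = &file[pos..pos + HEADER_LEN];
        if h[..4] != MAGIC[..] {
            break; // not a segment header: stop scanning
        }
        let seg_type = h[5];
        let seg_id = read_u64_le(&h[8..16]);
        let size = read_u64_le(&h[16..24]) as usize;
        let end = pos + HEADER_LEN + size;
        if end > file.len() {
            break; // incomplete trailing write: ignore it
        }
        if known.contains(&seg_type) {
            found.push((seg_type, seg_id, size));
        } else {
            skipped += 1; // unknown type: step over it
        }
        pos = align_up(end);
    }
    (found, skipped)
}

fn main() {
    let mut file = Vec::new();
    file.extend(build_segment(0x0A, 1, b"witness entries")); // WITNESS_SEG
    file.extend(build_segment(0x0E, 2, &[0u8; 100])); // KERNEL_SEG
    file.extend(build_segment(0x20, 3, b"cow map")); // COW_MAP_SEG
    // Simulate a crash mid-append: header written, payload mostly missing.
    let partial = build_segment(0x0A, 4, &[0u8; 500]);
    file.extend_from_slice(&partial[..HEADER_LEN + 10]);

    // A reader that only understands WITNESS_SEG (0x0A).
    let (found, skipped) = scan(&file, &[0x0A]);
    assert_eq!(found, vec![(0x0Au8, 1u64, 15usize)]);
    assert_eq!(skipped, 2);
    println!("read {} segment(s), skipped {}", found.len(), skipped);
}
```

The same loop covers both format rules mentioned earlier: unknown segment types are skipped rather than rejected, and a truncated tail left by an interrupted append is ignored instead of corrupting the read.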
---

## 🔌 RuVector Ecosystem Integration

RVF is the canonical binary format across 87+ Rust crates in the RuVector ecosystem:

| Domain | Crates | RVF Segment |
|--------|--------|-------------|
| **LLM Inference** | `ruvllm`, `ruvllm-cli`, `ruvllm-wasm` | VEC_SEG (KV cache), OVERLAY_SEG (LoRA) |
| **Attention** | `ruvector-attention`, coherence-gated transformer | VEC_SEG, INDEX_SEG |
| **GNN** | `ruvector-gnn`, `ruvector-graph`, graph-node/wasm | GRAPH_SEG |
| **Quantum** | `ruQu`, `ruqu-core`, `ruqu-algorithms`, `ruqu-exotic` | SKETCH_SEG (VQE, syndrome tables) |
| **Min-Cut Coherence** | `ruvector-mincut`, mincut-gated-transformer | GRAPH_SEG, INDEX_SEG |
| **Delta Tracking** | `ruvector-delta-core`, delta-graph, delta-index | OVERLAY_SEG, JOURNAL_SEG |
| **Neural Routing** | `ruvector-tiny-dancer-core` (FastGRNN) | VEC_SEG, META_SEG |
| **Sparse Inference** | `ruvector-sparse-inference` | VEC_SEG, QUANT_SEG |
| **Temporal Tensors** | `ruvector-temporal-tensor` | VEC_SEG, META_SEG |
| **Cognitum Silicon** | `cognitum-gate-kernel`, `cognitum-gate-tilezero` | WASM_SEG (64 KB tiles) |
| **SONA Learning** | `sona` (self-optimizing neural arch) | VEC_SEG, WITNESS_SEG |
| **Agent Memory** | claude-flow, agentdb, agentic-flow, ospipe | All segments via adapters |

The same `.rvf` file format runs on cloud servers, Firecracker microVMs, TEE enclaves, edge devices, Cognitum tiles, and in the browser.

---

## ✨ Features

### Storage & Indexing

| Feature | Description |
|---------|-------------|
| **Append-only segments** | Crash-safe without WAL. Every write is atomic with per-segment integrity checksums. |
| **Progressive indexing** | Three-tier HNSW (Layer A/B/C). First query at 70% recall before full index loads. |
| **Temperature-tiered quantization** | Hot vectors stay fp16, warm use product quantization, cold use binary, all automatically. |
| **Metadata filtering** | Filtered k-NN with boolean expressions (AND/OR/NOT/IN/RANGE). |
| **4 KB instant boot** | Root manifest fits in one page read. Cold boot < 5 ms. |
| **24 segment types** | VEC, INDEX, MANIFEST, QUANT, WITNESS, CRYPTO, KERNEL, EBPF, WASM, COW_MAP, MEMBERSHIP, DELTA, TRANSFER_PRIOR, POLICY_KERNEL, COST_CURVE, and 9 more. |

### COW Branching (RVCOW)

| Feature | Description |
|---------|-------------|
| **COW branching** | Git-like copy-on-write at cluster granularity. Derive child stores that share parent data; only changed clusters are copied. |
| **Membership filters** | Shared HNSW index across branches with bitmap visibility control. Include/exclude modes. |
| **Snapshot freeze** | Immutable snapshot at any generation. Metadata-only operation, no data copy. |
| **Delta segments** | Sparse patches for LoRA overlays. Hot-path guard upgrades to full slab. |
| **Rebuildable refcounts** | No WAL. Refcounts derived from COW map chain during compaction. |

### Ecosystem & Tooling

| Feature | Description |
|---------|-------------|
| **Domain profiles** | `.rvdna`, `.rvtext`, `.rvgraph`, `.rvvis` extensions map to optimized profiles. |
| **Unified CLI** | 17 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts. |
| **6 library adapters** | Drop-in integration for claude-flow, agentdb, ospipe, agentic-flow, rvlite, sona. |
| **MCP server** | Model Context Protocol integration for Claude Code, Cursor, and AI agents. |
| **Node.js bindings** | N-API bindings with lineage, kernel/eBPF, and inspection support. |

---

## 🏗️ Architecture

```
+-----------------------------------------------------------------+
|                         Cognitive Layer                         |
| ruvllm (LLM)       | ruvector-gnn (GNN) | ruQu (Quantum)        |
| ruvector-attention | sona (SONA)        | ruvector-mincut       |
+---+------------------+-----------------+-----------+------------+
    |                  |                 |           |
+---v------------------v-----------------v-----------v------------+
|                    Agent & Application Layer                    |
| claude-flow | agentdb | agentic-flow | ospipe | rvlite          |
+---+------------------+-----------------+-----------+------------+
    |                  |                 |           |
+---v------------------v-----------------v-----------v------------+
|                          RVF SDK Layer                          |
| rvf-runtime | rvf-index | rvf-quant | rvf-crypto | rvf-wire     |
| rvf-manifest | rvf-types | rvf-import | rvf-adapters            |
+---+--------+---------+----------+-----------+------------------+
    |        |         |          |           |
+---v---+ +--v----+ +--v-----+  +-v--------+ +v-----------+ +v------+
|server | | node  | | wasm   |  | kernel   | | ebpf       | | cli   |
|HTTP   | | N-API | | ~46 KB |  |bzImage+  | |clang BPF   | |17 cmds|
|REST+  | |       | |        |  |initramfs | |XDP/TC/sock | |       |
|TCP    | |       | |        |  +----------+ +------------+ +-------+
+-------+ +-------+ +--------+  +-v--------+
                                | launch   |
                                |QEMU+QMP  |
                                +----------+
```

### Segment Model

An `.rvf` file is a sequence of 64-byte-aligned segments. Each segment has a self-describing header:

```
+--------+------+-------+--------+-----------+-------+----------+
| Magic  | Ver  | Type  | Flags  | SegmentID | Size  | Hash     |
| 4B     | 1B   | 1B    | 2B     | 8B        | 8B    | 16B ...  |
+--------+------+-------+--------+-----------+-------+----------+
| Payload (variable length, 64-byte aligned)                     |
+----------------------------------------------------------------+
```

### Crate Map

| Crate | Lines | Purpose |
|-------|------:|---------|
| `rvf-types` | 7,000+ | 24 segment types, AGI container, quality, security, WASM bootstrap, QR seed (`no_std`) |
| `rvf-wire` | 2,011 | Wire format read/write (`no_std`) |
| `rvf-manifest` | 1,700+ | Two-level manifest with 4 KB root, FileIdentity codec, COW pointers, double-root scheme |
| `rvf-index` | 2,691 | HNSW progressive indexing (Layer A/B/C) |
| `rvf-quant` | 1,443 | Scalar, product, and binary quantization |
| `rvf-crypto` | 1,725 | SHAKE-256, Ed25519, witness chains, attestation, seed crypto |
| `rvf-runtime` | 8,000+ | Full store API, COW engine, AGI containers, QR seeds, safety net, adversarial defense |
| `rvf-kernel` | 2,400+ | Real Linux kernel builder, cpio/newc initramfs, Docker build, SHA3-256 verification |
| `rvf-launch` | 1,200+ | QEMU microvm launcher, KVM/TCG detection, QMP shutdown protocol |
| `rvf-ebpf` | 1,100+ | Real BPF C compiler (XDP, socket filter, TC), vmlinux.h generation |
| `rvf-wasm` | 1,700+ | WASM control plane: in-memory store, query, segment inspection, witness chain verification (~46 KB) |
| `rvf-solver-wasm` | 1,500+ | Thompson Sampling temporal solver, PolicyKernel, three-loop architecture (`no_std`) |
| `rvf-node` | 852 | Node.js N-API bindings with lineage, kernel/eBPF, and inspection |
| `rvf-cli` | 1,800+ | Unified CLI with 17 subcommands (create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts) |
| `rvf-server` | 1,165 | HTTP REST + TCP streaming server |
| `rvf-import` | 980 | JSON, CSV, NumPy (.npy) importers |
| **Adapters** | **6,493** | **6 library integrations (see below)** |

---

## ⚡ Performance

| Metric | Target | Achieved |
|--------|--------|----------|
| Cold boot (4 KB manifest read) | < 5 ms | **1.6 us** |
| First query recall@10 (Layer A only) | >= 0.70 | >= 0.70 |
| Full quality recall@10 (Layer C) | >= 0.95 | >= 0.95 |
| WASM binary (tile microkernel) | < 8 KB | **~5.5 KB** |
| WASM binary (control plane) | < 50 KB | **~46 KB** |
| Segment header size | 64 bytes | 64 bytes |
| Minimum file overhead | < 1 KB | < 256 bytes |
| COW branch creation (10K vecs) | < 10 ms | **2.6 ms** (child = 162 bytes) |
| COW branch creation (100K vecs) | < 50 ms | **6.8 ms** (child = 162 bytes) |
| COW read (local cluster, pread) | < 5 us | **1,348 ns/vector** |
| COW read (inherited from parent) | < 5 us | **1,442 ns/vector** |
| Write coalescing (32 vecs, 1 cluster) | 1 COW event | **654 us**, 1 event |
| CowMap lookup | < 100 ns | **28 ns** |
| Membership filter contains() | < 100 ns | **23-33 ns** |
| Snapshot freeze | < 100 ns | **30-52 ns** |

### Progressive Loading

RVF doesn't make you wait for the full index:

| Stage | Data Loaded | Recall@10 | Latency |
|-------|-------------|-----------|---------|
| **Layer A** | Entry points + centroids | >= 0.70 | < 5 ms |
| **Layer B** | Hot region adjacency | >= 0.85 | ~10 ms |
| **Layer C** | Full HNSW graph | >= 0.95 | ~50 ms |

---

## 📊 Comparison

| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---------|-----|-------|-------|--------|--------|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | Needs WAL | Needs WAL |
| Progressive loading | Yes (3 layers) | No | No | No | No |
| COW branching | Yes (cluster-level) | No | No | No | No |
| Membership filters | Yes (shared HNSW) | No | No | No | No |
| Snapshot freeze | Yes (zero-copy) | No | No | No | No |
| WASM support | Yes (5.5 KB) | No | No | No | No |
| Self-booting kernel | Yes (real Linux) | No | No | No | No |
| eBPF acceleration | Yes (XDP/TC/socket) | No | No | No | No |
| `no_std` compatible | Yes | No | No | No | No |
| Post-quantum sigs | Yes (ML-DSA-65) | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Temperature tiering | Automatic | No | Manual | No | No |
| Quantization | 3-tier auto | No | Yes (manual) | Yes | Yes |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Append-only | Yes | Build-once | Build-once | Log-based | Log-based |

### vs Docker / OCI Containers

| | RVF Cognitive Container | Docker / OCI |
|---|---|---|
| **File format** | Single `.rvf` file | Layered tarball images |
| **Boot target** | QEMU microVM (microvm machine) | Container runtime (runc, containerd) |
| **Vector data** | Native segment, HNSW-indexed | External volume mount |
| **Branching** | Vector-native COW at cluster granularity | Layer-based COW (filesystem) |
| **eBPF** | Embedded in file, verified | Separate deployment |
| **Attestation** | Witness chain + KernelBinding | External signing (cosign, notary) |
| **Size (hello world)** | ~17 KB (with initramfs + vectors) | ~5 MB (Alpine) |

### vs Traditional Vector Databases

| | RVF | Pinecone / Milvus / Qdrant |
|---|---|---|
| **Deployment** | Single file, zero dependencies | Server process + storage |
| **Branching** | Native COW, 2.6 ms for 10K vectors | Copy entire collection |
| **Multi-tenant** | Membership filter on shared index | Separate collections |
| **Edge deploy** | `scp file.rvf host:` + boot | Install + configure + import |
| **Provenance** | Cryptographic witness chain | External audit logs |
| **Compute** | Embedded kernel + eBPF | N/A |

### vs Git LFS / DVC

| | RVF COW | Git LFS / DVC |
|---|---|---|
| **Granularity** | Vector cluster (256 KB) | Whole file |
| **Index sharing** | Shared HNSW + membership filter | No index awareness |
| **Query during branch** | Yes, sub-microsecond | No query capability |
| **Delta encoding** | Sparse row patches (LoRA) | Binary diff |

### vs SQLite / DuckDB

| | RVF | SQLite | DuckDB |
|---|---|---|---|
| **Vector-native** | Yes (HNSW, quantization, COW) | No (extension needed) | No (extension needed) |
| **Self-booting** | Yes (KERNEL_SEG) | No | No |
| **eBPF acceleration** | Yes (XDP, TC, socket) | No | No |
| **Cryptographic audit** | Yes (witness chains) | No | No |
| **Progressive loading** | 3-tier HNSW (70% → 95% recall) | N/A | N/A |
| **WASM support** | 5.5 KB microkernel | Yes (via wasm) | No |
| **Single file** | Yes | Yes | Yes |

---

## 🧬 Lineage Provenance

RVF supports DNA-style derivation chains for tracking how files were produced from one another. Each `.rvf` file carries a 68-byte `FileIdentity` recording its unique ID, its parent's ID, and a cryptographic hash of the parent's manifest. This enables tamper-evident provenance verification from any file back to its root ancestor.

```
parent.rvf          child.rvf           grandchild.rvf
(depth=0)           (depth=1)           (depth=2)
file_id: AAA        file_id: BBB        file_id: CCC
parent_id: 000      parent_id: AAA      parent_id: BBB
parent_hash: 000    parent_hash: H(A)   parent_hash: H(B)
    |                   |                   |
    +-------derive------+-------derive------+
```

### Domain Profiles & Extension Aliasing

Domain-specific extensions are automatically mapped to optimized profiles.
The authoritative profile lives in the `Level0Root.profile_id` field; the file extension is a convenience hint:

| Extension | Domain Profile | Optimized For |
|-----------|---------------|---------------|
| `.rvf` | Generic | General-purpose vectors |
| `.rvdna` | RVDNA | Genomic sequence embeddings |
| `.rvtext` | RVText | Language model embeddings |
| `.rvgraph` | RVGraph | Graph/network node embeddings |
| `.rvvis` | RVVision | Image/vision model embeddings |

### Deriving a Child Store

```rust
use rvf_runtime::{RvfStore, options::{RvfOptions, DistanceMetric}};
use rvf_types::DerivationType;
use std::path::Path;

let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::Cosine,
    ..Default::default()
};
let parent = RvfStore::create(Path::new("parent.rvf"), options)?;

// Derive a filtered child -- inherits dimensions and options
let child = parent.derive(
    Path::new("child.rvf"),
    DerivationType::Filter,
    None,
)?;

assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
```

---

## 🖥️ Self-Booting RVF (Cognitive Container)

RVF supports an optional three-tier execution model that allows a single `.rvf` file to carry executable compute alongside its vector data. A file can serve queries from a browser (Tier 1 WASM), accelerate hot-path lookups in the Linux kernel (Tier 2 eBPF), or boot as a standalone microservice inside a Firecracker microVM or TEE enclave (Tier 3 unikernel), all from the same file.
| Tier | Segment | Size | Environment | Boot Time | Use Case | |------|---------|------|-------------|-----------|----------| | **1: WASM** | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere | | **2: eBPF** | EBPF_SEG (`0x0F`) | 10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits | | **3: Unikernel** | KERNEL_SEG (`0x0E`) | 200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service | Readers that do not recognize KERNEL_SEG or EBPF_SEG skip them per the RVF forward-compatibility rule. The computational capability is purely additive. ### Embedding a Kernel ```rust use rvf_runtime::RvfStore; use rvf_types::kernel::{KernelArch, KernelType}; use std::path::Path; let mut store = RvfStore::open(Path::new("vectors.rvf"))?; // Embed a compressed unikernel image store.embed_kernel( KernelArch::X86_64 as u8, // arch KernelType::Hermit as u8, // kernel type 0x0018, // flags: HAS_QUERY_API | HAS_NETWORKING &compressed_kernel_image, // kernel binary 8080, // API port Some("console=ttyS0 quiet"), // cmdline (optional) )?; // Later, extract it if let Some((header, image_data)) = store.extract_kernel()? { println!("Kernel: {:?} ({} bytes)", header.kernel_arch(), image_data.len()); } ``` ### Embedding an eBPF Program ```rust use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType}; // Embed an eBPF XDP program for fast-path vector lookup store.embed_ebpf( EbpfProgramType::XdpDistance as u8, // program type EbpfAttachType::XdpIngress as u8, // attach point 384, // max vector dimension &ebpf_bytecode, // BPF ELF object Some(&btf_section), // BTF data (optional) )?; if let Some((header, program_data)) = store.extract_ebpf()? 
{ println!("eBPF: {:?} ({} bytes)", header.program_type, program_data.len()); } ``` ### Security Model - **7-step fail-closed verification**: hash, signature, TEE measurement, all must pass before kernel boot - **Authority boundary**: guest kernel owns auth/audit/witness; host eBPF is acceleration-only - **Signing**: Ed25519 for development, ML-DSA-65 (FIPS 204) for production - **TEE priority**: SEV-SNP first, SGX second, ARM CCA third - **Size limits**: kernel images capped at 128 MiB, eBPF programs at 16 MiB For the full specification including wire formats, attestation binding, and implementation phases, see [ADR-030: RVF Cognitive Container](docs/adr/ADR-030-rvf-computational-container.md). ### End-to-End: Claude Code Appliance The `claude_code_appliance` example builds a complete self-booting AI development environment as a single `.rvf` file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain. **Prerequisites:** Docker (for kernel build), Rust 1.87+ ```bash # Build and run the example cd examples/rvf cargo run --example claude_code_appliance ``` **What it produces** (5.1 MB file): ``` claude_code_appliance.rvf ├── KERNEL_SEG Linux 6.8.12 bzImage (5.2 MB, x86_64) ├── EBPF_SEG Socket filter — allows ports 2222, 8080 only ├── VEC_SEG 20 package embeddings (128-dim) ├── INDEX_SEG HNSW graph for package search ├── WITNESS_SEG 6-entry tamper-evident audit trail ├── CRYPTO_SEG 3 Ed25519 SSH user keys (root, deploy, claude) ├── MANIFEST_SEG 4 KB root with segment directory └── Snapshot v1 derived image with lineage tracking ``` **Boot sequence** (once launched on Firecracker/QEMU): ``` 1. Firecracker loads KERNEL_SEG → Linux boots (<125 ms) 2. SSH server starts on port 2222 3. curl -fsSL https://claude.ai/install.sh | bash 4. RVF query server starts on port 8080 5. 
Claude Code ready for use ```

**Connect and use:**

```bash
# Boot the file (requires QEMU or Firecracker)
rvf launch claude_code_appliance.rvf

# SSH in
ssh -p 2222 deploy@localhost

# Query the package database
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'

# Or use the CLI
rvf query claude_code_appliance.rvf --vector "0.1,0.2,..." --k 5
```

**Verified output from the example run:**

```
=== Claude Code Appliance Summary ===
File size: 5,260,093 bytes (5.1 MB)
Segments: 8
Packages: 20 (203.1 MB manifest)
KERNEL_SEG: MicroLinux x86_64 (5,243,904 bytes)
EBPF_SEG: SocketFilter (3,805 bytes)
SSH users: 3 (Ed25519 signed, all verified)
Witness chain: 6 entries (tamper-evident, all verified)
Lineage: base + v1 snapshot (parent hash matches)
```

Final file: **5.1 MB single `.rvf`** that boots Linux, serves queries, and runs Claude Code.

One file. Boots Linux. Runs SSH. Serves vectors. Installs Claude Code. Proves every step.

### Launching with QEMU

```bash
# CLI launcher (auto-detects KVM or falls back to TCG)
rvf launch vectors.rvf

# Same launcher with explicit memory, CPU, and port-forwarding settings
rvf launch vectors.rvf --memory 512M --cpus 2 --port-forward 2222:22,8080:8080

# Extract the kernel and run QEMU manually (if you want full control)
rvf inspect vectors.rvf --segment kernel --output kernel.bin
qemu-system-x86_64 -M microvm -kernel kernel.bin -append "console=ttyS0" -nographic
```

### Building Your Own Bootable RVF

Step-by-step to create a self-booting `.rvf` from scratch: ```bash # 1. Create a vector store rvf create myservice.rvf --dimension 384 # 2. Ingest your data rvf ingest myservice.rvf --input embeddings.json --format json # 3. Build and embed a Linux kernel (uses Docker) rvf embed-kernel myservice.rvf --arch x86_64 # 4. Optionally embed an eBPF filter rvf embed-ebpf myservice.rvf --program filter.c # 5. Verify the result rvf inspect myservice.rvf # MANIFEST_SEG, VEC_SEG, INDEX_SEG, KERNEL_SEG, EBPF_SEG, WITNESS_SEG # 6.
Boot it rvf launch myservice.rvf ```

---

## 🔗 Library Adapters

RVF provides drop-in adapters for 6 libraries in the RuVector ecosystem:

| Adapter | Purpose | Key Feature |
|---------|---------|-------------|
| `rvf-adapter-claude-flow` | AI agent memory | WITNESS_SEG audit trails |
| `rvf-adapter-agentdb` | Agent vector database | Progressive HNSW indexing |
| `rvf-adapter-ospipe` | Observation-State pipeline | META_SEG for state vectors |
| `rvf-adapter-agentic-flow` | Swarm coordination | Inter-agent memory sharing |
| `rvf-adapter-rvlite` | Lightweight embedded store | Minimal API, edge-friendly |
| `rvf-adapter-sona` | Neural architecture | Experience replay + trajectories |

---

## 🤖 AGI Cognitive Container (ADR-036)

An AGI container packages a complete AI agent runtime into a single sealed `.rvf` file. Where the [Self-Booting RVF](#%EF%B8%8F-self-booting-rvf-cognitive-container) section covers the compute tiers (WASM/eBPF/Kernel), the AGI container adds the intelligence layer on top: model identity, orchestration config, tool registries, evaluation harnesses, authority controls, and coherence gates.
```
AGI Cognitive Container (.rvf)
├── Identity ────── container UUID, build UUID, model ID hash
├── Orchestrator ── Claude Code / Claude Flow config (JSON)
├── Tools ──────── MCP tool adapter registry
├── Agent Prompts ─ role definitions per agent type
├── Eval Harness ── task suite + grading rules
├── Skills ──────── promoted skill library
├── Policy ──────── governance rules + authority config
├── Coherence ───── min score, contradiction rate, rollback ratio
├── Resources ───── time/token/cost budgets with clamping
├── Replay ──────── automation script for deterministic re-execution
├── Kernel Config ─ boot parameters, network, SSH
├── Domain Profile ─ coding / research / ops specialization
└── Signature ───── HMAC-SHA256 or Ed25519 tamper seal
```

### Execution Modes

| Mode | Purpose | Requires |
|------|---------|----------|
| **Replay** | Deterministic re-execution from witness logs | Witness chain |
| **Verify** | Validate container integrity and run eval harness | Kernel + world model, or WASM + vectors |
| **Live** | Full autonomous operation with tool use | Kernel + world model |

### Authority Levels

Authority is hierarchical: each level permits everything below it.

| Level | Allows |
|-------|--------|
| `ReadOnly` | Read vectors, run queries |
| `WriteMemory` | + Write to vector store, update index |
| `ExecuteTools` | + Invoke MCP tools, run commands |
| `WriteExternal` | + Network access, file I/O, push to git |

Default authority per mode: Replay → ReadOnly, Verify → ExecuteTools, Live → WriteMemory.
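The hierarchy and the per-mode defaults above can be sketched with a derived ordering. This is a minimal illustration, not the `rvf-types` API: the `AuthorityLevel` and `ExecutionMode` enums and the `permits` and `default_authority` methods are hypothetical local definitions that only mirror the names used in the tables.

```rust
/// Hypothetical stand-in for the authority levels listed above.
/// Variants are declared from least to most privileged, so the derived
/// `Ord` encodes "each level permits everything below it".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AuthorityLevel {
    ReadOnly,
    WriteMemory,
    ExecuteTools,
    WriteExternal,
}

/// Hypothetical stand-in for the three execution modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    Replay,
    Verify,
    Live,
}

impl AuthorityLevel {
    /// True when an action needing `required` is allowed at this level.
    pub fn permits(self, required: AuthorityLevel) -> bool {
        self >= required
    }
}

impl ExecutionMode {
    /// Default authority per mode, copied from the text above.
    pub fn default_authority(self) -> AuthorityLevel {
        match self {
            ExecutionMode::Replay => AuthorityLevel::ReadOnly,
            ExecutionMode::Verify => AuthorityLevel::ExecuteTools,
            ExecutionMode::Live => AuthorityLevel::WriteMemory,
        }
    }
}
```

In this sketch, `ExecutionMode::Live.default_authority().permits(AuthorityLevel::ExecuteTools)` is `false`, which reflects the listed defaults: a Live container starts at `WriteMemory` and would need a higher level configured before it could invoke tools.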
### Resource Budgets

Every container carries hard limits that are clamped to safety maximums:

| Resource | Max | Default |
|----------|-----|---------|
| Time | 3,600 sec | 300 sec |
| Tokens | 1,000,000 | 100,000 |
| Cost | $10.00 | $1.00 |
| Tool calls | 500 | 100 |
| External writes | 50 | 10 |

### Coherence Gates

Coherence thresholds halt execution when the agent's world model drifts:

- `min_coherence_score` (0.0–1.0): minimum quality gate
- `max_contradiction_rate` (0.0–1.0): tolerable contradiction frequency
- `max_rollback_ratio` (0.0–1.0): ratio of rolled-back decisions

### Building a Container

```rust
use rvf_runtime::agi_container::AgiContainerBuilder;
use rvf_types::agi_container::*;

let (payload, header) = AgiContainerBuilder::new(container_id, build_id)
    .with_model_id("claude-opus-4-6")
    .with_orchestrator(b"{\"max_turns\":100}")
    .with_tool_registry(b"[{\"name\":\"search\",\"type\":\"rvf_query\"}]")
    .with_eval_tasks(b"[{\"id\":1,\"spec\":\"fix bug\"}]")
    .with_eval_graders(b"[{\"type\":\"test_pass\"}]")
    .with_authority_config(b"{\"level\":\"WriteMemory\"}")
    .with_coherence_config(b"{\"min_cut\":0.7,\"rollback\":true}")
    .with_project_instructions(b"# CLAUDE.md\nFix bugs, run tests.")
    .with_segments(ContainerSegments {
        kernel_present: true,
        manifest_present: true,
        world_model_present: true,
        ..Default::default()
    })
    .build_and_sign(signing_key)?;

// Parse and validate
let manifest = ParsedAgiManifest::parse(&payload)?;
assert_eq!(manifest.model_id_str(), Some("claude-opus-4-6"));
assert!(manifest.is_autonomous_capable());
assert!(header.is_signed());
```

See [ADR-036](../../docs/adr/ADR-036-agi-cognitive-container.md) for the full specification.

## 📱 QR Cognitive Seed (ADR-034)

A QR Cognitive Seed (RVQS) encodes a portable intelligence capsule into a scannable QR code. It carries bootstrap hosts, layer hashes, and cryptographic signatures in a compact binary format.
```rust
use rvf_runtime::seed_crypto;

let hash = seed_crypto::seed_content_hash(data);       // 8-byte SHAKE-256
let sig = seed_crypto::sign_seed(key, payload);        // 32-byte HMAC
let ok = seed_crypto::verify_seed(key, payload, &sig);
```

Types: `SeedHeader`, `HostEntry`, `LayerEntry` (rvf-types), plus `qr_encode` for QR matrix generation (rvf-runtime).

## 🔒 Quality & Safety Net

The quality system tracks retrieval fidelity across progressive index layers and enforces graceful degradation when budgets are exceeded.

- `RetrievalQuality`: Full / Partial / Degraded / Failed
- `ResponseQuality`: per-query quality metadata with evidence
- `SafetyNetBudget`: time, token, and cost budgets with automatic clamping
- `DegradationReport`: structured fallback path and reason tracking

## 🛡️ Security Modules

| Module | Crate | Purpose |
|--------|-------|---------|
| `SecurityPolicy` / `HardeningFields` | rvf-types | Declarative per-file security configuration |
| `adversarial` | rvf-runtime | Input validation, dimension/size checks at write boundary |
| `dos` | rvf-runtime | Rate limiting, resource exhaustion guards |
| `KernelBinding` | rvf-types | Binds signed kernels to specific manifest hashes |
| `verify_witness_chain` | rvf-crypto | SHAKE-256 chain integrity verification |

## 🧬 WASM Self-Bootstrapping (0x10)

WASM_SEG enables an RVF file to carry its own WASM interpreter, creating a three-layer bootstrap stack:

```
Raw bytes → WASM interpreter → microkernel → vector data
```

Types: `WasmRole` (Interpreter/Microkernel/Solver), `WasmTarget` (Browser/Node/Edge/Embedded), `WasmHeader` (rvf-types/wasm_bootstrap).

The `rvf-solver-wasm` crate implements a Thompson Sampling temporal solver as a `no_std` WASM module with `dlmalloc`, producing segment types `TRANSFER_PRIOR` (0x30), `POLICY_KERNEL` (0x31), and `COST_CURVE` (0x32).

---

Built with Rust. Not a database: a portable cognitive runtime.