<p align="center">
|
|
<strong>RVF Examples</strong> — Learn by Running
|
|
</p>
|
|
|
|
<p align="center">
|
|
<em>Hands-on examples for the unified agentic AI format — store it, send it, run it</em>
|
|
</p>
|
|
|
|
<p align="center">
|
|
<a href="#quick-start">Quick Start</a> •
|
|
<a href="#examples-at-a-glance">Examples</a> •
|
|
<a href="#features-covered">Features</a> •
|
|
<a href="#performance">Performance</a> •
|
|
<a href="#comparison">Comparison</a>
|
|
</p>
|
|
|
|
<p align="center">
|
|
<img alt="Examples" src="https://img.shields.io/badge/examples-40_runnable-brightgreen?style=flat-square" />
|
|
<img alt="Rust" src="https://img.shields.io/badge/rust-1.87%2B-orange?style=flat-square" />
|
|
<img alt="License" src="https://img.shields.io/badge/license-MIT%2FApache--2.0-blue?style=flat-square" />
|
|
<img alt="Tests" src="https://img.shields.io/badge/tests-453_passing-brightgreen?style=flat-square" />
|
|
<img alt="no_std" src="https://img.shields.io/badge/no__std-compatible-green?style=flat-square" />
|
|
<img alt="Crates" src="https://img.shields.io/badge/crates-13-blue?style=flat-square" />
|
|
</p>
|
|
|
|
---
|
|
|
|
## What is RVF?
|
|
|
|
**RVF (RuVector Format)** is the unified agentic AI file format. One `.rvf` file does three jobs:
|
|
|
|
1. **Store** — vectors, indexes, metadata, and cryptographic proofs live in one file. No database server required.
|
|
2. **Transfer** — the same file streams over a network. Query, insert, and delete operations work over the wire with zero conversion.
|
|
3. **Run** — pack model weights, graph neural networks, WASM code, or even a bootable OS kernel into the file. Now it's not just data — it's a self-contained intelligence unit you can deploy anywhere.
|
|
|
|
### Why does this matter?
|
|
|
|
Today, an AI agent's state is scattered: embeddings in one database, model weights in another, graph structure in a third, config in a fourth. Nothing talks to anything else. Moving between tools means re-indexing from scratch. There's no standard way to prove any of it was computed securely — and no way to hand an agent its complete knowledge as a single portable artifact.
|
|
|
|
RVF solves this. It gives agentic AI a **universal substrate** — one file that works everywhere:
|
|
|
|
| What it does | Where it runs | What you get |
|
|
|-------------|--------------|-------------|
|
|
| Stores vectors | Server (HNSW index) | Sub-millisecond search over millions of vectors |
|
|
| Stores vectors | Browser (5.5 KB WASM) | Same file, no backend needed |
|
|
| Stores vectors | Edge / IoT / mobile | Lightweight API, tiny footprint |
|
|
| Transfers data | Over the network | Batched query/ingest/delete via TCP |
|
|
| Runs code | Inside a TEE | Cryptographic proof of secure computation |
|
|
| Runs code | Bare metal / VM | File boots itself as a microservice |
|
|
| Runs code | Linux kernel (eBPF) | Sub-microsecond hot-path acceleration |
|
|
| Runs intelligence | Anywhere | Model + data + graph + trust chain in one file |
|
|
|
|
### Key properties
|
|
|
|
- **Crash-safe** — no write-ahead log needed; if power dies mid-write, the file stays consistent
|
|
- **Self-describing** — the schema is in the file; no external catalog required
|
|
- **Progressive loading** — start answering queries before the full index is loaded
|
|
- **Domain profiles** — `.rvdna` for genomics, `.rvtext` for language, `.rvgraph` for networks, `.rvvis` for vision — same format underneath
|
|
- **Lineage tracking** — every derived file records its parent's hash, like DNA inheritance
|
|
- **Tamper-evident** — witness chains and post-quantum signatures prove nothing was altered
|
|
|
|
These examples walk you through every major feature, from the simplest "insert and query" to wire format inspection, witness chains, and sealed cognitive engines.
|
|
|
|
### What you can build with RVF
|
|
|
|
| Use case | What goes in the file | Result |
|
|
|----------|----------------------|--------|
|
|
| **Semantic search** | Vectors + HNSW index | Single-file vector database, no server needed |
|
|
| **Agent memory** | Vectors + metadata + witness chain | Portable, auditable AI agent knowledge base |
|
|
| **Sealed LoRA distribution** | Base embeddings + OVERLAY_SEG adapter deltas | Ship fine-tuned models as one versioned file |
|
|
| **Portable graph intelligence** | Node embeddings + GRAPH_SEG adjacency | GNN state that transfers between systems |
|
|
| **Self-booting AI service** | Vectors + index + KERNEL_SEG unikernel | File boots as a microservice on bare metal or Firecracker |
|
|
| **Kernel-accelerated cache** | Hot vectors + EBPF_SEG XDP program | Sub-microsecond lookups in the Linux kernel data path |
|
|
| **Confidential AI** | Any of the above + TEE attestation | Cryptographic proof everything ran inside a secure enclave |
|
|
| **Genomic analysis** | DNA k-mer embeddings + variant tensors | `.rvdna` file with lineage tracking across analysis pipeline |
|
|
| **Firmware-style AI versioning** | Full cognitive state + lineage chain | Parent → child derivation with hash verification, like DNA |
|
|
|
|
---
|
|
|
|
## Quick Start
|
|
|
|
```bash
# Clone the repo
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf

# Run your first example
cargo run --example basic_store
```
|
|
|
|
That's it. You'll see a store created, 100 vectors inserted, nearest neighbors found, and persistence verified — all in under a second.
|
|
|
|
### Using the CLI
|
|
|
|
You can also work with RVF stores from the command line without writing any Rust:
|
|
|
|
```bash
# Build the CLI
cd crates/rvf && cargo build -p rvf-cli

# Create a store, ingest data, and query
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf
rvf compact vectors.rvf

# Derive a child store with lineage tracking
rvf derive parent.rvf child.rvf --type filter

# All commands support --json for machine-readable output
rvf status vectors.rvf --json
```
|
|
|
|
<details>
|
|
<summary><strong>Run All 40 Examples</strong></summary>
|
|
|
|
**Core (6):**
|
|
```bash
cargo run --example basic_store          # Store lifecycle + k-NN
cargo run --example progressive_index    # Three-layer HNSW recall
cargo run --example quantization         # Scalar / product / binary
cargo run --example wire_format          # Raw segment I/O
cargo run --example crypto_signing       # Ed25519 + witness chains
cargo run --example filtered_search      # Metadata-filtered queries
```
|
|
|
|
**Agentic AI (6):**
|
|
```bash
cargo run --example agent_memory         # Persistent agent memory + witness audit
cargo run --example swarm_knowledge      # Multi-agent shared knowledge base
cargo run --example reasoning_trace      # Chain-of-thought with lineage derivation
cargo run --example tool_cache           # Tool call result cache with TTL
cargo run --example agent_handoff        # Transfer agent state between instances
cargo run --example experience_replay    # RL experience replay buffer
```
|
|
|
|
**Practical Production (5):**
|
|
```bash
cargo run --example semantic_search      # Document search with metadata filters
cargo run --example recommendation       # Item recommendations (collaborative filtering)
cargo run --example rag_pipeline         # Retrieval-augmented generation pipeline
cargo run --example embedding_cache      # LRU cache with temperature tiering
cargo run --example dedup_detector       # Near-duplicate detection + compaction
```
|
|
|
|
**Vertical Domains (4):**
|
|
```bash
cargo run --example genomic_pipeline     # DNA k-mer search (.rvdna profile)
cargo run --example financial_signals    # Market signals with TEE attestation
cargo run --example medical_imaging      # Radiology search (.rvvis profile)
cargo run --example legal_discovery      # Legal doc similarity (.rvtext profile)
```
|
|
|
|
**Exotic Capabilities (5):**
|
|
```bash
cargo run --example self_booting         # RVF with embedded unikernel
cargo run --example ebpf_accelerator     # eBPF hot-path acceleration
cargo run --example hyperbolic_taxonomy  # Hierarchy-aware search
cargo run --example multimodal_fusion    # Cross-modal text + image search
cargo run --example sealed_engine        # Full cognitive engine (capstone)
```
|
|
|
|
**Runtime Targets (4) + Postgres (1):**
|
|
```bash
cargo run --example browser_wasm         # Browser-side WASM vector search
cargo run --example edge_iot             # IoT device with binary quantization
cargo run --example serverless_function  # Cold-start optimized for Lambda
cargo run --example ruvllm_inference     # LLM KV cache + LoRA via RVF
cargo run --example postgres_bridge      # PostgreSQL ↔ RVF export/import
```
|
|
|
|
**Network & Security (4):**
|
|
```bash
cargo run --example network_sync         # Peer-to-peer vector store sync
cargo run --example tee_attestation      # TEE attestation + sealed keys
cargo run --example access_control       # Role-based vector access control
cargo run --example zero_knowledge       # Zero-knowledge proof integration
```
|
|
|
|
**Autonomous Agent (1):**
|
|
```bash
cargo run --example ruvbot               # Autonomous RVF-powered agent bot
```
|
|
|
|
**POSIX & Systems (3):**
|
|
```bash
cargo run --example posix_fileops        # POSIX file operations with RVF
cargo run --example linux_microkernel    # Linux microkernel distribution
cargo run --example mcp_in_rvf           # MCP server embedded in RVF
```
|
|
|
|
**Network Operations (1):**
|
|
```bash
cargo run --example network_interfaces   # Network OS telemetry (60 interfaces)
```
|
|
|
|
</details>
|
|
|
|
### Prerequisites
|
|
|
|
- **Rust 1.87+** — install via [rustup](https://rustup.rs/)
|
|
- No other dependencies needed — everything builds from source
|
|
- All examples use deterministic pseudo-random data, so results are reproducible across runs
|
|
|
|
---
|
|
|
|
<details>
|
|
<summary><strong>Examples at a Glance (40 examples)</strong></summary>
|
|
|
|
### Core
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 1 | basic_store | Beginner | Create, insert, query, persist, reopen |
|
|
| 2 | progressive_index | Intermediate | Three-layer HNSW, recall measurement |
|
|
| 3 | quantization | Intermediate | Scalar/product/binary quantization, tiering |
|
|
| 4 | wire_format | Advanced | Raw segment I/O, hash validation, tail-scan |
|
|
| 5 | crypto_signing | Advanced | Ed25519 signing, witness chains, tamper detection |
|
|
| 6 | filtered_search | Intermediate | Metadata filters: Eq, Range, AND/OR/IN |
|
|
|
|
### Agentic AI
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 7 | agent_memory | Intermediate | Persistent agent memory, session recall, witness audit |
|
|
| 8 | swarm_knowledge | Intermediate | Multi-agent shared knowledge, cross-agent search |
|
|
| 9 | reasoning_trace | Advanced | Chain-of-thought lineage (parent → child → grandchild) |
|
|
| 10 | tool_cache | Intermediate | Tool call caching, TTL, delete_by_filter, compaction |
|
|
| 11 | agent_handoff | Advanced | Transfer agent state, derive clone, lineage verification |
|
|
| 12 | experience_replay | Intermediate | RL replay buffer, priority sampling, tiering |
|
|
|
|
### Practical Production
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 13 | semantic_search | Beginner | Document search engine, 4 filter workflows |
|
|
| 14 | recommendation | Intermediate | Collaborative filtering, genre/quality filters |
|
|
| 15 | rag_pipeline | Advanced | 5-step RAG: chunk, embed, retrieve, rerank, assemble |
|
|
| 16 | embedding_cache | Advanced | Zipf access patterns, 3-tier quantization, memory savings |
|
|
| 17 | dedup_detector | Intermediate | Near-duplicate detection, clustering, compaction |
|
|
|
|
### Vertical Domains
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 18 | genomic_pipeline | Advanced | DNA k-mer search, `.rvdna` profile, lineage |
|
|
| 19 | financial_signals | Advanced | Market signals, Ed25519 signing, attestation |
|
|
| 20 | medical_imaging | Intermediate | Radiology search, `.rvvis` profile, audit trail |
|
|
| 21 | legal_discovery | Intermediate | Legal similarity, `.rvtext` profile, discovery audit |
|
|
|
|
### Exotic Capabilities
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 22 | self_booting | Advanced | Embed/extract unikernel, kernel header verification |
|
|
| 23 | ebpf_accelerator | Advanced | Embed/extract eBPF, XDP program, co-existence |
|
|
| 24 | hyperbolic_taxonomy | Intermediate | Hierarchy-aware embeddings, depth-filtered search |
|
|
| 25 | multimodal_fusion | Intermediate | Cross-modal text+image search, modality filtering |
|
|
| 26 | sealed_engine | Advanced | Capstone: vectors + kernel + eBPF + witness + lineage |
|
|
|
|
### Runtime Targets + Postgres
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 27 | browser_wasm | Intermediate | WASM-compatible API, raw wire segments, size targets |
|
|
| 28 | edge_iot | Beginner | Constrained device, binary quantization, memory budget |
|
|
| 29 | serverless_function | Intermediate | Cold start, manifest tail-scan, progressive loading |
|
|
| 30 | ruvllm_inference | Advanced | KV cache + LoRA adapters + policy store via RVF |
|
|
| 31 | postgres_bridge | Intermediate | PG export/import, offline query, lineage, witness audit |
|
|
|
|
### Network & Security
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 32 | network_sync | Advanced | Peer-to-peer sync, vector exchange, conflict resolution |
|
|
| 33 | tee_attestation | Advanced | TEE platform attestation, sealed keys, computation proof |
|
|
| 34 | access_control | Intermediate | Role-based access, permission checks, audit trails |
|
|
| 35 | zero_knowledge | Advanced | ZK proofs for vector operations, privacy-preserving search |
|
|
|
|
### Autonomous Agent
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 36 | ruvbot | Advanced | Autonomous agent with RVF memory, planning, tool use |
|
|
|
|
### POSIX & Systems
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 37 | posix_fileops | Intermediate | Raw I/O, atomic rename, locking, segment random access |
|
|
| 38 | linux_microkernel | Advanced | Package management, SSH keys, kernel embed, lineage updates |
|
|
| 39 | mcp_in_rvf | Advanced | MCP server runtime embedded in RVF, eBPF filter, tools |
|
|
|
|
### Network Operations
|
|
|
|
| # | Example | Difficulty | What You'll Learn |
|
|
|---|---------|-----------|-------------------|
|
|
| 40 | network_interfaces | Intermediate | Multi-chassis telemetry, anomaly detection, filtered queries |
|
|
|
|
</details>
|
|
|
|
---
|
|
|
|
<details>
|
|
<summary><strong>Features Covered</strong></summary>
|
|
|
|
### Storage — vectors in, answers out
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| k-NN Search | basic_store | Find nearest neighbors by L2 or cosine distance |
|
|
| Persistence | basic_store | Close a store, reopen it, verify results match |
|
|
| Metadata Filters | filtered_search | Eq, Ne, Gt, Lt, Range, In, And, Or expressions |
|
|
| Combined Filters | filtered_search | Multi-condition queries (category + score range) |
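
To make the filter rows above concrete, here is a minimal sketch of how a filter expression tree can be evaluated against one vector's metadata. The `Value` and `Filter` types and the `eval` function are illustrative stand-ins written for this example; they are not the `rvf-runtime` types, and the real crate may represent fields and values differently.

```rust
use std::collections::HashMap;

/// Toy metadata value (illustrative only, not the crate's `MetadataValue`).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
}

/// Toy filter tree covering a subset of the listed operators.
enum Filter {
    Eq(u16, Value),
    Gt(u16, Value),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Evaluate a filter against one vector's metadata (field id -> value).
fn eval(filter: &Filter, meta: &HashMap<u16, Value>) -> bool {
    match filter {
        Filter::Eq(field, want) => meta.get(field) == Some(want),
        // A missing field or a value of a different kind never matches.
        Filter::Gt(field, bound) => match (meta.get(field), bound) {
            (Some(Value::U64(have)), Value::U64(b)) => have > b,
            (Some(Value::Str(have)), Value::Str(b)) => have > b,
            _ => false,
        },
        Filter::And(parts) => parts.iter().all(|p| eval(p, meta)),
        Filter::Or(parts) => parts.iter().any(|p| eval(p, meta)),
    }
}
```

With metadata `{0: "science", 1: 95}`, the combined filter `And([Eq(0, "science"), Gt(1, 80)])` evaluates to true, which mirrors the "category + score range" case in the table.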
|
|
|
|
### Indexing — speed vs. accuracy trade-offs
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| Progressive Indexing | progressive_index | Three-tier HNSW: Layer A (fast), B (better), C (best) |
|
|
| Recall Measurement | progressive_index | Compare approximate results against brute-force ground truth |
|
|
|
|
### Compression — fit more vectors in less memory
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| Scalar Quantization | quantization | fp32 → u8 (4x compression, Hot tier) |
|
|
| Product Quantization | quantization | fp32 → PQ codes (8-32x compression, Warm tier) |
|
|
| Binary Quantization | quantization | fp32 → 1-bit (32x compression, Cold tier) |
|
|
| Temperature Tiering | quantization | Count-Min Sketch access tracking + automatic tier assignment |
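
The compression ratios in this table follow directly from the per-component storage size. The sketch below shows one simple way to do scalar (fp32 to u8) and binary (fp32 to 1 bit) quantization. It is an illustration under stated assumptions (per-vector min/max scaling, sign-based binarization) and is not taken from the `quantization` example; the function names are hypothetical.

```rust
/// Scalar quantization: map each f32 to a u8 using the vector's own min/max range.
/// Returns the codes plus the (min, scale) pair needed to decode them.
fn scalar_quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Avoid dividing by zero when every component is identical (or the input is empty).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (codes, min, scale)
}

/// Approximate reconstruction of the original components.
fn scalar_dequantize(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

/// Binary quantization: keep only the sign of each component, packed 8 per byte.
fn binary_quantize(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}
```

For a 384-dim vector this gives 1536 bytes at fp32, 384 bytes as u8 codes (4x), and 48 bytes as packed bits (32x), matching the ratios listed above.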
|
|
|
|
### Wire format — what the bytes look like on disk and over the network
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| Segment I/O | wire_format | Write/read 64-byte-aligned segments with type/flags/hash |
|
|
| Hash Validation | wire_format | CRC32c / XXH3 integrity checks on every segment |
|
|
| Tail-Scan | wire_format | Find latest manifest by scanning backward from EOF |
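
The tail-scan row can be pictured as a backward walk over 64-byte-aligned offsets until a manifest header is found. The sketch below is a simplified illustration only: the `MAGIC` marker and the header layout (marker in bytes 0..4, type code in byte 4) are hypothetical and are not the real RVF wire format. The manifest type code `0x05` is taken from the segment diagram in "What RVF Contains". A real reader would also validate the segment hash before trusting a match.

```rust
const ALIGN: usize = 64; // segment alignment stated in the table above
const MANIFEST_TYPE: u8 = 0x05; // MANIFEST type code from the segment diagram
const MAGIC: [u8; 4] = *b"DEMO"; // hypothetical marker, NOT the real RVF magic

/// Scan backward from the end of the file and return the offset of the
/// last aligned header that looks like a manifest segment.
fn find_latest_manifest(file: &[u8]) -> Option<usize> {
    if file.len() < ALIGN {
        return None;
    }
    // Start at the last aligned offset that still leaves room for a full header.
    let mut offset = (file.len() - ALIGN) / ALIGN * ALIGN;
    loop {
        let header = &file[offset..offset + ALIGN];
        if header[..4] == MAGIC && header[4] == MANIFEST_TYPE {
            return Some(offset);
        }
        if offset == 0 {
            return None;
        }
        offset -= ALIGN;
    }
}
```

Because the newest manifest sits closest to the end of an append-only file, the scan usually stops after inspecting only a few headers, which is what makes a fast cold start possible.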
|
|
|
|
### Trust — signatures, audit trails, and tamper detection
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| Ed25519 Signing | crypto_signing | Sign segments, verify signatures, detect tampering |
|
|
| Witness Chains | crypto_signing | SHAKE-256 linked audit trails (73-byte entries) |
|
|
| Tamper Detection | crypto_signing | Any byte flip breaks chain verification |
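
The idea behind a witness chain is that each entry commits to the entry before it, so changing an earlier entry invalidates the link that follows. The sketch below illustrates only that linking step. It uses the standard library's `DefaultHasher` as a stand-in digest, which is NOT cryptographic; the format itself uses SHAKE-256 according to the table above. The `Entry` layout, `build_chain`, and `verify_chain` are hypothetical names for this example and do not reproduce the 73-byte on-disk entry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in 64-bit digest. Not cryptographic; for illustration only.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

#[derive(Clone)]
struct Entry {
    prev: u64,         // digest of the previous entry (0 for the first)
    action: u64,       // digest of the action description
    timestamp_ns: u64, // when the action happened
}

impl Entry {
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(24);
        out.extend_from_slice(&self.prev.to_le_bytes());
        out.extend_from_slice(&self.action.to_le_bytes());
        out.extend_from_slice(&self.timestamp_ns.to_le_bytes());
        out
    }
}

/// Build a chain in which every entry stores the digest of its predecessor.
fn build_chain(actions: &[(&str, u64)]) -> Vec<Entry> {
    let mut chain: Vec<Entry> = Vec::new();
    for &(action, ts) in actions {
        let prev = chain.last().map_or(0, |e| digest(&e.to_bytes()));
        chain.push(Entry { prev, action: digest(action.as_bytes()), timestamp_ns: ts });
    }
    chain
}

/// Recompute every link; editing any entry that has a successor breaks the next link.
fn verify_chain(chain: &[Entry]) -> bool {
    chain.first().map_or(true, |e| e.prev == 0)
        && chain.windows(2).all(|w| w[1].prev == digest(&w[0].to_bytes()))
}
```

In this simplified form the final entry is not protected by a successor; a real chain would pair the links with a signature or a sealed head hash.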
|
|
|
|
### Agentic AI — lineage, domains, and self-booting intelligence
|
|
|
|
| Feature | Example | Description |
|
|
|---------|---------|-------------|
|
|
| DNA-Style Lineage | (API) | Every derived file records its parent's hash and derivation type |
|
|
| Domain Profiles | (API) | `.rvdna`, `.rvtext`, `.rvgraph`, `.rvvis` — same format, domain-specific hints |
|
|
| Computational Container | `claude_code_appliance` | Embed a WASM microkernel, eBPF program, or bootable unikernel |
|
|
| Self-Booting Appliance | `claude_code_appliance` | 5.1 MB `.rvf` — boots Linux, serves queries, runs Claude Code |
|
|
| Import (JSON/CSV/NumPy) | (API) | Load embeddings from `.json`, `.csv`, or `.npy` files via `rvf-import` or `rvf ingest` CLI |
|
|
| Unified CLI | `rvf` | 9 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve |
|
|
| Compaction | (API) | Garbage-collect tombstoned vectors and reclaim disk space |
|
|
| Batch Delete | (API) | Delete vectors by ID with tombstone markers |
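
The last two rows work together: deletion only records a tombstone, and compaction later rewrites the data without the tombstoned records. A minimal in-memory sketch of that two-phase idea follows. `ToyStore` and its methods are hypothetical and only mimic the behavior described in the table; they are not the `RvfStore` implementation.

```rust
use std::collections::HashSet;

/// Toy append-only store: records are never edited in place.
struct ToyStore {
    records: Vec<(u64, Vec<f32>)>, // (id, vector) in insertion order
    tombstones: HashSet<u64>,      // ids marked as deleted
}

impl ToyStore {
    /// Deleting is cheap: it only records the ids, the data stays where it is.
    fn delete_batch(&mut self, ids: &[u64]) {
        self.tombstones.extend(ids.iter().copied());
    }

    /// Number of records still visible to queries.
    fn live_count(&self) -> usize {
        self.records
            .iter()
            .filter(|(id, _)| !self.tombstones.contains(id))
            .count()
    }

    /// Compaction drops tombstoned records and clears the tombstone set.
    /// Returns how many records were reclaimed.
    fn compact(&mut self) -> usize {
        let before = self.records.len();
        let tombstones = std::mem::take(&mut self.tombstones);
        self.records.retain(|(id, _)| !tombstones.contains(id));
        before - self.records.len()
    }
}
```

Until `compact` runs, deleted vectors still occupy space even though queries skip them, which is why the two operations are listed separately.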
|
|
|
|
### Self-Booting RVF — Claude Code Appliance
|
|
|
|
The `claude_code_appliance` example builds a complete self-booting AI development environment as a single `.rvf` file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.
|
|
|
|
```bash
cd examples/rvf
cargo run --example claude_code_appliance
```
|
|
|
|
**What it produces** (5.1 MB file):
|
|
|
|
```
claude_code_appliance.rvf
  ├── KERNEL_SEG    Linux 6.8.12 bzImage (5.2 MB, x86_64)
  ├── EBPF_SEG      Socket filter (allows ports 2222 and 8080 only)
  ├── VEC_SEG       20 package embeddings (128-dim)
  ├── INDEX_SEG     HNSW graph for package search
  ├── WITNESS_SEG   6-entry tamper-evident audit trail
  ├── CRYPTO_SEG    3 Ed25519 SSH user keys (root, deploy, claude)
  ├── MANIFEST_SEG  4 KB root with segment directory
  └── Snapshot      v1 derived image with lineage tracking
```
|
|
|
|
**Boot and connect:**
|
|
|
|
```bash
rvf launch claude_code_appliance.rvf    # Boot on QEMU/Firecracker
ssh -p 2222 deploy@localhost            # SSH in
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'
```
|
|
|
|
Final file: **5.1 MB single `.rvf`** — boots Linux, serves queries, runs Claude Code.
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>What RVF Contains</strong></summary>
|
|
|
|
An RVF file is built from **segments** — self-describing blocks that can be combined freely. Here are all 16 types, grouped by purpose:
|
|
|
|
```
  Data             Indexing         Compression      Runtime
+-----------+    +-----------+    +-----------+    +-----------+
| VEC  0x01 |    | INDEX 0x02|    | QUANT 0x06|    | WASM      |
| (vectors) |    | (HNSW)    |    | (SQ/PQ/BQ)|    | (5.5 KB)  |
+-----------+    +-----------+    +-----------+    +-----------+
| META 0x07 |    | META_IDX  |    | HOT  0x08 |    | KERNEL    |
| (key-val) |    | 0x0D      |    | (promoted)|    | 0x0E      |
+-----------+    +-----------+    +-----------+    +-----------+
| JOURNAL   |    | OVERLAY   |    | SKETCH    |    | EBPF      |
| 0x04      |    | 0x03      |    | 0x09      |    | 0x0F      |
+-----------+    +-----------+    +-----------+    +-----------+

  Trust            State            Domain
+-----------+    +-----------+    +-----------+
| WITNESS   |    | MANIFEST  |    | PROFILE   |
| 0x0A      |    | 0x05      |    | 0x0B      |
+-----------+    +-----------+    +-----------+
| CRYPTO    |
| 0x0C      |
+-----------+
```
|
|
|
|
Any segment you don't need is simply absent. A basic vector store uses VEC + INDEX + MANIFEST. A sealed cognitive engine might use all 16.
|
|
|
|
### RuVector Ecosystem Integration
|
|
|
|
RVF is the universal substrate for the entire RuVector ecosystem. Here's how the 75+ Rust crates map onto RVF segments:
|
|
|
|
| Domain | Crates | RVF Segments Used |
|
|
|--------|--------|-------------------|
|
|
| **LLM inference** | `ruvllm`, `ruvllm-cli` | VEC (KV cache), OVERLAY (LoRA), WITNESS (audit) |
|
|
| **Self-optimizing learning** | `sona` | OVERLAY (micro-LoRA), META (EWC++ weights) |
|
|
| **Graph neural networks** | `ruvector-gnn`, `ruvector-graph` | INDEX (HNSW topology), META (edge weights) |
|
|
| **Quantum computing** | `ruQu`, `ruqu-core`, `ruqu-algorithms` | SKETCH (VQE snapshots), META (syndrome tables) |
|
|
| **Attention mechanisms** | `ruvector-attention`, `ruvector-mincut-gated-transformer` | VEC (attention matrices), QUANT (INT4/FP16) |
|
|
| **Coherence systems** | `cognitum-gate-kernel`, `prime-radiant` | WITNESS (tile witnesses), WASM (64 KB tiles) |
|
|
| **Neuromorphic** | `ruvector-nervous-system`, `micro-hnsw-wasm` | VEC (spike trains), INDEX (spiking HNSW) |
|
|
| **Agent memory** | `agentdb`, `claude-flow`, `agentic-flow` | VEC + INDEX + WITNESS (full agent state) |
|
|
| **Edge / browser** | `rvlite`, `rvf-wasm` | VEC + INDEX via 5.5 KB WASM microkernel |
|
|
| **Hyperbolic geometry** | `ruvector-hyperbolic-hnsw`, `ruvector-math` | INDEX (Poincaré ball HNSW) |
|
|
| **Routing / inference** | `ruvector-tiny-dancer-core`, `ruvector-sparse-inference` | VEC (feature vectors), META (routing policies) |
|
|
| **Observation pipeline** | `ospipe` | META (state vectors), WITNESS (provenance) |
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>Performance & Comparison</strong></summary>
|
|
|
|
RVF is designed for speed at every layer:
|
|
|
|
| Metric | Value | Example |
|
|
|--------|-------|---------|
|
|
| Cold boot (4 KB manifest) | **< 5 ms** | wire_format |
|
|
| First query (Layer A only) | **recall >= 0.70** | progressive_index |
|
|
| Full recall (Layer C) | **>= 0.95** | progressive_index |
|
|
| WASM binary size | **~5.5 KB** | — |
|
|
| Segment header | **64 bytes** | wire_format |
|
|
| Witness chain entry | **73 bytes** | crypto_signing |
|
|
| Scalar quantization | **4x compression** | quantization |
|
|
| Product quantization | **8-32x compression** | quantization |
|
|
| Binary quantization | **32x compression** | quantization |
|
|
|
|
### Progressive Loading
|
|
|
|
Instead of waiting for the full index, RVF serves queries immediately:
|
|
|
|
```
Layer A ─────> Layer B ─────> Layer C
(microsecs)    (~10 ms)       (~50 ms)
recall ~0.70   recall ~0.85   recall ~0.95
```
|
|
|
|
The `progressive_index` example measures this recall progression with brute-force ground truth.
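
Recall against brute-force ground truth can be computed in a few lines. The sketch below is a generic illustration, not the code from `progressive_index`: the exact top-k is found by scanning every vector, and recall@k is the fraction of those exact neighbors that the approximate search also returned. The function names are hypothetical, and result IDs are assumed to be unique.

```rust
use std::collections::HashSet;

/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k by brute force: the ground truth that approximate results are scored against.
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..data.len()).collect();
    order.sort_by(|&i, &j| l2_sq(&data[i], query).total_cmp(&l2_sq(&data[j], query)));
    order.truncate(k);
    order
}

/// recall@k = |approximate ∩ exact| / |exact|
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f32 {
    if exact.is_empty() {
        return 1.0;
    }
    let truth: HashSet<usize> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|id| truth.contains(*id)).count();
    hits as f32 / exact.len() as f32
}
```

A recall of 0.70 at Layer A therefore means that, on average, 7 of the true 10 nearest neighbors appear in the first answer, with the remainder filled in as Layers B and C load.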
|
|
|
|
### Comparison
|
|
|
|
#### vs. vector databases
|
|
|
|
| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|
|
|---------|-----|-------|-------|--------|--------|
|
|
| Single-file format | Yes | Yes | No | No | No |
|
|
| Crash-safe (no WAL) | Yes | No | No | WAL | WAL |
|
|
| Progressive loading | 3 layers | No | No | No | No |
|
|
| WASM support | 5.5 KB | No | No | No | No |
|
|
| `no_std` compatible | Yes | No | No | No | No |
|
|
| Post-quantum sigs | ML-DSA-65 | No | No | No | No |
|
|
| TEE attestation | Yes | No | No | No | No |
|
|
| Metadata filtering | Yes | No | Yes | Yes | Yes |
|
|
| Auto quantization | 3-tier | No | Manual | Yes | Yes |
|
|
| Append-only | Yes | Build-once | Build-once | Log | Log |
|
|
| Witness chains | Yes | No | No | No | No |
|
|
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
|
|
| Computational container | Yes (WASM/eBPF/unikernel) | No | No | No | No |
|
|
| Domain profiles | 5 profiles | No | No | No | No |
|
|
| Language bindings | Rust, Node, WASM | C++, Python | C++, Python | Rust, Python | Go, Python |
|
|
|
|
#### vs. model registries, graph DBs, and container formats
|
|
|
|
RVF replaces multiple tools because it carries data, model, graph, runtime, and trust chain together:
|
|
|
|
| Capability | RVF | GGUF | ONNX | SafeTensors | Neo4j | Docker/OCI |
|
|
|-----------|-----|------|------|-------------|-------|------------|
|
|
| Vector storage + search | Yes | No | No | No | No | No |
|
|
| Model weight deltas (LoRA) | OVERLAY_SEG | Full weights | Full graph | Weights only | No | No |
|
|
| Graph neural state | GRAPH_SEG | No | No | No | Yes | No |
|
|
| Cryptographic audit trail | WITNESS_SEG | No | No | No | No | No |
|
|
| Self-booting runtime | KERNEL_SEG | No | No | No | No | Yes |
|
|
| Kernel-level acceleration | EBPF_SEG | No | No | No | No | No |
|
|
| File lineage / versioning | DNA-style | No | No | No | No | Image layers |
|
|
| TEE attestation | Built-in | No | No | No | No | No |
|
|
| Single portable file | Yes | Yes | Yes | Yes | No | Image tarball |
|
|
| Runs in browser | 5.5 KB WASM | No | ONNX.js | No | No | No |
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>Usage Patterns (8 patterns)</strong></summary>
|
|
|
|
### Pattern 1: Simple Vector Store
|
|
|
|
The most common use case. Create a store, add embeddings, query nearest neighbors.
|
|
|
|
```rust
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;

// Placeholder data; in practice these come from your embedding model.
let embedding = vec![0.1f32; 384];
let query = vec![0.1f32; 384];

let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::L2,
    ..Default::default()
};
let mut store = RvfStore::create("vectors.rvf", options)?;

// Insert embeddings (one slice per vector, one ID per vector)
store.ingest_batch(&[embedding.as_slice()], &[1], None)?;

// Query top-10 nearest neighbors
let results = store.query(&query, 10, &QueryOptions::default())?;
for r in &results {
    println!("id={}, distance={:.4}", r.id, r.distance);
}
```
|
|
|
|
See: [`basic_store.rs`](examples/basic_store.rs)
|
|
|
|
### Pattern 2: Filtered Search
|
|
|
|
Attach metadata to vectors, then filter during queries.
|
|
|
|
```rust
use rvf_runtime::{FilterExpr, MetadataEntry, MetadataValue, QueryOptions};
use rvf_runtime::filter::FilterValue;

// Add metadata during ingestion (`embedding` is the vector being stored)
let metadata = vec![
    MetadataEntry { field_id: 0, value: MetadataValue::String("science".into()) },
    MetadataEntry { field_id: 1, value: MetadataValue::U64(95) },
];
store.ingest_batch(&[embedding.as_slice()], &[42], Some(&metadata))?;

// Query with filter: category == "science" AND score > 80
let filter = FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
]);
let opts = QueryOptions { filter: Some(filter), ..Default::default() };
let results = store.query(&query, 10, &opts)?;
```
|
|
|
|
See: [`filtered_search.rs`](examples/filtered_search.rs)
|
|
|
|
### Pattern 3: Progressive Recall
|
|
|
|
Start serving queries instantly, improve quality as more data loads.
|
|
|
|
```rust
use rvf_index::{build_full_index, build_layer_a, build_layer_c, ProgressiveIndex};

// Build HNSW graph
let graph = build_full_index(&store, n, &config, &rng, &l2_distance);

// Layer A: instant but approximate
let layer_a = build_layer_a(&graph, &centroids, &assignments, n as u64);
let mut idx = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: None };
let fast_results = idx.search(&query, 10, 200, &store); // recall ~0.70

// Layer C: full precision. Attach it to the same index, because `layer_a`
// was moved into `idx` above and cannot be reused in a second struct.
idx.layer_c = Some(build_layer_c(&graph));
let precise_results = idx.search(&query, 10, 200, &store); // recall ~0.95
```
|
|
|
|
See: [`progressive_index.rs`](examples/progressive_index.rs)
|
|
|
|
### Pattern 4: Cryptographic Integrity
|
|
|
|
Sign segments and build tamper-evident audit trails.
|
|
|
|
```rust
use rvf_crypto::{sign_segment, verify_segment, create_witness_chain, WitnessEntry, shake256_256};
use ed25519_dalek::SigningKey;

// Fixed seed for illustration only; use a securely generated key in practice.
let signing_key = SigningKey::from_bytes(&[7u8; 32]);
let verifying_key = signing_key.verifying_key();

// Sign a segment (`header` and `payload` come from the segment being written)
let footer = sign_segment(&header, &payload, &signing_key);

// Verify signature
assert!(verify_segment(&header, &payload, &footer, &verifying_key));

// Build an audit trail
let entries = vec![WitnessEntry {
    prev_hash: [0; 32],
    action_hash: shake256_256(b"inserted 1000 vectors"),
    timestamp_ns: 1_700_000_000_000_000_000,
    witness_type: 0x01, // PROVENANCE
}];
let chain = create_witness_chain(&entries);
```
|
|
|
|
See: [`crypto_signing.rs`](examples/crypto_signing.rs)
|
|
|
|
### Pattern 5: Import from JSON / CSV / NumPy
|
|
|
|
Load embeddings from common formats without writing a parser.
|
|
|
|
```rust
use rvf_import::{import_json, import_csv, import_npy};

// From a JSON array of vectors
import_json("embeddings.json", &mut store)?;

// From a CSV file (one vector per row)
import_csv("embeddings.csv", &mut store)?;

// From a NumPy .npy file
import_npy("embeddings.npy", &mut store)?;
```
|
|
|
|
### Pattern 6: Delete and Compact
|
|
|
|
Remove vectors by ID, then reclaim disk space.
|
|
|
|
```rust
// Delete specific vectors (marks as tombstones)
store.delete_batch(&[42, 99, 1001])?;

// Compact: rewrite the file without tombstoned data
store.compact()?;
```
|
|
|
|
### Pattern 7: File Lineage (Parent → Child Derivation)
|
|
|
|
Create derived files that track their ancestry.
|
|
|
|
```rust
use rvf_types::DerivationType;

// Create a parent store
let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child; the parent's hash is recorded automatically
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

// Derive a grandchild
let grandchild = child.derive("grandchild.rvdna", DerivationType::Quantize, None)?;
assert_eq!(grandchild.lineage_depth(), 2);
```
|
|
|
|
### Pattern 8: Embed a Computational Container
|
|
|
|
Pack a bootable kernel or eBPF program into the file.
|
|
|
|
```rust
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

// Embed a unikernel: the file can now boot as a standalone service
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &kernel_image, 8080)?;

// Embed an eBPF program: enables kernel-level acceleration
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later (distinct names avoid shadowing the first header)
let (kernel_hdr, img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, prog) = store.extract_ebpf()?.unwrap();
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>Tutorial: Your First RVF Store (Step by Step)</strong></summary>
|
|
|
|
### Step 1: Set Up
|
|
|
|
Create a new Rust project and add the dependency:
|
|
|
|
```bash
cargo new my_vectors
cd my_vectors
```
|
|
|
|
Add to `Cargo.toml`:
|
|
|
|
```toml
|
|
[dependencies]
|
|
rvf-runtime = { path = "../crates/rvf/rvf-runtime" }
|
|
tempfile = "3"
|
|
```
|
|
|
|
### Step 2: Create a Store
|
|
|
|
```rust
|
|
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
|
|
use rvf_runtime::options::DistanceMetric;
|
|
use tempfile::TempDir;
|
|
|
|
fn main() {
|
|
let tmp = TempDir::new().unwrap();
|
|
let path = tmp.path().join("my.rvf");
|
|
|
|
let opts = RvfOptions {
|
|
dimension: 128,
|
|
metric: DistanceMetric::L2,
|
|
..Default::default()
|
|
};
|
|
let mut store = RvfStore::create(&path, opts).unwrap();
|
|
```
|
|
|
|
### Step 3: Insert Vectors
|
|
|
|
Vectors are inserted in batches. Each vector needs a unique `u64` ID.
|
|
|
|
```rust
|
|
let vec_a = vec![0.1f32; 128];
|
|
let vec_b = vec![0.2f32; 128];
|
|
let vecs: Vec<&[f32]> = vec![&vec_a, &vec_b];
|
|
let ids = vec![1u64, 2];
|
|
|
|
let result = store.ingest_batch(&vecs, &ids, None).unwrap();
|
|
println!("Accepted: {}, Rejected: {}", result.accepted, result.rejected);
|
|
```
|
|
|
|
### Step 4: Query
|
|
|
|
```rust
|
|
let query = vec![0.15f32; 128];
|
|
let results = store.query(&query, 5, &QueryOptions::default()).unwrap();
|
|
|
|
for r in &results {
|
|
println!(" id={}, dist={:.6}", r.id, r.distance);
|
|
}
|
|
```
|
|
|
|
### Step 5: Verify Persistence
|
|
|
|
```rust
|
|
store.close().unwrap();
|
|
|
|
let reopened = RvfStore::open(&path).unwrap();
|
|
let results2 = reopened.query(&query, 5, &QueryOptions::default()).unwrap();
|
|
assert_eq!(results.len(), results2.len());
|
|
println!("Persistence verified!");
|
|
}
|
|
```
|
|
|
|
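### Expected Output

```
Accepted: 2, Rejected: 0
  id=1, dist=0.320000
  id=2, dist=0.320000
Persistence verified!
```

Both stored vectors differ from the query by 0.05 in each of the 128 components, so their distances are equal. The value shown assumes the store reports squared L2 distance (128 × 0.05² = 0.32); if it reports true Euclidean distance, expect about 0.565685 instead. The order of the two tied results may vary.

</details>
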
<details>
<summary><strong>Tutorial: Understanding Quantization Tiers</strong></summary>

### The Problem

A million 384-dim vectors at full precision (fp32) take **1.5 GB** of RAM. Not all vectors are accessed equally, and most are rarely touched. Why keep them all at full precision?

### The Solution: Temperature Tiering

RVF assigns vectors to three compression levels based on how often they're accessed:

| Tier | Access Pattern | Compression | Memory per Vector (384d) |
|------|---------------|------------|--------------------------|
| **Hot** | Frequently queried | Scalar (fp32 -> u8) | 384 bytes (4x smaller) |
| **Warm** | Occasionally queried | Product quantization | 48 bytes (32x smaller) |
| **Cold** | Rarely accessed | Binary (1-bit) | 48 bytes (32x smaller) |
| Raw | No compression | fp32 | 1,536 bytes |

### How It Works

**1. Track access patterns** using a Count-Min Sketch (a probabilistic counter):

```rust
let mut sketch = CountMinSketch::default_sketch();

// Every time a vector is accessed, increment its counter
sketch.increment(vector_id);

// Check how often a vector has been accessed
let count = sketch.estimate(vector_id);
```

**2. Assign tiers** based on configurable thresholds:

```rust
let tier = assign_tier(count);
// Hot:  count >= 100
// Warm: count >= 10
// Cold: count < 10
```
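
To make steps 1 and 2 concrete, here is a minimal, self-contained sketch of a Count-Min Sketch feeding a threshold-based tier assignment. It is not the `rvf-quant` implementation: the `Sketch` type, its width and depth, the hashing scheme, and the `Tier` enum are assumptions made for illustration. Only the thresholds (100 and 10) come from the comments above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical tier enum mirroring the Hot/Warm/Cold table.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Illustrative Count-Min Sketch: `depth` rows of `width` counters.
struct Sketch {
    width: usize,
    rows: Vec<Vec<u32>>,
}

impl Sketch {
    fn new(width: usize, depth: usize) -> Self {
        Sketch { width, rows: vec![vec![0; width]; depth] }
    }

    /// Column for `id` in a given row; the row index acts as the hash seed.
    fn slot(&self, row: usize, id: u64) -> usize {
        let mut h = DefaultHasher::new();
        (row as u64, id).hash(&mut h);
        (h.finish() as usize) % self.width
    }

    fn increment(&mut self, id: u64) {
        for row in 0..self.rows.len() {
            let col = self.slot(row, id);
            self.rows[row][col] = self.rows[row][col].saturating_add(1);
        }
    }

    /// Minimum over rows: never undercounts, may overcount on collisions.
    fn estimate(&self, id: u64) -> u32 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.slot(row, id)])
            .min()
            .unwrap_or(0)
    }
}

/// Thresholds taken from the comments in step 2.
fn assign_tier(count: u32) -> Tier {
    if count >= 100 {
        Tier::Hot
    } else if count >= 10 {
        Tier::Warm
    } else {
        Tier::Cold
    }
}

fn main() {
    let mut sketch = Sketch::new(1024, 4);
    for _ in 0..150 {
        sketch.increment(42);
    }
    for _ in 0..12 {
        sketch.increment(7);
    }
    println!("id 42 -> {:?}", assign_tier(sketch.estimate(42)));
    println!("id 7  -> {:?}", assign_tier(sketch.estimate(7)));
}
```

Because the estimate is a minimum over several independently hashed rows, it can only err upward. A rarely used vector may occasionally be promoted to a warmer tier than it deserves, but a hot vector is never demoted by a collision.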

**3. Encode at the appropriate level:**

```rust
// Hot: Scalar (fast, low error)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode_vec(&vector);  // 384 bytes

// Warm: Product (balanced)
let pq = ProductQuantizer::train(&vectors, 48, 64, 20);
let encoded = pq.encode_vec(&vector);  // 48 bytes

// Cold: Binary (smallest, approximate)
let bits = encode_binary(&vector);     // 48 bytes
```
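
The byte counts in the comments follow directly from the encodings. The sketch below reproduces the hot and cold tiers with stand-in functions so the arithmetic can be checked. These are simplified assumptions, not the `rvf-quant` code: `encode_scalar` uses a fixed `[min, max]` range instead of a trained quantizer, and this `encode_binary` keeps only the sign bit. Product quantization is left out because it needs trained codebooks.

```rust
/// Scalar quantization: map each f32 in [min, max] to one u8.
/// Assumes min <= max; values outside the range are clamped.
fn encode_scalar(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    v.iter()
        .map(|&x| ((x.clamp(min, max) - min) * scale).round() as u8)
        .collect()
}

/// Binary quantization: keep only the sign bit, packed 8 per byte.
fn encode_binary(v: &[f32]) -> Vec<u8> {
    let mut out = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

fn main() {
    // A 384-dim vector with components spread over [-1, 1].
    let v: Vec<f32> = (0..384).map(|i| (i as f32 / 383.0) * 2.0 - 1.0).collect();
    let raw_bytes = v.len() * std::mem::size_of::<f32>();

    let hot = encode_scalar(&v, -1.0, 1.0);
    let cold = encode_binary(&v);

    println!("raw:    {} bytes", raw_bytes);
    println!("scalar: {} bytes ({}x smaller)", hot.len(), raw_bytes / hot.len());
    println!("binary: {} bytes ({}x smaller)", cold.len(), raw_bytes / cold.len());
}
```

For 384 dimensions this gives 1,536 raw bytes, 384 bytes after scalar quantization, and 48 bytes after binary quantization, matching the tier table.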

### Run the Example

```bash
cargo run --example quantization
```

You'll see a comparison table showing compression ratio, reconstruction error (MSE), and bytes per vector for each tier.

</details>

<details>
<summary><strong>Tutorial: Building Witness Chains for Audit Trails</strong></summary>

### What Is a Witness Chain?

A witness chain is a tamper-evident log of events. Each entry links to the previous one through a cryptographic hash. If any entry is modified, all subsequent hash links break — making tampering detectable without a blockchain.

### Chain Structure

```
  Entry 0 (genesis)       Entry 1                 Entry 2
+-------------------+   +-------------------+   +-------------------+
| prev_hash: 0x00.. |   | prev_hash: H(E0)  |   | prev_hash: H(E1)  |
| action: H(data)   |   | action: H(data)   |   | action: H(data)   |
| timestamp: T0     |   | timestamp: T1     |   | timestamp: T2     |
| type: PROVENANCE  |   | type: COMPUTATION |   | type: SEARCH      |
+-------------------+   +-------------------+   +-------------------+
      73 bytes                73 bytes                73 bytes
```

- **prev_hash**: SHAKE-256 hash of the previous entry (zeroed for genesis)
- **action_hash**: SHAKE-256 hash of whatever action is being recorded
- **timestamp_ns**: Nanosecond UNIX timestamp
- **witness_type**: What kind of event (see table below)

### Witness Types

| Code | Name | When to Use |
|------|------|------------|
| `0x01` | PROVENANCE | Data origin tracking (e.g., "loaded from model X") |
| `0x02` | COMPUTATION | Operation recording (e.g., "built HNSW index") |
| `0x03` | SEARCH | Query audit (e.g., "searched for query Q, got results R") |
| `0x04` | DELETION | Deletion audit (e.g., "deleted vectors 1-100") |
| `0x05` | PLATFORM_ATTESTATION | TEE attestation (e.g., "enclave measured as M") |
| `0x06` | KEY_BINDING | Sealed key (e.g., "key K bound to enclave M") |
| `0x07` | COMPUTATION_PROOF | Verified computation (e.g., "search ran inside enclave") |
| `0x08` | DATA_PROVENANCE | Full chain (e.g., "model -> TEE -> RVF file") |
| `0x09` | DERIVATION | File lineage derivation event |
| `0x0A` | LINEAGE_MERGE | Multi-parent lineage merge |
| `0x0B` | LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
| `0x0C` | LINEAGE_TRANSFORM | Lineage transform operation |
| `0x0D` | LINEAGE_VERIFY | Lineage verification event |

### Creating and Verifying

```rust
use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry, shake256_256};

// Record three events
let entries = vec![
    WitnessEntry {
        prev_hash: [0; 32],  // genesis
        action_hash: shake256_256(b"loaded embeddings from model-v2"),
        timestamp_ns: 1_700_000_000_000_000_000,
        witness_type: 0x01,
    },
    WitnessEntry {
        prev_hash: [0; 32],  // filled by create_witness_chain
        action_hash: shake256_256(b"built HNSW index (M=16, ef=200)"),
        timestamp_ns: 1_700_000_001_000_000_000,
        witness_type: 0x02,
    },
    WitnessEntry {
        prev_hash: [0; 32],
        action_hash: shake256_256(b"query: top-10 for user request #42"),
        timestamp_ns: 1_700_000_002_000_000_000,
        witness_type: 0x03,
    },
];

let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes).unwrap();
assert_eq!(verified.len(), 3);
```

### Tamper Detection

Flip any byte in the chain and verification fails:

```rust
let mut tampered = chain_bytes.clone();
tampered[100] ^= 0xFF;  // flip one byte

assert!(verify_witness_chain(&tampered).is_err());  // detected!
```
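
To show why a flipped byte breaks the chain, the following sketch rebuilds the 73-byte entry layout and the `prev_hash` linking using only the standard library. It is an illustration under loud assumptions, not the `rvf-crypto` code: `toy_hash` is a non-cryptographic stand-in for SHAKE-256, the field order and little-endian timestamp are guesses based on the diagram, and `build_chain` and `verify_chain` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const ENTRY_SIZE: usize = 73; // 32 + 32 + 8 + 1

/// Stand-in for SHAKE-256 (NOT cryptographic): four seeded 64-bit hashes.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for seed in 0..4u64 {
        let mut h = DefaultHasher::new();
        (seed, data).hash(&mut h);
        let start = seed as usize * 8;
        out[start..start + 8].copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// Serialize one entry: prev_hash | action_hash | timestamp_ns | witness_type.
fn encode_entry(prev: &[u8; 32], action: &[u8; 32], ts: u64, kind: u8) -> [u8; ENTRY_SIZE] {
    let mut buf = [0u8; ENTRY_SIZE];
    buf[..32].copy_from_slice(prev);
    buf[32..64].copy_from_slice(action);
    buf[64..72].copy_from_slice(&ts.to_le_bytes());
    buf[72] = kind;
    buf
}

/// Build a chain; each prev_hash is the hash of the previous entry's bytes.
fn build_chain(events: &[(&str, u64, u8)]) -> Vec<u8> {
    let mut chain = Vec::with_capacity(events.len() * ENTRY_SIZE);
    let mut prev = [0u8; 32]; // genesis
    for &(action, ts, kind) in events {
        let entry = encode_entry(&prev, &toy_hash(action.as_bytes()), ts, kind);
        prev = toy_hash(&entry);
        chain.extend_from_slice(&entry);
    }
    chain
}

/// Check every prev_hash link; returns the entry count on success.
fn verify_chain(chain: &[u8]) -> Result<usize, String> {
    if chain.len() % ENTRY_SIZE != 0 {
        return Err("truncated chain".to_string());
    }
    let mut expected = [0u8; 32];
    for (i, entry) in chain.chunks(ENTRY_SIZE).enumerate() {
        if entry[..32] != expected[..] {
            return Err(format!("broken link at entry {}", i));
        }
        expected = toy_hash(entry);
    }
    Ok(chain.len() / ENTRY_SIZE)
}

fn main() {
    let chain = build_chain(&[
        ("loaded embeddings", 1, 0x01),
        ("built index", 2, 0x02),
        ("query", 3, 0x03),
    ]);
    println!("intact:   {:?}", verify_chain(&chain));

    let mut tampered = chain.clone();
    tampered[100] ^= 0xFF; // lands inside entry 1's prev_hash
    println!("tampered: {:?}", verify_chain(&tampered));
}
```

A link-only check like this one cannot notice a change confined to the action hash, timestamp, or type of the final entry, because no later entry commits to it. How `rvf-crypto` covers that case is not shown here.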

### Run the Example

```bash
cargo run --example crypto_signing
```

The example creates a 5-entry chain, verifies it, then demonstrates tamper and truncation detection.

</details>

<details>
<summary><strong>Tutorial: Wire Format Deep Dive</strong></summary>

### Segment Header (64 bytes)

Every piece of data in an RVF file is wrapped in a self-describing segment. The header is always exactly 64 bytes:

```
Offset  Size  Field             Description
------  ----  -----             -----------
0x00    4     magic             0x52564653 ("RVFS")
0x04    1     version           Format version (currently 1)
0x05    1     seg_type          Segment type (VEC, INDEX, MANIFEST, ...)
0x06    2     flags             Bitfield (COMPRESSED, SIGNED, ATTESTED, ...)
0x08    8     segment_id        Monotonically increasing ID
0x10    8     payload_length    Byte length of payload
0x18    8     timestamp_ns      Nanosecond UNIX timestamp
0x20    1     checksum_algo     0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21    1     compression       0=none, 1=LZ4, 2=ZSTD
0x22    2     reserved_0        Must be zero
0x24    4     reserved_1        Must be zero
0x28    16    content_hash      First 128 bits of payload hash
0x38    4     uncompressed_len  Original size before compression
0x3C    4     alignment_pad     Padding to 64-byte boundary
```
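
As a rough illustration of reading this layout, the sketch below parses the 64 bytes into a struct. The `SegmentHeader` struct, `parse_header`, and the `take` helper are hypothetical names, not the `rvf-wire` API. It assumes all multi-byte fields are little-endian (as stated in the File Format Specification section of this README) and that the magic is stored as a little-endian `u32` with value `0x52564653`.

```rust
const SEGMENT_MAGIC: u32 = 0x5256_4653; // "RVFS"
const HEADER_SIZE: usize = 64;

/// Illustrative mirror of the layout table; reserved and padding bytes are skipped.
#[derive(Debug, PartialEq)]
struct SegmentHeader {
    version: u8,
    seg_type: u8,
    flags: u16,
    segment_id: u64,
    payload_length: u64,
    timestamp_ns: u64,
    checksum_algo: u8,
    compression: u8,
    content_hash: [u8; 16],
    uncompressed_len: u32,
}

/// Copy `N` bytes starting at `off` into a fixed-size array.
fn take<const N: usize>(buf: &[u8], off: usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&buf[off..off + N]);
    out
}

fn parse_header(buf: &[u8]) -> Result<SegmentHeader, String> {
    if buf.len() < HEADER_SIZE {
        return Err(format!("need {} bytes, got {}", HEADER_SIZE, buf.len()));
    }
    if u32::from_le_bytes(take(buf, 0x00)) != SEGMENT_MAGIC {
        return Err("bad magic".to_string());
    }
    // reserved_0 (0x22..0x24) and reserved_1 (0x24..0x28) must be zero.
    if buf[0x22..0x28].iter().any(|&b| b != 0) {
        return Err("reserved fields must be zero".to_string());
    }
    Ok(SegmentHeader {
        version: buf[0x04],
        seg_type: buf[0x05],
        flags: u16::from_le_bytes(take(buf, 0x06)),
        segment_id: u64::from_le_bytes(take(buf, 0x08)),
        payload_length: u64::from_le_bytes(take(buf, 0x10)),
        timestamp_ns: u64::from_le_bytes(take(buf, 0x18)),
        checksum_algo: buf[0x20],
        compression: buf[0x21],
        content_hash: take(buf, 0x28),
        uncompressed_len: u32::from_le_bytes(take(buf, 0x38)),
    })
}

fn main() {
    // Hand-build a header for a VEC segment (type 0x01) with a 1,536-byte payload.
    let mut buf = [0u8; HEADER_SIZE];
    buf[0x00..0x04].copy_from_slice(&SEGMENT_MAGIC.to_le_bytes());
    buf[0x04] = 1;
    buf[0x05] = 0x01;
    buf[0x10..0x18].copy_from_slice(&1536u64.to_le_bytes());

    match parse_header(&buf) {
        Ok(h) => println!("{:?}", h),
        Err(e) => println!("invalid header: {}", e),
    }
}
```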

### The 16 Segment Types

| Code | Name | Purpose |
|------|------|---------|
| `0x01` | VEC | Raw vector embeddings |
| `0x02` | INDEX | HNSW adjacency and routing tables |
| `0x03` | OVERLAY | Graph overlay deltas |
| `0x04` | JOURNAL | Metadata mutations, deletions |
| `0x05` | MANIFEST | Segment directory, epoch state |
| `0x06` | QUANT | Quantization dictionaries (scalar/PQ/binary) |
| `0x07` | META | Key-value metadata |
| `0x08` | HOT | Temperature-promoted data |
| `0x09` | SKETCH | Access counter sketches (Count-Min) |
| `0x0A` | WITNESS | Audit trails, attestation proofs |
| `0x0B` | PROFILE | Domain profile declarations |
| `0x0C` | CRYPTO | Key material, signature chains |
| `0x0D` | META_IDX | Metadata inverted indexes |
| `0x0E` | KERNEL | Compressed unikernel image (self-booting) |
| `0x0F` | EBPF | eBPF program for kernel-level acceleration |

### Segment Flags

| Bit | Name | Description |
|-----|------|-------------|
| 0 | COMPRESSED | Payload is compressed (LZ4 or ZSTD) |
| 1 | ENCRYPTED | Payload is encrypted |
| 2 | SIGNED | Signature footer follows payload |
| 3 | SEALED | Immutable (compaction output) |
| 4 | PARTIAL | Streaming / partial write |
| 5 | TOMBSTONE | Logical deletion marker |
| 6 | HOT | Temperature-promoted |
| 7 | OVERLAY | Contains delta data |
| 8 | SNAPSHOT | Full snapshot |
| 9 | CHECKPOINT | Safe rollback point |
| 10 | ATTESTED | Produced inside attested TEE |
| 11 | HAS_LINEAGE | File carries FileIdentity lineage data |

### Crash Safety: Two-fsync Protocol

RVF doesn't need a write-ahead log. Instead:

1. Write data segment + payload, then `fsync`
2. Write MANIFEST_SEG with updated state, then `fsync`

If the process crashes between fsyncs, the incomplete segment has no manifest reference — it's ignored on recovery. Simple, safe, fast.
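
The ordering is the whole point, so here is a minimal sketch of it using only `std::fs`. The function name `append_with_two_fsyncs` and the raw byte slices are illustrative assumptions; real segments would carry the 64-byte header and padding described above, and this is not the `rvf-runtime` write path.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append `segment`, fsync, then append `manifest`, fsync.
fn append_with_two_fsyncs(path: &Path, segment: &[u8], manifest: &[u8]) -> io::Result<()> {
    let mut file: File = OpenOptions::new().create(true).append(true).open(path)?;

    // Step 1: the data segment must be durable before anything references it.
    file.write_all(segment)?;
    file.sync_all()?;

    // Step 2: publish the segment by writing the manifest that points at it.
    file.write_all(manifest)?;
    file.sync_all()?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("two_fsync_demo.rvf");
    let _ = std::fs::remove_file(&path); // start from a clean file

    append_with_two_fsyncs(&path, b"segment-bytes", b"manifest-bytes")?;
    println!("wrote {} bytes", std::fs::metadata(&path)?.len());

    std::fs::remove_file(&path)
}
```

If the process dies after the first `sync_all` but before the second, the file ends with an unreferenced segment, which a reader following the latest manifest never sees.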

### Tail-Scan

To find the current state, scan backward from the end of the file for the latest MANIFEST_SEG. The root manifest fits in 4 KB, so cold boot takes < 5 ms.

### Run the Example

```bash
cargo run --example wire_format
```

You'll see three segments written, read back, hash-validated, corruption detected, and a tail-scan for the manifest.

</details>

<details>
<summary><strong>Tutorial: Metadata Filtering Patterns</strong></summary>

### Available Filter Expressions

| Expression | Syntax | Description |
|-----------|--------|-------------|
| `Eq` | `FilterExpr::Eq(field_id, value)` | Exact match |
| `Ne` | `FilterExpr::Ne(field_id, value)` | Not equal |
| `Gt` | `FilterExpr::Gt(field_id, value)` | Greater than |
| `Lt` | `FilterExpr::Lt(field_id, value)` | Less than |
| `Range` | `FilterExpr::Range(field_id, low, high)` | Value in [low, high) |
| `In` | `FilterExpr::In(field_id, values)` | Value is one of |
| `And` | `FilterExpr::And(vec![...])` | All conditions must match |
| `Or` | `FilterExpr::Or(vec![...])` | Any condition matches |

### Metadata Types

| Type | Rust | Use Case |
|------|------|----------|
| `String` | `MetadataValue::String("cat".into())` | Categories, labels, tags |
| `U64` | `MetadataValue::U64(95)` | Scores, counts, timestamps |
| `Bytes` | `MetadataValue::Bytes(vec![...])` | Binary data, hashes |

### Common Patterns

**Category filter:**
```rust
FilterExpr::Eq(0, FilterValue::String("science".into()))
```

**Score range:**
```rust
FilterExpr::Range(1, FilterValue::U64(30), FilterValue::U64(90))
```

**Multi-category:**
```rust
FilterExpr::In(0, vec![
    FilterValue::String("science".into()),
    FilterValue::String("tech".into()),
])
```

**Combined (AND):**
```rust
FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
])
```
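
To pin down what these expressions mean, in particular the half-open `Range`, here is a small stand-alone evaluator. It is a sketch, not the `rvf-runtime` filter engine. The `Expr` and `Value` enums are local stand-ins for `FilterExpr` and `FilterValue`, the `u16` field ID type is a guess, and treating a missing field or a type mismatch as "no match" is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Local stand-in for FilterValue (Bytes omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    Str(String),
}

/// Local stand-in for FilterExpr, mirroring the expression table.
enum Expr {
    Eq(u16, Value),
    Ne(u16, Value),
    Gt(u16, Value),
    Lt(u16, Value),
    Range(u16, Value, Value),
    In(u16, Vec<Value>),
    And(Vec<Expr>),
    Or(Vec<Expr>),
}

/// Compare two values of the same type; mixed types are incomparable.
fn compare(a: &Value, b: &Value) -> Option<Ordering> {
    match (a, b) {
        (Value::U64(x), Value::U64(y)) => Some(x.cmp(y)),
        (Value::Str(x), Value::Str(y)) => Some(x.cmp(y)),
        _ => None,
    }
}

/// Evaluate `e` against one record's metadata (field ID -> value).
fn eval(e: &Expr, rec: &HashMap<u16, Value>) -> bool {
    // Ordering of the record's field relative to `v`, or None if missing/mismatched.
    let ord = |f: &u16, v: &Value| rec.get(f).and_then(|x| compare(x, v));
    match e {
        Expr::Eq(f, v) => ord(f, v) == Some(Ordering::Equal),
        Expr::Ne(f, v) => matches!(ord(f, v), Some(Ordering::Less) | Some(Ordering::Greater)),
        Expr::Gt(f, v) => ord(f, v) == Some(Ordering::Greater),
        Expr::Lt(f, v) => ord(f, v) == Some(Ordering::Less),
        // Half-open interval: low <= value < high.
        Expr::Range(f, lo, hi) => {
            matches!(ord(f, lo), Some(Ordering::Equal) | Some(Ordering::Greater))
                && ord(f, hi) == Some(Ordering::Less)
        }
        Expr::In(f, vs) => vs.iter().any(|v| ord(f, v) == Some(Ordering::Equal)),
        Expr::And(es) => es.iter().all(|x| eval(x, rec)),
        Expr::Or(es) => es.iter().any(|x| eval(x, rec)),
    }
}

fn main() {
    let mut rec = HashMap::new();
    rec.insert(0u16, Value::Str("science".to_string()));
    rec.insert(1u16, Value::U64(90));

    let in_range = Expr::Range(1, Value::U64(30), Value::U64(90));
    let combined = Expr::And(vec![
        Expr::Eq(0, Value::Str("science".to_string())),
        Expr::Gt(1, Value::U64(80)),
    ]);
    println!("score in [30, 90): {}", eval(&in_range, &rec));
    println!("science AND score > 80: {}", eval(&combined, &rec));
}
```

With a score of exactly 90, the range filter rejects the record because the upper bound is exclusive, while the combined filter accepts it.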

### Run the Example

```bash
cargo run --example filtered_search
```

The example creates 500 vectors with category and score metadata, then runs 7 different filter queries showing selectivity and verification.

</details>

<details>
<summary><strong>Tutorial: Progressive Index Recall Measurement</strong></summary>

### What Is Recall?

**Recall@K** measures how many of the true K nearest neighbors your approximate algorithm actually returns. A recall of 0.95 means the approximate search returned 95% of the true nearest neighbors.

```
recall@K = |approximate_results ∩ exact_results| / K
```
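
The formula translates almost directly into code. The helper below is an illustrative sketch (the name `recall_at_k` is hypothetical, not taken from the example source) and assumes each result list contains unique IDs, ordered best-first.

```rust
use std::collections::HashSet;

/// recall@K = |approx ∩ exact| / K, using the first K IDs of each list.
fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    let hits = approx.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / k as f64
}

fn main() {
    let exact = [1u64, 2, 3, 4, 5]; // brute-force ground truth
    let approx = [1u64, 2, 3, 9, 5]; // approximate search missed ID 4
    println!("recall@5 = {:.2}", recall_at_k(&approx, &exact, 5));
}
```

Here four of the five true neighbors were found, so recall@5 is 0.80. The order of results within the top K does not affect the score.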

### How Progressive Indexing Achieves This

RVF builds an HNSW (Hierarchical Navigable Small World) graph, then splits it into three loadable layers:

**Layer A: Coarse Routing**
- Entry points (topmost HNSW nodes)
- Partition centroids for guided search
- Loads in microseconds
- Recall: ~0.40-0.70

**Layer B: Hot Region**
- Adjacency lists for the most frequently accessed vectors
- Covers the "working set" of your data
- Recall: ~0.70-0.85

**Layer C: Full Graph**
- Complete HNSW adjacency for all vectors
- Loaded in background while queries are already being served
- Recall: >= 0.95

### Measuring Recall in the Example

The `progressive_index` example:
1. Generates 5,000 vectors (128 dims)
2. Builds the full HNSW graph (M=16, ef_construction=200)
3. Splits into Layer A, B, C
4. Runs 50 queries at each stage
5. Computes recall@10 against brute-force ground truth

```bash
cargo run --example progressive_index
```

Expected output:

```
=== Recall Progression Summary ===
  Layers        Recall@10
  A only        0.xxx
  A + B         0.xxx
  A + B + C     0.9xx
```

### Tuning ef_search

The `ef_search` parameter controls how many candidates HNSW explores during search. Higher values improve recall at the cost of latency:

| ef_search | Recall@10 | Relative Speed |
|-----------|-----------|---------------|
| 10 | ~0.75 | Fastest |
| 50 | ~0.90 | Balanced |
| 200 | ~0.97 | Most accurate |

</details>

<details>
<summary><strong>Technical Reference: Signature Footer Format</strong></summary>

When the `SIGNED` flag is set on a segment, a signature footer follows the payload:

| Offset | Size | Field |
|--------|------|-------|
| 0x00 | 2 | `sig_algo` (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s) |
| 0x02 | 2 | `sig_length` |
| 0x04 | var | `signature` (64 to 7,856 bytes) |
| var | 4 | `footer_length` (for backward scan) |

### Supported Algorithms

| Algorithm | Signature Size | Security Level | Standard |
|-----------|---------------|---------------|----------|
| Ed25519 | 64 bytes | 128-bit classical | RFC 8032 |
| ML-DSA-65 | 3,309 bytes | NIST Level 3 (post-quantum) | FIPS 204 |
| SLH-DSA-128s | 7,856 bytes | NIST Level 1 (post-quantum, stateless) | FIPS 205 |

### Signing Flow

1. Serialize the segment header (64 bytes) and payload into a signing buffer
2. Compute SHAKE-256 hash of the buffer
3. Sign the hash with the chosen algorithm
4. Append the signature footer after the payload (before padding)
5. Set the `SIGNED` flag in the header

### Verification Flow

1. Read segment header and payload
2. Recompute SHAKE-256 hash of header + payload
3. Read signature footer (scan backward from segment end using `footer_length`)
4. Verify signature against the public key

</details>

<details>
<summary><strong>Technical Reference: Confidential Core Attestation</strong></summary>

### Overview

RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This provides cryptographic proof that:

- The platform is genuine (e.g., real Intel SGX hardware)
- The code running inside the enclave matches a known measurement
- Encryption keys are sealed to the enclave identity
- Vector operations were computed inside the secure environment

### Supported TEE Platforms

| Platform | Enum Value | Quote Format |
|----------|-----------|--------------|
| Intel SGX | `TeePlatform::Sgx` (0) | DCAP attestation quote |
| AMD SEV-SNP | `TeePlatform::SevSnp` (1) | VCEK attestation report |
| Intel TDX | `TeePlatform::Tdx` (2) | TD quote |
| ARM CCA | `TeePlatform::ArmCca` (3) | CCA token |
| Software (testing) | `TeePlatform::SoftwareTee` (0xFE) | Synthetic (no hardware) |

### Attestation Header (112 bytes, `repr(C)`)

```
Offset  Size  Field
------  ----  -----
0x00    1     platform          TeePlatform enum value
0x01    1     attestation_type  AttestationWitnessType enum value
0x02    4     quote_length      Length of the platform-specific quote
0x06    2     reserved
0x08    32    measurement       SHAKE-256 hash of enclave code
0x28    32    signer_id         SHAKE-256 hash of signing identity
0x48    8     timestamp_ns      Nanosecond UNIX timestamp
0x50    16    nonce             Anti-replay nonce
0x60    2     svn               Security Version Number
0x62    1     sig_algo          Signature algorithm for the quote
0x63    1     flags             Attestation flags
0x64    4     report_data_len   Length of additional report data
0x68    8     reserved
```

### Attestation Types

| Type | Witness Code | Purpose |
|------|-------------|---------|
| Platform Attestation | `0x05` | TEE identity + measurement verification |
| Key Binding | `0x06` | Keys sealed to enclave measurement |
| Computation Proof | `0x07` | Proof that operations ran inside enclave |
| Data Provenance | `0x08` | Full chain: model -> TEE -> RVF file |

### ATTESTED Segment Flag

Any segment produced inside a TEE should set bit 10 (`ATTESTED`) in the segment header flags. This enables fast scanning to identify attested segments without parsing payloads.

### QuoteVerifier Trait

The verification interface is pluggable:

```rust
pub trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}
```

Implement this trait for your TEE platform to enable hardware-backed verification. The `SoftwareTee` variant allows testing without real hardware.

</details>

<details>
<summary><strong>Technical Reference: Computational Container (Self-Booting RVF)</strong></summary>

### Three-Tier Execution Model

RVF files can optionally carry executable compute alongside vector data:

| Tier | Segment | Size | Environment | Boot Time | Use Case |
|------|---------|------|-------------|-----------|----------|
| **1: WASM** | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere |
| **2: eBPF** | EBPF_SEG (`0x0F`) | 10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits |
| **3: Unikernel** | KERNEL_SEG (`0x0E`) | 200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service |

### KernelHeader (128 bytes)

| Field | Size | Description |
|-------|------|-------------|
| `kernel_magic` | 4 | `0x52564B4E` ("RVKN") |
| `header_version` | 2 | Currently 1 |
| `kernel_arch` | 1 | x86_64 (0), AArch64 (1), RISC-V (2), WASM (3) |
| `kernel_type` | 1 | HermitOS (0), Unikraft (1), Custom (2), TestStub (0xFE) |
| `image_size` | 4 | Uncompressed kernel size |
| `compressed_size` | 4 | Compressed (ZSTD) size |
| `image_hash` | 32 | SHAKE-256-256 of uncompressed image |
| `api_port` | 2 | HTTP API port (network byte order) |
| `api_transport` | 1 | HTTP (0), gRPC (1), virtio-vsock (2) |
| `kernel_flags` | 8 | Feature flags (read-only, metrics, TEE, etc.) |
| `cmdline_len` | 2 | Length of kernel command line |

### EbpfHeader (64 bytes)

| Field | Size | Description |
|-------|------|-------------|
| `ebpf_magic` | 4 | `0x52564250` ("RVBP") |
| `program_type` | 1 | XDP (0), TC (1), Tracepoint (2), Socket (3) |
| `attach_type` | 1 | XdpIngress (0), TcIngress (1), etc. |
| `max_dimension` | 4 | Maximum vector dimension (eBPF verifier loop bound) |
| `bytecode_size` | 4 | Size of BPF ELF object |
| `btf_size` | 4 | Size of BTF section |
| `map_count` | 4 | Number of BPF maps |

### Embedding and Extracting

```rust
use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

let mut store = RvfStore::open("vectors.rvf")?;

// Embed a kernel
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &image, 8080)?;

// Embed an eBPF program
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later
let (kernel_hdr, kernel_img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, ebpf_prog) = store.extract_ebpf()?.unwrap();
```

### Forward Compatibility

Files with KERNEL_SEG or EBPF_SEG work with older readers -- unknown segment types are skipped per the RVF forward-compatibility rule. The computational capability is purely additive.

See [ADR-030](../../docs/adr/ADR-030-rvf-computational-container.md) for the full specification.

</details>

<details>
<summary><strong>Technical Reference: DNA-Style Lineage Provenance</strong></summary>

### How Lineage Works

Every RVF file carries a 68-byte `FileIdentity` in its root manifest:

| Field | Size | Description |
|-------|------|-------------|
| `file_id` | 16 | Unique UUID for this file |
| `parent_id` | 16 | UUID of the parent file (all zeros for root) |
| `parent_hash` | 32 | SHAKE-256-256 of parent's manifest |
| `lineage_depth` | 4 | Generation count (0 for root) |

### Derivation Chain

```
Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
  file_id: A               file_id: B              file_id: C
  parent_id: [0;16]        parent_id: A            parent_id: B
  parent_hash: [0;32]      parent_hash: hash(A)    parent_hash: hash(B)
  depth: 0                 depth: 1                depth: 2
```
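
The 68-byte record and the derivation chain can be modeled in a few lines. The sketch below is illustrative only: this local `FileIdentity` struct, `derive_child`, and the byte layout (fields in table order, little-endian depth) are assumptions, not the `rvf-types` definitions, and the hash values are placeholders where a real SHAKE-256 digest of the parent manifest would go.

```rust
const FILE_IDENTITY_SIZE: usize = 68; // 16 + 16 + 32 + 4

/// Illustrative mirror of the FileIdentity table.
#[derive(Debug, Clone, PartialEq)]
struct FileIdentity {
    file_id: [u8; 16],
    parent_id: [u8; 16],
    parent_hash: [u8; 32],
    lineage_depth: u32,
}

impl FileIdentity {
    /// A root file has a zeroed parent and depth 0.
    fn root(file_id: [u8; 16]) -> Self {
        FileIdentity { file_id, parent_id: [0; 16], parent_hash: [0; 32], lineage_depth: 0 }
    }

    /// A child records its parent's ID and manifest hash, one generation deeper.
    fn derive_child(&self, child_id: [u8; 16], parent_manifest_hash: [u8; 32]) -> Self {
        FileIdentity {
            file_id: child_id,
            parent_id: self.file_id,
            parent_hash: parent_manifest_hash,
            lineage_depth: self.lineage_depth + 1,
        }
    }

    fn to_bytes(&self) -> [u8; FILE_IDENTITY_SIZE] {
        let mut b = [0u8; FILE_IDENTITY_SIZE];
        b[0..16].copy_from_slice(&self.file_id);
        b[16..32].copy_from_slice(&self.parent_id);
        b[32..64].copy_from_slice(&self.parent_hash);
        b[64..68].copy_from_slice(&self.lineage_depth.to_le_bytes());
        b
    }

    fn from_bytes(b: &[u8; FILE_IDENTITY_SIZE]) -> Self {
        let mut id = FileIdentity::root([0; 16]);
        id.file_id.copy_from_slice(&b[0..16]);
        id.parent_id.copy_from_slice(&b[16..32]);
        id.parent_hash.copy_from_slice(&b[32..64]);
        id.lineage_depth = u32::from_le_bytes([b[64], b[65], b[66], b[67]]);
        id
    }
}

fn main() {
    let parent = FileIdentity::root([0xAA; 16]);
    let child = parent.derive_child([0xBB; 16], [0x11; 32]); // placeholder hash
    let grandchild = child.derive_child([0xCC; 16], [0x22; 32]); // placeholder hash

    println!("depths: {} -> {} -> {}",
        parent.lineage_depth, child.lineage_depth, grandchild.lineage_depth);
    println!("round-trips: {}", FileIdentity::from_bytes(&child.to_bytes()) == child);
}
```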

### Derivation Types

| Code | Type | Description |
|------|------|-------------|
| 0 | Clone | Exact copy |
| 1 | Filter | Subset of parent's vectors |
| 2 | Merge | Multi-parent merge |
| 3 | Quantize | Re-quantized version |
| 4 | Reindex | Re-indexed with different parameters |
| 5 | Transform | Transformed embeddings |
| 6 | Snapshot | Point-in-time snapshot |
| 0xFF | UserDefined | Application-specific derivation |

### Using the API

```rust
use rvf_runtime::RvfStore;
use rvf_types::DerivationType;

let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
```

### Domain Extensions

| Extension | Domain Profile | Optimized For |
|-----------|---------------|---------------|
| `.rvf` | Generic | General-purpose vectors |
| `.rvdna` | RVDNA | Genomic sequence embeddings |
| `.rvtext` | RVText | Language model embeddings |
| `.rvgraph` | RVGraph | Graph/network node embeddings |
| `.rvvis` | RVVision | Image/vision model embeddings |

See [ADR-029](../../docs/adr/ADR-029-rvf-canonical-format.md) for the full format specification.

</details>

<details>
<summary><strong>Technical Reference: Crate Architecture</strong></summary>

### Crate Map

```
     +-----------------------------------------+
     |             Cognitive Layer             |
     | ruvllm | gnn | ruQu | attention | sona  |
     | mincut | prime-radiant | nervous-system |
     +---+-------------+---------------+-------+
         |             |               |
     +-----------------------------------------+
     |            Application Layer            |
     | claude-flow | agentdb | agentic-flow    |
     | ospipe | rvlite | sona | your-app       |
     +---+-------------+---------------+-------+
         |             |               |
     +---v-------------v---------------v-------+
     |              RVF SDK Layer              |
     | rvf-runtime | rvf-index | rvf-quant     |
     | rvf-manifest | rvf-crypto | rvf-wire    |
     +---+-------------+---------------+-------+
         |             |               |
+--------v------+ +---v--------+ +----v-------+ +----v------+
|  rvf-server   | |  rvf-node  | |  rvf-wasm  | |  rvf-cli  |
|  HTTP + TCP   | |   N-API    | |   ~46 KB   | |   clap    |
+---------------+ +------------+ +------------+ +-----------+
```

### Crate Details

| Crate | Lines | no_std | Purpose |
|-------|------:|:------:|---------|
| `rvf-types` | 3,184 | Yes | Segment types, kernel/eBPF headers, lineage, enums |
| `rvf-wire` | 2,011 | Yes | Wire format read/write, hash validation |
| `rvf-manifest` | 1,580 | No | Two-level manifest with 4 KB root, FileIdentity codec |
| `rvf-index` | 2,691 | No | HNSW progressive indexing (Layer A/B/C) |
| `rvf-quant` | 1,443 | No | Scalar, product, and binary quantization |
| `rvf-crypto` | 1,725 | Partial | SHAKE-256, Ed25519, witness chains, attestation, lineage |
| `rvf-runtime` | 3,607 | No | Full store API, compaction, lineage, kernel/eBPF embed |
| `rvf-import` | 980 | No | JSON, CSV, NumPy (.npy) importers |
| `rvf-wasm` | 1,616 | Yes | WASM control plane: in-memory store, query, segment inspection |
| `rvf-node` | 852 | No | Node.js N-API bindings with lineage, kernel/eBPF, inspection |
| `rvf-cli` | 665 | No | Unified CLI: create, ingest, query, delete, status, inspect, compact, derive, serve |
| `rvf-server` | 1,165 | No | HTTP REST + TCP streaming server |

### Library Adapters

| Adapter | Purpose | Key Feature |
|---------|---------|-------------|
| `rvf-adapter-claude-flow` | AI agent memory | WITNESS_SEG audit trails |
| `rvf-adapter-agentdb` | Agent vector database | Progressive HNSW indexing |
| `rvf-adapter-ospipe` | Observation-State pipeline | META_SEG for state vectors |
| `rvf-adapter-agentic-flow` | Swarm coordination | Inter-agent memory sharing |
| `rvf-adapter-rvlite` | Lightweight embedded store | Minimal API, edge-friendly |
| `rvf-adapter-sona` | Neural architecture | Experience replay + trajectories |

</details>

<details>
<summary><strong>Technical Reference: File Format Specification</strong></summary>

### File Extension

| Extension | Usage |
|-----------|-------|
| `.rvf` | Standard RuVector Format file |
| `.rvf.cold.N` | Cold shard N (multi-file mode) |
| `.rvf.idx.N` | Index shard N (multi-file mode) |

### MIME Type

`application/x-ruvector-format`

### Magic Number

`0x52564653` (ASCII: "RVFS")

### Byte Order

All multi-byte integers are **little-endian**.

### Alignment

All segments are **64-byte aligned** (cache-line friendly). Payloads are padded to the next 64-byte boundary.
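
The padding rule is plain round-up arithmetic. As a small sketch (the function names are illustrative, and it assumes a segment is just the 64-byte header followed by the padded payload, with no signature footer):

```rust
const ALIGN: u64 = 64;
const HEADER_SIZE: u64 = 64;

/// Round `len` up to the next multiple of 64.
fn padded_len(len: u64) -> u64 {
    (len + ALIGN - 1) / ALIGN * ALIGN
}

/// Offset of the segment that follows one starting at `segment_start`.
fn next_segment_offset(segment_start: u64, payload_len: u64) -> u64 {
    segment_start + HEADER_SIZE + padded_len(payload_len)
}

fn main() {
    for len in [0u64, 1, 64, 65, 1536] {
        println!("payload {:>4} bytes -> padded {:>4} bytes", len, padded_len(len));
    }
    println!("next segment after a 100-byte payload at offset 0: {}",
        next_segment_offset(0, 100));
}
```

A 1,536-byte payload (one raw 384-dim vector) is already a multiple of 64 and needs no padding, while a 65-byte payload occupies 128 bytes on disk.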

### Root Manifest

The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via backward scan:

```rust
let (offset, header) = find_latest_manifest(&file_data)?;
```

The root manifest provides:
- Segment directory (offsets to all segments)
- Hotset pointers (entry points, top layer, centroids, quant dicts)
- Epoch counter
- Vector count and dimension
- Profile identifiers

### Domain Profiles

| Profile | Code | Optimized For |
|---------|------|---------------|
| Generic | `0x00` | General-purpose vectors |
| RVDNA | `0x01` | Genomic sequence embeddings |
| RVText | `0x02` | Language model embeddings |
| RVGraph | `0x03` | Graph/network node embeddings |
| RVVision | `0x04` | Image/vision model embeddings |

</details>

<details>
<summary><strong>Building from Source</strong></summary>

### Prerequisites

- **Rust 1.87+** via [rustup](https://rustup.rs/) (`rustup update stable`)
- For WASM: `rustup target add wasm32-unknown-unknown`
- For Node.js bindings: Node.js 18+ and `npm`

### Build Examples

```bash
cd examples/rvf
cargo build
```

### Build All RVF Crates

```bash
cd crates/rvf
cargo build --workspace
```

### Run All Tests

```bash
cd crates/rvf
cargo test --workspace
```

### Run Clippy

```bash
cd crates/rvf
cargo clippy --all-targets --workspace --exclude rvf-wasm
```

### Build WASM Microkernel

```bash
cd crates/rvf
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm
```

### Build Node.js Bindings

```bash
cd crates/rvf/rvf-node
npm install && npm run build
```

### Run Benchmarks

```bash
cd crates/rvf
cargo bench --bench rvf_benchmarks
```

</details>

---

<details>
<summary><strong>Project Structure</strong></summary>

```
examples/rvf/
  Cargo.toml                # Standalone workspace
  src/lib.rs                # Shared utilities
  examples/
    # Core (6)
    basic_store.rs          # Store lifecycle, insert, query, persistence
    progressive_index.rs    # Three-layer HNSW, recall measurement
    quantization.rs         # Scalar, product, binary quantization + tiering
    wire_format.rs          # Raw segment I/O, hash validation, tail-scan
    crypto_signing.rs       # Ed25519 signing, witness chains, tamper detection
    filtered_search.rs      # Metadata-filtered vector search
    # Agentic AI (6)
    agent_memory.rs         # Persistent agent memory + witness audit
    swarm_knowledge.rs      # Multi-agent shared knowledge base
    reasoning_trace.rs      # Chain-of-thought with lineage derivation
    tool_cache.rs           # Tool call result cache with TTL + compaction
    agent_handoff.rs        # Transfer agent state between instances
    experience_replay.rs    # RL experience replay buffer
    # Practical Production (5)
    semantic_search.rs      # Document search engine (4 filter workflows)
    recommendation.rs       # Item recommendations (collaborative filtering)
    rag_pipeline.rs         # Retrieval-augmented generation pipeline
    embedding_cache.rs      # LRU cache with temperature tiering
    dedup_detector.rs       # Near-duplicate detection + compaction
    # Vertical Domains (4)
    genomic_pipeline.rs     # DNA k-mer search (.rvdna profile)
    financial_signals.rs    # Market signals with attestation
    medical_imaging.rs      # Radiology embedding search (.rvvis)
    legal_discovery.rs      # Legal document similarity (.rvtext)
    # Exotic Capabilities (5)
    self_booting.rs         # RVF with embedded unikernel
    ebpf_accelerator.rs     # eBPF hot-path acceleration
    hyperbolic_taxonomy.rs  # Hierarchy-aware search
    multimodal_fusion.rs    # Cross-modal text + image search
    sealed_engine.rs        # Full cognitive engine (capstone)
    # Runtime Targets + Postgres (5)
    browser_wasm.rs         # Browser-side WASM vector search
    edge_iot.rs             # IoT device with binary quantization
    serverless_function.rs  # Cold-start optimized for Lambda
    ruvllm_inference.rs     # LLM KV cache + LoRA via RVF
    postgres_bridge.rs      # PostgreSQL ↔ RVF export/import
    # Network & Security (4)
    network_sync.rs         # Peer-to-peer vector store sync
    tee_attestation.rs      # TEE attestation + sealed keys
    access_control.rs       # Role-based vector access control
    zero_knowledge.rs       # Zero-knowledge proof integration
    # Autonomous Agent (1)
    ruvbot.rs               # Autonomous RVF-powered agent bot
    # POSIX & Systems (3)
    posix_fileops.rs        # POSIX file operations with RVF
    linux_microkernel.rs    # Linux microkernel distribution
    mcp_in_rvf.rs           # MCP server embedded in RVF
    # Network Operations (1)
    network_interfaces.rs   # Network OS telemetry (60 interfaces)
```

</details>

## Learn More

| Resource | Description |
|----------|-------------|
| [RVF Format Specification](../../crates/rvf/README.md) | Full format documentation, architecture, and API reference |
| [ADR-029](../../docs/adr/ADR-029-rvf-canonical-format.md) | Architecture decision record for the canonical format |
| [ADR-030](../../docs/adr/ADR-030-rvf-computational-container.md) | Computational container (KERNEL_SEG, EBPF_SEG) specification |
| [ADR-031](../../docs/adr/ADR-031-rvf-example-repository.md) | Example repository design (this collection of 40 examples) |
| [Benchmarks](../../crates/rvf/benches/) | Performance benchmarks (HNSW build, quantization, wire I/O) |
| [Integration Tests](../../crates/rvf/tests/rvf-integration/) | E2E test suite (progressive recall, quantization, wire interop) |

## Contributing

```bash
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
cargo build && cargo run --example basic_store
```

All contributions must pass `cargo clippy` with zero warnings and maintain the existing test count (currently 543+).

## License

Dual-licensed under [MIT](../../LICENSE-MIT) or [Apache-2.0](../../LICENSE-APACHE) at your option.

---

<p align="center">
<sub>Built with Rust. One file — store it, send it, run it.</sub>
</p>