RVF: RuVector Format

A Living, Self-Reorganizing Runtime Substrate for Vector Intelligence


Document Index

Core Specification (spec/)

#   Document                  Description
00  Overview                  The Four Laws, design coordinates, philosophy
01  Segment Model             Append-only segments, headers, lifecycle, multi-file
02  Manifest System           Two-level manifests, hotset pointers, progressive boot
03  Temperature Tiering       Adaptive layout, access sketches, promotion/demotion
04  Progressive Indexing      Layer A/B/C availability, lazy build, partial search
05  Overlay Epochs            Streaming min-cut, epoch boundaries, rollback
06  Query Optimization        SIMD alignment, prefetch, varint IDs, cache analysis
07  Deletion & Lifecycle      Vector deletion, JOURNAL_SEG wire format, deletion bitmaps, compaction
08  Filtered Search           META_SEG wire format, filter expressions, metadata indexes
09  Concurrency & Versioning  Writer locking, reader-writer coordination, space reclamation
10  Operations API            Batch ops, error codes, network streaming, compaction scheduling

Wire Format (wire/)

Document            Description
Binary Layout       Byte-level format reference, all segment payloads

WASM Microkernel (microkernel/)

Document            Description
WASM Runtime        Cognitum tile mapping, 14 exports, hub-tile protocol

Domain Profiles (profiles/)

Document            Description
Domain Profiles     RVDNA, RVText, RVGraph, RVVision specifications

Cryptography (crypto/)

Document            Description
Quantum Signatures  ML-DSA-65, SHAKE-256, hybrid encryption, witnesses

Benchmarks (benchmarks/)

Document            Description
Acceptance Tests    Performance targets, crash safety, scalability

Quick Reference

The Four Laws

  1. Truth lives at the tail
  2. Every segment is independently valid
  3. Data and state are separated
  4. The format adapts to its workload

Minimal Upgrade Path (smallest changes that unlock everything)

  1. Add tail manifest segments
  2. Make every payload a segment with its own hash and length
  3. Add hotset pointers in the manifest
  4. Add an epoch overlay model
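
Steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the RVF wire format (see wire/ for that): the magic value, header layout, type codes, 16-byte SHAKE-256 digest, and the 8-byte trailer that points at the newest manifest are all assumptions made for illustration. The sketch also assumes every write batch ends with a manifest, so the trailer is always the last thing in the file. Hotset pointers and epoch overlays (steps 3 and 4) are left out.

```python
import hashlib
import struct

MAGIC = b"RVFS"                      # hypothetical magic, not from the spec
HEADER = struct.Struct("<4sBQ16s")   # magic, type, payload length, digest
TRAILER = struct.Struct("<Q")        # offset of the newest manifest segment
SEG_VEC, SEG_MANIFEST = 1, 2         # hypothetical segment type codes


def digest(payload: bytes) -> bytes:
    """16-byte SHAKE-256 digest of the payload (length is an assumption)."""
    return hashlib.shake_256(payload).digest(16)


def append_segment(log: bytearray, seg_type: int, payload: bytes) -> int:
    """Append one self-describing segment and return its offset."""
    offset = len(log)
    log += HEADER.pack(MAGIC, seg_type, len(payload), digest(payload))
    log += payload
    return offset


def append_manifest(log: bytearray, payload: bytes) -> int:
    """Append a manifest segment, then a trailer that points at it."""
    offset = append_segment(log, SEG_MANIFEST, payload)
    log += TRAILER.pack(offset)
    return offset


def read_segment(log: bytes, offset: int) -> tuple[int, bytes]:
    """Validate a segment using only its own header; return (type, payload)."""
    magic, seg_type, length, expected = HEADER.unpack_from(log, offset)
    start = offset + HEADER.size
    payload = bytes(log[start:start + length])
    if magic != MAGIC or len(payload) != length or digest(payload) != expected:
        raise ValueError(f"corrupt segment at offset {offset}")
    return seg_type, payload


def latest_manifest(log: bytes) -> bytes:
    """Read the trailer at the tail, then the manifest it points to."""
    (offset,) = TRAILER.unpack_from(log, len(log) - TRAILER.size)
    seg_type, payload = read_segment(log, offset)
    if seg_type != SEG_MANIFEST:
        raise ValueError("trailer does not point at a manifest")
    return payload
```

Because each segment carries its own length and hash, read_segment can validate any segment without consulting the rest of the file (Law 2). A reader can find the current state with two small reads from the end of the file instead of a forward scan (Law 1).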

Hardware Profiles

  • Core: 8 KB code + 8 KB data + 64 KB SIMD (Cognitum tile)
  • Hot: Multi-tile chip with shared memory
  • Full: Desktop/server with mmap and full feature set

Key Numbers

  • Boot: 4 KB read, < 5 ms
  • First query: <= 4 MB read, recall >= 0.70
  • Full quality: recall >= 0.95
  • Signing: ML-DSA-65, 3,309 B signatures, ~4,500 sign/s
  • Distance: 384-dim fp16 L2 in ~12 AVX-512 cycles
  • Hot entry: 960 bytes (vector + 16 neighbors, cache-line aligned)
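
Two of these figures can be cross-checked with simple arithmetic, sketched below in Python. The only hard facts used are that fp16 takes 2 bytes and that an AVX-512 register is 512 bits (64 bytes). Everything else is an assumption: that the hot entry stores the same 384-dim fp16 vector as the distance figure, that a cache line is 64 bytes, and that the bytes left over are split evenly across the 16 neighbors. The function name is hypothetical.

```python
def key_number_arithmetic() -> dict[str, int]:
    """Cross-check the distance and hot-entry figures from Key Numbers."""
    dims = 384            # from the Distance line
    fp16_bytes = 2        # IEEE 754 half precision
    zmm_bytes = 64        # one AVX-512 register = 512 bits
    cache_line = 64       # assumed cache-line size in bytes
    hot_entry = 960       # from the Hot entry line
    neighbors = 16        # from the Hot entry line

    vector_bytes = dims * fp16_bytes
    return {
        "vector_bytes": vector_bytes,
        # Register-width chunks per vector; lines up with "~12 cycles"
        # if roughly one chunk is processed per cycle (an assumption).
        "zmm_chunks": vector_bytes // zmm_bytes,
        # Bytes left in the hot entry after the vector itself.
        "remaining_bytes": hot_entry - vector_bytes,
        # Even split across neighbors (the actual use is not stated here).
        "bytes_per_neighbor": (hot_entry - vector_bytes) // neighbors,
        # Whole cache lines occupied by one hot entry.
        "cache_lines": hot_entry // cache_line,
        "cache_line_remainder": hot_entry % cache_line,
    }
```

Under these assumptions the vector occupies 768 bytes, or 12 full AVX-512 registers, and the 960-byte entry spans exactly 15 cache lines with 192 bytes left for neighbor data.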

Design Choices

  • Append-only + compaction (not random writes)
  • Targets both mmap-backed desktops and microcontroller tiles
  • Priority: streamable > progressive > adaptive > p95 speed