RVF Progressive Indexing
1. Index as Layers of Availability
Traditional HNSW serialization is all-or-nothing: either the full graph is loaded, or nothing works. RVF decomposes the index into three layers of availability, each independently useful, each stored in separate INDEX_SEG segments.
Layer C: Full Adjacency
+--------------------------------------------------+
| Complete neighbor lists for every node at every |
| HNSW level. Built lazily. Optional for queries. |
| Recall: >= 0.95 |
+--------------------------------------------------+
^ loaded last (seconds to minutes)
|
Layer B: Partial Adjacency
+--------------------------------------------------+
| Neighbor lists for the most-accessed region |
| (determined by temperature sketch). Covers the |
| hot working set of the graph. |
| Recall: >= 0.85 |
+--------------------------------------------------+
^ loaded second (100ms - 1s)
|
Layer A: Entry Points + Coarse Routing
+--------------------------------------------------+
| HNSW entry points. Top-layer adjacency lists. |
| Cluster centroids for IVF pre-routing. |
| Always present. Always in Level 0 hotset. |
| Recall: >= 0.70 |
+--------------------------------------------------+
^ loaded first (< 5ms)
|
File open
Why Three Layers
| Layer | Purpose | Data Size (10M vectors) | Load Time (NVMe) |
|---|---|---|---|
| A | First query possible | 1-4 MB | < 5 ms |
| B | Good quality for working set | 50-200 MB | 100-500 ms |
| C | Full recall for all queries | 1-4 GB | 2-10 s |
A system that only loads Layer A can still answer queries — just with lower recall. As layers B and C load asynchronously, quality improves transparently.
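The layered-availability idea can be sketched as a small tracker whose guaranteed recall floor rises as layers finish loading. This is an illustrative model, not part of the format; the names (`IndexAvailability`, `RECALL_FLOOR`) are hypothetical, and the floors come from the table above.

```python
from dataclasses import dataclass, field

# Recall floors per layer, from the table above (hypothetical constants)
RECALL_FLOOR = {"A": 0.70, "B": 0.85, "C": 0.95}

@dataclass
class IndexAvailability:
    loaded: set = field(default_factory=set)

    def mark_loaded(self, layer: str) -> None:
        # Layers arrive asynchronously in order A -> B -> C;
        # each is independently useful once present.
        self.loaded.add(layer)

    def recall_floor(self) -> float:
        # The guarantee comes from the best layer loaded so far.
        if not self.loaded:
            return 0.0
        return max(RECALL_FLOOR[layer] for layer in self.loaded)

state = IndexAvailability()
state.mark_loaded("A")        # first ~5 ms: queries become possible
print(state.recall_floor())   # 0.7
state.mark_loaded("B")        # background load completes later
print(state.recall_floor())   # 0.85
```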
2. Layer A: Entry Points and Coarse Routing
Content
- HNSW entry points: The node(s) at the highest layer of the HNSW graph. Typically 1 node, but may be multiple for redundancy.
- Top-layer adjacency: Full neighbor lists for all nodes at HNSW layers >= ceil(ln(N) / ln(M)) - 2. For 10M vectors with M=16, this is layers 5-6, containing ~100-1000 nodes.
- Cluster centroids: K centroids (K = sqrt(N) typically, so ~3162 for 10M) used for IVF-style partition routing.
- Centroid-to-partition map: Which centroid owns which vector ID ranges.
Storage
Layer A data is stored in a dedicated INDEX_SEG with flags.HOT set. The root
manifest's hotset pointers reference this segment directly. On cold start, this
is the first data mapped after the manifest.
Binary Layout of Layer A INDEX_SEG
+-------------------------------------------+
| Header: INDEX_SEG, flags=HOT |
+-------------------------------------------+
| Block 0: Entry Points |
| entry_count: u32 |
| max_layer: u32 |
| [entry_node_id: u64, layer: u32] * entry_count |
+-------------------------------------------+
| Block 1: Top-Layer Adjacency |
| layer_count: u32 |
| For each layer (top to bottom): |
| node_count: u32 |
| For each node: |
| node_id: u64 |
| neighbor_count: u16 |
| [neighbor_id: u64] * neighbor_count |
| [64B padding] |
+-------------------------------------------+
| Block 2: Centroids |
| centroid_count: u32 |
| dim: u16 |
| dtype: u8 (fp16) |
| [centroid_vector: fp16 * dim] * K |
| [64B aligned] |
+-------------------------------------------+
| Block 3: Partition Map |
| partition_count: u32 |
| For each partition: |
| centroid_id: u32 |
| vector_id_start: u64 |
| vector_id_end: u64 |
| segment_ref: u64 (segment_id) |
| block_ref: u32 (block offset) |
+-------------------------------------------+
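As a concreteness check, Block 0 of this layout can be parsed with a few lines of `struct`. This is a sketch under two assumptions not stated above: fields are little-endian, and records are packed with no padding inside the block.

```python
import struct

def parse_entry_points(buf: bytes, offset: int = 0):
    """Parse Block 0 (Entry Points) of a Layer A INDEX_SEG.

    Assumed layout (little-endian, packed):
      entry_count: u32, max_layer: u32,
      then entry_count * (entry_node_id: u64, layer: u32).
    Returns (max_layer, [(node_id, layer), ...], next_offset).
    """
    entry_count, max_layer = struct.unpack_from("<II", buf, offset)
    offset += 8
    entries = []
    for _ in range(entry_count):
        node_id, layer = struct.unpack_from("<QI", buf, offset)
        entries.append((node_id, layer))
        offset += 12
    return max_layer, entries, offset

# Round-trip against a hand-built buffer: one entry point at layer 6
blob = struct.pack("<II", 1, 6) + struct.pack("<QI", 42, 6)
max_layer, entries, end = parse_entry_points(blob)
print(max_layer, entries)  # 6 [(42, 6)]
```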
Query Using Only Layer A
def query_layer_a_only(query, k, layer_a, n_probe=8, hot_cache=None):
    # Step 1: Find nearest centroids (IVF-style pre-routing)
    dists = [distance(query, c) for c in layer_a.centroids]
    top_partitions = top_n(dists, n_probe)

    # Step 2: HNSW greedy descent through the available top layers only
    current = layer_a.entry_points[0]
    for layer in range(layer_a.max_layer, layer_a.min_available_layer - 1, -1):
        current = greedy_search(query, current, layer_a.adjacency[layer])

    # Step 3: If a hot cache is available, refine against it
    if hot_cache is not None:
        candidates = scan_hot_cache(query, hot_cache, current.partition)
        return top_k(candidates, k)

    # Step 4: Otherwise, return centroid-approximate results
    return approximate_from_centroids(query, top_partitions, k)
Expected recall: 0.65-0.75 (depends on centroid quality and hot cache coverage).
3. Layer B: Partial Adjacency
Content
Neighbor lists for the hot region of the graph — the set of nodes that appear most frequently in query traversals. Determined by the temperature sketch (see 03-temperature-tiering.md).
Typically covers:
- All nodes at HNSW layers >= 2
- Layer 0-1 nodes in the hot temperature tier
- ~10-20% of total nodes
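The coverage rule above can be sketched as a selection function over per-node (level, temperature) pairs. The interface is hypothetical; in the real system the temperature comes from the sketch described in 03-temperature-tiering.md.

```python
def select_layer_b_nodes(nodes, hot_fraction=0.15):
    """Pick the nodes that receive Layer B adjacency (a sketch).

    `nodes` maps node_id -> (hnsw_level, temperature). All nodes at
    level >= 2 are always included; level 0-1 nodes are included only
    if they fall in the hottest `hot_fraction` of all nodes.
    """
    always = {nid for nid, (lvl, _) in nodes.items() if lvl >= 2}
    low = [(temp, nid) for nid, (lvl, temp) in nodes.items() if lvl < 2]
    low.sort(reverse=True)  # hottest first
    budget = int(len(nodes) * hot_fraction)
    hot = {nid for _, nid in low[:budget]}
    return always | hot

hot = select_layer_b_nodes(
    {1: (3, 0.1), 2: (0, 0.9), 3: (0, 0.2), 4: (1, 0.5)},
    hot_fraction=0.5)
print(sorted(hot))  # [1, 2, 4]
```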
Storage
Layer B is stored in one or more INDEX_SEGs without the HOT flag. The Level 1 manifest maps these segments and records which node ID ranges they cover.
Incremental Build
Layer B can be built incrementally:
1. After Layer A is loaded, begin query serving
2. In background: read VEC_SEGs for hot-tier blocks
3. Build HNSW adjacency for those blocks
4. Write as new INDEX_SEG
5. Update manifest to include Layer B
6. Future queries use Layer B for better recall
This means the index improves over time without blocking any queries.
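Steps 1-6 can be sketched as a background task that never blocks the serving path. The `store` and `temperature_sketch` objects here are hypothetical interfaces standing in for the segment store and temperature tracking; only the control flow mirrors the list above.

```python
import threading

def build_layer_b_in_background(store, temperature_sketch):
    """Sketch of the incremental Layer B build (steps 1-6 above).

    Hypothetical interfaces: the store reads VEC_SEG blocks, writes
    INDEX_SEGs, and publishes a new manifest; the sketch ranks blocks
    by heat. Query serving continues on other threads throughout.
    """
    def worker():
        hot_blocks = temperature_sketch.hot_tier_blocks()           # step 2
        vectors = [v for b in hot_blocks for v in store.read_vec_seg(b)]
        adjacency = store.build_hnsw_adjacency(vectors)             # step 3
        seg_id = store.write_index_seg(adjacency)                   # step 4
        store.publish_manifest(extra_segments=[seg_id])             # step 5
        # Step 6 is implicit: new queries see the updated manifest.

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```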
Partial Adjacency Routing
When a query traversal reaches a node without Layer B adjacency (i.e., it's in the cold region), the system falls back to:
- Centroid routing: Use Layer A centroids to estimate the nearest region
- Linear scan: Scan the relevant VEC_SEG block directly
- Approximate: Accept slightly lower recall for that portion
def search_with_partial_index(query, k, layers):
    # Start with Layer A routing (descend to layer 2)
    current = hnsw_search_layers(query, layers.a, layers.a.max_layer, 2)

    # Continue with Layer B where adjacency is available
    if layers.b.has_node(current):
        current = hnsw_search_layers(query, layers.b, 1, 0, start=current)
    else:
        # Fallback: linear-scan the block containing current
        candidates = linear_scan_block(query, current.block)
        current = best_of(current, candidates)

    return top_k(current.visited, k)
4. Layer C: Full Adjacency
Content
Complete neighbor lists for every node at every HNSW level. This is the traditional full HNSW graph.
Storage
Layer C may be split across multiple INDEX_SEGs for large datasets. The manifest records the node ID ranges covered by each segment.
Lazy Build
Layer C is built lazily — it is not required for the file to be functional. The build process runs as a background task:
1. Identify unindexed VEC_SEG blocks (those without Layer C adjacency)
2. Read blocks in partition order (good locality)
3. Build HNSW adjacency using the existing partial graph as scaffold
4. Write new INDEX_SEG(s)
5. Update manifest
Build Prioritization
Blocks are indexed in temperature order:
- Hot blocks first (most query benefit)
- Warm blocks next
- Cold blocks last (may never be indexed if queries don't reach them)
This means the index build converges to useful quality fast, then approaches completeness asymptotically.
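The prioritization can be sketched as a heap keyed on temperature tier; the tier names and the `build_order` helper are illustrative.

```python
import heapq

# Hypothetical tier ranks: lower rank is indexed sooner
TIER_PRIORITY = {"hot": 0, "warm": 1, "cold": 2}

def build_order(blocks):
    """Order unindexed blocks for the Layer C build (a sketch).

    `blocks` is a list of (block_id, tier) pairs; hotter tiers come
    first so recall improves fastest where queries actually land.
    """
    heap = [(TIER_PRIORITY[tier], block_id) for block_id, tier in blocks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

print(build_order([(3, "cold"), (1, "hot"), (2, "warm")]))  # [1, 2, 3]
```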
5. Index Segment Binary Format
Adjacency List Encoding
Neighbor lists are stored using varint delta encoding with restart points for fast random access:
+-------------------------------------------+
| Restart Point Index |
| restart_interval: u32 (e.g., 64) |
| restart_count: u32 |
| [restart_offset: u32] * restart_count |
| [64B aligned] |
+-------------------------------------------+
| Adjacency Data |
| For each node (sorted by node_id): |
| neighbor_count: varint |
| [delta_encoded_neighbor_id: varint] |
| (restart point every N nodes) |
+-------------------------------------------+
Restart points: Every restart_interval nodes (default 64), the delta
encoding resets to absolute IDs. This bounds the cost of random access to any
node's neighbors:
- Binary search the restart point index for the nearest restart <= target
- Seek to that restart offset
- Sequentially decode from restart to target (at most restart_interval - 1 = 63 decodes)
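The lookup can be sketched as follows. Two simplifying assumptions not in the format: node indices are dense and consecutive (so the restart slot is a division rather than the binary search described above), and `decode_at` is a hypothetical decoder returning one node's neighbor list plus the next offset.

```python
def find_neighbors(node_index, restart_interval, restart_offsets, decode_at):
    """Random access into delta-encoded adjacency via restart points.

    `restart_offsets[i]` is the offset where node i * restart_interval
    begins with an absolute (non-delta) encoding. `decode_at(offset)`
    returns (neighbor_list, next_offset) for the node at `offset`.
    """
    # Nearest restart at or before the target node
    restart_idx = node_index // restart_interval
    offset = restart_offsets[restart_idx]
    # Sequentially decode from the restart up to the target
    neighbors = None
    for _ in range(node_index - restart_idx * restart_interval + 1):
        neighbors, offset = decode_at(offset)
    return neighbors
```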
Varint Encoding
Standard LEB128 varint:
- Values 0-127: 1 byte
- Values 128-16383: 2 bytes
- Values 16384-2097151: 3 bytes
For delta-encoded neighbor IDs (typical delta: 1-1000), most values fit in 1-2 bytes, giving ~3-4x compression over fixed u64.
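A minimal LEB128 encoder plus the delta step makes the compression claim concrete; `encode_neighbors` is an illustrative helper, not the exact segment writer.

```python
def encode_varint(value: int) -> bytes:
    """Standard unsigned LEB128: 7 payload bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_neighbors(neighbor_ids):
    """Delta-encode a sorted neighbor list, then varint each delta."""
    out = bytearray(encode_varint(len(neighbor_ids)))
    prev = 0
    for nid in neighbor_ids:
        out += encode_varint(nid - prev)
        prev = nid
    return bytes(out)

blob = encode_neighbors([1000, 1003, 1010, 1042])
print(len(blob))  # 6: one byte for the count, two for the first delta, one each after
```

Four u64 neighbors would take 32 bytes fixed-width; here they pack into 6.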
Prefetch Hints
The manifest's prefetch table maps node ID ranges to contiguous page ranges:
Prefetch Entry:
node_id_start: u64
node_id_end: u64
page_offset: u64 Offset of first contiguous page
page_count: u32 Number of contiguous pages
prefetch_ahead: u32 Pages to prefetch ahead of current access
When the HNSW search accesses a node, the runtime issues madvise(WILLNEED)
(or equivalent) for the next prefetch_ahead pages. This hides disk/memory
latency behind computation.
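On POSIX platforms with Python 3.8+, the hint can be issued through `mmap.madvise`; this sketch applies one prefetch entry's semantics and clamps to the mapping bounds (the `prefetch_pages` name is illustrative).

```python
import mmap

def prefetch_pages(mm: mmap.mmap, page_offset: int, page_count: int,
                   prefetch_ahead: int, page_size: int = mmap.PAGESIZE):
    """Advise the kernel to fault in the pages just past the current access.

    Mirrors the prefetch-entry fields above: after touching pages
    [page_offset, page_offset + page_count), request the next
    `prefetch_ahead` pages. No-op where madvise is unavailable.
    """
    start = (page_offset + page_count) * page_size
    length = min(prefetch_ahead * page_size, max(len(mm) - start, 0))
    if length > 0 and hasattr(mm, "madvise"):
        mm.madvise(mmap.MADV_WILLNEED, start, length)
```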
6. Index Consistency
Append-Only Index Updates
When new vectors are added:
- New vectors go into a fresh VEC_SEG (append-only)
- A temporary in-memory index covers the new vectors
- When the in-memory index reaches a threshold, it is written as a new INDEX_SEG
- The manifest is updated to include both the old and new INDEX_SEGs
- Queries search both indexes and merge results
This is analogous to LSM-tree compaction levels but for graph indexes.
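The merge step can be sketched as searching every live index and keeping the best distance per vector ID; the per-index `search` interface here is hypothetical.

```python
import heapq

def merged_top_k(query, indexes, k):
    """Search every live INDEX_SEG's index and merge results (a sketch).

    Each index is assumed to expose
    `search(query, k) -> [(distance, vector_id), ...]`. A vector that
    appears in more than one segment keeps its best distance.
    """
    best = {}
    for index in indexes:
        for dist, vid in index.search(query, k):
            if vid not in best or dist < best[vid]:
                best[vid] = dist
    # Global top-k over the deduplicated candidates
    return heapq.nsmallest(k, ((d, v) for v, d in best.items()))
```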
Index Merging
When too many small INDEX_SEGs accumulate:
1. Read all small INDEX_SEGs
2. Build a unified HNSW graph over all vectors
3. Write as a single sealed INDEX_SEG
4. Tombstone old INDEX_SEGs in manifest
Concurrent Read/Write
Readers always see a consistent snapshot through the manifest chain:
- Reader opens file -> reads manifest -> has immutable segment set
- Writer appends new segments + new manifest
- Reader continues using old manifest until it explicitly re-reads
- No locks needed — append-only guarantees no mutation of existing data
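The snapshot discipline can be modeled in a few lines: readers pin an immutable manifest, and publishing only swaps a pointer. This models the pointer swap, not the on-disk manifest chain; class names are illustrative.

```python
from dataclasses import dataclass
import threading

@dataclass(frozen=True)
class Manifest:
    """An immutable snapshot: the segment IDs visible to a reader."""
    segments: tuple

class ManifestChain:
    """Append-only manifest publication (a sketch).

    Writers publish a new Manifest; a reader keeps the snapshot it
    pinned at open time. The lock guards only the pointer swap --
    never the segment data, which is append-only and never mutated.
    """
    def __init__(self):
        self._latest = Manifest(segments=())
        self._lock = threading.Lock()

    def pin(self) -> Manifest:
        return self._latest  # reader's immutable snapshot

    def publish(self, new_segments):
        with self._lock:
            merged = self._latest.segments + tuple(new_segments)
            self._latest = Manifest(segments=merged)

chain = ManifestChain()
reader_view = chain.pin()       # reader opens the file
chain.publish([1, 2])           # writer appends segments + manifest
print(reader_view.segments)     # () -- old snapshot unchanged
print(chain.pin().segments)     # (1, 2)
```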
7. Query Path Integration
The complete query path combining progressive indexing with temperature tiering:
Query
|
v
+-----------+
| Layer A | Entry points + top-layer routing
| (always) | ~5ms to load on cold start
+-----------+
|
Is Layer B available for this region?
/ \
Yes No
/ \
+-----------+ +-----------+
| Layer B | | Centroid |
| HNSW | | Fallback |
| search | | + scan |
+-----------+ +-----------+
\ /
\ /
v v
+-----------+
| Candidate |
| Set |
+-----------+
|
Is hot cache available?
/ \
Yes No
/ \
+-----------+ +-----------+
| Hot cache | | Decode |
| re-rank | | from |
| (int8/fp16)| | VEC_SEG |
+-----------+ +-----------+
\ /
v v
+-----------+
| Top-K |
| Results |
+-----------+
Recall Expectations by State
| State | Layers Available | Expected Recall@10 |
|---|---|---|
| Cold start (L0 only) | A | 0.65-0.75 |
| L0 + hot cache | A + hot | 0.75-0.85 |
| L0 + L1 loading | A + B partial | 0.80-0.90 |
| L1 complete | A + B | 0.85-0.92 |
| Full load | A + B + C | 0.95-0.99 |
| Full + optimized | A + B + C + hot | 0.98-0.999 |