# RVF Progressive Indexing

## 1. Index as Layers of Availability

Traditional HNSW serialization is all-or-nothing: either the full graph is loaded, or nothing works. RVF decomposes the index into three layers of availability, each independently useful, each stored in separate INDEX_SEG segments.

```
Layer C: Full Adjacency
+--------------------------------------------------+
| Complete neighbor lists for every node at every  |
| HNSW level. Built lazily. Optional for queries.  |
| Recall: >= 0.95                                  |
+--------------------------------------------------+
          ^ loaded last (seconds to minutes)
          |
Layer B: Partial Adjacency
+--------------------------------------------------+
| Neighbor lists for the most-accessed region      |
| (determined by temperature sketch). Covers the   |
| hot working set of the graph.                    |
| Recall: >= 0.85                                  |
+--------------------------------------------------+
          ^ loaded second (100ms - 1s)
          |
Layer A: Entry Points + Coarse Routing
+--------------------------------------------------+
| HNSW entry points. Top-layer adjacency lists.    |
| Cluster centroids for IVF pre-routing.           |
| Always present. Always in Level 0 hotset.        |
| Recall: >= 0.70                                  |
+--------------------------------------------------+
          ^ loaded first (< 5ms)
          |
      File open
```

### Why Three Layers

| Layer | Purpose | Data Size (10M vectors) | Load Time (NVMe) |
|-------|---------|-------------------------|------------------|
| A | First query possible | 1-4 MB | < 5 ms |
| B | Good quality for working set | 50-200 MB | 100-500 ms |
| C | Full recall for all queries | 1-4 GB | 2-10 s |

A system that only loads Layer A can still answer queries — just with lower recall. As layers B and C load asynchronously, quality improves transparently.
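
To make the loading model concrete, here is a small self-contained simulation of the availability progression. The loader threads, timings, and recall constants are stand-ins taken from the table above, not a real RVF API:

```python
import threading
import time

# Simulation of progressive availability: queries are answerable the
# moment Layer A is mapped, and recall climbs as B and C arrive.
# Recall figures mirror the per-layer bounds in the diagram above.
RECALL = {"A": 0.70, "B": 0.85, "C": 0.95}
available = {"A"}                      # Layer A maps in < 5 ms

def load_layer(name: str, seconds: float) -> None:
    time.sleep(seconds)                # stand-in for reading INDEX_SEGs
    available.add(name)

threading.Thread(target=load_layer, args=("B", 0.3), daemon=True).start()
threading.Thread(target=load_layer, args=("C", 5.0), daemon=True).start()

def query_recall() -> float:
    # Quality is whatever the best currently loaded layer provides.
    return max(RECALL[layer] for layer in available)

print(query_recall())   # 0.70 immediately after open
time.sleep(1.0)
print(query_recall())   # 0.85 once Layer B has loaded
```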

## 2. Layer A: Entry Points and Coarse Routing

### Content

- **HNSW entry points**: The node(s) at the highest layer of the HNSW graph. Typically 1 node, but may be multiple for redundancy.
- **Top-layer adjacency**: Full neighbor lists for all nodes at HNSW layers >= ceil(ln(N) / ln(M)) - 2. For 10M vectors with M=16, this is layers 5-6, containing ~100-1000 nodes (checked numerically in the sketch after this list).
- **Cluster centroids**: K centroids (typically K = sqrt(N), so ~3162 for 10M) used for IVF-style partition routing.
- **Centroid-to-partition map**: Which centroid owns which vector ID ranges.
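
A quick numeric check of the formulas in this list, for the 10M-vector, M=16 example (plain Python, no RVF code involved):

```python
import math

# Layer A constants for the example N = 10M vectors, HNSW parameter M = 16.
N, M = 10_000_000, 16

# Expected height of the HNSW graph: ln(N) / ln(M) ~= 5.8, so the top
# layer sits around layer 5-6.
top_layer = math.log(N) / math.log(M)

# Layer A keeps full adjacency for all layers >= this cutoff.
cutoff = math.ceil(math.log(N) / math.log(M)) - 2

# Number of IVF routing centroids: K = sqrt(N).
K = round(math.sqrt(N))

print(f"top layer ~ {top_layer:.1f}, adjacency kept for layers >= {cutoff}, K = {K}")
# top layer ~ 5.8, adjacency kept for layers >= 4, K = 3162
```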

### Storage

Layer A data is stored in a dedicated INDEX_SEG with `flags.HOT` set. The root manifest's hotset pointers reference this segment directly. On cold start, this is the first data mapped after the manifest.

### Binary Layout of Layer A INDEX_SEG

```
+-------------------------------------------+
| Header: INDEX_SEG, flags=HOT              |
+-------------------------------------------+
| Block 0: Entry Points                     |
|   entry_count: u32                        |
|   max_layer: u32                          |
|   [entry_node_id: u64, layer: u32] * N    |
+-------------------------------------------+
| Block 1: Top-Layer Adjacency              |
|   layer_count: u32                        |
|   For each layer (top to bottom):         |
|     node_count: u32                       |
|     For each node:                        |
|       node_id: u64                        |
|       neighbor_count: u16                 |
|       [neighbor_id: u64] * neighbor_count |
|   [64B padding]                           |
+-------------------------------------------+
| Block 2: Centroids                        |
|   centroid_count: u32                     |
|   dim: u16                                |
|   dtype: u8 (fp16)                        |
|   [centroid_vector: fp16 * dim] * K       |
|   [64B aligned]                           |
+-------------------------------------------+
| Block 3: Partition Map                    |
|   partition_count: u32                    |
|   For each partition:                     |
|     centroid_id: u32                      |
|     vector_id_start: u64                  |
|     vector_id_end: u64                    |
|     segment_ref: u64 (segment_id)         |
|     block_ref: u32 (block offset)         |
+-------------------------------------------+
```
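
As an illustration, a parser for Block 0 might look like the following. Byte order is not pinned down in this excerpt, so the little-endian `<` format here is an assumption, and the function name is ours:

```python
import struct

# Sketch: parsing Block 0 (Entry Points) from the layout above.
# Assumes little-endian, tightly packed fields.
def parse_entry_points(buf: bytes, offset: int = 0):
    entry_count, max_layer = struct.unpack_from("<II", buf, offset)
    offset += 8
    entries = []
    for _ in range(entry_count):
        node_id, layer = struct.unpack_from("<QI", buf, offset)
        entries.append((node_id, layer))
        offset += 12
    return max_layer, entries, offset

# Round-trip on a synthetic buffer: one entry point, node 42 at layer 6.
blob = struct.pack("<II", 1, 6) + struct.pack("<QI", 42, 6)
print(parse_entry_points(blob))  # (6, [(42, 6)], 20)
```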

### Query Using Only Layer A

```python
def query_layer_a_only(query, k, layer_a, n_probe=8, hot_cache=None):
    # Step 1: Route to the nearest IVF partitions via the centroids
    dists = [distance(query, c) for c in layer_a.centroids]
    top_partitions = top_n(dists, n_probe)

    # Step 2: HNSW search through the top layers only (Layer A stores
    # no adjacency below min_available_layer)
    current = layer_a.entry_points[0]
    for layer in range(layer_a.max_layer, layer_a.min_available_layer, -1):
        current = greedy_search(query, current, layer_a.adjacency[layer])

    # Step 3: If a hot cache is resident, refine against it
    if hot_cache is not None:
        candidates = scan_hot_cache(query, hot_cache, current.partition)
        return top_k(candidates, k)

    # Step 4: Otherwise, return centroid-approximate results
    return approximate_from_centroids(query, top_partitions, k)
```

Expected recall: 0.65-0.75 (depends on centroid quality and hot cache coverage).

## 3. Layer B: Partial Adjacency

### Content

Neighbor lists for the **hot region** of the graph — the set of nodes that appear most frequently in query traversals. Determined by the temperature sketch (see 03-temperature-tiering.md).

Typically covers:

- All nodes at HNSW layers >= 2
- Layer 0-1 nodes in the hot temperature tier
- ~10-20% of total nodes

### Storage

Layer B is stored in one or more INDEX_SEGs without the HOT flag. The Level 1 manifest maps these segments and records which node ID ranges they cover.

### Incremental Build

Layer B can be built incrementally:

```
1. After Layer A is loaded, begin query serving
2. In background: read VEC_SEGs for hot-tier blocks
3. Build HNSW adjacency for those blocks
4. Write as new INDEX_SEG
5. Update manifest to include Layer B
6. Future queries use Layer B for better recall
```

This means the index improves over time without blocking any queries.
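
A minimal simulation of this loop, with stub functions standing in for the real segment I/O (every name here is illustrative, and the locking is only sketched):

```python
import threading
import time

# Stub I/O standing in for the real segment operations (illustrative).
def read_hot_blocks():
    return ["blk7", "blk3"]            # chosen by the temperature sketch

def build_adjacency(block):
    time.sleep(0.1)                    # stand-in for HNSW construction
    return {block: ["neighbors..."]}

def write_index_seg(adjacency):
    return f"INDEX_SEG({sorted(adjacency)})"

manifest = {"layers": ["A"]}           # queries consult this snapshot
manifest_lock = threading.Lock()

def background_build_layer_b():
    # Steps 2-4: read hot blocks, build adjacency, write the segment
    adjacency = {}
    for block in read_hot_blocks():
        adjacency.update(build_adjacency(block))
    seg = write_index_seg(adjacency)
    with manifest_lock:                # step 5: publish atomically
        manifest["layers"].append("B")
        manifest["b_segment"] = seg

# Step 1: serving starts immediately; the build never blocks queries.
threading.Thread(target=background_build_layer_b, daemon=True).start()
```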

### Partial Adjacency Routing

When a query traversal reaches a node without Layer B adjacency (i.e., it's in the cold region), the system falls back to:

1. **Centroid routing**: Use Layer A centroids to estimate the nearest region
2. **Linear scan**: Scan the relevant VEC_SEG block directly
3. **Approximate**: Accept slightly lower recall for that portion

```python
def search_with_partial_index(query, k, layers):
    # Route with Layer A: descend from the top layer down to layer 2
    current = hnsw_search_layers(query, layers.a, layers.a.max_layer, 2)

    # Continue with Layer B adjacency where it exists
    if layers.b.has_node(current):
        current = hnsw_search_layers(query, layers.b, 1, 0, start=current)
    else:
        # Fallback: the node is in the cold region, so linearly scan
        # the VEC_SEG block containing it
        candidates = linear_scan_block(query, current.block)
        current = best_of(current, candidates)

    return top_k(current.visited, k)
```

## 4. Layer C: Full Adjacency

### Content

Complete neighbor lists for every node at every HNSW level. This is the traditional full HNSW graph.

### Storage

Layer C may be split across multiple INDEX_SEGs for large datasets. The manifest records the node ID ranges covered by each segment.

### Lazy Build

Layer C is built lazily — it is not required for the file to be functional. The build process runs as a background task:

```
1. Identify unindexed VEC_SEG blocks (those without Layer C adjacency)
2. Read blocks in partition order (good locality)
3. Build HNSW adjacency using the existing partial graph as scaffold
4. Write new INDEX_SEG(s)
5. Update manifest
```

### Build Prioritization

Blocks are indexed in temperature order:

1. Hot blocks first (most query benefit)
2. Warm blocks next
3. Cold blocks last (may never be indexed if queries don't reach them)

This means the index build converges to useful quality quickly, then approaches completeness asymptotically.
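
In code, the prioritization is just a sort on the tier reported by the temperature sketch. A tiny sketch with made-up block IDs and tiers:

```python
# Ordering unindexed blocks by temperature tier, hottest first.
TIER_PRIORITY = {"hot": 0, "warm": 1, "cold": 2}

blocks = [("blk3", "cold"), ("blk7", "hot"), ("blk1", "warm")]
build_order = sorted(blocks, key=lambda b: TIER_PRIORITY[b[1]])
print([b[0] for b in build_order])  # ['blk7', 'blk1', 'blk3']
```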

## 5. Index Segment Binary Format

### Adjacency List Encoding

Neighbor lists are stored using **varint delta encoding with restart points** for fast random access:

```
+-------------------------------------------+
| Restart Point Index                       |
|   restart_interval: u32 (e.g., 64)        |
|   restart_count: u32                      |
|   [restart_offset: u32] * restart_count   |
|   [64B aligned]                           |
+-------------------------------------------+
| Adjacency Data                            |
|   For each node (sorted by node_id):      |
|     neighbor_count: varint                |
|     [delta_encoded_neighbor_id: varint]   |
|   (restart point every N nodes)           |
+-------------------------------------------+
```

**Restart points**: Every `restart_interval` nodes (default 64), the delta encoding resets to absolute IDs. This gives fast, bounded-cost random access to any node's neighbors:

1. Binary search the restart point index for the nearest restart <= target
2. Seek to that restart offset
3. Sequentially decode from restart to target (at most 63 decodes)

### Varint Encoding

Standard LEB128 varint:

- Values 0-127: 1 byte
- Values 128-16383: 2 bytes
- Values 16384-2097151: 3 bytes

For delta-encoded neighbor IDs (typical delta: 1-1000), most values fit in 1-2 bytes, giving ~3-4x compression over fixed u64.
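
A self-contained sketch of the encoder and decoder. LEB128 itself is standard; the delta framing below is simplified to a single neighbor list and omits the restart-point index:

```python
def encode_varint(value: int, out: bytearray) -> None:
    # Standard LEB128: 7 payload bits per byte, high bit = continuation.
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Delta-encode a sorted neighbor list: first ID absolute, rest as gaps.
neighbors = [1000, 1003, 1050, 2000]
buf = bytearray()
prev = 0
for n in neighbors:
    encode_varint(n - prev, buf)
    prev = n
print(len(buf), "bytes vs", 8 * len(neighbors), "as fixed u64")  # 6 vs 32
```

A restart point would simply re-encode an absolute ID every `restart_interval` entries, letting a decoder begin mid-stream as described above.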

### Prefetch Hints

The manifest's prefetch table maps node ID ranges to contiguous page ranges:

```
Prefetch Entry:
  node_id_start: u64
  node_id_end: u64
  page_offset: u64       Offset of first contiguous page
  page_count: u32        Number of contiguous pages
  prefetch_ahead: u32    Pages to prefetch ahead of current access
```

When the HNSW search accesses a node, the runtime issues `madvise(WILLNEED)` (or equivalent) for the next `prefetch_ahead` pages. This hides disk/memory latency behind computation.
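
On Linux with Python 3.8+, the hint can be issued through `mmap.madvise`. The file name and page numbers below are illustrative:

```python
import mmap

PAGE = mmap.PAGESIZE

def prefetch_pages(mm: mmap.mmap, page_offset: int, prefetch_ahead: int) -> None:
    # Hint the kernel to fault in the next `prefetch_ahead` pages so the
    # HNSW traversal doesn't stall on them when it arrives.
    start = page_offset * PAGE
    length = min(prefetch_ahead * PAGE, len(mm) - start)
    if length > 0:
        mm.madvise(mmap.MADV_WILLNEED, start, length)

with open("index.seg", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    prefetch_pages(mm, page_offset=128, prefetch_ahead=8)
```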

## 6. Index Consistency

### Append-Only Index Updates

When new vectors are added:

1. New vectors go into a **fresh VEC_SEG** (append-only)
2. A temporary in-memory index covers the new vectors
3. When the in-memory index reaches a threshold, it is written as a new INDEX_SEG
4. The manifest is updated to include both the old and new INDEX_SEGs
5. Queries search both indexes and merge results

This is analogous to LSM-tree compaction levels but for graph indexes.
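
A sketch of step 5 above: each INDEX_SEG answers independently, and results are merged by distance with de-duplication on vector ID. The candidate tuples here are made up for illustration:

```python
import heapq

def merge_results(k: int, *per_index_results):
    # Each input list is (distance, vector_id) tuples, sorted by distance.
    merged, seen = [], set()
    for dist, vec_id in heapq.merge(*per_index_results):
        if vec_id not in seen:          # a vector may appear in both indexes
            seen.add(vec_id)
            merged.append((dist, vec_id))
        if len(merged) == k:
            break
    return merged

sealed = [(0.12, 101), (0.30, 205), (0.55, 88)]   # old, sealed INDEX_SEG
fresh  = [(0.18, 901), (0.30, 205), (0.41, 902)]  # new in-memory index
print(merge_results(3, sealed, fresh))
# [(0.12, 101), (0.18, 901), (0.3, 205)]
```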

### Index Merging

When too many small INDEX_SEGs accumulate:

```
1. Read all small INDEX_SEGs
2. Build a unified HNSW graph over all vectors
3. Write as a single sealed INDEX_SEG
4. Tombstone old INDEX_SEGs in manifest
```

### Concurrent Read/Write

Readers always see a consistent snapshot through the manifest chain (sketched below):

- Reader opens file -> reads manifest -> has immutable segment set
- Writer appends new segments + new manifest
- Reader continues using old manifest until it explicitly re-reads
- No locks needed — append-only guarantees no mutation of existing data
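
The snapshot discipline can be sketched in a few lines: the writer only ever publishes a fresh immutable manifest, so a reader's pinned reference stays valid. Names and structures here are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Manifest:
    segments: tuple  # immutable segment set

latest = Manifest(segments=("VEC_SEG_0", "INDEX_SEG_0"))

def open_reader():
    return latest              # reader pins this snapshot

def writer_append(new_segment):
    global latest
    # Append-only: build a new manifest; existing readers are untouched.
    latest = Manifest(segments=latest.segments + (new_segment,))

reader = open_reader()
writer_append("INDEX_SEG_1")
print(reader.segments)         # still ('VEC_SEG_0', 'INDEX_SEG_0')
print(open_reader().segments)  # ('VEC_SEG_0', 'INDEX_SEG_0', 'INDEX_SEG_1')
```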

## 7. Query Path Integration

The complete query path, combining progressive indexing with temperature tiering:

```
                 Query
                   |
                   v
             +-----------+
             |  Layer A  |  Entry points + top-layer routing
             | (always)  |  ~5ms to load on cold start
             +-----------+
                   |
      Is Layer B available for this region?
             /           \
           Yes            No
           /                \
  +-----------+        +-----------+
  |  Layer B  |        | Centroid  |
  |   HNSW    |        | Fallback  |
  |  search   |        |  + scan   |
  +-----------+        +-----------+
           \                /
            v              v
             +-----------+
             | Candidate |
             |    Set    |
             +-----------+
                   |
          Is hot cache available?
             /           \
           Yes            No
           /                \
  +------------+       +-----------+
  | Hot cache  |       |  Decode   |
  |  re-rank   |       |   from    |
  | (int8/fp16)|       |  VEC_SEG  |
  +------------+       +-----------+
           \                /
            v              v
             +-----------+
             |   Top-K   |
             |  Results  |
             +-----------+
```

### Recall Expectations by State

| State | Layers Available | Expected Recall@10 |
|-------|------------------|--------------------|
| Cold start (L0 only) | A | 0.65-0.75 |
| L0 + hot cache | A + hot | 0.75-0.85 |
| L0 + L1 loading | A + B partial | 0.80-0.90 |
| L1 complete | A + B | 0.85-0.92 |
| Full load | A + B + C | 0.95-0.99 |
| Full + optimized | A + B + C + hot | 0.98-0.999 |