RVF Deletion Lifecycle
1. Overview
Deletion in RVF follows a two-phase protocol consistent with the append-only segment architecture. Vectors are never removed in place. Instead, a soft delete records intent in a JOURNAL_SEG, and a subsequent compaction performs the hard delete by physically excluding the vectors from sealed output segments.
    JOURNAL_SEG          Compaction         GC / Rewrite
      (append)            (merge)             (reclaim)
ACTIVE -----> SOFT_DELETED -----> HARD_DELETED ------> RECLAIMED
  |                |                   |                   |
  | query path     | query path        |                   |
  | returns vec    | skips vec         | vec absent        | space freed
  |                | (bitmap check)    | from output seg   |
Readers always see a consistent snapshot: a deletion is invisible until the manifest referencing the new deletion bitmap is durably committed.
2. Vector Lifecycle State Machine
+----------+      JOURNAL_SEG       +-----------------+
|          | DELETE_VECTOR / RANGE  |                 |
|  ACTIVE  +----------------------->+  SOFT_DELETED   |
|          |                        |                 |
+----------+                        +--------+--------+
                                             |  Compaction seals output
                                             v  excluding this vector
                                    +--------+--------+
                                    |  HARD_DELETED   |
                                    +--------+--------+
                                             |  File rewrite / truncation
                                             v  reclaims physical space
                                    +--------+--------+
                                    |    RECLAIMED    |
                                    +-----------------+
| State | Bitmap Bit | Physical Bytes | Query Visible |
|---|---|---|---|
| ACTIVE | 0 | Vector in VEC_SEG | Yes |
| SOFT_DELETED | 1 | Vector in VEC_SEG | No |
| HARD_DELETED | N/A | Excluded from sealed output | No |
| RECLAIMED | N/A | Bytes overwritten / freed | No |
| Transition | Trigger | Durability |
|---|---|---|
| ACTIVE -> SOFT_DELETED | JOURNAL_SEG + MANIFEST_SEG with bitmap | After manifest fsync |
| SOFT_DELETED -> HARD_DELETED | Compaction writes sealed VEC_SEG without vector | After compaction manifest fsync |
| HARD_DELETED -> RECLAIMED | File rewrite or old shard deletion | After shard unlink |
3. JOURNAL_SEG Wire Format (type 0x04)
A JOURNAL_SEG records metadata mutations: deletions, metadata updates, tier
moves, and ID remappings. Its payload follows the standard 64-byte segment
header (see 01-segment-model.md section 2).
3.1 Journal Header (64 bytes)
Offset Type Field Description
------ ---- ----- -----------
0x00 u32 entry_count Number of journal entries
0x04 u32 journal_epoch Epoch when this journal was written
0x08 u64 prev_journal_seg_id Segment ID of previous JOURNAL_SEG (0 if first)
0x10 u32 flags Reserved, must be 0
0x14 u8[44] reserved Zero-padded to 64-byte alignment
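As an illustration, the 64-byte journal header can be packed and parsed with Python's `struct` module. This is a minimal sketch that assumes little-endian byte order, which this section does not state; the function names are hypothetical and not part of the format.

```python
import struct

# Assumption: little-endian ("<"); the layout above does not state byte order.
# u32 entry_count, u32 journal_epoch, u64 prev_journal_seg_id, u32 flags, 44 reserved bytes
JOURNAL_HEADER = struct.Struct("<IIQI44x")


def pack_journal_header(entry_count, journal_epoch, prev_journal_seg_id):
    """Serialize a journal header; flags is reserved and written as 0."""
    return JOURNAL_HEADER.pack(entry_count, journal_epoch, prev_journal_seg_id, 0)


def unpack_journal_header(buf):
    """Parse a journal header and reject non-zero reserved flags."""
    entry_count, epoch, prev_id, flags = JOURNAL_HEADER.unpack_from(buf, 0)
    if flags != 0:
        raise ValueError("flags field is reserved and must be 0")
    return {
        "entry_count": entry_count,
        "journal_epoch": epoch,
        "prev_journal_seg_id": prev_id,
    }


header = pack_journal_header(entry_count=3, journal_epoch=7, prev_journal_seg_id=12)
parsed = unpack_journal_header(header)
```

The `44x` pad bytes are emitted as zeros on pack and skipped on unpack, so the structure is exactly 64 bytes with no implicit alignment padding.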
3.2 Journal Entry Format
Each entry begins on an 8-byte aligned boundary:
Offset Type Field Description
------ ---- ----- -----------
0x00 u8 entry_type Entry type enum
0x01 u8 reserved Must be 0x00
0x02 u16 entry_length Byte length of type-specific payload
0x04 u8[] payload Type-specific payload
var u8[] padding Zero-pad to next 8-byte boundary
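The padding rule is easiest to see in code. The sketch below encodes entries and walks them back, assuming little-endian fields and a buffer that itself starts on an 8-byte boundary (both are assumptions, not stated above); `encode_entry` and `decode_entries` are hypothetical helper names.

```python
import struct

# entry_type (u8), reserved (u8), entry_length (u16); little-endian is an assumption
ENTRY_HEADER = struct.Struct("<BBH")


def encode_entry(entry_type, payload):
    """Serialize one journal entry and zero-pad it to an 8-byte boundary."""
    body = ENTRY_HEADER.pack(entry_type, 0, len(payload)) + payload
    return body + b"\x00" * (-len(body) % 8)


def decode_entries(buf, entry_count):
    """Walk entry_count entries, skipping the padding after each payload."""
    entries, offset = [], 0
    for _ in range(entry_count):
        entry_type, reserved, length = ENTRY_HEADER.unpack_from(buf, offset)
        if reserved != 0:
            raise ValueError("reserved byte must be 0x00")
        start = offset + ENTRY_HEADER.size
        entries.append((entry_type, bytes(buf[start:start + length])))
        offset = start + length
        offset += -offset % 8  # advance to the next 8-byte boundary
    return entries


# DELETE_VECTOR(42) followed by DELETE_RANGE [1000, 2000)
stream = (encode_entry(0x01, struct.pack("<Q", 42))
          + encode_entry(0x02, struct.pack("<QQ", 1000, 2000)))
decoded = decode_entries(stream, 2)
```

With a 4-byte entry header, an 8-byte payload occupies 16 bytes on disk and a 16-byte payload occupies 24, so both entry types above carry 4 bytes of trailing padding.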
3.3 Entry Types
Value Name Payload Size Description
----- ---- ------------ -----------
0x01 DELETE_VECTOR 8 B Delete a single vector by ID
0x02 DELETE_RANGE 16 B Delete a contiguous range of vector IDs
0x03 UPDATE_METADATA variable Update key-value metadata for a vector
0x04 MOVE_VECTOR 24 B Reassign vector to a different segment/tier
0x05 REMAP_ID 16 B Reassign vector ID (post-compaction)
3.4 Type-Specific Payloads
DELETE_VECTOR (0x01)
0x00 u64 vector_id ID of the vector to soft-delete
DELETE_RANGE (0x02)
0x00 u64 start_id First vector ID (inclusive)
0x08 u64 end_id One past the last vector ID (exclusive)
Invariant: start_id < end_id. Range [start_id, end_id) is half-open.
UPDATE_METADATA (0x03)
0x00 u64 vector_id Target vector ID
0x08 u16 key_len Byte length of metadata key
0x0A u8[] key Metadata key (UTF-8)
var u16 val_len Byte length of metadata value
var+2 u8[] val Metadata value (opaque bytes)
MOVE_VECTOR (0x04)
0x00 u64 vector_id Target vector ID
0x08 u64 src_seg Source segment ID
0x10 u64 dst_seg Destination segment ID
REMAP_ID (0x05)
0x00 u64 old_id Original vector ID
0x08 u64 new_id New vector ID after compaction
3.5 Complete JOURNAL_SEG Example
Deleting vector 42, deleting range [1000, 2000), remapping ID 500 -> 3:
Byte offset Content Notes
----------- ------- -----
0x00-0x3F Segment header (64 B) seg_type=0x04, magic=RVFS
0x40-0x7F Journal header (64 B) entry_count=3, epoch=7,
prev_journal_seg_id=12
--- Entry 0: DELETE_VECTOR ---
0x80 0x01 entry_type
0x81 0x00 reserved
0x82-0x83 0x0008 entry_length = 8
0x84-0x8B 0x000000000000002A vector_id = 42
0x8C-0x8F 0x00000000 padding to 8B
--- Entry 1: DELETE_RANGE ---
0x90 0x02 entry_type
0x91 0x00 reserved
0x92-0x93 0x0010 entry_length = 16
0x94-0x9B 0x00000000000003E8 start_id = 1000
0x9C-0xA3 0x00000000000007D0 end_id = 2000
0xA4-0xA7 0x00000000 padding to 8B
--- Entry 2: REMAP_ID ---
0xA8 0x05 entry_type
0xA9 0x00 reserved
0xAA-0xAB 0x0010 entry_length = 16
0xAC-0xB3 0x00000000000001F4 old_id = 500
0xB4-0xBB 0x0000000000000003 new_id = 3
0xBC-0xBF 0x00000000 padding to 8B
4. Deletion Bitmap
4.1 Manifest Record
The deletion bitmap is stored in the Level 1 manifest as a TLV record:
Tag Name Description
--- ---- -----------
0x000E DELETION_BITMAP Roaring bitmap of soft-deleted vector IDs
This extends the TLV tag space (previous: 0x000D KEY_DIRECTORY).
4.2 Roaring Bitmap Binary Layout
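Vector IDs are 64-bit. The low 16 bits of an ID index into a container that covers a 65,536-ID range; the remaining upper bits (vector_id >> 16) form the high key that selects the container. As laid out below, high_key is stored as a u32, so this layout addresses IDs up to 2^48 - 1.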
+---------------------------------------------+
| DELETION_BITMAP TLV Value |
+---------------------------------------------+
| Bitmap Header |
| cookie: u32 (0x3B3A3332) |
| high_key_count: u32 |
| For each high key: |
| high_key: u32 |
| container_type: u8 |
| 0x01 = ARRAY_CONTAINER |
| 0x02 = BITMAP_CONTAINER |
| 0x03 = RUN_CONTAINER |
| container_offset: u32 (from bitmap start)|
| [8B aligned] |
+---------------------------------------------+
| Container Data |
| Container 0: [type-specific layout] |
| Container 1: ... |
| [8B aligned per container] |
+---------------------------------------------+
4.3 Container Types
ARRAY_CONTAINER (0x01) -- Sparse deletions (<= 4096 set bits per 64K range).
0x00 u16 cardinality Number of set values (1-4096)
0x02 u16[] values Sorted array of 16-bit values
Size: 2 + 2 * cardinality bytes.
BITMAP_CONTAINER (0x02) -- Dense deletions (> 4096 set bits per 64K range).
0x00 u16 cardinality Number of set bits
0x02 u8[8192] bitmap Fixed 65536-bit bitmap (8 KB)
Size: 8194 bytes (fixed).
RUN_CONTAINER (0x03) -- Contiguous ranges of deletions.
0x00 u16 run_count Number of runs
0x02 (u16,u16) runs[] Array of (start, length-1) pairs
Size: 2 + 4 * run_count bytes.
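The size formulas above can be compared directly to decide how to encode a 64K range. The sketch below picks the smallest encoding, which is an assumption about policy (this section only describes when each type is typical); `to_runs` and `choose_container` are hypothetical names.

```python
ARRAY, BITMAP, RUN = 0x01, 0x02, 0x03


def to_runs(values):
    """Collapse sorted 16-bit values into (start, length - 1) pairs."""
    runs = []
    for v in values:
        if runs and v == runs[-1][0] + runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 0))                        # start a new run
    return runs


def container_sizes(values):
    """Encoded size in bytes of each container type for the given values."""
    return {
        ARRAY: 2 + 2 * len(values),
        BITMAP: 8194,
        RUN: 2 + 4 * len(to_runs(values)),
    }


def choose_container(values):
    """Pick the smallest encoding (assumed policy); arrays hold at most 4096 values."""
    values = sorted(set(values))
    sizes = container_sizes(values)
    if len(values) > 4096:
        del sizes[ARRAY]
    return min(sizes, key=sizes.get)


kind = choose_container(range(1000, 2000))  # one run of 1000 IDs
```

A single run of 1000 consecutive IDs costs 6 bytes as a run container versus 2002 bytes as an array, which is why clustered deletions are so much cheaper than random ones.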
4.4 Size Estimation
| Deletion Pattern | Deleted IDs | Container Types | Bitmap Size |
|---|---|---|---|
| Sparse random | 10,000 (0.1%) | ~153 array | ~22 KB |
| Clustered ranges | 10,000 (0.1%) | ~5 run | ~0.1 KB |
| Mixed workload | 100,000 (1%) | array + run | ~80 KB |
| Heavy deletion | 1,000,000 (10%) | bitmap + run | ~200 KB |
Even at 200 KB, the bitmap is small enough to fit in L2 cache on CPUs with 256 KB or more of L2 per core.
4.5 Bitmap Operations
def bitmap_check(bitmap, vector_id):
    """Return True if vector_id is soft-deleted.

    Bitmap containers answer with a single bit test; array and run
    containers use a binary search within the container.
    """
    high_key = vector_id >> 16
    low_val = vector_id & 0xFFFF
    container = bitmap.get_container(high_key)
    if container is None:
        return False
    return container.contains(low_val)  # array: bsearch, bitmap: bit test, run: bsearch

def bitmap_set(bitmap, vector_id):
    """Mark a vector as soft-deleted."""
    high_key = vector_id >> 16
    low_val = vector_id & 0xFFFF
    container = bitmap.get_or_create_container(high_key)
    container.add(low_val)
    # An array container holds at most 4096 values; beyond that, switch to a bitmap
    if container.type == ARRAY and container.cardinality > 4096:
        container.promote_to_bitmap()
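For experimentation, the operations above can be exercised with a small in-memory stand-in. This is a toy sketch, not the on-disk format: it models only array and bitmap containers, keeps them in a dict keyed by high key, and assumes least-significant-bit-first ordering inside each bitmap byte. The class name `DeletionBitmap` is hypothetical.

```python
from bisect import bisect_left

ARRAY_MAX = 4096  # an array container holds at most 4096 values


class DeletionBitmap:
    """Toy roaring-style set: high key -> sorted list or 8 KB bytearray."""

    def __init__(self):
        self.containers = {}

    def add(self, vector_id):
        high, low = vector_id >> 16, vector_id & 0xFFFF
        c = self.containers.setdefault(high, [])
        if isinstance(c, bytearray):          # bitmap container: set one bit
            c[low >> 3] |= 1 << (low & 7)
            return
        i = bisect_left(c, low)               # array container: sorted insert
        if i == len(c) or c[i] != low:
            c.insert(i, low)
        if len(c) > ARRAY_MAX:
            self.containers[high] = self._promote(c)

    @staticmethod
    def _promote(values):
        """Convert a sorted array container into a 65536-bit bitmap."""
        bits = bytearray(8192)
        for v in values:
            bits[v >> 3] |= 1 << (v & 7)
        return bits

    def contains(self, vector_id):
        high, low = vector_id >> 16, vector_id & 0xFFFF
        c = self.containers.get(high)
        if c is None:
            return False
        if isinstance(c, bytearray):
            return bool(c[low >> 3] & (1 << (low & 7)))
        i = bisect_left(c, low)
        return i < len(c) and c[i] == low


bm = DeletionBitmap()
bm.add(42)
for i in range(5000):          # crosses the 4096 threshold in high key 1
    bm.add(65536 + i)
```

After the loop, the container for high key 1 has been promoted to a bitmap, while the container holding ID 42 is still a one-element array.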
5. Delete-Aware Query Path
5.1 HNSW Traversal with Deletion Filtering
Deleted vectors remain in the HNSW graph until compaction rebuilds the index. During search, the deletion bitmap is checked per candidate. Deleted nodes are still traversed for connectivity but excluded from the result set.
def hnsw_search_delete_aware(query, entry_point, ef_search, k, del_bitmap):
    candidates = MaxHeap()  # result set; worst candidate on top
    visited = BitSet()
    worklist = MinHeap()    # nodes to expand; best candidate first
    d0 = distance(query, get_vector(entry_point))
    worklist.push((d0, entry_point))
    visited.add(entry_point)
    if not bitmap_check(del_bitmap, entry_point):
        candidates.push((d0, entry_point))
    while worklist:
        dist, node = worklist.pop()
        # Stop once the closest unexpanded node cannot improve a full result set
        if candidates.size() >= ef_search and dist > candidates.peek_max():
            break
        neighbors = get_neighbors(node)
        for n in neighbors[:PREFETCH_AHEAD]:
            if n not in visited:
                prefetch_vector(n)
        for n in neighbors:
            if n in visited:
                continue
            visited.add(n)
            d = distance(query, get_vector(n))
            is_deleted = bitmap_check(del_bitmap, n)  # deletion bitmap lookup
            # Enqueue for expansion regardless of deletion status (graph connectivity)
            if candidates.size() < ef_search or d < candidates.peek_max():
                worklist.push((d, n))
            # Only add to results if NOT deleted
            if not is_deleted:
                if candidates.size() < ef_search:
                    candidates.push((d, n))
                elif d < candidates.peek_max():
                    candidates.replace_max((d, n))
    return candidates.top_k(k)
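To see why deleted nodes must still be traversed, the same idea can be run on a toy graph. The sketch below is a simplified single-layer search using `heapq`, with 1-D vectors and a plain set standing in for the deletion bitmap; all names and the graph itself are illustrative assumptions, and it omits prefetching and the multi-layer descent of a real HNSW index.

```python
import heapq


def search_delete_aware(query, vectors, graph, entry_point, ef_search, k, deleted):
    """Best-first search over one layer; deleted nodes route but are not returned."""

    def dist(n):
        return abs(vectors[n] - query)  # 1-D distance keeps the example small

    d0 = dist(entry_point)
    worklist = [(d0, entry_point)]      # min-heap: closest unexpanded node first
    results = []                        # max-heap via negated distance
    visited = {entry_point}
    if entry_point not in deleted:
        heapq.heappush(results, (-d0, entry_point))
    while worklist:
        d, node = heapq.heappop(worklist)
        if len(results) >= ef_search and d > -results[0][0]:
            break
        for n in graph[node]:
            if n in visited:
                continue
            visited.add(n)
            dn = dist(n)
            full = len(results) >= ef_search
            if full and dn >= -results[0][0]:
                continue                        # cannot improve the result set
            heapq.heappush(worklist, (dn, n))   # expand even if n is deleted
            if n in deleted:
                continue
            if full:
                heapq.heapreplace(results, (-dn, n))
            else:
                heapq.heappush(results, (-dn, n))
    return sorted((-nd, n) for nd, n in results)[:k]


# Chain 0 - 1 - 2 - 3 - 4; node 3 is soft-deleted and is the only route to node 4.
vectors = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hits = search_delete_aware(3.25, vectors, graph, 0, ef_search=2, k=1, deleted={3})
```

Here the nearest live vector (node 4) is reachable only through the deleted node 3. Skipping deleted nodes during traversal would strand the search at node 2.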
5.2 Top-K Refinement with Deletion Filtering
def topk_refine_delete_aware(candidates, hot_cache, query, k, del_bitmap):
    heap = MaxHeap()  # holds at most k entries; worst on top

    def offer(d, vector_id):
        if heap.size() < k:
            heap.push((d, vector_id))
        elif d < heap.peek_max():
            heap.replace_max((d, vector_id))

    for cand_dist, cand_id in candidates:
        offer(cand_dist, cand_id)  # already deletion-filtered during search
    for entry in hot_cache.sequential_scan():
        if bitmap_check(del_bitmap, entry.vector_id):
            continue  # skip soft-deleted
        offer(distance(query, entry.vector), entry.vector_id)
    return heap.drain_sorted()
5.3 Performance Impact
| Operation | Without Deletions | With Deletions | Overhead |
|---|---|---|---|
| Bitmap check | N/A | ~2-5 ns (L1/L2 hit) | Per candidate |
| HNSW step (M=16) | ~300-500 ns | ~330-580 ns | +10-16% |
| Top-K refine (1000) | ~10 us | ~12 us | +20% worst |
| Total query | ~50-75 us | ~55-85 us | +10-13% |
At typical deletion rates (< 5%), the added cost stays modest (roughly 10-13% per query in the table above): the bitmap fits in L2 cache, graph connectivity is preserved, and the cost is one branch plus one bitmap load per candidate.
6. Deletion Write Path
All deletion operations follow the same two-fsync protocol:
def delete_vectors(file, entries):
    """Soft-delete vectors. entries: list of DeleteVector or DeleteRange."""
    # 1. Append JOURNAL_SEG
    journal = JournalSegment(
        epoch=current_epoch(file),
        prev_journal_seg_id=latest_journal_id(file),
        entries=entries
    )
    append_segment(file, journal)
    fsync(file)  # orphan-safe: no manifest references this yet
    # 2. Update deletion bitmap in memory
    bitmap = load_deletion_bitmap(file)
    for e in entries:
        if e.type == DELETE_VECTOR:
            bitmap_set(bitmap, e.vector_id)
        elif e.type == DELETE_RANGE:
            bitmap.add_range(e.start_id, e.end_id)
    # 3. Append MANIFEST_SEG with updated bitmap
    manifest = build_manifest(file, deletion_bitmap=bitmap)
    append_segment(file, manifest)
    fsync(file)  # deletion now visible to all new readers
Single deletes, bulk ranges, and batch deletes all use this path. Batch operations pack multiple entries into one JOURNAL_SEG to amortize fsync cost.
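At the operating-system level, the two-fsync ordering might look like the sketch below. It uses only standard `os` calls; the journal and manifest payloads are opaque placeholder bytes, and `append_durably` and `soft_delete` are hypothetical names rather than part of any RVF API.

```python
import os
import tempfile


def append_durably(fd, payload):
    """Append payload and block until it has reached stable storage."""
    view = memoryview(payload)
    while view:                      # os.write may write fewer bytes than requested
        view = view[os.write(fd, view):]
    os.fsync(fd)


def soft_delete(path, journal_bytes, manifest_bytes):
    """Journal first, manifest second, with an fsync after each append."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        append_durably(fd, journal_bytes)    # fsync 1: journal durable, still unreferenced
        append_durably(fd, manifest_bytes)   # fsync 2: deletion becomes visible
    finally:
        os.close(fd)


# Demonstration on a scratch file; the byte strings stand in for encoded segments.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.rvf")
    with open(path, "wb") as f:
        f.write(b"EXISTING")
    soft_delete(path, b"JOURNAL", b"MANIFEST")
    with open(path, "rb") as f:
        contents = f.read()
```

The first fsync matters because, without it, the operating system is free to persist the manifest before the journal it refers to.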
7. Compaction with Deletions
7.1 Compaction Process
Before:
  [VEC_1]  [VEC_2]    [JOURNAL_1]  [VEC_3]    [JOURNAL_2]  [MANIFEST_5]
  0-999    1000-2999  del:42,      3000-4999  del:[1000,   bitmap={42,500,
                      del:500                 2000)        1000..1999}
After:
  ... [MANIFEST_5]  [VEC_sealed]     [INDEX_new]  [MANIFEST_6]
                    vectors 0-4999                bitmap={}
                    MINUS deleted                 (empty for
                                                  compacted range)
7.2 Compaction Algorithm
def compact_with_deletions(file, seg_ids):
    bitmap = load_deletion_bitmap(file)
    output, id_remap, dropped, next_id = [], {}, [], 0
    for seg_id in sorted(seg_ids):
        seg = load_segment(file, seg_id)
        if seg.seg_type != VEC_SEG:
            continue
        for vec_id, vector in seg.all_vectors():
            if bitmap_check(bitmap, vec_id):
                dropped.append(vec_id)  # physically exclude from the output
                continue
            id_remap[vec_id] = next_id
            output.append((next_id, vector))
            next_id += 1
    append_segment(file, VecSegment(flags=SEALED, vectors=output))
    # Journal a REMAP_ID entry for every vector whose ID changed (INV-D3)
    remaps = [RemapIdEntry(old, new) for old, new in id_remap.items() if old != new]
    if remaps:
        append_segment(file, JournalSegment(entries=remaps))
    append_segment(file, build_hnsw_index(output))
    fsync(file)  # output durable before a manifest references it (mirrors the write path)
    # Clear the bits of the vectors that were dropped (INV-D1). Surviving IDs in
    # id_remap were never set in the bitmap, so removing them would be a no-op.
    for old_id in dropped:
        bitmap.remove(old_id)
    manifest = build_manifest(file,
                              tombstone_seg_ids=seg_ids,
                              deletion_bitmap=bitmap)
    append_segment(file, manifest)
    fsync(file)
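The effect of compaction on IDs and on the bitmap is easier to follow with plain data structures. The following is a minimal in-memory sketch in which segments are dicts and the deletion bitmap is a set; the function name `compact` and this data model are assumptions made for illustration.

```python
def compact(segments, deleted):
    """Compact vector segments, dropping soft-deleted IDs.

    segments: list of {vec_id: vector} dicts in segment order.
    deleted:  set of soft-deleted IDs (stand-in for the deletion bitmap).
    Returns (sealed_output, remaps, remaining_deleted).
    """
    output, id_remap, dropped = {}, {}, set()
    next_id = 0
    for seg in segments:
        for vec_id in sorted(seg):
            if vec_id in deleted:
                dropped.add(vec_id)       # excluded from the sealed output
                continue
            id_remap[vec_id] = next_id    # survivors get dense new IDs
            output[next_id] = seg[vec_id]
            next_id += 1
    # Only IDs that actually moved need a REMAP_ID entry
    remaps = {old: new for old, new in id_remap.items() if old != new}
    # Bits for dropped vectors are cleared; bits outside the compacted range stay
    return output, remaps, deleted - dropped


segments = [{0: "a", 1: "b", 2: "c"}, {3: "d", 4: "e"}]
sealed, remaps, remaining = compact(segments, deleted={1, 4, 99})
```

ID 99 lies outside the compacted segments, so its bit survives; the bitmap is only emptied for the range that was actually compacted.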
7.3 Journal Merging
During compaction, JOURNAL_SEGs covering the compacted range are consumed:
| Entry Type | Materialization |
|---|---|
| DELETE_VECTOR / DELETE_RANGE | Vectors excluded from output |
| UPDATE_METADATA | Applied to output META_SEG |
| MOVE_VECTOR | Tier assignment applied in new manifest |
| REMAP_ID | Chained: old remap composed with new remap |
Consumed JOURNAL_SEGs are tombstoned alongside compacted VEC_SEGs.
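One way to read "chained" for REMAP_ID is function composition: an ID moved by an earlier compaction and moved again now should resolve directly to its final ID. The sketch below encodes that reading under stated assumptions: remaps are plain dicts, identity mappings are omitted (as in the compaction algorithm), and an intermediate ID that was produced by the earlier remap is not also treated as an original ID. `compose_remaps` is a hypothetical helper.

```python
def compose_remaps(old_remap, new_remap):
    """Chain two ID remappings: original -> intermediate -> final."""
    # IDs moved by the earlier compaction follow through the new remap
    composed = {orig: new_remap.get(mid, mid) for orig, mid in old_remap.items()}
    produced = set(old_remap.values())
    for mid, final in new_remap.items():
        # IDs untouched by the earlier compaction carry over unchanged
        if mid not in old_remap and mid not in produced:
            composed[mid] = final
    # Drop entries that end up mapping an ID to itself
    return {orig: final for orig, final in composed.items() if orig != final}


old = {500: 3, 7: 4}   # earlier compaction
new = {3: 1, 9: 5}     # current compaction
chained = compose_remaps(old, new)
```

In this example the original ID 500 resolves to 1 in a single lookup instead of requiring a walk through both journals.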
7.4 Compaction Invariants
| ID | Invariant |
|---|---|
| INV-D1 | After compaction, deletion bitmap is empty for compacted range |
| INV-D2 | Sealed output contains only ACTIVE vectors |
| INV-D3 | REMAP_ID entries journaled for every relocated vector |
| INV-D4 | Compacted input segments tombstoned in new manifest |
| INV-D5 | Sealed segments are never modified |
| INV-D6 | Rebuilt indexes exclude deleted nodes |
8. Deletion Consistency
8.1 Crash Safety
Write path:
1. Append JOURNAL_SEG -> fsync crash here: orphan, invisible
2. Append MANIFEST_SEG -> fsync crash here: partial manifest, fallback
Recovery:
- Crash after step 1: JOURNAL_SEG orphaned. No manifest references it.
Reader sees previous manifest. Deletion NOT visible. Orphan cleaned
up by next compaction.
- Crash during step 2: Partial MANIFEST_SEG has bad checksum. Reader
falls back to previous valid manifest. Deletion NOT visible.
- After step 2 success: Manifest durable. Deletion visible.
Guarantee: Uncommitted deletions never affect readers. Deletion is atomic at the manifest fsync boundary.
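The fallback rule can be sketched as a backward scan for the newest manifest whose checksum verifies. The example below is illustrative only: it uses `zlib.crc32` as a stand-in because this section does not name the checksum algorithm, and it models segments as simple tuples.

```python
import zlib


def latest_valid_manifest(segments):
    """Return the payload of the newest manifest with a matching checksum.

    segments: list of (seg_type, payload, stored_checksum) in file order.
    """
    for seg_type, payload, stored in reversed(segments):
        if seg_type == "MANIFEST" and zlib.crc32(payload) == stored:
            return payload
    return None  # no committed manifest found


good = b"manifest-4"
torn = b"manifest-5-par"  # crash mid-write: payload truncated
segments = [
    ("MANIFEST", good, zlib.crc32(good)),
    ("JOURNAL", b"delete 42", zlib.crc32(b"delete 42")),        # orphaned journal
    ("MANIFEST", torn, zlib.crc32(b"manifest-5-partial-full")),  # checksum mismatch
]
recovered = latest_valid_manifest(segments)
```

The orphaned journal is simply ignored: nothing the recovered manifest references points at it, so the deletion it describes never becomes visible.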
8.2 Manifest Chain Visibility
MANIFEST_3: bitmap = {}
| JOURNAL_SEG written (delete vector 42)
MANIFEST_4: bitmap = {42} <-- deletion visible from here
| Compaction runs
MANIFEST_5: bitmap = {} <-- vector 42 physically removed
A reader holding MANIFEST_3 continues to see vector 42. A reader that opens the file after MANIFEST_4 is committed does not. This provides snapshot isolation at manifest granularity.
8.3 Multi-File Mode
In multi-file mode, each shard maintains its own deletion bitmap. The DELETION_BITMAP TLV record supports two modes:
+----------------------------------------------+
| mode: u8 |
| 0x00 = SINGLE (one bitmap, inline) |
| 0x01 = SHARDED (per-shard references) |
+----------------------------------------------+
SINGLE (0x00):
| roaring_bitmap: [u8; ...] |
SHARDED (0x01):
| shard_count: u16 |
| For each shard: |
| shard_id: u16 |
| bitmap_offset: u64 (in shard file) |
| bitmap_length: u32 |
| bitmap_hash: hash128 |
+----------------------------------------------+
Queries spanning shards load per-shard bitmaps and check each candidate against its shard's bitmap.
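A reader might decode the two modes along the following lines. This sketch assumes little-endian fields packed without alignment padding and a 16-byte `hash128`; none of these is stated in the layout above, and `parse_deletion_bitmap_tlv` is a hypothetical name.

```python
import struct

# shard_id (u16), bitmap_offset (u64), bitmap_length (u32), bitmap_hash (16 bytes)
# Assumption: little-endian and tightly packed, giving 30 bytes per shard reference.
SHARD_REF = struct.Struct("<HQI16s")


def parse_deletion_bitmap_tlv(value):
    """Decode a DELETION_BITMAP TLV value into a mode-tagged dict."""
    mode = value[0]
    if mode == 0x00:  # SINGLE: the roaring bitmap follows inline
        return {"mode": "SINGLE", "roaring_bitmap": bytes(value[1:])}
    if mode == 0x01:  # SHARDED: a table of per-shard bitmap references
        (shard_count,) = struct.unpack_from("<H", value, 1)
        shards = [
            SHARD_REF.unpack_from(value, 3 + i * SHARD_REF.size)
            for i in range(shard_count)
        ]
        return {"mode": "SHARDED", "shards": shards}
    raise ValueError(f"unknown DELETION_BITMAP mode 0x{mode:02x}")


value = (bytes([0x01]) + struct.pack("<H", 2)
         + SHARD_REF.pack(0, 4096, 128, b"\xaa" * 16)
         + SHARD_REF.pack(1, 8192, 64, b"\xbb" * 16))
parsed = parse_deletion_bitmap_tlv(value)
```

Each tuple in `parsed["shards"]` gives the reader enough to locate a shard's bitmap and check its hash before trusting it.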
8.4 Concurrent Access
Only one writer may be active at a time, enforced by a file-level advisory lock. Multiple concurrent readers are safe because the file is append-only. A reader that opened before a deletion sees the pre-deletion snapshot until it re-reads the manifest.
9. Space Reclamation
| Trigger | Threshold | Action |
|---|---|---|
| Deletion ratio | > 20% of vectors deleted | Schedule compaction |
| Bitmap size | > 1 MB | Schedule compaction |
| Segment count | > 64 mutable segments | Schedule compaction |
| Manual | User-initiated | Compact immediately |
Space accounting is derived from the manifest:
total_vector_count: 10,000,000 (Level 0 root manifest)
deleted_vector_count: 150,000 (bitmap cardinality)
active_vector_count: 9,850,000 (total - deleted)
deletion_ratio: 1.5% (below threshold)
wasted_bytes: ~115 MB (150K * 768 B per fp16-384 vector)
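The accounting and the automatic triggers reduce to a few lines of arithmetic. The sketch below reproduces the figures above; the function names are hypothetical, and interpreting "1 MB" as 2^20 bytes is an assumption.

```python
def space_report(total_vectors, deleted_vectors, bytes_per_vector):
    """Derive space accounting from manifest counters."""
    return {
        "active_vector_count": total_vectors - deleted_vectors,
        "deletion_ratio": deleted_vectors / total_vectors,
        "wasted_bytes": deleted_vectors * bytes_per_vector,
    }


def should_compact(deletion_ratio, bitmap_bytes, mutable_segments):
    """Apply the automatic thresholds from the trigger table."""
    return (deletion_ratio > 0.20
            or bitmap_bytes > 1 << 20      # assumption: 1 MB means 2**20 bytes
            or mutable_segments > 64)


# fp16 vectors with 384 dimensions occupy 384 * 2 = 768 bytes each
report = space_report(10_000_000, 150_000, bytes_per_vector=384 * 2)
needs_compaction = should_compact(report["deletion_ratio"],
                                  bitmap_bytes=200_000,
                                  mutable_segments=10)
```

With 1.5% of vectors deleted, about 115 MB is held by soft-deleted vectors, yet none of the automatic triggers fires, so reclaiming that space would require a manual compaction.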
10. Summary
Deletion Protocol
| Step | Action | Durability |
|---|---|---|
| 1 | Append JOURNAL_SEG with DELETE entries | fsync (orphan-safe) |
| 2 | Update roaring deletion bitmap | In-memory |
| 3 | Append MANIFEST_SEG with new bitmap | fsync (deletion visible) |
| 4 | Compaction excludes deleted vectors | fsync (physical removal) |
| 5 | File rewrite reclaims space | fsync (space freed) |
New Wire Format Elements
| Element | Type / Tag | Section |
|---|---|---|
| JOURNAL_SEG | Segment type 0x04 | 3 |
| DELETE_VECTOR | Journal entry 0x01 | 3.4 |
| DELETE_RANGE | Journal entry 0x02 | 3.4 |
| UPDATE_METADATA | Journal entry 0x03 | 3.4 |
| MOVE_VECTOR | Journal entry 0x04 | 3.4 |
| REMAP_ID | Journal entry 0x05 | 3.4 |
| DELETION_BITMAP | Level 1 TLV 0x000E | 4 |
Invariants
| ID | Invariant |
|---|---|
| INV-D1 | After compaction, deletion bitmap is empty for compacted range |
| INV-D2 | Sealed output segments contain only ACTIVE vectors |
| INV-D3 | ID remappings journaled for every compaction-relocated vector |
| INV-D4 | Compacted input segments tombstoned in new manifest |
| INV-D5 | Sealed segments are never modified |
| INV-D6 | Rebuilt indexes exclude deleted nodes |
| INV-D7 | Uncommitted deletions never affect readers (crash safety) |
| INV-D8 | Deletion visibility is atomic at the manifest fsync boundary |