# ADR-033: Progressive Indexing Hardening — Centroid Stability, Adversarial Resilience, Recall Framing, and Mandatory Signatures **Status**: Accepted **Date**: 2026-02-15 **Supersedes**: Partially amends ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Container) **Affects**: `rvf-types`, `rvf-runtime`, `rvf-manifest`, `rvf-crypto`, `rvf-wasm` --- ## Context Analysis of the progressive indexing system (spec chapters 02-04) revealed four structural weaknesses that convert engineered guarantees into opportunistic behavior: 1. **Centroid stability** depends on physical layout, not logical identity 2. **Layer A recall** collapses silently under adversarial distributions 3. **Recall targets** are empirical, presented as if they were bounds 4. **Manifest integrity** is optional, leaving the hotset attack surface open Each issue individually is tolerable. Together they form a compound vulnerability: an adversary who controls the data distribution AND the file tail can produce a structurally valid RVF file that returns confident, wrong answers with no detection mechanism. This ADR converts all four from "known limitations" to "engineered defenses." --- ## Decision ### 1. Content-Addressed Centroid Stability **Invariant**: Logical identity must not depend on physical layout. #### 1.1 Content-Addressed Segment References Hotset pointers in the Level 0 manifest currently store raw byte offsets: ``` 0x058 8 centroid_seg_offset Byte offset in file ``` Add a parallel content hash field for each hotset pointer: ``` Offset Size Field Description ------ ---- ----- ----------- 0x058 8 centroid_seg_offset Byte offset (for fast seek) 0x0A0 16 centroid_content_hash First 128 bits of SHAKE-256 of segment payload ``` The runtime validates: 1. Seek to `centroid_seg_offset` 2. Read segment header + payload 3. Compute SHAKE-256 of payload 4. Compare first 128 bits against `centroid_content_hash` 5. If mismatch: reject pointer, fall back to Level 1 directory scan This makes compaction physically destructive but logically stable. The manifest re-points by offset for speed but verifies by hash for correctness. #### 1.2 Centroid Epoch Monotonic Counter Add to Level 0 root manifest: ``` Offset Size Field Description ------ ---- ----- ----------- 0x0B0 4 centroid_epoch Monotonic counter, incremented on recomputation 0x0B4 4 max_epoch_drift Maximum allowed drift before forced recompute ``` **Semantics**: - `centroid_epoch` increments each time centroids are recomputed - The manifest's global `epoch` counter tracks all mutations - `epoch_drift = manifest.epoch - centroid_epoch` - If `epoch_drift > max_epoch_drift`: runtime MUST either recompute centroids or widen `n_probe` Default `max_epoch_drift`: 64 epochs. #### 1.3 Automatic Quality Elasticity When epoch drift is detected, the runtime applies controlled quality degradation instead of silent recall loss: ```rust fn effective_n_probe(base_n_probe: u32, epoch_drift: u32, max_drift: u32) -> u32 { if epoch_drift <= max_drift / 2 { // Within comfort zone: no adjustment base_n_probe } else if epoch_drift <= max_drift { // Drift zone: linear widening up to 2x let scale = 1.0 + (epoch_drift - max_drift / 2) as f64 / max_drift as f64; (base_n_probe as f64 * scale).ceil() as u32 } else { // Beyond max drift: double n_probe, schedule recomputation base_n_probe * 2 } } ``` This turns degradation into **controlled quality elasticity**: recall trades against latency in a predictable, bounded way. #### 1.4 Wire Format Changes Add content hash fields to Level 0 at reserved offsets (using the `0x094-0x0FF` reserved region before the signature): ``` Offset Size Field ------ ---- ----- 0x0A0 16 entrypoint_content_hash 0x0B0 16 toplayer_content_hash 0x0C0 16 centroid_content_hash 0x0D0 16 quantdict_content_hash 0x0E0 16 hot_cache_content_hash 0x0F0 4 centroid_epoch 0x0F4 4 max_epoch_drift 0x0F8 8 reserved_hardening ``` Total: 96 bytes. Fits within the existing reserved region before the signature at `0x094`. **Note**: The signature field at `0x094` must move to accommodate this. New signature offset: `0x100`. This is a breaking change to the Level 0 layout. Files written before ADR-033 are detected by `version < 2` in the root manifest and use the old layout. --- ### 2. Layer A Adversarial Resilience **Invariant**: Silent catastrophic degradation must not be possible. #### 2.1 Distance Entropy Detection After computing distances to the top-K centroids, measure the discriminative power: ```rust /// Detect adversarial or degenerate centroid distance distributions. /// Returns true if the distribution is too uniform to trust centroid routing. fn is_degenerate_distribution(distances: &[f32], k: usize) -> bool { if distances.len() < 2 * k { return true; // Not enough centroids } // Sort and take top-2k let mut sorted = distances.to_vec(); sorted.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap()); let top = &sorted[..2 * k]; // Compute coefficient of variation (CV = stddev / mean) let mean = top.iter().sum::() / top.len() as f32; if mean < f32::EPSILON { return true; // All distances near zero } let variance = top.iter().map(|d| (d - mean).powi(2)).sum::() / top.len() as f32; let cv = variance.sqrt() / mean; // CV < 0.05 means top distances are within 5% of each other // This indicates centroids provide no discriminative power cv < DEGENERATE_CV_THRESHOLD } const DEGENERATE_CV_THRESHOLD: f32 = 0.05; ``` #### 2.2 Adaptive n_probe Widening When degeneracy is detected, widen the search: ```rust fn adaptive_n_probe( base_n_probe: u32, centroid_distances: &[f32], total_centroids: u32, ) -> u32 { if is_degenerate_distribution(centroid_distances, base_n_probe as usize) { // Degenerate: widen to sqrt(K) or 4x base, whichever is smaller let widened = (total_centroids as f64).sqrt().ceil() as u32; base_n_probe.max(widened).min(base_n_probe * 4) } else { base_n_probe } } ``` #### 2.3 Multi-Centroid Fallback When distance variance is below threshold AND Layer B is not yet loaded, fall back to a lightweight multi-probe strategy: 1. Compute distances to ALL centroids (not just top-K) 2. If all distances are within `mean +/- 2*stddev`: treat as uniform 3. For uniform distributions: scan the hot cache linearly (if available) 4. If no hot cache: return results with a `quality_flag = APPROXIMATE` in the response This prevents silent wrong answers. The caller knows the result quality. #### 2.4 Quality Flag at the API Boundary `ResultQuality` is defined at two levels: per-retrieval and per-response. **Per-retrieval** (internal, attached to each candidate): ```rust /// Quality confidence for the retrieval candidate set. #[derive(Clone, Copy, Debug, PartialEq, Eq)] #[repr(u8)] pub enum RetrievalQuality { /// Full index traversed, high confidence in candidate set. Full = 0x00, /// Partial index (Layer A+B), good confidence. Partial = 0x01, /// Layer A only, moderate confidence. LayerAOnly = 0x02, /// Degenerate distribution detected, low confidence. DegenerateDetected = 0x03, /// Brute-force fallback used within budget, exact over scanned region. BruteForceBudgeted = 0x04, } ``` **Per-response** (external, returned to the caller at the API boundary): ```rust /// Response-level quality signal. This is the field that consumers /// (RAG pipelines, agent tool chains, MCP clients) MUST inspect. /// /// If `response_quality < threshold`, the consumer should either: /// - Wait and retry (progressive loading will improve quality) /// - Widen the search (increase k or ef_search) /// - Fall back to an alternative data source #[derive(Clone, Copy, Debug, PartialEq, Eq)] #[repr(u8)] pub enum ResponseQuality { /// All results from full index. Trust fully. Verified = 0x00, /// Results from partial index. Usable but may miss neighbors. Usable = 0x01, /// Degraded retrieval detected. Results are best-effort. /// The `degradation_reason` field explains why. Degraded = 0x02, /// Insufficient candidates found. Results are unreliable. /// Caller SHOULD NOT use these for downstream decisions. Unreliable = 0x03, } ``` **Derivation rule** — `ResponseQuality` is the minimum of all `RetrievalQuality` values in the result set: ```rust fn derive_response_quality(results: &[SearchResult]) -> ResponseQuality { let worst = results.iter() .map(|r| r.retrieval_quality) .max_by_key(|q| *q as u8) .unwrap_or(RetrievalQuality::Full); match worst { RetrievalQuality::Full => ResponseQuality::Verified, RetrievalQuality::Partial => ResponseQuality::Usable, RetrievalQuality::LayerAOnly => ResponseQuality::Usable, RetrievalQuality::DegenerateDetected => ResponseQuality::Degraded, RetrievalQuality::BruteForceBudgeted => ResponseQuality::Degraded, } } ``` **Mandatory outer wrapper** — `QualityEnvelope` is the top-level return type for all query APIs. It is not a nested field. It is the outer wrapper. JSON flattening cannot discard it. gRPC serialization cannot drop it. MCP tool responses must include it. ```rust /// The mandatory outer return type for all query APIs. /// This is not optional. This is not a nested field. /// Consumers that ignore this are misusing the API. pub struct QualityEnvelope { /// The search results. pub results: Vec, /// Top-level quality signal. Consumers MUST inspect this. pub quality: ResponseQuality, /// Structured evidence for why the quality is what it is. pub evidence: SearchEvidenceSummary, /// Resource consumption report for this query. pub budgets: BudgetReport, /// If quality is degraded, the structured reason. pub degradation: Option, } /// Evidence chain: what index state was actually used. pub struct SearchEvidenceSummary { /// Which index layers were available and used. pub layers_used: IndexLayersUsed, /// Effective n_probe (after any adaptive widening). pub n_probe_effective: u32, /// Whether degenerate distribution was detected. pub degenerate_detected: bool, /// Coefficient of variation of top-K centroid distances. pub centroid_distance_cv: f32, /// Number of candidates found by HNSW before safety net. pub hnsw_candidate_count: u32, /// Number of candidates added by safety net scan. pub safety_net_candidate_count: u32, /// Content hashes of index segments actually touched. pub index_segments_touched: Vec<[u8; 16]>, } #[derive(Clone, Copy, Debug)] pub struct IndexLayersUsed { pub layer_a: bool, pub layer_b: bool, pub layer_c: bool, pub hot_cache: bool, } /// Resource consumption report. pub struct BudgetReport { /// Wall-clock time per stage. pub centroid_routing_us: u64, pub hnsw_traversal_us: u64, pub safety_net_scan_us: u64, pub reranking_us: u64, pub total_us: u64, /// Distance evaluations performed. pub distance_ops: u64, pub distance_ops_budget: u64, /// Bytes read from storage. pub bytes_read: u64, /// Candidates scanned in linear scan (safety net). pub linear_scan_count: u64, pub linear_scan_budget: u64, } /// Why quality is degraded. pub struct DegradationReport { /// Which fallback path was chosen. pub fallback_path: FallbackPath, /// Why it was chosen (structured, not prose). pub reason: DegradationReason, /// What guarantee is lost relative to Full quality. pub guarantee_lost: &'static str, } #[derive(Clone, Copy, Debug)] pub enum FallbackPath { /// Normal HNSW traversal, no fallback needed. None, /// Adaptive n_probe widening due to epoch drift. NProbeWidened, /// Adaptive n_probe widening due to degenerate distribution. DegenerateWidened, /// Selective safety net scan on hot cache. SafetyNetSelective, /// Safety net budget exhausted before completion. SafetyNetBudgetExhausted, } #[derive(Clone, Copy, Debug)] pub enum DegradationReason { /// Centroid epoch drift exceeded threshold. CentroidDrift { epoch_drift: u32, max_drift: u32 }, /// Degenerate distance distribution detected. DegenerateDistribution { cv: f32, threshold: f32 }, /// Brute-force budget exhausted. BudgetExhausted { scanned: u64, total: u64, budget_type: &'static str }, /// Index layer not yet loaded. IndexNotLoaded { available: &'static str, needed: &'static str }, } ``` **Hard enforcement rule**: If `quality` is `Degraded` or `Unreliable`, the runtime MUST either: 1. Return the `QualityEnvelope` with the structured warning (which cannot be dropped because it is the outer type, not a nested field), OR 2. Require an explicit caller override flag to proceed: ```rust pub enum QualityPreference { /// Runtime decides. Default. Fastest path that meets internal thresholds. Auto, /// Caller prefers quality over latency. Runtime may widen n_probe, /// extend budgets up to 4x, and block until Layer B loads. PreferQuality, /// Caller prefers latency over quality. Runtime may skip safety net, /// reduce n_probe. ResponseQuality honestly reports what it gets. PreferLatency, /// Caller explicitly accepts degraded results. Required to proceed /// when ResponseQuality would be Degraded or Unreliable under Auto. /// Without this flag, Degraded queries return an error, not results. AcceptDegraded, } ``` Without `AcceptDegraded`, a `Degraded` result is returned as `Err(RvfError::QualityBelowThreshold(envelope))` — the caller gets the evidence but must explicitly opt in to use the results. This prevents silent misuse. #### 2.5 Distribution Assumption Declaration The spec MUST explicitly state: > **Distribution Assumption**: Recall targets (0.70/0.85/0.95) assume sub-Gaussian embedding distributions typical of neural network outputs (sentence-transformers, OpenAI ada-002, Cohere embed-v3, etc.). For adversarial, synthetic, or uniform-random distributions, recall may be lower. When degenerate distributions are detected at query time, the runtime automatically widens its search and signals reduced confidence via `ResultQuality`. This converts an implicit assumption into an explicit contract. --- ### 3. Recall Bound Framing **Invariant**: Never claim theoretical guarantees without distribution assumptions. #### 3.1 Monotonic Recall Improvement Property Replace hard recall bounds with a provable structural property: > **Monotonic Recall Property**: For any query Q and any two index states S1 and S2 where S2 includes all segments of S1 plus additional INDEX_SEGs: > > `recall(Q, S2) >= recall(Q, S1)` > > Proof: S2's candidate set is a superset of S1's (append-only segments, no removal). More candidates cannot reduce recall. This is provable from the append-only invariant and requires no distribution assumption. #### 3.2 Recall Target Classes Replace the single recall table with benchmark-class-specific targets: | State | Natural Embeddings | Synthetic Uniform | Adversarial Clustered | |-------|-------------------|-------------------|----------------------| | Layer A | >= 0.70 | >= 0.40 | >= 0.20 (with detection) | | A + B | >= 0.85 | >= 0.70 | >= 0.60 | | A + B + C | >= 0.95 | >= 0.90 | >= 0.85 | "Natural Embeddings" = sentence-transformers, OpenAI, Cohere on standard corpora. #### 3.3 Brute-Force Safety Net (Dual-Budgeted) When the candidate set from HNSW search is smaller than `2 * k`, the safety net activates. It is capped by **both** a time budget and a candidate budget to prevent unbounded work. An adversarial query cannot force O(N) compute. **Three required caps** (all enforced, none optional): ```rust /// Budget caps for the brute-force safety net. /// All three are enforced simultaneously. The scan stops at whichever hits first. /// These are RUNTIME limits, not caller-adjustable above the defaults. /// Callers may reduce them but not exceed them (unless PreferQuality mode, /// which extends to 4x). pub struct SafetyNetBudget { /// Maximum wall-clock time for the safety net scan. /// Default: 2,000 us (2 ms) in Layer A mode, 5,000 us (5 ms) in partial mode. pub max_scan_time_us: u64, /// Maximum number of candidate vectors to scan. /// Default: 10,000 in Layer A mode, 50,000 in partial mode. pub max_scan_candidates: u64, /// Maximum number of distance evaluations (the actual compute cost). /// This is the hardest cap — it bounds CPU work directly. /// Default: 10,000 in Layer A mode, 50,000 in partial mode. pub max_distance_ops: u64, } impl SafetyNetBudget { /// Layer A only defaults: tight budget for instant first query. pub const LAYER_A: Self = Self { max_scan_time_us: 2_000, // 2 ms max_scan_candidates: 10_000, max_distance_ops: 10_000, }; /// Partial index defaults: moderate budget. pub const PARTIAL: Self = Self { max_scan_time_us: 5_000, // 5 ms max_scan_candidates: 50_000, max_distance_ops: 50_000, }; /// PreferQuality mode: 4x extension of the applicable default. pub fn extended_4x(&self) -> Self { Self { max_scan_time_us: self.max_scan_time_us * 4, max_scan_candidates: self.max_scan_candidates * 4, max_distance_ops: self.max_distance_ops * 4, } } /// Disabled: all zeros. Safety net will not scan anything. pub const DISABLED: Self = Self { max_scan_time_us: 0, max_scan_candidates: 0, max_distance_ops: 0, }; } ``` All three are in `QueryOptions`: ```rust pub struct QueryOptions { pub k: usize, pub ef_search: u32, pub quality_preference: QualityPreference, /// Safety net budget. Callers may tighten but not loosen beyond /// the mode default (unless QualityPreference::PreferQuality). pub safety_net_budget: SafetyNetBudget, } ``` **Policy response**: When any budget is exceeded, the scan stops immediately and returns: - `FallbackPath::SafetyNetBudgetExhausted` - `DegradationReason::BudgetExhausted` with which budget triggered and how far the scan got - A partial candidate set (whatever was found before the budget hit) - `ResponseQuality::Degraded` **Selective scan strategy** — the safety net does NOT scan the entire hot cache. It scans a targeted subset to stay sparse even under fallback: ```rust fn selective_safety_net_scan( query: &[f32], k: usize, hnsw_candidates: &[Candidate], centroid_distances: &[(u32, f32)], // (centroid_id, distance) store: &RvfStore, budget: &SafetyNetBudget, ) -> (Vec, BudgetReport) { let deadline = Instant::now() + Duration::from_micros(budget.max_scan_time_us); let mut scanned: u64 = 0; let mut dist_ops: u64 = 0; let mut candidates = Vec::new(); let mut budget_report = BudgetReport::default(); // Phase 1: Multi-centroid union // Scan hot cache entries whose centroid_id is in top-T centroids. // T = min(adaptive_n_probe, sqrt(total_centroids)) let top_t = centroid_distances.len().min( (centroid_distances.len() as f64).sqrt().ceil() as usize ); let top_centroid_ids: Vec = centroid_distances[..top_t] .iter().map(|(id, _)| *id).collect(); for block in store.hot_cache_blocks_by_centroid(&top_centroid_ids) { if scanned >= budget.max_scan_candidates { break; } if dist_ops >= budget.max_distance_ops { break; } if Instant::now() >= deadline { break; } let block_results = scan_block(query, block); scanned += block.len() as u64; dist_ops += block.len() as u64; candidates.extend(block_results); } // Phase 2: HNSW neighbor expansion // For each existing HNSW candidate, scan their neighbors' vectors // in the hot cache (1-hop expansion). if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops { for candidate in hnsw_candidates.iter().take(k) { if scanned >= budget.max_scan_candidates { break; } if dist_ops >= budget.max_distance_ops { break; } if Instant::now() >= deadline { break; } if let Some(neighbors) = store.hot_cache_neighbors(candidate.id) { for neighbor in neighbors { if dist_ops >= budget.max_distance_ops { break; } let d = distance(query, &neighbor.vector); dist_ops += 1; scanned += 1; candidates.push(Candidate { id: neighbor.id, distance: d }); } } } } // Phase 3: Recency window (if budget remains) // Scan the most recently ingested vectors in the hot cache, // which are most likely to be missing from the HNSW index. if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops { let remaining_budget = budget.max_scan_candidates - scanned; for vec in store.hot_cache_recent(remaining_budget as usize) { if dist_ops >= budget.max_distance_ops { break; } if Instant::now() >= deadline { break; } let d = distance(query, &vec.vector); dist_ops += 1; scanned += 1; candidates.push(Candidate { id: vec.id, distance: d }); } } budget_report.linear_scan_count = scanned; budget_report.linear_scan_budget = budget.max_scan_candidates; budget_report.distance_ops = dist_ops; budget_report.distance_ops_budget = budget.max_distance_ops; (candidates, budget_report) } ``` **Why selective, not exhaustive:** The safety net scans three targeted sets in priority order: 1. **Multi-centroid union**: vectors near the best-matching centroids (spatial locality) 2. **HNSW neighbor expansion**: 1-hop neighbors of existing candidates (graph locality) 3. **Recency window**: recently ingested vectors not yet in any index (temporal locality) Each phase respects all three budget caps. Even under the safety net, the scan stays **sparse and deterministic**. **Why three budget caps:** - **Time alone** is insufficient: fast CPUs burn millions of ops in 5 ms. - **Candidates alone** is insufficient: slow storage makes 50K scans take 50 ms. - **Distance ops alone** is insufficient: a scan that reads but doesn't compute still consumes I/O bandwidth. - **All three together** bound the work in every dimension. The scan stops at whichever limit hits first. **Invariant**: The brute-force safety net is bounded in time, candidates, and compute. A fuzzed query generator cannot push p95 latency above the budgeted ceiling. If all three budgets are set to 0, the safety net is disabled entirely and the system returns `ResponseQuality::Degraded` immediately when HNSW produces insufficient candidates. #### 3.3.1 DoS Hardening Three additional protections for public-facing deployments: **Budget tokens**: Each query consumes a fixed budget of distance ops and bytes. The runtime tracks a per-connection token bucket. No tokens remaining = query rejected with `429 Too Many Requests` equivalent. Prevents sustained DoS via repeated adversarial queries. **Negative caching**: If a query signature (hash of the query vector's quantized form) triggers degenerate mode more than N times in a window, the runtime caches it and forces `SafetyNetBudget::DISABLED` for subsequent matches. The adversary cannot keep burning budget on the same attack vector. **Proof-of-work option**: For open-internet endpoints only. The caller must include a nonce proving O(work) computation before the query is accepted. This is opt-in, not default — only relevant for unauthenticated public endpoints. #### 3.4 Acceptance Test Update Update `benchmarks/acceptance-tests.md` to: 1. Test against three distribution classes (natural, synthetic, adversarial) 2. Verify `ResponseQuality` flag accuracy at the API boundary 3. Verify monotonic recall improvement across progressive load phases 4. Measure brute-force fallback frequency and latency impact 5. Verify brute-force scan terminates within both time and candidate budgets #### 3.5 Acceptance Test: Malicious Tail Manifest (MANDATORY) **Test**: A maliciously rewritten tail manifest that preserves CRC32C but changes hotset pointers must fail to mount under `Strict` policy, and must produce a logged, deterministic failure reason. ``` Test: Malicious Hotset Pointer Redirection ========================================== Setup: 1. Create signed RVF file with 100K vectors, full HNSW index 2. Record the original centroid_seg_offset and centroid_content_hash 3. Identify a different valid INDEX_SEG in the file (e.g., Layer B) 4. Craft a new Level 0 manifest: - Replace centroid_seg_offset with the Layer B segment offset - Keep ALL other fields identical - Recompute CRC32C at 0xFFC to match the modified manifest - Do NOT re-sign (signature becomes invalid) 5. Overwrite last 4096 bytes of file with crafted manifest Verification under Strict policy: 1. Attempt: RvfStore::open_with_policy(&path, opts, SecurityPolicy::Strict) 2. MUST return Err(SecurityError::InvalidSignature) 3. The error MUST include: - error_code: a stable, documented error code (not just a string) - manifest_offset: byte offset of the rejected manifest - expected_signer: public key fingerprint (if known) - rejection_phase: "signature_verification" (not "content_hash") 4. The error MUST be logged at WARN level or higher 5. The file MUST NOT be queryable (no partial mount, no fallback) Verification under Paranoid policy: Same as Strict, identical behavior. Verification under WarnOnly policy: 1. File opens successfully (warning logged) 2. Content hash verification runs on first hotset access 3. centroid_content_hash mismatches the actual segment payload 4. MUST return Err(SecurityError::ContentHashMismatch) on first query 5. The error MUST include: - pointer_name: "centroid_seg_offset" - expected_hash: the hash stored in Level 0 - actual_hash: the hash of the segment at the pointed offset - seg_offset: the byte offset that was followed 6. System transitions to read-only mode, refuses further queries Verification under Permissive policy: 1. File opens successfully (no warning) 2. Queries execute against the wrong segment 3. Results are structurally valid but semantically wrong 4. ResponseQuality is NOT required to detect this (Permissive = no safety) 5. This is the EXPECTED AND DOCUMENTED behavior of Permissive mode Pass criteria: - Strict/Paranoid: deterministic rejection, logged error, no mount - WarnOnly: mount succeeds, content hash catches mismatch on first access - Permissive: mount succeeds, no detection (by design) - Error messages are stable across versions (code, not prose) - No panic, no undefined behavior, no partial state leakage ``` **Test: Malicious Manifest with Re-signed Forgery** ``` Setup: 1. Same as above, but attacker also re-signs with a DIFFERENT key 2. File now has valid CRC32C AND valid signature — but wrong signer Verification under Strict policy: 1. MUST return Err(SecurityError::UnknownSigner) 2. Error includes the actual signer fingerprint 3. Error includes the expected signer fingerprint (from trust store) 4. File does not mount Pass criteria: - The system distinguishes "no signature" from "wrong signer" - Both produce distinct, documented error codes ``` #### 3.6 Acceptance Tests: QualityEnvelope Enforcement (MANDATORY) **Test 1: Consumer Cannot Ignore QualityEnvelope** ``` Test: Schema Enforcement of QualityEnvelope ============================================ Setup: 1. Create RVF file with 10K vectors, full index 2. Issue a query that returns Degraded results (use degenerate query vector) Verification: 1. The query API returns QualityEnvelope, not Vec 2. Attempt to deserialize the response as Vec (without envelope) 3. MUST fail at schema validation — the envelope is the outer type 4. JSON response: top-level keys MUST include "quality", "evidence", "budgets" 5. gRPC response: QualityEnvelope is the response message type 6. MCP tool response: "quality" field is at top level, not nested Pass criteria: - No API path exists that returns raw results without the envelope - Schema validation rejects any consumer that skips the quality field - The envelope cannot be flattened away by middleware or serialization ``` **Test 2: Adversarial Query Respects max_distance_ops Under Safety Net** ``` Test: Budget Cap Enforcement Under Adversarial Query ===================================================== Setup: 1. Create RVF file with 1M vectors, Layer A only (no HNSW loaded) 2. Set SafetyNetBudget to LAYER_A defaults (10,000 distance ops) 3. Craft adversarial query that triggers degenerate detection (uniform-random vector or equidistant from all centroids) Verification: 1. Issue query with quality_preference = Auto 2. Safety net activates (candidate set < 2*k from HNSW) 3. BudgetReport.distance_ops MUST be <= SafetyNetBudget.max_distance_ops 4. BudgetReport.distance_ops MUST be <= 10,000 5. Total query wall-clock MUST be <= SafetyNetBudget.max_scan_time_us 6. DegradationReport.reason MUST be BudgetExhausted if budget was hit 7. ResponseQuality MUST be Degraded (not Verified or Usable) Stress test: 1. Repeat with 10,000 adversarial queries in sequence 2. No single query may exceed max_distance_ops 3. Aggregate p95 latency MUST stay below max_scan_time_us ceiling 4. No OOM, no panic, no unbounded allocation Pass criteria: - max_distance_ops is a hard cap, never exceeded by even 1 operation - Budget enforcement works under all three safety net phases - Each phase independently respects all three budget caps ``` **Test 3: Degenerate Conditions Produce Partial Results, Not Hangs** ``` Test: Graceful Degradation Under Degenerate Conditions ======================================================= Setup: 1. Create RVF file with 1M uniform-random vectors (worst case) 2. Load with Layer A only (no HNSW, no Layer B/C) 3. All centroids equidistant from query (maximum degeneracy) Verification: 1. Issue query with quality_preference = Auto 2. Runtime MUST return within max_scan_time_us (not hang) 3. Return type MUST be Err(RvfError::QualityBelowThreshold(envelope)) 4. The envelope MUST contain: a. A partial result set (whatever was found before budget hit) b. quality = ResponseQuality::Degraded or Unreliable c. degradation.reason = BudgetExhausted or DegenerateDistribution d. degradation.guarantee_lost describes what is missing e. budgets.distance_ops <= budgets.distance_ops_budget 5. The caller can then choose: a. Retry with PreferQuality (extends budget 4x) b. Retry with AcceptDegraded (uses partial results as-is) c. Wait for Layer B to load and retry 6. With AcceptDegraded: a. Same partial results are returned as Ok(envelope) b. ResponseQuality is still Degraded (honesty preserved) c. No additional scanning beyond what was already done Pass criteria: - No hang, no scan-to-completion, no unbounded work - Partial results are always available (not empty unless truly zero candidates) - Clear, structured reason for degradation (not a string, a typed enum) - Caller can always recover by choosing a different QualityPreference ``` #### 3.7 Benchmark: Fuzzed Query Latency Ceiling (MANDATORY) ``` Benchmark: Fuzzed Query Generator vs Budget Ceiling ===================================================== Setup: 1. Create RVF file with 10M vectors, 384 dimensions, fp16 2. Generate a fuzzed query corpus: a. 1000 natural embedding queries (sentence-transformer outputs) b. 1000 uniform-random queries c. 1000 adversarial queries (equidistant from top-K centroids) d. 1000 degenerate queries (zero vector, max-norm vector, NaN-adjacent) 3. Load file progressively: measure at Layer A, A+B, A+B+C Test: 1. Execute all 4000 queries at each progressive load stage 2. Measure p50, p95, p99, max latency per query class per stage Pass criteria: - p95 latency MUST NOT exceed SafetyNetBudget.max_scan_time_us at any stage - p99 latency MUST NOT exceed 2x SafetyNetBudget.max_scan_time_us at any stage (allowing for OS scheduling jitter, not algorithmic overshoot) - max_distance_ops is NEVER exceeded (hard invariant, no exceptions) - Recall improves monotonically across stages for all query classes: recall@10(Layer A) <= recall@10(A+B) <= recall@10(A+B+C) - No query class achieves recall@10 = 0.0 at any stage (even degenerate queries must return SOME results) Report: JSON report per stage with: stage, query_class, p50_us, p95_us, p99_us, max_us, avg_recall_at_10, min_recall_at_10, avg_distance_ops, max_distance_ops, safety_net_trigger_rate, budget_exhaustion_rate ``` --- ### 4. Mandatory Manifest Signatures **Invariant**: No signature, no mount in secure mode. #### 4.1 Security Mount Policy Add a `SecurityPolicy` enum to `RvfOptions`: ```rust /// Manifest signature verification policy. #[derive(Clone, Copy, Debug, PartialEq, Eq)] #[repr(u8)] pub enum SecurityPolicy { /// No signature verification. For development and testing only. /// Files open regardless of signature state. Permissive = 0x00, /// Warn on missing or invalid signatures, but allow open. /// Log events for auditing. WarnOnly = 0x01, /// Require valid signature on Level 0 manifest. /// Reject files with missing or invalid signatures. /// DEFAULT for production. Strict = 0x02, /// Require valid signatures on Level 0, Level 1, and all /// hotset-referenced segments. Full chain verification. Paranoid = 0x03, } impl Default for SecurityPolicy { fn default() -> Self { Self::Strict } } ``` **Default is `Strict`**, not `Permissive`. #### 4.2 Verification Chain Under `Strict` policy, the open path becomes: ``` 1. Read Level 0 (4096 bytes) 2. Validate CRC32C (corruption check) 3. Validate ML-DSA-65 signature (adversarial check) 4. If signature missing: REJECT with SecurityError::UnsignedManifest 5. If signature invalid: REJECT with SecurityError::InvalidSignature 6. Extract hotset pointers 7. For each hotset pointer: validate content hash (ADR-033 §1.1) 8. If any content hash fails: REJECT with SecurityError::ContentHashMismatch 9. System is now queryable with verified pointers ``` Under `Paranoid` policy, add: ``` 10. Read Level 1 manifest 11. Validate Level 1 signature 12. For each segment in directory: verify content hash matches on first access ``` #### 4.3 Unsigned File Handling Files without signatures can still be opened under `Permissive` or `WarnOnly` policies. This supports: - Development and testing workflows - Legacy files created before signature support - Performance-critical paths where verification latency is unacceptable But the default is `Strict`. If an enterprise deploys with defaults, they get signature enforcement. They must explicitly opt out. #### 4.4 Signature Generation on Write Every `write_manifest()` call MUST: 1. Compute SHAKE-256-256 content hashes for all hotset-referenced segments 2. Store hashes in Level 0 at the new offsets (§1.4) 3. If a signing key is available: sign Level 0 with ML-DSA-65 4. If no signing key: write `sig_algo = 0` (unsigned) The `create()` and `open()` methods accept an optional signing key: ```rust impl RvfStore { pub fn create_signed( path: &Path, options: RvfOptions, signing_key: &MlDsa65SigningKey, ) -> Result; } ``` #### 4.5 Runtime Policy Flag The security policy is set at store open time and cannot be downgraded: ```rust let store = RvfStore::open_with_policy( &path, RvfOptions::default(), SecurityPolicy::Strict, )?; ``` A store opened with `Strict` policy will reject any hotset pointer that fails content hash verification, even if the CRC32C passes. This prevents the segment-swap attack identified in the analysis. --- ## Consequences ### Positive - Centroid stability becomes a **logical invariant**, not a physical accident - Adversarial distribution degradation becomes **detectable and bounded** - Recall claims become **honest** — empirical targets with explicit assumptions - Manifest integrity becomes **mandatory by default** — enterprises are secure without configuration - Quality elasticity replaces silent degradation — the system tells you when it's uncertain ### Negative - Level 0 layout change is **breaking** (version 1 -> version 2) - Content hash computation adds ~50 microseconds per manifest write - Strict signature policy adds ~200 microseconds per file open (ML-DSA-65 verify) - Adaptive n_probe increases query latency by up to 4x under degenerate distributions ### Migration - Level 0 version field (`0x004`) distinguishes v1 (pre-ADR-033) from v2 - v1 files are readable under `Permissive` policy (no content hashes, no signature) - v1 files trigger a warning under `WarnOnly` policy - v1 files are rejected under `Strict` policy unless explicitly migrated - Migration tool: `rvf migrate --sign --key ` rewrites manifest with v2 layout --- ## Size Impact | Component | Additional Bytes | Where | |-----------|-----------------|-------| | Content hashes (5 pointers * 16 bytes) | 80 B | Level 0 manifest | | Centroid epoch + drift fields | 8 B | Level 0 manifest | | ResponseQuality + DegradationReason | ~64 B | Per query response | | SecurityPolicy in options | 1 B | Runtime config | | Total Level 0 overhead | 96 B | Within existing 4096 B page | No additional segments. No file size increase beyond the 96 bytes in Level 0. --- ## Implementation Order | Phase | Component | Estimated Effort | |-------|-----------|-----------------| | 1 | Content hash fields in `rvf-types` Level 0 layout | Small | | 2 | `centroid_epoch` + `max_epoch_drift` in manifest | Small | | 3 | `ResultQuality` enum in `rvf-runtime` | Small | | 4 | `is_degenerate_distribution()` + adaptive n_probe | Medium | | 5 | Content hash verification in read path | Medium | | 6 | `SecurityPolicy` enum + enforcement in open path | Medium | | 7 | ML-DSA-65 signing in write path | Large (depends on rvf-crypto) | | 8 | Brute-force safety net in query path | Medium | | 9 | Acceptance test updates (3 distribution classes) | Medium | | 10 | Migration tool (`rvf migrate --sign`) | Medium | --- ## References - RVF Spec 02: Manifest System (hotset pointers, Level 0 layout) - RVF Spec 04: Progressive Indexing (Layer A/B/C recall targets) - RVF Spec 03: Temperature Tiering (centroid refresh, sketch epochs) - ADR-029: RVF Canonical Format (universal adoption across libraries) - ADR-030: Cognitive Container (three-tier execution model) - FIPS 204: ML-DSA (Module-Lattice Digital Signature Algorithm) - Malkov & Yashunin (2018): HNSW search complexity analysis