# ADR-033: Progressive Indexing Hardening — Centroid Stability, Adversarial Resilience, Recall Framing, and Mandatory Signatures
**Status**: Accepted
**Date**: 2026-02-15
**Supersedes**: Partially amends ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Container)
**Affects**: `rvf-types`, `rvf-runtime`, `rvf-manifest`, `rvf-crypto`, `rvf-wasm`
---
## Context
Analysis of the progressive indexing system (spec chapters 02-04) revealed four structural weaknesses that convert engineered guarantees into opportunistic behavior:
1. **Centroid stability** depends on physical layout, not logical identity
2. **Layer A recall** collapses silently under adversarial distributions
3. **Recall targets** are empirical, presented as if they were bounds
4. **Manifest integrity** is optional, leaving the hotset attack surface open
Each issue individually is tolerable. Together they form a compound vulnerability: an adversary who controls the data distribution AND the file tail can produce a structurally valid RVF file that returns confident, wrong answers with no detection mechanism.
This ADR converts all four from "known limitations" to "engineered defenses."
---
## Decision
### 1. Content-Addressed Centroid Stability
**Invariant**: Logical identity must not depend on physical layout.
#### 1.1 Content-Addressed Segment References
Hotset pointers in the Level 0 manifest currently store raw byte offsets:
```
Offset  Size  Field                Description
------  ----  -----                -----------
0x058   8     centroid_seg_offset  Byte offset in file
```
Add a parallel content hash field for each hotset pointer:
```
Offset  Size  Field                  Description
------  ----  -----                  -----------
0x058   8     centroid_seg_offset    Byte offset (for fast seek)
0x0C0   16    centroid_content_hash  First 128 bits of SHAKE-256 of segment payload
```
The runtime validates:
1. Seek to `centroid_seg_offset`
2. Read segment header + payload
3. Compute SHAKE-256 of payload
4. Compare first 128 bits against `centroid_content_hash`
5. If mismatch: reject pointer, fall back to Level 1 directory scan
This makes compaction physically destructive but logically stable. The manifest re-points by offset for speed but verifies by hash for correctness.
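The validation path above can be sketched as follows. This is a minimal sketch, not the runtime implementation: `hash128` stands in for the real truncated SHAKE-256 routine, and `PointerCheck` is a hypothetical name, not part of the wire format.

```rust
/// Outcome of validating one hotset pointer (hypothetical type).
#[derive(Debug, PartialEq)]
enum PointerCheck {
    /// Hash matched: the pointer may be followed.
    Verified,
    /// Hash mismatched: reject the pointer, fall back to Level 1 directory scan.
    FallbackToDirectory,
}

/// Steps 3-5 of the validation path: hash the payload read at
/// `centroid_seg_offset` and compare against the stored 128-bit hash.
/// `hash128` abstracts the SHAKE-256 truncation so the control flow
/// can be shown (and tested) independently of the crypto backend.
fn validate_hotset_pointer(
    payload: &[u8],
    stored_hash: &[u8; 16],
    hash128: impl Fn(&[u8]) -> [u8; 16],
) -> PointerCheck {
    if hash128(payload) == *stored_hash {
        PointerCheck::Verified
    } else {
        PointerCheck::FallbackToDirectory
    }
}
```

Note that a mismatch degrades availability (directory scan), never correctness: the offset is only an accelerator, the hash is the identity.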
#### 1.2 Centroid Epoch Monotonic Counter
Add to Level 0 root manifest:
```
Offset  Size  Field            Description
------  ----  -----            -----------
0x0F0   4     centroid_epoch   Monotonic counter, incremented on recomputation
0x0F4   4     max_epoch_drift  Maximum allowed drift before forced recompute
```
**Semantics**:
- `centroid_epoch` increments each time centroids are recomputed
- The manifest's global `epoch` counter tracks all mutations
- `epoch_drift = manifest.epoch - centroid_epoch`
- If `epoch_drift > max_epoch_drift`: runtime MUST either recompute centroids or widen `n_probe`
Default `max_epoch_drift`: 64 epochs.
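The drift rule above can be sketched as a small decision function. `DriftAction` is a hypothetical name for illustration; the real runtime would trigger recomputation asynchronously rather than inline.

```rust
/// What the runtime must do for a given epoch drift (hypothetical enum).
#[derive(Debug, PartialEq)]
enum DriftAction {
    /// Drift within bounds: query normally.
    None,
    /// Drift exceeded max_epoch_drift: recompute centroids or widen n_probe.
    RecomputeOrWiden,
}

/// epoch_drift = manifest.epoch - centroid_epoch. Saturating subtraction:
/// the counters are monotonic by contract, but a corrupt file could invert
/// them, and that must not panic or wrap.
fn drift_action(manifest_epoch: u32, centroid_epoch: u32, max_epoch_drift: u32) -> DriftAction {
    let drift = manifest_epoch.saturating_sub(centroid_epoch);
    if drift > max_epoch_drift {
        DriftAction::RecomputeOrWiden
    } else {
        DriftAction::None
    }
}
```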
#### 1.3 Automatic Quality Elasticity
When epoch drift is detected, the runtime applies controlled quality degradation instead of silent recall loss:
```rust
fn effective_n_probe(base_n_probe: u32, epoch_drift: u32, max_drift: u32) -> u32 {
    if epoch_drift <= max_drift / 2 {
        // Within comfort zone: no adjustment
        base_n_probe
    } else if epoch_drift <= max_drift {
        // Drift zone: linear widening from 1x at max_drift/2 up to 2x at max_drift
        let scale = 1.0 + (epoch_drift - max_drift / 2) as f64 / (max_drift as f64 / 2.0);
        (base_n_probe as f64 * scale).ceil() as u32
    } else {
        // Beyond max drift: double n_probe, schedule recomputation
        base_n_probe * 2
    }
}
```
This turns degradation into **controlled quality elasticity**: recall trades against latency in a predictable, bounded way.
#### 1.4 Wire Format Changes
Add content hash fields to Level 0 at reserved offsets in the `0x0A0-0x0FF` region (freed by relocating the signature; see the note below):
```
Offset  Size  Field
------  ----  -----
0x0A0   16    entrypoint_content_hash
0x0B0   16    toplayer_content_hash
0x0C0   16    centroid_content_hash
0x0D0   16    quantdict_content_hash
0x0E0   16    hot_cache_content_hash
0x0F0   4     centroid_epoch
0x0F4   4     max_epoch_drift
0x0F8   8     reserved_hardening
```
Total: 96 bytes, filling the `0x0A0-0x0FF` region exactly.
**Note**: The signature field at `0x094` must move to accommodate this. New signature offset: `0x100`. This is a breaking change to the Level 0 layout. Files written before ADR-033 are detected by `version < 2` in the root manifest and use the old layout.
---
### 2. Layer A Adversarial Resilience
**Invariant**: Silent catastrophic degradation must not be possible.
#### 2.1 Distance Entropy Detection
After computing distances to the top-K centroids, measure the discriminative power:
```rust
/// Detect adversarial or degenerate centroid distance distributions.
/// Returns true if the distribution is too uniform to trust centroid routing.
fn is_degenerate_distribution(distances: &[f32], k: usize) -> bool {
    if distances.len() < 2 * k {
        return true; // Not enough centroids
    }
    // Sort and take top-2k (total_cmp is NaN-safe; partial_cmp().unwrap() would
    // panic on the NaN-adjacent queries the fuzz benchmark feeds in)
    let mut sorted = distances.to_vec();
    sorted.sort_unstable_by(|a, b| a.total_cmp(b));
    let top = &sorted[..2 * k];
    // Compute coefficient of variation (CV = stddev / mean)
    let mean = top.iter().sum::<f32>() / top.len() as f32;
    if mean < f32::EPSILON {
        return true; // All distances near zero
    }
    let variance = top.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / top.len() as f32;
    let cv = variance.sqrt() / mean;
    // CV < 0.05 means top distances are within ~5% of each other:
    // the centroids provide no discriminative power
    cv < DEGENERATE_CV_THRESHOLD
}
const DEGENERATE_CV_THRESHOLD: f32 = 0.05;
```
#### 2.2 Adaptive n_probe Widening
When degeneracy is detected, widen the search:
```rust
fn adaptive_n_probe(
    base_n_probe: u32,
    centroid_distances: &[f32],
    total_centroids: u32,
) -> u32 {
    if is_degenerate_distribution(centroid_distances, base_n_probe as usize) {
        // Degenerate: widen to sqrt(K) or 4x base, whichever is smaller
        let widened = (total_centroids as f64).sqrt().ceil() as u32;
        base_n_probe.max(widened).min(base_n_probe * 4)
    } else {
        base_n_probe
    }
}
```
#### 2.3 Multi-Centroid Fallback
When distance variance is below threshold AND Layer B is not yet loaded, fall back to a lightweight multi-probe strategy:
1. Compute distances to ALL centroids (not just top-K)
2. If all distances are within `mean +/- 2*stddev`: treat as uniform
3. For uniform distributions: scan the hot cache linearly (if available)
4. If no hot cache: return results with a `quality_flag = APPROXIMATE` in the response
This prevents silent wrong answers. The caller knows the result quality.
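The uniformity test in step 2 can be sketched as follows; a minimal sketch of the `mean +/- 2*stddev` band check, with the function name chosen here for illustration.

```rust
/// Step 2 of the fallback: treat the distribution as uniform when every
/// centroid distance lies within mean +/- 2*stddev of the whole set.
fn is_uniform_distribution(distances: &[f32]) -> bool {
    if distances.is_empty() {
        return true; // Vacuously uniform: nothing to discriminate on
    }
    let n = distances.len() as f32;
    let mean = distances.iter().sum::<f32>() / n;
    let variance = distances.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / n;
    let sigma = variance.sqrt();
    // Uniform iff no distance escapes the 2-sigma band
    distances.iter().all(|d| (d - mean).abs() <= 2.0 * sigma)
}
```

When this returns true, centroid routing carries no signal, so the runtime moves to the hot-cache linear scan (step 3) or the `APPROXIMATE` flag (step 4).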
#### 2.4 Quality Flag at the API Boundary
`ResultQuality` is defined at two levels: per-retrieval and per-response.
**Per-retrieval** (internal, attached to each candidate):
```rust
/// Quality confidence for the retrieval candidate set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum RetrievalQuality {
    /// Full index traversed, high confidence in candidate set.
    Full = 0x00,
    /// Partial index (Layer A+B), good confidence.
    Partial = 0x01,
    /// Layer A only, moderate confidence.
    LayerAOnly = 0x02,
    /// Degenerate distribution detected, low confidence.
    DegenerateDetected = 0x03,
    /// Brute-force fallback used within budget, exact over scanned region.
    BruteForceBudgeted = 0x04,
}
```
**Per-response** (external, returned to the caller at the API boundary):
```rust
/// Response-level quality signal. This is the field that consumers
/// (RAG pipelines, agent tool chains, MCP clients) MUST inspect.
///
/// If `response_quality < threshold`, the consumer should either:
/// - Wait and retry (progressive loading will improve quality)
/// - Widen the search (increase k or ef_search)
/// - Fall back to an alternative data source
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum ResponseQuality {
    /// All results from full index. Trust fully.
    Verified = 0x00,
    /// Results from partial index. Usable but may miss neighbors.
    Usable = 0x01,
    /// Degraded retrieval detected. Results are best-effort.
    /// The `degradation_reason` field explains why.
    Degraded = 0x02,
    /// Insufficient candidates found. Results are unreliable.
    /// Caller SHOULD NOT use these for downstream decisions.
    Unreliable = 0x03,
}
```
**Derivation rule**: `ResponseQuality` is derived from the worst `RetrievalQuality` value in the result set (the maximum numeric discriminant):
```rust
fn derive_response_quality(results: &[SearchResult]) -> ResponseQuality {
    let worst = results.iter()
        .map(|r| r.retrieval_quality)
        .max_by_key(|q| *q as u8)
        .unwrap_or(RetrievalQuality::Full);
    match worst {
        RetrievalQuality::Full => ResponseQuality::Verified,
        RetrievalQuality::Partial => ResponseQuality::Usable,
        RetrievalQuality::LayerAOnly => ResponseQuality::Usable,
        RetrievalQuality::DegenerateDetected => ResponseQuality::Degraded,
        RetrievalQuality::BruteForceBudgeted => ResponseQuality::Degraded,
    }
}
```
**Mandatory outer wrapper**: `QualityEnvelope` is the top-level return type for all
query APIs. It is not a nested field; it is the outer wrapper. JSON flattening cannot
discard it, gRPC serialization cannot drop it, and MCP tool responses must include it.
```rust
/// The mandatory outer return type for all query APIs.
/// This is not optional. This is not a nested field.
/// Consumers that ignore this are misusing the API.
pub struct QualityEnvelope {
    /// The search results.
    pub results: Vec<SearchResult>,
    /// Top-level quality signal. Consumers MUST inspect this.
    pub quality: ResponseQuality,
    /// Structured evidence for why the quality is what it is.
    pub evidence: SearchEvidenceSummary,
    /// Resource consumption report for this query.
    pub budgets: BudgetReport,
    /// If quality is degraded, the structured reason.
    pub degradation: Option<DegradationReport>,
}

/// Evidence chain: what index state was actually used.
pub struct SearchEvidenceSummary {
    /// Which index layers were available and used.
    pub layers_used: IndexLayersUsed,
    /// Effective n_probe (after any adaptive widening).
    pub n_probe_effective: u32,
    /// Whether degenerate distribution was detected.
    pub degenerate_detected: bool,
    /// Coefficient of variation of top-K centroid distances.
    pub centroid_distance_cv: f32,
    /// Number of candidates found by HNSW before safety net.
    pub hnsw_candidate_count: u32,
    /// Number of candidates added by safety net scan.
    pub safety_net_candidate_count: u32,
    /// Content hashes of index segments actually touched.
    pub index_segments_touched: Vec<[u8; 16]>,
}

#[derive(Clone, Copy, Debug)]
pub struct IndexLayersUsed {
    pub layer_a: bool,
    pub layer_b: bool,
    pub layer_c: bool,
    pub hot_cache: bool,
}

/// Resource consumption report.
pub struct BudgetReport {
    /// Wall-clock time per stage.
    pub centroid_routing_us: u64,
    pub hnsw_traversal_us: u64,
    pub safety_net_scan_us: u64,
    pub reranking_us: u64,
    pub total_us: u64,
    /// Distance evaluations performed.
    pub distance_ops: u64,
    pub distance_ops_budget: u64,
    /// Bytes read from storage.
    pub bytes_read: u64,
    /// Candidates scanned in linear scan (safety net).
    pub linear_scan_count: u64,
    pub linear_scan_budget: u64,
}

/// Why quality is degraded.
pub struct DegradationReport {
    /// Which fallback path was chosen.
    pub fallback_path: FallbackPath,
    /// Why it was chosen (structured, not prose).
    pub reason: DegradationReason,
    /// What guarantee is lost relative to Full quality.
    pub guarantee_lost: &'static str,
}

#[derive(Clone, Copy, Debug)]
pub enum FallbackPath {
    /// Normal HNSW traversal, no fallback needed.
    None,
    /// Adaptive n_probe widening due to epoch drift.
    NProbeWidened,
    /// Adaptive n_probe widening due to degenerate distribution.
    DegenerateWidened,
    /// Selective safety net scan on hot cache.
    SafetyNetSelective,
    /// Safety net budget exhausted before completion.
    SafetyNetBudgetExhausted,
}

#[derive(Clone, Copy, Debug)]
pub enum DegradationReason {
    /// Centroid epoch drift exceeded threshold.
    CentroidDrift { epoch_drift: u32, max_drift: u32 },
    /// Degenerate distance distribution detected.
    DegenerateDistribution { cv: f32, threshold: f32 },
    /// Brute-force budget exhausted.
    BudgetExhausted { scanned: u64, total: u64, budget_type: &'static str },
    /// Index layer not yet loaded.
    IndexNotLoaded { available: &'static str, needed: &'static str },
}
```
**Hard enforcement rule**: If `quality` is `Degraded` or `Unreliable`, the runtime MUST
either:
1. Return the `QualityEnvelope` with the structured warning (which cannot be dropped
because it is the outer type, not a nested field), OR
2. Require an explicit caller override flag to proceed:
```rust
pub enum QualityPreference {
    /// Runtime decides. Default. Fastest path that meets internal thresholds.
    Auto,
    /// Caller prefers quality over latency. Runtime may widen n_probe,
    /// extend budgets up to 4x, and block until Layer B loads.
    PreferQuality,
    /// Caller prefers latency over quality. Runtime may skip safety net,
    /// reduce n_probe. ResponseQuality honestly reports what it gets.
    PreferLatency,
    /// Caller explicitly accepts degraded results. Required to proceed
    /// when ResponseQuality would be Degraded or Unreliable under Auto.
    /// Without this flag, Degraded queries return an error, not results.
    AcceptDegraded,
}
```
Without `AcceptDegraded`, a `Degraded` result is returned as
`Err(RvfError::QualityBelowThreshold(envelope))` — the caller gets the evidence
but must explicitly opt in to use the results. This prevents silent misuse.
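The intended caller-side flow can be sketched with mock types. `QueryOutcome` and `query` below stand in for the real `Result<QualityEnvelope, RvfError>` API defined above; only the opt-in control flow is modeled.

```rust
/// Mock of the query outcome: the u8 is the ResponseQuality discriminant,
/// standing in for the full envelope (hypothetical simplification).
#[derive(Debug, PartialEq)]
enum QueryOutcome {
    /// Results delivered; quality is still honestly reported.
    Ok(u8),
    /// Quality below threshold without AcceptDegraded: evidence delivered
    /// with the error, results withheld until the caller opts in.
    QualityBelowThreshold(u8),
}

/// Under Auto, Degraded (0x02) or worse surfaces as an error carrying the
/// envelope; with AcceptDegraded, the same results come back as Ok.
fn query(quality: u8, accept_degraded: bool) -> QueryOutcome {
    const DEGRADED: u8 = 0x02;
    if quality >= DEGRADED && !accept_degraded {
        QueryOutcome::QualityBelowThreshold(quality)
    } else {
        QueryOutcome::Ok(quality)
    }
}
```

The key property: opting in changes only the wrapper (`Err` to `Ok`), never the reported quality, so honesty is preserved end to end.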
#### 2.5 Distribution Assumption Declaration
The spec MUST explicitly state:
> **Distribution Assumption**: Recall targets (0.70/0.85/0.95) assume sub-Gaussian embedding distributions typical of neural network outputs (sentence-transformers, OpenAI ada-002, Cohere embed-v3, etc.). For adversarial, synthetic, or uniform-random distributions, recall may be lower. When degenerate distributions are detected at query time, the runtime automatically widens its search and signals reduced confidence via `ResultQuality`.
This converts an implicit assumption into an explicit contract.
---
### 3. Recall Bound Framing
**Invariant**: Never claim theoretical guarantees without distribution assumptions.
#### 3.1 Monotonic Recall Improvement Property
Replace hard recall bounds with a provable structural property:
> **Monotonic Recall Property**: For any query Q and any two index states S1 and S2 where S2 includes all segments of S1 plus additional INDEX_SEGs:
>
> `recall(Q, S2) >= recall(Q, S1)`
>
> Proof: S2's candidate set is a superset of S1's (append-only segments, no removal). More candidates cannot reduce recall.
This is provable from the append-only invariant and requires no distribution assumption.
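The property can be checked mechanically. A sketch, with recall@k computed as the fraction of ground-truth neighbors recovered in the distance-ranked top-k of the candidate set; the names here are illustrative, not the benchmark harness API.

```rust
use std::collections::HashSet;

/// recall@k over (id, distance) candidates, ranked by ascending distance.
/// `truth` is the exact global top-k, so any candidate closer than a truth
/// item is itself in truth — which is why supersets cannot lose recall.
fn recall_at_k(candidates: &[(u32, f32)], truth: &HashSet<u32>, k: usize) -> f64 {
    let mut ranked = candidates.to_vec();
    ranked.sort_by(|a, b| a.1.total_cmp(&b.1));
    let hits = ranked.iter().take(k).filter(|(id, _)| truth.contains(id)).count();
    hits as f64 / truth.len() as f64
}
```

A property test then asserts `recall_at_k(s2, ..) >= recall_at_k(s1, ..)` for every pair where the S2 candidate set is a superset of S1's.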
#### 3.2 Recall Target Classes
Replace the single recall table with benchmark-class-specific targets:
| State | Natural Embeddings | Synthetic Uniform | Adversarial Clustered |
|-------|-------------------|-------------------|----------------------|
| Layer A | >= 0.70 | >= 0.40 | >= 0.20 (with detection) |
| A + B | >= 0.85 | >= 0.70 | >= 0.60 |
| A + B + C | >= 0.95 | >= 0.90 | >= 0.85 |
"Natural Embeddings" = sentence-transformers, OpenAI, Cohere on standard corpora.
#### 3.3 Brute-Force Safety Net (Triple-Budgeted)
When the candidate set from HNSW search is smaller than `2 * k`, the safety net
activates. It is capped by a time budget, a candidate budget, and a distance-op
budget to prevent unbounded work. An adversarial query cannot force O(N) compute.
**Three required caps** (all enforced, none optional):
```rust
/// Budget caps for the brute-force safety net.
/// All three are enforced simultaneously. The scan stops at whichever hits first.
/// These are RUNTIME limits, not caller-adjustable above the defaults.
/// Callers may reduce them but not exceed them (unless PreferQuality mode,
/// which extends to 4x).
pub struct SafetyNetBudget {
    /// Maximum wall-clock time for the safety net scan.
    /// Default: 2,000 us (2 ms) in Layer A mode, 5,000 us (5 ms) in partial mode.
    pub max_scan_time_us: u64,
    /// Maximum number of candidate vectors to scan.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_scan_candidates: u64,
    /// Maximum number of distance evaluations (the actual compute cost).
    /// This is the hardest cap: it bounds CPU work directly.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_distance_ops: u64,
}

impl SafetyNetBudget {
    /// Layer A only defaults: tight budget for instant first query.
    pub const LAYER_A: Self = Self {
        max_scan_time_us: 2_000, // 2 ms
        max_scan_candidates: 10_000,
        max_distance_ops: 10_000,
    };

    /// Partial index defaults: moderate budget.
    pub const PARTIAL: Self = Self {
        max_scan_time_us: 5_000, // 5 ms
        max_scan_candidates: 50_000,
        max_distance_ops: 50_000,
    };

    /// Disabled: all zeros. Safety net will not scan anything.
    pub const DISABLED: Self = Self {
        max_scan_time_us: 0,
        max_scan_candidates: 0,
        max_distance_ops: 0,
    };

    /// PreferQuality mode: 4x extension of the applicable default.
    pub fn extended_4x(&self) -> Self {
        Self {
            max_scan_time_us: self.max_scan_time_us * 4,
            max_scan_candidates: self.max_scan_candidates * 4,
            max_distance_ops: self.max_distance_ops * 4,
        }
    }
}
```
All three are in `QueryOptions`:
```rust
pub struct QueryOptions {
    pub k: usize,
    pub ef_search: u32,
    pub quality_preference: QualityPreference,
    /// Safety net budget. Callers may tighten but not loosen beyond
    /// the mode default (unless QualityPreference::PreferQuality).
    pub safety_net_budget: SafetyNetBudget,
}
```
**Policy response**: When any budget is exceeded, the scan stops immediately and returns:
- `FallbackPath::SafetyNetBudgetExhausted`
- `DegradationReason::BudgetExhausted` with which budget triggered and how far the scan got
- A partial candidate set (whatever was found before the budget hit)
- `ResponseQuality::Degraded`
**Selective scan strategy** — the safety net does NOT scan the entire hot cache. It
scans a targeted subset to stay sparse even under fallback:
```rust
use std::time::{Duration, Instant};

fn selective_safety_net_scan(
    query: &[f32],
    k: usize,
    hnsw_candidates: &[Candidate],
    centroid_distances: &[(u32, f32)], // (centroid_id, distance), sorted ascending
    store: &RvfStore,
    budget: &SafetyNetBudget,
) -> (Vec<Candidate>, BudgetReport) {
    let deadline = Instant::now() + Duration::from_micros(budget.max_scan_time_us);
    let mut scanned: u64 = 0;
    let mut dist_ops: u64 = 0;
    let mut candidates = Vec::new();
    let mut budget_report = BudgetReport::default();

    // Phase 1: Multi-centroid union
    // Scan hot cache entries whose centroid_id is in the top-T centroids.
    // T = min(adaptive_n_probe, sqrt(total_centroids))
    let top_t = centroid_distances.len().min(
        (centroid_distances.len() as f64).sqrt().ceil() as usize
    );
    let top_centroid_ids: Vec<u32> = centroid_distances[..top_t]
        .iter().map(|(id, _)| *id).collect();
    for block in store.hot_cache_blocks_by_centroid(&top_centroid_ids) {
        if scanned >= budget.max_scan_candidates { break; }
        if dist_ops >= budget.max_distance_ops { break; }
        if Instant::now() >= deadline { break; }
        let block_results = scan_block(query, block);
        scanned += block.len() as u64;
        dist_ops += block.len() as u64;
        candidates.extend(block_results);
    }

    // Phase 2: HNSW neighbor expansion
    // For each existing HNSW candidate, scan their neighbors' vectors
    // in the hot cache (1-hop expansion).
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        for candidate in hnsw_candidates.iter().take(k) {
            if scanned >= budget.max_scan_candidates { break; }
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }
            if let Some(neighbors) = store.hot_cache_neighbors(candidate.id) {
                for neighbor in neighbors {
                    if dist_ops >= budget.max_distance_ops { break; }
                    let d = distance(query, &neighbor.vector);
                    dist_ops += 1;
                    scanned += 1;
                    candidates.push(Candidate { id: neighbor.id, distance: d });
                }
            }
        }
    }

    // Phase 3: Recency window (if budget remains)
    // Scan the most recently ingested vectors in the hot cache,
    // which are most likely to be missing from the HNSW index.
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        let remaining_budget = budget.max_scan_candidates - scanned;
        for vec in store.hot_cache_recent(remaining_budget as usize) {
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }
            let d = distance(query, &vec.vector);
            dist_ops += 1;
            scanned += 1;
            candidates.push(Candidate { id: vec.id, distance: d });
        }
    }

    budget_report.linear_scan_count = scanned;
    budget_report.linear_scan_budget = budget.max_scan_candidates;
    budget_report.distance_ops = dist_ops;
    budget_report.distance_ops_budget = budget.max_distance_ops;
    (candidates, budget_report)
}
```
**Why selective, not exhaustive:**
The safety net scans three targeted sets in priority order:
1. **Multi-centroid union**: vectors near the best-matching centroids (spatial locality)
2. **HNSW neighbor expansion**: 1-hop neighbors of existing candidates (graph locality)
3. **Recency window**: recently ingested vectors not yet in any index (temporal locality)
Each phase respects all three budget caps. Even under the safety net, the scan stays
**sparse and deterministic**.
**Why three budget caps:**
- **Time alone** is insufficient: fast CPUs burn millions of ops in 5 ms.
- **Candidates alone** is insufficient: slow storage makes 50K scans take 50 ms.
- **Distance ops alone** is insufficient: a scan that reads but doesn't compute still
consumes I/O bandwidth.
- **All three together** bound the work in every dimension. The scan stops at whichever
limit hits first.
**Invariant**: The brute-force safety net is bounded in time, candidates, and compute.
A fuzzed query generator cannot push p95 latency above the budgeted ceiling. If all
three budgets are set to 0, the safety net is disabled entirely and the system returns
`ResponseQuality::Degraded` immediately when HNSW produces insufficient candidates.
#### 3.3.1 DoS Hardening
Three additional protections for public-facing deployments:
**Budget tokens**: Each query consumes a fixed budget of distance ops and bytes. The
runtime tracks a per-connection token bucket. No tokens remaining = query rejected with
`429 Too Many Requests` equivalent. Prevents sustained DoS via repeated adversarial queries.
**Negative caching**: If a query signature (hash of the query vector's quantized form)
triggers degenerate mode more than N times in a window, the runtime caches it and forces
`SafetyNetBudget::DISABLED` for subsequent matches. The adversary cannot keep burning budget
on the same attack vector.
**Proof-of-work option**: For open-internet endpoints only. The caller must include a
nonce proving O(work) computation before the query is accepted. This is opt-in, not
default — only relevant for unauthenticated public endpoints.
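The per-connection token bucket can be sketched as follows. A sketch with assumed names; the capacity and refill rate are deployment-tunable and not specified by this ADR.

```rust
/// Per-connection budget-token bucket (hypothetical shape).
struct TokenBucket {
    tokens: u64,
    capacity: u64,
    refill_per_tick: u64,
}

impl TokenBucket {
    fn new(capacity: u64, refill_per_tick: u64) -> Self {
        Self { tokens: capacity, capacity, refill_per_tick }
    }

    /// Called on a timer tick; tokens never exceed capacity.
    fn tick(&mut self) {
        self.tokens = (self.tokens + self.refill_per_tick).min(self.capacity);
    }

    /// Charge a query's worst-case distance-op budget up front.
    /// Returns false = reject (the "429 Too Many Requests" path):
    /// no tokens remaining, no query admitted.
    fn try_charge(&mut self, distance_ops: u64) -> bool {
        if self.tokens >= distance_ops {
            self.tokens -= distance_ops;
            true
        } else {
            false
        }
    }
}
```

Charging the worst-case budget up front (rather than metering after the fact) is what makes sustained adversarial traffic self-limiting: the attacker pays for the ceiling, not for what the scan actually used.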
#### 3.4 Acceptance Test Update
Update `benchmarks/acceptance-tests.md` to:
1. Test against three distribution classes (natural, synthetic, adversarial)
2. Verify `ResponseQuality` flag accuracy at the API boundary
3. Verify monotonic recall improvement across progressive load phases
4. Measure brute-force fallback frequency and latency impact
5. Verify brute-force scan terminates within both time and candidate budgets
#### 3.5 Acceptance Test: Malicious Tail Manifest (MANDATORY)
**Test**: A maliciously rewritten tail manifest that preserves CRC32C but
changes hotset pointers must fail to mount under `Strict` policy, and must
produce a logged, deterministic failure reason.
```
Test: Malicious Hotset Pointer Redirection
==========================================
Setup:
1. Create signed RVF file with 100K vectors, full HNSW index
2. Record the original centroid_seg_offset and centroid_content_hash
3. Identify a different valid INDEX_SEG in the file (e.g., Layer B)
4. Craft a new Level 0 manifest:
- Replace centroid_seg_offset with the Layer B segment offset
- Keep ALL other fields identical
- Recompute CRC32C at 0xFFC to match the modified manifest
- Do NOT re-sign (signature becomes invalid)
5. Overwrite last 4096 bytes of file with crafted manifest
Verification under Strict policy:
1. Attempt: RvfStore::open_with_policy(&path, opts, SecurityPolicy::Strict)
2. MUST return Err(SecurityError::InvalidSignature)
3. The error MUST include:
- error_code: a stable, documented error code (not just a string)
- manifest_offset: byte offset of the rejected manifest
- expected_signer: public key fingerprint (if known)
- rejection_phase: "signature_verification" (not "content_hash")
4. The error MUST be logged at WARN level or higher
5. The file MUST NOT be queryable (no partial mount, no fallback)
Verification under Paranoid policy:
Same as Strict, identical behavior.
Verification under WarnOnly policy:
1. File opens successfully (warning logged)
2. Content hash verification runs on first hotset access
3. centroid_content_hash mismatches the actual segment payload
4. MUST return Err(SecurityError::ContentHashMismatch) on first query
5. The error MUST include:
- pointer_name: "centroid_seg_offset"
- expected_hash: the hash stored in Level 0
- actual_hash: the hash of the segment at the pointed offset
- seg_offset: the byte offset that was followed
6. System transitions to read-only mode, refuses further queries
Verification under Permissive policy:
1. File opens successfully (no warning)
2. Queries execute against the wrong segment
3. Results are structurally valid but semantically wrong
4. ResponseQuality is NOT required to detect this (Permissive = no safety)
5. This is the EXPECTED AND DOCUMENTED behavior of Permissive mode
Pass criteria:
- Strict/Paranoid: deterministic rejection, logged error, no mount
- WarnOnly: mount succeeds, content hash catches mismatch on first access
- Permissive: mount succeeds, no detection (by design)
- Error messages are stable across versions (code, not prose)
- No panic, no undefined behavior, no partial state leakage
```
**Test: Malicious Manifest with Re-signed Forgery**
```
Setup:
1. Same as above, but attacker also re-signs with a DIFFERENT key
2. File now has valid CRC32C AND valid signature — but wrong signer
Verification under Strict policy:
1. MUST return Err(SecurityError::UnknownSigner)
2. Error includes the actual signer fingerprint
3. Error includes the expected signer fingerprint (from trust store)
4. File does not mount
Pass criteria:
- The system distinguishes "no signature" from "wrong signer"
- Both produce distinct, documented error codes
```
#### 3.6 Acceptance Tests: QualityEnvelope Enforcement (MANDATORY)
**Test 1: Consumer Cannot Ignore QualityEnvelope**
```
Test: Schema Enforcement of QualityEnvelope
============================================
Setup:
1. Create RVF file with 10K vectors, full index
2. Issue a query that returns Degraded results (use degenerate query vector)
Verification:
1. The query API returns QualityEnvelope, not Vec<SearchResult>
2. Attempt to deserialize the response as Vec<SearchResult> (without envelope)
3. MUST fail at schema validation — the envelope is the outer type
4. JSON response: top-level keys MUST include "quality", "evidence", "budgets"
5. gRPC response: QualityEnvelope is the response message type
6. MCP tool response: "quality" field is at top level, not nested
Pass criteria:
- No API path exists that returns raw results without the envelope
- Schema validation rejects any consumer that skips the quality field
- The envelope cannot be flattened away by middleware or serialization
```
**Test 2: Adversarial Query Respects max_distance_ops Under Safety Net**
```
Test: Budget Cap Enforcement Under Adversarial Query
=====================================================
Setup:
1. Create RVF file with 1M vectors, Layer A only (no HNSW loaded)
2. Set SafetyNetBudget to LAYER_A defaults (10,000 distance ops)
3. Craft adversarial query that triggers degenerate detection
(uniform-random vector or equidistant from all centroids)
Verification:
1. Issue query with quality_preference = Auto
2. Safety net activates (candidate set < 2*k from HNSW)
3. BudgetReport.distance_ops MUST be <= SafetyNetBudget.max_distance_ops
4. BudgetReport.distance_ops MUST be <= 10,000
5. Total query wall-clock MUST be <= SafetyNetBudget.max_scan_time_us
6. DegradationReport.reason MUST be BudgetExhausted if budget was hit
7. ResponseQuality MUST be Degraded (not Verified or Usable)
Stress test:
1. Repeat with 10,000 adversarial queries in sequence
2. No single query may exceed max_distance_ops
3. Aggregate p95 latency MUST stay below max_scan_time_us ceiling
4. No OOM, no panic, no unbounded allocation
Pass criteria:
- max_distance_ops is a hard cap, never exceeded by even 1 operation
- Budget enforcement works under all three safety net phases
- Each phase independently respects all three budget caps
```
**Test 3: Degenerate Conditions Produce Partial Results, Not Hangs**
```
Test: Graceful Degradation Under Degenerate Conditions
=======================================================
Setup:
1. Create RVF file with 1M uniform-random vectors (worst case)
2. Load with Layer A only (no HNSW, no Layer B/C)
3. All centroids equidistant from query (maximum degeneracy)
Verification:
1. Issue query with quality_preference = Auto
2. Runtime MUST return within max_scan_time_us (not hang)
3. Return type MUST be Err(RvfError::QualityBelowThreshold(envelope))
4. The envelope MUST contain:
a. A partial result set (whatever was found before budget hit)
b. quality = ResponseQuality::Degraded or Unreliable
c. degradation.reason = BudgetExhausted or DegenerateDistribution
d. degradation.guarantee_lost describes what is missing
e. budgets.distance_ops <= budgets.distance_ops_budget
5. The caller can then choose:
a. Retry with PreferQuality (extends budget 4x)
b. Retry with AcceptDegraded (uses partial results as-is)
c. Wait for Layer B to load and retry
6. With AcceptDegraded:
a. Same partial results are returned as Ok(envelope)
b. ResponseQuality is still Degraded (honesty preserved)
c. No additional scanning beyond what was already done
Pass criteria:
- No hang, no scan-to-completion, no unbounded work
- Partial results are always available (not empty unless truly zero candidates)
- Clear, structured reason for degradation (not a string, a typed enum)
- Caller can always recover by choosing a different QualityPreference
```
#### 3.7 Benchmark: Fuzzed Query Latency Ceiling (MANDATORY)
```
Benchmark: Fuzzed Query Generator vs Budget Ceiling
=====================================================
Setup:
1. Create RVF file with 10M vectors, 384 dimensions, fp16
2. Generate a fuzzed query corpus:
a. 1000 natural embedding queries (sentence-transformer outputs)
b. 1000 uniform-random queries
c. 1000 adversarial queries (equidistant from top-K centroids)
d. 1000 degenerate queries (zero vector, max-norm vector, NaN-adjacent)
3. Load file progressively: measure at Layer A, A+B, A+B+C
Test:
1. Execute all 4000 queries at each progressive load stage
2. Measure p50, p95, p99, max latency per query class per stage
Pass criteria:
- p95 latency MUST NOT exceed SafetyNetBudget.max_scan_time_us at any stage
- p99 latency MUST NOT exceed 2x SafetyNetBudget.max_scan_time_us at any stage
(allowing for OS scheduling jitter, not algorithmic overshoot)
- max_distance_ops is NEVER exceeded (hard invariant, no exceptions)
- Recall improves monotonically across stages for all query classes:
recall@10(Layer A) <= recall@10(A+B) <= recall@10(A+B+C)
- No query class achieves recall@10 = 0.0 at any stage
(even degenerate queries must return SOME results)
Report:
JSON report per stage with:
stage, query_class, p50_us, p95_us, p99_us, max_us,
avg_recall_at_10, min_recall_at_10, avg_distance_ops,
max_distance_ops, safety_net_trigger_rate, budget_exhaustion_rate
```
---
### 4. Mandatory Manifest Signatures
**Invariant**: No signature, no mount in secure mode.
#### 4.1 Security Mount Policy
Add a `SecurityPolicy` enum to `RvfOptions`:
```rust
/// Manifest signature verification policy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum SecurityPolicy {
    /// No signature verification. For development and testing only.
    /// Files open regardless of signature state.
    Permissive = 0x00,
    /// Warn on missing or invalid signatures, but allow open.
    /// Log events for auditing.
    WarnOnly = 0x01,
    /// Require valid signature on Level 0 manifest.
    /// Reject files with missing or invalid signatures.
    /// DEFAULT for production.
    Strict = 0x02,
    /// Require valid signatures on Level 0, Level 1, and all
    /// hotset-referenced segments. Full chain verification.
    Paranoid = 0x03,
}

impl Default for SecurityPolicy {
    fn default() -> Self {
        Self::Strict
    }
}
```
**Default is `Strict`**, not `Permissive`.
#### 4.2 Verification Chain
Under `Strict` policy, the open path becomes:
```
1. Read Level 0 (4096 bytes)
2. Validate CRC32C (corruption check)
3. Validate ML-DSA-65 signature (adversarial check)
4. If signature missing: REJECT with SecurityError::UnsignedManifest
5. If signature invalid: REJECT with SecurityError::InvalidSignature
6. Extract hotset pointers
7. For each hotset pointer: validate content hash (ADR-033 §1.1)
8. If any content hash fails: REJECT with SecurityError::ContentHashMismatch
9. System is now queryable with verified pointers
```
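The ordering of that chain matters: corruption is checked before authenticity, and authenticity before content hashes. A minimal sketch of the decision logic, with booleans standing in for the actual CRC32C, ML-DSA-65, and SHAKE-256 computations (those live in `rvf-crypto`; `CorruptManifest` is a placeholder variant name, while the other three error names come from the chain above):

```rust
#[derive(Debug, PartialEq, Eq)]
enum SecurityError {
    CorruptManifest,     // step 2: CRC32C mismatch (placeholder name)
    UnsignedManifest,    // step 4
    InvalidSignature,    // step 5
    ContentHashMismatch, // step 8
}

/// Strict open path: corruption check, then signature, then per-pointer
/// content hashes. `signature` is None when missing, Some(v) when present
/// with validity v. One bool per hotset pointer in `hotset_hashes_ok`.
fn open_strict(
    crc_ok: bool,
    signature: Option<bool>,
    hotset_hashes_ok: &[bool],
) -> Result<(), SecurityError> {
    if !crc_ok {
        return Err(SecurityError::CorruptManifest);
    }
    match signature {
        None => return Err(SecurityError::UnsignedManifest),
        Some(false) => return Err(SecurityError::InvalidSignature),
        Some(true) => {}
    }
    if hotset_hashes_ok.iter().any(|ok| !ok) {
        return Err(SecurityError::ContentHashMismatch);
    }
    Ok(()) // step 9: queryable with verified pointers
}
```

Keeping the signature check ahead of the content-hash loop means an attacker cannot learn which hotset pointer failed verification on an unsigned file.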
Under `Paranoid` policy, add:
```
10. Read Level 1 manifest
11. Validate Level 1 signature
12. For each segment in directory: verify content hash matches on first access
```
#### 4.3 Unsigned File Handling
Files without signatures can still be opened under `Permissive` or `WarnOnly` policies. This supports:
- Development and testing workflows
- Legacy files created before signature support
- Performance-critical paths where verification latency is unacceptable
But the default is `Strict`. If an enterprise deploys with defaults, they get signature enforcement. They must explicitly opt out.
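The policy-to-outcome mapping in §4.1 and §4.3 can be expressed as a small total function. A sketch, where `SigState`, `OpenDecision`, and `gate` are illustrative names rather than spec API, and `Paranoid`'s additional Level 1 and per-segment checks happen only after this gate passes:

```rust
/// Mirrors the SecurityPolicy enum in §4.1 (discriminants omitted here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SecurityPolicy { Permissive, WarnOnly, Strict, Paranoid }

/// Signature state observed on the Level 0 manifest at open time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SigState { Missing, Invalid, Valid }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OpenDecision { Allow, AllowWithWarning, Reject }

/// Gate applied before any hotset pointer is trusted.
fn gate(policy: SecurityPolicy, sig: SigState) -> OpenDecision {
    match (policy, sig) {
        // Permissive: open regardless of signature state.
        (SecurityPolicy::Permissive, _) => OpenDecision::Allow,
        // A valid signature passes under every policy.
        (_, SigState::Valid) => OpenDecision::Allow,
        // WarnOnly: log the event for auditing, but continue.
        (SecurityPolicy::WarnOnly, _) => OpenDecision::AllowWithWarning,
        // Strict and Paranoid: reject anything unsigned or forged.
        (_, _) => OpenDecision::Reject,
    }
}
```

Because `Strict` is the `Default`, `gate(SecurityPolicy::default(), SigState::Missing)` rejects: an enterprise deploying with defaults gets enforcement without configuration.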
#### 4.4 Signature Generation on Write
Every `write_manifest()` call MUST:
1. Compute SHAKE-256 content hashes (truncated to 128 bits, per §1.1) for all hotset-referenced segments
2. Store hashes in Level 0 at the new offsets (§1.4)
3. If a signing key is available: sign Level 0 with ML-DSA-65
4. If no signing key: write `sig_algo = 0` (unsigned)
Signed variants of `create()` and `open()` take the signing key explicitly:
```rust
impl RvfStore {
pub fn create_signed(
path: &Path,
options: RvfOptions,
signing_key: &MlDsa65SigningKey,
) -> Result<Self, RvfError>;
}
```
#### 4.5 Runtime Policy Flag
The security policy is set at store open time and cannot be downgraded:
```rust
let store = RvfStore::open_with_policy(
&path,
RvfOptions::default(),
SecurityPolicy::Strict,
)?;
```
A store opened with `Strict` policy will reject any hotset pointer that fails content hash verification, even if the CRC32C passes. This prevents the segment-swap attack identified in the analysis.
---
## Consequences
### Positive
- Centroid stability becomes a **logical invariant**, not a physical accident
- Adversarial distribution degradation becomes **detectable and bounded**
- Recall claims become **honest** — empirical targets with explicit assumptions
- Manifest integrity becomes **mandatory by default** — enterprises are secure without configuration
- Quality elasticity replaces silent degradation — the system tells you when it's uncertain
### Negative
- Level 0 layout change is **breaking** (version 1 -> version 2)
- Content hash computation adds ~50 microseconds per manifest write
- Strict signature policy adds ~200 microseconds per file open (ML-DSA-65 verify)
- Adaptive n_probe increases query latency by up to 4x under degenerate distributions
### Migration
- Level 0 version field (`0x004`) distinguishes v1 (pre-ADR-033) from v2
- v1 files are readable under `Permissive` policy (no content hashes, no signature)
- v1 files trigger a warning under `WarnOnly` policy
- v1 files are rejected under `Strict` policy unless explicitly migrated
- Migration tool: `rvf migrate --sign --key <path>` rewrites manifest with v2 layout
---
## Size Impact
| Component | Additional Bytes | Where |
|-----------|-----------------|-------|
| Content hashes (5 pointers * 16 bytes) | 80 B | Level 0 manifest |
| Centroid epoch + drift fields | 8 B | Level 0 manifest |
| ResponseQuality + DegradationReason | ~64 B | Per query response |
| SecurityPolicy in options | 1 B | Runtime config |
| Total Level 0 overhead | 88 B | Within existing 4096 B page |
No additional segments. No file size increase beyond the 88 bytes in Level 0 (content hashes plus epoch fields; the per-query and runtime-config rows do not touch the manifest).
---
## Implementation Order
| Phase | Component | Estimated Effort |
|-------|-----------|-----------------|
| 1 | Content hash fields in `rvf-types` Level 0 layout | Small |
| 2 | `centroid_epoch` + `max_epoch_drift` in manifest | Small |
| 3 | `ResponseQuality` enum in `rvf-runtime` | Small |
| 4 | `is_degenerate_distribution()` + adaptive n_probe | Medium |
| 5 | Content hash verification in read path | Medium |
| 6 | `SecurityPolicy` enum + enforcement in open path | Medium |
| 7 | ML-DSA-65 signing in write path | Large (depends on rvf-crypto) |
| 8 | Brute-force safety net in query path | Medium |
| 9 | Acceptance test updates (3 distribution classes) | Medium |
| 10 | Migration tool (`rvf migrate --sign`) | Medium |
---
## References
- RVF Spec 02: Manifest System (hotset pointers, Level 0 layout)
- RVF Spec 04: Progressive Indexing (Layer A/B/C recall targets)
- RVF Spec 03: Temperature Tiering (centroid refresh, sketch epochs)
- ADR-029: RVF Canonical Format (universal adoption across libraries)
- ADR-030: Cognitive Container (three-tier execution model)
- FIPS 204: ML-DSA (Module-Lattice Digital Signature Algorithm)
- Malkov & Yashunin (2018): HNSW search complexity analysis