# ADR-033: Progressive Indexing Hardening — Centroid Stability, Adversarial Resilience, Recall Framing, and Mandatory Signatures

**Status**: Accepted
**Date**: 2026-02-15
**Supersedes**: Partially amends ADR-029 (RVF Canonical Format), ADR-030 (Cognitive Container)
**Affects**: `rvf-types`, `rvf-runtime`, `rvf-manifest`, `rvf-crypto`, `rvf-wasm`

---

## Context

Analysis of the progressive indexing system (spec chapters 02-04) revealed four structural weaknesses that convert engineered guarantees into opportunistic behavior:

1. **Centroid stability** depends on physical layout, not logical identity
2. **Layer A recall** collapses silently under adversarial distributions
3. **Recall targets** are empirical, presented as if they were bounds
4. **Manifest integrity** is optional, leaving the hotset attack surface open

Each issue individually is tolerable. Together they form a compound vulnerability: an adversary who controls the data distribution AND the file tail can produce a structurally valid RVF file that returns confident, wrong answers with no detection mechanism.

This ADR converts all four from "known limitations" to "engineered defenses."

---

## Decision

### 1. Content-Addressed Centroid Stability

**Invariant**: Logical identity must not depend on physical layout.

#### 1.1 Content-Addressed Segment References

Hotset pointers in the Level 0 manifest currently store raw byte offsets:

```
0x058   8   centroid_seg_offset      Byte offset in file
```

Add a parallel content hash field for each hotset pointer:

```
Offset  Size  Field                    Description
------  ----  -----                    -----------
0x058   8     centroid_seg_offset      Byte offset (for fast seek)
0x0A0   16    centroid_content_hash    First 128 bits of SHAKE-256 of segment payload
```

The runtime validates:
1. Seek to `centroid_seg_offset`
2. Read segment header + payload
3. Compute SHAKE-256 of payload
4. Compare first 128 bits against `centroid_content_hash`
5. If mismatch: reject pointer, fall back to Level 1 directory scan

This makes compaction physically destructive but logically stable. The manifest re-points by offset for speed but verifies by hash for correctness.

#### 1.2 Centroid Epoch Monotonic Counter

Add to Level 0 root manifest:

```
Offset  Size  Field                    Description
------  ----  -----                    -----------
0x0B0   4     centroid_epoch           Monotonic counter, incremented on recomputation
0x0B4   4     max_epoch_drift          Maximum allowed drift before forced recompute
```

**Semantics**:
- `centroid_epoch` increments each time centroids are recomputed
- The manifest's global `epoch` counter tracks all mutations
- `epoch_drift = manifest.epoch - centroid_epoch`
- If `epoch_drift > max_epoch_drift`: runtime MUST either recompute centroids or widen `n_probe`

Default `max_epoch_drift`: 64 epochs.

#### 1.3 Automatic Quality Elasticity

When epoch drift is detected, the runtime applies controlled quality degradation instead of silent recall loss:

```rust
fn effective_n_probe(base_n_probe: u32, epoch_drift: u32, max_drift: u32) -> u32 {
    if epoch_drift <= max_drift / 2 {
        // Within comfort zone: no adjustment
        base_n_probe
    } else if epoch_drift <= max_drift {
        // Drift zone: linear widening up to 2x
        let scale = 1.0 + (epoch_drift - max_drift / 2) as f64 / max_drift as f64;
        (base_n_probe as f64 * scale).ceil() as u32
    } else {
        // Beyond max drift: double n_probe, schedule recomputation
        base_n_probe * 2
    }
}
```

This turns degradation into **controlled quality elasticity**: recall trades against latency in a predictable, bounded way.

#### 1.4 Wire Format Changes

Add content hash fields to Level 0 at reserved offsets (using the `0x094-0x0FF` reserved region before the signature):

```
Offset  Size  Field
------  ----  -----
0x0A0   16    entrypoint_content_hash
0x0B0   16    toplayer_content_hash
0x0C0   16    centroid_content_hash
0x0D0   16    quantdict_content_hash
0x0E0   16    hot_cache_content_hash
0x0F0   4     centroid_epoch
0x0F4   4     max_epoch_drift
0x0F8   8     reserved_hardening
```

Total: 96 bytes. Fits within the existing reserved region before the signature at `0x094`.

**Note**: The signature field at `0x094` must move to accommodate this. New signature offset: `0x100`. This is a breaking change to the Level 0 layout. Files written before ADR-033 are detected by `version < 2` in the root manifest and use the old layout.

---

### 2. Layer A Adversarial Resilience

**Invariant**: Silent catastrophic degradation must not be possible.

#### 2.1 Distance Entropy Detection

After computing distances to the top-K centroids, measure the discriminative power:

```rust
/// Detect adversarial or degenerate centroid distance distributions.
/// Returns true if the distribution is too uniform to trust centroid routing.
fn is_degenerate_distribution(distances: &[f32], k: usize) -> bool {
    if distances.len() < 2 * k {
        return true; // Not enough centroids
    }

    // Sort and take top-2k
    let mut sorted = distances.to_vec();
    sorted.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    let top = &sorted[..2 * k];

    // Compute coefficient of variation (CV = stddev / mean)
    let mean = top.iter().sum::<f32>() / top.len() as f32;
    if mean < f32::EPSILON {
        return true; // All distances near zero
    }

    let variance = top.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / top.len() as f32;
    let cv = variance.sqrt() / mean;

    // CV < 0.05 means top distances are within 5% of each other
    // This indicates centroids provide no discriminative power
    cv < DEGENERATE_CV_THRESHOLD
}

const DEGENERATE_CV_THRESHOLD: f32 = 0.05;
```

#### 2.2 Adaptive n_probe Widening

When degeneracy is detected, widen the search:

```rust
fn adaptive_n_probe(
    base_n_probe: u32,
    centroid_distances: &[f32],
    total_centroids: u32,
) -> u32 {
    if is_degenerate_distribution(centroid_distances, base_n_probe as usize) {
        // Degenerate: widen to sqrt(K) or 4x base, whichever is smaller
        let widened = (total_centroids as f64).sqrt().ceil() as u32;
        base_n_probe.max(widened).min(base_n_probe * 4)
    } else {
        base_n_probe
    }
}
```

#### 2.3 Multi-Centroid Fallback

When distance variance is below threshold AND Layer B is not yet loaded, fall back to a lightweight multi-probe strategy:

1. Compute distances to ALL centroids (not just top-K)
2. If all distances are within `mean +/- 2*stddev`: treat as uniform
3. For uniform distributions: scan the hot cache linearly (if available)
4. If no hot cache: return results with a `quality_flag = APPROXIMATE` in the response

This prevents silent wrong answers. The caller knows the result quality.

#### 2.4 Quality Flag at the API Boundary

`ResultQuality` is defined at two levels: per-retrieval and per-response.

**Per-retrieval** (internal, attached to each candidate):

```rust
/// Quality confidence for the retrieval candidate set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum RetrievalQuality {
    /// Full index traversed, high confidence in candidate set.
    Full = 0x00,
    /// Partial index (Layer A+B), good confidence.
    Partial = 0x01,
    /// Layer A only, moderate confidence.
    LayerAOnly = 0x02,
    /// Degenerate distribution detected, low confidence.
    DegenerateDetected = 0x03,
    /// Brute-force fallback used within budget, exact over scanned region.
    BruteForceBudgeted = 0x04,
}
```

**Per-response** (external, returned to the caller at the API boundary):

```rust
/// Response-level quality signal. This is the field that consumers
/// (RAG pipelines, agent tool chains, MCP clients) MUST inspect.
///
/// If `response_quality < threshold`, the consumer should either:
/// - Wait and retry (progressive loading will improve quality)
/// - Widen the search (increase k or ef_search)
/// - Fall back to an alternative data source
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum ResponseQuality {
    /// All results from full index. Trust fully.
    Verified = 0x00,
    /// Results from partial index. Usable but may miss neighbors.
    Usable = 0x01,
    /// Degraded retrieval detected. Results are best-effort.
    /// The `degradation_reason` field explains why.
    Degraded = 0x02,
    /// Insufficient candidates found. Results are unreliable.
    /// Caller SHOULD NOT use these for downstream decisions.
    Unreliable = 0x03,
}
```

**Derivation rule** — `ResponseQuality` is the minimum of all `RetrievalQuality` values in the result set:

```rust
fn derive_response_quality(results: &[SearchResult]) -> ResponseQuality {
    let worst = results.iter()
        .map(|r| r.retrieval_quality)
        .max_by_key(|q| *q as u8)
        .unwrap_or(RetrievalQuality::Full);

    match worst {
        RetrievalQuality::Full => ResponseQuality::Verified,
        RetrievalQuality::Partial => ResponseQuality::Usable,
        RetrievalQuality::LayerAOnly => ResponseQuality::Usable,
        RetrievalQuality::DegenerateDetected => ResponseQuality::Degraded,
        RetrievalQuality::BruteForceBudgeted => ResponseQuality::Degraded,
    }
}
```

**Mandatory outer wrapper** — `QualityEnvelope` is the top-level return type for all
query APIs. It is not a nested field. It is the outer wrapper. JSON flattening cannot
discard it. gRPC serialization cannot drop it. MCP tool responses must include it.

```rust
/// The mandatory outer return type for all query APIs.
/// This is not optional. This is not a nested field.
/// Consumers that ignore this are misusing the API.
pub struct QualityEnvelope {
    /// The search results.
    pub results: Vec<SearchResult>,
    /// Top-level quality signal. Consumers MUST inspect this.
    pub quality: ResponseQuality,
    /// Structured evidence for why the quality is what it is.
    pub evidence: SearchEvidenceSummary,
    /// Resource consumption report for this query.
    pub budgets: BudgetReport,
    /// If quality is degraded, the structured reason.
    pub degradation: Option<DegradationReport>,
}

/// Evidence chain: what index state was actually used.
pub struct SearchEvidenceSummary {
    /// Which index layers were available and used.
    pub layers_used: IndexLayersUsed,
    /// Effective n_probe (after any adaptive widening).
    pub n_probe_effective: u32,
    /// Whether degenerate distribution was detected.
    pub degenerate_detected: bool,
    /// Coefficient of variation of top-K centroid distances.
    pub centroid_distance_cv: f32,
    /// Number of candidates found by HNSW before safety net.
    pub hnsw_candidate_count: u32,
    /// Number of candidates added by safety net scan.
    pub safety_net_candidate_count: u32,
    /// Content hashes of index segments actually touched.
    pub index_segments_touched: Vec<[u8; 16]>,
}

#[derive(Clone, Copy, Debug)]
pub struct IndexLayersUsed {
    pub layer_a: bool,
    pub layer_b: bool,
    pub layer_c: bool,
    pub hot_cache: bool,
}

/// Resource consumption report.
pub struct BudgetReport {
    /// Wall-clock time per stage.
    pub centroid_routing_us: u64,
    pub hnsw_traversal_us: u64,
    pub safety_net_scan_us: u64,
    pub reranking_us: u64,
    pub total_us: u64,
    /// Distance evaluations performed.
    pub distance_ops: u64,
    pub distance_ops_budget: u64,
    /// Bytes read from storage.
    pub bytes_read: u64,
    /// Candidates scanned in linear scan (safety net).
    pub linear_scan_count: u64,
    pub linear_scan_budget: u64,
}

/// Why quality is degraded.
pub struct DegradationReport {
    /// Which fallback path was chosen.
    pub fallback_path: FallbackPath,
    /// Why it was chosen (structured, not prose).
    pub reason: DegradationReason,
    /// What guarantee is lost relative to Full quality.
    pub guarantee_lost: &'static str,
}

#[derive(Clone, Copy, Debug)]
pub enum FallbackPath {
    /// Normal HNSW traversal, no fallback needed.
    None,
    /// Adaptive n_probe widening due to epoch drift.
    NProbeWidened,
    /// Adaptive n_probe widening due to degenerate distribution.
    DegenerateWidened,
    /// Selective safety net scan on hot cache.
    SafetyNetSelective,
    /// Safety net budget exhausted before completion.
    SafetyNetBudgetExhausted,
}

#[derive(Clone, Copy, Debug)]
pub enum DegradationReason {
    /// Centroid epoch drift exceeded threshold.
    CentroidDrift { epoch_drift: u32, max_drift: u32 },
    /// Degenerate distance distribution detected.
    DegenerateDistribution { cv: f32, threshold: f32 },
    /// Brute-force budget exhausted.
    BudgetExhausted { scanned: u64, total: u64, budget_type: &'static str },
    /// Index layer not yet loaded.
    IndexNotLoaded { available: &'static str, needed: &'static str },
}
```

**Hard enforcement rule**: If `quality` is `Degraded` or `Unreliable`, the runtime MUST
either:

1. Return the `QualityEnvelope` with the structured warning (which cannot be dropped
   because it is the outer type, not a nested field), OR
2. Require an explicit caller override flag to proceed:

```rust
pub enum QualityPreference {
    /// Runtime decides. Default. Fastest path that meets internal thresholds.
    Auto,
    /// Caller prefers quality over latency. Runtime may widen n_probe,
    /// extend budgets up to 4x, and block until Layer B loads.
    PreferQuality,
    /// Caller prefers latency over quality. Runtime may skip safety net,
    /// reduce n_probe. ResponseQuality honestly reports what it gets.
    PreferLatency,
    /// Caller explicitly accepts degraded results. Required to proceed
    /// when ResponseQuality would be Degraded or Unreliable under Auto.
    /// Without this flag, Degraded queries return an error, not results.
    AcceptDegraded,
}
```

Without `AcceptDegraded`, a `Degraded` result is returned as
`Err(RvfError::QualityBelowThreshold(envelope))` — the caller gets the evidence
but must explicitly opt in to use the results. This prevents silent misuse.

#### 2.5 Distribution Assumption Declaration

The spec MUST explicitly state:

> **Distribution Assumption**: Recall targets (0.70/0.85/0.95) assume sub-Gaussian embedding distributions typical of neural network outputs (sentence-transformers, OpenAI ada-002, Cohere embed-v3, etc.). For adversarial, synthetic, or uniform-random distributions, recall may be lower. When degenerate distributions are detected at query time, the runtime automatically widens its search and signals reduced confidence via `ResultQuality`.

This converts an implicit assumption into an explicit contract.

---

### 3. Recall Bound Framing

**Invariant**: Never claim theoretical guarantees without distribution assumptions.

#### 3.1 Monotonic Recall Improvement Property

Replace hard recall bounds with a provable structural property:

> **Monotonic Recall Property**: For any query Q and any two index states S1 and S2 where S2 includes all segments of S1 plus additional INDEX_SEGs:
>
> `recall(Q, S2) >= recall(Q, S1)`
>
> Proof: S2's candidate set is a superset of S1's (append-only segments, no removal). More candidates cannot reduce recall.

This is provable from the append-only invariant and requires no distribution assumption.

#### 3.2 Recall Target Classes

Replace the single recall table with benchmark-class-specific targets:

| State | Natural Embeddings | Synthetic Uniform | Adversarial Clustered |
|-------|-------------------|-------------------|----------------------|
| Layer A | >= 0.70 | >= 0.40 | >= 0.20 (with detection) |
| A + B | >= 0.85 | >= 0.70 | >= 0.60 |
| A + B + C | >= 0.95 | >= 0.90 | >= 0.85 |

"Natural Embeddings" = sentence-transformers, OpenAI, Cohere on standard corpora.

#### 3.3 Brute-Force Safety Net (Dual-Budgeted)

When the candidate set from HNSW search is smaller than `2 * k`, the safety net
activates. It is capped by **both** a time budget and a candidate budget to prevent
unbounded work. An adversarial query cannot force O(N) compute.

**Three required caps** (all enforced, none optional):

```rust
/// Budget caps for the brute-force safety net.
/// All three are enforced simultaneously. The scan stops at whichever hits first.
/// These are RUNTIME limits, not caller-adjustable above the defaults.
/// Callers may reduce them but not exceed them (unless PreferQuality mode,
/// which extends to 4x).
pub struct SafetyNetBudget {
    /// Maximum wall-clock time for the safety net scan.
    /// Default: 2,000 us (2 ms) in Layer A mode, 5,000 us (5 ms) in partial mode.
    pub max_scan_time_us: u64,
    /// Maximum number of candidate vectors to scan.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_scan_candidates: u64,
    /// Maximum number of distance evaluations (the actual compute cost).
    /// This is the hardest cap — it bounds CPU work directly.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_distance_ops: u64,
}

impl SafetyNetBudget {
    /// Layer A only defaults: tight budget for instant first query.
    pub const LAYER_A: Self = Self {
        max_scan_time_us: 2_000,     // 2 ms
        max_scan_candidates: 10_000,
        max_distance_ops: 10_000,
    };
    /// Partial index defaults: moderate budget.
    pub const PARTIAL: Self = Self {
        max_scan_time_us: 5_000,     // 5 ms
        max_scan_candidates: 50_000,
        max_distance_ops: 50_000,
    };
    /// PreferQuality mode: 4x extension of the applicable default.
    pub fn extended_4x(&self) -> Self {
        Self {
            max_scan_time_us: self.max_scan_time_us * 4,
            max_scan_candidates: self.max_scan_candidates * 4,
            max_distance_ops: self.max_distance_ops * 4,
        }
    }
    /// Disabled: all zeros. Safety net will not scan anything.
    pub const DISABLED: Self = Self {
        max_scan_time_us: 0,
        max_scan_candidates: 0,
        max_distance_ops: 0,
    };
}
```

All three are in `QueryOptions`:

```rust
pub struct QueryOptions {
    pub k: usize,
    pub ef_search: u32,
    pub quality_preference: QualityPreference,
    /// Safety net budget. Callers may tighten but not loosen beyond
    /// the mode default (unless QualityPreference::PreferQuality).
    pub safety_net_budget: SafetyNetBudget,
}
```

**Policy response**: When any budget is exceeded, the scan stops immediately and returns:
- `FallbackPath::SafetyNetBudgetExhausted`
- `DegradationReason::BudgetExhausted` with which budget triggered and how far the scan got
- A partial candidate set (whatever was found before the budget hit)
- `ResponseQuality::Degraded`

**Selective scan strategy** — the safety net does NOT scan the entire hot cache. It
scans a targeted subset to stay sparse even under fallback:

```rust
fn selective_safety_net_scan(
    query: &[f32],
    k: usize,
    hnsw_candidates: &[Candidate],
    centroid_distances: &[(u32, f32)], // (centroid_id, distance)
    store: &RvfStore,
    budget: &SafetyNetBudget,
) -> (Vec<Candidate>, BudgetReport) {
    let deadline = Instant::now() + Duration::from_micros(budget.max_scan_time_us);
    let mut scanned: u64 = 0;
    let mut dist_ops: u64 = 0;
    let mut candidates = Vec::new();
    let mut budget_report = BudgetReport::default();

    // Phase 1: Multi-centroid union
    // Scan hot cache entries whose centroid_id is in top-T centroids.
    // T = min(adaptive_n_probe, sqrt(total_centroids))
    let top_t = centroid_distances.len().min(
        (centroid_distances.len() as f64).sqrt().ceil() as usize
    );
    let top_centroid_ids: Vec<u32> = centroid_distances[..top_t]
        .iter().map(|(id, _)| *id).collect();

    for block in store.hot_cache_blocks_by_centroid(&top_centroid_ids) {
        if scanned >= budget.max_scan_candidates { break; }
        if dist_ops >= budget.max_distance_ops { break; }
        if Instant::now() >= deadline { break; }

        let block_results = scan_block(query, block);
        scanned += block.len() as u64;
        dist_ops += block.len() as u64;
        candidates.extend(block_results);
    }

    // Phase 2: HNSW neighbor expansion
    // For each existing HNSW candidate, scan their neighbors' vectors
    // in the hot cache (1-hop expansion).
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        for candidate in hnsw_candidates.iter().take(k) {
            if scanned >= budget.max_scan_candidates { break; }
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }

            if let Some(neighbors) = store.hot_cache_neighbors(candidate.id) {
                for neighbor in neighbors {
                    if dist_ops >= budget.max_distance_ops { break; }
                    let d = distance(query, &neighbor.vector);
                    dist_ops += 1;
                    scanned += 1;
                    candidates.push(Candidate { id: neighbor.id, distance: d });
                }
            }
        }
    }

    // Phase 3: Recency window (if budget remains)
    // Scan the most recently ingested vectors in the hot cache,
    // which are most likely to be missing from the HNSW index.
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        let remaining_budget = budget.max_scan_candidates - scanned;
        for vec in store.hot_cache_recent(remaining_budget as usize) {
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }
            let d = distance(query, &vec.vector);
            dist_ops += 1;
            scanned += 1;
            candidates.push(Candidate { id: vec.id, distance: d });
        }
    }

    budget_report.linear_scan_count = scanned;
    budget_report.linear_scan_budget = budget.max_scan_candidates;
    budget_report.distance_ops = dist_ops;
    budget_report.distance_ops_budget = budget.max_distance_ops;

    (candidates, budget_report)
}
```

**Why selective, not exhaustive:**

The safety net scans three targeted sets in priority order:
1. **Multi-centroid union**: vectors near the best-matching centroids (spatial locality)
2. **HNSW neighbor expansion**: 1-hop neighbors of existing candidates (graph locality)
3. **Recency window**: recently ingested vectors not yet in any index (temporal locality)

Each phase respects all three budget caps. Even under the safety net, the scan stays
**sparse and deterministic**.

**Why three budget caps:**

- **Time alone** is insufficient: fast CPUs burn millions of ops in 5 ms.
- **Candidates alone** is insufficient: slow storage makes 50K scans take 50 ms.
- **Distance ops alone** is insufficient: a scan that reads but doesn't compute still
  consumes I/O bandwidth.
- **All three together** bound the work in every dimension. The scan stops at whichever
  limit hits first.

**Invariant**: The brute-force safety net is bounded in time, candidates, and compute.
A fuzzed query generator cannot push p95 latency above the budgeted ceiling. If all
three budgets are set to 0, the safety net is disabled entirely and the system returns
`ResponseQuality::Degraded` immediately when HNSW produces insufficient candidates.

#### 3.3.1 DoS Hardening

Three additional protections for public-facing deployments:

**Budget tokens**: Each query consumes a fixed budget of distance ops and bytes. The
runtime tracks a per-connection token bucket. No tokens remaining = query rejected with
`429 Too Many Requests` equivalent. Prevents sustained DoS via repeated adversarial queries.

**Negative caching**: If a query signature (hash of the query vector's quantized form)
triggers degenerate mode more than N times in a window, the runtime caches it and forces
`SafetyNetBudget::DISABLED` for subsequent matches. The adversary cannot keep burning budget
on the same attack vector.

**Proof-of-work option**: For open-internet endpoints only. The caller must include a
nonce proving O(work) computation before the query is accepted. This is opt-in, not
default — only relevant for unauthenticated public endpoints.

#### 3.4 Acceptance Test Update

Update `benchmarks/acceptance-tests.md` to:

1. Test against three distribution classes (natural, synthetic, adversarial)
2. Verify `ResponseQuality` flag accuracy at the API boundary
3. Verify monotonic recall improvement across progressive load phases
4. Measure brute-force fallback frequency and latency impact
5. Verify brute-force scan terminates within both time and candidate budgets

#### 3.5 Acceptance Test: Malicious Tail Manifest (MANDATORY)

**Test**: A maliciously rewritten tail manifest that preserves CRC32C but
changes hotset pointers must fail to mount under `Strict` policy, and must
produce a logged, deterministic failure reason.

```
Test: Malicious Hotset Pointer Redirection
==========================================

Setup:
  1. Create signed RVF file with 100K vectors, full HNSW index
  2. Record the original centroid_seg_offset and centroid_content_hash
  3. Identify a different valid INDEX_SEG in the file (e.g., Layer B)
  4. Craft a new Level 0 manifest:
     - Replace centroid_seg_offset with the Layer B segment offset
     - Keep ALL other fields identical
     - Recompute CRC32C at 0xFFC to match the modified manifest
     - Do NOT re-sign (signature becomes invalid)
  5. Overwrite last 4096 bytes of file with crafted manifest

Verification under Strict policy:
  1. Attempt: RvfStore::open_with_policy(&path, opts, SecurityPolicy::Strict)
  2. MUST return Err(SecurityError::InvalidSignature)
  3. The error MUST include:
     - error_code: a stable, documented error code (not just a string)
     - manifest_offset: byte offset of the rejected manifest
     - expected_signer: public key fingerprint (if known)
     - rejection_phase: "signature_verification" (not "content_hash")
  4. The error MUST be logged at WARN level or higher
  5. The file MUST NOT be queryable (no partial mount, no fallback)

Verification under Paranoid policy:
  Same as Strict, identical behavior.

Verification under WarnOnly policy:
  1. File opens successfully (warning logged)
  2. Content hash verification runs on first hotset access
  3. centroid_content_hash mismatches the actual segment payload
  4. MUST return Err(SecurityError::ContentHashMismatch) on first query
  5. The error MUST include:
     - pointer_name: "centroid_seg_offset"
     - expected_hash: the hash stored in Level 0
     - actual_hash: the hash of the segment at the pointed offset
     - seg_offset: the byte offset that was followed
  6. System transitions to read-only mode, refuses further queries

Verification under Permissive policy:
  1. File opens successfully (no warning)
  2. Queries execute against the wrong segment
  3. Results are structurally valid but semantically wrong
  4. ResponseQuality is NOT required to detect this (Permissive = no safety)
  5. This is the EXPECTED AND DOCUMENTED behavior of Permissive mode

Pass criteria:
  - Strict/Paranoid: deterministic rejection, logged error, no mount
  - WarnOnly: mount succeeds, content hash catches mismatch on first access
  - Permissive: mount succeeds, no detection (by design)
  - Error messages are stable across versions (code, not prose)
  - No panic, no undefined behavior, no partial state leakage
```

**Test: Malicious Manifest with Re-signed Forgery**

```
Setup:
  1. Same as above, but attacker also re-signs with a DIFFERENT key
  2. File now has valid CRC32C AND valid signature — but wrong signer

Verification under Strict policy:
  1. MUST return Err(SecurityError::UnknownSigner)
  2. Error includes the actual signer fingerprint
  3. Error includes the expected signer fingerprint (from trust store)
  4. File does not mount

Pass criteria:
  - The system distinguishes "no signature" from "wrong signer"
  - Both produce distinct, documented error codes
```

#### 3.6 Acceptance Tests: QualityEnvelope Enforcement (MANDATORY)

**Test 1: Consumer Cannot Ignore QualityEnvelope**

```
Test: Schema Enforcement of QualityEnvelope
============================================

Setup:
  1. Create RVF file with 10K vectors, full index
  2. Issue a query that returns Degraded results (use degenerate query vector)

Verification:
  1. The query API returns QualityEnvelope, not Vec<SearchResult>
  2. Attempt to deserialize the response as Vec<SearchResult> (without envelope)
  3. MUST fail at schema validation — the envelope is the outer type
  4. JSON response: top-level keys MUST include "quality", "evidence", "budgets"
  5. gRPC response: QualityEnvelope is the response message type
  6. MCP tool response: "quality" field is at top level, not nested

Pass criteria:
  - No API path exists that returns raw results without the envelope
  - Schema validation rejects any consumer that skips the quality field
  - The envelope cannot be flattened away by middleware or serialization
```

**Test 2: Adversarial Query Respects max_distance_ops Under Safety Net**

```
Test: Budget Cap Enforcement Under Adversarial Query
=====================================================

Setup:
  1. Create RVF file with 1M vectors, Layer A only (no HNSW loaded)
  2. Set SafetyNetBudget to LAYER_A defaults (10,000 distance ops)
  3. Craft adversarial query that triggers degenerate detection
     (uniform-random vector or equidistant from all centroids)

Verification:
  1. Issue query with quality_preference = Auto
  2. Safety net activates (candidate set < 2*k from HNSW)
  3. BudgetReport.distance_ops MUST be <= SafetyNetBudget.max_distance_ops
  4. BudgetReport.distance_ops MUST be <= 10,000
  5. Total query wall-clock MUST be <= SafetyNetBudget.max_scan_time_us
  6. DegradationReport.reason MUST be BudgetExhausted if budget was hit
  7. ResponseQuality MUST be Degraded (not Verified or Usable)

Stress test:
  1. Repeat with 10,000 adversarial queries in sequence
  2. No single query may exceed max_distance_ops
  3. Aggregate p95 latency MUST stay below max_scan_time_us ceiling
  4. No OOM, no panic, no unbounded allocation

Pass criteria:
  - max_distance_ops is a hard cap, never exceeded by even 1 operation
  - Budget enforcement works under all three safety net phases
  - Each phase independently respects all three budget caps
```

**Test 3: Degenerate Conditions Produce Partial Results, Not Hangs**

```
Test: Graceful Degradation Under Degenerate Conditions
=======================================================

Setup:
  1. Create RVF file with 1M uniform-random vectors (worst case)
  2. Load with Layer A only (no HNSW, no Layer B/C)
  3. All centroids equidistant from query (maximum degeneracy)

Verification:
  1. Issue query with quality_preference = Auto
  2. Runtime MUST return within max_scan_time_us (not hang)
  3. Return type MUST be Err(RvfError::QualityBelowThreshold(envelope))
  4. The envelope MUST contain:
     a. A partial result set (whatever was found before budget hit)
     b. quality = ResponseQuality::Degraded or Unreliable
     c. degradation.reason = BudgetExhausted or DegenerateDistribution
     d. degradation.guarantee_lost describes what is missing
     e. budgets.distance_ops <= budgets.distance_ops_budget
  5. The caller can then choose:
     a. Retry with PreferQuality (extends budget 4x)
     b. Retry with AcceptDegraded (uses partial results as-is)
     c. Wait for Layer B to load and retry

  6. With AcceptDegraded:
     a. Same partial results are returned as Ok(envelope)
     b. ResponseQuality is still Degraded (honesty preserved)
     c. No additional scanning beyond what was already done

Pass criteria:
  - No hang, no scan-to-completion, no unbounded work
  - Partial results are always available (not empty unless truly zero candidates)
  - Clear, structured reason for degradation (not a string, a typed enum)
  - Caller can always recover by choosing a different QualityPreference
```

#### 3.7 Benchmark: Fuzzed Query Latency Ceiling (MANDATORY)

```
Benchmark: Fuzzed Query Generator vs Budget Ceiling
=====================================================

Setup:
  1. Create RVF file with 10M vectors, 384 dimensions, fp16
  2. Generate a fuzzed query corpus:
     a. 1000 natural embedding queries (sentence-transformer outputs)
     b. 1000 uniform-random queries
     c. 1000 adversarial queries (equidistant from top-K centroids)
     d. 1000 degenerate queries (zero vector, max-norm vector, NaN-adjacent)
  3. Load file progressively: measure at Layer A, A+B, A+B+C

Test:
  1. Execute all 4000 queries at each progressive load stage
  2. Measure p50, p95, p99, max latency per query class per stage

Pass criteria:
  - p95 latency MUST NOT exceed SafetyNetBudget.max_scan_time_us at any stage
  - p99 latency MUST NOT exceed 2x SafetyNetBudget.max_scan_time_us at any stage
    (allowing for OS scheduling jitter, not algorithmic overshoot)
  - max_distance_ops is NEVER exceeded (hard invariant, no exceptions)
  - Recall improves monotonically across stages for all query classes:
    recall@10(Layer A) <= recall@10(A+B) <= recall@10(A+B+C)
  - No query class achieves recall@10 = 0.0 at any stage
    (even degenerate queries must return SOME results)

Report:
  JSON report per stage with:
    stage, query_class, p50_us, p95_us, p99_us, max_us,
    avg_recall_at_10, min_recall_at_10, avg_distance_ops,
    max_distance_ops, safety_net_trigger_rate, budget_exhaustion_rate
```

---

### 4. Mandatory Manifest Signatures

**Invariant**: No signature, no mount in secure mode.

#### 4.1 Security Mount Policy

Add a `SecurityPolicy` enum to `RvfOptions`:

```rust
/// Manifest signature verification policy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum SecurityPolicy {
    /// No signature verification. For development and testing only.
    /// Files open regardless of signature state.
    Permissive = 0x00,
    /// Warn on missing or invalid signatures, but allow open.
    /// Log events for auditing.
    WarnOnly = 0x01,
    /// Require valid signature on Level 0 manifest.
    /// Reject files with missing or invalid signatures.
    /// DEFAULT for production.
    Strict = 0x02,
    /// Require valid signatures on Level 0, Level 1, and all
    /// hotset-referenced segments. Full chain verification.
    Paranoid = 0x03,
}

impl Default for SecurityPolicy {
    fn default() -> Self {
        Self::Strict
    }
}
```

**Default is `Strict`**, not `Permissive`.

#### 4.2 Verification Chain

Under `Strict` policy, the open path becomes:

```
1. Read Level 0 (4096 bytes)
2. Validate CRC32C (corruption check)
3. Validate ML-DSA-65 signature (adversarial check)
4. If signature missing: REJECT with SecurityError::UnsignedManifest
5. If signature invalid: REJECT with SecurityError::InvalidSignature
6. Extract hotset pointers
7. For each hotset pointer: validate content hash (ADR-033 §1.1)
8. If any content hash fails: REJECT with SecurityError::ContentHashMismatch
9. System is now queryable with verified pointers
```

Under `Paranoid` policy, add:

```
10. Read Level 1 manifest
11. Validate Level 1 signature
12. For each segment in directory: verify content hash matches on first access
```

#### 4.3 Unsigned File Handling

Files without signatures can still be opened under `Permissive` or `WarnOnly` policies. This supports:

- Development and testing workflows
- Legacy files created before signature support
- Performance-critical paths where verification latency is unacceptable

But the default is `Strict`. If an enterprise deploys with defaults, they get signature enforcement. They must explicitly opt out.

#### 4.4 Signature Generation on Write

Every `write_manifest()` call MUST:

1. Compute SHAKE-256-256 content hashes for all hotset-referenced segments
2. Store hashes in Level 0 at the new offsets (§1.4)
3. If a signing key is available: sign Level 0 with ML-DSA-65
4. If no signing key: write `sig_algo = 0` (unsigned)

The `create()` and `open()` methods accept an optional signing key:

```rust
impl RvfStore {
    pub fn create_signed(
        path: &Path,
        options: RvfOptions,
        signing_key: &MlDsa65SigningKey,
    ) -> Result<Self, RvfError>;
}
```

#### 4.5 Runtime Policy Flag

The security policy is set at store open time and cannot be downgraded:

```rust
let store = RvfStore::open_with_policy(
    &path,
    RvfOptions::default(),
    SecurityPolicy::Strict,
)?;
```

A store opened with `Strict` policy will reject any hotset pointer that fails content hash verification, even if the CRC32C passes. This prevents the segment-swap attack identified in the analysis.

---

## Consequences

### Positive

- Centroid stability becomes a **logical invariant**, not a physical accident
- Adversarial distribution degradation becomes **detectable and bounded**
- Recall claims become **honest** — empirical targets with explicit assumptions
- Manifest integrity becomes **mandatory by default** — enterprises are secure without configuration
- Quality elasticity replaces silent degradation — the system tells you when it's uncertain

### Negative

- Level 0 layout change is **breaking** (version 1 -> version 2)
- Content hash computation adds ~50 microseconds per manifest write
- Strict signature policy adds ~200 microseconds per file open (ML-DSA-65 verify)
- Adaptive n_probe increases query latency by up to 4x under degenerate distributions

### Migration

- Level 0 version field (`0x004`) distinguishes v1 (pre-ADR-033) from v2
- v1 files are readable under `Permissive` policy (no content hashes, no signature)
- v1 files trigger a warning under `WarnOnly` policy
- v1 files are rejected under `Strict` policy unless explicitly migrated
- Migration tool: `rvf migrate --sign --key <path>` rewrites manifest with v2 layout

---

## Size Impact

| Component | Additional Bytes | Where |
|-----------|-----------------|-------|
| Content hashes (5 pointers * 16 bytes) | 80 B | Level 0 manifest |
| Centroid epoch + drift fields | 8 B | Level 0 manifest |
| ResponseQuality + DegradationReason | ~64 B | Per query response |
| SecurityPolicy in options | 1 B | Runtime config |
| Total Level 0 overhead | 96 B | Within existing 4096 B page |

No additional segments. No file size increase beyond the 96 bytes in Level 0.

---

## Implementation Order

| Phase | Component | Estimated Effort |
|-------|-----------|-----------------|
| 1 | Content hash fields in `rvf-types` Level 0 layout | Small |
| 2 | `centroid_epoch` + `max_epoch_drift` in manifest | Small |
| 3 | `ResultQuality` enum in `rvf-runtime` | Small |
| 4 | `is_degenerate_distribution()` + adaptive n_probe | Medium |
| 5 | Content hash verification in read path | Medium |
| 6 | `SecurityPolicy` enum + enforcement in open path | Medium |
| 7 | ML-DSA-65 signing in write path | Large (depends on rvf-crypto) |
| 8 | Brute-force safety net in query path | Medium |
| 9 | Acceptance test updates (3 distribution classes) | Medium |
| 10 | Migration tool (`rvf migrate --sign`) | Medium |

---

## References

- RVF Spec 02: Manifest System (hotset pointers, Level 0 layout)
- RVF Spec 04: Progressive Indexing (Layer A/B/C recall targets)
- RVF Spec 03: Temperature Tiering (centroid refresh, sketch epochs)
- ADR-029: RVF Canonical Format (universal adoption across libraries)
- ADR-030: Cognitive Container (three-tier execution model)
- FIPS 204: ML-DSA (Module-Lattice Digital Signature Algorithm)
- Malkov & Yashunin (2018): HNSW search complexity analysis