wifi-densepose/vendor/ruvector/docs/adr/ADR-033-progressive-indexing-hardening.md

ADR-033: Progressive Indexing Hardening — Centroid Stability, Adversarial Resilience, Recall Framing, and Mandatory Signatures

Status: Accepted
Date: 2026-02-15
Supersedes: Partially amends ADR-029 (RVF Canonical Format) and ADR-030 (Cognitive Container)
Affects: rvf-types, rvf-runtime, rvf-manifest, rvf-crypto, rvf-wasm


Context

Analysis of the progressive indexing system (spec chapters 02-04) revealed four structural weaknesses that convert engineered guarantees into opportunistic behavior:

  1. Centroid stability depends on physical layout, not logical identity
  2. Layer A recall collapses silently under adversarial distributions
  3. Recall targets are empirical, presented as if they were bounds
  4. Manifest integrity is optional, leaving the hotset attack surface open

Each issue individually is tolerable. Together they form a compound vulnerability: an adversary who controls the data distribution AND the file tail can produce a structurally valid RVF file that returns confident, wrong answers with no detection mechanism.

This ADR converts all four from "known limitations" to "engineered defenses."


Decision

1. Content-Addressed Centroid Stability

Invariant: Logical identity must not depend on physical layout.

1.1 Content-Addressed Segment References

Hotset pointers in the Level 0 manifest currently store raw byte offsets:

0x058   8   centroid_seg_offset      Byte offset in file

Add a parallel content hash field for each hotset pointer:

Offset  Size  Field                    Description
------  ----  -----                    -----------
0x058   8     centroid_seg_offset      Byte offset (for fast seek)
0x0C0   16    centroid_content_hash    First 128 bits of SHAKE-256 of segment payload

The runtime validates:

  1. Seek to centroid_seg_offset
  2. Read segment header + payload
  3. Compute SHAKE-256 of payload
  4. Compare first 128 bits against centroid_content_hash
  5. If mismatch: reject pointer, fall back to Level 1 directory scan

This makes compaction physically destructive but logically stable. The manifest re-points by offset for speed but verifies by hash for correctness.
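The reject-or-fallback decision in steps 3-5 can be sketched as follows. This is a minimal illustration: the hash function is passed in as a parameter, and `demo_hash128` is a toy stand-in for SHAKE-256 truncated to 128 bits, used here only so the example is self-contained.

```rust
/// Outcome of validating a hotset pointer against its content hash.
#[derive(Debug, PartialEq)]
enum PointerOutcome {
    /// Hash matched: the offset still points at the intended segment.
    Verified,
    /// Hash mismatch: reject the pointer, fall back to the Level 1 scan.
    FallbackToLevel1,
}

/// Steps 3-5 of the validation sequence. The hash function is a parameter;
/// the real runtime would compute SHAKE-256 and truncate to 128 bits.
fn validate_hotset_pointer(
    payload: &[u8],
    expected: &[u8; 16],
    hash128: impl Fn(&[u8]) -> [u8; 16],
) -> PointerOutcome {
    if hash128(payload) == *expected {
        PointerOutcome::Verified
    } else {
        PointerOutcome::FallbackToLevel1
    }
}

/// Toy 128-bit hash for illustration ONLY (not SHAKE-256):
/// xor-folds the payload into 16 bytes.
fn demo_hash128(data: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, b) in data.iter().enumerate() {
        out[i % 16] ^= *b;
    }
    out
}
```

Any compaction that moves the segment leaves the hash valid; any substitution of a different segment at the same offset fails the check and routes the query through the Level 1 directory instead.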

1.2 Centroid Epoch Monotonic Counter

Add to Level 0 root manifest:

Offset  Size  Field                    Description
------  ----  -----                    -----------
0x0F0   4     centroid_epoch           Monotonic counter, incremented on recomputation
0x0F4   4     max_epoch_drift          Maximum allowed drift before forced recompute

Semantics:

  • centroid_epoch increments each time centroids are recomputed
  • The manifest's global epoch counter tracks all mutations
  • epoch_drift = manifest.epoch - centroid_epoch
  • If epoch_drift > max_epoch_drift: runtime MUST either recompute centroids or widen n_probe

Default max_epoch_drift: 64 epochs.

1.3 Automatic Quality Elasticity

When epoch drift is detected, the runtime applies controlled quality degradation instead of silent recall loss:

fn effective_n_probe(base_n_probe: u32, epoch_drift: u32, max_drift: u32) -> u32 {
    if epoch_drift <= max_drift / 2 {
        // Within comfort zone: no adjustment
        base_n_probe
    } else if epoch_drift <= max_drift {
        // Drift zone: linear widening up to 2x
        let scale = 1.0 + (epoch_drift - max_drift / 2) as f64 / max_drift as f64;
        (base_n_probe as f64 * scale).ceil() as u32
    } else {
        // Beyond max drift: double n_probe, schedule recomputation
        base_n_probe * 2
    }
}

This turns degradation into controlled quality elasticity: recall trades against latency in a predictable, bounded way.
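A worked example makes the schedule concrete (the function from 1.3 is repeated verbatim so the snippet compiles on its own): with base_n_probe = 32 and max_epoch_drift = 64, a drift of 16 stays in the comfort zone at 32 probes, a drift of 48 widens to 40, a drift of 64 widens to 48, and anything beyond doubles to 64.

```rust
// Repeated verbatim from section 1.3 so the worked numbers can be checked.
fn effective_n_probe(base_n_probe: u32, epoch_drift: u32, max_drift: u32) -> u32 {
    if epoch_drift <= max_drift / 2 {
        // Within comfort zone: no adjustment
        base_n_probe
    } else if epoch_drift <= max_drift {
        // Drift zone: linear widening up to 2x
        let scale = 1.0 + (epoch_drift - max_drift / 2) as f64 / max_drift as f64;
        (base_n_probe as f64 * scale).ceil() as u32
    } else {
        // Beyond max drift: double n_probe, schedule recomputation
        base_n_probe * 2
    }
}
```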

1.4 Wire Format Changes

Add content hash fields to Level 0 at previously reserved offsets (the 0x0A0-0x0FF region):

Offset  Size  Field
------  ----  -----
0x0A0   16    entrypoint_content_hash
0x0B0   16    toplayer_content_hash
0x0C0   16    centroid_content_hash
0x0D0   16    quantdict_content_hash
0x0E0   16    hot_cache_content_hash
0x0F0   4     centroid_epoch
0x0F4   4     max_epoch_drift
0x0F8   8     reserved_hardening

Total: 96 bytes, exactly filling the 0x0A0-0x0FF region.

Note: The signature field at 0x094 must move to accommodate this. New signature offset: 0x100. This is a breaking change to the Level 0 layout. Files written before ADR-033 are detected by version < 2 in the root manifest and use the old layout.
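The resulting version gate can be sketched as a single branch (constant and parameter names here are illustrative, not normative):

```rust
/// Signature offset moved from 0x094 (pre-ADR-033 layout) to 0x100
/// (hardened layout). Names are illustrative; the real reader works
/// from the parsed root manifest.
const SIG_OFFSET_V1: u64 = 0x094;
const SIG_OFFSET_V2: u64 = 0x100;

fn signature_offset(root_manifest_version: u32) -> u64 {
    if root_manifest_version < 2 {
        SIG_OFFSET_V1 // old layout, no content hash fields
    } else {
        SIG_OFFSET_V2 // hardened layout, content hashes at 0x0A0-0x0FF
    }
}
```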


2. Layer A Adversarial Resilience

Invariant: Silent catastrophic degradation must not be possible.

2.1 Distance Entropy Detection

After computing distances to the top-K centroids, measure the discriminative power:

/// Detect adversarial or degenerate centroid distance distributions.
/// Returns true if the distribution is too uniform to trust centroid routing.
fn is_degenerate_distribution(distances: &[f32], k: usize) -> bool {
    if distances.len() < 2 * k {
        return true; // Not enough centroids
    }

    // Sort and take top-2k
    let mut sorted = distances.to_vec();
    sorted.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    let top = &sorted[..2 * k];

    // Compute coefficient of variation (CV = stddev / mean)
    let mean = top.iter().sum::<f32>() / top.len() as f32;
    if mean < f32::EPSILON {
        return true; // All distances near zero
    }

    let variance = top.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / top.len() as f32;
    let cv = variance.sqrt() / mean;

    // CV < 0.05 means top distances are within 5% of each other
    // This indicates centroids provide no discriminative power
    cv < DEGENERATE_CV_THRESHOLD
}

const DEGENERATE_CV_THRESHOLD: f32 = 0.05;
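Two worked inputs make the threshold concrete (the detector above is repeated verbatim so the example runs standalone): near-equal distances trip the check, clearly separated distances do not, and an undersized centroid set is treated as degenerate by construction.

```rust
// Repeated verbatim from section 2.1 for a self-contained worked example.
fn is_degenerate_distribution(distances: &[f32], k: usize) -> bool {
    if distances.len() < 2 * k {
        return true; // Not enough centroids
    }

    // Sort and take top-2k
    let mut sorted = distances.to_vec();
    sorted.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    let top = &sorted[..2 * k];

    // Compute coefficient of variation (CV = stddev / mean)
    let mean = top.iter().sum::<f32>() / top.len() as f32;
    if mean < f32::EPSILON {
        return true; // All distances near zero
    }

    let variance = top.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / top.len() as f32;
    let cv = variance.sqrt() / mean;

    cv < DEGENERATE_CV_THRESHOLD
}

const DEGENERATE_CV_THRESHOLD: f32 = 0.05;
```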

2.2 Adaptive n_probe Widening

When degeneracy is detected, widen the search:

fn adaptive_n_probe(
    base_n_probe: u32,
    centroid_distances: &[f32],
    total_centroids: u32,
) -> u32 {
    if is_degenerate_distribution(centroid_distances, base_n_probe as usize) {
        // Degenerate: widen to sqrt(K) or 4x base, whichever is smaller
        let widened = (total_centroids as f64).sqrt().ceil() as u32;
        base_n_probe.max(widened).min(base_n_probe * 4)
    } else {
        base_n_probe
    }
}

2.3 Multi-Centroid Fallback

When distance variance is below threshold AND Layer B is not yet loaded, fall back to a lightweight multi-probe strategy:

  1. Compute distances to ALL centroids (not just top-K)
  2. If all distances are within mean +/- 2*stddev: treat as uniform
  3. For uniform distributions: scan the hot cache linearly (if available)
  4. If no hot cache: return results with a quality_flag = APPROXIMATE in the response

This prevents silent wrong answers. The caller knows the result quality.
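Step 2 of the fallback can be sketched as a simple moment check. `all_within_two_sigma` is a hypothetical helper name, not part of the normative API:

```rust
/// Step 2 of the multi-centroid fallback: treat the distribution as
/// uniform when every distance lies within mean +/- 2*stddev.
fn all_within_two_sigma(distances: &[f32]) -> bool {
    let n = distances.len() as f32;
    let mean = distances.iter().sum::<f32>() / n;
    let variance = distances.iter().map(|d| (d - mean).powi(2)).sum::<f32>() / n;
    let stddev = variance.sqrt();
    distances.iter().all(|d| (d - mean).abs() <= 2.0 * stddev)
}
```

A distribution with a clear outlier centroid fails the check and routing remains trustworthy; a tight, featureless distribution passes it and triggers the linear hot-cache scan (or the APPROXIMATE flag).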

2.4 Quality Flag at the API Boundary

Result quality is reported at two levels: per-retrieval (RetrievalQuality) and per-response (ResponseQuality).

Per-retrieval (internal, attached to each candidate):

/// Quality confidence for the retrieval candidate set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum RetrievalQuality {
    /// Full index traversed, high confidence in candidate set.
    Full = 0x00,
    /// Partial index (Layer A+B), good confidence.
    Partial = 0x01,
    /// Layer A only, moderate confidence.
    LayerAOnly = 0x02,
    /// Degenerate distribution detected, low confidence.
    DegenerateDetected = 0x03,
    /// Brute-force fallback used within budget, exact over scanned region.
    BruteForceBudgeted = 0x04,
}

Per-response (external, returned to the caller at the API boundary):

/// Response-level quality signal. This is the field that consumers
/// (RAG pipelines, agent tool chains, MCP clients) MUST inspect.
///
/// If `response_quality < threshold`, the consumer should either:
/// - Wait and retry (progressive loading will improve quality)
/// - Widen the search (increase k or ef_search)
/// - Fall back to an alternative data source
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum ResponseQuality {
    /// All results from full index. Trust fully.
    Verified = 0x00,
    /// Results from partial index. Usable but may miss neighbors.
    Usable = 0x01,
    /// Degraded retrieval detected. Results are best-effort.
    /// The `degradation_reason` field explains why.
    Degraded = 0x02,
    /// Insufficient candidates found. Results are unreliable.
    /// Caller SHOULD NOT use these for downstream decisions.
    Unreliable = 0x03,
}

Derivation rule: ResponseQuality is derived from the worst RetrievalQuality in the result set:

fn derive_response_quality(results: &[SearchResult]) -> ResponseQuality {
    let worst = results.iter()
        .map(|r| r.retrieval_quality)
        .max_by_key(|q| *q as u8)
        .unwrap_or(RetrievalQuality::Full);

    match worst {
        RetrievalQuality::Full => ResponseQuality::Verified,
        RetrievalQuality::Partial => ResponseQuality::Usable,
        RetrievalQuality::LayerAOnly => ResponseQuality::Usable,
        RetrievalQuality::DegenerateDetected => ResponseQuality::Degraded,
        RetrievalQuality::BruteForceBudgeted => ResponseQuality::Degraded,
    }
}
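The rule can be checked against concrete mixes of candidates (enums repeated in trimmed form here so the snippet stands alone; full definitions are above):

```rust
// Trimmed copies of the quality enums for a self-contained worked check.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
enum RetrievalQuality {
    Full = 0x00,
    Partial = 0x01,
    LayerAOnly = 0x02,
    DegenerateDetected = 0x03,
    BruteForceBudgeted = 0x04,
}

#[derive(Debug, PartialEq)]
enum ResponseQuality { Verified, Usable, Degraded }

/// Same derivation as above, over a plain slice of per-candidate qualities:
/// the numerically largest (worst-confidence) variant wins.
fn derive(qualities: &[RetrievalQuality]) -> ResponseQuality {
    match qualities.iter().copied().max().unwrap_or(RetrievalQuality::Full) {
        RetrievalQuality::Full => ResponseQuality::Verified,
        RetrievalQuality::Partial | RetrievalQuality::LayerAOnly => ResponseQuality::Usable,
        RetrievalQuality::DegenerateDetected
        | RetrievalQuality::BruteForceBudgeted => ResponseQuality::Degraded,
    }
}
```

A single degraded candidate is enough to downgrade the whole response; the envelope never reports better quality than its weakest member.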

Mandatory outer wrapper: QualityEnvelope is the top-level return type for all query APIs. It is not a nested field; it is the outer wrapper. JSON flattening cannot discard it, gRPC serialization cannot drop it, and MCP tool responses must include it.

/// The mandatory outer return type for all query APIs.
/// This is not optional. This is not a nested field.
/// Consumers that ignore this are misusing the API.
pub struct QualityEnvelope {
    /// The search results.
    pub results: Vec<SearchResult>,
    /// Top-level quality signal. Consumers MUST inspect this.
    pub quality: ResponseQuality,
    /// Structured evidence for why the quality is what it is.
    pub evidence: SearchEvidenceSummary,
    /// Resource consumption report for this query.
    pub budgets: BudgetReport,
    /// If quality is degraded, the structured reason.
    pub degradation: Option<DegradationReport>,
}

/// Evidence chain: what index state was actually used.
pub struct SearchEvidenceSummary {
    /// Which index layers were available and used.
    pub layers_used: IndexLayersUsed,
    /// Effective n_probe (after any adaptive widening).
    pub n_probe_effective: u32,
    /// Whether degenerate distribution was detected.
    pub degenerate_detected: bool,
    /// Coefficient of variation of top-K centroid distances.
    pub centroid_distance_cv: f32,
    /// Number of candidates found by HNSW before safety net.
    pub hnsw_candidate_count: u32,
    /// Number of candidates added by safety net scan.
    pub safety_net_candidate_count: u32,
    /// Content hashes of index segments actually touched.
    pub index_segments_touched: Vec<[u8; 16]>,
}

#[derive(Clone, Copy, Debug)]
pub struct IndexLayersUsed {
    pub layer_a: bool,
    pub layer_b: bool,
    pub layer_c: bool,
    pub hot_cache: bool,
}

/// Resource consumption report.
pub struct BudgetReport {
    /// Wall-clock time per stage.
    pub centroid_routing_us: u64,
    pub hnsw_traversal_us: u64,
    pub safety_net_scan_us: u64,
    pub reranking_us: u64,
    pub total_us: u64,
    /// Distance evaluations performed.
    pub distance_ops: u64,
    pub distance_ops_budget: u64,
    /// Bytes read from storage.
    pub bytes_read: u64,
    /// Candidates scanned in linear scan (safety net).
    pub linear_scan_count: u64,
    pub linear_scan_budget: u64,
}

/// Why quality is degraded.
pub struct DegradationReport {
    /// Which fallback path was chosen.
    pub fallback_path: FallbackPath,
    /// Why it was chosen (structured, not prose).
    pub reason: DegradationReason,
    /// What guarantee is lost relative to Full quality.
    pub guarantee_lost: &'static str,
}

#[derive(Clone, Copy, Debug)]
pub enum FallbackPath {
    /// Normal HNSW traversal, no fallback needed.
    None,
    /// Adaptive n_probe widening due to epoch drift.
    NProbeWidened,
    /// Adaptive n_probe widening due to degenerate distribution.
    DegenerateWidened,
    /// Selective safety net scan on hot cache.
    SafetyNetSelective,
    /// Safety net budget exhausted before completion.
    SafetyNetBudgetExhausted,
}

#[derive(Clone, Copy, Debug)]
pub enum DegradationReason {
    /// Centroid epoch drift exceeded threshold.
    CentroidDrift { epoch_drift: u32, max_drift: u32 },
    /// Degenerate distance distribution detected.
    DegenerateDistribution { cv: f32, threshold: f32 },
    /// Brute-force budget exhausted.
    BudgetExhausted { scanned: u64, total: u64, budget_type: &'static str },
    /// Index layer not yet loaded.
    IndexNotLoaded { available: &'static str, needed: &'static str },
}

Hard enforcement rule: If quality is Degraded or Unreliable, the runtime MUST either:

  1. Return the QualityEnvelope with the structured warning (which cannot be dropped because it is the outer type, not a nested field), OR
  2. Require an explicit caller override flag to proceed:

pub enum QualityPreference {
    /// Runtime decides. Default. Fastest path that meets internal thresholds.
    Auto,
    /// Caller prefers quality over latency. Runtime may widen n_probe,
    /// extend budgets up to 4x, and block until Layer B loads.
    PreferQuality,
    /// Caller prefers latency over quality. Runtime may skip safety net,
    /// reduce n_probe. ResponseQuality honestly reports what it gets.
    PreferLatency,
    /// Caller explicitly accepts degraded results. Required to proceed
    /// when ResponseQuality would be Degraded or Unreliable under Auto.
    /// Without this flag, Degraded queries return an error, not results.
    AcceptDegraded,
}

Without AcceptDegraded, a Degraded result is returned as Err(RvfError::QualityBelowThreshold(envelope)) — the caller gets the evidence but must explicitly opt in to use the results. This prevents silent misuse.

2.5 Distribution Assumption Declaration

The spec MUST explicitly state:

Distribution Assumption: Recall targets (0.70/0.85/0.95) assume sub-Gaussian embedding distributions typical of neural network outputs (sentence-transformers, OpenAI ada-002, Cohere embed-v3, etc.). For adversarial, synthetic, or uniform-random distributions, recall may be lower. When degenerate distributions are detected at query time, the runtime automatically widens its search and signals reduced confidence via ResponseQuality.

This converts an implicit assumption into an explicit contract.


3. Recall Bound Framing

Invariant: Never claim theoretical guarantees without distribution assumptions.

3.1 Monotonic Recall Improvement Property

Replace hard recall bounds with a provable structural property:

Monotonic Recall Property: For any query Q and any two index states S1 and S2 where S2 includes all segments of S1 plus additional INDEX_SEGs:

recall(Q, S2) >= recall(Q, S1)

Proof: S2's candidate set is a superset of S1's (append-only segments, no removal). More candidates cannot reduce recall.

This is provable from the append-only invariant and requires no distribution assumption.
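The property can be exercised mechanically with a toy recall function (a hypothetical helper for testing, not runtime code): adding candidates to a set can only keep or raise its recall against a fixed ground truth.

```rust
/// recall = |result ∩ ground_truth| / |ground_truth|.
fn recall(result: &[u32], ground_truth: &[u32]) -> f64 {
    let hits = result.iter().filter(|&id| ground_truth.contains(id)).count();
    hits as f64 / ground_truth.len() as f64
}
```

In a property test, S1's candidate IDs plus any extra IDs (modeling newly loaded INDEX_SEGs) form S2, and the assertion recall(S2) >= recall(S1) must hold for every generated case.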

3.2 Recall Target Classes

Replace the single recall table with benchmark-class-specific targets:

State      Natural Embeddings  Synthetic Uniform  Adversarial Clustered
-----      ------------------  -----------------  ---------------------
Layer A    >= 0.70             >= 0.40            >= 0.20 (with detection)
A + B      >= 0.85             >= 0.70            >= 0.60
A + B + C  >= 0.95             >= 0.90            >= 0.85

"Natural Embeddings" = sentence-transformers, OpenAI, Cohere on standard corpora.

3.3 Brute-Force Safety Net (Dual-Budgeted)

When the candidate set from HNSW search is smaller than 2 * k, the safety net activates. It is capped by a time budget, a candidate budget, and a distance-op budget to prevent unbounded work. An adversarial query cannot force O(N) compute.

Three required caps (all enforced, none optional):

/// Budget caps for the brute-force safety net.
/// All three are enforced simultaneously. The scan stops at whichever hits first.
/// These are RUNTIME limits, not caller-adjustable above the defaults.
/// Callers may reduce them but not exceed them (unless PreferQuality mode,
/// which extends to 4x).
pub struct SafetyNetBudget {
    /// Maximum wall-clock time for the safety net scan.
    /// Default: 2,000 us (2 ms) in Layer A mode, 5,000 us (5 ms) in partial mode.
    pub max_scan_time_us: u64,
    /// Maximum number of candidate vectors to scan.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_scan_candidates: u64,
    /// Maximum number of distance evaluations (the actual compute cost).
    /// This is the hardest cap — it bounds CPU work directly.
    /// Default: 10,000 in Layer A mode, 50,000 in partial mode.
    pub max_distance_ops: u64,
}

impl SafetyNetBudget {
    /// Layer A only defaults: tight budget for instant first query.
    pub const LAYER_A: Self = Self {
        max_scan_time_us: 2_000,     // 2 ms
        max_scan_candidates: 10_000,
        max_distance_ops: 10_000,
    };
    /// Partial index defaults: moderate budget.
    pub const PARTIAL: Self = Self {
        max_scan_time_us: 5_000,     // 5 ms
        max_scan_candidates: 50_000,
        max_distance_ops: 50_000,
    };
    /// PreferQuality mode: 4x extension of the applicable default.
    pub fn extended_4x(&self) -> Self {
        Self {
            max_scan_time_us: self.max_scan_time_us * 4,
            max_scan_candidates: self.max_scan_candidates * 4,
            max_distance_ops: self.max_distance_ops * 4,
        }
    }
    /// Disabled: all zeros. Safety net will not scan anything.
    pub const DISABLED: Self = Self {
        max_scan_time_us: 0,
        max_scan_candidates: 0,
        max_distance_ops: 0,
    };
}

All three caps travel with the query in QueryOptions:

pub struct QueryOptions {
    pub k: usize,
    pub ef_search: u32,
    pub quality_preference: QualityPreference,
    /// Safety net budget. Callers may tighten but not loosen beyond
    /// the mode default (unless QualityPreference::PreferQuality).
    pub safety_net_budget: SafetyNetBudget,
}
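The tighten-but-not-loosen rule can be sketched as a per-field clamp. `clamp_to_mode_default` is a hypothetical helper name, and the struct is repeated so the snippet stands alone; PreferQuality's 4x extension would be applied to the mode default before clamping.

```rust
// Trimmed copy of SafetyNetBudget for a self-contained sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SafetyNetBudget {
    max_scan_time_us: u64,
    max_scan_candidates: u64,
    max_distance_ops: u64,
}

/// Caller budgets may tighten (reduce) any cap but never loosen it
/// beyond the mode default: take the minimum per field.
fn clamp_to_mode_default(requested: SafetyNetBudget, mode_default: SafetyNetBudget) -> SafetyNetBudget {
    SafetyNetBudget {
        max_scan_time_us: requested.max_scan_time_us.min(mode_default.max_scan_time_us),
        max_scan_candidates: requested.max_scan_candidates.min(mode_default.max_scan_candidates),
        max_distance_ops: requested.max_distance_ops.min(mode_default.max_distance_ops),
    }
}
```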

Policy response: When any budget is exceeded, the scan stops immediately and returns:

  • FallbackPath::SafetyNetBudgetExhausted
  • DegradationReason::BudgetExhausted with which budget triggered and how far the scan got
  • A partial candidate set (whatever was found before the budget hit)
  • ResponseQuality::Degraded

Selective scan strategy — the safety net does NOT scan the entire hot cache. It scans a targeted subset to stay sparse even under fallback:

use std::time::{Duration, Instant};

fn selective_safety_net_scan(
    query: &[f32],
    k: usize,
    hnsw_candidates: &[Candidate],
    centroid_distances: &[(u32, f32)], // (centroid_id, distance), sorted ascending
    store: &RvfStore,
    budget: &SafetyNetBudget,
) -> (Vec<Candidate>, BudgetReport) {
    let deadline = Instant::now() + Duration::from_micros(budget.max_scan_time_us);
    let mut scanned: u64 = 0;
    let mut dist_ops: u64 = 0;
    let mut candidates = Vec::new();
    let mut budget_report = BudgetReport::default();

    // Phase 1: Multi-centroid union
    // Scan hot cache entries whose centroid_id is in the top-T centroids,
    // where T = ceil(sqrt(number of scored centroids)).
    let top_t = centroid_distances.len().min(
        (centroid_distances.len() as f64).sqrt().ceil() as usize
    );
    let top_centroid_ids: Vec<u32> = centroid_distances[..top_t]
        .iter().map(|(id, _)| *id).collect();

    for block in store.hot_cache_blocks_by_centroid(&top_centroid_ids) {
        if scanned >= budget.max_scan_candidates { break; }
        if dist_ops >= budget.max_distance_ops { break; }
        if Instant::now() >= deadline { break; }

        let block_results = scan_block(query, block);
        scanned += block.len() as u64;
        dist_ops += block.len() as u64;
        candidates.extend(block_results);
    }

    // Phase 2: HNSW neighbor expansion
    // For each existing HNSW candidate, scan their neighbors' vectors
    // in the hot cache (1-hop expansion).
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        for candidate in hnsw_candidates.iter().take(k) {
            if scanned >= budget.max_scan_candidates { break; }
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }

            if let Some(neighbors) = store.hot_cache_neighbors(candidate.id) {
                for neighbor in neighbors {
                    if dist_ops >= budget.max_distance_ops { break; }
                    let d = distance(query, &neighbor.vector);
                    dist_ops += 1;
                    scanned += 1;
                    candidates.push(Candidate { id: neighbor.id, distance: d });
                }
            }
        }
    }

    // Phase 3: Recency window (if budget remains)
    // Scan the most recently ingested vectors in the hot cache,
    // which are most likely to be missing from the HNSW index.
    if scanned < budget.max_scan_candidates && dist_ops < budget.max_distance_ops {
        let remaining_budget = budget.max_scan_candidates - scanned;
        for vec in store.hot_cache_recent(remaining_budget as usize) {
            if dist_ops >= budget.max_distance_ops { break; }
            if Instant::now() >= deadline { break; }
            let d = distance(query, &vec.vector);
            dist_ops += 1;
            scanned += 1;
            candidates.push(Candidate { id: vec.id, distance: d });
        }
    }

    budget_report.linear_scan_count = scanned;
    budget_report.linear_scan_budget = budget.max_scan_candidates;
    budget_report.distance_ops = dist_ops;
    budget_report.distance_ops_budget = budget.max_distance_ops;

    (candidates, budget_report)
}

Why selective, not exhaustive:

The safety net scans three targeted sets in priority order:

  1. Multi-centroid union: vectors near the best-matching centroids (spatial locality)
  2. HNSW neighbor expansion: 1-hop neighbors of existing candidates (graph locality)
  3. Recency window: recently ingested vectors not yet in any index (temporal locality)

Each phase respects all three budget caps. Even under the safety net, the scan stays sparse and deterministic.

Why three budget caps:

  • Time alone is insufficient: fast CPUs burn millions of ops in 5 ms.
  • Candidates alone is insufficient: slow storage makes 50K scans take 50 ms.
  • Distance ops alone is insufficient: a scan that reads but doesn't compute still consumes I/O bandwidth.
  • All three together bound the work in every dimension. The scan stops at whichever limit hits first.

Invariant: The brute-force safety net is bounded in time, candidates, and compute. A fuzzed query generator cannot push p95 latency above the budgeted ceiling. If all three budgets are set to 0, the safety net is disabled entirely and the system returns ResponseQuality::Degraded immediately when HNSW produces insufficient candidates.

3.3.1 DoS Hardening

Three additional protections for public-facing deployments:

Budget tokens: Each query consumes a fixed budget of distance ops and bytes. The runtime tracks a per-connection token bucket. No tokens remaining = query rejected with 429 Too Many Requests equivalent. Prevents sustained DoS via repeated adversarial queries.

Negative caching: If a query signature (hash of the query vector's quantized form) triggers degenerate mode more than N times in a window, the runtime caches it and forces SafetyNetBudget::DISABLED for subsequent matches. The adversary cannot keep burning budget on the same attack vector.

Proof-of-work option: For open-internet endpoints only. The caller must include a nonce proving O(work) computation before the query is accepted. This is opt-in, not default — only relevant for unauthenticated public endpoints.
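The budget-token mechanism can be sketched as a standard per-connection token bucket (capacity and refill numbers below are illustrative, not normative defaults):

```rust
/// Minimal per-connection token bucket. Each query costs its distance-op
/// budget in tokens; refill is linear over time. Parameters are illustrative.
struct TokenBucket {
    capacity: u64,
    tokens: u64,
    refill_per_sec: u64,
}

impl TokenBucket {
    fn new(capacity: u64, refill_per_sec: u64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec }
    }

    /// Returns true if the query may proceed. A false return maps to the
    /// 429-equivalent rejection described above.
    fn try_consume(&mut self, cost: u64) -> bool {
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }

    /// Credit tokens for elapsed time, capped at capacity.
    fn refill(&mut self, elapsed_secs: u64) {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
    }
}
```

A sustained stream of adversarial queries drains the bucket and is rejected before any safety-net work is done, while a well-behaved client's occasional expensive query is absorbed by the refill.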

3.4 Acceptance Test Update

Update benchmarks/acceptance-tests.md to:

  1. Test against three distribution classes (natural, synthetic, adversarial)
  2. Verify ResponseQuality flag accuracy at the API boundary
  3. Verify monotonic recall improvement across progressive load phases
  4. Measure brute-force fallback frequency and latency impact
  5. Verify brute-force scan terminates within both time and candidate budgets

3.5 Acceptance Test: Malicious Tail Manifest (MANDATORY)

Test: A maliciously rewritten tail manifest that preserves CRC32C but changes hotset pointers must fail to mount under Strict policy, and must produce a logged, deterministic failure reason.

Test: Malicious Hotset Pointer Redirection
==========================================

Setup:
  1. Create signed RVF file with 100K vectors, full HNSW index
  2. Record the original centroid_seg_offset and centroid_content_hash
  3. Identify a different valid INDEX_SEG in the file (e.g., Layer B)
  4. Craft a new Level 0 manifest:
     - Replace centroid_seg_offset with the Layer B segment offset
     - Keep ALL other fields identical
     - Recompute CRC32C at 0xFFC to match the modified manifest
     - Do NOT re-sign (signature becomes invalid)
  5. Overwrite last 4096 bytes of file with crafted manifest

Verification under Strict policy:
  1. Attempt: RvfStore::open_with_policy(&path, opts, SecurityPolicy::Strict)
  2. MUST return Err(SecurityError::InvalidSignature)
  3. The error MUST include:
     - error_code: a stable, documented error code (not just a string)
     - manifest_offset: byte offset of the rejected manifest
     - expected_signer: public key fingerprint (if known)
     - rejection_phase: "signature_verification" (not "content_hash")
  4. The error MUST be logged at WARN level or higher
  5. The file MUST NOT be queryable (no partial mount, no fallback)

Verification under Paranoid policy:
  Same as Strict, identical behavior.

Verification under WarnOnly policy:
  1. File opens successfully (warning logged)
  2. Content hash verification runs on first hotset access
  3. centroid_content_hash mismatches the actual segment payload
  4. MUST return Err(SecurityError::ContentHashMismatch) on first query
  5. The error MUST include:
     - pointer_name: "centroid_seg_offset"
     - expected_hash: the hash stored in Level 0
     - actual_hash: the hash of the segment at the pointed offset
     - seg_offset: the byte offset that was followed
  6. System transitions to read-only mode, refuses further queries

Verification under Permissive policy:
  1. File opens successfully (no warning)
  2. Queries execute against the wrong segment
  3. Results are structurally valid but semantically wrong
  4. ResponseQuality is NOT required to detect this (Permissive = no safety)
  5. This is the EXPECTED AND DOCUMENTED behavior of Permissive mode

Pass criteria:
  - Strict/Paranoid: deterministic rejection, logged error, no mount
  - WarnOnly: mount succeeds, content hash catches mismatch on first access
  - Permissive: mount succeeds, no detection (by design)
  - Error messages are stable across versions (code, not prose)
  - No panic, no undefined behavior, no partial state leakage

Test: Malicious Manifest with Re-signed Forgery

Setup:
  1. Same as above, but attacker also re-signs with a DIFFERENT key
  2. File now has valid CRC32C AND valid signature — but wrong signer

Verification under Strict policy:
  1. MUST return Err(SecurityError::UnknownSigner)
  2. Error includes the actual signer fingerprint
  3. Error includes the expected signer fingerprint (from trust store)
  4. File does not mount

Pass criteria:
  - The system distinguishes "no signature" from "wrong signer"
  - Both produce distinct, documented error codes

3.6 Acceptance Tests: QualityEnvelope Enforcement (MANDATORY)

Test 1: Consumer Cannot Ignore QualityEnvelope

Test: Schema Enforcement of QualityEnvelope
============================================

Setup:
  1. Create RVF file with 10K vectors, full index
  2. Issue a query that returns Degraded results (use degenerate query vector)

Verification:
  1. The query API returns QualityEnvelope, not Vec<SearchResult>
  2. Attempt to deserialize the response as Vec<SearchResult> (without envelope)
  3. MUST fail at schema validation — the envelope is the outer type
  4. JSON response: top-level keys MUST include "quality", "evidence", "budgets"
  5. gRPC response: QualityEnvelope is the response message type
  6. MCP tool response: "quality" field is at top level, not nested

Pass criteria:
  - No API path exists that returns raw results without the envelope
  - Schema validation rejects any consumer that skips the quality field
  - The envelope cannot be flattened away by middleware or serialization

Test 2: Adversarial Query Respects max_distance_ops Under Safety Net

Test: Budget Cap Enforcement Under Adversarial Query
=====================================================

Setup:
  1. Create RVF file with 1M vectors, Layer A only (no HNSW loaded)
  2. Set SafetyNetBudget to LAYER_A defaults (10,000 distance ops)
  3. Craft adversarial query that triggers degenerate detection
     (uniform-random vector or equidistant from all centroids)

Verification:
  1. Issue query with quality_preference = Auto
  2. Safety net activates (candidate set < 2*k from HNSW)
  3. BudgetReport.distance_ops MUST be <= SafetyNetBudget.max_distance_ops
  4. BudgetReport.distance_ops MUST be <= 10,000
  5. Total query wall-clock MUST be <= SafetyNetBudget.max_scan_time_us
  6. DegradationReport.reason MUST be BudgetExhausted if budget was hit
  7. ResponseQuality MUST be Degraded (not Verified or Usable)

Stress test:
  1. Repeat with 10,000 adversarial queries in sequence
  2. No single query may exceed max_distance_ops
  3. Aggregate p95 latency MUST stay below max_scan_time_us ceiling
  4. No OOM, no panic, no unbounded allocation

Pass criteria:
  - max_distance_ops is a hard cap, never exceeded by even 1 operation
  - Budget enforcement works under all three safety net phases
  - Each phase independently respects all three budget caps

Test 3: Degenerate Conditions Produce Partial Results, Not Hangs

Test: Graceful Degradation Under Degenerate Conditions
=======================================================

Setup:
  1. Create RVF file with 1M uniform-random vectors (worst case)
  2. Load with Layer A only (no HNSW, no Layer B/C)
  3. All centroids equidistant from query (maximum degeneracy)

Verification:
  1. Issue query with quality_preference = Auto
  2. Runtime MUST return within max_scan_time_us (not hang)
  3. Return type MUST be Err(RvfError::QualityBelowThreshold(envelope))
  4. The envelope MUST contain:
     a. A partial result set (whatever was found before budget hit)
     b. quality = ResponseQuality::Degraded or Unreliable
     c. degradation.reason = BudgetExhausted or DegenerateDistribution
     d. degradation.guarantee_lost describes what is missing
     e. budgets.distance_ops <= budgets.distance_ops_budget
  5. The caller can then choose:
     a. Retry with PreferQuality (extends budget 4x)
     b. Retry with AcceptDegraded (uses partial results as-is)
     c. Wait for Layer B to load and retry

  6. With AcceptDegraded:
     a. Same partial results are returned as Ok(envelope)
     b. ResponseQuality is still Degraded (honesty preserved)
     c. No additional scanning beyond what was already done

Pass criteria:
  - No hang, no scan-to-completion, no unbounded work
  - Partial results are always available (not empty unless truly zero candidates)
  - Clear, structured reason for degradation (not a string, a typed enum)
  - Caller can always recover by choosing a different QualityPreference

3.7 Benchmark: Fuzzed Query Latency Ceiling (MANDATORY)

Benchmark: Fuzzed Query Generator vs Budget Ceiling
=====================================================

Setup:
  1. Create RVF file with 10M vectors, 384 dimensions, fp16
  2. Generate a fuzzed query corpus:
     a. 1000 natural embedding queries (sentence-transformer outputs)
     b. 1000 uniform-random queries
     c. 1000 adversarial queries (equidistant from top-K centroids)
     d. 1000 degenerate queries (zero vector, max-norm vector, NaN-adjacent)
  3. Load file progressively: measure at Layer A, A+B, A+B+C

Test:
  1. Execute all 4000 queries at each progressive load stage
  2. Measure p50, p95, p99, max latency per query class per stage

Pass criteria:
  - p95 latency MUST NOT exceed SafetyNetBudget.max_scan_time_us at any stage
  - p99 latency MUST NOT exceed 2x SafetyNetBudget.max_scan_time_us at any stage
    (allowing for OS scheduling jitter, not algorithmic overshoot)
  - max_distance_ops is NEVER exceeded (hard invariant, no exceptions)
  - Recall improves monotonically across stages for all query classes:
    recall@10(Layer A) <= recall@10(A+B) <= recall@10(A+B+C)
  - No query class achieves recall@10 = 0.0 at any stage
    (even degenerate queries must return SOME results)

Report:
  JSON report per stage with:
    stage, query_class, p50_us, p95_us, p99_us, max_us,
    avg_recall_at_10, min_recall_at_10, avg_distance_ops,
    max_distance_ops, safety_net_trigger_rate, budget_exhaustion_rate
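A pass/fail check over such a report can be sketched as follows. `StageReport` and the two budget parameters are illustrative stand-ins, not the real report schema:

```rust
// Hypothetical checker for the benchmark pass criteria above. StageReport
// and the budget parameters are illustrative, not the real report schema.
pub struct StageReport {
    pub p95_us: u64,
    pub p99_us: u64,
    pub max_distance_ops: u64,
    pub avg_recall_at_10: f64,
}

pub fn benchmark_passes(
    stages: &[StageReport],   // ordered: Layer A, A+B, A+B+C
    max_scan_time_us: u64,    // SafetyNetBudget.max_scan_time_us
    distance_ops_budget: u64, // hard invariant, no exceptions
) -> bool {
    // p95 within budget, p99 within 2x (scheduling jitter allowance),
    // and distance ops never exceeded at any stage.
    let budgets_ok = stages.iter().all(|s| {
        s.p95_us <= max_scan_time_us
            && s.p99_us <= 2 * max_scan_time_us
            && s.max_distance_ops <= distance_ops_budget
    });
    // recall@10 must be monotone non-decreasing across load stages.
    let recall_monotone = stages
        .windows(2)
        .all(|w| w[0].avg_recall_at_10 <= w[1].avg_recall_at_10);
    budgets_ok && recall_monotone
}
```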

4. Mandatory Manifest Signatures

Invariant: No signature, no mount in secure mode.

4.1 Security Mount Policy

Add a SecurityPolicy enum to RvfOptions:

/// Manifest signature verification policy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum SecurityPolicy {
    /// No signature verification. For development and testing only.
    /// Files open regardless of signature state.
    Permissive = 0x00,
    /// Warn on missing or invalid signatures, but allow open.
    /// Log events for auditing.
    WarnOnly = 0x01,
    /// Require valid signature on Level 0 manifest.
    /// Reject files with missing or invalid signatures.
    /// DEFAULT for production.
    Strict = 0x02,
    /// Require valid signatures on Level 0, Level 1, and all
    /// hotset-referenced segments. Full chain verification.
    Paranoid = 0x03,
}

impl Default for SecurityPolicy {
    fn default() -> Self {
        Self::Strict
    }
}

Default is Strict, not Permissive.

4.2 Verification Chain

Under Strict policy, the open path becomes:

1. Read Level 0 (4096 bytes)
2. Validate CRC32C (corruption check)
3. Validate ML-DSA-65 signature (adversarial check)
4. If signature missing: REJECT with SecurityError::UnsignedManifest
5. If signature invalid: REJECT with SecurityError::InvalidSignature
6. Extract hotset pointers
7. For each hotset pointer: validate content hash (ADR-033 §1.1)
8. If any content hash fails: REJECT with SecurityError::ContentHashMismatch
9. System is now queryable with verified pointers

Under Paranoid policy, add:

10. Read Level 1 manifest
11. Validate Level 1 signature
12. For each segment in directory: verify content hash matches on first access
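The ordering and error mapping of the Strict chain can be captured in a decision-flow sketch. The boolean fields stand in for the real CRC32C, ML-DSA-65, and SHAKE-256 checks (which live in rvf-crypto), and the `CorruptManifest` variant is an assumption for the CRC failure case, which the chain above does not name:

```rust
// Decision-flow model of the Strict open path (steps 1-9). Booleans stand
// in for real CRC32C / ML-DSA-65 / SHAKE-256 checks; only the ordering
// and error mapping are taken from the ADR.
#[derive(Debug, PartialEq, Eq)]
pub enum SecurityError {
    CorruptManifest,     // step 2: CRC32C failed (variant name assumed)
    UnsignedManifest,    // step 4: signature missing
    InvalidSignature,    // step 5: signature does not verify
    ContentHashMismatch, // step 8: hotset pointer payload was swapped
}

pub struct Level0Checks {
    pub crc_ok: bool,
    pub sig_present: bool,
    pub sig_ok: bool,
    pub hotset_hashes_ok: bool,
}

pub fn open_strict(m: &Level0Checks) -> Result<(), SecurityError> {
    // 1-2: corruption check first; a damaged page is rejected before crypto.
    if !m.crc_ok { return Err(SecurityError::CorruptManifest); }
    // 3-5: adversarial check; a passing CRC never substitutes for a signature.
    if !m.sig_present { return Err(SecurityError::UnsignedManifest); }
    if !m.sig_ok { return Err(SecurityError::InvalidSignature); }
    // 6-8: every hotset pointer must match its 128-bit content hash.
    if !m.hotset_hashes_ok { return Err(SecurityError::ContentHashMismatch); }
    // 9: manifest verified; the store is queryable with verified pointers.
    Ok(())
}
```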

4.3 Unsigned File Handling

Files without signatures can still be opened under Permissive or WarnOnly policies. This supports:

  • Development and testing workflows
  • Legacy files created before signature support
  • Performance-critical paths where verification latency is unacceptable

But the default is Strict: an enterprise that deploys with defaults gets signature enforcement and must explicitly opt out.

4.4 Signature Generation on Write

Every write_manifest() call MUST:

  1. Compute SHAKE-256 content hashes (first 128 bits, as in §1.1) for all hotset-referenced segments
  2. Store hashes in Level 0 at the new offsets (§1.4)
  3. If a signing key is available: sign Level 0 with ML-DSA-65
  4. If no signing key: write sig_algo = 0 (unsigned)
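The four write-path steps can be sketched as follows. `toy_hash_128` is a placeholder for the first 128 bits of SHAKE-256, and the "signature" bytes are a placeholder for ML-DSA-65 output; both belong to rvf-crypto in the real implementation:

```rust
// Sketch of the write_manifest() steps above. toy_hash_128 is NOT
// SHAKE-256, and the signature bytes are placeholders; real hashing and
// ML-DSA-65 signing belong to rvf-crypto.
pub struct Level0Write {
    pub content_hashes: Vec<[u8; 16]>,
    pub sig_algo: u8, // 0 = unsigned, nonzero = ML-DSA-65
    pub signature: Vec<u8>,
}

// Placeholder 128-bit digest (illustration only).
fn toy_hash_128(payload: &[u8]) -> [u8; 16] {
    let mut h = [0u8; 16];
    for (i, b) in payload.iter().enumerate() {
        h[i % 16] = h[i % 16].wrapping_add(*b).rotate_left(3);
    }
    h
}

pub fn write_manifest(segments: &[Vec<u8>], signing_key: Option<&[u8]>) -> Level0Write {
    // Steps 1-2: hash every hotset-referenced segment into Level 0.
    let content_hashes: Vec<[u8; 16]> =
        segments.iter().map(|s| toy_hash_128(s)).collect();
    match signing_key {
        // Step 3: a key is available, so Level 0 is signed (placeholder bytes).
        Some(key) => Level0Write { content_hashes, sig_algo: 1, signature: key.to_vec() },
        // Step 4: no key -> sig_algo = 0 marks the manifest as unsigned.
        None => Level0Write { content_hashes, sig_algo: 0, signature: Vec::new() },
    }
}
```

The key design point is that content hashes are written unconditionally; only the signature is optional, so even an unsigned file still supports the §1.1 segment-swap check.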

The create() and open() methods accept an optional signing key:

impl RvfStore {
    pub fn create_signed(
        path: &Path,
        options: RvfOptions,
        signing_key: &MlDsa65SigningKey,
    ) -> Result<Self, RvfError>;
}

4.5 Runtime Policy Flag

The security policy is set at store open time and cannot be downgraded:

let store = RvfStore::open_with_policy(
    &path,
    RvfOptions::default(),
    SecurityPolicy::Strict,
)?;

A store opened with Strict policy will reject any hotset pointer that fails content hash verification, even if the CRC32C passes. This prevents the segment-swap attack identified in the analysis.


Consequences

Positive

  • Centroid stability becomes a logical invariant, not a physical accident
  • Adversarial distribution degradation becomes detectable and bounded
  • Recall claims become honest — empirical targets with explicit assumptions
  • Manifest integrity becomes mandatory by default — enterprises are secure without configuration
  • Quality elasticity replaces silent degradation — the system tells you when it's uncertain

Negative

  • Level 0 layout change is breaking (version 1 -> version 2)
  • Content hash computation adds ~50 microseconds per manifest write
  • Strict signature policy adds ~200 microseconds per file open (ML-DSA-65 verify)
  • Adaptive n_probe increases query latency by up to 4x under degenerate distributions

Migration

  • Level 0 version field (0x004) distinguishes v1 (pre-ADR-033) from v2
  • v1 files are readable under Permissive policy (no content hashes, no signature)
  • v1 files trigger a warning under WarnOnly policy
  • v1 files are rejected under Strict policy unless explicitly migrated
  • Migration tool: rvf migrate --sign --key <path> rewrites manifest with v2 layout

Size Impact

Component                                Additional Bytes  Where
---------                                ----------------  -----
Content hashes (5 pointers * 16 bytes)   80 B              Level 0 manifest
Centroid epoch + drift fields            8 B               Level 0 manifest
ResponseQuality + DegradationReason      ~64 B             Per query response
SecurityPolicy in options                1 B               Runtime config
Total Level 0 overhead                   96 B              Within existing 4096 B page

No additional segments. The 96 bytes of Level 0 overhead fit within the existing 4096 B page, so file size does not grow.


Implementation Order

Phase  Component                                          Estimated Effort
-----  ---------                                          ----------------
1      Content hash fields in rvf-types Level 0 layout    Small
2      centroid_epoch + max_epoch_drift in manifest       Small
3      ResponseQuality enum in rvf-runtime                Small
4      is_degenerate_distribution() + adaptive n_probe    Medium
5      Content hash verification in read path             Medium
6      SecurityPolicy enum + enforcement in open path     Medium
7      ML-DSA-65 signing in write path                    Large (depends on rvf-crypto)
8      Brute-force safety net in query path               Medium
9      Acceptance test updates (3 distribution classes)   Medium
10     Migration tool (rvf migrate --sign)                Medium

References

  • RVF Spec 02: Manifest System (hotset pointers, Level 0 layout)
  • RVF Spec 04: Progressive Indexing (Layer A/B/C recall targets)
  • RVF Spec 03: Temperature Tiering (centroid refresh, sketch epochs)
  • ADR-029: RVF Canonical Format (universal adoption across libraries)
  • ADR-030: Cognitive Container (three-tier execution model)
  • FIPS 204: ML-DSA (Module-Lattice Digital Signature Algorithm)
  • Malkov & Yashunin (2018): HNSW search complexity analysis