# DDC-001: Anytime-Valid Coherence Gate - Design Decision Criteria **Version**: 1.0 **Date**: 2026-01-17 **Related ADR**: ADR-001-anytime-valid-coherence-gate ## Purpose This document specifies the design decision criteria for implementing the Anytime-Valid Coherence Gate (AVCG). It provides concrete guidance for architectural choices, implementation trade-offs, and acceptance criteria. --- ## 1. Graph Model Design Decisions ### DDC-1.1: Action Graph Construction **Decision Required**: How to construct the action graph G_t from agent state? | Option | Description | Pros | Cons | Recommendation | |--------|-------------|------|------|----------------| | **A. State-Action Pairs** | Nodes = (state, action), Edges = transitions | Fine-grained control; precise cuts | Large graphs; O(|S|·|A|) nodes | Use for high-stakes domains | | **B. Abstract State Clusters** | Nodes = state clusters, Edges = aggregate transitions | Smaller graphs; faster updates | May miss nuanced boundaries | **Recommended for v0** | | **C. Learned Embeddings** | Nodes = learned state embeddings | Adaptive; captures latent structure | Requires training data; less interpretable | Future enhancement | **Acceptance Criteria**: - [ ] Graph construction completes in < 100μs for typical agent states - [ ] Graph accurately represents reachability to unsafe states - [ ] Witness partitions are human-interpretable ### DDC-1.2: Edge Weight Semantics **Decision Required**: What do edge weights represent? | Option | Interpretation | Use Case | |--------|---------------|----------| | **A. Risk Scores** | Higher weight = higher risk of unsafe outcome | Min-cut = minimum total risk to unsafe | | **B. Inverse Probability** | Higher weight = less likely transition | Min-cut = least likely path to unsafe | | **C. Unit Weights** | All edges weight 1.0 | Min-cut = fewest actions to unsafe | | **D. Conformal Set Size** | Weight = |C_t| for that action | Natural integration with predictive uncertainty | **Recommendation**: Option D creates natural integration between min-cut and conformal prediction. **Acceptance Criteria**: - [ ] Weight semantics are documented and consistent - [ ] Min-cut value has interpretable meaning for operators - [ ] Weights update correctly on new observations --- ## 2. Conformal Predictor Architecture ### DDC-2.1: Base Predictor Selection **Decision Required**: Which base predictor to wrap with conformal prediction? | Option | Characteristics | Computational Cost | |--------|----------------|-------------------| | **A. Neural Network** | High capacity; requires calibration | Medium-High | | **B. Random Forest** | Built-in uncertainty; robust | Medium | | **C. Gaussian Process** | Natural uncertainty; O(n³) training | High | | **D. Ensemble with Dropout** | Approximate Bayesian; scalable | Medium | **Recommendation**: Option D (Ensemble with Dropout) for balance of capacity and uncertainty. **Acceptance Criteria**: - [ ] Base predictor achieves acceptable accuracy on held-out data - [ ] Prediction latency < 10ms for single action - [ ] Uncertainty estimates correlate with actual error rates ### DDC-2.2: Non-Conformity Score Function **Decision Required**: How to compute non-conformity scores? | Option | Formula | Properties | |--------|---------|------------| | **A. Absolute Residual** | s(x,y) = |y - ŷ(x)| | Simple; symmetric | | **B. Normalized Residual** | s(x,y) = |y - ŷ(x)| / σ̂(x) | Scale-invariant | | **C. CQR** | s(x,y) = max(q̂_lo - y, y - q̂_hi) | Heteroscedastic coverage | **Recommendation**: Option C (CQR) for heteroscedastic agent environments. **Acceptance Criteria**: - [ ] Marginal coverage ≥ 1 - α over calibration window - [ ] Conditional coverage approximately uniform across feature space - [ ] Prediction sets are not trivially large ### DDC-2.3: Shift Adaptation Method **Decision Required**: How to adapt conformal predictor to distribution shift? | Method | Adaptation Speed | Conservativeness | |--------|-----------------|------------------| | **A. ACI (Adaptive Conformal)** | Medium | High | | **B. Retrospective Adjustment** | Fast | Medium | | **C. COP (Conformal Optimistic)** | Fastest | Low (but valid) | | **D. CORE (RL-based)** | Adaptive | Task-dependent | **Recommendation**: Hybrid approach: - Use COP for normal operation (fast, less conservative) - Fall back to ACI under detected severe shift - Use retrospective adjustment for post-hoc correction **Acceptance Criteria**: - [ ] Coverage maintained during gradual shift (δ < 0.1/step) - [ ] Recovery to target coverage within 100 steps after abrupt shift - [ ] No catastrophic coverage failures (coverage never < 0.5) --- ## 3. E-Process Construction ### DDC-3.1: E-Value Computation Method **Decision Required**: How to compute per-action e-values? | Method | Requirements | Robustness | |--------|--------------|------------| | **A. Likelihood Ratio** | Density models for H₀ and H₁ | Low (model-dependent) | | **B. Universal Inference** | Split data; no density needed | Medium | | **C. Mixture E-Values** | Multiple alternatives | High (hedged) | | **D. Betting E-Values** | Online learning framework | High (adaptive) | **Recommendation**: Option C (Mixture E-Values) for robustness: ``` e_t = (1/K) Σ_k e_t^{(k)} ``` Where each e_t^{(k)} tests a different alternative hypothesis. **Acceptance Criteria**: - [ ] E[e_t | H₀] ≤ 1 verified empirically - [ ] Power against reasonable alternatives > 0.5 - [ ] Computation time < 1ms per e-value ### DDC-3.2: E-Process Update Rule **Decision Required**: How to update the e-process over time? | Rule | Formula | Properties | |------|---------|------------| | **A. Product** | E_t = Π_{i=1}^t e_i | Aggressive; exponential power | | **B. Average** | E_t = (1/t) Σ_{i=1}^t e_i | Conservative; bounded | | **C. Exponential Moving** | E_t = λ·e_t + (1-λ)·E_{t-1} | Balanced; forgetting | | **D. Mixture Supermartingale** | E_t = Σ_j w_j · E_t^{(j)} | Robust; hedged | **Recommendation**: - Option A (Product) for high-stakes single decisions - Option D (Mixture) for continuous monitoring **Acceptance Criteria**: - [ ] E_t remains nonnegative supermartingale - [ ] Stopping time τ has valid Type I error: P(E_τ ≥ 1/α) ≤ α - [ ] Power grows with evidence accumulation ### DDC-3.3: Null Hypothesis Specification **Decision Required**: What constitutes the "coherence" null hypothesis? | Formulation | Meaning | |-------------|---------| | **A. Action Safety** | H₀: P(action leads to unsafe state) ≤ p₀ | | **B. State Stability** | H₀: P(state deviates from normal) ≤ p₀ | | **C. Policy Consistency** | H₀: Current policy ≈ reference policy | | **D. Composite** | H₀: (A) ∧ (B) ∧ (C) | **Recommendation**: Start with Option A, extend to Option D for production. **Acceptance Criteria**: - [ ] H₀ is well-specified and testable - [ ] False alarm rate matches target α - [ ] Null violations are meaningfully dangerous --- ## 4. Integration Architecture ### DDC-4.1: Signal Combination Strategy **Decision Required**: How to combine the three signals into a gate decision? | Strategy | Logic | Properties | |----------|-------|------------| | **A. Sequential Short-Circuit** | Cut → Conformal → E-process | Fast rejection; ordered | | **B. Parallel with Voting** | All evaluate; majority rules | Robust; slower | | **C. Weighted Integration** | score = w₁·cut + w₂·conf + w₃·e | Flexible; needs tuning | | **D. Hierarchical** | E-process gates conformal gates cut | Layered authority | **Recommendation**: Option A (Sequential Short-Circuit): 1. Min-cut DENY is immediate (structural safety) 2. Conformal uncertainty gates e-process (no point accumulating evidence if outcome unpredictable) 3. E-process makes final permit/defer decision **Acceptance Criteria**: - [ ] Gate latency < 50ms for typical decisions - [ ] No single-point-of-failure (graceful degradation) - [ ] Decision audit trail is complete ### DDC-4.2: Graceful Degradation **Decision Required**: How should the gate behave when components fail? | Component Failure | Fallback Behavior | |-------------------|-------------------| | Min-cut unavailable | Defer all actions; alert operator | | Conformal predictor fails | Use widened prediction sets (conservative) | | E-process computation fails | Use last valid e-value; decay confidence | | All components fail | Full DENY; require human approval | **Acceptance Criteria**: - [ ] Failure detection within 100ms - [ ] Fallback never less safe than full DENY - [ ] Recovery is automatic when component restores ### DDC-4.3: Latency Budget Allocation **Decision Required**: How to allocate total latency budget across components? Given total budget T_total (e.g., 50ms): | Component | Allocation | Rationale | |-----------|------------|-----------| | Min-cut update | 0.2 · T | Amortized; subpolynomial | | Conformal prediction | 0.4 · T | Main computation | | E-process update | 0.2 · T | Arithmetic; fast | | Decision logic | 0.1 · T | Simple rules | | Receipt generation | 0.1 · T | Hashing; logging | **Acceptance Criteria**: - [ ] p99 latency < T_total - [ ] No component exceeds 2× its budget - [ ] Latency monitoring in place --- ## 5. Operational Parameters ### DDC-5.1: Threshold Configuration | Parameter | Symbol | Default | Range | Tuning Guidance | |-----------|--------|---------|-------|-----------------| | E-process deny threshold | τ_deny | 0.01 | [0.001, 0.1] | Lower = more conservative | | E-process permit threshold | τ_permit | 100 | [10, 1000] | Higher = more evidence required | | Uncertainty threshold | θ_uncertainty | 0.5 | [0.1, 1.0] | Fraction of outcome space | | Confidence threshold | θ_confidence | 0.1 | [0.01, 0.3] | Fraction of outcome space | | Conformal coverage target | 1-α | 0.9 | [0.8, 0.99] | Higher = larger sets | ### DDC-5.2: Audit Requirements | Requirement | Specification | |-------------|---------------| | Receipt retention | 90 days minimum | | Receipt format | JSON + protobuf | | Receipt signing | Ed25519 signature | | Receipt searchability | Indexed by action_id, timestamp, decision | | Receipt integrity | Merkle tree for batch verification | --- ## 6. Testing & Validation Criteria ### DDC-6.1: Unit Test Coverage | Module | Coverage Target | Critical Paths | |--------|-----------------|----------------| | conformal/ | ≥ 90% | Prediction set generation; shift adaptation | | eprocess/ | ≥ 95% | E-value validity; supermartingale property | | anytime_gate/ | ≥ 90% | Decision logic; receipt generation | ### DDC-6.2: Integration Test Scenarios | Scenario | Expected Behavior | |----------|-------------------| | Normal operation | Permit rate > 90% | | Gradual shift | Coverage maintained; permit rate may decrease | | Abrupt shift | Temporary DEFER; recovery within 100 steps | | Adversarial probe | DENY rate increases; alerts generated | | Component failure | Graceful degradation; no unsafe permits | ### DDC-6.3: Benchmark Requirements | Metric | Target | Measurement Method | |--------|--------|-------------------| | Gate latency p50 | < 10ms | Continuous profiling | | Gate latency p99 | < 50ms | Continuous profiling | | False deny rate | < 5% | Simulation with known-safe actions | | Missed unsafe rate | < 0.1% | Simulation with known-unsafe actions | | Coverage maintenance | ≥ 85% | Real distribution shift scenarios | --- ## 7. Implementation Phases ### Phase 1: Foundation (v0.1) - [ ] E-value and e-process core implementation - [ ] Basic conformal prediction with ACI - [ ] Integration with existing `GateController` - [ ] Simple witness receipts ### Phase 2: Adaptation (v0.2) - [ ] COP and retrospective adjustment - [ ] Mixture e-values for robustness - [ ] Graph model with conformal-based weights - [ ] Enhanced audit trail ### Phase 3: Production (v1.0) - [ ] CORE RL-based adaptation - [ ] Learned graph construction - [ ] Cryptographic receipt signing - [ ] Full monitoring and alerting --- ## 8. Open Questions for Review 1. **Graph Model Scope**: Should the action graph include only immediate actions or multi-step lookahead? 2. **E-Process Null**: Is "action safety" the right null hypothesis, or should we test "policy consistency"? 3. **Threshold Learning**: Should thresholds be fixed or learned via meta-optimization? 4. **Human-in-Loop**: How should DEFER decisions be presented to human operators? 5. **Adversarial Robustness**: How does AVCG perform against adaptive adversaries who observe gate decisions? --- ## 9. Sign-Off | Role | Name | Date | Signature | |------|------|------|-----------| | Architecture Lead | | | | | Security Lead | | | | | ML Lead | | | | | Engineering Lead | | | | --- ## Appendix A: Glossary | Term | Definition | |------|------------| | **E-value** | Nonnegative test statistic with E[e] ≤ 1 under null | | **E-process** | Sequence of e-values forming a nonnegative supermartingale | | **Conformal Prediction** | Distribution-free method for calibrated uncertainty | | **Witness Partition** | Explicit (S, V\S) showing which vertices are separated | | **Anytime-Valid** | Guarantee holds at any stopping time | | **COP** | Conformal Optimistic Prediction | | **CORE** | Conformal Regression via Reinforcement Learning | | **ACI** | Adaptive Conformal Inference | ## Appendix B: Key Equations ### E-Value Validity ``` E_H₀[e] ≤ 1 ``` ### Anytime-Valid Type I Error ``` P_H₀(∃t: E_t ≥ 1/α) ≤ α ``` ### Conformal Coverage ``` P(Y_{t+1} ∈ C_t(X_{t+1})) ≥ 1 - α ``` ### E-Value Composition ``` e₁ · e₂ is valid if e₁, e₂ independent ```