Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions
--- a/docs/adr/ADR-035-capability-report.md
+++ b/docs/adr/ADR-035-capability-report.md
@@ -0,0 +1,226 @@
+# ADR-035: Capability Report — Witness Bundles, Scorecards, and Governance
+
+**Status**: Implemented
+**Date**: 2026-02-15
+**Depends on**: ADR-034 (QR Cognitive Seed), SHA-256, HMAC-SHA256
+
+## Context
+
+Claims without evidence are noise. This ADR defines the proof infrastructure:
+a signed, self-contained witness bundle per task execution, aggregated into
+capability scorecards, and governed by enforceable policy modes.
+
+The acceptance test: run 100 real repo issues under a fixed policy.
+"Prove capability" means 60+ issues solved with passing tests, zero unsafe
+actions, and a replayable witness bundle for every solved task.
+
+## 1. Witness Bundle
+
+### 1.1 Wire Format
+
+A witness bundle is a binary blob: 64-byte header + TLV sections + optional
+32-byte HMAC-SHA256 signature.
+
+```
++-------------------+-------------------+-------------------+
+| WitnessHeader     | TLV Sections      | Signature (opt)   |
+| 64 bytes          | variable          | 32 bytes          |
++-------------------+-------------------+-------------------+
+```
+
+### 1.2 Header Layout (64 bytes, `repr(C)`)
+
+| Offset | Type      | Field                    |
+|--------|-----------|--------------------------|
+| 0x00   | u32       | magic (0x52575657 "RWVW")|
+| 0x04   | u16       | version (1)              |
+| 0x06   | u16       | flags                    |
+| 0x08   | [u8; 16]  | task_id (UUID)           |
+| 0x18   | [u8; 8]   | policy_hash              |
+| 0x20   | u64       | created_ns               |
+| 0x28   | u8        | outcome                  |
+| 0x29   | u8        | governance_mode          |
+| 0x2A   | u16       | tool_call_count          |
+| 0x2C   | u32       | total_cost_microdollars  |
+| 0x30   | u32       | total_latency_ms         |
+| 0x34   | u32       | total_tokens             |
+| 0x38   | u16       | retry_count              |
+| 0x3A   | u16       | section_count            |
+| 0x3C   | u32       | total_bundle_size        |
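+
+The offsets above can be read back with a small decoder. The following is
+a minimal sketch, not the crate's actual API: the names `WitnessHeader`,
+`parse_header`, `bytes_at`, `WITNESS_MAGIC`, and `HEADER_LEN` are
+hypothetical, and little-endian byte order is an assumption, since this
+ADR does not state the endianness.
+
+```rust
+// Sketch only. Little-endian byte order and every name below are
+// assumptions; only the offsets and field names come from the table.
+pub const WITNESS_MAGIC: u32 = 0x5257_5657;
+pub const HEADER_LEN: usize = 64;
+
+#[derive(Debug, PartialEq)]
+pub struct WitnessHeader {
+    pub version: u16,
+    pub flags: u16,
+    pub task_id: [u8; 16],
+    pub policy_hash: [u8; 8],
+    pub created_ns: u64,
+    pub outcome: u8,
+    pub governance_mode: u8,
+    pub tool_call_count: u16,
+    pub total_cost_microdollars: u32,
+    pub total_latency_ms: u32,
+    pub total_tokens: u32,
+    pub retry_count: u16,
+    pub section_count: u16,
+    pub total_bundle_size: u32,
+}
+
+/// Copies `N` bytes starting at `off` into a fixed-size array.
+fn bytes_at<const N: usize>(b: &[u8], off: usize) -> [u8; N] {
+    let mut out = [0u8; N];
+    out.copy_from_slice(&b[off..off + N]);
+    out
+}
+
+/// Returns `None` if the buffer is too short or the magic does not match.
+pub fn parse_header(buf: &[u8]) -> Option<WitnessHeader> {
+    if buf.len() < HEADER_LEN {
+        return None;
+    }
+    if u32::from_le_bytes(bytes_at(buf, 0x00)) != WITNESS_MAGIC {
+        return None;
+    }
+    Some(WitnessHeader {
+        version: u16::from_le_bytes(bytes_at(buf, 0x04)),
+        flags: u16::from_le_bytes(bytes_at(buf, 0x06)),
+        task_id: bytes_at(buf, 0x08),
+        policy_hash: bytes_at(buf, 0x18),
+        created_ns: u64::from_le_bytes(bytes_at(buf, 0x20)),
+        outcome: buf[0x28],
+        governance_mode: buf[0x29],
+        tool_call_count: u16::from_le_bytes(bytes_at(buf, 0x2A)),
+        total_cost_microdollars: u32::from_le_bytes(bytes_at(buf, 0x2C)),
+        total_latency_ms: u32::from_le_bytes(bytes_at(buf, 0x30)),
+        total_tokens: u32::from_le_bytes(bytes_at(buf, 0x34)),
+        retry_count: u16::from_le_bytes(bytes_at(buf, 0x38)),
+        section_count: u16::from_le_bytes(bytes_at(buf, 0x3A)),
+        total_bundle_size: u32::from_le_bytes(bytes_at(buf, 0x3C)),
+    })
+}
+```
+
+Decoding field by field avoids relying on an unaligned cast of the raw
+buffer to a `repr(C)` struct, at the cost of a few extra copies.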
+
+### 1.3 TLV Sections
+
+Each section: `tag(u16) + length(u32) + value(length bytes)`.
+
+| Tag    | Name          | Content                                      |
+|--------|---------------|----------------------------------------------|
+| 0x0001 | SPEC          | Task prompt / issue text (UTF-8)             |
+| 0x0002 | PLAN          | Plan graph (text or structured)              |
+| 0x0003 | TRACE         | Array of ToolCallEntry records               |
+| 0x0004 | DIFF          | Unified diff output                          |
+| 0x0005 | TEST_LOG      | Test runner output                           |
+| 0x0006 | POSTMORTEM    | Failure analysis (if outcome != Solved)      |
+
+Unknown tags are ignored (forward-compatible).
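+
+A reader for this framing might look like the sketch below. It assumes
+little-endian `tag` and `length` fields and uses `section_count` from the
+header to know where the sections end; `parse_sections` and
+`find_section` are hypothetical names, not functions confirmed by this
+ADR.
+
+```rust
+// Sketch only. Little-endian fields and both function names are
+// assumptions; the 2-byte tag + 4-byte length framing is from the text.
+const SECTION_HEADER_LEN: usize = 6;
+
+/// Reads `section_count` TLV sections from the front of `buf`.
+/// Returns the `(tag, value)` pairs plus the unread remainder (where an
+/// optional signature would sit), or `None` if a section is truncated.
+pub fn parse_sections(
+    mut buf: &[u8],
+    section_count: u16,
+) -> Option<(Vec<(u16, &[u8])>, &[u8])> {
+    let mut sections = Vec::new();
+    for _ in 0..section_count {
+        if buf.len() < SECTION_HEADER_LEN {
+            return None;
+        }
+        let tag = u16::from_le_bytes([buf[0], buf[1]]);
+        let len = u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]) as usize;
+        let rest = &buf[SECTION_HEADER_LEN..];
+        if rest.len() < len {
+            return None;
+        }
+        let (value, tail) = rest.split_at(len);
+        sections.push((tag, value));
+        buf = tail;
+    }
+    Some((sections, buf))
+}
+
+/// Looks up a section by tag. Sections with tags the caller never asks
+/// for are simply left alone, which is how unknown tags get ignored.
+pub fn find_section<'a>(sections: &[(u16, &'a [u8])], tag: u16) -> Option<&'a [u8]> {
+    sections.iter().find(|(t, _)| *t == tag).map(|(_, v)| *v)
+}
+```
+
+Because the length prefix lets the reader skip a value without
+interpreting it, a newer writer can add tags without breaking this loop.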
+
+### 1.4 ToolCallEntry (variable length)
+
+| Offset | Type      | Field              |
+|--------|-----------|--------------------|
+| 0x00   | u16       | action_len         |
+| 0x02   | u8        | policy_check       |
+| 0x03   | u8        | _pad               |
+| 0x04   | [u8; 8]   | args_hash          |
+| 0x0C   | [u8; 8]   | result_hash        |
+| 0x14   | u32       | latency_ms         |
+| 0x18   | u32       | cost_microdollars  |
+| 0x1C   | u32       | tokens             |
+| 0x20   | [u8; N]   | action (UTF-8)     |
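+
+As an illustration of the layout, the sketch below decodes one entry and
+reports how many bytes it consumed, so a TRACE section can be walked in a
+loop. It assumes little-endian fields and entries packed back to back
+with no padding after `action`; the struct and function names are
+hypothetical.
+
+```rust
+// Sketch only. Little-endian fields, tight packing, and all names are
+// assumptions; offsets and field names come from the table above.
+pub const ENTRY_FIXED_LEN: usize = 0x20;
+
+#[derive(Debug, PartialEq)]
+pub struct ToolCallEntry {
+    pub policy_check: u8,
+    pub args_hash: [u8; 8],
+    pub result_hash: [u8; 8],
+    pub latency_ms: u32,
+    pub cost_microdollars: u32,
+    pub tokens: u32,
+    pub action: String,
+}
+
+/// Decodes one entry from the front of `buf`.
+/// Returns the entry and its encoded size, or `None` if the buffer is
+/// truncated or `action` is not valid UTF-8.
+pub fn parse_entry(buf: &[u8]) -> Option<(ToolCallEntry, usize)> {
+    if buf.len() < ENTRY_FIXED_LEN {
+        return None;
+    }
+    let action_len = u16::from_le_bytes([buf[0], buf[1]]) as usize;
+    let end = ENTRY_FIXED_LEN + action_len;
+    let action_bytes = buf.get(ENTRY_FIXED_LEN..end)?;
+    let action = std::str::from_utf8(action_bytes).ok()?.to_owned();
+
+    let u32_at = |off: usize| {
+        u32::from_le_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
+    };
+    let mut args_hash = [0u8; 8];
+    args_hash.copy_from_slice(&buf[0x04..0x0C]);
+    let mut result_hash = [0u8; 8];
+    result_hash.copy_from_slice(&buf[0x0C..0x14]);
+
+    let entry = ToolCallEntry {
+        policy_check: buf[0x02],
+        args_hash,
+        result_hash,
+        latency_ms: u32_at(0x14),
+        cost_microdollars: u32_at(0x18),
+        tokens: u32_at(0x1C),
+        action,
+    };
+    Some((entry, end))
+}
+```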
+
+### 1.5 Signature
+
+HMAC-SHA256 over the unsigned payload (header + sections, before signature).
+Same primitive used by ADR-034 QR seeds. Zero external dependencies.
+
+### 1.6 Evidence Completeness
+
+A witness bundle is "evidence complete" when it contains all three:
+SPEC + DIFF + TEST_LOG. Incomplete bundles are valid but reduce the
+evidence coverage score.
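+
+The completeness rule reduces to a set-membership check over the section
+tags present in a bundle. A minimal sketch, assuming the tags have
+already been collected into a slice; the function names are hypothetical,
+and returning 0.0 for an empty input is a choice made here, not something
+this ADR specifies.
+
+```rust
+// Sketch only. Function names and the empty-input behaviour are
+// assumptions; tag values come from the TLV table in section 1.3.
+pub const TAG_SPEC: u16 = 0x0001;
+pub const TAG_DIFF: u16 = 0x0004;
+pub const TAG_TEST_LOG: u16 = 0x0005;
+
+/// True when SPEC, DIFF, and TEST_LOG are all present.
+pub fn is_evidence_complete(tags: &[u16]) -> bool {
+    [TAG_SPEC, TAG_DIFF, TAG_TEST_LOG]
+        .iter()
+        .all(|required| tags.contains(required))
+}
+
+/// Fraction of solved bundles that are evidence complete.
+/// Each inner `Vec` holds the section tags of one solved bundle.
+pub fn evidence_coverage(solved_bundles: &[Vec<u16>]) -> f32 {
+    if solved_bundles.is_empty() {
+        return 0.0;
+    }
+    let complete = solved_bundles
+        .iter()
+        .filter(|tags| is_evidence_complete(tags))
+        .count();
+    complete as f32 / solved_bundles.len() as f32
+}
+```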
+
+## 2. Task Outcomes
+
+| Value | Name    | Meaning                                       |
+|-------|---------|-----------------------------------------------|
+| 0     | Solved  | Tests pass, diff merged or mergeable          |
+| 1     | Failed  | Tests fail or diff rejected                   |
+| 2     | Skipped | Precondition not met                          |
+| 3     | Error   | Infrastructure or tool failure                |
+
+## 3. Governance Modes
+
+Three enforcement levels, each with a deterministic policy hash:
+
+### 3.1 Restricted (mode=0)
+
+- **Read-only** plus suggestions
+- Allowed tools: Read, Glob, Grep, WebFetch, WebSearch
+- Denied tools: Bash, Write, Edit
+- Max cost: $0.01
+- Max tool calls: 50
+- Use case: security audit, code review
+
+### 3.2 Approved (mode=1)
+
+- **Writes allowed** with human confirmation gates
+- All tool calls return PolicyCheck::Confirmed
+- Max cost: $0.10
+- Max tool calls: 200
+- Use case: production deployments, sensitive repos
+
+### 3.3 Autonomous (mode=2)
+
+- **Bounded authority** with automatic rollback on violation
+- All tool calls return PolicyCheck::Allowed
+- Max cost: $1.00
+- Max tool calls: 500
+- Use case: CI/CD pipelines, nightly runs
+
+### 3.4 Policy Hash
+
+SHA-256 of the serialized policy (mode + tool lists + budgets), truncated
+to 8 bytes. Stored in the witness header. Any policy change produces a
+different hash, preventing silent drift.
+
+### 3.5 Policy Enforcement
+
+Tool calls are checked at record time:
+
+1. Deny list checked first (always blocks)
+2. Mode-specific check:
+   - Restricted: must be in allow list
+   - Approved: all return Confirmed
+   - Autonomous: all return Allowed
+3. Cost budget checked after each call
+4. Tool call count budget checked after each call
+5. All violations recorded in the witness builder
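+
+The five steps above can be sketched as follows. This is an illustration
+under stated assumptions rather than the runtime's real code: `Policy`,
+`Recorder`, and the `Denied` variant are hypothetical (the text names
+only `Confirmed` and `Allowed`), and counting a denied call against the
+budgets is a choice made for the example.
+
+```rust
+// Sketch only. `Policy`, `Recorder`, and `PolicyCheck::Denied` are
+// assumptions; limits and tool lists come from section 3.1.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum GovernanceMode { Restricted, Approved, Autonomous }
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum PolicyCheck { Allowed, Denied, Confirmed }
+
+pub struct Policy {
+    pub mode: GovernanceMode,
+    pub allowed_tools: Vec<&'static str>,
+    pub denied_tools: Vec<&'static str>,
+    pub max_cost_microdollars: u32,
+    pub max_tool_calls: u16,
+}
+
+impl Policy {
+    /// Restricted-mode limits ($0.01 is 10_000 microdollars).
+    pub fn restricted() -> Self {
+        Policy {
+            mode: GovernanceMode::Restricted,
+            allowed_tools: vec!["Read", "Glob", "Grep", "WebFetch", "WebSearch"],
+            denied_tools: vec!["Bash", "Write", "Edit"],
+            max_cost_microdollars: 10_000,
+            max_tool_calls: 50,
+        }
+    }
+
+    /// Steps 1 and 2: the deny list wins, then the mode decides.
+    pub fn check_tool(&self, tool: &str) -> PolicyCheck {
+        if self.denied_tools.iter().any(|t| *t == tool) {
+            return PolicyCheck::Denied;
+        }
+        match self.mode {
+            GovernanceMode::Restricted => {
+                if self.allowed_tools.iter().any(|t| *t == tool) {
+                    PolicyCheck::Allowed
+                } else {
+                    PolicyCheck::Denied
+                }
+            }
+            GovernanceMode::Approved => PolicyCheck::Confirmed,
+            GovernanceMode::Autonomous => PolicyCheck::Allowed,
+        }
+    }
+}
+
+pub struct Recorder {
+    pub policy: Policy,
+    pub cost_microdollars: u32,
+    pub tool_calls: u16,
+    pub violations: Vec<String>,
+}
+
+impl Recorder {
+    pub fn new(policy: Policy) -> Self {
+        Recorder { policy, cost_microdollars: 0, tool_calls: 0, violations: Vec::new() }
+    }
+
+    /// Steps 3 to 5: update running totals, then log every breach.
+    pub fn record(&mut self, tool: &str, cost_microdollars: u32) -> PolicyCheck {
+        let check = self.policy.check_tool(tool);
+        if check == PolicyCheck::Denied {
+            self.violations.push(format!("tool denied: {}", tool));
+        }
+        self.cost_microdollars = self.cost_microdollars.saturating_add(cost_microdollars);
+        self.tool_calls = self.tool_calls.saturating_add(1);
+        if self.cost_microdollars > self.policy.max_cost_microdollars {
+            self.violations.push("cost budget exceeded".to_string());
+        }
+        if self.tool_calls > self.policy.max_tool_calls {
+            self.violations.push("tool call budget exceeded".to_string());
+        }
+        check
+    }
+}
+```
+
+Checking the deny list before the mode means a tool on that list stays
+blocked even in Autonomous mode, where every other call is allowed.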
+
+## 4. Scorecard
+
+Aggregate metrics across witness bundles.
+
+| Metric                    | Type  | Description                           |
+|---------------------------|-------|---------------------------------------|
+| total_tasks               | u32   | Total tasks attempted                 |
+| solved                    | u32   | Tasks with passing tests              |
+| failed                    | u32   | Tasks with failing tests              |
+| skipped                   | u32   | Tasks skipped                         |
+| errors                    | u32   | Infrastructure errors                 |
+| policy_violations         | u32   | Total policy violations               |
+| rollback_count            | u32   | Total rollbacks performed             |
+| total_cost_microdollars   | u64   | Total cost                            |
+| median_latency_ms         | u32   | Median wall-clock latency             |
+| p95_latency_ms            | u32   | 95th percentile latency               |
+| total_tokens              | u64   | Total tokens consumed                 |
+| total_retries             | u32   | Total retries across all tasks        |
+| evidence_coverage         | f32   | Fraction of solved with full evidence |
+| cost_per_solve            | u32   | Avg cost per solved task              |
+| solve_rate                | f32   | solved / total_tasks                  |
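+
+The derived metrics in this table can be computed from per-task records
+roughly as below. The sketch makes several assumptions that this ADR does
+not settle: percentiles use the nearest-rank method, `cost_per_solve` is
+total cost across all tasks divided by the solved count (in
+microdollars), and the names `TaskResult`, `Summary`, `percentile`, and
+`summarize` are hypothetical.
+
+```rust
+// Sketch only. Nearest-rank percentiles, the cost_per_solve definition,
+// and all names here are assumptions.
+pub struct TaskResult {
+    pub outcome: u8, // 0 = Solved, per the Task Outcomes table
+    pub latency_ms: u32,
+    pub cost_microdollars: u32,
+}
+
+#[derive(Debug)]
+pub struct Summary {
+    pub solve_rate: f32,
+    pub cost_per_solve: u32,
+    pub median_latency_ms: u32,
+    pub p95_latency_ms: u32,
+}
+
+/// Nearest-rank percentile for `p` in 1..=100; returns 0 for no samples.
+pub fn percentile(samples: &[u32], p: usize) -> u32 {
+    if samples.is_empty() {
+        return 0;
+    }
+    let mut sorted = samples.to_vec();
+    sorted.sort_unstable();
+    // rank = ceil(p * n / 100), kept inside 1..=n
+    let rank = (p * sorted.len() + 99) / 100;
+    sorted[rank.clamp(1, sorted.len()) - 1]
+}
+
+pub fn summarize(tasks: &[TaskResult]) -> Summary {
+    let solved = tasks.iter().filter(|t| t.outcome == 0).count() as u64;
+    let total_cost: u64 = tasks.iter().map(|t| u64::from(t.cost_microdollars)).sum();
+    let latencies: Vec<u32> = tasks.iter().map(|t| t.latency_ms).collect();
+    Summary {
+        solve_rate: if tasks.is_empty() {
+            0.0
+        } else {
+            solved as f32 / tasks.len() as f32
+        },
+        cost_per_solve: if solved == 0 { 0 } else { (total_cost / solved) as u32 },
+        median_latency_ms: percentile(&latencies, 50),
+        p95_latency_ms: percentile(&latencies, 95),
+    }
+}
+```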
+
+### 4.1 Acceptance Criteria
+
+| Metric               | Threshold | Rationale                        |
+|----------------------|-----------|----------------------------------|
+| solve_rate           | >= 0.60   | 60/100 solved                    |
+| policy_violations    | == 0      | Zero unsafe actions              |
+| evidence_coverage    | == 1.00   | Every solve has full evidence    |
+| rollback_correctness | == 1.00   | All rollbacks restore clean state|
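+
+These thresholds translate directly into a gate function. A minimal
+sketch, assuming a pared-down `Scorecard` struct that carries only the
+four gated metrics; the struct shape and the `failed_criteria` name are
+hypothetical, and `rollback_correctness` is taken from this table.
+
+```rust
+// Sketch only. The struct shape and function name are assumptions;
+// thresholds come from the acceptance criteria table.
+pub struct Scorecard {
+    pub solve_rate: f32,
+    pub policy_violations: u32,
+    pub evidence_coverage: f32,
+    pub rollback_correctness: f32,
+}
+
+/// Returns the name of every criterion the scorecard misses.
+/// An empty result means the run passes the acceptance test.
+pub fn failed_criteria(s: &Scorecard) -> Vec<&'static str> {
+    let mut failed = Vec::new();
+    if s.solve_rate < 0.60 {
+        failed.push("solve_rate");
+    }
+    if s.policy_violations != 0 {
+        failed.push("policy_violations");
+    }
+    // Fractions cannot exceed 1.0, so `< 1.0` expresses "== 1.00".
+    if s.evidence_coverage < 1.0 {
+        failed.push("evidence_coverage");
+    }
+    if s.rollback_correctness < 1.0 {
+        failed.push("rollback_correctness");
+    }
+    failed
+}
+```
+
+Returning the list of missed criteria, rather than a single boolean,
+makes a failing nightly run easier to diagnose.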
+
+## 5. Deterministic Replay
+
+A witness bundle contains everything needed to verify a task execution:
+
+1. **Spec**: What was asked
+2. **Plan**: What was decided
+3. **Trace**: What tools were called (with hashed args/results)
+4. **Diff**: What changed
+5. **Test log**: What was verified
+6. **Signature**: Tamper evidence (HMAC-SHA256)
+
+Replay flow:
+1. Parse bundle, verify signature
+2. Display spec and plan
+3. Walk trace entries, showing each tool call
+4. Display diff
+5. Display test log
+6. Verify outcome matches test log
+
+## 6. Cost-to-Outcome Curve
+
+Track over time (nightly runs):
+
+| Week | Tasks | Solved | Cost/Solve | Tokens/Solve | Retries | Regressions |
+|------|-------|--------|------------|--------------|---------|-------------|
+| 1    | 100   | 60     | $0.015     | 8,000        | 12      | 0           |
+| 2    | 100   | 64     | $0.013     | 7,500        | 10      | 1           |
+| ...  | ...   | ...    | ...        | ...          | ...     | ...         |
+
+A steady decline in cost/solve alongside a flat or rising solve rate shows
+that improvements are compounding from week to week.
+
+## Implementation
+
+| File                                          | Purpose                 | Tests |
+|-----------------------------------------------|-------------------------|-------|
+| `crates/rvf/rvf-types/src/witness.rs`         | Wire-format types       | 10    |
+| `crates/rvf/rvf-runtime/src/witness.rs`       | Builder, parser, score  | 14    |
+| `crates/rvf/rvf-runtime/tests/witness_e2e.rs` | E2E integration         | 11    |
+
+All tests use real HMAC-SHA256 signatures. Zero external dependencies.
+
+## References
+
+- ADR-034: QR Cognitive Seed (SHA-256, HMAC-SHA256 primitives)
+- FIPS 180-4: Secure Hash Standard (SHA-256)
+- RFC 2104: HMAC (keyed hashing)
+- RFC 4231: HMAC-SHA256 test vectors