# Axis 6: Formal Verification -- Proof-Carrying GNN

**Document:** 26 of 30
**Series:** Graph Transformers: 2026-2036 and Beyond
**Last Updated:** 2026-02-25
**Status:** Research Prospectus

---

## 1. Problem Statement

Neural networks are black boxes. For safety-critical applications -- autonomous vehicles, medical diagnosis, financial systems, infrastructure control -- we need formal guarantees about what a graph transformer will and will not do. The verification axis asks: can we attach machine-checkable proofs to graph transformer computations?

### 1.1 What We Want to Verify

| Property | Definition | Difficulty |
|----------|-----------|------------|
| Robustness | small input perturbation -> small output change | Medium |
| Fairness | attention does not discriminate on protected attributes | Hard |
| Monotonicity | increasing input feature -> non-decreasing output | Medium |
| Lipschitz bound | \|\|f(x) - f(y)\|\| <= L * \|\|x - y\|\| | Medium |
| Graph invariant preservation | if input has property P, output has property P | Hard |
| Convergence | training reaches epsilon-optimal in T steps | Very Hard |
| Completeness | all relevant nodes are attended to | Hard |
| Soundness | every attended node is relevant | Hard |

### 1.2 The Verification Gap (2026)

Current state of neural network verification:

- **Interval arithmetic**: Can verify small networks (~1000 neurons). Not scalable to graph transformers.
- **Abstract interpretation**: Over-approximates reachable states. High false-positive rate.
- **SMT solving**: Exact but exponential. Limited to very small networks.
- **Randomized testing**: Finds bugs but provides no guarantees.
- **Certified training**: Trains with verification-friendly objectives. Sacrifices accuracy.

None of these approaches handles the combinatorial complexity of graph structure.

### 1.3 RuVector Baseline

- **`ruvector-verified`**: Lean-agentic dependent types, proof-carrying vector operations, 82-byte attestations, pipeline verification, gated verification, invariants
- **`ruvector-verified`** modules: `cache.rs`, `fast_arena.rs`, `gated.rs`, `invariants.rs`, `pipeline.rs`, `pools.rs`, `proof_store.rs`, `vector_types.rs`
- **`ruvector-coherence`**: Spectral coherence, embedding stability guarantees

This is RuVector's strongest competitive advantage. No other graph ML system has production-ready formal verification infrastructure.

---

## 2. Proof-Carrying Graph Attention
### 2.1 The Proof-Carrying Code Paradigm

Proof-carrying code (PCC, Necula 1997) attaches machine-checkable proofs to programs. We extend this paradigm to graph attention:

**Proof-carrying attention weight:**

```rust
struct CertifiedAttention<P: AttentionProperty> {
    /// The attention weight value
    weight: f32,
    /// Machine-checkable proof that `weight` satisfies `property`
    proof: Proof<P>,
    /// The property being certified
    property: P,
}
```

**Properties we can certify per attention weight** (a checker sketch follows the list):

1. **Non-negativity**: alpha_{uv} >= 0 (trivial after softmax)
2. **Normalization**: sum_v alpha_{uv} = 1 (follows from softmax definition)
3. **Locality bound**: alpha_{uv} < epsilon for dist(u,v) > r (attention decays with distance)
4. **Fairness**: alpha_{uv} is independent of protected attribute A_v
5. **Robustness**: |alpha_{uv}(x) - alpha_{uv}(x')| < delta for ||x - x'|| < epsilon

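Before any proof machinery, properties 1-3 are directly checkable at runtime. A minimal sketch of such a checker for one attention row; the tolerances and the hop-distance input are illustrative assumptions, and a real proof object in `ruvector-verified` would record more than a pass/fail result:

```rust
/// Runtime check of properties 1-3 for one attention row.
/// Hypothetical helper: `weights` is alpha_{u,*} after softmax,
/// `hop_dist[v]` is dist(u, v). Not the ruvector-verified API.
fn check_row_properties(
    weights: &[f32],
    hop_dist: &[u32],
    eps: f32,    // locality threshold epsilon
    radius: u32, // locality radius r
) -> Result<(), String> {
    // 1. Non-negativity
    if weights.iter().any(|&w| w < 0.0) {
        return Err("non-negativity violated".into());
    }
    // 2. Normalization (up to float tolerance)
    let sum: f32 = weights.iter().sum();
    if (sum - 1.0).abs() > 1e-5 {
        return Err(format!("row sums to {sum}, not 1"));
    }
    // 3. Locality: nodes beyond the radius must receive < eps attention
    for (w, d) in weights.iter().zip(hop_dist) {
        if *d > radius && *w >= eps {
            return Err("locality bound violated".into());
        }
    }
    Ok(())
}
```
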
### 2.2 Dependent Types for Graph Operations

**Core idea.** Use dependent types to express graph properties at the type level. The type system enforces invariants automatically -- ill-formed graph operations cannot compile.

```lean
-- Lean4 definitions for verified graph attention

-- A graph with a certified number of nodes and edges
structure CertifiedGraph (n : Nat) (m : Nat) where
  nodes : Fin n → NodeFeatures
  edges : Fin m → Fin n × Fin n
  symmetric : ∀ e u v, edges e = (u, v) → ∃ e', edges e' = (v, u)

-- Attention matrix with certified properties
-- (eps and k parameterize the sparsity certificate)
structure CertifiedAttention (n : Nat) (eps : Float) (k : Nat) where
  weights : Fin n → Fin n → Float
  non_negative : ∀ i j, weights i j ≥ 0
  normalized : ∀ i, Finset.sum Finset.univ (weights i) = 1.0
  sparse : ∀ i, (Finset.univ.filter (fun j => weights i j > eps)).card ≤ k

-- Verified softmax: the subtype carries the certificate
def verified_softmax (logits : Fin n → Float) :
    {w : Fin n → Float // (∀ i, w i ≥ 0) ∧ Finset.sum Finset.univ w = 1} :=
  let max_val := Finset.sup Finset.univ logits
  let exp_vals := fun i => Float.exp (logits i - max_val)
  let sum_exp := Finset.sum Finset.univ exp_vals
  let weights := fun i => exp_vals i / sum_exp
  -- non_neg_proof and norm_proof are discharged in Lean4 tactic mode (elided)
  ⟨weights, ⟨non_neg_proof, norm_proof⟩⟩

-- Message passing with invariant preservation
def verified_message_pass
    (graph : CertifiedGraph n m)
    (features : Fin n → Vector Float d)
    (invariant : GraphInvariant) :
    {output : Fin n → Vector Float d // invariant.holds output} :=
  -- Implementation with proof that the invariant is preserved
  sorry -- Proof to be filled in
```

### 2.3 82-Byte Attestation Protocol

RuVector's existing `ruvector-verified` uses 82-byte attestations. We extend this format to graph attention:

```
Attestation format (82 bytes):

Bytes  0-3:   Magic number (0x52564154 = "RVAT")
Bytes  4-7:   Property code (enum: robustness, fairness, monotonicity, ...)
Bytes  8-15:  Graph hash (FNV-1a of adjacency + features)
Bytes 16-23:  Attention matrix hash
Bytes 24-31:  Property parameter (epsilon for robustness, etc.)
Bytes 32-63:  Proof commitment (SHA-256 of full proof)
Bytes 64-71:  Timestamp
Bytes 72-79:  Verifier public key
Bytes 80-81:  Checksum
```

**Verification workflow:**

```
1. Compute attention:   alpha = GraphAttention(X, G)
2. Generate proof:      proof = Prove(alpha, property, params)
3. Create attestation:  attest = Attest(alpha, proof, property)
4. Attach to output:    (alpha, attest) -- 82 bytes overhead per attention matrix
5. Consumer verifies:   Verify(alpha, attest) -> bool
   - Check: property holds for the specific alpha
   - Check: proof commitment matches the actual proof
   - Check: attestation is well-formed
```

**RuVector integration:**

```rust
/// Proof-carrying graph attention
pub trait ProofCarryingAttention {
    type Property: AttentionProperty;
    type Proof: VerifiableProof;

    /// Compute attention with proof generation
    fn attend_with_proof(
        &self,
        graph: &PropertyGraph,
        features: &Tensor,
        property: &Self::Property,
    ) -> Result<(AttentionMatrix, Self::Proof, Attestation), VerifyError>;

    /// Verify an attention computation
    fn verify(
        &self,
        attention: &AttentionMatrix,
        proof: &Self::Proof,
        attestation: &Attestation,
    ) -> Result<bool, VerifyError>;

    /// Get proof size in bytes
    fn proof_size(&self, property: &Self::Property) -> usize;
}

/// Attestation (exactly 82 bytes, matching the ruvector-verified convention)
#[repr(C, packed)]
pub struct Attestation {
    pub magic: [u8; 4],              // 0x52564154 = "RVAT"
    pub property_code: u32,
    pub graph_hash: u64,
    pub attention_hash: u64,
    pub property_param: f64,
    pub proof_commitment: [u8; 32],
    pub timestamp: u64,
    pub verifier_key: u64,
    pub checksum: u16,
}

static_assertions::assert_eq_size!(Attestation, [u8; 82]);
```

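As a consumer-side usage sketch, the well-formedness check from step 5 of the workflow might look as follows; the XOR-fold checksum below is an illustrative stand-in, not the checksum `ruvector-verified` actually defines:

```rust
impl Attestation {
    /// Structural well-formedness check (step 5c of the workflow).
    /// The rotate-and-xor fold is an assumed placeholder checksum.
    pub fn is_well_formed(&self) -> bool {
        if self.magic != *b"RVAT" {
            return false;
        }
        // View the packed 82-byte struct as raw bytes (size checked above
        // by assert_eq_size!), then fold everything except the checksum.
        let bytes: &[u8; 82] =
            unsafe { &*(self as *const Self as *const [u8; 82]) };
        let folded = bytes[..80]
            .iter()
            .fold(0u16, |acc, &b| acc.rotate_left(3) ^ b as u16);
        folded == self.checksum
    }
}
```
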
---

## 3. Verified GNN Training

### 3.1 Convergence Proofs

**Goal.** Prove that GNN training converges to an epsilon-optimal solution in T steps.

**Theorem (Verified SGD Convergence for Graph Attention).** For a graph attention network whose layers are Lipschitz continuous with composed Lipschitz constant L, step size eta = 1/(L * sqrt(T)), and a convex loss function:

```
E[f(x_T) - f(x*)] <= L * ||x_0 - x*||^2 / (2 * sqrt(T)) + sigma * sqrt(log(T) / T)
```

where sigma is the standard deviation of the gradient noise.

**Proof structure:**

1. Lipschitz continuity of attention layers (proven per layer)
2. Composition: an L-layer network has Lipschitz constant L_1 * L_2 * ... * L_L
3. Standard SGD convergence theorem applied with the composed Lipschitz bound
4. Bound on gradient noise from mini-batch sampling on graphs

**Practical verification:** We cannot prove convergence of arbitrary training runs. Instead, we prove:

- **Pre-training:** The architecture *can* converge (existence of a convergent learning-rate schedule)
- **Post-training:** The trained model *did* converge (verify the final gradient norm is small; see the sketch below)
- **Property preservation:** Properties certified at initialization are maintained throughout training

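The post-training check is the cheapest of the three: it only needs the final gradients. A minimal sketch, assuming per-batch gradients have already been collected by a hypothetical autograd hook (names are illustrative, not the `ruvector-gnn` API):

```rust
/// Post-training convergence certificate: accept the trained model as
/// converged if the full-batch gradient norm is below `tol`.
/// `grad_batches[b][p]` is the gradient of parameter p on batch b.
fn certify_converged(grad_batches: &[Vec<f32>], tol: f32) -> Result<f32, String> {
    let n_params = grad_batches
        .first()
        .map(|g| g.len())
        .ok_or("no gradient batches supplied")?;
    // Average per-batch gradients to approximate the full-batch gradient.
    let mut mean_grad = vec![0.0f32; n_params];
    for batch in grad_batches {
        for (m, g) in mean_grad.iter_mut().zip(batch) {
            *m += g / grad_batches.len() as f32;
        }
    }
    let norm = mean_grad.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm <= tol {
        Ok(norm) // the norm itself becomes part of the certificate
    } else {
        Err(format!("gradient norm {norm} exceeds tolerance {tol}"))
    }
}
```
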
### 3.2 Invariant-Preserving Training

**Key idea.** Define graph invariants that must hold before, during, and after training. The training loop is modified to project back onto the invariant set after each update.

```
Invariant-preserving training loop:

for epoch in 1..max_epochs:
  1. Forward pass:          output = model(graph, features)
  2. Compute loss:          L = loss(output, target)
  3. Backward pass:         gradients = autograd(L)
  4. Unconstrained update:  params' = params - lr * gradients
  5. PROJECT onto invariant set:
       params = project(params', invariant_set)
       // Ensures invariants still hold after the update
  6. VERIFY (periodic):
       assert verify_invariants(model, invariants)
       // Generate a fresh proof that the invariants hold
```

**Projection operators for common invariants** (a spectral-normalization sketch follows the table):

| Invariant | Projection | Cost |
|-----------|-----------|------|
| Lipschitz bound L | Spectral normalization: W := W * L / max(L, sigma_max(W)) | O(d^2) per layer |
| Non-negative weights | Clamp: W := max(W, 0) | O(params) |
| Orthogonal weights | Polar decomposition: W := W * (W^T * W)^{-1/2} | O(d^3) per layer |
| Symmetry preservation | Symmetrize: W := (W + P * W * P^{-1}) / 2 | O(d^2) per layer |
| Attention sparsity | Hard threshold: alpha[alpha < epsilon] = 0 | O(n^2) |

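A minimal sketch of the first row of the table: spectral normalization with sigma_max estimated by power iteration. The dense row-major layout and the fixed iteration count are illustrative assumptions:

```rust
/// Project a dense weight matrix onto the Lipschitz-L ball:
/// W := W * L / max(L, sigma_max(W)), with sigma_max estimated by
/// a few rounds of power iteration (5 is an arbitrary sketch choice).
fn project_lipschitz(w: &mut [f32], rows: usize, cols: usize, lip: f32) {
    let mut v = vec![1.0f32; cols];
    let mut u = vec![0.0f32; rows];
    for _ in 0..5 {
        // u = W v, normalized
        for i in 0..rows {
            u[i] = (0..cols).map(|j| w[i * cols + j] * v[j]).sum();
        }
        let un = u.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
        u.iter_mut().for_each(|x| *x /= un);
        // v = W^T u, normalized
        for j in 0..cols {
            v[j] = (0..rows).map(|i| w[i * cols + j] * u[i]).sum();
        }
        let vn = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
        v.iter_mut().for_each(|x| *x /= vn);
    }
    // sigma_max(W) ~= u^T W v for the converged singular pair
    let sigma: f32 = (0..rows)
        .map(|i| u[i] * (0..cols).map(|j| w[i * cols + j] * v[j]).sum::<f32>())
        .sum();
    // Shrink only when the bound is exceeded; scale = 1 if sigma <= L.
    let scale = lip / sigma.max(lip);
    w.iter_mut().for_each(|x| *x *= scale);
}
```
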
### 3.3 Certified Adversarial Robustness

**Goal.** Prove that for any input perturbation ||delta|| <= epsilon, the graph transformer's output changes by at most delta_out.

**Interval bound propagation (IBP) for graph attention:**

```
For each layer l:
  // Propagate interval bounds through attention
  h_lower_l, h_upper_l = IBP_GraphAttention(h_lower_{l-1}, h_upper_{l-1}, G)

  // The interval [h_lower_l, h_upper_l] provably contains
  // all possible hidden states for any perturbation in the input interval
```

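As a sketch of the core interval step, the following propagates elementwise bounds through one dense layer y = W x + b; bounding softmax-based attention over intervals is substantially harder and is elided here:

```rust
/// Propagate interval bounds [lo, hi] through y = W x + b.
/// A positive weight maps the lower bound to the lower bound;
/// a negative weight swaps the roles. Dense sketch only.
fn ibp_linear(
    w: &[Vec<f32>], // weight matrix, one row per output dimension
    b: &[f32],
    lo: &[f32],
    hi: &[f32],
) -> (Vec<f32>, Vec<f32>) {
    let mut out_lo = b.to_vec();
    let mut out_hi = b.to_vec();
    for (i, row) in w.iter().enumerate() {
        for (j, &wij) in row.iter().enumerate() {
            if wij >= 0.0 {
                out_lo[i] += wij * lo[j];
                out_hi[i] += wij * hi[j];
            } else {
                out_lo[i] += wij * hi[j];
                out_hi[i] += wij * lo[j];
            }
        }
    }
    (out_lo, out_hi)
}
```
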
**Graph-specific challenges:**

1. **Structural perturbation**: What if the adversary adds or removes edges? Need to bound over all graphs within edit distance k of G.
2. **Feature perturbation**: Standard IBP applies, but graph attention amplifies perturbations (attention focuses on perturbed nodes).
3. **Combined perturbation**: Joint structural + feature perturbation is hardest.

**RuVector approach:** Use `ruvector-verified` invariant tracking to maintain robustness certificates through attention layers.

```rust
/// Certified robustness for graph attention
pub trait CertifiedRobustness {
    /// Compute a robustness bound for a given perturbation budget
    fn certify_robustness(
        &self,
        graph: &PropertyGraph,
        features: &Tensor,
        epsilon: f64,
        perturbation_type: PerturbationType,
    ) -> Result<RobustnessCertificate, VerifyError>;

    /// Check if a specific input is certifiably robust
    fn is_certifiably_robust(
        &self,
        graph: &PropertyGraph,
        features: &Tensor,
        epsilon: f64,
    ) -> bool;
}

pub enum PerturbationType {
    /// L_p norm ball on node features
    FeatureLp { p: f64 },
    /// Edit distance on graph structure
    StructuralEdit { max_edits: usize },
    /// Combined feature + structural
    Combined { feature_epsilon: f64, max_edits: usize },
}

pub struct RobustnessCertificate {
    pub epsilon: f64,
    pub perturbation_type: PerturbationType,
    pub output_bound: f64,               // Maximum output change
    pub certified: bool,                 // Whether the bound holds
    pub proof: Box<dyn VerifiableProof>, // Machine-checkable proof
    pub attestation: Attestation,
}
```

---

## 4. Compositional Verification

### 4.1 The Compositionality Problem

Real graph transformer systems are compositions of many layers, attention heads, and processing stages. Verifying the whole system monolithically is intractable. We need compositional verification: proofs about components that compose into proofs about the whole.

### 4.2 Verified Component Interfaces

Each graph transformer component declares its interface as a *contract*:

```lean
-- Component contract
structure AttentionContract where
  -- The component being specified
  component : Input → Graph → Output × AttentionMatrix

  -- Preconditions on the input
  input_bound : Float → Prop                  -- ||input|| <= B_in
  graph_property : Graph → Prop               -- graph satisfies property P

  -- Postconditions on the output
  output_bound : Float → Prop                 -- ||output|| <= B_out
  attention_property : AttentionMatrix → Prop -- attention satisfies Q

  -- Proof that the component satisfies the contract
  correctness : ∀ input graph,
    input_bound (norm input) →
    graph_property graph →
    let (output, attention) := component input graph
    output_bound (norm output) ∧ attention_property attention
```

### 4.3 Contract Composition

When components are composed sequentially, contracts compose via transitivity:

```
Component A: {P_A} -> {Q_A}   (if P_A holds for the input, Q_A holds for the output)
Component B: {P_B} -> {Q_B}   (if P_B holds for the input, Q_B holds for the output)

If Q_A implies P_B:
  Composition A;B: {P_A} -> {Q_B}

Proof: P_A -> Q_A   (by A's contract)
       Q_A -> P_B   (by the implication)
       P_B -> Q_B   (by B's contract)
Therefore P_A -> Q_B.  QED
```

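The sequential rule is small enough to state and prove directly. A minimal propositional version in Lean4, abstracting components as plain functions:

```lean
-- Sequential contract composition: if A maps P_A-inputs to Q_A-outputs,
-- Q_A implies P_B, and B maps P_B-inputs to Q_B-outputs,
-- then the composite A;B maps P_A-inputs to Q_B-outputs.
theorem compose_contracts {α β γ : Type}
    (P_A : α → Prop) (Q_A : β → Prop) (P_B : β → Prop) (Q_B : γ → Prop)
    (A : α → β) (B : β → γ)
    (hA : ∀ x, P_A x → Q_A (A x))
    (hAB : ∀ y, Q_A y → P_B y)
    (hB : ∀ y, P_B y → Q_B (B y)) :
    ∀ x, P_A x → Q_B (B (A x)) :=
  fun x hx => hB (A x) (hAB (A x) (hA x hx))
```
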
**For parallel composition (multi-head attention):**

```
Head 1: {P_1} -> {Q_1}
Head 2: {P_2} -> {Q_2}
...
Head H: {P_H} -> {Q_H}

If the inputs satisfy all P_i:
  Combined: {P_1 /\ P_2 /\ ... /\ P_H} -> {Q_1 /\ Q_2 /\ ... /\ Q_H}
```

### 4.4 Refinement Types for Graph Operations

Extend Rust's type system with refinement types that encode graph properties:

```rust
/// Refinement type: a graph with certified properties
pub struct VerifiedGraph<const N: usize, P: GraphProperty> {
    graph: PropertyGraph,
    property_witness: P::Witness,
}

/// Example properties
pub trait GraphProperty {
    type Witness;
    fn verify(graph: &PropertyGraph) -> Option<Self::Witness>;
}

pub struct Connected;
impl GraphProperty for Connected {
    type Witness = ConnectedProof;
    fn verify(graph: &PropertyGraph) -> Option<ConnectedProof> {
        /* BFS/DFS reachability check */
        todo!()
    }
}

pub struct Acyclic;
pub struct BipartiteWith<const K: usize>;
pub struct PlanarWith<const GENUS: usize>;
pub struct BoundedDegree<const MAX_DEG: usize>;
pub struct TreeWidth<const K: usize>;

/// Verified graph attention: only compiles if the types match
pub fn verified_attention<const N: usize, P: GraphProperty>(
    graph: VerifiedGraph<N, P>,
    features: Tensor,
) -> VerifiedAttention<N, P>
where
    P: SupportsAttention, // Trait bound: property P is compatible with attention
{
    // Implementation guaranteed to preserve property P
    todo!()
}
```

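A usage sketch of the gating pattern (constructor and error names are hypothetical): `VerifiedGraph` is only obtainable through `GraphProperty::verify`, so any function that accepts `VerifiedGraph<N, Connected>` is statically guaranteed a connectivity witness.

```rust
/// Illustrative error type for failed property checks.
pub enum UnverifiedError {
    PropertyDoesNotHold,
}

impl<const N: usize, P: GraphProperty> VerifiedGraph<N, P> {
    /// Smart constructor: the only way to obtain a VerifiedGraph<N, P>.
    /// The witness produced by P::verify is stored alongside the graph.
    pub fn new(graph: PropertyGraph) -> Result<Self, UnverifiedError> {
        match P::verify(&graph) {
            Some(witness) => Ok(Self { graph, property_witness: witness }),
            None => Err(UnverifiedError::PropertyDoesNotHold),
        }
    }
}

// Call site: the type parameter selects which property must be proven.
// let g: VerifiedGraph<1024, Connected> = VerifiedGraph::new(raw_graph)?;
// verified_attention(g, features); // compiles only with a Connected witness
```
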
---

## 5. Proof Generation Strategies

### 5.1 Strategy Comparison

| Strategy | Proof Size | Generation Time | Verification Time | Automation |
|----------|-----------|-----------------|-------------------|------------|
| SMT (Z3/CVC5) | Large | Slow (exp) | Fast | High |
| Interactive (Lean4) | Medium | Manual | Fast | Low |
| Certifiable training | Implicit | During training | Fast | High |
| Abstract interpretation | Large | Fast | Fast | High |
| Symbolic execution | Large | Medium | Medium | Medium |

### 5.2 Hybrid Approach for Graph Transformers

We recommend a hybrid approach:

1. **Compile-time**: Refinement types catch type errors (free, automatic)
2. **Train-time**: Certifiable training maintains invariants (small overhead)
3. **Deploy-time**: Abstract interpretation verifies robustness (one-time cost)
4. **Run-time**: 82-byte attestations certify each inference (minimal overhead)
5. **Audit-time**: Full Lean4 proofs for high-assurance properties (manual effort)

**The 82-byte attestation is the default**: every attention computation gets an attestation; full proofs are generated on demand for audit (see the gating sketch below).

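A sketch of how the run-time/audit-time split could be gated, loosely modeled on the gated verification in `ruvector-verified`'s `gated.rs`; the `VerificationTier` enum, `ProofStore` handle, and method names here are assumptions for illustration:

```rust
/// Post-deployment tiers of the hybrid strategy.
pub enum VerificationTier {
    /// Run-time default: check only the 82-byte attestation.
    Attestation,
    /// Audit-time: fetch and re-check the full machine-checkable proof.
    FullProof,
}

pub fn verify_inference(
    attention: &AttentionMatrix,
    attestation: &Attestation,
    tier: VerificationTier,
    proof_store: &ProofStore, // hypothetical handle to stored full proofs
) -> Result<bool, VerifyError> {
    match tier {
        // Cheap structural + commitment check on every inference.
        VerificationTier::Attestation => Ok(attestation.is_well_formed()),
        // Expensive path: look up the full proof by its SHA-256 commitment
        // and re-run the proof checker against this attention matrix.
        VerificationTier::FullProof => {
            let proof = proof_store.fetch(&attestation.proof_commitment)?;
            proof.check(attention)
        }
    }
}
```
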
---

## 6. Projections

### 6.1 By 2030

**Likely:**

- Certified adversarial robustness standard for safety-critical graph ML
- Refinement types for graph operations in production Rust codebases
- 82-byte attestations for every attention computation in regulated industries
- Verified softmax and basic attention layers in Lean4/Coq

**Possible:**

- Compositional verification of multi-layer graph transformers
- Certified convergence proofs for specific GNN training configurations
- Automated proof generation for common graph attention properties

**Speculative:**

- Full end-to-end verification of graph transformer inference
- Verified GNN training that provably converges to the global optimum (for convex subproblems)

### 6.2 By 2033

**Likely:**

- Formal verification as a standard CI/CD gate for graph ML models
- A Lean4 library for graph neural network verification
- Regulatory requirements for AI certification driving adoption

**Possible:**

- Real-time proof generation during inference (proofs computed alongside attention)
- Verified graph transformers for medical diagnosis (FDA certification)
- Compositional verification scaling to networks with 100+ layers

### 6.3 By 2036+

**Possible:**

- Proof-carrying graph transformer programs as the default
- Verified attention matching unverified attention in capability
- Mathematics-AI co-evolution: graph transformers discovering proofs, proofs verifying transformers

**Speculative:**

- Self-verifying graph transformers that generate their own correctness proofs
- A universal verification framework for arbitrary graph neural network properties
- Formal verification of emergent properties (consciousness, agency) in graph systems

---

## 7. RuVector Implementation Roadmap

### Phase 1: Foundation (2026-2027)

- Extend the `ruvector-verified` attestation protocol to attention matrices
- Implement refinement types for graph operations in Rust (via const generics + traits)
- Certified robustness via interval bound propagation for graph attention
- Lean4 bindings for RuVector graph types

### Phase 2: Compositional Verification (2027-2028)

- Contract-based composition of verified attention layers
- Invariant-preserving training loop
- Automated proof generation for Lipschitz bounds and monotonicity
- Integration with the `ruvector-gnn` training pipeline

### Phase 3: Production Certification (2028-2030)

- Real-time attestation generation during inference
- Regulatory compliance framework (medical, financial, autonomous)
- Full Lean4 proof library for graph attention properties
- Self-verifying attention modules

---
## References

1. Necula, "Proof-Carrying Code," POPL 1997.
2. Singh et al., "An Abstract Domain for Certifying Neural Networks," POPL 2019.
3. Gowal et al., "Scalable Verified Training for Provably Robust Image Classifiers," ICLR 2019.
4. Zügner & Günnemann, "Certifiable Robustness of Graph Convolutional Networks under Structure Perturbation," KDD 2020.
5. Bojchevski et al., "Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing," ICML 2020.
6. de Moura & Bjørner, "Z3: An Efficient SMT Solver," TACAS 2008.
7. The Lean 4 Theorem Prover, https://leanprover.github.io/
8. RuVector `ruvector-verified` documentation (internal)

---

**End of Document 26**

**Next:** [Doc 27 - Hyperbolic & Mixed-Curvature](27-hyperbolic-mixed-curvature.md)