# ADR-001: Sheaf Cohomology for AI Coherence **Status**: Accepted **Date**: 2024-12-15 **Authors**: RuVector Team **Supersedes**: None --- ## Context Large Language Models and AI agents frequently produce outputs that are locally plausible but globally inconsistent. Traditional approaches to detecting such "hallucinations" rely on: 1. **Confidence scores**: Unreliable due to overconfidence on out-of-distribution inputs 2. **Retrieval augmentation**: Helps but doesn't verify consistency across retrieved facts 3. **Chain-of-thought verification**: Manual and prone to same failures as original reasoning 4. **Ensemble methods**: Expensive and still vulnerable to correlated errors We need a mathematical framework that can: - Detect **local-to-global consistency** failures systematically - Provide **quantitative measures** of coherence - Support **incremental updates** as new information arrives - Work across **multiple domains** with the same underlying math ### Why Sheaf Theory? Sheaf theory was developed in algebraic geometry and topology precisely to handle local-to-global problems. A sheaf assigns data to open sets in a way that: 1. **Locality**: Information at a point is determined by nearby information 2. **Gluing**: Locally consistent data can be assembled into global data 3. **Restriction**: Global data determines local data uniquely These properties exactly match our coherence requirements: - AI claims are local (about specific facts) - Coherent knowledge should glue together globally - Contradictions appear when local data fails to extend globally --- ## Decision We implement **cellular sheaf cohomology** on graphs as the mathematical foundation for Prime-Radiant's coherence engine. ### Mathematical Foundation #### Definition: Sheaf on a Graph A **cellular sheaf** F on a graph G = (V, E) assigns: 1. To each vertex v, a vector space F(v) (the **stalk** at v) 2. To each edge e = (u,v), a vector space F(e) 3. For each vertex v incident to edge e, a linear map (the **restriction map**): ``` rho_{v,e}: F(v) -> F(e) ``` #### Definition: Residual For an edge e = (u,v) with vertex states x_u in F(u) and x_v in F(v), the **residual** is: ``` r_e = rho_{u,e}(x_u) - rho_{v,e}(x_v) ``` The residual measures local inconsistency: if states agree through their restriction maps, r_e = 0. #### Definition: Sheaf Laplacian The **sheaf Laplacian** L is the block matrix: ``` L = D^T W D ``` where: - D is the coboundary map (encodes graph topology and restriction maps) - W is a diagonal weight matrix for edges The quadratic form x^T L x = sum_e w_e ||r_e||^2 computes total coherence energy. #### Definition: Cohomology Groups The **first cohomology group** H^1(G, F) measures obstruction to finding a global section: ``` H^1(G, F) = ker(delta_1) / im(delta_0) ``` where delta_i are coboundary maps. If H^1 is non-trivial, the sheaf admits no global section (global inconsistency exists). ### Implementation Architecture ```rust /// A sheaf on a graph with fixed-dimensional stalks pub struct SheafGraph { /// Node stalks: state vectors at each vertex nodes: HashMap, /// Edge stalks and restriction maps edges: HashMap, /// Cached Laplacian blocks for incremental updates laplacian_cache: LaplacianCache, } /// A restriction map implemented as a matrix pub struct RestrictionMap { /// The linear map as a matrix (output_dim x input_dim) matrix: Array2, /// Input dimension (node stalk dimension) input_dim: usize, /// Output dimension (edge stalk dimension) output_dim: usize, } impl RestrictionMap { /// Apply the restriction map: rho(x) pub fn apply(&self, x: &[f32]) -> Vec { self.matrix.dot(&ArrayView1::from(x)).to_vec() } /// Identity restriction (node stalk = edge stalk) pub fn identity(dim: usize) -> Self { Self { matrix: Array2::eye(dim), input_dim: dim, output_dim: dim, } } /// Projection restriction (edge stalk is subset of node stalk) pub fn projection(input_dim: usize, output_dim: usize) -> Self { let mut matrix = Array2::zeros((output_dim, input_dim)); for i in 0..output_dim.min(input_dim) { matrix[[i, i]] = 1.0; } Self { matrix, input_dim, output_dim } } } ``` ### Cohomology Computation ```rust /// Compute the first cohomology dimension pub fn cohomology_dimension(&self) -> usize { // Build coboundary matrix D let d = self.build_coboundary_matrix(); // Compute rank using SVD let svd = d.svd(true, true).unwrap(); let rank = svd.singular_values .iter() .filter(|&s| *s > 1e-10) .count(); // dim H^1 = dim(edge stalks) - rank(D) let edge_dim: usize = self.edges.values() .map(|e| e.stalk_dim) .sum(); edge_dim.saturating_sub(rank) } /// Check if sheaf admits a global section pub fn has_global_section(&self) -> bool { self.cohomology_dimension() == 0 } ``` ### Energy Computation The total coherence energy is: ```rust /// Compute total coherence energy: E = sum_e w_e ||r_e||^2 pub fn coherence_energy(&self) -> f32 { self.edges.values() .map(|edge| { let source = &self.nodes[&edge.source]; let target = &self.nodes[&edge.target]; // Apply restriction maps let rho_s = edge.source_restriction.apply(&source.state); let rho_t = edge.target_restriction.apply(&target.state); // Compute residual let residual: Vec = rho_s.iter() .zip(rho_t.iter()) .map(|(a, b)| a - b) .collect(); // Weighted squared norm let norm_sq: f32 = residual.iter().map(|r| r * r).sum(); edge.weight * norm_sq }) .sum() } ``` ### Incremental Updates For efficiency, we maintain a **residual cache** and update incrementally: ```rust /// Update a single node and recompute affected energies pub fn update_node(&mut self, node_id: NodeId, new_state: Vec) { // Store old state for delta computation let old_state = self.nodes.insert(node_id, new_state.clone()); // Only recompute residuals for edges incident to this node for edge_id in self.edges_incident_to(node_id) { self.recompute_residual(edge_id); } // Update fingerprint self.update_fingerprint(node_id, &old_state, &new_state); } ``` --- ## Consequences ### Positive 1. **Mathematically Grounded**: Sheaf cohomology provides rigorous foundations for coherence 2. **Domain Agnostic**: Same math applies to facts, financial signals, medical data, etc. 3. **Local-to-Global Detection**: Naturally captures the essence of hallucination (local OK, global wrong) 4. **Incremental Computation**: Residual caching enables real-time updates 5. **Spectral Analysis**: Sheaf Laplacian eigenvalues provide drift detection 6. **Quantitative Measure**: Energy gives a continuous coherence score, not just binary ### Negative 1. **Computational Cost**: Full cohomology computation is O(n^3) for n nodes 2. **Restriction Map Design**: Choosing appropriate rho requires domain knowledge 3. **Curse of Dimensionality**: High-dimensional stalks increase memory and compute 4. **Learning Complexity**: Non-trivial to learn restriction maps from data ### Mitigations 1. **Incremental Updates**: Avoid full recomputation for small changes 2. **Learned rho**: GNN-based restriction map learning (see `learned-rho` feature) 3. **Dimensional Reduction**: Use projection restriction maps to reduce edge stalk dimension 4. **Subpolynomial MinCut**: Use for approximation when full computation is infeasible --- ## Mathematical Properties ### Theorem: Energy Minimization If the sheaf Laplacian L has full column rank, the minimum energy configuration is unique: ``` x* = argmin_x ||Dx||^2_W = L^+ b ``` where L^+ is the pseudoinverse and b encodes boundary conditions. ### Theorem: Cheeger Inequality The spectral gap (second smallest eigenvalue) of L relates to graph cuts: ``` lambda_2 / 2 <= h(G) <= sqrt(2 * lambda_2) ``` where h(G) is the Cheeger constant. This enables **cut prediction** from spectral analysis. ### Theorem: Hodge Decomposition The space of edge states decomposes: ``` C^1(G, F) = im(delta_0) + ker(delta_1) + H^1(G, F) ``` This separates gradient flows (consistent), harmonic forms (neutral), and cohomology (obstructions). --- ## Related Decisions - [ADR-004: Spectral Invariants](ADR-004-spectral-invariants.md) - Uses sheaf Laplacian eigenvalues - [ADR-002: Category Theory](ADR-002-category-topos.md) - Sheaves are presheaves satisfying gluing - [ADR-003: Homotopy Type Theory](ADR-003-homotopy-type-theory.md) - Higher sheaves and stacks --- ## References 1. Hansen, J., & Ghrist, R. (2019). "Toward a spectral theory of cellular sheaves." Journal of Applied and Computational Topology. 2. Curry, J. (2014). "Sheaves, Cosheaves and Applications." PhD thesis, University of Pennsylvania. 3. Robinson, M. (2014). "Topological Signal Processing." Springer. 4. Bodnar, C., et al. (2022). "Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs." NeurIPS. 5. Ghrist, R. (2014). "Elementary Applied Topology." Createspace. --- ## Appendix: Worked Example Consider a knowledge graph with three facts: - F1: "Paris is the capital of France" (state: [1, 0, 0, 1]) - F2: "France is in Europe" (state: [0, 1, 1, 0]) - F3: "Paris is not in Europe" (state: [1, 0, 0, -1]) -- HALLUCINATION Edges with identity restriction maps: - E1: F1 -> F2 (France connection) - E2: F1 -> F3 (Paris connection) - E3: F2 -> F3 (Europe connection) Residuals: - r_{E1} = [1,0,0,1] - [0,1,1,0] = [1,-1,-1,1], ||r||^2 = 4 - r_{E2} = [1,0,0,1] - [1,0,0,-1] = [0,0,0,2], ||r||^2 = 4 - r_{E3} = [0,1,1,0] - [1,0,0,-1] = [-1,1,1,1], ||r||^2 = 4 Total energy = 4 + 4 + 4 = 12 (HIGH -- indicates hallucination) If F3 were corrected to "Paris is in Europe" (state: [1,0,1,1]): - r_{E3} = [0,1,1,0] - [1,0,1,1] = [-1,1,0,-1], ||r||^2 = 3 Energy decreases, indicating better coherence.