Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
examples/prime-radiant/docs/adr/ADR-001-sheaf-cohomology.md
@@ -0,0 +1,333 @@
# ADR-001: Sheaf Cohomology for AI Coherence

**Status**: Accepted
**Date**: 2024-12-15
**Authors**: RuVector Team
**Supersedes**: None

---

## Context

Large Language Models and AI agents frequently produce outputs that are locally plausible but globally inconsistent. Traditional approaches to detecting such "hallucinations" rely on:

1. **Confidence scores**: Unreliable because models are often overconfident on out-of-distribution inputs
2. **Retrieval augmentation**: Helps, but does not verify consistency across the retrieved facts
3. **Chain-of-thought verification**: Manual and prone to the same failures as the original reasoning
4. **Ensemble methods**: Expensive and still vulnerable to correlated errors

We need a mathematical framework that can:

- Detect **local-to-global consistency** failures systematically
- Provide **quantitative measures** of coherence
- Support **incremental updates** as new information arrives
- Work across **multiple domains** with the same underlying math

### Why Sheaf Theory?

Sheaf theory was developed in algebraic geometry and topology precisely to handle local-to-global problems. A sheaf assigns data to open sets subject to three properties:

1. **Locality**: Two pieces of data on a set that agree on every member of an open cover of that set are equal
2. **Gluing**: Local data that agree on all overlaps can be assembled into data on the union
3. **Restriction**: Data on a larger set determines data on every smaller set it contains

These properties match our coherence requirements:

- AI claims are local (about specific facts)
- Coherent knowledge should glue together globally
- Contradictions appear when local data fails to extend globally

---

## Decision

We implement **cellular sheaf cohomology** on graphs as the mathematical foundation for Prime-Radiant's coherence engine.

### Mathematical Foundation

#### Definition: Sheaf on a Graph

A **cellular sheaf** F on a graph G = (V, E) assigns:

1. To each vertex v, a vector space F(v) (the **stalk** at v)
2. To each edge e = (u,v), a vector space F(e)
3. For each vertex v incident to edge e, a linear map (the **restriction map**):
   ```
   rho_{v,e}: F(v) -> F(e)
   ```

#### Definition: Residual

For an edge e = (u,v) with vertex states x_u in F(u) and x_v in F(v), the **residual** is:

```
r_e = rho_{u,e}(x_u) - rho_{v,e}(x_v)
```

The residual measures local inconsistency: if states agree through their restriction maps, r_e = 0.
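
The residual can be sketched in a few lines of dependency-free Rust. This is an illustration only: the helper names `apply` and `residual` and the nested-`Vec` matrix layout are assumptions made for the example, not the crate's API.

```rust
/// Apply a dense matrix (stored as rows) to a vector.
fn apply(matrix: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| row.iter().zip(x).map(|(m, xi)| m * xi).sum::<f32>())
        .collect()
}

/// Residual r_e = rho_{u,e}(x_u) - rho_{v,e}(x_v) for a single edge.
fn residual(rho_u: &[Vec<f32>], x_u: &[f32], rho_v: &[Vec<f32>], x_v: &[f32]) -> Vec<f32> {
    let a = apply(rho_u, x_u);
    let b = apply(rho_v, x_v);
    a.iter().zip(&b).map(|(p, q)| p - q).collect()
}

fn main() {
    // Both restriction maps project a 3-dimensional vertex stalk onto its
    // first two coordinates, so only those coordinates are compared.
    let proj = vec![vec![1.0_f32, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    let x_u = [1.0_f32, 2.0, 5.0];
    let x_v = [1.0_f32, 2.0, -7.0];
    let r = residual(&proj, &x_u, &proj, &x_v);
    println!("residual = {:?}", r);
}
```

Here the two states differ only in the third coordinate, which the restriction maps discard, so the residual vanishes even though the vertex states are not equal.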

#### Definition: Sheaf Laplacian

The **sheaf Laplacian** L is the block matrix:

```
L = D^T W D
```

where:
- D is the coboundary map (encodes graph topology and restriction maps); its block for edge e computes the residual, (Dx)_e = r_e
- W is a diagonal weight matrix for edges, applying the weight w_e to every coordinate of the stalk F(e)

The quadratic form x^T L x = sum_e w_e ||r_e||^2 computes total coherence energy.
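
The identity between the quadratic form and the edge-wise sum can be checked numerically on a tiny graph. The sketch below assumes scalar stalks, identity restriction maps, and a sign convention of +1 at an edge's source and -1 at its target; the function names are hypothetical and chosen for the example.

```rust
/// Dense matrix stored as rows.
type Matrix = Vec<Vec<f32>>;

/// L = D^T W D, where `w` holds the diagonal of W (one weight per row of D).
fn laplacian(d: &Matrix, w: &[f32]) -> Matrix {
    let n = d[0].len();
    let mut l = vec![vec![0.0; n]; n];
    for (row, &weight) in d.iter().zip(w) {
        for i in 0..n {
            for j in 0..n {
                l[i][j] += weight * row[i] * row[j];
            }
        }
    }
    l
}

/// Quadratic form x^T L x.
fn quadratic_form(l: &Matrix, x: &[f32]) -> f32 {
    let mut total = 0.0;
    for i in 0..x.len() {
        for j in 0..x.len() {
            total += x[i] * l[i][j] * x[j];
        }
    }
    total
}

/// Edge-wise energy sum_e w_e * (D x)_e^2, computed without forming L.
fn edge_energy(d: &Matrix, w: &[f32], x: &[f32]) -> f32 {
    d.iter()
        .zip(w)
        .map(|(row, weight)| {
            let r: f32 = row.iter().zip(x).map(|(a, b)| a * b).sum();
            weight * r * r
        })
        .sum()
}

fn main() {
    // Path graph 0 - 1 - 2 with scalar stalks and identity restriction maps.
    let d: Matrix = vec![vec![1.0, -1.0, 0.0], vec![0.0, 1.0, -1.0]];
    let w = [2.0_f32, 0.5];
    let x = [3.0_f32, 1.0, 4.0];
    let l = laplacian(&d, &w);
    println!("x^T L x  = {}", quadratic_form(&l, &x));
    println!("edge sum = {}", edge_energy(&d, &w, &x));
}
```

Both expressions give 12.5 here: the residuals are 2 and -3, so the weighted sum is 2 * 4 + 0.5 * 9.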
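
#### Definition: Cohomology Groups

The **first cohomology group** H^1(G, F) measures the obstruction to realizing edge data from a global assignment of vertex states:

```
H^1(G, F) = ker(delta_1) / im(delta_0)
```

where delta_i are the coboundary maps. A graph has no 2-cells, so delta_1 = 0 and H^1(G, F) = C^1(G, F) / im(delta_0). Global sections are the elements of H^0(G, F) = ker(delta_0), which always contains the zero section. If H^1 is non-trivial, some assignment of edge data cannot be written as delta_0 x for any vertex assignment x, which is the obstruction we interpret as global inconsistency.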
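
For small examples the dimensions of H^0 and H^1 follow directly from the rank of the coboundary matrix D = delta_0: dim H^0 = (total vertex stalk dimension) - rank(D) and dim H^1 = (total edge stalk dimension) - rank(D). The sketch below is a toy illustration that uses Gaussian elimination rather than an SVD; `rank` and `cohomology_dims` are hypothetical helpers, and the tolerance is an arbitrary choice for the example.

```rust
/// Numerical rank via Gaussian elimination with partial pivoting.
fn rank(mut m: Vec<Vec<f32>>, tol: f32) -> usize {
    let rows = m.len();
    let cols = m.first().map_or(0, |row| row.len());
    let mut found = 0; // number of pivots found so far
    for col in 0..cols {
        if found == rows {
            break;
        }
        // Choose the largest remaining entry in this column as the pivot.
        let pivot = (found..rows)
            .max_by(|&a, &b| m[a][col].abs().total_cmp(&m[b][col].abs()))
            .unwrap();
        if m[pivot][col].abs() <= tol {
            continue;
        }
        m.swap(found, pivot);
        for r in (found + 1)..rows {
            let factor = m[r][col] / m[found][col];
            for c in col..cols {
                let above = m[found][c];
                m[r][c] -= factor * above;
            }
        }
        found += 1;
    }
    found
}

/// (dim H^0, dim H^1) from the coboundary matrix D, which has one column per
/// vertex-stalk coordinate and one row per edge-stalk coordinate.
fn cohomology_dims(d: &[Vec<f32>]) -> (usize, usize) {
    let edge_dim = d.len();
    let vertex_dim = d.first().map_or(0, |row| row.len());
    let r = rank(d.to_vec(), 1e-6);
    (vertex_dim - r, edge_dim - r)
}

fn main() {
    // Constant sheaf with 1-dimensional stalks (identity restriction maps).
    // Path 0 - 1 - 2: a tree, so there are no cycles.
    let path = vec![vec![1.0, -1.0, 0.0], vec![0.0, 1.0, -1.0]];
    // Triangle 0 - 1 - 2 - 0: one independent cycle.
    let triangle = vec![
        vec![1.0, -1.0, 0.0],
        vec![0.0, 1.0, -1.0],
        vec![-1.0, 0.0, 1.0],
    ];
    println!("path:     {:?}", cohomology_dims(&path));
    println!("triangle: {:?}", cohomology_dims(&triangle));
}
```

For the constant sheaf, dim H^1 counts independent cycles: the path has none, while the triangle has one, so edge data with non-zero circulation around the triangle cannot come from any vertex assignment.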

### Implementation Architecture

```rust
use std::collections::HashMap;

use ndarray::{Array2, ArrayView1};

// NodeId, EdgeId, StateVector, SheafEdge, and LaplacianCache are not shown in this excerpt.

/// A sheaf on a graph with fixed-dimensional stalks
pub struct SheafGraph {
    /// Node stalks: state vectors at each vertex
    nodes: HashMap<NodeId, StateVector>,

    /// Edge stalks and restriction maps
    edges: HashMap<EdgeId, SheafEdge>,

    /// Cached Laplacian blocks for incremental updates
    laplacian_cache: LaplacianCache,
}

/// A restriction map implemented as a matrix
pub struct RestrictionMap {
    /// The linear map as a matrix (output_dim x input_dim)
    matrix: Array2<f32>,

    /// Input dimension (node stalk dimension)
    input_dim: usize,

    /// Output dimension (edge stalk dimension)
    output_dim: usize,
}

impl RestrictionMap {
    /// Apply the restriction map: rho(x)
    pub fn apply(&self, x: &[f32]) -> Vec<f32> {
        self.matrix.dot(&ArrayView1::from(x)).to_vec()
    }

    /// Identity restriction (node stalk = edge stalk)
    pub fn identity(dim: usize) -> Self {
        Self {
            matrix: Array2::eye(dim),
            input_dim: dim,
            output_dim: dim,
        }
    }

    /// Projection restriction: the edge stalk keeps the leading
    /// coordinates of the node stalk
    pub fn projection(input_dim: usize, output_dim: usize) -> Self {
        let mut matrix = Array2::zeros((output_dim, input_dim));
        for i in 0..output_dim.min(input_dim) {
            matrix[[i, i]] = 1.0;
        }
        Self { matrix, input_dim, output_dim }
    }
}
```

### Cohomology Computation

```rust
/// Compute the dimension of the first cohomology group H^1
pub fn cohomology_dimension(&self) -> usize {
    // Build coboundary matrix D
    let d = self.build_coboundary_matrix();

    // Compute rank using SVD
    let svd = d.svd(true, true).unwrap();

    // Use a cutoff relative to the largest singular value. Assuming f32
    // entries (as in the restriction maps), an absolute cutoff of 1e-10
    // is below machine precision and would count rounding noise as rank.
    let sigma_max = svd.singular_values
        .iter()
        .cloned()
        .fold(0.0_f32, f32::max);
    let tol = sigma_max * 1e-5;
    let rank = svd.singular_values
        .iter()
        .filter(|&s| *s > tol)
        .count();

    // dim H^1 = dim(edge stalks) - rank(D)
    let edge_dim: usize = self.edges.values()
        .map(|e| e.stalk_dim)
        .sum();

    edge_dim.saturating_sub(rank)
}

/// Check that H^1 vanishes, i.e. every assignment of edge data is the
/// residual of some vertex state (no cohomological obstruction)
pub fn has_global_section(&self) -> bool {
    self.cohomology_dimension() == 0
}
```

### Energy Computation

The total coherence energy is:

```rust
/// Compute total coherence energy: E = sum_e w_e ||r_e||^2
pub fn coherence_energy(&self) -> f32 {
    self.edges.values()
        .map(|edge| {
            let source = &self.nodes[&edge.source];
            let target = &self.nodes[&edge.target];

            // Apply restriction maps
            let rho_s = edge.source_restriction.apply(&source.state);
            let rho_t = edge.target_restriction.apply(&target.state);

            // Compute residual
            let residual: Vec<f32> = rho_s.iter()
                .zip(rho_t.iter())
                .map(|(a, b)| a - b)
                .collect();

            // Weighted squared norm
            let norm_sq: f32 = residual.iter().map(|r| r * r).sum();
            edge.weight * norm_sq
        })
        .sum()
}
```

### Incremental Updates

For efficiency, we maintain a **residual cache** and update incrementally:

```rust
/// Update a single node and recompute affected energies
pub fn update_node(&mut self, node_id: NodeId, new_state: Vec<f32>) {
    // Store old state for delta computation
    let old_state = self.nodes.insert(node_id, new_state.clone());

    // Only recompute residuals for edges incident to this node
    for edge_id in self.edges_incident_to(node_id) {
        self.recompute_residual(edge_id);
    }

    // Update fingerprint
    self.update_fingerprint(node_id, &old_state, &new_state);
}
```
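
To illustrate why a per-edge cache makes node updates cheap, here is a self-contained miniature. It is a sketch under simplifying assumptions (identity restriction maps, a linear scan to find incident edges); `MiniSheaf`, `Edge`, and their fields are hypothetical names, not the types used by Prime-Radiant.

```rust
use std::collections::HashMap;

/// Hypothetical edge record: identity restriction maps on both ends.
struct Edge {
    source: usize,
    target: usize,
    weight: f32,
}

/// Hypothetical miniature of the coherence graph with a per-edge energy cache.
struct MiniSheaf {
    nodes: HashMap<usize, Vec<f32>>,
    edges: Vec<Edge>,
    edge_energy: Vec<f32>, // cached w_e * ||r_e||^2, one entry per edge
    total: f32,            // cached sum of edge_energy
}

impl MiniSheaf {
    fn new(nodes: HashMap<usize, Vec<f32>>, edges: Vec<Edge>) -> Self {
        let mut g = MiniSheaf { nodes, edges, edge_energy: Vec::new(), total: 0.0 };
        g.edge_energy = (0..g.edges.len()).map(|i| g.energy_of(i)).collect();
        g.total = g.edge_energy.iter().sum();
        g
    }

    /// Weighted squared residual of one edge, recomputed from scratch.
    fn energy_of(&self, i: usize) -> f32 {
        let e = &self.edges[i];
        let (a, b) = (&self.nodes[&e.source], &self.nodes[&e.target]);
        e.weight * a.iter().zip(b).map(|(p, q)| (p - q) * (p - q)).sum::<f32>()
    }

    /// Replace one node's state and refresh only the incident edges.
    /// A real implementation would look incident edges up in an adjacency
    /// index instead of scanning every edge.
    fn update_node(&mut self, node: usize, state: Vec<f32>) {
        self.nodes.insert(node, state);
        for i in 0..self.edges.len() {
            if self.edges[i].source == node || self.edges[i].target == node {
                let fresh = self.energy_of(i);
                self.total += fresh - self.edge_energy[i];
                self.edge_energy[i] = fresh;
            }
        }
    }
}

fn main() {
    let nodes = HashMap::from([
        (0, vec![1.0, 0.0]),
        (1, vec![1.0, 0.0]),
        (2, vec![0.0, 1.0]),
    ]);
    let edges = vec![
        Edge { source: 0, target: 1, weight: 1.0 },
        Edge { source: 1, target: 2, weight: 1.0 },
    ];
    let mut g = MiniSheaf::new(nodes, edges);
    println!("before: {}", g.total);
    g.update_node(2, vec![1.0, 0.0]);
    println!("after:  {}", g.total);
}
```

Because the running total is adjusted by differences, floating-point drift can accumulate over many updates; periodically recomputing the total from the cached per-edge energies is one way to bound it.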

---

## Consequences

### Positive

1. **Mathematically Grounded**: Sheaf cohomology provides rigorous foundations for coherence
2. **Domain Agnostic**: Same math applies to facts, financial signals, medical data, etc.
3. **Local-to-Global Detection**: Naturally captures the essence of hallucination (local OK, global wrong)
4. **Incremental Computation**: Residual caching enables real-time updates
5. **Spectral Analysis**: Sheaf Laplacian eigenvalues provide drift detection
6. **Quantitative Measure**: Energy gives a continuous coherence score, not just binary

### Negative

1. **Computational Cost**: Full cohomology computation with a dense SVD is O(n^3) for n nodes (assuming fixed stalk dimension and O(n) edges)
2. **Restriction Map Design**: Choosing appropriate rho requires domain knowledge
3. **Curse of Dimensionality**: High-dimensional stalks increase memory and compute
4. **Learning Complexity**: Non-trivial to learn restriction maps from data

### Mitigations

1. **Incremental Updates**: Avoid full recomputation for small changes
2. **Learned rho**: GNN-based restriction map learning (see `learned-rho` feature)
3. **Dimensional Reduction**: Use projection restriction maps to reduce edge stalk dimension
4. **Subpolynomial MinCut**: Use for approximation when full computation is infeasible

---
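
## Mathematical Properties

### Theorem: Energy Minimization

If the coboundary map D has full column rank on the free (non-boundary) coordinates, so that the sheaf Laplacian L restricted to those coordinates is positive definite, the minimum energy configuration is unique:

```
x* = argmin_x ||Dx||^2_W = L^+ b
```

where L^+ is the pseudoinverse (equal to the inverse in this case) and b encodes boundary conditions. Without boundary conditions and with positive weights, the minimizers are exactly ker(L) = ker(D), the global sections, and x = 0 is always among them.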
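
One way to make "b encodes boundary conditions" concrete is the following derivation. The notation is an assumption introduced for illustration: the coordinates of x are split into free coordinates U and boundary coordinates B whose values x_B are held fixed, and L is partitioned into the matching blocks.

```latex
L = \begin{pmatrix} L_{UU} & L_{UB} \\ L_{BU} & L_{BB} \end{pmatrix}, \qquad
E(x_U) = x_U^{T} L_{UU}\, x_U + 2\, x_U^{T} L_{UB}\, x_B + x_B^{T} L_{BB}\, x_B

\nabla_{x_U} E = 2 L_{UU}\, x_U + 2 L_{UB}\, x_B = 0
\quad\Longrightarrow\quad
x_U^{*} = -L_{UU}^{-1} L_{UB}\, x_B
```

In this reading the matrix being inverted is L_UU and b = -L_UB x_B; the minimizer is unique exactly when L_UU is invertible.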
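
### Theorem: Cheeger Inequality

The spectral gap (second smallest eigenvalue) of L relates to graph cuts:

```
lambda_2 / 2 <= h(G) <= sqrt(2 * lambda_2)
```

where h(G) is the Cheeger constant (conductance) and lambda_2 is taken from the normalized Laplacian. This is the classical bound for ordinary graphs, which correspond to the constant sheaf with one-dimensional stalks; for sheaves with non-trivial restriction maps it is not guaranteed to hold in this form and should be read as a heuristic. This enables **cut prediction** from spectral analysis.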
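
### Theorem: Hodge Decomposition

The space of edge states decomposes orthogonally:

```
C^1(G, F) = im(delta_0) (+) im(delta_1^T) (+) ker(Delta_1)
ker(Delta_1) ~= H^1(G, F)
```

where (+) denotes an orthogonal direct sum and Delta_1 = delta_0 delta_0^T + delta_1^T delta_1 is the Hodge Laplacian on edge states. This separates gradient parts (edge data explained by vertex states), co-gradient parts (zero on a graph, where delta_1 = 0), and harmonic forms, which represent the cohomology classes (obstructions).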
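
On a graph the decomposition reduces to splitting an edge cochain y into a part in im(D) and a part in ker(D^T), where D = delta_0. The sketch below computes that split by plain gradient descent on ||D x - y||^2; this solver is a choice made for brevity, and `hodge_split`, the step size, and the iteration count are illustrative assumptions.

```rust
/// Dense matrix stored as rows.
type Matrix = Vec<Vec<f32>>;

fn mat_vec(m: &Matrix, x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

fn transpose(m: &Matrix) -> Matrix {
    let cols = m[0].len();
    (0..cols)
        .map(|j| m.iter().map(|row| row[j]).collect::<Vec<f32>>())
        .collect()
}

/// Split an edge cochain y into (gradient part, harmonic part): the gradient
/// part lies in im(D) and the harmonic part lies in ker(D^T).
/// `step` must be below 2 / lambda_max(D^T D) for the iteration to converge.
fn hodge_split(d: &Matrix, y: &[f32], step: f32, iters: usize) -> (Vec<f32>, Vec<f32>) {
    let dt = transpose(d);
    let mut x = vec![0.0_f32; d[0].len()];
    for _ in 0..iters {
        let dx = mat_vec(d, &x);
        let err: Vec<f32> = dx.iter().zip(y).map(|(a, b)| a - b).collect();
        let grad = mat_vec(&dt, &err);
        for (xi, gi) in x.iter_mut().zip(&grad) {
            *xi -= step * gi;
        }
    }
    let gradient_part = mat_vec(d, &x);
    let harmonic_part: Vec<f32> = y.iter().zip(&gradient_part).map(|(a, b)| a - b).collect();
    (gradient_part, harmonic_part)
}

fn main() {
    // Triangle with scalar stalks, edges oriented 0->1, 1->2, 2->0.
    let d: Matrix = vec![
        vec![1.0, -1.0, 0.0],
        vec![0.0, 1.0, -1.0],
        vec![-1.0, 0.0, 1.0],
    ];
    let (gradient, harmonic) = hodge_split(&d, &[1.0, 2.0, 3.0], 0.25, 100);
    println!("gradient part: {:?}", gradient);
    println!("harmonic part: {:?}", harmonic);
}
```

For this triangle ker(D^T) is spanned by [1, 1, 1], so the harmonic part of y = [1, 2, 3] is its mean circulation, approximately [2, 2, 2], and the gradient part is approximately [-1, 0, 1].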

---

## Related Decisions

- [ADR-004: Spectral Invariants](ADR-004-spectral-invariants.md) - Uses sheaf Laplacian eigenvalues
- [ADR-002: Category Theory](ADR-002-category-topos.md) - Sheaves are presheaves satisfying gluing
- [ADR-003: Homotopy Type Theory](ADR-003-homotopy-type-theory.md) - Higher sheaves and stacks

---

## References

1. Hansen, J., & Ghrist, R. (2019). "Toward a spectral theory of cellular sheaves." Journal of Applied and Computational Topology.

2. Curry, J. (2014). "Sheaves, Cosheaves and Applications." PhD thesis, University of Pennsylvania.

3. Robinson, M. (2014). "Topological Signal Processing." Springer.

4. Bodnar, C., et al. (2022). "Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs." NeurIPS.

5. Ghrist, R. (2014). "Elementary Applied Topology." Createspace.

---

## Appendix: Worked Example

Consider a knowledge graph with three facts:

- F1: "Paris is the capital of France" (state: [1, 0, 0, 1])
- F2: "France is in Europe" (state: [0, 1, 1, 0])
- F3: "Paris is not in Europe" (state: [1, 0, 0, -1]) -- HALLUCINATION

Edges with identity restriction maps and unit weights:
- E1: F1 -> F2 (France connection)
- E2: F1 -> F3 (Paris connection)
- E3: F2 -> F3 (Europe connection)

Residuals:
- r_{E1} = [1,0,0,1] - [0,1,1,0] = [1,-1,-1,1], ||r||^2 = 4
- r_{E2} = [1,0,0,1] - [1,0,0,-1] = [0,0,0,2], ||r||^2 = 4
- r_{E3} = [0,1,1,0] - [1,0,0,-1] = [-1,1,1,1], ||r||^2 = 4

Total energy = 4 + 4 + 4 = 12 (HIGH -- indicates hallucination)

If F3 were corrected to "Paris is in Europe" (state: [1,0,1,1]), the residuals on the two edges incident to F3 change:
- r_{E2} = [1,0,0,1] - [1,0,1,1] = [0,0,-1,0], ||r||^2 = 1
- r_{E3} = [0,1,1,0] - [1,0,1,1] = [-1,1,0,-1], ||r||^2 = 3

Total energy = 4 + 1 + 3 = 8, down from 12, indicating better coherence.
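
The arithmetic of this example can be reproduced with a few lines of Rust. This is a sketch that hard-codes the appendix's assumptions (identity restriction maps, unit weights, three edges forming a triangle); the function names are illustrative.

```rust
/// ||r_e||^2 for one edge with identity restriction maps and unit weight:
/// the squared Euclidean distance between the two endpoint states.
fn edge_energy(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(p, q)| (p - q) * (p - q)).sum()
}

/// Total energy over the edges E1 (F1-F2), E2 (F1-F3), and E3 (F2-F3).
fn total_energy(f1: &[f32], f2: &[f32], f3: &[f32]) -> f32 {
    edge_energy(f1, f2) + edge_energy(f1, f3) + edge_energy(f2, f3)
}

fn main() {
    let f1 = [1.0_f32, 0.0, 0.0, 1.0];
    let f2 = [0.0_f32, 1.0, 1.0, 0.0];
    let f3_hallucinated = [1.0_f32, 0.0, 0.0, -1.0];
    let f3_corrected = [1.0_f32, 0.0, 1.0, 1.0];

    println!("with hallucination: {}", total_energy(&f1, &f2, &f3_hallucinated));
    println!("after correction:   {}", total_energy(&f1, &f2, &f3_corrected));
}
```

The totals are 12 with the hallucinated F3 and 8 after the correction. Note that E1 contributes 4 in both cases, because the encodings of F1 and F2 differ even though the facts are compatible, so absolute energy values depend on how states are encoded.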