Files
wifi-densepose/vendor/ruvector/docs/research/gnn-v2/27-hyperbolic-mixed-curvature-graph-transformers.md

29 KiB

Hyperbolic and Mixed-Curvature Graph Transformers: Product Manifold Attention

Overview

Problem Statement

Graph Transformers have become the dominant architecture for learning on relational data, yet nearly all deployed systems operate in flat Euclidean space. This is a geometric mismatch: most real-world graphs are not flat.

Why Euclidean space fails for real-world graphs:

  1. Power-law degree distributions (social networks, citation graphs, the web) exhibit tree-like branching that requires exponentially many dimensions in Euclidean space to embed without distortion. A binary tree of depth d has 2^d leaves, but fitting them equidistantly in \mathbb{R}^n requires n \geq 2^d - 1 dimensions.
  2. Hierarchical structures (taxonomies, organizational charts, ontologies) naturally live in hyperbolic space, where the volume of a ball grows exponentially with radius -- matching the exponential growth of tree levels.
  3. Cyclic substructures (molecular rings, periodic lattices, social cliques) have positive curvature and embed naturally on spheres S^n.
  4. Hybrid graphs (knowledge graphs combining hierarchies with lateral associations) require multiple curvature regimes simultaneously.

The consequence: flat-space Graph Transformers waste capacity representing geometric structure that is free in the correct curved space, leading to higher distortion, larger models, and slower convergence.

Proposed Solution

Develop Product Manifold Graph Transformers that operate natively on mixed-curvature spaces. The core decomposition is:

\mathcal{M} = S^{n_1} \times H^{n_2} \times \mathbb{R}^{n_3}

where S^{n_1} captures cyclic/clustered structure, H^{n_2} captures hierarchical structure, and \mathbb{R}^{n_3} captures flat semantic similarity. Every component of the attention mechanism -- queries, keys, values, aggregation, and optimization -- operates in its geometrically appropriate space.

Connection to RuVector

RuVector already has substantial infrastructure for this research direction:

  • ruvector-attention/src/hyperbolic/: Poincare ball operations (poincare.rs), Lorentz cascade attention with Busemann scoring and Einstein midpoint (lorentz_cascade.rs), mixed-curvature attention (mixed_curvature.rs)
  • ruvector-attention/src/curvature/: Fused E x H x S attention (fused_attention.rs), tangent space mapping (tangent_space.rs), component quantizer (component_quantizer.rs)
  • ruvector-attention/src/transport/: Sliced Wasserstein and centroid optimal transport attention
  • ruvector-attention/src/topology/: Topology-gated attention with coherence metrics
  • ruvector-graph/: Full property graph with Cypher queries, distributed federation, hybrid vector-graph search
  • ruvector-solver/: Sublinear graph solvers (forward/backward push, CG, random walk, BMSSP)

This document extends RuVector's existing mixed-curvature capabilities toward full product manifold Graph Transformers with learned curvature fields.


Technical Deep Dive

1. Hyperbolic Graph Transformers

Poincare Ball Attention

In the Poincare ball model \mathbb{B}^n_c = \{x \in \mathbb{R}^n : c\|x\|^2 < 1\}, the standard dot-product attention \text{softmax}(QK^T / \sqrt{d}) is replaced with geodesic attention:

\alpha_{ij} = \frac{\exp(-d_{\mathbb{B}}(q_i, k_j) / \tau)}{\sum_l \exp(-d_{\mathbb{B}}(q_i, k_l) / \tau)}

where d_{\mathbb{B}}(x, y) = \frac{1}{\sqrt{c}} \text{arcosh}\left(1 + \frac{2c\|x - y\|^2}{(1 - c\|x\|^2)(1 - c\|y\|^2)}\right).

RuVector's poincare.rs already implements this with numerical stability via epsilon-buffered projection. The key insight from Lorentz cascade attention (lorentz_cascade.rs) is that the Lorentz model avoids boundary instability entirely: points live on the hyperboloid \{x : \langle x, x \rangle_L = -1/c, x_0 > 0\} rather than inside a ball, and attention scores reduce to Busemann functions (single dot products).

Lorentz Model Message Passing

In the Lorentz model, message passing between graph nodes proceeds as:

  1. Embed each node v onto the hyperboloid: h_v \in H^n_c
  2. Attend using Busemann scoring: B_\xi(x) = \ln(-\langle x, \xi \rangle_L), where \xi is a light-like focal direction defining the hierarchy
  3. Aggregate via Einstein midpoint (closed-form, unlike iterative Frechet mean): \bar{h} = \text{proj}_H\left(\sum_i w_i \gamma_i h_i / \|\sum_i w_i \gamma_i h_i\|_L\right) where \gamma_i is the Lorentz factor

RuVector's LorentzCascadeAttention implements this with multi-curvature heads operating at logarithmically-spaced curvatures, capturing hierarchy at multiple scales simultaneously.

Gyrovector Aggregation

Standard weighted averaging in Euclidean space (\bar{v} = \sum_i w_i v_i) does not preserve the Poincare ball constraint. Instead, aggregation must use Mobius operations:

\text{AGGREGATE}(\{(w_i, v_i)\}) = \bigoplus_{i=1}^n (w_i \otimes_c v_i)

where \oplus_c is Mobius addition and \otimes_c is Mobius scalar multiplication. RuVector's poincare.rs provides mobius_add and mobius_scalar_mult with full numerical stability.

The practical limitation is that Mobius aggregation is sequential -- each addition depends on the previous result. The Frechet mean (frechet_mean in RuVector) offers a parallel alternative via Riemannian gradient descent in the tangent space.

2. Mixed-Curvature Product Manifolds

S^n \times H^m \times \mathbb{R}^k Decomposition

A product manifold \mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_p has the metric:

d_{\mathcal{M}}(x, y)^2 = \sum_{i=1}^p \beta_i \cdot d_{\mathcal{M}_i}(x^{(i)}, y^{(i)})^2

where \beta_i are learnable mixing weights and each \mathcal{M}_i is either spherical (\kappa_i > 0), hyperbolic (\kappa_i < 0), or Euclidean (\kappa_i = 0).

RuVector's FusedCurvatureConfig already defines this decomposition:

pub struct FusedCurvatureConfig {
    pub euclidean_dim: usize,     // R^k component
    pub hyperbolic_dim: usize,    // H^m component
    pub spherical_dim: usize,     // S^n component
    pub weight_e: f32,            // beta_E
    pub weight_h: f32,            // beta_H
    pub weight_s: f32,            // beta_S
    pub hyperbolic_curvature: f32,
}

The fused attention kernel computes all three similarities in a single vectorized pass:

\text{logit}(q, k) = \beta_E \langle q_E, k_E \rangle + \beta_H \langle q_{H}^{\text{tan}}, k_{H}^{\text{tan}} \rangle + \beta_S \langle q_S, k_S \rangle_S

where the hyperbolic component uses tangent-space dot products (10-100x faster than geodesic distance per RuVector's TangentSpaceMapper) and the spherical component uses normalized inner products on the unit sphere.

Curvature-Per-Component

Rather than a single global curvature, each dimension group can have its own learned curvature. For a product of p components:

\mathcal{M} = \mathcal{M}_1^{\kappa_1} \times \mathcal{M}_2^{\kappa_2} \times \cdots \times \mathcal{M}_p^{\kappa_p}

This is the key extension beyond RuVector's current MixedCurvatureConfig (which uses a single curvature for the hyperbolic component). The research direction is to make \kappa_i learnable per-component, enabling the model to discover which curvature best fits each subspace of the embedding.

Optimal Curvature Learning

Given a graph G = (V, E) with known structure, the optimal curvature for a hyperbolic component can be estimated as:

\kappa^* = -\frac{4\delta^2}{(\text{diam}(G))^2}

where \delta is the Gromov hyperbolicity (measuring how tree-like the graph is) and \text{diam}(G) is the graph diameter. RuVector's solver crate provides the graph traversal primitives needed to compute both quantities sublinearly.

For learnable curvatures during training, the gradient flows through the exponential map:

\frac{\partial \mathcal{L}}{\partial \kappa} = \frac{\partial \mathcal{L}}{\partial d_\kappa} \cdot \frac{\partial d_\kappa}{\partial \kappa}

The curvature gradient for the Poincare distance is:

\frac{\partial d_c}{\partial c} = -\frac{1}{2c^{3/2}} \text{arcosh}(\alpha) + \frac{1}{\sqrt{c}} \frac{1}{\sqrt{\alpha^2 - 1}} \frac{\partial \alpha}{\partial c}

where \alpha = 1 + 2c\|x - y\|^2 / ((1 - c\|x\|^2)(1 - c\|y\|^2)).

3. Curvature-Adaptive Routing

Attention Weights as Parallel Transport

In a curved space, moving a vector from one tangent space to another requires parallel transport along the geodesic connecting them. Standard attention aggregation implicitly assumes all values live in the same space, which is only true in flat space.

For a message from node j to node i, the value v_j must be parallel-transported from T_{h_j}\mathcal{M} to T_{h_i}\mathcal{M}:

\tilde{v}_j = \Gamma_{h_j \to h_i}(v_j)

In the Poincare ball, parallel transport along the geodesic from x to y is:

\Gamma_{x \to y}(v) = \frac{\lambda_x}{\lambda_y} \cdot \text{gyr}[y, -x](v)

where \lambda_x = 2/(1 - c\|x\|^2) is the conformal factor and \text{gyr} is the gyration operator (Thomas precession). This connects to RuVector's transport module (ruvector-attention/src/transport/), which uses optimal transport for attention -- the Wasserstein distance provides a natural way to compute transport plans between distributions on manifolds.

Levi-Civita Connection for Message Passing

The Levi-Civita connection \nabla provides the unique torsion-free, metric-compatible way to differentiate vector fields on a manifold. For graph message passing on a Riemannian manifold (\mathcal{M}, g):

m_{i \leftarrow j} = \alpha_{ij} \cdot \Gamma_{j \to i}^{\nabla}(W_v h_j)

where \Gamma_{j \to i}^{\nabla} is parallel transport along the Levi-Civita connection. The Christoffel symbols \Gamma^k_{ij} encode the connection in coordinates:

\Gamma^k_{ij} = \frac{1}{2} g^{kl}\left(\frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{il}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l}\right)

For the Poincare ball with conformal factor \lambda_x = 2/(1 - c\|x\|^2), the Christoffel symbols simplify considerably, enabling efficient implementation.

4. Riemannian Optimization for Graph Transformers

Riemannian Adam

Standard Adam cannot be applied directly on manifolds because the update rule \theta_{t+1} = \theta_t - \eta \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) does not preserve manifold constraints. Riemannian Adam replaces Euclidean operations with their Riemannian counterparts:

Algorithm: Riemannian Adam on Product Manifold M

Input: Learning rate eta, decay rates beta_1, beta_2, parameters theta in M
Initialize: m_0 = 0, v_0 = 0 (in tangent space at theta_0)

For t = 1, 2, ...:
    g_t = Riemannian_gradient(L, theta_{t-1})   // Project Euclidean grad to tangent space
    m_t = beta_1 * PT(m_{t-1}) + (1 - beta_1) * g_t   // Parallel transport first moment
    v_t = beta_2 * v_{t-1} + (1 - beta_2) * g_t^2     // Second moment (scalar, no transport)
    m_hat = m_t / (1 - beta_1^t)
    v_hat = v_t / (1 - beta_2^t)
    update = -eta * m_hat / (sqrt(v_hat) + epsilon)
    theta_t = Exp_{theta_{t-1}}(update)   // Exponential map back to manifold

The key operations are:

  • Riemannian gradient: \text{grad}_\mathcal{M} f = \frac{1}{\lambda_x^2} \nabla_E f (rescale Euclidean gradient by inverse metric)
  • Exponential map: \text{Exp}_x(v) moves from x in direction v along the geodesic
  • Parallel transport: \text{PT}_{x \to y}(m) moves the momentum from the old tangent space to the new one

RuVector's ruvector-attention/src/training/optimizer.rs provides the foundation; extending it to Riemannian Adam requires adding exp_map and log_map calls (already available in poincare.rs and lorentz_cascade.rs::tangent).

Projection-Free Training on Manifolds

An alternative to Riemannian optimization is projection-free training, where parameters are optimized in the ambient Euclidean space and projected back to the manifold after each step:

\theta_{t+1} = \text{proj}_\mathcal{M}(\theta_t - \eta \nabla_E \mathcal{L})

For the Poincare ball, this is simply project_to_ball. For the hyperboloid, project_hyperboloid. For the sphere, normalize to unit length. The advantage is compatibility with existing optimizers (Adam, SGD); the disadvantage is that projection introduces bias proportional to the step size.

RuVector's tangent space approach (TangentSpaceMapper) offers a practical middle ground: map to tangent space at the origin, perform standard operations, then map back. This is exact for small perturbations and provides 10-100x speedup over full geodesic operations.

5. Lie Group Equivariant Graph Attention

SE(3) and SO(3) Equivariance

For molecular graphs and physical simulations, attention must respect the symmetries of 3D space. An SE(3)-equivariant Graph Transformer satisfies:

f(Rx + t, Rh) = Rf(x, h)

for all rotations R \in SO(3) and translations t \in \mathbb{R}^3. This means the model's output transforms consistently with rigid body motions.

The key construction is equivariant attention using invariant features:

\alpha_{ij} = \phi\left(\|x_i - x_j\|, \langle h_i, h_j \rangle, h_i^T(x_i - x_j)\right)

The attention weights depend only on invariants (distances, inner products, projections), ensuring equivariance of the full attention layer. Value messages are constructed using equivariant basis functions:

m_{ij} = \alpha_{ij} \left(w_0 h_j + w_1 (x_i - x_j) + w_2 (x_i - x_j) \times h_j\right)

where the cross product ensures the message transforms correctly under rotations.

General Lie Group Equivariance

Beyond SE(3), graphs with symmetry group G require $G$-equivariant attention. The general framework uses fiber bundles: each node carries a feature that transforms under a representation \rho of G, and message passing uses intertwining operators.

For a Lie group G acting on the graph, equivariant attention decomposes into irreducible representations:

\alpha_{ij} = \sum_l \alpha_{ij}^{(l)} \cdot \rho^{(l)}(g_{ij})

where g_{ij} \in G is the relative group element between nodes i and j, and \rho^{(l)} is the $l$-th irreducible representation.

This connects to RuVector's sheaf attention module (ruvector-attention/src/sheaf/), where restriction maps between stalks play a role analogous to parallel transport between fibers in the Lie group setting.


Research Timeline

2026-2030: Mixed-Curvature GNNs Become Standard

Knowledge Graphs (2026-2028): Knowledge graphs like Wikidata and Freebase combine deep hierarchies (is-a relations), lateral associations (related-to), and cyclic patterns (mutual relations). Product manifold embeddings H^{64} \times S^{32} \times \mathbb{R}^{128} achieve 15-25% better link prediction than flat embeddings at half the dimensionality. RuVector's existing FusedCurvatureConfig provides the production-ready kernel.

Molecular Design (2027-2029): Drug discovery graphs have hierarchical scaffolds, cyclic ring systems, and flat functional group features. SE(3)-equivariant product manifold transformers replace flat-space message passing networks, achieving state-of-the-art on molecular property prediction benchmarks.

Social Networks (2028-2030): Community detection in social networks benefits from hyperbolic embeddings (communities are hierarchical), spherical embeddings (cliques are cyclic), and Euclidean embeddings (content similarity). Mixed-curvature Graph Transformers become the standard architecture for large-scale social graph analysis.

2030-2036: Continuous Manifold Graph Transformers

Learned Curvature Fields (2030-2032): Instead of a fixed product manifold, the curvature becomes a learned function of position: \kappa(x): \mathcal{M} \to \mathbb{R}. The manifold itself adapts to the local structure of the graph. Regions with tree-like structure automatically develop negative curvature; regions with cliques develop positive curvature; transition zones have near-zero curvature. This requires solving geodesic equations numerically on the learned manifold.

Arbitrary Riemannian Manifolds (2032-2034): Graph Transformers operate on manifolds defined by their learned metric tensor g_{ij}(x) rather than restricted to constant-curvature spaces. The exponential map, parallel transport, and geodesic attention are computed via neural ODE solvers. RuVector's PDE attention module (ruvector-attention/src/pde_attention/) provides the diffusion-based foundation.

Manifold-Valued Graph Neural Fields (2034-2036): The discrete graph is replaced by a continuous neural field on a manifold: f: \mathcal{M} \to \mathcal{N}, where both the domain manifold \mathcal{M} and the codomain manifold \mathcal{N} are learned. Attention becomes a kernel on the product manifold \mathcal{M} \times \mathcal{N}. This unifies graph transformers with neural radiance fields, geometric deep learning, and topological data analysis.


Architecture Proposals

Product Manifold Attention Layer

Input: node embeddings x_i = (x_i^E, x_i^H, x_i^S) in R^k x H^m x S^n

For each component space M_j in {R^k, H^m, S^n}:
    Q_j = W_Q^j * x^j                     // Linear projection (in tangent space for H, S)
    K_j = W_K^j * x^j
    V_j = W_V^j * x^j

    alpha_ij^j = softmax(-d_{M_j}(Q_j_i, K_j_l) / tau_j)   // Geodesic attention
    out_j_i = AGGREGATE_{M_j}({alpha_ij^j, V_j_l})           // Manifold-aware aggregation

// Fused attention (single kernel, as in RuVector's fused_attention.rs):
alpha_ij = softmax(beta_E * <Q_E_i, K_E_j> + beta_H * <Q_H_i, K_H_j>_tan + beta_S * <Q_S_i, K_S_j>_S)

// Aggregation per component:
out_E_i = sum_j alpha_ij * V_E_j                              // Euclidean: weighted average
out_H_i = einstein_midpoint({alpha_ij, V_H_j}, c)             // Hyperbolic: Einstein midpoint
out_S_i = normalize(sum_j alpha_ij * V_S_j)                   // Spherical: weighted + project

Output: (out_E_i, out_H_i, out_S_i)

Rust Pseudocode: Product Manifold Attention

/// Product manifold attention layer operating on S^n x H^m x R^k
pub struct ProductManifoldAttention {
    /// Per-component configurations with learned curvatures
    components: Vec<ManifoldComponent>,
    /// Fused attention kernel for single-pass computation
    fused_kernel: FusedCurvatureKernel,
    /// Tangent space mapper for fast hyperbolic operations
    tangent_mapper: TangentSpaceMapper,
    /// Riemannian optimizer state
    optimizer: RiemannianAdamState,
}

#[derive(Clone)]
pub enum ManifoldComponent {
    Euclidean { dim: usize },
    Hyperbolic { dim: usize, curvature: f32 },   // curvature < 0
    Spherical { dim: usize, curvature: f32 },     // curvature > 0
}

impl ProductManifoldAttention {
    /// Compute product manifold attention with geodesic scoring
    pub fn forward(
        &self,
        queries: &[Vec<f32>],    // [N, D_total]
        keys: &[Vec<f32>],       // [M, D_total]
        values: &[Vec<f32>],     // [M, D_total]
        graph_adj: &CsrMatrix,   // Sparse adjacency (attention mask)
    ) -> Vec<Vec<f32>> {
        let n = queries.len();
        let mut outputs = Vec::with_capacity(n);

        for i in 0..n {
            let q = &queries[i];
            let neighbors = graph_adj.neighbors(i);

            // Split query into component spaces
            let (q_e, q_h, q_s) = self.split_components(q);

            // Compute fused attention scores in single pass
            let mut logits = Vec::with_capacity(neighbors.len());
            for &j in &neighbors {
                let k = &keys[j];
                let (k_e, k_h, k_s) = self.split_components(k);

                // Euclidean: dot product
                let score_e = dot_product(q_e, k_e);

                // Hyperbolic: tangent-space dot product (fast path)
                let q_h_tan = self.tangent_mapper.log_map(q_h);
                let k_h_tan = self.tangent_mapper.log_map(k_h);
                let score_h = dot_product(&q_h_tan, &k_h_tan);

                // Spherical: cosine similarity on unit sphere
                let score_s = cosine_similarity(q_s, k_s);

                // Fused logit with learned mixing weights
                let logit = self.fused_kernel.weight_e * score_e
                    + self.fused_kernel.weight_h * score_h
                    + self.fused_kernel.weight_s * score_s;

                logits.push(logit);
            }

            // Softmax over neighbor logits
            let weights = softmax_with_temperature(&logits, self.fused_kernel.temperature);

            // Per-component aggregation
            let mut out_e = vec![0.0; self.euclidean_dim()];
            let mut out_h_weighted = Vec::new(); // for Einstein midpoint
            let mut out_s = vec![0.0; self.spherical_dim()];

            for (idx, &j) in neighbors.iter().enumerate() {
                let v = &values[j];
                let (v_e, v_h, v_s) = self.split_components(v);
                let w = weights[idx];

                // Euclidean: simple weighted sum
                for (d, &val) in v_e.iter().enumerate() {
                    out_e[d] += w * val;
                }

                // Hyperbolic: collect for Einstein midpoint
                out_h_weighted.push((w, v_h.to_vec()));

                // Spherical: weighted sum then project
                for (d, &val) in v_s.iter().enumerate() {
                    out_s[d] += w * val;
                }
            }

            // Hyperbolic aggregation via Einstein midpoint (closed-form)
            let hyp_curvature = self.hyperbolic_curvature();
            let hyp_points: Vec<&[f32]> = out_h_weighted.iter()
                .map(|(_, v)| v.as_slice()).collect();
            let hyp_weights: Vec<f32> = out_h_weighted.iter()
                .map(|(w, _)| *w).collect();
            let out_h = einstein_midpoint(&hyp_points, &hyp_weights, hyp_curvature);

            // Spherical: project weighted sum back to unit sphere
            let out_s = l2_normalize(&out_s);

            // Concatenate component outputs
            let output = concat_components(&out_e, &out_h, &out_s);
            outputs.push(output);
        }

        outputs
    }

    /// Riemannian gradient step: compute gradients in tangent space,
    /// then retract back to manifold via exponential map
    pub fn riemannian_step(&mut self, loss: f32, learning_rate: f32) {
        for component in &mut self.components {
            match component {
                ManifoldComponent::Euclidean { .. } => {
                    // Standard Euclidean Adam step
                }
                ManifoldComponent::Hyperbolic { curvature, .. } => {
                    // 1. Project Euclidean gradient to tangent space
                    // 2. Riemannian Adam update in tangent space
                    // 3. Exponential map back to Poincare ball / hyperboloid
                    let c = curvature.abs();
                    // grad_riemannian = (1/(lambda_x^2)) * grad_euclidean
                    // theta_new = exp_map(theta_old, -lr * grad_riemannian)
                }
                ManifoldComponent::Spherical { .. } => {
                    // 1. Project gradient to tangent plane of sphere
                    // 2. Update in tangent space
                    // 3. Normalize back to unit sphere
                }
            }
        }

        // Optionally update curvatures via gradient descent
        // d(loss)/d(kappa) flows through geodesic distance
    }
}

Curvature-Adaptive Graph Transformer Block

                    Input: x in M = S^n x H^m x R^k
                               |
                    +----------+-----------+
                    |                      |
            Product Manifold          Curvature
            Self-Attention            Estimator
            (geodesic QKV)         (kappa = f(x))
                    |                      |
                    +----------+-----------+
                               |
                    Parallel Transport Aggregation
                    (Levi-Civita connection)
                               |
                    Tangent Space Feed-Forward
                    (operate in T_x M, map back via exp)
                               |
                    Riemannian LayerNorm
                    (normalize on manifold)
                               |
                    Output: x' in M

Mathematical Formulations

Geodesic Attention

For two points x, y on a Riemannian manifold (\mathcal{M}, g):

\text{GeodesicAttention}(Q, K, V) = \text{Agg}_{\mathcal{M}}\left(\text{softmax}\left(-\frac{d_g(Q, K)}{\tau}\right) \cdot V\right)

where d_g is the geodesic distance induced by metric g, and \text{Agg}_{\mathcal{M}} is the manifold-appropriate aggregation.

Exponential Map Aggregation

Given weighted values \{(w_i, v_i)\} in the tangent space T_x\mathcal{M}:

\text{Agg}(x, \{w_i, v_i\}) = \text{Exp}_x\left(\sum_i w_i \cdot \text{Log}_x(v_i)\right)

This is equivalent to one step of Riemannian gradient descent toward the weighted Frechet mean.

Product Manifold Distance

For x = (x^{(1)}, \ldots, x^{(p)}) and y = (y^{(1)}, \ldots, y^{(p)}) in \mathcal{M} = \prod_i \mathcal{M}_i^{\kappa_i}:

d_{\mathcal{M}}(x, y)^2 = \sum_{i=1}^p \beta_i \cdot d_{\mathcal{M}_i}^{\kappa_i}(x^{(i)}, y^{(i)})^2

where each d_{\mathcal{M}_i}^{\kappa_i} is the sectional-curvature-\kappa_i geodesic distance.

Curvature Gradient

For learned curvature c in the Poincare model, the gradient of the distance with respect to curvature is:

\frac{\partial d_c(x,y)}{\partial c} = \frac{1}{2\sqrt{c(\alpha^2 - 1)}} \left(\frac{\partial \alpha}{\partial c} - \frac{\alpha \cdot \text{arcosh}(\alpha)}{c}\right)

where \alpha = 1 + 2c\|x-y\|^2 / ((1-c\|x\|^2)(1-c\|y\|^2)).


Implementation Roadmap for RuVector

Phase 1: Extend Fused Curvature Attention (3-4 months)

  • Add learned per-component curvature to FusedCurvatureConfig
  • Implement curvature gradient computation in ruvector-attention/src/curvature/
  • Extend TangentSpaceMapper to handle variable curvatures per batch element
  • Add spherical aggregation (normalize after weighted sum) alongside Einstein midpoint
  • Benchmark against fixed-curvature baseline

Phase 2: Parallel Transport and Riemannian Optimization (4-6 months)

  • Implement parallel transport for Poincare ball and Lorentz model
  • Build RiemannianAdam optimizer extending ruvector-attention/src/training/optimizer.rs
  • Add Levi-Civita connection-based message passing to ruvector-graph
  • Integrate with ruvector-solver for sublinear geodesic computation on large graphs

Phase 3: Lie Group Equivariance (6-9 months)

  • Add SE(3)-equivariant attention for molecular graphs
  • Implement fiber bundle framework connecting to ruvector-attention/src/sheaf/
  • Extend ruvector-graph property graph to carry manifold-valued node features
  • Develop equivariant sparse attention using ruvector-dag/src/mincut/ for graph sparsification

Phase 4: Continuous Curvature Fields (12-18 months)

  • Implement neural curvature field \kappa(x) using small MLP
  • Develop numerical geodesic solver for non-constant curvature (connect to PDE attention module)
  • Build differentiable metric tensor learning
  • Integrate with ruvector-temporal-tensor for time-varying curvature fields

Success Metrics

Metric Baseline (Euclidean) Target (Product Manifold)
Knowledge graph link prediction (MRR) 0.45 0.55-0.60
Hierarchy reconstruction accuracy 65% 85-95%
Embedding dimension for same quality 256 128
Attention computation (fused kernel) 1.0x 1.2x (overhead acceptable)
Training convergence (epochs) 100 60-70
Molecular property prediction (MAE) 1.0x 0.80-0.85x

References

  1. Bachmann, Becigneul, Ganea (2020). "Constant Curvature Graph Convolutional Networks." ICML.
  2. Chami, Ying, Re, Leskovec (2019). "Hyperbolic Graph Convolutional Neural Networks." NeurIPS.
  3. Gu, Sala, Gunel, Re (2019). "Learning Mixed-Curvature Representations in Product Spaces." ICLR.
  4. Nickel, Kiela (2017). "Poincare Embeddings for Learning Hierarchical Representations." NeurIPS.
  5. Sala, De Sa, Gu, Re (2018). "Representation Tradeoffs for Hyperbolic Embeddings." ICML.
  6. Ungar (2008). "Analytic Hyperbolic Geometry and Albert Einstein's Special Theory of Relativity."
  7. Ganea, Becigneul, Hofmann (2018). "Hyperbolic Neural Networks." NeurIPS.
  8. Fuchs, Worrall, Fischer, Welling (2020). "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks." NeurIPS.
  9. Brandstetter, Hesselink, van der Pol, Bekkers, Welling (2022). "Geometric and Physical Quantities Improve E(3) Equivariant Message Passing." ICLR.
  10. Skopek, Ganea, Becigneul (2020). "Mixed-curvature Variational Autoencoders." ICLR.
  11. Lou, Nickel, Zantedeschi (2020). "Differentiating through the Frechet Mean." ICML.
  12. Xiong, Zhu, Hsieh, Ma, Liu (2022). "Pseudo-Riemannian Graph Convolutional Networks." NeurIPS.

Document Status: Research Proposal Last Updated: 2026-02-25 Owner: RuVector Architecture Team Related ADRs: ADR-045 (Lean Agentic Integration) Related Crates: ruvector-attention, ruvector-graph, ruvector-solver, ruvector-dag