Hyperbolic and Mixed-Curvature Graph Transformers: Product Manifold Attention
Overview
Problem Statement
Graph Transformers have become the dominant architecture for learning on relational data, yet nearly all deployed systems operate in flat Euclidean space. This is a geometric mismatch: most real-world graphs are not flat.
Why Euclidean space fails for real-world graphs:
- Power-law degree distributions (social networks, citation graphs, the web) exhibit tree-like branching that requires exponentially many dimensions in Euclidean space to embed without distortion. A binary tree of depth d has 2^d leaves, but fitting them equidistantly in \mathbb{R}^n requires n \geq 2^d - 1 dimensions.
- Hierarchical structures (taxonomies, organizational charts, ontologies) naturally live in hyperbolic space, where the volume of a ball grows exponentially with radius -- matching the exponential growth of tree levels.
- Cyclic substructures (molecular rings, periodic lattices, social cliques) have positive curvature and embed naturally on spheres S^n.
- Hybrid graphs (knowledge graphs combining hierarchies with lateral associations) require multiple curvature regimes simultaneously.
The consequence: flat-space Graph Transformers waste capacity representing geometric structure that is free in the correct curved space, leading to higher distortion, larger models, and slower convergence.
Proposed Solution
Develop Product Manifold Graph Transformers that operate natively on mixed-curvature spaces. The core decomposition is:
\mathcal{M} = S^{n_1} \times H^{n_2} \times \mathbb{R}^{n_3}
where S^{n_1} captures cyclic/clustered structure, H^{n_2} captures hierarchical structure, and \mathbb{R}^{n_3} captures flat semantic similarity. Every component of the attention mechanism -- queries, keys, values, aggregation, and optimization -- operates in its geometrically appropriate space.
Connection to RuVector
RuVector already has substantial infrastructure for this research direction:
- ruvector-attention/src/hyperbolic/: Poincare ball operations (poincare.rs), Lorentz cascade attention with Busemann scoring and Einstein midpoint (lorentz_cascade.rs), mixed-curvature attention (mixed_curvature.rs)
- ruvector-attention/src/curvature/: Fused E x H x S attention (fused_attention.rs), tangent space mapping (tangent_space.rs), component quantizer (component_quantizer.rs)
- ruvector-attention/src/transport/: Sliced Wasserstein and centroid optimal transport attention
- ruvector-attention/src/topology/: Topology-gated attention with coherence metrics
- ruvector-graph/: Full property graph with Cypher queries, distributed federation, hybrid vector-graph search
- ruvector-solver/: Sublinear graph solvers (forward/backward push, CG, random walk, BMSSP)
This document extends RuVector's existing mixed-curvature capabilities toward full product manifold Graph Transformers with learned curvature fields.
Technical Deep Dive
1. Hyperbolic Graph Transformers
Poincare Ball Attention
In the Poincare ball model \mathbb{B}^n_c = \{x \in \mathbb{R}^n : c\|x\|^2 < 1\}, the standard dot-product attention \text{softmax}(QK^T / \sqrt{d}) is replaced with geodesic attention:
\alpha_{ij} = \frac{\exp(-d_{\mathbb{B}}(q_i, k_j) / \tau)}{\sum_l \exp(-d_{\mathbb{B}}(q_i, k_l) / \tau)}
where d_{\mathbb{B}}(x, y) = \frac{1}{\sqrt{c}} \text{arcosh}\left(1 + \frac{2c\|x - y\|^2}{(1 - c\|x\|^2)(1 - c\|y\|^2)}\right).
RuVector's poincare.rs already implements this with numerical stability via epsilon-buffered projection. The key insight from Lorentz cascade attention (lorentz_cascade.rs) is that the Lorentz model avoids boundary instability entirely: points live on the hyperboloid \{x : \langle x, x \rangle_L = -1/c, x_0 > 0\} rather than inside a ball, and attention scores reduce to Busemann functions (single dot products).
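For concreteness, here is a minimal standalone sketch of the geodesic softmax above (the Poincare path; the Lorentz fast path replaces the distance with a Busemann dot product). The names poincare_distance and geodesic_attention are illustrative, not RuVector's actual API:

```rust
/// Poincare distance d_B(x, y) with epsilon-buffered denominators, matching
/// the closed form above (illustrative sketch, not RuVector's production kernel).
fn poincare_distance(x: &[f32], y: &[f32], c: f32) -> f32 {
    let sq = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = ((1.0 - c * sq(x)) * (1.0 - c * sq(y))).max(1e-7);
    let arg = (1.0 + 2.0 * c * diff_sq / denom).max(1.0); // clamp into arcosh's domain
    arg.acosh() / c.sqrt()
}

/// Geodesic attention weights alpha_ij = softmax(-d_B(q, k_j) / tau).
fn geodesic_attention(q: &[f32], keys: &[Vec<f32>], c: f32, tau: f32) -> Vec<f32> {
    let logits: Vec<f32> = keys.iter().map(|k| -poincare_distance(q, k, c) / tau).collect();
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```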
Lorentz Model Message Passing
In the Lorentz model, message passing between graph nodes proceeds as:
- Embed each node v onto the hyperboloid: h_v \in H^n_c
- Attend using Busemann scoring: B_\xi(x) = \ln(-\langle x, \xi \rangle_L), where \xi is a light-like focal direction defining the hierarchy
- Aggregate via Einstein midpoint (closed-form, unlike the iterative Frechet mean): \bar{h} = \text{proj}_H\left(\sum_i w_i \gamma_i h_i \,/\, \|\sum_i w_i \gamma_i h_i\|_L\right), where \gamma_i is the Lorentz factor
RuVector's LorentzCascadeAttention implements this with multi-curvature heads operating at logarithmically-spaced curvatures, capturing hierarchy at multiple scales simultaneously.
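A minimal sketch of that Einstein midpoint, assuming points are stored in Lorentz coordinates (x_0, x_1, \ldots, x_n) with \langle x, x \rangle_L = -1/c and the Lorentz factor taken as \gamma_i = \sqrt{c}\, x_0; LorentzCascadeAttention's production version may differ in layout and stabilization:

```rust
/// Minkowski inner product <x, y>_L = -x_0 y_0 + sum_{i>=1} x_i y_i.
fn lorentz_inner(x: &[f32], y: &[f32]) -> f32 {
    -x[0] * y[0] + x[1..].iter().zip(&y[1..]).map(|(a, b)| a * b).sum::<f32>()
}

/// Einstein midpoint: Lorentz-factor-weighted sum, rescaled back onto the hyperboloid.
fn einstein_midpoint(points: &[&[f32]], weights: &[f32], c: f32) -> Vec<f32> {
    let dim = points[0].len();
    let mut s = vec![0.0f32; dim];
    for (p, &w) in points.iter().zip(weights) {
        let gamma = c.sqrt() * p[0]; // Lorentz factor from the time coordinate
        for d in 0..dim {
            s[d] += w * gamma * p[d];
        }
    }
    // proj_H: rescale so that <s, s>_L = -1/c holds again.
    let norm = (-lorentz_inner(&s, &s)).max(1e-9).sqrt();
    s.iter().map(|v| v / (norm * c.sqrt())).collect()
}
```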
Gyrovector Aggregation
Standard weighted averaging in Euclidean space (\bar{v} = \sum_i w_i v_i) does not preserve the Poincare ball constraint. Instead, aggregation must use Mobius operations:
\text{AGGREGATE}(\{(w_i, v_i)\}) = \bigoplus_{i=1}^n (w_i \otimes_c v_i)
where \oplus_c is Mobius addition and \otimes_c is Mobius scalar multiplication. RuVector's poincare.rs provides mobius_add and mobius_scalar_mult with full numerical stability.
The practical limitation is that Mobius aggregation is sequential -- each addition depends on the previous result. The Frechet mean (frechet_mean in RuVector) offers a parallel alternative via Riemannian gradient descent in the tangent space.
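For reference, the two Mobius operations in their standard gyrovector closed forms. These sketches add only basic epsilon guards; RuVector's mobius_add and mobius_scalar_mult are the hardened versions:

```rust
/// Mobius addition x (+)_c y on the Poincare ball B^n_c.
fn mobius_add(x: &[f32], y: &[f32], c: f32) -> Vec<f32> {
    let dot: f32 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let x2: f32 = x.iter().map(|a| a * a).sum();
    let y2: f32 = y.iter().map(|a| a * a).sum();
    let denom = (1.0 + 2.0 * c * dot + c * c * x2 * y2).max(1e-7);
    let coef_x = 1.0 + 2.0 * c * dot + c * y2;
    let coef_y = 1.0 - c * x2;
    x.iter().zip(y).map(|(a, b)| (coef_x * a + coef_y * b) / denom).collect()
}

/// Mobius scalar multiplication: r (x)_c v = tanh(r * artanh(sqrt(c)|v|)) * v / (sqrt(c)|v|).
fn mobius_scalar_mult(r: f32, v: &[f32], c: f32) -> Vec<f32> {
    let norm = v.iter().map(|a| a * a).sum::<f32>().sqrt().max(1e-9);
    let arg = (c.sqrt() * norm).min(1.0 - 1e-5); // keep artanh in its domain
    let scale = (r * arg.atanh()).tanh() / (c.sqrt() * norm);
    v.iter().map(|a| a * scale).collect()
}
```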
2. Mixed-Curvature Product Manifolds
S^n \times H^m \times \mathbb{R}^k Decomposition
A product manifold \mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_p has the metric:
d_{\mathcal{M}}(x, y)^2 = \sum_{i=1}^p \beta_i \cdot d_{\mathcal{M}_i}(x^{(i)}, y^{(i)})^2
where \beta_i are learnable mixing weights and each \mathcal{M}_i is either spherical (\kappa_i > 0), hyperbolic (\kappa_i < 0), or Euclidean (\kappa_i = 0).
RuVector's FusedCurvatureConfig already defines this decomposition:
pub struct FusedCurvatureConfig {
    pub euclidean_dim: usize,      // R^k component
    pub hyperbolic_dim: usize,     // H^m component
    pub spherical_dim: usize,      // S^n component
    pub weight_e: f32,             // beta_E
    pub weight_h: f32,             // beta_H
    pub weight_s: f32,             // beta_S
    pub hyperbolic_curvature: f32,
}
The fused attention kernel computes all three similarities in a single vectorized pass:
\text{logit}(q, k) = \beta_E \langle q_E, k_E \rangle + \beta_H \langle q_{H}^{\text{tan}}, k_{H}^{\text{tan}} \rangle + \beta_S \langle q_S, k_S \rangle_S
where the hyperbolic component uses tangent-space dot products (10-100x faster than geodesic distance per RuVector's TangentSpaceMapper) and the spherical component uses normalized inner products on the unit sphere.
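A sketch of this fused logit against the FusedCurvatureConfig above, under the assumption that components are concatenated in E, H, S order and the spherical slice is already unit-normalized; log_map_origin is an illustrative stand-in for TangentSpaceMapper:

```rust
/// Log map at the origin of B^n_c: pulls a ball point into the tangent space T_0.
fn log_map_origin(x: &[f32], c: f32) -> Vec<f32> {
    let norm = x.iter().map(|a| a * a).sum::<f32>().sqrt().max(1e-9);
    let arg = (c.sqrt() * norm).min(1.0 - 1e-5);
    let scale = arg.atanh() / (c.sqrt() * norm);
    x.iter().map(|a| a * scale).collect()
}

/// Fused E x H x S logit: beta_E <q_E, k_E> + beta_H <log(q_H), log(k_H)> + beta_S <q_S, k_S>.
fn fused_logit(cfg: &FusedCurvatureConfig, q: &[f32], k: &[f32]) -> f32 {
    let dot = |a: &[f32], b: &[f32]| a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>();
    let (de, dh) = (cfg.euclidean_dim, cfg.hyperbolic_dim);
    let c = cfg.hyperbolic_curvature.abs();
    let score_e = dot(&q[..de], &k[..de]);
    let score_h = dot(
        &log_map_origin(&q[de..de + dh], c),
        &log_map_origin(&k[de..de + dh], c),
    );
    let score_s = dot(&q[de + dh..], &k[de + dh..]); // cosine on the unit sphere
    cfg.weight_e * score_e + cfg.weight_h * score_h + cfg.weight_s * score_s
}
```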
Curvature-Per-Component
Rather than a single global curvature, each dimension group can have its own learned curvature. For a product of p components:
\mathcal{M} = \mathcal{M}_1^{\kappa_1} \times \mathcal{M}_2^{\kappa_2} \times \cdots \times \mathcal{M}_p^{\kappa_p}
This is the key extension beyond RuVector's current MixedCurvatureConfig (which uses a single curvature for the hyperbolic component). The research direction is to make \kappa_i learnable per-component, enabling the model to discover which curvature best fits each subspace of the embedding.
Optimal Curvature Learning
Given a graph G = (V, E) with known structure, the optimal curvature for a hyperbolic component can be estimated as:
\kappa^* = -\frac{4\delta^2}{(\text{diam}(G))^2}
where \delta is the Gromov hyperbolicity (measuring how tree-like the graph is) and \text{diam}(G) is the graph diameter. RuVector's solver crate provides the graph traversal primitives needed to compute both quantities sublinearly.
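A sketch of that estimate on an unweighted adjacency list: sample the four-point condition for \delta, take the BFS diameter, and combine. The all-pairs BFS and the inline xorshift generator are only for illustration (and assume a connected graph); at scale this is where ruvector-solver's sublinear primitives come in:

```rust
use std::collections::VecDeque;

/// BFS distances from `src` over an adjacency list (unweighted, assumed connected).
fn bfs(adj: &[Vec<usize>], src: usize) -> Vec<usize> {
    let mut dist = vec![usize::MAX; adj.len()];
    let mut queue = VecDeque::from([src]);
    dist[src] = 0;
    while let Some(u) = queue.pop_front() {
        for &v in &adj[u] {
            if dist[v] == usize::MAX {
                dist[v] = dist[u] + 1;
                queue.push_back(v);
            }
        }
    }
    dist
}

/// kappa* = -4 delta^2 / diam^2, with delta from sampled four-point quadruples.
fn estimate_curvature(adj: &[Vec<usize>], samples: usize) -> f32 {
    let n = adj.len();
    let dist: Vec<Vec<usize>> = (0..n).map(|s| bfs(adj, s)).collect();
    let diam = *dist.iter().flatten().max().unwrap_or(&1) as f32;
    let mut rng = 0x2545f4914f6cdd1du64; // xorshift64, enough for a sketch
    let mut next = move |m: usize| -> usize {
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        (rng as usize) % m
    };
    let mut delta = 0.0f32;
    for _ in 0..samples {
        let (a, b, x, y) = (next(n), next(n), next(n), next(n));
        let mut sums = [
            (dist[a][b] + dist[x][y]) as f32,
            (dist[a][x] + dist[b][y]) as f32,
            (dist[a][y] + dist[b][x]) as f32,
        ];
        sums.sort_by(|p, q| p.partial_cmp(q).unwrap());
        delta = delta.max((sums[2] - sums[1]) / 2.0); // four-point hyperbolicity
    }
    -4.0 * delta * delta / (diam * diam)
}
```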
For learnable curvatures during training, the gradient flows through the exponential map:
\frac{\partial \mathcal{L}}{\partial \kappa} = \frac{\partial \mathcal{L}}{\partial d_\kappa} \cdot \frac{\partial d_\kappa}{\partial \kappa}
The curvature gradient for the Poincare distance is:
\frac{\partial d_c}{\partial c} = -\frac{1}{2c^{3/2}} \text{arcosh}(\alpha) + \frac{1}{\sqrt{c}} \frac{1}{\sqrt{\alpha^2 - 1}} \frac{\partial \alpha}{\partial c}
where \alpha = 1 + 2c\|x - y\|^2 / ((1 - c\|x\|^2)(1 - c\|y\|^2)).
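The remaining factor follows from the quotient rule applied to the definition of \alpha. Writing N = \|x - y\|^2, A = 1 - c\|x\|^2, and B = 1 - c\|y\|^2, so that \alpha = 1 + 2cN/(AB):

\frac{\partial \alpha}{\partial c} = \frac{2N}{AB}\left(1 + \frac{c\|x\|^2}{A} + \frac{c\|y\|^2}{B}\right)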
3. Curvature-Adaptive Routing
Attention Weights as Parallel Transport
In a curved space, moving a vector from one tangent space to another requires parallel transport along the geodesic connecting them. Standard attention aggregation implicitly assumes all values live in the same space, which is only true in flat space.
For a message from node j to node i, the value v_j must be parallel-transported from T_{h_j}\mathcal{M} to T_{h_i}\mathcal{M}:
\tilde{v}_j = \Gamma_{h_j \to h_i}(v_j)
In the Poincare ball, parallel transport along the geodesic from x to y is:
\Gamma_{x \to y}(v) = \frac{\lambda_x}{\lambda_y} \cdot \text{gyr}[y, -x](v)
where \lambda_x = 2/(1 - c\|x\|^2) is the conformal factor and \text{gyr} is the gyration operator (Thomas precession). This connects to RuVector's transport module (ruvector-attention/src/transport/), which uses optimal transport for attention -- the Wasserstein distance provides a natural way to compute transport plans between distributions on manifolds.
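A sketch of this transport assembled from Ungar's composition identity \text{gyr}[a, b]v = \ominus(a \oplus b) \oplus (a \oplus (b \oplus v)), reusing the mobius_add sketch from the aggregation section (RuVector's transport module instead takes the optimal-transport route):

```rust
/// Gyration operator built from Mobius additions (Ungar's composition identity).
fn gyration(a: &[f32], b: &[f32], v: &[f32], c: f32) -> Vec<f32> {
    let neg_ab: Vec<f32> = mobius_add(a, b, c).iter().map(|t| -t).collect();
    let bv = mobius_add(b, v, c);
    let a_bv = mobius_add(a, &bv, c);
    mobius_add(&neg_ab, &a_bv, c)
}

/// Parallel transport of v from T_x B^n_c to T_y B^n_c:
/// Gamma_{x -> y}(v) = (lambda_x / lambda_y) * gyr[y, -x](v).
fn parallel_transport(x: &[f32], y: &[f32], v: &[f32], c: f32) -> Vec<f32> {
    let lambda = |p: &[f32]| 2.0 / (1.0 - c * p.iter().map(|t| t * t).sum::<f32>());
    let neg_x: Vec<f32> = x.iter().map(|t| -t).collect();
    let scale = lambda(x) / lambda(y);
    gyration(y, &neg_x, v, c).iter().map(|g| scale * g).collect()
}
```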
Levi-Civita Connection for Message Passing
The Levi-Civita connection \nabla provides the unique torsion-free, metric-compatible way to differentiate vector fields on a manifold. For graph message passing on a Riemannian manifold (\mathcal{M}, g):
m_{i \leftarrow j} = \alpha_{ij} \cdot \Gamma_{j \to i}^{\nabla}(W_v h_j)
where \Gamma_{j \to i}^{\nabla} is parallel transport along the Levi-Civita connection. The Christoffel symbols \Gamma^k_{ij} encode the connection in coordinates:
\Gamma^k_{ij} = \frac{1}{2} g^{kl}\left(\frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{il}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l}\right)
For the Poincare ball with conformal factor \lambda_x = 2/(1 - c\|x\|^2), the Christoffel symbols simplify considerably, enabling efficient implementation.
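Concretely, for the conformal metric g_{ij} = \lambda_x^2 \delta_{ij} on the ball, the general expression collapses to

\Gamma^k_{ij} = \frac{2c}{1 - c\|x\|^2}\left(\delta^k_i\, x_j + \delta^k_j\, x_i - \delta_{ij}\, x^k\right)

so evaluating the connection needs only \|x\|^2 and no metric derivatives.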
4. Riemannian Optimization for Graph Transformers
Riemannian Adam
Standard Adam cannot be applied directly on manifolds because the update rule \theta_{t+1} = \theta_t - \eta \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) does not preserve manifold constraints. Riemannian Adam replaces Euclidean operations with their Riemannian counterparts:
Algorithm: Riemannian Adam on product manifold M
Input: learning rate eta, decay rates beta_1, beta_2, parameters theta in M
Initialize: m_0 = 0, v_0 = 0 (in the tangent space at theta_0)
For t = 1, 2, ...:
    g_t   = Riemannian_gradient(L, theta_{t-1})       // project Euclidean grad to tangent space
    m_t   = beta_1 * PT(m_{t-1}) + (1 - beta_1) * g_t // parallel-transport first moment
    v_t   = beta_2 * v_{t-1} + (1 - beta_2) * g_t^2   // second moment (scalar, no transport)
    m_hat = m_t / (1 - beta_1^t)
    v_hat = v_t / (1 - beta_2^t)
    update  = -eta * m_hat / (sqrt(v_hat) + epsilon)
    theta_t = Exp_{theta_{t-1}}(update)               // exponential map back to the manifold
The key operations are:
- Riemannian gradient: \text{grad}_\mathcal{M} f = \frac{1}{\lambda_x^2} \nabla_E f (rescale the Euclidean gradient by the inverse metric)
- Exponential map: \text{Exp}_x(v) moves from x in direction v along the geodesic
- Parallel transport: \text{PT}_{x \to y}(m) moves the momentum from the old tangent space to the new one
RuVector's ruvector-attention/src/training/optimizer.rs provides the foundation; extending it to Riemannian Adam requires adding exp_map and log_map calls (already available in poincare.rs and lorentz_cascade.rs::tangent).
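A minimal single-parameter sketch of the loop above on the Poincare ball. For brevity the first-moment transport PT is approximated by the identity (reasonable for small steps; the transport sketch above is the exact version), and exp_map reuses the mobius_add sketch from earlier:

```rust
/// Exp_x(v) on B^n_c: x (+)_c tanh(sqrt(c) lambda_x |v| / 2) * v / (sqrt(c)|v|).
fn exp_map(x: &[f32], v: &[f32], c: f32) -> Vec<f32> {
    let vnorm = v.iter().map(|t| t * t).sum::<f32>().sqrt().max(1e-9);
    let lambda = 2.0 / (1.0 - c * x.iter().map(|t| t * t).sum::<f32>());
    let scale = (c.sqrt() * lambda * vnorm / 2.0).tanh() / (c.sqrt() * vnorm);
    let step: Vec<f32> = v.iter().map(|t| t * scale).collect();
    mobius_add(x, &step, c)
}

/// One Riemannian Adam step for a single Poincare-ball parameter vector.
struct RiemannianAdam { lr: f32, beta1: f32, beta2: f32, eps: f32, m: Vec<f32>, v: f32, t: i32 }

impl RiemannianAdam {
    fn step(&mut self, theta: &mut Vec<f32>, grad_euclidean: &[f32], c: f32) {
        self.t += 1;
        // Riemannian gradient: rescale the Euclidean gradient by the inverse metric.
        let lambda = 2.0 / (1.0 - c * theta.iter().map(|t| t * t).sum::<f32>());
        let g: Vec<f32> = grad_euclidean.iter().map(|gi| gi / (lambda * lambda)).collect();
        // Moments (the first moment should be parallel-transported; identity used here).
        for (mi, gi) in self.m.iter_mut().zip(&g) {
            *mi = self.beta1 * *mi + (1.0 - self.beta1) * gi;
        }
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * g.iter().map(|t| t * t).sum::<f32>();
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        let denom = (self.v / bc2).sqrt() + self.eps;
        let update: Vec<f32> = self.m.iter().map(|mi| -self.lr * (mi / bc1) / denom).collect();
        // Retract back onto the manifold via the exponential map.
        *theta = exp_map(theta, &update, c);
    }
}
```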
Projection-Free Training on Manifolds
An alternative to Riemannian optimization is projection-free training, where parameters are optimized in the ambient Euclidean space and projected back to the manifold after each step:
\theta_{t+1} = \text{proj}_\mathcal{M}(\theta_t - \eta \nabla_E \mathcal{L})
For the Poincare ball, this is simply project_to_ball. For the hyperboloid, project_hyperboloid. For the sphere, normalize to unit length. The advantage is compatibility with existing optimizers (Adam, SGD); the disadvantage is that projection introduces bias proportional to the step size.
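The three projections are short enough to show in full; the signatures below are a sketch of the geometry, not necessarily RuVector's exact project_to_ball / project_hyperboloid APIs:

```rust
/// Clamp a point back inside the epsilon-buffered Poincare ball of curvature c.
fn project_to_ball(x: &mut [f32], c: f32) {
    let norm = x.iter().map(|t| t * t).sum::<f32>().sqrt();
    let max_norm = (1.0 - 1e-5) / c.sqrt();
    if norm > max_norm {
        let s = max_norm / norm;
        x.iter_mut().for_each(|t| *t *= s);
    }
}

/// Hyperboloid: recompute the time coordinate so <x, x>_L = -1/c holds exactly.
fn project_hyperboloid(x: &mut [f32], c: f32) {
    let spatial_sq: f32 = x[1..].iter().map(|t| t * t).sum();
    x[0] = (spatial_sq + 1.0 / c).sqrt();
}

/// Sphere: renormalize to unit length.
fn project_to_sphere(x: &mut [f32]) {
    let norm = x.iter().map(|t| t * t).sum::<f32>().sqrt().max(1e-9);
    x.iter_mut().for_each(|t| *t /= norm);
}
```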
RuVector's tangent space approach (TangentSpaceMapper) offers a practical middle ground: map to tangent space at the origin, perform standard operations, then map back. This is exact for small perturbations and provides 10-100x speedup over full geodesic operations.
5. Lie Group Equivariant Graph Attention
SE(3) and SO(3) Equivariance
For molecular graphs and physical simulations, attention must respect the symmetries of 3D space. An SE(3)-equivariant Graph Transformer satisfies:
f(Rx + t, Rh) = Rf(x, h)
for all rotations R \in SO(3) and translations t \in \mathbb{R}^3. This means the model's output transforms consistently with rigid body motions.
The key construction is equivariant attention using invariant features:
\alpha_{ij} = \phi\left(\|x_i - x_j\|, \langle h_i, h_j \rangle, h_i^T(x_i - x_j)\right)
The attention weights depend only on invariants (distances, inner products, projections), ensuring equivariance of the full attention layer. Value messages are constructed using equivariant basis functions:
m_{ij} = \alpha_{ij} \left(w_0 h_j + w_1 (x_i - x_j) + w_2 (x_i - x_j) \times h_j\right)
where the cross product ensures the message transforms correctly under rotations.
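A sketch of one such message for the simplified case where each node carries a single type-1 (vector) feature; phi stands in for a learned invariant scoring network, and the score is left unnormalized (a softmax over neighbors would follow):

```rust
/// One SE(3)-equivariant message: an invariant score times an equivariant basis
/// {h_j, x_i - x_j, (x_i - x_j) x h_j}. Rotating all inputs rotates the output.
fn equivariant_message(
    x_i: [f32; 3], x_j: [f32; 3],       // node positions
    h_i: [f32; 3], h_j: [f32; 3],       // vector (type-1) features
    w: [f32; 3],                        // learned scalar weights w_0, w_1, w_2
    phi: impl Fn(f32, f32, f32) -> f32, // learned invariant scoring function (assumed)
) -> [f32; 3] {
    let rel = [x_i[0] - x_j[0], x_i[1] - x_j[1], x_i[2] - x_j[2]];
    let dot = |a: [f32; 3], b: [f32; 3]| a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    let cross = |a: [f32; 3], b: [f32; 3]| [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ];
    // All three inputs to phi are rotation- and translation-invariant.
    let alpha = phi(dot(rel, rel).sqrt(), dot(h_i, h_j), dot(h_i, rel));
    let cx = cross(rel, h_j);
    let mut msg = [0.0f32; 3];
    for d in 0..3 {
        msg[d] = alpha * (w[0] * h_j[d] + w[1] * rel[d] + w[2] * cx[d]);
    }
    msg
}
```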
General Lie Group Equivariance
Beyond SE(3), graphs with symmetry group G require G-equivariant attention. The general framework uses fiber bundles: each node carries a feature that transforms under a representation \rho of G, and message passing uses intertwining operators.
For a Lie group G acting on the graph, equivariant attention decomposes into irreducible representations:
\alpha_{ij} = \sum_l \alpha_{ij}^{(l)} \cdot \rho^{(l)}(g_{ij})
where g_{ij} \in G is the relative group element between nodes i and j, and \rho^{(l)} is the l-th irreducible representation.
This connects to RuVector's sheaf attention module (ruvector-attention/src/sheaf/), where restriction maps between stalks play a role analogous to parallel transport between fibers in the Lie group setting.
Research Timeline
2026-2030: Mixed-Curvature GNNs Become Standard
Knowledge Graphs (2026-2028): Knowledge graphs like Wikidata and Freebase combine deep hierarchies (is-a relations), lateral associations (related-to), and cyclic patterns (mutual relations). Product manifold embeddings H^{64} \times S^{32} \times \mathbb{R}^{128} achieve 15-25% better link prediction than flat embeddings at half the dimensionality. RuVector's existing FusedCurvatureConfig provides the production-ready kernel.
Molecular Design (2027-2029): Drug discovery graphs have hierarchical scaffolds, cyclic ring systems, and flat functional group features. SE(3)-equivariant product manifold transformers replace flat-space message passing networks, achieving state-of-the-art on molecular property prediction benchmarks.
Social Networks (2028-2030): Community detection in social networks benefits from hyperbolic embeddings (communities are hierarchical), spherical embeddings (cliques are cyclic), and Euclidean embeddings (content similarity). Mixed-curvature Graph Transformers become the standard architecture for large-scale social graph analysis.
2030-2036: Continuous Manifold Graph Transformers
Learned Curvature Fields (2030-2032): Instead of a fixed product manifold, the curvature becomes a learned function of position: \kappa(x): \mathcal{M} \to \mathbb{R}. The manifold itself adapts to the local structure of the graph. Regions with tree-like structure automatically develop negative curvature; regions with cliques develop positive curvature; transition zones have near-zero curvature. This requires solving geodesic equations numerically on the learned manifold.
Arbitrary Riemannian Manifolds (2032-2034): Graph Transformers operate on manifolds defined by their learned metric tensor g_{ij}(x) rather than restricted to constant-curvature spaces. The exponential map, parallel transport, and geodesic attention are computed via neural ODE solvers. RuVector's PDE attention module (ruvector-attention/src/pde_attention/) provides the diffusion-based foundation.
Manifold-Valued Graph Neural Fields (2034-2036): The discrete graph is replaced by a continuous neural field on a manifold: f: \mathcal{M} \to \mathcal{N}, where both the domain manifold \mathcal{M} and the codomain manifold \mathcal{N} are learned. Attention becomes a kernel on the product manifold \mathcal{M} \times \mathcal{N}. This unifies graph transformers with neural radiance fields, geometric deep learning, and topological data analysis.
Architecture Proposals
Product Manifold Attention Layer
Input: node embeddings x_i = (x_i^E, x_i^H, x_i^S) in R^k x H^m x S^n

For each component space M_j in {R^k, H^m, S^n}:
    Q_j = W_Q^j * x^j                                      // linear projection (in tangent space for H, S)
    K_j = W_K^j * x^j
    V_j = W_V^j * x^j
    alpha_ij^j = softmax(-d_{M_j}(Q_j_i, K_j_l) / tau_j)   // geodesic attention
    out_j_i    = AGGREGATE_{M_j}({alpha_ij^j, V_j_l})      // manifold-aware aggregation

// Fused attention (single kernel, as in RuVector's fused_attention.rs):
alpha_ij = softmax(beta_E * <Q_E_i, K_E_j> + beta_H * <Q_H_i, K_H_j>_tan + beta_S * <Q_S_i, K_S_j>_S)

// Aggregation per component:
out_E_i = sum_j alpha_ij * V_E_j                  // Euclidean: weighted average
out_H_i = einstein_midpoint({alpha_ij, V_H_j}, c) // hyperbolic: Einstein midpoint
out_S_i = normalize(sum_j alpha_ij * V_S_j)       // spherical: weighted sum + project

Output: (out_E_i, out_H_i, out_S_i)
Rust Pseudocode: Product Manifold Attention
/// Product manifold attention layer operating on S^n x H^m x R^k
pub struct ProductManifoldAttention {
    /// Per-component configurations with learned curvatures
    components: Vec<ManifoldComponent>,
    /// Fused attention kernel for single-pass computation
    fused_kernel: FusedCurvatureKernel,
    /// Tangent space mapper for fast hyperbolic operations
    tangent_mapper: TangentSpaceMapper,
    /// Riemannian optimizer state
    optimizer: RiemannianAdamState,
}

#[derive(Clone)]
pub enum ManifoldComponent {
    Euclidean { dim: usize },
    Hyperbolic { dim: usize, curvature: f32 }, // curvature < 0
    Spherical { dim: usize, curvature: f32 },  // curvature > 0
}

impl ProductManifoldAttention {
    /// Compute product manifold attention with geodesic scoring
    pub fn forward(
        &self,
        queries: &[Vec<f32>],  // [N, D_total]
        keys: &[Vec<f32>],     // [M, D_total]
        values: &[Vec<f32>],   // [M, D_total]
        graph_adj: &CsrMatrix, // sparse adjacency (attention mask)
    ) -> Vec<Vec<f32>> {
        let n = queries.len();
        let mut outputs = Vec::with_capacity(n);

        for i in 0..n {
            let q = &queries[i];
            let neighbors = graph_adj.neighbors(i);

            // Split query into component spaces
            let (q_e, q_h, q_s) = self.split_components(q);

            // Compute fused attention scores in a single pass
            let mut logits = Vec::with_capacity(neighbors.len());
            for &j in &neighbors {
                let k = &keys[j];
                let (k_e, k_h, k_s) = self.split_components(k);

                // Euclidean: dot product
                let score_e = dot_product(q_e, k_e);

                // Hyperbolic: tangent-space dot product (fast path)
                let q_h_tan = self.tangent_mapper.log_map(q_h);
                let k_h_tan = self.tangent_mapper.log_map(k_h);
                let score_h = dot_product(&q_h_tan, &k_h_tan);

                // Spherical: cosine similarity on the unit sphere
                let score_s = cosine_similarity(q_s, k_s);

                // Fused logit with learned mixing weights
                let logit = self.fused_kernel.weight_e * score_e
                    + self.fused_kernel.weight_h * score_h
                    + self.fused_kernel.weight_s * score_s;
                logits.push(logit);
            }

            // Softmax over neighbor logits
            let weights = softmax_with_temperature(&logits, self.fused_kernel.temperature);

            // Per-component aggregation
            let mut out_e = vec![0.0; self.euclidean_dim()];
            let mut out_h_weighted = Vec::new(); // collected for the Einstein midpoint
            let mut out_s = vec![0.0; self.spherical_dim()];

            for (idx, &j) in neighbors.iter().enumerate() {
                let v = &values[j];
                let (v_e, v_h, v_s) = self.split_components(v);
                let w = weights[idx];

                // Euclidean: simple weighted sum
                for (d, &val) in v_e.iter().enumerate() {
                    out_e[d] += w * val;
                }
                // Hyperbolic: collect for the Einstein midpoint
                out_h_weighted.push((w, v_h.to_vec()));
                // Spherical: weighted sum, projected below
                for (d, &val) in v_s.iter().enumerate() {
                    out_s[d] += w * val;
                }
            }

            // Hyperbolic aggregation via Einstein midpoint (closed-form)
            let hyp_curvature = self.hyperbolic_curvature();
            let hyp_points: Vec<&[f32]> = out_h_weighted.iter()
                .map(|(_, v)| v.as_slice())
                .collect();
            let hyp_weights: Vec<f32> = out_h_weighted.iter()
                .map(|(w, _)| *w)
                .collect();
            let out_h = einstein_midpoint(&hyp_points, &hyp_weights, hyp_curvature);

            // Spherical: project the weighted sum back to the unit sphere
            let out_s = l2_normalize(&out_s);

            // Concatenate component outputs
            let output = concat_components(&out_e, &out_h, &out_s);
            outputs.push(output);
        }
        outputs
    }

    /// Riemannian gradient step: compute gradients in the tangent space,
    /// then retract back to the manifold via the exponential map
    pub fn riemannian_step(&mut self, _loss: f32, _learning_rate: f32) {
        for component in &mut self.components {
            match component {
                ManifoldComponent::Euclidean { .. } => {
                    // Standard Euclidean Adam step
                }
                ManifoldComponent::Hyperbolic { curvature, .. } => {
                    // 1. Project the Euclidean gradient to the tangent space
                    // 2. Riemannian Adam update in the tangent space
                    // 3. Exponential map back to the Poincare ball / hyperboloid
                    let _c = curvature.abs();
                    // grad_riemannian = (1 / lambda_x^2) * grad_euclidean
                    // theta_new = exp_map(theta_old, -lr * grad_riemannian)
                }
                ManifoldComponent::Spherical { .. } => {
                    // 1. Project the gradient to the tangent plane of the sphere
                    // 2. Update in the tangent space
                    // 3. Normalize back to the unit sphere
                }
            }
        }
        // Optionally update curvatures via gradient descent:
        // d(loss)/d(kappa) flows through the geodesic distance
    }
}
Curvature-Adaptive Graph Transformer Block
Input: x in M = S^n x H^m x R^k
                 |
      +----------+-----------+
      |                      |
Product Manifold         Curvature
Self-Attention           Estimator
(geodesic QKV)           (kappa = f(x))
      |                      |
      +----------+-----------+
                 |
 Parallel Transport Aggregation
    (Levi-Civita connection)
                 |
   Tangent Space Feed-Forward
(operate in T_x M, map back via exp)
                 |
      Riemannian LayerNorm
     (normalize on manifold)
                 |
        Output: x' in M
Mathematical Formulations
Geodesic Attention
For two points x, y on a Riemannian manifold (\mathcal{M}, g):
\text{GeodesicAttention}(Q, K, V) = \text{Agg}_{\mathcal{M}}\left(\text{softmax}\left(-\frac{d_g(Q, K)}{\tau}\right) \cdot V\right)
where d_g is the geodesic distance induced by metric g, and \text{Agg}_{\mathcal{M}} is the manifold-appropriate aggregation.
Exponential Map Aggregation
Given weighted values \{(w_i, v_i)\} in the tangent space T_x\mathcal{M}:
\text{Agg}(x, \{w_i, v_i\}) = \text{Exp}_x\left(\sum_i w_i \cdot \text{Log}_x(v_i)\right)
This is equivalent to one step of Riemannian gradient descent toward the weighted Frechet mean.
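A sketch of this aggregation on the Poincare ball, reusing mobius_add and exp_map from the earlier sketches; log_map is the standard inverse of the exponential map at x:

```rust
/// Log_x(y) on B^n_c: (2 / (sqrt(c) lambda_x)) * artanh(sqrt(c)|u|) * u / |u|,
/// where u = (-x) (+)_c y.
fn log_map(x: &[f32], y: &[f32], c: f32) -> Vec<f32> {
    let neg_x: Vec<f32> = x.iter().map(|t| -t).collect();
    let u = mobius_add(&neg_x, y, c);
    let unorm = u.iter().map(|t| t * t).sum::<f32>().sqrt().max(1e-9);
    let lambda = 2.0 / (1.0 - c * x.iter().map(|t| t * t).sum::<f32>());
    let arg = (c.sqrt() * unorm).min(1.0 - 1e-5);
    let scale = (2.0 / (c.sqrt() * lambda)) * arg.atanh() / unorm;
    u.iter().map(|t| t * scale).collect()
}

/// Agg(x, {w_i, v_i}) = Exp_x(sum_i w_i Log_x(v_i)).
fn tangent_aggregate(x: &[f32], weighted_values: &[(f32, Vec<f32>)], c: f32) -> Vec<f32> {
    let mut acc = vec![0.0f32; x.len()];
    for (w, v) in weighted_values {
        for (a, l) in acc.iter_mut().zip(log_map(x, v, c)) {
            *a += w * l;
        }
    }
    exp_map(x, &acc, c)
}
```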
Product Manifold Distance
For x = (x^{(1)}, \ldots, x^{(p)}) and y = (y^{(1)}, \ldots, y^{(p)}) in \mathcal{M} = \prod_i \mathcal{M}_i^{\kappa_i}:
d_{\mathcal{M}}(x, y)^2 = \sum_{i=1}^p \beta_i \cdot d_{\mathcal{M}_i}^{\kappa_i}(x^{(i)}, y^{(i)})^2
where each d_{\mathcal{M}_i}^{\kappa_i} is the sectional-curvature-\kappa_i geodesic distance.
Curvature Gradient
For learned curvature c in the Poincare model, the gradient of the distance with respect to curvature is:
\frac{\partial d_c(x,y)}{\partial c} = -\frac{\text{arcosh}(\alpha)}{2c^{3/2}} + \frac{1}{\sqrt{c(\alpha^2 - 1)}} \frac{\partial \alpha}{\partial c}
where \alpha = 1 + 2c\|x-y\|^2 / ((1-c\|x\|^2)(1-c\|y\|^2)).
Implementation Roadmap for RuVector
Phase 1: Extend Fused Curvature Attention (3-4 months)
- Add learned per-component curvature to FusedCurvatureConfig
- Implement curvature gradient computation in ruvector-attention/src/curvature/
- Extend TangentSpaceMapper to handle variable curvatures per batch element
- Add spherical aggregation (normalize after weighted sum) alongside Einstein midpoint
- Benchmark against fixed-curvature baseline
Phase 2: Parallel Transport and Riemannian Optimization (4-6 months)
- Implement parallel transport for Poincare ball and Lorentz model
- Build a RiemannianAdam optimizer extending ruvector-attention/src/training/optimizer.rs
- Add Levi-Civita connection-based message passing to ruvector-graph
- Integrate with ruvector-solver for sublinear geodesic computation on large graphs
Phase 3: Lie Group Equivariance (6-9 months)
- Add SE(3)-equivariant attention for molecular graphs
- Implement a fiber bundle framework connecting to ruvector-attention/src/sheaf/
- Extend the ruvector-graph property graph to carry manifold-valued node features
- Develop equivariant sparse attention using ruvector-dag/src/mincut/ for graph sparsification
Phase 4: Continuous Curvature Fields (12-18 months)
- Implement a neural curvature field \kappa(x) using a small MLP
- Develop a numerical geodesic solver for non-constant curvature (connect to the PDE attention module)
- Build differentiable metric tensor learning
- Integrate with ruvector-temporal-tensor for time-varying curvature fields
Success Metrics
| Metric | Baseline (Euclidean) | Target (Product Manifold) |
|---|---|---|
| Knowledge graph link prediction (MRR) | 0.45 | 0.55-0.60 |
| Hierarchy reconstruction accuracy | 65% | 85-95% |
| Embedding dimension for same quality | 256 | 128 |
| Attention computation (fused kernel) | 1.0x | 1.2x (overhead acceptable) |
| Training convergence (epochs) | 100 | 60-70 |
| Molecular property prediction (MAE) | 1.0x | 0.80-0.85x |
References
- Bachmann, Becigneul, Ganea (2020). "Constant Curvature Graph Convolutional Networks." ICML.
- Chami, Ying, Re, Leskovec (2019). "Hyperbolic Graph Convolutional Neural Networks." NeurIPS.
- Gu, Sala, Gunel, Re (2019). "Learning Mixed-Curvature Representations in Product Spaces." ICLR.
- Nickel, Kiela (2017). "Poincare Embeddings for Learning Hierarchical Representations." NeurIPS.
- Sala, De Sa, Gu, Re (2018). "Representation Tradeoffs for Hyperbolic Embeddings." ICML.
- Ungar (2008). "Analytic Hyperbolic Geometry and Albert Einstein's Special Theory of Relativity."
- Ganea, Becigneul, Hofmann (2018). "Hyperbolic Neural Networks." NeurIPS.
- Fuchs, Worrall, Fischer, Welling (2020). "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks." NeurIPS.
- Brandstetter, Hesselink, van der Pol, Bekkers, Welling (2022). "Geometric and Physical Quantities Improve E(3) Equivariant Message Passing." ICLR.
- Skopek, Ganea, Becigneul (2020). "Mixed-curvature Variational Autoencoders." ICLR.
- Lou, Nickel, Zantedeschi (2020). "Differentiating through the Frechet Mean." ICML.
- Xiong, Zhu, Hsieh, Ma, Liu (2022). "Pseudo-Riemannian Graph Convolutional Networks." NeurIPS.
Document Status: Research Proposal
Last Updated: 2026-02-25
Owner: RuVector Architecture Team
Related ADRs: ADR-045 (Lean Agentic Integration)
Related Crates: ruvector-attention, ruvector-graph, ruvector-solver, ruvector-dag