Latent Space ↔ Graph Reality Interplay
Executive Summary
This document explores the fundamental relationship between high-dimensional latent space (where embeddings live) and graph reality (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.
Central Question: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?
1. The Two Worlds
1.1 Latent Space Characteristics
Definition: Continuous, high-dimensional vector space where node embeddings reside
Latent Space L: R^d where d ∈ [64, 1024+]
Node embedding: h_v ∈ L
Distance metric: d(h_u, h_v) (cosine, L2, etc.)
Properties:
- ✓ Continuous: Smooth interpolation between points
- ✓ Dense: Every point is surrounded by infinitely many neighbors
- ✓ High-Dimensional: Curse of dimensionality, but expressive
- ✓ Metric: Equipped with distance/similarity function
- ✗ Isotropic: May not preserve hierarchical structure
- ✗ Euclidean Bias: Most operations assume flat geometry
Current RuVector Latent Space:
// From layer.rs:337-350
RuvectorLayer {
    input_dim: usize,  // Typically 64-256
    hidden_dim: usize, // Typically 128-512
    // Embeddings live in R^{hidden_dim}
}
1.2 Graph Reality Characteristics
Definition: Discrete topological structure G = (V, E)
Graph G:
- Vertices: V = {v_1, ..., v_n}
- Edges: E ⊆ V × V
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
- Topology: Small-world, scale-free, hierarchical, etc.
Properties:
- ✓ Discrete: Finite nodes and edges
- ✓ Sparse: |E| << |V|² (typically)
- ✓ Structured: Communities, hierarchies, motifs
- ✓ Relational: Explicit connections
- ✗ Non-Metric: Shortest path not always meaningful
- ✗ Heterogeneous: Variable degree, asymmetric
RuVector Graph (HNSW):
Hierarchical Navigable Small World:
- Layer 0: Dense graph (M = 16-64 neighbors)
- Layer 1+: Sparse graphs (long-range connections)
- Navigable: Greedy search finds approximate NN
- Small-world: Low diameter, high clustering
2. The Fundamental Tension
2.1 Embedding Paradox
Goal 1: Preserve graph topology in latent space
If (u, v) ∈ E, then ||h_u - h_v|| should be small
Goal 2: Preserve latent similarity in graph
If ||h_u - h_v|| is small, then u and v should be related
Paradox: These are not equivalent!
- Graph neighbors may be semantically different (e.g., bridge edges)
- Latent neighbors may not be graph-connected (e.g., same cluster, different components)
2.2 Information Bottleneck
Graph G ──encode──> Latent h ──decode──> Graph G'
(GNN) (predict edges/nodes)
Bottleneck: Fixed-dimensional h must compress all information from:
- Node features
- Local topology (ego-net)
- Global structure (communities, paths)
- Edge attributes
- Dynamic patterns
Trade-off:
- High dimensions: More expressive, but curse of dimensionality
- Low dimensions: Efficient, but lossy compression
3. Manifold Hypothesis for Graphs
3.1 Low-Dimensional Manifold
Hypothesis: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space
True data distribution: P_data(h) supported on manifold M ⊂ R^d
where dim(M) << d
Implications:
- Intrinsic Dimensionality: Effective degrees of freedom much less than d
- Local Linearity: Small neighborhoods approximately Euclidean
- Global Curvature: Manifold may be curved (non-Euclidean)
Evidence in RuVector:
- HNSW assumes low intrinsic dimension for efficient search
- Multi-head attention learns multiple "views" of manifold
- Layer normalization assumes local isotropy
3.2 Geometric Structure of Graph Embeddings
Question: What geometry best represents graphs?
Option 1: Euclidean (Current)
h ∈ R^d, distance = ||h_u - h_v||_2
- ✓ Simple, well-understood
- ✓ Efficient operations (dot products, linear maps)
- ✗ Poor for tree-like structures
- ✗ Exponential capacity limited
Option 2: Hyperbolic (Poincaré Ball)
h ∈ B^d = {x ∈ R^d : ||x|| < 1}
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
- ✓ Exponential capacity: Volume grows exponentially with radius
- ✓ Natural hierarchies: Tree embeddings with low distortion
- ✓ HNSW synergy: Hierarchical layers naturally hyperbolic
- ✗ More complex operations
- ✗ Numerical instability near boundary
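To make Option 2 concrete, here is a minimal sketch of the Poincaré-ball distance (not part of the current codebase); it assumes both embeddings already lie strictly inside the unit ball and clamps the arcosh argument to guard against boundary instability:
// Sketch: Poincaré-ball distance for Option 2 (assumes ||x||, ||y|| < 1)
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let norm_sq = |v: &[f32]| v.iter().map(|&a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(&a, &b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - norm_sq(x)) * (1.0 - norm_sq(y));
    // Clamp to stay in arcosh's domain despite floating-point error near the boundary
    let arg = (1.0 + 2.0 * diff_sq / denom.max(1e-7)).max(1.0);
    // arcosh(z) = ln(z + sqrt(z² - 1))
    (arg + (arg * arg - 1.0).sqrt()).ln()
}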
Option 3: Mixed Curvature (Product Manifolds)
h = (h_euclidean, h_hyperbolic, h_spherical)
Combine different geometries for different aspects
3.3 Curvature and Graph Structure
Relationship:
Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
Zero curvature (Euclidean) ↔ Grid-like, regular
Positive curvature (spherical) ↔ Cyclic, clustered
HNSW Topology:
- Layer 0: Locally grid-like (Euclidean)
- Higher layers: Tree-like navigation (Hyperbolic)
- Overall: Mixed curvature
Implication: Single geometry may be suboptimal; consider mixed-curvature embeddings
4. Encoding: Graph → Latent Space
4.1 Message Passing Framework (Current)
Goal: Aggregate neighborhood information into node embedding
From layer.rs:362-401 (RuvectorLayer.forward):
h_v^{(l+1)} = UPDATE(
h_v^{(l)},
AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
TRANSFORM(h_v^{(l)})
)
Current Pipeline:
1. Message: m_u = W_msg · h_u
2. Attention Aggregate: a_v = MultiHeadAttention(h_v, {h_u})
3. Weighted Aggregate: agg_v = Σ w_uv · m_u
4. Combine: combined = a_v + agg_v
5. Update: h'_v = GRU(W_agg · combined, h_v)
6. Normalize: output = LayerNorm(Dropout(h'_v))
What This Encodes:
- ✓ Local neighborhood structure (1-hop in one layer)
- ✓ Neighbor feature aggregation
- ✓ Temporal dynamics (via GRU)
- ✗ Global structure (requires stacking layers)
- ✗ Structural properties (degree, centrality, etc.)
- ✗ Edge semantics (only weights, not features)
4.2 Multi-Hop Information Propagation
Challenge: K-layer GNN sees only K-hop neighborhood
Receptive field after L layers: d_graph(u, v) ≤ L
For HNSW:
- Layer 0 average degree: ~50
- Layer 1 average degree: ~10
- Exponential reduction in higher layers
Trade-off:
- Many layers: Large receptive field, but over-smoothing
- Few layers: Localized, but miss global context
Over-Smoothing Problem:
As L → ∞, all node embeddings converge to the same value:
h_v^{(∞)} → E[h] for all v
Mitigation Strategies:
- Skip Connections: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)}) (see the sketch after this list)
- Residual GRU: implicit in h_t = (1 - z_t) h_{t-1} + z_t h̃_t
- Jumping Knowledge: concatenate all layer outputs
- Adaptive Depth: learn when to stop propagating
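A minimal sketch of the skip-connection idea, with `layer_forward` standing in for a call to RuvectorLayer::forward (hypothetical wiring, not the current API):
// Sketch: residual (skip) connection around one message-passing step
// h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
fn residual_step(h: &[f32], layer_forward: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    let update = layer_forward(h);
    h.iter().zip(update.iter()).map(|(&a, &b)| a + b).collect()
}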
4.3 Structural Features Beyond Neighborhoods
Current limitation: Only neighbor features, not structural properties
Missing Encodings:
- Node Degree:
deg(v) = |N(v)| - Clustering Coefficient:
C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2) - Centrality: Betweenness, closeness, eigenvector
- Community Membership: Detected clusters
- HNSW Layer: Which layers the node appears in
Proposed Enhancement:
pub struct StructuralFeatures {
    degree: f32,
    clustering_coef: f32,
    hnsw_layers: Vec<usize>, // Layers this node appears in
    centrality: f32,
}

impl StructuralFeatures {
    /// Flatten into a feature vector for concatenation with the embedding.
    fn to_vec(&self) -> Vec<f32> {
        let mut v = vec![self.degree, self.clustering_coef, self.centrality];
        v.extend(self.hnsw_layers.iter().map(|&l| l as f32));
        v
    }
}

impl RuvectorLayer {
    fn forward_with_structural(
        &self,
        node_embedding: &[f32],
        neighbor_embeddings: &[Vec<f32>],
        edge_weights: &[f32],
        structural_features: &StructuralFeatures, // NEW
    ) -> Vec<f32> {
        // Concatenate structural features to the embedding
        let structural = structural_features.to_vec();
        let augmented = [node_embedding, structural.as_slice()].concat();
        // Proceed with the standard forward pass on `augmented`
        // ...
    }
}
5. Decoding: Latent Space → Graph Predictions
5.1 Link Prediction
Goal: Predict edge existence from embeddings
Score(u, v) = f(h_u, h_v)
P((u,v) ∈ E) = σ(Score(u, v))
Scoring Functions:
1. Dot Product (Current in search.rs)
score = h_u.dot(h_v)
- ✓ Fast O(d)
- ✗ Not invariant to scaling
2. Cosine Similarity (Current in search.rs)
score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
- ✓ Scale-invariant
- ✓ Natural for normalized embeddings
- ✗ Ignores magnitude information
3. Distance-Based
score = -||h_u - h_v||²
- ✓ Metric structure
- ✗ Negative, unbounded
4. Bilinear
score = h_u^T W h_v
- ✓ Learnable asymmetry
- ✗ O(d²) parameters
5. MLP (Most Expressive)
score = MLP([h_u || h_v || (h_u ⊙ h_v)])
- ✓ Highly expressive
- ✗ Expensive O(d²) or more
RuVector Current:
// From search.rs:4-18
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot_product / (norm_a * norm_b)
    }
}
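For comparison, a minimal sketch of the learnable bilinear scorer (option 4 above); the weight matrix W is a hypothetical parameter stored row-major, not something present in search.rs:
// Sketch: bilinear link scorer score = h_u^T W h_v (W stored row-major, d × d)
fn bilinear_score(h_u: &[f32], h_v: &[f32], w: &[f32]) -> f32 {
    let d = h_u.len();
    assert_eq!(w.len(), d * d);
    (0..d)
        .map(|i| {
            // (W h_v)_i
            let wh: f32 = (0..d).map(|j| w[i * d + j] * h_v[j]).sum();
            h_u[i] * wh
        })
        .sum()
}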
5.2 Node Classification
Goal: Predict node labels from embeddings
From graph_neural.rs:82-98:
classify_node(h_v) → class probabilities
Typical Approach:
logits = W_class · h_v + b
probs = softmax(logits)
Challenge: Embedding must encode label-relevant information
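A minimal sketch of this classification head (the actual graph_neural.rs signature may differ); W_class is assumed to be stored row-major with shape C × d:
// Sketch: logits = W_class · h_v + b, followed by a numerically stable softmax
fn classify_node_sketch(h_v: &[f32], w_class: &[f32], b: &[f32]) -> Vec<f32> {
    let (c, d) = (b.len(), h_v.len());
    let logits: Vec<f32> = (0..c)
        .map(|k| (0..d).map(|j| w_class[k * d + j] * h_v[j]).sum::<f32>() + b[k])
        .collect();
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}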
5.3 Graph Reconstruction
Goal: Reconstruct adjacency matrix from embeddings
Autoencoder Framework:
Encoder: A → H (GNN)
Decoder: H → A' (pairwise scoring)
Loss: ||A - A'||² or Cross-Entropy
Reconstruction Loss:
// Proposed
pub fn graph_reconstruction_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)], // True edges
) -> f32 {
    let mut loss = 0.0;
    let n = embeddings.len();
    // Cosine similarity lies in [-1, 1]; map it to (0, 1) before taking the log
    let to_prob = |score: f32| 0.5 * (score + 1.0);
    // Positive edges (should have high score)
    for &(i, j) in adjacency {
        let score = cosine_similarity(&embeddings[i], &embeddings[j]);
        loss -= (to_prob(score) + 1e-10).ln(); // -log(p)
    }
    // Negative sampling (non-edges should have low score)
    for _ in 0..adjacency.len() {
        let i = rand::random::<usize>() % n;
        let j = rand::random::<usize>() % n;
        if i != j && !adjacency.contains(&(i, j)) {
            let score = cosine_similarity(&embeddings[i], &embeddings[j]);
            loss -= (1.0 - to_prob(score) + 1e-10).ln(); // -log(1 - p)
        }
    }
    loss / (2 * adjacency.len()) as f32
}
6. Information-Theoretic Perspective
6.1 Mutual Information
Goal: Maximize mutual information between graph structure G and embeddings H
max I(G; H) = H(G) - H(G|H)
= H(H) - H(H|G)
Interpretation:
- I(G; H) measures how much knowing H tells us about G
- Perfect encoding: I(G; H) = H(G) (H captures all graph info)
- Independence: I(G; H) = 0 (H tells nothing about G)
Challenges:
- Intractability: Computing I(G; H) is hard
- Continuous H: Differential entropy unbounded
- Discrete G: Entropy depends on graph size
6.2 Deep Graph Infomax (DGI)
Idea: Maximize MI between node embeddings and graph summary
DGI Loss:
max I(h_v; h_G)
where:
h_v: node embedding
h_G: graph-level summary (e.g., mean pooling)
Implementation:
// `discriminator` (an MLP or bilinear scorer) and `sigmoid` are assumed helpers
pub fn deep_graph_infomax_loss(
    node_embeddings: &[Vec<f32>],
    graph_summary: &[f32],         // Readout function output
    negative_samples: &[Vec<f32>], // Corrupted embeddings
) -> f32 {
    let mut loss = 0.0;
    // Positive samples: real (node, graph) pairs
    for h_v in node_embeddings {
        let score = discriminator(h_v, graph_summary);
        loss -= (sigmoid(score) + 1e-10).ln();
    }
    // Negative samples: (corrupted node, graph) pairs
    for h_neg in negative_samples {
        let score = discriminator(h_neg, graph_summary);
        loss -= (1.0 - sigmoid(score) + 1e-10).ln();
    }
    loss / (node_embeddings.len() + negative_samples.len()) as f32
}
Readout Functions (graph summary):
- Mean: h_G = (1/n) Σ_v h_v
- Max: h_G = max_v h_v (element-wise)
- Attention: h_G = Σ_v α_v h_v where α_v = softmax(MLP(h_v))
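Minimal sketches of the mean and max readouts (the attention readout would additionally need a learned MLP):
// Sketch: mean readout h_G = (1/n) Σ_v h_v
fn mean_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let n = node_embeddings.len() as f32;
    let mut h_g = vec![0.0f32; node_embeddings[0].len()];
    for h in node_embeddings {
        for (acc, &x) in h_g.iter_mut().zip(h) {
            *acc += x / n;
        }
    }
    h_g
}

// Sketch: element-wise max readout h_G = max_v h_v
fn max_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let mut h_g = vec![f32::NEG_INFINITY; node_embeddings[0].len()];
    for h in node_embeddings {
        for (acc, &x) in h_g.iter_mut().zip(h) {
            *acc = acc.max(x);
        }
    }
    h_g
}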
6.3 Information Bottleneck Principle
Principle: Find minimal sufficient representation
min I(X; H) - β I(H; Y)
where:
X: input features
H: learned embeddings
Y: prediction target
β: trade-off parameter
Graph Context:
- X: Node features + neighborhood structure
- H: Node embeddings
- Y: Downstream task (link prediction, classification)
Goal: Compress X into H, retaining only task-relevant information
Implementation Strategy:
- Variational Bound: Use VAE-style reparameterization
- Lagrange Multiplier: β controls compression vs. performance
- Regularization: Encourage low mutual information I(X; H)
7. Contrastive Learning for Graph-Latent Alignment
7.1 Contrastive Objectives
Core Idea: Pull together related nodes in latent space, push apart unrelated nodes
InfoNCE Loss (Current in training.rs:362-411):
pub fn info_nce_loss(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32
Mathematical Form:
L_InfoNCE = -log(exp(sim(h_v, h_+) / τ) / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)))
What This Optimizes:
- Positive pairs (h_v, h_+): Graph neighbors, semantically similar
- Negative pairs (h_v, h_-): Non-neighbors, dissimilar
- Temperature τ: Controls hardness of negatives
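A standalone sketch of the single-positive case, consistent with the formula above but independent of the training.rs implementation (cosine similarity is assumed as sim(·,·)):
// Sketch: InfoNCE for one anchor, one positive, and a set of negatives
fn info_nce_sketch(anchor: &[f32], positive: &[f32], negatives: &[&[f32]], temperature: f32) -> f32 {
    let cos = |a: &[f32], b: &[f32]| {
        let dot: f32 = a.iter().zip(b).map(|(&x, &y)| x * y).sum();
        let na: f32 = a.iter().map(|&x| x * x).sum::<f32>().sqrt();
        let nb: f32 = b.iter().map(|&x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
    };
    let pos = (cos(anchor, positive) / temperature).exp();
    let neg: f32 = negatives.iter().map(|h| (cos(anchor, h) / temperature).exp()).sum();
    -(pos / (pos + neg)).ln()
}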
7.2 Local Contrastive Loss (Graph-Specific)
Current Implementation (training.rs:444-462):
pub fn local_contrastive_loss(
    node_embedding: &[f32],
    neighbor_embeddings: &[Vec<f32>],
    non_neighbor_embeddings: &[Vec<f32>],
    temperature: f32,
) -> f32
Graph-Aware Sampling:
- Positives: Direct graph neighbors N(v)
- Negatives: Non-neighbors (random or hard negatives)
Variants:
1. K-Hop Positives
Positives = {u : d_graph(v, u) ≤ K}
Encourages multi-hop proximity in latent space
2. Community-Based
Positives = {u : community(v) = community(u)}
Negatives = {u : community(v) ≠ community(u)}
Encourages cluster separation
3. HNSW Layer-Based
Positives = {u : (u,v) ∈ E_layer_k}
Different contrastive losses per HNSW layer
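A minimal sketch of Variant 1 (K-hop positive sampling), assuming an adjacency-list representation rather than the actual HNSW structures:
use std::collections::{HashMap, HashSet, VecDeque};

// Sketch: collect all nodes within K hops of v as contrastive positives (BFS)
fn k_hop_positives(adj: &HashMap<usize, Vec<usize>>, v: usize, k: usize) -> HashSet<usize> {
    let mut visited: HashSet<usize> = HashSet::from([v]);
    let mut frontier: VecDeque<(usize, usize)> = VecDeque::from([(v, 0)]);
    while let Some((u, depth)) = frontier.pop_front() {
        if depth == k {
            continue;
        }
        if let Some(neighbors) = adj.get(&u) {
            for &w in neighbors {
                if visited.insert(w) {
                    frontier.push_back((w, depth + 1));
                }
            }
        }
    }
    visited.remove(&v); // positives exclude the anchor itself
    visited
}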
7.3 Hard Negative Mining
Problem: Random negatives are often too easy
Solution: Sample hard negatives (latent-close but graph-far)
fn sample_hard_negatives(
    node: &[f32],
    all_embeddings: &[Vec<f32>],
    true_neighbors: &[usize],
    k: usize,
) -> Vec<Vec<f32>> {
    // 1. Compute similarities to all non-neighbor nodes
    let mut similarities: Vec<(usize, f32)> = all_embeddings
        .iter()
        .enumerate()
        .filter(|(i, _)| !true_neighbors.contains(i)) // Exclude true neighbors
        .map(|(i, emb)| (i, cosine_similarity(node, emb)))
        .collect();
    // 2. Sort by similarity (descending)
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    // 3. Take top-k (most similar non-neighbors = hard negatives)
    similarities.iter()
        .take(k)
        .map(|(i, _)| all_embeddings[*i].clone())
        .collect()
}
Benefits:
- Focuses learning on difficult cases
- Improves discrimination boundaries
- Speeds up convergence
8. Spectral Methods and Graph Signals
8.1 Graph Laplacian
Normalized Laplacian:
L = I - D^(-1/2) A D^(-1/2)
where:
A: adjacency matrix
D: degree matrix (diagonal)
Eigendecomposition:
L = U Λ U^T
where:
U: eigenvectors (graph Fourier basis)
Λ: eigenvalues (frequencies)
Interpretation:
- Small eigenvalues ↔ Low-frequency (smooth signals)
- Large eigenvalues ↔ High-frequency (oscillatory signals)
8.2 Spectral GNN Connection
Classical GCN (Kipf & Welling):
H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})
where à = A + I (self-loops)
Spectral Interpretation:
- Aggregation = Low-pass filter
- Smooths node features along graph structure
- Eigenvalues control extent of smoothing
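A minimal dense sketch of the symmetric normalization behind this filter (the learned weights W^{(l)} and nonlinearity σ are omitted); a sparse formulation would be needed at HNSW scale:
// Sketch: one dense GCN propagation step H' = D̃^{-1/2} Ã D̃^{-1/2} H, with Ã = A + I
fn gcn_propagate(adj: &[Vec<f32>], h: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = adj.len();
    let d = h[0].len();
    // D̃ = row sums of Ã (adjacency plus self-loop)
    let deg: Vec<f32> = (0..n).map(|i| adj[i].iter().sum::<f32>() + 1.0).collect();
    let mut out = vec![vec![0.0f32; d]; n];
    for i in 0..n {
        for j in 0..n {
            let a_tilde = adj[i][j] + if i == j { 1.0 } else { 0.0 };
            if a_tilde == 0.0 {
                continue;
            }
            let norm = a_tilde / (deg[i].sqrt() * deg[j].sqrt());
            for f in 0..d {
                out[i][f] += norm * h[j][f];
            }
        }
    }
    out
}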
RuVector's Approach (Spatial, not spectral):
- Message passing is spatial formulation
- Attention adds adaptive filtering
- GRU adds temporal component
Missing Spectral Component:
- No explicit frequency analysis
- Could add spectral loss to preserve frequency content
8.3 Spectral Loss Functions
Goal: Preserve graph spectral properties in embeddings
Laplacian Eigenmaps:
min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²
subject to H^T H = I
Equivalent to minimizing: Tr(H^T L H)
Implementation:
// `subtract` and `l2_norm_squared` are assumed element-wise vector helpers
pub fn spectral_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],
    degrees: &[f32],
) -> f32 {
    let mut loss = 0.0;
    // Laplacian regularization: ||h_i - h_j||² for edges
    for &(i, j) in adjacency {
        let diff = subtract(&embeddings[i], &embeddings[j]);
        let norm_sq = l2_norm_squared(&diff);
        // Symmetric degree normalization, as in L = I - D^{-1/2} A D^{-1/2}
        let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
        loss += weight * norm_sq;
    }
    loss
}
Benefits:
- Smooth embeddings along graph structure
- Preserves community structure
- Theoretical guarantees (Laplacian eigenmaps)
9. Disentangled Representations
9.1 Motivation
Problem: Current embeddings are entangled (single vector encodes everything)
Goal: Separate embedding into interpretable factors
h_v = [h_structural || h_semantic || h_temporal]
where:
h_structural: Topology (degree, centrality, etc.)
h_semantic: Feature content
h_temporal: Dynamics (for evolving graphs)
9.2 β-VAE for Graphs
Variational Autoencoder with Disentanglement:
Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
Decoder: z_v → p(X_v, N(v) | z_v)
Loss: L_VAE = E[log p(X_v | z_v)] - β KL(q(z_v) || p(z_v))
β > 1: Encourages disentanglement (independence of latent factors)
Implementation Sketch:
// Assumes `Linear` is a simple fully-connected layer, plus `rand::Rng` and
// `rand_distr::StandardNormal` for the reparameterization trick
pub struct GraphVAE {
    encoder: RuvectorLayer,
    mu_layer: Linear,
    logvar_layer: Linear,
    decoder: Linear,
}

impl GraphVAE {
    fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
        let h = self.encoder.forward(node_features, neighbors, &[]);
        let mu = self.mu_layer.forward(&h);
        let logvar = self.logvar_layer.forward(&h);
        (mu, logvar)
    }

    fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
        // z = mu + sigma * eps, with eps ~ N(0, I)
        let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
        let eps: Vec<f32> = (0..mu.len())
            .map(|_| rand::thread_rng().sample(StandardNormal))
            .collect();
        mu.iter().zip(std.iter()).zip(eps.iter())
            .map(|((&m, &s), &e)| m + s * e)
            .collect()
    }

    fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
        let (mu, logvar) = self.encode(node_features, neighbors);
        let z = self.reparameterize(&mu, &logvar);
        // Reconstruct node features
        let recon = self.decoder.forward(&z);
        // KL divergence against the standard normal prior
        let kl: f32 = mu.iter().zip(logvar.iter())
            .map(|(&m, &lv)| -0.5 * (1.0 + lv - m * m - lv.exp()))
            .sum();
        (recon, kl)
    }
}
9.3 Disentanglement Metrics
1. Mutual Information Gap (MIG)
MIG(z, y) = (1/K) Σ_k (I(z; y_k)_largest - I(z; y_k)_2nd_largest) / H(y_k)
Measures how uniquely each latent factor captures each ground-truth factor
2. SAP (Separated Attribute Predictability)
Train linear classifiers z → y for each attribute
Measure how well z predicts individual factors
Application to Graphs:
- Ground-truth factors: Degree, clustering, centrality, community
- Learned latent: h_v
- Metric: MIG or SAP between h_v components and structural properties
10. Hierarchical Representations (HNSW-Specific)
10.1 Multi-Scale Embeddings
Idea: Different embeddings for different HNSW layers
Node v appears in layers {0, 2, 3}:
h_v^{(0)}: Dense, local structure
h_v^{(2)}: Coarse, medium-range
h_v^{(3)}: Global, long-range hubs
Hierarchical Encoding:
use std::collections::HashMap;

pub struct HierarchicalEmbedding {
    embeddings_by_layer: HashMap<usize, Vec<f32>>,
}

impl HierarchicalEmbedding {
    fn get_embedding(&self, layer: usize) -> &Vec<f32> {
        self.embeddings_by_layer.get(&layer)
            .expect("Node not in this layer")
    }

    // Interpolate between layers for search
    fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
        let layer_low = target_layer.floor() as usize;
        let layer_high = target_layer.ceil() as usize;
        if layer_low == layer_high {
            return self.get_embedding(layer_low).clone();
        }
        let alpha = target_layer - layer_low as f32;
        let emb_low = self.get_embedding(layer_low);
        let emb_high = self.get_embedding(layer_high);
        // Linear interpolation
        emb_low.iter().zip(emb_high.iter())
            .map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
            .collect()
    }
}
Hierarchical Loss:
// `sample_negatives_other_layers` is an assumed helper that draws negatives
// from HNSW layers other than `layer`
fn hierarchical_contrastive_loss(
    node_hierarchical_emb: &HierarchicalEmbedding,
    neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
) -> f32 {
    let mut loss = 0.0;
    // Contrastive loss at each layer
    for (layer, layer_neighbors) in neighbors_by_layer {
        let h_v = node_hierarchical_emb.get_embedding(*layer);
        let positives: Vec<&Vec<f32>> = layer_neighbors.iter()
            .map(|n| n.get_embedding(*layer))
            .collect();
        // Sample negatives from other layers
        let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);
        loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
    }
    loss / neighbors_by_layer.len() as f32
}
10.2 Coarse-to-Fine Alignment
Goal: Ensure consistency across HNSW layers
Alignment Loss:
L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)
Benefits:
- Global structure (high layers) guides local (low layers)
- Enables layer-skipping (jump from layer 3 to layer 0 embedding)
- Multi-resolution representation
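A minimal sketch of the per-node, per-layer-pair alignment term, where `project` stands in for the learned map from the higher-layer dimension to the lower-layer dimension (hypothetical, not in the codebase):
// Sketch: one term of L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
fn alignment_term(
    h_low: &[f32],  // h_v^{(l)}
    h_high: &[f32], // h_v^{(l')}
    project: impl Fn(&[f32]) -> Vec<f32>,
) -> f32 {
    let projected = project(h_high);
    h_low.iter()
        .zip(projected.iter())
        .map(|(&a, &b)| (a - b) * (a - b))
        .sum()
}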
11. Practical Strategies for RuVector
11.1 Short-Term Enhancements
1. Structural Feature Augmentation
// Add degree, clustering, HNSW layer info to embeddings
let augmented_embedding = [
    &node_embedding[..],
    &[degree as f32],
    &[clustering_coef],
    &one_hot_layer[..],
].concat();
2. Spectral Regularization
// Add spectral loss to training
total_loss = contrastive_loss + λ_spectral * spectral_loss
3. Hard Negative Sampling
// Replace random negatives with hard negatives in local_contrastive_loss
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);
11.2 Medium-Term Research
4. Hierarchical Embeddings per HNSW Layer
pub struct HNSWHierarchicalGNN {
    gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
}
5. Hyperbolic Embeddings for Higher Layers
// Layer 0: Euclidean (local, grid-like)
// Layer 1+: Hyperbolic (hierarchical navigation)
pub enum GeometricEmbedding {
    Euclidean(Vec<f32>),
    Hyperbolic(Vec<f32>), // Poincaré ball
}
6. Disentangled VAE
// Separate structural vs. semantic information
pub struct DisentangledGraphVAE {
    structural_encoder: RuvectorLayer,
    semantic_encoder: RuvectorLayer,
    decoder: Linear,
}
11.3 Long-Term Exploration
7. Information Bottleneck Optimization
- Minimize I(X; H) while maximizing I(H; Y)
- Variational bounds for tractability
- Beta-annealing schedule
8. Graph Transformers
- Replace message passing with full attention
- Positional encodings (Laplacian eigenvectors, RoPE)
- Layer-wise multi-scale attention
9. Neural ODEs for Continuous Depth
dh/dt = GNN(h(t), G)
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
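A minimal explicit-Euler sketch of this continuous-depth idea (a production version would use an adaptive solver and the adjoint method for gradients); `gnn` stands in for the message-passing dynamics:
// Sketch: integrate dh/dt = GNN(h(t), G) over [0, T] with fixed step dt
fn ode_integrate(mut h: Vec<f32>, gnn: impl Fn(&[f32]) -> Vec<f32>, t_end: f32, dt: f32) -> Vec<f32> {
    let steps = (t_end / dt).ceil() as usize;
    for _ in 0..steps {
        let dh = gnn(&h);
        for (hi, di) in h.iter_mut().zip(dh.iter()) {
            *hi += dt * di;
        }
    }
    h
}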
12. Evaluation Metrics for Latent-Graph Alignment
12.1 Reconstruction Metrics
1. Link Prediction AUC
Measure how well latent similarity predicts edges
AUC-ROC on link prediction task
2. Graph Reconstruction Error
||A - σ(H H^T)||²_F
where A is adjacency, H is embeddings
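A minimal sketch of computing the link-prediction AUC from precomputed scores for positive (true edge) and negative (non-edge) pairs; it is O(P·N), which is acceptable for evaluation-sized samples:
// Sketch: AUC = probability that a random positive pair outranks a random negative pair (ties count 0.5)
fn link_prediction_auc(pos_scores: &[f32], neg_scores: &[f32]) -> f32 {
    let mut wins = 0.0f64;
    for &p in pos_scores {
        for &n in neg_scores {
            if p > n {
                wins += 1.0;
            } else if p == n {
                wins += 0.5;
            }
        }
    }
    wins as f32 / (pos_scores.len() * neg_scores.len()) as f32
}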
12.2 Structural Preservation
3. Rank Correlation
Spearman ρ between:
- Graph distance d_G(u, v)
- Latent distance d_L(h_u, h_v)
4. Distortion
max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
Worst-case embedding distortion
5. Average Distortion
(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
12.3 Downstream Task Performance
6. Node Classification Accuracy
Train classifier on embeddings, test accuracy
7. Clustering Modularity
K-means on embeddings, measure graph modularity
8. HNSW Search Quality
Recall@K using learned embeddings vs. original features
References
Papers
Manifold Learning:
1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction
Hyperbolic Embeddings:
3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks
Information Theory:
5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
6. Velickovic et al. (2019) - Deep Graph Infomax
Contrastive Learning:
7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
8. You et al. (2020) - Graph Contrastive Learning with Augmentations
Disentanglement:
9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
10. Ma et al. (2019) - Disentangled Graph Convolutional Networks
RuVector Code
- crates/ruvector-gnn/src/layer.rs - GNN encoding
- crates/ruvector-gnn/src/search.rs - Latent similarity (decoding)
- crates/ruvector-gnn/src/training.rs - Contrastive losses (alignment)
Document Version: 1.0 Last Updated: 2025-11-30 Author: RuVector Research Team