Latent Space ↔ Graph Reality Interplay

Executive Summary

This document explores the fundamental relationship between high-dimensional latent space (where embeddings live) and graph reality (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.

Central Question: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?


1. The Two Worlds

1.1 Latent Space Characteristics

Definition: Continuous, high-dimensional vector space where node embeddings reside

Latent Space L: R^d where d ∈ [64, 1024+]
Node embedding: h_v ∈ L
Distance metric: d(h_u, h_v) (cosine, L2, etc.)

Properties:

  • Continuous: Smooth interpolation between points
  • Dense: Every point is surrounded by infinitely many neighbors
  • High-Dimensional: Curse of dimensionality, but expressive
  • Metric: Equipped with distance/similarity function
  • Isotropic: May not preserve hierarchical structure
  • Euclidean Bias: Most operations assume flat geometry

Current RuVector Latent Space:

// From layer.rs:337-350
RuvectorLayer {
    input_dim: usize,   // Typically 64-256
    hidden_dim: usize,  // Typically 128-512
    // Embeddings live in R^{hidden_dim}
}

1.2 Graph Reality Characteristics

Definition: Discrete topological structure G = (V, E)

Graph G:
- Vertices: V = {v_1, ..., v_n}
- Edges: E ⊆ V × V
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
- Topology: Small-world, scale-free, hierarchical, etc.

Properties:

  • Discrete: Finite nodes and edges
  • Sparse: |E| << |V|² (typically)
  • Structured: Communities, hierarchies, motifs
  • Relational: Explicit connections
  • Non-Metric: Shortest path not always meaningful
  • Heterogeneous: Variable degree, asymmetric

RuVector Graph (HNSW):

Hierarchical Navigable Small World:
- Layer 0: Dense graph (M = 16-64 neighbors)
- Layer 1+: Sparse graphs (long-range connections)
- Navigable: Greedy search finds approximate NN
- Small-world: Low diameter, high clustering

2. The Fundamental Tension

2.1 Embedding Paradox

Goal 1: Preserve graph topology in latent space

If (u, v) ∈ E, then ||h_u - h_v|| should be small

Goal 2: Preserve latent similarity in graph

If ||h_u - h_v|| is small, then u and v should be related

Paradox: These are not equivalent!

  • Graph neighbors may be semantically different (e.g., bridge edges)
  • Latent neighbors may not be graph-connected (e.g., same cluster, different components)

2.2 Information Bottleneck

Graph G ──encode──> Latent h ──decode──> Graph G'
          (GNN)                  (predict edges/nodes)

Bottleneck: Fixed-dimensional h must compress all information from:

  • Node features
  • Local topology (ego-net)
  • Global structure (communities, paths)
  • Edge attributes
  • Dynamic patterns

Trade-off:

  • High dimensions: More expressive, but curse of dimensionality
  • Low dimensions: Efficient, but lossy compression

3. Manifold Hypothesis for Graphs

3.1 Low-Dimensional Manifold

Hypothesis: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space

True data distribution: P_data(h) supported on manifold M ⊂ R^d
where dim(M) << d

Implications:

  1. Intrinsic Dimensionality: Effective degrees of freedom much less than d
  2. Local Linearity: Small neighborhoods approximately Euclidean
  3. Global Curvature: Manifold may be curved (non-Euclidean)

Evidence in RuVector:

  • HNSW assumes low intrinsic dimension for efficient search
  • Multi-head attention learns multiple "views" of manifold
  • Layer normalization assumes local isotropy

3.2 Geometric Structure of Graph Embeddings

Question: What geometry best represents graphs?

Option 1: Euclidean (Current)

h ∈ R^d, distance = ||h_u - h_v||_2
  • ✓ Simple, well-understood
  • ✓ Efficient operations (dot products, linear maps)
  • ✗ Poor for tree-like structures
  • ✗ Polynomial volume growth limits capacity for tree-like, hierarchical data

Option 2: Hyperbolic (Poincaré Ball)

h ∈ B^d = {x ∈ R^d : ||x|| < 1}
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
  • ✓ Exponential capacity: Volume grows exponentially with radius
  • ✓ Natural hierarchies: Tree embeddings with low distortion
  • ✓ HNSW synergy: Hierarchical layers are naturally hyperbolic
  • ✗ More complex operations
  • ✗ Numerical instability near boundary

Option 3: Mixed Curvature (Product Manifolds)

h = (h_euclidean, h_hyperbolic, h_spherical)
Combine different geometries for different aspects
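
The hyperbolic option can be made concrete with a minimal sketch (not RuVector code) of the Poincaré-ball distance from Option 2; clamping the squared norms away from 1 is one common way to reduce the numerical instability near the boundary noted above.

// Minimal sketch (not RuVector code) of the Poincaré-ball distance from Option 2.
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let norm_sq = |v: &[f32]| v.iter().map(|&a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(&a, &b)| (a - b) * (a - b)).sum();

    // Clamp squared norms away from the boundary for numerical stability
    let nx = norm_sq(x).min(1.0 - 1e-6);
    let ny = norm_sq(y).min(1.0 - 1e-6);

    let arg = 1.0 + 2.0 * diff_sq / ((1.0 - nx) * (1.0 - ny));
    arg.acosh()
}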

3.3 Curvature and Graph Structure

Relationship:

Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
Zero curvature (Euclidean) ↔ Grid-like, regular
Positive curvature (spherical) ↔ Cyclic, clustered

HNSW Topology:

  • Layer 0: Locally grid-like (Euclidean)
  • Higher layers: Tree-like navigation (Hyperbolic)
  • Overall: Mixed curvature

Implication: Single geometry may be suboptimal; consider mixed-curvature embeddings


4. Encoding: Graph → Latent Space

4.1 Message Passing Framework (Current)

Goal: Aggregate neighborhood information into node embedding

From layer.rs:362-401 (RuvectorLayer.forward):

h_v^{(l+1)} = UPDATE(
    h_v^{(l)},
    AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
    TRANSFORM(h_v^{(l)})
)

Current Pipeline:

  1. Message: m_u = W_msg · h_u
  2. Attention Aggregate: a_v = MultiHeadAttention(h_v, {h_u})
  3. Weighted Aggregate: agg_v = Σ w_uv · m_u
  4. Combine: combined = a_v + agg_v
  5. Update: h'_v = GRU(W_agg · combined, h_v)
  6. Normalize: output = LayerNorm(Dropout(h'_v))

What This Encodes:

  • ✓ Local neighborhood structure (1-hop in one layer)
  • ✓ Neighbor feature aggregation
  • ✓ Temporal dynamics (via GRU)
  • ✗ Global structure (requires stacking layers)
  • ✗ Structural properties (degree, centrality, etc.)
  • ✗ Edge semantics (only weights, not features)

4.2 Multi-Hop Information Propagation

Challenge: K-layer GNN sees only K-hop neighborhood

Receptive field after L layers: d_graph(u, v) ≤ L

For HNSW:

  • Layer 0 average degree: ~50
  • Layer 1 average degree: ~10
  • Exponential reduction in higher layers

Trade-off:

  • Many layers: Large receptive field, but over-smoothing
  • Few layers: Localized, but miss global context

Over-Smoothing Problem:

As L → ∞, all node embeddings converge to the same value:
h_v^{(∞)} → E[h] for all v

Mitigation Strategies:

  1. Skip Connections: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
  2. Residual GRU: Implicit in h_t = (1-z_t)h_{t-1} + z_t h̃_t
  3. Jumping Knowledge: Concatenate all layer outputs
  4. Adaptive Depth: Learn when to stop propagating
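
As a rough illustration of strategies 1 and 3, the sketch below shows a skip connection and a jumping-knowledge concatenation around a placeholder gnn_layer function (hypothetical, standing in for a RuvectorLayer-style forward pass).

// Illustrative sketch of mitigation strategies 1 and 3; `gnn_layer` is a
// hypothetical placeholder for a RuvectorLayer forward pass.
fn gnn_layer(h: &[f32]) -> Vec<f32> {
    // Dummy transform for illustration only
    h.iter().map(|&x| 0.5 * x).collect()
}

// Skip connection: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
fn forward_with_skip(h: &[f32]) -> Vec<f32> {
    let update = gnn_layer(h);
    h.iter().zip(update.iter()).map(|(&a, &b)| a + b).collect()
}

// Jumping knowledge: concatenate the outputs of every layer
fn jumping_knowledge(h0: &[f32], num_layers: usize) -> Vec<f32> {
    let mut h = h0.to_vec();
    let mut all = h.clone();
    for _ in 0..num_layers {
        h = forward_with_skip(&h);
        all.extend_from_slice(&h);
    }
    all
}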

4.3 Structural Features Beyond Neighborhoods

Current limitation: Only neighbor features, not structural properties

Missing Encodings:

  1. Node Degree: deg(v) = |N(v)|
  2. Clustering Coefficient: C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2)
  3. Centrality: Betweenness, closeness, eigenvector
  4. Community Membership: Detected clusters
  5. HNSW Layer: Which layers the node appears in

Proposed Enhancement:

pub struct StructuralFeatures {
    degree: f32,
    clustering_coef: f32,
    hnsw_layers: Vec<usize>,  // Layers this node appears in
    centrality: f32,
}

impl StructuralFeatures {
    /// Flatten the structural features into a dense vector that can be
    /// concatenated onto the node embedding.
    fn to_vec(&self) -> Vec<f32> {
        let mut v = vec![self.degree, self.clustering_coef, self.centrality];
        v.extend(self.hnsw_layers.iter().map(|&l| l as f32));
        v
    }
}

impl RuvectorLayer {
    fn forward_with_structural(
        &self,
        node_embedding: &[f32],
        neighbor_embeddings: &[Vec<f32>],
        edge_weights: &[f32],
        structural_features: &StructuralFeatures,  // NEW
    ) -> Vec<f32> {
        // Concatenate structural features onto the node embedding
        let mut augmented = node_embedding.to_vec();
        augmented.extend(structural_features.to_vec());

        // Proceed with the standard forward pass on the augmented input
        // ...
    }
}

5. Decoding: Latent Space → Graph Predictions

5.1 Link Prediction

Goal: Predict edge existence from embeddings

Score(u, v) = f(h_u, h_v)
P((u,v) ∈ E) = σ(Score(u, v))

Scoring Functions:

1. Dot Product (Current in search.rs)

score = h_u.dot(h_v)
  • ✓ Fast O(d)
  • ✗ Not invariant to scaling

2. Cosine Similarity (Current in search.rs)

score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
  • ✓ Scale-invariant
  • ✓ Natural for normalized embeddings
  • ✗ Ignores magnitude information

3. Distance-Based

score = -||h_u - h_v||²
  • ✓ Metric structure
  • ✗ Negative, unbounded

4. Bilinear

score = h_u^T W h_v
  • ✓ Learnable asymmetry
  • ✗ O(d²) parameters

5. MLP (Most Expressive)

score = MLP([h_u || h_v || (h_u ⊙ h_v)])
  • ✓ Highly expressive
  • ✗ Expensive O(d²) or more

RuVector Current:

// From search.rs:4-18
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;

    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot_product / (norm_a * norm_b)
    }
}
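
For comparison, here are illustrative sketches of the dot-product, distance-based, and bilinear scorers listed above (not the search.rs API); the learned matrix w is a hypothetical parameter.

// Illustrative scoring functions; `w` is a hypothetical learned d x d matrix
// stored row-major.
fn dot_score(h_u: &[f32], h_v: &[f32]) -> f32 {
    h_u.iter().zip(h_v).map(|(&a, &b)| a * b).sum()
}

fn distance_score(h_u: &[f32], h_v: &[f32]) -> f32 {
    // Negative squared L2 distance
    -h_u.iter().zip(h_v).map(|(&a, &b)| (a - b) * (a - b)).sum::<f32>()
}

fn bilinear_score(h_u: &[f32], w: &[Vec<f32>], h_v: &[f32]) -> f32 {
    // h_u^T W h_v
    w.iter()
        .zip(h_u)
        .map(|(row, &u_i)| {
            u_i * row.iter().zip(h_v).map(|(&w_ij, &v_j)| w_ij * v_j).sum::<f32>()
        })
        .sum()
}

fn edge_probability(score: f32) -> f32 {
    // P((u, v) ∈ E) = σ(score)
    1.0 / (1.0 + (-score).exp())
}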

5.2 Node Classification

Goal: Predict node labels from embeddings

From graph_neural.rs:82-98:
classify_node(h_v) → class probabilities

Typical Approach:

logits = W_class · h_v + b
probs = softmax(logits)

Challenge: Embedding must encode label-relevant information
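
A minimal sketch of the typical approach above (linear head plus numerically stable softmax); linear_classification_head and its parameters are hypothetical names, not the classify_node API from graph_neural.rs.

// Minimal sketch: `w_class` is a hypothetical num_classes x d matrix (row-major)
// and `b` a per-class bias.
fn linear_classification_head(h_v: &[f32], w_class: &[Vec<f32>], b: &[f32]) -> Vec<f32> {
    // logits = W_class · h_v + b
    let logits: Vec<f32> = w_class
        .iter()
        .zip(b)
        .map(|(row, &bias)| {
            row.iter().zip(h_v).map(|(&w, &h)| w * h).sum::<f32>() + bias
        })
        .collect();

    // probs = softmax(logits), shifted by the max logit for numerical stability
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}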

5.3 Graph Reconstruction

Goal: Reconstruct adjacency matrix from embeddings

Autoencoder Framework:

Encoder: A → H (GNN)
Decoder: H → A' (pairwise scoring)

Loss: ||A - A'||² or Cross-Entropy

Reconstruction Loss:

// Proposed sketch: binary cross-entropy over true edges plus random negative
// samples. Cosine similarity in [-1, 1] is mapped to a probability in [0, 1]
// first so the log terms stay well-defined.
pub fn graph_reconstruction_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],  // True edges
) -> f32 {
    let to_prob = |score: f32| (score + 1.0) / 2.0;  // [-1, 1] -> [0, 1]
    let mut loss = 0.0;
    let n = embeddings.len();

    // Positive edges (should have high score)
    for &(i, j) in adjacency {
        let p = to_prob(cosine_similarity(&embeddings[i], &embeddings[j]));
        loss -= (p + 1e-10).ln();  // -log(p)
    }

    // Negative sampling (non-edges should have low score)
    for _ in 0..adjacency.len() {
        let i = rand::random::<usize>() % n;
        let j = rand::random::<usize>() % n;
        if i != j && !adjacency.contains(&(i, j)) && !adjacency.contains(&(j, i)) {
            let p = to_prob(cosine_similarity(&embeddings[i], &embeddings[j]));
            loss -= (1.0 - p + 1e-10).ln();  // -log(1 - p)
        }
    }

    loss / (2 * adjacency.len()) as f32
}

6. Information-Theoretic Perspective

6.1 Mutual Information

Goal: Maximize mutual information between graph structure G and embeddings H

max I(G; H) = H(G) - H(G|H)
            = H(H) - H(H|G)

Interpretation:

  • I(G; H) measures how much knowing H tells us about G
  • Perfect encoding: I(G; H) = H(G) (H captures all graph info)
  • Independence: I(G; H) = 0 (H tells nothing about G)

Challenges:

  1. Intractability: Computing I(G; H) is hard
  2. Continuous H: Differential entropy unbounded
  3. Discrete G: Entropy depends on graph size

6.2 Deep Graph Infomax (DGI)

Idea: Maximize MI between node embeddings and graph summary

DGI Loss:
  max I(h_v; h_G)

where:
  h_v: node embedding
  h_G: graph-level summary (e.g., mean pooling)

Implementation:

// Sketch: `discriminator` (e.g. a bilinear scorer or a small MLP) and `sigmoid`
// are assumed helpers, not existing RuVector functions.
pub fn deep_graph_infomax_loss(
    node_embeddings: &[Vec<f32>],
    graph_summary: &[f32],  // Readout function output
    negative_samples: &[Vec<f32>],  // Corrupted embeddings
) -> f32 {
    let mut loss = 0.0;

    // Positive samples: real (node, graph) pairs
    for h_v in node_embeddings {
        let score = discriminator(h_v, graph_summary);  // MLP or bilinear
        loss -= (sigmoid(score) + 1e-10).ln();
    }

    // Negative samples: (corrupted node, graph) pairs
    for h_neg in negative_samples {
        let score = discriminator(h_neg, graph_summary);
        loss -= (1.0 - sigmoid(score) + 1e-10).ln();
    }

    loss / (node_embeddings.len() + negative_samples.len()) as f32
}

Readout Functions (graph summary):

  1. Mean: h_G = (1/n) Σ_v h_v
  2. Max: h_G = max_v h_v (element-wise)
  3. Attention: h_G = Σ_v α_v h_v where α_v = softmax(MLP(h_v))
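
Minimal sketches of the mean and max readouts listed above (both assume a non-empty node set); an attention readout would additionally require a learned scoring MLP.

// Readout sketches: collapse node embeddings into a single graph summary vector.
fn mean_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let n = node_embeddings.len() as f32;
    let d = node_embeddings[0].len();
    let mut summary = vec![0.0_f32; d];
    for h_v in node_embeddings {
        for (s, &x) in summary.iter_mut().zip(h_v) {
            *s += x / n;
        }
    }
    summary
}

fn max_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let d = node_embeddings[0].len();
    let mut summary = vec![f32::NEG_INFINITY; d];
    for h_v in node_embeddings {
        for (s, &x) in summary.iter_mut().zip(h_v) {
            *s = s.max(x);
        }
    }
    summary
}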

6.3 Information Bottleneck Principle

Principle: Find minimal sufficient representation

min I(X; H) - β I(H; Y)

where:
  X: input features
  H: learned embeddings
  Y: prediction target
  β: trade-off parameter

Graph Context:

  • X: Node features + neighborhood structure
  • H: Node embeddings
  • Y: Downstream task (link, classification)

Goal: Compress X into H, retaining only task-relevant information

Implementation Strategy:

  1. Variational Bound: Use VAE-style reparameterization
  2. Lagrange Multiplier: β controls compression vs. performance
  3. Regularization: Encourage low mutual information I(X; H)
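
A minimal sketch of strategies 1-3 combined in the common variational form: a task loss standing in for the relevance term, plus a β-weighted Gaussian KL term that upper-bounds I(X; H) under a standard-normal prior. All names here are illustrative.

// Variational information-bottleneck style objective (sketch). `mu` and `logvar`
// parameterize a diagonal Gaussian posterior q(h | x); `task_loss` is any
// downstream loss (link prediction, classification, ...).
fn information_bottleneck_loss(task_loss: f32, mu: &[f32], logvar: &[f32], beta: f32) -> f32 {
    // KL( N(mu, diag(exp(logvar))) || N(0, I) )
    let kl: f32 = mu
        .iter()
        .zip(logvar)
        .map(|(&m, &lv)| -0.5 * (1.0 + lv - m * m - lv.exp()))
        .sum();
    task_loss + beta * kl
}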

7. Contrastive Learning for Graph-Latent Alignment

7.1 Contrastive Objectives

Core Idea: Pull together related nodes in latent space, push apart unrelated nodes

InfoNCE Loss (Current in training.rs:362-411):

pub fn info_nce_loss(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32

Mathematical Form:

L_InfoNCE = -log(exp(sim(h_v, h_+) / τ) / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)))

What This Optimizes:

  • Positive pairs (h_v, h_+): Graph neighbors, semantically similar
  • Negative pairs (h_v, h_-): Non-neighbors, dissimilar
  • Temperature τ: Controls hardness of negatives
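
For reference, a self-contained sketch consistent with the signature and formula above, using cosine_similarity from search.rs (Section 5.1) as sim(·, ·); the actual implementation in training.rs:362-411 may differ in details.

// InfoNCE sketch: average -log( e^{s_+/τ} / (e^{s_+/τ} + Σ e^{s_-/τ}) ) over positives.
fn info_nce_loss_sketch(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32 {
    // Denominator contribution from all negatives
    let neg_sum: f32 = negatives
        .iter()
        .map(|n| (cosine_similarity(anchor, n) / temperature).exp())
        .sum();

    let mut loss = 0.0;
    for p in positives {
        let pos = (cosine_similarity(anchor, p) / temperature).exp();
        loss -= (pos / (pos + neg_sum + 1e-10)).ln();
    }
    loss / positives.len().max(1) as f32
}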

7.2 Local Contrastive Loss (Graph-Specific)

Current Implementation (training.rs:444-462):

pub fn local_contrastive_loss(
    node_embedding: &[f32],
    neighbor_embeddings: &[Vec<f32>],
    non_neighbor_embeddings: &[Vec<f32>],
    temperature: f32,
) -> f32

Graph-Aware Sampling:

  • Positives: Direct graph neighbors N(v)
  • Negatives: Non-neighbors (random or hard negatives)

Variants:

1. K-Hop Positives

Positives = {u : d_graph(v, u) ≤ K}
Encourages multi-hop proximity in latent space

2. Community-Based

Positives = {u : community(v) = community(u)}
Negatives = {u : community(v) ≠ community(u)}
Encourages cluster separation

3. HNSW Layer-Based

Positives = {u : (u,v) ∈ E_layer_k}
Different contrastive losses per HNSW layer
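
As an illustration of variant 1, the sketch below collects K-hop positives by breadth-first search over a hypothetical adjacency-list layout where adjacency[v] holds the direct neighbors of v.

use std::collections::VecDeque;

// Collect all nodes within K hops of v (excluding v itself) as positives.
fn k_hop_positives(adjacency: &[Vec<usize>], v: usize, k: usize) -> Vec<usize> {
    let mut dist = vec![usize::MAX; adjacency.len()];
    let mut queue = VecDeque::new();
    dist[v] = 0;
    queue.push_back(v);

    let mut positives = Vec::new();
    while let Some(u) = queue.pop_front() {
        if dist[u] == k {
            continue; // do not expand beyond K hops
        }
        for &w in &adjacency[u] {
            if dist[w] == usize::MAX {
                dist[w] = dist[u] + 1;
                positives.push(w);
                queue.push_back(w);
            }
        }
    }
    positives
}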

7.3 Hard Negative Mining

Problem: Random negatives are often too easy

Solution: Sample hard negatives (latent-close but graph-far)

fn sample_hard_negatives(
    node: &[f32],
    all_embeddings: &[Vec<f32>],
    true_neighbors: &[usize],
    k: usize,
) -> Vec<Vec<f32>> {
    // 1. Compute similarities to all nodes
    let mut similarities: Vec<(usize, f32)> = all_embeddings
        .iter()
        .enumerate()
        .filter(|(i, _)| !true_neighbors.contains(i))  // Exclude true neighbors
        .map(|(i, emb)| (i, cosine_similarity(node, emb)))
        .collect();

    // 2. Sort by similarity (descending)
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // 3. Take top-k (most similar non-neighbors = hard negatives)
    similarities.iter()
        .take(k)
        .map(|(i, _)| all_embeddings[*i].clone())
        .collect()
}

Benefits:

  • Focuses learning on difficult cases
  • Improves discrimination boundaries
  • Speeds up convergence

8. Spectral Methods and Graph Signals

8.1 Graph Laplacian

Normalized Laplacian:

L = I - D^(-1/2) A D^(-1/2)

where:
  A: adjacency matrix
  D: degree matrix (diagonal)

Eigendecomposition:

L = U Λ U^T

where:
  U: eigenvectors (graph Fourier basis)
  Λ: eigenvalues (frequencies)

Interpretation:

  • Small eigenvalues ↔ Low-frequency (smooth signals)
  • Large eigenvalues ↔ High-frequency (oscillatory signals)
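
A minimal sketch of building the normalized Laplacian as a dense matrix from an undirected edge list (illustrative only; a real implementation would use sparse storage, and this sketch assumes every node has degree > 0).

// L = I - D^{-1/2} A D^{-1/2}, built densely from an undirected edge list.
fn normalized_laplacian(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<f32>> {
    let mut adj = vec![vec![0.0_f32; n]; n];
    let mut degree = vec![0.0_f32; n];
    for &(i, j) in edges {
        adj[i][j] = 1.0;
        adj[j][i] = 1.0;
        degree[i] += 1.0;
        degree[j] += 1.0;
    }

    let mut lap = vec![vec![0.0_f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            let norm = (degree[i] * degree[j]).sqrt();
            let normalized_a = if norm > 0.0 { adj[i][j] / norm } else { 0.0 };
            lap[i][j] = if i == j { 1.0 - normalized_a } else { -normalized_a };
        }
    }
    lap
}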

8.2 Spectral GNN Connection

Classical GCN (Kipf & Welling):

H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})

where à = A + I (self-loops)

Spectral Interpretation:

  • Aggregation = Low-pass filter
  • Smooths node features along graph structure
  • Eigenvalues control extent of smoothing

RuVector's Approach (Spatial, not spectral):

  • Message passing is spatial formulation
  • Attention adds adaptive filtering
  • GRU adds temporal component

Missing Spectral Component:

  • No explicit frequency analysis
  • Could add spectral loss to preserve frequency content

8.3 Spectral Loss Functions

Goal: Preserve graph spectral properties in embeddings

Laplacian Eigenmaps:

min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²
subject to H^T H = I

Equivalent to minimizing: Tr(H^T L H)

Implementation:

pub fn spectral_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],
    degrees: &[f32],
) -> f32 {
    let mut loss = 0.0;

    // Laplacian regularization: degree-weighted ||h_i - h_j||² for each edge
    for &(i, j) in adjacency {
        let norm_sq: f32 = embeddings[i]
            .iter()
            .zip(&embeddings[j])
            .map(|(&a, &b)| (a - b) * (a - b))
            .sum();

        // Symmetric normalization, matching L = I - D^{-1/2} A D^{-1/2}
        let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
        loss += weight * norm_sq;
    }

    loss
}

Benefits:

  • Smooth embeddings along graph structure
  • Preserves community structure
  • Theoretical guarantees (Laplacian eigenmaps)

9. Disentangled Representations

9.1 Motivation

Problem: Current embeddings are entangled (single vector encodes everything)

Goal: Separate embedding into interpretable factors

h_v = [h_structural || h_semantic || h_temporal]

where:
  h_structural: Topology (degree, centrality, etc.)
  h_semantic: Feature content
  h_temporal: Dynamics (for evolving graphs)

9.2 β-VAE for Graphs

Variational Autoencoder with Disentanglement:

Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
Decoder: z_v → p(X_v, N(v) | z_v)

Loss: L_VAE = -E[log p(X_v | z_v)] + β KL(q(z_v | X_v, N(v)) || p(z_v))

β > 1: Encourages disentanglement (independence of latent factors)

Implementation Sketch:

use rand::Rng;
use rand_distr::StandardNormal;

// `Linear` is assumed to be a simple dense layer with a forward(&[f32]) -> Vec<f32> method.
pub struct GraphVAE {
    encoder: RuvectorLayer,
    mu_layer: Linear,       // h -> mu
    logvar_layer: Linear,   // h -> log(sigma^2)
    decoder: Linear,
}

impl GraphVAE {
    fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
        let h = self.encoder.forward(node_features, neighbors, &[]);
        let mu = self.mu_layer.forward(&h);
        let logvar = self.logvar_layer.forward(&h);
        (mu, logvar)
    }

    fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
        let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
        let eps: Vec<f32> = (0..mu.len()).map(|_| rand::thread_rng().sample(StandardNormal)).collect();

        mu.iter().zip(std.iter()).zip(eps.iter())
            .map(|((&m, &s), &e)| m + s * e)
            .collect()
    }

    fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
        let (mu, logvar) = self.encode(node_features, neighbors);
        let z = self.reparameterize(&mu, &logvar);

        // Reconstruct node features
        let recon = self.decoder.forward(&z);

        // KL divergence
        let kl: f32 = mu.iter().zip(logvar.iter())
            .map(|(&m, &lv)| -0.5 * (1.0 + lv - m*m - lv.exp()))
            .sum();

        (recon, kl)
    }
}

9.3 Disentanglement Metrics

1. Mutual Information Gap (MIG)

MIG(z, y) = (1/K) Σ_k (I(z; y_k)_largest - I(z; y_k)_2nd_largest) / H(y_k)

Measures how uniquely each latent factor captures each ground-truth factor

2. SAP (Separated Attribute Predictability)

Train linear classifiers z → y for each attribute
Measure how well z predicts individual factors

Application to Graphs:

  • Ground-truth factors: Degree, clustering, centrality, community
  • Learned latent: h_v
  • Metric: MIG or SAP between h_v components and structural properties

10. Hierarchical Representations (HNSW-Specific)

10.1 Multi-Scale Embeddings

Idea: Different embeddings for different HNSW layers

Node v appears in layers {0, 2, 3}:
  h_v^{(0)}: Dense, local structure
  h_v^{(2)}: Coarse, medium-range
  h_v^{(3)}: Global, long-range hubs

Hierarchical Encoding:

use std::collections::HashMap;

pub struct HierarchicalEmbedding {
    // One embedding per HNSW layer the node appears in
    embeddings_by_layer: HashMap<usize, Vec<f32>>,
}

impl HierarchicalEmbedding {
    fn get_embedding(&self, layer: usize) -> &Vec<f32> {
        self.embeddings_by_layer.get(&layer)
            .expect("Node not in this layer")
    }

    // Interpolate between layers for search
    fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
        let layer_low = target_layer.floor() as usize;
        let layer_high = target_layer.ceil() as usize;

        if layer_low == layer_high {
            return self.get_embedding(layer_low).clone();
        }

        let alpha = target_layer - layer_low as f32;
        let emb_low = self.get_embedding(layer_low);
        let emb_high = self.get_embedding(layer_high);

        // Linear interpolation
        emb_low.iter().zip(emb_high.iter())
            .map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
            .collect()
    }
}

Hierarchical Loss:

fn hierarchical_contrastive_loss(
    node_hierarchical_emb: &HierarchicalEmbedding,
    neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
) -> f32 {
    let mut loss = 0.0;

    // Contrastive loss at each layer
    for (layer, layer_neighbors) in neighbors_by_layer {
        let h_v = node_hierarchical_emb.get_embedding(*layer);
        let positives: Vec<&[f32]> = layer_neighbors
            .iter()
            .map(|n| n.get_embedding(*layer).as_slice())
            .collect();

        // Sample negatives from other layers (assumed helper returning Vec<&[f32]>)
        let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);

        loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
    }

    loss / neighbors_by_layer.len() as f32
}

10.2 Coarse-to-Fine Alignment

Goal: Ensure consistency across HNSW layers

Alignment Loss:
  L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²

where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)

Benefits:

  • Global structure (high layers) guides local (low layers)
  • Enables layer-skipping (jump from layer 3 to layer 0 embedding)
  • Multi-resolution representation
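
A minimal sketch of the per-node alignment term above, assuming a hypothetical learned projection matrix proj (d_low × d_high, row-major).

// Project a higher-layer embedding into the lower layer's space, then measure
// the squared deviation: ||h_v^{(l)} - Project(h_v^{(l')})||².
fn project(h_high: &[f32], proj: &[Vec<f32>]) -> Vec<f32> {
    proj.iter()
        .map(|row| row.iter().zip(h_high).map(|(&w, &h)| w * h).sum())
        .collect()
}

fn alignment_loss(h_low: &[f32], h_high: &[f32], proj: &[Vec<f32>]) -> f32 {
    let projected = project(h_high, proj);
    h_low
        .iter()
        .zip(projected.iter())
        .map(|(&a, &b)| (a - b) * (a - b))
        .sum()
}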

11. Practical Strategies for RuVector

11.1 Short-Term Enhancements

1. Structural Feature Augmentation

// Add degree, clustering, HNSW layer info to embeddings
let augmented_embedding: Vec<f32> = [
    &node_embedding[..],
    &[degree as f32][..],
    &[clustering_coef][..],
    &one_hot_layer[..],
].concat();

2. Spectral Regularization

// Add spectral loss to training
let total_loss = contrastive_loss + lambda_spectral * spectral_loss;

3. Hard Negative Sampling

// Replace random negatives with hard negatives in local_contrastive_loss
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);

11.2 Medium-Term Research

4. Hierarchical Embeddings per HNSW Layer

pub struct HNSWHierarchicalGNN {
    gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
}

5. Hyperbolic Embeddings for Higher Layers

// Layer 0: Euclidean (local, grid-like)
// Layer 1+: Hyperbolic (hierarchical navigation)
pub enum GeometricEmbedding {
    Euclidean(Vec<f32>),
    Hyperbolic(Vec<f32>),  // Poincaré ball
}

6. Disentangled VAE

// Separate structural vs. semantic information
pub struct DisentangledGraphVAE {
    structural_encoder: RuvectorLayer,
    semantic_encoder: RuvectorLayer,
    decoder: Linear,
}

11.3 Long-Term Exploration

7. Information Bottleneck Optimization

  • Minimize I(X; H) while maximizing I(H; Y)
  • Variational bounds for tractability
  • Beta-annealing schedule

8. Graph Transformers

  • Replace message passing with full attention
  • Positional encodings (Laplacian eigenvectors, RoPE)
  • Layer-wise multi-scale attention

9. Neural ODEs for Continuous Depth

dh/dt = GNN(h(t), G)
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
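
A minimal sketch of a fixed-step Euler discretization of this ODE; gnn is a placeholder for any graph-conditioned vector field.

// h <- h + dt * GNN(h, G), repeated for `steps` steps up to time t_end.
fn neural_ode_euler(
    h0: &[f32],
    gnn: impl Fn(&[f32]) -> Vec<f32>,
    t_end: f32,
    steps: usize,
) -> Vec<f32> {
    let dt = t_end / steps as f32;
    let mut h = h0.to_vec();
    for _ in 0..steps {
        let dh = gnn(&h);
        for (hi, di) in h.iter_mut().zip(dh.iter()) {
            *hi += dt * di;
        }
    }
    h
}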

12. Evaluation Metrics for Latent-Graph Alignment

12.1 Reconstruction Metrics

1. Link Prediction AUC

Measure how well latent similarity predicts edges
AUC-ROC on link prediction task

2. Graph Reconstruction Error

||A - σ(H H^T)||²_F
where A is adjacency, H is embeddings

12.2 Structural Preservation

3. Rank Correlation

Spearman ρ between:
  - Graph distance d_G(u, v)
  - Latent distance d_L(h_u, h_v)

4. Distortion

max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
Worst-case embedding distortion

5. Average Distortion

(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
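
A minimal sketch of the average-distortion metric, assuming precomputed pairwise graph distances and L2 latent distances over the same node ordering.

// Mean absolute difference between latent (L2) and graph distances over all pairs.
fn average_distortion(graph_dist: &[Vec<f32>], embeddings: &[Vec<f32>]) -> f32 {
    let n = embeddings.len();
    let mut total = 0.0;
    for u in 0..n {
        for v in 0..n {
            let d_latent: f32 = embeddings[u]
                .iter()
                .zip(&embeddings[v])
                .map(|(&a, &b)| (a - b) * (a - b))
                .sum::<f32>()
                .sqrt();
            total += (d_latent - graph_dist[u][v]).abs();
        }
    }
    total / (n * n) as f32
}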

12.3 Downstream Task Performance

6. Node Classification Accuracy

Train classifier on embeddings, test accuracy

7. Clustering Modularity

K-means on embeddings, measure graph modularity

8. HNSW Search Quality

Recall@K using learned embeddings vs. original features

References

Papers

Manifold Learning:

  1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
  2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction

Hyperbolic Embeddings:

  3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
  4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks

Information Theory:

  5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
  6. Velickovic et al. (2019) - Deep Graph Infomax

Contrastive Learning:

  7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
  8. You et al. (2020) - Graph Contrastive Learning with Augmentations

Disentanglement:

  9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
  10. Ma et al. (2019) - Disentangled Graph Convolutional Networks

RuVector Code

  • crates/ruvector-gnn/src/layer.rs - GNN encoding
  • crates/ruvector-gnn/src/search.rs - Latent similarity (decoding)
  • crates/ruvector-gnn/src/training.rs - Contrastive losses (alignment)

Document Version: 1.0
Last Updated: 2025-11-30
Author: RuVector Research Team