Latent Space ↔ Graph Reality Interplay
Executive Summary
This document explores the fundamental relationship between high-dimensional latent space (where embeddings live) and graph reality (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.
Central Question: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?
1. The Two Worlds
1.1 Latent Space Characteristics
Definition: Continuous, high-dimensional vector space where node embeddings reside
Latent Space L: R^d where d ∈ [64, 1024+]
Node embedding: h_v ∈ L
Distance metric: d(h_u, h_v) (cosine, L2, etc.)
Properties:
- ✓ Continuous: Smooth interpolation between points
- ✓ Dense: Every point is surrounded by infinitely many neighbors
- ✓ High-Dimensional: Curse of dimensionality, but expressive
- ✓ Metric: Equipped with distance/similarity function
- ✗ Isotropic: May not preserve hierarchical structure
- ✗ Euclidean Bias: Most operations assume flat geometry
Current RuVector Latent Space:
// From layer.rs:337-350
RuvectorLayer {
    input_dim: usize,  // Typically 64-256
    hidden_dim: usize, // Typically 128-512
    // Embeddings live in R^{hidden_dim}
}
1.2 Graph Reality Characteristics
Definition: Discrete topological structure G = (V, E)
Graph G:
- Vertices: V = {v_1, ..., v_n}
- Edges: E ⊆ V × V
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
- Topology: Small-world, scale-free, hierarchical, etc.
Properties:
- ✓ Discrete: Finite nodes and edges
- ✓ Sparse: |E| << |V|² (typically)
- ✓ Structured: Communities, hierarchies, motifs
- ✓ Relational: Explicit connections
- ✗ Non-Metric: Shortest path not always meaningful
- ✗ Heterogeneous: Variable degree, asymmetric
RuVector Graph (HNSW):
Hierarchical Navigable Small World:
- Layer 0: Dense graph (M = 16-64 neighbors)
- Layer 1+: Sparse graphs (long-range connections)
- Navigable: Greedy search finds approximate NN
- Small-world: Low diameter, high clustering
2. The Fundamental Tension
2.1 Embedding Paradox
Goal 1: Preserve graph topology in latent space
If (u, v) ∈ E, then ||h_u - h_v|| should be small
Goal 2: Preserve latent similarity in graph
If ||h_u - h_v|| is small, then u and v should be related
Paradox: These are not equivalent!
- Graph neighbors may be semantically different (e.g., bridge edges)
- Latent neighbors may not be graph-connected (e.g., same cluster, different components)
2.2 Information Bottleneck
Graph G ──encode──> Latent h ──decode──> Graph G'
(GNN) (predict edges/nodes)
Bottleneck: Fixed-dimensional h must compress all information from:
- Node features
- Local topology (ego-net)
- Global structure (communities, paths)
- Edge attributes
- Dynamic patterns
Trade-off:
- High dimensions: More expressive, but curse of dimensionality
- Low dimensions: Efficient, but lossy compression
3. Manifold Hypothesis for Graphs
3.1 Low-Dimensional Manifold
Hypothesis: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space
True data distribution: P_data(h) supported on manifold M ⊂ R^d
where dim(M) << d
Implications:
- Intrinsic Dimensionality: Effective degrees of freedom much less than d
- Local Linearity: Small neighborhoods approximately Euclidean
- Global Curvature: Manifold may be curved (non-Euclidean)
Evidence in RuVector:
- HNSW assumes low intrinsic dimension for efficient search
- Multi-head attention learns multiple "views" of manifold
- Layer normalization assumes local isotropy
3.2 Geometric Structure of Graph Embeddings
Question: What geometry best represents graphs?
Option 1: Euclidean (Current)
h ∈ R^d, distance = ||h_u - h_v||_2
- ✓ Simple, well-understood
- ✓ Efficient operations (dot products, linear maps)
- ✗ Poor for tree-like structures
- ✗ Exponential capacity limited
Option 2: Hyperbolic (Poincaré Ball)
h ∈ B^d = {x ∈ R^d : ||x|| < 1}
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
- ✓ Exponential capacity: Volume grows exponentially with radius
- ✓ Natural hierarchies: Tree embeddings with low distortion
- ✓ HNSW synergy: Hierarchical layers naturally hyperbolic
- ✗ More complex operations
- ✗ Numerical instability near boundary
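To make Option 2 concrete, here is a minimal sketch of the Poincaré-ball distance (not part of the current codebase); it assumes both embeddings already lie strictly inside the unit ball and clamps the arcosh argument to guard against boundary instability:
// Sketch: Poincaré-ball distance for Option 2 (assumes ||x||, ||y|| < 1)
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let norm_sq = |v: &[f32]| v.iter().map(|&a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(&a, &b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - norm_sq(x)) * (1.0 - norm_sq(y));
    // Clamp to stay in arcosh's domain despite floating-point error near the boundary
    let arg = (1.0 + 2.0 * diff_sq / denom.max(1e-7)).max(1.0);
    // arcosh(z) = ln(z + sqrt(z² - 1))
    (arg + (arg * arg - 1.0).sqrt()).ln()
}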
Option 3: Mixed Curvature (Product Manifolds)
h = (h_euclidean, h_hyperbolic, h_spherical)
Combine different geometries for different aspects
3.3 Curvature and Graph Structure
Relationship:
Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
Zero curvature (Euclidean) ↔ Grid-like, regular
Positive curvature (spherical) ↔ Cyclic, clustered
HNSW Topology:
- Layer 0: Locally grid-like (Euclidean)
- Higher layers: Tree-like navigation (Hyperbolic)
- Overall: Mixed curvature
Implication: Single geometry may be suboptimal; consider mixed-curvature embeddings
4. Encoding: Graph → Latent Space
4.1 Message Passing Framework (Current)
Goal: Aggregate neighborhood information into node embedding
From layer.rs:362-401 (RuvectorLayer.forward):
h_v^{(l+1)} = UPDATE(
h_v^{(l)},
AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
TRANSFORM(h_v^{(l)})
)
Current Pipeline:
1. Message: m_u = W_msg · h_u
2. Attention Aggregate: a_v = MultiHeadAttention(h_v, {h_u})
3. Weighted Aggregate: agg_v = Σ w_uv · m_u
4. Combine: combined = a_v + agg_v
5. Update: h'_v = GRU(W_agg · combined, h_v)
6. Normalize: output = LayerNorm(Dropout(h'_v))
What This Encodes:
- ✓ Local neighborhood structure (1-hop in one layer)
- ✓ Neighbor feature aggregation
- ✓ Temporal dynamics (via GRU)
- ✗ Global structure (requires stacking layers)
- ✗ Structural properties (degree, centrality, etc.)
- ✗ Edge semantics (only weights, not features)
4.2 Multi-Hop Information Propagation
Challenge: K-layer GNN sees only K-hop neighborhood
Receptive field after L layers: d_graph(u, v) ≤ L
For HNSW:
- Layer 0 average degree: ~50
- Layer 1 average degree: ~10
- Exponential reduction in higher layers
Trade-off:
- Many layers: Large receptive field, but over-smoothing
- Few layers: Localized, but miss global context
Over-Smoothing Problem:
As L → ∞, all node embeddings converge to the same value:
h_v^{(∞)} → E[h] for all v
Mitigation Strategies:
- Skip Connections: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)}) (see the sketch after this list)
- Residual GRU: implicit in h_t = (1 - z_t) h_{t-1} + z_t h̃_t
- Jumping Knowledge: concatenate all layer outputs
- Adaptive Depth: learn when to stop propagating
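A minimal sketch of the skip-connection idea, with `layer_forward` standing in for a call to RuvectorLayer::forward (hypothetical wiring, not the current API):
// Sketch: residual (skip) connection around one message-passing step
// h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
fn residual_step(h: &[f32], layer_forward: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    let update = layer_forward(h);
    h.iter().zip(update.iter()).map(|(&a, &b)| a + b).collect()
}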
4.3 Structural Features Beyond Neighborhoods
Current limitation: Only neighbor features, not structural properties
Missing Encodings:
- Node Degree:
deg(v) = |N(v)| - Clustering Coefficient:
C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2) - Centrality: Betweenness, closeness, eigenvector
- Community Membership: Detected clusters
- HNSW Layer: Which layers the node appears in
Proposed Enhancement:
pub struct StructuralFeatures {
    degree: f32,
    clustering_coef: f32,
    hnsw_layers: Vec<usize>, // Layers this node appears in
    centrality: f32,
}

impl StructuralFeatures {
    /// Flatten into a feature vector for concatenation with the embedding.
    fn to_vec(&self) -> Vec<f32> {
        let mut v = vec![self.degree, self.clustering_coef, self.centrality];
        v.extend(self.hnsw_layers.iter().map(|&l| l as f32));
        v
    }
}

impl RuvectorLayer {
    fn forward_with_structural(
        &self,
        node_embedding: &[f32],
        neighbor_embeddings: &[Vec<f32>],
        edge_weights: &[f32],
        structural_features: &StructuralFeatures, // NEW
    ) -> Vec<f32> {
        // Concatenate structural features to the embedding
        let structural = structural_features.to_vec();
        let augmented = [node_embedding, structural.as_slice()].concat();
        // Proceed with the standard forward pass on `augmented`
        // ...
    }
}
5. Decoding: Latent Space → Graph Predictions
5.1 Link Prediction
Goal: Predict edge existence from embeddings
Score(u, v) = f(h_u, h_v)
P((u,v) ∈ E) = σ(Score(u, v))
Scoring Functions:
1. Dot Product (Current in search.rs)
score = h_u.dot(h_v)
- ✓ Fast O(d)
- ✗ Not invariant to scaling
2. Cosine Similarity (Current in search.rs)
score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
- ✓ Scale-invariant
- ✓ Natural for normalized embeddings
- ✗ Ignores magnitude information
3. Distance-Based
score = -||h_u - h_v||²
- ✓ Metric structure
- ✗ Negative, unbounded
4. Bilinear
score = h_u^T W h_v
- ✓ Learnable asymmetry
- ✗ O(d²) parameters
5. MLP (Most Expressive)
score = MLP([h_u || h_v || (h_u ⊙ h_v)])
- ✓ Highly expressive
- ✗ Expensive O(d²) or more
RuVector Current:
// From search.rs:4-18
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot_product / (norm_a * norm_b)
    }
}
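For comparison, a minimal sketch of the learnable bilinear scorer (option 4 above); the weight matrix W is a hypothetical parameter stored row-major, not something present in search.rs:
// Sketch: bilinear link scorer score = h_u^T W h_v (W stored row-major, d × d)
fn bilinear_score(h_u: &[f32], h_v: &[f32], w: &[f32]) -> f32 {
    let d = h_u.len();
    assert_eq!(w.len(), d * d);
    (0..d)
        .map(|i| {
            // (W h_v)_i
            let wh: f32 = (0..d).map(|j| w[i * d + j] * h_v[j]).sum();
            h_u[i] * wh
        })
        .sum()
}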
5.2 Node Classification
Goal: Predict node labels from embeddings
From graph_neural.rs:82-98:
classify_node(h_v) → class probabilities
Typical Approach:
logits = W_class · h_v + b
probs = softmax(logits)
Challenge: Embedding must encode label-relevant information
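A minimal sketch of this classification head (the actual graph_neural.rs signature may differ); W_class is assumed to be stored row-major with shape C × d:
// Sketch: logits = W_class · h_v + b, followed by a numerically stable softmax
fn classify_node_sketch(h_v: &[f32], w_class: &[f32], b: &[f32]) -> Vec<f32> {
    let (c, d) = (b.len(), h_v.len());
    let logits: Vec<f32> = (0..c)
        .map(|k| (0..d).map(|j| w_class[k * d + j] * h_v[j]).sum::<f32>() + b[k])
        .collect();
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}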
5.3 Graph Reconstruction
Goal: Reconstruct adjacency matrix from embeddings
Autoencoder Framework:
Encoder: A → H (GNN)
Decoder: H → A' (pairwise scoring)
Loss: ||A - A'||² or Cross-Entropy
Reconstruction Loss:
// Proposed
pub fn graph_reconstruction_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)], // True edges
) -> f32 {
    let mut loss = 0.0;
    let n = embeddings.len();
    // Cosine similarity lies in [-1, 1]; map it to (0, 1) before taking the log
    let to_prob = |score: f32| 0.5 * (score + 1.0);
    // Positive edges (should have high score)
    for &(i, j) in adjacency {
        let score = cosine_similarity(&embeddings[i], &embeddings[j]);
        loss -= (to_prob(score) + 1e-10).ln(); // -log(p)
    }
    // Negative sampling (non-edges should have low score)
    for _ in 0..adjacency.len() {
        let i = rand::random::<usize>() % n;
        let j = rand::random::<usize>() % n;
        if i != j && !adjacency.contains(&(i, j)) {
            let score = cosine_similarity(&embeddings[i], &embeddings[j]);
            loss -= (1.0 - to_prob(score) + 1e-10).ln(); // -log(1 - p)
        }
    }
    loss / (2 * adjacency.len()) as f32
}
6. Information-Theoretic Perspective
6.1 Mutual Information
Goal: Maximize mutual information between graph structure G and embeddings H
max I(G; H) = H(G) - H(G|H)
= H(H) - H(H|G)
Interpretation:
- I(G; H) measures how much knowing H tells us about G
- Perfect encoding: I(G; H) = H(G) (H captures all graph info)
- Independence: I(G; H) = 0 (H tells nothing about G)
Challenges:
- Intractability: Computing I(G; H) is hard
- Continuous H: Differential entropy unbounded
- Discrete G: Entropy depends on graph size
6.2 Deep Graph Infomax (DGI)
Idea: Maximize MI between node embeddings and graph summary
DGI Loss:
max I(h_v; h_G)
where:
h_v: node embedding
h_G: graph-level summary (e.g., mean pooling)
Implementation:
// `discriminator` (an MLP or bilinear scorer) and `sigmoid` are assumed helpers
pub fn deep_graph_infomax_loss(
    node_embeddings: &[Vec<f32>],
    graph_summary: &[f32],         // Readout function output
    negative_samples: &[Vec<f32>], // Corrupted embeddings
) -> f32 {
    let mut loss = 0.0;
    // Positive samples: real (node, graph) pairs
    for h_v in node_embeddings {
        let score = discriminator(h_v, graph_summary);
        loss -= (sigmoid(score) + 1e-10).ln();
    }
    // Negative samples: (corrupted node, graph) pairs
    for h_neg in negative_samples {
        let score = discriminator(h_neg, graph_summary);
        loss -= (1.0 - sigmoid(score) + 1e-10).ln();
    }
    loss / (node_embeddings.len() + negative_samples.len()) as f32
}
Readout Functions (graph summary):
- Mean: h_G = (1/n) Σ_v h_v
- Max: h_G = max_v h_v (element-wise)
- Attention: h_G = Σ_v α_v h_v where α_v = softmax(MLP(h_v))
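Minimal sketches of the mean and max readouts (the attention readout would additionally need a learned MLP):
// Sketch: mean readout h_G = (1/n) Σ_v h_v
fn mean_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let n = node_embeddings.len() as f32;
    let mut h_g = vec![0.0f32; node_embeddings[0].len()];
    for h in node_embeddings {
        for (acc, &x) in h_g.iter_mut().zip(h) {
            *acc += x / n;
        }
    }
    h_g
}

// Sketch: element-wise max readout h_G = max_v h_v
fn max_readout(node_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let mut h_g = vec![f32::NEG_INFINITY; node_embeddings[0].len()];
    for h in node_embeddings {
        for (acc, &x) in h_g.iter_mut().zip(h) {
            *acc = acc.max(x);
        }
    }
    h_g
}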
6.3 Information Bottleneck Principle
Principle: Find minimal sufficient representation
min I(X; H) - β I(H; Y)
where:
X: input features
H: learned embeddings
Y: prediction target
β: trade-off parameter
Graph Context:
- X: Node features + neighborhood structure
- H: Node embeddings
- Y: Downstream task (link prediction, classification)
Goal: Compress X into H, retaining only task-relevant information
Implementation Strategy:
- Variational Bound: Use VAE-style reparameterization
- Lagrange Multiplier: β controls compression vs. performance
- Regularization: Encourage low mutual information I(X; H)
7. Contrastive Learning for Graph-Latent Alignment
7.1 Contrastive Objectives
Core Idea: Pull together related nodes in latent space, push apart unrelated nodes
InfoNCE Loss (Current in training.rs:362-411):
pub fn info_nce_loss(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32
Mathematical Form:
L_InfoNCE = -log(exp(sim(h_v, h_+) / τ) / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)))
What This Optimizes:
- Positive pairs (h_v, h_+): Graph neighbors, semantically similar
- Negative pairs (h_v, h_-): Non-neighbors, dissimilar
- Temperature τ: Controls hardness of negatives
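A standalone sketch of the single-positive case, consistent with the formula above but independent of the training.rs implementation (cosine similarity is assumed as sim(·,·)):
// Sketch: InfoNCE for one anchor, one positive, and a set of negatives
fn info_nce_sketch(anchor: &[f32], positive: &[f32], negatives: &[&[f32]], temperature: f32) -> f32 {
    let cos = |a: &[f32], b: &[f32]| {
        let dot: f32 = a.iter().zip(b).map(|(&x, &y)| x * y).sum();
        let na: f32 = a.iter().map(|&x| x * x).sum::<f32>().sqrt();
        let nb: f32 = b.iter().map(|&x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
    };
    let pos = (cos(anchor, positive) / temperature).exp();
    let neg: f32 = negatives.iter().map(|h| (cos(anchor, h) / temperature).exp()).sum();
    -(pos / (pos + neg)).ln()
}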
7.2 Local Contrastive Loss (Graph-Specific)
Current Implementation (training.rs:444-462):
pub fn local_contrastive_loss(
    node_embedding: &[f32],
    neighbor_embeddings: &[Vec<f32>],
    non_neighbor_embeddings: &[Vec<f32>],
    temperature: f32,
) -> f32
Graph-Aware Sampling:
- Positives: Direct graph neighbors N(v)
- Negatives: Non-neighbors (random or hard negatives)
Variants:
1. K-Hop Positives
Positives = {u : d_graph(v, u) ≤ K}
Encourages multi-hop proximity in latent space
2. Community-Based
Positives = {u : community(v) = community(u)}
Negatives = {u : community(v) ≠ community(u)}
Encourages cluster separation
3. HNSW Layer-Based
Positives = {u : (u,v) ∈ E_layer_k}
Different contrastive losses per HNSW layer
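A minimal sketch of Variant 1 (K-hop positive sampling), assuming an adjacency-list representation rather than the actual HNSW structures:
use std::collections::{HashMap, HashSet, VecDeque};

// Sketch: collect all nodes within K hops of v as contrastive positives (BFS)
fn k_hop_positives(adj: &HashMap<usize, Vec<usize>>, v: usize, k: usize) -> HashSet<usize> {
    let mut visited: HashSet<usize> = HashSet::from([v]);
    let mut frontier: VecDeque<(usize, usize)> = VecDeque::from([(v, 0)]);
    while let Some((u, depth)) = frontier.pop_front() {
        if depth == k {
            continue;
        }
        if let Some(neighbors) = adj.get(&u) {
            for &w in neighbors {
                if visited.insert(w) {
                    frontier.push_back((w, depth + 1));
                }
            }
        }
    }
    visited.remove(&v); // positives exclude the anchor itself
    visited
}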
7.3 Hard Negative Mining
Problem: Random negatives are often too easy
Solution: Sample hard negatives (latent-close but graph-far)
fn sample_hard_negatives(
    node: &[f32],
    all_embeddings: &[Vec<f32>],
    true_neighbors: &[usize],
    k: usize,
) -> Vec<Vec<f32>> {
    // 1. Compute similarities to all non-neighbor nodes
    let mut similarities: Vec<(usize, f32)> = all_embeddings
        .iter()
        .enumerate()
        .filter(|(i, _)| !true_neighbors.contains(i)) // Exclude true neighbors
        .map(|(i, emb)| (i, cosine_similarity(node, emb)))
        .collect();
    // 2. Sort by similarity (descending)
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    // 3. Take top-k (most similar non-neighbors = hard negatives)
    similarities.iter()
        .take(k)
        .map(|(i, _)| all_embeddings[*i].clone())
        .collect()
}
Benefits:
- Focuses learning on difficult cases
- Improves discrimination boundaries
- Speeds up convergence
8. Spectral Methods and Graph Signals
8.1 Graph Laplacian
Normalized Laplacian:
L = I - D^(-1/2) A D^(-1/2)
where:
A: adjacency matrix
D: degree matrix (diagonal)
Eigendecomposition:
L = U Λ U^T
where:
U: eigenvectors (graph Fourier basis)
Λ: eigenvalues (frequencies)
Interpretation:
- Small eigenvalues ↔ Low-frequency (smooth signals)
- Large eigenvalues ↔ High-frequency (oscillatory signals)
8.2 Spectral GNN Connection
Classical GCN (Kipf & Welling):
H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})
where à = A + I (self-loops)
Spectral Interpretation:
- Aggregation = Low-pass filter
- Smooths node features along graph structure
- Eigenvalues control extent of smoothing
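A minimal dense sketch of the symmetric normalization behind this filter (the learned weights W^{(l)} and nonlinearity σ are omitted); a sparse formulation would be needed at HNSW scale:
// Sketch: one dense GCN propagation step H' = D̃^{-1/2} Ã D̃^{-1/2} H, with Ã = A + I
fn gcn_propagate(adj: &[Vec<f32>], h: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = adj.len();
    let d = h[0].len();
    // D̃ = row sums of Ã (adjacency plus self-loop)
    let deg: Vec<f32> = (0..n).map(|i| adj[i].iter().sum::<f32>() + 1.0).collect();
    let mut out = vec![vec![0.0f32; d]; n];
    for i in 0..n {
        for j in 0..n {
            let a_tilde = adj[i][j] + if i == j { 1.0 } else { 0.0 };
            if a_tilde == 0.0 {
                continue;
            }
            let norm = a_tilde / (deg[i].sqrt() * deg[j].sqrt());
            for f in 0..d {
                out[i][f] += norm * h[j][f];
            }
        }
    }
    out
}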
RuVector's Approach (Spatial, not spectral):
- Message passing is spatial formulation
- Attention adds adaptive filtering
- GRU adds temporal component
Missing Spectral Component:
- No explicit frequency analysis
- Could add spectral loss to preserve frequency content
8.3 Spectral Loss Functions
Goal: Preserve graph spectral properties in embeddings
Laplacian Eigenmaps:
min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²
subject to H^T H = I
Equivalent to minimizing: Tr(H^T L H)
Implementation:
// `subtract` and `l2_norm_squared` are assumed element-wise vector helpers
pub fn spectral_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],
    degrees: &[f32],
) -> f32 {
    let mut loss = 0.0;
    // Laplacian regularization: ||h_i - h_j||² for edges
    for &(i, j) in adjacency {
        let diff = subtract(&embeddings[i], &embeddings[j]);
        let norm_sq = l2_norm_squared(&diff);
        // Symmetric degree normalization, as in L = I - D^{-1/2} A D^{-1/2}
        let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
        loss += weight * norm_sq;
    }
    loss
}
Benefits:
- Smooth embeddings along graph structure
- Preserves community structure
- Theoretical guarantees (Laplacian eigenmaps)
9. Disentangled Representations
9.1 Motivation
Problem: Current embeddings are entangled (single vector encodes everything)
Goal: Separate embedding into interpretable factors
h_v = [h_structural || h_semantic || h_temporal]
where:
h_structural: Topology (degree, centrality, etc.)
h_semantic: Feature content
h_temporal: Dynamics (for evolving graphs)
9.2 β-VAE for Graphs
Variational Autoencoder with Disentanglement:
Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
Decoder: z_v → p(X_v, N(v) | z_v)
Loss: L_VAE = E[log p(X_v | z_v)] - β KL(q(z_v) || p(z_v))
β > 1: Encourages disentanglement (independence of latent factors)
Implementation Sketch:
// Assumes `Linear` is a simple fully-connected layer, plus `rand::Rng` and
// `rand_distr::StandardNormal` for the reparameterization trick
pub struct GraphVAE {
    encoder: RuvectorLayer,
    mu_layer: Linear,
    logvar_layer: Linear,
    decoder: Linear,
}

impl GraphVAE {
    fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
        let h = self.encoder.forward(node_features, neighbors, &[]);
        let mu = self.mu_layer.forward(&h);
        let logvar = self.logvar_layer.forward(&h);
        (mu, logvar)
    }

    fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
        // z = mu + sigma * eps, with eps ~ N(0, I)
        let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
        let eps: Vec<f32> = (0..mu.len())
            .map(|_| rand::thread_rng().sample(StandardNormal))
            .collect();
        mu.iter().zip(std.iter()).zip(eps.iter())
            .map(|((&m, &s), &e)| m + s * e)
            .collect()
    }

    fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
        let (mu, logvar) = self.encode(node_features, neighbors);
        let z = self.reparameterize(&mu, &logvar);
        // Reconstruct node features
        let recon = self.decoder.forward(&z);
        // KL divergence against the standard normal prior
        let kl: f32 = mu.iter().zip(logvar.iter())
            .map(|(&m, &lv)| -0.5 * (1.0 + lv - m * m - lv.exp()))
            .sum();
        (recon, kl)
    }
}
9.3 Disentanglement Metrics
1. Mutual Information Gap (MIG)
MIG(z, y) = (1/K) Σ_k (I(z; y_k)_largest - I(z; y_k)_2nd_largest) / H(y_k)
Measures how uniquely each latent factor captures each ground-truth factor
2. SAP (Separated Attribute Predictability)
Train linear classifiers z → y for each attribute
Measure how well z predicts individual factors
Application to Graphs:
- Ground-truth factors: Degree, clustering, centrality, community
- Learned latent: h_v
- Metric: MIG or SAP between h_v components and structural properties
10. Hierarchical Representations (HNSW-Specific)
10.1 Multi-Scale Embeddings
Idea: Different embeddings for different HNSW layers
Node v appears in layers {0, 2, 3}:
h_v^{(0)}: Dense, local structure
h_v^{(2)}: Coarse, medium-range
h_v^{(3)}: Global, long-range hubs
Hierarchical Encoding:
use std::collections::HashMap;

pub struct HierarchicalEmbedding {
    embeddings_by_layer: HashMap<usize, Vec<f32>>,
}

impl HierarchicalEmbedding {
    fn get_embedding(&self, layer: usize) -> &Vec<f32> {
        self.embeddings_by_layer.get(&layer)
            .expect("Node not in this layer")
    }

    // Interpolate between layers for search
    fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
        let layer_low = target_layer.floor() as usize;
        let layer_high = target_layer.ceil() as usize;
        if layer_low == layer_high {
            return self.get_embedding(layer_low).clone();
        }
        let alpha = target_layer - layer_low as f32;
        let emb_low = self.get_embedding(layer_low);
        let emb_high = self.get_embedding(layer_high);
        // Linear interpolation
        emb_low.iter().zip(emb_high.iter())
            .map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
            .collect()
    }
}
Hierarchical Loss:
// `sample_negatives_other_layers` is an assumed helper that draws negatives
// from HNSW layers other than `layer`
fn hierarchical_contrastive_loss(
    node_hierarchical_emb: &HierarchicalEmbedding,
    neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
) -> f32 {
    let mut loss = 0.0;
    // Contrastive loss at each layer
    for (layer, layer_neighbors) in neighbors_by_layer {
        let h_v = node_hierarchical_emb.get_embedding(*layer);
        let positives: Vec<&Vec<f32>> = layer_neighbors.iter()
            .map(|n| n.get_embedding(*layer))
            .collect();
        // Sample negatives from other layers
        let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);
        loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
    }
    loss / neighbors_by_layer.len() as f32
}
10.2 Coarse-to-Fine Alignment
Goal: Ensure consistency across HNSW layers
Alignment Loss:
L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)
Benefits:
- Global structure (high layers) guides local (low layers)
- Enables layer-skipping (jump from layer 3 to layer 0 embedding)
- Multi-resolution representation
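A minimal sketch of the per-node, per-layer-pair alignment term, where `project` stands in for the learned map from the higher-layer dimension to the lower-layer dimension (hypothetical, not in the codebase):
// Sketch: one term of L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
fn alignment_term(
    h_low: &[f32],  // h_v^{(l)}
    h_high: &[f32], // h_v^{(l')}
    project: impl Fn(&[f32]) -> Vec<f32>,
) -> f32 {
    let projected = project(h_high);
    h_low.iter()
        .zip(projected.iter())
        .map(|(&a, &b)| (a - b) * (a - b))
        .sum()
}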
11. Practical Strategies for RuVector
11.1 Short-Term Enhancements
1. Structural Feature Augmentation
// Add degree, clustering, HNSW layer info to embeddings
let augmented_embedding = [
    &node_embedding[..],
    &[degree as f32],
    &[clustering_coef],
    &one_hot_layer[..],
].concat();
2. Spectral Regularization
// Add spectral loss to training
total_loss = contrastive_loss + λ_spectral * spectral_loss
3. Hard Negative Sampling
// Replace random negatives with hard negatives in local_contrastive_loss
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);
11.2 Medium-Term Research
4. Hierarchical Embeddings per HNSW Layer
pub struct HNSWHierarchicalGNN {
    gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
}
5. Hyperbolic Embeddings for Higher Layers
// Layer 0: Euclidean (local, grid-like)
// Layer 1+: Hyperbolic (hierarchical navigation)
pub enum GeometricEmbedding {
    Euclidean(Vec<f32>),
    Hyperbolic(Vec<f32>), // Poincaré ball
}
6. Disentangled VAE
// Separate structural vs. semantic information
pub struct DisentangledGraphVAE {
    structural_encoder: RuvectorLayer,
    semantic_encoder: RuvectorLayer,
    decoder: Linear,
}
11.3 Long-Term Exploration
7. Information Bottleneck Optimization
- Minimize I(X; H) while maximizing I(H; Y)
- Variational bounds for tractability
- Beta-annealing schedule
8. Graph Transformers
- Replace message passing with full attention
- Positional encodings (Laplacian eigenvectors, RoPE)
- Layer-wise multi-scale attention
9. Neural ODEs for Continuous Depth
dh/dt = GNN(h(t), G)
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
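A minimal explicit-Euler sketch of this continuous-depth idea (a production version would use an adaptive solver and the adjoint method for gradients); `gnn` stands in for the message-passing dynamics:
// Sketch: integrate dh/dt = GNN(h(t), G) over [0, T] with fixed step dt
fn ode_integrate(mut h: Vec<f32>, gnn: impl Fn(&[f32]) -> Vec<f32>, t_end: f32, dt: f32) -> Vec<f32> {
    let steps = (t_end / dt).ceil() as usize;
    for _ in 0..steps {
        let dh = gnn(&h);
        for (hi, di) in h.iter_mut().zip(dh.iter()) {
            *hi += dt * di;
        }
    }
    h
}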
12. Evaluation Metrics for Latent-Graph Alignment
12.1 Reconstruction Metrics
1. Link Prediction AUC
Measure how well latent similarity predicts edges
AUC-ROC on link prediction task
2. Graph Reconstruction Error
||A - σ(H H^T)||²_F
where A is adjacency, H is embeddings
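A minimal sketch of computing the link-prediction AUC from precomputed scores for positive (true edge) and negative (non-edge) pairs; it is O(P·N), which is acceptable for evaluation-sized samples:
// Sketch: AUC = probability that a random positive pair outranks a random negative pair (ties count 0.5)
fn link_prediction_auc(pos_scores: &[f32], neg_scores: &[f32]) -> f32 {
    let mut wins = 0.0f64;
    for &p in pos_scores {
        for &n in neg_scores {
            if p > n {
                wins += 1.0;
            } else if p == n {
                wins += 0.5;
            }
        }
    }
    wins as f32 / (pos_scores.len() * neg_scores.len()) as f32
}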
12.2 Structural Preservation
3. Rank Correlation
Spearman ρ between:
- Graph distance d_G(u, v)
- Latent distance d_L(h_u, h_v)
4. Distortion
max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
Worst-case embedding distortion
5. Average Distortion
(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
12.3 Downstream Task Performance
6. Node Classification Accuracy
Train classifier on embeddings, test accuracy
7. Clustering Modularity
K-means on embeddings, measure graph modularity
8. HNSW Search Quality
Recall@K using learned embeddings vs. original features
References
Papers
Manifold Learning:
1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction
Hyperbolic Embeddings:
3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks
Information Theory:
5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
6. Velickovic et al. (2019) - Deep Graph Infomax
Contrastive Learning:
7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
8. You et al. (2020) - Graph Contrastive Learning with Augmentations
Disentanglement:
9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
10. Ma et al. (2019) - Disentangled Graph Convolutional Networks
RuVector Code
- crates/ruvector-gnn/src/layer.rs - GNN encoding
- crates/ruvector-gnn/src/search.rs - Latent similarity (decoding)
- crates/ruvector-gnn/src/training.rs - Contrastive losses (alignment)
Document Version: 1.0 Last Updated: 2025-11-30 Author: RuVector Research Team