# Latent Space ↔ Graph Reality Interplay

## Executive Summary

This document explores the fundamental relationship between **high-dimensional latent space** (where embeddings live) and **graph reality** (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.

**Central Question**: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?

---

## 1. The Two Worlds

### 1.1 Latent Space Characteristics

**Definition**: Continuous, high-dimensional vector space where node embeddings reside

```
Latent Space L: R^d where d ∈ [64, 1024+]
Node embedding: h_v ∈ L
Distance metric: d(h_u, h_v) (cosine, L2, etc.)
```

**Properties**:
- ✓ **Continuous**: Smooth interpolation between points
- ✓ **Dense**: Every point is surrounded by infinitely many neighbors
- ✓ **High-Dimensional**: Curse of dimensionality, but expressive
- ✓ **Metric**: Equipped with a distance/similarity function
- ✗ **Isotropic**: May not preserve hierarchical structure
- ✗ **Euclidean Bias**: Most operations assume flat geometry

**Current RuVector Latent Space**:
```rust
// From layer.rs:337-350
RuvectorLayer {
    input_dim: usize,   // Typically 64-256
    hidden_dim: usize,  // Typically 128-512
    // Embeddings live in R^{hidden_dim}
}
```

### 1.2 Graph Reality Characteristics

**Definition**: Discrete topological structure G = (V, E)

```
Graph G:
- Vertices: V = {v_1, ..., v_n}
- Edges: E ⊆ V × V
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
- Topology: Small-world, scale-free, hierarchical, etc.
```

**Properties**:
- ✓ **Discrete**: Finite nodes and edges
- ✓ **Sparse**: |E| << |V|² (typically)
- ✓ **Structured**: Communities, hierarchies, motifs
- ✓ **Relational**: Explicit connections
- ✗ **Non-Metric**: Shortest path not always meaningful
- ✗ **Heterogeneous**: Variable degree, asymmetric

**RuVector Graph (HNSW)**:
```
Hierarchical Navigable Small World:
- Layer 0: Dense graph (M = 16-64 neighbors)
- Layer 1+: Sparse graphs (long-range connections)
- Navigable: Greedy search finds approximate NN
- Small-world: Low diameter, high clustering
```

---

## 2. The Fundamental Tension

### 2.1 Embedding Paradox

**Goal 1**: Preserve graph topology in latent space
```
If (u, v) ∈ E, then ||h_u - h_v|| should be small
```

**Goal 2**: Preserve latent similarity in graph
```
If ||h_u - h_v|| is small, then u and v should be related
```

**Paradox**: These are not equivalent!
- **Graph neighbors** may be semantically different (e.g., bridge edges)
- **Latent neighbors** may not be graph-connected (e.g., same cluster, different components)

### 2.2 Information Bottleneck

```
Graph G ──encode──> Latent h ──decode──> Graph G'
           (GNN)              (predict edges/nodes)
```

**Bottleneck**: Fixed-dimensional h must compress all information from:
- Node features
- Local topology (ego-net)
- Global structure (communities, paths)
- Edge attributes
- Dynamic patterns

**Trade-off**:
- **High dimensions**: More expressive, but curse of dimensionality
- **Low dimensions**: Efficient, but lossy compression

---

## 3. Manifold Hypothesis for Graphs

### 3.1 Low-Dimensional Manifold

**Hypothesis**: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space

```
True data distribution: P_data(h) supported on manifold M ⊂ R^d
where dim(M) << d
```

**Implications**:
1. **Intrinsic Dimensionality**: Effective degrees of freedom much less than d
2. **Local Linearity**: Small neighborhoods approximately Euclidean
3. **Global Curvature**: Manifold may be curved (non-Euclidean)

**Evidence in RuVector**:
- HNSW assumes low intrinsic dimension for efficient search
- Multi-head attention learns multiple "views" of the manifold
- Layer normalization assumes local isotropy

### 3.2 Geometric Structure of Graph Embeddings

**Question**: What geometry best represents graphs?

**Option 1: Euclidean (Current)**
```
h ∈ R^d, distance = ||h_u - h_v||_2
```
- ✓ Simple, well-understood
- ✓ Efficient operations (dot products, linear maps)
- ✗ Poor for tree-like structures
- ✗ Limited capacity: volume grows only polynomially with radius

**Option 2: Hyperbolic (Poincaré Ball)**
```
h ∈ B^d = {x ∈ R^d : ||x|| < 1}
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
```
- ✓ **Exponential capacity**: Volume grows exponentially with radius
- ✓ **Natural hierarchies**: Tree embeddings with low distortion
- ✓ **HNSW synergy**: Hierarchical layers naturally hyperbolic
- ✗ More complex operations
- ✗ Numerical instability near boundary

(A minimal distance-computation sketch appears at the end of this section.)

**Option 3: Mixed Curvature (Product Manifolds)**
```
h = (h_euclidean, h_hyperbolic, h_spherical)
Combine different geometries for different aspects
```

### 3.3 Curvature and Graph Structure

**Relationship**:
```
Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
Zero curvature (Euclidean)      ↔ Grid-like, regular
Positive curvature (spherical)  ↔ Cyclic, clustered
```

**HNSW Topology**:
- **Layer 0**: Locally grid-like (Euclidean)
- **Higher layers**: Tree-like navigation (Hyperbolic)
- **Overall**: Mixed curvature

**Implication**: A single geometry may be suboptimal; consider mixed-curvature embeddings
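To make Option 2 concrete, the Poincaré-ball distance formula above translates directly into code. The following is a minimal, self-contained sketch (not an existing RuVector function); it assumes both points lie strictly inside the unit ball, and a production version would clamp embeddings away from the boundary for numerical stability.

```rust
/// Poincaré-ball distance d(x, y) = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²))).
/// Illustrative sketch only; `x` and `y` must satisfy ||x|| < 1 and ||y|| < 1.
pub fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let norm_sq = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y.iter()).map(|(a, b)| (a - b) * (a - b)).sum();

    // Guard the denominator so points very close to the boundary do not blow up
    let denom = ((1.0 - norm_sq(x)) * (1.0 - norm_sq(y))).max(1e-10);
    (1.0 + 2.0 * diff_sq / denom).acosh()
}
```

If higher HNSW layers were given hyperbolic embeddings (item 5 in §11.2), a function like this would replace cosine similarity as their comparison primitive.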
---

## 4. Encoding: Graph → Latent Space

### 4.1 Message Passing Framework (Current)

**Goal**: Aggregate neighborhood information into node embedding

```
From layer.rs:362-401 (RuvectorLayer.forward):

h_v^{(l+1)} = UPDATE(
    h_v^{(l)},
    AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
    TRANSFORM(h_v^{(l)})
)
```

**Current Pipeline**:
1. **Message**: `m_u = W_msg · h_u`
2. **Attention Aggregate**: `a_v = MultiHeadAttention(h_v, {h_u})`
3. **Weighted Aggregate**: `agg_v = Σ w_uv · m_u`
4. **Combine**: `combined = a_v + agg_v`
5. **Update**: `h'_v = GRU(W_agg · combined, h_v)`
6. **Normalize**: `output = LayerNorm(Dropout(h'_v))`

**What This Encodes**:
- ✓ Local neighborhood structure (1-hop in one layer)
- ✓ Neighbor feature aggregation
- ✓ Temporal dynamics (via GRU)
- ✗ Global structure (requires stacking layers)
- ✗ Structural properties (degree, centrality, etc.)
- ✗ Edge semantics (only weights, not features)

### 4.2 Multi-Hop Information Propagation

**Challenge**: K-layer GNN sees only K-hop neighborhood

```
Receptive field after L layers: d_graph(u, v) ≤ L
```

**For HNSW**:
- Layer 0 average degree: ~50
- Layer 1 average degree: ~10
- Exponential reduction in higher layers

**Trade-off**:
- **Many layers**: Large receptive field, but over-smoothing
- **Few layers**: Localized, but miss global context

**Over-Smoothing Problem**:
```
As L → ∞, all node embeddings converge to the same value:
h_v^{(∞)} → E[h] for all v
```

**Mitigation Strategies**:
1. **Skip Connections**: `h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})`
2. **Residual GRU**: Implicit in `h_t = (1-z_t)h_{t-1} + z_t h̃_t`
3. **Jumping Knowledge**: Concatenate all layer outputs
4. **Adaptive Depth**: Learn when to stop propagating
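Strategies 1 and 3 above are simple enough to sketch together. The following is a minimal, hypothetical helper (none of these names exist in RuVector) that applies a skip connection at every layer and keeps a jumping-knowledge concatenation of all intermediate outputs; it assumes each layer preserves the embedding dimension.

```rust
/// Illustrative sketch of skip connections + jumping knowledge.
/// `layer_forward` stands in for one GNN layer's forward pass.
pub fn forward_with_skip_and_jk<F>(h0: &[f32], num_layers: usize, layer_forward: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    let mut h = h0.to_vec();
    let mut jumping_knowledge: Vec<f32> = Vec::new();

    for _ in 0..num_layers {
        // Skip connection: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
        let update = layer_forward(&h);
        h = h.iter().zip(update.iter()).map(|(a, b)| a + b).collect();

        // Jumping knowledge: remember every layer's output
        jumping_knowledge.extend_from_slice(&h);
    }

    // Final representation: concatenation of all per-layer outputs
    jumping_knowledge
}
```

In practice the concatenated output would typically be projected back down to `hidden_dim` before being used by downstream heads.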
### 4.3 Structural Features Beyond Neighborhoods

**Current limitation**: Only neighbor features, not structural properties

**Missing Encodings**:
1. **Node Degree**: `deg(v) = |N(v)|`
2. **Clustering Coefficient**: `C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2)`
3. **Centrality**: Betweenness, closeness, eigenvector
4. **Community Membership**: Detected clusters
5. **HNSW Layer**: Which layers the node appears in

**Proposed Enhancement**:
```rust
pub struct StructuralFeatures {
    degree: f32,
    clustering_coef: f32,
    hnsw_layers: Vec<usize>, // Layers this node appears in
    centrality: f32,
}

impl RuvectorLayer {
    fn forward_with_structural(
        &self,
        node_embedding: &[f32],
        neighbor_embeddings: &[Vec<f32>],
        edge_weights: &[f32],
        structural_features: &StructuralFeatures, // NEW
    ) -> Vec<f32> {
        // Concatenate structural features to embedding
        let augmented = [node_embedding, &structural_features.to_vec()].concat();
        // Proceed with standard forward pass
        // ...
    }
}
```

---

## 5. Decoding: Latent Space → Graph Predictions

### 5.1 Link Prediction

**Goal**: Predict edge existence from embeddings

```
Score(u, v) = f(h_u, h_v)
P((u,v) ∈ E) = σ(Score(u, v))
```

**Scoring Functions**:

**1. Dot Product** (Current in search.rs)
```rust
score = h_u.dot(h_v)
```
- ✓ Fast O(d)
- ✗ Not invariant to scaling

**2. Cosine Similarity** (Current in search.rs)
```rust
score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
```
- ✓ Scale-invariant
- ✓ Natural for normalized embeddings
- ✗ Ignores magnitude information

**3. Distance-Based**
```rust
score = -||h_u - h_v||²
```
- ✓ Metric structure
- ✗ Negative, unbounded

**4. Bilinear**
```rust
score = h_u^T W h_v
```
- ✓ Learnable asymmetry
- ✗ O(d²) parameters

**5. MLP (Most Expressive)**
```rust
score = MLP([h_u || h_v || (h_u ⊙ h_v)])
```
- ✓ Highly expressive
- ✗ Expensive O(d²) or more

**RuVector Current**:
```rust
// From search.rs:4-18
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot_product / (norm_a * norm_b)
    }
}
```

### 5.2 Node Classification

**Goal**: Predict node labels from embeddings

```
From graph_neural.rs:82-98:
classify_node(h_v) → class probabilities
```

**Typical Approach**:
```
logits = W_class · h_v + b
probs = softmax(logits)
```
(A minimal sketch of this head appears at the end of this section.)

**Challenge**: Embedding must encode label-relevant information

### 5.3 Graph Reconstruction

**Goal**: Reconstruct adjacency matrix from embeddings

**Autoencoder Framework**:
```
Encoder: A → H   (GNN)
Decoder: H → A'  (pairwise scoring)
Loss: ||A - A'||² or Cross-Entropy
```

**Reconstruction Loss**:
```rust
// Proposed
pub fn graph_reconstruction_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)], // True edges
) -> f32 {
    let mut loss = 0.0;
    let n = embeddings.len();

    // Positive edges (should have high score)
    for &(i, j) in adjacency {
        let score = cosine_similarity(&embeddings[i], &embeddings[j]);
        // Map cosine similarity from [-1, 1] to (0, 1) before taking the log
        let p = 0.5 * (score + 1.0);
        loss -= (p + 1e-10).ln(); // -log(p)
    }

    // Negative sampling (non-edges should have low score)
    for _ in 0..adjacency.len() {
        let i = rand::random::<usize>() % n;
        let j = rand::random::<usize>() % n;
        if !adjacency.contains(&(i, j)) {
            let score = cosine_similarity(&embeddings[i], &embeddings[j]);
            let p = 0.5 * (score + 1.0);
            loss -= (1.0 - p + 1e-10).ln(); // -log(1 - p)
        }
    }

    loss / (2 * adjacency.len()) as f32
}
```
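The node-classification head in §5.2 is small enough to sketch in full. The version below is an illustrative, free-standing linear-plus-softmax classifier, not the actual `graph_neural.rs` implementation; `weights` (num_classes × d) and `bias` (num_classes) are hypothetical parameters.

```rust
/// Minimal sketch of `logits = W_class · h_v + b` followed by softmax.
pub fn linear_softmax_classifier(h_v: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    // One logit per class: dot product of the class row with the embedding, plus bias
    let logits: Vec<f32> = weights
        .iter()
        .zip(bias.iter())
        .map(|(row, b)| row.iter().zip(h_v.iter()).map(|(w, h)| w * h).sum::<f32>() + b)
        .collect();

    // Numerically stable softmax
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```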
---

## 6. Information-Theoretic Perspective

### 6.1 Mutual Information

**Goal**: Maximize mutual information between graph structure G and embeddings H

```
max I(G; H) = H(G) - H(G|H) = H(H) - H(H|G)
```

**Interpretation**:
- `I(G; H)` measures how much knowing H tells us about G
- Perfect encoding: `I(G; H) = H(G)` (H captures all graph info)
- Independence: `I(G; H) = 0` (H tells nothing about G)

**Challenges**:
1. **Intractability**: Computing I(G; H) is hard
2. **Continuous H**: Differential entropy unbounded
3. **Discrete G**: Entropy depends on graph size

### 6.2 Deep Graph Infomax (DGI)

**Idea**: Maximize MI between node embeddings and graph summary

```
DGI Loss: max I(h_v; h_G)
where:
  h_v: node embedding
  h_G: graph-level summary (e.g., mean pooling)
```

**Implementation**:
```rust
// `discriminator` and `sigmoid` are assumed helper functions in this sketch.
pub fn deep_graph_infomax_loss(
    node_embeddings: &[Vec<f32>],
    graph_summary: &[f32],         // Readout function output
    negative_samples: &[Vec<f32>], // Corrupted embeddings
) -> f32 {
    let mut loss = 0.0;

    // Positive samples: real (node, graph) pairs
    for h_v in node_embeddings {
        let score = discriminator(h_v, graph_summary); // MLP or bilinear
        loss -= (sigmoid(score) + 1e-10).ln();
    }

    // Negative samples: (corrupted node, graph) pairs
    for h_neg in negative_samples {
        let score = discriminator(h_neg, graph_summary);
        loss -= (1.0 - sigmoid(score) + 1e-10).ln();
    }

    loss / (node_embeddings.len() + negative_samples.len()) as f32
}
```

**Readout Functions** (graph summary):
1. **Mean**: `h_G = (1/n) Σ_v h_v`
2. **Max**: `h_G = max_v h_v` (element-wise)
3. **Attention**: `h_G = Σ_v α_v h_v` where `α_v = softmax(MLP(h_v))`

### 6.3 Information Bottleneck Principle

**Principle**: Find minimal sufficient representation

```
min I(X; H) - β I(H; Y)
where:
  X: input features
  H: learned embeddings
  Y: prediction target
  β: trade-off parameter
```

**Graph Context**:
- `X`: Node features + neighborhood structure
- `H`: Node embeddings
- `Y`: Downstream task (link, classification)

**Goal**: Compress X into H, retaining only task-relevant information

**Implementation Strategy**:
1. **Variational Bound**: Use VAE-style reparameterization
2. **Lagrange Multiplier**: β controls compression vs. performance
3. **Regularization**: Encourage low mutual information I(X; H)

---

## 7. Contrastive Learning for Graph-Latent Alignment

### 7.1 Contrastive Objectives

**Core Idea**: Pull together related nodes in latent space, push apart unrelated nodes

**InfoNCE Loss** (Current in training.rs:362-411):
```rust
pub fn info_nce_loss(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32
```

**Mathematical Form**:
```
L_InfoNCE = -log( exp(sim(h_v, h_+) / τ)
                  / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)) )
```

**What This Optimizes**:
- Positive pairs `(h_v, h_+)`: Graph neighbors, semantically similar
- Negative pairs `(h_v, h_-)`: Non-neighbors, dissimilar
- Temperature τ: Controls hardness of negatives
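The training.rs body is not reproduced here, but the signature and formula above are enough to sketch how such a loss can be computed. The sketch below is hypothetical (hence the `_sketch` suffix), assumes the `cosine_similarity` function from §5.1 as `sim`, and averages over all positives.

```rust
/// Illustrative InfoNCE computation; the actual training.rs implementation may differ.
pub fn info_nce_loss_sketch(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32 {
    // Σ_{h_-} exp(sim(anchor, h_-) / τ)
    let neg_exp_sum: f32 = negatives
        .iter()
        .map(|n| (cosine_similarity(anchor, n) / temperature).exp())
        .sum();

    // For each positive: -log( exp(pos/τ) / (exp(pos/τ) + Σ exp(neg/τ)) )
    let loss: f32 = positives
        .iter()
        .map(|p| {
            let pos_exp = (cosine_similarity(anchor, p) / temperature).exp();
            -(pos_exp / (pos_exp + neg_exp_sum + 1e-10)).ln()
        })
        .sum();

    loss / positives.len().max(1) as f32
}
```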
### 7.2 Local Contrastive Loss (Graph-Specific)

**Current Implementation** (training.rs:444-462):
```rust
pub fn local_contrastive_loss(
    node_embedding: &[f32],
    neighbor_embeddings: &[Vec<f32>],
    non_neighbor_embeddings: &[Vec<f32>],
    temperature: f32,
) -> f32
```

**Graph-Aware Sampling**:
- **Positives**: Direct graph neighbors `N(v)`
- **Negatives**: Non-neighbors (random or hard negatives)

**Variants**:

**1. K-Hop Positives**
```
Positives = {u : d_graph(v, u) ≤ K}
Encourages multi-hop proximity in latent space
```

**2. Community-Based**
```
Positives = {u : community(v) = community(u)}
Negatives = {u : community(v) ≠ community(u)}
Encourages cluster separation
```

**3. HNSW Layer-Based**
```
Positives = {u : (u,v) ∈ E_layer_k}
Different contrastive losses per HNSW layer
```

### 7.3 Hard Negative Mining

**Problem**: Random negatives are often too easy

**Solution**: Sample hard negatives (latent-close but graph-far)

```rust
fn sample_hard_negatives(
    node: &[f32],
    all_embeddings: &[Vec<f32>],
    true_neighbors: &[usize],
    k: usize,
) -> Vec<Vec<f32>> {
    // 1. Compute similarities to all non-neighbor nodes
    let mut similarities: Vec<(usize, f32)> = all_embeddings
        .iter()
        .enumerate()
        .filter(|(i, _)| !true_neighbors.contains(i)) // Exclude true neighbors
        .map(|(i, emb)| (i, cosine_similarity(node, emb)))
        .collect();

    // 2. Sort by similarity (descending)
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // 3. Take top-k (most similar non-neighbors = hard negatives)
    similarities.iter()
        .take(k)
        .map(|(i, _)| all_embeddings[*i].clone())
        .collect()
}
```

**Benefits**:
- Focuses learning on difficult cases
- Improves discrimination boundaries
- Speeds up convergence

---

## 8. Spectral Methods and Graph Signals

### 8.1 Graph Laplacian

**Normalized Laplacian**:
```
L = I - D^(-1/2) A D^(-1/2)
where:
  A: adjacency matrix
  D: degree matrix (diagonal)
```

**Eigendecomposition**:
```
L = U Λ U^T
where:
  U: eigenvectors (graph Fourier basis)
  Λ: eigenvalues (frequencies)
```

**Interpretation**:
- Small eigenvalues ↔ Low-frequency (smooth signals)
- Large eigenvalues ↔ High-frequency (oscillatory signals)

### 8.2 Spectral GNN Connection

**Classical GCN** (Kipf & Welling):
```
H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})
where Ã = A + I (self-loops)
```

**Spectral Interpretation**:
- Aggregation = Low-pass filter
- Smooths node features along graph structure
- Eigenvalues control extent of smoothing

**RuVector's Approach** (Spatial, not spectral):
- Message passing is a spatial formulation
- Attention adds adaptive filtering
- GRU adds a temporal component

**Missing Spectral Component**:
- No explicit frequency analysis
- Could add a spectral loss to preserve frequency content

### 8.3 Spectral Loss Functions

**Goal**: Preserve graph spectral properties in embeddings

**Laplacian Eigenmaps**:
```
min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²  subject to  H^T H = I
Equivalent to minimizing: Tr(H^T L H)
```

**Implementation**:
```rust
pub fn spectral_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],
    degrees: &[f32],
) -> f32 {
    let mut loss = 0.0;

    // Laplacian regularization: ||h_i - h_j||² for edges
    // (`subtract` and `l2_norm_squared` are element-wise vector helpers)
    for &(i, j) in adjacency {
        let diff = subtract(&embeddings[i], &embeddings[j]);
        let norm_sq = l2_norm_squared(&diff);

        // Normalize by degrees
        let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
        loss += weight * norm_sq;
    }

    loss
}
```

**Benefits**:
- Smooth embeddings along graph structure
- Preserves community structure
- Theoretical guarantees (Laplacian eigenmaps)
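The `spectral_loss` above only needs the edge list, but for offline analysis (for example, inspecting the spectrum of a small subgraph) it can help to materialize the normalized Laplacian from §8.1 explicitly. The following is a minimal dense sketch, illustrative only; it assumes each undirected edge is listed once, and a real implementation would use a sparse representation.

```rust
/// Dense L = I - D^{-1/2} A D^{-1/2} for a small undirected graph.
pub fn normalized_laplacian(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<f32>> {
    let mut adj = vec![vec![0.0f32; n]; n];
    let mut deg = vec![0.0f32; n];
    for &(i, j) in edges {
        adj[i][j] = 1.0;
        adj[j][i] = 1.0;
        deg[i] += 1.0;
        deg[j] += 1.0;
    }

    let mut lap = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            let identity = if i == j { 1.0 } else { 0.0 };
            let scale = (deg[i] * deg[j]).sqrt();
            let a_norm = if scale > 0.0 { adj[i][j] / scale } else { 0.0 };
            lap[i][j] = identity - a_norm;
        }
    }
    lap
}
```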
---

## 9. Disentangled Representations

### 9.1 Motivation

**Problem**: Current embeddings are entangled (a single vector encodes everything)

**Goal**: Separate the embedding into interpretable factors

```
h_v = [h_structural || h_semantic || h_temporal]
where:
  h_structural: Topology (degree, centrality, etc.)
  h_semantic:   Feature content
  h_temporal:   Dynamics (for evolving graphs)
```

### 9.2 β-VAE for Graphs

**Variational Autoencoder with Disentanglement**:
```
Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
Decoder: z_v → p(X_v, N(v) | z_v)
Loss: L_VAE = E[log p(X_v | z_v)] - β KL(q(z_v) || p(z_v))
```

**β > 1**: Encourages disentanglement (independence of latent factors)

**Implementation Sketch**:
```rust
pub struct GraphVAE {
    encoder: RuvectorLayer,
    mu_layer: Linear,
    logvar_layer: Linear,
    decoder: Linear,
}

impl GraphVAE {
    fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
        let h = self.encoder.forward(node_features, neighbors, &[]);
        let mu = self.mu_layer.forward(&h);
        let logvar = self.logvar_layer.forward(&h);
        (mu, logvar)
    }

    fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
        let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
        // Gaussian noise (e.g., via the rand / rand_distr crates)
        let eps: Vec<f32> = (0..mu.len())
            .map(|_| rand::thread_rng().sample(StandardNormal))
            .collect();
        mu.iter().zip(std.iter()).zip(eps.iter())
            .map(|((&m, &s), &e)| m + s * e)
            .collect()
    }

    fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
        let (mu, logvar) = self.encode(node_features, neighbors);
        let z = self.reparameterize(&mu, &logvar);

        // Reconstruct node features
        let recon = self.decoder.forward(&z);

        // KL divergence
        let kl: f32 = mu.iter().zip(logvar.iter())
            .map(|(&m, &lv)| -0.5 * (1.0 + lv - m*m - lv.exp()))
            .sum();

        (recon, kl)
    }
}
```
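`GraphVAE::forward` above returns the reconstruction and the KL term separately; combining them into the β-VAE objective of this section is a single weighted sum. A minimal sketch, using mean-squared error as a stand-in for the negative log-likelihood term (the function name is hypothetical):

```rust
/// recon_error + β · KL, i.e. the negative of E[log p(X_v | z_v)] - β KL(q || p).
pub fn beta_vae_loss(reconstruction: &[f32], target: &[f32], kl: f32, beta: f32) -> f32 {
    let recon_err: f32 = reconstruction
        .iter()
        .zip(target.iter())
        .map(|(r, t)| (r - t).powi(2))
        .sum::<f32>()
        / target.len().max(1) as f32;

    recon_err + beta * kl
}
```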
### 9.3 Disentanglement Metrics

**1. Mutual Information Gap (MIG)**
```
MIG(z, y) = (1/K) Σ_k (I(z; y_k)_largest - I(z; y_k)_2nd_largest) / H(y_k)
Measures how uniquely each latent factor captures each ground-truth factor
```

**2. SAP (Separated Attribute Predictability)**
```
Train linear classifiers z → y for each attribute
Measure how well z predicts individual factors
```

**Application to Graphs**:
- Ground-truth factors: Degree, clustering, centrality, community
- Learned latent: h_v
- Metric: MIG or SAP between h_v components and structural properties

---

## 10. Hierarchical Representations (HNSW-Specific)

### 10.1 Multi-Scale Embeddings

**Idea**: Different embeddings for different HNSW layers

```
Node v appears in layers {0, 2, 3}:
  h_v^{(0)}: Dense, local structure
  h_v^{(2)}: Coarse, medium-range
  h_v^{(3)}: Global, long-range hubs
```

**Hierarchical Encoding**:
```rust
pub struct HierarchicalEmbedding {
    embeddings_by_layer: HashMap<usize, Vec<f32>>,
}

impl HierarchicalEmbedding {
    fn get_embedding(&self, layer: usize) -> &Vec<f32> {
        self.embeddings_by_layer.get(&layer)
            .expect("Node not in this layer")
    }

    // Interpolate between layers for search
    fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
        let layer_low = target_layer.floor() as usize;
        let layer_high = target_layer.ceil() as usize;

        if layer_low == layer_high {
            return self.get_embedding(layer_low).clone();
        }

        let alpha = target_layer - layer_low as f32;
        let emb_low = self.get_embedding(layer_low);
        let emb_high = self.get_embedding(layer_high);

        // Linear interpolation
        emb_low.iter().zip(emb_high.iter())
            .map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
            .collect()
    }
}
```

**Hierarchical Loss**:
```rust
fn hierarchical_contrastive_loss(
    node_hierarchical_emb: &HierarchicalEmbedding,
    neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
) -> f32 {
    let mut loss = 0.0;

    // Contrastive loss at each layer
    for (layer, layer_neighbors) in neighbors_by_layer {
        let h_v = node_hierarchical_emb.get_embedding(*layer);
        let positives: Vec<&Vec<f32>> = layer_neighbors.iter()
            .map(|n| n.get_embedding(*layer))
            .collect();

        // Sample negatives from other layers (assumed helper)
        let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);

        loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
    }

    loss / neighbors_by_layer.len() as f32
}
```

### 10.2 Coarse-to-Fine Alignment

**Goal**: Ensure consistency across HNSW layers

```
Alignment Loss:
L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)
```

**Benefits**:
- Global structure (high layers) guides local (low layers)
- Enables layer-skipping (jump from layer 3 to layer 0 embedding)
- Multi-resolution representation
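Below is a minimal sketch of the per-node, per-layer-pair alignment term `||h_v^{(l)} - Project(h_v^{(l')})||²` from §10.2, assuming the learned projection is represented as a plain row-major matrix. All names here are hypothetical, not existing RuVector APIs.

```rust
/// Squared error between a low-layer embedding and the projected high-layer embedding.
pub fn alignment_loss(
    emb_low: &[f32],         // h_v^{(l)}  in R^{d_low}
    emb_high: &[f32],        // h_v^{(l')} in R^{d_high}
    projection: &[Vec<f32>], // d_low rows, each of length d_high
) -> f32 {
    emb_low
        .iter()
        .zip(projection.iter())
        .map(|(&low, row)| {
            // Project(h_v^{(l')})_i = row_i · h_v^{(l')}
            let projected: f32 = row.iter().zip(emb_high.iter()).map(|(w, h)| w * h).sum();
            (low - projected).powi(2)
        })
        .sum()
}
```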
---

## 11. Practical Strategies for RuVector

### 11.1 Short-Term Enhancements

**1. Structural Feature Augmentation**
```rust
// Add degree, clustering, HNSW layer info to embeddings
let augmented_embedding = [
    &node_embedding[..],
    &[degree as f32],
    &[clustering_coef],
    &one_hot_layer[..],
].concat();
```

**2. Spectral Regularization**
```rust
// Add spectral loss to training
total_loss = contrastive_loss + λ_spectral * spectral_loss
```

**3. Hard Negative Sampling**
```rust
// Replace random negatives with hard negatives in local_contrastive_loss
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);
```

### 11.2 Medium-Term Research

**4. Hierarchical Embeddings per HNSW Layer**
```rust
pub struct HNSWHierarchicalGNN {
    gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
}
```

**5. Hyperbolic Embeddings for Higher Layers**
```rust
// Layer 0: Euclidean (local, grid-like)
// Layer 1+: Hyperbolic (hierarchical navigation)
pub enum GeometricEmbedding {
    Euclidean(Vec<f32>),
    Hyperbolic(Vec<f32>), // Poincaré ball
}
```

**6. Disentangled VAE**
```rust
// Separate structural vs. semantic information
pub struct DisentangledGraphVAE {
    structural_encoder: RuvectorLayer,
    semantic_encoder: RuvectorLayer,
    decoder: Linear,
}
```

### 11.3 Long-Term Exploration

**7. Information Bottleneck Optimization**
- Minimize I(X; H) while maximizing I(H; Y)
- Variational bounds for tractability
- Beta-annealing schedule

**8. Graph Transformers**
- Replace message passing with full attention
- Positional encodings (Laplacian eigenvectors, RoPE)
- Layer-wise multi-scale attention

**9. Neural ODEs for Continuous Depth**
```
dh/dt = GNN(h(t), G)
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
```

---

## 12. Evaluation Metrics for Latent-Graph Alignment

### 12.1 Reconstruction Metrics

**1. Link Prediction AUC**
```
Measure how well latent similarity predicts edges
AUC-ROC on the link prediction task
```

**2. Graph Reconstruction Error**
```
||A - σ(H H^T)||²_F
where A is the adjacency matrix, H is the embedding matrix
```

### 12.2 Structural Preservation

**3. Rank Correlation**
```
Spearman ρ between:
- Graph distance d_G(u, v)
- Latent distance d_L(h_u, h_v)
```

**4. Distortion**
```
max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
Worst-case embedding distortion
```

**5. Average Distortion**
```
(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
```

### 12.3 Downstream Task Performance

**6. Node Classification Accuracy**
```
Train a classifier on embeddings, measure test accuracy
```

**7. Clustering Modularity**
```
K-means on embeddings, measure graph modularity
```

**8. HNSW Search Quality**
```
Recall@K using learned embeddings vs. original features
```

---

## References

### Papers

**Manifold Learning**:
1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction

**Hyperbolic Embeddings**:
3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks

**Information Theory**:
5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
6. Velickovic et al. (2019) - Deep Graph Infomax

**Contrastive Learning**:
7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
8. You et al. (2020) - Graph Contrastive Learning with Augmentations

**Disentanglement**:
9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
10. Ma et al. (2019) - Disentangled Graph Convolutional Networks

### RuVector Code

- `crates/ruvector-gnn/src/layer.rs` - GNN encoding
- `crates/ruvector-gnn/src/search.rs` - Latent similarity (decoding)
- `crates/ruvector-gnn/src/training.rs` - Contrastive losses (alignment)

---

**Document Version**: 1.0
**Last Updated**: 2025-11-30
**Author**: RuVector Research Team