git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
1039 lines
26 KiB
Markdown
1039 lines
26 KiB
Markdown
# Latent Space ↔ Graph Reality Interplay
|
||
|
||
## Executive Summary
|
||
|
||
This document explores the fundamental relationship between **high-dimensional latent space** (where embeddings live) and **graph reality** (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.
|
||
|
||
**Central Question**: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?
|
||
|
||
---
|
||
|
||
## 1. The Two Worlds
|
||
|
||
### 1.1 Latent Space Characteristics
|
||
|
||
**Definition**: Continuous, high-dimensional vector space where node embeddings reside
|
||
|
||
```
|
||
Latent Space L: R^d where d ∈ [64, 1024+]
|
||
Node embedding: h_v ∈ L
|
||
Distance metric: d(h_u, h_v) (cosine, L2, etc.)
|
||
```
|
||
|
||
**Properties**:
|
||
- ✓ **Continuous**: Smooth interpolation between points
|
||
- ✓ **Dense**: Every point is surrounded by infinitely many neighbors
|
||
- ✓ **High-Dimensional**: Curse of dimensionality, but expressive
|
||
- ✓ **Metric**: Equipped with distance/similarity function
|
||
- ✗ **Isotropic**: May not preserve hierarchical structure
|
||
- ✗ **Euclidean Bias**: Most operations assume flat geometry
|
||
|
||
**Current RuVector Latent Space**:
|
||
```rust
|
||
// From layer.rs:337-350
|
||
RuvectorLayer {
|
||
input_dim: usize, // Typically 64-256
|
||
hidden_dim: usize, // Typically 128-512
|
||
// Embeddings live in R^{hidden_dim}
|
||
}
|
||
```
|
||
|
||
### 1.2 Graph Reality Characteristics
|
||
|
||
**Definition**: Discrete topological structure G = (V, E)
|
||
|
||
```
|
||
Graph G:
|
||
- Vertices: V = {v_1, ..., v_n}
|
||
- Edges: E ⊆ V × V
|
||
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
|
||
- Topology: Small-world, scale-free, hierarchical, etc.
|
||
```
|
||
|
||
**Properties**:
|
||
- ✓ **Discrete**: Finite nodes and edges
|
||
- ✓ **Sparse**: |E| << |V|² (typically)
|
||
- ✓ **Structured**: Communities, hierarchies, motifs
|
||
- ✓ **Relational**: Explicit connections
|
||
- ✗ **Non-Metric**: Shortest path not always meaningful
|
||
- ✗ **Heterogeneous**: Variable degree, asymmetric
|
||
|
||
**RuVector Graph (HNSW)**:
|
||
```
|
||
Hierarchical Navigable Small World:
|
||
- Layer 0: Dense graph (M = 16-64 neighbors)
|
||
- Layer 1+: Sparse graphs (long-range connections)
|
||
- Navigable: Greedy search finds approximate NN
|
||
- Small-world: Low diameter, high clustering
|
||
```
|
||
|
||
---
|
||
|
||
## 2. The Fundamental Tension
|
||
|
||
### 2.1 Embedding Paradox
|
||
|
||
**Goal 1**: Preserve graph topology in latent space
|
||
```
|
||
If (u, v) ∈ E, then ||h_u - h_v|| should be small
|
||
```
|
||
|
||
**Goal 2**: Preserve latent similarity in graph
|
||
```
|
||
If ||h_u - h_v|| is small, then u and v should be related
|
||
```
|
||
|
||
**Paradox**: These are not equivalent!
|
||
- **Graph neighbors** may be semantically different (e.g., bridge edges)
|
||
- **Latent neighbors** may not be graph-connected (e.g., same cluster, different components)
|
||
|
||
### 2.2 Information Bottleneck
|
||
|
||
```
|
||
Graph G ──encode──> Latent h ──decode──> Graph G'
|
||
(GNN) (predict edges/nodes)
|
||
```
|
||
|
||
**Bottleneck**: Fixed-dimensional h must compress all information from:
|
||
- Node features
|
||
- Local topology (ego-net)
|
||
- Global structure (communities, paths)
|
||
- Edge attributes
|
||
- Dynamic patterns
|
||
|
||
**Trade-off**:
|
||
- **High dimensions**: More expressive, but curse of dimensionality
|
||
- **Low dimensions**: Efficient, but lossy compression
|
||
|
||
---
|
||
|
||
## 3. Manifold Hypothesis for Graphs
|
||
|
||
### 3.1 Low-Dimensional Manifold
|
||
|
||
**Hypothesis**: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space
|
||
|
||
```
|
||
True data distribution: P_data(h) supported on manifold M ⊂ R^d
|
||
where dim(M) << d
|
||
```
|
||
|
||
**Implications**:
|
||
1. **Intrinsic Dimensionality**: Effective degrees of freedom much less than d
|
||
2. **Local Linearity**: Small neighborhoods approximately Euclidean
|
||
3. **Global Curvature**: Manifold may be curved (non-Euclidean)
|
||
|
||
**Evidence in RuVector**:
|
||
- HNSW assumes low intrinsic dimension for efficient search
|
||
- Multi-head attention learns multiple "views" of manifold
|
||
- Layer normalization assumes local isotropy
|
||
|
||
### 3.2 Geometric Structure of Graph Embeddings
|
||
|
||
**Question**: What geometry best represents graphs?
|
||
|
||
**Option 1: Euclidean (Current)**
|
||
```
|
||
h ∈ R^d, distance = ||h_u - h_v||_2
|
||
```
|
||
- ✓ Simple, well-understood
|
||
- ✓ Efficient operations (dot products, linear maps)
|
||
- ✗ Poor for tree-like structures
|
||
- ✗ Exponential capacity limited
|
||
|
||
**Option 2: Hyperbolic (Poincaré Ball)**
|
||
```
|
||
h ∈ B^d = {x ∈ R^d : ||x|| < 1}
|
||
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
|
||
```
|
||
- ✓ **Exponential capacity**: Volume grows exponentially with radius
|
||
- ✓ **Natural hierarchies**: Tree embeddings with low distortion
|
||
- ✓ **HNSW synergy**: Hierarchical layers naturally hyperbolic
|
||
- ✗ More complex operations
|
||
- ✗ Numerical instability near boundary
|
||
|
||
**Option 3: Mixed Curvature (Product Manifolds)**
|
||
```
|
||
h = (h_euclidean, h_hyperbolic, h_spherical)
|
||
Combine different geometries for different aspects
|
||
```
|
||
|
||
### 3.3 Curvature and Graph Structure
|
||
|
||
**Relationship**:
|
||
```
|
||
Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
|
||
Zero curvature (Euclidean) ↔ Grid-like, regular
|
||
Positive curvature (spherical) ↔ Cyclic, clustered
|
||
```
|
||
|
||
**HNSW Topology**:
|
||
- **Layer 0**: Locally grid-like (Euclidean)
|
||
- **Higher layers**: Tree-like navigation (Hyperbolic)
|
||
- **Overall**: Mixed curvature
|
||
|
||
**Implication**: Single geometry may be suboptimal; consider mixed-curvature embeddings
|
||
|
||
---
|
||
|
||
## 4. Encoding: Graph → Latent Space
|
||
|
||
### 4.1 Message Passing Framework (Current)
|
||
|
||
**Goal**: Aggregate neighborhood information into node embedding
|
||
|
||
```
|
||
From layer.rs:362-401 (RuvectorLayer.forward):
|
||
|
||
h_v^{(l+1)} = UPDATE(
|
||
h_v^{(l)},
|
||
AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
|
||
TRANSFORM(h_v^{(l)})
|
||
)
|
||
```
|
||
|
||
**Current Pipeline**:
|
||
1. **Message**: `m_u = W_msg · h_u`
|
||
2. **Attention Aggregate**: `a_v = MultiHeadAttention(h_v, {h_u})`
|
||
3. **Weighted Aggregate**: `agg_v = Σ w_uv · m_u`
|
||
4. **Combine**: `combined = a_v + agg_v`
|
||
5. **Update**: `h'_v = GRU(W_agg · combined, h_v)`
|
||
6. **Normalize**: `output = LayerNorm(Dropout(h'_v))`
|
||
|
||
**What This Encodes**:
|
||
- ✓ Local neighborhood structure (1-hop in one layer)
|
||
- ✓ Neighbor feature aggregation
|
||
- ✓ Temporal dynamics (via GRU)
|
||
- ✗ Global structure (requires stacking layers)
|
||
- ✗ Structural properties (degree, centrality, etc.)
|
||
- ✗ Edge semantics (only weights, not features)
|
||
|
||
### 4.2 Multi-Hop Information Propagation
|
||
|
||
**Challenge**: K-layer GNN sees only K-hop neighborhood
|
||
|
||
```
|
||
Receptive field after L layers: d_graph(u, v) ≤ L
|
||
```
|
||
|
||
**For HNSW**:
|
||
- Layer 0 average degree: ~50
|
||
- Layer 1 average degree: ~10
|
||
- Exponential reduction in higher layers
|
||
|
||
**Trade-off**:
|
||
- **Many layers**: Large receptive field, but over-smoothing
|
||
- **Few layers**: Localized, but miss global context
|
||
|
||
**Over-Smoothing Problem**:
|
||
```
|
||
As L → ∞, all node embeddings converge to the same value:
|
||
h_v^{(∞)} → E[h] for all v
|
||
```
|
||
|
||
**Mitigation Strategies**:
|
||
1. **Skip Connections**: `h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})`
|
||
2. **Residual GRU**: Implicit in `h_t = (1-z_t)h_{t-1} + z_t h̃_t`
|
||
3. **Jumping Knowledge**: Concatenate all layer outputs
|
||
4. **Adaptive Depth**: Learn when to stop propagating
|
||
|
||
### 4.3 Structural Features Beyond Neighborhoods
|
||
|
||
**Current limitation**: Only neighbor features, not structural properties
|
||
|
||
**Missing Encodings**:
|
||
1. **Node Degree**: `deg(v) = |N(v)|`
|
||
2. **Clustering Coefficient**: `C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2)`
|
||
3. **Centrality**: Betweenness, closeness, eigenvector
|
||
4. **Community Membership**: Detected clusters
|
||
5. **HNSW Layer**: Which layers the node appears in
|
||
|
||
**Proposed Enhancement**:
|
||
```rust
|
||
pub struct StructuralFeatures {
|
||
degree: f32,
|
||
clustering_coef: f32,
|
||
hnsw_layers: Vec<usize>, // Layers this node appears in
|
||
centrality: f32,
|
||
}
|
||
|
||
impl RuvectorLayer {
|
||
fn forward_with_structural(
|
||
&self,
|
||
node_embedding: &[f32],
|
||
neighbor_embeddings: &[Vec<f32>],
|
||
edge_weights: &[f32],
|
||
structural_features: &StructuralFeatures, // NEW
|
||
) -> Vec<f32> {
|
||
// Concatenate structural features to embedding
|
||
let augmented = [node_embedding, &structural_features.to_vec()].concat();
|
||
|
||
// Proceed with standard forward pass
|
||
// ...
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 5. Decoding: Latent Space → Graph Predictions
|
||
|
||
### 5.1 Link Prediction
|
||
|
||
**Goal**: Predict edge existence from embeddings
|
||
|
||
```
|
||
Score(u, v) = f(h_u, h_v)
|
||
P((u,v) ∈ E) = σ(Score(u, v))
|
||
```
|
||
|
||
**Scoring Functions**:
|
||
|
||
**1. Dot Product** (Current in search.rs)
|
||
```rust
|
||
score = h_u.dot(h_v)
|
||
```
|
||
- ✓ Fast O(d)
|
||
- ✗ Not invariant to scaling
|
||
|
||
**2. Cosine Similarity** (Current in search.rs)
|
||
```rust
|
||
score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
|
||
```
|
||
- ✓ Scale-invariant
|
||
- ✓ Natural for normalized embeddings
|
||
- ✗ Ignores magnitude information
|
||
|
||
**3. Distance-Based**
|
||
```rust
|
||
score = -||h_u - h_v||²
|
||
```
|
||
- ✓ Metric structure
|
||
- ✗ Negative, unbounded
|
||
|
||
**4. Bilinear**
|
||
```rust
|
||
score = h_u^T W h_v
|
||
```
|
||
- ✓ Learnable asymmetry
|
||
- ✗ O(d²) parameters
|
||
|
||
**5. MLP (Most Expressive)**
|
||
```rust
|
||
score = MLP([h_u || h_v || (h_u ⊙ h_v)])
|
||
```
|
||
- ✓ Highly expressive
|
||
- ✗ Expensive O(d²) or more
|
||
|
||
**RuVector Current**:
|
||
```rust
|
||
// From search.rs:4-18
|
||
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
|
||
let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
|
||
let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
|
||
let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
|
||
|
||
if norm_a == 0.0 || norm_b == 0.0 {
|
||
0.0
|
||
} else {
|
||
dot_product / (norm_a * norm_b)
|
||
}
|
||
}
|
||
```
|
||
|
||
### 5.2 Node Classification
|
||
|
||
**Goal**: Predict node labels from embeddings
|
||
|
||
```
|
||
From graph_neural.rs:82-98:
|
||
classify_node(h_v) → class probabilities
|
||
```
|
||
|
||
**Typical Approach**:
|
||
```
|
||
logits = W_class · h_v + b
|
||
probs = softmax(logits)
|
||
```
|
||
|
||
**Challenge**: Embedding must encode label-relevant information
|
||
|
||
### 5.3 Graph Reconstruction
|
||
|
||
**Goal**: Reconstruct adjacency matrix from embeddings
|
||
|
||
**Autoencoder Framework**:
|
||
```
|
||
Encoder: A → H (GNN)
|
||
Decoder: H → A' (pairwise scoring)
|
||
|
||
Loss: ||A - A'||² or Cross-Entropy
|
||
```
|
||
|
||
**Reconstruction Loss**:
|
||
```rust
|
||
// Proposed
|
||
pub fn graph_reconstruction_loss(
|
||
embeddings: &[Vec<f32>],
|
||
adjacency: &[(usize, usize)], // True edges
|
||
) -> f32 {
|
||
let mut loss = 0.0;
|
||
let n = embeddings.len();
|
||
|
||
// Positive edges (should have high score)
|
||
for &(i, j) in adjacency {
|
||
let score = cosine_similarity(&embeddings[i], &embeddings[j]);
|
||
loss -= (score + 1e-10).ln(); // -log(score)
|
||
}
|
||
|
||
// Negative sampling (non-edges should have low score)
|
||
for _ in 0..adjacency.len() {
|
||
let i = rand::random::<usize>() % n;
|
||
let j = rand::random::<usize>() % n;
|
||
if !adjacency.contains(&(i, j)) {
|
||
let score = cosine_similarity(&embeddings[i], &embeddings[j]);
|
||
loss -= (1.0 - score + 1e-10).ln(); // -log(1 - score)
|
||
}
|
||
}
|
||
|
||
loss / (2 * adjacency.len()) as f32
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 6. Information-Theoretic Perspective
|
||
|
||
### 6.1 Mutual Information
|
||
|
||
**Goal**: Maximize mutual information between graph structure G and embeddings H
|
||
|
||
```
|
||
max I(G; H) = H(G) - H(G|H)
|
||
= H(H) - H(H|G)
|
||
```
|
||
|
||
**Interpretation**:
|
||
- `I(G; H)` measures how much knowing H tells us about G
|
||
- Perfect encoding: `I(G; H) = H(G)` (H captures all graph info)
|
||
- Independence: `I(G; H) = 0` (H tells nothing about G)
|
||
|
||
**Challenges**:
|
||
1. **Intractability**: Computing I(G; H) is hard
|
||
2. **Continuous H**: Differential entropy unbounded
|
||
3. **Discrete G**: Entropy depends on graph size
|
||
|
||
### 6.2 Deep Graph Infomax (DGI)
|
||
|
||
**Idea**: Maximize MI between node embeddings and graph summary
|
||
|
||
```
|
||
DGI Loss:
|
||
max I(h_v; h_G)
|
||
|
||
where:
|
||
h_v: node embedding
|
||
h_G: graph-level summary (e.g., mean pooling)
|
||
```
|
||
|
||
**Implementation**:
|
||
```rust
|
||
pub fn deep_graph_infomax_loss(
|
||
node_embeddings: &[Vec<f32>],
|
||
graph_summary: &[f32], // Readout function output
|
||
negative_samples: &[Vec<f32>], // Corrupted embeddings
|
||
) -> f32 {
|
||
let mut loss = 0.0;
|
||
|
||
// Positive samples: real (node, graph) pairs
|
||
for h_v in node_embeddings {
|
||
let score = discriminator(h_v, graph_summary); // MLP or bilinear
|
||
loss -= (sigmoid(score) + 1e-10).ln();
|
||
}
|
||
|
||
// Negative samples: (corrupted node, graph) pairs
|
||
for h_neg in negative_samples {
|
||
let score = discriminator(h_neg, graph_summary);
|
||
loss -= (1.0 - sigmoid(score) + 1e-10).ln();
|
||
}
|
||
|
||
loss / (node_embeddings.len() + negative_samples.len()) as f32
|
||
}
|
||
```
|
||
|
||
**Readout Functions** (graph summary):
|
||
1. **Mean**: `h_G = (1/n) Σ_v h_v`
|
||
2. **Max**: `h_G = max_v h_v` (element-wise)
|
||
3. **Attention**: `h_G = Σ_v α_v h_v` where `α_v = softmax(MLP(h_v))`
|
||
|
||
### 6.3 Information Bottleneck Principle
|
||
|
||
**Principle**: Find minimal sufficient representation
|
||
|
||
```
|
||
min I(X; H) - β I(H; Y)
|
||
|
||
where:
|
||
X: input features
|
||
H: learned embeddings
|
||
Y: prediction target
|
||
β: trade-off parameter
|
||
```
|
||
|
||
**Graph Context**:
|
||
- `X`: Node features + neighborhood structure
|
||
- `H`: Node embeddings
|
||
- `Y`: Downstream task (link, classification)
|
||
|
||
**Goal**: Compress X into H, retaining only task-relevant information
|
||
|
||
**Implementation Strategy**:
|
||
1. **Variational Bound**: Use VAE-style reparameterization
|
||
2. **Lagrange Multiplier**: β controls compression vs. performance
|
||
3. **Regularization**: Encourage low mutual information I(X; H)
|
||
|
||
---
|
||
|
||
## 7. Contrastive Learning for Graph-Latent Alignment
|
||
|
||
### 7.1 Contrastive Objectives
|
||
|
||
**Core Idea**: Pull together related nodes in latent space, push apart unrelated nodes
|
||
|
||
**InfoNCE Loss** (Current in training.rs:362-411):
|
||
```rust
|
||
pub fn info_nce_loss(
|
||
anchor: &[f32],
|
||
positives: &[&[f32]],
|
||
negatives: &[&[f32]],
|
||
temperature: f32,
|
||
) -> f32
|
||
```
|
||
|
||
**Mathematical Form**:
|
||
```
|
||
L_InfoNCE = -log(exp(sim(h_v, h_+) / τ) / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)))
|
||
```
|
||
|
||
**What This Optimizes**:
|
||
- Positive pairs `(h_v, h_+)`: Graph neighbors, semantically similar
|
||
- Negative pairs `(h_v, h_-)`: Non-neighbors, dissimilar
|
||
- Temperature τ: Controls hardness of negatives
|
||
|
||
### 7.2 Local Contrastive Loss (Graph-Specific)
|
||
|
||
**Current Implementation** (training.rs:444-462):
|
||
```rust
|
||
pub fn local_contrastive_loss(
|
||
node_embedding: &[f32],
|
||
neighbor_embeddings: &[Vec<f32>],
|
||
non_neighbor_embeddings: &[Vec<f32>],
|
||
temperature: f32,
|
||
) -> f32
|
||
```
|
||
|
||
**Graph-Aware Sampling**:
|
||
- **Positives**: Direct graph neighbors `N(v)`
|
||
- **Negatives**: Non-neighbors (random or hard negatives)
|
||
|
||
**Variants**:
|
||
|
||
**1. K-Hop Positives**
|
||
```
|
||
Positives = {u : d_graph(v, u) ≤ K}
|
||
Encourages multi-hop proximity in latent space
|
||
```
|
||
|
||
**2. Community-Based**
|
||
```
|
||
Positives = {u : community(v) = community(u)}
|
||
Negatives = {u : community(v) ≠ community(u)}
|
||
Encourages cluster separation
|
||
```
|
||
|
||
**3. HNSW Layer-Based**
|
||
```
|
||
Positives = {u : (u,v) ∈ E_layer_k}
|
||
Different contrastive losses per HNSW layer
|
||
```
|
||
|
||
### 7.3 Hard Negative Mining
|
||
|
||
**Problem**: Random negatives are often too easy
|
||
|
||
**Solution**: Sample hard negatives (latent-close but graph-far)
|
||
|
||
```rust
|
||
fn sample_hard_negatives(
|
||
node: &[f32],
|
||
all_embeddings: &[Vec<f32>],
|
||
true_neighbors: &[usize],
|
||
k: usize,
|
||
) -> Vec<Vec<f32>> {
|
||
// 1. Compute similarities to all nodes
|
||
let similarities: Vec<(usize, f32)> = all_embeddings
|
||
.iter()
|
||
.enumerate()
|
||
.filter(|(i, _)| !true_neighbors.contains(i)) // Exclude true neighbors
|
||
.map(|(i, emb)| (i, cosine_similarity(node, emb)))
|
||
.collect();
|
||
|
||
// 2. Sort by similarity (descending)
|
||
similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
|
||
|
||
// 3. Take top-k (most similar non-neighbors = hard negatives)
|
||
similarities.iter()
|
||
.take(k)
|
||
.map(|(i, _)| all_embeddings[*i].clone())
|
||
.collect()
|
||
}
|
||
```
|
||
|
||
**Benefits**:
|
||
- Focuses learning on difficult cases
|
||
- Improves discrimination boundaries
|
||
- Speeds up convergence
|
||
|
||
---
|
||
|
||
## 8. Spectral Methods and Graph Signals
|
||
|
||
### 8.1 Graph Laplacian
|
||
|
||
**Normalized Laplacian**:
|
||
```
|
||
L = I - D^(-1/2) A D^(-1/2)
|
||
|
||
where:
|
||
A: adjacency matrix
|
||
D: degree matrix (diagonal)
|
||
```
|
||
|
||
**Eigendecomposition**:
|
||
```
|
||
L = U Λ U^T
|
||
|
||
where:
|
||
U: eigenvectors (graph Fourier basis)
|
||
Λ: eigenvalues (frequencies)
|
||
```
|
||
|
||
**Interpretation**:
|
||
- Small eigenvalues ↔ Low-frequency (smooth signals)
|
||
- Large eigenvalues ↔ High-frequency (oscillatory signals)
|
||
|
||
### 8.2 Spectral GNN Connection
|
||
|
||
**Classical GCN** (Kipf & Welling):
|
||
```
|
||
H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})
|
||
|
||
where à = A + I (self-loops)
|
||
```
|
||
|
||
**Spectral Interpretation**:
|
||
- Aggregation = Low-pass filter
|
||
- Smooths node features along graph structure
|
||
- Eigenvalues control extent of smoothing
|
||
|
||
**RuVector's Approach** (Spatial, not spectral):
|
||
- Message passing is spatial formulation
|
||
- Attention adds adaptive filtering
|
||
- GRU adds temporal component
|
||
|
||
**Missing Spectral Component**:
|
||
- No explicit frequency analysis
|
||
- Could add spectral loss to preserve frequency content
|
||
|
||
### 8.3 Spectral Loss Functions
|
||
|
||
**Goal**: Preserve graph spectral properties in embeddings
|
||
|
||
**Laplacian Eigenmaps**:
|
||
```
|
||
min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²
|
||
subject to H^T H = I
|
||
|
||
Equivalent to minimizing: Tr(H^T L H)
|
||
```
|
||
|
||
**Implementation**:
|
||
```rust
|
||
pub fn spectral_loss(
|
||
embeddings: &[Vec<f32>],
|
||
adjacency: &[(usize, usize)],
|
||
degrees: &[f32],
|
||
) -> f32 {
|
||
let mut loss = 0.0;
|
||
|
||
// Laplacian regularization: ||h_i - h_j||² for edges
|
||
for &(i, j) in adjacency {
|
||
let diff = subtract(&embeddings[i], &embeddings[j]);
|
||
let norm_sq = l2_norm_squared(&diff);
|
||
|
||
// Normalize by degrees
|
||
let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
|
||
loss += weight * norm_sq;
|
||
}
|
||
|
||
loss
|
||
}
|
||
```
|
||
|
||
**Benefits**:
|
||
- Smooth embeddings along graph structure
|
||
- Preserves community structure
|
||
- Theoretical guarantees (Laplacian eigenmaps)
|
||
|
||
---
|
||
|
||
## 9. Disentangled Representations
|
||
|
||
### 9.1 Motivation
|
||
|
||
**Problem**: Current embeddings are entangled (single vector encodes everything)
|
||
|
||
**Goal**: Separate embedding into interpretable factors
|
||
|
||
```
|
||
h_v = [h_structural || h_semantic || h_temporal]
|
||
|
||
where:
|
||
h_structural: Topology (degree, centrality, etc.)
|
||
h_semantic: Feature content
|
||
h_temporal: Dynamics (for evolving graphs)
|
||
```
|
||
|
||
### 9.2 β-VAE for Graphs
|
||
|
||
**Variational Autoencoder with Disentanglement**:
|
||
```
|
||
Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
|
||
Decoder: z_v → p(X_v, N(v) | z_v)
|
||
|
||
Loss: L_VAE = E[log p(X_v | z_v)] - β KL(q(z_v) || p(z_v))
|
||
```
|
||
|
||
**β > 1**: Encourages disentanglement (independence of latent factors)
|
||
|
||
**Implementation Sketch**:
|
||
```rust
|
||
pub struct GraphVAE {
|
||
encoder: RuvectorLayer,
|
||
mu_layer: Linear,
|
||
logvar_layer: Linear,
|
||
decoder: Linear,
|
||
}
|
||
|
||
impl GraphVAE {
|
||
fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
|
||
let h = self.encoder.forward(node_features, neighbors, &[]);
|
||
let mu = self.mu_layer.forward(&h);
|
||
let logvar = self.logvar_layer.forward(&h);
|
||
(mu, logvar)
|
||
}
|
||
|
||
fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
|
||
let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
|
||
let eps: Vec<f32> = (0..mu.len()).map(|_| rand::thread_rng().sample(StandardNormal)).collect();
|
||
|
||
mu.iter().zip(std.iter()).zip(eps.iter())
|
||
.map(|((&m, &s), &e)| m + s * e)
|
||
.collect()
|
||
}
|
||
|
||
fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
|
||
let (mu, logvar) = self.encode(node_features, neighbors);
|
||
let z = self.reparameterize(&mu, &logvar);
|
||
|
||
// Reconstruct node features
|
||
let recon = self.decoder.forward(&z);
|
||
|
||
// KL divergence
|
||
let kl: f32 = mu.iter().zip(logvar.iter())
|
||
.map(|(&m, &lv)| -0.5 * (1.0 + lv - m*m - lv.exp()))
|
||
.sum();
|
||
|
||
(recon, kl)
|
||
}
|
||
}
|
||
```
|
||
|
||
### 9.3 Disentanglement Metrics
|
||
|
||
**1. Mutual Information Gap (MIG)**
|
||
```
|
||
MIG(z, y) = (1/K) Σ_k (I(z; y_k)_largest - I(z; y_k)_2nd_largest) / H(y_k)
|
||
|
||
Measures how uniquely each latent factor captures each ground-truth factor
|
||
```
|
||
|
||
**2. SAP (Separated Attribute Predictability)**
|
||
```
|
||
Train linear classifiers z → y for each attribute
|
||
Measure how well z predicts individual factors
|
||
```
|
||
|
||
**Application to Graphs**:
|
||
- Ground-truth factors: Degree, clustering, centrality, community
|
||
- Learned latent: h_v
|
||
- Metric: MIG or SAP between h_v components and structural properties
|
||
|
||
---
|
||
|
||
## 10. Hierarchical Representations (HNSW-Specific)
|
||
|
||
### 10.1 Multi-Scale Embeddings
|
||
|
||
**Idea**: Different embeddings for different HNSW layers
|
||
|
||
```
|
||
Node v appears in layers {0, 2, 3}:
|
||
h_v^{(0)}: Dense, local structure
|
||
h_v^{(2)}: Coarse, medium-range
|
||
h_v^{(3)}: Global, long-range hubs
|
||
```
|
||
|
||
**Hierarchical Encoding**:
|
||
```rust
|
||
pub struct HierarchicalEmbedding {
|
||
embeddings_by_layer: HashMap<usize, Vec<f32>>,
|
||
}
|
||
|
||
impl HierarchicalEmbedding {
|
||
fn get_embedding(&self, layer: usize) -> &Vec<f32> {
|
||
self.embeddings_by_layer.get(&layer)
|
||
.expect("Node not in this layer")
|
||
}
|
||
|
||
// Interpolate between layers for search
|
||
fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
|
||
let layer_low = target_layer.floor() as usize;
|
||
let layer_high = target_layer.ceil() as usize;
|
||
|
||
if layer_low == layer_high {
|
||
return self.get_embedding(layer_low).clone();
|
||
}
|
||
|
||
let alpha = target_layer - layer_low as f32;
|
||
let emb_low = self.get_embedding(layer_low);
|
||
let emb_high = self.get_embedding(layer_high);
|
||
|
||
// Linear interpolation
|
||
emb_low.iter().zip(emb_high.iter())
|
||
.map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
|
||
.collect()
|
||
}
|
||
}
|
||
```
|
||
|
||
**Hierarchical Loss**:
|
||
```rust
|
||
fn hierarchical_contrastive_loss(
|
||
node_hierarchical_emb: &HierarchicalEmbedding,
|
||
neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
|
||
) -> f32 {
|
||
let mut loss = 0.0;
|
||
|
||
// Contrastive loss at each layer
|
||
for (layer, layer_neighbors) in neighbors_by_layer {
|
||
let h_v = node_hierarchical_emb.get_embedding(*layer);
|
||
let positives: Vec<&Vec<f32>> = layer_neighbors.iter()
|
||
.map(|n| n.get_embedding(*layer))
|
||
.collect();
|
||
|
||
// Sample negatives from other layers
|
||
let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);
|
||
|
||
loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
|
||
}
|
||
|
||
loss / neighbors_by_layer.len() as f32
|
||
}
|
||
```
|
||
|
||
### 10.2 Coarse-to-Fine Alignment
|
||
|
||
**Goal**: Ensure consistency across HNSW layers
|
||
|
||
```
|
||
Alignment Loss:
|
||
L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
|
||
|
||
where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)
|
||
```
|
||
|
||
**Benefits**:
|
||
- Global structure (high layers) guides local (low layers)
|
||
- Enables layer-skipping (jump from layer 3 to layer 0 embedding)
|
||
- Multi-resolution representation
|
||
|
||
---
|
||
|
||
## 11. Practical Strategies for RuVector
|
||
|
||
### 11.1 Short-Term Enhancements
|
||
|
||
**1. Structural Feature Augmentation**
|
||
```rust
|
||
// Add degree, clustering, HNSW layer info to embeddings
|
||
let augmented_embedding = [
|
||
&node_embedding[..],
|
||
&[degree as f32],
|
||
&[clustering_coef],
|
||
&one_hot_layer[..],
|
||
].concat();
|
||
```
|
||
|
||
**2. Spectral Regularization**
|
||
```rust
|
||
// Add spectral loss to training
|
||
total_loss = contrastive_loss + λ_spectral * spectral_loss
|
||
```
|
||
|
||
**3. Hard Negative Sampling**
|
||
```rust
|
||
// Replace random negatives with hard negatives in local_contrastive_loss
|
||
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
|
||
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);
|
||
```
|
||
|
||
### 11.2 Medium-Term Research
|
||
|
||
**4. Hierarchical Embeddings per HNSW Layer**
|
||
```rust
|
||
pub struct HNSWHierarchicalGNN {
|
||
gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
|
||
}
|
||
```
|
||
|
||
**5. Hyperbolic Embeddings for Higher Layers**
|
||
```rust
|
||
// Layer 0: Euclidean (local, grid-like)
|
||
// Layer 1+: Hyperbolic (hierarchical navigation)
|
||
pub enum GeometricEmbedding {
|
||
Euclidean(Vec<f32>),
|
||
Hyperbolic(Vec<f32>), // Poincaré ball
|
||
}
|
||
```
|
||
|
||
**6. Disentangled VAE**
|
||
```rust
|
||
// Separate structural vs. semantic information
|
||
pub struct DisentangledGraphVAE {
|
||
structural_encoder: RuvectorLayer,
|
||
semantic_encoder: RuvectorLayer,
|
||
decoder: Linear,
|
||
}
|
||
```
|
||
|
||
### 11.3 Long-Term Exploration
|
||
|
||
**7. Information Bottleneck Optimization**
|
||
- Minimize I(X; H) while maximizing I(H; Y)
|
||
- Variational bounds for tractability
|
||
- Beta-annealing schedule
|
||
|
||
**8. Graph Transformers**
|
||
- Replace message passing with full attention
|
||
- Positional encodings (Laplacian eigenvectors, RoPE)
|
||
- Layer-wise multi-scale attention
|
||
|
||
**9. Neural ODEs for Continuous Depth**
|
||
```
|
||
dh/dt = GNN(h(t), G)
|
||
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
|
||
```
|
||
|
||
---
|
||
|
||
## 12. Evaluation Metrics for Latent-Graph Alignment
|
||
|
||
### 12.1 Reconstruction Metrics
|
||
|
||
**1. Link Prediction AUC**
|
||
```
|
||
Measure how well latent similarity predicts edges
|
||
AUC-ROC on link prediction task
|
||
```
|
||
|
||
**2. Graph Reconstruction Error**
|
||
```
|
||
||A - σ(H H^T)||²_F
|
||
where A is adjacency, H is embeddings
|
||
```
|
||
|
||
### 12.2 Structural Preservation
|
||
|
||
**3. Rank Correlation**
|
||
```
|
||
Spearman ρ between:
|
||
- Graph distance d_G(u, v)
|
||
- Latent distance d_L(h_u, h_v)
|
||
```
|
||
|
||
**4. Distortion**
|
||
```
|
||
max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
|
||
Worst-case embedding distortion
|
||
```
|
||
|
||
**5. Average Distortion**
|
||
```
|
||
(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
|
||
```
|
||
|
||
### 12.3 Downstream Task Performance
|
||
|
||
**6. Node Classification Accuracy**
|
||
```
|
||
Train classifier on embeddings, test accuracy
|
||
```
|
||
|
||
**7. Clustering Modularity**
|
||
```
|
||
K-means on embeddings, measure graph modularity
|
||
```
|
||
|
||
**8. HNSW Search Quality**
|
||
```
|
||
Recall@K using learned embeddings vs. original features
|
||
```
|
||
|
||
---
|
||
|
||
## References
|
||
|
||
### Papers
|
||
|
||
**Manifold Learning**:
|
||
1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
|
||
2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction
|
||
|
||
**Hyperbolic Embeddings**:
|
||
3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
|
||
4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks
|
||
|
||
**Information Theory**:
|
||
5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
|
||
6. Velickovic et al. (2019) - Deep Graph Infomax
|
||
|
||
**Contrastive Learning**:
|
||
7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
|
||
8. You et al. (2020) - Graph Contrastive Learning with Augmentations
|
||
|
||
**Disentanglement**:
|
||
9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
|
||
10. Ma et al. (2019) - Disentangled Graph Convolutional Networks
|
||
|
||
### RuVector Code
|
||
- `crates/ruvector-gnn/src/layer.rs` - GNN encoding
|
||
- `crates/ruvector-gnn/src/search.rs` - Latent similarity (decoding)
|
||
- `crates/ruvector-gnn/src/training.rs` - Contrastive losses (alignment)
|
||
|
||
---
|
||
|
||
**Document Version**: 1.0
|
||
**Last Updated**: 2025-11-30
|
||
**Author**: RuVector Research Team
|