# Latent Space ↔ Graph Reality Interplay
## Executive Summary
This document explores the fundamental relationship between **high-dimensional latent space** (where embeddings live) and **graph reality** (the actual topology). This interplay is central to GNN effectiveness: we must encode graph structure into latent representations while ensuring latent similarity reflects topological proximity.
**Central Question**: How do we optimally bridge continuous, high-dimensional embedding geometry with discrete, sparse graph topology?
---
## 1. The Two Worlds
### 1.1 Latent Space Characteristics
**Definition**: Continuous, high-dimensional vector space where node embeddings reside
```
Latent Space L: R^d where d ∈ [64, 1024+]
Node embedding: h_v ∈ L
Distance metric: d(h_u, h_v) (cosine, L2, etc.)
```
**Properties**:
- **Continuous**: Smooth interpolation between points
- **Dense**: Every point is surrounded by infinitely many neighbors
- **High-Dimensional**: Curse of dimensionality, but expressive
- **Metric**: Equipped with a distance/similarity function
- **Isotropic**: May not preserve hierarchical structure
- **Euclidean Bias**: Most operations assume flat geometry
**Current RuVector Latent Space**:
```rust
// From layer.rs:337-350
RuvectorLayer {
    input_dim: usize,  // Typically 64-256
    hidden_dim: usize, // Typically 128-512
    // Embeddings live in R^{hidden_dim}
}
```
### 1.2 Graph Reality Characteristics
**Definition**: Discrete topological structure G = (V, E)
```
Graph G:
- Vertices: V = {v_1, ..., v_n}
- Edges: E ⊆ V × V
- Neighborhoods: N(v) = {u : (u,v) ∈ E}
- Topology: Small-world, scale-free, hierarchical, etc.
```
**Properties**:
- **Discrete**: Finite nodes and edges
- **Sparse**: |E| << |V|² (typically)
- **Structured**: Communities, hierarchies, motifs
- **Relational**: Explicit connections
- **Non-Metric**: Shortest path not always meaningful
- **Heterogeneous**: Variable degree, asymmetric
**RuVector Graph (HNSW)**:
```
Hierarchical Navigable Small World:
- Layer 0: Dense graph (M = 16-64 neighbors)
- Layer 1+: Sparse graphs (long-range connections)
- Navigable: Greedy search finds approximate NN
- Small-world: Low diameter, high clustering
```
---
## 2. The Fundamental Tension
### 2.1 Embedding Paradox
**Goal 1**: Preserve graph topology in latent space
```
If (u, v) ∈ E, then ||h_u - h_v|| should be small
```
**Goal 2**: Preserve latent similarity in graph
```
If ||h_u - h_v|| is small, then u and v should be related
```
**Paradox**: These are not equivalent!
- **Graph neighbors** may be semantically different (e.g., bridge edges)
- **Latent neighbors** may not be graph-connected (e.g., same cluster, different components)
### 2.2 Information Bottleneck
```
Graph G ──encode──> Latent h ──decode──> Graph G'
          (GNN)                  (predict edges/nodes)
```
**Bottleneck**: Fixed-dimensional h must compress all information from:
- Node features
- Local topology (ego-net)
- Global structure (communities, paths)
- Edge attributes
- Dynamic patterns
**Trade-off**:
- **High dimensions**: More expressive, but curse of dimensionality
- **Low dimensions**: Efficient, but lossy compression
---
## 3. Manifold Hypothesis for Graphs
### 3.1 Low-Dimensional Manifold
**Hypothesis**: Graph-structured data lies on a low-dimensional manifold embedded in high-dimensional space
```
True data distribution: P_data(h) supported on manifold M ⊂ R^d
where dim(M) << d
```
**Implications**:
1. **Intrinsic Dimensionality**: Effective degrees of freedom much less than d
2. **Local Linearity**: Small neighborhoods approximately Euclidean
3. **Global Curvature**: Manifold may be curved (non-Euclidean)
**Evidence in RuVector**:
- HNSW assumes low intrinsic dimension for efficient search
- Multi-head attention learns multiple "views" of manifold
- Layer normalization assumes local isotropy
### 3.2 Geometric Structure of Graph Embeddings
**Question**: What geometry best represents graphs?
**Option 1: Euclidean (Current)**
```
h ∈ R^d, distance = ||h_u - h_v||_2
```
- ✓ Simple, well-understood
- ✓ Efficient operations (dot products, linear maps)
- ✗ Poor for tree-like structures
- ✗ Limited capacity for exponentially growing (tree-like) neighborhoods
**Option 2: Hyperbolic (Poincaré Ball)**
```
h ∈ B^d = {x ∈ R^d : ||x|| < 1}
distance = arcosh(1 + 2||x-y||² / ((1-||x||²)(1-||y||²)))
```
- ✓ **Exponential capacity**: Volume grows exponentially with radius
- ✓ **Natural hierarchies**: Tree embeddings with low distortion
- ✓ **HNSW synergy**: Hierarchical layers naturally hyperbolic
- ✗ More complex operations
- ✗ Numerical instability near boundary
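To make the hyperbolic option concrete, here is a minimal sketch of the Poincaré-ball distance above (the function name and the clamping constant are ours; points are assumed to lie strictly inside the unit ball, and the clamp is only a crude guard against the boundary instability noted above):
```rust
/// Poincaré-ball distance: arcosh(1 + 2‖x−y‖² / ((1−‖x‖²)(1−‖y‖²))).
/// Sketch only; assumes ‖x‖, ‖y‖ < 1.
fn poincare_distance(x: &[f32], y: &[f32]) -> f32 {
    let norm_sq = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    // Clamp the denominator away from zero near the boundary
    let denom = ((1.0 - norm_sq(x)) * (1.0 - norm_sq(y))).max(1e-7);
    let arg = 1.0 + 2.0 * diff_sq / denom;
    // arcosh(z) = ln(z + sqrt(z² − 1)) for z ≥ 1
    (arg + (arg * arg - 1.0).max(0.0).sqrt()).ln()
}
```
Note that the distance is zero when x = y and grows rapidly as either point approaches the boundary, which is exactly the exponential-capacity property exploited for hierarchies.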
**Option 3: Mixed Curvature (Product Manifolds)**
```
h = (h_euclidean, h_hyperbolic, h_spherical)
Combine different geometries for different aspects
```
### 3.3 Curvature and Graph Structure
**Relationship**:
```
Negative curvature (hyperbolic) ↔ Tree-like, hierarchical
Zero curvature (Euclidean) ↔ Grid-like, regular
Positive curvature (spherical) ↔ Cyclic, clustered
```
**HNSW Topology**:
- **Layer 0**: Locally grid-like (Euclidean)
- **Higher layers**: Tree-like navigation (Hyperbolic)
- **Overall**: Mixed curvature
**Implication**: Single geometry may be suboptimal; consider mixed-curvature embeddings
---
## 4. Encoding: Graph → Latent Space
### 4.1 Message Passing Framework (Current)
**Goal**: Aggregate neighborhood information into node embedding
```
From layer.rs:362-401 (RuvectorLayer.forward):
h_v^{(l+1)} = UPDATE(
    h_v^{(l)},
    AGGREGATE({m_u^{(l)} : u ∈ N(v)}),
    TRANSFORM(h_v^{(l)})
)
```
**Current Pipeline**:
1. **Message**: `m_u = W_msg · h_u`
2. **Attention Aggregate**: `a_v = MultiHeadAttention(h_v, {h_u})`
3. **Weighted Aggregate**: `agg_v = Σ w_uv · m_u`
4. **Combine**: `combined = a_v + agg_v`
5. **Update**: `h'_v = GRU(W_agg · combined, h_v)`
6. **Normalize**: `output = LayerNorm(Dropout(h'_v))`
**What This Encodes**:
- ✓ Local neighborhood structure (1-hop in one layer)
- ✓ Neighbor feature aggregation
- ✓ Temporal dynamics (via GRU)
- ✗ Global structure (requires stacking layers)
- ✗ Structural properties (degree, centrality, etc.)
- ✗ Edge semantics (only weights, not features)
### 4.2 Multi-Hop Information Propagation
**Challenge**: K-layer GNN sees only K-hop neighborhood
```
Receptive field after L layers: d_graph(u, v) ≤ L
```
**For HNSW**:
- Layer 0 average degree: ~50
- Layer 1 average degree: ~10
- Exponential reduction in higher layers
**Trade-off**:
- **Many layers**: Large receptive field, but over-smoothing
- **Few layers**: Localized, but miss global context
**Over-Smoothing Problem**:
```
As L → ∞, all node embeddings converge to the same value:
h_v^{(∞)} → E[h] for all v
```
**Mitigation Strategies**:
1. **Skip Connections**: `h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})` (see the sketch after this list)
2. **Residual GRU**: Implicit in `h_t = (1-z_t)h_{t-1} + z_t h̃_t`
3. **Jumping Knowledge**: Concatenate all layer outputs
4. **Adaptive Depth**: Learn when to stop propagating
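A minimal sketch of strategies 1 and 3 combined (the function and the per-layer closure are hypothetical simplifications, not the layer.rs API; RuVector's actual layers also take neighbor embeddings and edge weights):
```rust
/// Hypothetical wrapper: residual skip connections plus jumping-knowledge
/// concatenation, assuming each layer maps R^d -> R^d for a single node.
fn forward_with_skips<F: Fn(&[f32]) -> Vec<f32>>(
    layers: &[F],
    input: &[f32],
) -> (Vec<f32>, Vec<f32>) {
    let mut h = input.to_vec();
    let mut jumping_knowledge = Vec::new(); // concatenation of every layer's output
    for layer in layers {
        let update = layer(&h);
        // Skip connection: h^{(l+1)} = h^{(l)} + GNN^{(l)}(h^{(l)})
        h = h.iter().zip(update.iter()).map(|(a, b)| a + b).collect();
        jumping_knowledge.extend_from_slice(&h);
    }
    (h, jumping_knowledge)
}
```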
### 4.3 Structural Features Beyond Neighborhoods
**Current limitation**: Only neighbor features, not structural properties
**Missing Encodings**:
1. **Node Degree**: `deg(v) = |N(v)|`
2. **Clustering Coefficient**: `C(v) = |{(u,w) ∈ E : u,w ∈ N(v)}| / (deg(v) choose 2)`
3. **Centrality**: Betweenness, closeness, eigenvector
4. **Community Membership**: Detected clusters
5. **HNSW Layer**: Which layers the node appears in
**Proposed Enhancement**:
```rust
pub struct StructuralFeatures {
    degree: f32,
    clustering_coef: f32,
    hnsw_layers: Vec<usize>, // Layers this node appears in
    centrality: f32,
}

impl StructuralFeatures {
    /// Flatten into a plain feature vector (layer membership encoded as a count here)
    fn to_vec(&self) -> Vec<f32> {
        vec![
            self.degree,
            self.clustering_coef,
            self.centrality,
            self.hnsw_layers.len() as f32,
        ]
    }
}

impl RuvectorLayer {
    fn forward_with_structural(
        &self,
        node_embedding: &[f32],
        neighbor_embeddings: &[Vec<f32>],
        edge_weights: &[f32],
        structural_features: &StructuralFeatures, // NEW
    ) -> Vec<f32> {
        // Concatenate structural features to the embedding
        let augmented = [node_embedding, &structural_features.to_vec()[..]].concat();
        // Proceed with the standard forward pass on the augmented input
        // ...
    }
}
```
---
## 5. Decoding: Latent Space → Graph Predictions
### 5.1 Link Prediction
**Goal**: Predict edge existence from embeddings
```
Score(u, v) = f(h_u, h_v)
P((u,v) ∈ E) = σ(Score(u, v))
```
**Scoring Functions**:
**1. Dot Product** (Current in search.rs)
```rust
score = h_u.dot(h_v)
```
- ✓ Fast O(d)
- ✗ Not invariant to scaling
**2. Cosine Similarity** (Current in search.rs)
```rust
score = h_u.dot(h_v) / (||h_u|| · ||h_v||)
```
- ✓ Scale-invariant
- ✓ Natural for normalized embeddings
- ✗ Ignores magnitude information
**3. Distance-Based**
```rust
score = -||h_u - h_v||²
```
- ✓ Metric structure
- ✗ Negative, unbounded
**4. Bilinear**
```rust
score = h_u^T W h_v
```
- ✓ Learnable asymmetry
- ✗ O(d²) parameters
**5. MLP (Most Expressive)**
```rust
score = MLP([h_u || h_v || (h_u ⊙ h_v)])   // concatenation plus element-wise product
```
- ✓ Highly expressive
- ✗ Expensive O(d²) or more
**RuVector Current**:
```rust
// From search.rs:4-18
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot_product: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = (a.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    let norm_b: f32 = (b.iter().map(|&x| (x as f64) * (x as f64)).sum::<f64>().sqrt()) as f32;
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot_product / (norm_a * norm_b)
    }
}
```
### 5.2 Node Classification
**Goal**: Predict node labels from embeddings
```
From graph_neural.rs:82-98:
classify_node(h_v) → class probabilities
```
**Typical Approach**:
```
logits = W_class · h_v + b
probs = softmax(logits)
```
**Challenge**: Embedding must encode label-relevant information
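A minimal sketch of this readout (the helper name and the row-major weight layout are ours, not the graph_neural.rs API):
```rust
/// Linear classification head: probs = softmax(W_class · h_v + b).
/// `weights` is num_classes × d, row-major; illustrative only.
fn classify_node_sketch(h_v: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    // logits = W_class · h_v + b
    let logits: Vec<f32> = weights
        .iter()
        .zip(bias.iter())
        .map(|(row, b)| row.iter().zip(h_v).map(|(w, x)| w * x).sum::<f32>() + b)
        .collect();
    // Numerically stable softmax
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```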
### 5.3 Graph Reconstruction
**Goal**: Reconstruct adjacency matrix from embeddings
**Autoencoder Framework**:
```
Encoder: A → H (GNN)
Decoder: H → A' (pairwise scoring)
Loss: ||A - A'||² or Cross-Entropy
```
**Reconstruction Loss**:
```rust
// Proposed
pub fn graph_reconstruction_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)], // True edges
) -> f32 {
    let mut loss = 0.0;
    let n = embeddings.len();
    // Positive edges (should have high score)
    for &(i, j) in adjacency {
        let score = cosine_similarity(&embeddings[i], &embeddings[j]);
        // Map cosine similarity from [-1, 1] into (0, 1) so the log is well-defined
        let p = 0.5 * (score + 1.0);
        loss -= (p + 1e-10).ln(); // -log(p)
    }
    // Negative sampling (non-edges should have low score)
    for _ in 0..adjacency.len() {
        let i = rand::random::<usize>() % n;
        let j = rand::random::<usize>() % n;
        if !adjacency.contains(&(i, j)) {
            let score = cosine_similarity(&embeddings[i], &embeddings[j]);
            let p = 0.5 * (score + 1.0);
            loss -= (1.0 - p + 1e-10).ln(); // -log(1 - p)
        }
    }
    loss / (2 * adjacency.len()) as f32
}
```
---
## 6. Information-Theoretic Perspective
### 6.1 Mutual Information
**Goal**: Maximize mutual information between graph structure G and embeddings H
```
max I(G; H) = H(G) - H(G|H)
= H(H) - H(H|G)
```
**Interpretation**:
- `I(G; H)` measures how much knowing H tells us about G
- Perfect encoding: `I(G; H) = H(G)` (H captures all graph info)
- Independence: `I(G; H) = 0` (H tells nothing about G)
**Challenges**:
1. **Intractability**: Computing I(G; H) is hard
2. **Continuous H**: Differential entropy unbounded
3. **Discrete G**: Entropy depends on graph size
### 6.2 Deep Graph Infomax (DGI)
**Idea**: Maximize MI between node embeddings and graph summary
```
DGI Loss:
max I(h_v; h_G)
where:
h_v: node embedding
h_G: graph-level summary (e.g., mean pooling)
```
**Implementation**:
```rust
pub fn deep_graph_infomax_loss(
    node_embeddings: &[Vec<f32>],
    graph_summary: &[f32],         // Readout function output
    negative_samples: &[Vec<f32>], // Corrupted embeddings
) -> f32 {
    let mut loss = 0.0;
    // Positive samples: real (node, graph) pairs
    for h_v in node_embeddings {
        // `discriminator` and `sigmoid` are assumed helpers (MLP or bilinear scorer)
        let score = discriminator(h_v, graph_summary);
        loss -= (sigmoid(score) + 1e-10).ln();
    }
    // Negative samples: (corrupted node, graph) pairs
    for h_neg in negative_samples {
        let score = discriminator(h_neg, graph_summary);
        loss -= (1.0 - sigmoid(score) + 1e-10).ln();
    }
    loss / (node_embeddings.len() + negative_samples.len()) as f32
}
```
**Readout Functions** (graph summary):
1. **Mean**: `h_G = (1/n) Σ_v h_v`
2. **Max**: `h_G = max_v h_v` (element-wise)
3. **Attention**: `h_G = Σ_v α_v h_v` where `α_v = softmax(MLP(h_v))`
### 6.3 Information Bottleneck Principle
**Principle**: Find minimal sufficient representation
```
min I(X; H) - β I(H; Y)
where:
X: input features
H: learned embeddings
Y: prediction target
β: trade-off parameter
```
**Graph Context**:
- `X`: Node features + neighborhood structure
- `H`: Node embeddings
- `Y`: Downstream task (link, classification)
**Goal**: Compress X into H, retaining only task-relevant information
**Implementation Strategy**:
1. **Variational Bound**: Use VAE-style reparameterization
2. **Lagrange Multiplier**: β controls compression vs. performance
3. **Regularization**: Encourage low mutual information I(X; H)
---
## 7. Contrastive Learning for Graph-Latent Alignment
### 7.1 Contrastive Objectives
**Core Idea**: Pull together related nodes in latent space, push apart unrelated nodes
**InfoNCE Loss** (Current in training.rs:362-411):
```rust
pub fn info_nce_loss(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32
```
**Mathematical Form**:
```
L_InfoNCE = -log(exp(sim(h_v, h_+) / τ) / (exp(sim(h_v, h_+) / τ) + Σ_{h_- ∈ N} exp(sim(h_v, h_-) / τ)))
```
**What This Optimizes**:
- Positive pairs `(h_v, h_+)`: Graph neighbors, semantically similar
- Negative pairs `(h_v, h_-)`: Non-neighbors, dissimilar
- Temperature τ: Controls hardness of negatives
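To connect the formula to code, here is a simplified sketch against the training.rs signature above, generalized to sum over multiple positives (this is our illustration, not necessarily the exact body of `info_nce_loss` in training.rs):
```rust
/// Simplified InfoNCE: -log( Σ_+ exp(sim/τ) / (Σ_+ exp(sim/τ) + Σ_- exp(sim/τ)) ).
pub fn info_nce_loss_sketch(
    anchor: &[f32],
    positives: &[&[f32]],
    negatives: &[&[f32]],
    temperature: f32,
) -> f32 {
    let sim = |a: &[f32], b: &[f32]| (cosine_similarity(a, b) / temperature).exp();
    let pos_sum: f32 = positives.iter().map(|&p| sim(anchor, p)).sum();
    let neg_sum: f32 = negatives.iter().map(|&n| sim(anchor, n)).sum();
    // -log of the positive mass relative to all similarities
    -(pos_sum / (pos_sum + neg_sum) + 1e-10).ln()
}
```
Lower temperature sharpens the distribution and makes hard negatives dominate the gradient.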
### 7.2 Local Contrastive Loss (Graph-Specific)
**Current Implementation** (training.rs:444-462):
```rust
pub fn local_contrastive_loss(
    node_embedding: &[f32],
    neighbor_embeddings: &[Vec<f32>],
    non_neighbor_embeddings: &[Vec<f32>],
    temperature: f32,
) -> f32
```
**Graph-Aware Sampling**:
- **Positives**: Direct graph neighbors `N(v)`
- **Negatives**: Non-neighbors (random or hard negatives)
**Variants**:
**1. K-Hop Positives** (BFS sketch after the variants)
```
Positives = {u : d_graph(v, u) ≤ K}
Encourages multi-hop proximity in latent space
```
**2. Community-Based**
```
Positives = {u : community(v) = community(u)}
Negatives = {u : community(v) ≠ community(u)}
Encourages cluster separation
```
**3. HNSW Layer-Based**
```
Positives = {u : (u,v) ∈ E_layer_k}
Different contrastive losses per HNSW layer
```
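A BFS sketch for the K-hop variant above (the adjacency-list representation and the helper name are assumptions for illustration):
```rust
use std::collections::{HashSet, VecDeque};

/// All nodes within K hops of `start` (excluding `start` itself), usable as the
/// positive set for the K-hop contrastive variant. Illustrative helper only.
fn k_hop_positives(adjacency: &[Vec<usize>], start: usize, k: usize) -> Vec<usize> {
    let mut visited: HashSet<usize> = HashSet::from([start]);
    let mut frontier: VecDeque<(usize, usize)> = VecDeque::from([(start, 0)]);
    let mut positives = Vec::new();
    while let Some((node, depth)) = frontier.pop_front() {
        if depth == k {
            continue; // do not expand beyond K hops
        }
        for &next in &adjacency[node] {
            if visited.insert(next) {
                positives.push(next);
                frontier.push_back((next, depth + 1));
            }
        }
    }
    positives
}
```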
### 7.3 Hard Negative Mining
**Problem**: Random negatives are often too easy
**Solution**: Sample hard negatives (latent-close but graph-far)
```rust
fn sample_hard_negatives(
    node: &[f32],
    all_embeddings: &[Vec<f32>],
    true_neighbors: &[usize],
    k: usize,
) -> Vec<Vec<f32>> {
    // 1. Compute similarities to all non-neighbor nodes
    let mut similarities: Vec<(usize, f32)> = all_embeddings
        .iter()
        .enumerate()
        .filter(|(i, _)| !true_neighbors.contains(i)) // Exclude true neighbors
        .map(|(i, emb)| (i, cosine_similarity(node, emb)))
        .collect();
    // 2. Sort by similarity (descending)
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    // 3. Take top-k (most similar non-neighbors = hard negatives)
    similarities.iter()
        .take(k)
        .map(|(i, _)| all_embeddings[*i].clone())
        .collect()
}
```
**Benefits**:
- Focuses learning on difficult cases
- Improves discrimination boundaries
- Speeds up convergence
---
## 8. Spectral Methods and Graph Signals
### 8.1 Graph Laplacian
**Normalized Laplacian**:
```
L = I - D^(-1/2) A D^(-1/2)
where:
A: adjacency matrix
D: degree matrix (diagonal)
```
**Eigendecomposition**:
```
L = U Λ U^T
where:
U: eigenvectors (graph Fourier basis)
Λ: eigenvalues (frequencies)
```
**Interpretation**:
- Small eigenvalues ↔ Low-frequency (smooth signals)
- Large eigenvalues ↔ High-frequency (oscillatory signals)
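For small graphs, a dense sketch of the normalized Laplacian above (the helper is illustrative; an HNSW-scale graph would need a sparse representation):
```rust
/// Dense normalized Laplacian L = I − D^(−1/2) A D^(−1/2).
/// Sketch for small graphs only.
fn normalized_laplacian(adjacency: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = adjacency.len();
    // Diagonal entries of D^(-1/2); isolated nodes get 0
    let d_inv_sqrt: Vec<f32> = adjacency
        .iter()
        .map(|row| {
            let deg: f32 = row.iter().sum();
            if deg > 0.0 { 1.0 / deg.sqrt() } else { 0.0 }
        })
        .collect();
    (0..n)
        .map(|i| {
            (0..n)
                .map(|j| {
                    let identity = if i == j { 1.0 } else { 0.0 };
                    identity - d_inv_sqrt[i] * adjacency[i][j] * d_inv_sqrt[j]
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}
```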
### 8.2 Spectral GNN Connection
**Classical GCN** (Kipf & Welling):
```
H^{(l+1)} = σ(D̃^(-1/2) Ã D̃^(-1/2) H^{(l)} W^{(l)})
where à = A + I (self-loops)
```
**Spectral Interpretation**:
- Aggregation = Low-pass filter
- Smooths node features along graph structure
- Eigenvalues control extent of smoothing
**RuVector's Approach** (Spatial, not spectral):
- Message passing is spatial formulation
- Attention adds adaptive filtering
- GRU adds temporal component
**Missing Spectral Component**:
- No explicit frequency analysis
- Could add spectral loss to preserve frequency content
### 8.3 Spectral Loss Functions
**Goal**: Preserve graph spectral properties in embeddings
**Laplacian Eigenmaps**:
```
min_H Σ_{(i,j) ∈ E} ||h_i - h_j||²
subject to H^T H = I
Equivalent to minimizing: Tr(H^T L H)
```
**Implementation**:
```rust
pub fn spectral_loss(
    embeddings: &[Vec<f32>],
    adjacency: &[(usize, usize)],
    degrees: &[f32],
) -> f32 {
    let mut loss = 0.0;
    // Laplacian regularization: ||h_i - h_j||² for edges
    for &(i, j) in adjacency {
        let norm_sq: f32 = embeddings[i]
            .iter()
            .zip(embeddings[j].iter())
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        // Normalize by degrees (symmetric normalization, as in the normalized Laplacian)
        let weight = 1.0 / (degrees[i].sqrt() * degrees[j].sqrt());
        loss += weight * norm_sq;
    }
    loss
}
```
**Benefits**:
- Smooth embeddings along graph structure
- Preserves community structure
- Theoretical guarantees (Laplacian eigenmaps)
---
## 9. Disentangled Representations
### 9.1 Motivation
**Problem**: Current embeddings are entangled (single vector encodes everything)
**Goal**: Separate embedding into interpretable factors
```
h_v = [h_structural || h_semantic || h_temporal]
where:
h_structural: Topology (degree, centrality, etc.)
h_semantic: Feature content
h_temporal: Dynamics (for evolving graphs)
```
### 9.2 β-VAE for Graphs
**Variational Autoencoder with Disentanglement**:
```
Encoder: (X_v, N(v)) → q(z_v | X_v, N(v))
Decoder: z_v → p(X_v, N(v) | z_v)
Loss: L_VAE = E[log p(X_v | z_v)] - β KL(q(z_v) || p(z_v))
```
**β > 1**: Encourages disentanglement (independence of latent factors)
**Implementation Sketch**:
```rust
// Sketch only: `Linear` is an assumed dense layer type with a `forward` method,
// and the reparameterization assumes the `rand` / `rand_distr` crates.
use rand::Rng;
use rand_distr::StandardNormal;

pub struct GraphVAE {
    encoder: RuvectorLayer,
    mu_layer: Linear,
    logvar_layer: Linear,
    decoder: Linear,
}

impl GraphVAE {
    fn encode(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
        let h = self.encoder.forward(node_features, neighbors, &[]);
        let mu = self.mu_layer.forward(&h);
        let logvar = self.logvar_layer.forward(&h);
        (mu, logvar)
    }

    fn reparameterize(&self, mu: &[f32], logvar: &[f32]) -> Vec<f32> {
        let std: Vec<f32> = logvar.iter().map(|&lv| (lv / 2.0).exp()).collect();
        let eps: Vec<f32> = (0..mu.len())
            .map(|_| rand::thread_rng().sample(StandardNormal))
            .collect();
        mu.iter().zip(std.iter()).zip(eps.iter())
            .map(|((&m, &s), &e)| m + s * e)
            .collect()
    }

    fn forward(&self, node_features: &[f32], neighbors: &[Vec<f32>]) -> (Vec<f32>, f32) {
        let (mu, logvar) = self.encode(node_features, neighbors);
        let z = self.reparameterize(&mu, &logvar);
        // Reconstruct node features
        let recon = self.decoder.forward(&z);
        // KL divergence to the standard normal prior
        let kl: f32 = mu.iter().zip(logvar.iter())
            .map(|(&m, &lv)| -0.5 * (1.0 + lv - m * m - lv.exp()))
            .sum();
        (recon, kl)
    }
}
```
### 9.3 Disentanglement Metrics
**1. Mutual Information Gap (MIG)**
```
MIG(z, y) = (1/K) Σ_k [ I(z_{j1}; y_k) - I(z_{j2}; y_k) ] / H(y_k)
where j1, j2 index the latent dimensions with the largest and second-largest MI with y_k

Measures how uniquely each latent factor captures each ground-truth factor
```
**2. SAP (Separated Attribute Predictability)**
```
Train linear classifiers z → y for each attribute
Measure how well z predicts individual factors
```
**Application to Graphs**:
- Ground-truth factors: Degree, clustering, centrality, community
- Learned latent: h_v
- Metric: MIG or SAP between h_v components and structural properties
---
## 10. Hierarchical Representations (HNSW-Specific)
### 10.1 Multi-Scale Embeddings
**Idea**: Different embeddings for different HNSW layers
```
Node v appears in layers {0, 2, 3}:
h_v^{(0)}: Dense, local structure
h_v^{(2)}: Coarse, medium-range
h_v^{(3)}: Global, long-range hubs
```
**Hierarchical Encoding**:
```rust
use std::collections::HashMap;

pub struct HierarchicalEmbedding {
    embeddings_by_layer: HashMap<usize, Vec<f32>>,
}

impl HierarchicalEmbedding {
    fn get_embedding(&self, layer: usize) -> &Vec<f32> {
        self.embeddings_by_layer.get(&layer)
            .expect("Node not in this layer")
    }

    // Interpolate between layers for search
    fn interpolated_embedding(&self, target_layer: f32) -> Vec<f32> {
        let layer_low = target_layer.floor() as usize;
        let layer_high = target_layer.ceil() as usize;
        if layer_low == layer_high {
            return self.get_embedding(layer_low).clone();
        }
        let alpha = target_layer - layer_low as f32;
        let emb_low = self.get_embedding(layer_low);
        let emb_high = self.get_embedding(layer_high);
        // Linear interpolation between the two adjacent layers
        emb_low.iter().zip(emb_high.iter())
            .map(|(&l, &h)| (1.0 - alpha) * l + alpha * h)
            .collect()
    }
}
```
**Hierarchical Loss**:
```rust
fn hierarchical_contrastive_loss(
    node_hierarchical_emb: &HierarchicalEmbedding,
    neighbors_by_layer: &HashMap<usize, Vec<HierarchicalEmbedding>>,
) -> f32 {
    let mut loss = 0.0;
    // Contrastive loss at each layer
    for (layer, layer_neighbors) in neighbors_by_layer {
        let h_v = node_hierarchical_emb.get_embedding(*layer);
        let positives: Vec<&[f32]> = layer_neighbors.iter()
            .map(|n| n.get_embedding(*layer).as_slice())
            .collect();
        // Sample negatives from other layers (assumed helper returning Vec<&[f32]>)
        let negatives = sample_negatives_other_layers(neighbors_by_layer, *layer);
        loss += info_nce_loss(h_v, &positives, &negatives, 0.07);
    }
    loss / neighbors_by_layer.len() as f32
}
```
### 10.2 Coarse-to-Fine Alignment
**Goal**: Ensure consistency across HNSW layers
```
Alignment Loss:
L_align = Σ_v Σ_{l < l'} ||h_v^{(l)} - Project(h_v^{(l')})||²
where Project: R^{d_high} → R^{d_low} (e.g., learned linear map)
```
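A sketch of this alignment loss for a single node (the projection closure stands in for the learned linear map; all names are ours):
```rust
/// Coarse-to-fine alignment: Σ_{l < l'} ||h^{(l)} − Project(h^{(l')})||² for one node.
fn alignment_loss(
    embeddings_by_layer: &[Vec<f32>], // index = HNSW layer, fine to coarse
    project: impl Fn(&[f32], usize, usize) -> Vec<f32>, // (h, from_layer, to_layer)
) -> f32 {
    let mut loss = 0.0;
    for l in 0..embeddings_by_layer.len() {
        for l_prime in (l + 1)..embeddings_by_layer.len() {
            // Project the coarser embedding down to the finer layer's space
            let projected = project(embeddings_by_layer[l_prime].as_slice(), l_prime, l);
            loss += embeddings_by_layer[l]
                .iter()
                .zip(projected.iter())
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>();
        }
    }
    loss
}
```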
**Benefits**:
- Global structure (high layers) guides local (low layers)
- Enables layer-skipping (jump from layer 3 to layer 0 embedding)
- Multi-resolution representation
---
## 11. Practical Strategies for RuVector
### 11.1 Short-Term Enhancements
**1. Structural Feature Augmentation**
```rust
// Add degree, clustering, HNSW layer info to embeddings
let augmented_embedding = [
    &node_embedding[..],
    &[degree as f32][..],
    &[clustering_coef][..],
    &one_hot_layer[..],
].concat();
```
**2. Spectral Regularization**
```rust
// Add spectral loss to training
let total_loss = contrastive_loss + lambda_spectral * spectral_loss;
```
**3. Hard Negative Sampling**
```rust
// Replace random negatives with hard negatives in local_contrastive_loss
let hard_negatives = sample_hard_negatives(node, all_embeddings, neighbors, k);
let loss = info_nce_loss(node, &neighbors, &hard_negatives, temperature);
```
### 11.2 Medium-Term Research
**4. Hierarchical Embeddings per HNSW Layer**
```rust
pub struct HNSWHierarchicalGNN {
    gnn_layers_by_hnsw_level: Vec<RuvectorLayer>,
}
```
**5. Hyperbolic Embeddings for Higher Layers**
```rust
// Layer 0: Euclidean (local, grid-like)
// Layer 1+: Hyperbolic (hierarchical navigation)
pub enum GeometricEmbedding {
    Euclidean(Vec<f32>),
    Hyperbolic(Vec<f32>), // Poincaré ball
}
```
**6. Disentangled VAE**
```rust
// Separate structural vs. semantic information
pub struct DisentangledGraphVAE {
    structural_encoder: RuvectorLayer,
    semantic_encoder: RuvectorLayer,
    decoder: Linear,
}
```
### 11.3 Long-Term Exploration
**7. Information Bottleneck Optimization**
- Minimize I(X; H) while maximizing I(H; Y)
- Variational bounds for tractability
- Beta-annealing schedule
**8. Graph Transformers**
- Replace message passing with full attention
- Positional encodings (Laplacian eigenvectors, RoPE)
- Layer-wise multi-scale attention
**9. Neural ODEs for Continuous Depth**
```
dh/dt = GNN(h(t), G)
h(T) = h(0) + ∫₀^T GNN(h(t), G) dt
```
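A toy fixed-step Euler sketch of this continuous-depth idea (the `dynamics` closure stands in for the GNN vector field; names are ours):
```rust
/// Fixed-step Euler integration of dh/dt = GNN(h(t), G) for a single node state.
fn integrate_euler(
    h0: &[f32],
    dynamics: impl Fn(&[f32], f32) -> Vec<f32>, // (state, t) -> dh/dt
    t_end: f32,
    steps: usize,
) -> Vec<f32> {
    let dt = t_end / steps as f32;
    let mut h = h0.to_vec();
    for step in 0..steps {
        let t = step as f32 * dt;
        let dh = dynamics(&h, t);
        // h <- h + dt * (dh/dt)
        for (hi, di) in h.iter_mut().zip(dh.iter()) {
            *hi += dt * di;
        }
    }
    h
}
```
In practice an adjoint or higher-order solver would replace the plain Euler step, but the structure is the same.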
---
## 12. Evaluation Metrics for Latent-Graph Alignment
### 12.1 Reconstruction Metrics
**1. Link Prediction AUC**
```
Measure how well latent similarity predicts edges
AUC-ROC on link prediction task
```
**2. Graph Reconstruction Error**
```
||A - σ(H H^T)||²_F
where A is adjacency, H is embeddings
```
### 12.2 Structural Preservation
**3. Rank Correlation**
```
Spearman ρ between:
- Graph distance d_G(u, v)
- Latent distance d_L(h_u, h_v)
```
**4. Distortion**
```
max_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
Worst-case embedding distortion
```
**5. Average Distortion**
```
(1/|V|²) Σ_{u,v} |d_L(h_u, h_v) - d_G(u, v)|
```
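A sketch of the average-distortion metric using L2 latent distance (names are illustrative; `graph_dist` is assumed to return precomputed shortest-path distances):
```rust
/// Average distortion over all node pairs: mean |d_L(h_u, h_v) − d_G(u, v)|.
fn average_distortion(
    embeddings: &[Vec<f32>],
    graph_dist: impl Fn(usize, usize) -> f32,
) -> f32 {
    let n = embeddings.len();
    let mut total = 0.0;
    let mut count = 0.0;
    for u in 0..n {
        for v in (u + 1)..n {
            // Euclidean latent distance
            let d_latent: f32 = embeddings[u]
                .iter()
                .zip(embeddings[v].iter())
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>()
                .sqrt();
            total += (d_latent - graph_dist(u, v)).abs();
            count += 1.0;
        }
    }
    if count > 0.0 { total / count } else { 0.0 }
}
```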
### 12.3 Downstream Task Performance
**6. Node Classification Accuracy**
```
Train classifier on embeddings, test accuracy
```
**7. Clustering Modularity**
```
K-means on embeddings, measure graph modularity
```
**8. HNSW Search Quality**
```
Recall@K using learned embeddings vs. original features
```
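A minimal Recall@K sketch (argument names are illustrative; both lists are node ids ordered by score, with the ground truth coming from exact search):
```rust
use std::collections::HashSet;

/// Recall@K: fraction of the exact top-K neighbors recovered by approximate search.
fn recall_at_k(retrieved: &[usize], ground_truth: &[usize], k: usize) -> f32 {
    let truth: HashSet<usize> = ground_truth.iter().take(k).copied().collect();
    let hits = retrieved
        .iter()
        .take(k)
        .filter(|&&id| truth.contains(&id))
        .count();
    hits as f32 / k as f32
}
```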
---
## References
### Papers
**Manifold Learning**:
1. Tenenbaum et al. (2000) - A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
2. Belkin & Niyogi (2003) - Laplacian Eigenmaps for Dimensionality Reduction
**Hyperbolic Embeddings**:
3. Nickel & Kiela (2017) - Poincaré Embeddings for Learning Hierarchical Representations
4. Chami et al. (2019) - Hyperbolic Graph Convolutional Neural Networks
**Information Theory**:
5. Tishby & Zaslavsky (2015) - Deep Learning and the Information Bottleneck Principle
6. Velickovic et al. (2019) - Deep Graph Infomax
**Contrastive Learning**:
7. Chen et al. (2020) - A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
8. You et al. (2020) - Graph Contrastive Learning with Augmentations
**Disentanglement**:
9. Higgins et al. (2017) - β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
10. Ma et al. (2019) - Disentangled Graph Convolutional Networks
### RuVector Code
- `crates/ruvector-gnn/src/layer.rs` - GNN encoding
- `crates/ruvector-gnn/src/search.rs` - Latent similarity (decoding)
- `crates/ruvector-gnn/src/training.rs` - Contrastive losses (alignment)
---
**Document Version**: 1.0
**Last Updated**: 2025-11-30
**Author**: RuVector Research Team