Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
824
vendor/ruvector/docs/research/gnn-v2/12-topology-aware-gradient-routing.md
vendored
Normal file
824
vendor/ruvector/docs/research/gnn-v2/12-topology-aware-gradient-routing.md
vendored
Normal file
@@ -0,0 +1,824 @@
|
||||
# Topology-Aware Gradient Routing (TAGR)
|
||||
|
||||
## Overview
|
||||
|
||||
### Problem Statement
|
||||
Current vector search routing relies solely on embedding similarity, ignoring the rich topological structure of the graph. This leads to:
|
||||
1. **Inefficient routing**: Missing "highway" nodes with high betweenness centrality
|
||||
2. **Local optima**: Getting trapped in dense clusters without global context
|
||||
3. **Uniform traversal**: Treating all graph regions identically despite varying structure
|
||||
4. **Poor scalability**: Not leveraging graph properties for large-scale search
|
||||
|
||||
### Proposed Solution
|
||||
Route search queries based on local graph topology metrics (degree, clustering coefficient, betweenness centrality) in addition to embedding similarity. Automatically identify:
|
||||
- **Highway nodes**: High betweenness for long-range routing
|
||||
- **Hub nodes**: High degree for local exploration
|
||||
- **Bridge nodes**: Low clustering, connecting communities
|
||||
- **Dense regions**: High clustering for specialized searches
|
||||
|
||||
### Expected Benefits
|
||||
- **40-60% reduction** in path length for long-range queries
|
||||
- **25-35% improvement** in search efficiency (fewer hops)
|
||||
- **Automatic adaptation** to graph structure (no manual tuning)
|
||||
- **Better load balancing** across graph regions
|
||||
- **Hierarchical routing**: Global highways → local hubs → targets
|
||||
|
||||
### Novelty Claim
|
||||
First integration of graph topology metrics directly into vector search routing. Unlike:
|
||||
- **Community detection**: TAGR uses local metrics, no global clustering needed
|
||||
- **Graph neural networks**: TAGR routes using topology, not learned representations
|
||||
- **Hierarchical graphs**: TAGR adapts to natural topology, no imposed hierarchy
|
||||
|
||||
TAGR creates an adaptive routing strategy that respects the graph's intrinsic structure.
|
||||
|
||||
## Technical Design
|
||||
|
||||
### Architecture Diagram
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────────┐
|
||||
│ Topology-Aware Gradient Routing │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Topology Metric Computation │ │
|
||||
│ │ │ │
|
||||
│ │ For each node i: │ │
|
||||
│ │ • Degree: deg(i) = |neighbors(i)| │ │
|
||||
│ │ • Clustering: C(i) = triangles(i) / potential_triangles │ │
|
||||
│ │ • Betweenness: B(i) = Σ(σ_st(i) / σ_st) │ │
|
||||
│ │ • PageRank: PR(i) = (1-d)/N + d·Σ(PR(j)/deg(j)) │ │
|
||||
│ └────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Node Classification by Topology │ │
|
||||
│ │ │ │
|
||||
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
|
||||
│ │ │ HIGHWAY │ │ HUB │ │ BRIDGE │ │ │
|
||||
│ │ │ │ │ │ │ │ │ │
|
||||
│ │ │ High B(i) │ │ High deg(i) │ │ Low C(i) │ │ │
|
||||
│ │ │ Low C(i) │ │ Med C(i) │ │ Med B(i) │ │ │
|
||||
│ │ │ │ │ │ │ │ │ │
|
||||
│ │ │ ●═══════● │ │ ●───● │ │ ● ● │ │ │
|
||||
│ │ │ ║ │ │ ╱│╲ │ │ │ │ │ │ │ │
|
||||
│ │ │ ║ │ │ ● │ ● │ │ │ ●─────● │ │ │
|
||||
│ │ │ ● │ │ ╲│╱ │ │ │ │ │ │
|
||||
│ │ │ │ │ ●───● │ │ │ │ │
|
||||
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Adaptive Routing Strategy │ │
|
||||
│ │ │ │
|
||||
│ │ Phase 1: Global Navigation │ │
|
||||
│ │ ┌─────────────────────────────────────┐ │ │
|
||||
│ │ │ Route via HIGHWAY nodes │ │ │
|
||||
│ │ │ Objective: minimize(distance to │ │ │
|
||||
│ │ │ target community) │ │ │
|
||||
│ │ │ Weight: 0.7·similarity + │ │ │
|
||||
│ │ │ 0.3·betweenness │ │ │
|
||||
│ │ └─────────────────────────────────────┘ │ │
|
||||
│ │ │ │ │
|
||||
│ │ ▼ │ │
|
||||
│ │ Phase 2: Local Exploration │ │
|
||||
│ │ ┌─────────────────────────────────────┐ │ │
|
||||
│ │ │ Route via HUB nodes │ │ │
|
||||
│ │ │ Objective: explore dense region │ │ │
|
||||
│ │ │ Weight: 0.8·similarity + │ │ │
|
||||
│ │ │ 0.2·degree │ │ │
|
||||
│ │ └─────────────────────────────────────┘ │ │
|
||||
│ │ │ │ │
|
||||
│ │ ▼ │ │
|
||||
│ │ Phase 3: Precision Targeting │ │
|
||||
│ │ ┌─────────────────────────────────────┐ │ │
|
||||
│ │ │ Pure similarity-based search │ │ │
|
||||
│ │ │ Weight: 1.0·similarity │ │ │
|
||||
│ │ └─────────────────────────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────┘ │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Core Data Structures
|
||||
|
||||
```rust
|
||||
/// Topology metrics for each node
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct NodeTopology {
|
||||
/// Node identifier
|
||||
pub node_id: NodeId,
|
||||
|
||||
/// Degree (number of neighbors)
|
||||
pub degree: usize,
|
||||
|
||||
/// Clustering coefficient (0.0-1.0)
|
||||
pub clustering: f32,
|
||||
|
||||
/// Betweenness centrality (normalized)
|
||||
pub betweenness: f32,
|
||||
|
||||
/// PageRank score
|
||||
pub pagerank: f32,
|
||||
|
||||
/// Closeness centrality
|
||||
pub closeness: f32,
|
||||
|
||||
/// Eigenvector centrality
|
||||
pub eigenvector: f32,
|
||||
|
||||
/// Node classification
|
||||
pub classification: NodeClass,
|
||||
}
|
||||
|
||||
/// Node classification based on topology
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub enum NodeClass {
|
||||
/// High betweenness, low clustering (long-range routing)
|
||||
Highway,
|
||||
|
||||
/// High degree, medium clustering (local exploration)
|
||||
Hub,
|
||||
|
||||
/// Low clustering, medium betweenness (community connector)
|
||||
Bridge,
|
||||
|
||||
/// High clustering (dense local region)
|
||||
Dense,
|
||||
|
||||
/// Low degree, high clustering (leaf node)
|
||||
Leaf,
|
||||
|
||||
/// Doesn't fit other categories
|
||||
Ordinary,
|
||||
}
|
||||
|
||||
/// Configuration for topology-aware routing
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct TagrConfig {
|
||||
/// Metrics to compute (performance vs. accuracy trade-off)
|
||||
pub metrics: MetricSet,
|
||||
|
||||
/// Node classification thresholds
|
||||
pub classification_thresholds: ClassificationThresholds,
|
||||
|
||||
/// Routing strategy
|
||||
pub routing_strategy: RoutingStrategy,
|
||||
|
||||
/// Update frequency for topology metrics
|
||||
pub update_interval: Duration,
|
||||
|
||||
/// Enable adaptive weight tuning
|
||||
pub adaptive_weights: bool,
|
||||
}
|
||||
|
||||
/// Which topology metrics to compute
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct MetricSet {
|
||||
pub degree: bool,
|
||||
pub clustering: bool,
|
||||
pub betweenness: bool,
|
||||
pub pagerank: bool,
|
||||
pub closeness: bool,
|
||||
pub eigenvector: bool,
|
||||
}
|
||||
|
||||
/// Thresholds for node classification
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct ClassificationThresholds {
|
||||
/// Betweenness threshold for highways (top X%)
|
||||
pub highway_betweenness_percentile: f32, // default: 0.95
|
||||
|
||||
/// Degree threshold for hubs (top X%)
|
||||
pub hub_degree_percentile: f32, // default: 0.90
|
||||
|
||||
/// Clustering threshold for dense regions
|
||||
pub dense_clustering_threshold: f32, // default: 0.7
|
||||
|
||||
/// Maximum clustering for bridges
|
||||
pub bridge_clustering_max: f32, // default: 0.3
|
||||
}
|
||||
|
||||
/// Routing strategy configuration
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum RoutingStrategy {
|
||||
/// Three-phase: highway → hub → target
|
||||
ThreePhase {
|
||||
phase1_weight: PhaseWeights,
|
||||
phase2_weight: PhaseWeights,
|
||||
phase3_weight: PhaseWeights,
|
||||
},
|
||||
|
||||
/// Adaptive: dynamically choose weights based on query progress
|
||||
Adaptive {
|
||||
initial_weights: PhaseWeights,
|
||||
adaptation_rate: f32,
|
||||
},
|
||||
|
||||
/// Custom strategy
|
||||
Custom(fn(&SearchState) -> PhaseWeights),
|
||||
}
|
||||
|
||||
/// Weights for combining similarity and topology
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct PhaseWeights {
|
||||
pub similarity: f32,
|
||||
pub degree: f32,
|
||||
pub clustering: f32,
|
||||
pub betweenness: f32,
|
||||
pub pagerank: f32,
|
||||
}
|
||||
|
||||
/// Current search state for adaptive routing
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct SearchState {
|
||||
/// Nodes visited so far
|
||||
pub visited: Vec<NodeId>,
|
||||
|
||||
/// Current position
|
||||
pub current: NodeId,
|
||||
|
||||
/// Best similarity seen so far
|
||||
pub best_similarity: f32,
|
||||
|
||||
/// Number of hops taken
|
||||
pub hops: usize,
|
||||
|
||||
/// Estimated distance to target (embedding space)
|
||||
pub estimated_distance: f32,
|
||||
}
|
||||
|
||||
/// Topology-aware router
|
||||
pub struct TopologyRouter {
|
||||
/// Topology metrics for all nodes
|
||||
metrics: Vec<NodeTopology>,
|
||||
|
||||
/// Fast lookup by node class
|
||||
class_index: HashMap<NodeClass, Vec<NodeId>>,
|
||||
|
||||
/// Configuration
|
||||
config: TagrConfig,
|
||||
|
||||
/// Cached routing decisions
|
||||
routing_cache: LruCache<(NodeId, NodeId), Vec<NodeId>>,
|
||||
}
|
||||
```
|
||||
|
||||
### Key Algorithms
|
||||
|
||||
```rust
|
||||
// Pseudocode for topology-aware routing
|
||||
|
||||
/// Compute topology metrics for graph
|
||||
fn compute_topology_metrics(graph: &HnswGraph) -> Vec<NodeTopology> {
|
||||
let n = graph.node_count();
|
||||
let mut metrics = vec![NodeTopology::default(); n];
|
||||
|
||||
// Phase 1: Local metrics (degree, clustering)
|
||||
for node in 0..n {
|
||||
let neighbors = graph.get_neighbors(node, layer=0);
|
||||
metrics[node].degree = neighbors.len();
|
||||
|
||||
// Clustering coefficient: fraction of neighbor pairs connected
|
||||
let mut triangles = 0;
|
||||
let mut possible = 0;
|
||||
|
||||
for i in 0..neighbors.len() {
|
||||
for j in (i+1)..neighbors.len() {
|
||||
possible += 1;
|
||||
if graph.has_edge(neighbors[i], neighbors[j]) {
|
||||
triangles += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
metrics[node].clustering = if possible > 0 {
|
||||
triangles as f32 / possible as f32
|
||||
} else {
|
||||
0.0
|
||||
};
|
||||
}
|
||||
|
||||
// Phase 2: Global metrics (betweenness, PageRank)
|
||||
// Betweenness: fraction of shortest paths passing through node
|
||||
metrics = compute_betweenness(graph, metrics);
|
||||
|
||||
// PageRank: iterative link analysis
|
||||
metrics = compute_pagerank(graph, metrics);
|
||||
|
||||
// Phase 3: Classify nodes
|
||||
for i in 0..n {
|
||||
metrics[i].classification = classify_node(&metrics[i], &metrics);
|
||||
}
|
||||
|
||||
metrics
|
||||
}
|
||||
|
||||
/// Betweenness centrality using Brandes' algorithm
|
||||
fn compute_betweenness(
|
||||
graph: &HnswGraph,
|
||||
mut metrics: Vec<NodeTopology>
|
||||
) -> Vec<NodeTopology> {
|
||||
let n = graph.node_count();
|
||||
let mut betweenness = vec![0.0; n];
|
||||
|
||||
// For each source node
|
||||
for s in 0..n {
|
||||
let mut stack = Vec::new();
|
||||
let mut paths = vec![Vec::new(); n];
|
||||
let mut sigma = vec![0.0; n];
|
||||
sigma[s] = 1.0;
|
||||
let mut dist = vec![-1; n];
|
||||
dist[s] = 0;
|
||||
|
||||
// BFS from s
|
||||
let mut queue = VecDeque::new();
|
||||
queue.push_back(s);
|
||||
|
||||
while let Some(v) = queue.pop_front() {
|
||||
stack.push(v);
|
||||
|
||||
for w in graph.get_neighbors(v, layer=0) {
|
||||
// First visit to w?
|
||||
if dist[w] < 0 {
|
||||
dist[w] = dist[v] + 1;
|
||||
queue.push_back(w);
|
||||
}
|
||||
|
||||
// Shortest path to w via v?
|
||||
if dist[w] == dist[v] + 1 {
|
||||
sigma[w] += sigma[v];
|
||||
paths[w].push(v);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Accumulate betweenness
|
||||
let mut delta = vec![0.0; n];
|
||||
while let Some(w) = stack.pop() {
|
||||
for v in &paths[w] {
|
||||
delta[*v] += (sigma[*v] / sigma[w]) * (1.0 + delta[w]);
|
||||
}
|
||||
if w != s {
|
||||
betweenness[w] += delta[w];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Normalize
|
||||
let max_betweenness = betweenness.iter().cloned().fold(0.0, f32::max);
|
||||
for i in 0..n {
|
||||
metrics[i].betweenness = betweenness[i] / max_betweenness;
|
||||
}
|
||||
|
||||
metrics
|
||||
}
|
||||
|
||||
/// Classify node based on topology metrics
|
||||
fn classify_node(
|
||||
node: &NodeTopology,
|
||||
all_metrics: &[NodeTopology]
|
||||
) -> NodeClass {
|
||||
// Compute percentiles
|
||||
let betweenness_percentile = compute_percentile(
|
||||
all_metrics.iter().map(|m| m.betweenness),
|
||||
node.betweenness
|
||||
);
|
||||
|
||||
let degree_percentile = compute_percentile(
|
||||
all_metrics.iter().map(|m| m.degree as f32),
|
||||
node.degree as f32
|
||||
);
|
||||
|
||||
// Classification logic
|
||||
if betweenness_percentile > 0.95 && node.clustering < 0.3 {
|
||||
NodeClass::Highway
|
||||
} else if degree_percentile > 0.90 && node.clustering > 0.4 {
|
||||
NodeClass::Hub
|
||||
} else if node.clustering < 0.3 && betweenness_percentile > 0.7 {
|
||||
NodeClass::Bridge
|
||||
} else if node.clustering > 0.7 {
|
||||
NodeClass::Dense
|
||||
} else if node.degree < 5 && node.clustering > 0.6 {
|
||||
NodeClass::Leaf
|
||||
} else {
|
||||
NodeClass::Ordinary
|
||||
}
|
||||
}
|
||||
|
||||
/// Topology-aware search with three-phase routing
|
||||
fn tagr_search(
|
||||
query: &[f32],
|
||||
graph: &HnswGraph,
|
||||
router: &TopologyRouter,
|
||||
k: usize
|
||||
) -> Vec<SearchResult> {
|
||||
let mut current = graph.entry_point;
|
||||
let mut visited = HashSet::new();
|
||||
let mut best_similarity = -1.0;
|
||||
let mut hops = 0;
|
||||
|
||||
let state = SearchState {
|
||||
visited: Vec::new(),
|
||||
current,
|
||||
best_similarity,
|
||||
hops,
|
||||
estimated_distance: f32::MAX,
|
||||
};
|
||||
|
||||
// Phase 1: Global navigation via highways
|
||||
while in_phase_1(&state) {
|
||||
let neighbors = graph.get_neighbors(current, layer=0);
|
||||
let mut best_neighbor = None;
|
||||
let mut best_score = f32::MIN;
|
||||
|
||||
for neighbor in neighbors {
|
||||
if visited.contains(&neighbor) { continue; }
|
||||
|
||||
let topo = &router.metrics[neighbor];
|
||||
let embedding = graph.get_embedding(neighbor);
|
||||
let similarity = cosine_similarity(query, embedding);
|
||||
|
||||
// Phase 1 weights: favor highways
|
||||
let score = 0.6 * similarity + 0.4 * topo.betweenness;
|
||||
|
||||
if score > best_score {
|
||||
best_score = score;
|
||||
best_neighbor = Some(neighbor);
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(next) = best_neighbor {
|
||||
current = next;
|
||||
visited.insert(current);
|
||||
hops += 1;
|
||||
|
||||
let similarity = cosine_similarity(
|
||||
query,
|
||||
graph.get_embedding(current)
|
||||
);
|
||||
best_similarity = best_similarity.max(similarity);
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Phase 2: Local exploration via hubs
|
||||
while in_phase_2(&state) {
|
||||
let neighbors = graph.get_neighbors(current, layer=0);
|
||||
let mut best_neighbor = None;
|
||||
let mut best_score = f32::MIN;
|
||||
|
||||
for neighbor in neighbors {
|
||||
if visited.contains(&neighbor) { continue; }
|
||||
|
||||
let topo = &router.metrics[neighbor];
|
||||
let embedding = graph.get_embedding(neighbor);
|
||||
let similarity = cosine_similarity(query, embedding);
|
||||
|
||||
// Phase 2 weights: favor hubs and similarity
|
||||
let degree_score = topo.degree as f32 / graph.max_degree() as f32;
|
||||
let score = 0.8 * similarity + 0.2 * degree_score;
|
||||
|
||||
if score > best_score {
|
||||
best_score = score;
|
||||
best_neighbor = Some(neighbor);
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(next) = best_neighbor {
|
||||
current = next;
|
||||
visited.insert(current);
|
||||
hops += 1;
|
||||
|
||||
let similarity = cosine_similarity(
|
||||
query,
|
||||
graph.get_embedding(current)
|
||||
);
|
||||
best_similarity = best_similarity.max(similarity);
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Phase 3: Pure similarity search
|
||||
standard_greedy_search(query, graph, current, k, visited)
|
||||
}
|
||||
|
||||
/// Adaptive weight tuning based on search progress
|
||||
fn adaptive_routing(
|
||||
state: &SearchState,
|
||||
router: &TopologyRouter
|
||||
) -> PhaseWeights {
|
||||
let progress = estimate_progress(state);
|
||||
|
||||
// Early (global navigation): emphasize topology
|
||||
// Middle (local exploration): balanced
|
||||
// Late (precision targeting): emphasize similarity
|
||||
|
||||
let topology_weight = (1.0 - progress) * 0.5;
|
||||
let similarity_weight = 0.5 + progress * 0.5;
|
||||
|
||||
PhaseWeights {
|
||||
similarity: similarity_weight,
|
||||
degree: topology_weight * 0.3,
|
||||
clustering: topology_weight * 0.2,
|
||||
betweenness: topology_weight * 0.4,
|
||||
pagerank: topology_weight * 0.1,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### API Design
|
||||
|
||||
```rust
|
||||
/// Public API for Topology-Aware Gradient Routing
|
||||
pub trait TopologyAwareRouting {
|
||||
/// Create topology router for graph
|
||||
fn new(graph: &HnswGraph, config: TagrConfig) -> Self;
|
||||
|
||||
/// Search with topology-aware routing
|
||||
fn search(
|
||||
&self,
|
||||
query: &[f32],
|
||||
k: usize,
|
||||
options: TagrSearchOptions,
|
||||
) -> Result<Vec<SearchResult>, TagrError>;
|
||||
|
||||
/// Get topology metrics for node
|
||||
fn get_metrics(&self, node_id: NodeId) -> &NodeTopology;
|
||||
|
||||
/// Find nearest highway nodes
|
||||
fn find_highways(&self, point: &[f32], k: usize) -> Vec<NodeId>;
|
||||
|
||||
/// Find hubs in region
|
||||
fn find_hubs(&self, center: &[f32], radius: f32) -> Vec<NodeId>;
|
||||
|
||||
/// Get nodes by classification
|
||||
fn get_by_class(&self, class: NodeClass) -> &[NodeId];
|
||||
|
||||
/// Update topology metrics (incremental)
|
||||
fn update_metrics(&mut self, changed_nodes: &[NodeId]) -> Result<(), TagrError>;
|
||||
|
||||
/// Recompute all metrics (full update)
|
||||
fn recompute_metrics(&mut self) -> Result<(), TagrError>;
|
||||
|
||||
/// Export topology visualization
|
||||
fn export_topology(&self) -> TopologyVisualization;
|
||||
|
||||
/// Get routing statistics
|
||||
fn statistics(&self) -> RoutingStatistics;
|
||||
}
|
||||
|
||||
/// Search options for TAGR
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct TagrSearchOptions {
|
||||
/// Routing strategy override
|
||||
pub strategy: Option<RoutingStrategy>,
|
||||
|
||||
/// Prefer specific node classes
|
||||
pub prefer_classes: Vec<NodeClass>,
|
||||
|
||||
/// Avoid specific node classes
|
||||
pub avoid_classes: Vec<NodeClass>,
|
||||
|
||||
/// Enable path recording
|
||||
pub record_path: bool,
|
||||
|
||||
/// Maximum hops
|
||||
pub max_hops: usize,
|
||||
}
|
||||
|
||||
/// Routing statistics
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct RoutingStatistics {
|
||||
/// Total searches performed
|
||||
pub total_searches: usize,
|
||||
|
||||
/// Average path length
|
||||
pub avg_path_length: f32,
|
||||
|
||||
/// Highway usage rate
|
||||
pub highway_usage: f32,
|
||||
|
||||
/// Hub usage rate
|
||||
pub hub_usage: f32,
|
||||
|
||||
/// Average hops by phase
|
||||
pub hops_by_phase: [f32; 3],
|
||||
|
||||
/// Node class distribution
|
||||
pub class_distribution: HashMap<NodeClass, usize>,
|
||||
}
|
||||
|
||||
/// Topology visualization export
|
||||
#[derive(Clone, Debug, Serialize)]
|
||||
pub struct TopologyVisualization {
|
||||
pub nodes: Vec<TopoNode>,
|
||||
pub highways: Vec<NodeId>,
|
||||
pub hubs: Vec<NodeId>,
|
||||
pub bridges: Vec<NodeId>,
|
||||
pub metrics_summary: MetricsSummary,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, Serialize)]
|
||||
pub struct TopoNode {
|
||||
pub id: NodeId,
|
||||
pub class: NodeClass,
|
||||
pub degree: usize,
|
||||
pub betweenness: f32,
|
||||
pub clustering: f32,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, Serialize)]
|
||||
pub struct MetricsSummary {
|
||||
pub total_nodes: usize,
|
||||
pub avg_degree: f32,
|
||||
pub avg_clustering: f32,
|
||||
pub max_betweenness: f32,
|
||||
}
|
||||
```
|
||||
|
||||
## Integration Points
|
||||
|
||||
### Affected Crates/Modules
|
||||
|
||||
1. **`crates/ruvector-core/src/hnsw/`**
|
||||
- Add topology metadata to nodes
|
||||
- Modify routing to use topology metrics
|
||||
- Extend search API for topology options
|
||||
|
||||
2. **`crates/ruvector-gnn/src/routing/`**
|
||||
- Create new routing module
|
||||
- Integrate with existing GNN layers
|
||||
|
||||
3. **`crates/ruvector-core/src/metrics/`**
|
||||
- Implement graph centrality algorithms
|
||||
- Add metric computation utilities
|
||||
|
||||
### New Modules to Create
|
||||
|
||||
1. **`crates/ruvector-gnn/src/topology/`**
|
||||
- `metrics.rs` - Topology metric computation
|
||||
- `classification.rs` - Node classification
|
||||
- `router.rs` - Topology-aware routing
|
||||
- `adaptive.rs` - Adaptive weight tuning
|
||||
- `cache.rs` - Metric caching and updates
|
||||
|
||||
2. **`crates/ruvector-core/src/graph/`**
|
||||
- `centrality.rs` - Centrality algorithms (betweenness, PageRank)
|
||||
- `clustering.rs` - Clustering coefficient
|
||||
- `analysis.rs` - Graph analysis utilities
|
||||
|
||||
### Dependencies on Other Features
|
||||
|
||||
- **Feature 10 (Gravitational Fields)**: Combine topology routing with gravitational pull
|
||||
- **Feature 11 (Causal Networks)**: Adapt topology metrics for DAGs
|
||||
- **Feature 13 (Crystallization)**: Use topology to identify hierarchy levels
|
||||
|
||||
## Regression Prevention
|
||||
|
||||
### Existing Functionality at Risk
|
||||
|
||||
1. **Search Performance**
|
||||
- Risk: Topology computation overhead
|
||||
- Prevention: Incremental updates, caching, optional feature
|
||||
|
||||
2. **Search Quality**
|
||||
- Risk: Poor topology routing on certain graph structures
|
||||
- Prevention: Adaptive fallback to pure similarity
|
||||
|
||||
3. **Memory Usage**
|
||||
- Risk: Storing topology metrics per node
|
||||
- Prevention: Lazy computation, sparse storage
|
||||
|
||||
### Test Cases
|
||||
|
||||
```rust
|
||||
#[cfg(test)]
|
||||
mod regression_tests {
|
||||
/// Verify highways reduce path length
|
||||
#[test]
|
||||
fn test_highway_routing_efficiency() {
|
||||
let graph = create_scale_free_graph(10000);
|
||||
let router = TopologyRouter::new(&graph, TagrConfig::default());
|
||||
|
||||
let query = random_vector(128);
|
||||
|
||||
// Standard search
|
||||
let (standard_results, standard_path) = graph.search_with_path(&query, 10);
|
||||
|
||||
// TAGR search
|
||||
let (tagr_results, tagr_path) = router.search_with_path(&query, 10);
|
||||
|
||||
// TAGR should take fewer hops
|
||||
assert!(tagr_path.len() < standard_path.len());
|
||||
|
||||
// But maintain similar quality
|
||||
let standard_recall = compute_recall(&standard_results, &ground_truth);
|
||||
let tagr_recall = compute_recall(&tagr_results, &ground_truth);
|
||||
assert!((tagr_recall - standard_recall).abs() < 0.05);
|
||||
}
|
||||
|
||||
/// Verify correct node classification
|
||||
#[test]
|
||||
fn test_node_classification() {
|
||||
let graph = create_test_graph_with_known_structure();
|
||||
let router = TopologyRouter::new(&graph, TagrConfig::default());
|
||||
|
||||
// Verify known highways
|
||||
let highways = router.get_by_class(NodeClass::Highway);
|
||||
assert!(highways.contains(&known_highway_node));
|
||||
|
||||
// Verify known hubs
|
||||
let hubs = router.get_by_class(NodeClass::Hub);
|
||||
assert!(hubs.contains(&known_hub_node));
|
||||
}
|
||||
|
||||
/// Incremental metric updates
|
||||
#[test]
|
||||
fn test_incremental_updates() {
|
||||
let mut graph = create_test_graph(1000);
|
||||
let mut router = TopologyRouter::new(&graph, TagrConfig::default());
|
||||
|
||||
let original_metrics = router.get_metrics(0).clone();
|
||||
|
||||
// Add edges
|
||||
graph.add_edge(0, 500);
|
||||
graph.add_edge(0, 501);
|
||||
|
||||
// Incremental update
|
||||
router.update_metrics(&[0, 500, 501]).unwrap();
|
||||
|
||||
let updated_metrics = router.get_metrics(0);
|
||||
|
||||
// Degree should increase
|
||||
assert!(updated_metrics.degree > original_metrics.degree);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Research Validation (2 weeks)
|
||||
- Implement basic topology metrics (degree, clustering)
|
||||
- Test on synthetic graphs with known structure
|
||||
- Measure routing efficiency improvements
|
||||
- **Deliverable**: Research report with benchmarks
|
||||
|
||||
### Phase 2: Core Implementation (3 weeks)
|
||||
- Implement all centrality metrics (betweenness, PageRank)
|
||||
- Develop node classification
|
||||
- Build three-phase routing
|
||||
- Add caching and optimization
|
||||
- **Deliverable**: Working TAGR module
|
||||
|
||||
### Phase 3: Integration (2 weeks)
|
||||
- Integrate with HNSW search
|
||||
- Add adaptive weight tuning
|
||||
- Create API bindings
|
||||
- Write integration tests
|
||||
- **Deliverable**: Integrated TAGR feature
|
||||
|
||||
### Phase 4: Optimization (2 weeks)
|
||||
- Profile and optimize metric computation
|
||||
- Implement incremental updates
|
||||
- Add visualization tools
|
||||
- Write documentation
|
||||
- **Deliverable**: Production-ready feature
|
||||
|
||||
## Success Metrics
|
||||
|
||||
### Performance Benchmarks
|
||||
|
||||
| Metric | Baseline | Target | Dataset |
|
||||
|--------|----------|--------|---------|
|
||||
| Path length reduction | 0% | >40% | Scale-free graph, 1M nodes |
|
||||
| Search hops | 15.2 | <10.0 | Wikipedia embeddings |
|
||||
| Metric computation time | N/A | <5s | Per 100K nodes |
|
||||
| Memory overhead | 0MB | <200MB | Per 1M nodes |
|
||||
|
||||
### Accuracy Metrics
|
||||
|
||||
1. **Highway Identification**: Correlation with true betweenness
|
||||
- Target: Spearman correlation >0.85
|
||||
|
||||
2. **Routing Efficiency**: Hops saved vs. baseline
|
||||
- Target: >30% reduction for long-range queries
|
||||
|
||||
3. **Search Quality**: Recall maintained
|
||||
- Target: Recall degradation <5%
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|------------|
|
||||
| Expensive betweenness computation | Approximate algorithms, sampling |
|
||||
| Poor generalization | Test on diverse graph types |
|
||||
| Classification instability | Regularization, threshold tuning |
|
||||
| Metric staleness | Incremental updates, change detection |
|
||||
|
||||
## References
|
||||
|
||||
- Brandes' betweenness algorithm
|
||||
- PageRank and graph centrality
|
||||
- Small-world and scale-free networks
|
||||
- Graph-based routing in P2P networks
|
||||
Reference in New Issue
Block a user