Topology-Aware Gradient Routing (TAGR)
Overview
Problem Statement
Current vector search routing relies solely on embedding similarity, ignoring the rich topological structure of the graph. This leads to:
- Inefficient routing: Missing "highway" nodes with high betweenness centrality
- Local optima: Getting trapped in dense clusters without global context
- Uniform traversal: Treating all graph regions identically despite varying structure
- Poor scalability: Not leveraging graph properties for large-scale search
Proposed Solution
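Route search queries based on per-node graph topology metrics (degree, clustering coefficient, betweenness centrality) in addition to embedding similarity. Degree and clustering are computed from a node's immediate neighborhood; betweenness requires a pass over the whole graph. From these metrics, automatically identify: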
- Highway nodes: High betweenness for long-range routing
- Hub nodes: High degree for local exploration
- Bridge nodes: Low clustering, connecting communities
- Dense regions: High clustering for specialized searches
Expected Benefits
- 40-60% reduction in path length for long-range queries
- 25-35% improvement in search efficiency (fewer hops)
- Automatic adaptation to graph structure (no manual tuning)
- Better load balancing across graph regions
- Hierarchical routing: Global highways → local hubs → targets
Novelty Claim
To our knowledge, TAGR is the first integration of graph topology metrics directly into vector search routing. It differs from related approaches as follows:
- Community detection: TAGR uses per-node metrics, no global clustering needed
- Graph neural networks: TAGR routes using topology, not learned representations
- Hierarchical graphs: TAGR adapts to natural topology, no imposed hierarchy
TAGR creates an adaptive routing strategy that respects the graph's intrinsic structure.
Technical Design
Architecture Diagram
┌────────────────────────────────────────────────────────────────────┐
│ Topology-Aware Gradient Routing │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Topology Metric Computation │ │
│ │ │ │
│ │ For each node i: │ │
│ │ • Degree: deg(i) = |neighbors(i)| │ │
│ │ • Clustering: C(i) = triangles(i) / potential_triangles │ │
│ │ • Betweenness: B(i) = Σ(σ_st(i) / σ_st) │ │
│ │ • PageRank: PR(i) = (1-d)/N + d·Σ(PR(j)/deg(j)) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Node Classification by Topology │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ HIGHWAY │ │ HUB │ │ BRIDGE │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ High B(i) │ │ High deg(i) │ │ Low C(i) │ │ │
│ │ │ Low C(i) │ │ Med C(i) │ │ Med B(i) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ ●═══════● │ │ ●───● │ │ ● ● │ │ │
│ │ │ ║ │ │ ╱│╲ │ │ │ │ │ │ │ │
│ │ │ ║ │ │ ● │ ● │ │ │ ●─────● │ │ │
│ │ │ ● │ │ ╲│╱ │ │ │ │ │ │
│ │ │ │ │ ●───● │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Adaptive Routing Strategy │ │
│ │ │ │
│ │ Phase 1: Global Navigation │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Route via HIGHWAY nodes │ │ │
│ │ │ Objective: minimize(distance to │ │ │
│ │ │ target community) │ │ │
│ │ │ Weight: 0.7·similarity + │ │ │
│ │ │ 0.3·betweenness │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Phase 2: Local Exploration │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Route via HUB nodes │ │ │
│ │ │ Objective: explore dense region │ │ │
│ │ │ Weight: 0.8·similarity + │ │ │
│ │ │ 0.2·degree │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Phase 3: Precision Targeting │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ Pure similarity-based search │ │ │
│ │ │ Weight: 1.0·similarity │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
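The PageRank line in the diagram, PR(i) = (1-d)/N + d·Σ(PR(j)/deg(j)), can be evaluated by power iteration. The following is a minimal sketch under stated assumptions: the graph is given as plain adjacency lists (`&[Vec<usize>]`) rather than the `HnswGraph` type used later, the function name `pagerank` is illustrative, and rank held by nodes without outgoing edges is spread uniformly, which the diagram does not specify.

```rust
/// Power-iteration PageRank over adjacency lists.
/// `adj[j]` lists the nodes that j links to; `damping` is the d in the formula.
/// Assumption: rank held by nodes with no outgoing edges is spread uniformly.
pub fn pagerank(adj: &[Vec<usize>], damping: f32, iterations: usize) -> Vec<f32> {
    let n = adj.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f32; n];
    for _ in 0..iterations {
        // Total rank currently held by dangling nodes.
        let dangling: f32 = adj
            .iter()
            .zip(&rank)
            .filter(|(out, _)| out.is_empty())
            .map(|(_, r)| *r)
            .sum();
        // (1 - d) / N teleport term plus the uniformly spread dangling mass.
        let base = (1.0 - damping) / n as f32 + damping * dangling / n as f32;
        let mut next = vec![base; n];
        for (j, out) in adj.iter().enumerate() {
            if out.is_empty() {
                continue;
            }
            // Each neighbor i of j receives d * PR(j) / deg(j).
            let share = damping * rank[j] / out.len() as f32;
            for &i in out {
                next[i] += share;
            }
        }
        rank = next;
    }
    rank
}
```

On a star graph the center collects rank from every leaf, so it should end up with the highest score while the leaves tie. The scores sum to 1, so they would need rescaling before being mixed with similarity in a routing weight.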
Core Data Structures
/// Topology metrics for each node
/// `Default` is derived because `compute_topology_metrics` starts from `NodeTopology::default()`.
#[derive(Clone, Debug, Default)]
pub struct NodeTopology {
    /// Node identifier
    pub node_id: NodeId,
    /// Degree (number of neighbors)
    pub degree: usize,
    /// Clustering coefficient (0.0-1.0)
    pub clustering: f32,
    /// Betweenness centrality (normalized)
    pub betweenness: f32,
    /// PageRank score
    pub pagerank: f32,
    /// Closeness centrality
    pub closeness: f32,
    /// Eigenvector centrality
    pub eigenvector: f32,
    /// Node classification
    pub classification: NodeClass,
}
/// Node classification based on topology
/// `Hash` is needed because `NodeClass` is used as a `HashMap` key, and
/// `Serialize` because `TopoNode` embeds it.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Hash, Serialize)]
pub enum NodeClass {
    /// High betweenness, low clustering (long-range routing)
    Highway,
    /// High degree, medium clustering (local exploration)
    Hub,
    /// Low clustering, medium betweenness (community connector)
    Bridge,
    /// High clustering (dense local region)
    Dense,
    /// Low degree, high clustering (leaf node)
    Leaf,
    /// Doesn't fit other categories
    #[default]
    Ordinary,
}
/// Configuration for topology-aware routing
#[derive(Clone, Debug)]
pub struct TagrConfig {
/// Metrics to compute (performance vs. accuracy trade-off)
pub metrics: MetricSet,
/// Node classification thresholds
pub classification_thresholds: ClassificationThresholds,
/// Routing strategy
pub routing_strategy: RoutingStrategy,
/// Update frequency for topology metrics
pub update_interval: Duration,
/// Enable adaptive weight tuning
pub adaptive_weights: bool,
}
/// Which topology metrics to compute
#[derive(Clone, Debug)]
pub struct MetricSet {
pub degree: bool,
pub clustering: bool,
pub betweenness: bool,
pub pagerank: bool,
pub closeness: bool,
pub eigenvector: bool,
}
/// Thresholds for node classification
#[derive(Clone, Debug)]
pub struct ClassificationThresholds {
/// Betweenness threshold for highways (top X%)
pub highway_betweenness_percentile: f32, // default: 0.95
/// Degree threshold for hubs (top X%)
pub hub_degree_percentile: f32, // default: 0.90
/// Clustering threshold for dense regions
pub dense_clustering_threshold: f32, // default: 0.7
/// Maximum clustering for bridges
pub bridge_clustering_max: f32, // default: 0.3
}
/// Routing strategy configuration
#[derive(Clone, Debug)]
pub enum RoutingStrategy {
/// Three-phase: highway → hub → target
ThreePhase {
phase1_weight: PhaseWeights,
phase2_weight: PhaseWeights,
phase3_weight: PhaseWeights,
},
/// Adaptive: dynamically choose weights based on query progress
Adaptive {
initial_weights: PhaseWeights,
adaptation_rate: f32,
},
/// Custom strategy
Custom(fn(&SearchState) -> PhaseWeights),
}
/// Weights for combining similarity and topology
#[derive(Clone, Debug)]
pub struct PhaseWeights {
pub similarity: f32,
pub degree: f32,
pub clustering: f32,
pub betweenness: f32,
pub pagerank: f32,
}
/// Current search state for adaptive routing
#[derive(Clone, Debug)]
pub struct SearchState {
/// Nodes visited so far
pub visited: Vec<NodeId>,
/// Current position
pub current: NodeId,
/// Best similarity seen so far
pub best_similarity: f32,
/// Number of hops taken
pub hops: usize,
/// Estimated distance to target (embedding space)
pub estimated_distance: f32,
}
/// Topology-aware router
pub struct TopologyRouter {
/// Topology metrics for all nodes
metrics: Vec<NodeTopology>,
/// Fast lookup by node class
class_index: HashMap<NodeClass, Vec<NodeId>>,
/// Configuration
config: TagrConfig,
/// Cached routing decisions
routing_cache: LruCache<(NodeId, NodeId), Vec<NodeId>>,
}
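To make the role of `PhaseWeights` concrete, the sketch below scores a candidate as a weighted sum of similarity and topology metrics, using the three weight sets from the architecture diagram. It is an illustration under assumptions, not the module's API: `CandidateFeatures`, the constructor names, and `score` are hypothetical, and every input is assumed to be normalized to [0, 1] beforehand.

```rust
/// Weights for combining similarity and topology (mirrors `PhaseWeights` above).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PhaseWeights {
    pub similarity: f32,
    pub degree: f32,
    pub clustering: f32,
    pub betweenness: f32,
    pub pagerank: f32,
}

/// Hypothetical per-candidate inputs, each assumed to be normalized to [0, 1].
#[derive(Clone, Copy, Debug)]
pub struct CandidateFeatures {
    pub similarity: f32,
    pub degree: f32,
    pub clustering: f32,
    pub betweenness: f32,
    pub pagerank: f32,
}

impl PhaseWeights {
    /// Phase 1 in the diagram: 0.7·similarity + 0.3·betweenness.
    pub fn global_navigation() -> Self {
        Self { similarity: 0.7, degree: 0.0, clustering: 0.0, betweenness: 0.3, pagerank: 0.0 }
    }

    /// Phase 2 in the diagram: 0.8·similarity + 0.2·degree.
    pub fn local_exploration() -> Self {
        Self { similarity: 0.8, degree: 0.2, clustering: 0.0, betweenness: 0.0, pagerank: 0.0 }
    }

    /// Phase 3 in the diagram: pure similarity.
    pub fn precision_targeting() -> Self {
        Self { similarity: 1.0, degree: 0.0, clustering: 0.0, betweenness: 0.0, pagerank: 0.0 }
    }

    /// Weighted sum of the candidate's features.
    pub fn score(&self, c: &CandidateFeatures) -> f32 {
        self.similarity * c.similarity
            + self.degree * c.degree
            + self.clustering * c.clustering
            + self.betweenness * c.betweenness
            + self.pagerank * c.pagerank
    }
}
```

With these weights a highway node (similarity 0.5, betweenness 1.0) outranks a slightly closer ordinary node (similarity 0.6, betweenness 0.0) during global navigation, while the ordering flips once routing falls back to pure similarity.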
Key Algorithms
// Pseudocode for topology-aware routing
/// Compute topology metrics for graph
fn compute_topology_metrics(graph: &HnswGraph) -> Vec<NodeTopology> {
let n = graph.node_count();
let mut metrics = vec![NodeTopology::default(); n];
// Phase 1: Local metrics (degree, clustering)
for node in 0..n {
// Layer 0 is the base layer; Rust has no named arguments, so the layer is passed positionally
let neighbors = graph.get_neighbors(node, 0);
metrics[node].degree = neighbors.len();
// Clustering coefficient: fraction of neighbor pairs connected
let mut triangles = 0;
let mut possible = 0;
for i in 0..neighbors.len() {
for j in (i+1)..neighbors.len() {
possible += 1;
if graph.has_edge(neighbors[i], neighbors[j]) {
triangles += 1;
}
}
}
metrics[node].clustering = if possible > 0 {
triangles as f32 / possible as f32
} else {
0.0
};
}
// Phase 2: Global metrics (betweenness, PageRank)
// Betweenness: fraction of shortest paths passing through node
metrics = compute_betweenness(graph, metrics);
// PageRank: iterative link analysis
metrics = compute_pagerank(graph, metrics);
// Phase 3: Classify nodes
for i in 0..n {
metrics[i].classification = classify_node(&metrics[i], &metrics);
}
metrics
}
/// Betweenness centrality using Brandes' algorithm
fn compute_betweenness(
graph: &HnswGraph,
mut metrics: Vec<NodeTopology>
) -> Vec<NodeTopology> {
let n = graph.node_count();
let mut betweenness = vec![0.0; n];
// For each source node
for s in 0..n {
let mut stack = Vec::new();
let mut paths = vec![Vec::new(); n];
let mut sigma = vec![0.0; n];
sigma[s] = 1.0;
let mut dist = vec![-1; n];
dist[s] = 0;
// BFS from s
let mut queue = VecDeque::new();
queue.push_back(s);
while let Some(v) = queue.pop_front() {
stack.push(v);
for w in graph.get_neighbors(v, 0) {
// First visit to w?
if dist[w] < 0 {
dist[w] = dist[v] + 1;
queue.push_back(w);
}
// Shortest path to w via v?
if dist[w] == dist[v] + 1 {
sigma[w] += sigma[v];
paths[w].push(v);
}
}
}
// Accumulate betweenness
let mut delta = vec![0.0; n];
while let Some(w) = stack.pop() {
for v in &paths[w] {
delta[*v] += (sigma[*v] / sigma[w]) * (1.0 + delta[w]);
}
if w != s {
betweenness[w] += delta[w];
}
}
}
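// Normalize to [0, 1]. If every score is zero (for example, a graph with
// fewer than three connected nodes), skip the division to avoid NaN.
let max_betweenness = betweenness.iter().cloned().fold(0.0, f32::max);
if max_betweenness > 0.0 {
for i in 0..n {
metrics[i].betweenness = betweenness[i] / max_betweenness;
}
}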
metrics
}
/// Classify node based on topology metrics
fn classify_node(
node: &NodeTopology,
all_metrics: &[NodeTopology]
) -> NodeClass {
// Compute percentiles
let betweenness_percentile = compute_percentile(
all_metrics.iter().map(|m| m.betweenness),
node.betweenness
);
let degree_percentile = compute_percentile(
all_metrics.iter().map(|m| m.degree as f32),
node.degree as f32
);
// Classification logic
if betweenness_percentile > 0.95 && node.clustering < 0.3 {
NodeClass::Highway
} else if degree_percentile > 0.90 && node.clustering > 0.4 {
NodeClass::Hub
} else if node.clustering < 0.3 && betweenness_percentile > 0.7 {
NodeClass::Bridge
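} else if node.degree < 5 && node.clustering > 0.6 {
// Leaf is checked before Dense: in the opposite order, any low-degree node
// with clustering > 0.7 is classified as Dense and Leaf only matches (0.6, 0.7]
NodeClass::Leaf
} else if node.clustering > 0.7 {
NodeClass::Dense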
} else {
NodeClass::Ordinary
}
}
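`classify_node` calls `compute_percentile` once per node, and each call scans all metrics, so classifying the whole graph is quadratic in the node count. One alternative is to compute every percentile rank in a single sort. The sketch below assumes "percentile" means the fraction of values less than or equal to the given value; the name `percentile_ranks` is illustrative and is not the `compute_percentile` helper referenced above.

```rust
/// Percentile rank of every value in one O(n log n) pass.
/// The rank of v is the fraction of values that are <= v, so the largest
/// value always gets 1.0 and tied values share the same rank.
pub fn percentile_ranks(values: &[f32]) -> Vec<f32> {
    let n = values.len();
    // Indices sorted by value; `total_cmp` gives a total order even with NaN.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| values[a].total_cmp(&values[b]));

    let mut ranks = vec![0.0; n];
    let mut i = 0;
    while i < n {
        // Extend j over the run of values equal to values[order[i]].
        let mut j = i + 1;
        while j < n && values[order[j]] == values[order[i]] {
            j += 1;
        }
        // j values are <= the current value.
        let rank = j as f32 / n as f32;
        for &idx in &order[i..j] {
            ranks[idx] = rank;
        }
        i = j;
    }
    ranks
}
```

The result can be indexed by node position, so a check such as `betweenness_percentile > 0.95` becomes a lookup into a precomputed vector.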
/// Topology-aware search with three-phase routing
fn tagr_search(
    query: &[f32],
    graph: &HnswGraph,
    router: &TopologyRouter,
    k: usize
) -> Vec<SearchResult> {
    let mut current = graph.entry_point;
    let mut visited = HashSet::new();
    // Mark the entry point as visited so the walk cannot step back onto it
    visited.insert(current);
    // `state` is mutable and refreshed after every hop; with an immutable
    // snapshot the phase predicates would never observe any progress
    let mut state = SearchState {
        visited: vec![current],
        current,
        best_similarity: -1.0,
        hops: 0,
        estimated_distance: f32::MAX,
    };
    // Phase 1: Global navigation via highways
    while in_phase_1(&state) {
        let mut best_neighbor = None;
        let mut best_score = f32::MIN;
        for neighbor in graph.get_neighbors(current, 0) {
            if visited.contains(&neighbor) { continue; }
            let topo = &router.metrics[neighbor];
            let similarity = cosine_similarity(query, graph.get_embedding(neighbor));
            // Phase 1 weights: favor highways (same weights as the architecture diagram)
            let score = 0.7 * similarity + 0.3 * topo.betweenness;
            if score > best_score {
                best_score = score;
                best_neighbor = Some((neighbor, similarity));
            }
        }
        // Stop when every neighbor has already been visited
        let Some((next, similarity)) = best_neighbor else { break };
        current = next;
        visited.insert(current);
        state.current = current;
        state.visited.push(current);
        state.hops += 1;
        state.best_similarity = state.best_similarity.max(similarity);
    }
    // Phase 2: Local exploration via hubs
    while in_phase_2(&state) {
        let mut best_neighbor = None;
        let mut best_score = f32::MIN;
        for neighbor in graph.get_neighbors(current, 0) {
            if visited.contains(&neighbor) { continue; }
            let topo = &router.metrics[neighbor];
            let similarity = cosine_similarity(query, graph.get_embedding(neighbor));
            // Phase 2 weights: favor hubs and similarity
            let degree_score = topo.degree as f32 / graph.max_degree() as f32;
            let score = 0.8 * similarity + 0.2 * degree_score;
            if score > best_score {
                best_score = score;
                best_neighbor = Some((neighbor, similarity));
            }
        }
        let Some((next, similarity)) = best_neighbor else { break };
        current = next;
        visited.insert(current);
        state.current = current;
        state.visited.push(current);
        state.hops += 1;
        state.best_similarity = state.best_similarity.max(similarity);
    }
    // Phase 3: Pure similarity search
    standard_greedy_search(query, graph, current, k, visited)
}
/// Adaptive weight tuning based on search progress
fn adaptive_routing(
state: &SearchState,
router: &TopologyRouter
) -> PhaseWeights {
let progress = estimate_progress(state);
// Early (global navigation): emphasize topology
// Middle (local exploration): balanced
// Late (precision targeting): emphasize similarity
let topology_weight = (1.0 - progress) * 0.5;
let similarity_weight = 0.5 + progress * 0.5;
PhaseWeights {
similarity: similarity_weight,
degree: topology_weight * 0.3,
clustering: topology_weight * 0.2,
betweenness: topology_weight * 0.4,
pagerank: topology_weight * 0.1,
}
}
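`adaptive_routing` depends on `estimate_progress`, which is not defined in this document. One possible definition, offered purely as an assumption, takes the larger of the fraction of the hop budget already spent and the best similarity seen so far. The sketch below pairs that with the same weight schedule as `adaptive_routing`; the parameter `max_hops` and the function `weights_for_progress` are hypothetical names.

```rust
/// Weights for combining similarity and topology (mirrors `PhaseWeights` above).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PhaseWeights {
    pub similarity: f32,
    pub degree: f32,
    pub clustering: f32,
    pub betweenness: f32,
    pub pagerank: f32,
}

/// Hypothetical progress estimate in [0, 1].
/// Assumption: a search counts as advanced when it has either used up its
/// hop budget or already found a highly similar node.
pub fn estimate_progress(hops: usize, max_hops: usize, best_similarity: f32) -> f32 {
    let hop_fraction = if max_hops == 0 {
        1.0
    } else {
        (hops as f32 / max_hops as f32).min(1.0)
    };
    // Cosine similarity can be negative; treat that as no progress.
    let similarity_fraction = best_similarity.clamp(0.0, 1.0);
    hop_fraction.max(similarity_fraction)
}

/// Same schedule as `adaptive_routing`: topology gets at most half of the
/// total weight and fades out linearly as progress approaches 1.
pub fn weights_for_progress(progress: f32) -> PhaseWeights {
    let progress = progress.clamp(0.0, 1.0);
    let topology_weight = (1.0 - progress) * 0.5;
    PhaseWeights {
        similarity: 0.5 + progress * 0.5,
        degree: topology_weight * 0.3,
        clustering: topology_weight * 0.2,
        betweenness: topology_weight * 0.4,
        pagerank: topology_weight * 0.1,
    }
}
```

Because the four topology shares (0.3, 0.2, 0.4, 0.1) sum to 1, the five weights always sum to 1 regardless of progress, so scores from different stages of a search stay on a comparable scale.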
API Design
/// Public API for Topology-Aware Gradient Routing
pub trait TopologyAwareRouting {
/// Create topology router for graph
fn new(graph: &HnswGraph, config: TagrConfig) -> Self;
/// Search with topology-aware routing
fn search(
&self,
query: &[f32],
k: usize,
options: TagrSearchOptions,
) -> Result<Vec<SearchResult>, TagrError>;
/// Get topology metrics for node
fn get_metrics(&self, node_id: NodeId) -> &NodeTopology;
/// Find nearest highway nodes
fn find_highways(&self, point: &[f32], k: usize) -> Vec<NodeId>;
/// Find hubs in region
fn find_hubs(&self, center: &[f32], radius: f32) -> Vec<NodeId>;
/// Get nodes by classification
fn get_by_class(&self, class: NodeClass) -> &[NodeId];
/// Update topology metrics (incremental)
fn update_metrics(&mut self, changed_nodes: &[NodeId]) -> Result<(), TagrError>;
/// Recompute all metrics (full update)
fn recompute_metrics(&mut self) -> Result<(), TagrError>;
/// Export topology visualization
fn export_topology(&self) -> TopologyVisualization;
/// Get routing statistics
fn statistics(&self) -> RoutingStatistics;
}
/// Search options for TAGR
#[derive(Clone, Debug)]
pub struct TagrSearchOptions {
/// Routing strategy override
pub strategy: Option<RoutingStrategy>,
/// Prefer specific node classes
pub prefer_classes: Vec<NodeClass>,
/// Avoid specific node classes
pub avoid_classes: Vec<NodeClass>,
/// Enable path recording
pub record_path: bool,
/// Maximum hops
pub max_hops: usize,
}
/// Routing statistics
#[derive(Clone, Debug)]
pub struct RoutingStatistics {
/// Total searches performed
pub total_searches: usize,
/// Average path length
pub avg_path_length: f32,
/// Highway usage rate
pub highway_usage: f32,
/// Hub usage rate
pub hub_usage: f32,
/// Average hops by phase
pub hops_by_phase: [f32; 3],
/// Node class distribution
pub class_distribution: HashMap<NodeClass, usize>,
}
/// Topology visualization export
#[derive(Clone, Debug, Serialize)]
pub struct TopologyVisualization {
pub nodes: Vec<TopoNode>,
pub highways: Vec<NodeId>,
pub hubs: Vec<NodeId>,
pub bridges: Vec<NodeId>,
pub metrics_summary: MetricsSummary,
}
#[derive(Clone, Debug, Serialize)]
pub struct TopoNode {
pub id: NodeId,
pub class: NodeClass,
pub degree: usize,
pub betweenness: f32,
pub clustering: f32,
}
#[derive(Clone, Debug, Serialize)]
pub struct MetricsSummary {
pub total_nodes: usize,
pub avg_degree: f32,
pub avg_clustering: f32,
pub max_betweenness: f32,
}
Integration Points
Affected Crates/Modules
- crates/ruvector-core/src/hnsw/
  - Add topology metadata to nodes
  - Modify routing to use topology metrics
  - Extend search API for topology options
- crates/ruvector-gnn/src/routing/
  - Create new routing module
  - Integrate with existing GNN layers
- crates/ruvector-core/src/metrics/
  - Implement graph centrality algorithms
  - Add metric computation utilities
New Modules to Create
- crates/ruvector-gnn/src/topology/
  - metrics.rs - Topology metric computation
  - classification.rs - Node classification
  - router.rs - Topology-aware routing
  - adaptive.rs - Adaptive weight tuning
  - cache.rs - Metric caching and updates
- crates/ruvector-core/src/graph/
  - centrality.rs - Centrality algorithms (betweenness, PageRank)
  - clustering.rs - Clustering coefficient
  - analysis.rs - Graph analysis utilities
Dependencies on Other Features
- Feature 10 (Gravitational Fields): Combine topology routing with gravitational pull
- Feature 11 (Causal Networks): Adapt topology metrics for DAGs
- Feature 13 (Crystallization): Use topology to identify hierarchy levels
Regression Prevention
Existing Functionality at Risk
- Search Performance
  - Risk: Topology computation overhead
  - Prevention: Incremental updates, caching, optional feature
- Search Quality
  - Risk: Poor topology routing on certain graph structures
  - Prevention: Adaptive fallback to pure similarity
- Memory Usage
  - Risk: Storing topology metrics per node
  - Prevention: Lazy computation, sparse storage
Test Cases
#[cfg(test)]
mod regression_tests {
use super::*;
/// Verify highways reduce path length
#[test]
fn test_highway_routing_efficiency() {
let graph = create_scale_free_graph(10000);
let router = TopologyRouter::new(&graph, TagrConfig::default());
let query = random_vector(128);
// Standard search
let (standard_results, standard_path) = graph.search_with_path(&query, 10);
// TAGR search
let (tagr_results, tagr_path) = router.search_with_path(&query, 10);
// TAGR should take fewer hops
assert!(tagr_path.len() < standard_path.len());
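// But maintain similar quality.
// `exact_knn` is a hypothetical brute-force helper that supplies the ground truth
let ground_truth = exact_knn(&graph, &query, 10);
let standard_recall = compute_recall(&standard_results, &ground_truth);
let tagr_recall = compute_recall(&tagr_results, &ground_truth);
assert!((tagr_recall - standard_recall).abs() < 0.05);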
}
/// Verify correct node classification
#[test]
fn test_node_classification() {
let graph = create_test_graph_with_known_structure();
let router = TopologyRouter::new(&graph, TagrConfig::default());
// Verify known highways
let highways = router.get_by_class(NodeClass::Highway);
assert!(highways.contains(&known_highway_node));
// Verify known hubs
let hubs = router.get_by_class(NodeClass::Hub);
assert!(hubs.contains(&known_hub_node));
}
/// Incremental metric updates
#[test]
fn test_incremental_updates() {
let mut graph = create_test_graph(1000);
let mut router = TopologyRouter::new(&graph, TagrConfig::default());
let original_metrics = router.get_metrics(0).clone();
// Add edges
graph.add_edge(0, 500);
graph.add_edge(0, 501);
// Incremental update
router.update_metrics(&[0, 500, 501]).unwrap();
let updated_metrics = router.get_metrics(0);
// Degree should increase
assert!(updated_metrics.degree > original_metrics.degree);
}
}
Implementation Phases
Phase 1: Research Validation (2 weeks)
- Implement basic topology metrics (degree, clustering)
- Test on synthetic graphs with known structure
- Measure routing efficiency improvements
- Deliverable: Research report with benchmarks
Phase 2: Core Implementation (3 weeks)
- Implement all centrality metrics (betweenness, PageRank)
- Develop node classification
- Build three-phase routing
- Add caching and optimization
- Deliverable: Working TAGR module
Phase 3: Integration (2 weeks)
- Integrate with HNSW search
- Add adaptive weight tuning
- Create API bindings
- Write integration tests
- Deliverable: Integrated TAGR feature
Phase 4: Optimization (2 weeks)
- Profile and optimize metric computation
- Implement incremental updates
- Add visualization tools
- Write documentation
- Deliverable: Production-ready feature
Success Metrics
Performance Benchmarks
| Metric | Baseline | Target | Dataset |
|---|---|---|---|
| Path length reduction | 0% | >40% | Scale-free graph, 1M nodes |
| Search hops | 15.2 | <10.0 | Wikipedia embeddings |
| Metric computation time | N/A | <5 s | Per 100K nodes |
| Memory overhead | 0 MB | <200 MB | Per 1M nodes |
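The memory target can be sanity-checked from the size of one `NodeTopology` record. The sketch below is a rough estimate under assumptions: `NodeId` is taken to be `u32`, which this document does not specify, and only the per-node metric vector is counted, not `class_index` or `routing_cache`.

```rust
use std::mem::size_of;

/// Assumption: node identifiers are 32-bit; the document does not define `NodeId`.
type NodeId = u32;

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum NodeClass {
    Highway,
    Hub,
    Bridge,
    Dense,
    Leaf,
    Ordinary,
}

/// Same fields as `NodeTopology` in Core Data Structures.
#[allow(dead_code)]
struct NodeTopology {
    node_id: NodeId,
    degree: usize,
    clustering: f32,
    betweenness: f32,
    pagerank: f32,
    closeness: f32,
    eigenvector: f32,
    classification: NodeClass,
}

/// Bytes needed to store one `NodeTopology` per node in a contiguous `Vec`.
pub fn metrics_bytes(node_count: usize) -> usize {
    node_count * size_of::<NodeTopology>()
}
```

On a typical 64-bit target the record is likely around 40 bytes, which puts one million nodes at roughly 40 MB, well inside the 200 MB budget. The remaining headroom would have to cover the class index and the routing cache.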
Accuracy Metrics
- Highway Identification: Correlation with true betweenness
  - Target: Spearman correlation >0.85
- Routing Efficiency: Hops saved vs. baseline
  - Target: >30% reduction for long-range queries
- Search Quality: Recall maintained
  - Target: Recall degradation <5%
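The highway-identification target compares an approximate betweenness score against exact betweenness using Spearman correlation, which is the Pearson correlation of the two rank vectors. A minimal sketch of that measurement follows; the function names are illustrative, and ties receive the average of the ranks they span.

```rust
/// 1-based ranks of `values`; tied values share the average of their ranks.
fn average_ranks(values: &[f32]) -> Vec<f32> {
    let n = values.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    let mut ranks = vec![0.0; n];
    let mut i = 0;
    while i < n {
        let mut j = i + 1;
        while j < n && values[order[j]] == values[order[i]] {
            j += 1;
        }
        // Ranks i+1 ..= j average to (i + 1 + j) / 2.
        let avg = (i + 1 + j) as f32 / 2.0;
        for &idx in &order[i..j] {
            ranks[idx] = avg;
        }
        i = j;
    }
    ranks
}

/// Spearman rank correlation of two equally long samples.
/// Returns `None` when the lengths differ, fewer than two points are given,
/// or either sample is constant (the correlation is undefined).
pub fn spearman(x: &[f32], y: &[f32]) -> Option<f32> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let (rx, ry) = (average_ranks(x), average_ranks(y));
    let n = rx.len() as f32;
    // Ranks 1..=n always average to (n + 1) / 2, with or without ties.
    let mean = (n + 1.0) / 2.0;
    let (mut cov, mut var_x, mut var_y) = (0.0, 0.0, 0.0);
    for (a, b) in rx.iter().zip(&ry) {
        let (da, db) = (a - mean, b - mean);
        cov += da * db;
        var_x += da * da;
        var_y += db * db;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x * var_y).sqrt())
}
```

Passing the estimated and exact betweenness vectors, indexed by node, yields the value to compare against the 0.85 target. Only the ordering of nodes matters, so differences in normalization between the two vectors do not affect the result.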
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Expensive betweenness computation | Approximate algorithms, sampling |
| Poor generalization | Test on diverse graph types |
| Classification instability | Regularization, threshold tuning |
| Metric staleness | Incremental updates, change detection |
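The sampling mitigation for betweenness can be illustrated by running the accumulation step of Brandes' algorithm from a subset of source nodes instead of all of them. The sketch below is one possible form under assumptions: the graph is given as adjacency lists, sources are chosen deterministically as every `stride`-th node (randomly chosen pivots are the more common choice), and the function name is illustrative. With `stride = 1` it reduces to the exact computation.

```rust
use std::collections::VecDeque;

/// Betweenness estimated from every `stride`-th source node, normalized by
/// the maximum score. `stride = 1` visits all sources and is exact.
pub fn sampled_betweenness(adj: &[Vec<usize>], stride: usize) -> Vec<f32> {
    let n = adj.len();
    let mut betweenness = vec![0.0f32; n];
    for s in (0..n).step_by(stride.max(1)) {
        let mut order = Vec::with_capacity(n); // nodes in BFS visiting order
        let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut sigma = vec![0.0f32; n]; // number of shortest paths from s
        let mut dist = vec![usize::MAX; n]; // usize::MAX marks "unvisited"
        sigma[s] = 1.0;
        dist[s] = 0;
        let mut queue = VecDeque::from([s]);
        while let Some(v) = queue.pop_front() {
            order.push(v);
            for &w in &adj[v] {
                if dist[w] == usize::MAX {
                    dist[w] = dist[v] + 1;
                    queue.push_back(w);
                }
                // v precedes w on a shortest path from s.
                if dist[w] == dist[v] + 1 {
                    sigma[w] += sigma[v];
                    preds[w].push(v);
                }
            }
        }
        // Back-propagate dependencies in reverse BFS order.
        let mut delta = vec![0.0f32; n];
        while let Some(w) = order.pop() {
            for &v in &preds[w] {
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            }
            if w != s {
                betweenness[w] += delta[w];
            }
        }
    }
    // Normalize to [0, 1]; leave all zeros untouched to avoid dividing by zero.
    let max = betweenness.iter().cloned().fold(0.0f32, f32::max);
    if max > 0.0 {
        for b in &mut betweenness {
            *b /= max;
        }
    }
    betweenness
}
```

The cost drops roughly in proportion to the stride, but the estimate is biased: a node chosen as a source contributes nothing to its own score, so on small graphs the ranking can differ from the exact one. Any sampling rate would need to be validated against the Spearman target in Accuracy Metrics.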
References
- Brandes' betweenness algorithm
- PageRank and graph centrality
- Small-world and scale-free networks
- Graph-based routing in P2P networks