dearsky/wifi-densepose

Fork 0

Files

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

28 KiB

Raw Blame History

Topology-Aware Gradient Routing (TAGR)

Overview

Problem Statement

Current vector search routing relies solely on embedding similarity, ignoring the rich topological structure of the graph. This leads to:

Inefficient routing: Missing "highway" nodes with high betweenness centrality
Local optima: Getting trapped in dense clusters without global context
Uniform traversal: Treating all graph regions identically despite varying structure
Poor scalability: Not leveraging graph properties for large-scale search

Proposed Solution

Route search queries based on local graph topology metrics (degree, clustering coefficient, betweenness centrality) in addition to embedding similarity. Automatically identify:

Highway nodes: High betweenness for long-range routing
Hub nodes: High degree for local exploration
Bridge nodes: Low clustering, connecting communities
Dense regions: High clustering for specialized searches

Expected Benefits

40-60% reduction in path length for long-range queries
25-35% improvement in search efficiency (fewer hops)
Automatic adaptation to graph structure (no manual tuning)
Better load balancing across graph regions
Hierarchical routing: Global highways → local hubs → targets

Novelty Claim

First integration of graph topology metrics directly into vector search routing. Unlike:

Community detection: TAGR uses local metrics, no global clustering needed
Graph neural networks: TAGR routes using topology, not learned representations
Hierarchical graphs: TAGR adapts to natural topology, no imposed hierarchy

TAGR creates an adaptive routing strategy that respects the graph's intrinsic structure.

Technical Design

Architecture Diagram

┌────────────────────────────────────────────────────────────────────┐
│                 Topology-Aware Gradient Routing                     │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────┐   │
│  │              Topology Metric Computation                    │   │
│  │                                                             │   │
│  │  For each node i:                                          │   │
│  │  • Degree: deg(i) = |neighbors(i)|                        │   │
│  │  • Clustering: C(i) = triangles(i) / potential_triangles  │   │
│  │  • Betweenness: B(i) = Σ(σ_st(i) / σ_st)                 │   │
│  │  • PageRank: PR(i) = (1-d)/N + d·Σ(PR(j)/deg(j))        │   │
│  └────────────────────────────────────────────────────────────┘   │
│                            │                                        │
│                            ▼                                        │
│  ┌────────────────────────────────────────────────────────────┐   │
│  │              Node Classification by Topology                │   │
│  │                                                             │   │
│  │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │   │
│  │   │  HIGHWAY    │  │    HUB      │  │   BRIDGE    │      │   │
│  │   │             │  │             │  │             │      │   │
│  │   │ High B(i)   │  │ High deg(i) │  │ Low C(i)    │      │   │
│  │   │ Low C(i)    │  │ Med C(i)    │  │ Med B(i)    │      │   │
│  │   │             │  │             │  │             │      │   │
│  │   │ ●═══════●  │  │    ●───●    │  │  ●     ●    │      │   │
│  │   │         ║   │  │   ╱│╲  │    │  │  │     │    │      │   │
│  │   │         ║   │  │  ● │ ● │    │  │  ●─────●    │      │   │
│  │   │         ●   │  │   ╲│╱  │    │  │             │      │   │
│  │   │             │  │    ●───●    │  │             │      │   │
│  │   └─────────────┘  └─────────────┘  └─────────────┘      │   │
│  └────────────────────────────────────────────────────────────┘   │
│                            │                                        │
│                            ▼                                        │
│  ┌────────────────────────────────────────────────────────────┐   │
│  │                  Adaptive Routing Strategy                  │   │
│  │                                                             │   │
│  │  Phase 1: Global Navigation                                │   │
│  │  ┌─────────────────────────────────────┐                  │   │
│  │  │ Route via HIGHWAY nodes             │                  │   │
│  │  │ Objective: minimize(distance to     │                  │   │
│  │  │            target community)        │                  │   │
│  │  │ Weight: 0.7·similarity +            │                  │   │
│  │  │         0.3·betweenness             │                  │   │
│  │  └─────────────────────────────────────┘                  │   │
│  │                    │                                        │   │
│  │                    ▼                                        │   │
│  │  Phase 2: Local Exploration                                │   │
│  │  ┌─────────────────────────────────────┐                  │   │
│  │  │ Route via HUB nodes                 │                  │   │
│  │  │ Objective: explore dense region     │                  │   │
│  │  │ Weight: 0.8·similarity +            │                  │   │
│  │  │         0.2·degree                  │                  │   │
│  │  └─────────────────────────────────────┘                  │   │
│  │                    │                                        │   │
│  │                    ▼                                        │   │
│  │  Phase 3: Precision Targeting                              │   │
│  │  ┌─────────────────────────────────────┐                  │   │
│  │  │ Pure similarity-based search        │                  │   │
│  │  │ Weight: 1.0·similarity              │                  │   │
│  │  └─────────────────────────────────────┘                  │   │
│  └────────────────────────────────────────────────────────────┘   │
└────────────────────────────────────────────────────────────────────┘

Core Data Structures

/// Topology metrics for each node
#[derive(Clone, Debug)]
pub struct NodeTopology {
    /// Node identifier
    pub node_id: NodeId,

    /// Degree (number of neighbors)
    pub degree: usize,

    /// Clustering coefficient (0.0-1.0)
    pub clustering: f32,

    /// Betweenness centrality (normalized)
    pub betweenness: f32,

    /// PageRank score
    pub pagerank: f32,

    /// Closeness centrality
    pub closeness: f32,

    /// Eigenvector centrality
    pub eigenvector: f32,

    /// Node classification
    pub classification: NodeClass,
}

/// Node classification based on topology
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum NodeClass {
    /// High betweenness, low clustering (long-range routing)
    Highway,

    /// High degree, medium clustering (local exploration)
    Hub,

    /// Low clustering, medium betweenness (community connector)
    Bridge,

    /// High clustering (dense local region)
    Dense,

    /// Low degree, high clustering (leaf node)
    Leaf,

    /// Doesn't fit other categories
    Ordinary,
}

/// Configuration for topology-aware routing
#[derive(Clone, Debug)]
pub struct TagrConfig {
    /// Metrics to compute (performance vs. accuracy trade-off)
    pub metrics: MetricSet,

    /// Node classification thresholds
    pub classification_thresholds: ClassificationThresholds,

    /// Routing strategy
    pub routing_strategy: RoutingStrategy,

    /// Update frequency for topology metrics
    pub update_interval: Duration,

    /// Enable adaptive weight tuning
    pub adaptive_weights: bool,
}

/// Which topology metrics to compute
#[derive(Clone, Debug)]
pub struct MetricSet {
    pub degree: bool,
    pub clustering: bool,
    pub betweenness: bool,
    pub pagerank: bool,
    pub closeness: bool,
    pub eigenvector: bool,
}

/// Thresholds for node classification
#[derive(Clone, Debug)]
pub struct ClassificationThresholds {
    /// Betweenness threshold for highways (top X%)
    pub highway_betweenness_percentile: f32,  // default: 0.95

    /// Degree threshold for hubs (top X%)
    pub hub_degree_percentile: f32,  // default: 0.90

    /// Clustering threshold for dense regions
    pub dense_clustering_threshold: f32,  // default: 0.7

    /// Maximum clustering for bridges
    pub bridge_clustering_max: f32,  // default: 0.3
}

/// Routing strategy configuration
#[derive(Clone, Debug)]
pub enum RoutingStrategy {
    /// Three-phase: highway → hub → target
    ThreePhase {
        phase1_weight: PhaseWeights,
        phase2_weight: PhaseWeights,
        phase3_weight: PhaseWeights,
    },

    /// Adaptive: dynamically choose weights based on query progress
    Adaptive {
        initial_weights: PhaseWeights,
        adaptation_rate: f32,
    },

    /// Custom strategy
    Custom(fn(&SearchState) -> PhaseWeights),
}

/// Weights for combining similarity and topology
#[derive(Clone, Debug)]
pub struct PhaseWeights {
    pub similarity: f32,
    pub degree: f32,
    pub clustering: f32,
    pub betweenness: f32,
    pub pagerank: f32,
}

/// Current search state for adaptive routing
#[derive(Clone, Debug)]
pub struct SearchState {
    /// Nodes visited so far
    pub visited: Vec<NodeId>,

    /// Current position
    pub current: NodeId,

    /// Best similarity seen so far
    pub best_similarity: f32,

    /// Number of hops taken
    pub hops: usize,

    /// Estimated distance to target (embedding space)
    pub estimated_distance: f32,
}

/// Topology-aware router
pub struct TopologyRouter {
    /// Topology metrics for all nodes
    metrics: Vec<NodeTopology>,

    /// Fast lookup by node class
    class_index: HashMap<NodeClass, Vec<NodeId>>,

    /// Configuration
    config: TagrConfig,

    /// Cached routing decisions
    routing_cache: LruCache<(NodeId, NodeId), Vec<NodeId>>,
}

Key Algorithms

// Pseudocode for topology-aware routing

/// Compute topology metrics for graph
fn compute_topology_metrics(graph: &HnswGraph) -> Vec<NodeTopology> {
    let n = graph.node_count();
    let mut metrics = vec![NodeTopology::default(); n];

    // Phase 1: Local metrics (degree, clustering)
    for node in 0..n {
        let neighbors = graph.get_neighbors(node, layer=0);
        metrics[node].degree = neighbors.len();

        // Clustering coefficient: fraction of neighbor pairs connected
        let mut triangles = 0;
        let mut possible = 0;

        for i in 0..neighbors.len() {
            for j in (i+1)..neighbors.len() {
                possible += 1;
                if graph.has_edge(neighbors[i], neighbors[j]) {
                    triangles += 1;
                }
            }
        }

        metrics[node].clustering = if possible > 0 {
            triangles as f32 / possible as f32
        } else {
            0.0
        };
    }

    // Phase 2: Global metrics (betweenness, PageRank)
    // Betweenness: fraction of shortest paths passing through node
    metrics = compute_betweenness(graph, metrics);

    // PageRank: iterative link analysis
    metrics = compute_pagerank(graph, metrics);

    // Phase 3: Classify nodes
    for i in 0..n {
        metrics[i].classification = classify_node(&metrics[i], &metrics);
    }

    metrics
}

/// Betweenness centrality using Brandes' algorithm
fn compute_betweenness(
    graph: &HnswGraph,
    mut metrics: Vec<NodeTopology>
) -> Vec<NodeTopology> {
    let n = graph.node_count();
    let mut betweenness = vec![0.0; n];

    // For each source node
    for s in 0..n {
        let mut stack = Vec::new();
        let mut paths = vec![Vec::new(); n];
        let mut sigma = vec![0.0; n];
        sigma[s] = 1.0;
        let mut dist = vec![-1; n];
        dist[s] = 0;

        // BFS from s
        let mut queue = VecDeque::new();
        queue.push_back(s);

        while let Some(v) = queue.pop_front() {
            stack.push(v);

            for w in graph.get_neighbors(v, layer=0) {
                // First visit to w?
                if dist[w] < 0 {
                    dist[w] = dist[v] + 1;
                    queue.push_back(w);
                }

                // Shortest path to w via v?
                if dist[w] == dist[v] + 1 {
                    sigma[w] += sigma[v];
                    paths[w].push(v);
                }
            }
        }

        // Accumulate betweenness
        let mut delta = vec![0.0; n];
        while let Some(w) = stack.pop() {
            for v in &paths[w] {
                delta[*v] += (sigma[*v] / sigma[w]) * (1.0 + delta[w]);
            }
            if w != s {
                betweenness[w] += delta[w];
            }
        }
    }

    // Normalize
    let max_betweenness = betweenness.iter().cloned().fold(0.0, f32::max);
    for i in 0..n {
        metrics[i].betweenness = betweenness[i] / max_betweenness;
    }

    metrics
}

/// Classify node based on topology metrics
fn classify_node(
    node: &NodeTopology,
    all_metrics: &[NodeTopology]
) -> NodeClass {
    // Compute percentiles
    let betweenness_percentile = compute_percentile(
        all_metrics.iter().map(|m| m.betweenness),
        node.betweenness
    );

    let degree_percentile = compute_percentile(
        all_metrics.iter().map(|m| m.degree as f32),
        node.degree as f32
    );

    // Classification logic
    if betweenness_percentile > 0.95 && node.clustering < 0.3 {
        NodeClass::Highway
    } else if degree_percentile > 0.90 && node.clustering > 0.4 {
        NodeClass::Hub
    } else if node.clustering < 0.3 && betweenness_percentile > 0.7 {
        NodeClass::Bridge
    } else if node.clustering > 0.7 {
        NodeClass::Dense
    } else if node.degree < 5 && node.clustering > 0.6 {
        NodeClass::Leaf
    } else {
        NodeClass::Ordinary
    }
}

/// Topology-aware search with three-phase routing
fn tagr_search(
    query: &[f32],
    graph: &HnswGraph,
    router: &TopologyRouter,
    k: usize
) -> Vec<SearchResult> {
    let mut current = graph.entry_point;
    let mut visited = HashSet::new();
    let mut best_similarity = -1.0;
    let mut hops = 0;

    let state = SearchState {
        visited: Vec::new(),
        current,
        best_similarity,
        hops,
        estimated_distance: f32::MAX,
    };

    // Phase 1: Global navigation via highways
    while in_phase_1(&state) {
        let neighbors = graph.get_neighbors(current, layer=0);
        let mut best_neighbor = None;
        let mut best_score = f32::MIN;

        for neighbor in neighbors {
            if visited.contains(&neighbor) { continue; }

            let topo = &router.metrics[neighbor];
            let embedding = graph.get_embedding(neighbor);
            let similarity = cosine_similarity(query, embedding);

            // Phase 1 weights: favor highways
            let score = 0.6 * similarity + 0.4 * topo.betweenness;

            if score > best_score {
                best_score = score;
                best_neighbor = Some(neighbor);
            }
        }

        if let Some(next) = best_neighbor {
            current = next;
            visited.insert(current);
            hops += 1;

            let similarity = cosine_similarity(
                query,
                graph.get_embedding(current)
            );
            best_similarity = best_similarity.max(similarity);
        } else {
            break;
        }
    }

    // Phase 2: Local exploration via hubs
    while in_phase_2(&state) {
        let neighbors = graph.get_neighbors(current, layer=0);
        let mut best_neighbor = None;
        let mut best_score = f32::MIN;

        for neighbor in neighbors {
            if visited.contains(&neighbor) { continue; }

            let topo = &router.metrics[neighbor];
            let embedding = graph.get_embedding(neighbor);
            let similarity = cosine_similarity(query, embedding);

            // Phase 2 weights: favor hubs and similarity
            let degree_score = topo.degree as f32 / graph.max_degree() as f32;
            let score = 0.8 * similarity + 0.2 * degree_score;

            if score > best_score {
                best_score = score;
                best_neighbor = Some(neighbor);
            }
        }

        if let Some(next) = best_neighbor {
            current = next;
            visited.insert(current);
            hops += 1;

            let similarity = cosine_similarity(
                query,
                graph.get_embedding(current)
            );
            best_similarity = best_similarity.max(similarity);
        } else {
            break;
        }
    }

    // Phase 3: Pure similarity search
    standard_greedy_search(query, graph, current, k, visited)
}

/// Adaptive weight tuning based on search progress
fn adaptive_routing(
    state: &SearchState,
    router: &TopologyRouter
) -> PhaseWeights {
    let progress = estimate_progress(state);

    // Early (global navigation): emphasize topology
    // Middle (local exploration): balanced
    // Late (precision targeting): emphasize similarity

    let topology_weight = (1.0 - progress) * 0.5;
    let similarity_weight = 0.5 + progress * 0.5;

    PhaseWeights {
        similarity: similarity_weight,
        degree: topology_weight * 0.3,
        clustering: topology_weight * 0.2,
        betweenness: topology_weight * 0.4,
        pagerank: topology_weight * 0.1,
    }
}

API Design

/// Public API for Topology-Aware Gradient Routing
pub trait TopologyAwareRouting {
    /// Create topology router for graph
    fn new(graph: &HnswGraph, config: TagrConfig) -> Self;

    /// Search with topology-aware routing
    fn search(
        &self,
        query: &[f32],
        k: usize,
        options: TagrSearchOptions,
    ) -> Result<Vec<SearchResult>, TagrError>;

    /// Get topology metrics for node
    fn get_metrics(&self, node_id: NodeId) -> &NodeTopology;

    /// Find nearest highway nodes
    fn find_highways(&self, point: &[f32], k: usize) -> Vec<NodeId>;

    /// Find hubs in region
    fn find_hubs(&self, center: &[f32], radius: f32) -> Vec<NodeId>;

    /// Get nodes by classification
    fn get_by_class(&self, class: NodeClass) -> &[NodeId];

    /// Update topology metrics (incremental)
    fn update_metrics(&mut self, changed_nodes: &[NodeId]) -> Result<(), TagrError>;

    /// Recompute all metrics (full update)
    fn recompute_metrics(&mut self) -> Result<(), TagrError>;

    /// Export topology visualization
    fn export_topology(&self) -> TopologyVisualization;

    /// Get routing statistics
    fn statistics(&self) -> RoutingStatistics;
}

/// Search options for TAGR
#[derive(Clone, Debug)]
pub struct TagrSearchOptions {
    /// Routing strategy override
    pub strategy: Option<RoutingStrategy>,

    /// Prefer specific node classes
    pub prefer_classes: Vec<NodeClass>,

    /// Avoid specific node classes
    pub avoid_classes: Vec<NodeClass>,

    /// Enable path recording
    pub record_path: bool,

    /// Maximum hops
    pub max_hops: usize,
}

/// Routing statistics
#[derive(Clone, Debug)]
pub struct RoutingStatistics {
    /// Total searches performed
    pub total_searches: usize,

    /// Average path length
    pub avg_path_length: f32,

    /// Highway usage rate
    pub highway_usage: f32,

    /// Hub usage rate
    pub hub_usage: f32,

    /// Average hops by phase
    pub hops_by_phase: [f32; 3],

    /// Node class distribution
    pub class_distribution: HashMap<NodeClass, usize>,
}

/// Topology visualization export
#[derive(Clone, Debug, Serialize)]
pub struct TopologyVisualization {
    pub nodes: Vec<TopoNode>,
    pub highways: Vec<NodeId>,
    pub hubs: Vec<NodeId>,
    pub bridges: Vec<NodeId>,
    pub metrics_summary: MetricsSummary,
}

#[derive(Clone, Debug, Serialize)]
pub struct TopoNode {
    pub id: NodeId,
    pub class: NodeClass,
    pub degree: usize,
    pub betweenness: f32,
    pub clustering: f32,
}

#[derive(Clone, Debug, Serialize)]
pub struct MetricsSummary {
    pub total_nodes: usize,
    pub avg_degree: f32,
    pub avg_clustering: f32,
    pub max_betweenness: f32,
}

Integration Points

Affected Crates/Modules

crates/ruvector-core/src/hnsw/
- Add topology metadata to nodes
- Modify routing to use topology metrics
- Extend search API for topology options
crates/ruvector-gnn/src/routing/
- Create new routing module
- Integrate with existing GNN layers
crates/ruvector-core/src/metrics/
- Implement graph centrality algorithms
- Add metric computation utilities

New Modules to Create

crates/ruvector-gnn/src/topology/
- metrics.rs - Topology metric computation
- classification.rs - Node classification
- router.rs - Topology-aware routing
- adaptive.rs - Adaptive weight tuning
- cache.rs - Metric caching and updates
crates/ruvector-core/src/graph/
- centrality.rs - Centrality algorithms (betweenness, PageRank)
- clustering.rs - Clustering coefficient
- analysis.rs - Graph analysis utilities

Dependencies on Other Features

Feature 10 (Gravitational Fields): Combine topology routing with gravitational pull
Feature 11 (Causal Networks): Adapt topology metrics for DAGs
Feature 13 (Crystallization): Use topology to identify hierarchy levels

Regression Prevention

Existing Functionality at Risk

Search Performance
- Risk: Topology computation overhead
- Prevention: Incremental updates, caching, optional feature
Search Quality
- Risk: Poor topology routing on certain graph structures
- Prevention: Adaptive fallback to pure similarity
Memory Usage
- Risk: Storing topology metrics per node
- Prevention: Lazy computation, sparse storage

Test Cases

#[cfg(test)]
mod regression_tests {
    /// Verify highways reduce path length
    #[test]
    fn test_highway_routing_efficiency() {
        let graph = create_scale_free_graph(10000);
        let router = TopologyRouter::new(&graph, TagrConfig::default());

        let query = random_vector(128);

        // Standard search
        let (standard_results, standard_path) = graph.search_with_path(&query, 10);

        // TAGR search
        let (tagr_results, tagr_path) = router.search_with_path(&query, 10);

        // TAGR should take fewer hops
        assert!(tagr_path.len() < standard_path.len());

        // But maintain similar quality
        let standard_recall = compute_recall(&standard_results, &ground_truth);
        let tagr_recall = compute_recall(&tagr_results, &ground_truth);
        assert!((tagr_recall - standard_recall).abs() < 0.05);
    }

    /// Verify correct node classification
    #[test]
    fn test_node_classification() {
        let graph = create_test_graph_with_known_structure();
        let router = TopologyRouter::new(&graph, TagrConfig::default());

        // Verify known highways
        let highways = router.get_by_class(NodeClass::Highway);
        assert!(highways.contains(&known_highway_node));

        // Verify known hubs
        let hubs = router.get_by_class(NodeClass::Hub);
        assert!(hubs.contains(&known_hub_node));
    }

    /// Incremental metric updates
    #[test]
    fn test_incremental_updates() {
        let mut graph = create_test_graph(1000);
        let mut router = TopologyRouter::new(&graph, TagrConfig::default());

        let original_metrics = router.get_metrics(0).clone();

        // Add edges
        graph.add_edge(0, 500);
        graph.add_edge(0, 501);

        // Incremental update
        router.update_metrics(&[0, 500, 501]).unwrap();

        let updated_metrics = router.get_metrics(0);

        // Degree should increase
        assert!(updated_metrics.degree > original_metrics.degree);
    }
}

Implementation Phases

Phase 1: Research Validation (2 weeks)

Implement basic topology metrics (degree, clustering)
Test on synthetic graphs with known structure
Measure routing efficiency improvements
Deliverable: Research report with benchmarks

Phase 2: Core Implementation (3 weeks)

Implement all centrality metrics (betweenness, PageRank)
Develop node classification
Build three-phase routing
Add caching and optimization
Deliverable: Working TAGR module

Phase 3: Integration (2 weeks)

Integrate with HNSW search
Add adaptive weight tuning
Create API bindings
Write integration tests
Deliverable: Integrated TAGR feature

Phase 4: Optimization (2 weeks)

Profile and optimize metric computation
Implement incremental updates
Add visualization tools
Write documentation
Deliverable: Production-ready feature

Success Metrics

Performance Benchmarks

Metric	Baseline	Target	Dataset
Path length reduction	0%	>40%	Scale-free graph, 1M nodes
Search hops	15.2	<10.0	Wikipedia embeddings
Metric computation time	N/A	<5s	Per 100K nodes
Memory overhead	0MB	<200MB	Per 1M nodes

Accuracy Metrics

Highway Identification: Correlation with true betweenness
- Target: Spearman correlation >0.85
Routing Efficiency: Hops saved vs. baseline
- Target: >30% reduction for long-range queries
Search Quality: Recall maintained
- Target: Recall degradation <5%

Risks and Mitigations

Risk	Mitigation
Expensive betweenness computation	Approximate algorithms, sampling
Poor generalization	Test on diverse graph types
Classification instability	Regularization, threshold tuning
Metric staleness	Incremental updates, change detection

References

Brandes' betweenness algorithm
PageRank and graph centrality
Small-world and scale-free networks
Graph-based routing in P2P networks

28 KiB Raw Blame History Unescape Escape