Graph Condensation (SFGC) - Implementation Plan
Overview
Problem Statement
HNSW graphs in production vector databases face critical deployment challenges:
- Memory Footprint: Full HNSW graphs require 40-120 bytes per vector for connectivity metadata
- Edge Deployment: Mobile/IoT devices cannot store million-node graphs (400MB-4.8GB overhead)
- Federated Learning: Transferring full graphs between nodes is bandwidth-prohibitive
- Cold Start: Initial graph construction is expensive for dynamic applications
Proposed Solution
Implement Structure-Preserving Graph Condensation (SFGC) that creates synthetic "super-nodes" representing clusters of original nodes. The condensed graph:
- Reduces graph size by 10-100x (configurable compression ratio)
- Preserves topological properties (small-world, scale-free characteristics)
- Maintains search accuracy within 2-5% of full graph
- Enables progressive graph expansion from condensed to full representation
Core Innovation: Unlike naive graph coarsening, SFGC learns synthetic node embeddings that maximize structural fidelity using a differentiable graph neural network.
Expected Benefits (Quantified)
| Metric | Current (Full HNSW) | With SFGC (50x) | Improvement |
|---|---|---|---|
| Memory footprint | 4.8GB (1M vectors) | 96MB | 50x reduction |
| Transfer bandwidth | 4.8GB | 96MB | 50x reduction |
| Edge device compatibility | Limited to 100K vectors | 5M vectors | 50x capacity |
| Cold start time | 120s | 8s + progressive | 15x faster |
| Search accuracy (recall@10) | 0.95 | 0.92-0.94 | 2-3% degradation |
| Search latency | 1.2ms | 1.5ms (initial), 1.2ms (expanded) | 25% slower → same |
ROI Calculation:
- Edge deployment: enables $500 devices vs $2000 workstations
- Federated learning: 50x faster synchronization (2.4s vs 120s)
- Multi-tenant SaaS: 50x more graphs per server
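The footprint and bandwidth figures above follow from a simple per-vector model. A minimal sketch of that arithmetic (the 1024-d f32 vectors and 64-byte connectivity overhead are illustrative assumptions chosen to land in the same order of magnitude as the 4.8GB figure, not measurements from this codebase):

```rust
// Rough footprint model behind the table above. Dimensions and overhead are
// illustrative assumptions, not measured values.
fn estimate_footprint_bytes(n_vectors: u64, dim: u64, graph_overhead: u64) -> u64 {
    // f32 vector data (4 bytes per component) plus HNSW connectivity metadata
    n_vectors * (dim * 4 + graph_overhead)
}

fn condensed_footprint_bytes(full_bytes: u64, compression_ratio: u64) -> u64 {
    // Condensation shrinks node count and edge lists by the same ratio
    full_bytes / compression_ratio
}
```

For 1M vectors at 1024 dimensions this yields roughly 4.2GB full and 83MB at 50x, consistent with the scale of the table.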
Technical Design
Architecture Diagram (ASCII)
┌─────────────────────────────────────────────────────────────────┐
│ Graph Condensation Pipeline │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌───────────────────────┐ ┌───────────────────────┐
│ Offline Condensation │ │ Online Expansion │
│ (Training) │ │ (Runtime) │
└───────────────────────┘ └───────────────────────┘
│ │
┌───────────┼───────────┐ │
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌─────────┐ ┌─────────────┐
│Cluster │ │Synth │ │Edge │ │Progressive │
│ ing │ │Node │ │Preserv │ │Decompression│
│ │ │Learn │ │ation │ │ │
└────────┘ └────────┘ └─────────┘ └─────────────┘
│ │ │ │
└───────────┼───────────┘ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Condensed │─────────▶│ Hybrid │
│ Graph File │ Load │ Graph Store │
│ (.cgraph) │ │ │
└──────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ Search API │
│ (adaptive) │
└──────────────┘
Component Flow:
1. Offline Condensation (Training Phase):
   - Hierarchical clustering of the original graph
   - GNN-based synthetic node embedding learning
   - Edge weight optimization via a structure preservation loss
   - Export to the .cgraph format
2. Online Expansion (Runtime):
   - Load the condensed graph for fast cold start
   - Progressive decompression on cache misses
   - Adaptive switching between condensed and full graph
Core Data Structures (Rust)
/// Condensed graph representation with synthetic nodes
#[derive(Clone, Debug)]
pub struct CondensedGraph {
    /// Synthetic node embeddings (learned via GNN)
    pub synthetic_nodes: Vec<SyntheticNode>,
    /// Condensed HNSW layers (smaller topology)
    pub condensed_layers: Vec<HnswLayer>,
    /// Compression ratio (e.g., 50.0 for 50x)
    pub compression_ratio: f32,
    /// Mapping from synthetic node to original node IDs
    pub expansion_map: HashMap<NodeId, Vec<NodeId>>,
    /// Graph statistics for adaptive expansion
    pub stats: GraphStatistics,
}
/// Synthetic node representing a cluster of original nodes.
/// Note: `Clone` cannot be derived here because `AtomicU64` is not `Clone`;
/// implement it manually by snapshotting the counter value.
#[derive(Debug)]
pub struct SyntheticNode {
    /// Learned embedding (centroid of cluster, refined via GNN)
    pub embedding: Vec<f32>,
    /// Original node IDs in this cluster
    pub cluster_members: Vec<NodeId>,
    /// Cluster radius (for expansion threshold)
    pub radius: f32,
    /// Connectivity in condensed graph
    pub neighbors: Vec<(NodeId, f32)>, // (neighbor_id, edge_weight)
    /// Access frequency (for adaptive expansion)
    pub access_count: AtomicU64,
}
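The `radius` field above is used later as an expansion threshold. A minimal sketch of how it could be computed as the maximum centroid-to-member distance (`compute_cluster_radius` is the name used by the pseudocode in this plan; the Euclidean helper is an assumption):

```rust
// Euclidean distance between two embeddings.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

// Cluster radius = farthest member from the centroid; results closer than
// ~radius to a query are candidates for expansion.
fn compute_cluster_radius(centroid: &[f32], members: &[Vec<f32>]) -> f32 {
    members
        .iter()
        .map(|m| euclidean(centroid, m))
        .fold(0.0f32, f32::max)
}
```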
/// Configuration for graph condensation process
#[derive(Clone, Debug)]
pub struct CondensationConfig {
    /// Target compression ratio (10-100)
    pub compression_ratio: f32,
    /// Clustering method
    pub clustering_method: ClusteringMethod,
    /// GNN training epochs for synthetic nodes
    pub gnn_epochs: usize,
    /// Structure preservation weight (vs embedding quality)
    pub structure_weight: f32,
    /// Edge preservation strategy
    pub edge_strategy: EdgePreservationStrategy,
}
/// Note: `Debug` must be implemented manually because of the closure variant;
/// the closure is stored in an `Arc` so the enum can still be `Clone`.
#[derive(Clone)]
pub enum ClusteringMethod {
    /// Hierarchical agglomerative clustering
    Hierarchical { linkage: LinkageType },
    /// Louvain modularity-based clustering
    Louvain { resolution: f32 },
    /// Spectral clustering via graph Laplacian
    Spectral { n_components: usize },
    /// Custom clustering function
    Custom(Arc<dyn Fn(&HnswIndex) -> Vec<Vec<NodeId>> + Send + Sync>),
}
#[derive(Clone, Debug)]
pub enum EdgePreservationStrategy {
    /// Keep edges if both endpoints map to different synthetic nodes
    InterCluster,
    /// Weighted by cluster similarity
    WeightedSimilarity,
    /// Learn edge weights via GNN
    Learned,
}
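One way the `WeightedSimilarity` strategy could assign a condensed edge weight, sketched under the assumption that the weight is the centroid cosine similarity scaled by the fraction of surviving inter-cluster edges (both the formula and the function names are illustrative, not a fixed API):

```rust
// Cosine similarity between two cluster centroids.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Condensed edge weight: centroid similarity scaled by how many of the
// possible original inter-cluster edges actually existed.
fn condensed_edge_weight(
    centroid_u: &[f32],
    centroid_v: &[f32],
    inter_cluster_edges: usize,
    possible_edges: usize,
) -> f32 {
    cosine_similarity(centroid_u, centroid_v)
        * (inter_cluster_edges as f32 / possible_edges as f32)
}
```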
/// Hybrid graph store supporting both condensed and full graphs
pub struct HybridGraphStore {
    /// Condensed graph (always loaded)
    condensed: CondensedGraph,
    /// Full graph (lazily loaded regions)
    full_graph: Option<Arc<RwLock<HnswIndex>>>,
    /// Expanded regions cache
    expanded_cache: LruCache<NodeId, ExpandedRegion>,
    /// Expansion policy
    policy: ExpansionPolicy,
}
/// Policy for when to expand condensed nodes to full graph
#[derive(Clone, Debug)]
pub enum ExpansionPolicy {
    /// Never expand (use condensed graph only)
    Never,
    /// Expand on cache miss
    OnDemand { cache_size: usize },
    /// Expand regions with high query frequency
    Adaptive { threshold: f64 },
    /// Always use full graph
    Always,
}
/// Expanded region of the full graph
struct ExpandedRegion {
    /// Full node data for this region
    nodes: Vec<FullNode>,
    /// Last access timestamp
    last_access: Instant,
    /// Access count
    access_count: u64,
}
/// Statistics for monitoring condensation quality
#[derive(Clone, Debug, Default)]
pub struct GraphStatistics {
    /// Average cluster size
    pub avg_cluster_size: f32,
    /// Cluster size variance
    pub cluster_variance: f32,
    /// Edge preservation ratio (condensed edges / original edges)
    pub edge_preservation: f32,
    /// Average path length increase
    pub path_length_delta: f32,
    /// Clustering coefficient preservation
    pub clustering_coef_ratio: f32,
}
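Two of these statistics reduce to simple ratios over raw counts; a sketch of how they could be derived (the helper names are hypothetical):

```rust
// Fraction of original edges that survive condensation, as documented on
// the `edge_preservation` field above.
fn edge_preservation(condensed_edges: usize, original_edges: usize) -> f32 {
    condensed_edges as f32 / original_edges as f32
}

// Average cluster size = original node count over synthetic node count;
// at a 50x compression ratio this should hover near 50.
fn avg_cluster_size(n_original: usize, n_synthetic: usize) -> f32 {
    n_original as f32 / n_synthetic as f32
}
```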
Key Algorithms (Pseudocode)
Algorithm 1: Graph Condensation (Offline Training)
function condense_graph(hnsw_index, config):
    # Step 1: Hierarchical clustering
    clusters = hierarchical_cluster(
        hnsw_index.nodes,
        target_clusters = hnsw_index.size / config.compression_ratio
    )

    # Step 2: Initialize synthetic node embeddings
    synthetic_nodes = []
    for cluster in clusters:
        centroid = compute_centroid(cluster.members)
        synthetic_nodes.append(SyntheticNode {
            embedding: centroid,
            cluster_members: cluster.members,
            radius: compute_cluster_radius(cluster),
            neighbors: [],
            access_count: 0
        })

    # Step 3: Build condensed edges
    condensed_edges = build_condensed_edges(
        hnsw_index,
        clusters,
        config.edge_strategy
    )

    # Step 4: GNN-based refinement
    gnn_model = GraphNeuralNetwork(
        input_dim = embedding_dim,
        hidden_dims = [128, 64],
        output_dim = embedding_dim
    )
    optimizer = Adam(gnn_model.parameters(), lr=0.001)
    for epoch in 1..config.gnn_epochs:
        # Forward pass: refine synthetic embeddings
        refined_embeddings = gnn_model.forward(
            synthetic_nodes.embeddings,
            condensed_edges
        )
        # Compute structure preservation loss
        loss = compute_structure_loss(
            refined_embeddings,
            condensed_edges,
            original_graph = hnsw_index,
            expansion_map = clusters,
            structure_weight = config.structure_weight
        )
        # Backward pass
        loss.backward()
        optimizer.step()
        # Update synthetic embeddings
        for i, node in enumerate(synthetic_nodes):
            node.embedding = refined_embeddings[i]

    # Step 5: Build condensed HNSW layers
    condensed_layers = build_hnsw_layers(
        synthetic_nodes,
        condensed_edges,
        max_layer = hnsw_index.max_layer
    )

    return CondensedGraph {
        synthetic_nodes,
        condensed_layers,
        compression_ratio: config.compression_ratio,
        expansion_map: clusters,
        stats: compute_statistics(...)
    }
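Step 2 above (and the Phase 1 fallback that skips GNN refinement entirely) reduces to a mean over member embeddings. A self-contained Rust sketch of `compute_centroid` as this plan uses the name:

```rust
// Centroid of a cluster: component-wise mean of the member embeddings.
// This is the Phase 1 synthetic-node embedding before any GNN refinement.
fn compute_centroid(members: &[Vec<f32>]) -> Vec<f32> {
    let dim = members[0].len();
    let mut centroid = vec![0.0f32; dim];
    for m in members {
        for (c, x) in centroid.iter_mut().zip(m) {
            *c += x;
        }
    }
    let n = members.len() as f32;
    centroid.iter_mut().for_each(|c| *c /= n);
    centroid
}
```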
function compute_structure_loss(embeddings, edges, original_graph, expansion_map, structure_weight):
    # Part 1: Embedding quality (centroid fidelity)
    embedding_loss = 0
    for i, synthetic_node in enumerate(embeddings):
        cluster_members = expansion_map[i]
        original_embeddings = [original_graph.get_embedding(id) for id in cluster_members]
        true_centroid = mean(original_embeddings)
        embedding_loss += mse(synthetic_node, true_centroid)

    # Part 2: Structure preservation (edge connectivity)
    structure_loss = 0
    for (u, v, weight) in edges:
        # Check how strongly the original graph connected clusters u and v
        cluster_u = expansion_map[u]
        cluster_v = expansion_map[v]
        original_connectivity = compute_inter_cluster_connectivity(
            original_graph, cluster_u, cluster_v
        )
        predicted_connectivity = cosine_similarity(embeddings[u], embeddings[v])
        structure_loss += mse(predicted_connectivity, original_connectivity)

    # Part 3: Topological invariants
    topo_loss = 0
    condensed_clustering_coef = compute_clustering_coefficient(embeddings, edges)
    original_clustering_coef = original_graph.clustering_coefficient
    topo_loss += abs(condensed_clustering_coef - original_clustering_coef)

    return (1 - structure_weight) * embedding_loss +
           structure_weight * (structure_loss + 0.1 * topo_loss)
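The final weighted sum can be isolated as a small helper; the 0.1 factor on the topological term is taken directly from the return expression above, while the function name itself is an assumption for illustration:

```rust
// Weighted combination from compute_structure_loss: structure_weight trades
// centroid fidelity against connectivity preservation, with a fixed 0.1
// factor on the topological term, matching the pseudocode.
fn combined_loss(
    embedding_loss: f32,
    structure_loss: f32,
    topo_loss: f32,
    structure_weight: f32,
) -> f32 {
    (1.0 - structure_weight) * embedding_loss
        + structure_weight * (structure_loss + 0.1 * topo_loss)
}
```

With the `default_50x()` setting of `structure_weight: 0.7`, structure terms dominate the objective by a factor of about 2.3.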
Algorithm 2: Progressive Expansion (Online Runtime)
function search_hybrid_graph(query, k, hybrid_store):
    # Step 1: Search in condensed graph
    condensed_results = search_condensed(
        query,
        hybrid_store.condensed,
        k_initial = k * 2  # oversample
    )

    # Step 2: Decide whether to expand
    if hybrid_store.policy == ExpansionPolicy::Never:
        return refine_condensed_results(condensed_results, k)

    # Step 3: Identify expansion candidates
    expansion_candidates = []
    for result in condensed_results:
        synthetic_node = result.node
        # Expand if: high uncertainty OR cache miss OR high query frequency
        should_expand = (
            result.distance < synthetic_node.radius * 1.5 OR                # uncertainty
            not hybrid_store.expanded_cache.contains(synthetic_node.id) OR  # cache miss
            synthetic_node.access_count.load() > adaptive_threshold         # hot region
        )
        if should_expand:
            expansion_candidates.append(synthetic_node.id)

    # Step 4: Expand regions (lazily load from full graph)
    if len(expansion_candidates) > 0:
        expanded_regions = hybrid_store.expand_regions(expansion_candidates)

        # Step 5: Refine search in expanded regions
        refined_results = []
        for region in expanded_regions:
            local_results = search_full_graph(
                query,
                region.nodes,
                k_local = k
            )
            refined_results.extend(local_results)

        # Merge condensed and expanded results
        all_results = merge_results(condensed_results, refined_results)
        return top_k(all_results, k)
    else:
        # No expansion needed
        return refine_condensed_results(condensed_results, k)
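The three expansion triggers in Step 3 can be collapsed into one predicate. A Rust sketch (the 1.5 uncertainty factor and `adaptive_threshold` are the tunable values from the pseudocode; the function name is hypothetical):

```rust
// Decide whether a condensed result warrants expanding its cluster into
// full-graph nodes: any one of the three triggers from Step 3 suffices.
fn needs_expansion(
    distance: f32,
    radius: f32,
    cached: bool,
    access_count: u64,
    adaptive_threshold: u64,
) -> bool {
    let uncertain = distance < radius * 1.5; // query falls inside the cluster's fuzz zone
    let hot = access_count > adaptive_threshold; // frequently queried region
    uncertain || !cached || hot
}
```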
function expand_regions(hybrid_store, synthetic_node_ids):
    expanded = []
    for node_id in synthetic_node_ids:
        # Check cache first
        if hybrid_store.expanded_cache.contains(node_id):
            expanded.append(hybrid_store.expanded_cache.get(node_id))
            continue

        # Load from full graph (disk or memory)
        synthetic_node = hybrid_store.condensed.synthetic_nodes[node_id]
        cluster_member_ids = synthetic_node.cluster_members
        full_nodes = []
        if hybrid_store.full_graph.is_some():
            # Full graph in memory
            full_graph = hybrid_store.full_graph.unwrap()
            for member_id in cluster_member_ids:
                full_nodes.append(full_graph.get_node(member_id))
        else:
            # Load from disk (mmap)
            full_nodes = load_nodes_from_disk(cluster_member_ids)

        region = ExpandedRegion {
            nodes: full_nodes,
            last_access: now(),
            access_count: 1
        }
        # Add to cache (evict LRU if full)
        hybrid_store.expanded_cache.put(node_id, region)
        expanded.append(region)
    return expanded
API Design (Function Signatures)
// ============================================================
// Public API for Graph Condensation
// ============================================================
pub trait GraphCondensation {
    /// Condense an HNSW index into a smaller graph
    fn condense(
        &self,
        config: CondensationConfig,
    ) -> Result<CondensedGraph, CondensationError>;

    /// Save condensed graph to disk
    fn save_condensed(&self, path: &Path) -> Result<(), io::Error>;

    /// Load condensed graph from disk
    fn load_condensed(path: &Path) -> Result<CondensedGraph, io::Error>;

    /// Validate condensation quality
    fn validate_condensation(
        &self,
        condensed: &CondensedGraph,
        test_queries: &[Vec<f32>],
    ) -> ValidationMetrics;
}
pub trait HybridGraphSearch {
    /// Search using hybrid condensed/full graph
    fn search_hybrid(
        &self,
        query: &[f32],
        k: usize,
        policy: ExpansionPolicy,
    ) -> Result<Vec<SearchResult>, SearchError>;

    /// Adaptive search with automatic expansion
    fn search_adaptive(
        &self,
        query: &[f32],
        k: usize,
        recall_target: f32, // e.g., 0.95
    ) -> Result<Vec<SearchResult>, SearchError>;

    /// Get current cache statistics
    fn cache_stats(&self) -> CacheStatistics;

    /// Preload hot regions into cache
    fn warmup_cache(&mut self, query_log: &[Vec<f32>]) -> Result<(), CacheError>;
}
// ============================================================
// Configuration API
// ============================================================
impl CondensationConfig {
    /// Default configuration for 50x compression
    pub fn default_50x() -> Self {
        Self {
            compression_ratio: 50.0,
            clustering_method: ClusteringMethod::Hierarchical {
                linkage: LinkageType::Ward,
            },
            gnn_epochs: 100,
            structure_weight: 0.7,
            edge_strategy: EdgePreservationStrategy::Learned,
        }
    }

    /// Aggressive compression for edge devices (100x)
    pub fn edge_device() -> Self {
        Self {
            compression_ratio: 100.0,
            clustering_method: ClusteringMethod::Louvain {
                resolution: 1.2,
            },
            gnn_epochs: 50,
            structure_weight: 0.5,
            edge_strategy: EdgePreservationStrategy::InterCluster,
        }
    }

    /// Conservative compression for high accuracy (10x)
    pub fn high_accuracy() -> Self {
        Self {
            compression_ratio: 10.0,
            clustering_method: ClusteringMethod::Spectral {
                n_components: 128,
            },
            gnn_epochs: 200,
            structure_weight: 0.9,
            edge_strategy: EdgePreservationStrategy::Learned,
        }
    }
}
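A caller might select among these three presets by available device memory. A sketch reducing that choice to the compression ratio alone (the memory thresholds are illustrative assumptions, not recommendations from this plan):

```rust
// Map available memory to the compression ratio of the matching preset:
// <=128 MB -> edge_device() (100x), <=1 GB -> default_50x() (50x),
// otherwise high_accuracy() (10x). Thresholds are illustrative.
fn pick_compression_ratio(available_mem_mb: u64) -> f32 {
    match available_mem_mb {
        0..=128 => 100.0,
        129..=1024 => 50.0,
        _ => 10.0,
    }
}
```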
// ============================================================
// Monitoring and Metrics
// ============================================================
#[derive(Clone, Debug)]
pub struct ValidationMetrics {
    /// Recall at different k values
    pub recall_at_k: HashMap<usize, f32>,
    /// Average path length increase
    pub avg_path_length_ratio: f32,
    /// Search latency comparison
    pub latency_ratio: f32,
    /// Memory reduction achieved
    pub memory_reduction: f32,
    /// Graph property preservation
    pub property_preservation: PropertyPreservation,
}

#[derive(Clone, Debug)]
pub struct PropertyPreservation {
    pub clustering_coefficient: f32,
    pub average_degree: f32,
    pub diameter_ratio: f32,
}

#[derive(Clone, Debug)]
pub struct CacheStatistics {
    pub hit_rate: f32,
    pub eviction_count: u64,
    pub avg_expansion_time: Duration,
    pub total_expansions: u64,
}
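The `hit_rate` field would typically be derived from raw hit/miss counters rather than stored directly. A minimal sketch (the struct and method names are assumptions):

```rust
// Raw counters backing CacheStatistics::hit_rate.
struct CacheCounters {
    hits: u64,
    misses: u64,
}

impl CacheCounters {
    // Hit rate over all lookups; defined as 0.0 before any lookup occurs.
    fn hit_rate(&self) -> f32 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f32 / total as f32
        }
    }
}
```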
Integration Points
Affected Crates/Modules
- ruvector-gnn (core GNN crate):
  - Add condensation/ module for graph compression
  - Extend HnswIndex with a condense() method
  - Add GNN training loop for synthetic node refinement
- ruvector-core:
  - Add CondensedGraph serialization format (.cgraph)
  - Extend search API with hybrid search modes
  - Add HybridGraphStore as an alternative index backend
- ruvector-gnn-node (Node.js bindings):
  - Expose condense() API to JavaScript/TypeScript
  - Add configuration builder for condensation parameters
  - Provide progress callbacks for offline condensation
- ruvector-cli:
  - Add ruvector condense command for offline condensation
  - Add ruvector validate-condensed for quality testing
  - Add visualization for condensed graph statistics
- ruvector-distributed:
  - Use condensed graphs for federated learning synchronization
  - Implement condensed graph transfer protocol
  - Add merge logic for condensed graphs from multiple nodes
New Modules to Create
crates/ruvector-gnn/src/condensation/
├── mod.rs # Public API
├── clustering.rs # Hierarchical/Louvain/Spectral clustering
├── synthetic_node.rs # Synthetic node learning via GNN
├── edge_preservation.rs # Edge weight computation
├── gnn_trainer.rs # GNN training loop
├── structure_loss.rs # Loss functions for structure preservation
├── serialization.rs # .cgraph format I/O
└── validation.rs # Quality metrics
crates/ruvector-core/src/hybrid/
├── mod.rs # HybridGraphStore
├── expansion_policy.rs # Adaptive expansion logic
├── cache.rs # LRU cache for expanded regions
└── search.rs # Hybrid search algorithms
crates/ruvector-gnn-node/condensation/
├── bindings.rs # NAPI bindings
└── typescript/
└── condensation.d.ts # TypeScript definitions
Dependencies on Other Features
- Prerequisite: Attention Mechanisms (Tier 1):
  - SFGC uses attention-weighted clustering
  - Synthetic node embeddings benefit from attention-based aggregation
  - Action: ensure the attention module is stable before SFGC integration
- Synergy: Adaptive HNSW (Tier 2, Feature #5):
  - Adaptive HNSW can use the condensed graph for cold start
  - Layer-wise compression ratios (compress higher layers more aggressively)
  - Integration: shared ExpansionPolicy trait
- Optional: Neuromorphic Spiking (Tier 2, Feature #6):
  - Spiking networks can accelerate GNN training for synthetic nodes
  - Integration: conditional compilation flag for the spiking backend
- Complementary: Sparse Attention (Tier 3, Feature #8):
  - Sparse attention patterns can guide clustering
  - Integration: use learned attention masks as clustering hints
Regression Prevention
Existing Functionality at Risk
- HNSW Search Accuracy:
  - Risk: condensed graph returns lower-quality results
  - Mitigation:
    - Validate recall@10 >= 0.92 on standard benchmarks (SIFT1M, GIST1M)
    - Add an A/B testing framework for condensed vs full graph
    - Default to conservative 10x compression
- Memory Safety (Rust):
  - Risk: expansion cache causes use-after-free or data races
  - Mitigation:
    - Use Arc<RwLock<...>> for shared ownership
    - Fuzz testing with ThreadSanitizer
    - Property-based testing with proptest
- Serialization Format Compatibility:
  - Risk: the .cgraph format breaks existing index loading
  - Mitigation:
    - Separate file extension (.cgraph vs .hnsw)
    - Version magic number in the header
    - Fallback to the full graph if condensation fails
- Node.js Bindings Performance:
  - Risk: condensation adds latency to the JavaScript API
  - Mitigation:
    - Make condensation opt-in (separate method)
    - Async/non-blocking condensation API
    - Progress callbacks to avoid blocking the event loop
Test Cases to Prevent Regressions
// Test 1: Search quality preservation
#[test]
fn test_condensed_search_recall() {
    let full_index = build_test_index(10000);
    let condensed = full_index.condense(CondensationConfig::default_50x()).unwrap();
    let test_queries = generate_test_queries(100);
    for query in test_queries {
        let full_results = full_index.search(&query, 10);
        let condensed_results = condensed.search(&query, 10);
        let recall = compute_recall(&full_results, &condensed_results);
        assert!(recall >= 0.92, "Recall dropped below 92%: {}", recall);
    }
}
// Test 2: Memory reduction
#[test]
fn test_memory_footprint() {
    let full_index = build_test_index(100000);
    let condensed = full_index.condense(CondensationConfig::default_50x()).unwrap();
    let full_size = full_index.memory_usage();
    let condensed_size = condensed.memory_usage();
    let reduction = full_size as f32 / condensed_size as f32;
    assert!(reduction >= 40.0, "Memory reduction below 40x: {}", reduction);
}
// Test 3: Serialization round-trip
#[test]
fn test_condensed_serialization() {
    let original = build_test_index(1000).condense(CondensationConfig::default_50x()).unwrap();
    let path = "/tmp/test.cgraph";
    original.save_condensed(Path::new(path)).unwrap();
    let loaded = CondensedGraph::load_condensed(Path::new(path)).unwrap();
    assert_eq!(original.synthetic_nodes.len(), loaded.synthetic_nodes.len());
    assert_eq!(original.compression_ratio, loaded.compression_ratio);
}
// Test 4: Hybrid search correctness
#[test]
fn test_hybrid_search_equivalence() {
    let full_index = build_test_index(5000);
    let condensed = full_index.condense(CondensationConfig::default_50x()).unwrap();
    // Wrap the index in Arc<RwLock> first so it can be shared with the store
    // and still be queried directly afterwards (avoids a use-after-move).
    let full_index = Arc::new(RwLock::new(full_index));
    let hybrid_store = HybridGraphStore::new(condensed, Some(Arc::clone(&full_index)));
    let query = generate_random_query();
    // With ExpansionPolicy::Always, hybrid should match the full graph
    let hybrid_results = hybrid_store.search_hybrid(&query, 10, ExpansionPolicy::Always).unwrap();
    let full_results = full_index.read().unwrap().search(&query, 10);
    assert_eq!(hybrid_results, full_results);
}
// Test 5: Concurrent expansion safety
#[test]
fn test_concurrent_expansion() {
    let hybrid_store = Arc::new(RwLock::new(build_hybrid_store()));
    let handles: Vec<_> = (0..10).map(|_| {
        let store = Arc::clone(&hybrid_store);
        thread::spawn(move || {
            let query = generate_random_query();
            let results = store.write().unwrap().search_hybrid(
                &query, 10, ExpansionPolicy::OnDemand { cache_size: 100 },
            );
            assert!(results.is_ok());
        })
    }).collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
Backward Compatibility Strategy
- API Level:
  - Keep the existing HnswIndex::search() unchanged
  - Add a new HnswIndex::condense() method (opt-in)
  - Condensed search via a separate HybridGraphStore type
- File Format:
  - Condensed graphs use the .cgraph extension
  - Original .hnsw format unchanged
  - Metadata includes version + compression ratio
- Node.js Bindings:
  - Add index.condense(config) method (returns a new CondensedIndex instance)
  - Keep index.search() behavior identical
  - Add condensedIndex.searchHybrid() for hybrid mode
- CLI:
  - ruvector build unchanged (builds the full graph)
  - New ruvector condense command (separate step)
  - Auto-detect .cgraph vs .hnsw on load
Implementation Phases
Phase 1: Core Implementation (Weeks 1-3)
Goals:
- Implement clustering algorithms (hierarchical, Louvain)
- Build basic synthetic node creation (centroid-based, no GNN)
- Implement condensed HNSW layer construction
- Basic serialization (.cgraph format)
Deliverables:
// Week 1: Clustering
crates/ruvector-gnn/src/condensation/clustering.rs
✓ hierarchical_cluster()
✓ louvain_cluster()
✓ spectral_cluster()
// Week 2: Synthetic nodes + edges
crates/ruvector-gnn/src/condensation/synthetic_node.rs
✓ create_synthetic_nodes() // centroid-based
✓ build_condensed_edges()
// Week 3: Condensed graph + serialization
crates/ruvector-gnn/src/condensation/mod.rs
✓ CondensedGraph::from_hnsw()
✓ save_condensed() / load_condensed()
Success Criteria:
- Can condense 100K vector index to 2K synthetic nodes
- Serialization round-trip preserves graph structure
- Unit tests pass for clustering algorithms
Phase 2: Integration (Weeks 4-6)
Goals:
- Integrate with the HnswIndex API
- Implement hybrid search with basic expansion policy
- Node.js bindings
Deliverables:
// Week 4: HNSW integration
crates/ruvector-gnn/src/hnsw/index.rs
✓ impl GraphCondensation for HnswIndex
// Week 5: GNN training
crates/ruvector-gnn/src/condensation/gnn_trainer.rs
✓ train_synthetic_embeddings()
✓ structure_preservation_loss()
// Week 6: Hybrid search
crates/ruvector-core/src/hybrid/
✓ HybridGraphStore::search_hybrid()
✓ ExpansionPolicy::OnDemand
Success Criteria:
- Recall@10 >= 0.90 on SIFT1M benchmark
- GNN training converges in <100 epochs
- Hybrid search passes correctness tests
Phase 3: Optimization (Weeks 7-9)
Goals:
- Performance tuning (SIMD, caching)
- Adaptive expansion policy (query frequency tracking)
- Distributed condensation for federated learning
- CLI tool for offline condensation
Deliverables:
// Week 7: Performance optimization
crates/ruvector-gnn/src/condensation/
✓ SIMD-optimized centroid computation
✓ Parallel clustering (rayon)
// Week 8: Adaptive expansion
crates/ruvector-core/src/hybrid/
✓ ExpansionPolicy::Adaptive
✓ Query frequency tracking
✓ LRU cache tuning
// Week 9: CLI + distributed
crates/ruvector-cli/src/commands/condense.rs
✓ ruvector condense --ratio 50
crates/ruvector-distributed/src/sync.rs
✓ Condensed graph synchronization
Success Criteria:
- Condensation time <10s for 1M vectors
- Adaptive expansion improves latency by 20%+
- CLI can condense production-scale graphs
Phase 4: Production Hardening (Weeks 10-12)
Goals:
- Comprehensive testing (property-based, fuzz, benchmarks)
- Documentation + examples
- Performance regression suite
- Multi-platform validation
Deliverables:
// Week 10: Testing
tests/condensation/
✓ Property-based tests (proptest)
✓ Fuzz testing (cargo-fuzz)
✓ Regression test suite
// Week 11: Documentation
docs/
✓ Graph Condensation Guide (user-facing)
✓ API documentation (rustdoc)
✓ Examples (edge device deployment)
// Week 12: Benchmarks + validation
benches/condensation.rs
✓ Condensation time benchmarks
✓ Search quality benchmarks
✓ Memory footprint benchmarks
Success Criteria:
- 100% code coverage for condensation module
- Passes all regression tests
- Documentation complete with 3+ examples
- Validated on ARM64, x86-64, WASM targets
Success Metrics
Performance Benchmarks
| Benchmark | Metric | Target | Measurement Method |
|---|---|---|---|
| Condensation Time | Time to condense 1M vectors | <10s | cargo bench condense_1m |
| Memory Reduction | Footprint ratio (full/condensed) | 50x | malloc_count |
| Search Latency (condensed only) | p99 latency | <2ms | criterion benchmark |
| Search Latency (hybrid, cold) | p99 latency on first query | <3ms | Cache miss scenario |
| Search Latency (hybrid, warm) | p99 latency after warmup | <1.5ms | Cache hit scenario |
| Expansion Time | Time to expand 1 cluster | <0.5ms | expand_regions() profiling |
Accuracy Metrics
| Dataset | Metric | Target | Baseline (Full Graph) |
|---|---|---|---|
| SIFT1M | Recall@10 (50x compression) | >=0.92 | 0.95 |
| SIFT1M | Recall@100 (50x compression) | >=0.90 | 0.94 |
| GIST1M | Recall@10 (50x compression) | >=0.90 | 0.93 |
| GloVe-200 | Recall@10 (100x compression) | >=0.85 | 0.92 |
| Custom high-dim (1536d) | Recall@10 (50x compression) | >=0.88 | 0.94 |
Memory/Latency Targets
| Configuration | Memory Footprint | Search Latency (p99) | Use Case |
|---|---|---|---|
| Full HNSW (1M vectors) | 4.8GB | 1.2ms | Server deployment |
| Condensed 50x (baseline) | 96MB | 1.5ms (cold), 1.2ms (warm) | Edge device |
| Condensed 100x (aggressive) | 48MB | 2.0ms (cold), 1.5ms (warm) | IoT device |
| Condensed 10x (conservative) | 480MB | 1.3ms (cold), 1.2ms (warm) | Embedded system |
| Hybrid (50x + on-demand) | 96MB + cache | 1.3ms (adaptive) | Mobile app |
Measurement Tools:
- Memory: massif (Valgrind), heaptrack, custom malloc_count
- Latency: criterion (Rust), perf (Linux profiling)
- Accuracy: custom recall calculator against ground truth
Quality Gates
All gates must pass before production release:
- Functional:
  - ✓ All unit tests pass (100% coverage for core logic)
  - ✓ Integration tests pass on 3+ datasets
  - ✓ Serialization round-trip is lossless
- Performance:
  - ✓ Memory reduction >= 40x (for the 50x target config)
  - ✓ Condensation time <= 15s for 1M vectors
  - ✓ Search latency penalty <= 30% (cold start)
- Accuracy:
  - ✓ Recall@10 >= 0.92 on SIFT1M (50x compression)
  - ✓ Recall@10 >= 0.85 on GIST1M (100x compression)
  - ✓ No catastrophic failures (recall < 0.5)
- Compatibility:
  - ✓ Works on Linux x86-64, ARM64, macOS
  - ✓ Node.js bindings pass all tests
  - ✓ Backward compatible with existing indexes
Risks and Mitigations
Technical Risks
Risk 1: GNN Training Instability
Description: Synthetic node embeddings may not converge during GNN training, leading to poor structure preservation.
Probability: Medium (30%)
Impact: High (blocks Phase 2)
Mitigation:
- Fallback: Start with centroid-only embeddings (no GNN) in Phase 1
- Hyperparameter Tuning: Grid search over learning rates (1e-4 to 1e-2)
- Loss Function Design: Add regularization term to prevent mode collapse
- Early Stopping: Monitor validation recall and stop if plateauing
- Alternative: Use pre-trained graph embeddings (Node2Vec, GraphSAGE) if GNN fails
Contingency Plan: If GNN training is unstable after 2 weeks of tuning, fall back to attention-weighted centroids (use existing attention mechanisms from Tier 1).
Risk 2: Cold Start Latency Regression
Description: Condensed graph search may be slower than expected due to poor synthetic node placement.
Probability: Medium (40%)
Impact: Medium (user-facing latency)
Mitigation:
- Profiling: use perf to identify bottlenecks (likely distance computations)
- SIMD Optimization: vectorize distance calculations for synthetic nodes
- Caching: precompute pairwise distances between synthetic nodes
- Pruning: reduce condensed graph connectivity (fewer edges per node)
- Hybrid Strategy: always expand the top-3 synthetic nodes to reduce uncertainty
Contingency Plan: If cold start latency exceeds 2x full graph, add "warm cache" mode that preloads frequently accessed clusters based on query distribution.
Risk 3: Memory Overhead from Expansion Cache
Description: LRU cache for expanded regions may consume more memory than expected, negating compression benefits.
Probability: Low (20%)
Impact: Medium (defeats purpose on edge devices)
Mitigation:
- Adaptive Cache Size: Dynamically adjust cache size based on available memory
- Partial Expansion: Only expand k-nearest neighbors within cluster (not full cluster)
- Compression: Store expanded regions in quantized format (int8 instead of float32)
- Eviction Policy: Evict based on access frequency + recency (LFU + LRU hybrid)
Contingency Plan: If cache overhead exceeds 20% of condensed graph size, make expansion fully on-demand (no caching) and optimize expansion from disk (mmap).
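The quantization mitigation above can be sketched as symmetric scalar int8 quantization of a region's f32 embeddings (4x smaller than float32; the scale convention and function names are illustrative, not an existing API):

```rust
// Symmetric per-vector int8 quantization: map [-max_abs, max_abs] to [-127, 127].
// Returns the quantized codes plus the scale needed to reconstruct values.
fn quantize_i8(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    (v.iter().map(|x| (x / scale).round() as i8).collect(), scale)
}

// Dequantize back to f32 for distance computation after expansion.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}
```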
Risk 4: Clustering Quality for High-Dimensional Data
Description: Hierarchical clustering may produce imbalanced clusters in high-dimensional spaces (curse of dimensionality).
Probability: High (60%)
Impact: Medium (poor compression or accuracy)
Mitigation:
- Dimensionality Reduction: Apply PCA or UMAP before clustering
- Alternative Algorithms: Try spectral clustering or Louvain (graph-based, not distance-based)
- Cluster Validation: Measure silhouette score and reject poor clusterings
- Adaptive Compression: Use variable compression ratios per region (dense regions = higher compression)
Contingency Plan: If clustering quality is poor (silhouette score < 0.3), switch to graph-based Louvain clustering using HNSW edges as adjacency matrix.
Risk 5: Serialization Format Bloat
Description: The .cgraph format may be larger than expected due to storing expansion maps and GNN weights.
Probability: Medium (35%)
Impact: Low (reduces compression benefits)
Mitigation:
- Sparse Storage: use sparse matrix formats (CSR) for expansion maps
- Quantization: store GNN embeddings in int8 (4x smaller than float32)
- Compression: apply zstd compression to the .cgraph file
- Lazy Loading: only load the expansion map on demand (not upfront)
Contingency Plan: if the .cgraph file exceeds 50% of the condensed graph target size, remove GNN weights from the serialization and recompute them on load (trading disk space for CPU time).
Operational Risks
Risk 6: User Confusion with Hybrid API
Description: Users may not understand when to use condensed vs full vs hybrid graphs.
Probability: High (70%)
Impact: Low (documentation issue)
Mitigation:
- Clear Documentation: Add decision tree (edge device → condensed, server → full, mobile → hybrid)
- Smart Defaults: Auto-detect environment (check available memory) and choose policy
- Examples: Provide 3 reference implementations (edge, mobile, server)
- Validation: add a validate_condensed() method that warns if recall is too low
Risk 7: Debugging Difficulty
Description: When condensed search returns wrong results, debugging is harder (no direct mapping to original nodes).
Probability: Medium (50%)
Impact: Medium (developer experience)
Mitigation:
- Logging: add verbose logging for expansion decisions
- Visualization: provide a tool to visualize the condensed graph and its clusters
- Explain API: add an explain_search() method that shows which clusters were searched
- Metrics: expose per-cluster recall metrics
Appendix: Related Research
This design is based on:
- Graph Condensation for GNNs (Jin et al., 2021): Core SFGC algorithm
- Structure-Preserving Graph Coarsening (Loukas, 2019): Topological invariants
- Hierarchical Navigable Small Worlds (Malkov & Yashunin, 2018): HNSW baseline
- Federated Graph Learning (Wu et al., 2022): Distributed graph synchronization
Key differences from prior work:
- Novel: GNN-based synthetic node learning (prior work used simple centroids)
- Novel: Hybrid search with adaptive expansion (prior work only used condensed graph)
- Engineering: Production-ready Rust implementation with SIMD optimization