Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
242
crates/ruvector-hyperbolic-hnsw/README.md
Normal file
242
crates/ruvector-hyperbolic-hnsw/README.md
Normal file
@@ -0,0 +1,242 @@
|
||||
# ruvector-hyperbolic-hnsw
|
||||
|
||||
Hyperbolic (Poincaré ball) embeddings with HNSW integration for hierarchy-aware vector search.
|
||||
|
||||
## Why Hyperbolic Space?
|
||||
|
||||
Hierarchies compress naturally in hyperbolic space. Taxonomies, catalogs, ICD trees, product facets, org charts, and long-tail tags all fit better than in Euclidean space, which means higher recall on deep leaves without blowing up memory or latency.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Poincaré Ball Model**: Store vectors in the Poincaré ball with clamp `r < 1 − eps`
|
||||
- **HNSW Speed Trick**: Prune with cheap tangent-space proxy, rank with true hyperbolic distance
|
||||
- **Per-Shard Curvature**: Different parts of the hierarchy can have different optimal curvatures
|
||||
- **Dual-Space Index**: Keep a synchronized Euclidean ANN for fallback and mutual-ranking fusion
|
||||
- **Production Guardrails**: Numerical stability, canary testing, hot curvature reload
|
||||
|
||||
## Installation
|
||||
|
||||
### Rust
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
ruvector-hyperbolic-hnsw = "0.1.0"
|
||||
```
|
||||
|
||||
### WebAssembly
|
||||
|
||||
```bash
|
||||
cd crates/ruvector-hyperbolic-hnsw-wasm
|
||||
wasm-pack build --target web --release
|
||||
```
|
||||
|
||||
### TypeScript/JavaScript
|
||||
|
||||
```typescript
|
||||
import init, {
|
||||
HyperbolicIndex,
|
||||
poincareDistance,
|
||||
mobiusAdd,
|
||||
expMap,
|
||||
logMap
|
||||
} from 'ruvector-hyperbolic-hnsw-wasm';
|
||||
|
||||
await init();
|
||||
|
||||
const index = new HyperbolicIndex(16, 1.0);
|
||||
index.insert(new Float32Array([0.1, 0.2, 0.3]));
|
||||
const results = index.search(new Float32Array([0.15, 0.1, 0.2]), 5);
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
```rust
|
||||
use ruvector_hyperbolic_hnsw::{HyperbolicHnsw, HyperbolicHnswConfig};
|
||||
|
||||
// Create index with default settings
|
||||
let mut index = HyperbolicHnsw::default_config();
|
||||
|
||||
// Insert vectors (automatically projected to Poincaré ball)
|
||||
index.insert(vec![0.1, 0.2, 0.3]).unwrap();
|
||||
index.insert(vec![-0.1, 0.15, 0.25]).unwrap();
|
||||
index.insert(vec![0.2, -0.1, 0.1]).unwrap();
|
||||
|
||||
// Search for nearest neighbors
|
||||
let results = index.search(&[0.15, 0.1, 0.2], 2).unwrap();
|
||||
for r in results {
|
||||
println!("ID: {}, Distance: {:.4}", r.id, r.distance);
|
||||
}
|
||||
```
|
||||
|
||||
## HNSW Speed Trick
|
||||
|
||||
The core optimization:
|
||||
|
||||
1. Precompute `u = log_c(x)` at a shard centroid `c`
|
||||
2. During neighbor selection, use Euclidean `||u_q - u_p||` to prune
|
||||
3. Run exact Poincaré distance only on top N candidates before final ranking
|
||||
|
||||
```rust
|
||||
use ruvector_hyperbolic_hnsw::{HyperbolicHnsw, HyperbolicHnswConfig};
|
||||
|
||||
let mut config = HyperbolicHnswConfig::default();
|
||||
config.use_tangent_pruning = true;
|
||||
config.prune_factor = 10; // Consider 10x candidates in tangent space
|
||||
|
||||
let mut index = HyperbolicHnsw::new(config);
|
||||
|
||||
// ... insert vectors ...
|
||||
|
||||
// Build tangent cache for pruning optimization
|
||||
index.build_tangent_cache().unwrap();
|
||||
|
||||
// Search with pruning (faster!)
|
||||
let results = index.search_with_pruning(&[0.1, 0.15], 5).unwrap();
|
||||
```
|
||||
|
||||
## Core Mathematical Operations
|
||||
|
||||
```rust
|
||||
use ruvector_hyperbolic_hnsw::poincare::{
|
||||
mobius_add, exp_map, log_map, poincare_distance, project_to_ball
|
||||
};
|
||||
|
||||
let x = vec![0.3, 0.2];
|
||||
let y = vec![-0.1, 0.4];
|
||||
let c = 1.0; // Curvature
|
||||
|
||||
// Möbius addition (hyperbolic vector addition)
|
||||
let z = mobius_add(&x, &y, c);
|
||||
|
||||
// Geodesic distance in hyperbolic space
|
||||
let d = poincare_distance(&x, &y, c);
|
||||
|
||||
// Map to tangent space at x
|
||||
let v = log_map(&y, &x, c);
|
||||
|
||||
// Map back to manifold
|
||||
let y_recovered = exp_map(&v, &x, c);
|
||||
```
|
||||
|
||||
## Sharded Index with Per-Shard Curvature
|
||||
|
||||
```rust
|
||||
use ruvector_hyperbolic_hnsw::{ShardedHyperbolicHnsw, ShardStrategy};
|
||||
|
||||
let mut manager = ShardedHyperbolicHnsw::new(1.0);
|
||||
|
||||
// Insert with hierarchy depth information
|
||||
manager.insert(vec![0.1, 0.2], Some(0)).unwrap(); // Root level
|
||||
manager.insert(vec![0.3, 0.1], Some(3)).unwrap(); // Deeper level
|
||||
|
||||
// Update curvature for specific shard
|
||||
manager.update_curvature("radius_1", 0.5).unwrap();
|
||||
|
||||
// Canary testing for new curvature
|
||||
manager.registry.set_canary("radius_1", 0.3, 10); // 10% traffic
|
||||
|
||||
// Search across all shards
|
||||
let results = manager.search(&[0.2, 0.15], 5).unwrap();
|
||||
```
|
||||
|
||||
## Numerical Stability
|
||||
|
||||
All operations include numerical safeguards:
|
||||
|
||||
- **Norm clamping**: Points projected with `eps = 1e-5`
|
||||
- **Projection after updates**: All operations keep points inside the ball
|
||||
- **Stable acosh**: Uses `log1p` expansions for safety
|
||||
- **Clamp arguments**: `arctanh` and `atanh` arguments bounded away from ±1
|
||||
|
||||
## Evaluation Protocol
|
||||
|
||||
### Datasets
|
||||
|
||||
- WordNet
|
||||
- DBpedia slices
|
||||
- Synthetic scale-free tree
|
||||
- Domain taxonomy
|
||||
|
||||
### Primary Metrics
|
||||
|
||||
- **recall@k** (1, 5, 10)
|
||||
- **Mean rank**
|
||||
- **NDCG**
|
||||
|
||||
### Hierarchy Metrics
|
||||
|
||||
- **Radius vs depth Spearman correlation**
|
||||
- **Distance distortion**
|
||||
- **Ancestor AUPRC**
|
||||
|
||||
### Baselines
|
||||
|
||||
- Euclidean HNSW
|
||||
- OPQ/PQ compressed
|
||||
- Simple mutual-ranking fusion
|
||||
|
||||
### Ablations
|
||||
|
||||
- Tangent proxy vs full hyperbolic
|
||||
- Fixed vs learnable curvature c
|
||||
- Global vs shard centroids
|
||||
|
||||
## Production Integration
|
||||
|
||||
### Reflex Loop (on writes)
|
||||
|
||||
Small Möbius deltas and tangent-space micro updates that never push points outside the ball.
|
||||
|
||||
```rust
|
||||
use ruvector_hyperbolic_hnsw::tangent_micro_update;
|
||||
|
||||
let updated = tangent_micro_update(
|
||||
&point,
|
||||
&delta,
|
||||
¢roid,
|
||||
curvature,
|
||||
0.1 // max step size
|
||||
);
|
||||
```
|
||||
|
||||
### Habit (nightly)
|
||||
|
||||
Riemannian SGD passes to clean neighborhoods and optionally relearn per-shard curvature. Run canary first.
|
||||
|
||||
### Structural (periodic)
|
||||
|
||||
Rebuild of HNSW with true hyperbolic metric, curvature retune, and shard reshuffle if hierarchy preservation drops below SLO.
|
||||
|
||||
## Dependencies (Exact Versions)
|
||||
|
||||
```toml
|
||||
nalgebra = "0.34.1"
|
||||
ndarray = "0.17.1"
|
||||
wasm-bindgen = "0.2.106"
|
||||
```
|
||||
|
||||
## Benchmarks
|
||||
|
||||
```bash
|
||||
cd crates/ruvector-hyperbolic-hnsw
|
||||
cargo bench
|
||||
```
|
||||
|
||||
Benchmark suite includes:
|
||||
|
||||
- Poincaré distance computation
|
||||
- Möbius addition
|
||||
- exp/log map operations
|
||||
- HNSW insert and search
|
||||
- Tangent cache building
|
||||
- Search with vs without pruning
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
||||
|
||||
## Related
|
||||
|
||||
- [ruvector-attention](../ruvector-attention) - Hyperbolic attention mechanisms
|
||||
- [micro-hnsw-wasm](../micro-hnsw-wasm) - Minimal HNSW for WASM
|
||||
- [ruvector-math](../ruvector-math) - General math primitives
|
||||
Reference in New Issue
Block a user