# Degree-Aware Adaptive Precision for HNSW

## Overview

### Problem Statement

Current HNSW implementations use uniform precision (typically f32) for all vectors, regardless of their structural importance in the graph. This leads to significant inefficiencies:

- **Memory Waste**: Low-degree peripheral nodes consume the same memory as critical hub nodes
- **Poor Resource Allocation**: Equal precision for nodes with vastly different connectivity
- **Missed Optimization Opportunities**: High-degree hubs could maintain f32/f64 precision while peripheral nodes use int8/int4
- **Suboptimal Trade-offs**: Global quantization degrades hub quality to save memory on peripheral nodes

In real-world graphs, the degree distribution follows a power law: 80-90% of nodes have low degree (< 10 connections), while 1-5% are high-degree hubs (100+ connections). Current approaches treat all nodes equally.
### Proposed Solution

Implement a **Degree-Aware Adaptive Precision System** that automatically selects the optimal precision for each node based on its degree in the HNSW graph:

**Precision Tiers**:
1. **f32/f64**: High-degree hubs (top 5% by degree)
2. **f16**: Medium-degree nodes (5th-20th percentile from the top)
3. **int8**: Low-degree nodes (20th-80th percentile)
4. **int4**: Peripheral nodes (bottom 20%)

**Key Features**:
- Automatic degree-based precision selection
- Dynamic precision updates as the graph evolves
- Transparent mixed-precision distance computation
- Optimized memory layout for cache efficiency
### Expected Benefits

**Quantified Improvements**:
- **Memory Reduction**: 2-4x total memory savings (50-75% reduction)
  - f32 baseline: 1M vectors × 512 dims × 4 bytes = 2GB
  - Adaptive: ~500MB-1GB (depending on degree distribution)
- **Search Speed**: 1.2-1.5x faster due to better cache utilization
- **Accuracy Preservation**: < 1% recall degradation (hubs maintain full precision)
- **Hub Quality**: 99%+ precision for critical nodes
- **Peripheral Savings**: 8-16x compression for low-degree nodes

**Memory Breakdown** (1M vectors, 512 dims, power-law distribution):
- 5% f32 hubs: 50k × 512 × 4 = 102MB
- 15% f16 medium: 150k × 512 × 2 = 154MB
- 60% int8 low: 600k × 512 × 1 = 307MB
- 20% int4 peripheral: 200k × 512 × 0.5 = 51MB
- **Total: 614MB** (vs. 2GB baseline = **3.26x reduction**)
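
The arithmetic behind this breakdown is easy to re-check with a throwaway Rust snippet; the 5/15/60/20 tier mix and the 512-dimension, 1M-vector setup are the assumptions from the example above, not measurements:

```rust
fn main() {
    let (n, dims) = (1_000_000f64, 512f64);
    // (fraction of nodes, bytes per component) for f32 / f16 / int8 / int4
    let tiers = [(0.05, 4.0), (0.15, 2.0), (0.60, 1.0), (0.20, 0.5)];
    let adaptive_bytes: f64 = tiers.iter().map(|&(p, b)| p * n * dims * b).sum();
    let baseline_bytes = n * dims * 4.0;
    println!("adaptive: {:.0} MB", adaptive_bytes / 1e6);           // ≈ 614 MB
    println!("reduction: {:.2}x", baseline_bytes / adaptive_bytes); // ≈ 3.3x vs. the 2048 MB baseline
}
```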
## Technical Design

### Architecture Diagram

```
┌──────────────────────────────────────────────┐
│               AdaptiveHNSW<T>                │
├──────────────────────────────────────────────┤
│ - degree_threshold_config: DegreeThresholds  │
│ - precision_policy: PrecisionPolicy          │
│ - embeddings: MixedPrecisionStorage          │
│ - degree_index: Vec<(NodeId, Degree)>        │
└──────────────────────────────────────────────┘
                       ▲
                       │
         ┌─────────────┴─────────────┐
         │                           │
┌────────▼───────────────────┐   ┌───▼─────────────────────┐
│   MixedPrecisionStorage    │   │     DegreeAnalyzer      │
├────────────────────────────┤   ├─────────────────────────┤
│ - f32_pool: Vec<Vec<f32>>  │   │ - analyze_degrees()     │
│ - f16_pool: Vec<Vec<f16>>  │   │ - compute_percentiles() │
│ - int8_pool: Vec<QuantVec> │   │ - update_degrees()      │
│ - int4_pool: Vec<QuantVec> │   │ - recommend_precision() │
│ - index_map: HashMap       │   └─────────────────────────┘
│                            │
│ + get_vector()             │   ┌─────────────────────────┐
│ + distance()               │   │     PrecisionPolicy     │
│ + compress()               │   ├─────────────────────────┤
│ + decompress()             │   │ - Static                │
└────────────────────────────┘   │ - Dynamic               │
                                 │ - Hybrid                │
┌───────────────────┐            │ - Custom(fn)            │
│  Distance Engine  │            └─────────────────────────┘
├───────────────────┤
│ f32×f32   → f32   │
│ f32×f16   → f32   │
│ f32×int8  → f32   │
│ int8×int8 → f32   │
│ int4×int4 → f32   │
└───────────────────┘
```
### Core Data Structures

```rust
use std::collections::HashMap;

/// Precision tier for vector storage
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Precision {
    /// Full precision (4 bytes/component)
    F32,

    /// Half precision (2 bytes/component)
    F16,

    /// 8-bit quantized (1 byte/component)
    Int8,

    /// 4-bit quantized (0.5 bytes/component)
    Int4,
}

impl Precision {
    /// Bytes per component
    pub fn bytes_per_component(&self) -> f32 {
        match self {
            Precision::F32 => 4.0,
            Precision::F16 => 2.0,
            Precision::Int8 => 1.0,
            Precision::Int4 => 0.5,
        }
    }

    /// Compression ratio vs. f32
    pub fn compression_ratio(&self) -> f32 {
        4.0 / self.bytes_per_component()
    }
}

/// Degree-based thresholds for precision selection
#[derive(Clone, Debug)]
pub struct DegreeThresholds {
    /// Degree threshold for f32 (e.g., >= 100 connections)
    pub f32_threshold: usize,

    /// Degree threshold for f16 (e.g., >= 20 connections)
    pub f16_threshold: usize,

    /// Degree threshold for int8 (e.g., >= 5 connections)
    pub int8_threshold: usize,

    /// Below this uses int4 (peripheral nodes)
    pub int4_threshold: usize,
}

impl Default for DegreeThresholds {
    fn default() -> Self {
        Self {
            f32_threshold: 50, // Top ~5% of nodes
            f16_threshold: 20, // Next ~15%
            int8_threshold: 5, // Next ~60%
            int4_threshold: 0, // Bottom ~20%
        }
    }
}

/// Policy for precision assignment
pub enum PrecisionPolicy {
    /// Static: Assign precision at index creation, never change
    Static(DegreeThresholds),

    /// Dynamic: Re-evaluate precision periodically
    Dynamic {
        thresholds: DegreeThresholds,
        update_interval: usize, // Re-evaluate every N insertions
    },

    /// Hybrid: Static for existing, dynamic for new nodes
    Hybrid {
        thresholds: DegreeThresholds,
        promotion_threshold: usize, // Promote after N degree increases
    },

    /// Custom: User-defined precision function
    Custom(Box<dyn Fn(usize) -> Precision + Send + Sync>),
}

/// Node metadata for adaptive precision
#[derive(Clone, Debug)]
pub struct NodeMetadata {
    /// Node ID in HNSW graph
    pub id: usize,

    /// Current degree (number of connections)
    pub degree: usize,

    /// Assigned precision tier
    pub precision: Precision,

    /// Storage location (pool index)
    pub storage_offset: usize,

    /// Quantization parameters (if quantized)
    pub quant_params: Option<QuantizationParams>,
}

/// Quantization parameters for int8/int4
#[derive(Clone, Debug)]
pub struct QuantizationParams {
    /// Scale factor (range / 255 or 15)
    pub scale: f32,

    /// Zero point offset
    pub zero_point: f32,

    /// Original min/max for reconstruction
    pub min_val: f32,
    pub max_val: f32,
}

/// Mixed-precision storage pools
pub struct MixedPrecisionStorage {
    /// Full precision vectors
    f32_pool: Vec<Vec<f32>>,

    /// Half precision vectors
    f16_pool: Vec<Vec<half::f16>>,

    /// 8-bit quantized vectors
    int8_pool: Vec<Vec<i8>>,

    /// 4-bit quantized vectors (packed 2 per byte)
    int4_pool: Vec<Vec<u8>>,

    /// Node metadata index
    nodes: Vec<NodeMetadata>,

    /// Quick lookup: node_id -> metadata index
    node_index: HashMap<usize, usize>,

    /// Vector dimension
    dimension: usize,
}

/// Adaptive HNSW index with mixed precision
pub struct AdaptiveHNSW {
    /// Storage for vectors
    storage: MixedPrecisionStorage,

    /// HNSW graph structure (layers, connections)
    graph: HNSWGraph,

    /// Precision assignment policy
    policy: PrecisionPolicy,

    /// Degree thresholds
    thresholds: DegreeThresholds,

    /// Statistics
    stats: AdaptiveStats,
}

/// Statistics for adaptive precision
#[derive(Default, Debug)]
pub struct AdaptiveStats {
    /// Count by precision tier
    pub precision_counts: HashMap<Precision, usize>,

    /// Total memory used (bytes)
    pub total_memory: usize,

    /// Memory by precision
    pub memory_by_precision: HashMap<Precision, usize>,

    /// Number of precision promotions
    pub promotions: usize,

    /// Number of precision demotions
    pub demotions: usize,

    /// Average degree by precision
    pub avg_degree_by_precision: HashMap<Precision, f32>,
}
```
### Key Algorithms

#### Algorithm 1: Precision Selection Based on Degree

```pseudocode
function select_precision(degree: usize, thresholds: DegreeThresholds) -> Precision:
    if degree >= thresholds.f32_threshold:
        return Precision::F32
    else if degree >= thresholds.f16_threshold:
        return Precision::F16
    else if degree >= thresholds.int8_threshold:
        return Precision::Int8
    else:
        return Precision::Int4

function auto_calibrate_thresholds(degrees: Vec<usize>) -> DegreeThresholds:
    // Sort degrees to compute percentiles
    sorted = degrees.sorted()
    n = sorted.len()

    // Top 5% get f32
    f32_threshold = sorted[n * 95 / 100]

    // 5-20% get f16
    f16_threshold = sorted[n * 80 / 100]

    // 20-80% get int8
    int8_threshold = sorted[n * 20 / 100]

    // Bottom 20% get int4
    int4_threshold = 0

    return DegreeThresholds {
        f32_threshold,
        f16_threshold,
        int8_threshold,
        int4_threshold,
    }
```
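For reference, a minimal Rust sketch of the same two routines, reusing the `Precision` and `DegreeThresholds` types defined above; the percentile indices mirror the 5/15/60/20 split and are illustrative, not tuned values:

```rust
fn select_precision(degree: usize, t: &DegreeThresholds) -> Precision {
    if degree >= t.f32_threshold {
        Precision::F32
    } else if degree >= t.f16_threshold {
        Precision::F16
    } else if degree >= t.int8_threshold {
        Precision::Int8
    } else {
        Precision::Int4
    }
}

fn auto_calibrate_thresholds(mut degrees: Vec<usize>) -> DegreeThresholds {
    degrees.sort_unstable();
    let n = degrees.len();
    // Degree value at the p-th percentile (index clamped for small inputs)
    let pct = |p: usize| degrees[(n * p / 100).min(n.saturating_sub(1))];
    DegreeThresholds {
        f32_threshold: pct(95), // top 5% stay at full precision
        f16_threshold: pct(80), // next 15%
        int8_threshold: pct(20), // middle 60%
        int4_threshold: 0,      // bottom 20%
    }
}
```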
#### Algorithm 2: Mixed-Precision Distance Computation

```pseudocode
function mixed_precision_distance(
    a: &NodeMetadata,
    b: &NodeMetadata,
    storage: &MixedPrecisionStorage,
) -> f32:
    // Fetch vectors in their native precision
    vec_a = storage.get_vector(a)
    vec_b = storage.get_vector(b)

    // Determine computation precision (use the higher of the two)
    compute_precision = max(a.precision, b.precision)

    match (a.precision, b.precision):
        // Both high precision: direct computation
        (F32, F32):
            return cosine_distance_f32(vec_a, vec_b)

        // Mixed f32/f16: promote f16 to f32
        (F32, F16) | (F16, F32):
            vec_a_f32 = to_f32(vec_a)
            vec_b_f32 = to_f32(vec_b)
            return cosine_distance_f32(vec_a_f32, vec_b_f32)

        // Both f16: compute in f16, convert result
        (F16, F16):
            dist_f16 = cosine_distance_f16(vec_a, vec_b)
            return f32(dist_f16)

        // Quantized: decompress to f32
        (Int8 | Int4, _) | (_, Int8 | Int4):
            vec_a_f32 = dequantize(vec_a, a.quant_params)
            vec_b_f32 = dequantize(vec_b, b.quant_params)
            return cosine_distance_f32(vec_a_f32, vec_b_f32)

// Optimized: avoid decompression for int8×int8.
// With the scheme from Algorithm 4, x_i ≈ scale_a * u_a[i] + zero_point_a,
// where u_a[i] = a[i] + 128 ∈ [0, 255]; expanding the dot product gives
//   x·y = s_a*s_b*Σ(u_a*u_b) + s_a*z_b*Σu_a + s_b*z_a*Σu_b + n*z_a*z_b
function int8_dot_product_fast(a: &[i8], b: &[i8],
                               params_a: &QuantizationParams,
                               params_b: &QuantizationParams) -> f32:
    // Shift back to unsigned and accumulate in int32 to avoid overflow
    u_a = [i32(x) + 128 for x in a]
    u_b = [i32(x) + 128 for x in b]
    dot_int = Σ u_a[i] * u_b[i]

    // Rescale to the original space
    s_a = params_a.scale; z_a = params_a.zero_point
    s_b = params_b.scale; z_b = params_b.zero_point

    return s_a * s_b * f32(dot_int)
         + s_a * z_b * f32(sum(u_a))
         + s_b * z_a * f32(sum(u_b))
         + z_a * z_b * f32(a.len())
```
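A runnable Rust rendition of that fast path, checked against the dequantize-then-multiply reference; the `(scale, zero_point)` tuples stand in for the `QuantizationParams` struct, and the sample values are arbitrary:

```rust
use std::iter::zip;

/// int8×int8 dot product without dequantizing; (scale, zero_point) per vector.
fn int8_dot_product_fast(a: &[i8], b: &[i8], (sa, za): (f32, f32), (sb, zb): (f32, f32)) -> f32 {
    let (mut dot, mut sum_a, mut sum_b) = (0i64, 0i64, 0i64);
    for (&qa, &qb) in zip(a, b) {
        let (ua, ub) = (qa as i64 + 128, qb as i64 + 128); // back to [0, 255]
        dot += ua * ub;
        sum_a += ua;
        sum_b += ub;
    }
    sa * sb * dot as f32
        + sa * zb * sum_a as f32
        + sb * za * sum_b as f32
        + za * zb * a.len() as f32
}

fn main() {
    let (pa, pb) = ((0.02f32, -1.0f32), (0.01f32, -0.5f32));
    let a: Vec<i8> = vec![-128, -1, 0, 64, 127];
    let b: Vec<i8> = vec![127, 5, -7, -128, 33];
    // Reference: dequantize each component, then take the ordinary dot product
    let deq = |q: &[i8], (s, z): (f32, f32)| -> Vec<f32> {
        q.iter().map(|&v| (v as i32 + 128) as f32 * s + z).collect()
    };
    let naive: f32 = zip(deq(&a, pa), deq(&b, pb)).map(|(x, y)| x * y).sum();
    let fast = int8_dot_product_fast(&a, &b, pa, pb);
    assert!((naive - fast).abs() < 1e-4);
    println!("naive = {naive:.6}, fast = {fast:.6}");
}
```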
#### Algorithm 3: Dynamic Precision Update

```pseudocode
function update_precision_dynamic(
    node_id: usize,
    new_degree: usize,
    storage: &mut MixedPrecisionStorage,
    policy: &PrecisionPolicy,
) -> Option<PrecisionChange>:
    metadata = storage.get_metadata(node_id)
    old_precision = metadata.precision

    // Compute the new recommended precision
    new_precision = select_precision(new_degree, policy.thresholds)

    if new_precision == old_precision:
        return None // No change needed

    // Decide whether to actually change
    match policy:
        Dynamic { update_interval, .. }:
            if storage.insertions_since_last_update < update_interval:
                return None // Wait for the next update cycle

        Hybrid { promotion_threshold, .. }:
            degree_delta = int(new_degree) - int(metadata.degree) // signed change
            if new_precision is less precise than old_precision:
                // Demotion: only if the degree dropped significantly
                if degree_delta > -int(promotion_threshold):
                    return None
            else:
                // Promotion: only after a sustained degree increase
                if degree_delta < int(promotion_threshold):
                    return None

    // Perform the precision change
    old_vector = storage.get_vector(&metadata)

    // Convert precision
    new_vector = match (old_precision, new_precision):
        (F32, F16):
            old_vector.map(|x| f16::from_f32(x))

        (F32, Int8) | (F16, Int8):
            quantize_int8(old_vector)

        (F32, Int4) | (F16, Int4) | (Int8, Int4):
            quantize_int4(old_vector)

        (Int8, F32) | (Int4, F32):
            dequantize(old_vector, metadata.quant_params)

        (Int8, F16):
            dequantize_to_f16(old_vector, metadata.quant_params)

    // Update storage
    storage.move_vector(node_id, old_precision, new_precision, new_vector)

    return Some(PrecisionChange {
        node_id,
        old_precision,
        new_precision,
        memory_delta: calculate_memory_delta(old_precision, new_precision),
    })
```
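The promotion/demotion hysteresis can be isolated into a small helper; a sketch using the `Precision` enum defined earlier, whose derived ordering puts `F32` first (so "smaller" means more precise):

```rust
/// Decide whether a node should actually move tiers under the Hybrid policy.
fn should_change_precision(
    old_degree: usize,
    new_degree: usize,
    old_precision: Precision,
    new_precision: Precision,
    promotion_threshold: usize,
) -> bool {
    if new_precision == old_precision {
        return false;
    }
    let promoting = new_precision < old_precision; // moving toward F32
    if promoting {
        // Promote only after a sustained degree increase
        new_degree >= old_degree + promotion_threshold
    } else {
        // Demote only after a comparable degree drop
        old_degree >= new_degree + promotion_threshold
    }
}
```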
#### Algorithm 4: Quantization with Optimal Parameters

```pseudocode
function quantize_int8(vector: &[f32]) -> (Vec<i8>, QuantizationParams):
    // Find min/max
    min_val = vector.min()
    max_val = vector.max()

    // Compute scale and zero point
    range = max_val - min_val
    scale = range / 255.0 // guard against range == 0 in a real implementation
    zero_point = min_val

    // Quantize
    quantized = Vec::new()
    for x in vector:
        // Map [min, max] → [0, 255] → [-128, 127]
        normalized = (x - zero_point) / scale
        clamped = clamp(round(normalized), 0.0, 255.0)
        quantized.push(i8(clamped - 128))

    params = QuantizationParams {
        scale,
        zero_point,
        min_val,
        max_val,
    }

    return (quantized, params)

function dequantize_int8(quantized: &[i8], params: &QuantizationParams) -> Vec<f32>:
    result = Vec::new()
    for q in quantized:
        // Map [-128, 127] → [0, 255] → [min, max]
        normalized = f32(i32(q) + 128)
        value = normalized * params.scale + params.zero_point
        result.push(value)

    return result
```
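A compact Rust version of the same round trip, with a local `QuantizationParams` so the snippet stands alone; the `f32::EPSILON` floor on the scale is an added guard, not part of the pseudocode:

```rust
#[derive(Clone, Debug)]
struct QuantizationParams { scale: f32, zero_point: f32, min_val: f32, max_val: f32 }

fn quantize_int8(vector: &[f32]) -> (Vec<i8>, QuantizationParams) {
    let min_val = vector.iter().copied().fold(f32::INFINITY, f32::min);
    let max_val = vector.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = ((max_val - min_val) / 255.0).max(f32::EPSILON); // guard zero range
    let quantized = vector
        .iter()
        .map(|&x| {
            let q = ((x - min_val) / scale).round().clamp(0.0, 255.0) as i32;
            (q - 128) as i8 // stored in [-128, 127]
        })
        .collect();
    (quantized, QuantizationParams { scale, zero_point: min_val, min_val, max_val })
}

fn dequantize_int8(q: &[i8], p: &QuantizationParams) -> Vec<f32> {
    q.iter().map(|&v| (v as i32 + 128) as f32 * p.scale + p.zero_point).collect()
}

fn main() {
    let original = vec![1.0_f32, 2.0, 3.0, -1.0, -2.0];
    let (q, p) = quantize_int8(&original);
    let recon = dequantize_int8(&q, &p);
    for (o, rec) in original.iter().zip(recon.iter()) {
        // Worst-case round-trip error is half a quantization step (scale / 2)
        assert!((o - rec).abs() <= p.scale * 0.5 + 1e-6);
    }
    println!("round-trip ok: {:?}", recon);
}
```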
### API Design

```rust
// Public API
pub mod adaptive {
    use super::*;

    /// Create adaptive HNSW index with automatic precision selection
    pub fn build_adaptive_index<T: Float>(
        embeddings: &[Vec<T>],
        config: AdaptiveConfig,
    ) -> Result<AdaptiveHNSW, Error>;

    /// Configuration for adaptive precision
    #[derive(Clone)]
    pub struct AdaptiveConfig {
        /// HNSW parameters
        pub hnsw_params: HNSWParams,

        /// Precision policy
        pub policy: PrecisionPolicy,

        /// Degree thresholds (None = auto-calibrate)
        pub thresholds: Option<DegreeThresholds>,

        /// Enable dynamic precision updates
        pub dynamic_updates: bool,
    }

    /// Search with adaptive precision
    pub fn search<T: Float>(
        index: &AdaptiveHNSW,
        query: &[T],
        k: usize,
        ef: usize,
    ) -> Vec<SearchResult>;

    /// Get memory statistics
    pub fn memory_stats(index: &AdaptiveHNSW) -> AdaptiveStats;

    /// Analyze degree distribution and recommend thresholds
    pub fn recommend_thresholds(
        degrees: &[usize],
        target_memory_ratio: f32, // e.g., 0.5 for 2x compression
    ) -> DegreeThresholds;
}

// Advanced API for fine-grained control
pub mod precision {
    /// Manually set precision for a node
    pub fn set_node_precision(
        index: &mut AdaptiveHNSW,
        node_id: usize,
        precision: Precision,
    ) -> Result<(), Error>;

    /// Get current precision for a node
    pub fn get_node_precision(
        index: &AdaptiveHNSW,
        node_id: usize,
    ) -> Precision;

    /// Bulk update precisions based on new degree information
    pub fn bulk_update_precisions(
        index: &mut AdaptiveHNSW,
        updates: Vec<(usize, usize)>, // (node_id, new_degree)
    ) -> Vec<PrecisionChange>;

    /// Export precision assignment for analysis
    pub fn export_precision_map(
        index: &AdaptiveHNSW,
    ) -> HashMap<usize, (Precision, usize)>; // node_id -> (precision, degree)
}
```
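A hypothetical end-to-end usage of this API; the names, defaults, and `HNSWParams::default()` are assumptions drawn from the signatures above, not an existing ruvector interface:

```rust
use adaptive::{build_adaptive_index, memory_stats, search, AdaptiveConfig};
// (supporting types such as PrecisionPolicy, DegreeThresholds, HNSWParams,
//  and Error are assumed to be in scope)

fn example(embeddings: &[Vec<f32>], query: &[f32]) -> Result<(), Error> {
    let config = AdaptiveConfig {
        hnsw_params: HNSWParams::default(),                // assumed constructor
        policy: PrecisionPolicy::Static(DegreeThresholds::default()),
        thresholds: None,                                  // None = auto-calibrate
        dynamic_updates: false,
    };

    let index = build_adaptive_index(embeddings, config)?;
    let results = search(&index, query, 10, 50);           // k = 10, ef = 50
    let stats = memory_stats(&index);
    println!("hits: {}, total memory: {} bytes", results.len(), stats.total_memory);
    Ok(())
}
```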
## Integration Points

### Affected Crates/Modules

1. **ruvector-hnsw** (Major Changes)
   - Modify `HNSWIndex` to support mixed-precision storage
   - Update distance computation in search
   - Add degree tracking and analysis
   - Modify serialization format

2. **ruvector-quantization** (Moderate Changes)
   - Extract quantization logic into separate crate
   - Add f16 support (using `half` crate)
   - Add int4 packed quantization
   - Implement optimized int8×int8 distance

3. **ruvector-core** (Minor Changes)
   - Add `Precision` enum to core types
   - Update `Distance` trait for mixed-precision

4. **ruvector-gnn-node** (Minor Changes)
   - Add TypeScript bindings for adaptive configuration
   - Expose memory statistics to JavaScript
### New Modules to Create

```
crates/ruvector-adaptive/
├── src/
│   ├── lib.rs                  # Public API
│   ├── precision/
│   │   ├── mod.rs              # Precision management
│   │   ├── policy.rs           # Precision policies
│   │   └── selection.rs        # Degree-based selection
│   ├── storage/
│   │   ├── mod.rs              # Mixed-precision storage
│   │   ├── pools.rs            # Separate precision pools
│   │   ├── metadata.rs         # Node metadata
│   │   └── layout.rs           # Memory layout optimization
│   ├── quantization/
│   │   ├── mod.rs              # Quantization utilities
│   │   ├── int8.rs             # 8-bit quantization
│   │   ├── int4.rs             # 4-bit quantization
│   │   └── f16.rs              # Half-precision
│   ├── distance/
│   │   ├── mod.rs              # Mixed-precision distance
│   │   ├── dispatcher.rs       # Dispatch based on precision
│   │   └── optimized.rs        # SIMD optimizations
│   ├── hnsw/
│   │   ├── mod.rs              # Adaptive HNSW
│   │   ├── index.rs            # AdaptiveHNSW struct
│   │   ├── search.rs           # Mixed-precision search
│   │   └── update.rs           # Dynamic precision updates
│   └── analysis/
│       ├── degree.rs           # Degree analysis
│       ├── thresholds.rs       # Threshold calibration
│       └── stats.rs            # Statistics and reporting
├── tests/
│   ├── precision_tests.rs      # Precision selection
│   ├── quantization_tests.rs   # Quantization accuracy
│   ├── search_tests.rs         # Search correctness
│   └── memory_tests.rs         # Memory usage
├── benches/
│   ├── distance_bench.rs       # Distance computation
│   ├── search_bench.rs         # Search performance
│   └── memory_bench.rs         # Memory efficiency
└── Cargo.toml
```
### Dependencies on Other Features

- **Synergies**:
  - **Hyperbolic Embeddings** (Feature 4): Different precision for Euclidean vs. hyperbolic components
  - **Attention Mechanisms** (Existing): Attention hubs may correlate with high degree
  - **Temporal GNN** (Feature 6): Precision may evolve as node importance changes over time

- **Conflicts**:
  - **Global Quantization**: Cannot use both global and adaptive quantization simultaneously
## Regression Prevention

### What Existing Functionality Could Break

1. **Search Accuracy**
   - Risk: Quantization introduces approximation errors
   - Impact: 1-5% recall degradation

2. **Distance Metric Properties**
   - Risk: Mixed-precision may violate metric axioms (triangle inequality)
   - Impact: Rare edge cases in graph construction

3. **Serialization**
   - Risk: Complex multi-pool storage format
   - Impact: Backward incompatibility

4. **Performance**
   - Risk: Precision dispatch overhead
   - Impact: 5-10% latency increase for small vectors
### Test Cases to Prevent Regressions

```rust
#[cfg(test)]
mod regression_tests {
    use super::*;

    #[test]
    fn test_pure_f32_mode_exact_match() {
        // All nodes at f32 should match non-adaptive exactly
        let config = AdaptiveConfig {
            thresholds: Some(DegreeThresholds {
                f32_threshold: 0, // Force all to f32
                ..Default::default()
            }),
            ..Default::default()
        };

        let adaptive_index = build_adaptive_index(&embeddings, config).unwrap();
        let standard_index = build_standard_index(&embeddings).unwrap();

        // Search results should be identical
        let adaptive_results = search(&adaptive_index, &query, 10, 50);
        let standard_results = search(&standard_index, &query, 10, 50);

        assert_eq!(adaptive_results, standard_results);
    }

    #[test]
    fn test_recall_degradation_acceptable() {
        // Recall should not drop below 95%
        let adaptive_index = build_adaptive_index(&embeddings, default_config()).unwrap();
        let ground_truth = brute_force_search(&embeddings, &queries);

        let recall = compute_recall(&adaptive_index, &queries, &ground_truth, 10);
        assert!(recall >= 0.95, "Recall {} below threshold 0.95", recall);
    }

    #[test]
    fn test_hub_precision_preserved() {
        // High-degree nodes must maintain f32 precision
        let index = build_adaptive_index(&embeddings, default_config()).unwrap();

        for node in index.high_degree_nodes() {
            let precision = get_node_precision(&index, node.id);
            assert_eq!(precision, Precision::F32,
                "Hub node {} has precision {:?}, expected F32",
                node.id, precision);
        }
    }

    #[test]
    fn test_quantization_reconstruction_error() {
        // Reconstruction error should be bounded
        let original = vec![1.0_f32, 2.0, 3.0, -1.0, -2.0];
        let (quantized, params) = quantize_int8(&original);
        let reconstructed = dequantize_int8(&quantized, &params);

        for (orig, recon) in original.iter().zip(reconstructed.iter()) {
            let error = (orig - recon).abs();
            let relative_error = error / orig.abs().max(1e-6);
            assert!(relative_error < 0.02,
                "Reconstruction error {} > 2%", relative_error);
        }
    }

    #[test]
    fn test_mixed_precision_distance_commutative() {
        // distance(a, b) should equal distance(b, a)
        let dist_ab = mixed_precision_distance(&node_a, &node_b, &storage);
        let dist_ba = mixed_precision_distance(&node_b, &node_a, &storage);

        assert!((dist_ab - dist_ba).abs() < 1e-5);
    }
}
```
### Backward Compatibility Strategy

1. **Feature Flag**
   ```toml
   [features]
   default = ["standard-precision"]
   adaptive-precision = []
   ```

2. **Automatic Migration**
   ```rust
   pub fn migrate_to_adaptive(
       standard_index: &HNSWIndex,
       config: AdaptiveConfig,
   ) -> Result<AdaptiveHNSW, Error> {
       // Analyze degree distribution
       let degrees = standard_index.compute_degrees();
       let thresholds = recommend_thresholds(&degrees, 0.5);

       // Re-encode vectors with the appropriate precision
       // Preserve graph structure
   }
   ```

3. **Dual Format Support**
   ```rust
   enum IndexFormat {
       Standard,
       Adaptive,
   }

   pub fn deserialize(path: &Path) -> Result<Index, Error> {
       let format = detect_format(path)?;
       match format {
           IndexFormat::Standard => load_standard(path),
           IndexFormat::Adaptive => load_adaptive(path),
       }
   }
   ```
## Implementation Phases

### Phase 1: Core Implementation (Weeks 1-2)

**Goal**: Implement precision selection and mixed-precision storage

**Tasks**:
1. Create `ruvector-adaptive` crate
2. Implement `Precision` enum and `DegreeThresholds`
3. Build `MixedPrecisionStorage` with separate pools
4. Implement quantization (int8, int4, f16)
5. Add degree analysis utilities
6. Write unit tests for precision selection

**Deliverables**:
- Working mixed-precision storage
- Quantization with < 2% reconstruction error
- Degree analysis and threshold calibration

**Success Criteria**:
- All precision conversions invertible (up to quantization error)
- Memory usage matches theoretical estimates
- Degree-based selection working correctly

### Phase 2: Integration (Weeks 3-4)

**Goal**: Integrate adaptive precision with HNSW

**Tasks**:
1. Modify HNSW search to support mixed precision
2. Implement mixed-precision distance computation
3. Add precision update mechanisms
4. Implement serialization/deserialization
5. Create migration tool from standard HNSW

**Deliverables**:
- Functioning `AdaptiveHNSW` index
- Mixed-precision search
- Backward-compatible serialization

**Success Criteria**:
- Search recall >= 95%
- Migration from standard HNSW works
- Serialization round-trip preserves precision

### Phase 3: Optimization (Weeks 5-6)

**Goal**: Optimize performance and memory layout

**Tasks**:
1. SIMD optimization for int8×int8 distance
2. Cache-friendly memory layout (separate pools → interleaved)
3. Parallel precision updates
4. Benchmark vs. standard HNSW
5. Profile and optimize hotspots

**Deliverables**:
- SIMD-accelerated distance computation
- Optimized memory layout
- Performance benchmarks

**Success Criteria**:
- 2-4x memory reduction achieved
- Search latency within 1.2x of standard
- int8×int8 distance < 1µs (SIMD)

### Phase 4: Production Hardening (Weeks 7-8)

**Goal**: Production-ready with monitoring and documentation

**Tasks**:
1. Add monitoring and statistics
2. Write comprehensive documentation
3. Create example applications
4. Performance tuning for different workloads
5. Create deployment guide

**Deliverables**:
- API documentation
- Example applications (e-commerce search, recommendation)
- Production deployment guide
- Monitoring dashboards

**Success Criteria**:
- Documentation completeness > 90%
- Examples demonstrate 2-4x memory savings
- Zero P0/P1 bugs
## Success Metrics

### Performance Benchmarks

**Memory Targets**:
- Overall compression: 2-4x vs. f32 baseline
- f32 pool: 5-10% of nodes (hubs)
- f16 pool: 10-20% of nodes
- int8 pool: 50-70% of nodes
- int4 pool: 10-30% of nodes (peripherals)

**Latency Targets**:
- int8×int8 distance: < 1.0µs (SIMD), < 2.0µs (scalar)
- Mixed-precision distance: < 3.0µs (worst case)
- Search latency overhead: < 20% vs. standard
- Precision update: < 100µs per node

**Throughput Targets**:
- Distance computation: > 300k pairs/sec (mixed)
- Search QPS: > 1500 (8 threads, with adaptive precision)

### Accuracy Metrics

**Recall Targets**:
- Top-10 recall @ ef=50: >= 95%
- Top-100 recall @ ef=200: >= 97%
- Hub recall (f32 nodes): >= 99%

**Quantization Error**:
- int8 reconstruction: < 2% relative error
- int4 reconstruction: < 5% relative error
- f16 reconstruction: < 0.1% relative error

**Distance Approximation**:
- int8×int8 vs. f32×f32: < 3% error
- Mixed precision: < 2% error

### Memory/Latency Targets

**Memory Breakdown** (1M vectors, 512 dims, power-law):
- Baseline (f32): 2.0GB
- Adaptive: 0.5-1.0GB
- Metadata overhead: < 50MB
- Total savings: 50-75%

**Latency Breakdown**:
- Vector fetch: 40% of time
- Distance computation: 45% of time
- Precision dispatch: < 5% of time
- Other: 10% of time

**Scalability**:
- Linear memory scaling to 10M vectors
- Sub-linear to 100M vectors (due to power-law distribution)
## Risks and Mitigations

### Technical Risks

**Risk 1: Recall Degradation Beyond Acceptable Threshold**
- **Severity**: High
- **Impact**: Poor search quality, user complaints
- **Probability**: Medium
- **Mitigation**:
  - Conservative default thresholds (more nodes at f32)
  - Automatic threshold calibration with recall targets
  - Per-query precision promotion (boost precision for important queries)
  - Continuous monitoring and alerts

**Risk 2: Complex Mixed-Precision Bugs**
- **Severity**: High
- **Impact**: Incorrect results, crashes
- **Probability**: Medium
- **Mitigation**:
  - Extensive property-based testing
  - Reference implementation (pure f32) for validation
  - Fuzzing with random precision combinations
  - Clear invariants and assertions

**Risk 3: Memory Layout Inefficiency**
- **Severity**: Medium
- **Impact**: Cache misses, slower than expected
- **Probability**: Medium
- **Mitigation**:
  - Profile-guided layout optimization
  - Interleaved storage for locality
  - Prefetching hints
  - Benchmark different layouts

**Risk 4: Precision Update Overhead**
- **Severity**: Medium
- **Impact**: Slow dynamic updates, blocking inserts
- **Probability**: Low
- **Mitigation**:
  - Batch updates amortize cost
  - Async background updates
  - Lazy evaluation (defer until next access)
  - Update rate limiting

**Risk 5: Quantization Parameter Drift**
- **Severity**: Low
- **Impact**: Accumulated errors over time
- **Probability**: Low
- **Mitigation**:
  - Periodic re-quantization with updated parameters
  - Track quantization age
  - Automatic re-quantization triggers
  - Monitor reconstruction error distribution

**Risk 6: Poor Performance with Non-Power-Law Graphs**
- **Severity**: Medium
- **Impact**: Limited applicability, low adoption
- **Probability**: Medium
- **Mitigation**:
  - Detect degree distribution at index creation
  - Warn if savings will be minimal
  - Provide fallback to standard HNSW
  - Document ideal use cases

### Mitigation Summary Table

| Risk | Mitigation Strategy | Owner | Timeline |
|------|---------------------|-------|----------|
| Recall degradation | Conservative defaults + monitoring | Quality team | Phase 2 |
| Mixed-precision bugs | Property testing + fuzzing | Core team | Phase 1-2 |
| Memory inefficiency | Layout profiling + optimization | Perf team | Phase 3 |
| Update overhead | Batch + async updates | Core team | Phase 2 |
| Parameter drift | Periodic re-quantization | Maintenance | Post-v1 |
| Non-power-law graphs | Distribution detection + warnings | Product team | Phase 4 |

---
## References

1. **Han et al. (2015)**: "Deep Compression: Compressing DNNs with Pruning, Trained Quantization and Huffman Coding"
2. **Jacob et al. (2018)**: "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
3. **Guo et al. (2020)**: "GRIP: Graph Representation Learning with Induced Precision"
4. **Malkov & Yashunin (2018)**: "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs"
## Appendix: Degree Distribution Analysis

### Power-Law Distribution

Most real-world graphs follow a power-law degree distribution:
```
P(k) ∝ k^(-γ)
```

where γ is typically 2-3.

### Example Distribution (1M nodes, γ=2.5)

| Degree Range | % of Nodes | Recommended Precision | Memory per Node (512 dims) |
|--------------|------------|-----------------------|----------------------------|
| >= 100       | 5%         | f32                   | 2048 bytes                 |
| 20-99        | 15%        | f16                   | 1024 bytes                 |
| 5-19         | 60%        | int8                  | 512 bytes                  |
| < 5          | 20%        | int4                  | 256 bytes                  |

**Total Memory**: 614MB (vs. 2GB baseline = **69.3% savings**)

### Calibration Formula

Given a target compression ratio `R`:
```
Σ(p_i × N × m_i) = M_baseline / R

where:
  N          = number of vectors
  p_i        = fraction of nodes at precision i
  m_i        = memory per node at precision i
  M_baseline = baseline memory (all f32, N × m_f32)
```

Solve for the threshold percentiles that achieve the target `R`.
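
For example, holding the f32/f16 hub fractions fixed at the 5%/15% of the example above, the int8/int4 split required for a target `R` follows directly from this equation; a small sketch in which the fractions and the 3.26x target are illustrative assumptions:

```rust
/// Solve for the int8 fraction, given fixed f32/f16 fractions and a target
/// compression ratio R. Returns None if the target is unreachable with this mix.
fn solve_int8_fraction(p_f32: f64, p_f16: f64, r_target: f64) -> Option<f64> {
    // Per-node memory relative to f32 (f32, f16, int8, int4)
    let (m32, m16, m8, m4) = (1.0, 0.5, 0.25, 0.125);
    let budget = 1.0 / r_target;     // Σ p_i · m_i must not exceed this
    let rest = 1.0 - p_f32 - p_f16;  // mass split between int8 and int4
    // budget = p_f32·m32 + p_f16·m16 + x·m8 + (rest − x)·m4, solved for x
    let x = (budget - p_f32 * m32 - p_f16 * m16 - rest * m4) / (m8 - m4);
    (0.0..=rest).contains(&x).then_some(x)
}

fn main() {
    // With 5% f32 hubs and 15% f16, a 3.26x target needs roughly 65% int8 / 15% int4.
    if let Some(p_int8) = solve_int8_fraction(0.05, 0.15, 3.26) {
        println!("int8 fraction: {:.2}, int4 fraction: {:.2}", p_int8, 0.80 - p_int8);
    }
}
```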