# Feature 15: Entangled Subspace Attention (ESA)

## Overview

### Problem Statement
Traditional attention mechanisms operate in a single semantic space, which limits their ability to capture multi-faceted relationships between nodes. Complex graph data often exhibits several concurrent semantic dimensions (e.g., structural similarity, functional similarity, temporal correlation) that a single, unified attention computation cannot represent adequately.

### Proposed Solution
Entangled Subspace Attention (ESA) decomposes the attention computation into multiple separately parameterized subspaces, each capturing a distinct semantic aspect of node relationships. The subspace-specific attention scores are optionally coupled through a learned entanglement matrix and then merged via learned mixing weights, allowing the model to adaptively combine different semantic perspectives.

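As a compact restatement of the computation described above, the following is a sketch rather than a definitive formalization. The symbols $W_i$, $E$, $m$, $\alpha$, $\tau$ and the $\sqrt{d_s}$ scaling are notation assumed here to mirror the forward-pass pseudocode later in this document; $q$ is the query, $k_j$ the $j$-th candidate key, $S$ the number of subspaces, $d$ the embedding dimension, and $d_s$ the subspace dimension.

$$
\begin{aligned}
s_i^{(j)} &= \frac{(W_i q)^\top (W_i k_j)}{\sqrt{d_s}}, & W_i &\in \mathbb{R}^{d_s \times d} \\
\tilde{s}_i^{(j)} &= \sum_{l=1}^{S} E_{il}\, s_l^{(j)}, & E &\in \mathbb{R}^{S \times S} \\
\alpha &= \operatorname{softmax}(m / \tau), & \alpha &\in \mathbb{R}^{S} \\
a^{(j)} &= \sum_{i=1}^{S} \alpha_i\, \tilde{s}_i^{(j)}
\end{aligned}
$$

Here $m$ stands for the raw mixing scores produced by whichever mixing strategy is configured, and $a^{(j)}$ is the final score used for top-k selection. With entanglement disabled, $E$ is effectively the identity and $\tilde{s}_i = s_i$.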
### Expected Benefits
- **Multi-aspect Reasoning**: Projected 40-60% improvement in capturing complex, multi-dimensional relationships
- **Interpretability**: Each subspace exposes a specific semantic aspect that can be inspected on its own
- **Adaptability**: Learned mixing weights adapt to the query context
- **Robustness**: Redundancy across subspaces is projected to improve noise resistance by 25-35%
- **Performance**: Projected 15-20% accuracy improvement on heterogeneous graphs

### Novelty Claim
**Unique Contribution**: First GNN architecture to implement quantum-inspired entangled subspaces with dynamic mixing for attention computation. Unlike multi-head attention (whose heads run in parallel without explicit semantic separation), ESA enforces an explicit semantic decomposition and learns entanglement relationships between subspaces.

**Differentiators**:
1. Explicit semantic subspace allocation (vs. implicit in multi-head)
2. Cross-subspace entanglement modeling
3. Query-adaptive mixing with uncertainty quantification
4. Hierarchical subspace organization

## Technical Design

### Architecture Diagram

```
                    Query Vector (q)
                           |
         +-----------------+-----------------+
         |                 |                 |
    Subspace 1        Subspace 2        Subspace 3
   (Structural)      (Functional)       (Temporal)
         |                 |                 |
     Project_1         Project_2         Project_3
         |                 |                 |
    Attention_1       Attention_2       Attention_3
         |                 |                 |
      Score_1           Score_2           Score_3
         |                 |                 |
         +--------+--------+--------+--------+
                           |
                  Entanglement Matrix
                           |
                    Mixing Network
                           |
                     Mixed Weights
                           |
                  Weighted Combination
                           |
                 Final Attention Score
                           |
                     Top-k Results


Subspace Detail:
+------------------+
| Subspace_i       |
|                  |
| +--------------+ |
| | Projection   | |
| | W_i: d -> d_s| |
| +--------------+ |
|        |         |
| +--------------+ |
| | Attention    | |
| | K_i, V_i     | |
| +--------------+ |
|        |         |
| +--------------+ |
| | Output       | |
| | score_i      | |
| +--------------+ |
+------------------+
```

### Core Data Structures

```rust
use ndarray::{Array1, Array2};

/// Configuration for entangled subspace attention
#[derive(Debug, Clone)]
pub struct ESAConfig {
    /// Number of independent subspaces
    pub num_subspaces: usize,

    /// Dimension of each subspace
    pub subspace_dim: usize,

    /// Original embedding dimension
    pub embed_dim: usize,

    /// Enable cross-subspace entanglement
    pub enable_entanglement: bool,

    /// Mixing strategy: learned, uniform, adaptive, or attention-based
    pub mixing_strategy: MixingStrategy,

    /// Temperature for mixing softmax
    pub mixing_temperature: f32,

    /// Enable hierarchical subspace organization
    pub hierarchical: bool,
}

/// Semantic subspace definition
#[derive(Debug, Clone)]
pub struct SemanticSubspace {
    /// Unique identifier
    pub id: usize,

    /// Semantic category (structural, functional, temporal, etc.)
    pub semantic_type: SubspaceType,

    /// Projection matrix: embed_dim -> subspace_dim
    /// Shape: [subspace_dim, embed_dim]
    pub projection: Array2<f32>,

    /// Learned attention parameters for this subspace
    pub attention_params: AttentionParams,

    /// Subspace-specific normalization
    pub layer_norm: LayerNorm,

    /// Weight in final mixing (learned)
    pub mixing_weight: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SubspaceType {
    Structural,                // Graph topology patterns
    Functional,                // Feature similarity
    Temporal,                  // Time-based relationships
    Semantic,                  // Content-based similarity
    Hybrid(Vec<SubspaceType>), // Composite subspace
}

/// Entanglement matrix between subspaces
#[derive(Debug, Clone)]
pub struct EntanglementMatrix {
    /// Cross-subspace correlation matrix
    /// Shape: [num_subspaces, num_subspaces]
    pub correlations: Array2<f32>,

    /// Learned entanglement strengths
    /// Shape: [num_subspaces, num_subspaces]
    pub entanglement_weights: Array2<f32>,

    /// Last update timestamp
    pub last_updated: std::time::Instant,
}

/// Mixing network for combining subspace outputs
#[derive(Debug)]
pub struct MixingNetwork {
    /// Input: concatenated subspace scores
    pub input_dim: usize,

    /// Hidden layers for mixing computation
    pub hidden_layers: Vec<DenseLayer>,

    /// Output: mixing weights per subspace
    pub output_layer: DenseLayer,

    /// Dropout for regularization
    pub dropout: f32,

    /// Activation function
    pub activation: ActivationType,
}

/// Complete ESA layer
pub struct EntangledSubspaceAttention {
    /// Configuration
    config: ESAConfig,

    /// All semantic subspaces
    subspaces: Vec<SemanticSubspace>,

    /// Entanglement relationships
    entanglement: EntanglementMatrix,

    /// Mixing network
    mixer: MixingNetwork,

    /// Query-adaptive context encoder
    context_encoder: ContextEncoder,

    /// Metrics tracking
    metrics: ESAMetrics,

    /// Per-subspace scores from the most recent forward pass
    /// (read by the entanglement update during training)
    last_subspace_scores: Vec<Array1<f32>>,

    /// Mixing weights from the most recent forward pass
    /// (returned by `get_mixing_weights`)
    last_mixing_weights: Vec<f32>,
}

#[derive(Debug, Clone)]
pub struct AttentionParams {
    /// Key projection in subspace
    pub key_proj: Array2<f32>,

    /// Value projection in subspace
    pub value_proj: Array2<f32>,

    /// Attention scale factor
    pub scale: f32,
}

#[derive(Debug, Default)]
pub struct ESAMetrics {
    /// Subspace usage statistics
    pub subspace_usage: Vec<usize>,

    /// Average mixing weights over time
    pub avg_mixing_weights: Vec<f32>,

    /// Entanglement strength evolution
    pub entanglement_history: Vec<Array2<f32>>,

    /// Query processing times per subspace
    pub processing_times: Vec<std::time::Duration>,
}

#[derive(Debug, Clone)]
pub enum MixingStrategy {
    /// Learned neural network mixing
    Learned,

    /// Uniform weights across subspaces
    Uniform,

    /// Query-adaptive weights
    Adaptive,

    /// Attention-based mixing
    AttentionBased,
}

/// Context encoder for query-adaptive mixing
#[derive(Debug)]
pub struct ContextEncoder {
    /// Encode query into context vector
    pub encoder: DenseLayer,

    /// Context vector dimension
    pub context_dim: usize,

    /// Layer normalization
    pub layer_norm: LayerNorm,
}

#[derive(Debug, Clone)]
pub struct DenseLayer {
    pub weights: Array2<f32>,
    pub bias: Array1<f32>,
}

// `Clone` is required because `SemanticSubspace` derives `Clone`
#[derive(Debug, Clone)]
pub struct LayerNorm {
    pub gamma: Array1<f32>,
    pub beta: Array1<f32>,
    pub eps: f32,
}

#[derive(Debug, Clone)]
pub enum ActivationType {
    ReLU,
    GELU,
    Tanh,
    Sigmoid,
}
```

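The regression tests later in this document call `ESAConfig::default()`, which the structures above do not yet provide. The following is a minimal sketch of how defaults and basic validation might look. `ConfigSketch`, `validate`, and `projection_params` are hypothetical names introduced only for illustration; the default values (128-dimensional embeddings, three subspaces, `subspace_dim = embed_dim / 3`, temperature 1.0) are assumptions taken from the builder defaults and test inputs shown elsewhere in this document.

```rust
/// Hypothetical, trimmed-down mirror of `ESAConfig` (numeric fields only).
#[derive(Debug, Clone, PartialEq)]
struct ConfigSketch {
    num_subspaces: usize,
    subspace_dim: usize,
    embed_dim: usize,
    mixing_temperature: f32,
}

impl Default for ConfigSketch {
    fn default() -> Self {
        // Assumed defaults: 128-dim embeddings split across three subspaces.
        let embed_dim = 128;
        Self {
            num_subspaces: 3,
            subspace_dim: embed_dim / 3,
            embed_dim,
            mixing_temperature: 1.0,
        }
    }
}

impl ConfigSketch {
    /// Reject configurations that would make the forward pass ill-defined.
    fn validate(&self) -> Result<(), String> {
        if self.num_subspaces == 0 {
            return Err("num_subspaces must be at least 1".to_string());
        }
        if self.subspace_dim == 0 || self.embed_dim == 0 {
            return Err("dimensions must be non-zero".to_string());
        }
        if !self.mixing_temperature.is_finite() || self.mixing_temperature <= 0.0 {
            return Err("mixing_temperature must be finite and positive".to_string());
        }
        Ok(())
    }

    /// Number of f32 values held by the projection matrices alone.
    fn projection_params(&self) -> usize {
        self.num_subspaces * self.subspace_dim * self.embed_dim
    }
}
```

Counting projection parameters this way gives a quick lower bound on the memory a configuration needs before any attention parameters or mixing-network weights are added.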
### Key Algorithms

#### 1. ESA Forward Pass

```rust
/// Pseudocode for entangled subspace attention computation.
/// `context_encoder`, `subspaces`, `entanglement`, and `mixer` refer to
/// the fields of `EntangledSubspaceAttention`.
fn forward(
    query: Array1<f32>,      // Query vector [embed_dim]
    key_set: Array2<f32>,    // Candidate keys [n_candidates, embed_dim]
    value_set: Array2<f32>,  // Candidate values [n_candidates, embed_dim]
    config: ESAConfig,
    k: usize                 // Number of results to return
) -> (Vec<usize>, Array1<f32>) {

    // Step 1: Encode query context for adaptive mixing
    let context = context_encoder.encode(query); // [context_dim]

    // Step 2: Compute attention in each subspace
    let mut subspace_scores = Vec::new();
    let mut subspace_attn = Vec::new();

    for subspace in subspaces.iter() {
        // Project query to subspace
        let q_proj = subspace.projection.dot(&query); // [subspace_dim]

        // Project keys to subspace
        let k_proj = key_set.dot(&subspace.projection.t()); // [n_candidates, subspace_dim]

        // Compute attention scores in subspace
        let scores = compute_attention_scores(
            q_proj,
            k_proj,
            subspace.attention_params.scale
        ); // [n_candidates]

        // Apply softmax for probabilistic interpretation
        // (computed before `scores` is moved into the vector)
        let attn = softmax(&scores);
        subspace_attn.push(attn);

        subspace_scores.push(scores);
    }

    // Step 3: Apply entanglement matrix
    if config.enable_entanglement {
        subspace_scores = apply_entanglement(
            subspace_scores,
            entanglement.entanglement_weights
        );
    }

    // Step 4: Compute mixing weights
    let mixing_weights = match config.mixing_strategy {
        MixingStrategy::Learned => {
            // Concatenate subspace info + context
            let mixer_input = concatenate([
                flatten(subspace_scores),
                context
            ]);

            // Pass through mixing network
            mixer.forward(mixer_input) // [num_subspaces]
        },
        MixingStrategy::Uniform => {
            uniform_weights(config.num_subspaces)
        },
        MixingStrategy::Adaptive => {
            attention_based_mixing(subspace_attn, context)
        },
        MixingStrategy::AttentionBased => {
            query_key_mixing(query, subspace_scores)
        }
    };

    // Apply temperature scaling
    let mixing_weights = softmax(
        mixing_weights / config.mixing_temperature
    );

    // Step 5: Weighted combination of subspace scores
    let final_scores = weighted_sum(subspace_scores, mixing_weights);

    // Step 6: Top-k selection
    let top_k_indices = argsort_topk(final_scores, k);
    let top_k_scores = gather(final_scores, top_k_indices);

    // Step 7: Update metrics
    update_metrics(mixing_weights, subspace_scores);

    return (top_k_indices, top_k_scores);
}

/// Compute attention scores using scaled dot-product
fn compute_attention_scores(
    query: Array1<f32>,  // [subspace_dim]
    keys: Array2<f32>,   // [n_candidates, subspace_dim]
    scale: f32           // expected to hold the subspace dimension d_s
) -> Array1<f32> {
    // Scaled dot-product attention: divide by sqrt(scale)
    let scores = keys.dot(&query); // [n_candidates]
    return scores / scale.sqrt();
}

/// Apply entanglement between subspaces
fn apply_entanglement(
    subspace_scores: Vec<Array1<f32>>,  // [num_subspaces][n_candidates]
    entanglement_weights: Array2<f32>   // [num_subspaces, num_subspaces]
) -> Vec<Array1<f32>> {

    let num_subspaces = subspace_scores.len();
    let n_candidates = subspace_scores[0].len();

    // Convert to matrix: [num_subspaces, n_candidates]
    let score_matrix = stack(subspace_scores);

    // Apply entanglement: E * S
    let entangled_matrix = entanglement_weights.dot(&score_matrix);

    // Convert back to vector of arrays
    return unstack(entangled_matrix);
}

/// Attention-based mixing scores (unnormalized)
fn attention_based_mixing(
    subspace_attn: Vec<Array1<f32>>,  // [num_subspaces][n_candidates]
    context: Array1<f32>              // [context_dim]
) -> Array1<f32> {

    let mut mixing_scores = Vec::new();

    for attn in subspace_attn.iter() {
        // Measure entropy of attention distribution
        let entropy = -sum(attn * log(attn + 1e-10));

        // Measure peak sharpness
        let sharpness = max(attn) - mean(attn);

        // Combine into mixing score
        let score = entropy * 0.5 + sharpness * 0.5;
        mixing_scores.push(score);
    }

    // Return raw scores; forward() applies the temperature-scaled
    // softmax exactly once, so normalizing here would flatten the weights
    return Array1::from(mixing_scores);
}
```

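The pseudocode above relies on helpers such as `softmax`, `weighted_sum`, and `argsort_topk` that are not defined in this document. As a minimal sketch of what they might look like, the following uses plain slices and the standard library only (no `ndarray`); the function names and signatures are assumptions made for illustration, and the softmax subtracts the maximum logit so that very large inputs do not overflow.

```rust
/// Numerically stable softmax with temperature scaling.
/// Subtracting the maximum logit keeps `exp` from overflowing.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&x| ((x - max) / temperature).exp())
        .collect();
    let total: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / total).collect()
}

/// Combine per-subspace scores ([num_subspaces][n_candidates]) into one
/// score per candidate using the mixing weights ([num_subspaces]).
fn weighted_sum(subspace_scores: &[Vec<f32>], weights: &[f32]) -> Vec<f32> {
    let n_candidates = subspace_scores.first().map_or(0, |s| s.len());
    let mut combined = vec![0.0_f32; n_candidates];
    for (scores, &w) in subspace_scores.iter().zip(weights) {
        for (acc, &s) in combined.iter_mut().zip(scores) {
            *acc += w * s;
        }
    }
    combined
}

/// Indices of the `k` highest scores, best first.
/// The sort is stable, so ties keep their original candidate order.
fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..scores.len()).collect();
    indices.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    indices.truncate(k);
    indices
}
```

A full sort is the simplest way to express top-k selection; for large candidate sets a partial selection would avoid sorting entries that are never returned.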
#### 2. Entanglement Matrix Update

```rust
use ndarray::{Array1, Array2};
use std::time::Instant;

/// Update entanglement matrix based on subspace correlations
fn update_entanglement(
    subspace_scores: &[Array1<f32>],  // Recent subspace outputs
    entanglement: &mut EntanglementMatrix,
    learning_rate: f32
) {

    let num_subspaces = subspace_scores.len();

    // Compute correlation matrix between subspaces
    let mut correlations = Array2::<f32>::zeros((num_subspaces, num_subspaces));

    for i in 0..num_subspaces {
        for j in i..num_subspaces {
            // Pearson correlation
            let corr = pearson_correlation(
                &subspace_scores[i],
                &subspace_scores[j]
            );

            correlations[[i, j]] = corr;
            correlations[[j, i]] = corr;
        }
    }

    // Update entanglement weights with EMA.
    // Operate on references so `correlations` can still be stored below.
    let alpha = learning_rate;
    entanglement.entanglement_weights =
        &correlations * alpha + &entanglement.entanglement_weights * (1.0 - alpha);

    // Store the latest correlations
    entanglement.correlations = correlations;
    entanglement.last_updated = Instant::now();
}

/// Compute Pearson correlation coefficient
fn pearson_correlation(x: &Array1<f32>, y: &Array1<f32>) -> f32 {
    let n = x.len() as f32;
    let mean_x = x.mean().unwrap();
    let mean_y = y.mean().unwrap();

    let cov = ((x - mean_x) * (y - mean_y)).sum() / n;
    let std_x = ((x - mean_x).mapv(|v| v * v).sum() / n).sqrt();
    let std_y = ((y - mean_y).mapv(|v| v * v).sum() / n).sqrt();

    // Epsilon guards against division by zero for constant score vectors
    return cov / (std_x * std_y + 1e-10);
}
```

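To make the update rule concrete, here is a small standard-library sketch of the same two steps: a Pearson correlation between two score vectors, followed by one exponential-moving-average step. It uses nested `Vec`s in place of `ndarray` matrices, and the names `pearson` and `ema_update` are illustrative assumptions rather than part of the proposed API.

```rust
/// Pearson correlation of two equal-length score vectors.
/// The small epsilon mirrors the zero-variance guard used above.
fn pearson(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "score vectors must have equal length");
    let n = x.len() as f32;
    let mean_x = x.iter().sum::<f32>() / n;
    let mean_y = y.iter().sum::<f32>() / n;
    let mut cov = 0.0_f32;
    let mut var_x = 0.0_f32;
    let mut var_y = 0.0_f32;
    for (&a, &b) in x.iter().zip(y) {
        cov += (a - mean_x) * (b - mean_y);
        var_x += (a - mean_x).powi(2);
        var_y += (b - mean_y).powi(2);
    }
    // The 1/n factors of covariance and standard deviations cancel out.
    cov / (var_x.sqrt() * var_y.sqrt() + 1e-10)
}

/// One exponential-moving-average step on a row-major square matrix:
/// w <- alpha * c + (1 - alpha) * w
fn ema_update(weights: &mut [Vec<f32>], correlations: &[Vec<f32>], alpha: f32) {
    for (w_row, c_row) in weights.iter_mut().zip(correlations) {
        for (w, &c) in w_row.iter_mut().zip(c_row) {
            *w = alpha * c + (1.0 - alpha) * *w;
        }
    }
}
```

With a small `alpha` such as the 0.01 used in the training loop, a single noisy batch moves the entanglement weights only slightly, which is the smoothing behaviour an EMA is chosen for.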
#### 3. Training Algorithm

```rust
/// Train ESA parameters.
/// Pseudocode: `initialize_esa`, `Adam`, `backward`, and the loss helpers
/// are placeholders for the training infrastructure.
fn train_esa(
    training_data: Vec<(Array1<f32>, Array2<f32>, Vec<usize>)>, // (query, candidates, labels)
    config: ESAConfig,
    num_epochs: usize,
    learning_rate: f32,
    k: usize  // Top-k cutoff passed to the forward pass
) -> EntangledSubspaceAttention {

    let mut esa = initialize_esa(config);
    let mut optimizer = Adam::new(learning_rate);

    for epoch in 0..num_epochs {
        let mut total_loss = 0.0;

        for (query, candidates, ground_truth) in training_data.iter() {
            // Forward pass
            let (predictions, scores) = esa.forward(query, candidates, k);

            // Compute loss (ranking loss + diversity loss)
            let ranking_loss = compute_ranking_loss(predictions, ground_truth);
            let diversity_loss = compute_diversity_loss(&esa.subspaces);
            let entanglement_regularization = compute_entanglement_reg(&esa.entanglement);

            let loss = ranking_loss
                + 0.1 * diversity_loss
                + 0.01 * entanglement_regularization;

            // Backward pass
            let gradients = backward(loss);

            // Update parameters
            optimizer.step(&mut esa.parameters(), gradients);

            // Update entanglement matrix
            update_entanglement(
                &esa.last_subspace_scores,
                &mut esa.entanglement,
                0.01
            );

            total_loss += loss;
        }

        println!("Epoch {}: Loss = {}", epoch, total_loss / training_data.len() as f32);
    }

    return esa;
}

/// Diversity loss encourages subspaces to learn different features
fn compute_diversity_loss(subspaces: &[SemanticSubspace]) -> f32 {
    let mut diversity_loss = 0.0;
    let num_subspaces = subspaces.len();

    // With fewer than two subspaces there are no pairs to compare
    if num_subspaces < 2 {
        return 0.0;
    }

    for i in 0..num_subspaces {
        for j in (i+1)..num_subspaces {
            // Measure similarity between projection matrices
            let similarity = cosine_similarity(
                &flatten(subspaces[i].projection),
                &flatten(subspaces[j].projection)
            );

            // Penalize high similarity (want diverse subspaces)
            diversity_loss += similarity.abs();
        }
    }

    // Average over the n * (n - 1) / 2 unordered pairs visited above
    let num_pairs = num_subspaces * (num_subspaces - 1) / 2;
    return diversity_loss / num_pairs as f32;
}
```

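The diversity loss can be exercised in isolation. The sketch below assumes each projection matrix has already been flattened into a `Vec<f32>`; `cosine_similarity` and `diversity_loss` are illustrative stand-ins for the helpers named in the pseudocode, not the final implementation. It averages over the unordered pairs and returns zero when there is nothing to compare.

```rust
/// Cosine similarity between two flattened projection matrices.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b + 1e-10)
}

/// Mean absolute cosine similarity over all unordered pairs of projections.
/// Returns 0.0 when fewer than two subspaces exist (no pairs to compare).
fn diversity_loss(projections: &[Vec<f32>]) -> f32 {
    let n = projections.len();
    if n < 2 {
        return 0.0;
    }
    let mut total = 0.0_f32;
    for i in 0..n {
        for j in (i + 1)..n {
            total += cosine_similarity(&projections[i], &projections[j]).abs();
        }
    }
    let num_pairs = (n * (n - 1) / 2) as f32;
    total / num_pairs
}
```

Identical projections give a loss near 1.0 and mutually orthogonal projections give a loss near 0.0, so minimizing this term pushes the subspaces apart.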
### API Design

```rust
/// Public API for Entangled Subspace Attention
pub trait ESALayer {
    /// Create new ESA layer with configuration
    fn new(config: ESAConfig) -> Self
    where
        Self: Sized;

    /// Forward pass: compute attention and return top-k results
    fn forward(
        &mut self,
        query: &[f32],
        candidates: &[Vec<f32>],
        k: usize
    ) -> Result<(Vec<usize>, Vec<f32>), ESAError>;

    /// Forward pass with full attention scores
    fn forward_full(
        &mut self,
        query: &[f32],
        candidates: &[Vec<f32>]
    ) -> Result<Vec<f32>, ESAError>;

    /// Get subspace-specific attention scores for interpretability
    fn get_subspace_scores(
        &self,
        query: &[f32],
        candidates: &[Vec<f32>]
    ) -> Result<Vec<Vec<f32>>, ESAError>;

    /// Get mixing weights for last query
    fn get_mixing_weights(&self) -> &[f32];

    /// Update entanglement matrix
    fn update_entanglement(&mut self, learning_rate: f32);

    /// Get metrics
    fn get_metrics(&self) -> &ESAMetrics;

    /// Reset metrics
    fn reset_metrics(&mut self);

    /// Save model
    fn save(&self, path: &str) -> Result<(), ESAError>;

    /// Load model
    fn load(path: &str) -> Result<Self, ESAError>
    where
        Self: Sized;
}

/// Error types
#[derive(Debug, thiserror::Error)]
pub enum ESAError {
    #[error("Dimension mismatch: expected {expected}, got {actual}")]
    DimensionMismatch { expected: usize, actual: usize },

    #[error("Invalid configuration: {0}")]
    InvalidConfig(String),

    #[error("Computation error: {0}")]
    ComputationError(String),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

/// Builder pattern for ESA configuration
pub struct ESAConfigBuilder {
    num_subspaces: usize,
    subspace_dim: usize,
    embed_dim: usize,
    enable_entanglement: bool,
    mixing_strategy: MixingStrategy,
    mixing_temperature: f32,
    hierarchical: bool,
}

impl ESAConfigBuilder {
    pub fn new(embed_dim: usize) -> Self {
        Self {
            num_subspaces: 3,
            subspace_dim: embed_dim / 3,
            embed_dim,
            enable_entanglement: true,
            mixing_strategy: MixingStrategy::Learned,
            mixing_temperature: 1.0,
            hierarchical: false,
        }
    }

    pub fn num_subspaces(mut self, n: usize) -> Self {
        self.num_subspaces = n;
        self
    }

    pub fn subspace_dim(mut self, dim: usize) -> Self {
        self.subspace_dim = dim;
        self
    }

    pub fn enable_entanglement(mut self, enable: bool) -> Self {
        self.enable_entanglement = enable;
        self
    }

    pub fn mixing_strategy(mut self, strategy: MixingStrategy) -> Self {
        self.mixing_strategy = strategy;
        self
    }

    pub fn build(self) -> ESAConfig {
        ESAConfig {
            num_subspaces: self.num_subspaces,
            subspace_dim: self.subspace_dim,
            embed_dim: self.embed_dim,
            enable_entanglement: self.enable_entanglement,
            mixing_strategy: self.mixing_strategy,
            mixing_temperature: self.mixing_temperature,
            hierarchical: self.hierarchical,
        }
    }
}
```

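The `DimensionMismatch` variant implies an input check at the start of `forward`. One way that check might look is sketched below, using only the standard library: `DimError` is a hypothetical stand-in for `ESAError` (so the snippet compiles without `thiserror`), and `check_dimensions` is an assumed helper name, not part of the API above.

```rust
/// Hypothetical stand-in for `ESAError::DimensionMismatch`.
#[derive(Debug, PartialEq)]
enum DimError {
    Mismatch { expected: usize, actual: usize },
}

impl std::fmt::Display for DimError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            DimError::Mismatch { expected, actual } => {
                write!(f, "Dimension mismatch: expected {}, got {}", expected, actual)
            }
        }
    }
}

impl std::error::Error for DimError {}

/// Verify that the query and every candidate match the embedding dimension.
/// Returns the first mismatch found.
fn check_dimensions(
    query: &[f32],
    candidates: &[Vec<f32>],
    embed_dim: usize,
) -> Result<(), DimError> {
    if query.len() != embed_dim {
        return Err(DimError::Mismatch { expected: embed_dim, actual: query.len() });
    }
    for candidate in candidates {
        if candidate.len() != embed_dim {
            return Err(DimError::Mismatch { expected: embed_dim, actual: candidate.len() });
        }
    }
    Ok(())
}
```

Validating up front keeps the projection code free of bounds checks and turns what would be a panic inside a matrix product into a recoverable error.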
## Integration Points

### Affected Crates/Modules

1. **`ruvector-gnn-core/`**
   - `src/attention/mod.rs` - Add ESA as attention variant
   - `src/layers/mod.rs` - Register ESA layer type
   - `src/graph/mod.rs` - Extend graph operations for subspace projections

2. **`ruvector-gnn-node/`**
   - `src/lib.rs` - Expose ESA to Node.js bindings
   - `index.d.ts` - TypeScript definitions for ESA API

3. **`ruvector-core/`**
   - `src/storage/mod.rs` - Store subspace projections
   - `src/index/mod.rs` - Index subspace-specific embeddings

4. **`ruvector-graph/`**
   - `src/ops.rs` - Graph operations for multi-subspace traversal

### New Modules to Create

1. **`ruvector-gnn-core/src/attention/esa/`**
   ```
   esa/
   ├── mod.rs              # Public API
   ├── config.rs           # Configuration types
   ├── subspace.rs         # Subspace implementation
   ├── entanglement.rs     # Entanglement matrix
   ├── mixer.rs            # Mixing network
   ├── context.rs          # Context encoder
   ├── metrics.rs          # Metrics tracking
   └── training.rs         # Training utilities
   ```

2. **`ruvector-gnn-core/src/attention/esa/ops/`**
   ```
   ops/
   ├── mod.rs
   ├── projection.rs       # Subspace projection operations
   ├── scoring.rs          # Attention score computation
   ├── mixing.rs           # Score mixing operations
   └── update.rs           # Entanglement update
   ```

3. **`ruvector-gnn-core/tests/esa/`**
   ```
   tests/esa/
   ├── basic.rs            # Basic functionality tests
   ├── subspace.rs         # Subspace-specific tests
   ├── entanglement.rs     # Entanglement tests
   ├── mixing.rs           # Mixing strategy tests
   ├── integration.rs      # Integration tests
   └── benchmarks.rs       # Performance benchmarks
   ```

### Dependencies on Other Features

- **Feature 3 (Hierarchical Attention)**: ESA can use hierarchical structure for organizing subspaces
- **Feature 8 (Sparse Attention)**: Each subspace can use sparse attention internally
- **Feature 11 (Dynamic Attention)**: Mixing weights are query-adaptive
- **Feature 19 (Consensus Attention)**: Can use ESA subspaces as independent voters

### External Dependencies

```toml
[dependencies]
ndarray = "0.15"
ndarray-linalg = "0.16"
serde = { version = "1.0", features = ["derive"] }
thiserror = "1.0"
rayon = "1.7"  # Parallel subspace computation
```

## Regression Prevention

### What Existing Functionality Could Break

1. **Standard Attention API**
   - Risk: ESA requires different input dimensions for subspaces
   - Mitigation: Maintain backward-compatible wrapper API

2. **Memory Usage**
   - Risk: Multiple subspaces increase memory by 3-5x
   - Mitigation: Implement memory-efficient subspace sharing

3. **Performance**
   - Risk: Multiple attention computations could slow down queries
   - Mitigation: Parallel subspace computation, caching

4. **Serialization**
   - Risk: Complex nested structures are harder to serialize
   - Mitigation: Custom serde implementations

5. **Training Stability**
   - Risk: More parameters could destabilize training
   - Mitigation: Layer normalization, gradient clipping

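The parallel subspace computation named as a performance mitigation can be illustrated without extra dependencies. The sketch below uses `std::thread::scope` as a stand-in for Rayon, which is the library this document proposes for the real implementation; `subspace_scores` and `score_all_parallel` are hypothetical names, and the inputs are assumed to be already projected into each subspace.

```rust
/// Scaled dot-product scores of one projected query against projected keys.
/// The scale is sqrt(subspace_dim), taken from the query length.
fn subspace_scores(query: &[f32], keys: &[Vec<f32>]) -> Vec<f32> {
    let scale = (query.len() as f32).sqrt();
    keys.iter()
        .map(|key| key.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() / scale)
        .collect()
}

/// Score every subspace on its own scoped thread and collect the results
/// in subspace order. `projected` holds one (query, keys) pair per subspace.
fn score_all_parallel(projected: &[(Vec<f32>, Vec<Vec<f32>>)]) -> Vec<Vec<f32>> {
    std::thread::scope(|scope| {
        // Spawn all workers first so they run concurrently...
        let handles: Vec<_> = projected
            .iter()
            .map(|(query, keys)| scope.spawn(move || subspace_scores(query, keys)))
            .collect();
        // ...then join them in order, which preserves the subspace ordering.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("subspace worker panicked"))
            .collect()
    })
}
```

One thread per subspace is reasonable for a handful of subspaces; a work-stealing pool such as Rayon's avoids the per-query thread start-up cost, which matters at the millisecond latencies targeted in this document.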
### Test Cases to Prevent Regressions

```rust
#[cfg(test)]
mod regression_tests {
    use super::*;

    #[test]
    fn test_backward_compatibility() {
        // ESA should work as drop-in replacement for standard attention
        let config = ESAConfig::default();
        // `forward` takes `&mut self`, so the layer must be mutable
        let mut esa = EntangledSubspaceAttention::new(config);

        let query = vec![1.0; 128];
        let candidates = vec![vec![0.5; 128]; 100];

        let (indices, scores) = esa.forward(&query, &candidates, 10).unwrap();

        assert_eq!(indices.len(), 10);
        assert_eq!(scores.len(), 10);
        // Scores must be in non-increasing order
        assert!(scores.windows(2).all(|w| w[0] >= w[1]));
    }

    #[test]
    fn test_memory_bounds() {
        // Ensure memory usage stays within bounds
        let config = ESAConfig {
            num_subspaces: 5,
            subspace_dim: 64,
            embed_dim: 128,
            ..Default::default()
        };

        let mut esa = EntangledSubspaceAttention::new(config);
        // `get_memory_usage` is a test helper returning bytes in use
        let initial_memory = get_memory_usage();

        // Process 1000 queries
        for _ in 0..1000 {
            let query = vec![1.0; 128];
            let candidates = vec![vec![0.5; 128]; 100];
            let _ = esa.forward(&query, &candidates, 10);
        }

        let final_memory = get_memory_usage();
        // Saturating subtraction avoids underflow if memory shrank
        let memory_increase = final_memory.saturating_sub(initial_memory);

        // Should not leak memory
        assert!(memory_increase < 10_000_000); // 10MB threshold
    }

    #[test]
    fn test_numerical_stability() {
        // Ensure stable computation with extreme values
        let config = ESAConfig::default();
        let mut esa = EntangledSubspaceAttention::new(config);

        // Very large values
        let query = vec![1e6; 128];
        let candidates = vec![vec![1e6; 128]; 100];
        let (_, scores) = esa.forward(&query, &candidates, 10).unwrap();
        assert!(scores.iter().all(|s| s.is_finite()));

        // Very small values
        let query = vec![1e-6; 128];
        let candidates = vec![vec![1e-6; 128]; 100];
        let (_, scores) = esa.forward(&query, &candidates, 10).unwrap();
        assert!(scores.iter().all(|s| s.is_finite()));
    }

    #[test]
    fn test_deterministic_output() {
        // Same input should produce same output
        let config = ESAConfig::default();
        let mut esa = EntangledSubspaceAttention::new(config);

        let query = vec![1.0; 128];
        let candidates = vec![vec![0.5; 128]; 100];

        let (indices1, scores1) = esa.forward(&query, &candidates, 10).unwrap();
        let (indices2, scores2) = esa.forward(&query, &candidates, 10).unwrap();

        assert_eq!(indices1, indices2);
        assert_eq!(scores1, scores2);
    }
}
```

### Backward Compatibility Strategy

1. **API Compatibility**
   ```rust
   impl EntangledSubspaceAttention {
       /// Standard attention interface (backward compatible)
       pub fn forward_standard(
           &mut self,
           query: &[f32],
           candidates: &[Vec<f32>],
           k: usize
       ) -> Result<(Vec<usize>, Vec<f32>), ESAError> {
           // Use uniform mixing for the standard interface, then restore
           // the configured strategy
           let previous = std::mem::replace(
               &mut self.config.mixing_strategy,
               MixingStrategy::Uniform
           );
           let result = self.forward(query, candidates, k);
           self.config.mixing_strategy = previous;
           result
       }
   }
   ```

2. **Configuration Migration**
   ```rust
   impl From<StandardAttentionConfig> for ESAConfig {
       fn from(standard: StandardAttentionConfig) -> Self {
           ESAConfig {
               num_subspaces: 1,  // Single subspace = standard attention
               subspace_dim: standard.embed_dim,
               embed_dim: standard.embed_dim,
               enable_entanglement: false,
               mixing_strategy: MixingStrategy::Uniform,
               ..Default::default()
           }
       }
   }
   ```

3. **Feature Flags**
   ```toml
   [features]
   default = ["standard-attention"]
   # Every feature named in a list must itself be declared
   standard-attention = []
   entangled-subspace-attention = []
   esa = ["entangled-subspace-attention"]
   full = ["esa", "standard-attention"]
   ```

## Implementation Phases

### Phase 1: Research Validation (2 weeks)

**Goals**:
- Validate theoretical foundations
- Prototype in Python
- Benchmark against baselines

**Tasks**:
1. Literature review on subspace learning and attention mechanisms
2. Mathematical formalization of ESA
3. Python prototype using PyTorch
4. Experiments on benchmark datasets (Cora, CiteSeer, PubMed)
5. Ablation studies on subspace count, dimension, mixing strategies

**Deliverables**:
- Research report with mathematical proofs
- Python prototype code
- Benchmark results showing 15-20% improvement
- Ablation study results

**Success Criteria**:
- ESA outperforms standard attention by >15% on graph classification
- Subspace diversity metrics show distinct semantic learning
- Computational overhead <2x standard attention

### Phase 2: Core Implementation (3 weeks)

**Goals**:
- Implement ESA in Rust
- Optimize for performance
- Add comprehensive tests

**Tasks**:
1. Create module structure in `ruvector-gnn-core/src/attention/esa/`
2. Implement core data structures (SemanticSubspace, EntanglementMatrix, etc.)
3. Implement forward pass algorithm
4. Implement entanglement update algorithm
5. Implement mixing network
6. Add SIMD optimizations for matrix operations
7. Add parallel subspace computation with Rayon
8. Write unit tests for each component
9. Write integration tests
10. Add property-based tests with proptest

**Deliverables**:
- Complete Rust implementation
- Unit tests with >90% coverage
- Integration tests
- Performance benchmarks

**Success Criteria**:
- All tests passing
- Forward pass <5ms for 1000 candidates
- Memory usage <500MB for standard configuration
- Zero unsafe code outside of SIMD intrinsics

### Phase 3: Integration (2 weeks)

**Goals**:
- Integrate with existing GNN infrastructure
- Add Node.js bindings
- Update documentation

**Tasks**:
1. Add ESA as attention option in GNN layer configuration
2. Update graph operations to support subspace projections
3. Add NAPI-RS bindings for Node.js
4. Update TypeScript definitions
5. Add JavaScript examples
6. Update API documentation
7. Add user guide
8. Create tutorial notebooks

**Deliverables**:
- Integrated ESA in GNN pipeline
- Node.js bindings
- Complete documentation
- Tutorial examples

**Success Criteria**:
- ESA selectable via configuration in existing GNN models
- JavaScript API fully functional
- Documentation complete and clear
- At least 3 working examples

### Phase 4: Optimization (2 weeks)

**Goals**:
- Optimize performance
- Reduce memory usage
- Add advanced features

**Tasks**:
1. Profile code and identify bottlenecks
2. Optimize hot paths with SIMD
3. Implement memory-efficient subspace sharing
4. Add caching for repeated queries
5. Implement hierarchical subspace organization
6. Add adaptive subspace allocation
7. Optimize entanglement matrix updates
8. Add GPU support (optional)

**Deliverables**:
- Optimized implementation
- Performance report
- Memory optimization report
- Advanced feature implementations

**Success Criteria**:
- 2x speedup over Phase 2 implementation
- Memory usage reduced by 30%
- Support for >10,000 candidates in real-time
- All advanced features working

## Success Metrics

### Performance Benchmarks

1. **Query Latency**
   - Target: <5ms per query for 1000 candidates
   - Baseline: Standard attention at ~2ms
   - Measurement: Average over 10,000 queries

2. **Throughput**
   - Target: >200 queries/second
   - Baseline: Standard attention at ~500 queries/second
   - Measurement: Sustained throughput over 1 minute

3. **Memory Usage**
   - Target: <500MB for standard configuration
   - Baseline: Standard attention at ~150MB
   - Measurement: Peak RSS during query processing

4. **Scalability**
   - Target: Linear scaling up to 10,000 candidates
   - Baseline: Standard attention linear up to 100,000
   - Measurement: Query time vs. candidate count

### Accuracy Metrics

1. **Graph Classification**
   - Dataset: Cora, CiteSeer, PubMed
   - Target: 15-20% improvement over standard attention
   - Baseline: Standard GNN with single attention
   - Metric: Macro F1 score

2. **Node Classification**
   - Dataset: Reddit, PPI
   - Target: 10-15% improvement
   - Baseline: Standard GNN
   - Metric: Micro F1 score

3. **Link Prediction**
   - Dataset: FB15k-237, WN18RR
   - Target: 8-12% improvement
   - Baseline: Standard attention
   - Metric: Mean Reciprocal Rank (MRR)

4. **Semantic Diversity**
   - Metric: Average cosine distance between subspace projections
   - Target: >0.7 (indicating diverse semantic learning)
   - Baseline: N/A (new metric)

### Comparison to Baselines

| Metric | Standard Attention | Multi-Head Attention | ESA (Target) |
|--------|-------------------|---------------------|-------------|
| Cora F1 | 0.815 | 0.834 | 0.940 |
| CiteSeer F1 | 0.701 | 0.728 | 0.810 |
| Query Latency | 2ms | 3.5ms | 5ms |
| Memory Usage | 150MB | 280MB | 500MB |
| Interpretability | Low | Medium | High |
| Semantic Diversity | N/A | 0.45 | 0.75 |

### Interpretability Metrics

1. **Subspace Usage Balance**
   - Metric: Entropy of mixing weight distribution
   - Target: >0.8 (indicating balanced usage)
   - Low entropy = some subspaces dominate

2. **Entanglement Strength**
   - Metric: Frobenius norm of entanglement matrix
   - Target: 0.3-0.7 (moderate entanglement)
   - Too low = independent, too high = redundant

3. **Query-Adaptive Behavior**
   - Metric: Variance of mixing weights across queries
   - Target: >0.1 (indicating adaptation)
   - Low variance = not adapting to query context

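The three interpretability metrics are simple to compute from logged mixing weights and the entanglement matrix. The sketch below shows one possible reading, under stated assumptions: the entropy is normalized by `ln(num_subspaces)` so that 1.0 means perfectly uniform usage (the document does not say whether the 0.8 target is normalized), and the variance is the population variance per subspace, averaged over subspaces. All function names are illustrative.

```rust
/// Shannon entropy of the mixing weights divided by ln(n),
/// so 1.0 means uniform usage and 0.0 means a single subspace dominates.
fn normalized_entropy(weights: &[f32]) -> f32 {
    if weights.len() < 2 {
        return 0.0;
    }
    let entropy: f32 = weights
        .iter()
        .filter(|&&w| w > 0.0) // treat 0 * ln(0) as 0
        .map(|&w| -w * w.ln())
        .sum();
    entropy / (weights.len() as f32).ln()
}

/// Frobenius norm of a row-major matrix.
fn frobenius_norm(matrix: &[Vec<f32>]) -> f32 {
    matrix.iter().flatten().map(|&v| v * v).sum::<f32>().sqrt()
}

/// Population variance of each subspace's mixing weight across queries,
/// averaged over subspaces. `history` is [n_queries][num_subspaces].
fn mixing_variance(history: &[Vec<f32>]) -> f32 {
    let n_queries = history.len();
    let n_subspaces = history.first().map_or(0, |w| w.len());
    if n_queries == 0 || n_subspaces == 0 {
        return 0.0;
    }
    let mut total = 0.0_f32;
    for s in 0..n_subspaces {
        let mean = history.iter().map(|w| w[s]).sum::<f32>() / n_queries as f32;
        let var = history
            .iter()
            .map(|w| (w[s] - mean).powi(2))
            .sum::<f32>()
            / n_queries as f32;
        total += var;
    }
    total / n_subspaces as f32
}
```

Normalizing the entropy makes the balance target comparable across configurations with different subspace counts, which an unnormalized value would not be.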
## Risks and Mitigations

### Technical Risks

1. **Risk: Increased Computational Complexity**
   - **Impact**: HIGH - Could make ESA impractical for real-time use
   - **Probability**: MEDIUM
   - **Mitigation**:
     - Parallel subspace computation with Rayon
     - SIMD optimizations for matrix operations
     - Caching of projection matrices
     - Lazy evaluation of unused subspaces
   - **Contingency**: Implement adaptive subspace pruning

2. **Risk: Training Instability**
   - **Impact**: HIGH - Could prevent convergence
   - **Probability**: MEDIUM
   - **Mitigation**:
     - Layer normalization in each subspace
     - Gradient clipping
     - Warm-up schedule for entanglement updates
     - Careful initialization of projection matrices
   - **Contingency**: Freeze entanglement matrix during early training

3. **Risk: Redundant Subspaces**
   - **Impact**: MEDIUM - Subspaces learn the same features
   - **Probability**: MEDIUM
   - **Mitigation**:
     - Diversity loss during training
     - Orthogonality constraints on projections
     - Monitor subspace correlation metrics
     - Adaptive subspace pruning
   - **Contingency**: Use pre-defined semantic subspaces instead of learned

4. **Risk: Memory Overhead**
   - **Impact**: MEDIUM - Could limit scalability
   - **Probability**: HIGH
   - **Mitigation**:
     - Memory-efficient subspace sharing
     - Quantization of projection matrices
     - Sparse subspace representations
     - Dynamic subspace allocation
   - **Contingency**: Reduce number of subspaces or dimensions

5. **Risk: Integration Complexity**
   - **Impact**: MEDIUM - Could delay deployment
   - **Probability**: LOW
   - **Mitigation**:
     - Backward-compatible API design
     - Comprehensive integration tests
     - Gradual rollout with feature flags
     - Extensive documentation
   - **Contingency**: Provide ESA as optional plugin

6. **Risk: Hyperparameter Sensitivity**
   - **Impact**: MEDIUM - Difficult to tune
   - **Probability**: MEDIUM
   - **Mitigation**:
     - Automated hyperparameter search
     - Sensible defaults based on experiments
     - Adaptive hyperparameter adjustment
     - Clear tuning guidelines
   - **Contingency**: Provide pre-tuned configurations for common use cases

### Research Risks

1. **Risk: Limited Performance Improvement**
   - **Impact**: HIGH - Would undermine the justification for the added complexity
   - **Probability**: LOW
   - **Mitigation**: Extensive prototyping in Phase 1
   - **Contingency**: Focus on interpretability benefits

2. **Risk: Dataset-Specific Benefits**
   - **Impact**: MEDIUM - Limited generalization
   - **Probability**: MEDIUM
   - **Mitigation**: Test on diverse benchmark datasets
   - **Contingency**: Provide dataset-specific configurations

### Mitigation Timeline

| Week | Risk Mitigation Activities |
|------|---------------------------|
| 1-2 | Phase 1 prototyping validates core concept |
| 3-4 | Performance optimization experiments |
| 5-7 | Core implementation with parallel computation |
| 8-9 | Integration testing and memory optimization |
| 10-11 | Hyperparameter tuning and stability tests |
| 12 | Final validation and documentation |

### Success Criteria for Each Phase

**Phase 1 (Research)**:
- [ ] ESA prototype shows >15% improvement on at least 2 datasets
- [ ] Computational overhead <3x standard attention
- [ ] Subspace diversity metric >0.6

**Phase 2 (Implementation)**:
- [ ] All unit tests passing
- [ ] Query latency <10ms (will optimize to <5ms in Phase 4)
- [ ] Memory usage <700MB (will optimize to <500MB in Phase 4)

**Phase 3 (Integration)**:
- [ ] ESA integrated with zero breaking changes
- [ ] Node.js bindings functional
- [ ] Documentation complete

**Phase 4 (Optimization)**:
- [ ] Query latency <5ms
- [ ] Memory usage <500MB
- [ ] All target metrics achieved