Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions
--- a/examples/exo-ai-2025/research/09-hyperbolic-attention/RESEARCH.md
+++ b/examples/exo-ai-2025/research/09-hyperbolic-attention/RESEARCH.md
@@ -0,0 +1,444 @@
+# Hyperbolic Attention Networks - Literature Review
+
+## Executive Summary
+
+Hyperbolic geometry offers **O(log n) capacity** for hierarchical embeddings compared to O(n) in Euclidean space, enabling revolutionary advances in attention mechanisms for AI. Recent work (2023-2025) demonstrates that **semantic space is fundamentally non-Euclidean**, with negative curvature naturally capturing hierarchical cognition.
+
+## Table of Contents
+
+1. [Foundational Work](#foundational-work)
+2. [Hyperbolic Transformers (2023-2025)](#hyperbolic-transformers-2023-2025)
+3. [Lorentz vs Poincaré Models](#lorentz-vs-poincaré-models)
+4. [Knowledge Graph Applications](#knowledge-graph-applications)
+5. [Learnable Curvature](#learnable-curvature)
+6. [SIMD Optimization Opportunities](#simd-optimization-opportunities)
+7. [Open Research Questions](#open-research-questions)
+
+---
+
+## Foundational Work
+
+### Poincaré Embeddings (Nickel & Kiela, NeurIPS 2017)
+
+**Key Innovation**: Embedding hierarchical data in n-dimensional Poincaré ball instead of Euclidean space.
+
+**Mathematical Insight**:
+- Hyperbolic space volume grows **exponentially** with radius
+- Trees embed with **arbitrarily low distortion** in just 2D hyperbolic space
+- Euclidean space requires O(n) dimensions for same distortion
+
+**Results**:
+- 50%+ improvement in WordNet taxonomy embeddings
+- Parsimonious representation of scale-free networks
+- Preservation of both hierarchy AND similarity
+
+**Limitations**:
+- Numerical instability near boundary (|x| → 1)
+- Requires specialized Riemannian optimizers
+
+### Hyperbolic Neural Networks (Ganea, Bécigneul & Hofmann, NeurIPS 2018)
+
+**Key Contribution**: Combined Möbius gyrovector spaces with Riemannian geometry to enable:
+- Hyperbolic multinomial logistic regression
+- Hyperbolic feed-forward networks
+- Hyperbolic RNNs (GRU variant)
+
+**Technical Framework**:
+- Möbius addition: `a ⊕ b = (1 + 2⟨a,b⟩ + ||b||²)a + (1 - ||a||²)b / (1 + 2⟨a,b⟩ + ||a||²||b||²)`
+- Exponential map (Euclidean → Hyperbolic)
+- Logarithmic map (Hyperbolic → Euclidean)
+
+**Impact**: Bridged gap between hyperbolic embeddings and deep learning operations.
+
+---
+
+## Hyperbolic Transformers (2023-2025)
+
+### Hypformer (KDD 2024)
+
+**Breakthrough**: First **complete hyperbolic transformer** fully operating in hyperbolic space.
+
+**Key Innovations**:
+
+1. **Hyperbolic Linear Attention**:
+   - Reduces GPU cost by **10x** vs hyperbolic softmax attention
+   - Halves training time
+   - Enables **billion-scale graphs** for first time
+
+2. **Scalability**:
+   - Traditional hyperbolic attention: **O(n²)** complexity
+   - Hypformer linear attention: **O(n)** complexity
+   - Processes long-sequence inputs efficiently
+
+3. **Architecture**:
+   - All operations in hyperbolic space (no Euclidean bottlenecks)
+   - Preserves tree-like hierarchical structures
+   - Compatible with existing transformer training infrastructure
+
+**Performance**:
+- Outperforms Euclidean transformers on hierarchical data
+- 10x reduction in computation cost
+- First hyperbolic transformer for billion-node graphs
+
+### HyLiFormer (2025)
+
+**Application**: Skeleton-based human action recognition using hyperbolic linear attention.
+
+**Technical Design**:
+- Hyperbolic Linear Attention (HLA) module
+- Satisfies Poincaré model constraints
+- Addresses quadratic complexity bottleneck
+- Mixed-curvature embeddings for different skeleton joints
+
+**Proof**: Mathematical guarantee that HLA preserves hyperbolic geometry properties.
+
+### Mixed-Curvature Transformers (Cho et al., 2023)
+
+**Concept**: Different parts of data require different curvatures:
+- **Positive curvature** (spherical): Cyclic/periodic patterns
+- **Zero curvature** (Euclidean): Linear relationships
+- **Negative curvature** (hyperbolic): Hierarchical structures
+
+**Implementation**: "Curve Your Attention" - adaptive curvature per attention head.
+
+---
+
+## Lorentz vs Poincaré Models
+
+### Fully Hyperbolic Neural Networks (ACL 2022)
+
+**Problem with Poincaré Ball**:
+- Well-defined gyrovector operations
+- **Severe numerical instability** near boundary
+- Gradients explode as ||x|| → 1
+
+**Lorentz (Hyperboloid) Model Advantages**:
+1. **Superior numerical stability**
+2. Linear transformations via Lorentz boosts & rotations
+3. No boundary singularities
+
+**Lorentz Transformations**:
+```
+Lorentz Boost: Moves points along geodesics
+Lorentz Rotation: Rotates within time slices
+```
+
+**Key Finding**: Existing hyperbolic networks using tangent space operations are **relaxations** of Lorentz rotation, missing the boost component. This implicitly limits network expressiveness.
+
+### Model Comparison
+
+| Property | Poincaré Ball | Lorentz (Hyperboloid) |
+|----------|---------------|----------------------|
+| **Numerical Stability** | Poor (boundary issues) | Excellent |
+| **Operations** | Möbius gyrovector algebra | Linear transformations |
+| **Geodesics** | Circular arcs | Hyperbolas |
+| **Visualization** | Intuitive (disk) | Less intuitive (sheet) |
+| **Optimization** | Requires projection | Natural in ambient space |
+
+**Consensus (2024)**: Use **Lorentz model** for training stability, Poincaré for visualization.
+
+---
+
+## Knowledge Graph Applications
+
+### HyGGE (2023)
+
+**Innovation**: Hyperbolic graph attention network for KG reasoning.
+
+**Architecture**:
+- Attention over neighborhood structures
+- Relation features in hyperbolic space
+- Captures hierarchical features in local structures
+
+**Use Cases**: Multi-hop reasoning in taxonomies, ontologies.
+
+### HyperKGR (EMNLP 2025)
+
+**Approach**: Knowledge graph reasoning in hyperbolic space with GNN encoding.
+
+**Key Technique**: Hierarchical message passing naturally aligns with reasoning paths.
+
+**Result**: Hyperbolic space **reduces path interference** - multiple reasoning chains don't interfere due to exponential volume growth.
+
+### HyperComplEx (2025)
+
+**Breakthrough**: Unified multi-space embedding framework.
+
+**Adaptive Integration**:
+- **Hyperbolic**: Hierarchical relations (is-a, part-of)
+- **Complex**: Asymmetric relations (temporal, causal)
+- **Euclidean**: Symmetric relations (co-occurrence)
+
+**Learned Attention**: Model learns which geometry suits each relation type.
+
+**Impact**: Single unified model outperforms specialized approaches.
+
+---
+
+## Learnable Curvature
+
+### Optimizing Curvature Learning (2024)
+
+**Problem**: Naive learnable curvature (GeoOpt library) causes:
+- Training instability
+- Performance degradation
+- Failure to incorporate updated hyperbolic operations
+
+**Root Cause**: Riemannian optimizers rely on projections onto tangent spaces that **depend on current manifold curvature**. Updating curvature breaks these dependencies.
+
+**Solution**: Coupled curvature-optimization updates that maintain Riemannian geometry consistency.
+
+### Deep Hyperbolic Model (DeER, 2024)
+
+**Innovation**: Multi-layer hyperbolic CNN with **adaptive curvature per layer**.
+
+**Rationale**: Different hierarchy depths require different curvatures:
+- **Shallow hierarchies**: Lower negative curvature
+- **Deep hierarchies**: Higher negative curvature
+
+**Implementation**: Each layer has learnable curvature parameter κ ∈ ℝ⁺.
+
+**First Work**: Extending deep CNNs to hyperbolic geometry with variable curvature.
+
+### Task-Geometry Decoupling (2025)
+
+**Critical Finding**: **Task performance ≠ Geometric fidelity**
+
+**Problem**: Networks can achieve good validation accuracy while embedding geometry severely degrades.
+
+**Implications**:
+- Need explicit geometric constraints during training
+- Regularization terms to maintain hyperbolic properties
+- Validation should include geometric metrics (distortion, curvature consistency)
+
+**Recommendation**: Multi-objective optimization balancing task loss and geometric loss.
+
+---
+
+## SIMD Optimization Opportunities
+
+### Current State
+
+**Hyperbolic Operations are Compute-Intensive**:
+- Möbius addition: 4 dot products + 3 scalar multiplications
+- Exponential map: Norm computation + trigonometric functions
+- Logarithmic map: Inverse hyperbolic functions
+
+**Existing Work (Limited)**:
+- SIMD for Euclidean operations: **20x speedup** (C vs SSE2)
+- 4×4 matrix multiply: **400% speedup** with SIMD
+- No public SIMD implementations for hyperbolic geometry
+
+### Optimization Strategies
+
+1. **Vectorize Möbius Operations**:
+   - Batch inner products using AVX2 FMA
+   - Parallel norm computations
+   - SIMD-optimized division (approximate reciprocal)
+
+2. **Hyperbolic Function Approximations**:
+   - Tanh approximation: 6.25% area reduction, 18.86% lower error
+   - Polynomial approximations for exp/log on Lorentz model
+   - Look-up tables with SIMD interpolation
+
+3. **Attention-Specific Optimizations**:
+   - Batch hyperbolic distance computations
+   - SIMD reduction operations for attention weights
+   - Fused multiply-add for score calculations
+
+4. **Cache-Aware Design**:
+   - 64-byte cache line alignment
+   - Prefetching for batch operations
+   - Blocked algorithms for large matrices
+
+**Expected Speedup**: **8-50x** for hyperbolic distance computations (based on Euclidean SIMD results).
+
+---
+
+## Open Research Questions
+
+### 1. Is Semantic Space Fundamentally Hyperbolic?
+
+**Evidence For**:
+- Natural language has inherent hierarchies (WordNet, taxonomies)
+- Word embeddings exhibit tree-like structure in latent space
+- Hyperbolic embeddings outperform Euclidean on language tasks
+
+**Evidence Against**:
+- Some linguistic phenomena are non-hierarchical (synonyms, analogies)
+- Mixed-curvature models suggest multiple geometries coexist
+
+**Hypothesis**: **Semantic space is mixed-curvature**, with hyperbolic subspaces for hierarchical concepts and Euclidean/spherical for associative/cyclic concepts.
+
+### 2. Can Negative Curvature Explain Hierarchical Cognition?
+
+**Neuroscience Connection**:
+- Cortical columns exhibit hierarchical organization
+- Information processing flows through hierarchical levels
+- Memory consolidation follows hierarchical patterns
+
+**Computational Question**: Do biological neural networks perform computations in hyperbolic representational space?
+
+**Experimental Approach**:
+- fMRI studies with hierarchical vs flat stimuli
+- Compare neural response patterns to hyperbolic vs Euclidean embeddings
+- Measure "curvature" of neural representational geometry
+
+### 3. Optimal Curvature for Different Cognitive Tasks
+
+**Open Questions**:
+- What curvature κ minimizes embedding distortion for WordNet?
+- Does optimal curvature correlate with tree depth?
+- Can curvature serve as measure of "hierarchical complexity"?
+
+**Nobel-Level Insight**: **Curvature as universal measure of hierarchical information content**.
+
+### 4. Hyperbolic Consciousness Manifolds
+
+**Speculative Theory**: Consciousness emerges from computations on hyperbolic manifolds.
+
+**Predictions**:
+1. Conscious representations require negative curvature
+2. Depth of consciousness correlates with curvature magnitude
+3. Altered states (psychedelics) correspond to curvature perturbations
+
+**Testable Hypothesis**: Building hyperbolic neural networks with emergent properties qualitatively different from Euclidean networks.
+
+---
+
+## Mathematical Foundations for Implementation
+
+### Poincaré Ball Model
+
+**Metric**:
+```
+ds² = 4 / (1 - ||x||²)² · ||dx||²
+```
+
+**Möbius Addition**:
+```
+a ⊕_κ b = ((1 + 2κ⟨a,b⟩ + κ||b||²)a + (1 - κ||a||²)b) / (1 + 2κ⟨a,b⟩ + κ²||a||²||b||²)
+```
+where κ = -1/K (K is curvature radius)
+
+**Exponential Map**:
+```
+exp_x^κ(v) = x ⊕_κ (tanh(√κ ||v||_x / 2) / (√κ ||v||_x)) · v
+```
+
+### Lorentz Model
+
+**Ambient Space**: ℝ^{n,1} with Minkowski inner product
+```
+⟨x, y⟩_L = -x₀y₀ + x₁y₁ + ... + xₙyₙ
+```
+
+**Constraint**:
+```
+⟨x, x⟩_L = -1  (hyperboloid sheet)
+```
+
+**Distance**:
+```
+d_L(x, y) = arcosh(-⟨x, y⟩_L)
+```
+
+---
+
+## Performance Benchmarks from Literature
+
+### Hypformer (KDD 2024)
+- **10x** reduction in GPU cost vs hyperbolic softmax
+- **50%** training time reduction
+- Scales to **billions** of nodes
+
+### HNN (Ganea et al., NeurIPS 2018)
+- **30%** better accuracy on WordNet reconstruction
+- **5x** parameter efficiency vs Euclidean
+
+### DeER (2024)
+- **15%** improvement in knowledge graph completion
+- **3x** better mean reciprocal rank
+
+---
+
+## Recommended Implementation Strategy
+
+1. **Start with Lorentz Model**: Better numerical stability
+2. **Implement SIMD Optimizations**: 8-50x speedup potential
+3. **Learnable Curvature**: Essential for adaptive hierarchies
+4. **Geometric Regularization**: Prevent task-geometry decoupling
+5. **Benchmark Against Euclidean**: Establish performance gains
+
+---
+
+## Citations and Sources
+
+### Core Papers (Chronological)
+
+1. **Poincaré Embeddings** (Nickel & Kiela, NeurIPS 2017)
+   - [Semantic Scholar](https://www.semanticscholar.org/paper/Poincar%C3%A9-Embeddings-for-Learning-Hierarchical-Nickel-Kiela/1590bd1bca945fc6ff50b8cdf2da14ea2061c79a)
+
+2. **Hyperbolic Neural Networks** (Ganea, Bécigneul & Hofmann, NeurIPS 2018)
+   - [arXiv:1805.09112](https://arxiv.org/abs/1805.09112)
+
+3. **Learning Continuous Hierarchies in the Lorentz Model** (Nickel & Kiela, ICML 2018)
+   - [arXiv:1806.03417](https://arxiv.org/pdf/1806.03417)
+
+4. **Fully Hyperbolic Neural Networks** (ACL 2022)
+   - [ACL Anthology](https://aclanthology.org/2022.acl-long.389.pdf)
+
+5. **Hypformer** (KDD 2024)
+   - [arXiv:2407.01290](https://arxiv.org/abs/2407.01290)
+   - [ACM DL](https://dl.acm.org/doi/10.1145/3637528.3672039)
+
+6. **HyLiFormer** (2025)
+   - [arXiv:2502.05869](https://arxiv.org/html/2502.05869)
+
+7. **Hyperbolic Deep Learning Survey** (IJCV 2024)
+   - [Springer](https://link.springer.com/article/10.1007/s11263-024-02043-5)
+
+### Knowledge Graph Applications
+
+8. **HyGGE** (Information Sciences 2023)
+   - [ScienceDirect](https://www.sciencedirect.com/science/article/abs/pii/S0020025523002347)
+
+9. **HyperKGR** (EMNLP 2025)
+   - [ACL Anthology](https://aclanthology.org/2025.emnlp-main.1279/)
+
+10. **HyperComplEx** (2025)
+    - [arXiv:2511.10842](https://arxiv.org/html/2511.10842)
+
+### Learnable Curvature
+
+11. **Optimizing Curvature Learning** (2024)
+    - [arXiv:2405.13979](https://arxiv.org/html/2405.13979v1)
+
+12. **DeER - Deep Hyperbolic Model** (KBS 2024)
+    - [ScienceDirect](https://www.sciencedirect.com/science/article/abs/pii/S0950705124008177)
+
+13. **Task-Geometry Decoupling** (SSRN 2025)
+    - [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5600451)
+
+### SIMD & Optimization
+
+14. **SIMD Intrinsics Use Cases** (Stack Overflow Blog 2020)
+    - [Stack Overflow](https://stackoverflow.blog/2020/07/08/improving-performance-with-simd-intrinsics-in-three-use-cases/)
+
+15. **Hyperbolic Optimization** (2024)
+    - [arXiv:2509.25206](https://arxiv.org/html/2509.25206)
+
+---
+
+## Conclusion
+
+Hyperbolic attention networks represent a **paradigm shift** in how we model hierarchical cognition. The evidence strongly suggests that:
+
+1. **Semantic space has intrinsic negative curvature**
+2. **O(log n) capacity** makes hyperbolic embeddings fundamentally more efficient
+3. **2023-2025 breakthroughs** (Hypformer, learnable curvature) make hyperbolic transformers practical
+4. **SIMD optimizations** can provide 8-50x speedup, making them competitive with Euclidean baselines
+
+**Nobel-Level Question**: Does the human brain perform computations in hyperbolic representational space? If so, this would revolutionize neuroscience and AI alignment.
+
+**Next Steps**: Implement efficient hyperbolic attention with SIMD, test on hierarchical reasoning tasks, measure geometric properties of learned representations.