Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions
--- a/crates/ruvector-postgres/docs/integration-plans/README.md
+++ b/crates/ruvector-postgres/docs/integration-plans/README.md
@@ -0,0 +1,165 @@
+# RuVector-Postgres Integration Plans
+
+Comprehensive implementation plans for integrating advanced capabilities into the ruvector-postgres PostgreSQL extension.
+
+## Overview
+
+These documents outline the roadmap to transform ruvector-postgres from a pgvector-compatible extension into a full-featured AI database with self-learning, attention mechanisms, GNN layers, and more.
+
+## Current State
+
+ruvector-postgres v0.1.0 includes:
+- ✅ SIMD-optimized distance functions (AVX-512, AVX2, NEON)
+- ✅ HNSW index with configurable parameters
+- ✅ IVFFlat index for memory-efficient search
+- ✅ Scalar (SQ8), Binary, and Product quantization
+- ✅ pgvector-compatible SQL interface
+- ✅ Parallel query execution
+
+## Planned Integrations
+
+| Feature | Document | Priority | Complexity | Est. Weeks |
+|---------|----------|----------|------------|------------|
+| Self-Learning / ReasoningBank | [01-self-learning.md](./01-self-learning.md) | High | High | 10 |
+| Attention Mechanisms (39 types) | [02-attention-mechanisms.md](./02-attention-mechanisms.md) | High | Medium | 12 |
+| GNN Layers | [03-gnn-layers.md](./03-gnn-layers.md) | High | High | 12 |
+| Hyperbolic Embeddings | [04-hyperbolic-embeddings.md](./04-hyperbolic-embeddings.md) | Medium | Medium | 10 |
+| Sparse Vectors | [05-sparse-vectors.md](./05-sparse-vectors.md) | High | Medium | 10 |
+| Graph Operations & Cypher | [06-graph-operations.md](./06-graph-operations.md) | High | High | 14 |
+| Tiny Dancer Routing | [07-tiny-dancer-routing.md](./07-tiny-dancer-routing.md) | Medium | Medium | 12 |
+
+## Supporting Documents
+
+| Document | Description |
+|----------|-------------|
+| [Optimization Strategy](./08-optimization-strategy.md) | SIMD, memory, query optimization techniques |
+| [Benchmarking Plan](./09-benchmarking-plan.md) | Performance testing and comparison methodology |
+
+## Architecture Principles
+
+### Modularity
+Each feature is implemented as a separate module with feature flags:
+
+```toml
+[features]
+# Core (always enabled)
+default = ["pg16"]
+
+# Advanced features (opt-in)
+learning = []
+attention = []
+gnn = []
+hyperbolic = []
+sparse = []
+graph = []
+routing = []
+
+# Feature bundles
+ai-complete = ["learning", "attention", "gnn", "routing"]
+graph-complete = ["hyperbolic", "sparse", "graph"]
+all = ["ai-complete", "graph-complete"]
+```
+
+### Dependency Strategy
+
+```
+ruvector-postgres
+├── ruvector-core (shared types, SIMD)
+├── ruvector-attention (optional)
+├── ruvector-gnn (optional)
+├── ruvector-graph (optional)
+├── ruvector-tiny-dancer-core (optional)
+└── External
+    ├── pgrx (PostgreSQL FFI)
+    ├── simsimd (SIMD operations)
+    └── rayon (parallelism)
+```
+
+### SQL Interface Design
+
+All features follow consistent SQL patterns:
+
+```sql
+-- Enable features
+SELECT ruvector_enable_feature('learning', table_name := 'embeddings');
+
+-- Configuration via GUCs
+SET ruvector.learning_rate = 0.01;
+SET ruvector.attention_type = 'flash';
+
+-- Feature-specific functions prefixed with ruvector_
+SELECT ruvector_attention_score(a, b, 'scaled_dot');
+SELECT ruvector_gnn_search(query, 'edges', num_hops := 2);
+SELECT ruvector_route(request, optimize_for := 'cost');
+
+-- Cypher queries via dedicated function
+SELECT * FROM ruvector_cypher('graph_name', $$
+    MATCH (n:Person)-[:KNOWS]->(friend)
+    RETURN friend.name
+$$);
+```
+
+## Implementation Roadmap
+
+### Phase 1: Foundation (Months 1-3)
+- [ ] Sparse vectors (BM25, SPLADE support)
+- [ ] Hyperbolic embeddings (Poincaré ball model)
+- [ ] Basic attention operations (scaled dot-product)
+
+### Phase 2: Graph (Months 4-6)
+- [ ] Property graph storage
+- [ ] Cypher query parser
+- [ ] Basic graph algorithms (BFS, shortest path)
+- [ ] Vector-guided traversal
+
+### Phase 3: Neural (Months 7-9)
+- [ ] GNN message passing framework
+- [ ] GCN, GraphSAGE, GAT layers
+- [ ] Multi-head attention
+- [ ] Flash attention
+
+### Phase 4: Intelligence (Months 10-12)
+- [ ] Self-learning trajectory tracking
+- [ ] ReasoningBank pattern storage
+- [ ] Adaptive search optimization
+- [ ] AI agent routing (Tiny Dancer)
+
+### Phase 5: Production (Months 13-15)
+- [ ] Performance optimization
+- [ ] Comprehensive benchmarking
+- [ ] Documentation and examples
+- [ ] Production hardening
+
+## Performance Targets
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| Vector search (1M, 768d) | <2ms p50 | HNSW with ef=64 |
+| Recall@10 | >0.95 | At target latency |
+| GNN forward (10K nodes) | <20ms | Single layer |
+| Cypher simple query | <5ms | Pattern match |
+| Memory overhead | <20% | vs raw vectors |
+| Build throughput | >50K vec/s | HNSW M=16 |
+
+## Contributing
+
+Each integration plan includes:
+1. Architecture diagrams
+2. Module structure
+3. SQL interface specification
+4. Implementation phases with timelines
+5. Code examples
+6. Benchmark targets
+7. Dependencies and feature flags
+
+When implementing:
+1. Start with the module structure
+2. Implement core functionality with tests
+3. Add PostgreSQL integration
+4. Write benchmarks
+5. Document SQL interface
+6. Update this README
+
+## License
+
+MIT License - See main repository for details.