Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

View File

@@ -0,0 +1,344 @@
# FalkorDB: Comprehensive Technical Research Report
> Research date: 2026-02-26 | Branch: `research/falkordb-review`
## Table of Contents
- [1. Project Overview](#1-project-overview)
- [2. Architecture](#2-architecture)
- [3. Key Features](#3-key-features)
- [4. Technical Deep Dive](#4-technical-deep-dive)
- [5. Ecosystem](#5-ecosystem)
- [6. Relevance to RuVector](#6-relevance-to-ruvector)
- [Sources](#sources)
---
## 1. Project Overview
### What is FalkorDB?
FalkorDB is a high-performance, in-memory property graph database that runs as a Redis module. Its distinguishing characteristic is the use of **sparse adjacency matrices** (via the GraphBLAS standard) and **linear algebra operations** for graph traversal and query execution — a fundamentally different approach from the pointer-hopping model used by traditional graph databases like Neo4j.
The project positions itself as "the best Knowledge Graph for LLM (GraphRAG)," optimized for low-latency graph queries that serve AI/ML inference pipelines.
### Origins: The RedisGraph Lineage
FalkorDB is the **direct successor to RedisGraph**, which reached End-of-Life in January 2025 when Redis Ltd. discontinued it. The FalkorDB team forked and continued development, preserving the core sparse-matrix architecture while adding significant new capabilities including vector indexing, the Bolt protocol, and a dedicated GraphRAG SDK.
### License
**Server Side Public License v1 (SSPLv1)** — the same license used by MongoDB. This restricts offering FalkorDB as a managed service without a commercial agreement. A commercial Enterprise license is also available.
### Current Version and Activity
| Metric | Value |
|--------|-------|
| Latest release | v4.16.5 (February 2026) |
| GitHub stars | ~3,600 |
| Forks | ~280 |
| Commits (master) | 2,172+ |
| Primary language | C (Rust port in progress) |
| Release cadence | Multiple per month |
**Key recent milestones**:
- v4.16.0 (Dec 2025): User-Defined Functions (UDFs)
- v4.14.10 (Dec 2025): 30% memory reduction via compact in-memory storage
- v4.0 (2024): Vector index support, Bolt protocol
---
## 2. Architecture
### Core: Sparse Matrices + Linear Algebra
FalkorDB's architecture is built on a single foundational insight: **graph traversals can be expressed as sparse matrix multiplications**. Rather than crawling through pointer-linked node structures, FalkorDB translates Cypher pattern queries into algebraic expressions executed by [GraphBLAS](http://graphblas.org/).
**How it works**:
1. Graph topology is stored as sparse adjacency matrices in **CSC (Compressed Sparse Column)** format
2. Each graph has one global adjacency matrix plus dedicated matrices per relationship type
3. Label membership is stored as symmetric diagonal matrices
4. A query like `(N0)-[A]->(N1)-[B]->(N2)` translates to the matrix multiplication `A * B`
5. GraphBLAS executes the multiplication using CPU-level optimizations (AVX, OpenMP)
### Storage Engine
- **In-memory primary storage** with Redis-backed disk persistence
- **Graph struct**: Central data structure managing entities through `DataBlock` arrays and GraphBLAS matrices for relationships
- **DataBlock**: Contiguous memory blocks for node/edge properties — O(1) insertion (1M+ node creates in <500ms, 500K edges in 0.3s)
- **Compact storage** (v4.14.10+): Dual representation approach achieving 30% memory reduction
- **Roaring bitmaps** for label indexes
### Query Engine Pipeline
```
Cypher Query
|
v
[Parsing] --> AST (Lex tokenizer + Lemon parser)
|
v
[Algebraic Translation] --> Matrix multiplication expressions
|
v
[Optimization] --> Execution plans prioritizing sparse intermediates
|
v
[Execution] --> Filtered traversal, conditional traversal, projection
|
v
[Result Population] --> Matching entity attributes
```
### Data Model
**Property Graph Model** (OpenCypher-compliant):
- **Nodes**: Zero or more labels, key-value properties
- **Relationships**: Exactly one type, key-value properties, directed
- Relationships recorded in adjacency matrices: `M[source, destination] = 1`
### Concurrency Model
- **Read-write lock per graph**: Concurrent readers, serialized writers
- **Intra-query parallelism**: Independent sub-expressions execute in parallel via OpenMP
- **Redis event loop**: Module runs within Redis's single-threaded event loop, offloads heavy computation to worker threads
### Persistence and Replication
| Feature | Mechanism |
|---------|-----------|
| Persistence | Redis RDB snapshots + AOF |
| Replication | Effect-based (deltas only) |
| HA | Redis Sentinel for automatic failover |
| Clustering | Redis Cluster (3 masters + 3 replicas) |
| Kubernetes | Helm charts, KubeBlocks, dedicated operator |
---
## 3. Key Features
### Query Language: OpenCypher with Extensions
FalkorDB implements a **subset of OpenCypher** with proprietary extensions:
| Command | Purpose |
|---------|---------|
| `GRAPH.QUERY` / `GRAPH.RO_QUERY` | Execute read-write or read-only queries |
| `GRAPH.EXPLAIN` / `GRAPH.PROFILE` | Query plan inspection and profiling |
| `GRAPH.DELETE` | Drop a graph |
| `GRAPH.INFO` / `GRAPH.MEMORY` | Metadata and memory diagnostics |
| `GRAPH.COPY` | Duplicate a graph |
Standard Cypher clauses: `MATCH`, `CREATE`, `DELETE`, `SET`, `MERGE`, `WHERE`, `ORDER BY`, `RETURN`, `WITH`, `UNWIND`, etc.
### Indexing Capabilities
| Index Type | Description |
|------------|-------------|
| **Range** | Numeric/comparable values, efficient lookups |
| **Full-text** | Text-based search queries |
| **Vector** (v4.0+) | Configurable dimensionality, cosine/euclidean similarity |
```cypher
-- Vector index example
CREATE VECTOR INDEX FOR (n:Product) ON (n.embedding)
OPTIONS {dimension: 1536, similarityFunction: 'cosine'}
```
### Performance Benchmarks (vs Neo4j)
SNAP Pokec social network, 82% read / 18% write:
| Metric | FalkorDB | Neo4j | Ratio |
|--------|----------|-------|-------|
| p50 latency | 55ms | 577.5ms | **~10x faster** |
| p90 latency | 108ms | 4,784ms | **~44x faster** |
| p99 latency | 136.2ms | 46,924ms | **~345x faster** |
| PageRank | 18.53ms | 417.31ms | **~23x faster** |
| WCC | 17.8ms | 1,324ms | **~74x faster** |
| Memory usage | 100MB | 600MB | **6x less** |
Key property: FalkorDB maintains a consistent 2.5x latency increase from p50 to p99, indicating predictable performance. Neo4j shows extreme tail latency variance due to JVM GC pauses.
### AI/ML Integrations
**GraphRAG SDK** ([GitHub](https://github.com/FalkorDB/GraphRAG-SDK)):
- Converts user queries to Cypher via LLM
- Retrieves relevant subgraphs as context for LLM generation
- Claims up to 90% reduction in hallucinations vs. vector-only RAG
- Multi-model configuration: separate models for graph construction vs. Q&A
- Multi-agent support: specialized agents per knowledge domain
**Vector Support**:
- Native vector indexes on node/edge properties
- Hybrid queries: graph traversal narrows dataset, vector search ranks results
- Integration with OpenAI, Anthropic, and other embedding providers
**Framework Integrations**: LangChain (Python + JS/TS), LlamaIndex, AG2/AutoGen, N8N + Graphiti
### Protocols
| Protocol | Port | Notes |
|----------|------|-------|
| RESP (Redis) | 6379 | Native Redis protocol |
| Bolt (v4.0+) | 7687 | Neo4j-compatible, enables migration |
### User-Defined Functions (v4.16.0+)
UDFs allow extending query capabilities with custom functions, including graph object support.
---
## 4. Technical Deep Dive
### Core Data Structures
| Structure | Purpose |
|-----------|---------|
| GraphBLAS sparse matrices (CSC) | Adjacency representation |
| DataBlock | Contiguous memory for node/edge properties |
| Label matrices | Diagonal matrices for node-label membership |
| Graph struct | Central coordinator for DataBlocks + matrices |
| AST (Lex + Lemon) | Query parsing and IR |
| Execution plan | Optimized query tree with algebraic ops |
### Comparison with Other Graph Databases
| Dimension | FalkorDB | Neo4j | RedisGraph (EOL) |
|-----------|----------|-------|-------------------|
| **Language** | C (Rust port underway) | Java (JVM) | C |
| **Graph model** | Property Graph | Property Graph | Property Graph |
| **Query lang** | OpenCypher subset | Full Cypher | OpenCypher subset |
| **Execution** | Sparse matrix algebra | Pointer hopping | Sparse matrix algebra |
| **Traversal** | Matrix multiplication | Index-free adjacency | Matrix multiplication |
| **Memory** | In-memory + persistence | Disk-based + cache | In-memory |
| **Concurrency** | RW-lock + OpenMP | MVCC (JVM) | RW-lock |
| **Vector index** | Native (v4.0+) | Via plugin | No |
| **Clustering** | Redis Cluster/Sentinel | Causal clustering | Redis Cluster |
| **License** | SSPLv1 | GPL/Commercial | EOL |
| **AI focus** | Primary (GraphRAG SDK) | Secondary (GenAI plugin) | None |
| **Bolt protocol** | Yes (v4.0+) | Native | No |
### Memory Management
- **In-memory architecture**: All graph data in RAM
- **Redis module model**: Leverages Redis memory allocation
- **Compact storage** (v4.14.10): 30% memory reduction
- **GRAPH.MEMORY**: Runtime diagnostics
- **Production guidance**: 48GB allocation for high-fragmentation; restart if ratio >10
- **Automatic index shrinking**: Deleted entries trigger compaction
---
## 5. Ecosystem
### Official Client Libraries
| Language | Package | License |
|----------|---------|---------|
| Python | [falkordb-py](https://pypi.org/project/FalkorDB/) | MIT |
| Node.js | [falkordb-ts](https://www.npmjs.com/package/falkordb) | MIT |
| Java | [jfalkordb](https://search.maven.org/search?q=jfalkordb) | BSD |
| **Rust** | [falkordb-rs](https://crates.io/crates/falkordb) | MIT |
| Go | [falkordb-go](https://github.com/FalkorDB/falkordb-go) | BSD |
| C# | [NFalkorDB](https://www.nuget.org/packages/NFalkorDB) | Apache 2.0 |
**OGM (Object-Graph Mapping)**: Python ORM, Go ORM, Spring Data (Java)
**Community**: 20+ implementations (Elixir, Ruby, PHP, Julia, etc.)
### Cloud Offerings
| Tier | Price | Key Features |
|------|-------|--------------|
| Free | $0 | Multi-graph, ACL, community support |
| Startup | From $73/GB/mo | TLS, automated backups |
| Pro | From $350/8GB/mo | Clustering, HA, multi-zone |
| Enterprise | Custom | VPC, 24/7 support, dedicated AM |
Available on **AWS Marketplace** and **Google Cloud Marketplace**.
### Community
- ~3,600 GitHub stars, ~280 forks
- Funded startup (Crunchbase-listed)
- [falkordb-browser](https://github.com/FalkorDB/falkordb-browser) for visual graph exploration
- **FalkorDBLite**: Embedded Python variant with process isolation
---
## 6. Relevance to RuVector
### Direct Architectural Parallels
**Sparse Matrix Algebra for Graphs**: FalkorDB's core insight — expressing graph traversals as sparse matrix multiplications via GraphBLAS — is directly relevant to RuVector's graph computation workloads. The `ruvector-graph` crate already uses `petgraph` and `roaring` bitmaps. FalkorDB's approach demonstrates that CSC sparse matrices + linear algebra can achieve 10-345x improvements over traditional traversal.
**HNSW Integration**: Both projects use HNSW indexing. RuVector has dedicated crates (`ruvector-hyperbolic-hnsw`, `micro-hnsw-wasm`) while FalkorDB's vector indexes use similar ANN search. RuVector's `ruvector-gnn` operates on HNSW topology.
**Cypher Query Language**: The `ruvector-graph` crate includes Cypher parsing dependencies (`nom`, `pest`, `lalrpop-util`). FalkorDB's OpenCypher implementation with Bolt protocol is a mature reference. The Rust port at [FalkorDB-core-rs](https://github.com/FalkorDB/FalkorDB-core-rs) could serve as a Rust-native reference.
### GNN and Graph Transformer Integration
FalkorDB stores graphs as sparse matrices — the exact format consumed by GNN pipelines:
1. **Store knowledge graphs** in FalkorDB for persistent, queryable storage
2. **Export adjacency matrices** in sparse CSC format for `ruvector-gnn`
3. **Run GNN message-passing** on exported topology via RuVector's ndarray/rayon computation
4. **Write computed embeddings back** to FalkorDB as vector properties for hybrid queries
The `ruvector-graph-transformer` (unified graph transformer with proof-gated mutation) could use FalkorDB as graph storage, querying subgraphs via Cypher and computing attention over them.
### Solver and Optimization
FalkorDB's GraphBLAS execution translates graph problems into linear algebra — the same domain as `ruvector-solver`. For combinatorial optimization (min-cut, max-flow, partitioning), FalkorDB's sparse matrix representation could serve as efficient input for `ruvector-mincut` and `ruvector-solver`.
### Sparse Inference
The `ruvector-sparse-inference` crate implements "PowerInfer-style sparse inference." FalkorDB demonstrates that sparsity-aware data structures (CSC) combined with hardware-optimized linear algebra (AVX, OpenMP) achieve orders-of-magnitude speedups — directly applicable to sparse neural network inference.
### Concrete Integration Opportunities
| Integration Point | RuVector Crate | FalkorDB Feature | Value |
|---|---|---|---|
| Graph storage backend | `ruvector-graph` | Property graph + Cypher | Persistent queryable graph with Bolt/RESP |
| GNN input pipeline | `ruvector-gnn` | Sparse adjacency matrices (CSC) | Native sparse matrix export for GNN |
| Vector hybrid queries | `ruvector-core` (HNSW) | Vector indexes + graph traversal | Graph-constrained ANN search |
| GraphRAG for ruvLLM | `ruvllm` | GraphRAG SDK | Knowledge-grounded LLM inference |
| Distributed graph | `ruvector-cluster`, `ruvector-raft` | Redis Cluster/Sentinel | HA graph storage |
| Embedding storage | `ruvector-attention` | Vector properties on nodes | Computed attention as graph metadata |
### The Rust Port Factor
[FalkorDB-core-rs](https://github.com/FalkorDB/FalkorDB-core-rs) (~95 commits) is particularly noteworthy. A Rust-native FalkorDB core could be embedded directly into RuVector as a library dependency, eliminating network overhead for graph queries during GNN training and transformer inference.
### Risks and Considerations
| Risk | Impact | Mitigation |
|------|--------|------------|
| **SSPLv1 License** | Restricts managed service offering | Legal review needed for embedding |
| **Redis Dependency** | Infrastructure overhead (Redis 7.4+) | Rust port may eliminate this |
| **OpenCypher Subset** | Complex queries may not work | Validate needed query patterns |
| **Write Serialization** | Bottleneck for embedding updates | Batch writes, partition graphs |
---
## Sources
- [FalkorDB GitHub Repository](https://github.com/FalkorDB/FalkorDB)
- [FalkorDB Documentation](https://docs.falkordb.com/)
- [FalkorDB Design Document](https://docs.falkordb.com/design/)
- [FalkorDB Performance Benchmarks vs Neo4j](https://www.falkordb.com/blog/graph-database-performance-benchmarks-falkordb-vs-neo4j/)
- [Best Database for Knowledge Graphs: FalkorDB vs Neo4j](https://www.falkordb.com/blog/best-database-for-knowledge-graphs-falkordb-neo4j/)
- [FalkorDB for AI and ML: Building Production-Ready GraphRAG Systems](https://orchestrator.dev/blog/2025-12-11-falkordb/)
- [FalkorDB: Open-Source Graph Database for Real-Time AI Agents (Medium)](https://medium.com/@CodePulse/falkordb-the-open-source-graph-database-built-for-real-time-ai-agents-8aff7b3400b3)
- [FalkorDB 4.0 Beta Release](https://www.falkordb.com/blog/falkordb-4-0-beta-released-major-improvements-and-critical-bug-fixes/)
- [FalkorDB Cloud Plans & Pricing](https://www.falkordb.com/plans/)
- [FalkorDB Rust Client (crates.io)](https://crates.io/crates/falkordb)
- [FalkorDB-core-rs: Rust Port](https://github.com/FalkorDB/FalkorDB-core-rs)
- [FalkorDB Client Libraries](https://docs.falkordb.com/getting-started/clients.html)
- [FalkorDB Kubernetes Support](https://docs.falkordb.com/operations/k8s-support.html)
- [FalkorDB GraphRAG SDK](https://github.com/FalkorDB/GraphRAG-SDK)
- [Graph Database Guide for AI Architects (2026)](https://www.falkordb.com/blog/graph-database-guide/)
- [FalkorDB vs Neo4j (PuppyGraph)](https://www.puppygraph.com/blog/falkordb-vs-neo4j)
- [FalkorDB vs Neo4j (DEV Community)](https://dev.to/danshalev7/falkordb-vs-neo4j-53bh)
- [FalkorDB GitHub Releases](https://github.com/FalkorDB/FalkorDB/releases)