Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
This commit is contained in:
344
vendor/ruvector/docs/research/FalkorDB/README.md
vendored
Normal file
344
vendor/ruvector/docs/research/FalkorDB/README.md
vendored
Normal file
@@ -0,0 +1,344 @@
|
||||
# FalkorDB: Comprehensive Technical Research Report
|
||||
|
||||
> Research date: 2026-02-26 | Branch: `research/falkordb-review`
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [1. Project Overview](#1-project-overview)
|
||||
- [2. Architecture](#2-architecture)
|
||||
- [3. Key Features](#3-key-features)
|
||||
- [4. Technical Deep Dive](#4-technical-deep-dive)
|
||||
- [5. Ecosystem](#5-ecosystem)
|
||||
- [6. Relevance to RuVector](#6-relevance-to-ruvector)
|
||||
- [Sources](#sources)
|
||||
|
||||
---
|
||||
|
||||
## 1. Project Overview
|
||||
|
||||
### What is FalkorDB?
|
||||
|
||||
FalkorDB is a high-performance, in-memory property graph database that runs as a Redis module. Its distinguishing characteristic is the use of **sparse adjacency matrices** (via the GraphBLAS standard) and **linear algebra operations** for graph traversal and query execution — a fundamentally different approach from the pointer-hopping model used by traditional graph databases like Neo4j.
|
||||
|
||||
The project positions itself as "the best Knowledge Graph for LLM (GraphRAG)," optimized for low-latency graph queries that serve AI/ML inference pipelines.
|
||||
|
||||
### Origins: The RedisGraph Lineage
|
||||
|
||||
FalkorDB is the **direct successor to RedisGraph**, which reached End-of-Life in January 2025 when Redis Ltd. discontinued it. The FalkorDB team forked and continued development, preserving the core sparse-matrix architecture while adding significant new capabilities including vector indexing, the Bolt protocol, and a dedicated GraphRAG SDK.
|
||||
|
||||
### License
|
||||
|
||||
**Server Side Public License v1 (SSPLv1)** — the same license used by MongoDB. This restricts offering FalkorDB as a managed service without a commercial agreement. A commercial Enterprise license is also available.
|
||||
|
||||
### Current Version and Activity
|
||||
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Latest release | v4.16.5 (February 2026) |
|
||||
| GitHub stars | ~3,600 |
|
||||
| Forks | ~280 |
|
||||
| Commits (master) | 2,172+ |
|
||||
| Primary language | C (Rust port in progress) |
|
||||
| Release cadence | Multiple per month |
|
||||
|
||||
**Key recent milestones**:
|
||||
- v4.16.0 (Dec 2025): User-Defined Functions (UDFs)
|
||||
- v4.14.10 (Dec 2025): 30% memory reduction via compact in-memory storage
|
||||
- v4.0 (2024): Vector index support, Bolt protocol
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture
|
||||
|
||||
### Core: Sparse Matrices + Linear Algebra
|
||||
|
||||
FalkorDB's architecture is built on a single foundational insight: **graph traversals can be expressed as sparse matrix multiplications**. Rather than crawling through pointer-linked node structures, FalkorDB translates Cypher pattern queries into algebraic expressions executed by [GraphBLAS](http://graphblas.org/).
|
||||
|
||||
**How it works**:
|
||||
|
||||
1. Graph topology is stored as sparse adjacency matrices in **CSC (Compressed Sparse Column)** format
|
||||
2. Each graph has one global adjacency matrix plus dedicated matrices per relationship type
|
||||
3. Label membership is stored as symmetric diagonal matrices
|
||||
4. A query like `(N0)-[A]->(N1)-[B]->(N2)` translates to the matrix multiplication `A * B`
|
||||
5. GraphBLAS executes the multiplication using CPU-level optimizations (AVX, OpenMP)
|
||||
|
||||
### Storage Engine
|
||||
|
||||
- **In-memory primary storage** with Redis-backed disk persistence
|
||||
- **Graph struct**: Central data structure managing entities through `DataBlock` arrays and GraphBLAS matrices for relationships
|
||||
- **DataBlock**: Contiguous memory blocks for node/edge properties — O(1) insertion (1M+ node creates in <500ms, 500K edges in 0.3s)
|
||||
- **Compact storage** (v4.14.10+): Dual representation approach achieving 30% memory reduction
|
||||
- **Roaring bitmaps** for label indexes
|
||||
|
||||
### Query Engine Pipeline
|
||||
|
||||
```
|
||||
Cypher Query
|
||||
|
|
||||
v
|
||||
[Parsing] --> AST (Lex tokenizer + Lemon parser)
|
||||
|
|
||||
v
|
||||
[Algebraic Translation] --> Matrix multiplication expressions
|
||||
|
|
||||
v
|
||||
[Optimization] --> Execution plans prioritizing sparse intermediates
|
||||
|
|
||||
v
|
||||
[Execution] --> Filtered traversal, conditional traversal, projection
|
||||
|
|
||||
v
|
||||
[Result Population] --> Matching entity attributes
|
||||
```
|
||||
|
||||
### Data Model
|
||||
|
||||
**Property Graph Model** (OpenCypher-compliant):
|
||||
- **Nodes**: Zero or more labels, key-value properties
|
||||
- **Relationships**: Exactly one type, key-value properties, directed
|
||||
- Relationships recorded in adjacency matrices: `M[source, destination] = 1`
|
||||
|
||||
### Concurrency Model
|
||||
|
||||
- **Read-write lock per graph**: Concurrent readers, serialized writers
|
||||
- **Intra-query parallelism**: Independent sub-expressions execute in parallel via OpenMP
|
||||
- **Redis event loop**: Module runs within Redis's single-threaded event loop, offloads heavy computation to worker threads
|
||||
|
||||
### Persistence and Replication
|
||||
|
||||
| Feature | Mechanism |
|
||||
|---------|-----------|
|
||||
| Persistence | Redis RDB snapshots + AOF |
|
||||
| Replication | Effect-based (deltas only) |
|
||||
| HA | Redis Sentinel for automatic failover |
|
||||
| Clustering | Redis Cluster (3 masters + 3 replicas) |
|
||||
| Kubernetes | Helm charts, KubeBlocks, dedicated operator |
|
||||
|
||||
---
|
||||
|
||||
## 3. Key Features
|
||||
|
||||
### Query Language: OpenCypher with Extensions
|
||||
|
||||
FalkorDB implements a **subset of OpenCypher** with proprietary extensions:
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `GRAPH.QUERY` / `GRAPH.RO_QUERY` | Execute read-write or read-only queries |
|
||||
| `GRAPH.EXPLAIN` / `GRAPH.PROFILE` | Query plan inspection and profiling |
|
||||
| `GRAPH.DELETE` | Drop a graph |
|
||||
| `GRAPH.INFO` / `GRAPH.MEMORY` | Metadata and memory diagnostics |
|
||||
| `GRAPH.COPY` | Duplicate a graph |
|
||||
|
||||
Standard Cypher clauses: `MATCH`, `CREATE`, `DELETE`, `SET`, `MERGE`, `WHERE`, `ORDER BY`, `RETURN`, `WITH`, `UNWIND`, etc.
|
||||
|
||||
### Indexing Capabilities
|
||||
|
||||
| Index Type | Description |
|
||||
|------------|-------------|
|
||||
| **Range** | Numeric/comparable values, efficient lookups |
|
||||
| **Full-text** | Text-based search queries |
|
||||
| **Vector** (v4.0+) | Configurable dimensionality, cosine/euclidean similarity |
|
||||
|
||||
```cypher
|
||||
-- Vector index example
|
||||
CREATE VECTOR INDEX FOR (n:Product) ON (n.embedding)
|
||||
OPTIONS {dimension: 1536, similarityFunction: 'cosine'}
|
||||
```
|
||||
|
||||
### Performance Benchmarks (vs Neo4j)
|
||||
|
||||
SNAP Pokec social network, 82% read / 18% write:
|
||||
|
||||
| Metric | FalkorDB | Neo4j | Ratio |
|
||||
|--------|----------|-------|-------|
|
||||
| p50 latency | 55ms | 577.5ms | **~10x faster** |
|
||||
| p90 latency | 108ms | 4,784ms | **~44x faster** |
|
||||
| p99 latency | 136.2ms | 46,924ms | **~345x faster** |
|
||||
| PageRank | 18.53ms | 417.31ms | **~23x faster** |
|
||||
| WCC | 17.8ms | 1,324ms | **~74x faster** |
|
||||
| Memory usage | 100MB | 600MB | **6x less** |
|
||||
|
||||
Key property: FalkorDB maintains a consistent 2.5x latency increase from p50 to p99, indicating predictable performance. Neo4j shows extreme tail latency variance due to JVM GC pauses.
|
||||
|
||||
### AI/ML Integrations
|
||||
|
||||
**GraphRAG SDK** ([GitHub](https://github.com/FalkorDB/GraphRAG-SDK)):
|
||||
- Converts user queries to Cypher via LLM
|
||||
- Retrieves relevant subgraphs as context for LLM generation
|
||||
- Claims up to 90% reduction in hallucinations vs. vector-only RAG
|
||||
- Multi-model configuration: separate models for graph construction vs. Q&A
|
||||
- Multi-agent support: specialized agents per knowledge domain
|
||||
|
||||
**Vector Support**:
|
||||
- Native vector indexes on node/edge properties
|
||||
- Hybrid queries: graph traversal narrows dataset, vector search ranks results
|
||||
- Integration with OpenAI, Anthropic, and other embedding providers
|
||||
|
||||
**Framework Integrations**: LangChain (Python + JS/TS), LlamaIndex, AG2/AutoGen, N8N + Graphiti
|
||||
|
||||
### Protocols
|
||||
|
||||
| Protocol | Port | Notes |
|
||||
|----------|------|-------|
|
||||
| RESP (Redis) | 6379 | Native Redis protocol |
|
||||
| Bolt (v4.0+) | 7687 | Neo4j-compatible, enables migration |
|
||||
|
||||
### User-Defined Functions (v4.16.0+)
|
||||
|
||||
UDFs allow extending query capabilities with custom functions, including graph object support.
|
||||
|
||||
---
|
||||
|
||||
## 4. Technical Deep Dive
|
||||
|
||||
### Core Data Structures
|
||||
|
||||
| Structure | Purpose |
|
||||
|-----------|---------|
|
||||
| GraphBLAS sparse matrices (CSC) | Adjacency representation |
|
||||
| DataBlock | Contiguous memory for node/edge properties |
|
||||
| Label matrices | Diagonal matrices for node-label membership |
|
||||
| Graph struct | Central coordinator for DataBlocks + matrices |
|
||||
| AST (Lex + Lemon) | Query parsing and IR |
|
||||
| Execution plan | Optimized query tree with algebraic ops |
|
||||
|
||||
### Comparison with Other Graph Databases
|
||||
|
||||
| Dimension | FalkorDB | Neo4j | RedisGraph (EOL) |
|
||||
|-----------|----------|-------|-------------------|
|
||||
| **Language** | C (Rust port underway) | Java (JVM) | C |
|
||||
| **Graph model** | Property Graph | Property Graph | Property Graph |
|
||||
| **Query lang** | OpenCypher subset | Full Cypher | OpenCypher subset |
|
||||
| **Execution** | Sparse matrix algebra | Pointer hopping | Sparse matrix algebra |
|
||||
| **Traversal** | Matrix multiplication | Index-free adjacency | Matrix multiplication |
|
||||
| **Memory** | In-memory + persistence | Disk-based + cache | In-memory |
|
||||
| **Concurrency** | RW-lock + OpenMP | MVCC (JVM) | RW-lock |
|
||||
| **Vector index** | Native (v4.0+) | Via plugin | No |
|
||||
| **Clustering** | Redis Cluster/Sentinel | Causal clustering | Redis Cluster |
|
||||
| **License** | SSPLv1 | GPL/Commercial | EOL |
|
||||
| **AI focus** | Primary (GraphRAG SDK) | Secondary (GenAI plugin) | None |
|
||||
| **Bolt protocol** | Yes (v4.0+) | Native | No |
|
||||
|
||||
### Memory Management
|
||||
|
||||
- **In-memory architecture**: All graph data in RAM
|
||||
- **Redis module model**: Leverages Redis memory allocation
|
||||
- **Compact storage** (v4.14.10): 30% memory reduction
|
||||
- **GRAPH.MEMORY**: Runtime diagnostics
|
||||
- **Production guidance**: 48GB allocation for high-fragmentation; restart if ratio >10
|
||||
- **Automatic index shrinking**: Deleted entries trigger compaction
|
||||
|
||||
---
|
||||
|
||||
## 5. Ecosystem
|
||||
|
||||
### Official Client Libraries
|
||||
|
||||
| Language | Package | License |
|
||||
|----------|---------|---------|
|
||||
| Python | [falkordb-py](https://pypi.org/project/FalkorDB/) | MIT |
|
||||
| Node.js | [falkordb-ts](https://www.npmjs.com/package/falkordb) | MIT |
|
||||
| Java | [jfalkordb](https://search.maven.org/search?q=jfalkordb) | BSD |
|
||||
| **Rust** | [falkordb-rs](https://crates.io/crates/falkordb) | MIT |
|
||||
| Go | [falkordb-go](https://github.com/FalkorDB/falkordb-go) | BSD |
|
||||
| C# | [NFalkorDB](https://www.nuget.org/packages/NFalkorDB) | Apache 2.0 |
|
||||
|
||||
**OGM (Object-Graph Mapping)**: Python ORM, Go ORM, Spring Data (Java)
|
||||
**Community**: 20+ implementations (Elixir, Ruby, PHP, Julia, etc.)
|
||||
|
||||
### Cloud Offerings
|
||||
|
||||
| Tier | Price | Key Features |
|
||||
|------|-------|--------------|
|
||||
| Free | $0 | Multi-graph, ACL, community support |
|
||||
| Startup | From $73/GB/mo | TLS, automated backups |
|
||||
| Pro | From $350/8GB/mo | Clustering, HA, multi-zone |
|
||||
| Enterprise | Custom | VPC, 24/7 support, dedicated AM |
|
||||
|
||||
Available on **AWS Marketplace** and **Google Cloud Marketplace**.
|
||||
|
||||
### Community
|
||||
|
||||
- ~3,600 GitHub stars, ~280 forks
|
||||
- Funded startup (Crunchbase-listed)
|
||||
- [falkordb-browser](https://github.com/FalkorDB/falkordb-browser) for visual graph exploration
|
||||
- **FalkorDBLite**: Embedded Python variant with process isolation
|
||||
|
||||
---
|
||||
|
||||
## 6. Relevance to RuVector
|
||||
|
||||
### Direct Architectural Parallels
|
||||
|
||||
**Sparse Matrix Algebra for Graphs**: FalkorDB's core insight — expressing graph traversals as sparse matrix multiplications via GraphBLAS — is directly relevant to RuVector's graph computation workloads. The `ruvector-graph` crate already uses `petgraph` and `roaring` bitmaps. FalkorDB's approach demonstrates that CSC sparse matrices + linear algebra can achieve 10-345x improvements over traditional traversal.
|
||||
|
||||
**HNSW Integration**: Both projects use HNSW indexing. RuVector has dedicated crates (`ruvector-hyperbolic-hnsw`, `micro-hnsw-wasm`) while FalkorDB's vector indexes use similar ANN search. RuVector's `ruvector-gnn` operates on HNSW topology.
|
||||
|
||||
**Cypher Query Language**: The `ruvector-graph` crate includes Cypher parsing dependencies (`nom`, `pest`, `lalrpop-util`). FalkorDB's OpenCypher implementation with Bolt protocol is a mature reference. The Rust port at [FalkorDB-core-rs](https://github.com/FalkorDB/FalkorDB-core-rs) could serve as a Rust-native reference.
|
||||
|
||||
### GNN and Graph Transformer Integration
|
||||
|
||||
FalkorDB stores graphs as sparse matrices — the exact format consumed by GNN pipelines:
|
||||
|
||||
1. **Store knowledge graphs** in FalkorDB for persistent, queryable storage
|
||||
2. **Export adjacency matrices** in sparse CSC format for `ruvector-gnn`
|
||||
3. **Run GNN message-passing** on exported topology via RuVector's ndarray/rayon computation
|
||||
4. **Write computed embeddings back** to FalkorDB as vector properties for hybrid queries
|
||||
|
||||
The `ruvector-graph-transformer` (unified graph transformer with proof-gated mutation) could use FalkorDB as graph storage, querying subgraphs via Cypher and computing attention over them.
|
||||
|
||||
### Solver and Optimization
|
||||
|
||||
FalkorDB's GraphBLAS execution translates graph problems into linear algebra — the same domain as `ruvector-solver`. For combinatorial optimization (min-cut, max-flow, partitioning), FalkorDB's sparse matrix representation could serve as efficient input for `ruvector-mincut` and `ruvector-solver`.
|
||||
|
||||
### Sparse Inference
|
||||
|
||||
The `ruvector-sparse-inference` crate implements "PowerInfer-style sparse inference." FalkorDB demonstrates that sparsity-aware data structures (CSC) combined with hardware-optimized linear algebra (AVX, OpenMP) achieve orders-of-magnitude speedups — directly applicable to sparse neural network inference.
|
||||
|
||||
### Concrete Integration Opportunities
|
||||
|
||||
| Integration Point | RuVector Crate | FalkorDB Feature | Value |
|
||||
|---|---|---|---|
|
||||
| Graph storage backend | `ruvector-graph` | Property graph + Cypher | Persistent queryable graph with Bolt/RESP |
|
||||
| GNN input pipeline | `ruvector-gnn` | Sparse adjacency matrices (CSC) | Native sparse matrix export for GNN |
|
||||
| Vector hybrid queries | `ruvector-core` (HNSW) | Vector indexes + graph traversal | Graph-constrained ANN search |
|
||||
| GraphRAG for ruvLLM | `ruvllm` | GraphRAG SDK | Knowledge-grounded LLM inference |
|
||||
| Distributed graph | `ruvector-cluster`, `ruvector-raft` | Redis Cluster/Sentinel | HA graph storage |
|
||||
| Embedding storage | `ruvector-attention` | Vector properties on nodes | Computed attention as graph metadata |
|
||||
|
||||
### The Rust Port Factor
|
||||
|
||||
[FalkorDB-core-rs](https://github.com/FalkorDB/FalkorDB-core-rs) (~95 commits) is particularly noteworthy. A Rust-native FalkorDB core could be embedded directly into RuVector as a library dependency, eliminating network overhead for graph queries during GNN training and transformer inference.
|
||||
|
||||
### Risks and Considerations
|
||||
|
||||
| Risk | Impact | Mitigation |
|
||||
|------|--------|------------|
|
||||
| **SSPLv1 License** | Restricts managed service offering | Legal review needed for embedding |
|
||||
| **Redis Dependency** | Infrastructure overhead (Redis 7.4+) | Rust port may eliminate this |
|
||||
| **OpenCypher Subset** | Complex queries may not work | Validate needed query patterns |
|
||||
| **Write Serialization** | Bottleneck for embedding updates | Batch writes, partition graphs |
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
- [FalkorDB GitHub Repository](https://github.com/FalkorDB/FalkorDB)
|
||||
- [FalkorDB Documentation](https://docs.falkordb.com/)
|
||||
- [FalkorDB Design Document](https://docs.falkordb.com/design/)
|
||||
- [FalkorDB Performance Benchmarks vs Neo4j](https://www.falkordb.com/blog/graph-database-performance-benchmarks-falkordb-vs-neo4j/)
|
||||
- [Best Database for Knowledge Graphs: FalkorDB vs Neo4j](https://www.falkordb.com/blog/best-database-for-knowledge-graphs-falkordb-neo4j/)
|
||||
- [FalkorDB for AI and ML: Building Production-Ready GraphRAG Systems](https://orchestrator.dev/blog/2025-12-11-falkordb/)
|
||||
- [FalkorDB: Open-Source Graph Database for Real-Time AI Agents (Medium)](https://medium.com/@CodePulse/falkordb-the-open-source-graph-database-built-for-real-time-ai-agents-8aff7b3400b3)
|
||||
- [FalkorDB 4.0 Beta Release](https://www.falkordb.com/blog/falkordb-4-0-beta-released-major-improvements-and-critical-bug-fixes/)
|
||||
- [FalkorDB Cloud Plans & Pricing](https://www.falkordb.com/plans/)
|
||||
- [FalkorDB Rust Client (crates.io)](https://crates.io/crates/falkordb)
|
||||
- [FalkorDB-core-rs: Rust Port](https://github.com/FalkorDB/FalkorDB-core-rs)
|
||||
- [FalkorDB Client Libraries](https://docs.falkordb.com/getting-started/clients.html)
|
||||
- [FalkorDB Kubernetes Support](https://docs.falkordb.com/operations/k8s-support.html)
|
||||
- [FalkorDB GraphRAG SDK](https://github.com/FalkorDB/GraphRAG-SDK)
|
||||
- [Graph Database Guide for AI Architects (2026)](https://www.falkordb.com/blog/graph-database-guide/)
|
||||
- [FalkorDB vs Neo4j (PuppyGraph)](https://www.puppygraph.com/blog/falkordb-vs-neo4j)
|
||||
- [FalkorDB vs Neo4j (DEV Community)](https://dev.to/danshalev7/falkordb-vs-neo4j-53bh)
|
||||
- [FalkorDB GitHub Releases](https://github.com/FalkorDB/FalkorDB/releases)
|
||||
Reference in New Issue
Block a user