dearsky/wifi-densepose

Fork 0

Files

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

16 KiB

Raw Blame History

FalkorDB: Comprehensive Technical Research Report

Research date: 2026-02-26 | Branch: research/falkordb-review

1. Project Overview
2. Architecture
3. Key Features
4. Technical Deep Dive
5. Ecosystem
6. Relevance to RuVector
Sources

1. Project Overview

What is FalkorDB?

FalkorDB is a high-performance, in-memory property graph database that runs as a Redis module. Its distinguishing characteristic is the use of sparse adjacency matrices (via the GraphBLAS standard) and linear algebra operations for graph traversal and query execution — a fundamentally different approach from the pointer-hopping model used by traditional graph databases like Neo4j.

The project positions itself as "the best Knowledge Graph for LLM (GraphRAG)," optimized for low-latency graph queries that serve AI/ML inference pipelines.

Origins: The RedisGraph Lineage

FalkorDB is the direct successor to RedisGraph, which reached End-of-Life in January 2025 when Redis Ltd. discontinued it. The FalkorDB team forked and continued development, preserving the core sparse-matrix architecture while adding significant new capabilities including vector indexing, the Bolt protocol, and a dedicated GraphRAG SDK.

License

Server Side Public License v1 (SSPLv1) — the same license used by MongoDB. This restricts offering FalkorDB as a managed service without a commercial agreement. A commercial Enterprise license is also available.

Current Version and Activity

Metric	Value
Latest release	v4.16.5 (February 2026)
GitHub stars	~3,600
Forks	~280
Commits (master)	2,172+
Primary language	C (Rust port in progress)
Release cadence	Multiple per month

Key recent milestones:

v4.16.0 (Dec 2025): User-Defined Functions (UDFs)
v4.14.10 (Dec 2025): 30% memory reduction via compact in-memory storage
v4.0 (2024): Vector index support, Bolt protocol

2. Architecture

Core: Sparse Matrices + Linear Algebra

FalkorDB's architecture is built on a single foundational insight: graph traversals can be expressed as sparse matrix multiplications. Rather than crawling through pointer-linked node structures, FalkorDB translates Cypher pattern queries into algebraic expressions executed by GraphBLAS.

How it works:

Graph topology is stored as sparse adjacency matrices in CSC (Compressed Sparse Column) format
Each graph has one global adjacency matrix plus dedicated matrices per relationship type
Label membership is stored as symmetric diagonal matrices
A query like (N0)-[A]->(N1)-[B]->(N2) translates to the matrix multiplication A * B
GraphBLAS executes the multiplication using CPU-level optimizations (AVX, OpenMP)

Storage Engine

In-memory primary storage with Redis-backed disk persistence
Graph struct: Central data structure managing entities through DataBlock arrays and GraphBLAS matrices for relationships
DataBlock: Contiguous memory blocks for node/edge properties — O(1) insertion (1M+ node creates in <500ms, 500K edges in 0.3s)
Compact storage (v4.14.10+): Dual representation approach achieving 30% memory reduction
Roaring bitmaps for label indexes

Query Engine Pipeline

Cypher Query
    |
    v
[Parsing] --> AST (Lex tokenizer + Lemon parser)
    |
    v
[Algebraic Translation] --> Matrix multiplication expressions
    |
    v
[Optimization] --> Execution plans prioritizing sparse intermediates
    |
    v
[Execution] --> Filtered traversal, conditional traversal, projection
    |
    v
[Result Population] --> Matching entity attributes

Data Model

Property Graph Model (OpenCypher-compliant):

Nodes: Zero or more labels, key-value properties
Relationships: Exactly one type, key-value properties, directed
Relationships recorded in adjacency matrices: M[source, destination] = 1

Concurrency Model

Read-write lock per graph: Concurrent readers, serialized writers
Intra-query parallelism: Independent sub-expressions execute in parallel via OpenMP
Redis event loop: Module runs within Redis's single-threaded event loop, offloads heavy computation to worker threads

Persistence and Replication

Feature	Mechanism
Persistence	Redis RDB snapshots + AOF
Replication	Effect-based (deltas only)
HA	Redis Sentinel for automatic failover
Clustering	Redis Cluster (3 masters + 3 replicas)
Kubernetes	Helm charts, KubeBlocks, dedicated operator

3. Key Features

Query Language: OpenCypher with Extensions

FalkorDB implements a subset of OpenCypher with proprietary extensions:

Command	Purpose
`GRAPH.QUERY` / `GRAPH.RO_QUERY`	Execute read-write or read-only queries
`GRAPH.EXPLAIN` / `GRAPH.PROFILE`	Query plan inspection and profiling
`GRAPH.DELETE`	Drop a graph
`GRAPH.INFO` / `GRAPH.MEMORY`	Metadata and memory diagnostics
`GRAPH.COPY`	Duplicate a graph

Standard Cypher clauses: MATCH, CREATE, DELETE, SET, MERGE, WHERE, ORDER BY, RETURN, WITH, UNWIND, etc.

Indexing Capabilities

Index Type	Description
Range	Numeric/comparable values, efficient lookups
Full-text	Text-based search queries
Vector (v4.0+)	Configurable dimensionality, cosine/euclidean similarity

-- Vector index example
CREATE VECTOR INDEX FOR (n:Product) ON (n.embedding)
OPTIONS {dimension: 1536, similarityFunction: 'cosine'}

Performance Benchmarks (vs Neo4j)

SNAP Pokec social network, 82% read / 18% write:

Metric	FalkorDB	Neo4j	Ratio
p50 latency	55ms	577.5ms	~10x faster
p90 latency	108ms	4,784ms	~44x faster
p99 latency	136.2ms	46,924ms	~345x faster
PageRank	18.53ms	417.31ms	~23x faster
WCC	17.8ms	1,324ms	~74x faster
Memory usage	100MB	600MB	6x less

Key property: FalkorDB maintains a consistent 2.5x latency increase from p50 to p99, indicating predictable performance. Neo4j shows extreme tail latency variance due to JVM GC pauses.

AI/ML Integrations

GraphRAG SDK (GitHub):

Converts user queries to Cypher via LLM
Retrieves relevant subgraphs as context for LLM generation
Claims up to 90% reduction in hallucinations vs. vector-only RAG
Multi-model configuration: separate models for graph construction vs. Q&A
Multi-agent support: specialized agents per knowledge domain

Vector Support:

Native vector indexes on node/edge properties
Hybrid queries: graph traversal narrows dataset, vector search ranks results
Integration with OpenAI, Anthropic, and other embedding providers

Framework Integrations: LangChain (Python + JS/TS), LlamaIndex, AG2/AutoGen, N8N + Graphiti

Protocols

Protocol	Port	Notes
RESP (Redis)	6379	Native Redis protocol
Bolt (v4.0+)	7687	Neo4j-compatible, enables migration

User-Defined Functions (v4.16.0+)

UDFs allow extending query capabilities with custom functions, including graph object support.

4. Technical Deep Dive

Core Data Structures

Structure	Purpose
GraphBLAS sparse matrices (CSC)	Adjacency representation
DataBlock	Contiguous memory for node/edge properties
Label matrices	Diagonal matrices for node-label membership
Graph struct	Central coordinator for DataBlocks + matrices
AST (Lex + Lemon)	Query parsing and IR
Execution plan	Optimized query tree with algebraic ops

Comparison with Other Graph Databases

Dimension	FalkorDB	Neo4j	RedisGraph (EOL)
Language	C (Rust port underway)	Java (JVM)	C
Graph model	Property Graph	Property Graph	Property Graph
Query lang	OpenCypher subset	Full Cypher	OpenCypher subset
Execution	Sparse matrix algebra	Pointer hopping	Sparse matrix algebra
Traversal	Matrix multiplication	Index-free adjacency	Matrix multiplication
Memory	In-memory + persistence	Disk-based + cache	In-memory
Concurrency	RW-lock + OpenMP	MVCC (JVM)	RW-lock
Vector index	Native (v4.0+)	Via plugin	No
Clustering	Redis Cluster/Sentinel	Causal clustering	Redis Cluster
License	SSPLv1	GPL/Commercial	EOL
AI focus	Primary (GraphRAG SDK)	Secondary (GenAI plugin)	None
Bolt protocol	Yes (v4.0+)	Native	No

Memory Management

In-memory architecture: All graph data in RAM
Redis module model: Leverages Redis memory allocation
Compact storage (v4.14.10): 30% memory reduction
GRAPH.MEMORY: Runtime diagnostics
Production guidance: 48GB allocation for high-fragmentation; restart if ratio >10
Automatic index shrinking: Deleted entries trigger compaction

5. Ecosystem

Official Client Libraries

Language	Package	License
Python	falkordb-py	MIT
Node.js	falkordb-ts	MIT
Java	jfalkordb	BSD
Rust	falkordb-rs	MIT
Go	falkordb-go	BSD
C#	NFalkorDB	Apache 2.0

OGM (Object-Graph Mapping): Python ORM, Go ORM, Spring Data (Java) Community: 20+ implementations (Elixir, Ruby, PHP, Julia, etc.)

Cloud Offerings

Tier	Price	Key Features
Free	$0	Multi-graph, ACL, community support
Startup	From $73/GB/mo	TLS, automated backups
Pro	From $350/8GB/mo	Clustering, HA, multi-zone
Enterprise	Custom	VPC, 24/7 support, dedicated AM

Available on AWS Marketplace and Google Cloud Marketplace.

Community

~3,600 GitHub stars, ~280 forks
Funded startup (Crunchbase-listed)
falkordb-browser for visual graph exploration
FalkorDBLite: Embedded Python variant with process isolation

6. Relevance to RuVector

Direct Architectural Parallels

Sparse Matrix Algebra for Graphs: FalkorDB's core insight — expressing graph traversals as sparse matrix multiplications via GraphBLAS — is directly relevant to RuVector's graph computation workloads. The ruvector-graph crate already uses petgraph and roaring bitmaps. FalkorDB's approach demonstrates that CSC sparse matrices + linear algebra can achieve 10-345x improvements over traditional traversal.

HNSW Integration: Both projects use HNSW indexing. RuVector has dedicated crates (ruvector-hyperbolic-hnsw, micro-hnsw-wasm) while FalkorDB's vector indexes use similar ANN search. RuVector's ruvector-gnn operates on HNSW topology.

Cypher Query Language: The ruvector-graph crate includes Cypher parsing dependencies (nom, pest, lalrpop-util). FalkorDB's OpenCypher implementation with Bolt protocol is a mature reference. The Rust port at FalkorDB-core-rs could serve as a Rust-native reference.

GNN and Graph Transformer Integration

FalkorDB stores graphs as sparse matrices — the exact format consumed by GNN pipelines:

Store knowledge graphs in FalkorDB for persistent, queryable storage
Export adjacency matrices in sparse CSC format for ruvector-gnn
Run GNN message-passing on exported topology via RuVector's ndarray/rayon computation
Write computed embeddings back to FalkorDB as vector properties for hybrid queries

The ruvector-graph-transformer (unified graph transformer with proof-gated mutation) could use FalkorDB as graph storage, querying subgraphs via Cypher and computing attention over them.

Solver and Optimization

FalkorDB's GraphBLAS execution translates graph problems into linear algebra — the same domain as ruvector-solver. For combinatorial optimization (min-cut, max-flow, partitioning), FalkorDB's sparse matrix representation could serve as efficient input for ruvector-mincut and ruvector-solver.

Sparse Inference

The ruvector-sparse-inference crate implements "PowerInfer-style sparse inference." FalkorDB demonstrates that sparsity-aware data structures (CSC) combined with hardware-optimized linear algebra (AVX, OpenMP) achieve orders-of-magnitude speedups — directly applicable to sparse neural network inference.

Concrete Integration Opportunities

Integration Point	RuVector Crate	FalkorDB Feature	Value
Graph storage backend	`ruvector-graph`	Property graph + Cypher	Persistent queryable graph with Bolt/RESP
GNN input pipeline	`ruvector-gnn`	Sparse adjacency matrices (CSC)	Native sparse matrix export for GNN
Vector hybrid queries	`ruvector-core` (HNSW)	Vector indexes + graph traversal	Graph-constrained ANN search
GraphRAG for ruvLLM	`ruvllm`	GraphRAG SDK	Knowledge-grounded LLM inference
Distributed graph	`ruvector-cluster`, `ruvector-raft`	Redis Cluster/Sentinel	HA graph storage
Embedding storage	`ruvector-attention`	Vector properties on nodes	Computed attention as graph metadata

The Rust Port Factor

FalkorDB-core-rs (~95 commits) is particularly noteworthy. A Rust-native FalkorDB core could be embedded directly into RuVector as a library dependency, eliminating network overhead for graph queries during GNN training and transformer inference.

Risks and Considerations

Risk	Impact	Mitigation
SSPLv1 License	Restricts managed service offering	Legal review needed for embedding
Redis Dependency	Infrastructure overhead (Redis 7.4+)	Rust port may eliminate this
OpenCypher Subset	Complex queries may not work	Validate needed query patterns
Write Serialization	Bottleneck for embedding updates	Batch writes, partition graphs

16 KiB Raw Blame History