Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
docs/postgres/v2/00-overview.md (new file, 645 lines)
# RuVector Postgres v2 - Architecture Overview

<!-- Last reviewed: 2025-12-25 -->

## What We're Building

Most databases, including vector databases, are **performance-first systems**. They optimize for speed, recall, and throughput, then bolt on monitoring. Structural safety is assumed, not measured.

RuVector does something different.

We give the system a **continuous, internal measure of its own structural integrity**, and the ability to **change its own behavior based on that signal**.

This puts RuVector in a very small class of systems.

---

## Why This Actually Matters

### 1. From Symptom Monitoring to Causal Monitoring

Everyone else watches outputs: latency, errors, recall.

We watch **connectivity and dependence**, which are upstream causes.

By the time latency spikes, the graph has already weakened. We detect that weakening while everything still looks healthy.

> **This is the difference between a smoke alarm and a structural stress sensor.**

### 2. Mincut Is a Leading Indicator, Not a Metric

Mincut answers a question no metric answers:

> *"How close is this system to splitting?"*

Not how slow it is. Not how many errors. **How close it is to losing coherence.**

That is a different axis of observability.

### 3. An Algorithm Becomes a Control Signal

Most people use graph algorithms for analysis. We use mincut to **gate behavior**.

That makes it a **control plane**, not analytics.

Very few production systems have mathematically grounded control loops.

### 4. Failure Mode Changes Class

| Without Integrity Control | With Integrity Control |
|---------------------------|------------------------|
| Fast → stressed → cascading failure → manual recovery | Fast → stressed → scope reduction → graceful degradation → automatic recovery |

Changing failure mode is what separates hobby systems from infrastructure.

### 5. Explainable Operations

The **witness edges** are huge.

When something slows down or freezes, we can say: *"Here are the exact links that would have failed next."*

That is gold in production, audits, and regulated environments.

---

## Why Nobody Else Has Done This

Not because it's impossible. Because:

1. **Most systems don't model themselves as graphs** — we do
2. **Mincut was too expensive dynamically** — we use contracted graphs (~1000 nodes, not millions)
3. **Ops culture reacts, it doesn't preempt** — we preempt
4. **Survivability isn't a KPI until after outages** — we measure it continuously
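To see why contraction makes the cost acceptable: on an operational graph of roughly a thousand nodes, even a simple randomized contraction in the style of Karger's algorithm is cheap enough to run periodically. This is an illustrative sketch, not the production controller's algorithm:

```python
import random

def karger_min_cut(edges, n_nodes, trials=200, seed=0):
    """Estimate the minimum cut of a connected undirected graph by repeated
    random edge contraction (Karger-style). Practical at contracted-graph
    scale (~1000 nodes); the shipped controller may use a different method."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n_nodes))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        components = n_nodes
        pool = edges[:]
        rng.shuffle(pool)
        for u, v in pool:
            if components == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:          # contract the edge (merge super-nodes)
                parent[ru] = rv
                components -= 1
        # Edges crossing the two remaining super-nodes form a cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best

# Two dense clusters joined by a single bridge edge: true mincut is 1.
cluster_a = [(0, 1), (1, 2), (0, 2)]
cluster_b = [(3, 4), (4, 5), (3, 5)]
bridge = [(2, 3)]
print(karger_min_cut(cluster_a + cluster_b + bridge, 6))  # -> 1
```

The bridge edge is exactly the kind of "witness edge" the controller reports: the link whose loss splits the system.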
---

## The Honest Framing

Will this get applause from model benchmarks or social media? No.

Will this make systems boringly reliable and therefore indispensable? Yes.

Those are the ideas that end up everywhere.

**We're not making vector search faster. We're making vector infrastructure survivable.**

---

## What This Is, Concretely

RuVector Postgres v2 is a **PostgreSQL extension** (built with pgrx) that provides:

- **100% pgvector compatibility** — drop-in replacement: change the extension name and queries work unchanged
- **Architecture separation** — PostgreSQL handles ACID/joins, RuVector handles vectors/graphs/learning
- **Dynamic mincut integrity gating** — the control plane described above
- **Self-learning pipeline** — GNN-based query optimization that improves over time
- **Tiered storage** — automatic hot/warm/cool/cold management with compression
- **Graph engine with Cypher** — property graphs with SQL joins

---
## Architecture Principles

### Separation of Concerns

```
+------------------------------------------------------------------+
|                       PostgreSQL Frontend                        |
|  (SQL Parsing, Planning, ACID, Transactions, Joins, Aggregates)  |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
|                    Extension Boundary (pgrx)                     |
|  - Type definitions (vector, sparsevec, halfvec)                 |
|  - Operator overloads (<->, <=>, <#>)                            |
|  - Index access method hooks                                     |
|  - Background worker registration                                |
+------------------------------------------------------------------+
                                |
                                v
+------------------------------------------------------------------+
|                      RuVector Engine (Rust)                      |
|  - HNSW/IVFFlat indexing                                         |
|  - SIMD distance calculations                                    |
|  - Graph storage & Cypher execution                              |
|  - GNN training & inference                                      |
|  - Compression & tiering                                         |
|  - Mincut integrity control                                      |
+------------------------------------------------------------------+
```

### Core Design Decisions

| Decision | Rationale |
|----------|-----------|
| **pgrx for extension** | Safe Rust bindings, modern build system, well-maintained |
| **Background worker pattern** | Long-lived engine, avoids per-query initialization |
| **Shared memory IPC** | Bounded request queue with explicit payload limits (see [02-background-workers](02-background-workers.md)) |
| **WAL as source of truth** | Leverages Postgres replication and durability guarantees |
| **Contracted mincut graph** | Never compute on the full similarity graph; use the contracted operational graph |
| **Hybrid consistency** | Synchronous hot tier, async background ops (see [10-consistency-replication](10-consistency-replication.md)) |

---

## System Architecture

### High-Level Components

```
              +-----------------------+
              |  Client Application   |
              +-----------+-----------+
                          |
              +-----------v-----------+
              |      PostgreSQL       |
              |  +-----------------+  |
              |  | Query Executor  |  |
              |  +--------+--------+  |
              |           |           |
              |  +--------v--------+  |
              |  |  RuVector SQL   |  |
              |  |  Surface Layer  |  |
              |  +--------+--------+  |
              +-----------|-----------+
                          |
     +--------------------+--------------------+
     |                                         |
+----------v----------+            +-----------v-----------+
|   Index AM Hooks    |            |  Background Workers   |
|   (HNSW, IVFFlat)   |            |  (Maintenance, GNN)   |
+----------+----------+            +-----------+-----------+
     |                                         |
     +--------------------+--------------------+
                          |
              +-----------v-----------+
              |     Shared Memory     |
              |     Communication     |
              +-----------+-----------+
                          |
              +-----------v-----------+
              |    RuVector Engine    |
              |  +-------+ +-------+  |
              |  | Index | | Graph |  |
              |  +-------+ +-------+  |
              |  +-------+ +-------+  |
              |  |  GNN  | | Tier  |  |
              |  +-------+ +-------+  |
              |  +-------------------+|
              |  |  Integrity Ctrl   ||
              |  +-------------------+|
              +-----------------------+
```
### Component Responsibilities

#### 1. SQL Surface Layer
- **pgvector type compatibility**: `vector(n)`, operators `<->`, `<#>`, `<=>`
- **Extended types**: `sparsevec`, `halfvec`, `binaryvec`
- **Function catalog**: `ruvector_*` functions for advanced features
- **Views**: `ruvector_nodes`, `ruvector_edges`, `ruvector_hyperedges`

#### 2. Index Access Methods
- **ruhnsw**: HNSW index with configurable M, ef_construction
- **ruivfflat**: IVF-Flat index with automatic centroid updates
- **Scan hooks**: Route queries to the RuVector engine
- **Build hooks**: Incremental and bulk index construction

#### 3. Background Workers
- **Engine Worker**: Long-lived RuVector engine instance
- **Maintenance Worker**: Tiering, compaction, statistics
- **GNN Training Worker**: Periodic model updates
- **Integrity Worker**: Mincut sampling and state updates

#### 4. RuVector Engine
- **Index Manager**: HNSW/IVFFlat in-memory structures
- **Graph Store**: Property graph with Cypher support
- **GNN Pipeline**: Training data capture, model inference
- **Tier Manager**: Hot/warm/cool/cold classification
- **Integrity Controller**: Mincut-based operation gating

---

## Feature Matrix

### Phase 1: pgvector Compatibility (Foundation)

| Feature | Status | Description |
|---------|--------|-------------|
| `vector(n)` type | Core | Dense vector storage |
| `<->` operator | Core | L2 (Euclidean) distance |
| `<=>` operator | Core | Cosine distance |
| `<#>` operator | Core | Negative inner product |
| HNSW index | Core | `CREATE INDEX ... USING hnsw` |
| IVFFlat index | Core | `CREATE INDEX ... USING ivfflat` |
| `vector_l2_ops` | Core | Operator class for L2 |
| `vector_cosine_ops` | Core | Operator class for cosine |
| `vector_ip_ops` | Core | Operator class for inner product |

### Phase 2: Tiered Storage & Compression

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_set_tiers()` | v2 | Configure tier thresholds |
| `ruvector_compact()` | v2 | Trigger manual compaction |
| Access frequency tracking | v2 | Background counter updates |
| Automatic tier promotion/demotion | v2 | Policy-based migration |
| SQ8/PQ compression | v2 | Transparent quantization |
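Tier promotion and demotion is policy-based, driven by the access-frequency counters the background workers maintain. A minimal sketch of how such a classifier could map counters to tiers; the threshold values and the `TierPolicy` name are illustrative, not the shipped defaults:

```python
from dataclasses import dataclass

@dataclass
class TierPolicy:
    # Minimum accesses per hour to sit in each tier (illustrative values).
    hot: float = 100.0
    warm: float = 10.0
    cool: float = 1.0

def classify_tier(accesses_per_hour: float, policy: TierPolicy = TierPolicy()) -> str:
    """Map an access-frequency counter to a storage tier."""
    if accesses_per_hour >= policy.hot:
        return "hot"
    if accesses_per_hour >= policy.warm:
        return "warm"
    if accesses_per_hour >= policy.cool:
        return "cool"
    return "cold"

print(classify_tier(250.0))  # -> hot
print(classify_tier(0.2))    # -> cold
```

In the real system the thresholds come from the tier policy table configured via `ruvector_set_tiers()`, and migration between tiers happens asynchronously in the maintenance worker.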
### Phase 3: Graph Engine & Cypher

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_cypher()` | v2 | Execute Cypher queries |
| `ruvector_nodes` view | v2 | Graph nodes as relations |
| `ruvector_edges` view | v2 | Graph edges as relations |
| `ruvector_hyperedges` view | v2 | Hyperedge support |
| SQL-graph joins | v2 | Mix Cypher with SQL |

### Phase 4: Integrity Control Plane

| Feature | Status | Description |
|---------|--------|-------------|
| `ruvector_integrity_sample()` | v2 | Sample contracted graph |
| `ruvector_integrity_policy_set()` | v2 | Configure policies |
| `ruvector_integrity_gate()` | v2 | Check operation permission |
| Integrity states | v2 | normal/stress/critical |
| Signed audit events | v2 | Cryptographic audit trail |

---

## Data Flow Patterns

### Vector Search (Read Path)

```
1. Client: SELECT ... ORDER BY embedding <-> $query LIMIT k

2. PostgreSQL Planner:
   - Recognizes index on embedding column
   - Generates Index Scan plan using ruhnsw

3. Index AM (amgettuple):
   - Submits search request to shared memory queue
   - Engine worker receives request

4. RuVector Engine:
   - Checks integrity gate (normal state: proceed)
   - Executes HNSW greedy search
   - Applies post-filter if needed
   - Returns top-k with distances

5. Index AM:
   - Fetches results from shared memory
   - Returns TIDs to executor

6. PostgreSQL Executor:
   - Fetches heap tuples
   - Applies remaining WHERE clauses
   - Returns to client
```

### Vector Insert (Write Path)

```
1. Client: INSERT INTO items (embedding) VALUES ($vec)

2. PostgreSQL Executor:
   - Assigns TID, writes heap tuple
   - Generates WAL record

3. Index AM (aminsert):
   - Checks integrity gate (normal: proceed, stress: throttle)
   - Submits insert to engine queue

4. RuVector Engine:
   - Integrates vector into HNSW graph
   - Updates tier counters
   - Writes to hot tier

5. WAL Writer:
   - Persists operation for durability

6. Replication (if configured):
   - Streams WAL to replicas
   - Replicas apply via engine
```

### Integrity Gating

```
1. Background Worker (periodic):
   - Samples contracted operational graph
   - Computes lambda_cut (minimum cut value) on contracted graph
   - Optionally computes lambda2 (algebraic connectivity) as drift signal
   - Updates integrity state in shared memory

2. Any Operation:
   - Reads current integrity state
   - normal (lambda > T_high): allow all
   - stress (T_low < lambda < T_high): throttle bulk ops
   - critical (lambda < T_low): freeze mutations

3. On State Change:
   - Logs signed integrity event
   - Notifies waiting operations
   - Adjusts background worker priorities
```
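In practice the state transitions want hysteresis, so the controller does not flap between states while lambda hovers near a threshold (see [04-integrity-events.md](04-integrity-events.md) for the actual policy). A sketch of the idea; threshold values, the margin, and the operation classes here are illustrative:

```python
def next_state(current: str, lam: float, t_low: float = 2.0,
               t_high: float = 5.0, margin: float = 0.5) -> str:
    """Integrity state transition with hysteresis: escalate immediately,
    de-escalate only once lambda clears the threshold by `margin`."""
    if lam < t_low:
        return "critical"
    if lam < t_high:
        # Leaving critical requires clearing t_low by the margin.
        if current == "critical" and lam < t_low + margin:
            return "critical"
        return "stress"
    # Leaving stress/critical requires clearing t_high by the margin.
    if current in ("stress", "critical") and lam < t_high + margin:
        return "stress"
    return "normal"

def gate(state: str, op: str) -> bool:
    """Operation gating by state: critical freezes mutations,
    stress throttles bulk operations."""
    if state == "critical":
        return op == "read"
    if state == "stress":
        return op in ("read", "insert")  # bulk ops deferred
    return True

print(next_state("normal", 1.5))    # -> critical
print(next_state("critical", 2.2))  # -> critical (inside hysteresis band)
print(next_state("critical", 5.2))  # -> stress
print(gate("critical", "insert"))   # -> False
```

Escalation is immediate because a weakening graph should bite fast; de-escalation is sticky so a single good sample cannot unfreeze the system prematurely.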
## Deployment Modes

### Mode 1: Single Postgres Embedded

```
+--------------------------------------------+
|            PostgreSQL Instance             |
|  +--------------------------------------+  |
|  |          RuVector Extension          |  |
|  |  +--------+ +---------+ +-------+    |  |
|  |  | Engine | | Workers | | Index |    |  |
|  |  +--------+ +---------+ +-------+    |  |
|  +--------------------------------------+  |
|                                            |
|  +--------------------------------------+  |
|  |            Data Directory            |  |
|  |   vectors/  graphs/  indexes/  wal/  |  |
|  +--------------------------------------+  |
+--------------------------------------------+
```

**Use case**: Development, small-medium deployments (< 100M vectors)

### Mode 2: Postgres + RuVector Cluster

```
+------------------+      +------------------+
|   PostgreSQL 1   |      |   PostgreSQL 2   |
|    (Primary)     |      |    (Replica)     |
+--------+---------+      +--------+---------+
         |                         |
         | WAL Stream              | WAL Apply
         |                         |
+--------v-------------------------v---------+
|              RuVector Cluster              |
|  +-------+ +-------+ +-------+ +------+    |
|  | Node1 | | Node2 | | Node3 | | ...  |    |
|  +-------+ +-------+ +-------+ +------+    |
|                                            |
|  Distributed HNSW | Sharded Graph | GNN    |
+--------------------------------------------+
```

**Use case**: Production, large deployments (100M+ vectors)

### v2 Cluster Mode Clarification

```
+------------------------------------------------------------------+
|                   CLUSTER DEPLOYMENT DECISION                    |
+------------------------------------------------------------------+

v2 cluster mode is a SEPARATE SERVICE with a stable RPC API.
The Postgres extension acts as a CLIENT to the cluster.

ARCHITECTURE OPTIONS:

Option A: SIDECAR (per Postgres instance)
  • RuVector cluster node co-located with each Postgres
  • Pros: Low latency, simple networking
  • Cons: Resource contention, harder to scale independently
  • Use when: Latency-sensitive, moderate scale

Option B: SHARED SERVICE (separate cluster)
  • Dedicated RuVector cluster serving multiple Postgres instances
  • Pros: Independent scaling, resource isolation
  • Cons: Network latency, requires service discovery
  • Use when: Large scale, multi-tenant

PROTOCOL:
  • gRPC with protobuf serialization
  • mTLS for authentication
  • Connection pooling in extension

PARTITION ASSIGNMENT:
  • Consistent hashing for shard routing
  • Automatic rebalancing on node join/leave
  • Partition map cached in extension shared memory

PARTITION MAP VERSIONING AND FENCING:
  • partition_map_version: monotonic counter incremented on any change
  • lease_epoch: obtained from cluster leader, prevents split-brain
  • Extension rejects stale map updates unless epoch matches current
  • On leader failover:
    1. New leader increments epoch
    2. Extensions must re-fetch map with new epoch
    3. Stale-epoch operations return ESTALE, client retries

v2 RECOMMENDATION:
  Start with Mode 1 (embedded). Add cluster mode only when:
  • Dataset exceeds single-node memory
  • Need independent scaling of compute/storage
  • Multi-region deployment required

+------------------------------------------------------------------+
```
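The versioning and fencing rules above reduce to a small client-side check plus consistent-hash routing. The field names mirror the description (`partition_map_version`, `lease_epoch`), but the classes and method names here are illustrative, not the extension's actual implementation:

```python
from dataclasses import dataclass
import bisect
import hashlib

@dataclass
class PartitionMap:
    version: int   # partition_map_version: monotonic counter
    epoch: int     # lease_epoch from the cluster leader
    ring: list     # sorted (hash, node) pairs for consistent hashing

class StaleEpoch(Exception):
    """Corresponds to the ESTALE result: the client must re-fetch the map."""

class MapClient:
    def __init__(self, initial: PartitionMap):
        self.current = initial

    def apply_update(self, update: PartitionMap) -> None:
        # Reject maps from a deposed leader; within an epoch the
        # version must move forward.
        if update.epoch < self.current.epoch:
            raise StaleEpoch("update from a deposed leader")
        if update.epoch == self.current.epoch and update.version <= self.current.version:
            return  # duplicate or older map; ignore
        self.current = update

    def route(self, key: bytes) -> str:
        """Consistent hashing: first ring entry at or after the key hash."""
        h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        hashes = [p for p, _ in self.current.ring]
        i = bisect.bisect_left(hashes, h) % len(self.current.ring)
        return self.current.ring[i][1]

ring = sorted((int.from_bytes(hashlib.sha256(n.encode()).digest()[:8], "big"), n)
              for n in ["node1", "node2", "node3"])
client = MapClient(PartitionMap(version=1, epoch=1, ring=ring))
client.apply_update(PartitionMap(version=2, epoch=2, ring=ring))  # new leader: accepted
try:
    client.apply_update(PartitionMap(version=9, epoch=1, ring=ring))
except StaleEpoch:
    print("rejected stale epoch")  # -> rejected stale epoch
```

Because the epoch check happens before the version check, a deposed leader can never push a "newer" map: its epoch has already been superseded.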
---

## Consistency Contract

### Heap-Engine Relationship

```
+------------------------------------------------------------------+
|                       CONSISTENCY CONTRACT                       |
+------------------------------------------------------------------+
|                                                                  |
|  PostgreSQL Heap is AUTHORITATIVE for:                           |
|    • Row existence and visibility (MVCC xmin/xmax)               |
|    • Transaction commit status                                   |
|    • Data integrity constraints                                  |
|                                                                  |
|  RuVector Engine Index is EVENTUALLY CONSISTENT:                 |
|    • Bounded lag window (configurable, default 100ms)            |
|    • Never returns invisible tuples (heap recheck)               |
|    • Never resurrects deleted vectors                            |
|                                                                  |
|  v2 HYBRID MODEL:                                                |
|    • SYNCHRONOUS: Hot tier mutations, primary HNSW inserts       |
|    • ASYNCHRONOUS: Compaction, tier moves, graph maintenance     |
|                                                                  |
+------------------------------------------------------------------+
```

See [10-consistency-replication.md](10-consistency-replication.md) for the full specification.
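The "never returns invisible tuples" guarantee falls out of rechecking every index candidate against the authoritative heap. A minimal sketch of that recheck loop, with hypothetical names (`heap_visible` stands in for the MVCC xmin/xmax visibility check):

```python
def search_with_heap_recheck(index_candidates, heap_visible, k):
    """The index is eventually consistent, so every candidate TID is
    rechecked against heap visibility before being returned. Deleted
    rows that still linger in the index are silently dropped."""
    results = []
    for tid, distance in index_candidates:  # already distance-ordered
        if heap_visible(tid):               # authoritative MVCC check
            results.append((tid, distance))
            if len(results) == k:
                break
    return results

# The index lags the heap: tid 7 was deleted but still appears in the index.
candidates = [(3, 0.10), (7, 0.12), (9, 0.15), (4, 0.21)]
live = {3, 9, 4}
print(search_with_heap_recheck(candidates, lambda tid: tid in live, k=3))
# -> [(3, 0.1), (9, 0.15), (4, 0.21)]
```

Note the asymmetry: the index may over-report (stale TIDs, filtered by recheck) but must never under-report within the bounded lag window, which is why hot-tier mutations are synchronous.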
---

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Query latency (p50) | < 5ms | 1M vectors, top-10 |
| Query latency (p99) | < 20ms | 1M vectors, top-10 |
| Insert throughput | > 10K/sec | Bulk mode |
| Index build | < 30min | 10M 768-dim vectors |
| Recall@10 | > 95% | HNSW default params |
| Compression ratio | 4-32x | Tier-dependent |
| Memory overhead | < 2x | Compared to pgvector |

### Benchmark Specification

Performance targets must be validated against a defined benchmark suite:

```
+------------------------------------------------------------------+
|                     BENCHMARK SPECIFICATION                      |
+------------------------------------------------------------------+

VECTOR CONFIGURATIONS:
  • Dimensions: 768 (typical text embeddings), 1536 (large embedding models)
  • Row counts: 1M, 10M, 100M
  • Data type: float32

QUERY PATTERNS:
  • Pure vector search (no filter)
  • Vector + metadata filter (10% selectivity)
  • Vector + metadata filter (1% selectivity)
  • Batch query (100 queries)

HARDWARE BASELINE:
  • CPU: 8 cores (AMD EPYC or Intel Xeon)
  • RAM: 64GB
  • Storage: NVMe SSD (3GB/s read)
  • Single node, no replication

CONCURRENCY:
  • Single thread baseline
  • 8 concurrent queries (parallel)
  • 32 concurrent queries (stress)

RECALL MEASUREMENT:
  • Brute-force baseline on 10K sampled queries
  • Report recall@1, recall@10, recall@100
  • Calculate 95th percentile recall

INDEX CONFIGURATIONS:
  • HNSW: M=16, ef_construction=200, ef_search=100
  • IVFFlat: nlist=sqrt(N), nprobe=10

TIER-SPECIFIC TARGETS:
  • Hot tier: exact float32, recall > 98%
  • Warm tier: exact or float16, recall > 96%
  • Cool tier: approximate + rerank, recall > 94%
  • Cold tier: approximate only, recall > 90%

+------------------------------------------------------------------+
```
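Recall is measured against a brute-force baseline on sampled queries, as specified above. A self-contained sketch of recall@k over such a sample:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact (brute-force) top-k also present in the
    approximate top-k for a single query."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a set of sampled queries."""
    vals = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(vals) / len(vals)

# Two sampled queries; each approximate result misses one exact neighbor.
approx = [[1, 2, 3, 5], [9, 8, 7, 6]]
exact  = [[1, 2, 3, 4], [9, 8, 6, 5]]
print(mean_recall(approx, exact, k=4))  # -> 0.75
```

The 95th percentile figure in the spec is then just the 95th percentile of the per-query `recall_at_k` values rather than their mean.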
---

## Security Considerations

### Integrity Event Signing

All integrity state changes are cryptographically signed:

```rust
struct IntegrityEvent {
    timestamp: DateTime<Utc>,
    event_type: IntegrityEventType,
    previous_state: IntegrityState,
    new_state: IntegrityState,
    lambda_cut: f64,
    witness_edges: Vec<EdgeId>,
    signature: Ed25519Signature,
}
```
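A sketch of producing and verifying such a signed event. To stay dependency-free this uses an HMAC as a stand-in for the Ed25519 signature in the struct above; the actual extension signs with Ed25519 (via `ed25519-dalek`), and the key handling here is purely illustrative:

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # stand-in: the engine holds an Ed25519 private key

def sign_event(event: dict, key: bytes = SECRET) -> dict:
    """Serialize the event canonically and attach a signature."""
    payload = json.dumps(event, sort_keys=True).encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**event, "signature": sig}

def verify_event(signed: dict, key: bytes = SECRET) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expect = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, signed["signature"])

ev = sign_event({
    "event_type": "state_change",
    "previous_state": "normal",
    "new_state": "stress",
    "lambda_cut": 3.2,
    "witness_edges": [17, 42],
})
print(verify_event(ev))   # -> True
ev["new_state"] = "normal"  # tampering breaks verification
print(verify_event(ev))   # -> False
```

Canonical serialization (sorted keys) matters: signer and verifier must agree byte-for-byte on the payload, or valid events fail verification.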
### Access Control

- Leverages PostgreSQL GRANT/REVOKE
- Separate roles for:
  - `ruvector_admin`: Full access
  - `ruvector_operator`: Maintenance operations
  - `ruvector_user`: Query and insert only

### Audit Trail

- All administrative operations logged
- Integrity events stored in `ruvector_integrity_events`
- Optional export to external SIEM

---

## Implementation Roadmap

### Phase 1: Foundation (Weeks 1-4)
- [ ] Extension skeleton with pgrx
- [ ] Collection metadata tables
- [ ] Basic HNSW integration
- [ ] pgvector compatibility tests
- [ ] Recall/performance benchmarks

### Phase 2: Tiered Storage (Weeks 5-8)
- [ ] Access counter infrastructure
- [ ] Tier policy table
- [ ] Background compactor
- [ ] Compression integration
- [ ] Tier report functions

### Phase 3: Graph & Cypher (Weeks 9-12)
- [ ] Graph storage schema
- [ ] Cypher parser integration
- [ ] Relational bridge views
- [ ] SQL-graph join helpers
- [ ] Graph maintenance

### Phase 4: Integrity Control (Weeks 13-16)
- [ ] Contracted graph construction
- [ ] Lambda cut computation
- [ ] Policy application layer
- [ ] Signed audit events
- [ ] Control plane testing

---

## Dependencies

### Rust Crates

| Crate | Purpose |
|-------|---------|
| `pgrx` | PostgreSQL extension framework |
| `parking_lot` | Fast synchronization primitives |
| `crossbeam` | Lock-free data structures |
| `serde` | Serialization |
| `ed25519-dalek` | Signature verification |

### PostgreSQL Features

| Feature | Minimum Version |
|---------|-----------------|
| Background workers | 9.4+ |
| Custom access methods | 9.6+ |
| Parallel query | 9.6+ |
| Logical replication | 10+ |
| Partitioning | 10+ (native) |

---

## Related Documents

| Document | Description |
|----------|-------------|
| [01-sql-schema.md](01-sql-schema.md) | Complete SQL schema |
| [02-background-workers.md](02-background-workers.md) | Worker specifications with IPC contract |
| [03-index-access-methods.md](03-index-access-methods.md) | Index AM details |
| [04-integrity-events.md](04-integrity-events.md) | Event schema, policies, hysteresis, operation classes |
| [05-phase1-pgvector-compat.md](05-phase1-pgvector-compat.md) | Phase 1 specification with incremental AM path |
| [06-phase2-tiered-storage.md](06-phase2-tiered-storage.md) | Phase 2 specification with tier exactness modes |
| [07-phase3-graph-cypher.md](07-phase3-graph-cypher.md) | Phase 3 specification with SQL join keys |
| [08-phase4-integrity-control.md](08-phase4-integrity-control.md) | Phase 4 specification (mincut + λ₂) |
| [09-migration-guide.md](09-migration-guide.md) | pgvector migration |
| [10-consistency-replication.md](10-consistency-replication.md) | Consistency contract, MVCC, WAL, recovery |
Other files added in this commit (diffs suppressed because they are too large):

| File | Lines |
|------|-------|
| docs/postgres/v2/01-sql-schema.md | 1293 |
| docs/postgres/v2/02-background-workers.md | 1405 |
| docs/postgres/v2/03-index-access-methods.md | 1141 |
| docs/postgres/v2/04-integrity-events.md | 1544 |
| docs/postgres/v2/05-phase1-pgvector-compat.md | 1237 |
| docs/postgres/v2/06-phase2-tiered-storage.md | 1490 |
| docs/postgres/v2/07-phase3-graph-cypher.md | 1522 |
| docs/postgres/v2/08-phase4-integrity-control.md | 1511 |

docs/postgres/v2/09-migration-guide.md (new file, 656 lines)
# RuVector Postgres v2 - Migration Guide

## Overview

This guide provides step-by-step instructions for migrating from pgvector to RuVector Postgres v2. The migration is designed to be **non-disruptive**, with zero data loss and minimal downtime.

---

## Migration Approaches

### Approach 1: In-Place Extension Swap (Recommended)

Swap the extension while keeping data in place. Fastest option, with zero data copy.

- **Downtime**: < 5 minutes
- **Risk**: Low

### Approach 2: Parallel Run with Gradual Cutover

Run both extensions simultaneously, gradually shifting traffic.

- **Downtime**: Zero
- **Risk**: Very Low

### Approach 3: Full Data Migration

Export and re-import all data. Use this when the schema changes significantly.

- **Downtime**: Proportional to data size
- **Risk**: Medium

---
## Pre-Migration Checklist

### 1. Verify Compatibility

```sql
-- Check pgvector version
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- Check PostgreSQL version (RuVector requires 14+)
SELECT version();

-- List tables with vector columns, their sizes, and approximate row counts
SELECT
    c.relname AS table_name,
    pg_size_pretty(pg_relation_size(c.oid)) AS size,
    c.reltuples::bigint AS approx_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND EXISTS (
      SELECT 1 FROM pg_attribute a
      JOIN pg_type t ON a.atttypid = t.oid
      WHERE a.attrelid = c.oid AND t.typname = 'vector'
  );

-- List vector indexes
SELECT
    i.relname AS index_name,
    t.relname AS table_name,
    am.amname AS index_type,
    pg_size_pretty(pg_relation_size(i.oid)) AS size
FROM pg_index ix
JOIN pg_class i ON ix.indexrelid = i.oid
JOIN pg_class t ON ix.indrelid = t.oid
JOIN pg_am am ON i.relam = am.oid
WHERE am.amname IN ('hnsw', 'ivfflat');
```
### 2. Backup

```bash
# Full database backup
pg_dump -Fc -f backup_before_migration.dump mydb

# Or only the tables with vector data (pattern-matched by table name)
pg_dump -Fc --table='*embedding*' -f vector_tables.dump mydb
```

### 3. Test Environment

```bash
# Restore to a test environment
createdb mydb_test
pg_restore -d mydb_test backup_before_migration.dump

# Install the RuVector extension for testing
psql mydb_test -c "CREATE EXTENSION ruvector"
```

---
## Approach 1: In-Place Extension Swap

### Step 1: Install RuVector Extension

```bash
# Option A: Build and install from source
cd ruvector-postgres
cargo pgrx install --release

# Option B: From a package (when available)
apt install postgresql-16-ruvector
```

### Step 2: Stop Application Writes

```sql
-- Optional: put tables in read-only mode
BEGIN;
LOCK TABLE items IN EXCLUSIVE MODE;
-- Keep the transaction open to block writes
```
### Step 3: Drop pgvector Indexes

```sql
-- Save index definitions so they can be recreated later
SELECT indexdef
FROM pg_indexes
WHERE indexname IN (
    SELECT i.relname
    FROM pg_index ix
    JOIN pg_class i ON ix.indexrelid = i.oid
    JOIN pg_am am ON i.relam = am.oid
    WHERE am.amname IN ('hnsw', 'ivfflat')
);

-- Drop the indexes (only after saving the DDL above)
DO $$
DECLARE
    idx RECORD;
BEGIN
    FOR idx IN
        SELECT i.relname AS index_name
        FROM pg_index ix
        JOIN pg_class i ON ix.indexrelid = i.oid
        JOIN pg_am am ON i.relam = am.oid
        WHERE am.amname IN ('hnsw', 'ivfflat')
    LOOP
        EXECUTE format('DROP INDEX IF EXISTS %I', idx.index_name);
    END LOOP;
END $$;
```

### Step 4: Swap Extensions

```sql
-- Drop pgvector. CASCADE also drops objects that depend on pgvector's
-- types; verify this step in a test environment before running in production.
DROP EXTENSION vector CASCADE;

-- Create RuVector
CREATE EXTENSION ruvector;
```
### Step 5: Recreate Indexes

```sql
-- Recreate the HNSW index (same syntax as pgvector)
CREATE INDEX idx_items_embedding ON items
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
```

RuVector-specific index options are covered in [03-index-access-methods.md](03-index-access-methods.md).

### Step 6: Verify

```sql
-- Check the extension
SELECT * FROM pg_extension WHERE extname = 'ruvector';

-- Test query
EXPLAIN ANALYZE
SELECT id, embedding <-> '[0.1, 0.2, ...]' AS distance
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 10;

-- Compare recall (optional):
-- run the same query with and without the index
SET enable_indexscan = off;
-- Query without index (exact)
SET enable_indexscan = on;
-- Query with index (approximate)
```

### Step 7: Resume Application

```sql
-- Release the lock taken in Step 2
ROLLBACK;  -- if you started a transaction for locking
```
---

## Approach 2: Parallel Run

### Step 1: Install RuVector (Different Schema)

```sql
-- Create a schema for RuVector
CREATE SCHEMA ruvector_new;

-- Install RuVector in the new schema
CREATE EXTENSION ruvector WITH SCHEMA ruvector_new;
```

### Step 2: Create Shadow Tables

```sql
-- Create a shadow table with the same structure
CREATE TABLE ruvector_new.items AS
SELECT * FROM items WHERE false;

-- Convert the vector column to the RuVector type
ALTER TABLE ruvector_new.items
ALTER COLUMN embedding TYPE ruvector_new.vector(768);

-- Copy data
INSERT INTO ruvector_new.items
SELECT * FROM items;

-- Create index
CREATE INDEX ON ruvector_new.items
USING hnsw (embedding ruvector_new.vector_l2_ops)
WITH (m = 16, ef_construction = 64);
```
|
||||
|
||||
### Step 3: Set Up Triggers for Sync

```sql
-- Sync inserts
CREATE OR REPLACE FUNCTION sync_to_ruvector()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO ruvector_new.items VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_insert
AFTER INSERT ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector();

-- Sync updates (delete-then-insert keeps the shadow row current;
-- assumes items has an id primary key)
CREATE OR REPLACE FUNCTION sync_to_ruvector_update()
RETURNS TRIGGER AS $$
BEGIN
    DELETE FROM ruvector_new.items WHERE id = OLD.id;
    INSERT INTO ruvector_new.items VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_update
AFTER UPDATE ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector_update();

-- Sync deletes
CREATE OR REPLACE FUNCTION sync_to_ruvector_delete()
RETURNS TRIGGER AS $$
BEGIN
    DELETE FROM ruvector_new.items WHERE id = OLD.id;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_delete
AFTER DELETE ON items
FOR EACH ROW EXECUTE FUNCTION sync_to_ruvector_delete();
```

### Step 4: Gradual Cutover

```python
# Application code with gradual cutover
import random

def search_embeddings(query_vector, use_ruvector_pct=0):
    """
    Gradually shift traffic to RuVector.
    Start with 0%, increase to 100% over time.
    """
    if random.random() * 100 < use_ruvector_pct:
        # Use RuVector
        return db.execute("""
            SELECT id, embedding <-> %s AS distance
            FROM ruvector_new.items
            ORDER BY embedding <-> %s
            LIMIT 10
        """, [query_vector, query_vector])
    else:
        # Use pgvector
        return db.execute("""
            SELECT id, embedding <-> %s AS distance
            FROM items
            ORDER BY embedding <-> %s
            LIMIT 10
        """, [query_vector, query_vector])
```
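
Something has to drive `use_ruvector_pct` upward over the rollout. A time-based linear ramp is one option; the helper below is hypothetical glue code (the seven-day schedule is illustrative, not a RuVector API):

```python
from datetime import datetime, timedelta

def rollout_pct(started_at: datetime, now: datetime, ramp_days: float = 7.0) -> float:
    """Linear ramp from 0% to 100% over ramp_days, then hold at 100%."""
    elapsed_days = (now - started_at).total_seconds() / 86400.0
    return max(0.0, min(100.0, 100.0 * elapsed_days / ramp_days))

start = datetime(2025, 1, 1)
assert rollout_pct(start, start) == 0.0
assert rollout_pct(start, start + timedelta(days=3, hours=12)) == 50.0
assert rollout_pct(start, start + timedelta(days=30)) == 100.0
```

The result feeds straight into `search_embeddings(query_vector, use_ruvector_pct=rollout_pct(start, datetime.now()))`, and the ramp can be paused on any recall or latency regression.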

### Step 5: Complete Migration

Once 100% of traffic is on RuVector with no issues:

```sql
-- Clean up sync triggers and functions
DROP TRIGGER trg_sync_insert ON items;
DROP TRIGGER trg_sync_update ON items;
DROP TRIGGER trg_sync_delete ON items;
DROP FUNCTION sync_to_ruvector CASCADE;

-- Swap tables: the shadow table is already named items,
-- so moving it into public is enough
ALTER TABLE items RENAME TO items_pgvector_backup;
ALTER TABLE ruvector_new.items SET SCHEMA public;

-- Drop the backup table first, then pgvector
-- (dropping the extension with CASCADE while the backup still
-- exists would take its vector column with it)
DROP TABLE items_pgvector_backup;
DROP EXTENSION vector CASCADE;
```

---

## Approach 3: Full Data Migration

### Step 1: Export Data

```sql
-- Export to CSV
\copy (SELECT id, embedding::text, metadata FROM items) TO 'items_export.csv' CSV;

-- Or to binary format
\copy items TO 'items_export.bin' BINARY;
```

### Step 2: Switch Extensions

```sql
DROP EXTENSION vector CASCADE;
CREATE EXTENSION ruvector;
```

### Step 3: Recreate Tables

```sql
-- Recreate with RuVector type
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    embedding vector(768),
    metadata JSONB
);

-- Import data
\copy items FROM 'items_export.csv' CSV;

-- Advance the id sequence past the imported ids
SELECT setval(pg_get_serial_sequence('items', 'id'), (SELECT max(id) FROM items));

-- Create index
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
```

---

## Query Compatibility Reference

### Identical Syntax (No Changes Needed)

```sql
-- Vector type declaration
CREATE TABLE items (embedding vector(768));

-- Distance operators
SELECT * FROM items ORDER BY embedding <-> query LIMIT 10;  -- L2
SELECT * FROM items ORDER BY embedding <=> query LIMIT 10;  -- Cosine
SELECT * FROM items ORDER BY embedding <#> query LIMIT 10;  -- Inner product

-- Index creation
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Operator classes
--   vector_l2_ops
--   vector_cosine_ops
--   vector_ip_ops

-- Utility functions
SELECT vector_dims(embedding) FROM items LIMIT 1;
SELECT vector_norm(embedding) FROM items LIMIT 1;
```

### Extended Syntax (RuVector Only)

```sql
-- New distance operators
SELECT * FROM items ORDER BY embedding <+> query LIMIT 10;  -- L1/Manhattan

-- Collection registration
SELECT ruvector_register_collection(
    'my_embeddings',
    'public',
    'items',
    'embedding',
    768,
    'l2'
);

-- Advanced search options
SELECT * FROM ruvector_search(
    'my_embeddings',
    query_vector,
    10,     -- k
    100,    -- ef_search
    FALSE,  -- use_gnn
    '{"category": "electronics"}'  -- filter
);

-- Tiered storage
SELECT ruvector_set_tiers('my_embeddings', 24, 168, 720);
SELECT ruvector_tier_report('my_embeddings');

-- Graph integration
SELECT ruvector_graph_create('knowledge_graph');
SELECT ruvector_cypher('knowledge_graph', 'MATCH (n) RETURN n LIMIT 10');

-- Integrity monitoring
SELECT ruvector_integrity_status('my_embeddings');
```

---

## GUC Parameter Mapping

| pgvector | RuVector | Notes |
|----------|----------|-------|
| `ivfflat.probes` | `ruvector.probes` | Same behavior |
| `hnsw.ef_search` | `ruvector.ef_search` | Same behavior |
| N/A | `ruvector.use_simd` | Enable/disable SIMD |
| N/A | `ruvector.max_index_memory` | Memory limit |

```sql
-- Set runtime parameters (same syntax)
SET ruvector.ef_search = 100;
SET ruvector.probes = 10;
```
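
The renames in the table above are mechanical, so scripted `SET` statements can be rewritten with a small throwaway helper (hypothetical glue code, not part of either extension):

```python
# pgvector GUC name -> RuVector equivalent (from the mapping table above)
GUC_MAP = {
    "ivfflat.probes": "ruvector.probes",
    "hnsw.ef_search": "ruvector.ef_search",
}

def translate_set_statement(stmt: str) -> str:
    """Rewrite a pgvector 'SET <guc> = <value>;' statement for RuVector."""
    for old, new in GUC_MAP.items():
        stmt = stmt.replace(old, new)
    return stmt

assert translate_set_statement("SET hnsw.ef_search = 100;") == "SET ruvector.ef_search = 100;"
assert translate_set_statement("SET ivfflat.probes = 10;") == "SET ruvector.probes = 10;"
```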

---

## Common Migration Issues

### Issue 1: Type Mismatch After Migration

```sql
-- Error: operator does not exist: ruvector.vector <-> public.vector
-- Solution: ensure all tables use the new type

SELECT
    c.relname AS table_name,
    a.attname AS column_name,
    t.typname AS type_name,
    n.nspname AS type_schema
FROM pg_attribute a
JOIN pg_class c ON a.attrelid = c.oid
JOIN pg_type t ON a.atttypid = t.oid
JOIN pg_namespace n ON t.typnamespace = n.oid
WHERE t.typname = 'vector';

-- Fix by recreating the column
ALTER TABLE items ALTER COLUMN embedding TYPE ruvector.vector(768);
```

### Issue 2: Index Not Using RuVector AM

```sql
-- Check which AM is being used
SELECT
    i.relname AS index_name,
    am.amname AS access_method
FROM pg_index ix
JOIN pg_class i ON ix.indexrelid = i.oid
JOIN pg_am am ON i.relam = am.oid;

-- Rebuild index with correct AM
DROP INDEX old_index;
CREATE INDEX new_index ON items USING hnsw (embedding vector_l2_ops);
```

### Issue 3: Different Recall/Performance

```sql
-- RuVector may have different default parameters;
-- adjust ef_search for recall
SET ruvector.ef_search = 200;  -- Higher for better recall

-- Check actual ef being used
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM items ORDER BY embedding <-> query LIMIT 10;
```

### Issue 4: Extension Dependencies

```sql
-- Check what depends on the vector extension
SELECT
    dependent.relname AS dependent_object,
    dependent.relkind AS object_type
FROM pg_depend d
JOIN pg_extension e ON d.refobjid = e.oid
JOIN pg_class dependent ON d.objid = dependent.oid
WHERE e.extname = 'vector';

-- You may need to drop dependent objects first
```

---

## Rollback Procedure

If the migration fails, roll back to pgvector:

```bash
# Restore from backup
pg_restore -d mydb --clean backup_before_migration.dump

# Or manually:
```

```sql
-- Drop RuVector
DROP EXTENSION ruvector CASCADE;

-- Reinstall pgvector
CREATE EXTENSION vector;

-- Restore schema (from saved DDL)
-- Recreate indexes (from saved DDL)
```

---

## Performance Validation

### Compare Query Performance

```python
import time
import psycopg2
import numpy as np

def benchmark_extension(conn, query_vector, n_queries=100):
    """Benchmark query latency"""
    latencies = []

    for _ in range(n_queries):
        start = time.time()
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id, embedding <-> %s AS distance
                FROM items
                ORDER BY embedding <-> %s
                LIMIT 10
            """, [query_vector, query_vector])
            cur.fetchall()
        latencies.append((time.time() - start) * 1000)

    return {
        'p50': np.percentile(latencies, 50),
        'p95': np.percentile(latencies, 95),
        'p99': np.percentile(latencies, 99),
        'mean': np.mean(latencies),
    }

# Run before migration (pgvector)
pgvector_results = benchmark_extension(conn, query_vec)

# Run after migration (RuVector)
ruvector_results = benchmark_extension(conn, query_vec)

print(f"pgvector p50: {pgvector_results['p50']:.2f}ms")
print(f"RuVector p50: {ruvector_results['p50']:.2f}ms")
```

### Compare Recall

```python
def measure_recall(conn, query_vectors, k=10):
    """Measure recall@k against brute force"""
    recalls = []

    for query in query_vectors:
        # Index scan result
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id FROM items
                ORDER BY embedding <-> %s
                LIMIT %s
            """, [query, k])
            index_results = set(row[0] for row in cur.fetchall())

        # Brute force (disable index)
        with conn.cursor() as cur:
            cur.execute("SET enable_indexscan = off")
            cur.execute("""
                SELECT id FROM items
                ORDER BY embedding <-> %s
                LIMIT %s
            """, [query, k])
            exact_results = set(row[0] for row in cur.fetchall())
            cur.execute("SET enable_indexscan = on")

        recall = len(index_results & exact_results) / k
        recalls.append(recall)

    return np.mean(recalls)
```
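
At its core, `measure_recall` computes a set overlap per query; that piece can be sanity-checked without a database:

```python
def recall_at_k(index_ids, exact_ids, k):
    """Fraction of the exact top-k that the index scan also returned."""
    return len(set(index_ids) & set(exact_ids)) / k

# The index found 8 of the 10 true nearest neighbours
assert recall_at_k(range(1, 11), range(3, 13), k=10) == 0.8
# Perfect agreement (order does not matter)
assert recall_at_k([1, 2, 3], [3, 2, 1], k=3) == 1.0
```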

---

## Post-Migration Steps

### 1. Register Collections (Optional but Recommended)

```sql
-- Register for RuVector-specific features
SELECT ruvector_register_collection(
    'items_embeddings',
    'public',
    'items',
    'embedding',
    768,
    'l2'
);
```

### 2. Enable Tiered Storage (Optional)

```sql
-- Configure tiers
SELECT ruvector_set_tiers('items_embeddings', 24, 168, 720);
```

### 3. Set Up Integrity Monitoring (Optional)

```sql
-- Enable integrity monitoring
SELECT ruvector_integrity_policy_set('items_embeddings', 'default', '{
    "threshold_high": 0.8,
    "threshold_low": 0.3
}'::jsonb);
```

### 4. Update Application Code

```python
# Minimal changes needed for basic operations

# No change needed:
cursor.execute("SELECT * FROM items ORDER BY embedding <-> %s LIMIT 10", [vec])

# Optional: use new features
cursor.execute("SELECT * FROM ruvector_search('items_embeddings', %s, 10)", [vec])
```

---

## Support

- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Documentation: https://ruvector.dev/docs
- Migration Support: migration@ruvector.dev

826
docs/postgres/v2/10-consistency-replication.md
Normal file
@@ -0,0 +1,826 @@

# RuVector Postgres v2 - Consistency and Replication Model

## Overview

This document specifies the consistency contract between PostgreSQL heap tuples and the RuVector engine: MVCC interaction, WAL and logical decoding strategy, crash recovery, replay order, and idempotency guarantees.

---

## Core Consistency Contract

### Authoritative Source of Truth

```
+------------------------------------------------------------------+
|                      CONSISTENCY HIERARCHY                       |
+------------------------------------------------------------------+
|                                                                  |
|  1. PostgreSQL Heap is AUTHORITATIVE for:                        |
|     - Row existence                                              |
|     - Visibility rules (MVCC xmin/xmax)                          |
|     - Transaction commit status                                  |
|     - Data integrity constraints                                 |
|                                                                  |
|  2. RuVector Engine Index is EVENTUALLY CONSISTENT:              |
|     - Bounded lag window (configurable, default 100ms)           |
|     - Reconciled on demand                                       |
|     - Never returns invisible tuples                             |
|     - Never resurrects deleted embeddings                        |
|                                                                  |
+------------------------------------------------------------------+
```

### Consistency Guarantees

| Property | Guarantee | Enforcement |
|----------|-----------|-------------|
| **No phantom reads** | Index never returns invisible tuples | Heap visibility check on every result |
| **No zombie vectors** | Deleted vectors never return | Delete markers + tombstone cleanup |
| **No stale updates** | Updated vectors show new values | Version-aware index entries |
| **Bounded staleness** | Max lag from commit to searchable | Configurable, default 100ms |
| **Crash consistency** | Recoverable to last WAL checkpoint | WAL-based recovery |

---

## Consistency Mechanisms

### Option A: Synchronous Index Maintenance

```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
|                                                                  |
|  1. BEGIN                                                        |
|  2. Write heap tuple                                             |
|  3. Call engine (synchronous)                                    |
|     └─ If engine rejects → ROLLBACK                              |
|  4. Append to WAL                                                |
|  5. COMMIT                                                       |
|                                                                  |
+------------------------------------------------------------------+

Pros:
- Strongest consistency
- Simple mental model
- No reconciliation needed

Cons:
- Higher latency per operation
- Engine failure blocks writes
- Reduces write throughput
```

### Option B: Asynchronous Maintenance with Reconciliation

```
INSERT/UPDATE Transaction:
+------------------------------------------------------------------+
|                                                                  |
|  1. BEGIN                                                        |
|  2. Write heap tuple                                             |
|  3. Write to change log table OR trigger logical decoding        |
|  4. Append to WAL                                                |
|  5. COMMIT                                                       |
|                                                                  |
|  Background (continuous):                                        |
|  6. Engine reads change log / logical replication stream         |
|  7. Applies changes to index                                     |
|  8. Index scan checks heap visibility for every result           |
|                                                                  |
+------------------------------------------------------------------+

Pros:
- Lower write latency
- Engine failure doesn't block writes
- Higher throughput

Cons:
- Bounded staleness window
- Requires visibility rechecks
- More complex recovery
```

### v2 Hybrid Model (Recommended)

```
+------------------------------------------------------------------+
|                   v2 HYBRID CONSISTENCY MODEL                    |
+------------------------------------------------------------------+
|                                                                  |
|  SYNCHRONOUS (Hot Tier):                                         |
|  - Primary HNSW index mutations                                  |
|  - Hot tier inserts/updates                                      |
|  - Visibility-critical operations                                |
|                                                                  |
|  ASYNCHRONOUS (Background):                                      |
|  - Compaction and tier moves                                     |
|  - Graph edge maintenance                                        |
|  - GNN training data capture                                     |
|  - Cold tier updates                                             |
|  - Index optimization/rewiring                                   |
|                                                                  |
+------------------------------------------------------------------+
```

---

## Implementation Details

### Visibility Check Protocol

```rust
/// Check heap visibility for index results
pub fn check_visibility(
    snapshot: &Snapshot,
    results: &[IndexResult],
) -> Vec<IndexResult> {
    results.iter()
        .filter(|r| {
            // Fetch heap tuple header
            let htup = heap_fetch_tuple_header(r.tid);

            // Check MVCC visibility
            htup.map_or(false, |h| {
                heap_tuple_satisfies_snapshot(h, snapshot)
            })
        })
        .cloned()
        .collect()
}

/// Index scan must always recheck heap
impl IndexScan {
    fn next(&mut self) -> Option<HeapTuple> {
        loop {
            // Get next candidate from index
            let candidate = self.index.next()?;

            // CRITICAL: Always verify against heap
            if let Some(tuple) = self.heap_fetch_visible(candidate.tid) {
                return Some(tuple);
            }
            // Invisible tuple, try next
        }
    }
}
```

### Incremental Candidate Paging API

The engine must support incremental candidate paging so the executor can skip MVCC-invisible rows and request more until k visible results are produced.

```rust
/// Search request with cursor support for incremental paging
#[derive(Debug)]
pub struct SearchRequest {
    pub collection_id: i32,
    pub query: Vec<f32>,
    pub want_k: usize,           // Desired visible results
    pub cursor: Option<Cursor>,  // Resume from previous batch
    pub max_candidates: usize,   // Max to return per batch (default: want_k * 2)
}

/// Search response with cursor for pagination
#[derive(Debug)]
pub struct SearchResponse {
    pub candidates: Vec<Candidate>,
    pub cursor: Option<Cursor>,  // None if exhausted
    pub total_scanned: usize,
}

/// Cursor token for resuming search
#[derive(Debug, Clone)]
pub struct Cursor {
    pub ef_search_position: usize,
    pub last_distance: f32,
    pub visited_count: usize,
}

/// Engine returns batches with cursor tokens
impl Engine {
    pub fn search_batch(&self, req: SearchRequest) -> SearchResponse {
        let start_pos = req.cursor.map(|c| c.ef_search_position).unwrap_or(0);

        // Continue HNSW search from cursor position
        let (candidates, next_pos, exhausted) = self.hnsw.search_continue(
            &req.query,
            req.max_candidates,
            start_pos,
        );

        // Build the cursor before moving `candidates` into the response
        let cursor = if exhausted {
            None
        } else {
            Some(Cursor {
                ef_search_position: next_pos,
                last_distance: candidates.last().map(|c| c.distance).unwrap_or(f32::MAX),
                visited_count: start_pos + candidates.len(),
            })
        };
        let total_scanned = start_pos + candidates.len();

        SearchResponse { candidates, cursor, total_scanned }
    }
}

/// Executor uses incremental paging
fn execute_vector_search(query: &[f32], k: usize, snapshot: &Snapshot) -> Vec<HeapTuple> {
    let mut results = Vec::with_capacity(k);
    let mut cursor = None;

    loop {
        // Request batch from engine
        let response = engine.search_batch(SearchRequest {
            collection_id,
            query: query.to_vec(),
            want_k: k - results.len(),
            cursor,
            max_candidates: (k - results.len()) * 2, // Over-fetch
        });

        // Check visibility and collect visible tuples
        for candidate in response.candidates {
            if let Some(tuple) = heap_fetch_visible(candidate.tid, snapshot) {
                results.push(tuple);
                if results.len() >= k {
                    return results;
                }
            }
        }

        // Check if exhausted
        match response.cursor {
            Some(c) => cursor = Some(c),
            None => break, // No more candidates
        }
    }

    results
}
```

### Change Log Table (Async Mode)

```sql
-- Change log for async reconciliation
CREATE TABLE ruvector._change_log (
    id BIGSERIAL PRIMARY KEY,
    collection_id INTEGER NOT NULL,
    operation CHAR(1) NOT NULL CHECK (operation IN ('I', 'U', 'D')),
    tuple_tid TID NOT NULL,
    vector_data BYTEA,  -- NULL for deletes
    source_xid BIGINT NOT NULL,  -- from txid_current(); "xmin" itself collides with a system column
    committed BOOLEAN DEFAULT FALSE,
    applied BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp()
);

CREATE INDEX idx_change_log_pending
ON ruvector._change_log(collection_id, id)
WHERE NOT applied;

-- Trigger to capture changes
CREATE FUNCTION ruvector._log_change() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'I', NEW.ctid, NEW.embedding, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'U', NEW.ctid, NEW.embedding, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO ruvector._change_log (collection_id, operation, tuple_tid, vector_data, source_xid)
        SELECT collection_id, 'D', OLD.ctid, NULL, txid_current()
        FROM ruvector.collections WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

### Logical Decoding (Alternative)

```rust
/// Logical decoding output plugin for RuVector
pub struct RuVectorOutputPlugin;

impl OutputPlugin for RuVectorOutputPlugin {
    fn begin_txn(&mut self, xid: TransactionId) {
        self.current_xid = Some(xid);
        self.changes.clear();
    }

    fn change(&mut self, relation: &Relation, change: &Change) {
        // Only process tables with vector columns
        if !self.is_vector_table(relation) {
            return;
        }

        match change {
            Change::Insert(new) => {
                self.changes.push(VectorChange::Insert {
                    tid: new.tid,
                    vector: extract_vector(new),
                });
            }
            Change::Update(old, new) => {
                self.changes.push(VectorChange::Update {
                    old_tid: old.tid,
                    new_tid: new.tid,
                    vector: extract_vector(new),
                });
            }
            Change::Delete(old) => {
                self.changes.push(VectorChange::Delete {
                    tid: old.tid,
                });
            }
        }
    }

    fn commit_txn(&mut self, xid: TransactionId, commit_lsn: XLogRecPtr) {
        // Apply all changes atomically
        self.engine.apply_changes(&self.changes, commit_lsn);
    }
}
```

---

## MVCC Interaction

### Transaction Visibility Rules

```rust
/// Snapshot-aware index search
pub fn search_with_snapshot(
    collection_id: i32,
    query: &[f32],
    k: usize,
    snapshot: &Snapshot,
) -> Vec<SearchResult> {
    // Get more candidates than k to account for invisible tuples
    let over_fetch_factor = 2.0;
    let candidates = engine.search(
        collection_id,
        query,
        (k as f32 * over_fetch_factor) as usize,
    );

    // Filter by visibility
    let visible: Vec<_> = candidates.into_iter()
        .filter(|c| is_visible(c.tid, snapshot))
        .take(k)
        .collect();

    // If we don't have enough, fetch more
    if visible.len() < k {
        // Recursive fetch with larger over_fetch
        return search_with_larger_pool(...);
    }

    visible
}

/// Check tuple visibility against snapshot
fn is_visible(tid: TupleId, snapshot: &Snapshot) -> bool {
    let htup = unsafe { heap_fetch_tuple(tid) };

    match htup {
        Some(tuple) => {
            // HeapTupleSatisfiesVisibility equivalent
            let xmin = tuple.t_xmin;
            let xmax = tuple.t_xmax;

            // Inserted by a committed transaction visible to this snapshot
            let xmin_visible = xmin < snapshot.xmax &&
                !snapshot.xip.contains(&xmin) &&
                pg_xact_status(xmin) == XACT_STATUS_COMMITTED;

            // Not deleted, or deleted by a transaction not visible to us
            let not_deleted = xmax == InvalidTransactionId ||
                snapshot.xmax <= xmax ||
                snapshot.xip.contains(&xmax) ||
                pg_xact_status(xmax) != XACT_STATUS_COMMITTED;

            xmin_visible && not_deleted
        }
        None => false, // Tuple vacuumed away
    }
}
```
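
The predicate above can be modelled outside Postgres in a few lines (a simplified sketch: transaction ids are plain integers, and a dictionary stands in for `pg_xact_status`):

```python
COMMITTED = "committed"

def tuple_is_visible(xmin, xmax, snap_xmax, in_progress, xact_status):
    """Simplified model of the snapshot check: the inserting transaction
    must be committed and visible; any deleting transaction must not be."""
    inserted_visible = (
        xmin < snap_xmax
        and xmin not in in_progress
        and xact_status.get(xmin) == COMMITTED
    )
    not_deleted = (
        xmax is None                 # never deleted
        or xmax >= snap_xmax         # deleted "after" the snapshot
        or xmax in in_progress       # deleter still running
        or xact_status.get(xmax) != COMMITTED
    )
    return inserted_visible and not_deleted

status = {100: COMMITTED, 105: COMMITTED}
# Inserted by committed xid 100, never deleted: visible
assert tuple_is_visible(100, None, 110, set(), status)
# Deleted by committed xid 105 before the snapshot: invisible
assert not tuple_is_visible(100, 105, 110, set(), status)
# Deleting transaction still in progress: still visible
assert tuple_is_visible(100, 105, 110, {105}, status)
```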

### HOT Update Handling

```rust
/// Handle Heap-Only Tuple updates
pub fn handle_hot_update(old_tid: TupleId, new_tid: TupleId, new_vector: &[f32]) {
    // HOT updates may change ctid without changing the embedding
    if vectors_equal(get_vector(old_tid), new_vector) {
        // Only ctid changed, update TID mapping
        engine.update_tid_mapping(old_tid, new_tid);
    } else {
        // Vector changed, full update needed
        engine.delete(old_tid);
        engine.insert(new_tid, new_vector);
    }
}
```

---

## WAL and Recovery

### WAL Record Types

```rust
/// Custom WAL record types for RuVector
#[repr(u8)]
pub enum RuVectorWalRecord {
    /// Vector inserted into index
    IndexInsert = 0x10,
    /// Vector deleted from index
    IndexDelete = 0x11,
    /// Index page split
    IndexSplit = 0x12,
    /// HNSW edge added
    HnswEdgeAdd = 0x20,
    /// HNSW edge removed
    HnswEdgeRemove = 0x21,
    /// Tier change
    TierChange = 0x30,
    /// Integrity state change
    IntegrityChange = 0x40,
}

impl RuVectorWalRecord {
    /// Write WAL record
    pub fn write(&self, data: &[u8]) -> XLogRecPtr {
        unsafe {
            let rdata = XLogRecData {
                data: data.as_ptr() as *mut c_char,
                len: data.len() as u32,
                next: std::ptr::null_mut(),
            };

            XLogInsert(RM_RUVECTOR_ID, self.to_u8(), &rdata)
        }
    }
}
```

### Crash Recovery

```rust
/// Redo function for crash recovery
pub extern "C" fn ruvector_redo(record: *mut XLogReaderState) {
    let info = unsafe { (*record).decoded_record.as_ref() };

    match RuVectorWalRecord::from_u8(info.xl_info) {
        Some(RuVectorWalRecord::IndexInsert) => {
            let insert_data: IndexInsertData = deserialize(info.data);
            engine.redo_insert(insert_data);
        }
        Some(RuVectorWalRecord::IndexDelete) => {
            let delete_data: IndexDeleteData = deserialize(info.data);
            engine.redo_delete(delete_data);
        }
        Some(RuVectorWalRecord::HnswEdgeAdd) => {
            let edge_data: HnswEdgeData = deserialize(info.data);
            engine.redo_edge_add(edge_data);
        }
        // ... other record types
        _ => {
            pgrx::warning!("Unknown RuVector WAL record type");
        }
    }
}

/// Startup recovery sequence
pub fn startup_recovery() {
    pgrx::log!("RuVector: Starting crash recovery");

    // 1. Load last consistent checkpoint
    let checkpoint = load_checkpoint();

    // 2. Rebuild in-memory structures
    engine.load_from_checkpoint(&checkpoint);

    // 3. Replay WAL from checkpoint
    let wal_reader = WalReader::from_lsn(checkpoint.redo_lsn);
    for record in wal_reader {
        ruvector_redo(&record);
    }

    // 4. Reconcile with heap if needed
    if checkpoint.needs_reconciliation {
        reconcile_with_heap();
    }

    pgrx::log!("RuVector: Recovery complete");
}
```

### Replay Order Guarantees

```
WAL Replay Order Contract:
+------------------------------------------------------------------+
|                                                                  |
|  1. WAL records replayed in LSN order (guaranteed by PostgreSQL) |
|                                                                  |
|  2. Within a transaction:                                        |
|     - Heap insert before index insert                            |
|     - Index delete before heap delete (for visibility)           |
|                                                                  |
|  3. Cross-transaction:                                           |
|     - Commit order preserved                                     |
|     - Visibility respects commit timestamps                      |
|                                                                  |
|  4. Recovery invariant:                                          |
|     - After recovery, index matches committed heap state         |
|     - No uncommitted changes in index                            |
|                                                                  |
+------------------------------------------------------------------+
```

---

## Idempotency and Ordering Rules

**CRITICAL**: If the WAL is the source of truth, these invariants are what keep replay from gradually corrupting the index.

### Explicit Replay Rules

```
+------------------------------------------------------------------+
|                     ENGINE REPLAY INVARIANTS                     |
+------------------------------------------------------------------+

RULE 1: Apply operations in LSN order
  - Each operation carries its source LSN
  - Engine rejects out-of-order operations
  - Crash recovery replays from last checkpoint LSN

RULE 2: Store last applied LSN per collection
  - Persisted in ruvector.collection_state.last_applied_lsn
  - Updated atomically after each operation
  - Skip operations with LSN <= last_applied_lsn

RULE 3: Delete wins over insert for same TID
  - If a TID is inserted then deleted, the final state is deleted
  - Replay order handles this naturally if LSN-ordered
  - Edge case: TID reuse after VACUUM requires checking xmin

RULE 4: Update = Delete + Insert
  - Updates decompose to delete old, insert new
  - Both carry the same transaction LSN
  - Applied atomically

RULE 5: Rollback handling
  - Uncommitted operations are not in the WAL (crash safe)
  - For explicit ROLLBACK during runtime:
    - Synchronous mode: engine notified, reverts in-memory state
    - Async mode: change log entry marked rollback, skipped on apply

+------------------------------------------------------------------+
```

### Conflict Resolution

```rust
/// Handle conflicts during replay (method on the engine)
pub fn apply_with_conflict_resolution(
    &mut self,
    op: WalOperation,
) -> Result<(), ReplayError> {
    // Check LSN ordering
    let last_lsn = self.lsn_tracker.get(op.collection_id);
    if op.lsn <= last_lsn {
        // Already applied, skip (idempotent)
        return Ok(());
    }

    match op.kind {
        OpKind::Insert { tid, vector } => {
            if self.index.contains_tid(tid) {
                // TID exists - check if this is TID reuse after VACUUM
                let existing_lsn = self.index.get_lsn(tid);
                if op.lsn > existing_lsn {
                    // Newer insert wins - delete old, insert new
                    self.index.delete(tid);
                    self.index.insert(tid, &vector, op.lsn);
                }
                // else: stale insert, skip
            } else {
                self.index.insert(tid, &vector, op.lsn);
            }
        }
        OpKind::Delete { tid } => {
            // Delete always wins if LSN is newer
            if self.index.contains_tid(tid) {
                let existing_lsn = self.index.get_lsn(tid);
                if op.lsn > existing_lsn {
                    self.index.delete(tid);
                }
            }
            // If not present, already deleted - idempotent
        }
        OpKind::Update { old_tid, new_tid, vector } => {
            // Atomic delete + insert
            self.index.delete(old_tid);
            self.index.insert(new_tid, &vector, op.lsn);
        }
    }

    self.lsn_tracker.update(op.collection_id, op.lsn);
    Ok(())
}
```

### Idempotent Operations

```rust
/// All engine operations must be idempotent for safe replay
impl Engine {
    /// Idempotent insert - safe to replay
    pub fn redo_insert(&mut self, data: IndexInsertData) {
        // Check if already present
        if self.index.contains_tid(data.tid) {
            // Already inserted, skip
            return;
        }

        // Insert with LSN tracking
        self.index.insert_with_lsn(data.tid, &data.vector, data.lsn);
    }

    /// Idempotent delete - safe to replay
    pub fn redo_delete(&mut self, data: IndexDeleteData) {
        // Check if already deleted
        if !self.index.contains_tid(data.tid) {
            // Already deleted, skip
            return;
        }

        // Delete with tombstone
        self.index.delete_with_lsn(data.tid, data.lsn);
    }

    /// Idempotent edge add - safe to replay
    pub fn redo_edge_add(&mut self, data: HnswEdgeData) {
        // HNSW edge insertion is idempotent by nature
        self.hnsw.add_edge(data.from, data.to, data.lsn);
    }
}
```

### LSN-Based Deduplication

```rust
/// Track the applied LSN per collection
pub struct LsnTracker {
    applied_lsn: HashMap<i32, XLogRecPtr>,
}

impl LsnTracker {
    /// Check if an operation should be applied
    pub fn should_apply(&self, collection_id: i32, lsn: XLogRecPtr) -> bool {
        match self.applied_lsn.get(&collection_id) {
            Some(&last_lsn) => lsn > last_lsn,
            None => true,
        }
    }

    /// Mark an operation as applied
    pub fn mark_applied(&mut self, collection_id: i32, lsn: XLogRecPtr) {
        self.applied_lsn.insert(collection_id, lsn);
    }
}
```
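
A minimal, self-contained sketch of how the tracker makes replay idempotent. `XLogRecPtr` is modeled here as a plain `u64`, and the `new` constructor is added for illustration; neither is the shipped definition.

```rust
use std::collections::HashMap;

/// Stand-in for LsnTracker above; XLogRecPtr modeled as u64.
pub struct LsnTracker {
    applied_lsn: HashMap<i32, u64>,
}

impl LsnTracker {
    pub fn new() -> Self {
        Self { applied_lsn: HashMap::new() }
    }

    /// An operation applies only if its LSN is strictly newer
    /// than the last applied LSN for that collection.
    pub fn should_apply(&self, collection_id: i32, lsn: u64) -> bool {
        match self.applied_lsn.get(&collection_id) {
            Some(&last_lsn) => lsn > last_lsn,
            None => true,
        }
    }

    pub fn mark_applied(&mut self, collection_id: i32, lsn: u64) {
        self.applied_lsn.insert(collection_id, lsn);
    }
}
```

Replaying the same WAL segment twice is then a no-op: once LSN 100 is marked for a collection, operations at LSN 90 or 100 are skipped while 101 still applies, and other collections are unaffected.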

---

## Replication Strategies

### Physical Replication (Streaming)

```
Primary → Standby streaming with RuVector:

Primary:
  1. Write heap + index changes
  2. Generate WAL records
  3. Stream to standby

Standby:
  1. Receive WAL stream
  2. Apply heap changes (PostgreSQL)
  3. Apply index changes (RuVector redo)
  4. Engine state matches primary
```

### Logical Replication

```
Publisher → Subscriber with RuVector:

Publisher:
  1. Changes captured via logical decoding
  2. RuVector output plugin extracts vector changes
  3. Publishes to replication slot

Subscriber:
  1. Receives logical changes
  2. Applies them to the local heap
  3. Local RuVector engine indexes the changes
  4. Independent index structures
```

---

## Configuration

```sql
-- Consistency configuration
ALTER SYSTEM SET ruvector.consistency_mode = 'hybrid';   -- 'sync', 'async', 'hybrid'
ALTER SYSTEM SET ruvector.max_lag_ms = 100;              -- Max staleness window
ALTER SYSTEM SET ruvector.visibility_recheck = true;     -- Always recheck heap
ALTER SYSTEM SET ruvector.wal_level = 'logical';         -- For logical replication

-- Recovery configuration
ALTER SYSTEM SET ruvector.checkpoint_interval = 300;     -- Checkpoint every 5 min
ALTER SYSTEM SET ruvector.wal_buffer_size = '64MB';      -- WAL buffer
ALTER SYSTEM SET ruvector.recovery_target_timeline = 'latest';
```

---

## Monitoring

```sql
-- Consistency lag monitoring
SELECT
    c.name AS collection,
    s.last_heap_lsn,
    s.last_index_lsn,
    pg_wal_lsn_diff(s.last_heap_lsn, s.last_index_lsn) AS lag_bytes,
    s.lag_ms,
    s.pending_changes
FROM ruvector.consistency_status s
JOIN ruvector.collections c ON s.collection_id = c.id;

-- Visibility recheck statistics
SELECT
    collection_name,
    total_searches,
    visibility_rechecks,
    invisible_filtered,
    (invisible_filtered::float / NULLIF(visibility_rechecks, 0) * 100)::numeric(5,2) AS invisible_pct
FROM ruvector.visibility_stats
ORDER BY invisible_pct DESC;

-- WAL replay status
SELECT
    pg_last_wal_receive_lsn() AS receive_lsn,
    pg_last_wal_replay_lsn() AS replay_lsn,
    ruvector_last_applied_lsn() AS ruvector_lsn,
    pg_wal_lsn_diff(pg_last_wal_replay_lsn(), ruvector_last_applied_lsn()) AS ruvector_lag_bytes;
```

---

## Testing Requirements

### Unit Tests

- Visibility check correctness
- Idempotent operation replay
- LSN tracking accuracy
- MVCC snapshot handling

### Integration Tests

- Crash recovery scenarios
- Concurrent transaction visibility
- Replication lag handling
- HOT update handling

### Chaos Tests

- Primary failover
- Network partition during replication
- Partial WAL replay
- Checkpoint corruption recovery

---

## Summary

The v2 consistency model ensures:

1. **Heap is authoritative** - All visibility decisions defer to the PostgreSQL heap
2. **Bounded staleness** - The index catches up within a configurable lag window
3. **Crash safe** - WAL-based recovery with idempotent replay
4. **Replication compatible** - Works with both streaming and logical replication
5. **MVCC aware** - Respects transaction isolation guarantees

---

<!-- docs/postgres/v2/11-hybrid-search.md -->

# RuVector Postgres v2 - Hybrid Search (BM25 + Vector)

## Why Hybrid Search Matters

Vector search finds semantically similar content. Keyword search finds exact matches.

Neither is sufficient alone:

- **Vector-only** misses exact keyword matches (product SKUs, error codes, names)
- **Keyword-only** misses semantic similarity ("car" vs "automobile")

Every production RAG system needs both. pgvector doesn't have this. We do.

---

## Design Goals

1. **Single query, both signals** — No application-level fusion
2. **Configurable blending** — RRF, linear, learned weights
3. **Integrity-aware** — The hybrid index participates in the contracted graph
4. **PostgreSQL-native** — Leverages `tsvector` and GIN indexes

---

---

## Architecture

```
                 +------------------+
                 |   Hybrid Query   |
                 | "error 500 fix"  |
                 +--------+---------+
                          |
          +---------------+----------------+
          |                                |
 +--------v--------+            +----------v--------+
 |  Vector Branch  |            |  Keyword Branch   |
 |   (HNSW/IVF)    |            |  (GIN/tsvector)   |
 +--------+--------+            +----------+--------+
          |                                |
          | top-100 by cosine              | top-100 by BM25
          |                                |
          +---------------+----------------+
                          |
                 +--------v--------+
                 |  Fusion Layer   |
                 |  (RRF / Linear) |
                 +--------+--------+
                          |
                 +--------v--------+
                 |   Final top-k   |
                 +--------+--------+
                          |
                 +--------v--------+
                 | Optional Rerank |
                 +-----------------+
```

---

## SQL Interface

### Basic Hybrid Search

```sql
-- Simple hybrid search with default RRF fusion
SELECT * FROM ruvector_hybrid_search(
    'documents',                 -- collection name
    query_text   := 'database connection timeout error',
    query_vector := $embedding,
    k            := 10
);

-- Returns: id, content, vector_score, keyword_score, hybrid_score
```

### Configurable Fusion

```sql
-- RRF (Reciprocal Rank Fusion) - default, robust
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'rrf',
    rrf_k        := 60           -- RRF constant (default 60)
);

-- Linear blend with alpha
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'linear',
    alpha        := 0.7          -- 0.7 * vector + 0.3 * keyword
);

-- Learned fusion weights (from query patterns)
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'postgres replication lag',
    query_vector := $embedding,
    k            := 20,
    fusion       := 'learned'    -- Uses GNN-trained weights
);
```

### Operator Syntax (Advanced)

```sql
-- Using the hybrid scoring function in ORDER BY
SELECT id, content,
       ruvector_hybrid_score(
           embedding <=> $query_vec,
           ts_rank_cd(fts, plainto_tsquery($query_text)),
           alpha := 0.6
       ) AS score
FROM documents
WHERE fts @@ plainto_tsquery($query_text)   -- Pre-filter
   OR embedding <=> $query_vec < 0.5        -- Or similar vectors
ORDER BY score DESC
LIMIT 10;
```

---

## Schema Requirements

### Collection with Hybrid Support

```sql
-- Create a table with both vector and FTS columns
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    fts tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector index
CREATE INDEX idx_documents_embedding
    ON documents USING ruhnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 100);

-- FTS index
CREATE INDEX idx_documents_fts
    ON documents USING gin (fts);

-- Register for hybrid search
SELECT ruvector_register_hybrid(
    collection    := 'documents',
    vector_column := 'embedding',
    fts_column    := 'fts',
    text_column   := 'content'   -- For BM25 stats
);
```

### Hybrid Registration Table

```sql
-- Internal: tracks hybrid-enabled collections
CREATE TABLE ruvector.hybrid_collections (
    id SERIAL PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES ruvector.collections(id),
    vector_column TEXT NOT NULL,
    fts_column TEXT NOT NULL,
    text_column TEXT NOT NULL,

    -- BM25 parameters (computed from the corpus)
    avg_doc_length REAL,
    doc_count BIGINT,
    k1 REAL DEFAULT 1.2,
    b REAL DEFAULT 0.75,

    -- Fusion settings
    default_fusion TEXT DEFAULT 'rrf',
    default_alpha REAL DEFAULT 0.5,
    learned_weights JSONB,

    -- Stats
    last_stats_update TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```

---

## BM25 Implementation

### Why Not Just ts_rank?

PostgreSQL's `ts_rank` is not true BM25. It doesn't account for:

- Document length normalization
- IDF weighting across the corpus
- Term frequency saturation

We implement proper BM25 in the engine.

### BM25 Scoring
|
||||
|
||||
```rust
|
||||
// src/hybrid/bm25.rs
|
||||
|
||||
/// BM25 scorer with corpus statistics
|
||||
pub struct BM25Scorer {
|
||||
k1: f32, // Term frequency saturation (default 1.2)
|
||||
b: f32, // Length normalization (default 0.75)
|
||||
avg_doc_len: f32, // Average document length
|
||||
doc_count: u64, // Total documents
|
||||
idf_cache: HashMap<String, f32>, // Cached IDF values
|
||||
}
|
||||
|
||||
impl BM25Scorer {
|
||||
/// Compute IDF for a term
|
||||
fn idf(&self, doc_freq: u64) -> f32 {
|
||||
let n = self.doc_count as f32;
|
||||
let df = doc_freq as f32;
|
||||
((n - df + 0.5) / (df + 0.5) + 1.0).ln()
|
||||
}
|
||||
|
||||
/// Score a document for a query
|
||||
pub fn score(&self, doc: &Document, query_terms: &[String]) -> f32 {
|
||||
let doc_len = doc.term_count as f32;
|
||||
let len_norm = 1.0 - self.b + self.b * (doc_len / self.avg_doc_len);
|
||||
|
||||
query_terms.iter()
|
||||
.filter_map(|term| {
|
||||
let tf = doc.term_freq(term)? as f32;
|
||||
let idf = self.idf_cache.get(term)?;
|
||||
|
||||
// BM25 formula
|
||||
let numerator = tf * (self.k1 + 1.0);
|
||||
let denominator = tf + self.k1 * len_norm;
|
||||
|
||||
Some(idf * numerator / denominator)
|
||||
})
|
||||
.sum()
|
||||
}
|
||||
}
|
||||
```
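
To make the IDF term concrete, here is the same formula as a free function with a worked example over a hypothetical corpus of 1,000 documents (corpus numbers are illustrative, not from RuVector):

```rust
/// IDF exactly as in BM25Scorer::idf above, as a free function.
fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}
```

Rare terms dominate: in a 1,000-document corpus a term in 1 document scores about 6.50, one in 10 documents about 4.56, and one in 900 documents about 0.11. This is why an exact SKU or error code outranks common words in the keyword branch.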

### Corpus Statistics Update

```sql
-- Update BM25 statistics (run periodically or after bulk inserts)
SELECT ruvector_hybrid_update_stats('documents');

-- Stats are stored in the hybrid_collections table,
-- computed via a background worker or on demand
```

```rust
// Background worker updates corpus stats
pub fn update_bm25_stats(collection_id: i32) -> Result<(), Error> {
    Spi::run(|client| {
        // Get the average document length
        let avg_len: f64 = client.select(
            "SELECT AVG(LENGTH(content)) FROM documents",
            None, &[]
        )?.first().unwrap().get(1)?;

        // Get the document count
        let doc_count: i64 = client.select(
            "SELECT COUNT(*) FROM documents",
            None, &[]
        )?.first().unwrap().get(1)?;

        // Update term frequencies (using tsvector stats)
        // ... compute IDF cache ...

        client.update(
            "UPDATE ruvector.hybrid_collections
             SET avg_doc_length = $1, doc_count = $2, last_stats_update = NOW()
             WHERE collection_id = $3",
            None,
            &[avg_len.into(), doc_count.into(), collection_id.into()]
        )
    })
}
```

---

## Fusion Algorithms

### Reciprocal Rank Fusion (RRF)

The default and most robust option; it works without score calibration.

```rust
// src/hybrid/fusion.rs

/// RRF fusion: score = sum(1 / (k + rank_i))
pub fn rrf_fusion(
    vector_results: &[(DocId, f32)],   // (id, distance)
    keyword_results: &[(DocId, f32)],  // (id, bm25_score)
    k: usize,                          // RRF constant (default 60)
    limit: usize,
) -> Vec<(DocId, f32)> {
    let mut scores: HashMap<DocId, f32> = HashMap::new();

    // Vector ranking (lower distance = higher rank)
    for (rank, (doc_id, _)) in vector_results.iter().enumerate() {
        *scores.entry(*doc_id).or_default() += 1.0 / (k + rank + 1) as f32;
    }

    // Keyword ranking (higher BM25 = higher rank)
    for (rank, (doc_id, _)) in keyword_results.iter().enumerate() {
        *scores.entry(*doc_id).or_default() += 1.0 / (k + rank + 1) as f32;
    }

    // Sort by fused score
    let mut results: Vec<_> = scores.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results.truncate(limit);
    results
}
```
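
A toy run makes the rank arithmetic visible. This simplified variant (plain `u64` IDs, already-ranked ID lists, no `limit`) is an illustration of the same formula, not the shipped code:

```rust
use std::collections::HashMap;

/// Simplified RRF over two ranked ID lists (best first).
fn rrf(vector_ranked: &[u64], keyword_ranked: &[u64], k: usize) -> Vec<(u64, f32)> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for ranked in [vector_ranked, keyword_ranked] {
        for (rank, id) in ranked.iter().enumerate() {
            // rank is 0-based, so the best item contributes 1/(k+1)
            *scores.entry(*id).or_default() += 1.0 / (k + rank + 1) as f32;
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With k = 60, vector ranking [1, 2, 3] and keyword ranking [3, 2, 4]: doc 3 (ranks 3 and 1) scores 1/63 + 1/61 and narrowly beats doc 2 (ranks 2 and 2) at 2/62. Documents appearing in both lists dominate those appearing in only one, which is exactly the behavior hybrid search wants.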

### Linear Fusion

A simple weighted combination; it requires score normalization.

```rust
/// Linear fusion: score = alpha * vec_score + (1 - alpha) * kw_score
pub fn linear_fusion(
    vector_results: &[(DocId, f32)],
    keyword_results: &[(DocId, f32)],
    alpha: f32,
    limit: usize,
) -> Vec<(DocId, f32)> {
    // Normalize vector scores (convert distance to similarity)
    let vec_scores = normalize_to_similarity(vector_results);

    // Normalize BM25 scores to [0, 1]
    let kw_scores = min_max_normalize(keyword_results);

    // Combine
    let mut combined: HashMap<DocId, f32> = HashMap::new();

    for (doc_id, score) in vec_scores {
        *combined.entry(doc_id).or_default() += alpha * score;
    }

    for (doc_id, score) in kw_scores {
        *combined.entry(doc_id).or_default() += (1.0 - alpha) * score;
    }

    let mut results: Vec<_> = combined.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results.truncate(limit);
    results
}
```
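
The helpers `normalize_to_similarity` and `min_max_normalize` are referenced above but not shown. A plausible sketch follows; the cosine-distance range of [0, 2] and the `u64` doc IDs are both assumptions, not the shipped implementation:

```rust
/// Map cosine distances in [0, 2] to similarities in [0, 1]
/// (assumes the vector branch reports cosine distance).
fn normalize_to_similarity(results: &[(u64, f32)]) -> Vec<(u64, f32)> {
    results.iter().map(|&(id, dist)| (id, 1.0 - dist / 2.0)).collect()
}

/// Min-max scale raw BM25 scores to [0, 1].
fn min_max_normalize(results: &[(u64, f32)]) -> Vec<(u64, f32)> {
    let min = results.iter().map(|r| r.1).fold(f32::INFINITY, f32::min);
    let max = results.iter().map(|r| r.1).fold(f32::NEG_INFINITY, f32::max);
    let range = (max - min).max(f32::EPSILON); // guard against a constant list
    results.iter().map(|&(id, s)| (id, (s - min) / range)).collect()
}
```

Unlike RRF, linear fusion is only meaningful after both branches are on the same scale; min-max scaling is per result set, so scores are comparable within a query but not across queries.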

### Learned Fusion

Uses query characteristics to select weights dynamically.

```rust
/// Learned fusion using GNN-predicted weights
pub fn learned_fusion(
    query_embedding: &[f32],
    query_terms: &[String],
    vector_results: &[(DocId, f32)],
    keyword_results: &[(DocId, f32)],
    model: &FusionModel,
    limit: usize,
) -> Vec<(DocId, f32)> {
    // Query features
    let features = QueryFeatures {
        embedding_norm: l2_norm(query_embedding),
        term_count: query_terms.len(),
        avg_term_idf: compute_avg_idf(query_terms),
        has_exact_match: detect_exact_match_intent(query_terms),
        query_type: classify_query_type(query_terms), // navigational, informational, etc.
    };

    // Predict the optimal alpha for this query
    let alpha = model.predict_alpha(&features);

    linear_fusion(vector_results, keyword_results, alpha, limit)
}
```

---

## Integrity Integration

Hybrid search participates in the integrity control plane.

### Contracted Graph Nodes

```sql
-- The hybrid index adds nodes to the contracted graph
INSERT INTO ruvector.contracted_graph (collection_id, node_type, node_id, node_name, health_score)
SELECT
    c.id,
    'hybrid_index',
    h.id,
    'hybrid_' || c.name,
    CASE
        WHEN h.last_stats_update > NOW() - INTERVAL '1 day' THEN 1.0
        WHEN h.last_stats_update > NOW() - INTERVAL '7 days' THEN 0.7
        ELSE 0.3  -- Stale stats degrade health
    END
FROM ruvector.hybrid_collections h
JOIN ruvector.collections c ON h.collection_id = c.id;
```

### Integrity-Aware Hybrid Search

```rust
/// Hybrid search with integrity gating
pub fn hybrid_search_with_integrity(
    collection_id: i32,
    query: &HybridQuery,
) -> Result<Vec<HybridResult>, Error> {
    // Check the integrity gate
    let gate = check_integrity_gate(collection_id, "hybrid_search");

    match gate.state {
        IntegrityState::Normal => {
            // Full hybrid: both branches
            execute_full_hybrid(query)
        }
        IntegrityState::Stress => {
            // Degrade gracefully: prefer the branch the query leans on
            if query.alpha > 0.5 {
                // Vector-heavy query: use vector only
                execute_vector_only(query)
            } else {
                // Keyword-heavy query: use keyword only
                execute_keyword_only(query)
            }
        }
        IntegrityState::Critical => {
            // Minimal: keyword only (cheapest)
            execute_keyword_only(query)
        }
    }
}
```

---

## Performance Optimization

### Pre-filtering Strategy

```sql
-- Hybrid search with a pre-filter (faster for selective filters)
SELECT * FROM ruvector_hybrid_search(
    'documents',
    query_text   := 'error handling',
    query_vector := $embedding,
    k            := 10,
    filter       := 'category = ''backend'' AND created_at > NOW() - INTERVAL ''30 days'''
);
```

```rust
// Execution strategy selection
fn choose_strategy(filter_selectivity: f32, corpus_size: u64) -> HybridStrategy {
    if filter_selectivity < 0.01 {
        // Very selective: pre-filter, then run hybrid on the small set
        HybridStrategy::PreFilter
    } else if filter_selectivity < 0.1 && corpus_size > 1_000_000 {
        // Moderately selective, large corpus: hybrid first, post-filter
        HybridStrategy::PostFilter
    } else {
        // Not selective: full hybrid
        HybridStrategy::Full
    }
}
```

### Parallel Execution

```rust
/// Execute vector and keyword branches in parallel
pub async fn parallel_hybrid(query: &HybridQuery) -> HybridResults {
    let (vector_results, keyword_results) = tokio::join!(
        execute_vector_branch(&query.embedding, query.prefetch_k),
        execute_keyword_branch(&query.text, query.prefetch_k),
    );

    fuse_results(vector_results, keyword_results, query.fusion, query.k)
}
```

### Caching

```rust
/// Cache BM25 scores for repeated terms
pub struct HybridCache {
    term_doc_scores: LruCache<(String, DocId), f32>,
    idf_cache: HashMap<String, f32>,
    ttl: Duration,
}
```

---

## Configuration

### GUC Parameters

```sql
-- Default fusion method
SET ruvector.hybrid_fusion = 'rrf';      -- 'rrf', 'linear', 'learned'

-- Default alpha for linear fusion
SET ruvector.hybrid_alpha = 0.5;

-- RRF constant
SET ruvector.hybrid_rrf_k = 60;

-- Prefetch size for each branch
SET ruvector.hybrid_prefetch_k = 100;

-- Enable parallel branch execution
SET ruvector.hybrid_parallel = true;
```

### Per-Collection Settings

```sql
SELECT ruvector_hybrid_configure('documents', '{
    "default_fusion": "learned",
    "prefetch_k": 200,
    "bm25_k1": 1.5,
    "bm25_b": 0.8,
    "stats_refresh_interval": "1 hour"
}'::jsonb);
```

---

## Monitoring

```sql
-- Hybrid search statistics
SELECT * FROM ruvector_hybrid_stats('documents');

-- Returns:
-- {
--   "total_searches": 15234,
--   "avg_vector_latency_ms": 4.2,
--   "avg_keyword_latency_ms": 2.1,
--   "avg_fusion_latency_ms": 0.3,
--   "cache_hit_rate": 0.67,
--   "last_stats_update": "2024-01-15T10:30:00Z",
--   "corpus_size": 1250000,
--   "avg_doc_length": 542
-- }
```

---

## Testing Requirements

### Correctness Tests

- BM25 scoring matches a reference implementation
- RRF fusion produces the expected rankings
- Linear fusion respects the alpha parameter
- Learned fusion adapts to query type

### Performance Tests

- Hybrid search < 2x single-branch latency
- Parallel execution shows a speedup
- Cache hit rate > 50% for repeated queries

### Integration Tests

- Integrity degradation triggers graceful fallback
- Stats updates don't block queries
- Large corpora (10M+ docs) scale

---
## Example: RAG Application

```sql
-- Complete RAG retrieval with hybrid search
WITH retrieved AS (
    SELECT
        id,
        content,
        hybrid_score,
        metadata
    FROM ruvector_hybrid_search(
        'knowledge_base',
        query_text   := $user_question,
        query_vector := $question_embedding,
        k            := 5,
        fusion       := 'rrf',
        filter       := 'status = ''published'''
    )
)
SELECT
    string_agg(content, E'\n\n---\n\n') AS context,
    array_agg(id) AS source_ids
FROM retrieved;

-- Pass the context to the LLM for answer generation
```

---

<!-- docs/postgres/v2/12-multi-tenancy.md -->

# RuVector Postgres v2 - Multi-Tenancy Model

## Why Multi-Tenancy Matters

Every SaaS application needs tenant isolation. Without native support, teams build:

- Separate databases per tenant (operational nightmare)
- Manual partition schemes (error-prone)
- Application-level filtering (security risk)

RuVector provides **first-class multi-tenancy** with:

- Tenant-isolated search (data never leaks)
- Per-tenant integrity monitoring (one bad tenant doesn't sink the others)
- Efficient shared infrastructure (cost-effective)
- Row-level security integration (PostgreSQL-native)

---

## Design Goals

1. **Zero data leakage** — Tenant A never sees Tenant B's vectors
2. **Per-tenant integrity** — Stress in one tenant doesn't affect the others
3. **Fair resource allocation** — No noisy-neighbor problems
4. **Transparent to queries** — SET the tenant, then run normal SQL
5. **Efficient storage** — Shared indexes where safe, isolated where needed

---

---

## Architecture

```
+------------------------------------------------------------------+
|                          Application                             |
|   SET ruvector.tenant_id = 'acme-corp';                          |
|   SELECT * FROM embeddings ORDER BY vec <-> $q LIMIT 10;         |
+------------------------------------------------------------------+
                               |
+------------------------------------------------------------------+
|                      Tenant Context Layer                        |
|   - Validates tenant_id                                          |
|   - Injects tenant filter into all operations                    |
|   - Routes to tenant-specific resources                          |
+------------------------------------------------------------------+
                               |
               +---------------+----------------+
               |                                |
      +--------v--------+            +----------v--------+
      |  Shared Index   |            |  Tenant Indexes   |
      | (small tenants) |            |  (large tenants)  |
      +--------+--------+            +----------+--------+
               |                                |
               +---------------+----------------+
                               |
+------------------------------------------------------------------+
|                      Per-Tenant Integrity                        |
|   - Separate contracted graphs                                   |
|   - Independent state machines                                   |
|   - Isolated throttling policies                                 |
+------------------------------------------------------------------+
```

---

## SQL Interface

### Setting Tenant Context

```sql
-- Set the tenant for the session (required before any operation)
SET ruvector.tenant_id = 'acme-corp';

-- Or per-transaction
BEGIN;
SET LOCAL ruvector.tenant_id = 'acme-corp';
-- ... operations ...
COMMIT;

-- Verify the current tenant
SELECT current_setting('ruvector.tenant_id');
```

### Tenant-Transparent Operations

```sql
-- Once the tenant is set, all operations are automatically scoped
SET ruvector.tenant_id = 'acme-corp';

-- Insert only sees/affects acme-corp data
INSERT INTO embeddings (content, vec) VALUES ('doc', $embedding);

-- Search only returns acme-corp results
SELECT * FROM embeddings ORDER BY vec <-> $query LIMIT 10;

-- Delete only affects acme-corp
DELETE FROM embeddings WHERE id = 123;
```

### Admin Operations (Cross-Tenant)

```sql
-- Superusers can query across tenants
SET ruvector.tenant_id = '*';  -- Wildcard (admin only)

-- View all tenants
SELECT * FROM ruvector_tenants();

-- View tenant stats
SELECT * FROM ruvector_tenant_stats('acme-corp');

-- Migrate a tenant to a dedicated index
SELECT ruvector_tenant_isolate('acme-corp');
```

---

## Schema Design

### Tenant Registry

```sql
CREATE TABLE ruvector.tenants (
    id TEXT PRIMARY KEY,
    display_name TEXT,

    -- Resource limits
    max_vectors BIGINT DEFAULT 1000000,
    max_collections INTEGER DEFAULT 10,
    max_qps INTEGER DEFAULT 100,

    -- Isolation level
    isolation_level TEXT DEFAULT 'shared' CHECK (isolation_level IN (
        'shared',     -- Shared index with tenant filter
        'partition',  -- Dedicated partition in shared index
        'dedicated'   -- Separate physical index
    )),

    -- Integrity settings
    integrity_enabled BOOLEAN DEFAULT true,
    integrity_policy_id INTEGER REFERENCES ruvector.integrity_policies(id),

    -- Metadata
    metadata JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    suspended_at TIMESTAMPTZ,  -- Non-null = suspended

    -- Stats (updated by a background worker)
    vector_count BIGINT DEFAULT 0,
    storage_bytes BIGINT DEFAULT 0,
    last_access TIMESTAMPTZ
);

CREATE INDEX idx_tenants_isolation ON ruvector.tenants(isolation_level);
CREATE INDEX idx_tenants_suspended ON ruvector.tenants(suspended_at) WHERE suspended_at IS NOT NULL;
```

### Tenant-Aware Collections

```sql
-- Collections can be tenant-specific or shared
CREATE TABLE ruvector.collections (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tenant_id TEXT REFERENCES ruvector.tenants(id),  -- NULL = shared

    -- ... other columns from 01-sql-schema.md ...

    UNIQUE (name, tenant_id)  -- The same name is allowed for different tenants
);

-- Tenant-scoped view
CREATE VIEW ruvector.my_collections AS
SELECT * FROM ruvector.collections
WHERE tenant_id = current_setting('ruvector.tenant_id', true)
   OR tenant_id IS NULL;  -- Shared collections are visible to all
```

### Tenant Column in Data Tables

```sql
-- User tables include a tenant_id column
CREATE TABLE embeddings (
    id BIGSERIAL PRIMARY KEY,
    tenant_id TEXT NOT NULL DEFAULT current_setting('ruvector.tenant_id'),
    content TEXT,
    vec vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW(),

    CONSTRAINT fk_tenant FOREIGN KEY (tenant_id)
        REFERENCES ruvector.tenants(id) ON DELETE CASCADE
);

-- Partial index per tenant (for dedicated isolation)
CREATE INDEX idx_embeddings_vec_tenant_acme
    ON embeddings USING ruhnsw (vec vector_cosine_ops)
    WHERE tenant_id = 'acme-corp';

-- Or a shared index for shared isolation
CREATE INDEX idx_embeddings_vec_shared
    ON embeddings USING ruhnsw (vec vector_cosine_ops);
-- The engine internally filters by tenant_id
```

---

## Row-Level Security Integration

### RLS Policies

```sql
-- Enable RLS on data tables
ALTER TABLE embeddings ENABLE ROW LEVEL SECURITY;

-- Tenant isolation policy
CREATE POLICY tenant_isolation ON embeddings
    USING (tenant_id = current_setting('ruvector.tenant_id', true))
    WITH CHECK (tenant_id = current_setting('ruvector.tenant_id', true));

-- Admin bypass policy
CREATE POLICY admin_access ON embeddings
    FOR ALL
    TO ruvector_admin
    USING (true)
    WITH CHECK (true);
```

### Automatic Policy Creation

```sql
-- Helper function to set up RLS for a table
CREATE FUNCTION ruvector_enable_tenant_rls(
    p_table_name TEXT,
    p_tenant_column TEXT DEFAULT 'tenant_id'
) RETURNS void AS $$
BEGIN
    -- Enable RLS
    EXECUTE format('ALTER TABLE %I ENABLE ROW LEVEL SECURITY', p_table_name);

    -- Create isolation policy
    EXECUTE format(
        'CREATE POLICY tenant_isolation ON %I
            USING (%I = current_setting(''ruvector.tenant_id'', true))
            WITH CHECK (%I = current_setting(''ruvector.tenant_id'', true))',
        p_table_name, p_tenant_column, p_tenant_column
    );

    -- Create admin bypass
    EXECUTE format(
        'CREATE POLICY admin_bypass ON %I FOR ALL TO ruvector_admin USING (true)',
        p_table_name
    );
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT ruvector_enable_tenant_rls('embeddings');
SELECT ruvector_enable_tenant_rls('documents');
```

---

## Isolation Levels

### Shared (Default)

All tenants share one index. The engine filters results by tenant_id.

```
Pros:
+ Most memory-efficient
+ Fastest for small tenants
+ Simple management

Cons:
- Some cross-tenant cache pollution
- Shared integrity state

Best for: < 100K vectors per tenant
```

### Partition

Tenants get dedicated partitions within a shared index structure.

```
Pros:
+ Better cache isolation
+ Per-partition integrity
+ Easy promotion to dedicated

Cons:
- Some overhead per partition
- Still shares top-level structure

Best for: 100K - 10M vectors per tenant
```

### Dedicated

The tenant gets a completely separate physical index.

```
Pros:
+ Complete isolation
+ Independent scaling
+ Custom index parameters

Cons:
- Higher memory overhead
- More management complexity

Best for: > 10M vectors, enterprise tenants, compliance requirements
```

### Automatic Promotion

```sql
-- Configure auto-promotion thresholds (counts are vectors per tenant;
-- note JSON literals cannot carry inline comments)
SELECT ruvector_tenant_set_policy('{
    "auto_promote_to_partition": 100000,
    "auto_promote_to_dedicated": 10000000,
    "check_interval": "1 hour"
}'::jsonb);
```

```rust
// Background worker checks and promotes
pub fn check_tenant_promotion(tenant_id: &str) -> Option<IsolationLevel> {
    let stats = get_tenant_stats(tenant_id)?;
    let policy = get_promotion_policy()?;

    if stats.vector_count > policy.dedicated_threshold {
        Some(IsolationLevel::Dedicated)
    } else if stats.vector_count > policy.partition_threshold {
        Some(IsolationLevel::Partition)
    } else {
        None
    }
}
```

---

## Per-Tenant Integrity

### Separate Contracted Graphs

```sql
-- Each tenant gets its own contracted graph
CREATE TABLE ruvector.tenant_contracted_graph (
    tenant_id TEXT NOT NULL REFERENCES ruvector.tenants(id),
    collection_id INTEGER NOT NULL,
    node_type TEXT NOT NULL,
    node_id BIGINT NOT NULL,
    -- ... same as contracted_graph ...

    PRIMARY KEY (tenant_id, collection_id, node_type, node_id)
);
```

### Independent State Machines

```rust
// Per-tenant integrity state
pub struct TenantIntegrityState {
    tenant_id: String,
    state: IntegrityState,
    lambda_cut: f32,
    consecutive_samples: u32,
    last_transition: Instant,
    cooldown_until: Option<Instant>,
}

// Tenant stress doesn't affect other tenants
pub fn check_tenant_gate(tenant_id: &str, operation: &str) -> GateResult {
    let state = get_tenant_integrity_state(tenant_id);
    apply_policy(state, operation)
}
```

### Tenant-Specific Policies

```sql
-- Each tenant can have custom thresholds
INSERT INTO ruvector.integrity_policies (tenant_id, name, threshold_high, threshold_low)
VALUES
    ('acme-corp', 'enterprise', 0.6, 0.3),   -- Stricter
    ('startup-xyz', 'standard', 0.4, 0.15);  -- Default
```

---

## Resource Quotas

### Quota Enforcement

```sql
-- Quota table
CREATE TABLE ruvector.tenant_quotas (
    tenant_id TEXT PRIMARY KEY REFERENCES ruvector.tenants(id),
    max_vectors BIGINT NOT NULL DEFAULT 1000000,
    max_storage_gb REAL NOT NULL DEFAULT 10.0,
    max_qps INTEGER NOT NULL DEFAULT 100,
    max_concurrent INTEGER NOT NULL DEFAULT 10,

    -- Current usage (updated by triggers/workers)
    current_vectors BIGINT DEFAULT 0,
    current_storage_gb REAL DEFAULT 0,

    -- Rate limiting state
    request_count INTEGER DEFAULT 0,
    window_start TIMESTAMPTZ DEFAULT NOW()
);

-- Check quota before insert
CREATE FUNCTION ruvector_check_quota() RETURNS TRIGGER AS $$
DECLARE
    v_quota RECORD;
BEGIN
    SELECT * INTO v_quota
    FROM ruvector.tenant_quotas
    WHERE tenant_id = NEW.tenant_id;

    IF v_quota.current_vectors >= v_quota.max_vectors THEN
        RAISE EXCEPTION 'Tenant % has exceeded vector quota', NEW.tenant_id;
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_quota_before_insert
    BEFORE INSERT ON embeddings
    FOR EACH ROW EXECUTE FUNCTION ruvector_check_quota();
```

### Rate Limiting

```rust
// Token bucket rate limiter per tenant
pub struct TenantRateLimiter {
    buckets: DashMap<String, TokenBucket>,
}

impl TenantRateLimiter {
    pub fn check(&self, tenant_id: &str, tokens: u32) -> RateLimitResult {
        let bucket = self.buckets.entry(tenant_id.to_string())
            .or_insert_with(|| TokenBucket::new(
                get_tenant_qps_limit(tenant_id),
            ));

        if bucket.try_acquire(tokens) {
            RateLimitResult::Allowed
        } else {
            RateLimitResult::Limited {
                retry_after_ms: bucket.time_to_refill(tokens),
            }
        }
    }
}
```
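The `TokenBucket` type used above is not defined in this document. A minimal Python sketch of the usual refill-on-demand bucket, assuming capacity equals the per-tenant QPS limit (the names and the injectable `clock` parameter are illustrative, not the engine's API):

```python
import time

class TokenBucket:
    """Refill-on-demand token bucket: tokens accrue continuously at
    `rate_per_sec`, capped at `capacity` (defaults to one second's worth)."""

    def __init__(self, rate_per_sec: float, capacity: float = None,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity if capacity is not None else rate_per_sec
        self.tokens = self.capacity   # start full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; never blocks."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def time_to_refill_ms(self, n: int = 1) -> float:
        """How long until n tokens are available (for Retry-After hints)."""
        self._refill()
        deficit = max(0.0, n - self.tokens)
        return deficit / self.rate * 1000.0
```

Lazy refill keeps the hot path to a clock read and two arithmetic ops, which is why it suits a per-request gate.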

### Fair Scheduling

```rust
// Weighted fair queue for search requests
pub struct FairScheduler {
    queues: HashMap<String, VecDeque<SearchRequest>>,
    weights: HashMap<String, f32>,  // Based on tier/quota
}

impl FairScheduler {
    pub fn next(&mut self) -> Option<SearchRequest> {
        // Weighted round-robin across tenants
        // Prevents one tenant from monopolizing resources
        let total_weight: f32 = self.weights.values().sum();

        for (tenant_id, queue) in &mut self.queues {
            let weight = self.weights.get(tenant_id).unwrap_or(&1.0);
            let share = weight / total_weight;

            // Probability of selecting this tenant's request
            if rand::random::<f32>() < share {
                if let Some(req) = queue.pop_front() {
                    return Some(req);
                }
            }
        }

        // Fallback: any available request
        self.queues.values_mut()
            .find_map(|q| q.pop_front())
    }
}
```

---

## Tenant Lifecycle

### Create Tenant

```sql
SELECT ruvector_tenant_create('new-customer', '{
    "display_name": "New Customer Inc.",
    "max_vectors": 5000000,
    "max_qps": 200,
    "isolation_level": "shared",
    "integrity_enabled": true
}'::jsonb);
```

### Suspend Tenant

```sql
-- Suspend (stops all operations, keeps data)
SELECT ruvector_tenant_suspend('bad-actor');

-- Resume
SELECT ruvector_tenant_resume('bad-actor');
```

### Delete Tenant

```sql
-- Soft delete (marks for cleanup)
SELECT ruvector_tenant_delete('churned-customer');

-- Hard delete (immediate, for compliance)
SELECT ruvector_tenant_delete('churned-customer', hard := true);
```

### Migrate Isolation Level

```sql
-- Promote to dedicated (online, no downtime)
SELECT ruvector_tenant_migrate('enterprise-customer', 'dedicated');

-- Status check
SELECT * FROM ruvector_tenant_migration_status('enterprise-customer');
```

---

## Shared Memory Layout

```rust
// Per-tenant state in shared memory
#[repr(C)]
pub struct TenantSharedState {
    tenant_id_hash: u64,            // Fast lookup key
    integrity_state: u8,            // 0=normal, 1=stress, 2=critical
    lambda_cut: f32,                // Current mincut value
    request_count: AtomicU32,       // For rate limiting
    last_request_epoch: AtomicU64,  // Rate limit window
    flags: AtomicU32,               // Suspended, migrating, etc.
}

// Tenant lookup table
pub struct TenantRegistry {
    states: [TenantSharedState; MAX_TENANTS],  // Fixed array in shmem
    index: HashMap<String, usize>,             // Heap-based lookup
}
```
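The function behind `tenant_id_hash` is not specified here. As one stable, dependency-free choice, a 64-bit FNV-1a over the tenant id's UTF-8 bytes can be sketched in Python (the constants are the standard FNV-1a parameters; the engine may well use a different hash):

```python
# Standard 64-bit FNV-1a parameters
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3
MASK_64 = 0xFFFFFFFFFFFFFFFF

def tenant_id_hash(tenant_id: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of the tenant id.
    Deterministic across processes, so it is safe to store in shmem."""
    h = FNV_OFFSET
    for byte in tenant_id.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h
```

Determinism across backends is the key property: any process attaching to the shared segment must compute the same key for the same tenant.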

---

## Monitoring

### Per-Tenant Metrics

```sql
-- Tenant dashboard
SELECT
    t.id,
    t.display_name,
    t.isolation_level,
    tq.current_vectors,
    tq.max_vectors,
    ROUND(100.0 * tq.current_vectors / tq.max_vectors, 1) AS usage_pct,
    ts.integrity_state,
    ts.lambda_cut,
    ts.avg_search_latency_ms,
    ts.searches_last_hour
FROM ruvector.tenants t
JOIN ruvector.tenant_quotas tq ON t.id = tq.tenant_id
JOIN ruvector.tenant_stats ts ON t.id = ts.tenant_id
ORDER BY tq.current_vectors DESC;
```

### Prometheus Metrics

```
# Per-tenant metrics
ruvector_tenant_vectors{tenant="acme-corp"} 1234567
ruvector_tenant_integrity_state{tenant="acme-corp"} 1
ruvector_tenant_lambda_cut{tenant="acme-corp"} 0.72
ruvector_tenant_search_latency_p99{tenant="acme-corp"} 15.2
ruvector_tenant_qps{tenant="acme-corp"} 45.3
ruvector_tenant_quota_usage{tenant="acme-corp",resource="vectors"} 0.62
```
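Rendering these samples is plain string templating in the Prometheus exposition format. A hypothetical Python helper (the metric names follow the samples above; the function itself is not part of RuVector, and extra labels such as `resource` are omitted for brevity):

```python
def render_tenant_metrics(tenant: str, stats: dict) -> str:
    """Render per-tenant gauges as Prometheus exposition lines:
    ruvector_tenant_<name>{tenant="<id>"} <value>"""
    lines = []
    for name, value in stats.items():
        lines.append(f'ruvector_tenant_{name}{{tenant="{tenant}"}} {value}')
    return "\n".join(lines)
```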

---

## Security Considerations

### Tenant ID Validation

```rust
// Validate tenant_id before any operation
pub fn validate_tenant_context() -> Result<String, Error> {
    let tenant_id = get_guc("ruvector.tenant_id")?;

    // Check not empty
    if tenant_id.is_empty() {
        return Err(Error::NoTenantContext);
    }

    // Check tenant exists and is not suspended
    let tenant = get_tenant(&tenant_id)?;
    if tenant.suspended_at.is_some() {
        return Err(Error::TenantSuspended);
    }

    Ok(tenant_id)
}
```

### Audit Logging

```sql
-- Tenant operations audit log
CREATE TABLE ruvector.tenant_audit_log (
    id BIGSERIAL PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    operation TEXT NOT NULL,  -- search, insert, delete, etc.
    user_id TEXT,             -- Application user
    details JSONB,
    ip_address INET,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Enabled via GUC
SET ruvector.audit_enabled = true;
```

### Cross-Tenant Prevention

```rust
// Engine-level enforcement (defense in depth)
pub fn execute_search(request: &SearchRequest) -> Result<SearchResults, Error> {
    let context_tenant = validate_tenant_context()?;

    // Double-check the request matches the session context
    if let Some(req_tenant) = &request.tenant_id {
        if req_tenant != &context_tenant {
            // Log security event
            log_security_event("tenant_mismatch", &context_tenant, req_tenant);
            return Err(Error::TenantMismatch);
        }
    }

    // Execute with tenant filter
    execute_search_internal(request, &context_tenant)
}
```

---

## Testing Requirements

### Isolation Tests
- Tenant A cannot see Tenant B's data
- Tenant A's stress doesn't affect Tenant B's operations
- Suspended tenant cannot perform any operations
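
The first requirement can be modeled in-memory before writing a full database test. A Python sketch that only mirrors the `tenant_isolation` policy's visibility rule (the real test would go through RLS; the helper and row shapes here are illustrative):

```python
def visible_rows(rows, session_tenant):
    """In-memory model of the tenant_isolation RLS policy: a row is
    visible only when its tenant_id equals the session's
    ruvector.tenant_id setting."""
    return [r for r in rows if r["tenant_id"] == session_tenant]

rows = [
    {"id": 1, "tenant_id": "tenant-a"},
    {"id": 2, "tenant_id": "tenant-b"},
]
```

An RLS-backed version of the same assertions would connect twice with different `ruvector.tenant_id` settings and compare result sets.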

### Performance Tests
- Shared isolation: < 5% overhead vs single-tenant
- Dedicated isolation: equivalent to single-tenant
- Rate limiting adds < 1ms latency

### Scale Tests
- 1000+ tenants on shared infrastructure
- 100+ tenants with dedicated isolation
- Tenant migration under load
---

## Example: SaaS Application

```python
# Application code
class VectorService:
    def __init__(self, db_pool):
        self.pool = db_pool

    def search(self, tenant_id: str, query_vec: list, k: int = 10):
        with self.pool.connection() as conn:
            # Set tenant context (SET cannot take bind parameters,
            # so use set_config instead)
            conn.execute(
                "SELECT set_config('ruvector.tenant_id', %s, false)",
                [tenant_id])

            # Search (automatically scoped to tenant)
            results = conn.execute("""
                SELECT id, content, vec <-> %s AS distance
                FROM embeddings
                ORDER BY vec <-> %s
                LIMIT %s
            """, [query_vec, query_vec, k])

            return results.fetchall()

    def insert(self, tenant_id: str, content: str, vec: list):
        with self.pool.connection() as conn:
            conn.execute(
                "SELECT set_config('ruvector.tenant_id', %s, false)",
                [tenant_id])

            # Insert (tenant_id auto-populated from context)
            conn.execute("""
                INSERT INTO embeddings (content, vec)
                VALUES (%s, %s)
            """, [content, vec])
```