git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
407 lines
13 KiB
Markdown
407 lines
13 KiB
Markdown
# Critical Gaps in RuVector-PGlite Implementation Plan
|
|
|
|
## 🚨 Major Architectural Flaws Discovered
|
|
|
|
After researching actual PGlite extension development, the original implementation plan has **critical flaws** that must be addressed.
|
|
|
|
---
|
|
|
|
## ❌ What Was WRONG in the Original Plan
|
|
|
|
### 1. **pgrx Does NOT Support WASM Compilation**
|
|
|
|
**Original Assumption**: Use pgrx with `wasm32-unknown-unknown` target
|
|
```toml
|
|
# ❌ THIS DOESN'T WORK
|
|
[lib]
|
|
crate-type = ["cdylib"]
|
|
|
|
[target.wasm32-unknown-unknown]
|
|
# pgrx is not designed for WASM target
|
|
```
|
|
|
|
**Reality**:
|
|
- pgrx is designed to build **native** PostgreSQL extensions (.so, .dylib, .dll)
|
|
- pgrx is used to build extensions that **run** WebAssembly (via Extism), not extensions **compiled to** WebAssembly
|
|
- No evidence of pgrx supporting wasm32 as a compilation target
|
|
|
|
**Sources**:
|
|
- [pgrx WebAssembly research](https://dylibso.com/blog/pg-extism/)
|
|
- No wasm32 target in pgrx documentation
|
|
|
|
### 2. **Wrong Build Toolchain**
|
|
|
|
**Original Plan**: `cargo pgrx package --target wasm32`
|
|
|
|
**Reality**: PGlite extensions require:
|
|
```bash
|
|
# ✅ CORRECT: Emscripten toolchain
|
|
emcc -o extension.wasm extension.c \
|
|
-I$POSTGRES_INCLUDE \
|
|
-s MAIN_MODULE=1 \
|
|
-s ASYNCIFY
|
|
```
|
|
|
|
**Required Tools**:
|
|
- ✅ Emscripten SDK (emsdk)
|
|
- ✅ PostgreSQL headers for WASM
|
|
- ✅ Tar packaging for `.tar.gz` bundles
|
|
- ❌ NOT cargo pgrx
|
|
|
|
### 3. **Misunderstood Extension Structure**
|
|
|
|
**Original Plan**: Build a standalone `.wasm` file
|
|
|
|
**Reality**: PGlite extensions are `.tar.gz` tarballs containing:
|
|
```
|
|
vector.tar.gz
|
|
├── extension/
|
|
│ ├── vector.so.wasm # WASM compiled extension
|
|
│ ├── vector.control # Extension metadata
|
|
│ ├── vector--*.sql # SQL install scripts
|
|
│ └── data/ # Any data files
|
|
```
|
|
|
|
**Actual pgvector Implementation**:
|
|
```typescript
|
|
// packages/pglite/src/vector/index.ts
|
|
const setup = async (_pg: PGliteInterface, emscriptenOpts: any) => {
|
|
return {
|
|
emscriptenOpts,
|
|
bundlePath: new URL('../../release/vector.tar.gz', import.meta.url),
|
|
}
|
|
}
|
|
|
|
export const vector = { name: 'pgvector', setup }
|
|
```
|
|
|
|
**Source**: [PGlite vector extension source](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts)
|
|
|
|
### 4. **Missing Build Process Details**
|
|
|
|
**What Was Missing**:
|
|
- How to clone PGlite with submodules
|
|
- How to add extension to `postgres-pglite/pglite/Makefile`
|
|
- How to build within PGlite's build system
|
|
- Emscripten compilation flags (MAIN_MODULE, ASYNCIFY)
|
|
- Tarball packaging steps
|
|
|
|
**Actual Process** ([source](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md)):
|
|
```bash
|
|
# 1. Clone PGlite
|
|
git clone --recurse-submodules git@github.com:electric-sql/pglite.git
|
|
cd pglite && pnpm i
|
|
|
|
# 2. Add extension as submodule
|
|
cd postgres-pglite/pglite
|
|
git submodule add <extension_url>
|
|
|
|
# 3. Register in Makefile
|
|
echo "SUBDIRS += ruvector" >> Makefile
|
|
|
|
# 4. Build (creates .tar.gz)
|
|
pnpm build:all
|
|
# Output: packages/pglite/release/ruvector.tar.gz
|
|
```
|
|
|
|
### 5. **No Rust-Specific Guidance**
|
|
|
|
**Gap**: How to write a Rust extension that compiles with Emscripten?
|
|
|
|
**Missing Details**:
|
|
- Rust → C FFI interface layer
|
|
- `#[no_mangle]` exports for PostgreSQL API
|
|
- Memory management (Emscripten vs Rust allocator)
|
|
- Build script for `emcc` + `rustc`
|
|
|
|
**Possible Approaches**:
|
|
|
|
#### Option A: Pure C Extension
|
|
```c
|
|
// ruvector_pglite.c
|
|
#include "postgres.h"
|
|
#include "fmgr.h"
|
|
|
|
PG_MODULE_MAGIC;
|
|
|
|
PG_FUNCTION_INFO_V1(vector_cosine_distance);
|
|
Datum vector_cosine_distance(PG_FUNCTION_ARGS) {
|
|
// Call Rust library via FFI
|
|
float32 result = rust_cosine_distance(...);
|
|
PG_RETURN_FLOAT4(result);
|
|
}
|
|
```
|
|
|
|
Then compile:
|
|
```bash
|
|
emcc -o ruvector.wasm ruvector_pglite.c libruvector_core.a \
|
|
-I$PG_INCLUDE -s MAIN_MODULE=1
|
|
```
|
|
|
|
#### Option B: Rust with C Wrapper
|
|
```rust
|
|
// ruvector_core/src/ffi.rs
|
|
#[no_mangle]
|
|
pub extern "C" fn rust_cosine_distance(
|
|
a: *const f32,
|
|
b: *const f32,
|
|
len: usize
|
|
) -> f32 {
|
|
// Safe Rust implementation
|
|
}
|
|
```
|
|
|
|
Then build:
|
|
```bash
|
|
# Build Rust to WASM staticlib
|
|
cargo build --target wasm32-unknown-emscripten --release
|
|
|
|
# Link with C wrapper
|
|
emcc -o ruvector.wasm wrapper.c libruvector_core.a \
|
|
-I$PG_INCLUDE -s MAIN_MODULE=1
|
|
```
|
|
|
|
### 6. **Size Targets May Be Unrealistic**
|
|
|
|
**Original Target**: 500KB-1MB WASM
|
|
|
|
**Reality Check**:
|
|
- pgvector (minimal, C-based): ~200KB compiled to WASM
|
|
- Full ruvector features (even stripped): likely 2-5MB
|
|
- Rust std library adds ~100-300KB
|
|
- PostgreSQL runtime overhead: varies
|
|
|
|
**Revised Targets**:
|
|
- Minimal (types + distances): ~500KB-1MB ✅
|
|
- With HNSW index: ~1-2MB
|
|
- With quantization: ~2-3MB
|
|
- Full features: 5-10MB (defeats purpose)
|
|
|
|
### 7. **No TypeScript Plugin API Consideration**
|
|
|
|
**Missing Alternative**: PGlite's custom plugin API
|
|
|
|
Instead of a PostgreSQL extension, could build a **TypeScript plugin** that provides vector operations via PGlite's namespace API:
|
|
|
|
```typescript
|
|
// Hybrid approach: TypeScript + WASM compute kernel
|
|
import { Extension } from '@electric-sql/pglite'
|
|
import init, { cosineDistance } from './ruvector_core.wasm'
|
|
|
|
const setup = async (pg: PGliteInterface) => {
|
|
await init() // Initialize WASM
|
|
|
|
return {
|
|
namespaceObj: {
|
|
vector: {
|
|
cosineDistance: (a: Float32Array, b: Float32Array) =>
|
|
cosineDistance(a, b), // WASM function
|
|
// Other vector operations...
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
export const ruvector = { name: 'ruvector', setup }
|
|
```
|
|
|
|
**Usage**:
|
|
```typescript
|
|
const db = await PGlite.create({ extensions: { ruvector } })
|
|
|
|
// Use via JavaScript API (not SQL)
|
|
const dist = db.ruvector.vector.cosineDistance(vec1, vec2)
|
|
```
|
|
|
|
**Pros**:
|
|
- ✅ No Emscripten/PostgreSQL build complexity
|
|
- ✅ Direct WASM (no PostgreSQL FFI overhead)
|
|
- ✅ Easier to build and maintain
|
|
- ✅ Can still use Rust → wasm-bindgen
|
|
|
|
**Cons**:
|
|
- ❌ Not SQL-compatible (no `SELECT ... ORDER BY embedding <=> $1`)
|
|
- ❌ Can't use PostgreSQL indexes
|
|
- ❌ Not a drop-in pgvector replacement
|
|
|
|
---
|
|
|
|
## ✅ What's ACTUALLY Needed
|
|
|
|
### Corrected Architecture Options
|
|
|
|
#### **Option 1: Full PostgreSQL Extension (Complex but SQL-compatible)**
|
|
|
|
```
|
|
┌─────────────────────────────────────────┐
|
|
│ ruvector-core (Rust library) │
|
|
│ - Vector types, distances, HNSW │
|
|
│ - Compiles to: libruvector_core.a │
|
|
│ - Target: wasm32-unknown-emscripten │
|
|
└─────────────────────────────────────────┘
|
|
▲
|
|
│ C FFI
|
|
│
|
|
┌─────────────┴───────────────────────────┐
|
|
│ ruvector_pglite_wrapper.c │
|
|
│ - PostgreSQL extension entry points │
|
|
│ - PG_FUNCTION_INFO_V1 macros │
|
|
│ - Calls Rust via FFI │
|
|
└─────────────────────────────────────────┘
|
|
│ Emscripten
|
|
▼
|
|
┌─────────────────────────────────────────┐
|
|
│ ruvector.tar.gz │
|
|
│ ├── ruvector.so.wasm │
|
|
│ ├── ruvector.control │
|
|
│ └── ruvector--0.1.0.sql │
|
|
└─────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────┐
|
|
│ @ruvector/pglite (TypeScript) │
|
|
│ - Extension loader │
|
|
│ - Minimal wrapper (like pgvector) │
|
|
└─────────────────────────────────────────┘
|
|
```
|
|
|
|
**Build Process**:
|
|
1. Fork PGlite repo
|
|
2. Add ruvector as submodule in `postgres-pglite/pglite/`
|
|
3. Create Makefile with Emscripten rules
|
|
4. Build Rust core to WASM staticlib
|
|
5. Link with C wrapper
|
|
6. Package to .tar.gz
|
|
7. Create TypeScript loader
|
|
|
|
**Pros**: ✅ Full SQL compatibility, ✅ PostgreSQL indexes
|
|
**Cons**: ❌ Complex build, ❌ Large size, ❌ Tight coupling to PGlite
|
|
|
|
#### **Option 2: Hybrid TypeScript Plugin (Simpler, WASM-native)**
|
|
|
|
```
|
|
┌─────────────────────────────────────────┐
|
|
│ ruvector-core (Rust library) │
|
|
│ - Vector operations only │
|
|
│ - No PostgreSQL dependencies │
|
|
│ - wasm-bindgen for JS interop │
|
|
│ - Target: wasm32-unknown-unknown │
|
|
└─────────────────────────────────────────┘
|
|
│ wasm-pack
|
|
▼
|
|
┌─────────────────────────────────────────┐
|
|
│ ruvector_core_bg.wasm + .js glue │
|
|
└─────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────┐
|
|
│ @ruvector/pglite (TypeScript plugin) │
|
|
│ - PGlite Extension interface │
|
|
│ - Namespace API for vector ops │
|
|
│ - SQL function wrappers (via exec) │
|
|
└─────────────────────────────────────────┘
|
|
```
|
|
|
|
**Build Process**:
|
|
```bash
|
|
# 1. Build Rust to WASM
|
|
cd ruvector-core
|
|
wasm-pack build --target web
|
|
|
|
# 2. Create TypeScript wrapper
|
|
cd ../npm/packages/pglite
|
|
pnpm build
|
|
|
|
# 3. Publish
|
|
pnpm publish
|
|
```
|
|
|
|
**Pros**: ✅ Simple build, ✅ Small size, ✅ Easy maintenance
|
|
**Cons**: ❌ Limited SQL integration, ❌ No native indexes
|
|
|
|
#### **Option 3: Minimal SQL Extension (Best Balance)**
|
|
|
|
Start with **Option 1** but with **minimal features**:
|
|
- ✅ Core vector type
|
|
- ✅ Distance operators (`<->`, `<=>`, `<#>`)
|
|
- ❌ Skip HNSW (use flat scan)
|
|
- ❌ Skip quantization
|
|
- ❌ Skip advanced features
|
|
|
|
**Target Size**: ~200-500KB (comparable to pgvector)
|
|
|
|
---
|
|
|
|
## 📋 Revised Implementation Checklist
|
|
|
|
### Prerequisites
|
|
- [ ] Clone PGlite repo with submodules
|
|
- [ ] Install Emscripten SDK (emsdk)
|
|
- [ ] Study pgvector's PGlite implementation
|
|
- [ ] Understand PGlite's build system
|
|
|
|
### Development
|
|
- [ ] Create `ruvector-core` (no PostgreSQL deps)
|
|
- [ ] Add C FFI layer (`ffi.rs` with `#[no_mangle]`)
|
|
- [ ] Write C wrapper (`ruvector_wrapper.c`)
|
|
- [ ] Create extension control file (`ruvector.control`)
|
|
- [ ] Write SQL install script (`ruvector--0.1.0.sql`)
|
|
|
|
### Building
|
|
- [ ] Add as submodule to PGlite
|
|
- [ ] Configure Emscripten Makefile
|
|
- [ ] Build Rust to WASM staticlib
|
|
- [ ] Link with C wrapper using emcc
|
|
- [ ] Package to .tar.gz
|
|
|
|
### Testing
|
|
- [ ] Write Vitest tests
|
|
- [ ] Test in browser environment
|
|
- [ ] Benchmark against pgvector
|
|
- [ ] Validate SQL compatibility
|
|
|
|
### Publishing
|
|
- [ ] Create TypeScript loader
|
|
- [ ] Add to PGlite extensions catalog
|
|
- [ ] Publish to npm
|
|
- [ ] Write documentation
|
|
|
|
---
|
|
|
|
## 🎯 Recommended Path Forward
|
|
|
|
**Start with Option 2 (TypeScript Plugin)** for these reasons:
|
|
|
|
1. **Immediate Value**: Can ship in 1-2 weeks vs 6+ weeks
|
|
2. **Learning Path**: Understand PGlite before committing to Option 1
|
|
3. **Proof of Concept**: Validate demand for ruvector-pglite
|
|
4. **Simpler**: No Emscripten complexity
|
|
5. **Upgradeable**: Can migrate to Option 1 later if SQL is critical
|
|
|
|
**Then, if SQL compatibility is required**, upgrade to Option 1 (Full Extension).
|
|
|
|
---
|
|
|
|
## 📚 Additional Research Needed
|
|
|
|
1. **Emscripten + Rust**: Best practices for compiling Rust to WASM with emcc
|
|
2. **PGlite Build System**: Deep dive into their Makefile and build scripts
|
|
3. **PostgreSQL C API**: Required functions for minimal extension
|
|
4. **Memory Management**: Emscripten's memory model vs Rust
|
|
5. **Size Optimization**: Dead code elimination, LTO for WASM
|
|
|
|
---
|
|
|
|
## Sources
|
|
|
|
- [PGlite Extension Development Guide](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md)
|
|
- [PGlite pgvector Implementation](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts)
|
|
- [In-Browser Semantic Search with PGlite](https://supabase.com/blog/in-browser-semantic-search-pglite)
|
|
- [PGlite Extensions Catalog](https://pglite.dev/extensions/)
|
|
- [Bringing WebAssembly to PostgreSQL](https://dylibso.com/blog/pg-extism/)
|
|
- [Compiling Postgres to WASM with PGlite](https://development.tldrecap.tech/posts/pgconf-2025/postgresql-pg-lite-webassembly-wasm/)
|
|
|
|
---
|
|
|
|
**Next Step**: Choose an implementation option and update the plan accordingly.
|