Critical Gaps in RuVector-PGlite Implementation Plan
🚨 Major Architectural Flaws Discovered
Research into how PGlite extensions are actually built shows that the original implementation plan has critical flaws that must be addressed.
❌ What Was WRONG in the Original Plan
1. pgrx Does NOT Support WASM Compilation
Original Assumption: Use pgrx with wasm32-unknown-unknown target
# ❌ THIS DOESN'T WORK
[lib]
crate-type = ["cdylib"]
[target.wasm32-unknown-unknown]
# pgrx is not designed for WASM target
Reality:
- pgrx is designed to build native PostgreSQL extensions (.so, .dylib, .dll)
- Where pgrx and WebAssembly appear together, pgrx builds a native extension that runs WebAssembly inside PostgreSQL (via Extism); the extension itself is not compiled to WebAssembly
- No evidence of pgrx supporting wasm32 as a compilation target
Sources:
- pgrx WebAssembly research
- No wasm32 target in pgrx documentation
2. Wrong Build Toolchain
Original Plan: cargo pgrx package --target wasm32
Reality: PGlite extensions require:
# ✅ CORRECT: Emscripten toolchain
emcc -o extension.wasm extension.c \
-I$POSTGRES_INCLUDE \
-s MAIN_MODULE=1 \
-s ASYNCIFY
Required Tools:
- ✅ Emscripten SDK (emsdk)
- ✅ PostgreSQL headers for WASM
- ✅ Tar packaging for .tar.gz bundles
- ❌ NOT cargo pgrx
3. Misunderstood Extension Structure
Original Plan: Build a standalone .wasm file
Reality: PGlite extensions are .tar.gz tarballs containing:
vector.tar.gz
├── extension/
│   ├── vector.so.wasm    # WASM compiled extension
│   ├── vector.control    # Extension metadata
│   ├── vector--*.sql     # SQL install scripts
│   └── data/             # Any data files
Actual pgvector Implementation:
// packages/pglite/src/vector/index.ts
const setup = async (_pg: PGliteInterface, emscriptenOpts: any) => {
  return {
    emscriptenOpts,
    bundlePath: new URL('../../release/vector.tar.gz', import.meta.url),
  }
}
export const vector = { name: 'pgvector', setup }
Source: PGlite vector extension source
4. Missing Build Process Details
What Was Missing:
- How to clone PGlite with submodules
- How to add extension to postgres-pglite/pglite/Makefile
- How to build within PGlite's build system
- Emscripten compilation flags (MAIN_MODULE, ASYNCIFY)
- Tarball packaging steps
Actual Process (source):
# 1. Clone PGlite
git clone --recurse-submodules git@github.com:electric-sql/pglite.git
cd pglite && pnpm i
# 2. Add extension as submodule
cd postgres-pglite/pglite
git submodule add <extension_url>
# 3. Register in Makefile
echo "SUBDIRS += ruvector" >> Makefile
# 4. Build (creates .tar.gz)
pnpm build:all
# Output: packages/pglite/release/ruvector.tar.gz
5. No Rust-Specific Guidance
Gap: How to write a Rust extension that compiles with Emscripten?
Missing Details:
- Rust → C FFI interface layer
- #[no_mangle] exports for PostgreSQL API
- Memory management (Emscripten vs Rust allocator)
- Build script for emcc + rustc
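One way to sidestep the allocator question is to keep every allocation and its matching release on the Rust side, so the C wrapper never hands Rust-owned memory to `free()` or `pfree()`. The following is a minimal sketch under that assumption; the export names `ruvector_buf_alloc` and `ruvector_buf_free` are hypothetical and not part of any existing ruvector API.

```rust
/// Allocate a zero-filled buffer of `len` f32 values and hand ownership to the caller.
/// Hypothetical export; the buffer must be released with `ruvector_buf_free`.
#[no_mangle]
pub extern "C" fn ruvector_buf_alloc(len: usize) -> *mut f32 {
    let buf: Box<[f32]> = vec![0.0f32; len].into_boxed_slice();
    // into_raw leaks the box on purpose: the caller now owns the memory.
    Box::into_raw(buf) as *mut f32
}

/// Release a buffer previously returned by `ruvector_buf_alloc`.
///
/// # Safety
/// `ptr` must come from `ruvector_buf_alloc(len)` with the same `len`
/// and must not be used after this call.
#[no_mangle]
pub unsafe extern "C" fn ruvector_buf_free(ptr: *mut f32, len: usize) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the boxed slice so Rust's allocator frees what it allocated.
    let raw: *mut [f32] = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(raw) });
}
```

On the C side these would be declared as `float *ruvector_buf_alloc(size_t len);` and `void ruvector_buf_free(float *ptr, size_t len);`. Passing the length back on release keeps the sketch free of hidden bookkeeping, at the cost of trusting the caller to supply the same value.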
Possible Approaches:
Option A: Pure C Extension
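// ruvector_pglite.c
#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
// Implemented in the Rust static library (libruvector_core.a)
extern float rust_cosine_distance(const float *a, const float *b, size_t len);
PG_FUNCTION_INFO_V1(vector_cosine_distance);
Datum vector_cosine_distance(PG_FUNCTION_ARGS) {
    // Call Rust library via FFI (argument unpacking omitted)
    float4 result = rust_cosine_distance(...);
    PG_RETURN_FLOAT4(result);
}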
Then compile:
emcc -o ruvector.wasm ruvector_pglite.c libruvector_core.a \
-I$PG_INCLUDE -s MAIN_MODULE=1
Option B: Rust with C Wrapper
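// ruvector_core/src/ffi.rs
#[no_mangle]
pub unsafe extern "C" fn rust_cosine_distance(
    a: *const f32,
    b: *const f32,
    len: usize,
) -> f32 {
    // SAFETY: the caller must pass two valid pointers to `len` f32 values each.
    let a = unsafe { std::slice::from_raw_parts(a, len) };
    let b = unsafe { std::slice::from_raw_parts(b, len) };
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b) // NaN if either vector has zero norm
}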
Then build:
# Build Rust to a WASM staticlib (needs crate-type = ["staticlib"] in Cargo.toml)
cargo build --target wasm32-unknown-emscripten --release
# Link with C wrapper
emcc -o ruvector.wasm wrapper.c \
  target/wasm32-unknown-emscripten/release/libruvector_core.a \
  -I$PG_INCLUDE -s MAIN_MODULE=1
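A Rust panic must not unwind into the C wrapper, so FFI entry points usually report failure through a status code rather than a bare return value. The sketch below illustrates that pattern for an L2 distance; the function name `ruvector_l2_distance_checked` and the status constants are hypothetical, and it assumes the crate is built with the default unwinding panic strategy (with `panic = "abort"` there is nothing to catch).

```rust
use std::panic::catch_unwind;
use std::slice;

// Status codes returned to the C wrapper (hypothetical values).
pub const RUVECTOR_OK: i32 = 0;
pub const RUVECTOR_ERR_NULL: i32 = 1;
pub const RUVECTOR_ERR_PANIC: i32 = 2;

/// Compute the L2 distance and write it to `out`, returning a status code
/// instead of letting a panic cross the FFI boundary.
///
/// # Safety
/// `a` and `b` must each point to `len` readable f32 values,
/// and `out` must point to one writable f32.
#[no_mangle]
pub unsafe extern "C" fn ruvector_l2_distance_checked(
    a: *const f32,
    b: *const f32,
    len: usize,
    out: *mut f32,
) -> i32 {
    if a.is_null() || b.is_null() || out.is_null() {
        return RUVECTOR_ERR_NULL;
    }
    let (a, b) = unsafe { (slice::from_raw_parts(a, len), slice::from_raw_parts(b, len)) };
    // The closure stands in for arbitrary core code that might panic.
    let result = catch_unwind(|| {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    });
    match result {
        Ok(distance) => {
            unsafe { *out = distance };
            RUVECTOR_OK
        }
        Err(_) => RUVECTOR_ERR_PANIC,
    }
}
```

The C wrapper can then map a non-zero status to a PostgreSQL error on its own side of the boundary.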
6. Size Targets May Be Unrealistic
Original Target: 500KB-1MB WASM
Reality Check:
- pgvector (minimal, C-based): ~200KB compiled to WASM
- Full ruvector features (even stripped): likely 2-5MB
- Rust std library adds ~100-300KB
- PostgreSQL runtime overhead: varies
Revised Targets:
- Minimal (types + distances): ~500KB-1MB ✅
- With HNSW index: ~1-2MB
- With quantization: ~2-3MB
- Full features: 5-10MB (defeats purpose)
7. No TypeScript Plugin API Consideration
Missing Alternative: PGlite's custom plugin API
Instead of a PostgreSQL extension, we could build a TypeScript plugin that exposes vector operations through PGlite's namespace API:
// Hybrid approach: TypeScript + WASM compute kernel
import type { Extension, PGliteInterface } from '@electric-sql/pglite'
// wasm-pack's JS glue; its default export loads ruvector_core_bg.wasm
import init, { cosineDistance } from './ruvector_core.js'
const setup = async (pg: PGliteInterface) => {
  await init() // Initialize WASM
  return {
    namespaceObj: {
      vector: {
        cosineDistance: (a: Float32Array, b: Float32Array) =>
          cosineDistance(a, b), // WASM function
        // Other vector operations...
      }
    }
  }
}
export const ruvector: Extension = { name: 'ruvector', setup }
Usage:
const db = await PGlite.create({ extensions: { ruvector } })
// Use via JavaScript API (not SQL)
const dist = db.ruvector.vector.cosineDistance(vec1, vec2)
Pros:
- ✅ No Emscripten/PostgreSQL build complexity
- ✅ Direct WASM (no PostgreSQL FFI overhead)
- ✅ Easier to build and maintain
- ✅ Can still use Rust → wasm-bindgen
Cons:
- ❌ Not SQL-compatible (no SELECT ... ORDER BY embedding <=> $1)
- ❌ Can't use PostgreSQL indexes
- ❌ Not a drop-in pgvector replacement
✅ What's ACTUALLY Needed
Corrected Architecture Options
Option 1: Full PostgreSQL Extension (Complex but SQL-compatible)
┌─────────────────────────────────────────┐
│ ruvector-core (Rust library)            │
│ - Vector types, distances, HNSW         │
│ - Compiles to: libruvector_core.a       │
│ - Target: wasm32-unknown-emscripten     │
└─────────────────────────────────────────┘
              ▲
              │ C FFI
              │
┌─────────────┴───────────────────────────┐
│ ruvector_pglite_wrapper.c               │
│ - PostgreSQL extension entry points     │
│ - PG_FUNCTION_INFO_V1 macros            │
│ - Calls Rust via FFI                    │
└─────────────────────────────────────────┘
              │ Emscripten
              ▼
┌─────────────────────────────────────────┐
│ ruvector.tar.gz                         │
│ ├── ruvector.so.wasm                    │
│ ├── ruvector.control                    │
│ └── ruvector--0.1.0.sql                 │
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│ @ruvector/pglite (TypeScript)           │
│ - Extension loader                      │
│ - Minimal wrapper (like pgvector)       │
└─────────────────────────────────────────┘
Build Process:
- Fork PGlite repo
- Add ruvector as submodule in postgres-pglite/pglite/
- Create Makefile with Emscripten rules
- Build Rust core to WASM staticlib
- Link with C wrapper
- Package to .tar.gz
- Create TypeScript loader
Pros: ✅ Full SQL compatibility, ✅ PostgreSQL indexes
Cons: ❌ Complex build, ❌ Large size, ❌ Tight coupling to PGlite
Option 2: Hybrid TypeScript Plugin (Simpler, WASM-native)
┌─────────────────────────────────────────┐
│ ruvector-core (Rust library)            │
│ - Vector operations only                │
│ - No PostgreSQL dependencies            │
│ - wasm-bindgen for JS interop           │
│ - Target: wasm32-unknown-unknown        │
└─────────────────────────────────────────┘
              │ wasm-pack
              ▼
┌─────────────────────────────────────────┐
│ ruvector_core_bg.wasm + .js glue        │
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│ @ruvector/pglite (TypeScript plugin)    │
│ - PGlite Extension interface            │
│ - Namespace API for vector ops          │
│ - SQL function wrappers (via exec)      │
└─────────────────────────────────────────┘
Build Process:
# 1. Build Rust to WASM
cd ruvector-core
wasm-pack build --target web
# 2. Create TypeScript wrapper
cd ../npm/packages/pglite
pnpm build
# 3. Publish
pnpm publish
Pros: ✅ Simple build, ✅ Small size, ✅ Easy maintenance
Cons: ❌ Limited SQL integration, ❌ No native indexes
Option 3: Minimal SQL Extension (Best Balance)
Start with Option 1 but with minimal features:
- ✅ Core vector type
- ✅ Distance operators (<->, <=>, <#>)
- ❌ Skip HNSW (use flat scan)
- ❌ Skip quantization
- ❌ Skip advanced features
Target Size: ~200-500KB (comparable to pgvector)
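For reference, the three operators carry pgvector's meanings: `<->` is Euclidean (L2) distance, `<=>` is cosine distance, and `<#>` is negative inner product. A flat scan simply scores every row with the chosen distance and keeps the nearest ones. The sketch below shows that core logic in plain Rust with no PostgreSQL dependencies; the names `l2_distance`, `cosine_distance`, `neg_inner_product`, and `flat_scan` are illustrative assumptions, not existing ruvector functions.

```rust
/// `<->`: Euclidean (L2) distance.
pub fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// `<#>`: negative inner product (smaller means more similar).
pub fn neg_inner_product(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// `<=>`: cosine distance, 1 - cos(a, b). NaN if either vector has zero norm.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Flat scan: score every row, then return the indices of the `k` nearest rows.
pub fn flat_scan(
    rows: &[Vec<f32>],
    query: &[f32],
    k: usize,
    dist: fn(&[f32], &[f32]) -> f32,
) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, dist(row.as_slice(), query)))
        .collect();
    // total_cmp is a total order on f32, so the sort is well-defined even for NaN.
    scored.sort_by(|l, r| l.1.total_cmp(&r.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

A flat scan costs one distance computation per row on every query, which is the trade-off accepted here in exchange for leaving out the HNSW index.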
📋 Revised Implementation Checklist
Prerequisites
- Clone PGlite repo with submodules
- Install Emscripten SDK (emsdk)
- Study pgvector's PGlite implementation
- Understand PGlite's build system
Development
- Create ruvector-core (no PostgreSQL deps)
- Add C FFI layer (ffi.rs with #[no_mangle])
- Write C wrapper (ruvector_wrapper.c)
- Create extension control file (ruvector.control)
- Write SQL install script (ruvector--0.1.0.sql)
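As one example of what belongs in a PostgreSQL-free `ruvector-core`, the vector type needs a text input routine that the C wrapper can call. The sketch below parses pgvector-style literals such as `[1, 2.5, -3]`; the function name `parse_vector`, the `ParseError` type, and the choice to reject non-finite values are assumptions made for illustration.

```rust
/// Error returned when a vector literal cannot be parsed (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingBrackets,
    Empty,
    BadElement(String),
}

/// Parse a pgvector-style text literal such as "[1, 2.5, -3]" into f32 values.
pub fn parse_vector(input: &str) -> Result<Vec<f32>, ParseError> {
    // Strip surrounding whitespace and the enclosing square brackets.
    let inner = input
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or(ParseError::MissingBrackets)?;
    if inner.trim().is_empty() {
        return Err(ParseError::Empty);
    }
    inner
        .split(',')
        .map(|tok| {
            let tok = tok.trim();
            tok.parse::<f32>()
                .ok()
                .filter(|v| v.is_finite()) // assumption: NaN and infinity are rejected
                .ok_or_else(|| ParseError::BadElement(tok.to_string()))
        })
        .collect()
}
```

Keeping parsing in the core crate means it can be unit-tested with plain `cargo test`, before any Emscripten or PGlite build is involved.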
Building
- Add as submodule to PGlite
- Configure Emscripten Makefile
- Build Rust to WASM staticlib
- Link with C wrapper using emcc
- Package to .tar.gz
Testing
- Write Vitest tests
- Test in browser environment
- Benchmark against pgvector
- Validate SQL compatibility
Publishing
- Create TypeScript loader
- Add to PGlite extensions catalog
- Publish to npm
- Write documentation
🎯 Recommended Path Forward
Start with Option 2 (TypeScript Plugin) for these reasons:
- Immediate Value: Can ship in 1-2 weeks vs 6+ weeks
- Learning Path: Understand PGlite before committing to Option 1
- Proof of Concept: Validate demand for ruvector-pglite
- Simpler: No Emscripten complexity
- Upgradeable: Can migrate to Option 1 later if SQL is critical
Then, if SQL compatibility is required, upgrade to Option 1 (Full Extension).
📚 Additional Research Needed
- Emscripten + Rust: Best practices for compiling Rust to WASM with emcc
- PGlite Build System: Deep dive into their Makefile and build scripts
- PostgreSQL C API: Required functions for minimal extension
- Memory Management: Emscripten's memory model vs Rust
- Size Optimization: Dead code elimination, LTO for WASM
Sources
- PGlite Extension Development Guide
- PGlite pgvector Implementation
- In-Browser Semantic Search with PGlite
- PGlite Extensions Catalog
- Bringing WebAssembly to PostgreSQL
- Compiling Postgres to WASM with PGlite
Next Step: Choose an implementation option and update the plan accordingly.