# Critical Gaps in RuVector-PGlite Implementation Plan ## 🚨 Major Architectural Flaws Discovered After researching actual PGlite extension development, the original implementation plan has **critical flaws** that must be addressed. --- ## ❌ What Was WRONG in the Original Plan ### 1. **pgrx Does NOT Support WASM Compilation** **Original Assumption**: Use pgrx with `wasm32-unknown-unknown` target ```toml # ❌ THIS DOESN'T WORK [lib] crate-type = ["cdylib"] [target.wasm32-unknown-unknown] # pgrx is not designed for WASM target ``` **Reality**: - pgrx is designed to build **native** PostgreSQL extensions (.so, .dylib, .dll) - pgrx is used to build extensions that **run** WebAssembly (via Extism), not extensions **compiled to** WebAssembly - No evidence of pgrx supporting wasm32 as a compilation target **Sources**: - [pgrx WebAssembly research](https://dylibso.com/blog/pg-extism/) - No wasm32 target in pgrx documentation ### 2. **Wrong Build Toolchain** **Original Plan**: `cargo pgrx package --target wasm32` **Reality**: PGlite extensions require: ```bash # ✅ CORRECT: Emscripten toolchain emcc -o extension.wasm extension.c \ -I$POSTGRES_INCLUDE \ -s MAIN_MODULE=1 \ -s ASYNCIFY ``` **Required Tools**: - ✅ Emscripten SDK (emsdk) - ✅ PostgreSQL headers for WASM - ✅ Tar packaging for `.tar.gz` bundles - ❌ NOT cargo pgrx ### 3. **Misunderstood Extension Structure** **Original Plan**: Build a standalone `.wasm` file **Reality**: PGlite extensions are `.tar.gz` tarballs containing: ``` vector.tar.gz ├── extension/ │ ├── vector.so.wasm # WASM compiled extension │ ├── vector.control # Extension metadata │ ├── vector--*.sql # SQL install scripts │ └── data/ # Any data files ``` **Actual pgvector Implementation**: ```typescript // packages/pglite/src/vector/index.ts const setup = async (_pg: PGliteInterface, emscriptenOpts: any) => { return { emscriptenOpts, bundlePath: new URL('../../release/vector.tar.gz', import.meta.url), } } export const vector = { name: 'pgvector', setup } ``` **Source**: [PGlite vector extension source](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts) ### 4. **Missing Build Process Details** **What Was Missing**: - How to clone PGlite with submodules - How to add extension to `postgres-pglite/pglite/Makefile` - How to build within PGlite's build system - Emscripten compilation flags (MAIN_MODULE, ASYNCIFY) - Tarball packaging steps **Actual Process** ([source](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md)): ```bash # 1. Clone PGlite git clone --recurse-submodules git@github.com:electric-sql/pglite.git cd pglite && pnpm i # 2. Add extension as submodule cd postgres-pglite/pglite git submodule add # 3. Register in Makefile echo "SUBDIRS += ruvector" >> Makefile # 4. Build (creates .tar.gz) pnpm build:all # Output: packages/pglite/release/ruvector.tar.gz ``` ### 5. **No Rust-Specific Guidance** **Gap**: How to write a Rust extension that compiles with Emscripten? **Missing Details**: - Rust → C FFI interface layer - `#[no_mangle]` exports for PostgreSQL API - Memory management (Emscripten vs Rust allocator) - Build script for `emcc` + `rustc` **Possible Approaches**: #### Option A: Pure C Extension ```c // ruvector_pglite.c #include "postgres.h" #include "fmgr.h" PG_MODULE_MAGIC; PG_FUNCTION_INFO_V1(vector_cosine_distance); Datum vector_cosine_distance(PG_FUNCTION_ARGS) { // Call Rust library via FFI float32 result = rust_cosine_distance(...); PG_RETURN_FLOAT4(result); } ``` Then compile: ```bash emcc -o ruvector.wasm ruvector_pglite.c libruvector_core.a \ -I$PG_INCLUDE -s MAIN_MODULE=1 ``` #### Option B: Rust with C Wrapper ```rust // ruvector_core/src/ffi.rs #[no_mangle] pub extern "C" fn rust_cosine_distance( a: *const f32, b: *const f32, len: usize ) -> f32 { // Safe Rust implementation } ``` Then build: ```bash # Build Rust to WASM staticlib cargo build --target wasm32-unknown-emscripten --release # Link with C wrapper emcc -o ruvector.wasm wrapper.c libruvector_core.a \ -I$PG_INCLUDE -s MAIN_MODULE=1 ``` ### 6. **Size Targets May Be Unrealistic** **Original Target**: 500KB-1MB WASM **Reality Check**: - pgvector (minimal, C-based): ~200KB compiled to WASM - Full ruvector features (even stripped): likely 2-5MB - Rust std library adds ~100-300KB - PostgreSQL runtime overhead: varies **Revised Targets**: - Minimal (types + distances): ~500KB-1MB ✅ - With HNSW index: ~1-2MB - With quantization: ~2-3MB - Full features: 5-10MB (defeats purpose) ### 7. **No TypeScript Plugin API Consideration** **Missing Alternative**: PGlite's custom plugin API Instead of a PostgreSQL extension, could build a **TypeScript plugin** that provides vector operations via PGlite's namespace API: ```typescript // Hybrid approach: TypeScript + WASM compute kernel import { Extension } from '@electric-sql/pglite' import init, { cosineDistance } from './ruvector_core.wasm' const setup = async (pg: PGliteInterface) => { await init() // Initialize WASM return { namespaceObj: { vector: { cosineDistance: (a: Float32Array, b: Float32Array) => cosineDistance(a, b), // WASM function // Other vector operations... } } } } export const ruvector = { name: 'ruvector', setup } ``` **Usage**: ```typescript const db = await PGlite.create({ extensions: { ruvector } }) // Use via JavaScript API (not SQL) const dist = db.ruvector.vector.cosineDistance(vec1, vec2) ``` **Pros**: - ✅ No Emscripten/PostgreSQL build complexity - ✅ Direct WASM (no PostgreSQL FFI overhead) - ✅ Easier to build and maintain - ✅ Can still use Rust → wasm-bindgen **Cons**: - ❌ Not SQL-compatible (no `SELECT ... ORDER BY embedding <=> $1`) - ❌ Can't use PostgreSQL indexes - ❌ Not a drop-in pgvector replacement --- ## ✅ What's ACTUALLY Needed ### Corrected Architecture Options #### **Option 1: Full PostgreSQL Extension (Complex but SQL-compatible)** ``` ┌─────────────────────────────────────────┐ │ ruvector-core (Rust library) │ │ - Vector types, distances, HNSW │ │ - Compiles to: libruvector_core.a │ │ - Target: wasm32-unknown-emscripten │ └─────────────────────────────────────────┘ ▲ │ C FFI │ ┌─────────────┴───────────────────────────┐ │ ruvector_pglite_wrapper.c │ │ - PostgreSQL extension entry points │ │ - PG_FUNCTION_INFO_V1 macros │ │ - Calls Rust via FFI │ └─────────────────────────────────────────┘ │ Emscripten ▼ ┌─────────────────────────────────────────┐ │ ruvector.tar.gz │ │ ├── ruvector.so.wasm │ │ ├── ruvector.control │ │ └── ruvector--0.1.0.sql │ └─────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────┐ │ @ruvector/pglite (TypeScript) │ │ - Extension loader │ │ - Minimal wrapper (like pgvector) │ └─────────────────────────────────────────┘ ``` **Build Process**: 1. Fork PGlite repo 2. Add ruvector as submodule in `postgres-pglite/pglite/` 3. Create Makefile with Emscripten rules 4. Build Rust core to WASM staticlib 5. Link with C wrapper 6. Package to .tar.gz 7. Create TypeScript loader **Pros**: ✅ Full SQL compatibility, ✅ PostgreSQL indexes **Cons**: ❌ Complex build, ❌ Large size, ❌ Tight coupling to PGlite #### **Option 2: Hybrid TypeScript Plugin (Simpler, WASM-native)** ``` ┌─────────────────────────────────────────┐ │ ruvector-core (Rust library) │ │ - Vector operations only │ │ - No PostgreSQL dependencies │ │ - wasm-bindgen for JS interop │ │ - Target: wasm32-unknown-unknown │ └─────────────────────────────────────────┘ │ wasm-pack ▼ ┌─────────────────────────────────────────┐ │ ruvector_core_bg.wasm + .js glue │ └─────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────┐ │ @ruvector/pglite (TypeScript plugin) │ │ - PGlite Extension interface │ │ - Namespace API for vector ops │ │ - SQL function wrappers (via exec) │ └─────────────────────────────────────────┘ ``` **Build Process**: ```bash # 1. Build Rust to WASM cd ruvector-core wasm-pack build --target web # 2. Create TypeScript wrapper cd ../npm/packages/pglite pnpm build # 3. Publish pnpm publish ``` **Pros**: ✅ Simple build, ✅ Small size, ✅ Easy maintenance **Cons**: ❌ Limited SQL integration, ❌ No native indexes #### **Option 3: Minimal SQL Extension (Best Balance)** Start with **Option 1** but with **minimal features**: - ✅ Core vector type - ✅ Distance operators (`<->`, `<=>`, `<#>`) - ❌ Skip HNSW (use flat scan) - ❌ Skip quantization - ❌ Skip advanced features **Target Size**: ~200-500KB (comparable to pgvector) --- ## 📋 Revised Implementation Checklist ### Prerequisites - [ ] Clone PGlite repo with submodules - [ ] Install Emscripten SDK (emsdk) - [ ] Study pgvector's PGlite implementation - [ ] Understand PGlite's build system ### Development - [ ] Create `ruvector-core` (no PostgreSQL deps) - [ ] Add C FFI layer (`ffi.rs` with `#[no_mangle]`) - [ ] Write C wrapper (`ruvector_wrapper.c`) - [ ] Create extension control file (`ruvector.control`) - [ ] Write SQL install script (`ruvector--0.1.0.sql`) ### Building - [ ] Add as submodule to PGlite - [ ] Configure Emscripten Makefile - [ ] Build Rust to WASM staticlib - [ ] Link with C wrapper using emcc - [ ] Package to .tar.gz ### Testing - [ ] Write Vitest tests - [ ] Test in browser environment - [ ] Benchmark against pgvector - [ ] Validate SQL compatibility ### Publishing - [ ] Create TypeScript loader - [ ] Add to PGlite extensions catalog - [ ] Publish to npm - [ ] Write documentation --- ## 🎯 Recommended Path Forward **Start with Option 2 (TypeScript Plugin)** for these reasons: 1. **Immediate Value**: Can ship in 1-2 weeks vs 6+ weeks 2. **Learning Path**: Understand PGlite before committing to Option 1 3. **Proof of Concept**: Validate demand for ruvector-pglite 4. **Simpler**: No Emscripten complexity 5. **Upgradeable**: Can migrate to Option 1 later if SQL is critical **Then, if SQL compatibility is required**, upgrade to Option 1 (Full Extension). --- ## 📚 Additional Research Needed 1. **Emscripten + Rust**: Best practices for compiling Rust to WASM with emcc 2. **PGlite Build System**: Deep dive into their Makefile and build scripts 3. **PostgreSQL C API**: Required functions for minimal extension 4. **Memory Management**: Emscripten's memory model vs Rust 5. **Size Optimization**: Dead code elimination, LTO for WASM --- ## Sources - [PGlite Extension Development Guide](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md) - [PGlite pgvector Implementation](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts) - [In-Browser Semantic Search with PGlite](https://supabase.com/blog/in-browser-semantic-search-pglite) - [PGlite Extensions Catalog](https://pglite.dev/extensions/) - [Bringing WebAssembly to PostgreSQL](https://dylibso.com/blog/pg-extism/) - [Compiling Postgres to WASM with PGlite](https://development.tldrecap.tech/posts/pgconf-2025/postgresql-pg-lite-webassembly-wasm/) --- **Next Step**: Choose an implementation option and update the plan accordingly.