Files
wifi-densepose/docs/research/RUVECTOR_PGLITE_CRITICAL_GAPS.md
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
2026-02-28 14:39:40 -05:00

13 KiB

Critical Gaps in RuVector-PGlite Implementation Plan

🚨 Major Architectural Flaws Discovered

After researching actual PGlite extension development, the original implementation plan has critical flaws that must be addressed.


What Was WRONG in the Original Plan

1. pgrx Does NOT Support WASM Compilation

Original Assumption: Use pgrx with wasm32-unknown-unknown target

# ❌ THIS DOESN'T WORK
[lib]
crate-type = ["cdylib"]

[target.wasm32-unknown-unknown]
# pgrx is not designed for WASM target

Reality:

  • pgrx is designed to build native PostgreSQL extensions (.so, .dylib, .dll)
  • pgrx is used to build extensions that run WebAssembly (via Extism), not extensions compiled to WebAssembly
  • No evidence of pgrx supporting wasm32 as a compilation target

Sources:

2. Wrong Build Toolchain

Original Plan: cargo pgrx package --target wasm32

Reality: PGlite extensions require:

# ✅ CORRECT: Emscripten toolchain
emcc -o extension.wasm extension.c \
  -I$POSTGRES_INCLUDE \
  -s MAIN_MODULE=1 \
  -s ASYNCIFY

Required Tools:

  • Emscripten SDK (emsdk)
  • PostgreSQL headers for WASM
  • Tar packaging for .tar.gz bundles
  • NOT cargo pgrx

3. Misunderstood Extension Structure

Original Plan: Build a standalone .wasm file

Reality: PGlite extensions are .tar.gz tarballs containing:

vector.tar.gz
├── extension/
│   ├── vector.so.wasm    # WASM compiled extension
│   ├── vector.control    # Extension metadata
│   ├── vector--*.sql     # SQL install scripts
│   └── data/             # Any data files

Actual pgvector Implementation:

// packages/pglite/src/vector/index.ts
const setup = async (_pg: PGliteInterface, emscriptenOpts: any) => {
  return {
    emscriptenOpts,
    bundlePath: new URL('../../release/vector.tar.gz', import.meta.url),
  }
}

export const vector = { name: 'pgvector', setup }

Source: PGlite vector extension source

4. Missing Build Process Details

What Was Missing:

  • How to clone PGlite with submodules
  • How to add extension to postgres-pglite/pglite/Makefile
  • How to build within PGlite's build system
  • Emscripten compilation flags (MAIN_MODULE, ASYNCIFY)
  • Tarball packaging steps

Actual Process (source):

# 1. Clone PGlite
git clone --recurse-submodules git@github.com:electric-sql/pglite.git
cd pglite && pnpm i

# 2. Add extension as submodule
cd postgres-pglite/pglite
git submodule add <extension_url>

# 3. Register in Makefile
echo "SUBDIRS += ruvector" >> Makefile

# 4. Build (creates .tar.gz)
pnpm build:all
# Output: packages/pglite/release/ruvector.tar.gz

5. No Rust-Specific Guidance

Gap: How to write a Rust extension that compiles with Emscripten?

Missing Details:

  • Rust → C FFI interface layer
  • #[no_mangle] exports for PostgreSQL API
  • Memory management (Emscripten vs Rust allocator)
  • Build script for emcc + rustc

Possible Approaches:

Option A: Pure C Extension

// ruvector_pglite.c
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(vector_cosine_distance);
Datum vector_cosine_distance(PG_FUNCTION_ARGS) {
    // Call Rust library via FFI
    float32 result = rust_cosine_distance(...);
    PG_RETURN_FLOAT4(result);
}

Then compile:

emcc -o ruvector.wasm ruvector_pglite.c libruvector_core.a \
  -I$PG_INCLUDE -s MAIN_MODULE=1

Option B: Rust with C Wrapper

// ruvector_core/src/ffi.rs
#[no_mangle]
pub extern "C" fn rust_cosine_distance(
    a: *const f32,
    b: *const f32,
    len: usize
) -> f32 {
    // Safe Rust implementation
}

Then build:

# Build Rust to WASM staticlib
cargo build --target wasm32-unknown-emscripten --release

# Link with C wrapper
emcc -o ruvector.wasm wrapper.c libruvector_core.a \
  -I$PG_INCLUDE -s MAIN_MODULE=1

6. Size Targets May Be Unrealistic

Original Target: 500KB-1MB WASM

Reality Check:

  • pgvector (minimal, C-based): ~200KB compiled to WASM
  • Full ruvector features (even stripped): likely 2-5MB
  • Rust std library adds ~100-300KB
  • PostgreSQL runtime overhead: varies

Revised Targets:

  • Minimal (types + distances): ~500KB-1MB
  • With HNSW index: ~1-2MB
  • With quantization: ~2-3MB
  • Full features: 5-10MB (defeats purpose)

7. No TypeScript Plugin API Consideration

Missing Alternative: PGlite's custom plugin API

Instead of a PostgreSQL extension, could build a TypeScript plugin that provides vector operations via PGlite's namespace API:

// Hybrid approach: TypeScript + WASM compute kernel
import { Extension } from '@electric-sql/pglite'
import init, { cosineDistance } from './ruvector_core.wasm'

const setup = async (pg: PGliteInterface) => {
  await init() // Initialize WASM

  return {
    namespaceObj: {
      vector: {
        cosineDistance: (a: Float32Array, b: Float32Array) =>
          cosineDistance(a, b), // WASM function
        // Other vector operations...
      }
    }
  }
}

export const ruvector = { name: 'ruvector', setup }

Usage:

const db = await PGlite.create({ extensions: { ruvector } })

// Use via JavaScript API (not SQL)
const dist = db.ruvector.vector.cosineDistance(vec1, vec2)

Pros:

  • No Emscripten/PostgreSQL build complexity
  • Direct WASM (no PostgreSQL FFI overhead)
  • Easier to build and maintain
  • Can still use Rust → wasm-bindgen

Cons:

  • Not SQL-compatible (no SELECT ... ORDER BY embedding <=> $1)
  • Can't use PostgreSQL indexes
  • Not a drop-in pgvector replacement

What's ACTUALLY Needed

Corrected Architecture Options

Option 1: Full PostgreSQL Extension (Complex but SQL-compatible)

┌─────────────────────────────────────────┐
│  ruvector-core (Rust library)          │
│  - Vector types, distances, HNSW       │
│  - Compiles to: libruvector_core.a      │
│  - Target: wasm32-unknown-emscripten    │
└─────────────────────────────────────────┘
              ▲
              │ C FFI
              │
┌─────────────┴───────────────────────────┐
│  ruvector_pglite_wrapper.c              │
│  - PostgreSQL extension entry points    │
│  - PG_FUNCTION_INFO_V1 macros          │
│  - Calls Rust via FFI                  │
└─────────────────────────────────────────┘
              │ Emscripten
              ▼
┌─────────────────────────────────────────┐
│  ruvector.tar.gz                        │
│  ├── ruvector.so.wasm                   │
│  ├── ruvector.control                   │
│  └── ruvector--0.1.0.sql                │
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  @ruvector/pglite (TypeScript)          │
│  - Extension loader                     │
│  - Minimal wrapper (like pgvector)      │
└─────────────────────────────────────────┘

Build Process:

  1. Fork PGlite repo
  2. Add ruvector as submodule in postgres-pglite/pglite/
  3. Create Makefile with Emscripten rules
  4. Build Rust core to WASM staticlib
  5. Link with C wrapper
  6. Package to .tar.gz
  7. Create TypeScript loader

Pros: Full SQL compatibility, PostgreSQL indexes Cons: Complex build, Large size, Tight coupling to PGlite

Option 2: Hybrid TypeScript Plugin (Simpler, WASM-native)

┌─────────────────────────────────────────┐
│  ruvector-core (Rust library)          │
│  - Vector operations only               │
│  - No PostgreSQL dependencies           │
│  - wasm-bindgen for JS interop          │
│  - Target: wasm32-unknown-unknown       │
└─────────────────────────────────────────┘
              │ wasm-pack
              ▼
┌─────────────────────────────────────────┐
│  ruvector_core_bg.wasm + .js glue       │
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  @ruvector/pglite (TypeScript plugin)   │
│  - PGlite Extension interface           │
│  - Namespace API for vector ops         │
│  - SQL function wrappers (via exec)     │
└─────────────────────────────────────────┘

Build Process:

# 1. Build Rust to WASM
cd ruvector-core
wasm-pack build --target web

# 2. Create TypeScript wrapper
cd ../npm/packages/pglite
pnpm build

# 3. Publish
pnpm publish

Pros: Simple build, Small size, Easy maintenance Cons: Limited SQL integration, No native indexes

Option 3: Minimal SQL Extension (Best Balance)

Start with Option 1 but with minimal features:

  • Core vector type
  • Distance operators (<->, <=>, <#>)
  • Skip HNSW (use flat scan)
  • Skip quantization
  • Skip advanced features

Target Size: ~200-500KB (comparable to pgvector)


📋 Revised Implementation Checklist

Prerequisites

  • Clone PGlite repo with submodules
  • Install Emscripten SDK (emsdk)
  • Study pgvector's PGlite implementation
  • Understand PGlite's build system

Development

  • Create ruvector-core (no PostgreSQL deps)
  • Add C FFI layer (ffi.rs with #[no_mangle])
  • Write C wrapper (ruvector_wrapper.c)
  • Create extension control file (ruvector.control)
  • Write SQL install script (ruvector--0.1.0.sql)

Building

  • Add as submodule to PGlite
  • Configure Emscripten Makefile
  • Build Rust to WASM staticlib
  • Link with C wrapper using emcc
  • Package to .tar.gz

Testing

  • Write Vitest tests
  • Test in browser environment
  • Benchmark against pgvector
  • Validate SQL compatibility

Publishing

  • Create TypeScript loader
  • Add to PGlite extensions catalog
  • Publish to npm
  • Write documentation

Start with Option 2 (TypeScript Plugin) for these reasons:

  1. Immediate Value: Can ship in 1-2 weeks vs 6+ weeks
  2. Learning Path: Understand PGlite before committing to Option 1
  3. Proof of Concept: Validate demand for ruvector-pglite
  4. Simpler: No Emscripten complexity
  5. Upgradeable: Can migrate to Option 1 later if SQL is critical

Then, if SQL compatibility is required, upgrade to Option 1 (Full Extension).


📚 Additional Research Needed

  1. Emscripten + Rust: Best practices for compiling Rust to WASM with emcc
  2. PGlite Build System: Deep dive into their Makefile and build scripts
  3. PostgreSQL C API: Required functions for minimal extension
  4. Memory Management: Emscripten's memory model vs Rust
  5. Size Optimization: Dead code elimination, LTO for WASM

Sources


Next Step: Choose an implementation option and update the plan accordingly.