Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
406
docs/research/RUVECTOR_PGLITE_CRITICAL_GAPS.md
Normal file
406
docs/research/RUVECTOR_PGLITE_CRITICAL_GAPS.md
Normal file
@@ -0,0 +1,406 @@
|
||||
# Critical Gaps in RuVector-PGlite Implementation Plan
|
||||
|
||||
## 🚨 Major Architectural Flaws Discovered
|
||||
|
||||
After researching actual PGlite extension development, the original implementation plan has **critical flaws** that must be addressed.
|
||||
|
||||
---
|
||||
|
||||
## ❌ What Was WRONG in the Original Plan
|
||||
|
||||
### 1. **pgrx Does NOT Support WASM Compilation**
|
||||
|
||||
**Original Assumption**: Use pgrx with `wasm32-unknown-unknown` target
|
||||
```toml
|
||||
# ❌ THIS DOESN'T WORK
|
||||
[lib]
|
||||
crate-type = ["cdylib"]
|
||||
|
||||
[target.wasm32-unknown-unknown]
|
||||
# pgrx is not designed for WASM target
|
||||
```
|
||||
|
||||
**Reality**:
|
||||
- pgrx is designed to build **native** PostgreSQL extensions (.so, .dylib, .dll)
|
||||
- pgrx is used to build extensions that **run** WebAssembly (via Extism), not extensions **compiled to** WebAssembly
|
||||
- No evidence of pgrx supporting wasm32 as a compilation target
|
||||
|
||||
**Sources**:
|
||||
- [pgrx WebAssembly research](https://dylibso.com/blog/pg-extism/)
|
||||
- No wasm32 target in pgrx documentation
|
||||
|
||||
### 2. **Wrong Build Toolchain**
|
||||
|
||||
**Original Plan**: `cargo pgrx package --target wasm32`
|
||||
|
||||
**Reality**: PGlite extensions require:
|
||||
```bash
|
||||
# ✅ CORRECT: Emscripten toolchain
|
||||
emcc -o extension.wasm extension.c \
|
||||
-I$POSTGRES_INCLUDE \
|
||||
-s MAIN_MODULE=1 \
|
||||
-s ASYNCIFY
|
||||
```
|
||||
|
||||
**Required Tools**:
|
||||
- ✅ Emscripten SDK (emsdk)
|
||||
- ✅ PostgreSQL headers for WASM
|
||||
- ✅ Tar packaging for `.tar.gz` bundles
|
||||
- ❌ NOT cargo pgrx
|
||||
|
||||
### 3. **Misunderstood Extension Structure**
|
||||
|
||||
**Original Plan**: Build a standalone `.wasm` file
|
||||
|
||||
**Reality**: PGlite extensions are `.tar.gz` tarballs containing:
|
||||
```
|
||||
vector.tar.gz
|
||||
├── extension/
|
||||
│ ├── vector.so.wasm # WASM compiled extension
|
||||
│ ├── vector.control # Extension metadata
|
||||
│ ├── vector--*.sql # SQL install scripts
|
||||
│ └── data/ # Any data files
|
||||
```
|
||||
|
||||
**Actual pgvector Implementation**:
|
||||
```typescript
|
||||
// packages/pglite/src/vector/index.ts
|
||||
const setup = async (_pg: PGliteInterface, emscriptenOpts: any) => {
|
||||
return {
|
||||
emscriptenOpts,
|
||||
bundlePath: new URL('../../release/vector.tar.gz', import.meta.url),
|
||||
}
|
||||
}
|
||||
|
||||
export const vector = { name: 'pgvector', setup }
|
||||
```
|
||||
|
||||
**Source**: [PGlite vector extension source](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts)
|
||||
|
||||
### 4. **Missing Build Process Details**
|
||||
|
||||
**What Was Missing**:
|
||||
- How to clone PGlite with submodules
|
||||
- How to add extension to `postgres-pglite/pglite/Makefile`
|
||||
- How to build within PGlite's build system
|
||||
- Emscripten compilation flags (MAIN_MODULE, ASYNCIFY)
|
||||
- Tarball packaging steps
|
||||
|
||||
**Actual Process** ([source](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md)):
|
||||
```bash
|
||||
# 1. Clone PGlite
|
||||
git clone --recurse-submodules git@github.com:electric-sql/pglite.git
|
||||
cd pglite && pnpm i
|
||||
|
||||
# 2. Add extension as submodule
|
||||
cd postgres-pglite/pglite
|
||||
git submodule add <extension_url>
|
||||
|
||||
# 3. Register in Makefile
|
||||
echo "SUBDIRS += ruvector" >> Makefile
|
||||
|
||||
# 4. Build (creates .tar.gz)
|
||||
pnpm build:all
|
||||
# Output: packages/pglite/release/ruvector.tar.gz
|
||||
```
|
||||
|
||||
### 5. **No Rust-Specific Guidance**
|
||||
|
||||
**Gap**: How to write a Rust extension that compiles with Emscripten?
|
||||
|
||||
**Missing Details**:
|
||||
- Rust → C FFI interface layer
|
||||
- `#[no_mangle]` exports for PostgreSQL API
|
||||
- Memory management (Emscripten vs Rust allocator)
|
||||
- Build script for `emcc` + `rustc`
|
||||
|
||||
**Possible Approaches**:
|
||||
|
||||
#### Option A: Pure C Extension
|
||||
```c
|
||||
// ruvector_pglite.c
|
||||
#include "postgres.h"
|
||||
#include "fmgr.h"
|
||||
|
||||
PG_MODULE_MAGIC;
|
||||
|
||||
PG_FUNCTION_INFO_V1(vector_cosine_distance);
|
||||
Datum vector_cosine_distance(PG_FUNCTION_ARGS) {
|
||||
// Call Rust library via FFI
|
||||
float32 result = rust_cosine_distance(...);
|
||||
PG_RETURN_FLOAT4(result);
|
||||
}
|
||||
```
|
||||
|
||||
Then compile:
|
||||
```bash
|
||||
emcc -o ruvector.wasm ruvector_pglite.c libruvector_core.a \
|
||||
-I$PG_INCLUDE -s MAIN_MODULE=1
|
||||
```
|
||||
|
||||
#### Option B: Rust with C Wrapper
|
||||
```rust
|
||||
// ruvector_core/src/ffi.rs
|
||||
#[no_mangle]
|
||||
pub extern "C" fn rust_cosine_distance(
|
||||
a: *const f32,
|
||||
b: *const f32,
|
||||
len: usize
|
||||
) -> f32 {
|
||||
// Safe Rust implementation
|
||||
}
|
||||
```
|
||||
|
||||
Then build:
|
||||
```bash
|
||||
# Build Rust to WASM staticlib
|
||||
cargo build --target wasm32-unknown-emscripten --release
|
||||
|
||||
# Link with C wrapper
|
||||
emcc -o ruvector.wasm wrapper.c libruvector_core.a \
|
||||
-I$PG_INCLUDE -s MAIN_MODULE=1
|
||||
```
|
||||
|
||||
### 6. **Size Targets May Be Unrealistic**
|
||||
|
||||
**Original Target**: 500KB-1MB WASM
|
||||
|
||||
**Reality Check**:
|
||||
- pgvector (minimal, C-based): ~200KB compiled to WASM
|
||||
- Full ruvector features (even stripped): likely 2-5MB
|
||||
- Rust std library adds ~100-300KB
|
||||
- PostgreSQL runtime overhead: varies
|
||||
|
||||
**Revised Targets**:
|
||||
- Minimal (types + distances): ~500KB-1MB ✅
|
||||
- With HNSW index: ~1-2MB
|
||||
- With quantization: ~2-3MB
|
||||
- Full features: 5-10MB (defeats purpose)
|
||||
|
||||
### 7. **No TypeScript Plugin API Consideration**
|
||||
|
||||
**Missing Alternative**: PGlite's custom plugin API
|
||||
|
||||
Instead of a PostgreSQL extension, could build a **TypeScript plugin** that provides vector operations via PGlite's namespace API:
|
||||
|
||||
```typescript
|
||||
// Hybrid approach: TypeScript + WASM compute kernel
|
||||
import { Extension } from '@electric-sql/pglite'
|
||||
import init, { cosineDistance } from './ruvector_core.wasm'
|
||||
|
||||
const setup = async (pg: PGliteInterface) => {
|
||||
await init() // Initialize WASM
|
||||
|
||||
return {
|
||||
namespaceObj: {
|
||||
vector: {
|
||||
cosineDistance: (a: Float32Array, b: Float32Array) =>
|
||||
cosineDistance(a, b), // WASM function
|
||||
// Other vector operations...
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export const ruvector = { name: 'ruvector', setup }
|
||||
```
|
||||
|
||||
**Usage**:
|
||||
```typescript
|
||||
const db = await PGlite.create({ extensions: { ruvector } })
|
||||
|
||||
// Use via JavaScript API (not SQL)
|
||||
const dist = db.ruvector.vector.cosineDistance(vec1, vec2)
|
||||
```
|
||||
|
||||
**Pros**:
|
||||
- ✅ No Emscripten/PostgreSQL build complexity
|
||||
- ✅ Direct WASM (no PostgreSQL FFI overhead)
|
||||
- ✅ Easier to build and maintain
|
||||
- ✅ Can still use Rust → wasm-bindgen
|
||||
|
||||
**Cons**:
|
||||
- ❌ Not SQL-compatible (no `SELECT ... ORDER BY embedding <=> $1`)
|
||||
- ❌ Can't use PostgreSQL indexes
|
||||
- ❌ Not a drop-in pgvector replacement
|
||||
|
||||
---
|
||||
|
||||
## ✅ What's ACTUALLY Needed
|
||||
|
||||
### Corrected Architecture Options
|
||||
|
||||
#### **Option 1: Full PostgreSQL Extension (Complex but SQL-compatible)**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ ruvector-core (Rust library) │
|
||||
│ - Vector types, distances, HNSW │
|
||||
│ - Compiles to: libruvector_core.a │
|
||||
│ - Target: wasm32-unknown-emscripten │
|
||||
└─────────────────────────────────────────┘
|
||||
▲
|
||||
│ C FFI
|
||||
│
|
||||
┌─────────────┴───────────────────────────┐
|
||||
│ ruvector_pglite_wrapper.c │
|
||||
│ - PostgreSQL extension entry points │
|
||||
│ - PG_FUNCTION_INFO_V1 macros │
|
||||
│ - Calls Rust via FFI │
|
||||
└─────────────────────────────────────────┘
|
||||
│ Emscripten
|
||||
▼
|
||||
┌─────────────────────────────────────────┐
|
||||
│ ruvector.tar.gz │
|
||||
│ ├── ruvector.so.wasm │
|
||||
│ ├── ruvector.control │
|
||||
│ └── ruvector--0.1.0.sql │
|
||||
└─────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────┐
|
||||
│ @ruvector/pglite (TypeScript) │
|
||||
│ - Extension loader │
|
||||
│ - Minimal wrapper (like pgvector) │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Build Process**:
|
||||
1. Fork PGlite repo
|
||||
2. Add ruvector as submodule in `postgres-pglite/pglite/`
|
||||
3. Create Makefile with Emscripten rules
|
||||
4. Build Rust core to WASM staticlib
|
||||
5. Link with C wrapper
|
||||
6. Package to .tar.gz
|
||||
7. Create TypeScript loader
|
||||
|
||||
**Pros**: ✅ Full SQL compatibility, ✅ PostgreSQL indexes
|
||||
**Cons**: ❌ Complex build, ❌ Large size, ❌ Tight coupling to PGlite
|
||||
|
||||
#### **Option 2: Hybrid TypeScript Plugin (Simpler, WASM-native)**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ ruvector-core (Rust library) │
|
||||
│ - Vector operations only │
|
||||
│ - No PostgreSQL dependencies │
|
||||
│ - wasm-bindgen for JS interop │
|
||||
│ - Target: wasm32-unknown-unknown │
|
||||
└─────────────────────────────────────────┘
|
||||
│ wasm-pack
|
||||
▼
|
||||
┌─────────────────────────────────────────┐
|
||||
│ ruvector_core_bg.wasm + .js glue │
|
||||
└─────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────┐
|
||||
│ @ruvector/pglite (TypeScript plugin) │
|
||||
│ - PGlite Extension interface │
|
||||
│ - Namespace API for vector ops │
|
||||
│ - SQL function wrappers (via exec) │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Build Process**:
|
||||
```bash
|
||||
# 1. Build Rust to WASM
|
||||
cd ruvector-core
|
||||
wasm-pack build --target web
|
||||
|
||||
# 2. Create TypeScript wrapper
|
||||
cd ../npm/packages/pglite
|
||||
pnpm build
|
||||
|
||||
# 3. Publish
|
||||
pnpm publish
|
||||
```
|
||||
|
||||
**Pros**: ✅ Simple build, ✅ Small size, ✅ Easy maintenance
|
||||
**Cons**: ❌ Limited SQL integration, ❌ No native indexes
|
||||
|
||||
#### **Option 3: Minimal SQL Extension (Best Balance)**
|
||||
|
||||
Start with **Option 1** but with **minimal features**:
|
||||
- ✅ Core vector type
|
||||
- ✅ Distance operators (`<->`, `<=>`, `<#>`)
|
||||
- ❌ Skip HNSW (use flat scan)
|
||||
- ❌ Skip quantization
|
||||
- ❌ Skip advanced features
|
||||
|
||||
**Target Size**: ~200-500KB (comparable to pgvector)
|
||||
|
||||
---
|
||||
|
||||
## 📋 Revised Implementation Checklist
|
||||
|
||||
### Prerequisites
|
||||
- [ ] Clone PGlite repo with submodules
|
||||
- [ ] Install Emscripten SDK (emsdk)
|
||||
- [ ] Study pgvector's PGlite implementation
|
||||
- [ ] Understand PGlite's build system
|
||||
|
||||
### Development
|
||||
- [ ] Create `ruvector-core` (no PostgreSQL deps)
|
||||
- [ ] Add C FFI layer (`ffi.rs` with `#[no_mangle]`)
|
||||
- [ ] Write C wrapper (`ruvector_wrapper.c`)
|
||||
- [ ] Create extension control file (`ruvector.control`)
|
||||
- [ ] Write SQL install script (`ruvector--0.1.0.sql`)
|
||||
|
||||
### Building
|
||||
- [ ] Add as submodule to PGlite
|
||||
- [ ] Configure Emscripten Makefile
|
||||
- [ ] Build Rust to WASM staticlib
|
||||
- [ ] Link with C wrapper using emcc
|
||||
- [ ] Package to .tar.gz
|
||||
|
||||
### Testing
|
||||
- [ ] Write Vitest tests
|
||||
- [ ] Test in browser environment
|
||||
- [ ] Benchmark against pgvector
|
||||
- [ ] Validate SQL compatibility
|
||||
|
||||
### Publishing
|
||||
- [ ] Create TypeScript loader
|
||||
- [ ] Add to PGlite extensions catalog
|
||||
- [ ] Publish to npm
|
||||
- [ ] Write documentation
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Recommended Path Forward
|
||||
|
||||
**Start with Option 2 (TypeScript Plugin)** for these reasons:
|
||||
|
||||
1. **Immediate Value**: Can ship in 1-2 weeks vs 6+ weeks
|
||||
2. **Learning Path**: Understand PGlite before committing to Option 1
|
||||
3. **Proof of Concept**: Validate demand for ruvector-pglite
|
||||
4. **Simpler**: No Emscripten complexity
|
||||
5. **Upgradeable**: Can migrate to Option 1 later if SQL is critical
|
||||
|
||||
**Then, if SQL compatibility is required**, upgrade to Option 1 (Full Extension).
|
||||
|
||||
---
|
||||
|
||||
## 📚 Additional Research Needed
|
||||
|
||||
1. **Emscripten + Rust**: Best practices for compiling Rust to WASM with emcc
|
||||
2. **PGlite Build System**: Deep dive into their Makefile and build scripts
|
||||
3. **PostgreSQL C API**: Required functions for minimal extension
|
||||
4. **Memory Management**: Emscripten's memory model vs Rust
|
||||
5. **Size Optimization**: Dead code elimination, LTO for WASM
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
- [PGlite Extension Development Guide](https://github.com/electric-sql/pglite/blob/main/docs/extensions/development.md)
|
||||
- [PGlite pgvector Implementation](https://github.com/electric-sql/pglite/blob/main/packages/pglite/src/vector/index.ts)
|
||||
- [In-Browser Semantic Search with PGlite](https://supabase.com/blog/in-browser-semantic-search-pglite)
|
||||
- [PGlite Extensions Catalog](https://pglite.dev/extensions/)
|
||||
- [Bringing WebAssembly to PostgreSQL](https://dylibso.com/blog/pg-extism/)
|
||||
- [Compiling Postgres to WASM with PGlite](https://development.tldrecap.tech/posts/pgconf-2025/postgresql-pg-lite-webassembly-wasm/)
|
||||
|
||||
---
|
||||
|
||||
**Next Step**: Choose an implementation option and update the plan accordingly.
|
||||
Reference in New Issue
Block a user