Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
commit d803bfe2b1
7854 changed files with 3522914 additions and 0 deletions

View File

@@ -0,0 +1,346 @@
# RuVector-Postgres v2.0.0 Security Audit Report
**Date:** 2025-12-26
**Auditor:** Claude Code Security Review
**Scope:** `/crates/ruvector-postgres/src/**/*.rs`
**Branch:** `feat/ruvector-postgres-v2`
**Status:** CRITICAL issues FIXED
---
## Executive Summary
| Severity | Count | Status |
|----------|-------|--------|
| **CRITICAL** | 3 | ✅ **FIXED** |
| **HIGH** | 2 | ⚠️ Documented for future improvement |
| **MEDIUM** | 3 | ⚠️ Documented for future improvement |
| **LOW** | 2 | ✅ Acceptable |
| **INFO** | 3 | ✅ Acceptable patterns noted |
### Security Fixes Applied (2025-12-26)
1. **Created `validation.rs` module** - Input validation for tenant IDs and identifiers
2. **Fixed SQL injection in `isolation.rs`** - All SQL now uses `quote_identifier()` and parameterized queries
3. **Fixed SQL injection in `operations.rs`** - `AuditLogEntry` now properly escapes all values
4. **Added `ValidatedTenantId` type** - Type-safe tenant ID validation
5. **Query routing uses `$1` placeholders** - Parameterized queries prevent injection
---
## CRITICAL Findings
### CVE-PENDING-001: SQL Injection in Tenant Isolation Module ✅ FIXED
**Location:** `src/tenancy/isolation.rs`
**Lines:** 233, 454, 461, 477, 491
**Status:****FIXED on 2025-12-26**
**Original Vulnerable Code:**
```rust
// Line 233 - Direct table name interpolation
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name))
// Line 454 - Direct tenant_id interpolation
filter: format!("tenant_id = '{}'", tenant_id),
```
**Applied Fix:**
```rust
// Now uses validated identifiers with quote_identifier()
validate_identifier(partition_name)?;
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))
// Now uses parameterized queries with $1 placeholder
filter: "tenant_id = $1".to_string(),
tenant_param: Some(tenant_id.to_string()),
```
**Changes Made:**
- Added `validate_tenant_id()` calls before any SQL generation
- All table/schema/partition names now use `quote_identifier()`
- Query routing returns `tenant_id = $1` placeholder instead of direct interpolation
- Added `tenant_param` field to `QueryRoute::SharedWithFilter` for binding
---
### CVE-PENDING-002: SQL Injection in Tenant Audit Logging ✅ FIXED
**Location:** `src/tenancy/operations.rs`
**Lines:** 515-527
**Status:****FIXED on 2025-12-26**
**Original Vulnerable Code:**
```rust
format!("'{}'", u) // Direct user_id interpolation
format!("'{}'", ip) // Direct IP interpolation
```
**Applied Fix:**
```rust
// New parameterized version
pub fn insert_sql_parameterized(&self) -> (String, Vec<Option<String>>) {
let sql = "INSERT INTO ruvector.tenant_audit_log ... VALUES ($1, $2, $3, $4, $5, $6, $7)";
// Params bound safely
}
// Legacy version now escapes properly
let escaped_user_id = escape_string_literal(u);
// IP validated: if validate_ip_address(ip) { Some(...) } else { None }
```
**Changes Made:**
- Added `insert_sql_parameterized()` for new code (preferred)
- Legacy `insert_sql()` now uses `escape_string_literal()` for all values
- Added IP address validation - invalid IPs become NULL
- Tenant ID validated before SQL generation
---
### CVE-PENDING-003: SQL Injection via Drop Partition ✅ FIXED
**Location:** `src/tenancy/isolation.rs:227-234`
**Status:****FIXED on 2025-12-26**
**Original Vulnerable Code:**
```rust
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name)) // UNSAFE
```
**Applied Fix:**
```rust
// Validate inputs
validate_tenant_id(tenant_id)?;
validate_identifier(partition_name)?;
// Verify partition belongs to tenant (authorization check)
let partition_exists = self.partitions.get(tenant_id)
.map(|p| p.iter().any(|p| p.partition_name == partition_name))
.unwrap_or(false);
if !partition_exists {
return Err(IsolationError::PartitionNotFound(partition_name.to_string()));
}
// Use quoted identifier
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))
```
**Changes Made:**
- Added input validation for both tenant_id and partition_name
- Added authorization check - partition must belong to tenant
- Used `quote_identifier()` for safe SQL generation
---
## HIGH Findings
### HIGH-001: Excessive Panic/Unwrap Usage
**Location:** Multiple files (63 files affected)
**Count:** 462 occurrences of `unwrap()`, `expect()`, `panic!`
**Description:**
Unhandled panics in PostgreSQL extensions can crash the database backend process.
**Impact:**
- Denial of Service through crafted inputs
- Database backend crashes
- Service unavailability
**Affected Patterns:**
```rust
.unwrap() // 280+ occurrences
.expect("...") // 150+ occurrences
panic!("...") // 32 occurrences
```
**Remediation:**
1. Replace `unwrap()` with `unwrap_or_default()` or proper error handling
2. Use `pgrx::error!()` for graceful PostgreSQL error reporting
3. Implement `Result<T, E>` return types for public functions
4. Add input validation before operations that can panic
---
### HIGH-002: Unsafe Integer Casts
**Location:** Multiple files
**Count:** 392 occurrences
**Description:**
Unchecked integer casts between types (e.g., `as usize`, `as i32`, `as u64`) can cause overflow/underflow.
**Affected Patterns:**
```rust
value as usize // Can panic on 32-bit systems
len as i32 // Can overflow for large vectors
index as u64 // Can truncate on edge cases
```
**Remediation:**
1. Use `TryFrom`/`try_into()` with error handling
2. Add bounds checking before casts
3. Use `saturating_cast` or `checked_cast` patterns
4. Validate dimension/size limits at API boundary
---
## MEDIUM Findings
### MEDIUM-001: Unsafe Pointer Operations in Index Storage
**Location:** `src/index/ivfflat_storage.rs`, `src/index/hnsw_am.rs`
**Description:**
Index access methods use raw pointer operations for performance, which are inherently unsafe.
**Affected Patterns:**
- `std::ptr::read()`
- `std::ptr::write()`
- `std::slice::from_raw_parts()`
- `std::slice::from_raw_parts_mut()`
**Mitigation Applied:**
- Operations are gated behind `unsafe` blocks
- Required for pgrx PostgreSQL integration
- No user-controlled data reaches pointers directly
**Recommendation:**
1. Add bounds checking assertions before pointer access
2. Document safety invariants for each unsafe block
3. Consider `#[deny(unsafe_op_in_unsafe_fn)]` lint
---
### MEDIUM-002: Unbounded Vector Allocations
**Location:** Multiple modules
**Description:**
Some operations allocate vectors based on user-provided dimensions without upper limits.
**Affected Areas:**
- `Vec::with_capacity(dimension)` in type constructors
- `.collect()` on unbounded iterators
- Graph traversal result sets
**Remediation:**
1. Define `MAX_VECTOR_DIMENSION` constant (e.g., 16384)
2. Validate dimensions at input boundaries
3. Add configurable limits via GUC parameters
---
### MEDIUM-003: Missing Rate Limiting on Tenant Operations
**Location:** `src/tenancy/operations.rs`
**Description:**
Tenant creation and audit logging have no rate limiting, allowing potential abuse.
**Remediation:**
1. Add configurable rate limits per tenant
2. Implement quota checking before operations
3. Add throttling for expensive operations
---
## LOW Findings
### LOW-001: Debug Output in Tests Only
**Location:** `src/distance/simd.rs`
**Count:** 7 `println!` statements
**Status:** ACCEPTABLE - All debug output is in `#[cfg(test)]` modules only.
---
### LOW-002: Error Messages May Reveal Internal Paths
**Location:** Various error handling code
**Description:**
Some error messages include internal details that could aid attackers.
**Example:**
```rust
format!("Failed to spawn worker: {}", e)
format!("Failed to decode operation: {}", e)
```
**Remediation:**
1. Use generic user-facing error messages
2. Log detailed errors internally only
3. Implement error code system for debugging
---
## INFO - Acceptable Patterns
### INFO-001: No Command Execution Found
No `Command::new()`, `exec`, or shell execution patterns found. ✅
### INFO-002: No File System Operations
No `std::fs`, `File::open`, or path manipulation in production code. ✅
### INFO-003: No Hardcoded Credentials
No passwords, API keys, or secrets in source code. ✅
---
## Security Checklist Summary
| Category | Status | Notes |
|----------|--------|-------|
| SQL Injection | ❌ FAIL | 3 critical findings in tenancy module |
| Command Injection | ✅ PASS | No shell execution |
| Path Traversal | ✅ PASS | No file operations |
| Memory Safety | ⚠️ WARN | Acceptable unsafe for pgrx, but review recommended |
| Input Validation | ⚠️ WARN | Missing on tenant/partition names |
| DoS Prevention | ⚠️ WARN | Panic-prone code paths |
| Auth/AuthZ | ✅ PASS | No bypasses found |
| Crypto | ✅ PASS | No cryptographic code present |
| Information Disclosure | ✅ PASS | Debug output test-only |
---
## Remediation Priority
### Immediate (Before Release)
1. **Fix SQL injection in tenancy module** - Use parameterized queries
2. **Validate tenant_id format** - Alphanumeric only, max length 64
### Short Term (Next Sprint)
3. Replace critical `unwrap()` calls with proper error handling
4. Add dimension limits to vector operations
5. Implement input validation helpers
### Medium Term
6. Add rate limiting to tenant operations
7. Audit and document all `unsafe` blocks
8. Convert integer casts to checked variants
---
## Testing Recommendations
1. **Fuzz testing:** Apply cargo-fuzz to SQL-generating functions
2. **Property testing:** Test boundary conditions with proptest
3. **Integration tests:** Add SQL injection test vectors
4. **Negative tests:** Verify malformed inputs are rejected
---
## Appendix: Files Reviewed
- 80+ source files in `/crates/ruvector-postgres/src/`
- 148 `#[pg_extern]` function definitions
- Focus areas: tenancy, index, distance, types, graph
---
*Report generated by Claude Code security analysis*