git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
347 lines
9.7 KiB
Markdown
347 lines
9.7 KiB
Markdown
# RuVector-Postgres v2.0.0 Security Audit Report
|
|
|
|
**Date:** 2025-12-26
|
|
**Auditor:** Claude Code Security Review
|
|
**Scope:** `/crates/ruvector-postgres/src/**/*.rs`
|
|
**Branch:** `feat/ruvector-postgres-v2`
|
|
**Status:** CRITICAL issues FIXED
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
| Severity | Count | Status |
|
|
|----------|-------|--------|
|
|
| **CRITICAL** | 3 | ✅ **FIXED** |
|
|
| **HIGH** | 2 | ⚠️ Documented for future improvement |
|
|
| **MEDIUM** | 3 | ⚠️ Documented for future improvement |
|
|
| **LOW** | 2 | ✅ Acceptable |
|
|
| **INFO** | 3 | ✅ Acceptable patterns noted |
|
|
|
|
### Security Fixes Applied (2025-12-26)
|
|
|
|
1. **Created `validation.rs` module** - Input validation for tenant IDs and identifiers
|
|
2. **Fixed SQL injection in `isolation.rs`** - All SQL now uses `quote_identifier()` and parameterized queries
|
|
3. **Fixed SQL injection in `operations.rs`** - `AuditLogEntry` now properly escapes all values
|
|
4. **Added `ValidatedTenantId` type** - Type-safe tenant ID validation
|
|
5. **Query routing uses `$1` placeholders** - Parameterized queries prevent injection
|
|
|
|
---
|
|
|
|
## CRITICAL Findings
|
|
|
|
### CVE-PENDING-001: SQL Injection in Tenant Isolation Module ✅ FIXED
|
|
|
|
**Location:** `src/tenancy/isolation.rs`
|
|
**Lines:** 233, 454, 461, 477, 491
|
|
**Status:** ✅ **FIXED on 2025-12-26**
|
|
|
|
**Original Vulnerable Code:**
|
|
```rust
|
|
// Line 233 - Direct table name interpolation
|
|
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name))
|
|
|
|
// Line 454 - Direct tenant_id interpolation
|
|
filter: format!("tenant_id = '{}'", tenant_id),
|
|
```
|
|
|
|
**Applied Fix:**
|
|
```rust
|
|
// Now uses validated identifiers with quote_identifier()
|
|
validate_identifier(partition_name)?;
|
|
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))
|
|
|
|
// Now uses parameterized queries with $1 placeholder
|
|
filter: "tenant_id = $1".to_string(),
|
|
tenant_param: Some(tenant_id.to_string()),
|
|
```
|
|
|
|
**Changes Made:**
|
|
- Added `validate_tenant_id()` calls before any SQL generation
|
|
- All table/schema/partition names now use `quote_identifier()`
|
|
- Query routing returns `tenant_id = $1` placeholder instead of direct interpolation
|
|
- Added `tenant_param` field to `QueryRoute::SharedWithFilter` for binding
|
|
|
|
---
|
|
|
|
### CVE-PENDING-002: SQL Injection in Tenant Audit Logging ✅ FIXED
|
|
|
|
**Location:** `src/tenancy/operations.rs`
|
|
**Lines:** 515-527
|
|
**Status:** ✅ **FIXED on 2025-12-26**
|
|
|
|
**Original Vulnerable Code:**
|
|
```rust
|
|
format!("'{}'", u) // Direct user_id interpolation
|
|
format!("'{}'", ip) // Direct IP interpolation
|
|
```
|
|
|
|
**Applied Fix:**
|
|
```rust
|
|
// New parameterized version
|
|
pub fn insert_sql_parameterized(&self) -> (String, Vec<Option<String>>) {
|
|
let sql = "INSERT INTO ruvector.tenant_audit_log ... VALUES ($1, $2, $3, $4, $5, $6, $7)";
|
|
// Params bound safely
|
|
}
|
|
|
|
// Legacy version now escapes properly
|
|
let escaped_user_id = escape_string_literal(u);
|
|
// IP validated: if validate_ip_address(ip) { Some(...) } else { None }
|
|
```
|
|
|
|
**Changes Made:**
|
|
- Added `insert_sql_parameterized()` for new code (preferred)
|
|
- Legacy `insert_sql()` now uses `escape_string_literal()` for all values
|
|
- Added IP address validation - invalid IPs become NULL
|
|
- Tenant ID validated before SQL generation
|
|
|
|
---
|
|
|
|
### CVE-PENDING-003: SQL Injection via Drop Partition ✅ FIXED
|
|
|
|
**Location:** `src/tenancy/isolation.rs:227-234`
|
|
**Status:** ✅ **FIXED on 2025-12-26**
|
|
|
|
**Original Vulnerable Code:**
|
|
```rust
|
|
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name)) // UNSAFE
|
|
```
|
|
|
|
**Applied Fix:**
|
|
```rust
|
|
// Validate inputs
|
|
validate_tenant_id(tenant_id)?;
|
|
validate_identifier(partition_name)?;
|
|
|
|
// Verify partition belongs to tenant (authorization check)
|
|
let partition_exists = self.partitions.get(tenant_id)
|
|
.map(|p| p.iter().any(|p| p.partition_name == partition_name))
|
|
.unwrap_or(false);
|
|
if !partition_exists {
|
|
return Err(IsolationError::PartitionNotFound(partition_name.to_string()));
|
|
}
|
|
|
|
// Use quoted identifier
|
|
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))
|
|
```
|
|
|
|
**Changes Made:**
|
|
- Added input validation for both tenant_id and partition_name
|
|
- Added authorization check - partition must belong to tenant
|
|
- Used `quote_identifier()` for safe SQL generation
|
|
|
|
---
|
|
|
|
## HIGH Findings
|
|
|
|
### HIGH-001: Excessive Panic/Unwrap Usage
|
|
|
|
**Location:** Multiple files (63 files affected)
|
|
**Count:** 462 occurrences of `unwrap()`, `expect()`, `panic!`
|
|
|
|
**Description:**
|
|
Unhandled panics in PostgreSQL extensions can crash the database backend process.
|
|
|
|
**Impact:**
|
|
- Denial of Service through crafted inputs
|
|
- Database backend crashes
|
|
- Service unavailability
|
|
|
|
**Affected Patterns:**
|
|
```rust
|
|
.unwrap() // 280+ occurrences
|
|
.expect("...") // 150+ occurrences
|
|
panic!("...") // 32 occurrences
|
|
```
|
|
|
|
**Remediation:**
|
|
1. Replace `unwrap()` with `unwrap_or_default()` or proper error handling
|
|
2. Use `pgrx::error!()` for graceful PostgreSQL error reporting
|
|
3. Implement `Result<T, E>` return types for public functions
|
|
4. Add input validation before operations that can panic
|
|
|
|
---
|
|
|
|
### HIGH-002: Unsafe Integer Casts
|
|
|
|
**Location:** Multiple files
|
|
**Count:** 392 occurrences
|
|
|
|
**Description:**
|
|
Unchecked integer casts between types (e.g., `as usize`, `as i32`, `as u64`) can cause overflow/underflow.
|
|
|
|
**Affected Patterns:**
|
|
```rust
|
|
value as usize // Can panic on 32-bit systems
|
|
len as i32 // Can overflow for large vectors
|
|
index as u64 // Can truncate on edge cases
|
|
```
|
|
|
|
**Remediation:**
|
|
1. Use `TryFrom`/`try_into()` with error handling
|
|
2. Add bounds checking before casts
|
|
3. Use `saturating_cast` or `checked_cast` patterns
|
|
4. Validate dimension/size limits at API boundary
|
|
|
|
---
|
|
|
|
## MEDIUM Findings
|
|
|
|
### MEDIUM-001: Unsafe Pointer Operations in Index Storage
|
|
|
|
**Location:** `src/index/ivfflat_storage.rs`, `src/index/hnsw_am.rs`
|
|
|
|
**Description:**
|
|
Index access methods use raw pointer operations for performance, which are inherently unsafe.
|
|
|
|
**Affected Patterns:**
|
|
- `std::ptr::read()`
|
|
- `std::ptr::write()`
|
|
- `std::slice::from_raw_parts()`
|
|
- `std::slice::from_raw_parts_mut()`
|
|
|
|
**Mitigation Applied:**
|
|
- Operations are gated behind `unsafe` blocks
|
|
- Required for pgrx PostgreSQL integration
|
|
- No user-controlled data reaches pointers directly
|
|
|
|
**Recommendation:**
|
|
1. Add bounds checking assertions before pointer access
|
|
2. Document safety invariants for each unsafe block
|
|
3. Consider `#[deny(unsafe_op_in_unsafe_fn)]` lint
|
|
|
|
---
|
|
|
|
### MEDIUM-002: Unbounded Vector Allocations
|
|
|
|
**Location:** Multiple modules
|
|
|
|
**Description:**
|
|
Some operations allocate vectors based on user-provided dimensions without upper limits.
|
|
|
|
**Affected Areas:**
|
|
- `Vec::with_capacity(dimension)` in type constructors
|
|
- `.collect()` on unbounded iterators
|
|
- Graph traversal result sets
|
|
|
|
**Remediation:**
|
|
1. Define `MAX_VECTOR_DIMENSION` constant (e.g., 16384)
|
|
2. Validate dimensions at input boundaries
|
|
3. Add configurable limits via GUC parameters
|
|
|
|
---
|
|
|
|
### MEDIUM-003: Missing Rate Limiting on Tenant Operations
|
|
|
|
**Location:** `src/tenancy/operations.rs`
|
|
|
|
**Description:**
|
|
Tenant creation and audit logging have no rate limiting, allowing potential abuse.
|
|
|
|
**Remediation:**
|
|
1. Add configurable rate limits per tenant
|
|
2. Implement quota checking before operations
|
|
3. Add throttling for expensive operations
|
|
|
|
---
|
|
|
|
## LOW Findings
|
|
|
|
### LOW-001: Debug Output in Tests Only
|
|
|
|
**Location:** `src/distance/simd.rs`
|
|
**Count:** 7 `println!` statements
|
|
|
|
**Status:** ACCEPTABLE - All debug output is in `#[cfg(test)]` modules only.
|
|
|
|
---
|
|
|
|
### LOW-002: Error Messages May Reveal Internal Paths
|
|
|
|
**Location:** Various error handling code
|
|
|
|
**Description:**
|
|
Some error messages include internal details that could aid attackers.
|
|
|
|
**Example:**
|
|
```rust
|
|
format!("Failed to spawn worker: {}", e)
|
|
format!("Failed to decode operation: {}", e)
|
|
```
|
|
|
|
**Remediation:**
|
|
1. Use generic user-facing error messages
|
|
2. Log detailed errors internally only
|
|
3. Implement error code system for debugging
|
|
|
|
---
|
|
|
|
## INFO - Acceptable Patterns
|
|
|
|
### INFO-001: No Command Execution Found
|
|
|
|
No `Command::new()`, `exec`, or shell execution patterns found. ✅
|
|
|
|
### INFO-002: No File System Operations
|
|
|
|
No `std::fs`, `File::open`, or path manipulation in production code. ✅
|
|
|
|
### INFO-003: No Hardcoded Credentials
|
|
|
|
No passwords, API keys, or secrets in source code. ✅
|
|
|
|
---
|
|
|
|
## Security Checklist Summary
|
|
|
|
| Category | Status | Notes |
|
|
|----------|--------|-------|
|
|
| SQL Injection | ❌ FAIL | 3 critical findings in tenancy module |
|
|
| Command Injection | ✅ PASS | No shell execution |
|
|
| Path Traversal | ✅ PASS | No file operations |
|
|
| Memory Safety | ⚠️ WARN | Acceptable unsafe for pgrx, but review recommended |
|
|
| Input Validation | ⚠️ WARN | Missing on tenant/partition names |
|
|
| DoS Prevention | ⚠️ WARN | Panic-prone code paths |
|
|
| Auth/AuthZ | ✅ PASS | No bypasses found |
|
|
| Crypto | ✅ PASS | No cryptographic code present |
|
|
| Information Disclosure | ✅ PASS | Debug output test-only |
|
|
|
|
---
|
|
|
|
## Remediation Priority
|
|
|
|
### Immediate (Before Release)
|
|
1. **Fix SQL injection in tenancy module** - Use parameterized queries
|
|
2. **Validate tenant_id format** - Alphanumeric only, max length 64
|
|
|
|
### Short Term (Next Sprint)
|
|
3. Replace critical `unwrap()` calls with proper error handling
|
|
4. Add dimension limits to vector operations
|
|
5. Implement input validation helpers
|
|
|
|
### Medium Term
|
|
6. Add rate limiting to tenant operations
|
|
7. Audit and document all `unsafe` blocks
|
|
8. Convert integer casts to checked variants
|
|
|
|
---
|
|
|
|
## Testing Recommendations
|
|
|
|
1. **Fuzz testing:** Apply cargo-fuzz to SQL-generating functions
|
|
2. **Property testing:** Test boundary conditions with proptest
|
|
3. **Integration tests:** Add SQL injection test vectors
|
|
4. **Negative tests:** Verify malformed inputs are rejected
|
|
|
|
---
|
|
|
|
## Appendix: Files Reviewed
|
|
|
|
- 80+ source files in `/crates/ruvector-postgres/src/`
|
|
- 148 `#[pg_extern]` function definitions
|
|
- Focus areas: tenancy, index, distance, types, graph
|
|
|
|
---
|
|
|
|
*Report generated by Claude Code security analysis*
|