git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
441 lines
14 KiB
Markdown
441 lines
14 KiB
Markdown
# RuVector Security Audit Report
|
|
|
|
**Date:** 2026-01-18
|
|
**Auditor:** Security Review Agent
|
|
**Scope:** Comprehensive security audit of the RuVector vector database
|
|
**Version:** 0.1.32
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
This security audit examines the RuVector codebase for potential vulnerabilities in memory safety, input validation, SIMD operations, WASM security, and dependencies. The audit identified several areas of concern and provides recommendations for security hardening.
|
|
|
|
### Risk Summary
|
|
|
|
| Category | Critical | High | Medium | Low | Info |
|
|
|----------|----------|------|--------|-----|------|
|
|
| Memory Safety | 0 | 2 | 3 | 4 | 2 |
|
|
| Input Validation | 0 | 1 | 2 | 3 | 1 |
|
|
| SIMD Operations | 0 | 1 | 2 | 2 | 3 |
|
|
| WASM Security | 0 | 2 | 3 | 2 | 2 |
|
|
| Dependencies | 0 | 0 | 1 | 2 | 2 |
|
|
|
|
---
|
|
|
|
## 1. Unsafe Code Review
|
|
|
|
### 1.1 SIMD Intrinsics (`crates/ruvector-core/src/simd_intrinsics.rs`)
|
|
|
|
**Status:** Generally Well-Protected
|
|
|
|
**Positive Findings:**
|
|
- All unsafe SIMD functions include length assertions before pointer operations
|
|
- Safety comments present (e.g., "SECURITY: Ensure both arrays have the same length")
|
|
- Proper use of `#[target_feature(enable = "...")]` attributes
|
|
- Fallback scalar implementations available for all operations
|
|
|
|
**Code Example (Good Practice):**
|
|
```rust
|
|
#[cfg(target_arch = "x86_64")]
|
|
#[target_feature(enable = "avx2")]
|
|
unsafe fn euclidean_distance_avx2_impl(a: &[f32], b: &[f32]) -> f32 {
|
|
// SECURITY: Ensure both arrays have the same length to prevent out-of-bounds access
|
|
assert_eq!(a.len(), b.len(), "Input arrays must have the same length");
|
|
// ...
|
|
}
|
|
```
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| SIMD-001 | Medium | Missing `#[deny(unsafe_op_in_unsafe_fn)]` lint | Module level |
|
|
| SIMD-002 | Low | No bounds checking in remainder loops | Lines 88-91, 127-130 |
|
|
| SIMD-003 | Info | Uses `std::mem::transmute` for horizontal sum | Lines 84, 284, 364-366 |
|
|
|
|
**Recommendations:**
|
|
1. Add `#![deny(unsafe_op_in_unsafe_fn)]` at the module level
|
|
2. Add explicit bounds checks in remainder loops with `get()` or `assert!`
|
|
3. Document the transmute usage with safety invariant comments
|
|
|
|
### 1.2 Arena Allocator (`crates/ruvector-core/src/arena.rs`)
|
|
|
|
**Status:** Well-Protected with Security Checks
|
|
|
|
**Positive Findings:**
|
|
- Size overflow checks using `checked_add()`
|
|
- Alignment validation (power of 2 check)
|
|
- Maximum allocation size validation
|
|
- Bounds checking before pointer arithmetic
|
|
- Null pointer checks in `ArenaVec`
|
|
|
|
**Code Example (Good Practice):**
|
|
```rust
|
|
// SECURITY: Validate alignment is a power of 2 and size is reasonable
|
|
assert!(align > 0 && align.is_power_of_two(), "Alignment must be a power of 2");
|
|
assert!(size > 0, "Cannot allocate zero bytes");
|
|
assert!(size <= isize::MAX as usize, "Allocation size too large");
|
|
```
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| ARENA-001 | Medium | RefCell not thread-safe, marked issues with thread_arena | Lines 219-232 |
|
|
| ARENA-002 | Low | No maximum chunk count limit | `alloc_raw()` |
|
|
|
|
### 1.3 Cache-Optimized Storage (`crates/ruvector-core/src/cache_optimized.rs`)
|
|
|
|
**Status:** Good with Security Constants
|
|
|
|
**Positive Findings:**
|
|
- `MAX_DIMENSIONS` limit (65536) prevents DoS
|
|
- `MAX_CAPACITY` limit (~16M vectors) prevents memory exhaustion
|
|
- Checked arithmetic for all size calculations
|
|
- Explicit overflow panic messages
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| CACHE-001 | High | `unsafe impl Send/Sync` without verification | Lines 229-230 |
|
|
| CACHE-002 | Medium | No bounds check in `dimension_slice` before unsafe | Lines 112-115 |
|
|
|
|
**Recommendation for CACHE-001:**
|
|
```rust
|
|
// The raw pointer is exclusively owned and only accessed through
|
|
// properly synchronized methods. The storage is only modified through
|
|
// &mut self methods, ensuring exclusive access.
|
|
// SAFETY: The data pointer is valid for the lifetime of this struct,
|
|
// all writes are synchronized through &mut self, and reads are
|
|
// protected by the count field which is only incremented atomically.
|
|
unsafe impl Send for SoAVectorStorage {}
|
|
unsafe impl Sync for SoAVectorStorage {}
|
|
```
|
|
|
|
### 1.4 Micro-HNSW WASM (`crates/micro-hnsw-wasm/src/lib.rs`)
|
|
|
|
**Status:** High Risk - Extensive Unsafe Code
|
|
|
|
This is a `#![no_std]` WASM module with 50+ unsafe blocks using static mutable state.
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| WASM-001 | High | Static mutable state without synchronization | Lines 90-139 |
|
|
| WASM-002 | High | No bounds validation on external inputs | `insert()`, `search()` |
|
|
| WASM-003 | Medium | Raw pointer returned to caller without lifetime | `get_*_ptr()` functions |
|
|
| WASM-004 | Medium | No epoch timeout or resource limits | Global state |
|
|
| WASM-005 | Low | Panic handler is infinite loop | Line 1262 |
|
|
|
|
**Critical Code Pattern:**
|
|
```rust
|
|
// UNSAFE: Static mutable state accessed without synchronization
|
|
static mut HNSW: MicroHnsw = MicroHnsw { ... };
|
|
static mut QUERY: [f32; MAX_DIMS] = [0.0; MAX_DIMS];
|
|
static mut INSERT: [f32; MAX_DIMS] = [0.0; MAX_DIMS];
|
|
```
|
|
|
|
**Recommendations:**
|
|
1. Add input validation for all external entry points
|
|
2. Consider using atomic operations or mutex for state
|
|
3. Add memory limits and timeout mechanisms
|
|
4. Add `#[deny(unsafe_op_in_unsafe_fn)]`
|
|
|
|
---
|
|
|
|
## 2. Memory Safety Analysis
|
|
|
|
### 2.1 Buffer Overflow Analysis
|
|
|
|
**SIMD Operations:**
|
|
- All SIMD functions process data in chunks (4 or 8 elements)
|
|
- Remainder handling uses safe indexing
|
|
- No buffer overflows detected in current implementation
|
|
|
|
**Vector Operations:**
|
|
- `dimension_slice()` uses assertion for bounds check
|
|
- `push()` operations check capacity before writing
|
|
|
|
### 2.2 Integer Overflow Analysis
|
|
|
|
**Positive Findings:**
|
|
- Product Quantization validates `codebook_size > 256`
|
|
- SoAVectorStorage uses `checked_mul()` for size calculations
|
|
- Arena allocator uses `checked_add()` for offset calculations
|
|
|
|
**Potential Issue:**
|
|
```rust
|
|
// In ProductQuantized::train()
|
|
let subspace_dim = dimensions / num_subspaces;
|
|
// If num_subspaces > dimensions, this could be 0, leading to issues
|
|
```
|
|
|
|
**Recommendation:** Add validation for `num_subspaces <= dimensions`
|
|
|
|
### 2.3 Use-After-Free Analysis
|
|
|
|
**No vulnerabilities detected.** The codebase uses:
|
|
- Rust's ownership system
|
|
- Proper Drop implementations
|
|
- Arena-based allocation with explicit lifetimes
|
|
|
|
---
|
|
|
|
## 3. Input Validation
|
|
|
|
### 3.1 Vector Dimension Validation
|
|
|
|
**Positive Findings:**
|
|
- `MAX_VECTOR_DIMENSIONS = 65536` in WASM bindings
|
|
- Dimension mismatch returns proper errors
|
|
|
|
**WASM Module (`ruvector-wasm/src/lib.rs`):**
|
|
```rust
|
|
// Security: Validate vector dimensions before allocation
|
|
let vec_len = vector.length() as usize;
|
|
if vec_len == 0 {
|
|
return Err(JsValue::from_str("Vector cannot be empty"));
|
|
}
|
|
if vec_len > MAX_VECTOR_DIMENSIONS {
|
|
return Err(JsValue::from_str(&format!(
|
|
"Vector dimensions {} exceed maximum allowed {}",
|
|
vec_len, MAX_VECTOR_DIMENSIONS
|
|
)));
|
|
}
|
|
```
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| INPUT-001 | High | micro-hnsw-wasm has no dimension validation | `insert()` |
|
|
| INPUT-002 | Medium | No validation of `k` parameter in search | Multiple locations |
|
|
| INPUT-003 | Low | Empty vector handling varies by module | Multiple |
|
|
|
|
### 3.2 Quantization Parameters
|
|
|
|
**Positive Findings:**
|
|
- `codebook_size > 256` validation exists
|
|
- Empty vector validation in `ProductQuantized::train()`
|
|
|
|
**Identified Issues:**
|
|
|
|
| ID | Severity | Issue | Location |
|
|
|----|----------|-------|----------|
|
|
| QUANT-001 | Medium | No validation for `iterations` parameter | `kmeans_clustering()` |
|
|
| QUANT-002 | Low | Scale calculation could be 0 (handled) | `ScalarQuantized::quantize()` |
|
|
|
|
---
|
|
|
|
## 4. WASM Security Analysis
|
|
|
|
### 4.1 Main WASM Module (`ruvector-wasm`)
|
|
|
|
**Status:** Good Security Posture
|
|
|
|
**Positive Findings:**
|
|
- Uses `console_error_panic_hook` for debugging
|
|
- Input validation for vector dimensions
|
|
- Proper error handling with `WasmError` type
|
|
- IndexedDB operations properly async
|
|
|
|
### 4.2 Micro-HNSW WASM Module
|
|
|
|
**Status:** High Risk
|
|
|
|
**Security Concerns:**
|
|
1. **No Signature Validation:** Module exposes raw function pointers without verification
|
|
2. **No Epoch Timeouts:** Long-running operations cannot be interrupted
|
|
3. **Shared Memory:** Static mutable state is vulnerable to data races
|
|
4. **No Resource Limits:** Memory allocation is unbounded within MAX_VECTORS
|
|
|
|
**Recommendations:**
|
|
1. Implement resource quotas
|
|
2. Add timeout mechanisms for search operations
|
|
3. Consider WebAssembly Component Model for better isolation
|
|
4. Add input sanitization for all exported functions
|
|
|
|
### 4.3 Other WASM Modules
|
|
|
|
| Module | Risk Level | Notes |
|
|
|--------|------------|-------|
|
|
| ruvector-attention-wasm | Low | Standard WASM bindings |
|
|
| ruvector-mincut-wasm | Medium | Contains SIMD operations |
|
|
| ruvector-learning-wasm | Low | Standard bindings |
|
|
| ruvector-nervous-system-wasm | Low | Standard bindings |
|
|
|
|
---
|
|
|
|
## 5. Dependency Audit
|
|
|
|
### 5.1 Audit Status
|
|
|
|
**Note:** `cargo-audit` is not installed. Recommend installing and running:
|
|
```bash
|
|
cargo install cargo-audit
|
|
cargo audit
|
|
```
|
|
|
|
### 5.2 Key Dependencies Analysis
|
|
|
|
| Dependency | Version | Risk | Notes |
|
|
|------------|---------|------|-------|
|
|
| simsimd | 5.9 | Low | Native SIMD library, well-maintained |
|
|
| redb | 2.1 | Low | Embedded database, active development |
|
|
| parking_lot | 0.12 | Low | Well-audited mutex implementation |
|
|
| wasm-bindgen | 0.2 | Low | Official WASM tooling |
|
|
| hnsw_rs | 0.3 (patched) | Medium | Uses local patch for rand compatibility |
|
|
|
|
### 5.3 Potential Concerns
|
|
|
|
1. **hnsw_rs Patch:** The project patches `hnsw_rs` for WASM compatibility. This bypasses upstream security fixes.
|
|
2. **getrandom:** Multiple versions (0.2 vs 0.3) could cause inconsistencies
|
|
|
|
**Recommendation:** Regularly sync patch with upstream and monitor for security advisories.
|
|
|
|
---
|
|
|
|
## 6. Security Hardening Recommendations
|
|
|
|
### 6.1 Immediate Actions (Critical/High)
|
|
|
|
1. **Add `#[deny(unsafe_op_in_unsafe_fn)]` to all unsafe modules:**
|
|
```rust
|
|
#![deny(unsafe_op_in_unsafe_fn)]
|
|
```
|
|
|
|
2. **Add safety documentation to all unsafe impl blocks:**
|
|
```rust
|
|
// SAFETY: [Explain why this is safe]
|
|
unsafe impl Send for SoAVectorStorage {}
|
|
```
|
|
|
|
3. **Add input validation to micro-hnsw-wasm:**
|
|
```rust
|
|
#[no_mangle]
|
|
pub extern "C" fn insert() -> u8 {
|
|
unsafe {
|
|
// SECURITY: Validate inputs
|
|
if HNSW.dims == 0 || HNSW.dims > MAX_DIMS as u8 {
|
|
return 255;
|
|
}
|
|
// ... existing code
|
|
}
|
|
}
|
|
```
|
|
|
|
### 6.2 Short-Term Actions (Medium)
|
|
|
|
1. **Add resource limits to WASM modules:**
|
|
- Maximum operation time
|
|
- Memory usage tracking
|
|
- Vector count limits
|
|
|
|
2. **Implement constant-time comparison for sensitive operations:**
|
|
```rust
|
|
/// Constant-time comparison to prevent timing attacks
|
|
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
|
|
if a.len() != b.len() {
|
|
return false;
|
|
}
|
|
let mut result = 0u8;
|
|
for (x, y) in a.iter().zip(b.iter()) {
|
|
result |= x ^ y;
|
|
}
|
|
result == 0
|
|
}
|
|
```
|
|
|
|
3. **Add fuzzing targets:**
|
|
```rust
|
|
// In tests/fuzz_targets/
|
|
#[cfg(fuzzing)]
|
|
pub fn fuzz_euclidean_distance(data: &[u8]) {
|
|
if data.len() < 16 { return; }
|
|
let (a, b) = data.split_at(data.len() / 2);
|
|
let a_f32: Vec<f32> = a.chunks(4)
|
|
.filter_map(|c| c.try_into().ok())
|
|
.map(f32::from_le_bytes)
|
|
.collect();
|
|
// ... test with arbitrary inputs
|
|
}
|
|
```
|
|
|
|
### 6.3 Long-Term Actions (Low/Informational)
|
|
|
|
1. **Implement WASM Component Model** for better isolation
|
|
2. **Add security policy document** (SECURITY.md)
|
|
3. **Set up automated security scanning** in CI/CD
|
|
4. **Consider memory-safe alternatives** for critical paths
|
|
|
|
---
|
|
|
|
## 7. Verification Checklist
|
|
|
|
### Pre-Deployment Security Checklist
|
|
|
|
- [ ] Run `cargo audit` with no critical vulnerabilities
|
|
- [ ] All unsafe blocks have safety comments
|
|
- [ ] Input validation on all public APIs
|
|
- [ ] Resource limits configured for WASM
|
|
- [ ] No hardcoded secrets or credentials
|
|
- [ ] Panic handling properly configured
|
|
- [ ] Integer overflow checks in place
|
|
- [ ] Memory allocation limits enforced
|
|
|
|
### Continuous Security Measures
|
|
|
|
- [ ] Automated dependency updates (Dependabot)
|
|
- [ ] Regular security audits (quarterly)
|
|
- [ ] Fuzzing infrastructure in place
|
|
- [ ] Security incident response plan
|
|
|
|
---
|
|
|
|
## 8. Conclusion
|
|
|
|
The RuVector codebase demonstrates good security practices in most areas, particularly:
|
|
- Comprehensive input validation in main WASM bindings
|
|
- Proper use of checked arithmetic
|
|
- Well-documented unsafe code blocks
|
|
|
|
However, the following areas require attention:
|
|
1. **micro-hnsw-wasm** module has significant unsafe code without adequate safety guarantees
|
|
2. **cache_optimized.rs** has `unsafe impl Send/Sync` without documented safety invariants
|
|
3. Missing `#[deny(unsafe_op_in_unsafe_fn)]` lint across the codebase
|
|
|
|
**Overall Security Rating:** **Moderate Risk**
|
|
|
|
The core vector database functionality is well-protected, but the specialized WASM modules for embedded/edge deployment require hardening before production use.
|
|
|
|
---
|
|
|
|
## Appendix A: Files Reviewed
|
|
|
|
| File | Lines | Unsafe Blocks | Status |
|
|
|------|-------|---------------|--------|
|
|
| `ruvector-core/src/simd_intrinsics.rs` | 539 | 8 | Reviewed |
|
|
| `ruvector-core/src/arena.rs` | 282 | 6 | Reviewed |
|
|
| `ruvector-core/src/cache_optimized.rs` | 288 | 8 | Reviewed |
|
|
| `ruvector-core/src/distance.rs` | 168 | 0 | Reviewed |
|
|
| `ruvector-core/src/quantization.rs` | 432 | 0 | Reviewed |
|
|
| `micro-hnsw-wasm/src/lib.rs` | 1263 | 50+ | Reviewed |
|
|
| `ruvector-wasm/src/lib.rs` | 875 | 0 | Reviewed |
|
|
| `ruvector-mincut/src/wasm/simd.rs` | 169 | 4 | Reviewed |
|
|
| `ruvector-sparse-inference/src/backend/cpu.rs` | 481 | 12 | Reviewed |
|
|
|
|
## Appendix B: Security Tools Recommended
|
|
|
|
1. **cargo-audit** - Vulnerability scanning
|
|
2. **cargo-deny** - Dependency policy enforcement
|
|
3. **miri** - Undefined behavior detection
|
|
4. **cargo-fuzz** - Fuzzing framework
|
|
5. **clippy** - Linting with security rules
|
|
|
|
---
|
|
|
|
*This report was generated as part of a comprehensive security review. For questions or clarifications, please contact the security team.*
|