Contributing to Ruvector
Thank you for your interest in contributing to Ruvector! This document provides guidelines and instructions for contributing.
Table of Contents
- Code of Conduct
- Getting Started
- Development Setup
- Code Style
- Testing
- Pull Request Process
- Commit Guidelines
- Documentation
- Performance
- Community
Code of Conduct
Our Pledge
We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
Our Standards
Positive behavior includes:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Unacceptable behavior includes:
- Trolling, insulting/derogatory comments, and personal attacks
- Public or private harassment
- Publishing others' private information without permission
- Other conduct which could reasonably be considered inappropriate
Getting Started
Prerequisites
- Rust 1.77+: Install from rustup.rs
- Node.js 16+: For Node.js bindings testing
- Git: For version control
- cargo-nextest (optional but recommended):
cargo install cargo-nextest
Fork and Clone
- Fork the repository on GitHub
- Clone your fork:
git clone https://github.com/YOUR_USERNAME/ruvector.git
cd ruvector
- Add upstream remote:
git remote add upstream https://github.com/ruvnet/ruvector.git
Development Setup
Build the Project
# Build all crates
cargo build
# Build with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Build specific crate
cargo build -p ruvector-core
Run Tests
# Run all tests
cargo test
# Run tests with nextest (parallel, faster)
cargo nextest run
# Run specific test
cargo test test_hnsw_search
# Run with logging
RUST_LOG=debug cargo test
# Run benchmarks
cargo bench
Check Code
# Format code
cargo fmt
# Check formatting without changes
cargo fmt -- --check
# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings
# Check all crates
cargo check --all-features
Code Style
Rust Style Guide
We follow the Rust Style Guide with these additions:
Naming Conventions
// Structs: PascalCase
struct VectorDatabase { }
// Functions: snake_case
fn insert_vector() { }
// Constants: SCREAMING_SNAKE_CASE
const MAX_DIMENSIONS: usize = 65536;
// Type parameters: Single uppercase letter or PascalCase
fn generic<T>() { }
fn generic<TMetric: DistanceMetric>() { }
Documentation
All public items must have doc comments:
/// A high-performance vector database.
///
/// # Examples
///
/// ```
/// use ruvector_core::{DbOptions, VectorDB};
///
/// let db = VectorDB::new(DbOptions::default())?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub struct VectorDB { }
/// Insert a vector into the database.
///
/// # Arguments
///
/// * `entry` - The vector entry to insert
///
/// # Returns
///
/// The ID of the inserted vector
///
/// # Errors
///
/// Returns `RuvectorError` if insertion fails
pub fn insert(&self, entry: VectorEntry) -> Result<VectorId> {
// ...
}
Error Handling
- Use Result<T, RuvectorError> for fallible operations
- Use thiserror for error types
- Provide context with error messages
use thiserror::Error;
#[derive(Error, Debug)]
pub enum RuvectorError {
#[error("Vector dimension mismatch: expected {expected}, got {got}")]
DimensionMismatch { expected: usize, got: usize },
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
}
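To make the last bullet concrete, the sketch below shows one way errors can carry context while being propagated with `?`. It uses only std so it runs without dependencies: the `Display`, `Error`, and `From` impls are hand-written stand-ins for what the `thiserror` derive above generates, and `check_dimensions` and `read_header` are hypothetical helpers, not part of the Ruvector API.

```rust
use std::fmt;
use std::io;

// Hand-written stand-in for the `thiserror`-derived enum shown above.
#[derive(Debug)]
pub enum RuvectorError {
    DimensionMismatch { expected: usize, got: usize },
    Io(io::Error),
}

impl fmt::Display for RuvectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RuvectorError::DimensionMismatch { expected, got } => {
                write!(f, "Vector dimension mismatch: expected {expected}, got {got}")
            }
            RuvectorError::Io(e) => write!(f, "IO error: {e}"),
        }
    }
}

impl std::error::Error for RuvectorError {
    // Expose the wrapped I/O error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            RuvectorError::Io(e) => Some(e),
            RuvectorError::DimensionMismatch { .. } => None,
        }
    }
}

impl From<io::Error> for RuvectorError {
    fn from(e: io::Error) -> Self {
        RuvectorError::Io(e)
    }
}

/// Hypothetical helper: reject vectors whose length differs from the index dimension.
pub fn check_dimensions(expected: usize, vector: &[f32]) -> Result<(), RuvectorError> {
    if vector.len() != expected {
        return Err(RuvectorError::DimensionMismatch { expected, got: vector.len() });
    }
    Ok(())
}

/// Hypothetical helper: `?` converts `io::Error` into `RuvectorError::Io` via `From`.
pub fn read_header(mut reader: impl io::Read) -> Result<[u8; 4], RuvectorError> {
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    Ok(header)
}
```

The structured `DimensionMismatch` variant keeps both values available to callers, so a test can match on the fields instead of parsing the message.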
Performance
- Use #[inline] for hot path functions
- Profile before optimizing
- Document performance characteristics
/// Distance calculation (hot path, inlined)
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
// SIMD-optimized implementation
}
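The body above is a placeholder. As a point of comparison, a plain scalar version might look like the sketch below. This is an assumption for illustration, not the project's SIMD implementation, but a reference like this is useful for checking an optimized version against known results.

```rust
/// Scalar reference implementation (illustrative; the real hot path is SIMD-optimized).
///
/// Callers are expected to pass slices of equal length; this is only
/// checked in debug builds to keep the release hot path branch-free.
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum::<f32>()
        .sqrt()
}
```

For example, the distance between `[0.0, 0.0]` and `[3.0, 4.0]` is `5.0`.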
TypeScript/JavaScript Style
For Node.js bindings:
// Use TypeScript for type safety
interface VectorEntry {
id?: string;
vector: Float32Array;
metadata?: Record<string, any>;
}
// Async/await for async operations
async function search(query: Float32Array): Promise<SearchResult[]> {
return await db.search({ vector: query, k: 10 });
}
// Use const/let, never var
const db = new VectorDB(options);
let results = await db.search({ vector: query, k: 10 });
Testing
Test Structure
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic_insert() {
// Arrange
let db = VectorDB::new(DbOptions::default()).unwrap();
let entry = VectorEntry {
id: None,
vector: vec![0.1; 128],
metadata: None,
};
// Act
let id = db.insert(entry).unwrap();
// Assert
assert!(!id.is_empty());
}
#[test]
fn test_error_handling() {
let db = VectorDB::new(DbOptions::default()).unwrap();
let wrong_dims = vec![0.1; 64]; // Wrong dimensions
let result = db.insert(VectorEntry {
id: None,
vector: wrong_dims,
metadata: None,
});
assert!(result.is_err());
}
}
Property-Based Testing
Use proptest for property-based tests:
use proptest::prelude::*;
proptest! {
#[test]
fn test_distance_symmetry(
// Bounded inputs: extreme magnitudes overflow to infinity, and inf - inf is NaN
a in prop::collection::vec(-1000.0f32..1000.0, 128),
b in prop::collection::vec(-1000.0f32..1000.0, 128)
) {
let d1 = euclidean_distance(&a, &b);
let d2 = euclidean_distance(&b, &a);
prop_assert!((d1 - d2).abs() < 1e-5);
}
}
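When choosing input ranges for float properties, keep in mind that very large magnitudes overflow: squared differences become infinity, `inf - inf` is NaN, and a tolerance check such as `(d1 - d2).abs() < 1e-5` then fails even though the distance is symmetric. The std-only sketch below reproduces this. Its `euclidean_distance` is a scalar stand-in (an assumption, not the project's implementation) and `is_symmetric` is a hypothetical helper.

```rust
/// Scalar stand-in for the project's distance function (assumption).
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// The symmetry check from the property test, written as a plain function.
fn is_symmetric(a: &[f32], b: &[f32]) -> bool {
    let d1 = euclidean_distance(a, b);
    let d2 = euclidean_distance(b, a);
    // Any comparison involving NaN is false, so overflow makes this return false.
    (d1 - d2).abs() < 1e-5
}
```

With moderate inputs the check passes, while `is_symmetric(&[f32::MAX], &[-f32::MAX])` returns `false` because both distances overflow to infinity.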
Benchmarking
Use criterion for benchmarks:
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn benchmark_search(c: &mut Criterion) {
let db = setup_db();
// Build the query once so the clone is not part of the measured loop
let query = SearchQuery {
vector: vec![0.1; 128],
k: 10,
filter: None,
include_vectors: false,
};
c.bench_function("search 1M vectors", |b| {
b.iter(|| db.search(black_box(&query)))
});
}
criterion_group!(benches, benchmark_search);
criterion_main!(benches);
Test Coverage
Aim for:
- Unit tests: 80%+ coverage
- Integration tests: All major features
- Property tests: Core algorithms
- Benchmarks: Performance-critical paths
Pull Request Process
Before Submitting
- Create an issue first for major changes
- Fork and branch: Create a feature branch
git checkout -b feature/my-new-feature
- Write tests: Ensure new code has tests
- Run checks:
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo bench
- Update documentation: Update relevant docs
- Add changelog entry: Update CHANGELOG.md
PR Template
## Description
Brief description of changes
## Motivation
Why is this change needed?
## Changes
- Change 1
- Change 2
## Testing
How was this tested?
## Performance Impact
Any performance implications?
## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Changelog updated
- [ ] Code formatted (`cargo fmt`)
- [ ] Lints passing (`cargo clippy`)
- [ ] All tests passing (`cargo test`)
Review Process
- Automated checks: CI must pass
- Code review: At least one maintainer approval
- Discussion: Address reviewer feedback
- Merge: Squash and merge or rebase
Commit Guidelines
Commit Message Format
<type>(<scope>): <subject>
<body>
<footer>
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting)
- refactor: Code refactoring
- perf: Performance improvements
- test: Test additions/changes
- chore: Build process or auxiliary tool changes
Examples:
feat(hnsw): add parallel index construction

Implement parallel HNSW construction using rayon for faster
index building on multi-core systems.

- Split graph construction across threads
- Use atomic operations for thread-safe updates
- Achieve 4x speedup on 8-core system

Closes #123

fix(quantization): correct product quantization distance calculation

The distance calculation was not using precomputed lookup tables,
causing incorrect results.

Fixes #456
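If you want to check subject lines locally before pushing, the format can be validated mechanically. The sketch below is one possible approach, not a tool that ships with the repository: `parse_subject` and `Subject` are hypothetical names, and treating the scope as optional is an assumption the guide does not state.

```rust
/// Commit types listed in this guide.
const COMMIT_TYPES: [&str; 8] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
];

/// Parsed form of a `<type>(<scope>): <subject>` line.
#[derive(Debug, PartialEq)]
pub struct Subject<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub summary: &'a str,
}

/// Hypothetical validator: returns `None` when the line does not match the format.
pub fn parse_subject(line: &str) -> Option<Subject<'_>> {
    // Everything before the first ": " is the type plus optional scope.
    let (head, summary) = line.split_once(": ")?;
    if summary.trim().is_empty() {
        return None;
    }
    // Split an optional "(scope)" suffix off the type (optional scope is an assumption).
    let (kind, scope) = match head.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')')?;
            if scope.is_empty() {
                return None;
            }
            (kind, Some(scope))
        }
        None => (head, None),
    };
    if !COMMIT_TYPES.contains(&kind) {
        return None;
    }
    Some(Subject { kind, scope, summary })
}
```

Passing the first example subject above yields type `feat`, scope `hnsw`, and the summary text. A line with an unlisted type or a missing colon returns `None`.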
Commit Hygiene
- One logical change per commit
- Write clear, descriptive messages
- Reference issues/PRs when applicable
- Keep commits focused and atomic
Documentation
Code Documentation
- Public APIs: Comprehensive rustdoc comments
- Examples: Include usage examples in doc comments
- Safety: Document unsafe code thoroughly
- Panics: Document panic conditions
User Documentation
Update relevant docs:
- README.md: Overview and quick start
- guides/: User guides and tutorials
- api/: API reference documentation
- CHANGELOG.md: User-facing changes
Documentation Style
/// A vector database with HNSW indexing.
///
/// `VectorDB` provides fast approximate nearest neighbor search using
/// Hierarchical Navigable Small World (HNSW) graphs. It supports:
///
/// - Sub-millisecond query latency
/// - 95%+ recall with proper tuning
/// - Memory-mapped storage for large datasets
/// - Multiple distance metrics (Euclidean, Cosine, etc.)
///
/// # Examples
///
/// ```
/// use ruvector_core::{VectorDB, VectorEntry, DbOptions};
///
/// let mut options = DbOptions::default();
/// options.dimensions = 128;
///
/// let db = VectorDB::new(options)?;
///
/// let entry = VectorEntry {
/// id: None,
/// vector: vec![0.1; 128],
/// metadata: None,
/// };
///
/// let id = db.insert(entry)?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
///
/// - Search: O(log n) with HNSW
/// - Insert: O(log n) amortized
/// - Memory: ~640 bytes per vector (M=32)
pub struct VectorDB { }
Performance
Performance Guidelines
- Profile first: Use cargo flamegraph or perf
- Measure impact: Benchmark before/after
- Document trade-offs: Explain performance vs. other concerns
- Use SIMD: Leverage SIMD intrinsics for hot paths
- Avoid allocations: Reuse buffers in hot loops
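As an illustration of the last guideline, the sketch below scores a batch of candidates into a caller-owned buffer instead of returning a fresh `Vec` on every call. `squared_distances_into` is a hypothetical helper written for this example, not an existing Ruvector function.

```rust
/// Hypothetical scorer that writes squared distances into a reusable buffer.
pub fn squared_distances_into(query: &[f32], candidates: &[Vec<f32>], out: &mut Vec<f32>) {
    // `clear` drops the old results but keeps the allocated capacity.
    out.clear();
    out.extend(candidates.iter().map(|c| {
        query.iter().zip(c).map(|(x, y)| (x - y) * (x - y)).sum::<f32>()
    }));
}
```

A caller allocates `out` once, for example with `Vec::with_capacity`, and passes it to every call in the loop. As long as the batch fits in the existing capacity, no further allocation happens.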
Benchmarking Changes
# Benchmark baseline
git checkout main
cargo bench -- --save-baseline main
# Benchmark your changes
git checkout feature-branch
cargo bench -- --baseline main
Performance Checklist
- Profiled hot paths
- Benchmarked changes
- No performance regressions
- Documented performance characteristics
- Considered memory usage
Community
Getting Help
- GitHub Issues: Bug reports and feature requests
- Discussions: Questions and general discussion
- Pull Requests: Code contributions
Reporting Bugs
Use the bug report template:
**Describe the bug**
Clear description of the bug
**To Reproduce**
1. Step 1
2. Step 2
3. See error
**Expected behavior**
What you expected to happen
**Environment**
- OS: [e.g., Ubuntu 22.04]
- Rust version: [e.g., 1.77.0]
- Ruvector version: [e.g., 0.1.0]
**Additional context**
Any other relevant information
Feature Requests
Use the feature request template:
**Is your feature request related to a problem?**
Clear description of the problem
**Describe the solution you'd like**
What you want to happen
**Describe alternatives you've considered**
Other solutions you've thought about
**Additional context**
Any other relevant information
License
By contributing to Ruvector, you agree that your contributions will be licensed under the MIT License.
Questions?
Feel free to open an issue or discussion if you have questions about contributing!
Thank you for contributing to Ruvector! 🚀