
ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

12 KiB


Contributing to Ruvector

Thank you for your interest in contributing to Ruvector! This document provides guidelines and instructions for contributing.

Table of Contents

Code of Conduct
Getting Started
Development Setup
Code Style
Testing
Pull Request Process
Commit Guidelines
Documentation
Performance
Community
License

Code of Conduct

Our Pledge

We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Positive behavior includes:

Using welcoming and inclusive language
Being respectful of differing viewpoints
Gracefully accepting constructive criticism
Focusing on what is best for the community
Showing empathy towards other community members

Unacceptable behavior includes:

Trolling, insulting/derogatory comments, and personal attacks
Public or private harassment
Publishing others' private information without permission
Other conduct which could reasonably be considered inappropriate

Getting Started

Prerequisites

Rust 1.77+: Install from rustup.rs
Node.js 16+: Needed for testing the Node.js bindings
Git: For version control
cargo-nextest (optional but recommended): cargo install cargo-nextest

Fork and Clone

Fork the repository on GitHub

Clone your fork:

git clone https://github.com/YOUR_USERNAME/ruvector.git
cd ruvector

Add upstream remote:

git remote add upstream https://github.com/ruvnet/ruvector.git

Development Setup

Build the Project

# Build all crates
cargo build

# Optimized release build tuned for the local CPU
# (the resulting binary may not run on other CPUs)
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Build specific crate
cargo build -p ruvector-core

Run Tests

# Run all tests
cargo test

# Run tests with nextest (each test in its own process, often faster)
cargo nextest run

# Run specific test
cargo test test_hnsw_search

# Run with debug logging (--nocapture also shows output from passing tests)
RUST_LOG=debug cargo test -- --nocapture

# Run benchmarks
cargo bench

Check Code

# Format code
cargo fmt

# Check formatting without changes
cargo fmt -- --check

# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings

# Check all crates
cargo check --all-features

Code Style

Rust Style Guide

We follow the Rust Style Guide with these additions:

Naming Conventions

// Structs: PascalCase
struct VectorDatabase { }

// Functions: snake_case
fn insert_vector() { }

// Constants: SCREAMING_SNAKE_CASE
const MAX_DIMENSIONS: usize = 65536;

// Traits: PascalCase
trait DistanceMetric { }

// Type parameters: single uppercase letter or PascalCase
fn generic<T>() { }
fn generic_with_metric<TMetric: DistanceMetric>() { }

Documentation

All public items must have doc comments:

/// A high-performance vector database.
///
/// # Examples
///
/// ```
/// use ruvector_core::{DbOptions, VectorDB};
///
/// let db = VectorDB::new(DbOptions::default())?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub struct VectorDB { }

impl VectorDB {
    /// Insert a vector into the database.
    ///
    /// # Arguments
    ///
    /// * `entry` - The vector entry to insert
    ///
    /// # Returns
    ///
    /// The ID of the inserted vector
    ///
    /// # Errors
    ///
    /// Returns `RuvectorError` if insertion fails, for example
    /// `RuvectorError::DimensionMismatch` when the vector length does not
    /// match the database's dimensions
    pub fn insert(&self, entry: VectorEntry) -> Result<VectorId> {
        // ...
    }
}

Error Handling

Use Result<T, RuvectorError> for fallible operations
Use thiserror for error types
Provide context with error messages

use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuvectorError {
    #[error("Vector dimension mismatch: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
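The `Result<VectorId>` return type in the documentation example above suggests a crate-level `Result` alias. The sketch below assumes such an alias and writes out by hand roughly what the `thiserror` derive generates (`Display` from `#[error]`, `From` and `source()` from `#[from]`), so it compiles with only the standard library. `check_dimensions` is a hypothetical helper, not an existing Ruvector function.

```rust
use std::fmt;

// Hand-written approximation of the thiserror-derived enum above (sketch).
#[derive(Debug)]
pub enum RuvectorError {
    DimensionMismatch { expected: usize, got: usize },
    Io(std::io::Error),
}

// What `#[error("...")]` provides: a Display impl with the given messages.
impl fmt::Display for RuvectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RuvectorError::DimensionMismatch { expected, got } => {
                write!(f, "Vector dimension mismatch: expected {expected}, got {got}")
            }
            RuvectorError::Io(err) => write!(f, "IO error: {err}"),
        }
    }
}

// `#[from]` also marks the wrapped error as the source.
impl std::error::Error for RuvectorError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            RuvectorError::Io(err) => Some(err),
            RuvectorError::DimensionMismatch { .. } => None,
        }
    }
}

// `#[from]` generates this conversion, which lets `?` turn an io::Error
// into a RuvectorError.
impl From<std::io::Error> for RuvectorError {
    fn from(err: std::io::Error) -> Self {
        RuvectorError::Io(err)
    }
}

/// Assumed crate-wide alias so signatures can read `Result<VectorId>`.
pub type Result<T> = std::result::Result<T, RuvectorError>;

/// Hypothetical helper: reject vectors whose length differs from the index dimension.
pub fn check_dimensions(expected: usize, vector: &[f32]) -> Result<()> {
    if vector.len() != expected {
        return Err(RuvectorError::DimensionMismatch { expected, got: vector.len() });
    }
    Ok(())
}
```

The error message carries both sizes, which is the kind of context the guideline above asks for.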

Performance

Use #[inline] on small hot-path functions, particularly ones called from other crates
Profile before optimizing
Document performance characteristics

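/// Euclidean distance (hot path, inlined)
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "vectors must have equal dimensions");
    // Scalar version for illustration; the real hot path is SIMD-optimized
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}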

TypeScript/JavaScript Style

For Node.js bindings:

// Use TypeScript for type safety
interface VectorEntry {
    id?: string;
    vector: Float32Array;
    metadata?: Record<string, any>;
}

// Async/await for async operations
async function search(query: Float32Array): Promise<SearchResult[]> {
    return await db.search({ vector: query, k: 10 });
}

// Use const/let, never var (prefer const unless the binding is reassigned)
const db = new VectorDB(options);
const results = await db.search({ vector: query, k: 10 });

Testing

Test Structure

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_insert() {
        // Arrange
        let db = VectorDB::new(DbOptions::default()).unwrap();
        let entry = VectorEntry {
            id: None,
            vector: vec![0.1; 128],
            metadata: None,
        };

        // Act
        let id = db.insert(entry).unwrap();

        // Assert
        assert!(!id.is_empty());
    }

    #[test]
    fn test_error_handling() {
        let db = VectorDB::new(DbOptions::default()).unwrap();
        let wrong_dims = vec![0.1; 64]; // Wrong dimensions

        let result = db.insert(VectorEntry {
            id: None,
            vector: wrong_dims,
            metadata: None,
        });

        assert!(result.is_err());
    }
}
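Tests can also return a `Result`, which lets them use `?` instead of `unwrap()`; returning `Err` fails the test. The sketch below uses a hypothetical `normalize` helper (not a Ruvector API) purely to show the pattern next to a plain assertion-style test.

```rust
/// Hypothetical helper (not part of Ruvector): scale `v` to unit length.
pub fn normalize(v: &[f32]) -> Result<Vec<f32>, String> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return Err("cannot normalize a zero vector".to_string());
    }
    Ok(v.iter().map(|x| x / norm).collect())
}

#[cfg(test)]
mod tests {
    use super::*;

    // Returning Result lets the test use `?`; an Err fails the test.
    #[test]
    fn test_normalize_unit_length() -> Result<(), String> {
        let unit = normalize(&[3.0, 4.0])?;
        let len = unit.iter().map(|x| x * x).sum::<f32>().sqrt();
        assert!((len - 1.0).abs() < 1e-6);
        Ok(())
    }

    // Error paths are still easiest to check with a plain assertion.
    #[test]
    fn test_normalize_zero_vector() {
        assert!(normalize(&[0.0, 0.0]).is_err());
    }
}
```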

Property-Based Testing

Use proptest for property-based tests:

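use proptest::prelude::*;

proptest! {
    #[test]
    fn test_distance_symmetry(
        // Bounded ranges: any::<f32>() also yields NaN and infinities,
        // for which the tolerance check below can never pass
        a in prop::collection::vec(-1.0e3f32..1.0e3, 128),
        b in prop::collection::vec(-1.0e3f32..1.0e3, 128)
    ) {
        let d1 = euclidean_distance(&a, &b);
        let d2 = euclidean_distance(&b, &a);
        // prop_assert! reports the failure to proptest instead of panicking
        prop_assert!((d1 - d2).abs() < 1e-5);
    }
}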

Benchmarking

Use criterion for benchmarks:

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn benchmark_search(c: &mut Criterion) {
    // setup_db (not shown) builds a database populated for the benchmark
    let db = setup_db();
    // Build the query once so the timed loop measures search, not allocation
    let query = SearchQuery {
        vector: vec![0.1; 128],
        k: 10,
        filter: None,
        include_vectors: false,
    };

    c.bench_function("search 1M vectors", |b| {
        b.iter(|| db.search(black_box(&query)))
    });
}

criterion_group!(benches, benchmark_search);
criterion_main!(benches);

Test Coverage

Aim for:

Unit tests: 80%+ coverage
Integration tests: All major features
Property tests: Core algorithms
Benchmarks: Performance-critical paths

Pull Request Process

Before Submitting

Create an issue first for major changes
Fork and branch: Create a feature branch
```
git checkout -b feature/my-new-feature
```
Write tests: Ensure new code has tests

Run checks:

cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo bench

Update documentation: Update relevant docs
Add changelog entry: Update CHANGELOG.md

PR Template

## Description

Brief description of changes

## Motivation

Why is this change needed?

## Changes

- Change 1
- Change 2

## Testing

How was this tested?

## Performance Impact

Any performance implications?

## Checklist

- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Changelog updated
- [ ] Code formatted (`cargo fmt`)
- [ ] Lints passing (`cargo clippy`)
- [ ] All tests passing (`cargo test`)

Review Process

Automated checks: CI must pass
Code review: At least one maintainer approval
Discussion: Address reviewer feedback
Merge: Squash and merge or rebase

Commit Guidelines

Commit Message Format

<type>(<scope>): <subject>

<body>

<footer>

Types:

feat: New feature
fix: Bug fix
docs: Documentation changes
style: Code style changes (formatting)
refactor: Code refactoring
perf: Performance improvements
test: Test additions/changes
chore: Build process or auxiliary tool changes

Examples:

feat(hnsw): add parallel index construction

Implement parallel HNSW construction using rayon for faster
index building on multi-core systems.

- Split graph construction across threads
- Use atomic operations for thread-safe updates
- Achieve 4x speedup on 8-core system

Closes #123

fix(quantization): correct product quantization distance calculation

The distance calculation was not using precomputed lookup tables,
causing incorrect results.

Fixes #456
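The header line can be checked mechanically before pushing. The following is a hypothetical helper, not tooling shipped with Ruvector; it accepts only the types listed above and, matching the format shown, requires a non-empty scope and subject.

```rust
/// Commit types accepted by this guide.
const ALLOWED_TYPES: [&str; 8] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
];

/// Hypothetical helper: returns true if `header` matches `<type>(<scope>): <subject>`.
pub fn is_valid_header(header: &str) -> bool {
    // "<type>(<scope>)" is everything before the first ": ".
    let Some((prefix, subject)) = header.split_once(": ") else {
        return false;
    };
    // Split "<type>" from "<scope>)".
    let Some((commit_type, rest)) = prefix.split_once('(') else {
        return false;
    };
    // The scope must be closed by ')' at the end of the prefix.
    let Some(scope) = rest.strip_suffix(')') else {
        return false;
    };
    ALLOWED_TYPES.contains(&commit_type)
        && !scope.is_empty()
        && !scope.contains('(')
        && !scope.contains(')')
        && !subject.trim().is_empty()
}
```

Both example headers above pass this check; `feat: missing scope` does not.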

Commit Hygiene

One logical change per commit
Write clear, descriptive messages
Reference issues/PRs when applicable
Keep commits focused and atomic

Documentation

Code Documentation

Public APIs: Comprehensive rustdoc comments
Examples: Include usage examples in doc comments
Safety: Document unsafe code thoroughly
Panics: Document panic conditions
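A minimal illustration of the `# Panics` and `# Safety` sections, using hypothetical functions `component` and `component_unchecked` (not Ruvector APIs). The `// SAFETY:` comment on the `unsafe` block is a common Rust convention (Clippy's `undocumented_unsafe_blocks` lint checks for it) rather than a rule stated elsewhere in this guide.

```rust
/// Returns the element of `v` at `index`.
///
/// # Panics
///
/// Panics if `index >= v.len()`.
pub fn component(v: &[f32], index: usize) -> f32 {
    v[index]
}

/// Returns the element of `v` at `index` without bounds checking.
///
/// # Safety
///
/// The caller must guarantee that `index < v.len()`; otherwise the
/// behavior is undefined.
pub unsafe fn component_unchecked(v: &[f32], index: usize) -> f32 {
    // SAFETY: the caller upholds `index < v.len()`, as required above.
    unsafe { *v.get_unchecked(index) }
}
```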

User Documentation

Update relevant docs:

README.md: Overview and quick start
guides/: User guides and tutorials
api/: API reference documentation
CHANGELOG.md: User-facing changes

Documentation Style

/// A vector database with HNSW indexing.
///
/// `VectorDB` provides fast approximate nearest neighbor search using
/// Hierarchical Navigable Small World (HNSW) graphs. It supports:
///
/// - Sub-millisecond query latency
/// - 95%+ recall with proper tuning
/// - Memory-mapped storage for large datasets
/// - Multiple distance metrics (Euclidean, Cosine, etc.)
///
/// # Examples
///
/// ```
/// use ruvector_core::{VectorDB, VectorEntry, DbOptions};
///
/// let mut options = DbOptions::default();
/// options.dimensions = 128;
///
/// let db = VectorDB::new(options)?;
///
/// let entry = VectorEntry {
///     id: None,
///     vector: vec![0.1; 128],
///     metadata: None,
/// };
///
/// let id = db.insert(entry)?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
///
/// - Search: O(log n) with HNSW
/// - Insert: O(log n) amortized
/// - Memory: ~640 bytes per vector (M=32)
pub struct VectorDB { }

Performance

Performance Guidelines

Profile first: Use cargo flamegraph or perf
Measure impact: Benchmark before/after
Document trade-offs: Explain performance vs. other concerns
Use SIMD: Leverage SIMD intrinsics for hot paths
Avoid allocations: Reuse buffers in hot loops
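The SIMD and allocation points can be combined in a sketch like the one below. It is not Ruvector's implementation: instead of explicit intrinsics it splits the work into 8 independent accumulators with `chunks_exact`, a shape LLVM can often auto-vectorize, and it writes a batch of distances into a caller-owned buffer so repeated queries do not allocate. Whether SIMD instructions are actually emitted depends on the target (for example `-C target-cpu=native`), so confirm with a profiler or the generated assembly.

```rust
/// Independent accumulators; 8 f32 lanes fill one 256-bit AVX register.
const LANES: usize = 8;

/// Squared Euclidean distance laid out so the compiler can auto-vectorize it.
/// Assumes `a.len() == b.len()` (checked only in debug builds).
#[inline]
pub fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);

    // Scalar tail for the last `len % LANES` elements.
    let tail: f32 = a_chunks
        .remainder()
        .iter()
        .zip(b_chunks.remainder())
        .map(|(x, y)| (x - y) * (x - y))
        .sum();

    // One accumulator per lane, so no lane depends on another's result.
    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }
    acc.iter().sum::<f32>() + tail
}

/// Writes the distance from `query` to each vector into `out`.
/// `out` is cleared but keeps its capacity, so reusing it across queries
/// avoids allocating in the hot loop once it is large enough.
pub fn distances_into(query: &[f32], vectors: &[Vec<f32>], out: &mut Vec<f32>) {
    out.clear();
    out.extend(vectors.iter().map(|v| squared_euclidean(query, v).sqrt()));
}
```

Because the lanes are summed in a different order than a plain loop, results can differ from a scalar implementation in the last bits; compare with a tolerance rather than exact equality.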

Benchmarking Changes

# Benchmark baseline
git checkout main
cargo bench -- --save-baseline main

# Benchmark your changes
git checkout feature-branch
cargo bench -- --baseline main

Performance Checklist

Profiled hot paths
Benchmarked changes
No performance regressions
Documented performance characteristics
Considered memory usage

Community

Getting Help

GitHub Issues: Bug reports and feature requests
Discussions: Questions and general discussion
Pull Requests: Code contributions

Reporting Bugs

Use the bug report template:

**Describe the bug**
Clear description of the bug

**To Reproduce**
1. Step 1
2. Step 2
3. See error

**Expected behavior**
What you expected to happen

**Environment**
- OS: [e.g., Ubuntu 22.04]
- Rust version: [e.g., 1.77.0]
- Ruvector version: [e.g., 0.1.0]

**Additional context**
Any other relevant information

Feature Requests

Use the feature request template:

**Is your feature request related to a problem?**
Clear description of the problem

**Describe the solution you'd like**
What you want to happen

**Describe alternatives you've considered**
Other solutions you've thought about

**Additional context**
Any other relevant information

License

By contributing to Ruvector, you agree that your contributions will be licensed under the MIT License.

Questions?

Feel free to open an issue or discussion if you have questions about contributing!

Thank you for contributing to Ruvector! 🚀
