# Contributing to Ruvector
Thank you for your interest in contributing to Ruvector! This document provides guidelines and instructions for contributing.
## Table of Contents
1. [Code of Conduct](#code-of-conduct)
2. [Getting Started](#getting-started)
3. [Development Setup](#development-setup)
4. [Code Style](#code-style)
5. [Testing](#testing)
6. [Pull Request Process](#pull-request-process)
7. [Commit Guidelines](#commit-guidelines)
8. [Documentation](#documentation)
9. [Performance](#performance)
10. [Community](#community)
## Code of Conduct
### Our Pledge
We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
### Our Standards
**Positive behavior includes**:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

**Unacceptable behavior includes**:
- Trolling, insulting/derogatory comments, and personal attacks
- Public or private harassment
- Publishing others' private information without permission
- Other conduct which could reasonably be considered inappropriate
## Getting Started
### Prerequisites
- **Rust 1.77+**: Install from [rustup.rs](https://rustup.rs/)
- **Node.js 16+**: For Node.js bindings testing
- **Git**: For version control
- **cargo-nextest** (optional but recommended): `cargo install cargo-nextest`
### Fork and Clone
1. Fork the repository on GitHub
2. Clone your fork:
```bash
git clone https://github.com/YOUR_USERNAME/ruvector.git
cd ruvector
```
3. Add upstream remote:
```bash
git remote add upstream https://github.com/ruvnet/ruvector.git
```
## Development Setup
### Build the Project
```bash
# Build all crates
cargo build
# Build with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release
# Build specific crate
cargo build -p ruvector-core
```
### Run Tests
```bash
# Run all tests
cargo test
# Run tests with nextest (parallel, faster)
cargo nextest run
# Run specific test
cargo test test_hnsw_search
# Run with logging
RUST_LOG=debug cargo test
# Run benchmarks
cargo bench
```
### Check Code
```bash
# Format code
cargo fmt
# Check formatting without changes
cargo fmt -- --check
# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings
# Check all crates
cargo check --all-features
```
## Code Style
### Rust Style Guide
We follow the [Rust Style Guide](https://doc.rust-lang.org/style-guide/) with these additions:
#### Naming Conventions
```rust
// Structs: PascalCase
struct VectorDatabase { }
// Functions: snake_case
fn insert_vector() { }
// Constants: SCREAMING_SNAKE_CASE
const MAX_DIMENSIONS: usize = 65536;
// Type parameters: Single uppercase letter or PascalCase
fn generic<T>() { }
fn generic_with_metric<TMetric: DistanceMetric>() { }
```
#### Documentation
All public items must have doc comments:
```rust
/// A high-performance vector database.
///
/// # Examples
///
/// ```
/// use ruvector_core::{DbOptions, VectorDB};
///
/// let db = VectorDB::new(DbOptions::default())?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub struct VectorDB { }
/// Insert a vector into the database.
///
/// # Arguments
///
/// * `entry` - The vector entry to insert
///
/// # Returns
///
/// The ID of the inserted vector
///
/// # Errors
///
/// Returns `RuvectorError` if insertion fails
pub fn insert(&self, entry: VectorEntry) -> Result<VectorId> {
// ...
}
```
#### Error Handling
- Use `Result<T, RuvectorError>` for fallible operations
- Use `thiserror` for error types
- Provide context with error messages
```rust
use thiserror::Error;
#[derive(Error, Debug)]
pub enum RuvectorError {
#[error("Vector dimension mismatch: expected {expected}, got {got}")]
DimensionMismatch { expected: usize, got: usize },
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
}
```
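To make the role of the derive concrete, here is a std-only sketch of roughly what `thiserror` provides for the enum above: a `Display` impl built from the `#[error]` strings and a `From<std::io::Error>` impl from `#[from]`, which is what lets `?` convert IO errors. The helpers `check_dimensions` and `read_vector_file` are hypothetical and exist only for illustration; they are not part of the Ruvector API.

```rust
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum RuvectorError {
    DimensionMismatch { expected: usize, got: usize },
    Io(io::Error),
}

// Hand-written equivalent of the `#[error("...")]` attributes.
impl fmt::Display for RuvectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RuvectorError::DimensionMismatch { expected, got } => write!(
                f,
                "Vector dimension mismatch: expected {expected}, got {got}"
            ),
            RuvectorError::Io(err) => write!(f, "IO error: {err}"),
        }
    }
}

impl std::error::Error for RuvectorError {
    // Expose the wrapped IO error as the underlying cause.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            RuvectorError::Io(err) => Some(err),
            RuvectorError::DimensionMismatch { .. } => None,
        }
    }
}

// Hand-written equivalent of `#[from]`: enables `?` on `io::Result`.
impl From<io::Error> for RuvectorError {
    fn from(err: io::Error) -> Self {
        RuvectorError::Io(err)
    }
}

/// Hypothetical helper: rejects vectors whose length differs from `expected`.
pub fn check_dimensions(expected: usize, vector: &[f32]) -> Result<(), RuvectorError> {
    if vector.len() != expected {
        return Err(RuvectorError::DimensionMismatch {
            expected,
            got: vector.len(),
        });
    }
    Ok(())
}

/// Hypothetical helper: `?` converts `io::Error` into `RuvectorError::Io`.
pub fn read_vector_file(path: &str) -> Result<Vec<u8>, RuvectorError> {
    let bytes = std::fs::read(path)?;
    Ok(bytes)
}
```

Carrying `expected` and `got` in the variant means the message gives the caller enough context to act on without inspecting the call site.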
#### Performance
- Use `#[inline]` for hot path functions
- Profile before optimizing
- Document performance characteristics
```rust
/// Distance calculation (hot path, inlined)
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
// SIMD-optimized implementation
}
```
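When optimizing a hot path like this, it can help to keep a plain scalar version around as a correctness baseline for tests and benchmarks. The following is a minimal sketch of such a reference, assuming both slices have the same length; it is not the project's SIMD implementation, and the length check is an assumption about the desired behavior.

```rust
/// Scalar reference implementation of Euclidean distance.
///
/// # Panics
///
/// Panics if `a` and `b` have different lengths.
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum::<f32>()
        .sqrt()
}
```

An optimized version can then be compared against this one within a small tolerance.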
### TypeScript/JavaScript Style
For Node.js bindings:
```typescript
// Use TypeScript for type safety
interface VectorEntry {
id?: string;
vector: Float32Array;
metadata?: Record<string, any>;
}
// Async/await for async operations
async function search(query: Float32Array): Promise<SearchResult[]> {
return await db.search({ vector: query, k: 10 });
}
// Use const/let, never var
const db = new VectorDB(options);
let results = await db.search({ vector: query, k: 10 });
```
## Testing
### Test Structure
```rust
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic_insert() {
// Arrange
let db = VectorDB::new(DbOptions::default()).unwrap();
let entry = VectorEntry {
id: None,
vector: vec![0.1; 128],
metadata: None,
};
// Act
let id = db.insert(entry).unwrap();
// Assert
assert!(!id.is_empty());
}
#[test]
fn test_error_handling() {
let db = VectorDB::new(DbOptions::default()).unwrap();
let wrong_dims = vec![0.1; 64]; // Wrong dimensions
let result = db.insert(VectorEntry {
id: None,
vector: wrong_dims,
metadata: None,
});
assert!(result.is_err());
}
}
```
### Property-Based Testing
Use `proptest` for property-based tests:
```rust
use proptest::prelude::*;
proptest! {
#[test]
fn test_distance_symmetry(
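// Bound the inputs: with unrestricted `f32` values the squared differences
// can overflow to infinity, and `inf - inf` is NaN, so the check would fail.
a in prop::collection::vec(-1000.0f32..1000.0, 128),
b in prop::collection::vec(-1000.0f32..1000.0, 128)
) {
let d1 = euclidean_distance(&a, &b);
let d2 = euclidean_distance(&b, &a);
// `prop_assert!` reports the failure to proptest so it can shrink the input.
prop_assert!((d1 - d2).abs() < 1e-5);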
}
}
```
### Benchmarking
Use `criterion` for benchmarks:
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn benchmark_search(c: &mut Criterion) {
let db = setup_db();
let query = vec![0.1; 128];
c.bench_function("search 1M vectors", |b| {
b.iter(|| {
db.search(black_box(&SearchQuery {
vector: query.clone(),
k: 10,
filter: None,
include_vectors: false,
}))
})
});
}
criterion_group!(benches, benchmark_search);
criterion_main!(benches);
```
### Test Coverage
Aim for:
- **Unit tests**: 80%+ coverage
- **Integration tests**: All major features
- **Property tests**: Core algorithms
- **Benchmarks**: Performance-critical paths
## Pull Request Process
### Before Submitting
1. **Create an issue** first for major changes
2. **Fork and branch**: Create a feature branch
```bash
git checkout -b feature/my-new-feature
```
3. **Write tests**: Ensure new code has tests
4. **Run checks**:
```bash
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test
cargo bench
```
5. **Update documentation**: Update relevant docs
6. **Add changelog entry**: Update CHANGELOG.md
### PR Template
```markdown
## Description
Brief description of changes
## Motivation
Why is this change needed?
## Changes
- Change 1
- Change 2
## Testing
How was this tested?
## Performance Impact
Any performance implications?
## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Changelog updated
- [ ] Code formatted (`cargo fmt`)
- [ ] Lints passing (`cargo clippy`)
- [ ] All tests passing (`cargo test`)
```
### Review Process
1. **Automated checks**: CI must pass
2. **Code review**: At least one maintainer approval
3. **Discussion**: Address reviewer feedback
4. **Merge**: Squash and merge or rebase
## Commit Guidelines
### Commit Message Format
```
<type>(<scope>): <subject>

<body>

<footer>
```
**Types**:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting)
- `refactor`: Code refactoring
- `perf`: Performance improvements
- `test`: Test additions/changes
- `chore`: Build process or auxiliary tool changes

**Examples**:
```
feat(hnsw): add parallel index construction

Implement parallel HNSW construction using rayon for faster
index building on multi-core systems.

- Split graph construction across threads
- Use atomic operations for thread-safe updates
- Achieve 4x speedup on 8-core system

Closes #123
```
```
fix(quantization): correct product quantization distance calculation

The distance calculation was not using precomputed lookup tables,
causing incorrect results.

Fixes #456
```
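The subject-line format is regular enough to check mechanically. The sketch below shows one way to parse it; `parse_commit_subject` and `CommitSubject` are hypothetical names, not project tooling, and treating the scope as optional is an assumption that goes beyond the format shown above.

```rust
/// Parsed form of a commit subject line such as `feat(hnsw): add index`.
#[derive(Debug, PartialEq)]
pub struct CommitSubject<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub subject: &'a str,
}

/// Commit types listed in this guide.
const ALLOWED_TYPES: [&str; 8] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
];

/// Hypothetical helper: parses `<type>(<scope>): <subject>`.
///
/// Returns `None` if the line does not match the format or the type is not
/// one of `ALLOWED_TYPES`. Assumption: the `(<scope>)` part may be omitted.
pub fn parse_commit_subject(line: &str) -> Option<CommitSubject<'_>> {
    // Split the header from the subject at the first ": ".
    let (header, subject) = line.split_once(": ")?;
    if subject.trim().is_empty() {
        return None;
    }

    // The header is either `type` or `type(scope)`.
    let (kind, scope) = match header.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')')?;
            if scope.is_empty() {
                return None;
            }
            (kind, Some(scope))
        }
        None => (header, None),
    };

    if !ALLOWED_TYPES.contains(&kind) {
        return None;
    }
    Some(CommitSubject { kind, scope, subject })
}
```

A check like this could run in a commit hook or CI step, though whether the project enforces the format automatically is not stated here.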
### Commit Hygiene
- One logical change per commit
- Write clear, descriptive messages
- Reference issues/PRs when applicable
- Keep commits focused and atomic
## Documentation
### Code Documentation
- **Public APIs**: Comprehensive rustdoc comments
- **Examples**: Include usage examples in doc comments
- **Safety**: Document unsafe code thoroughly
- **Panics**: Document panic conditions
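As an illustration of the last two points, the sketch below uses the conventional `# Safety` and `# Panics` rustdoc sections. Both functions are hypothetical examples written for this guide, not part of the Ruvector API.

```rust
/// Returns the element at `index` without bounds checking.
///
/// # Safety
///
/// The caller must guarantee that `index < data.len()`. Calling this with an
/// out-of-bounds index is undefined behavior.
pub unsafe fn get_unchecked_f32(data: &[f32], index: usize) -> f32 {
    // SAFETY: the caller upholds `index < data.len()` per the contract above.
    unsafe { *data.get_unchecked(index) }
}

/// Scales `v` in place so that it has unit Euclidean length.
///
/// # Panics
///
/// Panics if `v` is empty or its Euclidean norm is not a positive number.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!(norm > 0.0, "cannot normalize an empty or zero-norm vector");
    for x in v.iter_mut() {
        *x /= norm;
    }
}
```

The `// SAFETY:` comment inside the function records why the unsafe operation is sound, while the `# Safety` section tells callers what they must guarantee.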
### User Documentation
Update relevant docs:
- **README.md**: Overview and quick start
- **guides/**: User guides and tutorials
- **api/**: API reference documentation
- **CHANGELOG.md**: User-facing changes
### Documentation Style
```rust
/// A vector database with HNSW indexing.
///
/// `VectorDB` provides fast approximate nearest neighbor search using
/// Hierarchical Navigable Small World (HNSW) graphs. It supports:
///
/// - Sub-millisecond query latency
/// - 95%+ recall with proper tuning
/// - Memory-mapped storage for large datasets
/// - Multiple distance metrics (Euclidean, Cosine, etc.)
///
/// # Examples
///
/// ```
/// use ruvector_core::{VectorDB, VectorEntry, DbOptions};
///
/// let mut options = DbOptions::default();
/// options.dimensions = 128;
///
/// let db = VectorDB::new(options)?;
///
/// let entry = VectorEntry {
/// id: None,
/// vector: vec![0.1; 128],
/// metadata: None,
/// };
///
/// let id = db.insert(entry)?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
///
/// - Search: O(log n) with HNSW
/// - Insert: O(log n) amortized
/// - Memory: ~640 bytes per vector (M=32)
pub struct VectorDB { }
```
## Performance
### Performance Guidelines
1. **Profile first**: Use `cargo flamegraph` or `perf`
2. **Measure impact**: Benchmark before/after
3. **Document trade-offs**: Explain performance vs. other concerns
4. **Use SIMD**: Leverage SIMD intrinsics for hot paths
5. **Avoid allocations**: Reuse buffers in hot loops
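The buffer-reuse guideline can be sketched as follows: the caller owns a scratch `Vec` and passes it in, so repeated calls reuse its capacity instead of allocating a fresh vector each time. `squared_distances_into` is a hypothetical helper for illustration, and representing rows as `Vec<f32>` is an assumption made to keep the example short.

```rust
/// Hypothetical helper: writes the squared Euclidean distance from `query`
/// to each row into `out`, reusing the buffer's existing capacity.
pub fn squared_distances_into(query: &[f32], rows: &[Vec<f32>], out: &mut Vec<f32>) {
    // `clear` drops the old contents but keeps the allocation.
    out.clear();
    out.extend(rows.iter().map(|row| {
        row.iter()
            .zip(query)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
    }));
}
```

In a search loop, the scratch buffer would be created once outside the loop and passed to each call; once it has grown to the number of rows, later calls should not need to reallocate.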
### Benchmarking Changes
```bash
# Benchmark baseline
git checkout main
cargo bench -- --save-baseline main
# Benchmark your changes
git checkout feature/my-new-feature
cargo bench -- --baseline main
```
### Performance Checklist
- [ ] Profiled hot paths
- [ ] Benchmarked changes
- [ ] No performance regressions
- [ ] Documented performance characteristics
- [ ] Considered memory usage
## Community
### Getting Help
- **GitHub Issues**: Bug reports and feature requests
- **Discussions**: Questions and general discussion
- **Pull Requests**: Code contributions
### Reporting Bugs
Use the bug report template:
```markdown
**Describe the bug**
Clear description of the bug
**To Reproduce**
1. Step 1
2. Step 2
3. See error
**Expected behavior**
What you expected to happen
**Environment**
- OS: [e.g., Ubuntu 22.04]
- Rust version: [e.g., 1.77.0]
- Ruvector version: [e.g., 0.1.0]
**Additional context**
Any other relevant information
```
### Feature Requests
Use the feature request template:
```markdown
**Is your feature request related to a problem?**
Clear description of the problem
**Describe the solution you'd like**
What you want to happen
**Describe alternatives you've considered**
Other solutions you've thought about
**Additional context**
Any other relevant information
```
## License
By contributing to Ruvector, you agree that your contributions will be licensed under the MIT License.
## Questions?
Feel free to open an issue or discussion if you have questions about contributing!
---
Thank you for contributing to Ruvector! 🚀