Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'
583
vendor/ruvector/docs/development/CONTRIBUTING.md
vendored
Normal file
@@ -0,0 +1,583 @@
# Contributing to Ruvector

Thank you for your interest in contributing to Ruvector! This document provides guidelines and instructions for contributing.

## Table of Contents

1. [Code of Conduct](#code-of-conduct)
2. [Getting Started](#getting-started)
3. [Development Setup](#development-setup)
4. [Code Style](#code-style)
5. [Testing](#testing)
6. [Pull Request Process](#pull-request-process)
7. [Commit Guidelines](#commit-guidelines)
8. [Documentation](#documentation)
9. [Performance](#performance)
10. [Community](#community)

## Code of Conduct

### Our Pledge

We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

### Our Standards

**Positive behavior includes**:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

**Unacceptable behavior includes**:
- Trolling, insulting/derogatory comments, and personal attacks
- Public or private harassment
- Publishing others' private information without permission
- Other conduct which could reasonably be considered inappropriate

## Getting Started

### Prerequisites

- **Rust 1.77+**: Install from [rustup.rs](https://rustup.rs/)
- **Node.js 16+**: For Node.js bindings testing
- **Git**: For version control
- **cargo-nextest** (optional but recommended): `cargo install cargo-nextest`

### Fork and Clone

1. Fork the repository on GitHub
2. Clone your fork:
   ```bash
   git clone https://github.com/YOUR_USERNAME/ruvector.git
   cd ruvector
   ```
3. Add upstream remote:
   ```bash
   git remote add upstream https://github.com/ruvnet/ruvector.git
   ```

## Development Setup

### Build the Project

```bash
# Build all crates
cargo build

# Build with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Build specific crate
cargo build -p ruvector-core
```

### Run Tests

```bash
# Run all tests
cargo test

# Run tests with nextest (parallel, faster)
cargo nextest run

# Run specific test
cargo test test_hnsw_search

# Run with logging
RUST_LOG=debug cargo test

# Run benchmarks
cargo bench
```

### Check Code

```bash
# Format code
cargo fmt

# Check formatting without changes
cargo fmt -- --check

# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings

# Check all crates
cargo check --all-features
```

## Code Style

### Rust Style Guide

We follow the [Rust Style Guide](https://doc.rust-lang.org/style-guide/) with these additions:

#### Naming Conventions

```rust
// Structs: PascalCase
struct VectorDatabase { }

// Functions: snake_case
fn insert_vector() { }

// Constants: SCREAMING_SNAKE_CASE
const MAX_DIMENSIONS: usize = 65536;

// Type parameters: Single uppercase letter or PascalCase
fn generic<T>() { }
fn generic_with_metric<TMetric: DistanceMetric>() { }
```

#### Documentation

All public items must have doc comments:

```rust
/// A high-performance vector database.
///
/// # Examples
///
/// ```
/// use ruvector_core::{DbOptions, VectorDB};
///
/// let db = VectorDB::new(DbOptions::default())?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub struct VectorDB { }

/// Insert a vector into the database.
///
/// # Arguments
///
/// * `entry` - The vector entry to insert
///
/// # Returns
///
/// The ID of the inserted vector
///
/// # Errors
///
/// Returns `RuvectorError` if insertion fails
pub fn insert(&self, entry: VectorEntry) -> Result<VectorId> {
    // ...
}
```

#### Error Handling

- Use `Result<T, RuvectorError>` for fallible operations
- Use `thiserror` for error types
- Provide context with error messages

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuvectorError {
    #[error("Vector dimension mismatch: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}
```

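To make the first bullet concrete, here is a minimal std-only sketch of a fallible helper that returns `Result<T, RuvectorError>`. The `check_dimensions` function and the hand-written `Display` impl are assumptions made for illustration (the project derives these through `thiserror`); only the `DimensionMismatch` variant and its message mirror the enum above.

```rust
use std::fmt;

/// Std-only stand-in for the `thiserror`-derived enum (illustrative only).
#[derive(Debug, PartialEq)]
pub enum RuvectorError {
    DimensionMismatch { expected: usize, got: usize },
}

impl fmt::Display for RuvectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RuvectorError::DimensionMismatch { expected, got } => write!(
                f,
                "Vector dimension mismatch: expected {expected}, got {got}"
            ),
        }
    }
}

impl std::error::Error for RuvectorError {}

/// Hypothetical helper: rejects vectors whose length differs from `expected`.
pub fn check_dimensions(vector: &[f32], expected: usize) -> Result<(), RuvectorError> {
    if vector.len() == expected {
        Ok(())
    } else {
        Err(RuvectorError::DimensionMismatch {
            expected,
            got: vector.len(),
        })
    }
}
```

Carrying both `expected` and `got` in the variant is what lets the message give the caller enough context to fix the input.
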
#### Performance

- Use `#[inline]` for hot path functions
- Profile before optimizing
- Document performance characteristics

```rust
/// Distance calculation (hot path, inlined)
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    // SIMD-optimized implementation
}
```

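The block above elides the function body. As a point of reference, a plain scalar version might look like the sketch below. It assumes the intended semantics are the standard L2 distance over equal-length slices, and it is not the project's SIMD implementation.

```rust
/// Scalar reference for Euclidean (L2) distance; illustrative, not SIMD.
///
/// Panics if the slices differ in length.
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum::<f32>()
        .sqrt()
}
```

A scalar version like this can also serve as an oracle when testing an optimized implementation.
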
### TypeScript/JavaScript Style

For Node.js bindings:

```typescript
// Use TypeScript for type safety
interface VectorEntry {
  id?: string;
  vector: Float32Array;
  metadata?: Record<string, any>;
}

// Async/await for async operations
async function search(query: Float32Array): Promise<SearchResult[]> {
  return await db.search({ vector: query, k: 10 });
}

// Use const/let, never var (prefer const when the binding is not reassigned)
const db = new VectorDB(options);
const results = await db.search({ vector: query, k: 10 });
```

## Testing

### Test Structure

```rust
#[cfg(test)]
mod tests {
    use super::*;

    /// Options with an explicit dimension so tests do not depend on defaults.
    fn options_128() -> DbOptions {
        let mut options = DbOptions::default();
        options.dimensions = 128;
        options
    }

    #[test]
    fn test_basic_insert() {
        // Arrange
        let db = VectorDB::new(options_128()).unwrap();
        let entry = VectorEntry {
            id: None,
            vector: vec![0.1; 128],
            metadata: None,
        };

        // Act
        let id = db.insert(entry).unwrap();

        // Assert
        assert!(!id.is_empty());
    }

    #[test]
    fn test_error_handling() {
        let db = VectorDB::new(options_128()).unwrap();
        let wrong_dims = vec![0.1; 64]; // Wrong dimensions (database expects 128)

        let result = db.insert(VectorEntry {
            id: None,
            vector: wrong_dims,
            metadata: None,
        });

        assert!(result.is_err());
    }
}
```

### Property-Based Testing

Use `proptest` for property-based tests:

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_distance_symmetry(
        // Bounded range: squaring unbounded f32 values overflows to infinity,
        // which turns `d1 - d2` into NaN and fails the assertion.
        a in prop::collection::vec(-1000.0f32..1000.0, 128),
        b in prop::collection::vec(-1000.0f32..1000.0, 128)
    ) {
        let d1 = euclidean_distance(&a, &b);
        let d2 = euclidean_distance(&b, &a);
        prop_assert!((d1 - d2).abs() < 1e-5);
    }
}
```

### Benchmarking

Use `criterion` for benchmarks:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_search(c: &mut Criterion) {
    let db = setup_db();
    let query = vec![0.1; 128];

    c.bench_function("search 1M vectors", |b| {
        b.iter(|| {
            db.search(black_box(&SearchQuery {
                vector: query.clone(),
                k: 10,
                filter: None,
                include_vectors: false,
            }))
        })
    });
}

criterion_group!(benches, benchmark_search);
criterion_main!(benches);
```

### Test Coverage

Aim for:
- **Unit tests**: 80%+ coverage
- **Integration tests**: All major features
- **Property tests**: Core algorithms
- **Benchmarks**: Performance-critical paths

## Pull Request Process

### Before Submitting

1. **Create an issue** first for major changes
2. **Fork and branch**: Create a feature branch
   ```bash
   git checkout -b feature/my-new-feature
   ```
3. **Write tests**: Ensure new code has tests
4. **Run checks**:
   ```bash
   cargo fmt
   cargo clippy --all-targets --all-features -- -D warnings
   cargo test
   cargo bench
   ```
5. **Update documentation**: Update relevant docs
6. **Add changelog entry**: Update CHANGELOG.md

### PR Template

```markdown
## Description

Brief description of changes

## Motivation

Why is this change needed?

## Changes

- Change 1
- Change 2

## Testing

How was this tested?

## Performance Impact

Any performance implications?

## Checklist

- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Changelog updated
- [ ] Code formatted (`cargo fmt`)
- [ ] Lints passing (`cargo clippy`)
- [ ] All tests passing (`cargo test`)
```

### Review Process

1. **Automated checks**: CI must pass
2. **Code review**: At least one maintainer approval
3. **Discussion**: Address reviewer feedback
4. **Merge**: Squash and merge or rebase

## Commit Guidelines

### Commit Message Format

```
<type>(<scope>): <subject>

<body>

<footer>
```

**Types**:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting)
- `refactor`: Code refactoring
- `perf`: Performance improvements
- `test`: Test additions/changes
- `chore`: Build process or auxiliary tool changes

**Examples**:

```
feat(hnsw): add parallel index construction

Implement parallel HNSW construction using rayon for faster
index building on multi-core systems.

- Split graph construction across threads
- Use atomic operations for thread-safe updates
- Achieve 4x speedup on 8-core system

Closes #123
```

```
fix(quantization): correct product quantization distance calculation

The distance calculation was not using precomputed lookup tables,
causing incorrect results.

Fixes #456
```

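The header format can also be checked mechanically. The sketch below is a hypothetical helper (not part of Ruvector or its tooling) that validates the `<type>(<scope>): <subject>` line against the types listed above. It assumes the scope is required, as in both examples.

```rust
/// Commit types accepted by the guidelines above.
const TYPES: [&str; 8] = [
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
];

/// Parsed pieces of a `<type>(<scope>): <subject>` header line.
#[derive(Debug, PartialEq)]
pub struct Header<'a> {
    pub kind: &'a str,
    pub scope: &'a str,
    pub subject: &'a str,
}

/// Hypothetical validator: returns `None` when the header does not match.
pub fn parse_header(line: &str) -> Option<Header<'_>> {
    // Split "feat(hnsw)" from "add parallel index construction".
    let (prefix, subject) = line.split_once(": ")?;
    // Split "feat" from "hnsw)".
    let (kind, rest) = prefix.split_once('(')?;
    let scope = rest.strip_suffix(')')?;
    let known_type = TYPES.iter().any(|t| *t == kind);
    if !known_type || scope.is_empty() || subject.trim().is_empty() {
        return None;
    }
    Some(Header { kind, scope, subject })
}
```

For the first example above, `parse_header` yields the type `feat`, the scope `hnsw`, and the subject `add parallel index construction`; a header with an unknown type or a missing scope is rejected.
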
### Commit Hygiene

- One logical change per commit
- Write clear, descriptive messages
- Reference issues/PRs when applicable
- Keep commits focused and atomic

## Documentation

### Code Documentation

- **Public APIs**: Comprehensive rustdoc comments
- **Examples**: Include usage examples in doc comments
- **Safety**: Document unsafe code thoroughly
- **Panics**: Document panic conditions

### User Documentation

Update relevant docs:
- **README.md**: Overview and quick start
- **guides/**: User guides and tutorials
- **api/**: API reference documentation
- **CHANGELOG.md**: User-facing changes

### Documentation Style

```rust
/// A vector database with HNSW indexing.
///
/// `VectorDB` provides fast approximate nearest neighbor search using
/// Hierarchical Navigable Small World (HNSW) graphs. It supports:
///
/// - Sub-millisecond query latency
/// - 95%+ recall with proper tuning
/// - Memory-mapped storage for large datasets
/// - Multiple distance metrics (Euclidean, Cosine, etc.)
///
/// # Examples
///
/// ```
/// use ruvector_core::{VectorDB, VectorEntry, DbOptions};
///
/// let mut options = DbOptions::default();
/// options.dimensions = 128;
///
/// let db = VectorDB::new(options)?;
///
/// let entry = VectorEntry {
///     id: None,
///     vector: vec![0.1; 128],
///     metadata: None,
/// };
///
/// let id = db.insert(entry)?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
///
/// - Search: O(log n) with HNSW
/// - Insert: O(log n) amortized
/// - Memory: ~640 bytes per vector (M=32)
pub struct VectorDB { }
```

## Performance

### Performance Guidelines

1. **Profile first**: Use `cargo flamegraph` or `perf`
2. **Measure impact**: Benchmark before/after
3. **Document trade-offs**: Explain performance vs. other concerns
4. **Use SIMD**: Leverage SIMD intrinsics for hot paths
5. **Avoid allocations**: Reuse buffers in hot loops

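Guideline 5 is easiest to see in code. The sketch below is a hypothetical example (the function names and signatures are assumptions, not Ruvector APIs) that normalizes a batch of vectors while reusing one scratch buffer instead of allocating on every iteration.

```rust
/// Writes the L2-normalized form of `src` into `out`, reusing its allocation.
/// Hypothetical helper for illustration; zero vectors are copied unchanged.
pub fn normalize_into(src: &[f32], out: &mut Vec<f32>) {
    let norm = src.iter().map(|x| x * x).sum::<f32>().sqrt();
    out.clear(); // keeps the capacity, so no reallocation once it is large enough
    if norm == 0.0 {
        out.extend_from_slice(src);
    } else {
        out.extend(src.iter().map(|x| x / norm));
    }
}

/// Sums the first component of every normalized vector in `batch`.
pub fn sum_first_components(batch: &[Vec<f32>]) -> f32 {
    let mut scratch = Vec::new(); // allocated once, outside the hot loop
    let mut total = 0.0;
    for v in batch {
        normalize_into(v, &mut scratch);
        total += scratch.first().copied().unwrap_or(0.0);
    }
    total
}
```

Taking the output buffer as `&mut Vec<f32>` leaves the allocation decision with the caller, which is what allows it to be hoisted out of the loop.
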
### Benchmarking Changes

```bash
# Benchmark baseline
git checkout main
cargo bench -- --save-baseline main

# Benchmark your changes
git checkout feature-branch
cargo bench -- --baseline main
```

### Performance Checklist

- [ ] Profiled hot paths
- [ ] Benchmarked changes
- [ ] No performance regressions
- [ ] Documented performance characteristics
- [ ] Considered memory usage

## Community

### Getting Help

- **GitHub Issues**: Bug reports and feature requests
- **Discussions**: Questions and general discussion
- **Pull Requests**: Code contributions

### Reporting Bugs

Use the bug report template:

```markdown
**Describe the bug**
Clear description of the bug

**To Reproduce**
1. Step 1
2. Step 2
3. See error

**Expected behavior**
What you expected to happen

**Environment**
- OS: [e.g., Ubuntu 22.04]
- Rust version: [e.g., 1.77.0]
- Ruvector version: [e.g., 0.1.0]

**Additional context**
Any other relevant information
```

### Feature Requests

Use the feature request template:

```markdown
**Is your feature request related to a problem?**
Clear description of the problem

**Describe the solution you'd like**
What you want to happen

**Describe alternatives you've considered**
Other solutions you've thought about

**Additional context**
Any other relevant information
```

## License

By contributing to Ruvector, you agree that your contributions will be licensed under the MIT License.

## Questions?

Feel free to open an issue or discussion if you have questions about contributing!

---

Thank you for contributing to Ruvector! 🚀