# RuVector ONNX Embeddings
> **Production-ready ONNX-based embedding generation for semantic search and RAG pipelines in pure Rust**

This library provides a complete embedding generation system built entirely in Rust on top of ONNX Runtime. It is designed for high-performance vector databases, semantic search engines, and AI applications.
## Table of Contents

- [Features](#features)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Supported Models](#supported-models)
- [Tutorial: Step-by-Step Guide](#tutorial-step-by-step-guide)
  - [Step 1: Basic Embedding Generation](#step-1-basic-embedding-generation)
  - [Step 2: Batch Processing](#step-2-batch-processing)
  - [Step 3: Building a Semantic Search Engine](#step-3-building-a-semantic-search-engine)
  - [Step 4: Creating a RAG Pipeline](#step-4-creating-a-rag-pipeline)
  - [Step 5: Text Clustering](#step-5-text-clustering)
- [Configuration Reference](#configuration-reference)
- [Pooling Strategies](#pooling-strategies)
- [Performance Benchmarks](#performance-benchmarks)
- [API Reference](#api-reference)
- [Architecture](#architecture)
- [Troubleshooting](#troubleshooting)

---
## Features

| Feature | Description | Status |
|---------|-------------|--------|
| **Native ONNX Runtime** | Direct ONNX model execution via `ort` 2.0 | ✅ |
| **Pretrained Models** | 8 popular sentence-transformer models | ✅ |
| **HuggingFace Integration** | Download any compatible model from the HF Hub | ✅ |
| **Multiple Pooling** | Mean, CLS, Max, MeanSqrtLen, LastToken, WeightedMean | ✅ |
| **Batch Processing** | Efficient batch embedding with configurable batch size | ✅ |
| **GPU Acceleration** | CUDA, TensorRT, and CoreML support | ✅ |
| **Vector Search** | Built-in similarity search (cosine, euclidean, dot product) | ✅ |
| **RAG Pipeline** | Ready-to-use retrieval-augmented generation | ✅ |
| **Thread-Safe** | Safe concurrent use via `RwLock` | ✅ |
| **Zero Python** | Pure Rust with no Python dependencies | ✅ |

---
## Quick Start

```rust
use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create an embedder with the default model
    let mut embedder = Embedder::default_model().await?;

    // Generate an embedding
    let embedding = embedder.embed_one("Hello, world!")?;
    println!("Embedding dimension: {}", embedding.len()); // 384

    // Compute semantic similarity
    let sim = embedder.similarity(
        "I love programming in Rust",
        "Rust is my favorite language",
    )?;
    println!("Similarity: {:.4}", sim); // ~0.85

    Ok(())
}
```

---
## Installation

### Step 1: Add Dependencies

```toml
[dependencies]
ruvector-onnx-embeddings = { path = "examples/onnx-embeddings" }
tokio = { version = "1", features = ["full"] }
anyhow = "1.0"
```

### Step 2: Choose Features (Optional)

| Feature | Command | Description |
|---------|---------|-------------|
| Default | `cargo build` | CPU inference |
| CUDA | `cargo build --features cuda` | NVIDIA GPU |
| TensorRT | `cargo build --features tensorrt` | NVIDIA optimized |
| CoreML | `cargo build --features coreml` | Apple Silicon |

### Step 3: Run Examples

```bash
# Basic example
cargo run --example basic_embedding

# Full demo with all features
cargo run
```

---
## Supported Models

### Model Comparison Table

| Model | Dimension | Max Tokens | Size | Speed | Quality | Best For |
|-------|-----------|------------|------|-------|---------|----------|
| `AllMiniLmL6V2` | 384 | 256 | 23MB | ⚡⚡⚡ | ⭐⭐⭐ | **Default** - fast, general-purpose |
| `AllMiniLmL12V2` | 384 | 256 | 33MB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality, balanced |
| `AllMpnetBaseV2` | 768 | 384 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, production |
| `E5SmallV2` | 384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Search & retrieval |
| `E5BaseV2` | 768 | 512 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | High-quality search |
| `BgeSmallEnV15` | 384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | State-of-the-art small |
| `BgeBaseEnV15` | 768 | 512 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | Best overall quality |
| `GteSmall` | 384 | 512 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Multilingual support |

### Model Selection Flowchart

```
Which model should I use?

Priority: speed?       ──────►  AllMiniLmL6V2 (23MB, 384d)
Priority: quality?     ──────►  AllMpnetBaseV2 (110MB, 768d)
Building search?       ──────►  BgeSmallEnV15 or E5SmallV2
Multilingual?          ──────►  GteSmall
Production RAG?        ──────►  BgeBaseEnV15 or E5BaseV2
Memory constrained?    ──────►  AllMiniLmL6V2
```
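To load a specific model from this table rather than the default, the `Embedder::pretrained` constructor listed in the [API Reference](#api-reference) can be used. A minimal sketch:

```rust
use ruvector_onnx_embeddings::{Embedder, PretrainedModel};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Load BgeSmallEnV15 explicitly: 384-dimensional, tuned for search & retrieval.
    let mut embedder = Embedder::pretrained(PretrainedModel::BgeSmallEnV15).await?;
    assert_eq!(embedder.dimension(), 384);

    let embedding = embedder.embed_one("semantic search in pure Rust")?;
    println!("Got a {}-dimensional embedding", embedding.len());
    Ok(())
}
```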
---
## Tutorial: Step-by-Step Guide

### Step 1: Basic Embedding Generation

**Goal**: Generate your first embedding and understand the output.

```rust
use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create an embedder (downloads the model on first run)
    println!("Loading model...");
    let mut embedder = Embedder::default_model().await?;

    // 2. Check model info
    println!("Model: {}", embedder.model_info().name);
    println!("Dimension: {}", embedder.dimension());
    println!("Max tokens: {}", embedder.max_length());

    // 3. Generate an embedding
    let text = "The quick brown fox jumps over the lazy dog.";
    let embedding = embedder.embed_one(text)?;

    // 4. Examine the output
    println!("\nInput: \"{}\"", text);
    println!("Output shape: [{} dimensions]", embedding.len());
    println!("First 5 values: [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}]",
        embedding[0], embedding[1], embedding[2], embedding[3], embedding[4]);

    // 5. Compute similarity between texts
    let text1 = "I love programming in Rust.";
    let text2 = "Rust is my favorite programming language.";
    let text3 = "The weather is nice today.";

    let sim_related = embedder.similarity(text1, text2)?;
    let sim_unrelated = embedder.similarity(text1, text3)?;

    println!("\nSimilarity comparisons:");
    println!("  \"{}\" vs \"{}\"", text1, text2);
    println!("  Similarity: {:.4} (high - related topics)", sim_related);
    println!();
    println!("  \"{}\" vs \"{}\"", text1, text3);
    println!("  Similarity: {:.4} (low - unrelated topics)", sim_unrelated);

    Ok(())
}
```

**Expected Output:**

```
Loading model...
Model: all-MiniLM-L6-v2
Dimension: 384
Max tokens: 256

Input: "The quick brown fox jumps over the lazy dog."
Output shape: [384 dimensions]
First 5 values: [0.0234, -0.0156, 0.0891, -0.0412, 0.0567]

Similarity comparisons:
  "I love programming in Rust." vs "Rust is my favorite programming language."
  Similarity: 0.8523 (high - related topics)

  "I love programming in Rust." vs "The weather is nice today."
  Similarity: 0.1234 (low - unrelated topics)
```
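For reference, a score like this is cosine similarity between the two embedding vectors; because embeddings are L2-normalized by default, it reduces to a dot product. A library-independent sketch of the computation:

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// For L2-normalized embeddings both norms are 1, so this is just dot(a, b).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```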
---
### Step 2: Batch Processing

**Goal**: Efficiently process multiple texts at once.

```rust
use ruvector_onnx_embeddings::{EmbedderBuilder, PretrainedModel, PoolingStrategy};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Configure for batch processing
    let mut embedder = EmbedderBuilder::new()
        .pretrained(PretrainedModel::AllMiniLmL6V2)
        .batch_size(64)                 // Process 64 texts at a time
        .normalize(true)                // L2 normalize (recommended for cosine similarity)
        .pooling(PoolingStrategy::Mean)
        .build()
        .await?;

    // 2. Prepare your data
    let texts = vec![
        "Artificial intelligence is transforming technology.",
        "Machine learning models learn from data.",
        "Deep learning uses neural networks.",
        "Natural language processing understands text.",
        "Computer vision analyzes images.",
        "Reinforcement learning optimizes decisions.",
        "Vector databases enable semantic search.",
        "Embeddings capture semantic meaning.",
    ];

    // 3. Generate embeddings
    println!("Embedding {} texts...", texts.len());
    let start = std::time::Instant::now();
    let output = embedder.embed(&texts)?;
    let elapsed = start.elapsed();

    // 4. Examine results
    println!("Completed in {:?}", elapsed);
    println!("Total embeddings: {}", output.len());
    println!("Embedding dimension: {}", output.dimension);

    // 5. Show token counts per text
    println!("\nToken counts:");
    for (i, (text, tokens)) in texts.iter().zip(output.token_counts.iter()).enumerate() {
        println!("  [{}] {} tokens: \"{}...\"", i, tokens, &text[..40.min(text.len())]);
    }

    // 6. Access individual embeddings
    println!("\nFirst embedding (first 5 values):");
    let first = output.get(0).unwrap();
    println!("  [{:.4}, {:.4}, {:.4}, {:.4}, {:.4}, ...]",
        first[0], first[1], first[2], first[3], first[4]);

    Ok(())
}
```
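Because `normalize(true)` is set, every returned vector has unit length, so the cosine similarity of any two embeddings in the batch is simply their dot product. A short continuation of the example above (assuming the `output` value from step 6 is still in scope):

```rust
// Dot product of two unit-length embeddings equals their cosine similarity.
let a = output.get(0).unwrap(); // "Artificial intelligence ..."
let b = output.get(1).unwrap(); // "Machine learning ..."
let cosine: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
println!("cosine(texts[0], texts[1]) = {:.4}", cosine);
```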
**Performance Table: Batch Size vs Throughput**

| Batch Size | Time (8 texts) | Throughput | Memory |
|------------|----------------|------------|--------|
| 1 | 45ms | 178/sec | 150MB |
| 8 | 35ms | 228/sec | 160MB |
| 32 | 28ms | 285/sec | 180MB |
| 64 | 25ms | 320/sec | 200MB |

---
### Step 3: Building a Semantic Search Engine

**Goal**: Create a searchable knowledge base with semantic understanding.

```rust
use ruvector_onnx_embeddings::{Embedder, RuVectorBuilder, Distance};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create embedder
    println!("Step 1: Loading embedder...");
    let embedder = Embedder::default_model().await?;

    // 2. Create search index
    println!("Step 2: Creating search index...");
    let index = RuVectorBuilder::new("programming_languages")
        .embedder(embedder)
        .distance(Distance::Cosine)   // Best for normalized embeddings
        .max_elements(100_000)        // Pre-allocate for 100k vectors
        .build()?;

    // 3. Index documents
    println!("Step 3: Indexing documents...");
    let documents = vec![
        "Rust is a systems programming language focused on safety and performance.",
        "Python is widely used for machine learning and data science applications.",
        "JavaScript is the language of the web, running in browsers everywhere.",
        "Go is designed for building scalable and efficient server applications.",
        "TypeScript adds static typing to JavaScript for better developer experience.",
        "C++ provides low-level control and high performance for system software.",
        "Java is a mature, object-oriented language popular in enterprise software.",
        "Swift is Apple's modern language for iOS and macOS development.",
        "Kotlin is a concise language that runs on the JVM, popular for Android.",
        "Haskell is a purely functional programming language with strong typing.",
    ];

    index.insert_batch(&documents)?;
    println!("  Indexed {} documents", documents.len());
    println!("  Index size: {} vectors", index.len());

    // 4. Perform searches
    println!("\nStep 4: Running searches...\n");

    let queries = vec![
        "What language is best for web development?",
        "I want to build a high-performance system application",
        "Which language should I learn for machine learning?",
        "I need a language for mobile app development",
    ];

    for query in queries {
        println!("🔍 Query: \"{}\"", query);
        let results = index.search(query, 3)?;

        for (i, result) in results.iter().enumerate() {
            println!("  {}. (score: {:.4}) {}", i + 1, result.score, result.text);
        }
        println!();
    }

    Ok(())
}
```
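Documents can also be added one at a time with optional JSON metadata; the [API Reference](#api-reference) lists `insert(&self, text: &str, metadata: Option<Value>)`. A sketch continuing the example above (assuming `serde_json` supplies the metadata `Value`):

```rust
use serde_json::json;

// Insert a single document with metadata attached (hypothetical usage
// of the `insert` method from the API reference).
let id = index.insert(
    "Zig is a low-level language with manual memory management.",
    Some(json!({ "category": "systems", "source": "manual" })),
)?;
println!("Inserted vector id: {:?}", id);
```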
**Search Results Table:**

| Query | Top Result | Score |
|-------|------------|-------|
| "What language is best for web development?" | "JavaScript is the language of the web..." | 0.82 |
| "high-performance system application" | "Rust is a systems programming language..." | 0.78 |
| "machine learning" | "Python is widely used for machine learning..." | 0.85 |
| "mobile app development" | "Swift is Apple's modern language for iOS..." | 0.76 |

---
### Step 4: Creating a RAG Pipeline

**Goal**: Build a retrieval-augmented generation system for LLM context.

```rust
use ruvector_onnx_embeddings::{Embedder, RuVectorEmbeddings, RagPipeline};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1. Create knowledge base
    println!("Step 1: Creating knowledge base...");
    let embedder = Embedder::default_model().await?;
    let index = RuVectorEmbeddings::new_default("ruvector_docs", embedder)?;

    // 2. Add documentation
    println!("Step 2: Adding documents...");
    let knowledge = vec![
        "RuVector is a distributed vector database that learns and adapts.",
        "RuVector uses HNSW indexing for fast approximate nearest neighbor search.",
        "The embedding dimension in RuVector is configurable based on your model.",
        "RuVector supports multiple distance metrics: Cosine, Euclidean, and Dot Product.",
        "Graph Neural Networks in RuVector improve search quality over time.",
        "RuVector integrates with ONNX models for native embedding generation.",
        "The NAPI-RS bindings allow using RuVector from Node.js applications.",
        "RuVector supports WebAssembly for running in web browsers.",
        "Quantization in RuVector reduces memory usage by up to 32x.",
        "RuVector can handle millions of vectors with sub-millisecond search.",
    ];

    index.insert_batch(&knowledge)?;

    // 3. Create RAG pipeline
    println!("Step 3: Setting up RAG pipeline...");
    let rag = RagPipeline::new(index, 3); // Retrieve top-3 documents

    // 4. Retrieve context for queries
    println!("\nStep 4: Running RAG queries...\n");

    let queries = vec![
        "How does RuVector perform search?",
        "Can I use RuVector from JavaScript?",
        "How can I reduce memory usage?",
    ];

    for query in queries {
        println!("📝 Query: \"{}\"", query);
        let context = rag.retrieve(query)?;

        println!("  Retrieved context:");
        for (i, doc) in context.iter().enumerate() {
            println!("  {}. {}", i + 1, doc);
        }

        // Format for LLM prompt
        println!("\n  LLM Prompt:");
        println!("  ───────────────────────────────────────");
        println!("  Given the following context:");
        for doc in &context {
            println!("  - {}", doc);
        }
        println!();
        println!("  Answer the question: {}", query);
        println!("  ───────────────────────────────────────\n");
    }

    Ok(())
}
```

**RAG Pipeline Flow:**

```
┌──────────┐    ┌─────────────┐    ┌──────────┐    ┌─────────┐
│  Query   │───►│  Embedder   │───►│  Search  │───►│ Context │
│          │    │             │    │  Index   │    │         │
└──────────┘    └─────────────┘    └──────────┘    └────┬────┘
                                                        │
                                                        v
┌──────────┐    ┌─────────────┐    ┌──────────┐    ┌─────────┐
│ Response │◄───│     LLM     │◄───│  Prompt  │◄───│ Format  │
│          │    │ (external)  │    │          │    │         │
└──────────┘    └─────────────┘    └──────────┘    └─────────┘
```
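The knowledge base does not have to be fixed at construction time: the [API Reference](#api-reference) lists `RagPipeline::add_documents` for growing it later. A sketch (note the pipeline must be declared mutable for this):

```rust
// Declare the pipeline mutable so documents can be added after setup
// (hypothetical usage of `add_documents` from the API reference).
let mut rag = RagPipeline::new(index, 3);
let ids = rag.add_documents(&[
    "Additional documentation text to index later.",
])?;
println!("Added {} documents to the knowledge base", ids.len());
```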
---
### Step 5: Text Clustering

**Goal**: Automatically group similar texts together.

```rust
use ruvector_onnx_embeddings::Embedder;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut embedder = Embedder::default_model().await?;

    // Mixed-category texts
    let texts = vec![
        // Technology (expected cluster 0)
        "Artificial intelligence is revolutionizing industries.",
        "Machine learning algorithms process large datasets.",
        "Neural networks mimic the human brain.",
        // Sports (expected cluster 1)
        "Football is the most popular sport worldwide.",
        "Basketball requires speed and agility.",
        "Tennis is played on different court surfaces.",
        // Food (expected cluster 2)
        "Italian pasta comes in many shapes and sizes.",
        "Sushi is a traditional Japanese dish.",
        "French cuisine is known for its elegance.",
    ];

    println!("Clustering {} texts into 3 categories...\n", texts.len());

    // Perform clustering
    let clusters = embedder.cluster(&texts, 3)?;

    // Group and display results
    let mut groups: std::collections::HashMap<usize, Vec<&str>> =
        std::collections::HashMap::new();

    for (i, &cluster) in clusters.iter().enumerate() {
        groups.entry(cluster).or_default().push(texts[i]);
    }

    println!("Clustering Results:");
    println!("═══════════════════════════════════════════");

    for (cluster_id, members) in groups.iter() {
        println!("\n📁 Cluster {}:", cluster_id);
        for text in members {
            println!("  • {}", text);
        }
    }

    Ok(())
}
```
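If the right number of clusters is not known up front, one pragmatic approach is to try a few values of `k` with the same `cluster` method and inspect the resulting group sizes. A sketch continuing the example above:

```rust
// Try several cluster counts and report the size of each group.
// `embedder` and `texts` are the values from the example above.
for k in 2..=4 {
    let assignments = embedder.cluster(&texts, k)?;
    let mut sizes = vec![0usize; k];
    for &c in &assignments {
        sizes[c] += 1;
    }
    println!("k = {}: cluster sizes {:?}", k, sizes);
}
```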
**Expected Clustering Output:**

| Cluster | Category | Texts |
|---------|----------|-------|
| 0 | Technology | AI revolutionizing..., ML algorithms..., Neural networks... |
| 1 | Sports | Football popular..., Basketball speed..., Tennis courts... |
| 2 | Food | Italian pasta..., Sushi traditional..., French cuisine... |

---
## Configuration Reference

### EmbedderConfig Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `model_source` | `ModelSource` | Pretrained | Where to load the model from |
| `batch_size` | `usize` | 32 | Texts per inference batch |
| `max_length` | `usize` | 512 | Maximum tokens per text |
| `pooling` | `PoolingStrategy` | Mean | Token aggregation method |
| `normalize` | `bool` | true | L2-normalize embeddings |
| `num_threads` | `usize` | 4 | ONNX Runtime threads |
| `cache_dir` | `PathBuf` | `~/.cache/ruvector` | Model cache directory |
| `show_progress` | `bool` | true | Show download progress |
| `optimize_graph` | `bool` | true | ONNX graph optimization |

### Using EmbedderBuilder

```rust
use ruvector_onnx_embeddings::{EmbedderBuilder, PretrainedModel, PoolingStrategy};

// Inside an async context (e.g. an async fn):
let embedder = EmbedderBuilder::new()
    .pretrained(PretrainedModel::BgeBaseEnV15) // Choose model
    .batch_size(64)                            // Batch size
    .max_length(256)                           // Max tokens
    .pooling(PoolingStrategy::Mean)            // Pooling strategy
    .normalize(true)                           // L2 normalize
    .build()
    .await?;
```

---
## Pooling Strategies

| Strategy | Method | Best For | Example Use |
|----------|--------|----------|-------------|
| `Mean` | Average all tokens | General purpose | Default choice |
| `Cls` | [CLS] token only | BERT-style models | Classification |
| `Max` | Max across tokens | Keyword matching | Entity extraction |
| `MeanSqrtLen` | Mean / sqrt(len) | Length-invariant | Mixed-length comparison |
| `LastToken` | Final token | Decoder models | GPT-style |
| `WeightedMean` | Position-weighted | Custom scenarios | Special cases |

### Choosing a Strategy

```
Text Type            Recommended Strategy
─────────────────────────────────────────
Short sentences      Mean (default)
Long documents       MeanSqrtLen
BERT fine-tuned      Cls
Keyword search       Max
Decoder models       LastToken
```
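For intuition, mean pooling (the default) averages the model's per-token vectors while skipping padding positions, using the attention mask to decide which tokens count. A library-independent sketch of the idea, not the library's internal code:

```rust
/// Mean pooling over token embeddings, ignoring padded positions.
/// `token_embeddings` holds one vector per token; `attention_mask` is
/// 1 for real tokens and 0 for padding.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u32]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (acc, value) in pooled.iter_mut().zip(token) {
                *acc += value;
            }
            count += 1.0;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= count.max(1.0);
    }
    pooled
}
```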
---
## Performance Benchmarks

### Embedding Generation Speed

*Tested on AMD EPYC 7763 (64-core), Ubuntu 22.04*

| Configuration | Single Text | Batch 32 | Batch 128 | Throughput (batch 128) |
|---------------|-------------|----------|-----------|------------------------|
| CPU (1 thread) | 22ms | 180ms | 680ms | 188/sec |
| CPU (8 threads) | 18ms | 85ms | 310ms | 413/sec |
| CUDA A100 | 4ms | 15ms | 45ms | 2,844/sec |
| TensorRT A100 | 2ms | 8ms | 25ms | 5,120/sec |

### Memory Usage

| Model | Parameters | ONNX Size | Runtime RAM | GPU VRAM |
|-------|------------|-----------|-------------|----------|
| AllMiniLmL6V2 | 22M | 23MB | 150MB | 200MB |
| AllMpnetBaseV2 | 109M | 110MB | 400MB | 600MB |
| BgeBaseEnV15 | 109M | 110MB | 400MB | 600MB |

### Similarity Search Latency

| Index Size | Insert Time | Search (top-10) | Memory |
|------------|-------------|-----------------|--------|
| 1,000 | 0.5s | 0.2ms | 2MB |
| 10,000 | 4s | 0.5ms | 15MB |
| 100,000 | 40s | 2ms | 150MB |
| 1,000,000 | 7min | 8ms | 1.5GB |

---
## API Reference

### Core Types

```rust
// Main Embedder
pub struct Embedder;

impl Embedder {
    pub async fn new(config: EmbedderConfig) -> Result<Self>;
    pub async fn default_model() -> Result<Self>;
    pub async fn pretrained(model: PretrainedModel) -> Result<Self>;

    pub fn embed_one(&mut self, text: &str) -> Result<Vec<f32>>;
    pub fn embed<S: AsRef<str>>(&mut self, texts: &[S]) -> Result<EmbeddingOutput>;
    pub fn similarity(&mut self, text1: &str, text2: &str) -> Result<f32>;
    pub fn cluster<S>(&mut self, texts: &[S], n: usize) -> Result<Vec<usize>>;

    pub fn dimension(&self) -> usize;
    pub fn model_info(&self) -> &ModelInfo;
}

// Search Index
pub struct RuVectorEmbeddings;

impl RuVectorEmbeddings {
    pub fn new(name: &str, embedder: Embedder, config: IndexConfig) -> Result<Self>;
    pub fn insert(&self, text: &str, metadata: Option<Value>) -> Result<VectorId>;
    pub fn insert_batch<S>(&self, texts: &[S]) -> Result<Vec<VectorId>>;
    pub fn search(&self, query: &str, k: usize) -> Result<Vec<SearchResult>>;
    pub fn len(&self) -> usize;
}

// RAG Pipeline
pub struct RagPipeline;

impl RagPipeline {
    pub fn new(index: RuVectorEmbeddings, top_k: usize) -> Self;
    pub fn retrieve(&self, query: &str) -> Result<Vec<String>>;
    pub fn add_documents<S>(&mut self, docs: &[S]) -> Result<Vec<VectorId>>;
}
```

---
## Architecture

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐
│    Text     │ -> │  Tokenizer  │ -> │    ONNX     │ -> │  Pooling  │
│    Input    │    │  (HF Rust)  │    │   Runtime   │    │  Strategy │
└─────────────┘    └─────────────┘    └─────────────┘    └─────┬─────┘
                                                               │
                                                               v
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐
│   Search    │ <- │   Vector    │ <- │  Normalize  │ <- │ Embedding │
│   Results   │    │    Index    │    │    (L2)     │    │   Vector  │
└─────────────┘    └─────────────┘    └─────────────┘    └───────────┘
```

---
## Troubleshooting

### Common Issues and Solutions

| Issue | Cause | Solution |
|-------|-------|----------|
| Model download fails | Network/firewall | Use a local model or check your connection |
| Out of memory | Large model/batch | Reduce `batch_size` or use a smaller model |
| Slow inference | CPU-bound | Enable GPU or increase `num_threads` |
| Dimension mismatch | Different models | Use the same model for indexing and querying |
| CUDA not found | Missing driver | Install the CUDA toolkit and drivers |

### Debugging Tips

```rust
// Enable verbose logging
std::env::set_var("RUST_LOG", "debug");
tracing_subscriber::fmt::init();

// Check model loading
let embedder = Embedder::default_model().await?;
println!("Model: {}", embedder.model_info().name);
println!("Dimension: {}", embedder.dimension());
```
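To catch the dimension-mismatch issue from the table before it corrupts an index, the embedder's reported dimension can be checked against what the index was built with. A small guard sketch (`EXPECTED_DIM` is a hypothetical constant for your deployment):

```rust
// Fail fast if the embedding model no longer matches the index.
const EXPECTED_DIM: usize = 384; // dimension the index was created with
let embedder = Embedder::default_model().await?;
assert_eq!(
    embedder.dimension(),
    EXPECTED_DIM,
    "embedding dimension changed - re-index or restore the original model"
);
```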
---
## Running Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run with verbose output
cargo bench -- --verbose

# Open the Criterion HTML report generated by the run
open target/criterion/report/index.html
```

---
## Examples

```bash
# Basic embedding
cargo run --example basic_embedding

# Batch processing
cargo run --example batch_embedding

# Semantic search
cargo run --example semantic_search

# Full interactive demo
cargo run
```

---
## License

MIT License - See [LICENSE](../../LICENSE) for details.

---

**Built with Rust for the RuVector ecosystem.**