# AgenticDB Embedding Limitation - MUST READ
⚠️⚠️⚠️ CRITICAL WARNING ⚠️⚠️⚠️
AgenticDB currently uses PLACEHOLDER HASH-BASED EMBEDDINGS, not real semantic embeddings.
## What This Means
The current `generate_text_embedding()` function derives each embedding from a simple character hash, so the vectors carry no understanding of meaning (a sketch of the failure mode follows the two lists below):
### ❌ What DOESN'T Work
- Semantic similarity: "dog" and "cat" are NOT similar
- Synonyms: "happy" and "joyful" are NOT similar
- Related concepts: "car" and "automobile" are NOT similar
- Paraphrasing: "I like pizza" and "Pizza is my favorite" are NOT similar
### ✅ What "Works" (But Shouldn't)
- Character similarity: "dog" and "god" ARE similar (same letters)
- Typos: "teh" and "the" ARE similar (close characters)
- This is NOT semantic search - it's character overlap!
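To make the failure mode concrete, here is a hypothetical sketch of a hash-style placeholder embedder. AgenticDB's actual function may differ in detail, but any character-level hash shares this behavior: the output is a function of bytes, not meaning.

```rust
// Hypothetical sketch - NOT AgenticDB's exact implementation.
fn placeholder_embedding(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dims];
    for b in text.bytes() {
        // Each byte bumps one bucket, so anagrams like "dog" and "god"
        // map to IDENTICAL vectors, while "dog" and "puppy" share almost nothing.
        v[b as usize % dims] += 1.0;
    }
    // L2-normalize so cosine similarity is well-defined.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.into_iter().map(|x| x / norm).collect()
}
```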
## Why This Exists
The placeholder embedding allows:
- Testing the AgenticDB API structure
- Demonstrating the API usage patterns
- Running benchmarks on vector operations
- Developing without external dependencies
But it should NEVER be used for production semantic search.
## Production Integration - Choose ONE
### Option 1: ONNX Runtime (Recommended ⭐)
Best for: Production deployments, cross-platform compatibility
```rust
// Sketch targeting the ort 2.x API. Exact names (e.g. `commit_from_file`,
// `ort::inputs!`, `try_extract_tensor`) vary across ort releases, so check
// the docs for the version you pin.
use ort::{GraphOptimizationLevel, Session};
use tokenizers::Tokenizer;

pub struct OnnxEmbedder {
    session: Session,
    tokenizer: Tokenizer,
}

impl OnnxEmbedder {
    pub fn new(model_path: &str, tokenizer_path: &str) -> anyhow::Result<Self> {
        let session = Session::builder()?
            .with_optimization_level(GraphOptimizationLevel::Level3)?
            .with_intra_threads(4)?
            // Execution providers (CUDA, TensorRT, ...) can be registered
            // on the builder here if you need GPU inference.
            .commit_from_file(model_path)?;
        let tokenizer = Tokenizer::from_file(tokenizer_path)
            .map_err(|e| anyhow::anyhow!(e))?;
        Ok(Self { session, tokenizer })
    }

    pub fn generate_text_embedding(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        let encoding = self.tokenizer
            .encode(text, true)
            .map_err(|e| anyhow::anyhow!(e))?;
        let input_ids: Vec<i64> = encoding.get_ids().iter().map(|&x| x as i64).collect();
        let attention_mask: Vec<i64> =
            encoding.get_attention_mask().iter().map(|&x| x as i64).collect();
        let seq_len = input_ids.len();

        let input_ids_array = ndarray::Array2::from_shape_vec((1, seq_len), input_ids)?;
        let attention_mask_array =
            ndarray::Array2::from_shape_vec((1, seq_len), attention_mask)?;

        let outputs = self.session.run(ort::inputs![
            "input_ids" => input_ids_array,
            "attention_mask" => attention_mask_array,
        ]?)?;

        // Shape: (1, seq_len, hidden_size)
        let embeddings = outputs["last_hidden_state"].try_extract_tensor::<f32>()?;

        // Mean pooling over the token axis. Note: this averages ALL positions;
        // for padded batches, weight by the attention mask instead.
        let pooled = embeddings
            .mean_axis(ndarray::Axis(1))
            .expect("non-empty token axis");
        Ok(pooled.iter().copied().collect())
    }
}

// To replace AgenticDB's generate_text_embedding:
// 1. Add an OnnxEmbedder field to the AgenticDB struct
// 2. Initialize it in new()
// 3. Call embedder.generate_text_embedding(text) instead of the hash
```
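Dependencies (version numbers illustrative; match the `ort` release whose API you target):

```toml
[dependencies]
ort = "2"
tokenizers = "0.15"
ndarray = "0.15"
anyhow = "1"
```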
Models to use:
- `all-MiniLM-L6-v2` (384 dims, fast, good quality)
- `all-mpnet-base-v2` (768 dims, higher quality)
- `gte-small` (384 dims, compact)
Get ONNX models:
```bash
python -m pip install "optimum[onnxruntime]"
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 all-MiniLM-L6-v2-onnx/
```
### Option 2: Candle (Pure Rust)
Best for: Native Rust deployments, no Python dependencies
```rust
use candle_core::{Device, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::bert::{BertModel, Config as BertConfig};

pub struct CandleEmbedder {
    model: BertModel,
    tokenizer: tokenizers::Tokenizer,
    device: Device,
}

impl CandleEmbedder {
    pub fn new(model_path: &str, tokenizer_path: &str) -> anyhow::Result<Self> {
        let device = Device::cuda_if_available(0)?;
        // In practice, parse the model's config.json rather than using
        // defaults, or the weights won't match the architecture.
        let config = BertConfig::default();
        let vb = VarBuilder::from_pth(model_path, candle_core::DType::F32, &device)?;
        let model = BertModel::load(vb, &config)?;
        let tokenizer = tokenizers::Tokenizer::from_file(tokenizer_path)
            .map_err(|e| anyhow::anyhow!(e))?;
        Ok(Self { model, tokenizer, device })
    }

    pub fn generate_text_embedding(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        let encoding = self.tokenizer
            .encode(text, true)
            .map_err(|e| anyhow::anyhow!(e))?;
        let input_ids = Tensor::new(encoding.get_ids(), &self.device)?.unsqueeze(0)?;
        let token_type_ids = Tensor::zeros(
            (1, encoding.get_ids().len()),
            candle_core::DType::U32,
            &self.device,
        )?;
        // Shape: (1, seq_len, hidden_size). Newer candle-transformers
        // versions also take an optional attention-mask argument here.
        let embeddings = self.model.forward(&input_ids, &token_type_ids)?;
        // Mean pooling over the token axis, then drop the batch dimension
        // so to_vec1 sees a rank-1 tensor.
        let pooled = embeddings.mean(1)?.squeeze(0)?;
        Ok(pooled.to_vec1::<f32>()?)
    }
}
```
Dependencies:

```toml
[dependencies]
candle-core = "0.3"
candle-nn = "0.3"
candle-transformers = "0.3"
# Also needed by the sketch above (versions illustrative):
tokenizers = "0.15"
anyhow = "1"
```
### Option 3: API-based (OpenAI, Cohere)
Best for: Quick prototyping, cloud deployments
#### OpenAI
```rust
use serde_json::json;

pub struct OpenAIEmbedder {
    client: reqwest::Client,
    api_key: String,
}

impl OpenAIEmbedder {
    pub fn new(api_key: String) -> Self {
        Self {
            client: reqwest::Client::new(),
            api_key,
        }
    }

    pub async fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>> {
        let response = self.client
            .post("https://api.openai.com/v1/embeddings")
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&json!({
                "model": "text-embedding-3-small",
                "input": text,
            }))
            .send()
            .await?
            // Surface HTTP errors here instead of failing later on JSON parsing.
            .error_for_status()?;

        let json: serde_json::Value = response.json().await?;
        // Sketch only: production code should handle missing fields
        // instead of unwrap().
        let embeddings = json["data"][0]["embedding"]
            .as_array()
            .unwrap()
            .iter()
            .map(|v| v.as_f64().unwrap() as f32)
            .collect();
        Ok(embeddings)
    }
}
```
Costs (as of 2024):
- `text-embedding-3-small`: $0.02 / 1M tokens
- `text-embedding-3-large`: $0.13 / 1M tokens
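For scale: embedding 10,000 documents of ~500 tokens each is 5M tokens, i.e. roughly $0.10 with `text-embedding-3-small`.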
#### Cohere
```rust
pub async fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>> {
    let response = self.client
        .post("https://api.cohere.ai/v1/embed")
        .header("Authorization", format!("Bearer {}", self.api_key))
        .json(&json!({
            "model": "embed-english-v3.0",
            "texts": [text],
            // Use "search_document" when embedding stored items and
            // "search_query" when embedding queries.
            "input_type": "search_query",
        }))
        .send()
        .await?
        .error_for_status()?;

    let json: serde_json::Value = response.json().await?;
    let embeddings = json["embeddings"][0]
        .as_array()
        .unwrap()
        .iter()
        .map(|v| v.as_f64().unwrap() as f32)
        .collect();
    Ok(embeddings)
}
```
Costs: $0.10 / 1M tokens
### Option 4: Python Bindings (sentence-transformers)
Best for: Leveraging existing Python ML ecosystem
```rust
use pyo3::prelude::*;
use pyo3::types::PyModule;
use numpy::PyArray1;

pub struct PythonEmbedder {
    model: Py<PyAny>,
}

impl PythonEmbedder {
    pub fn new(model_name: &str) -> Result<Self> {
        Python::with_gil(|py| {
            let sentence_transformers = PyModule::import(py, "sentence_transformers")?;
            let model = sentence_transformers
                .getattr("SentenceTransformer")?
                .call1((model_name,))?;
            Ok(Self {
                model: model.into(),
            })
        })
    }

    pub fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>> {
        Python::with_gil(|py| {
            // encode() returns a numpy float32 array for a single string.
            let embeddings = self.model
                .call_method1(py, "encode", (text,))?
                .extract::<&PyArray1<f32>>(py)?;
            Ok(embeddings.to_vec()?)
        })
    }
}
```
Dependencies:

```toml
[dependencies]
# Embedding Python in a Rust binary requires "auto-initialize";
# "extension-module" is for building Python extension modules instead.
pyo3 = { version = "0.20", features = ["auto-initialize"] }
numpy = "0.20"
```
Python setup:

```bash
pip install sentence-transformers
```
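To sanity-check the install and the model's output dimension (should print 384 for all-MiniLM-L6-v2):

```bash
python -c "from sentence_transformers import SentenceTransformer; print(SentenceTransformer('all-MiniLM-L6-v2').get_sentence_embedding_dimension())"
```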
## Integration Steps
### 1. Choose Your Approach
Pick one of the 4 options above based on your requirements:
- ONNX: Best balance of performance and compatibility ⭐
- Candle: Pure Rust, no external runtime
- API: Fastest to prototype, pay per use
- Python: Maximum flexibility with ML libraries
### 2. Update AgenticDB Struct
```rust
pub struct AgenticDB {
    vector_db: Arc<VectorDB>,
    db: Arc<Database>,
    dimensions: usize,
    embedder: Arc<dyn Embedder>, // Add this
}
```
### 3. Create Embedder Trait
```rust
pub trait Embedder: Send + Sync {
    fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>>;
}

// Implement for each option:
impl Embedder for OnnxEmbedder { /* ... */ }
impl Embedder for CandleEmbedder { /* ... */ }
// Note: the API embedders above are async; to satisfy this sync trait,
// block on the future internally (e.g. via a tokio runtime handle) or
// make the trait async with the async-trait crate.
impl Embedder for OpenAIEmbedder { /* ... */ }
impl Embedder for PythonEmbedder { /* ... */ }
```
### 4. Update Constructor
```rust
impl AgenticDB {
    pub fn new(options: DbOptions, embedder: Arc<dyn Embedder>) -> Result<Self> {
        // ... existing code ...
        Ok(Self {
            vector_db,
            db,
            dimensions: options.dimensions,
            embedder, // Use provided embedder
        })
    }
}
```
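For example, wiring in the ONNX embedder from Option 1 (constructor shape as sketched there; `options` as in your existing setup):

```rust
let embedder = Arc::new(OnnxEmbedder::new("model.onnx", "tokenizer.json")?);
let db = AgenticDB::new(options, embedder)?;
```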
### 5. Replace Hash Implementation
```rust
// Inside impl AgenticDB: delegate to the injected embedder.
fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>> {
    self.embedder.generate_text_embedding(text)
}
```
### 6. Update Tests
```rust
#[cfg(test)]
mod tests {
    use super::*;

    struct MockEmbedder;

    impl Embedder for MockEmbedder {
        fn generate_text_embedding(&self, text: &str) -> Result<Vec<f32>> {
            // Use the hash for tests only
            // ... hash implementation ...
        }
    }

    fn create_test_db() -> Result<AgenticDB> {
        let embedder = Arc::new(MockEmbedder);
        // `options` built however your existing tests construct DbOptions
        AgenticDB::new(options, embedder)
    }
}
```
## Verification
After integration, verify semantic search works:
```rust
#[test]
fn test_semantic_similarity() -> Result<()> {
    // `create_db_with_real_embeddings` is a stand-in for whichever
    // embedder you integrated above.
    let db = create_db_with_real_embeddings()?;

    // These should be similar with real embeddings
    let _skill1 = db.create_skill(
        "Dog Handler".to_string(),
        "Take care of dogs".to_string(),
        HashMap::new(),
        vec![],
    )?;
    let _skill2 = db.create_skill(
        "Cat Handler".to_string(),
        "Take care of cats".to_string(),
        HashMap::new(),
        vec![],
    )?;

    // Search with a semantic query
    let results = db.search_skills("pet care", 5)?;

    // Both should be found because "pet care" is semantically close
    // to both "take care of dogs" and "take care of cats".
    assert!(results.len() >= 2);
    // With hash embeddings, this would likely fail!
    Ok(())
}
```
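For a smoke test that bypasses the search path entirely, compare raw embedding vectors directly (the `cosine` helper below is illustrative):

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb).max(1e-12)
}

// With real embeddings, related texts score clearly higher:
// cosine(embed("take care of dogs"), embed("pet care"))
//   should exceed
// cosine(embed("take care of dogs"), embed("tax accounting"))
```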
## Performance Considerations
| Method | Latency (per embedding) | Cost | Offline | Quality |
|---|---|---|---|---|
| ONNX | ~5-20ms | Free | ✅ | ⭐⭐⭐⭐ |
| Candle | ~10-30ms | Free | ✅ | ⭐⭐⭐⭐ |
| OpenAI API | ~100-300ms | $0.02/1M tokens | ❌ | ⭐⭐⭐⭐⭐ |
| Cohere API | ~100-300ms | $0.10/1M tokens | ❌ | ⭐⭐⭐⭐ |
| Python | ~5-20ms | Free | ✅ | ⭐⭐⭐⭐ |
| Hash (current) | ~0.1ms | Free | ✅ | ❌ |
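A quick way to reproduce the latency column for your own setup, using whichever `embedder` you wired in above (names from the sketches in this document):

```rust
use std::time::Instant;

// Warm up: the first inference typically includes model load / kernel setup.
let _ = embedder.generate_text_embedding("warmup")?;

let start = Instant::now();
for _ in 0..100 {
    let _ = embedder.generate_text_embedding("the quick brown fox")?;
}
println!("avg latency: {:?}", start.elapsed() / 100);
```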
## Feature Flag (Future)
We plan to add a compile-time check:
```rust
#[cfg(not(feature = "real-embeddings"))]
compile_error!(
    "AgenticDB requires 'real-embeddings' feature for production use. \
     Current placeholder embeddings do NOT provide semantic search. \
     Enable with: cargo build --features real-embeddings"
);
```
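The matching feature declaration in Cargo.toml might look like this (illustrative; the optional dependencies depend on which backend you choose):

```toml
[features]
# Off by default, so a build without it hits the compile_error! above.
real-embeddings = ["dep:ort", "dep:tokenizers"]
```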
## Conclusion
DO NOT use the current AgenticDB implementation for semantic search in production.
The placeholder embeddings are ONLY suitable for:
- API structure testing
- Performance benchmarking (vector operations)
- Development without external dependencies
For any real semantic search use case, integrate one of the four real embedding options above.
See /examples/onnx-embeddings for a complete ONNX integration example.