14 KiB
Integration Guides
Complete integration guides for Agentic-Synth with popular tools and frameworks.
Table of Contents
- Ruvector Integration
- AgenticDB Integration
- LangChain Integration
- Midstreamer Integration
- OpenAI Integration
- Anthropic Claude Integration
- HuggingFace Integration
- Vector Database Integration
- Data Pipeline Integration
Ruvector Integration
Seamless integration with Ruvector vector database for high-performance vector operations.
Installation
npm install agentic-synth ruvector
Basic Integration
import { SynthEngine } from 'agentic-synth';
import { VectorDB } from 'ruvector';
// Initialize Ruvector
const db = new VectorDB({
indexType: 'hnsw',
dimensions: 384,
});
// Initialize SynthEngine with Ruvector
const synth = new SynthEngine({
provider: 'openai',
vectorDB: db,
});
// Generate and automatically insert with embeddings
await synth.generateAndInsert({
schema: productSchema,
count: 10000,
collection: 'products',
batchSize: 1000,
});
Advanced Configuration
import { RuvectorAdapter } from 'agentic-synth/integrations';
const adapter = new RuvectorAdapter(synth, db);
// Configure embedding generation
adapter.configure({
embeddingModel: 'text-embedding-3-small',
dimensions: 384,
batchSize: 1000,
normalize: true,
});
// Generate with custom indexing
await adapter.generateAndIndex({
schema: documentSchema,
count: 100000,
collection: 'documents',
indexConfig: {
type: 'hnsw',
M: 16,
efConstruction: 200,
},
});
Streaming to Ruvector
import { createVectorStream } from 'agentic-synth/integrations';
const stream = createVectorStream({
synth,
db,
collection: 'embeddings',
batchSize: 500,
});
for await (const item of synth.generateStream({ schema, count: 1000000 })) {
await stream.write(item);
}
await stream.end();
Augmenting Existing Collections
// Augment existing Ruvector collection with synthetic variations
await adapter.augmentCollection({
collection: 'user-queries',
variationsPerItem: 5,
augmentationType: 'paraphrase',
preserveSemantics: true,
});
AgenticDB Integration
Full compatibility with AgenticDB patterns for agent memory and skills.
Installation
npm install agentic-synth agenticdb
Agent Memory Generation
import { AgenticDBAdapter } from 'agentic-synth/integrations';
import { AgenticDB } from 'agenticdb';
const agenticDB = new AgenticDB();
const adapter = new AgenticDBAdapter(synth);
// Generate episodic memory for agents
const memory = await adapter.generateMemory({
agentId: 'assistant-1',
memoryType: 'episodic',
count: 5000,
timeRange: {
start: new Date('2024-01-01'),
end: new Date('2024-12-31'),
},
});
// Insert directly into AgenticDB
await agenticDB.memory.insertBatch(memory);
Skill Library Generation
// Generate synthetic skills for agent training
const skills = await adapter.generateSkills({
domains: ['coding', 'research', 'communication', 'analysis'],
skillsPerDomain: 100,
includeExamples: true,
});
await agenticDB.skills.insertBatch(skills);
Reflexion Memory
// Generate reflexion-style memory for self-improving agents
const reflexionMemory = await adapter.generateReflexionMemory({
agentId: 'learner-1',
trajectories: 1000,
includeVerdict: true,
includeMemoryShort: true,
includeMemoryLong: true,
});
await agenticDB.reflexion.insertBatch(reflexionMemory);
LangChain Integration
Use Agentic-Synth with LangChain for agent training and RAG systems.
Installation
npm install agentic-synth langchain
Document Generation
import { LangChainAdapter } from 'agentic-synth/integrations';
import { Document } from 'langchain/document';
import { VectorStore } from 'langchain/vectorstores';
const adapter = new LangChainAdapter(synth);
// Generate LangChain documents
const documents = await adapter.generateDocuments({
schema: documentSchema,
count: 10000,
includeMetadata: true,
});
// Use with LangChain VectorStore
const vectorStore = await VectorStore.fromDocuments(
documents,
embeddings
);
RAG Chain Training Data
import { RetrievalQAChain } from 'langchain/chains';
// Generate QA pairs for RAG training
const qaPairs = await adapter.generateRAGTrainingData({
documents: existingDocuments,
questionsPerDoc: 10,
questionTypes: ['factual', 'analytical', 'multi-hop'],
});
// Train RAG chain
const chain = RetrievalQAChain.fromLLM(llm, vectorStore.asRetriever());
Agent Memory for LangChain Agents
import { BufferMemory } from 'langchain/memory';
// Generate conversation history for memory
const conversationHistory = await adapter.generateConversationHistory({
domain: 'customer-support',
interactions: 1000,
format: 'langchain-memory',
});
const memory = new BufferMemory({
chatHistory: conversationHistory,
});
Midstreamer Integration
Real-time streaming integration with Midstreamer for live data generation.
Installation
npm install agentic-synth midstreamer
Real-Time Data Streaming
import { MidstreamerAdapter } from 'agentic-synth/integrations';
import { Midstreamer } from 'midstreamer';
const midstreamer = new Midstreamer({
region: 'us-east-1',
streamName: 'synthetic-data-stream',
});
const adapter = new MidstreamerAdapter(synth, midstreamer);
// Stream synthetic data in real-time
await adapter.streamGeneration({
schema: eventSchema,
ratePerSecond: 1000,
duration: 3600, // 1 hour
});
Event Stream Simulation
// Simulate realistic event streams
await adapter.simulateEventStream({
schema: userEventSchema,
pattern: 'diurnal', // Daily activity pattern
peakHours: [9, 12, 15, 20],
baselineRate: 100,
peakMultiplier: 5,
duration: 86400, // 24 hours
});
Burst Traffic Simulation
// Simulate traffic spikes
await adapter.simulateBurstTraffic({
schema: requestSchema,
baselineRate: 100,
bursts: [
{ start: 3600, duration: 600, multiplier: 50 }, // 50x spike
{ start: 7200, duration: 300, multiplier: 100 }, // 100x spike
],
});
OpenAI Integration
Configure Agentic-Synth to use OpenAI models for generation.
Installation
npm install agentic-synth openai
Basic Configuration
import { SynthEngine } from 'agentic-synth';
const synth = new SynthEngine({
provider: 'openai',
model: 'gpt-4',
apiKey: process.env.OPENAI_API_KEY,
temperature: 0.8,
maxTokens: 2000,
});
Using OpenAI Embeddings
const synth = new SynthEngine({
provider: 'openai',
model: 'gpt-4',
embeddingModel: 'text-embedding-3-small',
embeddingDimensions: 384,
});
// Embeddings are automatically generated
const data = await synth.generate({
schema: schemaWithEmbeddings,
count: 10000,
});
Function Calling for Structured Data
import { OpenAIAdapter } from 'agentic-synth/integrations';
const adapter = new OpenAIAdapter(synth);
// Use OpenAI function calling for perfect structure compliance
const data = await adapter.generateWithFunctions({
schema: complexSchema,
count: 1000,
functionDefinition: {
name: 'generate_item',
parameters: schemaToJSONSchema(complexSchema),
},
});
Anthropic Claude Integration
Use Anthropic Claude for high-quality synthetic data generation.
Installation
npm install agentic-synth @anthropic-ai/sdk
Configuration
import { SynthEngine } from 'agentic-synth';
const synth = new SynthEngine({
provider: 'anthropic',
model: 'claude-3-opus-20240229',
apiKey: process.env.ANTHROPIC_API_KEY,
temperature: 0.8,
maxTokens: 4000,
});
Long-Form Content Generation
// Claude excels at long-form, coherent content
const articles = await synth.generate({
schema: Schema.define({
name: 'Article',
type: 'object',
properties: {
title: { type: 'string' },
content: { type: 'string', minLength: 5000 }, // Long-form
summary: { type: 'string' },
keyPoints: { type: 'array', items: { type: 'string' } },
},
}),
count: 100,
});
HuggingFace Integration
Use open-source models from HuggingFace for cost-effective generation.
Installation
npm install agentic-synth @huggingface/inference
Configuration
import { SynthEngine } from 'agentic-synth';
const synth = new SynthEngine({
provider: 'huggingface',
model: 'mistralai/Mistral-7B-Instruct-v0.2',
apiKey: process.env.HF_API_KEY,
});
Using Local Models
const synth = new SynthEngine({
provider: 'huggingface',
model: 'local',
modelPath: './models/llama-2-7b',
deviceMap: 'auto',
});
Vector Database Integration
Integration with popular vector databases beyond Ruvector.
Pinecone
import { PineconeAdapter } from 'agentic-synth/integrations';
import { PineconeClient } from '@pinecone-database/pinecone';
const pinecone = new PineconeClient();
await pinecone.init({ apiKey: process.env.PINECONE_API_KEY });
const adapter = new PineconeAdapter(synth, pinecone);
await adapter.generateAndUpsert({
schema: embeddingSchema,
count: 100000,
index: 'my-index',
namespace: 'synthetic-data',
});
Weaviate
import { WeaviateAdapter } from 'agentic-synth/integrations';
import weaviate from 'weaviate-ts-client';
const client = weaviate.client({ scheme: 'http', host: 'localhost:8080' });
const adapter = new WeaviateAdapter(synth, client);
await adapter.generateAndImport({
schema: documentSchema,
count: 50000,
className: 'Document',
});
Qdrant
import { QdrantAdapter } from 'agentic-synth/integrations';
import { QdrantClient } from '@qdrant/js-client-rest';
const client = new QdrantClient({ url: 'http://localhost:6333' });
const adapter = new QdrantAdapter(synth, client);
await adapter.generateAndInsert({
schema: vectorSchema,
count: 200000,
collection: 'synthetic-vectors',
});
Data Pipeline Integration
Integrate with data engineering pipelines and ETL tools.
Apache Airflow
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import subprocess
def generate_synthetic_data():
subprocess.run([
'npx', 'agentic-synth', 'generate',
'--schema', 'customer-support',
'--count', '10000',
'--output', '/data/synthetic.jsonl'
])
dag = DAG(
'synthetic_data_generation',
start_date=datetime(2024, 1, 1),
schedule_interval='@daily'
)
generate_task = PythonOperator(
task_id='generate',
python_callable=generate_synthetic_data,
dag=dag
)
dbt (Data Build Tool)
# dbt_project.yml
models:
synthetic_data:
materialized: table
pre-hook:
- "{{ run_agentic_synth('customer_events', 10000) }}"
# macros/agentic_synth.sql
{% macro run_agentic_synth(schema_name, count) %}
{{ run_command('npx agentic-synth generate --schema ' ~ schema_name ~ ' --count ' ~ count) }}
{% endmacro %}
Prefect
from prefect import flow, task
import subprocess
@task
def generate_data(schema: str, count: int):
result = subprocess.run([
'npx', 'agentic-synth', 'generate',
'--schema', schema,
'--count', str(count),
'--output', f'/data/{schema}.jsonl'
])
return result.returncode == 0
@flow
def synthetic_data_pipeline():
generate_data('users', 10000)
generate_data('products', 50000)
generate_data('interactions', 100000)
synthetic_data_pipeline()
AWS Step Functions
{
"Comment": "Synthetic Data Generation Pipeline",
"StartAt": "GenerateData",
"States": {
"GenerateData": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:agentic-synth-generator",
"Parameters": {
"schema": "customer-events",
"count": 100000,
"output": "s3://my-bucket/synthetic/"
},
"Next": "ValidateQuality"
},
"ValidateQuality": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:quality-validator",
"End": true
}
}
}
Custom Integration Template
Create custom integrations for your tools:
import { BaseIntegration } from 'agentic-synth/integrations';
export class MyCustomIntegration extends BaseIntegration {
constructor(
private synth: SynthEngine,
private customTool: any
) {
super();
}
async generateAndExport(options: GenerateOptions) {
// Generate data
const data = await this.synth.generate(options);
// Custom export logic
for (const item of data.data) {
await this.customTool.insert(item);
}
return {
count: data.metadata.count,
quality: data.metadata.quality,
};
}
async streamToCustomTool(options: GenerateOptions) {
for await (const item of this.synth.generateStream(options)) {
await this.customTool.stream(item);
}
}
}
Best Practices
- Connection Pooling: Reuse database connections across generations
- Batch Operations: Use batching for all database insertions (1000-5000 items)
- Error Handling: Implement retry logic for API and database failures
- Rate Limiting: Respect API rate limits with exponential backoff
- Monitoring: Track generation metrics and quality scores
- Resource Management: Close connections and cleanup resources properly
- Configuration: Externalize configuration for different environments
Troubleshooting
Common Issues
Issue: Slow vector insertions Solution: Increase batch size, use parallel workers
Issue: API rate limits Solution: Reduce generation rate, implement exponential backoff
Issue: Memory errors with large datasets Solution: Use streaming mode, process in smaller chunks
Issue: Low quality synthetic data Solution: Tune temperature, validate schemas, increase quality threshold
Examples Repository
Complete integration examples: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/examples/integrations
Support
- GitHub Issues: https://github.com/ruvnet/ruvector/issues
- Discord: https://discord.gg/ruvnet
- Email: support@ruv.io