Integration Guide
This document describes how agentic-synth integrates with external tools and libraries.
Integration Overview
Agentic-synth supports optional integrations with:
- Midstreamer - Streaming data pipelines
- Agentic-Robotics - Automation workflows
- Ruvector - Vector database for embeddings
All integrations are:
- Optional - Package works without them
- Peer dependencies - Installed separately
- Runtime detected - Gracefully degrade if unavailable
- Adapter-based - Clean integration boundaries
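Runtime detection of optional peer dependencies can be pictured roughly as follows. This is a minimal sketch, not agentic-synth's actual detection code: `ModuleLoader`, `detectIntegrations` and `fakeLoader` are hypothetical names, and the loader is injected so the example runs without any of the packages installed. In a real project the loader would typically wrap `require` or a dynamic `import()`.

```typescript
// Hypothetical loader signature: throws when the package cannot be resolved.
type ModuleLoader = (packageName: string) => unknown;

interface DetectionResult {
  packageName: string;
  available: boolean;
}

// Try each optional package once and record whether it loaded.
function detectIntegrations(
  packageNames: string[],
  load: ModuleLoader
): DetectionResult[] {
  return packageNames.map((packageName) => {
    try {
      load(packageName);
      return { packageName, available: true };
    } catch {
      // A failed load marks the integration unavailable instead of crashing.
      return { packageName, available: false };
    }
  });
}

// Stand-in loader for the demo: pretends only "ruvector" is installed.
const fakeLoader: ModuleLoader = (packageName) => {
  if (packageName !== 'ruvector') {
    throw new Error(`Cannot find module '${packageName}'`);
  }
  return {};
};

const detected = detectIntegrations(
  ['midstreamer', 'agentic-robotics', 'ruvector'],
  fakeLoader
);
const availableNames = detected
  .filter((entry) => entry.available)
  .map((entry) => entry.packageName);
console.log(availableNames);
```

Keeping the loader as a parameter also makes the detection logic easy to test with and without each integration present.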
Midstreamer Integration
Purpose
Stream generated data through pipelines for real-time processing.
Installation
npm install midstreamer
Usage
Basic Streaming
import { AgenticSynth } from 'agentic-synth';
import { enableMidstreamer } from 'agentic-synth/integrations';
const synth = new AgenticSynth();
// Enable midstreamer integration
enableMidstreamer(synth, {
pipeline: 'synthetic-data-stream',
bufferSize: 1000,
flushInterval: 5000 // ms
});
// Generate with streaming
const result = await synth.generate('timeseries', {
count: 10000,
stream: true // Automatically streams via midstreamer
});
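To make the `bufferSize` and `flushInterval` options more concrete, here is a minimal sketch of one way such a buffer could behave. It is an assumption about the semantics (flush when the buffer is full or when the interval has elapsed), not midstreamer's implementation; `FlushBuffer` is a hypothetical class, and the clock is passed in explicitly so the behaviour is deterministic.

```typescript
// Hypothetical buffer illustrating bufferSize and flushInterval semantics.
class FlushBuffer<T> {
  private items: T[] = [];
  private lastFlush: number;
  private bufferSize: number;
  private flushIntervalMs: number;
  private sink: (batch: T[]) => void;

  constructor(
    bufferSize: number,
    flushIntervalMs: number,
    sink: (batch: T[]) => void,
    startTime: number
  ) {
    this.bufferSize = bufferSize;
    this.flushIntervalMs = flushIntervalMs;
    this.sink = sink;
    this.lastFlush = startTime;
  }

  // Add one record; flush when the buffer is full or the interval has elapsed.
  push(item: T, now: number): void {
    this.items.push(item);
    const full = this.items.length >= this.bufferSize;
    const stale = now - this.lastFlush >= this.flushIntervalMs;
    if (full || stale) {
      this.flush(now);
    }
  }

  // Emit whatever is buffered; call once more at the end of the stream.
  flush(now: number): void {
    if (this.items.length > 0) {
      this.sink(this.items);
      this.items = [];
    }
    this.lastFlush = now;
  }
}

const batches: number[][] = [];
const demoBuffer = new FlushBuffer<number>(3, 5000, (batch) => {
  batches.push(batch);
}, 0);

demoBuffer.push(1, 100);
demoBuffer.push(2, 200);
demoBuffer.push(3, 300);  // buffer full: flushes [1, 2, 3]
demoBuffer.push(4, 400);
demoBuffer.push(5, 6000); // interval elapsed: flushes [4, 5]
demoBuffer.push(6, 6100);
demoBuffer.flush(6200);   // end of stream: flushes [6]
console.log(JSON.stringify(batches));
```

Larger buffers mean fewer, bigger writes; a shorter interval bounds how long a record can sit in memory before it reaches the pipeline.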
Custom Pipeline
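import { createPipeline } from 'midstreamer';
import { AgenticSynth } from 'agentic-synth';
import { enableMidstreamer } from 'agentic-synth/integrations';
const synth = new AgenticSynth();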
const pipeline = createPipeline({
name: 'data-processing',
transforms: [
{ type: 'filter', predicate: (data) => data.value > 0 },
{ type: 'map', fn: (data) => ({ ...data, doubled: data.value * 2 }) }
],
outputs: [
{ type: 'file', path: './output/processed.jsonl' },
{ type: 'http', url: 'https://api.example.com/data' }
]
});
enableMidstreamer(synth, {
pipeline
});
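The `filter` and `map` transforms above can be read as a chain applied to each record in order. The sketch below shows that reading under stated assumptions: `DataRow`, `PipelineTransform` and `applyTransforms` are hypothetical names that mirror the config shape, and midstreamer's real execution model may differ.

```typescript
type DataRow = { value: number; [key: string]: unknown };

// Mirrors the two transform shapes used in the pipeline config.
type PipelineTransform =
  | { type: 'filter'; predicate: (row: DataRow) => boolean }
  | { type: 'map'; fn: (row: DataRow) => DataRow };

// Run one record through the transforms in order.
// Returns undefined as soon as a filter rejects the record.
function applyTransforms(
  row: DataRow,
  transforms: PipelineTransform[]
): DataRow | undefined {
  let current = row;
  for (const transform of transforms) {
    if (transform.type === 'filter') {
      if (!transform.predicate(current)) {
        return undefined;
      }
    } else {
      current = transform.fn(current);
    }
  }
  return current;
}

const demoTransforms: PipelineTransform[] = [
  { type: 'filter', predicate: (row) => row.value > 0 },
  { type: 'map', fn: (row) => ({ ...row, doubled: row.value * 2 }) },
];

const keptRow = applyTransforms({ value: 21 }, demoTransforms);
const droppedRow = applyTransforms({ value: -1 }, demoTransforms);
console.log(keptRow, droppedRow);
```

Processing one record at a time keeps memory flat, which is the main reason to put transforms in the stream rather than post-processing the full result.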
CLI Usage
npx agentic-synth generate events \
--count 10000 \
--stream \
--stream-to midstreamer \
--stream-pipeline data-processing
API Reference
interface MidstreamerAdapter {
isAvailable(): boolean;
stream(data: AsyncIterator<any>): Promise<void>;
createPipeline(config: PipelineConfig): StreamPipeline;
}
Agentic-Robotics Integration
Purpose
Integrate synthetic data generation into automation workflows.
Installation
npm install agentic-robotics
Usage
Register Workflows
import { AgenticSynth } from 'agentic-synth';
import { enableAgenticRobotics } from 'agentic-synth/integrations';
const synth = new AgenticSynth();
enableAgenticRobotics(synth, {
workflowEngine: 'default'
});
// Register data generation workflow
synth.integrations.robotics.registerWorkflow('daily-timeseries', async (params) => {
return await synth.generate('timeseries', {
count: params.count || 1000,
startTime: params.startTime,
endTime: params.endTime
});
});
// Trigger workflow
await synth.integrations.robotics.triggerWorkflow('daily-timeseries', {
count: 5000,
startTime: '2024-01-01',
endTime: '2024-01-31'
});
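Conceptually, `registerWorkflow` and `triggerWorkflow` behave like a name-to-handler registry. The following is a minimal sketch of that idea; `WorkflowRegistry` and its methods are hypothetical and are not the agentic-robotics API. The stand-in handler returns an array of zeros instead of calling `synth.generate`.

```typescript
type WorkflowParams = Record<string, unknown>;
type WorkflowHandler<R> = (params: WorkflowParams) => Promise<R>;

// Hypothetical registry mapping workflow names to async handlers.
class WorkflowRegistry<R> {
  private handlers = new Map<string, WorkflowHandler<R>>();

  register(workflowName: string, handler: WorkflowHandler<R>): void {
    this.handlers.set(workflowName, handler);
  }

  // Not declared async, so an unknown name fails synchronously.
  trigger(workflowName: string, params: WorkflowParams = {}): Promise<R> {
    const handler = this.handlers.get(workflowName);
    if (!handler) {
      throw new Error(`Unknown workflow: ${workflowName}`);
    }
    return handler(params);
  }

  list(): string[] {
    return Array.from(this.handlers.keys());
  }
}

const demoRegistry = new WorkflowRegistry<number[]>();

// Stand-in for a generator call: produces `count` placeholder values.
demoRegistry.register('daily-timeseries', async (params) => {
  const count = typeof params.count === 'number' ? params.count : 1000;
  return new Array<number>(count).fill(0);
});

let unknownWorkflowError = '';
try {
  demoRegistry.trigger('missing-workflow');
} catch (err) {
  unknownWorkflowError = (err as Error).message;
}

demoRegistry
  .trigger('daily-timeseries', { count: 5 })
  .then((rows) => console.log(`generated ${rows.length} rows`));
```

Failing fast on an unknown name is a design choice in this sketch; it surfaces typos in workflow names at the call site rather than as a silent no-op.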
Scheduled Generation
import { createSchedule } from 'agentic-robotics';
const schedule = createSchedule({
workflow: 'daily-timeseries',
cron: '0 0 * * *', // Daily at midnight
params: {
count: 10000
}
});
synth.integrations.robotics.addSchedule(schedule);
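The cron string `'0 0 * * *'` uses the standard five fields: minute, hour, day of month, month, day of week. As an illustration of how such an expression selects times, here is a deliberately minimal matcher. `matchesCron` is a hypothetical helper that supports only `*` and single integers and evaluates in UTC; a real scheduler also handles ranges, lists, steps, and usually local time zones.

```typescript
// Minimal matcher for five-field cron expressions:
// minute, hour, day-of-month, month, day-of-week.
// Supports only "*" and single integers (assumption for this sketch).
function matchesCron(expression: string, date: Date): boolean {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const actual = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1, // cron months are 1-12, JavaScript months are 0-11
    date.getUTCDay(),       // 0 = Sunday
  ];
  return fields.every(
    (field, index) => field === '*' || Number(field) === actual[index]
  );
}

const utcMidnight = new Date(Date.UTC(2024, 0, 1, 0, 0));
const utcNoon = new Date(Date.UTC(2024, 0, 1, 12, 0));

const firesAtMidnight = matchesCron('0 0 * * *', utcMidnight);
const firesAtNoon = matchesCron('0 0 * * *', utcNoon);
console.log(firesAtMidnight, firesAtNoon);
```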
CLI Usage
# Register workflow
npx agentic-synth workflow register \
--name daily-data \
--generator timeseries \
--options '{"count": 1000}'
# Trigger workflow
npx agentic-synth workflow trigger daily-data
API Reference
interface AgenticRoboticsAdapter {
isAvailable(): boolean;
registerWorkflow(name: string, generator: Generator): void;
triggerWorkflow(name: string, options: any): Promise<void>;
addSchedule(schedule: Schedule): void;
}
Ruvector Integration
Purpose
Store generated data in a vector database for similarity search and retrieval.
Installation
# Ruvector is in the same monorepo, no external install needed
Usage
Basic Vector Storage
import { AgenticSynth } from 'agentic-synth';
import { enableRuvector } from 'agentic-synth/integrations';
const synth = new AgenticSynth();
enableRuvector(synth, {
dbPath: './data/vectors.db',
collectionName: 'synthetic-data',
embeddingModel: 'text-embedding-004',
dimensions: 768
});
// Generate and automatically vectorize
const result = await synth.generate('structured', {
count: 1000,
vectorize: true // Automatically stores in ruvector
});
// Search similar records
const similar = await synth.integrations.ruvector.search({
query: 'sample query',
limit: 10,
threshold: 0.8
});
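The `limit` and `threshold` options are easiest to understand through a brute-force version of similarity search. The sketch below assumes cosine similarity and treats `threshold` as a minimum score; both are assumptions about the semantics, and `bruteForceSearch` is a hypothetical function. A production vector database would use an index rather than scanning every vector.

```typescript
interface StoredVector {
  id: string;
  vector: number[];
}

interface SearchHit {
  id: string;
  score: number;
}

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Dimension mismatch');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector, drop low scores, return the best `limit` hits.
function bruteForceSearch(
  query: number[],
  items: StoredVector[],
  limit: number,
  threshold: number
): SearchHit[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const storedVectors: StoredVector[] = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.9, 0.1] },
  { id: 'c', vector: [0, 1] },
];

const searchHits = bruteForceSearch([1, 0], storedVectors, 10, 0.8);
console.log(searchHits.map((hit) => hit.id));
```

Raising `threshold` trades recall for precision, while `limit` only caps how many of the surviving hits are returned.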
Custom Embeddings
enableRuvector(synth, {
dbPath: './data/vectors.db',
embeddingFn: async (data) => {
// Custom embedding logic; generateEmbedding is a placeholder for your own embedding function
const text = JSON.stringify(data);
return await generateEmbedding(text);
}
});
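For local testing it can help to have an `embeddingFn` that needs no model or network access. The following is a toy stand-in, not a real embedding model: `toyEmbedding` is a hypothetical helper that hashes character trigrams into a fixed-size, L2-normalised vector. It captures surface overlap only and should not be used where semantic quality matters.

```typescript
// Toy embedding: hash character trigrams into `dimensions` buckets,
// then L2-normalise so cosine similarity behaves sensibly.
function toyEmbedding(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  for (let i = 0; i + 3 <= text.length; i++) {
    const gram = text.slice(i, i + 3);
    let hash = 0;
    for (let j = 0; j < gram.length; j++) {
      // Keep the hash in unsigned 32-bit range.
      hash = (hash * 31 + gram.charCodeAt(j)) >>> 0;
    }
    vector[hash % dimensions] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector : vector.map((x) => x / norm);
}

// Same shape as the embeddingFn option: record in, vector out.
const toyEmbeddingFn = async (data: unknown): Promise<number[]> =>
  toyEmbedding(JSON.stringify(data), 768);

const sampleVector = toyEmbedding('{"id":1,"content":"hello"}', 8);
const sampleVectorAgain = toyEmbedding('{"id":1,"content":"hello"}', 8);
const sampleNorm = Math.sqrt(sampleVector.reduce((sum, x) => sum + x * x, 0));
console.log(sampleVector.length, sampleNorm.toFixed(3));
```

Whatever function is used, its output length has to match the `dimensions` configured for the collection.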
Semantic Search
// Generate data with metadata for better search
const result = await synth.generate('structured', {
count: 1000,
schema: {
id: { type: 'string', format: 'uuid' },
content: { type: 'string' },
category: { type: 'enum', enum: ['tech', 'science', 'art'] }
},
vectorize: true
});
// Search by content similarity
const results = await synth.integrations.ruvector.search({
query: 'artificial intelligence',
filter: { category: 'tech' },
limit: 20
});
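The `filter` option narrows candidates by metadata before or alongside similarity ranking. A minimal sketch of exact-match filtering is shown below; `matchesFilter` is a hypothetical helper, and the assumption that filters use strict equality on every key may not hold for ruvector, which could support richer operators.

```typescript
type RecordMetadata = { [key: string]: unknown };

// True when every key in the filter strictly equals the record's value.
function matchesFilter(
  metadata: RecordMetadata,
  filter: RecordMetadata
): boolean {
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

const sampleRecords: RecordMetadata[] = [
  { id: '1', category: 'tech' },
  { id: '2', category: 'art' },
  { id: '3', category: 'tech' },
];

const techIds = sampleRecords
  .filter((record) => matchesFilter(record, { category: 'tech' }))
  .map((record) => record.id);
console.log(techIds);
```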
CLI Usage
# Generate with vectorization
npx agentic-synth generate structured \
--count 1000 \
--schema ./schema.json \
--vectorize-with ruvector \
--vector-db ./data/vectors.db
# Search vectors
npx agentic-synth vector search \
--query "sample query" \
--db ./data/vectors.db \
--limit 10
API Reference
interface RuvectorAdapter {
isAvailable(): boolean;
store(data: any, metadata?: any): Promise<string>;
storeBatch(data: any[], metadata?: any[]): Promise<string[]>;
search(query: SearchQuery, limit?: number): Promise<SearchResult[]>;
delete(id: string): Promise<void>;
update(id: string, data: any): Promise<void>;
}
interface SearchQuery {
query: string | number[];
filter?: Record<string, any>;
threshold?: number;
limit?: number; // the usage examples pass limit inside the query object
}
interface SearchResult {
id: string;
score: number;
data: any;
metadata?: any;
}
Combined Integration Example
Multi-Integration Workflow
import { AgenticSynth } from 'agentic-synth';
import {
enableMidstreamer,
enableAgenticRobotics,
enableRuvector
} from 'agentic-synth/integrations';
const synth = new AgenticSynth({
apiKeys: {
gemini: process.env.GEMINI_API_KEY
}
});
// Enable all integrations
enableMidstreamer(synth, {
pipeline: 'data-stream'
});
enableAgenticRobotics(synth, {
workflowEngine: 'default'
});
enableRuvector(synth, {
dbPath: './data/vectors.db'
});
// Register comprehensive workflow
synth.integrations.robotics.registerWorkflow('process-and-store', async (params) => {
// Generate data
const result = await synth.generate('structured', {
count: params.count,
stream: true, // Streams via midstreamer
vectorize: true // Stores in ruvector
});
return result;
});
// Execute workflow
await synth.integrations.robotics.triggerWorkflow('process-and-store', {
count: 10000
});
// Data is now:
// 1. Generated via AI models
// 2. Streamed through midstreamer pipeline
// 3. Stored in ruvector for search
Integration Availability Detection
Runtime Detection
import { AgenticSynth } from 'agentic-synth';
const synth = new AgenticSynth();
// Check which integrations are available
if (synth.integrations.hasMidstreamer()) {
console.log('Midstreamer is available');
}
if (synth.integrations.hasAgenticRobotics()) {
console.log('Agentic-Robotics is available');
}
if (synth.integrations.hasRuvector()) {
console.log('Ruvector is available');
}
Graceful Degradation
// Code works with or without integrations
const result = await synth.generate('timeseries', {
count: 1000,
stream: true, // Only streams if midstreamer available
vectorize: true // Only vectorizes if ruvector available
});
// Always works, integrations are optional
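One way to picture graceful degradation is as a step that reconciles requested flags with what is actually installed. The sketch below is an assumption about the behaviour, not the library's code; `resolveFlags`, `GenerateFlags` and `IntegrationAvailability` are hypothetical names.

```typescript
interface GenerateFlags {
  stream?: boolean;
  vectorize?: boolean;
}

interface IntegrationAvailability {
  midstreamer: boolean;
  ruvector: boolean;
}

interface ResolvedFlags {
  stream: boolean;
  vectorize: boolean;
  warnings: string[];
}

// Turn off any requested feature whose integration is missing,
// and record a warning instead of throwing.
function resolveFlags(
  requested: GenerateFlags,
  available: IntegrationAvailability
): ResolvedFlags {
  const warnings: string[] = [];
  const stream = Boolean(requested.stream) && available.midstreamer;
  const vectorize = Boolean(requested.vectorize) && available.ruvector;
  if (requested.stream && !stream) {
    warnings.push('stream requested but midstreamer is unavailable; skipping');
  }
  if (requested.vectorize && !vectorize) {
    warnings.push('vectorize requested but ruvector is unavailable; skipping');
  }
  return { stream, vectorize, warnings };
}

const resolvedFlags = resolveFlags(
  { stream: true, vectorize: true },
  { midstreamer: false, ruvector: true }
);
console.log(resolvedFlags);
```

Collecting warnings rather than failing keeps generation working while still leaving a trace of which features were skipped.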
Custom Integrations
Creating Custom Integration Adapters
import { IntegrationAdapter } from 'agentic-synth/integrations';
class MyCustomAdapter implements IntegrationAdapter {
readonly name = 'my-custom-integration';
private available = false;
constructor(private config: any) {
this.detectAvailability();
}
isAvailable(): boolean {
return this.available;
}
async initialize(): Promise<void> {
// Setup logic
}
async processData(data: any[]): Promise<void> {
// Custom processing logic
}
async shutdown(): Promise<void> {
// Cleanup logic
}
private detectAvailability(): void {
try {
require('my-custom-package');
this.available = true;
} catch {
this.available = false;
}
}
}
// Register custom adapter
synth.integrations.register(new MyCustomAdapter(config));
Configuration
Integration Configuration File
{
"integrations": {
"midstreamer": {
"enabled": true,
"pipeline": "synthetic-data-stream",
"bufferSize": 1000,
"flushInterval": 5000,
"transforms": [
{
"type": "filter",
"predicate": "data.value > 0"
}
]
},
"agenticRobotics": {
"enabled": true,
"workflowEngine": "default",
"defaultWorkflow": "data-generation",
"schedules": [
{
"name": "daily-data",
"cron": "0 0 * * *",
"workflow": "daily-timeseries"
}
]
},
"ruvector": {
"enabled": true,
"dbPath": "./data/vectors.db",
"collectionName": "synthetic-data",
"embeddingModel": "text-embedding-004",
"dimensions": 768,
"indexType": "hnsw",
"distanceMetric": "cosine"
}
}
}
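A configuration file like the one above can be inspected before any integration is enabled. The sketch below parses a JSON string and lists the sections whose `enabled` flag is `true`. `enabledIntegrations` is a hypothetical helper, and treating a missing `enabled` key as disabled is an assumption of this example.

```typescript
interface IntegrationSection {
  enabled?: boolean;
  [option: string]: unknown;
}

interface IntegrationsConfig {
  integrations?: { [integrationName: string]: IntegrationSection };
}

// Return the names of all sections with "enabled": true.
function enabledIntegrations(json: string): string[] {
  const parsed = JSON.parse(json) as IntegrationsConfig;
  const sections = parsed.integrations ?? {};
  return Object.keys(sections).filter(
    (key) => sections[key]?.enabled === true
  );
}

// Inline sample standing in for the contents of a config file.
const sampleConfig = JSON.stringify({
  integrations: {
    midstreamer: { enabled: true, bufferSize: 1000 },
    agenticRobotics: { enabled: false },
    ruvector: { enabled: true, dimensions: 768 },
  },
});

const enabledNames = enabledIntegrations(sampleConfig);
console.log(enabledNames);
```

Note that JSON cannot hold functions, so a transform predicate such as `"data.value > 0"` arrives as a plain string and has to be interpreted by whatever loads the config.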
Troubleshooting
Integration Not Detected
Problem: Integration marked as unavailable
Solutions:
- Ensure peer dependency is installed: npm install <package>
- Check import/require paths are correct
- Verify package version compatibility
- Check logs for initialization errors
Performance Issues
Problem: Slow generation with integrations
Solutions:
- Adjust buffer sizes for streaming
- Use batch operations instead of individual calls
- Enable caching to avoid redundant processing
- Profile with synth.integrations.getMetrics()
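Batching usually comes down to splitting records into fixed-size chunks and making one call per chunk, for example one `storeBatch` call instead of many `store` calls. A minimal sketch of the chunking step follows; `chunkRecords` is a hypothetical helper, and the right chunk size depends on record size and the target integration.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunkRecords<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

const sevenRecords = [1, 2, 3, 4, 5, 6, 7];
const recordChunks = chunkRecords(sevenRecords, 3);
console.log(recordChunks.map((chunk) => chunk.length));
```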
Memory Issues
Problem: High memory usage with integrations
Solutions:
- Use streaming mode instead of loading all data
- Adjust batch sizes to smaller values
- Clear caches periodically
- Configure TTL for cached data
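A TTL bounds how long cached entries can occupy memory. The sketch below shows the idea with a small in-memory cache; `TtlCache` is a hypothetical class, not agentic-synth's cache, and the clock is injectable so expiry can be demonstrated without waiting.

```typescript
// Minimal in-memory cache whose entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for missing or expired keys; expired keys are evicted.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Remove all expired entries and report how many were dropped.
  sweep(): number {
    let removed = 0;
    const currentTime = this.now();
    this.entries.forEach((entry, key) => {
      if (currentTime >= entry.expiresAt) {
        this.entries.delete(key);
        removed += 1;
      }
    });
    return removed;
  }

  get size(): number {
    return this.entries.size;
  }
}

let fakeClock = 0;
const demoCache = new TtlCache<string>(1000, () => fakeClock);

demoCache.set('a', 'first');           // expires at 1000
fakeClock = 500;
demoCache.set('b', 'second');          // expires at 1500
fakeClock = 1200;
const valueA = demoCache.get('a');     // already expired
const valueB = demoCache.get('b');     // still valid
fakeClock = 2000;
demoCache.set('c', 'third');           // expires at 3000
const sweptCount = demoCache.sweep();  // evicts 'b'
console.log(valueA, valueB, sweptCount, demoCache.size);
```

Calling `sweep()` on a timer covers the "clear caches periodically" advice, since lazy eviction in `get` alone never frees keys that are not read again.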
Best Practices
- Optional Dependencies: Always check isAvailable() before using integration features
- Error Handling: Wrap integration calls in try-catch blocks
- Configuration: Use config files for complex integration setups
- Testing: Test with and without integrations enabled
- Documentation: Document which integrations your workflows depend on
- Monitoring: Track integration metrics and performance
- Versioning: Pin peer dependency versions for stability
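The first two practices, checking `isAvailable()` and wrapping calls in try-catch, combine naturally into a small helper. This is a minimal sketch: `withIntegration` and the three demo adapters are hypothetical, and only the `isAvailable()` method comes from the adapter interfaces documented above.

```typescript
interface OptionalAdapter {
  isAvailable(): boolean;
}

// Run `action` only when the adapter is available; return `fallback`
// when it is unavailable or when the action throws.
function withIntegration<A extends OptionalAdapter, R>(
  adapter: A,
  action: (adapter: A) => R,
  fallback: R
): R {
  if (!adapter.isAvailable()) {
    return fallback;
  }
  try {
    return action(adapter);
  } catch (err) {
    console.warn(`Integration call failed: ${(err as Error).message}`);
    return fallback;
  }
}

// Hypothetical adapters for the demo.
const missingAdapter = {
  isAvailable: () => false,
  search: (text: string): string[] => [`hit for ${text}`],
};
const brokenAdapter = {
  isAvailable: () => true,
  search: (_text: string): string[] => {
    throw new Error('index not initialised');
  },
};
const workingAdapter = {
  isAvailable: () => true,
  search: (text: string): string[] => [`hit for ${text}`],
};

const noHits: string[] = [];
const fromMissing = withIntegration(missingAdapter, (a) => a.search('ai'), noHits);
const fromBroken = withIntegration(brokenAdapter, (a) => a.search('ai'), noHits);
const fromWorking = withIntegration(workingAdapter, (a) => a.search('ai'), noHits);
console.log(fromMissing, fromBroken, fromWorking);
```

For adapters whose methods return promises, the same pattern applies with an `async` helper that awaits the action inside the try block.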
Example Projects
See the /examples directory for complete integration examples:
- examples/midstreamer-pipeline/ - Real-time data streaming
- examples/robotics-workflow/ - Automated generation workflows
- examples/ruvector-search/ - Vector search and retrieval
- examples/full-integration/ - All integrations combined