dearsky/wifi-densepose

ruv cd5943df23 Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00

Integration Guide

This document describes how agentic-synth integrates with external tools and libraries.

Integration Overview

Agentic-synth supports optional integrations with:

Midstreamer - Streaming data pipelines
Agentic-Robotics - Automation workflows
Ruvector - Vector database for embeddings

All integrations are:

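Optional - The package works without them
Peer dependencies - Installed separately (Ruvector lives in the same monorepo and needs no separate install)
Runtime detected - Features degrade gracefully when an integration is unavailable
Adapter-based - Each integration sits behind its own adapter interface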

Midstreamer Integration

Purpose

Stream generated data through pipelines for real-time processing.

Installation

npm install midstreamer

Usage

Basic Streaming

import { AgenticSynth } from 'agentic-synth';
import { enableMidstreamer } from 'agentic-synth/integrations';

const synth = new AgenticSynth();

// Enable midstreamer integration
enableMidstreamer(synth, {
  pipeline: 'synthetic-data-stream',
  bufferSize: 1000,
  flushInterval: 5000 // ms
});

// Generate with streaming
const result = await synth.generate('timeseries', {
  count: 10000,
  stream: true // Automatically streams via midstreamer
});
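
As a rough mental model for `bufferSize`, the sketch below collects records and hands them on in batches once the buffer fills. This is an assumption about the option's meaning, not midstreamer's implementation: `RecordBuffer` and `onFlush` are hypothetical names, and the time-based `flushInterval` is left out to keep the example deterministic.

```typescript
// Hypothetical size-based buffer; not part of midstreamer or agentic-synth.
class RecordBuffer<T> {
  private pending: T[] = [];

  constructor(
    private readonly bufferSize: number,
    private readonly onFlush: (batch: T[]) => void
  ) {}

  // Queue one record and flush as soon as the buffer is full.
  push(item: T): void {
    this.pending.push(item);
    if (this.pending.length >= this.bufferSize) this.flush();
  }

  // Hand the queued records to the consumer and start a fresh batch.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}

const batchSizes: number[] = [];
const recordBuffer = new RecordBuffer<number>(1000, (batch) => {
  batchSizes.push(batch.length);
});

for (let i = 0; i < 2500; i++) recordBuffer.push(i);
recordBuffer.flush(); // drain the remainder at end of stream

console.log(batchSizes); // [1000, 1000, 500]
```

A real pipeline would presumably also flush on a timer, so that a slow producer never leaves records waiting indefinitely.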

Custom Pipeline

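import { createPipeline } from 'midstreamer';
import { enableMidstreamer } from 'agentic-synth/integrations';

// `synth` is the AgenticSynth instance created in Basic Streaming above
const pipeline = createPipeline({
  name: 'data-processing',
  // Transforms run in the order listed
  transforms: [
    { type: 'filter', predicate: (data) => data.value > 0 },
    { type: 'map', fn: (data) => ({ ...data, doubled: data.value * 2 }) }
  ],
  // Processed records are written to each output
  outputs: [
    { type: 'file', path: './output/processed.jsonl' },
    { type: 'http', url: 'https://api.example.com/data' }
  ]
});

// Pass the pipeline object instead of a pipeline name
enableMidstreamer(synth, {
  pipeline
});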
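
To make the `transforms` array concrete, the sketch below applies filter and map steps in order to an in-memory array. It mirrors the shape of the config above but is not midstreamer's code; `DataPoint`, `TransformStep` and `applyTransforms` are hypothetical names.

```typescript
// Hypothetical types mirroring the transform config shown above.
interface DataPoint {
  value: number;
  doubled?: number;
}

type TransformStep =
  | { type: "filter"; predicate: (d: DataPoint) => boolean }
  | { type: "map"; fn: (d: DataPoint) => DataPoint };

// Apply each step to the output of the previous one.
function applyTransforms(input: DataPoint[], steps: TransformStep[]): DataPoint[] {
  return steps.reduce<DataPoint[]>(
    (rows, step) =>
      step.type === "filter" ? rows.filter(step.predicate) : rows.map(step.fn),
    input
  );
}

const transformSteps: TransformStep[] = [
  { type: "filter", predicate: (d) => d.value > 0 },
  { type: "map", fn: (d) => ({ ...d, doubled: d.value * 2 }) },
];

const processed = applyTransforms(
  [{ value: -1 }, { value: 2 }, { value: 5 }],
  transformSteps
);

console.log(processed); // [{ value: 2, doubled: 4 }, { value: 5, doubled: 10 }]
```

Because the filter runs first, the negative record is dropped before the map step ever sees it.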

CLI Usage

npx agentic-synth generate events \
  --count 10000 \
  --stream \
  --stream-to midstreamer \
  --stream-pipeline data-processing

API Reference

interface MidstreamerAdapter {
  isAvailable(): boolean;
  stream(data: AsyncIterator<any>): Promise<void>;
  createPipeline(config: PipelineConfig): StreamPipeline;
}
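
`stream` takes an `AsyncIterator`, which is what an async generator function returns. The sketch below shows one way to produce such an iterator and consume it step by step. `makeRecords` and `drain` are hypothetical helpers, and `drain` only stands in for whatever the adapter does with each record.

```typescript
// Hypothetical producer: yields records lazily, one at a time.
async function* makeRecords(count: number): AsyncGenerator<{ value: number }> {
  for (let i = 0; i < count; i++) {
    yield { value: i };
  }
}

// Hypothetical consumer with the same parameter type as `stream` above.
async function drain<T>(source: AsyncIterator<T>): Promise<T[]> {
  const seen: T[] = [];
  while (true) {
    const step = await source.next();
    if (step.done) break; // the producer has finished
    seen.push(step.value);
  }
  return seen;
}

drain(makeRecords(3)).then((rows) => console.log(rows.length)); // 3
```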

Agentic-Robotics Integration

Purpose

Integrate synthetic data generation into automation workflows.

Installation

npm install agentic-robotics

Usage

Register Workflows

import { AgenticSynth } from 'agentic-synth';
import { enableAgenticRobotics } from 'agentic-synth/integrations';

const synth = new AgenticSynth();

enableAgenticRobotics(synth, {
  workflowEngine: 'default'
});

// Register data generation workflow
synth.integrations.robotics.registerWorkflow('daily-timeseries', async (params) => {
  return await synth.generate('timeseries', {
    count: params.count || 1000,
    startTime: params.startTime,
    endTime: params.endTime
  });
});

// Trigger workflow
await synth.integrations.robotics.triggerWorkflow('daily-timeseries', {
  count: 5000,
  startTime: '2024-01-01',
  endTime: '2024-01-31'
});
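
Conceptually, registering and triggering a workflow amounts to a lookup table of named async handlers. The sketch below illustrates that idea only; `WorkflowRegistry`, `WorkflowParams` and `WorkflowHandler` are hypothetical names, and the real engine in agentic-robotics may behave differently (for example on duplicate names).

```typescript
// Hypothetical in-memory registry; not the agentic-robotics implementation.
type WorkflowParams = { count?: number; startTime?: string; endTime?: string };
type WorkflowHandler = (params: WorkflowParams) => Promise<unknown>;

class WorkflowRegistry {
  private handlers = new Map<string, WorkflowHandler>();

  register(workflowName: string, handler: WorkflowHandler): void {
    if (this.handlers.has(workflowName)) {
      throw new Error(`Workflow already registered: ${workflowName}`);
    }
    this.handlers.set(workflowName, handler);
  }

  async trigger(workflowName: string, params: WorkflowParams = {}): Promise<unknown> {
    const handler = this.handlers.get(workflowName);
    if (!handler) throw new Error(`Unknown workflow: ${workflowName}`);
    return handler(params);
  }
}

const registry = new WorkflowRegistry();

registry.register("daily-timeseries", async (params) => ({
  // `??` keeps an explicit 0, whereas `||` would replace it with the default
  generated: params.count ?? 1000,
}));

registry
  .trigger("daily-timeseries", { count: 5000 })
  .then((out) => console.log(out)); // { generated: 5000 }
```

Note that `params.count || 1000` in the example above falls back to 1000 when `count` is 0; use `??` if zero should be passed through.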

Scheduled Generation

import { createSchedule } from 'agentic-robotics';

const schedule = createSchedule({
  workflow: 'daily-timeseries',
  cron: '0 0 * * *', // Daily at midnight
  params: {
    count: 10000
  }
});

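// `synth` is the instance set up in Register Workflows above; the schedule
// refers to the workflow registered there by name.
synth.integrations.robotics.addSchedule(schedule);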
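
The cron string `'0 0 * * *'` has five fields: minute, hour, day of month, month, and day of week. The sketch below checks a timestamp against such an expression. It is a deliberately small illustration (`matchesCron` is a hypothetical helper): it supports only `*` and single integers, and it evaluates in UTC because this guide does not say which time zone agentic-robotics uses.

```typescript
// Simplified matcher: only "*" and single integers are supported.
// Real cron parsers also handle ranges, lists and steps, and combine
// day-of-month with day-of-week differently when both are restricted.
function matchesCron(expr: string, when: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected 5 cron fields");

  const actual = [
    when.getUTCMinutes(),   // minute (0-59)
    when.getUTCHours(),     // hour (0-23)
    when.getUTCDate(),      // day of month (1-31)
    when.getUTCMonth() + 1, // month (1-12)
    when.getUTCDay(),       // day of week (0-6, Sunday = 0)
  ];

  return fields.every((field, i) => field === "*" || Number(field) === actual[i]);
}

console.log(matchesCron("0 0 * * *", new Date("2024-01-01T00:00:00Z"))); // true
console.log(matchesCron("0 0 * * *", new Date("2024-01-01T12:30:00Z"))); // false
```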

CLI Usage

# Register workflow
npx agentic-synth workflow register \
  --name daily-data \
  --generator timeseries \
  --options '{"count": 1000}'

# Trigger workflow
npx agentic-synth workflow trigger daily-data

API Reference

interface AgenticRoboticsAdapter {
  isAvailable(): boolean;
  registerWorkflow(name: string, generator: Generator): void;
  triggerWorkflow(name: string, options: any): Promise<void>;
  addSchedule(schedule: Schedule): void;
}

Ruvector Integration

Purpose

Store generated data in a vector database for similarity search and retrieval.

Installation

# Ruvector is in the same monorepo, no external install needed

Usage

Basic Vector Storage

import { AgenticSynth } from 'agentic-synth';
import { enableRuvector } from 'agentic-synth/integrations';

const synth = new AgenticSynth();

enableRuvector(synth, {
  dbPath: './data/vectors.db',
  collectionName: 'synthetic-data',
  embeddingModel: 'text-embedding-004',
  dimensions: 768
});

// Generate and automatically vectorize
const result = await synth.generate('structured', {
  count: 1000,
  vectorize: true // Automatically stores in ruvector
});

// Search similar records
const similar = await synth.integrations.ruvector.search({
  query: 'sample query',
  limit: 10,
  threshold: 0.8
});
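
To show what `limit` and `threshold` do in a similarity search, the sketch below ranks a few stored vectors by cosine similarity. It assumes `threshold` is a minimum cosine similarity, which this guide does not state explicitly; `cosineSimilarity`, `searchVectors` and the sample data are hypothetical and unrelated to ruvector's internals.

```typescript
// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface StoredVector { id: string; vector: number[] }
interface Hit { id: string; score: number }

// Score every item, drop those below the threshold, keep the best `limit`.
function searchVectors(
  query: number[],
  items: StoredVector[],
  limit: number,
  threshold: number
): Hit[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const storedItems: StoredVector[] = [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0.9, 0.1] },
  { id: "c", vector: [0, 1] },
];

const hits = searchVectors([1, 0], storedItems, 10, 0.8);
console.log(hits.map((h) => h.id)); // ["a", "b"]
```

A brute-force scan like this is linear in the number of stored vectors; an index such as HNSW trades exactness for speed on larger collections.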

Custom Embeddings

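enableRuvector(synth, {
  dbPath: './data/vectors.db',
  embeddingFn: async (data) => {
    // Custom embedding logic: serialize the record, then embed the text.
    // `generateEmbedding` stands in for whatever embedding function you supply.
    const text = JSON.stringify(data);
    return await generateEmbedding(text);
  }
});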
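
For offline tests it can help to have a deterministic stand-in for `generateEmbedding`. The sketch below is one such toy: `hashEmbedding` is a hypothetical helper that buckets characters into a fixed-length, unit-normalized vector. It carries no semantic meaning and is not a substitute for a real embedding model.

```typescript
// Toy embedding: deterministic, fixed length, unit-normalized.
// Useful only for exercising the wiring without calling a model.
function hashEmbedding(text: string, dimensions = 8): number[] {
  const vec = new Array<number>(dimensions).fill(0);
  for (let i = 0; i < text.length; i++) {
    // Bucket each character by its code and position
    vec[(text.charCodeAt(i) + i) % dimensions] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Same shape as the `embeddingFn` option shown above.
const toyEmbeddingFn = async (data: unknown): Promise<number[]> =>
  hashEmbedding(JSON.stringify(data) ?? "", 8);

toyEmbeddingFn({ id: 1, content: "hello" }).then((v) => console.log(v.length)); // 8
```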

Semantic Search

// Generate data with metadata for better search
const result = await synth.generate('structured', {
  count: 1000,
  schema: {
    id: { type: 'string', format: 'uuid' },
    content: { type: 'string' },
    category: { type: 'enum', enum: ['tech', 'science', 'art'] }
  },
  vectorize: true
});

// Search by content similarity
const results = await synth.integrations.ruvector.search({
  query: 'artificial intelligence',
  filter: { category: 'tech' },
  limit: 20
});

CLI Usage

# Generate with vectorization
npx agentic-synth generate structured \
  --count 1000 \
  --schema ./schema.json \
  --vectorize-with ruvector \
  --vector-db ./data/vectors.db

# Search vectors
npx agentic-synth vector search \
  --query "sample query" \
  --db ./data/vectors.db \
  --limit 10

API Reference

interface RuvectorAdapter {
  isAvailable(): boolean;
  store(data: any, metadata?: any): Promise<string>;
  storeBatch(data: any[], metadata?: any[]): Promise<string[]>;
  search(query: SearchQuery, limit?: number): Promise<SearchResult[]>;
  delete(id: string): Promise<void>;
  update(id: string, data: any): Promise<void>;
}

interface SearchQuery {
  query: string | number[];
  filter?: Record<string, any>;
  limit?: number; // the usage examples pass `limit` here rather than as the second argument
  threshold?: number;
}

interface SearchResult {
  id: string;
  score: number;
  data: any;
  metadata?: any;
}

Combined Integration Example

Multi-Integration Workflow

import { AgenticSynth } from 'agentic-synth';
import {
  enableMidstreamer,
  enableAgenticRobotics,
  enableRuvector
} from 'agentic-synth/integrations';

const synth = new AgenticSynth({
  apiKeys: {
    gemini: process.env.GEMINI_API_KEY
  }
});

// Enable all integrations
enableMidstreamer(synth, {
  pipeline: 'data-stream'
});

enableAgenticRobotics(synth, {
  workflowEngine: 'default'
});

enableRuvector(synth, {
  dbPath: './data/vectors.db'
});

// Register comprehensive workflow
synth.integrations.robotics.registerWorkflow('process-and-store', async (params) => {
  // Generate data
  const result = await synth.generate('structured', {
    count: params.count,
    stream: true,      // Streams via midstreamer
    vectorize: true    // Stores in ruvector
  });

  return result;
});

// Execute workflow
await synth.integrations.robotics.triggerWorkflow('process-and-store', {
  count: 10000
});

// Data is now:
// 1. Generated via AI models
// 2. Streamed through midstreamer pipeline
// 3. Stored in ruvector for search

Integration Availability Detection

Runtime Detection

import { AgenticSynth } from 'agentic-synth';

const synth = new AgenticSynth();

// Check which integrations are available
if (synth.integrations.hasMidstreamer()) {
  console.log('Midstreamer is available');
}

if (synth.integrations.hasAgenticRobotics()) {
  console.log('Agentic-Robotics is available');
}

if (synth.integrations.hasRuvector()) {
  console.log('Ruvector is available');
}

Graceful Degradation

// Code works with or without integrations
const result = await synth.generate('timeseries', {
  count: 1000,
  stream: true,      // Only streams if midstreamer available
  vectorize: true    // Only vectorizes if ruvector available
});

// Always works, integrations are optional
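
One way to picture graceful degradation is as a step that reconciles the requested options with what is actually installed. The sketch below does exactly that; `GenerateOptions`, `Availability` and `resolveOptions` are hypothetical names, and agentic-synth may implement this differently.

```typescript
// Hypothetical option and availability shapes for illustration.
interface GenerateOptions {
  count: number;
  stream?: boolean;
  vectorize?: boolean;
}

interface Availability {
  midstreamer: boolean;
  ruvector: boolean;
}

// Turn off any requested feature whose integration is missing.
function resolveOptions(requested: GenerateOptions, available: Availability): GenerateOptions {
  return {
    ...requested,
    stream: Boolean(requested.stream) && available.midstreamer,
    vectorize: Boolean(requested.vectorize) && available.ruvector,
  };
}

const resolved = resolveOptions(
  { count: 1000, stream: true, vectorize: true },
  { midstreamer: true, ruvector: false }
);

console.log(resolved); // { count: 1000, stream: true, vectorize: false }
```

Silently dropping a feature is convenient, but logging the downgrade makes it easier to notice when an integration you rely on is missing.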

Custom Integrations

Creating Custom Integration Adapters

import { IntegrationAdapter } from 'agentic-synth/integrations';

class MyCustomAdapter implements IntegrationAdapter {
  readonly name = 'my-custom-integration';
  private available = false;

  constructor(private config: any) {
    this.detectAvailability();
  }

  isAvailable(): boolean {
    return this.available;
  }

  async initialize(): Promise<void> {
    // Setup logic
  }

  async processData(data: any[]): Promise<void> {
    // Custom processing logic
  }

  async shutdown(): Promise<void> {
    // Cleanup logic
  }

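  private detectAvailability(): void {
    try {
      // require.resolve checks that the package is installed without loading it.
      // In an ES module, obtain `require` via createRequire from 'node:module'.
      require.resolve('my-custom-package');
      this.available = true;
    } catch {
      this.available = false;
    }
  }
}

// Register custom adapter (`synth` is an AgenticSynth instance; `config` holds your adapter's settings)
synth.integrations.register(new MyCustomAdapter(config));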
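
The adapter above exposes `initialize`, `processData` and `shutdown`, but this guide does not say when each is called. The sketch below shows one plausible call order, with `shutdown` in a `finally` block so resources are released even if processing fails. `LifecycleAdapter` and `runWithAdapter` are hypothetical names, not agentic-synth APIs.

```typescript
// Hypothetical interface with the same methods as the adapter above.
interface LifecycleAdapter {
  isAvailable(): boolean;
  initialize(): Promise<void>;
  processData(data: unknown[]): Promise<void>;
  shutdown(): Promise<void>;
}

// Returns false when the adapter is unavailable, true after a full run.
async function runWithAdapter(adapter: LifecycleAdapter, data: unknown[]): Promise<boolean> {
  if (!adapter.isAvailable()) return false; // skip quietly
  await adapter.initialize();
  try {
    await adapter.processData(data);
  } finally {
    await adapter.shutdown(); // always release resources
  }
  return true;
}

// Recording adapter that logs the order of calls.
const calls: string[] = [];
const recordingAdapter: LifecycleAdapter = {
  isAvailable: () => true,
  initialize: async () => { calls.push("initialize"); },
  processData: async (data) => { calls.push(`processData(${data.length})`); },
  shutdown: async () => { calls.push("shutdown"); },
};

runWithAdapter(recordingAdapter, [1, 2, 3]).then(() => {
  console.log(calls.join(" -> ")); // initialize -> processData(3) -> shutdown
});
```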

Configuration

Integration Configuration File

{
  "integrations": {
    "midstreamer": {
      "enabled": true,
      "pipeline": "synthetic-data-stream",
      "bufferSize": 1000,
      "flushInterval": 5000,
      "transforms": [
        {
          "type": "filter",
          "predicate": "data.value > 0"
        }
      ]
    },
    "agenticRobotics": {
      "enabled": true,
      "workflowEngine": "default",
      "defaultWorkflow": "data-generation",
      "schedules": [
        {
          "name": "daily-data",
          "cron": "0 0 * * *",
          "workflow": "daily-timeseries"
        }
      ]
    },
    "ruvector": {
      "enabled": true,
      "dbPath": "./data/vectors.db",
      "collectionName": "synthetic-data",
      "embeddingModel": "text-embedding-004",
      "dimensions": 768,
      "indexType": "hnsw",
      "distanceMetric": "cosine"
    }
  }
}
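
How agentic-synth loads this file is not covered here, but the structure is easy to inspect yourself. The sketch below parses a config of the same shape and lists the integrations whose `enabled` flag is true; `IntegrationSection`, `IntegrationFile` and `enabledIntegrations` are hypothetical names.

```typescript
// Hypothetical types describing the config file shape shown above.
interface IntegrationSection {
  enabled?: boolean;
  [key: string]: unknown;
}

interface IntegrationFile {
  integrations?: { [integrationName: string]: IntegrationSection };
}

// Return the names of all integrations explicitly enabled in the JSON text.
function enabledIntegrations(json: string): string[] {
  const parsed = JSON.parse(json) as IntegrationFile;
  const sections = parsed.integrations ?? {};
  return Object.keys(sections).filter((key) => sections[key]?.enabled === true);
}

const sampleConfig = JSON.stringify({
  integrations: {
    midstreamer: { enabled: true, bufferSize: 1000 },
    agenticRobotics: { enabled: false },
    ruvector: { enabled: true, dimensions: 768 },
  },
});

console.log(enabledIntegrations(sampleConfig)); // ["midstreamer", "ruvector"]
```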

Troubleshooting

Integration Not Detected

Problem: Integration marked as unavailable

Solutions:

Ensure peer dependency is installed: npm install <package>
Check import/require paths are correct
Verify package version compatibility
Check logs for initialization errors

Performance Issues

Problem: Slow generation with integrations

Solutions:

Adjust buffer sizes for streaming
Use batch operations instead of individual calls
Enable caching to avoid redundant processing
Profile with synth.integrations.getMetrics()

Memory Issues

Problem: High memory usage with integrations

Solutions:

Use streaming mode instead of loading all data
Adjust batch sizes to smaller values
Clear caches periodically
Configure TTL for cached data

Best Practices

Optional Dependencies: Always check isAvailable() before using integration features
Error Handling: Wrap integration calls in try-catch blocks
Configuration: Use config files for complex integration setups
Testing: Test with and without integrations enabled
Documentation: Document which integrations your workflows depend on
Monitoring: Track integration metrics and performance
Versioning: Pin peer dependency versions for stability
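
The first two practices can be combined into a small guard: check availability, then run the integration call inside a try-catch with a fallback. The sketch below is an illustration only; `tryIntegration` is a hypothetical helper, not part of agentic-synth.

```typescript
// Run `action` only if the adapter is available; fall back on absence or error.
function tryIntegration<T>(
  adapter: { isAvailable(): boolean },
  action: () => T,
  fallback: T
): T {
  if (!adapter.isAvailable()) return fallback;
  try {
    return action();
  } catch (err) {
    console.warn("Integration call failed: " + String(err));
    return fallback;
  }
}

const present = { isAvailable: () => true };
const missing = { isAvailable: () => false };

console.log(tryIntegration(present, () => "stored", "skipped")); // "stored"
console.log(tryIntegration(missing, () => "stored", "skipped")); // "skipped"
console.log(
  tryIntegration(present, (): string => { throw new Error("db locked"); }, "skipped")
); // "skipped" (after a warning)
```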

Example Projects

See the /examples directory for complete integration examples:

examples/midstreamer-pipeline/ - Real-time data streaming
examples/robotics-workflow/ - Automated generation workflows
examples/ruvector-search/ - Vector search and retrieval
examples/full-integration/ - All integrations combined
