Agentic-Synth Architecture Summary

Overview

Complete architecture design for agentic-synth, a TypeScript-based synthetic data generator that uses the Gemini and OpenRouter APIs and supports streaming and automation.

Key Design Decisions

1. Technology Stack

Core:

  • TypeScript 5.7+ with strict mode
  • ESM modules (NodeNext)
  • Zod for runtime validation
  • Winston for logging
  • Commander for CLI

AI Providers:

  • Google Gemini API via @google/generative-ai
  • OpenRouter API via OpenAI-compatible SDK

Optional Integrations:

  • Midstreamer (streaming pipelines)
  • Agentic-Robotics (automation workflows)
  • Ruvector (vector database) - workspace dependency

2. Architecture Patterns

Dual Interface:

  • SDK for programmatic access
  • CLI for command-line usage
  • CLI uses SDK internally (single source of truth)

Plugin Architecture:

  • Generator plugins for different data types
  • Model provider plugins for AI APIs
  • Integration adapters for external tools

Caching Strategy:

  • In-memory LRU cache (no Redis)
  • Optional file-based persistence
  • Content-based cache keys
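
A content-based key can be derived by hashing a canonical serialization of the request. The sketch below assumes the key covers the prompt, schema, and options, and that all inputs are plain JSON-like data; `stableStringify` and `cacheKey` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Serialize JSON-like data with object keys sorted, so equal content yields equal text.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const entries = Object.entries(value as Record<string, unknown>)
    .filter(([, v]) => v !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
  return `{${entries.join(",")}}`;
}

// Hash the canonical form; the same request content always maps to the same key.
function cacheKey(prompt: string, schema: unknown, options: unknown): string {
  return createHash("sha256")
    .update(stableStringify({ prompt, schema, options }))
    .digest("hex");
}

const keyA = cacheKey("generate readings", { temperature: { type: "number", min: -20, max: 40 } }, { count: 10 });
const keyB = cacheKey("generate readings", { temperature: { max: 40, min: -20, type: "number" } }, { count: 10 });
console.log(keyA === keyB); // prints true: key order does not affect the hash
```

Sorting keys before hashing matters because two logically identical schemas can otherwise serialize differently and miss the cache.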

Model Routing:

  • Cost-optimized routing
  • Performance-optimized routing
  • Quality-optimized routing
  • Fallback chains for reliability
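
A fallback chain can be sketched as trying providers in order until one succeeds. The `ModelProvider` interface and `completeWithFallback` function are hypothetical, and the two providers are stubs rather than real Gemini or OpenRouter clients.

```typescript
// Hypothetical provider contract; a real provider would call an external API.
interface ModelProvider {
  readonly id: string;
  complete(prompt: string): Promise<string>;
}

// Try each provider in order and return the first successful result.
async function completeWithFallback(
  chain: ModelProvider[],
  prompt: string,
): Promise<{ providerId: string; text: string }> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      const text = await provider.complete(prompt);
      return { providerId: provider.id, text };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${provider.id}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Stub providers for illustration only.
const failingProvider: ModelProvider = {
  id: "primary",
  complete: async () => {
    throw new Error("rate limited");
  },
};
const workingProvider: ModelProvider = {
  id: "secondary",
  complete: async (prompt) => `echo: ${prompt}`,
};

completeWithFallback([failingProvider, workingProvider], "hello").then((result) => {
  console.log(result.providerId, result.text); // prints "secondary echo: hello"
});
```

Collecting the per-provider error messages keeps the final failure debuggable when every entry in the chain is exhausted.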

3. Integration Design

Optional Dependencies: All integrations are optional and are detected at runtime:

  • Package works standalone
  • Graceful degradation if integrations unavailable
  • Clear documentation about optional features
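
Runtime detection is commonly done with a guarded dynamic `import()`. The sketch below is an assumption about how this could work; `loadOptional` and `detectIntegrations` are hypothetical helpers, and the package names in the list are placeholders for illustration.

```typescript
// Try to load an optional package at runtime; resolve to null when it cannot be loaded.
// Note: this also swallows errors thrown while the module initializes; a stricter
// version could inspect the error code before returning null.
async function loadOptional(specifier: string): Promise<unknown> {
  try {
    return await import(specifier);
  } catch {
    return null;
  }
}

// Package names below are assumptions for illustration.
const OPTIONAL_PACKAGES = ["midstreamer", "agentic-robotics", "ruvector"];

// Report which optional integrations are installed in the current environment.
async function detectIntegrations(): Promise<Record<string, boolean>> {
  const available: Record<string, boolean> = {};
  for (const pkg of OPTIONAL_PACKAGES) {
    available[pkg] = (await loadOptional(pkg)) !== null;
  }
  return available;
}

detectIntegrations().then((available) => console.log(available));
```

An adapter can then check the detection result and turn itself into a no-op when its package is missing, which is one way to get the graceful degradation described above.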

Integration Points:

  1. Midstreamer: Stream generated data through pipelines
  2. Agentic-Robotics: Register data generation workflows
  3. Ruvector: Store generated data as vectors

Project Structure

packages/agentic-synth/
├── src/
│   ├── index.ts                 # Main SDK entry
│   ├── types/index.ts           # Type definitions
│   ├── sdk/AgenticSynth.ts      # Main SDK class
│   ├── core/
│   │   ├── Config.ts            # Configuration system
│   │   ├── Cache.ts             # LRU cache manager
│   │   └── Logger.ts            # Logging system
│   ├── generators/
│   │   ├── base.ts              # Generator interface
│   │   ├── Hub.ts               # Generator registry
│   │   ├── TimeSeries.ts        # Time-series generator
│   │   ├── Events.ts            # Event generator
│   │   └── Structured.ts        # Structured data generator
│   ├── models/
│   │   ├── base.ts              # Model provider interface
│   │   ├── Router.ts            # Model routing logic
│   │   └── providers/
│   │       ├── Gemini.ts        # Gemini integration
│   │       └── OpenRouter.ts    # OpenRouter integration
│   ├── integrations/
│   │   ├── Manager.ts           # Integration lifecycle
│   │   ├── Midstreamer.ts       # Streaming adapter
│   │   ├── AgenticRobotics.ts   # Automation adapter
│   │   └── Ruvector.ts          # Vector DB adapter
│   ├── bin/
│   │   ├── cli.ts               # CLI entry point
│   │   └── commands/            # CLI commands
│   └── utils/
│       ├── validation.ts        # Validation helpers
│       ├── serialization.ts     # Output formatting
│       └── prompts.ts           # AI prompt templates
├── tests/
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests
├── examples/                    # Usage examples
├── docs/
│   ├── ARCHITECTURE.md          # Complete architecture
│   ├── API.md                   # API reference
│   ├── INTEGRATION.md           # Integration guide
│   ├── DIRECTORY_STRUCTURE.md   # Project layout
│   └── IMPLEMENTATION_PLAN.md   # Implementation guide
├── config/
│   └── .agentic-synth.example.json
├── package.json
├── tsconfig.json
└── README.md

API Design

SDK API

import { AgenticSynth } from 'agentic-synth';

// Initialize
const synth = new AgenticSynth({
  apiKeys: {
    gemini: process.env.GEMINI_API_KEY,
    openRouter: process.env.OPENROUTER_API_KEY
  },
  cache: { enabled: true, maxSize: 1000 }
});

// Generate data
const result = await synth.generate('timeseries', {
  count: 1000,
  schema: { temperature: { type: 'number', min: -20, max: 40 } }
});

// Stream generation
for await (const record of synth.generateStream('events', { count: 1000 })) {
  console.log(record);
}

CLI API

# Generate time-series data
npx agentic-synth generate timeseries \
  --count 1000 \
  --schema ./schema.json \
  --output data.json

# Batch generation
npx agentic-synth batch generate \
  --config ./batch-config.yaml \
  --parallel 4

Data Flow

User Request
    ↓
Request Parser (validate schema, options)
    ↓
Generator Hub (select appropriate generator)
    ↓
Model Router (choose AI model: Gemini/OpenRouter)
    ↓
Cache Check ──→ Cache Hit? ──→ Return cached
    ↓ (Miss)
AI Provider (Gemini/OpenRouter)
    ↓
Generated Data
    ↓
Post-Processor (validate, transform)
    ↓
├─→ Store in Cache
├─→ Stream via Midstreamer (if enabled)
├─→ Store in Ruvector (if enabled)
└─→ Output Handler (JSON/CSV/Parquet/Stream)

Key Components

1. Generator System

TimeSeriesGenerator

  • Generate time-series data with trends, seasonality, noise
  • Configurable sample rates and time ranges
  • Statistical distribution control
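
To make the trend, seasonality, and noise components concrete, here is a purely local numeric sketch. It does not call any AI model, and every name in it (`TimeSeriesOptions`, `generateTimeSeries`, `mulberry32`) is hypothetical. The model is additive: value = base + linear trend + sinusoidal season + Gaussian noise.

```typescript
// Small deterministic PRNG (mulberry32) so runs are reproducible; returns values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface TimeSeriesOptions {
  count: number;           // number of samples
  startMs: number;         // timestamp of the first sample
  intervalMs: number;      // spacing between samples (the sample rate)
  base: number;            // level at step 0
  trendPerStep: number;    // linear trend added per step
  seasonAmplitude: number; // amplitude of the seasonal sine wave
  seasonPeriod: number;    // season length in steps
  noiseStdDev: number;     // standard deviation of the Gaussian noise
  seed: number;            // PRNG seed
}

function generateTimeSeries(o: TimeSeriesOptions): { timestamp: number; value: number }[] {
  const rand = mulberry32(o.seed);
  // Box-Muller transform: turns two uniform samples into one standard normal sample.
  const gaussian = () =>
    Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
  return Array.from({ length: o.count }, (_, i) => ({
    timestamp: o.startMs + i * o.intervalMs,
    value:
      o.base +
      o.trendPerStep * i +
      o.seasonAmplitude * Math.sin((2 * Math.PI * i) / o.seasonPeriod) +
      o.noiseStdDev * gaussian(),
  }));
}

const series = generateTimeSeries({
  count: 24, startMs: 0, intervalMs: 3_600_000, base: 15,
  trendPerStep: 0.1, seasonAmplitude: 5, seasonPeriod: 24, noiseStdDev: 0.5, seed: 42,
});
console.log(series.length); // prints 24
```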

EventGenerator

  • Generate event streams with timestamps
  • Rate control (events per second/minute)
  • Distribution types (uniform, poisson, bursty)
  • Event correlations
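
For the Poisson distribution type, event timestamps can be produced by sampling exponential gaps between arrivals. This is a sketch under that assumption; `poissonEventTimes` is a hypothetical helper, not part of the documented API.

```typescript
// Generate event timestamps (in ms) for a Poisson process with the given average rate.
// Gaps between events in a Poisson process are exponentially distributed.
function poissonEventTimes(
  ratePerSecond: number,
  count: number,
  startMs: number,
  rand: () => number = Math.random,
): number[] {
  if (ratePerSecond <= 0) {
    throw new RangeError("ratePerSecond must be positive");
  }
  const times: number[] = [];
  let current = startMs;
  for (let i = 0; i < count; i++) {
    // Inverse-transform sampling: gap = -ln(U) / rate, with U in (0, 1].
    const gapSeconds = -Math.log(1 - rand()) / ratePerSecond;
    current += gapSeconds * 1000;
    times.push(current);
  }
  return times;
}

// Roughly 5 events per second on average, starting at timestamp 0.
const eventTimes = poissonEventTimes(5, 10, 0);
console.log(eventTimes.length); // prints 10
```

A uniform distribution would use a constant gap of `1000 / ratePerSecond` milliseconds instead, and a bursty one could alternate between a high and a low rate.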

StructuredGenerator

  • Generate structured records based on schema
  • Field type support (string, number, boolean, datetime, enum)
  • Constraint enforcement (unique, range, foreign keys)
  • Output formats (JSON, CSV, Parquet)
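
Constraint enforcement can be illustrated as a validation pass over generated records. The sketch below covers type, range, enum, and uniqueness checks only; the `FieldSpec` shape is an assumption loosely modeled on the `{ type, min, max }` schema used elsewhere in this document, and `validateRecords` is a hypothetical name.

```typescript
// Assumed schema shape for illustration.
type FieldSpec =
  | { type: "number"; min?: number; max?: number }
  | { type: "string"; unique?: boolean }
  | { type: "boolean" }
  | { type: "enum"; values: string[] };

type Schema = Record<string, FieldSpec>;

// Return a list of human-readable violations; an empty list means all records pass.
function validateRecords(records: Record<string, unknown>[], schema: Schema): string[] {
  const errors: string[] = [];
  const seen = new Map<string, Set<string>>(); // values already used per unique field
  records.forEach((record, index) => {
    for (const [field, spec] of Object.entries(schema)) {
      const value = record[field];
      const where = `record ${index}, field "${field}"`;
      switch (spec.type) {
        case "number":
          if (typeof value !== "number" || Number.isNaN(value)) errors.push(`${where}: expected number`);
          else if (spec.min !== undefined && value < spec.min) errors.push(`${where}: below min ${spec.min}`);
          else if (spec.max !== undefined && value > spec.max) errors.push(`${where}: above max ${spec.max}`);
          break;
        case "string":
          if (typeof value !== "string") {
            errors.push(`${where}: expected string`);
          } else if (spec.unique) {
            const used = seen.get(field) ?? new Set<string>();
            if (used.has(value)) errors.push(`${where}: duplicate value`);
            used.add(value);
            seen.set(field, used);
          }
          break;
        case "boolean":
          if (typeof value !== "boolean") errors.push(`${where}: expected boolean`);
          break;
        case "enum":
          if (typeof value !== "string" || !spec.values.includes(value)) errors.push(`${where}: not in enum`);
          break;
      }
    }
  });
  return errors;
}

const demoSchema: Schema = {
  temperature: { type: "number", min: -20, max: 40 },
  sensorId: { type: "string", unique: true },
};
console.log(validateRecords([{ temperature: 55, sensorId: "s1" }], demoSchema));
```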

2. Model System

GeminiProvider

  • Google Gemini API integration
  • Context caching support
  • Streaming responses
  • Cost tracking

OpenRouterProvider

  • OpenRouter API integration
  • Multi-model access
  • Automatic fallback
  • Cost optimization

ModelRouter

  • Smart routing strategies
  • Fallback chain management
  • Cost/performance/quality optimization
  • Request caching

3. Integration System

MidstreamerAdapter

  • Stream data through pipelines
  • Buffer management
  • Transform support
  • Multiple output targets

AgenticRoboticsAdapter

  • Workflow registration
  • Scheduled generation
  • Event-driven triggers
  • Automation integration

RuvectorAdapter

  • Vector storage
  • Similarity search
  • Batch operations
  • Embedding generation

Configuration

Environment Variables

GEMINI_API_KEY=your-gemini-key
OPENROUTER_API_KEY=your-openrouter-key

Config File (.agentic-synth.json)

{
  "apiKeys": {
    "gemini": "${GEMINI_API_KEY}",
    "openRouter": "${OPENROUTER_API_KEY}"
  },
  "cache": {
    "enabled": true,
    "maxSize": 1000,
    "ttl": 3600000
  },
  "models": {
    "routing": {
      "strategy": "cost-optimized",
      "fallbackChain": ["gemini-pro", "gpt-4"]
    }
  },
  "integrations": {
    "midstreamer": { "enabled": false },
    "agenticRobotics": { "enabled": false },
    "ruvector": { "enabled": false }
  }
}
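
The `${VAR}` placeholders in the config file imply a substitution step after the JSON is parsed. A minimal sketch of that step follows; `substituteEnv` is a hypothetical helper, and throwing on a missing variable is a design assumption rather than documented behavior.

```typescript
type Env = Record<string, string | undefined>;

// Recursively replace ${NAME} placeholders in every string value of a parsed config.
function substituteEnv<T>(value: T, env: Env): T {
  if (typeof value === "string") {
    const replaced = value.replace(
      /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
      (_match: string, variable: string) => {
        const resolved = env[variable];
        if (resolved === undefined) {
          throw new Error(`Missing environment variable: ${variable}`);
        }
        return resolved;
      },
    );
    return replaced as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteEnv(item, env)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = substituteEnv(item, env);
    }
    return out as unknown as T;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

const resolvedConfig = substituteEnv(
  { apiKeys: { gemini: "${GEMINI_API_KEY}" }, cache: { enabled: true, maxSize: 1000 } },
  { GEMINI_API_KEY: "demo-key" }, // in real use this would likely be process.env
);
console.log(resolvedConfig.apiKeys.gemini); // prints "demo-key"
```

Failing fast on an unset variable surfaces configuration mistakes at startup instead of as an authentication error on the first API call.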

Performance Considerations

Context Caching:

  • Hash-based cache keys (prompt + schema + options)
  • LRU eviction strategy
  • Configurable TTL
  • Optional file persistence
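
LRU eviction with a TTL can be built on the insertion ordering of `Map`. The class below is a sketch under that approach, not the planned `Cache.ts` implementation; `LruCache` and its constructor parameters are hypothetical.

```typescript
// Minimal LRU cache with a per-entry TTL, relying on Map's insertion-order iteration.
class LruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so the key moves to the most-recently-used end.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const resultCache = new LruCache<string>(2, 3_600_000);
resultCache.set("a", "first");
resultCache.set("b", "second");
resultCache.get("a");              // "a" becomes most recently used
resultCache.set("c", "third");     // exceeds maxSize, so "b" is evicted
console.log(resultCache.get("b")); // prints undefined
```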

Memory Management:

  • Streaming for large datasets
  • Chunked processing
  • Configurable batch sizes
  • Memory-efficient formats (JSONL, Parquet)
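
Chunked processing can be sketched with an async generator that groups a stream into fixed-size batches and serializes each batch as JSONL. `inBatches` and `toJsonl` are hypothetical helper names.

```typescript
// Group a (possibly async) stream into fixed-size batches so that only one
// batch needs to be held in memory at a time.
async function* inBatches<T>(
  source: AsyncIterable<T> | Iterable<T>,
  batchSize: number,
): AsyncGenerator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Serialize one batch as JSONL: one JSON document per line.
function toJsonl(records: unknown[]): string {
  return records.map((record) => JSON.stringify(record)).join("\n") + "\n";
}

async function demoBatches(): Promise<void> {
  const records = [{ id: 1 }, { id: 2 }, { id: 3 }];
  for await (const batch of inBatches(records, 2)) {
    console.log(toJsonl(batch)); // a real writer would append this to a file or stream
  }
}

demoBatches();
```

JSONL suits this pattern because each batch can be appended to the output independently, without holding the whole dataset to close a JSON array.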

Model Selection:

  • Cost-based: Cheapest model that meets requirements
  • Performance-based: Fastest response time
  • Quality-based: Highest quality output
  • Balanced: Optimize all three factors
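
The four selection modes can be expressed as different scoring functions over the same model metadata. In this sketch the `ModelProfile` fields, the strategy strings other than `cost-optimized`, the equal weighting used for `balanced`, and all numbers are illustrative assumptions.

```typescript
type RoutingStrategy =
  | "cost-optimized"
  | "performance-optimized"
  | "quality-optimized"
  | "balanced";

// Hypothetical per-model metadata.
interface ModelProfile {
  id: string;
  costPer1kTokens: number; // lower is better
  latencyMs: number;       // lower is better
  quality: number;         // 0..1, higher is better
}

// Pick the best model for a strategy among those that meet a minimum quality bar.
function selectModel(
  models: ModelProfile[],
  strategy: RoutingStrategy,
  minQuality = 0,
): ModelProfile {
  const eligible = models.filter((m) => m.quality >= minQuality);
  if (eligible.length === 0) {
    throw new Error("No model meets the quality requirement");
  }
  const maxCost = Math.max(...eligible.map((m) => m.costPer1kTokens));
  const maxLatency = Math.max(...eligible.map((m) => m.latencyMs));
  const score = (m: ModelProfile): number => {
    switch (strategy) {
      case "cost-optimized":
        return -m.costPer1kTokens;
      case "performance-optimized":
        return -m.latencyMs;
      case "quality-optimized":
        return m.quality;
      case "balanced":
        // Normalize cost and latency to 0..1, then weight the three factors equally.
        return m.quality - m.costPer1kTokens / (maxCost || 1) - m.latencyMs / (maxLatency || 1);
    }
  };
  return eligible.reduce((best, m) => (score(m) > score(best) ? m : best));
}

// Made-up figures for illustration only.
const candidates: ModelProfile[] = [
  { id: "model-a", costPer1kTokens: 0.1, latencyMs: 900, quality: 0.6 },
  { id: "model-b", costPer1kTokens: 1.0, latencyMs: 400, quality: 0.8 },
  { id: "model-c", costPer1kTokens: 3.0, latencyMs: 1500, quality: 0.95 },
];

console.log(selectModel(candidates, "cost-optimized", 0.7).id); // prints "model-b"
```

Passing `minQuality` models the "cheapest model that meets requirements" rule: the filter removes `model-a`, and the cost score then picks the cheaper of the remaining two.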

Security

API Key Management:

  • Environment variable loading
  • Config file with variable substitution
  • Never log sensitive data
  • Secure config file patterns
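
One way to honor the "never log sensitive data" rule is to redact configuration objects before they reach the logger. The key pattern and the `redact` helper below are assumptions for illustration; a real deployment would tune the pattern to its own field names.

```typescript
// Keys whose values should never appear in logs (assumed pattern).
const SENSITIVE_KEY = /key|token|secret|password/i;

// Return a deep copy with every value under a sensitive key replaced.
// The whole subtree is redacted, so nested objects such as `apiKeys` are covered too.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(item);
    }
    return out;
  }
  return value;
}

const loggable = redact({
  apiKeys: { gemini: "real-secret", openRouter: "another-secret" },
  cache: { enabled: true, maxSize: 1000 },
});
console.log(JSON.stringify(loggable));
// prints {"apiKeys":"[REDACTED]","cache":{"enabled":true,"maxSize":1000}}
```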

Data Validation:

  • Input validation (Zod schemas)
  • Output validation
  • Sanitization
  • Rate limiting

Testing Strategy

Unit Tests:

  • Component isolation
  • Mock dependencies
  • Logic correctness

Integration Tests:

  • Component interactions
  • Real dependencies
  • Error scenarios

E2E Tests:

  • Complete workflows
  • CLI commands
  • Real API calls (test keys)

Implementation Status

Completed

  • Complete architecture design
  • Type system definitions
  • Core configuration system
  • SDK class structure
  • Generator interfaces
  • Comprehensive documentation
  • package.json with correct dependencies
  • TypeScript configuration
  • Directory structure

Remaining 🔨

  • Cache Manager implementation
  • Logger implementation
  • Generator implementations
  • Model provider implementations
  • Model router implementation
  • Integration adapters
  • CLI commands
  • Utilities (serialization, prompts)
  • Tests
  • Examples

Next Steps for Builder Agent

  1. Start with Core Infrastructure

    • Implement Cache Manager (/src/core/Cache.ts)
    • Implement Logger (/src/core/Logger.ts)
  2. Implement Model System

    • Gemini provider
    • OpenRouter provider
    • Model router
  3. Implement Generator System

    • Generator Hub
    • TimeSeries, Events, Structured generators
  4. Wire SDK Together

    • Complete AgenticSynth implementation
    • Add event emitters
    • Add progress tracking
  5. Build CLI

    • CLI entry point
    • Commands (generate, batch, cache, config)
  6. Add Integrations

    • Midstreamer adapter
    • AgenticRobotics adapter
    • Ruvector adapter
  7. Testing & Examples

    • Unit tests
    • Integration tests
    • Example code

Success Criteria

  • All TypeScript compiles without errors
  • npm run build succeeds
  • npm test passes all tests
  • npx agentic-synth --help works
  • Examples run successfully
  • Documentation is comprehensive
  • Package ready for npm publish

Resources

  • Architecture: /docs/ARCHITECTURE.md
  • API Reference: /docs/API.md
  • Integration Guide: /docs/INTEGRATION.md
  • Implementation Plan: /docs/IMPLEMENTATION_PLAN.md
  • Directory Structure: /docs/DIRECTORY_STRUCTURE.md

Architecture design is complete. Ready for builder agent implementation! 🚀