Agentic-Synth Architecture Summary
Overview
This document summarizes the architecture of agentic-synth, a TypeScript-based synthetic data generator that uses the Gemini and OpenRouter APIs and supports streaming and automation.
Key Design Decisions
1. Technology Stack
Core:
- TypeScript 5.7+ with strict mode
- ESM modules (NodeNext)
- Zod for runtime validation
- Winston for logging
- Commander for CLI
AI Providers:
- Google Gemini API via @google/generative-ai
- OpenRouter API via OpenAI-compatible SDK
Optional Integrations:
- Midstreamer (streaming pipelines)
- Agentic-Robotics (automation workflows)
- Ruvector (vector database) - workspace dependency
2. Architecture Patterns
Dual Interface:
- SDK for programmatic access
- CLI for command-line usage
- CLI uses SDK internally (single source of truth)
Plugin Architecture:
- Generator plugins for different data types
- Model provider plugins for AI APIs
- Integration adapters for external tools
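The plugin idea above can be illustrated with a small registry. This is a minimal sketch, not the project's actual interface: `DataGenerator`, `GeneratorHub`, and the `counter` plugin are hypothetical names, and the real contracts are expected to live in `src/generators/base.ts` and `src/generators/Hub.ts`.

```typescript
// Hypothetical plugin contract; the real one may be async and more strongly typed.
interface DataGenerator {
  readonly type: string;
  generate(options: Record<string, unknown>): unknown[];
}

// Registry that maps a data type name to the plugin that produces it.
class GeneratorHub {
  private readonly generators = new Map<string, DataGenerator>();

  register(generator: DataGenerator): void {
    if (this.generators.has(generator.type)) {
      throw new Error(`Generator already registered: ${generator.type}`);
    }
    this.generators.set(generator.type, generator);
  }

  get(type: string): DataGenerator {
    const generator = this.generators.get(type);
    if (!generator) {
      throw new Error(`Unknown generator type: ${type}`);
    }
    return generator;
  }

  list(): string[] {
    return Array.from(this.generators.keys());
  }
}

// Illustrative plugin: emits `count` sequential ids without calling any model.
const counterGenerator: DataGenerator = {
  type: "counter",
  generate: (options) => {
    const count = Number(options["count"] ?? 0);
    return Array.from({ length: count }, (_, i) => ({ id: i }));
  },
};

const demoHub = new GeneratorHub();
demoHub.register(counterGenerator);
const demoRecords = demoHub.get("counter").generate({ count: 3 });
```

Keeping the registry keyed by a plain string is what lets the CLI and the SDK resolve generators the same way.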
Caching Strategy:
- In-memory LRU cache (no Redis)
- Optional file-based persistence
- Content-based cache keys
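A content-based cache key could be derived as sketched below. The function names are hypothetical, and FNV-1a is used only to keep the example dependency-free; a real implementation would more likely use a cryptographic hash such as SHA-256.

```typescript
// Serialize with sorted object keys so logically equal requests produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .filter((key) => obj[key] !== undefined)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${entries.join(",")}}`;
}

// 32-bit FNV-1a over UTF-16 code units (a stand-in for a stronger hash).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// The key depends only on the request content, not on property order.
function cacheKey(generatorType: string, options: unknown): string {
  return fnv1a(stableStringify({ generatorType, options }));
}
```

With this scheme, `cacheKey("timeseries", { count: 10, schema })` and the same call with the option properties written in a different order resolve to the same entry.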
Model Routing:
- Cost-optimized routing
- Performance-optimized routing
- Quality-optimized routing
- Fallback chains for reliability
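A fallback chain can be sketched as a loop that tries each model in order and returns the first success. The `ChainEntry` shape and the stub providers are assumptions for illustration; the real providers would wrap the Gemini and OpenRouter SDK calls.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

// Hypothetical chain entry: a model name plus the function that calls it.
interface ChainEntry {
  model: string;
  call: ModelCall;
}

// Try each model in order; collect failure reasons so the final error is useful.
async function generateWithFallback(
  chain: ChainEntry[],
  prompt: string,
): Promise<{ model: string; text: string }> {
  const failures: string[] = [];
  for (const { model, call } of chain) {
    try {
      return { model, text: await call(prompt) };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${model}: ${reason}`);
    }
  }
  throw new Error(`All models failed (${failures.join("; ")})`);
}

// Stub providers: the first always fails, the second echoes the prompt.
const demoChain: ChainEntry[] = [
  { model: "gemini-pro", call: async () => { throw new Error("rate limited"); } },
  { model: "gpt-4", call: async (prompt) => `echo: ${prompt}` },
];
```

Calling `generateWithFallback(demoChain, "hi")` resolves with the second entry because the first stub rejects.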
3. Integration Design
Optional Dependencies: All integrations are optional and are detected at runtime:
- Package works standalone
- Graceful degradation if integrations unavailable
- Clear documentation about optional features
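Runtime detection of an optional package is commonly done with a guarded dynamic `import()`. The sketch below assumes that approach; the helper names are hypothetical, and the npm package names of the integrations are not stated in this document, so callers supply them.

```typescript
// Try to load an optional package; resolve to undefined when it cannot be loaded.
// Note: this also hides errors thrown while the module initializes.
async function loadOptional<T = unknown>(packageName: string): Promise<T | undefined> {
  try {
    return (await import(packageName)) as T;
  } catch {
    return undefined;
  }
}

// Report which optional integrations are available in the current install.
async function detectIntegrations(names: string[]): Promise<Record<string, boolean>> {
  const availability: Record<string, boolean> = {};
  for (const packageName of names) {
    availability[packageName] = (await loadOptional(packageName)) !== undefined;
  }
  return availability;
}
```

An adapter can then check the returned map and fall back to a no-op when its package is missing, which is one way to get the graceful degradation described above.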
Integration Points:
- Midstreamer: Stream generated data through pipelines
- Agentic-Robotics: Register data generation workflows
- Ruvector: Store generated data as vectors
Project Structure
packages/agentic-synth/
├── src/
│   ├── index.ts                    # Main SDK entry
│   ├── types/index.ts              # Type definitions
│   ├── sdk/AgenticSynth.ts         # Main SDK class
│   ├── core/
│   │   ├── Config.ts               # Configuration system
│   │   ├── Cache.ts                # LRU cache manager
│   │   └── Logger.ts               # Logging system
│   ├── generators/
│   │   ├── base.ts                 # Generator interface
│   │   ├── Hub.ts                  # Generator registry
│   │   ├── TimeSeries.ts           # Time-series generator
│   │   ├── Events.ts               # Event generator
│   │   └── Structured.ts           # Structured data generator
│   ├── models/
│   │   ├── base.ts                 # Model provider interface
│   │   ├── Router.ts               # Model routing logic
│   │   └── providers/
│   │       ├── Gemini.ts           # Gemini integration
│   │       └── OpenRouter.ts       # OpenRouter integration
│   ├── integrations/
│   │   ├── Manager.ts              # Integration lifecycle
│   │   ├── Midstreamer.ts          # Streaming adapter
│   │   ├── AgenticRobotics.ts      # Automation adapter
│   │   └── Ruvector.ts             # Vector DB adapter
│   ├── bin/
│   │   ├── cli.ts                  # CLI entry point
│   │   └── commands/               # CLI commands
│   └── utils/
│       ├── validation.ts           # Validation helpers
│       ├── serialization.ts        # Output formatting
│       └── prompts.ts              # AI prompt templates
├── tests/
│   ├── unit/                       # Unit tests
│   └── integration/                # Integration tests
├── examples/                       # Usage examples
├── docs/
│   ├── ARCHITECTURE.md             # Complete architecture
│   ├── API.md                      # API reference
│   ├── INTEGRATION.md              # Integration guide
│   ├── DIRECTORY_STRUCTURE.md      # Project layout
│   └── IMPLEMENTATION_PLAN.md      # Implementation guide
├── config/
│   └── .agentic-synth.example.json
├── package.json
├── tsconfig.json
└── README.md
API Design
SDK API
import { AgenticSynth } from 'agentic-synth';
// Initialize
const synth = new AgenticSynth({
  apiKeys: {
    gemini: process.env.GEMINI_API_KEY,
    openRouter: process.env.OPENROUTER_API_KEY
  },
  cache: { enabled: true, maxSize: 1000 }
});
// Generate data
const result = await synth.generate('timeseries', {
  count: 1000,
  schema: { temperature: { type: 'number', min: -20, max: 40 } }
});
// Stream generation
for await (const record of synth.generateStream('events', { count: 1000 })) {
  console.log(record);
}
CLI API
# Generate time-series data
npx agentic-synth generate timeseries \
  --count 1000 \
  --schema ./schema.json \
  --output data.json
# Batch generation
npx agentic-synth batch generate \
  --config ./batch-config.yaml \
  --parallel 4
Data Flow
User Request
↓
Request Parser (validate schema, options)
↓
Generator Hub (select appropriate generator)
↓
Model Router (choose AI model: Gemini/OpenRouter)
↓
Cache Check ──→ Cache Hit? ──→ Return cached
↓ (Miss)
AI Provider (Gemini/OpenRouter)
↓
Generated Data
↓
Post-Processor (validate, transform)
↓
├─→ Store in Cache
├─→ Stream via Midstreamer (if enabled)
├─→ Store in Ruvector (if enabled)
└─→ Output Handler (JSON/CSV/Parquet/Stream)
Key Components
1. Generator System
TimeSeriesGenerator
- Generate time-series data with trends, seasonality, noise
- Configurable sample rates and time ranges
- Statistical distribution control
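To make "trends, seasonality, noise" concrete, the sketch below composes the three terms numerically. It is only an illustration of the signal model: the option names are hypothetical, no AI model is called, and the noise is uniform rather than drawn from a configurable distribution.

```typescript
// Hypothetical option names for a locally computed series.
interface SeriesOptions {
  count: number;
  startMs: number;      // timestamp of the first sample
  intervalMs: number;   // spacing between samples
  base: number;         // baseline value
  trendPerStep: number; // linear trend added per sample
  amplitude: number;    // seasonal amplitude
  period: number;       // seasonal period, in samples
  noise: number;        // maximum absolute uniform noise
}

interface SeriesPoint {
  timestamp: number;
  value: number;
}

// value = base + trend + seasonal + noise; `random` is injectable for reproducibility.
function generateSeries(o: SeriesOptions, random: () => number = Math.random): SeriesPoint[] {
  return Array.from({ length: o.count }, (_, i) => {
    const trend = o.trendPerStep * i;
    const seasonal = o.amplitude * Math.sin((2 * Math.PI * i) / o.period);
    const jitter = (random() * 2 - 1) * o.noise;
    return {
      timestamp: o.startMs + i * o.intervalMs,
      value: o.base + trend + seasonal + jitter,
    };
  });
}
```

Passing a fixed `random` function makes the output deterministic, which is convenient for unit tests.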
EventGenerator
- Generate event streams with timestamps
- Rate control (events per second/minute)
- Distribution types (uniform, poisson, bursty)
- Event correlations
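For the Poisson distribution type, event times can be produced by sampling exponentially distributed gaps between events. The sketch below shows that standard technique; the function name and parameters are assumptions, not the generator's actual API.

```typescript
// Timestamps (ms) of a Poisson process with the given average rate.
// Gaps are sampled by inverse transform: gap = -ln(1 - u) / rate, with u in [0, 1).
function poissonTimestamps(
  ratePerSecond: number,
  count: number,
  startMs: number,
  random: () => number = Math.random,
): number[] {
  if (ratePerSecond <= 0) {
    throw new RangeError("ratePerSecond must be positive");
  }
  const timestamps: number[] = [];
  let current = startMs;
  for (let i = 0; i < count; i++) {
    const gapSeconds = -Math.log(1 - random()) / ratePerSecond;
    current += gapSeconds * 1000;
    timestamps.push(current);
  }
  return timestamps;
}
```

A uniform distribution would instead use a constant gap of `1000 / ratePerSecond` milliseconds.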
StructuredGenerator
- Generate structured records based on schema
- Field type support (string, number, boolean, datetime, enum)
- Constraint enforcement (unique, range, foreign keys)
- Output formats (JSON, CSV, Parquet)
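Constraint enforcement can be pictured as a check of each generated record against its schema. The sketch below handles a subset of the field types listed above with hand-written checks to stay dependency-free; the `FieldSpec` shape is an assumption modeled on the `{ type: 'number', min, max }` schema in the SDK example, and the project itself plans to use Zod for validation.

```typescript
// Assumed field spec shapes (subset: datetime and foreign keys are omitted).
type FieldSpec =
  | { type: "number"; min?: number; max?: number }
  | { type: "string" }
  | { type: "boolean" }
  | { type: "enum"; values: string[] };

type RecordSchema = Record<string, FieldSpec>;

// Returns a list of human-readable violations; an empty list means the record is valid.
function validateRecord(schema: RecordSchema, record: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [field, spec] of Object.entries(schema)) {
    const value = record[field];
    switch (spec.type) {
      case "number":
        if (typeof value !== "number" || Number.isNaN(value)) {
          errors.push(`${field}: expected number`);
        } else if (spec.min !== undefined && value < spec.min) {
          errors.push(`${field}: below minimum ${spec.min}`);
        } else if (spec.max !== undefined && value > spec.max) {
          errors.push(`${field}: above maximum ${spec.max}`);
        }
        break;
      case "string":
      case "boolean":
        if (typeof value !== spec.type) {
          errors.push(`${field}: expected ${spec.type}`);
        }
        break;
      case "enum":
        if (typeof value !== "string" || !spec.values.includes(value)) {
          errors.push(`${field}: expected one of ${spec.values.join(", ")}`);
        }
        break;
    }
  }
  return errors;
}
```

A post-processor could run this over model output and either drop or regenerate records that return a non-empty error list.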
2. Model System
GeminiProvider
- Google Gemini API integration
- Context caching support
- Streaming responses
- Cost tracking
OpenRouterProvider
- OpenRouter API integration
- Multi-model access
- Automatic fallback
- Cost optimization
ModelRouter
- Smart routing strategies
- Fallback chain management
- Cost/performance/quality optimization
- Request caching
3. Integration System
MidstreamerAdapter
- Stream data through pipelines
- Buffer management
- Transform support
- Multiple output targets
AgenticRoboticsAdapter
- Workflow registration
- Scheduled generation
- Event-driven triggers
- Automation integration
RuvectorAdapter
- Vector storage
- Similarity search
- Batch operations
- Embedding generation
Configuration
Environment Variables
GEMINI_API_KEY=your-gemini-key
OPENROUTER_API_KEY=your-openrouter-key
Config File (.agentic-synth.json)
{
  "apiKeys": {
    "gemini": "${GEMINI_API_KEY}",
    "openRouter": "${OPENROUTER_API_KEY}"
  },
  "cache": {
    "enabled": true,
    "maxSize": 1000,
    "ttl": 3600000
  },
  "models": {
    "routing": {
      "strategy": "cost-optimized",
      "fallbackChain": ["gemini-pro", "gpt-4"]
    }
  },
  "integrations": {
    "midstreamer": { "enabled": false },
    "agenticRobotics": { "enabled": false },
    "ruvector": { "enabled": false }
  }
}
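The `${VAR}` placeholders in the config file imply a substitution pass when the file is loaded. A minimal sketch of such a pass follows; the function name is hypothetical, and failing on a missing variable is an assumption (the loader could equally leave the placeholder untouched).

```typescript
type EnvMap = Record<string, string | undefined>;

// Recursively replace ${NAME} placeholders in every string of a parsed config.
function substituteEnv<T>(value: T, env: EnvMap): T {
  if (typeof value === "string") {
    const replaced = value.replace(
      /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
      (_match: string, variable: string) => {
        const resolved = env[variable];
        if (resolved === undefined) {
          throw new Error(`Missing environment variable: ${variable}`);
        }
        return resolved;
      },
    );
    return replaced as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteEnv(item, env)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = substituteEnv(child, env);
    }
    return out as unknown as T;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}
```

In Node.js the second argument would typically be `process.env`, applied to the result of `JSON.parse` on the config file.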
Performance Considerations
Context Caching:
- Hash-based cache keys (prompt + schema + options)
- LRU eviction strategy
- Configurable TTL
- Optional file persistence
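LRU eviction with a TTL can be built on a `Map`, which iterates in insertion order. The class below is a sketch under that assumption, not the planned `Cache.ts`; the injectable clock is a hypothetical addition that makes expiry testable.

```typescript
// Minimal LRU cache with per-entry expiry.
class LruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With the config shown earlier this would be constructed as `new LruCache(1000, 3600000)`, assuming `ttl` is expressed in milliseconds.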
Memory Management:
- Streaming for large datasets
- Chunked processing
- Configurable batch sizes
- Memory-efficient formats (JSONL, Parquet)
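Chunked processing with a configurable batch size can be expressed as a generator that yields fixed-size batches, each of which is then serialized to JSONL. Both helper names below are hypothetical.

```typescript
// Yield items from `source` in arrays of at most `batchSize` elements.
function* inChunks<T>(source: Iterable<T>, batchSize: number): Generator<T[], void, undefined> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch; // final partial batch
  }
}

// Serialize one batch as JSONL: one JSON document per line.
function toJsonl(records: Iterable<unknown>): string {
  let output = "";
  for (const record of records) {
    output += `${JSON.stringify(record)}\n`;
  }
  return output;
}
```

Because only one batch is held at a time, memory use is bounded by the batch size rather than by the size of the dataset.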
Model Selection:
- Cost-based: Cheapest model that meets requirements
- Performance-based: Fastest response time
- Quality-based: Highest quality output
- Balanced: Optimize all three factors
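The four selection modes can be sketched as scoring functions over per-model metadata. Everything here is illustrative: the `ModelProfile` fields and the balanced formula are assumptions, and only the `cost-optimized` strategy name appears in the example config.

```typescript
type RoutingStrategy =
  | "cost-optimized"
  | "performance-optimized"
  | "quality-optimized"
  | "balanced";

// Hypothetical per-model metadata used for routing decisions.
interface ModelProfile {
  model: string;
  costPer1kTokens: number; // lower is better
  latencyMs: number;       // lower is better
  quality: number;         // 0..1, higher is better
}

// Pick the best model for a strategy among those meeting a minimum quality.
function selectModel(
  models: ModelProfile[],
  strategy: RoutingStrategy,
  minQuality = 0,
): ModelProfile {
  const eligible = models.filter((m) => m.quality >= minQuality);
  if (eligible.length === 0) {
    throw new Error("No model meets the quality requirement");
  }
  // Normalizers for the balanced score; `|| 1` avoids division by zero.
  const maxCost = Math.max(...eligible.map((m) => m.costPer1kTokens)) || 1;
  const maxLatency = Math.max(...eligible.map((m) => m.latencyMs)) || 1;

  // Every scorer returns "higher is better".
  const scorers: Record<RoutingStrategy, (m: ModelProfile) => number> = {
    "cost-optimized": (m) => -m.costPer1kTokens,
    "performance-optimized": (m) => -m.latencyMs,
    "quality-optimized": (m) => m.quality,
    balanced: (m) => m.quality - m.costPer1kTokens / maxCost - m.latencyMs / maxLatency,
  };
  const score = scorers[strategy];
  return eligible.reduce((best, m) => (score(m) > score(best) ? m : best));
}
```

The `minQuality` filter is what turns "cheapest model" into "cheapest model that meets requirements".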
Security
API Key Management:
- Environment variable loading
- Config file with variable substitution
- Never log sensitive data
- Secure config file patterns
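One way to honor "never log sensitive data" is to redact objects before they reach the logger. The helper below is a sketch: the key pattern is an assumption and would need to match the project's real field names.

```typescript
// Keys whose values should never be written to logs (assumed pattern).
const SENSITIVE_KEY = /(api[-_]?key|token|secret|password)/i;

// Return a deep copy with the values of sensitive keys replaced.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(child);
    }
    return out;
  }
  return value;
}
```

Applied to the config shown earlier, the whole `apiKeys` object is replaced because its key matches the pattern, so the nested provider keys never reach the log output.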
Data Validation:
- Input validation (Zod schemas)
- Output validation
- Sanitization
- Rate limiting
Testing Strategy
Unit Tests:
- Component isolation
- Mock dependencies
- Logic correctness
Integration Tests:
- Component interactions
- Real dependencies
- Error scenarios
E2E Tests:
- Complete workflows
- CLI commands
- Real API calls (test keys)
Implementation Status
Completed ✅
- Complete architecture design
- Type system definitions
- Core configuration system
- SDK class structure
- Generator interfaces
- Comprehensive documentation
- package.json with the required dependencies
- TypeScript configuration
- Directory structure
Remaining 🔨
- Cache Manager implementation
- Logger implementation
- Generator implementations
- Model provider implementations
- Model router implementation
- Integration adapters
- CLI commands
- Utilities (serialization, prompts)
- Tests
- Examples
Next Steps for Builder Agent
1. Start with Core Infrastructure
   - Implement Cache Manager (/src/core/Cache.ts)
   - Implement Logger (/src/core/Logger.ts)
2. Implement Model System
   - Gemini provider
   - OpenRouter provider
   - Model router
3. Implement Generator System
   - Generator Hub
   - TimeSeries, Events, Structured generators
4. Wire SDK Together
   - Complete AgenticSynth implementation
   - Add event emitters
   - Add progress tracking
5. Build CLI
   - CLI entry point
   - Commands (generate, batch, cache, config)
6. Add Integrations
   - Midstreamer adapter
   - AgenticRobotics adapter
   - Ruvector adapter
7. Testing & Examples
   - Unit tests
   - Integration tests
   - Example code
Success Criteria
✅ All TypeScript compiles without errors
✅ npm run build succeeds
✅ npm test passes all tests
✅ npx agentic-synth --help works
✅ Examples run successfully
✅ Documentation is comprehensive
✅ Package ready for npm publish
Resources
- Architecture: /docs/ARCHITECTURE.md
- API Reference: /docs/API.md
- Integration Guide: /docs/INTEGRATION.md
- Implementation Plan: /docs/IMPLEMENTATION_PLAN.md
- Directory Structure: /docs/DIRECTORY_STRUCTURE.md
Architecture design is complete. Ready for builder agent implementation! 🚀