# Agentic-Synth Architecture Summary
## Overview
Complete architecture design for **agentic-synth** - a TypeScript-based synthetic data generator using Gemini and OpenRouter APIs with streaming and automation support.
## Key Design Decisions
### 1. Technology Stack
**Core:**
- TypeScript 5.7+ with strict mode
- ESM modules (NodeNext)
- Zod for runtime validation
- Winston for logging
- Commander for CLI
**AI Providers:**
- Google Gemini API via `@google/generative-ai`
- OpenRouter API via OpenAI-compatible SDK
**Optional Integrations:**
- Midstreamer (streaming pipelines)
- Agentic-Robotics (automation workflows)
- Ruvector (vector database) - workspace dependency
### 2. Architecture Patterns
**Dual Interface:**
- SDK for programmatic access
- CLI for command-line usage
- CLI uses SDK internally (single source of truth)
**Plugin Architecture:**
- Generator plugins for different data types
- Model provider plugins for AI APIs
- Integration adapters for external tools
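The plugin idea can be sketched as a small registry keyed by data type. This is only an illustration: the `DataGenerator` interface, the `GeneratorHub` method names, and the `counter` plugin are assumptions made for the example, not the package's actual definitions.

```typescript
// Hypothetical plugin contract; the real generator interface may differ.
interface DataGenerator {
  readonly type: string;
  generate(options: { count: number }): unknown[];
}

// Registry that maps a data type name to the plugin that handles it.
class GeneratorHub {
  private readonly generators = new Map<string, DataGenerator>();

  register(generator: DataGenerator): void {
    if (this.generators.has(generator.type)) {
      throw new Error(`Generator already registered: ${generator.type}`);
    }
    this.generators.set(generator.type, generator);
  }

  get(type: string): DataGenerator {
    const generator = this.generators.get(type);
    if (!generator) {
      throw new Error(`Unknown generator type: ${type}`);
    }
    return generator;
  }

  types(): string[] {
    return [...this.generators.keys()];
  }
}

// Usage: register a toy plugin and look it up by type.
const hub = new GeneratorHub();
hub.register({
  type: 'counter',
  generate: (options) =>
    Array.from({ length: options.count }, (_, index) => ({ index })),
});
const sample = hub.get('counter').generate({ count: 3 });
```

Model providers and integration adapters could follow the same register-and-look-up shape, each with its own contract.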
**Caching Strategy:**
- In-memory LRU cache (no Redis)
- Optional file-based persistence
- Content-based cache keys
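One way to derive a content-based key is to hash a canonical serialization of the request, so that two requests with the same content map to the same entry regardless of property order. The sketch below assumes SHA-256 and a sorted-key JSON form; `canonicalize` and `cacheKey` are hypothetical helper names, not part of the package.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted so property order does not affect the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as { [key: string]: unknown };
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(',')}}`;
  }
  // JSON.stringify returns undefined for undefined/functions; treat those as null.
  const primitive: string | undefined = JSON.stringify(value);
  return primitive === undefined ? 'null' : primitive;
}

// Hash the prompt, schema, and options together into a hex digest.
function cacheKey(prompt: string, schema: unknown, options: unknown): string {
  return createHash('sha256')
    .update(canonicalize({ prompt, schema, options }))
    .digest('hex');
}
```

Sorting keys before hashing is a design choice: it trades a little CPU time for a higher hit rate when callers build equivalent option objects in different orders.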
**Model Routing:**
- Cost-optimized routing
- Performance-optimized routing
- Quality-optimized routing
- Fallback chains for reliability
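A fallback chain can be sketched as a loop that tries each model in order and returns the first successful response. The `ChainEntry` shape and the `generateWithFallback` name are assumptions for illustration; real provider calls would replace the `call` functions.

```typescript
// Hypothetical shape: a model name plus a function that calls it.
interface ChainEntry {
  model: string;
  call: (prompt: string) => Promise<string>;
}

// Try each entry in order; return the first success, or fail with all errors.
async function generateWithFallback(
  chain: ChainEntry[],
  prompt: string,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const { model, call } of chain) {
    try {
      return { model, text: await call(prompt) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All models failed: ${errors.join('; ')}`);
}
```

Collecting the per-model error messages keeps the final failure debuggable without logging the prompt itself.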
### 3. Integration Design
**Optional Dependencies:**
All integrations are optional with runtime detection:
- Package works standalone
- Graceful degradation if integrations unavailable
- Clear documentation about optional features
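Runtime detection of an optional package can be done with a dynamic `import()` wrapped in a `try`/`catch`. The sketch below is one possible approach; `loadOptional` and `detectIntegrations` are hypothetical helpers, and the package names passed in are up to the caller.

```typescript
// Try to load an optional package; resolve to null instead of throwing
// when it cannot be imported (for example, because it is not installed).
async function loadOptional(
  packageName: string,
): Promise<{ [exportName: string]: unknown } | null> {
  try {
    return await import(packageName);
  } catch {
    return null;
  }
}

// Report which of the given packages can be loaded in this environment.
async function detectIntegrations(
  packageNames: string[],
): Promise<Map<string, boolean>> {
  const availability = new Map<string, boolean>();
  for (const name of packageNames) {
    availability.set(name, (await loadOptional(name)) !== null);
  }
  return availability;
}
```

Note that this treats any import failure as "unavailable", including errors thrown while the package initializes; a stricter version might inspect the error code before degrading.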
**Integration Points:**
1. **Midstreamer**: Stream generated data through pipelines
2. **Agentic-Robotics**: Register data generation workflows
3. **Ruvector**: Store generated data as vectors
## Project Structure
```
packages/agentic-synth/
├── src/
│   ├── index.ts                    # Main SDK entry
│   ├── types/index.ts              # Type definitions
│   ├── sdk/AgenticSynth.ts         # Main SDK class
│   ├── core/
│   │   ├── Config.ts               # Configuration system
│   │   ├── Cache.ts                # LRU cache manager
│   │   └── Logger.ts               # Logging system
│   ├── generators/
│   │   ├── base.ts                 # Generator interface
│   │   ├── Hub.ts                  # Generator registry
│   │   ├── TimeSeries.ts           # Time-series generator
│   │   ├── Events.ts               # Event generator
│   │   └── Structured.ts           # Structured data generator
│   ├── models/
│   │   ├── base.ts                 # Model provider interface
│   │   ├── Router.ts               # Model routing logic
│   │   └── providers/
│   │       ├── Gemini.ts           # Gemini integration
│   │       └── OpenRouter.ts       # OpenRouter integration
│   ├── integrations/
│   │   ├── Manager.ts              # Integration lifecycle
│   │   ├── Midstreamer.ts          # Streaming adapter
│   │   ├── AgenticRobotics.ts      # Automation adapter
│   │   └── Ruvector.ts             # Vector DB adapter
│   ├── bin/
│   │   ├── cli.ts                  # CLI entry point
│   │   └── commands/               # CLI commands
│   └── utils/
│       ├── validation.ts           # Validation helpers
│       ├── serialization.ts        # Output formatting
│       └── prompts.ts              # AI prompt templates
├── tests/
│   ├── unit/                       # Unit tests
│   └── integration/                # Integration tests
├── examples/                       # Usage examples
├── docs/
│   ├── ARCHITECTURE.md             # Complete architecture
│   ├── API.md                      # API reference
│   ├── INTEGRATION.md              # Integration guide
│   ├── DIRECTORY_STRUCTURE.md      # Project layout
│   └── IMPLEMENTATION_PLAN.md      # Implementation guide
├── config/
│   └── .agentic-synth.example.json
├── package.json
├── tsconfig.json
└── README.md
```
## API Design
### SDK API
```typescript
import { AgenticSynth } from 'agentic-synth';

// Initialize
const synth = new AgenticSynth({
  apiKeys: {
    gemini: process.env.GEMINI_API_KEY,
    openRouter: process.env.OPENROUTER_API_KEY
  },
  cache: { enabled: true, maxSize: 1000 }
});

// Generate data
const result = await synth.generate('timeseries', {
  count: 1000,
  schema: { temperature: { type: 'number', min: -20, max: 40 } }
});

// Stream generation
for await (const record of synth.generateStream('events', { count: 1000 })) {
  console.log(record);
}
```
### CLI API
```bash
# Generate time-series data
npx agentic-synth generate timeseries \
  --count 1000 \
  --schema ./schema.json \
  --output data.json

# Batch generation
npx agentic-synth batch generate \
  --config ./batch-config.yaml \
  --parallel 4
```
## Data Flow
```
User Request
    ↓
Request Parser (validate schema, options)
    ↓
Generator Hub (select appropriate generator)
    ↓
Model Router (choose AI model: Gemini/OpenRouter)
    ↓
Cache Check ──→ Cache Hit? ──→ Return cached
    ↓ (Miss)
AI Provider (Gemini/OpenRouter)
    ↓
Generated Data
    ↓
Post-Processor (validate, transform)
    ├─→ Store in Cache
    ├─→ Stream via Midstreamer (if enabled)
    ├─→ Store in Ruvector (if enabled)
    └─→ Output Handler (JSON/CSV/Parquet/Stream)
```
## Key Components
### 1. Generator System
**TimeSeriesGenerator**
- Generate time-series data with trends, seasonality, noise
- Configurable sample rates and time ranges
- Statistical distribution control
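To make the trend, seasonality, and noise components concrete, here is a purely numeric sketch that composes them additively. It makes no model call and is not the package's generator; `SeriesOptions`, `seededRandom`, and `generateSeries` are hypothetical names, and uniform noise is an arbitrary choice.

```typescript
interface SeriesOptions {
  count: number;
  startMs: number;           // timestamp of the first sample
  intervalMs: number;        // spacing between samples
  base: number;              // starting level
  trendPerStep: number;      // linear drift added per sample
  seasonalAmplitude: number; // height of the sine component
  seasonalPeriod: number;    // period of the sine component, in samples
  noiseAmplitude: number;    // uniform noise in [-amp, amp)
  seed: number;
}

// Small seeded PRNG (mulberry32-style) so runs are reproducible.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function generateSeries(o: SeriesOptions): Array<{ timestamp: number; value: number }> {
  const rand = seededRandom(o.seed);
  return Array.from({ length: o.count }, (_, i) => {
    const trend = o.trendPerStep * i;
    const seasonal =
      o.seasonalAmplitude * Math.sin((2 * Math.PI * i) / o.seasonalPeriod);
    const noise = o.noiseAmplitude * (rand() * 2 - 1);
    return {
      timestamp: o.startMs + i * o.intervalMs,
      value: o.base + trend + seasonal + noise,
    };
  });
}
```

A seeded generator is used so that the same options always yield the same series, which makes generated fixtures easier to test.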
**EventGenerator**
- Generate event streams with timestamps
- Rate control (events per second/minute)
- Distribution types (uniform, poisson, bursty)
- Event correlations
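For the Poisson distribution type, event times can be produced by drawing exponentially distributed gaps between consecutive events. The sketch below shows that technique in isolation; `poissonTimestamps` and its parameters are assumptions for illustration, and the random source is injected so it can be seeded.

```typescript
// Exponential inter-arrival gaps yield a Poisson process with the given rate.
function poissonTimestamps(
  ratePerSecond: number,
  count: number,
  startMs: number,
  rand: () => number = Math.random,
): number[] {
  if (ratePerSecond <= 0) {
    throw new RangeError('ratePerSecond must be positive');
  }
  const timestamps: number[] = [];
  let t = startMs;
  for (let i = 0; i < count; i++) {
    // Inverse-transform sampling; 1 - rand() lies in (0, 1], so the log is finite.
    const gapSeconds = -Math.log(1 - rand()) / ratePerSecond;
    t += gapSeconds * 1000;
    timestamps.push(t);
  }
  return timestamps;
}
```

The mean gap is `1 / ratePerSecond` seconds, so a rate of 10 produces events roughly every 100 ms on average, with the irregular spacing that a uniform schedule lacks.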
**StructuredGenerator**
- Generate structured records based on schema
- Field type support (string, number, boolean, datetime, enum)
- Constraint enforcement (unique, range, foreign keys)
- Output formats (JSON, CSV, Parquet)
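Constraint enforcement can be illustrated with a small hand-written checker for type, range, and uniqueness rules. This sketch deliberately avoids any library so it runs standalone (the package itself lists Zod for validation); `FieldSpec`, `RowSchema`, and `validateRows` are hypothetical names, and foreign-key checks are left out.

```typescript
// Hypothetical field specification, modeled on the schema used in the SDK example.
type FieldSpec =
  | { type: 'number'; min?: number; max?: number; unique?: boolean }
  | { type: 'string'; unique?: boolean }
  | { type: 'boolean' }
  | { type: 'enum'; values: string[] };

type RowSchema = { [field: string]: FieldSpec };
type Row = { [field: string]: unknown };

// Return a list of human-readable violations; an empty list means all rows pass.
function validateRows(schema: RowSchema, rows: Row[]): string[] {
  const errors: string[] = [];
  const seen = new Map<string, Set<unknown>>();
  rows.forEach((row, i) => {
    for (const [field, spec] of Object.entries(schema)) {
      const value = row[field];
      const where = `row ${i}, field "${field}"`;
      if (spec.type === 'enum') {
        if (typeof value !== 'string' || !spec.values.includes(value)) {
          errors.push(`${where}: value not in enum`);
        }
        continue;
      }
      if (typeof value !== spec.type) {
        errors.push(`${where}: expected ${spec.type}`);
        continue;
      }
      if (spec.type === 'number') {
        const n = value as number;
        if (spec.min !== undefined && n < spec.min) errors.push(`${where}: below min ${spec.min}`);
        if (spec.max !== undefined && n > spec.max) errors.push(`${where}: above max ${spec.max}`);
      }
      if ('unique' in spec && spec.unique) {
        const values = seen.get(field) ?? new Set<unknown>();
        if (values.has(value)) errors.push(`${where}: duplicate value`);
        values.add(value);
        seen.set(field, values);
      }
    }
  });
  return errors;
}
```

Returning all violations rather than throwing on the first one lets a post-processing step decide whether to reject, repair, or regenerate the offending rows.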
### 2. Model System
**GeminiProvider**
- Google Gemini API integration
- Context caching support
- Streaming responses
- Cost tracking
**OpenRouterProvider**
- OpenRouter API integration
- Multi-model access
- Automatic fallback
- Cost optimization
**ModelRouter**
- Smart routing strategies
- Fallback chain management
- Cost/performance/quality optimization
- Request caching
### 3. Integration System
**MidstreamerAdapter**
- Stream data through pipelines
- Buffer management
- Transform support
- Multiple output targets
**AgenticRoboticsAdapter**
- Workflow registration
- Scheduled generation
- Event-driven triggers
- Automation integration
**RuvectorAdapter**
- Vector storage
- Similarity search
- Batch operations
- Embedding generation
## Configuration
### Environment Variables
```bash
GEMINI_API_KEY=your-gemini-key
OPENROUTER_API_KEY=your-openrouter-key
```
### Config File (`.agentic-synth.json`)
```json
{
  "apiKeys": {
    "gemini": "${GEMINI_API_KEY}",
    "openRouter": "${OPENROUTER_API_KEY}"
  },
  "cache": {
    "enabled": true,
    "maxSize": 1000,
    "ttl": 3600000
  },
  "models": {
    "routing": {
      "strategy": "cost-optimized",
      "fallbackChain": ["gemini-pro", "gpt-4"]
    }
  },
  "integrations": {
    "midstreamer": { "enabled": false },
    "agenticRobotics": { "enabled": false },
    "ruvector": { "enabled": false }
  }
}
```
## Performance Considerations
**Context Caching:**
- Hash-based cache keys (prompt + schema + options)
- LRU eviction strategy
- Configurable TTL
- Optional file persistence
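LRU eviction with a TTL can be sketched on top of a `Map`, which iterates keys in insertion order. The class below is an illustration under that assumption, not the package's Cache Manager; `LruCache` and its injectable clock are hypothetical, and file persistence is omitted.

```typescript
class LruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are dropped lazily on access.
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With the config values shown earlier, this would be constructed as `new LruCache(1000, 3600000)`. Reads here do not extend an entry's lifetime; whether a hit should refresh the TTL is a policy decision left open.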
**Memory Management:**
- Streaming for large datasets
- Chunked processing
- Configurable batch sizes
- Memory-efficient formats (JSONL, Parquet)
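Chunked processing can be sketched with a generator that groups any iterable into fixed-size batches, paired with a JSONL serializer that writes one record per line. Both `inBatches` and `toJsonl` are hypothetical helpers shown only to illustrate the idea.

```typescript
// Lazily group items into arrays of at most batchSize elements.
function* inBatches<T>(items: Iterable<T>, batchSize: number): Generator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let batch: T[] = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (batch.length > 0) yield batch;
}

// Serialize records as JSON Lines: one JSON document per line.
function toJsonl(records: unknown[]): string {
  return records.map((record) => JSON.stringify(record)).join('\n') + '\n';
}
```

Because the generator pulls from its source lazily, only one batch is held in memory at a time, and each batch can be appended to a JSONL file as soon as it is ready.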
**Model Selection:**
- Cost-based: Cheapest model that meets requirements
- Performance-based: Fastest response time
- Quality-based: Highest quality output
- Balanced: Optimize all three factors
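The four selection modes can be sketched as a scoring function over per-model cost, latency, and quality figures. Everything here is an assumption for illustration: the `ModelProfile` fields, the strategy names other than `cost-optimized` (which appears in the config example), and the equal weighting used for the balanced mode.

```typescript
type RoutingStrategy =
  | 'cost-optimized'
  | 'performance-optimized'
  | 'quality-optimized'
  | 'balanced';

// Hypothetical per-model figures; real values would come from provider metadata.
interface ModelProfile {
  name: string;
  costPer1kTokens: number; // lower is better
  latencyMs: number;       // lower is better
  quality: number;         // 0..1, higher is better
}

// Build a min-max scaler to [0, 1]; a constant column maps to 0.
function scaler(values: number[]): (v: number) => number {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return (v) => (max === min ? 0 : (v - min) / (max - min));
}

function selectModel(
  models: ModelProfile[],
  strategy: RoutingStrategy,
  minQuality = 0,
): ModelProfile {
  const eligible = models.filter((m) => m.quality >= minQuality);
  if (eligible.length === 0) {
    throw new Error('No model meets the minimum quality');
  }
  const cost = scaler(eligible.map((m) => m.costPer1kTokens));
  const latency = scaler(eligible.map((m) => m.latencyMs));
  const quality = scaler(eligible.map((m) => m.quality));

  // Lower score wins under every strategy.
  const score = (m: ModelProfile): number => {
    switch (strategy) {
      case 'cost-optimized':
        return cost(m.costPer1kTokens);
      case 'performance-optimized':
        return latency(m.latencyMs);
      case 'quality-optimized':
        return 1 - quality(m.quality);
      case 'balanced':
        return cost(m.costPer1kTokens) + latency(m.latencyMs) + (1 - quality(m.quality));
    }
  };
  return eligible.reduce((best, m) => (score(m) < score(best) ? m : best));
}
```

The `minQuality` filter is how "cheapest model that meets requirements" is expressed: candidates are screened first, then ranked by the chosen strategy.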
## Security
**API Key Management:**
- Environment variable loading
- Config file with variable substitution
- Never log sensitive data
- Secure config file patterns
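Variable substitution for placeholders such as `${GEMINI_API_KEY}` can be sketched as a recursive walk over the parsed config. `substituteEnv` is a hypothetical helper, and failing on a missing variable (rather than leaving the placeholder in place) is an assumed policy.

```typescript
// Replace ${NAME} placeholders in every string of a parsed config object.
function substituteEnv(
  value: unknown,
  env: { [name: string]: string | undefined },
): unknown {
  if (typeof value === 'string') {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        // Report only the variable name, never a secret value.
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteEnv(item, env));
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, substituteEnv(inner, env)]),
    );
  }
  return value;
}
```

In practice the second argument would be `process.env`; passing it explicitly keeps the function easy to test and keeps real keys out of committed config files.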
**Data Validation:**
- Input validation (Zod schemas)
- Output validation
- Sanitization
- Rate limiting
## Testing Strategy
**Unit Tests:**
- Component isolation
- Mock dependencies
- Logic correctness
**Integration Tests:**
- Component interactions
- Real dependencies
- Error scenarios
**E2E Tests:**
- Complete workflows
- CLI commands
- Real API calls (test keys)
## Implementation Status
### Completed ✅
- Complete architecture design
- Type system definitions
- Core configuration system
- SDK class structure
- Generator interfaces
- Comprehensive documentation
- Package.json with correct dependencies
- TypeScript configuration
- Directory structure
### Remaining 🔨
- Cache Manager implementation
- Logger implementation
- Generator implementations
- Model provider implementations
- Model router implementation
- Integration adapters
- CLI commands
- Utilities (serialization, prompts)
- Tests
- Examples
## Next Steps for Builder Agent
1. **Start with Core Infrastructure**
   - Implement Cache Manager (`/src/core/Cache.ts`)
   - Implement Logger (`/src/core/Logger.ts`)
2. **Implement Model System**
   - Gemini provider
   - OpenRouter provider
   - Model router
3. **Implement Generator System**
   - Generator Hub
   - TimeSeries, Events, Structured generators
4. **Wire SDK Together**
   - Complete AgenticSynth implementation
   - Add event emitters
   - Add progress tracking
5. **Build CLI**
   - CLI entry point
   - Commands (generate, batch, cache, config)
6. **Add Integrations**
   - Midstreamer adapter
   - AgenticRobotics adapter
   - Ruvector adapter
7. **Testing & Examples**
   - Unit tests
   - Integration tests
   - Example code
## Success Criteria
✅ All TypeScript compiles without errors
✅ `npm run build` succeeds
✅ `npm test` passes all tests
✅ `npx agentic-synth --help` works
✅ Examples run successfully
✅ Documentation is comprehensive
✅ Package ready for npm publish
## Resources
- **Architecture**: `/docs/ARCHITECTURE.md`
- **API Reference**: `/docs/API.md`
- **Integration Guide**: `/docs/INTEGRATION.md`
- **Implementation Plan**: `/docs/IMPLEMENTATION_PLAN.md`
- **Directory Structure**: `/docs/DIRECTORY_STRUCTURE.md`
---
**Architecture design is complete. Ready for builder agent implementation!** 🚀