wifi-densepose/npm/packages/agentic-synth/docs/ADVANCED_USAGE.md

# 🎓 Agentic-Synth Advanced Usage Guide

**Version**: 0.1.0
**Last Updated**: 2025-11-22

---

## Table of Contents

1. [Advanced Data Generation](#1-advanced-data-generation)
2. [Real-World Integration Examples](#2-real-world-integration-examples)
3. [Performance Optimization](#3-performance-optimization)
4. [Production Deployment](#4-production-deployment)
5. [Error Handling & Monitoring](#5-error-handling--monitoring)
6. [Advanced Patterns](#6-advanced-patterns)

---

## 1. Advanced Data Generation

### 1.1 Complex Time-Series with Custom Patterns

Generate realistic stock market data with multiple indicators:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory',
  cacheTTL: 3600
});

// Generate 1 year of stock market data
const stockData = await synth.generateTimeSeries({
  count: 365,
  startDate: '2024-01-01',
  interval: '1d',
  schema: {
    date: 'ISO date',
    open: 'number (100-200)',
    high: 'number (105-210)',
    low: 'number (95-195)',
    close: 'number (100-200)',
    volume: 'number (1000000-10000000)',
    // Technical indicators
    sma_20: 'number (calculated 20-day moving average)',
    rsi_14: 'number (0-100, RSI indicator)',
    macd: 'number (-5 to 5)',
    bollinger_upper: 'number',
    bollinger_lower: 'number'
  },
  constraints: [
    'high must be >= open and close',
    'low must be <= open and close',
    'close of previous day influences next day open (± 5%)',
    'volume increases on large price changes',
    'RSI correlates with price momentum',
    'SMA_20 follows price trend'
  ]
});

console.log(`Generated ${stockData.data.length} days of stock data`);
console.log('Cache hit rate:', stockData.metadata.cached ? '100%' : '0%');
```

### 1.2 Multi-User Event Simulation

Simulate realistic user behavior across 100 users:

```typescript
const userBehaviorSim = await synth.generateEvents({
  count: 10000,
  timeRange: {
    start: '2024-01-01T00:00:00Z',
    end: '2024-01-31T23:59:59Z'
  },
  eventTypes: [
    'page_view',
    'click',
    'form_submit',
    'purchase',
    'logout',
    'search',
    'add_to_cart'
  ],
  schema: {
    event_id: 'UUID',
    user_id: 'UUID (one of 100 consistent users)',
    event_type: 'one of eventTypes',
    timestamp: 'ISO timestamp within timeRange',
    session_id: 'UUID (consistent per user session)',
    page_url: 'URL path',
    metadata: {
      device: 'mobile | desktop | tablet',
      browser: 'chrome | firefox | safari | edge',
      country: 'ISO country code',
      referrer: 'URL or null',
      duration_ms: 'number (100-30000 for page_view)',
      cart_value: 'number (only for add_to_cart/purchase)',
      search_query: 'string (only for search events)'
    }
  },
  constraints: [
    'Users follow realistic session patterns (20-40 events per session)',
    'Purchase events must be preceded by add_to_cart',
    'Events follow temporal ordering per user',
    'Session gaps between 30min-24hours',
    'Time distribution follows Poisson with peak hours 10am-4pm',
    'Mobile users more common in evening hours',
    'Purchase conversion rate ~2-3%'
  ]
});

console.log('Event simulation complete');
console.log('Total events:', userBehaviorSim.data.length);
console.log('Unique users:', new Set(userBehaviorSim.data.map(e => e.user_id)).size);
```

### 1.3 Nested Schema Generation

Generate complex e-commerce orders with line items:

```typescript
const orders = await synth.generateStructured({
  count: 1000,
  schema: {
    order_id: 'UUID',
    customer: {
      customer_id: 'UUID',
      email: 'valid email',
      name: 'full name',
      phone: 'phone number with country code',
      address: {
        street: 'street address',
        city: 'city name',
        state: 'US state code',
        zip: 'US ZIP code',
        country: 'USA'
      }
    },
    items: [
      {
        sku: 'product SKU code',
        name: 'product name',
        category: 'electronics | clothing | home | books | food',
        quantity: 'number (1-10)',
        unit_price: 'number (5-500)',
        discount_percent: 'number (0-30)',
        subtotal: 'calculated: quantity * unit_price * (1 - discount_percent/100)'
      }
    ],
    payment: {
      method: 'credit_card | paypal | apple_pay | google_pay',
      status: 'pending | completed | failed | refunded',
      transaction_id: 'UUID',
      amount: 'sum of all item subtotals'
    },
    shipping: {
      method: 'standard | express | overnight',
      cost: 'number (0-50, based on method)',
      tracking_number: 'tracking code',
      estimated_delivery: 'ISO date (3-10 days from order date)'
    },
    order_date: 'ISO timestamp',
    status: 'pending | processing | shipped | delivered | cancelled',
    total_amount: 'payment.amount + shipping.cost',
    notes: 'optional customer notes or null'
  },
  constraints: [
    'Each order has 1-8 line items',
    'Total amount must equal sum of items + shipping',
    'Status progression: pending → processing → shipped → delivered',
    'Express shipping costs 2x standard',
    'Overnight shipping costs 4x standard',
    'Electronics have 10-20% discount',
    'Credit card most common payment (60%)',
    'Standard shipping most common (70%)'
  ]
});

console.log(`Generated ${orders.data.length} complex orders`);
```

---

## 2. Real-World Integration Examples

### 2.1 ML Training Pipeline

Complete machine learning data generation pipeline:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';
import * as fs from 'fs/promises';

class MLTrainingPipeline {
  private synth: AgenticSynth;

  constructor() {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      maxCacheSize: 10000 // Large cache for ML datasets
    });
  }

  async generateTrainingData(samples = 5000) {
    console.log(`Generating ${samples} training samples...`);

    const training = await this.synth.generateStructured({
      count: samples,
      schema: {
        // Feature engineering
        customer_age: 'number (18-80)',
        annual_income: 'number (20000-200000)',
        credit_score: 'number (300-850)',
        account_tenure_months: 'number (1-360)',
        num_products: 'number (1-5)',
        balance: 'number (0-250000)',
        num_transactions_12m: 'number (0-200)',
        avg_transaction_amount: 'number (10-5000)',
        credit_utilization: 'number (0-100)',
        num_late_payments: 'number (0-10)',

        // Target variable
        churn: 'boolean (based on features: higher likelihood if credit_score<600, num_late_payments>3, balance<1000, credit_utilization>80)'
      },
      constraints: [
        'Credit utilization correlates negatively with credit score',
        'Higher income correlates with higher balance',
        'Churn rate should be ~15-20% (imbalanced dataset)',
        'Customers with 1 product more likely to churn',
        'Tenure > 24 months reduces churn likelihood'
      ]
    });

    await fs.writeFile(
      'ml_data/training.json',
      JSON.stringify(training.data, null, 2)
    );

    return training;
  }

  async generateTestData(samples = 1000) {
    console.log(`Generating ${samples} test samples...`);

    // Similar schema, different random seed
    const test = await this.synth.generateStructured({
      count: samples,
      schema: { /* same as training */ },
      constraints: [ /* same as training */ ]
    });

    await fs.writeFile(
      'ml_data/test.json',
      JSON.stringify(test.data, null, 2)
    );

    return test;
  }

  async generateEdgeCases(samples = 100) {
    console.log(`Generating ${samples} edge case samples...`);

    const edgeCases = await this.synth.generateStructured({
      count: samples,
      schema: { /* same schema */ },
      constraints: [
        'Include extreme values: age 18-22 or 75-80',
        'Include very high credit_utilization (90-100)',
        'Include very low credit_score (300-400)',
        'Include zero balance accounts',
        'Include customers with num_products = 5'
      ]
    });

    await fs.writeFile(
      'ml_data/edge_cases.json',
      JSON.stringify(edgeCases.data, null, 2)
    );

    return edgeCases;
  }

  async run() {
    await fs.mkdir('ml_data', { recursive: true });

    const [training, test, edges] = await Promise.all([
      this.generateTrainingData(5000),
      this.generateTestData(1000),
      this.generateEdgeCases(100)
    ]);

    console.log('\n📊 ML Dataset Generation Complete:');
    console.log(`   Training: ${training.data.length} samples`);
    console.log(`   Test: ${test.data.length} samples`);
    console.log(`   Edge Cases: ${edges.data.length} samples`);
    console.log(`   Total generation time: ${
      training.metadata.generationTime +
      test.metadata.generationTime +
      edges.metadata.generationTime
    }ms`);

    // Get cache statistics
    const cacheStats = this.synth.cache.getStats();
    console.log(`   Cache hit rate: ${(cacheStats.hitRate * 100).toFixed(1)}%`);
  }
}

// Run the pipeline
const pipeline = new MLTrainingPipeline();
await pipeline.run();
```

### 2.2 RAG System Data Generation

Generate Q&A pairs for Retrieval-Augmented Generation:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';
import { RuvectorClient } from 'ruvector'; // Optional: vector DB integration

class RAGDataGenerator {
  private synth: AgenticSynth;
  private vectorDB?: RuvectorClient;

  constructor(useVectorDB = true) {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      vectorDB: useVectorDB
    });

    if (useVectorDB) {
      this.vectorDB = new RuvectorClient({
        url: process.env.RUVECTOR_URL || 'http://localhost:3000'
      });
    }
  }

  async generateCustomerSupportData() {
    // Generate diverse Q&A pairs
    const qaData = await this.synth.generateStructured({
      count: 1000,
      schema: {
        question_id: 'UUID',
        category: 'billing | technical | shipping | returns | account | product_info',
        question: 'realistic customer support question',
        answer: 'detailed, helpful answer (2-4 sentences)',
        keywords: ['array of 3-5 relevant keywords'],
        difficulty: 'easy | medium | hard',
        sentiment: 'neutral | frustrated | confused | satisfied',
        related_questions: ['array of 2-3 related question texts'],
        metadata: {
          estimated_resolution_time: 'number (1-30 minutes)',
          requires_escalation: 'boolean',
          product_category: 'electronics | clothing | home | books | other',
          language: 'en'
        }
      },
      constraints: [
        'Questions should be natural, conversational',
        'Answers should be accurate and empathetic',
        'Billing questions often frustrated sentiment',
        'Technical questions higher difficulty',
        'Include typos and informal language in 10% of questions',
        'Related questions share category and keywords',
        'Hard questions more likely to require escalation'
      ]
    });

    console.log(`Generated ${qaData.data.length} Q&A pairs`);
    return qaData;
  }

  async generateEmbeddingsAndStore(qaData: any) {
    if (!this.vectorDB) {
      console.log('Vector DB not enabled, skipping embedding storage');
      return;
    }

    console.log('Generating embeddings and storing in vector DB...');

    // Batch process for efficiency
    const batchSize = 50;
    for (let i = 0; i < qaData.data.length; i += batchSize) {
      const batch = qaData.data.slice(i, i + batchSize);

      // Generate embeddings using Gemini's embedding model
      const embeddings = await Promise.all(
        batch.map(async (qa: any) => {
          const text = `${qa.question} ${qa.answer}`;
          // Use Gemini embedding API
          // const embedding = await generateEmbedding(text);
          return {
            id: qa.question_id,
            text,
            metadata: qa
          };
        })
      );

      // Store in vector database
      // await this.vectorDB.insert(embeddings);

      console.log(`Processed batch ${i / batchSize + 1}/${Math.ceil(qaData.data.length / batchSize)}`);
    }

    console.log('✅ All embeddings stored in vector DB');
  }

  async run() {
    const qaData = await this.generateCustomerSupportData();
    await this.generateEmbeddingsAndStore(qaData);

    console.log('\n📚 RAG Data Generation Complete:');
    console.log(`   Q&A Pairs: ${qaData.data.length}`);
    console.log(`   Categories: ${new Set(qaData.data.map((d: any) => d.category)).size}`);
    console.log(`   Generation time: ${qaData.metadata.generationTime}ms`);
  }
}

// Run the generator
const ragGen = new RAGDataGenerator(true);
await ragGen.run();
```

### 2.3 Database Seeding

Seed development/staging databases with realistic test data:

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';
import { Pool } from 'pg'; // PostgreSQL example

class DatabaseSeeder {
  private synth: AgenticSynth;
  private db: Pool;

  constructor() {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      maxCacheSize: 5000
    });

    this.db = new Pool({
      host: process.env.DB_HOST,
      database: process.env.DB_NAME,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD
    });
  }

  async seedUsers(count = 1000) {
    console.log(`Seeding ${count} users...`);

    const users = await this.synth.generateStructured({
      count,
      schema: {
        email: 'valid unique email',
        username: 'unique username (5-20 chars)',
        first_name: 'first name',
        last_name: 'last name',
        password_hash: 'bcrypt hash',
        phone: 'phone number or null',
        avatar_url: 'profile image URL or null',
        bio: 'user bio (1-2 sentences) or null',
        birth_date: 'ISO date (age 18-75)',
        country: 'ISO country code',
        created_at: 'ISO timestamp (last 2 years)',
        last_login_at: 'ISO timestamp (recent)',
        email_verified: 'boolean (90% true)',
        account_status: 'active | suspended | deleted'
      }
    });

    // Bulk insert
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      for (const user of users.data) {
        await client.query(
          `INSERT INTO users (email, username, first_name, last_name, password_hash,
           phone, avatar_url, bio, birth_date, country, created_at, last_login_at,
           email_verified, account_status)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)`,
          [user.email, user.username, user.first_name, user.last_name,
           user.password_hash, user.phone, user.avatar_url, user.bio,
           user.birth_date, user.country, user.created_at, user.last_login_at,
           user.email_verified, user.account_status]
        );
      }

      await client.query('COMMIT');
      console.log(`✅ Seeded ${count} users`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return users.data;
  }

  async seedOrders(users: any[], ordersPerUser = 5) {
    console.log(`Seeding orders for ${users.length} users...`);

    const totalOrders = users.length * ordersPerUser;
    const orders = await this.synth.generateStructured({
      count: totalOrders,
      schema: {
        user_id: 'UUID (from users array)',
        order_number: 'unique order number',
        status: 'pending | processing | shipped | delivered | cancelled',
        total_amount: 'number (10-1000)',
        currency: 'USD',
        payment_method: 'credit_card | paypal | apple_pay',
        shipping_address: 'full address',
        order_date: 'ISO timestamp (after user.created_at)',
        shipped_date: 'ISO timestamp or null',
        delivered_date: 'ISO timestamp or null'
      }
    });

    // Bulk insert orders
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      for (const order of orders.data) {
        await client.query(
          `INSERT INTO orders (user_id, order_number, status, total_amount,
           currency, payment_method, shipping_address, order_date, shipped_date,
           delivered_date)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)`,
          [order.user_id, order.order_number, order.status, order.total_amount,
           order.currency, order.payment_method, order.shipping_address,
           order.order_date, order.shipped_date, order.delivered_date]
        );
      }

      await client.query('COMMIT');
      console.log(`✅ Seeded ${totalOrders} orders`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return orders.data;
  }

  async seedProducts(count = 500) {
    console.log(`Seeding ${count} products...`);

    const products = await this.synth.generateStructured({
      count,
      schema: {
        sku: 'unique SKU code',
        name: 'product name',
        description: 'product description (2-3 sentences)',
        category: 'electronics | clothing | home | books | food | sports',
        price: 'number (5-500)',
        stock_quantity: 'number (0-1000)',
        weight_kg: 'number (0.1-50)',
        dimensions: '{ length, width, height in cm }',
        manufacturer: 'company name',
        rating: 'number (1-5, one decimal)',
        num_reviews: 'number (0-5000)',
        images: ['array of 1-5 image URLs'],
        tags: ['array of 3-7 product tags'],
        created_at: 'ISO timestamp'
      }
    });

    // Bulk insert products
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      for (const product of products.data) {
        await client.query(
          `INSERT INTO products (sku, name, description, category, price,
           stock_quantity, weight_kg, dimensions, manufacturer, rating,
           num_reviews, images, tags, created_at)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)`,
          [product.sku, product.name, product.description, product.category,
           product.price, product.stock_quantity, product.weight_kg,
           JSON.stringify(product.dimensions), product.manufacturer,
           product.rating, product.num_reviews, JSON.stringify(product.images),
           JSON.stringify(product.tags), product.created_at]
        );
      }

      await client.query('COMMIT');
      console.log(`✅ Seeded ${count} products`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return products.data;
  }

  async run() {
    console.log('🌱 Starting database seeding...\n');

    const users = await this.seedUsers(1000);
    await this.seedProducts(500);
    await this.seedOrders(users, 5);

    await this.db.end();

    console.log('\n✅ Database seeding complete!');
    console.log('   Users: 1000');
    console.log('   Products: 500');
    console.log('   Orders: 5000');
  }
}

// Run the seeder
const seeder = new DatabaseSeeder();
await seeder.run();
```

---

## 3. Performance Optimization

### 3.1 Maximize Cache Hit Rate

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

// Configure for optimal caching
const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory',
  cacheTTL: 7200, // 2 hours
  maxCacheSize: 10000 // Large cache
});

// Use stable options for better cache hits
const baseOptions = {
  count: 100,
  schema: {
    user_id: 'UUID',
    name: 'full name',
    email: 'valid email'
  }
};

// First call - cache miss
const result1 = await synth.generateStructured(baseOptions);
console.log('Cached:', result1.metadata.cached); // false

// Second call with identical options - cache hit!
const result2 = await synth.generateStructured(baseOptions);
console.log('Cached:', result2.metadata.cached); // true

// Check cache statistics
const stats = synth.cache.getStats();
console.log('Cache hit rate:', (stats.hitRate * 100).toFixed(1) + '%');
console.log('Cache size:', stats.size);
```

### 3.2 Batch Processing for High Throughput

```typescript
// Generate 10 different datasets in parallel
const batchOptions = [
  { count: 100, schema: { id: 'UUID', value: 'number' } },
  { count: 200, schema: { id: 'UUID', name: 'string' } },
  { count: 150, schema: { id: 'UUID', email: 'email' } },
  // ... 7 more options
];

// Process with concurrency control
const results = await synth.generateBatch(
  'structured',
  batchOptions,
  5 // 5 concurrent requests
);

console.log(`Generated ${results.length} datasets in parallel`);
console.log('Total records:', results.reduce((sum, r) => sum + r.data.length, 0));
```

### 3.3 Streaming for Large Datasets

```typescript
// Stream 1 million records without loading all into memory
console.log('Streaming 1M records...');

let count = 0;
for await (const record of synth.generateStream('structured', {
  count: 1_000_000,
  schema: {
    id: 'UUID',
    timestamp: 'ISO timestamp',
    value: 'number'
  }
})) {
  // Process record immediately (e.g., write to file/DB)
  await processRecord(record);

  count++;
  if (count % 10000 === 0) {
    console.log(`Processed ${count.toLocaleString()} records...`);
  }
}

console.log('Streaming complete!');
```

### 3.4 Model Selection for Cost Optimization

```typescript
// Use cheaper models for simple data
const simpleData = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp', // Fast and cheap
  apiKey: process.env.GEMINI_API_KEY
});

// Use more powerful models for complex data
const complexData = new AgenticSynth({
  provider: 'openrouter',
  model: 'anthropic/claude-3.5-sonnet', // More capable
  apiKey: process.env.OPENROUTER_API_KEY
});
```

---

## 4. Production Deployment

### 4.1 Environment Configuration

```typescript
// config/production.ts
export const productionConfig = {
  provider: process.env.SYNTH_PROVIDER as 'gemini' | 'openrouter',
  apiKey: process.env.SYNTH_API_KEY!,
  model: process.env.SYNTH_MODEL,
  cacheStrategy: 'memory' as const,
  cacheTTL: parseInt(process.env.CACHE_TTL || '3600'),
  maxCacheSize: parseInt(process.env.MAX_CACHE_SIZE || '10000'),
  maxRetries: 3,
  timeout: 30000,
  streaming: process.env.ENABLE_STREAMING === 'true',
  automation: process.env.ENABLE_AUTOMATION === 'true',
  vectorDB: process.env.ENABLE_VECTOR_DB === 'true'
};

// Validation
if (!productionConfig.apiKey) {
  throw new Error('SYNTH_API_KEY environment variable is required');
}
```

### 4.2 Error Handling & Retry Logic

```typescript
import { AgenticSynth, APIError, ValidationError } from '@ruvector/agentic-synth';

async function generateWithRetry(
  synth: AgenticSynth,
  options: any,
  maxRetries = 3
) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await synth.generateStructured(options);
      return result;
    } catch (error) {
      if (error instanceof ValidationError) {
        // Don't retry validation errors
        console.error('Validation failed:', error.message);
        throw error;
      }

      if (error instanceof APIError) {
        if (error.statusCode === 429) {
          // Rate limit - exponential backoff
          const delay = Math.pow(2, attempt) * 1000;
          console.log(`Rate limited. Retrying in ${delay}ms...`);
          await new Promise(resolve => setTimeout(resolve, delay));
          continue;
        }

        if (error.statusCode >= 500) {
          // Server error - retry
          console.log(`Server error (${error.statusCode}). Retry ${attempt}/${maxRetries}`);
          continue;
        }
      }

      // Unknown error or non-retryable
      throw error;
    }
  }

  throw new Error('Max retries exceeded');
}
```

### 4.3 Monitoring & Metrics

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';
import { performance } from 'perf_hooks';

class MonitoredSynth {
  private synth: AgenticSynth;
  private metrics = {
    totalRequests: 0,
    successfulRequests: 0,
    failedRequests: 0,
    totalLatency: 0,
    cacheHits: 0,
    cacheMisses: 0
  };

  constructor(config: any) {
    this.synth = new AgenticSynth(config);
  }

  async generate(type: string, options: any) {
    const start = performance.now();
    this.metrics.totalRequests++;

    try {
      const result = await this.synth.generate(type as any, options);

      this.metrics.successfulRequests++;
      this.metrics.totalLatency += performance.now() - start;

      if (result.metadata.cached) {
        this.metrics.cacheHits++;
      } else {
        this.metrics.cacheMisses++;
      }

      return result;
    } catch (error) {
      this.metrics.failedRequests++;
      throw error;
    }
  }

  getMetrics() {
    const avgLatency = this.metrics.totalLatency / this.metrics.successfulRequests;
    const successRate = this.metrics.successfulRequests / this.metrics.totalRequests;
    const cacheHitRate = this.metrics.cacheHits / (this.metrics.cacheHits + this.metrics.cacheMisses);

    return {
      ...this.metrics,
      avgLatency: Math.round(avgLatency),
      successRate: (successRate * 100).toFixed(2) + '%',
      cacheHitRate: (cacheHitRate * 100).toFixed(2) + '%'
    };
  }
}

// Usage
const monitored = new MonitoredSynth(productionConfig);

// Generate data
await monitored.generate('structured', { count: 100, schema: { id: 'UUID' } });

// Log metrics periodically
setInterval(() => {
  console.log('Synth Metrics:', monitored.getMetrics());
}, 60000); // Every minute
```

### 4.4 Rate Limiting

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';
import { RateLimiter } from 'limiter';

class RateLimitedSynth {
  private synth: AgenticSynth;
  private limiter: RateLimiter;

  constructor(config: any, requestsPerMinute = 60) {
    this.synth = new AgenticSynth(config);
    this.limiter = new RateLimiter({
      tokensPerInterval: requestsPerMinute,
      interval: 'minute'
    });
  }

  async generate(type: string, options: any) {
    // Wait for rate limit token
    await this.limiter.removeTokens(1);

    // Proceed with generation
    return await this.synth.generate(type as any, options);
  }
}

// Usage: Limit to 60 requests per minute
const limited = new RateLimitedSynth(productionConfig, 60);
```

---

## 5. Error Handling & Monitoring

### 5.1 Comprehensive Error Handling

```typescript
import { AgenticSynth, SynthError, ValidationError, APIError, CacheError } from '@ruvector/agentic-synth';

async function robustGeneration(options: any) {
  const synth = new AgenticSynth({
    provider: 'gemini',
    apiKey: process.env.GEMINI_API_KEY
  });

  try {
    const result = await synth.generateStructured(options);
    return result;
  } catch (error) {
    if (error instanceof ValidationError) {
      console.error('❌ Invalid options:', error.message);
      console.error('   Validation errors:', error.validationErrors);
      // Fix options and retry
      return null;
    }

    if (error instanceof APIError) {
      console.error('❌ API error:', error.message);
      console.error('   Status:', error.statusCode);
      console.error('   Provider:', error.provider);

      if (error.statusCode === 401) {
        console.error('   Check API key configuration');
      } else if (error.statusCode === 429) {
        console.error('   Rate limit exceeded - implement backoff');
      } else if (error.statusCode >= 500) {
        console.error('   Provider service error - retry later');
      }

      return null;
    }

    if (error instanceof CacheError) {
      console.error('❌ Cache error:', error.message);
      // Cache errors are non-fatal - continue without cache
      synth.config.cacheStrategy = undefined;
      return await synth.generateStructured(options);
    }

    if (error instanceof SynthError) {
      console.error('❌ Synth error:', error.message);
      return null;
    }

    // Unknown error
    console.error('❌ Unexpected error:', error);
    throw error;
  }
}
```

### 5.2 Health Checks

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

class HealthChecker {
  private synth: AgenticSynth;

  constructor(config: any) {
    this.synth = new AgenticSynth(config);
  }

  async checkHealth() {
    const health = {
      status: 'healthy',
      timestamp: new Date().toISOString(),
      checks: {
        apiConnection: false,
        cacheWorking: false,
        generationWorking: false
      },
      metrics: {}
    };

    try {
      // Test API connection with minimal request
      const testResult = await this.synth.generateStructured({
        count: 1,
        schema: { test: 'string' }
      });

      health.checks.apiConnection = true;
      health.checks.generationWorking = true;
      health.checks.cacheWorking = testResult.metadata.cached === false;

      // Get cache stats
      const cacheStats = this.synth.cache.getStats();
      health.metrics = {
        cacheSize: cacheStats.size,
        cacheHitRate: (cacheStats.hitRate * 100).toFixed(2) + '%',
        generationTime: testResult.metadata.generationTime + 'ms'
      };

    } catch (error) {
      health.status = 'unhealthy';
      health.checks.apiConnection = false;
    }

    return health;
  }
}

// Express.js health endpoint
app.get('/health', async (req, res) => {
  const checker = new HealthChecker(productionConfig);
  const health = await checker.checkHealth();

  const statusCode = health.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(health);
});
```

---

## 6. Advanced Patterns

### 6.1 Multi-Provider Fallback

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

class MultiProviderSynth {
  private providers: AgenticSynth[];

  constructor() {
    this.providers = [
      new AgenticSynth({
        provider: 'gemini',
        apiKey: process.env.GEMINI_API_KEY,
        model: 'gemini-2.0-flash-exp'
      }),
      new AgenticSynth({
        provider: 'openrouter',
        apiKey: process.env.OPENROUTER_API_KEY,
        model: 'anthropic/claude-3.5-sonnet'
      })
    ];
  }

  async generateWithFallback(type: string, options: any) {
    for (let i = 0; i < this.providers.length; i++) {
      try {
        console.log(`Attempting provider ${i + 1}/${this.providers.length}...`);
        const result = await this.providers[i].generate(type as any, options);
        console.log(`✅ Success with provider ${i + 1}`);
        return result;
      } catch (error) {
        console.log(`❌ Provider ${i + 1} failed:`, error.message);
        if (i === this.providers.length - 1) {
          throw new Error('All providers failed');
        }
      }
    }
  }
}
```

### 6.2 Conditional Generation Logic

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

async function generateCustomerProfiles(synth: AgenticSynth, count: number) {
  // First, generate base profiles
  const profiles = await synth.generateStructured({
    count,
    schema: {
      customer_id: 'UUID',
      customer_type: 'individual | business',
      name: 'full name',
      email: 'valid email'
    }
  });

  // Then, conditionally generate additional data based on type
  for (const profile of profiles.data) {
    if (profile.customer_type === 'business') {
      // Generate business-specific data
      const businessData = await synth.generateStructured({
        count: 1,
        schema: {
          company_name: 'company name',
          tax_id: 'EIN',
          employees: 'number (1-10000)',
          annual_revenue: 'number (10000-10000000)',
          industry: 'industry type'
        }
      });

      Object.assign(profile, businessData.data[0]);
    } else {
      // Generate individual-specific data
      const individualData = await synth.generateStructured({
        count: 1,
        schema: {
          age: 'number (18-80)',
          occupation: 'job title',
          income: 'number (20000-200000)',
          marital_status: 'single | married | divorced'
        }
      });

      Object.assign(profile, individualData.data[0]);
    }
  }

  return profiles;
}
```

### 6.3 Progressive Enhancement

```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

async function progressiveDataGeneration(synth: AgenticSynth) {
  console.log('Phase 1: Basic data generation...');
  let data = await synth.generateStructured({
    count: 100,
    schema: {
      id: 'UUID',
      name: 'full name',
      email: 'valid email'
    }
  });

  console.log('Phase 2: Adding relationships...');
  data = await synth.generateStructured({
    count: 100,
    schema: {
      ...data.data[0], // Existing fields
      friends: ['array of 3-10 UUIDs from existing data'],
      groups: ['array of 1-3 group names']
    }
  });

  console.log('Phase 3: Adding behavioral data...');
  data = await synth.generateStructured({
    count: 100,
    schema: {
      ...data.data[0], // Existing fields
      last_login: 'ISO timestamp',
      total_purchases: 'number (0-100)',
      avg_order_value: 'number (10-500)',
      loyalty_tier: 'bronze | silver | gold | platinum'
    }
  });

  return data;
}
```

---

## 7. Best Practices Summary

### ✅ Do's

1. **Enable caching** for repeated or similar requests
2. **Use batch operations** for multiple datasets
3. **Stream large datasets** to avoid memory issues
4. **Implement retry logic** with exponential backoff
5. **Monitor cache hit rates** and adjust TTL accordingly
6. **Validate options** before generation
7. **Use environment variables** for sensitive config
8. **Implement health checks** in production
9. **Log errors comprehensively** with context
10. **Test with realistic schemas** before production

### ❌ Don'ts

1. **Don't hardcode API keys** in source code
2. **Don't generate without caching** in production
3. **Don't ignore validation errors** - fix schemas
4. **Don't load massive datasets** into memory
5. **Don't skip error handling** on generation calls
6. **Don't use inappropriate models** for task complexity
7. **Don't disable retries** unless intentional
8. **Don't forget to monitor** metrics in production
9. **Don't generate unconstrained** data without schema
10. **Don't skip testing** with edge cases

---

## 8. Performance Tips

1. **Cache Configuration**: Larger TTL (1-2 hours) for stable schemas
2. **Batch Size**: 3-5 concurrent requests for optimal throughput
3. **Model Selection**: Use `gemini-2.0-flash-exp` for speed
4. **Streaming**: Use for >10K records to reduce memory
5. **Connection Pooling**: Reuse AgenticSynth instances
6. **Rate Limiting**: Implement to avoid 429 errors
7. **Schema Simplicity**: Simpler schemas = faster generation
8. **Constraint Clarity**: Clear constraints improve accuracy
9. **Error Recovery**: Implement fallback chains
10. **Monitoring**: Track P95/P99 latencies

---

## 9. Troubleshooting

### Issue: Low Cache Hit Rate

**Solution**: Use stable, deterministic options
```typescript
// ❌ Bad: timestamp makes every request unique
const options = { count: 100, timestamp: Date.now() };

// ✅ Good: stable options enable caching
const options = { count: 100, schema: { id: 'UUID' } };
```

### Issue: High Latency

**Solution**:
1. Enable caching
2. Use faster model (gemini-2.0-flash-exp)
3. Reduce complexity of schema
4. Batch similar requests

### Issue: Memory Errors

**Solution**: Use streaming for large datasets
```typescript
// ❌ Bad: load all into memory
const result = await synth.generateStructured({ count: 1000000 });

// ✅ Good: stream records
for await (const record of synth.generateStream('structured', { count: 1000000 })) {
  processRecord(record);
}
```

### Issue: Rate Limiting (429)

**Solution**: Implement exponential backoff
```typescript
async function generateWithBackoff(synth, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await synth.generateStructured(options);
    } catch (error) {
      if (error.statusCode === 429) {
        const delay = Math.pow(2, i) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```

---

## 10. Additional Resources

- **API Reference**: See `docs/API.md`
- **Performance Guide**: See `docs/PERFORMANCE.md`
- **Benchmarks**: See `PERFORMANCE_REPORT.md`
- **Examples**: See `examples/` directory
- **GitHub**: https://github.com/ruvnet/ruvector

---

**Last Updated**: 2025-11-22
**Package Version**: 0.1.0
**Status**: Production Ready ✅