🎓 Agentic-Synth Advanced Usage Guide

Version: 0.1.0
Last Updated: 2025-11-22

1. Advanced Data Generation
2. Real-World Integration Examples
3. Performance Optimization
4. Production Deployment
5. Error Handling & Monitoring
6. Advanced Patterns
7. Best Practices Summary

1. Advanced Data Generation

1.1 Complex Time-Series with Custom Patterns

Generate realistic stock market data with multiple indicators:

import { AgenticSynth } from '@ruvector/agentic-synth';

const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory',
  cacheTTL: 3600
});

// Generate 1 year of stock market data
const stockData = await synth.generateTimeSeries({
  count: 365,
  startDate: '2024-01-01',
  interval: '1d',
  schema: {
    date: 'ISO date',
    open: 'number (100-200)',
    high: 'number (105-210)',
    low: 'number (95-195)',
    close: 'number (100-200)',
    volume: 'number (1000000-10000000)',
    // Technical indicators
    sma_20: 'number (calculated 20-day moving average)',
    rsi_14: 'number (0-100, RSI indicator)',
    macd: 'number (-5 to 5)',
    bollinger_upper: 'number',
    bollinger_lower: 'number'
  },
  constraints: [
    'high must be >= open and close',
    'low must be <= open and close',
    'close of previous day influences next day open (± 5%)',
    'volume increases on large price changes',
    'RSI correlates with price momentum',
    'SMA_20 follows price trend'
  ]
});

console.log(`Generated ${stockData.data.length} days of stock data`);
// metadata.cached describes this one call: true if it was served from cache
console.log('Served from cache:', stockData.metadata.cached);
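The constraints above are instructions to the model, not guarantees, so it can be worth checking the generated rows before using them. The following is a minimal sketch of such a check for the two price constraints; `OhlcRow` and `findOhlcViolations` are hypothetical names introduced here and are not part of the library.

```typescript
// Hypothetical row shape matching the price fields in the schema above.
interface OhlcRow {
  date: string;
  open: number;
  high: number;
  low: number;
  close: number;
}

// Returns one message for every price constraint a row breaks.
function findOhlcViolations(rows: OhlcRow[]): string[] {
  const violations: string[] = [];
  for (const row of rows) {
    if (row.high < Math.max(row.open, row.close)) {
      violations.push(`${row.date}: high ${row.high} is below open/close`);
    }
    if (row.low > Math.min(row.open, row.close)) {
      violations.push(`${row.date}: low ${row.low} is above open/close`);
    }
  }
  return violations;
}

const sampleRows: OhlcRow[] = [
  { date: '2024-01-01', open: 100, high: 105, low: 98, close: 103 },
  { date: '2024-01-02', open: 103, high: 102, low: 99, close: 101 } // high < open
];

console.log(findOhlcViolations(sampleRows));
```

With real output you would pass `stockData.data` instead of `sampleRows`, assuming each record carries numeric price fields.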

1.2 Multi-User Event Simulation

Simulate realistic user behavior across 100 users:

const userBehaviorSim = await synth.generateEvents({
  count: 10000,
  timeRange: {
    start: '2024-01-01T00:00:00Z',
    end: '2024-01-31T23:59:59Z'
  },
  eventTypes: [
    'page_view',
    'click',
    'form_submit',
    'purchase',
    'logout',
    'search',
    'add_to_cart'
  ],
  schema: {
    event_id: 'UUID',
    user_id: 'UUID (one of 100 consistent users)',
    event_type: 'one of eventTypes',
    timestamp: 'ISO timestamp within timeRange',
    session_id: 'UUID (consistent per user session)',
    page_url: 'URL path',
    metadata: {
      device: 'mobile | desktop | tablet',
      browser: 'chrome | firefox | safari | edge',
      country: 'ISO country code',
      referrer: 'URL or null',
      duration_ms: 'number (100-30000 for page_view)',
      cart_value: 'number (only for add_to_cart/purchase)',
      search_query: 'string (only for search events)'
    }
  },
  constraints: [
    'Users follow realistic session patterns (20-40 events per session)',
    'Purchase events must be preceded by add_to_cart',
    'Events follow temporal ordering per user',
    'Session gaps between 30min-24hours',
    'Time distribution follows Poisson with peak hours 10am-4pm',
    'Mobile users more common in evening hours',
    'Purchase conversion rate ~2-3%'
  ]
});

console.log('Event simulation complete');
console.log('Total events:', userBehaviorSim.data.length);
console.log('Unique users:', new Set(userBehaviorSim.data.map(e => e.user_id)).size);
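A constraint such as "Purchase events must be preceded by add_to_cart" can be verified after generation. The sketch below assumes events carry the `user_id`, `event_type`, and `timestamp` fields from the schema above; `SimEvent` and `findOrphanPurchases` are illustrative names, not library exports.

```typescript
// Hypothetical subset of the event fields needed for the check.
interface SimEvent {
  user_id: string;
  event_type: string;
  timestamp: string; // ISO 8601
}

// Returns purchase events that have no earlier add_to_cart from the same user.
function findOrphanPurchases(events: SimEvent[]): SimEvent[] {
  // Sort a copy chronologically so the input array is left untouched.
  const ordered = [...events].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
  );
  const usersWithCart = new Set<string>();
  const orphans: SimEvent[] = [];
  for (const event of ordered) {
    if (event.event_type === 'add_to_cart') {
      usersWithCart.add(event.user_id);
    } else if (event.event_type === 'purchase' && !usersWithCart.has(event.user_id)) {
      orphans.push(event);
    }
  }
  return orphans;
}

const sampleEvents: SimEvent[] = [
  { user_id: 'u1', event_type: 'purchase', timestamp: '2024-01-02T10:05:00Z' },
  { user_id: 'u1', event_type: 'add_to_cart', timestamp: '2024-01-02T10:00:00Z' },
  { user_id: 'u2', event_type: 'purchase', timestamp: '2024-01-03T12:00:00Z' }
];

console.log(findOrphanPurchases(sampleEvents).map(e => e.user_id));
```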

1.3 Nested Schema Generation

Generate complex e-commerce orders with line items:

const orders = await synth.generateStructured({
  count: 1000,
  schema: {
    order_id: 'UUID',
    customer: {
      customer_id: 'UUID',
      email: 'valid email',
      name: 'full name',
      phone: 'phone number with country code',
      address: {
        street: 'street address',
        city: 'city name',
        state: 'US state code',
        zip: 'US ZIP code',
        country: 'USA'
      }
    },
    items: [
      {
        sku: 'product SKU code',
        name: 'product name',
        category: 'electronics | clothing | home | books | food',
        quantity: 'number (1-10)',
        unit_price: 'number (5-500)',
        discount_percent: 'number (0-30)',
        subtotal: 'calculated: quantity * unit_price * (1 - discount_percent/100)'
      }
    ],
    payment: {
      method: 'credit_card | paypal | apple_pay | google_pay',
      status: 'pending | completed | failed | refunded',
      transaction_id: 'UUID',
      amount: 'sum of all item subtotals'
    },
    shipping: {
      method: 'standard | express | overnight',
      cost: 'number (0-50, based on method)',
      tracking_number: 'tracking code',
      estimated_delivery: 'ISO date (3-10 days from order date)'
    },
    order_date: 'ISO timestamp',
    status: 'pending | processing | shipped | delivered | cancelled',
    total_amount: 'payment.amount + shipping.cost',
    notes: 'optional customer notes or null'
  },
  constraints: [
    'Each order has 1-8 line items',
    'Total amount must equal sum of items + shipping',
    'Status progression: pending → processing → shipped → delivered',
    'Express shipping costs 2x standard',
    'Overnight shipping costs 4x standard',
    'Electronics have 10-20% discount',
    'Credit card most common payment (60%)',
    'Standard shipping most common (70%)'
  ]
});

console.log(`Generated ${orders.data.length} complex orders`);
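Derived fields such as `subtotal` and `total_amount` are described to the model in words, so the arithmetic may drift. A small sketch that recomputes the total from the line items, assuming numeric fields as in the schema above; `OrderLike`, `expectedTotal`, and `totalMatches` are hypothetical helpers.

```typescript
// Hypothetical subset of the order fields needed to recompute the total.
interface LineItem {
  quantity: number;
  unit_price: number;
  discount_percent: number;
}

interface OrderLike {
  items: LineItem[];
  shipping: { cost: number };
  total_amount: number;
}

const roundCents = (n: number): number => Math.round(n * 100) / 100;

// Sum of discounted line items plus shipping, rounded to cents.
function expectedTotal(order: OrderLike): number {
  const itemsTotal = order.items.reduce(
    (sum, item) =>
      sum + item.quantity * item.unit_price * (1 - item.discount_percent / 100),
    0
  );
  return roundCents(itemsTotal + order.shipping.cost);
}

// True when the stated total is within the tolerance of the recomputed one.
function totalMatches(order: OrderLike, toleranceCents = 1): boolean {
  const diffCents = Math.abs(expectedTotal(order) - order.total_amount) * 100;
  return diffCents <= toleranceCents + 1e-6;
}

const sampleOrder: OrderLike = {
  items: [
    { quantity: 2, unit_price: 50, discount_percent: 10 },
    { quantity: 1, unit_price: 20, discount_percent: 0 }
  ],
  shipping: { cost: 5 },
  total_amount: 115
};

console.log(expectedTotal(sampleOrder), totalMatches(sampleOrder));
```

Orders that fail the check can be corrected in place by overwriting `total_amount` with the recomputed value, which is usually cheaper than regenerating them.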

2. Real-World Integration Examples

2.1 ML Training Pipeline

Complete machine learning data generation pipeline:

import { AgenticSynth } from '@ruvector/agentic-synth';
import * as fs from 'fs/promises';

class MLTrainingPipeline {
  private synth: AgenticSynth;

  constructor() {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      maxCacheSize: 10000 // Large cache for ML datasets
    });
  }

  async generateTrainingData(samples = 5000) {
    console.log(`Generating ${samples} training samples...`);

    const training = await this.synth.generateStructured({
      count: samples,
      schema: {
        // Feature engineering
        customer_age: 'number (18-80)',
        annual_income: 'number (20000-200000)',
        credit_score: 'number (300-850)',
        account_tenure_months: 'number (1-360)',
        num_products: 'number (1-5)',
        balance: 'number (0-250000)',
        num_transactions_12m: 'number (0-200)',
        avg_transaction_amount: 'number (10-5000)',
        credit_utilization: 'number (0-100)',
        num_late_payments: 'number (0-10)',

        // Target variable
        churn: 'boolean (based on features: higher likelihood if credit_score<600, num_late_payments>3, balance<1000, credit_utilization>80)'
      },
      constraints: [
        'Credit utilization correlates negatively with credit score',
        'Higher income correlates with higher balance',
        'Churn rate should be ~15-20% (imbalanced dataset)',
        'Customers with 1 product more likely to churn',
        'Tenure > 24 months reduces churn likelihood'
      ]
    });

    await fs.writeFile(
      'ml_data/training.json',
      JSON.stringify(training.data, null, 2)
    );

    return training;
  }

  async generateTestData(samples = 1000) {
    console.log(`Generating ${samples} test samples...`);

    // Reuse the training schema and constraints; a separate call produces different samples
    const test = await this.synth.generateStructured({
      count: samples,
      schema: { /* same as training */ },
      constraints: [ /* same as training */ ]
    });

    await fs.writeFile(
      'ml_data/test.json',
      JSON.stringify(test.data, null, 2)
    );

    return test;
  }

  async generateEdgeCases(samples = 100) {
    console.log(`Generating ${samples} edge case samples...`);

    const edgeCases = await this.synth.generateStructured({
      count: samples,
      schema: { /* same schema */ },
      constraints: [
        'Include extreme values: age 18-22 or 75-80',
        'Include very high credit_utilization (90-100)',
        'Include very low credit_score (300-400)',
        'Include zero balance accounts',
        'Include customers with num_products = 5'
      ]
    });

    await fs.writeFile(
      'ml_data/edge_cases.json',
      JSON.stringify(edgeCases.data, null, 2)
    );

    return edgeCases;
  }

  async run() {
    await fs.mkdir('ml_data', { recursive: true });

    const [training, test, edges] = await Promise.all([
      this.generateTrainingData(5000),
      this.generateTestData(1000),
      this.generateEdgeCases(100)
    ]);

    console.log('\n📊 ML Dataset Generation Complete:');
    console.log(`   Training: ${training.data.length} samples`);
    console.log(`   Test: ${test.data.length} samples`);
    console.log(`   Edge Cases: ${edges.data.length} samples`);
    console.log(`   Cumulative generation time (summed across parallel calls): ${
      training.metadata.generationTime +
      test.metadata.generationTime +
      edges.metadata.generationTime
    }ms`);

    // Get cache statistics
    const cacheStats = this.synth.cache.getStats();
    console.log(`   Cache hit rate: ${(cacheStats.hitRate * 100).toFixed(1)}%`);
  }
}

// Run the pipeline
const pipeline = new MLTrainingPipeline();
await pipeline.run();
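The `/* same as training */` placeholders in the pipeline above have to be filled in before the test and edge-case calls do anything useful. One way to do that, sketched here, is to define the schema and default constraints once and build each options object from them. `churnSchema`, `churnConstraints`, and `buildChurnOptions` are names introduced for this sketch.

```typescript
// Field descriptions copied from generateTrainingData above.
const churnSchema = {
  customer_age: 'number (18-80)',
  annual_income: 'number (20000-200000)',
  credit_score: 'number (300-850)',
  account_tenure_months: 'number (1-360)',
  num_products: 'number (1-5)',
  balance: 'number (0-250000)',
  num_transactions_12m: 'number (0-200)',
  avg_transaction_amount: 'number (10-5000)',
  credit_utilization: 'number (0-100)',
  num_late_payments: 'number (0-10)',
  churn: 'boolean (based on features: higher likelihood if credit_score<600, num_late_payments>3, balance<1000, credit_utilization>80)'
};

const churnConstraints: string[] = [
  'Credit utilization correlates negatively with credit score',
  'Higher income correlates with higher balance',
  'Churn rate should be ~15-20% (imbalanced dataset)',
  'Customers with 1 product more likely to churn',
  'Tenure > 24 months reduces churn likelihood'
];

// Hypothetical helper: same schema everywhere, constraints overridable per dataset.
function buildChurnOptions(count: number, constraints: string[] = churnConstraints) {
  return { count, schema: churnSchema, constraints };
}

const trainingOptions = buildChurnOptions(5000);
const testOptions = buildChurnOptions(1000);
const edgeCaseOptions = buildChurnOptions(100, [
  'Include extreme values: age 18-22 or 75-80',
  'Include very high credit_utilization (90-100)'
]);

console.log(Object.keys(trainingOptions.schema).length, edgeCaseOptions.constraints.length);
```

Each options object can then be passed to `generateStructured` in place of the inline literal, which keeps the three datasets structurally identical.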

2.2 RAG System Data Generation

Generate Q&A pairs for Retrieval-Augmented Generation:

import { AgenticSynth } from '@ruvector/agentic-synth';
import { RuvectorClient } from 'ruvector'; // Optional: vector DB integration

class RAGDataGenerator {
  private synth: AgenticSynth;
  private vectorDB?: RuvectorClient;

  constructor(useVectorDB = true) {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      vectorDB: useVectorDB
    });

    if (useVectorDB) {
      this.vectorDB = new RuvectorClient({
        url: process.env.RUVECTOR_URL || 'http://localhost:3000'
      });
    }
  }

  async generateCustomerSupportData() {
    // Generate diverse Q&A pairs
    const qaData = await this.synth.generateStructured({
      count: 1000,
      schema: {
        question_id: 'UUID',
        category: 'billing | technical | shipping | returns | account | product_info',
        question: 'realistic customer support question',
        answer: 'detailed, helpful answer (2-4 sentences)',
        keywords: ['array of 3-5 relevant keywords'],
        difficulty: 'easy | medium | hard',
        sentiment: 'neutral | frustrated | confused | satisfied',
        related_questions: ['array of 2-3 related question texts'],
        metadata: {
          estimated_resolution_time: 'number (1-30 minutes)',
          requires_escalation: 'boolean',
          product_category: 'electronics | clothing | home | books | other',
          language: 'en'
        }
      },
      constraints: [
        'Questions should be natural, conversational',
        'Answers should be accurate and empathetic',
        'Billing questions often frustrated sentiment',
        'Technical questions higher difficulty',
        'Include typos and informal language in 10% of questions',
        'Related questions share category and keywords',
        'Hard questions more likely to require escalation'
      ]
    });

    console.log(`Generated ${qaData.data.length} Q&A pairs`);
    return qaData;
  }

  async generateEmbeddingsAndStore(qaData: any) {
    if (!this.vectorDB) {
      console.log('Vector DB not enabled, skipping embedding storage');
      return;
    }

    console.log('Generating embeddings and storing in vector DB...');

    // Batch process for efficiency
    const batchSize = 50;
    for (let i = 0; i < qaData.data.length; i += batchSize) {
      const batch = qaData.data.slice(i, i + batchSize);

      // Build one record per Q&A pair. The embedding call is a placeholder:
      // plug in your embedding provider (e.g. Gemini's embedding API) where indicated.
      const records = await Promise.all(
        batch.map(async (qa: any) => {
          const text = `${qa.question} ${qa.answer}`;
          // const embedding = await generateEmbedding(text);
          return {
            id: qa.question_id,
            text,
            metadata: qa
          };
        })
      );

      // Store in vector database (enable once the records carry embeddings)
      // await this.vectorDB.insert(records);

      console.log(`Prepared batch ${i / batchSize + 1}/${Math.ceil(qaData.data.length / batchSize)}`);
    }

    // Nothing is persisted until the insert call above is enabled
    console.log('✅ All batches prepared for the vector DB');
  }

  async run() {
    const qaData = await this.generateCustomerSupportData();
    await this.generateEmbeddingsAndStore(qaData);

    console.log('\n📚 RAG Data Generation Complete:');
    console.log(`   Q&A Pairs: ${qaData.data.length}`);
    console.log(`   Categories: ${new Set(qaData.data.map((d: any) => d.category)).size}`);
    console.log(`   Generation time: ${qaData.metadata.generationTime}ms`);
  }
}

// Run the generator
const ragGen = new RAGDataGenerator(true);
await ragGen.run();
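Generated Q&A sets often contain near-duplicate questions, which waste embedding calls and skew retrieval. The sketch below drops pairs whose question text is identical after simple normalization. It assumes English text (as in the `language: 'en'` field above); `QaPair`, `normalizeQuestion`, and `dedupeQuestions` are hypothetical names.

```typescript
// Hypothetical subset of the Q&A fields used for deduplication.
interface QaPair {
  question_id: string;
  question: string;
  answer: string;
}

// Lowercase, strip punctuation, and collapse whitespace (ASCII-only on purpose).
function normalizeQuestion(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Keeps the first pair seen for each normalized question.
function dedupeQuestions<T extends QaPair>(pairs: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const pair of pairs) {
    const key = normalizeQuestion(pair.question);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(pair);
    }
  }
  return unique;
}

const samplePairs: QaPair[] = [
  { question_id: 'q1', question: 'How do I reset my password?', answer: 'Use the reset link.' },
  { question_id: 'q2', question: 'how do i  reset my password', answer: 'Click "Forgot password".' },
  { question_id: 'q3', question: 'Where is my order?', answer: 'Check the tracking page.' }
];

console.log(dedupeQuestions(samplePairs).map(p => p.question_id));
```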

2.3 Database Seeding

Seed development/staging databases with realistic test data:

import { AgenticSynth } from '@ruvector/agentic-synth';
import { Pool } from 'pg'; // PostgreSQL example

class DatabaseSeeder {
  private synth: AgenticSynth;
  private db: Pool;

  constructor() {
    this.synth = new AgenticSynth({
      provider: 'gemini',
      apiKey: process.env.GEMINI_API_KEY,
      cacheStrategy: 'memory',
      maxCacheSize: 5000
    });

    this.db = new Pool({
      host: process.env.DB_HOST,
      database: process.env.DB_NAME,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD
    });
  }

  async seedUsers(count = 1000) {
    console.log(`Seeding ${count} users...`);

    const users = await this.synth.generateStructured({
      count,
      schema: {
        email: 'valid unique email',
        username: 'unique username (5-20 chars)',
        first_name: 'first name',
        last_name: 'last name',
        password_hash: 'bcrypt hash',
        phone: 'phone number or null',
        avatar_url: 'profile image URL or null',
        bio: 'user bio (1-2 sentences) or null',
        birth_date: 'ISO date (age 18-75)',
        country: 'ISO country code',
        created_at: 'ISO timestamp (last 2 years)',
        last_login_at: 'ISO timestamp (recent)',
        email_verified: 'boolean (90% true)',
        account_status: 'active | suspended | deleted'
      }
    });

    // Insert row by row inside a single transaction
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      for (const user of users.data) {
        await client.query(
          `INSERT INTO users (email, username, first_name, last_name, password_hash,
           phone, avatar_url, bio, birth_date, country, created_at, last_login_at,
           email_verified, account_status)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)`,
          [user.email, user.username, user.first_name, user.last_name,
           user.password_hash, user.phone, user.avatar_url, user.bio,
           user.birth_date, user.country, user.created_at, user.last_login_at,
           user.email_verified, user.account_status]
        );
      }

      await client.query('COMMIT');
      console.log(`✅ Seeded ${count} users`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return users.data;
  }

  async seedOrders(users: any[], ordersPerUser = 5) {
    console.log(`Seeding orders for ${users.length} users...`);

    const totalOrders = users.length * ordersPerUser;
    const orders = await this.synth.generateStructured({
      count: totalOrders,
      schema: {
        user_id: 'UUID (from users array)',
        order_number: 'unique order number',
        status: 'pending | processing | shipped | delivered | cancelled',
        total_amount: 'number (10-1000)',
        currency: 'USD',
        payment_method: 'credit_card | paypal | apple_pay',
        shipping_address: 'full address',
        order_date: 'ISO timestamp (after user.created_at)',
        shipped_date: 'ISO timestamp or null',
        delivered_date: 'ISO timestamp or null'
      }
    });

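    // Insert orders inside a single transaction
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      // The generator never sees the `users` array, so generated user_id values
      // cannot match real rows. Look up the seeded users by email and assign
      // owners round-robin (assumes users.id is the key orders.user_id references).
      const { rows: userRows } = await client.query(
        'SELECT id FROM users WHERE email = ANY($1)',
        [users.map(u => u.email)]
      );
      if (userRows.length === 0) {
        throw new Error('No seeded users found; run seedUsers first');
      }

      for (const [i, order] of orders.data.entries()) {
        order.user_id = userRows[i % userRows.length].id;
        await client.query(
          `INSERT INTO orders (user_id, order_number, status, total_amount,
           currency, payment_method, shipping_address, order_date, shipped_date,
           delivered_date)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)`,
          [order.user_id, order.order_number, order.status, order.total_amount,
           order.currency, order.payment_method, order.shipping_address,
           order.order_date, order.shipped_date, order.delivered_date]
        );
      }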

      await client.query('COMMIT');
      console.log(`✅ Seeded ${totalOrders} orders`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return orders.data;
  }

  async seedProducts(count = 500) {
    console.log(`Seeding ${count} products...`);

    const products = await this.synth.generateStructured({
      count,
      schema: {
        sku: 'unique SKU code',
        name: 'product name',
        description: 'product description (2-3 sentences)',
        category: 'electronics | clothing | home | books | food | sports',
        price: 'number (5-500)',
        stock_quantity: 'number (0-1000)',
        weight_kg: 'number (0.1-50)',
        dimensions: '{ length, width, height in cm }',
        manufacturer: 'company name',
        rating: 'number (1-5, one decimal)',
        num_reviews: 'number (0-5000)',
        images: ['array of 1-5 image URLs'],
        tags: ['array of 3-7 product tags'],
        created_at: 'ISO timestamp'
      }
    });

    // Insert row by row inside a single transaction
    const client = await this.db.connect();
    try {
      await client.query('BEGIN');

      for (const product of products.data) {
        await client.query(
          `INSERT INTO products (sku, name, description, category, price,
           stock_quantity, weight_kg, dimensions, manufacturer, rating,
           num_reviews, images, tags, created_at)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)`,
          [product.sku, product.name, product.description, product.category,
           product.price, product.stock_quantity, product.weight_kg,
           JSON.stringify(product.dimensions), product.manufacturer,
           product.rating, product.num_reviews, JSON.stringify(product.images),
           JSON.stringify(product.tags), product.created_at]
        );
      }

      await client.query('COMMIT');
      console.log(`✅ Seeded ${count} products`);
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }

    return products.data;
  }

  async run() {
    console.log('🌱 Starting database seeding...\n');

    const users = await this.seedUsers(1000);
    await this.seedProducts(500);
    await this.seedOrders(users, 5);

    await this.db.end();

    console.log('\n✅ Database seeding complete!');
    console.log('   Users: 1000');
    console.log('   Products: 500');
    console.log('   Orders: 5000');
  }
}

// Run the seeder
const seeder = new DatabaseSeeder();
await seeder.run();
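The seeder above issues one `INSERT` per row. For larger seeds, a multi-row `INSERT` cuts the number of round trips. The following sketch builds the statement text and parameter list for such an insert; `buildBulkInsert` is a hypothetical helper, and it interpolates table and column names directly, so those must be trusted constants, never user input.

```typescript
type Row = Record<string, unknown>;

// Builds "INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4)" plus the flat value list.
function buildBulkInsert(table: string, columns: string[], rows: Row[]) {
  if (rows.length === 0) {
    throw new Error('rows must not be empty');
  }
  const values: unknown[] = [];
  const tuples = rows.map((row, rowIndex) => {
    const placeholders = columns.map((column, colIndex) => {
      values.push(row[column] ?? null); // missing fields become NULL
      return `$${rowIndex * columns.length + colIndex + 1}`;
    });
    return `(${placeholders.join(', ')})`;
  });
  const text = `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${tuples.join(', ')}`;
  return { text, values };
}

const bulkInsert = buildBulkInsert('users', ['email', 'username'], [
  { email: 'a@example.com', username: 'alice' },
  { email: 'b@example.com', username: 'bob' }
]);

console.log(bulkInsert.text);
console.log(bulkInsert.values);
```

The result can be passed to `client.query(bulkInsert.text, bulkInsert.values)`. Because a single statement can carry only a limited number of bind parameters, large seeds should still be split into chunks of a few hundred rows each.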

3. Performance Optimization

3.1 Maximize Cache Hit Rate

import { AgenticSynth } from '@ruvector/agentic-synth';

// Configure for optimal caching
const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory',
  cacheTTL: 7200, // 2 hours
  maxCacheSize: 10000 // Large cache
});

// Use stable options for better cache hits
const baseOptions = {
  count: 100,
  schema: {
    user_id: 'UUID',
    name: 'full name',
    email: 'valid email'
  }
};

// First call - cache miss
const result1 = await synth.generateStructured(baseOptions);
console.log('Cached:', result1.metadata.cached); // false

// Second call with identical options - cache hit!
const result2 = await synth.generateStructured(baseOptions);
console.log('Cached:', result2.metadata.cached); // true

// Check cache statistics
const stats = synth.cache.getStats();
console.log('Cache hit rate:', (stats.hitRate * 100).toFixed(1) + '%');
console.log('Cache size:', stats.size);
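Whether two option objects count as "identical" depends on how the cache derives its key, which this guide does not specify. If you wrap the client in your own cache, or want to check that two option objects are equivalent, a key-order-independent serialization is one possible approach. `stableStringify` below is a sketch written for this purpose and is not the library's key function.

```typescript
// Serializes a JSON-like value with object keys sorted, so key order does not matter.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(item => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, val]) => `${JSON.stringify(key)}:${stableStringify(val)}`);
    return `{${entries.join(',')}}`;
  }
  const json = JSON.stringify(value);
  return json === undefined ? 'null' : json; // undefined and functions become null
}

const optionsA = { count: 100, schema: { user_id: 'UUID', name: 'full name' } };
const optionsB = { schema: { name: 'full name', user_id: 'UUID' }, count: 100 };

console.log(stableStringify(optionsA) === stableStringify(optionsB));
```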

3.2 Batch Processing for High Throughput

// Generate 10 different datasets in parallel
const batchOptions = [
  { count: 100, schema: { id: 'UUID', value: 'number' } },
  { count: 200, schema: { id: 'UUID', name: 'string' } },
  { count: 150, schema: { id: 'UUID', email: 'email' } },
  // ... 7 more options
];

// Process with concurrency control
const results = await synth.generateBatch(
  'structured',
  batchOptions,
  5 // 5 concurrent requests
);

console.log(`Generated ${results.length} datasets in parallel`);
console.log('Total records:', results.reduce((sum, r) => sum + r.data.length, 0));

3.3 Streaming for Large Datasets

// Stream 1 million records without loading all into memory
console.log('Streaming 1M records...');

let count = 0;
for await (const record of synth.generateStream('structured', {
  count: 1_000_000,
  schema: {
    id: 'UUID',
    timestamp: 'ISO timestamp',
    value: 'number'
  }
})) {
  // Process record immediately; processRecord is your own handler (e.g., write to file/DB)
  await processRecord(record);

  count++;
  if (count % 10000 === 0) {
    console.log(`Processed ${count.toLocaleString()} records...`);
  }
}

console.log('Streaming complete!');
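`processRecord` in the loop above is left to the reader. Writing each record individually is often slow, so a common choice is to buffer records and flush them in batches. The class below is a minimal sketch of such a buffer; `BatchBuffer` is a name introduced here.

```typescript
// Collects records and hands back a full batch whenever the threshold is reached.
class BatchBuffer<T> {
  private items: T[] = [];
  private readonly batchSize: number;

  constructor(batchSize: number) {
    if (!Number.isInteger(batchSize) || batchSize <= 0) {
      throw new RangeError('batchSize must be a positive integer');
    }
    this.batchSize = batchSize;
  }

  // Adds a record; returns a full batch when one is ready, otherwise null.
  push(item: T): T[] | null {
    this.items.push(item);
    return this.items.length >= this.batchSize ? this.drain() : null;
  }

  // Returns whatever is buffered and resets the buffer (call once after the stream ends).
  drain(): T[] {
    const batch = this.items;
    this.items = [];
    return batch;
  }
}

const buffer = new BatchBuffer<number>(2);
console.log(buffer.push(1)); // not full yet
console.log(buffer.push(2)); // full batch returned
```

Inside the `for await` loop you would call `buffer.push(record)`, write the batch whenever it is non-null, and call `drain()` once after the loop to flush the remainder.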

3.4 Model Selection for Cost Optimization

// Use cheaper models for simple data
const simpleData = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp', // Fast and cheap
  apiKey: process.env.GEMINI_API_KEY
});

// Use more powerful models for complex data
const complexData = new AgenticSynth({
  provider: 'openrouter',
  model: 'anthropic/claude-3.5-sonnet', // More capable
  apiKey: process.env.OPENROUTER_API_KEY
});

4. Production Deployment

4.1 Environment Configuration

// config/production.ts
export const productionConfig = {
  provider: process.env.SYNTH_PROVIDER as 'gemini' | 'openrouter',
  apiKey: process.env.SYNTH_API_KEY!,
  model: process.env.SYNTH_MODEL,
  cacheStrategy: 'memory' as const,
  cacheTTL: parseInt(process.env.CACHE_TTL || '3600', 10),
  maxCacheSize: parseInt(process.env.MAX_CACHE_SIZE || '10000', 10),
  maxRetries: 3,
  timeout: 30000,
  streaming: process.env.ENABLE_STREAMING === 'true',
  automation: process.env.ENABLE_AUTOMATION === 'true',
  vectorDB: process.env.ENABLE_VECTOR_DB === 'true'
};

// Validation
if (!productionConfig.apiKey) {
  throw new Error('SYNTH_API_KEY environment variable is required');
}
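`parseInt` accepts partially numeric strings (`'36oo'` parses as `36`) and returns `NaN` for garbage without throwing, so a typo in an environment variable can slip into the config unnoticed. A stricter reader, sketched below, fails fast instead. `readIntEnv` is a hypothetical helper; it takes the environment as a parameter so it can be exercised without touching `process.env`.

```typescript
// Reads a non-negative integer from an environment-like object, or returns the fallback.
function readIntEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

const exampleEnv = { CACHE_TTL: '7200', MAX_CACHE_SIZE: '' };

console.log(readIntEnv(exampleEnv, 'CACHE_TTL', 3600));
console.log(readIntEnv(exampleEnv, 'MAX_CACHE_SIZE', 10000));
```

In the config above this would read `cacheTTL: readIntEnv(process.env, 'CACHE_TTL', 3600)`.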

4.2 Error Handling & Retry Logic

import { AgenticSynth, APIError, ValidationError } from '@ruvector/agentic-synth';

async function generateWithRetry(
  synth: AgenticSynth,
  options: any,
  maxRetries = 3
) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await synth.generateStructured(options);
      return result;
    } catch (error) {
      if (error instanceof ValidationError) {
        // Don't retry validation errors
        console.error('Validation failed:', error.message);
        throw error;
      }

      if (error instanceof APIError) {
        const retryable = error.statusCode === 429 || error.statusCode >= 500;

        if (retryable && attempt < maxRetries) {
          // Rate limit or server error - exponential backoff before the next attempt
          const delay = Math.pow(2, attempt) * 1000;
          console.log(
            `API error (${error.statusCode}). Retry ${attempt}/${maxRetries} in ${delay}ms...`
          );
          await new Promise(resolve => setTimeout(resolve, delay));
          continue;
        }
      }

      // Unknown error or non-retryable
      throw error;
    }
  }

  throw new Error('Max retries exceeded');
}
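When many workers hit a rate limit at the same moment, plain exponential backoff makes them all retry at the same moment too. Adding random jitter spreads the retries out. The function below is a sketch of the "full jitter" variant; the 30-second cap and the injectable `random` parameter are assumptions made for illustration.

```typescript
// Delay in ms for a given attempt: a random value in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}

// With a fixed "random" source the result is deterministic, which helps in tests.
const fixedHalf = () => 0.5;
console.log(backoffDelay(1, 1000, 30000, fixedHalf));
console.log(backoffDelay(10, 1000, 30000, fixedHalf));
```

In `generateWithRetry`, the fixed `Math.pow(2, attempt) * 1000` delay could be replaced with `backoffDelay(attempt)`.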

4.3 Monitoring & Metrics

import { AgenticSynth } from '@ruvector/agentic-synth';
import { performance } from 'perf_hooks';

class MonitoredSynth {
  private synth: AgenticSynth;
  private metrics = {
    totalRequests: 0,
    successfulRequests: 0,
    failedRequests: 0,
    totalLatency: 0,
    cacheHits: 0,
    cacheMisses: 0
  };

  constructor(config: any) {
    this.synth = new AgenticSynth(config);
  }

  async generate(type: string, options: any) {
    const start = performance.now();
    this.metrics.totalRequests++;

    try {
      const result = await this.synth.generate(type as any, options);

      this.metrics.successfulRequests++;
      this.metrics.totalLatency += performance.now() - start;

      if (result.metadata.cached) {
        this.metrics.cacheHits++;
      } else {
        this.metrics.cacheMisses++;
      }

      return result;
    } catch (error) {
      this.metrics.failedRequests++;
      throw error;
    }
  }

  getMetrics() {
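    const { totalRequests, successfulRequests, totalLatency, cacheHits, cacheMisses } = this.metrics;
    const cacheLookups = cacheHits + cacheMisses;

    // Guard each ratio so an idle instance reports 0 instead of NaN
    const avgLatency = successfulRequests > 0 ? totalLatency / successfulRequests : 0;
    const successRate = totalRequests > 0 ? successfulRequests / totalRequests : 0;
    const cacheHitRate = cacheLookups > 0 ? cacheHits / cacheLookups : 0;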

    return {
      ...this.metrics,
      avgLatency: Math.round(avgLatency),
      successRate: (successRate * 100).toFixed(2) + '%',
      cacheHitRate: (cacheHitRate * 100).toFixed(2) + '%'
    };
  }
}

// Usage
const monitored = new MonitoredSynth(productionConfig);

// Generate data
await monitored.generate('structured', { count: 100, schema: { id: 'UUID' } });

// Log metrics periodically
setInterval(() => {
  console.log('Synth Metrics:', monitored.getMetrics());
}, 60000); // Every minute
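Average latency hides outliers: a handful of slow provider calls barely moves the mean when most responses come from cache. If you also record individual latencies, percentiles give a clearer picture. The function below is a sketch using the nearest-rank method; `percentile` is a helper introduced here, and storing per-request samples is an addition to the `MonitoredSynth` class above.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    return 0;
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

const latenciesMs = [120, 80, 300, 95, 2000];

console.log('p50:', percentile(latenciesMs, 50));
console.log('p95:', percentile(latenciesMs, 95));
```

For long-running processes, keep only a bounded window of recent samples so the array does not grow without limit.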

4.4 Rate Limiting

import { AgenticSynth } from '@ruvector/agentic-synth';
import { RateLimiter } from 'limiter';

class RateLimitedSynth {
  private synth: AgenticSynth;
  private limiter: RateLimiter;

  constructor(config: any, requestsPerMinute = 60) {
    this.synth = new AgenticSynth(config);
    this.limiter = new RateLimiter({
      tokensPerInterval: requestsPerMinute,
      interval: 'minute'
    });
  }

  async generate(type: string, options: any) {
    // Wait for rate limit token
    await this.limiter.removeTokens(1);

    // Proceed with generation
    return await this.synth.generate(type as any, options);
  }
}

// Usage: Limit to 60 requests per minute
const limited = new RateLimitedSynth(productionConfig, 60);
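To reason about how a limiter behaves under bursts, it can help to see the underlying idea. The class below is a minimal token-bucket sketch with an injected clock; it illustrates the technique only and is not the `limiter` package's implementation.

```typescript
// Minimal token bucket: starts full, refills continuously up to `capacity` per interval.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly intervalMs: number;

  constructor(capacity: number, intervalMs: number, nowMs: number) {
    this.capacity = capacity;
    this.intervalMs = intervalMs;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Consumes one token and returns true if a token is available at time nowMs.
  tryRemove(nowMs: number): boolean {
    const elapsed = nowMs - this.lastRefillMs;
    const refill = (elapsed * this.capacity) / this.intervalMs;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Two requests per minute: the burst is allowed, the third call has to wait.
const bucket = new TokenBucket(2, 60000, 0);
console.log(bucket.tryRemove(0), bucket.tryRemove(0), bucket.tryRemove(0));
```

Passing the time in explicitly keeps the sketch deterministic; in real code you would pass `Date.now()`.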

5. Error Handling & Monitoring

5.1 Comprehensive Error Handling

import { AgenticSynth, SynthError, ValidationError, APIError, CacheError } from '@ruvector/agentic-synth';

async function robustGeneration(options: any) {
  const synth = new AgenticSynth({
    provider: 'gemini',
    apiKey: process.env.GEMINI_API_KEY
  });

  try {
    const result = await synth.generateStructured(options);
    return result;
  } catch (error) {
    if (error instanceof ValidationError) {
      console.error('❌ Invalid options:', error.message);
      console.error('   Validation errors:', error.validationErrors);
      // Fix options and retry
      return null;
    }

    if (error instanceof APIError) {
      console.error('❌ API error:', error.message);
      console.error('   Status:', error.statusCode);
      console.error('   Provider:', error.provider);

      if (error.statusCode === 401) {
        console.error('   Check API key configuration');
      } else if (error.statusCode === 429) {
        console.error('   Rate limit exceeded - implement backoff');
      } else if (error.statusCode >= 500) {
        console.error('   Provider service error - retry later');
      }

      return null;
    }

    if (error instanceof CacheError) {
      console.error('❌ Cache error:', error.message);
      // Cache errors are non-fatal - continue without cache
      synth.config.cacheStrategy = undefined;
      return await synth.generateStructured(options);
    }

    if (error instanceof SynthError) {
      console.error('❌ Synth error:', error.message);
      return null;
    }

    // Unknown error
    console.error('❌ Unexpected error:', error);
    throw error;
  }
}

5.2 Health Checks

import { AgenticSynth } from '@ruvector/agentic-synth';

class HealthChecker {
  private synth: AgenticSynth;

  constructor(config: any) {
    this.synth = new AgenticSynth(config);
  }

  async checkHealth() {
    const health = {
      status: 'healthy',
      timestamp: new Date().toISOString(),
      checks: {
        apiConnection: false,
        cacheWorking: false,
        generationWorking: false
      },
      metrics: {}
    };

    try {
      // Test API connection with minimal request
      const testResult = await this.synth.generateStructured({
        count: 1,
        schema: { test: 'string' }
      });

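      health.checks.apiConnection = true;
      health.checks.generationWorking = true;
      // metadata.cached is false on the first check and true on repeats within the
      // TTL; either value shows that the cache layer responded
      health.checks.cacheWorking = typeof testResult.metadata.cached === 'boolean';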

      // Get cache stats
      const cacheStats = this.synth.cache.getStats();
      health.metrics = {
        cacheSize: cacheStats.size,
        cacheHitRate: (cacheStats.hitRate * 100).toFixed(2) + '%',
        generationTime: testResult.metadata.generationTime + 'ms'
      };

    } catch (error) {
      health.status = 'unhealthy';
      health.checks.apiConnection = false;
    }

    return health;
  }
}

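// Express.js health endpoint (assumes an existing Express `app`).
// One checker is shared across requests so every check reuses the same client and cache.
const checker = new HealthChecker(productionConfig);

app.get('/health', async (req, res) => {
  const health = await checker.checkHealth();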

  const statusCode = health.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(health);
});

6. Advanced Patterns

6.1 Multi-Provider Fallback

import { AgenticSynth } from '@ruvector/agentic-synth';

class MultiProviderSynth {
  private providers: AgenticSynth[];

  constructor() {
    this.providers = [
      new AgenticSynth({
        provider: 'gemini',
        apiKey: process.env.GEMINI_API_KEY,
        model: 'gemini-2.0-flash-exp'
      }),
      new AgenticSynth({
        provider: 'openrouter',
        apiKey: process.env.OPENROUTER_API_KEY,
        model: 'anthropic/claude-3.5-sonnet'
      })
    ];
  }

  async generateWithFallback(type: string, options: any) {
    for (let i = 0; i < this.providers.length; i++) {
      try {
        console.log(`Attempting provider ${i + 1}/${this.providers.length}...`);
        const result = await this.providers[i].generate(type as any, options);
        console.log(`✅ Success with provider ${i + 1}`);
        return result;
      } catch (error) {
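        // `error` is typed as unknown under strict TypeScript, so narrow it first
        const message = error instanceof Error ? error.message : String(error);
        console.log(`❌ Provider ${i + 1} failed:`, message);
        if (i === this.providers.length - 1) {
          throw new Error(`All providers failed; last error: ${message}`);
        }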
      }
    }
  }
}

6.2 Conditional Generation Logic

import { AgenticSynth } from '@ruvector/agentic-synth';

async function generateCustomerProfiles(synth: AgenticSynth, count: number) {
  // First, generate base profiles
  const profiles = await synth.generateStructured({
    count,
    schema: {
      customer_id: 'UUID',
      customer_type: 'individual | business',
      name: 'full name',
      email: 'valid email'
    }
  });

  // Then, conditionally generate additional data based on type
  for (const profile of profiles.data) {
    if (profile.customer_type === 'business') {
      // Generate business-specific data
      const businessData = await synth.generateStructured({
        count: 1,
        schema: {
          company_name: 'company name',
          tax_id: 'EIN',
          employees: 'number (1-10000)',
          annual_revenue: 'number (10000-10000000)',
          industry: 'industry type'
        }
      });

      Object.assign(profile, businessData.data[0]);
    } else {
      // Generate individual-specific data
      const individualData = await synth.generateStructured({
        count: 1,
        schema: {
          age: 'number (18-80)',
          occupation: 'job title',
          income: 'number (20000-200000)',
          marital_status: 'single | married | divorced'
        }
      });

      Object.assign(profile, individualData.data[0]);
    }
  }

  return profiles;
}

6.3 Progressive Enhancement

import { AgenticSynth } from '@ruvector/agentic-synth';

async function progressiveDataGeneration(synth: AgenticSynth) {
  // Keep the schema separate from the results, so each phase extends the field
  // descriptions instead of spreading a generated record's values into the schema
  const baseSchema = {
    id: 'UUID',
    name: 'full name',
    email: 'valid email'
  };

  console.log('Phase 1: Basic data generation...');
  let data = await synth.generateStructured({ count: 100, schema: baseSchema });

  console.log('Phase 2: Adding relationships...');
  const relationshipSchema = {
    ...baseSchema, // Existing fields
    friends: ['array of 3-10 UUIDs from existing data'],
    groups: ['array of 1-3 group names']
  };
  data = await synth.generateStructured({ count: 100, schema: relationshipSchema });

  console.log('Phase 3: Adding behavioral data...');
  const behaviorSchema = {
    ...relationshipSchema, // Existing fields
    last_login: 'ISO timestamp',
    total_purchases: 'number (0-100)',
    avg_order_value: 'number (10-500)',
    loyalty_tier: 'bronze | silver | gold | platinum'
  };
  // Each call requests a fresh set of 100 records using the wider schema
  data = await synth.generateStructured({ count: 100, schema: behaviorSchema });

  return data;
}

7. Best Practices Summary

✅ Do's

Enable caching for repeated or similar requests
Use batch operations for multiple datasets
Stream large datasets to avoid memory issues
Implement retry logic with exponential backoff
Monitor cache hit rates and adjust TTL accordingly
Validate options before generation
Use environment variables for sensitive config
Implement health checks in production
Log errors comprehensively with context
Test with realistic schemas before production

❌ Don'ts

Don't hardcode API keys in source code
Don't generate without caching in production
Don't ignore validation errors - fix schemas
Don't load massive datasets into memory
Don't skip error handling on generation calls
Don't use a model that is mismatched to the task's complexity
Don't disable retries unless intentional
Don't forget to monitor metrics in production
Don't generate unconstrained data without schema
Don't skip testing with edge cases
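
Two of the points above, validating options before generation and not generating without a schema, can be enforced with a small pre-flight check. The following is a minimal sketch only: `GenerationOptions` and `validateOptions` are hypothetical names introduced for illustration and are not part of the `@ruvector/agentic-synth` API, whose own validation may behave differently.

```typescript
// Hypothetical option shape for illustration; the library's real types may differ.
interface GenerationOptions {
  count?: unknown;
  schema?: unknown;
}

// Returns a list of human-readable problems; an empty list means the options passed.
function validateOptions(options: GenerationOptions): string[] {
  const problems: string[] = [];

  if (!Number.isInteger(options.count) || (options.count as number) < 1) {
    problems.push('count must be a positive integer');
  }

  const schema = options.schema;
  if (schema === null || typeof schema !== 'object' || Array.isArray(schema)) {
    problems.push('schema must be an object of field descriptions');
  } else if (Object.keys(schema).length === 0) {
    problems.push('schema must define at least one field');
  }

  return problems;
}

console.log(validateOptions({ count: 100, schema: { id: 'UUID' } })); // []
console.log(validateOptions({ count: 0 })); // reports both the count and the missing schema
```

Returning a list of problems rather than throwing on the first one lets a caller log every issue with a request in a single pass.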

8. Performance Tips

Cache Configuration: Larger TTL (1-2 hours) for stable schemas
Batch Size: 3-5 concurrent requests for optimal throughput
Model Selection: Use gemini-2.0-flash-exp for speed
Streaming: Use for >10K records to reduce memory
Connection Pooling: Reuse AgenticSynth instances
Rate Limiting: Implement to avoid 429 errors
Schema Simplicity: Simpler schemas = faster generation
Constraint Clarity: Clear constraints improve accuracy
Error Recovery: Implement fallback chains
Monitoring: Track P95/P99 latencies
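
The Batch Size tip above (3-5 concurrent requests) can be applied with a small bounded-concurrency helper. This is a sketch under stated assumptions: `mapWithConcurrency` is a hypothetical helper written for this guide, not a function exported by the library, and the right limit depends on the provider's rate limits.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls
// in flight at once. Results are returned in the same order as the inputs.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }

  const results = new Array<R>(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each lane repeatedly claims the next unclaimed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Usage sketch (assumes `synth` and an array `schemas` already exist):
// const datasets = await mapWithConcurrency(schemas, 4, (schema) =>
//   synth.generateStructured({ count: 100, schema })
// );
```

If any worker rejects, `Promise.all` rejects with that error while the other lanes finish their current call, so callers that need partial results should catch errors inside the worker.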

9. Troubleshooting

Issue: Low Cache Hit Rate

Solution: Use stable, deterministic options

// ❌ Bad: timestamp makes every request unique
const unstableOptions = { count: 100, timestamp: Date.now() };

// ✅ Good: stable options enable caching
const stableOptions = { count: 100, schema: { id: 'UUID' } };
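
To see why a changing field defeats caching, it helps to look at how a cache key might be derived. This guide does not describe how the library builds its keys, so the sketch below assumes a cache keyed on the serialized options; `stableKey` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: serializes a value with object keys sorted, so that
// logically equal options always produce the same string.
function stableKey(value: unknown): string {
  if (value === undefined) {
    return 'undefined';
  }
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableKey).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableKey(obj[key])}`);
  return `{${entries.join(',')}}`;
}

const first = stableKey({ count: 100, schema: { id: 'UUID' } });
const second = stableKey({ schema: { id: 'UUID' }, count: 100 });
console.log(first === second); // true: property order does not change the key

// A per-request value such as a timestamp changes the key on every call,
// so a cache keyed this way could never return a hit.
const withTimestamp = stableKey({ count: 100, timestamp: 1700000000000 });
console.log(first === withTimestamp); // false
```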

Issue: High Latency

Solution:

Enable caching
Use a faster model (gemini-2.0-flash-exp)
Reduce schema complexity
Batch similar requests
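
To check whether these steps actually help, record request durations and compare percentiles before and after each change. The following is a minimal sketch using the nearest-rank method; `percentile` and `timed` are hypothetical helpers introduced here, not part of the library.

```typescript
// Nearest-rank percentile over recorded durations in milliseconds.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('no samples recorded');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer percentiles do not pick up rounding error
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Times an async call and appends the elapsed milliseconds to `samples`,
// whether the call succeeds or throws.
async function timed<T>(samples: number[], fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    samples.push(Date.now() - start);
  }
}

// Usage sketch (assumes `synth` and `options` already exist):
// const latencies: number[] = [];
// await timed(latencies, () => synth.generateStructured(options));
// console.log('P95 (ms):', percentile(latencies, 95));
```

Recording failed calls as well as successful ones keeps timeouts visible in the tail percentiles.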

Issue: Memory Errors

Solution: Use streaming for large datasets

// ❌ Bad: load all into memory
const result = await synth.generateStructured({ count: 1000000 });

// ✅ Good: stream records
for await (const record of synth.generateStream('structured', { count: 1000000 })) {
  processRecord(record);
}

Issue: Rate Limiting (429)

Solution: Implement exponential backoff

async function generateWithBackoff(synth, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await synth.generateStructured(options);
    } catch (error) {
      // Rethrow anything that is not a rate limit, and give up after the final attempt
      if (error.statusCode !== 429 || i === maxRetries - 1) {
        throw error;
      }
      // Wait 1s, 2s, 4s, ... before the next attempt
      const delay = Math.pow(2, i) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

10. Additional Resources

API Reference: See docs/API.md
Performance Guide: See docs/PERFORMANCE.md
Benchmarks: See PERFORMANCE_REPORT.md
Examples: See examples/ directory
GitHub: https://github.com/ruvnet/ruvector

Last Updated: 2025-11-22 Package Version: 0.1.0 Status: Production Ready ✅
