
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

2026-02-28 14:39:40 -05:00

14 KiB


DSPy.ts + AgenticSynth Complete Integration Guide

Overview

This example integrates DSPy.ts (v2.1.1) with AgenticSynth to generate e-commerce product data. It applies DSPy's automatic prompt optimization and compares the optimized output against an unoptimized baseline.

What This Example Does

🎯 Complete Workflow

Baseline Generation: Uses AgenticSynth with Gemini to generate product data
DSPy Setup: Configures OpenAI with ChainOfThought reasoning module
Optimization: Uses BootstrapFewShot to learn from high-quality examples
Comparison: Analyzes quality improvements, cost, and performance
Reporting: Prints a comparison report with quality metrics and a text chart, and exports the results to JSON

🔧 Technologies Used

DSPy.ts v2.1.1: Real modules (ChainOfThought, BootstrapFewShot, metrics)
AgenticSynth: Baseline synthetic data generation
OpenAI GPT-3.5: Optimized generation with reasoning
Gemini Flash: Fast baseline generation
TypeScript: Type-safe implementation

Setup

Prerequisites

node >= 18.0.0
npm >= 9.0.0

Environment Variables

Create a .env file in the package root:

# Required
OPENAI_API_KEY=sk-...                    # OpenAI API key
GEMINI_API_KEY=...                       # Google AI Studio API key

# Optional
ANTHROPIC_API_KEY=sk-ant-...             # For Claude models
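The example checks these variables before doing any work. The sample output below prints "✅ Environment validated" on success, and the Troubleshooting section shows the failure message. Here is a minimal sketch of such a check. It assumes only the two required names listed above, and `findMissingEnv` and `formatMissingEnv` are hypothetical helpers, not functions from the example file:

```typescript
// Required variables from the .env listing above (ANTHROPIC_API_KEY is optional).
const REQUIRED_ENV = ['OPENAI_API_KEY', 'GEMINI_API_KEY'] as const;

// Hypothetical helper: names of required variables that are unset or blank.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Builds a message in the same style as the example's error output.
function formatMissingEnv(missing: string[]): string {
  return ['❌ Missing required environment variables:', ...missing.map((n) => `   - ${n}`)].join('\n');
}
```

In a Node script you would call `findMissingEnv(process.env)` and exit with an error if the returned list is non-empty.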

Installation

# Install dependencies
npm install

# Build the package
npm run build

Running the Example

Basic Usage

# Set environment variables
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...

# Run the example
npx tsx examples/dspy-complete-example.ts

Expected Output

╔════════════════════════════════════════════════════════════════════════╗
║         DSPy.ts + AgenticSynth Integration Example                    ║
║         E-commerce Product Data Generation with Optimization           ║
╚════════════════════════════════════════════════════════════════════════╝

✅ Environment validated

🔷 PHASE 1: BASELINE GENERATION

📦 Generating baseline data with AgenticSynth (Gemini)...

  ✓ [1/10] UltraSound Pro Wireless Headphones
    Quality: 72.3% | Price: $249.99 | Rating: 4.7/5
  ✓ [2/10] EcoLux Organic Cotton T-Shirt
    Quality: 68.5% | Price: $79.99 | Rating: 4.5/5
  ...

✅ Baseline generation complete: 10/10 products in 8.23s
💰 Estimated cost: $0.0005

🔷 PHASE 2: DSPy OPTIMIZATION

🧠 Setting up DSPy optimization with OpenAI...

  📡 Configuring OpenAI language model...
  ✓ Language model configured

  🔧 Creating ChainOfThought module...
  ✓ Module created

  📚 Loading training examples...
  ✓ Loaded 5 high-quality examples

  🎯 Running BootstrapFewShot optimizer...
  ✓ Optimization complete in 12.45s

✅ DSPy module ready for generation

🔷 PHASE 3: OPTIMIZED GENERATION

🚀 Generating optimized data with DSPy + OpenAI...

  ✓ [1/10] SmartHome Voice Assistant Hub
    Quality: 85.7% | Price: $179.99 | Rating: 4.8/5
  ...

✅ Optimized generation complete: 10/10 products in 15.67s
💰 Estimated cost: $0.0070

🔷 PHASE 4: ANALYSIS & REPORTING

╔════════════════════════════════════════════════════════════════════════╗
║                     COMPARISON REPORT                                  ║
╚════════════════════════════════════════════════════════════════════════╝

📊 BASELINE (AgenticSynth + Gemini)
────────────────────────────────────────────────────────────────────────────
Products Generated:    10
Generation Time:       8.23s
Estimated Cost:        $0.0005

Quality Metrics:
  Overall Quality:     68.2%
  Completeness:        72.5%
  Coherence:           65.0%
  Persuasiveness:      60.8%
  SEO Quality:         74.5%

🚀 OPTIMIZED (DSPy + OpenAI)
────────────────────────────────────────────────────────────────────────────
Products Generated:    10
Generation Time:       15.67s
Estimated Cost:        $0.0070

Quality Metrics:
  Overall Quality:     84.3%
  Completeness:        88.2%
  Coherence:           82.5%
  Persuasiveness:      85.0%
  SEO Quality:         81.5%

📈 IMPROVEMENT ANALYSIS
────────────────────────────────────────────────────────────────────────────
Quality Gain:          +23.6%
Speed Change:          +90.4%
Cost Efficiency:       +14.8%

📊 QUALITY COMPARISON CHART
────────────────────────────────────────────────────────────────────────────
Baseline:  ██████████████████████████████████ 68.2%
Optimized: ██████████████████████████████████████████ 84.3%

💡 KEY INSIGHTS
────────────────────────────────────────────────────────────────────────────
✓ Significant quality improvement with DSPy optimization
✓ Better cost efficiency with optimized approach

════════════════════════════════════════════════════════════════════════════

📁 Results exported to: .../examples/logs/dspy-comparison-results.json

✅ Example complete!

💡 Next steps:
   1. Review the comparison report above
   2. Check exported JSON for detailed results
   3. Experiment with different training examples
   4. Try other DSPy modules (Refine, ReAct, etc.)
   5. Adjust CONFIG parameters for your use case
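The Quality Gain and Speed Change lines in the report are relative changes from baseline to optimized. Quality Gain is (84.3 - 68.2) / 68.2 ≈ +23.6%, and Speed Change is (15.67 - 8.23) / 8.23 ≈ +90.4%. A positive Speed Change therefore means the optimized run took about 90% longer, not that it was faster. The sketch below reproduces that arithmetic. The helper names are hypothetical:

```typescript
// Relative change from a baseline value to a new value, in percent.
function percentChange(baseline: number, value: number): number {
  if (baseline === 0) throw new RangeError('baseline must be non-zero');
  return ((value - baseline) / baseline) * 100;
}

// Formats a percentage with an explicit sign and one decimal place.
function formatChange(pct: number): string {
  return `${pct >= 0 ? '+' : ''}${pct.toFixed(1)}%`;
}

console.log(formatChange(percentChange(68.2, 84.3)));  // +23.6% (Quality Gain)
console.log(formatChange(percentChange(8.23, 15.67))); // +90.4% (Speed Change, i.e. generation time)
```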

Configuration

Customizable Parameters

Edit the CONFIG object in examples/dspy-complete-example.ts:

const CONFIG = {
  SAMPLE_SIZE: 10,           // Number of products to generate
  TRAINING_EXAMPLES: 5,      // Examples for DSPy optimization
  BASELINE_MODEL: 'gemini-2.0-flash-exp',
  OPTIMIZED_MODEL: 'gpt-3.5-turbo',

  CATEGORIES: [
    'Electronics',
    'Fashion',
    'Home & Garden',
    'Sports & Outdoors',
    'Books & Media',
    'Health & Beauty'
  ],

  PRICE_RANGES: {
    low: { min: 10, max: 50 },
    medium: { min: 50, max: 200 },
    high: { min: 200, max: 1000 }
  }
};
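CATEGORIES and PRICE_RANGES need to become the `category` and `priceRange` string inputs that the DSPy module receives (Phase 4 below passes a string such as '$100-$500'). This guide does not show how the example makes that conversion. The sketch below is one plausible approach: it cycles through categories so each one is covered, and it picks a price tier with an injectable random function. `buildInputs` and this sampling scheme are assumptions:

```typescript
type PriceTier = 'low' | 'medium' | 'high';

interface PriceRange {
  min: number;
  max: number;
}

interface GeneratorInput {
  category: string;
  priceRange: string;
}

// Mirrors CATEGORIES and PRICE_RANGES from the CONFIG object above.
const CATEGORIES = ['Electronics', 'Fashion', 'Home & Garden', 'Sports & Outdoors', 'Books & Media', 'Health & Beauty'];
const PRICE_RANGES: Record<PriceTier, PriceRange> = {
  low: { min: 10, max: 50 },
  medium: { min: 50, max: 200 },
  high: { min: 200, max: 1000 },
};

// Formats a numeric range as a string input, e.g. "$50-$200".
function formatPriceRange({ min, max }: PriceRange): string {
  return `$${min}-$${max}`;
}

// Round-robin over categories; random tier via `rng` (inject a fixed rng for tests).
function buildInputs(count: number, rng: () => number = Math.random): GeneratorInput[] {
  const tiers = Object.keys(PRICE_RANGES) as PriceTier[];
  return Array.from({ length: count }, (_, i) => {
    const tier = tiers[Math.min(tiers.length - 1, Math.floor(rng() * tiers.length))];
    return {
      category: CATEGORIES[i % CATEGORIES.length],
      priceRange: formatPriceRange(PRICE_RANGES[tier]),
    };
  });
}
```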

Understanding the Code

Phase 1: Baseline Generation

const synth = new AgenticSynth({
  provider: 'gemini',
  model: 'gemini-2.0-flash-exp',
  apiKey: process.env.GEMINI_API_KEY
});

const result = await synth.generateStructured<Product>({
  prompt: '...',
  schema: { /* product schema */ },
  count: 1
});

Purpose: Establishes baseline quality and cost metrics using standard generation.
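Each phase reports an estimated cost ("💰 Estimated cost" in the sample output). This guide does not show the example's own cost formula. A common approach, sketched here under that assumption, multiplies prompt and completion token counts by per-1,000-token prices. The type and function names are hypothetical, and any prices you pass in should come from your provider's current pricing page:

```typescript
interface TokenPricing {
  inputPer1K: number;  // USD per 1,000 prompt tokens
  outputPer1K: number; // USD per 1,000 completion tokens
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Estimated USD cost of a single model call.
function estimateCallCost(usage: TokenUsage, pricing: TokenPricing): number {
  return (
    (usage.inputTokens / 1000) * pricing.inputPer1K +
    (usage.outputTokens / 1000) * pricing.outputPer1K
  );
}

// Estimated USD cost of a whole phase (one entry per call).
function estimateTotalCost(usages: TokenUsage[], pricing: TokenPricing): number {
  return usages.reduce((sum, u) => sum + estimateCallCost(u, pricing), 0);
}
```

For example, 400 prompt tokens and 300 completion tokens at placeholder prices of $0.001 and $0.002 per 1K come to about $0.001.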

Phase 2: DSPy Setup

// Configure language model
const lm = new OpenAILM({
  model: 'gpt-3.5-turbo',
  apiKey: process.env.OPENAI_API_KEY,
  temperature: 0.7
});
await lm.init();
configureLM(lm);

// Create reasoning module
const productGenerator = new ChainOfThought({
  name: 'ProductGenerator',
  signature: {
    inputs: [
      { name: 'category', type: 'string', required: true },
      { name: 'priceRange', type: 'string', required: true }
    ],
    outputs: [
      { name: 'name', type: 'string', required: true },
      { name: 'description', type: 'string', required: true },
      { name: 'price', type: 'number', required: true },
      { name: 'rating', type: 'number', required: true }
    ]
  }
});

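Purpose: Configures the language model that DSPy calls and declares the module's typed inputs and outputs. ChainOfThought prompts the model to reason step by step before it produces those outputs.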
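A framework-free illustration can make the declarative signature more concrete. The sketch below renders a signature with the same shape as the one above into a prompt that asks for reasoning before the output fields. It only illustrates the idea. It is not how dspy.ts builds prompts internally, and `renderPrompt` and its prompt wording are hypothetical:

```typescript
interface Field {
  name: string;
  type: 'string' | 'number';
  required: boolean;
}

interface Signature {
  inputs: Field[];
  outputs: Field[];
}

// Renders input values, a reasoning cue, then one line per expected output.
function renderPrompt(sig: Signature, values: Record<string, string>): string {
  const missing = sig.inputs.filter((f) => f.required && !(f.name in values));
  if (missing.length > 0) {
    throw new Error(`Missing inputs: ${missing.map((f) => f.name).join(', ')}`);
  }
  const inputLines = sig.inputs.map((f) => `${f.name}: ${values[f.name] ?? ''}`);
  const outputLines = sig.outputs.map((f) => `${f.name} (${f.type}):`);
  return [...inputLines, '', "Reasoning: Let's think step by step.", ...outputLines].join('\n');
}
```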

Phase 3: Optimization

const optimizer = new BootstrapFewShot({
  metric: productQualityMetric,
  maxBootstrappedDemos: 5,
  maxLabeledDemos: 3,
  teacherSettings: { temperature: 0.5 },
  maxRounds: 2
});

const optimizedModule = await optimizer.compile(
  productGenerator,
  trainingExamples
);

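Purpose: Runs the module over the training examples, keeps the outputs that score well on productQualityMetric, and compiles them (alongside labeled examples) into few-shot demonstrations for the optimized module.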
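In DSPy's design, the core of bootstrapping works like this: run the module on training inputs, score each output with the metric, and keep the passing input/output pairs as demonstrations, up to `maxBootstrappedDemos`. The sketch below shows only that selection step, with a pluggable predictor. It is a simplification (no labeled demos, no multiple rounds, no teacher settings), not dspy.ts's implementation, and the 0.7 threshold is an assumption:

```typescript
interface Demo<I, O> {
  input: I;
  output: O;
}

// Keeps predictions whose metric score meets `threshold`, up to `maxDemos`.
async function bootstrapDemos<I, O>(
  trainInputs: I[],
  predict: (input: I) => Promise<O>,
  metric: (input: I, output: O) => number,
  maxDemos: number,
  threshold = 0.7
): Promise<Demo<I, O>[]> {
  const demos: Demo<I, O>[] = [];
  for (const input of trainInputs) {
    if (demos.length >= maxDemos) break;
    try {
      const output = await predict(input);
      if (metric(input, output) >= threshold) demos.push({ input, output });
    } catch {
      // A failed generation simply contributes no demonstration.
    }
  }
  return demos;
}
```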

Phase 4: Generation with Optimized Module

const result = await optimizedModule.forward({
  category: 'Electronics',
  priceRange: '$100-$500'
});

const product: Product = {
  name: result.name,
  description: result.description,
  price: result.price,
  rating: result.rating
};

Purpose: Uses optimized prompts and reasoning chains learned during compilation.
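Model outputs are not guaranteed to match the declared types. A price may come back as the string "$179.99", for instance. A defensive conversion before building the `Product` is a reasonable precaution. This sketch assumes `result` is a plain object carrying the four output fields and that ratings use the 0-5 scale shown in the sample output. `toProduct` and `parseNumber` are hypothetical helpers:

```typescript
interface Product {
  name: string;
  description: string;
  price: number;
  rating: number;
}

// Parses numbers that may arrive as strings such as "$1,299.99" or "4.8/5".
function parseNumber(value: unknown): number {
  if (typeof value === 'number') return value;
  if (typeof value === 'string') {
    const match = value.replace(/,/g, '').match(/-?\d+(\.\d+)?/);
    if (match) return Number(match[0]);
  }
  return Number.NaN;
}

// Validates and normalizes a raw module result into a Product.
function toProduct(result: Record<string, unknown>): Product {
  const name = String(result.name ?? '').trim();
  const description = String(result.description ?? '').trim();
  const price = parseNumber(result.price);
  const rating = parseNumber(result.rating);
  if (!name || !description) throw new Error('name and description are required');
  if (!Number.isFinite(price) || price < 0) throw new Error(`invalid price: ${String(result.price)}`);
  if (!Number.isFinite(rating)) throw new Error(`invalid rating: ${String(result.rating)}`);
  // Clamp the rating to the 0-5 scale used in the report.
  return { name, description, price, rating: Math.min(5, Math.max(0, rating)) };
}
```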

Quality Metrics Explained

The example scores each product on four quality dimensions, weighted as follows to produce Overall Quality:

1. Completeness (40% weight)

Description length (100-500 words)
Contains features/benefits
Has call-to-action

2. Coherence (20% weight)

Sentence structure quality
Average sentence length (15-25 words ideal)
Natural flow

3. Persuasiveness (20% weight)

Persuasive language usage
Emotional appeal
Value proposition clarity

4. SEO Quality (20% weight)

Product name in description
Keyword presence
Discoverability
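Here is a sketch of how the weights above could combine into a single score, assuming Overall Quality is the weighted sum they imply. It also includes the average-sentence-length measure mentioned under Coherence. Splitting sentences on ., ! or ? is an assumption, and the example's own scoring may differ in detail:

```typescript
interface QualityScores {
  completeness: number;   // 0-100
  coherence: number;      // 0-100
  persuasiveness: number; // 0-100
  seo: number;            // 0-100
}

// Weights from the list above: 40% / 20% / 20% / 20%.
const WEIGHTS: QualityScores = { completeness: 0.4, coherence: 0.2, persuasiveness: 0.2, seo: 0.2 };

function overallQuality(s: QualityScores): number {
  return (
    s.completeness * WEIGHTS.completeness +
    s.coherence * WEIGHTS.coherence +
    s.persuasiveness * WEIGHTS.persuasiveness +
    s.seo * WEIGHTS.seo
  );
}

// Mean words per sentence; returns 0 when the text has no sentences.
function averageSentenceLength(text: string): number {
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (sentences.length === 0) return 0;
  const words = sentences.reduce((n, s) => n + s.split(/\s+/).length, 0);
  return words / sentences.length;
}
```

Because the weights sum to 1, the overall score stays on the same 0-100 scale as the individual dimensions.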

Advanced Usage

Using Different DSPy Modules

Refine Module (Iterative Improvement)

import { Refine } from 'dspy.ts';

const refiner = new Refine({
  name: 'ProductRefiner',
  signature: { /* ... */ },
  maxIterations: 3,
  constraints: [
    { field: 'description', check: (val) => val.length >= 100 }
  ]
});
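Conceptually, a Refine-style module regenerates until every constraint passes or the iteration limit is reached. The framework-free sketch below shows that loop. It uses a loosened version of the constraint shape above (the check receives `unknown`) and is not dspy.ts's Refine implementation. Returning the last attempt when no attempt passes is a design choice of this sketch:

```typescript
interface Constraint<T> {
  field: keyof T;
  check: (value: unknown) => boolean;
}

interface RefineResult<T> {
  output: T;
  iterations: number;
  passed: boolean;
}

// Calls `generate` up to `maxIterations` times. Each call receives the
// fields that failed on the previous attempt so it can try to fix them.
async function refine<T>(
  generate: (failedFields: (keyof T)[]) => Promise<T>,
  constraints: Constraint<T>[],
  maxIterations: number
): Promise<RefineResult<T>> {
  if (maxIterations < 1) throw new RangeError('maxIterations must be >= 1');
  let failed: (keyof T)[] = [];
  for (let i = 1; ; i++) {
    const output = await generate(failed);
    failed = constraints.filter((c) => !c.check(output[c.field])).map((c) => c.field);
    if (failed.length === 0) return { output, iterations: i, passed: true };
    if (i >= maxIterations) return { output, iterations: i, passed: false };
  }
}
```

A matching constraint would be `{ field: 'description', check: (v) => typeof v === 'string' && v.length >= 100 }`.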

ReAct Module (Reasoning + Acting)

import { ReAct } from 'dspy.ts';

const reactor = new ReAct({
  name: 'ProductResearcher',
  signature: { /* ... */ },
  tools: [searchTool, pricingTool]
});

Custom Metrics

import { createMetric } from 'dspy.ts';

const customMetric = createMetric(
  'brand-consistency',
  (example, prediction) => {
    // Your custom evaluation logic
    const score = calculateBrandScore(prediction);
    return score;
  }
);
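The snippet above leaves `calculateBrandScore` undefined. One hypothetical implementation scores the fraction of required brand terms that appear in the product name or description. The term list, the whole-word matching, and the 0-1 scale are all assumptions for illustration:

```typescript
interface BrandPrediction {
  name: string;
  description: string;
}

// Hypothetical brand vocabulary; replace with terms from your own style guide.
const BRAND_TERMS = ['sustainable', 'warranty', 'premium'];

// Fraction (0-1) of brand terms found as whole words, case-insensitively.
function calculateBrandScore(prediction: BrandPrediction, terms: string[] = BRAND_TERMS): number {
  if (terms.length === 0) return 1;
  const text = `${prediction.name} ${prediction.description}`.toLowerCase();
  const hits = terms.filter((term) => {
    // Escape regex metacharacters so terms are matched literally.
    const escaped = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${escaped}\\b`).test(text);
  });
  return hits.length / terms.length;
}
```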

Integration with AgenticDB

import { AgenticDB } from 'agentdb';

// Store products in vector database
const db = new AgenticDB();
await db.init();

for (const product of optimizedProducts) {
  await db.add({
    id: product.id,
    text: product.description,
    metadata: { category: product.category, price: product.price }
  });
}

// Semantic search
const similar = await db.search('wireless noise cancelling headphones', {
  limit: 5
});

Troubleshooting

Common Issues

1. Module Not Found

Error: Cannot find module 'dspy.ts'

Solution: Ensure dependencies are installed:

npm install

2. API Key Not Found

❌ Missing required environment variables:
   - OPENAI_API_KEY

Solution: Export environment variables:

export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...

3. Rate Limiting

Error: Rate limit exceeded

Solution: Add delays or reduce SAMPLE_SIZE:

const CONFIG = {
  SAMPLE_SIZE: 5,  // Reduce from 10
  // ...
};
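For the "add delays" option, a common pattern is to retry with exponential backoff. Here is a minimal sketch. It retries on any error, whereas a production client should retry only on rate-limit errors (for example HTTP 429). The default retry count and delays are assumptions:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `fn` up to `retries` extra times. The delay doubles each attempt
// (baseDelayMs, 2x, 4x, ...) plus up to 20% random jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await sleep(delay + Math.random() * delay * 0.2);
    }
  }
}
```

Usage: `await withRetry(() => optimizedModule.forward({ category, priceRange }))`.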

4. Out of Memory

Solution: Process in smaller batches:

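const batchSize = 5;
for (let i = 0; i < totalProducts; i += batchSize) {
  // The final batch may be smaller than batchSize
  const size = Math.min(batchSize, totalProducts - i);
  const batch = await generateBatch(size);
  // Process (and persist) the batch before continuing so it can be garbage-collected
}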

Performance Tips

1. Parallel Generation

const promises = categories.map(category =>
  optimizedModule.forward({ category, priceRange })
);
const results = await Promise.all(promises);
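`Promise.all` starts every request at once, which can trigger the rate limits described under Troubleshooting. A small concurrency limiter keeps at most `limit` requests in flight and preserves result order. `mapWithConcurrency` is a hypothetical helper, not part of either library:

```typescript
// Applies `fn` to every item with at most `limit` calls running at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be >= 1');
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  const worker = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

Usage: `await mapWithConcurrency(categories, 3, (category) => optimizedModule.forward({ category, priceRange }))`.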

2. Caching

const synth = new AgenticSynth({
  cacheStrategy: 'redis',
  cacheTTL: 3600,
  // ...
});
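Here is an in-process alternative that does not depend on AgenticSynth's cache options: a small TTL cache keyed by prompt string. The class name and the millisecond TTL are assumptions of this sketch (this guide does not state the unit of `cacheTTL` above). The clock is injectable so expiry can be tested:

```typescript
interface Entry<V> {
  value: V;
  expiresAt: number;
}

// Minimal in-memory cache; entries expire `ttlMs` after being set.
// A stored value of `undefined` is indistinguishable from a miss.
class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or computes, stores, and returns a fresh one.
  async getOrCompute(key: string, compute: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await compute();
    this.set(key, value);
    return value;
  }
}
```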

3. Streaming

for await (const product of synth.generateStream('structured', options)) {
  console.log('Generated:', product);
  // Process immediately
}

Cost Optimization

Model Selection Strategy

Use Case	Baseline Model	Optimized Model	Notes
High Quality	GPT-4	Claude Opus	Premium quality
Balanced	Gemini Flash	GPT-3.5 Turbo	Good quality/cost
Cost-Effective	Gemini Flash	Gemini Flash	Minimal cost
High Volume	Llama 3.1	Gemini Flash	Maximum throughput

Budget Management

const CONFIG = {
  MAX_BUDGET: 1.0,  // $1 USD limit
  COST_PER_TOKEN: 0.0005,
  // ...
};

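let totalCost = 0;
// productCount: number of products to generate (e.g. CONFIG.SAMPLE_SIZE);
// generate() and estimateCost() stand in for your own generation and cost functions
for (let i = 0; i < productCount && totalCost < CONFIG.MAX_BUDGET; i++) {
  const result = await generate();
  totalCost += estimateCost(result);
}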
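The loop above checks the budget only before each call, so the final call can push the total past MAX_BUDGET. If the limit must be strict, one option is to skip a call whenever its projected cost would exceed what is left. This sketch does that. `estimateNextCost` is a hypothetical per-call estimate (for example, the previous call's actual cost), so the guard is only as strict as that estimate:

```typescript
interface BudgetRun<T> {
  results: T[];
  totalCost: number;
  stoppedEarly: boolean;
}

// Runs `generate` up to `count` times without letting the projected
// total exceed `maxBudget`. `costOf` returns a finished call's actual cost.
async function runWithinBudget<T>(
  count: number,
  maxBudget: number,
  estimateNextCost: () => number,
  generate: () => Promise<T>,
  costOf: (result: T) => number
): Promise<BudgetRun<T>> {
  const results: T[] = [];
  let totalCost = 0;
  for (let i = 0; i < count; i++) {
    if (totalCost + estimateNextCost() > maxBudget) {
      return { results, totalCost, stoppedEarly: true };
    }
    const result = await generate();
    results.push(result);
    totalCost += costOf(result);
  }
  return { results, totalCost, stoppedEarly: false };
}
```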

Testing

Unit Tests

import { describe, it, expect } from 'vitest';
import { calculateQualityMetrics } from './dspy-complete-example';

describe('Quality Metrics', () => {
  it('should calculate completeness correctly', () => {
    const product = {
      name: 'Test Product',
      description: 'A'.repeat(150),
      price: 99.99,
      rating: 4.5
    };

    const metrics = calculateQualityMetrics(product);
    expect(metrics.completeness).toBeGreaterThan(0);
  });
});

Integration Tests

npm run test -- examples/dspy-complete-example.test.ts
