Troubleshooting Guide

Common issues and solutions for Agentic-Synth.

Table of Contents

  • Installation Issues
  • Generation Problems
  • Performance Issues
  • Quality Problems
  • Integration Issues
  • API and Authentication
  • Memory and Resource Issues
  • Debugging Tips
  • Getting Help
  • FAQ
  • Additional Resources

Installation Issues

npm install fails

Symptoms:

npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /path/to/package.json

Solutions:

  1. Ensure you're in the directory that contains package.json
  2. Verify Node.js version (>=18.0.0):
    node --version
    
  3. Clear npm cache:
    npm cache clean --force
    npm install
    
  4. Try a different package manager:
    pnpm install
    # or
    yarn install
    

TypeScript type errors

Symptoms:

Cannot find module 'agentic-synth' or its corresponding type declarations

Solutions:

  1. Ensure TypeScript version >=5.0:
    npm install -D typescript@latest
    
  2. Check tsconfig.json:
    {
      "compilerOptions": {
        "moduleResolution": "node",
        "esModuleInterop": true
      }
    }
    

Native dependencies fail to build

Symptoms:

gyp ERR! build error

Solutions:

  1. Install build tools:
    • Windows: install "Visual Studio Build Tools" (the windows-build-tools npm package is deprecated)
    • Mac: xcode-select --install
    • Linux: sudo apt-get install build-essential
  2. Use pre-built binaries if available
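
Before retrying the build, it can help to confirm that the toolchain node-gyp expects is actually present. A minimal check (POSIX shell assumed; tool locations vary by platform):

```shell
# Each line should print a version; a missing tool explains the gyp failure.
node --version
npm --version
command -v python3 >/dev/null && python3 --version || echo "python3 not found (node-gyp needs Python)"
command -v cc >/dev/null && cc --version | head -n 1 || echo "no C compiler found (install build tools)"
```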

Generation Problems

Generation returns empty results

Symptoms:

const data = await synth.generate({ schema, count: 1000 });
console.log(data.data.length); // 0

Solutions:

  1. Check API key configuration:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKey: process.env.OPENAI_API_KEY, // Ensure this is set
    });
    
  2. Verify schema validity:

    import { validateSchema } from 'agentic-synth/utils';
    
    const isValid = validateSchema(schema);
    if (!isValid.valid) {
      console.error('Schema errors:', isValid.errors);
    }
    
  3. Check for errors in generation:

    try {
      const data = await synth.generate({ schema, count: 1000 });
    } catch (error) {
      console.error('Generation failed:', error);
    }
    

Generation hangs indefinitely

Symptoms:

  • Generation never completes
  • No progress updates
  • No error messages

Solutions:

  1. Add timeout:

    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), 60000); // 1 minute
    
    try {
      await synth.generate({
        schema,
        count: 1000,
        abortSignal: controller.signal,
      });
    } finally {
      clearTimeout(timeout);
    }
    
  2. Enable verbose logging:

    const synth = new SynthEngine({
      provider: 'openai',
      debug: true, // Enable debug logs
    });
    
  3. Reduce batch size:

    const synth = new SynthEngine({
      batchSize: 10, // Start small
    });
    

Invalid data generated

Symptoms:

  • Data doesn't match schema
  • Missing required fields
  • Type mismatches

Solutions:

  1. Enable strict validation:

    const synth = new SynthEngine({
      validationEnabled: true,
      strictMode: true,
    });
    
  2. Add constraints to schema:

    const schema = Schema.define({
      name: 'User',
      type: 'object',
      properties: {
        email: {
          type: 'string',
          format: 'email',
          pattern: '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$',
        },
      },
      required: ['email'],
    });
    
  3. Lower temperature for stricter schema adherence:

    const synth = new SynthEngine({
      temperature: 0.2, // Lower values produce more schema-faithful output
    });
    

Performance Issues

Slow generation speed

Symptoms:

  • Generation takes much longer than expected
  • Low throughput (< 100 items/minute)

Solutions:

  1. Enable streaming mode:

    for await (const item of synth.generateStream({ schema, count: 10000 })) {
      // Process item immediately
    }
    
  2. Increase batch size:

    const synth = new SynthEngine({
      batchSize: 1000, // Larger batches
      maxWorkers: 8,   // More parallel workers
    });
    
  3. Use faster model:

    const synth = new SynthEngine({
      provider: 'openai',
      model: 'gpt-3.5-turbo', // Faster than gpt-4
    });
    
  4. Cache embeddings:

    const synth = new SynthEngine({
      cacheEnabled: true,
      cacheTTL: 3600, // 1 hour
    });
    
  5. Profile generation:

    import { profiler } from 'agentic-synth/utils';
    
    const profile = await profiler.profile(() => {
      return synth.generate({ schema, count: 1000 });
    });
    
    console.log('Bottlenecks:', profile.bottlenecks);
    

High memory usage

Symptoms:

FATAL ERROR: Reached heap limit Allocation failed

Solutions:

  1. Use streaming:

    // Instead of loading all in memory
    const data = await synth.generate({ schema, count: 1000000 }); // ❌
    
    // Stream and process incrementally
    for await (const item of synth.generateStream({ schema, count: 1000000 })) { // ✅
      await processItem(item);
    }
    
  2. Reduce batch size:

    const synth = new SynthEngine({
      batchSize: 100, // Smaller batches
    });
    
  3. Increase Node.js heap size:

    NODE_OPTIONS="--max-old-space-size=4096" npm start
    
  4. Process in chunks:

    const chunkSize = 10000;
    const totalCount = 1000000;
    
    for (let i = 0; i < totalCount; i += chunkSize) {
      const chunk = await synth.generate({
        schema,
        count: Math.min(chunkSize, totalCount - i),
      });
      await exportChunk(chunk, i);
    }
    

Quality Problems

Low realism scores

Symptoms:

const metrics = await QualityMetrics.evaluate(data);
console.log(metrics.realism); // 0.45 (too low)

Solutions:

  1. Improve schema descriptions:

    const schema = Schema.define({
      name: 'User',
      description: 'A realistic user profile with authentic details',
      properties: {
        name: {
          type: 'string',
          description: 'Full name following cultural naming conventions',
        },
      },
    });
    
  2. Add examples to schema:

    const schema = Schema.define({
      properties: {
        bio: {
          type: 'string',
          examples: [
            'Passionate about machine learning and open source',
            'Software engineer with 10 years of experience',
          ],
        },
      },
    });
    
  3. Adjust temperature:

    const synth = new SynthEngine({
      temperature: 0.9, // Higher for more natural variation
    });
    
  4. Use better model:

    const synth = new SynthEngine({
      provider: 'anthropic',
      model: 'claude-3-opus-20240229', // Higher quality
    });
    

Low diversity scores

Symptoms:

  • Many duplicate or nearly identical examples
  • Limited variation in generated data

Solutions:

  1. Increase temperature:

    const synth = new SynthEngine({
      temperature: 0.95, // Maximum diversity
    });
    
  2. Add diversity constraints:

    const schema = Schema.define({
      constraints: [
        {
          type: 'diversity',
          field: 'content',
          maxSimilarity: 0.3, // Allow at most 30% pairwise similarity
        },
      ],
    });
    
  3. Use varied prompts:

    const synth = new SynthEngine({
      promptVariation: true,
      variationStrategies: ['paraphrase', 'reframe', 'alternative-angle'],
    });
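
If a built-in diversity constraint is not available in your setup, a rough post-hoc filter can drop near-duplicates after generation. This sketch is not part of the agentic-synth API; it uses word-level Jaccard similarity as a cheap proxy for semantic similarity:

```typescript
// Word-level Jaccard similarity between two strings (0 = disjoint, 1 = identical).
function jaccard(a: string, b: string): number {
  const sa = new Set(a.toLowerCase().split(/\s+/));
  const sb = new Set(b.toLowerCase().split(/\s+/));
  let inter = 0;
  for (const w of sa) if (sb.has(w)) inter++;
  const union = sa.size + sb.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Keep items whose `content` overlaps every previously kept item by less than `maxSim`.
function dedupe(items: { content: string }[], maxSim = 0.3): { content: string }[] {
  const kept: { content: string }[] = [];
  for (const item of items) {
    if (kept.every((k) => jaccard(k.content, item.content) < maxSim)) kept.push(item);
  }
  return kept;
}
```

Word-set Jaccard is crude; embedding-based similarity is more faithful but costs a model call per item.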
    

Biased data detected

Symptoms:

const metrics = await QualityMetrics.evaluate(data, { bias: true });
console.log(metrics.bias); // { gender: 0.85 } (too high)

Solutions:

  1. Add fairness constraints:

    const schema = Schema.define({
      constraints: [
        {
          type: 'fairness',
          attributes: ['gender', 'age', 'ethnicity'],
          distribution: 'uniform',
        },
      ],
    });
    
  2. Explicit diversity instructions:

    const schema = Schema.define({
      description: 'Generate diverse examples representing all demographics equally',
    });
    
  3. Post-generation filtering:

    import { BiasDetector } from 'agentic-synth/utils';
    
    const detector = new BiasDetector();
    const balanced = data.filter(item => {
      const bias = detector.detect(item);
      return bias.overall < 0.3; // Keep low-bias items
    });
    

Integration Issues

Ruvector connection fails

Symptoms:

Error: Cannot connect to Ruvector at localhost:8080

Solutions:

  1. Verify Ruvector is running:

    # Check if Ruvector service is running
    curl http://localhost:8080/health
    
  2. Check connection configuration:

    const db = new VectorDB({
      host: 'localhost',
      port: 8080,
      timeout: 5000,
    });
    
  3. Use retry logic:

    import { retry } from 'agentic-synth/utils';
    
    const db = await retry(() => new VectorDB(), {
      attempts: 3,
      delay: 1000,
    });
    

Vector insertion fails

Symptoms:

Error: Failed to insert vectors into collection

Solutions:

  1. Verify collection exists:

    const collections = await db.listCollections();
    if (!collections.includes('my-collection')) {
      await db.createCollection('my-collection', { dimensions: 384 });
    }
    
  2. Check vector dimensions match:

    const schema = Schema.define({
      properties: {
        embedding: {
          type: 'embedding',
          dimensions: 384, // Must match collection config
        },
      },
    });
    
  3. Use batching:

    await synth.generateAndInsert({
      schema,
      count: 10000,
      collection: 'vectors',
      batchSize: 1000, // Insert in batches
    });
    

API and Authentication

OpenAI API errors

Symptoms:

Error: Incorrect API key provided

Solutions:

  1. Verify API key:

    echo $OPENAI_API_KEY
    
  2. Set environment variable:

    export OPENAI_API_KEY="sk-..."
    
  3. Pass key explicitly:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKey: 'sk-...', // Not recommended for production
    });
    

Rate limit exceeded

Symptoms:

Error: Rate limit exceeded. Please try again later.

Solutions:

  1. Implement exponential backoff:

    const synth = new SynthEngine({
      retryConfig: {
        maxRetries: 5,
        backoffMultiplier: 2,
        initialDelay: 1000,
      },
    });
    
  2. Reduce request rate:

    const synth = new SynthEngine({
      rateLimit: {
        requestsPerMinute: 60,
        tokensPerMinute: 90000,
      },
    });
    
  3. Use multiple API keys:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKeys: [
        process.env.OPENAI_API_KEY_1,
        process.env.OPENAI_API_KEY_2,
        process.env.OPENAI_API_KEY_3,
      ],
      keyRotationStrategy: 'round-robin',
    });
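
If the engine's retryConfig is unavailable, or you need backoff around calls you make yourself, the strategy reduces to a small wrapper. A generic sketch (not part of the agentic-synth API):

```typescript
// Retry `fn` with exponential backoff: the delay doubles after each failure.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  initialDelayMs = 1000,
  multiplier = 2,
): Promise<T> {
  let delay = initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // Out of retries: surface the error
      await new Promise((r) => setTimeout(r, delay));
      delay *= multiplier;
    }
  }
}
```

Adding random jitter to the delay helps avoid synchronized retries across parallel workers.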
    

Memory and Resource Issues

Out of memory errors

Solutions:

  1. Use streaming mode (recommended):

    for await (const item of synth.generateStream({ schema, count: 1000000 })) {
      await processAndDiscard(item);
    }
    
  2. Process in smaller batches:

    async function generateInChunks(totalCount: number, chunkSize: number) {
      for (let i = 0; i < totalCount; i += chunkSize) {
        const chunk = await synth.generate({
          schema,
          count: Math.min(chunkSize, totalCount - i), // Last chunk may be smaller
        });
        await processChunk(chunk);
        // Chunk becomes eligible for garbage collection after processing
      }
    }
    
  3. Increase Node.js memory:

    node --max-old-space-size=8192 script.js
    

Disk space issues

Symptoms:

Error: ENOSPC: no space left on device

Solutions:

  1. Stream directly to storage:

    import { createWriteStream } from 'fs';
    
    const stream = createWriteStream('./output.jsonl');
    for await (const item of synth.generateStream({ schema, count: 1000000 })) {
      stream.write(JSON.stringify(item) + '\n');
    }
    stream.end();
    
  2. Use compression:

    import { createGzip } from 'zlib';
    import { pipeline } from 'stream/promises';
    
    await pipeline(
      synth.generateStream({ schema, count: 1000000 }),
      createGzip(),
      createWriteStream('./output.jsonl.gz')
    );
    
  3. Export to remote storage:

    // AWS credentials and region are read from the environment
    const data = await synth.generate({ schema, count: 1000000 });
    await data.export({
      format: 'parquet',
      destination: 's3://my-bucket/synthetic-data.parquet',
    });
    

Debugging Tips

Enable debug logging

import { setLogLevel } from 'agentic-synth';

setLogLevel('debug');

const synth = new SynthEngine({
  debug: true,
  verbose: true,
});

Use profiler

import { profiler } from 'agentic-synth/utils';

const results = await profiler.profile(async () => {
  return await synth.generate({ schema, count: 1000 });
});

console.log('Performance breakdown:', results.breakdown);
console.log('Bottlenecks:', results.bottlenecks);

Test with small datasets first

// Test with 10 examples first
const test = await synth.generate({ schema, count: 10 });
console.log('Sample:', test.data[0]);

// Validate quality
const quality = await QualityMetrics.evaluate(test.data);
console.log('Quality:', quality);

// If quality is good, scale up
if (quality.overall > 0.85) {
  const full = await synth.generate({ schema, count: 100000 });
}

Getting Help

If you're still experiencing issues:

  1. Check documentation: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/docs
  2. Search issues: https://github.com/ruvnet/ruvector/issues
  3. Ask on Discord: https://discord.gg/ruvnet
  4. Open an issue: https://github.com/ruvnet/ruvector/issues/new

When reporting issues, include:

  • Agentic-Synth version: npm list agentic-synth
  • Node.js version: node --version
  • Operating system
  • Minimal reproduction code
  • Error messages and stack traces
  • Schema definition (if relevant)
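
The details above can be collected in one go (POSIX shell assumed):

```shell
# Gather environment details for a bug report.
echo "agentic-synth: $(npm list agentic-synth --depth=0 2>/dev/null | tail -n 1)"
echo "node: $(node --version)"
echo "npm:  $(npm --version)"
echo "os:   $(uname -srm 2>/dev/null || echo unknown)"
```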

FAQ

Q: Why is generation slow? A: Enable streaming, increase batch size, use faster models, or cache embeddings.

Q: How do I improve data quality? A: Use better models, add detailed schema descriptions, include examples, adjust temperature.

Q: Can I use multiple LLM providers? A: Yes, configure fallback providers or rotate between them.
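
The fallback pattern can be sketched provider-agnostically. The provider functions below are stand-ins, not agentic-synth APIs:

```typescript
type Generate = (prompt: string) => Promise<string>;

// Try each provider in order; return the first success, rethrow if all fail.
async function withFallback(providers: Generate[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const generate of providers) {
    try {
      return await generate(prompt);
    } catch (err) {
      lastError = err; // Remember the failure and try the next provider
    }
  }
  throw lastError;
}
```

In practice each entry would wrap a configured SynthEngine, ordered by preference.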

Q: How do I handle rate limits? A: Implement exponential backoff, reduce rate, or use multiple API keys.

Q: Is there a size limit for generation? A: No hard limit, but use streaming for datasets > 10,000 items.


Additional Resources