Troubleshooting Guide

Common issues and solutions for Agentic-Synth.

Table of Contents

  • Installation Issues
  • Generation Problems
  • Performance Issues
  • Quality Problems
  • Integration Issues
  • API and Authentication
  • Memory and Resource Issues
  • Debugging Tips
  • Getting Help
  • FAQ
  • Additional Resources

Installation Issues

npm install fails

Symptoms:

npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /path/to/package.json

Solutions:

  1. Ensure you're in the directory that contains package.json
  2. Verify Node.js version (>=18.0.0):
    node --version
    
  3. Clear npm cache:
    npm cache clean --force
    npm install
    
  4. Try a different package manager:
    pnpm install
    # or
    yarn install
    

TypeScript type errors

Symptoms:

Cannot find module 'agentic-synth' or its corresponding type declarations

Solutions:

  1. Ensure TypeScript version >=5.0:
    npm install -D typescript@latest
    
  2. Check tsconfig.json:
    {
      "compilerOptions": {
        "moduleResolution": "node",
        "esModuleInterop": true
      }
    }
    

Native dependencies fail to build

Symptoms:

gyp ERR! build error

Solutions:

  1. Install build tools:
    • Windows: install "Visual Studio Build Tools" (the windows-build-tools npm package is deprecated)
    • Mac: xcode-select --install
    • Linux: sudo apt-get install build-essential
  2. Use pre-built binaries if available
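
Before retrying the build, it can help to confirm that the toolchain node-gyp expects is actually present. A minimal check (POSIX shell assumed; tool locations vary by platform):

```shell
# Each line should print a version; a missing tool explains the gyp failure.
node --version
npm --version
command -v python3 >/dev/null && python3 --version || echo "python3 not found (node-gyp needs Python)"
command -v cc >/dev/null && cc --version | head -n 1 || echo "no C compiler found (install build tools)"
```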

Generation Problems

Generation returns empty results

Symptoms:

const data = await synth.generate({ schema, count: 1000 });
console.log(data.data.length); // 0

Solutions:

  1. Check API key configuration:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKey: process.env.OPENAI_API_KEY, // Ensure this is set
    });
    
  2. Verify schema validity:

    import { validateSchema } from 'agentic-synth/utils';
    
    const isValid = validateSchema(schema);
    if (!isValid.valid) {
      console.error('Schema errors:', isValid.errors);
    }
    
  3. Check for errors in generation:

    try {
      const data = await synth.generate({ schema, count: 1000 });
    } catch (error) {
      console.error('Generation failed:', error);
    }
    

Generation hangs indefinitely

Symptoms:

  • Generation never completes
  • No progress updates
  • No error messages

Solutions:

  1. Add timeout:

    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), 60000); // 1 minute
    
    try {
      await synth.generate({
        schema,
        count: 1000,
        abortSignal: controller.signal,
      });
    } finally {
      clearTimeout(timeout);
    }
    
  2. Enable verbose logging:

    const synth = new SynthEngine({
      provider: 'openai',
      debug: true, // Enable debug logs
    });
    
  3. Reduce batch size:

    const synth = new SynthEngine({
      batchSize: 10, // Start small
    });
    

Invalid data generated

Symptoms:

  • Data doesn't match schema
  • Missing required fields
  • Type mismatches

Solutions:

  1. Enable strict validation:

    const synth = new SynthEngine({
      validationEnabled: true,
      strictMode: true,
    });
    
  2. Add constraints to schema:

    const schema = Schema.define({
      name: 'User',
      type: 'object',
      properties: {
        email: {
          type: 'string',
          format: 'email',
          pattern: '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$',
        },
      },
      required: ['email'],
    });
    
  3. Lower temperature for stricter schema adherence:

    const synth = new SynthEngine({
      temperature: 0.2, // Lower values produce more schema-faithful output
    });
    

Performance Issues

Slow generation speed

Symptoms:

  • Generation takes much longer than expected
  • Low throughput (< 100 items/minute)

Solutions:

  1. Enable streaming mode:

    for await (const item of synth.generateStream({ schema, count: 10000 })) {
      // Process item immediately
    }
    
  2. Increase batch size:

    const synth = new SynthEngine({
      batchSize: 1000, // Larger batches
      maxWorkers: 8,   // More parallel workers
    });
    
  3. Use faster model:

    const synth = new SynthEngine({
      provider: 'openai',
      model: 'gpt-3.5-turbo', // Faster than gpt-4
    });
    
  4. Cache embeddings:

    const synth = new SynthEngine({
      cacheEnabled: true,
      cacheTTL: 3600, // 1 hour
    });
    
  5. Profile generation:

    import { profiler } from 'agentic-synth/utils';
    
    const profile = await profiler.profile(() => {
      return synth.generate({ schema, count: 1000 });
    });
    
    console.log('Bottlenecks:', profile.bottlenecks);
    

High memory usage

Symptoms:

FATAL ERROR: Reached heap limit Allocation failed

Solutions:

  1. Use streaming:

    // Instead of loading all in memory
    const data = await synth.generate({ schema, count: 1000000 }); // ❌
    
    // Stream and process incrementally
    for await (const item of synth.generateStream({ schema, count: 1000000 })) { // ✅
      await processItem(item);
    }
    
  2. Reduce batch size:

    const synth = new SynthEngine({
      batchSize: 100, // Smaller batches
    });
    
  3. Increase Node.js heap size:

    NODE_OPTIONS="--max-old-space-size=4096" npm start
    
  4. Process in chunks:

    const chunkSize = 10000;
    const totalCount = 1000000;
    
    for (let i = 0; i < totalCount; i += chunkSize) {
      const chunk = await synth.generate({
        schema,
        count: Math.min(chunkSize, totalCount - i),
      });
      await exportChunk(chunk, i);
    }
    

Quality Problems

Low realism scores

Symptoms:

const metrics = await QualityMetrics.evaluate(data);
console.log(metrics.realism); // 0.45 (too low)

Solutions:

  1. Improve schema descriptions:

    const schema = Schema.define({
      name: 'User',
      description: 'A realistic user profile with authentic details',
      properties: {
        name: {
          type: 'string',
          description: 'Full name following cultural naming conventions',
        },
      },
    });
    
  2. Add examples to schema:

    const schema = Schema.define({
      properties: {
        bio: {
          type: 'string',
          examples: [
            'Passionate about machine learning and open source',
            'Software engineer with 10 years of experience',
          ],
        },
      },
    });
    
  3. Adjust temperature:

    const synth = new SynthEngine({
      temperature: 0.9, // Higher for more natural variation
    });
    
  4. Use better model:

    const synth = new SynthEngine({
      provider: 'anthropic',
      model: 'claude-3-opus-20240229', // Higher quality
    });
    

Low diversity scores

Symptoms:

  • Many duplicate or nearly identical examples
  • Limited variation in generated data

Solutions:

  1. Increase temperature:

    const synth = new SynthEngine({
      temperature: 0.95, // Maximum diversity
    });
    
  2. Add diversity constraints:

    const schema = Schema.define({
      constraints: [
        {
          type: 'diversity',
          field: 'content',
          maxSimilarity: 0.3, // Allow at most 30% pairwise similarity
        },
      ],
    });
    
  3. Use varied prompts:

    const synth = new SynthEngine({
      promptVariation: true,
      variationStrategies: ['paraphrase', 'reframe', 'alternative-angle'],
    });
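
If a built-in diversity constraint is not available in your setup, a rough post-hoc filter can drop near-duplicates after generation. This sketch is not part of the agentic-synth API; it uses word-level Jaccard similarity as a cheap proxy for semantic similarity:

```typescript
// Word-level Jaccard similarity between two strings (0 = disjoint, 1 = identical).
function jaccard(a: string, b: string): number {
  const sa = new Set(a.toLowerCase().split(/\s+/));
  const sb = new Set(b.toLowerCase().split(/\s+/));
  let inter = 0;
  for (const w of sa) if (sb.has(w)) inter++;
  const union = sa.size + sb.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Keep items whose `content` overlaps every previously kept item by less than `maxSim`.
function dedupe(items: { content: string }[], maxSim = 0.3): { content: string }[] {
  const kept: { content: string }[] = [];
  for (const item of items) {
    if (kept.every((k) => jaccard(k.content, item.content) < maxSim)) kept.push(item);
  }
  return kept;
}
```

Word-set Jaccard is crude; embedding-based similarity is more faithful but costs a model call per item.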
    

Biased data detected

Symptoms:

const metrics = await QualityMetrics.evaluate(data, { bias: true });
console.log(metrics.bias); // { gender: 0.85 } (too high)

Solutions:

  1. Add fairness constraints:

    const schema = Schema.define({
      constraints: [
        {
          type: 'fairness',
          attributes: ['gender', 'age', 'ethnicity'],
          distribution: 'uniform',
        },
      ],
    });
    
  2. Explicit diversity instructions:

    const schema = Schema.define({
      description: 'Generate diverse examples representing all demographics equally',
    });
    
  3. Post-generation filtering:

    import { BiasDetector } from 'agentic-synth/utils';
    
    const detector = new BiasDetector();
    const balanced = data.filter(item => {
      const bias = detector.detect(item);
      return bias.overall < 0.3; // Keep low-bias items
    });
    

Integration Issues

Ruvector connection fails

Symptoms:

Error: Cannot connect to Ruvector at localhost:8080

Solutions:

  1. Verify Ruvector is running:

    # Check if Ruvector service is running
    curl http://localhost:8080/health
    
  2. Check connection configuration:

    const db = new VectorDB({
      host: 'localhost',
      port: 8080,
      timeout: 5000,
    });
    
  3. Use retry logic:

    import { retry } from 'agentic-synth/utils';
    
    const db = await retry(() => new VectorDB(), {
      attempts: 3,
      delay: 1000,
    });
    

Vector insertion fails

Symptoms:

Error: Failed to insert vectors into collection

Solutions:

  1. Verify collection exists:

    const collections = await db.listCollections();
    if (!collections.includes('my-collection')) {
      await db.createCollection('my-collection', { dimensions: 384 });
    }
    
  2. Check vector dimensions match:

    const schema = Schema.define({
      properties: {
        embedding: {
          type: 'embedding',
          dimensions: 384, // Must match collection config
        },
      },
    });
    
  3. Use batching:

    await synth.generateAndInsert({
      schema,
      count: 10000,
      collection: 'vectors',
      batchSize: 1000, // Insert in batches
    });
    

API and Authentication

OpenAI API errors

Symptoms:

Error: Incorrect API key provided

Solutions:

  1. Verify API key:

    echo $OPENAI_API_KEY
    
  2. Set environment variable:

    export OPENAI_API_KEY="sk-..."
    
  3. Pass key explicitly:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKey: 'sk-...', // Not recommended for production
    });
    

Rate limit exceeded

Symptoms:

Error: Rate limit exceeded. Please try again later.

Solutions:

  1. Implement exponential backoff:

    const synth = new SynthEngine({
      retryConfig: {
        maxRetries: 5,
        backoffMultiplier: 2,
        initialDelay: 1000,
      },
    });
    
  2. Reduce request rate:

    const synth = new SynthEngine({
      rateLimit: {
        requestsPerMinute: 60,
        tokensPerMinute: 90000,
      },
    });
    
  3. Use multiple API keys:

    const synth = new SynthEngine({
      provider: 'openai',
      apiKeys: [
        process.env.OPENAI_API_KEY_1,
        process.env.OPENAI_API_KEY_2,
        process.env.OPENAI_API_KEY_3,
      ],
      keyRotationStrategy: 'round-robin',
    });
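
If the engine's retryConfig is unavailable, or you need backoff around calls you make yourself, the strategy reduces to a small wrapper. A generic sketch (not part of the agentic-synth API):

```typescript
// Retry `fn` with exponential backoff: the delay doubles after each failure.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  initialDelayMs = 1000,
  multiplier = 2,
): Promise<T> {
  let delay = initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // Out of retries: surface the error
      await new Promise((r) => setTimeout(r, delay));
      delay *= multiplier;
    }
  }
}
```

Adding random jitter to the delay helps avoid synchronized retries across parallel workers.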
    

Memory and Resource Issues

Out of memory errors

Solutions:

  1. Use streaming mode (recommended):

    for await (const item of synth.generateStream({ schema, count: 1000000 })) {
      await processAndDiscard(item);
    }
    
  2. Process in smaller batches:

    async function generateInChunks(totalCount: number, chunkSize: number) {
      for (let i = 0; i < totalCount; i += chunkSize) {
        const chunk = await synth.generate({
          schema,
          count: Math.min(chunkSize, totalCount - i), // Last chunk may be smaller
        });
        await processChunk(chunk);
        // Chunk becomes eligible for garbage collection after processing
      }
    }
    
  3. Increase Node.js memory:

    node --max-old-space-size=8192 script.js
    

Disk space issues

Symptoms:

Error: ENOSPC: no space left on device

Solutions:

  1. Stream directly to storage:

    import { createWriteStream } from 'fs';
    
    const stream = createWriteStream('./output.jsonl');
    for await (const item of synth.generateStream({ schema, count: 1000000 })) {
      stream.write(JSON.stringify(item) + '\n');
    }
    stream.end();
    
  2. Use compression:

    import { createGzip } from 'zlib';
    import { pipeline } from 'stream/promises';
    
    await pipeline(
      synth.generateStream({ schema, count: 1000000 }),
      createGzip(),
      createWriteStream('./output.jsonl.gz')
    );
    
  3. Export to remote storage:

    // AWS credentials and region are read from the environment
    const data = await synth.generate({ schema, count: 1000000 });
    await data.export({
      format: 'parquet',
      destination: 's3://my-bucket/synthetic-data.parquet',
    });
    

Debugging Tips

Enable debug logging

import { setLogLevel } from 'agentic-synth';

setLogLevel('debug');

const synth = new SynthEngine({
  debug: true,
  verbose: true,
});

Use profiler

import { profiler } from 'agentic-synth/utils';

const results = await profiler.profile(async () => {
  return await synth.generate({ schema, count: 1000 });
});

console.log('Performance breakdown:', results.breakdown);
console.log('Bottlenecks:', results.bottlenecks);

Test with small datasets first

// Test with 10 examples first
const test = await synth.generate({ schema, count: 10 });
console.log('Sample:', test.data[0]);

// Validate quality
const quality = await QualityMetrics.evaluate(test.data);
console.log('Quality:', quality);

// If quality is good, scale up
if (quality.overall > 0.85) {
  const full = await synth.generate({ schema, count: 100000 });
}

Getting Help

If you're still experiencing issues:

  1. Check documentation: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/docs
  2. Search issues: https://github.com/ruvnet/ruvector/issues
  3. Ask on Discord: https://discord.gg/ruvnet
  4. Open an issue: https://github.com/ruvnet/ruvector/issues/new

When reporting issues, include:

  • Agentic-Synth version: npm list agentic-synth
  • Node.js version: node --version
  • Operating system
  • Minimal reproduction code
  • Error messages and stack traces
  • Schema definition (if relevant)
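
The details above can be collected in one go (POSIX shell assumed):

```shell
# Gather environment details for a bug report.
echo "agentic-synth: $(npm list agentic-synth --depth=0 2>/dev/null | tail -n 1)"
echo "node: $(node --version)"
echo "npm:  $(npm --version)"
echo "os:   $(uname -srm 2>/dev/null || echo unknown)"
```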

FAQ

Q: Why is generation slow? A: Enable streaming, increase batch size, use faster models, or cache embeddings.

Q: How do I improve data quality? A: Use better models, add detailed schema descriptions, include examples, adjust temperature.

Q: Can I use multiple LLM providers? A: Yes, configure fallback providers or rotate between them.
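
The fallback pattern can be sketched provider-agnostically. The provider functions below are stand-ins, not agentic-synth APIs:

```typescript
type Generate = (prompt: string) => Promise<string>;

// Try each provider in order; return the first success, rethrow if all fail.
async function withFallback(providers: Generate[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const generate of providers) {
    try {
      return await generate(prompt);
    } catch (err) {
      lastError = err; // Remember the failure and try the next provider
    }
  }
  throw lastError;
}
```

In practice each entry would wrap a configured SynthEngine, ordered by preference.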

Q: How do I handle rate limits? A: Implement exponential backoff, reduce rate, or use multiple API keys.

Q: Is there a size limit for generation? A: No hard limit, but use streaming for datasets > 10,000 items.


Additional Resources