# Troubleshooting Guide

Common issues and solutions for Agentic-Synth.

## Table of Contents

- [Installation Issues](#installation-issues)
- [Generation Problems](#generation-problems)
- [Performance Issues](#performance-issues)
- [Quality Problems](#quality-problems)
- [Integration Issues](#integration-issues)
- [API and Authentication](#api-and-authentication)
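- [Memory and Resource Issues](#memory-and-resource-issues)
- [Debugging Tips](#debugging-tips)
- [Getting Help](#getting-help)
- [FAQ](#faq)
- [Additional Resources](#additional-resources)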

---

## Installation Issues

### npm install fails

**Symptoms:**
```bash
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /path/to/package.json
```

**Solutions:**
1. Ensure you're in the project directory that contains `package.json`
2. Verify Node.js version (>=18.0.0):
   ```bash
   node --version
   ```
3. Clear npm cache:
   ```bash
   npm cache clean --force
   npm install
   ```
4. Try a different package manager:
   ```bash
   pnpm install
   # or
   yarn install
   ```
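
The version requirement from step 2 can also be checked in code, for example at the top of a startup script. This is a minimal sketch; `meetsMinimumNode` is a hypothetical helper and not part of Agentic-Synth.

```typescript
// Hypothetical helper: checks a Node.js version string against a minimum major version.
function meetsMinimumNode(version: string, minMajor = 18): boolean {
  // Version strings look like "v18.17.0"; the leading "v" is optional here.
  const match = /^v?(\d+)\./.exec(version);
  if (!match) return false;
  return Number(match[1]) >= minMajor;
}
```

Under Node.js you would typically call it as `meetsMinimumNode(process.version)` and exit with a clear message when it returns `false`.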

### TypeScript type errors

**Symptoms:**
```
Cannot find module 'agentic-synth' or its corresponding type declarations
```

**Solutions:**
1. Ensure TypeScript version >=5.0:
   ```bash
   npm install -D typescript@latest
   ```
2. Check tsconfig.json:
   ```json
   {
     "compilerOptions": {
       "moduleResolution": "node",
       "esModuleInterop": true
     }
   }
   ```

### Native dependencies fail to build

**Symptoms:**
```
gyp ERR! build error
```

**Solutions:**
1. Install build tools:
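   - **Windows**: install the Visual Studio Build Tools with the "Desktop development with C++" workload (the `windows-build-tools` npm package is deprecated)
   - **Mac**: `xcode-select --install`
   - **Linux (Debian/Ubuntu)**: `sudo apt-get install build-essential`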
2. Use pre-built binaries if available

---

## Generation Problems

### Generation returns empty results

**Symptoms:**
```typescript
const data = await synth.generate({ schema, count: 1000 });
console.log(data.data.length); // 0
```

**Solutions:**

1. **Check API key configuration:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'openai',
     apiKey: process.env.OPENAI_API_KEY, // Ensure this is set
   });
   ```

2. **Verify schema validity:**
   ```typescript
   import { validateSchema } from 'agentic-synth/utils';

   const isValid = validateSchema(schema);
   if (!isValid.valid) {
     console.error('Schema errors:', isValid.errors);
   }
   ```

3. **Check for errors in generation:**
   ```typescript
   try {
     const data = await synth.generate({ schema, count: 1000 });
   } catch (error) {
     console.error('Generation failed:', error);
   }
   ```
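
A missing or blank API key is easier to diagnose when the program fails before any generation call is made. The sketch below assumes nothing about Agentic-Synth itself; `requireEnv` is a hypothetical helper that takes the environment as a parameter so it can be tested.

```typescript
// Hypothetical helper: fail fast when a required environment variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Under Node.js, `requireEnv('OPENAI_API_KEY', process.env)` could replace the direct `process.env.OPENAI_API_KEY` lookup in step 1.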

### Generation hangs indefinitely

**Symptoms:**
- Generation never completes
- No progress updates
- No error messages

**Solutions:**

1. **Add a timeout:**
   ```typescript
   const controller = new AbortController();
   const timeout = setTimeout(() => controller.abort(), 60000); // 1 minute

   try {
     await synth.generate({
       schema,
       count: 1000,
       abortSignal: controller.signal,
     });
   } finally {
     clearTimeout(timeout);
   }
   ```

2. **Enable verbose logging:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'openai',
     debug: true, // Enable debug logs
   });
   ```

3. **Reduce batch size:**
   ```typescript
   const synth = new SynthEngine({
     batchSize: 10, // Start small
   });
   ```
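
If a call does not react to an abort signal, a generic wrapper can at least stop the caller from waiting forever. This is a sketch under that assumption: `withTimeout` is a hypothetical helper, not an Agentic-Synth API, and it only rejects the waiting promise. It does not cancel the underlying work.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

A call such as `withTimeout(synth.generate({ schema, count: 10 }), 60000, 'generate')` turns a silent hang into an error you can log.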

### Invalid data generated

**Symptoms:**
- Data doesn't match schema
- Missing required fields
- Type mismatches

**Solutions:**

1. **Enable strict validation:**
   ```typescript
   const synth = new SynthEngine({
     validationEnabled: true,
     strictMode: true,
   });
   ```

2. **Add constraints to schema:**
   ```typescript
   const schema = Schema.define({
     name: 'User',
     type: 'object',
     properties: {
       email: {
         type: 'string',
         format: 'email',
         pattern: '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$',
       },
     },
     required: ['email'],
   });
   ```

3. **Lower temperature for more consistent output:**
   ```typescript
   const synth = new SynthEngine({
     temperature: 0.2, // Lower values reduce random deviations from the schema
   });
   ```
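
Independently of the engine settings, generated records can be checked after the fact. The following sketch assumes plain object records with an optional `email` field, as in the schema above; `findRecordProblems` and `GeneratedRecord` are hypothetical names, not part of Agentic-Synth.

```typescript
// Assumed record shape for illustration; real items depend on your schema.
type GeneratedRecord = Record<string, unknown>;

// Same pattern as in the schema example above (lowercase addresses only).
const EMAIL_PATTERN = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/;

// Hypothetical helper: lists the problems found in one record (empty when valid).
function findRecordProblems(record: GeneratedRecord, required: string[]): string[] {
  const problems: string[] = [];
  required.forEach((field) => {
    if (record[field] === undefined || record[field] === null) {
      problems.push(`missing required field "${field}"`);
    }
  });
  const email = record['email'];
  if (email !== undefined && (typeof email !== 'string' || !EMAIL_PATTERN.test(email))) {
    problems.push('email does not match the expected pattern');
  }
  return problems;
}
```

Counting how many records return a non-empty list gives a quick failure rate to compare before and after a configuration change.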

---

## Performance Issues

### Slow generation speed

**Symptoms:**
- Generation takes much longer than expected
- Low throughput (< 100 items/minute)

**Solutions:**

1. **Enable streaming mode:**
   ```typescript
   for await (const item of synth.generateStream({ schema, count: 10000 })) {
     // Process item immediately
   }
   ```

2. **Increase batch size:**
   ```typescript
   const synth = new SynthEngine({
     batchSize: 1000, // Larger batches
     maxWorkers: 8,   // More parallel workers
   });
   ```

3. **Use a faster model:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'openai',
     model: 'gpt-3.5-turbo', // Faster than gpt-4
   });
   ```

4. **Cache embeddings:**
   ```typescript
   const synth = new SynthEngine({
     cacheEnabled: true,
     cacheTTL: 3600, // 1 hour
   });
   ```

5. **Profile generation:**
   ```typescript
   import { profiler } from 'agentic-synth/utils';

   const profile = await profiler.profile(() => {
     return synth.generate({ schema, count: 1000 });
   });

   console.log('Bottlenecks:', profile.bottlenecks);
   ```
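
To compare a run against the throughput figure in the symptoms (fewer than 100 items per minute), it helps to compute the rate explicitly. `itemsPerMinute` below is a hypothetical helper, shown as a sketch rather than a library function.

```typescript
// Hypothetical helper: converts an item count and elapsed time into items per minute.
function itemsPerMinute(itemCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError('elapsedMs must be positive');
  return (itemCount * 60_000) / elapsedMs;
}
```

Record `Date.now()` before and after a small `generate` call and pass the difference as `elapsedMs`, then repeat after each configuration change to see which one actually helps.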

### High memory usage

**Symptoms:**
```
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
```

**Solutions:**

1. **Use streaming:**
   ```typescript
   // Instead of loading all in memory
   const data = await synth.generate({ schema, count: 1000000 }); // ❌

   // Stream and process incrementally
   for await (const item of synth.generateStream({ schema, count: 1000000 })) { // ✅
     await processItem(item);
   }
   ```

2. **Reduce batch size:**
   ```typescript
   const synth = new SynthEngine({
     batchSize: 100, // Smaller batches
   });
   ```

3. **Increase Node.js heap size:**
   ```bash
   NODE_OPTIONS="--max-old-space-size=4096" npm start
   ```

4. **Process in chunks:**
   ```typescript
   const chunkSize = 10000;
   const totalCount = 1000000;

   for (let i = 0; i < totalCount; i += chunkSize) {
     const chunk = await synth.generate({
       schema,
       count: Math.min(chunkSize, totalCount - i),
     });
     await exportChunk(chunk, i);
   }
   ```
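
The chunk arithmetic in step 4 can be isolated and checked on its own. This sketch uses a hypothetical `planChunks` helper that returns the `count` to request for each chunk, with the final chunk capped so the total is exact.

```typescript
// Hypothetical helper: splits a total item count into per-request chunk sizes.
function planChunks(totalCount: number, chunkSize: number): number[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const sizes: number[] = [];
  for (let remaining = totalCount; remaining > 0; remaining -= chunkSize) {
    // The last chunk may be smaller than chunkSize.
    sizes.push(Math.min(chunkSize, remaining));
  }
  return sizes;
}
```

For example, `planChunks(25, 10)` yields three chunks of 10, 10, and 5 items.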

---

## Quality Problems

### Low realism scores

**Symptoms:**
```typescript
const metrics = await QualityMetrics.evaluate(data);
console.log(metrics.realism); // 0.45 (too low)
```

**Solutions:**

1. **Improve schema descriptions:**
   ```typescript
   const schema = Schema.define({
     name: 'User',
     description: 'A realistic user profile with authentic details',
     properties: {
       name: {
         type: 'string',
         description: 'Full name following cultural naming conventions',
       },
     },
   });
   ```

2. **Add examples to schema:**
   ```typescript
   const schema = Schema.define({
     properties: {
       bio: {
         type: 'string',
         examples: [
           'Passionate about machine learning and open source',
           'Software engineer with 10 years of experience',
         ],
       },
     },
   });
   ```

3. **Adjust temperature:**
   ```typescript
   const synth = new SynthEngine({
     temperature: 0.9, // Higher for more natural variation
   });
   ```

4. **Use a better model:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'anthropic',
     model: 'claude-3-opus-20240229', // Higher quality
   });
   ```

### Low diversity scores

**Symptoms:**
- Many duplicate or nearly identical examples
- Limited variation in generated data

**Solutions:**

1. **Increase temperature:**
   ```typescript
   const synth = new SynthEngine({
     temperature: 0.95, // Higher values increase diversity
   });
   ```

2. **Add diversity constraints:**
   ```typescript
   const schema = Schema.define({
     constraints: [
       {
         type: 'diversity',
         field: 'content',
         minSimilarity: 0.3, // Max 30% similarity
       },
     ],
   });
   ```

3. **Use varied prompts:**
   ```typescript
   const synth = new SynthEngine({
     promptVariation: true,
     variationStrategies: ['paraphrase', 'reframe', 'alternative-angle'],
   });
   ```
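
To judge whether any of these changes helps, you can measure near-duplicates on a sample of text fields yourself. The sketch below uses token-level Jaccard similarity as a stand-in metric; it is not how Agentic-Synth computes its diversity score, and both function names are hypothetical.

```typescript
// Token-level Jaccard similarity: 0 for disjoint token sets, 1 for identical ones.
function jaccardSimilarity(a: string, b: string): number {
  const tokensA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tokensB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (tokensA.size === 0 && tokensB.size === 0) return 1;
  let shared = 0;
  tokensA.forEach((token) => {
    if (tokensB.has(token)) shared++;
  });
  return shared / (tokensA.size + tokensB.size - shared);
}

// Fraction of items that are near-duplicates of an earlier item.
// Quadratic in the number of items, so run it on a sample.
function nearDuplicateRate(items: string[], threshold = 0.8): number {
  if (items.length === 0) return 0;
  const duplicates = items.filter((item, i) =>
    items.slice(0, i).some((earlier) => jaccardSimilarity(item, earlier) >= threshold),
  );
  return duplicates.length / items.length;
}
```

The default threshold of 0.8 is an arbitrary starting point; tune it against examples you consider duplicates.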

### Biased data detected

**Symptoms:**
```typescript
const metrics = await QualityMetrics.evaluate(data, { bias: true });
console.log(metrics.bias); // { gender: 0.85 } (too high)
```

**Solutions:**

1. **Add fairness constraints:**
   ```typescript
   const schema = Schema.define({
     constraints: [
       {
         type: 'fairness',
         attributes: ['gender', 'age', 'ethnicity'],
         distribution: 'uniform',
       },
     ],
   });
   ```

2. **Add explicit diversity instructions:**
   ```typescript
   const schema = Schema.define({
     description: 'Generate diverse examples representing all demographics equally',
   });
   ```

3. **Post-generation filtering:**
   ```typescript
   import { BiasDetector } from 'agentic-synth/utils';

   const detector = new BiasDetector();
   const balanced = data.filter(item => {
     const bias = detector.detect(item);
     return bias.overall < 0.3; // Keep low-bias items
   });
   ```
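
A simple way to see whether an attribute is skewed is to tabulate how often each value occurs. This sketch assumes records are plain objects; `attributeShares` is a hypothetical helper and is unrelated to the `BiasDetector` shown above.

```typescript
// Hypothetical helper: share of each value of `attribute` across the records.
function attributeShares<T extends Record<string, unknown>>(
  records: T[],
  attribute: keyof T & string,
): Map<string, number> {
  const counts = new Map<string, number>();
  records.forEach((record) => {
    const key = String(record[attribute]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  });
  // Convert raw counts into fractions of the whole dataset.
  const shares = new Map<string, number>();
  counts.forEach((count, key) => shares.set(key, count / records.length));
  return shares;
}
```

With a `uniform` target distribution, each of the *n* observed values should sit near 1/*n*; large gaps indicate which groups are under-represented.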

---

## Integration Issues

### Ruvector connection fails

**Symptoms:**
```
Error: Cannot connect to Ruvector at localhost:8080
```

**Solutions:**

1. **Verify Ruvector is running:**
   ```bash
   # Check if Ruvector service is running
   curl http://localhost:8080/health
   ```

2. **Check connection configuration:**
   ```typescript
   const db = new VectorDB({
     host: 'localhost',
     port: 8080,
     timeout: 5000,
   });
   ```

3. **Use retry logic:**
   ```typescript
   import { retry } from 'agentic-synth/utils';

   const db = await retry(() => new VectorDB(), {
     attempts: 3,
     delay: 1000,
   });
   ```

### Vector insertion fails

**Symptoms:**
```
Error: Failed to insert vectors into collection
```

**Solutions:**

1. **Verify collection exists:**
   ```typescript
   const collections = await db.listCollections();
   if (!collections.includes('my-collection')) {
     await db.createCollection('my-collection', { dimensions: 384 });
   }
   ```

2. **Check vector dimensions match:**
   ```typescript
   const schema = Schema.define({
     properties: {
       embedding: {
         type: 'embedding',
         dimensions: 384, // Must match collection config
       },
     },
   });
   ```

3. **Use batching:**
   ```typescript
   await synth.generateAndInsert({
     schema,
     count: 10000,
     collection: 'vectors',
     batchSize: 1000, // Insert in batches
   });
   ```
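
Dimension mismatches are often easier to find before the insert call than from the database error. The sketch below assumes embeddings are available as plain `number[]` arrays; `findBadVectors` is a hypothetical helper, not part of Agentic-Synth or Ruvector.

```typescript
// Hypothetical helper: returns the indexes of vectors whose length differs from
// `expected` or that contain non-finite values (NaN, Infinity).
function findBadVectors(vectors: number[][], expected: number): number[] {
  const bad: number[] = [];
  vectors.forEach((vector, index) => {
    const ok = vector.length === expected && vector.every((x) => Number.isFinite(x));
    if (!ok) bad.push(index);
  });
  return bad;
}
```

Running this with the collection's configured dimension (384 in the examples above) pinpoints the offending items.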

---

## API and Authentication

### OpenAI API errors

**Symptoms:**
```
Error: Incorrect API key provided
```

**Solutions:**

1. **Verify API key:**
   ```bash
   echo $OPENAI_API_KEY
   ```

2. **Set environment variable:**
   ```bash
   export OPENAI_API_KEY="sk-..."
   ```

3. **Pass key explicitly:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'openai',
     apiKey: 'sk-...', // Not recommended for production
   });
   ```

### Rate limit exceeded

**Symptoms:**
```
Error: Rate limit exceeded. Please try again later.
```

**Solutions:**

1. **Implement exponential backoff:**
   ```typescript
   const synth = new SynthEngine({
     retryConfig: {
       maxRetries: 5,
       backoffMultiplier: 2,
       initialDelay: 1000,
     },
   });
   ```

2. **Reduce request rate:**
   ```typescript
   const synth = new SynthEngine({
     rateLimit: {
       requestsPerMinute: 60,
       tokensPerMinute: 90000,
     },
   });
   ```

3. **Use multiple API keys:**
   ```typescript
   const synth = new SynthEngine({
     provider: 'openai',
     apiKeys: [
       process.env.OPENAI_API_KEY_1,
       process.env.OPENAI_API_KEY_2,
       process.env.OPENAI_API_KEY_3,
     ],
     keyRotationStrategy: 'round-robin',
   });
   ```
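
For calls made outside the engine, the backoff schedule from step 1 can be sketched by hand. This is an illustration of exponential backoff in general, not of how `retryConfig` is implemented; `backoffDelay`, `retryWithBackoff`, and the 30-second cap are assumptions.

```typescript
// Delay before retry number `attempt` (0-based): initialDelay * multiplier^attempt, capped.
function backoffDelay(
  attempt: number,
  initialDelay = 1000,
  multiplier = 2,
  maxDelay = 30_000,
): number {
  return Math.min(initialDelay * Math.pow(multiplier, attempt), maxDelay);
}

// Hypothetical wrapper: retries `fn` up to `maxRetries` times after the first failure,
// then rethrows the last error. `sleep` is injectable so tests can skip real waiting.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

With the defaults, the waits are 1 s, 2 s, 4 s, 8 s, and 16 s. In practice you would retry only on rate-limit errors and add random jitter so parallel clients do not retry in lockstep.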

---

## Memory and Resource Issues

### Out of memory errors

**Solutions:**

1. **Use streaming mode (recommended):**
   ```typescript
   for await (const item of synth.generateStream({ schema, count: 1000000 })) {
     await processAndDiscard(item);
   }
   ```

2. **Process in smaller batches:**
   ```typescript
   async function generateInChunks(totalCount: number, chunkSize: number) {
     for (let i = 0; i < totalCount; i += chunkSize) {
       const chunk = await synth.generate({
         schema,
         // Cap the final chunk so the total never exceeds totalCount
         count: Math.min(chunkSize, totalCount - i),
       });
       await processChunk(chunk);
       // The chunk becomes eligible for garbage collection once processing finishes
     }
   }
   ```

3. **Increase Node.js memory:**
   ```bash
   node --max-old-space-size=8192 script.js
   ```

### Disk space issues

**Symptoms:**
```
Error: ENOSPC: no space left on device
```

**Solutions:**

1. **Stream directly to storage:**
   ```typescript
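   import { createWriteStream } from 'fs';
   import { once } from 'events';

   const stream = createWriteStream('./output.jsonl');
   for await (const item of synth.generateStream({ schema, count: 1000000 })) {
     // Wait for the buffer to drain when write() signals backpressure
     if (!stream.write(JSON.stringify(item) + '\n')) {
       await once(stream, 'drain');
     }
   }
   stream.end();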
   ```

2. **Use compression:**
   ```typescript
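   import { createWriteStream } from 'fs';
   import { createGzip } from 'zlib';
   import { pipeline } from 'stream/promises';

   await pipeline(
     synth.generateStream({ schema, count: 1000000 }),
     // Serialize each item to a JSONL line; gzip accepts only strings or Buffers
     async function* (items) {
       for await (const item of items) {
         yield JSON.stringify(item) + '\n';
       }
     },
     createGzip(),
     createWriteStream('./output.jsonl.gz')
   );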
   ```

3. **Export to remote storage:**
   ```typescript
   import { S3Client } from '@aws-sdk/client-s3';

   const s3 = new S3Client({ region: 'us-east-1' });
   await synth.generate({ schema, count: 1000000 }).export({
     format: 'parquet',
     destination: 's3://my-bucket/synthetic-data.parquet',
   });
   ```
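
Before starting a large export, a rough size estimate shows whether the target disk can hold it. The sketch below extrapolates uncompressed JSONL size from a small sample; `estimateJsonlBytes` is a hypothetical helper, and the result is only as good as the sample is representative.

```typescript
// Hypothetical helper: extrapolates JSONL output size in bytes from a sample of items.
function estimateJsonlBytes(sample: unknown[], totalCount: number): number {
  if (sample.length === 0) throw new RangeError('sample must not be empty');
  const encoder = new TextEncoder();
  // Measure UTF-8 bytes, including the trailing newline of each JSONL line.
  const sampleBytes = sample.reduce<number>(
    (sum, item) => sum + encoder.encode(JSON.stringify(item) + '\n').length,
    0,
  );
  return Math.ceil((sampleBytes / sample.length) * totalCount);
}
```

Generating 10 to 100 items first and passing them as `sample` gives a figure to compare against the free space reported by your operating system.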

---

## Debugging Tips

### Enable debug logging

```typescript
import { setLogLevel } from 'agentic-synth';

setLogLevel('debug');

const synth = new SynthEngine({
  debug: true,
  verbose: true,
});
```

### Use profiler

```typescript
import { profiler } from 'agentic-synth/utils';

const results = await profiler.profile(async () => {
  return await synth.generate({ schema, count: 1000 });
});

console.log('Performance breakdown:', results.breakdown);
console.log('Bottlenecks:', results.bottlenecks);
```

### Test with small datasets first

```typescript
// Test with 10 examples first
const test = await synth.generate({ schema, count: 10 });
console.log('Sample:', test.data[0]);

// Validate quality
const quality = await QualityMetrics.evaluate(test.data);
console.log('Quality:', quality);

// If quality is good, scale up
if (quality.overall > 0.85) {
  const full = await synth.generate({ schema, count: 100000 });
}
```

---

## Getting Help

If you're still experiencing issues:

1. **Check documentation**: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/docs
2. **Search issues**: https://github.com/ruvnet/ruvector/issues
3. **Ask on Discord**: https://discord.gg/ruvnet
4. **Open an issue**: https://github.com/ruvnet/ruvector/issues/new

When reporting issues, include:
- Agentic-Synth version: `npm list agentic-synth`
- Node.js version: `node --version`
- Operating system
- Minimal reproduction code
- Error messages and stack traces
- Schema definition (if relevant)

---

## FAQ

**Q: Why is generation slow?**
A: Enable streaming, increase batch size, use faster models, or cache embeddings.

**Q: How do I improve data quality?**
A: Use better models, add detailed schema descriptions, include examples, adjust temperature.

**Q: Can I use multiple LLM providers?**
A: Yes, configure fallback providers or rotate between them.

**Q: How do I handle rate limits?**
A: Implement exponential backoff, reduce rate, or use multiple API keys.

**Q: Is there a size limit for generation?**
A: No hard limit, but use streaming for datasets > 10,000 items.

---

## Additional Resources

- [API Reference](./API.md)
- [Examples](./EXAMPLES.md)
- [Integration Guides](./INTEGRATIONS.md)
- [Best Practices](./BEST_PRACTICES.md)