Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# Troubleshooting Guide
Common issues and solutions for Agentic-Synth.
## Table of Contents
- [Installation Issues](#installation-issues)
- [Generation Problems](#generation-problems)
- [Performance Issues](#performance-issues)
- [Quality Problems](#quality-problems)
- [Integration Issues](#integration-issues)
- [API and Authentication](#api-and-authentication)
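- [Memory and Resource Issues](#memory-and-resource-issues)
- [Debugging Tips](#debugging-tips)
- [Getting Help](#getting-help)
- [FAQ](#faq)
- [Additional Resources](#additional-resources)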
---
## Installation Issues
### npm install fails
**Symptoms:**
```bash
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /path/to/package.json
```
**Solutions:**
1. Ensure you're in the project directory (the one containing `package.json`); the `ENOENT` error above means npm could not find that file
2. Verify Node.js version (>=18.0.0):
```bash
node --version
```
3. Clear npm cache:
```bash
npm cache clean --force
npm install
```
4. Try a different package manager:
```bash
pnpm install
# or
yarn install
```
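If you script environment checks (for example in a CI step), the Node.js requirement can be verified in code. The sketch below uses a hypothetical helper, `meetsMinimumVersion`, that is not part of Agentic-Synth; it compares plain `major.minor.patch` strings and would typically be called with `process.versions.node`.

```typescript
// Hypothetical preflight helper (not part of agentic-synth).
// Compares dotted version strings such as "18.19.0" numerically, part by part.
function meetsMinimumVersion(current: string, minimum: string): boolean {
  // Strip a leading "v" (as printed by `node --version`) and any pre-release suffix
  const parse = (v: string): number[] =>
    v.replace(/^v/, '').split('-')[0].split('.').map((part) => Number(part) || 0);

  const a = parse(current);
  const b = parse(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}

// Example: meetsMinimumVersion(process.versions.node, '18.0.0')
console.log(meetsMinimumVersion('v20.11.1', '18.0.0')); // true
console.log(meetsMinimumVersion('16.20.2', '18.0.0')); // false
```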
### TypeScript type errors
**Symptoms:**
```
Cannot find module 'agentic-synth' or its corresponding type declarations
```
**Solutions:**
1. Ensure TypeScript version >=5.0:
```bash
npm install -D typescript@latest
```
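2. Check tsconfig.json. Note that `"moduleResolution": "node"` (the legacy `node10` strategy) ignores a package's `exports` map; if subpath imports such as `agentic-synth/utils` still fail to resolve, try `"node16"` (together with `"module": "node16"`) or `"bundler"`:
```json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "esModuleInterop": true
  }
}
```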
### Native dependencies fail to build
**Symptoms:**
```
gyp ERR! build error
```
**Solutions:**
1. Install build tools:
- **Windows**: install the Visual Studio Build Tools with the "Desktop development with C++" workload (the `windows-build-tools` npm package is deprecated)
- **Mac**: `xcode-select --install`
- **Linux**: `sudo apt-get install build-essential`
2. Use pre-built binaries if available
---
## Generation Problems
### Generation returns empty results
**Symptoms:**
```typescript
const data = await synth.generate({ schema, count: 1000 });
console.log(data.data.length); // 0
```
**Solutions:**
1. **Check API key configuration:**
```typescript
const synth = new SynthEngine({
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY, // Ensure this is set
});
```
2. **Verify schema validity:**
```typescript
import { validateSchema } from 'agentic-synth/utils';
const result = validateSchema(schema);
if (!result.valid) {
  console.error('Schema errors:', result.errors);
}
```
3. **Check for errors in generation:**
```typescript
try {
const data = await synth.generate({ schema, count: 1000 });
} catch (error) {
console.error('Generation failed:', error);
}
```
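Empty results are easier to diagnose when missing configuration fails loudly instead of silently. A minimal sketch, assuming the key comes from environment variables; `requireEnv` and `assertNonEmpty` are hypothetical helpers, not Agentic-Synth APIs:

```typescript
// Hypothetical helper (not part of agentic-synth): fail fast with a clear
// message instead of letting generation quietly return nothing.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set; export it before creating the engine`);
  }
  return value;
}

// Hypothetical guard: treat an empty result as an error so it shows up in logs and CI.
function assertNonEmpty<T>(items: T[], requested: number): T[] {
  if (items.length === 0) {
    throw new Error(`Requested ${requested} items but received 0`);
  }
  return items;
}

// Usage (Node.js):
// const apiKey = requireEnv(process.env, 'OPENAI_API_KEY');
// const synth = new SynthEngine({ provider: 'openai', apiKey });
// const result = await synth.generate({ schema, count: 1000 });
// assertNonEmpty(result.data, 1000);
```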
### Generation hangs indefinitely
**Symptoms:**
- Generation never completes
- No progress updates
- No error messages
**Solutions:**
1. **Add a timeout:**
```typescript
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 60000); // 1 minute
try {
await synth.generate({
schema,
count: 1000,
abortSignal: controller.signal,
});
} finally {
clearTimeout(timeout);
}
```
2. **Enable verbose logging:**
```typescript
const synth = new SynthEngine({
provider: 'openai',
debug: true, // Enable debug logs
});
```
3. **Reduce batch size:**
```typescript
const synth = new SynthEngine({
batchSize: 10, // Start small
});
```
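The timeout pattern from step 1 can be wrapped in a reusable function. This is a sketch of a hypothetical `withTimeout` helper built only on the standard `AbortController`; it assumes the wrapped call honours the signal (as the `abortSignal` option above suggests). A task that ignores the signal is abandoned by the caller but keeps running in the background.

```typescript
// Hypothetical helper (not an agentic-synth API): runs `task` with an
// AbortSignal and rejects if it has not settled within `ms` milliseconds.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Timed out after ${ms} ms`)); // settle the race first...
      controller.abort(); // ...then ask the task to stop its work
    }, ms);
  });
  try {
    // Whichever settles first wins
    return await Promise.race([task(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid a stray timer once the task has finished
  }
}

// Usage with the engine configured above:
// await withTimeout((signal) => synth.generate({ schema, count: 1000, abortSignal: signal }), 60_000);
```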
### Invalid data generated
**Symptoms:**
- Data doesn't match schema
- Missing required fields
- Type mismatches
**Solutions:**
1. **Enable strict validation:**
```typescript
const synth = new SynthEngine({
validationEnabled: true,
strictMode: true,
});
```
2. **Add constraints to schema:**
```typescript
const schema = Schema.define({
name: 'User',
type: 'object',
properties: {
email: {
type: 'string',
format: 'email',
pattern: '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$',
},
},
required: ['email'],
});
```
3. **Lower temperature for more consistent output:**
```typescript
const synth = new SynthEngine({
temperature: 0.2, // Lower values make output more deterministic and less likely to drift from the schema
});
```
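Even with strict validation enabled, a quick independent check on the returned records helps confirm which rule is failing. A minimal sketch, assuming the generated records are plain objects (for example `data.data`); `partitionByEmail` is a hypothetical helper that reuses the email pattern from the schema in step 2. Like that pattern, it only accepts lowercase addresses.

```typescript
// Hypothetical post-generation check (not an agentic-synth API): split items
// into valid and invalid using the same rules as the schema above.
const EMAIL_PATTERN = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/; // same pattern as the schema

interface CheckedItems<T> {
  valid: T[];
  invalid: { item: T; reason: string }[];
}

function partitionByEmail<T extends Record<string, unknown>>(items: T[]): CheckedItems<T> {
  const result: CheckedItems<T> = { valid: [], invalid: [] };
  for (const item of items) {
    const email = item['email'];
    if (typeof email !== 'string') {
      result.invalid.push({ item, reason: 'missing required field: email' });
    } else if (!EMAIL_PATTERN.test(email)) {
      result.invalid.push({ item, reason: `email does not match pattern: ${email}` });
    } else {
      result.valid.push(item);
    }
  }
  return result;
}

// Usage: const { valid, invalid } = partitionByEmail(data.data);
```

Logging the `reason` strings of the invalid items usually shows whether the problem is missing fields or malformed values, which tells you whether to tighten `required` or the field constraints.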
---
## Performance Issues
### Slow generation speed
**Symptoms:**
- Generation takes much longer than expected
- Low throughput (< 100 items/minute)
**Solutions:**
1. **Enable streaming mode** to process items as soon as they arrive instead of waiting for the whole run:
```typescript
for await (const item of synth.generateStream({ schema, count: 10000 })) {
// Process item immediately
}
```
2. **Increase batch size:**
```typescript
const synth = new SynthEngine({
batchSize: 1000, // Larger batches
maxWorkers: 8, // More parallel workers
});
```
3. **Use a faster model:**
```typescript
const synth = new SynthEngine({
provider: 'openai',
model: 'gpt-3.5-turbo', // Faster than gpt-4
});
```
4. **Cache embeddings:**
```typescript
const synth = new SynthEngine({
cacheEnabled: true,
cacheTTL: 3600, // 1 hour
});
```
5. **Profile generation:**
```typescript
import { profiler } from 'agentic-synth/utils';
const profile = await profiler.profile(() => {
return synth.generate({ schema, count: 1000 });
});
console.log('Bottlenecks:', profile.bottlenecks);
```
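To check whether you are actually below the 100 items/minute mark before and after tuning, you can time a streaming run. This sketch uses a hypothetical `measureThroughput` helper that works on any async iterable, so it assumes only that `generateStream` can be consumed with `for await`, as in step 1.

```typescript
// Hypothetical helper (not an agentic-synth API): consumes any async iterable
// (for example a streaming generator) and reports items per minute.
async function measureThroughput<T>(
  items: AsyncIterable<T>,
  onItem: (item: T) => void | Promise<void> = () => {},
): Promise<{ count: number; elapsedMs: number; itemsPerMinute: number }> {
  const start = Date.now();
  let count = 0;
  for await (const item of items) {
    await onItem(item); // include your own processing cost in the measurement
    count++;
  }
  // Clamp to 1 ms so very fast runs don't divide by zero
  const elapsedMs = Math.max(Date.now() - start, 1);
  return { count, elapsedMs, itemsPerMinute: (count / elapsedMs) * 60_000 };
}

// Usage:
// const stats = await measureThroughput(synth.generateStream({ schema, count: 500 }));
// console.log(`${stats.itemsPerMinute.toFixed(0)} items/minute`);
```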
### High memory usage
**Symptoms:**
```
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
```
**Solutions:**
1. **Use streaming:**
```typescript
// Instead of loading all in memory
const data = await synth.generate({ schema, count: 1000000 }); // ❌
// Stream and process incrementally
for await (const item of synth.generateStream({ schema, count: 1000000 })) { // ✅
await processItem(item);
}
```
2. **Reduce batch size:**
```typescript
const synth = new SynthEngine({
batchSize: 100, // Smaller batches
});
```
3. **Increase Node.js heap size:**
```bash
NODE_OPTIONS="--max-old-space-size=4096" npm start
```
4. **Process in chunks:**
```typescript
const chunkSize = 10000;
const totalCount = 1000000;
for (let i = 0; i < totalCount; i += chunkSize) {
const chunk = await synth.generate({
schema,
count: Math.min(chunkSize, totalCount - i),
});
await exportChunk(chunk, i);
}
```
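The `Math.min(chunkSize, totalCount - i)` expression in step 4 is what keeps the final chunk from overshooting. Computing the plan up front makes that arithmetic easy to check; `planChunks` is a hypothetical helper, not part of Agentic-Synth.

```typescript
// Hypothetical helper: precompute chunk offsets and sizes so the last
// chunk never overshoots the requested total.
interface ChunkPlan {
  offset: number;
  count: number;
}

function planChunks(totalCount: number, chunkSize: number): ChunkPlan[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const plan: ChunkPlan[] = [];
  for (let offset = 0; offset < totalCount; offset += chunkSize) {
    plan.push({ offset, count: Math.min(chunkSize, totalCount - offset) });
  }
  return plan;
}

// planChunks(25_000, 10_000) yields offsets 0, 10000, 20000 with counts 10000, 10000, 5000
```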
---
## Quality Problems
### Low realism scores
**Symptoms:**
```typescript
const metrics = await QualityMetrics.evaluate(data);
console.log(metrics.realism); // 0.45 (too low)
```
**Solutions:**
1. **Improve schema descriptions:**
```typescript
const schema = Schema.define({
name: 'User',
description: 'A realistic user profile with authentic details',
properties: {
name: {
type: 'string',
description: 'Full name following cultural naming conventions',
},
},
});
```
2. **Add examples to schema:**
```typescript
const schema = Schema.define({
properties: {
bio: {
type: 'string',
examples: [
'Passionate about machine learning and open source',
'Software engineer with 10 years of experience',
],
},
},
});
```
3. **Adjust temperature:**
```typescript
const synth = new SynthEngine({
temperature: 0.9, // Higher for more natural variation
});
```
4. **Use a higher-quality model:**
```typescript
const synth = new SynthEngine({
provider: 'anthropic',
model: 'claude-3-opus-20240229', // Higher quality
});
```
### Low diversity scores
**Symptoms:**
- Many duplicate or nearly identical examples
- Limited variation in generated data
**Solutions:**
1. **Increase temperature:**
```typescript
const synth = new SynthEngine({
temperature: 0.95, // High value for more varied output
});
```
2. **Add diversity constraints:**
```typescript
const schema = Schema.define({
constraints: [
{
type: 'diversity',
field: 'content',
maxSimilarity: 0.3, // Reject items more than 30% similar to existing ones
},
],
});
```
3. **Use varied prompts:**
```typescript
const synth = new SynthEngine({
promptVariation: true,
variationStrategies: ['paraphrase', 'reframe', 'alternative-angle'],
});
```
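You can also measure duplication yourself after generation. This sketch is a hypothetical post-filter, not the library's diversity constraint: it uses Jaccard similarity on word sets (Agentic-Synth may compare items differently, for example with embeddings), with the same 0.3 threshold as step 2.

```typescript
// Hypothetical post-filter (not an agentic-synth API): drop items whose text is
// too similar to an item already kept, using Jaccard similarity on word sets.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / (a.size + b.size - shared); // |A ∩ B| / |A ∪ B|
}

function dropNearDuplicates(texts: string[], maxSimilarity = 0.3): string[] {
  const kept: { text: string; words: Set<string> }[] = [];
  for (const text of texts) {
    const words = tokens(text);
    // O(n²) pairwise comparison: fine for spot checks, too slow for millions of items
    if (kept.every((k) => jaccard(words, k.words) <= maxSimilarity)) {
      kept.push({ text, words });
    }
  }
  return kept.map((k) => k.text);
}
```

The share of items removed on a sample is a quick proxy for how repetitive a configuration is, which makes it easier to compare temperature or prompt-variation settings.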
### Biased data detected
**Symptoms:**
```typescript
const metrics = await QualityMetrics.evaluate(data, { bias: true });
console.log(metrics.bias); // { gender: 0.85 } (too high)
```
**Solutions:**
1. **Add fairness constraints:**
```typescript
const schema = Schema.define({
constraints: [
{
type: 'fairness',
attributes: ['gender', 'age', 'ethnicity'],
distribution: 'uniform',
},
],
});
```
2. **Add explicit diversity instructions:**
```typescript
const schema = Schema.define({
description: 'Generate diverse examples representing all demographics equally',
});
```
3. **Post-generation filtering:**
```typescript
import { BiasDetector } from 'agentic-synth/utils';
const detector = new BiasDetector();
const balanced = data.filter(item => {
const bias = detector.detect(item);
return bias.overall < 0.3; // Keep low-bias items
});
```
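Another post-generation option is to rebalance a sensitive attribute by downsampling. The sketch below uses a hypothetical `balanceBy` helper (not part of Agentic-Synth) that keeps the same number of items for every observed value of a field. It only balances values that actually appear; a group that was never generated cannot be recovered this way.

```typescript
// Hypothetical post-generation step (not an agentic-synth API): downsample so
// every observed value of `key` appears equally often.
function balanceBy<T, K extends keyof T>(items: T[], key: K): T[] {
  // Group items by the value of `key`, preserving first-seen order
  const groups = new Map<T[K], T[]>();
  for (const item of items) {
    const value = item[key];
    const group = groups.get(value);
    if (group) group.push(item);
    else groups.set(value, [item]);
  }
  const buckets = Array.from(groups.values());
  if (buckets.length === 0) return [];
  // The smallest group sets how many items each value may keep
  const quota = Math.min(...buckets.map((g) => g.length));
  const balanced: T[] = [];
  for (const bucket of buckets) balanced.push(...bucket.slice(0, quota));
  return balanced;
}
```

Downsampling discards data, so for large imbalances it is usually cheaper to generate additional items for the under-represented groups and rebalance afterwards.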
---
## Integration Issues
### Ruvector connection fails
**Symptoms:**
```
Error: Cannot connect to Ruvector at localhost:8080
```
**Solutions:**
1. **Verify Ruvector is running:**
```bash
# Check if Ruvector service is running
curl http://localhost:8080/health
```
2. **Check connection configuration:**
```typescript
const db = new VectorDB({
host: 'localhost',
port: 8080,
timeout: 5000,
});
```
3. **Use retry logic:**
```typescript
import { retry } from 'agentic-synth/utils';
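// Retry an operation that actually contacts the server: constructing the
// client alone may not open a connection, so probe it with a real request.
const db = await retry(async () => {
  const client = new VectorDB({ host: 'localhost', port: 8080, timeout: 5000 });
  await client.listCollections(); // throws if Ruvector is unreachable
  return client;
}, {
  attempts: 3,
  delay: 1000,
});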
```
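For reference, a retry helper with `attempts` and `delay` options typically behaves like the sketch below. `retryWithDelay` is a hypothetical stand-in; the `retry` exported by `agentic-synth/utils` may differ (for example in how it waits or which errors it retries). The key point is that the wrapped function must perform the operation that can fail.

```typescript
// Hypothetical sketch of a fixed-delay retry helper; agentic-synth's own
// `retry` may behave differently.
interface RetryOptions {
  attempts: number; // total tries, including the first
  delay: number; // milliseconds to wait between tries
}

async function retryWithDelay<T>(fn: () => Promise<T>, { attempts, delay }: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Don't sleep after the final failed attempt
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // surface the last underlying error, e.g. ECONNREFUSED
}
```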
### Vector insertion fails
**Symptoms:**
```
Error: Failed to insert vectors into collection
```
**Solutions:**
1. **Verify collection exists:**
```typescript
const collections = await db.listCollections();
if (!collections.includes('my-collection')) {
await db.createCollection('my-collection', { dimensions: 384 });
}
```
2. **Check vector dimensions match:**
```typescript
const schema = Schema.define({
properties: {
embedding: {
type: 'embedding',
dimensions: 384, // Must match collection config
},
},
});
```
3. **Use batching:**
```typescript
await synth.generateAndInsert({
schema,
count: 10000,
collection: 'vectors',
batchSize: 1000, // Insert in batches
});
```
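Before inserting, it can help to check the generated vectors locally so a dimension mismatch or a non-finite value is reported with an index instead of a generic insertion failure. `findInvalidVectors` and `assertVectors` are hypothetical helpers, not Agentic-Synth or Ruvector APIs; pass the dimension configured for your collection (384 in the examples above).

```typescript
// Hypothetical pre-insert check (not an agentic-synth or Ruvector API):
// return indices of vectors with the wrong length or NaN/Infinity entries.
function findInvalidVectors(vectors: number[][], expectedDims: number): number[] {
  const bad: number[] = [];
  vectors.forEach((v, i) => {
    if (v.length !== expectedDims || v.some((x) => !Number.isFinite(x))) {
      bad.push(i);
    }
  });
  return bad;
}

function assertVectors(vectors: number[][], expectedDims: number): void {
  const bad = findInvalidVectors(vectors, expectedDims);
  if (bad.length > 0) {
    throw new Error(
      `${bad.length} vector(s) invalid for ${expectedDims} dimensions (first at index ${bad[0]})`,
    );
  }
}
```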
---
## API and Authentication
### OpenAI API errors
**Symptoms:**
```
Error: Incorrect API key provided
```
**Solutions:**
1. **Verify API key:**
```bash
echo $OPENAI_API_KEY
```
2. **Set environment variable:**
```bash
export OPENAI_API_KEY="sk-..."
```
3. **Pass key explicitly:**
```typescript
const synth = new SynthEngine({
provider: 'openai',
apiKey: 'sk-...', // Not recommended for production
});
```
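`echo $OPENAI_API_KEY` prints the whole secret. To check from code whether a key is loaded, and whether it carries stray whitespace (for example a trailing newline copied from a file, which makes an otherwise valid key fail), you could use a small helper like the hypothetical `describeApiKey` below; it is not part of Agentic-Synth.

```typescript
// Hypothetical diagnostic helper (not part of agentic-synth): reports on a key
// without exposing it in logs.
interface KeyReport {
  present: boolean;
  masked: string;
  hasWhitespace: boolean;
}

function describeApiKey(key: string | undefined): KeyReport {
  if (!key) return { present: false, masked: '(not set)', hasWhitespace: false };
  return {
    present: true,
    // Show only the prefix and last four characters of longer keys
    masked: key.length > 8 ? `${key.slice(0, 3)}...${key.slice(-4)}` : '***',
    hasWhitespace: key !== key.trim(),
  };
}

// Usage (Node.js): console.log(describeApiKey(process.env.OPENAI_API_KEY));
```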
### Rate limit exceeded
**Symptoms:**
```
Error: Rate limit exceeded. Please try again later.
```
**Solutions:**
1. **Implement exponential backoff:**
```typescript
const synth = new SynthEngine({
retryConfig: {
maxRetries: 5,
backoffMultiplier: 2,
initialDelay: 1000,
},
});
```
2. **Reduce request rate:**
```typescript
const synth = new SynthEngine({
rateLimit: {
requestsPerMinute: 60,
tokensPerMinute: 90000,
},
});
```
3. **Use multiple API keys:**
```typescript
const synth = new SynthEngine({
provider: 'openai',
apiKeys: [
process.env.OPENAI_API_KEY_1,
process.env.OPENAI_API_KEY_2,
process.env.OPENAI_API_KEY_3,
],
keyRotationStrategy: 'round-robin',
});
```
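To see what the settings above imply, the sketch below computes the wait before each retry for the `retryConfig` from step 1 and shows a simple round-robin rotation like the one `keyRotationStrategy: 'round-robin'` describes. Both functions are hypothetical illustrations, not Agentic-Synth internals.

```typescript
// Hypothetical sketch (not agentic-synth internals): the delay schedule implied
// by a retryConfig of { maxRetries, backoffMultiplier, initialDelay }.
interface BackoffConfig {
  maxRetries: number;
  backoffMultiplier: number;
  initialDelay: number; // milliseconds
}

function backoffSchedule({ maxRetries, backoffMultiplier, initialDelay }: BackoffConfig): number[] {
  // Delay before retry n (0-based) is initialDelay * multiplier^n
  return Array.from({ length: maxRetries }, (_, n) => initialDelay * backoffMultiplier ** n);
}

// Round-robin rotation: each call returns the next value, wrapping around.
function roundRobin<T>(values: T[]): () => T {
  if (values.length === 0) throw new Error('roundRobin needs at least one value');
  let next = 0;
  return () => {
    const value = values[next];
    next = (next + 1) % values.length;
    return value;
  };
}

// backoffSchedule({ maxRetries: 5, backoffMultiplier: 2, initialDelay: 1000 })
// gives 1000, 2000, 4000, 8000, 16000 ms: up to 31 s of waiting in total.
```

Many clients also add random jitter to each delay so that parallel workers do not retry in lockstep.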
---
## Memory and Resource Issues
### Out of memory errors
**Solutions:**
1. **Use streaming mode (recommended):**
```typescript
for await (const item of synth.generateStream({ schema, count: 1000000 })) {
await processAndDiscard(item);
}
```
2. **Process in smaller batches:**
```typescript
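async function generateInChunks(totalCount: number, chunkSize: number) {
  for (let i = 0; i < totalCount; i += chunkSize) {
    const chunk = await synth.generate({
      schema,
      // Don't overshoot totalCount on the final chunk
      count: Math.min(chunkSize, totalCount - i),
    });
    await processChunk(chunk);
    // Once processChunk returns and nothing else references `chunk`,
    // it becomes eligible for garbage collection
  }
}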
```
3. **Increase Node.js memory:**
```bash
node --max-old-space-size=8192 script.js
```
### Disk space issues
**Symptoms:**
```
Error: ENOSPC: no space left on device
```
**Solutions:**
1. **Stream directly to storage:**
```typescript
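import { createWriteStream } from 'fs';
import { once } from 'events';
const stream = createWriteStream('./output.jsonl');
for await (const item of synth.generateStream({ schema, count: 1000000 })) {
  // write() returns false when the buffer is full; wait for 'drain' before continuing
  if (!stream.write(JSON.stringify(item) + '\n')) {
    await once(stream, 'drain');
  }
}
stream.end();
// Resolve only after buffered data has been flushed to the file
await once(stream, 'finish');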
```
2. **Use compression:**
```typescript
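import { createWriteStream } from 'fs';
import { createGzip } from 'zlib';
import { pipeline } from 'stream/promises';
// gzip needs strings or Buffers, so serialize each generated item to a JSONL line first
async function* toJsonLines(items: AsyncIterable<unknown>) {
  for await (const item of items) {
    yield JSON.stringify(item) + '\n';
  }
}
await pipeline(
  toJsonLines(synth.generateStream({ schema, count: 1000000 })),
  createGzip(),
  createWriteStream('./output.jsonl.gz')
);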
```
3. **Export to remote storage:**
```typescript
// Assumes the export step uploads to S3 itself, using AWS credentials
// available in the environment (AWS_* variables, shared config, or an instance role)
const result = await synth.generate({ schema, count: 1000000 });
await result.export({
  format: 'parquet',
  destination: 's3://my-bucket/synthetic-data.parquet',
});
```
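Before a large run, you can estimate the uncompressed JSONL size from a small sample and compare it with free disk space. `estimateJsonlBytes` is a hypothetical helper; it measures UTF-8 bytes (not string length) and assumes the sample is representative of the full dataset.

```typescript
// Hypothetical estimate (not an agentic-synth API): extrapolate output size
// from a small sample so you can compare it with free disk space up front.
function estimateJsonlBytes(sample: unknown[], totalCount: number): number {
  if (sample.length === 0) return 0;
  const encoder = new TextEncoder(); // UTF-8 byte length, not string length
  const sampleBytes = sample.reduce<number>(
    (sum, item) => sum + encoder.encode(JSON.stringify(item) + '\n').length,
    0,
  );
  // Average bytes per line, scaled to the full run
  return Math.ceil((sampleBytes / sample.length) * totalCount);
}

// Usage:
// const sample = await synth.generate({ schema, count: 100 });
// const bytes = estimateJsonlBytes(sample.data, 1_000_000);
```

Gzip usually shrinks JSON considerably, but the ratio depends on the data, so measure it on a compressed sample rather than assuming a fixed factor.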
---
## Debugging Tips
### Enable debug logging
```typescript
import { setLogLevel } from 'agentic-synth';
setLogLevel('debug');
const synth = new SynthEngine({
debug: true,
verbose: true,
});
```
### Use profiler
```typescript
import { profiler } from 'agentic-synth/utils';
const results = await profiler.profile(async () => {
return await synth.generate({ schema, count: 1000 });
});
console.log('Performance breakdown:', results.breakdown);
console.log('Bottlenecks:', results.bottlenecks);
```
### Test with small datasets first
```typescript
// Test with 10 examples first
const test = await synth.generate({ schema, count: 10 });
console.log('Sample:', test.data[0]);
// Validate quality
const quality = await QualityMetrics.evaluate(test.data);
console.log('Quality:', quality);
// If quality is good, scale up
if (quality.overall > 0.85) {
const full = await synth.generate({ schema, count: 100000 });
}
```
---
## Getting Help
If you're still experiencing issues:
1. **Check documentation**: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/docs
2. **Search issues**: https://github.com/ruvnet/ruvector/issues
3. **Ask on Discord**: https://discord.gg/ruvnet
4. **Open an issue**: https://github.com/ruvnet/ruvector/issues/new
When reporting issues, include:
- Agentic-Synth version: `npm list agentic-synth`
- Node.js version: `node --version`
- Operating system
- Minimal reproduction code
- Error messages and stack traces
- Schema definition (if relevant)
---
## FAQ

**Q: Why is generation slow?**

A: Enable streaming, increase batch size, use faster models, or cache embeddings.

**Q: How do I improve data quality?**

A: Use better models, add detailed schema descriptions, include examples, adjust temperature.

**Q: Can I use multiple LLM providers?**

A: Yes, configure fallback providers or rotate between them.

**Q: How do I handle rate limits?**

A: Implement exponential backoff, reduce rate, or use multiple API keys.

**Q: Is there a size limit for generation?**

A: No hard limit, but use streaming for datasets > 10,000 items.

---
## Additional Resources
- [API Reference](./API.md)
- [Examples](./EXAMPLES.md)
- [Integration Guides](./INTEGRATIONS.md)
- [Best Practices](./BEST_PRACTICES.md)