Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

2026-02-28 14:39:40 -05:00
parent 7885bf6278 d803bfe2b1
commit cd5943df23
7854 changed files with 3522914 additions and 0 deletions
--- a/vendor/ruvector/npm/packages/agentic-synth/docs/TROUBLESHOOTING.md
+++ b/vendor/ruvector/npm/packages/agentic-synth/docs/TROUBLESHOOTING.md
@@ -0,0 +1,758 @@
+# Troubleshooting Guide
+
+Common issues and solutions for Agentic-Synth.
+
+## Table of Contents
+
+- [Installation Issues](#installation-issues)
+- [Generation Problems](#generation-problems)
+- [Performance Issues](#performance-issues)
+- [Quality Problems](#quality-problems)
+- [Integration Issues](#integration-issues)
+- [API and Authentication](#api-and-authentication)
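+- [Memory and Resource Issues](#memory-and-resource-issues)
+- [Debugging Tips](#debugging-tips)
+- [Getting Help](#getting-help)
+- [FAQ](#faq)
+- [Additional Resources](#additional-resources)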
+
+---
+
+## Installation Issues
+
+### npm install fails
+
+**Symptoms:**
+```bash
+npm ERR! code ENOENT
+npm ERR! syscall open
+npm ERR! path /path/to/package.json
+```
+
+**Solutions:**
+1. Ensure you're in the correct directory
+2. Verify Node.js version (>=18.0.0):
+   ```bash
+   node --version
+   ```
+3. Clear npm cache:
+   ```bash
+   npm cache clean --force
+   npm install
+   ```
+4. Try a different package manager:
+   ```bash
+   pnpm install
+   # or
+   yarn install
+   ```
+
+### TypeScript type errors
+
+**Symptoms:**
+```
+Cannot find module 'agentic-synth' or its corresponding type declarations
+```
+
+**Solutions:**
+1. Ensure TypeScript version >=5.0:
+   ```bash
+   npm install -D typescript@latest
+   ```
+2. Check tsconfig.json:
+   ```json
+   {
+     "compilerOptions": {
+       "moduleResolution": "node",
+       "esModuleInterop": true
+     }
+   }
+   ```
+
+### Native dependencies fail to build
+
+**Symptoms:**
+```
+gyp ERR! build error
+```
+
+**Solutions:**
+1. Install build tools:
+   - **Windows**: install Visual Studio Build Tools with the "Desktop development with C++" workload (the `windows-build-tools` npm package is deprecated)
+   - **Mac**: `xcode-select --install`
+   - **Linux**: `sudo apt-get install build-essential`
+2. Use pre-built binaries if available
+
+---
+
+## Generation Problems
+
+### Generation returns empty results
+
+**Symptoms:**
+```typescript
+const data = await synth.generate({ schema, count: 1000 });
+console.log(data.data.length); // 0
+```
+
+**Solutions:**
+
+1. **Check API key configuration:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'openai',
+     apiKey: process.env.OPENAI_API_KEY, // Ensure this is set
+   });
+   ```
+
+2. **Verify schema validity:**
+   ```typescript
+   import { validateSchema } from 'agentic-synth/utils';
+
+   const isValid = validateSchema(schema);
+   if (!isValid.valid) {
+     console.error('Schema errors:', isValid.errors);
+   }
+   ```
+
+3. **Check for errors in generation:**
+   ```typescript
+   try {
+     const data = await synth.generate({ schema, count: 1000 });
+   } catch (error) {
+     console.error('Generation failed:', error);
+   }
+   ```
+
+### Generation hangs indefinitely
+
+**Symptoms:**
+- Generation never completes
+- No progress updates
+- No error messages
+
+**Solutions:**
+
+1. **Add timeout:**
+   ```typescript
+   const controller = new AbortController();
+   const timeout = setTimeout(() => controller.abort(), 60000); // 1 minute
+
+   try {
+     await synth.generate({
+       schema,
+       count: 1000,
+       abortSignal: controller.signal,
+     });
+   } finally {
+     clearTimeout(timeout);
+   }
+   ```
+
+2. **Enable verbose logging:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'openai',
+     debug: true, // Enable debug logs
+   });
+   ```
+
+3. **Reduce batch size:**
+   ```typescript
+   const synth = new SynthEngine({
+     batchSize: 10, // Start small
+   });
+   ```
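If the installed version does not honor an `abortSignal` option, a generic timeout wrapper can at least turn a silent hang into a visible error. The following is a minimal sketch; `withTimeout` is a hypothetical helper, not part of Agentic-Synth, and it only stops waiting. It does not cancel the underlying request.

```typescript
// Hypothetical helper (not an Agentic-Synth API): rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage, assuming the `synth` and `schema` objects from the examples above:
// const data = await withTimeout(synth.generate({ schema, count: 10 }), 60000);
```

Starting with a small `count` and a short timeout makes it easier to tell a slow provider from a genuine hang.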
+
+### Invalid data generated
+
+**Symptoms:**
+- Data doesn't match schema
+- Missing required fields
+- Type mismatches
+
+**Solutions:**
+
+1. **Enable strict validation:**
+   ```typescript
+   const synth = new SynthEngine({
+     validationEnabled: true,
+     strictMode: true,
+   });
+   ```
+
+2. **Add constraints to schema:**
+   ```typescript
+   const schema = Schema.define({
+     name: 'User',
+     type: 'object',
+     properties: {
+       email: {
+         type: 'string',
+         format: 'email',
+         pattern: '^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$',
+       },
+     },
+     required: ['email'],
+   });
+   ```
+
+3. **Lower temperature for more consistent output:**
+   ```typescript
+   const synth = new SynthEngine({
+     temperature: 0.2, // Lower values favor schema-conforming output over variation
+   });
+   ```
+
+---
+
+## Performance Issues
+
+### Slow generation speed
+
+**Symptoms:**
+- Generation takes much longer than expected
+- Low throughput (< 100 items/minute)
+
+**Solutions:**
+
+1. **Enable streaming mode:**
+   ```typescript
+   for await (const item of synth.generateStream({ schema, count: 10000 })) {
+     // Process item immediately
+   }
+   ```
+
+2. **Increase batch size:**
+   ```typescript
+   const synth = new SynthEngine({
+     batchSize: 1000, // Larger batches
+     maxWorkers: 8,   // More parallel workers
+   });
+   ```
+
+3. **Use faster model:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'openai',
+     model: 'gpt-3.5-turbo', // Faster than gpt-4
+   });
+   ```
+
+4. **Cache embeddings:**
+   ```typescript
+   const synth = new SynthEngine({
+     cacheEnabled: true,
+     cacheTTL: 3600, // 1 hour
+   });
+   ```
+
+5. **Profile generation:**
+   ```typescript
+   import { profiler } from 'agentic-synth/utils';
+
+   const profile = await profiler.profile(() => {
+     return synth.generate({ schema, count: 1000 });
+   });
+
+   console.log('Bottlenecks:', profile.bottlenecks);
+   ```
+
+### High memory usage
+
+**Symptoms:**
+```
+FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
+```
+
+**Solutions:**
+
+1. **Use streaming:**
+   ```typescript
+   // Instead of loading all in memory
+   const data = await synth.generate({ schema, count: 1000000 }); // ❌
+
+   // Stream and process incrementally
+   for await (const item of synth.generateStream({ schema, count: 1000000 })) { // ✅
+     await processItem(item);
+   }
+   ```
+
+2. **Reduce batch size:**
+   ```typescript
+   const synth = new SynthEngine({
+     batchSize: 100, // Smaller batches
+   });
+   ```
+
+3. **Increase Node.js heap size:**
+   ```bash
+   NODE_OPTIONS="--max-old-space-size=4096" npm start
+   ```
+
+4. **Process in chunks:**
+   ```typescript
+   const chunkSize = 10000;
+   const totalCount = 1000000;
+
+   for (let i = 0; i < totalCount; i += chunkSize) {
+     const chunk = await synth.generate({
+       schema,
+       count: Math.min(chunkSize, totalCount - i),
+     });
+     await exportChunk(chunk, i);
+   }
+   ```
+
+---
+
+## Quality Problems
+
+### Low realism scores
+
+**Symptoms:**
+```typescript
+const metrics = await QualityMetrics.evaluate(data);
+console.log(metrics.realism); // 0.45 (too low)
+```
+
+**Solutions:**
+
+1. **Improve schema descriptions:**
+   ```typescript
+   const schema = Schema.define({
+     name: 'User',
+     description: 'A realistic user profile with authentic details',
+     properties: {
+       name: {
+         type: 'string',
+         description: 'Full name following cultural naming conventions',
+       },
+     },
+   });
+   ```
+
+2. **Add examples to schema:**
+   ```typescript
+   const schema = Schema.define({
+     properties: {
+       bio: {
+         type: 'string',
+         examples: [
+           'Passionate about machine learning and open source',
+           'Software engineer with 10 years of experience',
+         ],
+       },
+     },
+   });
+   ```
+
+3. **Adjust temperature:**
+   ```typescript
+   const synth = new SynthEngine({
+     temperature: 0.9, // Higher for more natural variation
+   });
+   ```
+
+4. **Use better model:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'anthropic',
+     model: 'claude-3-opus-20240229', // Higher quality
+   });
+   ```
+
+### Low diversity scores
+
+**Symptoms:**
+- Many duplicate or nearly identical examples
+- Limited variation in generated data
+
+**Solutions:**
+
+1. **Increase temperature:**
+   ```typescript
+   const synth = new SynthEngine({
+     temperature: 0.95, // Higher values increase diversity
+   });
+   ```
+
+2. **Add diversity constraints:**
+   ```typescript
+   const schema = Schema.define({
+     constraints: [
+       {
+         type: 'diversity',
+         field: 'content',
+         minSimilarity: 0.3, // Max 30% similarity
+       },
+     ],
+   });
+   ```
+
+3. **Use varied prompts:**
+   ```typescript
+   const synth = new SynthEngine({
+     promptVariation: true,
+     variationStrategies: ['paraphrase', 'reframe', 'alternative-angle'],
+   });
+   ```
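Before tuning any of these settings, it can help to measure how many duplicates are actually present. The sketch below is a rough check under stated assumptions: `duplicateRatio` is a hypothetical helper, not an Agentic-Synth API, and it only catches exact duplicates after whitespace and case normalization, not near-duplicates.

```typescript
// Hypothetical helper: fraction of items whose text repeats an earlier item
// once case and whitespace differences are normalized away.
function duplicateRatio<T>(items: T[], getText: (item: T) => string): number {
  if (items.length === 0) return 0;
  const seen = new Set<string>();
  let duplicates = 0;
  for (const item of items) {
    const key = getText(item).trim().toLowerCase().replace(/\s+/g, ' ');
    if (seen.has(key)) {
      duplicates++;
    } else {
      seen.add(key);
    }
  }
  return duplicates / items.length;
}

const sampleItems = [
  { content: 'Hello world' },
  { content: 'hello   world' }, // same text after normalization
  { content: 'Something else' },
  { content: 'A third entry' },
];
console.log(duplicateRatio(sampleItems, (item) => item.content)); // 0.25
```

Running this before and after a configuration change gives a simple way to confirm that the change reduced repetition.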
+
+### Biased data detected
+
+**Symptoms:**
+```typescript
+const metrics = await QualityMetrics.evaluate(data, { bias: true });
+console.log(metrics.bias); // { gender: 0.85 } (too high)
+```
+
+**Solutions:**
+
+1. **Add fairness constraints:**
+   ```typescript
+   const schema = Schema.define({
+     constraints: [
+       {
+         type: 'fairness',
+         attributes: ['gender', 'age', 'ethnicity'],
+         distribution: 'uniform',
+       },
+     ],
+   });
+   ```
+
+2. **Explicit diversity instructions:**
+   ```typescript
+   const schema = Schema.define({
+     description: 'Generate diverse examples representing all demographics equally',
+   });
+   ```
+
+3. **Post-generation filtering:**
+   ```typescript
+   import { BiasDetector } from 'agentic-synth/utils';
+
+   const detector = new BiasDetector();
+   const balanced = data.filter(item => {
+     const bias = detector.detect(item);
+     return bias.overall < 0.3; // Keep low-bias items
+   });
+   ```
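To see where a skew comes from, one option is to compute the share of each value for a single attribute. This is a minimal sketch, independent of `QualityMetrics` and `BiasDetector`; `attributeShares` and `imbalance` are hypothetical helpers, and the imbalance measure (largest share minus the uniform share) is an illustrative choice, not the library's bias score.

```typescript
// Hypothetical helper: share of each distinct value of one attribute.
function attributeShares<T>(
  items: T[],
  getValue: (item: T) => string
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const value = getValue(item);
    counts[value] = (counts[value] ?? 0) + 1;
  }
  const shares: Record<string, number> = {};
  for (const value of Object.keys(counts)) {
    shares[value] = counts[value] / items.length;
  }
  return shares;
}

// Largest share minus the uniform share (1 / number of categories).
// 0 means perfectly balanced across the observed categories.
function imbalance(shares: Record<string, number>): number {
  const values = Object.keys(shares).map((key) => shares[key]);
  if (values.length === 0) return 0;
  return Math.max(...values) - 1 / values.length;
}

const people = [
  { gender: 'female' },
  { gender: 'female' },
  { gender: 'female' },
  { gender: 'male' },
];
const genderShares = attributeShares(people, (person) => person.gender);
console.log(genderShares); // { female: 0.75, male: 0.25 }
console.log(imbalance(genderShares)); // 0.25
```

Note that this only sees categories that actually occur in the data; a category that is entirely missing has to be checked against an expected list.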
+
+---
+
+## Integration Issues
+
+### Ruvector connection fails
+
+**Symptoms:**
+```
+Error: Cannot connect to Ruvector at localhost:8080
+```
+
+**Solutions:**
+
+1. **Verify Ruvector is running:**
+   ```bash
+   # Check if Ruvector service is running
+   curl http://localhost:8080/health
+   ```
+
+2. **Check connection configuration:**
+   ```typescript
+   const db = new VectorDB({
+     host: 'localhost',
+     port: 8080,
+     timeout: 5000,
+   });
+   ```
+
+3. **Use retry logic:**
+   ```typescript
+   import { retry } from 'agentic-synth/utils';
+
+   const db = await retry(() => new VectorDB(), {
+     attempts: 3,
+     delay: 1000,
+   });
+   ```
+
+### Vector insertion fails
+
+**Symptoms:**
+```
+Error: Failed to insert vectors into collection
+```
+
+**Solutions:**
+
+1. **Verify collection exists:**
+   ```typescript
+   const collections = await db.listCollections();
+   if (!collections.includes('my-collection')) {
+     await db.createCollection('my-collection', { dimensions: 384 });
+   }
+   ```
+
+2. **Check vector dimensions match:**
+   ```typescript
+   const schema = Schema.define({
+     properties: {
+       embedding: {
+         type: 'embedding',
+         dimensions: 384, // Must match collection config
+       },
+     },
+   });
+   ```
+
+3. **Use batching:**
+   ```typescript
+   await synth.generateAndInsert({
+     schema,
+     count: 10000,
+     collection: 'vectors',
+     batchSize: 1000, // Insert in batches
+   });
+   ```
+
+---
+
+## API and Authentication
+
+### OpenAI API errors
+
+**Symptoms:**
+```
+Error: Incorrect API key provided
+```
+
+**Solutions:**
+
+1. **Verify API key:**
+   ```bash
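+   # Check that the key is set without printing the secret
+   [ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"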
+   ```
+
+2. **Set environment variable:**
+   ```bash
+   export OPENAI_API_KEY="sk-..."
+   ```
+
+3. **Pass key explicitly:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'openai',
+     apiKey: 'sk-...', // Not recommended for production
+   });
+   ```
+
+### Rate limit exceeded
+
+**Symptoms:**
+```
+Error: Rate limit exceeded. Please try again later.
+```
+
+**Solutions:**
+
+1. **Implement exponential backoff:**
+   ```typescript
+   const synth = new SynthEngine({
+     retryConfig: {
+       maxRetries: 5,
+       backoffMultiplier: 2,
+       initialDelay: 1000,
+     },
+   });
+   ```
+
+2. **Reduce request rate:**
+   ```typescript
+   const synth = new SynthEngine({
+     rateLimit: {
+       requestsPerMinute: 60,
+       tokensPerMinute: 90000,
+     },
+   });
+   ```
+
+3. **Use multiple API keys:**
+   ```typescript
+   const synth = new SynthEngine({
+     provider: 'openai',
+     apiKeys: [
+       process.env.OPENAI_API_KEY_1,
+       process.env.OPENAI_API_KEY_2,
+       process.env.OPENAI_API_KEY_3,
+     ],
+     keyRotationStrategy: 'round-robin',
+   });
+   ```
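To make the backoff settings concrete, the sketch below computes the delay schedule that `maxRetries`, `backoffMultiplier`, and `initialDelay` would imply under the usual formula `initialDelay * backoffMultiplier^attempt`. Whether Agentic-Synth applies exactly this formula (or adds jitter) is an assumption; `backoffDelays` and `withBackoff` are hypothetical helpers for illustration.

```typescript
// Field names mirror the retryConfig example above; the semantics are assumed.
interface RetryConfig {
  maxRetries: number;
  backoffMultiplier: number;
  initialDelay: number; // milliseconds
}

// Delay before retry n (0-based): initialDelay * backoffMultiplier^n
function backoffDelays(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    delays.push(config.initialDelay * Math.pow(config.backoffMultiplier, attempt));
  }
  return delays;
}

// Generic retry wrapper: one initial call plus up to maxRetries retries.
async function withBackoff<T>(
  fn: () => Promise<T>,
  config: RetryConfig,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  const delays = backoffDelays(config);
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}

console.log(backoffDelays({ maxRetries: 5, backoffMultiplier: 2, initialDelay: 1000 }));
// [ 1000, 2000, 4000, 8000, 16000 ]
```

With these values a request that keeps failing waits about 31 seconds in total before giving up, which is worth keeping in mind when combining retries with a timeout.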
+
+---
+
+## Memory and Resource Issues
+
+### Out of memory errors
+
+**Solutions:**
+
+1. **Use streaming mode (recommended):**
+   ```typescript
+   for await (const item of synth.generateStream({ schema, count: 1000000 })) {
+     await processAndDiscard(item);
+   }
+   ```
+
+2. **Process in smaller batches:**
+   ```typescript
+   async function generateInChunks(totalCount: number, chunkSize: number) {
+     for (let i = 0; i < totalCount; i += chunkSize) {
+       const chunk = await synth.generate({
+         schema,
+         count: Math.min(chunkSize, totalCount - i), // Last chunk may be smaller
+       });
+       await processChunk(chunk);
+       // The chunk becomes eligible for garbage collection once no references remain
+     }
+   }
+   ```
+
+3. **Increase Node.js memory:**
+   ```bash
+   node --max-old-space-size=8192 script.js
+   ```
+
+### Disk space issues
+
+**Symptoms:**
+```
+Error: ENOSPC: no space left on device
+```
+
+**Solutions:**
+
+1. **Stream directly to storage:**
+   ```typescript
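+   import { createWriteStream } from 'fs';
+   import { once } from 'events';
+
+   const stream = createWriteStream('./output.jsonl');
+   for await (const item of synth.generateStream({ schema, count: 1000000 })) {
+     // Respect backpressure: wait for 'drain' when the write buffer is full
+     if (!stream.write(JSON.stringify(item) + '\n')) {
+       await once(stream, 'drain');
+     }
+   }
+   stream.end();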
+   ```
+
+2. **Use compression:**
+   ```typescript
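+   import { createWriteStream } from 'fs';
+   import { createGzip } from 'zlib';
+   import { pipeline } from 'stream/promises';
+
+   await pipeline(
+     synth.generateStream({ schema, count: 1000000 }),
+     // Serialize each item to a JSONL line before compressing
+     async function* (source: AsyncIterable<unknown>) {
+       for await (const item of source) {
+         yield JSON.stringify(item) + '\n';
+       }
+     },
+     createGzip(),
+     createWriteStream('./output.jsonl.gz')
+   );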
+   ```
+
+3. **Export to remote storage:**
+   ```typescript
+   import { S3Client } from '@aws-sdk/client-s3';
+
+   const s3 = new S3Client({ region: 'us-east-1' });
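+   // Await generation first; calling .export() on the pending promise would fail
+   const result = await synth.generate({ schema, count: 1000000 });
+   await result.export({
+     format: 'parquet',
+     destination: 's3://my-bucket/synthetic-data.parquet',
+   });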
+   ```
+
+---
+
+## Debugging Tips
+
+### Enable debug logging
+
+```typescript
+import { setLogLevel } from 'agentic-synth';
+
+setLogLevel('debug');
+
+const synth = new SynthEngine({
+  debug: true,
+  verbose: true,
+});
+```
+
+### Use profiler
+
+```typescript
+import { profiler } from 'agentic-synth/utils';
+
+const results = await profiler.profile(async () => {
+  return await synth.generate({ schema, count: 1000 });
+});
+
+console.log('Performance breakdown:', results.breakdown);
+console.log('Bottlenecks:', results.bottlenecks);
+```
+
+### Test with small datasets first
+
+```typescript
+// Test with 10 examples first
+const test = await synth.generate({ schema, count: 10 });
+console.log('Sample:', test.data[0]);
+
+// Validate quality
+const quality = await QualityMetrics.evaluate(test.data);
+console.log('Quality:', quality);
+
+// If quality is good, scale up
+if (quality.overall > 0.85) {
+  const full = await synth.generate({ schema, count: 100000 });
+}
+```
+
+---
+
+## Getting Help
+
+If you're still experiencing issues:
+
+1. **Check documentation**: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth/docs
+2. **Search issues**: https://github.com/ruvnet/ruvector/issues
+3. **Ask on Discord**: https://discord.gg/ruvnet
+4. **Open an issue**: https://github.com/ruvnet/ruvector/issues/new
+
+When reporting issues, include:
+- Agentic-Synth version: `npm list agentic-synth`
+- Node.js version: `node --version`
+- Operating system
+- Minimal reproduction code
+- Error messages and stack traces
+- Schema definition (if relevant)
+
+---
+
+## FAQ
+
+**Q: Why is generation slow?**
+A: Enable streaming, increase batch size, use faster models, or cache embeddings.
+
+**Q: How do I improve data quality?**
+A: Use better models, add detailed schema descriptions, include examples, adjust temperature.
+
+**Q: Can I use multiple LLM providers?**
+A: Yes, configure fallback providers or rotate between them.
+
+**Q: How do I handle rate limits?**
+A: Implement exponential backoff, reduce the request rate, or use multiple API keys.
+
+**Q: Is there a size limit for generation?**
+A: No hard limit, but use streaming for datasets > 10,000 items.
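The answer on multiple LLM providers mentions fallback without showing how it might be wired up. The guide does not document a built-in fallback option, so the sketch below does it at the application level: `withFallback` is a hypothetical helper that tries each provider in order and returns the first success.

```typescript
// Hypothetical helper (not an Agentic-Synth API): try providers in order,
// returning the first successful result along with the provider's label.
interface ProviderAttempt<T> {
  label: string;
  run: () => Promise<T>;
}

async function withFallback<T>(
  providers: ProviderAttempt<T>[]
): Promise<{ provider: string; result: T }> {
  const errors: string[] = [];
  for (const { label, run } of providers) {
    try {
      return { provider: label, result: await run() };
    } catch (error) {
      // Record the failure and move on to the next provider.
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${label}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Usage, assuming two SynthEngine instances configured as in the guide:
// const { provider, result } = await withFallback([
//   { label: 'openai', run: () => openaiSynth.generate({ schema, count: 100 }) },
//   { label: 'anthropic', run: () => anthropicSynth.generate({ schema, count: 100 }) },
// ]);
```

Keeping the per-provider error messages makes it easier to tell an authentication problem on one provider from an outage on all of them.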
+
+---
+
+## Additional Resources
+
+- [API Reference](./API.md)
+- [Examples](./EXAMPLES.md)
+- [Integration Guides](./INTEGRATIONS.md)
+- [Best Practices](./BEST_PRACTICES.md)