# 🎥 Agentic-Synth Video Tutorial Script

**Duration**: 8-10 minutes
**Target Audience**: Developers, ML engineers, data scientists
**Format**: Screen recording with voice-over

---

## Video Structure

1. **Introduction** (1 min)
2. **Installation & Setup** (1 min)
3. **Basic Usage** (2 mins)
4. **Advanced Features** (2 mins)
5. **Real-World Example** (2 mins)
6. **Performance & Wrap-up** (1 min)

---

## Script

### Scene 1: Introduction (0:00 - 1:00)

**Visual**: Title card, then switch to terminal

**Voice-over**:
> "Hi! Today I'll show you agentic-synth - a high-performance synthetic data generator that makes it incredibly easy to create realistic test data for your AI and ML projects.
>
> Whether you're training machine learning models, building RAG systems, or just need to seed your development database, agentic-synth has you covered with AI-powered data generation.
>
> Let's dive in!"

**Screen**: Show README on GitHub with badges

---

### Scene 2: Installation (1:00 - 2:00)

**Visual**: Terminal with command prompts

**Voice-over**:
> "Installation is straightforward. You can use it as a global CLI tool or add it to your project."

**Type in terminal**:
```bash
# Global installation
npm install -g @ruvector/agentic-synth

# Or use directly with npx
npx agentic-synth --help
```

**Voice-over**:
> "You'll need an API key from Google Gemini or OpenRouter. Let's set that up quickly."

**Type**:
```bash
export GEMINI_API_KEY="your-key-here"
```

**Voice-over**:
> "And we're ready to go!"

---

### Scene 3: Basic Usage - CLI (2:00 - 3:00)

**Visual**: Terminal showing CLI commands

**Voice-over**:
> "Let's start with the CLI. Generating data is as simple as running a single command."

**Type**:
```bash
npx agentic-synth generate \
  --type structured \
  --count 10 \
  --schema '{"name": "string", "email": "email", "age": "number"}' \
  --output users.json
```

**Voice-over**:
> "In just a few seconds, we have 10 realistic user records with names, emails, and ages. Let's look at the output."

**Type**:
```bash
cat users.json | jq '.[0:3]'
```

**Visual**: Show JSON output with realistic data

**Voice-over**:
> "Notice how the data looks realistic - real names, valid email formats, appropriate ages. This is all powered by AI."

---

### Scene 4: SDK Usage (3:00 - 4:00)

**Visual**: VS Code with TypeScript file

**Voice-over**:
> "For more control, you can use the SDK directly in your code. Let me show you how simple that is."

**Type in editor** (`demo.ts`):
```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

// Initialize with configuration
const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory', // Enable caching for 95%+ speedup
  cacheTTL: 3600
});

// Generate structured data
const users = await synth.generateStructured({
  count: 100,
  schema: {
    user_id: 'UUID',
    name: 'full name',
    email: 'valid email',
    age: 'number (18-80)',
    country: 'country name',
    subscription: 'free | pro | enterprise'
  }
});

console.log(`Generated ${users.data.length} users`);
console.log('Sample:', users.data[0]);
```

**Voice-over**:
> "Run this code..."

**Type in terminal**:
```bash
npx tsx demo.ts
```

**Visual**: Show output with generated data

**Voice-over**:
> "And we instantly get 100 realistic user profiles. Notice the caching - if we run this again with the same options, it's nearly instant!"

---

### Scene 5: Advanced Features - Time Series (4:00 - 5:00)

**Visual**: Split screen - editor on left, output on right

**Voice-over**:
> "agentic-synth isn't just for simple records. It can generate complex time-series data, perfect for financial or IoT applications."

**Type in editor**:
```typescript
const stockData = await synth.generateTimeSeries({
  count: 365,
  startDate: '2024-01-01',
  interval: '1d',
  schema: {
    date: 'ISO date',
    open: 'number (100-200)',
    high: 'number (105-210)',
    low: 'number (95-195)',
    close: 'number (100-200)',
    volume: 'number (1000000-10000000)'
  },
  constraints: [
    'high must be >= open and close',
    'low must be <= open and close',
    'close influences next day open'
  ]
});

console.log('Generated stock data for 1 year');
```

**Voice-over**:
> "The constraints ensure our data follows real-world patterns - high prices are actually higher than opens and closes, and there's continuity between days."

**Show output**: Chart visualization of stock data

---

### Scene 6: Advanced Features - Streaming (5:00 - 6:00)

**Visual**: Editor showing streaming code

**Voice-over**:
> "Need to generate millions of records? Use streaming to avoid memory issues."

**Type**:
```typescript
let count = 0;

for await (const record of synth.generateStream('structured', {
  count: 1_000_000,
  schema: {
    id: 'UUID',
    timestamp: 'ISO timestamp',
    value: 'number'
  }
})) {
  // Process each record individually.
  // saveToDatabase is a placeholder for your own persistence function.
  await saveToDatabase(record);

  count++;
  if (count % 10000 === 0) {
    console.log(`Processed ${count.toLocaleString()}...`);
  }
}
```

**Voice-over**:
> "This streams records one at a time, so you can process a million records without loading everything into memory."

**Visual**: Show progress counter incrementing

---

### Scene 7: Real-World Example - ML Training Data (6:00 - 7:30)

**Visual**: Complete working example

**Voice-over**:
> "Let me show you a real-world use case: generating training data for a machine learning model that predicts customer churn."

**Type**:
```typescript
// Generate training dataset with features
const trainingData = await synth.generateStructured({
  count: 5000,
  schema: {
    customer_age: 'number (18-80)',
    annual_income: 'number (20000-200000)',
    credit_score: 'number (300-850)',
    account_tenure_months: 'number (1-360)',
    num_products: 'number (1-5)',
    balance: 'number (0-250000)',
    num_transactions_12m: 'number (0-200)',

    // Target variable
    churn: 'boolean (higher likelihood if credit_score < 600, balance < 1000)'
  },
  constraints: [
    'Churn rate should be ~15-20%',
    'Higher income correlates with higher balance',
    'Customers with 1 product more likely to churn'
  ]
});

// Split into train/test
const trainSize = Math.floor(trainingData.data.length * 0.8);
const trainSet = trainingData.data.slice(0, trainSize);
const testSet = trainingData.data.slice(trainSize);

console.log(`Training set: ${trainSet.length} samples`);
console.log(`Test set: ${testSet.length} samples`);

// Fraction of training rows labelled as churned
const churnRate = trainSet.filter(d => d.churn).length / trainSet.length;
console.log(`Churn rate: ${(churnRate * 100).toFixed(1)}%`);
```

**Voice-over**:
> "In minutes, we have a complete ML dataset with realistic distributions and correlations. The AI understands the constraints and generates data that actually makes sense for training models."

---

### Scene 8: Performance Highlights (7:30 - 8:30)

**Visual**: Show benchmark results

**Voice-over**:
> "Let's talk performance. agentic-synth is incredibly fast, thanks to intelligent caching."

**Visual**: Show PERFORMANCE_REPORT.md metrics

**Voice-over**:
> "All operations complete in sub-millisecond to low-millisecond latencies. Cache hits are essentially instant. And with an 85% cache hit rate in production, you're looking at 95%+ performance improvement for repeated queries.
>
> The package also handles 1000+ requests per second with linear scaling, making it perfect for production workloads."
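
**Presenter note (optional on-screen snippet)**: If viewers ask why a repeated request with the same options returns almost immediately, the idea can be illustrated with a small in-memory TTL cache. This is only a sketch of the general technique, not agentic-synth's actual cache implementation; `TtlCache`, `fakeGenerate`, and `cachedGenerate` are hypothetical names invented for this illustration, and `fakeGenerate` stands in for a real, slow generation call.

```typescript
// Hypothetical sketch of TTL caching; NOT the package's real internals.
class TtlCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;
  hits = 0;
  misses = 0;

  // `now` is injectable so expiry can be demonstrated without waiting.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  // Return the cached value if it is still fresh; otherwise compute and store it.
  getOrCompute(key: string, compute: () => T): T {
    const entry = this.store.get(key);
    if (entry !== undefined && entry.expiresAt > this.now()) {
      this.hits++;
      return entry.value;
    }
    this.misses++;
    const value = compute();
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Fraction of lookups served from the cache.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// Stand-in for an expensive generation call; counts how often real work runs.
let generatorCalls = 0;
function fakeGenerate(options: { count: number }): number[] {
  generatorCalls++;
  return Array.from({ length: options.count }, (_, i) => i);
}

// Identical options serialize to the same key (note: property order matters).
function cachedGenerate(
  cache: TtlCache<number[]>,
  options: { count: number }
): number[] {
  return cache.getOrCompute(JSON.stringify(options), () => fakeGenerate(options));
}
```

Calling `cachedGenerate` twice with the same options on one `TtlCache` runs `fakeGenerate` only once; the second call is a cache hit until the TTL elapses. That is the behavior the "run it again and it's nearly instant" moment in Scene 4 relies on, assuming the real cache keys on the request options in a similar way.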

---

### Scene 9: Wrap-up (8:30 - 9:00)

**Visual**: Return to terminal, show final commands

**Voice-over**:
> "That's agentic-synth! To recap:
> - Simple CLI and SDK interfaces
> - AI-powered realistic data generation
> - Time-series, events, and structured data support
> - Streaming for large datasets
> - Built-in caching for incredible performance
> - Perfect for ML training, RAG systems, and testing
>
> Check out the documentation for more advanced examples, and give it a try in your next project!"

**Type**:
```bash
npm install @ruvector/agentic-synth
```

**Visual**: Show GitHub repo with Star button

**Voice-over**:
> "If you found this useful, star the repo on GitHub and let me know what you build with it. Thanks for watching!"

**Visual**: End card with links

---

## Visual Assets Needed

1. **Title Cards**:
   - Intro card with logo
   - Feature highlights card
   - End card with links

2. **Code Examples**:
   - Syntax highlighted in VS Code
   - Font: Fira Code or JetBrains Mono
   - Theme: Dark+ or Material Theme

3. **Terminal**:
   - Oh My Zsh with clean prompt
   - Colors: Nord or Dracula theme

4. **Data Visualizations**:
   - JSON output formatted with jq
   - Stock chart for time-series example
   - Progress bars for streaming

5. **Documentation**:
   - README.md rendered
   - Performance metrics table
   - Benchmark results

---

## Recording Tips

1. **Screen Setup**:
   - 1920x1080 resolution
   - Clean desktop, no distractions
   - Close unnecessary applications
   - Disable notifications

2. **Terminal Settings**:
   - Large font size (16-18pt)
   - High contrast theme
   - Type slowly, and show keystrokes on screen with a tool like KeyCastr

3. **Editor Settings**:
   - Zoom to 150-200%
   - Hide sidebars for cleaner view
   - Use presentation mode

4. **Audio**:
   - Use a quality microphone
   - Record in a quiet room
   - Speak clearly and at a moderate pace
   - Add background music (subtle, low volume)

5. **Pacing**:
   - Pause between steps
   - Let output display for 2-3 seconds
   - Don't rush through commands
   - Leave time for viewers to read

---

## Post-Production Checklist

- [ ] Add title cards
- [ ] Add transitions between scenes
- [ ] Highlight important commands/output
- [ ] Add annotations/callouts where helpful
- [ ] Background music at 10-15% volume
- [ ] Export at 1080p, 60fps
- [ ] Generate subtitles/captions
- [ ] Create thumbnail image
- [ ] Upload to YouTube
- [ ] Add to README as embedded video

---

## Video Description (for YouTube)

```markdown
# Agentic-Synth: High-Performance Synthetic Data Generator

Generate realistic synthetic data for AI/ML training, RAG systems, and database seeding in minutes!

🔗 Links:
- NPM: https://www.npmjs.com/package/@ruvector/agentic-synth
- GitHub: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth
- Documentation: https://github.com/ruvnet/ruvector/blob/main/packages/agentic-synth/README.md

⚡ Performance:
- Sub-millisecond P99 latencies
- 85% cache hit rate
- 1000+ req/s throughput
- 95%+ speedup with caching

🎯 Use Cases:
- Machine learning training data
- RAG system data generation
- Database seeding
- API testing
- Load testing

📚 Chapters:
0:00 Introduction
1:00 Installation & Setup
2:00 CLI Usage
3:00 SDK Usage
4:00 Time-Series Data
5:00 Streaming Large Datasets
6:00 ML Training Example
7:30 Performance Highlights
8:30 Wrap-up

#machinelearning #AI #syntheticdata #typescript #nodejs #datascience #RAG
```

---

## Alternative: Live Coding Demo (15 min)

For a longer, more in-depth tutorial:

1. **Setup** (3 min): Project initialization, dependencies
2. **Basic Generation** (3 min): Simple examples
3. **Complex Schemas** (3 min): Nested structures, constraints
4. **Integration** (3 min): Database seeding example
5. **Performance** (2 min): Benchmarks and optimization
6. **Q&A** (1 min): Common questions

---

**Script Version**: 1.0
**Last Updated**: 2025-11-22
**Status**: Ready for Recording 🎬