🎥 Agentic-Synth Video Tutorial Script
Duration: 8-10 minutes
Target Audience: Developers, ML engineers, data scientists
Format: Screen recording with voice-over
Video Structure
- Introduction (1 min)
- Installation & Setup (1 min)
- Basic Usage: CLI and SDK (2 mins)
- Advanced Features: time series and streaming (2 mins)
- Real-World Example (1.5 mins)
- Performance & Wrap-up (1.5 mins)
Script
Scene 1: Introduction (0:00 - 1:00)
Visual: Title card, then switch to terminal
Voice-over:
"Hi! Today I'll show you agentic-synth - a high-performance synthetic data generator that makes it incredibly easy to create realistic test data for your AI and ML projects.
Whether you're training machine learning models, building RAG systems, or just need to seed your development database, agentic-synth has you covered with AI-powered data generation.
Let's dive in!"
Screen: Show README on GitHub with badges
Scene 2: Installation (1:00 - 2:00)
Visual: Terminal with command prompts
Voice-over:
"Installation is straightforward. You can use it as a global CLI tool or add it to your project."
Type in terminal:
```shell
# Global installation
npm install -g @ruvector/agentic-synth

# Or use directly with npx
npx agentic-synth --help
```
Voice-over:
"You'll need an API key from Google Gemini or OpenRouter. Let's set that up quickly."
Type:
```shell
export GEMINI_API_KEY="your-key-here"
```
Voice-over:
"And we're ready to go!"
Scene 3: Basic Usage - CLI (2:00 - 3:00)
Visual: Terminal showing CLI commands
Voice-over:
"Let's start with the CLI. Generating data is as simple as running a single command."
Type:
```shell
npx agentic-synth generate \
  --type structured \
  --count 10 \
  --schema '{"name": "string", "email": "email", "age": "number"}' \
  --output users.json
```
Voice-over:
"In just a few seconds, we have 10 realistic user records with names, emails, and ages. Let's look at the output."
Type:
```shell
cat users.json | jq '.[0:3]'
```
Visual: Show JSON output with realistic data
Voice-over:
"Notice how the data looks realistic - real names, valid email formats, appropriate ages. This is all powered by AI."
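Because the records come from a model, it is worth spot-checking them before use. A minimal sketch, assuming `users.json` holds a JSON array of objects with the three fields from the `--schema` above; `isValidUser`, `invalidIndices`, and the loose email pattern are illustrative choices, not part of agentic-synth:

```typescript
// Sketch: sanity-check records produced by the CLI schema
// {"name": "string", "email": "email", "age": "number"}.
// What counts as "valid" here is an assumption, not a rule agentic-synth enforces.

interface UserRecord {
  name: string;
  email: string;
  age: number;
}

// Deliberately loose email pattern: local@domain.tld with no whitespace.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidUser(r: unknown): r is UserRecord {
  if (typeof r !== "object" || r === null) return false;
  const { name, email, age } = r as Record<string, unknown>;
  return (
    typeof name === "string" && name.trim().length > 0 &&
    typeof email === "string" && EMAIL_RE.test(email) &&
    typeof age === "number" && Number.isFinite(age) && age >= 0
  );
}

// Returns the indices of records that fail the checks above.
function invalidIndices(records: unknown[]): number[] {
  const bad: number[] = [];
  records.forEach((r, i) => {
    if (!isValidUser(r)) bad.push(i);
  });
  return bad;
}
```

In Node you could run it on the CLI output with `invalidIndices(JSON.parse(readFileSync('users.json', 'utf8')))` (importing `readFileSync` from `node:fs`); an empty result means every record passed.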
Scene 4: SDK Usage (3:00 - 4:00)
Visual: VS Code with TypeScript file
Voice-over:
"For more control, you can use the SDK directly in your code. Let me show you how simple that is."
Type in editor (demo.mts; the .mts extension makes tsx run the file as an ES module, which top-level await requires):
```typescript
import { AgenticSynth } from '@ruvector/agentic-synth';

// Initialize with configuration
const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory', // Enable caching for 95%+ speedup
  cacheTTL: 3600
});

// Generate structured data (top-level await is allowed because this is an ES module)
const users = await synth.generateStructured({
  count: 100,
  schema: {
    user_id: 'UUID',
    name: 'full name',
    email: 'valid email',
    age: 'number (18-80)',
    country: 'country name',
    subscription: 'free | pro | enterprise'
  }
});

console.log(`Generated ${users.data.length} users`);
console.log('Sample:', users.data[0]);
```
Voice-over:
"Run this code..."
Type in terminal:
```shell
npx tsx demo.mts
```
Visual: Show output with generated data
Voice-over:
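"And we get 100 realistic user profiles from a single call. Now notice the caching: call generateStructured again with the same options in this same process, and the in-memory cache returns the result almost instantly. A fresh run of the script starts with an empty cache, since a memory cache doesn't outlive the process."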
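To see why a repeated call with identical options can return almost instantly, here is a minimal sketch of an in-memory cache with a time-to-live, similar in spirit to what `cacheStrategy: 'memory'` and `cacheTTL: 3600` suggest. It is an illustration under stated assumptions (the cache key is the JSON of the options, the TTL is in seconds), not agentic-synth's actual implementation, and `memoizeWithTTL` is a hypothetical name:

```typescript
// Illustration only: a minimal in-memory TTL cache. NOT agentic-synth's
// implementation; the key scheme and TTL unit (seconds) are assumptions.

type AsyncFn<A, R> = (args: A) => Promise<R>;

function memoizeWithTTL<A, R>(
  fn: AsyncFn<A, R>,
  ttlSeconds: number,
  now: () => number = Date.now,
): AsyncFn<A, R> {
  // Key -> cached promise plus its expiry time in milliseconds.
  const cache = new Map<string, { value: Promise<R>; expiresAt: number }>();

  return (args: A) => {
    // Identical options serialize to the same key. JSON.stringify is
    // sensitive to property order, so {a, b} and {b, a} are different keys.
    const key = JSON.stringify(args);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = fn(args);
    cache.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    // Drop failed calls so the next request retries instead of replaying the error.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

Wrapping a generator, for example `memoizeWithTTL((opts) => synth.generateStructured(opts), 3600)`, would skip the model call for repeated identical options until the TTL expires. Caching the promise rather than the resolved value also lets two concurrent identical calls share one request.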
Scene 5: Advanced Features - Time Series (4:00 - 5:00)
Visual: Split screen - editor on left, output on right
Voice-over:
"agentic-synth isn't just for simple records. It can generate complex time-series data, perfect for financial or IoT applications."
Type in editor:
```typescript
const stockData = await synth.generateTimeSeries({
  count: 365,
  startDate: '2024-01-01',
  interval: '1d',
  schema: {
    date: 'ISO date',
    open: 'number (100-200)',
    high: 'number (105-210)',
    low: 'number (95-195)',
    close: 'number (100-200)',
    volume: 'number (1000000-10000000)'
  },
  constraints: [
    'high must be >= open and close',
    'low must be <= open and close',
    'close influences next day open'
  ]
});

console.log('Generated stock data for 1 year');
```
Voice-over:
"The constraints steer the model toward real-world patterns: each day's high is at least as high as its open and close, the low is no higher than either, and each open carries on from the previous day's close."
Show output: Chart visualization of stock data
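The constraints are natural-language instructions to the model, so a quick programmatic check before charting is cheap insurance. A sketch, assuming the generated rows carry the field names from the time-series schema and arrive sorted by date; `ohlcViolations` and `maxOvernightGap` are hypothetical helpers:

```typescript
// Sketch: verify the OHLC constraints on generated rows.
// Field names follow the generateTimeSeries schema; the row shape is an assumption.

interface Bar {
  date: string;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Lists every bar that breaks the high/low constraints.
function ohlcViolations(bars: readonly Bar[]): string[] {
  const problems: string[] = [];
  for (const b of bars) {
    if (b.high < Math.max(b.open, b.close)) problems.push(`${b.date}: high is below open or close`);
    if (b.low > Math.min(b.open, b.close)) problems.push(`${b.date}: low is above open or close`);
  }
  return problems;
}

// Largest relative gap between one bar's close and the next bar's open.
// "close influences next day open" is a soft constraint, so this returns a
// number to eyeball rather than a pass/fail result. Assumes date order.
function maxOvernightGap(bars: readonly Bar[]): number {
  let worst = 0;
  for (let i = 1; i < bars.length; i++) {
    const prevClose = bars[i - 1].close;
    const gap = Math.abs(bars[i].open - prevClose) / prevClose;
    worst = Math.max(worst, gap);
  }
  return worst;
}
```

An empty array from `ohlcViolations(stockData.data)` means the hard constraints held; a large `maxOvernightGap` suggests the continuity constraint was loosely followed.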
Scene 6: Advanced Features - Streaming (5:00 - 6:00)
Visual: Editor showing streaming code
Voice-over:
"Need to generate millions of records? Use streaming to avoid memory issues."
Type:
```typescript
let count = 0;

for await (const record of synth.generateStream('structured', {
  count: 1_000_000,
  schema: {
    id: 'UUID',
    timestamp: 'ISO timestamp',
    value: 'number'
  }
})) {
  // Process each record individually.
  // saveToDatabase is your own persistence function, not part of agentic-synth.
  await saveToDatabase(record);
  count++;
  if (count % 10000 === 0) {
    console.log(`Processed ${count.toLocaleString()}...`);
  }
}
```
Voice-over:
"This streams records one at a time, so you can process a million records without loading everything into memory."
Visual: Show progress counter incrementing
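Awaiting one database write per record means a million round trips. A common alternative is to group the stream into batches and write each batch in one call. A sketch of a generic batching helper; `batch` is a hypothetical name, and it assumes only that `generateStream` returns an async iterable, which the `for await` loop in the streaming example already relies on:

```typescript
// Sketch: group items from any async iterable into fixed-size arrays.
// Hypothetical helper, not part of agentic-synth.
async function* batch<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  let buf: T[] = [];
  for await (const item of source) {
    buf.push(item);
    if (buf.length === size) {
      yield buf;
      buf = []; // start a fresh array so the yielded batch is not mutated
    }
  }
  // Flush the final, possibly shorter, batch.
  if (buf.length > 0) yield buf;
}
```

Usage would look like `for await (const rows of batch(synth.generateStream('structured', options), 1000)) await saveManyToDatabase(rows);`, where `options` and `saveManyToDatabase` (a bulk insert) are hypothetical. Memory stays bounded by the batch size, so the benefit of streaming is kept.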
Scene 7: Real-World Example - ML Training Data (6:00 - 7:30)
Visual: Complete working example
Voice-over:
"Let me show you a real-world use case: generating training data for a machine learning model that predicts customer churn."
Type:
```typescript
// Generate training dataset with features
const trainingData = await synth.generateStructured({
  count: 5000,
  schema: {
    customer_age: 'number (18-80)',
    annual_income: 'number (20000-200000)',
    credit_score: 'number (300-850)',
    account_tenure_months: 'number (1-360)',
    num_products: 'number (1-5)',
    balance: 'number (0-250000)',
    num_transactions_12m: 'number (0-200)',
    // Target variable
    churn: 'boolean (higher likelihood if credit_score < 600, balance < 1000)'
  },
  constraints: [
    'Churn rate should be ~15-20%',
    'Higher income correlates with higher balance',
    'Customers with 1 product more likely to churn'
  ]
});

// Split into train/test
const trainSize = Math.floor(trainingData.data.length * 0.8);
const trainSet = trainingData.data.slice(0, trainSize);
const testSet = trainingData.data.slice(trainSize);

console.log(`Training set: ${trainSet.length} samples`);
console.log(`Test set: ${testSet.length} samples`);
console.log(`Churn rate: ${(trainSet.filter(d => d.churn).length / trainSet.length * 100).toFixed(1)}%`);
```
Voice-over:
"In minutes, we have a complete ML dataset with realistic distributions and correlations. The model treats the constraints as guidance, so the data should line up with them, and the printed churn rate is a quick check against the 15-20% target before you train on it."
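`slice` on the generated array keeps whatever order the model produced, so the split is only as random as that order. A sketch of a seeded shuffle-then-split (a Fisher-Yates shuffle driven by the mulberry32 PRNG, both standard techniques) plus a rate helper for checking the churn constraint; the function names and seed are illustrative, not part of agentic-synth:

```typescript
// Small seeded PRNG (mulberry32) returning floats in [0, 1), so splits are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, then cut at trainFraction.
function shuffledSplit<T>(rows: readonly T[], trainFraction: number, seed: number): { train: T[]; test: T[] } {
  const rand = mulberry32(seed);
  const copy = rows.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  const cut = Math.floor(copy.length * trainFraction);
  return { train: copy.slice(0, cut), test: copy.slice(cut) };
}

// Fraction of rows matching a predicate (0 for an empty array).
function rate<T>(rows: readonly T[], pred: (r: T) => boolean): number {
  return rows.length === 0 ? 0 : rows.filter(pred).length / rows.length;
}
```

With the churn dataset this would be `const { train, test } = shuffledSplit(trainingData.data, 0.8, 42);`, followed by comparing `rate(train, (d) => d.churn)` with the 15-20% target.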
Scene 8: Performance Highlights (7:30 - 8:30)
Visual: Show benchmark results
Voice-over:
"Let's talk performance. agentic-synth is fast, and most of that speed comes from caching."
Visual: Show PERFORMANCE_REPORT.md metrics
Voice-over:
"In the benchmark results on screen, operations complete in sub-millisecond to low-millisecond latencies. An uncached model call still takes a few seconds, and that is exactly the cost the cache removes: cache hits are essentially instant, and with an 85% cache hit rate in production, repeated queries run 95%+ faster.
The package also handles 1000+ requests per second with linear scaling, which makes it a good fit for production workloads."
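The hit rate and the per-hit speedup combine as a weighted average: with hit rate h, expected latency is h * t_hit + (1 - h) * t_miss. A small sketch makes the arithmetic concrete; the function names are illustrative, and the latencies in the note after it are made-up values, not numbers from the performance report:

```typescript
// Expected latency with a cache: weighted average of hit and miss cost.
function expectedLatencyMs(hitRate: number, hitMs: number, missMs: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Fraction of latency saved versus running with no cache (every call a miss).
function overallSpeedup(hitRate: number, hitMs: number, missMs: number): number {
  return 1 - expectedLatencyMs(hitRate, hitMs, missMs) / missMs;
}
```

For example, with a 2000 ms miss, a 100 ms hit (95% faster per hit) and an 85% hit rate, `expectedLatencyMs` comes to about 385 ms, an overall reduction of roughly 81%. The 95%+ figure describes each cached query; the average across all traffic can never exceed the hit rate.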
Scene 9: Wrap-up (8:30 - 9:00)
Visual: Return to terminal, show final commands
Voice-over:
"That's agentic-synth! To recap:
- Simple CLI and SDK interfaces
- AI-powered realistic data generation
- Time-series, events, and structured data support
- Streaming for large datasets
- Built-in caching for incredible performance
- Perfect for ML training, RAG systems, and testing
Check out the documentation for more advanced examples, and give it a try in your next project!"
Type:
```shell
npm install @ruvector/agentic-synth
```
Visual: Show GitHub repo with Star button
Voice-over:
"If you found this useful, star the repo on GitHub and let me know what you build with it. Thanks for watching!"
Visual: End card with links
Visual Assets Needed
- Title Cards:
  - Intro card with logo
  - Feature highlights card
  - End card with links
- Code Examples:
  - Syntax highlighted in VS Code
  - Font: Fira Code or JetBrains Mono
  - Theme: Dark+ or Material Theme
- Terminal:
  - Oh My Zsh with clean prompt
  - Colors: Nord or Dracula theme
- Data Visualizations:
  - JSON output formatted with jq
  - Stock chart for time-series example
  - Progress bars for streaming
- Documentation:
  - README.md rendered
  - Performance metrics table
  - Benchmark results
Recording Tips
- Screen Setup:
  - 1920x1080 resolution
  - Clean desktop, no distractions
  - Close unnecessary applications
  - Disable notifications
- Terminal Settings:
  - Large font size (16-18pt)
  - High contrast theme
  - Type slowly, and show keystrokes on screen with a tool like KeyCastr (macOS)
- Editor Settings:
  - Zoom to 150-200%
  - Hide sidebars for a cleaner view
  - Use a distraction-free layout such as VS Code's Zen Mode
- Audio:
  - Use a quality microphone
  - Record in a quiet room
  - Speak clearly and at a moderate pace
  - Add background music (subtle, low volume)
- Pacing:
  - Pause between steps
  - Let output display for 2-3 seconds
  - Don't rush through commands
  - Leave time for viewers to read
Post-Production Checklist
- Add title cards
- Add transitions between scenes
- Highlight important commands/output
- Add annotations/callouts where helpful
- Background music at 10-15% volume
- Export at 1080p, 60fps
- Generate subtitles/captions
- Create thumbnail image
- Upload to YouTube
- Add to README as embedded video
Video Description (for YouTube)
```text
# Agentic-Synth: High-Performance Synthetic Data Generator

Generate realistic synthetic data for AI/ML training, RAG systems, and database seeding in minutes!

🔗 Links:
- NPM: https://www.npmjs.com/package/@ruvector/agentic-synth
- GitHub: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth
- Documentation: https://github.com/ruvnet/ruvector/blob/main/packages/agentic-synth/README.md

⚡ Performance:
- Sub-millisecond P99 latencies
- 85% cache hit rate
- 1000+ req/s throughput
- 95%+ speedup with caching

🎯 Use Cases:
- Machine learning training data
- RAG system data generation
- Database seeding
- API testing
- Load testing

📚 Chapters:
0:00 Introduction
1:00 Installation & Setup
2:00 CLI Usage
3:00 SDK Usage
4:00 Time-Series Data
5:00 Streaming Large Datasets
6:00 ML Training Example
7:30 Performance Highlights
8:30 Wrap-up

#machinelearning #AI #syntheticdata #typescript #nodejs #datascience #RAG
```
Alternative: Live Coding Demo (15 min)
For a longer, more in-depth tutorial:
- Setup (3 min): Project initialization, dependencies
- Basic Generation (3 min): Simple examples
- Complex Schemas (3 min): Nested structures, constraints
- Integration (3 min): Database seeding example
- Performance (2 min): Benchmarks and optimization
- Q&A (1 min): Common questions
Script Version: 1.0
Last Updated: 2025-11-22
Status: Ready for Recording 🎬