🎥 Agentic-Synth Video Tutorial Script

Duration: 8-10 minutes
Target Audience: Developers, ML engineers, data scientists
Format: Screen recording with voice-over


Video Structure

  1. Introduction (1 min)
  2. Installation & Setup (1 min)
  3. Basic Usage (2 mins)
  4. Advanced Features (2 mins)
  5. Real-World Example (1.5 mins)
  6. Performance & Wrap-up (1.5 mins)

Script

Scene 1: Introduction (0:00 - 1:00)

Visual: Title card, then switch to terminal

Voice-over:

"Hi! Today I'll show you agentic-synth - a high-performance synthetic data generator that makes it incredibly easy to create realistic test data for your AI and ML projects.

Whether you're training machine learning models, building RAG systems, or just need to seed your development database, agentic-synth has you covered with AI-powered data generation.

Let's dive in!"

Screen: Show README on GitHub with badges


Scene 2: Installation (1:00 - 2:00)

Visual: Terminal with command prompts

Voice-over:

"Installation is straightforward. You can use it as a global CLI tool or add it to your project."

Type in terminal:

# Global installation
npm install -g @ruvector/agentic-synth

# Or use directly with npx
npx agentic-synth --help

Voice-over:

"You'll need an API key from Google Gemini or OpenRouter. Let's set that up quickly."

Type:

export GEMINI_API_KEY="your-key-here"
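
For the SDK scenes later in the script, a small guard can make a missing key fail with a readable message instead of an opaque provider error mid-recording. This is a minimal sketch: `requireEnv` is a hypothetical helper, not part of agentic-synth, and it takes the environment as a parameter so it can be tried without touching the real one.

```typescript
// Hypothetical helper (not part of agentic-synth): read a required
// environment variable or fail with a readable message.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing environment variable ${name}; export it before running the demo.`);
  }
  return value;
}
```

In demo.ts this could be called as `requireEnv('GEMINI_API_KEY', process.env)` before constructing the client.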

Voice-over:

"And we're ready to go!"


Scene 3: Basic Usage - CLI (2:00 - 3:00)

Visual: Terminal showing CLI commands

Voice-over:

"Let's start with the CLI. Generating data is as simple as running a single command."

Type:

npx agentic-synth generate \
  --type structured \
  --count 10 \
  --schema '{"name": "string", "email": "email", "age": "number"}' \
  --output users.json

Voice-over:

"In just a few seconds, we have 10 realistic user records with names, emails, and ages. Let's look at the output."

Type:

jq '.[0:3]' users.json

Visual: Show JSON output with realistic data
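
Before recording, it may help to check that the generated records really match the schema passed on the command line. The sketch below assumes users.json holds a JSON array of objects with `name`, `email`, and `age` fields; `checkUser` and the loose email pattern are illustrative, not part of agentic-synth.

```typescript
// Shape implied by the --schema flag in the CLI command.
interface UserRecord {
  name: string;
  email: string;
  age: number;
}

// Deliberately loose pattern: something@something.tld
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns a list of problems; an empty list means the record looks valid.
function checkUser(record: unknown): string[] {
  if (typeof record !== 'object' || record === null) {
    return ['record is not an object'];
  }
  const r = record as Partial<Record<keyof UserRecord, unknown>>;
  const problems: string[] = [];
  if (typeof r.name !== 'string' || r.name.trim() === '') {
    problems.push('name must be a non-empty string');
  }
  if (typeof r.email !== 'string' || !EMAIL_PATTERN.test(r.email)) {
    problems.push('email is not well-formed');
  }
  if (typeof r.age !== 'number' || !Number.isFinite(r.age) || r.age < 0) {
    problems.push('age must be a non-negative number');
  }
  return problems;
}
```

Running `checkUser` over every parsed record and printing any non-empty result gives a quick sanity check to show on screen alongside the jq output.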

Voice-over:

"Notice how the data looks realistic - real names, valid email formats, appropriate ages. This is all powered by AI."


Scene 4: SDK Usage (3:00 - 4:00)

Visual: VS Code with TypeScript file

Voice-over:

"For more control, you can use the SDK directly in your code. Let me show you how simple that is."

Type in editor (demo.ts):

import { AgenticSynth } from '@ruvector/agentic-synth';

// Initialize with configuration
const synth = new AgenticSynth({
  provider: 'gemini',
  apiKey: process.env.GEMINI_API_KEY,
  cacheStrategy: 'memory', // Enable caching for 95%+ speedup
  cacheTTL: 3600
});

// Generate structured data
const users = await synth.generateStructured({
  count: 100,
  schema: {
    user_id: 'UUID',
    name: 'full name',
    email: 'valid email',
    age: 'number (18-80)',
    country: 'country name',
    subscription: 'free | pro | enterprise'
  }
});

console.log(`Generated ${users.data.length} users`);
console.log('Sample:', users.data[0]);

Voice-over:

"Run this code..."

Type in terminal:

npx tsx demo.ts

Visual: Show output with generated data
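
The `cacheStrategy: 'memory'` and `cacheTTL: 3600` options in demo.ts suggest a time-limited in-memory cache. The sketch below shows that general technique only; it is not agentic-synth's implementation, and `TtlCache` and `withCache` are hypothetical names.

```typescript
// Illustrative only: a tiny in-memory cache whose entries expire after a TTL.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlSeconds: number, now: () => number = Date.now) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, handy for testing
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // entry has expired
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wrap an expensive async function so identical options reuse the cached result.
// Assumes the options are JSON-serializable with a stable key order.
function withCache<O, V>(cache: TtlCache<V>, generate: (options: O) => Promise<V>) {
  return async (options: O): Promise<V> => {
    const key = JSON.stringify(options);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const value = await generate(options);
    cache.set(key, value);
    return value;
  };
}
```

Because the map lives in process memory, a cache like this is empty again whenever the script is restarted.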

Voice-over:

"And we get 100 realistic user profiles back. Notice the caching: with the in-memory strategy, a repeated call with the same options inside the same process is served from the cache, so it's nearly instant!"


Scene 5: Advanced Features - Time Series (4:00 - 5:00)

Visual: Split screen - editor on left, output on right

Voice-over:

"agentic-synth isn't just for simple records. It can generate complex time-series data, perfect for financial or IoT applications."

Type in editor:

const stockData = await synth.generateTimeSeries({
  count: 365,
  startDate: '2024-01-01',
  interval: '1d',
  schema: {
    date: 'ISO date',
    open: 'number (100-200)',
    high: 'number (105-210)',
    low: 'number (95-195)',
    close: 'number (100-200)',
    volume: 'number (1000000-10000000)'
  },
  constraints: [
    'high must be >= open and close',
    'low must be <= open and close',
    'close influences next day open'
  ]
});

console.log('Generated stock data for 1 year');

Voice-over:

"The constraints keep the data close to real-world patterns: each day's high is at least the open and the close, each low is at most the open and the close, and each close feeds into the next day's open, so there's continuity between days."

Show output: Chart visualization of stock data
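
Since the constraints are given to the model as plain text, it may be worth verifying them on the result before charting it. The sketch below assumes the generated rows are plain objects with the schema's field names; `Candle`, `findInvalidCandles`, and `meanOvernightGap` are hypothetical helpers, not part of agentic-synth.

```typescript
interface Candle {
  date: string;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Returns the indices of rows that break either of the first two constraints:
// high >= max(open, close) and low <= min(open, close).
function findInvalidCandles(candles: Candle[]): number[] {
  const invalid: number[] = [];
  candles.forEach((c, i) => {
    const highOk = c.high >= Math.max(c.open, c.close);
    const lowOk = c.low <= Math.min(c.open, c.close);
    if (!highOk || !lowOk) invalid.push(i);
  });
  return invalid;
}

// Average absolute gap between one day's close and the next day's open.
// A small value relative to the price level suggests day-to-day continuity.
function meanOvernightGap(candles: Candle[]): number {
  if (candles.length < 2) return 0;
  let total = 0;
  for (let i = 1; i < candles.length; i++) {
    total += Math.abs(candles[i].open - candles[i - 1].close);
  }
  return total / (candles.length - 1);
}
```

An empty result from `findInvalidCandles` is a quick on-screen confirmation that the high and low constraints held for every row.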


Scene 6: Advanced Features - Streaming (5:00 - 6:00)

Visual: Editor showing streaming code

Voice-over:

"Need to generate millions of records? Use streaming to avoid memory issues."

Type:

let count = 0;
for await (const record of synth.generateStream('structured', {
  count: 1_000_000,
  schema: {
    id: 'UUID',
    timestamp: 'ISO timestamp',
    value: 'number'
  }
})) {
  // Process each record individually.
  // saveToDatabase is a placeholder for your own async persistence function.
  await saveToDatabase(record);

  count++;
  if (count % 10000 === 0) {
    console.log(`Processed ${count.toLocaleString()}...`);
  }
}

Voice-over:

"This streams records one at a time, so you can process a million records without loading everything into memory."

Visual: Show progress counter incrementing
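
Writing one record per database call can be slow at this scale, so a common companion to streaming is to group records into batches. The sketch below is a generic helper for any async iterable; `inBatches` and `fakeStream` are hypothetical names, and `fakeStream` only stands in for the real stream so the example runs without an API key.

```typescript
// Group items from any async iterable into arrays of up to `size` items.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = []; // start a fresh array so yielded batches are never mutated
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Stand-in for a record stream, used only to make this sketch runnable.
async function* fakeStream(count: number): AsyncGenerator<{ id: number }> {
  for (let i = 0; i < count; i++) yield { id: i };
}
```

Only one batch is held in memory at a time, so the memory benefit of streaming is preserved while the number of database round trips drops.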


Scene 7: Real-World Example - ML Training Data (6:00 - 7:30)

Visual: Complete working example

Voice-over:

"Let me show you a real-world use case: generating training data for a machine learning model that predicts customer churn."

Type:

// Generate training dataset with features
const trainingData = await synth.generateStructured({
  count: 5000,
  schema: {
    customer_age: 'number (18-80)',
    annual_income: 'number (20000-200000)',
    credit_score: 'number (300-850)',
    account_tenure_months: 'number (1-360)',
    num_products: 'number (1-5)',
    balance: 'number (0-250000)',
    num_transactions_12m: 'number (0-200)',

    // Target variable
    churn: 'boolean (higher likelihood if credit_score < 600, balance < 1000)'
  },
  constraints: [
    'Churn rate should be ~15-20%',
    'Higher income correlates with higher balance',
    'Customers with 1 product more likely to churn'
  ]
});

// Split into train/test
const trainSize = Math.floor(trainingData.data.length * 0.8);
const trainSet = trainingData.data.slice(0, trainSize);
const testSet = trainingData.data.slice(trainSize);

console.log(`Training set: ${trainSet.length} samples`);
console.log(`Test set: ${testSet.length} samples`);
console.log(`Churn rate: ${(trainSet.filter(d => d.churn).length / trainSet.length * 100).toFixed(1)}%`);
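
The split above slices the rows in the order they were generated. If that order happens to correlate with the label, the test set may not be representative, so shuffling first is cheap insurance. The sketch below uses a seeded PRNG (mulberry32) with a Fisher-Yates shuffle so the split is reproducible; `shuffledSplit` is a hypothetical helper, not part of agentic-synth.

```typescript
// Small seeded PRNG (mulberry32) so the split is the same on every run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Shuffle a copy of the rows (Fisher-Yates), then cut it at `trainFraction`.
function shuffledSplit<T>(
  rows: readonly T[],
  trainFraction: number,
  seed: number
): { train: T[]; test: T[] } {
  const copy = [...rows];
  const random = mulberry32(seed);
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  const cut = Math.floor(copy.length * trainFraction);
  return { train: copy.slice(0, cut), test: copy.slice(cut) };
}
```

With the dataset from this scene, `shuffledSplit(trainingData.data, 0.8, 42)` would replace the two `slice` calls; the seed value is arbitrary.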

Voice-over:

"In minutes, we have a complete ML dataset. The schema hints and constraints steer the model toward realistic distributions and correlations, and the printed churn rate lets us confirm the target of roughly 15-20% before we train anything."


Scene 8: Performance Highlights (7:30 - 8:30)

Visual: Show benchmark results

Voice-over:

"Let's talk performance. agentic-synth is incredibly fast, thanks to intelligent caching."

Visual: Show PERFORMANCE_REPORT.md metrics

Voice-over:

"In the benchmarks from PERFORMANCE_REPORT.md, operations complete with sub-millisecond to low-millisecond latencies, and cache hits are essentially instant. With an 85% cache hit rate in production, you're looking at a 95%+ performance improvement for repeated queries.

The package also handles 1000+ requests per second with linear scaling, making it perfect for production workloads."
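
The hit-rate and speedup figures in this scene measure different things, and a quick calculation may help when presenting them. The sketch below uses the standard expected-latency formula, average = hitRate × hitMs + (1 − hitRate) × missMs. The 1 ms and 1000 ms values in the usage note are made-up round numbers for illustration, not figures from PERFORMANCE_REPORT.md.

```typescript
// Expected latency of a request when a fraction `hitRate` is served from cache.
function averageLatencyMs(hitRate: number, hitMs: number, missMs: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError('hitRate must be between 0 and 1');
  }
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// Percentage by which caching reduces average latency compared with no cache.
function reductionPercent(hitRate: number, hitMs: number, missMs: number): number {
  return (1 - averageLatencyMs(hitRate, hitMs, missMs) / missMs) * 100;
}
```

Under these assumed numbers, `reductionPercent(0.85, 1, 1000)` comes out at roughly 85%: an 85% hit rate bounds the average improvement across all requests, while the 95%+ figure reads most naturally as the gain on an individual repeated query that hits the cache.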


Scene 9: Wrap-up (8:30 - 9:00)

Visual: Return to terminal, show final commands

Voice-over:

"That's agentic-synth! To recap:

  • Simple CLI and SDK interfaces
  • AI-powered realistic data generation
  • Time-series, events, and structured data support
  • Streaming for large datasets
  • Built-in caching for incredible performance
  • Perfect for ML training, RAG systems, and testing

Check out the documentation for more advanced examples, and give it a try in your next project!"

Type:

npm install @ruvector/agentic-synth

Visual: Show GitHub repo with Star button

Voice-over:

"If you found this useful, star the repo on GitHub and let me know what you build with it. Thanks for watching!"

Visual: End card with links


Visual Assets Needed

  1. Title Cards:

    • Intro card with logo
    • Feature highlights card
    • End card with links
  2. Code Examples:

    • Syntax highlighted in VS Code
    • Font: Fira Code or JetBrains Mono
    • Theme: Dark+ or Material Theme
  3. Terminal:

    • Oh My Zsh with clean prompt
    • Colors: Nord or Dracula theme
  4. Data Visualizations:

    • JSON output formatted with jq
    • Stock chart for time-series example
    • Progress bars for streaming
  5. Documentation:

    • README.md rendered
    • Performance metrics table
    • Benchmark results

Recording Tips

  1. Screen Setup:

    • 1920x1080 resolution
    • Clean desktop, no distractions
    • Close unnecessary applications
    • Disable notifications
  2. Terminal Settings:

    • Large font size (16-18pt)
    • High contrast theme
    • Type slowly, and display keystrokes with a tool like KeyCastr
  3. Editor Settings:

    • Zoom to 150-200%
    • Hide sidebars for cleaner view
    • Use presentation mode
  4. Audio:

    • Use a quality microphone
    • Record in a quiet room
    • Speak clearly and at a moderate pace
    • Add background music (subtle, low volume)
  5. Pacing:

    • Pause between steps
    • Let output display for 2-3 seconds
    • Don't rush through commands
    • Leave time for viewers to read

Post-Production Checklist

  • Add title cards
  • Add transitions between scenes
  • Highlight important commands/output
  • Add annotations/callouts where helpful
  • Background music at 10-15% volume
  • Export at 1080p, 60fps
  • Generate subtitles/captions
  • Create thumbnail image
  • Upload to YouTube
  • Add to README as embedded video

Video Description (for YouTube)

# Agentic-Synth: High-Performance Synthetic Data Generator

Generate realistic synthetic data for AI/ML training, RAG systems, and database seeding in minutes!

🔗 Links:
- NPM: https://www.npmjs.com/package/@ruvector/agentic-synth
- GitHub: https://github.com/ruvnet/ruvector/tree/main/packages/agentic-synth
- Documentation: https://github.com/ruvnet/ruvector/blob/main/packages/agentic-synth/README.md

⚡ Performance:
- Sub-millisecond P99 latencies
- 85% cache hit rate
- 1000+ req/s throughput
- 95%+ speedup with caching

🎯 Use Cases:
- Machine learning training data
- RAG system data generation
- Database seeding
- API testing
- Load testing

📚 Chapters:
0:00 Introduction
1:00 Installation & Setup
2:00 CLI Usage
3:00 SDK Usage
4:00 Time-Series Data
5:00 Streaming Large Datasets
6:00 ML Training Example
7:30 Performance Highlights
8:30 Wrap-up

#machinelearning #AI #syntheticdata #typescript #nodejs #datascience #RAG

Alternative: Live Coding Demo (15 min)

For a longer, more in-depth tutorial:

  1. Setup (3 min): Project initialization, dependencies
  2. Basic Generation (3 min): Simple examples
  3. Complex Schemas (3 min): Nested structures, constraints
  4. Integration (3 min): Database seeding example
  5. Performance (2 min): Benchmarks and optimization
  6. Q&A (1 min): Common questions

Script Version: 1.0
Last Updated: 2025-11-22
Status: Ready for Recording 🎬