Agentic Synth CLI Usage Guide

Overview

The agentic-synth CLI provides a command-line interface for AI-powered synthetic data generation. It supports multiple model providers, custom schemas, and various output formats.

Installation

npm install agentic-synth
# or, to install globally
npm install -g agentic-synth

Configuration

Environment Variables

Set your API key before using the CLI:

# For Google Gemini (default)
export GEMINI_API_KEY="your-api-key-here"

# For OpenRouter
export OPENROUTER_API_KEY="your-api-key-here"

Configuration File

Create a config.json file for persistent settings:

{
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "apiKey": "your-api-key",
  "cacheStrategy": "memory",
  "cacheTTL": 3600,
  "maxRetries": 3,
  "timeout": 30000
}
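As a rough sketch of how the file and the environment variables could fit together in your own scripts, the snippet below types the fields shown above and picks an API key for the selected provider. The `SynthConfig` type and `resolveApiKey` helper are hypothetical names, not part of the package, and the precedence (file value first, then environment) is an assumption; the CLI's actual behavior may differ.

```typescript
// Hypothetical shape mirroring the fields in the config.json example above.
interface SynthConfig {
  provider: "gemini" | "openrouter";
  model?: string;
  apiKey?: string;
  cacheStrategy?: string;
  cacheTTL?: number;
  maxRetries?: number;
  timeout?: number;
}

// Environment variable names documented for each provider.
const ENV_KEY_BY_PROVIDER: Record<SynthConfig["provider"], string> = {
  gemini: "GEMINI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
};

// Assumption: an explicit apiKey in the file wins over the environment.
// The CLI's real precedence may differ.
function resolveApiKey(
  config: SynthConfig,
  env: Record<string, string | undefined>
): string | undefined {
  if (config.apiKey !== undefined && config.apiKey.length > 0) {
    return config.apiKey;
  }
  return env[ENV_KEY_BY_PROVIDER[config.provider]];
}
```

In a Node script you would typically pass `process.env` as the second argument. Keeping the key out of config.json and relying on the environment variable avoids committing secrets by accident.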

Commands

Generate Data

Generate synthetic structured data based on a schema.

agentic-synth generate [options]

Options

  • -c, --count <number> - Number of records to generate (default: 10)
  • -s, --schema <path> - Path to JSON schema file
  • -o, --output <path> - Output file path (JSON format)
  • --seed <value> - Random seed for reproducibility
  • -p, --provider <provider> - Model provider: gemini or openrouter (default: gemini)
  • -m, --model <model> - Specific model name to use
  • --format <format> - Output format: json, csv, or array (default: json)
  • --config <path> - Path to config file with provider settings
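When the CLI is driven from a script, it can help to build the argument list from a typed object instead of concatenating strings. The sketch below uses only the flags listed above; `GenerateOptions` and `buildGenerateArgs` are hypothetical names introduced for illustration.

```typescript
// Hypothetical options object; each field maps to one documented flag.
interface GenerateOptions {
  count?: number;
  schema?: string;
  output?: string;
  seed?: string | number;
  provider?: "gemini" | "openrouter";
  model?: string;
  format?: "json" | "csv" | "array";
  config?: string;
}

// Build the argument vector for `agentic-synth generate`.
// Omitted fields are left out so the CLI falls back to its own defaults.
function buildGenerateArgs(opts: GenerateOptions): string[] {
  const args: string[] = ["generate"];
  const push = (flag: string, value: string | number | undefined): void => {
    if (value !== undefined) args.push(flag, String(value));
  };
  push("--count", opts.count);
  push("--schema", opts.schema);
  push("--output", opts.output);
  push("--seed", opts.seed);
  push("--provider", opts.provider);
  push("--model", opts.model);
  push("--format", opts.format);
  push("--config", opts.config);
  return args;
}
```

The resulting array can be handed to Node's `spawn("agentic-synth", args)` from `node:child_process`, which sidesteps shell quoting problems with paths that contain spaces.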

Examples

Basic generation (10 records):

agentic-synth generate

Generate with custom count:

agentic-synth generate --count 100

Generate with schema:

agentic-synth generate --schema examples/user-schema.json --count 50

Generate to file:

agentic-synth generate --schema examples/user-schema.json --output data/users.json --count 100

Generate with seed for reproducibility:

agentic-synth generate --schema examples/user-schema.json --seed 12345 --count 20

Use OpenRouter provider:

agentic-synth generate --provider openrouter --model anthropic/claude-3.5-sonnet --count 30

Use config file:

agentic-synth generate --config config.json --schema examples/user-schema.json --count 50

Sample Schema

Create a JSON schema file (e.g., user-schema.json):

{
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique user identifier (UUID)"
    },
    "name": {
      "type": "string",
      "description": "Full name of the user"
    },
    "email": {
      "type": "string",
      "format": "email",
      "description": "Valid email address"
    },
    "age": {
      "type": "number",
      "minimum": 18,
      "maximum": 100,
      "description": "User age between 18 and 100"
    },
    "role": {
      "type": "string",
      "enum": ["admin", "user", "moderator"],
      "description": "User role in the system"
    }
  },
  "required": ["id", "name", "email"]
}
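Because the records come from a model, it may be worth spot-checking the output against the constraints in the schema. The following is a hand-rolled sketch for this sample schema only (required fields, the age range, and the role enum); it is not a general JSON Schema validator and does not check the email format. The names `checkUser`, `REQUIRED_FIELDS`, and `ALLOWED_ROLES` are hypothetical.

```typescript
type UserRecord = Record<string, unknown>;

// Constraints copied from the sample schema above.
const REQUIRED_FIELDS: string[] = ["id", "name", "email"];
const ALLOWED_ROLES: string[] = ["admin", "user", "moderator"];

// Returns a list of problems; an empty list means the record passed.
function checkUser(record: UserRecord): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_FIELDS) {
    if (typeof record[key] !== "string") {
      problems.push(key + ": required string is missing");
    }
  }
  // age is optional, but when present it must be a number in [18, 100].
  const age = record["age"];
  if (age !== undefined && (typeof age !== "number" || age < 18 || age > 100)) {
    problems.push("age: must be a number between 18 and 100");
  }
  // role is optional, but when present it must be one of the enum values.
  const role = record["role"];
  if (role !== undefined && (typeof role !== "string" || ALLOWED_ROLES.indexOf(role) === -1)) {
    problems.push("role: must be one of admin, user, moderator");
  }
  return problems;
}
```

For anything beyond a quick check, a full JSON Schema validator is the better choice, since it will also handle `format` and nested structures.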

Display Configuration

View current configuration settings.

agentic-synth config [options]

Options

  • -f, --file <path> - Load and display config from file
  • -t, --test - Test configuration by initializing AgenticSynth

Examples

Show default configuration:

agentic-synth config

Load and display config file:

agentic-synth config --file config.json

Test configuration:

agentic-synth config --test

Validate Configuration

Validate configuration and dependencies.

agentic-synth validate [options]

Options

  • -f, --file <path> - Config file path to validate

Examples

Validate default configuration:

agentic-synth validate

Validate config file:

agentic-synth validate --file config.json

Output Format

JSON Output (default)

[
  {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 32,
    "role": "user"
  },
  {
    "id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "name": "Jane Smith",
    "email": "jane.smith@example.com",
    "age": 28,
    "role": "admin"
  }
]
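If a JSON file has already been generated and a CSV copy is needed later, the array can be flattened in a few lines. This is a post-processing sketch only, not how the CLI's own `--format csv` is implemented; `toCsv` and `escapeCell` are hypothetical helpers, and nested objects are simply stringified.

```typescript
type Row = Record<string, unknown>;

// Quote a cell when it contains a comma, a double quote, or a line break.
function escapeCell(value: unknown): string {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Column order follows the first appearance of each key across all rows,
// so records with missing optional fields still line up.
function toCsv(rows: Row[]): string {
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const lines: string[] = [columns.map(escapeCell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeCell(row[c])).join(","));
  }
  return lines.join("\n");
}
```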

Metadata

The CLI displays metadata after generation:

Metadata:
  Provider: gemini
  Model: gemini-2.0-flash-exp
  Cached: false
  Duration: 1247ms
  Generated: 2025-11-22T10:30:45.123Z
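For logging or benchmarking, the metadata block can be turned into a key/value object. The sketch below assumes the output has exactly the `Key: value` layout shown above; `parseMetadata` is a hypothetical helper, and whether the CLI writes this block to stdout or stderr is not stated here.

```typescript
// Parse "Key: value" lines. Split on the first colon only, so that
// timestamps such as 2025-11-22T10:30:45.123Z stay intact.
function parseMetadata(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    // Lines with an empty value (such as the "Metadata:" header) are skipped.
    if (key.length > 0 && value.length > 0) result[key] = value;
  }
  return result;
}
```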

Error Handling

The CLI reports common failures with an explicit message:

# Missing schema file
agentic-synth generate --schema missing.json
# Error: Schema file not found: missing.json

# Invalid count
agentic-synth generate --count -5
# Error: Count must be a positive integer

# Missing API key
agentic-synth generate
# Error: API key not found. Set GEMINI_API_KEY or OPENROUTER_API_KEY environment variable
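Wrapper scripts can catch the invalid-count case before the CLI is invoked at all. The sketch below mirrors the documented error message, but it is an illustration of the check, not the CLI's actual implementation; `parseCount` is a hypothetical helper.

```typescript
// Accept only strings made of digits that parse to an integer >= 1.
// Values such as "-5", "0", "2.5", or "ten" are rejected.
function parseCount(raw: string): number {
  const text = raw.trim();
  if (!/^[0-9]+$/.test(text)) {
    throw new Error("Count must be a positive integer");
  }
  const value = parseInt(text, 10);
  if (value < 1) {
    throw new Error("Count must be a positive integer");
  }
  return value;
}
```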

Debug Mode

Enable debug mode for detailed error information:

DEBUG=1 agentic-synth generate --schema examples/user-schema.json

Common Workflows

1. Quick Test Generation

agentic-synth generate --count 5

2. Production Data Generation

agentic-synth generate \
  --schema schemas/product-schema.json \
  --output data/products.json \
  --count 1000 \
  --seed 42 \
  --provider gemini

3. Multiple Datasets

# Users
agentic-synth generate --schema schemas/user.json --output data/users.json --count 100

# Products
agentic-synth generate --schema schemas/product.json --output data/products.json --count 500

# Orders
agentic-synth generate --schema schemas/order.json --output data/orders.json --count 200

4. Reproducible Generation

# Generate with same seed for consistent results
agentic-synth generate --schema examples/user-schema.json --seed 12345 --count 50 --output data/users-v1.json
agentic-synth generate --schema examples/user-schema.json --seed 12345 --count 50 --output data/users-v2.json

# Both files should contain identical data
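Whether two seeded runs really match can depend on the provider and model, so it is safer to verify than to assume. The sketch below compares two parsed JSON datasets while ignoring object key order; `canonical` and `sameDataset` are hypothetical helpers, and record order within the array is still treated as significant.

```typescript
// Serialize with object keys sorted, so key order does not affect comparison.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonical(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonical(obj[k])).join(",") + "}";
  }
  // Strings, numbers, booleans, and null.
  return JSON.stringify(value);
}

// True when both datasets hold the same records in the same order.
function sameDataset(a: unknown, b: unknown): boolean {
  return canonical(a) === canonical(b);
}
```

In practice you would read data/users-v1.json and data/users-v2.json, run `JSON.parse` on each, and pass the results to `sameDataset`.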

Tips & Best Practices

  1. Use schemas - Provide detailed JSON schemas for better-quality data
  2. Set seeds - Use --seed for reproducible results in testing
  3. Start small - Test with small counts before generating large datasets
  4. Cache strategy - Configure caching to improve performance for repeated generations
  5. Provider selection - Choose the appropriate provider based on your needs:
    • Gemini: Fast, cost-effective, good for structured data
    • OpenRouter: Access to multiple models including Claude, GPT-4, etc.

Troubleshooting

Command not found

# Install globally so the command is available on your PATH
npm install -g agentic-synth

# If locally installed, use npx
npx agentic-synth generate

API Key Issues

# Verify environment variables
agentic-synth config

# Check that the output shows:
# Environment Variables:
#   GEMINI_API_KEY: ✓ Set

Build Issues

# Rebuild the package
cd packages/agentic-synth
npm run build

API Integration

The CLI uses the same API as the programmatic interface. For advanced usage, see the API documentation.

Support