
Commands

PipeGen provides a set of commands for creating, managing, and monitoring streaming data pipelines. This page gives an overview of each command, common flag combinations, and typical workflows.

Command Overview

Command | Purpose | Use Case
init | Create a new pipeline project | Project scaffolding, AI generation
run | Execute the complete pipeline | Testing, load testing, production runs
deploy | Start the local development stack | Local development environment
validate | Validate project structure | Pre-deployment checks
check | Check AI provider setup | AI configuration validation
clean | Clean up Docker resources | Free up system resources

Quick Reference

Project Creation

bash
# Basic project
pipegen init my-pipeline

# AI-generated project
pipegen init fraud-detection --describe "Monitor transactions for fraud"

# Custom schema project  
pipegen init ecommerce --input-schema ./events.avsc

# Schema + AI description (ground AI with your schema)
pipegen init orders --input-schema ./schemas/input.avsc \
  --describe "Join orders with customers, compute daily GMV and top-5 products"

Pipeline Execution

bash
# Basic run
pipegen run

# With traffic patterns
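# Each start-end:percent segment adjusts the message rate over that time window (e.g. 300% between minutes 1 and 2)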
pipegen run --traffic-pattern "1m-2m:300%,4m-5m:200%"

# With automatic HTML report generation
pipegen run --reports-dir ./my-reports

# Smart consumer stopping (stops after expected messages)
pipegen run --expected-messages 1000 --duration 5m

# Separate producer and overall timeouts
pipegen run --duration 2m --timeout 10m

# Dry run (preview only)
pipegen run --dry-run

Development Environment

bash
# Deploy local stack
pipegen deploy

# Deploy with custom timeout
pipegen deploy --startup-timeout 3m

# Clean deploy (remove existing containers)
pipegen deploy --clean

Monitoring & Validation

bash
# Standalone dashboard
pipegen dashboard --standalone

# Validate project
pipegen validate --check-connectivity

# Check AI setup
pipegen check

Cleaning Up

bash
# Basic cleanup
pipegen clean

# Cleanup with volume removal
pipegen clean --volumes

# Force cleanup without prompt
pipegen clean --force

Global Flags

These flags are available for all commands:

Flag | Default | Description
--bootstrap-servers | localhost:9093 | Kafka bootstrap servers
--flink-url | http://localhost:8081 | Flink Job Manager URL
--schema-registry-url | http://localhost:8082 | Schema Registry URL
--config | $HOME/.pipegen.yaml | Configuration file path
--local-mode | true | Use local development mode

Examples with Global Flags

bash
# Connect to cloud Kafka cluster
pipegen run --bootstrap-servers "pkc-4nym6.us-east-1.aws.confluent.cloud:9092"

# Use custom Flink cluster
pipegen run --flink-url "https://my-flink.company.com:8081"

# Use custom config file
pipegen --config ./my-config.yaml run

Environment Variables

You can also set configuration using environment variables:

bash
export PIPEGEN_BOOTSTRAP_SERVERS="localhost:9093"
export PIPEGEN_FLINK_URL="http://localhost:8081"
export PIPEGEN_SCHEMA_REGISTRY_URL="http://localhost:8082"

# For AI features
export PIPEGEN_OLLAMA_MODEL="llama3.1"
export PIPEGEN_OPENAI_API_KEY="your-key"

Common Workflows

🚀 New Project Workflow

bash
# 1. Create project
pipegen init my-analytics --describe "User behavior analytics"

# 2. Deploy local environment  
cd my-analytics
pipegen deploy

# 3. Validate setup
pipegen validate --check-connectivity

# 4. Run with monitoring
pipegen run --dashboard

🧪 Testing Workflow

bash
# 1. Quick validation
pipegen run --dry-run

# 2. Basic test
pipegen run --message-rate 10 --duration 1m

# 3. Smart stopping test (stops early when done)
pipegen run --message-rate 50 --expected-messages 500 --timeout 5m

# 4. Load test with patterns
pipegen run --message-rate 100 --duration 5m \
  --traffic-pattern "1m-2m:300%,3m-4m:200%"

# 5. Generate comprehensive report
pipegen run --message-rate 50 --duration 10m \
  --dashboard --generate-report

🔍 Debugging Workflow

bash
# 1. Check configuration
pipegen check

# 2. Validate project
pipegen validate --check-connectivity

# 3. Start monitoring dashboard
pipegen dashboard --standalone

# 4. Run with detailed logging
pipegen run --dashboard --message-rate 1 --duration 5m

🏢 Production Workflow

bash
# 1. Generate production-ready project
pipegen init production-pipeline --describe "Real-time order processing"

# 2. Validate against cloud services
pipegen validate --check-connectivity \
  --bootstrap-servers "your-kafka-cluster:9092"

# 3. Load test with realistic patterns
pipegen run --message-rate 200 --duration 30m \
  --traffic-pattern "5m-15m:300%,20m-25m:400%" \
  --dashboard --generate-report

# 4. Deploy SQL to production Flink
# (Manual step using generated SQL files)

Configuration Hierarchy

PipeGen uses the following configuration priority (highest to lowest):

  1. Command-line flags
  2. Environment variables
  3. Project .pipegen.yaml
  4. Global ~/.pipegen.yaml
  5. Default values
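
For example, a command-line flag takes priority over an environment variable for the same setting. A minimal sketch using the bootstrap servers setting (the host names are placeholders):

bash
# Env var alone would apply...
export PIPEGEN_BOOTSTRAP_SERVERS="kafka-env:9092"

# ...but the flag wins, so kafka-flag:9092 is used for this run
pipegen run --bootstrap-servers "kafka-flag:9092"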

Example Configuration File

yaml
# ~/.pipegen.yaml or .pipegen.yaml
bootstrap_servers: "localhost:9093"
flink_url: "http://localhost:8081"
schema_registry_url: "http://localhost:8082"
local_mode: true

# Default pipeline settings
default_message_rate: 100
default_duration: "5m" 
topic_prefix: "pipegen"
cleanup_on_exit: true

# AI settings
ollama_model: "llama3.1"
ollama_url: "http://localhost:11434"

Command Chaining

Some commands work well together:

bash
# Create, validate, and run
pipegen init my-pipeline --describe "IoT sensor processing" && \
cd my-pipeline && \
pipegen validate && \
pipegen deploy && \
pipegen run --dashboard

# Test multiple scenarios
pipegen run --message-rate 50 --duration 2m && \
pipegen run --message-rate 100 --duration 2m && \
pipegen run --message-rate 200 --duration 2m
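
The rate sweep above can also be written as a loop. A minimal sketch reusing the same flags, stopping at the first failing run:

bash
# Sweep message rates, stopping if a run fails
for rate in 50 100 200; do
  pipegen run --message-rate "$rate" --duration 2m || break
done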

Error Handling

PipeGen provides clear error messages and suggestions:

bash
# Missing configuration
❌ Error: missing required configuration: bootstrap_servers
💡 Tip: Set via --bootstrap-servers flag or PIPEGEN_BOOTSTRAP_SERVERS env var

# Invalid traffic pattern
❌ Error: invalid traffic pattern: patterns overlap
💡 Tip: Ensure traffic patterns don't overlap in time

# Connection issues
❌ Error: failed to connect to Kafka at localhost:9093
💡 Tip: Ensure Kafka is running via 'pipegen deploy'
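
Assuming the usual convention of non-zero exit codes on failure (the same behavior the chained examples above rely on), scripts can react to these errors. A minimal sketch:

bash
if ! pipegen validate --check-connectivity; then
  echo "Validation failed - is the local stack running? Try 'pipegen deploy'." >&2
  exit 1
fi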

Getting Help

Each command has built-in help:

bash
# General help
pipegen --help

# Command-specific help
pipegen run --help
pipegen init --help  
pipegen deploy --help

Next Steps

Explore the detailed documentation page for each command.
