Airbyte Worker Toolkit

Analyze, estimate, and optimize Airbyte Data Worker usage for customers and prospects.

What It Does

The Worker Toolkit helps Account Executives (AEs) and Solutions Engineers (SEs) answer one question: "How many Data Workers does this customer need?"

It provides four workflows:

Mode 1 -- Analyze current customer usage
  Input:  Org ID or Workspace ID
  Output: P99 workers, hourly heatmap, connection inventory

Mode 2 -- Estimate prospect needs
  Input:  Questionnaire or export
  Output: Worker estimate with ranges

Mode 3 -- Optimize schedules
  Input:  Org ID + contracted workers
  Output: Cron reschedule recommendations to reduce peak

Mode 4 -- Generate PDF reports
  Input:  Customer name or org ID
  Output: Branded PDF (scheduling optimization or capacity analysis)

Worker Model

All plans use the same universal Data Worker model:

Workers = (Concurrent API connections / 5) + (Concurrent DB connections / 2)
  • API connectors (Salesforce, GitHub, Jira, etc.): 5 per worker
  • DB/File connectors (Postgres, S3, Snowflake, etc.): 2 per worker
  • Capacity is combinable -- API and DB share the same workers
  • Billing uses P99 (99th percentile) of hourly peak workers
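
As a minimal illustration of this formula (not the toolkit's worker_calculator module; the function and constant names below are hypothetical):

# Sketch only: worker demand for a given moment of concurrency.
import math

API_PER_WORKER = 5   # API connectors per worker (Salesforce, GitHub, Jira, ...)
DB_PER_WORKER = 2    # DB/File connectors per worker (Postgres, S3, Snowflake, ...)

def workers_needed(concurrent_api: int, concurrent_db: int) -> float:
    """Fractional worker demand when these syncs run at the same time."""
    return concurrent_api / API_PER_WORKER + concurrent_db / DB_PER_WORKER

demand = workers_needed(concurrent_api=7, concurrent_db=3)   # 7/5 + 3/2 = 2.9
print(f"{demand:.2f} workers ({math.ceil(demand)} to cover the peak)")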

Installation

Prerequisites

  • Python 3.11+
  • Claude Code CLI with MCP tools configured:
    • PyAirbyte MCP (for Airbyte Cloud API access)
    • Metabase MCP (for billing meter data -- optional but recommended)
  • Airbyte Cloud API credentials (AIRBYTE_CLOUD_CLIENT_ID, AIRBYTE_CLOUD_CLIENT_SECRET)

Setup

# Navigate to the worker-toolkit directory
cd worker-toolkit

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your Airbyte Cloud API credentials

Environment Variables

Create a .env file with:

AIRBYTE_CLOUD_CLIENT_ID=your_client_id
AIRBYTE_CLOUD_CLIENT_SECRET=your_client_secret
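
For illustration, a script could pick these credentials up with python-dotenv (a sketch; the toolkit's own loading code may differ):

# Sketch only: load credentials from .env into the process environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

client_id = os.environ["AIRBYTE_CLOUD_CLIENT_ID"]
client_secret = os.environ["AIRBYTE_CLOUD_CLIENT_SECRET"]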

Usage

With Claude Code (Recommended)

The primary interface is the /workers slash command in Claude Code:

/workers

This activates the Worker Toolkit agent, which guides you through all four modes interactively, using MCP tools to fetch data from Airbyte Cloud and Metabase.

Claude Code Setup

  1. Copy the command file to your Claude Code project:

    mkdir -p .claude/commands
    cp <path-to-worker-toolkit>/.claude/commands/workers.md .claude/commands/
  2. Ensure your Claude Code has MCP servers configured:

    • PyAirbyte MCP -- for mcp__pyairbyte__* tools
    • Metabase MCP -- for mcp__metabase__execute_query (database_id=2, BigQuery)
  3. Run /workers in Claude Code and follow the prompts.

With Interactive CLI

source .venv/bin/activate
python3 worker_optimizer.py

This launches an interactive menu with arrow-key navigation for all four modes.

With CLI Flags (Mode 4 Reports)

# Generate report for one customer
source .env && python3 generate_scheduling_reports.py --use-airbyte-api --customers "CustomerName"

# Generate reports for all over-utilized customers
source .env && python3 generate_scheduling_reports.py --use-airbyte-api --over-utilized-only

# List all customers with utilization status
python3 generate_scheduling_reports.py --dry-run

Modes in Detail

Mode 1: Calculate Current Usage

Analyzes actual worker usage for an existing customer.

Data Sources (priority order):

  1. Metabase billing meter (authoritative) -- queries the organization_data_worker_usage_daily and workspace_data_worker_usage_hourly tables
  2. Airbyte Cloud API (fallback) -- fetches job history and calculates concurrency overlap

What it produces:

  • P99 worker usage (the billing metric)
  • 30-day daily worker trend
  • Hourly peak/quiet hour pattern
  • Connection inventory with API/DB classification
  • Utilization percentage vs contracted workers
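
To make the API fallback concrete, here is a rough sketch of the minute-by-minute overlap analysis and the P99 roll-up (illustration only; the actual job_overlap_analyzer.py and worker_calculator.py differ in detail):

# Sketch only: per-minute concurrency -> hourly peaks -> P99.
from collections import defaultdict
from datetime import datetime, timedelta

API_PER_WORKER, DB_PER_WORKER = 5, 2

def p99_workers(jobs):
    """jobs: list of (start, end, kind) tuples, where kind is 'api' or 'db'."""
    # 1. Count concurrent jobs per minute, split by connector kind.
    per_minute = defaultdict(lambda: {"api": 0, "db": 0})
    for start, end, kind in jobs:
        t = start.replace(second=0, microsecond=0)
        while t < end:
            per_minute[t][kind] += 1
            t += timedelta(minutes=1)

    # 2. Convert concurrency to worker demand; keep the peak of each hour.
    hourly_peaks = defaultdict(float)
    for minute, counts in per_minute.items():
        demand = counts["api"] / API_PER_WORKER + counts["db"] / DB_PER_WORKER
        hour = minute.replace(minute=0)
        hourly_peaks[hour] = max(hourly_peaks[hour], demand)

    # 3. P99 of the hourly peaks approximates the billing metric.
    peaks = sorted(hourly_peaks.values())
    return peaks[min(len(peaks) - 1, int(0.99 * len(peaks)))]

jobs = [
    (datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 1, 15, 45), "api"),
    (datetime(2024, 1, 1, 15, 10), datetime(2024, 1, 1, 16, 5), "db"),
]
print(p99_workers(jobs))   # 0.7 -> 1/5 + 1/2 during the overlap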

Mode 2: Estimate Prospect Needs

Three sub-modes for different prospect scenarios:

  • 2A: Questionnaire -- for a new prospect with no Airbyte usage; accuracy: estimate (statistical model)
  • 2B: Cloud/OSS Export -- for a prospect with an existing Airbyte deployment; accuracy: high (actual job data)
  • 2C: Quick Estimate -- for ballpark pricing discussions; accuracy: low-medium (ranges)
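
As an illustration of a quick ranged estimate in the spirit of Mode 2C, here is a hypothetical sketch (the concurrency fractions are assumptions, not the toolkit's statistical model):

# Sketch only: ballpark worker range from connection counts.
API_PER_WORKER, DB_PER_WORKER = 5, 2

def quick_estimate(api_connections: int, db_connections: int,
                   low_concurrency: float = 0.3, high_concurrency: float = 0.7):
    """(low, high) worker range, assuming a fraction of connections
    sync simultaneously during the busiest hour."""
    low = (api_connections * low_concurrency / API_PER_WORKER
           + db_connections * low_concurrency / DB_PER_WORKER)
    high = (api_connections * high_concurrency / API_PER_WORKER
            + db_connections * high_concurrency / DB_PER_WORKER)
    return round(low, 1), round(high, 1)

print(quick_estimate(api_connections=20, db_connections=10))   # (2.7, 6.3)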

Mode 3: Optimize Worker Usage

Finds the minimum set of schedule changes needed to reduce peak worker usage.

How it works:

  1. Fetches Metabase billing data for ground truth peak hours
  2. Fetches connection details and job durations from Airbyte API
  3. Runs a greedy minimum-move optimization algorithm
  4. Outputs specific Quartz cron expressions for rescheduled connections
  5. Shows before/after comparison

Example output:

3 moves needed to reduce peak from 1.50 to 0.90 workers:
  1. demandbase_jira     15:00 -> 00:00  (0 0 0 * * ?)  8h52m
  2. shield_hubspot_OP   15:00 -> 00:00  (0 0 0 * * ?)  1h32m
  3. shield_hubspot_NYC  15:00 -> 00:00  (0 0 0 * * ?)  1h31m
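
The real optimizer weighs job durations and Metabase peak-hour data; as a rough illustration of the greedy minimum-move idea and the Quartz cron output, here is a simplified sketch (the hour-bucket model and function names are hypothetical, not the actual optimize_worker_usage_fast.py or cron_generator.py):

# Sketch only: move connections out of the peak hour until the target is met.
API_PER_WORKER, DB_PER_WORKER = 5, 2
PER_WORKER = {"api": API_PER_WORKER, "db": DB_PER_WORKER}

def hourly_demand(connections):
    """connections: {name: {"hour": 0-23, "kind": "api"|"db"}} -> demand per hour."""
    demand = [0.0] * 24
    for conn in connections.values():
        demand[conn["hour"]] += 1 / PER_WORKER[conn["kind"]]
    return demand

def quartz_cron(hour: int) -> str:
    """Daily Quartz cron at the top of the given hour, e.g. '0 0 0 * * ?'."""
    return f"0 0 {hour} * * ?"

def greedy_reschedule(connections, target_peak):
    """Move one connection at a time from the peak hour to the quietest hour
    until the peak drops to the target (or no move helps)."""
    moves = []
    while True:
        demand = hourly_demand(connections)
        peak_hour = max(range(24), key=lambda h: demand[h])
        if demand[peak_hour] <= target_peak:
            break
        quiet_hour = min(range(24), key=lambda h: demand[h])
        candidates = [n for n, c in connections.items() if c["hour"] == peak_hour]
        # Move the connection that frees the most capacity in the peak hour.
        name = max(candidates, key=lambda n: 1 / PER_WORKER[connections[n]["kind"]])
        freed = 1 / PER_WORKER[connections[name]["kind"]]
        if demand[quiet_hour] + freed >= demand[peak_hour]:
            break  # no remaining move can lower the peak
        connections[name]["hour"] = quiet_hour
        moves.append((name, peak_hour, quiet_hour, quartz_cron(quiet_hour)))
    return moves

Each returned move pairs a connection with its new hour and a daily Quartz cron string, matching the shape of the example output above.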

Mode 4: Generate PDF Reports

Produces branded Airbyte PDF reports for customers and auto-selects the report type based on utilization:

  • >= 85% utilization -- Scheduling Optimization report: peak hours, connections to reschedule, cron recommendations
  • < 85% utilization -- Capacity Analysis report: headroom, growth potential, expansion recommendations

Sub-modes:

  • 4A: Single customer report (fuzzy name search)
  • 4B: Batch reports (filter by utilization)
  • 4C: List available customers
  • 4D: Non-Metabase customers (Cloud Sales Assist, Free tier, trials)

Project Structure

worker-toolkit/
├── worker_optimizer.py              # Interactive CLI menu (all modes)
├── generate_scheduling_reports.py   # Mode 4 report generator (CLI)
├── requirements.txt                 # Python dependencies
├── .claude/commands/workers.md      # Claude Code /workers command
├── src/                             # Core modules
│   ├── config.py                    # Worker model config (API/5 + DB/2)
│   ├── worker_calculator.py         # Worker calculation engine
│   ├── connector_classifier.py      # API vs DB connector classification
│   ├── job_overlap_analyzer.py      # Minute-by-minute concurrency analysis
│   ├── metabase_billing_data.py     # Metabase billing meter queries
│   ├── metabase_worker_data.py      # Metabase node_calculator queries
│   ├── airbyte_cloud_data.py        # Org mappings + API integration
│   ├── scheduling_optimization_report.py  # PDF report (over-utilized)
│   ├── underutilization_report.py   # PDF report (under-utilized)
│   ├── non_metabase_customer_report.py    # PDF report (non-Metabase)
│   ├── optimize_worker_usage_fast.py      # Fast optimization algorithm
│   ├── cron_generator.py            # Quartz cron expression utilities
│   └── ...                          # Additional utilities
├── mode2/                           # Prospect estimation scripts
│   ├── mode2_prospect_questionnaire.py    # Mode 2A
│   ├── mode2_analyze_oss_export.py        # Mode 2B
│   ├── mode2_custom_estimate.py           # Mode 2C
│   └── mode2_oss_export_script.py         # Export script for prospects
├── tests/                           # Test suite
└── examples/                        # Example reports and templates

Key Metabase Tables

When the Metabase MCP is available, the toolkit queries these BigQuery tables (database_id=2):

  • airbyte_warehouse.organization_data_worker_usage_daily -- daily peak workers (P99 billing)
  • airbyte_warehouse.workspace_data_worker_usage_hourly -- hourly worker patterns (peak hours)
  • airbyte_warehouse.account -- account metadata (name, ARR, owner)
  • airbyte_warehouse.organization -- org metadata (plan, connections, lifecycle)
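
For reference, a hedged sketch of the kind of query that might be sent through mcp__metabase__execute_query; the table name comes from the list above, but the column names are assumptions and may not match the actual warehouse schema:

# Sketch only: 30-day daily worker usage for one organization.
P99_QUERY = """
SELECT
  usage_date,                       -- assumed column name
  peak_data_workers                 -- assumed column name
FROM airbyte_warehouse.organization_data_worker_usage_daily
WHERE organization_id = @org_id     -- assumed column name
  AND usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY usage_date
"""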

Contributing

  1. All plans use the same worker model -- do not add plan-specific logic
  2. Metabase billing data is the authoritative source for worker usage
  3. Use job overlap analysis (not connection counts) for concurrency calculations
  4. Customer-specific scripts go in FinalReports/{CustomerName}/, not in src/
