Analyze, estimate, and optimize Airbyte Data Worker usage for customers and prospects.
The Worker Toolkit helps AEs and SEs answer the question: "How many Data Workers does this customer need?"
It provides four workflows:
| Mode | Use Case | Input | Output |
|---|---|---|---|
| Mode 1 | Analyze current customer usage | Org ID or Workspace ID | P99 workers, hourly heatmap, connection inventory |
| Mode 2 | Estimate prospect needs | Questionnaire or export | Worker estimate with ranges |
| Mode 3 | Optimize schedules | Org ID + contracted workers | Cron reschedule recommendations to reduce peak |
| Mode 4 | Generate PDF reports | Customer name or org ID | Branded PDF (scheduling optimization or capacity analysis) |
All plans use the same universal Data Worker model:
`Workers = (Concurrent API connections / 5) + (Concurrent DB connections / 2)`
- API connectors (Salesforce, GitHub, Jira, etc.): 5 per worker
- DB/File connectors (Postgres, S3, Snowflake, etc.): 2 per worker
- Capacity is combinable -- API and DB share the same workers
- Billing uses P99 (99th percentile) of hourly peak workers
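For illustration, here is a minimal sketch of this model in Python (constant and function names are illustrative, not the toolkit's actual `worker_calculator.py` API):

```python
# Minimal sketch of the universal Data Worker model (illustrative names only).
API_CONNECTIONS_PER_WORKER = 5   # Salesforce, GitHub, Jira, ...
DB_CONNECTIONS_PER_WORKER = 2    # Postgres, S3, Snowflake, ...

def workers_needed(concurrent_api: int, concurrent_db: int) -> float:
    """Workers = API/5 + DB/2; capacity is shared across connector types."""
    return (concurrent_api / API_CONNECTIONS_PER_WORKER
            + concurrent_db / DB_CONNECTIONS_PER_WORKER)

# Example: 7 API syncs and 3 DB syncs running at the same moment
print(workers_needed(7, 3))  # 7/5 + 3/2 = 2.9 workers
```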
Prerequisites:
- Python 3.11+
- Claude Code CLI with MCP tools configured:
  - PyAirbyte MCP (for Airbyte Cloud API access)
  - Metabase MCP (for billing meter data -- optional but recommended)
- Airbyte Cloud API credentials (`AIRBYTE_CLOUD_CLIENT_ID`, `AIRBYTE_CLOUD_CLIENT_SECRET`)
```bash
# Navigate to the worker-toolkit directory
cd worker-toolkit

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your Airbyte Cloud API credentials
```

Create a `.env` file with:

```
AIRBYTE_CLOUD_CLIENT_ID=your_client_id
AIRBYTE_CLOUD_CLIENT_SECRET=your_client_secret
```

The primary interface is the `/workers` slash command in Claude Code:

```
/workers
```
This activates the Worker Toolkit agent which guides you through all four modes interactively, using MCP tools to fetch data from Airbyte Cloud and Metabase.
To set up the slash command:

- Copy the command file to your Claude Code project:

  ```bash
  mkdir -p .claude/commands
  cp .claude/commands/workers.md .claude/commands/
  ```

- Ensure your Claude Code has MCP servers configured:
  - PyAirbyte MCP -- for `mcp__pyairbyte__*` tools
  - Metabase MCP -- for `mcp__metabase__execute_query` (database_id=2, BigQuery)

- Run `/workers` in Claude Code and follow the prompts.
Alternatively, run the interactive CLI directly:

```bash
source .venv/bin/activate
python3 worker_optimizer.py
```

This launches an interactive menu with arrow-key navigation for all four modes.
To generate Mode 4 PDF reports from the command line:

```bash
# Generate a report for one customer
source .env && python3 generate_scheduling_reports.py --use-airbyte-api --customers "CustomerName"

# Generate reports for all over-utilized customers
source .env && python3 generate_scheduling_reports.py --use-airbyte-api --over-utilized-only

# List all customers with utilization status
python3 generate_scheduling_reports.py --dry-run
```

Mode 1 analyzes actual worker usage for an existing customer.
Data Sources (priority order):
- Metabase billing meter (authoritative) -- queries the `organization_data_worker_usage_daily` and `workspace_data_worker_usage_hourly` tables
- Airbyte Cloud API (fallback) -- fetches job history and calculates concurrency overlap
What it produces:
- P99 worker usage (the billing metric; sketched after this list)
- 30-day daily worker trend
- Hourly peak/quiet hour pattern
- Connection inventory with API/DB classification
- Utilization percentage vs contracted workers
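As a rough illustration, the P99 billing metric can be thought of as the 99th percentile of hourly peak worker samples. A minimal sketch, assuming a plain list of hourly peaks (the authoritative computation lives in the Metabase billing meter, not this snippet):

```python
import math

def p99(hourly_peak_workers: list[float]) -> float:
    """99th percentile of hourly peak worker usage (nearest-rank method)."""
    if not hourly_peak_workers:
        return 0.0
    ordered = sorted(hourly_peak_workers)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank P99
    return ordered[rank - 1]

# Example: 30 days x 24 hours of hourly peaks, mostly quiet with a few spikes
samples = [0.4] * 700 + [1.2] * 15 + [1.5] * 5
print(p99(samples))  # -> 1.2; the rare 1.5 spikes fall above the 99th percentile
```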
Mode 2 offers three sub-modes for different prospect scenarios:
| Sub-mode | When to Use | Accuracy |
|---|---|---|
| 2A: Questionnaire | New prospect, no Airbyte usage | Estimate (statistical model) |
| 2B: Cloud/OSS Export | Prospect has existing Airbyte deployment | High (actual job data) |
| 2C: Quick Estimate | Ballpark for pricing discussions | Low-Medium (ranges) |
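For the questionnaire sub-mode (2A), one plausible way to turn answers into a worker estimate is to derive expected concurrency from connection counts, sync durations, and sync frequency. The sketch below is hypothetical and not necessarily the statistical model used by `mode2_prospect_questionnaire.py`:

```python
# Hypothetical questionnaire-style estimate (Mode 2A). The real model in
# mode2_prospect_questionnaire.py may weight these inputs differently.
def expected_concurrency(n_connections: int, avg_sync_minutes: float,
                         syncs_per_day: float) -> float:
    """Expected simultaneously running syncs, assuming runs spread evenly over the day."""
    busy_fraction = (avg_sync_minutes / 60) * syncs_per_day / 24
    return n_connections * busy_fraction

api_conc = expected_concurrency(n_connections=40, avg_sync_minutes=20, syncs_per_day=4)
db_conc = expected_concurrency(n_connections=10, avg_sync_minutes=45, syncs_per_day=24)
workers = api_conc / 5 + db_conc / 2  # universal Data Worker model
print(f"~{workers:.1f} workers -- quote a range to cover peak overlap")
```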
Mode 3 finds the minimum schedule changes needed to reduce peak worker usage.
How it works:
- Fetches Metabase billing data for ground truth peak hours
- Fetches connection details and job durations from Airbyte API
- Runs a greedy minimum-move optimization algorithm (simplified sketch after this list)
- Outputs specific Quartz cron expressions for rescheduled connections
- Shows before/after comparison
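A simplified sketch of the greedy minimum-move idea, assuming each connection contributes a fixed worker load at a single scheduled hour (the real `optimize_worker_usage_fast.py` accounts for job durations and minute-level overlap; names here are illustrative):

```python
# Greedy "minimum moves" sketch: repeatedly move the largest contributor at the
# current peak hour into the quietest hour until the target peak is reached.
def greedy_reschedule(loads: dict[str, tuple[int, float]], target_peak: float):
    """loads: {connection: (scheduled_hour, worker_load)} -> list of (conn, from_hour, to_hour)."""
    moves = []
    for _ in range(len(loads)):  # at most one move per connection
        hourly = [sum(l for h, l in loads.values() if h == hr) for hr in range(24)]
        if max(hourly) <= target_peak:
            break  # peak target reached
        peak_hour = hourly.index(max(hourly))
        quiet_hour = hourly.index(min(hourly))
        conn = max((c for c, (h, _) in loads.items() if h == peak_hour),
                   key=lambda c: loads[c][1])
        loads[conn] = (quiet_hour, loads[conn][1])
        moves.append((conn, peak_hour, quiet_hour))
    return moves

loads = {"jira": (15, 0.2), "hubspot_a": (15, 0.2), "hubspot_b": (15, 0.2), "postgres": (3, 0.5)}
print(greedy_reschedule(loads, target_peak=0.5))  # -> [('jira', 15, 0)]
```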
Example output:

```
3 moves needed to reduce peak from 1.50 to 0.90 workers:
1. demandbase_jira    15:00 -> 00:00  (0 0 0 * * ?)  8h52m
2. shield_hubspot_OP  15:00 -> 00:00  (0 0 0 * * ?)  1h32m
3. shield_hubspot_NYC 15:00 -> 00:00  (0 0 0 * * ?)  1h31m
```
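The rescheduled times use Quartz's six-field cron format (seconds, minutes, hours, day-of-month, month, day-of-week). A tiny illustrative helper (not necessarily `cron_generator.py`'s interface) showing how a target hour maps to such an expression:

```python
def daily_quartz_cron(hour: int, minute: int = 0) -> str:
    """Quartz cron firing once a day at the given hour, e.g. 0 -> '0 0 0 * * ?'."""
    return f"0 {minute} {hour} * * ?"

print(daily_quartz_cron(0))  # "0 0 0 * * ?" -- matches the midnight moves above
```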
Mode 4 produces branded Airbyte PDF reports for customers. It auto-selects the report type based on utilization:
| Utilization | Report Type | Content |
|---|---|---|
| >= 85% | Scheduling Optimization | Peak hours, connections to reschedule, cron recommendations |
| < 85% | Capacity Analysis | Headroom, growth potential, expansion recommendations |
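The selection boils down to a threshold check against contracted workers, roughly as in this sketch (the 85% threshold comes from the table above; names are illustrative):

```python
UTILIZATION_THRESHOLD = 0.85  # >= 85% utilized -> scheduling optimization report

def report_type(p99_workers: float, contracted_workers: float) -> str:
    utilization = p99_workers / contracted_workers
    return "scheduling_optimization" if utilization >= UTILIZATION_THRESHOLD else "capacity_analysis"

print(report_type(p99_workers=1.8, contracted_workers=2.0))  # 90% -> scheduling_optimization
```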
Sub-modes:
- 4A: Single customer report (fuzzy name search)
- 4B: Batch reports (filter by utilization)
- 4C: List available customers
- 4D: Non-Metabase customers (Cloud Sales Assist, Free tier, trials)
```
worker-toolkit/
├── worker_optimizer.py                    # Interactive CLI menu (all modes)
├── generate_scheduling_reports.py         # Mode 4 report generator (CLI)
├── requirements.txt                       # Python dependencies
├── .claude/commands/workers.md            # Claude Code /workers command
├── src/                                   # Core modules
│   ├── config.py                          # Worker model config (API/5 + DB/2)
│   ├── worker_calculator.py               # Worker calculation engine
│   ├── connector_classifier.py            # API vs DB connector classification
│   ├── job_overlap_analyzer.py            # Minute-by-minute concurrency analysis
│   ├── metabase_billing_data.py           # Metabase billing meter queries
│   ├── metabase_worker_data.py            # Metabase node_calculator queries
│   ├── airbyte_cloud_data.py              # Org mappings + API integration
│   ├── scheduling_optimization_report.py  # PDF report (over-utilized)
│   ├── underutilization_report.py         # PDF report (under-utilized)
│   ├── non_metabase_customer_report.py    # PDF report (non-Metabase)
│   ├── optimize_worker_usage_fast.py      # Fast optimization algorithm
│   ├── cron_generator.py                  # Quartz cron expression utilities
│   └── ...                                # Additional utilities
├── mode2/                                 # Prospect estimation scripts
│   ├── mode2_prospect_questionnaire.py    # Mode 2A
│   ├── mode2_analyze_oss_export.py        # Mode 2B
│   ├── mode2_custom_estimate.py           # Mode 2C
│   └── mode2_oss_export_script.py         # Export script for prospects
├── tests/                                 # Test suite
└── examples/                              # Example reports and templates
```
When the Metabase MCP is available, the toolkit queries these BigQuery tables (database_id=2):
| Table | Purpose |
|---|---|
| `airbyte_warehouse.organization_data_worker_usage_daily` | Daily peak workers (P99 billing) |
| `airbyte_warehouse.workspace_data_worker_usage_hourly` | Hourly worker patterns (peak hours) |
| `airbyte_warehouse.account` | Account metadata (name, ARR, owner) |
| `airbyte_warehouse.organization` | Org metadata (plan, connections, lifecycle) |
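For reference, a query against the daily table might look like the sketch below; the column names (`organization_id`, `usage_date`, `peak_workers`) and the example org ID are assumptions for illustration only, so verify them against the real `airbyte_warehouse` schema before use:

```python
# Illustrative SQL for the daily billing table. Column names are assumed,
# not confirmed -- check the actual airbyte_warehouse schema.
ORG_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical organization id

daily_usage_sql = f"""
SELECT usage_date, peak_workers
FROM airbyte_warehouse.organization_data_worker_usage_daily
WHERE organization_id = '{ORG_ID}'
  AND usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY usage_date
"""
# Run through the Metabase MCP tool against database_id=2 (BigQuery), as noted above.
```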
- All plans use the same worker model -- do not add plan-specific logic
- Metabase billing data is the authoritative source for worker usage
- Use job overlap analysis (not connection counts) for concurrency calculations
- Customer-specific scripts go in `FinalReports/{CustomerName}/`, not in `src/`