Add gunicorn support for API server with rolling worker restarts #60940

Merged
kaxil merged 3 commits into apache:main from astronomer:gunicorn-api-server on Feb 4, 2026

@kaxil (Member) commented Jan 22, 2026

Related #60804 & #60919

This PR adds an optional gunicorn server type for the API server, providing:

  • Memory sharing: Gunicorn uses preload + fork, so workers share memory via copy-on-write (unlike uvicorn's multiprocess mode where each worker loads everything independently)
  • Rolling worker restarts: Custom Arbiter performs zero-downtime worker recycling to prevent memory accumulation
  • Proper signal handling: SIGTTOU kills oldest worker (FIFO), enabling true rolling restarts
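
The preload + fork idea from the first bullet can be illustrated with a minimal sketch that uses only `os.fork` (Unix-only). `load_app` and `run_preloaded_workers` are hypothetical names for illustration, not gunicorn or Airflow code. The parent loads the data once and each forked child reads it without reloading. Note that CPython's reference counting writes to object headers, so some pages that look read-only are still copied in practice.

```python
import os


def load_app():
    # Stand-in for the expensive application import done once in the parent.
    return {"routes": [f"/items/{i}" for i in range(1000)]}


def run_preloaded_workers(num_workers):
    """Load once, fork workers, and return what each worker saw."""
    app = load_app()  # "preload": happens before any fork
    children = []
    for _ in range(num_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: reads the inherited object, never calls load_app() again.
            os.close(read_fd)
            os.write(write_fd, str(len(app["routes"])).encode())
            os.close(write_fd)
            os._exit(0)  # skip parent cleanup handlers in the child
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)  # reap the child
    return results


if __name__ == "__main__":
    print(run_preloaded_workers(3))
```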

Memory Impact

With --preload, gunicorn loads the application once in the arbiter process, then forks workers. Workers share read-only memory pages via copy-on-write:

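| Configuration      | Estimated Memory Usage                                         |
|--------------------|----------------------------------------------------------------|
| Uvicorn 4 workers  | ~600 MB (4 × 150 MB, each loads independently)                 |
| Gunicorn 4 workers | ~300-350 MB (shared base + ~50 MB per worker for unique pages) |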

Savings: ~40-50% memory reduction with 4 workers. Benefits scale with worker count.
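
The estimate above can be reproduced with simple arithmetic. This is a rough sketch, not a measurement: the 150 MB and 50 MB figures come from the table, while the shared base of 100 to 150 MB is an assumption chosen to match the quoted 300-350 MB range. The function names are hypothetical.

```python
def uvicorn_total_mb(workers, per_worker_mb=150):
    # Each worker loads the whole application independently.
    return workers * per_worker_mb


def gunicorn_total_mb(workers, shared_base_mb, unique_per_worker_mb=50):
    # One shared base (assumed size) plus the pages unique to each worker.
    return shared_base_mb + workers * unique_per_worker_mb


def savings_fraction(workers, shared_base_mb):
    baseline = uvicorn_total_mb(workers)
    return 1 - gunicorn_total_mb(workers, shared_base_mb) / baseline


if __name__ == "__main__":
    for base in (100, 150):  # assumed shared-base sizes in MB
        total = gunicorn_total_mb(4, base)
        percent = round(savings_fraction(4, base) * 100)
        print(f"base={base} MB -> total={total} MB, savings={percent}%")
```

Under the same assumptions, 8 workers with a 100 MB base come to 500 MB against 1200 MB, which is why the relative savings grow with worker count.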

Usage

# Enable gunicorn mode
export AIRFLOW__API__SERVER_TYPE=gunicorn
export AIRFLOW__API__WORKER_REFRESH_INTERVAL=43200  # 12 hours

airflow api-server

Configuration

New [api] configuration options:

  • server_type: uvicorn (default) or gunicorn
  • worker_refresh_interval: Seconds between worker refresh cycles (0 = disabled)
  • worker_refresh_batch_size: Workers to refresh per cycle (default: 1)
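
A minimal sketch of how these three options could be read and sanity-checked from the environment. `ApiServerSettings` and `load_settings` are hypothetical helpers, not Airflow's configuration code. `AIRFLOW__API__WORKER_REFRESH_BATCH_SIZE` is assumed to follow the same naming pattern as the two variables shown under Usage, and the default of 0 for the refresh interval is an assumption.

```python
import os
from dataclasses import dataclass

VALID_SERVER_TYPES = ("uvicorn", "gunicorn")


@dataclass(frozen=True)
class ApiServerSettings:
    server_type: str
    worker_refresh_interval: int  # seconds, 0 = disabled
    worker_refresh_batch_size: int


def load_settings(env=None):
    """Read the [api] options from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = ApiServerSettings(
        server_type=env.get("AIRFLOW__API__SERVER_TYPE", "uvicorn"),
        # Default of 0 is an assumption, not taken from the PR.
        worker_refresh_interval=int(
            env.get("AIRFLOW__API__WORKER_REFRESH_INTERVAL", "0")
        ),
        worker_refresh_batch_size=int(
            env.get("AIRFLOW__API__WORKER_REFRESH_BATCH_SIZE", "1")
        ),
    )
    if settings.server_type not in VALID_SERVER_TYPES:
        raise ValueError(f"server_type must be one of {VALID_SERVER_TYPES}")
    if settings.worker_refresh_interval < 0:
        raise ValueError("worker_refresh_interval must be >= 0")
    if settings.worker_refresh_batch_size < 1:
        raise ValueError("worker_refresh_batch_size must be >= 1")
    return settings


if __name__ == "__main__":
    print(load_settings({
        "AIRFLOW__API__SERVER_TYPE": "gunicorn",
        "AIRFLOW__API__WORKER_REFRESH_INTERVAL": "43200",
    }))
```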

Architecture

┌─────────────────────────────────────────────────────────┐
│  airflow api-server (gunicorn via Python API)           │
│  └── AirflowArbiter (custom Arbiter with monitoring)    │
│      ├── worker 1 (UvicornWorker)                       │
│      ├── worker 2 (UvicornWorker)                       │
│      └── worker N (UvicornWorker)                       │
└─────────────────────────────────────────────────────────┘

Uses gunicorn's recommended extension pattern: custom AirflowArbiter subclass integrates worker monitoring directly into the arbiter loop via manage_workers(). No separate thread or subprocess needed.
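
The pattern described above, hooking a periodic check into the loop that already manages workers, can be sketched without gunicorn itself. `ToyArbiter` is a stand-in written for this example and is not gunicorn's `Arbiter`. `RefreshingArbiter`, the injected `clock`, and `refresh_count` are likewise hypothetical. The point is only that the subclass overrides `manage_workers()` and needs no extra thread.

```python
class ToyArbiter:
    """Stand-in for a process manager whose main loop calls manage_workers()."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.workers = []
        self._next_id = 1

    def spawn_worker(self):
        self.workers.append(self._next_id)
        self._next_id += 1

    def manage_workers(self):
        # Keep the pool at the configured size.
        while len(self.workers) < self.num_workers:
            self.spawn_worker()

    def run(self, ticks):
        for _ in range(ticks):
            self.manage_workers()


class RefreshingArbiter(ToyArbiter):
    """Adds an interval check to the same loop; no separate thread."""

    def __init__(self, num_workers, refresh_interval, clock):
        super().__init__(num_workers)
        self.refresh_interval = refresh_interval  # seconds, 0 = disabled
        self.clock = clock
        self.last_refresh = clock()
        self.refresh_count = 0

    def manage_workers(self):
        super().manage_workers()
        if self.refresh_interval <= 0:
            return
        now = self.clock()
        if now - self.last_refresh >= self.refresh_interval:
            self.last_refresh = now
            # A real arbiter would begin a rolling restart here.
            self.refresh_count += 1
```

Passing the clock in as a callable keeps the sketch testable with a fake time source.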

Rolling Restart Flow

  1. Spawn batch_size new workers (spawn_worker())
  2. Wait for workers to reach target count
  3. Kill batch_size old workers (kill_worker() - kills oldest via FIFO)
  4. Repeat until all original workers replaced

Zero-downtime is guaranteed because new workers are spawned before old workers are killed.
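
The four steps can be simulated with a plain list ordered oldest-first. This is a sketch under simplifying assumptions: `rolling_restart` is a hypothetical function, new workers count as ready the moment they are spawned, and step 2 is read as waiting for the original count plus the batch. It tracks the lowest number of live workers seen, which never drops below the original count.

```python
from itertools import count


def rolling_restart(workers, batch_size, new_ids):
    """Replace every original worker; return (final_workers, min_alive)."""
    workers = list(workers)  # oldest first
    originals = set(workers)
    min_alive = len(workers)
    while originals:
        batch = min(batch_size, len(originals))
        # Step 1: spawn the new batch first.
        for _ in range(batch):
            workers.append(next(new_ids))
        # Step 2: nothing to wait for here, spawning is instant in this model.
        # Step 3: kill the same number of workers, oldest first (FIFO).
        for _ in range(batch):
            originals.discard(workers.pop(0))
        min_alive = min(min_alive, len(workers))
        # Step 4: loop until no original worker is left.
    return workers, min_alive


if __name__ == "__main__":
    print(rolling_restart([1, 2, 3, 4], batch_size=1, new_ids=count(5)))
```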

Why Gunicorn Over Uvicorn Multiprocess?

| Aspect            | Gunicorn                          | Uvicorn                          |
|-------------------|-----------------------------------|----------------------------------|
| Memory sharing    | Yes (preload + fork COW)          | No (independent workers)         |
| Rolling restarts  | Yes (SIGTTOU kills oldest - FIFO) | No (SIGTTOU kills newest - LIFO) |
| Worker management | Always has arbiter process        | No arbiter with workers=1        |
| macOS support     | Limited (setproctitle issues)     | Full                             |

The key difference: Uvicorn's SIGTTOU kills the newest worker (LIFO), while Gunicorn kills the oldest (FIFO). Rolling restarts require killing old workers, not new ones.
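
A small simulation shows why the removal order matters. `recycle` is a hypothetical helper that models only the ordering, not real signal delivery: each cycle adds one worker and then removes one according to the policy.

```python
def recycle(workers, cycles, policy):
    """Add one worker, then remove one, for the given number of cycles."""
    workers = list(workers)  # oldest first
    next_id = max(workers) + 1
    for _ in range(cycles):
        workers.append(next_id)
        next_id += 1
        if policy == "fifo":
            workers.pop(0)  # remove the oldest worker
        elif policy == "lifo":
            workers.pop()  # remove the newest worker, i.e. the one just added
        else:
            raise ValueError(f"unknown policy: {policy}")
    return workers


if __name__ == "__main__":
    print(recycle([1, 2, 3, 4], cycles=4, policy="fifo"))  # [5, 6, 7, 8]
    print(recycle([1, 2, 3, 4], cycles=4, policy="lifo"))  # [1, 2, 3, 4]
```

With LIFO removal, each cycle discards the worker it just added, so the original workers are never replaced.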

Why Gunicorn is Optional

Gunicorn is an optional extra (apache-airflow-core[gunicorn]) because:

  1. Windows incompatibility: Gunicorn is Unix-only
  2. Most users don't need it: Default uvicorn is sufficient for development and simple production setups
  3. Different trade-offs: Some users prefer uvicorn's simplicity

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds optional gunicorn support to the API server, enabling zero-downtime worker recycling to prevent memory accumulation in long-running processes. The implementation uses a GunicornMonitor to perform rolling restarts while maintaining service availability.

Changes:

  • Added gunicorn as an optional server type alongside the default uvicorn
  • Implemented GunicornMonitor for zero-downtime worker recycling with configurable refresh intervals
  • Added configuration options for server type, worker refresh intervals, and batch sizes

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| pyproject.toml | Added gunicorn extra to main package dependencies |
| airflow-core/pyproject.toml | Added gunicorn as an optional dependency with version constraint |
| docs/spelling_wordlist.txt | Added "multiprocess" to spelling dictionary |
| airflow-core/src/airflow/config_templates/config.yml | Added server_type, worker_refresh_interval, and worker_refresh_batch_size configuration options |
| airflow-core/src/airflow/settings.py | Added GUNICORN_WORKER_READY_PREFIX constant for process title tracking |
| airflow-core/src/airflow/cli/commands/gunicorn_monitor.py | Implemented GunicornMonitor class for rolling worker restarts |
| airflow-core/src/airflow/cli/commands/api_server_command.py | Added gunicorn command building and server type routing logic |
| airflow-core/src/airflow/api_fastapi/gunicorn_config.py | Added gunicorn hooks for worker readiness tracking and cleanup |
| airflow-core/tests/unit/cli/commands/test_gunicorn_monitor.py | Comprehensive test coverage for GunicornMonitor functionality |
| airflow-core/newsfragments/60921.significant.rst | Release notes documenting the new feature |
| airflow-core/docs/extra-packages-ref.rst | Documentation for gunicorn extra package |
| airflow-core/docs/administration-and-deployment/web-stack.rst | User guide for server types and rolling worker restarts |
| .pre-commit-config.yaml | Added new files to pre-commit exclusions |
Comments suppressed due to low confidence (1)

airflow-core/src/airflow/cli/commands/api_server_command.py:1

  • Using verify=False disables SSL certificate verification. While the comment notes this is for localhost health checks, consider only disabling verification when the scheme is https and host is localhost/127.0.0.1 to avoid accidentally disabling verification for remote health checks.
# Licensed to the Apache Software Foundation (ASF) under one
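
The suggestion in this suppressed comment could look roughly like the sketch below. It illustrates the reviewer's idea only and is not what the PR implements. `should_verify_tls` and `LOOPBACK_HOSTS` are hypothetical names, and the URLs in the usage lines are made up. Certificate verification is skipped only for HTTPS URLs that point at a loopback host.

```python
from urllib.parse import urlsplit

# Hosts treated as "local" for the purpose of this sketch.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def should_verify_tls(url):
    """Return False only for https URLs that target a loopback host."""
    parts = urlsplit(url)
    is_local_https = parts.scheme == "https" and parts.hostname in LOOPBACK_HOSTS
    return not is_local_https


if __name__ == "__main__":
    print(should_verify_tls("https://localhost:8080/health"))    # False
    print(should_verify_tls("https://example.com:8080/health"))  # True
```

The result could then be passed as the `verify` argument of whichever HTTP client performs the health check.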

@kaxil force-pushed the gunicorn-api-server branch 3 times, most recently from 884e5d3 to 4b1ca24 on January 29, 2026 01:21
@pierrejeambrun (Member) left a comment

Nice I think this is a great addition.

I think it would be cool to have both this and https://github.com/apache/airflow/pull/60804/changes, in case a particular dagbag grows too big in less time than the worker lifecycle (just to make sure it doesn't explode, by putting a limit there).

@kaxil force-pushed the gunicorn-api-server branch 3 times, most recently from ec264a8 to aa59022 on January 30, 2026 17:42
Add optional gunicorn server type for the API server that provides:
- Memory sharing via preload + fork copy-on-write
- Rolling worker restarts through GunicornMonitor
- Correct FIFO signal handling (SIGTTOU kills oldest worker)

New configuration options in [api] section:
- server_type: uvicorn (default) or gunicorn
- worker_refresh_interval: seconds between refresh cycles (0=disabled)
- worker_refresh_batch_size: workers to refresh per cycle
- master_timeout: gunicorn master timeout
- reload_on_plugin_change: reload on plugin file changes

Requires apache-airflow-core[gunicorn] extra for gunicorn mode.
@kaxil force-pushed the gunicorn-api-server branch from 67503e6 to 4e0812b on January 30, 2026 18:38
kaxil added 2 commits January 30, 2026 18:39
Matches Airflow 2's webserver pattern: monitor runs in main thread,
so if it crashes, the whole process exits (fail-fast). No silent
degradation where gunicorn keeps running without worker recycling.

Also triggers monitor when reload_on_plugin_change is enabled,
even if worker_refresh_interval is 0.
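
The trigger condition described in this commit message reduces to a single predicate. This is a sketch with a hypothetical function name, not the code from the commit.

```python
def should_start_monitor(worker_refresh_interval, reload_on_plugin_change):
    """Run the monitor if either periodic refresh or plugin reload is enabled."""
    return worker_refresh_interval > 0 or reload_on_plugin_change


if __name__ == "__main__":
    # Plugin reload alone is enough, even with the interval disabled.
    print(should_start_monitor(0, True))
```
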
…onitor

This refactor changes the gunicorn worker monitoring architecture from an
external thread-based approach to using a custom Arbiter subclass, which
is gunicorn's recommended extension pattern.

Changes:
- New gunicorn_app.py with AirflowArbiter and AirflowGunicornApp
- AirflowArbiter integrates worker refresh into manage_workers() loop
- Removed gunicorn_monitor.py (no longer needed)
- Simplified api_server_command.py (no subprocess, direct gunicorn API)
- Updated tests for new architecture

Benefits:
- Simpler architecture (no separate thread or subprocess)
- Direct access to worker state via self.WORKERS
- Uses gunicorn's internal spawn_worker/kill_worker methods
- Follows gunicorn's documented extension pattern
@kaxil force-pushed the gunicorn-api-server branch from 4e0812b to 048b919 on January 30, 2026 18:39
@kaxil kaxil added this to the Airflow 3.2.0 milestone Jan 30, 2026
@jason810496 (Member) left a comment

Nice! LGTM overall and the changes make sense to me.

@kaxil kaxil merged commit b906080 into apache:main Feb 4, 2026
435 of 437 checks passed
@kaxil kaxil deleted the gunicorn-api-server branch February 4, 2026 09:45
github-actions bot commented Feb 4, 2026

Backport failed to create: v3-1-test. View the failure log Run details

Status Branch Result
v3-1-test Commit Link

You can attempt to backport this manually by running:

cherry_picker b906080 v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

Alok-kumar-priyadarshi pushed a commit to Alok-kumar-priyadarshi/airflow that referenced this pull request Feb 5, 2026
…che#60940)

Add optional gunicorn server type for the API server that provides:
- Memory sharing via preload + fork copy-on-write
- Rolling worker restarts through GunicornMonitor
- Correct FIFO signal handling (SIGTTOU kills oldest worker)

New configuration options in [api] section:
- server_type: uvicorn (default) or gunicorn
- worker_refresh_interval: seconds between refresh cycles (0=disabled)
- worker_refresh_batch_size: workers to refresh per cycle
- master_timeout: gunicorn master timeout
- reload_on_plugin_change: reload on plugin file changes

Requires apache-airflow-core[gunicorn] extra for gunicorn mode.

* Run GunicornMonitor in main thread instead of daemon thread

Matches Airflow 2's webserver pattern: monitor runs in main thread,
so if it crashes, the whole process exits (fail-fast). No silent
degradation where gunicorn keeps running without worker recycling.

Also triggers monitor when reload_on_plugin_change is enabled,
even if worker_refresh_interval is 0.

* Refactor gunicorn support to use custom Arbiter instead of external monitor

This refactor changes the gunicorn worker monitoring architecture from an
external thread-based approach to using a custom Arbiter subclass, which
is gunicorn's recommended extension pattern.

Changes:
- New gunicorn_app.py with AirflowArbiter and AirflowGunicornApp
- AirflowArbiter integrates worker refresh into manage_workers() loop
- Removed gunicorn_monitor.py (no longer needed)
- Simplified api_server_command.py (no subprocess, direct gunicorn API)
- Updated tests for new architecture

Benefits:
- Simpler architecture (no separate thread or subprocess)
- Direct access to worker state via self.WORKERS
- Uses gunicorn's internal spawn_worker/kill_worker methods
- Follows gunicorn's documented extension pattern
jhgoebbert pushed a commit to jhgoebbert/airflow_Owen-CH-Leung that referenced this pull request Feb 8, 2026