Add gunicorn support for API server with rolling worker restarts #60940
kaxil merged 3 commits into apache:main
Pull request overview
This PR adds optional gunicorn support to the API server, enabling zero-downtime worker recycling to prevent memory accumulation in long-running processes. The implementation uses a GunicornMonitor to perform rolling restarts while maintaining service availability.
Changes:
- Added gunicorn as an optional server type alongside the default uvicorn
- Implemented GunicornMonitor for zero-downtime worker recycling with configurable refresh intervals
- Added configuration options for server type, worker refresh intervals, and batch sizes
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Added gunicorn extra to main package dependencies |
| airflow-core/pyproject.toml | Added gunicorn as an optional dependency with version constraint |
| docs/spelling_wordlist.txt | Added "multiprocess" to spelling dictionary |
| airflow-core/src/airflow/config_templates/config.yml | Added server_type, worker_refresh_interval, and worker_refresh_batch_size configuration options |
| airflow-core/src/airflow/settings.py | Added GUNICORN_WORKER_READY_PREFIX constant for process title tracking |
| airflow-core/src/airflow/cli/commands/gunicorn_monitor.py | Implemented GunicornMonitor class for rolling worker restarts |
| airflow-core/src/airflow/cli/commands/api_server_command.py | Added gunicorn command building and server type routing logic |
| airflow-core/src/airflow/api_fastapi/gunicorn_config.py | Added gunicorn hooks for worker readiness tracking and cleanup |
| airflow-core/tests/unit/cli/commands/test_gunicorn_monitor.py | Comprehensive test coverage for GunicornMonitor functionality |
| airflow-core/newsfragments/60921.significant.rst | Release notes documenting the new feature |
| airflow-core/docs/extra-packages-ref.rst | Documentation for gunicorn extra package |
| airflow-core/docs/administration-and-deployment/web-stack.rst | User guide for server types and rolling worker restarts |
| .pre-commit-config.yaml | Added new files to pre-commit exclusions |
Comments suppressed due to low confidence (1)
airflow-core/src/airflow/cli/commands/api_server_command.py:1
- Using verify=False disables SSL certificate verification. While the comment notes this is for localhost health checks, consider only disabling verification when the scheme is https and host is localhost/127.0.0.1 to avoid accidentally disabling verification for remote health checks.
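A minimal sketch of the suggestion above, assuming the health check builds a URL string and passes the result to the `verify=` argument of an HTTP client such as `requests`. The helper name `health_check_verify` is hypothetical, not code from this PR. It disables verification only for HTTPS URLs whose host is `localhost` or a loopback IP literal, and keeps it on for everything else.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical: hostnames treated as local for health checks.
_LOCAL_HOSTNAMES = {"localhost"}


def health_check_verify(url: str) -> bool:
    """Return the value to pass as ``verify=`` for a health-check request.

    Certificate verification is turned off only for HTTPS requests to a
    loopback host; every other URL keeps the safe default (True).
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        # verify has no effect on plain HTTP, so keep the default.
        return True
    host = parts.hostname or ""  # urlsplit lowercases and strips [] from IPv6
    if host in _LOCAL_HOSTNAMES:
        return False
    try:
        # 127.0.0.0/8 and ::1 count as loopback.
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a remote DNS name): keep verification on.
        return True
```

An assumed call site would look like `requests.get(url, verify=health_check_verify(url), timeout=5)`, so a misconfigured remote health-check URL fails loudly instead of silently skipping certificate checks.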
pierrejeambrun left a comment:
Nice, I think this is a great addition.
It would be good to have both this and https://github.com/apache/airflow/pull/60804/changes, in case a particular DagBag grows too large before a worker's lifecycle ends (just to put a limit there and make sure it doesn't explode).
Add optional gunicorn server type for the API server that provides:
- Memory sharing via preload + fork copy-on-write
- Rolling worker restarts through GunicornMonitor
- Correct FIFO signal handling (SIGTTOU kills oldest worker)

New configuration options in [api] section:
- server_type: uvicorn (default) or gunicorn
- worker_refresh_interval: seconds between refresh cycles (0=disabled)
- worker_refresh_batch_size: workers to refresh per cycle
- master_timeout: gunicorn master timeout
- reload_on_plugin_change: reload on plugin file changes

Requires apache-airflow-core[gunicorn] extra for gunicorn mode.
Matches Airflow 2's webserver pattern: monitor runs in main thread, so if it crashes, the whole process exits (fail-fast). No silent degradation where gunicorn keeps running without worker recycling. Also triggers monitor when reload_on_plugin_change is enabled, even if worker_refresh_interval is 0.
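The last sentence widens the condition under which the monitor runs. A hypothetical sketch of that trigger, assuming the monitor applies only in gunicorn mode and that the option names match the commit message above. `ApiServerSettings` is an illustrative container, not Airflow's configuration API, and `master_timeout` is left out for brevity.

```python
from dataclasses import dataclass

VALID_SERVER_TYPES = {"uvicorn", "gunicorn"}


@dataclass(frozen=True)
class ApiServerSettings:
    """Hypothetical container mirroring the new [api] options."""

    server_type: str = "uvicorn"
    worker_refresh_interval: int = 0  # seconds; 0 disables periodic refresh
    worker_refresh_batch_size: int = 1
    reload_on_plugin_change: bool = False

    def __post_init__(self) -> None:
        # Reject values that would make the refresh loop meaningless.
        if self.server_type not in VALID_SERVER_TYPES:
            raise ValueError(f"server_type must be one of {sorted(VALID_SERVER_TYPES)}")
        if self.worker_refresh_interval < 0:
            raise ValueError("worker_refresh_interval must be >= 0")
        if self.worker_refresh_batch_size < 1:
            raise ValueError("worker_refresh_batch_size must be >= 1")

    def monitor_enabled(self) -> bool:
        """Monitoring is gunicorn-only and runs if either trigger is on."""
        if self.server_type != "gunicorn":
            return False
        return self.worker_refresh_interval > 0 or self.reload_on_plugin_change
```

Keeping the decision in one predicate makes the "plugin reload even when the interval is 0" case explicit rather than an accident of loop logic.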
…onitor

This refactor changes the gunicorn worker monitoring architecture from an external thread-based approach to using a custom Arbiter subclass, which is gunicorn's recommended extension pattern.

Changes:
- New gunicorn_app.py with AirflowArbiter and AirflowGunicornApp
- AirflowArbiter integrates worker refresh into manage_workers() loop
- Removed gunicorn_monitor.py (no longer needed)
- Simplified api_server_command.py (no subprocess, direct gunicorn API)
- Updated tests for new architecture

Benefits:
- Simpler architecture (no separate thread or subprocess)
- Direct access to worker state via self.WORKERS
- Uses gunicorn's internal spawn_worker/kill_worker methods
- Follows gunicorn's documented extension pattern
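With refresh logic living inside `manage_workers()`, the arbiter needs a cheap check it can run on every loop pass. A sketch of such a timer, assuming a monotonic clock and that `worker_refresh_interval = 0` disables refresh. `RefreshScheduler` and `due()` are hypothetical names; the clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class RefreshScheduler:
    """Hypothetical timer consulted on every pass of an arbiter loop.

    A custom Arbiter could call ``due()`` from its ``manage_workers()``
    override and start a refresh cycle whenever it returns True.
    """

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self._clock = clock
        self._last_refresh = clock()  # the interval starts counting now

    def due(self) -> bool:
        if self.interval <= 0:
            return False  # 0 disables periodic refresh
        now = self._clock()
        if now - self._last_refresh >= self.interval:
            self._last_refresh = now  # restart the interval from this refresh
            return True
        return False
```

A monotonic clock is used so that wall-clock adjustments (NTP, DST) cannot trigger or suppress a refresh cycle.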
jason810496 left a comment:
Nice! LGTM overall and the changes make sense to me.
Backport failed to create: v3-1-test.

You can attempt to backport this manually by running `cherry_picker b906080 v3-1-test`. This should apply the commit to the v3-1-test branch and leave the commit in a conflict state. After you have resolved the conflicts, you can continue the backport process by running `cherry_picker --continue`. If you don't have cherry-picker installed, see the installation guide.
…che#60940)

Add optional gunicorn server type for the API server that provides:
- Memory sharing via preload + fork copy-on-write
- Rolling worker restarts through GunicornMonitor
- Correct FIFO signal handling (SIGTTOU kills oldest worker)

New configuration options in [api] section:
- server_type: uvicorn (default) or gunicorn
- worker_refresh_interval: seconds between refresh cycles (0=disabled)
- worker_refresh_batch_size: workers to refresh per cycle
- master_timeout: gunicorn master timeout
- reload_on_plugin_change: reload on plugin file changes

Requires apache-airflow-core[gunicorn] extra for gunicorn mode.

* Run GunicornMonitor in main thread instead of daemon thread

Matches Airflow 2's webserver pattern: monitor runs in main thread, so if it crashes, the whole process exits (fail-fast). No silent degradation where gunicorn keeps running without worker recycling. Also triggers monitor when reload_on_plugin_change is enabled, even if worker_refresh_interval is 0.

* Refactor gunicorn support to use custom Arbiter instead of external monitor

This refactor changes the gunicorn worker monitoring architecture from an external thread-based approach to using a custom Arbiter subclass, which is gunicorn's recommended extension pattern.

Changes:
- New gunicorn_app.py with AirflowArbiter and AirflowGunicornApp
- AirflowArbiter integrates worker refresh into manage_workers() loop
- Removed gunicorn_monitor.py (no longer needed)
- Simplified api_server_command.py (no subprocess, direct gunicorn API)
- Updated tests for new architecture

Benefits:
- Simpler architecture (no separate thread or subprocess)
- Direct access to worker state via self.WORKERS
- Uses gunicorn's internal spawn_worker/kill_worker methods
- Follows gunicorn's documented extension pattern
Related #60804 & #60919
This PR adds an optional `gunicorn` server type for the API server, providing:

Memory Impact
With `--preload`, gunicorn loads the application once in the arbiter process, then forks workers. Workers share read-only memory pages via copy-on-write.

Savings: ~40-50% memory reduction with 4 workers. Benefits scale with worker count.
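The savings claim can be reasoned about with a simple model. Assume each worker needs `S` MB that preload can share (imported modules, the loaded app) and `P` MB of private memory. Without preload, `n` workers use `n * (S + P)`; with preload, roughly `S + n * P`. The numbers used below are hypothetical, not measurements from this PR, and the model is optimistic: it ignores arbiter overhead and pages that get copied after the fork (CPython's reference-count updates, for instance, write to object headers and can trigger copies).

```python
def preload_savings(shared_mb: float, private_mb: float, workers: int) -> float:
    """Fraction of memory saved by preload + fork under an idealised model.

    Without preload every worker loads its own copy:
        workers * (shared + private)
    With preload the shared part is loaded once and inherited copy-on-write:
        shared + workers * private
    Arbiter overhead and post-fork page copies are ignored.
    """
    without_preload = workers * (shared_mb + private_mb)
    with_preload = shared_mb + workers * private_mb
    return 1 - with_preload / without_preload
```

With hypothetical `S = 200`, `P = 100` and 4 workers, the model gives 1200 MB versus 600 MB, a 50% saving. As `n` grows the saving approaches `S / (S + P)`, which is one way to read "benefits scale with worker count".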
Usage
Configuration
New `[api]` configuration options:
- `server_type`: `uvicorn` (default) or `gunicorn`
- `worker_refresh_interval`: Seconds between worker refresh cycles (0 = disabled)
- `worker_refresh_batch_size`: Workers to refresh per cycle (default: 1)

Architecture
Uses gunicorn's recommended extension pattern: a custom `AirflowArbiter` subclass integrates worker monitoring directly into the arbiter loop via `manage_workers()`. No separate thread or subprocess is needed.

Rolling Restart Flow
1. Spawn `batch_size` new workers (`spawn_worker()`).
2. Kill `batch_size` old workers (`kill_worker()`, which kills the oldest via FIFO).

Zero downtime follows from this ordering: new workers are spawned before old workers are killed.
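The spawn-then-kill ordering can be illustrated with an in-memory model. `ToyArbiter` below is hypothetical and forks no processes; it only loosely mirrors the `WORKERS`, `spawn_worker()` and `kill_worker()` names mentioned in this PR (gunicorn's real `kill_worker` also takes a signal argument). It also skips the readiness wait a real implementation would want before killing old workers, which the file table suggests is tracked through worker process titles.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyArbiter:
    """In-memory stand-in for a gunicorn arbiter; no processes are forked.

    ``workers`` maps a fake pid to the worker's age (spawn order).
    """

    num_workers: int
    workers: dict[int, int] = field(default_factory=dict)
    _pids: itertools.count = field(default_factory=lambda: itertools.count(1000))
    _ages: itertools.count = field(default_factory=itertools.count)

    def __post_init__(self) -> None:
        for _ in range(self.num_workers):
            self.spawn_worker()

    def spawn_worker(self) -> int:
        pid = next(self._pids)
        self.workers[pid] = next(self._ages)
        return pid

    def kill_worker(self, pid: int) -> None:
        del self.workers[pid]

    def refresh_batch(self, batch_size: int) -> None:
        # 1. Spawn first, so the pool never drops below num_workers.
        for _ in range(batch_size):
            self.spawn_worker()
        # 2. Then kill the oldest workers (FIFO), i.e. the lowest ages.
        oldest = sorted(self.workers, key=self.workers.__getitem__)[:batch_size]
        for pid in oldest:
            self.kill_worker(pid)
```

After `num_workers / batch_size` cycles every original worker has been replaced, while the pool size is back at `num_workers` at the end of each cycle.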
Why Gunicorn Over Uvicorn Multiprocess?
The key difference: Uvicorn's SIGTTOU kills the newest worker (LIFO), while Gunicorn kills the oldest (FIFO). Rolling restarts require killing old workers, not new ones.
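A small simulation of why the kill order matters, assuming one worker is spawned and one removed per cycle. `rolling_restart` is a hypothetical helper and no real signals are sent. Under FIFO every original worker is eventually replaced; under LIFO the freshly spawned worker is the one removed, so the original workers never recycle.

```python
from collections import deque


def rolling_restart(initial: int, cycles: int, policy: str) -> list[str]:
    """Simulate ``cycles`` spawn-one/kill-one refreshes under a kill policy.

    Workers are labelled by generation. "fifo" kills the oldest worker,
    "lifo" kills the newest one (the one just spawned).
    """
    workers = deque(f"gen0-w{i}" for i in range(initial))
    for cycle in range(1, cycles + 1):
        workers.append(f"gen{cycle}")  # spawn one new worker
        if policy == "fifo":
            workers.popleft()  # kill the oldest
        elif policy == "lifo":
            workers.pop()  # kill the newest
        else:
            raise ValueError(f"unknown policy: {policy}")
    return list(workers)
```

For example, `rolling_restart(2, 2, "fifo")` ends with `["gen1", "gen2"]`, while `rolling_restart(2, 2, "lifo")` still holds `["gen0-w0", "gen0-w1"]`.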
Why Gunicorn is Optional
Gunicorn is an optional extra (`apache-airflow-core[gunicorn]`) because: