
Add UvicornMonitor for zero-downtime API server worker recycling #60919

Closed

kaxil wants to merge 1 commit into apache:main from astronomer:uvicorn-cycling

Conversation

kaxil (Member) commented Jan 22, 2026

Alternate/complement to #60804

Why

API server workers accumulate memory over time. Airflow 2 had GunicornMonitor with rolling restarts - spawn new workers, health check them, then kill old ones. This was lost when we switched to uvicorn.

Uvicorn's built-in limit_max_requests doesn't help here - it kills old workers before spawning new ones, causing downtime.
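For context, this is the built-in option that falls short. A minimal sketch (the app path is a placeholder, not Airflow's actual module):

import uvicorn

# uvicorn's built-in recycling: each worker exits after serving this many
# requests and the supervisor replaces it. The old worker dies *before* its
# replacement is up, so serving capacity briefly drops - hence the downtime.
uvicorn.run(
    "myapp:app",              # placeholder app path, not Airflow's real one
    workers=2,
    limit_max_requests=10_000,
)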

What

New UvicornMonitor class that does rolling restarts:

[n workers] → spawn batch → [n + batch workers] → health check → kill old batch → [n workers]

Enabled via config:

[api]
worker_refresh_interval = 1800  # seconds, 0 = disabled (default)
worker_refresh_batch_size = 1
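Putting the pattern and the config together, a rough sketch of the monitor's refresh loop (helper names here are illustrative, not the actual UvicornMonitor API):

import os
import signal
import time

def monitor_loop(get_worker_pids, spawn_workers, workers_healthy,
                 refresh_interval: int, batch_size: int) -> None:
    """Illustrative rolling-restart loop; the callables are hypothetical helpers."""
    while True:
        time.sleep(refresh_interval)

        old_pids = get_worker_pids()                  # [n workers]
        spawn_workers(batch_size)                     # [n + batch workers]
        new_pids = [p for p in get_worker_pids() if p not in old_pids]

        if workers_healthy(new_pids):                 # health check the new batch
            doomed = old_pids[:batch_size]            # oldest first
        else:
            doomed = new_pids                         # rollback: drop the new batch

        for pid in doomed:                            # back to [n workers]
            os.kill(pid, signal.SIGTERM)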

Gotchas (vs GunicornMonitor)

Single-worker mode: Gunicorn always uses a master/worker architecture even with workers=1 (master + 1 worker), so signals always work. Uvicorn with workers=1 runs in single-process mode with no supervisor, so SIGTTIN/SIGTTOU are ignored. Workaround: start with 2 workers and scale down after startup (sketched below). This works, but the server briefly runs 2 workers during refresh cycles.
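A sketch of that workaround (the supervisor PID handling is illustrative):

import os
import signal
import time

def scale_down_to_one(supervisor_pid: int, settle_seconds: float = 10.0) -> None:
    """Start uvicorn with --workers 2 so a supervisor process exists,
    then ask it to drop one worker once startup has settled.

    SIGTTOU tells uvicorn's multiprocess supervisor to remove a worker;
    with workers=1 there is no supervisor, so the signal would be ignored.
    """
    time.sleep(settle_seconds)
    os.kill(supervisor_pid, signal.SIGTTOU)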

Uvicorn kills newest worker, not oldest: Gunicorn kills the oldest worker on SIGTTOU (workers.pop(0) sorted by age). Uvicorn kills the newest (processes.pop() - LIFO). This would kill our fresh healthy workers instead of the old ones. Fixed by sending SIGTERM directly to specific old PIDs instead of using SIGTTOU.
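The fix, in essence (a sketch; the real code lives in UvicornMonitor):

import os
import signal

def kill_old_workers(old_pids: list[int]) -> None:
    # Target the *old* workers explicitly. Sending SIGTTOU to the supervisor
    # would make it pop the newest process (LIFO) and kill a fresh worker.
    for pid in old_pids:
        os.kill(pid, signal.SIGTERM)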

No memory sharing between workers: Unlike Gunicorn (which uses preload + fork for copy-on-write memory sharing), uvicorn's multiprocess mode spawns independent workers that each load everything separately. This means:

  • Each worker has its own copy of all loaded modules, configurations, and cached data
  • Total memory usage scales linearly with worker count (N workers = ~N× memory)

This is an inherent limitation of uvicorn's multiprocessing approach vs gunicorn's preload+fork model.
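For a rough sense of scale (numbers purely illustrative): if one fully loaded worker holds ~400 MB resident, 4 uvicorn workers need ~1.6 GB, whereas under gunicorn's preload+fork model the children would share most of the parent's pages until they write to them.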

Testing

# Start API server with worker recycling enabled
export AIRFLOW__API__WORKER_REFRESH_INTERVAL=60
airflow api-server --workers 2

# Watch logs for rolling restart messages:
# "Starting worker refresh: current workers PIDs [1001, 1002]"
# "Spawned new workers: PIDs [1003]"
# "Killing old workers: PIDs [1001]"
# "Worker refresh completed: new PIDs [1003] replaced old PIDs [1001]"

Related

#60804

Commit message:
Implement rolling worker restarts for the API server using uvicorn's
SIGTTIN/SIGTTOU signal-based worker management. This provides feature
parity with Airflow 2's GunicornMonitor while maintaining zero downtime.

Key features:
- Rolling restart pattern: spawn new workers before killing old ones
- Health check verification before killing old workers
- Automatic rollback if new workers fail health checks
- Configurable refresh interval and batch size

New configuration options:
- worker_refresh_interval: Seconds between refresh cycles (default: 0/disabled)
- worker_refresh_batch_size: Workers to refresh at a time (default: 1)

This addresses memory accumulation in long-running API server workers
by periodically recycling them while ensuring continuous availability.
Review comment on the new config option definition (diff context):

version_added: 3.2.0
type: integer
example: ~
default: "0"
A reviewer (Member) commented:
Should we use a more robust default here?

I guess we could have a minimum of 2 workers by default, rolling restarts enabled by default, and the refresh interval set to some reasonable default value (say 1 hour).

While it increases memory use, as explained in the PR description, it is far more robust for an "out-of-the-box" installation, and if users would like to save memory by limiting it to 1 worker, they should still be able to do so (and then we could disable the uvicorn restarts altogether). I think it's more important to be "robust, non-leaking, gently restarting" by default rather than "potentially leaking, with a smaller initial memory footprint".

Since FastAPI's recommendation is 1 worker per Pod, we could also recommend in the same docs that users rely on rolling restarts of the deployment instead of the restarts handled by the UvicornMonitor. We could also add a note to the number-of-workers parameter that setting it to 1 is fine as long as you run more than one instance of the api-server and implement a refreshing rolling restart outside of Airflow. This way, users who change it to 1 (and read the docs) would know they also need to take care of potential leaks.

There are even ways to do this automatically, which we could potentially implement as an option in our Helm chart:

https://stackoverflow.com/questions/67423919/k8s-cronjob-rolling-restart-every-day

kaxil (Member Author) replied:

Yeah certainly an option.

kaxil (Member Author) commented Jan 28, 2026

Closing this in favor of #60940

kaxil closed this Jan 28, 2026
