Occasionally restart Gunicorn and Celery worker processes to clear held memory #7859

@melton-jason

Is your feature request related to a problem? Please describe.
Given the tendency of (C)Python (or more specifically, glibc) to hold onto allocated memory (see #7858), it would be useful to have a way to occasionally restart the Gunicorn and Celery worker processes so that memory is returned to the operating system.
The intuition behind this approach is that terminating the process is a guaranteed way to release the memory back to the OS.

See also:
https://stackoverflow.com/questions/15455048/releasing-memory-in-python
https://stackoverflow.com/questions/68225871/python3-give-unused-interpreter-memory-back-to-the-os

Gunicorn and Celery have built-in options to control periodically restarting their worker processes.
See Gunicorn's max_requests and max_requests_jitter and Celery's worker_max_tasks_per_child.
These options set a per-worker threshold on the number of requests (Gunicorn) or tasks (Celery) a worker process may handle. Once a worker crosses the threshold, it is replaced with a new worker process, freeing the old worker's memory in the process.
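
As a rough illustration of the settings named above, here is a minimal sketch of what the two configurations could look like. The numeric values are placeholders chosen for the example, not recommendations, and `celery_settings` is a hypothetical name for a dict that would be passed to something like `app.conf.update(...)`.

```python
# Sketch of a gunicorn.conf.py fragment. Values are placeholders (assumptions).
max_requests = 1000        # recycle a worker after this many requests (0 disables recycling)
max_requests_jitter = 100  # each worker gets a random extra 0..100 so workers do not all restart at once

# Sketch of the equivalent Celery setting, held in a plain dict for illustration.
# "celery_settings" is a hypothetical name, not part of Celery itself.
celery_settings = {
    "worker_max_tasks_per_child": 200,  # replace a pool process after it has run this many tasks
}
```

The jitter matters mostly when all workers receive similar traffic: without it they would tend to hit the threshold, and restart, at about the same time.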

Describe the solution you'd like
We should provide a way to restart or recreate the worker processes for Gunicorn and Celery.
This could be done through their built-in options, which would preferably be configurable and set at build time.
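
One possible way to make these thresholds configurable is sketched below: a small helper turns environment variables into the corresponding command-line flags (`--max-requests`, `--max-requests-jitter`, `--max-tasks-per-child`). The environment variable names and helper functions are hypothetical, invented for this example. Whether the values are fixed at build time or read at container start is a deployment choice this sketch does not settle.

```python
import os


def _int_env(name, default, env):
    """Read a non-negative integer from env, falling back to default when unset or blank."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value < 0:
        raise ValueError(f"{name} must be >= 0, got {value}")
    return value


def gunicorn_args(env=None):
    """Build Gunicorn recycling flags. The GUNICORN_* variable names are assumptions."""
    env = os.environ if env is None else env
    max_requests = _int_env("GUNICORN_MAX_REQUESTS", 0, env)
    jitter = _int_env("GUNICORN_MAX_REQUESTS_JITTER", 0, env)
    args = []
    if max_requests > 0:  # 0 leaves Gunicorn's default (no automatic restarts)
        args += ["--max-requests", str(max_requests)]
        if jitter > 0:
            args += ["--max-requests-jitter", str(jitter)]
    return args


def celery_worker_args(env=None):
    """Build the Celery worker recycling flag. The CELERY_* variable name is an assumption."""
    env = os.environ if env is None else env
    max_tasks = _int_env("CELERY_MAX_TASKS_PER_CHILD", 0, env)
    return ["--max-tasks-per-child", str(max_tasks)] if max_tasks > 0 else []
```

Leaving a variable unset keeps the current behavior (no recycling), so the feature would be opt-in for existing deployments.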

Alternatively (or additionally), we can leave this up to the system administrator, who can set up an automated job to periodically restart the Gunicorn and Celery workers during downtime or off-hours.

Sending a HUP signal to the Gunicorn master process will cause Gunicorn to gracefully stop old worker processes (letting them finish serving any in-flight requests before stopping them) and spawn new workers:
https://gunicorn.org/signals/#signal-handling
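
A scheduled job could send that signal with a few lines of Python. The sketch below assumes Gunicorn was started with a PID file (its `--pid` option) and that the job runs on a Unix-like system as a user allowed to signal the master. The function name and the injectable `kill` parameter are hypothetical, added only to make the example easy to test.

```python
import os
import signal
from pathlib import Path


def reload_gunicorn(pidfile, kill=os.kill):
    """Send SIGHUP to the Gunicorn master whose PID is stored in pidfile.

    Returns the PID that was signalled. `kill` defaults to os.kill and is
    injectable so the function can be exercised without a running server.
    """
    pid = int(Path(pidfile).read_text().strip())
    kill(pid, signal.SIGHUP)  # master reloads its config and gracefully replaces its workers
    return pid
```

Such a script could be run from cron or a systemd timer during off-hours. The PID file path depends on how the deployment starts Gunicorn.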

Celery recommends sending a TERM signal to the worker process and then starting a new instance: https://docs.celeryq.dev/en/stable/userguide/workers.html#restarting-the-worker. As far as Specify is concerned, it may be better to restart the worker service itself (stop it and start it again) in containerized environments.
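
For a setup where a small supervisor script owns the Celery worker process, the TERM-then-recreate approach could be sketched as follows. This is only an illustration: `restart_worker` is a hypothetical helper, the command used to start the worker is left to the caller, and in a containerized deployment restarting the service through the container runtime would replace this entirely.

```python
import signal
import subprocess


def restart_worker(proc, command, timeout=60.0):
    """Ask a running worker to shut down with SIGTERM, wait for it, then start a new one.

    proc    -- subprocess.Popen handle of the current worker
    command -- argument list used to launch the replacement worker
    timeout -- seconds to wait for the old worker to exit; on expiry,
               subprocess.TimeoutExpired is raised and no new worker is started
    """
    proc.send_signal(signal.SIGTERM)  # Celery treats TERM as a request for a warm shutdown
    proc.wait(timeout=timeout)        # let running tasks finish before replacing the worker
    return subprocess.Popen(command)
```

The timeout should be longer than the longest task expected to be running. Otherwise the wait raises before the old worker has finished and no replacement is started.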


Labels

1 - Enhancement: Improvements or extensions to existing behavior
4 - Performance: Issues related to performance, concurrency, and optimization
type:meta: The Issue is related to platform engineering, deployment, CI/CD, GitOps, or other DevOps aspects
