Created per-worker RSS memory tracker via Prometheus resolved #6814 by SketchRudy · Pull Request #7546 · Flagsmith/flagsmith

SketchRudy · 2026-05-19T20:17:43Z

Thanks for submitting a PR! Please check the boxes below:

I have read the Contributing Guide.
I have added information to docs/ if required so people know about the feature.
I have filled in the "Changes" section below.
I have filled in the "How did you test this code" section below.

Changes

Adds a Prometheus gauge, flagsmith_worker_rss_bytes, that tracks the peak
resident-set size (RSS) of each API worker process, labelled by PID. This
gives operators per-worker memory visibility so leaks can be spotted on a
dashboard before a worker is OOM-killed.

Implementation

api/metrics/worker_metrics.py — reads the VmHWM (peak RSS) line from
/proc/self/status and exposes it via a prometheus_client.Gauge with a
pid label and multiprocess_mode="liveall", so it aggregates correctly
across gunicorn workers when PROMETHEUS_MULTIPROC_DIR is set. Fails safe to
a no-op on platforms where /proc/self/status is unavailable.
api/core/middleware/worker_rss.py — WorkerRSSMiddleware updates the
gauge after each response. The update is isolated so a metrics failure can
never affect request handling.
api/app/settings/common.py — the middleware is registered only when
PROMETHEUS_ENABLED is true, mirroring the existing
ENABLE_API_USAGE_TRACKING pattern, so deployments without Prometheus incur
zero overhead.
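
For readers unfamiliar with the /proc interface, the helper described above could look roughly like the sketch below. It is not the code from api/metrics/worker_metrics.py; the function names parse_vm_hwm_bytes and read_peak_rss_bytes are hypothetical. It assumes the usual Linux layout, where the VmHWM line carries a value in units of 1024 bytes labelled "kB".

```python
from __future__ import annotations

import re

# Matches a line such as "VmHWM:     12345 kB" in /proc/<pid>/status.
_VM_HWM_RE = re.compile(r"^VmHWM:\s+(\d+)\s+kB\s*$", re.MULTILINE)


def parse_vm_hwm_bytes(status_text: str) -> int | None:
    """Return peak RSS in bytes, or None if the VmHWM line is absent."""
    match = _VM_HWM_RE.search(status_text)
    if match is None:
        return None
    # The kernel reports the value in units of 1024 bytes, labelled "kB".
    return int(match.group(1)) * 1024


def read_peak_rss_bytes(status_path: str = "/proc/self/status") -> int | None:
    """Read peak RSS for the current process; None where the file is unavailable."""
    try:
        with open(status_path, encoding="ascii", errors="replace") as status_file:
            return parse_vm_hwm_bytes(status_file.read())
    except OSError:
        # Hypothetical fail-safe: platforms without /proc simply report nothing.
        return None
```

Splitting parsing from file access keeps the parser testable on any platform, which matters given the unsupported-platform path mentioned above.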

Design notes

The metric reports the high-water mark rather than current RSS. It is cheap
and robust to read, and sufficient for leak detection: a flat line is a
healthy worker that has stabilised, a continuously climbing line indicates a
leak, and recovery is observed via PID rotation when a worker is recycled.
These trade-offs and the interpretation guidance are documented for operators.
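
The interpretation rule above can be expressed as a small check. This is an illustrative sketch only: classify_peak_rss and its tolerance_bytes parameter are hypothetical and not part of the PR. It assumes the caller passes the samples for a single PID over a recent window, oldest first.

```python
from __future__ import annotations

from collections.abc import Sequence


def classify_peak_rss(samples: Sequence[int], tolerance_bytes: int = 0) -> str:
    """Classify one worker's high-water-mark series as 'flat' or 'climbing'.

    A high-water mark never decreases for a given PID, so only the growth
    between the first and last sample of the window matters.
    """
    if len(samples) < 2:
        return "unknown"
    growth = samples[-1] - samples[0]
    return "climbing" if growth > tolerance_bytes else "flat"
```

A series that resets to a lower value would belong to a new PID, which is how worker recycling shows up under this metric.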

Documentation

New operator guide:
docs/docs/deployment-self-hosting/observability/worker-rss-monitoring.md
(enabling, PromQL examples, a Grafana panel, and interpretation notes).
Cross-linked from the metrics index (metrics.mdx).
Catalogue entry added for flagsmith_worker_rss_bytes.

How did you test this code?

Automated tests

Unit tests for the RSS helper and the gauge update/clear functions
(api/tests/unit/metrics/test_unit_worker_metrics.py) — success, missing
data, and unsupported-platform paths.
Unit tests for the middleware
(api/tests/unit/core/middleware/test_unit_core_middleware_worker_rss.py) —
call ordering, response pass-through, and failure isolation.
An integration test
(api/tests/integration/core/test_integration_core_worker_rss_metric.py)
drives a real request through the Django middleware stack and asserts the
gauge appears in the Prometheus exposition output.
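
To make the three middleware properties concrete (call ordering, response pass-through, failure isolation), here is a minimal Django-style sketch. It is not the PR's WorkerRSSMiddleware: the class name WorkerRSSMiddlewareSketch and the injected update_metric hook are assumptions made so the example runs without Django or prometheus_client.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class WorkerRSSMiddlewareSketch:
    """Django-style middleware: the response is produced first, then the metric is updated."""

    def __init__(
        self,
        get_response: Callable[[Any], Any],
        update_metric: Callable[[], None],  # hypothetical hook, injected for testability
    ) -> None:
        self.get_response = get_response
        self.update_metric = update_metric

    def __call__(self, request: Any) -> Any:
        response = self.get_response(request)
        try:
            self.update_metric()
        except Exception:
            # A metrics failure is logged and swallowed so the response is still returned.
            logger.exception("worker RSS metric update failed")
        return response
```

A test for failure isolation then only needs an update_metric that raises and an assertion that the original response comes back unchanged.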

Run with:

cd api
uv sync --extra dev
uv run pytest tests/unit/metrics tests/unit/core/middleware tests/integration/core/test_integration_core_worker_rss_metric.py -n0

Manual verification

Built the OSS API image from this branch and ran the full stack with
PROMETHEUS_ENABLED=true. After sending traffic, scraping /metrics returned
the metric for each live gunicorn worker, alongside the built-in Flagsmith
metrics.

Confirmed one series per worker PID, with the expected VmHWM-based
description in the metric HELP text.
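
The "one series per worker PID" check could be scripted along these lines. The scrape text below is invented for illustration (PIDs, values, and HELP wording are all hypothetical), and peak_rss_by_pid is not a function from the PR.

```python
from __future__ import annotations

import re

# Hypothetical scrape output; PIDs, values, and HELP wording are invented.
SAMPLE_EXPOSITION = """\
# HELP flagsmith_worker_rss_bytes Peak resident-set size (VmHWM) of the worker process.
# TYPE flagsmith_worker_rss_bytes gauge
flagsmith_worker_rss_bytes{pid="101"} 1.5e+08
flagsmith_worker_rss_bytes{pid="102"} 1.6e+08
"""

# Captures the pid label and the sample value of each series line.
_SERIES_RE = re.compile(
    r'^flagsmith_worker_rss_bytes\{[^}]*pid="(\d+)"[^}]*\}\s+(\S+)$',
    re.MULTILINE,
)


def peak_rss_by_pid(exposition_text: str) -> dict[str, float]:
    """Map each pid label to its sample value, rejecting duplicate series for a PID."""
    result: dict[str, float] = {}
    for pid, value in _SERIES_RE.findall(exposition_text):
        if pid in result:
            raise ValueError(f"duplicate series for pid {pid}")
        result[pid] = float(value)
    return result
```

Comparing the returned keys with the PIDs of the running gunicorn workers would confirm that every live worker is represented exactly once.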

Worked on this with @mmaslov007 @HumaGitGud @AAshGray


docs: document worker RSS metric and add integration test

Story #4 closes out the documentation and verification work for the flagsmith_worker_rss_bytes gauge (Flagsmith#6814).

- New operator guide at docs/docs/deployment-self-hosting/observability/worker-rss-monitoring.md covering enabling, PromQL examples, Grafana panel suggestions, and high-water-mark interpretation notes.
- Cross-link added to metrics.mdx so the guide is discoverable from the metrics index page.
- Corrected the stale catalogue description for flagsmith_worker_rss_bytes to match the post-PR-#3 Python docstring (high-water mark / VmHWM).
- Integration test in api/tests/integration/core/ exercises the full path: request through WorkerRSSMiddleware, gauge update, registry scrape. Satisfies story #3 AC #5 escape hatch.
- Temporary scaffold note at docs/development/ documents Windows limitations encountered and follow-ups for the team.


claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

vercel · 2026-05-19T20:17:54Z

@AAshGray is attempting to deploy a commit to the Flagsmith Team on Vercel.

A member of the Team first needs to authorize it.


matthewelwell · 2026-05-19T21:39:41Z

Hi @SketchRudy, thanks for the PR. Please can you review the linting failure here, and make sure that the title of the PR adheres to the conventional commit format?

mmaslov007 and others added 30 commits April 27, 2026 20:05

added worker max RSS helper with tests + instructions

30e5cfa

Merge branch 'Flagsmith:main' into rss-collector

1ef08d0

Merge branch 'Flagsmith:main' into rss-collector

b095078

Relocated helper testing instructions to relevant /docs folder.

cc76ffd

Merge branch 'main' of https://github.com/Flagsmith/flagsmith into rss-collector

46fb1ac

read worker max RSS from proc status

2c8cec4

Merge pull request #2 from mmaslov007/rss-collector

9145112

Add worker max RSS helper with tests

created worker metric Prometheus gauge

b62ffad

updated the metrics-catalogue with the flagsmith_worker_rss_byes gauge

d3c3691

created worker metric Prometheus gauge

56741a3

added a background thread to update resource usage and imported it to the wsgi file

0b49af0

updated new worker_metrics with gauge

51386b3

fixed duplicate imports

59255ae

re-arranged update_ and clear_ functions to be adjacent

09b996e

added unit tests for update_ and clear_worker_metrics

6c7b3f6

minor correction to test so mock mirrors gauge syntax more accurately

1738ef7

Merge pull request #1 from mmaslov007/gauge

b3be154

Add Prometheus gauge metric for worker process RSS memory

adjusted gauge description

c7113e5

added return types for mypy to pass

0a89c75

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef716d6

for more information, see https://pre-commit.ci

Merge pull request #3 from mmaslov007/mypy-fixes

791500b

Mypy fixes

Merge branch 'Flagsmith:main' into main

e572124

implemented django worker rss middleware to update memory gauge

4a3a7a6

add WorkerRSSMiddleware in middleware stack

55752b0

implemented tests for middleware

8533a22

Merge pull request #4 from mmaslov007/worker-rss-middleware

a4cf7c8

Wired middleware to worker RSS gauge for per-request metric updates

fixed hardcoded line into if statement

058749e

Merge branch 'Flagsmith:main' into main

202f1c9

Merge pull request #5 from mmaslov007/middleware-fix

3d00a6d

fixed hardcoded line into if statement

SketchRudy and others added 4 commits May 14, 2026 11:13

Merge pull request #6 from mmaslov007/story-4-worker-rss-docs

4f516d6

docs: document worker RSS metric and add integration test

Merge branch 'Flagsmith:main' into main

859945a

chore: remove sprint scaffold notes ahead of upstream submission

b5cfac8

Merge pull request #7 from mmaslov007/chore/remove-sprint-scaffolds

4003433

chore: remove sprint scaffold notes ahead of upstream submission

SketchRudy requested review from a team as code owners May 19, 2026 20:17

SketchRudy requested review from Holmus and emyller and removed request for a team May 19, 2026 20:17

claude Bot reviewed May 19, 2026


github-actions Bot added the api (Issue related to the REST API) and docs (Documentation updates) labels May 19, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d83f7a

for more information, see https://pre-commit.ci

SketchRudy wants to merge 35 commits into Flagsmith:main from mmaslov007:main

5 participants
