
ENG-740: Redis key performance issue (base storage layer) #7409

Open
johnewart wants to merge 4 commits into main from johnewart/ENG-740/base-framework

Conversation


@johnewart johnewart commented Feb 18, 2026

Ticket ENG-740

Description Of Changes

TL;DR - a Redis cache manager with automatic set-based key indexing and lazy migration of old keys. It is not used by any APIs or async functions in this PR; this is purely the storage logic.

This PR lays the groundwork for a Redis cache layer that avoids using KEYS (and eventually SCAN) by providing a convenience layer for secondary index storage; it also serves as a starting point for other Redis improvements such as clustering, enhanced logging, and monitoring. This implementation provides the following, and nothing else: a RedisCacheManager that wraps the Redis client to automatically handle secondary indexing of keys, so that they can be cleared and queried without slow, synchronous access like KEYS (or SCAN). It does this by letting you specify an index name and transparently recording related keys in a Redis set with that name (e.g. dsr:1234). You can then quickly fetch all the keys for iteration, test membership, and so on (a minimal sketch of the idea follows below).
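
A minimal sketch of the set-based indexing idea, assuming a standard redis-py client. The method names mirror the description above and the __idx: prefix mentioned in the review summary, but the bodies are illustrative rather than the PR's actual implementation:

from typing import List, Optional, Union

import redis

RedisValue = Union[bytes, float, int, str]


class IndexedCache:
    """Illustrative wrapper: every write also records the key in a Redis SET."""

    def __init__(self, client: redis.Redis) -> None:
        self._redis = client

    def set_with_index(
        self,
        key: str,
        value: RedisValue,
        index_prefix: str,
        expire_seconds: Optional[int] = None,
    ) -> None:
        # Write the data key, then register it in the index set.
        self._redis.set(key, value, ex=expire_seconds)
        self._redis.sadd(f"__idx:{index_prefix}", key)

    def get_all_keys(self, index_prefix: str) -> List[str]:
        # A single SMEMBERS call replaces a server-wide KEYS scan.
        return [k.decode() for k in self._redis.smembers(f"__idx:{index_prefix}")]

    def delete_keys_by_index(self, index_prefix: str) -> None:
        keys = self.get_all_keys(index_prefix)
        if keys:
            self._redis.delete(*keys)
        self._redis.delete(f"__idx:{index_prefix}")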

The second part, which could be a separate PR if you prefer since it sits on top of the RedisCacheManager, is a DSRCacheStore with well-known operations for caching DSRs. It takes advantage of the underlying secondary index management in RedisCacheManager, performs backwards-compatible key translation (KeyMapper) for well-known types of DSR data, and lazily migrates keys to the new index mechanism on reads and writes. This matters because (among other things) there will be an initial period where legacy data is still being processed, and we need to keep working while moving over to the new format (at which point KeyMapper and the things that use it can be removed). A rough sketch of the migration-on-read flow follows below.
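
A rough sketch of lazy migration on read, assuming redis-py and the key formats described in this PR (new dsr:{id}:{part}, legacy id-{id}-{field}-{attr}). The function and the simplified legacy key are hypothetical; the real translation lives in KeyMapper:

from typing import Optional

import redis


def get_with_migration(client: redis.Redis, dsr_id: str, part: str) -> Optional[bytes]:
    """Read a DSR value, migrating it from its legacy key on first access."""
    new_key = f"dsr:{dsr_id}:{part}"
    value = client.get(new_key)
    if value is not None:
        return value

    # Fall back to the legacy format; the real mapping is handled by KeyMapper.
    legacy_key = f"id-{dsr_id}-{part}"
    value = client.get(legacy_key)
    if value is not None:
        ttl = client.ttl(legacy_key)
        # ttl is -1 for keys with no expiry; preserve a positive TTL.
        client.set(new_key, value, ex=ttl if ttl > 0 else None)
        client.sadd(f"__idx:dsr:{dsr_id}", new_key)
        client.delete(legacy_key)
    return value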

The DSRCacheStore addresses the underlying KEYS issue by using SCAN for legacy keys instead; SCAN paginates, so while it is not nearly as fast, it causes far less contention (compared below).
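
For comparison, with redis-py (r being a connected client), the blocking and paginated approaches look like this; count=500 matches the batch size used in this PR's clear():

# Blocking: KEYS walks the entire keyspace in a single command.
keys = r.keys(f"*{dsr_id}*")

# Paginated: SCAN iterates cursor-by-cursor, roughly 500 keys per round trip.
keys = list(r.scan_iter(match=f"*{dsr_id}*", count=500))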

Neither the RedisCacheManager nor the DSRCacheStore is used in any active API/async worker code in this PR; it is purely a base for the follow-ons, which I have split by DSR functionality in subsequent PRs: key clearing (currently the worst KEYS offender), identity cache, encryption and custom fields, request body, etc.

Code Changes

(See above)

Steps to Confirm

None; unit tests cover the key management logic, key mapping, lazy reads, index management, etc. (one such migration behavior is illustrated below).
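
For illustration, a hypothetical test of the migration-on-read behavior those tests verify, using fakeredis as a stand-in for the PR's own in-memory MockRedis:

import fakeredis


def test_legacy_identity_key_migrates_on_read() -> None:
    r = fakeredis.FakeRedis()
    r.set("id-pr_123-identity-email", "user@example.com")

    # Before migration: only the legacy key exists.
    assert r.get("dsr:pr_123:identity:email") is None

    # Simulate migration-on-read: copy to the new key, index it, drop the old one.
    value = r.get("id-pr_123-identity-email")
    r.set("dsr:pr_123:identity:email", value)
    r.sadd("__idx:dsr:pr_123", "dsr:pr_123:identity:email")
    r.delete("id-pr_123-identity-email")

    assert r.get("dsr:pr_123:identity:email") == b"user@example.com"
    assert not r.exists("id-pr_123-identity-email")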

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

  Phase 1: Infrastructure + Tests
  - Context manager get_dsr_cache_store() for obtaining store instances
  - Fix clear() to use SCAN for finding all keys in mixed state
  - Add 14 migration tests with factory functions and parametrization
  - Tests verify backward compatibility and migration behavior
  No production code changes - infrastructure only.
@johnewart johnewart requested a review from a team as a code owner February 18, 2026 01:48

vercel bot commented Feb 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

Project                Deployment    Actions    Updated (UTC)
fides-plus-nightly     Ignored       Ignored    Feb 18, 2026 1:48am
fides-privacy-center   Ignored       Ignored    Feb 18, 2026 1:48am



greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR introduces a new Redis cache management layer (RedisCacheManager + DSRCacheStore) that replaces expensive KEYS/SCAN commands with SET-based secondary indexes for DSR (privacy request) cache operations. It also includes a KeyMapper that maps all known legacy key formats to the new dsr:{id}:{part} scheme, and supports lazy migration of legacy keys on read.

  • RedisCacheManager wraps a Redis client with index-backed operations (set_with_index, delete_keys_by_index, etc.) stored as Redis SETs at __idx:{prefix}
  • DSRCacheStore provides convenience methods for all DSR field types (identity, encryption, DRP, masking secrets, etc.) with automatic legacy key fallback and migration
  • KeyMapper encodes all discovered legacy key patterns alongside their new-format equivalents (see the sketch after this list)
  • Tests are thorough with 3 test files covering key mapping, store operations, and migration behavior using in-memory mocks — no real Redis required
  • Note: Index sets do not have TTLs, so stale references can accumulate when data keys expire. The clear() method's SCAN pattern also matches the index set key itself, and clear() deletes keys individually rather than in a batch.
  • This is infrastructure-only — none of this code is wired into active API or async worker paths in this PR
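
As an illustration of the mapping the summary describes, a hypothetical KeyMapper-style helper that returns both key formats for one field type; the real class covers many more DSR key types and the exact legacy patterns may differ:

from typing import Tuple


def identity_keys(dsr_id: str, attribute: str) -> Tuple[str, str]:
    """Return (new_key, legacy_key) for a DSR identity attribute.

    Hypothetical: mirrors the dsr:{id}:{part} scheme and the
    id-{id}-{field}-{attr} legacy format described in this PR.
    """
    new_key = f"dsr:{dsr_id}:identity:{attribute}"
    legacy_key = f"id-{dsr_id}-identity-{attribute}"
    return new_key, legacy_key


# e.g. identity_keys("pr_123", "email")
# -> ("dsr:pr_123:identity:email", "id-pr_123-identity-email")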

Confidence Score: 3/5

  • Infrastructure-only PR with no active code paths affected, but has design concerns around index set TTL and clear() implementation that should be addressed before follow-on PRs wire this into production.
  • Score of 3 reflects that while the code is well-structured and thoroughly tested, there are two meaningful issues: (1) index sets never expire, creating a potential memory leak for DSRs that aren't explicitly cleared, and (2) the clear() method's SCAN pattern inadvertently matches index keys and uses inefficient one-by-one deletion. These should be resolved before this layer is used in production code paths. The PR also exceeds 500 lines (1265 additions across 10 files).
  • src/fides/common/cache/dsr_store.py needs attention for the index TTL and clear() issues. src/fides/common/cache/manager.py and src/fides/common/cache/key_mapping.py have minor unused import cleanup.

Important Files Changed

src/fides/common/cache/manager.py
    Core RedisCacheManager with index-backed key management. Clean design with a minor unused import (Any). Index sets lack TTL management, which could lead to stale references when data keys expire.

src/fides/common/cache/dsr_store.py
    DSR-specific cache store with legacy migration. Has two issues: (1) index sets retain stale references when data keys expire via TTL, and (2) the clear() SCAN pattern matches the index set key itself, plus it deletes keys one by one instead of batching. Duplicated RedisValue type alias.

src/fides/common/cache/key_mapping.py
    Maps legacy key patterns to the new dsr:{id}:{part} format. Comprehensive coverage of all known DSR cache key types. Unused imports (Any, Dict) should be removed.

src/fides/common/cache/__init__.py
    Context manager entry point for DSRCacheStore. Has an import inside a function body (get_cache) that should be moved to module level per project conventions unless a circular dependency prevents it.

tests/common/cache/conftest.py
    Test conftest that overrides heavy session fixtures to avoid starting FastAPI, DB, and Celery for cache-only tests. Well-structured approach.

tests/common/cache/test_dsr_store.py
    Thorough unit tests for DSRCacheStore with in-memory mocks covering set/get, indexing, clear, delete, legacy migration, and all convenience methods.

tests/common/cache/test_dsr_store_key_mapping.py
    Comprehensive tests for every KeyMapper method verifying both new-format and legacy key generation. Clean and complete.

tests/common/cache/test_dsr_store_migration.py
    Migration tests using MockRedis with a real RedisCacheManager. Covers legacy key reading, migration on read, mixed-state clearing, index backfill, and DSR isolation. Good use of parametrize and factory functions.

Last reviewed commit: 8581498


@greptile-apps greptile-apps bot left a comment


10 files reviewed, 6 comments


Comment on lines +113 to +126
def set(
    self,
    dsr_id: str,
    part: str,
    value: RedisValue,
    expire_seconds: Optional[int] = None,
) -> Optional[bool]:
    """
    Set a value for the given DSR and part. Registers the key in the DSR index.
    """
    key = _dsr_key(dsr_id, part)
    return self._manager.set_with_index(
        key, value, _dsr_index_prefix(dsr_id), expire_seconds
    )

Index sets accumulate stale references when data keys expire

When set_with_index is called with expire_seconds, the data key gets a TTL but the index set member (the key name stored in the Redis SET at __idx:dsr:{dsr_id}) does not. When the data key expires via Redis TTL, the index set still references it.

This means get_all_keys() will return keys that no longer exist in Redis, and delete_keys_by_index() in the manager will attempt to delete already-expired keys (harmless but wasteful). More importantly, the index set itself never expires, so for DSRs that are never explicitly clear()-ed, the index sets will accumulate indefinitely.

Consider one of the following:

  1. Setting a TTL on the index set itself (e.g., matching the max data key TTL), or
  2. Filtering out dead keys in get_all_keys() (check existence before returning; sketched after this list), or
  3. Documenting this as a known limitation that clear() should always be called.
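
If option 2 were chosen, one possible shape (illustrative, assuming redis-py; a pipeline batches the EXISTS checks into a single round trip):

from typing import List

import redis


def get_live_keys(client: redis.Redis, index_prefix: str) -> List[str]:
    """Return only index members whose data keys still exist, pruning the rest."""
    members = [m.decode() for m in client.smembers(f"__idx:{index_prefix}")]
    if not members:
        return []
    # Batch the existence checks into one round trip.
    pipe = client.pipeline()
    for key in members:
        pipe.exists(key)
    flags = pipe.execute()
    live = [key for key, flag in zip(members, flags) if flag]
    dead = [key for key, flag in zip(members, flags) if not flag]
    if dead:
        # Opportunistically remove stale references from the index set.
        client.srem(f"__idx:{index_prefix}", *dead)
    return live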

Comment on lines +285 to +303
def clear(self, dsr_id: str) -> None:
    """
    Delete all cache keys for this DSR and remove the index.

    Always uses SCAN to find all keys (both indexed and legacy) to ensure
    complete cleanup in mixed-key scenarios.
    """
    # Use SCAN to find ALL keys (indexed + legacy)
    all_keys_via_scan = list(self._redis.scan_iter(match=f"*{dsr_id}*", count=500))

    index_prefix = _dsr_index_prefix(dsr_id)

    # Delete all found keys
    for key in all_keys_via_scan:
        self._redis.delete(key)
        self._manager.remove_key_from_index(index_prefix, key)

    # Delete the index itself
    self._manager.delete_index(index_prefix)

SCAN pattern *{dsr_id}* will match the index set key itself

The SCAN pattern f"*{dsr_id}*" on line 293 will also match the index set key __idx:dsr:{dsr_id} because it contains the dsr_id substring. This means line 299 deletes the index set key via self._redis.delete(key), and then line 303 tries to delete it again via self._manager.delete_index(index_prefix). The second call is a harmless no-op, but line 300 also calls remove_key_from_index on the already-deleted index — this is an SREM on a key that no longer exists.

More importantly, deleting keys one at a time in a loop (line 299) is less efficient than batching. Consider:

Suggested change

Before (per-key deletes in a loop):

def clear(self, dsr_id: str) -> None:
    """
    Delete all cache keys for this DSR and remove the index.
    Always uses SCAN to find all keys (both indexed and legacy) to ensure
    complete cleanup in mixed-key scenarios.
    """
    # Use SCAN to find ALL keys (indexed + legacy)
    all_keys_via_scan = list(self._redis.scan_iter(match=f"*{dsr_id}*", count=500))
    index_prefix = _dsr_index_prefix(dsr_id)
    # Delete all found keys
    for key in all_keys_via_scan:
        self._redis.delete(key)
        self._manager.remove_key_from_index(index_prefix, key)
    # Delete the index itself
    self._manager.delete_index(index_prefix)

After (one batched DEL):

def clear(self, dsr_id: str) -> None:
    """
    Delete all cache keys for this DSR and remove the index.
    Always uses SCAN to find all keys (both indexed and legacy) to ensure
    complete cleanup in mixed-key scenarios.
    """
    # Use SCAN to find ALL keys (indexed + legacy)
    all_keys_via_scan = list(self._redis.scan_iter(match=f"*{dsr_id}*", count=500))
    index_prefix = _dsr_index_prefix(dsr_id)
    # Delete all found keys in batch
    if all_keys_via_scan:
        self._redis.delete(*all_keys_via_scan)
    # Delete the index itself
    self._manager.delete_index(index_prefix)

logical key; set_encoded_object stores under EN_ + logical key in Redis).
"""

from typing import Any, Dict, Tuple

Unused imports Any and Dict

Any and Dict are imported but never used in this module. Only Tuple is used.

Suggested change

Before:
    from typing import Any, Dict, Tuple
After:
    from typing import Tuple

__idx:{index_prefix}; members are the actual cache key names.
"""

from typing import Any, List, Optional, Union

Unused import Any

Any is imported but never used as a type annotation in this module — it only appears in a docstring.

Suggested change

Before:
    from typing import Any, List, Optional, Union
After:
    from typing import List, Optional, Union

Comment on lines +22 to +25
from fides.common.cache.manager import RedisCacheManager

# Value types supported by Redis
RedisValue = Union[bytes, float, int, str]

RedisValue type alias duplicated across modules

RedisValue = Union[bytes, float, int, str] is defined identically in both manager.py (line 17) and dsr_store.py (line 25). Since dsr_store.py already imports from manager.py, consider importing RedisValue from there instead of redefining it.

Suggested change

Before:
    from fides.common.cache.manager import RedisCacheManager

    # Value types supported by Redis
    RedisValue = Union[bytes, float, int, str]
After:
    from fides.common.cache.manager import RedisCacheManager, RedisValue

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +27 to +58
@contextmanager
def get_dsr_cache_store(
    *,
    backfill_index_on_legacy_read: bool = True,
    migrate_legacy_on_read: bool = True,
) -> Iterator[DSRCacheStore]:
    """
    Context manager that yields a DSRCacheStore backed by the application Redis connection.

    The store handles both new-format keys (dsr:{id}:{part}) and legacy keys
    (id-{id}-{field}-{attr}) with automatic migration on read when migrate_legacy_on_read=True.

    Args:
        backfill_index_on_legacy_read: When listing keys and falling back to SCAN for
            legacy keys, add those keys to the index. Default True.
        migrate_legacy_on_read: When a get finds value in legacy key only, write to
            new key, delete legacy key, add new key to index. Default True.

    Yields:
        DSRCacheStore instance ready for use

    Usage:
        with get_dsr_cache_store() as store:
            store.clear(privacy_request_id)

        with get_dsr_cache_store() as store:
            store.write_identity(pr_id, "email", "user@example.com")
            value = store.get_identity(pr_id, "email")
    """
    from fides.api.util.cache import get_cache

    redis_client = get_cache()

Import inside function body

The get_cache import is placed inside the function body. Per project conventions, imports should be at the top of the module. If there's a circular dependency concern here, it may be worth noting that in a comment — otherwise move it to the top.

Suggested change

Before (import inside the function body):

@contextmanager
def get_dsr_cache_store(
    *,
    backfill_index_on_legacy_read: bool = True,
    migrate_legacy_on_read: bool = True,
) -> Iterator[DSRCacheStore]:
    """...(docstring unchanged; see the full text above)..."""
    from fides.api.util.cache import get_cache

    redis_client = get_cache()

After (import moved to module level):

from fides.api.util.cache import get_cache

@contextmanager
def get_dsr_cache_store(
    *,
    backfill_index_on_legacy_read: bool = True,
    migrate_legacy_on_read: bool = True,
) -> Iterator[DSRCacheStore]:
    """...(docstring unchanged; see the full text above)..."""
    redis_client = get_cache()

Context Used: Rule from dashboard - Python imports should always be placed at the top of the file, not near the code that uses them. (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
