Skip to content

ENG-2440 - Add new index WHERE is_leaf IS TRUE and and distance index#7398

Open
vcruces wants to merge 8 commits intomainfrom
ENG-2440
Open

ENG-2440 - Add new index WHERE is_leaf IS TRUE and and distance index#7398
vcruces wants to merge 8 commits intomainfrom
ENG-2440

Conversation

@vcruces
Copy link
Contributor

@vcruces vcruces commented Feb 16, 2026

Ticket ENG-2440

Description Of Changes

This PR adds optimized database indexes to improve the performance of tree queries on StagedResource and StagedResourceAncestor tables. The changes include:

  • A new partial index on stagedresource that specifically targets leaf resources (is_leaf IS TRUE)
  • A new index on stagedresourceancestor that optimizes ancestor lookups by starting with ancestor_urn
  • Conditional migration logic that defers index creation for tables with >1M rows to avoid blocking deployments
  • Added async utility function is_backfill_completed_async() for checking backfill status in async contexts

Code Changes

  • Added migration 29acbb0689de that creates ix_stagedresource_leaf_true_monitor_status_urn index for efficient leaf node filtering
  • Added stagedresourceancestor index (ancestor_urn, distance, descendant_urn) to optimize bottom-up tree queries
  • Added is_backfill_completed_async() function in backfill_scripts/utils.py for async session support
  • Updated post_upgrade_index_creation.py to include new index definitions for deferred creation
  • Updated model definitions in core.py to reflect new index structure

Steps to Confirm

  1. Run the migration on a test database and verify both indexes are created successfully
  2. Verify that the migration skips index creation on large tables (>1M rows) and logs appropriate messages
  3. Confirm that tree queries use the new indexes (check EXPLAIN ANALYZE output)

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Contributor

vercel bot commented Feb 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Feb 19, 2026 8:56pm
fides-privacy-center Ignored Ignored Feb 19, 2026 8:56pm

Request Review

@vcruces vcruces added the do not merge Please don't merge yet, bad things will happen if you do label Feb 16, 2026
@vcruces
Copy link
Contributor Author

vcruces commented Feb 16, 2026

@greptileai

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 16, 2026

Greptile Summary

This PR adds optimized database indexes to improve tree query performance on StagedResource and StagedResourceAncestor tables. The changes are well-structured and follow established patterns in the codebase for handling large table migrations.

  • Adds a new partial index ix_stagedresource_monitor_leaf_true_status_urn with WHERE is_leaf IS TRUE to optimize leaf-node filtering queries, complementing the existing IS NOT NULL partial index
  • Reorders the stagedresourceancestor composite index from (descendant_urn, ancestor_urn, distance) to (ancestor_urn, distance, descendant_urn) to better support bottom-up tree traversal queries
  • Drops the old ix_staged_resource_ancestor_desc_anc_dist index with proper existence checks to handle both small and large table scenarios
  • Adds is_backfill_completed_async() utility function for async session support, with corresponding test coverage
  • Migration includes conditional logic that defers index creation for tables with >1M rows, consistent with the project's pattern for non-blocking deployments
  • All three index definition locations (model, migration, post_upgrade_index_creation.py) are kept in sync

Confidence Score: 4/5

  • This PR is safe to merge with low risk — index changes are additive and the migration handles edge cases well.
  • Score of 4 reflects that this is a well-executed database index optimization with proper conditional migration logic, existence checks, and a clean downgrade path. The only suggestions are minor style optimizations (redundant column in partial index). The migration correctly handles both small and large table scenarios, and all three index definition locations are consistent.
  • The migration file (xx_2026_02_13_0100_29acbb0689de_...) deserves the most attention as it contains the core database changes with conditional logic.

Important Files Changed

Filename Overview
changelog/7398-add_is_leaf_index_and_update_distance_index.yaml Standard changelog entry with correct PR number and db-migration label.
src/fides/api/alembic/migrations/versions/xx_2026_02_13_0100_29acbb0689de_add_is_leaf_is_true_index_to_.py Well-structured migration with conditional logic for large tables, existence checks before drop/create operations, and proper downgrade. Minor style note about redundant is_leaf column in the partial index.
src/fides/api/migrations/backfill_scripts/utils.py Adds is_backfill_completed_async() as an async counterpart to existing sync function. Clean implementation using parameterized queries.
src/fides/api/migrations/post_upgrade_index_creation.py Updates old index entry to new column order and adds new partial index entry. Consistent with model and migration changes.
src/fides/api/models/detection_discovery/core.py Updates index column order for StagedResourceAncestor and adds new partial index for StagedResource. Good comments explaining the dual-index pattern. Minor optimization opportunity: is_leaf column is redundant in the IS TRUE partial index.
tests/ops/migration_tests/backfill_scripts.py/test_backfill_utils.py Adds async test for new is_backfill_completed_async function. Uses session-scoped async_session fixture with manual cleanup, matching existing sync test patterns.

Last reviewed commit: cc49e99

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@vcruces vcruces force-pushed the ENG-2440 branch 6 times, most recently from 75b5359 to cc49e99 Compare February 18, 2026 13:05
@vcruces vcruces requested a review from adamsachs February 18, 2026 17:23
@vcruces vcruces marked this pull request as ready for review February 18, 2026 17:23
@vcruces vcruces requested a review from a team as a code owner February 18, 2026 17:23
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me, especially if you've gotten some confidence locally that the updated indexes are helping performance on the queries we're using 👍

just highlighted a minor redundancy point that greptile called out and seems like it may be legitimate

@vcruces vcruces changed the title ENG-2440 - Add new index WHERE is_leaf IS TRUE and update distance index ENG-2440 - Add new index WHERE is_leaf IS TRUE and and distance index Feb 18, 2026
@vcruces vcruces removed the do not merge Please don't merge yet, bad things will happen if you do label Feb 19, 2026
@vcruces vcruces added this pull request to the merge queue Feb 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments