@valentijnscholten
Member

@valentijnscholten valentijnscholten commented Dec 29, 2025

In #6388 it was suggested to change the query approach for retrieving per-user authorizations and authorized objects. Although that would mostly benefit large MySQL-based instances, there is also non-trivial room for improvement in the Postgres queries.

Summary

  • Added comprehensive unit tests for get_authorized_*() query functions (46 new tests in unittests/test_authorization_queries.py)
  • Optimized authorization queries by replacing EXISTS correlated subqueries with IN (Subquery) pattern across 24 functions (~2.4x speedup)
  • Optimized user queries by replacing Python list materialization with Subquery in 3 functions
  • Added request-level caching using @cache_for_request decorator to 22 authorization functions
  • Split 6 functions with queryset parameter into cached and uncached variants (*_for_queryset)
  • Cleaned up obsolete annotation exclusions in dojo/reports/views.py

Background

The Problem

Authorization queries used EXISTS with OuterRef(), which creates correlated subqueries that are evaluated per row and perform poorly on large datasets.

Reference: #6388

Old Query Pattern (EXISTS)

-- Correlated subqueries evaluated per-row
SELECT ... FROM dojo_finding
WHERE EXISTS (SELECT 1 FROM dojo_product_type_member WHERE product_type_id = finding.prod_type_id AND ...)
   OR EXISTS (SELECT 1 FROM dojo_product_member WHERE product_id = finding.product_id AND ...)
   OR EXISTS (...)
   OR EXISTS (...)

New Query Pattern (IN with Subquery)

-- Independent subqueries with hash semi-joins
SELECT ... FROM dojo_finding
WHERE product_id IN (SELECT product_id FROM dojo_product_member WHERE user_id = X AND role_id IN (...))
   OR prod_type_id IN (SELECT product_type_id FROM dojo_product_type_member WHERE user_id = X AND role_id IN (...))
   OR product_id IN (SELECT product_id FROM dojo_product_group WHERE ...)
   OR prod_type_id IN (SELECT product_type_id FROM dojo_product_type_group WHERE ...)
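
The two patterns are meant to return the same rows; only the plan differs. A minimal sketch of that equivalence, using Python's stdlib sqlite3 and a hypothetical, heavily simplified schema (the table layout, the sample data and the `authorized_ids` helper are assumptions for illustration, the role and group conditions are omitted, and SQLite's planner is not PostgreSQL's, so this shows result equivalence rather than the speedup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the real tables.
    CREATE TABLE finding (id INTEGER PRIMARY KEY, product_id INTEGER, prod_type_id INTEGER);
    CREATE TABLE product_member (product_id INTEGER, user_id INTEGER);
    CREATE TABLE product_type_member (product_type_id INTEGER, user_id INTEGER);

    INSERT INTO finding VALUES (1, 10, 100), (2, 11, 100), (3, 12, 101), (4, 13, 102);
    -- User 7: member of product 10 and product type 101.
    -- User 8: member of product 13 and product type 100.
    INSERT INTO product_member VALUES (10, 7), (13, 8);
    INSERT INTO product_type_member VALUES (101, 7), (100, 8);
""")

# Old pattern: each subquery references the outer row (correlated).
EXISTS_SQL = """
    SELECT f.id FROM finding AS f
    WHERE EXISTS (SELECT 1 FROM product_member AS m
                  WHERE m.product_id = f.product_id AND m.user_id = :user)
       OR EXISTS (SELECT 1 FROM product_type_member AS m
                  WHERE m.product_type_id = f.prod_type_id AND m.user_id = :user)
    ORDER BY f.id
"""

# New pattern: each subquery depends only on the user, not on the outer row.
IN_SQL = """
    SELECT f.id FROM finding AS f
    WHERE f.product_id IN (SELECT product_id FROM product_member WHERE user_id = :user)
       OR f.prod_type_id IN (SELECT product_type_id FROM product_type_member WHERE user_id = :user)
    ORDER BY f.id
"""


def authorized_ids(sql, user_id):
    """Run one of the two query shapes and return the matching finding ids."""
    return [row[0] for row in conn.execute(sql, {"user": user_id})]


exists_ids = authorized_ids(EXISTS_SQL, 7)
in_ids = authorized_ids(IN_SQL, 7)
print(exists_ids, in_ids)
```

Both queries select findings 1 and 3 for user 7 in this sample data, which is the property the new unit tests are there to protect on the real models.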

Performance Results

Tested on PostgreSQL with ~195,000 findings, running DISCARD ALL between runs to reset session state (cached plans, temporary tables) so that each run starts from a comparable state:

| Run | EXISTS (ms) | IN Subquery (ms) | Speedup |
|---|---|---|---|
| 1 | 354 | 149 | 2.38x |
| 2 | 361 | 147 | 2.46x |
| 3 | 348 | 152 | 2.29x |
| 4 | 356 | 144 | 2.47x |
| 5 | 352 | 148 | 2.38x |
| Average | 354 | 148 | 2.39x |

Consistent 2.3-2.5x speedup across all test runs.
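
For reference, the speedup column is simply the EXISTS timing divided by the IN (Subquery) timing for the same run. A small sketch that recomputes it from the timings reported above (the variable names are illustrative only):

```python
# (EXISTS ms, IN Subquery ms) per run, copied from the results table.
runs = [(354, 149), (361, 147), (348, 152), (356, 144), (352, 148)]

# Per-run speedup: old timing divided by new timing.
speedups = [round(exists_ms / in_ms, 2) for exists_ms, in_ms in runs]

# Averages of each column, then the ratio of the averages.
avg_exists = sum(exists_ms for exists_ms, _ in runs) / len(runs)
avg_in = sum(in_ms for _, in_ms in runs) / len(runs)
overall = round(avg_exists / avg_in, 2)

print(speedups, round(avg_exists), round(avg_in), overall)
```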

The improvement comes from:

  1. No per-row subquery evaluation: EXISTS with OuterRef evaluates the subquery for each row
  2. Better query plan: PostgreSQL uses hash semi-joins with IN (Subquery)
  3. No annotation overhead: Removed 4 boolean annotations per finding

Caching

Added @cache_for_request decorator to authorization functions. This caches query results for the duration of a single HTTP request, eliminating redundant database queries when the same authorization check is called multiple times.
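
To illustrate the idea of request-level caching, here is a minimal sketch. It is not the actual @cache_for_request implementation: `cache_for_request_sketch`, `request_scope`, the simplified `get_authorized_products` signature and the "Product_View" string are all assumptions made for this example, and the request lifetime is modelled with a context manager rather than Django middleware.

```python
import functools
import threading

_local = threading.local()


class request_scope:
    """Hypothetical stand-in for the lifetime of one HTTP request."""

    def __enter__(self):
        _local.cache = {}  # fresh cache when the request starts
        return self

    def __exit__(self, *exc):
        _local.cache = None  # cache is dropped when the request ends
        return False


def cache_for_request_sketch(func):
    """Cache results per (function, arguments) while a request is active."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cache = getattr(_local, "cache", None)
        if cache is None:
            # No active request: call through without caching.
            return func(*args, **kwargs)
        key = (func.__qualname__, args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


db_queries = []  # records each simulated database hit


@cache_for_request_sketch
def get_authorized_products(permission, user_id):
    # Simplified signature; a real implementation would return a queryset.
    db_queries.append((permission, user_id))
    return [10, 13] if user_id == 7 else []


with request_scope():
    first = get_authorized_products("Product_View", 7)
    second = get_authorized_products("Product_View", 7)  # served from cache
queries_in_request = len(db_queries)

with request_scope():
    get_authorized_products("Product_View", 7)  # new request, new query
queries_total = len(db_queries)
```

Within one scope the second call does not reach the simulated database, while a new scope starts with an empty cache, so results are never carried over between requests.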

Functions with Direct Caching (16)

These functions do not accept a queryset parameter and are directly cached:

| File | Functions |
|---|---|
| dojo/engagement/queries.py | get_authorized_engagements |
| dojo/product_type/queries.py | get_authorized_product_types |
| dojo/product/queries.py | get_authorized_products, get_authorized_app_analysis, get_authorized_dojo_meta, get_authorized_languages, get_authorized_engagement_presets, get_authorized_product_api_scan_configurations |
| dojo/test/queries.py | get_authorized_tests, get_authorized_test_imports |
| dojo/risk_acceptance/queries.py | get_authorized_risk_acceptances |
| dojo/jira_link/queries.py | get_authorized_jira_projects, get_authorized_jira_issues |
| dojo/tool_product/queries.py | get_authorized_tool_product_settings |
| dojo/group/queries.py | get_authorized_groups |
| dojo/finding/queries.py | get_authorized_stub_findings |

Functions Split into Cached + Uncached (6)

Functions with a queryset parameter were split to support both use cases:

| File | Cached Function | Uncached Variant |
|---|---|---|
| dojo/finding/queries.py | get_authorized_findings() | get_authorized_findings_for_queryset() |
| dojo/finding/queries.py | get_authorized_vulnerability_ids() | get_authorized_vulnerability_ids_for_queryset() |
| dojo/endpoint/queries.py | get_authorized_endpoints() | get_authorized_endpoints_for_queryset() |
| dojo/endpoint/queries.py | get_authorized_endpoint_status() | get_authorized_endpoint_status_for_queryset() |
| dojo/finding_group/queries.py | get_authorized_finding_groups() | get_authorized_finding_groups_for_queryset() |
| dojo/cred/queries.py | get_authorized_cred_mappings() | get_authorized_cred_mappings_for_queryset() |
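
The split can be sketched as follows. A caller-supplied queryset makes a poor cache key (it differs per call site), so the variant that takes one stays uncached, and the cached function delegates to it with a fixed default. This is a simplified illustration only: plain lists stand in for querysets, `functools.lru_cache` stands in for the per-request cache, and the signatures and sample data are assumptions, not the real ones.

```python
import functools

# Hypothetical sample data standing in for the findings table.
FINDINGS = [
    {"id": 1, "product_id": 10, "active": True},
    {"id": 2, "product_id": 11, "active": True},
    {"id": 3, "product_id": 12, "active": False},
    {"id": 4, "product_id": 10, "active": False},
]
AUTHORIZED_PRODUCTS = {7: {10, 12}}  # user id -> product ids the user may see

calls = {"uncached": 0}  # counts how often the filtering work actually runs


def get_authorized_findings_for_queryset(user_id, queryset):
    """Uncached variant: filters whatever base collection the caller passes."""
    calls["uncached"] += 1
    allowed = AUTHORIZED_PRODUCTS.get(user_id, set())
    return [f for f in queryset if f["product_id"] in allowed]


@functools.lru_cache(maxsize=None)  # stand-in for a per-request cache
def get_authorized_findings(user_id):
    """Cached variant: always starts from the full, default collection."""
    return tuple(get_authorized_findings_for_queryset(user_id, FINDINGS))


all_ids = [f["id"] for f in get_authorized_findings(7)]
all_ids_again = [f["id"] for f in get_authorized_findings(7)]  # cache hit

# A caller with its own pre-filtered collection uses the uncached variant.
active_only = [f for f in FINDINGS if f["active"]]
active_ids = [f["id"] for f in get_authorized_findings_for_queryset(7, active_only)]
```

Callers that only need "everything this user may see" share one cached result, while callers that start from their own filtered queryset keep that filter and skip the cache.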

Expected Caching Benefits

In a typical finding list page request, authorization functions may be called multiple times:

| Scenario | Before | After | Reduction |
|---|---|---|---|
| Finding list with filters | 5-10 calls to get_authorized_findings | 1 DB query + cache hits | ~80-90% fewer queries |
| Product dropdown rendering | Multiple calls to get_authorized_products | 1 DB query + cache hits | ~80-90% fewer queries |
| Navigation menu | Repeated permission checks | Cached after first call | Significant reduction |

The cache is automatically cleared at the end of each HTTP request, ensuring data freshness. This is a pre-existing cache mechanism already used for some of the authorization query results.

Performance Test Query Count Changes

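| Test Method | Step | Old Queries | New Queries | Reduction |
|---|---|---|---|---|
| pghistory_async | import1 | 306 | 296 | -10 |
| pghistory_async | reimport1 | 232 | 227 | -5 |
| pghistory_async | reimport2 | 114 | 109 | -5 |
| pghistory_no_async | import1 | 313 | 303 | -10 |
| pghistory_no_async | reimport1 | 239 | 234 | -5 |
| pghistory_no_async | reimport2 | 121 | 116 | -5 |
| pghistory_no_async_with_product_grading | import1 | 315 | 305 | -10 |
| pghistory_no_async_with_product_grading | reimport1 | 241 | 236 | -5 |
| pghistory_no_async_with_product_grading | reimport2 | 123 | 118 | -5 |
| deduplication_pghistory_async | first_import | 275 | 265 | -10 |
| deduplication_pghistory_async | second_import | 185 | 175 | -10 |
| deduplication_pghistory_no_async | first_import | 282 | 272 | -10 |
| deduplication_pghistory_no_async | second_import | 246 | 236 | -10 |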

Total query reduction: 5-10 queries saved per import/reimport operation due to request-level caching of authorization queries.

Tests

All existing and new tests pass:

  • unittests.test_authorization_queries - 46 tests ✓
  • unittests.authorization.test_authorization.TestAuthorization - 52 tests ✓
  • unittests.test_rest_framework.FindingsTest - 24 tests ✓
  • unittests.test_rest_framework.ProductTest - 18 tests ✓

@valentijnscholten valentijnscholten added this to the 2.55.0 milestone Dec 29, 2025
@valentijnscholten valentijnscholten changed the title authorizations: optimize queries authorizations: optimize queries & cache data per request Dec 29, 2025
@valentijnscholten valentijnscholten marked this pull request as ready for review December 29, 2025 15:59
@valentijnscholten valentijnscholten added the affects_pro PRs that affect Pro and need a coordinated release/merge moment. label Dec 29, 2025
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

- Resolved conflict by using refactored _bulk_delete_findings function from upstream/dev
- Preserved optimization by using get_authorized_findings_for_queryset instead of get_authorized_findings
- This maintains the queryset-based authorization filtering optimization from the branch
@github-actions
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@github-actions
Contributor

github-actions bot commented Jan 6, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Labels

affects_pro PRs that affect Pro and need a coordinated release/merge moment. conflicts-detected docs unittests