fix(Code References): Code references are slow to query #7463
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #7463      +/-   ##
==========================================
+ Coverage   98.44%   98.47%   +0.03%
==========================================
  Files        1399     1402       +3
  Lines       52667    53234     +567
==========================================
+ Hits        51847    52424     +577
+ Misses        820      810      -10
Force-pushed b809ab4 to 9209bbe
Force-pushed 9209bbe to 2b0f99e
Docker builds report
Playwright Test Results (oss - depot-ubuntu-latest-16) Details
Playwright Test Results (oss - depot-ubuntu-latest-arm-16) Details
Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16) Details
Playwright Test Results (private-cloud - depot-ubuntu-latest-16) Details
Failed tests: firefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise (Details)
Visual Regression: 16 screenshots compared. See report for details.
@claude review once
gagantrivedi left a comment:
Strong stuff, almost there!
class FeatureSerializerWithMetadata(MetadataSerializerMixin, CreateFeatureSerializer):
    metadata = MetadataSerializer(required=False, many=True)

# NOTE: This field is populated by `projects.code_references.services.annotate_feature_queryset_with_code_references_summary`.
I've added this to hint at where the field is materialised, because I personally find it useful, though one could also find it by searching. Weakly held; let me know if you prefer the 🔪
def _hash_references(references: list[StoredCodeReference]) -> str:
    return hashlib.md5(json.dumps(references, sort_keys=True).encode()).hexdigest()
Nit from Claude: `hashlib.md5(..., usedforsecurity=False)` flags it as a non-crypto digest, so Bandit doesn't trip and FIPS-mode Python doesn't refuse to instantiate it. The same applies in the migration.
Does that make sense to you?
@@ -96,58 +64,106 @@ def annotate_feature_queryset_with_code_references_summary(
def get_code_references_for_feature_flag(
    feature: Feature,
) -> list[FeatureFlagCodeReferencesRepositorySummary]:
The `created_at = repository.last_scanned_at` predicate matches zero rows in practice. Two causes:

1. First scan: `record_scan` sets `last_scanned_at = scanned_at`, but `ScannedCodeReferences.created_at` is populated by `auto_now_add`, i.e. a fresh `timezone.now()` per row inside `bulk_create`. The two timestamps drift apart (~80 ms in my repro, but any non-zero δ kills the equality).
2. Re-scan with identical content: `bulk_create(ignore_conflicts=True)` drops the new row entirely, so `created_at` stays frozen at the original timestamp while `last_scanned_at` keeps advancing.

Result: the list endpoint returns `count: 0`; the detail endpoint returns `[]`. The existing tests miss this because they either bypass `record_scan` (writing rows directly with `ScannedCodeReferences.objects.create` under `freeze_time`) or exercise `record_scan` but never query the read endpoint afterwards.

Here's a minimal failing test that goes through the real API end-to-end (no `freeze_time`):

    def test_list_features__scan_recorded_via_api__count_reflects_references(
        feature, project, staff_client, admin_client_new, with_project_permissions,
    ):
        with_project_permissions([VIEW_PROJECT])
        admin_client_new.post(
            f"/api/v1/projects/{project.pk}/code-references/",
            data={
                "repository_url": "https://github.flagsmith.com/repo/",
                "revision": "rev-1",
                "code_references": [
                    {"feature_name": feature.name, "file_path": "x.py", "line_number": 1},
                ],
            },
            format="json",
        )
        response = staff_client.get(f"/api/v1/projects/{project.pk}/features/")
        counts = response.json()["results"][0]["code_references_counts"]
        assert counts[0]["count"] == 1  # ← fails: actual count is 0

Fails today with `assert 0 == 1`. A second identical POST exposes cause (2), with the same failure.
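The first failure mode is reproducible without Django at all: anything resembling `auto_now_add` stamps each row with its own `now()`, which can never equal a `scanned_at` captured earlier in the same request. A minimal sketch under that assumption (plain Python with hypothetical names; the real code path is `record_scan` + `bulk_create`):

```python
from datetime import datetime, timedelta, timezone

class Clock:
    """Fake clock that advances 40 ms per call, standing in for the real
    delay between computing scanned_at and bulk_create stamping each row."""

    def __init__(self) -> None:
        self._t = datetime(2024, 1, 1, tzinfo=timezone.utc)

    def now(self) -> datetime:
        self._t += timedelta(milliseconds=40)
        return self._t

def record_scan(clock: Clock) -> tuple[datetime, datetime]:
    last_scanned_at = clock.now()   # repository.last_scanned_at = scanned_at
    row_created_at = clock.now()    # auto_now_add: a *fresh* timestamp per row
    return last_scanned_at, row_created_at

last_scanned_at, created_at = record_scan(Clock())
# The equality predicate the read queries rely on can never hold:
predicate_matches = created_at == last_scanned_at
```

Any fix that makes the two timestamps share a single captured value (or drops the equality join entirely) removes this class of bug.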
models.Index(
    fields=["project", "repository_url", "-created_at"],
    name="code_ref_proj_repo_created_idx",
constraints = [
Did you see this being used in your query plan? Mine isn't using it, which brings me to another important question: do you think we should test the query on the staging DB at least? The production DB is very different from a MacBook, and the query still looks complex enough to warrant testing on a prod-like DB.
> Did you see this being used in your query plan?

Yes! The constraint index was used heavily in my local tests to help narrow down the row search (feature, repository). But seemingly not enough, so thanks for flagging.
> do you think we should test the query on the staging DB at least?

I ran this scenario in staging, via direct database access and temporary tables matching the ones created in this PR: "a project with 400 features, 350 of which are present in code, 10 merges/day (mostly dupes), over 6 months".
Results revealed slowness would bite us again in the future for big customers running microservices, as unique_scanned_code_references would still lead to N items to sort. Luckily this was an easy fix (29f3276): annotation is down to sub-10 ms in most common cases, and sub-100 ms in big, bad cases like the benchmark.
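The commit itself isn't shown here, but the shape of the win is easy to illustrate: the summary only ever needs the newest scan per (feature, repository) pair, so sorting each pair's full history can be replaced by a single linear pass. A hedged sketch with a hypothetical row layout (the real table is `ScannedCodeReferences`):

```python
# Hypothetical rows: (feature_id, repository_id, created_at, reference_count).
scans = [
    (1, 10, "2024-01-01", 3),
    (1, 10, "2024-03-01", 5),  # newer scan supersedes the row above
    (1, 11, "2024-02-01", 2),
]

def latest_per_pair(rows):
    # O(N) single pass: keep only the newest scan per (feature, repository)
    # instead of sorting N historical rows per pair. ISO-8601 date strings
    # compare chronologically, so plain > works here.
    best = {}
    for feature_id, repo_id, created_at, count in rows:
        key = (feature_id, repo_id)
        if key not in best or created_at > best[key][0]:
            best[key] = (created_at, count)
    return best

summary = latest_per_pair(scans)
```

In SQL terms this is the same idea as letting a `(feature_id, repository_id, created_at)` index hand back the top row per pair directly, rather than materialising and sorting every scan.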
Sorry, I forgot to add this in my response above. Real benchmark numbers, formatted by an LLM:
Scenario 1, common: small project, steady scanning
40 features, 2 repos, 5 unique scans/repo/week, 6 months retained, 10,400 rows in bench_code_references_scannedcodereferences.
| query | time |
|---|---|
| list endpoint (full history) | 52 ms |
| list endpoint (3-month window) | 27 ms |
| detail endpoint × 100 features | 110 ms total (≈1.1 ms each) |
All three queries use cr_feature_repo_created_idx (introduced by this PR).
EXPLAIN ANALYZE full output
--- LIST ANNOTATION (full project) ---
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=58538.5..58538.5 rows=8 width=36) (actual time=52.4..52.4 rows=40 loops=1)
Output: f.id, ((SubPlan 2))
Sort Key: f.id
Sort Method: quicksort Memory: 46kB
Buffers: shared hit=52292
-> Index Scan using features_feature_project_id_72859830 on public.features_feature f (cost=0.3..58538.4 rows=8 width=36) (actual time=1.6..52.3 rows=40 loops=1)
Output: f.id, (SubPlan 2)
Index Cond: (f.project_id = 25969)
Filter: (f.deleted_at IS NULL)
Buffers: shared hit=52289
SubPlan 2
-> Aggregate (cost=7315.6..7315.6 rows=1 width=32) (actual time=1.3..1.3 rows=1 loops=40)
Output: array_agg((jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))))
Buffers: shared hit=52285
-> Unique (cost=7311.3..7315.5 rows=12 width=86) (actual time=1.3..1.3 rows=2 loops=40)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Buffers: shared hit=52285
-> Sort (cost=7311.3..7313.4 rows=843 width=86) (actual time=1.3..1.3 rows=260 loops=40)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Sort Key: r.url, scr.created_at DESC
Sort Method: quicksort Memory: 110kB
Buffers: shared hit=52285
-> Hash Join (cost=1.7..7270.3 rows=843 width=86) (actual time=0.0..1.2 rows=260 loops=40)
Output: jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0)), r.url, scr.created_at
Inner Unique: true
Hash Cond: (scr.repository_id = r.id)
Buffers: shared hit=52279
-> Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..142.7 rows=843 width=16) (actual time=0.0..0.2 rows=260 loops=40)
Output: scr.feature_id, scr.repository_id, scr.created_at
Index Cond: (scr.feature_id = f.id)
Heap Fetches: 10400
Buffers: shared hit=10678
-> Hash (cost=1.1..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buffers: shared hit=1
SubPlan 1
-> Limit (cost=0.4..8.4 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=10400)
Output: (jsonb_array_length(inner_scr.code_references)), inner_scr.created_at
Buffers: shared hit=41600
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences inner_scr (cost=0.4..8.4 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=10400)
Output: jsonb_array_length(inner_scr.code_references), inner_scr.created_at
Index Cond: ((inner_scr.feature_id = scr.feature_id) AND (inner_scr.repository_id = scr.repository_id) AND (inner_scr.created_at = r.last_scanned_at))
Buffers: shared hit=41600
Query Identifier: -4966899296074004523
Planning:
Buffers: shared hit=411
Planning Time: 1.0 ms
Execution Time: 52.5 ms
(52 rows)
Time: 314.6 ms
--- LIST ANNOTATION with 3-month window ---
QUERY PLAN
Sort (cost=29054.8..29054.8 rows=8 width=36) (actual time=26.6..26.6 rows=40 loops=1)
Output: f.id, ((SubPlan 2))
Sort Key: f.id
Sort Method: quicksort Memory: 46kB
Buffers: shared hit=26305
-> Index Scan using features_feature_project_id_72859830 on public.features_feature f (cost=0.3..29054.7 rows=8 width=36) (actual time=0.8..26.6 rows=40 loops=1)
Output: f.id, (SubPlan 2)
Index Cond: (f.project_id = 25969)
Filter: (f.deleted_at IS NULL)
Buffers: shared hit=26305
SubPlan 2
-> Aggregate (cost=3630.2..3630.2 rows=1 width=32) (actual time=0.7..0.7 rows=1 loops=40)
Output: array_agg((jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))))
Buffers: shared hit=26301
-> Unique (cost=3627.9..3630.0 rows=12 width=86) (actual time=0.6..0.7 rows=2 loops=40)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Buffers: shared hit=26301
-> Sort (cost=3627.9..3629.0 rows=416 width=86) (actual time=0.6..0.7 rows=130 loops=40)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Sort Key: r.url, scr.created_at DESC
Sort Method: quicksort Memory: 67kB
Buffers: shared hit=26301
-> Hash Join (cost=1.7..3609.8 rows=416 width=86) (actual time=0.0..0.6 rows=130 loops=40)
Output: jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0)), r.url, scr.created_at
Inner Unique: true
Hash Cond: (scr.repository_id = r.id)
Buffers: shared hit=26301
-> Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..88.8 rows=416 width=16) (actual time=0.0..0.1 rows=130 loops=40)
Output: scr.feature_id, scr.repository_id, scr.created_at
Index Cond: ((scr.feature_id = f.id) AND (scr.created_at >= (now() - '3 mons'::interval)))
Heap Fetches: 5200
Buffers: shared hit=5500
-> Hash (cost=1.1..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buffers: shared hit=1
SubPlan 1
-> Limit (cost=0.4..8.5 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=5200)
Output: (jsonb_array_length(inner_scr.code_references)), inner_scr.created_at
Buffers: shared hit=20800
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences inner_scr (cost=0.4..8.5 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=5200)
Output: jsonb_array_length(inner_scr.code_references), inner_scr.created_at
Index Cond: ((inner_scr.feature_id = scr.feature_id) AND (inner_scr.repository_id = scr.repository_id) AND (inner_scr.created_at >= (now() - '3 mons'::interval)) AND (inner_scr.created_at = r.last_scanned_at))
Buffers: shared hit=20800
Query Identifier: 5778658752241488958
Planning:
Buffers: shared hit=12
Planning Time: 0.3 ms
Execution Time: 26.7 ms
(52 rows)
Time: 300.2 ms
--- DETAIL QUERY across 100 features (single plan, 100 loops on the inner scan) ---
QUERY PLAN
Incremental Sort (cost=1041.5..16824.9 rows=729 width=1575) (actual time=109.8..110.4 rows=80 loops=1)
Output: s.feature_id, scr.id, scr.created_at, scr.revision, scr.code_references, r.url, r.vcs_provider, r.last_scanned_at
Sort Key: s.feature_id, r.url
Presorted Key: s.feature_id
Full-sort Groups: 3 Sort Method: quicksort Average Memory: 76kB Peak Memory: 76kB
Buffers: shared hit=27195
-> Nested Loop (cost=1001.0..16799.0 rows=729 width=1575) (actual time=109.3..110.3 rows=80 loops=1)
Output: s.feature_id, scr.id, scr.created_at, scr.revision, scr.code_references, r.url, r.vcs_provider, r.last_scanned_at
Buffers: shared hit=27195
-> Limit (cost=1000.6..6528.0 rows=100 width=4) (actual time=109.3..109.4 rows=40 loops=1)
Output: s.feature_id
Buffers: shared hit=25635
-> Unique (cost=1000.6..22557.4 rows=390 width=4) (actual time=109.3..109.4 rows=40 loops=1)
Output: s.feature_id
Buffers: shared hit=25635
-> Gather Merge (cost=1000.6..22556.4 rows=396 width=4) (actual time=109.3..109.4 rows=43 loops=1)
Output: s.feature_id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=25635
-> Unique (cost=0.6..21510.7 rows=198 width=4) (actual time=83.3..90.0 rows=14 loops=3)
Output: s.feature_id
Buffers: shared hit=25635
Worker 0: actual time=75.1..83.3 rows=40 loops=1
Buffers: shared hit=15030
Worker 1: actual time=76.5..88.2 rows=3 loops=1
Buffers: shared hit=5502
-> Nested Loop (cost=0.6..21510.2 rows=198 width=4) (actual time=83.3..90.0 rows=27 loops=3)
Output: s.feature_id
Inner Unique: true
Buffers: shared hit=25635
Worker 0: actual time=75.0..83.3 rows=77 loops=1
Buffers: shared hit=15030
Worker 1: actual time=76.5..88.2 rows=3 loops=1
Buffers: shared hit=5502
-> Parallel Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences s (cost=0.4..17668.6 rows=137042 width=16) (actual time=0.0..40.2 rows=109633 loops=3)
Output: s.feature_id, s.repository_id, s.created_at
Heap Fetches: 10400
Buffers: shared hit=19133
Worker 0: actual time=0.0..23.5 rows=104455 loops=1
Buffers: shared hit=12689
Worker 1: actual time=0.0..42.8 rows=102664 loops=1
Buffers: shared hit=3161
-> Memoize (cost=0.1..0.2 rows=1 width=12) (actual time=0.0..0.0 rows=0 loops=328900)
Output: r_1.id, r_1.last_scanned_at
Cache Key: s.repository_id, s.created_at
Cache Mode: logical
Hits: 120871 Misses: 910 Evictions: 0 Overflows: 0 Memory Usage: 72kB
Buffers: shared hit=6502
Worker 0: actual time=0.0..0.0 rows=0 loops=104455
Hits: 103285 Misses: 1170 Evictions: 0 Overflows: 0 Memory Usage: 92kB
Buffers: shared hit=2341
Worker 1: actual time=0.0..0.0 rows=0 loops=102664
Hits: 101494 Misses: 1170 Evictions: 0 Overflows: 0 Memory Usage: 92kB
Buffers: shared hit=2341
-> Index Scan using bench_code_references_vcsrepository_pkey on public.bench_code_references_vcsrepository r_1 (cost=0.1..0.2 rows=1 width=12) (actual time=0.0..0.0 rows=0 loops=3250)
Output: r_1.id, r_1.last_scanned_at
Index Cond: (r_1.id = s.repository_id)
Filter: ((r_1.project_id = 25969) AND (s.created_at = r_1.last_scanned_at))
Rows Removed by Filter: 1
Buffers: shared hit=6502
Worker 0: actual time=0.0..0.0 rows=0 loops=1170
Buffers: shared hit=2341
Worker 1: actual time=0.0..0.0 rows=0 loops=1170
Buffers: shared hit=2341
-> Nested Loop (cost=0.4..102.6 rows=12 width=1575) (actual time=0.0..0.0 rows=2 loops=40)
Output: scr.id, scr.created_at, scr.revision, scr.code_references, scr.feature_id, r.url, r.vcs_provider, r.last_scanned_at
Buffers: shared hit=1560
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=65) (actual time=0.0..0.0 rows=12 loops=40)
Output: r.url, r.vcs_provider, r.last_scanned_at, r.id
Buffers: shared hit=40
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..8.4 rows=1 width=1518) (actual time=0.0..0.0 rows=0 loops=480)
Output: scr.id, scr.created_at, scr.revision, scr.code_references, scr.code_references_hash, scr.feature_id, scr.repository_id
Index Cond: ((scr.feature_id = s.feature_id) AND (scr.repository_id = r.id) AND (scr.created_at = r.last_scanned_at))
Buffers: shared hit=1520
Query Identifier: 180893212974011606
Planning:
Buffers: shared hit=36
Planning Time: 1.2 ms
Execution Time: 110.5 ms
(80 rows)
Time: 364.8 ms
Scenario 2, an exaggerated but plausible bad case: large project, many repos
400 features, 10 repos, ~3.5 unique scans/repo/week, 6 months retained, 318,500 rows in bench_code_references_scannedcodereferences.
| query | time |
|---|---|
| list endpoint (full history) | 1,805 ms |
| list endpoint (3-month window) | 1,488 ms (~18% faster) |
| detail endpoint × 100 features | 85 ms total (≈0.85 ms each) |
The list endpoint stays above 1 s at this scale even with the covering index, because the inner subplan still loops once per (feature, repository) pair (318,500 loops, 1 row each). The 3-month window helps less than expected because the seed distributes scans uniformly over 6 months.
EXPLAIN ANALYZE full output
--- LIST ANNOTATION (full project) ---
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using features_feature_pkey on public.features_feature f (cost=0.3..564849.7 rows=77 width=36) (actual time=12.8..1804.5 rows=400 loops=1)
Output: f.id, (SubPlan 2)
Filter: ((f.deleted_at IS NULL) AND (f.project_id = 25968))
Rows Removed by Filter: 32295
Buffers: shared hit=1297014
SubPlan 2
-> Aggregate (cost=7315.6..7315.6 rows=1 width=32) (actual time=4.5..4.5 rows=1 loops=400)
Output: array_agg((jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))))
Buffers: shared hit=1283893
-> Unique (cost=7311.3..7315.5 rows=12 width=86) (actual time=4.4..4.5 rows=9 loops=400)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Buffers: shared hit=1283893
-> Sort (cost=7311.3..7313.4 rows=843 width=86) (actual time=4.4..4.4 rows=796 loops=400)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Sort Key: r.url, scr.created_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1283893
-> Hash Join (cost=1.7..7270.3 rows=843 width=86) (actual time=0.0..3.7 rows=796 loops=400)
Output: jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0)), r.url, scr.created_at
Inner Unique: true
Hash Cond: (scr.repository_id = r.id)
Buffers: shared hit=1283887
-> Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..142.7 rows=843 width=16) (actual time=0.0..0.1 rows=796 loops=400)
Output: scr.feature_id, scr.repository_id, scr.created_at
Index Cond: (scr.feature_id = f.id)
Heap Fetches: 0
Buffers: shared hit=9886
-> Hash (cost=1.1..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buffers: shared hit=1
SubPlan 1
-> Limit (cost=0.4..8.4 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=318500)
Output: (jsonb_array_length(inner_scr.code_references)), inner_scr.created_at
Buffers: shared hit=1274000
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences inner_scr (cost=0.4..8.4 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=318500)
Output: jsonb_array_length(inner_scr.code_references), inner_scr.created_at
Index Cond: ((inner_scr.feature_id = scr.feature_id) AND (inner_scr.repository_id = scr.repository_id) AND (inner_scr.created_at = r.last_scanned_at))
Buffers: shared hit=1274000
Query Identifier: -4966899296074004523
Planning:
Buffers: shared hit=411
Planning Time: 1.1 ms
Execution Time: 1804.6 ms
(47 rows)
Time: 2152.9 ms (00:2.2)
--- LIST ANNOTATION with 3-month window ---
QUERY PLAN
Index Scan using features_feature_pkey on public.features_feature f (cost=0.3..281069.1 rows=77 width=36) (actual time=9.6..1488.2 rows=400 loops=1)
Output: f.id, (SubPlan 2)
Filter: ((f.deleted_at IS NULL) AND (f.project_id = 25968))
Rows Removed by Filter: 32295
Buffers: shared hit=650730
SubPlan 2
-> Aggregate (cost=3630.2..3630.2 rows=1 width=32) (actual time=3.7..3.7 rows=1 loops=400)
Output: array_agg((jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))))
Buffers: shared hit=637609
-> Unique (cost=3627.9..3630.0 rows=12 width=86) (actual time=3.6..3.7 rows=9 loops=400)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Buffers: shared hit=637609
-> Sort (cost=3627.9..3629.0 rows=416 width=86) (actual time=3.6..3.7 rows=394 loops=400)
Output: (jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0))), r.url, scr.created_at
Sort Key: r.url, scr.created_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=637609
-> Hash Join (cost=1.7..3609.8 rows=416 width=86) (actual time=0.0..3.3 rows=394 loops=400)
Output: jsonb_build_object('repository_url', r.url, 'last_successful_repository_scanned_at', r.last_scanned_at, 'last_feature_found_at', scr.created_at, 'count', COALESCE((SubPlan 1), 0)), r.url, scr.created_at
Inner Unique: true
Hash Cond: (scr.repository_id = r.id)
Buffers: shared hit=637609
-> Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..88.8 rows=416 width=16) (actual time=0.0..0.1 rows=394 loops=400)
Output: scr.feature_id, scr.repository_id, scr.created_at
Index Cond: ((scr.feature_id = f.id) AND (scr.created_at >= (now() - '3 mons'::interval)))
Heap Fetches: 0
Buffers: shared hit=7608
-> Hash (cost=1.1..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=58) (actual time=0.0..0.0 rows=12 loops=1)
Output: r.url, r.last_scanned_at, r.id
Buffers: shared hit=1
SubPlan 1
-> Limit (cost=0.4..8.5 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=157500)
Output: (jsonb_array_length(inner_scr.code_references)), inner_scr.created_at
Buffers: shared hit=630000
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences inner_scr (cost=0.4..8.5 rows=1 width=12) (actual time=0.0..0.0 rows=1 loops=157500)
Output: jsonb_array_length(inner_scr.code_references), inner_scr.created_at
Index Cond: ((inner_scr.feature_id = scr.feature_id) AND (inner_scr.repository_id = scr.repository_id) AND (inner_scr.created_at >= (now() - '3 mons'::interval)) AND (inner_scr.created_at = r.last_scanned_at))
Buffers: shared hit=630000
Query Identifier: 5778658752241488958
Planning:
Buffers: shared hit=12
Planning Time: 0.3 ms
Execution Time: 1488.4 ms
(47 rows)
Time: 1744.8 ms (00:1.7)
--- DETAIL QUERY across 100 features (single plan, 100 loops on the inner scan) ---
QUERY PLAN
Incremental Sort (cost=1041.6..16836.9 rows=729 width=1575) (actual time=78.2..84.4 rows=1000 loops=1)
Output: s.feature_id, scr.id, scr.created_at, scr.revision, scr.code_references, r.url, r.vcs_provider, r.last_scanned_at
Sort Key: s.feature_id, r.url
Presorted Key: s.feature_id
Full-sort Groups: 25 Sort Method: quicksort Average Memory: 88kB Peak Memory: 88kB
Buffers: shared hit=28571
-> Nested Loop (cost=1001.0..16810.9 rows=729 width=1575) (actual time=77.8..81.6 rows=1000 loops=1)
Output: s.feature_id, scr.id, scr.created_at, scr.revision, scr.code_references, r.url, r.vcs_provider, r.last_scanned_at
Buffers: shared hit=28571
-> Limit (cost=1000.6..6539.9 rows=100 width=4) (actual time=77.7..78.0 rows=100 loops=1)
Output: s.feature_id
Buffers: shared hit=23871
-> Unique (cost=1000.6..22603.9 rows=390 width=4) (actual time=77.7..78.0 rows=100 loops=1)
Output: s.feature_id
Buffers: shared hit=23871
-> Gather Merge (cost=1000.6..22602.0 rows=780 width=4) (actual time=77.7..78.0 rows=200 loops=1)
Output: s.feature_id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=23871
-> Unique (cost=0.6..21511.9 rows=390 width=4) (actual time=0.2..45.3 rows=234 loops=3)
Output: s.feature_id
Buffers: shared hit=23871
Worker 0: actual time=0.2..66.3 rows=350 loops=1
Buffers: shared hit=17134
Worker 1: actual time=0.1..69.4 rows=350 loops=1
Buffers: shared hit=6440
-> Nested Loop (cost=0.6..21509.5 rows=988 width=4) (actual time=0.2..45.2 rows=1167 loops=3)
Output: s.feature_id
Inner Unique: true
Buffers: shared hit=23871
Worker 0: actual time=0.2..66.1 rows=1810 loops=1
Buffers: shared hit=17134
Worker 1: actual time=0.1..69.3 rows=1689 loops=1
Buffers: shared hit=6440
-> Parallel Index Only Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences s (cost=0.4..17668.6 rows=137042 width=16) (actual time=0.0..12.1 rows=109633 loops=3)
Output: s.feature_id, s.repository_id, s.created_at
Heap Fetches: 10400
Buffers: shared hit=19127
Worker 0: actual time=0.0..20.4 rows=173049 loops=1
Buffers: shared hit=14793
Worker 1: actual time=0.0..16.0 rows=155706 loops=1
Buffers: shared hit=4327
-> Memoize (cost=0.1..0.2 rows=1 width=12) (actual time=0.0..0.0 rows=0 loops=328900)
Output: r_1.id, r_1.last_scanned_at
Cache Key: s.repository_id, s.created_at
Cache Mode: logical
Hits: 0 Misses: 145 Evictions: 0 Overflows: 0 Memory Usage: 12kB
Buffers: shared hit=4744
Worker 0: actual time=0.0..0.0 rows=0 loops=173049
Hits: 171879 Misses: 1170 Evictions: 0 Overflows: 0 Memory Usage: 92kB
Buffers: shared hit=2341
Worker 1: actual time=0.0..0.0 rows=0 loops=155706
Hits: 154650 Misses: 1056 Evictions: 0 Overflows: 0 Memory Usage: 83kB
Buffers: shared hit=2113
-> Index Scan using bench_code_references_vcsrepository_pkey on public.bench_code_references_vcsrepository r_1 (cost=0.1..0.2 rows=1 width=12) (actual time=0.0..0.0 rows=0 loops=2371)
Output: r_1.id, r_1.last_scanned_at
Index Cond: (r_1.id = s.repository_id)
Filter: ((r_1.project_id = 25968) AND (s.created_at = r_1.last_scanned_at))
Rows Removed by Filter: 1
Buffers: shared hit=4744
Worker 0: actual time=0.0..0.0 rows=0 loops=1170
Buffers: shared hit=2341
Worker 1: actual time=0.0..0.0 rows=0 loops=1056
Buffers: shared hit=2113
-> Nested Loop (cost=0.4..102.6 rows=12 width=1575) (actual time=0.0..0.0 rows=10 loops=100)
Output: scr.id, scr.created_at, scr.revision, scr.code_references, scr.feature_id, r.url, r.vcs_provider, r.last_scanned_at
Buffers: shared hit=4700
-> Seq Scan on public.bench_code_references_vcsrepository r (cost=0.0..1.1 rows=12 width=65) (actual time=0.0..0.0 rows=12 loops=100)
Output: r.url, r.vcs_provider, r.last_scanned_at, r.id
Buffers: shared hit=100
-> Index Scan using bench_cr_feature_repo_created_idx on public.bench_code_references_scannedcodereferences scr (cost=0.4..8.4 rows=1 width=1518) (actual time=0.0..0.0 rows=1 loops=1200)
Output: scr.id, scr.created_at, scr.revision, scr.code_references, scr.code_references_hash, scr.feature_id, scr.repository_id
Index Cond: ((scr.feature_id = s.feature_id) AND (scr.repository_id = r.id) AND (scr.created_at = r.last_scanned_at))
Buffers: shared hit=4600
Query Identifier: 180893212974011606
Planning:
Buffers: shared hit=32
Planning Time: 0.4 ms
Execution Time: 84.5 ms
(80 rows)
Time: 343.6 ms
Force-pushed d78d38d to 29f3276
Thanks for submitting a PR! Please check the boxes below:
I have added information to docs/ if required so people know about the feature.

Changes
Important
Contains a data migration.
Closes #5932
Would originally close #6832
Improve code references so they're usable again.
The previous data model did not handle accumulated code references too well, and we had to disable it.
This patch will:
code_references_ui_stats feature flag branches, 🎉.

How did you test this code?