fix: run ANALYZE at startup and use real relpages for stats #100
Open
Query Doctor Analysis
32 queries analyzed
2 pre-existing issues
SELECT "guests"."id", "guests"."session_id", "guests"."username", "guests"."avatar_path", "guests"."color", "guests"."side", "guests"."audio_recording_path", "guests"."audio_recording_public", "gue...
index assets(event_id, uploader_id, inserted_at desc)
cost 15,922 → 1,639 (90% reduction)
SELECT * FROM guest_ip_addresses WHERE ip_address = '127.0.0.1';
index guest_ip_addresses(ip_address)
cost 126 → 8 (94% reduction)
16df9bc to ae07ed2
PostgreSQL's planner does not take a table's size from pg_class.relpages when the table has data; it reads the actual number of disk pages via RelationGetNumberOfBlocks() and uses reltuples / relpages only as a tuple density. The old fromAssumption(reltuples=10000, relpages=1) therefore made the planner estimate tuples as actual_pages × 10000 / 1, inflating row estimates by up to 74x (e.g. 740,000 instead of 10,000 for a 10K-row table).

Fix: run ANALYZE before reading statistics so that pg_class.relpages and pg_statistic are populated deterministically, then build a fromStatisticsExport mode that pairs reltuples=10,000 with the real relpages from pg_class. Because relpages then matches the actual page count, the planner formula produces exactly 10,000 regardless of the actual data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
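The arithmetic in this commit message can be checked with a small sketch. It is a simplified model of the estimate described above, not PostgreSQL's actual code. The function name `estimated_rows` and the 74-page table size are assumptions for illustration (74 pages is inferred from the 74x figure with relpages=1).

```python
def estimated_rows(actual_pages: int, reltuples: float, relpages: int) -> int:
    """Simplified model: tuple density from stored stats, scaled by pages on disk."""
    if relpages <= 0:
        # Empty or never-analyzed tables are handled differently by PostgreSQL;
        # this sketch deliberately does not model that case.
        raise ValueError("this sketch only covers tables with relpages > 0")
    density = reltuples / relpages  # tuples per page according to pg_class
    return round(density * actual_pages)


ACTUAL_PAGES = 74  # assumed: inferred from the 74x inflation, not measured here

# Old behaviour: reltuples=10000 paired with a made-up relpages=1.
old = estimated_rows(ACTUAL_PAGES, reltuples=10_000, relpages=1)

# New behaviour: reltuples=10000 paired with the real page count.
new = estimated_rows(ACTUAL_PAGES, reltuples=10_000, relpages=ACTUAL_PAGES)

print(old, new)  # 740000 10000
```

When relpages equals the number of pages actually on disk, the page counts cancel and the estimate collapses to reltuples, which is why the result no longer depends on how much data was seeded.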
Move the ANALYZE call inside the else branch so it only runs when buildStatsFromDatabase needs accurate pg_class.relpages. When users provide their own stats export, ANALYZE is skipped entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
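A minimal sketch of the control flow this commit describes, written in Python for illustration only. `resolve_stats` and its callable parameters are hypothetical stand-ins, not the project's API; only the branch structure (ANALYZE runs solely on the database path) comes from the commit message.

```python
from typing import Any, Callable, Optional


def resolve_stats(
    statistics_path: Optional[str],
    execute_sql: Callable[[str], Any],
    load_export: Callable[[str], dict],
    build_from_database: Callable[[], dict],
) -> dict:
    """Hypothetical helper: pick the stats source and run ANALYZE only if needed."""
    if statistics_path is not None:
        # User-supplied export: use it as-is and never touch the database stats.
        return load_export(statistics_path)
    # No export given: refresh pg_class.relpages and pg_statistic first,
    # then derive the statistics from the database itself.
    execute_sql("ANALYZE")
    return build_from_database()


# Usage with stub callables that record what was executed.
executed: list = []
from_db = resolve_stats(None, executed.append,
                        lambda path: {"source": path},
                        lambda: {"source": "database"})
from_file = resolve_stats("stats.json", executed.append,
                          lambda path: {"source": path},
                          lambda: {"source": "database"})
print(executed)  # ['ANALYZE']
```

Passing the SQL executor and loaders in as callables keeps the branch logic testable without a live database.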
ae07ed2 to dd00b83
a07c52d to 73875de
Temporary commit to verify actual relpages and estimated rows values in CI. Will be reverted after capturing the numbers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
73875de to f19f9c8
Summary
- Replace fromAssumption(reltuples=10000, relpages=1) with a fromStatisticsExport mode that reads real relpages from pg_class. PostgreSQL's planner does not use pg_class.relpages as the table size; it reads actual disk pages via RelationGetNumberOfBlocks(), then estimates tuples = actual_pages × reltuples ÷ relpages. With relpages=1 this inflated estimates by up to 74x (740,000 instead of 10,000).
- When no statisticsPath is provided (CI default), run ANALYZE first so pg_class.relpages and pg_statistic reflect the current data deterministically. ANALYZE is skipped entirely when users provide their own stats export.
- Cover buildStatsFromDatabase with 9 integration tests proving the planner estimates exactly 10,000 rows regardless of actual data (1, 10K, or 50K rows seeded).

Test plan
- fromAssumption(relpages=1) produces 740,000 estimate
- columns: null preserves ANALYZE's pg_statistic entries

🤖 Generated with Claude Code