fix: run ANALYZE at startup and use real relpages for stats #100

Open
veksen wants to merge 3 commits into main from veksen/analyze-at-startup

veksen (Member) commented Mar 27, 2026

Summary

  • Replace fromAssumption(reltuples=10000, relpages=1) with a fromStatisticsExport mode that reads real relpages from pg_class. PostgreSQL's planner does not treat pg_class.relpages as the table's size: it reads the actual number of disk pages via RelationGetNumberOfBlocks() and uses relpages only to derive tuple density, estimating tuples = actual_pages × reltuples ÷ relpages. With relpages=1 this inflated estimates by up to 74x (740,000 instead of 10,000)
  • When no statisticsPath is provided (CI default), run ANALYZE first so pg_class.relpages and pg_statistic reflect the current data deterministically. ANALYZE is skipped entirely when users provide their own stats export
  • Extract buildStatsFromDatabase, with 9 integration tests proving the planner estimates exactly 10,000 rows regardless of actual data (1, 10K, or 50K rows seeded)
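
For reviewers who want to check the arithmetic, here is a minimal sketch of the estimate formula quoted above. It is not PostgreSQL's code: `estimate_tuples` is a hypothetical helper, the 74-page table size is inferred from the 74x figure, and the sketch only covers the `relpages >= 1` case.

```python
def estimate_tuples(actual_pages: int, reltuples: float, relpages: int) -> int:
    """Simplified row estimate: density from pg_class scaled by the real page count."""
    if relpages < 1:
        # Assumption: this sketch does not model the planner's fallback
        # for relations that have no usable relpages value.
        raise ValueError("this sketch only covers relpages >= 1")
    density = reltuples / relpages  # tuples per page, according to pg_class
    return round(density * actual_pages)


# Old behaviour: relpages=1 claims all 10,000 tuples fit on a single page,
# so every extra page on disk multiplies the estimate.
inflated = estimate_tuples(actual_pages=74, reltuples=10_000, relpages=1)

# New behaviour: relpages matches the pages actually on disk,
# so the density and the page count cancel out.
corrected = estimate_tuples(actual_pages=74, reltuples=10_000, relpages=74)

print(inflated, corrected)  # 740000 10000
```

Because the page count cancels out whenever relpages equals the real number of pages, the estimate stays at 10,000 no matter how many rows are seeded.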

Test plan

  • 9 integration tests against local PostgreSQL covering:
    • Planner estimates 10,000 rows with 1 / 10K / 50K rows seeded
    • Bug reproduction: fromAssumption(relpages=1) produces 740,000 estimate
    • relpages clamped to ≥1 for empty tables
    • Indexes grouped by parent table
    • columns: null preserves ANALYZE's pg_statistic entries
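
The clamping and grouping cases can be pictured with a small sketch. It is an illustration only, not the actual buildStatsFromDatabase: `build_stats`, the row dictionaries, and the `table` key are hypothetical names, and the sample page counts are made up.

```python
from collections import defaultdict

TARGET_RELTUPLES = 10_000  # fixed row-count assumption from the PR description


def build_stats(rows):
    """Pair the assumed reltuples with real relpages and group indexes by table."""
    tables = {}
    indexes = defaultdict(list)
    for row in rows:
        relpages = max(1, row["relpages"])  # empty relations report 0 pages
        if row["relkind"] == "i":
            # Index rows are collected under the table they belong to.
            indexes[row["table"]].append(
                {"name": row["relname"], "relpages": relpages}
            )
        else:
            tables[row["relname"]] = {
                "reltuples": TARGET_RELTUPLES,
                "relpages": relpages,
            }
    return {
        name: {**entry, "indexes": indexes.get(name, [])}
        for name, entry in tables.items()
    }


# Hypothetical rows shaped like a pg_class query result.
rows = [
    {"relname": "users", "relkind": "r", "relpages": 74, "table": None},
    {"relname": "users_pkey", "relkind": "i", "relpages": 30, "table": "users"},
    {"relname": "empty_table", "relkind": "r", "relpages": 0, "table": None},
]
stats = build_stats(rows)
print(stats["empty_table"]["relpages"])  # 1
```

Clamping to 1 keeps the density division well defined for tables that have no pages yet.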

🤖 Generated with Claude Code

github-actions bot left a comment

Query Doctor Analysis

32 queries analyzed

2 pre-existing issues

veksen force-pushed the veksen/analyze-at-startup branch 2 times, most recently from 16df9bc to ae07ed2, March 27, 2026 10:47
veksen and others added 2 commits March 27, 2026 15:03
PostgreSQL's planner ignores pg_class.relpages for tables with data —
it reads actual disk pages via RelationGetNumberOfBlocks(). The old
fromAssumption(reltuples=10000, relpages=1) caused the planner to
estimate tuples as actual_pages × 10000 / 1, inflating row estimates
by up to 74x (e.g. 740,000 instead of 10,000 for a 10K-row table).

Fix: run ANALYZE before reading statistics to populate pg_statistic
deterministically, then build a fromStatisticsExport mode that pairs
reltuples=10,000 with the real relpages from pg_class. This makes
the planner formula produce exactly 10,000 regardless of actual data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the ANALYZE call inside the else branch so it only runs when
buildStatsFromDatabase needs accurate pg_class.relpages. When users
provide their own stats export, ANALYZE is skipped entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
veksen force-pushed the veksen/analyze-at-startup branch from ae07ed2 to dd00b83, March 27, 2026 11:03
veksen force-pushed the veksen/analyze-at-startup branch 2 times, most recently from a07c52d to 73875de, March 27, 2026 13:40
Temporary commit to verify actual relpages and estimated rows values
in CI. Will be reverted after capturing the numbers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
veksen force-pushed the veksen/analyze-at-startup branch from 73875de to f19f9c8, March 27, 2026 13:41