Speed up analyze MCV tracking with hash table lookups#1

Open
Reminiscent wants to merge 1 commit into master from codex/implement-mcv-calculation-optimization-in-postgres
@Reminiscent
Owner

Summary

  • add an optional simplehash map in compute_distinct_stats to speed up MCV lookups when hash support exists
  • gate the hash path behind a size threshold (200 slots) and the availability of a default hash operator for the type, falling back to the linear scan otherwise
  • keep track[] ordering in sync with hash entries while supporting replacements and swaps
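
The last bullet is the subtle part: once a lookup structure stores slot numbers, every move inside track[] has to update those numbers too. A minimal sketch of that invariant follows. It is not the patch's code: the names (`ToyTrack`, `toy_bump`, `toy_replace`) are hypothetical, values are small integers, and a plain `slot_of[]` array stands in for the simplehash entries.

```c
#include <assert.h>

#define TOY_MAX 16              /* toy domain: values are ints in [0, TOY_MAX) */

typedef struct { int value; int count; } ToyItem;

typedef struct {
    ToyItem track[TOY_MAX];     /* kept sorted by descending count */
    int     track_cnt;
    int     slot_of[TOY_MAX];   /* stand-in for hash entries: value -> slot, -1 if untracked */
} ToyTrack;

void toy_init(ToyTrack *t)
{
    t->track_cnt = 0;
    for (int v = 0; v < TOY_MAX; v++)
        t->slot_of[v] = -1;
}

/* Append a new value with count 1 and record its slot. */
int toy_add(ToyTrack *t, int value)
{
    assert(value >= 0 && value < TOY_MAX);
    assert(t->track_cnt < TOY_MAX && t->slot_of[value] == -1);
    int slot = t->track_cnt++;
    t->track[slot].value = value;
    t->track[slot].count = 1;
    t->slot_of[value] = slot;
    return slot;
}

/* Swap two track[] entries and fix up both lookup entries. */
static void toy_swap(ToyTrack *t, int a, int b)
{
    ToyItem tmp = t->track[a];
    t->track[a] = t->track[b];
    t->track[b] = tmp;
    t->slot_of[t->track[a].value] = a;
    t->slot_of[t->track[b].value] = b;
}

/* Count one more occurrence of a tracked value, then bubble it toward the front. */
int toy_bump(ToyTrack *t, int value)
{
    int slot = t->slot_of[value];
    assert(slot >= 0);
    t->track[slot].count++;
    while (slot > 0 && t->track[slot].count > t->track[slot - 1].count)
    {
        toy_swap(t, slot, slot - 1);
        slot--;
    }
    return slot;
}

/* Overwrite a slot with a new value: drop the old lookup entry, add the new one. */
void toy_replace(ToyTrack *t, int slot, int new_value)
{
    assert(slot >= 0 && slot < t->track_cnt && t->slot_of[new_value] == -1);
    t->slot_of[t->track[slot].value] = -1;
    t->track[slot].value = new_value;
    t->track[slot].count = 1;
    t->slot_of[new_value] = slot;
}
```

Routing every move through one swap helper and one replace helper keeps the places where the two structures could diverge to a minimum.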

Testing

  • Not run (not requested)

Codex Task

compute_distinct_stats() tracks possible MCVs for datatypes that have
an equality operator but no ordering.  Finding a match currently requires
a linear scan of the tracking array for every sampled row, which can
become very expensive when statistics targets are set high.

When the tracking array is large enough and the type's default hash
support matches the equality operator, maintain a simplehash table that
maps a tracked value to its current track[] slot.  This reduces match
lookups from O(n) to O(1) on average while keeping the existing linear
path as a fallback.
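
As a rough illustration of the gating and lookup described above, here is a standalone sketch. It does not use PostgreSQL's simplehash.h. It substitutes a small fixed-size open-addressing table over `int` values, and every name in it (`McvTracker`, `HASH_THRESHOLD`, `mcv_find`, and so on) is hypothetical. Only the 200-slot threshold comes from the summary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_THRESHOLD 200      /* hypothetical name for the 200-slot gate */
#define TABLE_SIZE     1024     /* power of two; assumed at least 2x the track size */

typedef struct { int value; int count; } McvItem;
typedef struct { int used; int value; int slot; } McvHashEntry;

typedef struct {
    McvItem     *track;         /* caller-provided tracking array */
    int          track_cnt;
    int          track_max;
    int          use_hash;      /* 1 if lookups go through the table */
    McvHashEntry table[TABLE_SIZE];
} McvTracker;

/* Simple integer mixer; a real datatype would use its own hash function. */
static uint32_t mcv_hash(int v)
{
    uint32_t h = (uint32_t) v;
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

void mcv_init(McvTracker *t, McvItem *storage, int track_max, int type_is_hashable)
{
    assert(track_max * 2 <= TABLE_SIZE);
    memset(t, 0, sizeof(*t));
    t->track = storage;
    t->track_max = track_max;
    /* Hash only when the array is large and the type supports hashing. */
    t->use_hash = type_is_hashable && track_max >= HASH_THRESHOLD;
}

/* Return the track[] slot holding value, or -1 if it is not tracked. */
int mcv_find(const McvTracker *t, int value)
{
    if (!t->use_hash)
    {
        for (int i = 0; i < t->track_cnt; i++)      /* O(n) fallback */
            if (t->track[i].value == value)
                return i;
        return -1;
    }
    /* Linear probing: average O(1) while the table stays sparse. */
    for (uint32_t i = mcv_hash(value) & (TABLE_SIZE - 1);
         t->table[i].used;
         i = (i + 1) & (TABLE_SIZE - 1))
        if (t->table[i].value == value)
            return t->table[i].slot;
    return -1;
}

/* Append an untracked value with count 1; returns its slot, or -1 if full. */
int mcv_append(McvTracker *t, int value)
{
    if (t->track_cnt >= t->track_max)
        return -1;
    int slot = t->track_cnt++;
    t->track[slot].value = value;
    t->track[slot].count = 1;
    if (t->use_hash)
    {
        uint32_t i = mcv_hash(value) & (TABLE_SIZE - 1);
        while (t->table[i].used)
            i = (i + 1) & (TABLE_SIZE - 1);
        t->table[i].used = 1;
        t->table[i].value = value;
        t->table[i].slot = slot;
    }
    return slot;
}
```

The sketch never deletes from the table. A version that handles replacement would also need entry removal, which simplehash provides but this toy table does not.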

Add a regression test exercising the hashed path.

@Reminiscent force-pushed the codex/implement-mcv-calculation-optimization-in-postgres branch from f929522 to 9c5c504 on January 14, 2026 02:27