Fix: unbounded sql pools #9891

Open
gurudatta-patil wants to merge 3 commits into temporalio:main from gurudatta-patil:fix/unbounded-sql-pools

@gurudatta-patil

What changed?

DatabaseHandle.reconnect() in db_handle.go had two closely related bugs that caused an unbounded accumulation of *sql.DB pools during sustained DB unavailability:

  1. Pool destroyed before throttle check. The old pool was nil'd out (h.db.Store(nil)) before the throttle check. When throttled, no new pool was created, so h.db stayed nil for the entire 1-second window. Every caller in that window got DatabaseUnavailableError, triggering another ConvertError → reconnect(true) → destroy pool → throttled → nil again, a loop that lasted the entire outage.

  2. New pool created before old one closed. On each un-throttled reconnect, a fresh *sql.DB was opened while the previous one was closed asynchronously. During a 2-3 minute outage (~150 throttle windows), ~150 generations of pools accumulated. On recovery, all of them raced to open connections simultaneously, blowing through maxConns by a factor of ~150.
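
The arithmetic behind the "factor of ~150" can be sketched as follows. This is only a back-of-the-envelope model, not code from the PR: worstCaseConns is a hypothetical helper, and the maxConns value of 20 is an assumed example setting. It assumes every pool generation opened during the outage is still alive at recovery and that each one fills up to maxConns independently.

```go
package main

import "fmt"

// worstCaseConns is a hypothetical helper (not part of the PR). It estimates
// how many pools pile up during an outage and how many connections they could
// open together on recovery, assuming one new pool per throttle window and
// that each pool enforces maxConns only for itself.
func worstCaseConns(outageSeconds, throttleSeconds, maxConns int) (pools, conns int) {
	pools = outageSeconds / throttleSeconds // one un-throttled reconnect per window
	conns = pools * maxConns                // every generation fills up independently
	return pools, conns
}

func main() {
	// 150 one-second throttle windows, with an assumed maxConns of 20.
	pools, conns := worstCaseConns(150, 1, 20)
	fmt.Printf("pools=%d worst-case conns=%d\n", pools, conns)
}
```

The limit is enforced per *sql.DB, so nothing bounds the total across generations; that is why the overshoot scales with the number of throttle windows in the outage.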

The fix: nil out the old pool and run go prevConn.Close() only after a new connection has been successfully established, and return the existing pool when throttled instead of returning nil.
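
A minimal sketch of the reordered flow, under stated assumptions. It is not the actual db_handle.go code: pool stands in for *sql.DB, and handle, connect, now, and runOutage are hypothetical names introduced for illustration. The sketch stores the new pool directly rather than going through nil, and closes the old pool synchronously (the PR uses go prevConn.Close()) so the result is deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pool is a hypothetical stand-in for *sql.DB.
type pool struct{ closed atomic.Bool }

func (p *pool) Close() { p.closed.Store(true) }

// handle is a simplified, hypothetical stand-in for DatabaseHandle.
type handle struct {
	mu          sync.Mutex
	db          atomic.Pointer[pool]
	lastAttempt time.Time
	minInterval time.Duration
	connect     func() (*pool, error) // assumed connector
	now         func() time.Time      // injectable clock
}

func (h *handle) reconnect() *pool {
	h.mu.Lock()
	defer h.mu.Unlock()

	prev := h.db.Load()
	now := h.now()
	// Throttled: return the existing pool rather than nil.
	if !h.lastAttempt.IsZero() && now.Sub(h.lastAttempt) < h.minInterval {
		return prev
	}
	h.lastAttempt = now

	next, err := h.connect()
	if err != nil {
		return prev // connect failed: the old pool stays in place
	}
	// Swap and tear down only after the new pool exists.
	h.db.Store(next)
	if prev != nil {
		prev.Close()
	}
	return next
}

// runOutage simulates `windows` failed reconnect attempts, one per second,
// followed by one successful attempt after the database recovers.
func runOutage(windows int) (keptDuringOutage bool, opened int, oldClosed bool) {
	clock := time.Unix(0, 0)
	up := false
	h := &handle{minInterval: time.Second, now: func() time.Time { return clock }}
	h.connect = func() (*pool, error) {
		if !up {
			return nil, errors.New("db unavailable")
		}
		opened++
		return &pool{}, nil
	}
	old := &pool{}
	h.db.Store(old)

	for i := 0; i < windows; i++ {
		clock = clock.Add(time.Second)
		h.reconnect()
	}
	keptDuringOutage = h.db.Load() == old && !old.closed.Load()

	up = true
	clock = clock.Add(time.Second)
	h.reconnect()
	return keptDuringOutage, opened, old.closed.Load()
}

func main() {
	kept, opened, closed := runOutage(150)
	fmt.Println(kept, opened, closed)
}
```

In this model, the handle holds exactly one pool throughout the outage, and only one new pool is opened when the database comes back, regardless of how many throttle windows elapsed.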

Why?

Fixes #9747

How did you test it?

  • built
  • added new unit test(s) — TestReconnectPoolAccumulationDuringOutage and TestReconnectNilPoolOnThrottle in db_handle_test.go directly reproduce both failure modes

Potential risks

  • The old pool is now kept alive until a successful reconnect, so callers may briefly continue to use a pool whose connections are failing rather than getting DatabaseUnavailableError immediately. This is intentional — ConvertError still detects individual connection errors, retriggers reconnect(true), and the throttle ensures we attempt at most one reconnect per second. The overall behaviour is strictly better under sustained outages.

@gurudatta-patil gurudatta-patil requested review from a team as code owners April 9, 2026 16:25

Development

Successfully merging this pull request may close these issues.

MySQL Connector creates unbounded new sql.DB pools during sustained DB unavailability