panic: interface conversion: policy.BackendCtx is nil, not *router.backendWrapper in rebalance loop (v1.3.1) #1171

@dingsongjie

Bug Report

1. Minimal reproduce step

Deploy TiProxy v1.3.1 via TiDB Operator without TLS enabled, with balance.policy = "resource" (default) and at least 2 TiDB backends. Have a client maintaining a long-lived connection.

2. What did you expect to see?

TiProxy runs stably without crash.

3. What did you see instead?

TiProxy panics and restarts repeatedly (~every 11 minutes). Over 36 days the container has restarted 683 times.

The panic stack trace from the previous container log:

panic: interface conversion: policy.BackendCtx is nil, not *router.backendWrapper
goroutine 198 [running]:
github.com/pingcap/tiproxy/pkg/balance/router.(*ScoreBasedRouter).rebalance(0xc0004bbc20, {0x13d1918, 0xc00013dc70})
/workspace/source/tiproxy/pkg/balance/router/router_score.go:355 +0x69f
github.com/pingcap/tiproxy/pkg/balance/router.(*ScoreBasedRouter).rebalanceLoop(0xc0004bbc20, {0x13d1918, 0xc00013dc70})
/workspace/source/tiproxy/pkg/balance/router/router_score.go:331 +0x16d
github.com/pingcap/tiproxy/pkg/balance/router.(*ScoreBasedRouter).Init.func1()
/workspace/source/tiproxy/pkg/balance/router/router_score.go:67 +0x1f
github.com/pingcap/tiproxy/lib/util/waitgroup.(*WaitGroup).Run.func1()
/workspace/source/tiproxy/lib/util/waitgroup/waitgroup.go:26 +0x4c
created by github.com/pingcap/tiproxy/lib/util/waitgroup.(*WaitGroup).Run in goroutine 1
/workspace/source/tiproxy/lib/util/waitgroup/waitgroup.go:24 +0x73

Container last state: Reason: Error, Exit Code: 2

4. Reproduce cycle

  1. TiProxy starts → discovers 2 healthy TiDB backends → works normally
  2. Client connects (connID=1)
  3. After ~5 minutes, rebalance triggers connection redirect:
    WARN redirect connection failed connID=1 from=tidb-tidb-2st127...:4000 to=tidb-tidb-0nsusf...:4000
    redirect_err="ERROR 8146 (HY000): cannot migrate the current session: no certificate or key file to sign the data"
  4. Redirect fails because TLS is not configured (no cert/key to sign migration data)
  5. ~5 minutes later, rebalance triggers again → panic: BackendCtx is nil
  6. Container crashes (exit code 2), K8s restarts it → cycle repeats

5. Environment

  • TiProxy version: v1.3.1 (Git Commit: 967f4cc)
  • TiDB version: 8.5.6
  • Deployment: TiDB Operator on Kubernetes
  • TLS: Not enabled (no certificates configured)
  • Balance policy: resource (default)
  • Resource limits: CPU 500m, Memory 1Gi

6. Config

[proxy]
addr = "[::]:6000"
advertise-addr = "tiproxy-tiproxy-u6q7d3.tiproxy-tiproxy-peer.tidb.svc"
pd-addrs = "pd-pd.tidb:2379"
graceful-close-conn-timeout = 15

[balance]
policy = "resource"

[advance]
ignore-wrong-namespace = true

[security]
# No TLS configured

7. Workaround

Setting policy = "" to disable balance stops the panic, but loses the load balancing feature.

8. Additional notes

- This is likely related to #486 (Fix panic when a TiDB fails during session migration) which was fixed in v1.0.0 but apparently doesn't cover this code path.
- The panic occurs at router_score.go:355, where a policy.BackendCtx interface value is asserted to *router.backendWrapper without a nil check while the value is nil.
- The root cause seems to be that a failed connection redirect (due to missing TLS) leaves the BackendCtx in an inconsistent state, which then triggers the nil panic on the next rebalance cycle.
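
To illustrate the failure mode described in these notes, here is a minimal, self-contained Go sketch. It is not TiProxy code: BackendCtx and backendWrapper below are hypothetical stand-ins for policy.BackendCtx and router.backendWrapper, and the helper names are invented for the example. It only shows that a single-value type assertion on a nil interface panics, while the comma-ok form reports the failure instead.

```go
package main

import "fmt"

// BackendCtx is a hypothetical stand-in for the policy.BackendCtx interface.
type BackendCtx interface {
	Addr() string
}

// backendWrapper is a hypothetical stand-in for router.backendWrapper.
type backendWrapper struct {
	addr string
}

func (b *backendWrapper) Addr() string { return b.addr }

// unsafeUnwrap uses the single-value assertion, which panics when ctx is a
// nil interface value.
func unsafeUnwrap(ctx BackendCtx) *backendWrapper {
	return ctx.(*backendWrapper)
}

// safeUnwrap uses the comma-ok form, which never panics: for a nil interface
// it returns (nil, false) so the caller can skip the entry.
func safeUnwrap(ctx BackendCtx) (*backendWrapper, bool) {
	w, ok := ctx.(*backendWrapper)
	return w, ok
}

// panicMessage runs f and returns the recovered panic text, or "" if f did
// not panic.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	var missing BackendCtx // nil interface, as reported in the stack trace

	// The unchecked assertion panics with an "interface conversion" error.
	fmt.Println("unchecked:", panicMessage(func() { unsafeUnwrap(missing) }))

	// The checked assertion lets the caller handle the nil case.
	if _, ok := safeUnwrap(missing); !ok {
		fmt.Println("checked: backend context is nil, skipping")
	}
}
```

Whether a guard like this is the right fix depends on why the context is nil in the first place. If a failed redirect leaves stale state behind, skipping the entry would hide the symptom without fixing that state.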
