
Boolean type mismatch causes unnecessary Patroni config reloads during topology changes #1419

@marceloneppel

Description


Steps to reproduce

  1. Deploy PostgreSQL 14 with 3 units from the 14/stable channel (rev 987):
    juju deploy postgresql --channel 14/stable -n 3
  2. Wait for the cluster to become active:
    juju status --watch 5s
  3. Verify the cluster is healthy:
    juju exec -u postgresql/0 -- sudo -u snap_daemon charmed-postgresql.patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
  4. Note the current time and check existing Patroni logs to establish a baseline:
    juju exec -u postgresql/0 -- sudo bash -c "cat /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log.* 2>/dev/null" | grep -i "Changed" | tail -5
  5. Block Raft port on one replica to cause it to drop from the cluster:
    juju exec -u postgresql/1 -- sudo iptables -A INPUT -p tcp --dport 2222 -j DROP
    juju exec -u postgresql/1 -- sudo iptables -A OUTPUT -p tcp --dport 2222 -j DROP
  6. Wait 30 seconds for the member to drop and topology change to be detected:
    sleep 30
  7. Restore connectivity to allow the member to rejoin:
    juju exec -u postgresql/1 -- sudo iptables -F
  8. Wait 30 seconds for the member to rejoin:
    sleep 30
  9. Check Patroni logs for new config change messages (compare timestamps with step 4):
    juju exec -u postgresql/0 -- sudo bash -c "cat /var/snap/charmed-postgresql/common/var/log/patroni/patroni.log.* 2>/dev/null" | grep -i "Changed"
  10. Observe the boolean type mismatch in the new log entries:
  Changed archive_mode from 'on' to 'True' (restart might be required)
  Changed synchronous_commit from 'on' to 'True'
  Changed statement_timeout from '2s' to '0'

Expected behavior

Patroni should not reload PostgreSQL configuration when no actual configuration changes have been made. Boolean parameters like archive_mode and synchronous_commit should use PostgreSQL's native 'on'/'off' values consistently.
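For illustration only (this is a sketch of the comparison failure, not Patroni's actual diff code), the mismatch comes down to a string comparison: PostgreSQL reports its running value as a string such as 'on', while a Python bool stringifies to 'True':

```python
# Hypothetical sketch of why a Python bool trips a config "change":
# PostgreSQL reports running values as strings (e.g. 'on'), so any
# desired value is effectively compared in string form.
def looks_changed(running: str, desired) -> bool:
    # str(True) == 'True', which never equals 'on', so a Python bool
    # always registers as a change even when nothing changed.
    return running != str(desired)

print(looks_changed("on", True))   # spurious "change" -> reload
print(looks_changed("on", "on"))   # no change -> no reload
```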

Actual behavior

When cluster topology changes (members dropping/rejoining), charm hooks run and trigger update_config(). Patroni logs show configuration "changes" due to type mismatch between PostgreSQL's 'on'/'off' and Python's True/False:

  2026-02-04 19:48:46 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)
  2026-02-04 19:48:46 UTC [5767]: INFO: Changed statement_timeout from '2s' to '0'
  2026-02-04 19:48:46 UTC [5767]: INFO: Changed synchronous_commit from 'on' to 'True'
  2026-02-04 19:48:46 UTC [5767]: INFO: Reloading PostgreSQL configuration.

This happens each time a topology change triggers charm hooks. When combined with network instability causing frequent topology changes, the primary unit flaps between "Primary" and "Primary (degraded)" states.

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.6.14

Juju agent: 3.6.14

Charm revision: 987 (14/stable channel)

LXD: 5.21.4 LTS

Log output

Juju debug log: (attach log.txt)

  Patroni log showing the config changes triggered by topology changes:
  2026-02-04 19:48:12 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)
  2026-02-04 19:48:12 UTC [5767]: INFO: Changed statement_timeout from '2s' to '0'
  2026-02-04 19:48:12 UTC [5767]: INFO: Changed synchronous_commit from 'on' to 'True'
  2026-02-04 19:48:12 UTC [5767]: INFO: Changed wal_keep_size from '128MB' to '4096'
  2026-02-04 19:48:13 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)
  2026-02-04 19:48:29 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)
  2026-02-04 19:48:40 UTC [5767]: INFO: Changed synchronous_standby_names from '' to '*'
  2026-02-04 19:48:40 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)
  2026-02-04 19:49:02 UTC [5767]: INFO: Changed archive_mode from 'on' to 'True' (restart might be required)

Juju debug log showing corresponding charm hooks:

  unit-postgresql-0: 19:48:06 INFO ran "start" hook
  unit-postgresql-0: 19:48:34 INFO ran "database-peers-relation-changed" hook
  unit-postgresql-0: 19:48:43 INFO ran "database-peers-relation-changed" hook
  unit-postgresql-0: 19:48:50 INFO ran "database-peers-relation-changed" hook
  unit-postgresql-0: 19:49:03 INFO ran "upgrade-relation-changed" hook
  unit-postgresql-0: 19:49:11 INFO ran "database-peers-relation-changed" hook

Additional context

Root cause: The charm's build_postgresql_parameters() method in lib/charms/postgresql_k8s/v0/postgresql.py does not convert Python boolean values (True/False) to PostgreSQL's expected string values ('on'/'off'). When PostgreSQL reports 'on' and the charm sends True, Patroni sees this as a configuration change and triggers a reload.
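A possible fix shape (a sketch under assumptions: the helper name and dict layout below are illustrative, not the charm's actual API) is to normalize Python booleans to PostgreSQL's 'on'/'off' strings before the parameters reach Patroni:

```python
# Hypothetical normalization step for build_postgresql_parameters()
# output (helper name and dict shape are assumptions, not the charm's API).
def normalize_parameters(params: dict) -> dict:
    normalized = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # PostgreSQL boolean GUCs use 'on'/'off'; Python True/False
            # stringify to 'True'/'False' and trip Patroni's change check.
            normalized[name] = "on" if value else "off"
        else:
            normalized[name] = value
    return normalized

print(normalize_parameters({"archive_mode": True, "work_mem": "16MB"}))
```

With values normalized this way, the string Patroni compares against PostgreSQL's reported 'on' is identical, so no spurious reload is triggered.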

Affected parameters observed:

  • archive_mode: 'on' → 'True'
  • synchronous_commit: 'on' → 'True'
  • statement_timeout: '2s' → '0'
  • wal_keep_size: '128MB' → '4096'

Trigger: Config changes occur when cluster topology changes (members dropping/rejoining) trigger charm hooks that call update_config(). In a stable cluster, these won't appear frequently. In an unstable network environment with frequent member drops, they compound the instability.

Impact: Each unnecessary config reload causes cluster churn. When combined with network instability causing frequent topology changes, members may temporarily drop from the Patroni cluster, triggering "Primary (degraded)" status on the leader unit. This flapping behavior ("Primary" ↔ "Primary (degraded)") was observed in production DBaaS environments.

Labels: bug