PostgreSQL HA Cluster on OCI
Terraform module for deploying a High-Availability PostgreSQL cluster on Oracle Cloud Infrastructure (OCI) using Patroni for automatic failover and etcd for distributed consensus.
Component
Count
Role
PostgreSQL Nodes
2
Primary + Replica with automatic failover
etcd Witness
1
Quorum/consensus provider (no PostgreSQL)
Network Load Balancer
1
Routes connections to the current Patroni leader
PostgreSQL : Configurable version (default: 15)
Patroni : Cluster management and automatic failover
etcd : Distributed configuration store (default: 3.5.17)
Operating System : Oracle Linux
Cloud Provider : Oracle Cloud Infrastructure (OCI)
Network Load Balancer
(TCP Port 5432 - PostgreSQL)
/ \
PostgreSQL Node 1 PostgreSQL Node 2
(Leader) (Replica)
- PostgreSQL - PostgreSQL
- Patroni - Patroni
- etcd - etcd
\ /
etcd Witness Node
(Quorum Only - no PostgreSQL)
module "postgres-cluster" {
source = " ./modules/postgres-cluster"
prefix = " myapp"
cluster_name = " MYAPP-POSTGRES"
common_tags = { environment = " prod" , managed_by = " terraform" }
compartment_id = var. compartment_id
vcn_id = var. vcn_id
subnet_id = var. subnet_id
subnet_cidr = " 10.0.2.0/24"
postgres_cluster_nsg = {
PostgreSQL = {
cidr_blocks = [" 10.0.2.0/24" ]
min_port = 5432
max_port = 5432
direction = " INGRESS"
stateless = false
}
Patroni = {
cidr_blocks = [" 10.0.2.0/24" ]
min_port = 8008
max_port = 8008
direction = " INGRESS"
stateless = false
}
etcd_client = {
cidr_blocks = [" 10.0.2.0/24" ]
min_port = 2379
max_port = 2379
direction = " INGRESS"
stateless = false
}
etcd_peer = {
cidr_blocks = [" 10.0.2.0/24" ]
min_port = 2380
max_port = 2380
direction = " INGRESS"
stateless = false
}
}
reserved_private_ips = [
cidrhost (" 10.0.2.0/24" , 175 ),
cidrhost (" 10.0.2.0/24" , 176 ),
cidrhost (" 10.0.2.0/24" , 177 ),
]
family_shape = " VM.Standard.E4.Flex"
image_id = var. oracle_linux_image_id
postgres_instance_specs = { ocpus = 4 , memory = 32 }
ssh_authorized_keys_postgres = var. ssh_keys
}
See examples/complete/ for a full working example with KMS encryption and compartment setup.
Name
Version
terraform
>= 1.5
oci
>= 5.0
Name
Description
Type
Default
Required
prefix
Prefix for all resource names
string
-
yes
cluster_name
Name for the PostgreSQL/Patroni cluster
string
-
yes
common_tags
Common freeform tags for all resources
map(string)
{}
no
compartment_id
OCI compartment OCID
string
-
yes
vcn_id
VCN OCID
string
-
yes
subnet_id
Subnet OCID for cluster instances
string
-
yes
subnet_cidr
Subnet CIDR for pg_hba rules
string
-
yes
postgres_cluster_nsg
NSG rules map
map(object)
-
yes
reserved_private_ips
3 private IPs: [node1, node2, witness]
list(string)
-
yes
family_shape
OCI compute shape
string
-
yes
image_id
Oracle Linux image OCID
string
-
yes
postgres_instance_specs
PostgreSQL instance CPU/memory
object
-
yes
witness_instance_specs
Witness instance CPU/memory
object
{ocpus=1, memory=8}
no
ssh_authorized_keys_postgres
Base64-encoded SSH public keys
list(string)
-
yes
kms_key_id
KMS key OCID for volume encryption
string
null
no
boot_volume_size_gb
Boot volume size in GB
number
100
no
data_volume_size_gb
Data volume size in GB
number
1024
no
wal_volume_size_gb
WAL volume size in GB
number
200
no
backup_volume_size_gb
Backup volume size in GB
number
600
no
postgresql_version
PostgreSQL major version
number
15
no
etcd_version
etcd version string
string
v3.5.17
no
db_name
Default database name
string
postgres
no
postgres_user
PostgreSQL superuser name
string
postgres
no
admin_user
Admin user name
string
admin
no
postgresql_parameters
PostgreSQL tuning parameters
object
{max_connections=200, shared_buffers="2GB", effective_cache_size="6GB"}
no
Name
Description
nlb_ip_addresses
Network Load Balancer IP addresses
postgres_instance_ids
Map of PostgreSQL instance OCIDs
postgres_private_ips
Map of PostgreSQL instance private IPs
witness_instance_id
etcd witness instance OCID
witness_private_ip
etcd witness private IP
nsg_id
Network Security Group OCID
Patroni detects primary failure (~10 seconds)
etcd quorum reached (replica + witness = 2/3)
Patroni promotes replica to leader (~5-15 seconds)
NLB health check (/primary on port 8008) detects new leader
Total downtime : typically 15-30 seconds
Each PostgreSQL node has three dedicated block volumes:
Volume
Default Size
Mount Point
Purpose
Data
1024 GB
/pgdata
PostgreSQL data files
WAL
200 GB
/pgwal
Write-Ahead Logs
Backup
600 GB
/pgbackup
Local backup storage
Service
Port
Purpose
PostgreSQL
5432
Database connections
Patroni REST API
8008
Cluster management & NLB health checks
etcd Client
2379
DCS communication
etcd Peer
2380
Cluster consensus
# Check cluster status
patronictl -c /etc/patroni.yml list
# Check replication lag
psql -c " SELECT client_addr, state, sync_state, replay_lag FROM pg_stat_replication;"
# Manual switchover
patronictl -c /etc/patroni.yml switchover
# etcd cluster health
etcdctl --endpoints=http://< node1> :2379,http://< node2> :2379,http://< witness> :2379 endpoint health
All instances use private IPs only (no public internet exposure)
Network access controlled via OCI Network Security Groups
PostgreSQL authentication via MD5 passwords (consider upgrading to scram-sha-256)
etcd runs without TLS on the private subnet (enable TLS for production if required)
Block volumes support KMS encryption via kms_key_id variable
Credentials are generated at bootstrap and stored in /root/.pgcredentials on the primary node
Enable PostgreSQL SSL/TLS for client connections
Migrate to scram-sha-256 authentication
Integrate with OCI Vault for secret management
Enable etcd TLS for production workloads
Configure audit logging
# Check service status
systemctl status patroni
systemctl status etcd
# View logs
journalctl -u patroni -n 100 --no-pager
journalctl -u etcd -n 100 --no-pager
tail -100 /var/log/user-data.log
Issue
Symptom
Solution
Split-brain
Both nodes claim leader
Check etcd quorum, restart Patroni
High replication lag
Lag > 10MB
Check network, disk I/O, PostgreSQL logs
etcd unhealthy
Member shows "unhealthy"
Restart etcd service, verify network
Connection refused
Cannot connect to PostgreSQL
Check firewall rules, verify Patroni state
modules/postgres-cluster/
main.tf # Locals and instance configuration
variables.tf # All input variables
outputs.tf # Module outputs
versions.tf # Provider and Terraform version constraints
data.tf # Data sources (ADs, fault domains)
nsg.tf # Network Security Group
nlb.tf # Network Load Balancer
block-volumes.tf # Block volumes and attachments
postgres_linux_instance.tf # PostgreSQL compute instances
etcd_linux_instance.tf # etcd witness compute instance
user-data/
postgres-userdata.sh # PostgreSQL node initialization
etcd-userdata.sh # etcd witness initialization
examples/complete/ # Full working example
See LICENSE for details.