当前位置：首页 > backend >正文

PostgreSQL主从同步双机集群创建与配置

backend 2025/7/20 1:11:35

PostgreSQL主从同步双机集群创建与配置

本文总结一下PostgreSQL数据库自带的主从同步双机集群创建与配置细化步骤，以及怎样进行手动failover。

PostgreSQL主从同步配置

注：下文cluster相当于oracle的数据库实例，一台服务器上可以创建多个cluster，只要给分配不同的端口就可以，相互之间都相互独立的。

文中涉及两台服务器，IP分别为192.168.1.11、192.168.1.12，起始时先在192.168.1.11安装PostgreSQL，并创建cluster d1(安装PostgreSQL软件与创建主库cluster过程本文略，另开文章记录)，作为主数据库（可读可写），然后在192.168.1.12安装好PostgreSQL(本文略，另开文章记录)，接下来后续文章描述怎么配置创建主从同步。

本文的操作目标是：主库的任务DDL与DML都能自动同步到从库，主库可读写，从库仅可读。

主库操作

登录主cluster d1数据库，创建用于主从同步的账户：
```
create role rep login replication encrypted password 'abcd1234';
```
增加rep账户对于主库的访问权限，主cluster修改pg_hba.conf文件，结尾增加以下条目：
```
host   replication   rep     192.168.1.12/32      trust
```
主cluster修改数据库配置文件postgresql.conf（使用13版本的PostgreSQL,使用debian apt安装并使用cluster命令创建的配置文件路径一般是/etc/postgresql/13/d1/postgresql.conf）
- 修改max_connections = 500
- 修改wal_keep_size = 8000,摘录官方说明如下：
  
  wal_keep_size (integer)
  
  Specifies the minimum size of past log file segments kept in the pg_wal directory, in case a standby server needs to fetch them for streaming replication. If a standby server connected to the sending server falls behind by more than wal_keep_size megabytes, the sending server might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. Downstream connections will also eventually fail as a result. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)
  
  This sets only the minimum size of segments retained in pg_wal; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_size is zero (the default), the system doesn’t keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and status of WAL archiving. If this value is specified without units, it is taken as megabytes. This parameter can only be set in the postgresql.conf file or on the server command line.
  
  wal_keep_size = wal_keep_segments * wal_segment_size (typically 16MB)
  
  wal_keep_segments=500 则 wal_keep_size=8000
- wal_sender_timeout (integer) 使用默认60s,摘录官方说明如下：
  
  Terminate replication connections that are inactive for longer than this amount of time. This is useful for the sending server to detect a standby crash or network outage. If this value is specified without units, it is taken as milliseconds. The default value is 60 seconds. A value of zero disables the timeout mechanism.
  
  With a cluster distributed across multiple geographic locations, using different values per location brings more flexibility in the cluster management. A smaller value is useful for faster failure detection with a standby having a low-latency network connection, and a larger value helps in judging better the health of a standby if located on a remote location, with a high-latency network connection.

从库操作

在192.168.1.12创建从库cluster（名称使用主cluster相同的d1）不启动
```
pg_createcluster 13 d1 --port 5436
```
然后改名备份/var/lib/postgresql/13/d1/数据目录，或者直接删除，因为这个后面会从主cluster复制过来的：
```
mv /var/lib/postgresql/13/d1/ /var/lib/postgresql/13/d1-bak/
```
在192.168.1.12 执行获取基础备份：
```
 pg_basebackup -h 192.168.1.11 -p 5436 -U rep -F p -X stream -P -R -D /var/lib/postgresql/13/d1/ -l backup0001 -W
```
pg_basebackup备注：

-R: 会在数据目录创建standby.signal,并在在数据目录下的postgresql.auto.conf增加primary_conninfo

-F p : formart=plain ; t 则为 tar

-X stream 备份期间获取预写日志方法？

-P 显示进度

-W要求输入密码

-l backup0001 ：-l label` 设置标签

使用
```
ps -ef |grep  walsender/walreceiver 
```
可以看对应服务器上的同步进程，ps -ef |grep walsender查看主库的同步进程，ps -ef |grep walreceiver查看从库的同步进程。

根据主数据库数据量的大小，执行pg_basebackup可能会花一段时间，执行完成后，命令行界面结果最新显示日志如下：
```
879596	1201748	rep	pg_basebackup 192.168.1.12 51198	2025-05-15 17:37:35.661 +0800 backup 0	async	
879609	1201748	rep	pg_basebackup 192.168.1.12 51212	2025-05-15 17:37:36.480 +0800 streaming	E/AE000110	E/AE000110	E/AE000000		00:00:06.080555	00:01:00.117825	00:01:00.117825	0	async	2024-05-15 17:38:37.125 +0800
```

从库数据目录/var/lib/postgresql/13/d1/能看到 postgresql.auto.conf文件，内容如下：

primary_conninfo = 'user=rep password=''abcd1234'' channel_binding=prefer host=192.168.1.11 port=5436 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable'

并且从库 /var/lib/postgresql/13/d1/ 目录下已经存在 standby.signal

可以登录主库，使用

SELECT client_addr,sync_state FROM pg_stat_replication
SELECT * FROM pg_stat_replication;
SELECT pg_is_in_recovery();

查看同步情况

从库查看有几个database:
```
SELECT datname FROM pg_database;
```
设置从库配置/etc/postgresql/13/d1/postgresql.conf，配置一些参数与现在的主库相同，这样，主从互换后，还是可以进行主从同步:

listen_address = ‘*’

wal_keep_size=8000

max_worker_processes 设置为=51,未设置会有以下提示

hot standby is not possible because max_worker_processes = 8 is a lower setting than on the master server (its value was 51)

max_locks_per_transaction 设置为=512,未设置会有以下提示

hot standby is not possible because max_locks_per_transaction = 64 is a lower setting than on the master server (its value was 512)

max_connections 设置为=500

设置从现主库访问现从库的权限，修改/etc/postgresql/13/d1/pg_hba.conf，增加以下条目：
```
#客户端访问
host   all   all     0.0.0.0/0      md5
#升主后从库复制需要的访问权限
host   replication   rep     192.168.1.11/32      trust
```
启动从库：
```
systemctl start postgresql@13-d1
```

测试

在主库建表插入数据，都能同步到从库。
如果在从库修改数据，会显示：

SQL 错误 [25006]: ERROR: cannot execute UPDATE in a read-only transaction
如果在从库建表，会显示：

ERROR: cannot execute CREATE TABLE in a read-only transaction

手动failover

首次failover

以上的双机同步配置结构，在主宕机的情况下，主库的功能并不会自动failover到从库，仅保证了主库宕机后，从库依然能够提供数据库读取的功能。不过在主库未能及时维修的情况下，DBA可以使用如下方法手动提升从库为主库。自动failover的配置另开文章，敬请关注。

原主-192.168.1.11模拟宕机：systemctl stop postgresql@13-d1原从库-192.168.1.12提升为主：/lib/postgresql/13/bin/pg_ctl promote -D /var/lib/postgresql/13/d1/原主-192.168.1.11配置为从库，配置好需要连接跟随的主库信息，并启动：vi /etc/postgresql/13/d1/postgresql.conf #增加配置行：primary_conninfo = 'host=192.168.1.12 port=5436 user=rep password=abcd1234'touch /var/lib/postgresql/13/d1/standby.signalsystemctl start postgresql@13-d1

后续failover操作

配置在首次切换主从时已经配置，即相比上面的手动failover测试少去了在postgresql.conf里增加primary_conninfo配置的操作。操作完整命令如下：

原主：
systemctl stop postgresql@13-d1原从：
/lib/postgresql/13/bin/pg_ctl promote -D /var/lib/postgresql/13/d1/原主：
touch /var/lib/postgresql/13/d1/standby.signal
systemctl start postgresql@13-d1

查看全文

http://www.xdnf.cn/news/9989.html