openGauss数据库备份与恢复实践
准备
数据库版本:
openGauss-All-6.0.1-openEuler22.03-x86_64
数据库:
testdb1
参数设置:
gs_guc set -I all -N all -c "enable_cbm_tracking=on"
gs_guc set -I all -N all -c "synchronous_commit=off"
gs_guc set -N all -I all -c "archive_mode=on"
gs_guc set -N all -I all -c "archive_dest='/opt/software/openGauss/install/archive'"
gs_guc set -N all -I all -c "wal_sender_timeout =10s"
gs_guc set -N all -I all -c "archive_timeout =300s"
表:
表初始数据:
开始备份
初始化备份目录:
执行:
#gs_probackup init -B /opt/software/openGauss/install/backup/bak1
添加一个新的备份实例:
执行:
#gs_probackup add-instance -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1
进行全量备份:
执行:
#gs_probackup backup -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -b full
[omm@hostName1 data]$ gs_probackup backup -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -b full
INFO: Backup start, gs_probackup version: 2.4.2, instance: bak20250605_1, backup ID: SXDEOH, backup mode: FULL, wal mode: STREAM, remote: false, compress-algorithm: none,compress-level: 1
LOG: Backup destination is initialized
LOG: This openGauss instance was initialized with data block checksums. Data block corruption will be detected
LOG: Database backup start
LOG: started streaming WAL at 0/8000000 (timeline 1)
[2025-06-05 14:43:30]: check identify system success
[2025-06-05 14:43:30]: send START_REPLICATION 0/8000000 success
[2025-06-05 14:43:30]: keepalive message is received
INFO: PGDATA size: 639MB
[2025-06-05 14:43:30]: keepalive message is received
INFO: Start backing up files
LOG: Creating page header map "/opt/software/openGauss/install/backup/bak1/backups/bak20250605_1/SXDEOH/page_header_map"
[2025-06-05 14:43:35]: keepalive message is received
Progress: [==================================================] 100% (3210/3210, done_files/total_files). backup file
[2025-06-05 14:43:40]: keepalive message is received
[2025-06-05 14:43:45]: keepalive message is received
INFO: Finish backuping file
INFO: Data files are transferred, time elapsed: 15s
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
LOG: stop_lsn: 0/80001E8
LOG: Looking for LSN 0/80001E8 in segment: 000000010000000000000008
LOG: Found WAL segment: /opt/software/openGauss/install/backup/bak1/backups/bak20250605_1/SXDEOH/database/pg_xlog/000000010000000000000008
LOG: Thread [0]: Opening WAL segment "/opt/software/openGauss/install/backup/bak1/backups/bak20250605_1/SXDEOH/database/pg_xlog/000000010000000000000008"
LOG: Found LSN: 0/80001E8
LOG: finished streaming WAL at 0/9000000 (timeline 1)
LOG: Getting the Recovery Time from WAL
LOG: Thread [0]: Opening WAL segment "/opt/software/openGauss/install/backup/bak1/backups/bak20250605_1/SXDEOH/database/pg_xlog/000000010000000000000008"
INFO: Syncing backup files to disk
Progress: [==================================================] 100% (3210/3210, done_files/total_files). Sync backup file
INFO: Finish Syncing backup files.
INFO: Backup files are synced, time elapsed: 0
INFO: Validating backup SXDEOH
INFO: Begin validate file
Progress: [==================================================] 100% (3212/3212, done_files/total_files). validate file
INFO: Finish validate file.
INFO: Backup SXDEOH data files are valid
INFO: Backup SXDEOH resident size: 655MB
INFO: Backup SXDEOH completed
[omm@hostName1 data]$
备份成功后返回备份ID:SXDEOH
查看备份信息:
执行:
#gs_probackup show -B /opt/software/openGauss/install/backup/bak1
检查备份:
执行:
#gs_probackup validate -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -i SXDEOH
查看备份目录:
备份目录中已经有备份的文件信息了
模拟业务删除数据
delete表:t_1的数据
truncate表: t_2
drop表:t_3
切换wal日志:
执行:
#select pg_xlogfile_name(pg_switch_xlog());
作用:
WAL位置转换:将LSN(Log Sequence Number)转换为实际的WAL文件名
例如将0/156DF40转换为000000010000000000000015
WAL文件识别:帮助管理员快速定位特定事务或操作对应的物理WAL文件
#checkpoint;
作用:
数据持久化保证:将内存中的脏页(Dirty Page)写入磁盘,确保在检查点之前的所有事务修改都已持久化到存储设备;
崩溃恢复基础:作为数据库恢复的起始点,崩溃后只需重放检查点之后的WAL日志;
WAL日志管理:标记哪些WAL日志可以被回收或删除,避免WAL日志无限增长
记录日志号:
#select pg_switch_xlog();
作用:
强制切换WAL日志文件:立即结束当前WAL日志文件,创建一个新的WAL日志文件继续写入
触发检查点(Checkpoint):间接触发检查点操作,确保所有已提交事务的数据页被刷新到磁盘
在备份前执行以确保WAL文件完整性
日志号(LSN)为:0/A000288
记录事务ID:
执行:
#select * from pg_get_variable_info();
其中:latest_completed_xid字段值为事务ID:92739
模拟故障
停止数据库:
执行:
#gs_ctl stop
删除数据目录:
为保险起见,使用mv命令:
进入数据库安装目录:
#cd /opt/software/openGauss/install
#mv data data_bak20250605_1
启动数据库:
执行:
#gs_ctl start
发现启动失败
开始恢复
全量恢复
使用restore子命令前,应先停止gaussdb进程
执行:
#gs_probackup restore -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -i SXDEOH
[omm@hostName1 install]$ gs_probackup restore -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -i SXDEOH
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SXDEOH
LOG: check external directories of backup SXDEOH
INFO: Validating backup SXDEOH
INFO: Begin validate file
Progress: [==================================================] 100% (3212/3212, done_files/total_files). validate file
INFO: Finish validate file.
INFO: Backup SXDEOH data files are valid
LOG: Thread [1]: Opening WAL segment "/opt/software/openGauss/install/backup/bak1/backups/bak20250605_1/SXDEOH/database/pg_xlog/000000010000000000000008"
INFO: Backup SXDEOH WAL segments are valid
INFO: Backup SXDEOH is valid.
INFO: Restoring the database from backup at 2025-06-05 14:43:29+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks... in /opt/software/openGauss/install/data/dn1
INFO: Start restoring backup files. DATA size: 655MB
INFO: Begin restore file
Progress: [==================================================] 100% (3212/3212, done_files/total_files). Restore file
INFO: Finish restore file
INFO: Backup files are restored. Transfered bytes: 655MB, time elapsed: 3s
INFO: Restore incremental ratio (less is better): 100% (655MB/655MB)
INFO: Start Syncing restored files to disk
Progress: [==================================================] 100% (3212/3212, done_files/total_files). Sync restore file
INFO: Finish Syncing restored files.
INFO: Restored backup files are synced, time elapsed: 6s
INFO: Restore of backup SXDEOH completed.
[omm@hostName1 install]$
查看数据目录是否恢复
#ll
数据目录已经恢复了
配置恢复文件:
在数据目录:/opt/software/openGauss/install/data/dn1
增加recovery.conf文件:
如果想基于时间点恢复,可配置为:
recovery_target_time = '2025-06-05 14:12:06'
recovery_target_inclusive = false
restore_command = 'cp /opt/software/openGauss/install/archive/%f %p'
如果想基于事务ID恢复,可配置为:
recovery_target_xid = '92739'
recovery_target_inclusive = true
restore_command = 'cp /opt/software/openGauss/install/archive/%f %p'
如果想基于LSN恢复,可配置为:
recovery_target_lsn = '4FD/13021790'
recovery_target_inclusive = true
restore_command = 'cp /opt/software/openGauss/install/archive/%f %p'
如果想基于还原点恢复,可配置为:
recovery_target_name = 'restore_point_1'
restore_command = 'cp /opt/software/openGauss/install/archive/%f %p'
注意:
1)如果不配置任何恢复目标 或 配置目标不存在,则默认恢复到最新的WAL日志点。
2)recovery_target_inclusive:声明是否在指定恢复目标之后停止(true) 或 之前停止(false),不支持recovery_target_name 配置
3)还原点可以通过pg_create_restore_point函数创建,例如:
#select pg_create_restore_point('restore_point_1');
注:启动数据库后recovery.conf文件名称变为recovery.done
只有配置了恢复文件recovery.conf变成recovery.done,本次恢复才算结束
启动数据库:
执行:
#gs_ctl start
数据库可以启动了!
启动数据库后recovery.conf文件名称变为recovery.done
只有配置了恢复文件recovery.conf变成recovery.done,本次恢复才算结束
进入数据库查看数据是否恢复:
delete表:t_1表数据全部恢复
truncate表:t_2表数据全部恢复
drop的表:t_3居然没有恢复过来!好奇怪!
检测恢复状态:
执行:
#SELECT pg_is_in_recovery();
返回结果:t表示仍在恢复模式(只读),f表示已完成
如果返回的是t,即数据库在恢复状态中,则数据库则是只读状态
结束恢复:
如果确认恢复没有问题,即达到了恢复目标,可以执行下面命令,使机器对外提供读写服务:
#select pg_xlog_replay_resume();
再次查看恢复状态:
#SELECT pg_is_in_recovery();
向表插入一条数据,验证一下:
至此,恢复完成!
删除备份:
查看备份信息:
#gs_probackup show -B /opt/software/openGauss/install/backup/bak1
删除备份:
把备份失败的备份删除,执行:
#gs_probackup delete -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -i SXEY2P
再次查看:
#gs_probackup show -B /opt/software/openGauss/install/backup/bak1
定时任务后台执行备份与恢复:
在实际生产应用中,需要定期对数据库做备份,需要使用到crontab定时任务
数据库数据量比较大,备份、恢复时间都比较长,直接执行备份命令会超时,也可以采用定时任务方式执行备份、恢复,
定时任务执行备份:
例如:
执行命令
#crontab -e
进入定时任务编辑,增加备份任务:
30 10 * * * source /home/omm/.bashrc;/opt/software/openGauss/install/app/bin/gs_probackup backup -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -b full -t 3000
说明:
50 10 * * * 表示每天10:50开始执行备份
source /home/omm/.bashrc 导入omm用户环境变量
定时任务执行日志,可以在/var/log目录下的cron文件中查看
到达定时任务指定的时间后,查看备份信息:
#gs_probackup show -B /opt/software/openGauss/install/backup/bak1
定时任务执行完以后,查看定时任务日志:
#vi /var/log/cron
也可以使用nohup命令后台执行备份避免超时;
nohup gs_probackup backup -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -b full > /opt/software/openGauss/install/backup/backup_20250206_3.log 2>&1 &
定时任务执行恢复:
恢复数据库时,一般不会用crontab定时任务,但数据库数据非常大时,恢复实践比较长,才用用到crontab定时任务来做恢复,注意配置crontab定时任务时,一定要配置一个一次性的,千万别搞成了定期执行了
配置crontab任务:
执行:
#crontab -e
增加:
15 11 * * * source /home/omm/.bashrc;/opt/software/openGauss/install/app/bin/gs_probackup restore -B /opt/software/openGauss/install/backup/bak1 --instance bak20250605_1 -i SXEYJE
等待定时任务指定的时间后,启动数据库,查看恢复情况