ORACLE 11.2.0.4 数据库磁盘空间爆满导致GAP产生
前言
昨天晚上深夜接到客户电话,反应数据库无法正常使用,想进入服务器检查时,登录响应非常慢。等两分钟后进入服务器且通过sqlplus进入数据库也很慢。通过检查服务器磁盘空间发现数据库所在区已经爆满,导致数据库在运行期间新增审计文件无法成功导致数据库宕机。客户说按照之前的策略,正常每天做完备份后均会删除七天前的归档日志。磁盘空间会保证一定程度的可用量。可最近直线上升。
问题
1、客户反应磁盘空间爆满不符合当前数据库的整体策略?
2、为什么会出现磁盘爆满的情况?
思考方向
1、磁盘空间爆满是因为数据库在运行期间,会不断产生日志文件(包括alert、trace、listener、audit等),同时如果数据文件使用了自动扩展也会占用磁盘空间。如果不定期清理日志文件和对数据文件进行瘦身,只会导致数据库占比空间越来越大。且客户有提到定期删除日志的现象,所以暂时排除该现象
2、磁盘空间爆满不符合数据库的整体策略。数据库针对不同方向可定制不同策略。如备份策略、日志文件定期删除策略。据客户反应,他们拥有备份和日志文件定期删除策略。也可以暂时排除该方向
检查
1、检查发现数据库确实存在备份策略和日志文件定期删除策略,且定时任务正常执行。检查以上策略的日志文件发现。日志文件定期删除策略中的日志记录正常执行,但是备份策略中的归档日志无法删除。
2、数据库中的归档日志无法删除。按照当前客户配置来看,不存在RAC环境。那么最大的可能就是DG库。那么问题就显而易见了,当存在DG库时,如果从库无法接收主库所产生的归档日志,那么主库的归档日志将无法删除,且会持续增长并占用主库磁盘空间。经检查从库磁盘空间,确实是从库磁盘空间爆满导致无法接收主库日志进行应用。所以主库无法删除归档日志。
查询从库是否存在GAP
1、从库查询GAP
SQL> select * from v$archive_gap;THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------1 332 334
2、查询从库已应用的日志
# 查询应用完成的日志
SQL> SELECT max(sequence#) from v$archived_log where applied='YES';MAX(SEQUENCE#)
--------------332
经查发现,从库由于磁盘空间爆满无法应用日志导致GAP产生。开始修复
删除从库已经应用的日志
注意:此处删除的日志应尽最大程度符合备份策略中对于归档日志删除的保留期
# 删除七天前的日志
find . -type f -mtime +7 -exec rm -rf {} \;
从库重新注册未应用的归档日志
RMAN>catalog start with '/data/archive_log_332.log';
RMAN>catalog start with '/data/archive_log_333.log';
RMAN>catalog start with '/data/archive_log_334.log';
从库应用GAP日志
RMAN>restore archivelog sequence between 332 and 334;
从库开启日志同步
SQL>alter database recover managed standby database using current logfile disconnect from session;
校验
经检查发现依然存在GAP。所以此处使用备份进行修复
QL> select process,client_process,sequence#,status,BLOCK#,BLOCKS,GROUP#,delay_mins from v$managed_standby;PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
ARCH ARCH 0 CONNECTED 0 0
N/A 0ARCH ARCH 443 CLOSING 12288 1610
5 0ARCH ARCH 0 CONNECTED 0 0
N/A 0PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
ARCH ARCH 444 CLOSING 1 321
4 0RFS ARCH 0 IDLE 0 0
N/A 0RFS LGWR 445 IDLE 25 1
1 0PROCESS CLIENT_P SEQUENCE# STATUS BLOCK# BLOCKS
--------- -------- ---------- ------------ ---------- ----------
GROUP# DELAY_MINS
---------------------------------------- ----------
RFS UNKNOWN 0 IDLE 0 0
N/A 0MRP0 N/A 335 wait_for_gap 25 102400
N/A 0
修复方法
1、查询从库当前SCN
SQL> select to_char(current_scn) from v$database;TO_CHAR(CURRENT_SCN)
----------------------------------------
993872
2、从库断开日志同步
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
3、从库关机并启动到nomount状态
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>startup nomount;
4、主库执行备份
run{
allocate channel c1 device type disk;
allocate channel c2 device type disk;
backup as backupset incremental from scn 993872 database format '/data/db/backup/zengliang_%u.bak';
backup current controlfile for standby format '/data/db/backup/standby.bak';
release channel c1;
release channel c2;
}
5、从库恢复控制文件
RMAN> restore standby controlfile from '/data/db/backup/standby.bak';Starting restore at 14-MAY-25
using channel ORA_DISK_1channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/data/db/oradata/master/control01.ctl
output file name=/data/db/oradata/master/control02.ctl
Finished restore at 14-MAY-25
6、从库注册备份
RMAN> catalog start with '/data/db/backup/';
searching for all files that match the pattern /data/db/backup/
List of Files Unknown to the Database
=====================================
File Name: /data/db/backup/standby.bak
File Name: /data/db/backup/zengliang_0e3peq3p.bak
File Name: /data/db/backup/zengliang_0f3peq3p.bak
File Name: /data/db/backup/zengliang_0g3peq4s.bak
7、从库恢复数据库
RMAN> recover database noredo;Starting recover at 14-MAY-25
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /data/db/oradata/master/system01.dbf
destination for restore of datafile 00003: /data/db/oradata/master/undotbs01.dbf
destination for restore of datafile 00005: /data/db/oradata/master/apps.dbf
channel ORA_DISK_1: reading from backup piece /data/db/backup/zengliang_0e3peq3p.bak
channel ORA_DISK_1: piece handle=/data/db/backup/zengliang_0e3peq3p.bak tag=TAG20250514T161657
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00002: /data/db/oradata/master/sysaux01.dbf
destination for restore of datafile 00004: /data/db/oradata/master/users01.dbf
destination for restore of datafile 00006: /data/db/oradata/master/test.dbf
channel ORA_DISK_1: reading from backup piece /data/db/backup/zengliang_0f3peq3p.bak
channel ORA_DISK_1: piece handle=/data/db/backup/zengliang_0f3peq3p.bak tag=TAG20250514T161657
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07Finished recover at 14-MAY-25
8、从库清理redo日志组
SQL> alter database clear logfile group 1;
SQL> alter database clear logfile group 2;
SQL> alter database clear logfile group 3;
9、从库打开数据库
SQL> alter database open read only;
10、从库开启日志应用
SQL>alter database recover managed standby database using current logfile disconnect from session;
定期删除脚本–OS级别
注意:使用该脚本并加入定时计划,可实现定期自动化删除归档日志,同理其它文件也可以使用该脚本
logpath="/data/scripts/new_scripts"
CURTIME=`date +%Y%m%d%H%M%S`
delF_log="$logpath/del_file_list_15.log"export week=`date +%w`
if [ ! -f $delF_log ]; then
touch $delF_log
chmod 777 $delF_log
else
rm -rf $delF_log
touch $delF_log
chmod 777 $delF_log
fiif [ $week -ne 0 ] ; thenecho "Today cannot Sunday:$week" >> $delF_logexit 0
fipad=$(printf '%0.1s' "-"{1..60})folList[0]="/back/prod/New_Coldback"
echo "Delete 15 days before file " >> $delF_log
echo "Perform Date :"$CURTIME >> $delF_log
echo "$pad" >>$delF_log
for i in ${folList[@]}
do
echo "Location is : $i " >>$delF_log
find $i -type d -mtime +15 >> $delF_log
find $i -type f -mtime +15 >> $delF_log
find $i -type f -mtime +15 -exec rm -rf {} \;
done
echo "$pad" >>$delF_log
echo "Complete Time :"$CURTIME >> $delF_log