当前位置: 首页 > article >正文

Oracle RAC环境NBU异机恢复

一、故障原因

9月3日中午,主机运维人员误操作,对sydb数据库的ASM磁盘做了格式化,将三块共享磁盘做了分区,然后使用vgextend命令加入到本地文件系,造成sydb的两个实例异常,没有操作CRS磁盘,所以RAC 集群无影响

二、问题排查

1.确定被格式化的磁盘

在这里插入图片描述
在这里插入图片描述

根据Linux的lsbk和history命令,可以确定mpathd和mpath两块盘做了格式化。实际上,当时history只查询了RAC的一个节点,在另一个节点对mpathe磁盘做了格式化。最终确定 三块ASM盘做了破会,其中一块属于ARCDG磁盘组(此磁盘组只包含一块磁盘),另两块属于DATADG磁盘组(此磁盘组只包含两块),而且磁盘组做的外部冗余。

2.确定磁盘破坏的AU

通过kfed 查询被破坏的三快盘

在这里插入图片描述

发现三块盘的AU 0 和AU 1 分别有不同数量的block被损坏。

三、恢复方案研究

  1. kfed 恢复

    ​ kfed 是Oracle 未公开的一种ASM工具,它可以读取和修改ASM的元数据块,kfed直接读取磁盘,不受ASM实例状态影响。

    ​ 但是三块磁盘被破坏的不仅仅是元数据(AU 0),还有AU 1 的部分数据块,而且kfed 更多的是对元数据磁盘头块的修复,因此不适合本次故障恢复

  2. amdu

    ​ amdu是元数据导出工具,它能输出ASM磁盘的元数据信息,并且从ASM磁盘中抽取元数据和数据文件,且不依赖ASM实例和ASM磁盘状态。

    ​ 使用AMDU可以抽取出来数据文件,然后尝试不完全恢复,但是有两块磁盘的ASM元数据1号文件(即文件目录,位于aun 0 blkn 1)被损坏,无法抽取磁盘组下的数据文件列表,而且数据库数据文件过多,工作量巨大,无法明确恢复时长。故也不适合本次故障恢复

  3. DUL

    ​ DUL或类DUL工具可以在实例未启动状态下直接读取磁盘组的system数据文件,将数据库做逻辑导出。

    ​ 但是目前DUL已经不公开,而且类DUL工具需要付费。另外数据量巨大,逻辑导出速度慢。

  4. 备份恢复

    ​ 所幸此系统有NBU备份,最晚全备时间为8月31日 22点,而且也有每日的归档备份,所以使用NBU备份做全量恢复,然后使用归档追回三天数据方案可行。

四、NBU恢复过程

1.恢复流程

由于无法确定NBU备份集的完整性,和归档的连续性,考虑到NBU恢复失败的可能性,采用如下恢复流程:

  • NBU备份恢复到一个新的单机环境sydb_rec
  • NBU恢复成功的前提下,重做ARCDG和DATADG磁盘组
  • 部署sydb_rec 到 sydb的ADG,sydb_rec 为主,将数据通过rman duplicate 和ADG实现数据全量实施同步。
  • 执行sydb_rec 和sydb的主备切换,sydb升级为主库,
  • 将sydb注册到CRS集群

2.恢复环境sydb_rec 部署

  • 恢复环境初始化
  • 静默安装数据库软件
  • 配置监听
  • 静默建库,正常关闭数据库,启动到nomount状态
  • 注意:恢复环境和正式环境 数据库小版本号要一致;操作系统oracle uid和gid和备份客户端一致;dbname 和SID一致

3.NBU恢复

(1)恢复控制文件

rman target /
set dbid 3691232446

run{
ALLOCATE CHANNEL CH00 TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64' ;
send 'nb_ora_serv=nbumaster,NB_ora_client=sybzdb2';
restore controlfile  from '/ctrl_dSYDB_uer3432v1_s91611_p1_t1178700769';
release channel CH00;
}

(2) 重定向redo日志

SQL>alter database rename file '+DATADG/sydb/onlinelog/group_1.272.1082888427' to '/oracle/data/oradata/sydb/redo1.log';
...

(3) 数据文件转储

rman  target  /
run {
allocate channel c1 type 'sbt_tape' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_SERV=nbumaster,NB_ORA_CLIENT=sybzdb2';
allocate channel c2 type 'sbt_tape' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_SERV=nbumaster,NB_ORA_CLIENT=sybzdb2';
set newname for datafile  1 to '/oracle/data/oradata/sydb/system.259.937960515';
...
set newname for datafile 86 to '/oracle/data/oradata/sydb/xxxxx.289.1172240931';
set newname for tempfile 1 to '/oracle/data/oradata/sydb/temp.262.937960519';
set newname for tempfile 2 to '/oracle/data/oradata/sydb/temp.290.1172566255';
set newname for tempfile 3 to '/oracle/data/oradata/sydb/temp.291.1172566617';
set newname for tempfile 4 to '/oracle/data/oradata/sydb/temp.292.1172567161';
restore database;
switch datafile all; 
release channel  c1;
release channel  c2;
}
#如果数据量较大,服务器资源足够,可以多开几个通道,
#建议卸载脚本里面,后台执行。

(4)恢复数据库

rman target  /
run {
allocate channel c1 type 'sbt_tape' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_SERV=nbumaster,NB_ORA_CLIENT=sybzdb2';
allocate channel c2 type 'sbt_tape' PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_SERV=nbumaster,NB_ORA_CLIENT=sybzdb2';
sql 'alter session set nls_date_format="yyyy-mm-dd hh24:mi:ss"';
set until time='2024-09-03 08:00:00';
recover database;
release channel  c1;
release channel  c2;
}
#可以指定时间,也可以不指定时间,以控制文件为准

(5)开启数据库

alter  database   open resetlogs;

(6) 数据库调整

#禁用1线程redo
alter system  disable thread 1;
#删除多余的redo,只保留4个
alter  database drop logfile  group 5;
alter  database drop logfile  group 6;
alter  database drop logfile  group 11;
alter  database drop logfile  group 12;
alter  database drop logfile  group 13;
alter  database drop logfile  group 14;
alter  database drop logfile  group 15;
alter  database drop logfile  group 16;

五、ADG数据同步

1.ADG环境初始化

(1)清理生产环境db 实例

srvctl status database -d sydb
srvctl  stop database -d sydb
dbca -silent -deleteInstance -nodeList sybzdb1 -gdbName sydb -instanceName sydb1 -sysDBAUserName sys -sysDBAPassword xxxx
srvctl  remove database  -d sydb

(2) 磁盘组重建

alter diskgroup ARCDG dismount;
alter diskgroup DATADG dismount;
alter diskgroup DATADG1 dismount;
drop diskgroup ARCDG force including contents;
drop diskgroup DATADG force including contents;
drop diskgroup DATADG1 force including contents;

#11G 版本以后drop  磁盘组 加force 参数可以擦除磁盘头,不需要再执行dd
create diskgroup ARCDG external redundancy disk '/dev/asmdata1' attribute 'au_size'='1M';
create diskgroup DATADG external redundancy disk 
'/dev/asmdata2',
'/dev/asmdata3',
'/dev/asmdata4',
'/dev/asmdata5',
'/dev/asmdata6'
attribute 'au_size'='1M';

# 必要的话修改asm磁盘兼容性,也可以在创建时指定
alter diskgroup arcdg  set  attribute 'compatible.asm'='11.2';
alter diskgroup datadg  set  attribute 'compatible.asm'='11.2';

#两个节点都需要要mount
alter diskgroup arcdg mount;
alter  diskgroup datadg mount;

(3 )静态监听配置

为了不和 生产环境的监听冲突,新建dup监听,端口为1522

sydb_rec监听配置
DUP =
	(DESCRIPTION =
		(ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xx.120.152)(PORT=1522))
	)
SID_LIST_DUP = 
	(SID_LIST =
		(SID_DESC =
			(GLOBAL_DBNAME = sydb_rec)   
			(ORACLE_HOME = /oracle/app/oracle/base/product/11.2.0)
	        (SID_NAME = sydb2)
		)
	)
 sydb  监听配置
 DUP =
	(DESCRIPTION =
		(ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xx.35.97)(PORT=1522))
	)
SID_LIST_DUP = 
	(SID_LIST =
		(SID_DESC =
			(GLOBAL_DBNAME = sydb_dup)   
			(ORACLE_HOME = /oracle/app/oracle/base/product/11.2.0)
	        (SID_NAME = sydb2)
		)
	)
	
#启动恢复环境和生产环境监听
lsnrctl  start  dup

tns配置,恢复环境和生产环境
sydb_tns=
  (DESCRIPTION =
		(ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xx.35.97)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
	  (SERVICE_NAME = sydb_dup)
    )
  )
  
rec_tns=
  (DESCRIPTION =
		(ADDRESS = (PROTOCOL = TCP)(HOST = xxx.xx.120.152)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
	  (SERVICE_NAME = sydb_rec)
    )
  )

(4) 恢复环境作为主库配置

##主库开启归档 和   force logging 
alter database force logging;

#添加  standby log,数量比redo多1 ,大小一致
ALTER DATABASE ADD STANDBY LOGFILE '/oracle/data/oradata/sydb/standby01.log' size 1024M;
ALTER DATABASE ADD STANDBY LOGFILE '/oracle/data/oradata/sydb/standby02.log' size 1024M;
ALTER DATABASE ADD STANDBY LOGFILE '/oracle/data/oradata/sydb/standby03.log' size 1024M;
ALTER DATABASE ADD STANDBY LOGFILE '/oracle/data/oradata/sydb/standby04.log' size 1024M;
ALTER DATABASE ADD STANDBY LOGFILE '/oracle/data/oradata/sydb/standby05.log' size 1024M;

(5)参数配置

#sydb_rec 参数配置
DB_UNIQUE_NAME=sydb_rec
LOG_ARCHIVE_CONFIG='DG_CONFIG=(sydb_rec,sydb)'
LOG_ARCHIVE_DEST_1= 'LOCATION=/oracle/data/oradata/arch1 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=sydb_rec'
LOG_ARCHIVE_DEST_2= 'SERVICE=sydb_tns ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=sydb  
FAL_SERVER=sydb
DB_FILE_NAME_CONVERT='/oracle/data/oradata/sydb','+DATADG'
LOG_FILE_NAME_CONVERT='/oracle/data/oradata/sydb','+DATADG' 
STANDBY_FILE_MANAGEMENT=AUTO


# sydb 参数配置
DB_UNIQUE_NAME=sydb
LOG_ARCHIVE_CONFIG='DG_CONFIG=(sydb_rec,sydb)'
LOG_ARCHIVE_DEST_1= 'LOCATION=+RECDG VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=sydb'
LOG_ARCHIVE_DEST_2= 'SERVICE=rec_tns ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=sydb_rec  
FAL_SERVER=sydb
DB_FILE_NAME_CONVERT='+DATADG','/oracle/data/oradata/sydb'
LOG_FILE_NAME_CONVERT='+DATADG' ,'/oracle/data/oradata/sydb'
STANDBY_FILE_MANAGEMENT=AUTO

(6) 密码文件重建

#恢复环境和生产环境都需要重建
orapwd file=orapwsydb2 password=oracle force=y

(7) sydb 拉起实例

#nomount 状态拉起实例
starup  nomount 

# 测试连接
rman target sys/oracle auxiliary sys/'oracle'@sydb_tns

2.数据复制和日志应用

(1)开始duplicate

rman target sys/xxxx@rec_tns auxiliary sys/xxxxx@sydb_tns 
run
{
allocate AUXILIARY channel c1 type disk;
allocate AUXILIARY channel c2 type disk;
allocate AUXILIARY channel c3 type disk;
allocate AUXILIARY channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
duplicate target database for standby from active database nofilenamecheck;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
} 

(2) 复制完成,应用日志

alter database open;
alter database recover managed standby database using current logfile disconnect from session;

3.ADG 主备切换

(1)主库sydb_rec 降级为备库

alter system switch logfile;
alter system archive log current;
alter database commit to switchover to physical standby with session shutdown;
#重启
shutdown abort;
startup mount;

(2)备库sydb 升级为主库

alter database commit to switchover to primary with session shutdown wait;
#开启数据库
 alter database open;

六、sydb注册资源到crs

1.添加线程

alter database add logfile  thread 1 group 10  '+DATADG' size 1024M;
alter database add logfile  thread 1 group 11  '+DATADG' size 1024M;
alter database add logfile  thread 1 group 12  '+DATADG' size 1024M;
alter database add logfile  thread 1 group 13  '+DATADG' size 1024M;
alter database enable thread 1;

2.配置实例为参数

alter system reset instance_number scope=spfile sid='*';
alter system reset thread  scope=spfile sid='*';
alter system reset undo_tablespace scope=spfile sid='*';
alter system set instance_number=1 scope=spfile sid='sydb1';
alter system set instance_number=2 scope=spfile sid='sydb2';
alter system set thread=1 scope=spfile sid='sydb1';
alter system set thread=2 scope=spfile sid='sydb2';
alter system set undo_tablespace='UNDOTBS1' scope=spfile sid='sydb1';
alter system set undo_tablespace='UNDOTBS2' scope=spfile sid='sydb2';

alter system set cluster_database=TRUE scope=spfile sid='*';
alter system set cluster_database_instances=2 scope=spfile sid='*';

3.注册db 和instance

srvctl add database -d sydb -o /oracle/app/oracle/base/product/11.2.0 -c RAC -r PRIMARY
srvctl add instance -d sydb -i sydb2 -x sybzdb2
srvctl add instance -d sydb -i sydb1 -x sybzdb1

srvctl modify database -d sydb  -p '+DATADG/sydb/parameterfile/spfile.363.1178881235'
srvctl modify database -db sydb -diskgroup DATADG,ARCDG

4.重启数据库

shutdown  immeidate;

srvctl start database -d sydb
#查看启动状态
srvctl   status  database  -d sydb

七 、参数调整

1.稳定参数调整

alter system set "_enable_shared_pool_durations" =false scope=spfile;
alter system set "_enable_space_preallocation"=0 scope=spfile;
alter system set "_gc_policy_time"=0 scope=spfile;
alter system set "_gc_undo_affinity" =FALSE scope=spfile;
alter system set "_gc_read_mostly_locking"=false scope=spfile sid='*';
alter system set "_gc_bypass_readers"=false scope=spfile sid='*';
alter system set "_memory_imm_mode_without_autosga" =false scope=spfile;
alter system set "_resource_manager_always_on" =false scope=spfile;
alter system set "_resource_manager_always_off" =true scope=spfile;
alter system set "_serial_direct_read" =false scope=spfile;
alter system set "_optimizer_use_feedback"=false  scope=spfile;
alter system set "_optim_peek_user_binds"=FALSE scope=spfile;
alter system set "_partition_large_extents"=FALSE scope=spfile;
alter system set "_undo_autotune" =FALSE scope=spfile;

alter system set aq_tm_processes=2  scope=spfile;                   
alter system set db_files=500  scope=spfile;
alter system set db_file_multiblock_read_count=128  scope=spfile;
alter system set deferred_segment_creation=false scope=spfile;              
alter system set fast_start_parallel_rollback=false scope=spfile; 
alter system set filesystemio_options='SETALL'  scope=spfile;
alter system set JOB_QUEUE_PROCESSES=200 scope=spfile;
alter system set parallel_force_local=TRUE scope=spfile;                  
alter system set parallel_max_servers=50  scope=spfile;                       
alter system set processes=3000 scope=spfile;
alter system set undo_retention = 3600 scope=spfile;
alter system set session_cached_cursors = 200 scope=spfile;
alter system set sec_case_sensitive_logon=false scope=spfile;
alter system set event='10949 trace name context forever,level 1','28401 trace name context forever,level 1' scope=spfile;

重启数据库生效

2.重置ADG参数

# 这里如果不需要ADG,必须要将dest_2 置空,否则归档会一直被占用,无法删除,导致归档空间满,影响业务
alter system set log_archive_dest_2='' sid='*'

#清空数据文件转换
alter system set db_file_name_convert='','' scope=spfile sid='*';
alter system set log_file_name_convert='','' scope=spfile sid='*';
alter system reset db_file_name_convert scope=spfile sid='*';
alter system reset log_file_name_convert scope=spfile sid='*';

八、总结

  • 对生产环境保持敬畏之心,一些高危操作一定要谨慎
  • 备份很重要!如果这次故障没有备份,只采取DUL抽取恢复,加大恢复时长,还会有额外的费用

http://www.kler.cn/a/298540.html

相关文章:

  • 【可实战】Bug的判定标准、分类、优先级、定位方法、提交Bug(包含常见面试题)
  • 测试ip端口-telnet开启与使用
  • ADO.NET知识总结3---SqlCommand命令对象
  • maven的简单介绍
  • Nature Electronics——近传感器计算:50 nm异构集成技术的革命
  • [离线数仓] 总结二、Hive数仓分层开发
  • 使用 nuxi upgrade 升级现有nuxt项目版本
  • 蓝桥杯备赛day01:循环
  • 使用Ansible进行自动化运维
  • 1-21 角点检测 opencv树莓派4B 入门系列笔记
  • 智能语音交互:人工智能如何改变我们的沟通方式?
  • 还不知道MES和PLC咋通信?5分钟看懂
  • [Redis] Redis中的String类型
  • 创新生态,共赢未来——数字媒体园区构建产业链协同新模式
  • 【前端】笔试题目整理(知识点)
  • 结构型设计模式—组合模式
  • Java学习路线图,助你成为开发高手
  • Windows子系统Ubuntu安装MySQL及windows的navicate连接
  • Midjourney提示词——黑神话悟空角色生成提示词!
  • Python框架Pandas:DataFrame的应用
  • 2024年【上海市安全员C证】考试题库及上海市安全员C证报名考试
  • 经验笔记:框架(Framework)与库(Library)
  • div内英文不换行问题以及解决方案
  • 深入解析 Docker exec 命令
  • 2-2 opencv实战进阶系列 多边形识别
  • Mac M1安装Hive