Analysis of an Oracle 19.8 Instance Crash Caused by ORA-600 [kqrHashTableRemove: X lock]
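A customer reported that a single-instance database had suddenly crashed in the middle of the night, and this is a fairly important system. After being notified, I went through the relevant logs and found an obvious error in the alert log: ORA-600 [kqrHashTableRemove: X lock].
I roughly divide ORA-600 errors into two kinds: those that do not bring the instance down and those that do. This one was clearly the unlucky kind.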
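The two-way split above can be sketched as a small alert log scan. This is only an illustration of the idea, not a tool used in this case: the function name classify_ora600 and the rule "an ORA-600 counts as fatal when an 'Instance terminated by' line follows it before the next 'Starting ORACLE instance' line" are my own assumptions, and the sample text simply reuses the alert log excerpt quoted in the MOS note.

```python
import re

# Timestamp lines in a 19c alert log look like 2019-10-02T15:11:27.979548+08:00.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$")
# Capture the first argument of an ORA-600 / ORA-00600 message.
ORA600 = re.compile(r"ORA-0*600: internal error code, arguments: \[([^\]]*)\]")
TERMINATED = re.compile(r"^Instance terminated by ")


def classify_ora600(alert_text):
    """Return (timestamp, first_argument, crashed) for every ORA-600 found.

    Assumption (heuristic, not an Oracle rule): crashed is True when an
    "Instance terminated by" line follows the error before the next startup.
    """
    events = []       # every ORA-600 seen, as [timestamp, argument, crashed]
    open_events = []  # ORA-600s seen since the last instance startup
    last_ts = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        if TS.match(line):
            last_ts = line
            continue
        if line.startswith("Starting ORACLE instance"):
            open_events = []  # a new startup closes the previous window
            continue
        found = ORA600.search(line)
        if found:
            event = [last_ts, found.group(1), False]
            events.append(event)
            open_events.append(event)
            continue
        if TERMINATED.match(line):
            for event in open_events:
                event[2] = True
            open_events = []
    return [tuple(event) for event in events]


# Sample text copied from the alert log excerpt in Doc ID 2656030.1.
SAMPLE = """\
2019-10-02T15:11:27.979548+08:00
ORA-00600: internal error code, arguments: [kqrHashTableRemove: X lock], [0x1DEDE1F28], [], [], [], [],
[], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
2019-10-02T15:11:45.764798+08:00
System state dump requested by (instance=1, osid=##### (DBRM)),
summary=[abnormal instance termination].
2019-10-02T15:11:49.138344+08:00
Instance terminated by USER, pid = #####
"""

if __name__ == "__main__":
    for timestamp, argument, crashed in classify_ora600(SAMPLE):
        print(timestamp, argument, "FATAL" if crashed else "non-fatal")
```

On a real alert log the text would be read from the file under the ADR trace directory. The heuristic only says that a termination followed the error, not that the error caused it, so the trace files still need to be checked.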
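Searching MOS (My Oracle Support) for this ORA-600 error quickly turned up a matching bug, covered by two notes:
Bug 30417732 - Instance Crash After Hitting ORA-00600 [kqrHashTableRemove: X lock] (Doc ID 30417732.8)
Instance Crashed After ORA-00600 [kqrHashTableRemove: X lock] Error (Doc ID 2656030.1)
The bug description and solution read as follows: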
APPLIES TO:
Oracle Database - Enterprise Edition - Version 19.3.0.0.0 to 19.8.0.0.0 [Release 19]
Information in this document applies to any platform.
SYMPTOMS
• Instance was terminated after error ORA-600 [kqrHashTableRemove: X lock].
2019-10-02T15:11:27.979548+08:00
ORA-00600: internal error code, arguments: [kqrHashTableRemove: X lock], [0x1DEDE1F28], [], [], [], [],
[], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
... ...
2019-10-02T15:11:45.764798+08:00
System state dump requested by (instance=1, osid=##### (DBRM)),
summary=[abnormal instance termination].
System State dumped to trace file
<system state dump>.trc
2019-10-02T15:11:49.138344+08:00
Instance terminated by USER, pid = #####
• Stack Trace shows as following:
kqrHashTableRemove <- kqrfrpo <- kghfreup <- kgh_free_obj <- kgh_free_single_object
<- kgh_free_old <- ksm_free_old <- ksm_spmemrm_bg <- kskprememrmact <- kskparamread
<- kskdbrmtoutact <- ksb_act_run_int <- ksb_act_run <- ksbcti <- ksbabs <- ksbrdp <- opirip
<- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main
CHANGES
CAUSE
This problem is caused by unpublished Bug 30417732, that causes internal lock structure corrupted and may cause the instance
being crashed when a fatal background process do cleanup process.
SOLUTION
Apply the patch of Bug 30417732 or DBRU 19.9 which includes the fix of Bug 30417732.
REFERENCES
NOTE:30417732.8 - Bug 30417732 - Instance Crash After Hitting ORA-00600 [kqrHashTableRemove: X lock]
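So which release fixes this bug? The answer is 19.9, just one release update above the 19.8 currently installed.
That same evening we requested a downtime window and applied the 19.25 release update. The database has since run normally for nearly three months and is stable.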
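To make the version reasoning explicit, here is a minimal sketch that checks whether a given release falls inside the affected range. The range (19.3 to 19.8) and the fixed release (19.9) come from the MOS note. The function names parse_ru and is_exposed, and the one_off_patch_applied flag, are hypothetical names of my own. The version string could be, for example, the VERSION_FULL value from V$INSTANCE.

```python
# Range taken from the note: "Version 19.3.0.0.0 to 19.8.0.0.0".
AFFECTED_FIRST = (19, 3)
AFFECTED_LAST = (19, 8)
# "DBRU 19.9 which includes the fix of Bug 30417732".
FIXED_IN = (19, 9)


def parse_ru(version):
    """Turn '19.8.0.0.0' into (19, 8); only the first two fields are used."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def is_exposed(version, one_off_patch_applied=False):
    """True when the release is in the affected range and unpatched.

    one_off_patch_applied is a hypothetical flag for the case where the
    one-off patch for Bug 30417732 was installed instead of a newer RU.
    """
    if one_off_patch_applied:
        return False
    return AFFECTED_FIRST <= parse_ru(version) <= AFFECTED_LAST


if __name__ == "__main__":
    for release in ("19.8.0.0.0", "19.9.0.0.0", "19.25.0.0.0"):
        print(release, "exposed" if is_exposed(release) else "not exposed")
```

The comparison is done on integer tuples on purpose: compared as plain strings, "19.25" would sort before "19.8" and the patched release would wrongly look affected.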
Description
In rare concurrent scenario, the PO object lock can be corrupted since the
lock structure is being modified after dropping PO mutex and the error
ORA-600[KQRHASHTABLEREMOVE: X LOCK] might be seen. This was leading to
the instance being crashed as a fatal background process was raising an assert
during the cleanup process.
Call stack for ORA-00600 will likely contain:
... kqrHashTableRemove kqrfrpo kghfreup kgh_free_obj ...
REDISCOVERY INFORMATION:
If there are ORA-600 errors for "kqrHashTableRemove: X lock" with the incident
trace indicating a corrupted PO object.
Workaround
NONE
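The rediscovery hint above can be turned into a quick check of a call stack against the bug signature. This is a rough sketch: the names parse_stack and matches_signature are hypothetical, and requiring the four functions to appear as consecutive frames is my own assumption (the note only says the stack "will likely contain" them). The sample stack is the one quoted in Doc ID 2656030.1.

```python
# Functions named in the bug note's "Call stack ... will likely contain" line.
SIGNATURE = ("kqrHashTableRemove", "kqrfrpo", "kghfreup", "kgh_free_obj")


def parse_stack(text):
    """Split a '<-' separated call stack, possibly wrapped over several lines."""
    return text.replace("<-", " ").split()


def matches_signature(frames, signature=SIGNATURE):
    """True when the signature functions appear as consecutive frames, in order."""
    wanted = list(signature)
    size = len(wanted)
    return any(frames[i:i + size] == wanted for i in range(len(frames) - size + 1))


# Stack trace copied from the SYMPTOMS section of Doc ID 2656030.1.
STACK = """\
kqrHashTableRemove <- kqrfrpo <- kghfreup <- kgh_free_obj <- kgh_free_single_object
<- kgh_free_old <- ksm_free_old <- ksm_spmemrm_bg <- kskprememrmact <- kskparamread
<- kskdbrmtoutact <- ksb_act_run_int <- ksb_act_run <- ksbcti <- ksbabs <-ksbrdp <- opirip
<- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main
"""

if __name__ == "__main__":
    frames = parse_stack(STACK)
    print(len(frames), "frames, signature match:", matches_signature(frames))
```

A match only suggests the incident is worth comparing against Bug 30417732. The note also asks for the incident trace to indicate a corrupted PO object, which this check does not look at.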