« 珍惜当下,命运无常 - 纪念一位朋友的离开 | Blog首页 | Oracle Developer Forum -开发者大会 »
ORA-600 [2103]错误及CF enqueue竞争
作者:eygle | 【转载请注出处】|【云和恩墨 领先的zData数据库一体机 | zCloud PaaS云管平台 | SQM SQL审核平台 | ZDBM 数据库备份一体机】
链接:https://www.eygle.com/archives/2008/12/ora_600_cfenqueue.html
昨天,客户的一套Oracle 10.2.0.3 RAC环境遇到了一个严重故障,数据库最后以ORA-600 [2103]错误中止了一个实例。链接:https://www.eygle.com/archives/2008/12/ora_600_cfenqueue.html
日志信息大致如下:
Tue Dec 2 17:21:06 2008
Errors in file /u01/admin/erpdb/bdump/erpdb2_lgwr_127968.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Tue Dec 2 17:21:08 2008
Trace dumping is performing id=[cdmp_20081202172108]
Tue Dec 2 17:21:11 2008
Errors in file /u01/admin/erpdb/bdump/erpdb2_lgwr_127968.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
这里显示LGWR进程中止,出现故障,错误就是ORA-00600 [2103]号错误。
进一步的跟踪文件里的信息如下:
*** 2008-12-02 17:21:06.631
TIMEOUT ON CONTROL FILE ENQUEUE
mode=X, type=0, wait=1, eqt=900
*** 2008-12-02 17:21:06.631
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
这个错误是说,CONTROL FILE ENQUEUE等待超时,超时时间是900秒,也就是错误信息后面的参数,900秒杀15分钟,也就是说,在数据库解决这个队列冲突之前,RAC hang住了15分钟,这15分钟是一个要命的时间。
一个内部参数可以控制这个超时时间,这个参数是:_controlfile_enqueue_timeout,其缺省值是900秒:
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.indx = y.indx
4 AND x.ksppinm LIKE '%&par%'
5 /
Enter value for par: controlfile_enqueue
old 4: AND x.ksppinm LIKE '%&par%'
new 4: AND x.ksppinm LIKE '%controlfile_enqueue%'
NAME VALUE DESCRIB
---------------------------------------- ---------- ------------------------------------------------------------
_controlfile_enqueue_timeout 900 control file enqueue timeout in seconds
_controlfile_enqueue_holding_time 120 control file enqueue max holding time in seconds
_controlfile_enqueue_dump FALSE dump the system states after controlfile enqueue timeout
_kill_controlfile_enqueue_blocker TRUE enable killing controlfile enqueue blocker on timeout
但是调整这个参数需要验证和慎重。
ORA-600 [2103]相关的Bug有很多,只能提醒大家的是,遇到这个错误就要注意了!
历史上的今天...
>> 2019-12-03文章:
>> 2010-12-03文章:
>> 2007-12-03文章:
>> 2005-12-03文章:
By eygle on 2008-12-03 14:23 | Comments (16) | Case | 2101 |
这个错误是如何产生,如何解决的呢?
yes,怎么解决的这个问题?
为什么出现超时??
此时有做备份吗?
这个bug不是说在10.2.0.2时解决掉了么?
在本子上做个记录
这个故障我还没有完全研究完,回头再和大家分享。
系统在做rman备份?还是日志切换太频繁?
这么巧,我今天也碰到一个,哈哈
lala,描述一下你的情况,我对这个案例比较感兴趣!
等待解决办法
db每个早上8点作rman backup,那天早上到公司查邮件收到告警,发出时间是8:23
RMAN-06900: WARNING: unable to generate V$RMAN_STATUS or V$RMAN_OUTPUT row
RMAN-06901: WARNING: disabling update of the V$RMAN_STATUS and V$RMAN_OUTPUT rows
ORACLE error from target database:
ORA-03113: end-of-file on communication channel
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-03114: not connected to ORACLE
查看日志,在8点15,8点19,8点23生成了2个archiver的trace
Fri Dec 5 08:15:47 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc1_25055.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by killing session 27.38552
Fri Dec 5 08:19:56 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc0_25053.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by killing session 27.38552
Fri Dec 5 08:23:21 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc1_25055.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by terminating the process
后来我又试了一下rman target /
就出现了ora-600 2103
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
9点21的时候有一个log switch, 接着arc1就被发现死了
Fri Dec 5 09:21:49 2008
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
Fri Dec 5 09:24:00 2008
Errors in file /u01/oracle/admin/lala/bdump/lala_ckpt_24869.trc:
ORA-00445: background process "ARC1" did not start after 120 seconds
看第一个arc1的trace,原来早上4点就有记录了
*** 2008-10-30 04:00:39.752
*** SERVICE NAME:(SYS$BACKGROUND) 2008-10-30 04:00:39.746
*** SESSION ID:(35.1) 2008-10-30 04:00:39.746
Control file resized from 866 to 868 blocks
kccrsd_append: rectype = 11, lbn = 433, recs = 28
*** 2008-12-05 08:15:47.486
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=29589 sid=27 sser=38552 time_held=942 secs (ges mode req=6 held=4)
Killing blocker (pid=29589) on resource CF-00000000-00000000
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
possible owner[16.29589]
Dumping process info of pid[16.29589] requested by pid[29.25055]
Dumping process 16.29589 info:
*** 2008-12-05 08:15:47.494
Dumping diagnostic information for ospid 29589:
OS pid = 29589
loadavg : 1.62 1.69 1.94
看日志从前天晚上10点到当天早上6点没有记录,6点12到7点52有很频繁的log switch,最频繁的间隔1分钟,最久的间隔40分钟
遭遇过相同情况,这个问题关注,到现在遇到的唯一没搞明白的问题,当然那是因为遇到的问题太少了
O(∩_∩)O哈哈~
应该是切换太快了,rac无法支撑
造成队列等待的问题原因是bug还是什么?
show parameter control_file_record_keep_time