eygle.com   eygle.com
eygle.com  
 

« 珍惜当下,命运无常 - 纪念一位朋友的离开 | Blog首页 | Oracle Developer Forum -开发者大会 »

ORA-600 [2103]错误及CF enqueue竞争

昨天,客户的一套Oracle 10.2.0.3 RAC环境遇到了一个严重故障,数据库最后以ORA-600 [2103]错误中止了一个实例。

日志信息大致如下:
Tue Dec  2 17:21:06 2008
Errors in file /u01/admin/erpdb/bdump/erpdb2_lgwr_127968.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Tue Dec  2 17:21:08 2008
Trace dumping is performing id=[cdmp_20081202172108]
Tue Dec  2 17:21:11 2008
Errors in file /u01/admin/erpdb/bdump/erpdb2_lgwr_127968.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []

这里显示LGWR进程中止,出现故障,错误就是ORA-00600 [2103]号错误。
进一步的跟踪文件里的信息如下:
*** 2008-12-02 17:21:06.631
TIMEOUT ON CONTROL FILE ENQUEUE
mode=X, type=0, wait=1, eqt=900
*** 2008-12-02 17:21:06.631
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []

这个错误是说,CONTROL FILE ENQUEUE等待超时,超时时间是900秒,也就是错误信息后面的参数,900秒杀15分钟,也就是说,在数据库解决这个队列冲突之前,RAC hang住了15分钟,这15分钟是一个要命的时间。

一个内部参数可以控制这个超时时间,这个参数是:_controlfile_enqueue_timeout,其缺省值是900秒:
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2    FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.indx = y.indx
  4    AND x.ksppinm LIKE '%&par%'
  5  /
Enter value for par: controlfile_enqueue
old  4:    AND x.ksppinm LIKE '%&par%'
new  4:    AND x.ksppinm LIKE '%controlfile_enqueue%'

NAME                                    VALUE      DESCRIB
---------------------------------------- ---------- ------------------------------------------------------------
_controlfile_enqueue_timeout            900        control file enqueue timeout in seconds
_controlfile_enqueue_holding_time        120        control file enqueue max holding time in seconds
_controlfile_enqueue_dump                FALSE      dump the system states after controlfile enqueue timeout
_kill_controlfile_enqueue_blocker        TRUE      enable killing controlfile enqueue blocker on timeout

但是调整这个参数需要验证和慎重。

ORA-600 [2103]相关的Bug有很多,只能提醒大家的是,遇到这个错误就要注意了!


历史上的今天...
      >> 2007-12-03文章:
             RAC + ASM单节点添加表空间的BUG
      >> 2005-12-03文章:
             2005年的第一场雪
------
这篇 【ORA-600 [2103]错误及CF enqueue竞争】来自 eygle.com | CSDN网摘| del.icio.us|Google订阅 | 鲜果订阅 | 抓虾订阅

By eygle on 2008-12-03 14:23 | Comments (15) | Posted to Case | Edit |Pageviews:

相关文章 随机文章
使用DATAPUMP导致ORA-00600 17020错误
IBM AIX Read-only file system案例一则
Logical Standby ORA-01425错误处理一则
ORA-07445 数据库也会旧病复发
ORA-00600 4000 及 4194 错误小记
雅虎中国Oracle DBA职位虚席以待
《Oracle数据库性能优化》书稿已经交付出版社
权从何处侵起 版自何处盗来
在Solaris下安装VNC 远程安装Oracle
在上海,在Radisson,在Oracle的路上
搜索本站:

留言 (15)

这个错误是如何产生,如何解决的呢?

Posted by: wangliang@163.com at December 3, 2008 3:59 PM

yes,怎么解决的这个问题?

Posted by: lypflash at December 3, 2008 4:07 PM

为什么出现超时??

Posted by: 棉花糖ONE at December 3, 2008 5:21 PM

此时有做备份吗?

Posted by: 小荷 at December 3, 2008 8:54 PM

这个bug不是说在10.2.0.2时解决掉了么?

Posted by: Hades_2008 at December 4, 2008 9:52 AM

在本子上做个记录

Posted by: dbafans at December 4, 2008 9:26 PM

这个故障我还没有完全研究完,回头再和大家分享。

Posted by: eygle at December 4, 2008 9:43 PM

系统在做rman备份?还是日志切换太频繁?

Posted by: battleman at December 4, 2008 10:55 PM

这么巧,我今天也碰到一个,哈哈

Posted by: lala at December 5, 2008 11:19 AM

lala,描述一下你的情况,我对这个案例比较感兴趣!

Posted by: eygle at December 5, 2008 1:39 PM

等待解决办法

Posted by: wffffc at December 7, 2008 11:48 PM

db每个早上8点作rman backup,那天早上到公司查邮件收到告警,发出时间是8:23

RMAN-06900: WARNING: unable to generate V$RMAN_STATUS or V$RMAN_OUTPUT row
RMAN-06901: WARNING: disabling update of the V$RMAN_STATUS and V$RMAN_OUTPUT rows
ORACLE error from target database:
ORA-03113: end-of-file on communication channel

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-03114: not connected to ORACLE

查看日志,在8点15,8点19,8点23生成了2个archiver的trace
Fri Dec 5 08:15:47 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc1_25055.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by killing session 27.38552
Fri Dec 5 08:19:56 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc0_25053.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by killing session 27.38552
Fri Dec 5 08:23:21 2008
GES: Potential blocker (pid=29589) on resource CF-00000000-00000000;
enqueue info in file /u01/oracle/admin/lala/bdump/lala_arc1_25055.trc
Killing enqueue blocker (pid=29589) on resource CF-00000000-00000000
by terminating the process

后来我又试了一下rman target /
就出现了ora-600 2103
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

9点21的时候有一个log switch, 接着arc1就被发现死了
Fri Dec 5 09:21:49 2008
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
Fri Dec 5 09:24:00 2008
Errors in file /u01/oracle/admin/lala/bdump/lala_ckpt_24869.trc:
ORA-00445: background process "ARC1" did not start after 120 seconds

看第一个arc1的trace,原来早上4点就有记录了
*** 2008-10-30 04:00:39.752
*** SERVICE NAME:(SYS$BACKGROUND) 2008-10-30 04:00:39.746
*** SESSION ID:(35.1) 2008-10-30 04:00:39.746
Control file resized from 866 to 868 blocks
kccrsd_append: rectype = 11, lbn = 433, recs = 28
*** 2008-12-05 08:15:47.486
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=29589 sid=27 sser=38552 time_held=942 secs (ges mode req=6 held=4)
Killing blocker (pid=29589) on resource CF-00000000-00000000
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
possible owner[16.29589]
Dumping process info of pid[16.29589] requested by pid[29.25055]
Dumping process 16.29589 info:
*** 2008-12-05 08:15:47.494
Dumping diagnostic information for ospid 29589:
OS pid = 29589
loadavg : 1.62 1.69 1.94

看日志从前天晚上10点到当天早上6点没有记录,6点12到7点52有很频繁的log switch,最频繁的间隔1分钟,最久的间隔40分钟

Posted by: lala at December 9, 2008 5:14 PM

遭遇过相同情况,这个问题关注,到现在遇到的唯一没搞明白的问题,当然那是因为遇到的问题太少了
O(∩_∩)O哈哈~

Posted by: 弦乐之花 at December 10, 2008 10:38 AM

应该是切换太快了,rac无法支撑

Posted by: mustapha at February 3, 2009 11:06 AM

造成队列等待的问题原因是bug还是什么?

Posted by: sterlingwu at June 30, 2009 9:26 PM

发表留言:



Remember Me?
(输入验证码后方可评论,谢谢支持)



CopyRight © 2004~2010 eygle.com, All rights reserved.