Oracle Data Recovery: ORA-00600 [6749] and ORA-8102

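Recently, after helping a customer with a data recovery, I ran into an ORA-00600 [6749] error. This error has several possible causes. On this customer's system it was not caused by the recovery at all; the problem had simply gone unnoticed until then.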

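This problem was discussed on ITPUB many years ago. Oracle has a bug in which the next-row-piece ROWID of a row points back to the row itself, so reading that row can enter an infinite loop and never move past it.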

On MOS this is recorded as Bug 7705591:

Bug 7705591 Corruption with self-referenced row in MSSM tablespace. Wrong Results / OERI[6749] / ORA-8102

A chained row (logical row continued in another row) in a table
can be corrupted where the next row piece (nrid) points to itself.

Data corruption resulting from a lost row piece can occur very
intermittently in blocks experiencing high concurrency in MSSM
tablespaces (dba_tablespaces.segment_space_management=MANUAL).

It is most likely to happen but not limited to tables with a large
number of columns (e.g. more than 255 columns).

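This bug can cause a DELETE to fail with ORA-600 [6749], and can raise ORA-8102 when indexes are involved. Because the ROWID is wrong, ORA-600 [qertbFetchByRowID] may also be encountered: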

Subsequent SQL operations may produce wrong results or different
errors like:

ORA-600 [6749] by DELETE
ORA-8102 by UPDATE (if the table has indexes)
ORA-600 [qertbFetchByRowID]
ORA-1499 by "analyze table <name> validate structure cascade"
        (logical corruption between index and table
         as the table is returning wrong values for the affected
         row)

This is a logical corruption, and Oracle tools such as DBV (DBVERIFY) do not detect it:

Without the fix of Bug 8720802 tools like DBVERIFY / RMAN / ANALYZE
don't detect this logical corruption.

Corruption Example from a block dump:

Row 3 in Block rdba 0xad858746 : << the address of the block

tab 0, row 3, @0x6e9
tl: 127 fb: --H-F--- lb: 0x0 cc: 33
nrid: 0xad858746.3 --> it points to the same row 3
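<< NRID here means Next ROWID, the address of the next row piece of a chained row. In this dump the NRID points to the row itself, so a read loops and cannot get out.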

The fix for this bug does not repair existent corruptions.
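
The self-reference in the dump excerpt above can also be spotted mechanically. The following is only a minimal sketch, not an Oracle tool: `find_self_referenced_rows` is a hypothetical helper, the caller supplies the block's rdba, the regular expressions assume the row header and `nrid:` lines look exactly like the excerpt, and the slot number after the dot is assumed to be printed in hexadecimal.

```python
import re

# Patterns modeled on the dump excerpt; real dumps may vary in spacing.
ROW_RE = re.compile(r"tab\s+(\d+),\s*row\s+(\d+),")
NRID_RE = re.compile(r"nrid:\s*0x([0-9a-fA-F]+)\.([0-9a-fA-F]+)")


def find_self_referenced_rows(block_rdba, dump_text):
    """Return the row numbers whose nrid points back at the same block and row."""
    hits = []
    current_row = None
    for line in dump_text.splitlines():
        row_match = ROW_RE.search(line)
        if row_match:
            # Remember which row the following nrid line belongs to.
            current_row = int(row_match.group(2))
            continue
        nrid_match = NRID_RE.search(line)
        if nrid_match and current_row is not None:
            next_rdba = int(nrid_match.group(1), 16)
            next_slot = int(nrid_match.group(2), 16)  # assumption: hex slot number
            if next_rdba == block_rdba and next_slot == current_row:
                hits.append(current_row)
    return hits


SAMPLE = """\
tab 0, row 3, @0x6e9
tl: 127 fb: --H-F--- lb: 0x0 cc: 33
nrid: 0xad858746.3
"""

print(find_self_referenced_rows(0xAD858746, SAMPLE))  # [3]
```

A healthy chained row would carry an nrid that names a different block or a different slot, and would not be reported.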

This bug affects Oracle 10.2.0.4. It is fixed in 10.2.0.5 and also in 11gR2.

I also came across the following case online:
In the morning, a developer reported that the statement below failed with an ORA-600 error:

update db_testresultinfo_old set f_depreagentoutlay = 0, f_depamtreagentoutlay = 0
where f_inputdate2 between to_date('2010-03-01 00:00:01', 'yyyy-mm-dd hh24:mi:ss') and
to_date('2010-03-31 23:59:59', 'yyyy-mm-dd hh24:mi:ss')

ORA-00600: internal error code, arguments: [6749], [3], [18051454], [18], [], [], [], []

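The table holds more than 6 million rows, and the update actually touches a little over 1 million of them.
The suggested fix was to log in as a DBA and execute: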

execute dbms_stats.delete_schema_stats('ZJLM_USER');

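ZJLM_USER is the Oracle user that owns the table raising the error.
I did not dare delete the statistics for the whole schema. Instead I re-gathered statistics for that one table, and the update then succeeded. These were the steps: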

SQL> exec dbms_stats.gather_table_stats(ownname => 'KMS',tabname=>'DB_TESTRESULTINFO_OLD');

PL/SQL procedure successfully completed

SQL> update db_testresultinfo_old
                 set   f_depreagentoutlay = 0,
                          f_depamtreagentoutlay = 0
                 where f_inputdate2
                     between to_date('2010-03-01 00:00:01', 'yyyy-mm-dd hh24:mi:ss')
                     and   to_date('2010-03-01 13:59:59', 'yyyy-mm-dd hh24:mi:ss')

2287 rows updated

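This case was related to statistics. Note that the argument 18051454 of the 6749 error is the DBA (data block address) of a data block; converting this number should lead to the corresponding block. It is an interesting case, quoted here for reference.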
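
As a rough illustration of that conversion, the sketch below splits a DBA into a relative file number and a block number. It assumes the usual smallfile-tablespace layout, in which the top 10 bits of the 32-bit address hold the relative file number and the low 22 bits hold the block number; bigfile tablespaces do not follow this split. `decode_dba` is a hypothetical helper name. Inside the database, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK do the same job.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (relative file#, block#).

    Assumption: smallfile tablespace, 10 bits of file number followed by
    22 bits of block number.
    """
    if not 0 <= dba <= 0xFFFFFFFF:
        raise ValueError("DBA must fit in 32 bits")
    return dba >> 22, dba & 0x3FFFFF


# Third argument of the ORA-00600 [6749] error from the case above.
print(decode_dba(18051454))    # (4, 1274238)

# rdba from the block dump in the MOS note.
print(decode_dba(0xAD858746))  # (694, 362310)
```

With the file and block numbers in hand, the owning segment can then be looked up in DBA_EXTENTS.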

By eygle on 2012-03-12 08:28 | Comments (0) | Backup&Recovery | 2967 |