eygle.com

An Oracle ERP Database Recovery Case: Backups Matter More Than Anything

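Recently I helped a customer with an Oracle ERP database recovery. The recovery itself was fairly simple: the customer had a complete full backup plus the archived logs, so a safe recovery could be performed. The one problem that came up at the end was that an archived log in the middle of the sequence was corrupted and Oracle could not recognize it. The customer agreed to give up a few hours of data, and with that the recovery was completed successfully.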
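Because redo has to be applied strictly in sequence, a single unreadable archived log ends the recovery at that point, and every log after it becomes useless as well. Below is a minimal Python sketch of picking the stopping point, assuming the set of missing or corrupt sequences is already known. The function names and the sequence numbers are hypothetical; only the RMAN SET UNTIL SEQUENCE ... THREAD ... clause it prints is real syntax, and RMAN recovers up to, but not including, the named sequence.

```python
def recovery_stop_sequence(first_needed, last_available, unusable):
    """Return the first archived log sequence that cannot be applied.

    Redo must be applied in order, so recovery cannot jump over a missing or
    corrupt log. Returns None when every sequence in the range is usable.
    """
    for seq in range(first_needed, last_available + 1):
        if seq in unusable:
            return seq
    return None


def rman_until_clause(first_needed, last_available, unusable, thread=1):
    """Build an RMAN SET UNTIL SEQUENCE clause, or None if no cut-off is needed."""
    stop = recovery_stop_sequence(first_needed, last_available, unusable)
    if stop is None:
        return None
    # UNTIL SEQUENCE n recovers up to, but not including, sequence n.
    return f"SET UNTIL SEQUENCE {stop} THREAD {thread};"


if __name__ == "__main__":
    # Hypothetical numbers: the backup needs logs 100..120 and log 113 is corrupt.
    print(rman_until_clause(100, 120, {113}))
```

In this hypothetical, logs 113 through 120 are given up, which corresponds to the hours of data the customer chose to lose; an incomplete recovery like this ends with ALTER DATABASE OPEN RESETLOGS.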

This kind of archived log corruption is the one I described earlier:
http://www.eygle.com/archives/2010/11/recover_archivelog_corruption.html

The only open question in this case is: why did the database run into trouble in the first place?

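According to the alert log, the database was running completely normally when ORA-600 [2662] errors suddenly appeared. All of the error messages are shown below. After the error had been raised several times, the database crashed, and it could not be started successfully afterwards: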
Wed May 10 17:12:45 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408990], [1388], [4005425099], [1484804288], [], []
Wed May 10 17:12:47 2010
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed May 10 17:12:47 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005408993], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed May 10 17:12:50 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409007], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed May 10 17:12:52 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409009], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed May 10 17:12:54 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409012], [1388], [4005425099], [1484804288], [], []
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed May 10 17:13:06 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409030], [1388], [4005425099], [1484804288], [], []
Wed May 10 17:13:07 2010
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed May 10 17:13:17 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409048], [1388], [4005425099], [1484804288], [], []
Wed May 10 17:13:18 2010
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed May 10 17:13:28 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409102], [1388], [4005425099], [1484804288], [], []
Wed May 10 17:13:29 2010
Non-fatal internal error happenned while SMON was doing extent coalescing.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed May 10 17:13:39 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_smon_14821.trc:
ORA-00600: internal error code, arguments: [2662], [1388], [4005409119], [1388], [4005425099], [1484804288], [], []
Wed May 10 17:13:41 2010
Errors in file /ORACLE/ERP/LOG/ADMIN/bdump/erp_pmon_14807.trc:
ORA-00474: SMON process terminated with error
Wed May 10 17:13:41 2010
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 14807
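The SCN arguments in these messages can be compared directly. The sketch below assumes the argument order commonly given in Oracle Support material for ORA-600 [2662] (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then a block address); that interpretation is not stated in this post, so treat it as an assumption. It combines each wrap/base pair into one number and prints how far the dependent SCN is ahead of the current SCN for each of the nine errors.

```python
import re

# The nine ORA-00600 [2662] lines from the alert log above (unchanged).
ALERT_LOG = """\
ORA-00600: internal error code, arguments: [2662], [1388], [4005408990], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005408993], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409007], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409009], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409012], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409030], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409048], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409102], [1388], [4005425099], [1484804288], [], []
ORA-00600: internal error code, arguments: [2662], [1388], [4005409119], [1388], [4005425099], [1484804288], [], []
"""

# Assumed argument order: current SCN wrap, current SCN base,
# dependent SCN wrap, dependent SCN base (the block address is ignored here).
ARGS_2662 = re.compile(
    r"arguments: \[2662\], \[(\d+)\], \[(\d+)\], \[(\d+)\], \[(\d+)\]"
)


def scn(wrap: int, base: int) -> int:
    """Combine an SCN wrap and a 32-bit SCN base into one integer."""
    return (wrap << 32) | base


def scn_gaps(log_text: str) -> list[int]:
    """For each 2662 error, how far the dependent SCN is ahead of the current SCN."""
    gaps = []
    for m in ARGS_2662.finditer(log_text):
        cur_wrap, cur_base, dep_wrap, dep_base = map(int, m.groups())
        gaps.append(scn(dep_wrap, dep_base) - scn(cur_wrap, cur_base))
    return gaps


if __name__ == "__main__":
    for gap in scn_gaps(ALERT_LOG):
        print(gap)
```

Across the nine errors the gap shrinks from 16109 to 15980: the instance's current SCN keeps creeping forward, yet something SMON reads still carries an SCN roughly 16,000 ahead of it, which fits the usual reading of 2662 as "dependent SCN ahead of current SCN".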
Checking the trace file (erp_smon_14821.trc):
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2662], [1388], [4005408990], [1388], [4005425099], [1484804288], [], []
Current SQL statement for this session:
update sys.mon_mods$ set inserts = inserts + :ins, updates = updates + :upd, deletes = deletes + :del, flags = (decode(bitand(flags, :flag), :flag, flags, flags + :flag)), drop_segments = drop_segments + :dropseg, timestamp = :time where obj# = :objn
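If the fifth 2662 argument (1484804288) is, as Oracle Support material commonly describes it, the data block address (DBA) the dependent SCN came from, it can be split into a file number and a block number. A small Python sketch, assuming the usual smallfile layout of 10 bits of relative file number and 22 bits of block number; the helper names are hypothetical. The layout can be cross-checked against the ORA-600 [3020] error later in this post, whose arguments are file 1, block 117393 and 4311697, and (1 << 22) + 117393 = 4311697.

```python
# Assumption: classic (smallfile) data block address layout,
# upper 10 bits = relative file number, lower 22 bits = block number.
BLOCK_BITS = 22
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def decode_dba(dba: int) -> tuple[int, int]:
    """Split a data block address into (relative file#, block#)."""
    return dba >> BLOCK_BITS, dba & BLOCK_MASK


def encode_dba(file_no: int, block_no: int) -> int:
    """Build a data block address from (relative file#, block#)."""
    return (file_no << BLOCK_BITS) | block_no


if __name__ == "__main__":
    # Sanity check against the ORA-600 [3020] arguments [1], [117393], [4311697].
    print(encode_dba(1, 117393))
    # Fifth argument of the 2662 error in the trace above.
    print(decode_dba(1484804288))
```

Under that assumption the 2662 error points at relative file 354, block 20672.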

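The inconsistency surfaced while Oracle was maintaining the sys.mon_mods$ table, and that is what raised the 2662 error.
The 2662 error in itself is still fairly easy to handle, but the deeper cause behind it has to be investigated. In this case I suspect the fault was caused by lost writes or abnormal writes at the OS level.
The user then made a series of recovery attempts on their own, which left the current database unusable.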

Finally, during instance recovery, a block in the SYSTEM tablespace turned out to be inconsistent with the redo:
Thu Nov 11 15:39:24 2010
Errors in file /bdump/pcerp_p005_27334.trc:
ORA-00600: internal error code, arguments: [3020], [1], [117393], [4311697], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 117393)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '/data/system01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 368
Thu Nov 11 15:39:24 2010
Errors in file /bdump/pcerp_p000_27324.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [123902], [6101], [], [], [], []
Thu Nov 11 15:39:25 2010
Errors in file /bdump/pcerp_p005_27334.trc:
ORA-00600: internal error code, arguments: [3020], [1], [117393], [4311697], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 117393)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '/data/system01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 368
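In the most common cases, only lost writes or abnormal writes can cause this kind of inconsistency between redo and a data block.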
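Why would a lost write surface as ORA-600 [3020]? The following is a toy Python model built on simplifying assumptions of my own, not on Oracle internals: each block carries a version number, each redo record names the block version it must be applied on top of, and recovery starts from the last checkpoint. If the OS reported a block write as complete (so the checkpoint moved past it) but the write never reached disk, recovery finds an older block than the redo expects. Oracle's actual consistency check is more involved; all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    expected_version: int  # block version this change must be applied on top of
    new_version: int       # block version after the change


class RedoInconsistent(Exception):
    """Toy stand-in for ORA-600 [3020] / ORA-10567."""


def recover_block(disk_version: int, redo: list[RedoRecord]) -> int:
    """Roll one block forward from its on-disk version through the redo stream."""
    version = disk_version
    for rec in redo:
        if rec.new_version <= version:
            continue  # change is already contained in the on-disk block
        if rec.expected_version != version:
            raise RedoInconsistent(
                f"block is at v{version}, redo expects v{rec.expected_version}"
            )
        version = rec.new_version
    return version


# Changes v2->v3 and v3->v4 were made after the checkpoint; the checkpoint
# itself assumed that the write of v2 had reached disk.
REDO_AFTER_CHECKPOINT = [RedoRecord(2, 3), RedoRecord(3, 4)]

if __name__ == "__main__":
    print(recover_block(2, REDO_AFTER_CHECKPOINT))  # healthy write
    try:
        recover_block(1, REDO_AFTER_CHECKPOINT)  # write of v2 was lost
    except RedoInconsistent as err:
        print("recovery stops:", err)
```

With a healthy disk the block rolls forward to version 4; with the lost write, the very first redo record already disagrees with the block, which is the same shape of failure as "Redo is inconsistent with data block" above.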

I am recording these error messages here for reference.





By eygle on 2010-11-18 12:31 | Case


Copyright © 2004~2020 云和恩墨 (Enmotech). All rights reserved.