eygle.com   eygle.com
eygle.com eygle
eygle.com  
 

« 明日将参加Oracle公司组织的DBA Round Table活动 | Blog首页 | Oracle 11gR2发布日期确定 - 2009-09-29 »

恩墨科技为济南某大学提供数据恢复服务
modb.pro

末折腾了2天,完成了济南某大学的数据库恢复,这是一次非常紧急的故障恢复工作,简要记录一下故障过程供警示。

故障原因:
据说由于电力不稳定,导致HP IA64位的服务器断电,后来维护厂商在不明缘由下,多次反复启停主机。接下来就发现数据库丢失了2个重要的数据文件。
首先由硬件恢复厂商提供了支持,从磁盘上找回了丢失的文件,然后发现数据库并不能启动,文件不一致,坏块等等对数据库产生了严重的伤害。

案例警示:
客户数据库运行在非归档模式下,只有几个月之前的一次逻辑备份,已经不足以来提供恢复了。
有效的备份,谨慎的维护对数据库来说是稳健之本!

案例难点:
1.系统表空间存在大量坏块,导致数据库无法启动:
#[/oracle]dbv file=system01.dbf blocksize=8192

DBVERIFY: Release 10.2.0.1.0 - Production on Sat Aug 22 20:37:50 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/oradata/jwgl/system01.dbf
Page 369 is influx - most likely media corrupt
Corrupt block relative dba: 0x00400171 (file 1, block 369)
Fractured block found during dbv:
Data in bad block:
 type: 16 format: 2 rdba: 0x00400171
 last change scn: 0x0000.20b564d2 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x2ec01001
 check value in block header: 0xc087
 computed block checksum: 0x4a15

Page 61532 is influx - most likely media corrupt
Corrupt block relative dba: 0x0040f05c (file 1, block 61532)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x0040f05c
 last change scn: 0x0000.23b870f0 seq: 0x3 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x17960602
 check value in block header: 0xab18
 computed block checksum: 0x6767

DBVERIFY - Verification complete

Total Pages Examined         : 130560
Total Pages Processed (Data) : 69851
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 41306
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1813
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 17528
Total Pages Marked Corrupt   : 62
Total Pages Influx           : 62
Highest block SCN            : 604665146 (0.604665146)
2.强制open resetlogs遇到的错误
使用强制手段打开数据库时遇到如下错误:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 12 was created in the future of incomplete recovery
ORA-01110: data file 12: '/oracle/oradata//GL5.dbf'
修复的文件和其他文件存在严重的不一致,该文件处于不完全恢复的未来,无法归并如数据库,强制打开也是无效的。隐含参数也无法生效,多家第三方公司尝试到这里放弃了。

3.遇到的常规错误
在整个恢复过程中,遇到的一些常规错误有ORA-00600 4000号错误:
Sat Aug 22 09:51:50 2009
SMON: enabling cache recovery
Sat Aug 22 09:51:50 2009
Errors in file /oracle/admin/jwgl/udump/jwgl_ora_10412.trc:
ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], []
Sat Aug 22 09:51:54 2009
Errors in file /oracle/admin/jwgl/udump/jwgl_ora_10412.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], []
这个错误老周有几个示范,深有心得:
http://logzgh.itpub.net/post/3185/489281

2662错误是常见错误,不过这个10g的库不知道哪里有问题,2662出现后数据库就会Hang住,abort之后一直无法adjust_scn:
Sat Aug 22 14:41:33 2009
Errors in file /oracle/admin/jwgl/udump/jwgl_ora_16129.trc:
ORA-00600: internal error code, arguments: [2662], [0], [604638038], [0], [604642947], [8388617], [], []
Sat Aug 22 14:41:35 2009
Errors in file /oracle/admin/jwgl/udump/jwgl_ora_16129.trc:
ORA-00600: internal error code, arguments: [2662], [0], [604638038], [0], [604642947], [8388617], [], []
Sat Aug 22 14:41:35 2009
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Sat Aug 22 14:41:35 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_dbw0_16102.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Instance terminated by USER, pid = 16129


4.遇到的棘手错误
有几个错误是第一次遇到,如ORA-00600 3600号错误:
Sat Aug 22 12:37:46 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_dbw0_13236.trc:
ORA-00600: internal error code, arguments: [3600], [12], [2], [], [], [], [], []
Sat Aug 22 12:37:47 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_dbw0_13236.trc:
ORA-00600: internal error code, arguments: [3600], [12], [2], [], [], [], [], []
这个错误解决不了,后来只好绕过去了。

5.强制启动之后的错误
最后强制启动之后,数据库会持续报告Fractured block ,研究了一下如何修复Fractured block 未果,只好导出表数据重建数据库,一点遗憾是有些表遇到Fractured block 无法完整恢复:
Hex dump of (file 1, block 62284) in trace file /oracle/admin/jwgl/bdump/jwgl_mmon_22803.trc
Corrupt block relative dba: 0x0040f34c (file 1, block 62284)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x0040f34c
 last change scn: 0x0000.23d374df seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x0e730601
 check value in block header: 0x223d
 computed block checksum: 0x3250
Reread of rdba: 0x0040f34c (file 1, block 62284) found same corrupted data
Sat Aug 22 20:25:31 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_mmon_22803.trc:
ORA-00600: internal error code, arguments: [2663], [0], [604643023], [0], [604647702], [], [], []
Sat Aug 22 20:25:31 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_smon_22797.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Sat Aug 22 20:25:33 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_mmon_22803.trc:
ORA-00600: internal error code, arguments: [2662], [0], [604643037], [0], [604647702], [25165826], [], []
ORA-600 encountered when generating server alert SMG-4120
Sat Aug 22 20:25:35 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_mmon_22803.trc:
ORA-00600: internal error code, arguments: [2662], [0], [604643042], [0], [604647702], [25165826], [], []
Sat Aug 22 20:25:37 2009
Errors in file /oracle/admin/jwgl/bdump/jwgl_mmon_22803.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [604643042], [0], [604647702], [25165826], [], []
6.恢复手段
在整个恢复过程中,严重依赖的BBED工具,越发体会到这个工具的强大之处。
SYSTEM表空间靠前的几个坏块可以从其他文件上copy过来,其内容一致,后面的就没办法了。tab$,seg$等表存在坏块。
由于文件的一致性混乱,数据库恢复请求的归档从2009年2月开始,根本无从寻找,这里用BBED强制将所有文件的归档需要推进到当前日志的序列,也就解决了之前的3600错误,以及ORA-01248错误。ORA-01248这个错误是第一次遇到,在Metalink上查了一下,看似无解的。所以没办法只好用bbed改。

还遇到一些600错误,就通过bbed不停修改文件头信息推演,最终将数据库强制resetlogs打开,所有文件回归到数据库中。

在解决ORA-01248错误时,bbed一开始还修改了Checkpointed at scn项,这个虽然可以跳过错误,但是记得是由此导致了ORA-00600 3600错误。

7.总结
根据最近几次的恢复经验看,BBED是恢复一些棘手故障的利器,几乎无往不胜,现在还存在的问题是Fractured block的修复问题。
然后备份总是王道,做好备份,万事无忧;没有备份,心惊胆跳!


-The End-







历史上的今天...
    >> 2010-08-23文章:
    >> 2006-08-23文章:
           sql.bsq与数据库的创建

By eygle on 2009-08-23 14:29 | Comments (12) | Backup&Recovery | Life | OraNews | 2374 |

12 Comments

请问大师 BBED之类的修复人品是否还占有很大比例的说?

牛年牛人用牛器做牛事

这个文档里对Fractured block做了很多说明406403.1

你是飞到济南做这件事还是通过远程登录呢?

找到一篇文章:http://dbsnake.com/2009/08/fractured-block-fix.html关于fractured block的修复


学校的DB维护都很差的,哪怕是重点大学也有可能

大学也很多用oracle?


CopyRight © 2004~2020 云和恩墨,成就未来!, All rights reserved.
数据恢复·紧急救援·性能优化 云和恩墨 24x7 热线电话:400-600-8755 业务咨询:010-59007017-7040 or 7037 业务合作: marketing@enmotech.com