<?xml version="1.0" encoding="GB2312"?>
<rss version="2.0">
<channel>
<title>Friends Life and Oracle</title>
<link>http://www.eygle.com/blog/</link>
<description>eygle's Oracle blog: Oracle technical research and in-depth discussion, along with notes on personal interests and life.</description>
<copyright>Copyright 2006</copyright>
<lastBuildDate>Tue, 28 Nov 2006 10:32:19 +0800</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.33</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 

<item>
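<title>Handling a Disk Failure on a DataGuard Database Server</title>
<description><![CDATA[<p>Yesterday the database on one of our PC servers ran into trouble again, and <a href="http://www.eygle.com/archives/2006/10/start_dataguard_db.html">once again</a> the cause was a disk failure.</p>

<p>Both servers are low-end domestic PC servers from <a href="http://www.realserver.com.cn/">Lianzhi (联志)</a>, and their quality is really poor: last time a disk failed on a standby machine, then another machine kept rebooting because of a faulty power module, and now the disk in this server has failed as well.</p>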

<blockquote>Nov 24 10:27:48 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:48 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:48 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011258880<BR>Nov 24 10:27:49 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:49 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011262976<BR>Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011267072<BR>Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011271168</blockquote>
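<p>Each kernel message above pairs a requested block (<code>want=</code>) with the size of the device (<code>limit=</code>). As a rough illustration only, the sketch below scans syslog-style lines and flags requests that fall past the end of the device. The function name <code>find_out_of_range</code> and the sample list are assumptions made for this example; the message format is copied from the log above.</p>

```python
import re

# Matches kernel lines such as: "08:08: rw=0, want=1564747716, limit=5245191"
ACCESS_RE = re.compile(r"(\d+:\d+): rw=(\d+), want=(\d+), limit=(\d+)")


def find_out_of_range(lines):
    """Return (device, want, limit) for every request past the device end."""
    hits = []
    for line in lines:
        m = ACCESS_RE.search(line)
        if m is None:
            continue
        device, want, limit = m.group(1), int(m.group(3)), int(m.group(4))
        if want > limit:
            hits.append((device, want, limit))
    return hits


# Two lines copied from the log above (hypothetical in-memory sample).
sample = [
    "Nov 24 10:27:48 wapcom1 kernel: attempt to access beyond end of device",
    "Nov 24 10:27:48 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191",
]
print(find_out_of_range(sample))
```

<p>A check like this could run against the system log from cron so that a failing disk is noticed before the database starts reporting errors.</p>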

<p>Fortunately, DataGuard lets us switch the database over to the other machine without losing any data:<br />
<blockquote>Thu Nov 23 18:46:18 2006<BR>ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)<BR>ARC0: Begin FAL archive (thread 1 sequence 6047 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>ARC0: Complete FAL archive (thread 1 sequence 6047 destination bmarksb)<BR>ARC0: Begin FAL archive (thread 1 sequence 6048 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>Thu Nov 23 18:46:18 2006<BR>ARC1: Complete FAL archive (thread 1 sequence 6046 destination bmarksb)<BR>ARC1: Begin FAL archive (thread 1 sequence 6049 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>Thu Nov 23 18:46:18 2006<BR>ARC0: Complete FAL archive (thread 1 sequence 6048 destination bmarksb)<BR>Thu Nov 23 18:46:18 2006<BR>ARC1: Complete FAL archive (thread 1 sequence 6049 destination bmarksb)</blockquote></p>
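<p>Before relying on the standby, it helps to confirm that no log sequence is missing from what was shipped. The sketch below is a minimal illustration that reads alert-log lines in the format shown above and reports gaps; <code>shipped_sequences</code> and <code>missing_sequences</code> are hypothetical helper names, not Oracle utilities.</p>

```python
import re

# Alert-log line format taken from the excerpt above, e.g.
# "ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)"
FAL_DONE_RE = re.compile(
    r"Complete FAL archive \(thread (\d+) sequence (\d+) destination (\w+)\)"
)


def shipped_sequences(lines, destination):
    """Collect the log sequences fully shipped to one destination."""
    seqs = set()
    for line in lines:
        m = FAL_DONE_RE.search(line)
        if m and m.group(3) == destination:
            seqs.add(int(m.group(2)))
    return sorted(seqs)


def missing_sequences(seqs):
    """Return sequence numbers absent between the lowest and highest seen."""
    if not seqs:
        return []
    present = set(seqs)
    return [n for n in range(min(seqs), max(seqs) + 1) if n not in present]


log_lines = [
    "ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)",
    "ARC0: Begin FAL archive (thread 1 sequence 6047 destination bmarksb)",
    "ARC0: Complete FAL archive (thread 1 sequence 6047 destination bmarksb)",
    "ARC1: Complete FAL archive (thread 1 sequence 6046 destination bmarksb)",
]
done = shipped_sequences(log_lines, "bmarksb")
print(done, missing_sequences(done))
```

<p>The sketch only looks at FAL messages for a single thread; on a real system the standby's own views would be the more reliable source.</p>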

<p>This time it is the server hosting the primary database that has the problem:<br />
<blockquote><P>SQL&gt; select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;</P><br />
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DBID NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PROTECTION_MODE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DATABASE_ROLE&nbsp;&nbsp;&nbsp; SWITCHOVER_STATUS<BR>---------- --------- -------------------- ---------------- ------------------<BR>3520694939 BMARK&nbsp;&nbsp;&nbsp;&nbsp; MAXIMUM PERFORMANCE&nbsp; PRIMARY&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SESSIONS ACTIVE</P></blockquote></p>

<p>The standby database is currently healthy:<br />
<blockquote><P>SQL&gt; select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;</P><br />
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DBID NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PROTECTION_MODE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DATABASE_ROLE&nbsp;&nbsp;&nbsp; SWITCHOVER_STATUS<BR>---------- --------- -------------------- ---------------- ------------------<BR>3520694939 BMARK&nbsp;&nbsp;&nbsp;&nbsp; MAXIMUM PERFORMANCE&nbsp; PHYSICAL STANDBY SESSIONS ACTIVE</P></blockquote></p>

<p>All that is needed now is a short downtime window to perform the switchover.</p>
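<p>Both queries above report <code>SESSIONS ACTIVE</code> in <code>SWITCHOVER_STATUS</code>. My understanding of the Data Guard documentation is that this status means other sessions are still connected, which is why the switchover statement in the log that follows carries <code>WITH SESSION SHUTDOWN</code>. The sketch below encodes that decision for the primary side only; <code>switchover_statement</code> is a hypothetical helper and handles just the two statuses named in its body.</p>

```python
def switchover_statement(database_role, switchover_status):
    """Pick the primary-side switchover statement for a given status."""
    if database_role != "PRIMARY":
        raise ValueError("this sketch only covers the primary side")
    base = "ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY"
    if switchover_status == "TO STANDBY":
        return base
    if switchover_status == "SESSIONS ACTIVE":
        # Other sessions are still connected; ask Oracle to close them.
        return base + " WITH SESSION SHUTDOWN"
    raise ValueError("not ready for switchover: " + switchover_status)


# Values taken from the v$database output of the primary shown above.
print(switchover_statement("PRIMARY", "SESSIONS ACTIVE"))
```

<p>Any other status raises an error on purpose, so that an unexpected state is looked at by a person rather than pushed through.</p>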

<p>Switchover log:<br />
<blockquote>Fri Nov 24 11:30:43 2006<br />
alter database commit to switchover to physical standby with session shutdown<br />
Fri Nov 24 11:30:43 2006<br />
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY<br />
Fri Nov 24 11:30:43 2006<br />
SMON: disabling tx recovery<br />
Fri Nov 24 11:30:44 2006<br />
Active process 26743 user 'oracle' program 'oracle@wapcom1.hawa.cn (CJQ0)'<br />
Active process 9033 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 7655 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
...............<br />
Active process 8944 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 29104 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 30750 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 9045 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
CLOSE: waiting for server sessions to complete.<br />
Fri Nov 24 11:31:51 2006<br />
CLOSE: all sessions shutdown successfully.<br />
Fri Nov 24 11:32:09 2006<br />
SMON: disabling cache recovery<br />
Fri Nov 24 11:32:10 2006<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Fri Nov 24 11:32:10 2006<br />
ARCH shutting down<br />
Fri Nov 24 11:32:10 2006<br />
ARCH shutting down<br />
Fri Nov 24 11:32:10 2006<br />
ARC0: Archival stopped<br />
Fri Nov 24 11:32:10 2006<br />
ARC1: Archival stopped<br />
Fri Nov 24 11:32:10 2006<br />
Thread 1 closed at log sequence 6076<br />
Successful close of redo thread 1<br />
Fri Nov 24 11:32:28 2006<br />
ARCH: noswitch archival of thread 1, sequence 6076<br />
ARCH: End-Of-Redo archival of thread 1 sequence 6076<br />
ARCH: Evaluating archive   log 3 thread 1 sequence 6076<br />
ARCH: Beginning to archive log 3 thread 1 sequence 6076<br />
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<br />
Creating archive destination LOG_ARCHIVE_DEST_1: '/var/oradata/arch/1_6076.arc'<br />
ARCH: Completed archiving  log 3 thread 1 sequence 6076<br />
ARCH: archiving is disabled due to current logfile archival<br />
Clearing standby activation ID 3520937155 (0xd1dd3cc3)<br />
The primary database controlfile was created using the<br />
'MAXLOGFILES 5' clause.<br />
The resulting standby controlfile will not have enough<br />
available logfile entries to support an adequate number<br />
of standby redo logfiles. Consider re-creating the<br />
primary controlfile using 'MAXLOGFILES 8' (or larger).<br />
Use the following SQL commands on the standby database to create<br />
standby redo logfiles that match the primary database:<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 10485760;<br />
Archivelog for thread 1 sequence 6076 required for standby recovery<br />
MRP0 started with pid=8<br />
MRP0: Background Managed Standby Recovery process started<br />
Media Recovery Log /var/oradata/arch/1_6076.arc<br />
Identified end-of-REDO for thread 1 sequence 6076<br />
Identified end-of-REDO for thread 1 sequence 6076<br />
Media Recovery End-Of-Redo indicator encountered<br />
Media Recovery Applied until change 194025715<br />
MRP0: Media Recovery Complete: End-Of-REDO<br />
Resetting standby activation ID 3520937155 (0xd1dd3cc3)<br />
MRP0: Background Media Recovery process shutdown<br />
Fri Nov 24 11:32:35 2006<br />
Switchover: Complete - Database shutdown required<br />
Completed: alter database commit to switchover to physical st<br />
Fri Nov 24 11:32:53 2006<br />
Shutting down instance: further logons disabled<br />
Shutting down instance (immediate)<br />
License high water mark = 140<br />
Fri Nov 24 11:32:53 2006<br />
ALTER DATABASE CLOSE NORMAL<br />
ORA-1507 signalled during: ALTER DATABASE CLOSE NORMAL...<br />
ARCH: Archiving is disabled<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Archive process shutdown avoided: 0 active<br />
ARCH: Archiving is disabled<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Archive process shutdown avoided: 0 active<br />
Fri Nov 24 11:33:14 2006<br />
Starting ORACLE instance (normal)<br />
LICENSE_MAX_SESSION = 0<br />
LICENSE_SESSIONS_WARNING = 0<br />
SCN scheme 2<br />
Using log_archive_dest parameter default value<br />
LICENSE_MAX_USERS = 0<br />
SYS auditing is disabled<br />
Starting up ORACLE RDBMS Version: 9.2.0.6.0.<br />
System parameters with non-default values:<br />
  processes                = 150<br />
  timed_statistics         = TRUE<br />
  shared_pool_size         = 83886080<br />
  large_pool_size          = 33554432<br />
  standby_archive_dest     = /var/oradata/arch<br />
  fal_server               = bmarksb<br />
  fal_client               = bmark<br />
  log_archive_format       = %t_%s.arc<br />
...........<br />
CJQ0 started with pid=8<br />
Fri Nov 24 11:33:15 2006<br />
ARCH: STARTING ARCH PROCESSES<br />
ARC0 started with pid=9<br />
ARC0: Archival started<br />
ARC1 started with pid=10<br />
Fri Nov 24 11:33:15 2006<br />
ARCH: STARTING ARCH PROCESSES COMPLETE<br />
Fri Nov 24 11:33:15 2006<br />
ARC1: Archival started<br />
Fri Nov 24 11:33:15 2006<br />
ARC0: Thread not mounted<br />
Fri Nov 24 11:33:15 2006<br />
ARC1: Thread not mounted<br />
Fri Nov 24 11:33:22 2006<br />
alter database mount standby database<br />
Fri Nov 24 11:33:26 2006<br />
Successful mount of redo thread 1, with mount id 3559140162<br />
Fri Nov 24 11:33:26 2006<br />
Standby Database mounted.<br />
Completed: alter database mount standby database<br />
Fri Nov 24 11:33:29 2006<br />
ALTER DATABASE RECOVER  managed standby database disconnect  <br />
Attempt to start background Managed Standby Recovery process<br />
MRP0 started with pid=12<br />
MRP0: Background Managed Standby Recovery process started<br />
Fri Nov 24 11:33:34 2006<br />
Completed: ALTER DATABASE RECOVER  managed standby database d<br />
Fri Nov 24 11:33:34 2006<br />
Media Recovery Waiting for thread 1 seq# 6077<br />
Media Recovery Log /var/oradata/arch/1_6077.arc<br />
Media Recovery Waiting for thread 1 seq# 6078<br />
Media Recovery Log /var/oradata/arch/1_6078.arc<br />
Media Recovery Waiting for thread 1 seq# 6079</blockquote></p>
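<p>The timestamps in the log make it easy to work out how much downtime the switchover actually cost. The following sketch parses two alert-log timestamps and returns the difference in seconds; the function name <code>elapsed_seconds</code> is an assumption for this example, and the timestamp strings are copied from the log above.</p>

```python
from datetime import datetime

# Alert-log timestamps look like "Fri Nov 24 11:30:43 2006".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start_stamp, end_stamp):
    """Seconds between two alert-log timestamp lines."""
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    return int((end - start).total_seconds())


# Switchover command issued -> "Switchover: Complete" message.
print(elapsed_seconds("Fri Nov 24 11:30:43 2006", "Fri Nov 24 11:32:35 2006"))
# Switchover command issued -> managed recovery running on the new standby.
print(elapsed_seconds("Fri Nov 24 11:30:43 2006", "Fri Nov 24 11:33:34 2006"))
```

<p>For this log the two intervals come to 112 and 171 seconds, so the old primary was back in service as a standby in under three minutes.</p>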

<p>It looks like we should not buy Lianzhi servers again.</p>

<p>-The End-</p>]]></description>
<link>http://www.eygle.com/archives/2006/11/aisino_server_dataguard.html</link>
<guid>http://www.eygle.com/archives/2006/11/aisino_server_dataguard.html</guid>
<category>Advanced</category>
<pubDate>Tue, 28 Nov 2006 10:32:19 +0800</pubDate>
</item>
<item>
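<title>DBA Cautionary Tales: Backup Matters Above All</title>
<description><![CDATA[<p>A <a href="http://www.itpub.net/650527.html">thread</a> on ITPUB has drawn heated discussion recently. Its title: why are there always so many irresponsible DBAs?</p>

<p>The author writes:<br />
<blockquote>A colleague just told me that my former direct supervisor, the IT manager, has resigned to take the blame. When I asked for details, it turned out that my successor had not taken any backups and all the data was lost<br />
<br />
This is the second time this year I have seen this happen</blockquote></p>

<p>In fact, we have seen this kind of thing many times.</p>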

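<p>Taking this specific case, there were many points at which the problem could have been prevented, for example:<br />
1. Sound standards and management<br />
2. Strict operating and release procedures<br />
3. Clearly defined DBA responsibilities, with supervision and review<br />
4. Routine system monitoring and maintenance...</p>

<p>There were of course many ways the problem might have been prevented; unfortunately, it happened anyway.</p>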

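<p>This confirms <a href="http://www.eygle.com/digest/2006/04/murphy_law.html">Murphy's Law</a> once again: nobody stays lucky forever.<br />
In the preface to my new book, <a href="http://www.eygle.com/archives/2008/08/my_book_services.html">深入浅出Oracle</a>, I also wrote:<br />
<strong>The only thing that wakes a DBA from sleep in a cold sweat is having no backup!</strong></p>

<p>Back when I was teaching, I always opened the course with the <a href="http://www.eygle.com/archives/2006/03/the_four_rule_for_dba.html">four rules</a> for DBAs, and the first of them is: <strong>backup matters above all</strong>.</p>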

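<p>On my site this sentence has already been <a href="http://www.google.com/search?hl=zh-CN&inlang=zh-CN&ie=GB2312&oe=GB2312&newwindow=1&domains=eygle.com&q=%B1%B8%B7%DD%D6%D8%D3%DA%D2%BB%C7%D0&sitesearch=eygle.com">repeated</a> many times. Reading Tom's blog yesterday, I saw that Tom mentioned his own <a href="http://www.eygle.com/archives/2006/10/tom_five_rules.html">rules</a>. In the original he used the word <strong>mantra</strong>, which the Kingsoft PowerWord (金山词霸) dictionary defines as follows:<br />
<blockquote>mantra<br />
a hymn or incantation (especially one from the four Vedas, chanted as a spell or prayer)</blockquote></p>

<p>In yesterday's <a href="http://www.eygle.com/archives/2006/10/tom_five_rules.html">post</a> I translated it as "rule", but I think "incantation" would fit just as well. If repeating it again and again helps every DBA remember, then I am happy to repeat my four rules for DBAs:</p>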

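<blockquote>1. Backup matters above all<br />
We must accept that every system will crash sooner or later; without a valid backup you are only waiting for the day disaster strikes! I often joke that the only thing that wakes a DBA from sleep in a cold sweat is the lack of a valid backup.

<p>2. Think thrice before you act</p>

<p>Always know exactly what you are doing; otherwise, do nothing! Sometimes a single press of Enter or a single command causes an unrecoverable disaster, so you must clearly confirm everything you do and, when necessary, preserve the scene.</p>

<p>3. rm is dangerous<br />
  On UNIX/Linux this command means you may lose whatever follows it forever, so confirm what you are doing!<br />
 Too many people have been devastated by "rm -rf". I wrote this rule after a friend woke me in the early hours one morning to say he had mistakenly deleted a 200 GB database with rm -rf and had no backup.</p>

<p>The only thing I could tell him at the time was: stay calm.</p>

<p>4. You set the standards<br />
 Good standards are the foundation for reducing failures. As a DBA, you need to set the standards that govern developers and even system administrators; doing so can prevent both deliberate and accidental mistakes and reduce the risk to the database.</blockquote></p>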

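<p>When I first wrote down these four rules I was also influenced by one of our national leaders, who once wrote the following inscription when giving guidance on fire prevention:<br />
<strong>Hidden dangers are more perilous than open flames; prevention is better than disaster relief; responsibility weighs heavier than Mount Tai</strong></p>

<p>This applies equally to DBAs. In a sense, a DBA is a firefighter. :)</p>

<p>Finally, we can also look at DCBA's <a href="http://www.anysql.net/life/love_after_lost.html">take</a> on this.</p>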

<p>-The End-</p>]]></description>
<link>http://www.eygle.com/archives/2006/10/backup_backup_backup.html</link>
<guid>http://www.eygle.com/archives/2006/10/backup_backup_backup.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Fri, 20 Oct 2006 14:32:17 +0800</pubDate>
</item>
<item>
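<title>Point-in-Time Incomplete Recovery with RMAN</title>
<description><![CDATA[<p>Last week data was deleted by mistake in a <a href="http://www.eygle.com/archives/2006/09/undo_retention_need_change.html">friend</a>'s database, and he asked me to help recover it.<br />
Because the backups were adequate, all that was needed was a point-in-time recovery with RMAN (the point in time has to be chosen according to when the failure occurred).</p>

<p>First, start the instance:<br />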
<blockquote>[oracle@stat ~]$ export ORACLE_SID=order<br />
[oracle@stat ~]$ rman target /</p>

<p>Recovery Manager: Release 10.2.0.2.0 - Production on Thu Sep 14 22:43:50 2006</p>

<p>Copyright (c) 1982, 2005, Oracle.  All rights reserved.</p>

<p>connected to target database (not started)</p>

<p>RMAN> set DBID=1341966532</p>

<p>executing command: SET DBID</p>

<p>RMAN> startup nomount;</p>

<p>Oracle instance started</p>

<p>Total System Global Area    2483027968 bytes</p>

<p>Fixed Size                     1262344 bytes<br />
Variable Size                654314744 bytes<br />
Database Buffers            1811939328 bytes<br />
Redo Buffers                  15511552 bytes</blockquote></p>

<p>Restore the control file and mount the database:<br />
<blockquote>RMAN> restore controlfile from autobackup;</p>

<p>Starting restore at 14-SEP-06<br />
using target database control file instead of recovery catalog<br />
allocated channel: ORA_DISK_1<br />
channel ORA_DISK_1: sid=541 devtype=DISK</p>

<p>channel ORA_DISK_1: looking for autobackup on day: 20060914<br />
channel ORA_DISK_1: autobackup found: c-1341966532-20060914-02<br />
channel ORA_DISK_1: control file restore from autobackup complete<br />
output filename=/oradata/controlfile/o1_mf_28spy45z_.ctl<br />
output filename=/oradata/controlfile/o2_mf_28spy45z_.ctl<br />
Finished restore at 14-SEP-06</p>

<p>RMAN> alter database mount;</p>

<p>database mounted<br />
released channel: ORA_DISK_1</blockquote></p>

<p>Restore the database:<br />
<blockquote>RMAN> restore database;</p>

<p>Starting restore at 14-SEP-06<br />
allocated channel: ORA_DISK_1<br />
channel ORA_DISK_1: sid=541 devtype=DISK</p>

<p>channel ORA_DISK_1: starting datafile backupset restore<br />
channel ORA_DISK_1: specifying datafile(s) to restore from backup set<br />
restoring datafile 00001 to /oradata/datafile/o1_mf_system_28spy7kl_.dbf<br />
restoring datafile 00002 to /oradata/datafile/o1_mf_undotbs1_28spykdh_.dbf<br />
restoring datafile 00003 to /oradata/datafile/o1_mf_sysaux_28spyo9s_.dbf<br />
restoring datafile 00004 to /oradata/datafile/o1_mf_users_28spyvm8_.dbf<br />
restoring datafile 00005 to /oradata/datafile/o1_mf_vascms_2c444bhj_.dbf<br />
restoring datafile 00006 to /oradata/datafile/o1_mf_wapgame_2c44gz55_.dbf<br />
restoring datafile 00007 to /oradata/datafile/o1_mf_vascms_2c4kn0b2_.dbf<br />
channel ORA_DISK_1: reading from backup piece /data3/ordrbak/full_ORDER_20060913_169<br />
channel ORA_DISK_1: restored backup piece 1<br />
piece handle=/data3/ordrbak/orderfullback_ORDER_20060913_169 tag=order<br />
channel ORA_DISK_1: restore complete, elapsed time: 00:03:06<br />
Finished restore at 14-SEP-06</blockquote></p>

<p>Perform the point-in-time recovery:<br />
<blockquote>RMAN> recover database until time '2006-09-14 19:00:00'<br />
2> ;</p>

<p>Starting recover at 14-SEP-06<br />
using channel ORA_DISK_1<br />
RMAN-00571: ===========================================================<br />
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============<br />
RMAN-00571: ===========================================================<br />
RMAN-03002: failure of recover command at 09/14/2006 22:49:54<br />
ORA-01861: literal does not match format string</p>

<p>RMAN> exit</p>

<p><br />
Recovery Manager complete.</blockquote></p>

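<p>This error is caused by the date format setting: the time literal does not match the date format in effect for the RMAN session.</p>

<p>Set the correct date format through NLS_DATE_FORMAT, then run the point-in-time incomplete recovery:<br />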
<blockquote>[oracle@stat ~]$ export NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'<br />
[oracle@stat ~]$ rman target /</p>

<p>Recovery Manager: Release 10.2.0.2.0 - Production on Thu Sep 14 22:50:22 2006</p>

<p>Copyright (c) 1982, 2005, Oracle.  All rights reserved.</p>

<p>connected to target database: order (DBID=1341966532, not open)</p>

<p>RMAN> recover database until time '2006-09-14 19:00:00'<br />
2> ;</p>

<p>Starting recover at 2006-09-14 22:50:26<br />
using target database control file instead of recovery catalog<br />
allocated channel: ORA_DISK_1<br />
channel ORA_DISK_1: sid=544 devtype=DISK</p>

<p>starting media recovery</p>

<p>archive log thread 1 sequence 303 is already on disk as file <br />
							/oradata/archive/1_303_592917188.dbf<br />
archive log thread 1 sequence 304 is already on disk as file <br />
							/oradata/archive/1_304_592917188.dbf<br />
channel ORA_DISK_1: starting archive log restore to default destination<br />
channel ORA_DISK_1: restoring archive log<br />
archive log thread=1 sequence=299<br />
channel ORA_DISK_1: reading from backup piece /data3/ordrbak/arch_order_20060913_171<br />
channel ORA_DISK_1: restored backup piece 1<br />
piece handle=/data3/ordrbak/orderarch_order_20060913_171 tag=order<br />
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03<br />
archive log filename=/oradata/archive/1_299_592917188.dbf thread=1 sequence=299<br />
channel ORA_DISK_1: starting archive log restore to default destination<br />
channel ORA_DISK_1: restoring archive log<br />
archive log thread=1 sequence=300<br />
channel ORA_DISK_1: restoring archive log<br />
archive log thread=1 sequence=301<br />
channel ORA_DISK_1: restoring archive log<br />
archive log thread=1 sequence=302<br />
channel ORA_DISK_1: reading from backup piece /data3/ordrbak/arch_order_20060914_173<br />
channel ORA_DISK_1: restored backup piece 1<br />
piece handle=/data3/ordrbak/orderarch_order_20060914_173 tag=TAG20060914T033004<br />
channel ORA_DISK_1: restore complete, elapsed time: 00:00:08<br />
archive log filename=/oradata/archive/1_300_592917188.dbf thread=1 sequence=300<br />
archive log filename=/oradata/archive/1_301_592917188.dbf thread=1 sequence=301<br />
archive log filename=/oradata/archive/1_302_592917188.dbf thread=1 sequence=302<br />
archive log filename=/oradata/archive/1_303_592917188.dbf thread=1 sequence=303<br />
archive log filename=/oradata/archive/1_304_592917188.dbf thread=1 sequence=304<br />
media recovery complete, elapsed time: 00:00:57<br />
Finished recover at 2006-09-14 22:51:39</blockquote></p>
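<p>An alternative to exporting <code>NLS_DATE_FORMAT</code> is to spell out the format mask in the command itself. To my knowledge RMAN accepts a quoted <code>to_date(...)</code> expression after <code>until time</code>, which makes the command independent of the environment. The sketch below builds such a command and rejects timestamps that do not parse; <code>recover_until_command</code> is a hypothetical helper name.</p>

```python
from datetime import datetime

# Oracle format mask and the equivalent Python format string.
ORACLE_MASK = "yyyy-mm-dd hh24:mi:ss"
PYTHON_FORMAT = "%Y-%m-%d %H:%M:%S"


def recover_until_command(timestamp):
    """Build an RMAN RECOVER command with an explicit TO_DATE mask."""
    # Parsing first rejects malformed input before it ever reaches RMAN.
    parsed = datetime.strptime(timestamp, PYTHON_FORMAT)
    literal = parsed.strftime(PYTHON_FORMAT)
    date_expr = "to_date('" + literal + "','" + ORACLE_MASK + "')"
    return 'recover database until time "' + date_expr + '";'


print(recover_until_command("2006-09-14 19:00:00"))
```

<p>Passing a value such as <code>14-SEP-06</code> raises <code>ValueError</code> in the script, which is a cheaper place to find the mismatch than an ORA-01861 in the middle of a recovery.</p>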

<p>Open the database with RESETLOGS:<br />
<blockquote>RMAN> alter database open resetlogs;</p>

<p>database opened</p>

<p>RMAN></blockquote></p>

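<p>At this point you can verify the data; if it is correct, export it with exp and import it into the production database with imp to complete the recovery.</p>

<p>With adequate backups, a recovery like this is very easy.<br />
The story tells us once again: <a href="http://www.eygle.com/archives/2006/01/backup_is_most_important.html">backup matters above all</a>.</p>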

<p>-The End-<br />
</p>]]></description>
<link>http://www.eygle.com/archives/2006/09/rman_until_time_recovery.html</link>
<guid>http://www.eygle.com/archives/2006/09/rman_until_time_recovery.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Sun, 17 Sep 2006 21:05:31 +0800</pubDate>
</item>
<item>
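<title>Recovering a Lost Datafile When You Have the Archived Logs</title>
<description><![CDATA[<p>Yesterday Kamus asked a question: if you have a cold backup that is missing one datafile, but all the archived logs are available, how do you recover that datafile?</p>
<p>A quick hands-on test shows that the steps are roughly as follows:</p>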
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>[oracle@jumper eygle]$ sqlplus &quot;/ as sysdba&quot;</p>
            <p>SQL*Plus: Release 9.2.0.4.0 - Production on Sun Aug 20 01:22:50 2006</p>
            <p>Copyright (c) 1982, 2002, Oracle Corporation.&nbsp; All rights reserved.</p>
            <p>Connected to an idle instance.</p>
            <p>SQL&gt; startup mount<br />ORACLE instance started.</p>
            <p>Total System Global Area&nbsp; 252777592 bytes<br />Fixed Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 451704 bytes<br />Variable Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 134217728 bytes<br />Database Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 117440512 bytes<br />Redo Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 667648 bytes<br />Database mounted.</p>
            <p><br />SQL&gt; alter database open;<br />alter database open<br />*<br />ERROR at line 1:<br />ORA-01157: cannot identify/lock data file 3 - see DBWR trace file<br />ORA-01110: data file 3: '/opt/oracle/oradata/eygle/eygle02.dbf'</p>
            <p><br />SQL&gt; <font color="#ff0000">alter database create datafile 3 as '/opt/oracle/oradata/eygle/eygle02.dbf';</font></p>
            <p>Database altered.</p>
            <p>SQL&gt; select name from v$datafile;</p>
            <p>NAME<br />-------------------------------------------------------<br />/opt/oracle/oradata/eygle/system01.dbf<br />/opt/oracle/oradata/eygle/undotbs01.dbf<br />/opt/oracle/oradata/eygle/eygle02.dbf<br />/opt/oracle/oradata/eygle/eygle01.dbf</p>
            <p>SQL&gt; alter database open;<br />alter database open<br />*<br />ERROR at line 1:<br />ORA-01113: file 3 needs media recovery<br />ORA-01110: data file 3: '/opt/oracle/oradata/eygle/eygle02.dbf'</p>
            <p><br />SQL&gt; recover datafile 3;<br />Media recovery complete.<br />SQL&gt; alter database open;</p>
            <p>Database altered.</p>
            <p>SQL&gt; </p>
            </td>
        </tr>
    </tbody>
</table>
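<p>The session above boils down to four statements. As a small sketch, the function below assembles them for a given file number and path so they can be reviewed before being run in SQL*Plus. The name <code>recreate_datafile_script</code> is an assumption for this example, and the approach only works when the control file still knows the datafile and all redo since the file was created is available.</p>

```python
def recreate_datafile_script(file_no, path):
    """Return the SQL*Plus statements used above for one lost datafile."""
    if "'" in path:
        raise ValueError("path must not contain a single quote")
    return "\n".join([
        "startup mount",
        # Create an empty file that redo can then be applied to.
        f"alter database create datafile {file_no} as '{path}';",
        f"recover datafile {file_no};",
        "alter database open;",
    ])


# File number and path taken from the session above.
print(recreate_datafile_script(3, "/opt/oracle/oradata/eygle/eygle02.dbf"))
```

<p>Generating the text rather than executing it keeps a person in the loop, which matters for any statement that touches datafiles.</p>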
<p>-The End-</p>
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/08/how_to_recover_lose_datafile.html</link>
<guid>http://www.eygle.com/archives/2006/08/how_to_recover_lose_datafile.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Sun, 20 Aug 2006 12:09:44 +0800</pubDate>
</item>
<item>
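<title>DBA Cautionary Tales: Ignorance Is No Excuse for Fearlessness</title>
<description><![CDATA[<p>Today I saw this question on ITPUB: <a href="http://www.itpub.net/showthread.php?s=&amp;postid=4836782">after running SHUTDOWN ABORT, the database files cannot be opened</a>.</p>
<p>Reading the post carefully, I found that the author had run the following commands:</p>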
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">sql&gt;alter database datafile 'D:\oracle\QAS\sapdata1\system_1\SYSTEM.DATA1' offline drop;<br />database altered<br />sql&gt;alter database datafile 'D:\oracle\QAS\sapdata1\qasusr_1\QASUSR.DATA1' offline drop;<br />database altered</td>
        </tr>
    </tbody>
</table>
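<p>After that the database could not be opened.</p>
<p>Note that in the first command the author took a datafile of the SYSTEM tablespace offline with OFFLINE DROP. Whatever the state was before, the database certainly cannot start now.</p>
<p>The log the author provided shows the error the database had reported earlier:</p>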
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">Sun Jul 02 14:34:25 2006<br />Errors in file d:\oracle\qas\saptrace\background\qas_arc1_1844.trc:<br />ORA-19504: failed to create file &quot;C:\ORACLE\QAS\ORAARCH\QASARCHARC01148.001&quot;<br />ORA-19504: failed to create file &quot;C:\ORACLE\QAS\ORAARCH\QASARCHARC01148.001&quot;<br />ORA-27044: unable to write the header block of file<br />OSD-04008: WriteFile() failure, unable to write to file<br />O/S-Error: (OS 112) There is not enough space on the disk.</td>
        </tr>
    </tbody>
</table>
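<p>The database error was caused by insufficient disk space: the archiver could not write to the archive log destination.</p>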
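<p>A full archive destination is easy to detect before it stops the database. The sketch below uses Python's standard <code>shutil.disk_usage</code> to report free space for a directory; the function name <code>free_space_report</code>, the threshold, and the use of the current directory as a stand-in for the archive destination are all assumptions for illustration.</p>

```python
import shutil


def free_space_report(path, min_free_bytes):
    """Report free space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return {
        "path": path,
        "free_bytes": usage.free,
        "enough": usage.free >= min_free_bytes,
    }


# "." stands in for the archive log destination; adjust for a real system.
print(free_space_report(".", 100 * 1024 * 1024))
```

<p>Run periodically, a check like this gives a warning while there is still time to move or back up old archived logs.</p>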
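<p>Yet without understanding the actual cause, the author rashly ran OFFLINE DROP on critical datafiles. That was extremely careless. As DBAs we must neither <a href="http://www.eygle.com/archives/2006/06/dba_must_be_preciseness.html">take things for granted</a> nor be fearless out of ignorance.</p>
<p>That said, a file taken offline with OFFLINE DROP can be brought back into the database by re-creating the control file, after which normal recovery can be attempted.</p>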
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>SQL&gt; alter database backup controlfile to trace ;</p>
            <p>Database altered.</p>
            <p>SQL&gt; shutdown immediate;<br />Database closed.<br />Database dismounted.<br />ORACLE instance shut down.<br />SQL&gt; startup mount;<br />ORACLE instance started.</p>
            <p>Total System Global Area 139531744 bytes<br />Fixed Size 452064 bytes<br />Variable Size 121634816 bytes<br />Database Buffers 16777216 bytes<br />Redo Buffers 667648 bytes<br />Database mounted.<br />SQL&gt; select name from v$datafile;</p>
            <p>NAME<br />--------------------------------------------------------<br />/opt/oracle/oradata/eygle/system01.dbf<br />/opt/oracle/oradata/eygle/undotbs01.dbf<br />/opt/oracle/oradata/eygle/users01.dbf<br />/opt/oracle/oradata/eygle/eygle01.dbf</p>
            <p>SQL&gt; alter database datafile '/opt/oracle/oradata/eygle/users01.dbf' offline drop;</p>
            <p>Database altered.</p>
            <p>SQL&gt; alter database open;</p>
            <p>Database altered.</p>
            <p>SQL&gt; shutdown immediate;<br />Database closed.<br />Database dismounted.<br />ORACLE instance shut down.<br />SQL&gt; startup nomount;<br />ORACLE instance started.</p>
            <p>Total System Global Area 139531744 bytes<br />Fixed Size 452064 bytes<br />Variable Size 121634816 bytes<br />Database Buffers 16777216 bytes<br />Redo Buffers 667648 bytes<br />SQL&gt; CREATE CONTROLFILE REUSE DATABASE &quot;EYGLE&quot; NORESETLOGS ARCHIVELOG<br />2 -- SET STANDBY TO MAXIMIZE PERFORMANCE<br />3 MAXLOGFILES 5<br />4 MAXLOGMEMBERS 3<br />5 MAXDATAFILES 100<br />6 MAXINSTANCES 1<br />7 MAXLOGHISTORY 226<br />8 LOGFILE<br />9 GROUP 1 '/opt/oracle/oradata/eygle/redo01.log' SIZE 10M,<br />10 GROUP 2 '/opt/oracle/oradata/eygle/redo02.log' SIZE 10M,<br />11 GROUP 3 '/opt/oracle/oradata/eygle/redo03.log' SIZE 10M<br />12 -- STANDBY LOGFILE<br />13 DATAFILE<br />14 '/opt/oracle/oradata/eygle/system01.dbf',<br />15 '/opt/oracle/oradata/eygle/undotbs01.dbf',<br />16 '/opt/oracle/oradata/eygle/users01.dbf',<br />17 '/opt/oracle/oradata/eygle/eygle01.dbf'<br />18 CHARACTER SET ZHS16GBK<br />19 ;</p>
            <p>Control file created.</p>
            <p>SQL&gt; alter database open;<br />alter database open<br />*<br />ERROR at line 1:<br />ORA-01113: file 3 needs media recovery<br />ORA-01110: data file 3: '/opt/oracle/oradata/eygle/users01.dbf'</p>
            <p><br />SQL&gt; recover datafile 3;<br />Media recovery complete.<br />SQL&gt; alter database open;</p>
            <p>Database altered.</p>
            <p>SQL&gt; </p>
            </td>
        </tr>
    </tbody>
</table>
<p>To finish, I will repeat the point once more: ignorance is no excuse for fearlessness.</p>
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/07/be_careful_dba.html</link>
<guid>http://www.eygle.com/archives/2006/07/be_careful_dba.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Fri, 14 Jul 2006 13:13:20 +0800</pubDate>
</item>
<item>
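<title>DBA Cautionary Tales: Handle the Data Dictionary with Care</title>
<description><![CDATA[<p>Today someone asked on ITPUB: <font face="Verdana"><a href="http://www.itpub.net/575917.html">Can the sys.file$ data dictionary table be recovered after it has been truncated?</a> The full text was:</font></p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p>Can the sys.file$ data dictionary table be recovered after it has been truncated? <br />I was too careless yesterday and accidentally deleted the contents of file$, so the list of datafiles in the tablespaces is no longer visible. Please help! Can the contents of this table be restored with the database creation scripts?</p>
</blockquote>
<p dir="ltr">The data dictionary is critical to the database, and the usual advice is never to modify it by hand, because even a simple change can trigger many latent problems inside the database.</p>
<p dir="ltr">Unless Oracle Support is guiding you, there is simply no need to modify the dictionary by hand. Every DBA should keep this in mind:</p>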
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p dir="ltr"><font color="#ff0000"><strong>Never modify the data dictionary by hand</strong></font></p>
</blockquote>
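<p dir="ltr">If an accident like this does happen (admittedly a rather bizarre one), the best option is to restore from backup (some dictionary tables can also be repaired with simple operations such as INSERT). If there is no backup, one lesson worth knowing is: do not shut the database down. Try to export the data first; if the dictionary cannot be repaired, you can rebuild the database and load the data back with imp.</p>
<p>&nbsp;</p>
<p>This person later phoned me. Since it was a test environment with no backup, I advised him to export the data and rebuild the database.</p>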
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/06/dba_not_modify_dictionary.html</link>
<guid>http://www.eygle.com/archives/2006/06/dba_not_modify_dictionary.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Thu, 22 Jun 2006 16:03:02 +0800</pubDate>
</item>
<item>
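<title>DBA Cautionary Tales: A DBA Must Never Assume</title>
<description><![CDATA[<p>A few days ago I wrote <a href="http://www.eygle.com/archives/2006/06/dba_update_prop.html">DBA Cautionary Tales: Changing the Character Set by Updating a System Table (props$)</a>. After a <a href="http://www.itpub.net/566011.html">discussion</a> started on ITPUB, someone who had not read it carefully actually tried the experiment on Oracle8i; the result, of course, was a database that would not open.</p>
<p>Looking back on the mistake, this person said:</p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p><strong>I assumed the props$ table could be updated in mount state</strong></p>
</blockquote>
<p dir="ltr">I detest the word &quot;assumed&quot;. A DBA must be <a href="http://www.eygle.com/archives/2005/12/what_kind_of_dba_we_need.html">rigorous</a> and must never take things for granted; that kind of casual assumption can be a disaster for a database.</p>
<p dir="ltr">I am recording the story here as a small warning to everyone.</p>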
<p dir="ltr">&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/06/dba_must_be_preciseness.html</link>
<guid>http://www.eygle.com/archives/2006/06/dba_must_be_preciseness.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Wed, 14 Jun 2006 15:43:43 +0800</pubDate>
</item>
<item>
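<title>DBA Cautionary Tales: Changing the Character Set by Updating a System Table (props$)</title>
<description><![CDATA[<p>Today on <a href="http://www.itpub.net/">ITPUB</a> I again saw a problem caused by a <a href="http://www.itpub.net/566011.html">character set change</a>. The case the author described is as follows:</p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p>The database is 9.2.0.7.0; OS: Solaris Operating System (SPARC 64-bit) </p>
<p>It started like this: a UPS failure at one of my customers' sites brought the system down.<br />After it came back up, about ten minutes later the operating system suddenly lost its disks and the system went down a second time.<br />After it came up again, a user reported that a SQL statement was not using its index.</p>
<p>The SQL was:</p>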
<p>select * from ww.test20060504 dg where dg.user_number='7290'</p>
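<p>Our first thought was to analyze the index, but that did not help, so we analyzed the table as well; the execution plan did not change.<br />We tried adding hints (including RULE), which did not help either, and users reported that a whole batch of similar statements could not use their indexes.<br />A 10053 trace then showed, surprisingly, that the optimizer was not considering the index at all. We began to suspect a problem in the database's data dictionary.<br />We fell back on a crude approach: exporting one of the tables to a test database for testing. During the export the system unexpectedly reported an error:</p>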
<p>EXP-00056: ORACLE error 6552 encountered<br />ORA-06552: PL/SQL: Compilation unit analysis terminated<br />ORA-06553: PLS-553: character set name is not recognized</p>
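<p>The system was reporting a character set error!<br />.....</p>
<p>Unbelievable! We immediately went through the alert log and found a message showing that, after coming up from the first crash, the system had changed the character set in the controlfile by itself. In the end we changed the character set back and the system returned to normal.</p>
</blockquote>
<p>The message in the alert log was:</p>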
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p>SMON: enabling tx recovery<br />Mon Jun 5 09:52:52 2006<br /><strong>Updating character set in controlfile to ZHS16CGB231280</strong><br />replication_dependency_tracking turned off (no async multimaster replication found)</p>
</blockquote>
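<p dir="ltr">In fact, this message appears when the database character set has been updated by hand and the instance is then restarted: at startup the database compares the character set recorded in the database with the one in the control file, and updates the control file to match the database.</p>
<p dir="ltr">I ran this test some time ago; see:</p>
<p dir="ltr"><a href="http://www.eygle.com/special/NLS_CHARACTER_SET_03.htm">http://www.eygle.com/special/NLS_CHARACTER_SET_03.htm</a></p>
<p dir="ltr">Changing the character set by updating props$ is very dangerous, as I explained in detail in the article above. In Oracle8i, if you set an invalid character set, the database will not start after a restart.</p>
<p dir="ltr">If a reminder is needed, here is the warning once more:</p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p dir="ltr"><strong>Never change the database character set by updating the system table (props$).</strong></p>
</blockquote>
<p dir="ltr">Starting with Oracle9i, however, Oracle skips this check at startup: the database still starts even with an invalid character set, and at startup it changes the character set in the control file to the default, US7ASCII.</p>
<p dir="ltr">The following test shows the details:</p>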
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>SQL&gt; select value$ from props$ where name='NLS_CHARACTERSET';</p>
            <p>VALUE$<br />----------------------------------------------------------<br />ZHS16GBK</p>
            <p>SQL&gt; update props$ set value$='EYGLE'<br />2 where name='NLS_CHARACTERSET';</p>
            <p>1 row updated.</p>
            <p>SQL&gt; commit;</p>
            <p>Commit complete.</p>
            <p>SQL&gt; select value$ from props$ where name='NLS_CHARACTERSET';</p>
            <p>VALUE$<br />-----------------------------------------<br />EYGLE</p>
            <p>SQL&gt; shutdown immediate;<br />Database closed.<br />Database dismounted.<br />ORACLE instance shut down.<br />SQL&gt; startup<br />ORACLE instance started.</p>
            <p>Total System Global Area 126948772 bytes<br />Fixed Size 452004 bytes<br />Variable Size 92274688 bytes<br />Database Buffers 33554432 bytes<br />Redo Buffers 667648 bytes<br />Database mounted.<br />Database opened.<br />SQL&gt; select value$ from props$ where name='NLS_CHARACTERSET';</p>
            <p>VALUE$<br />----------------------------------------------<br />EYGLE</p>
            </td>
        </tr>
    </tbody>
</table>
<p>The alert log then records the following:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">Thu Jun 8 16:28:05 2006<br />SMON: enabling cache recovery<br />SMON: enabling tx recovery<br />Thu Jun 8 16:28:05 2006<br />Updating character set in controlfile to US7ASCII<br />replication_dependency_tracking turned off (no async multimaster replication found)<br />Completed: ALTER DATABASE OPEN</td>
        </tr>
    </tbody>
</table>
<p>Oracle's behavior differs from version to version.</p>
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/06/dba_update_prop.html</link>
<guid>http://www.eygle.com/archives/2006/06/dba_update_prop.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Thu, 08 Jun 2006 15:31:11 +0800</pubDate>
</item>
<item>
<title>DBA Cautionary Tales: Truncate in Production vs. Test Environments</title>
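<description><![CDATA[<p>I keep seeing DBAs make the same or similar mistakes while learning or working. It occurred to me that if I collected these common mistakes and failures into a set of "Cautionary Tales", everyone could learn from them, and those who come later might make fewer of these mistakes, or none at all.</p>
<p>That is how the DBA Cautionary Tales came about.</p>
<p>Today I saw that a friend had written down the following case:</p>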
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
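<p><strong>I needed to load the data of two tables into the test database, and ended up running Truncate on the production database......</strong><br />Worse, it was the customer who noticed the problem first, not me; I thought the target was<br />the test database............ </p>
<p>Lessons:<br />1. Be cautious &amp; careful<br />Any operation that touches the production database calls for extra caution <br />2. The production and test databases had the same user/pw (which to some extent created the illusion)</p>
<p>ps: This incident was classified as a production accident. Severe. </p>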
</blockquote>
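<p dir="ltr">Cases like this are common: accidental Delete and Truncate operations caused by confusing the test and production environments happen all the time. Besides DBAs not being rigorous enough, the lack of procedural safeguards is also part of the problem.</p>
<p dir="ltr">This summary is a good one. The test and production databases should normally have different user passwords and different SIDs, and before any important operation you should first run select instance_name from v$instance to verify which instance you are connected to:</p>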
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>SQL&gt; select instance_name from v$instance;</p>
            <p>INSTANCE_NAME<br />----------------<br />eygle</p>
            </td>
        </tr>
    </tbody>
</table>
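<p>This is just like regularly running hostname on Unix/Linux hosts to confirm which machine you are connected to.</p>
<p>When logging in locally, we can also modify the local glogin.sql file so that it displays information such as the currently connected instance.</p>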
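<p>The same check can be built into scripts. A minimal sketch, assuming the script has already fetched instance_name from v$instance for its session; the function name require_instance and the instance name eygletest are hypothetical:</p>

```python
def require_instance(expected, actual):
    """Raise if the connected instance is not the one the operator intended.

    `actual` is assumed to be the value returned by
    select instance_name from v$instance for the current session.
    """
    if actual.strip().lower() != expected.strip().lower():
        raise RuntimeError(
            "connected to %r, expected %r; refusing to continue" % (actual, expected)
        )
    return True


# Hypothetical usage: abort before a truncate unless this really is the test instance.
require_instance("eygletest", "EYGLETEST")
try:
    require_instance("eygletest", "eygle")
except RuntimeError as err:
    print(err)
```

<p>A guard like this does not replace separate passwords and SIDs; it only makes the mistake louder.</p>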
<p>In short, before any operation that changes data we should be <a href="http://www.eygle.com/archives/2006/02/make_u_data_safety.html">cautious</a>. This is one of the basic requirements for a DBA.</p>
<p>Reference link:<br />Production accident <a href="http://www.itpub.net/533262.html">http://www.itpub.net/533262.html</a>&nbsp;&nbsp; </p>
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/04/dba_warning_truncate.html</link>
<guid>http://www.eygle.com/archives/2006/04/dba_warning_truncate.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Tue, 25 Apr 2006 10:31:43 +0800</pubDate>
</item>
<item>
<title>How to Resolve the ORA-00600 4194 Error</title>
<description><![CDATA[<p>Starting the database fails with an ORA-00600 4194 error. The main errors in the alert file are as follows:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sat Jan 21 13:55:21 2006<br />Errors in file /opt/oracle/admin/conner/bdump/conner_smon_17113.trc:<br />ORA-00600: internal error code, arguments: [4194], [43], [46], [], [], [], [], []<br />Sat Jan 21 13:55:21 2006<br />Errors in file /opt/oracle/admin/conner/udump/conner_ora_17121.trc:<br />ORA-00600: internal error code, arguments: [4194], [45], [44], [], [], [], [], []&nbsp;</pre>
            </td>
        </tr>
    </tbody>
</table>
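<p>A 4194 error usually indicates a problem with an UNDO segment. The best fix is to recover from a backup; if no backup exists, the database can be forced open with special initialization parameters. This article describes recovery using Oracle's <a href="http://www.eygle.com/archives/2005/02/ecinaoracleaeoi.html">hidden parameters</a> (situations vary, so take a backup before testing) and is for reference only.<br /></p>
<p>First determine the names of the current rollback segments, which can be obtained from the alert file:</p>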
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sat Jan 21 13:55:21 2006<br />Undo Segment 11 Onlined<br />Undo Segment 12 Onlined<br />Undo Segment 13 Onlined<br />Successfully onlined Undo Tablespace 16.&nbsp;</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>Under AUM (automatic undo management), the corresponding rollback segment names are:&nbsp;</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>'_SYSSMU11$','_SYSSMU12$','_SYSSMU13$'</pre>
            </td>
        </tr>
    </tbody>
</table>
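<p>A small sketch of that mapping, assuming the _SYSSMU&lt;n&gt;$ naming shown above (other Oracle releases may name automatic undo segments differently, so check the names in your own database). The function name undo_segment_names is hypothetical:</p>

```python
import re

ONLINED = re.compile(r"Undo Segment (\d+) Onlined")


def undo_segment_names(alert_text):
    """Map 'Undo Segment N Onlined' alert lines to names of the form _SYSSMUN$."""
    return ["_SYSSMU%s$" % n for n in ONLINED.findall(alert_text)]


alert = """Sat Jan 21 13:55:21 2006
Undo Segment 11 Onlined
Undo Segment 12 Onlined
Undo Segment 13 Onlined
Successfully onlined Undo Tablespace 16.
"""

names = undo_segment_names(alert)
# Quote each name and join with commas, in the same form as the list above.
print(",".join("'%s'" % n for n in names))
```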
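<p>&nbsp;Edit the init&lt;sid&gt;.ora parameter file and use the hidden parameter _corrupted_rollback_segments to mark these rollback segments as corrupted. When the database is then started, Oracle skips the operations on these rollback segments and forces the database open.</p>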
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>._corrupted_rollback_segments='_SYSSMU11$','_SYSSMU12$','_SYSSMU13$'</pre>
            </td>
        </tr>
    </tbody>
</table>]]></description>
<link>http://www.eygle.com/archives/2006/02/howto_resolve_ora_600_4194.html</link>
<guid>http://www.eygle.com/archives/2006/02/howto_resolve_ora_600_4194.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Mon, 13 Feb 2006 18:37:05 +0800</pubDate>
</item>
<item>
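<title>A Hard Year-End: Entering the High Season for Database Accidents</title>
<description><![CDATA[<p>From what I have observed over many days, as the year draws to a close, databases across industries in China have entered a period of frequent accidents. The major ones include:</p>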
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p><a href="http://www.itpub.net/481980.html" target="_blank"><font color="#0082ff">Tablespace dropped by mistake</font></a><br /><a href="http://www.itpub.net/481771.html" target="_blank"><font color="#0082ff">System corruption</font></a><br /><a href="http://www.itpub.net/482506.html" target="_blank"><font color="#0082ff">Another System corruption</font></a></p>
</blockquote>
<p dir="ltr">On top of that, today dcba reported another major accident, this one involving redo corruption:</p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p dir="ltr"><a href="http://www.anysql.net/2006/01/open_drop_onlinelog.html">Can the database come back up after all online logs are deleted while it is open?</a></p>
</blockquote>
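<p dir="ltr">I do not want to comment on the technical details here; I only want to share my view of these problems. When I interview candidates for my <a href="http://www.eygle.com/archives/2005/11/find_another_dba.html">DBA</a> positions, I often say:</p>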
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
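<p dir="ltr">Here you will not run into very complicated recovery situations. A well-managed database can always recover calmly from an accident or disaster, so between sophisticated technique and a rigorous attitude, we need your rigor more.</p>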
</blockquote>
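<p dir="ltr">This is also one of my <a href="http://www.eygle.com/archives/2005/12/what_kind_of_dba_we_need.html">basic requirements</a> for a DBA. If you are rigorous enough, you may never have to face any of the situations above, and in the worst case you still have a valid backup to restore from. In the <a href="http://www.eygle.com/archives/2005/08/itpub_dbaaeanoo.html">DBA course</a> I teach, the first of the four DBA rules I have mentioned is:</p>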
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
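<p dir="ltr"><font color="#ff0000"><strong>Backup matters above all else<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Systems will always crash eventually; without a valid backup you are just waiting for the day you die!</strong></font></p>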
</blockquote>
<p dir="ltr">Sadly, most people are running database systems with no backup at all, and that is the DBA's tragedy.</p>]]></description>
<link>http://www.eygle.com/archives/2006/01/backup_is_most_important.html</link>
<guid>http://www.eygle.com/archives/2006/01/backup_is_most_important.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Sun, 22 Jan 2006 14:48:02 +0800</pubDate>
</item>
<item>
<title>Oracle HowTo: How to deal with Ora-600 4193 error</title>
<description><![CDATA[<p>After resolving a <a href="http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html">2662</a> error, an ORA-00600 4193 error often follows. The errors seen in the alert file typically look like this:</p>
<blockquote>Fri Dec 16 22:37:27 2005<br />Errors in file /opt/oracle/admin/conner/bdump/conner_smon_22817.trc:<br />ORA-00604: error occurred at recursive SQL level 1<br />ORA-00607: Internal error occurred while making a change to a data block<br />ORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []<br />Fri Dec 16 23:28:40 2005<br />Errors in file /opt/oracle/admin/conner/bdump/conner_smon_22817.trc:<br />ORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []</blockquote>
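<p>When scanning a long alert file, it helps to pull out the bracketed arguments, the first of which identifies the internal error (4193 here). A minimal sketch; the function name ora600_arguments is hypothetical and the pattern only covers the line format shown above:</p>

```python
import re

ORA600 = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def ora600_arguments(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line, or None."""
    match = ORA600.search(line)
    if match is None:
        return None
    # Each argument is wrapped in [...]; trailing empty slots are dropped.
    return [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(1)) if arg]


line = "ORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []"
print(ora600_arguments(line))
```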

<p>A 4193 error is usually caused by redo and undo being inconsistent during recovery.</p>
<p>Oracle's explanation is as follows:</p>
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p><strong>While backing out an undo record (i.e. at the time of rollback) we found a transaction id mis-match indicating either a corruption in the rollback segment or corruption in an object which the rollback segment is trying to apply undo records on.</strong></p>
<p><strong><em>This would indicate a corrupted rollback segment</em>.</strong></p>
</blockquote>
<p>Checking the trace file itself shows an error similar to the following:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>*** 2005-12-16 20:54:53.496<br />ksedmp: internal or fatal error<br />ORA-00600: internal error code, arguments: [4193], [1171], [1187], [], [], [], [], []<br />Current SQL statement for this session:<br />UPDATE SMON_SCN_TIME SET SCN_WRP=:1, SCN_BAS=:2, TIME_MP=:3, TIME_DP=:4 <br />WHERE TIME_MP = :5&nbsp; AND&nbsp;&nbsp; THREAD = :6&nbsp; AND&nbsp;&nbsp; ROWNUM &lt;= 1</pre>
            </td>
        </tr>
    </tbody>
</table>]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_howto_deal_with_ora600_4137_error.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_howto_deal_with_ora600_4137_error.html</guid>
<category>HowTo</category>
<pubDate>Fri, 30 Dec 2005 14:25:14 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics:How to deal with ORA-600 2662 Error</title>
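<description><![CDATA[<p>In the article <a href="http://www.eygle.com/archives/2005/10/ora00600_2262ii.html">Resolving the ORA-00600 2662 Error</a> I mentioned that after opening a database with resetlogs using the hidden parameter <a href="http://www.eygle.com/archives/2005/10/oracle_hidden_allow_resetlogs_corruption.html">_ALLOW_RESETLOGS_CORRUPTION</a>, we often run into ORA-00600 2662 because the SCNs are inconsistent. Here is a complete example and how it was resolved.</p>
<p>Of course, reproducing a 2662 error takes some skill, and this article does not cover that.</p>
<p>When the database is started normally, the ORA-00600 2662 error shows up in the alert file.</p>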
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sun Dec 11 18:02:25 2005<br />Errors in file /opt/oracle/admin/conner/udump/conner_ora_13349.trc:<br /><strong>ORA-00600: internal error code, arguments: [2662], [0], [547743994], [0], [898092653], [8388617], [], []<br /></strong>Sun Dec 11 18:02:27 2005<br />Errors in file /opt/oracle/admin/conner/udump/conner_ora_13349.trc:<br /><strong>ORA-00600: internal error code, arguments: [2662], [0], [547743994], [0], [898092653], [8388617], [], []<br /></strong>Sun Dec 11 18:02:27 2005<br />Error 600 happened during db open, shutting down database<br />USER: terminating instance due to error 600</pre>
            </td>
        </tr>
    </tbody>
</table>
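<p>At this point we can adjust the SCN with an Oracle <a href="http://www.eygle.com/internal/Oracle.Diagnostics.Events.list.htm">internal event</a>:</p>
<p>There are two common ways to advance the SCN:</p>
<p>1. With the immediate trace name method (while the database is open)</p>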
<p><strong><em>alter session set events 'IMMEDIATE trace name ADJUST_SCN level x';</em></strong></p>
<p>2. With event 10015 (when the database cannot be opened, in mount state)</p>
<p><font face="Courier"><strong><em>alter session set events '10015 trace name adjust_scn level x';</em></strong></font></p>
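<p><font face="Courier">Note: level 1 advances the SCN by 1 billion (1024*1024*1024). Level 1 is usually enough, and the level can be adjusted as the situation requires.</font></p>
<p><font face="Courier">In this example the database cannot be opened, so only the second method can be used.</font></p>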
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>[oracle@jumper dbs]$ sqlplus &quot;/ as sysdba&quot;</pre>
            <pre>SQL*Plus: Release 9.2.0.4.0 - Production on Sun Dec 11 18:26:18 2005</pre>
            <pre>Copyright (c) 1982, 2002, Oracle Corporation.&nbsp; All rights reserved.</pre>
            <pre>Connected to an idle instance.</pre>
            <pre>SQL&gt; startup mount pfile=initconner.ora<br />ORACLE instance started.</pre>
            <pre>Total System Global Area&nbsp;&nbsp; 97588504 bytes<br />Fixed Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 451864 bytes<br />Variable Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 33554432 bytes<br />Database Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 62914560 bytes<br />Redo Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 667648 bytes<br />Database mounted.<br /></pre>
            <pre>SQL&gt; alter session set events '10015 trace name adjust_scn level 10';</pre>
            <pre>Session altered.</pre>
            <pre>SQL&gt; alter database open;</pre>
            <pre>Database altered.</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>Note that because I used event 10015 at level 10, the SCN was advanced by 10 billion, which we can verify shortly.&nbsp;</p>
<p>The database can now be opened, and the alert file shows the following message:</p>
<p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sun Dec 11 18:27:04 2005<br />SMON: enabling cache recovery<br />Sun Dec 11 18:27:05 2005<br />Debugging event used to advance scn to <strong>10737418240</strong></pre>
            </td>
        </tr>
    </tbody>
</table>
</p>
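<p>The figure in the alert log can be reproduced with simple arithmetic. A minimal sketch, assuming the rule stated earlier that each level corresponds to 1024*1024*1024; the helper name scn_for_level is hypothetical:</p>

```python
SCN_PER_LEVEL = 1024 * 1024 * 1024  # one level = 2**30, per the note in the text


def scn_for_level(level):
    """SCN value implied by an adjust_scn level, using the rule stated in the text."""
    return level * SCN_PER_LEVEL


print(scn_for_level(10))
```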
<p>The SCN was advanced by 10 billion, that is, 10 * (1024*1024*1024) = <strong>10737418240</strong>, exactly the value recorded in the log.</p>]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Tue, 20 Dec 2005 19:52:36 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics:How to deal with ORA-19815</title>
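<description><![CDATA[<p>Just as I was leaving work, I unexpectedly ran into the <a href="http://www.eygle.com/archives/2005/03/oracle10gecieif.html">ORA-19815</a> error again. This 10g database has been growing rapidly of late, generating about 5 to 6 GB of archived logs per day:</p>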
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <div>
            <pre><strong>ORA-19815</strong>: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 85.00% used,
            and has 8052259328 remaining bytes available.
            *************************************************************
            You have the following choices to free up space from<br />flash recovery area:<br />1. Consider changing your RMAN retention policy.<br />&nbsp;&nbsp; If you are using dataguard, then consider changing your<br />&nbsp;&nbsp; RMAN archivelog deletion policy.<br />2. Backup files to tertiary device such as tape using the<br />&nbsp;&nbsp; RMAN command BACKUP RECOVERY AREA.<br />3. Add disk space and increase the db_recovery_file_dest_size<br />&nbsp;&nbsp; parameter to reflect the new space.<br />4. Delete unncessary files using the RMAN DELETE command.<br />&nbsp;&nbsp; If an OS command was used to delete files, then use<br />&nbsp;&nbsp; RMAN CROSSCHECK and DELETE EXPIRED commands.</pre>
            </div>
            </td>
        </tr>
    </tbody>
</table>
<br />
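<p>To put the numbers in the warning into perspective, here is a minimal sketch that parses the two-line message quoted above and estimates how long the remaining space would last at roughly 6 GB of archived logs per day (the upper figure mentioned above). The function name recovery_area_status is hypothetical, and the pattern only covers the message format shown here:</p>

```python
import re

WARNING = re.compile(
    r"db_recovery_file_dest_size of (\d+) bytes is ([\d.]+)% used,\s+"
    r"and has (\d+) remaining bytes available"
)
GIB = 1024 ** 3


def recovery_area_status(message):
    """Return (size_gib, used_percent, remaining_gib) parsed from an ORA-19815 warning."""
    match = WARNING.search(message)
    if match is None:
        raise ValueError("not an ORA-19815 warning")
    size, used, remaining = match.groups()
    return int(size) / GIB, float(used), int(remaining) / GIB


msg = """ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 85.00% used,
and has 8052259328 remaining bytes available."""

size_gib, used_pct, remaining_gib = recovery_area_status(msg)
# Rough estimate only: assumes a steady 6 GB per day and no files being purged.
days_left = remaining_gib / 6
print(size_gib, used_pct, round(remaining_gib, 2), round(days_left, 2))
```

<p>With only a day or so of headroom at that rate, the warning needs attention right away.</p>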
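<p>db_recovery_file_dest_size is set to 50G, which is no longer enough under the current backup policy. For now the only option was to enlarge the recovery area temporarily:</p>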
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre> SQL&gt; show parameter recov</pre>
            <pre>NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TYPE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; VALUE<br />------------------------------------ ----------- ------------------------------<br />db_recovery_file_dest&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /msflsh<br />db_recovery_file_dest_size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; big integer 50G<br />recovery_parallelism&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; integer&nbsp;&nbsp;&nbsp;&nbsp; 0<br />SQL&gt; alter system set db_recovery_file_dest_size=65G scope=both;</pre>
            <pre>System altered.</pre>
            <pre>SQL&gt; show parameter recov</pre>
            <pre>NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TYPE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; VALUE<br />------------------------------------ ----------- ------------------------------<br />db_recovery_file_dest&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /msflsh<br />db_recovery_file_dest_size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; big integer 65G<br />recovery_parallelism&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; integer&nbsp;&nbsp;&nbsp;&nbsp; 0</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>Then adjust the redundancy policy to free some disk space:&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_diagnost_howto_deal_ora-19815.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_diagnost_howto_deal_ora-19815.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Mon, 05 Dec 2005 21:37:38 +0800</pubDate>
</item>
<item>
<title>Consistency of Datafile SCNs</title>
<description><![CDATA[Answers to a few questions from the message board:<br>

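<strong>1. While the database is running normally, are the SCNs of all datafiles the same? <br>
2. After a datafile is taken offline and then brought back online, does its SCN move forward? If so, how is the SCN it moves to determined?</strong><br>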

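1. While the database is running normally, the SCNs of all datafiles are not necessarily the same.<br>
The catch is the word "all": for example, when a tablespace is taken offline, the SCN of its datafiles is frozen, and taking a tablespace's datafiles offline/online also triggers a file checkpoint, so an individual datafile's SCN becomes inconsistent with the other files in the database.<br>
<br>
2. When the tablespace is brought online, Oracle obtains the current SCN, unfreezes the SCN of the offline files, and synchronizes it with the current SCN.<br>
A simple experiment shows these changes clearly:<br>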

<table><td width="500" bgcolor="#999999"> <pre>
SQL> set echo on
SQL> @a
SQL> alter system checkpoint;

System altered.

SQL> select file#,checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          546198149
         2          546198149
         3          <strong>546198149</strong>

SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
               <strong>546198149</strong>

SQL> alter tablespace users offline;

Tablespace altered.

SQL> select file#,checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          546198149
         2          546198149
         3          <strong>546198153</strong>

SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
               546198159

SQL> alter tablespace users online;

Tablespace altered.

SQL> select file#,checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1          546198149
         2          546198149
         3          <strong>546198162</strong>

SQL> 
SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
               <strong>546198178</strong>


</pre></td></table><br>
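The comparison made by eye in the experiment above can be sketched in a few lines. This is only an illustration that works on values copied from the v$datafile output; the function name files_out_of_step is hypothetical:<br>

```python
def files_out_of_step(checkpoints):
    """Return {file#: scn} for datafiles whose checkpoint SCN differs from the most common one."""
    values = list(checkpoints.values())
    baseline = max(set(values), key=values.count)
    return {f: scn for f, scn in checkpoints.items() if scn != baseline}


# file# -> checkpoint_change#, copied from the query run after
# "alter tablespace users offline" in the session above.
after_offline = {1: 546198149, 2: 546198149, 3: 546198153}
print(files_out_of_step(after_offline))
```

<br>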

<strong>A plain offline datafile does not trigger a file checkpoint; only offline tablespace does. This is why online datafile requires media recovery while online tablespace does not.</strong>

]]></description>
<link>http://www.eygle.com/archives/2005/08/eiaescnaeooeaoi.html</link>
<guid>http://www.eygle.com/archives/2005/08/eiaescnaeooeaoi.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Sat, 06 Aug 2005 23:28:47 +0800</pubDate>
</item>


</channel>
</rss>