<?xml version="1.0" encoding="GB2312"?>
<rss version="2.0">
<channel>
<title>Friends Life and Oracle</title>
<link>http://www.eygle.com/blog/</link>
<description>eygle's Oracle blog: Oracle technical research and in-depth discussion, plus notes on personal interests and everyday life.</description>
<copyright>Copyright 2006</copyright>
<lastBuildDate>Tue, 28 Nov 2006 10:32:19 +0800</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.33</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 

<item>
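<title>Handling a Disk Failure on a DataGuard Database Server</title>
<description><![CDATA[<p>Yesterday the database on one of our PC servers ran into trouble again, and <a href="http://www.eygle.com/archives/2006/10/start_dataguard_db.html">once again</a> it was a disk failure.</p>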

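<p>Both servers are low-end domestic PC servers from <a href="http://www.realserver.com.cn/">Lianzhi (联志)</a>, and their quality is really poor: first a disk on the standby machine failed, then another machine kept rebooting because of a faulty power supply module, and now a disk on this server has failed as well.</p>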

<blockquote>Nov 24 10:27:48 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:48 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:48 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011258880<BR>Nov 24 10:27:49 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:49 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011262976<BR>Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011267072<BR>Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device<BR>Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191<BR>Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir: <BR>&nbsp;directory #128110 contains a hole at offset 2011271168</blockquote>

<p>Fortunately, thanks to DataGuard the database can be switched over to the other machine with no data loss:<br />
<blockquote>Thu Nov 23 18:46:18 2006<BR>ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)<BR>ARC0: Begin FAL archive (thread 1 sequence 6047 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>ARC0: Complete FAL archive (thread 1 sequence 6047 destination bmarksb)<BR>ARC0: Begin FAL archive (thread 1 sequence 6048 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>Thu Nov 23 18:46:18 2006<BR>ARC1: Complete FAL archive (thread 1 sequence 6046 destination bmarksb)<BR>ARC1: Begin FAL archive (thread 1 sequence 6049 destination bmarksb)<BR>Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<BR>Thu Nov 23 18:46:18 2006<BR>ARC0: Complete FAL archive (thread 1 sequence 6048 destination bmarksb)<BR>Thu Nov 23 18:46:18 2006<BR>ARC1: Complete FAL archive (thread 1 sequence 6049 destination bmarksb)</blockquote></p>

<p>This time the failing server is the one hosting the primary database:<br />
<blockquote><P>SQL&gt; select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;</P><br />
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DBID NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PROTECTION_MODE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DATABASE_ROLE&nbsp;&nbsp;&nbsp; SWITCHOVER_STATUS<BR>---------- --------- -------------------- ---------------- ------------------<BR>3520694939 BMARK&nbsp;&nbsp;&nbsp;&nbsp; MAXIMUM PERFORMANCE&nbsp; PRIMARY&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SESSIONS ACTIVE</P></blockquote></p>

<p>The standby database is currently healthy:<br />
<blockquote><P>SQL&gt; select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;</P><br />
<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DBID NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PROTECTION_MODE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DATABASE_ROLE&nbsp;&nbsp;&nbsp; SWITCHOVER_STATUS<BR>---------- --------- -------------------- ---------------- ------------------<BR>3520694939 BMARK&nbsp;&nbsp;&nbsp;&nbsp; MAXIMUM PERFORMANCE&nbsp; PHYSICAL STANDBY SESSIONS ACTIVE</P></blockquote></p>

<p>All we need now is a short downtime window to perform the switchover.</p>

<p>The alert log from the switchover on the old primary:<br />
<blockquote>Fri Nov 24 11:30:43 2006<br />
alter database commit to switchover to physical standby with session shutdown<br />
Fri Nov 24 11:30:43 2006<br />
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY<br />
Fri Nov 24 11:30:43 2006<br />
SMON: disabling tx recovery<br />
Fri Nov 24 11:30:44 2006<br />
Active process 26743 user 'oracle' program 'oracle@wapcom1.hawa.cn (CJQ0)'<br />
Active process 9033 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 7655 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
...............<br />
Active process 8944 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 29104 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 30750 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
Active process 9045 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'<br />
CLOSE: waiting for server sessions to complete.<br />
Fri Nov 24 11:31:51 2006<br />
CLOSE: all sessions shutdown successfully.<br />
Fri Nov 24 11:32:09 2006<br />
SMON: disabling cache recovery<br />
Fri Nov 24 11:32:10 2006<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Fri Nov 24 11:32:10 2006<br />
ARCH shutting down<br />
Fri Nov 24 11:32:10 2006<br />
ARCH shutting down<br />
Fri Nov 24 11:32:10 2006<br />
ARC0: Archival stopped<br />
Fri Nov 24 11:32:10 2006<br />
ARC1: Archival stopped<br />
Fri Nov 24 11:32:10 2006<br />
Thread 1 closed at log sequence 6076<br />
Successful close of redo thread 1<br />
Fri Nov 24 11:32:28 2006<br />
ARCH: noswitch archival of thread 1, sequence 6076<br />
ARCH: End-Of-Redo archival of thread 1 sequence 6076<br />
ARCH: Evaluating archive   log 3 thread 1 sequence 6076<br />
ARCH: Beginning to archive log 3 thread 1 sequence 6076<br />
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'<br />
Creating archive destination LOG_ARCHIVE_DEST_1: '/var/oradata/arch/1_6076.arc'<br />
ARCH: Completed archiving  log 3 thread 1 sequence 6076<br />
ARCH: archiving is disabled due to current logfile archival<br />
Clearing standby activation ID 3520937155 (0xd1dd3cc3)<br />
The primary database controlfile was created using the<br />
'MAXLOGFILES 5' clause.<br />
The resulting standby controlfile will not have enough<br />
available logfile entries to support an adequate number<br />
of standby redo logfiles. Consider re-creating the<br />
primary controlfile using 'MAXLOGFILES 8' (or larger).<br />
Use the following SQL commands on the standby database to create<br />
standby redo logfiles that match the primary database:<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 10485760;<br />
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 10485760;<br />
Archivelog for thread 1 sequence 6076 required for standby recovery<br />
MRP0 started with pid=8<br />
MRP0: Background Managed Standby Recovery process started<br />
Media Recovery Log /var/oradata/arch/1_6076.arc<br />
Identified end-of-REDO for thread 1 sequence 6076<br />
Identified end-of-REDO for thread 1 sequence 6076<br />
Media Recovery End-Of-Redo indicator encountered<br />
Media Recovery Applied until change 194025715<br />
MRP0: Media Recovery Complete: End-Of-REDO<br />
Resetting standby activation ID 3520937155 (0xd1dd3cc3)<br />
MRP0: Background Media Recovery process shutdown<br />
Fri Nov 24 11:32:35 2006<br />
Switchover: Complete - Database shutdown required<br />
Completed: alter database commit to switchover to physical st<br />
Fri Nov 24 11:32:53 2006<br />
Shutting down instance: further logons disabled<br />
Shutting down instance (immediate)<br />
License high water mark = 140<br />
Fri Nov 24 11:32:53 2006<br />
ALTER DATABASE CLOSE NORMAL<br />
ORA-1507 signalled during: ALTER DATABASE CLOSE NORMAL...<br />
ARCH: Archiving is disabled<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Archive process shutdown avoided: 0 active<br />
ARCH: Archiving is disabled<br />
Shutting down archive processes<br />
Archiving is disabled<br />
Archive process shutdown avoided: 0 active<br />
Fri Nov 24 11:33:14 2006<br />
Starting ORACLE instance (normal)<br />
LICENSE_MAX_SESSION = 0<br />
LICENSE_SESSIONS_WARNING = 0<br />
SCN scheme 2<br />
Using log_archive_dest parameter default value<br />
LICENSE_MAX_USERS = 0<br />
SYS auditing is disabled<br />
Starting up ORACLE RDBMS Version: 9.2.0.6.0.<br />
System parameters with non-default values:<br />
  processes                = 150<br />
  timed_statistics         = TRUE<br />
  shared_pool_size         = 83886080<br />
  large_pool_size          = 33554432<br />
  standby_archive_dest     = /var/oradata/arch<br />
  fal_server               = bmarksb<br />
  fal_client               = bmark<br />
  log_archive_format       = %t_%s.arc<br />
...........<br />
CJQ0 started with pid=8<br />
Fri Nov 24 11:33:15 2006<br />
ARCH: STARTING ARCH PROCESSES<br />
ARC0 started with pid=9<br />
ARC0: Archival started<br />
ARC1 started with pid=10<br />
Fri Nov 24 11:33:15 2006<br />
ARCH: STARTING ARCH PROCESSES COMPLETE<br />
Fri Nov 24 11:33:15 2006<br />
ARC1: Archival started<br />
Fri Nov 24 11:33:15 2006<br />
ARC0: Thread not mounted<br />
Fri Nov 24 11:33:15 2006<br />
ARC1: Thread not mounted<br />
Fri Nov 24 11:33:22 2006<br />
alter database mount standby database<br />
Fri Nov 24 11:33:26 2006<br />
Successful mount of redo thread 1, with mount id 3559140162<br />
Fri Nov 24 11:33:26 2006<br />
Standby Database mounted.<br />
Completed: alter database mount standby database<br />
Fri Nov 24 11:33:29 2006<br />
ALTER DATABASE RECOVER  managed standby database disconnect  <br />
Attempt to start background Managed Standby Recovery process<br />
MRP0 started with pid=12<br />
MRP0: Background Managed Standby Recovery process started<br />
Fri Nov 24 11:33:34 2006<br />
Completed: ALTER DATABASE RECOVER  managed standby database d<br />
Fri Nov 24 11:33:34 2006<br />
Media Recovery Waiting for thread 1 seq# 6077<br />
Media Recovery Log /var/oradata/arch/1_6077.arc<br />
Media Recovery Waiting for thread 1 seq# 6078<br />
Media Recovery Log /var/oradata/arch/1_6078.arc<br />
Media Recovery Waiting for thread 1 seq# 6079</blockquote></p>
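<p>For a rough idea of how much downtime the switchover cost, the timestamps in the log above can simply be subtracted. The following is only a minimal Python sketch, not part of the original procedure: the helper name elapsed_seconds is made up for illustration, and it assumes the alert-log timestamps use English day and month names.</p>

```python
from datetime import datetime

# Alert-log timestamp layout, e.g. "Fri Nov 24 11:30:43 2006"
ALERT_TS = "%a %b %d %H:%M:%S %Y"

def elapsed_seconds(start, end):
    """Seconds between two alert-log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, ALERT_TS) - datetime.strptime(start, ALERT_TS)
    return int(delta.total_seconds())

# Switchover command issued -> "Switchover: Complete"
switchover = elapsed_seconds("Fri Nov 24 11:30:43 2006", "Fri Nov 24 11:32:35 2006")
# Switchover command issued -> managed recovery started on the new standby
recovery = elapsed_seconds("Fri Nov 24 11:30:43 2006", "Fri Nov 24 11:33:34 2006")
print(switchover, recovery)  # 112 171
```

<p>By this measure the switchover command itself took 112 seconds, and managed recovery was running again on the old primary 171 seconds after the command was issued.</p>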

<p>It looks like we should not buy any more Lianzhi servers.</p>

<p>-The End-</p>]]></description>
<link>http://www.eygle.com/archives/2006/11/aisino_server_dataguard.html</link>
<guid>http://www.eygle.com/archives/2006/11/aisino_server_dataguard.html</guid>
<category>Advanced</category>
<pubDate>Tue, 28 Nov 2006 10:32:19 +0800</pubDate>
</item>
<item>
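<title>A Case of Redo Corruption Caused by a Disk I/O Failure</title>
<description><![CDATA[<p>A few days ago the disk under one of our databases developed a <a href="http://www.eygle.com/archives/2006/10/system_tbs_io_corruption.html">problem</a>. After the disk was reformatted everything returned to normal, but today the same disk failed again.</p>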

<p>This time the damage hit the redo log, and the database alert log reports redo-related errors:<br />
<blockquote>Mon Nov 13 11:42:54 2006<br />
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:<br />
ORA-00333: redo log read error block 186498 count 6144<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186497<br />
Mon Nov 13 11:42:58 2006<br />
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:<br />
ORA-00333: redo log read error block 184450 count 8192<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186498<br />
Mon Nov 13 11:43:03 2006<br />
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:<br />
ORA-00333: redo log read error block 184450 count 8192<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186498</blockquote></p>

<p>The corresponding trace file records similar errors:<br />
<blockquote><br />
[oracle@gdmstest bdump]$ cat /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc<br />
/opt/oracle/admin/mydb/udump/mydb_ora_16682.trc<br />
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production<br />
With the Partitioning option<br />
JServer Release 9.2.0.4.0 - Production<br />
ORACLE_HOME = /opt/oracle/product/9.2.0<br />
System name:    Linux<br />
Node name:      gdmstest.hurray.com.cn<br />
Release:        2.4.21-15.EL<br />
Version:        #1 Thu Apr 22 00:27:41 EDT 2004<br />
Machine:        i686<br />
Instance name: mydb<br />
Redo thread mounted by this instance: 1<br />
Oracle process number: 11<br />
Unix process pid: 16682, image: oracle@gdmstest.hurray.com.cn (TNS V1-V3)</p>

<p>*** SESSION ID:(9.3) 2006-11-13 11:41:23.555<br />
Thread checkpoint rba:0x00001d.00000002.0010 scn:0x0000.000f94cd<br />
On-disk rba:0x00001d.0002dc60.0000 scn:0x0000.000f9b4e<br />
Use incremental checkpoint cache-low RBA<br />
Thread 1 recovery from rba:0x00001d.00029082.0000 scn:0x0000.00000000<br />
*** 2006-11-13 11:42:54.830<br />
ORA-00333: redo log read error block 186498 count 6144<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186497<br />
ORA-00333: redo log read error block 184450 count 8192<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186498<br />
ORA-00333: redo log read error block 184450 count 8192<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186498<br />
ORA-00333: redo log read error block 184450 count 8192<br />
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 2: No such file or directory<br />
Additional information: 186498</blockquote></p>

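<p>Checking the system messages shows that the failing sector is the same one as <a href="http://www.eygle.com/archives/2006/10/system_tbs_io_corruption.html">last time</a> (sector=14266880). This really does look like physical damage, so the only option is to replace the disk:</p>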

<blockquote>
[oracle@gdmstest bdump]$ dmesg<BR>or=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880<BR>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<BR>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880<BR>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<BR>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880<BR>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<BR>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880<BR>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<BR>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880<BR>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<BR>hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<BR>end_request: I/O error, dev 03:06 (hda), sector 14266880</blockquote>
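<p>To confirm that every error points at the same sector, the dmesg output can be tallied instead of read by eye. This is a minimal Python sketch, assuming Linux 2.4-style end_request lines like the ones above; the function name failing_sectors and the regular expression are illustrative only, not part of any existing tool.</p>

```python
import re
from collections import Counter

# Matches lines such as:
#   end_request: I/O error, dev 03:06 (hda), sector 14266880
SECTOR_RE = re.compile(r"end_request: I/O error, dev \S+ \((\w+)\), sector (\d+)")

def failing_sectors(dmesg_text):
    """Count I/O errors per (device name, sector) pair (hypothetical helper)."""
    return Counter((m.group(1), int(m.group(2))) for m in SECTOR_RE.finditer(dmesg_text))

sample = """\
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
"""

for (device, sector), hits in failing_sectors(sample).items():
    print(device, sector, hits)  # hda 14266880 2
```

<p>A single (device, sector) pair with many hits, repeated across incidents, is consistent with one physically damaged spot rather than scattered logical corruption.</p>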

<p>-The End-</p>]]></description>
<link>http://www.eygle.com/archives/2006/11/io_fault_redo_corruption.html</link>
<guid>http://www.eygle.com/archives/2006/11/io_fault_redo_corruption.html</guid>
<category>Case</category>
<pubDate>Mon, 13 Nov 2006 14:51:24 +0800</pubDate>
</item>
<item>
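<title>How to Change the Listener Log File Name</title>
<description><![CDATA[<p>Today the listener log of one of our databases ran into a problem, which I solved by using the set log_file command to point the listener at a new log file.</p>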

<p>The following two commands turned out to be very useful:</p>

<blockquote>LSNRCTL&gt; set current_listener &lt;listener name&gt; <BR>LSNRCTL&gt; set log_file &lt;sid name&gt;.log <BR></blockquote>

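<p>set current_listener lets you work with a non-default listener. Once set log_file has switched the listener to a new file name, the old problematic log file can be cleared out, or the log can be relocated:<br />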
<blockquote><P>[oracle@jumper admin]$ lsnrctl </P><br />
<P>LSNRCTL for Linux: Version 9.2.0.4.0 - Production on 10-NOV-2006 16:54:16</P><br />
<P>Copyright (c) 1991, 2002, Oracle Corporation.&nbsp; All rights reserved.</P><br />
<P>Welcome to LSNRCTL, type "help" for information.</P><br />
<P>LSNRCTL&gt; set current_listener LISTENER1<BR>Current Listener is LISTENER1<BR>LSNRCTL&gt; set log_file<BR>Parameter Value: a.log<BR>Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))<BR>LISTENER1 parameter "log_file" set to a.log<BR>The command completed successfully<BR>LSNRCTL&gt; status<BR>Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))<BR>STATUS of the LISTENER<BR>------------------------<BR>Alias&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LISTENER1<BR>Version&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TNSLSNR for Linux: Version 9.2.0.4.0 - Production<BR>Start Date&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 10-NOV-2006 16:54:12<BR>Uptime&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 days 0 hr. 2 min. 
6 sec<BR>Trace Level&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; off<BR>Security&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; OFF<BR>SNMP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; OFF<BR>Listener Parameter File&nbsp;&nbsp; /opt/oracle/product/9.2.0/network/admin/listener.ora<BR>Listener Log File&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /opt/oracle/product/9.2.0/network/log/a.log<BR>Listening Endpoints Summary...<BR>&nbsp; (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))<BR>&nbsp; (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.33.11)(PORT=1521)))<BR>Services Summary...<BR>Service "PLSExtProc" has 1 instance(s).<BR>&nbsp; Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...<BR>Service "conner" has 1 instance(s).<BR>&nbsp; Instance "conner", status UNKNOWN, has 1 handler(s) for this service...<BR>Service "eygle" has 1 instance(s).<BR>&nbsp; Instance "eygle", status UNKNOWN, has 1 handler(s) for this service...<BR>The command completed successfully<BR>LSNRCTL&gt; </P></blockquote></p>

<p>To make the change permanent, save it with the save_config command:<br />
<blockquote>LSNRCTL> save_config<br />
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))<br />
Saved LISTENER1 configuration parameters.<br />
Listener Parameter File   /opt/oracle/product/9.2.0/network/admin/listener.ora<br />
Old Parameter File   /opt/oracle/product/9.2.0/network/admin/listener.bak<br />
The command completed successfully</blockquote></p>

<p>The following entry is then appended to listener.ora:<br />
<blockquote>[oracle@jumper oracle]$ tail -5 /opt/oracle/product/9.2.0/network/admin/listener.ora</p>

<p><br />
#----ADDED BY TNSLSNR 14-NOV-2006 16:39:12---<br />
LOG_FILE_LISTENER1 = a.log<br />
#--------------------------------------------</blockquote></p>

<p>Just a note for the record.</p>

<p>-The End-<br />
</p>]]></description>
<link>http://www.eygle.com/archives/2006/11/lsnrctl_set_logfile.html</link>
<guid>http://www.eygle.com/archives/2006/11/lsnrctl_set_logfile.html</guid>
<category>Case</category>
<pubDate>Fri, 10 Nov 2006 17:33:35 +0800</pubDate>
</item>
<item>
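<title>A Database Outage Caused by Disk I/O Errors</title>
<description><![CDATA[<p>Only <a href="http://www.eygle.com/archives/2006/10/start_dataguard_db.html">this Monday</a> I mentioned that hardware failures had been frequent lately, and yesterday yet another database ran into trouble.</p>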

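<p>It was a hardware failure again: the disk holding the database software and data files developed a fault, which brought the database down.<br />
Logging in to the database server shows the following:<br />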
<blockquote>$ df -k<br />
Filesystem            kbytes    used   avail capacity  Mounted on<br />
/dev/dsk/c0t10d0s0    494235   95149  349663    22%    /<br />
/dev/dsk/c0t10d0s6   4384710 2160661 2180202    50%    /usr<br />
/proc                      0       0       0     0%    /proc<br />
mnttab                     0       0       0     0%    /etc/mnttab<br />
fd                         0       0       0     0%    /dev/fd<br />
/dev/dsk/c0t10d0s1   1018191  586987  370113    62%    /var<br />
swap                 3703192      96 3703096     1%    /var/run<br />
swap                 4133440  430344 3703096    11%    /tmp<br />
/dev/dsk/c4t1d0s0    120514012 100868307 18440565    85%    /data1<br />
/dev/dsk/c0t10d0s5   8261393 2365474 5813306    29%    /opt<br />
/dev/dsk/c0t11d0s2   17348866   17229 17158149     1%    /backup<br />
/dev/dsk/c0t10d0s4    586515   21157  506707     5%    /export/home<br />
$ cd /data1<br />
$ ls<br />
<strong>.: I/O error</strong></blockquote></p>

<p>The database mount point /data1 is no longer accessible. An I/O error message like this usually means the disk has a problem.</p>

<p>At this point we can use the system command dmesg to review the system messages.<br />
<strong>dmesg - collect system diagnostic messages to form error log</strong></p>

<p>The main errors reported by dmesg are:<br />
<blockquote>$ dmesg</p>

<p>Nov  1 23:58:10 stat socal: [ID 403145 kern.info] ID[SUNWssa.socal.link.5010] socal1: port 1: Fibre Channel is OFFLINE<br />
Nov  1 23:58:56 stat scsi: [ID 243001 kern.warning] WARNING: /sbus@3,0/SUNW,socal@0,0/sf@1,0 (sf3):<br />
Nov  1 23:58:56 stat      Offline Timeout<br />
Nov  1 23:58:56 stat scsi: [ID 243001 kern.info] /sbus@3,0/SUNW,socal@0,0/sf@1,0 (sf3):<br />
Nov  1 23:58:56 stat      target 0x1 al_pa 0xe8 lun 0 offlined<br />
Nov  1 23:58:56 stat scsi: [ID 107833 kern.warning] WARNING: /sbus@3,0/SUNW,socal@0,0/sf@1,0/ssd@w50020f2300009321,0 (ssd0):<br />
Nov  1 23:58:56 stat      SCSI transport failed: reason 'reset': retrying command<br />
Nov  1 23:58:56 stat scsi: [ID 107833 kern.warning] WARNING: /sbus@3,0/SUNW,socal@0,0/sf@1,0/ssd@w50020f2300009321,0 (ssd0):<br />
Nov  1 23:58:56 stat      transport rejected fatal error<br />
Nov  1 23:58:56 stat ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll<br />
Nov  1 23:58:56 stat ufs: [ID 127457 kern.warning] WARNING: ufs log for /data1 changed state to Error<br />
Nov  1 23:58:56 stat ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /data1 and run fsck(1M)</blockquote></p>

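<p>From this we can see that the I/O channel itself failed (the Fibre Channel link went OFFLINE), which ultimately caused the I/O operations to fail.</p>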

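<p>This is no longer a database-level problem. After restarting the host and the disk array and running a disk check, the system returned to normal.<br />
We got off lightly!</p>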

<p>-The End-</p>

]]></description>
<link>http://www.eygle.com/archives/2006/11/sunt3_io_fault.html</link>
<guid>http://www.eygle.com/archives/2006/11/sunt3_io_fault.html</guid>
<category>Case</category>
<pubDate>Fri, 03 Nov 2006 16:13:37 +0800</pubDate>
</item>
<item>
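<title>How to Start a DataGuard Standby Database</title>
<description><![CDATA[<p>I got to the office early in the morning, opened my mailbox, and found a pile of alert e-mails: a standby database had gone down.</p>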

<p>Logging in to check the primary database, I found these errors recorded in its alert log:<br />
<blockquote>*** 2006-10-30 07:32:10.614<br />
kcrrfail: dest:2 err:12560 force:0<br />
ORA-12560: TNS:protocol adapter error<br />
*** 2006-10-30 07:34:10.615<br />
Error 12541 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'bmarksb'<br />
Error 12541 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'bmarksb'<br />
Heartbeat failed to connect to standby 'bmarksb'. Error is 12541.<br />
*** 2006-10-30 07:34:10.615<br />
kcrrfail: dest:2 err:12541 force:0<br />
ORA-12541: TNS:no listener<br />
*** 2006-10-30 07:36:10.615<br />
Error 12541 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'bmarksb'<br />
Error 12541 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'bmarksb'<br />
Heartbeat failed to connect to standby 'bmarksb'. Error is 12541.</blockquote></p>

<p>I immediately logged in to the standby host and started the standby database manually:<br />
<blockquote>[oracle@wapcom2 bdump]$ sqlplus "/ as sysdba"</p>

<p>SQL*Plus: Release 9.2.0.6.0 - Production on Mon Oct 30 08:17:24 2006</p>

<p>Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.</p>

<p>Connected to an idle instance.</p>

<p>SQL> startup nomount;<br />
ORACLE instance started.</p>

<p>Total System Global Area  470881780 bytes<br />
Fixed Size                   452084 bytes<br />
Variable Size             167772160 bytes<br />
Database Buffers          301989888 bytes<br />
Redo Buffers                 667648 bytes<br />
SQL> alter database mount standby database;</p>

<p>Database altered.</p>

<p>SQL> alter database recover managed standby database disconnect from session;</p>

<p>Database altered.</p>

<p>SQL> exit<br />
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production<br />
With the Partitioning, OLAP and Oracle Data Mining options<br />
JServer Release 9.2.0.6.0 - Production<br />
[oracle@wapcom2 bdump]$ lsnrctl start</blockquote></p>

<p>Watching the standby's alert log shows that the archived logs are applied automatically:<br />
<blockquote>[oracle@wapcom2 bdump]$ tail -f alert_bmark.log<br />
Standby Database mounted.<br />
Completed: alter database mount standby database<br />
Mon Oct 30 08:19:23 2006<br />
alter database recover managed standby database disconnect from session<br />
Attempt to start background Managed Standby Recovery process<br />
MRP0 started with pid=12<br />
MRP0: Background Managed Standby Recovery process started<br />
Media Recovery Waiting for thread 1 seq# 5151<br />
Mon Oct 30 08:19:29 2006<br />
Completed: alter database recover managed standby database di<br />
Mon Oct 30 08:22:58 2006<br />
Media Recovery Log /opt/oracle/oradata/bmark/stdarch/1_5151.arc<br />
Media Recovery Log /opt/oracle/oradata/bmark/stdarch/1_5152.arc<br />
Media Recovery Log /opt/oracle/oradata/bmark/stdarch/1_5153.arc<br />
Media Recovery Log /opt/oracle/oradata/bmark/stdarch/1_5154.arc<br />
Media Recovery Log /opt/oracle/oradata/bmark/stdarch/1_5155.arc<br />
Media Recovery Waiting for thread 1 seq# 5156</blockquote></p>

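<p>Looking further into the cause, it turned out that the host itself had a problem and had been rebooting repeatedly during the night:<br />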
<blockquote>-bash-2.05b$ last |grep reboot<br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 08:10          (02:14)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:51          (02:32)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:38          (02:45)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:35          (02:48)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:21          (03:02)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:18          (03:05)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 06:39          (03:44)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 06:37          (03:46)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 06:32          (03:51)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 06:03          (04:21)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 01:48          (08:36)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 01:23          (09:01)    <br />
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 00:39          (09:44)    </blockquote></p>
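<p>How closely the reboots follow each other can be read off the same output. The following is a small Python sketch, assuming every record falls on the same day and has exactly the column layout shown above; the function names boot_minutes and gaps are made up for this example.</p>

```python
def boot_minutes(last_output):
    """Boot times as minutes since midnight, sorted ascending (hypothetical helper)."""
    minutes = []
    for line in last_output.splitlines():
        fields = line.split()
        if fields[:1] != ["reboot"]:
            continue  # skip anything that is not a reboot record
        hh, mm = fields[7].split(":")  # eighth column is the boot time, e.g. "08:10"
        minutes.append(int(hh) * 60 + int(mm))
    return sorted(minutes)

def gaps(minutes):
    """Minutes between consecutive boots."""
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]

sample = """\
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 08:10          (02:14)
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:51          (02:32)
reboot   system boot  2.4.21-15.ELsmp  Mon Oct 30 07:38          (02:45)
"""
print(gaps(boot_minutes(sample)))  # [13, 19]
```

<p>Gaps of only a few minutes between boots, as in the full listing above, suggest a machine that cannot stay up rather than a one-off restart.</p>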

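<p>At first glance this looks like a hardware fault. Hardware failures have been extremely frequent <a href="http://www.eygle.com/archives/2006/10/system_tbs_io_corruption.html">recently</a>, and the end of the year is also the <a href="http://www.eygle.com/archives/2006/01/backup_is_most_important.html">peak season for incidents</a>.<br />
A reminder for everyone to stay alert too.</p>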

<p>Reference:<br />
<a href="http://www.eygle.com/ha/dataguard-step-by-step.htm">http://www.eygle.com/ha/dataguard-step-by-step.htm</a></p>

<p>-The End-<br />
</p>]]></description>
<link>http://www.eygle.com/archives/2006/10/start_dataguard_db.html</link>
<guid>http://www.eygle.com/archives/2006/10/start_dataguard_db.html</guid>
<category>Case</category>
<pubDate>Mon, 30 Oct 2006 10:57:20 +0800</pubDate>
</item>
<item>
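<title>Handling a SYSTEM Tablespace I/O Error and Suspected Data Corruption</title>
<description><![CDATA[<p>A colleague recently ran into a database problem that was reported as corrupt blocks in the SYSTEM tablespace; the alert log kept repeating the following errors:</p>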

<p>[oracle@gdmstest bdump]$ tail -20 alert_mydb.log<br />
Linux Error: 4: Interrupted system call<br />
Additional information: 23710<br />
Wed Oct 25 16:47:44 2006<br />
Errors in file /opt/oracle/admin/mydb/bdump/mydb_smon_19646.trc:<br />
ORA-00604: error occurred at recursive SQL level 1<br />
ORA-01115: IO error reading block from file 1 (block # 23712)<br />
ORA-01110: data file 1: '/opt/oracle/oradata/mydb/system01.dbf'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 4: Interrupted system call<br />
Additional information: 23710<br />
Wed Oct 25 16:47:59 2006<br />
Errors in file /opt/oracle/admin/mydb/bdump/mydb_smon_19646.trc:<br />
ORA-00604: error occurred at recursive SQL level 1<br />
ORA-01115: IO error reading block from file 1 (block # 23712)<br />
ORA-01110: data file 1: '/opt/oracle/oradata/mydb/system01.dbf'<br />
ORA-27091: skgfqio: unable to queue I/O<br />
ORA-27072: skgfdisp: I/O error<br />
Linux Error: 4: Interrupted system call<br />
Additional information: 23710</p>

<p>Yet a dbv check reported no corrupt blocks:<br />
<blockquote>[oracle@gdmstest mydb]$ dbv file=system01.dbf blocksize=8192</p>

<p>DBVERIFY: Release 9.2.0.4.0 - Production on Thu Oct 26 11:36:42 2006</p>

<p>Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.</p>

<p>DBVERIFY - Verification starting : FILE = system01.dbf</p>

<p><br />
DBVERIFY - Verification complete</p>

<p>Total Pages Examined         : 23709<br />
Total Pages Processed (Data) : 13000<br />
Total Pages Failing   (Data) : 0<br />
Total Pages Processed (Index): 2090<br />
Total Pages Failing   (Index): 0<br />
Total Pages Processed (Other): 1377<br />
Total Pages Processed (Seg)  : 0<br />
Total Pages Failing   (Seg)  : 0<br />
Total Pages Empty            : 7242<br />
Total Pages Marked Corrupt   : 0<br />
Total Pages Influx           : 0</blockquote></p>

<p>Let's look at this together. First, judging from the error log, this is not actually a block corruption problem:<br />
ORA-01115: IO error reading block from file 1 (block # 23712)</p>

<p>It is an I/O error: the block cannot be read.</p>

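<p>The dbv output only says that 23709 blocks were examined and that those blocks are fine, whereas the block that actually raises the error is block 23712. In other words, dbv got to the neighborhood of that block, could not read any further, and exited.</p>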

<p>The SYSTEM tablespace, however, is far larger than 23709 * 8 KB / 1024, which is only about 185 MB.</p>

<p>Checking the system log at this point, dmesg shows a large number of disk addressing errors, which means the hardware has failed:<br />
<blockquote>[maintain@gdmstest bdump]$ dmesg<br />
: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<br />
end_request: I/O error, dev 03:06 (hda), sector 14266880<br />
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<br />
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<br />
end_request: I/O error, dev 03:06 (hda), sector 14266880<br />
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }<br />
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880<br />
end_request: I/O error, dev 03:06 (hda), sector 14266880<br />
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }</blockquote></p>

<p>With that, the problem has been pinpointed.</p>

<p>If we try to cp the SYSTEM tablespace data file, we get the same hardware error:<br />
<blockquote>[oracle@gdmstest mydb]$ cp system01.dbf system01.dbf.bk<br />
cp: reading `system01.dbf': Input/output error<br />
[oracle@gdmstest mydb]$ ll<br />
total 2173060<br />
....<br />
-rw-r-----    1 oracle   dba      524296192 Oct 25 16:49 system01.dbf<br />
-rw-r-----    1 oracle   dba      194236416 Oct 25 17:00 system01.dbf.bk<br />
...............</blockquote></p>

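<p>Only 194236416 bytes could be copied, that is, 194236416 / 8192 = 23710.5 blocks, so the copy also stops at around block 23709. The damaged hardware itself has to be dealt with by other, system-level means.</p>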
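<p>The block arithmetic can be double-checked with a few lines of Python. This is only an illustrative sketch: the helper name blocks_in is made up, and the 8192-byte block size is taken from the blocksize=8192 argument passed to dbv earlier.</p>

```python
BLOCK_SIZE = 8192  # bytes per block, as passed to dbv (blocksize=8192)

def blocks_in(num_bytes, block_size=BLOCK_SIZE):
    """Return (whole_blocks, leftover_bytes) for a byte count (hypothetical helper)."""
    return divmod(num_bytes, block_size)

copied = blocks_in(194236416)  # size of the partial copy system01.dbf.bk
full = blocks_in(524296192)    # size of the original system01.dbf
print(copied)  # (23710, 4096)
print(full)    # (64001, 0)
```

<p>The copy stopped after 23710 full blocks plus half a block, a little over a third of the way into the 64001-block file, which lines up with the 23709 pages dbv managed to examine and with block 23712 named in the ORA-01115 error.</p>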

<p>-The End-<br />
</p>]]></description>
<link>http://www.eygle.com/archives/2006/10/system_tbs_io_corruption.html</link>
<guid>http://www.eygle.com/archives/2006/10/system_tbs_io_corruption.html</guid>
<category>Case</category>
<pubDate>Fri, 27 Oct 2006 19:30:30 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics: ORA-04031 Again</title>
<description><![CDATA[<p>Today a friend's database ran into trouble. I connected to take a look, and it turned out to be ORA-04031 again:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>[oracle@statdata bdump]$ sqlplus &quot;/ as sysdba&quot;</p>
            <p>SQL*Plus: Release 8.1.7.0.0 - Production on Fri Jun 23 11:04:31 2006</p>
            <p>(c) Copyright 2000 Oracle Corporation. All rights reserved.</p>
            <p>ERROR:<br />ORA-00604: error occurred at recursive SQL level 2<br />ORA-04031: unable to allocate 4200 bytes of shared memory (&quot;shared<br />pool&quot;,&quot;TRIGGER$&quot;,&quot;sga heap&quot;,&quot;state objects&quot;)</p>
            </td>
        </tr>
    </tbody>
</table>
<p>SQL*Plus could not connect. After a moment I remembered that svrmgrl was still available:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>[oracle@statdata dbs]$ svrmgrl</p>
            <p>Oracle Server Manager Release 3.1.7.0.0 - Production</p>
            <p>Copyright (c) 1997, 1999, Oracle Corporation. All Rights Reserved.</p>
            <p>Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production<br />With the Partitioning option<br />JServer Release 8.1.7.0.0 - Production</p>
            <p>SVRMGR&gt; connect internal<br />Connected.<br />SVRMGR&gt; shutdown immediate;<br />ORA-00604: error occurred at recursive SQL level 1<br />ORA-04031: unable to allocate 4200 bytes of shared memory (&quot;shared </p>
            <p>pool&quot;,&quot;DATABASE&quot;,&quot;sga heap&quot;,&quot;state objects&quot;)</p>
            </td>
        </tr>
    </tbody>
</table>
<p>ORA-04031 is a long-standing problem in Oracle 8.1.7.0.0. Even svrmgrl could no longer run shutdown immediate, so the only option was to close the database with shutdown abort and restart it.</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">SVRMGR&gt; connect internal<br />Connected.<br />SVRMGR&gt; shutdown abort;<br />ORACLE instance shut down.</td>
        </tr>
    </tbody>
</table>
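<p>Further checking showed that this database was still running with its initial settings. The shared pool was only 30 MB, and an undersized shared pool is one of the causes of ORA-04031:</p>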
<blockquote dir="ltr" style="MARGIN-RIGHT: 0px">
<p>shared_pool_size = 31457280<br />db_block_buffers = 2048</p>
</blockquote>
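<p dir="ltr">Both parameters were increased; the host does have 4 GB of memory, after all. With the new settings, ORA-04031 errors should become much rarer.</p>
<p>After the database was shut down, the shared memory was not released promptly:</p>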
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>SVRMGR&gt; connect internal<br />Connected.<br />SVRMGR&gt; shutdown abort;<br />ORACLE instance shut down.<br />SVRMGR&gt; exit <br />Server Manager complete.<br />[oracle@statdata dbs]$ ipcs -sa</p>
            <p>------ Shared Memory Segments --------<br />key shmid owner perms bytes nattch status <br />0x00000000 2293760 oracle 640 77824 1 dest <br />0x00000000 2326529 oracle 640 17825792 1 dest <br />0x00000000 2359298 oracle 640 17825792 1 dest <br />0x00000000 2392067 oracle 640 20971520 1 dest <br />0x00000000 2424836 oracle 640 16961536 1 dest </p>
            <p>------ Semaphore Arrays --------<br />key semid owner perms nsems </p>
            <p>------ Message Queues --------<br />key msqid owner perms used-bytes messages <br /></p>
            </td>
        </tr>
    </tbody>
</table>
<p>After the leftover Oracle process was killed, the shared memory was released:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <p>[oracle@statdata dbs]$ ps -ef|grep ora<br />oracle 4159 1 0 May11 ? 00:17:20 </p>
            <p>/export/home/oracle/product/8.1.7/bin/tnslsnr LISTENER -inherit<br />oracle 7663 7651 0 10:47 ? 00:00:00 [sshd]<br />oracle 7664 7663 0 10:47 pts/1 00:00:00 -bash<br />oracle 7730 7664 0 10:48 pts/1 00:00:00 svrmgrl<br />oracle 7731 7730 0 10:48 ? 00:00:00 oracleora8 (DESCRIPTION=(LOCAL=YES)</p>
            <p>(ADDRESS=(PROTOCOL=beq)))<br />oracle 8344 8342 0 11:03 ? 00:00:00 [sshd]<br />oracle 8345 8344 0 11:03 pts/2 00:00:00 -bash<br />oracle 9094 1 0 11:19 ? 00:00:00 oracleora8 (LOCAL=NO)<br />oracle 9101 8345 0 11:19 pts/2 00:00:00 ps -ef<br />oracle 9102 8345 0 11:19 pts/2 00:00:00 grep ora<br />[oracle@statdata dbs]$ kill -9 9094<br />[oracle@statdata dbs]$ ps -ef|grep ora<br />oracle 4159 1 0 May11 ? 00:17:20 </p>
            <p>/export/home/oracle/product/8.1.7/bin/tnslsnr LISTENER -inherit<br />oracle 7663 7651 0 10:47 ? 00:00:00 [sshd]<br />oracle 7664 7663 0 10:47 pts/1 00:00:00 -bash<br />oracle 7730 7664 0 10:48 pts/1 00:00:00 svrmgrl<br />oracle 8344 8342 0 11:03 ? 00:00:00 [sshd]<br />oracle 8345 8344 0 11:03 pts/2 00:00:00 -bash<br />oracle 9113 8345 0 11:19 pts/2 00:00:00 ps -ef<br />oracle 9114 8345 0 11:19 pts/2 00:00:00 grep ora<br />[oracle@statdata dbs]$ ipcs -sa</p>
            <p>------ Shared Memory Segments --------<br />key shmid owner perms bytes nattch status </p>
            <p>------ Semaphore Arrays --------<br />key semid owner perms nsems </p>
            <p>------ Message Queues --------<br />key msqid owner perms used-bytes messages </p>
            <p>[oracle@statdata dbs]$ svrmgrl</p>
            <p>Oracle Server Manager Release 3.1.7.0.0 - Production</p>
            <p>Copyright (c) 1997, 1999, Oracle Corporation. All Rights Reserved.</p>
            <p>Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production<br />With the Partitioning option<br />JServer Release 8.1.7.0.0 - Production</p>
            <p>SVRMGR&gt; connect internal<br />Connected.<br />SVRMGR&gt; startup<br />ORACLE instance started.<br />Total System Global Area 767996064 bytes<br />Fixed Size 73888 bytes<br />Variable Size 243462144 bytes<br />Database Buffers 524288000 bytes<br />Redo Buffers 172032 bytes<br />Database mounted.<br />Database opened.</p>
            </td>
        </tr>
    </tbody>
</table>
<p>The database now starts successfully.</p>
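<p>To put the startup figures in perspective, here is a small Python sketch that converts them to megabytes and checks that the components add up to the reported total. The numbers are copied from the output above; the post does not list the new parameter values, so nothing is inferred beyond these sizes.</p>

```python
# Figures copied from the startup output above, in bytes.
sga = {
    "Fixed Size": 73888,
    "Variable Size": 243462144,
    "Database Buffers": 524288000,
    "Redo Buffers": 172032,
}
reported_total = 767996064    # "Total System Global Area"
old_shared_pool = 31457280    # shared_pool_size before the change

MB = 1024 * 1024
for name, size in sga.items():
    print(f"{name:18s} {size / MB:8.1f} MB")

component_sum = sum(sga.values())
print("sum matches total:", component_sum == reported_total)
print("old shared pool MB:", old_shared_pool / MB)
```

<p>The buffer cache alone is now 500 MB, compared with a shared pool of only 30 MB before the change.</p>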
<p>&nbsp;</p>]]></description>
<link>http://www.eygle.com/archives/2006/06/oracle_diagnost_ora_04031.html</link>
<guid>http://www.eygle.com/archives/2006/06/oracle_diagnost_ora_04031.html</guid>
<category>Case</category>
<pubDate>Fri, 23 Jun 2006 15:53:01 +0800</pubDate>
</item>
<item>
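<title>CPU Load Very High: A Database Under Extreme Load</title>
<description><![CDATA[<p>Early Monday morning I found a database under extremely high load and performing abnormally:</p>
<p>The host is a Sun Fire 480R with 4 CPUs and 8 GB of memory:</p>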
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre># ./prtdiag<br />System Configuration:&nbsp; Sun Microsystems&nbsp; sun4u Sun Fire 480R<br />System clock frequency: 150 MHz<br />Memory size: 8192 Megabytes</pre>
            <pre>========================= CPUs ===============================================</pre>
            <pre>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Run&nbsp;&nbsp; E$&nbsp; CPU&nbsp;&nbsp;&nbsp;&nbsp; CPU&nbsp; <br />Brd&nbsp; CPU&nbsp; MHz&nbsp;&nbsp; MB&nbsp; Impl.&nbsp;&nbsp; Mask <br />--- ----- ---- ---- ------- ---- <br />&nbsp;A&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp; 1050&nbsp; 8.0 US-III+&nbsp; 11.0<br />&nbsp;A&nbsp;&nbsp;&nbsp;&nbsp; 2&nbsp; 1050&nbsp; 8.0 US-III+&nbsp; 11.0<br />&nbsp;B&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp; 1050&nbsp; 8.0 US-III+&nbsp; 11.0<br />&nbsp;B&nbsp;&nbsp;&nbsp;&nbsp; 3&nbsp; 1050&nbsp; 8.0 US-III+&nbsp; 11.0&nbsp;</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>&nbsp;Current load:</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>load averages: <strong>13.12, 12.60, 12.23</strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; db480-4.hurray.com.cn&nbsp;&nbsp;&nbsp; 09:50:32<br />184 processes: 166 sleeping, 12 running, 1 stopped, 5 on cpu<br />CPU states:&nbsp; 0.0% idle, 93.6% user,&nbsp; 6.4% kernel,&nbsp; 0.0% iowait,&nbsp; 0.0% swap<br />Memory: 8.0G real, 2.8G free, 4.1G swap in use, 18.1G swap free</pre>
            <pre>&nbsp;&nbsp; PID USERNAME THR PR NCE&nbsp; SIZE&nbsp;&nbsp; RES STATE&nbsp;&nbsp; TIME FLTS&nbsp;&nbsp;&nbsp; CPU COMMAND<br />&nbsp;11834 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 22&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 12:51&nbsp;&nbsp;&nbsp; 0&nbsp; 7.08% oracle<br />&nbsp; 3876 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 22&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 38.5H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.93% oracle<br />&nbsp;15876 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp; 514:20&nbsp;&nbsp;&nbsp; 0&nbsp; 6.85% oracle<br />&nbsp; 4042 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 41.3H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.78% oracle<br />&nbsp;29532 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 19.4H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.67% oracle<br />&nbsp; 3703 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 38.5H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.66% oracle<br />&nbsp;29704 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 20.9H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.60% oracle<br />&nbsp;15537 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 21&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp; 496:48&nbsp;&nbsp;&nbsp; 0&nbsp; 6.57% oracle<br />&nbsp;15680 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 31&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp; 496:44&nbsp;&nbsp;&nbsp; 0&nbsp; 6.50% oracle<br />&nbsp;29375 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 31&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp; 19.5H&nbsp;&nbsp;&nbsp; 0&nbsp; 6.31% oracle<br />&nbsp; 4033 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 22&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G run&nbsp;&nbsp;&nbsp;&nbsp; 8:51&nbsp;&nbsp;&nbsp; 0&nbsp; 4.76% oracle<br />&nbsp; 4035 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 51&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G sleep&nbsp;&nbsp; 8:55&nbsp;&nbsp;&nbsp; 0&nbsp; 4.70% oracle<br 
/>&nbsp; 4046 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 52&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G cpu03&nbsp;&nbsp; 8:59&nbsp;&nbsp;&nbsp; 0&nbsp; 4.68% oracle<br />&nbsp; 7349 oracle&nbsp;&nbsp;&nbsp; 11 53&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G sleep&nbsp;&nbsp; 8:38&nbsp;&nbsp;&nbsp; 0&nbsp; 4.44% oracle<br />&nbsp; 4055 oracle&nbsp;&nbsp;&nbsp;&nbsp; 2 42&nbsp;&nbsp; 0&nbsp; 3.1G&nbsp; 3.0G cpu02&nbsp;&nbsp; 8:43&nbsp;&nbsp;&nbsp; 0&nbsp; 4.39% oracle&nbsp;</pre>
            </td>
        </tr>
    </tbody>
</table>
The application has far too many problems. Sigh!]]></description>
<link>http://www.eygle.com/archives/2006/01/cpu_load_very_high.html</link>
<guid>http://www.eygle.com/archives/2006/01/cpu_load_very_high.html</guid>
<category>Case</category>
<pubDate>Mon, 09 Jan 2006 10:30:29 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics:How to deal with ORA-600 2662 Error</title>
<description><![CDATA[<p>在<a href="http://www.eygle.com/archives/2005/10/ora00600_2262ii.html">ORA-00600 2262错误解决</a>一文中，我曾经提到过，很多时候使用隐含参数<a href="http://www.eygle.com/archives/2005/10/oracle_hidden_allow_resetlogs_corruption.html">_ALLOW_RESETLOGS_CORRUPTION</a>后resetlogs打开数据库,我们可能会由于SCN不一致而遭遇到ORA-00600 2662号错误，这里给出一个完整的例子及解决过程。</p>
<p>当然模拟2662错误需要技巧，本文并不会涉及这个内容。</p>
<p>通过正常方式启动数据库时，从alert文件中，我们可以看到ora-00600 2662号错误。</p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sun Dec 11 18:02:25 2005<br />Errors in file /opt/oracle/admin/conner/udump/conner_ora_13349.trc:<br /><strong>ORA-00600: internal error code, arguments: [2662], [0], [547743994], [0], [898092653], [8388617], [], []<br /></strong>Sun Dec 11 18:02:27 2005<br />Errors in file /opt/oracle/admin/conner/udump/conner_ora_13349.trc:<br /><strong>ORA-00600: internal error code, arguments: [2662], [0], [547743994], [0], [898092653], [8388617], [], []<br /></strong>Sun Dec 11 18:02:27 2005<br />Error 600 happened during db open, shutting down database<br />USER: terminating instance due to error 600</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>At this point we can adjust the SCN through Oracle's <a href="http://www.eygle.com/internal/Oracle.Diagnostics.Events.list.htm">internal events</a>:</p>
<p>There are two common ways to advance the SCN:</p>
<p>1. With immediate trace name (while the database is open)</p>
<p><strong><em>alter session set events 'IMMEDIATE trace name ADJUST_SCN level x';</em></strong></p>
<p>2. With event 10015 (when the database cannot be opened and is in mount state)</p>
<p><font face="Courier"><strong><em>alter session set events '10015 trace name adjust_scn level x';</em></strong></font></p>
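<p><font face="Courier">Note: level 1 advances the SCN by 1 billion (1024*1024*1024), which is usually enough. The level can also be adjusted to suit the actual situation.</font></p>
<p><font face="Courier">In this example the database cannot be opened, so only the second method can be used.</font></p>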
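<p>Before picking a level, it can help to estimate how far the SCN actually has to move. The sketch below is only an aid for reading the trace, not Oracle's own logic. It assumes the commonly described layout of the ORA-00600 [2662] arguments (current SCN wrap and base, dependent SCN wrap and base, then a data block address), which this post does not state, and it works out the level under two readings: the level adds level * 2^30 to the SCN, or the level raises the SCN to level * 2^30.</p>

```python
# Assumed layout of the ORA-00600 [2662] arguments (not stated in this post):
# [2662], [current SCN wrap], [current SCN base],
#         [dependent SCN wrap], [dependent SCN base], [DBA]
current_wrap, current_base = 0, 547743994
dependent_wrap, dependent_base = 0, 898092653
dba = 8388617

SCN_PER_LEVEL = 1024 * 1024 * 1024   # one ADJUST_SCN level, per the note above

def full_scn(wrap, base):
    """Combine wrap and base into one number (wrap counts multiples of 2**32)."""
    return wrap * 2**32 + base

current = full_scn(current_wrap, current_base)
dependent = full_scn(dependent_wrap, dependent_base)
gap = dependent - current
print("SCN gap:", gap)

# Reading 1: the level is added to the current SCN (ceiling division).
levels_if_added = -(-gap // SCN_PER_LEVEL)
# Reading 2: the level sets a target SCN of level * 2**30.
levels_if_target = dependent // SCN_PER_LEVEL + 1
print("levels needed:", levels_if_added, "or", levels_if_target)

# Assumed DBA encoding: upper 10 bits = relative file number, lower 22 = block.
file_no, block_no = divmod(dba, 2**22)
print("file", file_no, "block", block_no)
```

<p>Under either reading the gap in this trace is covered by level 1, which agrees with the note above that level 1 is usually enough.</p>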
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>[oracle@jumper dbs]$ sqlplus &quot;/ as sysdba&quot;</pre>
            <pre>SQL*Plus: Release 9.2.0.4.0 - Production on Sun Dec 11 18:26:18 2005</pre>
            <pre>Copyright (c) 1982, 2002, Oracle Corporation.&nbsp; All rights reserved.</pre>
            <pre>Connected to an idle instance.</pre>
            <pre>SQL&gt; startup mount pfile=initconner.ora<br />ORACLE instance started.</pre>
            <pre>Total System Global Area&nbsp;&nbsp; 97588504 bytes<br />Fixed Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 451864 bytes<br />Variable Size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 33554432 bytes<br />Database Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 62914560 bytes<br />Redo Buffers&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 667648 bytes<br />Database mounted.<br /></pre>
            <pre>SQL&gt; alter session set events '10015 trace name adjust_scn level 10';</pre>
            <pre>Session altered.</pre>
            <pre>SQL&gt; alter database open;</pre>
            <pre>Database altered.</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>Note that because I ran event 10015 at level 10, the SCN was advanced by 10 <font face="Courier">billion, which we can verify shortly.</font>&nbsp;</p>
<p>The database now opens, and the alert file shows the following message:</p>
<p>
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>Sun Dec 11 18:27:04 2005<br />SMON: enabling cache recovery<br />Sun Dec 11 18:27:05 2005<br />Debugging event used to advance scn to <strong>10737418240</strong></pre>
            </td>
        </tr>
    </tbody>
</table>
</p>
<p>The SCN was advanced by 10 billion, i.e. 10 * (1024*1024*1024) = <strong>10737418240</strong>, exactly the figure recorded in the log.</p>]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html</guid>
<category>Backup&amp;Recovery</category>
<pubDate>Tue, 20 Dec 2005 19:52:36 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics:Why sysdate is fixed?</title>
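<description><![CDATA[<p>Today a friend asked me on MSN: why has my SYSDATE stopped changing?</p>
<p>The value he got from SYSDATE stayed stuck at 2005-03-01 11:41:15, which seemed very strange.</p>
<p>I could not resist walking him through an investigation. We started with diagnostics at the <a href="http://www.eygle.com/unix/Man.Page.Of.gethrtime.htm">system level</a> and found nothing wrong.</p>
<p>Then we looked at it from the database side and found that:</p>
<p><em><strong>select current_date from dual; </strong></em>returned the correct value, whereas</p>
<p><strong><em>select sysdate from dual;</em></strong> did not.</p>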
<p>&nbsp;</p>
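<p>My guess was that some parameter had frozen the system date. I had him send over the alert file, and sure enough it showed a parameter I had not noticed before: FIXED_DATE.</p>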
<table>
    <tbody>
        <tr>
            <td width="500" bgcolor="#999999">
            <pre>&nbsp; core_dump_dest&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = /u01/app/oracle/admin/unicode/cdump<br /><strong>&nbsp; fixed_date&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = 01-MAR-05<br /></strong>&nbsp; sort_area_size&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = 524288<br />&nbsp; db_name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = unicode</pre>
            </td>
        </tr>
    </tbody>
</table>
<p>The documentation explains it as follows:</p>
<h2 class="H1"><a name="REFRN10062"><font face="Arial, Helvetica, sans-serif" color="#330099">FIXED_DATE</font></a></h2>
<p><!--/TOC=h1--><a name="1038428"></a>
<table dir="ltr" title="" width="100%" summary="" class="Simple">
    <tbody>
        <tr class="Simple" valign="top" align="left">
            <td class="Simple"><a name="1038413"></a>
            <p class="TS"><strong class="Bold">Parameter type</strong></p>
            </td>
            <td class="Simple"><a name="1038415"></a>
            <p class="TS">String</p>
            </td>
        </tr>
        <tr class="Simple" valign="top" align="left">
            <td class="Simple"><a name="1038417"></a>
            <p class="TS"><strong class="Bold">Syntax</strong></p>
            </td>
            <td class="Simple"><a name="1038419"></a>
            <p class="TS"><code><font face="新宋体">FIXED_DATE = YYYY-MM-DD-HH24:MI:SS</font></code> (or the default Oracle date format)</p>
            </td>
        </tr>
        <tr class="Simple" valign="top" align="left">
            <td class="Simple"><a name="1038421"></a>
            <p class="TS"><strong class="Bold">Default value</strong></p>
            </td>
            <td class="Simple"><a name="1038423"></a>
            <p class="TS">There is no default value.</p>
            </td>
        </tr>
        <tr class="Simple" valign="top" align="left">
            <td class="Simple"><a name="1038425"></a>
            <p class="TS"><strong class="Bold">Parameter class</strong></p>
            </td>
            <td class="Simple"><a name="1038427"></a>
            <p class="TS">Dynamic: <code><font face="新宋体">ALTER SYSTEM</font></code></p>
            </td>
        </tr>
    </tbody>
</table>
<a name="1038431"></a></p>
<p class="BP"><code><font face="新宋体"><strong>FIXED_DATE</strong></font></code> enables you to set a constant date that <code><font face="新宋体">SYSDATE</font></code> will always return instead of the current date. This parameter is useful primarily for testing. The value can be in the format shown above or in the default Oracle date format, without a time.</p>
<p class="BP">Finding this parameter gave us the answer!</p>]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_diagnostics_why_sysdate_fixed.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_diagnostics_why_sysdate_fixed.html</guid>
<category>Case</category>
<pubDate>Mon, 12 Dec 2005 23:21:13 +0800</pubDate>
</item>
<item>
<title>Oracle Diagnostics:KTSMG_UPDATE_MQL(): MMNL absent</title>
<description><![CDATA[This morning a friend asked about the following error:<br>
<table><td width="450" bgcolor="#999999"><pre>
/* OracleOEM */ ALTER DATABASE DATAFILE 
'/OracleStorage/content/content_ifs_lob_i_01.dbf' RESIZE 10240M 
Fri Dec 2 08:36:42 2005 
KTSMG_UPDATE_MQL(): MMNL absent for 4294967292 secs;
Foregrounds taking over 
</pre></td></table><br>
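First, the mention of MMNL tells us that this is an Oracle 10g database.<br>
MMNL is a new background process introduced in Oracle 10g. Its full name is Memory Monitor Light, and it is one of the components of AWR (Automatic Workload Repository).<br><br>

The message means that MMNL has been inactive for too long, so the foreground processes have taken over its work.<br><br>

This is usually an error that can be ignored and has no effect on the database. In some cases, however, it leaves the database impossible to log in to or access, and only a restart of the database resolves it.<br>

This class of error exists only in 10gR1 and has been fixed in 10gR2.<br>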
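<br>
A side note on the figure of 4294967292 seconds in the message: it sits just below 2^32. The Python sketch below shows one possible reading, namely that a small negative interval was printed as an unsigned 32-bit number. This is only a guess based on the arithmetic; it is not something the message or Oracle's documentation states.<br>

```python
reported_secs = 4294967292          # value from the alert log message above

# Distance from 2**32: a small negative number stored in an unsigned 32-bit
# field would show up as a value just below 2**32.
print("2**32 - reported:", 2**32 - reported_secs)

# Reinterpret the same 32-bit pattern as a signed integer.
as_signed = int.from_bytes(reported_secs.to_bytes(4, "little"), "little", signed=True)
print("as signed 32-bit:", as_signed)

# Taken literally, the figure would be well over a century.
print("literal value in years:", reported_secs / (365.25 * 24 * 3600))
```

Either way, the number is clearly not a real measurement of how long MMNL was absent.<br>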
]]></description>
<link>http://www.eygle.com/archives/2005/12/oracle_diagnost_mmnl_absent.html</link>
<guid>http://www.eygle.com/archives/2005/12/oracle_diagnost_mmnl_absent.html</guid>
<category>Case</category>
<pubDate>Sat, 03 Dec 2005 11:54:16 +0800</pubDate>
</item>
<item>
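<title>Case: Moving the System Table DEPENDENCY$ Leaves Its Index Unusable and the Database Down</title>
<description><![CDATA[Today I saw a friend who had moved the system table DEPENDENCY$ and then restarted the database without rebuilding its indexes. The result was ORA-01502, and the database could not start.<br>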
<table><td width="500" bgcolor="#999999"><pre>
Thu Nov 17 01:55:30 2005
Errors in file /dcdb/admin/hidc/udump/hidc_ora_56602.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state</pre></td></table><br>
<br>
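In this situation the best case is to have a backup and restore from it. Without a backup things get very difficult (and this case happened to have none).<br>
When I discussed the problem with <a href="http://www.anysql.net">D.C.B.A</a>, we initially came up with three approaches:<br>
1. Skip the index check by some means<br>
This proved very hard in 9i; besides, the failure occurs while Bootstrap$ is being checked.<br>
2. Repair with BBED<br>
This should work, but it is extremely complex and demands great care.<br>
3. Use DUL or a DUL-like tool<br>
This is the last resort.<br>
<br>
DCBA is following up on this case; see:<br>
<a href="http://www.anysql.net/blog/p/movesystem.php">http://www.anysql.net/blog/p/movesystem.php</a><br>
But we should remember never to let a database end up in this position; it really is dangerous.<br>
<br>
The root cause is that during startup the database performs the following check:<br>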
<table><td width="500" bgcolor="#999999"><pre>
select owner#,name,namespace,remoteowner,linkname,
p_timestamp,p_obj#, d_owner#, nvl(property,0),subname 
from dependency$,obj$ where d_obj#=:1 and p_obj#=obj#(+) order by order#
</pre></td></table><br>
This check produces the following execution plan:<br>
<table><td width="500" bgcolor="#999999"><pre>
STAT #9 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=6 r=3 w=0 time=694 us)'
STAT #9 id=2 cnt=1 pid=1 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ#(18) (cr=3 r=0 w=0 time=104 us)'
STAT #9 id=3 cnt=1 pid=2 pos=1 obj=36 op='INDEX UNIQUE SCAN OBJ#(36) (cr=2 r=0 w=0 time=64 us)'
STAT #9 id=4 cnt=1 pid=1 pos=2 obj=22 op='TABLE ACCESS CLUSTER OBJ#(22) (cr=3 r=3 w=0 time=576 us)'
STAT #9 id=5 cnt=1 pid=4 pos=1 obj=11 op='INDEX UNIQUE SCAN OBJ#(11) (cr=2 r=2 w=0 time=406 us)'
</pre></td></table><br>
The 'INDEX UNIQUE SCAN OBJ#(36)' step here is what leads to the final error:<br>
<table><td width="500" bgcolor="#999999"><pre>
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state
EXEC #1:c=0,e=633371,p=39,cr=619,cu=1,mis=0,r=0,dep=0,og=2,tim=1105782888673612
ERROR #1:err=1092 tim=1563018104
</pre></td></table><br>
<br>
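Unfortunately Oracle does not allow all indexes to be ignored; otherwise there would be a way out.<br>
<br>
To borrow a famous line: <strong>happy databases are all alike; every unhappy database is unhappy in its own way.</strong><br>]]></description>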
<link>http://www.eygle.com/archives/2005/11/move_dependency_index_unusable.html</link>
<guid>http://www.eygle.com/archives/2005/11/move_dependency_index_unusable.html</guid>
<category>Internal</category>
<pubDate>Fri, 18 Nov 2005 13:46:21 +0800</pubDate>
</item>
<item>
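<title>Oracle Database Diagnostic Case: Redo Log Groups Stuck in ACTIVE State</title>
<description><![CDATA[Platform: SunOS 5.8 Generic_108528-23 sun4u sparc SUNW,Ultra-Enterprise<br>
Database: 8.1.5.0.0<br>
Symptom: slow response; application requests no longer return<br>

After logging in to the database, I found that every redo log group except the current one was in ACTIVE state<br>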
<table><td width="500" bgcolor="#999999"> <pre>
oracle:/oracle/oracle8>sqlplus "/ as sysdba"

SQL*Plus: Release 8.1.5.0.0 - Production on Thu Jun 23 18:56:06 2005

(c) Copyright 1999 Oracle Corporation.  All rights reserved.


Connected to:
Oracle8i Enterprise Edition Release 8.1.5.0.0 - Production
With the Partitioning and Java options
PL/SQL Release 8.1.5.0.0 - Production
SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1     520403   31457280          1 NO  <strong>ACTIVE</strong>              1.3861E+10 23-JUN-05
         2          1     520404   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         3          1     520405   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         4          1     520406   31457280          1 NO  CURRENT             1.3861E+10 23-JUN-05
         5          1     520398   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         6          1     520399   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         7          1     520400  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         8          1     520401  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         9          1     520402  104857600          1 NO  ACTIVE              1.3861E+10 23-JUN-05

9 rows selected.

SQL> /

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1     520403   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         2          1     520404   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         3          1     520405   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         4          1     520406   31457280          1 NO  CURRENT             1.3861E+10 23-JUN-05
         5          1     520398   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         6          1     520399   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         7          1     520400  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         8          1     520401  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         9          1     520402  104857600          1 NO  ACTIVE              1.3861E+10 23-JUN-05

9 rows selected.
</pre></td></table><br>


If all the log groups are in ACTIVE state, then DBWR's writes clearly can no longer keep up with the checkpoints triggered by log switches.<br>
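<br>
To get a feel for the size of the backlog, the rows of the v$log output above can be added up. This is a rough Python sketch using only the figures shown there; an ACTIVE group is one that is no longer current but is still needed for crash recovery because its checkpoint has not completed.<br>

```python
# (group#, bytes, status) rows copied from the v$log output above.
v_log = [
    (1, 31457280, "ACTIVE"), (2, 31457280, "ACTIVE"), (3, 31457280, "ACTIVE"),
    (4, 31457280, "CURRENT"), (5, 31457280, "ACTIVE"), (6, 31457280, "ACTIVE"),
    (7, 104857600, "ACTIVE"), (8, 104857600, "ACTIVE"), (9, 104857600, "ACTIVE"),
]

MB = 1024 * 1024
active = [row for row in v_log if row[2] == "ACTIVE"]
active_mb = sum(size for _, size, _ in active) / MB
total_mb = sum(size for _, size, _ in v_log) / MB

print("ACTIVE groups:", len(active), "of", len(v_log))
print("redo in ACTIVE groups (MB):", active_mb)
print("total online redo (MB):", total_mb)
```

Eight of the nine groups are still waiting for their checkpoints, so the next log switch has no group it can reuse until DBWR catches up.<br>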

]]></description>
<link>http://www.eygle.com/archives/2005/06/oracleeaoieiear.html</link>
<guid>http://www.eygle.com/archives/2005/06/oracleeaoieiear.html</guid>
<category>Internal</category>
<pubDate>Sun, 26 Jun 2005 10:46:19 +0800</pubDate>
</item>
<item>
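<title>Oracle Diagnostic Case: Jobs Stop Executing</title>
<description><![CDATA[<P>Yesterday the developers reported that the database's scheduled jobs were not running properly, causing some operations to fail.</P>
<P>I stepped in to handle the incident.<BR>System environment:<BR>SunOS DB 5.8 Generic_108528-21 sun4u sparc SUNW,Ultra-4 <BR>Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production</P>
<P>1. First, check the database jobs</P>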
<P>&nbsp;</P>
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999><SPAN class=style6>
<PRE>$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.3.0 - Production on Wed Nov 17 20:23:53 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

SQL&gt; select job,last_date,last_sec,next_date,next_sec,broken,failures,interval from dba_jobs;

       JOB LAST_DATE LAST_SEC         NEXT_DATE NEXT_SEC         B   FAILURES INTERVAL
---------- --------- ---------------- --------- ---------------- - ----------   ----------------------------
        31 16-NOV-04 01:00:02         17-NOV-04 01:00:00         N          0 trunc(sysdate+1)+1/24
        27 16-NOV-04 00:00:04         17-NOV-04 00:00:00         N          0 TRUNC(SYSDATE) + 1
        35 16-NOV-04 01:00:02         17-NOV-04 01:00:00         N          0 trunc(sysdate+1)+1/24
        29 16-NOV-04 00:00:04         17-NOV-04 00:00:00         N          0 TRUNC(SYSDATE) + 1
        30 01-NOV-04 06:00:01         01-DEC-04 06:00:00         N          0 trunc(add_months(sysdate,1),'MM')+6/24
        65 16-NOV-04 04:00:03         17-NOV-04 04:00:00         N          0 trunc(sysdate+1)+4/24
        46 16-NOV-04 02:14:27         17-NOV-04 02:14:27         N          0 sysdate+1
        66 16-NOV-04 03:00:02         17-NOV-04 18:14:49         N          0 trunc(sysdate+1)+3/24

8 rows selected.                      </PRE></SPAN></TD></TR></TBODY></TABLE>
<P>None of the jobs had run as scheduled. The earliest ones were due at 17-NOV-04 00:00:00, but they did not run.</P>
<P>2. Create a test job</P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=736>

<P>&nbsp;</P><PRE>create or replace PROCEDURE pining
  IS
BEGIN
         NULL;
 END;
/

variable jobno number;
variable instno number;
begin
  select instance_number into :instno from v$instance;
  dbms_job.submit(:jobno, 'pining;', trunc(sysdate+1/288,'MI'), 'trunc(SYSDATE+1/288,''MI'')', TRUE, :instno);
end;
/

</PRE></TD></TR></TBODY></TABLE>
<P>The same thing happened: the job did not run.<BR>Running it through dbms_job.run(&lt;job&gt;), however, worked without any problem.</P>
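<P>For readers unfamiliar with the interval expression used in the test job, here is a rough Python equivalent of trunc(SYSDATE+1/288,'MI'). It is only a sketch: Python's datetime stands in for SYSDATE, and the function name next_run is invented for the example.</P>

```python
from datetime import datetime, timedelta

def next_run(sysdate):
    """Rough Python equivalent of trunc(SYSDATE + 1/288, 'MI')."""
    shifted = sysdate + timedelta(days=1) / 288      # 1/288 of a day = 5 minutes
    return shifted.replace(second=0, microsecond=0)  # truncate to the minute

print(timedelta(days=1) / 288)
print(next_run(datetime(2004, 11, 26, 9, 21, 4)))
```

<P>So a healthy job queue should run the test job every five minutes, on the minute. The sample timestamp is the LAST_SEC value from the listing taken after the reboot at the end of this post, where NEXT_SEC is indeed five minutes later, truncated to the minute.</P>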
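<P>3. Attempt recovery</P>
<P>Suspecting that the CJQ0 process had failed, I first set JOB_QUEUE_PROCESSES to 0, at which point Oracle kills CJQ0 and the associated job processes<BR>SQL&gt; ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 0; </P>
<P>Wait 2 to 3 minutes, then set it again</P>
<P>SQL&gt; ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 5; </P>
<P>PMON then restarts the CJQ0 process</P>
<P>The following appears in the alert log:</P>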
<TABLE height=132 cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=694>

<P>&nbsp;</P><PRE>Thu Nov 18 11:59:50 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 12:01:30 2004
ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
Thu Nov 18 12:01:30 2004
Restarting dead background process CJQ0
CJQ0 started with pid=8      </PRE></TD></TR></TBODY></TABLE>
<P>&nbsp;</P>
<P>But the jobs still did not run, and when the parameter was changed again, CJQ0 simply died.</P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>

<P>&nbsp;</P><PRE>Thu Nov 18 13:52:05 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 14:09:30 2004
ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
Thu Nov 18 14:10:27 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 14:10:42 2004
ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
Thu Nov 18 14:31:07 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 14:40:14 2004
ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
Thu Nov 18 14:40:28 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 14:40:33 2004
ALTER SYSTEM SET job_queue_processes=1 SCOPE=MEMORY;
Thu Nov 18 14:40:40 2004
ALTER SYSTEM SET job_queue_processes=10 SCOPE=MEMORY;
Thu Nov 18 15:00:42 2004
ALTER SYSTEM SET job_queue_processes=0 SCOPE=MEMORY;
Thu Nov 18 15:01:36 2004
ALTER SYSTEM SET job_queue_processes=15 SCOPE=MEMORY;
      </PRE></TD></TR></TBODY></TABLE>
<P>4. Try restarting the database<BR>This had to be done at night</P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>

<P>&nbsp;</P><PRE>PMON started with pid=2
DBW0 started with pid=3
LGWR started with pid=4
CKPT started with pid=5
SMON started with pid=6
RECO started with pid=7
CJQ0 started with pid=8
QMN0 started with pid=9
....
      </PRE></TD></TR></TBODY></TABLE>
<P>CJQ0 started normally, but the jobs still did not run.</P>
<P>5. Out of ideas...</P>
<P>I kept digging... and found that Oracle has the following bug</P>
<P>1. Clear description of the problem encountered: <BR>slgcsf() / slgcs() on Solaris will stop incrementing after <BR>497 days 2 hrs 28 mins (approx) machine uptime. <BR></P>
<P>2. Pertinent configuration information <BR>No special configuration other than long machine uptime. . </P>
<P>3. Indication of the frequency and predictability of the problem <BR><SPAN class=style34><FONT color=#0000ff>100% but only after 497 days.</FONT></SPAN></P>
<P>4. Sequence of events leading to the problem <BR>If the <A href="http://www.eygle.com/unix/Man.Page.Of.gethrtime.htm">gethrtime()</A> OS call returns a value &gt; 42949672950000000 <BR>nanoseconds then slgcs() stays at 0xffffffff. This can <BR>cause some problems in parts of the code which rely on <BR>slgcs() to keep moving. <BR>eg: In kkjssrh() does "now = slgcs(&amp;se)" and compares that <BR>to a previous timestamp. After 497 days uptime slgcs() <BR>keeps returning 0xffffffff so "now - kkjlsrt" will <BR>always return 0. . </P>
<P>5. Technical impact on the customer. Include persistent after effects. <BR><SPAN class=style34><FONT color=#0000ff>In this case DBMS JOBS stopped running after 497 days uptime.</FONT></SPAN> <BR>Other symptoms could occur in various places in the code. </P>
<P>So the timer had overflowed. I checked my host:</P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>
<PRE>bash-2.03$ uptime
 10:00pm  up 500 day(s), 14:57,  1 user,  load average: 1.31, 1.09, 1.08
bash-2.03$ date
Fri Nov 19 22:00:14 CST 2004      </PRE>
<P>&nbsp;</P></TD></TR></TBODY></TABLE>
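<P>When the problem first appeared, uptime was just a little over 497 days. Unbelievable.</P>
<P>6. Schedule a reboot of the host...</P>
<P>This one was really frustrating. Damn, who would have thought Oracle could trip over something like this...</P>
<P>Oracle's final statement:</P>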
<P>fix made it into 9.2.0.6 patchset</P>
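<P>The 9.2.0.6 patchset for Solaris has not been released yet... sigh.</P>
<P>Well, chalk it up to experience. If a problem seems truly inexplicable, do not be afraid to suspect Oracle: it may well be a bug.</P>
<P>After the reboot the problem was resolved. The status is as follows:</P>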
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999>
<PRE><SPAN class=style26>
$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.3.0 - Production on Fri Nov 26 09:21:21 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

SQL&gt; select job,last_date,last_sec,next_date,next_sec from user_jobs;

       JOB LAST_DATE LAST_SEC         NEXT_DATE NEXT_SEC
---------- --------- ---------------- --------- ----------------
        70 26-NOV-04 09:21:04         26-NOV-04 09:26:00


SQL&gt; /

       JOB LAST_DATE LAST_SEC         NEXT_DATE NEXT_SEC
---------- --------- ---------------- --------- ----------------
        70 26-NOV-04 09:26:01         26-NOV-04 09:31:00

SQL&gt; 
SQL&gt; select * from v$timer;

     HSECS
----------
   3388153

SQL&gt; select * from v$timer;

     HSECS
----------
   3388319

SQL&gt;       
                      </PRE></SPAN></TD></TR></TBODY></TABLE>
<P>&nbsp;</P>
<P>7.FAQ</P>
<P>Questions some friends asked on Pub<BR>Q: Does the same problem exist on other platforms?</P>
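<P>A: Yes, the same problem exists on other platforms.<BR>The bug report traces it to Oracle's use of the gethrtime() call (an operating-system library function on Solaris, not part of standard C).<BR>Reference:<BR>http://www.eygle.com/unix/Man.Page.Of.gethrtime.htm</P>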
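<P>Any code that relies on this function can be affected.</P>
<P>In Metalink Note:3427424.8, Oracle lists the platform impact as: Generic (all / most platforms affected)</P>
<P>Q: About the counter overflow: the jobs here mostly run about once a day. If a job is set to run once every 3 days, would the problem only appear after 497*3 days of uptime?</P>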
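<P>A: No.<BR><BR>Oracle advances its relative time internally from a timer.<BR>Because slgcs() keeps that time as a 32-bit count (the hrtime_t value returned by gethrtime() is itself 64 bits wide),</P>
<P>the maximum value is 0xffffffff<BR>0xffffffff = 4294967295</P>
<P>gethrtime() counts in nanoseconds (billionths of a second) while slgcs() counts in centiseconds, so the overflow point is 4294967295 x 10,000,000 = 42949672950000000 nanoseconds, regardless of how often a job runs.</P>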
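<P>The arithmetic behind the 497-day figure can be checked with a short C sketch. The function and struct names below are made up for illustration; the only inputs are the 32-bit maximum and the conversion factors (100 centiseconds per second, 10,000,000 nanoseconds per centisecond).</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Breakdown of a centisecond count into days, hours and minutes. */
struct uptime_parts {
    uint32_t days;
    uint32_t hours;
    uint32_t minutes;
};

/* Nanosecond reading at which a 32-bit centisecond counter is full. */
uint64_t overflow_point_ns(void)
{
    return (uint64_t)UINT32_MAX * 10000000ULL;  /* 1 cs = 10,000,000 ns */
}

/* Splits a centisecond count into whole days, hours and minutes. */
struct uptime_parts split_centiseconds(uint32_t cs)
{
    struct uptime_parts p;
    uint32_t seconds = cs / 100;

    p.days = seconds / 86400;
    p.hours = (seconds % 86400) / 3600;
    p.minutes = (seconds % 3600) / 60;
    return p;
}

/* Prints the overflow point in nanoseconds and as an uptime. */
void print_overflow_point(void)
{
    struct uptime_parts p = split_centiseconds(UINT32_MAX);

    assert(p.days == 497);  /* sanity check against the bug report */
    printf("%llu ns = %u days %u hours %u minutes\n",
           (unsigned long long)overflow_point_ns(),
           (unsigned)p.days, (unsigned)p.hours, (unsigned)p.minutes);
}
```

<P>The result is 497 days, 2 hours and 27 whole minutes (plus 52 seconds), which agrees with the "497 days 2 hrs 28 mins (approx)" quoted in the bug description.</P>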
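<P>Note that 0xffffffff is the largest value a 32-bit unsigned integer can hold (read as a signed integer, the same bit pattern is -1), so incrementing it by 1 wraps around to 0.<BR>Once the counter reaches this value, time stands still.<BR></P>I wrote a small piece of code to verify the wraparound:<BR>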
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>
<PRE>&nbsp;

[oracle@jumper oracle]$ cat unsign.c
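#include &lt;stdio.h&gt;
int main(void){
/* start at the largest value a 32-bit unsigned integer can hold */
unsigned int num = 0xffffffff;

/* sizeof yields a size_t, so cast it to match the %d conversion */
printf("num is %d bits long\n", (int)(sizeof(num) * 8));
printf("num = 0x%x\n", num);
/* unsigned arithmetic wraps around, so adding 1 gives 0 */
printf("num + 1 = 0x%x\n", num + 1);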

return 0;
}

[oracle@jumper oracle]$ gcc -o unsign.sh unsign.c
[oracle@jumper oracle]$ ./unsign.sh
num is 32 bits long
<STRONG>num = 0xffffffff
num + 1 = 0x0</STRONG>
[oracle@jumper oracle]$
</PRE>
<P>&nbsp;</P></TD></TR></TBODY></TABLE>
<P><BR>Q: One of the internal clocks should be this one, right? v$timer, which is accurate to 1/100 of a second.</P>
<P>A: Exactly!</P>
<P>Note what was said earlier:</P>
<P>4. Sequence of events leading to the problem <BR>If the gethrtime() OS call returns a value &gt; 42949672950000000 <BR>nanoseconds then slgcs() stays at 0xffffffff. This can <BR>cause some problems in parts of the code which rely on <BR>slgcs() to keep moving. </P>
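<P>In other words, if the gethrtime() OS call returns a value greater than 42949672950000000 (in nanoseconds),</P>
<P>Oracle ends up with a time value of 4294967295 centiseconds (cs),</P>
<P>and 4294967295 is exactly 0xffffffff.</P>
<P>So at that point v$timer would have read:</P>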
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999><SPAN class=style6>
<PRE>SQL&gt; select * from v$timer;

     HSECS
----------
4294967295

SQL&gt; /                   

     HSECS
----------
4294967295

SQL&gt; /

     HSECS
----------
4294967295

SQL&gt;      
                      </PRE></SPAN></TD></TR></TBODY></TABLE>
<P> </P>
<P> </P>
<P>&nbsp;</P>]]></description>
<link>http://www.eygle.com/archives/2004/11/job_can_not_execute_auto.html</link>
<guid>http://www.eygle.com/archives/2004/11/job_can_not_execute_auto.html</guid>
<category>Case</category>
<pubDate>Fri, 26 Nov 2004 14:41:48 +0800</pubDate>
</item>
<item>
<title>Using SQL_TRACE for Database Diagnosis</title>
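<description><![CDATA[<P>SQL_TRACE is the SQL tracing facility provided by Oracle and a powerful diagnostic aid. It is one of the most commonly used methods in day-to-day database troubleshooting.<BR>This article briefly discusses how to use SQL_TRACE and illustrates its use with concrete examples.<BR></P>
<P><BR>I. Basics</P>
<P>(a) About SQL_TRACE</P>
<P>SQL_TRACE can be enabled globally as an initialization parameter, or enabled for a specific session from the command line.<BR><STRONG>1. Enabling globally</STRONG><BR>Specify in the parameter file (pfile/spfile):<BR></P>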
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999 height=24><SPAN class=style6>
<PRE>sql_trace =true</PRE></SPAN></TD></TR></TBODY></TABLE>
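<P>Enabling SQL_TRACE globally traces the activity of every process, including background processes and all user processes. This usually causes fairly serious performance problems, so use it with caution in production.<BR>Tip: With sql_trace enabled globally we can trace the activity of all background processes. Much of what the documentation describes only in the abstract becomes visible as the trace files change in real time, and the close coordination between the processes is clear to see.</P>
<P><STRONG>2. Setting at the current session level</STRONG><BR>Most of the time we use sql_trace to trace the current session. Doing so exposes the recursive database activity behind the current operation (especially useful when studying new database features), and helps in studying SQL execution, uncovering background errors, and so on.<BR>sql_trace is enabled and disabled at the session level as follows:<BR></P>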
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=736>

<P>&nbsp;</P><PRE>Enable tracing for the current session:
SQL&gt; alter session set sql_trace=true;

Session altered.

SQL executed from this point on is traced:
SQL&gt; select count(*) from dba_users;

  COUNT(*)
----------
        34
Stop tracing:
SQL&gt; alter session set sql_trace=false;

Session altered.
       </PRE></TD></TR></TBODY></TABLE>
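<P><BR><STRONG>3. Tracing another user's session</STRONG><BR>Often we need to trace another user's session rather than our own. This can be done with DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION, a procedure in the Oracle-supplied DBMS_SYSTEM package.</P>
<P>The SET_SQL_TRACE_IN_SESSION procedure takes three arguments:<BR></P>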
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>
<PRE>SQL&gt; desc dbms_system
…
PROCEDURE SET_SQL_TRACE_IN_SESSION
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SID                            NUMBER                  IN
 SERIAL#                        NUMBER                  IN
 SQL_TRACE                      BOOLEAN                 IN
…</PRE>
<P>&nbsp;</P></TD></TR></TBODY></TABLE>
<P>We can get the sid, serial#, and related information from v$session:</P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>
<PRE>Get the session information and choose the session to trace:

SQL&gt; select sid,serial#,username from v$session
  2  where username is not null;

       SID    SERIAL#  USERNAME
---------- ---------- ------------------------------
         8       2041  SYS
         9        437  EYGLE

Enable tracing:
SQL&gt; exec dbms_system.set_sql_trace_in_session(9,437,true)

PL/SQL procedure successfully completed.

….
Wait a while so the traced session can do its work and its SQL is captured…
….

Stop tracing:
SQL&gt; exec dbms_system.set_sql_trace_in_session(9,437,false)

PL/SQL procedure successfully completed.
      </PRE>
<P>&nbsp;</P></TD></TR></TBODY></TABLE>
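<P><BR><STRONG>(b) About the 10046 event</STRONG><BR>The 10046 event is an internal event provided by Oracle and an enhancement of SQL_TRACE.<BR>It can be set to one of the following four levels:<BR>1 - Enables standard SQL_TRACE functionality, equivalent to sql_trace<BR>4 - Level 1 plus bind values<BR>8 - Level 1 plus wait event tracing<BR>12 - Level 1 + Level 4 + Level 8<BR>Like sql_trace, the 10046 event can be set globally or at the session level.<BR><STRONG>1. Setting globally</STRONG><BR>Add the following to the parameter file:<BR></P>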
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>

<P>&nbsp;</P>
<P>event="10046 trace name context forever,level 12"</P>
<P>&nbsp;</P></TD></TR></TBODY></TABLE>
<P>This setting takes effect for all processes of all users, including background processes.</P>
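<P>The four 10046 levels behave like a small set of flags: 4 adds bind values, 8 adds wait events, and 12 is simply 4 + 8. A minimal C sketch of that reading follows. The helper names are hypothetical and exist only for illustration; they are not part of any Oracle interface.</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Flag values taken from the 10046 level list. */
#define TRACE_BINDS 4   /* level 4: bind values */
#define TRACE_WAITS 8   /* level 8: wait events */

/* True if the level is one of the four values listed for event 10046. */
bool level_is_valid(int level)
{
    return level == 1 || level == 4 || level == 8 || level == 12;
}

/* True if the level includes bind values. */
bool level_has_binds(int level)
{
    return (level & TRACE_BINDS) != 0;
}

/* True if the level includes wait events. */
bool level_has_waits(int level)
{
    return (level & TRACE_WAITS) != 0;
}

/* Prints what a given level adds on top of the basic SQL trace. */
void describe_level(int level)
{
    assert(level_is_valid(level));
    printf("level %d: basic SQL trace%s%s\n", level,
           level_has_binds(level) ? " + bind values" : "",
           level_has_waits(level) ? " + wait events" : "");
}
```

<P>Every valid level includes the basic SQL trace, so level 12 here reports both bind values and wait events.</P>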
<P><STRONG>2. Setting for the current session</STRONG><BR>Set it with alter session; this requires the ALTER SESSION system privilege:<BR></P>
<TABLE cellSpacing=0 cellPadding=0 bgColor=#999999>
<TBODY>
<TR>
<TD class=style6 vAlign=top width=692>

<P>&nbsp;</P><PRE>SQL&gt; alter session set events '10046 trace name context forever';

Session altered.

SQL&gt; alter session set events '10046 trace name context forever, level 8';

Session altered.

SQL&gt; alter session set events '10046 trace name context off';

Session altered.

      </PRE></TD></TR></TBODY></TABLE>
<P><STRONG>3. Setting for another user's session</STRONG><BR>This is done with the DBMS_SYSTEM.SET_EV procedure:</P>
<P>&nbsp;</P>
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999>
<PRE><SPAN class=style26><SPAN class=style6></SPAN>
</SPAN>SQL&gt; desc dbms_system
...
PROCEDURE SET_EV
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SI                             BINARY_INTEGER          IN
 SE                             BINARY_INTEGER          IN
 EV                             BINARY_INTEGER          IN
 LE                             BINARY_INTEGER          IN
 NM                             VARCHAR2                IN

...

                      </PRE></TD></TR></TBODY></TABLE>
<P>The SI and SE arguments come from the v$session view (sid and serial#):</P>
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999>
<PRE><SPAN class=style26>
</SPAN><SPAN class=style25><FONT face=Verdana size=2>Query the session to trace:<BR>SQL&gt; select sid,serial#,username from v$session where username is not null;</FONT></SPAN></PRE>
<P class=style25>SID SERIAL# USERNAME<BR>---------- ---------- ------------------------------<BR>8 2041 SYS<BR>9 437 EYGLE</P>
<P class=style25><BR>Start tracing:<BR>SQL&gt; exec dbms_system.set_ev(9,437,10046,8,'eygle');</P>
<P class=style25>PL/SQL procedure successfully completed.</P>
<P class=style25>Stop tracing:<BR>SQL&gt; exec dbms_system.set_ev(9,437,10046,0,'eygle');</P>
<P><SPAN class=style25><FONT size=2>PL/SQL procedure successfully completed.<BR></FONT></SPAN></P></TD></TR></TBODY></TABLE>
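<P><STRONG>(c) Locating the trace file</STRONG><BR>The trace files generated above are written to the user_dump_dest directory. The location and file name can be obtained with the following SQL query:<BR></P>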
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999>
<PRE><SPAN class=style26>
</SPAN><SPAN class=style25><FONT face=Verdana size=2>SQL&gt; select<BR>  2    d.value||'/'||lower(rtrim(i.instance, chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name<BR>  3  from<BR>  4    ( select p.spid<BR>  5      from sys.v$mystat m,sys.v$session s,sys.v$process p<BR>  6      where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p,<BR>  7    ( select t.instance from sys.v$thread  t,sys.v$parameter  v<BR>  8      where v.name = 'thread' and (v.value = 0 or t.thread# = to_number(v.value))) i,<BR>  9    ( select value from sys.v$parameter where name = 'user_dump_dest') d<BR> 10  /</FONT></SPAN></PRE>
<P><FONT face=Verdana size=2></FONT>&nbsp;</P>
<P class=style25>TRACE_FILE_NAME<BR>--------------------------------------------------------------------------------<BR>/opt/oracle/admin/hsjf/udump/hsjf_ora_1026.trc<BR></P><PRE>&nbsp;
                        </PRE></TD></TR></TBODY></TABLE>
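<P>The query above does nothing more than concatenate three values: the user_dump_dest directory, the instance name in lower case, and the server process id (spid). As a rough sketch, the same naming rule can be written in C. build_trace_path() is a hypothetical helper for illustration, and it assumes the &lt;instance&gt;_ora_&lt;spid&gt;.trc pattern exactly as the query builds it.</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds <dump_dest>/<instance in lower case>_ora_<spid>.trc in buf.
 * Returns 0 on success, -1 if a name is too long or buf is too small. */
int build_trace_path(char *buf, size_t size, const char *dump_dest,
                     const char *instance, long spid)
{
    char lower[64];
    size_t i, len;
    int n;

    assert(buf != NULL && dump_dest != NULL && instance != NULL);

    len = strlen(instance);
    if (len >= sizeof(lower))
        return -1;

    /* Mirror lower(instance) from the query. */
    for (i = 0; i < len; i++)
        lower[i] = (char)tolower((unsigned char)instance[i]);
    lower[len] = '\0';

    n = snprintf(buf, size, "%s/%s_ora_%ld.trc", dump_dest, lower, spid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

<P>With the values from the example (/opt/oracle/admin/hsjf/udump, instance HSJF, spid 1026) this produces /opt/oracle/admin/hsjf/udump/hsjf_ora_1026.trc, the same path the query returned.</P>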
<P><BR><STRONG>(d) Reading the current session's settings</STRONG><BR>When sql_trace is set with alter session, the setting is not visible through show parameter; we need dbms_system.read_ev to read it:<BR></P>
<TABLE border=0>
<TBODY>
<TR>
<TD width=729 bgColor=#999999>
<PRE><SPAN class=style26>
</SPAN><SPAN class=style25><FONT face=Verdana size=2>SQL&gt; set feedback off<BR>SQL&gt; set serveroutput on </FONT></SPAN></PRE>
<P class=style25>SQL&gt; declare <BR>2 event_level number; <BR>3 begin <BR>4 for event_number in 10000..10999 loop <BR>5 sys.dbms_system.read_ev(event_number, event_level); <BR>6 if (event_level &gt; 0) then <BR>7 sys.dbms_output.put_line(<BR>8 'Event ' ||<BR>9 to_char(event_number) ||<BR>10 ' is set at level ' || <BR>11 to_char(event_level)<BR>12 ); <BR>13 end if; <BR>14 end loop; <BR>15 end; <BR>16 /<BR>Event 10046 is set at level 1</P>
<P></P><PRE>&nbsp;
                        </PRE></TD></TR></TBODY></TABLE>
<P>&nbsp;</P>
<P>&nbsp;</P>]]></description>
<link>http://www.eygle.com/archives/2004/10/use_sql_trace_to_diagnose_database.html</link>
<guid>http://www.eygle.com/archives/2004/10/use_sql_trace_to_diagnose_database.html</guid>
<category>Case</category>
<pubDate>Sun, 31 Oct 2004 14:29:47 +0800</pubDate>
</item>


</channel>
</rss>