eygle.com   eygle.com
eygle.com  
 
Digest Net: February 2011 Archives

February 2011 Archives

原文链接: http://www.anysql.net/oracle/exp_param_query.html

Oracle的exp工具有一个query参数可以指定一个where条件来有条件地导出记录, 对于不经常用这个选项的人来说, 经常会遇到这样的错误:

LRM-00112: multiple values not allowed for parameter 'query'

EXP-00019: failed to process parameters, type 'EXP HELP=Y' for help
EXP-00000: Export terminated unsuccessfully

这是因为在where条件中一般都会有空格, 而命令行下就会被释成几个命令行参数, 需要用单引号或双引号将整个where条件括起来, 就可以了. 在Windows下, 如何指定query参数:

exp ... query='where deptno=10′
exp ... query='where deptno="10"'
exp ... query='where deptno"<"10'

在Solaris(C shell)下, 如何指定query参数:

exp ..... query=\"where col1 \< 1000\"
exp ..... query=\"where col1 \< '1000′\"

其他Unix平台的应当和Solaris下的一样, 我自已也经常搞错. 在上面的例子中已经说明了如何在query值中使用单引号, 因此在看完这篇后, 就应当可以写出正确的where条件了. 最好是写在一个参数文件里, 这样的话就不用注意这些了.

转引自老熊的网站,原文链接: http://www.laoxiong.net/a-recovery-case.html

本文简单记录一下最近一次数据恢复的过程。

事情的起因是,一个应用升级后,某一个操作导致一个表的几个列全部被更新为同一值(忍不住又要唠叨测试的重要性)。这样的错误居然出现在应用代码中,显然是重大的BUG。那个是罪魁祸首的SQL,UPDATE语句,其WHERE条件仅仅只有一个where 1=1。

系统的维护人员称是星期五出的错,发现出错是在星期天,也就是我恢复数据的日期,与声称的出错时间已经隔了将近2天。开始尝试用flashback query恢复数据,报ORA-01555错误,此路不通。维护人员说,星期五之前的RMAN备份已经被删除了(又是一个备份恢复策略不当地例子),使用 基于时间点的恢复也不可能了。剩下的一条路,只有使用log miner。还好归档文件还在数据库服务器上。

这套库是一套RAC数据库,由于没有人能确认操作发生在哪个节点,因此需要将一个节点下所有的归档复制到另一个节点上(如果没有足够的空间,可以使用NFS)。然后需要找到我们用于数据恢复的归档日志:

  1. set linesize 170 pagesize 10000  
  2. alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';  
  3.   
  4. col name for a30  
  5. col first_change for a10  
  6. col next_change for a10  
  7.   
  8. select max(first_time) from v$archived_log  
  9. where first_time < to_date('200909251900','yyyymmddhh24mi'); --这里的时间为错误发生时估计的最早时间。  
  10.   
  11. select sequence#,first_time,name,to_char(first_change#,'xxxxxxxx') first_change,  
  12.  to_char(next_change#,'xxxxxxxx') next_change  
  13.  from v$archived_log  
  14. where  first_time >=to_date('200909251707','yyyymmddhh24mi')  
  15. order by 2;--这里的时间为前一SQL的max(first_time)结果  
  16.   
  17.  SEQUENCE# FIRST_TIME          NAME                           FIRST_CHAN NEXT_CHANG  
  18. ---------- ------------------- ------------------------------ ---------- ----------  
  19.       4039 2009-09-25 17:07:10 /arch/db1_1_4039.arc          88ce7eff   88d1457c  
  20.       4040 2009-09-26 12:24:52 /arch/db1_1_4040.arc          88d1457c   88d1459f  
  21.       4041 2009-09-26 12:25:22 /arch/db1_1_4041.arc          88d1459f   88d156a4  
  22.       4688 2009-09-26 12:37:59 /arch/db1_2_4688.arc          88d1457f   88d1464a  
  23.       4689 2009-09-26 12:38:27 /arch/db1_2_4689.arc          88d1464a   88d1569c  
  24.       4042 2009-09-26 12:54:44 /arch/db1_1_4042.arc          88d156a4   88d157e7  
  25.       4043 2009-09-26 12:54:56 /arch/db1_1_4043.arc          88d157e7   88d1ab06  
  26.       4690 2009-09-26 13:07:47 /arch/db1_2_4690.arc          88d1569c   88d1570b  
  27.       4691 2009-09-26 13:08:00 /arch/db1_2_4691.arc          88d1570b   88d1ab09  
  28.       4044 2009-09-26 15:27:32 /arch/db1_1_4044.arc          88d1ab06   88d1ab0d  
  29.       4045 2009-09-26 15:27:35 /arch/db1_1_4045.arc          88d1ab0d   88d25091  
  30.       4692 2009-09-26 15:40:36 /arch/db1_2_4692.arc          88d1ab09   88d1ab77  
  31.       4693 2009-09-26 15:40:39 /arch/db1_2_4693.arc          88d1ab77   88d25094  
  32.       4046 2009-09-26 22:24:07 /arch/db1_1_4046.arc          88d25091   88d250db  
  33.       4047 2009-09-26 22:24:19 /arch/db1_1_4047.arc          88d250db   88d2515e  
  34.       4048 2009-09-26 22:24:29 /arch/db1_1_4048.arc          88d2515e   88d25167  
  35.       4049 2009-09-26 22:24:41 /arch/db1_1_4049.arc          88d25167   88d25cac  
  36.       4694 2009-09-26 22:37:13 /arch/db1_2_4694.arc          88d25094   88d25147  
  37.       4695 2009-09-26 22:37:25 /arch/db1_2_4695.arc          88d25147   88d2515b  
  38.       4696 2009-09-26 22:37:33 /arch/db1_2_4696.arc          88d2515b   88d2516a  
  39.       4697 2009-09-26 22:37:47 /arch/db1_2_4697.arc          88d2516a   88d25ca9  
  40.       4050 2009-09-26 22:41:57 /arch/db1_1_4050.arc          88d25cac   88d25cde  
  41.       4698 2009-09-26 22:55:01 /arch/db1_2_4698.arc          88d25ca9   88d25dcf  
  42.       4699 2009-09-26 22:55:19 /arch/db1_2_4699.arc          88d25dcf   88dbd27e  

尝试找到数据被错误更新的时间点:


  1. exec sys.dbms_logmnr.add_logfile(logfilename=>'/arch/db1_1_4038.arc');  
  2. exec sys.dbms_logmnr.add_logfile(logfilename=>'/arch/db1_1_4039.arc');  
  3.   
  4. exec sys.dbms_logmnr.start_logmnr(options=>sys.dbms_logmnr.dict_from_online_catalog);  
  5.   
  6. col sql_redo for a50  
  7.   
  8. select scn,timestamp,username,sql_redo from v$logmnr_contents  
  9. where operation='UPDATE' and upper(sql_redo) like '%TBL_FORM_FORM%'  
  10. and sql_redo like '%SGS0900021BNc10%'  --这个值是UPDATE时某一列被更新后的值,用在这里便于查找。  
  11. order by scn,timestamp;  
  12. exec sys.dbms_logmnr.end_logmnr;  

很不幸的是,没有找着需要的数据。再往后找了几个日志,也没找着。
如果一直找下去,显然会消耗比较长的时间,业务也已经停止了。不过可以用一种简单的方法来查找数据被错误更新发生的时间:一个比较大的表,通常段头后面的 那个块,也就是存储那个表的数据的第1个块,通常是很少更新的,至少当时恢复的那个表是这样一种情况。我们可以通过数据块中ITL上的事务SCN来满足我 们的要求。


  1. SQL> select tablespace_name,extent_id,file_id,block_id,blocks  
  2.      from dba_extents where owner='XXX'  
  3.      and segment_name='TBL_FORM_FORM'  
  4.      order by extent_id;  
  5.   
  6. TABLESPACE_NAME   EXTENT_ID    FILE_ID   BLOCK_ID  BLOCKS  
  7. ---------------- ---------- ---------- ---------- -------  
  8. XXXX                      0         16      25481     128  
  9. XXXX                      1         17      23433     128  
  10. XXXX                      2         18      21385     128  
  11. XXXX                      3         19      19977     128  
  12. XXXX                      4         16      23945     128  
  13. XXXX                      5         17       8585     128  
  14. XXXX                      6         18      14217     128  
  15. XXXX                      7         19      18825     128  
  16.   
  17. SQL> alter system dump datafile 16 block 25482;  
  18.   
  19. System altered.  
  20.   
  21. Start dump data blocks tsn: 4 file#: 16 minblk 25482 maxblk 25482  
  22. buffer tsn: 4 rdba: 0x0400638a (16/25482)  
  23. scn: 0x0000.88e21027 seq: 0x02 flg: 0x00 tail: 0x10270602  
  24. frmt: 0x02 chkval: 0x0000 type: 0x06=trans data  
  25. Block header dump:  0x0400638a  
  26.  Object id on Block? Y  
  27.  seg/obj: 0x40d8  csc: 0x00.88e20c40  itc: 2  flg: -  typ: 1 - DATA  
  28.      fsl: 0  fnx: 0x0 ver: 0x01  
  29.   
  30.  Itl           Xid                  Uba         Flag  Lck        Scn/Fsc  
  31. 0x01   0x0010.011.0006ed74  0x03c002a0.2f48.07  C---    0  scn 0x0000.88d7af30  
  32. 0x02   0x0012.019.000027e0  0x03c00ede.05de.42  C---    0  scn 0x0000.44e2ee39  

从上面的结果可以看到,数据块的ITL中,最新的事务其SCN为88d7af30,正处于最后一个归档日志的first_change#和 last_change#之间,即88d25dcf和88dbd27e之间,难不成这个错误是今天早上才发生的?于是我挖掘最后1个归档日志,结果发生错 误的确是发生在早上,也就是我开始进行恢复操作之前半个小时。

既然错误并没有发生太久,同时这个系统也允许一定的数据丢失,那就使用flashback query,得到UPDATE操作之前的数据即可。

  1. create table tbl_form_form_new  
  2. as select * from tbl_form_form  
  3. as of timestamp to_date('2009-09-27 09:08:00','yyyy-mm-dd hh24:mi:ss');  
  4. --当然这里也可以按SCN进行闪回。  

幸运的是,这次闪回查询成功了。看起来足够大的UNDO表空间还是有好处,至少我已经有数次用闪回查询来恢复数据。


Logical Standby 故障案例处理两则

| No Comments
原文引自:
http://www.banping.com/2010/01/19/temporary_table_data_guard/
http://www.banping.com/2009/09/28/logic_standby_blocked/
http://www.banping.com/2009/08/27/logic_standby_temporary_table/

新年上班第一天就忙于处理一起data guard故障,最近太忙了,今天抽空整理一下。

话说那时是2010年第一个工作日,早上来发现备份库小型机一个目录占满了:

p55a@/home/oracle> df -g

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/fslv01       53.50      0.00  100%     1167    68% /stb_arch

查看可用的存储空间:

p55a@/home/oracle> lsvg
rootvg
backvg

p55a@/home/oracle> lsvg backvg
VOLUME GROUP:       backvg                   VG IDENTIFIER:  000221f30000d600000001141bb7ac12
VG STATE:           active                   PP SIZE:        256 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      546 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                4                        USED PPs:       546 (139776 megabytes)
OPEN LVs:           3                        QUORUM:         2
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

已经没有足够的空间可以分配,而这个/stb_arch是存放主库传过来的归档日志,很显然备库在apply这些归档时出现了阻塞,导致主库过来的归档文件不断积压,最终占满了空间。

先去主库暂停到备库的日志同步:

SQL> show parameter log_archive_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest                     string
log_archive_dest_1                   string      LOCATION=/ora_arch VALID_FOR=(
ALL_LOGFILES,ALL_ROLES) DB_UNI
QUE_NAME=erpdb
log_archive_dest_10                  string
log_archive_dest_2                   string      SERVICE=bkdb VALID_FOR=(ONLIN
E_LOGFILES,PRIMARY_ROLE) DB_UN
IQUE_NAME=bkdb

SQL> alter system set log_archive_dest_state_2=defer;

System altered.

在备库查看DBA_LOGSTDBY_EVENTS视图:

SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTS order by 1 desc;

发现有建立临时表的操作引起了阻塞:

create global temporary table sntempStoreOut090407 on commit preserve rows as
select
g.islcode,idx,g.bcode as bcode1,g.bcode
,g.scode,g.hcode,g.receivecode,g.ihcode,g.ecode
,g.gcode,g.gattr1,g.gattr2,g.gattr3,g.gattr4,
,g.qtx,g.qtp,g.qtpunit,g.qtycount,g.qtyunit,g.qty,g.iodate as iodate,g.ratifydate as ratifydate,g.stockfrom,g.piecewt,g.slstatus
,g.rmb2,upric0 as upric0
,g.rmb0 as rmb0
,g.gtcode1 as gtcode1
,g.slcode
,g.ecode2
,g.busimode1
,busimode2
,bcode as vpreapre
,g.ccode as ccode
,g.gcode as gcode0
,g.sbcode
,g.upislcode
,g.slcodex,g.etrdcode,g.busimode3,g.inoutmode
from
storeslistg g where 1 <> 1

先停止standby,跳过这个事务,kill使用原来的临时表的session,手动执行以上SQL,然后开启standby:

SQL> alter database stop logical standby apply;

Database altered.

SQL> exec dbms_logstdby.skip_transaction(9,27,994864);

PL/SQL procedure successfully completed.

SQL> alter database start logical standby apply;

这样这个引起阻塞的事务就能过去了,但是由于空间占满了,应用仍然有问题,这时可以手动转移走一些比较靠后的不会马上应用的归档,然后有足够的空间后再还原回来,要注意的就是要保证这些文件的属主是Oracle用户。

# cd /tmp/log_2010
# ls -l
total 1025568
-rw-r-----   1 root     system      1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 root     system       110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 root     system        52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 root     system        65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 root     system        56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 root     system      1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 root     system        59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 root     system        57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 root     system    520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 root     system      1545216 Jan 04 08:52 2_125518_640266118.dbf
# chown oracle.oinstall *
# ls -l
total 1025568
-rw-r-----   1 oracle   oinstall    1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 oracle   oinstall     110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 oracle   oinstall      52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 oracle   oinstall      65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 oracle   oinstall      56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 oracle   oinstall    1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 oracle   oinstall      59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 oracle   oinstall      57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 oracle   oinstall  520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 oracle   oinstall    1545216 Jan 04 08:52 2_125518_640266118.dbf
# pwd
/tmp/log_2010
# cp * /stb_arch

之后在备库的同步继续进行,然后又经历了2个临时表的手动干预后,终于腾出了足够的空间,这时可以打开主库的同步:

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

后来又发现备库出现了停滞,而并没有引起阻塞的事务,仔细查看DBA_LOGSTDBY_LOG视图发现有些归档丢失,没有传到备库,去主库找归档时发现已经删除了,这时只能从RMAN中恢复这些文件。根据缺失的sequence来找对应的RMAN备份片:

ERPDB2@/orabak/arch/transmited>rman target /

RMAN> list backup of archivelog from sequence 125504 until sequence 125554 thread 2;

List of Backup Sets
===================

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16862   1.14G      DISK        00:04:57     04-JAN-10
BP Key: 19677   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707403929_16929_1

List of Archived Logs in backup set 16862
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125504  11523639376 04-JAN-10 11523954045 04-JAN-10
. . . . . .
2    125509  11525286806 04-JAN-10 11525631638 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16864   453.25M    DISK        00:01:57     04-JAN-10
BP Key: 19679   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707404542_16931_1

List of Archived Logs in backup set 16864
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125510  11525631638 04-JAN-10 11525981043 04-JAN-10
. . . . . .
2    125522  11526443719 04-JAN-10 11526475093 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16867   401.53M    DISK        00:01:42     04-JAN-10
BP Key: 19682   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T213015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707434218_16935_1

List of Archived Logs in backup set 16867
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125523  11526475093 04-JAN-10 11526613894 04-JAN-10
. . . . . .
2    125554  11529391158 04-JAN-10 11529453386 04-JAN-10

RMAN> exit

从这里的Piece Name可以看出缺少3个备份片文件,找到这3个文件,放到RMAN备份的目的地/orabak/arch/目录下。写到这里我发现《Oracle DBA手记》书稿里一个类似的操作被我写成了放到默认的归档日志路径下,可能是晚上赶稿神志不清吧,道歉先。

然后恢复出来归档日志到指定的目录:

RMAN> run {
2> set archivelog destination to '/orabak/testarch';
3> restore archivelog from sequence 125504 until sequence 125554 thread 2;
4> }

在恢复的过程中会自动把这些归档同时传到备库,无需手动再传。

备库端归档日志全了,就能顺利的同步了。至此这个问题得到解决。这个案例和我在《Oracle DBA手记》里记录的一个案例类似,可互相参看印证。


主库传到备库的日志发生了滞留,在DBA_LOGSTDBY_LOG视图发现大量的日志没有应用,而正在APPLY的只有几条,这说明不是大事务导 致的延迟。以前经常是特殊的DDL语句导致的阻塞,而这次在DBA_LOGSTDBY_EVENTS视图并没有发现可疑的DDL操作。

尝试停止APPLY,可是却迟迟没有响应:

SQL> alter database stop logical standby apply;
^Calter database stop logical standby apply
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

这时备库的日志记录了如下的信息:

Sun Sep 27 17:30:04 2009
alter database stop logical standby apply
Sun Sep 27 17:30:04 2009
ALTER DATABASE STOP LOGICAL STANDBY APPLY
Sun Sep 27 17:30:05 2009
LOGSTDBY: Shutdown acknowledged

Sun Sep 27 17:36:12 2009
ORA-1013 signalled during: alter database stop logical standby apply...

此时并没有停掉应用,尝试ABORT停止:

SQL> alter database abort logical standby apply;

Database altered.

日志记录了如下的信息:

Sun Sep 27 20:02:43 2009
ALTER DATABASE ABORT LOGICAL STANDBY APPLY
Sun Sep 27 20:02:44 2009
LOGSTDBY: Shutdown acknowledged
LOGSTDBY Analyzer process P006 pid=39 OS id=479250 stopped
LOGSTDBY Apply process P008 pid=43 OS id=381224 stopped
LOGSTDBY Apply process P011 pid=46 OS id=270744 stopped
LOGSTDBY Apply process P018 pid=53 OS id=315748 stopped
LOGSTDBY Apply process P009 pid=44 OS id=373196 stopped
LOGSTDBY Apply process P022 pid=89 OS id=409732 stopped
LOGSTDBY Apply process P025 pid=92 OS id=467268 stopped
LOGSTDBY Apply process P023 pid=90 OS id=278898 stopped
LOGSTDBY Apply process P012 pid=47 OS id=438534 stopped
LOGSTDBY Apply process P010 pid=45 OS id=299266 stopped
LOGSTDBY Apply process P021 pid=88 OS id=176472 stopped
LOGSTDBY Apply process P019 pid=72 OS id=421888 stopped
LOGSTDBY Apply process P007 pid=41 OS id=458914 stopped
LOGSTDBY Apply process P026 pid=93 OS id=410074 stopped
LOGSTDBY Apply process P020 pid=87 OS id=291214 stopped
LOGSTDBY Apply process P024 pid=91 OS id=397640 stopped
LOGSTDBY Apply process P013 pid=48 OS id=487712 stopped
LOGSTDBY Apply process P017 pid=52 OS id=401700 stopped
LOGSTDBY Apply process P014 pid=49 OS id=405806 stopped
LOGSTDBY Apply process P015 pid=50 OS id=586094 stopped
Sun Sep 27 20:03:43 2009
LOGSTDBY status: ORA-16128: User initiated stop apply successfully completed
Sun Sep 27 20:04:49 2009
Completed: alter database abort logical standby apply

再尝试开启应用,仍然没有反应:

SQL> alter database start logical standby apply;
^Calter database start logical standby apply
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

日志显示apply engine已经在运行,看来没有办法停掉它:

Sun Sep 27 20:06:56 2009
ALTER DATABASE START LOGICAL STANDBY APPLY (banpingbkdb)
Sun Sep 27 20:06:56 2009
No optional part
Attempt to start background Logical Standby process
LSP0 started with pid=21, OS id=655544
LOGSTDBY status: ORA-16084: an apply engine is already running
Sun Sep 27 20:06:57 2009
Errors in file /u01/admin/banpingbkdb/bdump/banpingbkdb_lsp0_655544.trc:
ORA-16084: an apply engine is already running
LOGSTDBY status: ORA-16222: automatic Logical Standby retry of last action
LOGSTDBY status: ORA-16084: an apply engine is already running
Sun Sep 27 20:06:57 2009
Errors in file /u01/admin/banpingbkdb/bdump/banpingbkdb_lsp0_655544.trc:
ORA-16084: an apply engine is already running

现在stop可以了:

SQL>
SQL> alter database stop logical standby apply;

Database altered.

但是再开启仍然不行,apply engine一直在运行。没有什么好办法,重启备库的instance吧:

SQL> alter database stop logical standby apply;

Database altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

在重启的过程中也等待了很久,日志如下:

Sun Sep 27 20:24:04 2009
Starting background process EMN0
EMN0 started with pid=21, OS id=536826
Sun Sep 27 20:24:04 2009
Shutting down instance: further logons disabled
Sun Sep 27 20:24:04 2009
Stopping background process CJQ0
Sun Sep 27 20:24:04 2009
Stopping background process MMNL
Sun Sep 27 20:24:05 2009
Stopping background process MMON
Sun Sep 27 20:24:06 2009
Shutting down instance (immediate)
License high water mark = 98
Sun Sep 27 20:24:06 2009
Stopping Job queue slave processes
Sun Sep 27 20:24:06 2009
Job queue slave processes stopped
Sun Sep 27 20:29:05 2009
Active call for process 557346 user 'oracle' program 'oracle@p55a (P016)'
SHUTDOWN: waiting for active calls to complete.

似乎一直在等待这个process 557346,于是直接kill之:

# ps -ef | grep 557346
    root 606208 528518   0 20:46:00  pts/3  0:00 grep 557346
  oracle 557346      1  93   Aug 27      - 1018:44 ora_p016_banpingbkdb
# kill -9 557346

然后就顺利的关掉了:

Sun Sep 27 20:43:57 2009
MMNL absent for 1203 secs; Foregrounds taking over
Sun Sep 27 20:46:20 2009
All dispatchers and shared servers shutdown
Sun Sep 27 20:46:20 2009
Process OS id : 643398 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:21 2009
Process OS id : 696622 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:22 2009
Process OS id : 606694 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:23 2009
Process OS id : 647260 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:23 2009
Process OS id : 536932 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:24 2009
Process OS id : 372796 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:24 2009
Process OS id : 348212 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:25 2009
Process OS id : 430426 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:26 2009
Process OS id : 634944 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:27 2009
ALTER DATABASE CLOSE NORMAL
Sun Sep 27 20:46:41 2009
SMON: disabling tx recovery
SMON: disabling cache recovery
Sun Sep 27 20:46:48 2009
Shutting down archive processes
Archiving is disabled
Sun Sep 27 20:46:53 2009
ARCH shutting down
ARC1: Archival stopped
Sun Sep 27 20:46:58 2009
ARCH shutting down
ARC0: Archival stopped
Sun Sep 27 20:46:59 2009
Thread 1 closed at log sequence 43105
Successful close of redo thread 1
Sun Sep 27 20:47:02 2009
Completed: ALTER DATABASE CLOSE NORMAL
Sun Sep 27 20:47:02 2009
ALTER DATABASE DISMOUNT
Sun Sep 27 20:47:02 2009
SUCCESS: diskgroup DATA was dismounted
Sun Sep 27 20:47:02 2009
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active

然后再启动实例:

SQL> startup
ORACLE instance started.

Total System Global Area 1.0737E+10 bytes
Fixed Size                  2090664 bytes
Variable Size            3707767128 bytes
Database Buffers         7012876288 bytes
Redo Buffers               14684160 bytes
Database mounted.
Database opened.
SQL> alter database start logical standby apply;

Database altered.

再查看日志,logic standby终于恢复了正常。

启动的日志也记录一下吧:

Sun Sep 27 20:48:13 2009
Starting ORACLE instance (normal)
Sun Sep 27 20:48:29 2009
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =121
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
  processes                = 1000
  lock_sga                 = TRUE
  __shared_pool_size       = 3623878656
  __large_pool_size        = 16777216
  __java_pool_size         = 16777216
  __streams_pool_size      = 50331648
  streams_pool_size        = 50331648
  nls_language             = AMERICAN
  nls_territory            = CHINA
  nls_length_semantics     = BYTE
  resource_manager_plan    =
  sga_target               = 10737418240
  control_files            = +DATA/banpingbkdb/controlfile/db_control1, +DATA/banpingbkdb/controlfile/db_control2, +DATA/banpingbkdb/controlfile/db_control3
  db_file_name_convert     = /dev, +DATA/banpingbkdb/datafile
  log_file_name_convert    = /dev, +DATA/banpingbkdb/onlinelog
  db_block_size            = 8192
  __db_cache_size          = 7012876288
  compatible               = 10.2.0.1.0
  log_archive_config       = DG_CONFIG=(cnderpdb, banpingbkdb)
  log_archive_dest_1       = LOCATION=/ora_arch/ VALID_FOR=(ONLINE_LOGFILE,ALL_ROLES)  DB_UNIQUE_NAME=banpingbkdb
  log_archive_dest_2       = SERVICE=cnderpdb1 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=cnderpdb
  log_archive_dest_3       = LOCATION=/stb_arch/ VALID_FOR=(STANDBY_LOGFILE,STANDBY_ROLE)  DB_UNIQUE_NAME=banpingbkdb
  log_archive_format       = %t_%s_%r.dbf
  fal_client               = banpingbkdb
  fal_server               = CNDERPDB1, CNDERPDB2
  db_file_multiblock_read_count= 16
  standby_file_management  = MANUAL
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  undo_retention           = 900
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = chinacdc.com
  dispatchers              = (PROTOCOL=TCP) (SERVICE=banpingbkdbXDB)
  job_queue_processes      = 10
  background_dump_dest     = /u01/admin/banpingbkdb/bdump
  user_dump_dest           = /u01/admin/banpingbkdb/udump
  core_dump_dest           = /u01/admin/banpingbkdb/cdump
  audit_file_dest          = /u01/admin/banpingbkdb/adump
  session_max_open_files   = 20
  db_name                  = banpingbkdb
  db_unique_name           = banpingbkdb
  open_cursors             = 300
  pga_aggregate_target     = 2147483648
  aq_tm_processes          = 0
PMON started with pid=2, OS id=639102
PSP0 started with pid=3, OS id=672098
MMAN started with pid=4, OS id=545050
DBW0 started with pid=5, OS id=655710
DBW1 started with pid=6, OS id=635230
LGWR started with pid=7, OS id=627006
CKPT started with pid=8, OS id=643408
SMON started with pid=9, OS id=610492
RECO started with pid=10, OS id=614748
CJQ0 started with pid=11, OS id=274744
MMON started with pid=12, OS id=589914
Sun Sep 27 20:48:29 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=13, OS id=696642
Sun Sep 27 20:48:29 2009
starting up 1 shared server(s) ...
Sun Sep 27 20:48:30 2009
ALTER DATABASE   MOUNT
Sun Sep 27 20:48:30 2009
Starting background process ASMB
ASMB started with pid=17, OS id=524414
Starting background process RBAL
RBAL started with pid=18, OS id=299234
Sun Sep 27 20:48:34 2009
SUCCESS: diskgroup DATA was mounted
Sun Sep 27 20:48:38 2009
Setting recovery target incarnation to 2
Using STANDBY_ARCHIVE_DEST parameter default value as /stb_arch/
Sun Sep 27 20:48:38 2009
Successful mount of redo thread 1, with mount id 1040980254
Sun Sep 27 20:48:38 2009
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sun Sep 27 20:48:38 2009
ALTER DATABASE OPEN
Sun Sep 27 20:48:39 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=467272
Sun Sep 27 20:48:39 2009
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=21, OS id=418070
Sun Sep 27 20:48:39 2009
ARC0: Becoming the 'no FAL' ARCH
Sun Sep 27 20:48:39 2009
ARC0: Becoming the 'no SRL' ARCH
Sun Sep 27 20:48:39 2009
ARC1: Becoming the heartbeat ARCH
Sun Sep 27 20:48:39 2009
Thread 1 opened at log sequence 43105
  Current log# 4 seq# 43105 mem# 0: +DATA/banpingbkdb/onlinelog/rdb_redo1_4
Successful open of redo thread 1
Sun Sep 27 20:48:39 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Sep 27 20:48:39 2009
SMON: enabling cache recovery
Sun Sep 27 20:48:40 2009
Successfully onlined Undo Tablespace 1.
Sun Sep 27 20:48:40 2009
SMON: enabling tx recovery
Sun Sep 27 20:48:40 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Completed: ALTER DATABASE OPEN
Sun Sep 27 20:50:41 2009
alter database start logical standby apply
Sun Sep 27 20:50:42 2009
ALTER DATABASE START LOGICAL STANDBY APPLY (banpingbkdb)
Sun Sep 27 20:50:42 2009
No optional part
Attempt to start background Logical Standby process
LSP0 started with pid=24, OS id=561496
Sun Sep 27 20:50:42 2009
LOGSTDBY Parameter:                        MAX_SGA = 1000
LOGSTDBY Parameter:                    MAX_SERVERS = 27
Completed: alter database start logical standby apply


今天又碰到了Logic Standby同步的问题,起因是这样的,有用户反映在备库的查询数据很旧,于是检查备库的同步信息:

SELECT FILE_NAME, SEQUENCE# AS SEQ#, FIRST_CHANGE# AS FCHANGE#,NEXT_CHANGE# AS NCHANGE#, TIMESTAMP,DICT_BEGIN AS BEG, DICT_END AS END,THREAD# AS THR# , APPLIED
FROM DBA_LOGSTDBY_LOG ORDER BY THREAD#,SEQUENCE#;

发现日志并没有积压,以为只是大事务没有完成,没什么问题,可是检查DBA_LOGSTDBY_EVENTS视图还是发现了问题:

SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTS order by 1 desc;

EVENT_TIME EVENT XIDUSN XIDSLT XIDSQN

2009-8-27 10:38:55 <CLOB> 7 29 735354

<CLOB>对象记录的SQL如下:

create global temporary table sntempStoreOut1210164_3 on commit preserve rows as
select
*
from
storeslistg where 1 <> 1

看来是这个创建临时表的DDL引起了阻塞,先停止standby:

SQL> alter database stop logical standby apply;

Database altered.

尝试手动执行这个DDL:

SQL> create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
  2  select
  3  *  from
 4  banping.storeslistg g where 1 <> 1
 5  /
create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
                                     *
ERROR at line 1:
ORA-00955: name is already used by an existing object

看来这个临时表已经存在了,由于程序设计的原因,每次使用临时表的时候会判断是否存在,如果不存在则创建,旧的已有的应该是可以删掉的:

SQL> drop table banping.sntempStoreOut1210164_3;
drop table banping.sntempStoreOut1210164_3
                  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
SQL> truncate table banping.sntempStoreOut1210164_3;

Table truncated.

SQL> drop table banping.sntempStoreOut1210164_3;
drop table banping.sntempStoreOut1210164_3
                  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use

无法drop掉旧的表,只能找到使用这个表的session并杀掉:

SQL> select sid,id1,type from v$lock where
  2  id1=(select object_id from dba_objects where object_name=upper('sntempStoreOut1210164_3'))
  3  /

       SID        ID1 TY
---------- ---------- --
      1021     195370 TO
      1024     195370 TO
      1052     195370 TO
      1072     195370 TO
      1084     195370 TO
      1053     195370 TO

6 rows selected.

通过SID到v$session找到对应的serial#后就可以杀掉session了,发现这几个session都是INACTIVE状态的,估 计是以前挂在这边的了。奇怪的是这里的TYPE值是TO,不知道TO是什么类型的锁,Oracle的文档也只记录了以下用户类型锁,系统类型的锁也没有 TO:

TM - DML enqueue
TX - Transaction enqueue
UL - User supplied

手动kill session,释放对临时表的锁定:

SQL> alter system kill session '1021,1444';

System altered.

SQL> alter system kill session '1024,1116';

System altered.

SQL> alter system kill session '1052,2688';

System altered.

SQL> alter system kill session '1053,2579';

System altered.

SQL> alter system kill session '1072,473';

System altered.

SQL> alter system kill session '1084,948';

System altered.

然后就可以drop掉原来的临时表:

SQL> drop table banping.sntempStoreOut1210164_3;

Table dropped.

再手动创建新的临时表:

SQL> create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
  2  select
  3  *  from
 4  banping.storeslistg g where 1 <> 1
 5  /

Table created.

然后手动跳过这个事务:

SQL> exec dbms_logstdby.skip_transaction(7,29,735354);

PL/SQL procedure successfully completed.

重新开始standby:

SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

Database altered.

观察日志,一切恢复正常。logic standby麻烦的地方就是没有足够的警告日志,要人工去查找原因,当然监控也是可以通过查询一些视图做到的。


About this Archive

This page is an archive of entries from February 2011 listed from newest to oldest.

January 2011 is the previous archive.

March 2011 is the next archive.

回到 首页 查看最近文章或者查看所有归档文章.