eygle.com   eygle.com
eygle.com eygle
eygle.com  
 

« 汉语拼音之父周有光:中国落后惊人 没有经济奇迹 | 文摘首页 | 记一次使用闪回查询进行的数据恢复 »

Logical Standby 故障案例处理两则
modb.pro

原文引自:
http://www.banping.com/2010/01/19/temporary_table_data_guard/
http://www.banping.com/2009/09/28/logic_standby_blocked/
http://www.banping.com/2009/08/27/logic_standby_temporary_table/

新年上班第一天就忙于处理一起data guard故障,最近太忙了,今天抽空整理一下。

话说那时是2010年第一个工作日,早上来发现备份库小型机一个目录占满了:

p55a@/home/oracle> df -g

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on

/dev/fslv01       53.50      0.00  100%     1167    68% /stb_arch

查看可用的存储空间:

p55a@/home/oracle> lsvg
rootvg
backvg

p55a@/home/oracle> lsvg backvg
VOLUME GROUP:       backvg                   VG IDENTIFIER:  000221f30000d600000001141bb7ac12
VG STATE:           active                   PP SIZE:        256 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      546 (139776 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                4                        USED PPs:       546 (139776 megabytes)
OPEN LVs:           3                        QUORUM:         2
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

已经没有足够的空间可以分配,而这个/stb_arch是存放主库传过来的归档日志,很显然备库在apply这些归档时出现了阻塞,导致主库过来的归档文件不断积压,最终占满了空间。

先去主库暂停到备库的日志同步:

SQL> show parameter log_archive_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest                     string
log_archive_dest_1                   string      LOCATION=/ora_arch VALID_FOR=(
ALL_LOGFILES,ALL_ROLES) DB_UNI
QUE_NAME=erpdb
log_archive_dest_10                  string
log_archive_dest_2                   string      SERVICE=bkdb VALID_FOR=(ONLIN
E_LOGFILES,PRIMARY_ROLE) DB_UN
IQUE_NAME=bkdb

SQL> alter system set log_archive_dest_state_2=defer;

System altered.

在备库查看DBA_LOGSTDBY_EVENTS视图:

SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTS order by 1 desc;

发现有建立临时表的操作引起了阻塞:

create global temporary table sntempStoreOut090407 on commit preserve rows as
select
g.islcode,idx,g.bcode as bcode1,g.bcode
,g.scode,g.hcode,g.receivecode,g.ihcode,g.ecode
,g.gcode,g.gattr1,g.gattr2,g.gattr3,g.gattr4,
,g.qtx,g.qtp,g.qtpunit,g.qtycount,g.qtyunit,g.qty,g.iodate as iodate,g.ratifydate as ratifydate,g.stockfrom,g.piecewt,g.slstatus
,g.rmb2,upric0 as upric0
,g.rmb0 as rmb0
,g.gtcode1 as gtcode1
,g.slcode
,g.ecode2
,g.busimode1
,busimode2
,bcode as vpreapre
,g.ccode as ccode
,g.gcode as gcode0
,g.sbcode
,g.upislcode
,g.slcodex,g.etrdcode,g.busimode3,g.inoutmode
from
storeslistg g where 1 <> 1

先停止standby,跳过这个事务,kill使用原来的临时表的session,手动执行以上SQL,然后开启standby:

SQL> alter database stop logical standby apply;

Database altered.

SQL> exec dbms_logstdby.skip_transaction(9,27,994864);

PL/SQL procedure successfully completed.

SQL> alter database start logical standby apply;

这样这个引起阻塞的事务就能过去了,但是由于空间占满了,应用仍然有问题,这时可以手动转移走一些比较靠后的不会马上应用的归档,然后有足够的空间后再还原回来,要注意的就是要保证这些文件的属主是Oracle用户。

# cd /tmp/log_2010
# ls -l
total 1025568
-rw-r-----   1 root     system      1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 root     system       110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 root     system        52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 root     system        65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 root     system        56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 root     system      1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 root     system        52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 root     system        59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 root     system        57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 root     system    520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 root     system      1545216 Jan 04 08:52 2_125518_640266118.dbf
# chown oracle.oinstall *
# ls -l
total 1025568
-rw-r-----   1 oracle   oinstall    1547264 Jan 04 08:52 1_118088_640266118.dbf
-rw-r-----   1 oracle   oinstall     110080 Jan 04 08:52 1_118089_640266118.dbf
-rw-r-----   1 oracle   oinstall      52224 Jan 04 08:52 1_118090_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118091_640266118.dbf
-rw-r-----   1 oracle   oinstall      65024 Jan 04 08:52 1_118092_640266118.dbf
-rw-r-----   1 oracle   oinstall      56832 Jan 04 08:52 1_118093_640266118.dbf
-rw-r-----   1 oracle   oinstall    1138688 Jan 04 08:52 1_118094_640266118.dbf
-rw-r-----   1 oracle   oinstall      52736 Jan 04 08:52 1_118095_640266118.dbf
-rw-r-----   1 oracle   oinstall      59904 Jan 04 08:52 1_118096_640266118.dbf
-rw-r-----   1 oracle   oinstall      57344 Jan 04 08:52 1_118097_640266118.dbf
-rw-r-----   1 oracle   oinstall  520338944 Jan 04 08:52 2_125502_640266118.dbf
-rw-r-----   1 oracle   oinstall    1545216 Jan 04 08:52 2_125518_640266118.dbf
# pwd
/tmp/log_2010
# cp * /stb_arch

之后在备库的同步继续进行,然后又经历了2个临时表的手动干预后,终于腾出了足够的空间,这时可以打开主库的同步:

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

后来又发现备库出现了停滞,而并没有引起阻塞的事务,仔细查看DBA_LOGSTDBY_LOG视图发现有些归档丢失,没有传到备库,去主库找归档时发现已经删除了,这时只能从RMAN中恢复这些文件。根据缺失的sequence来找对应的RMAN备份片:

ERPDB2@/orabak/arch/transmited>rman target /

RMAN> list backup of archivelog from sequence 125504 until sequence 125554 thread 2;

List of Backup Sets
===================

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16862   1.14G      DISK        00:04:57     04-JAN-10
BP Key: 19677   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707403929_16929_1

List of Archived Logs in backup set 16862
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125504  11523639376 04-JAN-10 11523954045 04-JAN-10
. . . . . .
2    125509  11525286806 04-JAN-10 11525631638 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16864   453.25M    DISK        00:01:57     04-JAN-10
BP Key: 19679   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T130015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707404542_16931_1

List of Archived Logs in backup set 16864
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125510  11525631638 04-JAN-10 11525981043 04-JAN-10
. . . . . .
2    125522  11526443719 04-JAN-10 11526475093 04-JAN-10

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
16867   401.53M    DISK        00:01:42     04-JAN-10
BP Key: 19682   Status: AVAILABLE  Compressed: YES  Tag: TAG20100104T213015
Piece Name: /orabak/arch/ERPDB_arch_20100104_707434218_16935_1

List of Archived Logs in backup set 16867
Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
---- ------- ---------- --------- ---------- ---------
2    125523  11526475093 04-JAN-10 11526613894 04-JAN-10
. . . . . .
2    125554  11529391158 04-JAN-10 11529453386 04-JAN-10

RMAN> exit

从这里的Piece Name可以看出缺少3个备份片文件,找到这3个文件,放到RMAN备份的目的地/orabak/arch/目录下。写到这里我发现《Oracle DBA手记》书稿里一个类似的操作被我写成了放到默认的归档日志路径下,可能是晚上赶稿神志不清吧,道歉先。

然后恢复出来归档日志到指定的目录:

RMAN> run {
2> set archivelog destination to '/orabak/testarch';
3> restore archivelog from sequence 125504 until sequence 125554 thread 2;
4> }

在恢复的过程中会自动把这些归档同时传到备库,无需手动再传。

备库端归档日志全了,就能顺利的同步了。至此这个问题得到解决。这个案例和我在《Oracle DBA手记》里记录的一个案例类似,可互相参看印证。


主库传到备库的日志发生了滞留,在DBA_LOGSTDBY_LOG视图发现大量的日志没有应用,而正在APPLY的只有几条,这说明不是大事务导 致的延迟。以前经常是特殊的DDL语句导致的阻塞,而这次在DBA_LOGSTDBY_EVENTS视图并没有发现可疑的DDL操作。

尝试停止APPLY,可是却迟迟没有响应:

SQL> alter database stop logical standby apply;
^Calter database stop logical standby apply
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

这时备库的日志记录了如下的信息:

Sun Sep 27 17:30:04 2009
alter database stop logical standby apply
Sun Sep 27 17:30:04 2009
ALTER DATABASE STOP LOGICAL STANDBY APPLY
Sun Sep 27 17:30:05 2009
LOGSTDBY: Shutdown acknowledged

Sun Sep 27 17:36:12 2009
ORA-1013 signalled during: alter database stop logical standby apply...

此时并没有停掉应用,尝试ABORT停止:

SQL> alter database abort logical standby apply;

Database altered.

日志记录了如下的信息:

Sun Sep 27 20:02:43 2009
ALTER DATABASE ABORT LOGICAL STANDBY APPLY
Sun Sep 27 20:02:44 2009
LOGSTDBY: Shutdown acknowledged
LOGSTDBY Analyzer process P006 pid=39 OS id=479250 stopped
LOGSTDBY Apply process P008 pid=43 OS id=381224 stopped
LOGSTDBY Apply process P011 pid=46 OS id=270744 stopped
LOGSTDBY Apply process P018 pid=53 OS id=315748 stopped
LOGSTDBY Apply process P009 pid=44 OS id=373196 stopped
LOGSTDBY Apply process P022 pid=89 OS id=409732 stopped
LOGSTDBY Apply process P025 pid=92 OS id=467268 stopped
LOGSTDBY Apply process P023 pid=90 OS id=278898 stopped
LOGSTDBY Apply process P012 pid=47 OS id=438534 stopped
LOGSTDBY Apply process P010 pid=45 OS id=299266 stopped
LOGSTDBY Apply process P021 pid=88 OS id=176472 stopped
LOGSTDBY Apply process P019 pid=72 OS id=421888 stopped
LOGSTDBY Apply process P007 pid=41 OS id=458914 stopped
LOGSTDBY Apply process P026 pid=93 OS id=410074 stopped
LOGSTDBY Apply process P020 pid=87 OS id=291214 stopped
LOGSTDBY Apply process P024 pid=91 OS id=397640 stopped
LOGSTDBY Apply process P013 pid=48 OS id=487712 stopped
LOGSTDBY Apply process P017 pid=52 OS id=401700 stopped
LOGSTDBY Apply process P014 pid=49 OS id=405806 stopped
LOGSTDBY Apply process P015 pid=50 OS id=586094 stopped
Sun Sep 27 20:03:43 2009
LOGSTDBY status: ORA-16128: User initiated stop apply successfully completed
Sun Sep 27 20:04:49 2009
Completed: alter database abort logical standby apply

再尝试开启应用,仍然没有反应:

SQL> alter database start logical standby apply;
^Calter database start logical standby apply
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

日志显示apply engine已经在运行,看来没有办法停掉它:

Sun Sep 27 20:06:56 2009
ALTER DATABASE START LOGICAL STANDBY APPLY (banpingbkdb)
Sun Sep 27 20:06:56 2009
No optional part
Attempt to start background Logical Standby process
LSP0 started with pid=21, OS id=655544
LOGSTDBY status: ORA-16084: an apply engine is already running
Sun Sep 27 20:06:57 2009
Errors in file /u01/admin/banpingbkdb/bdump/banpingbkdb_lsp0_655544.trc:
ORA-16084: an apply engine is already running
LOGSTDBY status: ORA-16222: automatic Logical Standby retry of last action
LOGSTDBY status: ORA-16084: an apply engine is already running
Sun Sep 27 20:06:57 2009
Errors in file /u01/admin/banpingbkdb/bdump/banpingbkdb_lsp0_655544.trc:
ORA-16084: an apply engine is already running

现在stop可以了:

SQL>
SQL> alter database stop logical standby apply;

Database altered.

但是再开启仍然不行,apply engine一直在运行。没有什么好办法,重启备库的instance吧:

SQL> alter database stop logical standby apply;

Database altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

在重启的过程中也等待了很久,日志如下:

Sun Sep 27 20:24:04 2009
Starting background process EMN0
EMN0 started with pid=21, OS id=536826
Sun Sep 27 20:24:04 2009
Shutting down instance: further logons disabled
Sun Sep 27 20:24:04 2009
Stopping background process CJQ0
Sun Sep 27 20:24:04 2009
Stopping background process MMNL
Sun Sep 27 20:24:05 2009
Stopping background process MMON
Sun Sep 27 20:24:06 2009
Shutting down instance (immediate)
License high water mark = 98
Sun Sep 27 20:24:06 2009
Stopping Job queue slave processes
Sun Sep 27 20:24:06 2009
Job queue slave processes stopped
Sun Sep 27 20:29:05 2009
Active call for process 557346 user 'oracle' program 'oracle@p55a (P016)'
SHUTDOWN: waiting for active calls to complete.

似乎一直在等待这个process 557346,于是直接kill之:

# ps -ef | grep 557346
    root 606208 528518   0 20:46:00  pts/3  0:00 grep 557346
  oracle 557346      1  93   Aug 27      - 1018:44 ora_p016_banpingbkdb
# kill -9 557346

然后就顺利的关掉了:

Sun Sep 27 20:43:57 2009
MMNL absent for 1203 secs; Foregrounds taking over
Sun Sep 27 20:46:20 2009
All dispatchers and shared servers shutdown
Sun Sep 27 20:46:20 2009
Process OS id : 643398 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:21 2009
Process OS id : 696622 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:22 2009
Process OS id : 606694 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:23 2009
Process OS id : 647260 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:23 2009
Process OS id : 536932 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:24 2009
Process OS id : 372796 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:24 2009
Process OS id : 348212 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:25 2009
Process OS id : 430426 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:26 2009
Process OS id : 634944 alive after kill
Errors in file /u01/admin/banpingbkdb/udump/banpingbkdb_ora_610574.trc
Sun Sep 27 20:46:27 2009
ALTER DATABASE CLOSE NORMAL
Sun Sep 27 20:46:41 2009
SMON: disabling tx recovery
SMON: disabling cache recovery
Sun Sep 27 20:46:48 2009
Shutting down archive processes
Archiving is disabled
Sun Sep 27 20:46:53 2009
ARCH shutting down
ARC1: Archival stopped
Sun Sep 27 20:46:58 2009
ARCH shutting down
ARC0: Archival stopped
Sun Sep 27 20:46:59 2009
Thread 1 closed at log sequence 43105
Successful close of redo thread 1
Sun Sep 27 20:47:02 2009
Completed: ALTER DATABASE CLOSE NORMAL
Sun Sep 27 20:47:02 2009
ALTER DATABASE DISMOUNT
Sun Sep 27 20:47:02 2009
SUCCESS: diskgroup DATA was dismounted
Sun Sep 27 20:47:02 2009
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active

然后再启动实例:

SQL> startup
ORACLE instance started.

Total System Global Area 1.0737E+10 bytes
Fixed Size                  2090664 bytes
Variable Size            3707767128 bytes
Database Buffers         7012876288 bytes
Redo Buffers               14684160 bytes
Database mounted.
Database opened.
SQL> alter database start logical standby apply;

Database altered.

再查看日志,logic standby终于恢复了正常。

启动的日志也记录一下吧:

Sun Sep 27 20:48:13 2009
Starting ORACLE instance (normal)
Sun Sep 27 20:48:29 2009
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =121
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
  processes                = 1000
  lock_sga                 = TRUE
  __shared_pool_size       = 3623878656
  __large_pool_size        = 16777216
  __java_pool_size         = 16777216
  __streams_pool_size      = 50331648
  streams_pool_size        = 50331648
  nls_language             = AMERICAN
  nls_territory            = CHINA
  nls_length_semantics     = BYTE
  resource_manager_plan    =
  sga_target               = 10737418240
  control_files            = +DATA/banpingbkdb/controlfile/db_control1, +DATA/banpingbkdb/controlfile/db_control2, +DATA/banpingbkdb/controlfile/db_control3
  db_file_name_convert     = /dev, +DATA/banpingbkdb/datafile
  log_file_name_convert    = /dev, +DATA/banpingbkdb/onlinelog
  db_block_size            = 8192
  __db_cache_size          = 7012876288
  compatible               = 10.2.0.1.0
  log_archive_config       = DG_CONFIG=(cnderpdb, banpingbkdb)
  log_archive_dest_1       = LOCATION=/ora_arch/ VALID_FOR=(ONLINE_LOGFILE,ALL_ROLES)  DB_UNIQUE_NAME=banpingbkdb
  log_archive_dest_2       = SERVICE=cnderpdb1 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=cnderpdb
  log_archive_dest_3       = LOCATION=/stb_arch/ VALID_FOR=(STANDBY_LOGFILE,STANDBY_ROLE)  DB_UNIQUE_NAME=banpingbkdb
  log_archive_format       = %t_%s_%r.dbf
  fal_client               = banpingbkdb
  fal_server               = CNDERPDB1, CNDERPDB2
  db_file_multiblock_read_count= 16
  standby_file_management  = MANUAL
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  undo_retention           = 900
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = chinacdc.com
  dispatchers              = (PROTOCOL=TCP) (SERVICE=banpingbkdbXDB)
  job_queue_processes      = 10
  background_dump_dest     = /u01/admin/banpingbkdb/bdump
  user_dump_dest           = /u01/admin/banpingbkdb/udump
  core_dump_dest           = /u01/admin/banpingbkdb/cdump
  audit_file_dest          = /u01/admin/banpingbkdb/adump
  session_max_open_files   = 20
  db_name                  = banpingbkdb
  db_unique_name           = banpingbkdb
  open_cursors             = 300
  pga_aggregate_target     = 2147483648
  aq_tm_processes          = 0
PMON started with pid=2, OS id=639102
PSP0 started with pid=3, OS id=672098
MMAN started with pid=4, OS id=545050
DBW0 started with pid=5, OS id=655710
DBW1 started with pid=6, OS id=635230
LGWR started with pid=7, OS id=627006
CKPT started with pid=8, OS id=643408
SMON started with pid=9, OS id=610492
RECO started with pid=10, OS id=614748
CJQ0 started with pid=11, OS id=274744
MMON started with pid=12, OS id=589914
Sun Sep 27 20:48:29 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=13, OS id=696642
Sun Sep 27 20:48:29 2009
starting up 1 shared server(s) ...
Sun Sep 27 20:48:30 2009
ALTER DATABASE   MOUNT
Sun Sep 27 20:48:30 2009
Starting background process ASMB
ASMB started with pid=17, OS id=524414
Starting background process RBAL
RBAL started with pid=18, OS id=299234
Sun Sep 27 20:48:34 2009
SUCCESS: diskgroup DATA was mounted
Sun Sep 27 20:48:38 2009
Setting recovery target incarnation to 2
Using STANDBY_ARCHIVE_DEST parameter default value as /stb_arch/
Sun Sep 27 20:48:38 2009
Successful mount of redo thread 1, with mount id 1040980254
Sun Sep 27 20:48:38 2009
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sun Sep 27 20:48:38 2009
ALTER DATABASE OPEN
Sun Sep 27 20:48:39 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=467272
Sun Sep 27 20:48:39 2009
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=21, OS id=418070
Sun Sep 27 20:48:39 2009
ARC0: Becoming the 'no FAL' ARCH
Sun Sep 27 20:48:39 2009
ARC0: Becoming the 'no SRL' ARCH
Sun Sep 27 20:48:39 2009
ARC1: Becoming the heartbeat ARCH
Sun Sep 27 20:48:39 2009
Thread 1 opened at log sequence 43105
  Current log# 4 seq# 43105 mem# 0: +DATA/banpingbkdb/onlinelog/rdb_redo1_4
Successful open of redo thread 1
Sun Sep 27 20:48:39 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Sep 27 20:48:39 2009
SMON: enabling cache recovery
Sun Sep 27 20:48:40 2009
Successfully onlined Undo Tablespace 1.
Sun Sep 27 20:48:40 2009
SMON: enabling tx recovery
Sun Sep 27 20:48:40 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Completed: ALTER DATABASE OPEN
Sun Sep 27 20:50:41 2009
alter database start logical standby apply
Sun Sep 27 20:50:42 2009
ALTER DATABASE START LOGICAL STANDBY APPLY (banpingbkdb)
Sun Sep 27 20:50:42 2009
No optional part
Attempt to start background Logical Standby process
LSP0 started with pid=24, OS id=561496
Sun Sep 27 20:50:42 2009
LOGSTDBY Parameter:                        MAX_SGA = 1000
LOGSTDBY Parameter:                    MAX_SERVERS = 27
Completed: alter database start logical standby apply


今天又碰到了Logic Standby同步的问题,起因是这样的,有用户反映在备库的查询数据很旧,于是检查备库的同步信息:

SELECT FILE_NAME, SEQUENCE# AS SEQ#, FIRST_CHANGE# AS FCHANGE#,NEXT_CHANGE# AS NCHANGE#, TIMESTAMP,DICT_BEGIN AS BEG, DICT_END AS END,THREAD# AS THR# , APPLIED
FROM DBA_LOGSTDBY_LOG ORDER BY THREAD#,SEQUENCE#;

发现日志并没有积压,以为只是大事务没有完成,没什么问题,可是检查DBA_LOGSTDBY_EVENTS视图还是发现了问题:

SELECT EVENT_TIME,EVENT,XIDUSN, XIDSLT, XIDSQN FROM DBA_LOGSTDBY_EVENTS order by 1 desc;

EVENT_TIME EVENT XIDUSN XIDSLT XIDSQN

2009-8-27 10:38:55 <CLOB> 7 29 735354

<CLOB>对象记录的SQL如下:

create global temporary table sntempStoreOut1210164_3 on commit preserve rows as
select
*
from
storeslistg where 1 <> 1

看来是这个创建临时表的DDL引起了阻塞,先停止standby:

SQL> alter database stop logical standby apply;

Database altered.

尝试手动执行这个DDL:

SQL> create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
  2  select
  3  *  from
 4  banping.storeslistg g where 1 <> 1
 5  /
create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
                                     *
ERROR at line 1:
ORA-00955: name is already used by an existing object

看来这个临时表已经存在了,由于程序设计的原因,每次使用临时表的时候会判断是否存在,如果不存在则创建,旧的已有的应该是可以删掉的:

SQL> drop table banping.sntempStoreOut1210164_3;
drop table banping.sntempStoreOut1210164_3
                  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
SQL> truncate table banping.sntempStoreOut1210164_3;

Table truncated.

SQL> drop table banping.sntempStoreOut1210164_3;
drop table banping.sntempStoreOut1210164_3
                  *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table already in use

无法drop掉旧的表,只能找到使用这个表的session并杀掉:

SQL> select sid,id1,type from v$lock where
  2  id1=(select object_id from dba_objects where object_name=upper('sntempStoreOut1210164_3'))
  3  /

       SID        ID1 TY
---------- ---------- --
      1021     195370 TO
      1024     195370 TO
      1052     195370 TO
      1072     195370 TO
      1084     195370 TO
      1053     195370 TO

6 rows selected.

通过SID到v$session找到对应的serial#后就可以杀掉session了,发现这几个session都是INACTIVE状态的,估 计是以前挂在这边的了。奇怪的是这里的TYPE值是TO,不知道TO是什么类型的锁,Oracle的文档也只记录了以下用户类型锁,系统类型的锁也没有 TO:

TM - DML enqueue
TX - Transaction enqueue
UL - User supplied

手动kill session,释放对临时表的锁定:

SQL> alter system kill session '1021,1444';

System altered.

SQL> alter system kill session '1024,1116';

System altered.

SQL> alter system kill session '1052,2688';

System altered.

SQL> alter system kill session '1053,2579';

System altered.

SQL> alter system kill session '1072,473';

System altered.

SQL> alter system kill session '1084,948';

System altered.

然后就可以drop掉原来的临时表:

SQL> drop table banping.sntempStoreOut1210164_3;

Table dropped.

再手动创建新的临时表:

SQL> create global temporary table banping.sntempStoreOut1210164_3 on commit preserve rows as
  2  select
  3  *  from
 4  banping.storeslistg g where 1 <> 1
 5  /

Table created.

然后手动跳过这个事务:

SQL> exec dbms_logstdby.skip_transaction(7,29,735354);

PL/SQL procedure successfully completed.

重新开始standby:

SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

Database altered.

观察日志,一切恢复正常。logic standby麻烦的地方就是没有足够的警告日志,要人工去查找原因,当然监控也是可以通过查询一些视图做到的。



历史上的今天...
    >> 2009-02-01文章:
    >> 2006-02-01文章:
           肯将一生爱 付与师徒情

By eygle on 2011-02-01 10:30 | Comments (0) | Oracle摘 | 2725 |


CopyRight © 2004~2020 云和恩墨,成就未来!, All rights reserved.
数据恢复·紧急救援·性能优化 云和恩墨 24x7 热线电话:400-600-8755 业务咨询:010-59007017-7040 or 7037 业务合作: marketing@enmotech.com