Oracle Data Guard: Testing Primary Response to a Standby Failure

The customer environment runs Oracle 10.2.0.4 Data Guard, with data protected in Maximum Availability mode.

The following simple test shows how the primary and the standby respond when the standby is shut down and then restarted.

On the standby, run the following steps:

SQL> shutdown immediate;
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084400 bytes
Variable Size             385876432 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

Database altered.

SQL> exit

As soon as the standby is shut down, the primary detects the failure:

Tue Nov 17 20:34:58 2009
ARC1: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
ARC1: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
PING[ARC1]: Error 3113 when pinging standby STANDBY.
Tue Nov 17 20:35:17 2009
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Tue Nov 17 20:35:17 2009
Errors in file /opt/oracle/admin/oradbt/bdump/oradbt_lgwr_319632.trc:
ORA-03113: end-of-file on communication channel
LGWR: Network asynch I/O wait error 3113 log 1 service 'STANDBY'
Tue Nov 17 20:35:17 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 1 thread 1 sequence 3466 (3113)
Tue Nov 17 20:35:18 2009
LGWR: Closing remote archive destination LOG_ARCHIVE_DEST_2: 'STANDBY' (error 3113)
(oradbt)
Tue Nov 17 20:35:18 2009
Errors in file /opt/oracle/admin/oradbt/bdump/oradbt_lgwr_319632.trc:
ORA-01041: internal error. hostdef extension doesn't exist
LGWR: Error 1041 closing archivelog file 'STANDBY'
LGWR: Error 1041 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'STANDBY'

After the standby is restarted, the primary alert log shows:

Tue Nov 17 20:35:23 2009
Thread 1 advanced to log sequence 3467 (LGWR switch)
Current log# 2 seq# 3467 mem# 0: /redodata/oradbt/redo02.log
Tue Nov 17 20:38:34 2009
Thread 1 advanced to log sequence 3468 (LGWR switch)
Current log# 3 seq# 3468 mem# 0: /redodata/oradbt/redo03.log
Tue Nov 17 20:39:59 2009
Thread 1 cannot allocate new log, sequence 3469
Checkpoint not complete
Current log# 3 seq# 3468 mem# 0: /redodata/oradbt/redo03.log
LNSb started with pid=19, OS id=573716
Tue Nov 17 20:40:05 2009
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 3469
LGWR: Standby redo logfile selected for thread 1 sequence 3469 for destination LOG_ARCHIVE_DEST_2
Tue Nov 17 20:40:05 2009
Thread 1 advanced to log sequence 3469 (LGWR switch)
Current log# 1 seq# 3469 mem# 0: /redodata/oradbt/redo01.log
Tue Nov 17 20:40:05 2009
ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2
ARC0: Standby redo logfile selected for thread 1 sequence 3468 for destination LOG_ARCHIVE_DEST_2
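
The `Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED` and `... is SYNCHRONIZED` lines bracket the outage from the primary's point of view. The following is a minimal Python sketch, not an Oracle tool (the helper names are hypothetical): it attributes each status line to the timestamp line above it and sums how long a destination stayed unsynchronized. It assumes the English `Tue Nov 17 20:35:17 2009` timestamp format seen in this alert log.

```python
import re
from datetime import datetime

# Alert-log timestamp lines in this test look like "Tue Nov 17 20:35:17 2009".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
STATUS_RE = re.compile(r"Destination (\S+) is (UNSYNCHRONIZED|SYNCHRONIZED)")


def status_events(lines):
    """Return (timestamp, destination, status) tuples in log order.

    Each status line is attributed to the most recent timestamp line above it.
    """
    events, current_ts = [], None
    for raw in lines:
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        match = STATUS_RE.search(line)
        if match and current_ts is not None:
            events.append((current_ts, match.group(1), match.group(2)))
    return events


def unsynchronized_seconds(events, dest):
    """Total seconds `dest` spent between an UNSYNCHRONIZED line and the next SYNCHRONIZED line."""
    total, down_since = 0.0, None
    for ts, name, status in events:
        if name != dest:
            continue
        if status == "UNSYNCHRONIZED" and down_since is None:
            down_since = ts
        elif status == "SYNCHRONIZED" and down_since is not None:
            total += (ts - down_since).total_seconds()
            down_since = None
    return total


# Lines copied from the primary alert log in this test.
PRIMARY_ALERT = """\
Tue Nov 17 20:35:17 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 1 thread 1 sequence 3466 (3113)
Tue Nov 17 20:40:05 2009
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
"""

events = status_events(PRIMARY_ALERT.splitlines())
print(unsynchronized_seconds(events, "LOG_ARCHIVE_DEST_2"))  # prints 288.0
```

On this excerpt the destination stayed unsynchronized for 288 seconds (20:35:17 to 20:40:05), even though managed recovery on the standby had restarted at 20:35:30; the primary only marked the LGWR destination SYNCHRONIZED again at the 20:40:05 log switch.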

At this point the standby is back in sync as well:

Tue Nov 17 20:35:30 2009
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Tue Nov 17 20:35:30 2009
Attempt to start background Managed Standby Recovery process (oradbt)
MRP0 started with pid=19, OS id=254276
Tue Nov 17 20:35:30 2009
MRP0: Background Managed Standby Recovery process started (oradbt)
Managed Standby Recovery not using Real Time Apply
parallel recovery started with 11 processes
Tue Nov 17 20:35:35 2009
Waiting for all non-current ORLs to be archived...
Media Recovery Waiting for thread 1 sequence 3466
Tue Nov 17 20:35:36 2009
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 172068
RFS[1]: Identified database type as 'physical standby'
Tue Nov 17 20:39:58 2009
RFS LogMiner: Client disabled from further notification
RFS[1]: Archived Log: '/oradata/archive/1_3466_690213276.dbf'
RFS[1]: Archived Log: '/oradata/archive/1_3467_690213276.dbf'
Tue Nov 17 20:40:00 2009
Media Recovery Log /oradata/archive/1_3466_690213276.dbf
Media Recovery Log /oradata/archive/1_3467_690213276.dbf
Media Recovery Waiting for thread 1 sequence 3468
Tue Nov 17 20:40:05 2009
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[2]: Assigned to RFS process 1441960
RFS[2]: Identified database type as 'physical standby'
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to MAXIMUM AVAILABILITY level
RFS[2]: Successfully opened standby log 4: '/redodata/oradbt/stdrd1.log'
Tue Nov 17 20:40:05 2009
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[3]: Assigned to RFS process 1442140
RFS[3]: Identified database type as 'physical standby'
RFS[3]: Successfully opened standby log 5: '/redodata/oradbt/stdrd2.log'
Tue Nov 17 20:40:06 2009
Media Recovery Log /oradata/archive/1_3468_690213276.dbf
Media Recovery Waiting for thread 1 sequence 3469 (in transit)
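
Whether the standby has caught up can be read off the `Media Recovery Log` and `Media Recovery Waiting` lines. A minimal Python sketch with hypothetical helper names: it extracts the sequence number from each applied archived log, reports any holes, and returns the sequence MRP says it is waiting for. It assumes the names such as `1_3466_690213276.dbf` follow a thread_sequence_resetlogs-ID pattern (a `%t_%s_%r` LOG_ARCHIVE_FORMAT); that is inferred from the file names, not stated in the post.

```python
import re

# Assumed name pattern: <thread>_<sequence>_<resetlogs id>.dbf, e.g. 1_3466_690213276.dbf
APPLIED_RE = re.compile(r"Media Recovery Log \S*/(\d+)_(\d+)_(\d+)\.dbf")
WAITING_RE = re.compile(r"Media Recovery Waiting for thread (\d+) sequence (\d+)")


def applied_sequences(lines):
    """Sequence numbers of archived logs applied by MRP, in log order."""
    matches = (APPLIED_RE.search(line) for line in lines)
    return [int(m.group(2)) for m in matches if m]


def missing_sequences(sequences):
    """Sequence numbers skipped between consecutive applied logs."""
    missing = []
    for prev, cur in zip(sequences, sequences[1:]):
        missing.extend(range(prev + 1, cur))
    return missing


def waiting_for(lines):
    """Sequence named in the last 'Media Recovery Waiting' line, or None."""
    last = None
    for line in lines:
        m = WAITING_RE.search(line)
        if m:
            last = int(m.group(2))
    return last


# Lines copied from the standby alert log in this test.
STANDBY_ALERT = """\
Media Recovery Log /oradata/archive/1_3466_690213276.dbf
Media Recovery Log /oradata/archive/1_3467_690213276.dbf
Media Recovery Waiting for thread 1 sequence 3468
Media Recovery Log /oradata/archive/1_3468_690213276.dbf
Media Recovery Waiting for thread 1 sequence 3469 (in transit)
""".splitlines()

applied = applied_sequences(STANDBY_ALERT)
print(applied, missing_sequences(applied), waiting_for(STANDBY_ALERT))
```

On this excerpt, sequences 3466 to 3468 were applied with no holes and MRP is waiting for 3469, which LGWR is writing into a standby redo log. Because the MRP0 startup message shows real-time apply is not in use, 3469 will be applied only after it has been archived on the standby.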

The primary's LGWR trace file shows the background processing behind the whole sequence:

*** 2009-11-17 20:35:23.248 64165 kcrr.c
LGWR: Error 1041 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'STANDBY'
Ignoring krslcmp() detach error 1041
kcrrtsync: Standby mount ID 0xc896b692 not found
*** 2009-11-17 20:35:23.249 2342 krsl.c
No standby database destinations have been configured
as being archived by the LGWR process
This instance will operate at a reduced protection mode until
network connectivity to the standby databases is restored and
all archivelog gaps have been resolved.
*** 2009-11-17 20:38:34.160
kcrrtsync: Standby mount ID 0xc896b692 not found
*** 2009-11-17 20:38:34.160 2342 krsl.c
No standby database destinations have been configured
as being archived by the LGWR process
This instance will operate at a reduced protection mode until
network connectivity to the standby databases is restored and
all archivelog gaps have been resolved.
*** 2009-11-17 20:40:02.286
kcrrtsync: Standby mount ID 0xc896b692 not found
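
The trace message states the rule the primary follows while it runs at reduced protection. Below is a simplified Python model of that rule, not Oracle's implementation: the class and function names are hypothetical, and `RESYNCHRONIZATION` is borrowed from the level name in the standby alert log above.

```python
from enum import Enum


class ProtectionLevel(Enum):
    MAXIMUM_AVAILABILITY = "MAXIMUM AVAILABILITY"
    RESYNCHRONIZATION = "RESYNCHRONIZATION"  # reduced protection


def effective_level(standby_reachable: bool, unresolved_gaps: int) -> ProtectionLevel:
    """Simplified model of a primary configured for MAXIMUM AVAILABILITY.

    Per the LGWR trace: stay at reduced protection until network connectivity
    to the standby is restored AND all archivelog gaps have been resolved.
    """
    if standby_reachable and unresolved_gaps == 0:
        return ProtectionLevel.MAXIMUM_AVAILABILITY
    return ProtectionLevel.RESYNCHRONIZATION


# Illustrative states only; the gap counts are not read from the logs.
states = [
    ("normal operation", True, 0),
    ("standby shut down", False, 1),
    ("standby reachable, gap still open", True, 1),
    ("gap resolved", True, 0),
]
for label, reachable, gaps in states:
    print(f"{label}: {effective_level(reachable, gaps).value}")
```

The point of the model is that reconnecting alone is not enough: the primary returns to full Maximum Availability only after the archived-log gap has also been closed.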

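Oracle locates the target instance by the database's Mount ID. While the standby is down, that Mount ID no longer exists, which is why this error appears.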
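
The repeated `Standby mount ID 0xc896b692 not found` entries appear to be tied to the primary's log switches rather than to a fixed timer. The Python sketch below checks that against the timestamps copied from this test (helper names are hypothetical): it pairs each mount-ID check with the nearest log switch within a 5-second tolerance, which is an arbitrary choice. The third check at 20:40:02 falls inside the switch that stalled on `Checkpoint not complete` at 20:39:59 and completed at 20:40:05.

```python
from datetime import datetime

# LGWR trace: "kcrrtsync: Standby mount ID 0xc896b692 not found"
MOUNT_ID_CHECKS = [
    "2009-11-17 20:35:23.249",
    "2009-11-17 20:38:34.160",
    "2009-11-17 20:40:02.286",
]
# Primary alert log: "Thread 1 advanced to log sequence ..." (3467, 3468, 3469)
LOG_SWITCHES = [
    "2009-11-17 20:35:23",
    "2009-11-17 20:38:34",
    "2009-11-17 20:40:05",
]


def nearest_switch(check, switches, tolerance_s=5.0):
    """Closest log switch to `check`, or None if none falls within `tolerance_s` seconds."""
    best = min(switches, key=lambda s: abs((s - check).total_seconds()))
    return best if abs((best - check).total_seconds()) <= tolerance_s else None


checks = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in MOUNT_ID_CHECKS]
switches = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in LOG_SWITCHES]

for check in checks:
    switch = nearest_switch(check, switches)
    if switch is None:
        print(check.time(), "no log switch nearby")
    else:
        print(check.time(), "-> switch at", switch.time(),
              f"({(switch - check).total_seconds():+.3f}s)")
```
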
The LNS process is then re-initialized as follows:

*** 2009-11-17 20:40:02.286 56939 kcrr.c
Initializing NetServer[LNSb] for dest=STANDBY mode SYNC
LNSb is not running anymore.
New SYNC LNSb needs to be started
Waiting for subscriber count on LGWR-LNSb channel to go to zero
Subscriber count went to zero - time now is <11/17/2009 20:40:02>
Starting LNSb ...
Waiting for LNSb to initialize itself
*** 2009-11-17 20:40:05.330 57230 kcrr.c
Netserver LNSb [pid 573716] for mode SYNC has been initialized
Performing a channel reset to ignore previous responses
Successfully started LNSb [pid 573716] for dest STANDBY mode SYNC ocis=0x1104db358
*** 2009-11-17 20:40:05.330 57733 kcrr.c
Making upiahm request to LNSb [pid 573716]: Begin Time is <11/17/2009 20:40:02>. NET_TIMEOUT = <180> seconds
Waiting for LNSb to respond to upiahm
*** 2009-11-17 20:40:05.413 57897 kcrr.c
upiahm connect done status is 0
Receiving message from LNSb
Receiving message from LNSb
Receiving message from LNSb
*** 2009-11-17 20:40:05.609 59112 kcrr.c
Making upinbls request to LNSb (ocis 0x1104db358). Begin time is <11/17/2009 20:40:02> and NET_TIMEOUT is <180> seconds
NetServer pid:573716
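
The `***` timestamps give a rough measure of how long the LNS restart took. A minimal Python sketch with a hypothetical helper, its regular expressions written against the trace lines above: it finds the `Initializing NetServer` and `has been initialized` entries, returns the elapsed time between the timestamp headers that precede them, and collects the NET_TIMEOUT values LGWR reports.

```python
import re
from datetime import datetime

TRACE_TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
NET_TIMEOUT_RE = re.compile(r"NET_TIMEOUT (?:=|is) <(\d+)>")


def lns_startup(lines):
    """Return (seconds from 'Initializing NetServer' to 'has been initialized', NET_TIMEOUT values).

    Each event is dated by the most recent '*** <timestamp>' header above it.
    """
    current = start = ready = None
    timeouts = []
    for line in lines:
        header = TRACE_TS_RE.match(line)
        if header:
            current = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        if start is None and "Initializing NetServer" in line:
            start = current
        elif ready is None and "has been initialized" in line:
            ready = current
        timeouts.extend(int(t) for t in NET_TIMEOUT_RE.findall(line))
    elapsed = None
    if start is not None and ready is not None:
        elapsed = (ready - start).total_seconds()
    return elapsed, timeouts


# Lines copied from the primary's LGWR trace in this test.
LGWR_TRACE = """\
*** 2009-11-17 20:40:02.286 56939 kcrr.c
Initializing NetServer[LNSb] for dest=STANDBY mode SYNC
LNSb is not running anymore.
Starting LNSb ...
*** 2009-11-17 20:40:05.330 57230 kcrr.c
Netserver LNSb [pid 573716] for mode SYNC has been initialized
*** 2009-11-17 20:40:05.330 57733 kcrr.c
Making upiahm request to LNSb [pid 573716]: Begin Time is <11/17/2009 20:40:02>. NET_TIMEOUT = <180> seconds
*** 2009-11-17 20:40:05.609 59112 kcrr.c
Making upinbls request to LNSb (ocis 0x1104db358). Begin time is <11/17/2009 20:40:02> and NET_TIMEOUT is <180> seconds
""".splitlines()

print(lns_startup(LGWR_TRACE))
```

Here LNSb came up in about 3 seconds (20:40:02.286 to 20:40:05.330), well within the 180-second NET_TIMEOUT that LGWR applies to its requests to LNSb.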