Handling a Disk Failure on a DataGuard Database Server

Yesterday the database on one of our PC servers ran into trouble again, and once more the cause was a disk failure.

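Both servers are low-end domestic PC servers from 联志, and their quality is really poor. First a disk failed on the standby machine, then another machine kept rebooting because of a faulty power supply module, and now the disk in this server has failed as well.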

Nov 24 10:27:48 wapcom1 kernel: attempt to access beyond end of device
Nov 24 10:27:48 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191
Nov 24 10:27:48 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir:
directory #128110 contains a hole at offset 2011258880
Nov 24 10:27:49 wapcom1 kernel: attempt to access beyond end of device
Nov 24 10:27:49 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191
Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir:
directory #128110 contains a hole at offset 2011262976
Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device
Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191
Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir:
directory #128110 contains a hole at offset 2011267072
Nov 24 10:27:50 wapcom1 kernel: attempt to access beyond end of device
Nov 24 10:27:50 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191
Nov 24 10:27:50 wapcom1 kernel: EXT3-fs error (device sd(8,8)): ext3_readdir:
directory #128110 contains a hole at offset 2011271168
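The kernel messages above pair every "attempt to access beyond end of device" with a `want`/`limit` line. As a minimal sketch (the function name `parse_beyond_end` and the regular expression are illustrative assumptions, not part of any tool used in this post), the two numbers can be pulled out of such a line to confirm that the requested block lies far outside the device:

```python
import re

# Matches the "08:08: rw=0, want=..., limit=..." part of the kernel message.
BEYOND_END = re.compile(r"(\d+:\d+): rw=(\d+), want=(\d+), limit=(\d+)")


def parse_beyond_end(line):
    """Return device, rw flag, requested block and device limit, or None."""
    match = BEYOND_END.search(line)
    if match is None:
        return None
    device, rw, want, limit = match.groups()
    return {
        "device": device,
        "rw": int(rw),
        "want": int(want),
        "limit": int(limit),
    }


# One line copied from the log above.
sample = "Nov 24 10:27:48 wapcom1 kernel: 08:08: rw=0, want=1564747716, limit=5245191"
info = parse_beyond_end(sample)

# A positive overshoot means the request pointed past the end of the device.
overshoot = info["want"] - info["limit"]
print(info["device"], "requested block exceeds limit by", overshoot)
```

A request this far beyond the limit, repeated for the same directory, is consistent with corrupted on-disk metadata rather than a one-off read error.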

Fortunately the database can be switched over to the other machine through DataGuard, with no data loss:

Thu Nov 23 18:46:18 2006
ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)
ARC0: Begin FAL archive (thread 1 sequence 6047 destination bmarksb)
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'
ARC0: Complete FAL archive (thread 1 sequence 6047 destination bmarksb)
ARC0: Begin FAL archive (thread 1 sequence 6048 destination bmarksb)
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'
Thu Nov 23 18:46:18 2006
ARC1: Complete FAL archive (thread 1 sequence 6046 destination bmarksb)
ARC1: Begin FAL archive (thread 1 sequence 6049 destination bmarksb)
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'
Thu Nov 23 18:46:18 2006
ARC0: Complete FAL archive (thread 1 sequence 6048 destination bmarksb)
Thu Nov 23 18:46:18 2006
ARC1: Complete FAL archive (thread 1 sequence 6049 destination bmarksb)
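The alert log excerpt above shows ARC0 and ARC1 shipping archived logs to the destination `bmarksb` through FAL. A rough way to check that nothing was skipped is to collect the completed sequence numbers and look for holes. The sketch below assumes the message format shown above; the helper names `completed_sequences` and `find_gaps` are hypothetical.

```python
import re

FAL_DONE = re.compile(
    r"(ARC\d+): Complete FAL archive "
    r"\(thread (\d+) sequence (\d+) destination (\w+)\)"
)


def completed_sequences(lines, destination):
    """Sorted sequence numbers whose FAL archive to `destination` completed."""
    done = []
    for line in lines:
        match = FAL_DONE.search(line)
        if match and match.group(4) == destination:
            done.append(int(match.group(3)))
    return sorted(done)


def find_gaps(sequences):
    """Sequence numbers missing between the lowest and highest one seen."""
    present = set(sequences)
    return [s for s in range(min(present), max(present) + 1) if s not in present]


# The "Complete" lines copied from the log above.
log_lines = [
    "ARC0: Complete FAL archive (thread 1 sequence 6045 destination bmarksb)",
    "ARC0: Complete FAL archive (thread 1 sequence 6047 destination bmarksb)",
    "ARC1: Complete FAL archive (thread 1 sequence 6046 destination bmarksb)",
    "ARC0: Complete FAL archive (thread 1 sequence 6048 destination bmarksb)",
    "ARC1: Complete FAL archive (thread 1 sequence 6049 destination bmarksb)",
]

sequences = completed_sequences(log_lines, "bmarksb")
gaps = find_gaps(sequences)
print("shipped:", sequences, "missing:", gaps)
```

This only inspects the primary's log; whether the standby has actually applied those logs still has to be checked on the standby itself.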

The server now in trouble is the one hosting the primary database:

SQL> select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;

DBID       NAME      PROTECTION_MODE      DATABASE_ROLE    SWITCHOVER_STATUS
---------- --------- -------------------- ---------------- ------------------
3520694939 BMARK     MAXIMUM PERFORMANCE  PRIMARY          SESSIONS ACTIVE

The standby database is currently healthy:

SQL> select dbid,name,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS from v$database;

DBID       NAME      PROTECTION_MODE      DATABASE_ROLE    SWITCHOVER_STATUS
---------- --------- -------------------- ---------------- ------------------
3520694939 BMARK     MAXIMUM PERFORMANCE  PHYSICAL STANDBY SESSIONS ACTIVE

All that is needed now is a short downtime window to perform the switchover.
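The primary reports `SWITCHOVER_STATUS` as `SESSIONS ACTIVE`, which means user sessions are still connected, so the switchover statement needs the `WITH SESSION SHUTDOWN` clause. The following is only a sketch of that decision: the function `primary_switchover_statement` is a hypothetical helper, and it handles just the two status values that permit a switchover from the primary side.

```python
PRIMARY_SWITCHOVER = "ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY"


def primary_switchover_statement(database_role, switchover_status):
    """Pick the switchover statement to run on the current primary.

    Inputs are the DATABASE_ROLE and SWITCHOVER_STATUS columns of v$database.
    """
    if database_role != "PRIMARY":
        raise ValueError("not a primary database: " + database_role)
    if switchover_status == "TO STANDBY":
        # No active sessions: the plain statement is enough.
        return PRIMARY_SWITCHOVER
    if switchover_status == "SESSIONS ACTIVE":
        # Active sessions must be disconnected as part of the switchover.
        return PRIMARY_SWITCHOVER + " WITH SESSION SHUTDOWN"
    raise ValueError("switchover not possible in status: " + switchover_status)


# Values taken from the primary's v$database output above.
statement = primary_switchover_statement("PRIMARY", "SESSIONS ACTIVE")
print(statement)
```

Any other status raises an error on purpose, since in that case the configuration should be investigated before attempting a role change.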

Switchover log from the original primary:

Fri Nov 24 11:30:43 2006
alter database commit to switchover to physical standby with session shutdown
Fri Nov 24 11:30:43 2006
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY
Fri Nov 24 11:30:43 2006
SMON: disabling tx recovery
Fri Nov 24 11:30:44 2006
Active process 26743 user 'oracle' program 'oracle@wapcom1.hawa.cn (CJQ0)'
Active process 9033 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
Active process 7655 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
...............
Active process 8944 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
Active process 29104 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
Active process 30750 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
Active process 9045 user 'oracle' program 'oracle@wapcom1.hawa.cn (TNS V1-V3)'
CLOSE: waiting for server sessions to complete.
Fri Nov 24 11:31:51 2006
CLOSE: all sessions shutdown successfully.
Fri Nov 24 11:32:09 2006
SMON: disabling cache recovery
Fri Nov 24 11:32:10 2006
Shutting down archive processes
Archiving is disabled
Fri Nov 24 11:32:10 2006
ARCH shutting down
Fri Nov 24 11:32:10 2006
ARCH shutting down
Fri Nov 24 11:32:10 2006
ARC0: Archival stopped
Fri Nov 24 11:32:10 2006
ARC1: Archival stopped
Fri Nov 24 11:32:10 2006
Thread 1 closed at log sequence 6076
Successful close of redo thread 1
Fri Nov 24 11:32:28 2006
ARCH: noswitch archival of thread 1, sequence 6076
ARCH: End-Of-Redo archival of thread 1 sequence 6076
ARCH: Evaluating archive log 3 thread 1 sequence 6076
ARCH: Beginning to archive log 3 thread 1 sequence 6076
Creating archive destination LOG_ARCHIVE_DEST_2: 'bmarksb'
Creating archive destination LOG_ARCHIVE_DEST_1: '/var/oradata/arch/1_6076.arc'
ARCH: Completed archiving log 3 thread 1 sequence 6076
ARCH: archiving is disabled due to current logfile archival
Clearing standby activation ID 3520937155 (0xd1dd3cc3)
The primary database controlfile was created using the
'MAXLOGFILES 5' clause.
The resulting standby controlfile will not have enough
available logfile entries to support an adequate number
of standby redo logfiles. Consider re-creating the
primary controlfile using 'MAXLOGFILES 8' (or larger).
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 10485760;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 10485760;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 10485760;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 10485760;
Archivelog for thread 1 sequence 6076 required for standby recovery
MRP0 started with pid=8
MRP0: Background Managed Standby Recovery process started
Media Recovery Log /var/oradata/arch/1_6076.arc
Identified end-of-REDO for thread 1 sequence 6076
Identified end-of-REDO for thread 1 sequence 6076
Media Recovery End-Of-Redo indicator encountered
Media Recovery Applied until change 194025715
MRP0: Media Recovery Complete: End-Of-REDO
Resetting standby activation ID 3520937155 (0xd1dd3cc3)
MRP0: Background Media Recovery process shutdown
Fri Nov 24 11:32:35 2006
Switchover: Complete - Database shutdown required
Completed: alter database commit to switchover to physical st
Fri Nov 24 11:32:53 2006
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 140
Fri Nov 24 11:32:53 2006
ALTER DATABASE CLOSE NORMAL
ORA-1507 signalled during: ALTER DATABASE CLOSE NORMAL...
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Fri Nov 24 11:33:14 2006
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.6.0.
System parameters with non-default values:
processes = 150
timed_statistics = TRUE
shared_pool_size = 83886080
large_pool_size = 33554432
standby_archive_dest = /var/oradata/arch
fal_server = bmarksb
fal_client = bmark
log_archive_format = %t_%s.arc
...........
CJQ0 started with pid=8
Fri Nov 24 11:33:15 2006
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=9
ARC0: Archival started
ARC1 started with pid=10
Fri Nov 24 11:33:15 2006
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Nov 24 11:33:15 2006
ARC1: Archival started
Fri Nov 24 11:33:15 2006
ARC0: Thread not mounted
Fri Nov 24 11:33:15 2006
ARC1: Thread not mounted
Fri Nov 24 11:33:22 2006
alter database mount standby database
Fri Nov 24 11:33:26 2006
Successful mount of redo thread 1, with mount id 3559140162
Fri Nov 24 11:33:26 2006
Standby Database mounted.
Completed: alter database mount standby database
Fri Nov 24 11:33:29 2006
ALTER DATABASE RECOVER managed standby database disconnect
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Fri Nov 24 11:33:34 2006
Completed: ALTER DATABASE RECOVER managed standby database d
Fri Nov 24 11:33:34 2006
Media Recovery Waiting for thread 1 seq# 6077
Media Recovery Log /var/oradata/arch/1_6077.arc
Media Recovery Waiting for thread 1 seq# 6078
Media Recovery Log /var/oradata/arch/1_6078.arc
Media Recovery Waiting for thread 1 seq# 6079
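Since the concern here was downtime, it can be useful to measure how long each phase took from the alert log timestamps. The sketch below is an illustration only: it assumes the timestamp format seen in the log above and an English locale for day and month names, and the marker strings and function names are my own choices.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Nov 24 11:30:43 2006"


def parse_ts(line):
    """Return a datetime if the line is an alert log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def event_times(lines, markers):
    """Map each marker name to the timestamp preceding its first occurrence."""
    current = None
    found = {}
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            current = ts
            continue
        for name, text in markers.items():
            if name not in found and text in line:
                found[name] = current
    return found


# Lines copied from the switchover log above.
excerpt = """Fri Nov 24 11:30:43 2006
alter database commit to switchover to physical standby with session shutdown
Fri Nov 24 11:31:51 2006
CLOSE: all sessions shutdown successfully.
Fri Nov 24 11:32:35 2006
Switchover: Complete - Database shutdown required
Fri Nov 24 11:33:29 2006
ALTER DATABASE RECOVER managed standby database disconnect
""".splitlines()

markers = {
    "start": "commit to switchover",
    "sessions_closed": "all sessions shutdown",
    "switchover_done": "Switchover: Complete",
    "mrp_started": "RECOVER managed standby",
}

times = event_times(excerpt, markers)
sessions_seconds = (times["sessions_closed"] - times["start"]).total_seconds()
switchover_seconds = (times["switchover_done"] - times["start"]).total_seconds()
total_seconds = (times["mrp_started"] - times["start"]).total_seconds()
print(sessions_seconds, switchover_seconds, total_seconds)
```

On this excerpt the switchover statement itself completes in 112 seconds, of which 68 seconds are spent waiting for sessions to close, and 166 seconds pass before managed recovery is restarted on the new standby.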

It looks like we should not buy 联志 servers again.

-The End-

12 Comments

jacky | November 28, 2006 6:08 PM

Sigh, I never expected HY to use such low-end hardware too. I thought our 宝德 machines were already low-end enough.

蛋白粉 | November 29, 2006 11:07 PM

A DIY server would be better than that.

eygle | November 30, 2006 3:54 PM

It is not a critical system, so the hardware is a bit poorer...

piner | August 29, 2007 1:17 PM

That is worse than our test environment.

wangliang | May 19, 2008 10:23 AM

Ads again, ft!!

eygle | May 19, 2008 10:26 AM

Removed!

wangliang | May 26, 2008 9:26 AM

Business is booming, so many ads!

wangliang | May 28, 2008 9:04 AM

The ads are back, FT!

skles | November 16, 2008 4:24 PM

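There is one thing I have never figured out: once DataGuard is set up, how do the front-end applications handle it? After all, the primary and the standby have different IPs and so on.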