eygle.com


Warmth in the Dead of Winter - a SUN E4500 Goes Down from Overheating

Last weekend one of our database servers, a SUN E4500, went down after a hardware fault caused it to overheat. So how hot did it get?

[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).
Overtemp shutdown started
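In this log the board keeps oscillating between 67C and 68C: each "very hot" warning schedules a shutdown, each "cooling" notice cancels it, until the final reading stays at 68C and the shutdown goes ahead. Here is a minimal Python sketch for summarizing a log like this. It assumes the messages are worded exactly as in the excerpt above, and `summarize_overtemp` is a hypothetical helper name, not part of any Solaris tool.

```python
import re

# Patterns assumed from the excerpt above; real syslog lines also carry a
# timestamp/hostname prefix, which re.search() simply skips over.
TEMP_RE = re.compile(r"(very hot|cooling|still too hot) \(temperature: (\d+)C\)")
CANCEL_MARK = "over-temperature condition cancelled"
SHUTDOWN_MARK = "Overtemp shutdown started"


def summarize_overtemp(lines):
    """Return (readings, cancelled_count, shutdown_started).

    readings is a list of (state, celsius) tuples in log order.
    """
    readings = []
    cancelled = 0
    shutdown = False
    for line in lines:
        m = TEMP_RE.search(line)
        if m:
            readings.append((m.group(1), int(m.group(2))))
        if CANCEL_MARK in line:
            cancelled += 1
        if SHUTDOWN_MARK in line:
            shutdown = True
    return readings, cancelled, shutdown


sample = [
    "WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)",
    "NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)",
    "NOTICE: System shutdown due to over-temperature condition cancelled",
    "WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).",
    "Overtemp shutdown started",
]
readings, cancelled, shutdown = summarize_overtemp(sample)
print(readings)  # [('very hot', 68), ('cooling', 67), ('still too hot', 68)]
print(cancelled, shutdown)  # 1 True
```

Counting the cancelled shutdowns shows how many times the board came within a degree of tripping before it finally did.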

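When the system shut down, the IO board's temperature had reached 68C. In the middle of a freezing winter, that is warm indeed.

After bringing the machine back up and checking it, the culprit turned out to be an IO board: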

bash-2.03# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 2048Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0    0      0      400    8.0    US-II   10.0
 0    1      1      400    8.0    US-II   10.0
 2    4      0      400    8.0    US-II   10.0
 2    5      1      400    8.0    US-II   10.0
 4    8      0      400    8.0    US-II   10.0
 4    9      1      400    8.0    US-II   10.0


========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd  Bank   MB    Status   Condition   Speed  Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 0     0    1024  Active   OK          60ns   2-way    A
 2     0    1024  Active   OK          60ns   2-way    A

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot        Name                          Model
---  ----  ----  ----------  ----------------------------  --------------------
 1   SBus   25    0          SUNW,socal/sf (scsi-3)        501-5266
 1   SBus   25    3          SUNW,hme
 1   SBus   25    3          SUNW,fas/sd (block)
 1   SBus   25   13          SUNW,socal/sf (scsi-3)        501-3060
 1   UPA   100    2          FFB, Double Buffered          SUNW,501-4790

Detached Boards
===============
Slot  State      Type    Info
----  ---------  ------  -----------------------------------------
  3   failed     disk    Disk 0: no disk  Disk 1: no disk

Failed Field Replaceable Units (FRU) in System:
==============================================
disk-board unavailable on IO board #3
PROM fault string: fail
Failed Field Replaceable Unit is IO board 3

Detected System Faults
======================
Board 1 fault: Overtemp
Detected Sat Dec 16 02:24:21 2006
Unit 2 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
Unit 1 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
PROM detected failure
Detected Fri Dec 15 23:24:23 2006

Most recent AC Power Failure:
=============================
Fri May 27 14:53:06 2005


========================= Environmental Status =========================
Keyswitch position is in Normal Mode
System Power Status: Minimum Available
System LED Status:    GREEN     YELLOW     GREEN
WARNING                ON         ON     BLINKING


Fans:
-----
Unit Status
---- ------
Rack OK
Key OK
AC OK

System Temperatures (Celsius):
------------------------------
Brd  State    Current  Min  Max  Trend
---  -------  -------  ---  ---  -----
 0   OK          39    36   43   stable
 1   WARNING     66    46   67   stable
 2   OK          39    36   43   stable
 4   OK          53    50   55   stable
CLK  OK          38    37   40   stable


Power Supplies:
---------------
Supply Status
--------- ------
0 OK
1 FAIL
2 FAIL
3 OK
PPS OK
System 3.3v OK
System 5.0v OK
Peripheral 5.0v OK
Peripheral 12v OK
Auxilary 5.0v OK
Peripheral 5.0v precharge OK
Peripheral 12v precharge OK
System 3.3v precharge OK
System 5.0v precharge OK
AC Power OK


========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd FHC AC SBus0 SBus1 PCI0 PCI1 FEPS Board Type Attributes
--- --- -- ----- ----- ---- ---- ---- ---------- ----------
0 1 5 CPU 100MHz Capable
1 1 5 1 22 UPA-SBus-SOC+ 100MHz Capable
2 1 5 CPU 100MHz Capable
3 Unknown 100MHz Capable
4 1 5 CPU 100MHz Capable

Board 1 FFB Hardware Configuration:
-----------------------------------
Board rev: 2
FBC version: 0x3241906d
DAC: Brooktree 9070, version 1
3DRAM: Mitsubishi 130b, version 2

System Board PROM revisions:
----------------------------
Board 0: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 1: FCODE 1.8.29 2001/06/18 17:26 iPOST 3.4.29 2001/06/18 17:49
Board 2: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 4: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
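In the `prtdiag -v` output it is the System Temperatures table that pinpoints board 1 (state WARNING, 66C current, 67C max). For routine checks, a short script can pull out any board whose state is not OK. This is a minimal Python sketch. It assumes the section layout shown above (a "System Temperatures" heading, dashed and header rows, one whitespace-separated row per board, and a blank line after the last row). `failing_boards` is a hypothetical helper name; its input could be the captured text of `/usr/platform/sun4u/sbin/prtdiag -v`.

```python
def failing_boards(prtdiag_text):
    """Return [(board, state, current_celsius)] for rows whose State is not OK.

    Assumes the "System Temperatures" table layout shown in prtdiag -v above.
    """
    results = []
    in_table = False
    for line in prtdiag_text.splitlines():
        if line.startswith("System Temperatures"):
            in_table = True
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields:
            break  # a blank line ends the table
        # Skip dashed separator rows and the column-header row.
        if set(line.strip()) <= set("- ") or fields[0] == "Brd":
            continue
        board, state, current = fields[0], fields[1], int(fields[2])
        if state != "OK":
            results.append((board, state, current))
    return results


sample = """System Temperatures (Celsius):
------------------------------
Brd State Current Min Max Trend
--- ------- ------- --- --- -----
0 OK 39 36 43 stable
1 WARNING 66 46 67 stable
CLK OK 38 37 40 stable

Power Supplies:
"""
print(failing_boards(sample))  # [('1', 'WARNING', 66)]
```

Splitting on whitespace rather than fixed column offsets keeps the sketch tolerant of the spacing differences seen between prtdiag outputs.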

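What's even more frustrating is that this server is in a critical operating period right now, so we can't take it down to replace the hardware.
All we can do is wait for the next time it goes down.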

-The End-



By eygle on 2006-12-19 08:50 | Comments (2) | System | 1261 |

2 Comments

Hook up a heat-powered generator and carry some of that heat away.

The 4500 is quite smart, with temperature sensors all over it. Routing an extra air-conditioning vent to the cabinet might help a bit.


CopyRight © 2004~2020 云和恩墨 (Enmotech), All rights reserved.