
Warmth in the Depths of Winter: a SUN E4500 Goes Down from Overheating

Author: eygle | When reposting, please credit the article and author with a hyperlink and include this notice
Link:

Last weekend a SUN E4500 database server developed a fault, overheated, and went down. So how hot did it get?

[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 538492 kern.notice] NOTICE: System shutdown due to over-temperature condition cancelled
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 516145 kern.warning] WARNING: System shutdown scheduled in 20 seconds due to
over-temperature condition on SBus FFB SOC+ IO board 1
[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).
Overtemp shutdown started

When the system shut down, the board temperature had reached 68°C. In the cold of winter, that is really far too warm.
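
The shutdown/cancel cycle in the log above is easier to follow once the temperature readings are pulled out of the messages. Here is a minimal Python sketch, assuming the kernel messages keep the exact wording quoted above. The function name `parse_temperature_events` and the `SAMPLE` string are illustrative and not part of any Solaris tool.

```python
import re

# Matches the temperature reported in the kernel messages quoted above,
# e.g. "... IO board 1 is very hot (temperature: 68C)".
TEMP_RE = re.compile(
    r"(WARNING|NOTICE): (.+?) (is very hot|is cooling|still too hot) "
    r"\(temperature: (\d+)C\)"
)


def parse_temperature_events(log_text):
    """Return (severity, component, state, celsius) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = TEMP_RE.search(line)
        if match:
            severity, component, state, celsius = match.groups()
            events.append((severity, component, state, int(celsius)))
    return events


# Three representative lines copied from the log in this post.
SAMPLE = """\
[ID 110001 kern.warning] WARNING: SBus FFB SOC+ IO board 1 is very hot (temperature: 68C)
[ID 350302 kern.notice] NOTICE: SBus FFB SOC+ IO board 1 is cooling (temperature: 67C)
[ID 470940 kern.warning] WARNING: SBus FFB SOC+ IO board 1 still too hot (temperature: 68C).
"""

if __name__ == "__main__":
    for event in parse_temperature_events(SAMPLE):
        print(event)
```

Run against the full log, the extracted readings show the board oscillating between 67C and 68C until the final "still too hot" message triggers the shutdown.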
A check after the reboot showed that one of the IO boards was at fault:

bash-2.03# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 2048Mb

========================= CPUs =========================

                   Run    Ecache  CPU     CPU
Brd  CPU  Module   MHz    MB      Impl.   Mask
---  ---  -------  -----  ------  ------  ----
0    0    0        400    8.0     US-II   10.0
0    1    1        400    8.0     US-II   10.0
2    4    0        400    8.0     US-II   10.0
2    5    1        400    8.0     US-II   10.0
4    8    0        400    8.0     US-II   10.0
4    9    1        400    8.0     US-II   10.0


========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd  Bank   MB    Status   Condition   Speed  Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
0    0      1024  Active   OK          60ns   2-way    A
2    0      1024  Active   OK          60ns   2-way    A

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot        Name                          Model
---  ----  ----  ----------  ----------------------------  --------------------
1    SBus  25    0           SUNW,socal/sf (scsi-3)        501-5266
1    SBus  25    3           SUNW,hme
1    SBus  25    3           SUNW,fas/sd (block)
1    SBus  25    13          SUNW,socal/sf (scsi-3)        501-3060
1    UPA   100   2           FFB, Double Buffered          SUNW,501-4790

Detached Boards
===============
Slot  State      Type    Info
----  ---------  ------  -----------------------------------------
3     failed     disk    Disk 0: no disk  Disk 1: no disk

Failed Field Replaceable Units (FRU) in System:
==============================================
disk-board unavailable on IO Board #3
PROM fault string: fail
Failed Field Replaceable Unit is IO Board 3

Detected System Faults
======================
Board 1 fault: Overtemp
Detected Sat Dec 16 02:24:21 2006
Unit 2 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
Unit 1 Core Power Supply failure
Detected Fri Dec 15 23:24:23 2006
PROM detected failure
Detected Fri Dec 15 23:24:23 2006

Most recent AC Power Failure:
=============================
Fri May 27 14:53:06 2005


========================= Environmental Status =========================
Keyswitch position is in Normal Mode
System Power Status: Minimum Available
System LED Status:  GREEN  YELLOW  GREEN
WARNING             ON     ON      BLINKING


Fans:
-----
Unit Status
---- ------
Rack OK
Key OK
AC OK

System Temperatures (Celsius):
------------------------------
Brd  State    Current  Min  Max  Trend
---  -------  -------  ---  ---  -----
0    OK       39       36   43   stable
1    WARNING  66       46   67   stable
2    OK       39       36   43   stable
4    OK       53       50   55   stable
CLK  OK       38       37   40   stable


Power Supplies:
---------------
Supply Status
--------- ------
0 OK
1 FAIL
2 FAIL
3 OK
PPS OK
System 3.3v OK
System 5.0v OK
Peripheral 5.0v OK
Peripheral 12v OK
Auxilary 5.0v OK
Peripheral 5.0v precharge OK
Peripheral 12v precharge OK
System 3.3v precharge OK
System 5.0v precharge OK
AC Power OK


========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd FHC AC SBus0 SBus1 PCI0 PCI1 FEPS Board Type Attributes
--- --- -- ----- ----- ---- ---- ---- ---------- ----------
0 1 5 CPU 100MHz Capable
1 1 5 1 22 UPA-SBus-SOC+ 100MHz Capable
2 1 5 CPU 100MHz Capable
3 Unknown 100MHz Capable
4 1 5 CPU 100MHz Capable

Board 1 FFB Hardware Configuration:
-----------------------------------
Board rev: 2
FBC version: 0x3241906d
DAC: Brooktree 9070, version 1
3DRAM: Mitsubishi 130b, version 2

System Board PROM revisions:
----------------------------
Board 0: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 1: FCODE 1.8.29 2001/06/18 17:26 iPOST 3.4.29 2001/06/18 17:49
Board 2: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
Board 4: OBP 3.2.29 2001/06/18 17:28 POST 3.9.29 2001/06/18 17:50
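
The "System Temperatures" table in the prtdiag output above is the part worth watching while the hardware cannot be replaced. The following is a rough Python sketch of how that table could be parsed to flag boards that are not in the OK state. It assumes the six-column layout shown in this post (Brd, State, Current, Min, Max, Trend). The function name `parse_board_temperatures` and the `SAMPLE` text are illustrative only, and other prtdiag versions may format the table differently.

```python
def parse_board_temperatures(prtdiag_text):
    """Parse the 'System Temperatures (Celsius)' table into a list of dicts."""
    rows = []
    in_table = False
    for line in prtdiag_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("System Temperatures"):
            in_table = True
            continue
        if not in_table:
            continue
        fields = stripped.split()
        # Data rows have six fields with a numeric third column; the header
        # and dashed separator rows fail the isdigit() check and are skipped.
        if len(fields) == 6 and fields[2].isdigit():
            board, state, current, low, high, trend = fields
            rows.append({
                "board": board,
                "state": state,
                "current": int(current),
                "min": int(low),
                "max": int(high),
                "trend": trend,
            })
        elif rows and not stripped:
            break  # a blank line after the data rows ends the table
    return rows


# Table copied from the prtdiag output in this post.
SAMPLE = """\
System Temperatures (Celsius):
------------------------------
Brd State Current Min Max Trend
--- ------- ------- --- --- -----
0 OK 39 36 43 stable
1 WARNING 66 46 67 stable
2 OK 39 36 43 stable
4 OK 53 50 55 stable
CLK OK 38 37 40 stable

Power Supplies:
"""

if __name__ == "__main__":
    for row in parse_board_temperatures(SAMPLE):
        if row["state"] != "OK":
            print("board %s: %s at %dC" % (row["board"], row["state"], row["current"]))
```

In practice the text would come from running `/usr/platform/sun4u/sbin/prtdiag -v` on the server; the sketch reads a string so that it stays independent of how the output is captured.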

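Even more frustrating, this server is in a critical period of production use, so it cannot yet be restarted to replace the hardware.
All we can do is wait for the next time it goes down.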

-The End-

By eygle on 2006-12-19 08:50 | Comments (2) | Posted to Hardware


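Comments (2)

Install a thermoelectric generator to carry away some of the heat.

Posted by: anysql at December 19, 2006 9:44 AM

The 4500 is pretty smart, with temperature sensors all over it. Routing one more air-conditioning vent to the rack might help a little.

Posted by: jacky at December 19, 2006 2:37 PM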


    CopyRight © 2004 eygle.com, All rights reserved.