Case Study: inode Exhaustion Causes a "No space left on device" Error

This is a study note. The ITPUB member who asked the question has already solved the problem on their own.

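The poster described the problem as follows:
After a power outage, a test server came back up but the listener would not start, reporting this error: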
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.191.100)(PORT=1521)))
Error listening on: (ADDRESS=(PROTOCOL=ipc)(PARTIAL=yes)(QUEUESIZE=1))
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.191.100)(PORT=1521)))
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
Linux Error: 28: No space left on device

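Faced with this message, the natural first guess is that the disk is full. That kind of problem is one anyone would spot, though, and the poster did check for it: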

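First I looked at the log file, which had already reached 2 GB. I opened it and found nothing abnormal, so I assumed it had simply grown to that size over time and emptied it with cat /dev/null>listener.log. The listener still would not start and reported the same error. I then rebooted the machine, which did not help, and a check of disk space showed no problem either.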

Clearly it is not that simple. In my past experience, a shortage of system semaphores can also produce this kind of error message.
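
To rule out the semaphore possibility, one can look at the kernel's System V semaphore limits. The sketch below is an illustration only, assuming a Linux host where /proc/sys/kernel/sem holds four integers in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI; the names SemLimits, parse_sem_limits and read_sem_limits are hypothetical helpers, not part of any Oracle tool.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per semaphore set
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore sets


def parse_sem_limits(text: str) -> SemLimits:
    """Parse the four whitespace-separated integers from /proc/sys/kernel/sem."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return SemLimits(*(int(field) for field in fields))


def read_sem_limits(path: str = "/proc/sys/kernel/sem") -> SemLimits:
    """Read the current limits from the running kernel (Linux only)."""
    with open(path) as handle:
        return parse_sem_limits(handle.read())
```

semget() reports ENOSPC when SEMMNI or SEMMNS would be exceeded, which is why a semaphore shortage can surface as the same "No space left on device" text. Comparing these limits with the output of ipcs -s shows how close the system is to them.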

In this case, however, the cause the poster finally found was inode exhaustion.
On Linux, df -i shows how inodes are allocated on each filesystem:

[oracle@jumper elog]$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/hda3 131616 25376 106240 20% /
/dev/hda1 66264 35 66229 1% /boot
/dev/hda5 1048576 37166 1011410 4% /data1
/dev/hda9 294336 6020 288316 3% /home
/dev/hda6 1048576 20467 1028109 2% /opt
none 64193 1 64192 1% /dev/shm
/dev/hda8 524288 87362 436926 17% /usr
/dev/hda7 524288 1598 522690 1% /var
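
The same figures can be read programmatically. This is a minimal sketch, assuming a POSIX system where os.statvfs is available; InodeUsage and inode_usage are names made up for this example, and the percentage is a plain ratio that may differ slightly from the rounded IUse% column printed by df.

```python
import os
from typing import NamedTuple, Optional


class InodeUsage(NamedTuple):
    total: int
    used: int
    free: int
    percent_used: Optional[float]  # None when the filesystem reports no inode count


def inode_usage(path: str) -> InodeUsage:
    """Return inode totals for the filesystem that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = (100.0 * used / total) if total > 0 else None
    return InodeUsage(total, used, free, percent)


for mount in ("/", "/var"):
    if os.path.exists(mount):
        print(mount, inode_usage(mount))
```

A filesystem whose free count has dropped to 0 is the situation described in this case, even if df -h still shows plenty of free blocks.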

Once the inodes are exhausted, no new files can be created on that filesystem, and the listener may then fail to start.
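
Because the operating system returns the same errno (28, ENOSPC on Linux) whether data blocks or inodes have run out, the error text alone cannot tell the two apart. The following is a rough sketch of how the cases might be distinguished; diagnose_enospc and diagnose_path are hypothetical helper names, and the three-way classification is a simplification.

```python
import errno
import os

# On Linux, errno 28 is ENOSPC, the code behind "No space left on device".
print(errno.ENOSPC, os.strerror(errno.ENOSPC))


def diagnose_enospc(blocks_avail: int, inodes_total: int, inodes_avail: int) -> str:
    """Classify a likely cause of ENOSPC from statvfs-style counters."""
    if inodes_total > 0 and inodes_avail == 0:
        return "inodes exhausted"
    if blocks_avail == 0:
        return "blocks exhausted"
    return "neither (check other limits such as semaphores)"


def diagnose_path(path: str) -> str:
    """Apply the classification to the filesystem that contains `path`."""
    st = os.statvfs(path)
    return diagnose_enospc(st.f_bavail, st.f_files, st.f_favail)
```

For example, diagnose_path("/var") would return "inodes exhausted" on a system in the state the poster described.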
In the poster's case it was the inodes under /var that had run out. So does the Oracle listener actually need to use space under /var?
Let's run a test:

[oracle@jumper tmp]$ strace -o lsnrctl.log lsnrctl start

LSNRCTL for Linux: Version 9.2.0.4.0 - Production on 09-JUL-2007 15:45:09

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

Starting /opt/oracle/product/9.2.0/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 9.2.0.4.0 - Production
System parameter file is /opt/oracle/product/9.2.0/network/admin/listener.ora
Log messages written to /opt/oracle/product/9.2.0/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.33.11)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
umovestr: Input/output error
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 9.2.0.4.0 - Production
Start Date 09-JUL-2007 15:45:09
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security OFF
SNMP OFF
Listener Parameter File /opt/oracle/product/9.2.0/network/admin/listener.ora
Listener Log File /opt/oracle/product/9.2.0/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.33.11)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "eygle" has 1 instance(s).
Instance "eygle", status UNKNOWN, has 1 handler(s) for this service...
Service "julia" has 1 instance(s).
Instance "eygle", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Now check the trace file:

[oracle@jumper tmp]$ grep var lsnrctl.log
execve("/opt/oracle/product/9.2.0/bin/lsnrctl", ["lsnrctl", "start"], [/* 33 vars */]) = 0
connect(4, {sa_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)
access("/var/tmp/.oracle", F_OK) = 0
access("/var/tmp/.oracle/sEXTPROC", F_OK) = 0
connect(4, {sa_family=AF_UNIX, path="/var/tmp/.oracle/sEXTPROC"}, 110) = 0
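
Note that grep var also matched the execve line, only because of the "33 vars" comment. A slightly stricter filter keeps just the system calls whose quoted argument is a path under /var. This is a sketch under the assumption that the trace was written without -f, so each line starts with the syscall name and covers only lsnrctl itself; var_path_calls is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# A double-quoted string argument that starts with /var/
_VAR_PATH = re.compile(r'"(/var/[^"]*)"')


def var_path_calls(strace_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (syscall, path) pairs for lines that reference a path under /var."""
    hits = []
    for line in strace_lines:
        match = _VAR_PATH.search(line)
        if match is None:
            continue
        syscall = line.split("(", 1)[0].strip()
        hits.append((syscall, match.group(1)))
    return hits


# The five lines from the grep output above.
SAMPLE = [
    'execve("/opt/oracle/product/9.2.0/bin/lsnrctl", ["lsnrctl", "start"], [/* 33 vars */]) = 0',
    'connect(4, {sa_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)',
    'access("/var/tmp/.oracle", F_OK) = 0',
    'access("/var/tmp/.oracle/sEXTPROC", F_OK) = 0',
    'connect(4, {sa_family=AF_UNIX, path="/var/tmp/.oracle/sEXTPROC"}, 110) = 0',
]

for syscall, path in var_path_calls(SAMPLE):
    print(syscall, path)
```

On a real trace file the same function can be fed open("lsnrctl.log") directly, since a file object iterates line by line.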

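After the listener starts, two files are created in the /var/tmp/.oracle directory: one for the endpoint that serves external procedure (EXTPROC) calls and one for the listener's local endpoint: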

[oracle@jumper tmp]$ ll /var/tmp/.oracle/ |wc -l
16
[oracle@jumper tmp]$ lsnrctl stop

LSNRCTL for Linux: Version 9.2.0.4.0 - Production on 09-JUL-2007 15:46:08

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
The command completed successfully
[oracle@jumper tmp]$ ll /var/tmp/.oracle/ |wc -l
14
[oracle@jumper tmp]$ lsnrctl start

LSNRCTL for Linux: Version 9.2.0.4.0 - Production on 09-JUL-2007 15:46:13

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

Starting /opt/oracle/product/9.2.0/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 9.2.0.4.0 - Production
System parameter file is /opt/oracle/product/9.2.0/network/admin/listener.ora
Log messages written to /opt/oracle/product/9.2.0/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.33.11)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 9.2.0.4.0 - Production
Start Date 09-JUL-2007 15:46:13
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security OFF
SNMP OFF
Listener Parameter File /opt/oracle/product/9.2.0/network/admin/listener.ora
Listener Log File /opt/oracle/product/9.2.0/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.33.11)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "eygle" has 1 instance(s).
Instance "eygle", status UNKNOWN, has 1 handler(s) for this service...
Service "julia" has 1 instance(s).
Instance "eygle", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@jumper tmp]$ ll /var/tmp/.oracle/ |wc -l
16
[oracle@jumper tmp]$ ll .oracle/
total 0
srwxrwxrwx 1 oracle dba 0 Jan 18 2006 s#11126.1
srwxrwxrwx 1 oracle dba 0 Jan 3 2007 s#12200.1
srwxrwxrwx 1 oracle dba 0 Apr 24 2006 s#14328.1
srwxrwxrwx 1 oracle dba 0 Oct 20 2006 s#14420.1
srwxrwxrwx 1 oracle dba 0 May 8 2006 s#15102.1
srwxrwxrwx 1 oracle dba 0 Mar 18 2005 s#16499.1
srwxrwxrwx 1 oracle dba 0 Jul 9 15:46 s#16661.1
srwxrwxrwx 1 oracle dba 0 May 18 2006 s#21975.1
srwxrwxrwx 1 oracle dba 0 Jun 28 2005 s#23361.1
srwxrwxrwx 1 oracle dba 0 Nov 3 2006 s#27269.1
srwxrwxrwx 1 oracle dba 0 Nov 10 2006 s#4200.1
srwxrwxrwx 1 oracle dba 0 Oct 17 2006 s#6146.1
srwxrwxrwx 1 oracle dba 0 Aug 28 2006 s#6565.1
srwxrwxrwx 1 oracle dba 0 Jun 27 2006 s#9884.1
srwxrwxrwx 1 oracle dba 0 Jul 9 15:46 sEXTPROC
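
The leading s in the mode column marks these entries as Unix domain sockets. Binding such a socket to a path creates a new filesystem entry, which needs a free inode. The sketch below only illustrates that general mechanism with Python's socket module; it is not what tnslsnr runs, and bind_unix_socket and the name sDEMO are made up for the example.

```python
import os
import socket
import stat
import tempfile
from typing import Tuple


def bind_unix_socket(directory: str, name: str) -> Tuple[socket.socket, str]:
    """Bind a stream socket to directory/name, creating a socket file there."""
    path = os.path.join(directory, name)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # bind() creates the filesystem entry; with no free inodes this step
        # would be expected to fail with an OSError.
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock, path


demo_dir = tempfile.mkdtemp()
demo_sock, demo_path = bind_unix_socket(demo_dir, "sDEMO")
print(stat.S_ISSOCK(os.stat(demo_path).st_mode))  # the new entry is a socket
demo_sock.close()
print(os.path.exists(demo_path))  # closing the socket does not remove the file
os.unlink(demo_path)
os.rmdir(demo_dir)
```

Closing a socket does not remove its file, which may be why entries dated 2005 and 2006 are still present in the listing above.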

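Because the listener has to create these socket files under /var/tmp/.oracle, it cannot start once /var has no free inodes. That was the real culprit behind the listener failing to start.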
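
When a filesystem does run out of inodes, the next step is usually to find the directory holding the most entries. The poster's write-up is the authority on what happened on that server; the sketch below is just one possible way to look, with hypothetical helper names, and it counts directory entries rather than distinct inodes, so hard links are counted more than once.

```python
import os
from collections import Counter
from typing import List, Tuple


def count_entries_per_directory(root: str) -> Counter:
    """Count the direct entries (files, sockets, subdirectories) in each directory."""
    counts: Counter = Counter()
    # Unreadable directories are skipped silently via the onerror callback.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        counts[dirpath] = len(dirnames) + len(filenames)
    return counts


def top_directories(root: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n directories under root with the most direct entries."""
    return count_entries_per_directory(root).most_common(n)
```

Calling top_directories("/var") as a user with sufficient read permission would list the most crowded directories first.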

The poster's own write-up of the incident, for reference:
http://zhang41082.itpub.net/post/7167/305840

-The End-



By eygle on 2007-07-10 10:46 | Comments (1) | Case | 1493 |

1 Comment

I have run into a similar problem but never looked into it in this much detail. I admire the author's thoroughness!

