A Case of Mutex Contention in Oracle 10gR2
Author: eygle | Please credit the source when reposting
Link: https://www.eygle.com/archives/2008/10/oracle10gr2_mute_bug.html
Recently a customer ran into Mutex contention on Oracle 10gR2.
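Mutex is a serialization mechanism that Oracle introduced in Oracle 10g, and it is gradually being used to replace some Latches that have performance problems.
Compared with a Latch, a Mutex Get takes only about 30~35 instructions, whereas a Latch Get takes about 150~200 instructions. In terms of size, each Mutex occupies only about 16 bytes, while a latch in 10gR2 occupies about 112 bytes.
Mutexes first replaced the Library Cache Latch and the Library Cache Pin. On Oracle 10.2.0.2, the hidden parameter _kks_use_mutex_pin controls whether the Mutex mechanism is used to implement Cursor Pin: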
SQL> set linesize 120
SQL> col name for a30
SQL> col value for a20
SQL> col describ for a60
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.indx = y.indx
4 AND x.ksppinm LIKE '%&par%'
5 /
Enter value for par: mutex
old 4: AND x.ksppinm LIKE '%&par%'
new 4: AND x.ksppinm LIKE '%mutex%'
NAME                           VALUE                DESCRIB
------------------------------ -------------------- ------------------------------------------------------------
_kks_use_mutex_pin             TRUE                 Turning on this will make KKS use mutex for cursor pins.
Under the new Mutex Pins mechanism, the following wait events may become common:
cursor: mutex S
cursor: mutex X
cursor: pin S
cursor: pin S wait on X
cursor: pin X
Because Mutex relies on the CAS (Compare and Swap) mechanism, CPU consumption can become excessive on HP Unix platforms that do not support CAS.
This was fixed as a bug in version 10.2.0.4.
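To make the CAS idea concrete, here is a minimal Python sketch of a mutex get built on compare-and-swap. It is not Oracle's implementation. Python has no hardware CAS, so the hypothetical AtomicWord class emulates one with a threading.Lock, and the names mutex_get, mutex_release, FREE and max_spins are all assumptions made for illustration. The point is only that a get is a retry loop around one atomic compare-and-swap, so a failed get spins and burns CPU.

```python
import threading


class AtomicWord:
    """Emulates a machine word that supports compare-and-swap.

    Real hardware does this in one instruction; here a threading.Lock
    stands in for that atomicity (an assumption made for illustration).
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Set the word to `new` only if it currently equals `expected`."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._guard:
            return self._value


FREE = 0  # hypothetical marker for "nobody holds the mutex"


def mutex_get(word, holder_id, max_spins=1000):
    """Spin on CAS(FREE -> holder_id); return attempts used, or None on give-up."""
    for attempt in range(1, max_spins + 1):
        if word.compare_and_swap(FREE, holder_id):
            return attempt
    return None


def mutex_release(word, holder_id):
    """Release only succeeds for the current holder."""
    return word.compare_and_swap(holder_id, FREE)


word = AtomicWord()
first = mutex_get(word, holder_id=101)                 # uncontended get
second = mutex_get(word, holder_id=202, max_spins=50)  # spins, then gives up
released = mutex_release(word, 101)
third = mutex_get(word, holder_id=202)                 # succeeds after release
```

In this sketch the second caller uses all 50 spins without making progress, which is the kind of wasted CPU that becomes expensive when the compare-and-swap itself has to be emulated by something heavier.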
By eygle on 2008-10-14 09:25 | Comments (5) | Case | 2058 |
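eygle, could you explain this new mutex mechanism in a bit more detail?
Also, regarding "a Mutex Get takes only about 30~35 instructions, whereas a Latch Get takes about 150~200 instructions": what kind of "instructions" are these? Doesn't a Latch Get need just a single "TAS" (on non-AIX platforms)?
At the implementation level a Latch is acquired through a hardware instruction, but a Latch Get is not finished the moment the Latch is acquired. The life cycle of a Latch Get also includes operations such as timing and release, and at the software level many more instructions have to work together.
The following is a trace (system call output) of a simple Latch Get and Latch Free on 9i under Linux: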
"\0\200\0\0\6\0\0\0\0\0\3V%\0\0\0\0\0\0\0\0\200H\377\277"..., 2064) = 128
gettimeofday({1224639491, 830852}, NULL) = 0
gettimeofday({1224639491, 831014}, NULL) = 0
gettimeofday({1224639491, 831161}, NULL) = 0
gettimeofday({1224639491, 831300}, NULL) = 0
futex(0xb71ecfd0, FUTEX_WAKE, 2147483647) = 0
rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
gettimeofday({1224639491, 833380}, NULL) = 0
gettimeofday({1224639491, 833512}, NULL) = 0
gettimeofday({1224639491, 833641}, NULL) = 0
gettimeofday({1224639491, 833772}, NULL) = 0
gettimeofday({1224639491, 833912}, NULL) = 0
gettimeofday({1224639491, 834040}, NULL) = 0
gettimeofday({1224639491, 834175}, NULL) = 0
gettimeofday({1224639491, 834305}, NULL) = 0
gettimeofday({1224639491, 834475}, NULL) = 0
gettimeofday({1224639491, 834604}, NULL) = 0
gettimeofday({1224639491, 834731}, NULL) = 0
gettimeofday({1224639491, 834859}, NULL) = 0
write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
read(7, "\0x\0\0\6\0\0\0\0\0\3V&\0\0\0\0\0\0\0\0\200H\377\277\3"..., 2064) = 120
gettimeofday({1224639534, 324253}, NULL) = 0
gettimeofday({1224639534, 324411}, NULL) = 0
gettimeofday({1224639534, 324559}, NULL) = 0
gettimeofday({1224639534, 324696}, NULL) = 0
rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
semctl(1146880, 9, IPC_64|SETVAL, 0xbfff980c) = 0
gettimeofday({1224639534, 328215}, NULL) = 0
gettimeofday({1224639534, 328378}, NULL) = 0
gettimeofday({1224639534, 328510}, NULL) = 0
gettimeofday({1224639534, 328640}, NULL) = 0
gettimeofday({1224639534, 328856}, NULL) = 0
gettimeofday({1224639534, 328996}, NULL) = 0
gettimeofday({1224639534, 329159}, NULL) = 0
gettimeofday({1224639534, 329293}, NULL) = 0
gettimeofday({1224639534, 329481}, NULL) = 0
gettimeofday({1224639534, 329610}, NULL) = 0
gettimeofday({1224639534, 329738}, NULL) = 0
gettimeofday({1224639534, 329864}, NULL) = 0
write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
read(7,
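One simple way to get an overview of a trace like the one above is to tally the calls by name. The following Python sketch does that for text in this format. The function name count_syscalls and the regular expression are assumptions made for illustration, and the sample lines are copied from the trace above.

```python
import re
from collections import Counter

# A system call line starts with the call name followed by "(".
SYSCALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def count_syscalls(trace_text):
    """Count how often each system call name appears in the trace text."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
gettimeofday({1224639491, 830852}, NULL) = 0
gettimeofday({1224639491, 831014}, NULL) = 0
futex(0xb71ecfd0, FUTEX_WAKE, 2147483647) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
semctl(1146880, 9, IPC_64|SETVAL, 0xbfff980c) = 0
"""

counts = count_syscalls(sample)
for name, n in counts.most_common():
    print(name, n)
```

Run over the full trace, a tally like this makes it easy to see how much of the activity is timing calls (gettimeofday) compared with signal handling and semaphore operations.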
According to Tom Kyte, the pseudo-code for a latch get might look like this:
**************
Attempt to get Latch
If Latch gotten
Then
    return SUCCESS
Else
    Misses on that Latch = Misses + 1;
    Loop
        Sleeps on Latch = Sleeps + 1
        For I in 1 .. 2000
        Loop
            Attempt to get Latch
            If Latch gotten
            Then
                Return SUCCESS
            End if
        End loop
        Go to sleep for short period
    End loop
End if
***************
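The pseudo-code above can be turned into a small runnable Python sketch. This is only a single-threaded illustration of the spin-then-sleep loop, not how Oracle implements it: the Latch class, the try_get method and the nap callback are hypothetical names, and try_get stands in for the atomic hardware instruction.

```python
import time


class Latch:
    """Toy latch with the counters named in the pseudo-code."""

    def __init__(self):
        self.held = False
        self.misses = 0
        self.sleeps = 0

    def try_get(self):
        """Stand-in for the atomic test-and-set (not atomic in this sketch)."""
        if self.held:
            return False
        self.held = True
        return True


def latch_get(latch, spin_count=2000, nap=lambda: time.sleep(0.001)):
    """Follow the pseudo-code: try once, then spin and sleep until gotten."""
    if latch.try_get():
        return True
    latch.misses += 1
    while True:
        latch.sleeps += 1  # counted at the top of the loop, as in the pseudo-code
        for _ in range(spin_count):
            if latch.try_get():
                return True
        nap()  # go to sleep for a short period


# Usage: the latch starts out held; the fake nap simulates the holder
# freeing it during the second sleep.
latch = Latch()
latch.held = True
naps = []


def fake_nap():
    naps.append(1)
    if len(naps) == 2:
        latch.held = False


ok = latch_get(latch, spin_count=5, nap=fake_nap)
```

Because the sleeps counter is incremented before each spin round, as written in the pseudo-code, this run ends with a sleeps count of 3 after only 2 actual naps.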
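Since acquiring the latch itself is a single hardware instruction, TAS or CAS, there is surely nothing left to optimize there. Quoting "compared with a Latch, a Mutex Get takes only about 30~35 instructions": does that mean a mutex get can only be more optimized than a latch get in the spin and sleep steps? If so, how does Mutex optimize those steps? I'm curious...
As for the trace of Latch Get and Latch Free, I think it would be a good way to understand how Latch is fundamentally implemented. For example, I don't know exactly when latch free occurs. If a process requesting a latch has spun spin_count times and still has not obtained the latch, it goes to sleep, and I assumed a latch free event would be recorded at that same moment. However, I have found that in an AWR report the number of latch free waits ("waits") can be far smaller than the number of sleeps ("sleeps") on the cache buffers chains latch. I look forward to eygle's view. Also, could eygle explain how to trace Latch Get and Latch Free on Linux and how to read the resulting trace, or point to some relevant links?
Lily is a studious young person who loves digging into things. Well done!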