eygle.com


A Case of Mutex Contention in Oracle 10gR2

Recently a customer ran into a mutex contention problem on Oracle 10gR2.

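A mutex is a serialization mechanism that Oracle introduced in Oracle 10g, and it is gradually being used to replace latches that have performance problems.
Compared with a latch, a mutex get needs only about 30~35 instructions, while a latch get needs about 150~200 instructions. Mutexes are also smaller: each mutex occupies only about 16 bytes, whereas a latch in 10gR2 occupies about 112 bytes.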
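To make the size and instruction-count comparison a little more concrete, here is a minimal C11 sketch of what a compact, CAS-based mutex might look like. The type `tiny_mutex`, its fields, and its functions are hypothetical illustrations, not Oracle's internal structure, and the sketch does not reproduce the 30~35 or 150~200 instruction figures. It only shows why the uncontended get can be short: one compare-and-swap plus a counter update, in an object that fits in 16 bytes on common platforms.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte mutex; NOT Oracle's real layout. */
typedef struct {
    _Atomic uint64_t holder;  /* 0 = free, otherwise the holder's id */
    uint32_t gets;            /* successful gets; only the holder updates it */
    _Atomic uint32_t misses;  /* failed attempts; updated by requesters */
} tiny_mutex;

/* Uncontended fast path: one compare-and-swap from 0 to `self`.
   Returns 1 on success, 0 if another holder owns the mutex. */
static int tiny_mutex_try_get(tiny_mutex *m, uint64_t self) {
    assert(self != 0);                      /* 0 is reserved for "free" */
    uint64_t expected = 0;
    if (atomic_compare_exchange_strong(&m->holder, &expected, self)) {
        m->gets++;                          /* safe: we now hold the mutex */
        return 1;
    }
    atomic_fetch_add_explicit(&m->misses, 1, memory_order_relaxed);
    return 0;
}

/* Release: only the current holder may call this. */
static void tiny_mutex_release(tiny_mutex *m, uint64_t self) {
    assert(atomic_load(&m->holder) == self);
    atomic_store(&m->holder, 0);
}
```

A real implementation would follow a failed `tiny_mutex_try_get` with a spin/sleep slow path; that part is deliberately left out here.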

Mutexes first replaced the library cache latch and the library cache pin. On Oracle 10.2.0.2, the hidden parameter _kks_use_mutex_pin controls whether cursor pins are implemented with mutexes:
SQL> set linesize 120
SQL> col name for a30
SQL> col value for a20
SQL> col describ for a60
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2    FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.indx = y.indx
  4    AND x.ksppinm LIKE '%&par%'
  5  /
Enter value for par: mutex
old  4:    AND x.ksppinm LIKE '%&par%'
new  4:    AND x.ksppinm LIKE '%mutex%'

NAME                          VALUE                DESCRIB
------------------------------ -------------------- ------------------------------------------------------------
_kks_use_mutex_pin            TRUE                Turning on this will make KKS use mutex for cursor pins.

Under the new mutex pin mechanism, the following wait events may become common:
cursor: mutex S
cursor: mutex X
cursor: pin S
cursor: pin S wait on X
cursor: pin X
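The shared (S) versus exclusive (X) distinction behind these event names can be illustrated with a small C11 sketch in which one 64-bit word holds both an exclusive holder and a shared reference count, and is updated only with CAS. The bit layout, the `pin_*` function names, and the mapping to events are assumptions for illustration, not Oracle internals. Roughly: a session that loses a CAS race while bumping the shared count keeps retrying (the situation "cursor: pin S" describes), a session that wants X while the pin is held in either mode cannot proceed ("cursor: pin X"), and a session that wants S while an X holder exists must wait ("cursor: pin S wait on X").

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical encoding of one 64-bit pin word:
   high 32 bits = session id holding the pin in X mode (0 = none),
   low 32 bits  = number of sessions holding it in S mode. */
#define X_SHIFT  32
#define REF_MASK 0xFFFFFFFFull

typedef enum { PIN_OK, PIN_BUSY_X, PIN_BUSY_S } pin_result;

/* One attempt at a shared (S) pin. */
static pin_result pin_s_try(_Atomic uint64_t *word) {
    uint64_t cur = atomic_load(word);
    for (;;) {
        if (cur >> X_SHIFT)
            return PIN_BUSY_X;   /* X holder present: "pin S wait on X" case */
        /* Bump the reference count; if another session changed the word
           first, `cur` is refreshed and we retry (overflow ignored). */
        if (atomic_compare_exchange_weak(word, &cur, cur + 1))
            return PIN_OK;
    }
}

static void unpin_s(_Atomic uint64_t *word) {
    atomic_fetch_sub(word, 1);
}

/* One attempt at an exclusive (X) pin: succeeds only if the word is 0.
   `sid` must be nonzero. */
static pin_result pin_x_try(_Atomic uint64_t *word, uint32_t sid) {
    uint64_t expected = 0;
    if (atomic_compare_exchange_strong(word, &expected, (uint64_t)sid << X_SHIFT))
        return PIN_OK;
    return (expected >> X_SHIFT) ? PIN_BUSY_X : PIN_BUSY_S;
}

static void unpin_x(_Atomic uint64_t *word) {
    atomic_store(word, 0);
}
```

Packing both modes into one word means every state change is a single CAS, with no separate lock protecting the pin.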

Because mutexes rely on CAS (compare-and-swap), excessive CPU consumption could occur on HP-UX platforms that do not support CAS.
This was fixed as a bug in version 10.2.0.4.
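Why a missing CAS instruction hurts can be illustrated with a sketch. Below, `cas_native` uses C11 `atomic_compare_exchange_strong`, which compilers turn into a hardware CAS where the CPU has one; `cas_emulated` is a hypothetical fallback for a CPU without CAS that protects the word with a test-and-set spin lock. This is one common emulation technique, assumed here for illustration; it is not a description of how Oracle's HP-UX port actually worked. The cost appears in two places: every CAS becomes a lock acquire plus a release, and one global guard serializes all emulated CAS operations, so contending processes spin and burn CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Native path: compiled to a single hardware CAS where the CPU provides one. */
static bool cas_native(_Atomic uint64_t *p, uint64_t expected, uint64_t desired) {
    return atomic_compare_exchange_strong(p, &expected, desired);
}

/* Hypothetical fallback for CPUs without CAS: one global test-and-set lock
   guards the compare and the store. Every access to a word handled this
   way must go through the same guard, or it races. */
static atomic_flag cas_guard = ATOMIC_FLAG_INIT;

static bool cas_emulated(uint64_t *p, uint64_t expected, uint64_t desired) {
    while (atomic_flag_test_and_set_explicit(&cas_guard, memory_order_acquire))
        ;                                   /* spin: this is where CPU is burned */
    bool ok = (*p == expected);
    if (ok)
        *p = desired;
    atomic_flag_clear_explicit(&cas_guard, memory_order_release);
    return ok;
}
```

Both functions have the same contract (succeed only if the current value equals `expected`); only the cost under contention differs.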





By eygle on 2008-10-14 09:25 | Comments (5) | Case | 2058 |

5 Comments

eygle, could you explain this new mutex mechanism in a bit more detail?

另外关于"一个Mutex Get大约仅需要30~35个指令,而Latch Get则需要大约150~200个指令" - 不知道是些什么样的"指令"? 一个Latch Get不是只需要一个"TAS"(在非AIX平台上)的吗?

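At the implementation level a latch is taken with a hardware instruction, but a latch get is not complete the moment the latch is acquired: the life cycle of a latch get also includes timing, release, and other operations, and at the software level these require many more instructions working together.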

Below is a trace (system calls captured with strace) of a simple latch get and latch free in Oracle 9i on Linux:
"\0\200\0\0\6\0\0\0\0\0\3V%\0\0\0\0\0\0\0\0\200H\377\277"..., 2064) = 128
gettimeofday({1224639491, 830852}, NULL) = 0
gettimeofday({1224639491, 831014}, NULL) = 0
gettimeofday({1224639491, 831161}, NULL) = 0
gettimeofday({1224639491, 831300}, NULL) = 0
futex(0xb71ecfd0, FUTEX_WAKE, 2147483647) = 0
rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
gettimeofday({1224639491, 833380}, NULL) = 0
gettimeofday({1224639491, 833512}, NULL) = 0
gettimeofday({1224639491, 833641}, NULL) = 0
gettimeofday({1224639491, 833772}, NULL) = 0
gettimeofday({1224639491, 833912}, NULL) = 0
gettimeofday({1224639491, 834040}, NULL) = 0
gettimeofday({1224639491, 834175}, NULL) = 0
gettimeofday({1224639491, 834305}, NULL) = 0
gettimeofday({1224639491, 834475}, NULL) = 0
gettimeofday({1224639491, 834604}, NULL) = 0
gettimeofday({1224639491, 834731}, NULL) = 0
gettimeofday({1224639491, 834859}, NULL) = 0
write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
read(7, "\0x\0\0\6\0\0\0\0\0\3V&\0\0\0\0\0\0\0\0\200H\377\277\3"..., 2064) = 120
gettimeofday({1224639534, 324253}, NULL) = 0
gettimeofday({1224639534, 324411}, NULL) = 0
gettimeofday({1224639534, 324559}, NULL) = 0
gettimeofday({1224639534, 324696}, NULL) = 0
rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
semctl(1146880, 9, IPC_64|SETVAL, 0xbfff980c) = 0
gettimeofday({1224639534, 328215}, NULL) = 0
gettimeofday({1224639534, 328378}, NULL) = 0
gettimeofday({1224639534, 328510}, NULL) = 0
gettimeofday({1224639534, 328640}, NULL) = 0
gettimeofday({1224639534, 328856}, NULL) = 0
gettimeofday({1224639534, 328996}, NULL) = 0
gettimeofday({1224639534, 329159}, NULL) = 0
gettimeofday({1224639534, 329293}, NULL) = 0
gettimeofday({1224639534, 329481}, NULL) = 0
gettimeofday({1224639534, 329610}, NULL) = 0
gettimeofday({1224639534, 329738}, NULL) = 0
gettimeofday({1224639534, 329864}, NULL) = 0
write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
read(7,

According to Tom Kyte, the pseudo-code for a latch get might look like this:
**************
Attempt to get Latch
If Latch gotten
Then
    return SUCCESS
Else
    Misses on that Latch = Misses+1;
    Loop
        Sleeps on Latch = Sleeps + 1
        For I in 1 .. 2000
        Loop
            Attempt to get Latch
            If Latch gotten
            Then
                Return SUCCESS
            End if
        End loop
        Go to sleep for short period
    End loop
End if
***************
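The pseudo-code translates almost line for line into C11. In this sketch `atomic_flag_test_and_set` stands in for the hardware TAS, the spin limit of 2000 comes from the pseudo-code, and the 1 ms nap for "go to sleep for short period" is an arbitrary assumption; the type and function names are hypothetical. Following the pseudo-code literally, `sleeps` is incremented at the top of each outer round, before the spin, so it counts spin rounds rather than actual naps.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep */
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define SPIN_COUNT 2000           /* "For I in 1 .. 2000" */

typedef struct {
    atomic_flag      flag;        /* the latch itself (TAS word) */
    _Atomic uint64_t misses;      /* "Misses on that Latch" */
    _Atomic uint64_t sleeps;      /* "Sleeps on Latch" */
} kyte_latch;

static void latch_get(kyte_latch *l) {
    /* Attempt to get Latch; If Latch gotten Then return SUCCESS */
    if (!atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        return;
    atomic_fetch_add_explicit(&l->misses, 1, memory_order_relaxed);
    for (;;) {
        /* Sleeps = Sleeps + 1, counted before spinning as in the pseudo-code */
        atomic_fetch_add_explicit(&l->sleeps, 1, memory_order_relaxed);
        for (int i = 0; i < SPIN_COUNT; i++) {
            if (!atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
                return;
        }
        /* Go to sleep for short period (1 ms here: an arbitrary choice) */
        struct timespec nap = { 0, 1000000L };
        nanosleep(&nap, NULL);
    }
}

static void latch_free(kyte_latch *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

In a real system another process calls `latch_free` while a requester is spinning or napping; only then do the `misses` and `sleeps` counters grow the way the pseudo-code describes.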
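Since acquiring a latch is itself a single hardware TAS or CAS instruction, there is certainly nothing left to optimize there. Quoting "compared with a latch, a mutex get needs only about 30~35 instructions": does that mean a mutex get can only be more optimized than a latch get in the spin and sleep phases? If so, how does a mutex handle those phases? I'm curious...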

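As for traces of latch get and latch free, I think they would be a good way to understand how latches are fundamentally implemented. For example, I don't know exactly when a latch free wait occurs. If a process requesting a latch still hasn't obtained it after spinning spin_count times, it goes to sleep; I assumed a latch free wait event would be recorded at that moment, but I found in AWR reports that the number of latch free waits ("waits") can be far smaller than the number of sleeps on the cache buffers chains latch ("sleeps"). I'd like to hear eygle's view on this. Also, could eygle explain how to trace latch gets and latch frees on Linux, and how to read the trace output, or point me to relevant links?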

Lily is a studious young person who loves to dig into things. Well done!


Copyright © 2004~2020 云和恩墨 (Enmotech). All rights reserved.