A Case of Mutex Contention in Oracle 10gR2

Author: eygle | When reposting, please credit the article and author with a hyperlink and include this notice.
A customer recently ran into mutex contention on Oracle 10gR2.

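The mutex is a serialization mechanism that Oracle introduced in Oracle 10g; it is gradually being used to replace some latches that have performance problems.
Compared with a latch, a mutex get takes only about 30 to 35 instructions, whereas a latch get takes about 150 to 200. In terms of size, each mutex occupies only about 16 bytes, while a latch in 10gR2 occupies about 112 bytes.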
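
To put those figures side by side, the short Python sketch below works out the ratios they imply. The constants are the approximate numbers quoted above; the function names are made up for this illustration.

```python
# Approximate figures quoted in the article; not measured here.
MUTEX_BYTES = 16
LATCH_BYTES = 112
MUTEX_INSTRUCTIONS = (30, 35)    # low and high estimate per mutex get
LATCH_INSTRUCTIONS = (150, 200)  # low and high estimate per latch get


def size_ratio():
    """How many times larger a 10gR2 latch is than a mutex."""
    return LATCH_BYTES / MUTEX_BYTES


def instruction_ratio_range():
    """Smallest and largest cost ratio consistent with the quoted ranges."""
    lowest = LATCH_INSTRUCTIONS[0] / MUTEX_INSTRUCTIONS[1]   # 150 / 35
    highest = LATCH_INSTRUCTIONS[1] / MUTEX_INSTRUCTIONS[0]  # 200 / 30
    return lowest, highest


if __name__ == "__main__":
    low, high = instruction_ratio_range()
    print(f"size: a latch is about {size_ratio():.0f}x the size of a mutex")
    print(f"instructions: a latch get costs roughly {low:.1f}x to {high:.1f}x a mutex get")
```

So, taking the quoted numbers at face value, a latch is about 7 times the size of a mutex, and a latch get costs somewhere between roughly 4 and 7 times as many instructions.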

Mutexes first replaced the Library Cache Latch and the Library Cache Pin. On Oracle 10.2.0.2, the hidden parameter _kks_use_mutex_pin controls whether the mutex mechanism is used to implement cursor pins:
SQL> set linesize 120
SQL> col name for a30
SQL> col value for a20
SQL> col describ for a60
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2    FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.indx = y.indx
  4    AND x.ksppinm LIKE '%&par%'
  5  /
Enter value for par: mutex
old  4:    AND x.ksppinm LIKE '%&par%'
new  4:    AND x.ksppinm LIKE '%mutex%'

NAME                           VALUE                DESCRIB
------------------------------ -------------------- ------------------------------------------------------------
_kks_use_mutex_pin             TRUE                 Turning on this will make KKS use mutex for cursor pins.

With the new mutex pin mechanism, the following wait events may become common:
cursor: mutex S
cursor: mutex X
cursor: pin S
cursor: pin S wait on X
cursor: pin X

Because mutexes rely on CAS (Compare and Swap), excessive CPU consumption may occur on HP Unix platforms that do not support CAS.
This was treated as a bug and fixed in 10.2.0.4.
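
For readers unfamiliar with CAS, here is a minimal Python sketch of how an exclusive mutex can be built on a compare-and-swap operation. It is an illustration only, not Oracle's implementation: `CasCell`, `ToyMutex` and `demo` are hypothetical names, and because Python has no hardware CAS primitive, a private lock stands in for the atomicity that a real CPU instruction provides.

```python
import threading
import time

FREE = 0  # value of the cell when nobody holds the mutex


class CasCell:
    """One word of memory with an emulated compare-and-swap (assumption:
    the internal lock only simulates what hardware does in one instruction)."""

    def __init__(self, value=FREE):
        self._value = value
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class ToyMutex:
    """Hypothetical exclusive mutex: the cell holds FREE or the holder's id."""

    def __init__(self):
        self.cell = CasCell(FREE)

    def try_get(self, holder_id):
        # Succeeds only if the mutex is currently free.
        return self.cell.compare_and_swap(FREE, holder_id)

    def get(self, holder_id):
        # Spin until the CAS succeeds; return how many attempts failed.
        spins = 0
        while not self.try_get(holder_id):
            spins += 1
            time.sleep(0)  # yield so the current holder can make progress
        return spins

    def release(self, holder_id):
        # Only the current holder can swap the cell back to FREE.
        return self.cell.compare_and_swap(holder_id, FREE)


def demo(workers=4, rounds=200):
    """Several threads increment a shared counter under the toy mutex."""
    mutex = ToyMutex()
    total = [0]

    def work(holder_id):
        for _ in range(rounds):
            mutex.get(holder_id)
            total[0] += 1  # critical section
            mutex.release(holder_id)

    threads = [threading.Thread(target=work, args=(i + 1,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that both acquiring and releasing are a single conditional update of one small memory word, which is why the operation depends so heavily on the platform providing CAS cheaply.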




By eygle on 2008-10-14 09:25 | Comments (4) | Posted to Case


    Comments (4)

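    eygle, could you explain this new mutex mechanism in more detail?

    Also, regarding "a mutex get takes only about 30 to 35 instructions, whereas a latch get takes about 150 to 200 instructions": what kind of "instructions" are these? Doesn't a latch get need just a single "TAS" (on non-AIX platforms)?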

    Posted by: lily at October 20, 2008 11:18 PM

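    Latches are implemented by means of hardware instructions, but a latch get is not finished the moment the latch is acquired. The life cycle of a latch get also includes timing, release, and other operations, and at the software level many more instructions have to work together.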

    Posted by: eygle at October 22, 2008 10:05 AM

    The following is a system call trace (strace-style output) of a simple latch get and latch free in 9i on Linux:
    "\0\200\0\0\6\0\0\0\0\0\3V%\0\0\0\0\0\0\0\0\200H\377\277"..., 2064) = 128
    gettimeofday({1224639491, 830852}, NULL) = 0
    gettimeofday({1224639491, 831014}, NULL) = 0
    gettimeofday({1224639491, 831161}, NULL) = 0
    gettimeofday({1224639491, 831300}, NULL) = 0
    futex(0xb71ecfd0, FUTEX_WAKE, 2147483647) = 0
    rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    gettimeofday({1224639491, 833380}, NULL) = 0
    gettimeofday({1224639491, 833512}, NULL) = 0
    gettimeofday({1224639491, 833641}, NULL) = 0
    gettimeofday({1224639491, 833772}, NULL) = 0
    gettimeofday({1224639491, 833912}, NULL) = 0
    gettimeofday({1224639491, 834040}, NULL) = 0
    gettimeofday({1224639491, 834175}, NULL) = 0
    gettimeofday({1224639491, 834305}, NULL) = 0
    gettimeofday({1224639491, 834475}, NULL) = 0
    gettimeofday({1224639491, 834604}, NULL) = 0
    gettimeofday({1224639491, 834731}, NULL) = 0
    gettimeofday({1224639491, 834859}, NULL) = 0
    write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
    read(7,

    "\0x\0\0\6\0\0\0\0\0\3V&\0\0\0\0\0\0\0\0\200H\377\277\3"..., 2064) = 120
    gettimeofday({1224639534, 324253}, NULL) = 0
    gettimeofday({1224639534, 324411}, NULL) = 0
    gettimeofday({1224639534, 324559}, NULL) = 0
    gettimeofday({1224639534, 324696}, NULL) = 0
    rt_sigaction(SIGBUS, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigaction(SIGSEGV, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigaction(SIGILL, {0x97fd27a, [ALRM], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
    rt_sigaction(SIGBUS, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    rt_sigaction(SIGSEGV, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    rt_sigaction(SIGILL, {0x81e8114, ~[ILL ABRT BUS FPE KILL SEGV STOP XCPU XFSZ SYS RTMIN], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0xb71c2d20}, NULL, 8) = 0
    semctl(1146880, 9, IPC_64|SETVAL, 0xbfff980c) = 0
    gettimeofday({1224639534, 328215}, NULL) = 0
    gettimeofday({1224639534, 328378}, NULL) = 0
    gettimeofday({1224639534, 328510}, NULL) = 0
    gettimeofday({1224639534, 328640}, NULL) = 0
    gettimeofday({1224639534, 328856}, NULL) = 0
    gettimeofday({1224639534, 328996}, NULL) = 0
    gettimeofday({1224639534, 329159}, NULL) = 0
    gettimeofday({1224639534, 329293}, NULL) = 0
    gettimeofday({1224639534, 329481}, NULL) = 0
    gettimeofday({1224639534, 329610}, NULL) = 0
    gettimeofday({1224639534, 329738}, NULL) = 0
    gettimeofday({1224639534, 329864}, NULL) = 0
    write(10, "\0*\0\0\6\0\0\0\0\0\10\1\0\1\24\0\24Function return"..., 42) = 42
    read(7,

    Posted by: eygle at October 22, 2008 10:13 AM
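
One simple way to read a trace like the one above is to tally the system calls by name, which makes the amount of timing work (the repeated `gettimeofday` calls) visible at a glance. The Python sketch below does this; `tally_syscalls` is a hypothetical helper, and `SAMPLE` is only a handful of lines copied from the trace, not the whole capture.

```python
import re
from collections import Counter

# A system call line starts with the call name followed by "(".
SYSCALL = re.compile(r"^\s*([a-z_0-9]+)\(")


def tally_syscalls(lines):
    """Count system calls by name; lines without a leading call name are skipped."""
    counts = Counter()
    for line in lines:
        match = SYSCALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the trace (assumption: enough to illustrate the idea).
SAMPLE = [
    r'"\0\200\0\0\6\0\0\0\0\0\3V%\0\0\0\0\0\0\0\0\200H\377\277"..., 2064) = 128',
    'gettimeofday({1224639491, 830852}, NULL) = 0',
    'gettimeofday({1224639491, 831014}, NULL) = 0',
    'futex(0xb71ecfd0, FUTEX_WAKE, 2147483647) = 0',
    'rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0',
    'semctl(1146880, 9, IPC_64|SETVAL, 0xbfff980c) = 0',
]

if __name__ == "__main__":
    for name, count in tally_syscalls(SAMPLE).most_common():
        print(f"{name:16s} {count}")
```

The first sample line is the tail of a `read` call whose beginning is missing from the capture, so it has no call name and is deliberately not counted.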

    According to Tom Kyte, the pseudo-code for a latch get might look like this:
    **************
    Attempt to get Latch
    If Latch gotten
    Then
        return SUCCESS
    Else
        Misses on that Latch = Misses + 1;
        Loop
            Sleeps on Latch = Sleeps + 1
            For I in 1 .. 2000
            Loop
                Attempt to get Latch
                If Latch gotten
                Then
                    Return SUCCESS
                End if
            End loop
            Go to sleep for short period
        End loop
    End if
    ***************
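    Since acquiring the latch itself is a single hardware instruction, TAS or CAS, there is surely nothing left to optimize there. Quoting "compared with a latch, a mutex get takes only about 30 to 35 instructions": does that mean a mutex get can only be more efficient than a latch get in the spin and sleep stages? If so, how does a mutex handle those stages? I am curious...

    As for the trace of latch get and latch free, I think this would be a good way to understand how latches are fundamentally implemented. For example, I do not know exactly when a latch free wait is recorded. When a process requesting a latch has spun spin_count times and still has not obtained it, the process sleeps. I assumed that a latch free event would be recorded at the same moment, but I have found that in an AWR report the number of latch free waits ("waits") can be far smaller than the number of sleeps ("sleeps") on the cache buffers chains latch. I look forward to eygle's view. Also, could eygle explain how to trace latch get and latch free on Linux and how to read the resulting trace, or point to some relevant links?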

    Posted by: lily at October 24, 2008 1:43 AM
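
The latch get pseudo-code quoted in the comment above can be turned into a small runnable Python sketch that follows the same spin-then-sleep structure. This is only an illustration of the pseudo-code, not Oracle's code: `ToyLatch` is a hypothetical single-threaded stand-in for the hardware test-and-set, and `max_sleeps` is an extra parameter added here so that a demonstration can terminate.

```python
import time


class ToyLatch:
    """Hypothetical latch with the two counters named in the pseudo-code."""

    def __init__(self):
        self.held = False
        self.misses = 0
        self.sleeps = 0

    def try_get(self):
        # Stand-in for an atomic test-and-set; not atomic in this sketch.
        if self.held:
            return False
        self.held = True
        return True

    def free(self):
        self.held = False


def latch_get(latch, spin_count=2000, sleep_seconds=0.001, max_sleeps=None):
    """Follow the quoted pseudo-code: try once, then spin and sleep in a loop.

    Returns True on success. Returns False only if `max_sleeps` (an addition
    for this sketch, absent from the pseudo-code) is reached.
    """
    if latch.try_get():
        return True                      # immediate get: no miss recorded
    latch.misses += 1
    while True:
        latch.sleeps += 1                # counted before spinning, as in the pseudo-code
        for _ in range(spin_count):
            if latch.try_get():
                return True
        if max_sleeps is not None and latch.sleeps >= max_sleeps:
            return False
        time.sleep(sleep_seconds)        # "go to sleep for short period"


if __name__ == "__main__":
    free_latch = ToyLatch()
    print(latch_get(free_latch), free_latch.misses, free_latch.sleeps)

    busy_latch = ToyLatch()
    busy_latch.try_get()                 # someone else already holds it
    print(latch_get(busy_latch, spin_count=10, sleep_seconds=0, max_sleeps=3),
          busy_latch.misses, busy_latch.sleeps)
```

Note that in this structure a single missed get increments `misses` once but can increment `sleeps` many times, because the counter is bumped on every pass through the outer loop.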

    CopyRight © 2004~2010 eygle.com, All rights reserved.