Latch free contention - notes from a recent SAP test project

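Last week I spent several days on an SAP test project. On the BASIS side, with Oracle Database as the back end, we did a great deal of tuning and repeated testing.

Under heavy load and high concurrency, Oracle bugs surfaced one after another. We started testing on 10g (10.2.0.4), then hit a bug that is not fixed in 10g and had to upgrade the database to Oracle 11gR2.
The testing ran into many abnormal situations, including debugging and tracing the SAP system itself.

Many kinds of latch contention that are rarely seen otherwise showed up one after another.

Here are a few of the problems encountered during testing.
The following is from the 10.2.0.4 test database, configured with a 200G Buffer Cache and an 8G Shared Pool: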

DB Name	DB Id	Instance	Inst num	Release	RAC	Host
E00	3694296179	SAP	1	10.2.0.4.0	NO	hpdb4

	Snap Id	Snap Time	Sessions	Cursors/Session
Begin Snap:	886	25-Oct-10 19:19:14	9019	38.6
End Snap:	887	25-Oct-10 19:44:06	9022	12.0
Elapsed:		24.86 (mins)
DB Time:		194.88 (mins)

Report Summary

Cache Sizes

	Begin	End
Buffer Cache:	200,000M	200,000M	Std Block Size:	8K
Shared Pool Size:	8,192M	8,192M	Log Buffer:	62,988K

The load profile at this point is shown below: roughly 6,580 transactions per second and about 7 MB of redo per second:

	Per Second	Per Transaction
Redo size:	7,490,377.18	1,138.27
Logical reads:	270,762.69	41.15
Block changes:	29,554.77	4.49
Physical reads:	1.95	0.00
Physical writes:	1,563.19	0.24
User calls:	69,075.67	10.50
Parses:	30.44	0.00
Hard parses:	0.09	0.00
Sorts:	8.68	0.00
Logons:	1.31	0.00
Executes:	62,484.20	9.50
Transactions:	6,580.48
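
As a sanity check on the load profile, the "Per Transaction" column can be reproduced by dividing each per-second rate by the transaction rate. The sketch below uses only figures copied from the table above; the function and variable names are illustrative, not part of any Oracle tool.

```python
# Per-second figures copied from the Load Profile above.
per_second = {
    "Redo size": 7_490_377.18,      # bytes per second
    "Logical reads": 270_762.69,
    "Block changes": 29_554.77,
    "User calls": 69_075.67,
    "Executes": 62_484.20,
}
transactions_per_second = 6_580.48


def per_transaction(rate, tps=transactions_per_second):
    """Convert a per-second rate into a per-transaction figure."""
    return rate / tps


for name, rate in per_second.items():
    print(f"{name:<14} {per_transaction(rate):10.2f} per transaction")

# Redo volume expressed in MiB per second (1 MiB = 1024 * 1024 bytes).
redo_mib_per_second = per_second["Redo size"] / (1024 * 1024)
print(f"Redo: {redo_mib_per_second:.2f} MiB/s")
```

Each transaction is small (about 1.1 KB of redo and about 41 logical reads), so the pressure comes from the sheer transaction rate rather than from heavy individual statements.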

The main contention in the database at this point shows up in:

Top 5 Timed Events

Event	Waits	Time(s)	Avg Wait(ms)	% Total Call Time	Wait Class
latch free	2,011,998	767,164	381	6,561.1	Other
CPU time		8,500		72.7
latch: session allocation	57,723	2,350	41	20.1	Other
latch: enqueue hash chains	1,657	10	6	.1	Other
latch: cache buffers chains	10,160	5	1	.0	Concurrency
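
The "% Total Call Time" figures above are consistent with each event's time divided by the DB Time from the snapshot header (194.88 minutes), which is why latch free can exceed 100%: its accumulated wait time is far larger than the reported DB Time. The following is a minimal sketch of that arithmetic using only numbers from this report; the helper names are illustrative, and small deviations from the report are expected because the inputs are already rounded.

```python
# DB Time from the snapshot header, converted to seconds.
db_time_seconds = 194.88 * 60

# (event, waits, time waited in seconds); CPU time has no wait count.
events = [
    ("latch free", 2_011_998, 767_164),
    ("CPU time", None, 8_500),
    ("latch: session allocation", 57_723, 2_350),
    ("latch: enqueue hash chains", 1_657, 10),
    ("latch: cache buffers chains", 10_160, 5),
]


def pct_of_db_time(seconds):
    """Event time as a percentage of DB Time."""
    return 100.0 * seconds / db_time_seconds


def avg_wait_ms(waits, seconds):
    """Average wait in milliseconds, or None when there is no wait count."""
    if not waits:
        return None
    return 1000.0 * seconds / waits


for name, waits, seconds in events:
    avg = avg_wait_ms(waits, seconds)
    avg_text = "n/a" if avg is None else f"{avg:.0f} ms"
    print(f"{name:<28} {pct_of_db_time(seconds):8.1f}%  avg wait {avg_text}")
```

An average latch free wait of several hundred milliseconds is the striking number here: the other latch waits in the list average 41 ms or less.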

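The latch: session allocation wait here was eventually confirmed to be a bug. It is not fixed in 10g, and we were never able to resolve it on that release.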

Latch activity was as follows:

Latch Name	Get Requests	Misses	Sleeps	Spin Gets
dml lock allocation	31,707,289	3,083,158	11,321	3,072,356
resmgr:active threads	3,751,592	2,022,796	2,022,666	169
cache buffers chains	698,231,997	1,394,909	10,173	1,385,399
session allocation	19,521,250	685,814	57,723	629,338
enqueue hash chains	51,233,672	184,793	1,657	183,205
redo allocation	30,701,879	73,881	91	73,799
mostly latch-free SCN	1,254,255	71,716	113	71,604
library cache	10,439,897	52,171	283	51,899
undo global data	49,290,255	41,078	283	40,814
session idle bit	214,418,726	28,335	262	28,083
enqueues	9,852,464	27,037	841	26,230
messages	3,844,660	25,985	57	25,928
redo writing	4,632,969	7,221	79	7,145
lgwr LWN SCN	1,184,768	7,165	1	7,164
simulator lru latch	2,292,839	5,484	19	5,465
resmgr:free threads list	3,041	2,774	2,793	1
object queue header operation	10,713,623	2,633	30	2,603
In memory undo latch	40,626,363	1,821	1,646	263
cache buffers lru chain	4,578,067	1,453	61	1,392
checkpoint queue latch	8,894,121	957	3	954
simulator hash latch	22,594,380	707	1	706
Consistent RBA	1,177,333	233	4	229
parameter table allocation management	1,849	141	107	38
session state list latch	10,218	62	55	11
shared pool	71,942	52	1	51
process allocation	2,499	27	27	0
active service list	8,582	5	1	4
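
One way to read a table like this is to compute, for each latch, the miss percentage (misses / get requests) and the sleeps per miss, then rank by sleeps. The sketch below does that for a handful of rows copied from the table above; the choice of rows and the function names are illustrative only.

```python
# (latch name, get requests, misses, sleeps), copied from the table above.
latches = [
    ("dml lock allocation", 31_707_289, 3_083_158, 11_321),
    ("resmgr:active threads", 3_751_592, 2_022_796, 2_022_666),
    ("cache buffers chains", 698_231_997, 1_394_909, 10_173),
    ("session allocation", 19_521_250, 685_814, 57_723),
    ("enqueue hash chains", 51_233_672, 184_793, 1_657),
    ("resmgr:free threads list", 3_041, 2_774, 2_793),
]


def ratios(gets, misses, sleeps):
    """Return (miss percentage, sleeps per miss) for one latch."""
    miss_pct = 100.0 * misses / gets
    sleeps_per_miss = sleeps / misses if misses else 0.0
    return miss_pct, sleeps_per_miss


# Rank by sleeps, since sleeps are what turn into wait time.
ranked = sorted(latches, key=lambda row: row[3], reverse=True)

for name, gets, misses, sleeps in ranked:
    miss_pct, sleeps_per_miss = ratios(gets, misses, sleeps)
    print(f"{name:<26} miss {miss_pct:6.2f}%  sleeps/miss {sleeps_per_miss:5.2f}")
```

In these figures resmgr:active threads stands apart: more than half of its gets miss, and nearly every miss ends in a sleep, whereas dml lock allocation and cache buffers chains resolve almost all of their misses by spinning. Its sleep count is also of the same order as the latch free wait count in the Top 5 list, which suggests, though the report alone does not prove it, that this latch accounts for most of the generic latch free waits.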

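Another problem visible here is the extremely high contention on resmgr:active threads, a Resource Manager latch, even though no resource plan had been explicitly set in the database.

This problem was later resolved by setting a hidden parameter to disable Resource Manager. The setting is: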
_resource_manager_always_on = false

Setting this parameter disabled Resource Manager plans. The bug still exists in Oracle 11g.

By eygle on 2010-11-10 01:00 | Case