Digest Net: March 2009 Archives

An Introduction to the Linux Ext3 and ReiserFS File Systems

By eygle on March 27, 2009 12:27 PM | 10 Comments

Original source: http://www.osxcn.com/ubuntu/ext3-and-reiserfs.html

This article is follow-up reading to "Choosing Partitions and File Systems for Ubuntu" and is aimed at novice users.

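Linux supports many file systems, such as ext3, ReiserFS, XFS, and JFS, but desktop users mostly use ext3 and ReiserFS. As far as I know, ext3's distinctive advantage is ease of conversion: it is easy to convert between ext2 and ext3, which gives it good compatibility. ReiserFS has all of ext3's other advantages and does them better. Efficient use of disk space and a distinctive lookup method are things ext3 lacks, and ext3 cannot match ReiserFS or XFS for speed. In practice ReiserFS is also safer and more efficient, and its undelete capability is said to be good as well.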

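Before discussing ext3 and ReiserFS, it helps to understand journaling file systems. A journaling file system adds a log of file system changes to a non-journaling file system: it tracks changes to the file system and writes them to a journal. A write operation first updates the journal. If the whole write is interrupted for some reason (such as a power failure), then on reboot the system uses the journal to recover the interrupted write, and this takes very little time. Both ext3 and ReiserFS are journaling file systems with this capability.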
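The journal-then-apply idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `ToyJournaledStore`, its methods, and the `crash_before_apply` flag are hypothetical names invented for illustration, and the code does not reflect how ext3 or ReiserFS actually lay out or replay their journals.

```python
class ToyJournaledStore:
    """Toy key/value 'disk' with a write-ahead journal (illustration only)."""

    def __init__(self):
        self.data = {}     # stands in for the main file system state
        self.journal = []  # stands in for the on-disk journal

    def write(self, key, value, crash_before_apply=False):
        # Step 1: record the intended change in the journal first.
        self.journal.append((key, value))
        if crash_before_apply:
            return  # simulate power loss right after the journal write
        # Step 2: apply the change, then drop the completed journal entry.
        self.data[key] = value
        self.journal.pop()

    def recover(self):
        # On restart, replay whatever is still recorded in the journal.
        for key, value in self.journal:
            self.data[key] = value
        self.journal.clear()


store = ToyJournaledStore()
store.write("a", 1)
store.write("b", 2, crash_before_apply=True)  # interrupted write
store.recover()                               # replay completes it
print(store.data)
```

Recovery only has to scan the journal, not the whole "disk", which is the reason the text gives for recovery being fast.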

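ext3 and ReiserFS are the default file systems of Red Hat and SuSE Linux, respectively. The advantage of ReiserFS is that it is built on the B*Tree, an efficient balanced-tree algorithm; for example, it handles files smaller than 1 KB about 10 times faster than ext3. ReiserFS also wastes less space: rather than allocating an inode for each small file, it packs small files together in the same disk block (cluster), whereas ext2/ext3 store each of them in a separate cluster. With a 4 KB cluster size, two 100-byte files occupy two clusters on ext2/ext3 but only one on ReiserFS. ReiserFS has a drawback too: every upgrade to a new version requires reformatting the disk.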
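The space arithmetic in the example above can be sketched as follows. The function names are hypothetical, and the packed case is deliberately simplified: it assumes all files are small, pack perfectly, and need no per-file metadata inside the block.

```python
import math


def blocks_one_file_per_block(file_sizes, block_size=4096):
    """Each file occupies at least one whole block (ext2/ext3 style in the text)."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def blocks_with_packing(file_sizes, block_size=4096):
    """Small files share blocks (simplified view of ReiserFS packing)."""
    return math.ceil(sum(file_sizes) / block_size)


sizes = [100, 100]  # two 100-byte files, as in the text
print(blocks_one_file_per_block(sizes), blocks_with_packing(sizes))
```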

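Because a journaling file system must record the journal as well as write the data, it needs more disk I/O, which inevitably costs some performance (although ext3 optimizes disk head movement, so its overall throughput is no lower than that of ext2). Frequent journal writes also produce more fragmentation than a non-journaling file system such as ext2 (though compared with FAT32 this fragmentation is negligible). Some sources therefore recommend mixing file systems: for example, ext2 for read-only directories such as /usr, and ext3 for frequently written directories such as /var. For desktop users, though, I think ReiserFS is the better choice: it is faster than ext3 and fragments less.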

References:
Performance tests of the ext2, ext3, xfs, and reiserfs file systems
ReiserFS in practice
Linux journaling file systems and performance analysis
Using the ReiserFS file system in Linux

What is Oracle consistent gets?

By eygle on March 12, 2009 11:47 AM | 6 Comments

Quote From : http://www.dba-oracle.com/m_consistent_gets.htm

The consistent gets Oracle metric is the number of times a consistent read (a logical RAM buffer I/O) was requested to get data from a data block. Part of Oracle tuning is to replace expensive disk I/O (physical reads) with logical I/O, but high consistent gets present their own tuning challenges, especially when we see very high CPU consumption (e.g. in the "top 5 timed events" section of an AWR report).

Tuning Consistent Gets

Many shops with super-high consistent gets have high CPU consumption, and this is quickly fixed by adding more CPUs to the server. Note that Oracle expert Kevin Closson sees "buffer chains latch" thrashing (latch overhead) as a major contributor to high CPU consumption on highly-buffered Oracle databases (e.g. 64-bit Oracle with a 50 gig db_cache_size):

" The closer a system gets to processor saturation, the more troublesome latch gets become--presuming the chain is hot.

While cache buffers chains latch thrashing may seem like a nebulous place to put blame for high processor utilization, trust me, it isn't."

Types of Consistent Gets

Not all buffer touches are created equal, and Oracle has several types of "consistent gets", the term used by Oracle to describe an Oracle I/O that is done exclusively from the buffer cache. Oracle AWR and STATSPACK reports mention several types of consistent gets, all undocumented:

consistent gets
consistent gets from cache
consistent gets - examination
consistent gets direct

Some Oracle experts claim that these undocumented underlying mechanisms can be revealed and that these consistent gets metrics may tell us about data clustering. Mladen Gogala, author of "Easy Oracle PHP", makes these observations about consistent gets:

"The [consistent gets] overhead is the time spent by Oracle to maintain its own structures + the time spent by OS to maintain its own structures. So, what exactly happens during a consistent get in the situation described? As I don't have access to the source code, I cannot tell precisely, with 100% of certainty, but based on my experience, the process goes something like this:

1) Oracle calculates the hash value of the block and searches the SGA hash table for the place where the block is located.

2) Oracle checks the SCN of the block and compares it with the SCN of the current transaction. Here, I'll assume that this check will be OK and that no read consistent version needs to be constructed.

3) If the instance is a part of RAC, check the directory and see whether any other instance has modified the block. It will require communication with the GES process using the IPC primitives (MSG system calls). MSG system calls are frequently implemented using device driver which brings us to the OS overhead (context switch, scheduling)

4) If everything is OK, the block is paged in the address space of the requesting process. For this step I am not exactly sure when does it happen, but it has to happen at some point. Logically, it would look as the last step, but my logic may be flawed. Here, of course, I assume a soft fault. Hard fault would mean that a part of SGA was swapped out.

All of this is an overhead of a consistent get and it is the simplest case. How much is it in terms of microseconds, depends on many factors, but the overhead exists and is strictly larger than zero. If your SQL does a gazillion of consistent gets, it will waste significant CPU power and time to perform that."

For more insight into consistent gets, Oracle expert Kevin Closson gives a great description of the internal mechanisms of a consistent get:

"The routine is kcbget() (or one of his special purpose cousins). It doesn't really "search" a hash *table* if you will. A hash table would be more of a "perfect hash" structure and to implement that, every possible hash value has to be known when the table is set up. That would mean knowing every possible database block address.

Instead, it hashes to a bucket that has similar hashed dbas chained off it in a linked list. So it is more of a scan of the linked list looking for the right dba and right version of it.

The particulars of the structures under a get are not as important as remembering that before walking that chain, the process has to obtain the latch on the chain. "
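The bucket-and-chain lookup described in the quote can be pictured with a toy model. This is a sketch only: `ToyBufferCache`, its methods, and the use of a Python lock as a stand-in for a latch are assumptions made for illustration, not Oracle's actual structures or the real kcbget() routine.

```python
import threading


class ToyBufferCache:
    """Toy hash buckets, each with a chain of (dba, version, block) entries."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        # One lock per bucket stands in for the latch protecting that chain.
        self.latches = [threading.Lock() for _ in range(n_buckets)]

    def _bucket(self, dba):
        # Hash the data block address (dba) to a bucket number.
        return hash(dba) % len(self.buckets)

    def put(self, dba, version, block):
        i = self._bucket(dba)
        with self.latches[i]:
            self.buckets[i].append((dba, version, block))

    def get(self, dba, version):
        i = self._bucket(dba)
        with self.latches[i]:  # obtain the latch before walking the chain
            for entry_dba, entry_version, block in self.buckets[i]:
                if entry_dba == dba and entry_version == version:
                    return block
        return None


cache = ToyBufferCache()
cache.put(42, 1, "block 42 v1")
cache.put(50, 1, "block 50 v1")  # 42 and 50 land in the same bucket here
print(cache.get(50, 1))
```

Every lookup that hashes to the same bucket has to take the same lock first, which is the contention point the quote attributes to hot chains.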

Consistent gets - examination

Mike Ault notes that "consistent gets - examinations" are related to buffer management overhead and data access overhead such as index reads and undo writes:

"consistent gets - examination is from reading something like undo blocks...

Other examples of "consistent gets - examination" are: reading the root block of an index, reading an undo block while creating a consistent read data block, reading a block in a single table hash cluster - unless it is found to have the 'collision flag' set."

Steve Karam, OCM, notes the following about "consistent gets - examination":

"Consistent gets - examination are a different kind of consistent get that only requires a single latch, saving CPU. The most common use of a consistent get - examination is to read undo blocks for consistent read purposes, but they also do it for the first part of an index read and in certain cases for hash clusters.

So if you're doing a query on a couple tables that are mostly cached, but one of them has uncommitted DML against it at the time, you'll do consistent gets for the standard data in the cache, and the query will do consistent gets - examination to read the undo blocks and create read consistent blocks; this doesn't necessarily save CPU unfortunately, because while the consistent gets - examination only acquire one latch, creating the read consistent data block also takes a latch.

However, I think that when you use single table hash clusters (or the new 10g Sorted Hash Clusters I mentioned once that automatically sort by a key so they don't need order by) you can get a performance gain, because reads from the blocks of a hash cluster are usually consistent get - examination, therefore they only need one latch instead of two. "

Interpreting consistent gets in reports

Here is how a STATSPACK (or AWR) report displays "consistent gets" and "consistent gets - examination":

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
consistent gets                           35,024,284        9,718.2      3,703.9
consistent gets - examination             12,148,672        3,370.9      1,284.8

An Oracle FAQ's forum thread described a user who had trouble "when we run set autotrace on or similar execution statistics." The problem was resolved in part with this advice: "consistent gets is the blocks in consistent mode (sometimes reconstructed using information from RBS). So this reconstruction from RBS takes more resources (reads actually), which will end up as high consistent gets."

ASKTOM about: Consistent gets and arraysize

By eygle on March 10, 2009 11:50 AM | 9 Comments

Quote From ASKTOM: http://asktom.oracle.com

Question:

Consistent gets is based upon re-constructing a block for consistent read. 
Hence it is a function of only the
number of db_blocks to be read.
If you say that it is altered by the arraysize, do you suggest that, 
due to arraysize, 
some blocks are read multiple times and hence some blocks have > 1 
consistent read in the process?
Thanks

Followup:

No, you are wrong in your statement.
A consistent get is a block gotten in read consistent mode (point in time mode).  
It MAY or MAY NOT involve reconstruction (rolling back).
Db Block Gets are CURRENT mode gets -- blocks read "as of right now".

Some blocks are processed more than once, yes, the blocks will have more than 1 
consistent read in the process.  Consider:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t as select * from all_objects;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> exec show_space( 'T')
Free Blocks.............................0
Total Blocks............................320
Total Bytes.............................2621440
Unused Blocks...........................4
Unused Bytes............................32768
Last Used Ext FileId....................7
Last Used Ext BlockId...................40969
Last Used Block.........................60
PL/SQL procedure successfully completed.
Table has 316 blocks, 22,908 rows..

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly statistics;
ops$tkyte@ORA817DEV.US.ORACLE.COM> set arraysize 15
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t;
22908 rows selected.
Here, with an array size of 15, we expect
22908/15 + 316 = 1843 consistent mode gets.  The db block gets were for 
performing the FULL SCAN; they had nothing to do with the data itself that we 
selected.

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
       1824  consistent gets
        170  physical reads
          0  redo size
    2704448  bytes sent via SQL*Net to client
     169922  bytes received via SQL*Net from client
       1529  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22908  rows processed
ops$tkyte@ORA817DEV.US.ORACLE.COM> set arraysize 100
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t;
22908 rows selected.
Now, with 100 as the arraysize, we expect
22908/100 + 316 = 545 consistent mode gets.
Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
        546  consistent gets
        180  physical reads
          0  redo size
    2557774  bytes sent via SQL*Net to client
      25844  bytes received via SQL*Net from client
        231  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22908  rows processed
ops$tkyte@ORA817DEV.US.ORACLE.COM> set arraysize 1000
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from t;
22908 rows selected.
now, with arraysize = 1000, we expect:
22908/1000+316 = 338 consistent mode gets...
Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
        342  consistent gets
        222  physical reads
          0  redo size
    2534383  bytes sent via SQL*Net to client
       2867  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22908  rows processed

So yes, the blocks are gotten in consistent mode MORE THAN ONCE when the array 
fetch size is lower than the number of rows to be retrieved in this case.
This is because we'll be 1/2 way through processing a block -- have enough rows 
to return to the client -- and we'll give UP that block.  When they ask for the 
next N rows, we need to get that halfway processed block again and pick up where 
we left off.
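The estimate used in the three runs above (rows/arraysize + blocks) can be reproduced with a short sketch. The function name is hypothetical, and integer division is an assumption chosen because it matches the rounded figures quoted in the answer. The observed values are the ones from the autotrace output.

```python
def expected_consistent_gets(rows, blocks, arraysize):
    """Rule of thumb from the answer: one get per block, plus roughly one
    extra get per fetch call that has to revisit a partly processed block."""
    return rows // arraysize + blocks


ROWS, BLOCKS = 22908, 316  # figures from the show_space output above

for arraysize, observed in [(15, 1824), (100, 546), (1000, 342)]:
    estimate = expected_consistent_gets(ROWS, BLOCKS, arraysize)
    print(f"arraysize={arraysize:5d} estimate={estimate:5d} observed={observed:5d}")
```

The estimate is approximate by design. It tracks the observed counts closely but not exactly.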

Steps for Pinning the Oracle SGA in Memory on AIX 5.3

By eygle on March 9, 2009 11:25 PM | 10 Comments

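On some operating system platforms we can pin Oracle's SGA in memory, which avoids paging and so improves Oracle performance. On AIX, the operating system parameter v_pinshm must be set to 1; otherwise setting LOCK_SGA to TRUE in Oracle has no effect. Knowing these two parameters is far from enough, however: you also need some understanding of AIX memory management. This article requires AIX 5.3 ML01 or later and Oracle 9.2.0.4 or later.

First, check the operating system version: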

XXIBM:#oslevel -r

5300-07

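The operating system version meets our requirement. If the output were 5300-00, the operating system would need to be patched first. Many Oracle problems are closely tied to the operating system.

Next, check how much memory there is. There are many ways to do this; use whichever you like.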

XXIBM:#bootinfo -r

64749568

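The output above, which bootinfo -r reports in KB, shows that the system has roughly 64 GB of memory.

Then use rmss -p to check whether the currently usable memory matches the real memory, because for testing purposes memory is sometimes simulated at a given size with rmss (it can only be simulated downward, of course).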

XXIBM:#rmss -p

Simulated memory size is 63231.9375 Mb.

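If the value above is smaller than the real memory, consider using rmss -r to restore memory to its real size.

Next, let us check several memory-related parameter settings against the AIX 5.3 defaults.

First check the lru_file_repage setting. This parameter is new in 5.3 and defaults to 1, but IBM recommends setting it to 0 from ML01 onward.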

XXIBM:#vmo -L lru_file_repage

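NAME              CUR  DEF  BOOT  MIN  MAX  UNIT     TYPE  DEPENDENCIES
--------------------------------------------------------------------
lru_file_repage   1    1    1     0    1    boolean  D

In the output above, CUR is the parameter's current value, DEF is its default value, and BOOT is the value at the next boot.

Use the following command to set lru_file_repage to 0. This setting takes effect only for the running system; it does not change the value used after a reboot.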

XXIBM:#vmo -o lru_file_repage=0

Setting lru_file_repage to 0

Next check v_pinshm, which should be changed to 1.

XXIBM:#vmo -L v_pinshm

NAME              CUR  DEF  BOOT  MIN  MAX  UNIT     TYPE  DEPENDENCIES
--------------------------------------------------------------------
v_pinshm          1    0    0     0    1    boolean  D

XXIBM:#vmo -o v_pinshm=1

Setting v_pinshm to 1

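Check parameters such as minperm% and maxperm%. Before lru_file_repage existed, it was customary to set maxperm% very low, for example 20%. From 5.3 onward, however, IBM recommends raising it: the default is 80, and IBM suggests considering 90. As for minperm%, the default is 20. If memory is between 32 GB and 64 GB, it can be set to 10; below 32 GB, set it to 5; above 64 GB, keep the default of 20.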

XXIBM:#vmo -o minperm%=10

Setting minperm% to 10

XXIBM:#vmo -o maxperm%=90

Setting maxperm% to 90
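The minperm% sizing rule above can be written as a small helper. This is only a sketch of the rule as stated in this article: the function name is hypothetical, and treating exactly 32 GB and exactly 64 GB as part of the middle range is an assumption, since the text does not say which side the boundaries fall on.

```python
def suggested_minperm_percent(memory_gb):
    """minperm% suggestion from the rule above (boundary handling assumed)."""
    if memory_gb < 32:
        return 5
    if memory_gb <= 64:
        return 10
    return 20  # above 64 GB: keep the default


for memory_gb in (16, 48, 128):
    print(memory_gb, "GB ->", f"vmo -o minperm%={suggested_minperm_percent(memory_gb)}")
```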

Once the operating system parameters are tuned, the rest is simple. Log in as the oracle user and check the LOCK_SGA parameter:

XXIBM:#su - oracle

$sqlplus /nolog

SQL*Plus: Release 9.2.0.6.0 - Production on Fri Sep 19 08:40:10 2008

SQL>conn / as sysdba

Connected.

SQL>show parameter lock_sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE

The parameter is currently FALSE. To pin the SGA in memory, change it to TRUE.

SQL>alter system set lock_sga=true scope=spfile;

System altered.

Next, calculate the current SGA size:

SQL>select sum(value)/1024/1024 from v$sga;

SUM(VALUE)/1024/1024

--------------------

35941.0215

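In general this size should not exceed 60% of physical memory. Too small is not good either, since memory is then underused. Judging from the output above, the current SGA size (about 35,941 MB) is roughly right. You can of course go on to check whether DB_CACHE_SIZE and other parameters are set sensibly and decide whether to adjust them; that is skipped here.

After making these settings, restart the database. If the database starts up cleanly, the settings are fine.

So how can you tell whether the Oracle SGA is pinned in memory? Use the svmon command, which must be run as the superuser.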
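The 60% guideline can be checked against the numbers shown earlier with a few lines of arithmetic. This is a sketch: the function names are hypothetical, and the inputs are simply the bootinfo -r value (in KB) and the v$sga total (in MB) from the transcripts above.

```python
def kb_to_mb(kilobytes):
    return kilobytes / 1024


def sga_fraction(sga_mb, physical_mb):
    """Share of physical memory taken by the SGA."""
    return sga_mb / physical_mb


physical_mb = kb_to_mb(64749568)  # bootinfo -r output, in KB
fraction = sga_fraction(35941.0215, physical_mb)
print(f"physical memory: {physical_mb:.0f} MB, SGA share: {fraction:.1%}")
print("within the 60% guideline" if fraction <= 0.60 else "above the 60% guideline")
```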

$su -

root's Password:

XXIBM:#svmon -P -t 100|grep -p Pid|head

--------------------------------------------------------------------

Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB

225546 oracle 9313207 9270407 2232 9308982 Y N N

--------------------------------------------------------------------

Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB

119692 oracle 9312614 9270438 2232 9308978 Y N N

--------------------------------------------------------------------

Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB

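Note the Inuse, Pin, and Command columns in the output above. Command is oracle, which shows that the process information belongs to Oracle. Inuse is the number of memory pages in use and Pin is the number of pages pinned in memory; each page is 4 KB. If these two values are far apart, the SGA is not pinned in memory; if they are very close, it is.

If you want Oracle to pin the SGA in memory and work normally after the operating system reboots, the v_pinshm, lru_file_repage, and other settings made at the start of this article must also be made persistent across a reboot. For example: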
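To make the Inuse/Pin comparison concrete, the page counts from the first svmon row can be converted to megabytes and compared. This is a sketch with hypothetical function names. The 4 KB page size comes from the text, and no particular cutoff for "close" is implied.

```python
PAGE_KB = 4  # page size stated in the text


def pages_to_mb(pages):
    return pages * PAGE_KB / 1024


def pinned_ratio(inuse_pages, pin_pages):
    """Fraction of in-use pages that are pinned."""
    return pin_pages / inuse_pages


inuse, pin = 9313207, 9270407  # first oracle row in the svmon output
print(f"Inuse: {pages_to_mb(inuse):.0f} MB, Pin: {pages_to_mb(pin):.0f} MB")
print(f"pinned share of in-use pages: {pinned_ratio(inuse, pin):.2%}")
```

The pinned size works out to a little over 36,000 MB, in line with the SGA size of about 35,941 MB computed from v$sga.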

XXIBM:#vmo -p -o v_pinshm=1

Setting v_pinshm to 1 in nextboot file

Setting v_pinshm to 1

Original: http://space.itpub.net/78033/viewspace-462686

Nautical Miles, Ship Speed, and Knots - Why Ship Speed Is Measured in Knots

By eygle on March 2, 2009 6:34 PM | 10 Comments

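Land vehicles, aircraft, and river vessels mostly measure speed in kilometers per hour, but the speed of seagoing ships (including warships) is measured in "knots".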

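As early as the 16th century, sea navigation was already well developed, but with neither clocks nor distance logs it was hard to determine a ship's speed accurately. A clever sailor came up with a method: while the ship was under way he threw a float attached to a line onto the sea surface, then worked out the ship's speed from the length of line pulled out in a given time. Time was still measured with a sandglass in those days. To calculate speed more accurately the line paid out was sometimes very long, so knots were tied in it at equal intervals, dividing the whole log line into sections. Counting the number of knots pulled out in the same unit of time gave the corresponding speed. The "knot" thus became the unit of speed for seagoing ships; correspondingly, it is also used internationally for the speed of ocean currents, wind at sea, and underwater weapons such as torpedoes.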

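The symbol for the knot comes from the start of the English word "Knot" and is written "Kn". One knot equals one nautical mile per hour, that is, 1.852 kilometers per hour. The nautical unit for short distances is the "cable": one cable equals 1/10 nautical mile, and its symbol, from the start of the English word "Cable", is "Cab".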

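The nautical mile is the unit of length at sea. It originally meant the length of one minute of latitude along a meridian of the Earth. Because the Earth is slightly ellipsoidal, the length of one minute of arc differs slightly with latitude: one nautical mile is about 1843 meters at the equator, about 1852.2 meters at latitude 45°, and about 1861.6 meters at the poles. In 1929 the International Hydrographic Conference adopted the mean length of one minute, 1852 meters, as one nautical mile; in 1948 the International Conference on Safety of Life at Sea recognized 1852 meters, or 6076.115 feet, as one nautical mile, so 1852 meters is used internationally as the standard nautical mile. China accepts this standard and uses the symbol "M".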
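The unit relationships given above reduce to simple multiplication. A minimal sketch, with hypothetical function names and the 1852-meter standard nautical mile from the text:

```python
METERS_PER_NAUTICAL_MILE = 1852  # international standard cited in the text


def knots_to_kmh(knots):
    """1 knot = 1 nautical mile per hour = 1.852 km/h."""
    return knots * METERS_PER_NAUTICAL_MILE / 1000


def cables_to_meters(cables):
    """1 cable = 1/10 nautical mile."""
    return cables * METERS_PER_NAUTICAL_MILE / 10


print(f"20 knots is about {knots_to_kmh(20):.2f} km/h")
print(f"1 cable is about {cables_to_meters(1):.1f} m")
```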

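In addition, the same Chinese word (节) is used as the unit of length for the sections in which ships' anchor chain is manufactured and marked in use: one such section of anchor chain is usually defined as 27.5 meters, while Chinese naval vessels mark their chain in sections of 20 meters.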

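Modern ships' speed logs are very advanced, and some give a continuous digital readout. Measuring speed by casting a line and counting knots has long since passed into history, but the "knot" remains in use as the unit of ship speed.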
