Digest Net: April 2007 Archives


Original link:
http://www.iselong.com/english/0007/7969.htm 

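A reader named Skila emailed me to say she had only recently learned that translations of foreign literature are judged by three standards: 信 (xin), 达 (da), and 雅 (ya), that is, faithfulness, fluency, and elegance. Skila, who has just started her first year of university, asked me for a simple example of how a translation can meet all three. I did not major in English and have never translated foreign literature, so strictly speaking I am not qualified to discuss the question, but I would still like to share a little of my own experience.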

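  First, look at the passage below. These are the first four lines of a poem by the 17th-century English poet Katherine Philips, which I came across by chance while searching on Google.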

  I did not live until this time,
  Crown'd my felicity,
  When I could say without a crime,
  I am not thine, but thee.

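  I first translated it by the standard of faithfulness (信):

  我没有活过,直到现在为止,
  给我的快乐加冕,
  我可以无罪地说,
  我不是你的,而是你。

  (Back-translated literally: "I have not lived, up until now, / crowning my happiness, / I can say without guilt, / I am not yours, but you.")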

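  As you can see, this version is faithful but not fluent (达), so I have to draw on my knowledge of Chinese to make it as fluent as possible while staying faithful. I therefore rewrote it as:

  我直到现在才算真正活着,
  我的快乐得到了加冕,
  我可以无愧地说,
  我不是你的,我就是你。

  (Back-translated: "Only now am I truly alive, / my happiness has been crowned, / I can say without shame, / I am not yours; I am you.")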

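  At this point the translation is basically faithful and fluent, but it is not yet elegant (雅). If I had a really solid grounding in classical Chinese, I could recast it once more as a five-character classical poem (or a seven-character one), and it would then truly meet all three standards. Unfortunately I cannot do that, which is rather embarrassing (laughs).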

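  From this we can see that truly meeting the standards of faithfulness, fluency, and elegance is extremely hard. Without ten years of grounding in Chinese, ten years in English, and ten years of translation practice, in other words thirty years of accumulated experience, it cannot be achieved through cramming, crash courses, or cleverness. That is why the translators who genuinely meet all three standards tend to be grey-haired elderly gentlemen.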

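  ▲ After this post was published, a reader named YaleField recast my translation as a five-character classical poem, shown below. My sincere thanks to YaleField!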

  大悟方此时,
  鸿运正当前。
  无愧表心语,
  我今天下君!

  Author: Zhang Hong (张宏, info@italian.org.cn)

Ten Reasons to Use ZFS


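1. No more fsck or scandisk

Whether you use Linux, UNIX, or Windows, you have probably had this experience: after an unexpected power loss or an improper shutdown, the system reboots and finds the file system inconsistent, so fsck or scandisk has to repair it. The repair takes a long time and is not guaranteed to succeed. Worse, a server has to be taken offline while fsck runs, and because today's deployments use large disks, the repair takes correspondingly long, which is close to intolerable for the server's users. With ZFS you can drop tools like fsck entirely, because ZFS is a file system built on a copy-on-write (COW) mechanism. COW never overwrites existing data in place on disk, which keeps everything on disk valid at all times. The notion of an inconsistent file system therefore does not arise, and no repair tool is needed.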

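2. Simple administration

As a brand-new file system, ZFS abandons the traditional File System + Volume Manager + Storage architecture. All storage devices are managed through a ZFS pool: once the devices have been added to the same pool, file systems can easily be created and configured on top of it. There is no longer any need to memorize assorted specialist concepts, commands such as newfs and metainit, or the usage of various volume managers. ZFS needs only two commands, zpool (for managing ZFS pools) and zfs (for managing ZFS file systems), to administer a 128-bit file system. For example, data often grows so fast that existing capacity runs out and disks must be added. With a traditional volume manager, many existing factors have to be considered in advance and various parameters calculated from the application's needs. With ZFS the system administrator is freed from these manual considerations and calculations, because the ZFS pool adjusts itself dynamically to demand. A single simple command adds new disks to the pool: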

zpool add zfs_pool mirror c4t0d0 c5t0d0

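Every file system built on this dynamically adjusting ZFS pool can use the new disks immediately, and optimal parameters are chosen automatically.

ZFS also offers a graphical management interface (the screenshot from the original post is not reproduced here).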

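3. No practical capacity limits

As its name suggests, ZFS (Zettabyte File System) provides truly massive storage; in practice it is almost impossible to run into a capacity limit. Under today's 64-bit kernels it can hold a single file of up to 16 exabytes (2^64 bytes), use 2^64 storage devices, and create 2^64 file systems.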

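4. Guaranteed data correctness and integrity

All data operations in ZFS are transaction-based: a group of related operations is treated by ZFS as one transaction, which means the operations either all succeed or all fail. As noted earlier, every operation is also copy-on-write, so the data on the device is always valid, and a system crash or unexpected power loss no longer leaves files inconsistent.

Another potential threat to data comes from hardware, such as faulty disks or RAID cards, or from driver bugs. Existing file systems that hit this problem usually just hand the bad data to the application above; this is known as silent data corruption. ZFS computes a 256-bit checksum over all data, both user data and the file system's own metadata, and verifies it when data is committed, which rules out this kind of silent data corruption.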

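5. Excellent performance and scalability

Unlike the traditional File System + Volume Manager + Storage architecture, ZFS provides all of its functionality directly on top of the storage devices. This allows some innovative features of its own and, with them, very good performance.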

  • Dynamic Striping vs. Static Striping

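Because ZFS is built on COW and a global, dynamic ZFS pool, every write goes to a new data block. ZFS dynamically picks the best device from the pool and writes the data sequentially as one transaction, making full use of the available device bandwidth. This feature is called dynamic striping. Its counterpart, static striping, is what traditional file systems use: the administrator has to calculate and configure the stripe correctly in advance, and has to recalculate and reconfigure whenever a device is added. Worse still, a miscalculation directly hurts system performance. With dynamic striping no manual intervention is needed; ZFS adjusts automatically and chooses the best device and the fastest way to perform each operation.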

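  • Multiple block sizes

ZFS supports block sizes from 512 bytes to 1 MB. Whereas traditional file systems usually use one fixed block size, ZFS chooses the best block size dynamically, based on the size of each file.

Block size directly affects both the disk space actually used and read speed. With small blocks, storing files causes less fragmentation and small files are faster to read and write, but more metadata has to be created and large files take longer to read and write. With large blocks, less metadata is needed and large files are handled better, but fragmentation increases. Based on a survey of how files are used in practice, ZFS derives an algorithm for choosing block sizes and picks the best size for each file dynamically. ZFS is thus self-tuning, with no intervention from the system administrator. It also lets users set the block size themselves for a single file or a whole file system.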

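  • Intelligent prefetch

Most operating systems can prefetch data, but ZFS provides a smarter prefetch directly in the file system. It recognizes several read patterns and fetches data ahead of time, and it does so separately for each read stream, which is good news for many streaming-media providers.

On scalability: most existing file systems are based on a constrained, static model, whereas ZFS uses the dynamic ZFS pool. Its metadata is dynamic too, and reads and writes can run in parallel and carry priorities, so performance keeps growing linearly even with large data volumes and many devices.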

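6. Self-healing

  • ZFS Mirror and RAID-Z

Traditional disk mirroring and RAID 4 and RAID 5 arrays all suffer from the problem mentioned earlier: silent data corruption. If a physical fault on a disk corrupts data, an existing mirror or RAID 4/RAID 5 array silently passes the bad data to the application above. If the error is in metadata, it leads directly to a system panic. A more serious case arises in RAID 4 and RAID 5 arrays: if power fails while the system is computing parity and writing the new data and new parity, data and parity are left inconsistent, and the array can no longer reliably reconstruct the affected data.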

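ZFS introduces its own counterparts, ZFS Mirror and RAID-Z. When reading data, ZFS automatically verifies it against the 256-bit checksum and so actively detects silent data corruption. It then obtains the correct data from the mirror disk, or from the other disks in the RAID-Z array, returns it to the application, and at the same time automatically repairs the corrupted data on the original disk.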
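The read path described above can be illustrated with a toy model. This is only a sketch of the idea (checksum on read, fall back to a good copy, repair the bad one), not ZFS code: the function names are hypothetical, the "disks" are in-memory byte arrays, and SHA-256 stands in for whichever 256-bit checksum is configured.

```python
import hashlib


def checksum(data):
    """256-bit checksum of a block (SHA-256 used here as a stand-in)."""
    return hashlib.sha256(data).digest()


def self_healing_read(copies, expected):
    """Read one block from a toy mirror.

    `copies` is a list of bytearrays, one per mirror side; `expected`
    is the checksum stored separately from the data. The first copy
    that verifies is returned, and every other copy is overwritten
    with it, which models the automatic repair.
    """
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("no copy matches the stored checksum")
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good  # repair the silently corrupted side
    return good
```

For example, with `copies = [bytearray(b"hellx"), bytearray(b"hello")]` and the checksum of `b"hello"`, the call returns `b"hello"` and rewrites the first copy.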

  • Fault Manager

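Solaris 10 includes a ZFS diagnosis engine that works with the Solaris Fault Manager (another new feature of Solaris 10) to diagnose, analyze, and report errors in ZFS pools and storage devices in real time. Users promptly receive a readable message from the Fault Manager. The diagnosis engine does not itself take action to repair or resolve the problem, but the message tells the system administrator what can be done. In a ZFS error message like the one below, REC-ACTION is the recommended action: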

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major

EVENT-TIME: Fri Mar 10 11:09:06 MST 2006

PLATFORM: SUNW,Ultra-60, CSN: -, HOSTNAME: neo

SOURCE: zfs-diagnosis, REV: 1.0

EVENT-ID: b55ee13b-cd74-4dff-8aff-ad575c372ef8

DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.

AUTO-RESPONSE: No automated response will occur.

IMPACT: Fault tolerance of the pool may be compromised.

REC-ACTION: Run 'zpool status -x' and replace the bad device.

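7. Security

On security, ZFS supports NT-style NFSv4 ACLs (access control lists). For the 256-bit checksum mentioned earlier, users can choose among several checksum algorithms, including SHA-256, which protects data at the level of the physical storage unit.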

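8. Powerful features

As "the last file system", ZFS covers basic file system and volume management functions and also provides many enterprise-grade features: quotas, reservations, compression, snapshots, and clones, all of them very fast. With this file system, no volume manager is needed any more.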

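9. Compatibility

ZFS is fully POSIX-compliant, so applications above it are unaffected. ZFS also provides an emulated volume module that lets any ZFS file system be used as an ordinary block device, and ZFS can in turn use volumes built with a volume manager as storage devices. This gives users the greatest freedom to adopt ZFS features without modifying applications or existing file systems.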

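10. Open source

ZFS is run by Sun Microsystems as an open-source project within OpenSolaris and is completely free to use. The ZFS source code can be browsed at http://www.opensolaris.org/os/community/zfs/source/. This means users get both the quality of a commercial vendor and the advantages of the open-source model.

(Source: 老饿鱼的地盘)

Although only Solaris currently supports this file system, the open-source model should encourage more ZFS-based applications. Developers abroad are already porting ZFS to Linux and Mac OS. Because ZFS is currently bundled with Solaris 10, trying it out means downloading the latest release, Solaris 10 6/06.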

(Original link)

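Even in the Internet industry, which produces dreamers in abundance, few have dreams bigger than those of Chen Yizhou (陈一舟), chairman of Oak Pacific (千橡集团).

Over the past two years, every Internet startup hoping to catch "the next big thing" has predictably positioned itself as "the next MySpace", "the next YouTube", "the next Facebook", or "the next Craigslist". Chen's ambition has never stopped there. After a series of site launches, acquisitions, and mergers, his company might be called "the next MySpace+YouTube+Facebook+Craigslist". In other words, Chen has rapidly absorbed each of the business models that have had some success with web 2.0 ideas, and is waiting for the chance to go public.

  In theory, Chen is capable of swallowing such a plate of "spaghetti". One of the early entrepreneurs of China's Internet industry, he was also among the first to turn an online community into commercial value. Chinaren never had a successful IPO, but it gave Chen ample personal wealth, experience, and connections, which helped him raise two rounds of venture capital totalling US$58 million for his second venture, Oak Pacific, an amount other entrepreneurs could hardly match. In the past ten years no other Internet company had attracted capital on that scale so early in its life. In addition, before he relabeled himself with "community", the wireless value-added services (SP) company Chen founded was earning healthy revenue. In early 2006 venture investors estimated that profit from the SP business could reach US$15 million by the end of 2007. At that scale, combined with the still-rising web 2.0 concept, or rather the "anything is possible" price-to-earnings ratios of web 2.0 companies, going public would not be difficult.

  But as people went on to see, all this careful calculation turned into a "themeless variation" on going public.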

link:http://homes.cerias.purdue.edu/~florian/reiser/reiserfs.php

The structure of the Reiser file system

by Florian Buchholz

Updated January 2006: mostly corrected minor inaccuracies and mistakes. Thanks to all the people who have brought those to my attention. I also added some more explanations where I found they were useful.

The Reiser file system was created by Hans Reiser. The design objectives were to increase performance over the ext2 file system, offer a space efficient file system, and to improve handling of large directories compared to existing file systems. Reiserfs uses balanced trees to store files and directories and it also offers journaling.

This document describes the on-disk structure of the Reiser file system version 3.6. This document does not describe how the file system tree is balanced, how the journaling is performed, or how files and directories are managed within an implementation of the file system.

Blocks

The reiserfs partition is divided into blocks of a fixed size. The blocks are numbered sequentially starting with block 0. There is a maximum number of 2^32 possible blocks in one partition.

The partition starts with the first 64k unused to leave enough room for partition labels or boot loaders. After that follows the superblock. The superblock contains important information about the partition such as the block size and the block numbers of the root and journal nodes. The superblock block number differs depending on the block size, but always starts at byte 65536 of the partition. The default block size for reiserfs under Linux is 4096 bytes. This makes the superblock block number 16. There is only one instance of the superblock for the entire partition.

Directly following the superblock is a block containing a bitmap of free blocks. The number of blocks mapped in the bitmap depends directly on the block size. If a bitmap can map k blocks, then every k-th block will be a new bitmap block.

Block size in bytes                   4,096     512   1,024    8,192
# blocks a bitmap block can address  32,768   4,096   8,192   65,536
superblock #                             16     128      64        8
1st bitmap block #                       17     129      65        9
2nd bitmap block #                   32,768   4,096   8,192   65,536
3rd bitmap block #                   65,536   8,192  16,384  131,072
4th bitmap block #                   98,304  12,288  24,576  196,608
...

(assuming that the partition is large enough to have 2nd, 3rd, 4th, ... bitmap blocks)
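The rows of the table above follow directly from the rules in the text, which a short sketch makes explicit. The function name and return shape are illustrative choices, not part of reiserfs.

```python
SUPERBLOCK_BYTE_OFFSET = 64 * 1024  # the first 64k of the partition are unused


def reiserfs_layout(block_size, bitmap_blocks=4):
    """Return (superblock #, blocks per bitmap block, bitmap block #s)."""
    superblock = SUPERBLOCK_BYTE_OFFSET // block_size
    per_bitmap = 8 * block_size  # one bit per block
    # The first bitmap block directly follows the superblock; each later
    # one sits at a multiple of the number of blocks a bitmap can map.
    bitmaps = [superblock + 1]
    bitmaps += [n * per_bitmap for n in range(1, bitmap_blocks)]
    return superblock, per_bitmap, bitmaps
```

Calling `reiserfs_layout(4096)` reproduces the first column of the table, and the other block sizes reproduce the remaining columns.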

Following the first bitmap block should be the journal, but the information in the superblock is the authoritative source for that information.

The Superblock

The superblock layout
Name                    Size  Description
Block count             4     The number of blocks in the partition
Free blocks             4     The number of free blocks in the partition
Root block              4     The block number of the block containing the root node
Journal block           4     The block number of the block containing the first journal node
Journal device          4     Journal device number (not sure what for)
Orig. journal size      4     Original journal size. Needed when using partition on systems with different default journal sizes.
Journal trans. max      4     The maximum number of blocks in a transaction
Journal magic           4     A random magic number
Journal max batch       4     The maximum number of blocks to batch into one transaction
Journal max commit age  4     Time in seconds of how old an asynchronous commit can be
Journal max trans. age  4     Time in seconds of how old a transaction can be
Blocksize               2     The size in bytes of a block
OID max size            2     The maximum size of the object id array
OID current size        2     The current size of the object id array
State                   2     State of the partition: valid (1) or error (2)
Magic string            12    The reiserfs magic string, should be "ReIsEr2Fs"
Hash function code      4     The hash function that is being used to sort names in a directory
Tree Height             2     The current height of the disk tree
Bitmap number           2     The number of bitmap blocks needed to address each block of the file system
Version                 2     The reiserfs version number
Reserved                2
Inode Generation        4     Number of the current inode generation

The inode generation number is a counter that denotes the current generation of inodes. The counter is increased every time the tree gets re-balanced.

Example:

The following is the start of the superblock of a 256MB reiserfs partition on an Intel based system:

00000000 66 00 01 00 93 18 00 00 82 40 00 00 12 00 00 00  f........@......
00000010 00 00 00 00 00 20 00 00 00 04 00 00 ac 34 11 57  ..... ......¬4.W
00000020 84 03 00 00 1e 00 00 00 00 00 00 00 00 10 cc 03  ..............Ì.
00000030 08 00 02 00 52 65 49 73 45 72 32 46 73 00 00 00  ....ReIsEr2Fs...
00000040 03 00 00 00 04 00 03 00 02 00 00 00 dc 52 00 00  ............ÜR..
An example superblock
Block count: 65638
Free blocks: 6291
Root block: 16514
Journal block: 18
Journal device: 0
Original journal size: 8192
Journal trans. max: 1024
Journal magic: 1460745388
Journal max. batch: 900
Journal max. commit age: 30
Journal max. trans. age: 0
Blocksize: 4096
OID max. size: 972
OID current size: 8
State: 2 (error)
Magic String: ReIsEr2Fs
Hash function code: 3
Tree height: 4
Bitmap number: 3
Version: 2
Inode generation: 21212

Notes: the state of mounted partitions should be "error" so that if the system crashes while the file system is mounted it can be detected the next time the system starts. The version number does not indicate the file system version (3.6) but rather the revision number of the file system structure.
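As a cross-check of the layout table against the example dump, the fields can be unpacked with Python's `struct` module. This is a sketch: the field names are my own labels for the table rows, and the layout is assumed to be little-endian with no padding, as on the Intel system in the example.

```python
import struct

# 11 x 4-byte, 4 x 2-byte, 12-byte magic, 4-byte, 4 x 2-byte, 4-byte = 80 bytes
SUPERBLOCK_FORMAT = "<11I4H12sI4HI"
SUPERBLOCK_FIELDS = (
    "block_count", "free_blocks", "root_block", "journal_block",
    "journal_device", "orig_journal_size", "journal_trans_max",
    "journal_magic", "journal_max_batch", "journal_max_commit_age",
    "journal_max_trans_age", "blocksize", "oid_max_size",
    "oid_current_size", "state", "magic", "hash_function_code",
    "tree_height", "bitmap_number", "version", "reserved",
    "inode_generation",
)


def parse_superblock(raw):
    """Decode the start of a reiserfs 3.6 superblock into a dict."""
    values = struct.unpack_from(SUPERBLOCK_FORMAT, raw)
    sb = dict(zip(SUPERBLOCK_FIELDS, values))
    sb["magic"] = sb["magic"].rstrip(b"\0").decode("ascii")
    return sb


# The 80 bytes of the example dump above.
EXAMPLE_SUPERBLOCK = bytes.fromhex(
    "66 00 01 00 93 18 00 00 82 40 00 00 12 00 00 00 "
    "00 00 00 00 00 20 00 00 00 04 00 00 ac 34 11 57 "
    "84 03 00 00 1e 00 00 00 00 00 00 00 00 10 cc 03 "
    "08 00 02 00 52 65 49 73 45 72 32 46 73 00 00 00 "
    "03 00 00 00 04 00 03 00 02 00 00 00 dc 52 00 00"
)
```

`parse_superblock(EXAMPLE_SUPERBLOCK)` yields the values listed in the example, for instance a block count of 65638 and the magic string "ReIsEr2Fs".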

Bitmap blocks

The bitmap blocks are simple bitmaps, where every bit stands for a block number. One bitmap block can address (8 * block size) blocks. Byte 0 of the bitmap maps to the first eight blocks, the second byte to the next eight, and so on. Within a byte, the low order bits map to the lower numbered blocks. Bit 0 maps to the first block, bit 1 to the second, etc. A set bit indicates that the block is in use, a zero bit that the block is free.

Example:

00000400 ff ff f7 ff 7f 00 00 00 00 00 00 00 00 80 cb bd  ÿÿ÷ÿ..........˽
These 16 bytes of bitmap block 0 describe block numbers 8192 to 8319.

Blocks 8192-8210: used
Block 8211: free (f7 is 11110111 binary)
Blocks 8212-8230: used
Blocks 8231-8302: free
Blocks 8303-8305: used
Block 8306: free
Block 8307: used
Blocks 8308-8309: free
Blocks 8310-8312: used
Block 8313: free
Blocks 8314-8317: used
Block 8318: free
Block 8319: used

Had the above entry been from a bitmap block other than bitmap block 0, then (bitmap block # * block size * 8) needs to be added for the proper block number. By bitmap block # we understand the ordinal number (0 for the 1st, 1 for the second, ...) not the block number of the bitmap block.

Given a block number b, one can determine its status as follows:

b div (8 * block size) : bitmap block # (integer division)

Let r = b mod (8* block size), then

r div 8: byte within bitmap block, and
r mod 8: bit within byte
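The lookup above translates almost literally into code. The following sketch uses hypothetical function names; `free_blocks` decodes a slice of a bitmap block and is run here against the 16 example bytes, which start at block 8192.

```python
def locate_bit(block, block_size):
    """Return (bitmap block ordinal, byte within bitmap, bit within byte)."""
    bitmap_index, r = divmod(block, 8 * block_size)
    byte_index, bit_index = divmod(r, 8)
    return bitmap_index, byte_index, bit_index


def is_used(bitmap, byte_index, bit_index):
    """A set bit means the block is in use; low-order bits come first."""
    return bool((bitmap[byte_index] >> bit_index) & 1)


def free_blocks(chunk, first_block):
    """List the free block numbers in `chunk`.

    Bit 0 of chunk[0] is taken to describe block `first_block`.
    """
    return [
        first_block + 8 * i + bit
        for i, byte in enumerate(chunk)
        for bit in range(8)
        if not (byte >> bit) & 1
    ]


# The 16 bytes at offset 0x400 of bitmap block 0 from the example above.
EXAMPLE_CHUNK = bytes.fromhex("ff ff f7 ff 7f 00 00 00 00 00 00 00 00 80 cb bd")
```

For block 8211 with 4096-byte blocks, `locate_bit` gives bitmap block 0, byte 1026 (0x402), bit 3, which is the cleared bit in the byte f7.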

The File System Tree

The Reiser file system is made up of a balanced tree (a B+ tree, or S+ tree as it is called in the reiserfs documentation). The tree is composed of internal nodes and leaf nodes. Each node is a disk block. Each object (called an item) in reiserfs (file, directory, or stat item) is assigned a unique key, which can be compared to an inode number in other file systems. The internal nodes are mainly composed of keys and pointers to their child nodes. There is always one more pointer than there are keys. P0 points to the objects that have keys smaller than K0, P1 to those with K0 <= key < K1, and so on. The last pointer points to those objects whose keys are equal to or larger than the last key in the node. Each node has a level, with 1 denoting leaf nodes and 2 and higher denoting internal nodes. The root node has the highest level.

For our example partition, part of the S+ tree looks like this (think of the key as a large 128-bit number for now):

The reiserfs S+-tree

The internal nodes are the boxes, and the leaf nodes are depicted as circles. Note that the block numbers do not relate directly to the keys. Reiserfs tries to assign items whose key values lie closely together block numbers that are also close together, but this does not matter for the description of the structural layout.

Block headers

Each disk block that belongs to an internal or leaf node starts with a block header. Only unformatted blocks don't have a block header. A block header is always 24 bytes long and contains the following information:

The block header structure

Name Size Description
Level 2 level of the block in the tree
Nr. of items 2 number of items in the block
Free space 2 free space left in the block
Reserved 2  
Right key 16 right delimiting key for the block

The right delimiting key was originally used for leaf nodes but is now only kept for compatibility.

Example:

The following is the block header of block 8416, the leftmost leaf node in the tree.

00000000 01 00 06 00 e4 04 00 00 00 00 00 00 00 00 00 00  ....ä...........
00000010 00 00 00 00 00 00 00 00
Example of a block header

Level: 1
Items: 6
Free space: 1252 bytes
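A minimal sketch of decoding the 24-byte block header with Python's `struct` module follows. The dictionary keys are illustrative names for the table rows, and little-endian byte order is assumed, as in the example.

```python
import struct

BLOCK_HEADER_FORMAT = "<4H16s"  # level, nr. of items, free space, reserved, right key


def parse_block_header(raw):
    """Decode the block header at the start of an internal or leaf node."""
    level, nr_items, free_space, _reserved, _right_key = struct.unpack_from(
        BLOCK_HEADER_FORMAT, raw
    )
    return {
        "level": level,
        "nr_items": nr_items,
        "free_space": free_space,
        "is_leaf": level == 1,  # level 1 denotes leaf nodes
    }


# Block header of block 8416 from the example above.
EXAMPLE_BLOCK_HEADER = bytes.fromhex(
    "01 00 06 00 e4 04 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00"
)
```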

Keys

Keys are used in the Reiser file system to uniquely identify items, but also to locate them in the tree and achieve local groupings of items that belong together. A key consists of four objects: the directory id, the object id, the offset within the object, and a type. Note that the actual object identifier is only one part of the key. The directory id is present so that files that belong to the same directory are grouped together and for the most part are located in the same subtree(s). The offset is present because an indirect item can at most contain (blocksize-48)/4 pointers to unformatted blocks (see indirect items below). For a block size of 4096 bytes this would result in a maximum file size of 4048KB. To be able to handle larger files, multiple keys are used to reference the file. All fields of the key are the same, except for the offset, which denotes the offset in bytes of the file that a particular key references. I do not know why the type of an object is part of the actual key.

In reiserfs up until version 3.5 the offset and the type fields were both 4 byte values. This meant that the maximum file size was limited to roughly 2^32 bytes, or 4GB (2^32 bytes plus the data of one more indirect item plus the tail, actually). To increase the maximum file size in the file system, in version 3.6, the offset field was increased to 60 bits, and the type field shrunk to 4 bits. This now allows for a theoretical maximum file size of 2^60 bytes, but as there can be only 2^32 blocks with a maximum of 2^16 bytes per block, the file system itself only supports 2^48 bytes.

In order not to be incompatible with older versions of the file system, there are now two different versions of keys around, which can be very confusing as the key itself doesn't carry a version number. To make up for this, the formerly reserved last 16 bits of the item header (see item header below) now contain a version number, so if necessary, the key's version number can be obtained from there. This makes it fairly straightforward for keys contained in leaf nodes, but if one really wanted to determine the version of a key inside an internal node, one would have to follow the tree down to the leaf first. The code in the reiserfs library actually uses this ugly hack to determine the key format:

static inline int is_key_format_1 (int type) {
    return ( (type == 0 || type == 15) ? 1 : 0);
}

/* old keys (on i386) have k_offset_v2.k_type == 15 (direct and
   indirect) or == 0 (dir items and stat data) */

/* */
int key_format (const struct key * key)
{
    int type;

    type = get_key_type_v2 (key);

    if (is_key_format_1 (type))
        return KEY_FORMAT_1;

    return KEY_FORMAT_2;
}

This actually implies that stat items will always be assumed to have KEY_FORMAT_1, because they, also, have a type of zero in version 2.

Version 1 key:
Key of version 1

Name Size Description
Directory ID 4 the identifier of the directory where the object is located
Object ID 4 the actual identifier of the object ("inode number")
Offset 4 the offset in bytes that this key references
Type 4 the type of item. Possible values are: Stat: 0, Indirect: 0xfffffffe, Direct: 0xffffffff, Directory: 500, Any: 555

Version 2 key:
Key of version 2

Name Size Description
Directory ID 4 the identifier of the directory where the object is located
Object ID 4 the actual identifier of the object ("inode number")
Offset 60 bits the offset in bytes that this key references
Type 4 bits the type of item. Possible values are: Stat: 0, Indirect: 1, Direct: 2, Directory: 3, Any: 15

Only stat items have an offset of 0. Files (direct and indirect items) and directories always start with an offset of 1 so that they are sorted behind the stat item in the leaf nodes. For directory items the "offset" field contains the hash value and generation number of the leftmost directory header of the directory item (see below), not the offset in bytes.

Examples:

The following shows the first two keys of the internal node that is contained in block 8482. The first one is of version 2, the second of version 1.

00000000 02 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00  ................
Example of a key of version 2

Directory id: 2
Object id: 14
Offset: 0
Type: Stat item (0)

00000000 03 00 00 00 04 00 00 00 01 00 00 00 f4 01 00 00  ............ô...
Example of a key of version 1

Directory id: 3
Object id: 4
Offset: 1
Type: Directory item (500)

Two keys are compared by comparing their directory ids first, and if those are equal, by comparing the object ids, and so on for offset and type. When inspecting the Linux reiserfs source code, one can see that a warning is generated when the type fields need to be compared for keys stored in memory. This indicates that the type field does not matter from a structural point of view. The only time the field needs to be compared seems to be during "tail conversion", where a direct item is changed into an indirect one.
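The two key layouts and the format heuristic can be sketched in Python as follows. One assumption goes beyond the text: in a version 2 key, the offset is taken to be the low 60 bits and the type the high 4 bits of a little-endian 64-bit value. That placement is consistent with the quoted comment that old keys show a type of 15 or 0 when read as version 2, but it is not stated explicitly above. The function names are illustrative.

```python
import struct


def parse_key_v1(raw):
    """Version 1: four 4-byte fields (dir id, object id, offset, type)."""
    return struct.unpack_from("<4I", raw)


def parse_key_v2(raw):
    """Version 2: two 4-byte ids, then a 60-bit offset and a 4-bit type."""
    dir_id, obj_id, packed = struct.unpack_from("<2IQ", raw)
    offset = packed & ((1 << 60) - 1)  # assumed: low 60 bits
    key_type = packed >> 60            # assumed: high 4 bits
    return dir_id, obj_id, offset, key_type


def guess_key_format(raw):
    """Mirror of the heuristic quoted above: type 0 or 15 means version 1."""
    return 1 if parse_key_v2(raw)[3] in (0, 15) else 2


def parse_key(raw):
    if guess_key_format(raw) == 1:
        return parse_key_v1(raw)
    return parse_key_v2(raw)


# The two example keys from block 8482.
KEY_0 = bytes.fromhex("02 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00")
KEY_1 = bytes.fromhex("03 00 00 00 04 00 00 00 01 00 00 00 f4 01 00 00")
```

Because the fields are compared in order (directory id, object id, offset, type), the parsed tuples can be compared directly: `sorted([parse_key(KEY_1), parse_key(KEY_0)])` puts `(2, 14, 0, 0)` before `(3, 4, 1, 500)`.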

Internal nodes

An internal node block consists of the block header, keys, and pointers to child nodes. Unlike our figure of the S+-tree above, the internal nodes list all the keys first, sorted by key value. The pointers follow the last key, starting with the pointer to the subtree containing all the keys smaller than the first key.

Internal node layout

The level in the block header should always be larger than 1 for internal nodes. The number of items in the block header denotes the number of keys in the node, not the combined number of keys and pointers. There is always one more pointer than there are keys. The following figure describes the layout of the pointer structure:

Pointer to child node

Given a key n (whose position in the block is 24 + n * 16 bytes) and a total number of k keys in the block, the left pointer that corresponds to key n can be found at byte 24 + k * 16 + n * 8. The free space starts at byte blocksize - free space, where free space is the value from the block header.

Example:

00000000 02 00 a0 00 e0 00 00 00 00 00 00 00 00 00 00 00  .. .à...........
00000010 00 00 00 00 00 00 00 00 02 00 00 00 0e 00 00 00  ................
00000020 00 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00  ................
00000030 01 00 00 00 f4 01 00 00 03 00 00 00 9e 04 00 00  ....ô...........
00000040 00 00 00 00 00 00 00 00 04 00 00 00 05 00 00 00  ................
...
00000a10 01 10 00 00 00 00 00 20 e0 20 00 00 04 0b b4 cc  ....... à ....´Ì
00000a20 03 21 00 00 94 0d 54 c5 0b 21 00 00 e0 0f 2f c5  .!....TÅ.!..à./Å
00000a30 5e 23 00 00 b4 0f f4 ff 60 23 00 00 38 07 a9 ff  ^#..´.ôÿ`#..8.©ÿ
...

Level: 2
Nr. items: 160
Free space: 224 bytes

Key 0: {2, 14, 0, 0}
Key 1: {3, 4, 1, 500}
Key 2: {3, 1182, 0, 0}
...
Ptr 0: {8416, 2820}
Ptr 1: {8451, 3476}
Ptr 2: {8459, 4064}
Ptr 3: {9054, 4020}
...

This example shows parts of block 8482, which is also depicted in the diagram describing the S+-tree above. Key 0 starts at byte 24 (0x18), and as there are 160 items in the block, Ptr 0 starts at byte 2584 (0xa18). Note that the reserved parts of the pointers actually contain junk data. The free space starts at byte 3872 (0xf20) and it may also contain junk data.
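The offset arithmetic for internal nodes can be sketched as below. The pointer layout used in `parse_pointer` (a 4-byte block number, a 2-byte size, and 2 reserved bytes) is an assumption inferred from the example dump, since the figure describing it is not reproduced here; the function names are illustrative.

```python
import struct

BLOCK_HEADER_SIZE = 24
KEY_SIZE = 16
POINTER_SIZE = 8


def key_offset(n):
    """Byte offset of key n within an internal node."""
    return BLOCK_HEADER_SIZE + n * KEY_SIZE


def pointer_offset(n, nr_keys):
    """Byte offset of the left pointer that corresponds to key n."""
    return BLOCK_HEADER_SIZE + nr_keys * KEY_SIZE + n * POINTER_SIZE


def free_space_offset(block_size, free_space):
    """Byte offset at which the free space of the node starts."""
    return block_size - free_space


def parse_pointer(raw, offset=0):
    """Return (child block number, size); the reserved bytes are ignored."""
    block_nr, size, _reserved = struct.unpack_from("<IHH", raw, offset)
    return block_nr, size


# Row 0xa10 of the example dump: the end of key 159, then Ptr 0 at 0xa18.
ROW_A10 = bytes.fromhex("01 10 00 00 00 00 00 20 e0 20 00 00 04 0b b4 cc")
```

With 160 keys, `pointer_offset(0, 160)` is 2584 (0xa18), and `parse_pointer(ROW_A10, 8)` gives `(8416, 2820)`, matching Ptr 0 in the example.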

Leaf nodes

Leaf nodes are found at the lowest level of the S+-tree. Except for indirect items all the data is contained within the leaf nodes. Leaf nodes are made up of the block header, item headers, and items:

Leaf node layout

Note that the free space in the block is located between the last item header and item, and that items are in reverse order. This way, new item headers and items can simply be added without having to rearrange existing items. New headers go after the last header, and new items before the first on-disk item. Also note that items are of variable length.

Item Headers

The item header describes the item it refers to. It contains the key for the item as well as the item's location and size within the leaf node. The type of the item is determined by its key.

Item header layout

Name Size Description
Key 16 The key that belongs to the item
Count 2 for an indirect item, the free space in the last unformatted node; for stat and direct items, 0xffff; for a directory item, the number of directory entries
Length 2 total size of the item within the block
Location 2 offset to the item body within the block
Version 2 0 for all old items (keys), 1 for new ones

Note that the length indicates the length that the item uses up within the current block, not the length of the file. Also note that the comments in the structure definition in the Reiserfs source code indicate that new items have a version of 2. However, the KEY_FORMAT_3_6 constant is defined as 1 and this is used to set the version.

Example:

The following is the item header for the stat item described by key {2, 14, 0, 0}, which was used earlier as an example of type 2 (version 3.6). It shows that the version is indeed the new version, even though the heuristic above would indicate an old key.

00000000 02 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00  ................
00000010 ff ff 2c 00 d4 0f 01 00                          ÿÿ,.Ô...

Example of an item header

Key: {2, 14, 0, 0}
Count: 0xffff
Length: 44 bytes
Location: byte 4052
Version: 1 (3.6)
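A sketch of decoding the 24-byte item header, again with `struct` and little-endian byte order assumed. The key is left as raw bytes because its layout depends on the version field; the dictionary keys are illustrative names.

```python
import struct

ITEM_HEADER_FORMAT = "<16s4H"  # key, count, length, location, version


def parse_item_header(raw):
    """Decode one item header from a leaf node."""
    key, count, length, location, version = struct.unpack_from(
        ITEM_HEADER_FORMAT, raw
    )
    return {
        "key": key,
        "count": count,
        "length": length,
        "location": location,
        "version": version,
    }


# Item header of the stat item with key {2, 14, 0, 0} from the example above.
EXAMPLE_ITEM_HEADER = bytes.fromhex(
    "02 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00 "
    "ff ff 2c 00 d4 0f 01 00"
)
```

In the example, location plus length is 4052 + 44 = 4096, so this item occupies the very end of a 4096-byte block.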

Items

Items finally contain the actual data. There are four types of items: stat items, directory items, direct items, and indirect items. Files are made up of one or more direct or indirect items, depending on the file's size. Every file and directory is preceded by a stat item.

Stat Items

Stat items contain the meta-data for files and directories. Keys belonging to stat items always have an offset and type of 0, so that the stat item key always comes first, before the other key(s) belonging to the same "inode number". For the same reason that there are two versions of keys, there are also two versions of stat items, as the size field was increased from 32 bits to 64 bits. For some reason, the fields for the number of hard links, user id, and group id were also each increased from 16 bits to 32 bits, and other fields were introduced. Thus a stat item of version 3.5 is 32 bytes in size, whereas one of version 3.6 has 44 bytes.

The structure of a stat item of version 1:

Structure of the stat item version 1

Name Size Description
Mode 2 file type and permissions
Num links 2 number of hard links
UID 2 user id
GID 2 group id
Size 4 file size in bytes
Atime 4 time of last access
Mtime 4 time of last modification
Ctime 4 time stat data was last changed
Rdev/blocks 4 Device number / number of blocks file uses
First dir. byte 4 first byte of file which is stored in a direct item; if it equals 1 it is a symlink; if it equals 0xffffffff there is no direct item

The structure of a stat item of version 2:

Structure of the stat item version 2

Name Size Description
Mode 2 file type and permissions
Reserved 2  
Num links 4 number of hard links
Size 8 file size in bytes
UID 4 user id
GID 4 group id
Atime 4 time of last access
Mtime 4 time of last modification
Ctime 4 time stat data was last changed
Blocks 4 number of blocks file uses
Rdev/gen/first 4 Device number / file's generation / first byte of file which is stored in a direct item; if it equals 1 it is a symlink; if it equals 0xffffffff there is no direct item

The file mode field identifies the type of the file as well as the permissions. The low 9 bits (3 octals) contain the permissions for world, group, and user, the next 3 bits (from lower to higher) are the sticky bit, the set GID bit, and the set UID bit. The high 4 bits contain the file type. On a Linux system, possible values for the file type are (as defined in stat.h):

Constant Name 16-bit Mask 4-bit value Description
S_IFSOCK 0xc000 12 socket
S_IFLNK 0xa000 10 symbolic link
S_IFREG 0x8000 8 regular file
S_IFBLK 0x6000 6 block device
S_IFDIR 0x4000 4 directory
S_IFCHR 0x2000 2 character device
S_IFIFO 0x1000 1 fifo

Other operating systems might have additional file types. Only regular files and directories have other items associated with the stat item. In all the other cases the stat item makes up the entire file.

The "rdev" field applies to special files that are not regular files (S_IFREG), directories (S_IFDIR), or links (S_IFLNK). In those cases, the field holds the device number (or socket number) belonging to the file. The "generation" field applies to the other cases and denotes the inode generation number for the file/directory/link (see the description of the superblock's inode generation field above). The "first" field doesn't seem to be used in Reiserfs 3.6 anymore.

Example:

The following example shows the stat item denoted by key {2, 14, 0, 0} from the item header example above:

00000000 ff 43 05 00 03 00 00 00 50 00 00 00 00 00 00 00  ÿC......P.......
00000010 00 00 00 00 00 00 00 00 2d 1c 17 3f 34 94 ff 3e  ........-..?4.ÿ>
00000020 34 94 ff 3e 01 00 00 00 00 00 00 00              4.ÿ>........

Example of a stat item version 2

Mode: 0x43ff -- type: directory, sticky bit set, 777 permissions
Reserved: 5
Num. links: 3
Size: 80 bytes
UID: 0
GID: 0
Atime: Thu Jul 17 16:59:09 2003
Mtime: Sun Jun 29 20:36:52 2003
Ctime: Sun Jun 29 20:36:52 2003
Blocks: 1
First: 0
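The version 2 stat item and the mode bits can be decoded as sketched below. The field names are my own labels for the table rows, and the timestamps are left as raw seconds since the epoch because the printed times in the example depend on the local time zone.

```python
import struct

STAT_V2_FORMAT = "<HHIQ7I"  # 44 bytes, little-endian, no padding
STAT_V2_FIELDS = (
    "mode", "reserved", "num_links", "size", "uid", "gid",
    "atime", "mtime", "ctime", "blocks", "rdev_gen_first",
)
FILE_TYPES = {
    12: "socket", 10: "symbolic link", 8: "regular file",
    6: "block device", 4: "directory", 2: "character device", 1: "fifo",
}


def parse_stat_v2(raw):
    """Decode a version 2 (reiserfs 3.6) stat item into a dict."""
    return dict(zip(STAT_V2_FIELDS, struct.unpack_from(STAT_V2_FORMAT, raw)))


def decode_mode(mode):
    """Split the 16-bit mode into file type, special bits, and permissions."""
    return {
        "type": FILE_TYPES.get(mode >> 12, "unknown"),  # high 4 bits
        "setuid": bool(mode & 0o4000),
        "setgid": bool(mode & 0o2000),
        "sticky": bool(mode & 0o1000),
        "permissions": mode & 0o777,                    # low 9 bits
    }


# The stat item denoted by key {2, 14, 0, 0} from the example above.
EXAMPLE_STAT = bytes.fromhex(
    "ff 43 05 00 03 00 00 00 50 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 2d 1c 17 3f 34 94 ff 3e "
    "34 94 ff 3e 01 00 00 00 00 00 00 00"
)
```

`decode_mode(0x43ff)` reports a directory with the sticky bit set and permissions 0o777, in line with the example.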

Directory Items

Directory items describe a directory. If there are too many entries in a directory to be contained in one directory item, it will span across several directory items, using the offset value of the key. Directory items are made up of directory headers and file names. Just like leaf nodes, the free space (if there is any) is located in the middle of the item. The structure of a directory item is as follows:

Structure of a directory item

Directory headers contain an offset, the first two parts of the referenced item's key (directory id and object id), the location of the name within the block, and a status field.

Structure of a directory header

Name Size Description
Offset 4 Hash value and generation number
Dir ID 4 object id of the referenced item's parent directory
Object ID 4 object id of the referenced item
Location 2 offset of name within the item
State 2 bit 0 indicates that item contains stat data (not used); bit 2 indicates whether entry is visible (bit set) or hidden

The file names are simple zero-terminated ASCII strings. File name entries seem to be 8-byte aligned, but the information in the directory headers should be the authoritative source for the start of the name (and implicitly the end, by looking at the previous header entry). The "offset" field is misnamed, as it contains a hash value of the file name. Bits 7 through 30 of the field contain the actual hash value and bits 0 through 6 a generation number in case two file names within a directory hash to the same value. Bit 31 seems to be unused. The hash value is used to actually search for file and directory names in reiserfs, and the directory items are sorted by the offset value. Three different hash functions are possible: keyed tea hash, rupasov hash, and r5 hash. The purpose of the hash function is to create different values for different strings with as few collisions as possible. In the Linux implementation of reiserfs, the r5 hash seems to be the default.

Example:

The following example is an entire directory item that belongs to the stat item example from the previous section:

00000000 01 00 00 00 02 00 00 00 0e 00 00 00 48 00 04 00  ............H...
00000010 02 00 00 00 01 00 00 00 02 00 00 00 40 00 04 00  ............@...
00000020 00 6d 6f 73 0e 00 00 00 60 00 00 00 30 00 04 00  .mos....`...0...
00000030 76 69 2e 72 65 63 6f 76 65 72 00 00 00 00 00 00  vi.recover......
00000040 2e 2e 00 00 00 00 00 00 2e 00 00 00 00 00 00 00  ................

Example of a directory item

Header 0: {hash 0, gen. 1, 2, 14, byte 0x48, 4 (bit 2 set: visible)}
Header 1: {hash 0, gen. 2, 1, 2, byte 0x40, 4 (bit 2 set: visible)}
Header 2: {hash 15130330, gen. 0, 14, 96, byte 0x30, 4 (bit 2 set: visible)}
Name 2: "vi.recover"
Name 1: ".."
Name 0: "."

As one can see, the directory referenced by key {2, 14, 0, 0} consists of 3 entries, which in turn have the following keys (all these keys will lead to the stat item for the directory first):

. {2, 14, 0, 0}
.. {1, 2, 0, 0}
vi.recover {14, 96, 0, 0}
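The directory item above can be decoded with a short sketch. The number of entries is passed in because it comes from the count field of the item header, not from the item itself; the function name and dictionary keys are illustrative.

```python
import struct

DIR_HEADER_FORMAT = "<3I2H"  # offset, dir id, object id, location, state
DIR_HEADER_SIZE = 16


def parse_directory_item(item, entry_count):
    """Decode the headers and names of one directory item."""
    entries = []
    end = len(item)  # name 0 runs to the end of the item
    for i in range(entry_count):
        offset, dir_id, obj_id, location, state = struct.unpack_from(
            DIR_HEADER_FORMAT, item, i * DIR_HEADER_SIZE
        )
        # Each name ends where the previous header's name begins.
        name = item[location:end].split(b"\0", 1)[0].decode("ascii")
        entries.append({
            "name": name,
            "hash": (offset >> 7) & 0xFFFFFF,  # bits 7 through 30
            "generation": offset & 0x7F,       # bits 0 through 6
            "key": (dir_id, obj_id, 0, 0),     # key of the entry's stat item
            "visible": bool(state & 0x4),      # bit 2
        })
        end = location
    return entries


# The entire directory item from the example above (80 bytes).
EXAMPLE_DIR_ITEM = bytes.fromhex(
    "01 00 00 00 02 00 00 00 0e 00 00 00 48 00 04 00 "
    "02 00 00 00 01 00 00 00 02 00 00 00 40 00 04 00 "
    "00 6d 6f 73 0e 00 00 00 60 00 00 00 30 00 04 00 "
    "76 69 2e 72 65 63 6f 76 65 72 00 00 00 00 00 00 "
    "2e 2e 00 00 00 00 00 00 2e 00 00 00 00 00 00 00"
)
```

Calling `parse_directory_item(EXAMPLE_DIR_ITEM, 3)` returns the entries ".", "..", and "vi.recover" with the hash values, generation numbers, and keys listed in the example.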

Direct Items

Direct items contain the entire file body of small files or the tail of a file. For small files, all the necessary other information can be found in the item header and the corresponding stat item for the file. For the tail of a file, the key for the direct item is the last one for the file.

Indirect Items

Indirect items contain pointers to unformatted blocks that belong to a file. Each pointer is 4 bytes long and contains the block number of the unformatted block. An indirect item that takes up an entire leaf node can at most contain (blocksize-48) / 4 pointers (the 48 bytes are for the block and item headers). In a partition with 4096 bytes block size, a single indirect item can at most reference 4145152 bytes (4048 KB: 1012 pointers to 4K blocks). Larger files are composed of multiple indirect items, using the offset value in the key, plus a possible tail.

Structure of an indirect item
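The capacity figures for an indirect item follow from simple arithmetic, sketched here in Python. The function name indirect_capacity and its parameter names are made up for the illustration; the 48 bytes of header overhead and the 4-byte pointer size are the values given in the text.

```python
def indirect_capacity(block_size: int, header_bytes: int = 48, pointer_size: int = 4):
    """Return (pointers per full-leaf indirect item, bytes of file data they cover)."""
    pointers = (block_size - header_bytes) // pointer_size
    return pointers, pointers * block_size


pointers, max_bytes = indirect_capacity(4096)
print(pointers, max_bytes, max_bytes // 1024)  # 1012 4145152 4048
```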

The Journal

The journal in reiserfs is a continuous set of disk blocks and it describes transactions made to the file system. Each time the file system is modified in any way, instead of performing the changes directly in the file system, the transactions that belong together (those that need to be atomic so that the file system is in a consistent state) are written into the journal first. At a later point the transactions in the journal will be flushed and, if everything was successful, marked as such.

The journal is of fixed size in the file system. In the 2.4.x Linux implementation the journal size is fixed at 8192 blocks plus one block for the journal header. The journal itself consists of variable-length transactions and a journal header. The journal starts with the list of transactions and the journal header is at the end of the journal. A transaction spans at least three disk blocks and the journal header is exactly one block. The journal is a circular buffer, meaning that once the last block of the journal is reached, it wraps around and uses the first block again.

It can often be read that reiserfs only records the file system meta data in its journal. This is not entirely correct. It is true that the purpose of the journaling is to ensure the integrity of the meta data. However, reiserfs journals entire disk blocks as they have to appear in the file system after the journal transaction is committed. As directories, stat data and small files are stored directly in the leaf nodes of the tree, some amount of data is also contained in the journal and could be used to reconstruct earlier versions of a file or directory.

The journal layout

Journal Header

The journal header is a single block which describes where the first unflushed transaction can be found in the journal. The journal header is the last block of the journal. In our example the journal's first transaction starts at block 18 and there are 8192 journal blocks. Therefore, the journal header is at block 8210. There are only 12 bytes of information in the journal header. The rest of the block is undefined.

The journal header

Name              Size (bytes)  Description
Last flush ID     4             The transaction ID of the last fully flushed transaction
Unflushed offset  4             The offset (in blocks) of the next transaction in the journal
Mount ID          4             The mount ID of the flushed transaction

The transaction pointed to by the offset must have a higher transaction ID or a higher mount ID than the flushed transaction in order to be considered an unflushed transaction. If this is not the case, all transactions are considered flushed and the block pointed to by the offset is used to start recording new journal transactions.

Example:

00000000 e2 74 02 00 24 1c 00 00 1d 01 00 00 12 00 00 00  ât..$...........

The journal header example

Last flush ID: 160994
Unflushed offset: 7204 blocks
Mount ID: 285

In this example, the first unflushed transaction can be found at block 7222 (as the journal starts at block 18). However, the block found there does not contain a transaction description (see below) and therefore there aren't any unflushed transactions for the partition.
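The journal header example can be decoded with a few lines of Python. This is a sketch only: the constants JOURNAL_START and JOURNAL_BLOCKS are the values of the example partition, not something read from the superblock, and the modulo merely illustrates the circular-buffer behaviour described earlier.

```python
import struct

JOURNAL_START = 18     # first journal block of the example partition
JOURNAL_BLOCKS = 8192  # journal size in the 2.4.x Linux implementation

# First 12 bytes of the journal header block from the example above.
header = bytes.fromhex("e2740200" "241c0000" "1d010000")
last_flush_id, unflushed_offset, mount_id = struct.unpack("<III", header)

# The header occupies the block right after the transaction area.
header_block = JOURNAL_START + JOURNAL_BLOCKS

# The offset is relative to the journal start; the journal wraps around.
first_unflushed_block = JOURNAL_START + unflushed_offset % JOURNAL_BLOCKS

print(last_flush_id, unflushed_offset, mount_id)  # 160994 7204 285
print(header_block, first_unflushed_block)        # 8210 7222
```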

Transactions

Transactions describe changes in the file system. Instead of directly modifying blocks in the file system tree, new or changed blocks are first written into the journal and mapped to their real location in the file system.

A transaction consists of a transaction description block, a list of blocks, and a commit block at the end. All those blocks are contiguous within the journal.

The journal transaction layout

Description block

The description block contains the transaction and mount IDs, the number of blocks in the transaction, a magic number, and the first half of the block mappings.

The transaction description block

Name            Size (bytes)     Description
Transaction ID  4                The transaction ID
Len             4                Length (in blocks) of the transaction
Mount ID        4                Mount ID of the transaction
Real blocks     Block size - 24  Mapping for blocks in transaction
Magic           12               Magic number. Should be "ReIsErLB"

The "Real blocks" field is theoretically dependent on the block size. The first 12 bytes of the block have the IDs and the length, and the last 12 bytes contain the magic string. Everything in between is used for the block mapping. However, in the Linux 2.4.x implementation, the struct for a description block defines

  __u32 j_realblock[JOURNAL_TRANS_HALF];
where JOURNAL_TRANS_HALF is a constant set to 1018. As 1018 * 4 = 4072 = 4096 - 24, this means that the blocksize has to be 4096 for journaling to work with reiserfs under Linux!

The actual block mapping is done as follows: The "Real blocks" field is seen as an array that contains, for each block in the transaction, the actual block number of the block in the file system. If we number every four bytes in the field as r0 through rn, then block 0 of the transaction is what block number r0 needs to look like after flushing the journal. Block 1 of the transaction is block r1, and so on. If the "Real blocks" field of the description block is not large enough, the field in the commit block is used in addition. This limits the maximum number of blocks in one transaction to 2*(blocksize-24)/4 (2036 for a block size of 4K), but the actual limit is set in the superblock.
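How a lookup spills over from the description block into the commit block can be sketched as follows. The helper names trans_half and real_block are hypothetical, and the block numbers in the two arrays are made-up values used only to exercise the lookup.

```python
def trans_half(block_size: int) -> int:
    """Number of 4-byte mapping slots in one "Real blocks" field."""
    return (block_size - 24) // 4


def real_block(i: int, desc_real, commit_real) -> int:
    """File system block number for block i of a transaction."""
    half = len(desc_real)
    if i < half:
        return desc_real[i]       # first half: description block
    return commit_real[i - half]  # remainder: commit block


half = trans_half(4096)
print(half, 2 * half)  # 1018 2036

# Made-up mapping tables, for illustration only.
desc_real = list(range(1000, 1000 + half))
commit_real = [5000, 5001]
print(real_block(0, desc_real, commit_real))     # 1000
print(real_block(half, desc_real, commit_real))  # 5000
```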

Commit block

The commit block terminates a transaction. It contains a copy of the transaction ID and the transaction length. There is also a 16 byte field reserved for a digest value at the end of the block, but this is not used currently. It also contains the remaining half of the block mappings.

The transaction commit block

Name            Size (bytes)     Description
Transaction ID  4                The transaction ID
Len             4                Length (in blocks) of the transaction
Real blocks     Block size - 24  Mapping for blocks in transaction
Digest          16               Digest of all blocks in transaction. Not used.

Example:

The following example describes an old transaction in our example partition. The transaction starts in block 7243 (the description block), spans 4 data blocks (7244-7247) and has its commit block at block number 7248. Only the description block is shown, as the other blocks are not relevant for the example.

00000000 1b 6e 02 00 04 00 00 00 1b 01 00 00 90 22 00 00  .n..........."..
00000010 07 f7 00 00 aa 22 00 00 10 00 00 00 00 00 00 00  .÷..ª"..........
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
...
00000ff0 00 00 00 00 52 65 49 73 45 72 4c 42 00 00 00 00  ....ReIsErLB....

A sample transaction description block

Transaction ID: 159259
Length: 4 blocks
Mount ID: 283
Real blocks[0]: 8848
Real blocks[1]: 63239
Real blocks[2]: 8874
Real blocks[3]: 16
Magic: ReIsErLB

This transaction therefore describes the following mapping: when the transaction is committed/flushed, block 7244 is written to block 8848, block 7245 to block 63239, block 7246 to block 8874, and block 7247 to block 16 (the superblock).
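The same mapping can be derived programmatically. The sketch below rebuilds the 4096-byte description block from the bytes shown in the dump (everything elided by "..." is assumed to be zero, as the surrounding lines suggest) and decodes it. The function name parse_description_block is hypothetical, and the sketch ignores both the commit block half and a possible wrap-around at the end of the journal.

```python
import struct

BLOCK_SIZE = 4096
DESC_BLOCK_NO = 7243  # journal block holding the description block

head = bytes.fromhex(
    "1b6e0200" "04000000" "1b010000"             # transaction ID, length, mount ID
    "90220000" "07f70000" "aa220000" "10000000"  # first four mapping slots
)
# Zero fill up to the 12-byte magic field at the end of the block.
block = head + bytes(BLOCK_SIZE - len(head) - 12) + b"ReIsErLB" + bytes(4)


def parse_description_block(blk: bytes):
    """Return (transaction ID, length, mount ID, mapping slots, magic)."""
    trans_id, length, mount_id = struct.unpack_from("<III", blk, 0)
    slots = (len(blk) - 24) // 4
    real_blocks = struct.unpack_from(f"<{slots}I", blk, 12)
    magic = blk[-12:].rstrip(b"\0")
    return trans_id, length, mount_id, real_blocks, magic


trans_id, length, mount_id, real_blocks, magic = parse_description_block(block)

# Data block i of the transaction directly follows the description block.
mapping = {DESC_BLOCK_NO + 1 + i: real_blocks[i] for i in range(length)}
print(trans_id, length, mount_id, magic)
print(mapping)
```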

Note: the superblock needs to be updated by a large number of file system operations (e.g. when the free block count is updated or the height of the file system tree changes). Thus a copy of the superblock can be found in many journal transaction blocks. This can lead to confusion when trying to locate the superblock via the magic string. The real superblock *should* be the first block with the magic string on the partition.

Navigating reiserfs

In addition to the file system tree itself, in order to access files, one needs to navigate through the directory tree. The root directory of a Reiser file system always has the key {1, 2, 0, 0}. The keys for subsequent directories and files within the directory hierarchy can then be found in the headers of the directory items. As the keys in reiserfs are sorted by parent directory ID first, items that are in the same directory are grouped together in the file system tree. This allows for searching for keys locally instead of always having to go through the root node of the file system.

A key {a, b, 0, 0} will always yield the stat item of the directory or file, and subsequent items will follow immediately after that in the file system tree. The stat item contains the size of the actual item in bytes. With this information and using the size information of the individual item headers, the keys for other parts of the directory/file can be constructed and the parts located. In many cases, the items will be arranged consecutively on the disk, anyway.

The following three examples will show three different types of files: a very small file consisting only of a stat item and a tail, a larger file that actually has an indirect item, and finally a very large file that spans multiple indirect items. We again use the example partition from above, which is an image of a partition mounted as "/var" in a SuSE Linux 8.0 system.

Example 1: small file

The first example is that of a small file that consists only of a stat item and one direct item. The file is "/var/log/y2start.log-initial". The root directory ("/var") has key {1,2,0,0}, which by navigating the file system tree can be found in block 8416. There we can find that the "log" directory has key {2,13,0,0}. This directory is also contained in block 8416. The file "y2start.log-initial" has key {13, 1633, 0, 0}. By inspecting block 8482, we find that this key is contained in the leaf node block number 24224. The item headers for the keys {13, 1633, 0, 0} and {13, 1633, 1, 2} are as follows:
00000090 0d 00 00 00 61 06 00 00 00 00 00 00 00 00 00 00  ....a...........
000000a0 ff ff 2c 00 a4 0b 01 00 0d 00 00 00 61 06 00 00  ÿÿ,.¤.......a...
000000b0 01 00 00 00 00 00 00 20 ff ff f0 00 b4 0a 01 00  ....... ÿÿð.´...
Key: {13, 1633, 0, 0}
Count: 0xffff
Length: 44 bytes
Location: byte 2980 (0xba4)
Version: 1 (new)

Key: {13, 1633, 1, 2}
Count: 0xffff
Length: 240 bytes
Location: byte 2740 (0xab4)
Version: 1 (new)
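The two item headers can be decoded with the following Python sketch. It assumes the 24-byte header layout implied by the dump (a 16-byte key followed by four 2-byte fields) and the key format of version 1 ("new") items, in which the last 8 bytes of the key hold a 60-bit offset and a 4-bit type; old-format keys are not handled. The name parse_item_headers is made up for this sketch.

```python
import struct

# key: directory ID, object ID, 64-bit offset/type; then count, length,
# location and version, all little-endian.
ITEM_HEADER = struct.Struct("<IIQHHHH")

raw = bytes.fromhex(
    "0d000000" "61060000" "0000000000000000" "ffff" "2c00" "a40b" "0100"
    "0d000000" "61060000" "0100000000000020" "ffff" "f000" "b40a" "0100"
)


def parse_item_headers(data: bytes):
    """Decode consecutive 24-byte item headers (version 1 keys only)."""
    headers = []
    for dir_id, obj_id, off_type, count, length, location, version in \
            ITEM_HEADER.iter_unpack(data):
        offset = off_type & ((1 << 60) - 1)  # low 60 bits
        item_type = off_type >> 60           # high 4 bits
        headers.append({
            "key": (dir_id, obj_id, offset, item_type),
            "count": count,
            "length": length,
            "location": location,
            "version": version,
        })
    return headers


headers = parse_item_headers(raw)
for h in headers:
    print(h)
```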

At byte 2740 (0xab4) in the block, we find the direct item for the file followed by the stat item at byte 2980 (0xba4):

00000ab0             65 6e 76 0a 65 63 68 6f 20 59 32 44      env.echo Y2D
00000ac0 45 42 55 47 20 28 29 0a 6d 65 6d 69 6e 66 6f 20  EBUG ().meminfo
00000ad0 31 20 3d 20 4d 65 6d 3a 20 31 30 33 33 34 35 36  1 = Mem: 1033456
00000ae0 20 38 35 39 37 36 20 39 34 37 34 38 30 20 30 20   85976 947480 0
00000af0 36 34 32 34 20 35 37 31 37 32 0a 69 53 65 72 69  6424 57172.iSeri
00000b00 65 73 3d 31 0a 68 76 63 5f 63 6f 6e 73 6f 6c 65  es=1.hvc_console
00000b10 3d 31 0a 58 31 31 69 3d 0a 4d 65 6d 54 6f 74 61  =1.X11i=.MemTota
00000b20 6c 3d 31 30 33 33 34 35 36 0a 66 62 64 65 76 5f  l=1033456.fbdev_
00000b30 6f 6b 3d 31 0a 75 70 64 61 74 65 3d 0a 58 56 65  ok=1.update=.XVe
00000b40 72 73 69 6f 6e 3d 34 0a 58 53 65 72 76 65 72 3d  rsion=4.XServer=
00000b50 66 62 64 65 76 0a 78 73 72 76 3d 58 46 72 65 65  fbdev.xsrv=XFree
00000b60 38 36 0a 73 63 72 65 65 6e 3d 66 62 64 65 76 0a  86.screen=fbdev.
00000b70 6d 65 6d 69 6e 66 6f 20 32 20 3d 20 4d 65 6d 3a  meminfo 2 = Mem:
00000b80 20 31 30 33 33 34 35 36 20 39 32 34 30 34 20 39   1033456 92404 9
00000b90 34 31 30 35 32 20 30 20 38 32 33 32 20 36 30 35  41052 0 8232 605
00000ba0 31 36 0a 00 a4 81 05 00 01 00 00 00 ef 00 00 00  16..¤.......ï...
00000bb0 00 00 00 00 00 00 00 00 00 00 00 00 25 15 3e 3d  ............%.>=
00000bc0 25 15 3e 3d 25 15 3e 3d 08 00 00 00 d5 02 00 00  %.>=%.>=....Õ...
Mode: S_IFREG (regular file), -rw-r--r--
Num. links: 1
Size: 239
UID: 0
GID: 0
A/M/Ctimes: 07/23/2002 21:47:01
Blocks: 8
Gen: 725

Note that the stat item contains the correct size for the file, 239 bytes. Despite that, the direct item takes up 240 bytes in the block. This means that byte 2979 (0xba3) of the block does not belong to the file's data. This discrepancy is probably due to alignment issues of the implementation.
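The stat item bytes above can be unpacked as shown in this sketch. The 44-byte field layout is an assumption that is consistent with the decoded values listed above: a 2-byte mode, 2 bytes that are not interpreted here, then link count, 64-bit size, UID, GID, three timestamps, block count and generation. The last lines check that rounding the file size up to a multiple of 8 gives the observed item length, which fits the alignment explanation but does not prove it.

```python
import stat
import struct

STAT_ITEM = struct.Struct("<HHIQIIIIIII")  # 44 bytes, little-endian

raw = bytes.fromhex(
    "a4810500" "01000000" "ef00000000000000"  # mode + 2 bytes, links, size
    "00000000" "00000000"                     # UID, GID
    "25153e3d" "25153e3d" "25153e3d"          # atime, mtime, ctime
    "08000000" "d5020000"                     # blocks, generation
)

(mode, _unparsed, links, size, uid, gid,
 atime, mtime, ctime, blocks, gen) = STAT_ITEM.unpack(raw)

print(stat.filemode(mode), links, size, uid, gid, blocks, gen)

# Round the file size up to the next multiple of 8 bytes.
padded = (size + 7) & ~7
print(padded)  # 240, the length of the direct item
```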

Example 2: file with indirect item

The file "/var/log/SaX.log" is 7121 bytes long. It therefore cannot fit as a direct item and needs to be split either into two unformatted blocks or one unformatted block and a tail. In this case, the file will take up two unformatted blocks described by one indirect item. The key for the file is {13, 1490, 0, 0} and examining block 8482 we find out that it is contained in leaf node block number 27444. The item headers for the keys:
00000040                         0d 00 00 00 d2 05 00 00          ....Ò...
00000050 00 00 00 00 00 00 00 00 ff ff 2c 00 a4 0b 01 00  ........ÿÿ,.¤...
00000060 0d 00 00 00 d2 05 00 00 01 00 00 00 00 00 00 10  ....Ò...........
00000070 00 00 08 00 9c 0b 01 00                          ........        
Key: {13, 1490, 0, 0}
Count: 0xffff
Length: 44 bytes
Location: byte 2980 (0xba4)
Version: 1 (new)

Key: {13, 1490, 1, 1}
Count: 0
Length: 8 bytes
Location: byte 2972 (0xb9c)
Version: 1 (new)

Indirect blocks and stat item:

00000b90                                     12 52 00 00              .R..
00000ba0 13 52 00 00 a4 81 05 00 01 00 00 00 d1 1b 00 00  .R..¤.......Ñ...
00000bb0 00 00 00 00 00 00 00 00 00 00 00 00 3f aa 4a 3d  ............?ªJ=
00000bc0 bd aa 4a 3d bd aa 4a 3d 10 00 00 00 54 05 00 00  ½ªJ=½ªJ=....T...
Mode: S_IFREG (regular file), -rw-r--r--
Num. links: 1
Size: 7121
UID: 0
GID: 0
C time: Fri Aug 2 10:50:23 2002
M/Atimes: Fri Aug 2 10:52:29 2002
Blocks: 10
Gen: 1364
Block 1: 21010
Block 2: 21011

The file is thus made up of the contents of blocks 21010 and 21011. Block 21010 contains a full 4096 bytes of data, whereas block 21011 contains only 3025 bytes.
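A small Python sketch shows how the 8-byte indirect item and the file size from the stat item combine into per-block byte counts. The function name block_extents is hypothetical, and the sketch assumes a file without a tail, as in this example.

```python
import struct


def block_extents(item: bytes, file_size: int, block_size: int = 4096):
    """Return (block number, bytes of file data in that block) pairs."""
    pointers = struct.unpack(f"<{len(item) // 4}I", item)
    extents = []
    remaining = file_size
    for block_no in pointers:
        used = min(block_size, remaining)
        extents.append((block_no, used))
        remaining -= used
    return extents


# The indirect item found at byte 2972 (0xb9c) of the leaf node.
item = bytes.fromhex("12520000" "13520000")
extents = block_extents(item, 7121)
print(extents)  # [(21010, 4096), (21011, 3025)]
```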

Example 3: a large file

The file "/var/lib/rpm/fileindex.rpm" is a file of over 11 MB in size. A single indirect item cannot describe the file, as there isn't enough space in a block for such a large indirect item. The file has the key {4, 7, 0, 0}, which can be found in block 16822. This block, however, contains only the stat item for the file. The indirect items for the file span three more blocks: Key {4, 7, 1, 1} is in block 13286, key {4, 7, 4145153, 1} in block 20171, and key {4, 7, 8290305, 1} in block 20987. Block 13286 contains one single indirect item:
00000010                         04 00 00 00 07 00 00 00          ........
00000020 01 00 00 00 00 00 00 10 00 00 d0 0f 30 00 01 00  ..........Ð.0...
Key: {4, 7, 1, 1}
Count: 0
Length: 4048 bytes
Location: byte 48 (0x30)
Version: 1 (new)

What follows are 1012 pointers to unformatted blocks. Block 20171 has the same structure. Block 20987 also holds just one indirect item, but uses only 3320 bytes for 830 pointers. Note how the offset for the next key derives directly from the offset of the previous key and the number of pointers in the previous indirect item:

1 + (1012 pointers * 4096 bytes blocksize) = 4145153
4145153 + (1012 pointers * 4096 bytes blocksize) = 8290305
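The same chain of key offsets can be generated for any sequence of indirect items. In this sketch the function name indirect_key_offsets is made up; the pointer counts are those of the three indirect items in the example.

```python
def indirect_key_offsets(pointer_counts, block_size: int = 4096):
    """Key offset of each indirect item, given the pointer count of each."""
    offsets = []
    offset = 1  # the first byte of a file has offset 1
    for count in pointer_counts:
        offsets.append(offset)
        offset += count * block_size
    return offsets


counts = [1012, 1012, 830]
offsets = indirect_key_offsets(counts)
print(offsets)               # [1, 4145153, 8290305]
print(sum(counts) * 4096)    # 11689984 bytes covered by all pointers
```

The 2854 pointers cover at most 11689984 bytes, which agrees with a file size of a little over 11 MB.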

Last modified: Thu Jan 26 16:54:03 EST 2006
