LSM-Tree(14)

3. Cost-Performance and the Multi-Component LSM-Tree(2)

In Section 3.2 below, we compare I/O insert costs and demonstrate that the small ratio of the cost of a two-component LSM-tree to that of a B-tree is the product of two factors.
The first factor, COSTπ/COSTP, corresponds to the advantage gained in the LSM-tree by performing all I/O in multi-page blocks, thus utilizing disk arms much more efficiently by saving a great deal of seek and rotational latency time.
The COSTπ term represents the disk arm cost of reading or writing a page on disk as part of a multi-page block, and COSTP represents the cost of reading or writing a page at random.
The second factor that determines the I/O cost ratio between the LSM-tree and the B-tree is given as 1/M, representing the batching efficiency to be gained during a merge step.
M is the average number of entries merged from C0 into a page-sized leaf node of C1.
Inserting multiple entries per leaf is an advantage over a (large) B-tree where each entry inserted normally requires two I/Os to read and write the leaf node on which it resides.
Because of the Five Minute Rule, it is unlikely in Example 1.2 that a leaf page read in from a B-tree will be re-referenced for a second insert during the short time it remains in the buffer.
Thus there is no batching effect in a B-tree index: each leaf node is read in, an insert of a new entry is performed, and it is written out again.
In an LSM-tree, however, there will be an important batching effect as long as the C0 component is sufficiently large in comparison to the C1 component.
For example, with 16 byte index entries, we can expect 250 entries in a fully packed 4 KByte node.
If the C0 component is 1/25 the size of the C1 component, we will expect (about) 10 new entries entering each new C1 node of 250 entries during a node I/O.
It is clear that the LSM-tree has an efficiency advantage over the B-tree because of these two factors, and the "rolling merge" process is fundamental to gaining this advantage.
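
The arithmetic in the example above can be checked with a short sketch. The function names and the value chosen for COSTπ/COSTP are assumptions made purely for illustration; the excerpt gives no numeric value for that ratio, and any constant factor that Section 3.2 may introduce is left out here.

```python
def entries_merged_per_leaf(entries_per_node: int, c1_to_c0_size_ratio: int) -> float:
    """Average number of C0 entries merged into one C1 leaf node (M).

    If C1 is `c1_to_c0_size_ratio` times larger than C0, a full C1 node
    receives about entries_per_node / c1_to_c0_size_ratio new entries
    each time it is rewritten by the merge.
    """
    return entries_per_node / c1_to_c0_size_ratio


def relative_insert_cost(multipage_to_random_cost: float, m: float) -> float:
    """Product of the two factors from the text: (COSTπ/COSTP) * (1/M)."""
    return multipage_to_random_cost / m


# Values taken from the text: 250 entries per 4 KByte node, C0 = 1/25 of C1.
M = entries_merged_per_leaf(entries_per_node=250, c1_to_c0_size_ratio=25)

# Hypothetical value, NOT from the text: assume a page moved as part of a
# multi-page block costs one tenth of a random page I/O.
ASSUMED_COST_RATIO = 0.1

print("M =", M)
print("LSM-tree / B-tree insert cost factor =", relative_insert_cost(ASSUMED_COST_RATIO, M))
```

With the figures from the text, M comes out to 10, so even before the multi-page block advantage is counted, each C1 node I/O is shared by about ten inserts rather than serving a single one as in the B-tree.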
