LSM-Tree(54)

4.2. Recovery in the LSM-tree(4)

This recovery approach clearly works, and its only drawback is that there is a possibly large pause while various disk writes take place during the checkpoint process. This pause is not terribly significant, however, since we can write the C0 component to disk in a short period and then resume inserts to the C0 component while the rest of the writes to disk complete; this will simply result in a longer than usual latency period during which index entries newly inserted to C0 are not merged out to larger disk-based components. Once the checkpoint is complete, the rolling merge process can catch up on work it has missed. Note that the last piece of information mentioned in the checkpoint log list above was the current information for dynamic allocation of new multi-page blocks. In the case of a crash, we will need to figure out in recovery what multi-page blocks are available in our dynamic disk storage allocation algorithm. This is clearly not a difficult problem; in fact, a more difficult problem of garbage collecting fragmented information within such a block had to be solved in [23].
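To make the last point concrete, here is a minimal sketch of how recovery might rebuild the list of available multi-page blocks. It assumes the checkpoint record stores the pool size and the set of block numbers in use at checkpoint time; the paper does not specify the record layout, so `CheckpointRecord`, its fields, and `recover_free_blocks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CheckpointRecord:
    """Hypothetical checkpoint log record; the field names are assumptions."""
    total_blocks: int                                 # size of the multi-page block pool
    allocated: Set[int] = field(default_factory=set)  # blocks in use at checkpoint time


def recover_free_blocks(record: CheckpointRecord) -> List[int]:
    """Return, in ascending order, the blocks the allocator may hand out after a crash."""
    return [b for b in range(record.total_blocks) if b not in record.allocated]


record = CheckpointRecord(total_blocks=6, allocated={0, 2, 5})
free_blocks = recover_free_blocks(record)
print(free_blocks)
```

Under this assumption the free list is simply the complement of the allocated set, which is why the paper can call the problem easy compared with garbage collecting fragments inside a block.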

Another detail of recovery has to do with directory information. Note that as the rolling merge progresses, each time a multi-page block or a higher level directory node is brought in from disk to be emptied it must immediately be assigned a new disk position in case a checkpoint occurs before the emptying is completed and remaining buffered information must be forced out to disk. This means that the directory entries pointing down to the emptying nodes must be immediately corrected to point to the new node locations. Similarly, we must immediately assign a disk position for newly created nodes so that directory entries in the tree will be able to point immediately to the appropriate position on disk. At every point we need to take care that directory nodes containing pointers to lower-level nodes buffered by a rolling merge are also buffered; only in this way can we make all necessary modifications quickly so that a checkpoint will not be held up waiting for I/Os to correct directories. Furthermore, after a checkpoint occurs and the multi-page blocks are read back into memory buffers to continue the rolling merge, all the blocks involved must be assigned to a new disk position, and thus all directory pointers to subsidiary nodes must be corrected. If this sounds like a great deal of work, the reader should recall that there is no additional I/O necessary and the number of pointers involved is probably only about 64 for each block buffered. Furthermore, these changes should be amortized over a large number of merged nodes, assuming that the checkpoints are only taken frequently enough to keep recovery time from growing beyond a few minutes; this implies a few minutes of I/O between checkpoints.
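The pointer correction described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a directory node is modeled as an in-memory map from a child key to the child's disk position, and `DirectoryNode`, `BlockAllocator`, and `buffer_for_merge` are hypothetical names, not anything defined in the paper.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DirectoryNode:
    """Buffered directory node: maps a child key to that child's disk position."""
    children: Dict[str, int] = field(default_factory=dict)


class BlockAllocator:
    """Hands out fresh disk positions; a stand-in for the real allocation scheme."""

    def __init__(self, first_free: int) -> None:
        self._next = itertools.count(first_free)

    def allocate(self) -> int:
        return next(self._next)


def buffer_for_merge(parent: DirectoryNode, child_key: str,
                     allocator: BlockAllocator) -> int:
    """Relocate a child node as soon as it is brought in to be emptied.

    The child is given a new disk position immediately and the buffered
    parent entry is corrected in memory, so a checkpoint that arrives
    mid-merge needs no extra I/O to fix directories.
    """
    new_position = allocator.allocate()
    parent.children[child_key] = new_position
    return new_position


parent = DirectoryNode({"a": 10, "b": 11})
allocator = BlockAllocator(first_free=100)
position = buffer_for_merge(parent, "a", allocator)
print(position, parent.children)
```

Because the parent is already buffered, the correction is a pure in-memory update; with roughly 64 pointers per buffered block, the cost stays small relative to the merge I/O it accompanies.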

TODO: translate this section myself and read it closely.
