[RocksDB profiling series] Iterator

reference resources:

Introduction

After being highly encapsulated, the iterator of RocksDB can be used like the iterator of the iterator constructed by C++ stl library for each container. It can locate a key and start scanning from this key. It can also be used for reverse scanning.

If when creating an iterator, readoptions When a snapshot is given, the iterator will return the data of the snapshot. If nullptr is passed in, the hidden view will be created, and the hidden view cannot be converted to explicit.

User's use of Iterator

  • NewIterator creates an iterator and needs to pass in ReadOptions
  • Seek find a key
  • SeekToFirst iterator moves to the first key position of db, which is generally used to traverse all keys of the whole db in sequence
  • SeekToLast iterator moves to the last key position of db, which is generally used to traverse all keys of the whole db in reverse
  • SeekForPrev moves to the previous position of the current key. It is generally used to traverse the keys between limit, start]
  • The Next iterator moves to the Next key
  • Move Prev iterator to previous key

Iterator::status() can return the error of the current iterator. Error includes IO error, checksum mismatch, internal error, etc

If there are no errors, the status is Status::OK(). If Iterator::Valid() is true, the status must be OK.
If Iterator::Valid() is false, it may be because the iterator has reached the end of the data. In this case, the status is OK; There may also be an error. In this case, the status is not OK.

5.13. In X or earlier versions, the two API s will be different, so you need to consult the documentation again.

Each Iterator

DBIter

Implementation: db/db_iter.cc

Interface: Iterator

DBIter is the encapsulation of InternalIterator. Its main work is to parse the Internal key into user key and show it

Example

In the underlying InternalIterator

InternalKey(user_key="Key1", seqno=10, Type=Put)    | Value = "KEY1_VAL2"
InternalKey(user_key="Key1", seqno=9,  Type=Put)    | Value = "KEY1_VAL1"
InternalKey(user_key="Key2", seqno=16, Type=Put)    | Value = "KEY2_VAL2"
InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"
InternalKey(user_key="Key3", seqno=7,  Type=Delete) | Value = "KEY3_VAL1"
InternalKey(user_key="Key4", seqno=5,  Type=Put)    | Value = "KEY4_VAL1"

DBIter parses the Internal Key into the User Key and exposes the interface to the user

Key="Key1"  | Value = "KEY1_VAL2"
Key="Key2"  | Value = "KEY2_VAL2"
Key="Key4"  | Value = "KEY4_VAL1"

MergingIterator

Implementation: table/merging_iterator.cc

Interface: InternalIterator

Mergeiterator consists of many children iterators

autovector<IteratorWrapper, kNumIterReserve> children_;

The merging iterator will put these children iterators into the heap and display them to the upper layer in order.

Example

The children iterator is:

= Child Iterator 1 =
InternalKey(user_key="Key1", seqno=10, Type=Put)    | Value = "KEY1_VAL2"

= Child Iterator 2 =
InternalKey(user_key="Key1", seqno=9,  Type=Put)    | Value = "KEY1_VAL1"
InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"
InternalKey(user_key="Key4", seqno=5,  Type=Put)    | Value = "KEY4_VAL1"

= Child Iterator 3 =
InternalKey(user_key="Key2", seqno=16, Type=Put)    | Value = "KEY2_VAL2"
InternalKey(user_key="Key3", seqno=7,  Type=Delete) | Value = "KEY3_VAL1"

The mergeiterator puts all children iterator s into the heap to make them orderly

InternalKey(user_key="Key1", seqno=10, Type=Put)    | Value = "KEY1_VAL2"
InternalKey(user_key="Key1", seqno=9,  Type=Put)    | Value = "KEY1_VAL1"
InternalKey(user_key="Key2", seqno=16, Type=Put)    | Value = "KEY2_VAL2"
InternalKey(user_key="Key2", seqno=15, Type=Delete) | Value = "KEY2_VAL1"
InternalKey(user_key="Key3", seqno=7,  Type=Delete) | Value = "KEY3_VAL1"
InternalKey(user_key="Key4", seqno=5,  Type=Put)    | Value = "KEY4_VAL1"

MemtableIterator

Implementation: db/memtable.cc

Interface: InternalIterator

As an iterator for memtables, each memtable implements its own iterator to expose orderly and iterative kv pairs to the outside world

BlockIter

Implementation: table/block_based/block.h

Interface: InternalIterator

This iterator is used to read blocks from SST files, whether they are index block s, data block s or meta block s (DataBlockIter, IndexBlockIter and MetaBlockIter inherit BlockIter). Since the SST File block is ordered and immutable, we load the block into memory and create a BlockIter for the sorted data.

TwoLevelIterator

Implementation: table/two_level_iterator.cc

Interface: InternalIterator

A TwoLevelIterator consists of two iterators

  • first_level_iter_
  • second_level_iter_

first_level_iter_ Used to find which second to use_ level_ iter_ , Then second_level_iter_ Point to the data you actually need to read

Example

RocksDB uses TwoLevelIterator to read SST files, first_level_iter_ BlockIter, second for Index block_ level_ iter_ Is BlockIter for Data block

[Data block, offset: 0x0000]
KEY1  | VALUE1
KEY2  | VALUE2
KEY3  | VALUE3

[Data Block, offset: 0x0100]
KEY4  | VALUE4
KEY7  | VALUE7

[Data Block, offset: 0x0250]
KEY8  | VALUE8
KEY9  | VALUE9

[Data Block, offset: 0x0350]
KEY11 | VALUE11
KEY15 | VALUE15

[Index Block, offset: 0x0500]
KEY3  | 0x0000
KEY7  | 0x0100
KEY9  | 0x0250
KEY15 | 0x0500

first_level_iter_ => BlockIter over Index block
second_ level_ iter_ => Blockiter over data block, will be first_level_iter_ Decide which to point to
When using TwoLevelIterator to seek KEY8, first is used_ level_ iter_ Seek to the corresponding data block (0x0250), and then use second_level_iter_ Come to seek the specific key.

When reading the Data Block, if the Block is in the Block Cache, the object address is returned directly. Otherwise, disk IO occurs. Read the Block of SST, construct the Block object and cache its address in the Block Cache.

Keywords: RocksDB

Added by Ashley Sheridan on Thu, 27 Jan 2022 00:47:50 +0200