You can also use my independent blog—— www.huliujia.com Get this article
brief introduction
The official note of LevelDB describes WriteBatch as follows:
WriteBatch holds a collection of updates to apply atomically to a DB
How to ensure atomicity may need to be clarified after reading the use of WriteBatch. Here, only one WriteBatch object can contain multiple update records (insert / delete) and support batch writing.
Many operations of WriteBatch are implemented through auxiliary classes, which directly operate the member variables of WriteBatch. This article will first introduce the member variables of WriteBatch and these auxiliary classes, and finally introduce the member functions of WriteBatch
Member variable of WriteBatch
WriteBatch has only one member and is a string type. All update records will be encoded and written to this rep_ Yes.
std::string rep_;
rep_ The coding format is as follows:
Number of bytes | 8 bytes | 4 bytes | Lengthen | Lengthen | Lengthen | Lengthen |
---|---|---|---|---|---|---|
content | sequence | count | record 1 | reocrd 2 | record 3 | record ... |
sequence is a 64bit serial number, and each WriteBatch has a unique serial number.
count is rep_ Number of record s contained.
count is followed by the record list. The encoding format of each record is the same as follows:
Number of bytes | 1 byte | Lengthen | key size | Lengthen | value size |
---|---|---|---|---|---|
content | ValueType | key size | key | value size | value |
key size and value size use Variable length encoding format of LevelDB
WriteBatchInternal helper class
WriteBatchInternal has no member variables, only member functions.
SetCount and Count
void WriteBatchInternal::SetCount(WriteBatch* b, int n) { EncodeFixed32(&b->rep_[8], n); } int WriteBatchInternal::Count(const WriteBatch* b) { return DecodeFixed32(b->rep_.data() + 8); }
SetCount calls EncodeFixed32 to write the input parameter n to the rep of WriteBatch in small end encoding_ count field for.
Count calls DecodeFixed32 to replace rep_ The count field of is returned after decoding.
SetSequence and Sequence
void WriteBatchInternal::SetSequence(WriteBatch* b, SequenceNumber seq) { EncodeFixed64(&b->rep_[0], seq); } SequenceNumber WriteBatchInternal::Sequence(const WriteBatch* b) { return SequenceNumber(DecodeFixed64(b->rep_.data())); }
SetSequence calls EncodeFixed64 to write the serial number seq to rep in small end encoding_ The sequence field of the. The SequenceNumber here is actually std::uint64_t type.
The sequence function calls the DecodeFixed64 rep_ The sequence field of is returned after decoding.
SetContents and ByteSize
void WriteBatchInternal::SetContents(WriteBatch* b, const Slice& contents) { assert(contents.size() >= kHeader); b->rep_.assign(contents.data(), contents.size()); } static Slice WriteBatchInternal::Contents(const WriteBatch* batch){ return Slice(batch->rep_); } static size_t WriteBatchInternal::ByteSize(const WriteBatch* batch){ return batch->rep_.size(); }
SetContent overwrites rep_ with conents. The value of kHeader is 12, which is the total length of sequence field and count field, which are required.
Contents rep_ Wrapped as a slice object. Slice mentioned in the previous article that it can be treated as a string.
ByteSize returns rep_ size of, i.e. rep_ Number of bytes.
Append
void WriteBatchInternal::Append(WriteBatch* dst, const WriteBatch* src) { SetCount(dst, Count(dst) + Count(src)); assert(src->rep_.size() >= kHeader); dst->rep_.append(src->rep_.data() + kHeader, src->rep_.size() - kHeader); }
Append is an important function with two parameters, dst and src. Append appends the src data into the dst.
After merging, the sequence field of dst remains unchanged, and the count field value of dst is equal to the sum of dst and src count. The record list of src is appended and merged after the reocrd list of dst.
InsertInto
Status WriteBatchInternal::InsertInto(const WriteBatch* b, MemTable* memtable) { MemTableInserter inserter; inserter.sequence_ = WriteBatchInternal::Sequence(b); inserter.mem_ = memtable; return b->Iterate(&inserter); }
Construct MemTableInserter object and fill in sequence and memtable object pointer. Then call the Iterate of WriteBatch to insert the update record of the current batch into memtable.
MemTableInserter helper class
The MemTableInserter auxiliary class is used to insert records into MemTable. MemTableInserter inherits Handler, which is an abstract class with only Put and Delete pure virtual functions.
The Handler uses the template method design pattern, which can easily make different versions of Handler, such as MemTableInserterPro.
namespace { class MemTableInserter : public WriteBatch::Handler { public: SequenceNumber sequence_; MemTable* mem_; void Put(const Slice& key, const Slice& value) override { mem_->Add(sequence_, kTypeValue, key, value); sequence_++; } void Delete(const Slice& key) override { mem_->Add(sequence_, kTypeDeletion, key, Slice()); sequence_++; } }; }
When I saw the code, I was very confused. Why write a namespace without a name? I checked it specifically. This is called anonymous namespace, which allows functions, classes and objects inside the namespace to be accessed only in the current file, which is similar to the static scope of C language.
Member variable
MemTableInserter contains two member variables, sequence_ And mem_. sequence_ Is the serial number to be filled in when inserting Memtable. mem_ Is a pointer to the Memtable object.
Put
Call the Add function of Memtable to Add a record. The record type is kTypeValue, indicating that this is an insert record.
Delete
Call the Add function of Memtable to Add a record. The record type is kTypeDeletion, indicating that this is a deleted record.
WriteBatch member function
Put
void WriteBatch::Put(const Slice& key, const Slice& value) { WriteBatchInternal::SetCount(this, WriteBatchInternal::Count(this) + 1); rep_.push_back(static_cast<char>(kTypeValue)); PutLengthPrefixedSlice(&rep_, key); PutLengthPrefixedSlice(&rep_, value); }
The first is to put rep_ Add 1 to the count field of. Then encode ValueType, key and value into a record and write it to rep_ Behind the.
Delete
void WriteBatch::Delete(const Slice& key) { WriteBatchInternal::SetCount(this, WriteBatchInternal::Count(this) + 1); rep_.push_back(static_cast<char>(kTypeDeletion)); PutLengthPrefixedSlice(&rep_, key); }
Deleting elements in LevelDB is achieved by adding a delete record. Therefore, there are two differences between deleting or adding a record and Put. One is that the ValueType of record is ktypedeletion, and the other is that the deleted record has only key but no value.
Append
void WriteBatch::Append(const WriteBatch& source) { WriteBatchInternal::Append(this, &source); }
Call the Append of WriteBatchInternal to merge the source into the current WriteBatch object.
Iterate
Status WriteBatch::Iterate(Handler* handler) const { Slice input(rep_); if( input.size() < kHeader ) { return Status::Corruption("malformed WriteBatch (too small)"); } input.remove_prefix(kHeader); Slice key, value; int found = 0; while( !input.empty()) { found++; char tag = input[0]; input.remove_prefix(1); switch( tag ) { case kTypeValue: if( GetLengthPrefixedSlice(&input, &key) && GetLengthPrefixedSlice(&input, &value)) { handler->Put(key, value); }else { return Status::Corruption("bad WriteBatch Put"); } break; case kTypeDeletion: if( GetLengthPrefixedSlice(&input, &key)) { handler->Delete(key); }else { return Status::Corruption("bad WriteBatch Delete"); } break; default: return Status::Corruption("unknown WriteBatch tag"); } } if( found != WriteBatchInternal::Count(this)) { return Status::Corruption("WriteBatch has wrong count"); }else { return Status::OK(); } }
The above code is quite a lot, but the logic is very simple, that is, traversal. The rep of WriteBatch_ Decoding: for each record, call the Put of the handler to add an insert record or call Delete to add or Delete a record according to the ValueType field.
Some checks will be performed during and after the traversal. If there is a problem with the data, it will return to a "Corruption" state. If the data is OK, an "OK" status is returned.
ApproximateSize and Clear
Approximate size returns rep_ Clear, clear rep_ And set the sequence field and count field to 0.
Why separate a WriteBatchInternal
From the perspective of code function, the author puts the rep of WriteBatch_ The decoding and encoding of are stored in WriteBatchInternal, and WriteBatch only has an encapsulated interface for external use. Make the code structure clearer.
With one exception, the Iterate member function of WriteBatch has a pair of rep_ For decoding. According to the above idea, Iterate should be implemented by WriteBatchInternal, and WriteBatch can be called again. It may be that the author still has some deviation in his understanding here. Welcome to post your ideas in the comment area.