Sharing NutsDB tests with 100 million and 1 billion records

Hello, I'd like to share my recent data tests of nutsdb.

Project under test

github address: https://github.com/xujiajun/nutsdb

Background

It all started with an issue reported against the project. In short: memory usage was too high for it to be usable.

Maybe many people don't know NutsDB. In a nutshell, NutsDB is an embedded KV database written in Go that I open-sourced a few months ago, and it supports a variety of data structures. The feedback after open-sourcing was encouraging: it made the GitHub Go trending list and picked up 500+ stars within a week, which drew the attention of many peers who offered suggestions. Several of them use it in production environments.

Verifying with 100 million records

Back to the point: to verify this issue, I first tested with 100 million records.

Version: nutsdb v0.4.0
Server configuration: Ubuntu 16.04, 64-bit, 8 cores, 64 GB RAM
Data volume: about 11 GB (the current version does not compress data)
To speed up the test, real-time sync was disabled. Write speed: 25.7w/s (about 257,000 entries/s)

The keys and values look like this:

key := []byte("namename" + strconv.Itoa(i))
val := []byte("valvalvavalvalvalvavalvalvalvavalvalvalvaval" + strconv.Itoa(i))
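
For context, here is a minimal sketch of what such a write benchmark might look like against nutsdb. The data directory, bucket name, and batch size are illustrative assumptions (the post does not state them), and the Options fields may differ slightly between versions:

package main

import (
	"fmt"
	"log"
	"strconv"
	"time"

	"github.com/xujiajun/nutsdb"
)

func main() {
	opt := nutsdb.DefaultOptions
	opt.Dir = "/tmp/nutsdb_bench" // illustrative data directory
	opt.SyncEnable = false        // disable real-time sync to speed up the test

	db, err := nutsdb.Open(opt)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const total = 100000000 // 100 million records
	const batch = 10000     // puts per transaction; the real batch size is not stated
	bucket := "bench"       // illustrative bucket name

	start := time.Now()
	for i := 0; i < total; i += batch {
		base := i
		if err := db.Update(func(tx *nutsdb.Tx) error {
			for j := base; j < base+batch && j < total; j++ {
				key := []byte("namename" + strconv.Itoa(j))
				val := []byte("valvalvavalvalvalvavalvalvalvavalvalvalvaval" + strconv.Itoa(j))
				if err := tx.Put(bucket, key, val, 0); err != nil { // ttl 0: never expires
					return err
				}
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("batch put data cost: ", time.Since(start))
}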

Test results:

Mem : 64430 MB , Free: 63776 MB , Used:176 MB , Usage:0.273957%
start db index cost time:  72.076µs
batch put data cost:  6m29.067011134s
Mem : 64430 MB , Free: 24760 MB , Used:39147 MB , Usage:60.759105%

It turned out that memory consumption was about 3.46 times the data volume (39147 − 176 ≈ 38971 MB of RAM for roughly 11 GB of data, and 38971 / 11264 ≈ 3.46). To be honest, that was several times less than what the issue reporter described, but I still couldn't accept it. What to do?

Solution

So we developed a new mode, EntryIdxMode: HintBPTSparseIdxMode, which is designed to save memory. The principle behind it is a multi-level B+ tree index.

The master branch already supports it, and anyone interested is welcome to try it.
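
Relative to the write sketch above, enabling the new mode should only require changing the options, roughly like this (the field and constant names are taken from the repository; check the README for the exact API of your version):

opt := nutsdb.DefaultOptions
opt.Dir = "/tmp/nutsdb_bench" // illustrative data directory
opt.SyncEnable = false        // as before, no real-time sync
// Keep only a sparse multi-level B+ tree index in memory instead of a
// full per-key index, trading some speed for a much smaller footprint.
opt.EntryIdxMode = nutsdb.HintBPTSparseIdxMode

db, err := nutsdb.Open(opt)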

Next, let's test 1 billion records on a single machine.

Testing 1 billion records with the new mode

Version: nutsdb master branch
Host configuration: Ubuntu 16.04, 64-bit, 2 cores, 2 GB RAM
Keys and values: same as above
To speed up the test, real-time sync was disabled

Test results:

Mem : 1999 MB , Free: 1786 MB , Used:53 MB , Usage:2.688618%
Mem : 1999 MB , Free: 1695 MB , Used:135 MB , Usage:6.784733%

Memory growth was only 82 MB (135 − 53) for the full 1 billion inserts, but the write speed dropped to 435w/s. The index and data files generated totaled 153 GB.

Now look at read performance. Reading 10 records, with no cache, gives the following results:

load cost: 2.607796193s
key , find val namename0 valvalvavalvalvalvavalvalvalvavalvalvalvaval0
key , find val namename1 valvalvavalvalvalvavalvalvalvavalvalvalvaval1
key , find val namename2 valvalvavalvalvalvavalvalvalvavalvalvalvaval2
key , find val namename3 valvalvavalvalvalvavalvalvalvavalvalvalvaval3
key , find val namename4 valvalvavalvalvalvavalvalvalvavalvalvalvaval4
key , find val namename5 valvalvavalvalvalvavalvalvalvavalvalvalvaval5
key , find val namename6 valvalvavalvalvalvavalvalvalvavalvalvalvaval6
key , find val namename7 valvalvavalvalvalvavalvalvalvavalvalvalvaval7
key , find val namename8 valvalvavalvalvalvavalvalvalvavalvalvalvaval8
key , find val namename9 valvalvavalvalvalvavalvalvalvavalvalvalvaval9
read cost 87.208728ms
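
For reference, the read side of the test might look roughly like this, continuing the sketch above (bucket and key names are again illustrative):

// Read the first ten keys back inside a read-only transaction.
start := time.Now()
if err := db.View(func(tx *nutsdb.Tx) error {
	for i := 0; i < 10; i++ {
		key := []byte("namename" + strconv.Itoa(i))
		entry, err := tx.Get(bucket, key)
		if err != nil {
			return err
		}
		fmt.Println("key , find val", string(key), string(entry.Value))
	}
	return nil
}); err != nil {
	log.Fatal(err)
}
fmt.Println("read cost", time.Since(start))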

That's all I wanted to share. Comments and feedback are welcome.

Finally, you are welcome to open issues on nutsdb, star the project, and submit PRs. Thank you!

