Redis - redis persistence

Introduction to persistence

Redis is an in-memory database. If the in-memory state is not saved to disk, it is lost as soon as the server process exits. Redis therefore provides persistence.

What is persistence
Persistence is the mechanism of saving data to permanent storage media so that the saved data can be restored at a later time.

Why persistence
To prevent accidental data loss and keep data safe.

What persistence saves

- Snapshot: saves the current state of the data. The storage format is simple and the focus is on the data itself.
- Log: saves the operations performed on the data. The storage format is more complex and the focus is on the operation process.

Redis has two persistence methods
- RDB: stores the data (snapshot)
- AOF: stores the operations (log)

RDB

What is RDB

RDB writes a point-in-time snapshot of the in-memory data set to disk at configured intervals. On restore, Redis reads the snapshot file directly into memory.
Redis forks a child process to do the persistence. The child first writes the data to a temporary file. When it finishes, Redis replaces the previous RDB file with this temporary file. The main process performs no disk IO for the snapshot during this time, which keeps performance high. If you need to restore large data sets and can tolerate losing the most recent writes, RDB is more efficient than AOF. The disadvantage of RDB is that data written after the last snapshot may be lost.
In production, this file is often backed up.
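The "write to a temporary file, then replace the old file" step can be sketched in Python as follows. This is only an illustration of the pattern, not Redis's implementation: the fork is left out, JSON stands in for the binary RDB format, and the names save_snapshot and load_snapshot are made up for this example.

```python
import json
import os
import tempfile


def save_snapshot(data, target_path):
    """Write `data` to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temp file lives next to the target so the final rename stays on
    # one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)       # stand-in for the real binary format
            tmp.flush()
            os.fsync(tmp.fileno())     # make sure the bytes reach the disk
        # Until this call the previous snapshot is still complete and usable.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(target_path):
    """Read the snapshot back into memory, as done on restart."""
    with open(target_path) as f:
        return json.load(f)
```

Because the old file is only replaced after the new one is fully written, a crash in the middle of a save leaves the previous snapshot intact.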

RDB startup mode - save instruction

command
save
effect
Manually perform a save operation

127.0.0.1:6379> keys *
(empty array)
127.0.0.1:6379> set name 123
OK
127.0.0.1:6379> save
OK

An rdb file is generated in the data directory
[root@maomao data]# ls
6379.log  6380.log  dump.rdb

Delete the rdb file, then execute save once more
[root@maomao data]# rm -rf dump.rdb 
[root@maomao data]# ls
6379.log  6380.log

127.0.0.1:6379> set user maomao
OK
127.0.0.1:6379> save
OK

dump.rdb is generated again
[root@maomao data]# ls
6379.log  6380.log  dump.rdb

save instruction related configuration

dbfilename dump.rdb
Note: sets the local database file name. The default value is dump.rdb
Experience: usually set to dump-<port>.rdb, for example dump-6379.rdb
dir
Note: sets the directory where the rdb file is stored
Experience: usually a directory on a volume with plenty of space, typically named data
rdbcompression yes
Note: sets whether data is compressed when written to the local database file. The default is yes, using LZF compression
Experience: usually left on. Setting it to no saves CPU time but makes the stored file larger
rdbchecksum yes
Note: sets whether a checksum is written to and verified for the RDB file. Verification happens while writing and while reading the file
Experience: usually left on. Setting it to no saves about 10% of the time spent reading and writing, but corrupted data may go undetected

configuration file

# Include the port number in the rdb file name
dbfilename dump-6379.rdb

# Turn on compression
rdbcompression yes

# Turn on verification
rdbchecksum yes

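Restart the service. A small script makes startup easier
#!/bin/bash
# Ask for the port; each instance has its own redis-<port>.conf
read -r -p 'Please enter the you want to start redis Port:' port

# Start the server with that config, then connect a client to the same port
/usr/local/bin/redis-server "/usr/local/bin/redis_config/redis-$port.conf"
/usr/local/bin/redis-cli -p "$port"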

[root@maomao bin]# bash qidong.sh 
Please enter the you want to start redis Port: 6379
127.0.0.1:6379> 

127.0.0.1:6379> keys *
(empty array)
127.0.0.1:6379> set name zhu
OK
127.0.0.1:6379> set age 18
OK
127.0.0.1:6379> save
OK
127.0.0.1:6379> set gender nan
OK


[root@maomao data]# ls    # The rdb file now uses the new name
6379.log  6380.log  dump-6379.rdb

data recovery

Test whether the data can be recovered
127.0.0.1:6379> shutdown
not connected> exit
[root@maomao bin]# ps -ef |grep redis-
root       1667   1507  0 01:15 pts/0    00:00:00 grep --color=auto redis-

[root@maomao bin]# bash qidong.sh 
Please enter the you want to start redis Port: 6379
127.0.0.1:6379> keys *
1) "gender"
2) "name"
3) "age"

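The data is still there. Note that gender was set after the last save and was still recovered, because shutdown saved the data set before exiting.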

Trigger mechanism

When a save rule is satisfied, an rdb save is triggered automatically
Executing the flushall command also triggers an rdb save
Shutting down Redis also generates an rdb file

How the save instruction works

Note: the save instruction blocks the Redis server until the RDB process completes, which can mean a long pause. It is not recommended in production

RDB startup mode - bgsave instruction

command
bgsave
effect
Manually start a background save. The save is scheduled in the background rather than completed before the command returns

127.0.0.1:6379> set addr chengdu
OK
127.0.0.1:6379> bgsave
Background saving started

View the log
cat 6379.log
1671:M 18 Apr 2021 01:23:58.455 * Background saving started by pid 1678
1671:M 18 Apr 2021 01:23:58.482 * Background saving terminated with success

Working principle of bgsave instruction

Note: the bgsave command was designed to avoid the blocking caused by save. Redis uses the bgsave approach for all of its internal RDB operations, so the save command can generally be avoided.

bgsave instruction related configuration

# Include the port number in the rdb file name
dbfilename dump-6379.rdb

# Turn on compression
rdbcompression yes

# Turn on verification
rdbchecksum yes

# Stop accepting writes if a background save fails
stop-writes-on-bgsave-error yes

stop-writes-on-bgsave-error yes
Note: sets whether Redis stops accepting writes when an error occurs during a background save
Experience: usually left on, which is the default

RDB automatic execution

save configuration

configuration

save seconds changes
effect
Persist once the given number of seconds has passed and at least the given number of key changes occurred in that time
parameters
- seconds: the length of the monitoring window, in seconds
- changes: the number of key changes to watch for

# save 3600 1
# save 300 100
# save 60 10000
# After 60 seconds, if at least 5 key changes occurred, trigger an rdb save
save 60 5

127.0.0.1:6379> set name mao
OK
127.0.0.1:6379> set age 18
OK
127.0.0.1:6379> set gender nv
OK
127.0.0.1:6379> set addr chengdu
OK
127.0.0.1:6379> set subject python
OK

[root@maomao data]# ls		# rdb file generated
6379.log  6380.log  dump-6379.rdb

Once a rule's time window has passed with at least the configured number of key changes, Redis runs bgsave

save configuration principle

Only commands that modify data are counted. Read commands have no effect on the count
A key counts as changed whenever a write command is applied to it
No data comparison is performed. Setting the same value twice still increases the count by 2
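The counting and the rule check described above can be sketched in Python. This is a simplified model under stated assumptions, not Redis source code: should_snapshot and DirtyCounter are hypothetical names, and the time since the last save is passed in by the caller instead of being read from a clock.

```python
def should_snapshot(save_points, dirty, seconds_since_last_save):
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        dirty >= changes and seconds_since_last_save >= seconds
        for seconds, changes in save_points
    )


class DirtyCounter:
    """Counts write commands since the last snapshot (hypothetical helper)."""

    def __init__(self):
        self.store = {}
        self.dirty = 0

    def set(self, key, value):
        # No comparison with the old value: every write counts as one change.
        self.store[key] = value
        self.dirty += 1

    def get(self, key):
        # Reads never touch the counter.
        return self.store.get(key)
```

With the rule save 60 5, should_snapshot([(60, 5)], dirty=5, seconds_since_last_save=10) is False, and it becomes True once 60 seconds have passed.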

Notes:

Set the save configuration according to the actual workload. Saving too often hurts performance, and saving too rarely risks losing a lot of data
The seconds and changes values of the rules should complement each other: pair short windows with large change counts and long windows with small ones. Avoid rules where one makes another redundant
When a save rule fires, Redis performs a bgsave operation

Comparison of three RDB startup modes

The save configuration runs bgsave, so it shares the bgsave column.

mode	save instruction	bgsave instruction
Read/write	synchronous	asynchronous
Blocks client commands	yes	no
Additional memory consumption	no	yes
Starts a new process	no	yes

rdb special startup form

Full replication
- Happens during master-slave replication
Restart while the server is running
- debug reload
Save data when shutting down the server
- shutdown save

By default, the shutdown command automatically performs a save before exiting when save rules are configured

Advantages and disadvantages of RDB

RDB benefits

RDB is a compact, compressed binary file with high storage efficiency
An RDB file holds a snapshot of the Redis data at a point in time, which suits data backup, full replication and similar scenarios
RDB restores data much faster than AOF
Application: run bgsave on the server every X hours and copy the RDB file to a remote machine for disaster recovery

RDB disadvantages

Whether triggered by command or by configuration, RDB cannot persist in real time, so recent data can easily be lost
Every bgsave has to fork a child process, which costs some performance
The RDB file format has changed across Redis versions, so a file written by one version may not be readable by another

AOF

Disadvantages of RDB storage

Large amount of data stored, low efficiency
- Because it is snapshot based, every save reads and writes the whole data set. With a huge data set this is very slow
Low IO performance with large amounts of data
Creating a child process with fork causes additional memory consumption
Data written since the last snapshot is lost if the server goes down

Solution ideas

Do not write the whole data set; record only the part that changed
Instead of working out which data changed, record the operations themselves
Record every operation to greatly reduce the risk of data loss

What is AOF

AOF (append only file) persistence records every write command in a separate log and re-executes the commands in the AOF file on restart to restore the data. Compared with RDB, it records how the data was produced rather than the data itself
The main purpose of AOF is to make persistence closer to real time. It has become the mainstream way to persist Redis data

Every write operation is appended to the log (read operations are not recorded). Commands are only appended to the file; existing entries are not modified in place. At startup Redis reads the file and replays the write commands from first to last to rebuild the data

Three AOF write data strategies (appendfsync)

always
- Every write operation is synchronized to the AOF file. Data loss is minimal but performance is low. Generally not recommended
everysec (every second)
- The commands in the buffer are synchronized to the AOF file once per second. This gives good durability and high performance
- Up to about 1 second of data can be lost if the system goes down suddenly
no (system control)
- The operating system decides when to synchronize to the AOF file, so the timing is not under Redis's control
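To compare how often each strategy forces a sync, the decision can be modelled in Python. This is a deliberately simplified sketch: in Redis the everysec sync is driven by a background timer rather than by each write, and the function names here are invented for the example.

```python
def should_fsync(policy, now, last_fsync):
    """Decide whether to fsync after a write, for a given appendfsync value."""
    if policy == "always":
        return True
    if policy == "everysec":
        return now - last_fsync >= 1.0
    if policy == "no":
        return False          # leave flushing to the operating system
    raise ValueError(f"unknown appendfsync policy: {policy}")


def count_fsyncs(policy, write_times):
    """Replay write timestamps (in seconds) and count the resulting fsyncs."""
    last_fsync = 0.0
    fsyncs = 0
    for t in write_times:
        if should_fsync(policy, t, last_fsync):
            fsyncs += 1
            last_fsync = t
    return fsyncs
```

For six writes spread over about 2.5 seconds, always syncs six times, everysec only a couple of times, and no never syncs on its own, which is the performance versus safety trade-off listed above.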

AOF function on

configuration
appendonly yes|no
effect
Whether to enable AOF persistence. It is disabled by default

configuration
appendfsync always|everysec|no
effect
The AOF write data strategy

# Enable AOF
appendonly yes
# File name
appendfilename "appendonly-6379.aof"
# Test always first
appendfsync always

Test always

127.0.0.1:6379> keys *
(empty array)
127.0.0.1:6379> set name zhuer
OK
127.0.0.1:6379> lpush list a b c d e
(integer) 5

The aof file now exists
[root@maomao data]# ls
6379.log  6380.log  appendonly-6379.aof  dump-6379.rdb

View it
[root@maomao data]# cat appendonly-6379.aof 
*2
$6
SELECT
$1
0
*3
$3
set
$4
name
$5
zhuer
*7
$5
lpush
$4
list
$1
a
$1
b
$1
c
$1
d
$1
e
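The file content above is each command written in the Redis protocol format: *N gives the number of arguments, and each argument is preceded by $ and its length in bytes. A small Python sketch can encode and decode this layout. The helper names are made up for illustration, and only the array-of-bulk-strings form shown above is handled.

```python
def encode_command(*args):
    """Encode one command as it appears in the AOF (array of bulk strings)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def decode_commands(buf):
    """Parse a buffer of encoded commands back into lists of arguments."""
    commands, pos = [], 0
    while pos < len(buf):
        end = buf.index(b"\r\n", pos)
        if buf[pos:pos + 1] != b"*":
            raise ValueError("expected '*' at start of command")
        argc = int(buf[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            end = buf.index(b"\r\n", pos)
            if buf[pos:pos + 1] != b"$":
                raise ValueError("expected '$' before argument")
            length = int(buf[pos + 1:end])
            start = end + 2
            args.append(buf[start:start + length].decode())
            pos = start + length + 2   # skip the payload and its trailing CRLF
        commands.append(args)
    return commands
```

For example, encode_command("set", "name", "zhuer") yields the same *3, $3, set, $4, name, $5, zhuer sequence seen in the file, with each line terminated by CRLF.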

Test everysec

vim redis-6379.conf 

appendfsync everysec

[root@maomao bin]# bash qidong.sh 
Please enter the you want to start redis Port: 6379
127.0.0.1:6379> set abc abc
OK

View the end of the aof file
$3
abc
$3
abc

Persistence success

AOF rewrite

As commands keep being written to the AOF, the file grows larger and larger. To solve this, Redis provides an AOF rewrite mechanism that shrinks the file. An AOF rewrite converts the data currently held by the Redis process into write commands and writes them to a new AOF file. In short, the results of several commands on the same data are replaced by the commands that produce the final data.

Benefits of AOF rewriting

Reduces disk usage and improves disk utilization
Improves persistence efficiency, reduces persistence write time and improves IO performance
Reduces data recovery time and improves recovery efficiency

AOF rewrite rule

Data that has already expired is not written to the new file
Commands that no longer affect the result are dropped. The rewrite is generated directly from the in-memory data, so the new AOF file keeps only the write commands needed for the final data
- For example del key1, hdel key2, srem key3, set key4 111, set key4 222, etc
Multiple write commands for the same key are merged into one command
- For example, lpush list1 a, lpush list1 b and lpush list1 c can be merged into: lpush list1 a b c
- To prevent client buffer overflow with large values, each command writes at most 64 elements for list, set, hash, zset and similar types
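The rules above amount to "regenerate the commands from the final data instead of replaying history". A minimal Python sketch of that idea, covering only string and list values, might look like this. The function names are hypothetical, and the tiny apply interpreter exists only to build the in-memory state for the example.

```python
ITEMS_PER_CMD = 64  # batch limit mentioned in the rules above


def rewrite_from_state(state):
    """Generate a minimal command list that rebuilds `state`.

    `state` maps key -> str (string value) or list (list value, head first).
    """
    commands = []
    for key, value in state.items():
        if isinstance(value, list):
            # RPUSH in head-to-tail order recreates the list as stored.
            for i in range(0, len(value), ITEMS_PER_CMD):
                commands.append(["RPUSH", key, *value[i:i + ITEMS_PER_CMD]])
        else:
            commands.append(["SET", key, value])
    return commands


def apply(commands):
    """Tiny interpreter for SET/DEL/LPUSH/RPUSH, used to build the state."""
    state = {}
    for name, key, *args in commands:
        name = name.upper()
        if name == "SET":
            state[key] = args[0]
        elif name == "DEL":
            state.pop(key, None)
        elif name == "LPUSH":
            for item in args:
                state.setdefault(key, []).insert(0, item)
        elif name == "RPUSH":
            state.setdefault(key, []).extend(args)
    return state
```

Replaying three set name commands and three single-element lpush list commands and then rewriting leaves just two commands, one SET and one RPUSH, and applying those two commands rebuilds the same state.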

AOF rewrite mode

Manual rewrite
bgrewriteaof

Automatic rewrite
auto-aof-rewrite-min-size size
auto-aof-rewrite-percentage percentage

Modify the configuration file
appendfsync always
appendfilename "appendonly-6379.aof"

Delete previous data
[root@maomao data]# ls
6379.log  6380.log  appendonly-6379.aof  dump-6379.rdb
[root@maomao data]# rm -rf appendonly-6379.aof 
[root@maomao data]# rm -rf dump-6379.rdb

Start Redis
127.0.0.1:6379> set name a
OK
127.0.0.1:6379> set name b
OK
127.0.0.1:6379> set name c
OK
127.0.0.1:6379> get name
"c"

127.0.0.1:6379> lpush list a
(integer) 1
127.0.0.1:6379> lpush list b
(integer) 2
127.0.0.1:6379> lpush list c
(integer) 3

View the aof file
set
$4
name
$1
a
*3
$3
set
$4
name
$1
b
*3
$3
set
$4
name
$1
c
$5
lpush
$4
list
$1
a
*3
$5
lpush
$4
list
$1
b
*3
$5
lpush
$4
list
$1
c
*2
$3
del
$4
name

Run the rewrite
127.0.0.1:6379> bgrewriteaof

View the aof file again
*2
$6
SELECT
$1
0
*3
$3
SET
$4
name
$3
c
RPUSH
$5
list
$1
c
$1
b
$1
a

After the rewrite the file is smaller, and the commands that no longer matter are gone

AOF auto rewrite mode

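Automatic rewrite trigger settings: auto-aof-rewrite-min-size and auto-aof-rewrite-percentage
Values compared by the trigger (run info persistence to see them): aof_current_size and aof_base_size
Automatic rewrite trigger condition: both settings below must be satisfied

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

With these values, Redis forks a child process to rewrite the AOF once the file is larger than 64 MB and has grown by at least 100% since the last rewrite, that is, it has at least doubled in size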
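The trigger check can be written out as a short Python function. This is a sketch of the comparison as understood from the two settings, not code taken from Redis. The parameter names mirror aof_current_size and aof_base_size, and parse_size is a hypothetical helper that only understands the b, kb, mb and gb suffixes.

```python
def parse_size(text):
    """Convert a redis.conf style size such as '64mb' into bytes."""
    units = {"gb": 1024 ** 3, "mb": 1024 ** 2, "kb": 1024, "b": 1}
    text = text.strip().lower()
    for suffix, factor in units.items():   # longer suffixes are checked first
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def should_rewrite(current_size, base_size, min_size, percentage):
    """Return True when both automatic rewrite conditions hold."""
    if percentage == 0:            # a percentage of 0 disables auto rewrite
        return False
    if current_size <= min_size:   # file is still too small to bother
        return False
    base = base_size or 1          # avoid dividing by zero before any rewrite
    growth = (current_size - base) * 100 / base
    return growth >= percentage
```

With the settings above, a 100 MB file whose size after the last rewrite was 80 MB is not rewritten (25% growth), while a 160 MB file with the same base is (100% growth).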

AOF workflow

rewrite

How the AOF buffer is synchronized to the file is controlled by the appendfsync parameter

Description of the write and fsync system calls:
- write uses delayed writing. Linux keeps a page cache in the kernel to improve disk IO performance, and write returns as soon as the data is in that cache. The actual flush to disk depends on the system's scheduling, for example when the cache pages fill up or a time period elapses. If the system crashes before the flush, the data still in the cache is lost.
- fsync forces the flush to disk for a single file (such as the AOF file). It blocks until the data has been written to disk, which guarantees the data is persisted.
- Besides write and fsync, Linux also provides the sync and fdatasync operations
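The difference between the two calls can be seen from Python, whose os module exposes them directly. The following is a minimal sketch assuming a Unix-like system. append_durably is an invented name, and fdatasync is used only where the platform provides it.

```python
import os
import tempfile


def append_durably(path, payload, sync=True):
    """Append bytes to `path`; optionally force them to disk before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # write returns once the data is in the kernel page cache.
        written = os.write(fd, payload)
        if sync:
            # fdatasync may skip some metadata updates; fall back to fsync
            # on platforms that lack it.
            flush = getattr(os, "fdatasync", os.fsync)
            flush(fd)              # blocks until the disk has the data
        return written
    finally:
        os.close(fd)
```

Calling it with sync=False corresponds to relying on the operating system to flush later, which is faster but leaves a window in which a crash loses the data.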

Difference between RDB and AOF

Persistence mode	RDB	AOF
Storage space used	Small (data level: compressed)	Large (command level: reduced by rewrite)
Storage speed	Slow	Fast
Recovery speed	Fast	Slow
Data safety	Data can be lost	Depends on the strategy
Resource consumption	High / heavyweight	Low / lightweight
Startup priority	Low	High

Selection of RDB and AOF

If you are very sensitive to data loss, use AOF with its default everysec strategy
- With everysec, AOF calls fsync once per second. Redis still performs well with this strategy, and at most about 1 second of data is lost when a problem occurs.
- Note: AOF files are large and recovery is slow
If the data only needs to be valid as of certain stages, use RDB
- The data as of each stage is preserved without loss (the stage points are maintained manually by developers or operations staff), and recovery is fast. RDB is the usual choice for restoring to a stage point
- Note: taking RDB snapshots very frequently significantly lowers Redis performance
Comprehensive comparison
- Choosing between RDB and AOF is a trade-off; each has advantages and disadvantages
- If you cannot tolerate losing a few minutes of data and the business data is very sensitive, choose AOF
- If you can tolerate losing a few minutes of data and want fast recovery of large data sets, choose RDB
- Use RDB for disaster recovery
- For double insurance, enable RDB and AOF together. After a restart, Redis prefers the AOF for recovery, which reduces the amount of lost data

Keywords: Database Big Data Redis

Added by Sanjib Sinha on Fri, 04 Mar 2022 08:54:45 +0200
