Case 12. Multithreaded database backup in shell

The shell script in this case is multithreaded, which can be hard to follow because the multithreading is implemented with named pipes. Multithreading here means that work originally done by one process is split across several concurrent workers: if one process needs 10 hours to finish a job, splitting the job across 10 workers that run at the same time can bring it down to roughly 1 hour.

The specific requirements of this case are as follows:

1) The company's business volume is large: 100 databases need a full backup, and each database holds tens of GB of data. (Note that each database is an independent instance, i.e., it has its own IP:port.)

2) The estimated backup time for each database is about 30 minutes.

3) The backup must finish within 5 hours.

Tip: to back up 100 databases in 5 hours, we need the multithreading capability of shell scripts: open 10 threads at a time and back up 10 databases concurrently.
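As a quick sanity check on these numbers (a sketch using shell arithmetic; the figures come straight from the requirements above):

```shell
# 100 databases x 30 minutes each, 10 running at a time
total_min=$(( 100 * 30 / 10 ))     # wall-clock minutes with 10 concurrent backups
echo "$(( total_min / 60 )) hours" # 300 minutes = 5 hours
```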

Knowledge Point 1: Backup MySQL database with xtrabackup

mysqldump is fine for exporting databases or tables of a few GB. Once the data volume reaches tens or hundreds of GB, mysqldump no longer works well, both because of the load it puts on the source library and because of its export performance. The Percona XtraBackup tool is the best choice for MySQL online hot backup. It supports full, incremental, and single-table backup and restore.

The xtrabackup command supports non-blocking backup only for the InnoDB and XtraDB storage engines, while innobackupex wraps xtrabackup in a layer of Perl and backs up MyISAM tables by taking a table read lock.

Install Percona XtraBackup like this on CentOS 7:

# rpm -ivh                               # the yum repo rpm URL is omitted in the source
# yum install -y percona-xtrabackup-24   # install version 2.4

The command to do full backup with xtrabackup is:

# innobackupex --defaults-file=/etc/my.cnf --host= --port=3333 --user=bakuser --password=your_pass /data/backup/mysql

Note: before performing this backup, you need to create a user bakuser (the user name is up to you) and grant it the reload, lock tables, replication client, process, super, and other privileges. Backup data is placed in the /data/backup/mysql directory, and a subdirectory named after the current date and time is generated automatically, such as 2019-08-10_09_56_12.
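The setup for that user might look like the following sketch, run in the mysql client (the host part 'localhost' is an assumption; use the host innobackupex connects from, and the privileges are the ones listed above):

```sql
-- Hypothetical setup for the backup user described above
CREATE USER 'bakuser'@'localhost' IDENTIFIED BY 'your_pass';
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT, PROCESS, SUPER
  ON *.* TO 'bakuser'@'localhost';
FLUSH PRIVILEGES;
```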

Knowledge Point 2: File Descriptors

A file descriptor (abbreviated fd) is formally a non-negative integer. In practice it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Every Unix process starts with three standard file descriptors corresponding to three different streams:

File descriptor   Name
0                 standard input (stdin)
1                 standard output (stdout)
2                 standard error (stderr)

Besides the three standard descriptors, a process can also use other numbers as custom file descriptors. Each file descriptor corresponds to one open file; at the same time, different file descriptors can refer to the same open file, and the same file can be opened by different processes, or multiple times by the same process.
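A minimal sketch of such a custom descriptor (fd 3 and the file path are arbitrary choices for illustration):

```shell
exec 3> /tmp/fd_demo.txt      # open fd 3 for writing on a file
echo "written via fd 3" >&3   # send echo's output through fd 3
exec 3>&-                     # close fd 3
cat /tmp/fd_demo.txt
```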

Write a test script under /tmp/ with the following content:

#!/bin/bash
echo "This process's PID is $$"
exec 1>/tmp/test.log 2>&1
ls -l /proc/$$/fd/

Execute the script, then view /tmp/test.log:

# cat test.log
total 0
lrwx------. 1 root root 64 Aug 10 10:24 0 -> /dev/pts/1
l-wx------. 1 root root 64 Aug 10 10:24 1 -> /tmp/test.log
l-wx------. 1 root root 64 Aug 10 10:24 2 -> /tmp/test.log
lr-x------. 1 root root 64 Aug 10 10:24 255 -> /tmp/

Note: exec redirects both the standard output and standard error of the script's subsequent commands to /tmp/test.log, which is why the file shows the listing above. For an intuitive look at the exec command, consider this example:

[root@wbs]# exec > /tmp/test
[root@wbs]# echo "123123"
[root@wbs]# echo $PWD
[root@wbs]# lalala
-bash: lalala: command not found
[root@wbs]# exec > /dev/tty
[root@wbs]# cat /tmp/test

Note: this example shows that once exec > /tmp/test runs, the standard output of subsequent commands is written to the /tmp/test file, while errors still appear on the current terminal. To exit this setting, we redefine exec's standard output back to /dev/tty.
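An alternative way to undo the redirection, useful in scripts that may not have a controlling terminal, is to save the original stdout in a spare descriptor first (fd 5 and the log path here are arbitrary choices):

```shell
exec 5>&1                    # duplicate the current stdout to fd 5
exec > /tmp/exec_demo.log    # stdout now goes to the log file
echo "this line goes to the log"
exec 1>&5                    # restore stdout from the saved descriptor
exec 5>&-                    # close the spare descriptor
echo "this line appears on the original stdout"
```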

Knowledge Point 3: Command Pipeline

The pipe symbol "|" has been used many times in shell scripts; it creates what is called an anonymous pipe. The pipes discussed here are named pipes, which serve basically the same function as anonymous pipes.

A named pipe is also called a FIFO (First In, First Out). Named pipe characteristics:

1) In the filesystem, a FIFO has a name and exists as a special device file;

2) Any processes can share data through a FIFO;

3) Unless the FIFO has a reading process and a writing process attached at the same time, data flow through the FIFO blocks;

4) Anonymous pipes are created automatically by the shell and exist only in the kernel, while a FIFO is created explicitly by a program (e.g. with the mkfifo command) and exists in the filesystem;

5) An anonymous pipe is a one-way byte stream, while a FIFO can be opened for two-way use.
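The blocking behaviour in point 3 can also be shown in a single script, without a second terminal, by putting the writer in the background (the pipe path is an arbitrary choice):

```shell
mkfifo /tmp/demo.fifo
echo "hello through the fifo" > /tmp/demo.fifo &   # blocks until a reader opens the FIFO
out=$(cat /tmp/demo.fifo)                          # opening for read unblocks the writer
echo "$out"
wait                                               # reap the background writer
rm -f /tmp/demo.fifo
```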

Create a named pipe with the mkfifo command:

# screen
# mkfifo 123.fifo
# echo "121212" > 123.fifo   # blocks here: we wrote into the pipe and no other process has read it yet
ctrl+a d                     # detach from the screen session
# cat 123.fifo               # prints 121212; reattach to screen and you will see the echo command has finished

Named pipes can be combined with file descriptors:

[root@wbs ~]# mkfifo test.fifo
[root@wbs ~]# exec 10<>test.fifo   # bind fd 10 to test.fifo for both reading and writing
[root@wbs ~]# ls -l /dev/fd/10     # fd 10 now points to /root/test.fifo
lrwx------. 1 root root 64 Aug 10 11:41 /dev/fd/10 -> /root/test.fifo

Knowledge Point 4: read command

The read command is used a lot in shell scripts; its most typical use is interacting with the user, as follows:

# read -p "Please input a number:" n
Please input a number:3
# echo $n
3

Without the -p option, it can also be used like this:

# read name
# echo $name

The -u option of read takes an fd, as follows:

# read -u 10 a   # assigns the line read from fd 10 to variable a

Note that fd 10 here is the test.fifo bound earlier. If nothing has been written to fd 10 yet, the command above will hang. Because fd 10 is a named pipe file, read only returns once something has been written to it; otherwise it blocks, waiting for content. Of course, several lines can be written into the named pipe first, stored, and then read back one at a time.

# echo "123" >&10
# echo "456" >&10   # write to fd 10 twice in a row
# read -u 10 a      # the first read gets the first line from fd 10
# echo $a
123
# read -u 10 a      # the second read gets the second line from fd 10
# echo $a
456
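Put together as a self-contained script (the FIFO path is an arbitrary choice), the session above looks like this:

```shell
mkfifo /tmp/read_demo.fifo
exec 10<>/tmp/read_demo.fifo   # bind fd 10 to the FIFO for reading and writing
echo "123" >&10
echo "456" >&10                # two lines are now queued in fd 10
read -u 10 a
echo "$a"                      # first line: 123
read -u 10 a
echo "$a"                      # second line: 456
exec 10>&-                     # close fd 10
rm -f /tmp/read_demo.fifo
```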

Knowledge Point 5: wait command

The wait command waits for unfinished tasks (mainly background jobs): only after all of them have completed does execution continue past wait. It is often used in shell scripts. Here is an example:

# sleep 5 &
# wait   # this blocks with no response until the background command above has finished
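A slightly bigger sketch makes wait's effect visible: three 2-second jobs in the background finish together in about 2 seconds of wall-clock time, not 6:

```shell
start=$SECONDS
sleep 2 &
sleep 2 &
sleep 2 &
wait                           # returns only after all three background sleeps exit
elapsed=$((SECONDS - start))
echo "elapsed: about ${elapsed}s"
```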

Knowledge Point 6: Multithreading with Named Pipeline and read

Named pipelines have two distinct characteristics:

1) First in, first out. In the earlier example we wrote two lines into fd 10; the first read returned the first line and the second read returned the second line.

2) read returns when there is content and blocks otherwise. In the earlier example, running read a third time after the two reads hangs until new content is written.

Using these two features, we can implement multithreading in shell. Let's start with an example.

#Create the named pipe file 123.fifo
mkfifo 123.fifo
#Bind the named pipe 123.fifo to file descriptor 1000, i.e. fd 1000 reads from and writes to 123.fifo
exec 1000<>123.fifo

#Write two blank lines to fd 1000 in succession
echo >&1000
echo >&1000

#Loop 10 times
for i in `seq 1 10`
do
    #Each iteration first reads one line from fd 1000, i.e. a blank line. Only once a blank
    #line has been read do the commands in {} run.
    #Each iteration prints the current time, sleeps for one second, and then writes a blank
    #line back to fd 1000 so that later reads have something to read.
    #read can be followed not only by an assignment but also by a {} block containing many commands.
    read -u1000
    {
        date +%T
        echo $i
        sleep 1
        echo >&1000
    } &  #Run the block in the background: the 10 loop iterations finish quickly, but the
         #tasks keep running behind the scenes. Since two blank lines were written to fd 1000
         #at the start, at most two blocks run at the same time.
done
#Wait for all background tasks to complete
wait
#Close fd 1000
exec 1000>&-
#Delete the named pipe
rm -f 123.fifo

Execution result:


As you can see, work that would have taken 10 seconds serially now completes in about 5 seconds, which means the concurrency is 2: two threads run tasks at the same time. If you want five threads, just write five blank lines to fd 1000 at the start.

Reference script for this case

#!/bin/bash
#Multithreaded database backup
#Version: v1.0

##Assume the library name, host, port, and config file path of the 100 libraries are stored
##in the file /tmp/databases.list, one record per line.
##Format: db1 <host_ip> 3308 /data/mysql/db1/my.cnf  (the host field is a placeholder; the
##source example omits it)
##Back up the databases with xtrabackup (because MyISAM is involved, the command used is
##innobackupex)

#Settings (values taken from the examples above; adjust them to your environment)
bakuser=bakuser
bakpass=your_pass
bakdir=/data/backup/mysql
thread=10
fifofile=/tmp/$$.fifo

exec &> /tmp/mysql_bak.log

if ! which innobackupex &>/dev/null
then
    echo "install xtrabackup tool"
    #The yum repo rpm URL is omitted in the source
    rpm -ivh  && \
    yum install -y percona-xtrabackup-24
    if [ $? -ne 0 ]
    then
        echo "install xtrabackup tool error, please check."
        exit 1
    fi
fi

function bak_data {
    #$1=db name, $2=host, $3=port, $4=config file path
    [ -d $bakdir/$1 ] || mkdir -p $bakdir/$1
    innobackupex --defaults-file=$4 --host=$2 --port=$3 --user=$bakuser --password=$bakpass $bakdir/$1
    if [ $? -ne 0 ]
    then
        echo "Problem backing up database $1."
    fi
}

mkfifo $fifofile
exec 1000<>$fifofile

for ((i=0;i<$thread;i++))
do
    echo >&1000
done

#Feed the loop by input redirection rather than "cat | while" so that the background jobs
#are started in the current shell and the wait below can see them
while read line
do
    read -u1000
    {
        bak_data $line
        echo >&1000
    } &
done < /tmp/databases.list

#Wait for all backup jobs to finish
wait
exec 1000>&-
rm -f $fifofile

Keywords: Linux shell Database MySQL RPM

Added by neh on Tue, 13 Aug 2019 07:21:20 +0300