Technology Sharing | mysql Table Data Checking

Author: Yang Taotao

Question from customers: What are the methods of database validation? Can you tell him?
Me: Direct percona toolkit is not enough.
Client: Any software installed on the server must be reviewed, and how long it will be reviewed is uncertain. And open source software is not allowed.
Me:...

Okay, let's see how MySQL itself checks data.

1.checksum table.

The checksum table calculates the table row by row until the final checksum result is calculated.
For example, table n4 is checked (157W, 4G)

[ytt]>desc n4;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| r1    | char(36)     | YES  |     | NULL    |       |
| r2    | varchar(100) | YES  |     | NULL    |       |
| r3    | datetime     | YES  |     | NULL    |       |
| r4    | text         | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

[ytt]>select count(*) from n4;
+----------+
| count(*) |
+----------+
|  1572864 |
+----------+
1 row in set (6.89 sec)

[ytt]>checksum table n4;
+--------+-----------+
| Table  | Checksum  |
+--------+-----------+
| ytt.n4 | 874125175 |
+--------+-----------+
1 row in set (8.24 sec)

The test results on my own notebook were very fast.

However, there are many restrictions on checksum. The following are listed.

A. Views cannot be checked.

[ytt]>checksum table v_n3;
+----------+----------+
| Table    | Checksum |
+----------+----------+
| ytt.v_n3 |     NULL |
+----------+----------+
1 row in set, 1 warning (0.00 sec)

[ytt]>show warnings;
+-------+------+------------------------------+
| Level | Code | Message                      |
+-------+------+------------------------------+
| Error | 1347 | 'ytt.v_n3' is not BASE TABLE |
+-------+------+------------------------------+
1 row in set (0.00 sec)

B. The order of fields is different, and the results of verification will be inconsistent.

[ytt]>desc n3;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | NO   |     | NULL    |       |
| r1    | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

[ytt]>desc n5;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| r1    | int(11) | YES  |     | NULL    |       |
| id    | int(11) | NO   |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

[ytt]>checksum table n3,n5;
+--------+------------+
| Table  | Checksum   |
+--------+------------+
| ytt.n3 | 1795175396 |
| ytt.n5 |  838415794 |
+--------+------------+
2 rows in set (0.00 sec)

C, CHAR(100) and VARCHAR(100) store the same characters, and the verification results will be inconsistent.

[ytt]>desc n6;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | NO   |     | NULL    |       |
| r1    | int(11)      | YES  |     | NULL    |       |
| s1    | varchar(100) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

[ytt]>desc n3;
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id    | int(11)   | NO   |     | NULL    |       |
| r1    | int(11)   | YES  |     | NULL    |       |
| s1    | char(100) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+
3 rows in set (0.00 sec)

[ytt]>select * from n6;
Empty set (0.00 sec)

[ytt]>insert into n6 select * from n3;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

[ytt]>checksum table n3,n6;
+--------+------------+
| Table  | Checksum   |
+--------+------------+
| ytt.n3 | 2202684200 |
| ytt.n6 |  455222236 |
+--------+------------+
2 rows in set (0.00 sec)

D. When checksum is executed, shared read locks are added to all rows of the table.

E. There is also the possibility that different versions of MySQL may result in inconsistent verification results. As mentioned in the manual, the version after MySQL 5.6.5 changes the storage format of time type, resulting in inconsistent verification results.

So there are so many restrictions on checksum, do we have any way to break through them? For example, the principle of checksum table can be simulated to calculate manually.

2. Calculate the checksum value by yourself.

Several features of MySQL itself are used here: session variables; general expressions; window functions and MySQL concat_ws functions. The implementation is very simple.

For example, we use the sha function to calculate the check value.

[ytt]>set @crc='';
Query OK, 0 rows affected (0.00 sec)

[ytt]>
[ytt]>with ytt (r,rn) as
   -> (
   -> select @crc:= sha(concat_ws('#',@crc,id,r1,r2,r3,r4)) as r, row_number() over() as rn
   -> from n4
   -> )
   -> select 'n4' tablename, r checksum from ytt where rn = 1572864 ;
+-----------+------------------------------------------+
| tablename | checksum                                 |
+-----------+------------------------------------------+
| n4        | a9711af93399e0d195a53f4148adea46ab684d30 |
+-----------+------------------------------------------+
1 row in set, 1 warning (16.46 sec)

If the old version of MySQL runs, you can use MySQL's black hole engine and change the SQL as follows:

[ytt]>create table tmp_checksum (checksum varchar(100)) engine blackhole;
Query OK, 0 rows affected (0.08 sec)

[ytt]>
[ytt]>set @crc='';insert into tmp_checksum
Query OK, 0 rows affected (0.00 sec)

->     select @crc:= sha(concat_ws('#',@crc,id,r1,r2,r3,r4)) as r   from n4;

Query OK, 1572864 rows affected, 1 warning (20.11 sec)
Records: 1572864  Duplicates: 0  Warnings: 1

[ytt]>select 'n4' tablename,@crc checksum;
+-----------+------------------------------------------+
| tablename | checksum                                 |
+-----------+------------------------------------------+
| n4        | a9711af93399e0d195a53f4148adea46ab684d30 |
+-----------+------------------------------------------+
1 row in set (0.00 sec)

summary

For the requirement of checking data consistency, the second method of writing SQL by oneself is preferred.

Keywords: Database MySQL SQL Session

Added by newjsguy on Fri, 20 Sep 2019 12:17:18 +0300