Issue 39: Notes on writing SQL in MySQL time class partition

The previous article "specific implementation of MySQL time class partition" introduces the implementation method of time class partition. This article is an extension of the previous article and introduces the relevant SQL writing precautions based on this kind of partition.

There are two ways to retrieve partitioned tables. One is with partition key, and the other is without partition key. Generally speaking, if the search condition has partition key, the execution speed is fast, and if there is no partition key, the execution speed becomes slow. This conclusion is applicable to most scenarios, but it cannot be generalized. The most appropriate SQL statements should be written for different partition table definitions. The purpose of using partitioned table is to reduce the number of records during SQL statement retrieval. If the expected effect is not achieved, partitioned table can only bring side effects. Next, I list several classic SQL statements:

Careful readers may have some questions after reading the first article, based on table YTT_ The SQL statement of P1 is as follows:

select count(*) from ytt_pt1 where log_date >='2018-01-01' and log_date <'2019-01-01';

It is also a partition table ytt_pt1_month1, the SQL statement based on this table is as follows:

select count(*) from ytt_pt1_month1 where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15');

The retrieval requirements of the two tables are similar. Why are the writing methods different? Why should the latter be written in list form instead of simple range search form? With that in mind, let's continue.

MySQL has an optimization technology for partitioned tables called partition pruning, which translates into partition pruning. Its general meaning is that MySQL will calculate according to the partition function corresponding to the filter conditions of SQL statements, and penetrate the calculation results into the underlying partition table, so as to reduce the number of scanning records. For time types (DATE,TIMESTAMP,TIME, DATETIME), MySQL only supports partition clipping of some functions: to_days，to_seconds，year，unix_timestamp. Then let's look at the previous question: table YTT_ pt1_ The month1 partition function is month. Although the MySQL partition table supports the month function, the partition clipping technology does not include this function. Next, we will introduce the content of this article in two parts.

First, experience the partition cutting technology of MySQL and create a new table pt_pruning: partition function is to_days .

create table pt_pruning (
id int,
r1 int,
r2 int,
log_date date)
partition by range(to_days(log_date))
(
PARTITION p_01 VALUES LESS THAN (to_days('2020-02-01')) ENGINE = InnoDB,
 PARTITION p_02 VALUES LESS THAN (to_days('2020-03-01')) ENGINE = InnoDB,
 PARTITION p_03 VALUES LESS THAN (to_days('2020-04-01')) ENGINE = InnoDB,
 PARTITION p_04 VALUES LESS THAN (to_days('2020-05-01')) ENGINE = InnoDB,
 PARTITION p_05 VALUES LESS THAN (to_days('2020-06-01')) ENGINE = InnoDB,
 PARTITION p_06 VALUES LESS THAN (to_days('2020-07-01')) ENGINE = InnoDB,
 PARTITION p_07 VALUES LESS THAN (to_days('2020-08-01')) ENGINE = InnoDB,
 PARTITION p_08 VALUES LESS THAN (to_days('2020-09-01')) ENGINE = InnoDB,
 PARTITION p_09 VALUES LESS THAN (to_days('2020-10-01')) ENGINE = InnoDB,
 PARTITION p_10 VALUES LESS THAN (to_days('2020-11-01')) ENGINE = InnoDB,
 PARTITION p_11 VALUES LESS THAN (to_days('2020-12-01')) ENGINE = InnoDB,
 PARTITION p_12 VALUES LESS THAN (to_days('2021-01-01')) ENGINE = InnoDB,
 PARTITION p_max VALUES LESS THAN MAXVALUE ENGINE = InnoDB
)

This table contains about 100W data for the whole year of 2020. The data generation process is omitted here.

(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_pruning;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)

Execute the following SQL statements:

SQL 1: find the number of records whose date contains' 2020-01-02 '.

SQL 1:  select count(*) from pt_pruning where log_date <= '2020-01-02';

SQL 2 and SQL 3: find the number of records in January 2020.

SQL 2:  select count(*) from pt_pruning where log_date < '2020-02-01';

SQL 3:  select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';

The execution time of SQL 1 and SQL 2 is 0.04 seconds, and that of SQL 3 is 0.06 seconds. The effect is ideal without using index.

(localhost:ytt)<mysql> select count(*) from pt_pruning where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

(localhost:ytt)<mysql>select count(*) from pt_pruning where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.06 sec)

So remember to use the partition function specified by MySQL partition clipping technology to establish the partition table, so that writing SQL will be relatively free. If the partition table does not use the partition function specified above due to historical reasons, there are two possible optimization strategies:

Manually modify the SQL statement to achieve the best.
Add HINT to prompt MySQL to use specific partitions.

Second, if the partition function used by the partition table does not meet the rules of MySQL partition clipping technology, how to optimize such SQL statements?

To avoid confusion with the previous article, create a new table pt_month, copy table YTT_ pt1_ Table definition of month1. Table pt_month and table Pt_ Like pruning, the records for the whole year of 2020 are stored, and the total number is 100W.

(localhost:ytt)<mysql>select min(log_date),max(log_date),count(*) from pt_month;
+---------------+---------------+----------+
| min(log_date) | max(log_date) | count(*) |
+---------------+---------------+----------+
| 2020-01-02    | 2020-12-31    |  1000000 |
+---------------+---------------+----------+
1 row in set (0.72 sec)

Execute the previous three SQL statements again and replace the table name with pt_month :

The execution time of SQL 1 is 1.26 seconds, which is much slower than before. After checking the execution plan, it is found that MySQL partition clipping technology is not used, and unnecessary table partitions are scanned. (here are all table partitions)

(localhost:ytt)<mysql>select count(*) from pt_month where log_date <= '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (1.26 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date <= '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01,p_02,p_03,p_04,p_05,p_06,p_07,p_08,p_09,p_10,p_11,p_max
...
         rows: 992805
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Next, make a simple optimization for SQL 1: since you are looking for the record with the date of '2020-01-02', do not use < = to filter, but directly use = to filter: the execution time is 0.03 seconds. View the execution plan and locate the modified SQL directly to the table partition p_01, which achieves the effect of partition cutting.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date = '2020-01-02';
+----------+
| count(*) |
+----------+
|     2621 |
+----------+
1 row in set (0.03 sec)

(localhost:ytt)<mysql>explain 
    -> select count(*) from pt_month where log_date = '2020-01-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pt_month
   partitions: p_01
         type: ALL
...
         rows: 82522
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Continue to execute SQL 2 and SQL 3: the execution time is between 1 and 2 seconds, which is very inefficient, and MySQL partition clipping technology is not used.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.35 sec)

(localhost:ytt)<mysql>select count(*) from pt_month where log_date between '2020-01-01' and '2020-01-31';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (1.93 sec)

To continue to optimize SQL 2 and SQL 3. Since the two requirements are consistent, you can change the range retrieval to the specified list retrieval: the execution time is only 0.04 seconds.

(localhost:ytt)<mysql>select count(*) from pt_month where log_date in ('2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05','2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10','2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15','2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20','2020-01-21','2020-01-22','2020-01-23','2020-01-24','2020-01-25','2020-01-26','2020-01-27','2020-01-28','2020-01-29','2020-01-30','2020-01-31');
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

After changing the range query to IN list, the efficiency is greatly improved. The query plan shows that MySQL optimizer is only IN partition P_ Retrieve records on 01.

...
   partitions: p_01
...

In addition to modifying the SQL statement, you can also add HINT to the statement to make MySQL use the partition cutting technology: for example, after adding HINT to SQL 2, the execution time is 0.04 seconds, which is equivalent to the execution efficiency of the modified statement.

(localhost:ytt)<mysql>select count(*) from pt_month partition (p_01) where log_date < '2020-02-01';
+----------+
| count(*) |
+----------+
|    82410 |
+----------+
1 row in set (0.04 sec)

Summary:

If the partition table does not use MySQL partition clipping technology due to historical reasons, you can manually clip and optimize the partition table according to the following rules:

select * from tbname where partition_key = value;
select * from tbname where partition_key in (value1,value2,...,valueN);
The above two rules still apply to multi table joins.

Added by vikaspa on Mon, 21 Feb 2022 10:02:29 +0200

Programming VIP