In depth analysis of MySQL Partition function

=Horizontal partition (by row according to column attribute)=

Take a simple example: a table containing ten-year invoice records can be divided into ten different partitions, and each partition contains the records of one year.

Mode of horizontal partition:

  • Range – this mode allows DBA s to divide data into different ranges. For example, the data of any year (s) after the 1980s (s) can be divided into three tables.
  • Hash – this mode allows DBAs to calculate the Hash Key of one or more columns of the table, and finally partition the data area corresponding to different values of the hash code. For example, a DBA can create a table that partitions the primary key of a table.
  • Key – an extension of the Hash mode, where the Hash Key is generated by the MySQL system.
  • List – this mode allows the system to segment the row data corresponding to the value of the list defined by the DBA. For example, DBA establishes a table across three partitions, based on the data corresponding to the values in 2004, 2005 and 2006 respectively.
  • Composite mode - it's very mysterious. Ha ha, it's actually a combination of the above modes, so I won't explain it. For example: on the table that has been initialized with Range range partition, we can hash one partition again.

=Vertical zoning (by column)=

Take a simple example: a table containing large text and BLOB columns. These text and BLOB columns are not often accessed. At this time, these infrequently used text and BLOB should be divided into another partition to ensure their data relevance and improve the access speed at the same time.

=Test process of partitioned table and non partitioned table=

Create a partition table, split by year of date

mysql> CREATE TABLE part_tab (
 c1 int default NULL, 
 c2 varchar(30) default NULL, 
 c3 date default NULL
) engine=myisam 
PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,
PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,
PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,
PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,
PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE );

Note the last line, taking into account the maximum possible value

Create unpartitioned table

mysql> create table no_part_tab (
        c1 int(11) default NULL,
        c2 varchar(30) default NULL,
        c3 date default NULL
       ) engine=myisam;

Inject 8 million test data through stored procedures

mysql> set sql_mode=''; /* If creating a stored procedure fails, you need to set this variable first. bug? */
mysql> delimiter //     /*Set the statement terminator to / / because the stored procedure statement is used; End*/

mysql> CREATE PROCEDURE load_part_tab()
	begin
		declare v int default 0;
		while v < 8000000
  		do
   			insert into part_tab values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652));
			set v = v + 1;
		end while;
	end
//
mysql> delimiter ;
mysql> call load_part_tab();
Query OK, 1 row affected (8 min 17.75 sec)

mysql> insert into no_part_tab select * from part_tab;      //Copy 8 million data to non partitioned table no_ part_ In tab

Query OK, 8000000 rows affected (51.59 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

Test SQL performance

mysql> select count(*) from part_tab where c3 > date('1995-01-01') and c3 < date('1995-12-31');
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (0.55 sec)

mysql> select count(*) from no_part_tab where c3 > date('1995-01-01') and c3 < date('1995-12-31'); 
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (4.69 sec)

The results show that the execution time of partitioned tables is 90% less than that of non partitioned tables.

Analyze the execution through the explain statement

mysql > explain select count(*) from no_part_tab where c3 > date('1995-01-01') and c3 < date ('1995-12-31') \G    #The \ G at the end changes the output of mysql to column mode 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: no_part_tab
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8000000               #8 million records need to be queried
        Extra: Using where
  1 row in set (0.00 sec)

mysql> explain select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31') \G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: part_tab
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 798458               #Only 798458 records need to be queried
        Extra: Using where
  1 row in set (0.00 sec)

Test the situation after index creation

mysql> create index idx_of_c3 on no_part_tab (c3);
Query OK, 8000000 rows affected (1 min 18.08 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

mysql> create index idx_of_c3 on part_tab (c3);
Query OK, 8000000 rows affected (1 min 19.19 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

List of database file sizes after index creation:

2008-05-24 09:23             8,608 no_part_tab.frm
2008-05-24 09:24       255,999,996 no_part_tab.MYD
2008-05-24 09:24        81,611,776 no_part_tab.MYI
2008-05-24 09:25                 0 part_tab#P#p0.MYD
2008-05-24 09:26             1,024 part_tab#P#p0.MYI
2008-05-24 09:26        25,550,656 part_tab#P#p1.MYD
2008-05-24 09:26         8,148,992 part_tab#P#p1.MYI
2008-05-24 09:26        25,620,192 part_tab#P#p10.MYD
2008-05-24 09:26         8,170,496 part_tab#P#p10.MYI
2008-05-24 09:25                 0 part_tab#P#p11.MYD
2008-05-24 09:26             1,024 part_tab#P#p11.MYI
2008-05-24 09:26        25,656,512 part_tab#P#p2.MYD
2008-05-24 09:26         8,181,760 part_tab#P#p2.MYI
2008-05-24 09:26        25,586,880 part_tab#P#p3.MYD
2008-05-24 09:26         8,160,256 part_tab#P#p3.MYI
2008-05-24 09:26        25,585,696 part_tab#P#p4.MYD
2008-05-24 09:26         8,159,232 part_tab#P#p4.MYI
2008-05-24 09:26        25,585,216 part_tab#P#p5.MYD
2008-05-24 09:26         8,159,232 part_tab#P#p5.MYI
2008-05-24 09:26        25,655,740 part_tab#P#p6.MYD
2008-05-24 09:26         8,181,760 part_tab#P#p6.MYI
2008-05-24 09:26        25,586,528 part_tab#P#p7.MYD
2008-05-24 09:26         8,160,256 part_tab#P#p7.MYI
2008-05-24 09:26        25,586,752 part_tab#P#p8.MYD
2008-05-24 09:26         8,160,256 part_tab#P#p8.MYI
2008-05-24 09:26        25,585,824 part_tab#P#p9.MYD
2008-05-24 09:26         8,159,232 part_tab#P#p9.MYI
2008-05-24 09:25             8,608 part_tab.frm
2008-05-24 09:25                68 part_tab.par

Test SQL performance again

mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31');
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
  1 row in set (2.42 sec)   # 51% of the original 4.69 sec

After restart, the query time is reduced to 0.89 sec, almost the same as the partition table.

mysql ( net stop mysql, net start mysql)
mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31');

  +----------+
  | count(*) |
  +----------+
  |   795181 |
  +----------+
  1 row in set (0.86 sec)

Further tests

Add date range

mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1997-12-31');
+----------+
| count(*) |
+----------+
| 2396524 |
+----------+
1 row in set (5.42 sec)

mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1997-12-31');
+----------+
| count(*) |
+----------+
| 2396524 |
+----------+
1 row in set (2.63 sec)

Add unindexed field query

mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1996-12-31') and c2='hello';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (11.52 sec)

mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1996-12-31') and c2='hello';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.75 sec)

=Preliminary conclusion=

  * Partitioned and non partitioned occupy roughly the same file space (data and index files)
  * If there are non indexed fields in the query statement, the partition time is much better than the non partition time
  * If the fields in the query statement are indexed, the difference between partitioned and non partitioned is reduced, and partitioned is slightly better than non partitioned.

=Final conclusion=

  • For large amounts of data, it is recommended to use the partition function.
  • Remove unnecessary fields
  • Add MyISAM according to the manual_ max_ sort_ file_ Size will increase the partition performance (the maximum size of temporary files allowed when mysql rebuilds the index)

=Detailed explanation of zoning commands=

  • RANGE type
CREATE TABLE users (
       uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       name VARCHAR(30) NOT NULL DEFAULT '',
       email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY RANGE (uid) (
       PARTITION p0 VALUES LESS THAN (3000000)
       DATA DIRECTORY = '/data0/data'
       INDEX DIRECTORY = '/data1/idx',
 
       PARTITION p1 VALUES LESS THAN (6000000)
       DATA DIRECTORY = '/data2/data'
       INDEX DIRECTORY = '/data3/idx',
 
       PARTITION p2 VALUES LESS THAN (9000000)
       DATA DIRECTORY = '/data4/data'
       INDEX DIRECTORY = '/data5/idx',
 
       PARTITION p3 VALUES LESS THAN MAXVALUE     DATA DIRECTORY = '/data6/data' 
       INDEX DIRECTORY = '/data7/idx'
);

Here, the user table is divided into four partitions, with every 3 million records as the boundary. Each partition has its own independent storage directory of data and index files. At the same time, the physical disk partitions where these directories are located may also be completely independent, which can improve the disk IO throughput.

  • LIST type
CREATE TABLE category (
     cid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY LIST (cid) (
     PARTITION p0 VALUES IN (0,4,8,12)
     DATA DIRECTORY = '/data0/data' 
     INDEX DIRECTORY = '/data1/idx',
     
     PARTITION p1 VALUES IN (1,5,9,13)
     DATA DIRECTORY = '/data2/data'
     INDEX DIRECTORY = '/data3/idx',
     
     PARTITION p2 VALUES IN (2,6,10,14)
     DATA DIRECTORY = '/data4/data'
     INDEX DIRECTORY = '/data5/idx',
     
     PARTITION p3 VALUES IN (3,7,11,15)
     DATA DIRECTORY = '/data6/data'
     INDEX DIRECTORY = '/data7/idx'
);

It is divided into four areas, and the data files and index files are stored separately.

  • HASH type
CREATE TABLE users (
     uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name VARCHAR(30) NOT NULL DEFAULT '',
     email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY HASH (uid) PARTITIONS 4 (
     PARTITION p0
     DATA DIRECTORY = '/data0/data'
     INDEX DIRECTORY = '/data1/idx',
 
     PARTITION p1
     DATA DIRECTORY = '/data2/data'
     INDEX DIRECTORY = '/data3/idx',
 
     PARTITION p2
     DATA DIRECTORY = '/data4/data'
     INDEX DIRECTORY = '/data5/idx',
 
     PARTITION p3
     DATA DIRECTORY = '/data6/data'
     INDEX DIRECTORY = '/data7/idx'
);
  • KEY type
CREATE TABLE users (
     uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name VARCHAR(30) NOT NULL DEFAULT '',
     email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY KEY (uid) PARTITIONS 4 (
     PARTITION p0
     DATA DIRECTORY = '/data0/data'
     INDEX DIRECTORY = '/data1/idx',
     
     PARTITION p1
     DATA DIRECTORY = '/data2/data' 
     INDEX DIRECTORY = '/data3/idx',
     
     PARTITION p2 
     DATA DIRECTORY = '/data4/data'
     INDEX DIRECTORY = '/data5/idx',
     
     PARTITION p3 
     DATA DIRECTORY = '/data6/data'
     INDEX DIRECTORY = '/data7/idx'
);

It is divided into four areas, and the data files and index files are stored separately.

  • Sub partition

The sub partition is the re partition of each partition in the partition table of RANGE/LIST type. The re segmentation can be HASH/KEY and other types.

CREATE TABLE users (
     uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name VARCHAR(30) NOT NULL DEFAULT '',
     email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY RANGE (uid) SUBPARTITION BY HASH (uid % 4) SUBPARTITIONS 2(
     PARTITION p0 VALUES LESS THAN (3000000)
     DATA DIRECTORY = '/data0/data'
     INDEX DIRECTORY = '/data1/idx',
 
     PARTITION p1 VALUES LESS THAN (6000000)
     DATA DIRECTORY = '/data2/data'
     INDEX DIRECTORY = '/data3/idx'
);

The sub partition is divided into sub partitions by using rage.
perhaps

CREATE TABLE users (
     uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name VARCHAR(30) NOT NULL DEFAULT '',
     email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY RANGE (uid) SUBPARTITION BY KEY(uid) SUBPARTITIONS 2(
     PARTITION p0 VALUES LESS THAN (3000000)
     DATA DIRECTORY = '/data0/data'
     INDEX DIRECTORY = '/data1/idx',
 
     PARTITION p1 VALUES LESS THAN (6000000)
     DATA DIRECTORY = '/data2/data'
     INDEX DIRECTORY = '/data3/idx'
);

The RANGE partition is divided into sub partitions again, and the sub partitions are of KEY type.

Partition management

delete a partition

ALERT TABLE users DROP PARTITION p0; #Delete partition p0

Rebuild partition

  • RANGE zoning reconstruction
ALTER TABLE users REORGANIZE PARTITION p0,p1 INTO (PARTITION p0 VALUES LESS THAN (6000000));  
#Merge the original p0 and P1 partitions and put them into the new p0 partition.
  • LIST partition reconstruction
ALTER TABLE users REORGANIZE PARTITION p0,p1 INTO (PARTITION p0 VALUES IN(0,1,4,5,8,9,12,13));
#Merge the original p0 and P1 partitions and put them into the new p0 partition.
  • HASH/KEY zoning reconstruction
ALTER TABLE users REORGANIZE PARTITION COALESCE PARTITION 2; 
#The number of partitions reconstructed by REORGANIZE becomes 2. Here, the number can only be reduced but not increased. If you want to add, you can use the ADD PARTITION method.

New partition

  • New RANGE partition
ALTER TABLE category ADD PARTITION (PARTITION p4 VALUES IN (16,17,18,19)
            DATA DIRECTORY = '/data8/data'
            INDEX DIRECTORY = '/data9/idx');
  • Add HASH/KEY partition
ALTER TABLE users ADD PARTITION PARTITIONS 8;   #Expand the total number of partitions to 8.

Partition an existing table

alter table results partition by RANGE (month(ttime)) 
(
PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN (2) , 
PARTITION p2 VALUES LESS THAN (3) ,
PARTITION p3 VALUES LESS THAN (4) , 
PARTITION p4 VALUES LESS THAN (5) ,
PARTITION p5 VALUES LESS THAN (6) , 
PARTITION p6 VALUES LESS THAN (7) ,
PARTITION p7 VALUES LESS THAN (8) , 
PARTITION p8 VALUES LESS THAN (9) ,
PARTITION p9 VALUES LESS THAN (10) , 
PARTITION p10 VALUES LESS THAN (11),
PARTITION p11 VALUES LESS THAN (12),
PARTITION P12 VALUES LESS THAN (13) 
);

The default partition limit is that the partition field must be part of the primary key. In order to remove this limit:

  • [method 1] use ID:
mysql> ALTER TABLE np_pk
    ->PARTITION BY HASH( TO_DAYS(added) )
    ->PARTITIONS 4;
#ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

mysql> ALTER TABLE np_pk
    -> PARTITION BY HASH(id)
    -> PARTITIONS 4;
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
  • [method 2] remove the original PK and generate a new PK
mysql> alter table results drop PRIMARY KEY;
Query OK, 5374850 rows affected (7 min 4.05 sec)
Records: 5374850 Duplicates: 0 Warnings: 0

mysql> alter table results add PRIMARY KEY(id, ttime);
Query OK, 5374850 rows affected (7 min 4.05 sec)
Records: 5374850 Duplicates: 0 Warnings: 0

Keywords: Database MySQL Big Data

Added by tobimichigan on Wed, 09 Mar 2022 13:13:19 +0200