=Horizontal partition (by row according to column attribute)=
Take a simple example: a table containing ten-year invoice records can be divided into ten different partitions, and each partition contains the records of one year.
Mode of horizontal partition:
- Range – this mode allows DBA s to divide data into different ranges. For example, the data of any year (s) after the 1980s (s) can be divided into three tables.
- Hash – this mode allows DBAs to calculate the Hash Key of one or more columns of the table, and finally partition the data area corresponding to different values of the hash code. For example, a DBA can create a table that partitions the primary key of a table.
- Key – an extension of the Hash mode, where the Hash Key is generated by the MySQL system.
- List – this mode allows the system to segment the row data corresponding to the value of the list defined by the DBA. For example, DBA establishes a table across three partitions, based on the data corresponding to the values in 2004, 2005 and 2006 respectively.
- Composite mode - it's very mysterious. Ha ha, it's actually a combination of the above modes, so I won't explain it. For example: on the table that has been initialized with Range range partition, we can hash one partition again.
=Vertical zoning (by column)=
Take a simple example: a table containing large text and BLOB columns. These text and BLOB columns are not often accessed. At this time, these infrequently used text and BLOB should be divided into another partition to ensure their data relevance and improve the access speed at the same time.
=Test process of partitioned table and non partitioned table=
Create a partition table, split by year of date
mysql> CREATE TABLE part_tab ( c1 int default NULL, c2 varchar(30) default NULL, c3 date default NULL ) engine=myisam PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995), PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) , PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) , PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) , PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) , PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010), PARTITION p11 VALUES LESS THAN MAXVALUE );
Note the last line, taking into account the maximum possible value
Create unpartitioned table
mysql> create table no_part_tab ( c1 int(11) default NULL, c2 varchar(30) default NULL, c3 date default NULL ) engine=myisam;
Inject 8 million test data through stored procedures
mysql> set sql_mode=''; /* If creating a stored procedure fails, you need to set this variable first. bug? */ mysql> delimiter // /*Set the statement terminator to / / because the stored procedure statement is used; End*/ mysql> CREATE PROCEDURE load_part_tab() begin declare v int default 0; while v < 8000000 do insert into part_tab values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 3652)); set v = v + 1; end while; end // mysql> delimiter ; mysql> call load_part_tab(); Query OK, 1 row affected (8 min 17.75 sec) mysql> insert into no_part_tab select * from part_tab; //Copy 8 million data to non partitioned table no_ part_ In tab Query OK, 8000000 rows affected (51.59 sec) Records: 8000000 Duplicates: 0 Warnings: 0
Test SQL performance
mysql> select count(*) from part_tab where c3 > date('1995-01-01') and c3 < date('1995-12-31'); +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (0.55 sec) mysql> select count(*) from no_part_tab where c3 > date('1995-01-01') and c3 < date('1995-12-31'); +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (4.69 sec)
The results show that the execution time of partitioned tables is 90% less than that of non partitioned tables.
Analyze the execution through the explain statement
mysql > explain select count(*) from no_part_tab where c3 > date('1995-01-01') and c3 < date ('1995-12-31') \G #The \ G at the end changes the output of mysql to column mode *************************** 1. row *************************** id: 1 select_type: SIMPLE table: no_part_tab type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8000000 #8 million records need to be queried Extra: Using where 1 row in set (0.00 sec) mysql> explain select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31') \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_tab type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 798458 #Only 798458 records need to be queried Extra: Using where 1 row in set (0.00 sec)
Test the situation after index creation
mysql> create index idx_of_c3 on no_part_tab (c3); Query OK, 8000000 rows affected (1 min 18.08 sec) Records: 8000000 Duplicates: 0 Warnings: 0 mysql> create index idx_of_c3 on part_tab (c3); Query OK, 8000000 rows affected (1 min 19.19 sec) Records: 8000000 Duplicates: 0 Warnings: 0
List of database file sizes after index creation:
2008-05-24 09:23 8,608 no_part_tab.frm 2008-05-24 09:24 255,999,996 no_part_tab.MYD 2008-05-24 09:24 81,611,776 no_part_tab.MYI 2008-05-24 09:25 0 part_tab#P#p0.MYD 2008-05-24 09:26 1,024 part_tab#P#p0.MYI 2008-05-24 09:26 25,550,656 part_tab#P#p1.MYD 2008-05-24 09:26 8,148,992 part_tab#P#p1.MYI 2008-05-24 09:26 25,620,192 part_tab#P#p10.MYD 2008-05-24 09:26 8,170,496 part_tab#P#p10.MYI 2008-05-24 09:25 0 part_tab#P#p11.MYD 2008-05-24 09:26 1,024 part_tab#P#p11.MYI 2008-05-24 09:26 25,656,512 part_tab#P#p2.MYD 2008-05-24 09:26 8,181,760 part_tab#P#p2.MYI 2008-05-24 09:26 25,586,880 part_tab#P#p3.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p3.MYI 2008-05-24 09:26 25,585,696 part_tab#P#p4.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p4.MYI 2008-05-24 09:26 25,585,216 part_tab#P#p5.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p5.MYI 2008-05-24 09:26 25,655,740 part_tab#P#p6.MYD 2008-05-24 09:26 8,181,760 part_tab#P#p6.MYI 2008-05-24 09:26 25,586,528 part_tab#P#p7.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p7.MYI 2008-05-24 09:26 25,586,752 part_tab#P#p8.MYD 2008-05-24 09:26 8,160,256 part_tab#P#p8.MYI 2008-05-24 09:26 25,585,824 part_tab#P#p9.MYD 2008-05-24 09:26 8,159,232 part_tab#P#p9.MYI 2008-05-24 09:25 8,608 part_tab.frm 2008-05-24 09:25 68 part_tab.par
Test SQL performance again
mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31'); +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (2.42 sec) # 51% of the original 4.69 sec
After restart, the query time is reduced to 0.89 sec, almost the same as the partition table.
mysql ( net stop mysql, net start mysql)
mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1995-12-31'); +----------+ | count(*) | +----------+ | 795181 | +----------+ 1 row in set (0.86 sec)
Further tests
Add date range
mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1997-12-31'); +----------+ | count(*) | +----------+ | 2396524 | +----------+ 1 row in set (5.42 sec) mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1997-12-31'); +----------+ | count(*) | +----------+ | 2396524 | +----------+ 1 row in set (2.63 sec)
Add unindexed field query
mysql> select count(*) from no_part_tab where c3 > date ('1995-01-01') and c3 < date ('1996-12-31') and c2='hello'; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (11.52 sec) mysql> select count(*) from part_tab where c3 > date ('1995-01-01') and c3 < date ('1996-12-31') and c2='hello'; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.75 sec)
=Preliminary conclusion=
* Partitioned and non partitioned occupy roughly the same file space (data and index files) * If there are non indexed fields in the query statement, the partition time is much better than the non partition time * If the fields in the query statement are indexed, the difference between partitioned and non partitioned is reduced, and partitioned is slightly better than non partitioned.
=Final conclusion=
- For large amounts of data, it is recommended to use the partition function.
- Remove unnecessary fields
- Add MyISAM according to the manual_ max_ sort_ file_ Size will increase the partition performance (the maximum size of temporary files allowed when mysql rebuilds the index)
=Detailed explanation of zoning commands=
- RANGE type
CREATE TABLE users ( uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '', email VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY RANGE (uid) ( PARTITION p0 VALUES LESS THAN (3000000) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 VALUES LESS THAN (6000000) DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx', PARTITION p2 VALUES LESS THAN (9000000) DATA DIRECTORY = '/data4/data' INDEX DIRECTORY = '/data5/idx', PARTITION p3 VALUES LESS THAN MAXVALUE DATA DIRECTORY = '/data6/data' INDEX DIRECTORY = '/data7/idx' );
Here, the user table is divided into four partitions, with every 3 million records as the boundary. Each partition has its own independent storage directory of data and index files. At the same time, the physical disk partitions where these directories are located may also be completely independent, which can improve the disk IO throughput.
- LIST type
CREATE TABLE category ( cid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY LIST (cid) ( PARTITION p0 VALUES IN (0,4,8,12) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 VALUES IN (1,5,9,13) DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx', PARTITION p2 VALUES IN (2,6,10,14) DATA DIRECTORY = '/data4/data' INDEX DIRECTORY = '/data5/idx', PARTITION p3 VALUES IN (3,7,11,15) DATA DIRECTORY = '/data6/data' INDEX DIRECTORY = '/data7/idx' );
It is divided into four areas, and the data files and index files are stored separately.
- HASH type
CREATE TABLE users ( uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '', email VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY HASH (uid) PARTITIONS 4 ( PARTITION p0 DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx', PARTITION p2 DATA DIRECTORY = '/data4/data' INDEX DIRECTORY = '/data5/idx', PARTITION p3 DATA DIRECTORY = '/data6/data' INDEX DIRECTORY = '/data7/idx' );
- KEY type
CREATE TABLE users ( uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '', email VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY KEY (uid) PARTITIONS 4 ( PARTITION p0 DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx', PARTITION p2 DATA DIRECTORY = '/data4/data' INDEX DIRECTORY = '/data5/idx', PARTITION p3 DATA DIRECTORY = '/data6/data' INDEX DIRECTORY = '/data7/idx' );
It is divided into four areas, and the data files and index files are stored separately.
- Sub partition
The sub partition is the re partition of each partition in the partition table of RANGE/LIST type. The re segmentation can be HASH/KEY and other types.
CREATE TABLE users ( uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '', email VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY RANGE (uid) SUBPARTITION BY HASH (uid % 4) SUBPARTITIONS 2( PARTITION p0 VALUES LESS THAN (3000000) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 VALUES LESS THAN (6000000) DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx' );
The sub partition is divided into sub partitions by using rage.
perhaps
CREATE TABLE users ( uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(30) NOT NULL DEFAULT '', email VARCHAR(30) NOT NULL DEFAULT '' ) PARTITION BY RANGE (uid) SUBPARTITION BY KEY(uid) SUBPARTITIONS 2( PARTITION p0 VALUES LESS THAN (3000000) DATA DIRECTORY = '/data0/data' INDEX DIRECTORY = '/data1/idx', PARTITION p1 VALUES LESS THAN (6000000) DATA DIRECTORY = '/data2/data' INDEX DIRECTORY = '/data3/idx' );
The RANGE partition is divided into sub partitions again, and the sub partitions are of KEY type.
Partition management
delete a partition
ALERT TABLE users DROP PARTITION p0; #Delete partition p0
Rebuild partition
- RANGE zoning reconstruction
ALTER TABLE users REORGANIZE PARTITION p0,p1 INTO (PARTITION p0 VALUES LESS THAN (6000000)); #Merge the original p0 and P1 partitions and put them into the new p0 partition.
- LIST partition reconstruction
ALTER TABLE users REORGANIZE PARTITION p0,p1 INTO (PARTITION p0 VALUES IN(0,1,4,5,8,9,12,13)); #Merge the original p0 and P1 partitions and put them into the new p0 partition.
- HASH/KEY zoning reconstruction
ALTER TABLE users REORGANIZE PARTITION COALESCE PARTITION 2; #The number of partitions reconstructed by REORGANIZE becomes 2. Here, the number can only be reduced but not increased. If you want to add, you can use the ADD PARTITION method.
New partition
- New RANGE partition
ALTER TABLE category ADD PARTITION (PARTITION p4 VALUES IN (16,17,18,19) DATA DIRECTORY = '/data8/data' INDEX DIRECTORY = '/data9/idx');
- Add HASH/KEY partition
ALTER TABLE users ADD PARTITION PARTITIONS 8; #Expand the total number of partitions to 8.
Partition an existing table
alter table results partition by RANGE (month(ttime)) ( PARTITION p0 VALUES LESS THAN (1), PARTITION p1 VALUES LESS THAN (2) , PARTITION p2 VALUES LESS THAN (3) , PARTITION p3 VALUES LESS THAN (4) , PARTITION p4 VALUES LESS THAN (5) , PARTITION p5 VALUES LESS THAN (6) , PARTITION p6 VALUES LESS THAN (7) , PARTITION p7 VALUES LESS THAN (8) , PARTITION p8 VALUES LESS THAN (9) , PARTITION p9 VALUES LESS THAN (10) , PARTITION p10 VALUES LESS THAN (11), PARTITION p11 VALUES LESS THAN (12), PARTITION P12 VALUES LESS THAN (13) );
The default partition limit is that the partition field must be part of the primary key. In order to remove this limit:
- [method 1] use ID:
mysql> ALTER TABLE np_pk ->PARTITION BY HASH( TO_DAYS(added) ) ->PARTITIONS 4; #ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function mysql> ALTER TABLE np_pk -> PARTITION BY HASH(id) -> PARTITIONS 4; Query OK, 0 rows affected (0.11 sec) Records: 0 Duplicates: 0 Warnings: 0
- [method 2] remove the original PK and generate a new PK
mysql> alter table results drop PRIMARY KEY; Query OK, 5374850 rows affected (7 min 4.05 sec) Records: 5374850 Duplicates: 0 Warnings: 0 mysql> alter table results add PRIMARY KEY(id, ttime); Query OK, 5374850 rows affected (7 min 4.05 sec) Records: 5374850 Duplicates: 0 Warnings: 0