1, Application scenario of partition table
1. Why use partition table?
The tables are so large that they can't all be put in memory, or there are only hot data in the last part of the table, and the others are historical data. Partitioned table refers to decomposing a table in the database into multiple smaller and easy to manage parts according to certain rules. Logically, there is only one table, but the bottom layer is composed of multiple physical partitions.
2. Benefits of using partitioned tables
(1) Data is easier to maintain
You can delete a large amount of data in batch by clearing the entire partition
Optimize, check and repair an independent partition
(2) Efficient use of equipment
Data can be distributed on different physical devices. Multiple hardware devices can be used efficiently to store more data than a single disk or file system
(3) Partitioned tables can be used to avoid some special bottlenecks
Exclusive access of single index of innodb (added later)
inode lock contention for ext3 file systems (added later)
(4) Optimize query
When the where statement contains partition conditions, you can scan only one or more partition tables to improve query efficiency; When sum and count statements are involved, they can also be processed in parallel on multiple partitions and finally summarize the results.
(5) Partition tables are easier to maintain.
For example, if you want to delete a large amount of data in batch, you can clear the entire partition and back up and restore independent partitions
2, Restrictions on partitioned tables
1. Limited number of partitions
A table can only have 1024 partitions at most. In version 5.7, it can support 8196 partitions.
2. Restrictions on separate table expressions
In MySQL 5.1, partition expressions must be integers or expressions that return integers. MySQL 5.5 provides support for non integer expression partitions.
3. Restrictions of partitioned tables on Indexes
If there are primary key or unique index columns in the partition field, many primary key columns and unique index columns must be included. That is, the partition field either does not contain primary key or index columns, or contains all primary key and index columns.
4. Foreign key constraints cannot be used in partitioned tables
5. Data and index are the same
MySQL partitioning is applicable to all data and indexes of a table. You cannot partition only table data without index partitioning, index partitioning without table partitioning, or part of table data partitioning.
3, Principle of partition table
Partition tables are implemented by multiple related underlying tables, which are also identified by handle objects. We can directly access each partition. The storage engine manages each underlying table of the partition as well as ordinary tables (all underlying tables must use the same storage engine) , the index knowledge of partitioned tables adds an identical index to each underlying table. From the perspective of the storage engine, the underlying table is no different from the ordinary table, and the storage engine does not need to know whether it is an ordinary table or a part of a partitioned table.
The partition table is operated according to the following operation logic:
1. select query
When querying a partition table, the partition layer first opens and locks all the underlying tables. The optimizer first determines whether some partitions can be filtered, and then calls the corresponding storage engine interface to access the data of each partition
2. insert operation
When writing a record, the partition layer first opens and locks all the underlying tables, then determines which partition accepts the record, and then writes the record to the corresponding underlying table
3. delete operation
When deleting a record, the partition layer first opens and locks all the underlying tables, then determines the partition corresponding to the data, and finally deletes the corresponding underlying tables
4. update operation
When updating a record, the partition layer first opens and locks all the underlying tables. mysql first determines the partition of the record to be updated, then takes out the data and updates it, and then determines which partition the updated data should be. Finally, write to the underlying table and delete the underlying table where the source data is located
5. Attention
Some operations support filtering. For example, when deleting a record, MySQL needs to find the record first. If the where condition exactly matches the partition expression, all partitions that do not contain the record can be filtered, which is also effective for update. If it is an insert operation, only one partition is hit, and other partitions will be filtered. MySQL first Determine which partition this record belongs to, and then write the record to the corresponding partition table. There is no need to operate on any other partition.
Although each operation will "open and lock all underlying tables first", this does not mean that the partitioned table locks the whole table during processing. If the storage engine can implement row level locks, such as innodb, it will release the corresponding table locks at the partition layer.
4, Type of partition table
1. Range partition
Assign rows to partitions within a given range based on column values.
The partition method of range partition table is: each partition contains row data, and the partition expression is within the given range. The partition range should be continuous and cannot overlap. You can use the value less than operator to define it.
(1) Create a common table
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL );
(2) . partition according to store_id
Create a partitioned table. The following table creation statement is partitioned according to store_id, specifying four partitions
CREATE TABLE employees2 ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (store_id) ( PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 VALUES LESS THAN (21) );
(3) Use of less than maxvalue
In the current table creation statement, you can see that the values of store_id are 1-5 in p0 partition, 6-10 in p1 partition, 11-15 in p3 partition and 16-20 in p4 partition. However, if you insert a value exceeding 20, an error will be reported because mysql does not know which partition to place the data in. You can use less than maxvalue to avoid this situation.
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (store_id) ( PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 VALUES LESS THAN MAXVALUE );
– maxvalue indicates an integer value that is always greater than or equal to the maximum possible integer value
(4) . partition according to job code
The table can be partitioned according to the employee's job code in the same way
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (job_code) ( PARTITION p0 VALUES LESS THAN (100), PARTITION p1 VALUES LESS THAN (1000), PARTITION p2 VALUES LESS THAN (10000) );
(5) . partition with date type
date type can be used for partitioning: if false, it can be divided according to the year each employee left the company, such as year(separated)
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY RANGE ( YEAR(separated) ) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (2001), PARTITION p3 VALUES LESS THAN MAXVALUE );
(6) Use functions to partition tables
You can use functions to partition the table according to the value of range, such as timestamp unix_timestamp()
CREATE TABLE quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) ( PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ), PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ), PARTITION p9 VALUES LESS THAN (MAXVALUE) );
– timestamp does not allow any other expressions involving values
2. Interval based partitioning
Partition scheme based on time interval. In mysql5.7, partition scheme can be implemented based on range or event interval. There are two options
1. Range based partitioning. For partition expressions, you can use the operation function to return an integer value based on the date, time, or datatime columns
CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE ); CREATE TABLE quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) ( PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ), PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ), PARTITION p9 VALUES LESS THAN (MAXVALUE) );
2. For partition based on range column, use date or datatime column as partition column
CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE COLUMNS(joined) ( PARTITION p0 VALUES LESS THAN ('1960-01-01'), PARTITION p1 VALUES LESS THAN ('1970-01-01'), PARTITION p2 VALUES LESS THAN ('1980-01-01'), PARTITION p3 VALUES LESS THAN ('1990-01-01'), PARTITION p4 VALUES LESS THAN MAXVALUE );
3. List partition
It is similar to partitioning by range, except that the list partition is selected based on the column value matching a value in a discrete value set
`CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY LIST(store_id) ( PARTITION pNorth VALUES IN (3,5,6,9,17), PARTITION pEast VALUES IN (1,2,10,11,19,20), PARTITION pWest VALUES IN (4,12,13,14,18), PARTITION pCentral VALUES IN (7,8,15,16) );`
4. Column partition
Since 5.5, mysql supports column partition. i can be considered as an upgraded version of range and list. After 5.5, column partition can be used to replace range and list, but column partition only accepts ordinary columns and does not accept expressions
`CREATE TABLE `list_c` ( `c1` int(11) DEFAULT NULL, `c2` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50500 PARTITION BY RANGE COLUMNS(c1) (PARTITION p0 VALUES LESS THAN (5) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (10) ENGINE = InnoDB) */ CREATE TABLE `list_c` ( `c1` int(11) DEFAULT NULL, `c2` int(11) DEFAULT NULL, `c3` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50500 PARTITION BY RANGE COLUMNS(c1,c3) (PARTITION p0 VALUES LESS THAN (5,'aaa') ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (10,'bbb') ENGINE = InnoDB) */ CREATE TABLE `list_c` ( `c1` int(11) DEFAULT NULL, `c2` int(11) DEFAULT NULL, `c3` char(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50500 PARTITION BY LIST COLUMNS(c3) (PARTITION p0 VALUES IN ('aaa') ENGINE = InnoDB, PARTITION p1 VALUES IN ('bbb') ENGINE = InnoDB) */
5. hash partition
Partitions selected based on the return value of a user-defined expression that uses the column values of the rows to be inserted into the table. This function can contain any valid expression in myql that produces a nonnegative integer value
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY HASH(store_id) PARTITIONS 4; CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY LINEAR HASH(YEAR(hired)) PARTITIONS 4;
6. key partition
Similar to hash partition, the difference is that the key partition only supports one or more columns, and the mysql server provides its own hash function. One or more columns must contain integer values
CREATE TABLE tk ( col1 INT NOT NULL, col2 CHAR(5), col3 DATE ) PARTITION BY LINEAR KEY (col1) PARTITIONS 3;
7. Sub partition
On the basis of partition, post partition storage is performed
CREATE TABLE `t_partition_by_subpart` ( `id` INT AUTO_INCREMENT, `sName` VARCHAR(10) NOT NULL, `sAge` INT(2) UNSIGNED ZEROFILL NOT NULL, `sAddr` VARCHAR(20) DEFAULT NULL, `sGrade` INT(2) NOT NULL, `sStuId` INT(8) DEFAULT NULL, `sSex` INT(1) UNSIGNED DEFAULT NULL, PRIMARY KEY (`id`, `sGrade`) ) ENGINE = INNODB PARTITION BY RANGE(id) SUBPARTITION BY HASH(sGrade) SUBPARTITIONS 2 ( PARTITION p0 VALUES LESS THAN(5), PARTITION p1 VALUES LESS THAN(10), PARTITION p2 VALUES LESS THAN(15) );
5, How to use partitioned tables
If you need to query records of a certain period of time from a very large table, and this table contains historical data of many years, and the data is sorted by time, how should you query the data?
Because of the huge amount of data, you can't scan the whole table every time you query. Considering the space and maintenance consumption of the index, you do not want to use the index. Even if you use the index, you will find that it will produce a lot of fragments and a lot of random IO. However, when the amount of data is too large, the index will not work. At this time, you can consider using partition to solve it
1. Scan the data in full without any index
Use a simple partition method to store tables without any indexes. Roughly locate the required data according to the partition rules. Limit the required data to a few partitions by using the where condition. This strategy is suitable for accessing a large amount of data in a normal way
2. Index data and separate hotspots
If the data has obvious hot spots, and other data is rarely accessed except this part of the data, then this part of the hot spot data can be placed in a separate partition, so that the data of this partition can be cached in memory. In this way, the query can access only a small partition table, use the index and effectively use the cache