We've already talked about the principles and algorithms of MySQL InnoDB indexes (see "Principle and algorithm of MySQL InnoDB index"). Here we look at how to manage and use B+tree indexes, along with some newer index-related features.
Management of B+Tree Index
The basic indexes we use in the InnoDB engine are B+tree indexes.
Create and delete indexes
An index can be created and deleted in two ways:
# Mode 1: alter table; either index or key can be used here.
# If all values must be non-duplicate, add unique.
alter table tbl_name add [unique] index|key index_name (index_col_name,...);
alter table tbl_name drop index|key index_name;

# Mode 2: create/drop index; only index can be used here.
create index index_name on tbl_name (index_col_name,...);
drop index index_name on tbl_name;
Modify Index
MySQL does not provide a command to modify an index. We usually drop the index and recreate one with the same name to achieve the effect of a modification.
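For example, a minimal sketch of "modifying" an index this way (the index name idx_b and the prefix length are purely illustrative):

# drop the old index and recreate it under the same name with a new definition
alter table tbl_name drop index idx_b;
alter table tbl_name add index idx_b (b(10));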
View Index
We can see which indexes a table has in three ways:
# Method 1: view the statement that created the table
mysql> show create table t;
| Table | Create Table |
| t     | CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(20) DEFAULT NULL,
  `b` varchar(20) DEFAULT NULL,
  `c` varchar(20) DEFAULT NULL,
  `d` varchar(20) DEFAULT NULL,
  `e` varchar(20) DEFAULT NULL,
  `f` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `a` (`a`),
  KEY `idx_b` (`b`),
  KEY `idex_cde` (`c`,`d`,`e`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
1 row in set (0.00 sec)
You can see that the table has four indexes: the clustered primary key index (id), a unique secondary index (a), a single-column secondary index (idx_b on b), and a composite secondary index (idex_cde on c, d, e).
# Method 2: view the description of the table
mysql> describe t;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| a     | varchar(20) | YES  | UNI | NULL    |                |
| b     | varchar(20) | YES  | MUL | NULL    |                |
| c     | varchar(20) | YES  | MUL | NULL    |                |
| d     | varchar(20) | YES  |     | NULL    |                |
| e     | varchar(20) | YES  |     | NULL    |                |
| f     | varchar(20) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.01 sec)
This view shows the properties of each field more clearly. Note the Key column, which describes the index on the column and has four possible values:
- PRI: the column is the primary key
- UNI: the column is the first column of a unique index (a unique index allows multiple NULLs; non-NULL values must be unique)
- MUL: the column is the first column of a non-unique index
- empty: the column is not the leading column of any index
Columns d and e are also part of the composite index, so why are they not marked? Because of the leftmost-prefix nature of B+tree indexes, only the first column of a composite index is labeled (apart from the primary key combination).
# Method 3: view detailed index information
mysql> show index from t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t     | 0          | PRIMARY  | 1            | id          | A         | 0           | NULL     | NULL   |      | BTREE      |         |               |
| t     | 0          | a        | 1            | a           | A         | 0           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idx_b    | 1            | b           | A         | 0           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 1            | c           | A         | 0           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 2            | d           | A         | 0           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 3            | e           | A         | 0           | NULL     | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.00 sec)
The meaning of each column here is:
- Table: the name of the table the index belongs to
- Non_unique: whether the index can contain duplicate values (0 means unique; note that a unique index still allows multiple NULLs)
- Key_name: the name of the index, used when dropping it
- Seq_in_index: the position of the column within the index
- Column_name: the name of the indexed column
- Collation: how the column is stored in the index, either A or NULL. A B+tree index is always A, i.e. sorted ascending; it is NULL for a Hash index built on the Heap storage engine.
- Cardinality: an estimate of the number of unique values in the index.
- Sub_part: whether only a prefix of the column is indexed; the number of indexed characters if so, or NULL if the whole column is indexed.
- Packed: how the key is packed (compressed).
- Null: whether the index column can contain NULL values.
- Index_type: the index type. The InnoDB engine only supports B+tree indexes, so this is always BTREE.
- Comment: index comment.
Joint Index
A joint (composite) index indexes multiple columns together, in the order in which the columns are specified.
Obviously, a query that filters on the indexed columns can use this index directly:
mysql> explain select * from t where c='10' and d='10';
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref         | rows | Extra                 |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
|  1 | SIMPLE      | t     | ref  | idex_cde      | idex_cde | 126     | const,const |    1 | Using index condition |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
1 row in set (0.00 sec)
The index is also used when we need to sort by its columns:
mysql> explain select c from t order by c desc limit 10;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | t     | index | NULL          | idex_cde | 189     | NULL |   10 | Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)
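Conversely, a condition that skips the leading column of the composite index cannot use it. A hedged illustration, with the expected plan noted in comments rather than shown as captured output:

# d alone skips the leading column c of idex_cde, so a full table scan (type ALL) is expected
mysql> explain select * from t where d='10';

# c alone matches the leftmost prefix, so idex_cde can be used
mysql> explain select * from t where c='10';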
Covering Index
InnoDB supports covering indexes, where a query can be answered entirely from a secondary index without reading the corresponding records from the clustered index. Since secondary indexes do not contain the entire row, they are much smaller than the clustered index, so each page holds more entries; this reduces the number of pages that must be swapped in, that is, the number of I/O operations, which improves efficiency.
Note, however, that if we select the entire row, a bookmark lookup into the clustered index is still required, so the query is no longer covered by the index:
# Selecting * still uses the secondary index for the condition, but the full rows must also be fetched from the clustered index
mysql> explain select * from t where c='10' and d='10';
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref         | rows | Extra                 |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
|  1 | SIMPLE      | t     | ref  | idex_cde      | idex_cde | 68      | const,const |    1 | Using index condition |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------+
1 row in set (0.00 sec)

# Using index indicates the query is answered entirely from the secondary index
mysql> explain select c,d from t where c='10' and d='10';
+----+-------------+-------+------+---------------+----------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | t     | ref  | idex_cde      | idex_cde | 68      | const,const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+--------------------------+
1 row in set (0.00 sec)
Index Hints
If you see Using where rather than the index being used, the optimizer chose a full table scan instead of the secondary index. Using an index scan would mean fetching each matching row through a bookmark lookup; the rows found that way are out of order, which turns the reads into discrete (random) ones. Sequential reads are much faster than discrete reads, so the table scan is chosen. This is not absolute, however: the secondary index is chosen when only a small amount of data is accessed, and the table is scanned when the amount exceeds roughly 20% of the table (an estimate).
# Let's change column a to a numeric type so we can select a range
mysql> alter table t modify a int(11);
Query OK, 230 rows affected (0.07 sec)
Records: 230  Duplicates: 0  Warnings: 0

mysql> explain select * from t where a>10;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t     | ALL  | a             | NULL | NULL    | NULL |  230 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from t where a<10;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 5       | NULL |    8 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
1 row in set (0.00 sec)
The database optimizer works effectively and correctly most of the time, but an experienced DBA can still force the use of a particular index. There are two cases where index hints are useful:
- The optimizer chooses the wrong index, causing the statement to execute slowly (this happens with very low probability).
- There are so many candidate indexes that selecting an execution plan takes longer than executing the SQL itself, for example when analyzing certain range queries. A hint uses the specified index directly, skipping the cost analysis.
There are two levels of index hints:
# use index merely tells the optimizer that it may choose this index; the optimizer still decides for itself and may not use it
mysql> explain select * from t use index(a) order by a;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL |  230 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

# force index forces the index, so the final plan is guaranteed to match the user's choice
mysql> explain select * from t force index(a) order by a;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | t     | index | NULL          | a    | 5       | NULL |  230 | NULL  |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
Index parameters
All the parameters of the index are explained above, but there are a few that require special attention.
Column prefix index: Sub_part
We usually index the entire column of data, but if the content is long, we can also index only the beginning of the column:
mysql> drop index idx_b on t;
mysql> create index idx_b on t (b(10));
mysql> show index from t where Key_name='idx_b';
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t     | 1          | idx_b    | 1            | b           | A         | 0           | 10       | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
You can see that we only index the first 10 characters of column b, and the Sub_part field becomes 10.
Unique value estimate: Cardinality
Cardinality is critical and is often the basis for deciding whether an index is worth using. It is an estimate of the number of unique values in the index, in other words how dispersed (selective) the column's values are.
Into the table from before we randomly insert some data into column b, with values from 100 to 1000; the remaining columns are NULL and the id column auto-increments:
mysql> select count(b) from t;
+----------+
| count(b) |
+----------+
|      110 |
+----------+
1 row in set (0.00 sec)

mysql> select distinct(b) from t;
+------+
| b    |
+------+
| 100  |
| 200  |
| 300  |
| 400  |
| 500  |
| 600  |
| 700  |
| 800  |
| 900  |
| 1000 |
+------+
10 rows in set (0.00 sec)

mysql> show index from t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t     | 0          | PRIMARY  | 1            | id          | A         | 110         | NULL     | NULL   |      | BTREE      |         |               |
| t     | 0          | a        | 1            | a           | A         | 2           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 1            | c           | A         | 2           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 2            | d           | A         | 2           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idex_cde | 3            | e           | A         | 2           | NULL     | NULL   | YES  | BTREE      |         |               |
| t     | 1          | idx_b    | 1            | b           | A         | 20          | 10       | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.00 sec)
You can see that the Cardinality of the id column is 110, that of column b is 20, and the rest are 2. In theory they should be 10 and 1; I haven't been able to find out why the values doubled. I'll update this if I figure it out later.
The calculation method for this value is:
- The InnoDB engine gets the number of leaf nodes in the B+tree index, which is A.
- Eight leaf pages are selected at random and the number of distinct records on each page is counted as P1, P2, ..., P8.
- Cardinality = (P1+P2+...+P8) / 8 * A.
Since it is a sampling method, the values may be different each time.
In addition, as mentioned in the previous section, because tables are updated frequently, this value is refreshed according to a strategy rather than on every change:
- 1/16 of the data in the table has changed.
- stat_modified_counter > 2,000,000,000 (2 billion insert/update/delete operations on the table's rows)
When we want to refresh the value manually, we can use any of the following:
analyze table;        # recommended
show table status;
show index;
# or access the tables/statistics views under information_schema
Index statistics can sometimes be problematic (for example showing NULL values) and can then be refreshed manually. Because collecting statistics affects performance at the moment of collection, manually refreshing key tables in batches during off-peak hours helps the indexes serve you better: analyze table;
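For instance, a minimal sketch of refreshing statistics for a batch of tables (table names other than t are illustrative):

# re-collect index statistics; multiple tables can be analyzed in one statement
mysql> analyze table t, orders, users;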
Related parameters
# Number of pages sampled when collecting statistics; default is 8
innodb_stats_sample_pages

# Determines how NULL values are handled; the default nulls_equal treats all NULLs as equal records
innodb_stats_method = nulls_equal|nulls_unequal|nulls_ignored
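A sketch of adjusting these at runtime (to my knowledge both are dynamic global variables; the values below are purely illustrative):

# sample more pages per index for a more stable Cardinality estimate
mysql> set global innodb_stats_sample_pages = 16;
# ignore NULLs when counting distinct values
mysql> set global innodb_stats_method = 'nulls_ignored';
# re-collect statistics for table t so the new settings take effect
mysql> analyze table t;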
Optimize the use of indexes
As we mentioned earlier, since secondary indexes do not store the full row data, a lookup on the clustered index (primary key index) is needed unless the query only touches indexed columns. At the same time, sequential reads along the clustered index (whose pages form a doubly linked list) are more efficient than the scattered reads caused by secondary-index bookmark lookups, so if the storage engine decides the index will not effectively improve efficiency, it uses a full table scan directly.
The nature of the B+tree means an index only makes sense when it is used to access a small portion of the table. For example, a gender column has two values; a query through such an index typically hits about 50% of the rows, making it a low-selectivity column, and the index becomes essentially meaningless. A name column, by contrast, has few duplicates and high selectivity, so an index on it is appropriate.
We can use the Cardinality field to judge whether an index is worthwhile by taking Cardinality / rows_in_table. If the value is close to 1, the column is a good candidate for an index; if it is close to 0, you should consider deleting the unnecessary index.
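As a rough sketch, this ratio can be computed from the data dictionary (the schema name 'test' is an assumption; the statistics and tables views under information_schema are standard, and table_rows is itself only an estimate):

mysql> select s.index_name,
              s.column_name,
              s.cardinality / t.table_rows as selectivity
       from information_schema.statistics s
       join information_schema.tables t
         on s.table_schema = t.table_schema
        and s.table_name   = t.table_name
       where s.table_schema = 'test'
         and s.table_name   = 't';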
Based on the above example, test the following:
mysql> describe select * from t where id=10;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | t     | const | PRIMARY       | PRIMARY | 4       | const |    1 | NULL  |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> describe select * from t where b=10;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t     | ALL  | idx_b         | NULL | NULL    | NULL |  230 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
Note the possible_keys field, which lists the indexes that could be used, i.e. the indexes that exist on the queried columns. The key field shows the index actually used. In the second query, type ALL together with Using where indicates a full table scan.
New Index Optimization Features
New features have been added to versions 5.5 and 5.6 to optimize SQL execution speed from within the engine.
First of all, there are three types of database operations:
- DDL (Data Definition Language): data definition statements, for changes to databases, tables, columns and indexes; commands such as create/drop/alter.
- DML (Data Manipulation Language): data manipulation statements, for reading and writing data; commands such as insert/update/delete/select.
- DCL (Data Control Language): data control statements, for things like setting permissions; commands such as grant/revoke.
Fast Index Creation
Before version 5.5, a DDL operation such as adding or deleting an index proceeded as follows:
- Create a temporary table with the new structure defined by the alter table statement
- Import the data from the original table into the temporary table
- Delete the original table
- Finally, rename the temporary table to the original table name
You can see that if the table is very large this process can be time consuming, and the database service is unavailable because of locking. Since 5.5, Fast Index Creation (FIC) is supported: for secondary index creation, an S (shared) lock is taken on the table and the table does not need to be rebuilt, so speed and availability are much improved.
Note, however, that during this time only reads are allowed and writes are blocked; the method only applies to secondary indexes, and changing the primary key still requires rebuilding the table. Deleting an index only requires InnoDB to remove the index definition from its internal data dictionary view and mark the index space as reusable.
Online Schema change
OSC was originally implemented by Facebook as a PHP script that lets other transactions keep reading and writing the table while DDL is in progress, improving database concurrency during DDL operations. Its steps are:
- init: In the initialization phase, the table is validated and cannot be used if it does not have a primary key or has triggers or foreign keys.
- createCopyTable: Create a new table with the same structure as the original table
- alterCopyTable: Operate on a new table, such as adding an index or column
- createDeltasTable: Creates a deltas temporary table, after which all DML operations are recorded
- createTriggers: Create triggers on the original table, records from all DML operations are written to the deltas table
- startSnapshotXact: Start Transaction for OSC Operations
- selectTableIntoOutfile: Export the original table's data. To reduce the time the original table is locked, the data is written out in shards to several external files, and the file data is later imported into the new table.
- dropNCIndexs: Delete all secondary indexes in the new table before importing into it.
- loadCopyable: Import the exported slice file into a new table
- ReplyChanges: Apply the DML operations recorded in the deltas table during the OSC run to the new table.
- recreateNCIndexes: Recreate the secondary indexes.
- replayChanges: Replay the DML log entries that accumulated while the secondary indexes were being recreated.
- swapTables: Swap the names of the original table and the new table. Both tables must be locked, but a rename is fast, so the blocking is brief.
This functionality has some limitations, and because sql_log_bin=0 is set during execution, master-slave (replication) inconsistency can occur.
Online DDL (Online Data Definition)
Of the first two approaches, FIC blocks DML operations and OSC has significant limitations, so Online DDL is supported from 5.6 onward. It allows DML operations while secondary indexes are being created, which greatly improves availability. It supports several kinds of change operation:
- Adding or deleting secondary indexes
- Changing the auto-increment value
- Adding or removing foreign key constraints
- Renaming a column
It supports three algorithms and four locks:
alter table t add key (f), ALGORITHM=DEFAULT|INPLACE|COPY, LOCK=DEFAULT|NONE|SHARED|EXCLUSIVE;
Algorithms:
- COPY: Old version method, creating temporary tables
- INPLACE: No need to create temporary tables
- DEFAULT: The old_alter_table parameter determines whether the old (COPY) method is used; it defaults to OFF, so the in-place method is used when possible
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 5.6.39    |
+-----------+
1 row in set (0.00 sec)

mysql> show variables like 'old_alter_table';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| old_alter_table | OFF   |
+-----------------+-------+
1 row in set (0.00 sec)
Four locks:
- NONE: Maximum concurrency without locking.
- SHARED: Like FIC, concurrent reads are allowed but writes are blocked.
- EXCLUSIVE: Add an X lock (exclusive lock) to block all threads.
- DEFAULT: Checks which mode is available in turn, striving for maximum concurrency: first NONE, then SHARED, then EXCLUSIVE.
The principle of Online DDL is to write the DML operations that happen during the DDL into a log buffer, and then apply them to the table in one batch after the DDL completes. The size of this buffer is controlled by the innodb_online_alter_log_max_size parameter.
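A sketch of checking and, if necessary, raising this buffer before a long-running Online DDL on a busy table (the 256 MB value is just an example):

mysql> show variables like 'innodb_online_alter_log_max_size';
# raise the limit so the DML log of a busy table does not overflow during the DDL
mysql> set global innodb_online_alter_log_max_size = 268435456;  # 256 MB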
Multi-Range Read optimization
MRR optimization is designed to reduce random disk access by converting it into sequential access, which yields significant performance improvements. The benefit mainly shows up as fewer page evictions in the buffer pool and batched processing of key-value lookups. It applies to range queries and JOIN queries on the InnoDB and MyISAM storage engines. The principle is:
- The key values matched by the query in the secondary index are placed in a cache.
- The cached key values are sorted by RowID.
- The actual data file is accessed in RowID order.
This optimization mainly shows up in range queries that use an index but fetch the entire row:
mysql> explain select * from t where a<50 and a > 30;
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                            |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+
|  1 | SIMPLE      | t     | range | a             | a    | 5       | NULL |   18 | Using index condition; Using MRR |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------------------------------+
1 row in set (0.00 sec)
This feature can improve efficiency dozens of times over, but sometimes it is not enabled; it is controlled by flags in optimizer_switch. MRR optimization is allowed when mrr=on. When mrr_cost_based=off it is always used, and when mrr_cost_based=on the choice is based on the cost computed by the optimizer.
mysql> select @@optimizer_switch;
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,
engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,
block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,
firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
1 row in set (0.00 sec)

mysql> set @@optimizer_switch='mrr_cost_based=off';
Query OK, 0 rows affected (0.00 sec)
Index Condition Pushdown
ICP optimization improves performance by pushing part of the WHERE filtering down into the storage engine layer, so conditions are applied while the index is being scanned, which reduces the number of full rows that have to be read.
# Turn cost-based MRR back on first, so MRR is not forced for this query
mysql> set @@optimizer_switch='mrr_cost_based=on';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from t where c='50' and d > 30;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
|  1 | SIMPLE      | t     | range | idex_cde      | idex_cde | 68      | NULL |    1 | Using index condition |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
1 row in set (0.00 sec)
As you can see, ICP optimization is used here. Before version 5.6, the engine would first filter on c='50', read all matching full rows, and then filter d > 30 at the server layer. With the new feature, it detects that c and d share a composite index, so the d condition is filtered in advance during the index scan and the matching rows are read in one pass. This feature improves performance by about 23% and can be used in conjunction with MRR.
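For comparison, ICP can be switched off through the index_condition_pushdown flag (visible in the optimizer_switch output above) and the same EXPLAIN re-run; a quick sketch:

mysql> set @@optimizer_switch='index_condition_pushdown=off';
mysql> explain select * from t where c='50' and d > 30;
# without ICP, the Extra column is expected to show "Using where" instead of "Using index condition"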