background
Whether you're a technology guru or a newcomer to Xiaobai, you will step into the pit of Mysql database without index from time to time. A common phenomenon is that the index is clearly added to the field, but it does not take effect.
A few days ago, I encountered a slightly special scenario. The same SQL statement takes effect under some parameters and does not take effect under some parameters. Why?
In addition, you should understand and learn about the general situation of Mysql index failure, whether in an interview or daily.
In order to facilitate learning and memory, this document summarizes 15 common cases of not going to the index and shows them with examples to help you better avoid stepping on the pit. It is recommended to collect it in case of need.
Database and index preparation
Create table structure
In order to verify the usage of the index item by item, we first prepare a table t_user:
CREATE TABLE `t_user` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID', `id_no` varchar(18) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL COMMENT 'ID number', `username` varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL COMMENT 'user name', `age` int(11) DEFAULT NULL COMMENT 'Age', `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time', PRIMARY KEY (`id`), KEY `union_idx` (`id_no`,`username`,`age`), KEY `create_time_idx` (`create_time`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
There are three indexes in the above table structure:
- id: database primary key;
- union_idx: ID_ Joint index composed of no, username and age;
- create_time_idx: created by_ Ordinary index composed of time;
Initialization data
The initialization data is divided into two parts: basic data and batch import data.
The basic data insert s four pieces of data, of which the creation time of the fourth piece of data is the future time, which is used for the verification of subsequent special scenarios:
INSERT INTO `t_user` (`id`, `id_no`, `username`, `age`, `create_time`) VALUES (null, '1001', 'Tom1', 11, '2022-02-27 09:04:23'); INSERT INTO `t_user` (`id`, `id_no`, `username`, `age`, `create_time`) VALUES (null, '1002', 'Tom2', 12, '2022-02-26 09:04:23'); INSERT INTO `t_user` (`id`, `id_no`, `username`, `age`, `create_time`) VALUES (null, '1003', 'Tom3', 13, '2022-02-25 09:04:23'); INSERT INTO `t_user` (`id`, `id_no`, `username`, `age`, `create_time`) VALUES (null, '1004', 'Tom4', 14, '2023-02-25 09:04:23');
In addition to the basic data, there is also a stored procedure and its called SQL, which is convenient for batch inserting data and used to verify scenarios with more data:
-- Delete history stored procedure DROP PROCEDURE IF EXISTS `insert_t_user` -- Create stored procedure delimiter $ CREATE PROCEDURE insert_t_user(IN limit_num int) BEGIN DECLARE i INT DEFAULT 10; DECLARE id_no varchar(18) ; DECLARE username varchar(32) ; DECLARE age TINYINT DEFAULT 1; WHILE i < limit_num DO SET id_no = CONCAT("NO", i); SET username = CONCAT("Tom",i); SET age = FLOOR(10 + RAND()*2); INSERT INTO `t_user` VALUES (NULL, id_no, username, age, NOW()); SET i = i + 1; END WHILE; END $ -- Call stored procedure call insert_t_user(100);
For the creation and storage of stored procedures, they can not be executed temporarily, and they can be executed when they are used.
Database version and execution plan
View the current version of the database:
select version(); 8.0.18
The above is the database version I tested: 8.0.18. Of course, all the examples below can be verified in other versions.
To view the SQL statement execution plan, we generally use the explain keyword to judge the index usage through the execution results.
Execution example:
explain select * from t_user where id = 1;
Execution result:
You can see that the above SQL statement uses the PRIMARY key index (PRIMARY), key_len is 4;
Where key_len means the number of bytes used by the index. According to this value, you can judge the use of the index. Especially when combining indexes, it is very important to judge how many parts of the index are used.
Prepare for the above data and knowledge, and then start to explain the specific examples of index failure.
1. The union index does not meet the leftmost matching principle
The union index follows the leftmost matching principle. As the name suggests, in the union index, the leftmost field is matched first. Therefore, when creating a federated index, the most frequently used field in the where clause is placed on the far left of the composite index.
When querying, if you want the query criteria to be indexed, you need to meet the following requirements: the leftmost field should appear in the query criteria.
In the instance, union_ Composition of IDX joint index:
KEY `union_idx` (`id_no`,`username`,`age`)
The leftmost field is id_no, generally, as long as the ID is guaranteed_ If no appears in the query criteria, the joint index will be taken.
Example 1:
explain select * from t_user where id_no = '1002';
explain result:
It can be seen from the explain execution result that the above SQL statement has gone to union_idx this index.
Let's popularize it here_ Calculation of len:
- id_ The no type is varchar(18), and the character set is utf8mb4_bin, that is, use 4 bytes to represent a complete UTF-8. At this point, key_len = 18* 4 = 72;
- Since the field type varchar is a variable length data type, two additional bytes need to be added. At this point, key_len = 72 + 2 = 74;
- Since the field is NULL (default NULL), another byte needs to be added. At this point, key_len = 74 + 1 = 75;
The above demonstrates key_len is the calculation process of one case. It will not be deduced one by one in the future. You can know the basic composition and principle. You can check more cases by yourself.
Example 2:
explain select * from t_user where id_no = '1002' and username = 'Tom2';
explain result:
Obviously, I still left the union_idx index, according to the above key_len's analysis, bold guess, when using the index, not only the ID is used_ No column, and the username column is also used.
Example 3:
explain select * from t_user where id_no = '1002' and age = 12;
explain result:
Let's go, union_idx index, but like the example, only ID is used_ No column.
Of course, if there are three columns in the query criteria, there will be no more examples. The above are all positive examples of indexing, that is, examples that meet the leftmost matching principle. Let's take a look at the reverse examples that do not meet the principle.
Reverse example:
explain select * from t_user where username = 'Tom2' and age = 12;
explain result:
At this point, you can see that no index is taken, that is, the index is invalid.
Similarly, as long as there is no leftmost condition combination, the index is invalid:
explain select * from t_user where age = 12; explain select * from t_user where username = 'Tom2';
Then, the first scenario of index failure is: in the scenario of joint index, the query conditions do not meet the leftmost matching principle.
2. select is used*
There is a [mandatory] specification in the ORM mapping chapter of Alibaba Development Manual:
[mandatory] in table query, do not use * as the field list of query, and specify which fields are required. Note: 1) increase the parsing cost of query analyzer. 2) The increase / decrease fields are easily inconsistent with the resultMap configuration. 3) Useless fields increase network consumption, especially fields of type text.
Although the index problem is not mentioned in the specification manual, the side benefit of prohibiting the use of select * statements may be that the index can be overwritten in some cases.
For example, in the above joint index, if the query condition is age or username, when select * is used, the index will not be used.
But if you want to find the ID based on username_ No, username and age results (all index fields), specifying the query result field, can override the index:
explain select id_no, username, age from t_user where username = 'Tom2'; explain select id_no, username, age from t_user where age = 12;
explain result:
No matter whether the query condition is username or age, the index is used. According to the key_len can see that all columns of the index are used.
The second index failure scenario: under the joint index, try to use explicit query columns to tend to overwrite the index;
This case without index is an optimization item. If the business scenario is met, it will come in and urge the SQL statement to index. As for the specification in Alibaba's development manual, it's just that the two collide. The specification itself is not determined by this index rule.
3. The index column participates in the operation
Take a direct look at the example:
explain select * from t_user where id + 1 = 2 ;
explain result:
It can be seen that even if the id column has an index, the index cannot be accessed normally due to calculation processing.
In view of this situation, it is not only the problem of index, but also increases the computational burden of the database. Take the above SQL statement as an example. The database needs to scan all the id field values in the whole table, then calculate them, and then compare them with the parameter values. If you go through the above steps every time, the performance loss can be imagined.
The recommended method is to calculate the expected value in memory first, or calculate the parameter value on the right side of the SQL statement condition.
The optimization for the above example is as follows:
-- Memory calculation to know the information to be queried id Is 1 explain select * from t_user where id = 1 ; -- Parameter side calculation explain select * from t_user where id = 2 - 1 ;
The third kind of index failure: the index column participates in the operation, which will lead to full table scanning and index failure.
4 index column parameter uses function
Example:
explain select * from t_user where SUBSTR(id_no,1,3) = '100';
explain result:
In the above example, the index column uses the function (SUBSTR, string interception), resulting in the invalidation of the index.
At this time, the reason for index failure is the same as that in the third case, because the database needs to scan the whole table first, and then intercept and calculate the data, resulting in index failure. At the same time, it is accompanied by performance problems.
In the example, only SUBSTR functions are listed. Similar situations will occur for similar functions such as CONCAT. The solution can refer to the third scenario. You can consider reducing the database through memory calculation or other methods to process the content.
The fourth kind of index failure: the index column participates in function processing, which will lead to full table scanning and index failure.
5 wrong use of Like
Example:
explain select * from t_user where id_no like '%00%';
explain result:
The use of like is very frequent, but improper use often leads to no index. Common ways to use like are:
- Method 1: like '%abc';
- Method 2: like 'abc%';
- Method 3: like '%abc%';
In mode 1 and mode 3, the index cannot be accessed because the placeholder appears in the header. In this case, the reason for not indexing is easy to understand. The index itself is equivalent to a directory, sorted one by one from left to right. The placeholder is used on the left side of the condition, which makes it impossible to match according to the normal directory and leads to index failure.
The fifth index invalidation: in the case of fuzzy query (like statement), the placeholder of fuzzy matching is located at the head of the condition.
6 type implicit conversion
Example:
explain select * from t_user where id_no = 1002;
explain result:
id_ The no field type is varchar, but the int type is used in the SQL statement, resulting in a full table scan.
The reason for index invalidation is that varchar and int are two different types.
The solution is to add single or double quotation marks to parameter 1002.
The sixth index invalidation: the parameter type does not match the field type, resulting in implicit conversion of the type and index invalidation.
There is another special case in this case. If the field type is int and the query condition adds single quotation marks or double quotation marks, Mysql will convert the parameter to int, although single quotation marks or double quotation marks are used:
explain select * from t_user where id = '2';
The above statement will still go through the index.
7. Use OR operation
OR is the most commonly used operation keyword, but improper use will also lead to index invalidation.
Example:
explain select * from t_user where id = 2 or username = 'Tom2';
explain result:
Is it surprising to see the above execution results? It is clear that the id field has an index. Due to the use of the or keyword, the index is invalid.
In fact, from another perspective, if you use the username field alone as a condition, it is obviously a full table scan. Since the full table scan has been carried out, it is a waste to index the previous id condition again. Therefore, when using the or keyword, remember to add an index to both conditions, otherwise the index will become invalid.
However, if both sides use ">" and "<", the index will also be invalidated:
explain select * from t_user where id > 1 or id < 80;
explain result:
The seventh kind of index invalidation: if the or keyword is used in the query condition, and an index is not created for one field, the index of the whole query statement will be invalid; Or when both sides are ">" and "<" range query, the index is invalid.
8 compare the two columns
If both column data have indexes, but the two column data are compared in the query criteria, the index will become invalid.
Here is an inappropriate example, for example, two columns such as age less than id (the real scene may be the comparison of two columns of data with the same dimension, which accommodates the existing table structure):
explain select * from t_user where id > age;
explain result:
Although the id has an index and the age can also create an index, the index will fail when the two columns are compared.
The eighth index failure condition: compare two columns of data. Even if the indexes are created in both columns, the indexes will fail.
9 is not equal to comparison
Example:
explain select * from t_user where id_no <> '1002';
explain result:
When the query condition is string, use "< >" or "= "As a conditional query, the index may not be used, but not all of them.
explain select * from t_user where create_time != '2022-02-27 09:56:42';
In the above SQL, since "2022-02-27 09:56:42" is generated by the stored procedure in the same second, a large amount of data is generated at this time. After execution, it will be found that when the proportion of query result set is small, the index will be taken, and when the proportion is large, the index will not be taken. This is related to the proportion of result set and population.
It should be noted that if the above statement performs unequal operations with id, the index will be executed normally.
explain select * from t_user where id != 2;
explain result:
The ninth kind of index invalidation: when using different query conditions for comparison, you need to be careful. Ordinary indexes will fail when the query result set accounts for a large proportion.
10 is not null
Example:
explain select * from t_user where id_no is not null;
explain result:
The tenth index failure condition: when the query condition uses is null, the index is normally taken, and when the query condition uses is not null, the index is not taken.
11 not in and not exists
In, exists, not in, not exists, between and, and so on, are widely used in daily range queries.
explain select * from t_user where id in (2,3); explain select * from t_user where id_no in ('1001','1002'); explain select * from t_user u1 where exists (select 1 from t_user u2 where u2.id = 2 and u2.id = u1.id); explain select * from t_user where id_no between '1002' and '1003';
When the above four statements are executed, the index will be executed normally, and the specific explain results will not be displayed. It mainly depends on the index:
explain select * from t_user where id_no not in('1002' , '1003');
explain result:
When not in is used, the index is not used? Try replacing the condition column with the primary key:
explain select * from t_user where id not in (2,3);
explain result:
If it is a primary key, the index is used normally.
The eleventh kind of index invalidation: when not in is used for query criteria, if it is a primary key, the index will be used; if it is a common index, the index will be invalid.
Let's look at not exists:
explain select * from t_user u1 where not exists (select 1 from t_user u2 where u2.id = 2 and u2.id = u1.id);
explain result:
When the query condition uses not exists, the index is not used.
The twelfth index invalidation condition: when the query condition uses not exists, the index is invalid.
12 order by causes index invalidation
Example:
explain select * from t_user order by id_no ;
explain result:
In fact, the index failure in this case is easy to understand. After all, the whole table data needs to be sorted.
So, do you want to add or delete the limit keyword and leave the index?
explain select * from t_user order by id_no limit 10;
explain result:
As a result, the index still doesn't go. It is said on the network that if the order by condition meets the leftmost match, the index will be taken normally, which does not appear in the current version 8.0.18. Therefore, pay special attention when using based on order by and limit. Whether to index or not involves not only the database version, but also how the Mysql optimizer handles it.
There is another special case here, that is, when order by is used for the primary key, the index can be used normally.
explain select * from t_user order by id desc;
explain result:
It can be seen that for the primary key, or order by, the index can be used normally.
In addition, the author tests the following SQL statements:
explain select id from t_user order by age; explain select id , username from t_user order by age; explain select id_no from t_user order by id_no;
The above three SQL statements are indexed, that is to say, the scenario of overwriting the index can also be indexed normally.
The id and id will now be_ No combined to order by:
explain select * from t_user order by id,id_no desc; explain select * from t_user order by id,id_no desc limit 10; explain select * from t_user order by id_no desc,username desc;
explain result:
Neither of the above two SQL statements is indexed.
The thirteenth kind of index invalidation: when the query conditions involve order by, limit and other conditions, whether to go to the index is complex and related to the Mysql version. Generally, ordinary indexes will not go to the index if limit is not used. When order by multiple index fields, the index may not be taken. In other cases, it is recommended to perform export verification when using.
13 index invalidation due to different parameters
At this time, if you have not executed the stored procedure you created at the beginning, it is recommended that you execute the stored procedure first, and then execute the following SQL:
explain select * from t_user where create_time > '2023-02-24 09:04:23';
Among them, time is the time in the future to ensure that data can be found.
explain result:
As you can see, the index is normal.
Then, we change the parameters of the query criteria to another date:
explain select * from t_user where create_time > '2022-02-27 09:04:23';
explain result:
At this point, a full table scan is performed. This is also the strange phenomenon mentioned at the beginning.
Why does the same query statement only have different parameter values, but there is a case of index walking and no index walking?
The answer is simple: the above index fails because the DBMS finds that the full table scan is more efficient than the index walking, so it gives up the index walking.
In other words, when Mysql finds that the number of row records scanned through the index exceeds 10% - 30% of the whole table, the optimizer may give up the index and automatically turn into a full table scan. In some scenarios, even if the SQL statement is forced to index, it will also fail.
For similar problems, the above situations often occur when querying the range (such as \ >, <, > =, < =, in and other conditions), and the threshold mentioned above will vary according to different scenarios.
The fourteenth index invalidation: when the query condition is a range query such as greater than or equal to, in, etc., the optimizer may give up the index and scan the whole table according to the proportion of the query result in the whole table data.
14 others
Of course, there are other rules on whether to follow the index, which also has something to do with whether the index type is B-tree index or bitmap index, so I won't expand it in detail.
Other things to be said here can be summarized as the fifteenth case of index failure: other optimization strategies of Mysql optimizer. For example, if the optimizer thinks that in some cases, full table scanning is faster than index walking, it will give up the index.
In view of this situation, it is generally unnecessary to pay too much attention to it. When a problem is found, it can be checked at a fixed point.
Summary
This article summarizes 15 common scenarios of index failure. Due to different versions of Mysql, the index failure strategies are also different. Most index failures are clear, and a small number of index failures will vary depending on the version of Mysql. Therefore, it is recommended to collect this article. When comparing in the process of practice, if you can't accurately grasp it, you can directly execute explain for verification.
About the blogger: the author of the technical book "inside of SpringBoot technology", loves to study technology and write technical dry goods articles.
The official account: "new horizon of procedures", the official account of bloggers, welcome the attention.
Technical exchange: please contact blogger wechat: zhuan2quan