MySQL Index Optimization Practice I

CREATE TABLE `employees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(24) NOT NULL DEFAULT '' COMMENT 'full name',
  `age` int(11) NOT NULL DEFAULT '0' COMMENT 'Age',
  `position` varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
  `hire_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Entry time',
  PRIMARY KEY (`id`),
  KEY `idx_name_age_position` (`name`,`age`,`position`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Employee record form';

INSERT INTO employees(name,age,position,hire_time) VALUES('LiLei',22,'manager',NOW());
INSERT INTO employees(name,age,position,hire_time) VALUES('HanMeimei', 23,'dev',NOW());
INSERT INTO employees(name,age,position,hire_time) VALUES('Lucy',23,'dev',NOW());

-- Insert some sample data
drop procedure if exists insert_emp; 
delimiter ;;
create procedure insert_emp()        
begin
  declare i int;                    
  set i=1;                          
  while(i<=100000)do                 
    insert into employees(name,age,position) values(CONCAT('zhuge',i),i,'dev');  
    set i=i+1;                       
  end while;
end;;
delimiter ;
call insert_emp();

Consider a comprehensive example whose behavior is not obvious at first:

1. A range condition on the first column of a composite (joint) index may not use the index

mysql> EXPLAIN SELECT * FROM employees WHERE name > 'LiLei' AND age = 22 AND position ='manager';
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+-------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys         | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_name_age_position | NULL | NULL    | NULL | 97324 |     0.50 | Using where |
+----+-------------+-----------+------------+------+-----------------------+------+---------+------+-------+----------+-------------+

Conclusion: when the first column of the composite index is used in a range search, MySQL may skip the index. The optimizer probably estimates that a range on the first column yields a large result set, so the lookups back to the table would be inefficient and a full table scan is cheaper.

2. Forcing the index

mysql> EXPLAIN SELECT * FROM employees force index(idx_name_age_position) WHERE name > 'LiLei' AND age = 22 AND position ='manager';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 48662 |     1.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+-----------------------+
1 row in set, 1 warning (0.10 sec)

Conclusion: although forcing the index makes the range search on the first column of the composite index use the index, and the estimated number of scanned rows looks somewhat lower, the query is not necessarily faster than a full table scan, because the lookups back to the table are inefficient. The two timings below illustrate this:

-- Execution time 0.333s
SELECT * FROM employees WHERE name > 'LiLei';
-- Execution time 0.444s
SELECT * FROM employees force index(idx_name_age_position) WHERE name > 'LiLei';

3. Covering index optimization

mysql> EXPLAIN SELECT name,age,position FROM employees WHERE name > 'LiLei' AND age = 22 AND position ='manager';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 74      | NULL | 48662 |     1.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

4. in and or usually use the index when the table holds a lot of data; when the table has few rows, the optimizer chooses a full table scan

mysql> EXPLAIN SELECT * FROM employees WHERE name in ('LiLei','HanMeimei','Lucy') AND age = 22 AND position ='manager';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 140     | NULL |    3 |   100.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM employees3 WHERE name in ('LiLei','HanMeimei','Lucy') AND age = 22 AND position ='manager';
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | employees3 | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

5. like 'KK%' usually uses the index

mysql> EXPLAIN SELECT * FROM employees WHERE name like 'LiLei%' AND age = 22 AND position ='manager';
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_name_age_position | idx_name_age_position | 140     | NULL |    1 |     5.00 | Using index condition |
+----+-------------+-----------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
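The key_len values in the plans above (74 and 140) can be checked by hand. The sketch below assumes the usual accounting for this table: utf8 takes up to 3 bytes per character, a varchar column adds 2 length bytes, an int takes 4 bytes, and NOT NULL columns need no extra null-flag byte.

```sql
-- key_len arithmetic for idx_name_age_position (name varchar(24), age int, position varchar(20))
SELECT
  24 * 3 + 2                         AS name_bytes,        -- 74: only name is used
  4                                  AS age_bytes,         -- int
  20 * 3 + 2                         AS position_bytes,    -- 62
  (24 * 3 + 2) + 4 + (20 * 3 + 2)    AS all_three_columns; -- 140: name, age and position are used
```

A key_len of 74 therefore means only the name part of the index is used for the lookup, while 140 means all three columns are used.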

A supplementary concept: Index Condition Pushdown (ICP). The like 'KK%' query above actually uses index condition pushdown.

What is index condition pushdown?

Take the composite secondary index (name,age,position) and the query SELECT * FROM employees WHERE name like 'LiLei%' AND age = 22 AND position ='manager'. Under the leftmost prefix principle, normally only the name column of the index can be used for the lookup: among the entries matched by the name prefix, age and position are no longer in sorted order, so the index cannot be used to narrow them down.

In versions before MySQL 5.6, this query can only locate the index entries whose name begins with 'LiLei', then take the primary key of each entry back to the table, read the corresponding row from the primary key index, and only then compare the values of age and position.

MySQL 5.6 introduced index condition pushdown. During the index traversal, the server first evaluates the conditions on all columns contained in the index, filters out the entries that do not qualify, and only then goes back to the table, which can effectively reduce the number of table lookups. With ICP, after the query above matches the entries whose name begins with 'LiLei', it also filters on age and position inside the index, and goes back to the table only with the primary key ids of the surviving entries to read the full rows.

Index condition pushdown reduces the number of table lookups. For InnoDB tables it applies only to secondary indexes: the leaf nodes of the InnoDB primary key (clustered) index already store the full row data, so pushing conditions down there would not save any full-row reads.
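One way to observe the effect is to toggle ICP for the current session and compare the plans. This is a minimal sketch using the `index_condition_pushdown` flag of `optimizer_switch`. With the flag off, the Extra column would typically show Using where instead of Using index condition, though the exact plan depends on your data and MySQL version.

```sql
-- Turn ICP off for this session only and look at the plan.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM employees
WHERE name LIKE 'LiLei%' AND age = 22 AND position = 'manager';

-- Turn ICP back on (the default since MySQL 5.6) and compare.
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM employees
WHERE name LIKE 'LiLei%' AND age = 22 AND position = 'manager';
```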

Why does MySQL not apply index condition pushdown to the range lookup?

Presumably MySQL estimates that the result set of a range lookup is too large, whereas like 'KK%' in most cases leaves a relatively small result set, so MySQL chooses index condition pushdown for like 'KK%'. This is not absolute: sometimes like 'KK%' is not pushed down either.

Common SQL in-depth optimization

Order by and Group by optimization

Case 1:

analysis:

By the leftmost prefix rule, the middle column cannot be skipped, so the query uses only the name part of the index for the lookup, as key_len=74 shows. The age column of the index is still used for sorting, because Using filesort does not appear in the Extra field.
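The statement for this case is not shown in the text. A query consistent with the analysis (an assumption, not necessarily the original statement) filters on name and position and sorts by age:

```sql
-- Assumed query: equality on name, position skipped for lookup, sort on age.
EXPLAIN SELECT * FROM employees
WHERE name = 'LiLei' AND position = 'dev'
ORDER BY age;
```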

Case 2:

analysis:

The explain result shows key_len=74, so the query uses only the name part of the index. Because the sort is on position and skips age, Using filesort appears.
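The statement for this case is not shown either. An assumed query that matches the analysis sorts by position directly after filtering on name:

```sql
-- Assumed query: the sort skips age, so the index order cannot be used.
EXPLAIN SELECT * FROM employees
WHERE name = 'LiLei'
ORDER BY position;
```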

Case 3:

analysis:

The lookup uses only the name column of the index; age and position are used for sorting, so Using filesort does not appear.
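An assumed query that fits this analysis sorts by age and position in index order:

```sql
-- Assumed query: the sort columns follow the index order (name, age, position).
EXPLAIN SELECT * FROM employees
WHERE name = 'LiLei'
ORDER BY age, position;
```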

Case 4:

analysis:

The explain result is the same as in Case 3 except that Using filesort appears: the index was created in the order name, age, position, but the sort reverses age and position.
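An assumed query that fits this analysis is the Case 3 query with the two sort columns swapped:

```sql
-- Assumed query: position before age contradicts the index column order.
EXPLAIN SELECT * FROM employees
WHERE name = 'LiLei'
ORDER BY position, age;
```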

Case 5:

analysis:

Unlike Case 4, Using filesort does not appear in Extra. Because age is fixed to a constant by the where clause, the optimizer drops it from the sort, so the index order is no longer reversed and no filesort is needed.
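An assumed query that fits this analysis adds an equality condition on age to the Case 4 query (the value 18 is only illustrative):

```sql
-- Assumed query: age is a constant, so sorting by (position, age) reduces to sorting by position.
EXPLAIN SELECT * FROM employees
WHERE name = 'LiLei' AND age = 18
ORDER BY position, age;
```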

Case 6:

analysis:

Although the sort columns are in the same order as the index, order by is ascending by default, and position desc here makes that column descending. The sort order therefore differs from the index order, which produces Using filesort. MySQL 8.0 and later support descending indexes, which can serve this kind of query.
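An assumed query that fits this analysis mixes ascending and descending sort directions:

```sql
-- Assumed query: age ascending but position descending, unlike the all-ascending index.
EXPLAIN SELECT * FROM employees
WHERE name = 'zhuge'
ORDER BY age ASC, position DESC;
```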

Case 7:

analysis:

For sorting purposes, a condition with multiple equal values (such as in) also counts as a range query, so the index order cannot be relied on for the sort.
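An assumed query that fits this analysis uses in on the first index column and then sorts by the remaining columns:

```sql
-- Assumed query: in (...) behaves like a range on name, so (age, position) is not globally ordered.
EXPLAIN SELECT * FROM employees
WHERE name IN ('LiLei', 'zhuge')
ORDER BY age, position;
```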

Case 8:

This case can be optimized with a covering index.
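The statements for this case are not shown. As an assumption, a range query that selects all columns can be compared with one that selects only the indexed columns, so the second can be answered from the index alone:

```sql
-- Assumed starting point: selecting every column forces lookups back to the table.
EXPLAIN SELECT * FROM employees
WHERE name > 'a'
ORDER BY name;

-- Covering-index variant: all selected columns are in idx_name_age_position.
EXPLAIN SELECT name, age, position FROM employees
WHERE name > 'a'
ORDER BY name;
```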

Optimization summary:

1. MySQL supports two ways of sorting: filesort and index. Using index means that MySQL completes the sort by scanning the index itself. Index sorting is efficient; filesort is less efficient.

2. order by can use Using index in two situations.

1) The order by clause uses the leftmost prefix of the index.

2) The where clause and the order by clause combined satisfy the leftmost prefix of the index.

3. Try to sort on indexed columns, following the leftmost prefix rule of the index (the order in which its columns were declared).

4. If the order by columns are not in an index, Using filesort results.

5. Use a covering index whenever you can.

6. group by is very similar to order by: in essence it sorts first and then groups, and it follows the leftmost prefix rule of the index column order. If a group by does not need sorted output, order by null can be added to suppress the sort. Note that where takes precedence over having: do not put in having a condition that can be written in where.
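The group by advice can be sketched against the employees table as follows. The filter value is only illustrative. Note also that the implicit group by sort applies to MySQL 5.7 and earlier; MySQL 8.0 no longer sorts group by results implicitly, so order by null matters mainly on older versions.

```sql
-- Group on the leftmost index column; ORDER BY NULL asks MySQL not to sort the groups.
EXPLAIN SELECT name, COUNT(*) AS cnt
FROM employees
GROUP BY name
ORDER BY NULL;

-- Preferred: filter rows in WHERE, before grouping.
SELECT name, COUNT(*) AS cnt
FROM employees
WHERE name LIKE 'zhuge1%'
GROUP BY name;

-- Avoid: the same filter in HAVING is applied only after every row has been grouped.
SELECT name, COUNT(*) AS cnt
FROM employees
GROUP BY name
HAVING name LIKE 'zhuge1%';
```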

How Using filesort works in detail

filesort sort modes

  • One-way sorting: all fields of the qualifying rows are retrieved at once and then sorted in the sort buffer; in the trace tool, the sort_mode information shows <sort_key, additional_fields> or <sort_key, packed_additional_fields>
  • Two-way sorting (also called back-to-table sorting): first only the sort columns and the row ID that directly locates the row are retrieved according to the conditions and sorted in the sort buffer; after sorting, the other required fields have to be fetched again by row ID; in the trace tool, the sort_mode information shows <sort_key, rowid>

MySQL decides which sort mode to use by comparing the system variable max_length_for_sort_data (1024 bytes by default) with the total size of the fields to be queried.

  • If the total length of the fields is less than max_length_for_sort_data, one-way sorting is used;
  • If the total length of the fields is greater than max_length_for_sort_data, two-way sorting is used.

The two sorting modes are walked through with an example below:
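The trace tool mentioned above is the optimizer trace. The original trace output is not shown in the text, so the following is only a sketch of how the sort mode could be checked. The value 10 for max_length_for_sort_data is an assumption chosen to be smaller than any employees row; on MySQL 8.0.20 and later this variable is deprecated and may have no effect.

```sql
-- Enable the optimizer trace for this session.
SET SESSION optimizer_trace = 'enabled=on', end_markers_in_json = on;

-- Run the query, then look for "filesort_summary" -> "sort_mode" in the TRACE column.
SELECT * FROM employees WHERE name = 'zhuge' ORDER BY position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

-- Assumption: a limit smaller than the row width should switch the sort to <sort_key, rowid>.
SET SESSION max_length_for_sort_data = 10;
SELECT * FROM employees WHERE name = 'zhuge' ORDER BY position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

-- Turn the trace off again; it adds overhead to every statement.
SET SESSION optimizer_trace = 'enabled=off';
```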

Let's first look at the detailed process of one-way sorting:

  1. Find the first primary key id that satisfies name = 'zhuge' in the name index
  2. Fetch the whole row by primary key id, take the values of all fields, and store them in sort_buffer
  3. Find the next primary key id that satisfies name = 'zhuge' in the name index
  4. Repeat steps 2 and 3 until name = 'zhuge' is no longer satisfied
  5. Sort the data in sort_buffer by the position field
  6. Return the results to the client

Let's look at the detailed process of two-way sorting:

  1. Find the first primary key id that satisfies name = 'zhuge' in the name index
  2. Fetch the whole row by primary key id, and put the sort field position and the primary key id into sort_buffer
  3. Find the next primary key id that satisfies name = 'zhuge' in the name index
  4. Repeat steps 2 and 3 until name = 'zhuge' is no longer satisfied
  5. Sort the position and primary key id pairs in sort_buffer by position
  6. Traverse the sorted ids, go back to the original table by id, fetch the values of all fields and return them to the client

Comparing the two modes: one-way sorting puts all the fields to be queried into the sort buffer, while two-way sorting puts only the primary key and the sort fields into the sort buffer and then goes back to the original table by primary key to fetch the fields to be queried.

If MySQL's sort memory (sort_buffer) is configured relatively small and cannot be increased further, max_length_for_sort_data can be set somewhat smaller so that the optimizer chooses two-way sorting. This lets sort_buffer hold more rows per pass, at the cost of going back to the table by primary key for the remaining data.

If MySQL's sort memory can be configured fairly large, max_length_for_sort_data can be increased appropriately so that the optimizer prefers full-field (one-way) sorting and puts all required fields into sort_buffer; the query result is then returned directly from memory after sorting.

MySQL thus uses max_length_for_sort_data to control the sort mode, choosing different modes in different scenarios to improve sorting efficiency.

Note that sorting entirely in sort_buffer memory is generally more efficient than sorting in disk files, but this is no reason to increase sort_buffer_size (256 KB by default) at will. Many MySQL parameter defaults are already well tuned; do not adjust them lightly.
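Before changing anything, it helps to look at the current values. The sketch below only inspects the settings and makes a session-level change; the value 2048 is an arbitrary illustration, not a recommendation.

```sql
-- Current sort-related settings.
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Session-only change, so other connections keep the server default.
SET SESSION max_length_for_sort_data = 2048;

-- A growing Sort_merge_passes counter suggests sorts are spilling to disk.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
```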

Index design principles

1. Code first, index last

How do you usually build indexes for your tables? Right after creating the table?

That is actually a mistake. In general, wait until the main business features are developed, then collect and analyze the SQL that touches the table before creating indexes.

2. Cover the query conditions with composite indexes

For example, design one, two or three composite indexes (and as few single-column indexes as possible). Make each composite index include the where, order by and group by columns of the SQL statements as far as possible, and make sure the column order of these indexes satisfies the leftmost prefix principle for those queries as far as possible.

3. Do not index low-cardinality columns

Index cardinality is the number of distinct values of a column in the table. For example, if a table has 1 million rows and a gender column whose value is either male or female, the cardinality of that column is 2.

If you index such a low-cardinality column, the index is hardly better than a full table scan: the index tree contains only the two values male and female, so fast binary search cannot narrow anything down, and using the index makes little sense.

In general, when creating an index, prefer columns with high cardinality, that is, columns with many distinct values, so that the fast binary search of the B+ tree can pay off.
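Cardinality can be estimated before an index is created. As a rough sketch on the employees table from the beginning of this article, the ratio of distinct values to total rows shows how selective a column is (closer to 1 is better):

```sql
-- The Cardinality column shows MySQL's own estimate for existing indexes.
SHOW INDEX FROM employees;

-- Selectivity of candidate columns: distinct values divided by total rows.
SELECT
  COUNT(DISTINCT name) / COUNT(*)     AS name_selectivity,
  COUNT(DISTINCT position) / COUNT(*) AS position_selectivity
FROM employees;
```

With the sample data, name is nearly unique while position has only a couple of values, so name is the better leading column.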

4. Use a prefix index for long strings

Try to design indexes on columns with small data types, such as tinyint: the smaller the type, the less disk space the index takes and the better the search performs.

This preference for small types is not absolute, of course. Often you simply have to index a column such as varchar(255), even if it takes more disk space.

A large varchar(255) column may take a lot of disk space, but this can be optimized a little. For example, index only the first 20 characters of the column, that is, put the first 20 characters of each value into the index tree, similar to KEY index(name(20),age,position).

When a where condition then searches by the name column, MySQL first searches the index tree by the first 20 characters of name, locates the entries whose 20-character prefix matches, and then goes back to the clustered index to fetch the complete name value for comparison.

However, if you order by name, the index tree contains only the first 20 characters of name, so the index cannot be used for this sort, and the same is true for group by. Keep this limitation of prefix indexes in mind.
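A minimal sketch of a prefix index follows. The table `article_demo` and its `title` column are hypothetical names introduced only for illustration; the prefix length 20 follows the example in the text.

```sql
-- Hypothetical table with a long string column indexed by its first 20 characters.
CREATE TABLE article_demo (
  id int NOT NULL AUTO_INCREMENT,
  title varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_title_prefix (title(20))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- How many distinct titles remain distinguishable by their 20-character prefix (1 is ideal).
SELECT COUNT(DISTINCT LEFT(title, 20)) / COUNT(DISTINCT title) AS prefix_coverage
FROM article_demo;

-- An equality search can use the prefix index, then compares the full value in the row.
EXPLAIN SELECT * FROM article_demo WHERE title = 'some title';

-- Sorting by the full column cannot rely on the prefix index.
EXPLAIN SELECT * FROM article_demo ORDER BY title;
```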

5. In case of conflict between where and order by, where takes precedence

When index design for where conflicts with index design for order by, which should the index serve? Should where use the index, or order by?

In general, let the where condition use the index to quickly filter out the subset of data you need, and then sort that subset.

In most cases, index-based where filtering narrows the data down to a small part as quickly as possible, and sorting that small part is then much cheaper.

6. Optimize based on slow SQL queries

Use the slow SQL reported by your monitoring backend to do targeted index optimization for those queries.

If you are not familiar with slow SQL queries, refer to this article: https://blog.csdn.net/qq_40884473/article/details/89455740
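If no monitoring backend is available, MySQL's own slow query log can provide the candidates. The following is a sketch; the 1-second threshold is an arbitrary assumption, and changing global variables requires sufficient privileges.

```sql
-- Inspect the current slow query log settings (status and log file location).
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable the log and record statements that take longer than 1 second.
-- The new global long_query_time applies to connections opened afterwards.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
```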

Index design practice

Take a social app as an example. Users typically search for friends, which means filtering user information, so the table involved is the user table. This table is usually fairly large; sharding across databases and tables is not considered here. Typical filters are region (province and city), gender, age, height, hobbies and so on. Some apps also have user ratings, such as a popularity score, and results may be sorted by that score.

Besides filtering users on various conditions, the backend also needs paging, so it may generate SQL like this:

select xx from user where xx=xx and xx=xx order by xx limit xx,xx

How should the index be designed in this case? For example, users may often filter for people in the same city by province and city, and also filter by gender. Should we design a composite index (province,city,sex)? The cardinality of these columns does not look large, but the index is still worthwhile, because these columns are queried so frequently.

Suppose a user also filters by age range, such as where province=xx and city=xx and age>=xx and age<=xx. We can try adding the age column to the composite index: (province,city,sex,age). Note that range conditions like this should generally be placed last, because, as mentioned earlier, conditions after a range in a composite index cannot use the index. In the current situation, however, the age column of the index still cannot be used, because the user did not filter on sex. How to optimize? We can rewrite the SQL like this: where province=xx and city=xx and sex in ('female','male') and age>=xx and age<=xx

Columns such as hobby can be handled the same way as sex, so the hobby column can be added to the index as well: (province,city,sex,hobby,age)

Suppose there is another filter condition, for example users who have logged in within the last week. People generally want to make friends with active users so that they get a response sooner. The corresponding backend SQL may look like this:

where province=xx and city=xx and sex in ('female','male') and age>=xx and age<=xx and latest_login_time>= xx

Can we also add the latest_login_time column to the index, for example (province,city,sex,hobby,age,latest_login_time)? Obviously not: age is already a range condition, and columns after a range condition cannot use the index. How can this be optimized? We can design another column, is_login_in_latest_7_days, whose value is 1 if the user has logged in within the last week and 0 otherwise. The index can then be designed as (province,city,sex,hobby,is_login_in_latest_7_days,age) to serve the scenario above.

In general, such a multi-column index filters out most of the data and keeps only a small part. The order by is then done as a filesort on that small part, and finally limit does the paging, so overall performance is fairly good.

Sometimes, however, users query like this: find the most popular women, with SQL such as where sex = 'female' order by score limit xx,xx. The index above is hard to use here, because there are too many columns with too many values to splice into the SQL with in clauses. What then? We can design an auxiliary composite index, such as (sex,score), to serve this query.
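The design above can be sketched as one table definition. The table name `app_user`, the column types, and the sample filter values are assumptions made for illustration; only the index column lists come from the text.

```sql
-- Hypothetical user table with the main filter index and the auxiliary (sex, score) index.
CREATE TABLE app_user (
  id bigint NOT NULL AUTO_INCREMENT,
  province varchar(20) NOT NULL DEFAULT '',
  city varchar(20) NOT NULL DEFAULT '',
  sex varchar(6) NOT NULL DEFAULT '',
  hobby varchar(20) NOT NULL DEFAULT '',
  age int NOT NULL DEFAULT 0,
  score int NOT NULL DEFAULT 0,
  is_login_in_latest_7_days tinyint NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  KEY idx_filter (province, city, sex, hobby, is_login_in_latest_7_days, age),
  KEY idx_sex_score (sex, score)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Typical filter query: equality/in on the leading columns, the age range last.
EXPLAIN SELECT id FROM app_user
WHERE province = 'p' AND city = 'c'
  AND sex IN ('female', 'male')
  AND hobby IN ('reading', 'music')
  AND is_login_in_latest_7_days = 1
  AND age >= 20 AND age <= 30
ORDER BY score
LIMIT 0, 20;

-- Atypical query served by the auxiliary index: rows for one sex are already ordered by score.
EXPLAIN SELECT id FROM app_user
WHERE sex = 'female'
ORDER BY score
LIMIT 0, 20;
```

Running EXPLAIN on real data is still necessary, since the optimizer's choice depends on table size and value distribution.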

That is the approach to index design. The core idea is to use one or two complex multi-column composite indexes to serve more than 80% of your queries, and then one or two auxiliary indexes to serve the remaining atypical queries as far as possible. This way, as many queries as possible on a large table can make full use of the indexes, which keeps query speed and performance high.


Added by hyngvesson on Mon, 20 Dec 2021 12:21:36 +0200