MySQL optimization lesson 2: Explain and index practice

Explain use and explanation

Explain tool introduction

Use the EXPLAIN keyword to simulate the database optimizer to execute SQL statements and analyze the performance bottlenecks of your query statements or structures
Add the explain keyword before the select statement. MySQL will set a flag on the query. Executing the query will return the information of the execution plan instead of executing the SQL.
Note: if the from contains a subquery, the subquery will still be executed and the results will be put into the temporary table

Explain analysis example

Build table

DROP TABLE IF EXISTS `actor`;
CREATE TABLE `actor` (
  `id` int(11) NOT NULL,
  `name` varchar(45) DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `actor` (`id`, `name`, `update_time`) VALUES (1,'a','2017-12-22 15:27:18'), (2,'b','2017-12-22 15:27:18'), (3,'c','2017-12-22 15:27:18');

DROP TABLE IF EXISTS `film`;
CREATE TABLE `film` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `film` (`id`, `name`) VALUES (3,'film0'),(1,'film1'),(2,'film2');

DROP TABLE IF EXISTS `film_actor`;
CREATE TABLE `film_actor` (
  `id` int(11) NOT NULL,
  `film_id` int(11) NOT NULL,
  `actor_id` int(11) NOT NULL,
  `remark` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_film_actor_id` (`film_id`,`actor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `film_actor` (`id`, `film_id`, `actor_id`) VALUES (1,1,1),(2,1,2),(3,2,1); 

Use explain

explain select * from actor;


Each table in the query will output one row. If two tables join the query through join, two rows will be output. The meaning of a table is quite broad: it can be a sub query, a union result, etc.

explain two variants (earlier versions)

mysql8. The two variants of version 0 have become the default fields, and an error will be reported if they are added

1) explain extended: additional query optimization information will be provided on the basis of explain. After that, you can get the optimized query statement through the show warnings command, so as to see what the optimizer has optimized. In addition, there is a filtered column, which is the value of a half split ratio. rows * filtered/100 can estimate the number of rows to be connected with the previous table in explain (the previous table refers to the table whose id value in explain is smaller than the id value of the current table).

explain extended select * from film where id = 1;

See how mysql will be optimized

explain select * from film where id = 1;
show warnings;

2) explain partitions: there are more partitions fields than explain. If the query is based on the partition table, the partitions that the query will access will be displayed.

Columns in explain

Next, we will show the information of each column in explain.
Close the merge derived table first

set session optimizer_switch = 'derived_merge=off';

id column

The number of the id column is the serial number of the select. If there are several selections, there are several IDS, and the order of IDS increases in the order in which the select appears. MySQL divides select query into simple query and primary query.
Complex queries are divided into three categories: simple subqueries, derived tables (subqueries in the from statement) and union queries.
The larger the id column, the higher the execution priority. If the id is the same, it will be executed from top to bottom. If the id is NULL, it will be executed last
1) Simple subquery

explain select (select 1 from actor limit 1) from film;

2) Subquery in from clause

explain select id from (select id from film) as der;

During the execution of this query, there is a temporary table alias der, which is referenced by the external select query

3) union query

explain select 1 union all select 1;


The union result is always placed in an anonymous temporary table. The temporary table does not appear in SQL, so its id is NULL.

select_type column

select_type indicates whether the corresponding row is a simple or complex query. If it is a complex query, which of the above three complex queries is it.
1.simple: simple query. The query does not contain subqueries and union s

explain select * from film where id = 2;


2.primary: the outermost select in complex queries
3.subquery: subquery contained in the select (not in the from clause)
4.derived: the subquery contained in the from clause. MySQL will store the results in a temporary table, also known as derived table
Use this example to understand the primary, subquery, and derived types

# Turn off settings first
set session optimizer_switch = 'derived_merge=off';
explain select (select 1 from actor where id = 1) from (select * from film where id = 1) der;

Before closing the merge derived table:

After closing the merge derived table:

5.union: the second and subsequent select in the union
6.union result: select the result retrieved from the union temporary table
Use this example to understand the Union and union result types:

explain select 1 union all select 1;

table column

This column indicates which table a row of explain is accessing.
When there is a subquery in the from clause, the table column is in < deriven n format, which means that the current query depends on the query with id=N, so the query with id=N is executed first.
When there is a union, the value of the table column of UNION RESULT is < union1,2 >, and 1 and 2 represent the select row id participating in the union.

type column

This column represents the association type or access type, that is, MySQL decides how to find rows in the table and the approximate range of data row records.
The order from the best to the worst is: system > const > Eq_ ref > ref > range > index > ALL
Generally speaking, it is necessary to ensure that the query reaches the range level, preferably ref
NULL: mysql can decompose query statements in the optimization phase, and there is no need to access tables or indexes in the execution phase. For example, selecting the minimum value in the index column can be completed by looking up the index separately, and there is no need to access the table during execution

explain select min(id) from film;


const, system: mysql can optimize a part of the query and convert it into a constant (see the results of show warnings). When comparing all columns of primary key or unique key with constants, the table has at most one matching row, which can be read once, which is fast. System is a special case of const. If only one tuple in the table matches, it is system

explain select * from (select * from film where id = 1) tmp;
show warnings;


In the show warnings result: dual is an empty table of mysql

eq_ref: all parts of the primary key or unique key index are used by connection, and only one qualified record will be returned at most. This is probably the best join type outside const. This type will not appear in a simple select query.

explain select * from film_actor left join film on film_actor.film_id = film.id;


ref: compared with eq_ref, instead of using the unique index, use the partial prefix of the ordinary index or the unique index. If the index is compared with a certain value, multiple qualified rows may be found.
1 . Simple select query, where name is a common index (non unique index)

explain select * from film where name = 'film1';


2. Association table query, idx_film_actor_id is film_id and actor_ For the joint index of ID, film is used here_ The left prefix of actor is film_id part.

explain select film_id from film left join film_actor on film.id = film_actor.film_id;


Range: range scanning usually occurs in in (), between, >, <, > = and other operations. Use an index to retrieve rows in a given range.

explain select * from actor where id > 1;


Index: the result can be obtained by scanning the whole table index. Generally, a secondary index is scanned. This kind of scanning will not start from the root node of the index tree, but directly traverse and scan the leaf nodes of the secondary index. The speed is still relatively slow. This kind of query is generally to use the overlay index. The secondary index is relatively small, so it is usually faster than ALL.

The difference between ref and ref is that ref is a binary search, while index finds the first leaf node and then traverses all leaf nodes

explain select * from film;


ALL: that is, full table scanning, scanning ALL leaf nodes of your cluster index. Usually, this needs to increase the index for optimization
(index is read from the secondary index, while all is read from the cluster index (primary key index))

explain select * from actor;

possible_keys column

This column shows which indexes the query may use to find.
Possible when explain ing_ Keys has columns and key s display NULL. This is because there is not much data in the table. mysql thinks that the index is not helpful for this query, so it selects the full table query.
If the column is NULL, there is no associated index. In this case, you can improve query performance by checking the where clause to see if you can create an appropriate index, and then use explain to see the effect.

key column

This column shows which index mysql actually uses to optimize access to the table.
If no index is used, the column is NULL. If you want to force mysql to use or ignore possible_ The index in the keys column. force index and ignore index are used in the query.

key_len column

This column shows the number of bytes used by mysql in the index. Through this value, you can calculate which columns in the index are used.
For example, film_ Joint index idx of actor_ film_actor_id by film_id and actor_id consists of two int columns, and each int is 4 bytes. Pass the key in the result_ Len = 4, it can be inferred that the query uses the first column: film_id column to perform index lookup.

explain select * from film_actor where film_id = 2;


key_len calculation rules are as follows:

  • character string
    char(n): n-byte length
    varchar(n): 2 bytes to store the string length. If it is utf-8, the length is 3n + 2
  • value type
    tinyint: 1 byte
    smallint: 2 bytes
    int: 4 bytes
    bigint: 8 bytes
  • Time type
    date: 3 bytes
    timestamp: 4 bytes
    datetime: 8 bytes
  • If the field is allowed to be NULL, whether the 1-byte record is NULL is required

The maximum length of the index is 768 bytes. When the string is too long, mysql will do a process similar to the left prefix index to extract the first half of the characters for indexing.

ref column

This column shows the columns or constants used by the table to find the value in the index of the key column record. The common ones are: const (constant) and field name (for example: film.id)

rows column

This column is the number of rows that mysql estimates to read and detect. Note that this is not the number of rows in the result set.

Extra column

This column shows additional information. Common important values are as follows:
Using index: the query column is overwritten by the index, and the where filter condition is the leading column of the index, which is the performance of high performance. Generally, the overlay index is used (the index contains the fields of all queries). For innodb, if it is a secondary index, the performance will be improved a lot

Overlay index definition: the key in the explain result of the mysql execution plan has a use index. If the fields queried after select can be obtained from the tree of this index, it can be said that the overlay index is generally used, and there is generally a using index in extra; Overlay index is generally aimed at the auxiliary index. The whole query result can be obtained only through the auxiliary index. There is no need to find the primary key through the auxiliary index tree, and then obtain other field values through the primary key index tree

explain select film_id from film_actor where film_id = 1;


Using where: the where statement is used to process the results, and the columns of the query are not overwritten by the index

explain select * from actor where name = 'a';


Using where Using index: the query column is overwritten by the index, and the where filter condition is one of the index columns but not the leading column of the index, which means that qualified data cannot be queried directly through index lookup

explain select film_id from film_actor where actor_id = 1;


NULL: the query column is not overwritten by the index, and the where filter condition is the leading column of the index, which means that the index is used, but some fields are not overwritten by the index. It must be realized by "returning to the table", not purely using the index or not using the index at all

explain select * from film_actor where film_id = 1;

Using index condition: similar to Using where, the query column is not completely covered by the index. The where condition is the range of a leading column;

explain select * from film_actor where film_id > 1;


Using temporary: mysql needs to create a temporary table to process queries. In this case, it is generally necessary to optimize. The first thing is to think of using index to optimize.
1 . actor.name has no index. At this time, a temporary table is created to distinguish

explain select distinct name from actor;


2 . film.name created idx_name index. At this time, extra is using index when querying, and no temporary table is used

explain select distinct name from film;


Instead of reading the results from an external table, filesort will sort the results by an index. At this time, mysql will browse all qualified records according to the connection type, save the sorting keyword and row pointer, then sort the keyword and retrieve the row information in order. In this case, it is generally necessary to consider using indexes for optimization.
1 . actor. If name does not create an index, it will browse the entire actor table, save the sorting keyword name and the corresponding id, then sort name and retrieve row records

explain select * from actor order by name;


2 . film.name created idx_name index. When querying, extra is using index

explain select * from film order by name;

Select tables optimized away: use some aggregate functions (such as max, min) to access a field with an index

explain select min(id) from film;

Indexing practice

CREATE TABLE `employees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(24) NOT NULL DEFAULT '' COMMENT 'full name',
  `age` int(11) NOT NULL DEFAULT '0' COMMENT 'Age',
  `position` varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
  `hire_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Entry time',
  PRIMARY KEY (`id`),
  KEY `idx_name_age_position` (`name`,`age`,`position`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COMMENT='Employee record form';

INSERT INTO employees(name,age,position,hire_time) VALUES('LiLei',22,'manager',NOW());
INSERT INTO employees(name,age,position,hire_time) VALUES('HanMeimei', 23,'dev',NOW());
INSERT INTO employees(name,age,position,hire_time) VALUES('Lucy',23,'dev',NOW());

Full value matching

EXPLAIN SELECT * FROM employees WHERE name= 'LiLei';

EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age = 22;

EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age = 22 AND position ='manager';

Best left prefix rule

If multiple columns are indexed, follow the leftmost prefix rule. It means that the query starts at the top left of the index and does not skip the columns in the index.

EXPLAIN SELECT * FROM employees WHERE age = 22 AND position ='manager';
EXPLAIN SELECT * FROM employees WHERE position = 'manager';
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei';



Failure to do any operation (calculation, function, (automatic or manual) type conversion) on the index column will lead to index failure and turn to full table scanning

EXPLAIN SELECT * FROM employees WHERE name = 'LiLei';
EXPLAIN SELECT * FROM employees WHERE left(name,3) = 'LiLei';


The storage engine cannot use the column to the right of the range condition in the index

EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age = 22 AND position ='manager';
EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age > 22 AND position ='manager';


Try to use the overlay index (only the query accessing the index (the index column contains the query column)) and reduce the select * statement

EXPLAIN SELECT name,age FROM employees WHERE name= 'LiLei' AND age = 23 AND position ='manager';
EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age = 23 AND position ='manager';


mysql cannot use the index when it is not equal to (! = or < >), which will lead to full table scanning

EXPLAIN SELECT * FROM employees WHERE name != 'LiLei';

is null,is not null, and indexes cannot be used

EXPLAIN SELECT * FROM employees WHERE name is null;

like starts with a wildcard ('$abc...') if the mysql index fails, it will become a full table scan operation

EXPLAIN SELECT * FROM employees WHERE name like '%Lei';
EXPLAIN SELECT * FROM employees WHERE name like 'Lei%';



Question: how to solve the problem that like '%' string index is not used?

  1. To use the overlay index, the query field must be the field to establish the overlay index
    EXPLAIN SELECT name,age,position FROM employees WHERE name like '%Lei%';
    
  2. When the fields pointed to by the overwrite index are varchar(380) and above, the overwrite index will become invalid!

Invalid string index without single quotation marks

EXPLAIN SELECT * FROM employees WHERE name = '1000';
EXPLAIN SELECT * FROM employees WHERE name = 1000;


Use or less. When using it to connect, the index will fail in many cases

EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' or name = 'HanMeimei';

Range lookup optimization

Sometimes, some large-scale searches do not take the index because the mysql internal optimizer will evaluate whether to use the index according to multiple factors such as the retrieval ratio and table size

The optimization method is to split a large range into multiple small ranges

Summary:

Suppose index(a,b,c)

like KK% equals = constant,% KK and% KK% equals range

Keywords: Database MySQL

Added by lesmckeown on Sat, 12 Feb 2022 12:42:36 +0200