MySQL optimizer: introduction to index merge - Database, Cloud Computing and Life

In MySQL's official manual, about Introduction to index merge Very few. There are even many misleading places. This time, I have looked over the code of version 5.1 about this kind of optimization processing, and introduced various practical index merge access types of SQL in a case-by-case way. Subsequently, we will continue to introduce the main data structure of index merge implementation, as well as cost assessment.

1. what is index merge?

If the MySQL optimizer finds that it can locate data using intersection/union after multiple index lookups, the MySQL optimizer will try such access methods as index merge. Index merge can be divided into two main categories: multiple index intersections and multiple index union access. Of course, these two types can also be combined in more complex ways, such as the union of multiple intersections.

1.1 Limitation of index merge: range first

Before MySQL was 5.6.7, there was an important prerequisite for using index merge: no range was available. This restriction reduces the scenarios that MySQL index merge can use. The ideal state is to evaluate the cost at the same time and then make a choice. Because of this limitation, there is the following known bad case( Reference resources):

SELECT * FROM t1 WHERE (goodkey1 < 10 OR goodkey2 < 20) AND badkey < 30;

The optimizer can choose to use goodkey1 and goodkey2 as index merge or badkey as range. Because of the above principle, regardless of goodkey1 and goodkey2 selection, MySQL only considers range, not index merge access. This is a tragedy...( Version 5.6.7 has been fixed for this)

2. Some cases about index merge

On what is intersection / Union Detailed information is included in the manual. I won't go into details here. Here, through several cases, we can see which cases use intersection, which cases use union, and which cases use more complex combinations.

The table structure and data used in the example are referenced in Section 4.2 of this article.

2.1 k1_p1 = 2 or k2_p1 = 4

This is the most typical and simplest scenario.

SELECT * FROM tmp_index_merge where key1_part1 = 2 or key2_part1 = 4
explain SELECT * FROM tmp_index_merge where key1_part1 = 2 or key2_part1 = 4\G
            ......
        table: tmp_index_merge
         type: index_merge
          key: ind1,ind2
      key_len: 4,4
        Extra: Using sort_union(ind1,ind2); Using where

2.2 (k1_p1=2 and k1_p2=7) or k2_p1=4\G

This case is a little more complicated. The first index uses two fields:

explain SELECT * FROM tmp_index_merge 
where (key1_part1 = 2 and key1_part2 = 7) or key2_part1 = 4\G
            ......
        table: tmp_index_merge
         type: index_merge
          key: ind1,ind2
      key_len: 8,4
        Extra: Using sort_union(ind1,ind2); Using where

2.3 (k1_p1=2 or k1_p1=7) or k2_p1=4\G

This case can also use index merge. The internal implementation is more complex than it seems on the surface. Here's a brief explanation: MySQL handles the previous part (key1_part1 = 2 or key1_part1 = 7) when it recursively handles this WHERE condition. MySQL merges or operations on the same field of the same index into a SEL_ARG tree( Specific reference The two conditions are connected by the Next/prev pointer of SEL_ARG. MySQL's range access can be done by traversing the tree (also for reference) The previous article ) Then the optimizer reprocesses another branch of or (key2_part1 = 4) and finds that the second index can be used, so it adds the index merge to the list of possible execution plans (subsequent cost assessment, and then decides whether to use the access method).

explain SELECT * FROM tmp_index_merge 
where (key1_part1 = 2 or key1_part1 = 7) or key2_part1 = 4\G
            ......
        table: tmp_index_merge
         type: index_merge
          key: ind1,ind2
      key_len: 4,4
        Extra: Using sort_union(ind1,ind2); Using where

2.4 (k1_p1=2 or k1_p2=7) or k2_p1=4\G

In this case, no index can be used directly. No explanation.

explain SELECT * FROM tmp_index_merge 
where (key1_part1 = 2 or key1_part2 = 7) or key2_part1 = 4\G
            ......
        table: tmp_index_merge
         type: ALL
possible_keys: ind1,ind2
          key: NULL
        Extra: Using where

2.5 k1_p1=1 or (k1_p1=2 and k1_p2=4 and k2_p1=3)

For such conditions, MySQL will find that range access can be used. According to the previous "range first" principle, MySQL no longer considers index merge (where k1_p1=1 and k2_p1=3 can be accessed by index merge). When MySQL considers using key1 access, it sees the condition that k1_p1=1 or (k1_p1=2 and k1_p2=4). Here the conditions on both sides of OR can be constructed as an independent SEL_ARG. (More details on the Range Priority Principle are provided later in this article.)

So MySQL will use range directly instead of index merge. (What conditions can not be reached as a SEL_ARG tree, reference, for two SEL_ARG through or merge, there are some more complex things, I will not introduce here for the time being)

explain SELECT * FROM tmp_sel_tree
where key1_part1=1 or (key1_part1=2 and key1_part2=4 and key2_part1=3)\G
        table: tmp_sel_tree
         type: range
          key: ind1
      key_len: 8
        Extra: Using where

If the preceding cases are clear, they can be continued. There will be some more complicated cases below.

2.6 Nested Case 1

This case may seem complex, but its essence is the same as the "known bad case" mentioned earlier. It is a case where index merge can be used, but range takes precedence over it.

SELECT * FROM tmp_sel_tree where
  ( key1_part1 = 1 or (key1_part2 = 2 and key2_part1 = 3) ) and
  ( key3_part1 = 5 )

2.7 Nested Case 2

This case is slightly different from the previous one, which is an index merge of three indexes, where MySQL will consider using index merge. But generally speaking, the cost of this kind of index merge is relatively large, and it is easy to exceed the cost of the whole table.

SELECT * FROM tmp_sel_tree where
  ( key1_part1 = 1 or (key1_part2 = 2 and key2_part1 = 3) ) or
  ( key3_part1 = 5 )

If the cost of the index merge is found to be less than the full table, it will be used:

table: tmp_sel_tree
 type: index_merge
  key: ind1,ind2,ind3
Extra: Using sort_union(ind1,ind2,ind3); Using where

3. More on the range Priority Principle

The case where range can be used

In the MySQL version before 5.6.7, as long as Range access is available, index merge will not be used again. Because there are many WHERE conditions that can be accessed using range, except for the common ones (k1_p1 = const and k2_p2 > const), if you refer to them Range optimizes related data structures You'll also see more WHERE conditions using range.

Here is a simple analysis of one of the more complex WHERE conditions that can be accessed by range.

WHERE
  (
    key1_part1 = 3 and key1_part2 > 5 and key2_part1 = 7
  )
  or ( key1_part1 > 2 )

For index key2, this condition can be simplified as follows, using index merge access:

(TRUE AND TRUE AND key2_part1 = 7) OR ( key1_part1 < 2 )

For index key1, the condition is simplified as follows:

(key1_part1 = 3 and key1_part2 > 5 and TRUE) or (key1_part1 > 2)

For index key1, this is a condition where range access can be used. According to the foregoing Range optimizes related data structures A SEL_ARG structure can be constructed as follows:

               $                      $
SEL_ARG[2,∞)   $                      $
       |^      $                      $
   next||      $                      $
       ||prev  $                      $
       v|      $                      $
SEL_ARG[3,3] ==$====>  SEL_ARG[5,∞]   $
               $                      $

Range access will be SEL_ARG in turn, traversing the access. Because of range access, such conditions do not take index merge into account.

But if WHERE is as follows (OR followed by key1_part2 instead of key1_part1):

WHERE
  (
    key1_part1 = 3 and key1_part2 > 5 and key2_part1 = 7
  )
  or ( key1_part2 > 2 )

The key1_part2 following OR cannot be merged into a SEL_ARG tree with the previous key1 condition, so range access cannot be used. Because the latter condition of or cannot be accessed independently, it is also impossible to do index merge access.

4. other

4.1 type in MySQL Explain

In the MySQL manual, the type column in Explain is called "EXPLAIN Join Types". This has led to a misunderstanding among many people, where Type actually refers to the way this single table is accessed throughout the JOIN. For example:

           id: 1
  select_type: SIMPLE
        table: tmp_sel_tree
         type: index_merge
possible_keys: ind1,ind2,ind3
          key: ind1,ind2,ind3
      key_len: 4,4,4

Common forms of access are const/ref/range/index/all

MySQL optimizer has two degrees of freedom, one is to determine how each single table is accessed. The other is access order. Blog often said that the use of "range optimization" and "index merge optimization" also refers to MySQL form access mode to choose "range" or "index merge".

Table structure and data in the 4.2 example

CREATE TABLE `tmp_index_merge` (
  `id` int(11) NOT NULL,
  `key1_part1` int(11) NOT NULL,
  `key1_part2` int(11) NOT NULL,
  `key2_part1` int(11) NOT NULL,
  `key2_part2` int(11) NOT NULL,
  `key2_part3` int(11) NOT NULL,
  `key3_part1` int(11) NOT NULL DEFAULT '4',
  KEY `ind1` (`key1_part1`,`key1_part2`),
  KEY `ind2` (`key2_part1`,`key2_part2`,`key2_part3`),
  KEY `ind3` (`key3_part1`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk
1 row in set (0.01 sec)

root@test 11:33:22>select * from tmp_index_merge;
+----+------------+------------+------------+------------+------------+------------+
| id | key1_part1 | key1_part2 | key2_part1 | key2_part2 | key2_part3 | key3_part1 |
+----+------------+------------+------------+------------+------------+------------+
|  6 |          6 |          1 |          9 |          2 |          1 |          8 |
|  8 |          9 |          9 |          1 |          6 |          6 |          6 |
|  4 |          1 |          3 |          4 |          9 |          3 |          6 |
| 10 |          9 |          7 |          5 |          7 |          1 |          2 |
|  1 |          4 |          7 |          2 |          1 |          8 |          3 |
|  6 |          6 |          3 |          9 |          3 |          9 |          7 |
|  8 |         10 |          6 |          2 |          1 |          1 |          7 |
|  0 |          9 |          4 |          4 |          8 |          7 |          6 |
|  2 |          9 |          1 |          5 |          4 |          5 |          7 |
|  2 |          7 |         10 |          6 |          1 |          8 |          6 |
|  7 |         10 |          8 |          2 |          3 |          1 |          9 |
|  7 |          3 |          3 |          7 |          7 |          2 |         10 |
|  6 |          6 |          1 |          9 |          2 |          1 |          8 |
|  8 |          9 |          9 |          1 |          6 |          6 |          6 |
|  4 |          1 |          3 |          4 |          9 |          3 |          6 |
| 10 |          9 |          7 |          5 |          7 |          1 |          2 |
|  1 |          4 |          7 |          2 |          1 |          8 |          3 |
|  6 |          6 |          3 |          9 |          3 |          9 |          7 |
|  8 |         10 |          6 |          2 |          1 |          1 |          7 |
|  0 |          9 |          4 |          4 |          8 |          7 |          6 |
|  2 |          9 |          1 |          5 |          4 |          5 |          7 |
|  2 |          7 |         10 |          6 |          1 |          8 |          6 |
|  7 |         10 |          8 |          2 |          3 |          1 |          9 |
|  7 |          3 |          3 |          7 |          7 |          2 |         10 |
+----+------------+------------+------------+------------+------------+------------+

Keywords: Database MySQL SQL less

Added by genetheblue on Tue, 08 Oct 2019 13:41:24 +0300