Query rewriting rules for MySQL

MySQL Learning Series

For some very performance-consuming statements, MySQL still tries its best to convert this bad statement into a more efficient form according to some rules. This process can also be called query rewriting.

Conditional simplification

The search condition of the query statement we write is essentially an expression. These expressions may be complex or cannot be executed efficiently. MySQL query optimizer will simplify these expressions for us.

  • Remove unnecessary parentheses

Sometimes there are many useless parentheses in the expression, such as:
((a = 5 AND b = c) OR ((a > c) AND (c < 5)))
It's annoying to see. The optimizer will kill those unused parentheses. That's it:
(a = 5 and b = c) OR (a > c AND c < 5)

  • constant_propagation

Sometimes an expression is an equivalent match between a column and a constant, such as:

a = 5

When this expression AND other expressions involving column a are connected by AND, the value of a in other expressions can be replaced with 5, for example:

a = 5 AND b > a

Can be converted to:

a = 5 AND b > 5

Equivalence_propagation
Sometimes there is an equivalent matching relationship between multiple columns, such as:

a = b and b = c and c = 5

This expression can be simplified to:

a = 5 and b = 5 and c = 5
  • Remove useless condition (trivial_condition_remove)

For some expressions that are obviously always TRUE or FALSE, the optimizer will remove them, such as this expression:

(a < 1 and b = b) OR (a = 6 OR 5 != 5)

Obviously, the expression b = b is always TRUE, 5= 5 this expression is always FALSE, so the simplified expression is like this:

(a < 1 and TRUE) OR (a = 6 OR FALSE)

Can continue to be simplified as

a < 1 OR a = 6
  • Expression evaluation

Before the query is executed, if the expression contains only constants, its value will be calculated first, such as this:
a = 5 + 1
Because the expression 5 + 1 contains only constants, it will be reduced to:
a = 6
However, it should be noted here that if a column is not used as the operand of an expression in a separate form, for example, in a function or in a more complex expression, like this:
ABS(a) > 5
Or:
-a < -8
The optimizer does not attempt to simplify these expressions. As we said earlier, the index can only be used when the index column and constant in the search criteria are connected by some operators. Therefore, if possible, it is best to let the index column appear in the expression in a separate form.

  • Constant table detection

MySQL thinks the following query runs very fast:
Use primary key equivalence matching or unique secondary index column equivalence matching as search criteria to query a table.
MySQL thinks that the time spent on these two queries is negligible, so it also calls the tables queried by these two methods constant tables (English Name: constant tables). When analyzing a query statement, the optimizer first executes the constant table query, and then replaces all the conditions related to the table in the query
Change it into a constant, and finally analyze the query cost of other tables. For example, this query statement:

SELECT * FROM table1 INNER JOIN table2
ON table1.column1 = table2.column2
WHERE table1.primary_key = 1;

Obviously, this query can use the equivalent matching of primary key and constant value to query table1 table, that is, in this query, table1 table is equivalent to constant table. Before analyzing the query cost of table2 table, the query of table1 table will be executed and the conditions related to table1 table in the query will be replaced, that is, the above statement will be converted into this:

SELECT table1 Constant values for each field of the table record, table2.* FROM table1 INNER JOIN
table2 ON table1 surface column1 Constant value of column = table2.column2;

External connection elimination

As we said earlier, the positions of internally connected drive tables and driven tables can be converted to each other, while the left (outer) and right (outer) connected drive tables and driven tables are fixed. As a result, the inner join may reduce the overall query cost by optimizing the join order of the table, while the outer join cannot optimize the join order of the table.

As we said before, the essential difference between external connection and internal connection is that for the records of the driven table of external connection, if the records matching the filter conditions in the ON Clause cannot be found in the driven table, the records will still be added to the result set, and the fields of the corresponding driven table records will be filled in with NULL values
Filling; If the record of the inner connected driving table cannot find a record matching the filter condition in the ON clause in the driven table, the record will be discarded. The query effect is as follows:

mysql> SELECT * FROM e1 INNER JOIN e2 ON e1.m1 = e2.m2;
+------+------+------+------+
| m1   | n1   | m2   | n2   |
+------+------+------+------+
|    2 | b    |    2 | b    |
|    3 | c    |    3 | c    |
+------+------+------+------+
2 rows in set (0.01 sec)
mysql> SELECT * FROM e1 LEFT JOIN e2 ON e1.m1 = e2.m2;
+------+------+------+------+
| m1   | n1   | m2   | n2   |
+------+------+------+------+
|    2 | b    |    2 | b    |
|    3 | c    |    3 | c    |
|    1 | a    | NULL | NULL |
+------+------+------+------+
3 rows in set (0.00 sec)

For the (left) outer join in the above example, the records of M1 = 1 and N1 ='a 'in the driven table e1 cannot find the ON Clause condition e1 in the driven table E2 m1 = e2. m2 record, so this record is directly added to the result set, and the values of m2 and n2 columns of the corresponding E2 table are set to NULL.

Because all records that do not meet the conditions in the WHERE clause will not participate in the connection. As long as we specify in the search criteria that the value of the related columns of the driven table is not NULL, the driven table records that meet the conditions of the ON clause in the driven table in the outer connection will be excluded from the final result set, that is:
In this case, there is no difference between external connection and internal connection! For example, this query:

mysql> SELECT * FROM e1 LEFT JOIN e2 ON e1.m1 = e2.m2 WHERE e2.n2 IS NOT NULL;
+------+------+------+------+
| m1   | n1   | m2   | n2   |
+------+------+------+------+
|    2 | b    |    2 | b    |
|    3 | c    |    3 | c    |
+------+------+------+------+
2 rows in set (0.00 sec)

Since the n2 column of the driven table e2 is not allowed to be NULL, the left (outer) join query and inner join query of the e1 and e2 tables above are the same. Of course, we don't need to explicitly specify that a column of the driven table IS NOT NULL, as long as it implicitly means this, for example:

mysql> SELECT * FROM e1 LEFT JOIN e2 ON e1.m1 = e2.m2 WHERE e2.m2 = 2;
+------+------+------+------+
| m1   | n1   | m2   | n2   |
+------+------+------+------+
|    2 | b    |    2 | b    |
+------+------+------+------+
1 row in set (0.00 sec)

In this example, we specify in the WHERE clause that the m2 column of the driven table e2 is equal to 2, which is equivalent to indirectly specifying that the m2 column is not NULL. Therefore, the upper left (outer) join query is actually equivalent to the lower inner join query:

mysql> SELECT * FROM e1 INNER JOIN e2 ON e1.m1 = e2.m2 WHERE e2.m2 = 2;
+------+------+------+------+
| m1   | n1   | m2   | n2   |
+------+------+------+------+
|    2 | b    |    2 | b    |
+------+------+------+------+
1 row in set (0.00 sec)

We call this condition that the specified WHERE clause in the outer join query contains a column in the driven table that is not NULL as NULL value rejection (English Name: reject NULL). After the WHERE clause of the driven table meets the condition of NULL rejection, the outer connection and inner connection can be converted to each other. The advantage of this transformation is that the query optimizer can evaluate the cost of different join orders of the table and select the join order with the lowest cost to execute the query.

Subquery optimization

Refer to the next chapter

Keywords: Database MySQL SQL

Added by KirstyBurgoine on Sun, 16 Jan 2022 09:06:36 +0200