First of all, I want to briefly explain that this article is translated from mysql manual , and through my integration, there are many text contents, but the actual gold content is relatively high, I suggest you read it carefully, There must be harvest.
1. GROUP BY optimization
The most common way to satisfy a GROUP BY clause is to scan the entire table and create a new temporary table, in which all rows of each group are continuous, and then use this temporary table to discover groups and apply aggregation functions (if any). In some cases, MySQL can do better than this and avoid creating temporary tables by using index access.
The most important prerequisite for using an index for GROUP BY is that all GROUP BY columns reference properties from the same index, and the index stores keys sequentially (for example, for BTREE indexes, not for HASH indexes). Whether index access can be used instead of temporary tables also depends on which parts of the index are used in the query, the conditions specified for those parts, and the aggregate function selected
There are two methods to execute queries through GROUP BY index access. Details will be described in the following sections. The first method applies grouping operations with all scope predicates, if any. The second method first performs a range scan and then groups the generated tuples.
In MySQL, GROUP BY is used for sorting, so the server can also apply ORDER BY optimization to groups. However, relying on implicit or explicit GROUP BY sorting is not recommended. See“ ORDER BY Optimization".
1. Loose index scanning
The most effective way to deal with GROUP BY is to use the index to retrieve the grouped columns directly. Using this access method, MySQL uses some index type attributes, and the keys are ordered (for example, BTREE). This attribute allows the use of lookup groups in the index, regardless of all keys in the index that meet all WHERE conditions. This access method only considers part of the key in the index, so it is called loose index scanning. When there is no WHERE clause, the loose index scan reads as many keys as the number of groups, and the number of groups may be much less than the number of all keys. If the WHERE clause contains range predicates (see the discussion of range connection types in section 8.8.1 "optimizing queries with EXPLAIN"), the loose index scan will find the first key of each group that meets the range conditions and read the minimum number of keys again. This is possible in the following cases:
- The query is for only one table.
- GROUP BY names only the columns that make up the leftmost prefix of the index, and does not name other columns. (if the query has a DISTINCT clause instead of GROUP BY, all the different properties point to the columns that make up the leftmost prefix of the index.) For example, if table t1 has an index on (c1,c2,c3), then if the query has GROUP BY C1, C2, a loose index scan is applicable. Not applicable if the query has GROUP BY C2, C3 (the column is not the leftmost prefix) or GROUP BY c1, c2, c4 (c4 is not in the index).
- The only aggregate functions (if any) used in the selection list are MIN()and MAX(), and they all refer to the same column. The column must be in the index and must follow GROUP BY
- In addition to the GROUP BY referenced in the query, other parts of the index must be constants (that is, they must be referenced in equations with constants), except for the parameters of the MIN() or MAX() functions
- For columns in an index, you must index the full column value, not just a prefix. For example, for c1 VARCHAR(20), INDEX (c1(10)), the index only uses the prefix of c1 value and cannot be used for loose index scanning.
If the loose index scan is applicable to the query, the EXPLAIN output displays using index for group by in the Extra column.
Suppose that idx(c1,c2,c3)table has an index t1(c1,c2,c3,c4). The loose index scan access method can be used for the following queries:
SELECT c1, c2 FROM t1 GROUP BY c1, c2; SELECT DISTINCT c1, c2 FROM t1; SELECT c1, MIN(c2) FROM t1 GROUP BY c1; SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2; SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
The following queries cannot be executed by this quick select method for the following reasons:
- In addition to MIN() or MAX(), there are other aggregate functions:
SELECT c1, SUM(c2) FROM t1 GROUP BY c1;
- Columns in the GROUP BY clause do not form the leftmost index prefix:
SELECT c1, c2 FROM t1 GROUP BY c2, c3;
- The query references part of the key after the GROUP BY part and is not equal to the constant:
SELECT c1, c3 FROM t1 GROUP BY c1, c2;
If the query contains WHERE c3 = const, you can use a loose index scan.
In addition to the MIN() and MAX() references already supported, the loose index scan access method can be applied to other forms of aggregate function references in the selection list:
- Support AVG(DISTINCT), SUM(DISTINCT) and COUNT(DISTINCT). AVG(DISTINCT) and SUM(DISTINCT) have only one parameter. COUNT(DISTINCT) can have multiple column parameters.
- There must be no GROUP BY or DISTINCT clause in the query.
- The loose index scan restrictions described earlier still apply.
Suppose that there is an index idx(c1,c2,c3,c4) on table t1(c1,c2,c3,c4). The Loose Index Scan access method can be used for the following queries:
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1; SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
2. Compact index scanning
A compact index scan can be a full index scan or a range index scan, depending on the query criteria.
When the conditions for loose index scanning are not met, you can still avoid creating temporary tables for GROUP BY queries. If there are scope conditions in the WHERE clause, the method reads only the keys that meet these conditions. Otherwise, it performs an index scan. This method is called a compact index scan because it reads all keys within each range defined by the WHERE clause, or scans the entire index if there is no range condition. With tight index scanning, grouping is performed only after all keys that meet the range conditions are found.
For this method to work, a constant equality condition is sufficient for all columns in a query that references the part before or between the part of the GROUP BY key. Constants in the equality condition fill any "blanks" in the search key so that the full prefix of the index can be formed. These index prefixes can be used for index lookup. If the GROUP BY results need to be sorted, and it is possible to form search keys with index prefixes, MySQL also avoids additional sorting operations, because all keys have been retrieved in order by using prefix search in the ordered index.
Suppose that idx(c1,c2,c3)table has an index t1(c1,c2,c3,c4). The following query does not apply to the loose index scan access method described above, but it still applies to the tight index scan access method.
- There is a gap in GROUP BY, but it is covered by the condition c2 = 'a':
SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
- GROUP BY does not start with the first part of the key, but there is a condition that provides a constant for that part
SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;
2. DISTINCT optimization
Because DISTINCT can use GROUP BY to understand how MySQL handles columns in partial ORDER BY or HAVING clauses that do not belong to the selected column. See“ MySQL processing GROUP BY".
In most cases, DISTINCT can treat a clause as a special case of GROUP BY. For example, the following two queries are equivalent:
SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const; SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3;
Because of this equivalence, the optimization applicable to GROUP BY queries can also be applied to queries with DISTINCT clauses. Therefore, for more details on the possibility of DISTINCT query optimization, see "GROUP BY optimization" above.
When limit row_ When count and DISTINCT are combined, once MySQL finds row_ The only row of count will stop.
If the columns in all tables named in the query are not used, MySQL stops scanning any unused tables as soon as it finds the first match. In the following example, assuming that t1 is used before t2 (you can use EXPLAIN to check), MySQL stops reading from t2 when it finds the first row in t2 (for any specific row in t1):
SELECT DISTINCT t1.a FROM t1, t2 where t1.a=t2.a;
2. LIMIT query optimization
If you only need the number of rows specified in the result set, use the LIMIT clause in the query instead of getting the entire result set and throwing away additional data.
MySQL sometimes optimizes a row with a limit_ Count clause and query without HAVING clause:
-
If you use LIMIT to select only a few rows, MySQL will use indexes in some cases when it usually tends to perform a full table scan.
-
If LIMIT row_count and ORDER BY are combined, and MySQL finds the first row of the sorting result_ Instead of sorting the entire result, the sorting will stop when the count row. If sorting is done by using indexes, this is very fast. If filesort must be executed, all rows matching the query without a LIMIT clause are selected and the first row is found_ Most or all of their rows are sorted before count. After the initial row is found, MySQL does not sort any of the remaining parts of the result set.
One manifestation of this behavior is that ORDER BY queries with and without LIMIT may return rows in different order, as described later in this section.
-
If you put LIMIT row_count and DISTINCT are combined. Once MySQL finds row_ The only row of count will stop.
-
In some cases, you can parse GROUP BY by reading the index in order (or sorting the index), and then calculate the summary until the index value changes. In this case, LIMIT row_count does not calculate any unnecessary GROUP BY values.
-
Once MySQL sends the required number of rows to the client, it aborts the query unless you use SQL_CALC_FOUND_ROWS. In this case, you can use SELECT FOUND_ROWS() retrieves the number of rows. See“ Information function".
-
LIMIT 0 quickly returns an empty set. This is useful for checking the validity of queries. It can also be used to get the type of result column in applications using MySQL API, which makes result set metadata available. In the mysql client program, you can use the -- column type info option to display the result column type.
-
If the server uses a temporary table to parse the query, it will use LIMIT row_count clause to calculate how much space is needed.
-
If the index is not used for ORDER BY, but there is also a LIMIT clause, the optimizer may be able to avoid using merged files and sort rows in memory using the in memory filesort operation.
If multiple rows have the same value in the ORDER BY column, the server is free to return these rows in any order, which may be executed in different ways according to the overall execution plan. In other words, the sort order of these rows is uncertain for non sequential.
One factor affecting the execution plan is LIMIT, so ORDER BY queries with and without LIMIT may return rows in different order. Consider this query, which is sorted by category, but is uncertain about the id and rating columns:
mysql> SELECT * FROM ratings ORDER BY category; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | | 2 | 3 | 5.0 | | 7 | 3 | 2.7 | +----+----------+--------+
Including LIMIT may affect the order of rows in each category value. For example, this is a valid query result:
mysql> SELECT * FROM ratings ORDER BY category LIMIT 5; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 4 | 2 | 3.5 | | 3 | 2 | 3.7 | | 6 | 2 | 3.5 | +----+----------+--------+
In each case, the rows are sorted by ORDER by column, which is all that is required by the SQL standard.
If it is important to ensure the same row order with and without LIMIT, include additional columns in the order BY clause to make the order deterministic. For example, if the id value is unique, you can arrange the rows of a given category value in id order, as follows:
mysql> SELECT * FROM ratings ORDER BY category, id; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | | 2 | 3 | 5.0 | | 7 | 3 | 2.7 | +----+----------+--------+ mysql> SELECT * FROM ratings ORDER BY category, id LIMIT 5; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | +----+----------+--------+
For queries with ORDER BY or GROUP BY and LIMIT clauses, the optimizer attempts to select an ordered index by default, which will speed up query execution. Before MySQL 5.7.33, there was no way to override this behavior, even when some other optimizations might be faster. Starting with MySQL 5.7.33, you can set optimizer_ Preference of switch system variable_ ordering_ Set the index flag to off to turn off this optimization.
First, we create and populate a table t, as follows:
# Create and populate a table t: mysql> CREATE TABLE t ( -> id1 BIGINT NOT NULL, -> id2 BIGINT NOT NULL, -> c1 VARCHAR(50) NOT NULL, -> c2 VARCHAR(50) NOT NULL, -> PRIMARY KEY (id1), -> INDEX i (id2, c1) -> ); # [Insert some rows into table t - not shown]
Verify preference_ ordering_ Whether the index flag is enabled:
mysql> SELECT @@optimizer_switch LIKE '%prefer_ordering_index=on%'; +------------------------------------------------------+ | @@optimizer_switch LIKE '%prefer_ordering_index=on%' | +------------------------------------------------------+ | 1 | +------------------------------------------------------+
Since the following query has a LIMIT clause, we want it to use an ordered index if possible. In this case, as we can see from the EXPLAIN output, it uses the primary key of the table.
mysql> EXPLAIN SELECT c2 FROM t -> WHERE id2 > 3 -> ORDER BY id1 ASC LIMIT 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t partitions: NULL type: index possible_keys: i key: PRIMARY key_len: 8 ref: NULL rows: 2 filtered: 70.00 Extra: Using where
Now let's disable preference_ ordering_ Index flag and re run the same query; This time it uses index I (which includes the id2 column used in the WHERE clause) and a filesort:
mysql> SET optimizer_switch = "prefer_ordering_index=off"; mysql> EXPLAIN SELECT c2 FROM t -> WHERE id2 > 3 -> ORDER BY id1 ASC LIMIT 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t partitions: NULL type: range possible_keys: i key: i key_len: 8 ref: NULL rows: 14 filtered: 100.00 Extra: Using index condition; Using filesort
Please refer to“ Switchable optimization".