Troubleshooting of MySQL query timeout

A sentry alarm was received this week. The following SQL query timed out.

select * from order_info where uid = 5837661 order by id asc limit 1

Execute show create table order_info   It is found that this table is actually indexed

CREATE TABLE `order_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `uid` int(11) unsigned,
  `order_status` tinyint(3) DEFAULT NULL,
  ... Omit other fields and indexes
  PRIMARY KEY (`id`),
  KEY `idx_uid_stat` (`uid`,`order_status`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Theoretically, executing the above SQL will hit idx_uid_stat this index, but actually execute explain to view

explain select * from order_info where uid = 5837661 order by id asc limit 1

You can see its possible_keys (the index that may be involved in this SQL) is idx_uid_stat, but actually (key) uses full table scanning

We know that MySQL chooses whether to execute the final execution plan based on full table scanning or an index based on cost, so it seems that the cost of full table scanning is less than that based on IDX_ uid_ The cost of stat index execution, but my first feeling is very strange. Although this SQL is returned to the table, its limit is 1, that is, only one statement satisfying uid = 5837661 is selected. Even if the table is returned, only one record is returned. This cost is almost negligible. How can the optimizer choose full table scanning.

To see why MySQL optimizer chose full table scanning, I opened optimizer_trace, come on

Voice over: in MySQL 5.6 and later versions, we can use the optimizer trace function to view the whole process of the optimizer generating the execution plan

Using optimizer_ The specific process of trace is as follows

SET optimizer_trace="enabled=on";        //  open   optimizer_trace
SELECT * FROM order_info where uid = 5837661 order by id asc limit 1
SELECT * FROM information_schema.OPTIMIZER_TRACE;    //  View execution schedule
SET optimizer_trace="enabled=off"; //  close   optimizer_trace

MySQL optimizer first calculates the cost of full table scanning, then selects all indexes that may be involved in the SQL and calculates the cost of the index, and then selects the one with the lowest cost to execute. Let's take a look at the key information given by optimizer trace

{
  "rows_estimation": [
    {
      "table": "`rebate_order_info`",
      "range_analysis": {
        "table_scan": {
          "rows": 21155996,
          "cost": 4.45e6    //  Full table scanning cost
        }
      },
      ...
      "analyzing_range_alternatives": {
          "range_scan_alternatives": [
          {
            "index": "idx_uid_stat",
            "ranges": [
            "5837661 <= uid <= 5837661"
            ],
            "index_dives_for_eq_ranges": true,
            "rowid_ordered": false,
            "using_mrr": false,
            "index_only": false,
            "rows": 255918,
            "cost": 307103,            //  Using idx_ uid_ Cost of stat index
            "chosen": true
            }
          ],
       "chosen_range_access_summary": {    //  The final result selected after the above cost comparison
         "range_access_plan": {
             "type": "range_scan",
             "index": "idx_uid_stat",  //  You can see that IDX is finally selected_ uid_ Stat this index to execute
             "rows": 255918,
             "ranges": [
             "58376617 <= uid <= 58376617"
             ]
         },
         "rows_for_plan": 255918,
         "cost_for_plan": 307103,
         "chosen": true
         }
         }  
    ...

You can see that the cost of full table scanning is 4.45e6, and the index idx is selected_ uid_ The cost of stat is 307103, which is far less than the cost of full table scanning, and from the final selection result (chosen_range_access_summary), IDX is indeed selected_ uid_ Stat is the index, but why do you choose to perform PRIMARY, that is, full table scanning from explain? Is there a mistake in this execution plan?

After taking a closer look at the implementation plan, I found something fishy. There is a reconsidering in the implementation plan_ access_ paths_ for_ index_ The ordering choice caught my attention

{
    "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "index_order_summary": {
      "table": "`rebate_order_info`",
      "index_provides_order": true,
      "order_direction": "asc",
      "index": "PRIMARY",    //  You can see that the primary key index is selected
      "plan_changed": true,
      "access_type": "index_scan"
        }
    }
}

This selection means that the index selection optimization is performed again due to sorting. Because our SQL uses id sorting (order by id asc limit 1), the optimizer finally selects PRIMARY, that is, full table scanning. That is, this selection will ignore the previous selection based on index cost. Why is there such an option? The main reasons are as follows:

The short explanation is that the optimizer thinks — or should I say hopes — that scanning the whole table (which is already sorted by the id field) will find the limited rows quick enough, and that this will avoid a sort operation. So by trying to avoid a sort, the optimizer ends-up losing time scanning the table.

It can be seen from this explanation that the main reason is that we use the id based sorting method of order by id asc. The optimizer thinks that sorting is an expensive operation, so in order to avoid sorting, and it thinks that   If the n of limit n is very small, it can be quickly executed even if full table scanning is used. Therefore, it selects full table scanning to avoid id sorting (full table scanning is actually a cluster index scanning based on id primary key, which is sorted based on id)

If this choice is right, it doesn't matter. However, in fact, there are bug s in this optimization! Actual selection idx_uid_stat execution will be much faster (only 28 ms)! Many people on the Internet feed back this problem, and this problem basically only appears in SQL   order by id asc limit n is related to this writing method. If n is relatively small, it is very likely that the full table will be scanned. If n is relatively large, the correct index will be selected.

This bug dates back to 2014. Many people have called on the official to correct it in time. It may be difficult to implement it until MySQL 5.7 and 8.0. Therefore, we should try to avoid this writing method before official repair. If we must use this writing method, what should we do? There are two main solutions

  1. Use force index to force the use of the specified index, as follows:

select * from order_info force index(idx_uid_stat) where uid = 5837661 order by id asc limit 1

Although this writing method is OK, it is not elegant enough. What if the index is abandoned? So there is a second more elegant scheme

  1. Use the order by (id+0) scheme as follows

select * from order_info where uid = 5837661 order by (id+0) asc limit 1

This scheme also allows the optimizer to select the correct index, which is more recommended! Why is this trick OK? Although the SQL is sorted by id, it adds such a time-consuming operation on id (although it only adds a useless 0, it is enough to deceive the optimizer). The optimizer thinks that it will consume more performance based on full table scanning at this time, so it will select the index based on the cost.

Keywords: Database MySQL SQL

Added by synchro_irl on Wed, 27 Oct 2021 14:41:44 +0300