sql practical optimization and misunderstanding

sql common pit + optimization

How many of the eight common SQL misuses did you get?
Alibaba cloud RDS expert delegation Java 2022-02-28 09:09

Introduction: Alibaba cloud RDS expert service team has helped cloud customers solve many urgent problems. Some common SQL problems in ApsaraDB expert diagnosis report are summarized as follows for your reference.

MySQL has maintained a strong growth trend in database popularity in recent years. More and more customers build their applications on MySQL database, and even migrate from Oracle to MySQL. However, some customers encounter some problems in the process of using MySQL database, such as slow response time, full CPU, etc.

Rate limiting is a very practical function in Nginx, but it is often misunderstood and misconfigured. We can use it to limit the number of HTTP requests a user can make at a given time. Request can be a simple website header

LIMIT statement
Paging query is one of the most common scenarios, but it is also often the most prone to problems. For example, for the following simple statements, the general idea of DBA is in type, name and create_ Add a composite index to the time field. In this way, conditional sorting can effectively use the index and improve the performance rapidly.

SELECT * 
FROM   operation 
WHERE  type = 'SQLStats' 
       AND name = 'SlowLog' 
ORDER  BY create_time 
LIMIT  1000, 10;

Well, maybe more than 90% of DBA s solve this problem, that's all. However, when the LIMIT clause becomes "LIMIT 1000000,10", programmers will still complain: why is it still slow when I only take 10 records?

You should know that the database does not know where the 1000000 record starts. Even if there is an index, it needs to be calculated from scratch. In most cases, programmers are lazy when this performance problem occurs. In scenarios such as front-end data browsing and page turning, or batch export of big data, the maximum value of the previous page can be used as a parameter as a query condition. The SQL is redesigned as follows:

SELECT   * 
FROM     operation 
WHERE    type = 'SQLStats' 
AND      name = 'SlowLog' 
AND      create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

Under the new design, the query time is basically fixed and will not change with the increase of the amount of data.

Implicit conversion
Another common error in SQL statements is that the types of query variables and field definitions do not match. For example, the following statement:

mysql> explain extended SELECT * 
     > FROM   my_balance b 
     > WHERE  b.bpn = 14000000123 
     >       AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

The field bpn is defined as varchar(20). MySQL's strategy is to convert strings into numbers and then compare them. Function acts on table fields, and the index is invalid.

The above situation may be the parameters automatically filled in by the application framework, rather than the original intention of the programmer. At present, there are many application frameworks, which are very complicated. While it is convenient to use, be careful that it may dig holes for yourself.

Association update and deletion
Although MySQL 5 6 introduces the materialization feature, but it needs special attention that it is only aimed at the optimization of query statements at present. For update or deletion, you need to rewrite it into JOIN manually.

For example, in the following UPDATE statement, MySQL actually executes a required subquery, and its execution time can be imagined.

UPDATE operation o 
SET    status = 'applying' 
WHERE  o.id IN (SELECT id 
                FROM   (SELECT o.id, 
                               o.status 
                        FROM   operation o 
                        WHERE  o.group = 123 
                               AND o.status NOT IN ( 'done' ) 
                        ORDER  BY o.parent, 
                                  o.id 
                        LIMIT  1) t);

Execution plan:

+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                                               |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY            | o     | index |               | PRIMARY | 8       |       | 24   | Using where; Using temporary                        |
| 2  | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
| 3  | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1    | Using where; Using filesort                         |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After rewriting to JOIN, the selection mode of sub query changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.

UPDATE operation o 
       JOIN  (SELECT o.id, 
                            o.status 
                     FROM   operation o 
                     WHERE  o.group = 123 
                            AND o.status NOT IN ( 'done' ) 
                     ORDER  BY o.parent, 
                               o.id 
                     LIMIT  1) t
         ON o.id = t.id 
SET    status = 'applying'

The implementation plan is simplified to:

+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                                               |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1  | PRIMARY     |       |      |               |       |         |       |      | Impossible WHERE noticed after reading const tables |
| 2  | DERIVED     | o     | ref  | idx_2,idx_5   | idx_5 | 8       | const | 1    | Using where; Using filesort                         |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

Mixed sorting
MySQL cannot use indexes for mixed sorting. However, in some scenarios, there are opportunities to use special methods to improve performance.

SELECT * 
FROM   my_order o 
       INNER JOIN my_appraise a ON a.orderid = o.id 
ORDER  BY a.is_reply ASC, 
          a.appraise_time DESC 
LIMIT  0, 20

The execution plan is displayed as a full table scan:

+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref      | rows    | Extra    
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
|  1 | SIMPLE      | a     | ALL    | idx_orderid | NULL    | NULL    | NULL    | 1967647 | Using filesort |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY     | PRIMARY | 122     | a.orderid |       1 | NULL           |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Because is_reply has only two states: 0 and 1. After we rewrite it according to the following method, the execution time is reduced from 1.58 seconds to 2 milliseconds.

SELECT * 
FROM   ((SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 0 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20) 
        UNION ALL 
        (SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 1 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20)) t 
ORDER  BY  is_reply ASC, 
          appraisetime DESC 
LIMIT  20;

EXISTS statement
When MySQL treats the EXISTS clause, it still adopts the execution method of nested subquery. As shown in the following SQL statement:

SELECT *
FROM   my_neighbor n 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND EXISTS(SELECT 1 
                  FROM   message_info m 
                  WHERE  n.id = m.neighbor_id 
                         AND m.inuser = 'xxx') 
       AND n.topic_type <> 5

The implementation plan is:

+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type        | table | type | possible_keys     | key   | key_len | ref   | rows    | Extra   |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
|  1 | PRIMARY            | n     | ALL  |  | NULL     | NULL    | NULL  | 1086041 | Using where                   |
|  1 | PRIMARY            | sra   | ref  |  | idx_user_id | 123     | const |       1 | Using where          |
|  2 | DEPENDENT SUBQUERY | m     | ref  |  | idx_message_info   | 122     | const |       1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Removing exists and changing it to join can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.

SELECT *
FROM   my_neighbor n 
       INNER JOIN message_info m 
               ON n.id = m.neighbor_id 
                  AND m.inuser = 'xxx' 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND n.topic_type <> 5

New implementation plan:

+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref   | rows | Extra                 |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
|  1 | SIMPLE      | m     | ref    | | idx_message_info   | 122     | const    |    1 | Using index condition |
|  1 | SIMPLE      | n     | eq_ref | | PRIMARY   | 122     | ighbor_id |    1 | Using where      |
|  1 | SIMPLE      | sra   | ref    | | idx_user_id | 123     | const     |    1 | Using where           |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

Push under condition
External query conditions cannot be pushed down to complex views or sub queries:

Aggregate sub query;
Subquery with LIMIT;
Sub query UNION or all;
Sub query in the output field;
As shown in the following statement, it can be seen from the execution plan that its conditions act after the aggregate subquery:

SELECT * 
FROM   (SELECT target, 
               Count(*) 
        FROM   operation 
        GROUP  BY target) t 
WHERE  target = 'rm-xxxx'

+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table      | type  | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
|  1 | PRIMARY     | <derived2> | ref   | <auto_key0>   | <auto_key0> | 514     | const |    2 | Using where |
|  2 | DERIVED     | operation  | index | idx_4         | idx_4       | 519     | NULL  |   20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

It is determined that the query criteria can be directly pushed down and rewritten as follows:

SELECT target, 
       Count(*) 
FROM   operation 
WHERE  target = 'rm-xxxx' 
GROUP  BY target

The execution plan becomes:

+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

For a detailed explanation that MySQL external conditions cannot be pushed down, please refer to the previous article: push to materialized table under MySQL · performance optimization · conditions

Narrow the scope in advance
Start with the initial SQL statement:

SELECT * 
FROM   my_order o 
       LEFT JOIN my_userinfo u 
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15

The original meaning of this SQL statement is: first make a series of left connections, and then sort the first 15 records. It can also be seen from the execution plan that the last step estimates that the number of sorting records is 900000 and the time consumption is 12 seconds.

+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows   | Extra                                              |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
|  1 | SIMPLE      | o     | ALL    | NULL          | NULL    | NULL    | NULL            | 909119 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | SIMPLE      | p     | ALL    | PRIMARY       | NULL    | NULL    | NULL            |      6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+

Since the last WHERE condition and sorting are for the leftmost main table, you can sort my first_ Order sorting reduces the amount of data in advance, and then makes left connection. After SQL rewriting, the execution time is reduced to about 1 millisecond.

SELECT * 
FROM (
SELECT * 
FROM   my_order o 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15
) o 
     LEFT JOIN my_userinfo u 
              ON o.uid = u.uid 
     LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
ORDER BY  o.selltime DESC
limit 0, 15

Recheck the execution plan: after materializing the sub query (select_type=DERIVED), participate in the JOIN. Although the estimated row scan is still 900000, the actual execution time becomes very small after using the index and LIMIT clause.

+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref   | rows   | Extra                                              |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL  |     15 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | u          | eq_ref | PRIMARY       | PRIMARY | 4       | o.uid |      1 | NULL                                               |
|  1 | PRIMARY     | p          | ALL    | PRIMARY       | NULL    | NULL    | NULL  |      6 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | o          | index  | NULL          | idx_1   | 5       | NULL  | 909112 | Using where                                        |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

Intermediate result set push down
Let's look at the following example that has been preliminarily optimized (the main table in the left connection takes precedence over the query criteria):

SELECT    a.*, 
          c.allocated 
FROM      ( 
              SELECT   resourceid 
              FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
              SELECT   resourcesid， sum(ifnull(allocation, 0) * 12345) allocated 
              FROM     my_resources 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

Are there any other problems with this statement? It is not difficult to see that sub query c is a full table aggregate query, which will lead to the performance degradation of the whole statement when the number of tables is particularly large.

In fact, for sub query c, the final result set of the left connection only cares about the data that can match the resourceid of the main table. Therefore, we can rewrite the following statement to reduce the execution time from 2 seconds to 2 milliseconds.

SELECT    a.*, 
          c.allocated 
FROM      ( 
                   SELECT   resourceid 
                   FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
                   SELECT   resourcesid， sum(ifnull(allocation, 0) * 12345) allocated 
                   FROM     my_resources r, 
                            ( 
                                     SELECT   resourceid 
                                     FROM     my_distribute d 
                                     WHERE    isdelete = 0 
                                     AND      cusmanagercode = '1234567' 
                                     ORDER BY salecode limit 20) a 
                   WHERE    r.resourcesid = a.resourcesid 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

But subquery a appears many times in our SQL statement. This writing method not only has additional overhead, but also makes the whole sentence more complicated. Rewrite again using the WITH statement:

WITH a AS 
( 
         SELECT   resourceid 
         FROM     my_distribute d 
         WHERE    isdelete = 0 
         AND      cusmanagercode = '1234567' 
         ORDER BY salecode limit 20)
SELECT    a.*, 
          c.allocated 
FROM      a 
LEFT JOIN 
          ( 
                   SELECT   resourcesid， sum(ifnull(allocation, 0) * 12345) allocated 
                   FROM     my_resources r, 
                            a 
                   WHERE    r.resourcesid = a.resourcesid 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

summary

The database compiler generates an execution plan, which determines the actual execution mode of SQL. However, the compiler is only trying its best to serve, and the compilers of all databases are not perfect. Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of database compiler can we avoid its shortcomings and write high-performance SQL statements.
When programmers design data models and write SQL statements, they should bring the idea or consciousness of algorithms into them.
Develop the habit of using WITH statements when writing complex SQL statements. Simple and clear SQL statements can also reduce the burden of the database ^ ^.
If you encounter difficulties in using cloud database (not limited to SQL problems), please feel free to seek the help of Alibaba cloud's original expert services.

Keywords: MySQL Mybatis SQL

Added by ashutosh.titan on Mon, 07 Mar 2022 11:31:39 +0200

Programming VIP

sql practical optimization and misunderstanding

sql common pit + optimization

summary

Popular Keywords