Function index (Generated Column) and SQL optimization in MySQL

There is no function index function of Oracle in MySQL. It is not accurate to call the Generated Column of MySQL as "function index", but it can achieve the same effect as the function index. Some people also call this feature "derived column".

What is Generated Column

The value of Generated Column is calculated according to the expression defined by it. Here is a brief introduction using the example in the official document.

There is a table to store the three side lengths of a right triangle. As we all know, according to the side length formula of a right triangle, the length of the hypotenuse can be calculated by the other two side lengths, so that only two right angles can be stored in the table, and the hypotenuse is defined by the Generated Column to create this table and insert a piece of data:

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);
INSERT INTO triangle (sidea, sideb) VALUES(1,1),(3,4),(6,8);

Side a and side B are two right angle edges, and side C is a beveled edge. When inserting, only two right angle edges need to be inserted. That is to say, the Generated Column cannot be operated manually (insert, update, delete). It will be automatically calculated according to its definition expression.

Query this table:

mysql> SELECT * FROM triangle;
+-------+-------+--------------------+
| sidea | sideb | sidec              |
+-------+-------+--------------------+
|     1 |     1 | 1.4142135623730951 |
|     3 |     4 |                  5 |
|     6 |     8 |                 10 |
+-------+-------+--------------------+

Generated Column definition syntax

The definition syntax of Generated Column is as follows:

col_name data_type [GENERATED ALWAYS] AS (expr)
  [VIRTUAL | STORED] [NOT NULL | NULL]
  [UNIQUE [KEY]] [[PRIMARY] KEY]
  [COMMENT 'string']

The keyword "AS" indicates that this field is derived. It is Generated Column. The expression used for calculation is followed by AS. GENERATED ALWAYS makes the definition clearer and can be omitted.

VIRTUAL and STORED are two types of Generated Column, indicating how the value of the field is STORED:

  • Virtual: the value of virtual generated column will not be persisted to disk, only saved in the data dictionary (metadata of the table), and will be calculated immediately after the BEFORE trigger every time it is read.
  • STORED: the value of the Stored Generated Column is persisted to disk, not computed each time it is read.

If not specified, MySQL will be implemented in the form of VIRTUAL by default. STORED requires more disk space and has no obvious performance advantage. Therefore, VIRTUAL is generally used.

Generated Column definition points

  • In general, Generated Column can be defined with built-in functions and operators. Given the same data, multiple calls will produce the same result, which is explicitly allowed. Otherwise, the definition will fail, such as the definition using NOW(), current ﹣ user(), connection ﹣ id().
  • Custom functions and stored procedures are not allowed.
  • Variables, such as system variables, custom variables, etc., are not allowed.
  • Subqueries are not allowed.
  • The Generated Column definition can rely on other Generated Column fields, but the dependent derived field must be defined before it. If you rely only on non derived fields, there is no requirement for the definition order.
  • Auto increment is not allowed.
  • Self growing column cannot be used in the definition of Generated Column.
  • Starting from MySQL 5.7.10, if expression evaluation results in truncation or provides incorrect input to the function, the create table statement will terminate and return the DDL operation.

A SQL optimization

Find a slow SQL through the slow query log. The execution plan is as follows:

mysql> EXPLAIN
SELECT
    c.id,
    b.customer_status
FROM
    t_core_customer c
    INNER JOIN t_core_customer_bizinfo b ON c.id = b.customer_id AND b.biz_id = 'maintain' 
WHERE
    REPLACE ( REPLACE ( c.customer_name, '(', '(' ), ')', ')' ) = 'Tianjin maisikou Technology Co., Ltd';
+----+-------------+-------+------------+--------+----------------------------------+---------+---------+--------------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys                    | key     | key_len | ref                            | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+----------------------------------+---------+---------+--------------------------------+---------+----------+-------------+
|  1 | SIMPLE      | b     | NULL       | ALL    | idx_core_customer_bizinfo_cidbid | NULL    | NULL    | NULL                           | 1263918 |    10.00 | Using where |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY                          | PRIMARY | 110     | b.customer_id                  |       1 |   100.00 | Using where |
+----+-------------+-------+------------+--------+----------------------------------+---------+---------+--------------------------------+---------+----------+-------------+
2 rows in set (0.05 sec)

There are 1.17 million rows of data in the customer table, which takes more than 4 seconds to execute. From the execution plan, we can see that the customer table does not go through the index, and the index of the customer name field is not used because of the replace function.

Add Generated Column:

ALTER TABLE `t_core_customer` 
ADD COLUMN `customer_name_replaced` varchar(200)  AS (REPLACE(REPLACE(customer_name, '(', '(' ), ')', ')' )); 

Create index:

ALTER TABLE `t_core_customer` 
ADD INDEX `customer_name_replaced`(`customer_name_replaced`) USING BTREE;

See the execution plan after optimization:

mysql> EXPLAIN
SELECT
    c.id,
    b.customer_status
FROM
    t_core_customer c
    INNER JOIN t_core_customer_bizinfo b ON c.id = b.customer_id AND b.biz_id = 'maintain' 
WHERE
    REPLACE ( REPLACE ( c.customer_name, '(', '(' ), ')', ')' ) = 'Tianjin maisikou Technology Co., Ltd';
+----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+-----------------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys                    | key                              | key_len | ref                         | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+-----------------------------+------+----------+-------+
|  1 | SIMPLE      | c     | NULL       | ref  | PRIMARY,customer_name_replaced   | customer_name_replaced           | 603     | const                       |    1 |   100.00 | NULL  |
|  1 | SIMPLE      | b     | NULL       | ref  | idx_core_customer_bizinfo_cidbid | idx_core_customer_bizinfo_cidbid | 222     | c.id,const                  |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+----------------------------------+----------------------------------+---------+-----------------------------+------+----------+-------+
2 rows in set (0.40 sec)

The execution plan is normal, the index is used, and the SQL takes less than 10 milliseconds.

Keywords: MySQL SQL Oracle less

Added by bettyatolive on Sun, 08 Mar 2020 11:44:01 +0200