Problem Description
R&D colleagues feedback that an application is slow to execute, corresponding to SQL:
UPDATE bs_serial_trac SET SERIAL_IS_LOCK = 0, LOCK_VALUE = '', UPDATE_USER = 'transSys' WHERE GOODS_NO = '4418095740626' AND ( PARENT_CODE = 'F9G7S19722001835' OR SERIAL = 'F9G7S19722001835' );
The index on table bs_serial_trac is as follows:
PRIMARY KEY (`ID`), UNIQUE KEY `idx_complex_serial_goodsNo` (`SERIAL`,`GOODS_NO`), KEY `idx_update_time` (`UPDATE_TIME`), KEY `idx_serial_goodsNo` (`GOODS_NO`), KEY `idx_parent_code` (`PARENT_CODE`), KEY `idx_lock_value` (`LOCK_VALUE`)
Because of the OR condition, the query can only search data based on the condition GOODS_NO='4418095740626'. Its execution plan is as follows:
*************************** 1. row *************************** id: 1 select_type: UPDATE table: bs_serial_trac partitions: NULL type: range possible_keys: idx_serial_goodsNo key: idx_serial_goodsNo key_len: 93 ref: const rows: 404920 filtered: 100.00 Extra: Using where 1 row in set (0.00 sec)
Due to the poor selectivity of GOODS_NO columns and the large number of records satisfying the conditions, the query performance is poor.
SELECT COUNT(1) FROM bs_serial_trac WHERE GOODS_NO = '4418095740626'; +----------+ | COUNT(1) | +----------+ | 215447 | +----------+
Solution steps
Obviously, the problem is that the index cannot be used properly because of OR statement. The OR is adjusted to UNION ALL and the adjusted SQL statement is as follows:
SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND PARENT_CODE = 'F9G7S19722007485' UNION ALL SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' AND PARENT_CODE <> 'F9G7S19722007485'
Query speed is very fast, from the previous 800ms optimization to less than 10ms, it looks perfect, but the query results have no data...
Under simple positioning, it is found that the PARENT_CODE column type is varchar(50) DEFAULT'', and the PARENT_CODE column value is NULL. There are some problems in judging whether the PARENT_CODE column is not equal to NULL.
SELECT NOT(PARENT_CODE <=> 'F9G7S19722007485') AS C1, NOT(PARENT_CODE = 'F9G7S19722007485') AS C2, PARENT_CODE <> 'F9G7S19722007485' AS C3 FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' ; +----+------+------+ | C1 | C2 | C3 | +----+------+------+ | 1 | NULL | NULL | +----+------+------+
Therefore, when you rewrite a UNION ALL statement, you need to rewrite it as follows:
SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND PARENT_CODE = 'F9G7S19722007485' UNION ALL SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' AND NOT(PARENT_CODE <=> 'F9G7S19722007485')
Or rewrite it as:
SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND PARENT_CODE = 'F9G7S19722007485' UNION ALL SELECT * FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' AND (PARENT_CODE <> 'F9G7S19722007485' OR PARENT_CODE IS NULL)
Since PARENT_CODE is not used for index lookup but for data filtering in the second part of UNION ALL query, neither of the two methods will affect query performance.
After confirming that the UNION ALL performance meets the requirements, the UPDATE operation is rewritten as follows:
UPDATE bs_serial_trac SET SERIAL_IS_LOCK = 0, LOCK_VALUE = '', UPDATE_USER = 'transSys' WHERE ID IN( SELECT ID FROM( SELECT ID FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND PARENT_CODE = 'F9G7S19722007485' UNION ALL SELECT ID FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' AND (PARENT_CODE <> 'F9G7S19722007485' OR PARENT_CODE IS NULL) ) AS T1 )
For the implementation plan:
*************************** 1. row *************************** id: 1 select_type: UPDATE table: bs_serial_trac partitions: NULL type: index possible_keys: NULL key: PRIMARY key_len: 8 ref: NULL rows: 13270473 filtered: 100.00 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: <derived3> partitions: NULL type: index_subquery possible_keys: <auto_key0> key: <auto_key0> key_len: 8 ref: func rows: 2 filtered: 100.00 Extra: Using index *************************** 3. row *************************** id: 3 select_type: DERIVED table: bs_serial_trac partitions: NULL type: ref possible_keys: idx_serial_goodsNo,idx_parent_code key: idx_parent_code key_len: 153 ref: const rows: 1 filtered: 5.00 Extra: Using where *************************** 4. row *************************** id: 4 select_type: UNION table: bs_serial_trac partitions: NULL type: const possible_keys: idx_complex_serial_goodsNo,idx_serial_goodsNo,idx_parent_code key: idx_complex_serial_goodsNo key_len: 695 ref: const,const rows: 1 filtered: 100.00 Extra: NULL 4 rows in set (0.00 sec)
Although it is updated according to the primary key ID, the use of IN statement still results in the query being scanned in a full table with poor performance. It is necessary to convert the IN query into INNER JOIN:
UPDATE bs_serial_trac AS T2 INNER JOIN ( SELECT ID FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND PARENT_CODE = 'F9G7S19722007485' UNION ALL SELECT ID FROM bs_serial_trac WHERE GOODS_NO = '4418095740626' AND SERIAL = 'F9G7S19722007485' AND (PARENT_CODE <> 'F9G7S19722007485' OR PARENT_CODE IS NULL) ) AS T1 ON T1.ID=T2.ID SET T2.SERIAL_IS_LOCK = 0, T2.LOCK_VALUE = '', T2.UPDATE_USER = 'transSys'
The revised implementation plan is as follows:
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: UPDATE table: T2 partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: T1.ID rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 2 select_type: DERIVED table: bs_serial_trac partitions: NULL type: ref possible_keys: idx_serial_goodsNo,idx_parent_code key: idx_parent_code key_len: 153 ref: const rows: 1 filtered: 5.00 Extra: Using where *************************** 4. row *************************** id: 3 select_type: UNION table: bs_serial_trac partitions: NULL type: const possible_keys: idx_complex_serial_goodsNo,idx_serial_goodsNo,idx_parent_code key: idx_complex_serial_goodsNo key_len: 695 ref: const,const rows: 1 filtered: 100.00 Extra: NULL 4 rows in set (0.00 sec)
Adjustment can be operated according to the main key normally, and the performance is guaranteed.
<=> operator
<=> : NULL-safe equal. This operator performs an equality comparison like the = operator, but returns 1 rather than NULL if both operands are NULL, and 0 rather than NULL if one operand is NULL. Reference: https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_equal-to