MySQL 8.0 InnoDB supports instant add column

Original address: https://mysqlserverteam.com/mysql-8-0-innodb-now-supports-instant-add-column/

 
For a long time, instant DDL has been one of the most popular InnoDB functions. For the growing and fast-growing data set, any network scale database must have the function of executing DDL immediately.
Developers often need to add new columns to meet changing business needs. The function of add column is the first in a series of instant DDL statements.
Migrating to a new transaction data dictionary in MySQL 8.0 makes this much easier. Before MySQL 8.0, metadata (data dictionary) was stored in a flat file called. frm file. frm file is an incredible format, which has been out of date for a long time.  
The instant add patch is provided by Tencent game database administrator team. We would like to thank Tencent game for its important and timely contribution.

background

MySQL 5.6 is the first version to support replace DDL. Before MySQL 5.6, the only way to execute DDL was to copy lines line by line.
INPLACE DDL is mainly processed by InnoDB, while line by line COPY is processed at the server layer. Until 8.0 (see the experimental version), InnoDB even added columns to the table by rebuilding the table for the replace DDL algorithm.

  • For large tables, it can take a long time, especially in a replication environment.  
  • Disk space requirements will more than double, roughly the same size as existing tables.  
  • DDL operations take up resources, and put forward high requirements for CPU, memory and IO, which compete for resources from user transactions.  
  • If replication is involved, the slave must wait until the DDL completes before synchronization can start.
 
New instant algorithm

Many users asked us how to avoid the time-consuming schema changes. You can now do this by (always) specifying ALGORITHM = install, which guarantees that the operation will complete immediately (if not supported).
In addition, if ALGORITHM is not specified at all, the server will first try the default = install ALGORITHM, if it cannot be completed, the server will try the replace ALGORITHM; if SE cannot support it, the server will finally try the COPY ALGORITHM.
The new syntax is as follows:

ALTER TABLE table_name [alter_specification], ALGORITHM=INSTANT;
The advantage of the install algorithm is that metadata changes are made only in the data dictionary. During SE change, there is no need to get metadata lock and touch the data in the table.
This change also affects the LOCK =... Semantics. There is no need to specify a LOCK for the immediate algorithm. If you use ALGORITHM = install, LOCK cannot be set to any value other than DEFAULT, otherwise an error will occur:
ALTER TABLE t1 ALTER COLUMN i SET DEFAULT 11, ALGORITHM=INSTANT, LOCK=NONE;
ERROR HY000: Incorrect usage of ALGORITHM=INSTANT and LOCK=NONE/SHARED/EXCLUSIVE
# ALGORITHM=INSTANT and LOCK=DEFAULT are OK though.
ALTER TABLE t1 ALTER COLUMN i SET DEFAULT 13, ALGORITHM=INSTANT, LOCK=DEFAULT;
If you set algorithm = install to a DDL that is not supported, an error occurs, as shown below. The idea here is that if it doesn't support it, it will fail directly, instead of default converting and switching to another algorithm behind the scenes.
ALTER TABLE t1 ALTER COLUMN i SET DEFAULT 12, DROP COLUMN j, ALGORITHM=INSTANT;
ERROR 0A000: ALGORITHM=INSTANT is not supported for this operation. Try ALGORITHM=COPY/INPLACE

Currently, Innodb instant DDL supports the following operations

  • Change index option
  • Rename table (in ALTER way)
  • SET/DROP DEFAULT
  • MODIFY COLUMN
  • Add/drop virtual columns
  • Add columns (non generated) - what we call instant DDL
You can specify more than one instant operation in a statement. Here is an example of instant operation
mysql> CREATE TABLE t1 (a INT, b INT, KEY(b));
Query OK, 0 rows affected (0.70 sec)

mysql> # Modify the index can be instant if it's a trivial change
mysql> ALTER TABLE t1 DROP KEY b, ADD KEY b(b) USING BTREE, ALGORITHM = INSTANT; 
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> # Rename the table through ALTER TABLE can be instant
mysql> ALTER TABLE t1 RENAME TO t2, ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.26 sec)

mysql> # SET DEFAULT to a column can be instant
mysql> ALTER TABLE t2 ALTER COLUMN b SET DEFAULT 100, ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> # DROP DEFAULT to a column can be instant
mysql> ALTER TABLE t2 ALTER COLUMN b DROP DEFAULT, ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> # MODIFY COLUMN can be instant
mysql> ALTER TABLE t2 ADD COLUMN c ENUM('a', 'b', 'c'), ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t2 MODIFY COLUMN c ENUM('a', 'b', 'c', 'd', 'e'), ALGORITHM=INSTANT;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> # ADD/DROP virtual column can be instant
mysql> ALTER TABLE t2 ADD COLUMN (d INT GENERATED ALWAYS AS (a + 1) VIRTUAL), ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.38 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t2 DROP COLUMN d, ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.40 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> # Do two operations instantly in the same statement
mysql> ALTER TABLE t2 ALTER COLUMN a SET DEFAULT 20, ALTER COLUMN b SET DEFAULT 200, ALGORITHM = INSTANT;
Query OK, 0 rows affected (0.20 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DROP TABLE t2;
Query OK, 0 rows affected (0.36 sec)
 

How does it work

The problem we face is how to parse the physical records on the page after the metadata changes immediately after adding the columns?
Note that the physical record here refers to the record stored in the leaf page of the clustered index. The existing secondary indexes of clustered indexes are not even affected by non leaf pages (internal nodes of B-tree).
 
InnoDB has two main row formats: redundant row and compact row. Row format dynamics are a smaller variation of compact. Compression and its derived row formats remove some metadata from redundant row formats to save space.
Because of this "space saving" change, when we have to deserialize the data in the physical rows on the page, we always need to find metadata from the internal metadata structure.
In order for the immediate addition of columns to work, we need to add some metadata to the physical records in the format of DYNAMIC and COMPACT rows on the page. This additional metadata is not required for the rendant row format because the number of columns is already stored in the physical record.
 
Additional information is retained in the physical record along with some metadata in the data dictionary.
This is very different from the practice of some downstream hackers based on the same Tencent patch, which stores similar metadata in the fuzzy and unused parts of the table space.
We believe that storing metadata in the appropriate data dictionary table and keeping it consistent on transactions will make it more robust and natural. This new metadata is stored in the physical record.
This new metadata includes a flag stored in info "bits. This new information in info? Bits tracks whether a record is created after the first immediate ADD COLUMN.
We also use info? Bits to track the number of fields / columns in the physical record. The number of columns when the table experiences the first immediate ADD COLUMN and all default values for newly added columns are stored in the data dictionary.
These two pieces of information are stored in the se "private" data column of the data dictionary table.
 
With this additional information, you can now perform the ADD COLUMN operation immediately without modifying any rows in the table. If there is no immediate ADD COLUMN, all rows in the table are in the same format as before.
Any updates to the table will be written in the new format after the ADD COLUMN is issued immediately. Find the default value, if any, from the data dictionary.

In each instant ADD COLUMN, the default value of the newly added COLUMN is tracked separately. The default values for these columns can be changed at any time. Therefore, after rebuilding or truncating the table, you can discard the instant COLUMN count and default values, and you can change the rows in the table to the old format as before.
If the table is a partitioned table, different partitions may have different numbers of immediate columns, and different default values are required.
If some partitions are rebuilt, truncated, or recreated, the rows in the partition can also be changed to the old format as before.
 
Translator's note:
The principle of real-time DDL is rather obscure. The following two diagrams can also be described. In a word, compared with the original newly added fields, the whole table can be rebuilt, and the method of instant plus fields can learn to change metadata to improve performance.
 
The following screenshots are from: https://opensource.actionsky.com/20190620-mysql-add-column/ , delete
1. The process of adding fields without "immediate": after the change of row based storage rules (adding fields), all rows of the entire table need to be rebuilt (regenerated)

2. The process of adding fields in real time: only metadata is modified after the row based storage rule changes (add fields)
,
3. The process of query after "instant" plus fields.

4. Processing of new data after "instant" adding fields

 

How to observe (Instant column)
Users can observe the results of adding columns in real time through the view in information ﹣ schema. More specifically, some new fields will be added to information ﹣ u schema. InnoDB ﹣ tables and information ﹣ u schema. InnoDB ﹣ columns.
Note that there is no need to provide a new observation state for other actions that can be done immediately. See the following example:
mysql> CREATE TABLE t1 (a INT, b INT);
Query OK, 0 rows affected (0.06 sec)
mysql> SELECT table_id, name, instant_cols FROM information_schema.innodb_tables WHERE name LIKE '%t1%';
+----------+---------+--------------+
| table_id | name    | instant_cols |
+----------+---------+--------------+
|     1065 | test/t1 |            0 |
+----------+---------+--------------+
1 row in set (0.22 sec)
 
mysql> SELECT table_id, name, has_default, default_value FROM information_schema.innodb_columns WHERE table_id = 1065;
+----------+------+-------------+---------------+
| table_id | name | has_default | default_value |
+----------+------+-------------+---------------+
|     1065 | a    |           0 | NULL          |
|     1065 | b    |           0 | NULL          |
+----------+------+-------------+---------------+
2 rows in set (0.38 sec)
As you can see, a new column named "instant? Cols" has been introduced into InnoDB? Tables, which represents the number of instant columns,
In InnoDB columns, two new columns about default values are introduced, named "has default" and "default value".
mysql> ALTER TABLE t1 ADD COLUMN c INT, ADD COLUMN d INT DEFAULT 1000, ALGORITHM=INSTANT;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> SELECT table_id, name, instant_cols FROM information_schema.innodb_tables WHERE name LIKE '%t1%';
+----------+---------+--------------+
| table_id | name    | instant_cols |
+----------+---------+--------------+
|     1065 | test/t1 |            2 |
+----------+---------+--------------+
1 row in set (0.03 sec)
 
mysql> SELECT table_id, name, has_default, default_value FROM information_schema.innodb_columns WHERE table_id = 1065;
+----------+------+-------------+---------------+
| table_id | name | has_default | default_value |
+----------+------+-------------+---------------+
|     1065 | a    |           0 | NULL          |
|     1065 | b    |           0 | NULL          |
|     1065 | c    |           1 | NULL          |
|     1065 | d    |           1 | 800003e8      |
+----------+------+-------------+---------------+
4 rows in set (0.36 sec)
Note that the table "ID does not change. This is no longer a table rebuild! As we can see, 'instant'cols is now set to 2, which means that there are columns a and b in the table when the first instant ADD COLUMN occurs.
Remember the default values for columns c and d in InnoDB? Columns. Now, if has default is 1, the user can know if the column has been added immediately.
In addition, if has default is 1, the default value for this column is stored in the default value field. d is set to the internal binary format of value 1000.
mysql> ALTER TABLE t1 ADD COLUMN e VARCHAR(100) DEFAULT 'Hello MySQL!';
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> SELECT table_id, name, instant_cols FROM information_schema.innodb_tables WHERE name LIKE '%t1%';
+----------+---------+--------------+
| table_id | name    | instant_cols |
+----------+---------+--------------+
|     1065 | test/t1 |            2 |
+----------+---------+--------------+
1 row in set (0.03 sec)
 
mysql> SELECT table_id, name, has_default, default_value FROM information_schema.innodb_columns WHERE table_id = 1065;
+----------+------+-------------+--------------------------+
| table_id | name | has_default | default_value            |
+----------+------+-------------+--------------------------+
|     1065 | a    |           0 | NULL                     |
|     1065 | b    |           0 | NULL                     |
|     1065 | c    |           1 | NULL                     |
|     1065 | d    |           1 | 800003e8                 |
|     1065 | e    |           1 | 48656c6c6f204d7953514c21 |
+----------+------+-------------+--------------------------+
5 rows in set (0.36 sec)
After a while, add another column. The table ID remains the same. "instant_cols" will remain unchanged, and the default value of column e will also be remembered.


Side effects and trade-offs of Instant column
There are some side effects because ADD COLUMN no longer rebuilds the table:
  • In older versions, the row size is pre checked, so ADD COLUMN fails at the beginning. However, with the new immediate ADD COLUMN, the row size will only be checked when the row is later updated.
  • In previous versions, if a table or index was corrupted, you could "fix" the problem by rebuilding the table. Adding columns on the fly creates more challenges, and we're looking for ways to mitigate that.
 
Limitations of Instant column
At present, there are some limitations:
  • Only adding columns in one statement is supported, that is, if there are other non install operations in the same statement, they cannot be completed immediately
  • Only the last added column is supported, not among existing columns
  • Rarely used COMPRESSED row format is not supported
  • Tables that already have full-text indexes are not supported
  • Any tables in the DD tablespace are not supported (????)
  • Temporary tables are not supported (supplied with COPY)

How to add fields in the original way
Although instant column addition is supported, users can still add columns in the old way (not instant).
That is, if users intend to add columns by rebuilding tables or copying rows, they can still issue ALTER TABLE using ALGORITHM = replace / COPY, or even specify the FORCE keyword. In this way, the columns are added in the traditional way.


summary
MySQL now supports a new algorithm, ALTER TABLE, called install. Related to this, the most exciting part is that the ADD COLUMN operation can now be completed immediately in InnoDB, so it is no longer a user's problem.
New features can be easily observed. Please try to use this exciting new feature and look forward to feedback on any problems you may encounter!

Thank you for using MySQL!

Keywords: MySQL Database network

Added by coolfool on Tue, 12 May 2020 16:55:02 +0300