MySQL -- index creation principle

MySQL -- index creation principle

1. Data preparation

Create database and table

create database if not exists testindexdb;

use testindexdb;

create table student_info(
    id int(11) not null auto_increment,
    student_id int(11) not null ,
    name varchar(20) default null,
    course_id int not null ,
    class_id int(11) default null,
    create_time datetime default current_timestamp on update current_timestamp,
    primary key (id)
)engine=innodb auto_increment=1 default charset=utf8;

create table course(
    id int(11) not null auto_increment,
    course_id int not null ,
    course_name varchar(40) default null,
    primary key (id)
)engine=innodb auto_increment=1 default charset=utf8;

Write storage functions to create analog data

Open create function:

set global log_bin_trust_function_creators = 1;

Function 1: randomly generate string function

delimiter //
create function rand_string(n int)
    returns varchar(255)
begin 
    declare chars_str varchar(100) default 
        'abcdefghijklmnopqrstuvwsyzABCDEFGJHIKLMNOPQRSTUVWSYZ';
    declare return_str varchar(255) default '';
    declare i int default 0;
    while i < n do
        set return_str = concat(return_str,substring(chars_str,floor(1+rand()*52),1));
        set i = i + 1;
        end while ;
    return return_str;
end //
delimiter ;

Function 2: random number generation function

delimiter //
create function rand_num(from_num int , to_num int) returns int(11)
begin 
    declare i int default 0;
    set i = floor(from_num+rand()*(to_num - from_num + 1));
    return i;
end //
delimiter ;

Create a stored procedure to insert data

To create a stored procedure for inserting a curriculum:

delimiter //
create procedure insert_course(max_num int)
begin 
    declare i int default 0;
    set autocommit  = 0;
    repeat 
        set i = i + 1;
        insert into course(course_id, course_name) VALUES (rand_num(10000,10100),rand_string(6));
        until i = max_num
    end repeat ;
    commit;
end //
delimiter ;

Create a stored procedure to insert the student information table:

delimiter //
create procedure insert_stu(max_num int)
begin
    declare i int default 0;
    set autocommit = 0; -- Set up manual commit transactions
    repeat -- loop
        set i = i+1; -- assignment
        insert into student_info (course_id,class_id,student_id,name) values (rand_num(10000,10100),
                                                                              rand_num(10000,10200),
                                                                              rand_num(1,200000),
                                                                              rand_string(6));
    until  i = max_num
        end repeat ;
    commit ; -- Commit transaction
end //
delimiter ;

Call stored procedure:

call insert_course(100);

call insert_stu(1000000);

2. Suitable for index creation

1. The value of the field has uniqueness restrictions

Indexes themselves can serve as constraints. For example, unique indexes and primary key indexes can serve as uniqueness constraints. Therefore, in our data table, if a field is unique, we can directly create a unique index or primary key index. In this way, a record can be determined more quickly through the index.

For example, the student number in the student table is a unique field. Establishing a unique index for this field can quickly determine the information of a student. If the name is used, there may be the phenomenon of the same name, thus reducing the query speed.

2. Fields frequently used as where query criteria

A field is often used in the WHERE condition of the SELECT statement, so you need to create an index for this field. Especially in the case of large amount of data, creating a general index can greatly improve the efficiency of data query.

Query time when the index is not built: 10038 pieces of data are queried, and the time is 310ms

select course_id,class_id,name,create_time,student_id from student_info where course_id = 10085;

In course_ Create index on ID field:

create index idx_cid on student_info(course_id);

Execute the same query again: 10038 pieces of data were queried, and the time took 30ms, which improved significantly

3. Frequently GROUP BY and ORDER BY columns

Indexing is to store or retrieve data in a certain order. Therefore, when we use GROUP BY to group query data or ORDER BY to sort data, we need to index the grouped or sorted fields. If there are multiple columns to be sorted, you can establish a composite index on these columns.

If both GROUP BY and ORDER BY are used in the query, a joint index can be established, in which the fields used by GROUP BY are placed in the front and the fields used by ORDER BY are placed in the back.

4. Columns in the WHERE condition during UPDATE and DELETE

Indexes need to be created for the columns in the WHERE condition during UPDATE and DELETE.

Query the data according to a certain condition and then UPDATE or DELETE. If you create an index on the WHERE field, you can greatly improve the efficiency. The principle is that we need to retrieve this record according to the WHERE condition column first, and then UPDATE or DELETE it. If the updated field is a non index field during the UPDATE, the efficiency will be improved more obviously, because the non index field UPDATE does not need to maintain the index.

5. Index creation is required for DISTINCT de duplication field

Sometimes we need to de duplicate a field and use DISTINCT, so creating an index on the de duplicated field will also improve the query efficiency.

6. Create index on multi table join

1. Create an index on the column of WHERE condition, because WHERE is the filter of data condition. If the amount of data is very large, filtering without WHERE condition is very terrible.

2. Create an index on the field used for connection, and the field must be of the same type in multiple tables. Like course_id in student_ Both info table and course table are of type int(11), but one cannot be of type int and the other varchar.

7. Create an index on a small range of data types that use columns

The smaller the data type, the faster the comparison operation during query

The smaller the data type, the smaller the storage space occupied by the index. More records can be put down in one data page, so as to reduce the performance loss caused by disk I/0, which means that more data pages can be cached in memory, so as to speed up the reading and writing efficiency.

When the data type range of the table's primary key is small, it is more suitable to add an index, because not only the primary key value will be stored in the cluster index, but also the primary key value of a record will be stored at the nodes of all other secondary indexes. If the primary key uses a smaller data type, it means saving more storage space and more efficient I/0.

8. Create index using string prefix·

Assuming that our string is very long, it takes a lot of storage space to store a string. When we need to build an index for this character string, we can build an index by intercepting the front part of the field, which is called prefix index.

In this way, although the location of the record cannot be accurately located when looking up the record, the location of the corresponding prefix can be located, and then the complete string value can be queried back to the table according to the primary key value of the record with the same prefix. It not only saves space, but also reduces the comparison time of strings, but also can generally solve the problem of sorting.

9. Columns with high discrimination (high hashing) are suitable for indexing

The cardinality of a column refers to the number of non duplicate data in a column. For example, a column contains values 2, 5, 8, 2, 5, 8, 2, 5, 8. Although there are 9 records, the cardinality of the column is 3. That is, when the number of record rows is fixed, the greater the cardinality of the column, the more scattered the values in the column; The smaller the cardinality of a column, the more concentrated the values in the column. The cardinality index of this column is very important, which directly affects whether we can effectively use the index. It is best to index columns with a large cardinality. Indexing for columns with a small cardinality may not be effective.

You can use the following formula to calculate the discrimination. The closer it is to 1, the better. Generally, more than 33% is a more efficient index.

 select count(distinct a)/count(*) from t1

Expansion: the joint index puts the columns with high discrimination (high hashability) in front.

10. The most frequently used column is placed to the left of the federated index

Best left prefix principle

11. When multiple fields need to be indexed, the joint index is better than the single column index

3. Limit the number of indexes

The index is not the more the better. It should be created according to the query. Although MySQL can build up to 65 indexes (64 secondary indexes + 1 primary key index) on a single ordinary table, it is recommended that the number of indexes on a single table should not exceed 6. Reasons:

  • The more indexes are not the better. Indexes can improve the efficiency of query, but reduce the efficiency of writing data. Sometimes inappropriate indexes can also reduce the efficiency of queries.
  • Each index needs to occupy disk space. The more indexes, the more disk space needed.
  • The index will affect the performance of INSERT, DELETE, UPDATE and other statements, because when the data in the table changes, the index will also be adjusted and updated, which will cause a burden.
  • When selecting how to optimize the query, the optimizer will evaluate each available index according to the unified information to generate the best execution plan. If there are many indexes that can be used for the query at the same time, it will increase the time for MySQL optimizer to generate the execution plan and reduce the query performance.

4. Not suitable for index creation

1. Do not set the index if no field is used in wherw, GROUP BY or ORDER BY

2. It is better not to use indexes for tables with small amount of data

3. It is forbidden to build a separate index for each column in the table

4. Do not build indexes on columns with a large number of duplicates

5. Do not create indexes on frequently updated tables and frequently updated fields

6. Unordered values are not recommended as indexes

  • For example, ID card, UUID, MD5, HASH, unordered factory string, etc.

7. Delete indexes that are no longer used or rarely used

8. Do not define redundant or duplicate indexes

  • Repeat: primary key(id), index(id), unique index(id)

  • Redundancy: index(a,b,c), index(a,b), index(a),

  • Duplicate and redundant indexes will reduce query efficiency, because MySQL query optimizer will not know which index to use.

Keywords: Database MySQL index

Added by Sephiriz on Thu, 24 Feb 2022 11:23:00 +0200