Database index

Indexes

1.1 What does a database index do?

A database index exists to make data queries faster.   /* Important things are worth saying three times */

1.2 What kinds of database indexes are there?

1. Clustered index (primary key index). The rows of the table are stored sorted by the primary key
2. Non-clustered index: an index on ordinary fields
3. Joint index: an index composed of several fields, also known as a composite index

KEY `idx_age_name_sex` (`age`,`name`,`sex`)

The joint index follows the leftmost prefix principle. What does that mean? Suppose a student table has the joint index shown above. Which of the queries A to F below will use the index?

A:select * from student where age = 16 and name = 'Xiao Zhang'
B:select * from student where name = 'Xiao Zhang' and sex = 'male'
C:select * from student where name = 'Xiao Zhang' and sex = 'male' and age = 18
D:select * from student where age > 20 and name = 'Xiao Zhang'
E:select * from student where age != 15 and name = 'Xiao Zhang'
F:select * from student where age = 15 and name != 'Xiao Zhang'

---------------------------------------------------
/*A follows the leftmost matching principle: age is the leftmost column, so A uses the index;
B starts directly from name, so the leftmost matching principle is not followed and the index is not used;
C lists name first, but a condition on age, the leftmost index column, is present. MySQL's optimizer reorders the conditions to where age = 18 and name = 'Xiao Zhang' and sex = 'male', so the leftmost matching principle is followed;
D uses the index for age only: age > 20 is a range condition, and a range on one column stops the index from being used for the columns after it;
E follows the leftmost matching principle in form, but != on age matches almost every row, so MySQL generally does not use the index;
F uses only the age part of the index, not the name part, for the same reason as E: the condition on name is != */
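The answers above follow from how a composite index is ordered. The sketch below is a rough Python model, not MySQL's implementation: it treats idx_age_name_sex as a sorted list of (age, name, sex) tuples and seeks with binary search whenever a condition fixes a leftmost prefix. The sample rows and the helper name prefix_lookup are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical rows of the student table, reduced to the indexed columns.
rows = [
    (18, "Xiao Zhang", "male"),
    (16, "Xiao Zhang", "male"),
    (16, "Xiao Li", "female"),
    (21, "Xiao Zhang", "female"),
    (15, "Xiao Wang", "male"),
]

# A composite index keeps its entries sorted by (age, name, sex).
index = sorted(rows)

def prefix_lookup(index, prefix):
    """Seek to the first entry >= prefix, then read entries while the prefix matches."""
    n = len(prefix)
    pos = bisect_left(index, prefix)  # binary search on the sorted entries
    matches = []
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches

# Query A: age = 16 and name = 'Xiao Zhang' fixes the prefix (age, name).
query_a = prefix_lookup(index, (16, "Xiao Zhang"))

# Query B: name and sex only. Entries with the same name are scattered across
# different ages, so there is nothing to seek with and every entry is checked.
query_b = [e for e in index if e[1:] == ("Xiao Zhang", "male")]
```

Query A touches only the neighbouring entries that share its prefix, while query B has to filter the whole list, which is the model's equivalent of not using the index.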

Supplement:
Suppose a query filters on two columns, area and age, and each has its own single-column index. MySQL will generally use only one of those indexes for the table (index merge aside), which is still faster than a full table scan with no index.

A composite index on the area and age columns is more efficient still. A composite index on (area,age,salary) is equivalent to creating three indexes, (area,age,salary), (area,age) and (area), which is called the leftmost prefix property.

1.3. Create index

You can create an index as part of the CREATE TABLE statement, or add one to an existing table with a separate CREATE INDEX or ALTER TABLE statement.

  1. ALTER TABLE

    1. ALTER TABLE can create an ordinary index, a UNIQUE index or a PRIMARY KEY index.
    
       ALTER TABLE `table_name` ADD INDEX index_name (column_list)
    
       ALTER TABLE `table_name` ADD UNIQUE (column_list)
    
       ALTER TABLE `table_name` ADD PRIMARY KEY (column_list)
       
       /*Here table_name is the name of the table to be indexed and column_list names the columns to index, separated by commas when there is more than one. The index name index_name is optional; if it is omitted, MySQL names the index after the first indexed column. In addition, ALTER TABLE allows several changes to one table in a single statement, so you can create multiple indexes at the same time.*/
    
  2. CREATE INDEX

    CREATE INDEX can add an ordinary index or a UNIQUE index to a table.
    
    CREATE INDEX index_name ON table_name (column_list)
    
    CREATE UNIQUE INDEX index_name ON table_name (column_list)
    
    /*table_name, index_name and column_list have the same meaning as in the ALTER TABLE statement, except that the index name is not optional. In addition, you cannot create a PRIMARY KEY index with the CREATE INDEX statement.*/
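To try the two CREATE INDEX forms without a MySQL server, the sketch below runs them from Python against an in-memory SQLite database. SQLite is only a stand-in here (it does not accept MySQL's ALTER TABLE ... ADD INDEX form), and the table and index names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; students, idx_age_name and uq_sid are made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER, name TEXT, age INTEGER)")

# Ordinary index and UNIQUE index, using the CREATE INDEX forms described above.
conn.execute("CREATE INDEX idx_age_name ON students (age, name)")
conn.execute("CREATE UNIQUE INDEX uq_sid ON students (sid)")

conn.execute("INSERT INTO students VALUES (1, 'Xiao Zhang', 16)")

# The UNIQUE index rejects a second row with the same sid.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Xiao Li', 17)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# List the index names SQLite now holds for the table (column 1 of each row is the name).
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(students)"))
```

The duplicate insert fails because of the UNIQUE index, not because of any constraint declared on the table itself.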
    

1.4 index type

When creating an index, you can specify whether the index can contain duplicate values. If not, the index should be created as a PRIMARY KEY or UNIQUE index. For single column UNIQUE indexes, this ensures that a single column does not contain duplicate values. For multi column UNIQUE indexes, ensure that the combination of multiple values is not repeated.

The PRIMARY KEY index is very similar to the UNIQUE index.
In fact, a PRIMARY KEY index is a UNIQUE index named PRIMARY whose columns must also be NOT NULL. A table can therefore contain only one PRIMARY KEY, because two indexes in one table cannot share a name.

The following SQL statement adds a PRIMARY KEY index on sid to the students table.

ALTER TABLE students ADD PRIMARY KEY (sid)

1.5. Delete index

You can use ALTER TABLE or DROP INDEX statements to drop indexes. Similar to the CREATE INDEX statement, DROP INDEX can be processed as a statement within ALTER TABLE. The syntax is as follows.

DROP INDEX index_name ON table_name
/*Use DROP INDEX as an internal clause of the ALTER TABLE statement*/
ALTER TABLE table_name DROP INDEX index_name
/*Same as above*/
ALTER TABLE table_name DROP PRIMARY KEY

The third statement is used only to drop the PRIMARY KEY index. A table can have only one PRIMARY KEY index, so no index name is needed. If the table has no PRIMARY KEY, current MySQL versions report an error; only very old versions dropped the first UNIQUE index instead.

Dropping a column from the table also affects its indexes. For a multi-column composite index, a dropped column is removed from the index as well. If all the columns that make up an index are dropped, the entire index is dropped.

1.6. View index

mysql> show index from tblname;

mysql> show keys from tblname;
· Table
  The name of the table.
· Non_unique
  0 if the index cannot contain duplicate values, 1 if it can.
· Key_name
  The name of the index.
· Seq_in_index
  The column's sequence number in the index, starting with 1.
· Column_name
  The column name.
· Collation
  How the column is sorted in the index. In MySQL, the value is 'A' (ascending) or NULL (not sorted).
· Cardinality
  An estimate of the number of unique values in the index. It is updated by running ANALYZE TABLE or myisamchk -a. Cardinality is counted from statistics stored as integers, so the value is not necessarily exact even for small tables. The higher the cardinality, the greater the chance that MySQL uses the index when doing joins.
· Sub_part
  The number of indexed characters if the column is only partly indexed, NULL if the entire column is indexed.
· Packed
  Indicates how the key is packed. NULL if it is not packed.
· Null
  Contains YES if the column may contain NULL. If not, the column contains NO.
· Index_type
  The index method used (BTREE, FULLTEXT, HASH, RTREE).
· Comment
  Various remarks about the index.

1.7. When to create an index

  1. Table fields with a unique constraint

    ORACLE uses indexes to enforce data integrity
    
    For example, lc_bh+hj_sx (process number + step sequence) in lc_hj (process steps)
    
  2. Fields queried directly by condition

    Fields used as filter conditions in SQL
    For example, qc_bh (area book number) in zl_yhjbqk (user profile)
    
    select * from zl_yhjbqk where qc_bh='7001'
    
  3. Fields used to join other tables in a query

    These fields often have foreign key relationships
    For example, jldb_bh (metering point meter number) in zl_ydcf (electrical components)
    select * from zl_ydcf a,zl_yhdb b where a.jldb_bh=b.jldb_bh and b.jldb_bh='540100214511'
    
    /*Newer standard syntax*/
    select * from zl_ydcf a inner join zl_yhdb b on a.jldb_bh=b.jldb_bh and b.jldb_bh='540100214511'
    
    /*Note: the two forms differ only in style, and there is little difference in performance, but the newer form makes the join type explicit */
    
  4. Fields used for sorting in a query

    If the sorted fields are read through an index, sorting is much faster

    select * from zl_yhjbqk order by qc_bh   /*create an index on qc_bh*/
    select * from zl_yhjbqk where qc_bh='7001' order by cb_sx   /*create a qc_bh+cb_sx index; note: this is a single index containing both the qc_bh and cb_sx fields*/
    
  5. Fields used for aggregation or grouping in a query

    select max(hbs_bh) from zl_yhjbqk
    
    select qc_bh,count(*) from zl_yhjbqk group by qc_bh
    

1.8. When to create no index, or fewer indexes

  1. Tables with too few records

    If a table has only 5 records and they are accessed through an index, the index must be read first and the data table then reached through it. The index and the data table are generally not in the same data block, so ORACLE has to read data blocks at least twice. Without an index, ORACLE reads all the data in one pass, which is clearly faster than going through the index.

    For example, the table zl_sybm (operating department) generally has only a few records. Indexing any field other than the primary key brings no performance gain. In fact, once statistics have been gathered for this table, ORACLE will not use the index you built and will do a full table access instead. For example:
    
    select * from zl_sybm where sydw_bh='5401'   /*indexing sydw_bh brings no performance gain*/
    
  2. Tables with frequent inserts, deletes and updates

    For business tables that are modified frequently, keep indexes to the minimum the queries allow, for example zl_yhbm, gc_dfss, gc_dfys, gc_fpdy and other business tables.

  3. Table fields with repeated, evenly distributed values

    If a table has 100000 rows and a field A takes only the values T and F, each with a probability of about 50%, indexing field A generally will not improve query speed.

  4. Fields that are often queried together with a main field when the main field is already highly selective

    For example, the gc_dfss (electricity charges received) table is often queried for one collection by charging serial number, household identification number, meter reading date, month and year of the charge, and operation flag. Putting all of these fields into one index increases the time for updates, inserts and deletes. In fact, indexing on the charging serial number alone already narrows a collection down to a few records, so indexing the remaining fields as well makes little difference to query performance.
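For case 3 above, a column's selectivity (distinct values divided by total rows) gives a rough measure of how much an index can narrow a search. The Python sketch below uses made-up data and a hypothetical helper named selectivity; it only illustrates the idea and is not how a database gathers its statistics.

```python
def selectivity(values):
    """Distinct values divided by total rows: near 1.0 is highly selective, near 0 is not."""
    return len(set(values)) / len(values)

# 100000 rows: a T/F flag split evenly, and a unique serial number.
flag_column = ["T" if i % 2 == 0 else "F" for i in range(100_000)]
serial_column = list(range(100_000))

flag_sel = selectivity(flag_column)      # 2 distinct values over 100000 rows
serial_sel = selectivity(serial_column)  # every value is distinct

# An equality lookup on the flag still matches half the table,
# so an index on it saves very little work.
rows_matched_by_flag = flag_column.count("T")
```

A serial number with selectivity 1.0 narrows a lookup to one row, whereas the flag leaves 50000 candidate rows either way.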

1.9. Notes on indexing a MySQL database with tens of millions of rows, and ways to improve performance

  1. Precautions:

    First, check that table space and disk space are sufficient. /*An index is also data*/, so building one inevitably takes up a lot of table space. When indexing a large table, space capacity is therefore the first thing to consider.
    
    Second, the table is locked while the index is being built, so run the operation when the business is idle.
    
  2. Performance adjustment:

    The first consideration is disk I/O. Physically, try to spread indexes and data across different disks (leaving disk arrays aside). Logically, keep the data table space separate from the index table space. This is the basic rule to follow when building an index.

    Second, building an index requires a full table scan, so consider increasing the initialization parameter db_file_multiblock_read_count (this and the parameters below are Oracle parameters). It is typically set to 32 or greater.

    Third, besides the full table scan, building an index also sorts a large amount of data, so the size of the sort area should be adjusted.

    Before 9i, you can increase sort_area_size at the session level, for example to 100m or more.

    From 9i on, if the initialization parameter workarea_size_policy is set to AUTO, the sort area is allocated automatically from pga_aggregate_target.

    Finally, you can add the nologging option when building the index. This reduces the large amount of redo generated during the build and so speeds it up.

1.10. Points to note when optimizing MySQL indexes

  1. Create index

    Indexes are particularly important for query-dominated applications. Performance problems are often caused simply by forgetting to add an index, or by not adding a more effective one. Without an index, finding even a single specific row requires a full table scan. If a table holds a large amount of data and few rows qualify, the missing index causes severe performance degradation.
    However, an index is not always worthwhile. For example, gender may have only two values; an index on it brings no benefit and slows down updates. This is called over-indexing.

  2. Composite index

    For example, there is a statement like this:

      select * from users where area='beijing' and age=22;
    

    If we create single-column indexes on area and age separately, MySQL will generally use only one of them for this query (index merge aside). That is already faster than a full table scan with no index, but a composite index on the area and age columns is more efficient still. If we create a composite index on (area,age,salary), it is equivalent to creating three indexes, (area,age,salary), (area,age) and (area), which is called the leftmost prefix property.
    Therefore, when creating a composite index, put the column most often used as a constraint on the far left, followed by the others in decreasing order of use.

  3. Avoid NULL values in indexed columns

    In Oracle, a row whose indexed columns are all NULL is not stored in a B-tree index. MySQL does index NULL values, but nullable columns make indexes, statistics and comparisons more complicated. Therefore, when designing the database, declare fields NOT NULL where possible and do not let their default value be NULL.

  4. Use short index

    When indexing a string column, specify a prefix length if possible. For example, for a CHAR(255) column, if most values are unique within the first 10 or 20 characters, do not index the entire column. A short index not only speeds up queries but also saves disk space and I/O.

  5. Index problem of sorting

    MySQL generally uses only one index per table in a query, so if an index is already used for the where clause, the columns in order by will not use another index. Therefore, do not sort at all if the database's default order meets the requirement, and try not to sort on multiple columns. If you must, it is best to create a composite index on those columns.

  6. like statement operation

    Using like is generally discouraged. If it is necessary, how it is used matters: like "%aaa%" does not use an index, while like "aaa%" can.

  7. Do not operate on columns

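    Do not apply functions or calculations to an indexed column in the where clause. A condition that starts like the one below evaluates YEAR() for every row, so an ordinary index on adddate cannot be used; compare the bare column against a constant range instead.

    select * from users where YEAR(adddate)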
    
  8. Avoid NOT IN

    NOT IN generally does not use indexes and performs a full table scan. NOT IN can be replaced by NOT EXISTS.
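The difference in point 6 between like "aaa%" and like "%aaa%" comes from the index being sorted. The Python sketch below models an indexed string column as a sorted list; the values and the helper names like_prefix and like_infix are made up, and it is an illustration of the principle rather than MySQL's implementation.

```python
from bisect import bisect_left

# Hypothetical indexed string column, kept in sorted order like a B-tree index.
names = sorted(["aaa", "aaab", "aab", "baaa", "caaab", "abc"])

def like_prefix(sorted_values, prefix):
    """like 'prefix%': binary-search the start, then read neighbours while they match."""
    pos = bisect_left(sorted_values, prefix)
    matches = []
    while pos < len(sorted_values) and sorted_values[pos].startswith(prefix):
        matches.append(sorted_values[pos])
        pos += 1
    return matches

def like_infix(values, fragment):
    """like '%fragment%': matches can sit anywhere in the order, so every value is checked."""
    return [v for v in values if fragment in v]

prefix_hits = like_prefix(names, "aaa")
infix_hits = like_infix(names, "aaa")
```

All values starting with "aaa" are adjacent in the sorted order, so one seek finds them, while values merely containing "aaa" are scattered and force a full pass.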

2.0 Extension

The original purpose of index technology is to quickly find, in a large data set, the records whose field equals a given value (for example, finding a person by ID number). For a data set of N rows, a traversal search needs N comparisons. If the data is ordered by the field value (called the key value in an index), a binary tree can be built and the record found by bisection in only log N (base 2) comparisons. For example, 1 billion rows need only about 30 comparisons (1 billion is about 2^30), which obviously improves performance greatly. Sometimes key values are duplicated (finding people by birth date) or the search is over a key range (finding people within a range of birth dates); the number of comparisons is then larger than log N, but still of the same order of magnitude.
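The log N figure can be checked with a small experiment. The Python sketch below is a plain binary search that counts its probes; the function name and the one-million-key data set are assumptions chosen to keep the run fast, and the last line computes the bound for 1 billion rows directly.

```python
import math

def binary_search(sorted_keys, target):
    """Return (position or None, number of probes made)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# One million sorted keys; range supports len() and indexing without using memory.
keys = range(1_000_000)
position, probes = binary_search(keys, 999_999)

# Upper bound on comparisons for 1 billion rows: log2(10^9) rounded up.
comparisons_for_billion = math.ceil(math.log2(10**9))
```

A traversal would need up to a million comparisons for the same lookup, while the binary search never needs more than 20 on this data set.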

The essence of an index is sorting.

Of course, we generally do not sort the original data set itself. Instead, the key value of each record and the record's position in the data set are stored, in key order, as a smaller data set: the index table. If other fields also need key value search, more indexes can be created. There is only one original data set, but there can be multiple indexes. If every index sorted the original data set, the data set would be copied many times and take up too much space.

In addition, the database must allow for data being inserted and deleted. With a simple sorted index, insertion and deletion are very expensive, so a B-tree is generally used to make updates fast. A B-tree amounts to expanding a binary tree into an n-way tree; in essence it is still ordered by key value. (There is a lot to say about how an index is built, which we will discuss another time. Here we only discuss how an index is used.)

Another approach is the hash index, which computes a hash of each record's key value, mapping it to a natural number in the range 1...k. A search then needs no binary comparison at all: the hash value locates the record directly. The hash method works only for exact key matches and cannot do range searches, because a hash function is not monotonic and loses the ordering of the original keys, but it is sufficient in many scenarios. A hash index is also a sort in essence, only by the hash of the key rather than the key itself. The following discussion uses ordinary key-ordered indexes as the example, and the conclusions also apply to hash indexes.
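The contrast between the two index types can be sketched in a few lines of Python. A dict plays the role of the hash index and a sorted list of (key, position) pairs plays the role of the ordered index; the records and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (birth_year, name). birth_year is the key.
records = [(1990, "Ann"), (1985, "Bob"), (1992, "Cid"), (1988, "Dee")]

# Hash index: key -> positions in the original data set (a dict is a hash table).
hash_index = {}
for pos, (year, _name) in enumerate(records):
    hash_index.setdefault(year, []).append(pos)

# Ordered index: (key, position) pairs sorted by key; the data set itself is not reordered.
sorted_index = sorted((year, pos) for pos, (year, _name) in enumerate(records))
sorted_keys = [year for year, _pos in sorted_index]

def exact_lookup(year):
    """Exact match: the hash of the key locates the positions directly."""
    return [records[p][1] for p in hash_index.get(year, [])]

def range_lookup(low, high):
    """Range match: relies on key order, so it uses the ordered index."""
    lo = bisect_left(sorted_keys, low)
    hi = bisect_right(sorted_keys, high)
    return [records[pos][1] for _year, pos in sorted_index[lo:hi]]

exact_hit = exact_lookup(1988)
range_hits = range_lookup(1986, 1991)
```

The dict answers the exact lookup without any ordering, but it offers no way to enumerate the keys between 1986 and 1991 short of checking every key, which is why the range query goes to the ordered index.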

In principle, an index obviously cannot speed up a traversal of the full data set. Some programmers who do not understand this build indexes to speed up grouping and aggregation, which is misuse.

Keywords: Database MySQL SQL

Added by jcampbell1 on Fri, 12 Nov 2021 13:43:04 +0200