Learning MongoDB: its rich set of indexes

MongoDB indexes and MySQL indexes serve much the same purpose and follow similar optimization principles. MySQL index types can be broadly divided into:

  • Single-key index vs. compound (joint) index
  • Primary key index (clustered index) vs. non-primary-key index (non-clustered index)

In addition to these basic categories, MongoDB has some special index types, such as array (multikey) indexes, sparse indexes, geospatial indexes, TTL indexes, and so on

To make the tests below easier, we use this script to insert some sample data

for(var i = 0;i < 100000;i++){
    db.users.insertOne({
        username: "user"+i,
        age: Math.random() * 100,
        sex: i % 2,
        phone: 18468150001+i
    });
}

Single key index

A single-key index is the most basic kind of index: it covers exactly one field

Use the username field in the collection to create a single-key index. MongoDB automatically names the index username_1

db.users.createIndex({username:1})
'username_1'

After creating the index, check the plan of a query on the username field. The stage is IXSCAN, which indicates that an index scan is used

db.users.find({username:"user40001"}).explain()
{ 
   queryPlanner: 
   { 
     winningPlan: 
     { 
        ......
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { username: 1 },
           indexName: 'username_1',
           ......
        } 
     },
     rejectedPlans: []
   },
   ......
   ok: 1 
}

An important principle of index optimization is that indexes should be built on fields with high cardinality. Cardinality is the number of distinct values in a field. For example, if age is stored as an integer from 0 to 99, the age field has 100 distinct values, so its cardinality is 100 (the script above actually generates fractional ages, so the cardinality of age in our test data is far higher). The sex field has only two values, 0 and 1, so its cardinality is 2, which is very low. An index on such a field is not selective, brings little benefit, and the query planner may not use it at all
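To make the idea of cardinality concrete, here is a minimal sketch in plain JavaScript (it runs in Node.js and does not talk to MongoDB). The sampleUsers array and the cardinality helper are illustrative assumptions, and ages are integers here so that the numbers match the explanation above.

```javascript
// Count the distinct values of one field across an array of documents.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Hypothetical sample that mirrors the users collection, with integer ages.
const sampleUsers = [];
for (let i = 0; i < 1000; i++) {
  sampleUsers.push({ username: "user" + i, age: i % 100, sex: i % 2 });
}

const report = ["username", "age", "sex"].map((field) => {
  const distinct = cardinality(sampleUsers, field);
  return {
    field,
    distinct,
    // Average number of documents returned by an equality match on this field.
    docsPerValue: sampleUsers.length / distinct,
  };
});
console.log(report);
```

The fewer documents an equality match returns on average, the more an index on that field narrows the search; for sex, every lookup still touches half of the collection.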

Next, create an index on the sex field. In this test, the execution plan shows that the query uses a full collection scan (COLLSCAN) instead of the index

db.users.createIndex({sex:1})
'sex_1'

db.users.find({sex:1}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'COLLSCAN',
        filter: { sex: { '$eq': 1 } },
        direction: 'forward' 
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}

Compound (joint) index

A compound index, also called a joint index, is an index that contains multiple fields. Next, use age and sex to create one

db.users.createIndex({age:1,sex:1})
'age_1_sex_1'

Then we query on these two fields and view the execution plan: the query uses the index as expected

db.users.find({age:23,sex:1}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { age: 1, sex: 1 },
           indexName: 'age_1_sex_1',
           .......
           indexBounds: { age: [ '[23, 23]' ], sex: [ '[1, 1]' ] } 
        } 
     },
     rejectedPlans: [], 
  },
  ......
  ok: 1 
 }
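The indexBounds in the plan above can be pictured with a small in-memory model. This is only a sketch under simplifying assumptions: the entries are kept in a sorted array instead of a B-tree, and buildCompoundIndex, lowerBound and findByAgeAndSex are hypothetical helper names, not MongoDB APIs.

```javascript
// Hypothetical in-memory model: each index entry is [age, sex, docId],
// sorted by age first and by sex second, like the { age: 1, sex: 1 } key pattern.
function buildCompoundIndex(docs) {
  return docs
    .map((doc, docId) => [doc.age, doc.sex, docId])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2]);
}

// Binary search for the first entry whose (age, sex) pair is >= the target pair.
function lowerBound(entries, age, sex) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const [a, s] = entries[mid];
    if (a < age || (a === age && s < sex)) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect the ids of documents with age == age and sex == sex.
function findByAgeAndSex(entries, age, sex) {
  const ids = [];
  for (let i = lowerBound(entries, age, sex); i < entries.length; i++) {
    const [a, s, id] = entries[i];
    if (a !== age || s !== sex) break; // left the [age, age] x [sex, sex] bounds
    ids.push(id);
  }
  return ids;
}

const people = [
  { age: 23, sex: 1 },
  { age: 23, sex: 0 },
  { age: 40, sex: 1 },
  { age: 23, sex: 1 },
];
const compoundEntries = buildCompoundIndex(people);
const matchingIds = findByAgeAndSex(compoundEntries, 23, 1);
console.log(matchingIds);
```

Because all entries with the same (age, sex) pair sit next to each other, an equality match on both fields only has to read one contiguous slice of the index.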

Array index

An array index is an index created on an array field; MongoDB calls it a multikey index. To test it, add array fields to some of the documents in the users collection

db.users.updateOne({username:"user1"},{$set:{hobby:["sing","Basketball","rap"]}})
......

Create an array index and view the execution plan of a query that uses it. Note that isMultiKey: true indicates that the index used is a multikey index

db.users.createIndex({hobby:1})
'hobby_1'

db.users.find({hobby:{$elemMatch:{$eq:"go fishing"}}}).explain()
{ 
   queryPlanner: 
   { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        filter: { hobby: { '$elemMatch': { '$eq': 'go fishing' } } },
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { hobby: 1 },
           indexName: 'hobby_1',
           isMultiKey: true,
           multiKeyPaths: { hobby: [ 'hobby' ] },
           ......
           indexBounds: { hobby: [ '["go fishing", "go fishing"]' ] }
        }
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}

Compared with other indexes, an array index has many more entries and takes up correspondingly more space, because each array element gets its own entry. For example, if the hobby array of each document holds 10 elements on average, the hobby index of this collection will have roughly 10 times as many entries as an ordinary index
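The growth in entries can be estimated with a small sketch. It is a simplified model, not MongoDB's key generation code: countIndexEntries is a hypothetical helper, and it assumes that duplicate elements inside one array share a single entry and that an empty array still produces one entry.

```javascript
// Count index entries for a field: one per distinct array element,
// or a single entry when the value is not an array.
// Assumptions: duplicates within one array are stored once,
// and an empty array still yields one entry.
function countIndexEntries(docs, field) {
  let total = 0;
  for (const doc of docs) {
    const value = doc[field];
    total += Array.isArray(value) ? Math.max(new Set(value).size, 1) : 1;
  }
  return total;
}

const hobbyDocs = [
  { username: "user1", hobby: ["sing", "Basketball", "rap"] },
  { username: "user2", hobby: ["go fishing", "Table Tennis"] },
  { username: "user3", hobby: "sing" },
];

const usernameEntries = countIndexEntries(hobbyDocs, "username");
const hobbyEntries = countIndexEntries(hobbyDocs, "hobby");
console.log({ usernameEntries, hobbyEntries });
```

Three documents produce three entries in the username index but six in the hobby index, which is the multiplication the paragraph above describes.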

Compound array index

A compound array index is a compound index that contains an array field. A compound index cannot cover more than one array field, that is, at most one of the indexed fields may hold an array in any given document. This avoids explosive growth of index entries: if one index covered two array fields of sizes n and m, it would need n*m times as many entries as an ordinary index
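The n*m growth can be illustrated with a hypothetical calculation: what one document would cost if an index were allowed to cover two array fields. The tags field and the crossProductKeys helper are assumptions made up for this sketch.

```javascript
// Hypothetical: the keys one document would need if a compound index
// could cover two array fields (every pairing of their elements).
function crossProductKeys(doc, fieldA, fieldB) {
  const keys = [];
  for (const a of doc[fieldA]) {
    for (const b of doc[fieldB]) {
      keys.push([a, b]);
    }
  }
  return keys;
}

const user = {
  hobby: ["sing", "Basketball", "rap"], // n = 3
  tags: ["a", "b", "c", "d"], // hypothetical second array field, m = 4
};

const pairKeys = crossProductKeys(user, "hobby", "tags");
console.log(pairKeys.length);
```

A single document with 3 hobbies and 4 tags would already need 12 keys, and the product grows quickly as the arrays get longer.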

Geospatial index

Add some geographic information to the original users collection

for(var i = 0;i < 100000;i++){
    db.users.updateOne(
    {username:"user"+i},
    {
        $set:{
            location:{
                type: "Point",
                coordinates: [100+Math.random() * 4,40+Math.random() * 3]
            }
        }
    });
}

Create a 2dsphere geospatial index

db.users.createIndex({location:"2dsphere"})
'location_2dsphere'

//Query people within 500 meters
db.users.find({
  location:{
    $near:{
      $geometry:{type:"Point",coordinates:[102,41.5]},
      $maxDistance:500
    }
  }
})

The type field of a GeoJSON object can take several values, including Point, LineString, Polygon, etc
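To get a feel for what "within 500 meters" means for [longitude, latitude] pairs, the following sketch filters points by great-circle distance in plain JavaScript. It is an approximation under stated assumptions: it uses the haversine formula with a mean Earth radius, it is not how MongoDB evaluates $near internally, and the candidate documents are made up.

```javascript
const EARTH_RADIUS_METERS = 6371008.8; // mean Earth radius (an approximation)

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in meters between two [longitude, latitude] pairs.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
}

const center = [102, 41.5];
// Hypothetical documents shaped like the ones in the users collection.
const candidates = [
  { username: "near", location: { type: "Point", coordinates: [102.003, 41.501] } },
  { username: "far", location: { type: "Point", coordinates: [102.1, 41.5] } },
];

const within500 = candidates.filter(
  (doc) => haversineMeters(center, doc.location.coordinates) <= 500
);
console.log(within500.map((doc) => doc.username));
```

Note that GeoJSON coordinates are ordered longitude first, then latitude, which is why the helper destructures them in that order.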

TTL index

TTL stands for time to live; a TTL index is mainly used to delete expired data automatically. To use it, the documents need a date-type field, and you set expireAfterSeconds when creating the TTL index on that field. The expiration time is given in seconds. After the index is created, MongoDB periodically checks the data in the collection. When:

$$ \text{current time} - \text{value of the TTL-indexed field} > \text{expireAfterSeconds} $$

MongoDB will automatically delete these documents. This index also has the following requirements:

  • A TTL index can only have one field; there is no compound TTL index
  • TTL indexes cannot be used on capped collections
  • TTL index entries are traversed one by one; each document that meets the deletion condition is removed with an ordinary delete operation, which is not very efficient
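The expiry condition above can be sketched as a small predicate. This only illustrates the comparison; isExpired is a hypothetical helper, and the fixed now value is an assumption chosen to keep the example deterministic.

```javascript
// True when a document is past its TTL according to the condition above.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Documents whose field is missing or is not a date never expire.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() > expireAfterSeconds * 1000;
}

// Fixed "current time" so the example always gives the same result.
const now = new Date("2021-12-20T12:00:00Z");
const freshDoc = { username: "user90000", createdDate: new Date(now.getTime() - 30 * 1000) };
const staleDoc = { username: "user90001", createdDate: new Date(now.getTime() - 90 * 1000) };
const noDateDoc = { username: "user1" };

const expiredNames = [freshDoc, staleDoc, noDateDoc]
  .filter((doc) => isExpired(doc, "createdDate", 60, now))
  .map((doc) => doc.username);
console.log(expiredNames);
```

With expireAfterSeconds set to 60, only the document created 90 seconds ago qualifies for deletion; the document without a createdDate field is left alone.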

First, add a date field to some of our documents

for(var i = 90000;i < 100000;i++){
    db.users.updateOne(
    {username:"user"+i},
    {
        $set:{
            createdDate:new Date()
        }
    });
}

Create a TTL index with an expiration time of 60s. Once the documents are more than 60s old and the periodic TTL check has run, a query shows that this data no longer exists

db.users.createIndex({createdDate:1},{expireAfterSeconds:60})
'createdDate_1'

You can also change the expiration time of a TTL index with the collMod command

db.runCommand({
  collMod:"users",
  index:{
    keyPattern:{createdDate:1},
    expireAfterSeconds:120
  }
})

{ expireAfterSeconds_old: 60, expireAfterSeconds_new: 120, ok: 1 }

Conditional index

A conditional index, also called a partial index, only indexes the documents that meet a given condition

Create a username_1 index only for users older than 50 (if the username_1 index created earlier still exists, drop it first, since the default index name would conflict). When you view the execution plan, you will find that the isPartial field is true

db.users.createIndex({username:1},{partialFilterExpression:{
    age:{$gt:50}
  }})
'username_1'

db.users.find({$and:[{username:"user4"},{age:60}]}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        filter: { age: { '$eq': 60 } },
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { username: 1 },
           indexName: 'username_1',
           ......
           isPartial: true,
           ......
         } 
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}
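Which documents end up in a partial index can be pictured with a toy model. This is a sketch, not MongoDB code: buildPartialIndex is a hypothetical helper, the sample documents are made up, and the filter function stands in for partialFilterExpression.

```javascript
// Build a toy partial index: only documents that pass the filter get an entry.
function buildPartialIndex(docs, keyField, filter) {
  const index = new Map();
  for (const doc of docs) {
    if (!filter(doc)) continue; // filtered-out documents are simply not indexed
    const ids = index.get(doc[keyField]) ?? [];
    ids.push(doc._id);
    index.set(doc[keyField], ids);
  }
  return index;
}

// Hypothetical sample documents.
const sample = [
  { _id: 1, username: "user1", age: 20 },
  { _id: 2, username: "user2", age: 60 },
  { _id: 3, username: "user3", age: 75 },
  { _id: 4, username: "user4", age: 50 },
];

// Stand-in for partialFilterExpression: { age: { $gt: 50 } }
const partialIndex = buildPartialIndex(sample, "username", (doc) => doc.age > 50);
const indexedNames = [...partialIndex.keys()];
console.log(indexedNames);
```

Because entries for users aged 50 or younger are missing, a query can only rely on such an index when its own condition guarantees age > 50, which is why the query above also specifies age: 60.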

Sparse index

An ordinary index covers the whole collection for a given field. Even if a document does not have that field, the index still contains an entry for the document, with the field value treated as null

A sparse index does not index documents in which the field does not exist. If the field exists but its value is null, the document is still indexed
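The difference between the two behaviours can be shown with a toy model. It is a simplified sketch: indexKeys is a hypothetical helper, and the three sample documents are assumptions.

```javascript
// Return the keys a toy index would hold for one field.
function indexKeys(docs, field, { sparse = false } = {}) {
  const keys = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip documents lacking the field
    // Ordinary index: a missing field is recorded as null.
    keys.push({ _id: doc._id, key: hasField ? doc[field] : null });
  }
  return keys;
}

// Hypothetical sample: a real value, an explicit null, and a missing field.
const emailDocs = [
  { _id: 1, email: "120005000@qq.email" },
  { _id: 2, email: null },
  { _id: 3 },
];

const ordinaryKeys = indexKeys(emailDocs, "email");
const sparseKeys = indexKeys(emailDocs, "email", { sparse: true });
console.log(ordinaryKeys.length, sparseKeys.length);
```

The ordinary index holds an entry for all three documents, while the sparse index keeps only the two documents that actually have an email field, including the one whose value is null.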

To test a sparse index, first add an email field to part of the users collection, setting it to null for some of the documents

for(var i = 5000;i < 10000;i++){
  if(i < 9000){
    db.users.updateOne(
      {username:"user"+i},
      { $set:{email:(120000000+i)+"@qq.email"}}
    )
  }else{
    db.users.updateOne(
      {username:"user"+i},
      { $set:{email:null}}
    )
  }
}

If we query with the {email:null} condition before creating any index, the results also include documents that have no email field at all

db.users.find({email:null})
{ 
  _id: ObjectId("61bdc01ba59136670f6536fd"),
  username: 'user0',
  age: 64.41483801726282,
  sex: 0,
  phone: 18468150001,
  location: 
  { 
    type: 'Point',
    coordinates: [ 101.42490900320335, 42.2576650823515 ] 
  } 
}
......

Then create a sparse index on the email field and query with the {email:null} condition, forcing the index with hint. Now every returned document has an email field whose value is null

db.users.createIndex({email:1},{sparse:true});
'email_1'

db.users.find({email:null}).hint({email:1})
{ 
  _id: ObjectId("61bdc12ca59136670f655a25"),
  username: 'user9000',
  age: 94.18397576757012,
  sex: 0,
  phone: 18468159001,
  hobby: [ 'go fishing', 'Table Tennis' ],
  location: 
  { 
    type: 'Point',
    coordinates: [ 101.25903151863596, 41.38450145025062 ] 
  },
  email: null 
}
......

Text index

A text index first splits the indexed fields into words (tokens) and then searches on those tokens, but Chinese word segmentation is not currently supported

Next, insert two documents and create a compound text index on their two text fields

db.blog.insertMany([
  {title:"hello world",content:"mongodb is the best database"},
  {title:"index",content:"efficient data structure"}
])

//Create index
db.blog.createIndex({title:"text",content:"text"})
'title_text_content_text'
//Query using text index
db.blog.find({$text:{$search:"hello data"}})
{ 
  _id: ObjectId("61c092268c4037d17827d977"),
  title: 'index',
  content: 'efficient data structure' 
},
{ 
  _id: ObjectId("61c092268c4037d17827d976"),
  title: 'hello world',
  content: 'mongodb is the best database' 
}
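Why the search for "hello data" returns both documents can be seen in a toy inverted index. This is a deliberately simplified sketch: tokenize, buildTextIndex and search are hypothetical helpers, and the model ignores stemming, stop words and relevance scoring.

```javascript
// Split text into lowercase word tokens (no stemming or stop-word removal).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build an inverted index: token -> set of document positions.
function buildTextIndex(docs, fields) {
  const index = new Map();
  docs.forEach((doc, position) => {
    for (const field of fields) {
      for (const token of tokenize(doc[field] ?? "")) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(position);
      }
    }
  });
  return index;
}

// A document matches when it contains ANY of the search terms (logical OR).
function search(docs, index, query) {
  const hits = new Set();
  for (const term of tokenize(query)) {
    for (const position of index.get(term) ?? []) hits.add(position);
  }
  return [...hits].map((position) => docs[position]);
}

const blog = [
  { title: "hello world", content: "mongodb is the best database" },
  { title: "index", content: "efficient data structure" },
];
const textIndex = buildTextIndex(blog, ["title", "content"]);
const results = search(blog, textIndex, "hello data");
console.log(results.map((doc) => doc.title));
```

The first document matches through "hello" in its title and the second through "data" in its content; whole tokens are compared, so "data" does not match "database" in this model.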

Unique index

A unique index means that duplicate values cannot appear in the indexed field. Besides single-field unique indexes, there are compound unique indexes and unique array indexes (that is, the arrays of different documents cannot share any element)

//Create a unique index on the title field
db.blog.createIndex({title:1},{unique:true})
'title_1'
//Insert an existing title Value
db.blog.insertOne({title:"hello world",content:"mongodb is the best database"})
MongoServerError: E11000 duplicate key error collection: mock.blog index: title_1 dup key: { : "hello world" }
//Check the execution plan. isUnique is true
db.blog.find({"title":"index"}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { title: 1 },
           indexName: 'title_1',
           isMultiKey: false,
           multiKeyPaths: { title: [] },
           isUnique: true,
           ......
         } 
     },
     rejectedPlans: [] 
  },
  .......
  ok: 1 
}
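The duplicate key check can be modelled with a few lines of plain JavaScript. This is a sketch rather than MongoDB's implementation: the UniqueIndex class is hypothetical, and the error message only imitates the idea of the E11000 error shown above.

```javascript
// Toy unique index: remembers every key and rejects a second occurrence.
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }

  insert(doc) {
    const value = doc[this.field];
    // An array value contributes one key per distinct element.
    const docKeys = Array.isArray(value) ? [...new Set(value)] : [value];
    for (const key of docKeys) {
      if (this.keys.has(key)) {
        throw new Error(`duplicate key error: ${this.field} = ${JSON.stringify(key)}`);
      }
    }
    // Only register the keys once the whole document has been accepted.
    docKeys.forEach((key) => this.keys.add(key));
  }
}

const titleIndex = new UniqueIndex("title");
titleIndex.insert({ title: "hello world" });
titleIndex.insert({ title: "index" });

let rejected = false;
try {
  titleIndex.insert({ title: "hello world" }); // same title again
} catch (err) {
  rejected = true;
  console.log(err.message);
}
```

The same class also models the array case: after inserting a document whose indexed field is ["a", "b"], a second document with ["b", "c"] is rejected because the element "b" is already taken.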

Keywords: Database JSON MongoDB Back-end nosql

Added by onicsoft on Mon, 20 Dec 2021 23:17:52 +0200