Operating MongoDB with Python: just read this article

Preface

MongoDB is a non-relational database written in C++. It is an open-source database system based on distributed file storage, and it stores data as documents similar to JSON objects. A field value can contain other documents, arrays, and arrays of documents, which makes it very flexible. In this section, let's look at how to store and manipulate data in MongoDB with Python 3.

1. Preparation

Before you start, make sure MongoDB is installed and its service is running, and that Python's PyMongo library is installed.

2. Connect MongoDB

To connect to MongoDB, we use the MongoClient class from the PyMongo library. Generally, we pass in the MongoDB host and port: the first parameter is host and the second is port (if no arguments are passed, the defaults are localhost and port 27017):

import pymongo
client = pymongo.MongoClient(host='localhost', port=27017)

This creates a MongoDB connection object.

In addition, the first parameter of MongoClient, host, can also be a MongoDB connection string, which starts with mongodb://, for example:

client = pymongo.MongoClient('mongodb://localhost:27017/')

This can also achieve the same connection effect.
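If the server requires authentication, the username and password go into the connection string and must be percent-escaped; the PyMongo documentation recommends urllib.parse.quote_plus for this. The helper below is a hypothetical sketch (build_mongo_uri is not a PyMongo API) that only assembles the string; the result would then be passed to pymongo.MongoClient.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a mongodb:// connection string (hypothetical helper, not a PyMongo API)."""
    credentials = ""
    if username is not None:
        # Reserved characters such as '@', ':' and '/' must be percent-escaped
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}:{port}/"


print(build_mongo_uri())
print(build_mongo_uri(username="admin", password="p@ss:word"))
```

Without escaping, the '@' inside the password would be read as the separator between credentials and host.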

3. Specify database

MongoDB can hold multiple databases, so next we specify which database to work with. Taking the test database as an example, we specify the database to use in the program as follows:

db = client.test

Here, accessing the test attribute of client returns the test database. Alternatively, we can use dictionary-style access:

db = client['test']

The two methods are equivalent.

4. Specify the collection

Each MongoDB database contains many collections, which are similar to tables in relational databases.

Next, we specify the collection to operate on; here the collection is named students. As with databases, there are two ways to specify a collection:

collection = db.students
collection = db['students']

In this way, we declare a Collection object.
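Attribute access only works when the name is a valid Python identifier, and PyMongo also refuses attribute access for names that start with an underscore. The hypothetical helper below (needs_bracket_access is not part of PyMongo) encodes those rules to show when the bracket form is required, for both databases and collections.

```python
import keyword


def needs_bracket_access(name):
    """Return True if `name` cannot be written as client.<name> or db.<name>.

    Hypothetical helper: it mirrors Python's identifier rules plus PyMongo's
    leading-underscore restriction; it does not call PyMongo.
    """
    if not name.isidentifier() or keyword.iskeyword(name):
        return True  # e.g. db.my-db or db.class would be a syntax error
    return name.startswith("_")  # PyMongo raises AttributeError for these


for name in ["test", "students", "my-db", "class", "_tmp"]:
    style = f"db[{name!r}]" if needs_bracket_access(name) else f"db.{name}"
    print(name, "->", style)
```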

5. Insert data

Now we can insert data. For the students collection, create a new student record represented as a dictionary:

student = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

Here we specify the student's ID, name, age, and gender. Next, call the collection's insert() method directly to insert the data:

result = collection.insert(student)
print(result)

In MongoDB, every document has an _id field that uniquely identifies it. If this field is not specified explicitly, an _id of type ObjectId is generated automatically (PyMongo creates it on the client before sending the document). After execution, the insert() method returns the _id value.
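An ObjectId is 12 bytes: a 4-byte timestamp (seconds since the Unix epoch), followed by a 5-byte random value and a 3-byte counter. Because the timestamp comes first, the creation time can be read back from the _id; PyMongo's bson.objectid.ObjectId exposes it as generation_time. The stdlib-only sketch below (objectid_generation_time is a hypothetical helper) decodes it from the hex string of the ObjectId printed in the output that follows.

```python
from datetime import datetime, timezone


def objectid_generation_time(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_generation_time("5932a68615c2606814c91f3d"))
```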

The operation results are as follows:

5932a68615c2606814c91f3d

Of course, we can also insert multiple documents at once by passing them as a list:

student1 = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

student2 = {
    'id': '20170202',
    'name': 'Mike',
    'age': 21,
    'gender': 'male'
}

result = collection.insert([student1, student2])
print(result)

The return value is a list of the corresponding _id values:

[ObjectId('5932a80115c2606a59e8a048'), ObjectId('5932a80115c2606a59e8a049')]

In fact, the insert() method is officially deprecated in PyMongo 3.x; it still works there, but it was removed entirely in PyMongo 4.0. The recommended methods are insert_one() and insert_many(), for inserting a single document and multiple documents respectively. An example:

student = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

result = collection.insert_one(student)
print(result)
print(result.inserted_id)

The operation results are as follows:

<pymongo.results.InsertOneResult object at 0x10d68b558>
5932ab0f15c2606f0c1cf6c5

Unlike insert(), this returns an InsertOneResult object, and we can read its inserted_id attribute to get the _id.

For insert_many(), we pass the data as a list:

student1 = {
    'id': '20170101',
    'name': 'Jordan',
    'age': 20,
    'gender': 'male'
}

student2 = {
    'id': '20170202',
    'name': 'Mike',
    'age': 21,
    'gender': 'male'
}

result = collection.insert_many([student1, student2])
print(result)
print(result.inserted_ids)

The operation results are as follows:

<pymongo.results.InsertManyResult object at 0x101dea558>
[ObjectId('5932abf415c2607083d3b2ac'), ObjectId('5932abf415c2607083d3b2ad')]

This method returns an InsertManyResult; its inserted_ids attribute gives the list of _id values of the inserted documents.
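One detail worth knowing: when the dictionary has no _id, insert_one() and insert_many() add the generated _id to the very dictionary you passed in. Re-inserting the same dictionary object therefore reuses that _id and fails with a DuplicateKeyError. The toy in-memory class below is hypothetical (ToyCollection is not PyMongo, and it uses uuid4 hex strings instead of real ObjectIds); it imitates only that behavior so it can run without a server.

```python
import uuid


class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError."""


class ToyCollection:
    """Hypothetical in-memory model of the _id handling in insert_one()."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, document):
        # Like PyMongo, add an _id to the caller's dict when it is missing
        document.setdefault("_id", uuid.uuid4().hex)
        if document["_id"] in self._docs:
            raise DuplicateKeyError(document["_id"])
        self._docs[document["_id"]] = dict(document)  # store a copy
        return document["_id"]


students = ToyCollection()
student = {"id": "20170101", "name": "Jordan", "age": 20, "gender": "male"}
inserted_id = students.insert_one(student)
print("_id" in student)  # True: the original dict was modified
try:
    students.insert_one(student)  # same dict, so the same _id
except DuplicateKeyError:
    print("duplicate _id rejected")
```

In real code, building a fresh dictionary per insert (or copying it) avoids the surprise.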

6. Query

After inserting data, we can query it with find_one() or find(): find_one() returns a single result, while find() returns a Cursor object over all matching documents. For example:

result = collection.find_one({'name': 'Mike'})
print(type(result))
print(result)

Here we query the document whose name is Mike. The result is a dictionary:

<class 'dict'>
{'_id': ObjectId('5932a80115c2606a59e8a049'), 'id': '20170202', 'name': 'Mike', 'age': 21, 'gender': 'male'}

As you can see, the result has an extra _id field, which was added automatically during insertion.

We can also query by ObjectId. For this we use the ObjectId class from the bson library, which is installed together with PyMongo:

from bson.objectid import ObjectId

result = collection.find_one({'_id': ObjectId('593278c115c2602667ec6bae')})
print(result)

The result is again a dictionary:

{'_id': ObjectId('593278c115c2602667ec6bae'), 'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'}

If no matching document exists, find_one() returns None.

To query multiple documents, use the find() method. For example, to find the documents whose age is 20:

results = collection.find({'age': 20})
print(results)
for result in results:
    print(result)

The operation results are as follows:

<pymongo.cursor.Cursor object at 0x1032d5128>
{'_id': ObjectId('593278c115c2602667ec6bae'), 'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278c815c2602678bb2b8d'), 'id': '20170102', 'name': 'Kevin', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278d815c260269d7645a8'), 'id': '20170103', 'name': 'Harden', 'age': 20, 'gender': 'male'}

The result is a Cursor, which behaves like a generator: we iterate over it to get all the results, and each result is a dictionary.

To query students older than 20, write:

results = collection.find({'age': {'$gt': 20}})

The value for the query key is no longer a plain number but a dictionary, whose key is the comparison operator $gt (greater than) and whose value is 20.

The comparison operators are summarized in the following table.

Operator  Meaning                     Example
$lt       Less than                   {'age': {'$lt': 20}}
$gt       Greater than                {'age': {'$gt': 20}}
$lte      Less than or equal to       {'age': {'$lte': 20}}
$gte      Greater than or equal to    {'age': {'$gte': 20}}
$ne       Not equal to                {'age': {'$ne': 20}}
$in       Value is in the list        {'age': {'$in': [20, 23]}}
$nin      Value is not in the list    {'age': {'$nin': [20, 23]}}
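Putting several operators in one dictionary combines them with AND, e.g. {'age': {'$gte': 20, '$lt': 25}}. The hypothetical matches() function below is a rough pure-Python model of these operators on flat dictionaries, meant only to make their semantics concrete; it ignores arrays, nested fields, and BSON cross-type ordering, and it is not how the server evaluates queries.

```python
import operator

# Simplified model of MongoDB's comparison operators on flat dicts.
# It ignores arrays, nested fields and BSON cross-type ordering.
COMPARISONS = {
    "$eq": operator.eq,
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$lte": operator.le,
    "$gte": operator.ge,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(document, query):
    """Return True if `document` satisfies every condition in `query`."""
    for field, condition in query.items():
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {'age': 20} means equality
        for op, operand in condition.items():
            if field not in document:
                # A missing field satisfies only the negative operators
                if op in ("$ne", "$nin"):
                    continue
                return False
            if not COMPARISONS[op](document[field], operand):
                return False
    return True


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Kevin", "age": 25},
]
print([s["name"] for s in students if matches(s, {"age": {"$gt": 20}})])
print([s["name"] for s in students if matches(s, {"age": {"$gte": 20, "$lt": 25}})])
```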

Regular-expression queries are also supported. For example, to find the students whose name starts with M:

results = collection.find({'name': {'$regex': '^M.*'}})

Here $regex specifies a regular-expression match, and ^M.* is a regular expression that matches strings starting with M.

Some other useful operators are summarized in the following table.

Operator  Meaning                         Example                                              Example meaning
$regex    Matches a regular expression    {'name': {'$regex': '^M.*'}}                         name starts with M
$exists   Whether the field exists        {'name': {'$exists': True}}                          The name field exists
$type     Type check                      {'age': {'$type': 'int'}}                            age is of type int
$mod      Modulo operation                {'age': {'$mod': [5, 0]}}                            age modulo 5 equals 0
$text     Text search                     {'$text': {'$search': 'Mike'}}                       A text-indexed field contains the string Mike
$where    Advanced JavaScript condition   {'$where': 'obj.fans_count == obj.follows_count'}    The number of fans equals the number of followers
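$regex performs an unanchored search, much like Python's re.search, so anchors such as ^ matter. PyMongo also accepts a compiled pattern directly as the value, e.g. {'name': re.compile('^M')}. The snippet below only mimics the matching logic of the $regex and $mod rows of the table on local sample data; it does not query MongoDB.

```python
import re

names = ["Harden", "Jordan", "Kevin", "Mark", "Mike"]
ages = [20, 21, 25, 30]

# {'name': {'$regex': '^M.*'}}: regex matching behaves like re.search
starts_with_m = [n for n in names if re.search(r"^M.*", n)]

# Without the ^ anchor, 'a' matches anywhere in the string
contains_a = [n for n in names if re.search(r"a", n)]

# {'age': {'$mod': [5, 0]}}: age divided by 5 leaves remainder 0
divisible_by_5 = [a for a in ages if a % 5 == 0]

print(starts_with_m, contains_a, divisible_by_5)
```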

7. Counting

To count the documents that match a query, call the collection's count_documents() method with a filter. For example, to count all documents, pass an empty filter:

count = collection.count_documents({})
print(count)

Or count only the documents that meet certain conditions:

count = collection.count_documents({'age': 20})
print(count)

The result is a number: how many documents match. Older code calls count() on a cursor instead, as in collection.find({'age': 20}).count(); that method was deprecated in PyMongo 3.7 and removed in 4.0.

8. Sorting

To sort, call the sort() method and pass in the field to sort by and a flag for ascending or descending order. For example:

results = collection.find().sort('name', pymongo.ASCENDING)
print([result['name'] for result in results])

The operation results are as follows:

['Harden', 'Jordan', 'Kevin', 'Mark', 'Mike']

Here pymongo.ASCENDING specifies ascending order; to sort in descending order, pass pymongo.DESCENDING instead.
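sort() also accepts a list of (key, direction) pairs for multi-key ordering, e.g. collection.find().sort([('age', pymongo.DESCENDING), ('name', pymongo.ASCENDING)]); pymongo.ASCENDING and pymongo.DESCENDING are the integers 1 and -1. The sketch below reproduces that ordering in plain Python (sort_documents is a hypothetical helper) by applying stable sorts from the last key to the first.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / DESCENDING


def sort_documents(docs, spec):
    """Order dicts by a list of (key, direction) pairs, like cursor.sort(spec)."""
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-key ordering.
    for key, direction in reversed(spec):
        result.sort(key=lambda d: d[key], reverse=(direction == DESCENDING))
    return result


students = [
    {"name": "Mike", "age": 21},
    {"name": "Jordan", "age": 20},
    {"name": "Kevin", "age": 21},
    {"name": "Harden", "age": 20},
]
ordered = sort_documents(students, [("age", DESCENDING), ("name", ASCENDING)])
print([(d["age"], d["name"]) for d in ordered])
```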

9. Offset

Sometimes we want only some of the results. The skip() method offsets the results by a number of positions; for example, skip(2) ignores the first two elements and returns the third and subsequent ones:

results = collection.find().sort('name', pymongo.ASCENDING).skip(2)
print([result['name'] for result in results])

The operation results are as follows:

['Kevin', 'Mark', 'Mike']

In addition, you can use the limit() method to specify the number of results to get, as shown in the following example:

results = collection.find().sort('name', pymongo.ASCENDING).skip(2).limit(2)
print([result['name'] for result in results])

The operation results are as follows:

['Kevin', 'Mark']

Without limit(), three results would be returned; with limit(2), only the first two of them are returned.
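Together, skip() and limit() give simple page-based pagination. The hypothetical page_bounds() helper below computes the two arguments for a 1-based page number; list slicing stands in for the server here, and with PyMongo the values would go to .skip(skip).limit(limit).

```python
def page_bounds(page, page_size):
    """Return (skip, limit) for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


names = ["Harden", "Jordan", "Kevin", "Mark", "Mike"]  # already sorted by name
skip, limit = page_bounds(page=2, page_size=2)
print(names[skip:skip + limit])  # ['Kevin', 'Mark']
```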

Note that when a collection is very large, with tens or hundreds of millions of documents, it is best not to query with a large skip offset, because the server still has to walk past every skipped document, which makes such queries slow and resource-hungry. Instead, you can filter on _id:

from bson.objectid import ObjectId
collection.find({'_id': {'$gt': ObjectId('593278c815c2602678bb2b8d')}})

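Here you need to keep track of the _id of the last document returned by the previous query, and the results should be sorted by _id so that each page continues where the previous one ended.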
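This "seek from the last _id" approach can be sketched in plain Python, with integers standing in for ObjectIds (ObjectIds are comparable too, and grow roughly in insertion order). fetch_page() is a hypothetical helper; with PyMongo the corresponding query would be collection.find({'_id': {'$gt': last_id}}).sort('_id', pymongo.ASCENDING).limit(page_size).

```python
def fetch_page(docs, last_id=None, page_size=2):
    """Return the next page of docs whose _id is greater than last_id."""
    ordered = sorted(docs, key=lambda d: d["_id"])  # like .sort('_id', 1)
    if last_id is not None:
        ordered = [d for d in ordered if d["_id"] > last_id]  # like {'$gt': last_id}
    return ordered[:page_size]  # like .limit(page_size)


names = ["Harden", "Jordan", "Kevin", "Mark", "Mike"]
docs = [{"_id": i, "name": n} for i, n in enumerate(names)]

last_id = None
while True:
    page = fetch_page(docs, last_id)
    if not page:
        break
    print([d["name"] for d in page])
    last_id = page[-1]["_id"]  # remember where this page ended
```

Each page costs the same regardless of how deep it is, because the _id filter can use the index instead of skipping documents.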

10. Update

To update data, we can use the update() method, specifying the update condition and the updated data. For example:

condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 25
result = collection.update(condition, student)
print(result)

Here we want to update the age of the document whose name is Kevin: first specify the query condition and fetch the document, then modify its age, and finally call update() with the original condition and the modified document.

The operation results are as follows:

{'ok': 1, 'nModified': 1, 'n': 1, 'updatedExisting': True}

The result is a dictionary: ok indicates that the operation succeeded, n is the number of matched documents, and nModified is the number of documents actually modified.

We can also update the data with the $set operator:

result = collection.update(condition, {'$set': student})

With $set, only the fields present in the student dictionary are updated; any other fields in the stored document are left as they are rather than deleted. Without $set, the whole stored document is replaced by the student dictionary, so any fields not in student are removed.
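The difference can be modeled on plain dictionaries. replace_document() and apply_set() below are hypothetical helpers that only mimic the two behaviors; in both cases MongoDB keeps the document's existing _id.

```python
def replace_document(existing, replacement):
    """Whole-document replacement: only _id survives from the old document."""
    new_doc = {"_id": existing["_id"]}
    new_doc.update({k: v for k, v in replacement.items() if k != "_id"})
    return new_doc


def apply_set(existing, fields):
    """$set: overwrite or add the listed fields, keep everything else."""
    new_doc = dict(existing)
    new_doc.update(fields)
    return new_doc


kevin = {"_id": 1, "name": "Kevin", "age": 20, "gender": "male"}
print(replace_document(kevin, {"name": "Kevin", "age": 25}))  # gender is gone
print(apply_set(kevin, {"age": 25}))  # gender is kept
```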

However, the update() method is also officially deprecated (and was removed in PyMongo 4.0). It is replaced by update_one() and update_many(), which are stricter: their second argument must be a dictionary whose keys are update operators such as $set or $inc. An example:

condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 26
result = collection.update_one(condition, {'$set': student})
print(result)
print(result.matched_count, result.modified_count)

Here we call update_one(). Its second argument can no longer be the modified dictionary itself; it must take the form {'$set': student}. The return value is an UpdateResult, whose matched_count and modified_count attributes give the number of matched documents and the number of documents actually modified.
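matched_count and modified_count can differ: a document that matches the filter but already holds the values being written counts as matched but not modified. The hypothetical set_first_match() below models update_one() with $set on a list of dictionaries and reports both numbers the way an UpdateResult does.

```python
def set_first_match(docs, predicate, changes):
    """Toy model of update_one(filter, {'$set': changes}) on a list of dicts.

    Returns (matched_count, modified_count) like UpdateResult.
    """
    for doc in docs:
        if predicate(doc):
            # The document only counts as modified if some value really changes
            changed = any(k not in doc or doc[k] != v for k, v in changes.items())
            doc.update(changes)
            return 1, int(changed)
    return 0, 0


def is_kevin(doc):
    return doc["name"] == "Kevin"


students = [{"name": "Kevin", "age": 26}, {"name": "Mike", "age": 21}]
print(set_first_match(students, is_kevin, {"age": 26}))  # (1, 0): age was already 26
print(set_first_match(students, is_kevin, {"age": 27}))  # (1, 1)
```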

The operation results are as follows:

<pymongo.results.UpdateResult object at 0x10d17b678>
1 0

Let's take another example:

condition = {'age': {'$gt': 20}}
result = collection.update_one(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)

Here the query condition is age greater than 20, and the update is {'$inc': {'age': 1}}, which increments age by 1. After execution, the age of the first matching document is increased by 1.

The operation results are as follows:

<pymongo.results.UpdateResult object at 0x10b8874c8>
1 1

You can see that one document was matched and one was modified.

If we call update_many() instead, all matching documents are updated:

condition = {'age': {'$gt': 20}}
result = collection.update_many(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)

This time the number of matches is no longer 1. The output is as follows:

<pymongo.results.UpdateResult object at 0x10c6384c8>
3 3

As you can see, all matched data will be updated.
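The contrast between update_one() and update_many() with $inc can be modeled in a few lines. inc_matching() is a hypothetical helper working on plain dictionaries; like $inc, it creates a missing field with the increment as its value, and it returns (matched_count, modified_count).

```python
def inc_matching(docs, predicate, field, amount, many=True):
    """Toy model of update_many()/update_one() with {'$inc': {field: amount}}.

    Returns (matched_count, modified_count) like UpdateResult.
    """
    matched = modified = 0
    for doc in docs:
        if not predicate(doc):
            continue
        matched += 1
        if field not in doc or amount != 0:
            # $inc creates a missing field with the increment as its value
            doc[field] = doc.get(field, 0) + amount
            modified += 1
        if not many:
            break  # update_one() stops after the first matching document
    return matched, modified


def older_than_20(doc):
    return doc["age"] > 20


students = [{"name": n, "age": a} for n, a in [("Jordan", 20), ("Mike", 21), ("Kevin", 22), ("Harden", 23)]]
print(inc_matching(students, older_than_20, "age", 1, many=False))  # (1, 1)
print(inc_matching(students, older_than_20, "age", 1))  # (3, 3)
```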

11. Delete

Deletion is relatively simple: call the remove() method with a deletion condition, and all documents that meet the condition are deleted. For example:

result = collection.remove({'name': 'Kevin'})
print(result)

The operation results are as follows:

{'ok': 1, 'n': 1}

There are also two newer, recommended methods, delete_one() and delete_many(); remove() itself is deprecated and was removed in PyMongo 4.0. For example:

result = collection.delete_one({'name': 'Kevin'})
print(result)
print(result.deleted_count)
result = collection.delete_many({'age': {'$lt': 25}})
print(result.deleted_count)

The operation results are as follows:

<pymongo.results.DeleteResult object at 0x10e6ba4c8>
1
4

delete_one() deletes the first matching document, while delete_many() deletes all matching documents. Both return a DeleteResult, whose deleted_count attribute gives the number of documents deleted.
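The difference between the two can be modeled on a list of dictionaries. delete_matching() below is a hypothetical helper, not a PyMongo API; it removes matching entries in place and returns the count that DeleteResult.deleted_count would report.

```python
def delete_matching(docs, predicate, many=True):
    """Remove matching dicts in place, like delete_many() / delete_one().

    Returns the number of removed documents (DeleteResult.deleted_count).
    """
    deleted = 0
    kept = []
    for doc in docs:
        if predicate(doc) and (many or deleted == 0):
            deleted += 1  # delete_one() removes only the first match
        else:
            kept.append(doc)
    docs[:] = kept
    return deleted


students = [{"name": n, "age": a} for n, a in
            [("Jordan", 20), ("Mike", 21), ("Kevin", 25), ("Harden", 20), ("Mark", 30)]]
print(delete_matching(students, lambda d: d["name"] == "Kevin", many=False))  # 1
print(delete_matching(students, lambda d: d["age"] < 25))  # 3
print([d["name"] for d in students])  # ['Mark']
```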

12. Other operations

PyMongo also provides some combined methods, such as find_one_and_delete(), find_one_and_replace(), and find_one_and_update(), which find a document and then delete, replace, or update it. Their usage is basically the same as that of the methods above.

You can also work with indexes, using methods such as create_index(), create_indexes(), and drop_index().

This section explained how to use PyMongo to add, delete, update, and query data in MongoDB.


Added by vitorjamil on Thu, 20 Jan 2022 14:00:31 +0200