Crawler and Python: advanced crawler 2, data storage (database storage) - 6. MongoDB storage

MongoDB is written in C++. It is an open source database system based on distributed file storage. Under high load, more nodes can be added to maintain server performance. MongoDB is designed to provide scalable storage for web applications. It stores data as documents, and each document consists of key/value pairs (key => value). A MongoDB document is similar to a JSON object. Field values can include other documents, arrays, and arrays of documents.
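To make the document model concrete, here is a minimal sketch of what one crawled record might look like as a Python dictionary. All field names and values are made up for illustration; they do not come from any particular site.

```python
import json

# Hypothetical crawler record; every field name here is illustrative.
page_doc = {
    "url": "https://example.com/post/1",
    "title": "Example post",
    "author": {"name": "Zhang San", "age": 23},    # embedded document
    "tags": ["python", "crawler"],                 # array
    "comments": [                                  # array of documents
        {"user": "Li Si", "text": "Nice article"},
        {"user": "Wang Wu", "text": "Thanks"},
    ],
}

# The same structure serializes directly to JSON, which is why MongoDB
# documents are often described as JSON-like.
as_json = json.dumps(page_doc, indent=2)
print(as_json)
```

Because the nesting is kept as-is, a crawler can usually store a parsed page in one document instead of splitting it across several tables.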

Because of these advantages, crawlers often save scraped data to MongoDB before cleaning it. Make sure MongoDB is installed first. For an installation tutorial, see: https://www.cnblogs.com/luyj00436/p/15514788.html.

Install the driver

To connect to and use MongoDB from Python, you need a MongoDB driver. Here the PyMongo driver is used. You can install it with the pip command.

python3 -m pip install pymongo

After installation, create a test file demo_test_mongodb.py and run the following code. If no error is raised, the installation succeeded.

import pymongo
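As an alternative check that also reports which version was installed, the following sketch uses only the standard library (Python 3.8 or later). The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pymongo")
if version is None:
    print("pymongo is not installed; run: python3 -m pip install pymongo")
else:
    print("pymongo", version, "is installed")
```

Unlike a bare import, this does not raise an error when the driver is missing, so it can be used at the start of a crawler script to fail with a clearer message.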

1. Create a database

To create a database, create a MongoClient object with the URL of the server, then select the database by name. The following example creates the database test_db. Note that MongoDB does not actually create the database until data is written to it.

import pymongo

# Create database
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
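The connection URL above assumes a local server without authentication. If the server requires a username and password, reserved characters in them have to be percent-encoded before they go into the URL. The sketch below shows one way to build such a URL; the function name build_uri and the credentials are hypothetical.

```python
from urllib.parse import quote


def build_uri(host="localhost", port=27017, username=None, password=None):
    """Build a mongodb:// URL, percent-encoding the credentials if given."""
    if username is None or password is None:
        return f"mongodb://{host}:{port}"
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"mongodb://{user}:{pwd}@{host}:{port}"


print(build_uri())
print(build_uri(username="crawler", password="p@ss/word"))
```

The resulting string can be passed to pymongo.MongoClient() in place of the literal URL.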

2. Create a collection

MongoDB collections are similar to SQL tables. A collection is created from a database object. The example code is as follows:

import pymongo

# Create collection
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]    # Create collection

In MongoDB, a collection is created only after content is inserted. In other words, after obtaining the collection (data table) object, you need to insert a document (record) before the collection actually exists on the server.
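This lazy creation can be observed with list_collection_names(), which returns the collections that really exist in a database. The helper below is a small sketch; the name collection_exists is hypothetical, and the commented usage assumes a server on localhost:27017 and a fresh test_db database.

```python
def collection_exists(db, name):
    """Return True if `name` is already a real collection in `db`."""
    # list_collection_names() asks the server which collections exist,
    # so a collection that has never received a document is not listed.
    return name in db.list_collection_names()


# Usage against a running server (assumed to listen on localhost:27017):
#   import pymongo
#   mydb = pymongo.MongoClient("mongodb://localhost:27017")["test_db"]
#   mycol = mydb["site"]
#   collection_exists(mydb, "site")   # False on a fresh database
#   mycol.insert_one({"name": "Zhang San"})
#   collection_exists(mydb, "site")   # True once a document is stored
```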

3. Insert a document

A document in MongoDB is similar to a record in an SQL table. To insert a document into a collection, use the insert_one() method. Its first parameter is a dictionary of name => value pairs. The following example inserts a document into the site collection.

import pymongo

# Insert document
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]    # Select the collection

mydict = {"name": "Zhang San", "age": 23, "gender": "male"}
x = mycol.insert_one(mydict)
print(x)

After running, the console prints the result object returned by insert_one() (an InsertOneResult), not the document itself.
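The object returned by insert_one() exposes the new document's _id through its inserted_id attribute, which is usually more useful than the object's printed form. The following sketch wraps the call in a hypothetical helper named save_item; the commented usage assumes a server on localhost:27017.

```python
def save_item(collection, item):
    """Insert one scraped item and return the _id MongoDB assigned to it."""
    # insert_one() adds an "_id" key to the dict it is given, so insert a
    # copy to leave the caller's dict unchanged.
    result = collection.insert_one(dict(item))
    return result.inserted_id


# Usage against a running server (assumed to listen on localhost:27017):
#   import pymongo
#   mycol = pymongo.MongoClient("mongodb://localhost:27017")["test_db"]["site"]
#   new_id = save_item(mycol, {"name": "Zhang San", "age": 23, "gender": "male"})
#   print(new_id)   # an ObjectId
```

Copying the dictionary is a design choice: it lets the same item be reused or inserted elsewhere without carrying over an _id from a previous insert.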

4. Insert multiple documents

To insert multiple documents into a collection, use the insert_many() method. The first parameter of this method is a list of dictionaries. The example code is as follows:

import pymongo

# Insert multiple documents
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]    # Select the collection

mylist = [
    {"name": "Zhang San", "age": 23, "gender": "male"},
    {"name": "Li Si", "age": 23, "gender": "male"},
    {"name": "Zhang San", "age": 23, "gender": "male"},
    {"name": "Zhang San", "age": 23, "gender": "male"},
]
x = mycol.insert_many(mylist)
# Print the _id values of all inserted documents
print(x.inserted_ids)

After running, the console prints a list containing the ObjectId of each inserted document.
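A crawler often produces items as a stream rather than as one ready-made list. The sketch below groups such a stream into fixed-size batches and calls insert_many() once per batch. The helper names batches and save_in_batches and the default batch size of 500 are assumptions made for illustration.

```python
from itertools import islice


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def save_in_batches(collection, items, size=500):
    """Insert items with one insert_many() call per batch; return the count."""
    inserted = 0
    for batch in batches(items, size):
        # batches() never yields an empty list, which insert_many() rejects.
        result = collection.insert_many(batch)
        inserted += len(result.inserted_ids)
    return inserted


print(list(batches(range(5), 2)))
```

With the collection from the example above, save_in_batches(mycol, mylist, size=2) would insert the four documents in two calls.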

5. Query a document

MongoDB uses the find() and find_one() methods to query the data in a collection, similar to the SELECT statement in SQL. Use the find_one() method to query a single document in the collection. The following example queries one document from the site collection. The code is as follows:

import pymongo

# Query a single document
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]

x = mycol.find_one()
print(x)

After running, the console will output a single document, for example:

{'_id': ObjectId('6185268a346c50e66086702c'), 'name': 'Zhang San', 'age': 23, 'gender': 'male'}

6. Query all data in the collection

The find() method can query all data in the collection, similar to the SELECT * operation in SQL. The following example code finds all the data in the site collection.

import pymongo

# Query all data in the collection
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]

for x in mycol.find():
    print(x)
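find() also accepts a filter dictionary as its first argument and a projection as its second, which correspond roughly to WHERE and the column list in SQL. The following sketch builds both as plain dictionaries; the function name build_filter and the chosen conditions are assumptions for illustration.

```python
def build_filter(name=None, min_age=None):
    """Build a MongoDB filter dict from optional conditions."""
    query = {}
    if name is not None:
        query["name"] = name                 # exact match on a field
    if min_age is not None:
        query["age"] = {"$gte": min_age}     # comparison operator
    return query


# Return only name and age, and hide the _id field.
projection = {"_id": 0, "name": 1, "age": 1}

print(build_filter(name="Zhang San", min_age=20))

# Usage with the collection from the example above:
#   for doc in mycol.find(build_filter(min_age=20), projection):
#       print(doc)
```

An empty filter matches every document, so build_filter() with no arguments behaves like the plain find() call.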

7. Modify data

Use the update_one() method to modify a document in a collection. The first parameter of this method is the query condition, and the second parameter describes the change, such as a $set of the fields to modify. If more than one matching document is found, only the first one is modified.

import pymongo

# Set age to 20 on the first document whose name is Zhang San
myclient = pymongo.MongoClient("mongodb://localhost:27017")
mydb = myclient["test_db"]
mycol = mydb["site"]

myquery = {"name": "Zhang San"}
newvalues = {"$set": {"age": 20}}
mycol.update_one(myquery, newvalues)

for x in mycol.find():
    print(x)
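Since the collection holds several documents named Zhang San, only one of them changes in the example above. To change every match, PyMongo provides update_many(), which takes the same two arguments. The sketch below wraps both calls in a hypothetical helper named set_age and returns the modified_count reported by the result object.

```python
def set_age(collection, name, age, all_matches=False):
    """Set `age` on documents whose name matches; return how many changed."""
    query = {"name": name}
    new_values = {"$set": {"age": age}}
    if all_matches:
        result = collection.update_many(query, new_values)
    else:
        result = collection.update_one(query, new_values)
    # modified_count counts documents whose stored value actually changed.
    return result.modified_count


# Usage with the collection from the example above:
#   set_age(mycol, "Zhang San", 20)                    # first match only
#   set_age(mycol, "Zhang San", 20, all_matches=True)  # every match
```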

Keywords: Python crawler

Added by glennn.php on Sun, 07 Nov 2021 04:27:29 +0200