Inventory four methods of reading json files and extracting the contents of json files in Python

preface

A few days ago, a fan asked a question about json file processing in the group.

It seems that he only needs the corresponding values under the two fields of follower and ddate.

We know that json is a common form of data transmission, so the relevant operations of json are more important for the data analysis of crawling data, which can speed up our data extraction efficiency.

thinking

IT's not very difficult to solve this problem. Three methods are proposed in the group. The first is the pd processing or regular expression mentioned by Cai Ge, and the second is the json processing proposed by Xiao Bian, The third is the jsonpath proposed by [Chengdu - IT technical support - Xiao Wang]. In short, there are many methods. Here are four treatment methods. I hope there will be rules to follow when fans encounter similar problems next time.

Implementation process

1. Regular expression

This method can be viewed and extracted through the matching method. The code is as follows:

import re
import jsonfile = open('cartoon.txt', 'r', encoding='utf-8')
content = file.readlineddate_result1 = re.findall('"ddate":"(\d+\-\d+\-\d+)"', content)
ddate_result2 = re.findall('"ddate":"(.*?)"', content)
follower_result1 = re.findall('"follower":(\d+),"', content)
print(ddate_result1)
print(ddate_result2)
print(follower_result1)

After running, you can get the results:

There must be many other ways to get ddate and follower. Here is just a brick to attract jade. You are welcome to try more.

2. jsonpath method I

The usage of jsonpath was mentioned in this article before. Interested partners can also take a look at JSON and jsonpath for data extraction.

The following is the code given by the boss of [Chengdu - IT technical support - Xiao Wang]:

from jsonpath import jsonpath
import json"""follower and ddate"""
with open("cartoon.txt", encoding="utf-8") as file:
file_json = json.loads(file.readline)follower = jsonpath(file_json, "$..follower")
ddate = jsonpath(file_json, "$..ddate")
print(follower)print(ddate)

After the code runs, you will get the desired data, as shown in the following figure:

This Just like / / in xpath, the descendant node, $is the root node.

3. jsonpath method 2

This is another usage. It is provided by the trumpet [Pipi] and directly on the code.

import json
import jsonpath# obj = json.load(open('Luo Xiang.json', 'r', encoding='utf-8')) # Note that this is in the form of a file. You can't put a string of file name directly
file = open('cartoon.txt', 'r', encoding='utf-8') # Note that this is in the form of a file. You can't put a string of file name directly
obj = json.loads(file.readline)follower = jsonpath.jsonpath(obj, '$..follower') # File object jsonpath syntax
ddate = jsonpath.jsonpath(obj, '$..ddate') # File object jsonpath syntax
print(follower)
print(ddate)

After the code is run, you can also get the expected results.

Of course, if your file is a json file, you can also read it directly. The code is similar to:

import json
import jsonpathobj = json.load(open('Luo Xiang.json', 'r', encoding='utf-8')) # Note that this is in the form of a file. You can't put a string of file name directly
# file = open('Luo Xiang.json', 'r', encoding='utf-8') # Note that this is in the form of a file. You can't put a string of file name directly
# obj = json.loads(file.readline)follower = jsonpath.jsonpath(obj, '$..follower') # File object jsonpath syntax
ddate = jsonpath.jsonpath(obj, '$..ddate') # File object jsonpath syntax
print(follower)
print(ddate)

After running, you can also get the expected results:

4. jsonpath method 3

This is provided by [Shenzhen Hua Bro] Huabo in Qunli. The code is as follows:

import json
import jsonpathwith open("Luo Xiang.txt", 'r', encoding="UTF-8") as fr:
file_json = eval(fr.read.replace('\n\u200b', '')) # Convert read str to dictionary
follower = jsonpath.jsonpath(file_json, '$..follower') # File object jsonpath syntax
ddate = jsonpath.jsonpath(file_json, '$..ddate') # File object jsonpath syntax
print(follower)
print(ddate)

The method is similar. After running, you can also get the prefetched target data, as shown in the figure below.

Keywords: Python JSON crawler

Added by jtp51 on Wed, 15 Dec 2021 19:32:55 +0200