Preface
A few days ago, a fan asked a question in the group about processing a JSON file.
He only needed the values of two fields, follower and ddate.
JSON is a common data-interchange format, so knowing how to work with it matters when analyzing crawled data, and it can make data extraction considerably faster.
Approach
This is not a difficult problem. Three approaches came up in the group: the first was pandas processing or regular expressions, mentioned by Cai Ge; the second was plain json processing, proposed by Xiao Bian; and the third was jsonpath, proposed by [Chengdu - IT technical support - Xiao Wang]. There are many possible methods, and four of them are shown here, so that fans have something to follow the next time they meet a similar problem.
Implementation process
1. Regular expression
This method extracts the values by pattern-matching the raw text. The code is as follows:

```python
import re

# Read the file, which holds the whole JSON document on a single line
with open('cartoon.txt', 'r', encoding='utf-8') as file:
    content = file.readline()

# Raw strings keep the regex backslashes intact
ddate_result1 = re.findall(r'"ddate":"(\d+\-\d+\-\d+)"', content)  # dates like 2021-05-01
ddate_result2 = re.findall(r'"ddate":"(.*?)"', content)  # anything between the quotes
follower_result1 = re.findall(r'"follower":(\d+),"', content)
print(ddate_result1)
print(ddate_result2)
print(follower_result1)
```
Running it prints the following results:
There are certainly other ways to get ddate and follower. This is only a starting point, and you are welcome to try more.
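Because cartoon.txt is not reproduced here, the sketch below runs the same kind of patterns against a small hypothetical one-line record (the `data`/`list` nesting and the `title` key are made up for illustration). It also shows a weakness of regex extraction: the follower pattern requires `,"` after the number, so a follower that is the last key of its object is silently skipped and the two result lists no longer line up.

```python
import re

# Hypothetical one-line sample; the real cartoon.txt layout may differ
content = ('{"data":{"list":['
           '{"ddate":"2021-05-01","follower":120,"title":"a"},'
           '{"ddate":"2021-05-02","follower":135,"title":"b"},'
           '{"ddate":"2021-05-03","follower":150}]}}')

# Strict pattern: only values shaped like digits-digits-digits
ddate_strict = re.findall(r'"ddate":"(\d+\-\d+\-\d+)"', content)
# Loose pattern: whatever sits between the quotes (non-greedy)
ddate_loose = re.findall(r'"ddate":"(.*?)"', content)
# Requires ',"' after the number, so the last record's follower is missed
follower = [int(n) for n in re.findall(r'"follower":(\d+),"', content)]

print(ddate_strict)  # ['2021-05-01', '2021-05-02', '2021-05-03']
print(ddate_loose)   # ['2021-05-01', '2021-05-02', '2021-05-03']
print(follower)      # [120, 135]
```

Spaces after the colons, as in pretty-printed JSON, would also defeat these patterns, which is one reason the parsing approaches that follow are sturdier.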
2. jsonpath method 1
jsonpath usage was covered in an earlier article; interested readers can take a look at "JSON and jsonpath for data extraction".
The following code was provided by the expert [Chengdu - IT technical support - Xiao Wang]:
```python
import json
from jsonpath import jsonpath

# Extract follower and ddate
with open("cartoon.txt", encoding="utf-8") as file:
    file_json = json.loads(file.readline())  # parse the single line of JSON text

follower = jsonpath(file_json, "$..follower")
ddate = jsonpath(file_json, "$..ddate")
print(follower)
print(ddate)
```
After the code runs, you will get the desired data, as shown in the following figure:
Here `$..` works like `//` in XPath: `..` selects matching descendant nodes at any depth, and `$` is the root node.
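To make the `$..` behavior concrete, here is a minimal pure-Python sketch of recursive descent over a parsed document. It is also roughly what the plain json approach from the group discussion amounts to: parse with json.loads, then walk the dicts and lists. The function name `find_all` and the sample document are hypothetical, and this is not how the jsonpath package is implemented internally.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` at any depth, in document order.

    A hand-rolled stand-in for the jsonpath expression "$..key".
    """
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending, like "..", even below a matching key
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical nested document
doc = json.loads('{"data": {"list": [{"ddate": "2021-05-01", "follower": 120},'
                 ' {"ddate": "2021-05-02", "follower": 135}]}}')

print(find_all(doc, "follower"))  # [120, 135]
print(find_all(doc, "ddate"))     # ['2021-05-01', '2021-05-02']
```

One practical difference: the jsonpath package returns False, not an empty list, when an expression matches nothing, so check the result before iterating over it.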
3. jsonpath method 2
This is another usage, provided by [Pipi]. Here is the code:
```python
import json
import jsonpath

# For a .json file: obj = json.load(open('Luo Xiang.json', 'r', encoding='utf-8'))
# Note: json.load expects an opened file object; a file-name string won't work
with open('cartoon.txt', 'r', encoding='utf-8') as file:
    obj = json.loads(file.readline())  # parse the single line of JSON text

follower = jsonpath.jsonpath(obj, '$..follower')  # jsonpath query on the parsed object
ddate = jsonpath.jsonpath(obj, '$..ddate')
print(follower)
print(ddate)
```
After the code is run, you can also get the expected results.
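Methods 1 and 2 both parse only `file.readline()`, which works because cartoon.txt evidently holds the whole JSON document on one line. The sketch below uses in-memory `io.StringIO` objects as hypothetical stand-ins for the file to show why that matters: in a pretty-printed file the first line is just `{`, so `readline()` is not enough there and `read()` is needed.

```python
import io
import json

# Hypothetical file contents, simulated in memory
one_line = io.StringIO('{"follower": 120, "ddate": "2021-05-01"}\n')
pretty = io.StringIO('{\n  "follower": 120,\n  "ddate": "2021-05-01"\n}\n')

# readline() keeps the trailing "\n"; json.loads ignores surrounding whitespace
first = json.loads(one_line.readline())
print(first)  # {'follower': 120, 'ddate': '2021-05-01'}

# In a pretty-printed file the first line is only "{", which is not valid JSON
try:
    json.loads(pretty.readline())
    readline_failed = False
except json.JSONDecodeError as err:
    readline_failed = True
    print("readline() is not enough:", err.msg)

# Reading the whole file works for both layouts
pretty.seek(0)
whole = json.loads(pretty.read())
print(whole == first)  # True
```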
If your file is a proper .json file, you can also load it directly with json.load. The code looks like this:
```python
import json
import jsonpath

# json.load expects an opened file object; a file-name string won't work
with open('Luo Xiang.json', 'r', encoding='utf-8') as file:
    obj = json.load(file)
    # Line-based alternative: obj = json.loads(file.readline())

follower = jsonpath.jsonpath(obj, '$..follower')  # jsonpath query on the parsed object
ddate = jsonpath.jsonpath(obj, '$..ddate')
print(follower)
print(ddate)
```
Running it also gives the expected results:
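The comments about passing a file object rather than a file name come down to the difference between json.load and json.loads. A small sketch, with `io.StringIO` standing in for the opened file and hypothetical sample values:

```python
import io
import json

text = '{"follower": 120, "ddate": "2021-05-01"}'

# json.loads parses JSON text that is already in a str
from_string = json.loads(text)

# json.load reads from a file-like object; io.StringIO stands in here for
# open('Luo Xiang.json', 'r', encoding='utf-8')
from_file = json.load(io.StringIO(text))

# Passing the file *name* to json.load fails: a str has no .read() method
try:
    json.load('Luo Xiang.json')
    name_failed = False
except AttributeError:
    name_failed = True

print(from_string == from_file)  # True
print(name_failed)               # True
```

In short, json.loads is for text you already hold as a string, and json.load is for an open file.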
4. jsonpath method 3
This one was provided by [Shenzhen Hua Bro] Huabo in the group. The code is as follows:
```python
import jsonpath

with open("Luo Xiang.txt", 'r', encoding="UTF-8") as fr:
    # Remove zero-width spaces after line breaks, then convert the str to a dict.
    # Caution: eval() runs the text as Python code and fails on JSON's true/false/null.
    file_json = eval(fr.read().replace('\n\u200b', ''))

follower = jsonpath.jsonpath(file_json, '$..follower')  # jsonpath query on the parsed object
ddate = jsonpath.jsonpath(file_json, '$..ddate')
print(follower)
print(ddate)
```
The approach is similar. Running it also yields the target data, as shown in the figure below.
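Method 3 converts the text with eval, which works only while the data happens to look like a Python dict literal. The sketch below uses a hypothetical string containing the zero-width spaces (\u200b) that the replace call removes, plus JSON's true and null, to show where eval breaks and how json.loads handles the same cleaned text. Swapping in json.loads is a suggestion, not part of the original method; note that json.loads still needs the cleanup, since \u200b is not JSON whitespace.

```python
import json

# Hypothetical text copied from a web page: a zero-width space (\u200b)
# follows some line breaks, and the JSON contains the literals true and null
raw = '{"follower": 120,\n\u200b"ddate": "2021-05-01",\n\u200b"vip": true, "note": null}'

# Same cleanup as method 3
cleaned = raw.replace('\n\u200b', '')

# eval() treats the text as Python source: true/null are undefined names there,
# and any code hidden in the text would be executed
try:
    eval(cleaned)
    eval_failed = False
except NameError as err:
    eval_failed = True
    print("eval failed:", err)

# json.loads understands JSON literals and never executes code
data = json.loads(cleaned)
print(data["follower"], data["ddate"], data["vip"], data["note"])  # 120 2021-05-01 True None
```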