Python json processing basics

Python handles json and dict data

The difference between json and dict

Python's dict is a data structure, whereas JSON is a data interchange format. JSON is plain text written according to an agreed format and has none of the behavior of a data structure. The printed form of a Python dict looks similar to JSON, but a dict is a complete in-memory data structure with its own methods.
A Python dict key can be any hashable object, while a JSON key can only be a string. The two look alike, but JSON is plain text and must be parsed before it can be manipulated.
JSON requires double quotation marks around strings; single quotation marks are not allowed, and every key must be quoted. A Python dict literal accepts either kind of quote.
Keys in a Python dict are unique. JSON syntax does not forbid repeated keys, although the JSON specification recommends unique names; Python's json.loads keeps the last value for a repeated key.
A Python dict can contain tuples, while JSON has only arrays, so a tuple is written out as an array and read back as a list.
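
The differences listed above can be checked with a few lines. This is a minimal sketch using only the standard json module; the sample values are made up for illustration.

```python
import json

# Non-string keys become strings, and a tuple becomes a JSON array.
text = json.dumps({1: "one", "pair": (1, 2)})
print(text)  # {"1": "one", "pair": [1, 2]}

# Reading the text back gives string keys and a list, not the original types.
restored = json.loads(text)
print(restored)  # {'1': 'one', 'pair': [1, 2]}

# A repeated key is accepted by json.loads; the last value wins.
dup = json.loads('{"a": 1, "a": 2}')
print(dup)  # {'a': 2}

# Single quotes are not valid JSON.
try:
    json.loads("{'a': 1}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)  # False
```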

Conversion method of json and dict in python

json data processing in python:
1. Serialization (encoding): dict object ➡ json string (for storage and transmission)
2. Deserialization (decoding): json string ➡ dict object (for analysis and processing)

import json
json.loads()  # Parse a json string into Python data (such as a dict)
json.dumps()  # Serialize Python data (such as a dict) into a json string
json.load()   # Read json from an open file object and parse it into Python data
json.dump()   # Serialize Python data to json and write it to an open file object
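
As a rough illustration of how the four functions pair up, the sketch below round-trips one dict through a string and through a file. The record contents and the file name record.json are assumptions made for the example; a temporary directory is used so nothing is left on disk.

```python
import json
import os
import tempfile

record = {"name": "example", "tags": ["a", "b"], "count": 3}

# dumps / loads work on strings.
as_text = json.dumps(record)
from_text = json.loads(as_text)

# dump / load work on open file objects.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    with open(path, "r", encoding="utf-8") as f:
        from_file = json.load(f)

print(from_text == record, from_file == record)  # True True
```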

python read / write json

Read the json file and convert it to dict

with open('.../XXX.json','r') as f:
    data = json.load(f)
# One-line alternative; note that the file object is not closed explicitly
data = json.load(open('.../XXX.json','r'))

Convert the dict data into json and write it to the json file

with open('json_data.txt','w+') as f:
    json.dump(data, f)

python processing dict data

Take key, value and key value pair

# Iterating over a dictionary in a for loop yields its keys by default
for k in dict_data:
    print(k)

# Iterate over the keys
for k in dict_data.keys():
    print(k)

# Iterate over the values
for v in dict_data.values():
    print(v)

# Iterate over the key value pairs
for k,v in dict_data.items():
    print(k, v)

Addition, deletion and modification of dict:

# Add a key value pair:
dd['new_key'] = 'XXX'

# Change key value
dd['key'] = 'XX'

# Delete key value pair
del dd['key']  # Delete a key value pair at the first level of the dictionary
del dd['key1']['key2']['key3'] # Delete a key value pair in multi-level nesting
dd.pop('key') # Delete the key value pair and return the corresponding value

# Determine whether it is a dictionary
isinstance(dd, dict)

Get key value pairs of multi-level nested dict:

def get_all_dict(data):
    # Print every key value pair, descending into nested dicts
    if isinstance(data, dict):
        for temp_key, temp_value in data.items():
            print('KEY --> {}\nVALUE --> {}'.format(temp_key, temp_value))
            get_all_dict(temp_value)  # Recurse into the value

Compare the differences between the two dictionaries

dict1 = {'a':1,'b':2,'c':3,'d':4}
dict2 = {'a':1,'b':2,'c':5,'e':6}
differ = set(dict1.items()) ^ set(dict2.items())
#All differences
#Output (set order may vary): {('c', 3), ('e', 6), ('c', 5), ('d', 4)}
diff = dict1.keys() & dict2
diff_vals = [(k, dict1[k], dict2[k]) for k in diff if dict1[k] != dict2[k]]
#Same key, different value
#Output: [('c', 3, 5)]

Other references

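a = {'x': 1, 'y': 2, 'z': 3}
# The literal for b is truncated in the source; the values for 'w' and 'z' are
# placeholders chosen only to be consistent with the outputs shown below
b = {'w': 10, 'x': 1, 'z': 4}
print(a.items())
>>>dict_items([('x', 1), ('y', 2), ('z', 3)])
# View the keys shared by the two dictionaries
print(a.keys() & b.keys())
>>>{'x', 'z'}
# View the keys that dictionary a and dictionary b do not share
print(a.keys() ^  b.keys())
>>>{'w', 'y'}
# View the keys in dictionary a but not in dictionary b
print(a.keys() - b.keys())
>>>{'y'}
# View the key value pairs common to dictionary a and dictionary b
print(a.items() & b.items())
>>>{('x', 1)}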

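Replace None (json null) with the string "null"

import json
d = {"name":None}
s = json.dumps(d)  # '{"name": null}'
# Caution: this plain text replacement also changes any string value that contains null
s = s.replace('null', '\"null\"')
d = json.loads(s)
print(d)
>> {'name': 'null'}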
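
Because a text replacement on the serialized string can also alter values that merely contain the letters null, one alternative is to replace None on the Python side before serializing. The sketch below is one possible way to do that; the helper name replace_none and the sample data are hypothetical, not part of the original.

```python
def replace_none(value, placeholder="null"):
    """Return a copy of value with every None replaced by placeholder."""
    if value is None:
        return placeholder
    if isinstance(value, dict):
        return {k: replace_none(v, placeholder) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_none(v, placeholder) for v in value]
    return value  # strings such as "nullable" are left untouched

d = {"name": None, "note": "nullable", "items": [None, 1]}
cleaned = replace_none(d)
print(cleaned)  # {'name': 'null', 'note': 'nullable', 'items': ['null', 1]}
```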

Other instructions:
In pandas, when the other values in a column are numeric, None is automatically converted to NaN, and assigning None back with s[s.isnull()] = None or s.replace(NaN, None) may have no effect. In that case, use the where function for the replacement.

None can be written to a database directly as a NULL value, whereas importing data that contains NaN can raise errors.

Many numpy and pandas functions can handle NaN, but may raise an error when they encounter None.

By default, the groupby function of pandas skips group keys that are None or NaN, so those groups are left out of the result.

In short, None is built into Python, while NaN is a floating-point value that numpy and pandas use to mark missing data.

Summary of equivalence comparison: (True indicates that it is determined to be equal)
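
A small sketch of how these comparisons behave in plain Python. It uses float("nan") from the standard library rather than numpy.nan, which is an assumption made to keep the example free of third-party imports.

```python
import math

nan = float("nan")

comparisons = {
    "None == None": None == None,   # True
    "None is None": None is None,   # True
    "nan == nan": nan == nan,       # False: NaN never equals itself
    "nan is nan": nan is nan,       # True: same object, identity not equality
    "math.isnan(nan)": math.isnan(nan),
    "None == nan": None == nan,     # False
}
for expr, result in comparisons.items():
    print(f"{expr:18} {result}")
```

The reliable test for NaN is therefore a function such as math.isnan, and the reliable test for None is the is operator.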

Convert the values in a list of dictionaries from str to float


{'pickup_latitude': '40.64499',
'pickup_longitude': '-73.78115',
'trip_distance': '18.38'}

Treatment method:

for list_dict in list_of_dicts:
    for key, value in list_dict.items():
        list_dict[key] = float(value) 

Traversal inserts a dictionary (dic) into a list

nid = "1,2"
datas = []
for i in nid.split(','):
    mydict = {}
    mydict["id"] = str(i)
    mydict["checked"] = True
    datas.append(mydict)  # Add the new dict to the list
print(nid.split(','))
print(datas)

['1', '2']
[{'id': '1', 'checked': True}, {'id': '2', 'checked': True}]

The sql statements below count the empty (NULL) and non empty columns in each row

--Empty
select id , num =
       (case when col1 is null then 1 else 0 end) + 
       (case when col2 is null then 1 else 0 end) + 
       (case when col3 is null then 1 else 0 end) + 
       (case when col4 is null then 1 else 0 end) 
from tb

--Non empty
select id , num =
       (case when col1 is not null then 1 else 0 end) + 
       (case when col2 is not null then 1 else 0 end) + 
       (case when col3 is not null then 1 else 0 end) + 
       (case when col4 is not null then 1 else 0 end) 
from tb

Find the rows with a non empty column in a project table

select id , 
	case when filed !='' then 1 else 0 end as num
from ods_crd_ddt_external_dev.vira_ft_ods_issue_s3_di_pt vfoisdp order by num desc limit 10

Convert two lists into Dictionaries

>>> keys = ['a', 'b', 'c']
>>> values = [1, 2, 3]
>>> dictionary = dict(zip(keys, values))
>>> print(dictionary)
{'a': 1, 'b': 2, 'c': 3}

Without zip

l1 = [1,2,3,4,5]
l2 = ['a','b','c','d','e']
d1 = {}
# Pair the elements by index; a nested loop would overwrite every key with the last value of l2
for i in range(len(l1)):
    d1[l1[i]] = l2[i]

print (d1)

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

Take the dict keys to form a list

students = {
    'Small armour':18,
    'Xiao B':20,
    'Xiao C':15
}
# 1. Take only the dictionary keys to form a new list
b = [ key for key,value in students.items() ]
# b = [ 'Small armour', 'Xiao B','Xiao C' ]
# 2. Swap the keys and values of the dictionary to form a new dictionary
b = { value:key for key,value in students.items() }
# b = {
#         18:'Small armour',
#         20:'Xiao B',
#         15:'Xiao C'
# }

Write the list into the json file (the list element is a dictionary)

[{'LT/C_score': 0.44917283324979085, 'class_uncertainty': 0.060122907161712646, 'image_id': 286849}, {'LT/C_score': 0.8943103022795436, 'class_uncertainty': 0.009622752666473389, 'image_id': 540932}, {'LT/C_score': 0.8732056996282929, 'class_uncertainty': 0.00022846460342407227, 'image_id': 32901}, {'LT/C_score': 0.8979904106858694, 'class_uncertainty': 0.002511739730834961, 'image_id': 324614}, {'LT/C_score': 0.8679021984870174, 'class_uncertainty': 0.0019818544387817383, 'image_id': 270474}, {'LT/C_score': 0.7331004226857969, 'class_uncertainty': 0.00713348388671875, 'image_id': 139}, {'LT/C_score': 0.11941635694217467, 'class_uncertainty': 0.0169374942779541, 'image_id': 241677}, {'LT/C_score': 0.02415653156423425, 'class_uncertainty': 0.014993846416473389, 'image_id': 528399}, {'LT/C_score': 0.1769359589898254, 'class_uncertainty': 0.2365768551826477, 'image_id': 565391}, {'LT/C_score': 0.866919080778817, 'class_uncertainty': 0.0014355778694152832, 'image_id': 196754}]
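import json
import os
def write_list_to_json(data_list, json_file_name, json_file_save_path):
    """
    Write a list to a json file
    :param data_list: List to write (each element is a dictionary)
    :param json_file_name: Name of the json file to write
    :param json_file_save_path: Directory in which to store the json file
    """
    # Joining the directory and the file name is an assumption; the original body did not use the save path
    with open(os.path.join(json_file_save_path, json_file_name), 'w') as f:
        json.dump(data_list, f)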

Read json data in txt

A txt file can store all kinds of data: structured two-dimensional tables, semi-structured json, and unstructured plain text.
The two-dimensional tables usually kept in excel and csv files can be stored directly in txt files.

Semi-structured json can also be stored in txt files.

Most commonly, a txt file stores a block of unstructured data.

Read json type semi-structured data from txt

import json

with open("../data/test.txt","r",encoding="utf-8") as f:
    data = json.load(f)

Note that strings in the json text must use double quotation marks ("); text that uses single quotation marks will fail to parse.
Get the data type with the following command

# print(type(data))

Read a json file line by line into a list (each list element is the dictionary for one line)

A json file in the following format

{"ID": "001", "name": "a", "age": "12"}
{"ID": "002", "name": "b", "age": "23"}
{"ID": "003", "name": "c", "age": "42"}

Convert the json file into dictionary data by line and store it in the list

# coding=utf-8
import json
papers = []    # Each element of the list is the dictionary parsed from one line
# The path below is the location of my json file, followed by the file encoding
with open("E:\\pycharm1\\testset2.json", 'r', encoding = 'utf-8') as file:
    for line in file:
        dic = json.loads(line)
        papers.append(dic)

Take the key "name" as an example of extracting data by attribute:

for paper in papers:
    S1 = paper['name']     # Read the value stored under the key "name"
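
Put together, the line-by-line reading and the extraction by key might look like the sketch below. An io.StringIO object stands in for the open file so that the example runs without a file on disk, and the blank-line check is an added precaution that is not in the original.

```python
import io
import json

# Stand-in for an open text file with one json object per line.
sample = io.StringIO(
    '{"ID": "001", "name": "a", "age": "12"}\n'
    '{"ID": "002", "name": "b", "age": "23"}\n'
    '\n'
    '{"ID": "003", "name": "c", "age": "42"}\n'
)

papers = []
for line in sample:
    line = line.strip()
    if not line:          # skip blank lines instead of failing on them
        continue
    papers.append(json.loads(line))

names = [paper["name"] for paper in papers]
print(names)  # ['a', 'b', 'c']
```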

Write list object to txt file

list1 = ['1','2','3','4']
fileObject = open('list.txt','w')
for word in list1:
    fileObject.write(word)
    fileObject.write('\n')
fileObject.close()

Read a txt file

Here a txt file is read with the open function. The encoding argument specifies that the file is decoded as "utf-8", and errors='ignore' skips bytes that cannot be decoded.
It is also good practice to use the with statement for file IO, which removes the need to call close() every time a file is opened.

with open("list.txt","r",encoding="utf-8",errors='ignore') as f:
    data = f.readlines()
    for line in data:
        word = line.strip()  # Remove the surrounding whitespace and the trailing line break

Read numeric data into an array with numpy, which can then be converted into a list object

import numpy
array_ = numpy.loadtxt('list.txt')
list_ = list(array_) 

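Read each line of a txt file and store it in a list

Example 1
With this approach, the json data in the txt file is stored as str

Method 1:

lines = []  # One str per line (the name is illustrative; the loop body is missing from the source)
with open('accounts.txt','r') as f:
    for line in f:
        lines.append(line)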

Method 2:
Processing data:

Step 1: read the data in the file

Reading the data in the file is straightforward.

After the file is read, its content is one long string. Printed, it looks like the text in the txt file, but in Python it is stored as a string, so the split method of the string is used to cut each number into a separate string.

Step 2: data conversion

The steps above can be reduced to three lines of code:

file = open ('....txt','r')
S = file.read().split()  # Split the file content on whitespace into a list of strings
P = list(int(i) for i in S)

Example 2

Method 1:

f=open('One hundred Tang Poems.txt', encoding='gbk')
for line in f:
    print(line.strip())
f.close()

line.strip() removes leading and trailing whitespace, including the line break
encoding sets the encoding format of the file, for example utf-8 or gbk
Method 2:

f=open('One hundred Tang Poems.txt')
line = f.readline()  # Read the first line
while line:  # readline() returns an empty string at the end of the file
    print(line.strip())  # Process the line without its line break
    line = f.readline()  # Read the next line, including its line break
f.close()  # Close file

Write dict object to json file

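import json
dictObj = {  
    'andy': {  # The first key is missing from the source; 'andy' is a placeholder
        'age': 23,  
        'city': 'shanghai',  
        'skill': 'python'  
    },  
    'william': {  
        'age': 33,  
        'city': 'hangzhou',  
        'skill': 'js'  
    }  
}  
jsObj = json.dumps(dictObj)  
fileObject = open('jsonFile.json', 'w')  
fileObject.write(jsObj)  
fileObject.close()  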

Read json file

with open('data.json', 'r') as f:
    data = json.load(f)

Write json file

with open('data.json', 'w') as f:
    json.dump(data, f)

Read txt content as list

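file_path = 'C:/...'
txt = open(file_path, encoding='utf-8').read().splitlines()  # Assumed: the source omits how txt is read
for w in txt:
    print(w)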

Delete the same elements of two lists (one-to-one deletion)

for i in range(len(c)):
    for j in c:
        if j in d:
#[10, 20, 20, 100, 50, 50]
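
The definitions of the two lists and the loop body did not survive above, so the following is an independent sketch of one-to-one deletion rather than the original code. It uses collections.Counter to work out how many copies of each value the two lists share; the function name and the input lists are hypothetical.

```python
from collections import Counter

def remove_common_one_to_one(c, d):
    """Cancel matching elements pairwise and return the leftovers of both lists."""
    common = Counter(c) & Counter(d)   # copies of each value that both lists share

    def strip(items):
        budget = common.copy()
        kept = []
        for item in items:
            if budget[item] > 0:
                budget[item] -= 1      # drop one shared copy
            else:
                kept.append(item)      # keep whatever has no partner left
        return kept

    return strip(c), strip(d)

# Hypothetical inputs, chosen only for illustration.
c = [1, 2, 2, 3, 5]
d = [2, 3, 3, 4]
left_c, left_d = remove_common_one_to_one(c, d)
print(left_c)  # [1, 2, 5]
print(left_d)  # [3, 4]
```

Building new lists avoids removing elements from a list while iterating over it, which can silently skip items.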

Delete dictionary element

  1. clear() method of a Python dictionary (deletes all elements in the dictionary)
# -*- coding: UTF-8 -*-
site = {'name': 'My blog address', 'number': 10000, 'url':''}
site.clear()  # Empty all dictionary entries
  2. pop() method of a Python dictionary (deletes the entry for the given key and returns the deleted value)
# -*- coding: UTF-8 -*-
site= {'name': 'My blog address', 'number': 10000, 'url':''}
pop_obj=site.pop('name') # Delete the key value pair {'name': 'My blog address'} and return its value
print(pop_obj)   # Output: My blog address
  3. popitem() method of a Python dictionary (removes and returns a key value pair; since Python 3.7 this is the most recently inserted pair, while older versions returned an arbitrary one)
# -*- coding: UTF-8 -*-
site= {'name': 'My blog address', 'number': 10000, 'url':''}
pop_obj=site.popitem() # Remove and return the last inserted key value pair
print(pop_obj)   # Output: ('url', '')
  4. del statement (deletes a single element, or the whole dictionary in one operation)
# -*- coding: UTF-8 -*-
site= {'name': 'My blog address', 'alexa': 10000, 'url':''}
del site['name'] # Delete the entry whose key is 'name'
del site  # Delete the dictionary itself; the name site is no longer defined afterwards

Removing dictionary key / value pairs - rookie tutorial

Given a dictionary, remove the dictionary key / value pair.

Remove using del

test_dict = {"Runoob" : 1, "Google" : 2, "Taobao" : 3, "Zhihu" : 4} 
# Output original dictionary
print ("Before dictionary removal : " + str(test_dict)) 
# Remove Zhihu with del
del test_dict['Zhihu'] 
# Output removed dictionary
print ("After dictionary removal : " + str(test_dict)) 
# Removing a key that does not exist raises a KeyError
#del test_dict['Baidu']

Execute the above code and the output result is:

Before dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3, 'Zhihu': 4}
After dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3}

Remove using pop()

test_dict = {"Runoob" : 1, "Google" : 2, "Taobao" : 3, "Zhihu" : 4} 
# Output original dictionary
print ("Before dictionary removal : " + str(test_dict)) 
# Remove Zhihu using pop
removed_value = test_dict.pop('Zhihu') 
# Output removed dictionary
print ("After dictionary removal : " + str(test_dict)) 
print ("Removed key Corresponding value by : " + str(removed_value)) 
print ('\r') 
# If you pass a default value, pop() raises no exception for a key that does not exist; the default is returned, so you can customize the message
removed_value = test_dict.pop('Baidu', 'There is no such key(key)') 
# Output removed dictionary
print ("After dictionary removal : " + str(test_dict)) 
print ("The removed value is : " + str(removed_value))

Execute the above code and the output result is:

Before dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3, 'Zhihu': 4}
After dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3}
Removed key Corresponding value by : 4

After dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3}
The removed value is : There is no such key(key)

Remove using items()

test_dict = {"Runoob" : 1, "Google" : 2, "Taobao" : 3, "Zhihu" : 4} 
# Output original dictionary
print ("Before dictionary removal : " + str(test_dict)) 
# Remove Zhihu with a dict comprehension over items()
new_dict = {key:val for key, val in test_dict.items() if key != 'Zhihu'} 
# Output removed dictionary
print ("After dictionary removal : " + str(new_dict))

Execute the above code and the output result is:

Before dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3, 'Zhihu': 4}
After dictionary removal : {'Runoob': 1, 'Google': 2, 'Taobao': 3}

Difference between pop and del

Both the del statement and the pop() method delete a key value pair from a dictionary. The difference is that pop() returns the value it removes, while del returns nothing. Syntax: pop(key[,default]), where key is the key to delete and default is the value returned when the key does not exist.

dic = {"Zhang San": 24, "Li Si": 23, "Wang Wu" : 25, "Zhao Liu" : 27}
del dic["Zhang San"]
print(dic)
print(dic.pop("Li Si"))
print(dic)

{'Li Si': 23, 'Wang Wu': 25, 'Zhao Liu': 27}
23
{'Wang Wu': 25, 'Zhao Liu': 27}

Updated on February 22, 2019:
From Python 3.7, dictionaries preserve insertion order. Modifying the value of an existing key does not affect the order. If you delete a key and add it again, the key is placed at the end. The dump(s)/load(s) functions of the standard json library also preserve this order.
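
The ordering rules can be observed directly. The snippet below is a small sketch that assumes Python 3.7 or later; the keys and values are arbitrary.

```python
import json

d = {"b": 1, "a": 2, "c": 3}
d["a"] = 20            # updating a value keeps the key's position
order_after_update = list(d)

del d["b"]
d["b"] = 10            # a deleted key that is added again moves to the end
order_after_readd = list(d)

# A json round trip keeps the same key order.
round_trip = list(json.loads(json.dumps(d)))

print(order_after_update)  # ['b', 'a', 'c']
print(order_after_readd)   # ['a', 'c', 'b']
print(round_trip)          # ['a', 'c', 'b']
```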

Repeat each element of a list the same number of times while keeping the original order

list_before = [1, 2, 3, 4, 5]
list_after = [val for val in list_before for i in range(2)]


list_after = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

Expand multiple nested complex data (lists, dictionaries, etc.)

def data_flatten(key,val,con_s='_',basic_types=(str,int,float,bool,complex,bytes)):
    """
    Data expansion generator that yields flattened key value pairs
    param key: Key, assumed to be a basic type that needs no further analysis
    param val: Value; if it is a complex type, it is analyzed further
    param con_s: Connector placed between the current level key and the parent key. The default is _
    param basic_types: Tuple of basic data types; values of these types are output directly
    return: Key value pair tuple
    """
    if isinstance(val, dict):
        for ck,cv in val.items():
            # Pass con_s and basic_types down so nested levels use the same settings
            yield from data_flatten(con_s.join([str(key),str(ck)]).lstrip(con_s), cv, con_s, basic_types)
    elif isinstance(val, (list,tuple,set)):
        for item in val:
            yield from data_flatten(key,item,con_s,basic_types)
    elif isinstance(val, basic_types) or val is None:
        yield str(key).lower(),val

Sample data:

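test_dict = {'a': 1, 'b': {'c': [2, 3], 'd': None}}  # Illustrative stand-in; the original sample data is missing

for i in data_flatten('',test_dict):
    print(i)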

Flattening (including nested) dicts

def prefix_dict(di_, prefix_s=''):
    """
    Prefix every key in the dictionary with prefix_s
    :param di_:
    :param prefix_s:
    """
    return {prefix_s + k: v for k, v in di_.items()}
def spear_dict(di_, con_s='.'):
    """
    Flatten a dict: if a value is itself a dict, recurse until the underlying data type is not a dictionary
    Where it may be useful: reshaping document-style data into a more relational form
    :param di_: Input dictionary
    :param con_s: Connection symbol between levels
    :return: A dictionary with a depth not greater than 1; other nested data types remain the same
    """
    ret_di = {}
    for k, v in di_.items():
        if type(v) is dict:
            v = spear_dict(v, con_s)  # Pass con_s down so every level uses the same connector
            # There may be a better way to write without writing this layer
            # for k_, v_ in v.items():
            #     ret_di.update({con_s.join([k, k_]): v_})
            ret_di.update(prefix_dict(v, prefix_s=k + con_s))
        else:
            ret_di.update({k: v})
    return ret_di
>>> di_
{'title': 'Xintian Commercial Street', 'reliability': 7, 'addressComponents': {'streetNumber': '', 'city': 'Shenzhen City', 'street': '', 'province': 'Guangdong Province', 'district': 'Longhua District'}, 'location': {'lng': 114.09127044677734, 'lat': 22.700519561767578}, 'adInfo': {'adcode': '440309'}, 'level': 11, 'more_deep': {'loca': {'lng': 114.09127044677734, 'lat': 22.700519561767578}}}
>>> spear_dict(di_)
{'title': 'Xintian Commercial Street', 'reliability': 7, 'addressComponents.streetNumber': '', 'addressComponents.city': 'Shenzhen City', 'addressComponents.street': '', 'addressComponents.province': 'Guangdong Province', 'addressComponents.district': 'Longhua District', 'location.lng': 114.09127044677734, 'location.lat': 22.700519561767578, 'adInfo.adcode': '440309', 'level': 11, 'more_deep.loca.lng': 114.09127044677734, 'more_deep.loca.lat': 22.700519561767578}
>>> spear_dict(di_, '_')
{'title': 'Xintian Commercial Street', 'reliability': 7, 'addressComponents_streetNumber': '', 'addressComponents_city': 'Shenzhen City', 'addressComponents_street': '', 'addressComponents_province': 'Guangdong Province', 'addressComponents_district': 'Longhua District', 'location_lng': 114.09127044677734, 'location_lat': 22.700519561767578, 'adInfo_adcode': '440309', 'level': 11, 'more_deep_loca_lng': 114.09127044677734, 'more_deep_loca_lat': 22.700519561767578}

Nested list expansion

Problem 1: convert a list of lists such as list_1 = [[1, 2], [3, 4, 5], [6, 7], [8], [9]] into list_2 = [1, 2, 3, 4, 5, 6, 7, 8, 9].

# Common method
list_1 = [[1, 2], [3, 4, 5], [6, 7], [8], [9]]
list_2 = []
for _ in list_1:
    list_2 += _

# List comprehension
list_1 = [[1, 2], [3, 4, 5], [6, 7], [8], [9]]
list_2 = [i for k in list_1 for i in k]

# Using sum
list_1 = [[1, 2], [3, 4, 5], [6, 7], [8], [9]]
list_2 = sum(list_1, [])

Problem 2: for a more complex, irregularly nested list such as list =[1,[2],[[3]],[[4,[5],6]],7,8,[9]], the methods above no longer work, so a recursive method is used instead.

def flat(nums):
    res = []
    for i in nums:
        if isinstance(i, list):
            res.extend(flat(i))  # Recurse into the nested list
        else:
            res.append(i)
    return res

Two authentication methods I learned during my internship:
Token authentication and user name plus password authentication
1. What is Token authentication?

A token is a credential used for user authentication, and the method is usually called token authentication. When the user logs in successfully for the first time, the server generates a token and returns it to the user. After that, the user only needs to send this token with each request, without the user name and password.

2. Token verification process

① When logging in for the first time, the client requests login through user name and password

② The server receives the request and verifies the user name and password

③ If the verification passes, the server will issue a Token, which will be saved in the database of the server and returned to the client

④ The client receives the Token and stores it, for example in a cookie, SessionStorage or LocalStorage.

⑤ Every time the client requests resources from the server, it will bring a Token

⑥ The server receives the request, then verifies the Token carried in the client request, that is, compares the client's Token with the Token in the local database. If the two tokens are the same, it indicates that the user has successfully logged in, the verification passes, and the server returns the request data to the client.

⑦ If the two tokens are different, the authentication fails and you need to log in through your user name and password

3. Why use Token authentication?

If the client sends the user name and password with every request, the server has to query the database each time, compare the credentials, and respond accordingly. Querying the database consumes resources and adds load to the server. Token authentication is introduced to verify the login while reducing that load. Tokens are generally kept in memory, so validating one does not require a database query, which reduces the number of database queries and the load on the server.
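
The verification process described in steps ① to ⑦ can be sketched in a few lines. This is only an illustration under simplifying assumptions: the USERS and TOKENS dictionaries, the function names and the sample credentials are all hypothetical, tokens are kept in an in-memory dict, and a real service would also hash passwords and expire tokens.

```python
import hmac
import secrets

# Hypothetical in-memory stores used only for this sketch.
USERS = {"alice": "s3cret"}   # user name -> password
TOKENS = {}                   # token -> user name

def login(username, password):
    """Steps 1 to 3: verify the credentials and issue a token, or return None."""
    expected = USERS.get(username)
    if expected is None or not hmac.compare_digest(expected, password):
        return None
    token = secrets.token_hex(16)   # random 32-character hex string
    TOKENS[token] = username
    return token

def get_resource(token):
    """Steps 5 to 7: serve the request only when the token is known."""
    username = TOKENS.get(token)
    if username is None:
        return "401: log in with user name and password"
    return f"data for {username}"

token = login("alice", "s3cret")
print(get_resource(token))            # data for alice
print(get_resource("forged-token"))   # 401: log in with user name and password
print(login("alice", "wrong"))        # None
```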

What are the incremental table, full table and snapshot table

Depending on what data is stored each day and whether the table is partitioned by day, tables can be divided into incremental tables, full tables and snapshot tables


| | Full table | Incremental table | Snapshot table |
| data | Includes the full data up to the previous day | Incremental data of the previous day | Includes the full data up to the previous day |
| partition | No partition (ymd is the current date) | Partitioned by day | Partitioned by day |
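
One way to picture how the three kinds of table relate is the toy model below, where plain dicts stand in for tables. The partition dates, ids and row values are hypothetical, and the sketch only covers new and changed rows, not deletions.

```python
# Hypothetical daily change sets keyed by partition date (ymd); each maps id -> row.
incremental = {
    "20220101": {1: "a", 2: "b"},
    "20220102": {2: "b2", 3: "c"},
}

full = {}        # one unpartitioned table holding the latest state
snapshots = {}   # one complete copy of the data per day

for ymd in sorted(incremental):
    full.update(incremental[ymd])     # apply that day's new and changed rows
    snapshots[ymd] = dict(full)       # freeze the full state for that day

print(full)                   # {1: 'a', 2: 'b2', 3: 'c'}
print(snapshots["20220101"])  # {1: 'a', 2: 'b'}
```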

For details:

In addition, full outer join

Keywords: Python JSON

Added by sridsam on Tue, 04 Jan 2022 20:04:44 +0200