Crawling and analyzing China's total import and export trade data with Python

(1) Background of the topic

Over the past two years, the COVID-19 pandemic has posed great challenges to every country. Some countries closed their borders, and some saw their economies decline. China, however, has kept its economy from declining and has even made progress. By analyzing total import and export trade, I want to find out how China's total import and export volume during the two years of the epidemic compares with earlier years. Data visualization makes the differences between recent years easy to see.

(2) Design of the topic web crawler

1. Name of the crawler

Crawler and analysis of China's total import and export trade

2. Content crawled and characteristics of the data

Crawl the website's table of China's total import and export trade and analyze the monthly import and export value (USD 100 million), its year-on-year and month-on-month growth, the cumulative import and export value (USD 100 million), and its year-on-year growth.

3. Overview of the crawler design (implementation ideas and technical difficulties)

Implementation idea: crawl the website's table of China's total import and export trade, which holds the monthly import and export value (USD 100 million), its year-on-year and month-on-month growth, the cumulative import and export value (USD 100 million), and its year-on-year growth. Find the link under the relevant tag, follow it, crawl the data on the next page, then clean the data and visualize it.
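The "find the link and follow it" step can be sketched as follows. This is a minimal, dependency-free illustration that uses the standard library's `html.parser` and `urllib.parse.urljoin` rather than BeautifulSoup. The HTML fragment, the `LinkCollector` class and the `next_page_urls` helper are hypothetical, since the report does not show the real page's link markup; only the base URL comes from the report.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def next_page_urls(base_url, html):
    """Return absolute URLs for all links found in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    # urljoin resolves relative links against the page they were found on
    return [urljoin(base_url, href) for href in collector.links]


# Hypothetical fragment; the real link markup is an assumption.
sample_html = '<div><a href="page2.html">next</a><a href="/historical/other.html">other</a></div>'
base_url = "https://market.cnal.com/historical/jck.html"
urls = next_page_urls(base_url, sample_html)
print(urls)
```

Each resolved URL could then be passed to the same download function that fetches the first page.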

Technical difficulties: walking through the page tags step by step, errors caused by slicing the data, cleaning and processing the data, and visualizing the usable data.

The specific ideas and analysis are shown in the code and screenshots below.

(3) Structural analysis of the topic page

1. Structure and features of the topic page

 

 

2. HTML page parsing

The first screenshot shows that we first need to locate the total import and export trade table and then look below it. The second screenshot shows that the data to crawl sits in the tr rows of the tbody tag. The first two rows are column headers. The third row starts the actual data: the total import and export value (USD 100 million), year-on-year growth and month-on-month growth for November 2021, followed by October, September, and so on.

3. Node (tag) search and traversal method

Use a nested loop: the outer loop traverses the tr tags, and the inner loop traverses the td tags under each tr to get the data.
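The nested traversal can be sketched on a small stand-in table. The HTML below is made up to match the shape described above (two header rows, then data rows), and its values are illustrative rather than real trade figures.

```python
from bs4 import BeautifulSoup

# Made-up table: two header rows followed by two data rows.
sample_html = """
<table><tbody>
  <tr><td>month</td><td>amount</td><td>growth</td></tr>
  <tr><td></td><td>USD100mn</td><td>%</td></tr>
  <tr><td>2021-11</td><td>100.5</td><td>1.2%</td></tr>
  <tr><td>2021-10</td><td>98.0</td><td>-0.5%</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
rows = []
for tr in soup.find_all("tr")[2:]:      # outer loop: skip the two header rows
    cells = []
    for td in tr.find_all("td"):        # inner loop: the cells of this row
        cells.append(td.get_text(strip=True))
    rows.append(cells)

print(rows)
```

Asking each row for its td tags avoids having to count the whitespace text nodes that sit between the cells.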

(4) Web crawler program design

The main body of the crawler program includes the following parts, with source code and detailed comments attached, and a screenshot of the output after each part.

1. Data crawling and acquisition

import csv

import matplotlib.pyplot as plt
import pandas as pd
import requests
from bs4 import BeautifulSoup


def getHTMLText(url):
    """Download a page and return its text, or "no" if the request fails."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/"
                      "537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 "
                      "Safari/537.36 Edg/96.0.1054.34"
    }
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        print(r)  # e.g. <Response [200]>
        return r.text
    except requests.RequestException:
        return "no"


url = "https://market.cnal.com/historical/jck.html"
a = getHTMLText(url)
soup = BeautifulSoup(a, "html.parser")  # name the parser explicitly

 

 

 

 

2. Clean and process the data

# Skip the two header rows, then collect the <td> cells of every data row.
lll = []
for tr in soup.find_all("tr")[2:]:
    # find_all("td") returns only the cell tags, so there is no need to pick
    # the odd-numbered children (1, 3, ..., 21) to skip the whitespace between
    # them. Keep the 11 columns: the month plus ten values.
    lll.append(tr.find_all("td")[:11])
lll

 

 

 

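# Extract the text of every cell.
xxx = []
for gtk in lll:
    # get_text() returns the cell content even if the <td> has attributes,
    # unlike slicing str(td)[4:-5].
    xxx.append([td.get_text(strip=True) for td in gtk])
xxx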
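Fixed-offset slicing of a cell's HTML, such as `str(td)[4:-5]`, only works when the opening tag is exactly `<td>`, which is one way the slicing errors mentioned in the design overview can arise. The sketch below contrasts it with a pattern that strips any tag. The helper names and the `class="num"` markup are hypothetical.

```python
import re


def cell_text_by_slicing(td_html):
    """Drop the first 4 and last 5 characters, assuming a bare <td>...</td>."""
    return td_html[4:-5]


def cell_text_by_pattern(td_html):
    """Strip every tag, whatever its attributes, and trim whitespace."""
    return re.sub(r"<[^>]+>", "", td_html).strip()


plain = "<td>5123.4</td>"
styled = '<td class="num">5123.4</td>'   # hypothetical markup with an attribute

print(cell_text_by_slicing(plain))    # the slice works for a bare tag
print(cell_text_by_slicing(styled))   # part of the tag leaks into the value
print(cell_text_by_pattern(styled))   # the pattern still returns the number
```

A parser method such as BeautifulSoup's `get_text()` is the more robust choice when the tag objects are available; the regular expression is only a fallback for raw strings.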

 

 

 

# Load the table saved in the data-persistence step (section 6).
# pandas renames the repeated header names by appending .1, .2 and .3.
df = pd.read_csv("D:\\Zheng Biao Liu python curriculum design\\rk.csv")

amount_cols = [c for c in df.columns if c.startswith("amount of money")]
growth_cols = [c for c in df.columns if "growth" in c]

# Amounts: convert to float; cells that cannot be parsed become 0.
for col in amount_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

# Growth rates: drop the trailing "%" and divide by 100;
# cells that cannot be parsed become 0.
for col in growth_cols:
    rate = df[col].astype(str).str.rstrip("%")
    df[col] = pd.to_numeric(rate, errors="coerce").fillna(0) / 100
df
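The type conversion applied to each cell can be shown in isolation. This is a small sketch in plain Python; `parse_amount` and `parse_percent` are hypothetical helper names, and the inputs are made-up examples.

```python
def parse_amount(text):
    """Convert an amount string to float; return 0.0 when it cannot be parsed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return 0.0


def parse_percent(text):
    """Convert a string such as '12.5%' to the fraction 0.125; 0.0 on failure."""
    try:
        return float(str(text).strip().rstrip("%")) / 100
    except ValueError:
        return 0.0


print(parse_amount("5123.4"), parse_amount(""))
print(parse_percent("12.5%"), parse_percent("--"))
```

Replacing unparseable cells with 0 follows the cleaning step above. A missing-value marker such as NaN may be preferable when zeros would distort averages or box plots.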

 

 

3. Data analysis and visualization (e.g. bar chart, histogram, scatter plot, box plot, distribution plot)

import matplotlib.pyplot as plt
import pandas as pd

# df is the cleaned DataFrame from step 2; it can also be reloaded from disk:
# df = pd.read_csv("C:\\Users\\wei\\data.csv")

# The six months with the largest value in the first amount column
ggf = df.sort_values(by="amount of money(USD100mn)", ascending=False).head(6)

# Font settings so that Chinese labels and minus signs display correctly
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(10, 6))

# One bar per amount column in each group; a group is 0.8 wide in total
series = [
    ("amount of money(USD100mn)", "r"),
    ("amount of money(USD100mn).1", "brown"),
    ("amount of money(USD100mn).2", "b"),
    ("amount of money(USD100mn).3", "cyan"),
]
total_width = 0.8
width = total_width / len(series)
for k, (col, color) in enumerate(series):
    # Shift each series so that the four bars sit side by side around the tick
    x = [i + (k - (len(series) - 1) / 2) * width for i in range(len(ggf))]
    plt.bar(x, ggf[col], width=width, label=col, color=color)
    # Write each value above its own bar
    for aa, ab in zip(x, ggf[col]):
        plt.text(aa, ab, str(ab), ha="center", va="bottom", fontsize=10)

# Label each group of bars with its month
plt.xticks(range(len(ggf)), ggf["month"].astype(str).tolist())
plt.legend()
plt.xlabel("")
plt.ylabel("USD100mn")
plt.title("Comparison of import and export amount")
plt.grid()
plt.show()

 

 

 

 

# Use box plots to see where the typical import and export amounts lie
# and how the amounts are distributed
box_options = [
    ("amount of money(USD100mn)", {}),
    ("amount of money(USD100mn).1", {}),
    ("amount of money(USD100mn).2", {"sym": ">"}),
    ("amount of money(USD100mn).3", {"sym": "+", "meanline": True, "showmeans": True}),
]
for k, (col, extra) in enumerate(box_options, start=1):
    plt.subplot(2, 2, k)
    # Arguments left at their defaults are omitted
    plt.boxplot(df[col], notch=True, patch_artist=True, **extra)
    plt.title(col)          # name the column that is actually plotted
    plt.ylabel("USD100mn")
plt.tight_layout()
plt.show()
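The numbers a box plot summarizes can also be computed directly. The sketch below uses the standard library's `statistics` module on made-up values, not real trade data. It assumes the common 1.5 × IQR rule for flagging outliers, which is matplotlib's default whisker setting; `box_stats` is a hypothetical helper.

```python
import statistics


def box_stats(values):
    """Return the quartiles, the mean and the outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Made-up sample with one extreme value
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

In this sample the single extreme value pulls the mean well above the median, which is why the box plot's median line is the safer indicator of a typical amount.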

 

 

 

 

4. Based on the relationships in the data, analyze the correlation coefficient between two variables, draw a scatter plot, and establish a regression equation between the variables (univariate or multivariate).

 

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Line

df = pd.read_csv("D:\\Zheng Biao Liu python curriculum design\\rk.csv")
recent = df[:10]  # the first ten rows of the table
line1 = (
    Line()
    # Use the month as the x axis; tolist() gives plain Python values
    .add_xaxis(recent['month'].astype(str).tolist())
    .add_yaxis("Import", recent['amount of money(USD100mn)'].tolist())
    .add_yaxis("Export", recent['amount of money(USD100mn).1'].tolist())
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Total value of import and export"),
    )
)
line1.render_notebook()
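The correlation coefficient and the univariate regression equation named in this step's heading can be sketched in plain Python. The two lists are made-up numbers, not the crawled trade data, and `pearson_r` and `fit_line` are hypothetical helper names. In practice the inputs might be two of the amount columns.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data that lies exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
print(r, slope, intercept)
```

For the scatter plot, `plt.scatter(xs, ys)` draws the points, and `plt.plot(xs, [slope * x + intercept for x in xs])` overlays the fitted line.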

 

 

 

# Organize the drawing data
def quartile_counts(values):
    """Count the values that fall into each quarter of the descending ranking."""
    ranked = pd.to_numeric(values, errors="coerce").fillna(0)
    # Sort by value and renumber the rows so that position i is rank i.
    # (Looking rows up with .loc on the sorted frame still used the old labels.)
    ranked = ranked.sort_values(ascending=False).reset_index(drop=True)
    n = len(ranked)
    # Thresholds a quarter, a half and three quarters of the way down the ranking
    # (the original used rows 29, 60 and 90, which assumes about 120 rows)
    fa, fb, fc = ranked.iloc[n // 4], ranked.iloc[n // 2], ranked.iloc[3 * n // 4]
    a_100 = int((ranked > fa).sum())
    a_75 = int(((ranked <= fa) & (ranked > fb)).sum())
    a_50 = int(((ranked <= fb) & (ranked > fc)).sum())
    a_25 = int((ranked <= fc).sum())
    return [a_25, a_50, a_75, a_100]


# Font settings so that Chinese labels and minus signs display correctly
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

labels = ["0-25%", "25-50%", "50-75%", "75-100%"]
colors = ['gold', 'chartreuse', 'cyan', 'pink']
charts = [
    ("amount of money(USD100mn)", 'Import amount (USD 100 million)'),
    ("amount of money(USD100mn).1", 'Export amount (USD 100 million)'),
]
for k, (col, title) in enumerate(charts, start=1):
    plt.subplot(1, 2, k)
    plt.pie(
        quartile_counts(df[col]),  # drawing data
        explode=[0, 0, 0, 0],      # no slice is pulled out of the pie
        labels=labels,
        colors=colors,
        autopct='%.2f%%',
        pctdistance=0.8,
        labeldistance=1.1,
        startangle=180,
        radius=1.2,
        counterclock=False,
        wedgeprops={'linewidth': 1.5, 'edgecolor': 'r'},
        textprops={'fontsize': 10, 'color': 'black'},
    )
    plt.title(title)  # add the diagram title
plt.show()

 

 

 

 

 

6. Data persistence

 

# newline="" stops the csv module from writing blank lines between rows on Windows
with open("D:\\Zheng Biao Liu python curriculum design\\rk.csv", "w", encoding="utf-8", newline="") as fi:
    writer = csv.writer(fi)
    # Column names; pandas later renames the repeated names by appending .1, .2 and .3
    writer.writerow(["month",
                     "amount of money(USD100mn)", "Year on year growth", "Month on month growth",
                     "amount of money(USD100mn)", "Year on year growth", "Month on month growth",
                     "amount of money(USD100mn)",
                     "Year on year growth",
                     "amount of money(USD100mn)", "Year on year growth"])
    writer.writerows(xxx)
# The with block closes the file, so no fi.close() is needed
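Because the header row repeats the same names, pandas appends `.1`, `.2` and `.3` to the duplicates when it reads the file, which is where column names such as `amount of money(USD100mn).1` come from. The sketch below makes that renaming explicit before writing and round-trips the rows through an in-memory file. `make_unique`, the shortened column names and the row values are hypothetical.

```python
import csv
import io


def make_unique(names):
    """Append .1, .2, ... to repeated names, mirroring how pandas renames them."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return unique


header = make_unique(["month", "amount", "growth", "amount", "growth"])
rows = [["2021-11", "100.5", "1.2%", "90.0", "0.8%"]]   # made-up values

buffer = io.StringIO(newline="")        # stands in for the CSV file on disk
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

buffer.seek(0)
records = list(csv.DictReader(buffer))  # read the rows back by column name
print(header)
print(records[0]["amount.1"])
```

Writing unique names up front means the file reads the same way in any tool, not only in pandas.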

 

 

 

7. Combine the code from the parts above into the complete program

 

import csv

import matplotlib.pyplot as plt
import pandas as pd
import requests
from bs4 import BeautifulSoup
from pyecharts import options as opts
from pyecharts.charts import Line

CSV_PATH = "D:\\Zheng Biao Liu python curriculum design\\rk.csv"
AMOUNT = "amount of money(USD100mn)"
AMOUNT_COLS = [AMOUNT, AMOUNT + ".1", AMOUNT + ".2", AMOUNT + ".3"]


# ---- Data crawling and acquisition ----
def getHTMLText(url):
    """Download a page and return its text, or "no" if the request fails."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/"
                      "537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 "
                      "Safari/537.36 Edg/96.0.1054.34"
    }
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding
        print(r)
        return r.text
    except requests.RequestException:
        return "no"


url = "https://market.cnal.com/historical/jck.html"
soup = BeautifulSoup(getHTMLText(url), "html.parser")

# ---- Parse the table: skip the two header rows, keep the text of the 11 cells ----
xxx = [[td.get_text(strip=True) for td in tr.find_all("td")[:11]]
       for tr in soup.find_all("tr")[2:]]

# ---- Data persistence ----
with open(CSV_PATH, "w", encoding="utf-8", newline="") as fi:
    writer = csv.writer(fi)
    writer.writerow(["month",
                     AMOUNT, "Year on year growth", "Month on month growth",
                     AMOUNT, "Year on year growth", "Month on month growth",
                     AMOUNT, "Year on year growth",
                     AMOUNT, "Year on year growth"])  # column names
    writer.writerows(xxx)

# ---- Clean and process the data ----
# pandas renames the repeated header names by appending .1, .2 and .3
df = pd.read_csv(CSV_PATH)
for col in AMOUNT_COLS:
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)
for col in [c for c in df.columns if "growth" in c]:
    rate = df[col].astype(str).str.rstrip("%")
    df[col] = pd.to_numeric(rate, errors="coerce").fillna(0) / 100

# Font settings so that Chinese labels and minus signs display correctly
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

# ---- Bar chart: the six months with the largest first amount column ----
ggf = df.sort_values(by=AMOUNT, ascending=False).head(6)
plt.figure(figsize=(10, 6))
width = 0.8 / len(AMOUNT_COLS)
for k, (col, color) in enumerate(zip(AMOUNT_COLS, ["r", "brown", "b", "cyan"])):
    x = [i + (k - (len(AMOUNT_COLS) - 1) / 2) * width for i in range(len(ggf))]
    plt.bar(x, ggf[col], width=width, label=col, color=color)
    for aa, ab in zip(x, ggf[col]):
        plt.text(aa, ab, str(ab), ha="center", va="bottom", fontsize=10)
plt.xticks(range(len(ggf)), ggf["month"].astype(str).tolist())
plt.legend()
plt.ylabel("USD100mn")
plt.title("Comparison of import and export amount")
plt.grid()
plt.show()

# ---- Box plots: distribution of each amount column ----
box_extras = [{}, {}, {"sym": ">"}, {"sym": "+", "meanline": True, "showmeans": True}]
for k, (col, extra) in enumerate(zip(AMOUNT_COLS, box_extras), start=1):
    plt.subplot(2, 2, k)
    plt.boxplot(df[col], notch=True, patch_artist=True, **extra)
    plt.title(col)
    plt.ylabel("USD100mn")
plt.tight_layout()
plt.show()

# ---- Line chart of the first ten rows ----
recent = df[:10]
line1 = (
    Line()
    .add_xaxis(recent['month'].astype(str).tolist())
    .add_yaxis("Import", recent[AMOUNT].tolist())
    .add_yaxis("Export", recent[AMOUNT + ".1"].tolist())
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Total value of import and export"),
    )
)
line1.render_notebook()


# ---- Pie charts: share of months in each quarter of the ranking ----
def quartile_counts(values):
    """Count the values that fall into each quarter of the descending ranking."""
    ranked = values.sort_values(ascending=False).reset_index(drop=True)
    n = len(ranked)
    fa, fb, fc = ranked.iloc[n // 4], ranked.iloc[n // 2], ranked.iloc[3 * n // 4]
    a_100 = int((ranked > fa).sum())
    a_75 = int(((ranked <= fa) & (ranked > fb)).sum())
    a_50 = int(((ranked <= fb) & (ranked > fc)).sum())
    a_25 = int((ranked <= fc).sum())
    return [a_25, a_50, a_75, a_100]


charts = [(AMOUNT, 'Import amount (USD 100 million)'),
          (AMOUNT + ".1", 'Export amount (USD 100 million)')]
for k, (col, title) in enumerate(charts, start=1):
    plt.subplot(1, 2, k)
    plt.pie(
        quartile_counts(df[col]),
        explode=[0, 0, 0, 0],
        labels=["0-25%", "25-50%", "50-75%", "75-100%"],
        colors=['gold', 'chartreuse', 'cyan', 'pink'],
        autopct='%.2f%%',
        pctdistance=0.8,
        labeldistance=1.1,
        startangle=180,
        radius=1.2,
        counterclock=False,
        wedgeprops={'linewidth': 1.5, 'edgecolor': 'r'},
        textprops={'fontsize': 10, 'color': 'black'},
    )
    plt.title(title)
plt.show()

 

(5) Summary (10 points)

1. What conclusions can be drawn from the analysis and visualization of the topic data? Was the expected goal achieved?

Conclusions: (1) In recent years, the overall change in total import and export trade under the epidemic has not been particularly large.

(2) Overall, China's total exports exceed its total imports, so total spending is lower than total income.

(3) China's import and export trade keeps improving.

The expected goal was achieved.

2. What did I gain from completing this design, and what could be improved?

I gained a lot from this design, especially from filtering data during data processing. I also understand web crawlers better, for example how to convert data types to get the result I want.

I benefited a great deal. What needs improvement is probably my speed at writing programs, which is too slow because I lack programming experience.

 

 

 

Added by nagrgk on Wed, 29 Dec 2021 20:44:50 +0200