Data analysis of the 2020 novel coronavirus pneumonia outbreak
1. Use historical data to predict future infection counts. This is a relatively simple prediction task, and there are several ways to approach it:
(1) Fit a curve to the historical data and extrapolate it to predict the number of infections over the next few days. The results are generally not very good.
(2) Use a time series model such as ARIMA to predict the number of infections over the next few days.
(3) Train a neural network such as an LSTM as the prediction model.
2. Draw a national map of the 2020 Wuhan novel pneumonia epidemic with pyecharts
1. Predict the number of confirmed cases with exponential regression
Here I fit an exponential function to the historical confirmed-case counts, then waited for the official figures to be updated so that I could compare. The exponential regression turned out to overestimate the number of confirmed cases.
The figure below shows the exponential model fitted to the historical data. It predicted 16092 and 20967 confirmed cases for February 1 and February 2, while the official real-time figures were 14380 and 17205. The predictions are too high by 1712 and 3762 cases respectively, and the error more than doubles from the first forecast day to the second. Exponential fitting is therefore not a reliable predictor here.
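To make the size of the miss concrete, the short sketch below recomputes the absolute and relative errors from the four numbers quoted above. The variable names are illustrative only and do not come from the original source code.

```python
# Predicted vs. official confirmed counts, as quoted in the text above.
predicted = {"02/01": 16092, "02/02": 20967}
official = {"02/01": 14380, "02/02": 17205}

errors = {}
for day in predicted:
    abs_err = predicted[day] - official[day]  # positive means an overestimate
    rel_err = abs_err / official[day]         # error as a fraction of the official count
    errors[day] = (abs_err, rel_err)
    print(f"{day}: off by {abs_err} cases ({rel_err:.1%})")
```

In relative terms the forecast is roughly 12% too high one day ahead and roughly 22% too high two days ahead.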
![insert picture description here](https://img-blog.csdnimg.cn/20200203131114923.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDk4NjQ1OQ==,size_16,color_FFFFFF,t_70)
Data source: Tencent's real-time novel coronavirus pneumonia tracker
Data crawled at: 23:00 on January 31, 2020
Source code:

```python
import datetime
import json
import time
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from dateutil.parser import parse
from scipy.optimize import curve_fit


def date_change(date):
    """Turn a 'month/day' string into a normalized '2020/mm/dd' string."""
    date = '2020/' + date
    time_array = time.strptime(date, "%Y/%m/%d")
    return time.strftime("%Y/%m/%d", time_array)


def change_time(date):
    """Append the two days after the last date in the list (forecast labels)."""
    today = parse(date[-1])
    tomorrow = today + datetime.timedelta(days=1)
    second_day = tomorrow + datetime.timedelta(days=1)
    date.append(tomorrow.strftime("%Y/%m/%d"))
    date.append(second_day.strftime("%Y/%m/%d"))
    return date


def date_encode(date):
    """Encode 'month/day' as the integer 100 * month + day, e.g. '01/13' -> 113."""
    d = date.split('/')
    month, day = int(d[0]), int(d[1])
    return 100 * month + day


def date_decode(date):
    """Inverse of date_encode, formatted as 'month.day'."""
    return '{}.{}'.format(str(date // 100), str(date % 100))


def sequence_analyse(data):
    """Sort the daily records by date and collect them into a DataFrame."""
    date_list, confirm_list, dead_list, heal_list, suspect_list, date = [], [], [], [], [], []
    data.sort(key=lambda x: date_encode(x['date']))
    for day in data:
        date.append(date_change(day['date']))
        date_list.append(date_encode(day['date']))
        confirm_list.append(int(day['confirm']))
        dead_list.append(int(day['dead']))
        heal_list.append(int(day['heal']))
        suspect_list.append(int(day['suspect']))
    return pd.DataFrame({
        'date': date_list,
        'confirm': confirm_list,
        'dead': dead_list,
        'heal': heal_list,
        'suspect': suspect_list
    }), date


# Download the daily counts; the 'data' field is itself a JSON string
url = 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_cn_day_counts'
response = urllib.request.urlopen(url)
json_data = response.read().decode('utf-8').replace('\n', '')
data = json.loads(json_data)
data = json.loads(data['data'])

df, date = sequence_analyse(data)
# The last row is left out of the fit
x, y = df['date'].values[:-1], df['confirm'].values[:-1]
x_idx = list(np.arange(len(x)))
date = change_time(date)


def func(x, a, b, c):
    """Exponential model with an offset: a * e^(b * x) + c."""
    return a * np.exp(b * x) + c


plt.figure(figsize=(15, 8))
plt.scatter(x, y, color='purple', marker='x', label="History data")
plt.plot(x, y, color='gray', label="History curve")

# Fit against the day index 0, 1, 2, ... rather than the encoded date
popt, pcov = curve_fit(func, x_idx, y)
# Extend the index by two days to get the forecast points
test_x = x_idx + [i + 2 for i in x_idx[-2:]]
# 113 is the encoded first date (January 13); this offset only lines up
# with x while every date in the series falls in January
label_x = np.array(test_x) + 113
index_x = np.array(date)
test_y = [func(i, popt[0], popt[1], popt[2]) for i in test_x]

plt.plot(label_x, test_y, 'g--', label="Fitting curve")
plt.title("{:.4}·e^({:.4}·x)+({:.4})".format(popt[0], popt[1], popt[2]), loc="center", pad=-40)
plt.scatter(label_x[-2:], test_y[-2:], marker='x', color="red", linewidth=7, label="Predicted data")
# date holds one more label than there are ticks (the last row was excluded
# from x), so trim it; recent matplotlib rejects mismatched lengths
plt.xticks(label_x, index_x[:len(label_x)], rotation=45)
plt.ylim([-500, test_y[-1] + 2000])
plt.legend()

# Annotate the historical points with the real counts (red)
for i in range(len(x)):
    plt.text(x[i], test_y[i] + 200, y[i], ha='center', va='bottom', fontsize=12, color='red')
# Annotate every point on the fitted curve with its fitted value
for a, b in zip(label_x, test_y):
    plt.text(a, b + 800, int(b), ha='center', va='bottom', fontsize=12)
plt.show()
```
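As a quick cross-check on a fit like the one above, a pure exponential a·e^(b·x) can also be fitted by ordinary least squares on log(y), which gives the daily growth rate and the doubling time directly. The sketch below is not the method used in this post: it drops the offset c, uses only the standard library, and runs on synthetic data generated inside the script. The helper `fit_log_linear` and all numbers in it are assumptions for illustration.

```python
import math


def fit_log_linear(xs, ys):
    """Least-squares fit of log(y) = log(a) + b * x; returns (a, b)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                      # slope = growth rate per day
    a = math.exp(mean_l - b * mean_x)  # intercept, mapped back from log space
    return a, b


# Synthetic series (NOT real case data): y = 100 * e^(0.3 * x) for days 0..9.
days = list(range(10))
counts = [100 * math.exp(0.3 * d) for d in days]

a, b = fit_log_linear(days, counts)
doubling_time = math.log(2) / b   # days for the fitted curve to double
forecast = a * math.exp(b * 11)   # extrapolate two days past the last point (day 9)
print(f"a={a:.1f}, b={b:.3f}, doubling time={doubling_time:.2f} days")
```

When the doubling time is only a few days, a small error in b is amplified quickly in the extrapolation, which can be one reason an exponential forecast overshoots once real growth starts to slow.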
2. Draw a national map of the 2020 Wuhan novel pneumonia epidemic with pyecharts
The exponential regression above predicts the number of confirmed cases, but it says nothing about how those cases are distributed, in particular across the provinces of China. So I wrote a crawler to collect the per-province figures from Tencent's real-time novel coronavirus pneumonia tracker and plotted them on a map.
The map below shows clearly that, besides the provinces adjacent to Hubei, the developed coastal regions also have large numbers of confirmed cases. Judging from the map, travel during the Spring Festival had its greatest effect on the coastal regions: these areas see heavy population movement, which made the spread of the new pneumonia more severe there.
So I hope everyone takes this new pneumonia seriously. As the map shows, it has already spread very widely. I hope you all get through this period in good health, haha.
![insert picture description here](https://img-blog.csdnimg.cn/20200203160720171.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDk4NjQ1OQ==,size_16,color_FFFFFF,t_70)
Data source: Tencent's real-time novel coronavirus pneumonia tracker
Data crawled at: 15:00 on February 3, 2020
Source code:

```python
from bs4 import BeautifulSoup
from pyecharts import Map  # top-level import as in pyecharts 0.5.x
from selenium import webdriver

Province_list, Confirm_list, Heal_list, Dead_list = [], [], [], []

# Selenium drives Chrome so that page_source contains the rendered page
driver_path = r'F:\Python-Study-Files\anzhuang\chromedriver\chromedriver.exe'
driver = webdriver.Chrome(executable_path=driver_path)
driver.get("https://news.qq.com//zt2020/page/feiyan.htm")
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'lxml')

# The slice offsets select the 34 province-level entries and depend on the
# layout of the page at the time it was crawled
for h2 in soup.find_all('h2', class_='blue')[1:35]:
    Province_list.append(h2.get_text())

for confirm in soup.find_all('div', class_='confirm')[3:37]:
    Confirm_list.append(confirm.get_text())

# Healed and dead counts are collected but not drawn on the map
for heal in soup.find_all('div', class_='heal')[0:34]:
    Heal_list.append(heal.get_text())

for dead in soup.find_all('div', class_='dead')[3:37]:
    Dead_list.append(dead.get_text())

# Named china_map to avoid shadowing the built-in map()
china_map = Map("2020 National epidemic map of new pneumonia in Wuhan", "By 2020/2/3/15:00",
                title_color="#404a59", title_pos="center", width=1000, height=800)
china_map.add("", Province_list, Confirm_list, is_map_symbol_show=True, maptype='china',
              is_visualmap=True, is_piecewise=True, visual_text_color='#000',
              is_label_show=True,
              pieces=[
                  {"max": 10000, "min": 1001, "label": ">1000"},
                  {"max": 1000, "min": 500, "label": "500-1000"},
                  {"max": 499, "min": 200, "label": "200-499"},
                  {"max": 199, "min": 100, "label": "100-199"},
                  {"max": 99, "min": 10, "label": "10-99"},
                  {"max": 9, "min": 1, "label": "1-9"}])
china_map.render("2020 National epidemic map of new pneumonia in Wuhan.html")
```
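The `pieces` list in the source code above defines the legend bins of the map. The sketch below mirrors those bins in plain Python to check which bin a given count falls into. `piece_label` is a hypothetical helper written for this illustration, not part of pyecharts, and the sample counts are made up rather than taken from the crawled data.

```python
# Same bins as the `pieces` argument in the source code above.
PIECES = [
    {"max": 10000, "min": 1001, "label": ">1000"},
    {"max": 1000, "min": 500, "label": "500-1000"},
    {"max": 499, "min": 200, "label": "200-499"},
    {"max": 199, "min": 100, "label": "100-199"},
    {"max": 99, "min": 10, "label": "10-99"},
    {"max": 9, "min": 1, "label": "1-9"},
]


def piece_label(count):
    """Return the label of the first bin containing count, or None if no bin matches."""
    for piece in PIECES:
        if piece["min"] <= count <= piece["max"]:
            return piece["label"]
    return None


# Made-up sample counts, chosen to hit the bin edges.
for sample in (0, 7, 150, 1000, 1001, 12000):
    print(sample, "->", piece_label(sample))
```

In this sketch, 0 and anything above 10000 match no bin, so it may be worth raising the upper bound of the top bin if a province's count can exceed it.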
This is my first blog post, so corrections are very welcome!
References:
https://blog.csdn.net/qq_26822029/article/details/104106679
https://blog.csdn.net/weixin_43746433/article/details/91346371