Data analysis of coronavirus pneumonia in 2020

1. Use historical data to predict the number of future infections. This is a relatively simple prediction task, and there are several ways to approach it:
(1) Fit a curve to the historical data and extrapolate it to predict the number of infections over the next few days; the results are generally not very good.
(2) Use a time series model such as ARIMA to predict the number of infections over the next few days.
(3) Train a neural network such as an LSTM as the prediction model.

2. Draw a national epidemic map of the 2020 Wuhan novel coronavirus pneumonia with pyecharts

1. Predict the number of confirmed patients by exponential regression

Here I fit an exponential function to the historical number of confirmed patients, predict the next two days, and then compare the predictions with the official numbers once they are updated. The exponential regression turns out to overestimate the number of confirmed patients.
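
To make the idea of exponential regression concrete, here is a minimal sketch. It is a simplified stand-in rather than the method used in this post: it fits y = a·e^(b·x) without the offset term c, by running an ordinary least-squares line through (x, ln y) instead of calling `curve_fit`. The series is synthetic and noise-free (not real case counts), and the helper names `fit_exponential` and `predict` are made up for illustration.

```python
import math

def fit_exponential(xs, ys):
    """Least-squares fit of y = a * exp(b * x) via a straight line through (x, ln y)."""
    logs = [math.log(y) for y in ys]  # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                      # slope of the log-linear line
    a = math.exp(mean_l - b * mean_x)  # intercept, mapped back from log space
    return a, b

def predict(x, a, b):
    """Evaluate the fitted curve at x."""
    return a * math.exp(b * x)

# Synthetic, noise-free series (NOT real case counts): y = 50 * exp(0.3 * x)
days = list(range(10))
counts = [50 * math.exp(0.3 * d) for d in days]

a, b = fit_exponential(days, counts)
doubling_time = math.log(2) / b  # days for the fitted curve to double
next_two = [predict(d, a, b) for d in (10, 11)]

print(f"a={a:.2f}, b={b:.3f}, doubling time={doubling_time:.2f} days")
print([round(v) for v in next_two])
```

Because the fitted curve keeps doubling at a fixed rate, any slowdown in the real data makes the extrapolation run high, which matches the overestimate described above.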

In the figure below, an exponential function is fitted to the historical data. The model predicts 16092 and 20967 confirmed patients for February 1 and February 2, while the official figures are 14380 and 17205, so the predictions are too high by 1712 and 3762 (roughly 12% and 22%). The exponential fit is therefore not reliable for prediction: the errors are large, and the error on the second day is more than twice that of the first.
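
The size of the error can be checked with a few lines of arithmetic. The sketch below uses only the four numbers quoted above; the helper name `prediction_errors` is made up for illustration.

```python
# Predicted and official confirmed counts quoted in the text
predicted = {"02/01": 16092, "02/02": 20967}
official = {"02/01": 14380, "02/02": 17205}

def prediction_errors(predicted, official):
    """Return (day, absolute error, relative error) for each predicted day."""
    rows = []
    for day in predicted:
        abs_err = predicted[day] - official[day]  # positive means overestimate
        rel_err = abs_err / official[day]
        rows.append((day, abs_err, rel_err))
    return rows

errors = prediction_errors(predicted, official)
for day, abs_err, rel_err in errors:
    print(f"{day}: overestimate {abs_err} ({rel_err:.1%})")
```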

![Exponential fit of confirmed cases with the two predicted days](https://img-blog.csdnimg.cn/20200203131114923.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDk4NjQ1OQ==,size_16,color_FFFFFF,t_70)
Data source: Tencent's real-time novel coronavirus pneumonia tracker

Data crawling date: 23:00 on January 31, 2020

Source code:

    import datetime
    import json
    import time
    import urllib.request  # a bare "import urllib" is not guaranteed to load the request submodule
    
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from dateutil.parser import parse
    from scipy.optimize import curve_fit
    
    def date_change(date):
        date = '2020/' + date
        timeArray = time.strptime(date, "%Y/%m/%d")
        otherStyleTime = time.strftime("%Y/%m/%d", timeArray)
        return otherStyleTime
    
    def change_time(date):
        # Append labels for the two days after the last date in the list.
        today = parse(date[-1])
        for offset in (1, 2):
            next_day = today + datetime.timedelta(days=offset)
            # strftime avoids the trailing space left by str(next_day)[0:11]
            date.append(next_day.strftime('%Y/%m/%d'))
        return date
    
    def date_encode(date):
        d = date.split('/')
        month, day = int(d[0]), int(d[1])
        return 100 * month + day
    
    def date_decode(date):
        return '{}.{}'.format(str(date // 100), str(date % 100))
    
    def sequence_analyse(data):
        date_list, confirm_list, dead_list, heal_list, suspect_list, date = [], [], [], [], [], []
        data.sort(key=lambda x: date_encode(x['date']))
        for day in data:
            date.append(date_change(day['date']))
            date_list.append(date_encode(day['date']))
            confirm_list.append(int(day['confirm']))
            dead_list.append(int(day['dead']))
            heal_list.append(int(day['heal']))
            suspect_list.append(int(day['suspect']))
        return pd.DataFrame({
            'date': date_list,
            'confirm': confirm_list,
            'dead': dead_list,
            'heal': heal_list,
            'suspect': suspect_list
        }),date
    
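    # Tencent endpoint as used at crawl time (2020/01/31); it may no longer be available.
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=wuwei_ww_cn_day_counts'
    response = urllib.request.urlopen(url)
    json_data = response.read().decode('utf-8').replace('\n', '')
    data = json.loads(json_data)
    # The 'data' field is itself a JSON-encoded string, so it is decoded a second time.
    data = json.loads(data['data'])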
    
    df,date = sequence_analyse(data)
    x, y = df['date'].values[:-1], df['confirm'].values[:-1]
    x_idx = list(np.arange(len(x)))
    date = change_time(date)
    
    def func(x, a, b, c):
        return a * np.exp(b * x) + c
    
    plt.figure(figsize=(15, 8))
    plt.scatter(x, y, color='purple', marker='x', label="History data")
    plt.plot(x, y, color='gray', label="History curve")
    popt, pcov = curve_fit(func, x_idx, y)
    
    test_x = x_idx + [i + 2 for i in x_idx[-2:]]
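    # Shift the 0-based indices onto the encoded-date axis used for x.
    # x[0] is 113 when the series starts on 01/13; this only lines up
    # while the history stays within a single month.
    label_x = np.array(test_x) + x[0]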
    index_x = np.array(date)
    test_y = [func(i, popt[0], popt[1], popt[2]) for i in test_x]
    plt.plot(label_x, test_y, 'g--', label="Fitting curve")
    plt.title("{:.4}·e^({:.4}·x)+({:.4})".format(popt[0], popt[1], popt[2]), loc="center", pad=-40)
    plt.scatter(label_x[-2:], test_y[-2:], marker='x', color="red", linewidth=7, label="Predicted data")
    plt.xticks(label_x, index_x,rotation=45)
    plt.ylim([-500, test_y[-1] + 2000])
    plt.legend()
    
    for i in range(len(x)):
        plt.text(x[i], test_y[i] + 200, y[i], ha='center', va='bottom', fontsize=12, color='red')
    for a, b in zip(label_x, test_y):
        plt.text(a, b + 800, int(b), ha='center', va='bottom', fontsize=12)
    
    plt.show()
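
One fragile point in the code above is the date axis: `date_encode` maps "01/31" to 131 and "02/01" to 201, so the encoded values jump by 70 at the month boundary. A possible alternative, sketched here with only the standard library, is to keep the labels as strings and plot against day offsets from the first date. The helper names `extend_dates` and `day_offsets` are made up for this sketch and are not part of the original script.

```python
from datetime import datetime, timedelta

def extend_dates(labels, extra_days=2, fmt="%Y/%m/%d"):
    """Return labels plus the next `extra_days` calendar days after the last label."""
    last = datetime.strptime(labels[-1], fmt)
    future = [(last + timedelta(days=i)).strftime(fmt)
              for i in range(1, extra_days + 1)]
    return labels + future

def day_offsets(labels, fmt="%Y/%m/%d"):
    """Number of days between each label and the first one (evenly spaced x values)."""
    first = datetime.strptime(labels[0], fmt)
    return [(datetime.strptime(s, fmt) - first).days for s in labels]

labels = extend_dates(["2020/01/30", "2020/01/31"])
offsets = day_offsets(labels)
print(labels)   # labels now run across the month boundary
print(offsets)  # consecutive integers, suitable as x values and tick positions
```

With this approach the offsets can be passed to `curve_fit` and `plt.xticks` directly, and the labels stay aligned no matter which month the series starts or ends in.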

2. Draw a national epidemic map of the 2020 Wuhan novel coronavirus pneumonia with pyecharts

The exponential regression above predicts the number of confirmed patients, but it gives no intuitive picture of where the confirmed patients are or how they are distributed across the provinces of China. So here I used a crawler to collect the per-province figures from Tencent's real-time novel coronavirus pneumonia tracker and drew them on a map.

The figure below shows clearly that, besides the provinces adjacent to Hubei, the developed coastal areas also have a large number of confirmed cases. Judging from the map, population movement during the Spring Festival has had the greatest impact on the coastal areas: these areas see heavy population flows, which makes the spread of the new pneumonia more serious there.

So I hope everyone takes this new type of pneumonia seriously. After all, the map shows how widely it has already spread. I hope you all get through this period in good health, haha.
![National epidemic map of confirmed cases by province](https://img-blog.csdnimg.cn/20200203160720171.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDk4NjQ1OQ==,size_16,color_FFFFFF,t_70)
Data source: Tencent's real-time novel coronavirus pneumonia tracker

Data crawling date: 15:00 on February 3, 2020

Source code:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    # Old-style import: this script targets the pyecharts 0.5.x API
    from pyecharts import Map
    
    Province_list,Confirm_list,Heal_list,Dead_list = [],[],[],[]
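    # Local path to chromedriver; adjust for your machine.
    driver_path = r'F:\Python-Study-Files\anzhuang\chromedriver\chromedriver.exe'
    # executable_path is the Selenium 3 style; Selenium 4 passes the path through a Service object instead.
    driver = webdriver.Chrome(executable_path=driver_path)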
    driver.get("https://news.qq.com//zt2020/page/feiyan.htm")
    html = driver.page_source
    driver.quit()
    soup = BeautifulSoup(html, 'lxml')
    
    # Province names (the slice offsets presumably match the page layout at crawl time)
    h2_list = soup.find_all('h2', class_='blue')
    for h2 in h2_list[1:35]:
        Province_list.append(h2.get_text())
    
    # Confirmed cases per province
    confirm_divs = soup.find_all('div', class_='confirm')
    for confirm in confirm_divs[3:37]:
        Confirm_list.append(confirm.get_text())
    
    # Recovered cases per province
    heal_divs = soup.find_all('div', class_='heal')
    for heal in heal_divs[0:34]:
        Heal_list.append(heal.get_text())
    
    # Deaths per province
    dead_divs = soup.find_all('div', class_='dead')
    for dead in dead_divs[3:37]:
        Dead_list.append(dead.get_text())
    
    # Named china_map rather than map to avoid shadowing the built-in map()
    china_map = Map("2020 National epidemic map of new pneumonia in Wuhan", "As of 2020/2/3 15:00",
                    title_color="#404a59", title_pos="center", width=1000, height=800)
    china_map.add("", Province_list, Confirm_list, is_map_symbol_show=True, maptype='china',
                  is_visualmap=True, is_piecewise=True, visual_text_color='#000',
                  is_label_show=True, pieces=[
                      {"min": 1001, "label": ">1000"},  # no "max", so the top bucket is open-ended
                      {"max": 1000, "min": 500, "label": "500-1000"},
                      {"max": 499, "min": 200, "label": "200-499"},
                      {"max": 199, "min": 100, "label": "100-199"},
                      {"max": 99, "min": 10, "label": "10-99"},
                      {"max": 9, "min": 1, "label": "1-9"}])
    china_map.render("2020 National epidemic map of new pneumonia in Wuhan.html")
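
The piecewise legend in the map code assigns each province to a bucket by its confirmed count. The same bucketing can be sketched in plain Python, which is handy for checking the scraped values before drawing. This is an illustration under assumptions: the helpers `to_count` and `bucket_label` are made up, the sample strings are hypothetical rather than scraped values, the top bucket is treated as open-ended, and the scraped text is assumed to contain the count as a plain run of digits without thousands separators.

```python
import re

# (lower bound, label), highest first; labels follow the map legend.
# Assumption of this sketch: the top bucket has no upper bound.
BUCKETS = [
    (1001, ">1000"),
    (500, "500-1000"),
    (200, "200-499"),
    (100, "100-199"),
    (10, "10-99"),
    (1, "1-9"),
]

def to_count(text):
    """Extract the first run of digits from scraped text; 0 if none is found."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0

def bucket_label(count):
    """Return the legend label for a confirmed count, or None for zero cases."""
    for lower, label in BUCKETS:
        if count >= lower:
            return label
    return None

# Hypothetical scraped strings, for illustration only
samples = [" 12 ", "640", "1001", "0"]
labels_for_samples = [bucket_label(to_count(s)) for s in samples]
print(labels_for_samples)
```

Converting the text to integers first also makes it easy to spot entries that did not parse, since they show up as 0 and fall outside every bucket.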

This is my first blog post, so corrections are very welcome!


Added by cavemaneca on Sat, 22 Jan 2022 16:36:59 +0200
