Data collection for A-share analysis: stock lists and stock prices

Data is the prerequisite for data analysis. This article describes how to use Python to collect basic data for A-shares listed in Shanghai and Shenzhen: the stock list and stock prices.

1. Stock List

A-shares trade on two exchanges in China: the Shanghai Stock Exchange and the Shenzhen Stock Exchange. We get the list of all A-shares from their official websites.

For the Shanghai Stock Exchange, the list can be downloaded from http://www.sse.com.cn/assortment/stock/list/share/. When you open the page, you will see a download button in the upper right corner, as shown in the following image:

So how do we download this data with Python? Let's go straight to the code:

from urllib import request

# Download the list of A-shares from the Shanghai Stock Exchange

sse_stock_list_url = 'http://query.sse.com.cn/security/stock/downloadStockListFile.do?csrcCode=&stockCode=&areaName=&stockType=1'
request_headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/56.0.2924.87 Safari/537.36',
    'Referer': 'http://www.sse.com.cn/assortment/stock/list/share/',
}

req = request.Request(sse_stock_list_url, headers=request_headers)
resp = request.urlopen(req)
# Decode as gb2312; otherwise the data will not be read correctly
result = resp.read().decode('gb2312')

Note that the headers must be provided; otherwise, the server returns the following error:

null({"jsonCallBack":"null","success":"false","error":"System busy...","errorType":"ExceptionInterceptor"})
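
One way to catch this case early is to check whether the response body is this JSONP-style error before handing it to a parser. The sketch below assumes the error always has the shape shown above; the function name extract_sse_error is hypothetical and not part of any library.

```python
import json
import re

# Matches a JSONP-style wrapper such as  null({...})  around a JSON object.
_JSONP_PATTERN = re.compile(r'^\s*\w+\((\{.*\})\)\s*$', re.DOTALL)


def extract_sse_error(text):
    """Return the error message if `text` looks like the error payload, else None."""
    match = _JSONP_PATTERN.match(text)
    if match is None:
        return None  # not wrapped like the error response
    try:
        payload = json.loads(match.group(1))
    except ValueError:
        return None  # wrapper matched, but the body is not valid JSON
    # Assumption: failures are flagged with the string "false", as in the sample.
    if payload.get('success') == 'false':
        return payload.get('error', 'unknown error')
    return None


sample = ('null({"jsonCallBack":"null","success":"false",'
          '"error":"System busy...","errorType":"ExceptionInterceptor"})')
print(extract_sse_error(sample))
```

Calling this on the decoded response and raising an exception when it returns a message keeps a failed download from being parsed as if it were a stock list.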

Running the download code with these headers gives the following result:

The returned stock list is tab-separated (\t) text. Since we have already decoded the response into a string, we can parse it directly with pandas:

import pandas as pd
from io import StringIO

TESTDATA = StringIO(result)
df = pd.read_csv(TESTDATA, sep='\t')
print(df)

The results are as follows:

The list of stocks on the Shenzhen Stock Exchange can be obtained in a similar way; only the URL and request parameters differ. Open http://www.szse.cn/market/stock/list/index.html, which also has a button to download the list of stocks, as shown in the following image:

Here is the Python code to download the list:

szse_stock_list_url = 'http://www.szse.cn/api/report/ShowReport?SHOWTYPE=xlsx&CATALOGID=1110&TABKEY=tab1'
szse_stock_list_file = 'szse_stock_list.xlsx'
request.urlretrieve(szse_stock_list_url, szse_stock_list_file)
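
The Python documentation describes urlretrieve as a legacy interface, and it has no parameter for custom headers. As an alternative, here is a minimal sketch built on Request and urlopen. The helper name download_to_file is hypothetical and the 30-second timeout is an arbitrary assumption. Whether the Shenzhen endpoint needs particular headers is not established here, so the headers argument is optional.

```python
import shutil
from urllib import request


def download_to_file(url, path, headers=None, timeout=30):
    """Stream the response body for `url` into the file at `path`."""
    req = request.Request(url, headers=headers or {})
    # Copy in chunks so the whole file is never held in memory at once.
    with request.urlopen(req, timeout=timeout) as resp, open(path, 'wb') as out:
        shutil.copyfileobj(resp, out)
    return path


# Example call (needs network access), reusing the URL defined above:
# download_to_file(szse_stock_list_url, 'szse_stock_list.xlsx')
```

The file is written in binary mode because the download is an Excel workbook, not text.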

The urlretrieve call above saves the stock list as an Excel file named szse_stock_list.xlsx in the current directory. We then use pandas to read the file:

data = pd.read_excel(szse_stock_list_file)
print(data)
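
If the stock code column is read as numbers, leading zeros are lost (for example, 000001 becomes 1), which matters for codes starting with 0. Whether this happens depends on how the cells are stored in the downloaded file, which is not verified here. A minimal sketch for restoring the six-digit form follows; normalize_stock_code is a hypothetical helper.

```python
def normalize_stock_code(value):
    """Return a six-character, zero-padded stock code string."""
    text = str(value).strip()
    # Values read as floats look like '1.0'; drop the fractional part.
    if text.endswith('.0'):
        text = text[:-2]
    if not text.isdigit() or len(text) > 6:
        raise ValueError(f'not a valid stock code: {value!r}')
    return text.zfill(6)


print(normalize_stock_code(1))
print(normalize_stock_code('600138'))
```

Passing dtype=str to pd.read_excel is another option that avoids the numeric conversion in the first place.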

Reading the file with pandas gives the following result:

2. Stock Prices

In the previous section, we obtained the lists of stocks in Shanghai and Shenzhen and loaded them into pandas. Now let's see how to get daily price data for these stocks. Here we use NetEase Finance; the following URL is an example:

http://quotes.money.163.com/service/chddata.html?code=0600138&start=20040101&end=20190710&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP

Here, we need to explain a few parameters in the URL:

  1. code: the stock code, available from the previous section. Note that a one-digit prefix must be added: prepend 1 to codes starting with 0, and prepend 0 to codes starting with 6.
  2. start: the start date, in YYYYMMDD format
  3. end: the end date, in YYYYMMDD format; together with start, it defines the period of price data to obtain
  4. fields: the data fields to obtain, separated by semicolons, such as the opening price (TOPEN)
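
The prefix rule and the parameters above can be sketched as a small helper. The function names to_netease_code and build_price_url are hypothetical, and the sketch only handles six-digit codes starting with 0 or 6, since those are the only cases described here.

```python
from urllib.parse import urlencode

# Endpoint and field list taken from the example URL above.
CHDDATA_URL = 'http://quotes.money.163.com/service/chddata.html'
DEFAULT_FIELDS = ('TCLOSE', 'HIGH', 'LOW', 'TOPEN', 'LCLOSE', 'CHG', 'PCHG',
                  'TURNOVER', 'VOTURNOVER', 'VATURNOVER', 'TCAP', 'MCAP')


def to_netease_code(stock_code):
    """Add the one-digit prefix described above to a six-digit stock code."""
    code = str(stock_code)
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f'expected a six-digit code, got {stock_code!r}')
    if code.startswith('0'):
        return '1' + code
    if code.startswith('6'):
        return '0' + code
    # Assumption: other leading digits are out of scope for this sketch.
    raise ValueError(f'no prefix rule known for {stock_code!r}')


def build_price_url(stock_code, start, end, fields=DEFAULT_FIELDS):
    """Build the request URL for one stock and one date range (YYYYMMDD)."""
    params = {
        'code': to_netease_code(stock_code),
        'start': start,
        'end': end,
        'fields': ';'.join(fields),
    }
    # safe=';' keeps the field separators unescaped, as in the example URL.
    return f'{CHDDATA_URL}?{urlencode(params, safe=";")}'


print(build_price_url('600138', '20040101', '20190710'))
```

Keeping the prefix logic in one function makes it harder to forget the extra digit when looping over many codes.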

Here is the code to get the stock prices, followed by its output. Since the previous section returned too many stocks to cover here, we use only 600138 as an example:

stock_code = '0600138'  # Note: a 0 is added before the original code
start_date = '20040101'
end_date = '20190710'
stock_price_csv = '600138.csv'
url = (
    f'http://quotes.money.163.com/service/chddata.html?code={stock_code}&'
    f'start={start_date}&end={end_date}&'
    'fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP'
)
request.urlretrieve(url, stock_price_csv)

The code to load the file into pandas is as follows:

stock_price_data = pd.read_csv(stock_price_csv, encoding='gbk')
print(stock_price_data)

The results are as follows:

With the stock price data in hand, we can move on to the analysis itself, which I will cover in subsequent articles.

Keywords: Python Windows Excel encoding

Added by keeB on Sun, 01 Mar 2020 05:14:56 +0200