This article implements programs based on the book Empirical Asset Pricing by Bali et al. The portfolio analysis module is portfolio_analysis in the EAP package. The package has been published on GitHub:
GitHub: whyecofiliter/EAP (empirical asset pricing)
The value effect was discovered in the 1980s and became widely known through the research of Fama and French (1993). As one of the systematic factors, the value factor is constructed from the book-to-market ratio (BM). The positive relation between BM and expected returns has been widely tested in the literature (Fama and French, 1993, 1995; Stattman, 1980; Barber and Lyon, 1997). In addition to BM, other value-related variables have been considered, the most common being the earnings-to-price ratio, EP (Basu, 1983). Aras and Yilmaz (2008) and Cakici et al. (2013) found value effects in 12 and 18 emerging markets, respectively.
For the Chinese market, Liu et al. (2019) argued that EP is more appropriate than BM and constructed their value factor for the Chinese market from EP. In this demo, both BM and EP are tested after some adjustments, and the inverse forms of the two variables are used: the price-to-earnings ratio (PE) and the price-to-book ratio (PB). The expected relation between these variables and expected returns is therefore negative.
By convention, stocks are sorted separately on two variables, market capitalization (size) and a value variable, to form a 5 × 5 portfolio matrix. The data set contains monthly data on Chinese A-share stocks from 2000-01-01 to 2019-12-31, taken from the CSMAR database. Warning: do not use the data set in this demo for any commercial purposes.
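The 5 × 5 sort described above can be sketched with plain pandas. This is only an illustration on synthetic data, not the EAP implementation: the function name double_sort, the column names, the equal weighting, and the choice of independent monthly quintiles are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def double_sort(df, n_groups=5):
    """Independent bivariate sort: average return per (size group, value group).

    Hypothetical helper for illustration; expects columns
    'date', 'size', 'value' and 'ret'.
    """
    out = df.copy()
    for col in ["size", "value"]:
        # Within each month, assign group labels 1..n_groups by quantile
        codes = out.groupby("date")[col].transform(
            lambda x: pd.qcut(x, n_groups, labels=False)
        )
        out[col + "_grp"] = codes.astype(int) + 1
    # Equal-weighted portfolio return per month, then the time-series average
    monthly = out.groupby(["date", "size_grp", "value_grp"])["ret"].mean()
    table = (
        monthly.groupby(level=["size_grp", "value_grp"]).mean().unstack("value_grp")
    )
    # Difference portfolio: highest value group minus lowest value group
    table["Diff"] = table[n_groups] - table[1]
    return table


# Synthetic panel: 24 months, 200 stocks per month (random data, no real signal)
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
n_stocks = 200
n_obs = len(dates) * n_stocks
df = pd.DataFrame({
    "date": dates.repeat(n_stocks),
    "size": rng.lognormal(size=n_obs),
    "value": rng.lognormal(size=n_obs),
    "ret": rng.normal(0.01, 0.1, size=n_obs),
})
table = double_sort(df)
print(table.round(3))
```

Rows are size groups and columns are value groups. Because the data here are random, the Diff column carries no economic meaning; the point is only the mechanics of the sort.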
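    import os
    import sys

    import pandas as pd

    # Add the parent directory to the import path so that portfolio_analysis can be found
    sys.path.append(os.path.abspath(".."))

    # %% import data
    # Monthly returns of stocks in the Chinese stock market
    # (forward slashes avoid invalid escape sequences such as '\d' in the path)
    month_return = pd.read_hdf('./data/month_return.h5', key='month_return')
    company_data = pd.read_hdf('./data/last_filter_pe.h5', key='data')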
Data preprocessing
    # %% preprocessing data
    # Shift each stock's monthly return forward by one month
    # emrwd: next-month return including dividends
    month_return['emrwd'] = month_return.groupby(['Stkcd'])['Mretwd'].shift(-1)
    # emrnd: next-month return excluding dividends
    month_return['emrnd'] = month_return.groupby(['Stkcd'])['Mretnd'].shift(-1)
    # Keep A-share stocks only
    month_return = month_return[month_return['Markettype'].isin([1, 4, 16])]

    # %% flag stocks at or above the 30th size percentile in each month,
    # i.e. stocks that are not among the smallest 30%
    def percentile(stocks):
        return stocks >= stocks.quantile(q=.3)

    # transform keeps the original row index, so the flag aligns with month_return
    month_return['cap'] = month_return.groupby(['Trdmnt'])['Msmvttl'].transform(percentile)

    # %% merge data
    from pandas.tseries.offsets import MonthBegin

    month_return['Stkcd_merge'] = month_return['Stkcd'].astype(dtype='string')
    month_return['Date_merge'] = pd.to_datetime(month_return['Trdmnt'])
    company_data['Stkcd_merge'] = company_data['Symbol'].dropna().astype(dtype='int').astype(dtype='string')
    # Roll each trading date forward to the first day of the following month
    company_data['Date_merge'] = pd.to_datetime(company_data['TradingDate'])
    company_data['Date_merge'] += MonthBegin()
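The two preprocessing steps above, the forward shift of returns and the size flag, are easy to check on a toy table. The following sketch uses made-up numbers for two hypothetical stocks; only the column names are taken from the code above.

```python
import pandas as pd

# Two hypothetical stocks observed over three months (illustrative values)
toy = pd.DataFrame({
    "Stkcd":   [1, 1, 1, 2, 2, 2],
    "Trdmnt":  ["2000-01", "2000-02", "2000-03"] * 2,
    "Mretwd":  [0.10, 0.20, 0.30, -0.10, -0.20, -0.30],
    "Msmvttl": [100.0, 110.0, 120.0, 10.0, 11.0, 12.0],
})

# Next month's return, placed on the current month's row of the same stock
toy["emrwd"] = toy.groupby("Stkcd")["Mretwd"].shift(-1)

# True when the stock is at or above the 30th size percentile of its month
toy["cap"] = (
    toy.groupby("Trdmnt")["Msmvttl"]
    .transform(lambda s: s >= s.quantile(0.3))
    .astype(bool)
)
print(toy)
```

The last month of each stock has no following return, so emrwd is missing there, and the later dropna() call removes those rows. In every month the small stock (Stkcd 2) falls below the 30th percentile and is flagged False.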
Consistent with the size effect analysis, the sample starts in 2000-01.
    # %% dataset starts from '2000-01'
    company_data = company_data[company_data['Date_merge'] >= '2000-01']
    month_return = month_return[month_return['Date_merge'] >= '2000-01']
    # Inner join on stock code and month
    return_company = pd.merge(company_data, month_return, on=['Stkcd_merge', 'Date_merge'])
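How the two merge keys line up can be seen on a small example. The rows below are invented for illustration; the sketch only demonstrates the date and code conversions used above, assuming TradingDate is a month-end date and Trdmnt a year-month string.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical company characteristics dated at the end of January
company = pd.DataFrame({
    "Symbol": [1.0, 2.0],
    "TradingDate": ["2000-01-31", "2000-01-31"],
    "PE1A": [12.0, 30.0],
})
# Hypothetical monthly return rows labelled by year-month
returns = pd.DataFrame({
    "Stkcd": [1, 2],
    "Trdmnt": ["2000-02", "2000-02"],
    "emrwd": [0.05, -0.02],
})

# Stock code: float -> int -> string, so that 1.0 becomes "1"
company["Stkcd_merge"] = company["Symbol"].astype("int").astype("string")
# 2000-01-31 rolls forward to 2000-02-01
company["Date_merge"] = pd.to_datetime(company["TradingDate"]) + MonthBegin()

returns["Stkcd_merge"] = returns["Stkcd"].astype("string")
# "2000-02" is parsed as 2000-02-01
returns["Date_merge"] = pd.to_datetime(returns["Trdmnt"])

merged = pd.merge(company, returns, on=["Stkcd_merge", "Date_merge"])
print(merged[["Stkcd_merge", "Date_merge", "PE1A", "emrwd"]])
```

Note that adding MonthBegin() to a date that already falls on the first day of a month moves it to the first day of the next month, so the result depends on how TradingDate is recorded.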
Four data sets were tested: PE and PB, each with and without the smallest 30% of stocks. Because of the strict IPO requirements, some large unlisted companies enter the A-share market by acquiring a small listed company, so the prices of the smallest listed stocks partly reflect their value as acquisition targets. For this reason, Liu et al. (2016) excluded the smallest 30% of stocks by market capitalization.
    # %% construct test_data for bivariate analysis
    # dataset 1 : PE
    from portfolio_analysis import Bivariate
    import numpy as np

    # Keep stocks that are not among the smallest 30% in each month and that
    # traded on at least 10 days in the month
    test_data_1 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
    test_data_1 = test_data_1[['emrwd', 'Msmvttl', 'PE1A', 'Date_merge']].dropna()
    test_data_1 = test_data_1[(test_data_1['Date_merge'] >= '2000-01-01') & (test_data_1['Date_merge'] <= '2019-12-01')]

    # analysis
    # number=4 produces the five groups per variable shown in the output
    # (presumably four breakpoints)
    bi_1 = Bivariate(np.array(test_data_1), number=4)
    bi_1.average_by_time()
    bi_1.summary_and_test()
    bi_1.print_summary_by_time()
    bi_1.print_summary()

    ===============================================================
    +-------+--------+--------+--------+--------+--------+--------+
    | Group |   1    |   2    |   3    |   4    |   5    |  Diff  |
    +-------+--------+--------+--------+--------+--------+--------+
    |   1   | 0.019  | 0.017  | 0.013  | 0.013  |  0.01  | -0.009 |
    |       | 3.024  | 2.693  | 2.003  |  1.89  | 1.472  | -3.551 |
    |   2   | 0.014  | 0.015  | 0.011  | 0.011  | 0.009  | -0.005 |
    |       |  2.25  | 2.415  | 1.779  | 1.658  | 1.374  | -1.947 |
    |   3   | 0.014  |  0.01  | 0.009  | 0.009  | 0.005  | -0.009 |
    |       | 2.298  | 1.648  | 1.478  |  1.36  | 0.778  | -3.19  |
    |   4   | 0.011  | 0.011  | 0.009  | 0.009  | 0.005  | -0.006 |
    |       | 1.901  | 1.871  | 1.507  | 1.431  | 0.799  | -1.98  |
    |   5   | 0.013  | 0.009  | 0.007  | 0.006  | 0.002  | -0.011 |
    |       | 2.204  | 1.678  | 1.258  | 0.988  | 0.331  | -2.984 |
    | Diff  | -0.007 | -0.008 | -0.006 | -0.007 | -0.008 | -0.001 |
    |       | -1.876 | -2.628 | -1.819 | -2.117 | -2.389 | -0.446 |
    +-------+--------+--------+--------+--------+--------+--------+
The first data set excludes the smallest 30% of stocks, covers 2000-01-01 to 2019-12-01, and uses PE as the value variable. The t-statistics of the difference portfolios (group 5 minus group 1, the Diff column) are -3.551, -1.947, -3.19, -1.98 and -2.984. Four of the five exceed 1.96 in absolute value, the critical value at the 0.05 significance level, and the remaining one (-1.947) falls just short, so the value effect is significant.
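For reference, the t-statistic of a difference portfolio is the time-series mean of its monthly returns divided by the standard error of that mean. The sketch below uses the plain sample standard error, which is an assumption: the EAP package may compute its standard errors differently (for example with a Newey-West adjustment). The function name mean_t_stat and the input values are made up for illustration.

```python
import numpy as np


def mean_t_stat(returns):
    """t-statistic for H0: the mean of the return series is zero.

    Hypothetical helper; uses the ordinary sample standard error.
    """
    r = np.asarray(returns, dtype=float)
    r = r[~np.isnan(r)]  # ignore months without a return
    standard_error = r.std(ddof=1) / np.sqrt(r.size)
    return r.mean() / standard_error


# Illustrative monthly returns of a difference portfolio (not real data)
diff_returns = [0.01, 0.03, -0.01, 0.02, 0.00]
t_stat = mean_t_stat(diff_returns)
print(round(t_stat, 3))
```

An absolute t-statistic above about 1.96 rejects a zero mean at the 0.05 level in a two-sided test when the sample is large, which is the threshold applied to the Diff column above.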
    # %% construct test_data for bivariate analysis
    # dataset 2 : PB
    from portfolio_analysis import Bivariate
    import numpy as np

    # Keep stocks that are not among the smallest 30% in each month and that
    # traded on at least 10 days in the month
    test_data_2 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
    test_data_2 = test_data_2[['emrwd', 'Msmvttl', 'PBV1A', 'Date_merge']].dropna()
    test_data_2 = test_data_2[(test_data_2['Date_merge'] >= '2000-01-01') & (test_data_2['Date_merge'] <= '2019-12-01')]

    # analysis
    bi_2 = Bivariate(np.array(test_data_2), number=4)
    bi_2.average_by_time()
    bi_2.summary_and_test()
    bi_2.print_summary_by_time()
    bi_2.print_summary()

    ===============================================================
    +-------+--------+--------+--------+--------+--------+--------+
    | Group |   1    |   2    |   3    |   4    |   5    |  Diff  |
    +-------+--------+--------+--------+--------+--------+--------+
    |   1   | 0.016  | 0.015  | 0.012  | 0.014  | 0.008  | -0.008 |
    |       |  2.41  | 2.321  | 1.904  | 1.969  | 1.135  | -2.467 |
    |   2   | 0.013  | 0.012  | 0.012  | 0.011  | 0.006  | -0.007 |
    |       | 2.075  |  1.87  | 1.877  | 1.645  | 0.955  | -1.902 |
    |   3   | 0.012  | 0.009  | 0.008  |  0.01  | 0.007  | -0.005 |
    |       | 1.898  | 1.479  | 1.302  | 1.571  | 1.075  | -1.492 |
    |   4   | 0.012  | 0.011  | 0.008  | 0.006  | 0.008  | -0.004 |
    |       | 1.911  | 1.804  | 1.284  | 0.996  | 1.266  | -1.075 |
    |   5   | 0.011  | 0.011  | 0.008  | 0.007  | 0.006  | -0.005 |
    |       | 1.975  | 1.995  | 1.423  | 1.282  | 1.111  | -1.185 |
    | Diff  | -0.004 | -0.004 | -0.004 | -0.006 | -0.002 | 0.003  |
    |       | -1.301 | -1.127 | -1.38  | -1.726 | -0.418 | 0.785  |
    +-------+--------+--------+--------+--------+--------+--------+
The second data set excludes the smallest 30% of stocks, covers 2000-01-01 to 2019-12-01, and uses PB as the value variable. The value effect is only partially significant: the t-statistics of the difference portfolios are -2.467, -1.902, -1.492, -1.075 and -1.185, and only the first exceeds 1.96 in absolute value, the critical value at the 0.05 significance level.
    # %% construct test_data for bivariate analysis
    # dataset 3 : PE
    from portfolio_analysis import Bivariate
    import numpy as np

    # Keep stocks of all sizes (no size filter) that traded on at least
    # 10 days in the month
    test_data_3 = return_company[return_company['Ndaytrd']>=10]
    test_data_3 = test_data_3[['emrwd', 'Msmvttl', 'PE1A', 'Date_merge']].dropna()
    test_data_3 = test_data_3[(test_data_3['Date_merge'] >= '2000-01-01') & (test_data_3['Date_merge'] <= '2019-12-01')]

    # analysis
    bi_3 = Bivariate(np.array(test_data_3), number=4)
    bi_3.average_by_time()
    bi_3.summary_and_test()
    bi_3.print_summary_by_time()
    bi_3.print_summary()

    ===============================================================
    +-------+--------+--------+--------+--------+--------+--------+
    | Group |   1    |   2    |   3    |   4    |   5    |  Diff  |
    +-------+--------+--------+--------+--------+--------+--------+
    |   1   | 0.023  | 0.023  | 0.022  | 0.022  | 0.022  | -0.001 |
    |       | 3.492  | 3.471  | 3.239  | 3.107  | 3.043  | -0.664 |
    |   2   | 0.019  | 0.017  | 0.014  | 0.013  | 0.011  | -0.008 |
    |       |  3.04  | 2.619  | 2.163  | 1.915  | 1.635  | -3.531 |
    |   3   | 0.014  | 0.013  | 0.011  |  0.01  | 0.009  | -0.005 |
    |       | 2.296  | 2.181  | 1.723  |  1.57  | 1.338  | -2.334 |
    |   4   | 0.013  | 0.009  | 0.009  | 0.007  | 0.005  | -0.008 |
    |       | 2.159  | 1.626  | 1.519  | 1.138  |  0.69  | -3.141 |
    |   5   | 0.012  | 0.009  | 0.007  | 0.006  | 0.003  | -0.009 |
    |       | 2.098  | 1.648  | 1.273  | 0.992  | 0.521  | -2.597 |
    | Diff  | -0.011 | -0.014 | -0.015 | -0.016 | -0.018 | -0.007 |
    |       | -2.816 | -3.834 | -4.025 | -4.372 | -5.071 | -2.236 |
    +-------+--------+--------+--------+--------+--------+--------+
The third data set includes the smallest 30% of stocks, covers 2000-01-01 to 2019-12-01, and uses PE as the value variable. The t-statistics of the difference portfolios are -0.664, -3.531, -2.334, -3.141 and -2.597. Four of the five exceed 1.96 in absolute value, the critical value at the 0.05 significance level, so the value effect is significant in every group except the first.
    # %% construct test_data for bivariate analysis
    # dataset 4 : PB
    from portfolio_analysis import Bivariate
    import numpy as np

    # Keep stocks of all sizes (no size filter) that traded on at least
    # 10 days in the month
    test_data_4 = return_company[(return_company['Ndaytrd']>=10)]
    test_data_4 = test_data_4[['emrwd', 'Msmvttl', 'PBV1A', 'Date_merge']].dropna()
    test_data_4 = test_data_4[(test_data_4['Date_merge'] >= '2000-01-01') & (test_data_4['Date_merge'] <= '2019-12-01')]

    # analysis
    bi_4 = Bivariate(np.array(test_data_4), number=4)
    bi_4.average_by_time()
    bi_4.summary_and_test()
    bi_4.print_summary_by_time()
    bi_4.print_summary()

    ===============================================================
    +-------+--------+--------+--------+--------+--------+--------+
    | Group |   1    |   2    |   3    |   4    |   5    |  Diff  |
    +-------+--------+--------+--------+--------+--------+--------+
    |   1   | 0.022  | 0.024  | 0.025  | 0.022  |  0.02  | -0.002 |
    |       | 3.342  | 3.475  | 3.554  | 3.054  | 2.714  | -0.905 |
    |   2   | 0.017  | 0.016  | 0.013  | 0.015  | 0.009  | -0.008 |
    |       | 2.632  | 2.365  | 1.939  | 2.239  | 1.337  | -2.653 |
    |   3   | 0.013  | 0.013  | 0.012  |  0.01  | 0.007  | -0.006 |
    |       | 2.045  | 2.071  |  1.9   | 1.616  | 1.051  | -1.827 |
    |   4   | 0.011  |  0.01  | 0.009  | 0.008  | 0.007  | -0.004 |
    |       | 1.783  | 1.592  | 1.416  | 1.299  | 1.167  | -1.093 |
    |   5   | 0.011  | 0.011  | 0.008  | 0.007  | 0.006  | -0.006 |
    |       | 1.975  | 1.914  | 1.382  | 1.174  | 0.997  | -1.392 |
    | Diff  | -0.011 | -0.013 | -0.017 | -0.015 | -0.015 | -0.004 |
    |       | -2.803 | -3.498 | -4.532 | -3.934 | -3.144 | -0.891 |
    +-------+--------+--------+--------+--------+--------+--------+
The fourth data set includes the smallest 30% of stocks, covers 2000-01-01 to 2019-12-01, and uses PB as the value variable. The value effect is again only partially significant: the t-statistics of the difference portfolios are -0.905, -2.653, -1.827, -1.093 and -1.392, and only the second exceeds 1.96 in absolute value, the critical value at the 0.05 significance level.
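The four results can be compared side by side by counting, for each data set, how many of the five difference-portfolio t-statistics exceed the critical value in absolute value. The numbers below are copied from the Diff columns of the four tables; the dictionary keys and the 1.96 threshold (two-sided 0.05 level, large sample) are choices made for this summary.

```python
# t-statistics of the difference portfolios, copied from the four tables
diff_t_stats = {
    "PE, smallest 30% excluded": [-3.551, -1.947, -3.19, -1.98, -2.984],
    "PB, smallest 30% excluded": [-2.467, -1.902, -1.492, -1.075, -1.185],
    "PE, all stocks": [-0.664, -3.531, -2.334, -3.141, -2.597],
    "PB, all stocks": [-0.905, -2.653, -1.827, -1.093, -1.392],
}

CRITICAL_VALUE = 1.96  # two-sided test at the 0.05 level

significant_counts = {
    name: sum(abs(t) > CRITICAL_VALUE for t in t_stats)
    for name, t_stats in diff_t_stats.items()
}

for name, count in significant_counts.items():
    print(f"{name}: {count} of 5 significant")
```

With PE, four of five groups are significant in both samples, while with PB only one of five is, which suggests that in this sample PE captures the value effect more reliably than PB.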