Stock price trend prediction and quantitative investment case visual demonstration system (with code)

Abstract

With the rapid development of the global economy and stock markets, stock investment has become a common way to manage personal finances. In recent years, quantitative investment has attracted growing attention for its discipline, accuracy, timeliness and systematic nature. Compared with mature Western markets, quantitative investment in China is still in its infancy: it has shortcomings, but also broad prospects for development. Meanwhile, the rapid progress of artificial intelligence has brought machine learning and quantitative investment research together. To address the problem of quantitative stock selection, this paper combines machine learning with technical analysis, constructs a decision tree model, obtains a portfolio of the top ten stocks, and tests it empirically through a visual interface, reporting the return of each stock in the portfolio and the total return of the portfolio.
Keywords: decision tree; quantitative stock selection; visualization; Sharpe ratio

1, Problem restatement

1.1 problem background
With advances in software and artificial intelligence, a large number of tedious data analysis and processing tasks have gradually shifted from manual work to automated computation. In pursuit of accuracy and efficiency, the same change is quietly taking place in the financial sector. Discretionary securities investment, once regarded as an art, is gradually being replaced by computer-based quantitative investment strategies. Quantitative investment builds a mathematical model from investment ideas and experience, and uses computers to process large amounts of historical data so that the model's effectiveness can be verified in a short time. Only when the model performs adequately on historical data can it be applied to real trading. For stock investment, quantitative stock selection is therefore the foundation: without a good stock selection technique, the effectiveness of quantitative investment is greatly reduced.
1.2 problems to be solved
Design a quantitative stock selection system that lets the user select any of Shenyin Wanguo's 28 first-level industries, picks the top 10 stocks based on overall scale and investment efficiency to build a portfolio, and runs an empirical test through a visual interface to obtain the return of each stock in the portfolio and the total return of the portfolio.
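As a rough illustration of the last step, the sketch below picks the top-ranked stocks from a score table and computes an equally weighted portfolio return. The function names, the sample scores and the equal weighting are assumptions made for this example, not the system's actual implementation.

```python
def top_n(scores, n=10):
    """Return the n stock codes with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def portfolio_return(stock_returns, weights=None):
    """Weighted portfolio return; equal weights are assumed if none are given."""
    codes = list(stock_returns)
    if weights is None:
        weights = {code: 1.0 / len(codes) for code in codes}
    return sum(weights[code] * stock_returns[code] for code in codes)


# Hypothetical scores and period returns for three stocks.
scores = {"A": 0.9, "B": 0.4, "C": 0.7}
returns = {"A": 0.10, "B": -0.05, "C": 0.04}

picked = top_n(scores, n=2)
total = portfolio_return({code: returns[code] for code in picked})
print(picked, round(total, 4))
```

With real data, `scores` would come from the model's ranking and `returns` from the market data of the test period.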

2, Problem analysis

The research object is any given set of stocks, and the research content is a quantitative stock selection strategy. In essence, the problem is solved with a decision tree model from machine learning: the preprocessed data from previous years serve as the training set, suitable labels are chosen, and the training set is split step by step into subsets until all training samples are classified correctly. The trained tree then classifies the data of the following year, and the portfolio is constructed from the classification results. The core of the model is selecting appropriate features, that is, feature extraction.

3, Symbol description and noun definition

4, Model establishment and solution

4.1 data preprocessing
Data preprocessing includes data cleaning and data visualization and analysis.
There are two annexes in total, which give the relevant information on the stocks in Shenyin Wanguo's 28 primary industries:
Annex 1: stock codes of the Shenwan industries;
Annex 2: stock market data of the Shenwan first-level industries (from January 1, 2021 to June 30, 2021).
Annex 1 contains invalid values, so it is cleaned and sorted with Excel and Python as described below. The cleaned data are then used to solve and analyze the problem.
4.1.1 data cleaning
Some stocks in Annex 1 have been delisted, but their codes are still listed. The codes 688509, 688688, 688385, 688670, 688148, 688071, 688778 and 688779 must be deleted.
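The deletion step can be sketched as follows. This is a minimal example in plain Python that assumes the codes of Annex 1 have been loaded into a mapping from industry name to a list of codes. The mapping layout, the sample entries and the function name are hypothetical.

```python
# Codes that the text lists as invalid entries in Annex 1.
INVALID_CODES = {"688509", "688688", "688385", "688670",
                 "688148", "688071", "688778", "688779"}


def drop_invalid_codes(codes_by_industry):
    """Return a copy of the industry -> codes mapping without the invalid codes."""
    cleaned = {}
    for industry, codes in codes_by_industry.items():
        # Compare the first six characters so that suffixed codes such as
        # '688509.XSHG' are matched as well.
        cleaned[industry] = [c for c in codes if c[:6] not in INVALID_CODES]
    return cleaned


# Hypothetical two-industry sample standing in for Annex 1.
sample = {
    "Electronics": ["688509", "600000.XSHG"],
    "Computers": ["000001.XSHE", "688779.XSHG"],
}
cleaned = drop_invalid_codes(sample)
print(cleaned)
```

The same filter can be applied column by column if the codes are held in a pandas DataFrame.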
4.1.2 data visualization and analysis
The number of constituent stocks in each Shenyin Wanguo primary industry (Annex 1) is shown in Figure 1.

Figures 2, 3 and 4 show the price change, return index and trading volume of Shenyin Wanguo's primary industries.

4.2 feature engineering
There are many indicators for measuring a stock. Here we select five: return, alpha, Sharpe ratio, maximum drawdown and beta.
The return is the ratio of the stock's gain (price appreciation plus dividends) to the initial investment. The higher the value, the better the stock's performance before risk is considered.
The Sharpe ratio describes the excess return over the risk-free rate that a stock or portfolio earns per unit of risk. Because it normalizes for risk, it allows portfolios to be compared more fairly. The higher the value, the better the risk-adjusted performance of the stock or portfolio.
Drawdown describes the largest loss that investors may face and is an important indicator of a portfolio's ability to withstand risk. The maximum drawdown (MDD) is the largest peak-to-trough decline in return over the selected period. The lower the value, the better.
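The three indicators defined so far can be computed from a series of closing prices roughly as follows. This is a sketch under stated assumptions: dividends are ignored, and the 3% annual risk-free rate and the 252 trading days per year are illustrative defaults, not values taken from the paper.

```python
import math
import statistics


def total_return(prices):
    """Simple return over the whole period (dividends are ignored here)."""
    return prices[-1] / prices[0] - 1.0


def daily_returns(prices):
    """Day-over-day simple returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def sharpe_ratio(prices, risk_free_annual=0.03, periods_per_year=252):
    """Annualized Sharpe ratio from daily closes (assumed risk-free rate)."""
    rf_daily = risk_free_annual / periods_per_year
    excess = [r - rf_daily for r in daily_returns(prices)]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(prices):
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, 1.0 - price / peak)
    return worst


# Hypothetical closing prices of one stock.
closes = [100.0, 110.0, 99.0, 120.0, 108.0]
print(total_return(closes), sharpe_ratio(closes), max_drawdown(closes))
```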
Alpha measures the active return (excess return) of a stock or portfolio relative to the market. Alpha = 0 indicates performance in line with the market, alpha < 0 indicates a return worse than the market, and alpha > 0 indicates that the stock or portfolio outperforms the market. For example, alpha = 1% means a return 1% higher than the market over the same period.
Beta measures how strongly the stock or portfolio moves with the market, and explains the part of its return that comes from the market (market return). The beta of the market itself is 1. Beta > 1 means that the stock or portfolio follows the market trend with larger swings, while beta < 1 means that it is either less related to the market trend or swings less than the market.
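One common way to estimate alpha and beta is a least-squares regression of the stock's excess returns on the market's excess returns. The sketch below assumes that approach. The function name, the zero risk-free rate and the sample returns are illustrative, and the alpha it returns is per period rather than annualized.

```python
import statistics


def alpha_beta(stock_returns, market_returns, risk_free=0.0):
    """Estimate per-period alpha and beta from paired returns by least squares."""
    ex_stock = [r - risk_free for r in stock_returns]
    ex_market = [r - risk_free for r in market_returns]
    mean_s = statistics.mean(ex_stock)
    mean_m = statistics.mean(ex_market)
    # beta = cov(stock, market) / var(market); alpha is the regression intercept.
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(ex_stock, ex_market))
    var = sum((m - mean_m) ** 2 for m in ex_market)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


# Hypothetical data: a stock that amplifies market moves by 1.5x plus 0.1% per period.
market = [0.01, -0.02, 0.03, 0.00]
stock = [0.001 + 1.5 * m for m in market]
alpha, beta = alpha_beta(stock, market)
print(round(alpha, 6), round(beta, 6))
```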
4.3 model establishment and solution
4.3.1 establishment of model
Here, the machine learning strategy is adopted and the support vector machine model is used. The specific algorithm is described as follows:
Table 1 specific algorithm description of Niubi

The results obtained by the above algorithm feed the next calculation.
Annex 2 contains 4408 stocks. First, the rate of return of each stock is calculated. The algorithm is described in Table 2:
Table 2 description of return algorithm

Similarly, following the same procedure, the maximum drawdown, alpha and beta of each stock can be calculated:

4.3.2 solution of model
Each indicator is calculated for the stocks in Annex 2. The final results are as follows (only the top 15 are listed, and results are rounded to one decimal place):
Table 3 ranking of stock returns

Table 4 ranking of maximum stock pullback rate

Table 5 ranking of maximum stock pullback rate

The final total score is calculated by the following formula:

The scoring results are as follows:

According to the following formula, the Sharpe ratio of the portfolio is 1.43, indicating that the portfolio performs well once both return and risk are taken into account.
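For reference, the Sharpe ratio is conventionally defined as the expected excess return of the portfolio over the risk-free rate divided by the standard deviation of the portfolio return. The symbols $R_p$, $R_f$ and $\sigma_p$ are notation introduced here, and the risk-free rate and return frequency behind the value 1.43 are not stated in the text.

```latex
S_p = \frac{E[R_p] - R_f}{\sigma_p}
```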

4.3.3 performance evaluation of the model
Classification accuracy, precision, recall, F1 score and AUC are used as the indicators for evaluating the support vector machine. Accuracy is the proportion of correctly classified samples among all samples; precision is the proportion of correctly predicted positive samples among all samples predicted as positive; recall is the proportion of correctly predicted positive samples among all samples that are actually positive; the F1 score is the harmonic mean of precision and recall; and AUC is the area under the ROC curve, where larger is better. The model performance is shown in Table 6.
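These definitions can be made concrete with a small sketch. It assumes binary labels (1 = positive, 0 = negative) and computes AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score, which equals the area under the ROC curve. The labels, scores and 0.5 threshold are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```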
Table 6 model evaluation

5, Evaluation and improvement of model

5.1 model evaluation
5.1.1 advantages of the model
(1) Fast speed: the amount of calculation is relatively small, and the tree is easy to convert into classification rules. Walking from the root down to a leaf, the splitting conditions along the way uniquely determine a classification predicate;
(2) High accuracy: the mined classification rules are accurate and easy to understand. The decision tree can clearly display which fields are important, that is, understandable rules can be generated;
(3) Can handle continuous and category fields;
(4) No domain knowledge and parameter assumptions are required;
(5) Suitable for high-dimensional data.
5.1.2 disadvantages of the model
(1) When the categories have different numbers of samples, information gain is biased toward features with more distinct values;
(2) Prone to overfitting;
(3) Ignores dependencies between attributes.
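The bias of information gain toward many-valued features, noted in item (1), can be seen in a small sketch. The feature names and the six samples are made up: an identifier that is unique per sample obtains the maximum possible gain although it carries no predictive information.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical samples: class label, a two-valued feature, and a unique identifier.
labels = ["up", "up", "down", "down", "up", "down"]
sector = ["bank", "bank", "tech", "tech", "tech", "bank"]
stock_id = ["s1", "s2", "s3", "s4", "s5", "s6"]

gain_sector = information_gain(sector, labels)
gain_id = information_gain(stock_id, labels)
print(round(gain_sector, 4), round(gain_id, 4))
```

Splitting criteria such as the gain ratio penalize features with many values for this reason.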
5.2 model improvement
Overfitting is an important practical difficulty for decision tree learning and for many other learning algorithms. There are several ways to avoid overfitting in decision tree learning, which fall into two categories:
(1) Early stopping: stop growing the tree before the ID3 algorithm perfectly classifies the training data;
(2) Post-pruning: allow the tree to overfit the data, and then prune it.
Although the first method may seem more direct, post-pruning an overfitted tree has proved more successful in practice, because with the first method it is difficult to estimate accurately when to stop growing the tree. Whether the right tree size is reached by stopping early or by pruning afterwards, the key question is which criterion to use to determine the final size of the tree. Solutions to this problem include:
(1) Use a separate set of samples, distinct from the training samples, to evaluate the utility of post-pruning nodes from the tree;
(2) Use all available data for training, but apply a statistical test to estimate whether expanding (or pruning) a specific node is likely to improve performance on instances outside the training set;
(3) Use an explicit measure of the complexity of encoding the training samples and the decision tree, and stop growing the tree when this encoding length is minimized.
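Solution (1) corresponds to what is usually called reduced-error pruning. The sketch below assumes a tree stored as nested dictionaries, in which every node records the majority label of its training samples. The dictionary layout, the hand-built tree and the validation samples are all hypothetical.

```python
def predict(node, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while "feature" in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def accuracy(tree, X, y):
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)


def prune(node, tree, X_val, y_val):
    """Bottom-up: replace a subtree by a leaf unless validation accuracy drops."""
    if "feature" not in node:
        return
    prune(node["left"], tree, X_val, y_val)
    prune(node["right"], tree, X_val, y_val)
    before = accuracy(tree, X_val, y_val)
    saved = dict(node)
    node.clear()
    node["label"] = saved["label"]  # tentative leaf with the majority label
    if accuracy(tree, X_val, y_val) < before:
        node.update(saved)          # pruning hurt, so restore the split


# Hand-built tree whose lower-right split fits noise in the training data.
tree = {
    "feature": 0, "threshold": 0.5, "label": 0,
    "left": {"label": 0},
    "right": {"feature": 1, "threshold": 0.5, "label": 1,
              "left": {"label": 1}, "right": {"label": 0}},
}
X_val = [[0.2, 0.1], [0.3, 0.9], [0.8, 0.2], [0.9, 0.8]]
y_val = [0, 0, 1, 1]

prune(tree, tree, X_val, y_val)
print(tree)
```

In this example the noisy lower split is collapsed into a leaf, while the root split is kept because removing it would lower the validation accuracy.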

6, Application of model

The financial industry can use the decision tree for loan risk assessment, the insurance industry can use the decision tree for insurance promotion prediction, the medical industry can use the decision tree to generate auxiliary diagnosis and disposal models, and so on.

Appendix 1

1. Data cleaning

import pandas as pd
from jqdatasdk import *
from jqdatasdk.api import get_price, get_query_count

# Log in to JQData; replace the placeholders with your own account and password
auth('YOUR_ACCOUNT', 'YOUR_PASSWORD')
# Show the remaining daily query quota
print(get_query_count())

# Industries.csv holds the industry codes; total.csv holds one column of stock codes per industry
industry_code = pd.read_csv("Industries.csv", index_col=0)
stock_code = pd.read_csv("total.csv", index_col=0, dtype=str)
stock_code.columns = list(industry_code.index)

for j in stock_code.index:
    for i in stock_code.columns:
        code = stock_code.loc[j, i]
        # Skip empty cells (industries contain different numbers of stocks)
        if pd.isna(code):
            continue
        try:
            prices = get_price(
                code,
                start_date='2021-01-01',
                end_date='2021-06-30',
                frequency='1d',
                fields=['open', 'close', 'low', 'high', 'avg'])
            prices.to_csv(code + '.csv')
            print("Data entry " + code + " completed...")
        except Exception as exc:
            # Codes that cannot be queried (for example invalid ones) end up here
            print(code + " unable to enter: " + str(exc))

2. Strategy

from datetime import timedelta

import numpy as np
import jqdata
import scipy.optimize as optimize
import statsmodels.api as sm
from jqdatasdk import valuation, balance, income, indicator, get_fundamentals


# Initialize functions, set benchmarks, etc.
def initialize(context):
    # Use the CSI 300 index as the benchmark
    set_benchmark('000300.XSHG')
    # Enable dynamic price adjustment mode (real prices)
    set_option('use_real_price', True)
    # Write to the log with log.info()
    log.info('initialize() started; it runs only once')
    # Transaction costs: 0.03% commission when buying, 0.03% commission plus 0.1% stamp duty when selling,
    # with a minimum commission of 5 yuan per transaction
    set_order_cost(OrderCost(close_tax=0.001, open_commission=0.0003, close_commission=0.0003, min_commission=5),
                   type='stock')
    # Run before the market opens
    run_daily(before_market_open, time='before_open', reference_security='000300.XSHG')
    # Run after the market closes
    run_daily(after_market_close, time='after_close', reference_security='000300.XSHG')
    set_parameters()


## Run before the market opens
def before_market_open(context):
    # Output run time
    log.info('Function run time(before_market_open): ' + str(context.current_dt.time()))

    factors = ['CMC', 'MC', 'CMC/C', 'TOE/MC', 'PB', 'NP/MC', 'TP/MC', 'TA/MC',
               'OP/MC', 'CRF/MC', 'PS', 'OR/MC', 'RP/MC', 'TL/TA', 'TCA/TCL', 'PE',
               'OR*ROA/NP', 'GPM', 'IRYOY', 'IRA', 'INPYOY', 'INPA', 'NPM', 'OPTTR',
               'C', 'CC', 'PR', 'PRL', 'ROE', 'ROA', 'EPS', 'ROIC', 'ZYZY']
    # Fit the factor weights
    theta, mu, sigma = getThetaByFeatures(context, factors)
    # Skip stock selection if the fit returned all-zero weights
    if sum(theta) == 0:
        return
    # Factor stock selection
    stock_list = selectStocks(context, factors, theta, mu, sigma)
    stock_list = unStartWith300(stock_list)
    stock_list = filter_paused_and_st_stock(stock_list)
    stock_list = filter_limitup_stock(context, stock_list)
    # The original called filter_limitup_stock twice; the second call is assumed to be the limit-down filter
    stock_list = filter_limitdown_stock(context, stock_list)
    g.stock_to_buy = stock_list


## Run after the market closes
def after_market_close(context):
    log.info('Function run time(after_market_close): ' + str(context.current_dt.time()))
    # Get all transaction records of the day
    trades = get_trades()
    for _trade in trades.values():
        log.info('Transaction record: ' + str(_trade))
    log.info('End of day')
    log.info('#######################################################')


# Set parameters
def set_parameters():
    g.period = 10          # Rebalancing interval in trading days
    g.buyStockCount = 50   # Maximum number of stocks to hold
    g.stock_to_buy = []
    g.days = 0             # Trading days elapsed


# Place orders
def trades(context, data, stock_list):
    # Sell stocks that are not on the list
    for stock in context.portfolio.positions.keys():
        if stock not in stock_list:
            order_target_value(stock, 0)

    # Calculate how many stocks still need to be bought
    num_to_buy = len(stock_list) - len(context.portfolio.positions.keys())
    if num_to_buy == 0:
        return
    # Split the available cash equally among the stocks to buy
    cash = context.portfolio.available_cash
    cash = 1.0 * cash / num_to_buy
    for stock in stock_list:
        if stock not in context.portfolio.positions.keys():
            order_target_value(stock, cash)


# Daily operation
def dailyRunning(context, data):
    # Filter stocks at their price limits
    stock_list = g.stock_to_buy
    stock_list = filter_limitup_stock(context, stock_list)
    # The original called filter_limitup_stock twice; the second call is assumed to be the limit-down filter
    stock_list = filter_limitdown_stock(context, stock_list)
    stock_list = stock_list[:g.buyStockCount]
    # Rebalance once every g.period trading days
    if g.days % g.period == 0:
        trades(context, data, stock_list)
    g.days += 1


## Run during trading hours
def handle_data(context, data):
    # Get current time
    hour = context.current_dt.hour
    minute = context.current_dt.minute

    # Every day at 14:50
    if hour == 14 and minute == 50:
        # Daily operation
        dailyRunning(context, data)


# Remove ChiNext (GEM) stocks, whose codes start with 300
def unStartWith300(stockspool):
    return [stock for stock in stockspool if stock[0:3] != '300']


# Filter suspended stocks, ST stocks and stocks with delisting labels
def filter_paused_and_st_stock(stock_list):
    current_data = get_current_data()
    # '退' is the delisting marker in Chinese stock names
    return [stock for stock in stock_list if not current_data[stock].paused and not current_data[stock].is_st
            and 'ST' not in current_data[stock].name and '*' not in current_data[stock].name and '退' not in
            current_data[stock].name]


# Filter limit-up stocks
def filter_limitup_stock(context, stock_list):
    last_prices = history(1, unit='1m', field='close', security_list=stock_list)
    current_data = get_current_data()

    # Stocks already held are kept even if they hit the limit; otherwise a held stock
    # would be filtered out and another stock selected in its place
    return [stock for stock in stock_list if stock in context.portfolio.positions.keys()
            or last_prices[stock][-1] < current_data[stock].high_limit]


# Filter limit-down stocks
def filter_limitdown_stock(context, stock_list):
    last_prices = history(1, unit='1m', field='close', security_list=stock_list)
    current_data = get_current_data()
    return [stock for stock in stock_list if stock in context.portfolio.positions.keys()
            or last_prices[stock][-1] > current_data[stock].low_limit]


# Cost function (mean squared error)
def costFunction(theta, X, y):
    m = len(y)
    tmp_theta = theta.reshape(len(theta), 1).copy()
    temp = X.dot(tmp_theta)
    J = np.sum(np.square(temp - y)) / 2.0 / m
    return J


# Gradient of the cost function
def gradient(theta, X, y):
    # Number of samples
    m = y.size
    # Copy of parameters
    tmp_theta = theta.reshape(len(theta), 1).copy()
    # Prediction function
    h = np.dot(X, tmp_theta)
    # Gradient calculation
    grad = 1.0 / m * X.T.dot(h - y)
    grad = grad.flatten()
    return grad


# Cost function (with L2 regularization)
def costFunctionReg(theta, X, y, mylambda):
    m = len(y)
    tmp_theta = theta.reshape(len(theta), 1).copy()
    temp = X.dot(tmp_theta)
    J = np.sum(np.square(temp - y)) / 2.0 / m + 1.0 * mylambda / 2 / m * np.sum(tmp_theta ** 2)
    return J


# Gradient of the regularized cost function
def gradientReg(theta, X, y, mylambda):
    # Number of samples
    m = y.size
    # Copy of parameters
    tmp_theta = theta.reshape(len(theta), 1).copy()
    # Prediction function
    h = np.dot(X, tmp_theta)
    # Gradient calculation
    grad = 1.0 / m * X.T.dot(h - y) + 1.0 * mylambda / m * tmp_theta
    grad = grad.flatten()
    return grad


# Feature standardization (zero mean, unit variance per column)
def featureNormalize(X):
    # X is a DataFrame, so the statistics are taken column by column (per factor)
    mu = X.mean()
    sigma = X.std(ddof=0)
    X_norm = 1.0 * (X - mu) / sigma
    return X_norm, mu, sigma


# Fit the factor weights
def getThetaByFeatures(context, factors):
    period = g.period
    # The window is overridden to one trading day
    period = 1
    # Previous trading day
    yesterday = context.previous_date
    daysBefore = yesterday - timedelta(days=period * 2)  # period * 2
    trade_days = jqdata.get_trade_days(start_date=daysBefore, end_date=yesterday)
    log.info(trade_days)
    log.info(trade_days[-period - 1:])
    # Keep only the last period + 1 trading days
    trade_days = trade_days[-period - 1:]
    # Start and end date
    start_date = trade_days[0]
    end_date = trade_days[-1]
    # Get the factor data of the stocks on the start date and build the feature matrix
    x_df = get_factors(start_date, factors)
    # Get the price change of each stock over the window and build the target vector
    stock_list = x_df.index.tolist()
    df = get_price(stock_list, start_date=start_date, end_date=end_date, frequency='daily', fields=['close'])['close']
    y_se = df.iloc[-1] / df.iloc[0] - 1
    y = y_se[~ np.isnan(y_se)]
    x = x_df.loc[y.index.tolist()]
    n = len(x_df.columns)
    m = len(y)
    # Feature scaling
    X_norm, mu, sigma = featureNormalize(x)
    X = sm.add_constant(X_norm)
    for i in X.columns:
        X[i] = np.nan_to_num(X[i])
    X = np.c_[X]
    y = np.c_[y]
    # Initialization parameters
    initial_theta = np.zeros(n + 1)
    # Regularization parameter
    mylambda = 1
    opts = {'disp': False,
            'xtol': 1e-05,
            'eps': 1.4901161193847656e-08,
            'return_all': False,
            'maxiter': None}
    result = optimize.minimize(costFunctionReg, initial_theta, args=(X, y, mylambda), method='Newton-CG',
                               jac=gradientReg, hess=None, hessp=None, tol=None, callback=None, options=opts)
    theta = result.x
    return theta, mu, sigma


# Get factor data
def get_factors(fdate, factors):
    # stock_set = get_index_stocks('000300.XSHG',fdate)
    q = query(
        valuation.code,  # Stock code
        valuation.circulating_market_cap,  # CMC circulating market cap
        valuation.market_cap,  # MC total market cap
        valuation.circulating_market_cap / valuation.capitalization * 10000,  # CMC/C circulating market cap (100 million yuan) / total share capital (10,000 shares)
        balance.total_owner_equities / valuation.market_cap / 100000000,  # TOE/MC owner's equity per yuan of market cap
        valuation.pb_ratio,  # PB price-to-book ratio
        income.net_profit / valuation.market_cap / 100000000,  # NP/MC net profit per yuan of market cap
        income.total_profit / valuation.market_cap / 100000000,  # TP/MC total profit per yuan of market cap
        balance.total_assets / valuation.market_cap / 100000000,  # TA/MC total assets per yuan of market cap
        income.operating_profit / valuation.market_cap / 100000000,  # OP/MC operating profit per yuan of market cap
        balance.capital_reserve_fund / valuation.market_cap / 100000000,  # CRF/MC capital reserve per yuan of market cap
        valuation.ps_ratio,  # PS price-to-sales ratio
        income.operating_revenue / valuation.market_cap / 100000000,  # OR/MC operating revenue per yuan of market cap
        balance.retained_profit / valuation.market_cap / 100000000,  # RP/MC undistributed profit per yuan of market cap
        balance.total_liability / balance.total_sheet_owner_equities,  # TL/TA asset-liability ratio
        balance.total_current_assets / balance.total_current_liability,  # TCA/TCL current ratio
        valuation.pe_ratio,  # PE price-to-earnings ratio
        income.operating_revenue * indicator.roa / income.net_profit,  # OR*ROA/NP total asset turnover
        indicator.gross_profit_margin,  # GPM gross margin on sales
        indicator.inc_revenue_year_on_year,  # IRYOY year-on-year growth rate of operating revenue (%)
        indicator.inc_revenue_annual,  # IRA period-on-period growth rate of operating revenue (%)
        indicator.inc_net_profit_year_on_year,  # INPYOY year-on-year growth rate of net profit (%)
        indicator.inc_net_profit_annual,  # INPA period-on-period growth rate of net profit (%)
        indicator.net_profit_margin,  # NPM net profit margin (%)
        indicator.operation_profit_to_total_revenue,  # OPTTR operating profit / total operating revenue (%)
        valuation.capitalization,  # C total share capital
        valuation.circulating_cap,  # CC circulating share capital (10,000 shares)
        valuation.pcf_ratio,  # PR price-to-cash-flow ratio
        valuation.pe_ratio_lyr,  # PRL price-to-earnings ratio (LYR)
        indicator.roe,  # ROE return on net assets (%)
        indicator.roa,  # ROA return on total assets (%)
        indicator.eps,  # EPS earnings per share
        # ROIC = EBIT / invested capital, with EBIT = net profit + interest + tax
        (income.net_profit + income.financial_expense + income.income_tax_expense) / (
                balance.total_owner_equities + balance.shortterm_loan + balance.non_current_liability_in_one_year + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable),
        # ZYZY
        (
                balance.accounts_payable + balance.advance_peceipts + balance.other_payable - balance.account_receivable - balance.advance_payment - balance.other_receivable) / (
                balance.total_owner_equities + balance.shortterm_loan + balance.non_current_liability_in_one_year + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable)
    ).filter(
        # valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    # Keep all rows and return every factor column except the stock code
    return fdf.iloc[:, 1:]

# Stock selection method
def selectStocks(context, factors, theta, mu, sigma):
    x_df = get_factors(context.current_dt, factors)
    # Scale the factors with the mean and standard deviation from the fit
    X_norm = (x_df - mu) / sigma
    X = sm.add_constant(X_norm)
    for i in X.columns:
        X[i] = np.nan_to_num(X[i])
    # Copy of parameters
    tmp_theta = theta.reshape(len(theta), 1).copy()
    # Prediction function
    h = np.dot(X, tmp_theta)
    # Store the predicted gain and rank the stocks from highest to lowest
    X['predict'] = h.flatten()
    X = X.sort_values(by='predict', ascending=False)
    return X.index.tolist()

Appendix 2

List of attachments:
Annex 1: stock codes of the Shenwan first-level industries (csv)
Annex 2: stock market data of the Shenwan first-level industries (from January 1, 2021 to June 30, 2021)
Annex 3: pictures
See upload resources for the appendix

Added by Pezmc on Tue, 04 Jan 2022 08:13:45 +0200