AI machine learning self-study notes: boosting algorithms

A boosting algorithm is a method for improving the accuracy of a weak classification algorithm. It first constructs a series of prediction functions and then combines them, in a certain way, into a single prediction function.

Boosting is also a method for improving the accuracy of any given learning algorithm, and it is an ensemble algorithm. It operates on the sample set to obtain sample subsets, and then trains a weak classification algorithm on those subsets to generate a series of base classifiers. It can therefore be used to raise the recognition rate of other weak classification algorithms: the weak algorithm is placed inside the boosting framework as the base classification algorithm. The framework operates on the training sample set to obtain different training subsets, and each time a subset is obtained, the base classification algorithm generates a base classifier on it. After N training rounds, N base classifiers have been produced, and the boosting algorithm then fuses these N base classifiers with weights to produce the final result classifier. The recognition rate of each individual base classifier is not necessarily high, but the result of their combination can have a high recognition rate, which improves on the weak classification algorithm.
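As a rough illustration of this framework (not the exact procedure of any particular boosting method), the sketch below trains N base classifiers on resampled subsets and fuses them by a weighted vote. The base_fit callable, the -1/+1 label convention, and the accuracy-based weights are hypothetical simplifications, not part of the original note.

import numpy as np

def boosted_predictor(X, y, base_fit, n_rounds=10, seed=0):
    # Train n_rounds base classifiers, each on a random subset of the samples,
    # then fuse them with weights derived from their accuracy on the full set.
    rng = np.random.default_rng(seed)
    classifiers, weights = [], []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=len(X), replace=True)   # draw a sample subset
        clf = base_fit(X[idx], y[idx])                         # train a base classifier on it
        classifiers.append(clf)
        weights.append(np.mean(clf.predict(X) == y))           # simplistic accuracy-based weight
    def predict(X_new):
        # weighted vote of the N base classifiers (labels assumed to be -1/+1)
        votes = sum(w * clf.predict(X_new) for w, clf in zip(weights, classifiers))
        return np.sign(votes)
    return predict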

There are two main boosting algorithms:

AdaBoost

Stochastic Gradient Boosting

AdaBoost algorithm

AdaBoost is an iterative algorithm. Its core idea is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (a strong classifier).
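Concretely, AdaBoost keeps a weight on every training sample, increases the weights of the samples the current weak classifier misclassifies (so the next classifier concentrates on them), and assigns each classifier a weight based on its error. Below is a minimal sketch of one round of the classic binary AdaBoost update, assuming labels in {-1, +1} and an already fitted weak classifier with a predict method; the helper name is hypothetical.

import numpy as np

def adaboost_round(weak_clf, X, y, w):
    # One round of the classic binary AdaBoost update, y in {-1, +1}, w = current sample weights
    pred = weak_clf.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)            # weighted error of this weak classifier
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))    # weight given to this classifier
    w = w * np.exp(-alpha * y * pred)                    # misclassified samples get larger weights
    return alpha, w / w.sum()                            # renormalized sample weights for the next round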

The implementation class of the AdaBoost algorithm in sklearn is AdaBoostClassifier.

from pandas import read_csv 
from sklearn.model_selection import KFold 
from sklearn.model_selection import cross_val_score 
from sklearn.ensemble import AdaBoostClassifier

# Import the data
filename = 'data/boston_housing.csv'
names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
data = read_csv(filename, names=names)
data = data.dropna(axis=0, how='any')  # be sure to clean the data; rows containing NaN would otherwise raise an error
# Split the data into input features and the output target
array = data.values
X = array[:, 0:13]
Y = array[:, 13]
# MEDV is a floating-point house price, so sklearn reports "Unknown label type: 'continuous'";
# truncating the prices to integers makes sklearn treat them as discrete class labels
Y = Y.astype(int).astype(float)
num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
num_tree = 100
model = AdaBoostClassifier(n_estimators=num_tree, random_state=seed)
model.fit(X, Y)
result = cross_val_score(model, X, Y, cv=kfold)
print(result.mean())
# Predict a single sample
print(model.predict([data.values[5][:13]]))

The output is as follows:

PS C:\coding\machinelearning> & C:/Users/admin/anaconda3/envs/pytorch/python.exe c:/coding/machinelearning/AdaBoost algorithm.py 
0.10845410628019322
[23.]
PS C:\coding\machinelearning> 
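The mean cross-validation accuracy is low (about 0.11), largely because MEDV is a continuous house price that was forced into integer class labels so the classifier would accept it. A regression variant fits this target more naturally; the sketch below uses sklearn's AdaBoostRegressor with the same (assumed) CSV path and column names, and cross_val_score then reports R² rather than accuracy.

from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import AdaBoostRegressor

filename = 'data/boston_housing.csv'
names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
data = read_csv(filename, names=names).dropna(axis=0, how='any')
array = data.values
X = array[:, 0:13]
Y = array[:, 13]                      # keep MEDV continuous; no integer conversion needed
kfold = KFold(n_splits=10, random_state=7, shuffle=True)
model = AdaBoostRegressor(n_estimators=100, random_state=7)
print(cross_val_score(model, X, Y, cv=kfold).mean())   # default scoring for regressors is R^2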

Stochastic Gradient Boosting (GBM)

Stochastic Gradient Boosting (GBM) is based on the idea that, to find the maximum of a function, the best way is to explore along the direction of its gradient, since the gradient operator always points in the direction in which the function value increases fastest. Because the gradient boosting algorithm needs to traverse the whole data set on every update, its computational cost is high. The improved algorithm, stochastic gradient boosting, uses only one sample point at a time to update the regression coefficients, which greatly reduces the computational cost.

The implementation class of stochastic gradient boosting in sklearn is GradientBoostingClassifier.
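In this class, the stochastic element corresponds to the subsample parameter: setting it below 1.0 fits each tree on a random fraction of the training samples rather than the whole set. A minimal sketch (parameter values chosen for illustration only):

from sklearn.ensemble import GradientBoostingClassifier

# subsample < 1.0 makes the boosting stochastic: each tree is fitted on a
# random 80% of the training samples instead of the full training set
model = GradientBoostingClassifier(n_estimators=100, subsample=0.8, random_state=7)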

from pandas import read_csv 
from sklearn.model_selection import KFold 
from sklearn.model_selection import cross_val_score 
from sklearn.ensemble import GradientBoostingClassifier

# Import the data
filename = 'data/boston_housing.csv'
names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
data = read_csv(filename, names=names)
data = data.dropna(axis=0, how='any')  # be sure to clean the data; rows containing NaN would otherwise raise an error
# Split the data into input features and the output target
array = data.values
X = array[:, 0:13]
Y = array[:, 13]
# MEDV is a floating-point house price; truncate it to integer class labels to avoid
# "Unknown label type: 'continuous'" (same workaround as in the AdaBoost example)
Y = Y.astype(int).astype(float)
num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
num_tree = 100
max_features = 3  # number of features considered at each split
model = GradientBoostingClassifier(n_estimators=num_tree, random_state=seed,
                                   max_features=max_features)
result = cross_val_score(model, X, Y, cv=kfold)
print(result.mean())
# Predict a single sample
model.fit(X, Y)
print(model.predict([data.values[18][:13]]))

The output is as follows:

PS C:\coding\machinelearning> & C:/Users/admin/anaconda3/envs/pytorch/python.exe c:/coding/machinelearning/Random gradient rise( GBM)algorithm Boston.py
0.1350724637681159
[20.]
PS C:\coding\machinelearning> 
