Several common classification algorithms are trained on the iris dataset, and K-fold cross-validation is used to evaluate them.
K-fold cross-validation: sklearn.model_selection.KFold(n_splits=k, shuffle=False, random_state=None)
Idea: the dataset is divided into n_splits mutually exclusive subsets. In each of n_splits rounds, one subset serves as the validation set and the remaining n_splits-1 subsets serve as the training set, so training and testing are performed n_splits times and n_splits results are obtained.
Parameter Description:
n_splits: the number of equal folds to split the data into
shuffle: whether to shuffle the data before splitting
① if False, random_state has no effect and every call produces the same partition
② if True, the data are shuffled before splitting; with random_state=None, each run produces a different partition
random_state: random seed; set it together with shuffle=True to make the shuffled splits reproducible
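The interaction between shuffle and random_state described above can be seen on a tiny toy array (the 10-element range here is just an illustration, not part of the iris example):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)

# shuffle=False: contiguous, deterministic folds, identical on every run
kf = KFold(n_splits=5, shuffle=False)
for train_idx, test_idx in kf.split(X):
    print(test_idx)  # [0 1], [2 3], [4 5], [6 7], [8 9]

# shuffle=True with a fixed random_state: shuffled folds, but reproducible
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kf_shuffled.split(X):
    print(test_idx)
```

With shuffle=True and no random_state, re-running the loop would generally give a different partition each time.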
Dataset: iris (read from a local file in this example)
Code
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn import tree
from sklearn import naive_bayes
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Read data
data = pd.read_csv('iris.csv', header=None)
data.columns = ["sepal length", "sepal width", "petal length", "petal width", "category"]
X = data.iloc[:, 0:4]
Y = data.iloc[:, 4]

k = 10
kf = KFold(n_splits=k, shuffle=True)

def eval_model(model_name, model):
    accuracies = []
    i = 0
    for train_index, test_index in kf.split(data):  # split
        # KFold yields positional indices, so select rows with iloc
        x_train, x_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
        model.fit(x_train, y_train)        # train
        y_predict = model.predict(x_test)  # predict
        accuracy = accuracy_score(y_pred=y_predict, y_true=y_test)  # fold accuracy
        accuracies.append(accuracy)
        i += 1
        print('Round {}: {}'.format(i, accuracy))
    print(model_name + " model mean accuracy: ", np.mean(accuracies))

models = {
    'decision tree': lambda: tree.DecisionTreeClassifier(),
    'random forest': lambda: RandomForestClassifier(n_estimators=100),
    'naive bayes': lambda: naive_bayes.GaussianNB(),
    'svm': lambda: svm.SVC(gamma='scale'),
    'GBDT': lambda: GradientBoostingClassifier(),
    'MLP': lambda: MLPClassifier(max_iter=1000),
}

for name, m in models.items():
    eval_model(name, m())
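The manual fold loop above can also be expressed with scikit-learn's cross_val_score helper, which runs the same fit/predict/score cycle per fold. This sketch loads iris via sklearn.datasets.load_iris instead of the local iris.csv so it is self-contained; the random_state values are arbitrary choices for reproducibility:

```python
from sklearn.model_selection import cross_val_score, KFold
from sklearn.datasets import load_iris
from sklearn import tree

# Built-in iris data stands in for the local CSV used in the main example
X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(tree.DecisionTreeClassifier(random_state=0), X, y, cv=kf)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

Passing the KFold object as cv (rather than cv=10) keeps the shuffled, seeded splitting behavior explicit and identical across models being compared.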