Training the iris dataset with several classification models

Several common classification algorithms are trained on the iris dataset, and K-fold cross validation is used to evaluate each of them.

K-fold cross validation: sklearn.model_selection.KFold(n_splits=k, shuffle=False, random_state=None)

Idea: the dataset is divided into n_splits mutually exclusive subsets. In each round, one subset is used as the validation set and the remaining n_splits - 1 subsets are used as the training set, so training and testing are performed n_splits times and n_splits results are obtained.

Parameter description:
n_splits: the number of equal folds the data is divided into
shuffle: whether to shuffle the data before splitting
① if False, random_state has no effect and the split is identical on every run
② if True, the data are shuffled before splitting, so the folds are sampled randomly and differ between runs (unless random_state is fixed)
random_state: the random seed
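
To illustrate these parameters (a minimal sketch, independent of the iris code below), splitting 150 samples with n_splits=10 gives 10 folds, each with 135 training and 15 validation samples:

import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(150).reshape(150, 1)  # 150 dummy samples, same size as the iris dataset

# shuffle=True with a fixed random_state gives a reproducible random split
kf_demo = KFold(n_splits=10, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_demo), start=1):
    print(fold, len(train_idx), len(test_idx))  # prints: fold number, 135, 15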

Dataset: iris (in this case, from a local file)
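
If no local iris.csv is at hand, an equivalent DataFrame can be built from scikit-learn's bundled copy of the dataset (a sketch; the column names below match the ones used in the code, and the labels are the integers 0/1/2 rather than species names):

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(iris.data, columns=["Sepal length", "Sepal width", "Petal length", "Petal width"])
data["category"] = iris.target  # integer class labels 0, 1, 2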

Code

import pandas as pd
import numpy as np

from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold


from sklearn import tree
from sklearn import naive_bayes
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier


# Read data
data = pd.read_csv('iris.csv', header=None)
data.columns = ["Sepal length", "Sepal width", "Petal length", "Petal width", "category"]

X = data.iloc[:,0:4]
Y = data.iloc[:,4]


k = 10
kf = KFold(n_splits=k, shuffle=True)

def eval_model(model_name, model):
    accuracies = []
    i = 0
    for train_index, test_index in kf.split(X):  # generate train/test indices for each fold
        x_train, x_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
        
        model.fit(x_train, y_train)  # train
        y_predict = model.predict(x_test)  # predict
        
        accuracy = accuracy_score(y_pred=y_predict, y_true=y_test)  # accuracy of this fold
        accuracies.append(accuracy)
        i += 1
        print('Fold {}: {}'.format(i, accuracy))
        
    print(model_name + " model mean accuracy: ", np.mean(accuracies))
    
    
# Each entry is a factory (a lambda), so a fresh, untrained model is created for each evaluation
models = {
    'decision tree': lambda: tree.DecisionTreeClassifier(),
    'random forest': lambda: RandomForestClassifier(n_estimators=100),
    'naive bayes': lambda: naive_bayes.GaussianNB(),
    'svm': lambda: svm.SVC(gamma='scale'),
    'GBDT': lambda: GradientBoostingClassifier(),
    'MLP': lambda: MLPClassifier(max_iter=1000),
}


for name,m in models.items():
    eval_model(name,m())
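
For comparison, the same per-model evaluation can be written more compactly with scikit-learn's cross_val_score, which runs the fit/predict/score loop internally (a sketch reusing the X, Y, kf and models defined above):

from sklearn.model_selection import cross_val_score

for name, make_model in models.items():
    scores = cross_val_score(make_model(), X, Y, cv=kf, scoring='accuracy')  # one accuracy per fold
    print(name + " model mean accuracy: ", scores.mean())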

