Evaluation of ML - Classification Algorithm

Classification accuracy

Problem scenario:
An uncommon disease prediction system, input physical examination information, can judge whether there is a disease;
Prediction accuracy: 99.9%;
If the prevalence rate is 0.1%, our system can achieve 99.9% accuracy in predicting that everyone is healthy.
If the prevalence rate is 0.01%, our system can predict that everyone is healthy, with an accuracy of 99.99%

Conclusion: for extremely skewed data, only using classification accuracy is far from enough.

Solution: use confusion matrix for further analysis

Confusion Matrix

For binary classification problem

• The row represents the true value
• The column represents the predicted value
• 0 Negative
• 1 Positive
01
0TNFP
1FNTP
• TN: True Negative, the predicted negative is correct
• FP: False Positve, predict positive error
• FN: False Negative, negative prediction error
• TP: true positive, the predicted positive is correct

Accuracy and recall

01
0TN 9978FP 12
1FN 2TP 8

Accuracy

In the forecast data, 100 data are 1. How many are 1.

$precision = \frac{ TP }{ TP + FP }$

Accuracy in the above data = 8 / (12 + 8) = 40%

When to focus on accuracy?
Such as stock forecast.

recall

Among the predicted data, there are 100 data with 1, and how many 1 are predicted.

$recall = \frac{ TP }{ TP + FN }$

In the above data, the recall rate = 8 / (2 + 8) = 80%

When to focus on recall?
Such as: disease diagnosis.

At this time, it doesn't matter if the accuracy is a little lower. Can be further diagnosed.

Why?

Why is accuracy and recall better than classification accuracy?

The accuracy of the following cases is meaningless. F1 Score

F1 will take into account both accuracy and recall, which is the harmonic average of the two (rather than the arithmetic average).

F 1 = 2 ∗ p r e s i c i o n ∗ r e c a l l p r e s i c i o n + r e c a l l F1 = \frac{ 2 * presicion * recall }{ presicion + recall } F1=presicion+recall2∗presicion∗recall​

1 F 1 = 1 2 ( 1 p r e c i s i o n + 1 r e c a l l ) \frac{1}{F1} = \frac{1}{2} ( \frac{1}{ precision } + \frac{1}{ recall } ) F11​=21​(precision1​+recall1​)

Why take the harmonic average?
The harmonic average has a characteristic: if the two are extremely unbalanced, such as one is particularly high and the other is particularly low, F1 will also be very low.

When recall and precision are both 1, F1 takes the maximum value of 1;
F1 is 0 when both are 0.

code implementation

Code implementation of F1

import numpy as np

def f1_score(precision, recall):
try:
return 2 * precision * recall / (precision + recall)
except:
return 0.0

# If they are equal, the value of f1 is equal to them
precision = 0.5
recall = 0.5
f1_score(precision, recall)
# 0.5

# If the two are different, the result will be pulled down
precision = 0.1
recall = 0.9
f1_score(precision, recall)
# 0.18000000000000002

# If one is 0, it will be 0
precision = 0.0
recall = 1.0
f1_score(precision, recall)
# 0.0

Introducing real data

from sklearn import datasets

X = digits.data
y = digits.target.copy()

y[digits.target==9] = 1
y[digits.target!=9] = 0

Confusion matrix, realization of accuracy and recall

def TN(y_true, y_predict):
assert len(y_true) == len(y_predict)
return np.sum((y_true == 0) & (y_predict == 0))

def FP(y_true, y_predict):
assert len(y_true) == len(y_predict)
return np.sum((y_true == 0) & (y_predict == 1))

def FN(y_true, y_predict):
assert len(y_true) == len(y_predict)
return np.sum((y_true == 1) & (y_predict == 0))

def TP(y_true, y_predict):
assert len(y_true) == len(y_predict)
return np.sum((y_true == 1) & (y_predict == 1))

def confusion_matrix(y_true, y_predict):
return np.array([
[TN(y_true, y_predict), FP(y_true, y_predict)],
[FN(y_true, y_predict), TP(y_true, y_predict)]
])

def precision_score(y_true, y_predict):
tp = TP(y_true, y_predict)
fp = FP(y_true, y_predict)
try:
return tp / (tp + fp)
except:
return 0.0

def recall_score(y_true, y_predict):
tp = TP(y_true, y_predict)
fn = FN(y_true, y_predict)
try:
return tp / (tp + fn)
except:
return 0.0

y_log_predict = log_reg.predict(X_test)

TN(y_test, y_log_predict)		# 403
FP(y_test, y_log_predict)		# 2
FN(y_test, y_log_predict)		# 9
TP(y_test, y_log_predict)		# 36

confusion_matrix(y_test, y_log_predict)
'''
array([[403,   2],
[  9,  36]])
'''

precision_score(y_test, y_log_predict)	# 0.94736842105263153

recall_score(y_test, y_log_predict)   # 0.80000000000000004

Confusion matrix in scikit learn, accuracy, recall, F1

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
log_reg.score(X_test, y_test)
# 0.97555555555555551

y_predict = log_reg.predict(X_test)

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

confusion_matrix(y_test, y_predict)  # Confusion matrix
'''
array([[403,   2],
[  9,  36]])
'''
precision_score(y_test, y_predict)
# 0.94736842105263153

recall_score(y_test, y_predict)
# 0.80000000000000004

f1_score(y_test, y_predict)
# 0.86746987951807231

Balance of precision recall

Effect of threshold on accuracy and recall The star in the figure above is 1;
It can be seen that as the threshold increases, the accuracy rate becomes higher and the recall rate becomes lower;
Precision and precision cannot be contradictory at the same time.

Popular understanding, if you want high accuracy, you need to be particularly sure of the data before marking; But this will miss a lot of samples with a real value of 1.

# decision_function is a decision-making function. You can know what the corresponding score value is

log_reg.decision_function(X_test)[:10]
# array([-22.05700185, -33.02943631, -16.21335414, -80.37912074, -48.25121102, -24.54004847, -44.39161228, -25.0429358 ,  -0.97827574, -19.71740779])

log_reg.predict(X_test)[:10]
# array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

decision_scores = log_reg.decision_function(X_test)

np.min(decision_scores)
# -85.686124167491727

np.max(decision_scores)
# 19.889606885682948

Threshold usage 5

y_predict_2 = np.array(decision_scores >= 5, dtype='int')

confusion_matrix(y_test, y_predict_2) # array([[404,   1], [ 21,  24]])

precision_score(y_test, y_predict_2)	# 0.95999999999999996

recall_score(y_test, y_predict_2)  # 0.53333333333333333

Threshold use - 5

y_predict_3 = np.array(decision_scores >= -5, dtype='int')

confusion_matrix(y_test, y_predict_3)
# array([[390,  15], [  5,  40]])

precision_score(y_test, y_predict_3) # 0.72727272727272729

recall_score(y_test, y_predict_3)  #0.88888888888888884

How to select threshold – PR curve

from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

precisions = []
recalls = []

thresholds = np.arange(np.min(decision_scores), np.max(decision_scores), 0.1)

for threshold in thresholds:
y_predict = np.array(decision_scores >= threshold, dtype='int')
precisions.append(precision_score(y_test, y_predict))
recalls.append(recall_score(y_test, y_predict))

plt.plot(thresholds, precisions)
plt.plot(thresholds, recalls)
plt.show() Precision recall curve

# The x-axis is the accuracy rate and recall is the recall rate; The relationship between the two
plt.plot(precisions, recalls)
plt.show() Precision recall curve in scikit learn

from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_test, decision_scores)

# Why 145? The above function will automatically take the most appropriate step size according to the data
precisions.shape	# (145,)
recalls.shape # (145,)
thresholds.shape  # (144,)

plt.plot(thresholds, precisions[:-1])
plt.plot(thresholds, recalls[:-1])
plt.show() plt.plot(precisions, recalls)
plt.show() The sharp drop point in the PR curve is likely to be the point where precision and recall reach equilibrium. The model corresponding to the outer curve is better than the model corresponding to the inner curve;
The description inside is more abstract, and the area of curve package can also be used to describe it.

ROC curve

ROC : Receiver Operation Characteristic Curve
Statistics is commonly used to describe the relationship between TPR and FPR.

TPR is the recall rate, which is predicted to be 1, accounting for 1 of the total
T P R = T P T P + F N TPR = \frac{ TP }{ TP + FN } TPR=TP+FNTP​

FPR: False Position Rate, predicted to be 0, accounting for the proportion of 0 in the total
F P R = F P T N + F P FPR = \frac{ FP }{ TN + FP } FPR=TN+FPFP​ It can be found that TPR and FPR increase with the decrease of threshold; The two trends are consistent;

TPR & fpr code implementation

def TPR(y_true, y_predict):
tp = TP(y_true, y_predict)
fn = FN(y_true, y_predict)
try:
return tp / (tp + fn)
except:
return 0.

def FPR(y_true, y_predict):
fp = FP(y_true, y_predict)
tn = TN(y_true, y_predict)
try:
return fp / (fp + tn)
except:
return 0.

ROC code implementation

from playML.metrics import FPR, TPR

fprs = []
tprs = []
thresholds = np.arange(np.min(decision_scores), np.max(decision_scores), 0.1)

for threshold in thresholds:
y_predict = np.array(decision_scores >= threshold, dtype='int')
fprs.append(FPR(y_test, y_predict))
tprs.append(TPR(y_test, y_predict))

plt.plot(fprs, tprs)
plt.show() ROC in scikit learn

from sklearn.metrics import roc_curve

fprs, tprs, thresholds = roc_curve(y_test, decision_scores)

plt.plot(fprs, tprs)
plt.show() ROC AUC

AUC: area under curve

from sklearn.metrics import roc_auc_score

# Calculated area
roc_auc_score(y_test, decision_scores)  # 0.98304526748971188

The model with larger roc area is considered to be a better model.

Implementation of confusion matrix

import matplotlib.pyplot as plt
import itertools

def plot_confusion_matrix(cm, classes,
title='Confusion matrix',
cmap=plt.cm.Blues):
"""
This function prints and plots the confusion matrix.
"""
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=0)
plt.yticks(tick_marks, classes)

thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape), range(cm.shape)):
plt.text(j, i, cm[i, j],
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")

plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')

call

from sklearn.metrics import confusion_matrix

LR_model = LogisticRegression()
LR_model = LR_model.fit(X_train, y_train)
y_pred = LR_model.predict(X_test)
cnf_matrix = confusion_matrix(y_test,y_pred)

# recall
print("Recall metric in the testing dataset: ", cnf_matrix[1,1]/(cnf_matrix[1,0] + cnf_matrix[1,1]))

# accuracy
print("accuracy metric in the testing dataset: ", (cnf_matrix[1,1] + cnf_matrix[0,0])/(cnf_matrix[0,0] + cnf_matrix[1,1]+cnf_matrix[1,0]+cnf_matrix[0,1]))

# Plot non-normalized confusion matrix
class_names = [0,1]
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names, title='Confusion matrix')
plt.show() Keywords: Machine Learning Deep Learning

Added by skovela on Mon, 03 Jan 2022 01:03:13 +0200