Tensorflow similarity study notes 13

2021SC@SDUSC

Code address: https://github.com/tensorflow/similarity/blob/master/tensorflow_similarity/evaluators/evaluator.py

 def evaluate_classification(
        self,
        query_labels: IntTensor,
        lookup_labels: IntTensor,
        lookup_distances: FloatTensor,
        distance_thresholds: FloatTensor,
        metrics: Sequence[ClassificationMetric],
        matcher: Union[str, ClassificationMatch],
        distance_rounding: int = 8,
        verbose: int = 1
    ) -> Dict[str, np.ndarray]:
        """Evaluate the classification performance.
        Compute the classification metrics given a set of queries, lookups, and
        distance thresholds.
        Args:
            query_labels: Sequence of expected labels for the lookups.
            lookup_labels: A 2D tensor where the jth row is the labels
            associated with the set of k neighbors for the jth query.
            lookup_distances: A 2D tensor where the jth row is the distances
            between the jth query and the set of k neighbors.
            distance_thresholds: A 1D tensor denoting the distances points at
            which we compute the metrics.
            metrics: The set of classification metrics.
            matcher: {'match_nearest', 'match_majority_vote'} or
            ClassificationMatch object. Defines the classification matching,
            e.g., match_nearest will count a True Positive if the query_label
            is equal to the label of the nearest neighbor and the distance is
            less than or equal to the distance threshold.
            distance_rounding: How many digit to consider to
            decide if the distance changed. Defaults to 8.
            verbose: Be verbose. Defaults to 1.
        Returns:
            A Mapping from metric name to the list of values computed for each
            distance threshold.
        """

Once we have the True Positive, True Negative, False Positive, and False Negative counts, we can calculate the sensitivity and specificity of the model.
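As a minimal sketch of where those four counts come from (the labels and predictions below are hypothetical, with 1 = positive and 0 = negative), they can be computed directly with NumPy:

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted positive, actually positive
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted negative, actually negative
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted positive, actually negative
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted negative, actually positive

print(tp, tn, fp, fn)  # → 3 3 1 1
```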

Sensitivity:

Sensitivity is also called the true positive rate, or recall. In essence, it tells us the proportion of actual positive cases that our model predicts as positive.

Therefore, a very high sensitivity value means that our model is good at correctly identifying true positives. It is the ratio of true positives to all actual positives: TP / (TP + FN).

Specificity:

Specificity is also called the true negative rate. It tells us the proportion of actual negative cases that our model predicts as negative. It is the ratio of true negatives to all actual negatives: TN / (TN + FP).

Therefore, a very high specificity value means that our model is good at correctly identifying true negatives.
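The two ratios above can be sketched from the four confusion-matrix counts (the values here are hypothetical):

```python
# Hypothetical confusion-matrix counts
tp, tn, fp, fn = 3, 3, 1, 1

sensitivity = tp / (tp + fn)  # true positive rate / recall
specificity = tn / (tn + fp)  # true negative rate

print(sensitivity, specificity)  # → 0.75 0.75
```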

4. Which is better, sensitivity or specificity?

We should always aim for the highest possible sensitivity and specificity, but sometimes one measure is more important than the other.

The three use cases proposed above are meant to help us understand this trade-off.

The short answer is that it depends on the problem we are trying to solve.

We can draw an ROC curve, and our goal should be for the area under the curve to be as close to 1 as possible. The larger the area, the better the model. This area is called the AUC.
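A minimal sketch of computing the ROC curve and AUC with scikit-learn (the labels and scores below are hypothetical; `scores` would normally be the model's predicted probabilities for the positive class):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground-truth labels and predicted scores for the positive class
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve returns the false positive rate and true positive rate
# at each candidate decision threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(auc)  # → 0.75
```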

Python code

Among the many available models, we can use a support vector machine, logistic regression, and a decision tree classifier to classify the data.

This code snippet shows how we fit each model and get its confusion matrix (the original snippet had its imports run together and mismatched variable names, fixed below; a data-loading step on the iris dataset is added here so it runs end to end):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

# Example data so the snippet is runnable
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Support vector machine
svc = SVC(kernel='linear', C=5).fit(X_train, y_train)
predicted = svc.predict(X_test)
cm_svc = confusion_matrix(y_test, predicted)

# Decision tree
dtc = DecisionTreeClassifier(max_depth=10).fit(X_train, y_train)
predicted = dtc.predict(X_test)
cm_dtc = confusion_matrix(y_test, predicted)

# Logistic regression
lrc = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predicted = lrc.predict(X_test)
cm_lrc = confusion_matrix(y_test, predicted)

Summary
This article explains what sensitivity and specificity are. Specificity and sensitivity are used in many statistical experiments, so it is important to understand what they are and when to select each metric.

Keywords: AI Deep Learning

Added by tomsasse on Wed, 29 Dec 2021 06:10:17 +0200