
Confusion Matrix and Derived Metrics - Evaluation Metrics

Introduction

When it comes to evaluating a classification problem, the Confusion Matrix is one of the most widely used evaluation tools. It is a matrix that summarizes all of the predictions against the actual values, divided into true positives, false positives, true negatives, and false negatives. We will see in a moment what each of these terms means.

Other than the above-mentioned terms, there are several other metrics derived from the Confusion Matrix. In Python’s Sklearn library, they are calculated as part of the Classification Report. Along with the Confusion Matrix, the following derived metrics are covered in this article:

• F1 Score
• Precision
• Recall
• Support
• Micro Average
• Macro Average
• Weighted Average

Before looking at an example, let’s quickly understand these terms one by one.

True Positive

True Positive [TP] is the number of positive predictions that are actually positive, i.e. the actual value is positive and the predicted value is also positive.

POSITIVE -> POSITIVE

False Positive OR Type 1 Error

False Positive [FP] is the number of positive predictions that are actually negative, i.e. the model predicted positive for a value that is actually negative.

False Positive is also known as Type 1 Error.

NEGATIVE -> POSITIVE

True Negative

True Negative [TN] is the number of negative predictions that are actually negative, i.e. the model predicted negative correctly.

The actual value is negative and the model also predicted negative, which is a correct prediction of a negative value.

NEGATIVE -> NEGATIVE

False Negative OR Type 2 Error

False Negative [FN] is the number of negative predictions that are actually positive, i.e. the model predicted negative for a value that is actually positive.

False Negative is also known as Type 2 Error.

POSITIVE -> NEGATIVE
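
To make these four counts concrete, here is a minimal sketch (assuming a binary problem where 1 = positive and 0 = negative, with made-up example data) that tallies them by hand:

```python
# Count TP, FP, TN, FN for a binary problem (1 = positive, 0 = negative).
# The actual/predicted lists are made-up example data.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # POSITIVE -> POSITIVE
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # NEGATIVE -> POSITIVE
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # NEGATIVE -> NEGATIVE
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # POSITIVE -> NEGATIVE

print(tp, fp, tn, fn)  # 3 1 3 1
```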

Precision

Precision is the rate of correct positive predictions: out of all the positive predictions, how many are actually positive. Precision is a floating-point value between 0 and 1, where 1 is the best score and 0 is the worst score for positive predictions.

Following is the formula for identifying precision:

\begin{equation} \text {Precision}=\frac{T P}{T P+F P} \end{equation}

TP = True Positive
FP = False Positive
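
As a quick illustration, here is a minimal sketch using scikit-learn’s precision_score on the same made-up binary data as above:

```python
# Precision with scikit-learn (made-up binary example data)
from sklearn.metrics import precision_score

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Precision = TP / (TP + FP) = 3 / (3 + 1)
print(precision_score(actual, predicted))  # 0.75
```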

Recall

Recall is also a rate of positive predictions like precision, but here the question is: out of all the actual positive values, how many did the model find positive. Recall is a floating-point value between 0 and 1, where 1 is the best score and 0 is the worst score for the actual positive values.

Following is the formula for identifying recall:

\begin{equation} \text {Recall}=\frac{T P}{T P+F N} \end{equation}

TP = True Positive
FN = False Negative
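
The same sketch with recall_score, again on the made-up data used earlier:

```python
# Recall with scikit-learn (made-up binary example data)
from sklearn.metrics import recall_score

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Recall = TP / (TP + FN) = 3 / (3 + 1)
print(recall_score(actual, predicted))  # 0.75
```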

F1-Score

F1-Score sits midway between precision and recall: it is the harmonic mean of the two, so its value changes based on the contributions of both precision and recall.
Precision and recall usually trade off against each other: tuning a model to increase precision tends to decrease recall, and vice versa. F1-Score is also a floating-point value between 0 and 1.

Because F1-Score weights precision and recall equally, it is high only when both are high, and it equals 1 only when both precision and recall are 1.

Following is the formula for identifying f1-score:

\begin{equation} \text {F1-Score}=2 * \frac{Precision * Recall}{Precision + Recall} \end{equation}
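
And a minimal sketch with f1_score on the same made-up data:

```python
# F1-Score with scikit-learn (made-up binary example data)
from sklearn.metrics import f1_score

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# F1 = 2 * (precision * recall) / (precision + recall)
#    = 2 * (0.75 * 0.75) / (0.75 + 0.75)
print(f1_score(actual, predicted))  # 0.75
```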

Support

Support is simply the count of each class in the input data.
For example, in a classification problem with two classes ‘A’ and ‘B’, the total number of ‘A’ records present in the input data is the support value of class A.
So, if there are 10000 records with class A and 9500 with class B, then these are the support values of their respective classes.
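
Since support is just a per-class count of the actual labels, a minimal sketch (with made-up labels) is simply:

```python
# Support is the per-class count of the actual labels (made-up example data)
from collections import Counter

actual = ['A', 'A', 'B', 'A', 'B', 'B', 'A']
print(Counter(actual))  # Counter({'A': 4, 'B': 3}) -> support(A) = 4, support(B) = 3
```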

Micro Average

Micro Average aggregates the true positives, false positives, and false negatives of all classes in a multiclass classification problem before computing the metric.
Precision, Recall, and F1-Score all have a Micro Average. Let’s see how it is calculated when we have three classes A, B, and C.

\begin{equation} \text {MA for Precision}=\frac{TP1+TP2+TP3}{TP1+TP2+TP3+FP1+FP2+FP3} \end{equation}
\begin{equation} \text {MA for Recall}=\frac{TP1+TP2+TP3}{TP1+TP2+TP3+FN1+FN2+FN3} \end{equation}
\begin{equation} \text {MA for F1-Score}=\frac{2 * MA\ for\ Precision * MA\ for\ Recall}{MA\ for\ Precision + MA\ for\ Recall} \end{equation}

In all three formulas above, TP1, TP2, TP3 represent true positive values of classes A, B, and C respectively and the same goes for false positive and false negative.
Micro Average of F1-Score is a harmonic mean of micro averages of Precision and Recall.
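
A minimal sketch of micro averaging with scikit-learn, using made-up data for three classes A, B, and C (average='micro' performs the aggregation shown in the formulas above):

```python
# Micro-averaged precision, recall, and F1 with scikit-learn
# (made-up data for three classes 'A', 'B', and 'C')
from sklearn.metrics import precision_score, recall_score, f1_score

actual    = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
predicted = ['A', 'B', 'C', 'B', 'B', 'A', 'A', 'C']

# For single-label problems all three micro averages are equal (here 0.625)
print(precision_score(actual, predicted, average='micro'))
print(recall_score(actual, predicted, average='micro'))
print(f1_score(actual, predicted, average='micro'))
```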

Macro Average

Macro Average is very simple. For Precision, it is the plain (unweighted) average of the Precision values of all classes (i.e. A, B, and C). The same applies to Recall and F1-Score.

For example, the macro average of precision is the average of the precision values of classes A, B, and C.
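
The same sketch with average='macro', which takes the plain mean of the per-class scores (made-up data as above):

```python
# Macro-averaged precision with scikit-learn (same made-up data as above)
from sklearn.metrics import precision_score

actual    = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
predicted = ['A', 'B', 'C', 'B', 'B', 'A', 'A', 'C']

# Per-class precision: A = 2/3, B = 2/3, C = 1/2; macro average ~= 0.611
print(precision_score(actual, predicted, average='macro'))
```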

Weighted Average

Weighted Average is an average weighted by the support values of the classes, i.e. the number of samples of each class in the input data.

For example, if we have three classes A, B, and C, the formulas for the weighted average of precision, recall, and f1-score are as follows.

Here, S = Support, P = Precision, R = Recall, and F1 = F1-Score.

\begin{equation} \text {WA for Precision}=\frac{S(A) * P(A) + S(B) * P(B) + S(C) * P(C) }{S(A) + S(B) + S(C)} \end{equation}
\begin{equation} \text {WA for Recall}=\frac{S(A) * R(A) + S(B) * R(B) + S(C) * R(C) }{S(A) + S(B) + S(C)} \end{equation}
\begin{equation} \text {WA for F1-Score}=\frac{S(A) * F1(A) + S(B) * F1(B) + S(C) * F1(C) }{S(A) + S(B) + S(C)} \end{equation}
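
A minimal sketch with average='weighted' on the same made-up data, where each class’s score is weighted by its support:

```python
# Support-weighted precision with scikit-learn (same made-up data as above)
from sklearn.metrics import precision_score

actual    = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
predicted = ['A', 'B', 'C', 'B', 'B', 'A', 'A', 'C']

# Supports: A = 3, B = 3, C = 2; per-class precision: 2/3, 2/3, 1/2
# Weighted average = (3 * 2/3 + 3 * 2/3 + 2 * 1/2) / 8 = 0.625
print(precision_score(actual, predicted, average='weighted'))
```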

Example

Confusion Matrix

A confusion matrix is a matrix of actual versus predicted values, consisting of the true positive, false positive, true negative, and false negative counts.

[Figure: Confusion Matrix skeleton]

[Figure: Confusion Matrix example]
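
A minimal sketch of building a confusion matrix with scikit-learn, on the made-up binary data used earlier:

```python
# Confusion matrix with scikit-learn (made-up binary example data)
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```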

Classification Report

[Figure: Classification report for classes A, B, and C]

⦿ This is the same classification report that Python’s Sklearn library generates (a minimal sketch of how to produce one follows this list).
⦿ In the above diagram, we are dealing with three classes A, B, and C.
⦿ The total number of records in this example is 92, i.e. the sum of the support values of A, B, and C.
⦿ The rest of the table is self-explanatory given the terms discussed earlier; revisit the definitions above if required.
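
A report like the one in the figure can be produced with classification_report; here is a minimal sketch with made-up data for classes A, B, and C (not the 92-record dataset from the figure):

```python
# Classification report with scikit-learn
# (made-up data for three classes 'A', 'B', and 'C')
from sklearn.metrics import classification_report

actual    = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
predicted = ['A', 'B', 'C', 'B', 'B', 'A', 'A', 'C']

# Prints per-class precision, recall, f1-score, and support,
# plus the accuracy, macro avg, and weighted avg rows
print(classification_report(actual, predicted))
```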

Conclusion

⦿ Python’s Sklearn library already implements the confusion matrix and the classification report; you just have to pass your actual and predicted data to get the results.
⦿ There are other ways of evaluating classification problems and those will be covered in future articles.
⦿ You can find all the classification evaluation metrics here: Classification Evaluation Metrics

Check out the article categories here for more information: Article Categories

This post is licensed under CC BY 4.0 by the author.