confusion-matrix
# Summary
A good model has high TP and TN rates and low FP (Type I error, $\alpha$) and FN (Type II error, $\beta$) rates; equivalently, it has a high F1 score.
Accuracy:
- How often the classifier makes correct predictions
- It’s the ratio (num of correct pred / total num of pred)
- It’s a useful metric when the true positives and true negatives are more important
- Not suited for imbalanced data (a model that predicts every point as the majority class still gets high accuracy without being useful)
- A valid evaluation choice for a classification problem if the data are well balanced and not skewed
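As a quick check, a minimal sketch (toy labels are made up) that computes accuracy both by hand and with scikit-learn's `accuracy_score`:

```python
from sklearn.metrics import accuracy_score

# Made-up ground truth and predictions, 1 = positive class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))            # 0.75
print(accuracy_score(y_true, y_pred))   # 0.75
```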
Precision:
- Of all the predictions labeled positive, how many are actually positive
- It’s the ratio (num of correctly classified positive / total num of predicted positive) TP/(TP+FP)
- Ideally precision should be high
- It’s a useful metric in cases where false positive is a higher concern than false negative
- Case 1: In spam detection we try to reduce FP (a legitimate email wrongly predicted as spam)
- Case 2: In recommendation systems, a wrong recommendation (poorly matched content predicted as a good match, an FP) could lead to customer churn
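A minimal sketch of precision on a made-up spam-detection example, computed by hand from TP/FP counts and with scikit-learn's `precision_score`:

```python
from sklearn.metrics import precision_score

# Made-up labels: 1 = spam, 0 = legitimate mail
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]   # one legitimate mail flagged as spam (FP)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Precision = TP / (TP + FP)
print(tp / (tp + fp))                    # 0.75
print(precision_score(y_true, y_pred))   # 0.75
```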
Recall / sensitivity:
- Of all the observations that are actually positive, how many are predicted positive
- It’s the ratio (num of correctly classified positive / total num of actual positive), TP/(TP+FN), i.e. 1 − Type II error rate
- A valid choice when we want to capture as many positives as possible
- It’s a useful metric in cases where false negative trumps false positive
- Case 1: Medical cases, where it doesn’t matter much if we raise a false alarm (FP), but an actual positive case should not go undetected (FN); we don’t want to accidentally discharge an infected person
- Case 2: A person suffering from cancer whom the model predicts as not suffering (FN)
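A minimal sketch of recall on a made-up medical-screening example, computed by hand from TP/FN counts and with scikit-learn's `recall_score`:

```python
from sklearn.metrics import recall_score

# Made-up labels: 1 = infected, 0 = healthy
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]   # one infected person missed (FN)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Recall = TP / (TP + FN)
print(tp / (tp + fn))                 # 0.75
print(recall_score(y_true, y_pred))   # 0.75
```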
Specificity:
- It’s the ratio (true negative / all negative outcomes) TN/(TN+FP)
- It’s useful when you are concerned about the accuracy of your negative class and there is a high cost to a positive outcome
- Case: An auditor reviewing financial transactions, where a positive outcome triggers a one-year investigation, but missing a case would only cost the company $50
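scikit-learn has no dedicated specificity function, but it can be computed by hand from TN and FP, or as the recall of the negative class; a sketch with made-up labels:

```python
from sklearn.metrics import recall_score

# Made-up labels: 1 = flagged transaction, 0 = normal
y_true = [0, 0, 0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0]

tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Specificity = TN / (TN + FP); equals recall of the negative class
print(tn / (tn + fp))                              # 0.833...
print(recall_score(y_true, y_pred, pos_label=0))   # 0.833...
```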
F1 Score:
- It’s a number between 0 and 1
- It’s the harmonic mean of precision and recall, $F_1 = \frac{2 \cdot P \cdot R}{P + R}$, so it maintains a balance between the two
- Unlike the simple average, the harmonic mean is not inflated by a single extremely large value; it punishes extreme values more
- If your precision is low, the F1 is low; if the recall is low, your F1 score is also low
- It’s a better metric for imbalanced data
- It’s useful when there is no clear distinction between whether precision is more important or recall is more important. We combine them!
- When we try to increase precision, the recall goes down and vice versa; the F1 score captures both trends in a single value
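A small sketch of how the harmonic mean behaves compared to the simple average (the precision/recall values are chosen purely for illustration):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Balanced values: harmonic mean and simple average agree
print(f1(0.8, 0.8))   # 0.80 (simple average: 0.80)

# One extreme value: the harmonic mean punishes the imbalance
print(f1(1.0, 0.1))   # ~0.18 (simple average: 0.55)
```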
# What is confusion matrix?
A confusion matrix is a table layout that allows visualization of the performance of an algorithm. It is an $N \times N$ matrix used for evaluating the performance of a classification model, where $N$ is the number of target classes. The matrix compares the actual target values with those predicted by the model. It’s a tabular summary of the number of correct and incorrect predictions made by a classifier.
Each row represents the instances in an actual class, while each column represents the instances in a predicted class. It is a special kind of contingency table.
In unsupervised learning, it is usually called a matching matrix.
Key takeaway:
- A good model is one which has high TP and TN rates, while low FP and FN rates
- If you have an imbalanced dataset, it’s better to use the confusion matrix (and the metrics derived from it) as your evaluation criteria rather than accuracy alone
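To make the layout concrete, a minimal sketch (with made-up labels) that builds a binary confusion matrix with scikit-learn; rows follow the actual class and columns the predicted class:

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels, 1 = positive class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class;
# scikit-learn orders rows/columns by sorted label, here [0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]     [[TN, FP],
#  [1 3]] ->  [FN, TP]]
```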
# Terminology
P (Condition Positive)
- The number of real positive cases in the data
N (Condition Negative)
- The number of real negative cases in the data
TP (True Positive)
- You predicted positive and it’s true
- When the actual value is positive and predicted is also positive
- A test result that correctly indicates the presence of a condition or characteristic
TN (True Negative)
- You predicted negative and it’s true
- When the actual value is negative and predicted is also negative
- A test result that correctly indicates the absence of a condition or characteristic
FP (False Positive) / Type 1 error
- You predicted positive and it’s false
- When the actual is negative but prediction is positive
- A test result which wrongly indicates that a particular condition or attribute is present
FN (False Negative) / Type 2 error
- You predicted negative and it’s false
- When the actual is positive, but the prediction is negative
- A test result which wrongly indicates that a particular condition or attribute is absent
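With scikit-learn, the four counts above can be read directly off a binary confusion matrix (toy labels are made up):

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels, 1 = positive class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)   # 3 3 1 1
```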
# Why not use only accuracy?
Classification accuracy gives no information about how the misclassified instances are distributed across the classes, and it is not suited for imbalanced classes.
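A quick illustration of this accuracy paradox on a made-up imbalanced dataset: a model that always predicts the majority class looks accurate while missing every positive case:

```python
from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced data: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100          # always predict the majority (negative) class

print(accuracy_score(y_true, y_pred))   # 0.95 -> looks great
print(recall_score(y_true, y_pred))     # 0.0  -> every positive is missed
```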
# Classification Measure
Accuracy
- How often the classifier makes the correct prediction
- It’s the ratio between the number of correct predictions and the total number of predictions
- Not suited for imbalanced classes