Intro

This post is a recap of the Machine Learning Zoomcamp Module 4.

Below are the posts for the previous modules:

  1. Machine Learning Zoomcamp Module 1 - points received: 9 (7/7 for questions + 2 bonus for learning in public)
  2. Machine Learning Zoomcamp Module 2 - points received: 5 (5/6 for questions + 0 bonus for learning in public)
  3. Machine Learning Zoomcamp Module 3 - points received: 7 (6/6 for questions + 1 bonus for learning in public)

The gist of the module

This module focuses on the evaluation of classification models. It covers various metrics used for this purpose, including accuracy, precision, recall, F1 score, and ROC-AUC. The lessons build upon the dataset and model from the previous module, which involves predicting customer churn. The objective is to determine whether a customer is likely to leave or remain, using a logistic regression model.

Accuracy

Accuracy is a common metric used to evaluate classification models. It measures the proportion of correct predictions made by the model. However, accuracy alone may not provide a complete picture of the model’s performance, especially in imbalanced datasets.

Confusion matrix

The confusion matrix is a tabular representation of the model’s predictions against the actual values. It consists of four components: true positives, true negatives, false positives, and false negatives. These components are used to calculate other evaluation metrics such as precision, recall, and F1 score.

True positives (TP) are the cases where the model correctly predicts the positive class. True negatives (TN) are the cases where the model correctly predicts the negative class. False positives (FP) are the cases where the model incorrectly predicts the positive class. False negatives (FN) are the cases where the model incorrectly predicts the negative class.

Visually, the confusion matrix looks like this:

                    Predicted Positive     Predicted Negative
Actual Positive     True Positive (TP)     False Negative (FN)
Actual Negative     False Positive (FP)    True Negative (TN)

Using these components, accuracy is simply the sum of the true positives and true negatives divided by the total number of observations.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
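
As an illustration, here is a minimal sketch of computing the confusion matrix and accuracy with scikit-learn. The arrays y_val and y_pred are hypothetical toy values standing in for the actual labels and the model's hard predictions, not the course's churn data:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# hypothetical arrays: actual labels and hard predictions at some threshold
y_val  = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                       # 0.75
print(accuracy_score(y_val, y_pred))  # same value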

Precision

Precision measures the proportion of true positive predictions among all positive predictions made by the model. It is calculated as the ratio of true positives to the sum of true positives and false positives.

Precision = TP / (TP + FP)

Recall

Recall, also known as sensitivity or true positive rate, measures the proportion of true positive predictions among all actual positive cases. It is calculated as the ratio of true positives to the sum of true positives and false negatives.

Recall = TP / (TP + FN)
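
Both precision and recall are available in scikit-learn; a small sketch reusing the hypothetical arrays from the accuracy example above:

from sklearn.metrics import precision_score, recall_score

# hypothetical arrays: actual labels and hard predictions
y_val  = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# precision = TP / (TP + FP), recall = TP / (TP + FN)
print(precision_score(y_val, y_pred))  # 2 / (2 + 1) ≈ 0.667
print(recall_score(y_val, y_pred))     # 2 / (2 + 1) ≈ 0.667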

F1 score

The F1 score is the harmonic mean of precision and recall. It provides a balance between the two metrics and is useful when the classes are imbalanced.

An F1 score of 1 indicates a perfect model, while a score of 0 means the model made no correct positive predictions (precision or recall is zero). Precision and recall contribute equally to the F1 score.

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
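
As a quick check, the formula and scikit-learn's f1_score agree on the toy arrays used above (hypothetical values):

from sklearn.metrics import f1_score

y_val  = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

precision = recall = 2 / 3          # values from the sketches above
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)                           # ≈ 0.667
print(f1_score(y_val, y_pred))      # same value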

ROC curve

The Receiver Operating Characteristic (ROC) curve is a graphical representation of the trade-off between a classification model's true positive rate (recall), which we want to maximize, and its false positive rate, which we want to minimize. It is used to evaluate the model's performance across different thresholds.

False positive rate (FPR) is the proportion of false positive predictions among all actual negative cases. It is calculated as the ratio of false positives to the sum of false positives and true negatives.

FPR = FP / (FP + TN)

True positive rate (TPR) is the same as recall: TPR = TP / (TP + FN).

The ROC curve plots the TPR against the FPR for different threshold values.
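
A minimal plotting sketch with scikit-learn and matplotlib, where y_val and y_proba are hypothetical toy values standing in for the actual labels and the model's predicted probabilities:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# hypothetical: actual labels and the model's predicted probabilities
y_val   = [1, 0, 0, 1, 0, 1, 0, 0]
y_proba = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3, 0.2]

# roc_curve sweeps the threshold and returns the FPR and TPR at each value
fpr, tpr, thresholds = roc_curve(y_val, y_proba)

plt.plot(fpr, tpr, label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='random baseline')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()

A model that is no better than random hugs the diagonal, while a better model bows toward the top-left corner.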

Area under the ROC curve

The Area Under the ROC Curve (AUC-ROC) is a metric used to evaluate the performance of a binary classification model. It measures the model’s ability to distinguish between the two classes (positive and negative) across all possible classification thresholds.

AUC-ROC answers the question, “What is the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance?”

The AUC-ROC score ranges from 0 to 1, where a score of 0.5 indicates a model that performs no better than random, and a score of 1 indicates a perfect model.
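
In scikit-learn this is a one-liner; a sketch using the same hypothetical probabilities as in the ROC curve example:

from sklearn.metrics import roc_auc_score

y_val   = [1, 0, 0, 1, 0, 1, 0, 0]
y_proba = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3, 0.2]

# probability that a random positive is ranked above a random negative
print(roc_auc_score(y_val, y_proba))  # ≈ 0.933 for these toy values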

Cross-validation

Cross-validation is a technique used to evaluate the performance of a machine learning model. It involves splitting the dataset into multiple subsets, training the model on some subsets, and testing it on others. This process is repeated multiple times to obtain a more reliable estimate of the model’s performance.

The most common form of cross-validation is k-fold cross-validation, where the dataset is divided into k subsets (folds). The model is trained on k-1 folds and tested on the remaining fold. This process is repeated k times, with each fold serving as the test set once.

This technique is also applied during hyperparameter tuning, to select the settings that perform best across the folds.
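
Below is a minimal sketch of 5-fold cross-validation of a logistic regression model, scored with AUC on each fold. The data is synthetic and stands in for the prepared churn feature matrix and target:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# synthetic stand-in for the prepared feature matrix X and churn target y
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = []

for train_idx, val_idx in kfold.split(X):
    # train on k-1 folds, validate on the held-out fold
    model = LogisticRegression(C=1.0, max_iter=1000)
    model.fit(X[train_idx], y[train_idx])

    y_proba = model.predict_proba(X[val_idx])[:, 1]
    scores.append(roc_auc_score(y[val_idx], y_proba))

print('AUC: %.3f +- %.3f' % (np.mean(scores), np.std(scores)))

Repeating this loop for different values of the regularization parameter C is how cross-validation feeds into hyperparameter tuning.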

Conclusion

This module provides an overview of the evaluation metrics used to assess the performance of classification models: accuracy, precision, recall, F1 score, and ROC-AUC, together with the confusion matrix on which several of them are built.

The homework code can be found here.