Recipe

Working with imbalanced classification (binary case)

Total time needed: ~55 minutes
Learning Objectives
You will learn the main techniques for handling imbalanced datasets and why model calibration matters.
Potential Use Cases
You are working with imbalanced datasets, for example in fraud detection.
Target Audience
INTERMEDIATE: Practitioners working with imbalanced datasets
Go through the following annotated items in order:
ARTICLE 1. A Gentle Introduction to Imbalanced Classification
  • What is a classification problem?
  • Why are imbalanced datasets a challenge for classification algorithms?
  • What are some examples of imbalanced classification problems?
9 minutes
VIDEO 2. Machine Learning Classification How to Deal with Imbalanced Data
  • Why is it important to deal with imbalanced data?
  • How to use SMOTE?
10 minutes
ARTICLE 3. 7 Techniques to Handle Imbalanced Data
  • Which evaluation metrics can be used for an imbalanced dataset?
  • How can the dataset be resampled?
  • Which techniques can be applied to imbalanced datasets?
9 minutes
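
The questions about evaluation metrics can be made concrete with a small sketch. The example below is an illustration only and is not taken from the article: the toy labels (95 negatives, 5 positives) and the helper names `confusion_counts` and `classification_metrics` are hypothetical. It shows why accuracy alone can mislead on an imbalanced dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1, using 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority (negative) class.
always_negative = [0] * 100

baseline = classification_metrics(y_true, always_negative)
print(baseline)
```

In this toy setup the always-negative baseline reaches 0.95 accuracy while its recall and F1 on the positive class are both 0.0, which is the kind of gap that metrics such as precision, recall and F1 are meant to expose.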
ARTICLE 4. Dealing with Imbalanced Data
  • Which metrics are used with imbalanced datasets?
  • How to oversample the data?
  • How to undersample the data?
5 minutes
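
As a rough sketch of the two resampling ideas, the example below implements plain random oversampling and random undersampling with only the Python standard library. It is not SMOTE, which synthesizes new minority examples by interpolation rather than duplicating existing ones, and it is not the code from the article. The function names and the toy data (90 negatives, 10 positives) are assumptions made for illustration.

```python
import random


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 90 negatives and 10 positives, one feature each.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(len(X_over), y_over.count(0), y_over.count(1))
print(len(X_under), y_under.count(0), y_under.count(1))
```

Resampling is usually applied to the training split only, so that the evaluation data keeps the original class ratio.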
ARTICLE 5. Classifier calibration
  • Why is calibration important?
  • How to create a probability density plot of your model?
  • How to calibrate the model?
7 minutes
ARTICLE 6. How to Calibrate Probabilities for Imbalanced Classification
  • Why are uncalibrated probabilities a problem?
  • How to calibrate probabilities?
  • How to calibrate SVM?
  • How to calibrate KNN?
15 minutes
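
To make "uncalibrated probabilities" concrete, the sketch below measures calibration on toy data using a Brier score and a simple reliability table. It only diagnoses calibration; it does not perform a calibration method such as Platt scaling or isotonic regression, which the articles above cover. The helper names `brier_score` and `reliability_table`, the bin count and all of the numbers are assumptions for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true, y_prob, n_bins=5):
    """Bin predictions by probability and compare them with observed rates.

    Returns one (mean predicted probability, observed positive fraction,
    count) tuple per non-empty bin, ordered from lowest to highest bin.
    """
    bins = [[] for _ in range(n_bins)]
    for truth, prob in zip(y_true, y_prob):
        # Clamp so that a probability of exactly 1.0 falls in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((truth, prob))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(prob for _, prob in members) / len(members)
        frac_pos = sum(truth for truth, _ in members) / len(members)
        table.append((mean_pred, frac_pos, len(members)))
    return table


# Hypothetical outcomes: 1 positive in the first five, 4 in the last five.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Overconfident scores: 0.1 and 0.9 where the observed rates are 0.2 and 0.8.
overconfident = [0.1] * 5 + [0.9] * 5

# Scores that match the observed rates in each group.
matched = [0.2] * 5 + [0.8] * 5

for name, probs in [("overconfident", overconfident), ("matched", matched)]:
    print(name, round(brier_score(y_true, probs), 4))
    for mean_pred, frac_pos, count in reliability_table(y_true, probs):
        print(f"  predicted={mean_pred:.2f} observed={frac_pos:.2f} n={count}")
```

A well-calibrated model has predicted and observed values that agree within each bin. On this toy data the matched scores have the lower Brier score.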

Concepts Covered