Dealing with Class Imbalance: Resampling, Weighting, Ensembles

Class imbalance is a common problem in practice — real-world data isn’t as clean as a public dataset, and the positive/negative counts are often wildly skewed. There are three families of fixes: resampling, weight balancing, and ensembles.

Resampling

Resampling goes in two directions: undersample the majority class, or oversample the minority class.

Figure: Resampling the data. Undersampling (left) drops majority-class samples; oversampling (right) adds minority-class samples.
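The two directions can be sketched in their simplest form, purely random resampling. This is a minimal illustration, not a recommended recipe: the function names `random_undersample` and `random_oversample` and the toy data are made up for this example, and only the Python standard library is used.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return by_class


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, n_min):  # sampling without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate random samples of each class until it matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(n_max - len(samples))]
        for xi in samples + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): 10 negatives, 2 positives.
X = [[float(i)] for i in range(12)]
y = [0] * 10 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under), Counter(y_over))
```

Random undersampling throws information away and random oversampling only duplicates points, which is why the more targeted algorithms below exist.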

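A classic undersampling algorithm is Tomek Links. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbors; removing the majority-class member of each pair clears majority points away from the minority class and sharpens the decision boundary. It doesn’t always help, though: it can also erase subtle boundaries and backfire.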

Figure: Undersampling with Tomek Links. Remove majority-class points hugging the minority class to clean up the boundary.
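A brute-force sketch of Tomek-link removal might look like the following. It assumes a binary problem, Euclidean distance, and a dataset small enough for an O(n²) neighbor search; the function name `tomek_link_undersample` and the toy points are hypothetical.

```python
import math


def tomek_link_undersample(X, y, majority_label):
    """Drop the majority-class member of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbors with different labels.
    """
    n = len(X)

    def nearest(i):
        # Index of the closest other point (brute force).
        return min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(n)]

    drop = set()
    for i in range(n):
        j = nn[i]
        is_mutual = nn[j] == i
        if is_mutual and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)

    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 1-D data (hypothetical): the majority point at 2.5 sits right next to
# the minority point at 3.0, so the two form a Tomek link.
X = [[0.0], [1.0], [2.5], [3.0], [6.0]]
y = [0, 0, 0, 1, 1]

X_clean, y_clean = tomek_link_undersample(X, y, majority_label=0)
print(X_clean, y_clean)
```

Only the point at 2.5 is removed here; majority points that are not part of a link are left alone, so this method cleans the boundary rather than balancing the class counts.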

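A classic oversampling algorithm is SMOTE: for a minority-class sample $x_i$, compute its K nearest neighbors within the minority class, randomly pick one neighbor $\hat{x}_i$, and synthesize a new sample at a random point on the line segment between them, $x_{\text{new}} = x_i + \lambda(\hat{x}_i - x_i)$ with $\lambda$ drawn uniformly from $[0, 1]$; repeat until enough new samples exist.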

Figure: Oversampling with SMOTE. Synthesize new samples along the line between a minority sample and its neighbor.
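The interpolation step is short enough to sketch directly. The version below is a simplified illustration rather than a reference implementation: it assumes Euclidean distance and a brute-force neighbor search, and it picks the base sample at random on each iteration. The name `smote` and its parameters `n_new`, `k`, and `seed` are choices made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        x_i = rng.choice(minority)
        # K nearest minority-class neighbors of x_i, excluding x_i itself.
        neighbors = sorted((p for p in minority if p is not x_i),
                           key=lambda p: math.dist(x_i, p))[:k]
        x_hat = rng.choice(neighbors)
        lam = rng.random()  # position along the segment from x_i to x_hat
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_hat)])
    return synthetic


# Toy minority class (hypothetical): the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.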

Weight balancing

Another family weights samples by class, the flagship being Focal Loss:

$$\text{FL}(p_t) = -\alpha_t\,(1-p_t)^{\gamma}\log(p_t)$$

Here $p_t$ is the predicted probability of the true class. $\alpha_t$ handles positive/negative imbalance by assigning different loss weights to positives and negatives, while $\gamma$ handles easy/hard imbalance: the larger $\gamma$, the more the loss of high-confidence “easy” samples is suppressed, focusing the loss on the hard ones.

Figure: Focal Loss focuses on hard samples. Larger γ pushes down the loss of easy (high-confidence) samples, focusing on the hard ones.
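To see the effect of the two knobs numerically, here is a per-sample sketch of the binary case. It assumes `p` is the predicted probability of the positive class, so that $p_t = p$ for positives and $1 - p$ for negatives, with $\alpha_t$ defined the same way. The function name, the `eps` clamp, and the default values of `alpha` and `gamma` are illustrative choices for this example.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single sample.

    p: predicted probability of the positive class; y: true label, 0 or 1.
    alpha and gamma defaults are illustrative, not tuned values.
    """
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)             # clamp so log(0) cannot occur
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Loss of a positive sample at three confidence levels.
easy = focal_loss(0.9, 1)     # confidently correct
medium = focal_loss(0.6, 1)
hard = focal_loss(0.1, 1)     # confidently wrong
print(easy, medium, hard)
```

With `gamma=0` the modulating factor $(1-p_t)^{\gamma}$ is 1 and the expression reduces to class-weighted cross-entropy; raising `gamma` shrinks the easy sample's loss much faster than the hard one's.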

Ensembles

A third family is ensemble learning. The common Bagging (bootstrap aggregating) trains several classifiers on different bootstrap samples of the data (subsets drawn with replacement), then aggregates their predictions by voting. A combination of weak classifiers is often more robust than any single one.

Figure: Bagging algorithm. Train several classifiers on different data subsets, then aggregate by voting.
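The train-on-subsets-then-vote loop can be sketched as follows. The base learner here is a deliberately simple nearest-centroid classifier chosen only to keep the example self-contained; it is an assumption of this sketch, as are the function names and the toy data. Any classifier could take its place.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Hypothetical base learner: store the mean vector of each class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for d, v in enumerate(xi):
            acc[d] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): 6 negatives near the origin, 4 positives near (10, 10).
X = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [0.4, 0.4], [0.3, 0.1], [0.2, 0.5],
     [10.0, 10.0], [10.3, 9.8], [9.7, 10.2], [10.1, 10.4]]
y = [0] * 6 + [1] * 4

models = bagging_fit(X, y)
print(bagging_predict(models, [0.2, 0.3]), bagging_predict(models, [10.0, 10.1]))
```

For imbalanced data, a natural variation is to resample each bootstrap subset so the classes are balanced before fitting; this sketch uses plain bootstrap samples.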

References

  • He, Haibo, Ma, Yunqian. Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley, 2013.
  • Chawla, Nitesh V., et al. SMOTE: Synthetic Minority Over-sampling Technique. JAIR, 2002.
  • Lin, Tsung-Yi, et al. Focal Loss for Dense Object Detection. ICCV, 2017.