Brain Dump

Laplace Smoothing

Tags
text-processing

A form of smoothing that can be applied to a classifier to mitigate incorrect classifications caused by words it has never seen.

At a basic level:

  1. When we encounter a term that isn't in our corpus of word->sentiment counts, we treat that word as having a probability of 0 for a given sentiment.
  2. This drives the score for that sentiment to 0, because the score is defined as a product of the per-word probabilities and a single zero factor zeroes the whole product.
  3. Which means a single missing word can completely invalidate a classification, no matter how strongly the other words point to a sentiment.
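
The three steps above can be reproduced with a toy example. This is only a sketch: the counts, the `positive_counts` name and the `unsmoothed_likelihood` helper are made up for illustration, and the score is assumed to be a plain product of per-word frequencies.

```python
from collections import Counter

# Hypothetical counts of words seen in "positive" training text.
positive_counts = Counter({"great": 3, "fun": 2})


def unsmoothed_likelihood(words, counts):
    """Product of per-word relative frequencies, with no smoothing."""
    total = sum(counts.values())
    score = 1.0
    for word in words:
        # A Counter returns 0 for a word it has never seen,
        # so an unseen word contributes a factor of exactly 0.
        score *= counts[word] / total
    return score


seen_only = unsmoothed_likelihood(["great", "fun"], positive_counts)
with_unseen = unsmoothed_likelihood(["great", "fun", "plot"], positive_counts)

print(seen_only)    # a positive number
print(with_unseen)  # 0.0, because "plot" was never counted
```

Adding the single unseen word "plot" is enough to wipe out the evidence from "great" and "fun".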

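Laplace smoothing gets around this by adding one to the count of every word in our vocabulary (and, to keep the probabilities summing to 1, adding the vocabulary size to the denominator). No word ends up with a probability of exactly zero, so an unseen word lowers the score instead of wiping it out.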
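
A minimal sketch of the add-one estimate, assuming the usual form (count + 1) / (total + vocabulary size). The `laplace_probability` helper, the `alpha` parameter and the toy counts are hypothetical names and data chosen for illustration, and the vocabulary is assumed to contain every word we want to score.

```python
from collections import Counter

# Hypothetical counts for one sentiment, plus the full vocabulary.
positive_counts = Counter({"great": 3, "fun": 2})
vocabulary = {"great", "fun", "plot"}  # "plot" never appeared in positive text


def laplace_probability(word, counts, vocabulary, alpha=1):
    """Add-alpha estimate of P(word | sentiment); alpha=1 is Laplace smoothing."""
    total = sum(counts[w] for w in vocabulary)
    return (counts[word] + alpha) / (total + alpha * len(vocabulary))


p_plot = laplace_probability("plot", positive_counts, vocabulary)
p_great = laplace_probability("great", positive_counts, vocabulary)
p_sum = sum(laplace_probability(w, positive_counts, vocabulary) for w in vocabulary)

print(p_plot)   # 0.125, i.e. (0 + 1) / (5 + 3)
print(p_great)  # 0.5, i.e. (3 + 1) / (5 + 3)
print(p_sum)    # 1.0
```

The unseen word now gets a small non-zero probability, and the estimates over the vocabulary still sum to 1.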
