Brain Dump

Laplace Smoothing

Tags: text-processing

A form of smoothing [see page 34] that can be applied to a classifier to reduce incorrect classifications caused by terms it has never seen.

At a basic level:

When we encounter a term that isn't in our corpus of word->sentiment classifications, we estimate the probability of that word given a sentiment as 0.
This drives the score for the whole classification to 0, because the score is defined as a product of the per-word probabilities.
As a result, a single missing word can completely invalidate a classification.
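
A minimal sketch of the problem, assuming a toy table of per-word probabilities for a "positive" sentiment. The words, the numbers and the `unsmoothed_score` helper are made up for illustration:

```python
import math

# Hypothetical P(word | positive) estimates; made-up numbers for illustration.
p_word_given_positive = {"great": 0.5, "movie": 0.25}


def unsmoothed_score(words, probs):
    """Multiply per-word probabilities; a word missing from `probs` counts as 0."""
    return math.prod(probs.get(word, 0.0) for word in words)


# Every word is known, so the product is non-zero.
seen = unsmoothed_score(["great", "movie"], p_word_given_positive)

# "soundtrack" was never seen, so its factor is 0 and the whole product collapses.
unseen = unsmoothed_score(["great", "movie", "soundtrack"], p_word_given_positive)
```

Here `seen` is non-zero while `unseen` is exactly 0, even though two of the three words point towards the positive sentiment.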

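Laplace smoothing gets around this by adding one to the count of every word in our vocabulary (and the vocabulary size to the total count, so the probabilities still sum to one), which means no word ever gets a probability of exactly 0.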
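
A rough sketch of the add-one estimate, assuming word counts are kept per sentiment and the vocabulary is the set of all known words. The `laplace_probability` name, the `alpha` parameter and the counts are hypothetical, chosen only for illustration:

```python
from collections import Counter


def laplace_probability(word, class_counts, vocabulary, alpha=1):
    """Estimate P(word | class) with add-alpha smoothing (alpha=1 is Laplace)."""
    total = sum(class_counts.values())
    # Counter returns 0 for words it has never counted, so unseen words get `alpha`.
    return (class_counts[word] + alpha) / (total + alpha * len(vocabulary))


# Hypothetical counts of words seen in "positive" training documents.
positive_counts = Counter({"great": 3, "movie": 1})
vocabulary = {"great", "movie", "soundtrack", "boring"}

p_seen = laplace_probability("great", positive_counts, vocabulary)
p_unseen = laplace_probability("soundtrack", positive_counts, vocabulary)
```

With these counts a word that never appeared in a positive document still gets a small non-zero probability, so it lowers the product without zeroing it out.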

Links to this note

Bayes Classifier