Brain Dump

Compression Entropy

Tags
text-processing

We define \(I(s)\) as the number of bits in which symbol \(s\) should be coded, i.e. the information content of \(s\). This quantity is determined directly by the predicted probability of the symbol: \(I(s) = -\log_{2}(P[s])\), where \(P[s]\) is the probability of symbol \(s\).
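As a quick sanity check, here is a minimal Python sketch of that formula (the name `information_content` is mine, just for illustration):

```python
import math

def information_content(p: float) -> float:
    """I(s) = -log2(P[s]): optimal code length in bits for a symbol with probability p."""
    return -math.log2(p)

# A symbol predicted with probability 1/8 carries 3 bits of information.
print(information_content(1 / 8))  # 3.0
```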

We define the entropy, \(H\), of a probability distribution (see page 54) as the average information per symbol over the whole alphabet: \(H = \sum_{s} P[s]\, I(s) = -\sum_{s} P[s] \log_{2}(P[s])\).
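A small sketch of that average, again in Python and with an illustrative skewed distribution of my own choosing:

```python
import math

def entropy(probabilities: list[float]) -> float:
    """H = sum_s P[s] * I(s) = -sum_s P[s] * log2(P[s]); terms with P[s] = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Skewed four-symbol alphabet: the average information per symbol is 1.75 bits.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```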

H places a lower bound on compression, per [Shannon's source coding theorem](https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem): the average number of bits per symbol of any uniquely decodable code must be greater than or equal to the entropy \(H\) of the source. Equivalently, a string of length \(N\) made of symbols from alphabet \(X\) cannot be compressed into fewer than \(N \times H(X)\) bits without possible loss of information.
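A rough way to see the bound in practice, assuming i.i.d. symbols so that the per-symbol (order-0) entropy of the sample approximates the source entropy \(H(X)\); zlib is used here only as a convenient general-purpose compressor, and the alphabet and weights are made up for the example:

```python
import math
import random
import zlib
from collections import Counter

# Draw N i.i.d. symbols from a skewed four-symbol alphabet (illustrative distribution).
random.seed(0)
symbols, weights = b"abcd", [0.5, 0.25, 0.125, 0.125]
n = 10_000
text = bytes(random.choices(symbols, weights, k=n))

# Order-0 (per-symbol) entropy of the sample, in bits per symbol.
counts = Counter(text)
h = -sum((c / n) * math.log2(c / n) for c in counts.values())

print(f"N*H(X) lower bound: {n * h:.0f} bits")                      # roughly 17,500 bits for these weights
print(f"zlib output:        {len(zlib.compress(text)) * 8} bits")   # typically at or slightly above the bound
```

Note that for correlated data (e.g. natural-language text) a compressor can beat the order-0 figure by exploiting repetition; the theorem's bound applies to the true entropy of the source, which accounts for those correlations.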

Entropy is maximised for a uniform distribution and decreases as the distribution becomes more skewed, approaching 0 as the distribution concentrates on a single value.
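Illustrating that behaviour with the same entropy helper as above (distributions here are made up for the example):

```python
import math

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 symbols: 2.0 bits (the maximum, log2(4))
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: ~1.36 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))      # all mass on one symbol: 0.0 bits
```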

Entropy therefore represents the best possible encoding we can achieve. That is to say, entropy is the minimum average number of bits per symbol that any lossless encoding of the source can attain.
