Text Coding
- Tags
- text-processing
Text coding determines the output representation of each symbol in a compression scheme, given the probability distribution supplied by a model. The general approach is:
- Symbols that occur most frequently should have the shortest codewords.
- Symbols that occur least frequently should have the longest codewords.
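One well-known way to follow this approach is Huffman coding. The notes above do not name a specific algorithm, so the sketch below is only an illustration: the function name `huffman_code` and the four-symbol distribution are assumptions made for the example.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix-free code from {symbol: probability}; returns {symbol: bitstring}."""
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be representable.
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes with 0 and 1.
        p0, _, low = heapq.heappop(heap)
        p1, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical model output: probabilities for a four-symbol alphabet.
model = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(model)
for symbol in sorted(model, key=model.get, reverse=True):
    print(symbol, model[symbol], code[symbol])
```

With this distribution the most probable symbol receives a 1-bit codeword and the two least probable symbols receive 3-bit codewords.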
Given a set of codewords, we can calculate (see page 116) the expected code length, that is, the average number of bits per symbol in the compressed output, by weighting each codeword's length by the probability of its symbol.
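As a minimal sketch of that calculation, the expected length is the sum over all symbols of probability times codeword length. The helper name `expected_code_length`, the probabilities, and the codewords below are all assumptions chosen for illustration, not values from the referenced page.

```python
def expected_code_length(probabilities, codewords):
    """Return the expected number of bits per symbol: sum of p(s) * len(code(s))."""
    return sum(p * len(codewords[s]) for s, p in probabilities.items())


# Hypothetical four-symbol model and a matching prefix-free code.
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codewords = {"a": "0", "b": "10", "c": "110", "d": "111"}

# A fixed-length code for the same alphabet, for comparison.
fixed_length = {"a": "00", "b": "01", "c": "10", "d": "11"}

print(expected_code_length(probabilities, codewords))     # 1.75
print(expected_code_length(probabilities, fixed_length))  # 2.0
```

In this example the variable-length code averages 1.75 bits per symbol against 2 bits for the fixed-length code, because the short codewords are assigned to the frequent symbols.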