Brain Dump

Arithmetic Coding

Tags
text-processing

A coding algorithm which is suitable for images. This approach is:

  • Slower than Huffman coding.
  • Not easy to start decoding in the middle of a compressed stream.
  • Less suitable for full-text retrieval (random access required).

Procedure

Arithmetic coding [see page 26, represents] our encoded input as a fractional binary number. We build a model containing the probabilities of each character in our alphabet and place it on a number line (from 0-1). Then we progressively [see page 44, clamp] this interval.

The first character lies in some range on the number line (say 0.5-0.8) and we now create a new number line from 0.5-0.8 and recreate our original model on this line. We then repeat this process until we've got a final fractional range representative of our input.

We can then transmit see page 54, any number in this range and transmit this as an answer. We can work back from this fractional value to determine our original input sequence.

Issues

  • Need to transmit a whole number of bits/bytes.
  • Arithmetic needs to be highly precise (some computer may have issues with this).
  • Method produces no output until all the input has been processed.

We can get around many of these [see page 75, issues] by outputting bits during encoding. At some point we can reliably assume a common prefix of bits occurs for our current range. Note: doing so also lets us disregard that prefix. Meaning because we know that value is in the stream and we've outputted it we can remove it from our range and end up with a smaller fraction to use for the rest of the input.