Brain Dump

Code Synchronisation

Tags
text-processing

A technique [see page 13] for achieving random access in compressed files.

Sync Points

Assume there is some smallest unit of random access in a compressed archive (e.g. a document), and find a way to provide random access to that unit. This can be done by:

  • Storing the bit offset of each document in the archive.
  • Ensuring the document ends on a byte boundary.

Note: This approach is only viable in a limited domain, and only for documents you know you will want to access after compression.
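
A minimal sketch of sync points, under some assumptions: zlib stands in for whatever coder the archive really uses, each document is compressed as its own stream (so it ends on a byte boundary), and byte offsets are stored instead of bit offsets. The names build_archive and read_document are hypothetical.

```python
import zlib


def build_archive(documents):
    """Compress each document separately and record where each one starts."""
    archive = bytearray()
    offsets = []  # byte offset of each document's compressed data
    for text in documents:
        offsets.append(len(archive))
        # Each zlib stream is a whole number of bytes, so every document
        # ends on a byte boundary and the next one starts at a sync point.
        archive += zlib.compress(text.encode("utf-8"))
    offsets.append(len(archive))  # sentinel: end of the last document
    return bytes(archive), offsets


def read_document(archive, offsets, index):
    """Decode one document without touching any other part of the archive."""
    start, end = offsets[index], offsets[index + 1]
    return zlib.decompress(archive[start:end]).decode("utf-8")


docs = ["first document", "second document", "third document"]
archive, offsets = build_archive(docs)
print(read_document(archive, offsets, 1))
```

The cost is the offset table itself, plus whatever compression is lost by restarting the coder at every document.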

Self Synchronising Code

Design the code so that, regardless of where decoding starts, the decoder comes into synchronisation rapidly and stays there.

Note: This approach is problematic for full-text retrieval because we can't guarantee how quickly synchronisation will be achieved.
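
A rough illustration of self-synchronisation, assuming a small made-up prefix code (the CODE table and all function names are hypothetical). It starts decoding at an arbitrary bit and reports the first position where the decoder's codeword boundary coincides with a true boundary. From that point on the decoder stays in step.

```python
# Hypothetical prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: symbol for symbol, bits in CODE.items()}


def encode(symbols):
    """Concatenate the codewords for a sequence of symbols."""
    return "".join(CODE[s] for s in symbols)


def boundaries(bits, start):
    """Bit positions at which a codeword ends when decoding begins at start."""
    ends = []
    current = ""
    for pos in range(start, len(bits)):
        current += bits[pos]
        if current in DECODE:
            ends.append(pos + 1)
            current = ""
    return ends


def resync_position(bits, start):
    """First position where decoding from start lines up with the true
    codeword boundaries, or None if that never happens in this stream."""
    true_ends = set(boundaries(bits, 0)) | {0}
    if start in true_ends:
        return start
    for end in boundaries(bits, start):
        if end in true_ends:
            return end
    return None


bits = encode("abcdabcd")
print(resync_position(bits, 2))            # starts mid-codeword, realigns at bit 3
print(resync_position(encode("dddd"), 1))  # None: never realigns in this stream
```

The second call shows the problem noted above: in a run of "d" codewords a decoder that starts one bit late stays one bit late for the whole run, so the time to synchronise depends on the data.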