Brain Dump

Term Manipulation

Tags
text-processing

Term manipulation is the preprocessing of words from documents to support generalisation. It is often an early stage in building an inverted index (/brain/20201103212129-indexing/) for information retrieval.
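As a rough illustration of where this preprocessing sits, here is a minimal sketch that normalises terms and then builds a tiny inverted index from them. The function names `normalise` and `build_inverted_index`, the sample documents, and the choice of normalisation (lowercasing plus splitting on non-alphanumeric characters) are all assumptions made for the example, not something this note prescribes.

```python
import re
from collections import defaultdict

def normalise(text):
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each normalised term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalise(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents, keyed by document id.
docs = {1: "The Cat sat.", 2: "A cat, a dog!"}
index = build_inverted_index(docs)
```

Because "Cat" and "cat," both normalise to `cat`, a query for `cat` matches both documents, which is the kind of generalisation the preprocessing is meant to support.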

It has been observed (see page 5) that neither the most frequent nor the least frequent words are the most useful for retrieval. Note: the curve shown above is common across most human languages.
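One way to act on this observation is to drop terms at both ends of the frequency range before indexing. The sketch below is only an assumption about how that could look: the function name `filter_by_frequency`, the thresholds `min_count` and `max_fraction`, and the sample tokens are hypothetical, and sensible cut-offs would depend on the collection.

```python
from collections import Counter

def filter_by_frequency(tokens, min_count=2, max_fraction=0.3):
    """Keep terms that are neither too rare nor too common.

    min_count:    drop terms seen fewer than this many times (assumed cut-off).
    max_fraction: drop terms making up more than this share of all tokens
                  (assumed cut-off).
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term
        for term, count in counts.items()
        if count >= min_count and count / total <= max_fraction
    }

# Hypothetical token stream: "the" is very frequent, "bird" appears once.
tokens = "the cat the dog the cat the bird the cat dog".split()
kept = filter_by_frequency(tokens)
```

Here `the` is removed for being too frequent and `bird` for being too rare, leaving the mid-frequency terms `cat` and `dog`.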

Links to this note