Brain Dump

Text Processing

Tags
text-processing

Text processing is the creation, storage and access of (massive amounts of) text in digital form by a computer.

Broca's area is the region of the brain associated with (spoken) language.

[see page 6, Applications]

  • Information Retrieval: Decide the relevance of a set of documents to a search query, think Google.
  • Information Extraction: Recognise (specific) information in text, e.g. "Foo IS A Dog".
  • Text Categorisation: Put text into discrete categories, e.g. email into spam or not spam.
  • Summarisation: Extract the essential information from (one or more) text articles.
  • Natural Language Generation: Generate natural language text from an abstract, structured representation, e.g. generate a manual in multiple languages from a single abstract representation of the steps to be carried out.
  • Machine Translation: English to French and/or vice versa (VERY difficult).
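
As a rough illustration of the information retrieval item above, here is a minimal sketch that ranks documents by how often the query's terms occur in them. The function names (`tokenize`, `score`, `rank`) and the sample documents are made up for this note, and raw term counting is only a stand-in for whatever relevance measure a real search engine uses.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Count how often the query's distinct terms occur in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))

def rank(query, documents):
    """Return the documents sorted from highest to lowest score."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)

# Hypothetical sample documents, one per application area.
docs = [
    "Dogs are loyal pets.",
    "Spam email filtering with text categorisation.",
    "Machine translation from English to French.",
]
ranked = rank("english french translation", docs)
```

Here `ranked[0]` is the machine translation document, since it is the only one containing any query term. Note that matching is exact on tokens, so a query for "dog" would not match "Dogs".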

Considerations

Text isn't simply unstructured information [see page 4, unstructured]. It has structure: headings, tables, etc. These are clues that help determine the importance of terms.
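
One way to picture structure as a clue is to count terms with a higher weight when they appear in a heading. The sketch below assumes the text has already been split into `(kind, text)` blocks; the block kinds, the `WEIGHTS` values, and the `term_importance` name are all hypothetical choices for illustration, not something taken from the source material.

```python
import re
from collections import Counter

# Hypothetical weights: a term in a heading counts three times as much as in body text.
WEIGHTS = {"heading": 3.0, "body": 1.0}

def term_importance(blocks):
    """Sum weighted term counts over a list of (kind, text) blocks."""
    totals = Counter()
    for kind, text in blocks:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            totals[term] += WEIGHTS[kind]
    return totals

# Hypothetical input: one heading followed by one body paragraph.
blocks = [
    ("heading", "Text Processing"),
    ("body", "Processing of massive text in digital form."),
]
importance = term_importance(blocks)
```

With these weights, "text" and "processing" each score 4.0 (once in the heading, once in the body), while a body-only term such as "digital" scores 1.0.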
