Lexicon

Tags: text-processing

An approach to sentiment analysis which uses a [see page 9, lexicon] to look up the polarity of sentiment words.

We define a [see page 27, lexicon] as a list of opinion/emotion words, each mapped to a polarity. For example:

Word	Polarity
Horrible	Bad
Great	Good

General Algorithm

Classify the text as subjective if it contains at least some fixed number of words from the lexicon.
Count the positive and negative words and classify the text as positive, neutral, or negative.
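
The two steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the toy `LEXICON`, the `classify` function, the `min_hits` threshold and the "objective" label for non-subjective text are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 good, -1 bad).
LEXICON = {"horrible": -1, "great": +1, "awful": -1, "nice": +1}


def classify(text, lexicon=LEXICON, min_hits=1):
    """Return 'objective', 'positive', 'neutral' or 'negative'."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]

    # Step 1: with fewer than min_hits lexicon words, the text is not
    # treated as subjective (assumed threshold, tune per corpus).
    if len(hits) < min_hits:
        return "objective"

    # Step 2: compare the positive and negative counts.
    score = sum(hits)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Great food but horrible service"))
```

A weighted variant would store real-valued strengths in the lexicon instead of +1/-1; the counting logic stays the same.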

Note: Lexicon approaches apply only to subjective text.

Feature Sentimenting

We can tie lexicon-based approaches to specific features. For example, if we can extract the feature at a previous step and identify the emotion words related to that feature, we can simply pick the emotion words that are in our lexicon and apply the algorithm (binary, weighted, etc.) to get a sentiment for that feature.
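
A rough sketch of this idea, assuming the earlier step has already produced a mapping from each feature to its related emotion words. The `feature_sentiment` function, the input shape and the toy lexicon are hypothetical, and the summed score is just one possible choice of algorithm.

```python
# Hypothetical toy lexicon: word -> polarity (+1 good, -1 bad).
LEXICON = {"horrible": -1, "great": +1, "slow": -1, "tasty": +1}


def feature_sentiment(feature_words, lexicon=LEXICON):
    """Map each feature to the summed polarity of its known emotion words."""
    scores = {}
    for feature, words in feature_words.items():
        # Keep only the emotion words that are in our lexicon.
        known = [w.lower() for w in words if w.lower() in lexicon]
        scores[feature] = sum(lexicon[w] for w in known)
    return scores


# Assumed output of a previous feature-extraction step.
extracted = {"food": ["tasty", "great"], "service": ["slow", "friendly"]}
print(feature_sentiment(extracted))
```

Words missing from the lexicon ("friendly" here) are silently ignored, which is one way the lexicon's coverage limits the result.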

Advantages vs. Disadvantages

See [see page 25, here]. Key highlights:

Easy to extend, but requires a large lexicon that must be created and maintained by experts.
Needs frequent updates to incorporate new words.
Trouble with spelling errors (which are very frequent).
Lexicons are static: they don't change depending on the text being processed.

Lexicon Generation

A lexicon can be built manually, taken from [see page 28, somewhere else] (SentiWordNet; LIWC, the Linguistic Inquiry and Word Count lexicon built by psychologists, which groups words by emotion), or generated [see page 35, semi-automatically].

The semi-automatic approach involves:

Input some seed words (with positive and negative connotations).
Generate patterns from them ("nice" -> "nice and", "low-cost" -> "low-cost but").
Search the web for these patterns.
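
The steps above might look like the sketch below. Several things here are assumptions rather than part of the note: a small in-memory list of documents stands in for a web search, the seed words and function names are made up, and the rule that "and" links words of the same polarity while "but" links words of opposite polarity is one common heuristic for interpreting the matches.

```python
import re

# Hypothetical seed words: word -> polarity (+1 positive, -1 negative).
SEEDS = {"nice": +1, "low-cost": +1, "horrible": -1}


def make_patterns(seeds):
    """Yield (regex, polarity) pairs for 'seed and X' and 'seed but X'."""
    for word, polarity in seeds.items():
        # Assumption: 'and' joins same-polarity words, 'but' opposite ones.
        yield re.compile(rf"\b{re.escape(word)} and ([a-z-]+)"), polarity
        yield re.compile(rf"\b{re.escape(word)} but ([a-z-]+)"), -polarity


def grow_lexicon(seeds, documents):
    """Collect polarity votes for new words found next to seed words."""
    votes = {}
    for pattern, polarity in make_patterns(seeds):
        for doc in documents:
            for candidate in pattern.findall(doc.lower()):
                if candidate not in seeds:
                    votes[candidate] = votes.get(candidate, 0) + polarity
    # Keep only words whose votes do not cancel out.
    return {w: (1 if v > 0 else -1) for w, v in votes.items() if v != 0}


docs = [
    "The room was nice and clean.",
    "A low-cost but flimsy tent.",
    "Horrible and noisy neighbours.",
]
print(grow_lexicon(SEEDS, docs))
```

Newly found words can be fed back in as seeds for another round, at the cost of compounding any early mistakes.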

Note: we can reinforce our discoveries by using the local context. For example, if we find a pattern in a restaurant review, we expect the sentiment of the review to match the sentiment of the word.
