- Tetsuya Nasukawa and Jeonghee Yi
| Property | Value |
|---|---|
| Domain | General domain + camera reviews |
| Sentiment classes | Positive / negative |
| Aspect detection method | Pattern-based |
| Sentiment analysis method | Transfer patterns |
| Sentiment lexicon | Custom, including both words and some common expressions |
| Parser | Shallow, rule-based parser (implemented in TEXTRACT) |
| Performance on general domain | Precision: 94.3%, recall: 28.6% |
| Performance on camera reviews | Precision: 94.5%, recall: 24% |
This is, to my knowledge, the first sentiment analysis paper that attempts to attribute sentiment to specific subjects in text (e.g., aspects). It is a typical high-precision, low-recall method, as the performance summary above shows.
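The paper reports precision and recall separately. To put the trade-off into a single number, the F1 score (the harmonic mean of the two) can be computed from the figures in the table. F1 is not a measure the paper itself reports, so this is only an illustration derived from the listed values.

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as listed in the summary table
reported = {
    "general domain": (0.943, 0.286),
    "camera reviews": (0.945, 0.24),
}

for setting, (p, r) in reported.items():
    print(f"{setting}: F1 = {f1(p, r):.3f}")
# general domain: F1 = 0.439
# camera reviews: F1 = 0.383
```

Despite precision close to 95%, the low recall pulls F1 below 0.45 in both settings.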
An important argument made in this paper is that earlier work performed only simple operations on text (e.g., stemming and part-of-speech tagging) and relied heavily on co-occurrence statistics or word-distance measures. The authors instead employ a shallow parser and use the resulting syntactic structure to build extraction patterns that handle natural language more accurately. They argue that sentiment originates at typical positive or negative words and then flows through the sentence, influencing other words as well. Verbs in particular act as what they call sentiment transmitters: they pass sentiment from one word to another. For example, in the sentence "this camera is good", the verb "is" transmits the sentiment of "good" to the subject "this camera". Based on this insight, the authors build a sentiment lexicon whose entries contain information regarding
- polarity (positive, negative, neutral, sentiment transfer verb),
- part-of-speech (adjectives, adverbs, nouns, and verbs are included in the lexicon),
- the canonical form of the sentiment term,
- and arguments, such as subject and object, that are able to receive sentiment from, or transmit sentiment to, a transferring verb.
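A minimal Python sketch of one way to represent such entries and the transfer step for "this camera is good". The `LexiconEntry` fields mirror the four kinds of information listed above, but the field names, tag strings, toy entries, and the `transfer_sentiment` helper are all hypothetical: the paper's actual lexicon format is not public, and a real system would read the subject and complement from the shallow parser's output instead of taking them as plain arguments.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class LexiconEntry:
    polarity: str                    # "positive", "negative", "neutral" or "transfer"
    pos: str                         # part of speech, e.g. "JJ" (adjective) or "VB" (verb)
    canonical: str                   # canonical form of the term
    arguments: Tuple[str, ...] = ()  # (source role, target role) for a transfer verb

# Toy lexicon (hypothetical entries, not taken from the paper)
LEXICON = {
    "good": LexiconEntry("positive", "JJ", "good"),
    "bad": LexiconEntry("negative", "JJ", "bad"),
    # Copula: moves the sentiment of its complement to its subject
    "be": LexiconEntry("transfer", "VB", "be", ("comp", "sub")),
}

# Toy mapping from surface forms to canonical forms
CANONICAL = {"is": "be", "are": "be", "was": "be", "were": "be"}

def transfer_sentiment(subject: str, verb: str, complement: str) -> Optional[Tuple[str, str]]:
    """Return (target phrase, polarity) if the verb transmits a sentiment term's polarity."""
    verb_entry = LEXICON.get(CANONICAL.get(verb, verb))
    if verb_entry is None or verb_entry.polarity != "transfer":
        return None  # verb does not transmit sentiment
    roles = {"sub": subject, "comp": complement}
    source_role, target_role = verb_entry.arguments
    source_entry = LEXICON.get(roles[source_role])
    if source_entry is None or source_entry.polarity not in ("positive", "negative"):
        return None  # source argument carries no sentiment to transmit
    return (roles[target_role], source_entry.polarity)

print(transfer_sentiment("this camera", "is", "good"))
# ('this camera', 'positive')
```

Storing the source and target roles on the verb entry, rather than hard-coding "complement to subject" in the function, keeps the direction of transfer a property of the lexicon, which is the spirit of the paper's approach.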
An example helps clarify this lexicon, as it is quite different from the usual sentiment lexicons. The entry "gVB admire obj" indicates that the verb "admire" is a sentiment term that transmits positive sentiment to its object. A verb can also transmit the opposite of the sentiment it receives: "tVB prevent obj ~sub" indicates that the reversed polarity of its object is transferred to its subject. In "the company prevented trouble", the object "trouble" is negative, so "the company" receives positive sentiment. The lexicon has 3,513 entries, 14 of which use regular expressions.
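A minimal Python sketch of reading and applying the two entries above. It assumes, from these two examples only, that the first letter of the tag gives the entry type (`g` for an entry that itself assigns positive sentiment, `t` for a pure transfer), that the remaining letters give the part of speech, and that `~` marks reversal. The `b` prefix for negative entries, the helper names, and the hand-supplied role and polarity dictionaries are hypothetical stand-ins for the paper's real lexicon format and parser output.

```python
from typing import Dict

# Assumed meaning of the leading tag letter, inferred from the two examples:
# 'g' = the verb itself carries favorable sentiment, 't' = the verb only transfers.
# 'b' (unfavorable) is a further assumption by analogy with 'g'.
OWN_POLARITY = {"g": 1, "b": -1}

def parse_entry(line: str) -> Dict:
    """Split an entry such as 'tVB prevent obj ~sub' into its parts."""
    tag, lemma, *args = line.split()
    return {"kind": tag[0], "pos": tag[1:], "lemma": lemma, "args": args}

def apply_entry(entry: Dict, roles: Dict[str, str], polarity: Dict[str, int]) -> Dict[str, int]:
    """Return the phrases that receive sentiment (+1 positive, -1 negative)."""
    result: Dict[str, int] = {}
    if entry["kind"] in OWN_POLARITY:
        # Sentiment verb: its own polarity goes to each listed argument
        for role in entry["args"]:
            result[roles[role]] = OWN_POLARITY[entry["kind"]]
    elif entry["kind"] == "t":
        # Transfer verb: first argument is the source, second the target;
        # a leading '~' on the target reverses the transferred polarity
        source, target = entry["args"]
        source_polarity = polarity.get(roles[source], 0)
        if source_polarity != 0:
            reverse = target.startswith("~")
            result[roles[target.lstrip("~")]] = -source_polarity if reverse else source_polarity
    return result

admire = parse_entry("gVB admire obj")
print(apply_entry(admire, {"sub": "reviewers", "obj": "the camera"}, {}))
# {'the camera': 1}

prevent = parse_entry("tVB prevent obj ~sub")
print(apply_entry(prevent, {"sub": "the company", "obj": "trouble"}, {"trouble": -1}))
# {'the company': 1}
```

The second call reproduces the reasoning in the text: "trouble" is negative, "prevent" reverses it, so "the company" ends up positive. The sentence "reviewers admire the camera" in the first call is a made-up illustration.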
Clearly, the rules embedded in the sentiment lexicon are carefully crafted, which explains the high precision. However, they are also quite restrictive, so many cases are not extracted at all, which explains the very low recall. In that sense, the arguments from the discussion of Popescu & Etzioni (2005) apply here as well: the more accurate or specific the rules, the higher the precision but the lower the recall.
A severe drawback of this research is its reliance on a detailed, hand-crafted lexicon. Expanding it would take a lot of effort, not to mention porting it to other languages or domains. The lexicon, however, does allow for neat tricks such as reversing polarity with words like "prevent". To my knowledge, nothing from this research is publicly available: no data sets, no code, and no sentiment lexicon.