Paper Discussion: Popescu & Etzioni (2005)

Extracting Product Features and Opinions from Reviews

- Ana-Maria Popescu and Oren Etzioni

Domain: Product reviews (same data as Hu & Liu (2004))
Sentiment classes: Positive / Negative / Neutral
Aspect detection method: Pointwise Mutual Information + Naïve Bayes classifier
Sentiment analysis method: Relaxation labeling
Aspect detection: Precision 87.84%, Recall 77.6%
Opinion phrase extraction: Precision 76.68%, Recall 76.44%
Opinion classification: Precision 84.8%, Recall 89.28%

This research is probably a bit hard to reproduce, since it is built on software resulting from previous research, and I could find no existing code for either the base system or its extension. Because it is built on top of an existing system, some technical details with respect to aspect detection are not in this paper. The focus lies more on the sentiment analysis part and the relaxation labeling technique, which is described thoroughly.

In this paper, OPINE is presented as a system for aspect-level sentiment analysis. It is built on top of KnowItAll. Using a set of relations (or patterns), a set of extraction rules is generated, which is used to extract candidate facts from text. Since the domain is product reviews, many facts will be some form of statement about the product being reviewed. The idea is that in this way, the aspects of a product can be found. Using a Pointwise Mutual Information metric as binary input for a Naïve Bayes classifier, the system yields a probability for each candidate aspect, signifying the likelihood of it being a genuine aspect.
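To make the aspect assessment step a bit more concrete, here is a minimal sketch of how a PMI score could be turned into a binary feature and fed to a Naïve Bayes classifier. Note that this uses the textbook log-based PMI, which is not necessarily the exact formula used in OPINE, and that all counts, thresholds, and probabilities are made-up numbers for illustration only.

```python
import math


def pmi(hits_both, hits_a, hits_b, total):
    """Textbook PMI: log2(P(a, b) / (P(a) * P(b))), estimated from counts."""
    if min(hits_both, hits_a, hits_b) == 0:
        return float("-inf")
    return math.log2((hits_both * total) / (hits_a * hits_b))


def naive_bayes_posterior(features, prior, p_true_given_aspect, p_true_given_not):
    """P(aspect | binary features), assuming conditionally independent features."""
    like_aspect, like_not = prior, 1.0 - prior
    for f, p_a, p_n in zip(features, p_true_given_aspect, p_true_given_not):
        like_aspect *= p_a if f else 1.0 - p_a
        like_not *= p_n if f else 1.0 - p_n
    return like_aspect / (like_aspect + like_not)


# Hypothetical counts: candidate aspect vs. two product-specific phrases.
TOTAL = 1_000_000
scores = [
    pmi(hits_both=200, hits_a=2000, hits_b=5000, total=TOTAL),
    pmi(hits_both=10, hits_a=2000, hits_b=10000, total=TOTAL),
]

# Each PMI score becomes a binary feature via an (assumed) threshold.
THRESHOLD = 1.0
features = [s > THRESHOLD for s in scores]

# Assumed conditional probabilities; in practice these would be estimated.
posterior = naive_bayes_posterior(
    features,
    prior=0.5,
    p_true_given_aspect=[0.8, 0.8],
    p_true_given_not=[0.2, 0.2],
)
print(features, round(posterior, 3))
```

The thresholding step is what turns a continuous association score into the binary input mentioned above; the classifier then combines several such weak signals into one probability per candidate aspect.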

Given a set of aspects, the opinion phrases are found by looking in the neighborhood of the aspect for opinionated words. While the intuition is the same as that from Hu & Liu (2004), here the output of the parser can be used instead of simple word distance. Note that this directly addresses some of the shortcomings of Hu & Liu (2004), as mentioned in its paper discussion. A set of 10 manually crafted extraction rules that operate over the parser output is used to find the potential opinion phrase that is related to the found aspect. Only potential opinion phrases whose head word carries a positive or negative sentiment are retained as actual opinion phrases.
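To give an idea of what rules over parser output might look like, here is a small sketch with two hypothetical rules. These are not the actual OPINE rules (which are not all published); the dependency triples are written by hand instead of coming from a real parser, and the sentiment lexicon is a made-up stand-in.

```python
# Hypothetical lexicon of words that carry positive or negative sentiment.
SENTIMENT_WORDS = {"great", "bright", "poor", "blurry"}


def find_opinion_words(tokens, deps, aspect_idx):
    """Return opinion words linked to the aspect token at aspect_idx.

    deps is a list of (relation, head_index, dependent_index) triples,
    as a dependency parser might produce them.
    """
    candidates = []
    for rel, head, dep in deps:
        # Illustrative rule A: adjective modifying the aspect ("great battery").
        if rel == "amod" and head == aspect_idx:
            candidates.append(dep)
        # Illustrative rule B: aspect is the subject of a predicate
        # ("the screen is bright").
        elif rel == "nsubj" and dep == aspect_idx:
            candidates.append(head)
    # Keep only candidates whose word carries sentiment.
    return [tokens[i] for i in candidates if tokens[i].lower() in SENTIMENT_WORDS]


tokens = ["The", "screen", "is", "bright"]
deps = [("det", 1, 0), ("nsubj", 3, 1), ("cop", 3, 2)]
print(find_opinion_words(tokens, deps, aspect_idx=1))
```

Because the rules follow grammatical relations, the opinion word does not have to be adjacent to the aspect, which is exactly where plain word distance tends to go wrong.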

Determining the semantic orientation, or polarity, of a word is the next step in the system. First, we can define a set of polarity labels {positive, negative, neutral}, a set of reviews, and a set of tuples (w, f, s), where w is a potential opinion word, associated with aspect f in sentence s. The task can now be defined as assigning a polarity label to each tuple (w, f, s).

The system is developed to incrementally solve this problem by first assigning a polarity label to each word w, then assigning a polarity label to each (w, f) pair, and last, by assigning a polarity label to each (w, f, s) tuple. Each of these three steps is cast as an unsupervised collective classification problem and solved using a technique from computer vision called relaxation labeling. For now, I won't go deep into relaxation labeling, as I have yet to encounter another sentiment analysis paper where this technique was also used. Regardless, it can be interesting to discuss it at some point.
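For readers who want a rough picture anyway, below is a sketch of the generic relaxation labeling update: every node keeps a probability distribution over labels, and in each iteration a label gains or loses probability depending on how well it agrees with the current beliefs of the node's neighbors. This is the general textbook scheme and not necessarily the exact formulation used in OPINE. The "and" relation, the compatibility values, the initial probabilities, and the alpha parameter are all assumptions chosen for illustration.

```python
def relaxation_labeling(initial, neighbors, compat, labels, alpha=0.5, iterations=20):
    """Generic relaxation labeling.

    initial:   {node: {label: probability}}
    neighbors: {node: [(other_node, relation), ...]}
    compat:    {(relation, label, other_label): value in [-1, 1]}
    alpha:     step size; values below 1 keep all scores positive
    """
    probs = {node: dict(dist) for node, dist in initial.items()}
    for _ in range(iterations):
        updated = {}
        for node, dist in probs.items():
            nbrs = neighbors.get(node, [])
            scores = {}
            for label in labels:
                # Support for this label: compatibility-weighted neighbor
                # beliefs, averaged over the neighbors.
                support = 0.0
                for other, rel in nbrs:
                    for other_label in labels:
                        c = compat.get((rel, label, other_label), 0.0)
                        support += c * probs[other][other_label]
                if nbrs:
                    support /= len(nbrs)
                scores[label] = dist[label] * (1.0 + alpha * support)
            total = sum(scores.values())
            if total == 0:
                updated[node] = dict(dist)  # degenerate case: keep old beliefs
            else:
                updated[node] = {l: s / total for l, s in scores.items()}
        probs = updated  # all nodes are updated simultaneously
    return probs


LABELS = ["positive", "negative", "neutral"]

# Assumed constraint: words joined by "and" tend to share a polarity.
compat = {("and", a, b): (1.0 if a == b else -1.0) for a in LABELS for b in LABELS}

# "great" starts out clearly positive, "sturdy" starts out undecided.
initial = {
    "great": {"positive": 0.9, "negative": 0.05, "neutral": 0.05},
    "sturdy": {label: 1.0 / 3 for label in LABELS},
}
neighbors = {"great": [("sturdy", "and")], "sturdy": [("great", "and")]}

result = relaxation_labeling(initial, neighbors, compat, LABELS)
print(max(result["sturdy"], key=result["sturdy"].get))
```

In this toy setup, the undecided word is pulled toward the label of the word it is conjoined with, which is the kind of collective, context-driven labeling the paper relies on.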


The evaluation is split into two parts. First, the aspect detection is evaluated on the data set introduced by Hu & Liu (2004). As can be seen in the tables below, precision is a lot better than that of Hu & Liu (2004), but recall is slightly lower. The first column contains Hu & Liu's reported results, and the last column those of the proposed method. The middle two columns are combinations of both methods.


Besides these reviews in the consumer electronics domain, two additional sets of reviews are used to test cross-domain functionality. The first set contains hotel reviews and the second set contains scanner reviews. Two annotators labeled a set of 450 extractions by OPINE as either correct or incorrect. The inter-annotator agreement was 86%, and on the 86% of extractions the annotators agreed on, OPINE's precision was 89%. The annotators also extracted explicit features from a subset of 800 review sentences (400 for each domain), with an inter-annotator agreement of 82%. Computing recall on the set of 179 features the annotators agreed upon yielded a score of 73%.

For the sentiment analysis part, OPINE is compared against two baseline methods: PMI++, and Hu++. The first is an extended version of the algorithm proposed in Turney (2002), and the latter is a version of the WordNet propagation method of Hu & Liu (2004), adapted to work not only for adjectives, but for nouns, verbs, and adverbs as well. The results of polarity classification of single words are in the table below (the four rows correspond to the four word categories: adjectives, nouns, verbs, and adverbs, respectively):


A more important evaluation is that of extracting opinion phrases and determining the polarity of the found opinion phrases (OP in the table):



This research improves on Hu & Liu (2004) by using more advanced methods for both aspect detection and sentiment analysis. Instead of word distance, it uses a parser and the grammatical relations it yields as a basis for extraction rules. The general benefit of this method is an increase in precision, albeit at the cost of recall. One of the more interesting contributions of this method is a context-sensitive way of dealing with opinion words. Because the relaxation labeling process uses the neighborhood of a word as evidence when determining the label of that word, it can actually deal with words like big and small, which depend on the context to give them sentiment.

Since the extraction rules are carefully crafted by the authors, the increase in precision can hardly be a surprise. The same is true for recall, as any set of rules is limited and will miss certain instances. Actually, in the most extreme case you'll have a rule that will work perfectly, but only on one specific instance (i.e., very high precision, very low recall). The more generic your rules are, the higher your recall, but the lower your precision. It is extremely hard to break this precision-recall tradeoff.
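A toy computation illustrates the tradeoff. The gold aspects and the two sets of extractions below are invented for illustration and have nothing to do with the actual OPINE output.

```python
def precision_recall(extracted, gold):
    """Precision and recall of a set of extractions against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {"battery", "screen", "zoom", "lens"}

# A very specific rule: fires once, and is right.
narrow = {"battery"}

# A very generic rule: finds more aspects, but also a lot of noise.
generic = {"battery", "screen", "zoom", "day", "thing", "lot"}

print(precision_recall(narrow, gold))   # (1.0, 0.25)
print(precision_recall(generic, gold))  # (0.5, 0.75)
```

The narrow rule is never wrong but misses three of the four aspects, while the generic rule finds three of them at the price of being wrong half of the time.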

A final remark on this method is the use of a parser. Although it allows for extraction patterns that are much more accurate compared to word distance, there are some serious issues with using a parser. First off, it is relatively slow, so real-time use will be nearly impossible. Second, most parsers are statistically trained on some large corpus of text, meaning that they will work best on text that is similar to the text they were trained on. Usually parsers are trained on proper English (or whatever language your parser works with), resulting in rather poor performance on user-generated content like tweets, short messages, forum posts, and other outlets where sloppy language use predominates.

I don't know of any existing library of programming code that implements this technique for text processing, but be sure to let me know if you encounter it somewhere!

4 thoughts on “Paper Discussion: Popescu & Etzioni (2005)”

  • Reply

    Subject: "A set of 10 manually crafted extraction rules that operate over the parser output is used to find the potential opinion phrase that is related to the found aspect."

    Hi Kim,
    I'm working on a project based on this paper and I'm wondering if the 10 extraction rules for the potential opinion phrases are available somewhere, because in the paper only 4 of them are described. Do you know if I can find them anywhere?
    Best Patricia

    • Reply
      Kim Schouten Post author

      Hi Patricia,

      Unfortunately, I don't know where to find them. As far as I know, this system is not available for download anywhere. My advice would be to email the authors to ask for these rules (or maybe all of the system code and data if possible). Beware though that this is quite an old paper, so let's hope that they still have it. Good luck with your project! What is it about?

      Kind regards,

