Paper Discussion: Liu et al. (2005)

Opinion Observer: Analyzing and Comparing Opinions on the Web

- Bing Liu, Minqing Hu, and Junsheng Cheng

Characteristics Summary

  Domain:                    Product reviews
  Sentiment Classes:         Positive / Negative
  Aspect Detection Method:   Supervised association rule mining
  Sentiment Analysis Method: N/A

Performance Summary

  Pros: Precision 0.889, Recall 0.902
  Cons: Precision 0.791, Recall 0.824

This research introduces Opinion Observer, a prototype system incorporating all of the ideas presented in the paper. A major focus is the presentation of the results using visualization techniques. Another difference from previous research is that a slightly different type of product review is targeted. The authors distinguish three types of product reviews:

  1. Pros and cons: The reviewer is asked to separately provide positive and negative points regarding the product.
  2. Free text review with pros and cons: The reviewer is asked to provide positive and negative points, which function more or less as a summary of the detailed review.
  3. Free text review only: There is no additional structure provided besides the main review text.

Usually, all text provided by the user consists of properly written sentences. The only exception is the pros and cons from category 2: since the user gets one free text field for every positive point and one for every negative point, these fields tend to be short, keyword-based enumerations of multiple aspects. The user is expected to read the full review for additional details and argumentation. This research focuses solely on category 2 reviews, and only the short pros and cons are used. Because the user already specifies whether an aspect is positive or negative by putting it in the corresponding text field, this research does not include any sentiment analysis.

Extracting aspects from the pros and cons

There are three issues that need to be taken care of when dealing with pros and cons. First, there are both explicit and implicit features. The latter differ from the former in that the actual aspect is not named as such. An example is the statement that some product "is too heavy", which comments on the weight of the product. Obviously, extracting implicit features is much harder than extracting explicit ones. Even so, this research aims to extract both. Second, since reviewers can use different words for the same aspect, the system should be able to handle synonyms by grouping them together as one aspect. Third and last, the system should be able to generalize from very specific aspects to more generic ones. When the granularity of aspects is too high, there are many aspects with only a handful of reviews mentioning them. For example, aspects like "battery size", "battery weight", "battery color", etc. are all grouped into one "battery" aspect.

The method used in this research is supervised association rule mining, for which a large annotated data set is required. The following steps are performed for this:

  1. Part-of-speech tagging and phrase chunking, using NLProcessor, and removal of digits. This is an automated step.
  2. Replacement of the actual aspect with "[feature]" to be able to generate generalizable rules. This step requires manual tagging of all aspects.
  3. Long rules are split into segments of at most three words. This is also a manual step.
  4. Distinguish duplicate tags by adding a sequence number.
  5. Word stemming. This can be done automatically.
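The preprocessing above can be sketched roughly as follows. This is a toy reading of the steps, not the paper's implementation: the POS tags are hard-coded (the paper uses NLProcessor), the stemmer is a crude suffix-stripping stand-in, and step 4 (sequence numbers for duplicate tags) is omitted for brevity.

```python
def stem(word):
    # Stand-in stemmer: strip a few common suffixes (step 5).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(tagged_words, aspect_words):
    """Turn one POS-tagged pro/con segment into 3-word transactions."""
    words = []
    for word, tag in tagged_words:
        if word.isdigit():            # step 1: remove digits
            continue
        if word in aspect_words:      # step 2: replace labeled aspects
            words.append("[feature]")
        else:
            words.append(f"<{tag}>{stem(word)}")
    # step 3: split into segments of at most three words
    return [tuple(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]

# "Included 16 MB is stingy", with "MB" manually tagged as the aspect
segment = [("included", "VBD"), ("16", "CD"), ("MB", "NN"),
           ("is", "VBZ"), ("stingy", "JJ")]
print(preprocess(segment, {"MB"}))
```

Each resulting tuple is one transaction for the association rule miner.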

All three-word patterns found this way are saved in a transaction file that serves as input for the association rule mining. So why is this supervised association rule mining? Association rule mining is normally unsupervised, but here it operates on manually tagged transactions from all reviews. As unsupervised implies that no manual tagging is required, it is indeed better to speak of a supervised method here. Given a set of rules that all have a support of at least 1%, they are post-processed by performing the following three steps:

  1. Only rules that have "[feature]" on the right-hand side of the rule are useful. All the others are discarded.
  2. The authors need to correct for the lack of word order in the generated rules, as word order is vital in natural language. For each rule, its matching sentences are looked up and for each possible order of words, a new, order-sensitive rule is generated. Here, a minimum confidence of 50% is required for an order-sensitive rule to be retained.
  3. Simultaneously with the previous step, the retained rules are transformed into language patterns. Again, this is word-order sensitive. A rule like "<N1>, <N2> -> [feature]" may be transformed into a pattern like "<N1> [feature] <N2>".
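Steps 2 and 3 can be sketched together as follows. The segment representation and the way orderings are counted are my illustrative assumptions; only the 50% confidence threshold comes from the paper.

```python
from collections import Counter

def order_patterns(rule_items, matching_segments, min_conf=0.5):
    """For one unordered rule, generate order-sensitive language patterns.

    An ordering is kept only if at least min_conf of the rule's matching
    segments exhibit it (the 50% confidence threshold from the paper).
    """
    counts = Counter()
    for seg in matching_segments:
        positions = {item: seg.index(item) for item in rule_items if item in seg}
        if len(positions) == len(rule_items):   # the rule matches this segment
            counts[tuple(sorted(rule_items, key=positions.get))] += 1
    total = sum(counts.values())
    return [" ".join(order) for order, n in counts.items()
            if total and n / total >= min_conf]

segments = [
    ["<N1>", "[feature]", "<N2>"],
    ["<N1>", "[feature]", "<N2>"],
    ["<N2>", "<N1>", "[feature]"],
]
print(order_patterns({"<N1>", "[feature]", "<N2>"}, segments))
```

Here the first ordering occurs in 2 of 3 matching segments (confidence 2/3), so it is retained; the second occurs only once (1/3) and is dropped.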

Some additional remarks include:

  • gaps are allowed when matching patterns to sentence segments,
  • when multiple patterns match the same sentence segment, the one with the highest confidence is chosen,
  • for sentence segments that are not matched by any pattern, the noun or noun phrase as detected by NLProcessor is used as an aspect,
  • when multiple aspects are found in the same sentence segment, the one with the highest frequency is chosen,
  • the same rule mining technique is used to create a mapping from implicit aspects to explicit aspects (e.g., from "heavy" to "<weight>").
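A rough sketch of the first two remarks, matching with gaps and picking the highest-confidence pattern, might look like this. The pattern tokens and confidence values are made up for illustration:

```python
def matches_with_gaps(pattern, segment):
    """True if the pattern tokens occur in the segment in order,
    with arbitrary gaps allowed between them."""
    pos = 0
    for token in pattern:
        try:
            pos = segment.index(token, pos) + 1
        except ValueError:
            return False
    return True

def best_pattern(patterns, segment):
    """patterns: list of (token_tuple, confidence) pairs.
    Returns the matching pattern with the highest confidence, or None."""
    hits = [(conf, pat) for pat, conf in patterns
            if matches_with_gaps(pat, segment)]
    return max(hits)[1] if hits else None

patterns = [
    (("<N1>", "[feature]"), 0.6),
    (("<N1>", "[feature]", "<N2>"), 0.8),
]
segment = ["<N1>", "great", "[feature]", "and", "<N2>"]
print(best_pattern(patterns, segment))
```

Both patterns match the segment despite the intervening words; the second wins because its confidence is higher.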

The grouping of synonymous aspects is done using a WordNet look-up: all aspects that are found in the same synset (using only the top two senses of each word) are regarded as synonyms and are grouped together as one aspect.
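A minimal sketch of this grouping step, with a toy synset table standing in for WordNet (the real system looks words up in WordNet itself; the synset names and the greedy merge order are my assumptions):

```python
# Toy stand-in for WordNet: each word maps to its senses in rank order.
TOY_SYNSETS = {
    "picture": ["image.n.01", "photograph.n.01"],
    "photo":   ["photograph.n.01"],
    "battery": ["battery.n.01"],
}

def group_synonyms(aspects, synsets=TOY_SYNSETS):
    """Merge aspects that share a synset among their top two senses."""
    groups = []  # each group: (member list, set of synsets seen so far)
    for aspect in aspects:
        senses = set(synsets.get(aspect, [])[:2])  # top two senses only
        for members, seen in groups:
            if senses & seen:        # shared synset -> same aspect group
                members.append(aspect)
                seen |= senses
                break
        else:
            groups.append(([aspect], set(senses)))
    return [members for members, _ in groups]

print(group_synonyms(["picture", "photo", "battery"]))
```

"picture" and "photo" share the synset "photograph.n.01" within their top two senses, so they collapse into one aspect, while "battery" stays on its own.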


The data set consists of reviews for 15 electronic products. The reviews for 10 products are used as training data for the association rule mining, and those for the other 5 form the test set. Opinion Observer is compared against two baseline methods: the method of Hu & Liu (2004), and simply using the nouns and noun phrases found by the POS tagger as aspects. As the tables below show, Hu & Liu (2004) performs much worse, indicating that this method is not suitable for this kind of data.

Pros      Opinion Observer      Hu & Liu (2004)       Noun/noun phrases
          Precision   Recall    Precision   Recall    Precision   Recall
data1     0.876       0.922     0.476       0.400     0.524       0.543
data2     0.902       0.894     0.567       0.494     0.642       0.747
data3     0.825       0.825     0.508       0.431     0.521       0.551
data4     0.922       0.942     0.441       0.411     0.682       0.728
data5     0.923       0.930     0.560       0.480     0.631       0.664
Average   0.8896      0.9026    0.5104      0.4432    0.6000      0.6466

Cons      Opinion Observer      Hu & Liu (2004)       Noun/noun phrases
          Precision   Recall    Precision   Recall    Precision   Recall
data1     0.798       0.850     0.424       0.419     0.409       0.681
data2     0.833       0.860     0.508       0.485     0.249       0.536
data3     0.769       0.846     0.494       0.486     0.327       0.642
data4     0.657       0.681     0.506       0.496     0.354       0.758
data5     0.897       0.881     0.474       0.469     0.487       0.859
Average   0.7908      0.8236    0.4812      0.4710    0.3652      0.6952
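The final row of each table is the unweighted average over the five data sets, which is easy to verify, e.g. for the pros precision column of Opinion Observer:

```python
# Macro average of Opinion Observer's pros precision over the 5 test sets.
pros_precision = [0.876, 0.902, 0.825, 0.922, 0.923]
avg = sum(pros_precision) / len(pros_precision)
print(round(avg, 4))  # 0.8896
```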

Interestingly, the results for pros are better than those for cons, because people tend to use a more diverse vocabulary for cons than for pros. This makes it harder to mine decent patterns for cons (only 22 generated patterns, versus 117 patterns for pros). The algorithm therefore reverts more often to the default option of using nouns or noun phrases as aspects, which hurts both precision and recall.


This paper makes some nice contributions to the field of aspect-level sentiment analysis. First, the visualizations are interesting, although I did not cover them in this post. Have a look at the paper if you're interested in those! Second, finding aspects in the pros and cons is a smart idea. These short pieces of text are densely packed with aspect information, making aspect extraction relatively easy. This is the first paper I know of that did this, and already it achieves results of around 90% for pros and 80% for cons. Unfortunately, this research cannot be reproduced well, as neither the data set nor the code of the algorithm is publicly available.

A question that remains is how well the found patterns translate to other domains. It seems to me that these pros and cons are quite generic in terms of vocabulary, so I'd guess that the extracted patterns are likely to work well on other domains too. Still, some additional research confirming (or disproving) this would be useful.
