Paper Discussion: Kobayashi et al. (2006)

Opinion Mining on the Web by Extracting Subject-Aspect-Evaluation Relations

- Nozomi Kobayashi, Ryu Iida, Kentaro Inui, and Yuji Matsumoto

Characteristics Summary
Domain: Car reviews
Sentiment Classes: N/A
Extraction Method: Dictionary + Support Vector Machines
Performance Summary:
  Opinion extraction: Precision: 67.7%, Recall: 50.7%, F1-measure: 58.0
  Aspect-evaluation pair extraction: Precision: 76.6%, Recall: 75.1%
  Opinionhood determination: Precision: 82.2%, Recall: 66.2%

The focus of this research is on extracting aspect-evaluation pairs. Although the full triple [Subject, Aspect, Evaluation] is what is ultimately of interest, the subject is usually relatively easy to find, so the authors chose to focus on the hard part: the aspect-evaluation pair. Here, the evaluation is the statement someone makes about the aspect, which in turn is some feature of the subject. A major problem with previous work is that it assumes a direct relation between the aspect and the evaluation; many pattern-based approaches rely on fixed patterns to find these relations. However, according to the authors, around 30% of the sentences do not have a clear link between the evaluation and its target, the aspect. This can be because the aspect is implicit, or because it is mentioned earlier or later in the text. The latter is a phenomenon called ellipsis.


First, the text is pre-processed, using a morphological analyzer (ChaSen) and a Japanese dependency parser (CaboCha). The proposed method then consists of four consecutive steps:

  1. Dictionary lookup: This step requires domain-specific dictionaries for both evaluation phrases and aspect phrases.
  2. Aspect identification: For each evaluation phrase, find the best matching aspect phrase. This step is performed using the previously introduced tournament model.
  3. Aspect-evaluation pairedness determination: Each proposed pair is validated by checking whether the evaluation truly is about the proposed aspect. If it is not, only the evaluation part is retained. It follows that this method is designed to find opinions with an implicit aspect, but it will not actually determine what that implicit aspect is.
  4. Opinionhood determination: Here, each pair (or the evaluation alone, if the aspect was rejected) is classified as either opinionated or non-opinionated. Separate classifiers are trained for full pairs and for evaluations without an aspect.
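
The aspect identification in step 2 can be pictured as a knockout tournament: candidate aspects are compared two at a time, and the winner goes on to meet the next candidate. Below is a minimal sketch of that idea, not the authors' implementation. The function `prefer` is a hypothetical stand-in for the trained pairwise classifier, the toy distance heuristic is only there to make the example runnable, and the left-to-right order of the matches is an assumption.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def tournament(candidates: Sequence[C], prefer: Callable[[C, C], bool]) -> C:
    """Return the candidate that survives a series of pairwise matches.

    prefer(a, b) must return True when a should beat b. In the paper this
    decision is made by a trained classifier; here it is a plain callable.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    winner = candidates[0]
    for challenger in candidates[1:]:
        # The current winner is replaced only if the challenger is preferred.
        if prefer(challenger, winner):
            winner = challenger
    return winner


# Toy data (hypothetical): candidate aspects as (phrase, token position).
evaluation_position = 7
candidates = [("design", 1), ("seat", 5), ("engine", 12)]


def closer_to_evaluation(a, b):
    # Stand-in heuristic: prefer the candidate nearer to the evaluation phrase.
    return abs(a[1] - evaluation_position) < abs(b[1] - evaluation_position)


best = tournament(candidates, closer_to_evaluation)
print(best)  # ('seat', 5)
```

Note that the tournament always produces a winner, which is why step 3 is needed to reject a best candidate that is not actually the target of the evaluation.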

Each classification task, including the ones in the tournament model, is performed using a Support Vector Machine with a second-order polynomial kernel function. Two types of features are used to train the SVMs:

  • Surface spelling and part-of-speech tags of the targeted evaluation expression, as well as of its dependent phrase and the latter's dependent phrase(s). (This is my interpretation, though; the text is a bit ambiguous here.)
  • Relation between the targeted evaluation phrase and the candidate aspect phrase in terms of distance, the existence of a dependency relation, and the existence of a co-occurrence relation.

All features are used in every classifier that is trained for this method, with the obvious exception that the second category of features is not used for the classifier that determines opinionhood for pairs without an aspect.
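
To give a feel for what a second-order polynomial kernel does with such features, here is a small sketch. The form (x·z + c)^2 with c = 1 is an assumption, as the exact kernel parameters are not given above, and the three binary features are hypothetical illustrations of the second feature category.

```python
from typing import Sequence


def poly2_kernel(x: Sequence[float], z: Sequence[float], c: float = 1.0) -> float:
    """Second-order polynomial kernel (x . z + c)^2; c = 1 is an assumed default."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + c) ** 2


# Hypothetical binary features for two candidate pairs:
# [has_dependency_relation, same_sentence, co_occurs_in_corpus]
pair_a = [1, 1, 0]
pair_b = [1, 1, 1]

print(poly2_kernel(pair_a, pair_b))  # (2 + 1)^2 = 9.0
```

Expanding the square shows that the kernel implicitly includes products of two features, so the classifier can weigh combinations such as "has a dependency relation and is in the same sentence" without those conjunctions being listed as explicit features.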


To evaluate the proposed method, a Japanese corpus of car reviews is used. It consists of 288 review articles with 4,442 sentences. The corpus contains 2,191 evaluations with an explicit aspect, and 420 evaluations without one. An interesting fact is that 99% of the evaluation-aspect pairs featured an aspect that was either in the same sentence as the evaluation phrase, or in the sentence immediately preceding it. Two dictionaries, one for evaluation phrases and one for aspect phrases, are created semi-automatically. They contain 3,777 aspect phrases and 3,950 evaluation phrases.

The results for opinion extraction are compared against a baseline model. This model only uses dependency information to find a relation between aspect and evaluation. If there is a dependency, the aspect and evaluation phrase are determined to be a pair; otherwise, the evaluation is retained without an explicit aspect.
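
The baseline rule is simple enough to sketch in a few lines. This is an illustration under stated assumptions, not the authors' code: dependencies are represented as a hypothetical set of (head, dependent) phrase pairs, a link in either direction counts, and the first matching aspect wins.

```python
from typing import Iterable, Optional, Set, Tuple


def baseline_pair(
    evaluation: str,
    aspects: Iterable[str],
    dependencies: Set[Tuple[str, str]],
) -> Tuple[Optional[str], str]:
    """Pair the evaluation with an aspect only if a dependency links them.

    Returns (aspect, evaluation), or (None, evaluation) when no candidate
    aspect is connected to the evaluation phrase.
    """
    for aspect in aspects:
        # Assumption: the direction of the dependency does not matter.
        if (aspect, evaluation) in dependencies or (evaluation, aspect) in dependencies:
            return (aspect, evaluation)
    return (None, evaluation)


# Toy example with hypothetical phrases and a single dependency edge.
dependencies = {("seat", "comfortable")}
print(baseline_pair("comfortable", ["engine", "seat"], dependencies))
print(baseline_pair("great", ["engine", "seat"], dependencies))
```

The second call returns the evaluation without an aspect, which is how the baseline ends up labelling many evaluations as having no explicit aspect whenever the parser finds no direct link.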

All evaluations are based on ten-fold cross-validation.

procedure | measure | evaluation with explicit aspect | evaluation without explicit aspect | all aspect-evaluation pairs
baseline | precision | 60.5% (1130/1869) | 10.6% (249/2340) | 32.8% (1379/4209)
baseline | recall | 51.6% (1130/2191) | 59.3% (249/420) | 52.8% (1379/2611)
baseline | F1-measure | 55.7 | 21.0 | 40.5
proposed model | precision | 80.5% (1175/1460) | 30.2% (150/497) | 67.7% (1325/1957)
proposed model | recall | 53.6% (1175/2191) | 35.7% (150/420) | 50.7% (1325/2611)
proposed model | F1-measure | 64.4 | 32.7 | 58.0
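
The percentages in the table follow directly from the counts in parentheses: precision is correct/extracted, recall is correct/gold, and F1 is their harmonic mean. As a quick check, the sketch below recomputes the proposed model's overall figures; the function and parameter names are my own.

```python
from typing import Tuple


def precision_recall_f1(correct: int, extracted: int, gold: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from raw counts (as fractions, not percent)."""
    if correct == 0 or extracted == 0 or gold == 0:
        return 0.0, 0.0, 0.0
    precision = correct / extracted
    recall = correct / gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Proposed model, all aspect-evaluation pairs: 1325 correct,
# 1957 extracted, 2611 in the gold standard.
p, r, f1 = precision_recall_f1(correct=1325, extracted=1957, gold=2611)
print(f"{p:.1%} {r:.1%} {100 * f1:.1f}")  # 67.7% 50.7% 58.0
```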

For the pair extraction evaluation, the correct evaluation phrases are given, so there is no error propagation from the above results to the results below.

procedure | precision | recall
aspect-evaluation pair extraction, baseline | 71.1% (1385/1929) | 63.2% (1385/2191)
aspect-evaluation pair extraction, proposed method | 76.6% (1645/2148) | 75.1% (1645/2191)
opinionhood determination | 82.2% (1709/2078) | 66.2% (1709/2581)

The error analysis shows that a significant proportion of the errors is caused by missing dictionary entries. This is always the greatest weakness of dictionary-based approaches: it is nigh impossible to have a complete dictionary.


This research directly addresses some of the shortcomings of previous work. It features a way to deal with aspects and opinions that are not directly related because of ellipsis. However, because it identifies aspects and opinions separately and only then determines whether they form a pair, the method requires a dictionary for both opinion phrases and aspect phrases. This introduces the problem of coverage, as mentioned above. All problems are smartly cast as binary classification problems, which can be tackled using many machine learning algorithms, such as the SVMs used in this paper.

The baseline is well chosen, as the authors explicitly tackle problems caused by that class of methods. A minor remark, however, is that any optimization will increase performance to some extent. As the baseline is not optimized in any way, some improvement by the proposed method is to be expected. The results therefore look slightly (but only slightly) better than they should.
