Paper Discussion: Zhuang et al. (2006)


Movie Review Mining and Summarization

- Li Zhuang, Feng Jing, and Xiao-Yan Zhu

Characteristics Summary

- Domain: Movie reviews
- Sentiment Classes: Positive / Negative
- Aspect Detection Method: Custom lexicon plus dependency patterns
- Sentiment Analysis Method: Custom lexicon plus dependency patterns
- Parser: Stanford Parser, using typed-dependencies output

Performance Summary

- Precision: 0.483
- Recall: 0.585
- F-score: 0.529
Introduction

Instead of product reviews, this research focuses on movie reviews. The trouble with movie reviews, however, is that they are quite a bit harder to mine than product reviews. The main challenge for sentiment analysis of movie reviews is to distinguish the actual opinions of the reviewer from positive or negative things that happen in the movie itself. Many plot descriptions carry sentiment, even though they are only factual descriptions of the story. As can be seen in the performance summary above, this results in scores well below those obtained on product reviews.

The interesting thing about the approach in this paper is what the authors call a multi-knowledge approach. As the approach is heavily based on lexicons for both aspect detection and sentiment word detection, it is in the generation of both lexicons that multiple sources are combined: information from WordNet, movie cast data, and labeled training data is used to generate both word lists. Then, using grammatical dependency patterns between words, valid aspect-opinion pairs are extracted. Valid here means that the opinion word is actually grammatically related to the aspect.
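To make this concrete, below is a minimal sketch of the pair-extraction idea. It uses spaCy rather than the Stanford Parser the authors used, and the keyword lists and the two dependency patterns are illustrative stand-ins for the paper's actual lexicons and templates.

```python
# Sketch: extract aspect-opinion pairs that are grammatically related.
# Assumes spaCy with the small English model (an illustrative substitute
# for the Stanford Parser used in the paper).
import spacy

nlp = spacy.load("en_core_web_sm")

ASPECT_WORDS = {"music", "screenplay", "acting"}    # illustrative lexicon
OPINION_WORDS = {"great", "boring", "wonderful"}    # illustrative lexicon

def extract_pairs(text):
    pairs = []
    for token in nlp(text):
        lemma = token.lemma_.lower()
        # Pattern 1: opinion adjective directly modifies the aspect noun,
        # e.g. "a wonderful screenplay" -> (screenplay, wonderful).
        if (token.dep_ == "amod" and lemma in OPINION_WORDS
                and token.head.lemma_.lower() in ASPECT_WORDS):
            pairs.append((token.head.text, token.text))
        # Pattern 2: opinion adjective is a complement of a verb whose
        # subject is the aspect, e.g. "the music was great" -> (music, great).
        if token.dep_ == "acomp" and lemma in OPINION_WORDS:
            for child in token.head.children:
                if child.dep_ == "nsubj" and child.lemma_.lower() in ASPECT_WORDS:
                    pairs.append((child.text, token.text))
    return pairs

print(extract_pairs("The music was great, but the screenplay was boring."))
# [('music', 'great'), ('screenplay', 'boring')]
```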

Keyword list generation

Using IMDB, a set of aspect classes is predefined. These classes are divided into two groups: elements and people. The first group includes aspect classes like overall, screenplay, character design, vision effects, music and sound, and special effects. The second group includes aspect classes like producer, director, screenwriter, actor and actress, music and sound people (e.g., composer, singer, sound effects creator, etc.), and technical people (e.g., cameraman, editor, etc.).

The statistical results of 1,100 manually labeled reviews are used to find proper keywords. Note that keywords in this context can also be short phrases, like "well acted" or "special effects". Given the set of labeled reviews, all keywords with a frequency below 1% are disregarded (reminiscent of the minimum support in Hu & Liu (2004)). The remaining keywords still cover 90% of the aspect occurrences, illustrating the skewed distribution of words. The remaining set contains fewer than 20 words for most aspect classes, and it forms the basis of the aspect keyword list. Since these words are not likely to change anytime soon, no attempt is made to include synonyms. In addition to these regular words, a cast library is built from the cast information in IMDB for all considered movies, where only names that also occur in the training data are included. Regular expressions are used to find names in the reviews and to transform them into the canonical form specified in the cast library.
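As a small illustration of the frequency cut-off, the sketch below keeps only the keywords that account for at least 1% of an aspect class's labeled mentions and reports the coverage of the retained set. The data and class name are made up; only the 1% threshold comes from the paper.

```python
# Sketch: build an aspect keyword list by dropping low-frequency keywords.
from collections import Counter

# Illustrative annotations: keywords used for one aspect class.
labeled_mentions = {
    "music and sound": ["score", "music", "soundtrack", "score", "music",
                        "theme", "music", "score", "soundtrack", "music"],
}

def keyword_list(mentions, threshold=0.01):
    counts = Counter(mentions)
    total = sum(counts.values())
    # Keep keywords whose relative frequency is at least the threshold.
    kept = {w for w, c in counts.items() if c / total >= threshold}
    coverage = sum(counts[w] for w in kept) / total
    return kept, coverage

for aspect, mentions in labeled_mentions.items():
    kept, cov = keyword_list(mentions)
    # On this tiny sample everything clears the 1% bar; on real data the
    # long tail of rare keywords is cut while coverage stays around 90%.
    print(aspect, sorted(kept), f"coverage={cov:.0%}")
```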

For opinion words, no external sources like a cast library are needed. Instead, the labeled training data is used to find 1,093 positive keywords and 780 negative keywords. A little more than half of these words also appear in the general sentiment lexicon of the General Inquirer; according to the authors, this implies the need for a domain-specific sentiment lexicon. For both the positive and the negative list, the 100 words with the highest frequency are used as seed words and are immediately put into the final sentiment lexicon. Then, for each word in WordNet, the top two synsets are scanned, and if one of the seed words appears in them, that word is added to the sentiment lexicon as well. Last, opinion words that have a high frequency in the training data but are not in the generated list are added to the sentiment lexicon as domain-specific words.
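The WordNet step can be sketched as follows, here with NLTK's WordNet interface (the paper does not specify the tooling). A candidate word is added when one of its top two synsets contains a seed word; the seed and candidate lists are illustrative.

```python
# Sketch: expand the seed sentiment lexicon via WordNet.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

positive_seeds = {"great", "wonderful", "excellent"}          # illustrative
candidates = ["fantastic", "grand", "tremendous", "table"]    # illustrative

def top_synsets_contain_seed(word, seeds, top_k=2):
    # Scan only the word's two most common senses, as in the paper.
    for synset in wn.synsets(word)[:top_k]:
        if any(lemma.name().replace("_", " ") in seeds
               for lemma in synset.lemmas()):
            return True
    return False

lexicon = set(positive_seeds)
lexicon.update(w for w in candidates
               if top_synsets_contain_seed(w, positive_seeds))
print(sorted(lexicon))
```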

To make the link between sentiment keywords and aspect keywords, patterns are extracted from the labeled training data. Only patterns with a high frequency are retained as dependency relation templates.
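A rough sketch of how such templates could be mined is given below, again with spaCy standing in for the Stanford Parser. A template is represented here as the dependency labels on the path between the two words; the tiny training set is made up.

```python
# Sketch: mine frequent dependency paths between labeled aspect-opinion pairs.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

labeled_pairs = [  # (sentence, aspect word, opinion word) -- illustrative
    ("The screenplay was brilliant.", "screenplay", "brilliant"),
    ("The music was wonderful.", "music", "wonderful"),
    ("A brilliant screenplay.", "screenplay", "brilliant"),
]

def find(doc, word):
    return next(t for t in doc if t.text.lower() == word)

def template(sentence, aspect, opinion):
    doc = nlp(sentence)
    a, o = find(doc, aspect), find(doc, opinion)
    a_chain = [a] + list(a.ancestors)   # aspect up to the root
    o_chain = [o] + list(o.ancestors)   # opinion up to the root
    o_ids = [t.i for t in o_chain]
    # Lowest common ancestor: first token on the aspect chain that also
    # lies on the opinion chain.
    cut = next(i for i, t in enumerate(a_chain) if t.i in o_ids)
    up = tuple(t.dep_ for t in a_chain[:cut])
    down = tuple(t.dep_ for t in o_chain[:o_ids.index(a_chain[cut].i)])
    return up, down

counts = Counter(template(*p) for p in labeled_pairs)
# Keep only paths seen often enough in the training data.
templates = [t for t, c in counts.items() if c >= 2]
print(templates)  # [(('nsubj',), ('acomp',))]
```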

For detecting implicit aspects, the authors do not present a general approach. Instead, they focus on very short sentences (at most three words), which are usually exclamations by reviewers about implicit aspects. In the training data, they look for opinion words that are always used for the same aspect and record these in a separate list. Whenever such an opinion word is found, the corresponding aspect is looked up in this list and added as an implicit aspect.
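A sketch of this heuristic, with a made-up opinion-to-aspect list:

```python
# Sketch: resolve implicit aspects in very short, exclamation-like sentences.
# The mapping would be mined from opinion words that always co-occur with a
# single aspect in the training data; these entries are illustrative.
IMPLICIT_ASPECT = {
    "hilarious": "overall",
    "well acted": "actor and actress",
}

def implicit_pairs(sentence):
    words = sentence.lower().strip(" !.").split()
    if len(words) > 3:          # only very short sentences qualify
        return []
    phrase = " ".join(words)
    return [(aspect, opinion)
            for opinion, aspect in IMPLICIT_ASPECT.items()
            if opinion in phrase]

print(implicit_pairs("Absolutely hilarious!"))  # [('overall', 'hilarious')]
```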

Evaluation

For eleven movies selected from IMDB, chosen to represent as many genres as possible, the top 100 most helpful reviews were downloaded and manually annotated by four movie fans. In total, they annotated over 16,000 sentences spanning more than 260,000 words with aspect-opinion pairs. For the experiments, the data is split into five equal folds, each containing 20 reviews per movie. Then, five-fold cross-validation is performed and the average results are reported below. The results are compared against the method of Hu & Liu (2004). As can be seen, using grammatical patterns improves precision, but hurts recall.

[Figure: Results of Zhuang et al. (2006) compared against Hu & Liu (2004)]
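For reference, the evaluation boils down to precision, recall, and F-score over the extracted aspect-opinion pairs. Below is a minimal, simplified exact-match sketch with made-up predictions and gold annotations.

```python
# Sketch: score extracted aspect-opinion pairs against gold annotations.
def prf(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                     # correctly extracted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("music", "great"), ("screenplay", "boring"), ("acting", "superb")]
pred = [("music", "great"), ("acting", "weak")]
print(prf(pred, gold))  # (0.5, 0.333..., 0.4)
```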

Discussion

This paper is an interesting take on review mining because it focuses on the movie review domain, which is arguably harder than product reviews. However, as it is a lexicon-based approach, it has some problems with recall, which are further aggravated by using grammatical patterns. In terms of precision, since it is defined as finding the right aspect class for a given aspect-opinion pair, precision suffers when an aspect is found but assigned the wrong class. Furthermore, as discussed above, movie reviews mention a lot of plot points that may carry sentiment even though they do not reflect the author's opinion. This also leads to lower precision, as aspect-opinion pairs are extracted incorrectly.

Unfortunately, this research is not readily reproducible, as the annotated data set is not available. Of course, the movie reviews themselves are easily crawled, so if you're determined (and have lots of resources 😉), you can make this work.
