- Ivan Titov and Ryan McDonald
|Domain||Reviews of products, hotels, and restaurants|
|Sentiment Classes||5-star rating|
|Aspect Detection Method||MultiGrain LDA|
|Sentiment Analysis Method||PRanking (existing method)|
|Sentiment Analysis||Ranking Loss: 0.669|
With the advent of topic models, especially Latent Dirichlet Allocation (LDA), this paper is one of the earlier attempts at adapting LDA to work with aspects. The main problem with standard LDA is that it finds document-level topics, called global topics in this paper, instead of local topics that correspond to aspects. While the topic of the entire document might be the entity, we are actually interested in the aspects of the entity, and the reviewer usually only talks about these aspects in very specific parts of the review. To address this issue, the authors propose a multi-grain (MG), or multi-level, approach that extracts topics on both a global and a local level, with the latter (hopefully) corresponding to the aspects of the entity under review. The aspects found can then be used to determine the sentiment of the aspects or of the document. To illustrate this, the MG topic model is used in conjunction with an existing sentiment analysis algorithm. As the authors argue, it could also be combined with the method of Mei et al. (2007), which jointly models topics and sentiment.
A document is represented as a set of sliding windows, with each window covering T adjacent sentences within a document d. Each window v has a distribution over local topics, $\theta^{loc}_{d,v}$, and a distribution $\pi_{d,v}$ that defines the preference for local topics versus global topics. A word in sentence s can be sampled from any window v that covers s, with a categorical distribution $\psi_{d,s}$ deciding which window to use.
Overlapping windows make more co-occurrence data available, since LDA applied to single sentences is said to perform badly. For a given sentence, the model can draw on the co-occurrence data from every sentence that lies in any of the windows covering it.
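To make the window structure concrete, here is a minimal sketch of how sentences map to overlapping windows. The function names are hypothetical, and clipping windows to the document boundaries is a simplifying assumption of this sketch (the paper may handle the edges differently).

```python
def sliding_windows(num_sentences, T):
    """List the sentence indices covered by each window of T adjacent sentences.

    Assumption of this sketch: windows lie inside the document, so a document
    shorter than T sentences gets a single (shorter) window.
    """
    n_windows = max(1, num_sentences - T + 1)
    return [list(range(start, min(start + T, num_sentences)))
            for start in range(n_windows)]


def windows_covering(sentence, windows):
    """Return the indices of all windows that contain the given sentence."""
    return [v for v, sentences in enumerate(windows) if sentence in sentences]


windows = sliding_windows(num_sentences=5, T=3)
middle = windows_covering(2, windows)  # the middle sentence sits in every window
first = windows_covering(0, windows)   # an edge sentence sits in fewer windows
```

With five sentences and T = 3, the middle sentence is covered by three windows, so its words can share local topics with all five sentences, whereas a sentence-level model would only see the words of that one sentence.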
This generative model, with global and local topics, can be described as follows:
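- Draw word distributions $\varphi^{gl}_z$ for global topics from a Dirichlet prior $Dir(\beta^{gl})$
- Draw word distributions $\varphi^{loc}_z$ for local topics from a Dirichlet prior $Dir(\beta^{loc})$
- Then for each document d:
  - Choose a distribution of global topics $\theta^{gl}_d \sim Dir(\alpha^{gl})$
  - For each sentence s choose a distribution over its covering windows $\psi_{d,s} \sim Dir(\gamma)$
  - For each sliding window v:
    - choose $\theta^{loc}_{d,v} \sim Dir(\alpha^{loc})$
    - choose $\pi_{d,v} \sim Beta(\alpha^{mix})$
  - For each word i in sentence s of document d:
    - choose window $v_{d,i} \sim \psi_{d,s}$
    - choose $r_{d,i} \sim \pi_{d,v_{d,i}}$
    - if $r_{d,i} = gl$, choose global topic $z_{d,i} \sim \theta^{gl}_d$
    - if $r_{d,i} = loc$, choose local topic $z_{d,i} \sim \theta^{loc}_{d,v_{d,i}}$
    - choose word $w_{d,i}$ from the word distribution $\varphi^{r_{d,i}}_{z_{d,i}}$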
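The generative steps above can be sketched as a small forward sampler. This is only an illustration under stated assumptions, not the authors' implementation: all function names, the hyperparameter values, and the clipping of windows to the document boundaries are made up for this sketch, and it only generates data (the paper fits the model with inference, which is not shown here).

```python
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_topics(n_topics, vocab_size, beta, rng):
    """Draw one word distribution per topic from a symmetric Dirichlet prior."""
    return [dirichlet([beta] * vocab_size, rng) for _ in range(n_topics)]


def generate_document(sentence_lengths, phi_gl, phi_loc, T=3,
                      alpha_gl=0.5, alpha_loc=0.5, gamma=1.0,
                      a_mix_gl=1.0, a_mix_loc=3.0, seed=0):
    """Sample (sentence, window, level, topic, word) tuples for one document.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_sent = len(sentence_lengths)
    n_win = max(1, n_sent - T + 1)  # assumption: windows stay inside the document

    theta_gl = dirichlet([alpha_gl] * len(phi_gl), rng)  # global topics, per document
    theta_loc = [dirichlet([alpha_loc] * len(phi_loc), rng) for _ in range(n_win)]
    # Probability of choosing a *global* topic inside each window; an
    # asymmetric Beta prior shifts the preference towards local topics.
    pi = [rng.betavariate(a_mix_gl, a_mix_loc) for _ in range(n_win)]

    doc = []
    for s, length in enumerate(sentence_lengths):
        covering = [v for v in range(n_win) if v <= s < v + T]
        psi = dirichlet([gamma] * len(covering), rng)  # preference over covering windows
        for _ in range(length):
            v = covering[categorical(psi, rng)]
            if rng.random() < pi[v]:
                level, z = "gl", categorical(theta_gl, rng)
                w = categorical(phi_gl[z], rng)
            else:
                level, z = "loc", categorical(theta_loc[v], rng)
                w = categorical(phi_loc[z], rng)
            doc.append((s, v, level, z, w))
    return doc


rng = random.Random(1)
phi_gl = sample_topics(n_topics=2, vocab_size=6, beta=0.5, rng=rng)
phi_loc = sample_topics(n_topics=3, vocab_size=6, beta=0.5, rng=rng)
doc = generate_document([4, 3, 5, 2], phi_gl, phi_loc, T=3, seed=7)
```

Note how a local topic is always drawn from the distribution of the chosen window, while a global topic is drawn from the single document-level distribution; this is what ties local topics to specific parts of the review.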
In the above set of generative steps, $Beta(\alpha^{mix})$ is a prior Beta distribution over $\pi_{d,v}$, which is used to choose between local and global topics. Using a non-symmetrical distribution here allows setting the global versus local preference by setting $\alpha^{mix}_{gl}$ and $\alpha^{mix}_{loc}$ accordingly. The whole model is shown as a plate diagram in the figure below, with the standard LDA model on the left and the proposed extension on the right.
The evaluation is performed both quantitatively and qualitatively. For the latter, multiple tables with actual topic output are printed in the paper, with the corresponding LDA topics for comparison. These show that the MG-LDA topics (at least the local ones, as intended) correspond much better to actual rateable aspects. For details, I'll have to refer you to the actual paper.
For the quantitative evaluation, the MG-LDA topic model is paired with an existing sentiment analysis algorithm called the PRanking algorithm. The idea is that sentiment analysis performance should improve when more accurate aspect information is available: if you know better which sentences discuss which aspects, the sentiment predicted for those aspects should be more accurate as well.
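For readers unfamiliar with PRanking, the following is a minimal sketch of a perceptron-style ordinal ranker along the lines of the usual description of PRank (Crammer and Singer): a weight vector plus a set of thresholds that split the score axis into ranks. The function names and the toy data are hypothetical, and the feature vectors used in the paper (n-grams plus MG-LDA topic features) are not reproduced here.

```python
def prank_predict(w, b, x):
    """Predict the smallest rank r whose threshold exceeds the score w.x."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    for r, threshold in enumerate(b, start=1):
        if score < threshold:
            return r
    return len(b) + 1  # the top rank has an implicit threshold of +infinity


def prank_train(data, n_features, n_ranks, epochs=10):
    """Train on (features, rank) pairs with ranks in 1..n_ranks."""
    w = [0.0] * n_features
    b = [0.0] * (n_ranks - 1)
    for _ in range(epochs):
        for x, y in data:
            if prank_predict(w, b, x) == y:
                continue  # only update on mistakes
            score = sum(wi * xi for wi, xi in zip(w, x))
            taus = []
            for r in range(1, n_ranks):
                y_r = -1 if y <= r else 1  # should the score be below threshold r?
                violated = (score - b[r - 1]) * y_r <= 0
                taus.append(y_r if violated else 0)
            step = sum(taus)
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b = [br - tau for br, tau in zip(b, taus)]
    return w, b


# Toy, linearly ordered data: a single feature that grows with the rating.
toy = [([0.0], 1), ([1.0], 2), ([2.0], 3)]
w, b = prank_train(toy, n_features=1, n_ranks=3)
predictions = [prank_predict(w, b, x) for x, _ in toy]
```

Because the output is an ordered rank instead of an unordered class label, an error metric that respects the ordering is the natural fit.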
Since the PRanking algorithm assigns star ratings between 1 and 5 stars, the authors have chosen ranking loss as the most appropriate metric. It measures the average distance between the assigned rating and the actual rating. This is more informative than a binary measure such as precision, where every rating that is not exactly right counts as wrong, no matter how close it is. With ranking loss, assigning a 5 instead of a 1 is punished more severely than assigning a 2 instead of a 1, whereas with precision both assignments would simply be wrong.
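The difference between the two metrics is easy to see on a small example. The sketch below assumes ranking loss is the mean absolute difference between predicted and actual ratings; the ratings are made up for illustration.

```python
def ranking_loss(predicted, actual):
    """Average absolute distance between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def exact_match_rate(predicted, actual):
    """Fraction of ratings that are exactly right (a binary notion of correctness)."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


actual = [1, 1, 4, 5]
near_misses = [2, 1, 4, 4]  # two errors, each off by one star
far_misses = [5, 1, 4, 1]   # two errors, each off by four stars

same_rate = exact_match_rate(near_misses, actual) == exact_match_rate(far_misses, actual)
loss_near = ranking_loss(near_misses, actual)  # 0.5
loss_far = ranking_loss(far_misses, actual)    # 2.0
```

Both sets of predictions get half of the ratings exactly right, yet the ranking loss separates them clearly.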
The results are reported for two setups. In the first, PRanking only uses unigrams, whereas in the second, it also uses bigrams and some common trigrams. As expected, in the latter scenario, the added value of MG-LDA is smaller than in the first.
This paper successfully adapts LDA to work with aspects, which is no small feat. Later research draws on this work, and more papers on aspect-level sentiment analysis with LDA were published after this one, which is a good testimony to this paper. Personally, I like the idea of the sliding windows, which is quite a smart way to both perform LDA on a sentence level and have enough data to do it reliably. The evaluation is thorough, which is always good. I would have liked some direct quantitative evaluation of the aspects, however. As it stands, there is just the (lengthy) qualitative evaluation and an indirect quantitative one (using the PRanking algorithm).
Last, an interesting conclusion from the qualitative evaluation is that restaurant reviews are very hard for topic models to process correctly. The main reason, as hypothesized by the authors, is that these reviews are short and lack a clear aspect vocabulary: many words are specific to certain types of cuisine. Those words typically do not generalize beyond their type of cuisine and hence are not recognized as aspect words.