- Ivan Titov and Ryan McDonald
|Sentiment Classes||5-star rating|
|Aspect Detection Method||Multi-Aspect Sentiment Model (incorporates MG-LDA model)|
|Aspect service||Average Precision: 75.8%|
|Aspect location||Average Precision: 85.5%|
|Aspect rooms (with best # of topics)||Average Precision: 87.6%|
A major drawback of topic models is that the topics they discover are not linked to actual aspects. Typically, there is a many-to-one relation from topics to aspects. This makes the results harder to evaluate and also less useful in practice, since a discovered topic is just a collection of words with associated probabilities and is therefore not easy to present to a user. When analyzing reviews, one would typically like to find more or less the same aspects in each review, so that the results can be aggregated per aspect into a summary.
This research is a direct extension of the MG-LDA model in Titov & McDonald (2008a). It incorporates that model and extends it so that topics correspond to predefined aspects. The main idea is to use the aspect ratings given by the users to enhance the model. The data gathered for this research consists of hotel reviews in which, besides an overall rating, at least three aspects also have a user-defined rating: service, location, and rooms.
The basic idea is to have a sentiment classifier for each aspect that uses only the words associated with that aspect's topic to predict the sentiment for the aspect. However, because aspect ratings correlate highly with the overall rating and with each other, the sentiment for an aspect instead depends on the overall sentiment rating and a correction factor, the latter of which is learned only from the words associated with the topic. In this way, the model is not forced to put general words that describe the overall sentiment into topics. Besides the topics that are associated with user-defined aspect ratings, the model also has extra unassociated topics in order to capture aspects that are not explicitly rated by the users.
The MAS model is shown below, as an extension to the MG-LDA model. Because of that, parts of the model that are similar are not shown again in the MAS model:
In the MAS model, each rated aspect has its own rating variable, and the overall sentiment distribution is based on all the n-gram features of the review text. The rating distribution for each aspect is then computed from the overall sentiment distribution and from any n-gram feature in which at least one word is assigned to the topic associated with that aspect.
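The "overall score plus aspect-specific correction" idea can be illustrated with a small sketch. This is not the authors' implementation: the function name, the weight dictionary, and the toy numbers are all hypothetical, and the learning of weights and topic assignments is left out entirely. It only shows how a rating distribution for one aspect could combine overall scores with corrections from n-grams tied to the aspect's topic.

```python
import math

def aspect_rating_distribution(overall_scores, correction_weights, features, aspect_topic):
    """Return a probability for each rating of one aspect (softmax over scores).

    overall_scores: one score per rating, assumed to come from all n-gram features.
    correction_weights: hypothetical dict mapping (ngram, rating_index) -> weight.
    features: list of (ngram, topics) pairs; topics is the set of topics assigned
              to the words of that n-gram.
    aspect_topic: the topic linked to the aspect being rated.
    """
    scores = list(overall_scores)
    for ngram, topics in features:
        # Only n-grams with at least one word assigned to the aspect's topic
        # contribute to the aspect-specific correction.
        if aspect_topic in topics:
            for r in range(len(scores)):
                scores[r] += correction_weights.get((ngram, r), 0.0)
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example (all values invented): a review that is positive overall
# but contains a negative phrase assigned to the "service" topic.
overall = [0.0, 0.0, 0.5, 1.0, 1.5]  # scores for 1 to 5 stars
weights = {("rude staff", 0): 2.0, ("rude staff", 4): -2.0}
features = [("rude staff", {"service"}), ("great view", {"location"})]
service_dist = aspect_rating_distribution(overall, weights, features, "service")
rooms_dist = aspect_rating_distribution(overall, weights, features, "rooms")
```

In this toy setup the service distribution shifts toward one star, while the rooms distribution stays equal to the overall one because no n-gram is assigned to that topic.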
The evaluation data set consists of 10,000 reviews from TripAdvisor.com (109,024 sentences and 2,145,313 words). Every review has at least a rating for the aspects service, location, and rooms, and ratings range from 1 to 5 stars. To test the quality of the topics linked to rated aspects, 779 random sentences are labeled manually. Of these, 164, 176, and 263 are labeled for the three respective aspects.
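The results in this summary are reported as average precision. A minimal sketch of the standard definition follows, assuming each labeled sentence receives a hypothetical score for the aspect's topic and sentences are ranked by that score; the exact scoring and ranking procedure used in the paper may differ.

```python
def average_precision(scores, labels):
    """Average precision of the ranking induced by scores (higher ranks first).

    labels[i] is True when sentence i was manually labeled as discussing the aspect.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            # Precision at the rank of each relevant sentence.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy ranking (invented scores): relevant sentences at ranks 1 and 3.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
```

With relevant sentences at ranks 1 and 3, the value is the mean of 1/1 and 2/3.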
As a benchmark, a maximum entropy classifier is also trained (one for each aspect, with 10-fold cross-validation) and shown in these graphs. Note that a MaxEnt classifier is a supervised method and hence represents an upper bound for an unsupervised method like MAS. The idea of many-to-one relations from topics to a rated aspect is demonstrated in figure c, where more than one topic corresponds to the aspect rooms. It is clear from these graphs that the MAS model performs relatively close to the supervised method, which is quite an achievement.
|Aspect||Maximum Entropy Classifier||MAS Model|
|Aspect rooms||88.3%||1 topic: 75.0%; 2 topics: 74.5%; 3 topics: 87.6%; 4 topics: 79.8%|
The main point of this research, linking topics to actual known aspects, is great. It increases the practical use of topic models immensely. There are, of course, a few remaining issues. For example, the number of topics per aspect has to be set manually. While three topics performed best for the aspect rooms in this case, this number is not fixed and could be different for other data sets. This is actually a general problem of topic models: the number of topics has to be set beforehand, but in reality it is unknown. Usually, a trial-and-error process will yield a reasonable number of topics, but this is not an elegant way of doing things.
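The trial-and-error selection amounts to comparing a score across candidate topic counts. The sketch below uses the MAS average precision values reported above for the aspect rooms; the helper name is hypothetical. In practice the comparison would ideally be made on held-out data rather than on the evaluation set itself.

```python
# Average precision (%) of the MAS model for the aspect rooms,
# keyed by the number of topics linked to the aspect.
rooms_ap_by_topics = {1: 75.0, 2: 74.5, 3: 87.6, 4: 79.8}

def best_topic_count(ap_by_topics):
    """Return the number of topics with the highest average precision."""
    return max(ap_by_topics, key=ap_by_topics.get)

best = best_topic_count(rooms_ap_by_topics)
```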
Leveraging already provided aspect ratings to improve the model and get results similar to supervised methods is smart. It obviously requires these aspect ratings to be available, but this is the case for multiple review sites and is therefore not a serious limitation.