- Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani
|Sentiment Classes||5-star rating|
|Aspect Detection Method||N/A|
|Sentiment Analysis Method||Ordinal regression (ε-support vector regression, LibSvm implementation used)|
|Sentiment Lexicon||General Inquirer|
|Averaged over all aspects||MAEμ: 0.733||MAEM: 1.032|
First off, this is not really a fully-fledged aspect-level sentiment analysis paper. It is closely related, however, so it may still be of interest for aspect-level sentiment analysis. The method proposed by the authors is based on ordinal regression, which means that it not only distinguishes multiple labels or classes, but also takes into account that the labels are ordered: predicting four stars for a five-star review is a smaller error than predicting one star. Such fine-grained sentiment analysis can of course be useful, but it is also much harder.
The reason that this research is not completely relevant for aspect-level sentiment analysis is that the aspects are not detected in the text; they are given in advance. Each review has a global rating and optional ratings for seven pre-defined aspects. Besides the main dataset, which represents the global rating, seven more datasets are created, each containing only the reviews that have a rating for that particular aspect. Training a separate classifier on each of these subsets yields one classifier per pre-defined aspect.
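To make the data setup concrete, here is a minimal sketch of how such subsets could be built. The review structure, the field names and the aspect names (`cleanliness`, `service`, `value`) are assumptions made for illustration only; they are not taken from the paper or the dataset.

```python
def build_subsets(reviews):
    """Split reviews into one global training set plus one set per rated aspect.

    Each review is assumed to be a dict with a text, a global rating and a
    (possibly empty) dict of optional aspect ratings.
    """
    subsets = {"global": []}
    for review in reviews:
        subsets["global"].append((review["text"], review["global"]))
        # A review only joins an aspect subset if it rates that aspect.
        for aspect, rating in review["aspects"].items():
            subsets.setdefault(aspect, []).append((review["text"], rating))
    return subsets


# Hypothetical toy reviews, not real data.
reviews = [
    {"text": "Spotless room, rude staff.", "global": 3,
     "aspects": {"cleanliness": 5, "service": 1}},
    {"text": "Great value for money.", "global": 4,
     "aspects": {"value": 5}},
    {"text": "Nothing special.", "global": 3, "aspects": {}},
]
subsets = build_subsets(reviews)
```

A separate regressor would then be trained on each entry of `subsets`, which is why the aspect subsets are smaller than the global one.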
Vector Representations of Product Reviews
Instead of focusing on tweaking the classifier function, the authors have chosen to focus on feature design and selection. First, a baseline is created by just using the traditional Bag-of-Words representation. The idea is to start here, and move to more sophisticated representations to overcome the limitations of a BoW representation. As discussed more at length in the paper, while useful for text classification, the BoW model is not very useful for sentiment classification. A good example given is that "A horrible hotel in a great town" would have the same vector representation as "A great hotel in a horrible town", while expressing opposite sentiment towards the hotel.
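The hotel example is easy to verify with a small sketch. The tokenizer below (lowercasing and splitting on whitespace) is a simplification chosen for illustration, not the preprocessing used in the paper.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to unordered token counts; word order is discarded."""
    return Counter(text.lower().split())


bad_hotel = bag_of_words("A horrible hotel in a great town")
good_hotel = bag_of_words("A great hotel in a horrible town")

# Both sentences contain exactly the same tokens with the same counts.
same_representation = bad_hotel == good_hotel
```

Since both sentences map to the same counts, no classifier working on this representation alone can tell them apart.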
To move away from the BoW model, the text is first processed by a Part-of-Speech tagger and the resulting tags are used to extract three simple patterns. In this way, the BoW model is enriched with (hopefully) meaningful phrases as features. See picture below for the set of patterns used.
A nice advantage of these patterns is that they can be used to find expressions that differ in structure but are identical in meaning. Canonical forms can therefore be defined, into which the expressions matching the patterns are converted. This both reduces the number of distinct but semantically equivalent features and makes the canonical forms more robust, since each will have a higher count. Finally, POS tagging allows simple negations to be detected, which prevents statements and their negated counterparts from collapsing into a single feature.
Using the General Inquirer lexicon, positive and negative words in the extracted phrases are replaced by [positive] and [negative] tags and these updated phrases are also added as features. Furthermore, since the General Inquirer also provides more detailed sentiment annotations, these are used as well and the results are also added as features. An example of such features can be seen in the table below:
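As a rough illustration of the simple replacement step, consider the sketch below. The two-word lexicon is a hypothetical stand-in for the General Inquirer, and the function names are invented for this example; the more detailed GI annotations are not modelled.

```python
# Hypothetical toy lexicon standing in for the General Inquirer.
TOY_LEXICON = {"great": "[positive]", "horrible": "[negative]"}


def generalize(phrase, lexicon=TOY_LEXICON):
    """Replace every sentiment-bearing word in a phrase by its polarity tag."""
    return " ".join(lexicon.get(token, token) for token in phrase.lower().split())


def features_for(phrase):
    """Return the phrase itself plus its generalized form, if that differs."""
    original = phrase.lower()
    generalized = generalize(phrase)
    return [original] if generalized == original else [original, generalized]


example = features_for("horrible hotel")
```

The generalized form lets phrases such as "great location" and any other positive word followed by "location" share a single feature, while the original phrase is kept as a feature too.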
The classifier now has many features that it can use to predict the rating of a hotel review: all words, all extracted patterns, all simple GI (General Inquirer) expressions, and all enriched GI expressions. However, the dimensionality of all these features combined is very large, which is the reason to add an additional feature selection phase. A so-called filter approach is chosen, where each feature's discriminative power is measured and only the t most discriminative features are retained. Two feature selection methods are devised and tested.
The first is called minimum variance and it is based on measuring the variance of the distribution of a feature across the labels of the scale (the 5 stars), keeping the ones with the least variance. The intuition is that a good feature is able to discriminate a small portion of the ordered scale from the rest, and that features with minimum variance are such features. This method is referred to as MV in the results table.
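A minimal sketch of this idea follows, assuming that each feature is summarized by how many training reviews of each star rating contain it. The exact estimator used in the paper (including any smoothing) may differ, and the counts below are made up.

```python
def label_variance(counts):
    """Variance of the star rating over the reviews that contain a feature.

    counts[i] is the number of training reviews with rating i + 1 in which
    the feature occurs.
    """
    total = sum(counts)
    mean = sum((i + 1) * c for i, c in enumerate(counts)) / total
    return sum(c * ((i + 1) - mean) ** 2 for i, c in enumerate(counts)) / total


def select_min_variance(feature_counts, t):
    """Keep the t features whose occurrences are most concentrated on the scale."""
    ranked = sorted(feature_counts, key=lambda f: label_variance(feature_counts[f]))
    return ranked[:t]


# Hypothetical counts per rating (1 to 5 stars).
feature_counts = {
    "filthy": [9, 1, 0, 0, 0],
    "hotel": [10, 10, 10, 10, 10],
    "superb": [0, 0, 0, 2, 8],
}
selected = select_min_variance(feature_counts, t=2)
```

Here "hotel" is spread evenly over all ratings and is dropped, while "filthy" and "superb" each point to a narrow part of the scale and are kept.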
The second method extends the first by addressing its major shortcoming: in an extreme scenario, the minimum variance method selects many features that are very good at identifying one particular portion of the ordered scale, but in doing so it is left without features that discriminate between the other portions of the scale. The proposed solution consists of:
- Provisionally assigning each feature to the label closest to its average label value
- For each label, ranking its associated features
- Enforcing a round robin policy so that each label can pick its best feature and add it to the selection.
This method is referred to as RRMV in the results table.
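The three steps can be sketched as follows. As before, the per-rating counts are hypothetical, ranking within a label is assumed to be by increasing variance, and the function names are invented for this example rather than taken from the paper.

```python
def mean_label(counts):
    """Average star rating of the reviews that contain a feature."""
    return sum((i + 1) * c for i, c in enumerate(counts)) / sum(counts)


def label_variance(counts):
    """Variance of the star rating of the reviews that contain a feature."""
    mean = mean_label(counts)
    return sum(c * ((i + 1) - mean) ** 2 for i, c in enumerate(counts)) / sum(counts)


def round_robin_select(feature_counts, t, n_labels=5):
    # Step 1: provisionally assign each feature to the label closest to its mean.
    buckets = {label: [] for label in range(1, n_labels + 1)}
    for feature, counts in feature_counts.items():
        mean = mean_label(counts)
        closest = min(buckets, key=lambda label: abs(label - mean))
        buckets[closest].append(feature)
    # Step 2: rank the features of each label, lowest variance first.
    for label in buckets:
        buckets[label].sort(key=lambda f: label_variance(feature_counts[f]))
    # Step 3: labels take turns picking their best remaining feature.
    selected = []
    while len(selected) < t and any(buckets.values()):
        for label in sorted(buckets):
            if buckets[label] and len(selected) < t:
                selected.append(buckets[label].pop(0))
    return selected


# Hypothetical counts per rating (1 to 5 stars).
feature_counts = {
    "filthy": [9, 1, 0, 0, 0],
    "dirty": [8, 2, 0, 0, 0],
    "rude": [7, 3, 0, 0, 0],
    "average": [0, 2, 6, 2, 0],
    "superb": [0, 0, 0, 2, 8],
}
selected = round_robin_select(feature_counts, t=3)
```

With these counts, plain minimum variance would spend two of its three slots on one-star features ("filthy" and "dirty"), whereas the round robin keeps one feature each for the low, middle and high end of the scale.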
The dataset for this experiment can be found here (it is the TripAdvisor set). The evaluation measure is the mean absolute error (MAE), which is the average absolute difference between the predicted label and the true label. Both the standard micro-averaged version (MAEμ) and the newly proposed macro-averaged version (MAEM) are used. The macro-averaged version computes the MAE separately for each true label and then averages these values, so that rare ratings count as much as frequent ones.
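The difference between the two measures is easiest to see on a skewed toy example. The sketch below averages over the labels that actually occur in the test data, which is an assumption of this example; the ratings are made up.

```python
from collections import defaultdict


def mae_micro(y_true, y_pred):
    """Mean absolute error over all reviews."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_macro(y_true, y_pred):
    """Mean of the per-label MAE values, so every true label weighs the same."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    per_label = [sum(e) / len(e) for e in errors.values()]
    return sum(per_label) / len(per_label)


# A predictor that always answers "5 stars" on a test set dominated by 5-star reviews.
y_true = [5, 5, 5, 5, 1]
y_pred = [5, 5, 5, 5, 5]
micro = mae_micro(y_true, y_pred)
macro = mae_macro(y_true, y_pred)
```

The trivial predictor gets a micro-averaged error of 0.8 but a macro-averaged error of 2.0, which shows why the macro-averaged version is the stricter measure on imbalanced rating data.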
As can be seen, any use of more advanced features yields a lower error than BoW. Note that the Global column refers to predicting the overall rating for the hotel review, whereas Average refers to the average over the seven aspect-specific data subsets. It is clear that this method works best on the overall rating, although even for aspect-specific data its performance is not that bad.