With the training data released just before Christmas, I started working on this benchmark this month, and I now have a rough baseline method for three of the four subtasks. I will try to keep this post updated over time.
- A relatively simple Part-of-Speech pattern algorithm that finds aspects
(Precision: 54% Recall: 77% F1: 63%)
- Default HMM tagger
(Precision: 61% Recall: 30% F1: 40%)
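To make the pattern idea and the scores above a bit more concrete, here is a minimal sketch. The post does not say which patterns are used, so the rule below (take every maximal run of consecutive noun tags as one aspect) is only an assumption for illustration. The function names, the example sentence, and the Penn Treebank style tags are hypothetical as well. The input is already tagged, so no tagger is needed to run it.

```python
from typing import List, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, POS tag) pairs


def extract_aspects(tagged: TaggedSentence) -> Set[str]:
    """Collect maximal runs of consecutive noun tokens as candidate aspects.

    Assumed pattern (not taken from the post): any tag starting with "NN".
    """
    aspects: Set[str] = set()
    current: List[str] = []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word.lower())
        else:
            if current:
                aspects.add(" ".join(current))
            current = []
    if current:  # flush a noun run that ends the sentence
        aspects.add(" ".join(current))
    return aspects


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between predicted and gold aspects."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example sentence, tagged by hand.
sentence = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("screen", "NN"),
    ("flickers", "VBZ"), ("at", "IN"), ("night", "NN"),
]
predicted = extract_aspects(sentence)
gold = {"battery life", "screen"}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), precision, recall, f1)
```

In this toy case the pattern also picks up "night", which is not an aspect. That is the same trade-off the numbers above show: a noun pattern tends to buy recall at the cost of precision.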
Aspect Polarity Classification
- A SentiWordNet method that sums the sentiment scores of the words in a sentence, discounting each word by its distance from the aspect words. Tested with gold-standard aspect data as input.
- A trivial baseline that always predicts positive.
- A simple Bag-of-Words classifier based on SVMlib.
(Precision: 74% Recall: 32% F1: 45%)
- A method based on co-occurrences.
(Precision: 73% Recall: 70% F1: 71%)
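The distance-discounted summing used by the SentiWordNet method under Aspect Polarity Classification could look roughly like the sketch below. Several things here are assumptions and not taken from the post: the tiny hand-made lexicon stands in for SentiWordNet scores, the discount is 1/distance in tokens, the aspect is a single token, and the names `TOY_LEXICON`, `aspect_score` and `aspect_polarity` are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for SentiWordNet: word -> (positive - negative) score.
TOY_LEXICON: Dict[str, float] = {
    "great": 0.75,
    "friendly": 0.5,
    "slow": -0.5,
    "terrible": -0.75,
}


def aspect_score(tokens: List[str], aspect_index: int,
                 lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Sum word sentiment scores, each divided by its distance to the aspect.

    Assumed discount: 1 / distance, so a neighbouring word counts fully
    and a word five tokens away counts for a fifth.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        if i == aspect_index:
            continue  # the aspect word itself carries no sentiment here
        distance = abs(i - aspect_index)
        score += lexicon.get(token.lower(), 0.0) / distance
    return score


def aspect_polarity(tokens: List[str], aspect_index: int) -> str:
    """Map the aggregated score to a polarity label."""
    score = aspect_score(tokens, aspect_index)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical example with two aspects of opposite polarity.
tokens = "The food was great but the service was terrible".split()
food_label = aspect_polarity(tokens, tokens.index("food"))
service_label = aspect_polarity(tokens, tokens.index("service"))
print(food_label, service_label)
```

Without the discount both aspects would get the same sentence-level sum (zero here), so the distance weighting is what lets one sentence yield different polarities for different aspects.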
Category Polarity Classification
- none yet
If you are also working on this problem, let me know in the comments! If you also have some preliminary results, then shout them out as well. It's a competition after all...