Paper Discussion: Ku et al. (2006)


Opinion Extraction, Summarization and Tracking in News and Blog Corpora

- Lun-Wei Ku, Yu-Ting Liang, and Hsin-Hsi Chen

Characteristics Summary

Domain: News items and blogs
Sentiment Classes: Positive / negative
Sentiment Analysis: Dictionary-based, using a topical relevance filter

Performance Summary

On NTCIR corpus (news): Precision: 38.06%, Recall: 64.84%, F1-measure: 47.97%
On blog posts: Precision: 23.48%, Recall: 50.94%, F1-measure: 32.58%
Introduction

News and blog articles are a very important source of opinions. While sentiment analysis research often targets reviews, this research focuses on news and blogs. Compared to reviews, where the overall topic (the product, for instance) is known beforehand, news and blogs are harder in that the topic is not known in advance. News items are somewhat more formal than both blogs and reviews, using a larger vocabulary and containing fewer errors. While the formality makes for sentences that are usually more complex and thus harder to process, the lack of errors is of course helpful.

This paper proposes a topic detection algorithm to find the major topics of a given text. All sentences related to those topics are then extracted and the polarity of each sentence is determined. The positive and negative sentences are then used to construct an opinion summary. Finally, the proposed method also features a way of tracking sentiment over time.

Opinion Extraction

The opinion extraction algorithm is built from the bottom up: from sentiment words, to sentences, to documents. On the word level, a dictionary is created by using the General Inquirer (translated to Chinese) and the Chinese Network Sentiment Dictionary as a seed set. This set is then enlarged using two thesauri: tong2yi4ci2ci2lin2 (Cilin) and the Academia Sinica Bilingual Ontological Wordnet (BOW). There is a problem, however: words in the same synset do not always have the same polarity, so expanding the seed set using these thesauri has to be done carefully. Using frequencies of the individual Chinese characters (how often does a certain ideogram appear in positive words versus in negative words?), the polarity of words that are not in the seed set can be determined. The formulas for this process are in the paper, if you're interested; do note the normalization step that is needed to correct for the fact that there are more negative than positive words in the seed set.
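To make the character-frequency idea concrete, here is a minimal sketch in Python. It follows the spirit of the approach, not the paper's exact formulas, and the seed words are toy examples standing in for the translated General Inquirer and Chinese Network Sentiment Dictionary entries; note the normalization by seed-list size.

```python
from collections import Counter

# Toy seed lists (hypothetical); the paper's seeds come from the translated
# General Inquirer and the Chinese Network Sentiment Dictionary, expanded
# via Cilin and BOW.
positive_seeds = ["高興", "優秀", "美好"]
negative_seeds = ["悲傷", "惡劣", "糟糕"]

def char_frequencies(words):
    """Count how often each character occurs across a word list."""
    counts = Counter()
    for word in words:
        counts.update(word)  # iterating a string yields its characters
    return counts

pos_freq = char_frequencies(positive_seeds)
neg_freq = char_frequencies(negative_seeds)

# Normalize by total character count per class, correcting for the seed
# set containing more negative than positive words.
pos_total = sum(pos_freq.values()) or 1
neg_total = sum(neg_freq.values()) or 1

def char_score(c):
    """Polarity tendency of one character, in [-1, 1]."""
    p = pos_freq[c] / pos_total
    n = neg_freq[c] / neg_total
    return 0.0 if p + n == 0 else (p - n) / (p + n)

def word_score(word):
    """Score an unseen word as the average of its character scores."""
    return sum(char_score(c) for c in word) / len(word)

print(word_score("優美"))  # both characters lean positive → score near 1
```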

On the sentence level, sentiment is computed as

S_p = S_{opinion-holder} \times \sum^n_{j=1} S_{w_j}

where S_p is the sentiment score of sentence p, S_{opinion-holder} is the weight of the opinion holder, and S_{w_j} is the sentiment score of word w_j; n is the total number of sentiment words in p. The sentiment of a document is then simply the sum over its m sentences:

S_d = \sum^m_{j=1} S_{p_j}
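These two formulas translate almost directly into code. Below is a minimal sketch with a hypothetical toy lexicon and a default opinion-holder weight of 1.0; in the paper, the word scores come from the dictionary built in the previous step.

```python
# Hypothetical toy lexicon; the paper derives these scores from its
# character-frequency dictionary construction.
word_scores = {"excellent": 0.8, "terrible": -0.9, "fine": 0.3}

def sentence_score(words, holder_weight=1.0):
    """S_p = S_opinion_holder * sum of sentiment-word scores in p."""
    return holder_weight * sum(word_scores.get(w, 0.0) for w in words)

def document_score(sentences):
    """S_d = sum of the sentence scores S_{p_j}."""
    return sum(sentence_score(s) for s in sentences)

doc = [["the", "food", "was", "excellent"],
       ["service", "was", "terrible"]]
print(document_score(doc))  # > 0 means positive, < 0 means negative
```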

Extracting relevant sentences

Using a set of tf-idf inspired measures, representative terms are extracted from the corpus. Terms are considered representative of the topic when they appear in few paragraphs of many documents, or in many paragraphs of few documents.
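The paper's actual measures are more involved, but the flavor can be sketched as follows. This is my own guess at a score matching that description, not the paper's formula: it is high when a term's document spread and its within-document paragraph spread diverge (few paragraphs of many documents, or many paragraphs of few documents), and low for terms spread evenly everywhere.

```python
# `corpus` is a list of documents; each document is a list of paragraphs;
# each paragraph is a set of terms.
def representativeness(term, corpus):
    docs_with = [doc for doc in corpus if any(term in p for p in doc)]
    if not docs_with:
        return 0.0
    # Average fraction of paragraphs containing the term, within the
    # documents where it occurs at all.
    para_ratio = sum(sum(term in p for p in doc) / len(doc)
                     for doc in docs_with) / len(docs_with)
    # Fraction of documents containing the term.
    doc_ratio = len(docs_with) / len(corpus)
    # High when one ratio is high and the other low (the two extremes
    # described above); low for terms spread evenly everywhere.
    return abs(doc_ratio - para_ratio)

# Toy corpus: 3 documents, each a list of paragraph term-sets.
corpus = [
    [{"match", "score"}, {"weather"}],
    [{"match"}, {"stadium", "crowd"}],
    [{"match"}, {"recipe"}],
]
print(representativeness("match", corpus))    # in every document: high
print(representativeness("weather", corpus))  # rare everywhere: low
```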

Evaluation

The results for the determination of word sentiment are shown below:

             Verb     Noun     Average
Precision    70.07%   52.04%   61.06%
Recall       76.57%   82.26%   79.42%
F1-measure   73.18%   63.75%   68.47%

For opinion extraction at the sentence level, without any topical relevance detection, the results are as follows:

             NTCIR    BLOG
Precision    34.07%   11.41%
Recall       68.13%   56.60%
F1-measure   45.42%   18.99%

Whereas for the document level, these are the results:

             NTCIR    BLOG
Precision    40.00%   27.78%
Recall       54.55%   55.56%
F1-measure   46.16%   37.04%

For NTCIR, the keywords describing the topics are available in the annotations. Using these for topical relevance detection, the results improve considerably:

             Sentence   Document
Precision    57.80%     76.56%
Recall       67.23%     72.30%
F1-measure   62.16%     74.37%

However, when these keywords have to be extracted automatically, the results are much lower. These are, I assume, the results for document-level opinion extraction; the paper does not specify.

             NTCIR    BLOG
Precision    38.06%   23.48%
Recall       64.84%   50.94%
F1-measure   47.97%   32.58%

Discussion

A very interesting notion worked out in this paper is that the sentiment of a document is based on the sentiment of its sentences, which in turn is based on the sentiment of the words. Even though the way sentiment is combined is rather simple (more or less just summing up), the idea itself is good. Later papers have, for example, used the principle of compositional semantics, which states essentially the same thing: the meaning (including sentiment) of a phrase or sentence is derived from the meanings of the individual words and the way they are combined. In this paper, the latter part of compositional semantics is missing, as "the way they are combined" is largely ignored.

I omitted some parts of the research, for example the comparison between their word sentiment algorithm and two out-of-the-box machine learning algorithms, as I don't think that comparison is very strong. Other parts are a bit vague: for the last table of results, for example, it is not specified at which level the results were obtained, sentence or document.

Unfortunately, the aspect detection part of this algorithm turned out to be rather limited. It features a topic detection algorithm that determines, on the word level, which terms are the most relevant, and only opinion sentences containing these terms are retained. But aspects themselves are not really extracted, so sentiment analysis is not performed at the aspect level, but at the sentence/document level with respect to the main topic of the sentence/document. One point that this research clearly demonstrates is that sentiment analysis results improve when used in conjunction with a topical relevance filter, so that opinionated sentences that are off-topic are ignored. This makes perfect sense of course, but it's good to see it confirmed experimentally.

 
