Topic Sentiment Mixture:
Modeling Facets and Opinions in Weblogs
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
Department of EECS, Vanderbilt University
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Text Mining
General Terms: Algorithms
Keywords: topic-sentiment mixture, weblogs, mixture model, topic models, sentiment analysis
1 Introduction

More and more Internet users now publish online diaries and
express their opinions in Weblogs (i.e., blogs). The wide
coverage of topics, dynamics of discussion, and abundance of
opinions in Weblogs make blog data extremely valuable for mining
user opinions about all kinds of topics (e.g., products, political
figures, etc.), which in turn would enable a wide range of
applications, such as opinion search for ordinary users, opinion
tracking for business intelligence, and user behavior prediction
for targeted advertising.
Technically, the task of mining user opinions from Weblogs boils
down to sentiment analysis of blog data: identifying and
extracting positive and negative opinions from blog articles.
Although much work has been done recently on blog mining,
most existing work aims at extracting and analyzing the topical
contents of blog articles without any analysis of sentiments.
The lack of sentiment analysis in such work often
limits the effectiveness of the mining results. For example,
previous work has shown that a burst of blog mentions of a book
is correlated with a spike in sales of the book on Amazon.com.
However, a burst of criticism of a book is unlikely to indicate a
growth of the book's sales; similarly, a decrease of blog mentions
of a product might actually be caused by a decrease of complaints
about its defects. Thus, understanding the positive and negative
opinions about each topic/subtopic of the product is critical
to making more accurate predictions and decisions.
There has also been some work trying to capture the positive and
negative sentiments in Weblogs. For example, Opinmind is a
commercial weblog search engine which can categorize the search
results into positive and negative opinions. Mishne and others
analyze the sentiments and moods in Weblogs, and use the temporal
patterns of sentiments, as opposed to simple blog mentions, to
predict book sales. However, a common deficiency of all this work
is that the proposed approaches extract only the overall sentiment
of a query or a blog article; they can neither distinguish
different subtopics within a blog article, nor analyze the
sentiment of a subtopic.
Since a blog article often covers a mixture of subtopics and may
hold different opinions for different subtopics, it would be more
useful to analyze sentiments at the level of subtopics. For
example, a user may like the price and fuel
efficiency of a new Toyota Camry, but dislike its power
and safety aspects. Indeed,
people tend to have different opinions about different features of
a product [28,13].
As another example, a voter may agree with some
points made by a presidential candidate, but disagree with some
others. In reality, a general statement of good or bad about a
query is not very informative to the user, who usually wants to
drill down into different facets and explore more detailed
information (e.g., ``price'', ``battery life'', ``warranty'' of a
laptop). In all these scenarios, a more in-depth analysis of
sentiments on specific aspects of a topic would be much more
useful than an analysis of the overall sentiment of a blog article.
To improve the accuracy and utility of opinion mining from blog
data, we propose
to conduct an in-depth analysis of blog articles to reveal the
major topics in an article,
associate each topic with sentiment polarities,
and model the dynamics of each topic and its corresponding
sentiments. Such topic-sentiment analysis can potentially support
many applications. For example, it can be used to generate a more
detailed topic-sentiment summary of Weblog search results
as shown in Figure 1.
2 Problem Formulation

In this section, we formally define the general problem of
topic-sentiment analysis. Let $C$ be a set of documents (e.g.,
Weblog articles about a given topic). We assume that $C$ covers
a number of topics, or subtopics (also known as themes), and
some related sentiments. We
further assume that there are $k$ major topics (subtopics) in the
collection, $\{\theta_1, \ldots, \theta_k\}$, each being
characterized by a multinomial distribution over all the words in
our vocabulary (also known as a unigram language model). Following
[23,21,13], we assume that there are two
sentiment polarities in Weblog articles, the positive and
the negative sentiment. The two sentiments are associated
with each topic in a document, representing the positive and
negative opinions about the topic.
Definition 1 (Topic Model) A topic model $\theta_j$
in a text collection is a probabilistic distribution of
words and represents a semantically coherent
topic. Clearly, we have $\sum_{w \in V} p(w|\theta_j) = 1$;
the high probability words of a topic model often suggest what
theme the topic captures.
For example, a topic about the movie ``Da Vinci Code'' may assign
a high probability to words like ``movie'', ``Tom'' and ``Hanks''.
This definition can be easily extended to a distribution over phrases.
We assume that there are $k$ such topic models in the collection.
Definition 2 (Sentiment Model) A sentiment
model in a text collection is a probabilistic distribution
of words representing either positive opinions
($\theta_P$) or negative opinions
($\theta_N$). We have $\sum_{w \in V} p(w|\theta_P) = 1$ and
$\sum_{w \in V} p(w|\theta_N) = 1$.
Sentiment models are orthogonal to topic models in the sense that
they assign high probabilities to general words that are
frequently used to express sentiment polarities, whereas topic
models assign high probabilities to words representing
topical contents with neutral opinions.
Definition 3 (Sentiment Coverage) A
sentiment coverage of a topic in a document (or a
collection of documents) is the relative coverage of the neutral,
positive, and negative opinions about the topic in the document
(or the collection of documents).
Formally, we define a sentiment coverage of topic $j$ in document
$d$ as $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$, where
$\delta_{j,d,F}$, $\delta_{j,d,P}$, and $\delta_{j,d,N}$ are the
coverage of neutral, positive, and negative opinions,
respectively; they form a probability distribution and satisfy
$\delta_{j,d,F} + \delta_{j,d,P} + \delta_{j,d,N} = 1$.

In many applications, we also want to know how the neutral discussions,
the positive opinions, and the negative opinions about the topic
(subtopic) change over time. For this purpose, we introduce two
additional concepts, ``topic life cycle'' and ``sentiment
dynamics'', as follows.
Definition 4 (Topic Life Cycle) A topic
life cycle, also known as a theme life cycle,
is a time series representing the strength
distribution of the neutral contents of a topic over the time
line. The strength can be measured based on either the amount
of text which a topic can explain or
the relative strength of topics in a time period
[15,17]. In this paper,
we model the topic life cycles with the
amount of document content that is generated with each topic model
in different time periods.
Definition 5 (Sentiment Dynamics) The
sentiment dynamics for a topic is a time
series representing the strength distribution of a sentiment
associated with that topic. The strength indicates
how much positive/negative opinion there is
about the given topic in each time period. To be consistent with
topic life cycles, we model the sentiment dynamics with the amount
of text associated with the topic that is generated with
each sentiment model.
Based on the concepts above, we define the major tasks of
Topic-Sentiment Analysis (TSA) on weblogs as follows:
(1) Learning General Sentiment Models: Learn a sentiment
model for positive opinions and a sentiment model for negative
opinions, which are general enough to be used in new, unlabeled collections.
(2) Extracting Topic Models and Sentiment Coverages:
Given a collection of Weblog articles and the general sentiment
models learnt, customize the sentiment models to this collection,
extract the topic models, and extract the sentiment coverages.
(3) Modeling Topic Life Cycle and Sentiment Dynamics:
Model the life cycles of each topic and the dynamics of each
sentiment associated with that topic in the given collection.
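The outputs of these three tasks can be pictured concretely as data structures. The following sketch is merely illustrative: the container name and field types are our own assumptions for exposition, not part of the model definition.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TSAOutput:
    """Containers for the three TSA tasks (names are illustrative)."""
    sentiment_models: Dict[str, Dict[str, float]]  # task 1: "positive"/"negative" -> p(w | model)
    topic_models: List[Dict[str, float]]           # task 2: k neutral topic models
    coverages: Dict[str, List[List[float]]]        # task 2: doc id -> per-topic (F, P, N) coverage
    life_cycles: List[List[float]]                 # task 3: per-topic strength per time period
    sentiment_dynamics: List[List[List[float]]]    # task 3: per-topic (pos, neg) time series
```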
This problem as defined above is more challenging than many
existing topic extraction tasks and sentiment classification tasks
for several reasons. First, it is not immediately clear how to
model topics and sentiments simultaneously with a mixture model.
No existing topic extraction work [9,1,16,15,17] could extract sentiment models from
text, while no sentiment classification algorithm could model a
mixture of topics
simultaneously. Second, it is unclear how to obtain sentiment
models that are independent of the specific contents of topics
and generally applicable to
any collection representing a user's ad hoc information need.
Most existing sentiment classification methods overfit to the
specific training data provided. Finally, computing and
distinguishing topic life cycles and sentiment dynamics
is also a challenging task. In the next section, we will present a
unified probabilistic approach to solve these challenges.
3 A Mixture Model for Theme and Sentiment Analysis
3.1 The Generation Process

A lot of previous work has shown the effectiveness of mixtures of multinomial distributions (mixture language models) in extracting topics (themes, subtopics) from either plain text collections or contextualized collections [9,1,16,15,17,12]. However, none of this work models topics and sentiments simultaneously; if we apply an existing topic model to weblog articles directly, none of the topics extracted could capture the positive or negative sentiment well. To model both topics and sentiments, we also use a mixture of multinomials, but extend the model structure to include two sentiment models that naturally capture sentiments.

In the previous work [15,17], the words in a blog article are classified into two categories: (1) common English words (e.g., ``the'', ``a'', ``of'') and (2) words related to a topical theme (e.g., ``nano'', ``price'', ``mini'' in documents about iPod). The common English words are captured with a background component model [28,16,15], and the topical words are captured with topic models. In our topic-sentiment model, we extend the categories for the topical words. Specifically, for the words related to a topic, we further categorize them into three sub-categories: (1) words about the topic with neutral opinions (e.g., ``nano'', ``price''); (2) words representing positive opinions about the topic (e.g., ``awesome'', ``love''); and (3) words representing negative opinions about the topic (e.g., ``hate'', ``bad''). Correspondingly, we introduce the following multinomial distributions: (1) $\theta_B$, a background topic model to capture common English words; (2) $\theta_1, \ldots, \theta_k$, topic models to capture neutral descriptions about $k$ global subtopics in the collection; (3) $\theta_P$, a positive sentiment model to capture positive opinions; and (4) $\theta_N$, a negative sentiment model to capture negative opinions for all the topics in the collection.

According to this mixture model, an author would ``write'' a Weblog article by making the following decisions stochastically and sampling each word from the component models: (1) The author would first decide whether the word will be a common English word. If so, the word would be sampled according to $\theta_B$. (2) If not, the author would then decide which of the $k$ subtopics the word should be used to describe. (3) Once the author decides which topic the word is about, the author would further decide whether the word is used to describe the topic neutrally, positively, or negatively. (4) Let the topic picked in step (2) be the $j$-th topic $\theta_j$. The author would finally sample a word using $\theta_j$, $\theta_P$, or $\theta_N$, according to the decision in step (3). This generation process is illustrated in Figure 2.
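To make this generation story concrete, the following toy sketch samples words according to the four decisions above. All probability values, the vocabulary, and the variable names are illustrative assumptions rather than parameters of the actual model, which would be estimated from data.

```python
import numpy as np

rng = np.random.default_rng(42)
V = np.array(["the", "of", "ipod", "nano", "battery", "price",
              "love", "awesome", "hate", "bad"])

# Toy parameters; in TSM these are all estimated from data.
theta_B = np.array([.45, .45, .02, .02, .02, .02, .005, .005, .005, .005])
thetas = [np.array([.01, .01, .45, .44, .04, .04, .005, .005, .0, .0]),   # "nano" facet
          np.array([.01, .01, .10, .02, .43, .41, .0, .0, .005, .015])]  # "battery" facet
theta_P = np.array([.02, .02, .01, .01, .01, .01, .46, .43, .01, .02])   # positive
theta_N = np.array([.02, .02, .01, .01, .01, .01, .01, .02, .45, .44])   # negative

lambda_B = 0.5                        # probability of a common English word
pi_d = np.array([0.6, 0.4])           # topic choice probabilities in document d
delta_d = np.array([[0.6, 0.3, 0.1],  # per-topic coverage of (neutral, pos, neg)
                    [0.5, 0.1, 0.4]])

def sample_word():
    """One word of a blog article, following the four decisions above."""
    if rng.random() < lambda_B:               # (1) common English word?
        return rng.choice(V, p=theta_B)
    j = rng.choice(len(thetas), p=pi_d)       # (2) pick a subtopic
    s = rng.choice(3, p=delta_d[j])           # (3) neutral / positive / negative
    return rng.choice(V, p=[thetas[j], theta_P, theta_N][s])  # (4) emit the word

print(" ".join(sample_word() for _ in range(20)))
```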
3.2 The Topic-Sentiment Mixture Model

Formally, let $C$ be a collection of weblog articles,
$\theta_1, \ldots, \theta_k$ be $k$ topic models, and $\theta_P$
and $\theta_N$ be a positive and a negative sentiment model,
respectively. The log-likelihood of the whole collection according
to the TSM model is
$$\log p(C) = \sum_{d \in C} \sum_{w \in V} c(w,d) \log \Big[ \lambda_B\, p(w|\theta_B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{d,j} \big( \delta_{j,d,F}\, p(w|\theta_j) + \delta_{j,d,P}\, p(w|\theta_P) + \delta_{j,d,N}\, p(w|\theta_N) \big) \Big],$$
where $c(w,d)$ is the count of word $w$ in document $d$, $\lambda_B$ is the probability of choosing $\theta_B$, $\pi_{d,j}$ is the probability of choosing the $j$-th topic in document $d$, and $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$ is the sentiment coverage of topic $j$ in document $d$, as defined in Section 2. Similar to existing work [28,16,15,17], we regularize this model by fixing some parameters. $\lambda_B$ is set to an empirical constant between 0 and 1, which indicates how much noise we believe exists in the weblog collection. We then set the background model as
$$p(w|\theta_B) = \frac{\sum_{d \in C} c(w,d)}{\sum_{w' \in V} \sum_{d \in C} c(w',d)}.$$
The parameters remaining to be estimated are: (1) the topic models, $\theta_1, \ldots, \theta_k$; (2) the sentiment models, $\theta_P$ and $\theta_N$; (3) the document topic probabilities, $\{\pi_{d,j}\}$; and (4) the sentiment coverage for each document, $\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$. We denote the whole set of free parameters as $\Lambda$. Without any prior knowledge, we may use the maximum likelihood estimator to estimate all the parameters. Specifically, we can use the Expectation-Maximization (EM) algorithm to compute the maximum likelihood estimate iteratively; the updating formulas are shown in Figure 3. In these formulas, $\{z_{d,w}\}$ is a set of hidden variables, and $p(z_{d,w} = (j,s))$ is the probability that word $w$ in document $d$ is generated from the $j$-th topic, using topic or sentiment model $s \in \{F, P, N\}$.
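Since Figure 3 itself is not reproduced here, the following sketch derives one EM iteration directly from the mixture structure above (maximum likelihood only, no priors). Array shapes, names, and the toy initialization are our own simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def em_step(counts, theta, theta_P, theta_N, theta_B, pi, delta, lam_B):
    """One EM iteration for the TSM likelihood (no priors, no convergence test).

    counts: (D, V) word counts c(w, d); theta: (k, V) topic models;
    theta_P, theta_N, theta_B: (V,); pi: (D, k); delta: (D, k, 3) coverages."""
    k = theta.shape[0]
    # E-step: posterior of the hidden variable z_{d,w} over (topic j, category s),
    # where s indexes the neutral, positive, and negative component models.
    comp = np.stack([theta, np.tile(theta_P, (k, 1)), np.tile(theta_N, (k, 1))], axis=2)
    joint = (1 - lam_B) * pi[:, :, None, None] * delta[:, :, None, :] * comp[None]
    norm = lam_B * theta_B[None, :] + joint.sum(axis=(1, 3))  # includes background mass
    post = joint / norm[:, None, :, None]                     # (D, k, V, 3)
    # M-step: normalize expected counts.
    ec = counts[:, None, :, None] * post
    pi_new = ec.sum(axis=(2, 3)); pi_new /= pi_new.sum(axis=1, keepdims=True)
    delta_new = ec.sum(axis=2); delta_new /= delta_new.sum(axis=2, keepdims=True)
    theta_new = ec[:, :, :, 0].sum(axis=0); theta_new /= theta_new.sum(axis=1, keepdims=True)
    theta_P_new = ec[:, :, :, 1].sum(axis=(0, 1)); theta_P_new /= theta_P_new.sum()
    theta_N_new = ec[:, :, :, 2].sum(axis=(0, 1)); theta_N_new /= theta_N_new.sum()
    return theta_new, theta_P_new, theta_N_new, pi_new, delta_new

# Toy run with random initialization.
rng = np.random.default_rng(0)
D, V, k = 4, 12, 2
counts = rng.integers(0, 5, size=(D, V)).astype(float)
theta = rng.dirichlet(np.ones(V), size=k)
theta_P, theta_N, theta_B = rng.dirichlet(np.ones(V), size=3)
pi = np.full((D, k), 1.0 / k)
delta = np.full((D, k, 3), 1.0 / 3)
for _ in range(20):
    theta, theta_P, theta_N, pi, delta = em_step(
        counts, theta, theta_P, theta_N, theta_B, pi, delta, lam_B=0.5)
```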
3.3 Defining Model Priors

The prior distribution should tell the TSM what the sentiment
models should look like in the working collection.
This knowledge may be obtained from domain-specific lexicons, or
from training data in the same domain. However, it is
impossible to have such knowledge or training data for every ad
hoc topic or query. Therefore, we want the prior sentiment
models to be general enough to apply to any ad hoc topic. In this
section, we show how we may exploit an online sentiment retrieval
service such as Opinmind to induce a general prior
on the sentiment models.
When given a query, Opinmind can retrieve positive sentences and
negative sentences, thus we can obtain examples with sentiment
labels for a topic (i.e., the query)
from Opinmind. The query can be regarded as a topic label. To
ensure diversity of topics, we can
submit various queries to Opinmind and mix all the results to form
a training collection. Presumably, if the topics in this training
collection are diversified enough, the sentiment models learnt
would be very general.
With such a training collection,
we have topic labels and sentiment labels for each document.
Formally, for each document $d$ we have a topic label $t_d$ and a
sentiment label $s_d$, where $t_d$ indicates which topic the
document is about, and $s_d$ indicates whether $d$ holds positive
or negative opinions about the topic. We then
use the topic-sentiment model presented in
Section 3.2 to fit the training data and estimate
the sentiment models. Since we have topic and sentiment labels, we
impose the following
constraints: (1) $\pi_{d,j} = 0$ if $t_d \neq j$; and
(2) $\delta_{j,d,P} = 0$ if $s_d$ is negative, and
$\delta_{j,d,N} = 0$ if $s_d$ is positive.
In Section 5, we will show that this estimation method
is effective for extracting general sentiment models, and that the
diversity of topics helps improve the generality of the sentiment
models.
Rather than directly using the learnt sentiment models to analyze
our target collection, we use them to
define a prior on the sentiment models and estimate the sentiment
models (and the topic models) using the maximum a posteriori
estimator. This allows us to adapt the general sentiment
models to our collection and further improve the accuracy of the
sentiment models, which is traditionally done in a domain-dependent
way. Specifically, let $\bar{\theta}_P$ and $\bar{\theta}_N$
be the positive and negative sentiment models
learnt from some training collections.
We define the following two conjugate Dirichlet priors for the
sentiment models $\theta_P$ and $\theta_N$, respectively:
$$Dir\big(\{1 + \mu_P\, p(w|\bar{\theta}_P)\}_{w \in V}\big) \quad \text{and} \quad Dir\big(\{1 + \mu_N\, p(w|\bar{\theta}_N)\}_{w \in V}\big),$$
where the parameters $\mu_P$ and $\mu_N$ indicate how strong our
confidence is in the sentiment model priors.
Since the prior is conjugate, $\mu_P$ (or $\mu_N$) can be
interpreted as an ``equivalent sample size'', which means that the
impact of adding the prior would be equivalent to adding
$\mu_P\, p(w|\bar{\theta}_P)$ pseudo counts for word $w$ when
estimating the sentiment model $\theta_P$.
If we have some prior knowledge on the topic models, we can also
define a conjugate prior $Dir(\{1 + \mu_j\, p(w|\bar{\theta}_j)\}_{w \in V})$
for some topic model $\theta_j$. Indeed, given
a topic, a user often has some knowledge about what aspects are
interesting.
For example, when the user is searching for laptops, we know that
he is very likely interested in ``price'' and ``configuration''.
It would be nice if we could ``guide'' the model to enforce two of the
topic models to be as close as possible to the predefined facets.
Therefore, in general, we may assume that the prior on all the
parameters in the model is
$$p(\Lambda) \propto \prod_{j=1}^{k} \prod_{w \in V} p(w|\theta_j)^{\mu_j p(w|\bar{\theta}_j)} \prod_{w \in V} p(w|\theta_P)^{\mu_P p(w|\bar{\theta}_P)} \prod_{w \in V} p(w|\theta_N)^{\mu_N p(w|\bar{\theta}_N)},$$
where $\mu_j = 0$ if we do not have prior knowledge on $\theta_j$.
3.4 Maximum A Posteriori Estimation

With the prior defined above,
we may use the MAP estimator
$$\hat{\Lambda} = \arg\max_{\Lambda} p(C|\Lambda)\, p(\Lambda),$$
which can be computed by rewriting the M-step in the EM algorithm in
Section 3.2 to incorporate the pseudo counts
given by the prior $p(\Lambda)$. The new M-step updating
formulas simply add the pseudo counts (e.g., $\mu_P\, p(w|\bar{\theta}_P)$
for the positive sentiment model) to the corresponding expected
counts before normalization.
The parameters $\mu_P$ and $\mu_N$ can be either empirically set to constants, or set through regularized estimation, in which we would start with a very large $\mu$ and then gradually discount it in each EM iteration until some stopping condition is satisfied.
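To make the pseudo-count interpretation concrete, here is a minimal sketch of how the sentiment-model M-step would change under the conjugate prior. The variable `expected_counts_P` stands for the expected counts of words assigned to the positive sentiment model in one EM iteration; all names, toy values, and the 0.9 decay factor are illustrative assumptions, not values from the paper.

```python
import numpy as np

def map_m_step_sentiment(expected_counts_P, prior_P, mu_P):
    """M-step for theta_P with a conjugate Dirichlet prior: the prior acts
    exactly like mu_P * p(w | prior model) extra (pseudo) counts per word."""
    theta_P = expected_counts_P + mu_P * prior_P
    return theta_P / theta_P.sum()

# Regularized estimation: start with a large mu_P and discount it in each
# EM iteration (decay factor illustrative) until a stopping condition holds.
mu_P, decay = 10_000.0, 0.9
expected_counts_P = np.array([3.0, 1.0, 0.5, 7.5])  # toy expected counts
prior_P = np.array([0.4, 0.4, 0.1, 0.1])            # general model, e.g. from OPIN
for _ in range(5):
    theta_P = map_m_step_sentiment(expected_counts_P, prior_P, mu_P)
    mu_P *= decay
print(theta_P)
```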
3.5 Utilizing the Model

Once the parameters in the model are estimated, many tasks can be done by utilizing the model parameters.
- Rank sentences for topics: Given a set of sentences and a
topic $\theta_j$, we can rank the sentences according to the topic with
the score $\sum_{w \in V} p(w|s) \log p(w|\theta_j)$,
where $p(w|s)$ is a smoothed language model of sentence $s$
(see the sketch after this list).
- Categorize sentences by sentiments: Given a sentence $s$ assigned to topic
$\theta_j$, we can assign $s$ to the positive, negative, or neutral sentiment
category with $\hat{u} = \arg\max_{u \in \{\theta_j, \theta_P, \theta_N\}} \sum_{w \in V} p(w|s) \log p(w|u)$,
where $p(w|s)$ is a language model of $s$.
- Reveal the overall opinions for documents/topics: Given a document
$d$ and a topic $\theta_j$, the overall sentiment distribution for $\theta_j$ in
$d$ is the sentiment coverage
$\{\delta_{j,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$. The overall sentiment
strength (e.g., positive sentiment) for the topic in the whole collection can
then be computed by aggregating these coverages, e.g., as
$\sum_{d \in C} \pi_{d,j}\, \delta_{j,d,P} / \sum_{d \in C} \pi_{d,j}$.
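As an illustration of the first two operations, here is a self-contained toy sketch that ranks sentences for a topic by the score above and then labels the top sentence with the best-fitting sentiment category. All models, probabilities, and example sentences are invented for illustration.

```python
import math
from collections import Counter

# Toy unigram models (hypothetical probabilities over a tiny vocabulary).
vocab   = ["movie", "book", "tom", "hanks", "love", "great", "hate", "boring"]
theta_j = dict(zip(vocab, [.35, .05, .25, .25, .02, .03, .02, .03]))  # "movie" facet
theta_P = dict(zip(vocab, [.05, .05, .02, .02, .40, .40, .03, .03]))  # positive
theta_N = dict(zip(vocab, [.05, .05, .02, .02, .03, .03, .40, .40]))  # negative

def lm(sentence, eps=0.01):
    """Laplace-smoothed sentence language model p(w | s)."""
    tf = Counter(w for w in sentence.lower().split() if w in vocab)
    z = sum(tf.values()) + eps * len(vocab)
    return {w: (tf[w] + eps) / z for w in vocab}

def score(sentence, model):
    """sum_w p(w|s) log p(w|model): higher means s fits the model better."""
    p_s = lm(sentence)
    return sum(p_s[w] * math.log(model[w]) for w in vocab)

sentences = ["tom hanks stars in the movie", "i hate the boring book"]
ranked = sorted(sentences, key=lambda s: score(s, theta_j), reverse=True)
cats = {"neutral": theta_j, "positive": theta_P, "negative": theta_N}
print(ranked[0], "->", max(cats, key=lambda c: score(ranked[0], cats[c])))
```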
4 Sentiment Dynamics Analysis

While the TSM model can be directly used to
analyze topics and sentiments in many ways, it does not directly
model the topic life cycles or sentiment dynamics. In addition to
associating the sentiments with multiple subtopics, we
would also like to show how the positive/negative opinions about a
given subtopic change over time. The comparison of such temporal
patterns (i.e., topic life cycles and corresponding sentiment
dynamics) could potentially provide a more in-depth understanding of
public opinions, and yield more accurate
predictions of user behavior than the methods proposed in
previous work.
To achieve this goal, we could approximate these temporal patterns
by partitioning documents into their corresponding time periods
and computing the posterior probability of each topic and
sentiment model within each time period. This approach has the
limitation that these posterior distributions are not well defined,
because the time variable is nowhere involved in the original model.
An alternative approach would be to model the time variable
explicitly, as in [15,17], but this
would bring many more free parameters into the model, making it
harder to estimate all the parameters reliably.
Defining a good partition of the time line is also a challenging
problem, since too coarse a partition would miss many bursting
patterns, while too fine-grained a time period may not be
estimated reliably because of data sparseness.
In this work, we present another approach to extracting topic life
cycles and sentiment dynamics, similar to a method used in previous
work. Specifically, we use a hidden Markov model (HMM) to
tag every word in the collection with a topic and a sentiment.
Once all words are tagged, the topic life cycles and sentiment
dynamics can be extracted by counting the tagged words over time.
We first sort the documents by their time stamps, and convert
the whole collection into a long sequence of words.
On the surface, it appears that we could
construct an HMM with each state corresponding to a topic
model (including the background model), and
set the output probability of state $\theta_j$ to $p(w|\theta_j)$. A
topic state can either stay on itself or transit to
some other topic state through the background state. The system
can learn the transition probabilities from our collection with
the Baum-Welch algorithm and decode the
collection sequence with the Viterbi algorithm.
We can easily model sentiments by adding two sentiment states to
the HMM. Unfortunately, this structure cannot decode which
sentiment word is about which topic. Below, we present an
alternative HMM structure (shown in Figure 4) that can
better serve our purpose.
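For concreteness, the decoding step can be sketched as follows. This is a generic Viterbi decoder over toy matrices, not the specific HMM structure of Figure 4; all sizes, matrices, and the random setup are illustrative assumptions. Once every word is tagged with a state, counting words per state within each time period yields the topic life cycles and sentiment dynamics.

```python
import numpy as np

def viterbi(obs, log_trans, log_emit, log_init):
    """Most likely state sequence for a word sequence.

    obs: word ids; log_trans: (S, S); log_emit: (S, V); log_init: (S,).
    States would correspond to topic, sentiment, and background models."""
    dp = log_init + log_emit[:, obs[0]]
    back = []
    for w in obs[1:]:
        cand = dp[:, None] + log_trans          # cand[i, j]: end in j coming from i
        back.append(cand.argmax(axis=0))
        dp = cand.max(axis=0) + log_emit[:, w]
    path = [int(dp.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Toy setup: e.g. 2 topic states, positive, negative, and background (S = 5).
rng = np.random.default_rng(1)
S, V = 5, 30
emit = rng.dirichlet(np.ones(V), size=S)    # p(w | state), e.g. p(w | theta_j)
trans = rng.dirichlet(np.ones(S), size=S)   # transition probabilities
init = np.full(S, 1.0 / S)
obs = rng.integers(0, V, size=12)
print(viterbi(obs, np.log(trans), np.log(emit), np.log(init)))
```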
5 Experiments and Results
5.1 Data Sets

We need two types of data sets for evaluation. One is used to
learn the general sentiment priors, and thus should have labels for
positive and negative sentiments.
In order to extract very general sentiment models, we want the
topics in this data set to be as diversified as possible.
We construct this training data set by leveraging an existing
weblog sentiment retrieval system
(i.e., Opinmind.com): we submit different
queries to Opinmind and mix the downloaded classified results.
This also gives us natural boundaries of topics in the training
collection. The composition
of this training data set (denoted as ``OPIN'') is shown in
Table 1.
The other type of data is used to evaluate the extraction of topic models, topic life cycles, and sentiment dynamics. Such data do not need to have sentiment labels, but should have time stamps, and should be able to represent users' ad hoc information needs. We construct these data sets by submitting time-bounded queries to Google Blog Search and collecting the blog entries returned. We restrict the search domain to spaces.live.com, since schema matching is not our focus. The basic information of these test collections (denoted as ``TEST'') is shown in Table 2.
For all the weblog collections, the Krovetz stemmer is used to stem the text.
5.2 Sentiment Model Extraction

Our first experiment evaluates the
effectiveness of learning the prior models for sentiments. As
discussed in Section 3.3, a good sentiment model prior
should not depend on the specific
features of particular topics, and should be general enough to guide the
learning of sentiment models for unseen topics.
The more diversified the topics of the training set are, the more
general the estimated sentiment models should be.
To evaluate the effectiveness of our TSM model on this task, we
collect labeled results for 10 different topics from Opinmind, each
of which consists of an average of 5 queries. We then construct a
series of training data sets, such that for each number of topics
$m$ ($1 \leq m < 10$), there are 10 training data sets, each of
which is a mixture of $m$ topics. We then apply the TSM model to
each data set and extract sentiment models accordingly. We also
construct a data set with the mixture of all 10 topics.
The top words of the sentiment models extracted from the
10-topic-mixture data set and those from single-topic data sets
are compared in Table 3.
The left two columns in Table 3 present the two sentiment models extracted from the 10-topic-mixture data set; these are more general than the two columns in the middle and the two on the right, which are extracted from two single-topic data sets (``movies'' and ``cities''), respectively. In the two middle columns, we see terms like ``harry'', ``pot'', ``brokeback'', and ``mountain'' ranked highly in the sentiment models. These words are actually part of our query terms. We also see other domain-specific terms such as ``movie'', ``series'', ``gay'', and ``watch''. In the sentiment models from the ``cities'' data set, we removed all query terms from the top words. However, we can still notice words like ``night'' and ``air'' in the positive model, and ``traffic'', ``weather'', and ``transport'' in the negative model. This indicates that the sentiment models are highly biased towards the specific features of the topic if the training data set contains only one topic.

To evaluate this in a more principled way, we conduct a 10-fold cross validation, which numerically measures the closeness of the sentiment models learnt from a mixture of topics to those from an unseen topic (i.e., a topic not in the mixture). Intuitively, a sentiment model is less biased if it is closer to unseen topics. The closeness of two sentiment models is measured with the Kullback-Leibler (KL) divergence
$$D(\theta_1 \,\|\, \theta_2) = \sum_{w \in V} p(w|\theta_1) \log \frac{p(w|\theta_1)}{p(w|\theta_2)},$$
where $\theta_1$ and $\theta_2$ are two sentiment models (e.g., one learnt from a mixed-topic collection and one from a single-topic collection). We use a simple Laplace smoothing method to guarantee that $p(w|\theta_2) > 0$. The result of the cross validation is presented in Figure 5.
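A direct transcription of this smoothed divergence, assuming the sentiment models are stored as word-to-probability dictionaries (the smoothing constant and toy models are illustrative):

```python
import math

def kl_divergence(p, q, vocab, eps=1e-4):
    """D(p || q) = sum_w p(w) log(p(w) / q(w)); both models are Laplace-smoothed
    over the shared vocabulary so that q(w) > 0 for every word."""
    zp = sum(p.get(w, 0.0) + eps for w in vocab)
    zq = sum(q.get(w, 0.0) + eps for w in vocab)
    div = 0.0
    for w in vocab:
        pw = (p.get(w, 0.0) + eps) / zp
        qw = (q.get(w, 0.0) + eps) / zq
        div += pw * math.log(pw / qw)
    return div

# Toy models: a positive model from a topic mixture vs. one from an unseen topic.
pos_mixture = {"love": 0.5, "great": 0.3, "nice": 0.2}
pos_unseen = {"love": 0.4, "great": 0.2, "cool": 0.4}
vocab = set(pos_mixture) | set(pos_unseen)
print(kl_divergence(pos_mixture, pos_unseen, vocab))
```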
5.3 Topic Model Extraction

Our second experiment is to fit the TSM to ad hoc weblog collections and extract the topic models and sentiment coverages. As discussed in Section 3.3, the general sentiment models learnt from the OPIN data set are used as a strong prior for the sentiment models in a given collection. We expect the extracted topic models to be unbiased towards sentiment polarities and to simply represent the neutral contents of the topics. In the experiments, we set the initial values of $\mu_P$ and $\mu_N$ reasonably large (10,000), and use the regularized estimation strategy to gradually decay them; $\lambda_B$ is set empirically. Some informative topic models extracted from the TEST data sets are shown in Tables 4 and 5.
As discussed in Section 3.4, we can extract topic models either in a completely unsupervised way, or based on some prior of what the topic models should look like. In Tables 4 and 5, the left three columns are topic models extracted without prior knowledge, and the right columns are those extracted with the bold titles as priors. We see that the topics extracted in either way are informative and coherent. The ones extracted with priors are extremely clear and distinctive, such as ``Nano'' and ``battery'' for the query ``iPod''. This is quite desirable in summarizing search results, where the system could extract topics in an interactive way with the user. For example, the user can input several words as expected facets, and the system uses these words (e.g., ``movie'', ``book'', ``history'' for the query ``Da Vinci Code'' and ``battery'', ``price'' for the query ``iPod'') as priors on some topic models, and lets the remaining topics be extracted in the unsupervised way.

With the topic models and sentiment models extracted, we can summarize the sentences in blog search results by first ranking sentences according to different topics, and then assigning them to sentiment categories. Table 6 shows the summarized results for the query ``Da Vinci Code''. We show two facets of the results: ``movie'' and ``book''. Although both the movie and the book are named ``The Da Vinci Code'', many people hold different opinions about them. Table 6 organizes the sentences retrieved for the query ``da vinci code'' by their relevance to each facet, and categorizes them as positive, negative, and neutral opinions. The sentences do not have to contain the facet name, such as ``Tom Hanks stars in the movie''. The bolded sentence clearly presents an example of mixed topics. We also notice that the system sometimes makes wrong classifications. For example, the sentence ``anybody is interested in it?'' is misclassified as positive. This is because we rely on a unigram language model for the sentiments, and the ``bag of words'' assumption does not consider word dependency and linguistics. This problem can be tackled when phrases are used as the basis of the sentiment models.
In Table 7, we compare the query summarization of our model to that of Opinmind. The left two columns are search results summarized with TSM, and the right two columns are top results from Opinmind, for the same query ``iPod''. We see that Opinmind tends to rank the sentences with the strongest sentiments at the top, but many of these are not very informative. For example, although the sentences ``I love iPod'' and ``I hate iPod'' do reflect strong attitudes, they do not give the user as much useful information as ``out of battery again''. Our system, on the other hand, reveals the hidden facets of people's opinions. In the results from Opinmind, we do notice that some sentences are about specific aspects of iPod, such as ``battery'', ``video'', and ``microsoft'' (indicating marketing). Unfortunately, this useful information is mixed together. Our system organizes the sentences according to the hidden aspects, which gives the user a deeper understanding of the opinions about the query.
5.4 Topic Life Cycle and Sentiment Dynamics

Based on the topic models and sentiment models learnt from the TEST collections, we evaluate the effectiveness of the HMM-based method presented in Section 4 on the extraction of topic life cycles and sentiment dynamics. Intuitively, we expect the sentiment models to explain as much information as possible, since the most useful patterns are the sentiment dynamics. In our experiments, we force the transition probabilities from topic states to sentiment states, and those from sentiment states to themselves, to be reasonably large (e.g., 0.25). Selected results of topic life cycles and sentiment dynamics are presented in Figure 6.
6 Related Work

To the best of our knowledge, modeling the mixture of topics and
sentiments has not been addressed in existing work. However, there
are several lines of related work.

Weblogs have been attracting increasing attention from
researchers, who consider weblogs a suitable test bed for many
novel research problems and algorithms [11,7,6,15,19]. Much new
research has found applications in weblog analysis, such as
community evolution, spatiotemporal text mining,
opinion tracking [20,15,19], information
propagation, and user behavior prediction.
Mei and others introduced a
mixture model to extract the subtopics in weblog collections, and
track their distribution over time and locations.
Gruhl and others proposed a model for information
propagation, detected spikes in the diffusing topics in weblogs,
and used bursts of blog mentions of a book to predict spikes in
its sales in the near future. However, all these models tend to
ignore the sentiments in the weblogs, and only capture the general
descriptions of topics, which may limit the usefulness of their
results. Mishne and others
instead used the temporal pattern of sentiments to predict
book sales. Opinmind summarizes weblog search
results with positive and negative categories. On the other hand,
researchers also use facets to categorize the latent topics in
search results. However, all this work ignores
the correlation between topics and sentiments.
This limitation is shared with other sentiment analysis work, such
as the work discussed below.

Sentiment classification has been a challenging topic in natural
language processing (see, e.g., [26,2]). The most
common definition of the problem is a binary classification
of a sentence into either the positive or the negative polarity
[23,21]. Since traditional text categorization
methods perform poorly on sentiment classification,
Pang and Lee proposed a method using a mincut algorithm to extract
sentiments and subjective summarization for movie reviews.
In some recent work, the definition of the sentiment
classification problem has been generalized to a rating
scale. The goal of this line of work is to improve the
classification accuracy, while we aim at mining useful information
(topic/sentiment models, sentiment dynamics) from weblogs. These
methods neither consider the correlation between sentiments and
topics, nor model sentiment dynamics.
Some recent work has been aware of this limitation. Engström
studied how topic dependence influences the accuracy of
sentiment classification and tried to reduce this
dependence. In a very recent work, the
authors proposed a topic-dependent method for sentiment retrieval,
which assumed that a sentence was generated from a probabilistic
model consisting of both a topic language model and a sentiment
language model; a similar approach can be found elsewhere.
Their vision of topic-sentiment dependency is similar to ours.
However, they do not consider the mixture of topics in text,
while we assume that a document can cover multiple subtopics with
different sentiments. Their model requires that a set of topic
keywords be given by the user, while our method is more flexible
and can extract the topic models in an
unsupervised or semi-supervised way with an EM algorithm. They also
require sentiment training data for every topic, or manually
input sentiment keywords, while we can learn general sentiment
models applicable to ad hoc topics.
Most opinion extraction work tries to find general opinions on a
given topic but does not distinguish sentiments [28,15]. Liu and
others extracted product features and opinion
features for a product, and were thus able to provide sentiments for
different features of a product. However, those product opinion
features are highly dependent on the training data sets, and thus are
not flexible enough to deal with ad hoc queries and topics. Related
feature-based methods share the same problem. They also do not
provide a way to model sentiment dynamics.
There is yet another line of research in text mining which tries
to model the mixture of topics (themes) in documents. The
mixture model we presented is along this line. However, none of
this work has tried to model the sentiments associated with the
topics, and thus it cannot be applied to our problem. We do
notice, however, that the TSM model is a special case of some very
general topic models, such as the CPLSA model, which mixes
themes with different views (topic, sentiment) and different
coverages (sentiment coverages). The generation structure in
Figure 2 is also related to the general DAG
structure of pachinko allocation.
7 Conclusions
In this paper, we formally define the problem of topic-sentiment
analysis and propose a new probabilistic topic-sentiment mixture
model (TSM) to solve it.
With this model, we can effectively (1) learn general sentiment
models; (2) extract topic models orthogonal to sentiments, which
represent the neutral content of a subtopic; and (3) extract
topic life cycles and the associated sentiment dynamics.
We evaluate our model on different Weblog collections; the results
show that the TSM model is effective for topic-sentiment analysis,
generating more useful topic-sentiment result summaries for blog
search than a state-of-the-art blog opinion search engine.
There are several interesting extensions to our work. In this
work, we assume that the content of the sentiment models is the same
for all topics in a collection. It would be interesting to
customize the sentiment models according to each topic and obtain
different contextual views of sentiments on
different facets. Another interesting future direction is to
further explore other applications of the TSM, such as user
behavior prediction.
8 Acknowledgments

We thank the three anonymous reviewers for their comments. This work is in part supported by the National Science Foundation under award numbers 0425852, 0347933, and 0428472.
- D. M. Blei, A. Y. Ng, and M. I. Jordan.
Latent dirichlet allocation.
J. Mach. Learn. Res., 3:993-1022, 2003.
- Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan.
Identifying sources of opinions with conditional random fields and extraction patterns.
In Proceedings of HLT-EMNLP 2005, 2005.
- A. P. Dempster, N. M. Laird, and D. B. Rubin.
Maximum likelihood from incomplete data via the EM algorithm.
Journal of Royal Statist. Soc. B, 39:1-38, 1977.
- K. Eguchi and V. Lavrenko.
Sentiment retrieval using generative models.
In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 345-354, July 2006.
- C. Engström.
Topic dependence in sentiment classification.
Master's thesis, University of Cambridge.
- D. Gruhl, R. Guha, R. Kumar, J. Novak, and A. Tomkins.
The predictive power of online chatter.
In Proceedings of KDD '05, pages 78-87, 2005.
- D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins.
Information diffusion through blogspace.
In Proceedings of the 13th International Conference on World Wide Web, pages 491-501, 2004.
- M. A. Hearst.
Clustering versus faceted categories for information exploration.
Commun. ACM, 49(4):59-61, 2006.
- T. Hofmann.
Probabilistic latent semantic indexing.
In Proceedings of SIGIR '99, pages 50-57, 1999.
- R. Krovetz.
Viewing morphology as an inference process.
In Proceedings of SIGIR '93, pages 191-202, 1993.
- R. Kumar, J. Novak, P. Raghavan, and A. Tomkins.
On the bursty evolution of blogspace.
In Proceedings of the 12th International Conference on World Wide Web, pages 568-576, 2003.
- W. Li and A. McCallum.
Pachinko allocation: Dag-structured mixture models of topic correlations.
In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 577-584, 2006.
- B. Liu, M. Hu, and J. Cheng.
Opinion observer: analyzing and comparing opinions on the web.
In WWW '05: Proceedings of the 14th international conference on World Wide Web, pages 342-351, 2005.
- G. J. McLachlan and T. Krishnan.
The EM Algorithm and Extensions.
Wiley, 1997.
- Q. Mei, C. Liu, H. Su, and C. Zhai.
A probabilistic approach to spatiotemporal theme pattern mining on weblogs.
In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 533-542, 2006.
- Q. Mei and C. Zhai.
Discovering evolutionary theme patterns from text: an exploration of temporal text mining.
In Proceedings of KDD '05, pages 198-207, 2005.
- Q. Mei and C. Zhai.
A mixture model for contextual text mining.
In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 649-655, 2006.
- G. Mishne and M. de Rijke.
MoodViews: Tools for blog mood analysis.
In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006), pages 153-154, 2006.
- G. Mishne and N. Glance.
Predicting movie sales from blogger sentiment.
In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006), 2006.
- B. Pang and L. Lee.
A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts.
In Proceedings of the ACL, pages 271-278, 2004.
- B. Pang and L. Lee.
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.
In Proceedings of the ACL, pages 115-124, 2005.
- B. Pang, L. Lee, and S. Vaithyanathan.
Thumbs up? Sentiment classification using machine learning techniques.
In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86, 2002.
- L. Rabiner.
A tutorial on hidden markov models and selected applications in speech recognition.
Proc. of the IEEE, 77(2):257-285, Feb. 1989.
- T. Tao and C. Zhai.
Regularized estimation of mixture models for robust pseudo-relevance feedback.
In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 162-169, 2006.
- J. Wiebe, T. Wilson, and C. Cardie.
Annotating expressions of opinions and emotions in language.
Language Resources and Evaluation (formerly Computers and the Humanities), 39, 2005.
- J. Yi, T. Nasukawa, R. C. Bunescu, and W. Niblack.
Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques.
In Proceedings of ICDM 2003, pages 427-434, 2003.
- C. Zhai, A. Velivelli, and B. Yu.
A cross-collection mixture model for comparative text mining.
In Proceedings of KDD '04, pages 743-748, 2004.