Abstract—Microblog summarization can save users a large amount
of browsing time. However, summarizing microblogs is more challenging
than summarizing traditional documents because of the
heavy noise and severe sparsity of posts. In this paper, we
propose an unsupervised method named TR-LDA for
summarizing microblogs by cascading two key-bigram extractors
based on TextRank and Latent Dirichlet Allocation (LDA).
The cascading strategy yields a key-bigram set with better
noise immunity. Two sentence ranking strategies are proposed
based on the key-bigram set. Moreover, a sentence extraction
approach is proposed that merges the two ranking results.
Compared with several other text-content-based summarizers, the
proposed method performed better in
experiments on a Sina Weibo dataset.
Index Terms—Key-bigram extraction, microblog
summarization, sentence extraction, TR-LDA.
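The cascade summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bigram co-occurrence graph, the order of the two stages, the cut-offs `keep_stage1` and `keep_stage2`, and the `topic_weight` dictionary (a hypothetical stand-in for word probabilities taken from a fitted LDA topic) are all assumptions made for illustration. The final function shows only one simple way to rank posts from a key-bigram set.

```python
from collections import defaultdict
from itertools import combinations


def bigrams(tokens):
    """Adjacent word pairs of one tokenized post."""
    return list(zip(tokens, tokens[1:]))


def textrank_scores(posts, damping=0.85, iterations=50):
    """PageRank over a graph whose nodes are bigrams. Two bigrams are
    linked when they occur in the same post (an assumed graph design)."""
    neighbors = defaultdict(set)
    for tokens in posts:
        grams = sorted(set(bigrams(tokens)))
        for g in grams:
            neighbors[g]  # register the node even if it stays isolated
        for a, b in combinations(grams, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    scores = {g: 1.0 / n for g in neighbors}
    for _ in range(iterations):
        scores = {
            g: (1 - damping) / n
            + damping * sum(scores[m] / len(neighbors[m]) for m in neighbors[g])
            for g in neighbors
        }
    return scores


def cascade_key_bigrams(posts, topic_weight, keep_stage1, keep_stage2):
    """Stage 1 keeps the top TextRank bigrams. Stage 2 re-scores only those
    survivors with a topic weight (stand-in for LDA) and keeps the best."""
    tr = textrank_scores(posts)
    stage1 = sorted(tr, key=lambda g: (-tr[g], g))[:keep_stage1]

    def topic_score(g):
        # Hypothetical scoring: product of the two words' topic weights.
        return topic_weight.get(g[0], 0.0) * topic_weight.get(g[1], 0.0)

    stage2 = sorted(stage1, key=lambda g: (-topic_score(g), g))[:keep_stage2]
    return set(stage2)


def rank_posts(posts, key_bigrams):
    """Order post indices by how many key bigrams each post contains."""
    hits = [sum(g in key_bigrams for g in bigrams(t)) for t in posts]
    return sorted(range(len(posts)), key=lambda i: -hits[i])


if __name__ == "__main__":
    # Toy data: three on-topic posts and one piece of chatter.
    toy_posts = [
        ["earthquake", "hits", "city"],
        ["earthquake", "hits", "coast"],
        ["earthquake", "hits", "hard"],
        ["lol", "so", "bored"],
    ]
    toy_weights = {"earthquake": 0.4, "hits": 0.3, "city": 0.1}
    keys = cascade_key_bigrams(toy_posts, toy_weights, keep_stage1=3, keep_stage2=1)
    print(keys, rank_posts(toy_posts, keys))
```

In this toy setting the graph-based stage alone can let frequent chatter bigrams through, and the second stage removes those that carry no topic weight, which is the intuition behind the noise immunity claimed for the cascade.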
The authors are with the Institute of Automation, Chinese Academy of
Sciences, 95 Zhongguancun East Road, Beijing, 100190, China (e-mail:
yufang.wu@ia.ac.cn).
Cite: Yufang Wu, Heng Zhang, Bo Xu, Hongwei Hao, and Chenglin Liu, "TR-LDA: A Cascaded Key-Bigram Extractor for Microblog Summarization," International Journal of Machine Learning and Computing, vol. 5, no. 3, pp. 172-178, 2015.