IJMLC 2015 Vol. 5(3): 172-178 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.503

TR-LDA: A Cascaded Key-Bigram Extractor for Microblog Summarization

Yufang Wu, Heng Zhang, Bo Xu, Hongwei Hao, and Chenglin Liu

Abstract—Microblog summarization can save users a large amount of time when browsing. However, summarizing microblogs is more challenging than summarizing traditional documents because posts are heavily noisy and severely sparse. In this paper, we propose an unsupervised method, TR-LDA, that summarizes microblogs by cascading two key-bigram extractors based on TextRank and Latent Dirichlet Allocation (LDA). The cascading strategy yields a key-bigram set with better noise immunity. Two sentence-ranking strategies are proposed based on this key-bigram set, and sentences are then extracted by merging the two ranking results. In experiments on a Sina Weibo dataset, the proposed method outperformed several other text-content-based summarizers.

Index Terms—Key-bigram extraction, microblog summarization, sentence extraction, TR-LDA.

The authors are with the Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China (e-mail: yufang.wu@ia.ac.cn).


Cite: Yufang Wu, Heng Zhang, Bo Xu, Hongwei Hao, and Chenglin Liu, "TR-LDA: A Cascaded Key-Bigram Extractor for Microblog Summarization," International Journal of Machine Learning and Computing, vol. 5, no. 3, pp. 172-178, 2015.

General Information

  • ISSN: 2010-3700 (Online)
  • Abbreviated Title: Int. J. Mach. Learn. Comput.
  • Frequency: Bimonthly
  • DOI: 10.18178/IJMLC
  • Editor-in-Chief: Dr. Lin Huang
  • Executive Editor: Ms. Cherry L. Chen
  • Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library.
  • E-mail: ijmlc@ejournal.net
