IJMLC 2014 Vol.4(4): 313-318 (Aug. 2014) ISSN: 2010-3700
DOI: 10.7763/IJMLC.2014.V4.430

A LDA-Based Approach for Semi-Supervised Document Clustering

Ruizhang Huang, Ping Zhou, and Li Zhang

Abstract—In this paper, we develop LLDA, an approach to semi-supervised document clustering based on Latent Dirichlet Allocation (LDA). A small number of labeled documents is used to indicate the user's document grouping preference. A generative model jointly models the documents and this small set of document labels, and a variational inference algorithm infers the structure of the document collection. We evaluate the approach on both a synthetic dataset and realistic document datasets. The experiments indicate that it groups documents well under different user grouping preferences. A comparison with state-of-the-art semi-supervised clustering algorithms that use labeled instances shows that the approach is effective.

Index Terms—Semi-supervised clustering, document clustering, latent Dirichlet allocation, generative model.

The authors are with the College of Computer Science and Technology, Guizhou University, Guiyang 550025, China (corresponding author: Li Zhang; e-mail: cse.rzhuang@gzu.edu.cn, gs.pzhou11@mail.gzu.edu.cn, lizhang_2004@126.com).


Cite: Ruizhang Huang, Ping Zhou, and Li Zhang, "A LDA-Based Approach for Semi-Supervised Document Clustering," International Journal of Machine Learning and Computing, vol. 4, no. 4, pp. 313-318, 2014.

