IJMLC 2015 Vol. 5(4): 277-282 (Aug. 2015) ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.520

Keyword Clustering for Comparing Documents in Different Languages

J. Tae and D. Shin

Abstract—The objective of this study was to complement the natural language processing of a content-based retrieval system by applying keyword clustering. We focused on comparing documents written in two languages. To evaluate the approach, we clustered keywords using document features and then clustered documents using the resulting keyword clusters. The purity and entropy of the document clusters showed that keyword clustering improved the quality of document clustering and allowed us to measure similarities between documents in different languages.

Index Terms—Keyword clustering, dictionary, document clustering, purity, entropy, export control.
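
The abstract evaluates document clustering by purity and entropy. As a rough illustration of what these two scores measure, the following Python sketch uses their common textbook definitions; the function names, the base-2 logarithm, and the toy data are assumptions made here, and the paper's exact formulation (for example, any normalization of entropy) may differ.

```python
import math
from collections import Counter


def _group_labels(clusters, labels):
    """Collect the true class labels of the documents in each cluster."""
    grouped = {}
    for cluster_id, label in zip(clusters, labels):
        grouped.setdefault(cluster_id, []).append(label)
    return grouped


def purity(clusters, labels):
    """Fraction of documents in the majority class of their cluster (higher is better)."""
    grouped = _group_labels(clusters, labels)
    majority_total = sum(max(Counter(ys).values()) for ys in grouped.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy, base 2 (lower is better)."""
    grouped = _group_labels(clusters, labels)
    total = len(labels)
    score = 0.0
    for ys in grouped.values():
        n = len(ys)
        h = -sum((k / n) * math.log2(k / n) for k in Counter(ys).values())
        score += (n / total) * h
    return score


# Hypothetical toy data: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), entropy(clusters, labels))
```

In this toy case one cluster is mixed and the other is pure, so purity falls below 1 and entropy rises above 0; a perfect clustering would give purity 1 and entropy 0.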

The authors are with the Korea Institute of Nuclear Nonproliferation and Control (KINAC), 1534 Yuseong-daero, Yuseong-gu, Daejeon, 305-348, Republic of Korea (e-mail: ttjjww@postech.ac.kr, nucleo@kinac.re.kr).


Cite: J. Tae and D. Shin, "Keyword Clustering for Comparing Documents in Different Languages," International Journal of Machine Learning and Computing, vol. 5, no. 4, pp. 277-282, 2015.


