Keyword Clustering for Comparing Documents in Different Languages


IJMLC 2015 Vol. 5(4): 277-282 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.520

J. Tae and D. Shin

Abstract—The objective of this study was to complement natural language processing of a content-based retrieval system by applying keyword clustering. We focused on comparing documents in two languages. To evaluate the performance of this approach, we clustered keywords using the features of documents and performed document clustering using the results of keyword clustering. The purity and the entropy of document clustering revealed that keyword clustering resulted in improvements in the quality of document clustering and allowed us to measure similarities between documents in different languages.

Index Terms—Keyword clustering, dictionary, document clustering, purity, entropy, export control.
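
The abstract reports purity and entropy as the quality measures for document clustering but does not reproduce their definitions here. As a minimal sketch, assuming the common textbook forms (purity as the fraction of documents that carry the majority label of their cluster, entropy as the size-weighted average of per-cluster label entropy in bits, without normalization), the two measures could be computed as follows. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def _label_counts(clusters, labels):
    """Group the reference labels by assigned cluster id."""
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        by_cluster[cluster_id][label] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    by_cluster = _label_counts(clusters, labels)
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy, in bits."""
    by_cluster = _label_counts(clusters, labels)
    total = len(labels)
    result = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        cluster_entropy = -sum(
            (k / size) * log2(k / size) for k in counts.values()
        )
        result += (size / total) * cluster_entropy
    return result


# Toy example (hypothetical data): six documents, two clusters, two topics.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels))
print(entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that agree more closely with the reference labels, which is the sense in which the abstract describes an improvement in clustering quality.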

The authors are with the Korea Institute of Nuclear Nonproliferation and Control (KINAC), 1534 Yuseong-daero, Yuseong-gu, Daejeon, 305-348, Republic of Korea (e-mail: ttjjww@postech.ac.kr, nucleo@kinac.re.kr).


Cite: J. Tae and D. Shin, "Keyword Clustering for Comparing Documents in Different Languages," International Journal of Machine Learning and Computing, vol. 5, no. 4, pp. 277-282, 2015.


General Information

E-ISSN: 2972-368X
Abbreviated Title: Int. J. Mach. Learn.
Frequency: Quarterly
DOI: 10.18178/IJML
Editor-in-Chief: Dr. Lin Huang
Executive Editor: Ms. Cherry L. Chen
Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
E-mail: ijml@ejournal.net
