Abstract—The objective of this study was to complement the
natural language processing of a content-based retrieval system
by applying keyword clustering. We focused on comparing
documents written in two languages. To evaluate this approach,
we clustered keywords using features of the documents and then
clustered the documents using the results of keyword clustering.
The purity and entropy of the resulting document clusters
showed that keyword clustering improved the quality of
document clustering and allowed us to measure similarities
between documents in different languages.
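As a minimal sketch of the two evaluation measures named above, the following Python function computes purity and entropy for a document clustering against reference class labels. It assumes the common textbook definitions (purity as the fraction of documents that fall in their cluster's majority class, entropy as the size-weighted average of per-cluster class entropy in bits); the function name, the argument names, and the toy labels are hypothetical and are not taken from the study, which may use a differently normalized variant.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Assumed definitions (not confirmed by the source):
      purity  = (1/N) * sum over clusters of the majority-class count
      entropy = sum over clusters of (n_k/N) * H(class distribution in k),
                with H measured in bits (log base 2).
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one document is required")

    # Count how many documents of each class fall into each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1

    majority_total = 0
    weighted_entropy = 0.0
    for counts in members.values():
        size = sum(counts.values())
        majority_total += max(counts.values())
        # Shannon entropy of the class distribution inside this cluster.
        h = 0.0
        for count in counts.values():
            p = count / size
            h -= p * math.log2(p)
        weighted_entropy += (size / n) * h

    return majority_total / n, weighted_entropy


# Hypothetical toy data: six documents, two clusters, two reference classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purity, toy_entropy = purity_and_entropy(toy_clusters, toy_labels)
print(f"purity={toy_purity:.3f} entropy={toy_entropy:.3f}")
```

Under these definitions, higher purity and lower entropy both indicate clusters that agree more closely with the reference classes, which is the sense in which the two measures can show an improvement in clustering quality.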
Index Terms—Keyword clustering, dictionary, document
clustering, purity, entropy, export control.
The authors are with the Korea Institute of Nuclear Nonproliferation and
Control (KINAC), 1534 Yuseong-daero, Yuseong-gu, Daejeon, 305-348,
Republic of Korea (e-mail: ttjjww@postech.ac.kr, nucleo@kinac.re.kr).
Cite: J. Tae and D. Shin, "Keyword Clustering for Comparing Documents in Different Languages," International Journal of Machine Learning and Computing, vol. 5, no. 4, pp. 277-282, 2015.