Unsupervised Cross-Language Classification with Stratified Sampling-Based Cluster Ensemble

Home > Archive > 2015 > Volume 5 Number 3 (Jun. 2015) >

IJMLC 2015 Vol. 5(3): 165-171 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.502

Wenli Gui, Liping Jing, Liu Yang, and Jian Yu

Abstract—Many real-world data sets comprise multiple representations, or views, and learning from such multi-view data is important in many applications. In unsupervised cross-language classification, documents in different languages share the same set of categories. To solve this cross-language clustering problem, we propose a novel Stratified Sampling-based Cluster Ensemble method with two main contributions. First, it generates several data components from the cross-language document set via stratified sampling, so that the correlation between multiple views is taken into account. Second, it uses a link-based consensus function to combine the component clustering results, so that the relationships between components are effectively utilized. A series of experiments has been conducted on a real cross-language document set. The experimental results show that the proposed method outperforms state-of-the-art multi-view clustering methods.

Index Terms—Unsupervised cross-language classification, multi-view clustering, clustering ensemble, stratified sampling.

Wenli Gui, Liping Jing, and Jian Yu are with Beijing Key Lab of Traffic Data Analysis and Mining, the School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044 China (e-mail: 13125158@bjtu.edu.cn, lpjing@bjtu.edu.cn, jyu@bjtu.edu.cn).
Liu Yang is with Beijing Key Lab of Traffic Data Analysis and Mining, the School of Computer Science and Information Technology, Beijing Jiaotong University, Beijing 100044 China, and College of Mathematics and Computer Science, Hebei University, Baoding, Hebei, China (e-mail: 11112091@bjtu.edu.cn).


Cite: Wenli Gui, Liping Jing, Liu Yang, and Jian Yu, "Unsupervised Cross-Language Classification with Stratified Sampling-Based Cluster Ensemble," International Journal of Machine Learning and Computing vol. 5, no. 3, pp. 165-171, 2015.


General Information

E-ISSN: 2972-368X
Abbreviated Title: Int. J. Mach. Learn.
Frequency: Quarterly
DOI: 10.18178/IJML
Editor-in-Chief: Dr. Lin Huang
Executive Editor: Ms. Cherry L. Chen
Abstracting/Indexing: Inspec (IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library, CNKI.
E-mail: ijml@ejournal.net
