IJMLC 2015 Vol.5(6): 454-457 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2015.5.6.551

A Feature-Partition and Under-Sampling Based Ensemble Classifier for Web Spam Detection

Xiaoyong Lu, Musheng Chen, Jhenglong Wu, and Peichan Chan
Abstract—Web spam detection has become one of the most important tasks for web search engines. It is a class-imbalance problem because normal pages far outnumber spam pages. However, most traditional learning methods are not effective on imbalanced classification problems. To tackle this problem and make full use of the various features extracted from web pages’ content and links, this paper presents an ensemble classifier based on under-sampling and feature-partition techniques, with the decision tree algorithm C4.5 integrated as the sub-classifier to detect web spam. Experimental results on the WEBSPAM-UK2006 dataset show that the ensemble classifier outperforms other approaches on several evaluation metrics, such as F1-measure and AUC.

Index Terms—Web spam detection, under-sampling, features partition, ensemble classifier, C4.5.
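
The abstract names the ingredients (under-sampling, feature partition, a decision-tree sub-classifier, an ensemble) but not the exact procedure, so the following Python sketch is only one plausible reading, not the authors' implementation. Its assumptions: the features are partitioned into a content group and a link group; each ensemble member is trained on one group and on a fresh balanced under-sample of the normal pages; members are combined by majority vote; and a one-split decision stump stands in for C4.5 to keep the example short. All function names, the toy data, and the parameter `rounds` are hypothetical.

```python
import random

def make_toy_data(seed=0, n_normal=200, n_spam=20):
    """Synthetic, cleanly separable stand-in for a spam corpus (not WEBSPAM-UK2006).
    Columns 0-1 play the role of content features, columns 2-3 of link features."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.0, 0.3) for _ in range(4)] for _ in range(n_normal)]
    X += [[rng.uniform(0.7, 1.0) for _ in range(4)] for _ in range(n_spam)]
    y = [0] * n_normal + [1] * n_spam  # 1 = spam (assumed minority class)
    return X, y

def undersample(X, y, rng):
    """Keep every spam row plus an equally sized random draw of normal rows."""
    spam = [i for i, label in enumerate(y) if label == 1]
    normal = [i for i, label in enumerate(y) if label == 0]
    keep = spam + rng.sample(normal, min(len(spam), len(normal)))
    return [X[i] for i in keep], [y[i] for i in keep]

def train_stump(X, y, feature_ids):
    """One-split tree used as a simplified stand-in for C4.5 (assumption).
    Tries midpoints between consecutive distinct values of each allowed
    feature and keeps the split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (1, 0):  # class predicted when row[f] > t
                errors = sum((above if row[f] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    if best is None:
        raise ValueError("no feature in this group has two distinct values")
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above

def train_ensemble(X, y, feature_groups, rounds=5, seed=0):
    """Train `rounds` stumps per feature group, each on a fresh balanced sample."""
    rng = random.Random(seed)
    members = []
    for group in feature_groups:
        for _ in range(rounds):
            Xs, ys = undersample(X, y, rng)
            members.append(train_stump(Xs, ys, group))
    return members

def predict(members, row):
    """Majority vote over all members; a tie is resolved as normal (0)."""
    return int(2 * sum(m(row) for m in members) > len(members))

X, y = make_toy_data()
groups = [[0, 1], [2, 3]]  # assumed partition: content vs. link features
ensemble = train_ensemble(X, y, groups)
print(predict(ensemble, [0.9, 0.8, 0.9, 0.8]))  # spam-like row
print(predict(ensemble, [0.1, 0.2, 0.1, 0.2]))  # normal-like row
```

Drawing a new balanced sample for every member lets the ensemble see many different normal pages overall while no single member is swamped by the majority class. In a realistic setting the stump would be replaced by a full decision-tree learner, and the vote could be weighted; neither choice is specified in the abstract.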

Xiaoyong Lu and Musheng Chen are with Nanchang University, China (e-mail: lxy@ncu.edu.cn, dreaminit@gmail.com).
Jhenglong Wu and Peichan Chan are with the Information Management Department, Yuan Ze University, Taiwan (e-mail: jlwu.yzu@gmail.com, iepchang@saturn.yzu.edu.tw).

Cite: Xiaoyong Lu, Musheng Chen, Jhenglong Wu, and Peichan Chan, "A Feature-Partition and Under-Sampling Based Ensemble Classifier for Web Spam Detection," International Journal of Machine Learning and Computing, vol. 5, no. 6, pp. 454-457, 2015.
