IJMLC 2018 Vol.8(4): 336-340 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2018.8.4.708

Classification and Regression Tree with Resampling for Classifying Imbalanced Data

Supajittree Boonamnuay, Nittaya Kerdprasop, and Kittisak Kerdprasop
Abstract—Data mining is the automatic process of finding interesting and useful patterns in data for specific tasks, such as predicting future data or classifying the label or group of new data items. Many data mining algorithms that have been successfully applied to real-life data belong to the tree-based group. Among tree-based algorithms, the decision tree is the most popular, renowned for its high accuracy in general cases where the data are fairly evenly distributed across classes. However, many datasets in real applications are imbalanced: the amount of data in some classes outnumbers that in others. Such uneven class distribution is a main reason why classification accuracy is not excellent even with a decision tree algorithm. The inefficiency arises because, in the tree-growing phase, the algorithm tends to favor the majority data and ignore the minority data, which are then incorrectly classified. Many researchers have tried to solve this data imbalance problem in various ways, such as over-sampling, under-sampling, cost-sensitive classification, and ensembles of cost-sensitive decision trees. In this paper, we introduce a simplified method of learning a classification and regression tree (CART) with a resampling technique for classifying imbalanced datasets. We compare our proposed method with other methods on several metrics, including the precision of classifying the minority data as opposed to the majority data, the overall accuracy regardless of minority or majority class, and the Matthews Correlation Coefficient (MCC). MCC is suitable for imbalanced data because it takes into account all four confusion-matrix counts: true positives, true negatives, false positives, and false negatives. Our proposed method of combining resampling with CART performs satisfactorily on the MCC metric: it performs the best on all five imbalanced datasets used in the experiments.
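The abstract does not say which resampling scheme is paired with CART, so the sketch below assumes plain random oversampling with replacement (duplicating minority rows until the classes are balanced). It is a minimal, self-contained illustration, not the authors' implementation. The helper names (`random_oversample`, `build_tree`, `predict`, `evaluate`) are hypothetical. The tree is a small Gini-based CART grown to a fixed depth, and the one-feature toy dataset is invented for illustration. The `evaluate` helper computes the metrics named in the abstract, with MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Random oversampling (an assumed resampling scheme): duplicate randomly
    chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(rows)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def gini(labels):
    """Gini impurity, the CART splitting criterion for classification."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini impurity,
    or None if no binary split improves on the parent node."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth):
    """Grow a binary CART-style tree; leaves hold the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], max_depth - 1),
            build_tree([r for r, _ in right], [lab for _, lab in right], max_depth - 1))


def predict(tree, row):
    """Follow internal nodes (feature, threshold, left, right) to a leaf.
    Assumes class labels are not tuples, so any tuple is an internal node."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def evaluate(y_true, y_pred, positive=1):
    """Minority-class precision, overall accuracy, and MCC (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy one-feature data: label 1 (minority) occurs only at x = 10, 11, 12.
    X = [[x] for x in range(20)]
    y = [1 if 10 <= x <= 12 else 0 for x in range(20)]
    X_bal, y_bal = random_oversample(X, y)
    for name, (Xt, yt) in [("raw", (X, y)), ("oversampled", (X_bal, y_bal))]:
        tree = build_tree(Xt, yt, max_depth=1)
        print(name, evaluate(y, [predict(tree, row) for row in X]))
```

On this toy data, with the tree limited to a single split, the tree trained on the raw data predicts the majority class everywhere (accuracy 0.85, MCC 0.0). The tree trained on the oversampled data recovers all three minority items (accuracy 0.65, MCC about 0.42). Both trees are scored on the original training rows purely for illustration. Lower accuracy paired with higher MCC shows why accuracy alone can be misleading on imbalanced data.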

Index Terms—Classification and regression tree, CART, resampling technique, imbalanced data, Matthews correlation coefficient.

The authors are with the School of Computer Engineering, Suranaree University of Technology (SUT), Thailand (corresponding author: Supajittree Boonamnuay; tel.: +66892865318; e-mail: eternity_faith@windowslive.com, nittaya@sut.ac.th, kerdpras@sut.ac.th).


Cite: Supajittree Boonamnuay, Nittaya Kerdprasop, and Kittisak Kerdprasop, "Classification and Regression Tree with Resampling for Classifying Imbalanced Data," International Journal of Machine Learning and Computing, vol. 8, no. 4, pp. 336-340, 2018.

Copyright © 2008-2019. International Journal of Machine Learning and Computing. All rights reserved.
E-mail: ijmlc@ejournal.net