Abstract—Data imbalance is currently one of the most challenging problems in machine learning and data mining. Many techniques have been proposed to address it. The fundamental solution is to balance the data with under-sampling and over-sampling techniques. However, these conventional methods may lose useful information, which can lead to the generation of useless patterns. Techniques that avoid adjusting the sample size of the data are therefore of greater interest. One such technique is misclassification cost adjustment. This paper focuses on improving the performance of classification models built with misclassification cost adjustment by proposing a novel heuristic method. The proposed method uses a heuristic based on the experience of practitioners working with manufacturing data. The heuristic exploits the relation between the misclassification cost, the imbalance ratio, and the constant “e” (Euler’s number). Experiments were conducted on 56 real-world datasets with varying numbers of attributes and different imbalance ratios. The results confirm that the proposed heuristic can improve the performance of the classification model. On datasets with a high imbalance ratio, the method improves AUC by up to 29%.
Index Terms—Misclassification cost, imbalanced data, classification, decision tree learning.
The authors are with the School of Computer Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand (e-mail: Anusara.firstname.lastname@example.org, email@example.com, firstname.lastname@example.org).
Cite: Anusara Hirunyawanakul, Nittaya Kerdprasop, and Kittisak Kerdprasop, "A Novel Heuristic Method for Misclassification Cost Tuning in Imbalanced Data," International Journal of Machine Learning and Computing, vol. 8, no. 6, pp. 565-570, 2018.