Abstract—In this paper we propose an adjusted formula for precision under an imbalanced class distribution, one that truly reflects a defect predictor's high classifier performance. Our formula yields values closer to the Accuracy computation for both balanced and imbalanced class distributions, and thus gives consistently high values for a good predictor irrespective of the size of the target class. Approach: We used NASA datasets to produce well-documented examples of how a higher Accuracy corresponds to higher Precision and, in turn, higher Recall and F-measure values, all of which reflect higher classifier performance. We used data with a minority class of between five and ten percent (5%–10%) of the data points, inclusive. We applied a fixed true positive rate (TPR) of one (1), while the false positive rate (FPR) ranged from 0.01 to 0.05 inclusive, at intervals of 0.01. We used the proposed adjusted precision formula to improve on earlier work that had been criticized as unsatisfactory. The proposed formula, precision(AR), was used to compute precision values that truly reflect a high-performance predictor. The results in the tables clearly support our assertion that the formula gives good estimated values for precision.
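The experimental settings in the abstract can be illustrated with a short sketch. This is not the paper's proposed precision(AR) formula (which is derived in the body of the paper); it only shows how the standard, unadjusted precision is computed from the stated TPR, FPR, and minority-class fraction, and how it degrades as FPR grows under class imbalance:

```python
def standard_precision(tpr: float, fpr: float, pi: float) -> float:
    """Standard precision on a population with positive-class fraction pi.

    On a population of size n: TP = tpr * pi * n and FP = fpr * (1 - pi) * n,
    so precision = TP / (TP + FP); the factor n cancels.
    """
    tp = tpr * pi
    fp = fpr * (1.0 - pi)
    return tp / (tp + fp)

# Fixed TPR = 1 and a 5% minority class, with FPR swept from 0.01 to 0.05
# in steps of 0.01, as in the paper's setup.
for i in range(1, 6):
    fpr = i / 100.0
    p = standard_precision(1.0, fpr, 0.05)
    print(f"FPR={fpr:.2f}  precision={p:.4f}")
```

Even with a perfect TPR of 1, the standard precision falls from about 0.84 at FPR = 0.01 to about 0.51 at FPR = 0.05 when only 5% of the data points are positive, which is the behavior that motivates an adjusted precision formula for imbalanced distributions.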
Index Terms—True positive rate, false positive rate, precision, recall, F-measure, defect predictor classifier.
G. K. Armah is with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China (e-mail: email@example.com).
Guangchun Luo and Ke Qin are with the School of Computer Science, University of Electronic Science and Technology of China (UESTC), Chengdu, P.O. Box 611731, China (e-mail: firstname.lastname@example.org, email@example.com).
Cite: Gabriel Kofi Armah, Guangchun Luo, and Ke Qin, "A Deep Analysis of the Precision Formula for Imbalanced Class Distribution," International Journal of Machine Learning and Computing vol. 4, no. 5, pp. 417-422, 2014.