Abstract—Disease diagnosis is of the utmost importance in providing appropriate medical treatment. Genetic diseases such as hemoglobinopathies and thalassemia need to be diagnosed accurately and on time. Although hemoglobin (Hb) variants are detected using a high-performance liquid chromatography (HPLC)-based hemoglobin typing machine, the resulting data must still be interpreted correctly, and this requires trained professionals. Machine learning can help interpret these data and predict the type of Hb variant, thereby reducing the workload of health professionals. In this study, the data were classified using the following classifiers: logistic regression, support vector classifier (SVC), k-nearest neighbor (KNN), Gaussian naïve Bayes, perceptron, linear SVC, stochastic gradient descent, decision tree, random forest, and multi-layer perceptron. Pre-processing, visualization, and classification were implemented in Python 2.7 on an Intel Core i5 computer. The performance of each classifier was evaluated by first constructing a confusion matrix, from which precision, recall, and f1-score were computed to quantify the quality of each model. KNN, decision tree, and random forest gave better classification results than the other classifiers. With a precision of 93.89%, recall of 92.78%, and f1-score of 93.33%, the decision tree and random forest classifiers performed best at predicting Hb variants.
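The evaluation step described above (confusion matrix, then precision, recall, and f1-score) can be illustrated with a minimal sketch. It is written in Python 3 rather than the Python 2.7 used in the study, uses only the standard library, and the class labels and predictions below are made-up toy values, not the study's data. The function names (`confusion_matrix`, `per_class_scores`, `macro_average`, `f1_from`) are hypothetical helpers; the source does not say which library or averaging scheme was used, so the unweighted (macro) average here is an assumption. scikit-learn provides equivalents such as `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report`.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a confusion matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} computed from the confusion matrix."""
    n = len(labels)
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                                # correctly predicted as this class
        fp = sum(matrix[r][i] for r in range(n)) - tp    # other classes predicted as this one
        fn = sum(matrix[i]) - tp                         # this class predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall, f1_from(precision, recall))
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision, recall, and f1 (assumed averaging scheme)."""
    k = len(scores)
    return tuple(sum(s[j] for s in scores.values()) / k for j in range(3))


# Hypothetical toy labels and predictions, for illustration only.
labels = ["HbA", "HbE", "HbS"]
y_true = ["HbA", "HbA", "HbE", "HbE", "HbS", "HbS"]
y_pred = ["HbA", "HbE", "HbE", "HbE", "HbS", "HbA"]

cm = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(cm, labels)
print(cm)                    # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(macro_average(scores))
```

As a quick arithmetic check, `f1_from(0.9389, 0.9278)` evaluates to about 0.9333, so the reported f1-score of 93.33% is consistent with the harmonic mean of the reported precision (93.89%) and recall (92.78%).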
Index Terms—Data mining, disease prediction, Hb variants, hemoglobinopathies, machine learning, thalassemia.
Monalisha Saikia Borah is with the Department of Bioscience, Asian Institute of Management and Technology, Guwahati, Assam 781022, India.
Bikram Pratim Bhuyan is with the Department of Computer Science and Engineering, Kaziranga University, Jorhat, Assam 785006, India.
Mauchumi S. Pathak is with the Department of Biochemistry, Silchar Medical College and Hospital, Silchar, Assam 788014, India.
P. K. Bhattacharya is with the Department of General Medicine, North Eastern Indira Gandhi Regional Institute of Health and Medical Sciences, Shillong, Meghalaya 793018, India.
Cite: Monalisha Saikia Borah, Bikram Pratim Bhuyan, Mauchumi Saikia Pathak, and P. K. Bhattacharya, "Machine Learning in Predicting Hemoglobin Variants," International Journal of Machine Learning and Computing, vol. 8, no. 2, pp. 140-143, 2018.