Abstract—A genome-wide association study (GWAS) investigates the correlations between genetic variants and traits. GWAS normally focus on the associations between single-nucleotide polymorphisms (SNPs) and traits such as major human diseases. Generally, a GWAS applies standard statistical tests to each SNP to capture the main genetic effects. However, each association is made between a single SNP and the trait. This study makes use of the whole set of SNPs available in a GWAS and applies a data mining approach to associate more than one SNP with a trait. In general, this complements GWAS and helps in understanding complex diseases. This paper presents a proposed frequent itemset mining approach with weights to discover important sets of SNPs that are associated with diabetes. The purpose of using weights is to mine SNPs that might be less frequent but are important in the study of diabetes. The approach consists of three stages: first, reduction of the feature space and testing of the reduced feature sets with classifiers; second, selection of informative SNPs through allelic testing, followed by weight assignment for the selected SNPs; and third, itemset mining and gene analysis. The proposed approach proved effective in helping to discover genes that are associated with the risk of diabetes. These patterns could be used as a set of significant information extracted by mining the genetic variants at any particular SNP.
Index Terms—Diabetes, feature selection, frequent itemset mining, single nucleotide polymorphism, weight.
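The idea of mining itemsets with weights, as summarised in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common definition of weighted support (the mean weight of the items in an itemset multiplied by the fraction of samples containing the itemset), and the function name `weighted_frequent_itemsets`, the SNP-genotype labels, the weights, and the threshold are all hypothetical toy values chosen for illustration.

```python
from itertools import combinations


def weighted_frequent_itemsets(transactions, weights, min_wsup, max_size=2):
    """Return {itemset: weighted support} for itemsets meeting min_wsup.

    Assumption (not taken from the paper): weighted support is the mean
    item weight times the fraction of transactions containing the itemset.
    Candidates are enumerated exhaustively, because this weighted measure
    is not guaranteed to be anti-monotone, so Apriori-style pruning on it
    would not be safe in general.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    results = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = set(candidate)
            # Plain support: share of samples that carry every item.
            support = sum(1 for t in transactions if cset <= t) / n
            mean_weight = sum(weights[i] for i in candidate) / size
            wsup = mean_weight * support
            if wsup >= min_wsup:
                results[candidate] = wsup
    return results


# Hypothetical toy data: each sample is a set of SNP-genotype items.
transactions = [
    {"rs1_AA", "rs2_AG"},
    {"rs1_AA", "rs2_AG", "rs3_TT"},
    {"rs1_AA"},
    {"rs2_AG", "rs3_TT"},
]
# Hypothetical weights, e.g. derived from an allelic test in an earlier stage.
weights = {"rs1_AA": 0.2, "rs2_AG": 0.9, "rs3_TT": 1.0}

found = weighted_frequent_itemsets(transactions, weights, min_wsup=0.4)
for itemset, wsup in sorted(found.items()):
    print(itemset, round(wsup, 3))
```

In this toy data, `rs1_AA` appears in three of the four samples but has a low weight, so it falls below the threshold, whereas the less frequent but highly weighted `rs3_TT` is retained, both alone and together with `rs2_AG`. This mirrors the stated purpose of the weights: surfacing SNPs that are less frequent but important.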
Sofianita Mutalib, Azlinah Mohamed, and Shuzlina Abdul-Rahman are with the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia (e-mail: email@example.com, firstname.lastname@example.org and email@example.com).
Norlaila Mustafa is with the Medical Department, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Jalan Yaacob Latif, Bandar Tun Razak, Universiti Kebangsaan Malaysia, 56000 Cheras, Kuala Lumpur, Malaysia (e-mail: firstname.lastname@example.org).
Cite: Sofianita Mutalib, Azlinah Mohamed, Shuzlina Abdul-Rahman, and Norlaila Mustafa, "Sofianita Mutalib, Azlinah Mohamed, Shuzlina Abdul-Rahman, and Norlaila Mustafa," International Journal of Machine Learning and Computing vol. 8, no. 4, pp. 311-318, 2018.