Weighted Frequent Itemset of SNPs in Genome Wide Studies

Home > Archive > 2018 > Volume 8 Number 4 (Aug. 2018) >

IJMLC 2018 Vol.8(4): 311-318 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2018.8.4.704

Sofianita Mutalib, Azlinah Mohamed, Shuzlina Abdul-Rahman, and Norlaila Mustafa

Abstract—A genome-wide association study (GWAS) investigates the correlations between genetic variants and traits. GWAS normally focus on the associations between single-nucleotide polymorphisms (SNPs) and traits such as major human diseases. Generally, a GWAS applies standard statistical tests to each SNP to capture the main genetic effects, so each association is drawn between a single SNP and the trait. This study makes use of the whole set of available SNPs in a GWAS and applies a data mining approach to associate more than one SNP with a trait. In general, this complements GWAS and helps in understanding complex diseases. This paper presents a proposed weighted frequent itemset mining approach for discovering important sets of SNPs that are associated with diabetes. The purpose of the weights is to mine SNPs that may be less frequent but are important in the study of diabetes. The approach consists of three stages: first, reduction of the feature space, tested through classifiers; second, selection of informative SNPs through allelic testing, followed by weight assignment for the selected SNPs; and third, itemset mining and gene analysis. The proposed approach proved effective in helping to discover genes that are associated with the risk of diabetes. These patterns could be used as a set of significant information extracted by mining genetic variants in any particular SNP.

Index Terms—Diabetes, feature selection, frequent itemset mining, single nucleotide polymorphism, weight.
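
The abstract does not state how weights enter the support calculation, so the following is only a minimal sketch of one common formulation, not the authors' method. It assumes that an itemset's weight is the mean of its item weights and that weighted support is that weight multiplied by the fraction of subjects whose SNP items contain the itemset. The function name, the genotype item labels, the weights, and the threshold are all hypothetical. Because weighted support defined this way does not guarantee the downward-closure property used by Apriori-style pruning, the sketch simply enumerates candidates up to a small size.

```python
from itertools import combinations


def weighted_frequent_itemsets(transactions, weights, min_wsup, max_size=3):
    """Return {itemset: weighted support} for itemsets meeting min_wsup.

    Assumption (not taken from the paper): weighted support equals the
    mean item weight times the fraction of transactions that contain
    every item of the itemset.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            required = set(itemset)
            # Plain support: share of transactions containing the whole itemset.
            support = sum(1 for t in transactions if required <= t) / n
            # Itemset weight: mean of the individual item weights.
            weight = sum(weights[i] for i in itemset) / k
            wsup = weight * support
            if wsup >= min_wsup:
                result[itemset] = wsup
    return result


# Hypothetical toy data: one set of SNP genotype items per subject.
transactions = [
    {"rs1=AA", "rs2=AG", "rs3=GG"},
    {"rs1=AA", "rs2=AG"},
    {"rs1=AA", "rs3=GG"},
    {"rs2=AG", "rs3=GG"},
]
# Hypothetical weights, e.g. derived from an association test per SNP.
weights = {"rs1=AA": 0.5, "rs2=AG": 1.0, "rs3=GG": 0.8}

found = weighted_frequent_itemsets(transactions, weights, min_wsup=0.4)
for itemset, wsup in sorted(found.items()):
    print(itemset, round(wsup, 3))
```

In this toy data all three single items occur in three of the four subjects, yet the low-weight item falls below the threshold while the higher-weight items and their pair pass, which illustrates how weights shift the selection away from raw frequency alone.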

Sofianita Mutalib, Azlinah Mohamed, and Shuzlina Abdul-Rahman are with the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia (e-mail: sofi@tmsk.uitm.edu.my, azlinah@tmsk.uitm.edu.my, and shuzlina@tmsk.uitm.edu.my).
Norlaila Mustafa is with the Medical Department, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Jalan Yaacob Latif, Bandar Tun Razak, Universiti Kebangsaan Malaysia, 56000 Cheras, Kuala Lumpur, Malaysia (e-mail: norlaila@ppukm.ukm.edu.my).

Cite: Sofianita Mutalib, Azlinah Mohamed, Shuzlina Abdul-Rahman, and Norlaila Mustafa, "Weighted Frequent Itemset of SNPs in Genome Wide Studies," International Journal of Machine Learning and Computing, vol. 8, no. 4, pp. 311-318, 2018.
