Abstract—This paper proposes the skipping suffix algorithm, a new algorithm built on a new encoding scheme for genome sequences and aimed at accelerating multiple genome sequence matching. Introducing binary coding markedly improves the efficiency of gene sequence alignment. In addition, the maximal number of bits to skip is determined by constructing a skipping tree. A comparative evaluation of the computational efficiency of the KMP algorithm, the suffix array, and the skipping suffix algorithm shows that preprocessing for the skipping suffix algorithm is more than 12 times faster than preprocessing for the suffix array. Moreover, multiple genome sequence matching based on the suffix array is more than 50 times faster than matching based on KMP. In short, the skipping suffix algorithm strikes a balance between preprocessing and search, which makes it better suited to large-scale genetic data matching.
Index Terms—Bioinformatics, skipping tree, bit manipulation, binary search.
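The binary coding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual bit mapping, so the 2-bit code below (`CODE`) and the helper names `encode` and `find_all` are assumptions made for illustration only. The sketch shows only how packing bases into bits lets a pattern be compared with one integer comparison per position; it does not implement the skipping tree or the skipping suffix algorithm itself.

```python
# Hypothetical 2-bit code for the four bases; the paper's mapping may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Pack a DNA string into an int, 2 bits per base, first base most significant."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def find_all(text, pattern):
    """Return every start index of pattern in text by comparing packed windows."""
    k = len(pattern)
    if k == 0 or k > len(text):
        return []
    mask = (1 << (2 * k)) - 1          # keeps only the last k bases (2k bits)
    target = encode(pattern)
    window = encode(text[:k])
    hits = [0] if window == target else []
    for i in range(k, len(text)):
        # Shift in the next base and drop the oldest one via the mask.
        window = ((window << 2) | CODE[text[i]]) & mask
        if window == target:
            hits.append(i - k + 1)
    return hits
```

For example, `find_all("ACGTACGT", "CGT")` compares one packed integer per text position instead of comparing characters one by one.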
The authors are with the School of Software Engineering, Sichuan University, 610225 Chengdu, China (e-mail: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com).
Cite: Zihuan Xu, Kewei Cheng, Yi Ding, Ziqiang Tian, and Hui Zhao, "A Multiple Genome Sequence Matching Based on Skipping Tree," International Journal of Machine Learning and Computing, vol. 5, no. 1, pp. 78-85, 2015.