IJMLC 2021 Vol.11(6), Nov. 2021: 373-379 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2021.11.6.1064

A Bi-directional Hierarchical Clustering (BHC) for Peak Matching of Large Mass Spectrometry Data Sets

Nazanin Zounemat Kermani, Xian Yang, Yike Guo, James McKenzie, and Zoltan Takats

Abstract—Preprocessing of mass spectrometry (MS) data is a crucial step in every MS study: it not only makes data comparable and manageable but also makes the study more reproducible. However, an essential part of this process, peak matching, is often overlooked. Although existing clustering methods have been applied to peak matching, their use has been limited. For example, hierarchical agglomerative clustering (HAC) has been used to match mass-to-charge (m/z) signals only in small-scale MS data sets because of its computational complexity. In this paper, we reintroduce bi-directional hierarchical agglomerative clustering (BHC) as a scalable and accurate peak matching technique. BHC reduces the computational complexity of hierarchical agglomerative clustering for peak matching to O(R log R). BHC was benchmarked against existing peak matching techniques. Finally, we propose a parallelization framework that significantly reduces the computation time of peak matching.
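The abstract does not describe the BHC algorithm itself. As a hedged illustration of why clustering m/z values can be brought down to O(R log R), the sketch below uses a simpler, well-known stand-in, not the authors' BHC: on a one-dimensional m/z axis, single-linkage agglomerative clustering cut at a tolerance `tol` is equivalent to sorting all peaks and splitting the sorted list wherever the gap between neighbours exceeds `tol`. The function name `match_peaks_1d`, the `(spectrum_id, mz)` peak representation, the tolerance, and the example peaks are all hypothetical, and R is assumed here to mean the total number of peaks pooled across spectra.

```python
from typing import List, Tuple

# Hypothetical peak representation: (spectrum_id, m/z value).
Peak = Tuple[int, float]


def match_peaks_1d(peaks: List[Peak], tol: float) -> List[List[Peak]]:
    """Group peaks whose neighbouring sorted m/z values differ by at most `tol`.

    Equivalent to single-linkage HAC on the m/z axis, cut at distance `tol`.
    This is an illustrative stand-in, not the BHC method from the paper.
    """
    if not peaks:
        return []
    # Sorting dominates the cost: O(R log R) for R pooled peaks.
    ordered = sorted(peaks, key=lambda p: p[1])
    clusters: List[List[Peak]] = [[ordered[0]]]
    # One linear pass: start a new cluster at every gap larger than `tol`.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] - prev[1] <= tol:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters


# Made-up example: three spectra (ids 0, 1, 2) with nearby m/z values.
peaks = [(0, 100.001), (1, 100.003), (2, 100.002),
         (0, 150.50), (1, 150.51), (2, 200.0)]
clusters = match_peaks_1d(peaks, tol=0.02)
print([len(c) for c in clusters])  # prints [3, 2, 1]
```

Because no cluster in this sketch crosses a gap larger than `tol`, the sorted axis could be split at such gaps into independent chunks processed in parallel; that is only one possible scheme and not necessarily the parallelization framework proposed in the paper. The sketch also does not resolve clusters containing more than one peak from the same spectrum, which a practical peak matcher must handle.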

Index Terms—Mass spectrometry data preprocessing, peak matching, hierarchical agglomerative clustering, parallel computing.

Nazanin Zounemat Kermani, Xian Yang, and Yike Guo are with the Department of Computing, Data Science Institute, Imperial College London.
James McKenzie and Zoltan Takats are with the Faculty of Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London (e-mail: n.kermani@imperial.ac.uk).

Cite: Nazanin Zounemat Kermani, Xian Yang, Yike Guo, James McKenzie, and Zoltan Takats, "A Bi-directional Hierarchical Clustering (BHC) for Peak Matching of Large Mass Spectrometry Data Sets," International Journal of Machine Learning and Computing vol. 11, no. 6, pp. 373-379, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
