Abstract—Data clustering is an important task in data management because it groups similar data into clusters and thereby reveals significant knowledge. K-means is one of the most popular clustering algorithms; however, it has several weaknesses: cluster quality often depends on the initial centers, and the algorithm is highly sensitive to outliers. To address these problems, this study proposes a new method of initial center selection based on data density and a novel approach to outlier detection based on data distance. I conducted experiments to evaluate both methods. For the initial center selection method, I compared the number of iterations and the Silhouette scores against those of traditional K-means. For the outlier detection system, I measured performance using a confusion matrix. The results show that the proposed system outperformed traditional K-means, with higher speed and greater accuracy.
Index Terms—K-means, outlier detection, initial centers, clustering algorithm, local outliers.
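The confusion-matrix evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `confusion_counts` and `summarize` and the sample labels are hypothetical, and it assumes each point carries a boolean label where `True` marks an outlier.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN), where True marks a point as an outlier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            tp += 1  # outlier correctly flagged
        elif predicted and not actual:
            fp += 1  # normal point wrongly flagged
        elif actual:
            fn += 1  # outlier missed
        else:
            tn += 1  # normal point correctly kept
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive accuracy, precision, and recall from the four counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for illustration only; not data from the study.
y_true = [False, False, True, False, True, False, False, True]
y_pred = [False, True, True, False, False, False, False, True]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
```

With these made-up labels the counts are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, which gives an accuracy of 0.75.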
Sarunya Kanjanawattana is with Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhonratchasima 30000, Thailand (e-mail: Sarunya.email@example.com).
Cite: Sarunya Kanjanawattana, "A Novel Outlier Detection Applied to an Adaptive K-Means," International Journal of Machine Learning and Computing, vol. 9, no. 5, pp. 569-574, 2019.
Copyright © 2019 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).