Abstract—The ultimate objective of data mining is to extract information from large datasets and to use that information in the decision-making process. Clustering is the most common of the unsupervised approaches in data mining; it groups data so that objects with similar statistical properties fall into the same cluster. Hierarchical, partitioning, grid-based, and spectral methods are examples of such unsupervised clustering algorithms. Many of these approaches either produce a number of clusters fixed by a predefined value, determine that number through the algorithm's own criteria, or produce a hierarchy and leave the user to choose the preferred number of clusters. Selecting an appropriate number of clusters for a given problem is a crucial factor that determines the success of the approach. This paper proposes a novel recursive hierarchical clustering algorithm that combines the core concepts of hierarchical clustering with decision tree fundamentals to autonomously find the optimal number of clusters for the given problem.
Index Terms—Decision tree, gain ratio, gini-gain, gini-index, hierarchical clustering.
The authors are with the University of Moratuwa, Sri Lanka (e-mail: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com).
Cite: Pavani Y. De Silva, Chiran N. Fernando, Damith D. Wijethunge, and Subha D. Fernando, "Recursive Hierarchical Clustering Algorithm," International Journal of Machine Learning and Computing, vol. 8, no. 1, pp. 1-7, 2018.