Abstract—In pattern recognition, principal component analysis (PCA) is one of the best-known feature extraction methods for reducing the dimensionality of high-dimensional datasets. Simple-PCA (SPCA), a faster variant of PCA, learns effectively through iterative operations. However, SPCA may not perform well when the input data are distributed in a complex manner, because it learns without using the class information in the dataset. SPCA is therefore not optimal from the perspective of feature extraction for classification. In this study, we propose a new learning algorithm that uses the class information in the dataset. The eigenvectors spanning the eigenspace of the dataset are obtained by calculating the data variations within each class. We present the proposed algorithm and discuss experiments on UCI datasets that compare it with SPCA.
Index Terms—Pattern recognition, principal component analysis, supervised learning.
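As a rough illustration of the kind of iterative learning that SPCA relies on, the following Python sketch estimates principal directions with a sign-threshold update followed by deflation. It is a minimal sketch under assumptions that are not taken from this paper: the function name `simple_pca`, its parameters, the random initialization, and the particular threshold rule are all illustrative. The sketch is unsupervised, so it shows only the baseline behaviour and does not implement the proposed class-aware algorithm.

```python
import numpy as np


def simple_pca(X, n_components, n_iter=100, seed=0):
    """Hypothetical Simple-PCA-style sketch (not the authors' code).

    Each direction is found by repeatedly summing the samples, with the sign
    of every sample flipped so that it points into the same half-space as
    the current estimate, and then normalizing the sum.
    """
    rng = np.random.default_rng(seed)
    R = X - X.mean(axis=0)                 # centre the data
    components = []
    for _ in range(n_components):
        a = rng.standard_normal(R.shape[1])
        a /= np.linalg.norm(a)             # random unit-length start (assumption)
        for _ in range(n_iter):
            # Threshold rule: use +x when x projects non-negatively on a, else -x.
            signs = np.where(R @ a >= 0, 1.0, -1.0)
            s = signs @ R
            norm = np.linalg.norm(s)
            if norm == 0:                  # no remaining variation in the data
                break
            a_new = s / norm
            converged = np.allclose(a_new, a)
            a = a_new
            if converged:
                break
        components.append(a)
        # Deflation: remove the part of every sample that lies along a.
        R = R - np.outer(R @ a, a)
    return np.array(components)
```

Because no class labels enter the update, two classes that overlap along the directions of largest variation stay mixed in the extracted features, which is the limitation the abstract attributes to SPCA.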
Y. Takeuchi is a doctoral course student with the Graduate School of Advanced Technology and Science, the University of Tokushima, Tokushima, 770-8506 Japan (e-mail: email@example.com).
M. Ito is an Assistant Professor with the Department of Information Science and Intelligent Systems, the University of Tokushima, Tokushima, 770-8506 Japan (e-mail: firstname.lastname@example.org).
M. Fukumi is a Professor with the Department of Information Science and Intelligent Systems, the University of Tokushima, Tokushima, 770-8506 Japan (e-mail: email@example.com).
Cite: Yohei Takeuchi, Momoyo Ito, and Minoru Fukumi, "Novel Approximate Statistical Algorithm for Large Complex Datasets," International Journal of Machine Learning and Computing, vol. 2, no. 5, pp. 720-724, 2012.