Abstract—High Performance Computing (HPC) log analysis is an active research domain. The challenge is to extract useful information from HPC log files, because the knowledge gained from the analysis can be used to reconfigure the HPC system and improve its efficiency. Traditional HPC log analysis is considered inefficient: it is time-consuming and requires the specific knowledge and skills of a system administrator. In this research, we empirically study the application of machine learning techniques to an HPC log analysis task. We apply machine learning techniques that differ in their learning schemes, namely C5.0, Support Vector Machine (SVM), and Artificial Neural Network (ANN), to analyze and predict job status on the HPC system. We also propose a novel technique called “Grouping & Combining”. Grouping means reducing the number of class labels of the target variable, which reduces the time needed for analysis. The class labels of the target variable are then combined with another variable to make the learned model easier to interpret. The dataset used in our experiment is real-world data obtained from the HPC system of the National Electronics and Computer Technology Center (NECTEC), Thailand. According to the experimental results, the C5.0 model has the highest prediction accuracy at 88.74%, whereas the ANN model shows the best robustness. In addition, the experimental results show that the proposed Grouping & Combining technique can be used efficiently for handling multi-label classification, as it helps increase accuracy, reduce time consumption, and improve the interpretability of the learned model.
Index Terms—High performance computing workload, log analysis, multi-label classification, performance evaluation.
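The Grouping & Combining idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which status labels are grouped or which variable is combined with the target, so the status names, the `STATUS_GROUPS` mapping, the `queue` field, and all function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical mapping from fine-grained job statuses to coarse groups.
# The actual labels and grouping rules are not given in the abstract.
STATUS_GROUPS = {
    "completed": "success",
    "exit_0": "success",
    "timeout": "failure",
    "node_fail": "failure",
    "cancelled": "failure",
}


def group_label(status):
    """Grouping: map a fine-grained status to a smaller set of classes."""
    return STATUS_GROUPS.get(status, "other")


def combine_label(grouped, other_value):
    """Combining: join the grouped class with the value of a second variable."""
    return f"{grouped}_{other_value}"


def transform(records, status_key="status", other_key="queue"):
    """Apply grouping, then combining, to every job record."""
    return [
        combine_label(group_label(r[status_key]), r[other_key])
        for r in records
    ]


# Toy job records (assumed fields, not taken from the NECTEC dataset).
jobs = [
    {"status": "completed", "queue": "short"},
    {"status": "timeout", "queue": "long"},
    {"status": "exit_0", "queue": "short"},
    {"status": "unknown_code", "queue": "long"},
]

labels = transform(jobs)
label_counts = Counter(labels)
```

In this sketch, the transformed labels would serve as the new target variable for a classifier such as C5.0, SVM, or ANN. How the combined variable is chosen in the paper is not specified in the abstract.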
The authors are with the School of Computer Engineering, Suranaree University of Technology (SUT), Thailand.
Cite: Anupong Banjongkan, Watthana Pongsena, Ratiporn Chanklan, Nittaya Kerdprasop, and Kittisak Kerdprasop, "Multi-label Classification of High Performance Computing Workload with Variable Transformation," International Journal of Machine Learning and Computing, vol. 8, no. 6, pp. 536-541, 2018.