Prediction of Fused Magnesium Operating Mode Based on ADASYN-XGBoost

The operating modes in the smelting process of fused magnesium shift cyclically, resulting in severe fluctuations in electricity load. Accurate prediction of operating mode shifts can optimize the power supply curve of an electric furnace, improving electric energy efficiency and reducing electricity expenses. In this paper, we propose a prediction model of fused magnesium operating mode based on ADASYN-XGBoost. Four supervised machine learning algorithms, namely eXtreme Gradient Boosting (XGB), ADASYN-LGB, ADASYN-RF, and ADASYN-SVM, were compared with the proposed ADASYN-XGB method. The results indicate that ADASYN-XGB has the best prediction accuracy (92.5%), high average precision (>0.8), low Hamming loss (0.03), and low ranking loss (0.075). Based on these results for classification performance and prediction accuracy, ADASYN-XGB is a solid candidate for correct classification of operating modes. These findings suggest that ADASYN-XGB systems trained with real data may serve as a new tool to assist in the fused magnesium smelting process.


I. INTRODUCTION
The electrical fused magnesium furnace uses magnesite or lightly-burned magnesite powder as raw material. The raw material is heated to about 2800℃ so that it melts, impurities separate out, and it recrystallizes into qualified fused magnesium. Fused magnesium is widely used in the metallurgical industry, aerospace, nuclear power, and other fields. Based on the experience of experts and field operators, five operating modes are identified during the smelting process of fused magnesium: semi-melt mode, normal-melt mode, over-melt mode, A-load mode, and gas-expel mode. Additionally, taking the relationship between consecutive operating modes into account, the smelting process can also be divided into no-shifted mode and shifted mode.
The power required for fused magnesium differs between operating modes. Some operating modes require high power, such as normal-melt mode; others require less power, such as gas-expel mode. In addition, the duration of each operating mode is also different. In contrast, the shift between modes is very short, which leads to a dilemma: when predicted by algorithms, the shifted mode samples are deleted as abnormal data, and their predictions default to the previous mode. If the model is built directly with such limited data, the generalization performance on shifted mode is poor, and the prediction accuracy is low. Therefore, there are two problems to be solved: multi-label classification and imbalanced samples.
Nowadays, with the continuous development and application of information collection and data transmission in various industries, a large amount of multi-label data has been accumulated, and research on multi-label classification has gradually extended to image processing, gene function, emotion classification, and other fields. In recent years, a host of multi-label classification algorithms have been proposed, which fall into two major classes: problem transformation [1], [2] and algorithm adaptation [3]. The problem transformation methods decompose a multi-label problem into one or more single-label subproblems, which can be solved directly by traditional classification models. There are three main methods for problem transformation: Binary Relevance (BR) [4], Label Powerset [5], and Pairwise (pairwise association between labels). BR constructs a binary classifier for each label independently and does not consider label relevance at all. To introduce label relevance into BR, Godbole and Sarawagi proposed a model containing two layers of BR. Read et al. proposed a classifier chain (CC) model for multi-label classification, which constructs a chain of binary classifiers with each classifier corresponding to one label. Alternatively, the multi-label problem can be transformed into a single-label multi-class problem, as in RAkEL [6], [7] and EPS. The algorithm adaptation methods, by contrast, adapt existing single-label algorithms for multi-label classification. This line of research focuses on decision trees [8], k-nearest neighbors [9], [10], neural networks, and support vector machines. For example, A. Clare and R. D. King construct a decision tree top-down, with a root that contains all the training samples. For each non-leaf node in the tree, every feature is examined one by one to find the splitting point that splits the data at that node with maximum information gain.
Severe class imbalance is prevalent in practice, for example in fraud detection [11] and medical image analysis [12], so effective identification of imbalanced data is an important area of machine learning [13]. Although machine learning has developed considerably, relatively few studies address imbalanced data [14]. The convergence and performance of discriminative models can be significantly affected by skewed data [15], [16]. Since majority and minority classes have different effects on the objective function, classifiers tend to ignore labels with lower frequencies in order to achieve higher overall accuracy, which leads to more errors on minority classes. In multi-label learning, datasets have the same problem of uneven label distribution, and related research is very limited. Resampling is a classifier-independent solution that is widely used in traditional imbalance learning [13], for example SMOTE and ADASYN. Multi-label imbalance algorithms based on resampling methods have been proposed in [17]-[23] and have been shown to improve classification results. Essentially, the purpose of data resampling is to adjust the bias of the classifier by changing the prior probability distribution of the classes in the training data. In the data preprocessing stage, resampling methods are effective in improving the performance of the model.
This paper proposes a method to build an ADASYN-XGBoost model for predicting operating modes based on the features of the original data of electrical fused magnesium operating modes. After obtaining the raw data of the fused magnesium smelting process, we first use feature engineering to remove redundant features, then resample with ADASYN to generate five new training sets, build XGBoost models on them, and finally predict the operating modes.
The rest of this paper is organized as follows: Section II presents the operating mode prediction method based on ADASYN-XGBoost. Section III describes the experiments. Section IV presents the analysis of experimental results. Finally, Section V presents the conclusion and future work.

II. OPERATING MODE PREDICTION METHOD BASED ON ADASYN-XGBOOST
The five operating modes of electrical fused magnesium alternate cyclically according to a certain rule during the smelting process, but the duration of each operating mode varies. For example, the previous gas-expel mode may last about 12 seconds while the current one lasts 24 seconds. This uncertainty in duration makes the prediction of fused magnesium operating modes very difficult. In this paper, we propose an operating mode prediction method based on ADASYN-XGBoost. We build five models, one for each current operating mode, and input each sample of the testing dataset into the model that matches its recognized current operating mode. The method mainly includes two steps: Model Building Based on ADASYN and Prediction of Operating Mode. The flow of the method is shown in Fig. 1.

A. Model Building Based on ADASYN
Two modes were mentioned in Section I: no-shifted mode and shifted mode. "No-shifted mode" means that the current operating mode and the next mode are the same, and "shifted mode" means that they are different. The fused magnesium operating modes behave as follows: (1) an operating mode continues for a while before a mode shift occurs, so no-shifted modes are the majority class and shifted modes are the minority class; (2) there is often a correlation between the current mode and the next mode, so the first issue discussed in this section is the shift order of the operating modes.
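The labelling rule above can be illustrated with a minimal sketch. The mode sequence is made up for illustration and the function name `label_shifts` is hypothetical; it only shows how comparing each mode with the next one yields a majority of no-shifted samples and a minority of shifted samples.

```python
from collections import Counter

def label_shifts(modes):
    """Label each time step 'shifted' if the next mode differs, else 'no-shifted'.

    The last step has no successor, so it receives no label.
    """
    return [
        "shifted" if nxt != cur else "no-shifted"
        for cur, nxt in zip(modes, modes[1:])
    ]

# Made-up sequence of operating modes sampled over time.
modes = ["semi-melt"] * 4 + ["normal-melt"] * 5 + ["gas-expel"] * 3
labels = label_shifts(modes)
counts = Counter(labels)
print(counts)  # shifted samples are far rarer than no-shifted ones
```

Because a mode persists for many samples before each shift, the shifted label is rare by construction, which is the imbalance that resampling is meant to address.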

B. Classification Mining Process Based on CART Tree
The CART tree determines the attribute division points by calculating the Gini impurity coefficient, which measures the information contained in the dataset. Assuming that there are $K$ classes and that the probability of a sample belonging to the k-th class is $p_k$, the Gini impurity coefficient of the probability distribution is shown in formula (1).
$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2 \tag{1}$$
According to the definition of the Gini coefficient, the Gini index of the sample set $U$ is shown in formula (2), where $C_k$ represents the subset of samples in $U$ that belong to the k-th class.
$$\mathrm{Gini}(U) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|U|} \right)^2 \tag{2}$$
When feature $A$ is divided at a certain value $a$, $U$ is divided into $U_1$ and $U_2$. The Gini coefficient $\mathrm{Gini}(U)$ reflects the probability that two samples randomly selected from $U$ have inconsistent class labels: the smaller $\mathrm{Gini}(U)$ is, the higher the purity of $U$ and the better the branch. The Gini coefficient $\mathrm{Gini}(U, A)$ represents the impurity of the set $U$ after division by $A = a$, as shown in formula (3).
$$\mathrm{Gini}(U, A) = \frac{|U_1|}{|U|}\,\mathrm{Gini}(U_1) + \frac{|U_2|}{|U|}\,\mathrm{Gini}(U_2) \tag{3}$$
The structure of the operating mode generated by the CART tree is shown in Fig. 2.
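As a small worked illustration of the Gini-based split selection described in this subsection, the sketch below computes the Gini impurity of a label set and the weighted impurity after a threshold split. The feature values, the labels, and the convention of splitting on `value <= a` at midpoints between sorted values are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(values, labels, a):
    """Weighted Gini impurity after dividing the samples at value <= a."""
    left = [y for v, y in zip(values, labels) if v <= a]
    right = [y for v, y in zip(values, labels) if v > a]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, impurity).

    Assumes the feature takes at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    return min(((a, gini_split(values, labels, a)) for a in candidates),
               key=lambda pair: pair[1])

# Made-up feature readings and operating-mode labels.
current = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
mode = ["gas-expel"] * 3 + ["normal-melt"] * 3
print(gini(mode))                 # 0.5 for two equally frequent classes
print(best_split(current, mode))  # (16.0, 0.0): both branches are pure
```

A split with weighted impurity 0 separates the two classes completely, which is the "higher purity, better branch" criterion stated above.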
International Journal of Machine Learning and Computing, Vol. 12, No. 5, September 2022

ADASYN is applied separately to the training set of each of the five models. For the j-th model, $m_s$ and $m_l$ denote the numbers of minority class and majority class samples, and $d_{th}$ is a predefined threshold for the maximum tolerable degree of class imbalance ratio.
The main steps can be summarized as follows:
1) Determine the minority class of the j-th model, where $j = 1,2,3,4,5$.
2) Calculate the degree of imbalance of the j-th model:
$$d_j = \frac{m_s}{m_l} \tag{4}$$
where $d_j \in (0,1]$.
3) If $d_j < d_{th}$, calculate the number of synthetic data instances that need to be generated for the minority class of the j-th model:
$$G_j = (m_l - m_s) \times \beta \tag{5}$$
where $\beta \in (0,1]$ is a parameter that specifies the desired balance level after the j-th model generates synthetic data instances. $\beta = 1$ means that the j-th model generates a fully balanced dataset after the synthesis process.
4) For each sample $x_i$ belonging to the minority class in the j-th model, find the $K$ nearest neighbors in $n$-dimensional space using the Euclidean distance and calculate the ratio
$$r_i = \frac{\Delta_i}{K}, \quad i = 1, \dots, m_s \tag{6}$$
where $\Delta_i$ is the number of samples belonging to the majority class among the $K$ nearest neighbors of $x_i$, so $r_i \in [0,1]$.
5) Normalize $r_i$ according to $\hat{r}_i = r_i / \sum_{i=1}^{m_s} r_i$ to obtain the density distribution $\hat{r}_i$ ($\sum_{i=1}^{m_s} \hat{r}_i = 1$).
6) Calculate the number of synthetic data instances to be generated for each minority class sample $x_i$ of the j-th model:
$$g_i = \hat{r}_i \times G_j \tag{7}$$
where $G_j$ is the total number of synthetic data instances that need to be generated for samples of the minority class, as shown in formula (5).
7) For each minority class sample $x_i$ of the j-th model, generate $g_i$ synthetic data instances: a minority class sample $x_{zi}$ is randomly selected from the $K$ nearest neighbors of $x_i$, and a synthetic data instance is generated as
$$s_i = x_i + (x_{zi} - x_i) \times \lambda \tag{8}$$
where $(x_{zi} - x_i)$ is the difference vector in $n$-dimensional space and $\lambda \in [0,1]$ is a random variable.
8) Determine whether the new training set obeys the identical distribution. If it does, the new training set is sent to XGB to build the model; if it does not, steps 2) to 7) are repeated until it does.
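Steps 2) to 7) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `adasyn`, the default values of `beta`, `K` and `d_th`, and the toy data are all hypothetical; the neighbour used for interpolation is drawn from the nearest minority samples only; the per-sample counts are rounded to integers; and the distribution check of step 8) is omitted.

```python
import math
import random

def adasyn(X, y, minority, beta=1.0, K=5, d_th=0.75, seed=0):
    """Generate synthetic minority samples following steps 2) to 7).

    beta, K and d_th are illustrative defaults, not the settings of the paper.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    m_s, m_l = len(min_idx), len(y) - len(min_idx)
    d = m_s / m_l                              # step 2: degree of imbalance
    if d >= d_th:
        return []                              # already balanced enough
    G = (m_l - m_s) * beta                     # step 3: total samples to create

    def nearest(i, pool):
        """Indices of the K points in `pool` closest to X[i], excluding i."""
        others = [j for j in pool if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:K]

    # Step 4: fraction of majority samples among the K nearest neighbours.
    r = [sum(y[j] != minority for j in nearest(i, range(len(y)))) / K
         for i in min_idx]
    total = sum(r)
    if total == 0:
        return []                              # no minority sample is near the majority
    r_hat = [value / total for value in r]     # step 5: normalise so the ratios sum to 1
    g = [round(value * G) for value in r_hat]  # step 6: samples per minority point

    synthetic = []
    for i, g_i in zip(min_idx, g):             # step 7: interpolate towards a neighbour
        candidates = nearest(i, min_idx)       # assumption: minority neighbours only
        for _ in range(g_i if candidates else 0):
            z = rng.choice(candidates)
            lam = rng.random()
            synthetic.append([a + (b - a) * lam for a, b in zip(X[i], X[z])])
    return synthetic

# Made-up 2-D data: 8 majority ("no-shifted") and 3 minority ("shifted") samples.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 2], [1, 2],
     [3, 3], [3, 4], [4, 3]]
y = ["no-shifted"] * 8 + ["shifted"] * 3
new_points = adasyn(X, y, minority="shifted", K=3)
print(len(new_points), new_points[0])
```

Because each per-sample count is rounded, the number of generated samples can differ slightly from the target total. Every synthetic point lies on a segment between two minority samples, so it stays inside the region the minority class already occupies.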

D. Prediction of Operating Mode
The current operating mode of each sample in the testing set is recognized first, and the sample is then input into the corresponding model for prediction. Specifically, if the current mode is semi-melt mode, the data is input to model 1, which returns a predicted mode; if the current mode is A-load mode, the data is checked in turn by the semi-melt module, normal-melt module, and over-melt module until it reaches the A-load module, and once it is recognized as A-load mode, the testing data is input to model 4, which returns a predicted mode. After this series of recognition and prediction steps, the operating mode of every sample in the testing set has been predicted, yielding a set of predicted operating modes.
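The recognise-then-route procedure can be sketched as a simple dispatch over the five modes. Everything here is illustrative: `ConstantModel` is a hypothetical stand-in for a trained XGBoost classifier, the sample layout is invented, and the predicted transitions are made up.

```python
MODES = ["semi-melt", "normal-melt", "over-melt", "A-load", "gas-expel"]

class ConstantModel:
    """Stand-in for a trained classifier; always predicts one mode (hypothetical)."""

    def __init__(self, next_mode):
        self.next_mode = next_mode

    def predict(self, features):
        return self.next_mode

def predict_next_mode(sample, models):
    """Check the modes in order and route the sample to the matching model."""
    for index, mode in enumerate(MODES, start=1):   # model 1 .. model 5
        if sample["current_mode"] == mode:
            return index, models[mode].predict(sample["features"])
    raise ValueError(f"unknown operating mode: {sample['current_mode']!r}")

# One stand-in model per current mode; the transitions below are made up.
models = {mode: ConstantModel(next_mode=mode) for mode in MODES}
models["A-load"] = ConstantModel(next_mode="gas-expel")

sample = {"current_mode": "A-load", "features": [0.4, 1.2]}
print(predict_next_mode(sample, models))  # (4, 'gas-expel')
```

Keeping one model per current mode means each classifier only has to learn the transitions that can follow its own mode, which is the motivation for building five models rather than one.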

III. EXPERIMENTS
The data used in the experiment comes from the real production of an electrical fused magnesium enterprise. After preprocessing the raw data, the training set contains over 1870 samples and the testing set contains 800 samples. However, mode shifting occurs in only 140 samples of the training set, accounting for only 7.38%. If the model is trained on the original training set, it will be difficult to accurately predict shifted-mode data when the testing set is input. This is a typical sample imbalance problem. Therefore, we propose an ADASYN-XGBoost method to solve the problem caused by imbalanced data.

A. Experimental Metrics
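In this experiment, we use six evaluation metrics: Hamming Loss, Ranking Loss, Coverage, One Error, Average Precision, and the Kappa coefficient. In the following, $N$ is the number of samples, $L$ is the number of labels, $Y_i$ is the true label set of sample $x_i$, and $f(x_i, y)$ is the predicted probability of label $y$ for sample $x_i$. The metrics are described in detail as follows:

1) Hamming loss
Hamming Loss measures the fraction of sample-label pairs that are misclassified by a multi-label classification model.
$$\mathrm{HL} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L} \left| h(x_i) \,\Delta\, Y_i \right| \tag{9}$$
where $h(x_i)$ is the predicted label set of sample $x_i$ and $\Delta$ denotes the symmetric difference between two sets.

2) Coverage
Coverage measures how far, on average, we need to go down the ranked label list to cover all true labels of a sample.
$$\mathrm{Coverage} = \frac{1}{N} \sum_{i=1}^{N} \max_{y \in Y_i} \mathrm{rank}_f(x_i, y) - 1 \tag{10}$$
where $\mathrm{rank}_f(x_i, y)$ is the rank of label $y$ when the labels are sorted by $f(x_i, \cdot)$ in descending order.

3) One error
One error indicates the proportion of samples whose label with the highest predicted probability value is not in the true label set.
$$\mathrm{OneError} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \arg\max_{y} f(x_i, y) \notin Y_i \right] \tag{11}$$
where $\mathbb{1}[\cdot]$ equals 1 if the condition holds and 0 otherwise.

6) KAPPA
The Kappa coefficient is used to test the consistency of the predicted results of the classifier with the actual results.
$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{14}$$
where $p_o$ is the observed agreement (overall accuracy) between the predicted and actual results, and $p_e$ is the agreement expected by chance.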
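The metrics described above can be illustrated with a short pure-Python sketch. It assumes the usual definitions from the multi-label literature (Hamming loss as the normalised symmetric difference, coverage as the deepest rank of a true label minus one, one error from the top-ranked label, and Cohen's kappa from observed and chance agreement); the function names and the toy scores are made up for the example.

```python
def hamming_loss(true_sets, pred_sets, n_labels):
    """Average fraction of labels on which prediction and truth disagree."""
    total = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return total / (len(true_sets) * n_labels)

def label_ranks(scores):
    """Rank of each label by descending score (1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    return {label: position for position, label in enumerate(order, start=1)}

def coverage(true_sets, score_rows):
    """Average depth in the ranked list needed to cover all true labels, minus 1."""
    steps = [max(label_ranks(s)[label] for label in t) - 1
             for t, s in zip(true_sets, score_rows)]
    return sum(steps) / len(steps)

def one_error(true_sets, score_rows):
    """Fraction of samples whose top-scored label is not a true label."""
    misses = [max(range(len(s)), key=lambda k: s[k]) not in t
              for t, s in zip(true_sets, score_rows)]
    return sum(misses) / len(misses)

def cohen_kappa(y_true, y_pred):
    """Kappa = (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    n = len(y_true)
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))
    return (p_o - p_e) / (1 - p_e)

# Made-up scores for 3 samples over 5 labels (indices 0..4 stand for the five modes).
true_sets = [{0}, {1}, {4}]
score_rows = [[0.60, 0.20, 0.10, 0.06, 0.04],
              [0.50, 0.30, 0.10, 0.06, 0.04],
              [0.04, 0.06, 0.10, 0.20, 0.60]]
top_labels = [max(range(5), key=lambda k: row[k]) for row in score_rows]
pred_sets = [{k} for k in top_labels]
y_true = [0, 1, 4]

print(hamming_loss(true_sets, pred_sets, n_labels=5))  # 2 wrong entries out of 15
print(coverage(true_sets, score_rows))                 # one sample needs one extra step
print(one_error(true_sets, score_rows))                # one of three top labels is wrong
print(cohen_kappa(y_true, top_labels))
```

In this toy case the second sample is the only error: its true label is ranked second, so it contributes to Hamming loss, coverage, and one error at the same time.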

B. Experimental Results
In this experiment, we select five machine learning algorithms to build the models: eXtreme Gradient Boosting (XGB), ADASYN-LightGBM (ADASYN-LGB), ADASYN-RF, ADASYN-SVM, and ADASYN-XGBoost. Based on the original training set, the best model is built by seeking the optimal combination of model parameters through grid search and other methods. Finally, the built models are evaluated with the testing set. The following are the test results of the various machine learning algorithms.

IV. ANALYSIS OF EXPERIMENTAL RESULTS

A. Performance Metrics
For each evaluation metric, "↓" indicates better performance for smaller values and "↑" indicates better performance for larger values. Each result consists of a mean and a rank. The best results among the five machine learning algorithms are highlighted in bold. If two or more algorithms achieve the same performance on a given evaluation metric, each is assigned the average of the ranks they jointly occupy. To present the results more clearly, the average rank of each algorithm over all evaluation metrics is calculated and recorded in the last column of each table.
Table I and Table II report in detail the results of the experiments with XGB, ADASYN-LGB, ADASYN-RF, ADASYN-SVM, and ADASYN-XGB as classifiers. To compare their effectiveness more intuitively, the average ranking of each algorithm is given in Fig. 3 and Fig. 4. Fig. 3 shows, for each performance metric, the average ranking of each algorithm on the testing set. For example, for Hamming Loss, XGB and ADASYN-RF have equal values, so their average ranking for Hamming Loss is (4 + 5)/2 = 4.5. Fig. 4 shows the overall average ranking of each algorithm across all experiments. For example, ADASYN-SVM has average rankings of {2, 2.8} on the dataset, so its overall average ranking is (2 + 2.8)/2 = 2.4. Based on these experimental results, the following conclusions are drawn:
1) As can be seen from Table I, XGB has the worst performance, ADASYN-RF is the next, and ADASYN-XGBoost has the best performance. Specifically, XGB performs significantly worse than the other machine learning algorithms in all performance metrics. The performance metrics all improved when the models were built on the new training set resampled by ADASYN. ADASYN-XGBoost obtained the best performance on Hamming Loss, Coverage, Ranking Loss, One Error, and Average Precision. Fig. 4 also shows that, over all evaluation metrics, ADASYN-RF performs the worst, while ADASYN-XGBoost performs the best. This indicates that resampling can improve classification performance while optimizing the classification ranking.
2) For fused data, the ADASYN-XGBoost model outperforms the other models, as shown in Table I. Specifically, ADASYN-XGBoost has the lowest Hamming Loss (about 0.03) and Ranking Loss (0.075) and an Average Precision of 0.802. For shifted data, ADASYN-XGBoost also outperforms the other models, with a Hamming Loss of 0.2895, a Ranking Loss of 0.7237, and an Average Precision of 0.2567. As can be seen from Fig. 4, ADASYN-XGBoost improves on the performance of XGB on all evaluation metrics. These experimental results also verify that increasing the number of minority class samples in the training set by ADASYN resampling can improve the performance of the XGB algorithm.
3) In summary, the ADASYN-XGBoost algorithm performs the best, the ADASYN-SVM and XGB algorithms are second, and the other two algorithms are the worst. This indicates that improving the traditional XGB algorithm with ADASYN, as done in this paper, is effective. It also shows that a model based on XGB should not only select features from the feature space, but also consider the influence of minority class samples in imbalanced data on the classification results, so as to further improve model performance.
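The tie-aware ranking used for the tables and figures in this subsection can be sketched as follows. The Hamming Loss values are made up so that XGB and ADASYN-RF tie for the last two places, as in the example in the text; the function name `average_ranks` is hypothetical.

```python
def average_ranks(values, smaller_is_better=True):
    """Rank algorithms 1..n (1 = best); tied values share the mean of their ranks."""
    order = sorted(values, key=values.get, reverse=not smaller_is_better)
    ranks, pos = {}, 0
    while pos < len(order):
        end = pos
        # Extend the group while the next algorithm has the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2        # mean of the tied positions
        for name in order[pos:end + 1]:
            ranks[name] = shared
        pos = end + 1
    return ranks

# Made-up Hamming Loss values (smaller is better); XGB and ADASYN-RF are tied.
hamming = {"XGB": 0.08, "ADASYN-LGB": 0.05, "ADASYN-RF": 0.08,
           "ADASYN-SVM": 0.04, "ADASYN-XGB": 0.03}
ranks = average_ranks(hamming)
print(ranks["XGB"], ranks["ADASYN-RF"])   # 4.5 4.5, i.e. (4 + 5) / 2

# Overall average ranking across two tables, as in the ADASYN-SVM example.
overall = sum([2, 2.8]) / 2
print(overall)
```

For metrics where larger values are better, such as Average Precision, the same function can be called with `smaller_is_better=False`.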

B. Kappa Coefficient
The overall accuracy and Kappa coefficient of the five machine learning algorithms, each ranging from a lowest value of 0.0 to a highest value of 1.0, are shown in Table III. As can be seen from Table III, the overall accuracy and Kappa coefficient of ADASYN-XGBoost are the highest for fused data, while for shifted data, the overall accuracy and Kappa coefficient of ADASYN-XGBoost are improved compared with XGB.

V. CONCLUSION
In this paper, we propose an operating mode prediction model for electrical fused magnesium based on ADASYN-XGBoost. We use the ADASYN resampling technique to generate five different training sets and build five different models based on the traditional XGBoost model, one for each current operating mode, and input the testing set data into the respective models for prediction according to the current operating mode. The improved model achieved the best results on six metrics, namely Hamming Loss, Ranking Loss, Coverage, One Error, Average Precision, and Kappa, which effectively improves the prediction accuracy of electrical fused magnesium operating modes. Given this performance, the ADASYN-XGBoost method could be a valid and effective tool for enterprises in the smelting process of fused magnesium.
Our study also has some limitations, such as not conducting experiments on a wider range of datasets. The effect of more industrial models and different sampling methods on the results will be tested in future studies.