Application of Deep Learning for Credit Card Approval: A Comparison with Two Machine Learning Techniques

The rising number of credit card defaulters has forced companies to scrutinize credit applications more carefully before approval. Credit card companies usually rely on their own judgment to determine whether a card should be issued to a customer who satisfies certain criteria, and some machine learning algorithms have also been used to support the decision. The main objective of this paper is to build a deep learning model, based on a dataset from the UCI (University of California, Irvine) repository, that can support the credit card approval decision. The second objective is to compare the performance of this model with that of two traditional machine learning algorithms: logistic regression (LR) and support vector machine (SVM). Our results show that the overall performance of the deep learning model is slightly better than that of the other two models.


I. INTRODUCTION
The growth of the internet has led to a significant rise in credit card usage, and the credit card is now one of the most widely used payment methods. As the world economy grows, credit card fraud is also increasing at an alarming rate [1], and the number of credit card defaulters has risen significantly as well. Consequently, card-issuing institutions are becoming more meticulous in approving credit cards for customers. In addition, the downturn of financial institutions in the USA and Europe during the US subprime mortgage crisis and the European sovereign crisis has raised concerns about proper risk management [2]. These challenges have attracted significant attention from researchers and practitioners, and a wide range of statistical and machine learning techniques have been developed to solve credit card related problems (see [1]-[7]). Machine learning techniques have been found to be superior to traditional statistical techniques in credit scoring [8]-[11]. In particular, deep learning is reported to be a popular and accurate classification technique that outperforms other machine learning models (e.g., logistic regression (LR), linear discriminant analysis (LDA), multiple discriminant analysis (MDA), k-nearest neighbor (k-NN), and decision trees) [12]. Deep learning is also a state-of-the-art research area for solving various practical problems, including credit card fraud [6]. Some of the problems for which deep learning has been found to be the best-performing method are listed in Table I [7], [12], [40], [41].
However, the literature review reveals that very little research has addressed the decision of whether a customer should be issued a credit card based on the information they provide. This study therefore aims to support decision-makers in deciding whether to issue a credit card to a customer. It has two objectives. First, it builds a deep learning model based on the best parameters for the credit card dataset. Second, it conducts a comparative study between deep learning and two traditional machine learning algorithms (logistic regression and SVM).

A. Logistic Regression Model
Logistic regression (LR) is one of the most commonly applied statistical techniques for credit card analysis [5], [30], [31]. It predicts the likelihood of an outcome that can take only two states (i.e., a dichotomy), using one or several predictors (numerical or categorical). According to [7], it seeks the best-fit parameters to determine the probability of the binary response based on one or more features. Based on the independent variables of each credit card application, it provides a probability that is used to classify the application as accepted or rejected [5]: if the probability is larger than the threshold value, the application is accepted; otherwise, it is rejected. The LR function takes the client characteristics as input and outputs the probability of default:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)}}$$
where $p$ is the probability of default, $x_i$ is explanatory factor $i$, $\beta_i$ is the regression coefficient of explanatory factor $i$, and $n$ is the number of explanatory variables. For each of the existing data points, it is known whether the client was accepted or not (i.e., $p = 1$ or $p = 0$). The aim is to find the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$ such that the model's predicted probability agrees as closely as possible with the observed outcome.
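The thresholding rule described above can be illustrated with a minimal Python sketch. The coefficient values, the feature vector, and the threshold of 0.5 are hypothetical and chosen only for illustration; they are not fitted to the UCI dataset. Following the text, the output probability is compared with the threshold and the application is accepted when the probability is larger.

```python
import math


def logistic_probability(x, beta0, betas):
    """Return p = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, beta0, betas, threshold=0.5):
    """Accept when the probability exceeds the threshold, otherwise reject."""
    p = logistic_probability(x, beta0, betas)
    return "accepted" if p > threshold else "rejected"


# Hypothetical coefficients and applicant features (illustration only).
beta0, betas = -1.0, [2.0, 0.5]
applicant = [0.8, 1.0]

print(logistic_probability(applicant, beta0, betas))
print(classify(applicant, beta0, betas))
```

In practice the coefficients would be estimated from the training data (for example by maximum likelihood) rather than set by hand as in this sketch.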

B. Support Vector Machine (SVM) Model
The support vector machine (SVM) is an algorithm that learns from labeled examples and then makes predictions on new instances [42]. For example, an SVM can learn to recognize fraudulent credit card activity by examining hundreds or thousands of fraudulent and non-fraudulent credit card activity reports. The SVM was first introduced in [43]. It is used as a classification and regression tool that aims to maximize predictive accuracy [2], and it is well suited to supervised learning problems in which the data can be separated linearly [7]. Support vector regression (SVR) methods aim to approximate, in the standard formulation, the function
$$f(x) = w^{\top} \phi(x) + b$$
by minimizing the objective function
$$\frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} L\big(y_i, f(x_i)\big)$$
where $\lVert w \rVert$ is the regularization term, $L(\cdot,\cdot)$ is the loss function, $C$ is the trade-off between model complexity and error on the training dataset, $\phi$ is the feature mapping, and $N$ is the number of training instances. The graphical representation of SVR can be seen in Fig. 1. An advantage of SVR is that its solution space is convex, which results in a unique solution.
Data points are not always linearly separable; kernel functions make it possible to transform a nonlinear dataset into a form in which a linear separation exists. Fig. 2 shows the transformation of a nonlinear dataset into a linearly separable one by means of kernel functions. With a kernel, the prediction function takes the form
$$y = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*})\, K(x_i, x) + b$$
where $y$ is the output, $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers, $x_i$ is an input vector, $K(x_i, x)$ is the kernel function, and $b$ is the bias.
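As a rough illustration of how a kernel enters the prediction, the sketch below evaluates the usual kernel expansion, y = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, using the symbols defined above. The choice of a Gaussian (RBF) kernel, the value of gamma, and the support vectors and multipliers are all assumptions made for this example; the paper does not state which kernel or values were used.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2). Assumed kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def kernel_predict(x, support_vectors, alphas, alphas_star, bias, kernel=rbf_kernel):
    """Evaluate y = sum_i (alpha_i - alpha_i*) * K(x_i, x) + b."""
    weighted = sum(
        (a - a_star) * kernel(sv, x)
        for sv, a, a_star in zip(support_vectors, alphas, alphas_star)
    )
    return weighted + bias


# Hypothetical support vectors and multipliers, not fitted to any data.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
alphas = [0.7, 0.0]
alphas_star = [0.0, 0.4]
bias = 0.1

print(kernel_predict([0.0, 0.0], support_vectors, alphas, alphas_star, bias))
```

The kernel is evaluated only between the query point and the stored support vectors, so the nonlinear mapping never has to be computed explicitly.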

C. Deep Learning
Deep learning (DL) is a subset of machine learning methods based on artificial neural networks (ANNs). The core concept of deep learning is automating the extraction of features from the data [43]. According to [44], "deep learning is a class of machine learning algorithms that: (1) use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input, (2) learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts." Deep learning has recently drawn much attention from researchers in the field of machine learning [6]. It is considered a robust approach for image identification and credit fraud detection [5]. A DL model is a multi-layer perceptron network that is trained with stochastic gradient descent [7]. In principle, a deep learning network is an ANN with many hidden layers; in contrast, non-deep feed-forward neural networks have only a single hidden layer. Fig. 3 shows a non-deep network, and Fig. 4 shows a deep network with multiple hidden layers.
A sigmoid or a tanh function is applied as the activation function in the deep learning algorithm (see (5) and (6)).
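For reference, the two activation functions named above can be written in a few lines of Python. This is a minimal sketch using the standard definitions, sigmoid(z) = 1 / (1 + e^(-z)) and the hyperbolic tangent; the function names are chosen here for illustration, and the branch in `sigmoid` is only a numerical-stability detail.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Hyperbolic tangent, mapping any real z into (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z))
```

The two functions are closely related, since tanh(z) = 2 * sigmoid(2z) - 1; the main practical difference is that tanh is centered at zero.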
III. DATA
This study used the credit card approval dataset from the UCI Machine Learning Repository to evaluate the experimental results (see [45]). The UCI Machine Learning Repository is considered to be a good source of data for conducting empirical and methodological research in deep learning. In the dataset, the attribute names and values have been replaced with arbitrary symbols to maintain the confidentiality of the data. Table II presents the details of the dataset.

A. Data Pre-processing
Some missing values were found in the dataset; they were handled by replacing them using an appropriate machine learning approach. All categorical attributes were converted to binary numerical attributes. Then, all data were normalized.
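The three pre-processing steps can be sketched as follows. The paper does not specify the imputation or normalization method, so this example assumes mean imputation for numeric columns, most-frequent-value imputation for categorical columns, one-hot encoding, and min-max scaling. The column names and values are hypothetical.

```python
from collections import Counter


def impute(column):
    """Replace None with the mean (numeric) or the most frequent value (categorical)."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(column))
    return {c: [1 if v == c else 0 for v in column] for c in categories}


def min_max(column):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical toy columns; the real attributes are anonymized.
age = [30.0, None, 50.0, 40.0]
status = ["u", None, "y", "u"]

age_clean = min_max(impute(age))
status_clean = one_hot(impute(status))
print(age_clean)
print(status_clean)
```

In a cross-validation setting, the imputation values and scaling ranges would normally be computed on the training folds only and then applied to the held-out fold.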

B. Data Analyzing Platform
The data were analyzed with the respective machine learning algorithms (LR, SVM, and DL) under different parameter settings. The WEKA tool was used for SVM and LR, while the DL model was implemented in the Python programming language.

IV. EXPERIMENTAL DESIGN
The main purpose of this study is to build a deep neural network based on the parameters that provide the best performance. Different configurations of DL architectures are examined by varying the number of layers and the number of neurons in each layer to see which configuration gives the best performance on the dataset. A total of 24 different combinations are evaluated for DL, in which 2-, 3-, 5-, and 7-hidden-layer networks with 3, 5, 7, 16, 32, and 64 neurons are tested. The number of neurons is kept the same in each layer for a single network configuration. For instance, in a 5-hidden-layer network with 16 neurons, each of the 5 hidden layers has 16 neurons. In the first experiment, the following DL parameters were used: loss function: binary cross-entropy; optimizer: Adam; activation function: rectified linear unit (ReLU); batch size for training and prediction: 15; and epochs: 50. A sigmoid function was used in the output layer. The popular 10-fold cross-validation approach is used for model evaluation and model selection to avoid overfitting the classifiers [46]. A grid search over the parameter space is employed to fine-tune the important parameters and find the best values. Table III lists the best parameters found for the deep learning model after several experiments.
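As an aside on the search procedure described in this section, the 24 architecture combinations and the 10-fold split can be enumerated with a short Python sketch. This is an illustration under stated assumptions, not the authors' code: the dictionary keys, the helper names, and the sample count of 25 are hypothetical, and the folds are contiguous without shuffling, which a real experiment would likely add.

```python
from itertools import product

HIDDEN_LAYERS = [2, 3, 5, 7]
NEURONS = [3, 5, 7, 16, 32, 64]

# Every (number of hidden layers, neurons per layer) pair: 4 x 6 = 24 configurations.
configs = [
    {"hidden_layers": n_layers, "neurons_per_layer": n_units}
    for n_layers, n_units in product(HIDDEN_LAYERS, NEURONS)
]


def layer_sizes(config):
    """All hidden layers of one configuration share the same width."""
    return [config["neurons_per_layer"]] * config["hidden_layers"]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


print(len(configs))
print(layer_sizes({"hidden_layers": 5, "neurons_per_layer": 16}))
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```

Each configuration would then be trained and scored on every fold, and the configuration with the best average score would be retained.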

A. Metrics
The chosen algorithms treat the underlying fraud detection issue as a classification problem. The confusion matrix given in Table IV is used to compute the evaluation metrics. However, the classical metrics of accuracy and the confusion matrix alone cannot capture the actual fraud identification rate because of the skewed number of instances in each class. Thus, metrics that balance the detection of both classes have also been considered.
International Journal of Machine Learning and Computing, Vol. 11, No. 4, July 2021
False positive rate: FP / (FP + TN), where FP is the number of false positives and TN is the number of true negatives.
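The metrics reported in this study can all be derived from the four cells of a confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, and F1-measure together with the false positive rate given above; the function name and the counts are hypothetical and are not taken from Table IV or Table V.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, which holds when both classes
    are present and at least one positive prediction is made.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_positive_rate = fp / (fp + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-measure is the harmonic mean of precision and recall, it drops sharply when either of the two is low, which is why it is often preferred to accuracy on skewed classes.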

B. Experimental Results
In this paper, three algorithms, namely SVM, LR, and DL, are compared with each other. The WEKA (Waikato Environment for Knowledge Analysis) tool is used for the support vector machine (SVM) and logistic regression (LR) to calculate the efficiency based on the accuracy obtained from the confusion matrix, and the deep learning (DL) model is implemented in Python. Table V presents the experimental results. For each classifier, the F1-measure, precision, recall, false positive (FP) rate, and accuracy are reported. Because the deep learning results depend on the initial parameters, the algorithm was run five times, and the results reported in Table V are the averages over the five runs. Accuracy is the percentage of correctly classified instances and measures the ability to make accurate predictions on previously unseen cases. The F1-measure reflects the overall performance of the model. The recall metric represents the proportion of the actual rejected applications that have been correctly predicted, while the precision metric denotes the proportion of the predicted rejected applications that are actually rejected. Both recall and precision are important evaluation metrics. In addition, the false positive rate is defined as the proportion of actual negative applications that have been wrongly categorized as positive (false positives).
As shown in Table V, DL achieves the highest accuracy at 87.10%, whereas the other two classifiers have the same accuracy of 86.23%. Moreover, the precision and recall of DL are higher than those of SVM and LR. The recall value is the same for SVM and LR, while their precision values differ slightly. The comparative results indicate that deep learning performs better on the credit card dataset.
Specifically, DL achieves the highest F1-measure score of 0.886, which indicates the overall performance of the model. The F1-measure is 0.863 for SVM and 0.861 for LR, so these two algorithms produced almost the same F1-measure scores. With respect to the false positive rate, SVM outperformed the other two algorithms: 12.80% for SVM, 16.10% for LR, and 16% for DL. Based on all the performance measures in Table V except the FP rate, we can conclude that the deep learning model performs slightly better than the other two models.

VI. CONCLUSION
In this paper, we have built a deep learning model based on the best parameters found by the grid search technique. The model was then applied to the credit card dataset, and its results were compared with those of the logistic regression and support vector machine models. We conclude that the deep learning model performed slightly better than the other two models, while LR and SVM produced almost the same results. In the future, further experiments can be conducted on a larger dataset to assess the comparative accuracy and applicability of these methods.