Keystroke dynamics based user authentication using deep multilayer perceptron

 Abstract —User authentication is essential for protecting digital services and preventing malicious users from gaining access to a system. Because Single Factor Authentication (SFA) is less secure, organizations have started to adopt Multi-Factor Authentication (MFA), which provides more reliable protection by using two or more identification measures. Keystroke dynamics is a behavioral biometric that analyzes a user's typing rhythm to verify the legitimacy of the subject accessing the system. It has a low implementation cost and requires no additional hardware, since typing data can be collected without extra effort from the user. This study proposes a deep learning model based on the Multilayer Perceptron (MLP) for keystroke-dynamics user authentication on the CMU benchmark dataset. The typing rhythm of 51 subjects was collected from a static password (.tie5Roanl) typed 400 times per subject, over 8 sessions of 50 repetitions each. The MLP achieved an EER of 4.45%, compared with the original benchmark classifiers: 9.6% (scaled Manhattan), 9.96% (Mahalanobis Nearest Neighbor), 10.22% (Outlier Count), 10.25% and 16.14% (Neural Network


I. INTRODUCTION
The advancement of information technology and the pervasive nature of digital services have led to a massive explosion of data. Privacy and security are major challenges for organizations facing a growing number of security breaches. Authentication is one of the fundamental methods to ensure the confidentiality and availability of data to the legitimate user. Single Factor Authentication (SFA) is prone to vulnerabilities because users choose weak passwords, which attackers can crack with techniques such as brute force and dictionary attacks [1].
On the other hand, authentication can be strengthened by combining two or more independent factors (smartcard, security hardware token, biometrics, etc.), an approach known as Multi-Factor Authentication (MFA). The use of a user's biometric properties for authentication is gaining immense interest from recent software products and organizations, as it tackles the issue of transferability of credentials [2].
Biometric-based authentication is categorized by the physiological and behavioral properties of the user. Physiological properties cover visible parts of the human body such as the retina and fingerprint. Behavioral properties, on the other hand, describe the behavior of a user through user profiling, gait, mouse dynamics, keystroke dynamics, etc. These unique behavioral properties can be used to enhance the user verification process and to develop a multi-modal user authentication system. For instance, when keystroke dynamics is implemented alongside a password-based authentication system, an impostor needs to know not only the password but also how the password is typed. Thus, multi-modal user authentication provides better security.
Keystroke dynamics is a user authentication method that validates the user's typing rhythm before allowing access to the system. It is an emerging field of interest in security, especially in user authentication, because of its advantages. Firstly, keystroke dynamics has a low implementation cost, and no additional hardware is required in the authentication process. Secondly, it is easier to implement than other biometric authentication methods because typing data is relatively easy to collect and does not require special permission from the user.
Numerous studies in keystroke dynamics have proposed classifiers built on statistical models [3]- [7], machine learning approaches [8]- [12] and hybrid models [3], [13]- [15]. However, the accuracy with which these models differentiate the typing patterns of genuine users and impostors, and the complexity of handling large amounts of data, remain significant challenges [16].
The purpose of this research is to develop a deep learning model on a keystroke dynamics dataset. The model represents hierarchical learning of non-linear features with the purpose of extracting dependencies between them. As these features may be complicated and challenging to learn with conventional machine learning methods, deep learning can help to learn high-level abstractions from low-level ones. These abstractions can then be separated to find features that improve classification performance.

II. METHODS
In this study, a classification model that differentiates genuine users from impostors is proposed using a deep learning approach. The model is evaluated on an existing keystroke dataset that is available for research. Thus, this study utilizes only secondary data and no primary data.

A. Dataset Selection
This study utilizes the CMU benchmark dataset for keystroke dynamics [5]. The dataset consists of a subject identifier (ID) variable, a session number, a repetition number, and 31 keystroke timing features (H, DD, and UD) collected from 51 users. The users were asked to type a secure password (.tie5Roanl) in eight sessions with 50 typing repetitions per session, which leads to a total of 34 variables and 20400 observations. The timing features are recorded in seconds. The dataset description is given in Table I.

TABLE I (excerpt, variables 26 to 34)
No.  Variable      Description
26   DD.a.n        The duration between pressing 'a' key and pressing 'n' key.
27   UD.a.n        The duration between releasing 'a' key and pressing 'n' key.
28   H.n           The duration between pressing and releasing 'n' key.
29   DD.n.l        The duration between pressing 'n' key and pressing 'l' key.
30   UD.n.l        The duration between releasing 'n' key and pressing 'l' key.
31   H.l           The duration between pressing and releasing 'l' key.
32   DD.l.return   The duration between pressing 'l' key and pressing 'return' key.
33   UD.l.return   The duration between releasing 'l' key and pressing 'return' key.
34   H.return      The duration between pressing and releasing 'return' key.
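The column layout described above can be checked with a short sketch. It is an illustration only: the lowercase key labels follow the spelling used in this paper, and the three identifier column names (`subject`, `sessionIndex`, `rep`) are assumptions rather than names confirmed by the text.

```python
# Keys of the password ".tie5Roanl" followed by Return, using the lowercase
# key labels that appear in this paper (an assumption about exact spelling).
KEYS = ["period", "t", "i", "e", "five", "shift.r", "o", "a", "n", "l", "return"]


def timing_feature_names(keys):
    """Return H/DD/UD feature names in the column order implied by Table I."""
    names = [f"H.{keys[0]}"]
    for first, second in zip(keys, keys[1:]):
        names.append(f"DD.{first}.{second}")  # press of first -> press of second
        names.append(f"UD.{first}.{second}")  # release of first -> press of second
        names.append(f"H.{second}")           # hold time of the second key
    return names


FEATURES = timing_feature_names(KEYS)
ID_COLUMNS = ["subject", "sessionIndex", "rep"]  # hypothetical column names
COLUMNS = ID_COLUMNS + FEATURES

N_SUBJECTS, N_SESSIONS, N_REPS = 51, 8, 50
N_OBSERVATIONS = N_SUBJECTS * N_SESSIONS * N_REPS

print(len(FEATURES), len(COLUMNS), N_OBSERVATIONS)  # 31 34 20400
```

Eleven keys give 11 hold times plus 10 DD and 10 UD intervals, which accounts for the 31 timing features.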

B. Data Pre-processing
The CMU dataset does not have any missing values, but some outliers can be found in several timing features. These outliers might occur because each participant has a different style and efficiency of typing on a keyboard. For instance, a participant whose job or experience involves typing should be able to type more quickly than one without such experience. Unfortunately, CMU does not provide information on the typing efficiency of the participants. The subset used for model development consists of 34 columns and 6000 rows, that is, fifteen users with 400 observations each. The users are selected according to three criteria: (i) the five users whose data have the most outliers; (ii) the five users whose data have the fewest outliers; and (iii) the five users whose data have the median number of outliers.
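The selection criteria can be sketched as follows. The text does not state how outliers were detected, so Tukey's 1.5 x IQR rule is assumed here purely for illustration, and the function names are hypothetical.

```python
from statistics import quantiles


def count_outliers(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


def select_subjects(outlier_counts, group_size=5):
    """Pick the subjects with the fewest, the median, and the most outliers.

    outlier_counts maps a subject ID to its total outlier count.
    """
    ranked = sorted(outlier_counts, key=outlier_counts.get)
    if len(ranked) < 3 * group_size:
        raise ValueError("not enough subjects for three disjoint groups")
    mid_start = (len(ranked) - group_size) // 2
    return {
        "least": ranked[:group_size],
        "median": ranked[mid_start:mid_start + group_size],
        "most": ranked[-group_size:],
    }


# Hypothetical outlier counts for 51 subjects, not taken from the dataset.
counts = {f"s{i:03d}": i for i in range(51)}
groups = select_subjects(counts)
n_rows = sum(len(g) for g in groups.values()) * 400  # 400 observations per user
```

With fifteen selected users and 400 observations each, `n_rows` reproduces the 6000 rows mentioned above.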

C. Deep Learning
The primary purpose of deep learning is to automate the process of finding high-level representations from low-level features [17]. Deep learning offers several benefits: it allows features to be selected and learned throughout a deep architecture, and it supports multi-task learning, in which multiple tasks in the learning process can re-use features and functions. This is possible because of its multi-level structure and the sparsity of the architecture, which increases representation efficiency by utilizing only up to 4% of the neurons [18]. Thus, deep learning can optimize the parameters used in a study to improve its representation.
The Multilayer Perceptron, also known as a multilayer feed-forward neural network, is a network model in which each neuron in a layer is connected to neurons in the next layer without cycling back to a previous layer [19]. It consists of an input layer, one or more hidden layers, and an output layer. The input layer consists of neurons that receive the input values (either numerical or binary) from a training tuple. Each input is multiplied by an assigned weight and passed on to the next layer, called a hidden layer. The hidden layer receives the values from the input layer, performs the mathematical calculation, and generates a temporary output for each training tuple that has entered the network. Next, these outputs are sent to the output layer, where the predicted value for each training tuple is assigned according to the type of activation function used. Reference [20] suggested the use of non-linear activation functions in deep learning because a composition of consecutive linear transformations is itself a single linear transformation (a product of matrices), so without a non-linearity additional layers would add no representational power.
In this study, the multilayer perceptron model is built with one input layer, two hidden layers, and an output layer. For the input layer, the number of neurons is set to thirty-one units, corresponding to the number of input features in the dataset. For the hidden layers, the number of neurons is set to twenty-three units. This number was selected by trial and error in finding the optimal accuracy for the classifier. It also agrees with the rule of thumb explained in [21], which is to choose a number of hidden units between the number of units in the input layer and the number in the output layer; here, the midpoint between thirty-one and fifteen is selected. For the output layer, the number of neurons is set to fifteen units because the classifier returns genuine user or impostor information for fifteen users.
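The layer sizes above can be worked out in a few lines. The parameter count assumes fully connected layers with one bias per unit, which the text does not state explicitly.

```python
N_INPUT = 31    # one unit per timing feature
N_OUTPUT = 15   # one unit per selected user

# Midpoint between the input and output sizes, as described in the text.
n_hidden = (N_INPUT + N_OUTPUT) // 2

layer_sizes = [N_INPUT, n_hidden, n_hidden, N_OUTPUT]

# Assumption: each dense layer has (inputs * outputs) weights plus one bias
# per output unit.
n_parameters = sum(
    fan_in * fan_out + fan_out
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
)

print(layer_sizes, n_parameters)  # [31, 23, 23, 15] 1648
```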

D. Evaluation
Evaluation is the process of generating new knowledge through unique patterns identification. In this step, the output produced by the model is interpreted and transformed into knowledge. One possible way to interpret the result is by using statistical inference [5]. It enables the researcher to understand whether the output has a significant effect on the study or not.

III. RESULTS
The keystroke dynamics model was implemented using the Waikato Environment for Knowledge Analysis (WEKA), which supports machine learning and deep learning models [22].
The dataset does not have any missing values, but some outliers can be found in several timing features. These outliers might occur because each participant has a different style and efficiency of typing on a keyboard. For instance, a participant whose job or experience involves typing should be able to type more quickly than one without such experience. Unfortunately, no description of the typing efficiency of the participants is provided.
Several interesting observations are highlighted in Table II. First of all, DD.five.shift.r and UD.a.n have the highest and the lowest mean among all features, respectively. This suggests that most of the participants have difficulty typing a number and an uppercase letter consecutively. The inference is supported by the median and mode of DD.five.shift.r, which are the highest among all features. The lowest mean could result from the position of the keys on the keyboard, which makes it easy to type 'a' and 'n' consecutively when typing with both hands. This is supported by the fact that the feature also has the lowest mode. Next, all the features have considerably low standard errors, which means the values tend to be close to the mean of the dataset. Furthermore, some features have negative median and minimum values, which indicate overlapping keystrokes: the next key was pressed before the previous key was released. Finally, the maximum values of DD.i.e and UD.i.e are significantly higher than those of the other features. This might occur because the user was idle (taking a break) during the typing task.
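The negative values follow from the feature definitions: the up-down time between two keys equals the down-down time minus the hold time of the first key, so it becomes negative when the second key is pressed before the first is released. A small sketch with made-up timings (not taken from the dataset) illustrates this together with the summary statistics discussed above.

```python
from math import sqrt
from statistics import mean, median, stdev


def up_down_time(down_down, hold_first):
    """UD between two keys: release of the first key to press of the second."""
    return down_down - hold_first


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical timings in seconds, not taken from the dataset.
dd = [0.10, 0.12, 0.09, 0.15]     # press 'a' -> press 'n'
hold = [0.11, 0.10, 0.12, 0.09]   # hold time of 'a'
ud = [up_down_time(d, h) for d, h in zip(dd, hold)]

overlaps = [u < 0 for u in ud]    # True where the two keystrokes overlap
summary = (mean(ud), median(ud), standard_error(ud))
print(overlaps)  # [True, False, True, False]
```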
The performance of a classifier needs to be evaluated to understand how well it performs on future unseen data. As a biometric user authentication technique, keystroke dynamics requires high accuracy in classifying genuine users and impostors. To evaluate the performance of the deep learning model implemented in this study, several evaluation criteria are used: accuracy, kappa statistic, RMSE, precision, recall, F-measure, MCC, ROC, PRC and the confusion matrix. These evaluation metrics are elaborated below to give a better understanding of the performance of the classifier.
The results of the classification performed by Dl4jMlpClassifier are illustrated in the figures and tables below. Fig. 1 shows the summary output for the classifier. Table II shows the detailed accuracy for each class. The overall accuracy of the classifier is 91.67%, as it correctly identifies 5500 out of 6000 instances. This means the classifier is correct in roughly nine out of ten classification tasks (identifying either a genuine user or an impostor). The kappa statistic measures the level of agreement between two observers, here the predicted and the actual class labels, beyond what is expected by chance. The kappa statistic of the classifier is 0.9107, which indicates an almost perfect positive agreement. This suggests that the deep learning classifier is well suited to keystroke dynamics studies. EER, a measure of classification accuracy, is discussed in Table III along with FAR and FRR.
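The reported accuracy and kappa can be related with a short calculation. The sketch below assumes fifteen balanced classes with roughly balanced predictions, so that chance agreement is about 1/15; the full confusion matrix is not reproduced in the text, and the function names are illustrative.

```python
def kappa_from_agreement(observed, expected):
    """Kappa from the observed and chance agreement rates."""
    return (observed - expected) / (1 - expected)


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return kappa_from_agreement(observed, expected)


accuracy = 5500 / 6000
# Assumption: 15 balanced classes and roughly balanced predictions,
# so chance agreement is about 1/15.
kappa = kappa_from_agreement(accuracy, 1 / 15)
print(round(accuracy, 4), round(kappa, 4))  # 0.9167 0.9107
```

Under that assumption the calculation reproduces the reported kappa of 0.9107.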
The MAE value for the classifier is 0.0142. This indicates that the classifier has an average absolute error of approximately 0.0142 in identifying genuine users and impostors for the fifteen users. The RMSE value for the classifier is 0.0911. This is the square root of the mean squared error of the classifier over the same predictions.
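For a multi-class classifier, one common way to obtain such values is to compare the predicted class probabilities with one-hot targets and average over all classes and instances. The sketch below assumes that convention; it is not confirmed by the text as the exact computation used by the toolkit, and the example predictions are made up.

```python
from math import sqrt


def probability_errors(probabilities, labels):
    """Return (MAE, RMSE) between predicted class probabilities and one-hot targets."""
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for probs, label in zip(probabilities, labels):
        for k, p in enumerate(probs):
            target = 1.0 if k == label else 0.0
            abs_sum += abs(p - target)
            sq_sum += (p - target) ** 2
            count += 1
    return abs_sum / count, sqrt(sq_sum / count)


# Hypothetical predictions for three classes, not taken from the study.
probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
labels = [0, 1]
mae, rmse = probability_errors(probs, labels)
```

The RMSE is never smaller than the MAE on the same errors, which is consistent with the reported pair of 0.0911 and 0.0142.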
To simplify the classification problem for keystroke dynamics, a more straightforward representation of the users (subject IDs) is used. The user with subject ID 's002' is represented as 'a', the user with subject ID 's003' is represented as 'b', and so on.
Learning and Computing, Vol. 10, No. 1, January 2020
The results show that the classifier has an 8.3% rate of falsely identifying an impostor as a genuine user (FAR) and a 0.6% rate of falsely identifying a genuine user as an impostor (FRR). After obtaining the values of FAR and FRR, the Equal Error Rate (EER) can be approximated by the formula EER = (FAR + FRR) / 2, which gives 0.0445. Strictly, the EER is the error rate at the decision threshold where FAR equals FRR; the average is used here as an approximation. This value is close to the result given by the classifier as illustrated in Fig. 1, which is 0.04.
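The arithmetic above, and the stricter threshold-based definition of EER, can be sketched as follows. The score lists and the convention that a higher score means "more likely genuine" are assumptions made for the illustration.

```python
def approximate_eer(far, frr):
    """Average of FAR and FRR, the approximation used in the text."""
    return (far + frr) / 2


def far_frr(genuine_scores, impostor_scores, threshold):
    """Accept when score >= threshold (higher score = more genuine, assumed)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def eer_by_threshold_sweep(genuine_scores, impostor_scores):
    """Scan observed scores as thresholds; return mean(FAR, FRR) where they are closest."""
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = far_frr(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


eer = approximate_eer(far=0.083, frr=0.006)
print(round(eer, 4))  # 0.0445

# Hypothetical scores, not taken from the study.
swept = eer_by_threshold_sweep([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```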
Next, the classifier achieves an average precision, recall, and F-measure of 0.917. The precision indicates that, on average, 91.7% of the instances assigned to a user actually belong to that user, and the recall indicates that the classifier recognizes 91.7% of the instances of each user in the dataset. The F-measure is the harmonic mean of the precision and recall of the classifier, and thus also reaches 91.7%.
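These three metrics can be derived from a confusion matrix as sketched below. The support-weighted average is an assumption about how the class-level values were combined, and the 2 x 2 matrix is a made-up example rather than the study's result.

```python
def per_class_metrics(confusion):
    """Precision, recall and F-measure per class (rows: actual, columns: predicted)."""
    col_totals = [sum(col) for col in zip(*confusion)]
    metrics = []
    for k, row in enumerate(confusion):
        tp = row[k]
        precision = tp / col_totals[k] if col_totals[k] else 0.0
        recall = tp / sum(row) if sum(row) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f_measure))
    return metrics


def weighted_average(metrics, confusion):
    """Average each metric over classes, weighted by the number of actual instances."""
    supports = [sum(row) for row in confusion]
    total = sum(supports)
    return tuple(
        sum(m[j] * s for m, s in zip(metrics, supports)) / total
        for j in range(3)
    )


# Hypothetical confusion matrix, not taken from the study.
confusion = [[45, 5], [10, 40]]
metrics = per_class_metrics(confusion)
avg_precision, avg_recall, avg_f = weighted_average(metrics, confusion)
```

When precision and recall are equal, as in the reported 0.917, their harmonic mean takes the same value.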

IV. CONCLUSION
Advances in keystroke dynamics have produced multiple classifiers, based on statistical and machine learning methods, to distinguish genuine users from impostors. However, the maximum rate of accuracy has not been achieved. This study proposes a model for keystroke dynamics using a deep learning method. The study is important because it can potentially increase user awareness and understanding of biometric authentication, tackle security issues in access control and data privacy, and provide a better authentication measure than SFA. The scope of this study is limited to the implementation of a deep learning model with one dataset and does not cover external factors affecting keystroke dynamics performance.
The network model used is a multilayer perceptron with two hidden layers. The stochastic gradient descent algorithm is used as the optimization technique, as it can minimize the error and the cost of a function. An acceleration technique for gradient descent called momentum is used to increase the learning speed of the network, and the backpropagation algorithm is used to calculate the error of the function. Weight initialization (Xavier initialization) is used to assign weights by considering the learning effect of the neurons, so as to maintain an equal distribution of activations. Next, two activation functions are used in the network: ReLU for the hidden layers and softmax for the output layer. The model uses multi-class cross-entropy as its loss function.
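The components named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the uniform variant of Xavier initialization is assumed, the learning rate and momentum values are hypothetical, and the input sample is random rather than real timing data.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization for a fan_in x fan_out weight matrix."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Affine layer: output_j = sum_i inputs_i * weights[i][j] + biases_j."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def forward(inputs, layers):
    """ReLU on the hidden layers, softmax on the last layer."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def cross_entropy(probabilities, label):
    """Multi-class cross-entropy for one example with an integer class label."""
    return -math.log(max(probabilities[label], 1e-12))


def momentum_step(weight, velocity, gradient, learning_rate=0.01, momentum=0.9):
    """One SGD-with-momentum update for a single parameter (hypothetical rates)."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


rng = random.Random(0)
sizes = [31, 23, 23, 15]
layers = [(xavier_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]
sample = [rng.uniform(0.0, 0.5) for _ in range(31)]  # hypothetical timings
probs = forward(sample, layers)
loss = cross_entropy(probs, label=0)
```

Backpropagation would supply the `gradient` argument of `momentum_step` for every weight and bias; it is omitted here to keep the sketch short.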
Three evaluation metrics, FAR, FRR, and EER, are selected and prioritized in this study to evaluate the performance of the deep learning classifier. Based on the training result, the classifier achieved 0.083 FAR, 0.006 FRR, and 0.0445 EER in classifying genuine users and impostors based on the data of fifteen users. Other performance metrics can also be used to evaluate the classifier, such as accuracy, kappa statistic, MAE, RMSE, precision, recall, F-measure, MCC, ROC, PRC, and the confusion matrix. The accuracy of the classifier shows that it identifies 91.67% of the instances correctly. The kappa statistic of the classifier shows an almost perfect positive agreement with a coefficient of 0.9107. The MAE and RMSE indicate that the differences between the predicted and actual values amount to errors of 0.0142 and 0.0911, respectively. The MCC statistic of the classifier shows close to perfect prediction for genuine users and impostors with a coefficient of 0.911. The classifier also achieved 91.7% precision, recall, and F-measure in the classification task. The ROC area and PRC area of the classifier, 0.996 and 0.967 respectively, indicate almost excellent discrimination in the correct classification of genuine users and impostors. In a comparison with related works on the same dataset, the deep learning classifier achieves better performance than other classifiers in keystroke dynamics. Last but not least, the classifier is also able to perform considerably well on another dataset.
Keystroke dynamics is an exciting field to explore as one type of biometric authentication measure, although it has lower classification accuracy and a limited number of studies compared to other biometric modalities. Although the field is still open to challenges and improvement, it has the potential to become an active, reliable and low-cost form of biometric user authentication.
The study on deep learning model development for keystroke dynamics has achieved a promising result. However, several limitations could not be addressed within the study. Firstly, the study uses only a single model (multilayer perceptron) for the deep learning implementation. Secondly, the study uses only a single dataset to perform model training. Although these limitations did not affect the achievement of the aim and objectives of the study, better performance could be achieved. Hence, future research can compare more complex deep learning models such as autoencoders, recurrent neural networks, and others for keystroke dynamics. Another direction for future research in the keystroke dynamics field is to build a deep learning model for the mobile platform.