Towards Machine Learning Based Analysis of Quality of User Experience (QoUE)

Industries use various platforms to receive feedback from users of their products. This paper gives an overview of the potential of natural language processing (NLP) for classifying the quality of user experience. User experience was captured using Google Forms. To test the efficacy of the platform, users' sentiments were analysed using hotels.ng as the source of data. Natural language processing of electronic word of mouth (e-WOM) can be applied to any feedback platform to classify and predict customers' sentiments, giving companies an opportunity to capture the quality of users' experiences and improve service delivery. Feature (sentiment) extraction was done using opinion mining and data cleaning tools on heterogeneous data sources to assess the decision-making process of users. Using charts and correlations, the platform showed that the Quality of User Experience (QoUE) of the customers, measured as the average willingness to recommend and the average degree of review helpfulness, was 7.31 and 7.03 respectively. Finally, an improved logistic regression classifier was developed to train, test and classify the user experiences. The improved logistic regression classifier reached a training accuracy of 97.67%, against 86.01% for the standard logistic regression classifier.


I. INTRODUCTION
With the expected launch of 5G and growth beyond 5G (B5G), organizations are expected to experience increasing sophistication in web 2.0 and online democracy. Competition among various online shopping and booking platforms will be driven by the ability to capture, analyse and use customers' experiences to the organization's advantage. Examples of popular platforms where user experiences are captured and sentiments analysed include, but are not limited to, Gmarket, booking.com, TripAdvisor, Facebook and hotels.ng, as well as organizations such as Amazon and Google. The options available to the customer, and the low switching cost they afford, are made possible by current trends in technological advancement and social media [1]-[3].
The primary motive for the sentiment analysis of user experience is to gain insight into how customers feel about and respond to products and services. This work is intended to recommend ways of ensuring that the needs of customers are met, and their expectations exceeded, in the future. This is important given that customer satisfaction is critical to the success of all businesses. Besides quality of experience, other researchers have also used sentiment analysis, otherwise known as emotion analysis, to detect speech, mood or language patterns [2]. In a related work, the authors designed a framework that can detect hate speech on Facebook, motivated by the need to uncover the real intentions of Facebook users [4].
Similarly, machine learning models for analysing and predicting YouTube users' experiences, especially those of smartphone users, have been developed recently [5]. These and many other works demonstrate the rising interest in measuring and monitoring quality of user experience (QoUE) in recent times [6]. However, the challenge lies in capturing an accurate and complete picture of the customer experience, considering that reviews and comments are now posted in real time and that the growing volume of data requires automation and machine learning. In this paper, the effect of online reviews on customers was analyzed using hotels.ng as a case study. The contributions of this paper include:
1) An overview of the evolving role of machine learning in capturing and analyzing quality of user experience, using the hotels.ng website as a case study.
2) A measure of the effect of customers' reviews on new customers, analyzing positive and negative feedback.
3) A review of challenges and open issues in the design and implementation of machine learning based analysis of sentiments derived in varied contexts from online platforms.
The rest of the paper is arranged as follows. Section II gives a brief background to three concepts: A. machine learning, B. quality of user experience and C. electronic word of mouth (e-WOM). Section III focuses on the overall system model. Section IV details the performance evaluation, while Section V concludes the paper.

A. Machine Learning
Machine learning has provided a veritable opportunity to monitor the moods and level of satisfaction of users of products and services in areas such as industry, civil, military and social media [2]. It is therefore not surprising that, in a bid to receive the best services, potential customers visit the review section of these platforms in order to make informed decisions based on the experiences and opinions of past customers regarding various brands, as can be seen in Fig. 1.
Fig. 1. QoUE sample of a typical platform [7].

B. Quality of User Experience
Machine learning is applied to the modeling of quality of experience (QoE), quality of service (QoS) and quality of user experience (QoUE) in networking, for instance, by using a set of indicators to represent the state and perception of users over a network [8]. This enables automatic extraction and prediction of QoE or QoUE, as the case may be. The researchers argue that there is a need to guarantee reliable prediction, a guided learning rate and careful design of such machine learning platforms [8], [9].
As shown in Fig. 1, the features of the reviews used in this research work include:
• Review title: This is the header of the review. It is usually a short phrase or word that the guest uses to summarize their impression of the industrial product.
• Review content: This is the main area where the reviewer expresses an opinion, in detail, about the industrial product.
• User rating: The rating is quantitative data. It takes a value from 1 to 10, with 1 indicating strongly negative and 10 indicating strongly positive.
• User ID: This is the identity of the reviewer. Although reviewers usually have the option of being anonymous, having a User ID lends the review higher validity.
To manage and organize these diverse features better, machine learning tools are now employed in online reviews. Online reviews serve as a credible source of past user perception and quality of experience to guide new or prospective users [10].
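The four review features listed above can be pictured as a single record. The following is a minimal sketch only: the class name `Review`, its field names and the validation rule are illustrative assumptions, not the data structure used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Review:
    """One online review with the four features described in the text."""
    title: str                     # short summary phrase
    content: str                   # detailed opinion
    rating: int                    # 1 (strongly negative) to 10 (strongly positive)
    user_id: Optional[str] = None  # None models an anonymous reviewer (assumption)

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale described in the text.
        if not 1 <= self.rating <= 10:
            raise ValueError(f"rating must be between 1 and 10, got {self.rating}")

    @property
    def is_anonymous(self) -> bool:
        return self.user_id is None


# Hypothetical example record.
example = Review(title="Great stay", content="Clean rooms and friendly staff.", rating=9)
print(example.is_anonymous)
```

Keeping the rating check inside the record means that malformed reviews are rejected before they reach any later analysis step.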
In some cases, reviews are presented in summarized formats to make it easier for the new visitors to quickly access what has been overtly said about the brand. Through these reviews, customers have quick and easy access to an unprecedented amount of user-generated product information which helps them choose the most appropriate product, according to their idiosyncratic preferences based on other customers' experiences [10]. The experiences and opinions of other customers can provide information about the quality and value of each product or service and hence reduce customers' risk in making choices. It also complements other forms of business to customer communication [11].

C. e-WOM
Online review, also called electronic word of mouth (e-WOM), has an impact on customer attitudes and consequently on booking or purchase intentions [12]. The influence of online reviews and their effect on the profitability of companies are repeatedly discussed in [13]. In the hospitality and tourism sector, for example, online reviews influence prospective hotel guests and tourists as they make booking decisions. Studies on the state of social media [14] show that 70% of respondents indicated that they trust online consumer reviews. In [15], the relevance of knowledge of the quality of user experience, as captured in online reviews, was emphasised, as online reviews have been shown to have more impact than traditional referrals.
However, a major concern is the level of correctness of online reviews, since most reviewers tend to take an extremely positive or extremely negative stance based on their experiences [16]. All the same, this does not remove the impact of e-WOM on the future purchasing decisions of users. Hence, prospective users rely more on numerical ratings, which are easy to process [17].

A. Overall Analytics Procedure
Several steps or processes are involved in predictive analysis aimed at producing a model used to predict possible outcomes of a given instance. The processes are shown in Fig. 2. Under modelling and pattern mining, the Mean Opinion Score (MOS) model for over-the-top content is given as Eqn. (1), where x denotes the number of product purchases, t is the time since the last purchase, and g represents the memory parameter, sometimes set to a typical value of 0.14 [18].

B. Predictive Model
International Journal of Machine Learning and Computing, Vol. 10, No. 6, November 2020
Predictive models use several techniques for preparing data, modeling, estimation, validation, scoring data and related mining activities, which leverage data mining, statistics, modeling, machine learning and artificial intelligence to analyze current data and to make predictions about the future. The aim here is to ascertain what has happened, so as to provide the best estimate of what will happen in the future [19]. This study shows how patterns found in historical and transactional data can be translated into algorithms and used to identify risks and opportunities for the future. When these algorithms are tested over time, especially in multiple scenarios, the model makes predictions with new data introduced into the system. Fig. 3 shows the outcome of the predictive model, where the model can make predictions about the future from the information fed in as input, just like an expert system. This paper compared an improved logistic regression classifier with standard logistic regression. The evaluation of the model accuracy was done using the standard Equation (2) adapted from [20]. In Equation (2), ε is the number of true positives while α is the number of false positives. The precision is a measure of the ability of the classifier not to label a sample as positive if it is negative. Another measure is "recall", which is a measure of the classifier's ability to find all the positive samples. Recall is given as Equation (3), where β is the number of false negatives. The third measurement criterion is the "F-beta" score, which lies between 0 (worst case) and 1 (best case). The use of this model helps to anticipate problems such as equipment failure or depleted stock, or to capitalize on opportunities to market products to customers, such as targeting people in happy or dejected moods after a sporting event [21].
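The three evaluation measures described above can be illustrated with a short worked sketch. Equations (2) and (3) are not reproduced in this text, so the code assumes the standard definitions, precision = ε / (ε + α) and recall = ε / (ε + β), together with the usual F-beta formula. The function names and the example counts are illustrative assumptions, not values from the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of predicted positives that are correct: eps / (eps + alpha)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actual positives that were found: eps / (eps + beta)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def f_beta(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall, between 0 and 1."""
    p = precision(true_pos, false_pos)
    r = recall(true_pos, false_neg)
    denominator = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denominator if denominator else 0.0


# Hypothetical counts: 40 true positives, 10 false positives, 5 false negatives.
eps, alpha, beta_count = 40, 10, 5
print(precision(eps, alpha))
print(recall(eps, beta_count))
print(f_beta(eps, alpha, beta_count))
```

With beta = 1 the F-beta score weighs precision and recall equally; a larger beta gives more weight to recall.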

C. Sentiment Analysis
Sentiment analysis is the automated process of classifying online text data as positive, neutral or negative, giving businesses the opportunity to gain deeper insight into customers' assessment of their products, brands or services. Text mining (also referred to as text analytics) is an artificial intelligence (AI) technology that uses natural language processing (NLP) to transform the free (unstructured) text in documents and databases into normalized, structured data suitable for analysis or for driving machine learning (ML) algorithms. Arguably, NLP becomes a machine learning tool once its parameters are automatically learned from data. Sentiment analysis involves the automated detection of users' assessments of products and services as they are posted on online platforms for the purpose of feedback. Such platforms could be company websites, social media platforms like Twitter, or any booking or online shopping platform such as Gmarket. The essence is to identify, analyse and codify the opinions of users based on user experience and satisfaction. With the growth of web 2.0, opinions can be expressed even without text, as images and emojis now dominate social media. Thus, extracting even smileys and symbols can also be a source of feedback to companies and organizations. This development has placed the study of opinion mining and sentiment analysis on the front burner in both academia and industry.
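To make the three-way classification described above concrete, the following is a minimal lexicon-based sketch. It is not the classifier used in this paper: the word lists, the emoji entries and the scoring rule are all illustrative assumptions, chosen only to show how text and emojis can both contribute to a positive, neutral or negative label.

```python
import re

# Tiny hypothetical lexicons; a real system would learn or load far larger ones.
POSITIVE = {"good", "great", "clean", "friendly", "excellent",
            "\U0001F44D"}   # thumbs-up emoji
NEGATIVE = {"bad", "dirty", "rude", "terrible", "poor",
            "\U0001F44E"}   # thumbs-down emoji


def tokenize(text: str) -> list:
    """Split into lowercase words and single symbols (punctuation, emojis)."""
    return re.findall(r"[a-z']+|[^\w\s]", text.lower())


def classify(text: str) -> str:
    """Label text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Clean rooms and friendly staff \U0001F44D"))
print(classify("Rude staff, dirty room."))
print(classify("We stayed two nights."))
```

A learned model, such as the logistic regression classifier evaluated later, replaces the fixed word lists with weights estimated from labelled reviews.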
Manufacturers as well as service companies now rely on the power of machine learning for the aggregation of feedback via sentiment analysis. With this, an unstructured response could be structured into a meaningful format that provides relevant reviews to prospective users. The new format is then readily available as a means of branding, customer service support and feedback for re-work in the case of manufacturing companies who rely on feedback from users of their products to gain a competitive edge.
Experts believe that customers are likely to spend 31% more on businesses with excellent reviews [22]. From the analysis of respondents' viewpoints, about 72% said positive reviews make them trust local businesses, 72% will take action after reading positive reviews and 86% will hesitate to purchase from a business with negative online reviews. Given the conclusions drawn from several studies, online reviews have become an important conversation for the hospitality and tourism industry. This is so because people who require quality comfort and relaxation services can only learn about them through such online platforms.

D. Research Design
In this study, hotels.ng was used as the main platform from which comments were aggregated. Hotels.ng was selected because its data was available to the authors. The platform is like any other booking platform and, as a ready source of online reviews, is very relevant to this research. A questionnaire was developed for primary data collection, and web scraping was used to extract reviews for sentiment classification. The questionnaire's bubble rating is a 10-point scale, and a decision rule was formulated to guide the grading of each rating: 1 = "Extremely Terrible", 2 = "Moderately Terrible", 3 = "Terrible", 4 = "Poor", 5 = "Moderate", 6 = "Good", 7 = "Very Good", 8 = "Super Good", 9 = "Nearly Excellent" and 10 = "Excellent". The sentiment classification was graded into negative-weighted, neutral-weighted and positive-weighted classes. Hypotheses were formulated to examine the relationship between the features of a hotel and their importance to hotel users, and between online reviews and their impact on the brand. The spread of sentiments captured from three thousand respondents is shown in Fig. 4.
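The decision rule above maps directly onto a lookup table. In the sketch below, the ten labels are taken from the text, while the cut-offs used to bucket ratings into negative, neutral and positive classes (4 and below, 5, and 6 and above) are an assumption made for illustration; the paper does not state the thresholds it used.

```python
RATING_LABELS = {
    1: "Extremely Terrible", 2: "Moderately Terrible", 3: "Terrible",
    4: "Poor", 5: "Moderate", 6: "Good", 7: "Very Good",
    8: "Super Good", 9: "Nearly Excellent", 10: "Excellent",
}


def grade(rating: int) -> str:
    """Return the decision-rule label for a rating on the 10-point scale."""
    if rating not in RATING_LABELS:
        raise ValueError(f"rating must be an integer from 1 to 10, got {rating!r}")
    return RATING_LABELS[rating]


def sentiment_class(rating: int, negative_max: int = 4, positive_min: int = 6) -> str:
    """Bucket a rating into a sentiment class (cut-offs are assumed, not sourced)."""
    grade(rating)  # reuse the range check
    if rating <= negative_max:
        return "negative"
    if rating >= positive_min:
        return "positive"
    return "neutral"


for r in (2, 5, 9):
    print(r, grade(r), sentiment_class(r))
```

Keeping the cut-offs as parameters makes it easy to test how sensitive the spread of sentiments is to where the neutral band is drawn.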
The dataset was collected using a web crawling technique to extract data from the hotels.ng website. Web crawling involves running software that browses the internet with the sole aim of mining relevant data or opinions [23]. This paper used a Java library called "Jsoup" to collect user reviews from hotels.ng. One challenge facing the use of web scraping is privacy. To preserve the privacy of their customers, some websites do not permit certain kinds of data mining. Despite this legal obstacle, web scraping will continue to enjoy a wide range of usage because of its role as a data mining tool that is useful for feedback and prediction.
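The extraction step can be sketched as follows. The study used the Java library Jsoup; this sketch instead uses Python's standard `html.parser` module as a stand-in, and it parses an inline HTML string rather than fetching a live page. The markup, the class names (`review`, `review-title`, `review-content`, `rating`) and the sample reviews are hypothetical and do not reflect the actual structure of hotels.ng.

```python
from html.parser import HTMLParser

# Hypothetical CSS class -> review field mapping.
FIELD_CLASSES = {"review-title": "title", "review-content": "content", "rating": "rating"}

SAMPLE_HTML = """
<div class="review">
  <h3 class="review-title">Great stay</h3>
  <p class="review-content">Clean rooms and friendly staff.</p>
  <span class="rating">9</span>
</div>
<div class="review">
  <h3 class="review-title">Disappointing</h3>
  <p class="review-content">No hot water for two days.</p>
  <span class="rating">3</span>
</div>
"""


class ReviewParser(HTMLParser):
    """Collects one dict per <div class="review"> block (flat markup only)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._current = None  # review currently being built
        self._field = None    # field whose text is being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "review" in classes:
            self._current = {}
        elif self._current is not None:
            for cls in classes:
                if cls in FIELD_CLASSES:
                    self._field = FIELD_CLASSES[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and self._field and text:
            self._current[self._field] = self._current.get(self._field, "") + text

    def handle_endtag(self, tag):
        if self._field:
            self._field = None  # the field element has closed
        elif tag == "div" and self._current is not None:
            self.reviews.append(self._current)
            self._current = None


def parse_reviews(html: str) -> list:
    parser = ReviewParser()
    parser.feed(html)
    for review in parser.reviews:
        if "rating" in review:
            review["rating"] = int(review["rating"])
    return parser.reviews


for review in parse_reviews(SAMPLE_HTML):
    print(review)
```

The parser assumes that field elements contain plain text with no nested tags. Before collecting real pages, the site's terms of use and robots.txt should be checked, in line with the privacy concerns noted above.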
A self-constructed questionnaire created with Google Forms was used to gather data about how users make decisions on a product and how online reviews influence such decisions. The form was distributed by posting its link on WhatsApp between September 2018 and November 2018. The questionnaire was made up of seven (7) questions: two (2) linear-scale questions, two (2) multiple-choice questions and three (3) open questions for selection. A Google Form can be created by signing up for a Google account, after which the email address can be used to access the form. The structure of the form and the questions it contains are determined by the creator. After the form is created, its URL can be sent out to the target respondents. The responses are then saved in the drive of the email address used to create the form. These responses can also be exported to a spreadsheet. In this paper, the spreadsheet was converted to a Comma Separated Values (CSV) file to reduce the training time associated with the use of Excel. The study population for the online reviews consists of the community of users of the hotels.ng platform, while the respondents to the questionnaire consist of a wide range of people who were available online at the time it was posted on social media. The population spans all age ranges and both sexes within the community of hotels.ng users and internet users.
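Once the responses are exported to CSV, the average scores for the two linear-scale questions can be computed with a few lines of code. The sketch below is illustrative only: the column names `willingness_to_recommend` and `review_helpfulness` and the three sample rows are assumptions, not the study's actual export or data.

```python
import csv
import io
from statistics import mean

# Hypothetical export with the two linear-scale questions as columns.
SAMPLE_CSV = """willingness_to_recommend,review_helpfulness
8,7
6,9
10,5
"""


def column_means(csv_text: str, columns: list) -> dict:
    """Return the arithmetic mean of each named numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: mean(float(row[name]) for row in rows) for name in columns}


averages = column_means(SAMPLE_CSV, ["willingness_to_recommend", "review_helpfulness"])
print(averages)
```

For a file on disk, the same function can be fed the text returned by `open(path, newline="").read()`.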

A. Performance Evaluation
Upon implementation of the predictive analytics tool, the test run produced the following results, captured from 3000 persons who were asked to give their opinions using Google Forms as the means of response collection. Fig. 6 shows the willingness of customers to recommend a product after usage. Such recommendations can influence other prospective users as they search online. In Fig. 6 and Fig. 7, the average willingness to recommend a product and the average degree of review helpfulness are 7.31 and 7.03 respectively. These figures, on a scale of 0 to 10, support the premise of this paper, which states that the sentiments or opinions of customers influence future patronage of industrial products and services. To classify the QoUE, an improved logistic regression model was developed, and its performance was compared with that of conventional logistic regression. Fig. 8 shows the confusion matrix, while Fig. 9 shows the classification into satisfied and dissatisfied users.
1) Accuracy Score of Model: This paper compared the improved logistic regression model with conventional logistic regression. This was done using Python 3.7 and Jupyter Notebook, an open-source web application for creating and sharing documents that contain live code, equations, visualizations and narrative text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization and machine learning. Table I and Table II show that the improved logistic regression model performed better than the standard logistic regression model during both training and testing. The training accuracy of the improved logistic regression was 97.67%, against 86.01% for the standard logistic regression. Similarly, the test accuracy of the improved logistic regression model was 100%, against 94.74% for the standard logistic regression.
In line with conventional practice in most machine learning projects, 70% of the dataset was used for training, while 30% was used for testing [20]. The improved logistic regression classifier showed good performance, judging by Table III as well as Fig. 10, where its loss is minimal as the number of epochs increases, compared to the conventional logistic regression classifier.
2) Classification Report: The computed precision, recall and F-beta scores are shown in Table III.
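The 70/30 evaluation procedure described above can be sketched with a plain logistic regression classifier trained by batch gradient descent. This is not the paper's improved classifier, whose modifications are not described in this text. The synthetic ratings, the labelling rule (a rating of 6 or more counts as "satisfied"), the learning rate and the epoch count are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def split_70_30(rows, train_fraction=0.7):
    """Split rows in order. Real data should be shuffled first; the
    shuffle is omitted here so that the sketch stays deterministic."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit(rows, epochs=200, learning_rate=0.1):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in rows:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, features) -> int:
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z >= 0 else 0  # sigmoid(z) >= 0.5 exactly when z >= 0


def accuracy(weights, bias, rows) -> float:
    return sum(predict(weights, bias, f) == y for f, y in rows) / len(rows)


# Synthetic data (assumption): ratings cycle through 1..10, scaled to [-1, 1];
# a rating of 6 or more is labelled 1 ("satisfied"), otherwise 0.
ratings = [1 + i % 10 for i in range(100)]
rows = [([(r - 5.5) / 4.5], 1 if r >= 6 else 0) for r in ratings]

train_rows, test_rows = split_70_30(rows)
weights, bias = fit(train_rows)
train_accuracy = accuracy(weights, bias, train_rows)
test_accuracy = accuracy(weights, bias, test_rows)
print(f"train accuracy = {train_accuracy:.2%}, test accuracy = {test_accuracy:.2%}")
```

Reporting accuracy on both partitions, as the paper does, shows whether a model that fits the training data also generalises to reviews it has not seen.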

B. Open Research Issues
As interesting as the use of machine learning and artificial intelligence may be in capturing QoUE, four research issues are worthy of note: extreme reviews by users; data set access and dynamic updates by some review platforms; legal and ethical issues of web crawling; and Bidirectional Encoder Representations from Transformers (BERT).
1) Extreme review of users: Machine learning models are only as good as the quality of the data provided. This has raised concern over the reliability of reviews provided by users. Since some reviewers tend to review at the extremes, their reviews may not be reliable or free of sentiment and bias. There is, therefore, a need to develop ways to normalise the effect of outliers on prediction. This problem was a challenge to the authors of [24] in their design of a recommender system that serves as a source of product reviews for potential users. To solve it, they relied on the users' ratings rather than on written reviews alone. More research on approaches to identify extreme reviews and isolate them before translating them to ratings is expected.
2) Data set access and dynamic update by some review platforms: As evident in this work, the source of data is an online platform, and one feature common to most online review platforms is the continuous update of their websites. In fact, Facebook now filters some words posted by users using various rules and algorithms. This restricts access to the data and limits its re-usability. However, the challenge of limited data can be addressed by the use of hybrid deep learning frameworks, as suggested by [25]. Researchers expect that hybrid model frameworks will dominate future machine learning designs. There may also be comparisons of various hybrid models for specific target systems.
3) Legal and ethical issues of web crawling: The practice of web crawling involves using software or other means to extract information from the internet, in most cases without the official permission of the website or platform owners. This has generated many legal and ethical concerns, with various people holding divergent opinions. Web crawling, also known as web harvesting, can be seen as a form of data mining, but the legal issues remain a research direction for both industry and academia. Despite these concerns, web crawling is gaining relevance and usage. This calls for more research on how to establish best practices that reach a tradeoff between its importance and its potential threat.

4) Bidirectional Encoder Representations from Transformers (BERT): Recent works employ Google's BERT for predicting experiences and usage in clinical dialogues [26], [27]. Google's BERT is a state-of-the-art model that is widely used by researchers in the field of natural language processing. The open issues for researchers centre on its effective usage and adaptation to various scenarios. For instance, a recent work reported that the researchers faced a challenge in how to use BERT [26]. They compared the performance of BERT to a rule-based system and found that BERT did not perform as well as its inventors claimed. However, they acknowledged that their findings may be due to an inability to test all possible ways of using BERT [26]. More BERT-based systems can therefore be expected to emerge as a research direction. Similarly, another study trained its own BERT model and compared its performance with that of Google's pre-trained models [27]. The results showed that this model outperformed Google's BERT in predicting personal experience, using medication-related tweets as a case study [27].

V. CONCLUSION
This paper presents an overview of machine learning, particularly natural language processing tools, for the analysis of user satisfaction measured by QoUE metrics. The results showed that the sentiments extracted using the web scraping technique corroborate the opinions of respondents in the questionnaire. This shows the potential and capability of machine learning in analysing and predicting from large databases of sentiments or reviews. This approach can be applied to industrial scenarios where quality control units can use QoUE to appraise users' perception of their products and services. Finally, an improved logistic regression classifier was developed to train, test and classify the user experiences. The improved logistic regression classifier reached a training accuracy of 97.67%, against 86.01% for the standard logistic regression classifier.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Cosmas Ifeanyi Nwakanma proposed the paper idea, collected the data, wrote a substantial part of the paper, and produced the MATLAB analysis and the Python code for the logistic regression. Md Sajjad Hossain assisted in improving the quality of the paper and helped to complete the writing. Jae-Min Lee and Dong-Seong Kim provided the supervision leading to the comprehensive enhancement of the paper after each review stage. Dong-Seong Kim is the Professor in charge of the Networked System Laboratory and is responsible for the research funding. All authors approved the final version.

ACKNOWLEDGMENT
The authors are grateful to Mr Ugochukwu Ejike Akpudo of the Defence Reliability Laboratory, Kumoh National Institute of Technology, Gumi, South Korea, for assisting with the Python code and for technical support towards the revision of this paper. The authors are also grateful to Dr. Uchenna David Uwakwe, Directorate of General Studies, Federal University of Technology Owerri, Nigeria, for the English correction service.