A General Chinese Chatbot Based on Deep Learning and Its Application for Children with ASD

Commercial chatbots such as Apple’s Siri, Microsoft’s XiaoIce, Amazon’s Alexa, Jingdong’s JIMI, and Alibaba’s Alime show great promise in applications such as hosting programs, writing poetry, providing pre-sale consulting and after-sales service in e-commerce, and providing virtual shopping guidance. However, most existing chatbots are neither designed specifically for children nor suitable for them, especially for children with autism spectrum disorder (ASD). To develop a chatbot suitable for children with ASD, the present study first adopted an open-source chatting corpus containing more than 1.7 million Chinese question-and-answer sentences from chat histories, many of which involve children, and screened out more than 400,000 ideal chatting sentences for model training. A generative-based method combining Bi-LSTM and an attention mechanism with word embedding based on a deep neural network was then adopted to build a general Chinese chatbot. The quality evaluation results indicate that our chatbot successfully aroused participants’ interest and was easy for them to understand. The chatbot also showed great potential for use in conversation-mediated intervention for Chinese children with ASD.


I. INTRODUCTION
Autism is a developmental disorder, and 1 in 160 children worldwide has an ASD [1]. Common symptoms of children with ASD are deficits in social and language ability, as well as repetitive behaviors and narrow interests [2]. Fortunately, research indicates that these abilities can be significantly improved by early intervention [3].
In recent years, researchers have shown increasing interest in using computer-assisted technology (CAT) and rehabilitation robots as interventions for children with ASD. Computer-assisted technology has long been recognized as an effective treatment for children with ASD [4]. A number of studies [5]-[7] report that autistic children normally feel comfortable in an environment that is predictable and avoids direct contact with other people. Therefore, children with ASD take delight in interacting with computers. Computer-assisted therapy for children with ASD covers a wide range of applications. Previous interventions mainly focused on using computers or mobile devices for emotion recognition training [8], [9], language learning [10], and social behavior education [11]. Robot-assisted treatment for children with ASD has also achieved considerable success and shown unique value in clinical practice. Existing rehabilitation robot-based interventions generally concentrate on improving the social skills of autistic children through joint attention [12], [13], imitation learning, and turn taking [13]. However, none of these interventions uses chatbots, which can simulate real-world communication, to enhance the social and communication skills of children with ASD.
Since children with ASD have linguistic difficulties [14] and are commonly reluctant to communicate with others [15], early intervention mainly focuses on language teaching. However, researchers and therapists should bear in mind that the final goal of intervention is to help autistic children return to the real world and apply there the skills they learned from interventions. Therefore, involving chatbots that can simulate human-to-human communication would be a valuable and important step toward this goal.
It is therefore necessary to apply chatbots in CAT or robot-based intervention. However, existing chatbots are not adapted to the language comprehension of autistic children. The present study therefore designs a general Chinese chatbot based on deep learning, with some adjustments made according to autistic children's language habits and comprehension abilities. For example, autistic children with language barriers generally have pragmatic impairments [16]-[18], such as a lack of response or difficulty in sustaining a conversational topic. Experts consider that these pragmatic impairments may be related to difficulty in language comprehension [19]. Therefore, in the design of the chatbot, we keep response sentences as short and simple as possible to make them easier for autistic children to understand. In addition, we pre-train the word embedding, which significantly improves the imagination of our chatbot and allows it to generate appropriate responses even to single input words. Considering the ages of the children, our chatbot uses a relatively clean corpus that does not contain sexual or violent language, which also gives our chatbot an optimistic personality. To the best of our knowledge, our chatbot is the only general Chinese chatbot designed to meet the special needs of children with ASD.
The rest of this paper is arranged as follows. First, we briefly review the history and recent trends of chatbots. Then, in the methodology, we introduce the construction process of our chatbot, after which several experiments are conducted to evaluate its quality. The results suggest that our chatbot is able to simulate human-to-human interaction and that adult users are satisfied with the experience of interacting with it. Therefore, the present chatbot shows great potential for application in clinical practice for the treatment of children with ASD. However, the actual effects of our chatbot still need to be tested among children with ASD.

II. BACKGROUND
The world's first chatbot, ELIZA, was developed by Joseph Weizenbaum in 1966 [20] and was used to imitate psychologists in clinical treatment. Although ELIZA only applied keyword matching and manual response rules, Weizenbaum himself was surprised at ELIZA's performance at that time. Later, other chatbots appeared one after another. In 1988, Robert Wilensky and others developed a chatbot system called UC (UNIX Consultant) [21], which helps users learn how to use the UNIX operating system. Subsequently, in 1995, Dr. Richard S. Wallace developed the ALICE system, which is one of the best-performing chatbots based on template matching. In 2014, Microsoft launched a companion robot, XiaoIce [22], an 18-year-old AI girl who can host programs, write poetry, sing, and compose songs. In the same year, Jingdong released its self-developed e-commerce robot JIMI [23], which can provide full-time, unlimited services. Its functions cover the whole e-commerce process, including pre-sale consulting and after-sales service. In 2015, Alibaba Group released a virtual shopping assistant named Alime [24], which allows customers to enjoy a one-to-one, full-time shopping experience. A growing number of commercial applications reflects the broad application prospects of chatbots.
Chatbots have made great progress with the rise of artificial intelligence. Although chatbots based on rules and retrieval [25], [26] have been studied for a long time, their pre-set rules and databases cannot answer questions that never appeared during training or development. To solve this problem, the generative-based method [27] has a clear technical advantage: the program is expected to answer all questions through automatically generated replies. In addition, compared with traditional methods, end-to-end data-driven dialogue generation [28] eliminates much of the feature extraction and many complex intermediate processing steps, such as parsing and semantic analysis, that are unavoidable in traditional natural language processing. Therefore, the generative-based method greatly improves the efficiency of system development and is easier to extend. Based on these advantages, the present study adopted a generative-based method to build a chatbot.

1) The construction of the data base
The present chatbot uses an open-source chatting corpus [29]. The original corpus contains more than 1.7 million question-and-answer sentences from chat histories involving children. After data cleaning and processing, which included removing duplicate and invalid sentences and unifying Chinese punctuation symbols, we screened out more than 400,000 ideal chatting sentences for model training.
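The cleaning steps named above (removing duplicates, removing invalid sentences, unifying punctuation) can be illustrated with a minimal Python sketch. The concrete rules here are assumptions, not the authors' actual pipeline: ASCII punctuation is mapped to full-width Chinese punctuation, repeated marks are collapsed, and a pair counts as invalid when either side is empty, contains no Chinese character, or exceeds a hypothetical length limit `max_len`.

```python
import re

# Assumed normalization: map ASCII punctuation to its full-width Chinese form.
PUNCT_MAP = str.maketrans({",": "，", "?": "？", "!": "！", ":": "：", ";": "；"})
# Collapse runs of the same punctuation mark, e.g. "？？？" -> "？".
REPEATED_PUNCT = re.compile(r"([，。？！：；])\1+")
HAS_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def normalize(sentence: str) -> str:
    """Trim whitespace and unify punctuation in one sentence."""
    sentence = sentence.strip().translate(PUNCT_MAP)
    return REPEATED_PUNCT.sub(r"\1", sentence)


def clean_pairs(pairs, max_len=30):
    """Return de-duplicated, normalized (question, answer) pairs.

    A pair is treated as invalid (an assumption) when either side is empty,
    contains no Chinese character, or is longer than max_len characters.
    """
    seen = set()
    cleaned = []
    for question, answer in pairs:
        q, a = normalize(question), normalize(answer)
        if not all(HAS_CHINESE.search(s) and len(s) <= max_len for s in (q, a)):
            continue
        if (q, a) in seen:
            continue
        seen.add((q, a))
        cleaned.append((q, a))
    return cleaned


raw = [
    ("你好吗?", "我很好!!"),
    ("你好吗？", "我很好！"),  # duplicate once punctuation is unified
    ("", "没有问题"),          # empty question, dropped
    ("hello", "你好"),         # no Chinese character in the question, dropped
]
cleaned = clean_pairs(raw)
```

Normalizing before de-duplicating matters in this sketch: the first two pairs differ only in punctuation and would otherwise both survive.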

2) An overview of the chatbot system
The chatbot system uses a Bi-LSTM to encode sentences. On this basis, an attention mechanism is introduced to improve generation quality. In addition, word embedding based on a deep neural network is used to improve the sentence representation with knowledge learned from large data sets. This combination is convenient and fast to build, and the evaluation experiments indicate that our chatbot performs well.
a) Recurrent neural network unit
In the encoder, we use bidirectional LSTM units, which encode sentences better than unidirectional LSTM units. The RNN unit used in the decoder is an LSTM unit. An LSTM controls its input, memory, and output values by introducing an update gate, a forget gate, and an output gate. The structure of the LSTM cell is shown in Fig. 1.
The cell state c <t> at time t is calculated from the cell state at the previous time step, c <t-1>, and the candidate cell value at the current time step, as shown in formula (4). The candidate cell value in formula (4) is calculated by formula (5).
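As an illustration of the gate and state updates described in this subsection, the following NumPy sketch implements one forward step of a standard LSTM cell. It is a sketch of the textbook formulation, not necessarily the authors' exact implementation; the parameter names (`Wu`, `Wf`, `Wo`, `Wc` and the matching biases) and the helper `init_params` are hypothetical, while the sizes 300 and 256 follow the embedding and hidden dimensions reported in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, a_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    x_t: input at time t, shape (n_x,)
    a_prev, c_prev: previous hidden and cell state, shape (n_a,)
    params: weights W* of shape (n_a, n_a + n_x) and biases b* of shape (n_a,)
    """
    concat = np.concatenate([a_prev, x_t])
    gate_u = sigmoid(params["Wu"] @ concat + params["bu"])   # update gate
    gate_f = sigmoid(params["Wf"] @ concat + params["bf"])   # forget gate
    gate_o = sigmoid(params["Wo"] @ concat + params["bo"])   # output gate
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])  # candidate cell value
    c_t = gate_u * c_tilde + gate_f * c_prev                 # new cell state
    a_t = gate_o * np.tanh(c_t)                              # new hidden state
    return a_t, c_t


def init_params(n_x, n_a, seed=0):
    """Small random weights and zero biases for the four transformations."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("u", "f", "o", "c"):
        params["W" + name] = rng.normal(scale=0.1, size=(n_a, n_a + n_x))
        params["b" + name] = np.zeros(n_a)
    return params


n_x, n_a = 300, 256  # embedding and hidden sizes reported in the paper
params = init_params(n_x, n_a)
a_t, c_t = lstm_step(np.ones(n_x), np.zeros(n_a), np.zeros(n_a), params)
```

A bidirectional encoder would run one such cell left to right and a second one right to left over the same sentence, then combine the two hidden states at each position.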
Finally, the hidden state a <t> at time t is obtained by multiplying the output gate by the tanh of the current cell state c <t>, as shown in formula (6).
b) Attention mechanism
The basic sequence-to-sequence model uses only the last hidden state of the encoder. Although this final hidden state contains information from all previous hidden states, inputs closer to the last time step are more likely to be retained, while information from hidden states near the initial time step is retained less. Most importantly, the attention-based sequence-to-sequence model [30] has proved superior to the recurrent neural network baseline system [31] in chatbot applications. Therefore, the attention mechanism proposed by Bahdanau [32] in 2015 is introduced into our model. The attention structure is shown in Fig. 3. The attention mechanism learns attention weights over all encoder hidden states through a shallow neural network and, at each decoding time step, combines the encoder hidden states according to these weights.
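The attention computation can be sketched in NumPy as follows. This is a minimal sketch of additive (Bahdanau-style) attention for a single decoder step, under assumptions: the names `W_dec`, `W_enc`, and `v` and the attention size of 128 are hypothetical, and the encoder state size of 512 assumes that the forward and backward 256-dimensional states are concatenated.

```python
import numpy as np


def softmax(scores):
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Additive (Bahdanau-style) attention for one decoder step.

    decoder_state: previous decoder hidden state, shape (n_dec,)
    encoder_states: all encoder hidden states, shape (T_x, n_enc)
    W_dec: (n_att, n_dec), W_enc: (n_att, n_enc), v: (n_att,)
    Returns the attention weights (T_x,) and the context vector (n_enc,).
    """
    # Shallow network: one tanh layer, then a projection to a scalar score.
    hidden = np.tanh(encoder_states @ W_enc.T + W_dec @ decoder_state)
    scores = hidden @ v
    weights = softmax(scores)
    # Context vector: weighted sum of the encoder hidden states.
    context = weights @ encoder_states
    return weights, context


rng = np.random.default_rng(0)
T_x, n_enc, n_dec, n_att = 5, 512, 256, 128  # n_att is a hypothetical size
encoder_states = rng.normal(size=(T_x, n_enc))
decoder_state = rng.normal(size=n_dec)
W_enc = rng.normal(scale=0.05, size=(n_att, n_enc))
W_dec = rng.normal(scale=0.05, size=(n_att, n_dec))
v = rng.normal(scale=0.05, size=n_att)

weights, context = additive_attention(decoder_state, encoder_states, W_dec, W_enc, v)
```

Because the weights are recomputed at every decoding step, words near the start of the input can still receive a large weight, which addresses the retention problem of using only the last encoder state.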

c) Pre-trained word embedding
Representing the vocabulary with one-hot vectors loses the semantic information between words. Therefore, we use word embedding based on a deep neural network to map the high-dimensional one-hot representation into a low-dimensional one while retaining the semantic relatedness between words as much as possible.
Word embedding is a method of representing words by creating a continuous vector space in which similar words are adjacent to each other. Universal word embedding has always been a goal of word vector representation. Embeddings pre-trained on a large corpus can be plugged into various downstream task models, such as sentiment analysis, classification, and translation, and improve their performance by incorporating general word or sentence representations learned from larger data sets. Therefore, the word vector representation trained by Chinese word2vec is loaded into the model to initialize the sub-word embedding, and the word embedding is then fine-tuned during training.
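A minimal sketch of this initialization step is given below. It assumes the pre-trained vectors are already available as a Python dictionary; the function `build_embedding_matrix`, the random initialization of out-of-vocabulary tokens, and the tiny 4-dimensional toy vectors are all hypothetical and serve only to illustrate the idea (the paper uses 300-dimensional embeddings).

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim=300, seed=0):
    """Create an embedding matrix whose row i is the vector for vocab[i].

    vocab: list of tokens in index order
    pretrained: dict mapping token -> np.ndarray of shape (dim,)
    Tokens missing from `pretrained` get small random vectors (an assumption).
    """
    rng = np.random.default_rng(seed)
    matrix = rng.uniform(-0.05, 0.05, size=(len(vocab), dim))
    found = 0
    for index, token in enumerate(vocab):
        vector = pretrained.get(token)
        if vector is not None:
            matrix[index] = vector
            found += 1
    return matrix, found


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy 4-dimensional vectors, invented for illustration only.
pretrained = {
    "开心": np.array([0.9, 0.1, 0.0, 0.2]),
    "高兴": np.array([0.8, 0.2, 0.1, 0.2]),
    "苹果": np.array([0.0, 0.9, 0.8, 0.0]),
}
vocab = ["<pad>", "<unk>", "开心", "高兴", "苹果"]
matrix, found = build_embedding_matrix(vocab, pretrained, dim=4)

similar = cosine(matrix[2], matrix[3])    # 开心 vs 高兴 (near-synonyms)
unrelated = cosine(matrix[2], matrix[4])  # 开心 vs 苹果
```

In a trainable model, this matrix would be used as the initial value of the embedding layer and updated by backpropagation, which corresponds to the fine-tuning described above.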

d) Beam search decoder
Beam search is a heuristic algorithm that extends greedy search. It returns a list of the most probable output sequences. Instead of greedily choosing the single most likely next word when constructing a sequence, the beam search algorithm keeps track of the K most probable partial sequences, where K is the beam width. At each step, it generates all successors of all K partial sequences, selects the top K successors from the complete list, and repeats this process until the sequences are complete.
According to formula (7), the first output word is searched, that is, the K words with the highest probability are kept. Next, K candidates for the second output word are searched according to formula (8), and K candidates for the third output word are searched according to formula (9).
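That is to say, the beam search algorithm aims to obtain the output sequence that maximizes the product of the conditional probabilities:

$$\arg\max_{y} \prod_{t=1}^{T_y} P\left(y^{<t>} \mid x, y^{<1>}, \ldots, y^{<t-1>}\right)$$

Each probability is a very small number, and multiplying small numbers results in an even smaller number. To avoid floating-point underflow, the natural logarithms of the probabilities can be summed instead, which keeps the values in a range that is easier to manage. The improved beam search formula, with length normalization, is

$$\arg\max_{y} \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\left(y^{<t>} \mid x, y^{<1>}, \ldots, y^{<t-1>}\right)$$

where the hyperparameter $\alpha$ is adjusted within (0, 1) to get the best results. $\alpha = 0$ means no normalization, and $\alpha = 1$ means that the score is fully normalized by the output sentence length.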
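The procedure can be sketched in plain Python as follows. This is a minimal sketch, not the authors' decoder: the function `beam_search`, the default `alpha=0.7`, and the toy probability table are assumptions made for illustration. Log-probabilities are summed to avoid underflow, and finished hypotheses are ranked by the summed log-probability divided by the sequence length raised to `alpha`.

```python
import math


def beam_search(next_log_probs, beam_width=3, max_len=10, alpha=0.7, eos="<eos>"):
    """Return finished hypotheses sorted by length-normalized log-probability.

    next_log_probs(prefix) must return {token: log P(token | input, prefix)}.
    """
    beams = [((), 0.0)]  # (partial sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, log_p in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the beam_width most probable expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for sequence, score in candidates[:beam_width]:
            if sequence[-1] == eos:
                finished.append((sequence, score))
            else:
                beams.append((sequence, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that were cut off at max_len

    def normalized(item):
        sequence, score = item
        return score / (len(sequence) ** alpha)

    return sorted(finished, key=normalized, reverse=True)


# Hypothetical next-word probabilities, invented for illustration only.
TOY_TABLE = {
    (): {"我": 0.4, "好": 0.6},
    ("好",): {"<eos>": 0.55, "的": 0.45},
    ("我",): {"很好": 0.9, "<eos>": 0.1},
}


def toy_model(prefix):
    probs = TOY_TABLE.get(prefix, {"<eos>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


greedy_best, greedy_score = beam_search(toy_model, beam_width=1)[0]
beam_best, beam_score = beam_search(toy_model, beam_width=2)[0]
```

In this toy table, greedy search commits to the locally best first word and ends with a sequence of probability 0.33, whereas a beam of width 2 keeps the second-best first word alive and finds a sequence of probability 0.36.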

3) Overall parameters
Overall, in our model, we set the dimension of the hidden state to 256 and the dimension of the word embedding to 300. The encoder is a four-layer bidirectional LSTM, and the decoder is a four-layer unidirectional LSTM. The encoder and decoder share the same word embedding, which is initialized with the pre-trained word embedding and fine-tuned on our data set during training. We trained our model with four Nvidia GTX TITAN Xp GPUs. The batch size was 256, the initial learning rate was 0.001, and the Adam optimizer [33] was used to adapt the learning rate during training. We will consider releasing our source code in the future.
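For reference, the reported hyperparameters can be collected in a small configuration object. The class `ModelConfig` and its two derived properties are hypothetical: the values come from the paragraph above, while the concatenation of the two encoder directions and the even split of a batch across GPUs are assumptions not stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    hidden_size: int = 256        # dimension of each LSTM hidden state
    embedding_size: int = 300     # dimension of the shared word embedding
    encoder_layers: int = 4       # bidirectional LSTM layers
    decoder_layers: int = 4       # unidirectional LSTM layers
    batch_size: int = 256
    learning_rate: float = 0.001  # initial learning rate for Adam
    num_gpus: int = 4

    @property
    def encoder_output_size(self) -> int:
        # Assumes forward and backward hidden states are concatenated.
        return 2 * self.hidden_size

    @property
    def per_gpu_batch_size(self) -> int:
        # Assumes the batch is split evenly across GPUs (data parallelism).
        return self.batch_size // self.num_gpus


config = ModelConfig()
```

Keeping these values in one immutable object makes it harder for the encoder, decoder, and training loop to drift out of sync.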

1) Evaluation methods
To test the quality of our chatbot, we designed an evaluation framework based mainly on Radziwill's research [34], combined with some items that have unique value for children with ASD. Radziwill conducted a literature review, selected 36 scholarly articles from 7340 articles, and derived an evaluation system that measures the quality of chatbots in four categories: humanity, affect, accessibility, and performance. This evaluation system requires a comparison between the present chatbot and at least one other chatbot to show the advantages of the present chatbot. The categories of humanity, affect, and accessibility depend entirely on users' own experience, so we designed a questionnaire to evaluate that experience. Each item is scored from 1 to 100, with higher scores indicating greater user satisfaction. The questionnaire is given in the appendix, and the correspondence between each item and the evaluation categories and quality attributes is shown in Table I. To measure the category of performance, we asked our chatbot and two other chatbots [35], [36] the same questions and compared their answers. We also present parts of the chat histories in appendix Table IV to show human-to-robot communication.

2) Participant recruitment
To ensure effective communication and avoid unexpected harm, before clinical trials are conducted among children with ASD, the chatbot should at least demonstrate that it can effectively simulate human-to-human interaction and satisfy ordinary adult users. Therefore, the present study recruited 15 people to complete the user experience experiment and fill in the evaluation scale.

3) Evaluation procedure
All 15 participants were asked to sit in front of a computer. Three dialogue boxes, one for each of three chatbots, were presented on the computer screen. One of the three chatbots was the chatbot we built, and the other two came from open-source development environments [35], [36]. All three chatbots apply generative-based technology. Participants were required to conduct 20 rounds of interaction with each chatbot and then complete the evaluation scale. Participants did not know the designer of each chatbot until they had completed the whole experiment.
For the user experience scale, all 15 participants completed the experiment and filled in the evaluation scale.
Results are presented in Fig. 4. Scores range from 1 to 100. We calculated the mean score of the 15 participants in each of the four categories.
Compared with the other two chatbots, the present chatbot received higher scores in all categories except humanity.
Affect is the category in which the present chatbot received its highest score, 68, compared with 54 for comparison chatbot 1 and 48 for comparison chatbot 2. The lowest score of the present chatbot was in the category of humanity, 47, compared with 48 and 49 for comparison chatbots 1 and 2, respectively. In the category of accessibility, the present chatbot showed a clear advantage over the other two chatbots with a score of 62, while the other chatbots scored 49 and 47, respectively. In terms of overall impression, the present chatbot scored 67, clearly higher than comparison chatbots 1 and 2, which scored 56 and 44, respectively.
Table II presents the responses of the present chatbot and the other two chatbots to the same questions. Questions 1 to 4 show a markedly more positive personality in the present chatbot than in the other two, and questions 5 to 7 show the strong imagination of our chatbot that results from the word-embedding training.
The results indicate that our chatbot performs better than the other two in the following areas: providing a comfortable greeting, having a positive personality, arousing users' interest, and using civilized language. In addition, users generally found it easy to understand what our chatbot meant. This characteristic makes our chatbot suitable for children with ASD, since one of their most important characteristics is difficulty in language comprehension. Users' overall impression of our chatbot is considerably better than that of the other two chatbots. In summary, the present chatbot shows great potential for application among children with autism because it is easy to understand and rich in imagination.
Due to time limitations, the present chatbot has not yet been applied among children with ASD and still awaits clinical testing. Future work needs to concentrate on the performance of the present chatbot in clinical practice, and long-term, carefully designed clinical experiments are needed. In addition, there are still many challenges in designing a chatbot that performs well in interaction with autistic children. First, children with ASD generally avoid actively communicating with others [15]. Future work can focus on enabling chatbots to start a conversation or ask questions to motivate children to communicate with them. Second, autistic children often fail to respond to a conversation initiated by others, which may be related to difficulty of comprehension [19]. In the present study, we simplified the response wording and kept sentences as short and simple as possible to make the conversation easy to understand. Future research could consider allowing the chatbot to repeat its response in several ways to foster the understanding of autistic children. In addition, children with ASD may use idiosyncratic words in a conversation, which makes them difficult for people to understand [37]. Future research is strongly recommended to establish a 'knowledge graph' specifically for children with ASD in order to help chatbots figure out the real intentions of autistic children. We are now collecting data from children with ASD in order to achieve this goal.
Compared with traditional chatbots, generative-based chatbots are much more efficient to develop and scale better. In theory, generative-based chatbots can also reply to questions that have not been encountered before. However, how to reason accurately based on context information is still a problem to be solved, and future work can address it.

VI. CONCLUSION
The present study introduces a general Chinese chatbot that is specially designed for children with ASD. The present chatbot adopted an open-source chatting corpus containing question-and-answer style chat histories, many of which involve children, for model training, and adopted a generative-based method based on a deep neural network to build a general Chinese chatbot. The contributions of the present chatbot are as follows: (1) it can simulate a realistic social environment to engage children with ASD and allow them to put into practice the social and communication skills that they learned from professional intervention; (2) it can serve as an effective tool for researchers and therapists to collect rare and precious linguistic data, since it can be installed on computers, mobile devices, or robots and automatically record vocal data, given that collecting linguistic data from children with ASD has long been a challenge for researchers in psycholinguistics and other related subjects [37]; (3) the present study provides a novel HCI or HRI approach for children with ASD that can be used as a reference for future research, as previous research based on CAT or rehabilitation robots seldom used chatbots to facilitate the communication and social skills of children with ASD, and general chatbots were not designed to meet the special needs of children with ASD.

2. How satisfied are you with the greeting of the chatbot?
3. How interested are you in chatting with the chatbot?
4. How well do you think the chatbot can understand your meaning and intention?
5. Did the chatbot provide instructions when it could not understand what you meant?
6. How well do you think the robot can give an appropriate response to your social cues?
7. How positive and outgoing do you think the chatbot's personality is?
8. How well do you think the chatbot can talk to you in civilized language?
9. Please rate your overall impression of the chatbot.