Intelligent Medication Reminding System for Visually Challenged Groups

This paper presents an intelligent medication reminding system that combines optical character recognition (OCR) and text to speech (TTS) for visually challenged individuals. The approach is implemented as an ARM-based embedded system that extracts the text from a document and synthesizes speech for visually impaired individuals. As a domain-specific reading machine, the proposed system uses a convolutional neural network (CNN) as the OCR core engine to recognize Chinese characters, and an HMM/DNN-based speech synthesis system as the pronunciation mechanism that provides TTS. Because OCR introduces recognition errors, a spelling checking module based on n-gram models is also included to detect and correct erroneous characters. To evaluate the proposed approach, several experiments with users were designed. The experimental results show that the proposed approach improves daily usage, which indicates that the developed embedded system is practical and effective.


I. INTRODUCTION
Medication-use safety is an important issue in health management, especially for home care. Because patients, especially elders, often suffer degradation of vision, hearing, cognition, and memory, attention to medication-use safety can prevent the harm to health that results from taking the wrong medicine or misusing medicine. As we know, abnormalities in drug absorption, distribution, metabolism, and excretion usually harm health. Therefore, managing medicine taking and ensuring medication-use safety has become an essential issue in home care. With regard to medication-use safety, several problems occur frequently among home-care patients. Some Chinese people like to buy medicine to deal with personal health problems themselves, hiding their real condition from their physician out of vanity. Furthermore, remedies, Chinese herbs, and drugs with exaggerated effects have become a kind of gift in Mandarin-speaking society; people give each other such products in the hope of achieving a desired health goal. However, without professional medical information and knowledge, these practices often produce the opposite of the desired result. Besides, storing medicine in the wrong way and being reluctant to discard expired drugs also harm medication-use safety. Some individuals without medical training even adjust their medication by themselves, which is illegal and immoral. Apart from the abnormal conditions mentioned above, the core issue of this paper is how to manage medication use, that is, how to prevent problems such as taking medicine repeatedly, forgetting to take medicine, taking the wrong medicine, and taking medicine at the wrong time.
Technology-assisted medication-use safety is one of the most essential trends in the near-future development of health administration. With the development of computer science, applying technology to medicine administration has become an important research topic. Using artificial intelligence for home care is also extremely important for home-care patients at this time. Recently, many medication safety incidents have led care recipients and caregivers to doubt the safety of medication use. Intelligent perception technologies for patient care offer low cost and high efficiency, and the data they collect over multiple cycles are available to provide more convenience. In an intelligent medication reminding and reading system, automatic recognition technology and speech synthesis help the user understand the content on the medicine bag much more efficiently. This is essential for medication-use safety, especially for visually challenged elders. Besides, speech is one of the most natural forms of human communication. Patients want gentle spoken reminders for medicine taking that greet them warmly. As we know, the patients are often visually challenged, and they are usually too old to use a complex user-machine interface. It is hard for care recipients to recognize small icons and to provide input on a small touch panel. Herein, an embedded system provides a good solution for this specific application. Triggering the function with a single button is most convenient for users, especially those who are not good at manipulating devices.

Jui-Feng Yeh, Chan-Yi Liu, Sheng Chen, Jia-Yu Lin, and Li-Ting Zhang

This study aims to develop a document reader based on an embedded system for elders with low vision. There are two core technologies: optical character recognition (OCR) and text to speech (TTS). Because most elders are not smartphone users, an embedded system is adopted for practical use; it extracts the text from a document and synthesizes speech for visually impaired elders. Document reading is one of the most essential tasks in daily life, especially for official documents. The reading machine proposed in this study is an automatic device that uses OCR and TTS to output a synthetic voice from printed text. For visually impaired or near-blind elders, the proposed assistive system is applied to document reading: it reads the textual information on a document and produces the corresponding voice using OCR and TTS. We developed the OCR module. To obtain the characters, text regions in the image are localized by connected component labeling with histogram analysis, and a pattern recognition algorithm is then adopted for text recognition. A spelling check with an n-gram model is also integrated to detect and correct erroneous characters, and a TTS system using concatenative synthesis runs on the embedded platform. The system is operated via a voice-based user interface and also has a user-friendly interface to scan the text and to control various speech parameters. The speech signal produced can be saved and reproduced for later use. According to the features users desire, speech output is one of the most natural and useful trends. This research plans to develop a text scanner with speech synthesis to meet the needs of elders with low vision.
Herein, three main points are addressed in this study: hardware design and development, OCR technology, and pronunciation-based TTS design. We will design the printed circuit board for an ARM-based system-on-board (SOB) embedded system as the system architecture. For the speech output module, the computational complexity should be improved compared with traditional approaches. The pronunciation-based reading machine will be designed and developed for future applications. The speech synthesis to be developed in the next year will be connected to the intelligent light pen and its interfaces to deliver messages to users effectively and efficiently. Based on the needs of industry, we have developed the related technologies in this paper.

II. RELATED WORKS
Since medication-use safety is important, McCall et al. (2000), noting that systems for monitoring and improving drug adherence are either costly or too complicated for patients to use, developed an RFID-based medication adherence intelligence system [1]. Laranjo et al. (2012) used the Internet of Things (IoT) concept for medication control; they embedded RFID in the medicine bag and achieved good performance [2]. For patients living independently, McCall et al. (2013) developed an embedded system to monitor the patients' medication condition [3]. For medication adherence monitoring, Aldeer et al. (2018) presented a taxonomy of technology-based approaches [4]. They divided the core technologies into six categories, including smart pillboxes, wearable sensors, ingestible biosensors, RFID, and computer vision. The proposed intelligent medication reminding system is also a kind of reading system, and reading systems are increasingly popular. Ali et al. (2018) developed an interactive case-based learning system for medical education [5]. Kim and Chung (2018) designed a smart query-response interface for a remote emergency medical image reading system for mobile applications [6]. Nazemi et al. [7] designed a standalone, low-cost, affordable reading system for developing countries. Bouazizi et al. [8] used the same technology as this paper, but for Arabic. Katz et al. [9] developed a system for the visually challenged group following guidance directives developed through participative design with potential users and educators. Albraheem et al. [10] developed a system called Third Eye, a human-technology-based mobile application that helps visually challenged people face challenges in real life. The app has an Arabic interface and is designed to help Arab visually challenged people identify objects. Fedorovici et al. [11] added gravitational search algorithms (GSAs) to a convolutional neural network (CNN) in an optical character recognition system; applying a GSA followed by backpropagation improves performance by keeping the algorithm from being trapped in local minima. Rawoof et al. [12] developed real-time speech synthesis on an ARM-based embedded system; they describe the implementation details of the TTS steps and the results of the implementation on ARM and x86 systems. Nazemi et al. [13] described the application of speech synthesis technology in embedded systems; they introduce a multi-language speech synthesis technology and embed it in a portable, low-cost, independent embedded system for accessing and reading electronic documents.
From previous studies, we found that reading systems based on optical character recognition and text to speech are increasingly popular. Our goal in this paper is to develop a system that uses these technologies to help visually challenged groups with medication-use safety.

III. INTELLIGENT MEDICATION REMINDING SYSTEM
This paper proposes an intelligent medication reminding system with Chinese spelling checking for visually challenged groups. The system is composed of three parts: optical character recognition (OCR), Chinese spelling checking, and text to speech (TTS). The proposed framework and the overall flow chart of the system are shown in Fig. 1.
There are two main phases in this system: optical character recognition and text to speech. These two technologies are the core of the system. After the cover of the medicine bag is photographed, the paper document is converted into an image file, and the OCR module segments and extracts the text in the image file. The TTS module then synthesizes speech using the response sentence as the unit of an utterance. After synthesis, a digital/analog converter (DAC) plays the audio files to assist reading and to remind the user of information related to medication-use safety.

A. Optical Character Recognition Module
This part can be divided into two sections: the embedded system and the OCR module. In the embedded system, we develop the control firmware module for the CCD camera module. The distance between the lens and the document is adjusted, and the parameters shown in Table I are used to determine the distance.

Symbol Contrast
The measured image is scanned for reflectance data; this parameter is the difference between the maximum (light) and minimum (dark) reflectance values. If the bright and dark areas are too close in reflectance, the readability of the image is impaired.

Fixed Pattern Damage
Damage and distortion around the image. Any damage seriously affects reading.

Axial Non-Uniformity
The offset error between the X-axis and the Y-axis. The larger the error, the greater the deviation of the two axes from perpendicularity.

Grid Non-Uniformity
The deviation between the grid of the image and the ideal grid. The two are compared, and the maximum distance from the ideal position determines the grade.

Modulation
The uniformity of the bright and dark areas of the image.

Print Growth
The printing error between the actual element size and the expected element size.
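
As an illustration of the first parameter in Table I, symbol contrast can be computed as the spread between the lightest and darkest reflectance values. The sketch below is only a minimal example under assumptions: the image is taken to be an 8-bit grayscale array, and the function names and the 0.4 readability threshold are hypothetical, not values from this paper.

```python
def symbol_contrast(gray_pixels, max_level=255):
    """Difference between the lightest and darkest pixel, scaled to 0..1.

    gray_pixels is assumed to be a list of rows of 8-bit grayscale values.
    """
    flat = [p for row in gray_pixels for p in row]
    if not flat:
        raise ValueError("image is empty")
    return (max(flat) - min(flat)) / max_level


def is_readable(gray_pixels, threshold=0.4):
    """Hypothetical check: the threshold is an assumed value, not a measured one."""
    return symbol_contrast(gray_pixels) >= threshold


# A high-contrast patch passes; a washed-out patch does not.
sharp = [[10, 12, 240], [11, 238, 242]]
faded = [[100, 105, 110], [102, 108, 109]]
print(is_readable(sharp), is_readable(faded))
```

A check of this kind could be run while the lens distance or lighting is adjusted, so that an image is passed to the OCR module only when its contrast is sufficient.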
The overall flow chart of the OCR module is shown in Fig. 2. After obtaining the images from the CCD lens, we must capture the Chinese characters in the images, a step called text extraction. Here we define a text block as a run of continuous or adjacent rectangles of the same size, while other independent blocks of different sizes are treated as non-text graphics. We ignore the latter and process only text blocks. The information on the medicine bag lies between a structured table and unstructured natural language, so it is semi-structured. When processing this information, we deal with the fixed fields first, such as name, gender, and dosage, and then process the content of those fields. Separating paragraphs and sentences also helps. Using the fixed fields on the medicine bag and domain knowledge, we can effectively reduce the number of candidate words, which lowers the complexity and improves the accuracy. Because the information on the medicine bag belongs to a highly specialized field, the words and sentences used are similar from bag to bag, so we apply a spelling checker and a language model after OCR processing.
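
The post-OCR spelling check can be sketched with a bigram language model and sets of visually confusable characters. This is a minimal illustration under assumptions, not the module used in this paper: the counts, the confusion sets, the vocabulary size, and the function names are all hypothetical toy values, and add-one smoothing is chosen only for simplicity.

```python
import math

# Hypothetical toy statistics from a medication-domain corpus.
BIGRAMS = {("每", "日"): 50, ("日", "三"): 30, ("三", "次"): 40}
UNIGRAMS = {"每": 60, "日": 80, "三": 70, "次": 90, "曰": 2}
# Hypothetical sets of look-alike characters that OCR may confuse.
CONFUSIONS = {"曰": ["日"], "日": ["曰"]}
VOCAB_SIZE = 5000  # assumed vocabulary size for add-one smoothing


def log_prob(chars):
    """Add-one smoothed bigram log-probability of a character sequence."""
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        numerator = BIGRAMS.get((prev, cur), 0) + 1
        denominator = UNIGRAMS.get(prev, 0) + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total


def correct(text):
    """Greedily replace a character by a look-alike when the score improves."""
    chars = list(text)
    for i in range(len(chars)):
        best_score = log_prob(chars)
        for candidate in CONFUSIONS.get(chars[i], []):
            trial = chars[:i] + [candidate] + chars[i + 1:]
            score = log_prob(trial)
            if score > best_score:
                chars, best_score = trial, score
    return "".join(chars)


print(correct("每曰三次"))  # the look-alike 曰 is replaced by 日
```

Restricting candidates to a confusion set keeps the search small, which matters on an embedded platform; the fixed fields of the medicine bag could narrow the candidates further.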
In the OCR module, the most important part is character recognition. As in other signal recognition tasks, we use a model-based recognizer whose operation is divided into two parts: feature extraction and a model-based classifier. We use a convolutional neural network (CNN) to build our character recognition model. A CNN learns spatially local correlations between neighboring layers. The convolutional operation consists of convolution and pooling, so the hidden layers of a CNN are composed of convolution layers, sub-sampling layers, and pooling layers. Unlike traditional fully connected networks, a CNN connects neighboring layers according to local features in the image; this is the main function of the convolution layers. The sub-sampling and pooling layers reduce the number of samples, which substantially lowers the computational cost. Our character recognition for medicine bags is shown in Fig. 3.
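
The two building blocks described above, convolution and pooling, can be illustrated in a few lines of plain Python. This is a sketch for intuition only: the kernel is hand-picked rather than learned, the ReLU activation is an assumption (the paper does not name one), and the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the CNN 'convolution') followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0, s))  # ReLU keeps only positive responses
        out.append(row)
    return out


def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keeps the strongest response per block."""
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out


# A 5x5 binary glyph with one vertical stroke, and a kernel that responds
# to a dark-to-light transition from left to right.
glyph = [[0, 0, 1, 0, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
features = conv2d(glyph, kernel)  # 4x4 feature map
pooled = max_pool(features)       # 2x2 map after sub-sampling
print(pooled)
```

Each output value depends only on a small neighborhood of the input, which is the local connectivity described above, and pooling quarters the number of values passed to the next layer.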
We observe that English words are separated by spaces and that only the letters 'i' and 'j' are made up of separate parts, whereas Chinese characters are usually made up of several parts in a geometric arrangement. Because of this characteristic of Chinese, we found that most errors came from over-segmentation, in which one character is divided into two or more characters, leading to follow-up errors in image recognition.
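
One common way to reduce over-segmentation, shown here only as a hedged sketch and not as the method of this paper, is to merge neighboring fragments whose combined width still fits one character cell, since printed Chinese characters occupy roughly square cells. The function name, the span representation, and the tolerance value are assumptions.

```python
def merge_fragments(boxes, cell_width, tolerance=1.1):
    """Merge horizontal spans that together fit within one character cell.

    boxes: list of (left, right) pixel spans on a single text line.
    cell_width: expected character width, e.g. the text line height.
    """
    merged = []
    for left, right in sorted(boxes):
        if merged and right - merged[-1][0] <= cell_width * tolerance:
            # The fragment still fits in the current cell: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


# Two left-right structured characters split in half around an intact one.
spans = [(0, 8), (10, 20), (24, 44), (48, 56), (58, 68)]
print(merge_fragments(spans, cell_width=20))
```

A width heuristic like this can wrongly join two genuinely narrow glyphs such as digits, so in practice it would need to be combined with the recognizer's confidence or the language model.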

B. Speech Synthesis Module
Reading systems are increasingly popular; for example, Nazemi et al. [7] designed a standalone, low-cost, affordable reading system for developing countries.
In this part, we introduce the embedded speech input/output (IO) control and the speech synthesis module. The embedded speech IO control mainly concerns control of the digital-to-analog converter (DAC) in our embedded system. In the speech synthesis module, we use text to speech (TTS); considering feasibility, we produce speech by concatenating speech units in series, combined with prosodic rules. Intuitively, the smaller the synthesis unit, the smaller the storage space required. In Chinese speech synthesis, using a sentence as the synthesis unit is impossible because the computational complexity exceeds the capability of the algorithm. Using a syllable or a character as the synthesis unit, rather than a word, easily meets the memory requirement, but it may cause problems in pronunciation quality. We therefore use intra-word prosodic rules to overcome these problems. The processing flow chart of the speech synthesis module is shown in Fig. 4. First, the text recognized by the OCR module is taken as input. Using a pronunciation dictionary, we map each character to its syllable. To lower the read-only memory (ROM) cost, following Chinese pronunciation, we store only the third tone and use voice conversion to generate the other tones. To account for the pronunciation of words, we use prosodic conversion for words composed of two to four characters. After that, we concatenate the syllables and use the DAC to generate speech.
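
The front end of this flow, from recognized characters to a sequence of stored units, can be sketched as follows. The sketch covers only the dictionary lookup and the decision whether tone conversion is needed; it produces no audio. The dictionary entries, the unit naming scheme, and the function name are hypothetical.

```python
# Hypothetical pronunciation dictionary: character -> (base syllable, tone).
PRONUNCIATION = {
    "每": ("mei", 3),
    "日": ("ri", 4),
    "三": ("san", 1),
    "次": ("ci", 4),
}
STORED_TONE = 3  # only third-tone units are kept in ROM, as described above


def plan_units(text):
    """Map each character to its stored unit and mark needed tone conversions."""
    plan = []
    for ch in text:
        if ch not in PRONUNCIATION:
            continue  # assumption: characters without an entry are skipped
        syllable, tone = PRONUNCIATION[ch]
        plan.append({
            "unit": f"{syllable}{STORED_TONE}",  # unit actually stored
            "target_tone": tone,                 # tone to be produced
            "convert": tone != STORED_TONE,      # voice conversion required?
        })
    return plan


for step in plan_units("每日三次"):
    print(step)
```

The resulting plan would then be passed to the tone conversion and intra-word prosodic rules before the units are concatenated and sent to the DAC.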
The main advantage of the HMM-based speech synthesis system (HTS) is that it no longer relies only on rules to determine speech parameters; HTS introduces statistical models to generate or revise the parameters. When synthesizing and converting speech, HTS works with a speech-coding representation. Also, thanks to the concept of state transitions, the synthesized speech does not contain the popping or high-frequency noise caused by abrupt joins between concatenated units. Because the speech parameters exist in coded form and can be adapted to context, speech synthesis and conversion can achieve excellent results without losing the characteristics of the speech. The HTS processing flow can be divided into two phases, as shown in Fig. 5. In the training phase, supervised learning on the voice signals and their labels is used to train the HMMs. In the synthesis phase, the text is analyzed first to obtain the corresponding labels, and the speech is synthesized by generating the corresponding excitation and cepstral parameters from the HMMs.

IV. EXPERIMENTS
A. Experimental Setup
The experiments are divided into two parts: experiments on the OCR module and on the speech synthesis module. The details of the software and hardware are shown in Table II, and our system is shown in Fig. 6.

B. Evaluations about Optical Character Recognition
To evaluate the proposed approach, 15 simulated medicine bags were designed, and each was photographed twice with the camera module of the embedded system to reduce errors caused by shooting angles and natural factors in the shooting environment. The total number of Chinese characters is 2,720. The experimental results are shown in Table III. We observed that brightness plays an essential role in optical character recognition. Therefore, we added a lighting device, a scattering LED, to reduce the recognition errors caused by insufficient illumination. The corresponding experimental results are shown in Table IV.

C. Evaluations about Speech Synthesis
We used two questionnaires, on the tone before and after our modification. The questionnaires were designed according to the testers' listening impressions and the MOS analysis method. Thirty people participated in the test: 13 elderly people aged 60 to 70, 9 people aged 40 to 50, and 8 people aged 20 to 25. The test took 10 minutes, and each of the two tests had 10 sets of voices. Each tester heard the unmodified speech and the revised speech separately and gave a score and a preference. We used ABX analysis to evaluate the speech synthesis module. There were three options, a, b, and x (about the same); during the test, the tester did not know which sample was the unmodified speech and which was the revised one. If the tester chose a or b, that sample received 2 points; if the tester chose x, both received 1 point. The experimental results are shown in Table V.
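
The scoring rule just described can be written down directly. The sketch below only restates that rule; the function name and the example choices are hypothetical and are not data from the experiment.

```python
def abx_scores(choices):
    """Tally ABX points: 2 for the chosen sample, 1 each when 'x' is chosen."""
    scores = {"a": 0, "b": 0}
    for choice in choices:
        if choice == "x":          # "about the same": both samples get 1 point
            scores["a"] += 1
            scores["b"] += 1
        elif choice in scores:     # a clear preference is worth 2 points
            scores[choice] += 2
        else:
            raise ValueError(f"unknown choice: {choice!r}")
    return scores


# Hypothetical answers from one tester over five voice sets.
print(abx_scores(["a", "b", "b", "x", "b"]))  # {'a': 3, 'b': 7}
```

Because every trial distributes exactly 2 points, the share of one sample in the total can be compared directly across testers.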
We also used another analysis, mean opinion score (MOS) analysis. As in the ABX analysis, the tester did not know which sample was the unmodified speech and which was the revised one. After listening, testers chose a grade from 1 to 5. The grading is shown in Table VI.

TABLE VI: THE GRADING OF MOS ANALYSIS

5
The synthesized voice is close to the sound of the announcer, overall smooth and perfect.

4
Clear and understandable, no obvious errors, reaching a level that can be promoted.

3
Can be understood, but rhythm and accent are not handled properly.

2
The key words are unclear and the speech is close to a single tone.

1
The pronunciation is unclear and the expression is intermittent and incoherent.

The related experimental results for the MOS measure are shown in Table VII.
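
For reference, the MOS of a voice is the arithmetic mean of the grades given by the testers on the 1-to-5 scale of Table VI. The sketch below shows the computation; the function name and the ratings are hypothetical illustrations, not results from this experiment.

```python
def mean_opinion_score(ratings):
    """Average a list of integer grades on the 1-to-5 MOS scale."""
    if not ratings:
        raise ValueError("no ratings given")
    for grade in ratings:
        if grade not in (1, 2, 3, 4, 5):
            raise ValueError(f"grade out of range: {grade!r}")
    return sum(ratings) / len(ratings)


# Hypothetical grades from five testers for the two versions of one sentence.
unmodified = [3, 3, 4, 2, 3]
revised = [4, 3, 4, 3, 4]
gain = mean_opinion_score(revised) - mean_opinion_score(unmodified)
print(round(gain, 2))
```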

D. Evaluations about Overall System
In the OCR experiment, the accuracy of our system is close to 97%. There are still some segmentation errors; for example, ' ' could be recognized as ' ', which means the character ' ' was over-segmented, so that one character became two. Another example of a recognition error is ' ' being recognized as ' '; ' ' means stomach in Chinese and ' ' means armour, so the two characters have completely different meanings but are similar in appearance. After processing in our system, these errors were corrected, increasing the accuracy by about 5%. Although this is a small improvement, the output of the OCR module is the input of the speech synthesis module, so improving OCR accuracy reduces errors in the speech synthesis module and keeps users from being misled. For the speech synthesis module, we obtained an increase of about 13% in the ABX analysis and of 0.3 points in the MOS analysis, so the experiments show that overall acceptance increased with our system. We also invited 20 people to evaluate the overall system on system accuracy, execution time, speech clearness, operational simplicity, practicality, convenience, and overall score. Scores range from 1 to 5, where 1 is the worst and 5 is the best. The results are shown in Table VIII.
After the experimental test, most participants accepted the system's modified tone better. Compared with reading the medicine bag themselves or hearing a word-by-word voice, colloquial speech with a modified tone makes the medication information easier for users to understand, reducing the risk of medication errors and overdose. This shows that the proposed system is effective and practical for visually challenged groups.

V. CONCLUSIONS
This paper presented an embedded system approach that combines optical character recognition and speech synthesis with Chinese spelling checking for medication-use safety. The functionality of a reading machine for the information on the medicine bag is also implemented. Two contributions are obtained: the recognition error rate is reduced, giving a more accurate recognition result, and the impersonal, standardized text of the medicine bag is converted into colloquial speech, as if friends or relatives were personally reading the contents of the medicine bag; the modified tone also makes the words sound smoother. The system can be used in a wide range of settings, from major hospitals to local clinics and pharmacies, and by everyone else. Nowadays, when modern people constantly "have a device in hand", vision deterioration is becoming more and more serious, so this work is not limited to the elderly. Although there are already applications that use QR codes to identify medicine bags, some elders are not familiar with operating smartphones or have never used them, and the system of this project can serve them. After optimizing the hardware and the related software design, the intelligent medication voice reminding system for visually impaired groups can achieve one-key input and output, making it more convenient for the elderly to operate. The project runs on a Raspberry Pi embedded system. Its development cost is lower than that of a smartphone, and the hardware cost for each hospital or pharmacy is relatively small. These are the advantages of this work.