Ride-Hailing Service Prediction Based on Deep Learning

As a fundamental transportation service, ride-hailing has greatly improved urban mobility efficiency and serves millions of passengers in large metropolitan cities. However, because of the imbalance between the limited supply caused by strict car-buying policies and the increasing travel demand, ride-hailing services are far from satisfactory. Better prediction of travel demand is one possible way to improve the efficiency and quality of ride-hailing services, since idle drivers can then be dispatched to hotspots with more potential ride requests. In this paper, we explore the use of a deep learning technique, i.e., ConvLSTM networks, for ride-hailing service prediction. Experiment results on a real-world ride-hailing dataset provided by Didi Chuxing show the superiority of ConvLSTM over baseline methods including Multi-Layer Perceptron and two simple historical methods.


I. INTRODUCTION
With the continuous development of the Internet and mobile smart terminals, Internet users have been moving to the mobile Internet. A large amount of application software has migrated from computers to mobile terminals, and mobile apps have appeared in large numbers. Didi Chuxing is a typical success story. Didi Taxi (the former name of Didi Chuxing) is the first application software in China that uses Internet technology and a new type of intelligent network system. Didi Chuxing has been continuously updated and upgraded to capture the market. At present, Didi Chuxing is the largest ride-hailing platform in China, providing convenient car-hailing services and more localized life services for more than 100 million users every day. The biggest value and success of Didi Chuxing is that, in the Internet era, it uses big data mining and user matching techniques to offer users a modern way of travel. The economic and technological environment of Didi Chuxing is relatively friendly. Although various other car-hailing applications have entered the market, Didi Chuxing still dominates and essentially monopolizes the Chinese market, with a market share of 88.4%.
In the data of Didi Chuxing, the order quantity is particularly worth studying. First, mining and analyzing the order quantity can tell Didi Chuxing when and where drivers can pick up high-quality orders, so that drivers can make decisions more efficiently. Second, analyzing the spatiotemporal trajectory of the order volume makes it possible to predict the taxi demand of an area in different time periods, which helps schedule drivers better. For example, over a whole year, the order demand on holidays differs from that on non-holidays. Within a week, the order demand on weekdays and weekends may be significantly different. Even within the same day, the taxi demand during commuting hours is completely different from that in other periods. In addition, a spatial analysis of the order volume can identify the differences in order demand between areas of the region and predict the hot areas, and these results can serve as an effective reference for drivers' decision making. For example, the taxi demand near playgrounds and shopping malls may be greater than in other places.
By mining the order volume on the basis of various trajectory data and a combination of algorithms, it is also possible to find the hot routes and hot spots of residents in a certain area as well as the potential travel patterns of users. This is useful for urban planning and for judging and predicting future travel needs, and it can improve travel efficiency and even reduce residents' waiting time.
Prediction methods built on the open data of Didi Chuxing can be divided into two categories. The first category relies on previous experience and large amounts of statistical data to predict the future travel requirements of residents in different time periods and different regions. This approach has many limitations in terms of data sparseness, data utilization, and accuracy. The second category uses machine learning and various deep learning models. Different data features are fed into a neural network model for training and prediction, allowing the machine to learn autonomously to predict future values.
In this paper, we use deep learning techniques, i.e., ConvLSTM networks, for ride-hailing service prediction. We also compare the performance of two simple historical methods. We conduct experiments on a real-world ride-hailing dataset provided by Didi Chuxing, which contains the ride orders for half a year in Haikou, Hainan, China. The experiment results show ConvLSTM outperforms the baseline methods including Multi-Layer Perceptron and two simple historical methods.
The analysis of Didi Chuxing's order data can help companies forecast order demand more accurately and can support the planning and deployment of Didi's vehicles. It can also improve travel efficiency and make the ride-hailing service more intelligent. At the same time, the comparison of different models can provide a reference for subsequent research.
The rest of this paper is organized as follows. In Section II, we review recent related work. In Section III, we formulate the prediction problem. In Section IV, we describe the dataset as well as the preprocessing steps. In Section V, we give a short introduction to the models we use. In Section VI, we present our experiments. In Section VII, we give our conclusion.

II. RELATED WORK
In this section, we discuss the related work from two aspects, namely, machine learning based prediction and deep learning based prediction. Deep learning has become extremely successful in a series of problems, including both computer vision tasks [1]-[3] and time series tasks [4], [5]. Deep learning is also drawing much attention for traffic forecasting [6].

A. Machine Learning Based Prediction
In Ref. [7], the authors introduce machine learning methods for characterizing and predicting the short-term demand for on-demand ride-hailing services, with estimates made in both time and space. The demand is modeled as a function of variables related to traffic, pricing, and weather conditions. A single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and an artificial neural network are used for regression, and R-squared, root mean square error (RMSE), and slope are used for evaluation. The study uses data from Didi Chuxing, in which 199,584 time slots describe the spatiotemporal ride-hailing demand at an aggregation interval of 10 minutes. All methods are tested and validated on two independent samples from the dataset. The results show that the boosted decision tree provides the best prediction accuracy (RMSE = 16.41) while avoiding the risk of overfitting, followed by the artificial neural network (20.09), random forest (23.50), bagged decision trees (24.29), and the single decision tree (33.55). The results also show that the support vector machine (SVM) algorithm is not suitable for the taxi demand problem considered, and that SVM regression is computationally expensive. Non-parametric models tend to be selected in current research, and in terms of modeling, GBDT provides the best overall prediction performance. The authors also use different regularization schemes, such as shrinkage, bagging, early stopping, and random subspace, to reduce the computation time.
In Ref. [8], the authors use spatiotemporal models to analyze and more accurately predict the demand for e-hailing services, presenting a new method for analyzing and forecasting Uber demand. The prediction performance of several statistical models is compared, including a temporal model (vector autoregression, VAR) and two proposed spatiotemporal models (spatiotemporal autoregression, STAR, and STAR with the least absolute shrinkage and selection operator, LASSO-STAR), in different scenarios (based on the number of temporal and spatial lags) as well as during peak and off-peak hours. The paper adds a LASSO penalty to the parameter estimation of the STAR model to improve performance and also develops several weighting matrices. The penalty builds the model by setting several coefficients to zero. In almost all cases, the LASSO-STAR model outperforms the STAR model, so the authors recommend LASSO-STAR over STAR.
In Ref. [9], the authors analyze the order and itinerary data extracted from Didi Chuxing. They study the relationship between the different service modes of drivers and the selected area in a specific time period. To predict the ride-hailing demand, the authors use LASSO (Least Absolute Shrinkage and Selection Operator) to analyze on-demand platform data (for example, distance, cost, and waiting time). The demand prediction is based on a random forest (RF) model, which is compared with the autoregressive integrated moving average (ARIMA) model and support vector regression (SVR). The results show that RF performs better than the other models and can predict the demand for distinct on-demand ride-hailing service modes.
In Ref. [10], the authors apply random forest to estimate direct demand models between regions. Compared with the traditional multiplicative model, the random forest model fits the data better and obtains higher prediction accuracy.

B. Deep Learning Based Prediction
In Ref. [11], the authors propose the spatio-temporal dynamic graph attention network (STDGAT) for predicting taxi demand. The method uses a graph attention network (GAT) to extract the non-Euclidean correlations between regions, adaptively assigning weights to the neighboring regions of each region, and thereby models different spatial correlations and captures spatial information. In addition, the dynamic graph attention mode implemented by the authors can capture different spatial relationships at different time intervals based on actual commuting relationships. Extensive experiments are conducted on large-scale real-world ride demand datasets.
In Ref. [12], the authors propose a deep learning model based on the convolutional neural network (CNN) to predict multi-step ride-hailing demand, using travel request data from Chengdu, China, provided by Didi Chuxing. The CNN model can accurately predict the ride demand for every 1 km by 1 km area in the urban area. Compared with long short-term memory (LSTM), the training speed of the CNN model is 30% higher. The proposed model can also be easily extended to multi-step prediction, which will facilitate on-demand sharing.
In Ref. [13], the authors propose a novel deep learning method named multiple spatiotemporal information fusion network (MSTIF-Net), which better integrates multiple kinds of situational awareness information and graph representations. Experiments on traffic datasets from Haikou, China, and Chicago, the United States, show that MSTIF-Net performs better than several traditional and state-of-the-art benchmark models in predicting urban ride demand.
A spatiotemporal encoder-decoder residual multi-graph convolutional network (ST-ED-RMGC) is proposed in [14]. It is a novel deep learning model that predicts the ride-sourcing demand of various origin-destination (OD) pairs. Extensive experiments on a for-hire vehicle dataset from Manhattan, New York City, show that the proposed deep learning framework significantly outperforms the state-of-the-art methods.
In Ref. [15], a spatiotemporal multi-graph convolution network (ST-MGCN) model is used to forecast ride demand. The non-Euclidean pairwise correlations among regions are first encoded into multiple graphs, and these correlations are then explicitly modeled using multi-graph convolution. To use global context information when modeling the temporal correlation, a contextual gated recurrent neural network is further proposed, which augments the recurrent neural network with a context-aware gating mechanism that reweights different historical observations. The evaluation of the proposed model on two real-world large-scale ride-hailing demand datasets shows a consistent improvement of more than 10% over the state-of-the-art baselines.
In Ref. [16], a deep learning architecture combining the residual network, graph convolutional network, and long short-term memory is proposed to forecast short-term passenger flow in urban rail transit. Weather conditions and air quality are also considered, and their influences on prediction precision are quantified for the first time.
In Ref. [17], a new virtual graph modeling method is proposed to focus on significant demand regions, together with a novel Deep Multi-View Spatiotemporal Virtual Graph Neural Network (DMVST-VGNN) that strengthens the learning of spatial dynamics and long-term temporal dependencies. Experiments on two large-scale New York City datasets demonstrate the effectiveness and superiority of the new method.

III. PREDICTION PROBLEM FORMULATION
In this section, we formulate the ride-hailing service prediction problem as a regression problem, similar to [18]. Suppose we divide the spatial region into M by N grids and divide the temporal range into K time slots. We denote by D(i, j, k) the travel demand, measured as the number of orders in grid (i, j) during time slot k. The prediction problem is to predict D(i, j, k+1) for all i and j, given the historical travel logs up to and including time slot k.
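The formulation above can be turned into supervised training pairs with a sliding window. The following is a minimal sketch, assuming the demand is stored as a NumPy array of shape (K, M, N); the function name `make_samples` and the argument `history` (the number of past frames fed to a model) are illustrative and do not come from the paper.

```python
import numpy as np

def make_samples(demand, history):
    """Build (input, target) pairs from a demand tensor of shape (K, M, N).

    Each input stacks the `history` most recent frames and the target is
    the frame of the next time slot, i.e. D(., ., k+1).
    """
    demand = np.asarray(demand)
    num_slots = demand.shape[0]
    if history < 1 or history >= num_slots:
        raise ValueError("history must be between 1 and K - 1")
    # Window ending right before slot t predicts slot t.
    inputs = np.stack([demand[t - history:t] for t in range(history, num_slots)])
    targets = demand[history:]
    return inputs, targets
```

With K time slots and a window of `history` frames, this yields K - `history` samples, each input having shape (`history`, M, N).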

IV. DATASET

A. Dataset Description
In this paper, we use a real-world ride-hailing dataset provided by Didi Chuxing, which contains the ride orders for half a year in Haikou, Hainan, China. The use of real-world transportation data has been proven important and necessary in previous studies [19]-[21].
Each ride order contains the following fields: order id, departure time, starting longitude, and starting latitude. The other fields of the raw dataset are not used in this study. In this paper, we only use the ride orders of the real-time Didi Express service, which accounts for more than 99% of the dataset. In total, we use 11,038,281 ride orders in this paper.
The spatial region of the dataset covers the longitude range from 110.2001 to 110.3999 and the latitude range from 19.9001 to 20.0957. We divide the region into 20 by 20 grids, in which each grid corresponds to an area of about 1 km by 1 km. The temporal range of the dataset is from May 1, 2017 to October 31, 2017. We use 1 hour as the time slot, which gives 4416 time slots in total. We show the order statistics for each hour on May 1, 2017 and May 2, 2017 in Fig. 1 and Fig. 2. As these figures show, the demand presents a periodic pattern, which can be further used for prediction.
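The mapping from a raw order to a grid cell and a time slot can be sketched as follows. This is only an illustration under stated assumptions: the bounds, grid size, and start date are taken from the text, while the function names, the equal-width split in degrees, and the convention that the row index grows with latitude are assumptions made here.

```python
from datetime import datetime

# Bounds, grid size, and start date as stated in the text.
LON_MIN, LON_MAX = 110.2001, 110.3999
LAT_MIN, LAT_MAX = 19.9001, 20.0957
GRID_ROWS, GRID_COLS = 20, 20
START = datetime(2017, 5, 1)

def grid_cell(lon, lat):
    """Return (row, col) of the grid containing the point, or None if outside."""
    if not (LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX):
        return None
    # Equal-width bins in degrees; points on the upper edge go to the last bin.
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID_COLS), GRID_COLS - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID_ROWS), GRID_ROWS - 1)
    return row, col

def time_slot(departure):
    """Return the 0-based hourly slot index counted from May 1, 2017, 00:00."""
    return int((departure - START).total_seconds() // 3600)
```

Under these conventions, an order departing on October 31, 2017 at 23:59 falls into slot 4415, the last of the 4416 hourly slots.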

B. Dataset Preprocessing
We aggregate the order data into a matrix of 20 by 20 spatial grids and 4416 time slots. Each element of the matrix represents the number of orders whose start location lies within the specific spatial grid and whose start time lies within the time slot. We show the distribution of the matrix element values in Fig. 3.
As Fig. 3 shows, the aggregated data follow a long-tail distribution. Before feeding the data into the different models, we apply the log transformation log(x+1), where x is the original value, and we transform the predictions back for evaluation. The distribution of the transformed element values is shown in Fig. 4.
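The aggregation and the log transformation described above can be sketched with NumPy as follows. The function names are illustrative, the natural logarithm is assumed (the text does not state the base), and the inputs are assumed to be 0-based slot, row, and column indices that have already been computed for each order.

```python
import numpy as np

def aggregate_orders(slots, rows, cols, num_slots=4416, grid=(20, 20)):
    """Count orders per (time slot, grid row, grid column)."""
    counts = np.zeros((num_slots, *grid), dtype=np.int64)
    # np.add.at accumulates correctly when the same index appears repeatedly.
    np.add.at(counts, (np.asarray(slots), np.asarray(rows), np.asarray(cols)), 1)
    return counts

def to_log(counts):
    """Forward transform log(x + 1), assuming the natural logarithm."""
    return np.log1p(counts)

def from_log(values):
    """Inverse transform exp(y) - 1, applied to predictions before evaluation."""
    return np.expm1(values)
```

Using `np.log1p` and `np.expm1` keeps the transform and its inverse numerically consistent, including for the many empty cells where the count is zero.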

V. MODELS
In this study, we propose to use a ConvLSTM model to solve the ride-hailing service prediction problem. We use the Multi-Layer Perceptron (MLP) and two simple historical methods in [18] as baselines.
The ConvLSTM model, proposed in [23], combines the convolution operation with Long Short-Term Memory (LSTM) [22]. The traditional LSTM model consists of an input gate, a forget gate, a cell, an output gate, and a hidden state. ConvLSTM is used to capture both the temporal and the spatial dependence in the dataset. Different from LSTM, the feed-forward connections between the input and the different gates are replaced with convolutions, and the state-to-state transitions are also replaced with convolutions. ConvLSTM therefore not only has the time series modeling ability of LSTM, but can also describe local features like a Convolutional Neural Network (CNN).
For the baselines, we use a simple MLP model as in [18]. The first simple historical method uses the demand value from the last hour as the prediction, and the second simple historical method uses the demand value at the same hour of the previous day as the prediction. We also use extreme gradient boosting as another baseline, which is a strong machine learning baseline and has been proven effective in previous studies [24].
In the following sections, we denote the two simple historical methods as HIST_HOUR and HIST_DAY, the MLP model as MLP, the extreme gradient boosting model as XGBoost, and the ConvLSTM model as ConvLSTM.
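For reference, the gate structure described above is commonly written as follows. This is the widely cited ConvLSTM formulation and is assumed here to match [23]; the symbols are not defined in this paper. X_t is the input frame, H_t the hidden state, C_t the cell state, * denotes convolution, and \circ the element-wise (Hadamard) product. The W_{c\cdot} terms are peephole connections, and it is not stated whether the implementation used in this study includes them.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing each convolution with a matrix product recovers the fully connected LSTM, which is why ConvLSTM keeps the temporal modeling ability of LSTM while operating on spatial grids.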
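The two historical baselines need no training and can be sketched in a few lines. The sketch assumes the demand is a NumPy array whose first axis indexes hourly time slots; the function names are illustrative. Each function returns aligned (prediction, target) arrays so that any error metric can be applied directly.

```python
import numpy as np

def hist_hour(demand):
    """HIST_HOUR: predict slot k with the observed frame of slot k - 1."""
    demand = np.asarray(demand)
    return demand[:-1], demand[1:]

def hist_day(demand, slots_per_day=24):
    """HIST_DAY: predict slot k with the observed frame of the same hour one day earlier."""
    demand = np.asarray(demand)
    return demand[:-slots_per_day], demand[slots_per_day:]
```

Note that HIST_DAY has no prediction for the first 24 slots, so a fair comparison on a test set requires that at least one day of history precede its first slot.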

VI. EXPERIMENTS
In this paper, we use Python and its package TensorFlow to implement the deep learning models. A GPU is used to accelerate the training of the deep learning models.

A. Parameter Settings
We first show the specific model structure of ConvLSTM used in this study in Table I. For the ConvLSTM and MLP models, we use the historical data from the last 6, 12, 18, or 24 hours as input frames and predict the one-hour-ahead frame, where a frame is the demand matrix of one time slot. For training, we use Adam as the optimizer with a learning rate of 1e-3. We use a batch size of 10 and train each deep learning model for 100 epochs.

B. Evaluation Metrics
We use the last 30 days of the whole dataset as the test set and the remaining data as the training set. We use the root mean squared error (RMSE) over the test set as our final evaluation metric. A lower RMSE represents a better prediction performance.
International Journal of Machine Learning and Computing, Vol. 12, No. 1, January 2022
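The split and the metric can be sketched as follows, assuming hourly slots along the first axis of a NumPy array; the function names are illustrative. With 4416 hourly slots, the last 30 days correspond to 720 test slots, which leaves 3696 slots for training.

```python
import numpy as np

def split_last_days(demand, test_days=30, slots_per_day=24):
    """Hold out the last `test_days` days as the test set."""
    test_slots = test_days * slots_per_day
    return demand[:-test_slots], demand[-test_slots:]

def rmse(y_true, y_pred):
    """Root mean squared error over all elements."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

If the models are trained on log-transformed values, the predictions should be mapped back to order counts before calling `rmse`, so that the error is measured on the original scale.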

C. Results
We show the results in Table II. As Table II shows, ConvLSTM achieves the best performance for ride-hailing service prediction in this study. The best result of ConvLSTM is achieved when the input historical length is 12 hours, while the best result of MLP is achieved when the input historical length is 24 hours. One possible explanation is that, with more input frames, ConvLSTM becomes prone to overfitting, so a longer historical length hurts the prediction performance on the test set. We also plot the ground truth and the predicted result for the first frame of the test set in Fig. 5. As indicated in Fig. 5, the predicted result is similar to the ground truth, which indicates the effectiveness of ConvLSTM.

VII. CONCLUSION
In this paper, we use deep learning techniques, i.e., ConvLSTM networks, for ride-hailing service prediction. For comparison among different models, we conduct numerical experiments on a real-world ride-hailing dataset provided by Didi Chuxing, which contains 11,038,281 ride orders from May 1, 2017 to October 31, 2017 in Haikou, the capital city of Hainan province. The experiment results show that ConvLSTM outperforms the baseline methods including Multi-Layer Perceptron and two simple historical methods.