Sampling Algorithms Combination with Machine Learning for Efficient Safe Trajectory Planning

The planning of safe trajectories in critical traffic scenarios using model-based algorithms is a very computationally intensive task. Recently proposed algorithms, namely Hybrid Augmented CL-RRT, Hybrid Augmented CL-RRT+ and GATE-ARRT+, reduce the computation time for safe trajectory planning drastically using a combination of a deep learning algorithm 3D-ConvNet with a vehicle dynamic model. An efficient embedded implementation of these algorithms is required as the vehicle on-board micro-controller resources are limited. This work proposes methodologies for replacing the computationally intensive modules of these trajectory planning algorithms using different efficient machine learning and analytical methods. The required computational resources are measured by downloading and running the algorithms on various hardware platforms. The results show significant reduction in computational resources and the potential of proposed algorithms to run in real time. Also, alternative architectures for 3D-ConvNet are presented for further reduction of required computational resources.


I. INTRODUCTION
Active safety systems for collision avoidance have a huge potential for increasing the road-traffic safety and are necessary components of vehicles for autonomous driving. Since the driverless car competitions DARPA Grand Challenge [1] in 2005 and DARPA Urban Challenge [2] in 2007, automotive industry is broadly engaged in the development of advanced driver assistance systems having varying degrees of autonomous driving. Autonomous driving comprises of many tasks such as perception of the environment, localization and mapping, motion planning, vehicle control, etc. One of the most difficult problems for realizing fully autonomous driving is the safe trajectory planning in complex, critical traffic-scenarios with multiple static and dynamic objects with the simultaneous intervention in both the lateral and longitudinal dynamics of the vehicle. Such trajectory planning algorithms need to be efficient in terms of required computational resources, both time and memory, as they are limited by the on-board vehicle Manuscript

micro-controllers.
Trajectory planning is a already very widely researched area in the robotics community. Many deterministic [3]- [5] and probabilistic trajectory planning approaches [6], [7] have already been proposed and a survey can be found in [8]. A probabilistic sampling algorithm called Rapidly-exploring Random Tree (RRT) algorithm [7] is very popular because of its fast runtimes and ability for planning the trajectory with nonholonomic constraints of the vehicle without discretizing the state-space. Several variants of RRT algorithms have been proposed but only few variants of the RRT algorithm [9], [10] claim to find a safe trajectory with simultaneous intervention in the lateral and the longitudinal dynamics of the vehicle in real-time. CL-RRT [9] was the motion planning algorithm for Team MIT at the 2007 DARPA Urban challenge and due to high computational resources requirements it was executed on a high-end 'Car-PC'. Also the algorithm in [10] needs to precompute and store many safe states in order to assure real-time capability.
The computation time for an RRT algorithm increases with the complexity of a scenario, i. e., with the increase in the number of objects and constraints arising due the road infrastructure. The RRT algorithm is probabilistically complete. It means that it will find a solution, if it exists, given a sufficient computation time. The key for increasing the convergence speed is to find an appropriate heuristic based on the application. There is a long history of using heuristics to refine the RRT search [11]- [17]. All of these algorithms use some heuristic for the biased-sampling to increase the convergence speed of finding an optimal solution, but they all need to find at least an initial approximate solution before they can use the biased-sampling.
Machine learning algorithms especially Deep Neural Networks (DNN) can be used to find the solutions for complex problems with short inference time. Since they are purely data-based methods they are seen as black-box methods. Therefore, they are not used in safety critical applications like vehicle trajectory planning. A learned Gaussian Mixture Models distribution is used for the biased-sampling in learned free spaces in [18] to decrease the number of collision checks drastically for the trajectory planning with the RRT algorithm. Biased sampling in free spaces might increase the probability of finding collision-free states, but it is not necessarily a good sampling strategy for the long term trajectory planning which requires a long chain of collision-free states. In an another approach [19], a conditional variational autoencoder is used to generate biased samples in space from a learned sampling distribution on the specific planning problem to increase the convergence rate of the RRT algorithm. However, the approach for simultaneous biased-sampling in the lateral and longitudinal dynamics of the vehicle, is missing.
The use of hybrid machine learning algorithms, a combination of machine learning algorithms and model-based search algorithms, opens a new way of using machine learning algorithms in safety critical applications. AlphaGo [20] and ExIT [21] are two examples of guided tree search algorithms with neural networks for the board games Go and Hex, respectively. But, these algorithms are limited to discrete state-spaces and action-spaces. The Hybrid Augmented CL-RRT (HARRT) [22] and the Hybrid Augmented CL-RRT+ (HARRT+) [23] and GATE-ARRT+ [24] are examples of hybrid machine learning algorithms for long term safe trajectory planning in complex, critical traffic scenarios which use 3D convolutional neural networks (3D-ConvNets) [25], a type of DNN, to assist RRT variants the Augmented CL-RRT (ARRT) [26] and the Augmented CL-RRT+ (ARRT+) [23] to reduce the computation time. These algorithms are described briefly in Section III.
The high accuracy of DNNs comes at the cost of high computational complexity and therefore computational engines such as GPU, ASIC, FPGA are used for applications with DNN implementation. Therefore, even if the hybrid machine learning algorithms proposed in [22], [23] and [24] reduce the time required by the model-based search algorithm, the total time and memory requirements are still high for the implementation on an automotive micro-controller. In this work the required computational resources for the implementation of the HARRT algorithm are evaluated on various hardware platforms. Additionally, the modules of the HARRT and HARRT+ algorithm requiring high computation time and memory are identified and alternative machine learning and model-based algorithms for an efficient embedded-implementation are proposed. The results obtained with these changes that lead to embedded-implementation of the HARRT algorithm are summarized in Section VII. The same methods can be used with the HARRT+ and GATE-ARRT+ algorithm to reduce the required computational resources.
3D-ConvNet is chosen as machine learning algorithm for learning spatiotemporal features. This algorithm is very computationally expensive because of 3D convolution and 3D pooling operations. Therefore, alternative architectures suitable for learning spatiotemporal features to replace 3D-ConvNet are presented which also required less computational resources. This paper is organised as follows: Section II describes the functional and embedded implementation requirements for vehicle trajectory planning in critical traffic-scenarios, having the goal of collision avoidance/mitigation. Section III illustrates the model-based trajectory planning algorithms ARRT and ARRT+ and their corresponding hybrid machine learning trajectory planning algorithms HARRT, HARRT+ and GATE-ARRT+. Section IV and V introduce different machine learning and analytical methods for the optimization of the computation time and memory, respectively. These approaches are exemplarily explained with the HARRT algorithm but they are applicable to HARRT+ and GATE-ARRT+ algorithms as well. Section VI presents different alternative architectures for learning spatiotemporal features. Finally, Section VII presents the results of the measurements of computational resources required in different hardware platforms along with the comparison of alternative architectures.
Throughout this work, matrices are denoted by upper case bold letters, and vectors are denoted by lower case bold letters.

II. VEHICLE TRAJECTORY PLANNING REQUIREMENTS
In this Section, the requirements for the safe trajectory planning in critical dynamic traffic-scenarios are formulated. These requirements are divided into functional and embedded implementation requirements.
Functional requirements describe what a system should do with regard to the desired functionality. The functional requirements for trajectory planning in vehicle safety applications are as follows: • Criticality estimation of the traffic-scenario: In a first step, a traffic situation must be categorized as "critical" for the EGO-vehicle. The EGO-vehicle is the vehicle in which the trajectory planning algorithm is running.
• Computation of collision-free trajectories: In the second step, collision-free trajectories for the EGO vehicle must be computed. Collision-free trajectories are the path followed by a vehicle as a function of time in the time interval (e. g. 00 [ , ] tt   ) without a collision. EGO vehicle can follow such trajectory with a simultaneous intervention in both the lateral and longitudinal dynamics. This is because the trajectories resulting alone from full braking, as the ones that can be realized by an autonomous emergency braking system, are not enough to avoid collisions in many traffic scenarios.
• Prediction of the severity of the injury: For unavoidable collisions or when a collision-free trajectory is not found by search algorithms, a trajectory with low severity of collision must be computed and followed for the collision mitigation.
• Selection of best trajectory: The best trajectory should be selected from all the computed trajectories based on the criteria that reflects the aspects safety (e. g. collision-free), severity of injury (e. g. low collision speed), comfort (low accelerations during the maneuver), etc. The overall safety algorithm should also include a strategy when no trajectory that is collision-free or with a nonsevere collision is found.
These functional requirements are fullfilled with the analytical search algorithms ARRT and ARRT+.
The embedded implementation requirements describe the properties the algorithms should have to implement them in vehicles for safety applications. They are summarized below: • Efficiency: The trajectory planning algorithms should be efficient so that they should run in real time on an automotive micro-controller.
• Interpretability: The requirement of interpretability arises, especially when machine learning methods are used in safety-critical applications.
• No dynamic memory allocation: For embedded applications especially safety critical applications like trajectory planning in complex traffic scenarios, a dynamic memory allocation should not be allowed [27], because many coding errors stem from mishandling of memory allocation like attempting to allocate more memory than physically available, forgetting to free memory or continuing to use memory after it was free, etc.

III. HYBRID MACHINE LEARNING ALGORITHMS FOR VEHICLE TRAJECTORY PLANNING
In this section, the analytical trajectory planning algorithms the Augmented CL-RRT (ARRT) and Augmented CL-RRT+ (ARRT+) and their combinations with machine learning algorithms for faster convergence are described.

A. Criteria for Traffic-Scenario Criticality
A traffic scenario is considered to be critical for the EGO-vehicle at time instance 0 t if a collision will occur in the next   seconds given the assumption that the EGO-vehicle and other dynamic objects continue to follow the lane and pedestrians travel linearly with their velocity at time 0 t in the prediction interval 00 [ , ].
The trajectory planning algorithm only runs when a critical scenario is detected. The value of   is taken as 2 seconds in this work.

B. Constructing a Tree
The ARRT and ARRT+ algorithm find safe trajectories in critical traffic-scenarios with a simultaneous intervention in the lateral and longitudinal dynamics of the vehicle. They consider vehicle nonlinear dynamics for trajectory planning in the form ( is the control input and () t s is the area occupied by the EGO vehicle at time t which is the subspace of 2 . In an iterative process, these algorithms construct a tree with multiple safe states () t s . Throughout this paper, the term safe used in context of states and trajectories means either collision-free or with the predicted nonsevere collision.
In every iteration, a random point rand s is sampled with some bias towards a goal region goal S . The state () nearest t s which are previously stored in the tree nearest to rand s is found. The tree is extended by an incremental motion towards rand s from () nearest t s . The incremental extension is performed for the time interval t  using differential constraints f as in Eq.
(1) to get the new state () new tt  s . The inputs () t u here define inputs for both the lateral and longitudinal dynamics intervention. The two-track model is used as a constraint f while extending the tree, therefore the control input is defined as at state () t s is between the EGO vehicle and the collision object while c v is the critical impact speed defined according to the proposed generalized relationship between the impact speed, the impact angle, the type of collision object and the probability of severity of injury as in [28]. Finding states with a predicted nonsevere collision makes it possible to choose a trajectory with a low crash severity in case a collision-free trajectory is not found. These states with predicted non severe collision are not extended further as they are not collision-free states. Similarly, the indicator function road I at time t is where, nr S  is the area not belonging to the road. The ( ), tt  state index, parent index for tracing back the trajectory, information related to the predicted severity of injury for each state is added to the tree . This is possible due to the use of the two-track vehicle dynamic model while planning trajectories.

D. Selection of Best Trajectory
The trajectory planning with the ARRT and ARRT+ algorithm is divided into two subintervals of a long time interval 00 [ , ] tt   . In the first subinterval 0 0 tt   from the end position of the trajectories found in first subinterval is predicted. This is named as steering effort and is taken as one of the parameter while choosing the final trajectory. The selection of best trajectory is done considering the parameters in preference order safety, steering effort, and acceleration values. If no collision-free trajectory is found then the trajectories with a collision are compared solely based on the predicted severity of injury and the one with lowest severity of injury is selected as the best trajectory for that scenario. If collision-free trajectories are found then only those are considered for further comparison. Among these trajectories, the trajectories with high steering effort above the defined threshold are eliminated or if there is none below the defined threshold then the one with lowest steering effort is selected as a label. Finally, if still multiple trajectories are remaining then the one with the lowest maximum absolute acceleration value along the trajectory is selected as the best trajectory.
If no trajectory without a collision or predicted low severity of collision is found, then the vehicle breaks sharply as fail-safe case. The critical traffic-scenario described by a sequence of predicted occupancy grids is used as an input for the machine learning algorithm as its size remains the same irrespective of the road infrastructure, type and the number of road-traffic participants in the vehicle environment. Due to the 3D structure of the input, the 3D-ConvNet is used as a machine learning algorithm. The architecture of the 3D-ConvNet used for the HARRT algorithms is as shown in the Fig. 1.

E. HARRT, HARRT+ and GATE-ARRT+ Algorithms
The labels for training the 3D-ConvNet is generated from simulations. In a simulation environment developed in Matlab, many critical traffic-scenarios are simulated and best trajectories are found by the ARRT and ARRT+ algorithm. The label for best trajectory found by the ARRT algorithm is the corresponding predefined acceleration profile used. The HARRT algorithm uses only few predicted acceleration profiles, instead of all, for finding the safe trajectory which increases the convergence speed of the ARRT algorithm. In case of the ARRT+ algorithm, the best trajectory does not have any fixed acceleration profile because of the random sampling of the longitudinal acceleration. Therefore, the clusters of the steering wheel angle profile and longitudinal acceleration profile are formed and the combination of these clusters in which the best trajectory for a scenario lies is considered as a label for that scenario. The HARRT+ algorithm uses the predicted clusters for the biased-sampling in both the lateral and longitudinal dynamics of the algorithm to increase the convergence speed of the algorithm.
GATE-ARRT+ algorithm distiguishes itself from the HARRT and HARRT+ in the usage of the 3D-ConvNet for regression instead of classification. Initially, a variational autoencoder is trained on trajectories. The encoder part of this variational autoencoder converts the safe trajectories found in every scenario into latent space values which then are used as labels for training the 3D-ConvNet. For testing, the predicted values by 3D-ConvNet are converted into trajectories by the decoder part of trained variational autoencoder. This trajectory is used as reference trajectory for International Journal of Machine Learning and Computing, Vol. 11, No. 1, January 2021 biasing the sampling with the ARRT+ algorithm, therefore the the name of the algorithm is GATE-ARRT+.

IV. MACHINE LEARNING ALGORITHMS FOR REDUCING COMPUTATION TIME
In the HARRT algorithm, 3D-ConvNet is used only for assisting the ARRT algorithm, but the actual safe trajectory planning is done by model-based methods which are interpretable and so suitable for safety-critical applications. As explained in Section III, different computationally intensive analytical algorithms such as collision detection and calculation of the severity of injury, road I and obj I , along with a vehicle dynamic model are used in each prediction time-step to generate a tree with many branches, i. e., many safe trajectories as shown in the Fig. 2. As only one safe trajectory (e. g. red) is selected for the vehicle to follow, the computation time required for finding other branches is wasted. Also, the number of times the indicator functions are calculated increases with the number of objects in the surrounding of the EGO vehicle and the number of time steps required to find the final trajectory. This number is in thousands for finding a collision-free trajectory of few seconds. Therefore, machine learning algorithms  . Going outside of the road is considered as colliding with a stationary object, therefore, the EGO velocity is considered as the relative collision velocity for calculating the severity of injury. Fig. 3

B. Data Generation
The training samples for The machine learning algorithm 1 1 () fx  performs the collision detection with each object in sequence while 2 2 () fx  is trained with all types of road infrastructures. Therefore, they continue to adhere to the HARRT and HARRT+ algorithms principle of using one algorithm for all types of traffic-scenarios irrespective of the number and type of traffic participants. As mentioned in Section II, a dynamic memory allocation should be avoided for safety critical applications like safe trajectory planning. The memory required in static RAM (SRAM) for 3D-ConvNets having an architecture as shown in the Fig. 1 will be very high if memory is preallocated for the input M , all weights W for the convolution operations and fully-connected layers, feature maps V in all the convolution and pooling layers. The DNN processing in the cloud is not desirable due to latency, security, and communication bandwidth concerns. Therefore, different analytical algorithms are proposed to reduce the memory requirements for different parts of 3D-ConvNets as well as the ARRT algorithm. Sections V.A, V.B, V.C present the methodologies for required memory reduction for 3D-ConvNets and Section V.D is the methodology for reducing the required memory for generated trees using the ARRT and ARRT+ algorithm.

A. Iterative ConvNet Feature Generation
A 3D-ConvNet architecture comprises of multiple 3D convolutions and 3D pooling layers generating multiple output feature maps as shown in the Fig. 1. In a 3D convolution layer, the value of output unit xyt k V at position ( , ) xy and time t in th k out of K output feature maps is obtained by 1 11 where, ,, ) ,, PQ  and R  are the dimensions of slice of the input over which pooling is performed. It shows that the number of output feature maps in the pooling layer is equal to the number of input feature maps. As can be seen in Fig. 2 , generating in total 64 output feature maps for C2 ass well as for P2 layer. Preallocation of memory for all of these feature maps and kernels will require a huge amount of memory which is not available in automotive micro-controllers. The convNet features for FC layer in 3D-ConvNet are obtained just by the vectorization of the output feature maps of the layer P2. Therefore, the calculation of all feature maps in P2 layer at once for the inference is not necessary. Instead they can be calculated iteratively and appended to get ConvNet features for the FC layer. Fig.  4  W , respectively. These iteratively generated output feature maps can be vectorized into a vector of size 12 and appended continuously to get the ConvNet features for the FC layer. If parallel processing is used to speed the features computation, the buffers equal to the number of parallel processors can be created which will help increase the computation speed without assigning memory to all feature maps. In this work no parallel processing is used. Table I shows the difference between the iterative and non-iterative ConvNet features generation in terms of the required number of parameters to be stored, thereby the size of memory required (considering float datatype with memory required for 1 parameter is 4 bytes) in convolution and pooling layers of the 3D-ConvNet.

B. Iterative Convolution of the Input M
As explained in Section III, the input M is the sequence of occupancy grids having values either 0 or 1 for each cell. Therefore, its data type can be defined as 'unsigned int'. But the kernels used for convolution operation have data type 'float' because of which the input must also be 'float'. Even if the input is defined as 'unsigned int', the compiler changes the datatype to float before performing convolution operation. As the input contains 15000 (dimention of ) parameters in total, this results into huge memory consumption. Each unit of the output feature map in the C1 layer of 3D-ConvNet is calculated iteratively with a kernel convolving over a part of the input equal to the dimension of a kernel . This offers a possibility to define the input as 'unsigned int' and only copy part of the input into the buffer of dimension equal to kernel with data type 'float'. Therefore, instead of using 15000 'float' parameters, 15000 'unsigned int' parameters and a buffer of dimension equal to kernel with data type 'float' are used. This iterative convolution of the input , shown in the part of the 3D-ConvNet with magenta box in Fig. 4, results into a saving of memory equal to 43.66 kilobytes.

C. Fully-Connected Layer in Compressed Sparse Column Format
Most number of learned parameters of 3D-ConvNet are in the fully-connected layer FC. The 768 ConvNet features are generated in sequence because of the use of iterative ConvNet features generation explained in Section V.A. At every iteration only 12 ConvNet features are generated which can be appended continuously to previously found ConvNet features. The weights FC W having dimension of 22 769  (16918 parameters) are multiplied with a vector of dimension 769 1  , consisting of the generated ConvNet features and EGO velocity, to get a vector of dimension 22 1  followed by addition of 22 bias parameters as shown in the Fig. 5. This is further used as an input for a softmax function to generate 22 class scores. Thus, the total number of parameter in FC layer are 16940. Although these parameters are stored in FLASH memory of the micro-controller, they are fetched to SRAM during the calculation. Fetching all 16940 parameters at once consumes a lot of memory in SRAM. Therefore, the multiplication of FC W and ConvNet features is done in a Compressed Sparse Column (CSC) format [29] as shown in Fig. 6. features. This also reduces the required number of parameters by 756. Thus, the usage of the CSC format multiplication in FC layer saves the memory equivalent to 17652 parameters having data type 'float', i. e., 68.95 kilobytes.

D. Storage of Only One State with Low Severity of Injury
In the HARRT algorithm, the found states that are not collision-free, are still added to the tree, if the predicted severity of injury at these states is low. It provides an option of following a trajectory with predicted low severity of collision, when a collision-free trajectory is not found at the International Journal of Machine Learning and Computing, Vol. 11, No. 1, January 2021 cost of increasing the number of states to be stored in the tree, thereby increasing the required SRAM memory. This is especially because there is a lot of information saved along with these states as described in Section III. The number of states to be stored has to be predefined in order to preallocate the memory for these states. Storing each state with the predicted low severity of collision will fill this memory quickly and the algorithm will not be able to find and store more collision-free states, even if the computation time is available. To avoid this, only one state with the predicted low severity of collision is stored in the tree . If a new state with even lower predicted severity of injury is found, it just replaces the previously stored state.

VI. ALTERNATIVE ARCHITECTURES
In HARRT+ algorithm, 3D-ConvNet is chosen for extracting features from the input M as it is a suitable model for extracting the required spatio-temporal features. However, it requires a high memory consumption. Section V has proposed few methodologies for reducing the required computational memory. An alternative approach to reduce the required computational memory is to choose a different computationally efficient network architecture which also learns the spatio-temporal features from the input M . In 3D ConvNet, the 3D feature maps are generated by using 3D filters having smaller depth dimension than its input depth. These feature maps are not appended like in 2D ConvNet. In fact, the second convolutional layer performs convolution on each generated 3D feature map individually. This is shown in Fig. 1. This might affect the performance of the network as the spatial feature correlation in different feature maps at particular time step is not learned in intermediate layers. Only in the fully connected layer, all the extracted features are combined. However, the fully connected layer surrenders all the spatio-temporal correlation information in the input. Therefore, an analysis of other possible architectures is needed. In addition, 3D-ConvNet involves high number of computational operations compared to 2D-ConvNet. This section proposes alternative architectures to solve the problem of high computational resources requirements while not surrendering the spatio-temporal feature correlation.
Recurrent convolutional neural network (Recurrent-CNN) [30], as shown in Fig. 8, is a natural selection for learning spatio-temporal features because of its suitable architecture.
The weights xh W represent weights of 2D-ConvNet which will learn the relevant spatial features from input occupancy grids while the connection of latent spaces using the weights hh W will learn the temporal correlation of these spatial features. RNNs share the weights at each step and the extraction of features from inputs at different time-steps can be parallelized which makes them highly computationally efficient. Another advantage for RNNs is that they don't need fixed size of the input. As the number of time-steps, here 10, are small, the LSTM or GRU network, which eliminate the problem of vanishing and exploding gradients, are not suitable. Also, because the input M have the fixed size, a complicated architecture such as RNN is not necessary. Another way is to extract initially only spatial features from individual occupancy grids of M using simple 2D-ConvNet followed by a fully-connected layer. The weights of 2D-ConvNet can be shared similar to the RNN architecture. This is achieved by initially training an autoencoder with indiviual occupancy grids of all data D as input as shown in Fig. 7. The encoder , is used as a common 2D-Convnet. The problem with this methodology is that although spatial feature correlation is learned by the 2D-ConvNet, the temporal features correlation is surrendered in fully-connected layer. This problem can be solved by using a Temporal Convolutional Networks (TCN) [31] architectures. TCN is a family of networks for convolutional sequence prediction. A temporal convolution using 1D filters is applied in [31] to capture how the input signals evolve over the course of an action for video-based action recognition. Dilated causal convolutions are used in [32] for raw audio generation to learn long range temporal dependencies. A convolutional encoder network with soft attention followed by LSTM decoder is used in [33] for language translation. A gated convolutional network is used for language modeling in [34]. All these approaches are for long time series. As the time-sequence in given problem is fixed and short, the 1D-TCN followed by fully-connected layer is used as shown in Fig. 9.

VII. RESULTS AND DISCUSSION
A. Accuracy for 1 1 () fx  and 2 2 () fx  Using the data generation procedure explained in Section IV, 2 million and 1 million training samples are generated for the machine learning functions 1 1 () fx  and 2 2 ( ), fx  respectively. For both algorithms, neural networks with one hidden layer are used for training. 70% of the total data was applied for training and 15% of the data for both validation and testing. The results of training with different numbers of neurons in the hidden layer for both algorithms is shown in Table II. A neural network with 20 neurons in the hidden layer is selected for implementation as it gives best results with the validation and training data of both machine learning algorithms as shown in confusion matrices in Fig. 10 and Fig.  11. The class labels 1, 2, 3 defined in Fig. 10 and Fig. 11 are for no collision, nonsevere collision, and severe collision, respectively. A Matlab neural network toolbox [35] is used to to train the neural networks. The cross-entropy loss function is used as loss function for training the neural networks and all the layers use ReLU activation function.

B. Implementation and Measurement Results for Various Hardware Platforms
For the implementation of the algorithms a TMS570LS series Texas Instrument micro-controller TMS570LS20216, the dSPACE MicroAutobox II and a Raspberry Pi 3 are used. TMS570LS20216 is an automotive micro-controller certified for the use in IEC 61508 SIL3 safety systems. It has a ARM Cortex-R4F floating point CPU which runs with speed of 160 MHz and have 2 MB and 160 KB of FLASH and SRAM size, respectively. The small amount of SRAM available in the TMS570LS20216 micro-controller indicates the significance of the amount of memory reduction obtained by methodologies that has been described in Section V. The dSPACE MicroAutobox II is a real-time system for performing fast function prototyping. It runs with the speed of 900 MHz and 16 MB RAM while the Raspberry Pi 3 has 1.2 GHz frequency and 1 GB RAM. The results for the mex-implementations in Matlab are also presented. The computer used for the mex-implementations measurements has an Intel Core-i7 processor with 2.8 GHz and 8GB RAM.
The Simulink coder is used to generate the optimized C-code of the Matlab implementation of the ARRT, HARRT and optimized HARRT algorithms with the machine learning and analytical methods proposed in Sections IV and V. This generated code is downloaded to TMS570LS20216, dSPACE MicroAutobox II, Raspberry Pi 3 and the required amount of memory (SRAM) and time for execution is International Journal of Machine Learning and Computing, Vol. 11, No. 1, January 2021 measured. The results are summarized in Table III. The first two rows show the required computational resources for the ARRT and HARRT algorithm without the methodologies proposed in this paper. Subsequent rows shows the gradual reduction in computational resource with each methodology. The results show that the computation time for the mex-implementation and dSPACE MicroAutobox is very low, but the computation time for the automotive micro-controller TMS570LS20216 and raspberry pi is still quite high. A solution to this result is to use parallel processing which can lead to significant reduction in computation time. The proposed algorithm allows the parallelization of many modules, e. g., predictions of road objects, 3D-ConvNet as well as RRT algorithms. However, parallelization is not the main scope of this paper and will be considered in further work. It can be seen from Table III that the proposed techniques lead to a significant reduction of computational resources and show that there is potential to use the proposed algorithms for trajectory planning in real time. C. Alternative Architecture Comparison 42191 scenarios for curved roads from [22] were used to train the models with the architectures defined in Section VI. Table IV provides the comparison between all the presented networks in this section based on accuracy in terms of the top-2 error and the number of parameters in the network. A top-2 error tests if the reference class was within 2 hypothesis with highest probability. There is no significant difference in the number of parameters in all networks, however 1D-TCN network provides best accuracy on the test data.

VIII. CONCLUSION
This work describes combination of model-based search algorithms for safe trajectory planning in critical traffic-scenarios, with simultaneous intervention in the vehicle lateral and longitudinal dynamics, along with their combination with deep learning algorithm. The motivation for this hybrid approach is a much faster convergence of the model-based algorithms. In order to make these algorithms suitable for embedded implementation further machine learning and analytical approaches have been proposed. The results of the optimization and embedded implementation, i. e., the memory and time required in different hardware platforms is presented. The results show multiple times reduction in both memory and time requirements. Also, different alternative architectures for learning spatiotemporal features are presented and comapred in this work.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Amit Chaulwar developed methodologies for efficient implementation of hybrid machine learning algorithms. He also tested alternative architecture for learning spatio-temporal features. Hussein Al-Hashimi optimized and implemented the algorithms in various hardware platforms and measured the required computational resources. Michael Botsch and Wolfgang Utschick have developed the hybrid machine learning algorithms for the presented application together with Amit Chaulwar. All authors had approved the final version.