In recent years, the rapid growth of vehicles has imposed a significant burden on urban road resources. To alleviate urban traffic congestion in intelligent transportation systems (ITS), real-time and accurate traffic flow prediction has emerged as an effective approach. However, selecting relevant parameters from traffic flow information and adjusting hyperparameters in intelligent algorithms to achieve high prediction accuracy is a time-consuming process, posing practical challenges in dynamically changing traffic conditions. To address these challenges, this paper introduces a novel prediction architecture called Multiple Variables Heuristic Selection Long Short-Term Memory (MVHS-LSTM). The key innovation lies in its ability to select informative parameters, eliminating unnecessary factors to reduce computational costs while achieving a balance between prediction performance and computing efficiency. The MVHS-LSTM model employs the Ordinary Least Squares (OLS) method to intelligently reduce factors and optimize cost efficiency. Additionally, it dynamically selects hyperparameters through a heuristic iteration process involving epoch, learning rate, and window length, ensuring adaptability and improved accuracy. Extensive simulations were conducted using real traffic flow data from Shanghai to evaluate the enhanced performance of MVHS-LSTM. The prediction results were compared with those of the ARIMA, SVM, and PSO-LSTM models, demonstrating the innovative capabilities and advantages of the proposed model.
Keywords: traffic prediction; heuristic selection; LSTM; deep learning
With the growing issue of severe urban road congestion on a global scale, the prediction of traffic congestion has become an urgent problem that requires swift and effective solutions. Each year, the Dutch company TomTom releases a notable report on global traffic congestion levels. The data consistently show an upward trend in traffic congestion around the world over the past decade.
Deep learning models are commonly used for traffic flow prediction. Typically, the process involves preprocessing the data, training the model with historical data, optimizing hyperparameters for the best evaluation results, and finally using the latest data to make predictions. Given that the Long Short-Term Memory (LSTM) model integrates the characteristics of both long-term and short-term data for prediction, it is highly suitable for traffic flow prediction scenarios. Currently, the process of using LSTM for traffic flow prediction still has a few limitations. One is that the presence of excessive and irrelevant data in the input can increase the cost of training and learning: the model needs to process and analyze a large amount of unnecessary information, which is time-consuming and inefficient. When training the model, if there are too many input variables, the computation increases accordingly, lengthening the forward and backward propagation in each layer. Especially in the fully connected layer, this may cause the model to require more memory to store parameters and intermediate calculation results, thereby affecting training speed.
The contributions of this paper are as follows:
- Firstly, the paper highlights our approach to the rational dimensionality reduction of traffic flow data, which helps in reducing the computational complexity [3]. By combining relevant traffic flow parameters and considering realistic road factors, our proposed method effectively integrates and processes the data, resulting in a more efficient computation process. Additionally, the presented closed formulae capture the relationships among traffic flow parameters, further enhancing the data processing approach [4].
- Secondly, three heuristic iterative models are introduced to enhance the MVHS-LSTM model, enabling a more convenient analysis of the impact of various factors such as the number of hidden layer nodes, window length, learning rate, iteration times, and the proportion of training and test sets on model error and computational time. These improvements establish a clear relationship between model error and time complexity, allowing an equilibrium point to be identified based on the observed tendency [5].
- Finally, extensive simulations are conducted to validate the effectiveness and accuracy of the proposed model. By substituting the optimal solution into the MVHS-LSTM model and comparing its performance with other algorithms, the results demonstrate that the proposed algorithm achieves a balance between the computing time and the model error. The simulations not only significantly reduce the computing time but also provide a theoretical foundation for data processing methods.
The remainder of the paper is organized as follows. Section 2 discusses the related works of the traffic flow prediction field. Section 3 introduces the system model, including the traffic data preprocessing model, LSTM model and Cost–Loss balanced model. Section 4 presents the algorithm design. Performance evaluation is presented in Section 5. Finally, Section 6 summarizes this paper.
Traffic flow prediction is becoming increasingly important in intelligent transportation systems (ITS). In recent years, researchers have used a variety of algorithms to improve the performance of traffic flow prediction, which fall into two main categories. The first category comprises prediction models based on traditional mathematical and physical methods such as mathematical statistics and calculus, e.g., the Kalman filter and the Auto Regressive Integrated Moving Average (ARIMA) model. Considering the effect on realistic traffic flow data, methods in the other category focus on neural networks or deep learning technologies, which improve performance via training and iteration, e.g., the Support Vector Machine (SVM) model, nonparametric regression models, Bayesian models, neural network models and hybrid neural network models.
Within the first category of methods, Li et al. used ARIMA and the Grey Prediction Model to predict the passenger flow data of a subway entrance. It is concluded that the ARIMA model is more suitable for obtaining formulas from complex training sets, which leads it to obtain more accurate prediction data.
Within the second category of methods, this paper selects several representative examples for illustration. In order to solve the problem of urban traffic congestion, Feng et al. estimated and predicted the traffic state based on an adaptive multi-kernel Support Vector Machine (AMSVM). The Gaussian kernel and the polynomial kernel were combined to form AMSVM and to analyze the nonlinearity and randomness of traffic flow with spatio-temporal information. The parameters of AMSVM were optimized by the adaptive particle swarm optimization algorithm, and a new method for adaptively adjusting the mixed kernel weight according to the trend of real-time traffic flow was proposed.
After meticulously analyzing the existing literature, it becomes evident that, while numerous studies have successfully enhanced prediction accuracy, several unresolved issues are seldom discussed. Firstly, there is a pressing need to take into account the intricate interdependence among traffic flow data. In the prediction process, it is not always necessary to feed the entire dataset into the model for learning; doing so can lead to an unwarranted increase in the learning burden, thereby compromising the efficiency of the prediction system. Furthermore, solely prioritizing accuracy without giving due consideration to learning time is inadequate for real-time traffic flow prediction scenarios, as it fails to align with the inherent real-time nature and the rapid, dynamic changes characteristic of traffic flow. It is this lack of a balanced approach, accounting for both accuracy and learning time, that hinders the development of truly effective real-time traffic flow prediction systems.
Therefore, the paper aims to address the unresolved issues in traffic flow prediction by proposing a comprehensive approach that takes into account both accuracy and learning time. Our primary focus is on the rational dimensionality reduction of traffic flow data, which aims to reduce computational complexity and enhance prediction efficiency. By combining relevant traffic flow parameters and considering realistic road factors, our method effectively integrates and processes the data, leading to more accurate and responsive predictions. Furthermore, we recognize the need to strike a balance between accuracy and learning time, which is crucial for real-time traffic flow prediction scenarios. To achieve this, we introduce heuristic iterative models that complement the MVHS-LSTM model, enabling a convenient analysis of the impact of various factors on model error and computational time.
This section presents a system framework of traffic flow real-time prediction and proposes a traffic data preprocessing model, LSTM model and Cost–Loss balanced model. The quantitative analysis of the factors that can affect the prediction results in the model is also discussed in this section. The notations used in this section are described in Table 1.
In the System Architecture section, we delve into the logical relationships among these sub-models, aiming to achieve dynamic traffic flow prediction. As shown in Figure 1, the system contains three processes during traffic flow prediction, i.e., traffic flow analysis based on real-time information acquisition, offline analysis to determine the training factors, and online prediction of traffic flow. We assume that the traffic flow data are obtained via V2V and V2R communication in VANETs.
The real-time traffic flow information is obtained by cameras mounted on the roadside or flowmeters directly connected to the RSU. The RSUs at road intersections contain not only a cache to store and relay the traffic data but also a tiny server to complete calculation and training tasks. Each RSU in the distributed system acquires and trains traffic flow data and hyperparameters for the road segments it covers and then uploads the results, completing the task in the offline sub-systems. The RSU uploads the results to the nearest cellular base station, which transmits the message to the offline system in the vehicle traffic server to complete the parameter setting of the MVHS-LSTM on the road. When a road segment has a prediction requirement, based on the collected information, the online system in the vehicle traffic server can provide global traffic flow prediction for the road. The paper considers the internal spatial correlation within road segments by incorporating their spatial properties (such as lane number, inter-vehicle distance and vehicle speed) into the matrix Q.
As shown in Figure 1, our system consists of three subsystems:
(a) Traffic flow analysis system: The acquired traffic flow is multi-dimensional data that contain many variables, i.e., traffic flow data, vehicle spacing, average speed, map data (latitude and longitude), road information, travel cost and so on. It is difficult to distinguish which variables are needed in prediction. Therefore, this paper proposes a method to decrease unnecessary variables in most traffic flow data. Considering the correlations among the factors, the subsystem selects the optimal variables as the input for traffic flow data prediction heuristically.
(b) Offline system: In order to reduce the prediction computing cost, the system involves an offline system to analyze the MVHS-LSTM parameters. According to the historical traffic flow data, the offline system can find the parameters suitable by using the MVHS-LSTM model. We fine-tune multiple hyperparameters for the LSTM model, such as the window length, number of iterations, learning rate and so on. These adjustments are made to ensure optimal values in the offline system. The MVHS-LSTM model combines with these parameters and achieves a balance between the error and the time of the traffic flow prediction. In practice, the system determines the distribution of various dynamics based on users' demand for prediction accuracy and prediction cost.
(c) Online system: The online system imports the calculated parameters from the offline system and outputs the data that are predicted once. The system continuously circulates and consistently outputs predicted data throughout the process.
The obtained traffic flow data will be filtered through the method in this paper to screen out the variables that are beneficial to the prediction results. The historical data are put into the offline system for analysis, and then, the real-time prediction is carried out through the online system. The proposed architecture takes account of both prediction accuracy and computing time via the cooperation of the offline system and the online system.
In order to reduce the computational burden caused by abundant variables, the model first needs to filter the variables. According to whether a variable has a beneficial effect on forecasting traffic flow data, the model discards the useless parameters and retains the effective variables.
We assume a matrix Q as the representation of the collected traffic flow data, which is denoted as follows:
$$Q = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1m} \\ q_{21} & q_{22} & \cdots & q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ q_{n1} & q_{n2} & \cdots & q_{nm} \end{pmatrix}$$
Here, the columns of Q denote the characters of the observed traffic flow data, e.g., the real-time speed, the number of lanes, etc. The number of collected characters is $m$, and each of the $n$ rows corresponds to one observation.
So as to find the conclusive variables, the paper chooses the Multiple Regression Analysis method, which models the traffic flow as a linear combination of the candidate characters:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \epsilon$$
Here, $y$ denotes the dependent variable (the traffic flow to be predicted), $x_1, \dots, x_m$ are the candidate characters, $\beta_0, \dots, \beta_m$ are the regression coefficients, and $\epsilon$ is the error term.
Since the dependent variable to be regressed is continuous, the Ordinary Least Squares (OLS) model is adopted, which estimates the coefficients by minimizing the residual sum of squares:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{m} \beta_j x_{ij} \Big)^2$$
where $y_i$ is the observed traffic flow of the $i$-th sample and $x_{ij}$ is the value of the $j$-th character in that sample.
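As a concrete illustration of the OLS estimation, the coefficients can be recovered with a least-squares solve. The following is a minimal sketch on illustrative toy data (the variable names and values are not from the paper):

```python
import numpy as np

# Toy design: n = 100 samples, m = 2 candidate characters (e.g., speed, lane count).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1]  # noiseless, so OLS recovers the betas

# Augment with an intercept column and solve min ||A beta - y||^2.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # ~ [3.0, 1.5, -2.0]
```

On noiseless data the residual sum of squares is driven to zero, so the estimated coefficients match the generating ones.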
Based on the multiple regression, we obtain the coefficient estimates that minimize the OLS criterion. The selected variables then need to pass the Joint Hypotheses Test (F-test).
The Joint Hypotheses Test is used to analyze statistical models with more than one parameter, determining whether the parameters in the model are jointly suitable for estimating the dependent variable. The F-test statistic is calculated by
$$F = \frac{SSR/m}{SSE/(n-m-1)}$$
where $SSR$ is the regression sum of squares, $SSE$ is the residual sum of squares, $n$ is the number of samples, and $m$ is the number of explanatory variables.
After satisfying the Joint Hypotheses Test, the goodness of fit of the overall regression equation needs to be tested. It is judged on the basis of R-squared, which is denoted as follows:
$$R^2 = 1 - \frac{SSE}{SST}$$
where $SST$ is the total sum of squares. The closer $R^2$ is to 1, the better the regression equation fits the observed data.
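The two statistics can be computed together from the fitted values. This is a minimal sketch with illustrative numbers:

```python
import numpy as np

def goodness_of_fit(y, y_hat, m):
    """Return (R_squared, F_statistic) for a fit with m regressors."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    ssr = sst - sse                       # regression sum of squares
    r2 = 1.0 - sse / sst
    f = (ssr / m) / (sse / (n - m - 1))
    return r2, f

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
r2, f = goodness_of_fit(y, y_hat, m=1)
print(round(r2, 4), round(f, 1))  # 0.98 98.0
```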
Due to the redundancy of the matrix Q, the collected characters of traffic flow should be selected and filtered based on R-squared. The correlation between each independent variable and the dependent variable is calculated by multiple regression, and the independent variables with weak R-squared values are eliminated. The dimension of Q is thus reduced, which has a positive impact on the subsequent prediction. In order to select the characters based on R-squared, we discuss the issue in two cases.
Case 1: The correlation measurement between traffic flow and a single character. For each character $x_j$, a univariate regression of the traffic flow $y$ on $x_j$ is fitted, and its coefficient of determination is computed:
$$R_j^2 = 1 - \frac{\sum_{i=1}^{n} \big( y_i - \hat{y}_i^{(j)} \big)^2}{\sum_{i=1}^{n} \big( y_i - \bar{y} \big)^2}$$
The correlation between the traffic flow and the character $x_j$ is measured by $R_j^2$; a character with a weak $R_j^2$ value is eliminated.
Case 2: The correlation measurement among multiple characters. A multivariate regression of the traffic flow on a subset $S$ of characters is fitted, and its goodness of fit is computed in the same way:
$$R_S^2 = 1 - \frac{\sum_{i=1}^{n} \big( y_i - \hat{y}_i^{(S)} \big)^2}{\sum_{i=1}^{n} \big( y_i - \bar{y} \big)^2}$$
Furthermore, the correlation among multiple characters is considered in this case. The interpretation is the same as for Case 1: a subset with a weak $R_S^2$ value contributes little to the prediction and is filtered out.
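The Case 1 screening can be sketched as a per-column univariate fit followed by an R-squared threshold. The threshold value and the synthetic columns below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def r_squared(y, y_hat):
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

def select_characters(Q, y, threshold=0.5):
    """Keep the columns of Q whose univariate regression against y
    reaches the R-squared threshold (Case 1 screening)."""
    kept = []
    for j in range(Q.shape[1]):
        A = np.column_stack([np.ones(len(y)), Q[:, j]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        if r_squared(y, A @ beta) >= threshold:
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
speed = rng.normal(size=200)
noise_col = rng.normal(size=200)               # irrelevant character
y = 2.0 * speed + 0.1 * rng.normal(size=200)   # flow driven by speed only
Q = np.column_stack([speed, noise_col])
print(select_characters(Q, y))  # the informative column survives
```

Only the column that actually drives the target passes the threshold; the noise column is dropped, shrinking the input dimension for the predictor.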
Accurate traffic flow prediction demands continuous and uninterrupted forecasting, as traffic characteristics continuously evolve over time. Short-term data, measured in minutes or hours, and long-term data, measured in weeks or months, manifest distinct features. To ensure precision in data prediction, it is essential to consider both the long-term and short-term characteristics of the data. LSTM stands out for its capability to handle long time series and facilitate short-term as well as long-term predictions, making it an ideal choice for traffic flow prediction.
LSTM is a variant of RNN designed to solve the problems of gradient vanishing and gradient explosion in long-sequence training. LSTM mitigates the vanishing gradient problem by incorporating a memory cell and a set of gates that regulate the flow of information within the network. The memory cell acts as a long-term storage unit, allowing the network to retain information over longer sequences. The gates, including the input gate, forget gate, and output gate, control the flow of information into and out of the memory cell. The forget gate selectively decides which information to discard from the memory cell, preventing irrelevant information from persisting and reducing the impact of vanishing gradients. The input gate regulates the update of the memory cell with new information, preventing exploding gradients. These mechanisms enable LSTM to capture long-term dependencies and alleviate the challenges associated with gradient propagation. Therefore, LSTM performs better on longer sequences than ordinary RNNs. The control flow of LSTM is similar to that of RNN, processing the data flowing through cells during forward propagation. LSTM is suitable for analyzing long sequence data, and this paper uses this advantage to predict long-term traffic flow data.
Long-term memory cell:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Short-term memory cell:
$$h_t = o_t \odot \tanh(C_t)$$
Forget gate:
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)$$
Input gate:
$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)$$
$$\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)$$
Output gate:
$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)$$
The long-term memory cell stores long-term traffic flow information, while the short-term cell stores short-term traffic flow memory information. The long-term memory cell forgets parts of the unnecessary long-term traffic flow information through the forget gate, whose function is to decide whether information should be discarded or retained. The information from the short-term memory cell and the current input is passed to the sigmoid function at the same time. The probability that the memory is retained is denoted as $f_t$, which lies between 0 (completely forget) and 1 (completely retain).
The input gate transmits the short-term memory cell of the previous layer and the current traffic flow data to the tanh function to create a new candidate vector $\tilde{C}_t$, while the sigmoid branch produces $i_t$, which determines how much of the candidate vector is written into the long-term memory cell.
The output gate is used to determine the value of the next hidden state, which contains the information previously input. The output gate produces $o_t$, which determines how much of the tanh-squashed long-term memory $C_t$ is exposed as the new hidden state $h_t$.
Figure 2a illustrates the internal structure of the LSTM cell and the data flow among the gates described above.
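The gate equations above can be sketched as a single forward step in NumPy. The weight layout (four gate blocks stacked in one matrix) and the dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step. W maps [h_prev; x_t] to the four gate
    pre-activations, stacked as (forget, input, candidate, output)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = len(h_prev)
    f = sigmoid(z[0 * n:1 * n])          # forget gate f_t
    i = sigmoid(z[1 * n:2 * n])          # input gate i_t
    c_tilde = np.tanh(z[2 * n:3 * n])    # candidate memory
    o = sigmoid(z[3 * n:4 * n])          # output gate o_t
    c_t = f * c_prev + i * c_tilde       # long-term memory update
    h_t = o * np.tanh(c_t)               # short-term (hidden) state
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because $h_t = o_t \odot \tanh(C_t)$ with both factors bounded, every hidden-state component stays inside $(-1, 1)$.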
Based on the above discussion, the real-time traffic flow prediction issue can be extracted as follows:
$$\min_{\theta} \; Cost(\theta) + V \cdot Loss(\theta)$$
where $\theta = (e, \eta, l)$ denotes the hyperparameter tuple of epoch, learning rate and window length, $Cost(\theta)$ is the computing time of training and prediction, $V$ is a non-negative weight that balances the loss against the cost, and $Loss(\theta)$ is the prediction error:
$$Loss(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i(\theta)}{y_i} \right|$$
The constraints of the cost–loss balanced model are presented as follows:
- $e_{\min} \le e \le e_{\max}$
- $\eta_{\min} \le \eta \le \eta_{\max}$
- $l_{\min} \le l \le l_{\max}$
The objective of the model is to determine the most suitable parameters for prediction depending on the equilibrium between cost and loss. Here, the cost corresponds to the computing time and the loss to the prediction error.
The three parameters jointly impact the overall performance. Based on the drift-plus-penalty architecture, the weighted objective trades the prediction error off against the computing cost.
As shown in the balanced objective, a larger weight places more emphasis on the prediction error, while a smaller weight favors a shorter computing time.
Furthermore, the extracted cost–loss balanced model should consider three constraints, which restrict the epoch, learning rate and window length to their feasible ranges.
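The cost–loss balance can be sketched as a weighted scalarization over candidate hyperparameters. The weight V below is an assumed value, and the (epoch, MAPE, time) triples are derived from the iteration experiment reported later in the paper:

```python
# Pick the hyperparameter candidate minimizing cost + V * loss.
V = 100.0  # assumed weight on prediction error relative to computing time

# (epoch, MAPE, computing time in seconds) -- derived from the experiments
candidates = [
    (1000, 0.0941, 187.9),
    (2000, 0.0661, 550.5),
    (4000, 0.0629, 1113.8),
]

def balanced_objective(mape, seconds, v=V):
    return seconds + v * mape

best = min(candidates, key=lambda c: balanced_objective(c[1], c[2]))
print(best[0])  # 1000
```

With this (assumed) weight, the small accuracy gains at 2000 and 4000 epochs do not justify the much larger computing time, so the smallest epoch wins the trade-off.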
In this section, the improved MVHS-LSTM is designed to balance the accuracy and computing cost, which considers the parameters heuristically. In the analysis of the MVHS-LSTM model for data prediction, there are five variables that have a large impact on the prediction of the results, i.e., the number of hidden layer units of the MVHS-LSTM model, ratio of training and test data, learning rate, window length, and epoch.
Due to the uncertainty and continuous time series of traffic data, the system cannot input the entire data for learning at the beginning. The system involves window length to control the number of input data per time. The window length refers to the number of data points processed by the model each time. The extent of the window length directly correlates with the amount of contextual information that the model can incorporate, thereby enhancing its capacity to comprehend and predict the protracted trends within the sequence. Nevertheless, excessively protracted window lengths can result in undue complexity, rendering the model challenging to train and potentially introducing extraneous noise information. Conversely, a window that is unduly brief may constrain the model's representational prowess, preventing it from fully encompassing the long-term dependencies of the data and impeding its ability to comprehensively learn the inherent patterns and laws governing the data.
The MVHS-LSTM cell learns the data within each sliding window of the input sequence. The model weights are updated by gradient descent, and the magnitude of each update is governed by the learning rate.
An epoch signifies the total count of instances where the model has comprehensively traversed the entire dataset throughout the training process. As the iteration count escalates, the LSTM model acquires additional opportunities to learn and refine its parameters from the training data. Typically, this implies that the model can adapt more precisely to the data and exhibit superior performance on the test set. Nevertheless, an excessive number of iterations can potentially induce overfitting, manifesting as excellent performance on training data but a decline in performance on unseen data. The iteration count also directly determines the model training duration. Initially, as the iteration count rises, the model's error often decreases swiftly, displaying a rapid convergence rate. However, as the model nears its optimal solution, further iterations may yield only minor performance improvements and, in some cases, may even result in performance fluctuations or degradation. Based on the aforementioned discussion, the system determines the parameters in the offline process to reduce the computing time in prediction. The offline system contains three layers. The first layer is the LSTM layer, for which four parameters are considered: batch size, input dimensions, window length, and the number of hidden layer units. The LSTM layer, also known as the long short-term memory layer, is the core part of the LSTM model. Its main function is to process sequence data and capture long-term dependencies in the data. Through its internal memory unit and gating mechanism, the LSTM layer can effectively handle the long-term dependency and gradient vanishing problems in time series data, thereby improving the model's prediction and classification capabilities.
The first parameter is the number of samples input into MVHS-LSTM at one time. The second parameter is the number of input dimensions in the same time series, which can be multi-dimensional. The third and fourth parameters have been described above. The second layer is the Flatten layer, whose function is to transform the multidimensional input into one dimension, that is, to flatten the multidimensional feature maps into one-dimensional feature vectors. This makes it easier for the subsequent fully connected layer to process the data. The third layer is the Dense layer, also known as the fully connected layer, typically located after the LSTM or Flatten layer. Its main function is to weight and transform the features output from the previous layer to generate the final output.
The basic idea of building the model is shown in Figure 1. Firstly, the algorithm imports the data and processes the traffic flow data and then normalizes the data by compressing them between 0 and 1. The data are divided according to the holdout method, which separates them into a training set and a test set based on a specified proportion. Then, the model undergoes training, which consists of three layers: the LSTM layer, Flatten layer, and Dense layer. The LSTM layer is a type of recurrent neural network layer designed for handling sequential data. It captures and remembers long-term dependencies in the data using memory cells and gates. The Flatten layer transforms the multidimensional output from the LSTM layer into a one-dimensional vector. The Dense layer connects each neuron to every neuron in the previous and following layers. At the end of the training, the algorithm checks whether the specified number of iterations is completed. If the number of iterations has not been reached, the training continues. The model is documented so that the test set can be used in the online system when the specified number of iterations is reached. Finally, the test set data are fed into the model to obtain the prediction results. These results are then denormalized to obtain the final predicted traffic flow data.
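The tensor shapes flowing through the three layers can be sketched as follows. The batch size, window length and hidden width are illustrative, and we assume the LSTM layer returns its full output sequence (one hidden vector per time step):

```python
import numpy as np

batch, window, hidden = 8, 6, 16

# Stand-in for the LSTM layer output: one hidden vector per time step.
lstm_out = np.zeros((batch, window, hidden))

# Flatten layer: collapse (window, hidden) into one feature vector.
flat = lstm_out.reshape(batch, window * hidden)

# Dense layer: fully connected map to a single predicted flow value.
W = np.zeros((window * hidden, 1))
b = np.zeros(1)
pred = flat @ W + b
print(flat.shape, pred.shape)  # (8, 96) (8, 1)
```

The Flatten step is what lets the Dense layer consume the whole window at once, producing one predicted traffic flow value per sample in the batch.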
Algorithm 1 presents the pseudo code of the proposed dynamic traffic flow prediction based on multi-parameter MVHS-LSTM. Lines 1∼4 represent the preparation phase of the algorithm. In this phase, we initialize the matrix Q, normalize it, and divide it into a training set and a testing set. Additionally, we define two variables: the best parameter tuple MinPara found so far and its corresponding balanced objective value.
Require: traffic flow matrix Q
1: Normalize the traffic flow data and traffic speed data in Q
2: Divide Q into training set Q_train and testing set Q_test
3: MinPara = [e, η, l]
4: MinVal = Equ(14)(e, η, l)
5: for each candidate epoch e do
6:   for each candidate learning rate η do
7:     for each candidate window length l do
8:       Substitute Q_train into the MVHS-LSTM model
9:       Train the model and record the error and the computing time
10:      val = Equ(14)(e, η, l)
11:      if val < MinVal then
12:        MinVal = val, MinPara = [e, η, l]
13:      end if
14:    end for
15:  end for
16: end for
17: Put MinPara into MVHS-LSTM
18: Put Q_test into the model to predict
19: Anti-normalize the predicted values
20: Output the predicted next traffic flow
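The heuristic search in Algorithm 1 can be sketched in Python. Here `train_and_score` is a hypothetical stand-in for training MVHS-LSTM and evaluating the balanced objective of Equation (14); its toy objective surface and the candidate grids are illustrative assumptions:

```python
import itertools

def train_and_score(epoch, lr, window):
    """Placeholder for training MVHS-LSTM and returning the cost-loss
    balanced objective (Equation (14)). Here: a toy surface whose
    minimum lies at (2000, 0.01, 6)."""
    return ((epoch - 2000) ** 2 / 1e6
            + (lr - 0.01) ** 2 * 1e4
            + (window - 6) ** 2)

epochs = [1000, 2000, 4000]
lrs = [0.001, 0.01, 0.1]
windows = [2, 6, 10]

min_val, min_para = float("inf"), None
for e, lr, w in itertools.product(epochs, lrs, windows):
    val = train_and_score(e, lr, w)
    if val < min_val:
        min_val, min_para = val, (e, lr, w)
print(min_para)  # (2000, 0.01, 6)
```

In the real offline system, each `train_and_score` call is a full training run, which is why the search is performed offline and only the resulting MinPara is shipped to the online predictor.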
We evaluate the proposed MVHS-LSTM in terms of both loss and time. The simulation environment and the analysis of the simulation results are also discussed in this section. The evaluation aims to answer the following two questions: (i) how do the hyperparameters and selected variables affect the prediction error and the computing time, and (ii) how does MVHS-LSTM perform compared with the baseline models?
We utilized MATLAB version R2021a and employed the following toolboxes: Neural Network Toolbox, Deep Learning Toolbox, Signal Processing Toolbox, and Statistics and Machine Learning Toolbox. We conducted the computations on a system equipped with an Intel Core i7 processor running at 2.6 GHz, with 12 GB of RAM and the Windows 10 operating system. The data selected in this experiment are open-source data collected by Wu et al. from Fudan University using a video detection method.
We choose performance indicators to evaluate the merits and demerits of the model. MAE, RMSE and MAPE are selected as metrics to evaluate the prediction performance.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
Here, $y_i$ denotes the observed traffic flow, $\hat{y}_i$ denotes the predicted traffic flow, and $n$ is the number of samples.
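The three metrics follow directly from their definitions; the sample values below are illustrative:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([100.0, 200.0, 400.0])      # observed flow
y_hat = np.array([110.0, 190.0, 400.0])  # predicted flow
print(mae(y, y_hat), mape(y, y_hat))
```

Note that MAPE normalizes each error by the observed value, which is why it compares prediction quality more fairly across road sections with different flow magnitudes.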
This paper selects three comparison algorithms as the baseline, i.e., PSO-LSTM algorithms, ARIMA and the Support Vector Machine model (SVM). ARIMA relies on traditional mathematical and physical methods, while SVM and PSO-LSTM concentrate on neural networks or deep learning technologies. PSO-LSTM, specifically, employs the PSO algorithm to optimize hyperparameters for the LSTM method. Our approach shares similarities with PSO-LSTM as we also leverage similar techniques to enhance the performance of the LSTM model. SVM predicts by constructing an optimal hyperplane for classification or regression, while ARIMA predicts by modeling the autocorrelation and trend in time series data to forecast future values. PSO-LSTM predicts by combining the optimization capabilities of Particle Swarm Optimization (PSO) with the LSTM neural network to improve prediction accuracy.
PSO-LSTM (Particle Swarm Optimization–Long Short-Term Memory) optimizes the hyperparameters of the LSTM with the particle swarm algorithm. The velocity and position of each particle are updated by
$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left( p_{id}^{k} - x_{id}^{k} \right) + c_2 r_2 \left( p_{gd}^{k} - x_{id}^{k} \right)$$
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$
Here, $v_{id}^{k}$ and $x_{id}^{k}$ are the velocity and position of particle $i$ in dimension $d$ at iteration $k$, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the acceleration coefficients, $r_1, r_2 \in [0, 1]$ are random numbers, $p_{id}^{k}$ is the individual best position, and $p_{gd}^{k}$ is the global best position.
PSO-LSTM uses the above formulae to calculate the particle velocity and then updates the particle position. The updates of population, individual optimal and population optimal are carried out to find the global optimal parameters, and then, LSTM learning is carried out.
The advantage of PSO-LSTM is that the parameters of the LSTM can be automatically adjusted and optimized without manual search. However, PSO-LSTM adjusts only two important parameters, and the search leads to a long adjustment time.
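The update rules above can be sketched as a minimal particle swarm optimizer. The inertia and acceleration values are common defaults, not the settings of the compared PSO-LSTM, and the objective is a toy sphere function:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(f, dim=2, particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer using the velocity/position
    updates above (parameter values are common defaults, assumed)."""
    x = rng.uniform(-5, 5, size=(particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((particles, dim))
        r2 = rng.random((particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # velocity update
        x = x + v                                              # position update
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()                   # global best
    return g

# Sphere function: global minimum at the origin.
best = pso_minimize(lambda p: np.sum(p ** 2))
print(best)
```

Each particle is pulled toward both its own best position and the swarm's best, which is exactly the mechanism PSO-LSTM uses to home in on good LSTM hyperparameters.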
The essence of the ARIMA algorithm is to model the autocorrelation and trend of a time series after differencing it into a stationary sequence.
The ARIMA model is used as the comparison algorithm to conduct a comparative experimental study on the time series model and the algorithm. ARIMA model can be divided into three parts, AR (Auto Regressive model), MA (Moving Average model) and I (Integrated), and their formulas are
$$AR: \quad y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t$$
$$MA: \quad y_t = \mu + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$
Here, $\phi_i$ and $\theta_j$ are the autoregressive and moving-average coefficients, respectively, $c$ and $\mu$ are constants, and $\epsilon_t$ is white noise. The I (Integrated) part differences the series $d$ times to render it stationary.
The ARIMA model contains three parameters, i.e., p, d, and q. p represents the lag number of the time series data itself used in the prediction model. d represents the order of difference required for time series data to be stable. q denotes the lag number of prediction error used in the prediction model.
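The differencing (I) and autoregressive (AR) parts can be sketched as follows. The series and the least-squares AR fit are illustrative, not the ARIMA implementation used in the experiments:

```python
import numpy as np

def difference(y, d=1):
    """Apply d-th order differencing (the 'I' part of ARIMA)."""
    for _ in range(d):
        y = np.diff(y)
    return y

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model y_t = c + sum(phi_i * y_{t-i})."""
    rows = [y[t - p:t][::-1] for t in range(p, len(y))]
    A = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(A, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

# A series with a linear trend: one difference (d = 1) makes it stationary.
t = np.arange(50, dtype=float)
y = 2.0 * t + 5.0
dy = difference(y)          # constant series of 2.0
coef = fit_ar(dy, p=1)      # AR(1) on the differenced series
```

After one difference the trend is removed, and the fitted AR(1) reproduces the differenced value exactly: $c + \phi_1 \cdot 2.0 = 2.0$.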
The core idea of SVM is to construct an optimal hyperplane with the maximum margin. For regression, the prediction function is
$$f(x) = w^{\top} x + b$$
where $w$ is the weight vector and $b$ is the bias. The parameters are obtained by solving
$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)$$
subject to the $\epsilon$-insensitive constraints, where $C$ is the penalty coefficient and $\xi_i, \xi_i^{*}$ are slack variables. We calculate the values of $w$ and $b$ on the training data and use them to predict the traffic flow on the test data.
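The ε-insensitive loss that underlies the slack variables can be sketched directly; the tolerance value and sample points are illustrative:

```python
import numpy as np

def epsilon_insensitive_loss(y, y_hat, eps=0.5):
    """SVR loss: deviations within the epsilon tube cost nothing;
    larger deviations are penalized linearly (via the slack variables)."""
    return np.maximum(np.abs(y - y_hat) - eps, 0.0)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.2, 2.0, 4.0])
print(epsilon_insensitive_loss(y, y_hat))  # [0.  0.  0.5]
```

Only the third prediction leaves the tube, so only it incurs a nonzero slack, which is what keeps the SVR solution sparse in support vectors.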
There are many variables that may affect the traffic flow, namely, the number of lanes, speed and distance between vehicles. The correlation between the variables of the multiple regression analysis and the traffic flow is obtained by computation, and the variables with low correlation are deleted to reduce the amount of computation. Therefore, comparisons are made between one variable and three variables. The R-squared and test results are shown in Table 2.
According to the R-squared results in Table 2, the variables with low correlation are eliminated, and the remaining variables are used as the input of the prediction model.
Figure 4 shows the MAPE and the computing time with an increasing number of iterations, with the other variables held constant. As the number of iterations increases, the model becomes progressively better fitted to the learned data under the same learning rate and window length, so the MAPE decreases while the computing time required for learning increases. When the number of iterations was 1000, the MAPE was 0.0941 and the running time was 187.88 s. When the number of iterations increased from 1000 to 2000 and 4000, the MAPE decreased by 29.8% and 33.2%, while the running time increased by 193% and 492.8%, respectively. As the number of iterations increases, the error decreases but the learning time grows very quickly: roughly 300% more time is consumed to reduce the error by a further 3.4%. In practice, it is not worth spending abundant computing time to acquire such a tiny relative improvement in prediction accuracy. During the training process, we substitute the calculated error and time into the balanced objective to determine the equilibrium number of iterations.
Figure 5 shows that the MAPE increases with the learning rate when the other variables remain unchanged. The learning rate is a hyperparameter that guides how the network weights are adjusted along the gradient of the loss function. The lower the learning rate, the slower the loss function changes. Although a low learning rate ensures that the algorithm does not miss any local minima, it also means a longer time to converge, especially when trapped in a plateau region. In this paper, the loss and running time are compared over a range of candidate learning rates.
Figure 6 shows the MAPE performance with the window length when the other variables remain unchanged. The window length represents the time step of a single input of data in each training batch; the larger the window length, the greater the amount of data per learning step. In this paper, the loss and running time are examined for window lengths in the range of 2 to 18. The error decreases rapidly as the window length grows from 2 to 6, with the MAPE decreasing by 27.3% on average for every increase of 2 in the window length; from 6 to 18, the MAPE decreases by only 4.5% on average. The computing time also increases rapidly from 2 to 6, rising by an average of 5.8% for every increase of 2 in the window length, while it increases by an average of only 1.7% from 6 to 18. During the training process, we substitute the calculated error and time into the balanced objective to determine the equilibrium window length.
In summary, the model not only realizes the heuristic selection of parameters and variables but also gives a comprehensive value considering both the MAPE and the computing time. We verify our results in the system model and algorithm design, and the model can match the dynamic topology of practical road networks.
To further evaluate the advantage of the proposed method, the simulation compares its performance with three baselines: the SVM, PSO-LSTM, and ARIMA models. These methods have been applied to various sequence-prediction tasks with satisfactory results. All algorithms make predictions on the same dataset and are scored with the same evaluation functions, and the baselines are also tuned to ensure a fair comparison. Table 3 shows the prediction accuracy of MVHS-LSTM and the other models.
The MAPE score is particularly important for assessing model performance. MAPE measures the average absolute percentage error between each model's predictions and the observed traffic flow; unlike the MAE, which is not comparable across data sections of different magnitudes, the MAPE better reflects the average prediction accuracy. As shown in Table 3, the MAPE of our model is 5.8102%, which is 10.4% lower than the ARIMA model, 13.1% lower than the SVM model, and 1.3% lower than the PSO-LSTM model. Our model does not beat the SVM and ARIMA models in computing time. However, compared to the ARIMA model, our model's MAE is 37.2268 lower, its RMSE is 58.5902 lower, and its MAPE is 0.6715 percentage points lower, at a cost of only 175 s of additional computing time. Compared to the PSO-LSTM model, whose evaluation metrics are only slightly worse than ours, our model requires far less computation: the PSO-LSTM model's computing time is 513.4% longer, which is unacceptable in practice. Therefore, our model has the best comprehensive performance among the compared models.
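For reference, the three evaluation metrics of Table 3 can be written out explicitly; this is a standard formulation (the division by the observed value in MAPE is what makes it scale-independent), with illustrative toy data:

```python
import math

# Evaluation metrics used in Table 3. MAPE divides each error by the
# observed value, so it is comparable across road sections of different
# traffic volumes, unlike the MAE.

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    return 100 * sum(abs((t - p) / t)
                     for t, p in zip(y_true, y_pred)) / len(y_true)

obs = [100.0, 200.0, 400.0]     # toy observed flow, not the Shanghai data
pred = [110.0, 190.0, 380.0]
```

Here the absolute errors are 10, 10, and 20, but the percentage errors are 10%, 5%, and 5%: the busiest section dominates the MAE while the MAPE weights all sections equally.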
Figure 7, Figure 8, Figure 9 and Figure 10 are comparison charts of the traffic flow prediction effect.
It can be seen from the charts that the ARIMA model, which relies on differencing, produces predicted values that consistently fall between the actual data points. Its predictions are relatively stable and capture the general trend reasonably well, but most values are overly averaged: the model misses the short-term characteristics of traffic flow and produces few fluctuations or spikes. The SVM model focuses primarily on short-term characteristics and lacks the ability to learn from long time series. Its predictions in the first half of the sequence are relatively accurate, but in the latter half they gradually deviate from the original data and drift above the overall trend, so the predicted values become increasingly inaccurate over time. The PSO-LSTM model uses particle swarm optimization to search for the optimal hyperparameters, but its long computation time makes it unsuitable for real-world traffic flow data. The MVHS-LSTM model not only predicts peak and valley values effectively but also demonstrates improved overall performance, and its low computation time makes it well suited for practical applications.
The update time of our online system depends on the frequency at which the traffic management system transmits data: it is the maximum of the time our system needs for training and prediction and the interval at which the traffic management system updates its data. Our offline system is responsible for training the model; based on the experimental simulation results, training and prediction take an average of 366 s, i.e., approximately 6–7 min. If the traffic management system transmits data at intervals shorter than 6 min, the update time is set to 6 min; if it transmits data at longer intervals, the update time equals the duration of that transmission cycle.
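This rule is a simple maximum, sketched below with the measured 366 s training time (the function name and the example intervals are illustrative):

```python
# Update period of the online system: the larger of the offline
# training-plus-prediction time and the data-reporting interval of the
# traffic management system (both in seconds).

def update_period(train_seconds, report_interval_seconds):
    return max(train_seconds, report_interval_seconds)

TRAIN_TIME = 366                      # measured average, in seconds
five_min = update_period(TRAIN_TIME, 300)   # data every 5 min -> 366 s
ten_min = update_period(TRAIN_TIME, 600)    # data every 10 min -> 600 s
```

When data arrive faster than the model can retrain, training time is the bottleneck; otherwise the system simply follows the data-reporting cycle.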
This paper addresses the challenge of balancing accuracy and computing time in traffic flow prediction. The proposed enhanced LSTM model, MVHS-LSTM, employs a three-layer architecture to analyze traffic flow variables and MVHS-LSTM hyperparameters in an offline system. Taking actual road factors into account, the variables are carefully selected to determine the validity of the traffic flow information and to ensure data harmonization. The paper applies multiple regression to compute the R-squared of candidate variables and selects the most suitable ones as the learning inputs, which solves the previously identified problem of the excessive training burden caused by too many input variables. As for the innovations to the LSTM model, the paper iterates over each hyperparameter, substitutes the resulting error and computing time into the cost formula, and selects the value that minimizes it. The MVHS-LSTM model thus undergoes iterative refinement to optimize its hyperparameters and find an equilibrium between model error and computational time. By ending iterations early, it avoids the overly long training required by the PSO-LSTM model, saving learning and training time. Furthermore, multiple comparative tests were conducted against the ARIMA, SVM, and PSO-LSTM models. The MAPE of the MVHS-LSTM model is 5.8102%, which is 10.4% lower than the ARIMA model, 13.1% lower than the SVM model, and 1.3% lower than the PSO-LSTM model, while the PSO-LSTM model requires 513.4% more computing time than MVHS-LSTM.
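The R-squared-based variable-selection step can be sketched as follows. This is a minimal single-regressor illustration (the paper uses multiple regression via OLS); the toy data and variable names are assumptions, not the Shanghai measurements:

```python
# Ranking candidate input variables by R-squared from a simple OLS fit.
# Single-regressor case: R-squared equals the squared Pearson correlation.

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Toy candidates: flow is strongly related to speed, weakly to lane count.
flow = [100, 120, 150, 170, 160, 180]
speed = [60, 55, 48, 42, 45, 40]
lanes = [3, 3, 4, 3, 4, 3]

best = max([("speed", speed), ("lanes", lanes)],
           key=lambda v: r_squared(v[1], flow))[0]
```

Variables whose R-squared falls below the acceptance criterion are dropped, shrinking the input dimension and hence the training burden, which is the cost-efficiency idea behind the OLS step.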
The results demonstrate that our approach achieves a harmonious balance between prediction accuracy and computational time, enabling rapid responses to real-time traffic flow variations, which carries practical significance for future applications. By analyzing and predicting traffic flow, congestion patterns, and road usage, traffic management departments can better optimize traffic signals, route planning, and traffic control strategies, thereby alleviating congestion, reducing the risk of traffic accidents, and improving the overall traffic operation efficiency of roads. At present, this paper only predicts traffic flow on a single road segment; subsequent research can synchronize predictions in space based on the topology of the road network, extending coverage to the entire city and bringing new thinking and innovation to urban-level traffic prediction. This paper also does not consider the spatial relationships between road segments, i.e., the external spatial characteristics of the traffic flow data; in future research we will incorporate these spatial relationships to enhance the theoretical framework.
Graph: Figure 1 Overall working flow chart.
Graph: Figure 2 The structure of the LSTM memory block.
Graph: Figure 3 Illustration of evaluation scenario: Yan'an Viaduct near Shaanxi Road in front of the Shanghai Exhibition Hall.
Graph: Figure 4 The degree of epochs affecting MAPE and time.
Graph: Figure 5 The degree of learning rate affecting MAPE and time.
Graph: Figure 6 The degree of window length affecting MAPE and time.
Graph: Figure 7 ARIMA model prediction effect chart.
Graph: Figure 8 SVM model prediction effect chart.
Graph: Figure 9 PSO-LSTM model prediction effect chart.
Graph: Figure 10 MVHS-LSTM model prediction effect chart.
Table 1 Summary of the main mathematical notations.
Notation — Description
Q — traffic flow data
s — the number of rows of Q; the number of data points for each variable
z — the number of columns of Q; the number of independent variables
— — regression coefficient
— — disturbance term
— — sum of the squares of the differences between the fitted values and the actual values
— — mean value of the real traffic data
j, p, m — the variables among the k selected variables
— — long-term memory cell
— — short-term memory cell
— — forget gate
— — input gate
— — output gate
— — weight values of the gates
— — epoch (one completed iteration over all batches)
— — window length
— — learning rate
— — cost of the model
— — loss of the model
— — threshold of the epoch count
— — number of input nodes
— — number of output nodes
— — number of hidden-layer nodes
(Symbols marked "—" were lost in extraction.)
Table 2 R-squared comparison among variables.
            Speed    Lanes    VEH-Distant   Speed+Lanes+VEH-Distant
R-squared   0.1362   0.0067   0.0015        —
Table 3 Results of evaluation function between methods.
          ARIMA      SVM       PSO-LSTM   MVHS-LSTM
MAE       110.8981   78.0017   —          —
RMSE      151.6354   —         —          93.0452
MAPE      6.4817%    6.6848%   —          5.8102%
Time/s    —          —         1879       366
(Cells marked "—" were lost in extraction.)
C.G. designed the system model and revised the manuscript. J.Z. drafted the manuscript, evaluated the experiments, and drew the figures. X.W. supervised the manuscript and improved the architecture. All authors have read and agreed to the published version of the manuscript.
Not applicable.
Not applicable.
The raw data supporting the conclusions of this article will be made available by the authors on request.
The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.
By Chang Guo; Jianfeng Zhu and Xiaoming Wang