MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection

Guo, Chang ; Zhu, Jianfeng ; et al.

In: Applied Sciences, Jg. 14 (2024-03-01), Heft 7, S. 2959-2959

Online academicJournal

Zugriff:

Volltext (PDF)

In recent years, the rapid growth of vehicles has imposed a significant burden on urban road resources. To alleviate urban traffic congestion in intelligent transportation systems (ITS), real-time and accurate traffic flow prediction has emerged as an effective approach. However, selecting relevant parameters from traffic flow information and adjusting hyperparameters in intelligent algorithms to achieve high prediction accuracy is a time-consuming process, posing practical challenges in dynamically changing traffic conditions. To address these challenges, this paper introduces a novel prediction architecture called Multiple Variables Heuristic Selection Long Short-Term Memory (MVHS-LSTM). The key innovation lies in its ability to select informative parameters, eliminating unnecessary factors to reduce computational costs while achieving a balance between prediction performance and computing efficiency. The MVHS-LSTM model employs the Ordinary Least Squares (OLS) method to intelligently reduce factors and optimize cost efficiency. Additionally, it dynamically selects hyperparameters through a heuristic iteration process involving epoch, learning rate, and window length, ensuring adaptability and improved accuracy. Extensive simulations were conducted using real traffic flow data from Shanghai to evaluate the enhanced performance of MVHS-LSTM. The prediction results were compared with those of the ARIMA, SVM, and PSO-LSTM models, demonstrating the innovative capabilities and advantages of the proposed model.

MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection

Keywords: traffic prediction; heuristic selection; LSTM; deep learning

1. Introduction

With the growing issue of severe urban road congestion on a global scale, the prediction of traffic congestion has become an urgent problem that requires swift and effective solutions. Each year the Dutch company TOMTOM releases a notable report focusing on global traffic congestion levels. The data consistently show an upward trend in traffic congestion around the world over the past decade [[1]]. As a result, accurate prediction of traffic flow in diverse scenarios has garnered significant attention from academia and traffic management organizations as a promising solution.

Deep learning models are commonly used for traffic flow prediction. Typically, the process involves preprocessing data, training the model with historical data, optimizing hyperparameters for best evaluation results, and finally using the latest data to make predictions. Given that the Long Short-Term Memory (LSTM) model integrates the characteristics of both long-term and short-term data for prediction, it is highly suitable for application in traffic flow prediction scenarios. Currently, the process of using LSTM for traffic flow prediction still has a few limitations. One is that the presence of excessive and irrelevant data in the input can increase the cost of training and learning. This means that the model needs to process and analyze a large amount of unnecessary information, which can be time-consuming and inefficient. When training against the model, if there are too many input variables, then the computation will increase accordingly, resulting in longer time required for forward and back propagation in each layer. Especially in the fully connected layer, this may cause the model to require more memory to store parameters and intermediate calculation results, thereby affecting training speed [[2]]. The other is that the relentless pursuit of high accuracy in the LSTM model can result in significant time costs. This can be particularly problematic for practical applications where real-time predictions are required. The model may not be suitable for such scenarios due to the long training and prediction times. To address these limitations, it is important to refine the data preprocessing stage to include only relevant and reliable information. Additionally, exploring optimization techniques that strike a balance between accuracy and computational efficiency can help reduce the time cost associated with training and prediction.

The contributions of this paper are as follows:

Firstly, the paper highlights our approach to the rational dimensionality reduction of traffic flow data, which helps in reducing the computational complexity [[3]]. By combining relevant traffic flow parameters and considering realistic road factors, our proposed method effectively integrates and processes the data, resulting in a more efficient computation process. Additionally, the presented closed formulae capture the relationships among traffic flow parameters, further enhancing the data processing approach [[4]].

Secondly, three heuristic iterative models are introduced to enhance the MVHS-LSTM model, enabling a more convenient analysis of the impact of various factors such as the number of hidden layer nodes, window length, learning rate, iteration times, and the proportion of training and test sets on model error and computational time. These improvements in the MVHS-LSTM model establish a clear relationship between model error and time complexity, allowing for the identification of an equilibrium point based on the observed tendency [[5]].

Finally, extensive simulations are conducted to validate the effectiveness and accuracy of the proposed model. By substituting the optimal solution into the MVHS-LSTM model and comparing its performance with other algorithms, the results demonstrate that the proposed algorithm achieves a balance between the computing time and model error. The simulations not only significantly reduce the computing time but also provide a theoretical foundation for data processing methods.

The remainder of the paper is organized as follows. Section 2 discusses the related works of the traffic flow prediction field. Section 3 introduces the system model, including the traffic data preprocessing model, LSTM model and Cost–Loss balanced model. Section 4 presents the algorithm design. Performance evaluation is presented in Section 5. Finally, Section 6 summarizes this paper.

2. Related Works

Traffic flow prediction is becoming more and more important in intelligent transportation systems (ITS). In recent years, researchers used a variety of algorithms to improve the performance of traffic flow prediction. There are two main categories of algorithms for traffic flow prediction. The first is a kind of prediction model based on traditional mathematical and physical methods such as mathematical statistics and calculus, i.e., the Kalman filter, Auto Regressive Integrated Moving Average (ARIMA) model and so on. Considering the effect on realistic traffic flow data, the other category methods focus on the neural networks or deep learning technologies, which increase the performance via training and iterations, i.e., the Support Vector Machine (SVM) model, nonparametric regression model, Bayesian model, neural network models and hybrid neural network models.

2.1. Traditional Methods

Within the first category of methods, Li et al. used ARIMA and the Grey Prediction Model to predict the passenger flow data of a subway entrance. It is concluded that the ARIMA model is more suitable for obtaining formulas from complex training sets, which leads it to obtain more accurate prediction data [[6]]. Vidya et al. included the Kalman filter to estimate and predict the passenger flow of buses. When the bus can arrive at the appropriate time, it can reduce the waiting time of passengers. In the meantime, public transport companies can reasonably arrange vehicle shifts to reduce operating costs [[7]]. This kind of algorithm can express the prediction problem with a simplified model, which is interpretable and fast. However, the prediction performance of traditional methods is relatively weak when the mass traffic data represent nonlinear and random characteristics.

2.2. Deep Learning Methods

Within the second category of methods, this paper selects several representative examples for illustration. In order to solve the problem of urban traffic congestion, Feng et al. estimated and predicted the traffic state based on an adaptive multi-kernel Support Vector Machine (AMSVM). The Gaussian kernel and polynomial kernel were combined to form AMSVM and analyze the nonlinearity and randomness of traffic flow with spatio-temporal information. The parameters of AMSVM were optimized by the adaptive particle swarm optimization algorithm, and a new method for adaptively adjusting the mixed kernel weight according to the trend of real-time traffic flow was proposed [[8]]. Arif et al. equipped the deep learning technique with non-parametric regression. The first layer in the deep learning algorithm identifies spatio-temporal relationships between predictors and other non-linear relationships. The results were better than Online-Support Vector Regression (OL-SVM), Stacked Auto-Encoder (SAE) and Back-Propagation-Neural Network (BP-NNet). However, the model did not consider the multivariate time series for traffic prediction [[9]]. Zhou et al. analyzed the autocorrelation and distribution characteristics of traffic flow to correct the structure of the time series model. Based on the Gaussian distribution of the traffic flow, a hybrid model based on the Bayesian learning algorithm was proposed, which improved the Seasonal Autoregressive Integrated Moving Average (SARIMA)'s scenario and performed better in prediction accuracy and effectiveness [[10]]. Huang et al. proposed that traffic flow data are mainly treated as time series data and chose a Bidirectional Recurrent Neural Network (BRNN) model in RNN. Compared with LSTM model and Gated Recurrent Unit (GRU) model; this model is good at processing time series data [[11]]. Wu used the Convolutional Neural Network (CNN) to predict the spatio-temporal dynamics of urban road traffic flow and compared it with LSTM, Recurrent Neural Network (RNN), AlexNet and Spatio-Temporal Graph Convolutional Network (STGCN) models. The prediction accuracy and robustness of the proposed method had better performance [[12]]. Hu et al. proposed an urban traffic flow prediction method based on the graph convolution neural network, which can excavate the spatio-temporal dynamic mode of urban road network traffic flow and realized accurate traffic flow prediction [[13]]. Cheng et al. used the Vector Auto-regression (VAR) model to evaluate the internal correlation between traffic variables and determined the predictable relationship between these variables. Then, the CNN-LSTM hybrid neural network model was used to predict the multi-feature speed of a spatial position. The prediction results showed that the multi-feature prediction was better than the single feature prediction [[14]]. Lin et al. combined CNN and LSTM to predict road traffic risk using spatio-temporal multi feature traffic data. Compared with previous studies, our approach has lower loss values and a better convergence speed [[15]].

After meticulously analyzing the existing literature, it becomes evident that, while numerous studies have successfully enhanced data accuracy, there persist several unresolved issues that are seldom discussed. Firstly, there is a pressing need to take into account the intricate interdependence among traffic flow data. It is imperative to recognize that, in the prediction process, it is not always necessary to feed the entire dataset into the model for learning. This approach can often lead to an unwarranted increase in the learning burden, thereby compromising the efficiency of the prediction system. Furthermore, it is crucial to recognize that solely prioritizing accuracy without giving due consideration to learning time is inadequate for real-time traffic flow prediction scenarios. This approach fails to align with the inherent real-time nature and the rapid, dynamic changes that are characteristic of traffic flow. It is the lack of a balanced approach that takes into account both accuracy and learning time that hinders the development of truly effective real-time traffic flow prediction systems.

Therefore, the paper aims to address the unresolved issues in traffic flow prediction by proposing a comprehensive approach that takes into account both accuracy and learning time. Our primary focus is on the rational dimensionality reduction in traffic flow data, which aims to reduce computational complexity and enhance prediction efficiency. By combining relevant traffic flow parameters and considering realistic road factors, our method effectively integrates and processes the data, leading to more accurate and responsive predictions. Furthermore, we recognize the need to strike a balance between accuracy and learning time, which is crucial for real-time traffic flow prediction scenarios. To achieve this, we introduce heuristic iterative models that complement the MVHS-LSTM model, enabling a convenient analysis of the impact of various factors on model error and computational time.

3. System Model

This section presents a system framework of traffic flow real-time prediction and proposes a traffic data preprocessing model, LSTM model and Cost–Loss balanced model. The quantitative analysis of the factors that can affect the prediction results in the model is also discussed in this section. The notations used in this section are described in Table 1.

3.1. System Architecture

In the section of System Architecture, we delve into the logical relationships among these sub-models, aiming to achieve dynamic traffic flow prediction. As shown in Figure 1, the system contains three processes during the traffic flow prediction, i.e., the traffic flow analysis based on real-time information acquisition, the offline analysis to determine the training factors, and the online prediction for traffic flow. We assume that the traffic flow data is obtained via V2V and V2R communication in VANETs [[16]], which depends on our previous work [[17], [19]]. The on-board unit (OBU) on vehicles enables V2V communication with other vehicles on the road segment and V2R communication with RSUs at intersections. The OBU records vehicle information like ID, position, and real-time speed. By obtaining real-time information about subsequent road segments from RSUs via V2R communication, the vehicle estimates travel time based on its recorded real-time speed. This information is relayed along the road segment until a relay vehicle enters the RSU communication range at intersections, allowing the RSU to receive real-time traffic information through V2R communication. The messages including the traffic flow data can be acquired and shared during vehicles and RSUs via the multi-hop and carry-and-forward transmission technology in VANETs. We obtain the traffic flow and speed information in unit time by the RSU (roadside unit) [[20]] and form the data set.

The real-time traffic flow information is obtained by the cameras equipped on the roadside or the flowmeters directly connected to the RSU. The RSUs at the road interactions contain not only cache to store and relay the traffic data, but also a tiny server to complete the calculation and training tasks. Each RSU in the distributed system acquires and trains traffic flow data and hyper-parameters of the road segments it covers, and then, it uploads the results, completing the task in the offline sub-systems. Then, the RSU uploads the nearest cellular base station, which transmits the message to the offline system in the vehicle traffic server to complete the parameter setting of the MVHS-LSTM on the road. When the road segment has the prediction requirement, based on the collected information, the online system in the vehicle traffic server can provide global traffic flow prediction for the road. The paper considers the internal spatial correlation within road segments by incorporating their spatial properties (such as lane number, inter-vehicle distance and vehicle speed) into the matrix Q.

As shown in Figure 1, our system consists of three subsystems:

(a) Traffic flow analysis system: The acquired traffic flow is multi-dimensional data that contain many variables, i.e., traffic flow data, vehicle spacing, average speed, map data (latitude and longitude), road information, travel cost and so on. It is difficult to distinguish which variables are needed in prediction. Therefore, this paper proposes a method to decrease unnecessary variables in most traffic flow data. Considering the correlations among the factors, the subsystem selects the optimal variables as the input for traffic flow data prediction heuristically.

(b) Offline system: In order to reduce the prediction computing cost, the system involves an offline system to analyze the MVHS-LSTM parameters. According to the historical traffic flow data, the offline system can find the parameters suitable by using the MVHS-LSTM model. We fine-tune multiple hyperparameters for the LSTM model, such as the window length, number of iterations, learning rate and so on. These adjustments are made to ensure optimal values in the offline system. The MVHS-LSTM model combines with these parameters and achieves a balance between the error and the time of the traffic flow prediction. In practice, the system determines the distribution of various dynamics based on users' demand for prediction accuracy and prediction cost.

(c) Online system: The online system imports the calculated parameters from the offline system and outputs the data that are predicted once. The system continuously circulates and consistently outputs predicted data throughout the process.

The obtained traffic flow data will be filtered through the method in this paper to screen out the variables that are beneficial to the prediction results. The historical data are put into the offline system for analysis, and then, the real-time prediction is carried out through the online system. The proposed architecture takes account of both prediction accuracy and computing time via the cooperation of the offline system and the online system.

3.2. Traffic Data Preprocessing Model

In order to reduce the computational burden caused by abundant variables, the model needs to filter the variables firstly. In accordance with the difference in effect, whether variables have a good impact to forecast traffic flow data, the model filters some useless parameters and retains some effective variables.

We assume a matrix Q as the presentation of collected traffic flow data, which is denoted as follows:

(1) $Q_{s \times z} = \begin{matrix} q_{11} & q_{12} & \dots & q_{1 z} \\ q_{21} & q_{22} & \dots & q_{2 z} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ q_{s 1} & q_{s 2} & \dots & q_{s z} \end{matrix} .$

Here, the columns of Q denote characters of observed traffic flow data, e.g., the real-time speed, the number of lanes, etc. The number of collected characters is $z - 1$ , and the last column records observed traffic flow values. The rows of Q denote the traffic flow data in different times slots. The number of traffic flow data collection time slots is s.

So as to find the conclusive variables, the paper chooses the Multiple Regression Analysis method [[21]] to analyze the numerous variables. Multiple regression analysis is the method to research the linear correlation between multiple independent variables and dependent variables. The formula of the multiple regression model is as follows:

(2) $\hat{Q_{\cdot z}} = \hat{β_{0}} + \sum_{i = 1}^{z} (\hat{β_{i}} \hat{Q_{\cdot i}}) + δ .$

Here, $\hat{Q_{\cdot z}}$ is the dependent variables traffic flow. $\hat{Q_{\cdot i}}$ is the total vectors corresponding to each variable, $\hat{β_{i}}$ is the regression coefficient, and $δ$ is the disturbance term.

Since the dependent variable to be regressed is the continuous variable data, the Ordinary Least Squares (OLS) model [[22]] is suitable to evaluate the multiple regression model, whose representation is as follows:

(3) $L = \sum_{i = 1}^{z} {(\hat{Q_{i \cdot}} - \hat{Q})}^{2},$

where $\hat{Q_{i \cdot}}$ is real traffic data, and $\hat{Q}$ is the result calculated by regression. The purpose of the OLS model is to minimize L, making the regression formula match the real traffic data as much as possible.

Based on multiple regression, we can obtain an equation for the minimum value tested by OLS model. The selected variables need to use the Joint Hypotheses Test [[23]] to test the significance of the overall regression equation. The test aims to evaluate whether the overall interpretation of the dependent variable by all independent variables is appropriate.

The Joint Hypotheses test is used to analyze statistical models that use more than one parameter to determine whether all of the parameters in the model are suitable for estimating the parent variable. The statistical value of the F-test can be calculated to obtain the statistic value by

(4) $F = \frac{M S R}{M S T} = \frac{\frac{S S R}{D F_{R}}}{\frac{S S T}{D F_{T}}},$

where $D F_{R}$ is the degrees of freedom for SSR (Regression Sum of Squares), and $D F_{T}$ is the degrees of freedom for SST (Sum of Squares for Total). The calculated F-value can be obtained by checking the table for its corresponding $P_{F}$ .

$P_{F} < 0.05$ means that the original hypothesis is rejected and not all coefficients are 0, so the regression is significant. Otherwise, it means that we do not reject the original hypothesis, i.e., all coefficients are 0 and the model is not reasonable.

After satisfying the Joint Hypotheses test, the goodness of fit of the overall regression equation needs to be tested. It is judged on the basis of R-squared, which is denoted as follows:

(5) $R^{2} = 1 - \frac{S S E}{S S T} = 1 - \frac{\sum_{i = 1}^{z} {(Q_{i} - \hat{Q})}^{2}}{\sum_{i = 1}^{z} {(Q_{i} - \bar{Q})}^{2}},$

where $\bar{Q}$ is mean value of real traffic data. The model fits better with a larger $R^{2}$ value.

Due to the redundance of matrix Q, the collected characters of traffic flow should be selected and filtered based on R-squared. The correlation between each independent variable and the dependent variable is calculated by multiple regression. The independent variable with weak R-squared value is eliminated. The dimension of Q is reduced, which has a positive impact on the following prediction. In order to select the characters based on R-squared, we discuss the issue with two cases.

Case 1: The correlation measurement between traffic flow and single character. For $\forall Q_{\cdot j} \in Q, j \neq z,$ the correlation between the traffic flow and the character j is denoted as follows:

(6) $R^{2} [Q_{\cdot j}, Q_{\cdot z}] = 1 - \frac{\sum_{i = 1}^{z} {(Q_{i j} - \hat{Q_{i z}})}^{2}}{\sum_{i = 1}^{z} {(Q_{i j} - \bar{Q_{i z}})}^{2}} .$

The correlation between traffic flow and the character $Q_{\cdot j}$ is strong as $R^{2} [Q_{\cdot j}, Q_{\cdot z}] > 0.49$ , moderate as $R^{2} [Q_{\cdot j}, Q_{\cdot z}] \in [0.25, 0.49]$ , and weak as $R^{2} [Q_{\cdot j}, Q_{\cdot z}] < 0.25$ . In this paper, we assume that character $Q_{\cdot j}$ has a correlation with traffic flow and should be reserved for prediction. Otherwise, the character $Q_{\cdot j}$ with R-squared less than 0.25 will be eliminated to reduce the prediction cost due to the weak correlation.

Case 2: The correlation measurement among multiple characters. $\forall Q_{\cdot j}, Q_{\cdot p},$ ... $, Q_{\cdot m} \in Q, j, p,$ ... $, m \neq z,$ the matrix Q eliminates as $Q^{'}$ based on R-square, which is denoted as follows:

(7) $Q^{'} = \begin{matrix} Q, & R^{2} \geq 0.25 \\ Q \cap arg max Q_{\cdot k} [R^{2} [Q_{\cdot k}] | k \in [j, p, ..., m]], & R^{2} < 0.25 \end{matrix}$

Furthermore, the correlation among multiple characters is considered in this case. The result is the same as for Case 1 as $R^{2} > 0.25$ , in which all the considered characters are reserved. When $R^{2} < 0.25$ , it means that all considered factors and traffic flows are weakly correlated. Therefore, the largest character in $R^{2}$ is selected as the only variable. According to Case 1 and Case 2, the original traffic flow matrix $Q_{s \times z}$ is the contracted dimension $Q_{s \times k}^{'}$ , in which $k \leq s$ .

3.3. Long Short-Term Memory Network (LSTM)

Accurate traffic flow prediction demands continuous and uninterrupted forecasting, as traffic characteristics continuously evolve over time. Short-term data, measured in minutes or hours, and long-term data, measured in weeks or months, manifest distinct features. To ensure precision in data prediction, it is essential to consider both the long-term and short-term characteristics of the data. LSTM stands out for its capability to handle long time series and facilitate short-term as well as long-term predictions, making it an ideal choice for traffic flow prediction.

LSTM is an algorithm in RNN, which is used to solve the problem of gradient disappearance and gradient explosion in long sequence training. LSTM mitigates the vanishing gradient problem by incorporating a memory cell and a set of gates that regulate the flow of information within the network. The memory cell acts as a long-term storage unit, allowing the network to retain information over longer sequences. The gates, including the input gate, forget gate, and output gate, control the flow of information into and out of the memory cell. The forget gate selectively decides which information to discard from the memory cell, preventing irrelevant information from persisting and reducing the impact of vanishing gradients. The input gate regulates the update of the memory cell with new information, preventing the exploding gradients. These mechanisms enable LSTM to capture long-term dependencies and alleviate the challenges associated with gradient propagation. Therefore, LSTM performs better in longer sequences than ordinary RNNs. The control flow of LSTM is similar to RNN, which processes the data flowing through cells in the process of forward propagation. LSTM is suitable for analyzing long sequence data, and this paper uses the advantage of LSTM to predict long-term traffic flow data [[24]].

Long-term memory cell:

(8) $C_{t} = f_{t} ⨂ C_{t - 1} + i_{t} ⨂ {\hat{C}}_{t}$

Short-term memory cell:

(9) $h_{t} = O_{t} ⨂ tanh (C_{t})$

Forget gate:

(10) $f_{t} = σ (W_{f} \cdot [h_{t - 1}, Q_{t \cdot}^{'}] + b_{f})$

Input gate:

(11) $i_{t} = σ (W_{i} \cdot [h_{t - 1}, Q_{t \cdot}^{'}] + b_{i})$

(12) ${\hat{C}}_{t} = tanh (W_{c} \cdot [h_{t - 1}, Q_{t \cdot}^{'}] + b_{c})$

Output gate:

(13) $O_{t} = σ (W_{o} [h_{t - 1}, Q_{t \cdot}^{'}] + b_{o})$

The long-term memory cell stores long-term traffic flow information, while the short-term cell stores short-term traffic flow memory information. The long-term memory cell forgets parts of unnecessary long-term traffic flow information through the forgetting gate. The function of the forgetting gate is to decide if information should be discarded or retained. The information from the short-term memory cell and the current input is passed to the sigmoid function at the meantime. The probability that the short-term memory is stored is denoted as $σ (•) = \frac{1}{1 + e^{- •}}$ . The output value is between 0 and 1. A value closer to 0 means that more probability should be discarded. Otherwise, a value closer to 1 means that more probability should be retained in the new state [[25], [27]].

The input gate transmits the short-term memory cell of the previous layer and the current traffic flow data to the tanh function to create a new candidate vector ${\hat{C}}_{t}$ and calculates $i_{t}$ from the short-term memory cell of the previous layer and the current traffic flow data and transfers it to the sigmoid function. The value is modified from 0 to 1, and it is used to determine whether the information needs to be updated. A value of 0 means that the short-term input information is not important and 1 means that it is important. ${\hat{C}}_{t}$ $i_{t}$ will determine the information that is important and needs to be retained.

The output gate is used to determine the value of the next hidden state, which contains the information previously input. The output gate controls $C_{t}$ to select some relevant memories to generate $h_{t}$ that we are concerned about at this moment, and then, $h_{t}$ is output.

In Figure 2a, $Q_{t}$ represents a slice of the parameter matrix Q outputted by our traffic data preprocessing model at time t with a window length. $h_{t}$ represents the output in each time sequence, which corresponds to the short-term output. $C_{t}$ represents the long-term output. Both the long-term and short-term outputs are then fed into the next LSTM unit, and this process continues. This is similar to the representation shown in Figure 2b, where $Q_{t} - 1$ , $Q_{t}$ and $Q_{t} + 1$ are slices of the Q matrix obtained by shifting the window length by one unit at a time.

3.4. Cost–Loss Balanced Model

Based on the above discussion, the real-time traffic flow prediction issue can be extracted as follows:

(14) $A R G = arg min e p o, r a t, w w C o s t (Q^{'}) + L o s s (Q^{'}),$

where

(15) $C o s t (Q^{'}) = \sum_{i = 1}^{s} e^{(ω_{1} e p o + ω_{2} r a t) Q_{i \cdot}^{'}} + ω_{3} tanh [Q_{i \cdot}^{'}, ..., Q_{i + w w \cdot}^{'}];$

$L o s s (Q^{'}) = \sum e p o, r a t, w w log [σ (Q_{i \cdot}^{'})] + log [1 - σ (Q_{i \cdot}^{'})] .$

The constrains of the cost–loss balanced model are presented as follows:

(1) $ω_{1} + ω_{2} + ω_{3} = 1$ ;

(2) $e p o \leq e p o_{t h r}$ ;

(3) $w w + i \leq s$ .

The objective of the model aims to determine the most suitable parameters for prediction depending on the equilibrium between cost and loss. Here, $e p o$ denotes the selected number of training iterations. $r a t$ is represented as the learning rate of original LSTM. $w w$ denotes the width of the window size, which decides the duration of prediction.

The three parameters impact the performance of the overall performance. Based on the drift-plus-penalty architecture [[19]], we balance the cost and loss during the prediction and find the equilibrium to determine the parameters above. The loss decreases as the parameters change to improve the prediction accuracy, while the negative consequence is the encasement of the computation cost. Therefore, the sum of cost and loss reaches the minimum value as the three parameters achieve equilibrium.

As shown in Equation (15), we define two variables, i.e., $C o s t (Q^{'})$ and $L o s s (Q^{'})$ to perform the computing cost and accuracy, respectively. $C o s t (Q^{'})$ denotes the computing time when the traffic flow data $Q^{'}$ are input for prediction. Larger $C o s t (Q^{'})$ means the system takes an amount of time to calculate the prediction results. Meanwhile, $L o s s (Q^{'})$ is used for evaluating the accuracy of prediction results, which pursue a relatively small value. The cost and loss performance is assumed as having a non-linear positive correlation with the parameter epoch $e p o$ and learning rate $r a t$ [[28]]. $ω_{1}$ and $ω_{2}$ are used to adjust the ratio of these two parameters' influence. $Q_{i}$ denotes the ith training traffic flow data. The correlation between cost and the third parameter is positive; however, the increasing ratio is gentle relatively in the medium range. Therefore, we exploit a tanh function to express this correlation. Here, $tanh (\cdot) = \frac{e^{\cdot} + e^{- \cdot}}{e^{\cdot} - e^{- \cdot}}$ . $ω_{3}$ denotes the weight parameter to adjust the third correction window width $w w$ . The loss performance relates to binary cross-entropy during the entire training based on the three parameters above. The variable in the logic function $σ (\cdot)$ is also represented as the sigmoid function.

Furthermore, the extracted cost–loss balanced model should consider three constrains: (1) the linear sum of three weights for epoch, learning rate, and window width is denoted as 1. The first constrain guarantees the three parts of cost performance are maintained at the same order of the magnitude. (2) and (3) limit the upper bound of the selection results of the epoch and window width. In the next section, we will discuss the detailed MVHS-LSTM algorithm to achieve the comprehensive consideration between the computing time and prediction accuracy based on this cost–loss balanced model.

4. Algorithm Design

In this section, the improved MVHS-LSTM is designed to balance the accuracy and computing cost, which considers the parameters heuristically. In the analysis of the MVHS-LSTM model for data prediction, there are five variables that have a large impact on the prediction of the results, i.e., the number of hidden layer units of the MVHS-LSTM model, ratio of training and test data, learning rate, window length, and epoch.

Due to the uncertainty and continuous time series of traffic data, the system cannot input the entire data for learning at the beginning. The system involves window length to control the number of input data per time. The window length refers to the number of data points processed by the model each time. The extent of the window length directly correlates with the amount of contextual information that the model can incorporate, thereby enhancing its capacity to comprehend and predict the protracted trends within the sequence. Nevertheless, excessively protracted window lengths can result in undue complexity, rendering the model challenging to train and potentially introducing extraneous noise information. Conversely, a window that is unduly brief may constrain the model's representational prowess, preventing it from fully encompassing the long-term dependencies of the data and impeding its ability to comprehensively learn the inherent patterns and laws governing the data.

The MVHS-LSTM cell learns the data $Q_{t \cdot}^{'}$ within the length of the window and also combines the long-time memory cell $C_{t - 1}$ and short-time memory cell $h_{t - 1}$ of the previous time cell. The system outputs the short-time prediction $h_{t}$ by learning MVHS-LSTM. Then, the time window slides from one unit to the next. The process loops eventually obtains all of the forecast data.

According to reference [[29]], we select hold-out method to determine the ratio of training and test, which denotes 7:3. This method aims to divide the available data set into training and testing subsets, which are used for model training and performance evaluation, respectively. We select a 7:3 ratio for training and testing data, which is a common choice that provides an adequate amount of data for model training while reserving a sufficient portion for performance evaluation. Based on reference [[30]], the number of hidden layer nodes of neural network can be determined as follows:

(16) $H i d = \sqrt{a_{1} I n^{2} + a_{2} I n O u t + a_{3} O u t^{2} + a_{4} I n + a_{5} O u t + a_{6}}$

where $H i d$ is the number of hidden layer nodes, $I n$ represents the number of input nodes, and $O u t$ denotes the number of output nodes. The number of input layer nodes determines the granularity at which the model extracts information from the raw data, which is used to capture important features of the data. The number of hidden layer nodes determines the ability of the LSTM model to process information internally. The more hidden layer nodes there are, the more complex features the model can learn, but it also increases the complexity and training difficulty of the model. The number of output layer nodes generally corresponds to the number of predicted values for each time step. The relationship between them is that the input layer converts the raw data into a format that the model can process, while the hidden layer is responsible for extracting and memorizing useful information from the input data and ultimately producing prediction results through the output layer. The calculation needs to be rounded to adjust $\vec{a}$ . In this article, the input is represented by the Q matrix, and the number of columns in the Q matrix is used as $I n$ to denote the number of input nodes. Since the input consists of a single predicted value, $O u t$ in this article is equal to 1. So the above formula can be simplified to

(17) $H i d = \sqrt{a_{1} I n^{2} + a_{2} I n + a_{3}}$

An epoch signifies the total count of instances where the model has comprehensively traversed the entire dataset throughout the training process. As the iteration count escalates, the LSTM model acquires additional opportunities to learn and refine its parameters from the training data. Typically, this implies that the model can adapt more precisely to the data and exhibit superior performance on the test set. Nevertheless, an excessive number of iterations can potentially induce overfitting in the model, manifesting as excellent performance on training data but a decline in performance on unseen data. The iteration count serves as a direct determinant of the model training duration. Initially, as the iteration count rises, the model's error often decreases swiftly, displaying a rapid convergence rate. However, as the model nears its optimal solution, further iterations may yield only minor performance improvements and, in some cases, may even result in performance fluctuations or degradation. Based on aforementioned discussion, the system determines the parameters in offline process to reduce the computing time in prediction. Based on the aforementioned discussion, the system determines the parameters in the offline process to reduce the computing time in prediction. The offline system contains three layers. The first layer is the LSTM layer and four parameters are considered: batch size, input dimensions, window length, and the number of hidden layer units. The LSTM layer, also known as the long short-term memory layer, is the core part of the LSTM model. Its main function is to process sequence data and capture long-term dependencies in the data. Through its internal memory unit and gating mechanism, the LSTM layer can effectively handle the long-term dependency and gradient vanishing problems in time series data, thereby improving the model's prediction and classification capabilities. The first parameter is the number of samples that are input into MVHS-LSTM at one time. The second parameter is the number of input dimensions in the same time series, which can be a multi-dimensional input. The third and fourth parameters have been described above. The second layer is the Flatten layer, the function of the Flatten layer is transform the multidimensional input into one-dimension, that is, to flatten the multidimensional feature maps into one-dimensional feature vectors. This usually occurs after the convolutional layer is used to convert the multi-dimensional feature map output by the convolutional layer into a format that can be processed by the fully connected layer. The Flatten layer flattens multidimensional data, making it easier for subsequent fully connected layers to process this data. The third layer is the Dense layer, also known as the fully connected layer, and is typically located after the LSTM or Flatten layer. Its main function is to weigh and transform the features output from the previous layer to generate the final output.

The basic idea of building the model is shown in Figure 1. Firstly, the algorithm imports the data and processes the traffic flow data and then normalizes the data by compressing them between 0 and 1. The data are divided according to the holdout method, which separates them into a training set and a test set based on a specified proportion. Then, the model undergoes training, which consists of three layers: the LSTM layer, Flatten layer, and Dense layer. The LSTM layer is a type of recurrent neural network layer designed for handling sequential data. It captures and remembers long-term dependencies in the data using memory cells and gates. The Flatten layer transforms the multidimensional output from the LSTM layer into a one-dimensional vector. The Dense layer connects each neuron to every neuron in the previous and following layers. At the end of the training, the algorithm checks whether the specified number of iterations is completed. If the number of iterations has not been reached, the training continues. The model is documented so that the test set can be used in the online system when the specified number of iterations is reached. Finally, the test set data are fed into the model to obtain the prediction results. These results are then denormalized to obtain the final predicted traffic flow data.

Algorithm 1 presents the pseudo code of the proposed dynamic traffic flow prediction based on multi-parameter MVHS-LSTM. Lines 1∼4 represent the preparation phase of the algorithm. In this phase, we initialize the matrix Q, normalize it, and divide it into a training set and a testing set. Additionally, we define two variables: $M i n P a r a$ , which is a list containing three variables, and $A R G_{m i n}$ , which represents the minimum equilibrium value obtained through Equation Lines 5∼16, which constitute the iterative algorithm. In each iteration, we examine whether the current descending value is smaller than the last three values. If it is, we consider this value as a local minimum. If not, it indicates that the value is still in a descending state, and the iteration continues. Lines 17∼18 correspond to the prediction phase. In this phase, we substitute the values of $M i n P a r a$ into the MVHS-LSTM model and predict the next traffic flow value.

Algorithm 1 Dynamic traffic flow prediction based on multi-heuristic parameter

Require: s traffic speed data: $Q_{11}$ to $Q_{s 1}$

Require: s traffic flow data: $Q_{12}$ to $Q_{s 2}$

1: Normalization traffic flow data and traffic speed data in $Q_{s}$

2: Divide $Q_{s}$ into training sets $Q_{1}$ − $Q_{m}$ and testing sets $Q_{m + 1}$ − $Q_{s}$

3: MinPara = [ $w w_{n}$ , $e p o_{n}$ , $r a t_{n}$ ]

4: $A R G_{m i n}$ = Equ(14)( $w w_{n}$ , $e p o_{n}$ , $r a t_{n}$ )

5: for MinPara = $s t a r t$ To $s t o p$ Step $s t e p$ do

6: Substituting $Q_{1}$ − $Q_{m}$ into MVHS-LSTM model

7: if $E q u {(14)}_{n}$ < $A R G_{m i n}$ then

8: for i = 0 To 3 do

9: if $E q u {(14)}_{n + i}$ > $A R G_{m i n}$ then

10: return MinPara

11: else

12: break

13: end if

14: end for

15: end if

16: end for

17: Put MinPara into MVHS-LSTM

18: Put $Q_{m + 1}$ − $Q_{z}$ into $M V H S - L S T M (E q u (8 - 13))$ to predict $Q_{z + 1}$

19: Anti-normalization

20: Output the predicted next traffic flow $Q_{z + 1}$

5. Performance Evaluation

We evaluate the proposed MVHS-LSTM in terms of both loss and time. The simulation environment and simulation results analysis is also discussed in this section. The evaluation aims to answer the following two questions: (1) Can the simulation results verify the theoretical analysis? and (2) Does the proposed algorithm outperform the comparing algorithms, i.e., the PSO-LSTM algorithm, Support Vector Machines algorithm and ARIMA algorithm?

5.1. Evaluation Environment

We utilized MATLAB version R2021a and employed the following toolboxes: Neural Network Toolbox, Deep Learning Toolbox, Signal Processing Toolbox, and Statistics and Machine Learning Toolbox. We conducted the computations on a system equipped with an Intel Core i7 processor, running at 2.6 GHz. The system also had 12 GB of RAM and was running on the Windows 10 operating system. The data selected in this experiment are open source data taken by Wu et al. from Fudan University using video detection method [[31]]. The data are traffic flow from 7:40 to 16:50 with a total of 28,044 data. These data are open source data based on real maps. The traffic flow data in the dataset represents the number of vehicles passing through a specific point on a road within a certain time period, while the vehicle speed data represents the average speed of vehicles passing through a specific road segment within a specific time interval. As shown in Figure 3, the road is the Yan' an Viaduct near the Shanxi Road in front of the Shanghai Exhibition Hall.

5.2. Evaluation Metrics

We choose performance indicators to evaluate the merits and demerits of the model. MAE, RMSE and MAPE are selected as metrics to evaluate the prediction performance.

(18) $M A E = \frac{1}{S i z} \sum_{i = 1}^{S i z} y_{i} - y$

(19) $R M S E = \sqrt{\frac{1}{S i z} \sum_{i = 1}^{S i z} {y_{i} - y}^{2}}$

(20) $M A P E = \frac{100 %}{S i z} \sum_{i = 1}^{S i z} \frac{y_{i} - y}{y}$

Here, $S i z$ is the sample size, $y_{i}$ is the predicted values via the involved approaches, and y is the actual value. MAE (Mean Absolute Error) measures the average magnitude of the differences between predicted and actual values. It provides a straightforward indication of the model's accuracy, where lower values indicate better performance. RMSE (Root-Mean-Squared Error) calculates the square root of the average of squared differences between predicted and actual values. It penalizes larger errors more heavily than MAE and is useful when larger errors have a significant impact on the model's performance. MAPE (Mean Absolute Percentage Error) measures the average percentage difference between predicted and actual values. It provides insights into the relative performance of the model, indicating the average magnitude of errors in terms of percentages.

5.3. Comparison Algorithms

This paper selects three comparison algorithms as the baseline, i.e., PSO-LSTM algorithms, ARIMA and the Support Vector Machine model (SVM). ARIMA relies on traditional mathematical and physical methods, while SVM and PSO-LSTM concentrate on neural networks or deep learning technologies. PSO-LSTM, specifically, employs the PSO algorithm to optimize hyperparameters for the LSTM method. Our approach shares similarities with PSO-LSTM as we also leverage similar techniques to enhance the performance of the LSTM model. SVM predicts by constructing an optimal hyperplane for classification or regression, while ARIMA predicts by modeling the autocorrelation and trend in time series data to forecast future values. PSO-LSTM predicts by combining the optimization capabilities of Particle Swarm Optimization (PSO) with the LSTM neural network to improve prediction accuracy.

5.3.1. Comparison Methods: PSO-LSTM

PSO-LSTM (Particle Swarm Optimization-Long Short-Term Memory) [[32]] uses particle swarm optimization to optimize some of the hyper parameters, so as to find the best performance point. Based on simulating the number of evolution and population size, the initial learning rate and the number of hidden layer neurons are updated as follows:

(21) $\begin{matrix} v_{I + 1} = v_{I} + c_{1} \times r a n d \times (p b e s t_{I} - L o c_{I}) \\ + c_{2} \times r a n d \times (g b e s t_{I} - L o c_{I}) \end{matrix}$

(22) $L o c_{I + 1} = L o c_{I} + v_{I}$

Here, $v_{I}$ is the velocity of the I-th particle. $c_{1}$ and $c_{2}$ are learning coefficients. $p b e s t_{I}$ denotes the best position the I-th particle has ever experienced. $g b e s t_{I}$ is the best position ever experienced by all particle, and $L o c_{I}$ is the location of the I-th particle.

PSO-LSTM uses the above formulae to calculate the particle velocity and then updates the particle position. The updates of population, individual optimal and population optimal are carried out to find the global optimal parameters, and then, LSTM learning is carried out.

The advantage of PSO-LSTM is that the parameters of LSTM can be automatically adjusted and optimized without manual search. However, the parameter adjustment of PSO-LSTM can only adjust the two important parameters, which leads to a long adjustment time.

5.3.2. Comparison Methods: ARIMA Model

The essence of the ARIMA algorithm [[28]] is to identify trends, seasonal and periodic patterns in the data and then extract the rule information from the data layer by layer, so that the data are free of patterns or noise.

The ARIMA model is used as the comparison algorithm to conduct a comparative experimental study on the time series model and the algorithm. ARIMA model can be divided into three parts, AR (Auto Regressive model), MA (Moving Average model) and I (Integrated), and their formulas are

(23) $A R : Y_{r} = μ + ε_{r} + \sum_{i = 1}^{p} γ_{i} Y_{r - i}$

(24) $M A : Y_{r} = μ + ε_{r} + \sum_{i = 1}^{q} θ_{i} ε_{r - i}$

Here, $Y_{r}$ is the current value, $Y_{r - i}$ is the historical value, $μ$ is the constant term, $γ_{i}$ is the autocorrelation coefficient, and $θ_{i}$ is the error term. The AR formula is used to describe the relationship between the current value and the historical value and uses historical time data of variables to predict, while the $M A$ formula is concerned with the accumulation of the error terms in the autoregressive model. I is the difference method, and ARIMA is an ARMA model after the difference method to ensure the stability of the value.

The ARIMA model contains three parameters, i.e., p, d, and q. p represents the lag number of the time series data itself used in the prediction model. d represents the order of difference required for time series data to be stable. q denotes the lag number of prediction error used in the prediction model.

5.3.3. Comparison Methods: SVM Model

The core idea of SVM [[33]] is to map the sample vector to an N-dimensional feature space by selecting a kernel function and then construct an optimal decision function in this feature space. This idea can be expressed as follows:

(25) $\begin{matrix} m i n (\frac{1}{2} ∥ ψ ∥^{2} + λ \sum_{i = 1}^{Z} ξ_{i}) \\ s . t . \begin{matrix} u_{i} (ψ^{T} k_{i} + B) \geq 1 - ξ_{i} \\ ξ_{i} \geq 0 \\ i = 1, 2, ..., Z \end{matrix} \end{matrix}$

where $ω$ is weight vector, $λ$ is the penalty coefficient, and $ξ_{i}$ is slack variables. RBF (Radial Basis Function) is selected as the kernel function in our evaluation, which is defined as follows:

(26) $K (k, u) = e^{- γ ∥ k - u ∥}$

We calculate the values of $λ$ and $γ$ by the iterative method and substitute the predicted values into the evaluation function to obtain the best evaluation results.

5.4. Performance Analysis of Our Proposed Method

There are many variables that may affect the traffic flow, namely, the number of lanes, speed and distance between vehicles. The correlation between variables of multiple regression analysis and traffic flow is obtained by computation, and the variables with low correlation are deleted to reduce the amount of computation. Therefore, there are comparisons between one variable and three variables. The R-squared and test results are shown as the following Table 2.

According to Equation (7), it is obvious that the correlation between the vehicle speed and traffic flow is much higher than the others, and the R-squared is less than 0.25. The highest is 0.1383, which is Speed+Lanes+Veh Distance. Therefore, in preprocessing of the traffic flow, the vehicle speed is firstly calculated as a conclusive variable affecting the traffic flow for data preprocessing, and then, the following operations are performed. We ran it five times for each set of experiments, following a step-by-step procedure for each experiment, and obtained the average values as the results. During the experiments on each baseline, we adjust the parameters to appropriate sizes to ensure that they do not overfit or underfit. The variables are brought into the MVHS-LSTM model, and epoch, learning rate and window length are iterated through the heuristic algorithm, and then, the following results are, respectively, obtained to obtain their effects on MAPE and time. The error and time data are normalized to obtain an intersecting point, and the nearby point is the best neutralization point. We take the midpoint data as the final result. Users can make a humanized adjustment on those points.

Figure 4 shows the MAPE and time with an increasing number of iterations with other variables held constant. As the model is trained to be more suitable for data with the increase in iteration. So with the same learning rate and window length, the model becomes progressively better for the learned data as the number of iterations increases. It results in a decrease in the value of MAPE and an increase in the computing time required for learning. When the number of iterations was 1000, the MAPE was 0.0941, and the running time was 187.88 s. However, when the number of iterations increased from 1000 to 2000 and 4000, the MAPE decreased by 29.8% and 33.2%, and the running time increased by 193% and 492.8%. It can be seen that as the number of iterations increases, the error decreases while the learning time increases very quickly, but consuming 300% of the time to reduce the error by 3.4%. It is not worth wasting abundant computing time to acquire the tiny relative improvement in prediction accuracy in practice. As the number of epochs increases, the accuracy (MAPE) improves and becomes smaller, but the computation time increases. This is because the increase in the number of iterations leads to more learning cycles, resulting in improved learning capability. However, this improvement in learning comes at the cost of increased computational time. During the training process, we substitute the calculation error and time into the Formulas (14) and (15), and we obtain that when the epoch value is 1500, the ARG value is the smallest. Therefore, we choose the epoch of 1500 as our final result.

Figure 5 shows that MAPE increases with the learning rate when other variables remain unchanged. The learning rate is a hyperparameter that guides us how to adjust the network weight through the gradient of the loss function. The lower the learning rate, the slower the loss function changes. Although using a low learning rate can ensure that the algorithm does not miss any local minima, it also means that we will take longer to converge, especially in the case of being trapped in a plateau area. In this paper, the loss and running time of the learning rate in the range of $1 \times 10^{- 5}$ to $1 \times 10^{- 4}$ are selected, and it can be seen that the error shows a sharp downward trend from $1 \times 10^{- 5}$ to $5 \times 10^{- 4}$ with little change in running time. The error shows a slow declining trend from $6 \times 10^{- 5}$ to $1 \times 10^{- 4}$ . Though the trend is not obvious, the running time increases sharply. When the learning rate reaches a certain threshold, the accuracy does not show significant deterioration and tends to stabilize. However, the learning time increases dramatically. This indicates that the model has reached a point where further adjustments to the learning rate may not have a noticeable impact on accuracy improvement. The model's training progress slows down as it requires more iterations to converge. It is important to strike a balance between accuracy and training time, finding the optimal learning rate that achieves a satisfactory level of accuracy without excessively prolonging the training process. During the training process, we substitute the calculation error and time into the Formula (14) and (15), and we obtain that when the learning rate value is $4 \times 10^{- 5}$ , the ARG value is the smallest. Therefore, we choose the epoch of 1500 as our final result. So we choose a learning rate of $4 \times 10^{- 5}$ as our final result.

Figure 6 shows MAPE performance with window length when other variables remain unchanged. The window length represents the time step of a single input of data in each training batch. The larger the value of the window length, the greater the amount of data per learning. In this paper, the loss and running time in the range of window lengths of 2 to 18 are selected. It can be seen that the error of the window length decreases rapidly from 2 to 6, and MAPE decreases by 27.3% on average for every 2 increase in the window length. When the window length is from 6 to 18, MAPE decreases by 4.5% on average. The computing time increases rapidly in the window length from 2 to 6. The calculation time increases by an average of 5.8% for every 2 increase in the window length, while the calculation time increases by an average of 1.7% when the window length is from 6 to 18. During the training process, we substitute the calculation error and time into the Formula (14) and (15), and we obtain that when the window length value is 6, the ARG value is the smallest. So we choose a window length of 6 as our final result. According to our simulation results, if the window length is too short, each input contains less information, resulting in less long-term feature information captured by the LSTM. However, on the other hand, the analysis time is shorter. Conversely, if the window length is too long, each input contains more information, leading to more long-term feature information captured by the LSTM. However, the analysis time will increase.

In summary, the model not only realizes the heuristic selection of parameters and variables but also gives the comprehensive value considering both MAPE and computing time. We verify our results in the system model and algorithm design, and the model can match the dynamic topology in practice road network.

5.5. Comparison Performance of Methods

In order to evaluate our proposed methods' advantage further, the simulation compares the performance among three methods, i.e., SVM, PSO-LSTM and ARIMA models. These methods have been applied to various sequence data prediction tasks and satisfactory results have been obtained. In comparison to the baseline algorithms, we use the same dataset to make predictions and evaluate the performance of our algorithm. Multiple algorithms are employed to perform predictions on the dataset, and the evaluation is carried out using the same evaluation function. Additionally, we also perform optimization on the baseline algorithms to ensure a fair comparison. Table 3 shows the comparison results of prediction accuracy between MVHS-LSTM and other models.

The MAPE score is particularly important for the performance of the indicator model. MAPE is used to detect the average absolute percentage error of each prediction model and the sample-observed traffic flow. Furthermore, for different sections of the data, the MAE data are not accurate, and MAPE can better reflect the average prediction accuracy. As shown in Table 3, the MAPE score of our model is 5.8102%, which is 10.4% lower than the ARIMA model, 13.1% lower than the SVM model, and 1.3% lower than the PSO-LSTM model. Our model does not exceed the SVM and ARIMA model in computing time. However, compared to the ARIMA model, our model is 37.2268 lower in MAE, 58.5902 lower in RMSE, 0.6715% lower in MAPE, and only 175 s longer in computing time. Compared to the PSO-LSTM model, although the evaluation metrics are slightly lower than our model, the computing time of the PSO-LSTM model is 513.4% longer than our model, which is unacceptable. Therefore, our model has outstanding comprehensive performance among several comparison models.

Figure 7, Figure 8, Figure 9 and Figure 10 are comparison charts of the traffic flow prediction effect.

It can be seen from the chart that the ARIMA model utilizes differencing methods, and its predicted values consistently fall between the actual data points. Overall, the predictions are relatively stable, but they lack short-term forecasting accuracy, and while the model captures the general trend reasonably well, most of the values are overly averaged, failing to capture the short-term characteristics of traffic flow and resulting in fewer fluctuations or spikes in the predictions. The SVM model focuses primarily on short-term characteristics and lacks the ability to learn from long time series data. As observed, the predictions in the first half of the sequence are relatively accurate. However, in the latter half of the predictions, the SVM method gradually deviates from the original data and tends to be positioned above the overall trend. With the passage of time, the predicted values may become increasingly inaccurate. The PSO-LSTM model utilizes certain hyperparameters from particle swarm optimization to find the optimal performance point. However, it is associated with long computation times, making it unsuitable for studying real-world traffic flow data. The MVHS-LSTM model not only effectively predicts peak and valley values, but it also demonstrates improved performance. It boasts a low computation time, making it well suited for practical applications.

The update time of our online system depends on the frequency at which the traffic management system transmits data. It is determined by taking the maximum value between the time required for training and prediction in our system and the time interval at which the traffic management system updates its data. Our offline system is responsible for training the data, and based on experimental simulation results, we have obtained an average time of 366 s, which translates to approximately 6–7 min for completing the training and prediction process. If the traffic management system transmits data to us within a time interval of less than 6 min, the update time will be set to 6 min. On the other hand, if the traffic management system transmits data at intervals longer than 6 min, the update time will be equal to the duration of that data transmission cycle.

6. Conclusions

This paper aims to address the challenge of achieving a balance between accuracy and computing time in traffic flow prediction. The proposed enhanced LSTM model, known as MVHS-LSTM, employs a three-layer architecture to analyze traffic flow variables and MVHS-LSTM hyperparameters using an offline system. Taking into account actual road factors, the variables are carefully selected to determine the validity of traffic flow information and ensure data harmonization. The paper uses the method of multiple regression to calculate the R-squared and selects the most suitable variable for learning from multiple variables through formulas as the input variable for learning. This solved the problem of the excessive training burden caused by too many input variables in identified challenges. In terms of innovation for LSTM models, the paper uses the iterative method for each parameter, substitutes the calculation error and time into the formula to obtain the minimum value and selects it as the input for the variable. The MVHS-LSTM model undergoes iterative refinement to optimize hyperparameters and establish an equilibrium point between model error and computational time. By implementing the strategy of ending iterations in advance, compared to the PSO-LSTM model, there is no need for a too long training time, which saves the time for model learning and training. Furthermore, multiple sets of comparative tests were conducted, comparing the performance of the MVHS-LSTM model with the ARIMA, SVM, and PSO-LSTM models. The MAPE of the MVHS-LSTM model is 5.8102%, which is 10.4% lower than the ARIMA model, 13.1% lower than the SVM model, and 1.3% lower than the PSO-LSTM model. At the same time, the calculation time is also reduced by 513.4% compared to the PSO-LSTM model. The results demonstrate that our approach achieves a harmonious balance between prediction accuracy and computational time, enabling rapid responsiveness to real-time traffic flow variations and carrying practical significance for future applications. By analyzing and predicting data on traffic flow, congestion patterns, and road usage, traffic management departments can better optimize traffic signals, route planning, and traffic control strategies, thereby alleviating congestion, reducing traffic accident risks, and improving the overall traffic operation efficiency of roads. At present, this paper only focuses on predicting traffic flow on a single road segment. Subsequent research can synchronize predictions in space based on the topology map of the road segment, thereby radiating to the entire city and bringing new thinking and innovation to urban-level traffic prediction. At the same time, this paper does not consider the spatial relationship between road segments, specifically the external spatial characteristics of traffic flow data. In future research, we will further consider the spatial relationships between road segments within the traffic flow data, aiming to enhance the theoretical framework.

Figures and Tables

Graph: Figure 1 Overall working flow chart.

Graph: Figure 2 The structure of the LSTM memory block.

Graph: Figure 3 Illustration of evaluation scenario: Yan' an Viaduct near the Shaanxi Road in front of the Shanghai Exhibition Hall.

Graph: Figure 4 The degree of epochs affecting MAPE and time.

Graph: Figure 5 The degree of learning rate affecting MAPE and time.

Graph: Figure 6 The degree of window length affecting MAPE and time.

Graph: Figure 7 ARIMA model prediction effect chart.

Graph: Figure 8 SVM model prediction effect chart.

Graph: Figure 9 PSO-LSTM model prediction effect chart.

Graph: Figure 10 MVHS-LSTM model prediction effect chart.

Table 1 Summary of the main mathematical notations.

Notation	Description
Q	traffic flow data
s	the row of Q and the number of data for each variable
z	the column of Q and number of independent variables
$β_{i}$	regression coefficient
$δ$	disturbance term
$\hat{Q}$	the column of Q and number of independent variables
L	the sum of the squares of the differences between the fitted values and the actual values
$\bar{Q}$	mean value of real traffic data
j,p,m	the variables in the Q
k	selected variables
$C_{t}$	long-term memory cell
$h_{t}$	short-term memory cell
$f_{t}$	forget gate
${\hat{C}}_{t}$ , $i_{t}$	input gate
$O_{t}$	output gate
$W_{f}$ , $W_{i}$ , $W_{c}$ , $W_{o}$ , $b_{f}$ , $b_{i}$ , $b_{c}$ , $b_{o}$	weight value of gates
$e p o$	completed one iteration of all batches
$w w$	window length
$r a t$	the rate of learning
$C o s t$	the cost of model
$L o s s$	the loss of model
$t h r$	threshold of epoch
$I n$	the number of input nodes
$O u t$	the number of output nodes
$H i d$	the number of hidden layer nodes

Table 2 R-squared comparison among variables.

	Speed	Lanes	VEH-Distant	Speed+Lanes+Veh-Distant
R-squared	0.1362	0.0067	0.0015	0.1383

Table 3 Results of evaluation function between methods.

	ARIMA	SVM	PSO-LSTM	MVHS-LSTM
MAE	110.8981	78.0017	71.7587	73.6713
RMSE	151.6354	82.8986	91.3037	93.0452
MAPE	6.4817%	6.6848%	5.8874%	5.8102%
Time/s	191	256	1879	366

Author Contributions

C.G. designed the system model and revised the manuscript. J.Z. drafted the manuscript, evaluated the experiments, and drew the figures. X.W. supervised the manuscript and improved the architecture. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes 1 Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. References TOMTOMAvailable online: https://www.tomtom.com/en%5fgb/traffic-index/(accessed on 26 March 2024) 2 Ye D., Wen J., Zheng S., Zhong Q., Pei W., Jia H., Zhou C., Gong Y. Prediction of Key Parameters of Wheelset Based on LSTM Neural Network. Appl. Sci. 2023; 1311935. 10.3390/app132111935 3 Pun L., Zhao P., Liu X. A Multiple Regression Approach for Traffic Flow Estimation. IEEE Access. 2019; 7: 35998-36009. 10.1109/ACCESS.2019.2904645 4 Xuan A., Yin M., Li Y., Chen X., Ma Z. A comprehensive evaluation of statistical, machine learning and deep learning models for time series prediction. Proceedings of the 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA). Riyadh, Saudi Arabia. 1–3 March 2022: 55-60. 10.1109/CDMA54072.2022.00014 5 Gültekin T., Uğur A. An iterative dynamic ensemble weighting approach for deep learning applications. Proceedings of the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP). Malatya, Turkey. 16–17 September 2017: 1-4. 10.1109/IDAP.2017.8090318 6 Run L., Min L.X., Lu Z.X. Research and Comparison of ARIMA and Grey Prediction Models for Subway Traffic Forecasting. Proceedings of the 2020 International Conference on Intelligent Computing, Automation and Systems (ICICAS). Chongqing, China. 11–13 December 2020: 63-67. 10.1109/ICICAS51530.2020.00020 7 Vidya G., Hari V., Shivasagaran S. Estimation of Passenger Flow in a Bus Route using Kalman Filter. Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). Coimbatore, India. 6–7 March 2020: 1248-1251. 10.1109/ICACCS48705.2020.9074363 8 Feng X., Ling X., Zheng H., Chen Z., Xu Y. Adaptive Multi-Kernel SVM with Spatial–Temporal Correlation for Short-Term Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2019; 20: 2001-2013. 10.1109/TITS.2018.2854913 9 Arif M., Wang G., Chen S. Deep Learning with Non-parametric Regression Model for Traffic Flow Prediction. Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech). Athens, Greece. 12–15 August 2018: 681-688. 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00120 Zhou X., Li C., Liu Z., Luan T.H., Miao Z., Zhu L., Xiong L. See the near future: A short-term predictive methodology to traffic load in ITS. Proceedings of the 2017 IEEE International Conference on Communications (ICC). Paris, France. 21–25 May 2017: 1-6. 10.1109/ICC.2017.7996800 Huang B., Yun B. Traffic Flow Prediction Based on BRNN. Proceedings of the 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (ICEIEC). Beijing, China. 12–14 July 2019: 320-323. 10.1109/ICEIEC.2019.8784513 Wu S. Spatiotemporal Dynamic Forecasting and Analysis of Regional Traffic Flow in Urban Road Networks Using Deep Learning Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2022; 23: 1607-1615. 10.1109/TITS.2021.3098461 Hu Y. Research on City Traffic Flow Forecast Based on Graph Convolutional Neural Network. Proceedings of the 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). Nanchang, China. 26–28 March 2021: 269-273. 10.1109/ICBAIE52039.2021.9389951 Cheng Z., Lu J., Zhou H., Zhang Y., Zhang L. Short-Term Traffic Flow Prediction: An Integrated Method of Econometrics and Hybrid Deep Learning. IEEE Trans. Intell. Transp. Syst. 2021; 23: 5231-5244. 10.1109/TITS.2021.3052796 Lin K.Y., Liu P.Y., Wang P.K., Hu C.L., Cai Y. Predicting Road Traffic Risks with CNN-and-LSTM Learning Over Spatio-Temporal and Multi-Feature Traffic Data. Proceedings of the 2023 IEEE International Conference on Software Services Engineering (SSE). Chicago, IL, USA. 2–8 July 2023 Kona S., Morthala S.V.K.R., Konathala R., Pinninti P.K., Mavuru H.K., Maria A. An Efficient Key Agreement Furthermore, Anonymous Mutual Authentication Protocols For Secure Communication In VANETs. Proceedings of the 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC). Chennai, India. 22–23 April 2022: 146-151. 10.1109/ICESIC53714.2022.9783551 Guo C., Li D., Chen X., Zhang G. An adaptive V2R communication strategy based on data delivery delay estimation in VANETs. Veh. Commun. 2022; 34: 100444. 10.1016/j.vehcom.2021.100444 Guo C., Li D., Zhang G., Ding X., Curtmola R., Borcea C. Dynamic Interior Point Method for Vehicular Traffic Optimization. IEEE Trans. Veh. Technol. 2020; 69: 4855-4868. 10.1109/TVT.2020.2983434 Guo C., Li D., Zhang G., Zhai M. Real-Time Path Planning in Urban Area via VANET-Assisted Traffic Information Sharing. IEEE Trans. Veh. Technol. 2018; 67: 5635-5649. 10.1109/TVT.2018.2806979 Hyun S.H., Kim J., Kim K., Cho W., Kim S.C. Joint Estimation of Vehicle's Position and Velocity with Distributed RSUs for OFDM Radar System. Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference. Rio de Janeiro, Brazil. 4–8 December 2022: 741-746. 10.1109/GLOBECOM48099.2022.10000753 Hoc H.T., Silhavy R., Prokopova Z., Silhavy P. Comparing Multiple Linear Regression, Deep Learning and Multiple Perceptron for Functional Points Estimation. IEEE Access. 2022; 10: 112187-112198. 10.1109/ACCESS.2022.3215987 Inghelbrecht G., Pintelon R., Barbe K. Large-Scale Regression: A Partition Analysis of the Least Squares Multisplitting. IEEE Trans. Instrum. Meas. 2020; 69: 2635-2647. 10.1109/TIM.2019.2925880 Sheng H., Zhang Y., Wu Y., Wang S., Lyu W., Ke W., Xiong Z. Hypothesis Testing Based Tracking with Spatio-Temporal Joint Interaction Modeling. IEEE Trans. Circuits Syst. Video Technol. 2020; 30: 2971-2983. 10.1109/TCSVT.2020.2988649 Lin J., Li H., Liu N., Gao J., Li Z. Automatic Lithology Identification by Applying LSTM to Logging Data: A Case Study in X Tight Rock Reservoirs. IEEE Geosci. Remote Sens. Lett. 2021; 18: 1361-1365. 10.1109/LGRS.2020.3001282 Li Z., Peng Y., Li J., Tang Z. Composite Foundation Settlement Prediction Based on LSTM—Transformer Model for CFG. Appl. Sci. 2024; 14732. 10.3390/app14020732 Wang C., Qiao J. Construction Project Cost Prediction Method Based on Improved BiLSTM. Appl. Sci. 2024; 14978. 10.3390/app14030978 Jin S., Jang A., Lee D., Kim S., Shin M., Do S.L. Development of Virtual Sensor Based on LSTM-Autoencoder to Detect Faults in Supply Chilled Water Temperature Sensor. Appl. Sci. 2024; 141113. 10.3390/app14031113 Kumaresan K., Ganeshkumar P. Software reliability prediction model with realistic assumption using time series (S)ARIMA model. J. Ambient. Intell. Humaniz. Comput. 2020; 11: 5561-5568. 10.1007/s12652-020-01912-4 Zhou Z.H. Machine Learning; Springer: Berlin/Heidelberg, Germany. 2021: 27-31. 10.1007/978-981-15-1967-3 Mirchandani G., Cao W. On hidden nodes for neural nets. IEEE Trans. Circuits Syst. 1989; 36: 661-664. 10.1109/31.31313 University F. Available online: http://traflow.fudan.edu.cn/index.html(accessed on 26 March 2024) Gundu V., Simon S.P. PSO–LSTM for short term forecast of heterogeneous time series electricity price signals. J. Ambient. Intell. Humaniz. Comput. 2021; 12: 2375-2385. 10.1007/s12652-020-02353-9 Anitha P., Kaarthick B. Retraction Note to: Oppositional based Laplacian grey wolf optimization algorithm with SVM for data mining in intrusion detection system. J. Ambient. Intell. Humaniz. Comput. 2023; 14: 37. 10.1007/s12652-022-03931-9

By Chang Guo; Jianfeng Zhu and Xiaoming Wang

Reported by Author; Author; Author

Titel:	MVHS-LSTM: The Comprehensive Traffic Flow Prediction Based on Improved LSTM via Multiple Variables Heuristic Selection
Autor/in / Beteiligte Person:	Guo, Chang ; Zhu, Jianfeng ; Wang, Xiaoming
Link:	Volltext (PDF) View record in DOAJ (Volltext) https://www.mdpi.com/2076-3417/14/7/2959 https://doaj.org/toc/2076-3417
Zeitschrift:	Applied Sciences, Jg. 14 (2024-03-01), Heft 7, S. 2959-2959
Veröffentlichung:	MDPI AG, 2024
Medientyp:	academicJournal
ISSN:	2076-3417 (print)
DOI:	10.3390/app14072959
Schlagwort:	traffic prediction heuristic selection LSTM deep learning Technology Engineering (General). Civil engineering (General) TA1-2040 Biology (General) QH301-705.5 Physics QC1-999 Chemistry QD1-999
Sonstiges:	Nachgewiesen in: Directory of Open Access Journals Sprachen: English Collection: LCC:Technology ; LCC:Engineering (General). Civil engineering (General) ; LCC:Biology (General) ; LCC:Physics ; LCC:Chemistry Document Type: article File Description: electronic resource Language: English

Klicken Sie ein Format an und speichern Sie dann die Daten oder geben Sie eine Empfänger-Adresse ein und lassen Sie sich per Email zusenden.

BibTeX Citavi, JabRef, u.a.
(Literaturverwaltung)

PDF kein Volltext!
(Merkzettel, Notizen)

RIS Endnote, Citavi u.a.
(Literaturverwaltung)

MODS
(XML zur Weiterverarbeitung)

oder

Wählen Sie das für Sie passende Zitationsformat und kopieren Sie es dann in die Zwischenablage, lassen es sich per Mail zusenden oder speichern es als PDF-Datei.

Gewünschter Zitations-Stil:

oder

Bitte prüfen Sie, ob die Zitation formal korrekt ist, bevor Sie sie in einer Arbeit verwenden. Benutzen Sie gegebenenfalls den "Exportieren"-Dialog, wenn Sie ein Literaturverwaltungsprogramm verwenden und die Zitat-Angaben selbst formatieren wollen.