Deep belief networks (DBNs), a deep learning technology, have been successfully used in many fields. However, the structure of a DBN is difficult to design for different datasets. Hence, a DBN structure design algorithm based on information entropy and reconstruction error is proposed. Unlike previous algorithms, ours innovatively combines network depth and node number and optimizes them simultaneously. First, the mathematical model of the structural design problem is established, and a boundary constraint on the node number based on information entropy is derived by introducing the idea of information compression. Moreover, an optimization objective for network performance based on reconstruction error is proposed by deriving the fact that network energy is proportional to reconstruction error. Finally, an improved simulated annealing (ISA) algorithm is used to adjust the DBN layers and nodes simultaneously. Experiments were carried out on three public datasets (MNIST, Cifar-10, and Cifar-100). The results show that the proposed algorithm can design a proper structure for each dataset, yielding a trained DBN with the lowest reconstruction error and prediction error rate. The proposed algorithm shows the best performance among the compared algorithms and can be used to assist in setting DBN structural parameters for different datasets.
Keywords: deep learning; DBN; artificial intelligence; structure design; information entropy; reconstruction error; improved simulated annealing algorithm
A deep belief network (DBN) is a kind of deep artificial neural network (ANN) [[
The performance of a DBN is closely related to its structure. A simple structure can improve the convergence speed, but it may lead to problems such as low training precision and large prediction error. A complex structure can improve the training precision, but it can easily lead to non-convergence or over-fitting. In engineering practice, experience or trial-and-error methods are often used in traditional ANN structure design [[
Given the above problems, some researchers have studied DBN structure design. In terms of network depth, Pan et al. proposed a method based on the inferred correlation among network energy, network performance, and depth [[
Previous studies have preliminarily explored design methods for the DBN structure, but they address only a single aspect of the structure, either the network depth or the number of nodes, or they do not fully consider the unsupervised training process of the DBN. In fact, network performance is determined by both aspects. The two parameters are coupled and hence influence each other: the optimal depth depends on the node-selection strategy, and the optimal number of nodes depends on the depth-optimization strategy. If the depth decision and node optimization are carried out while ignoring the organic correlation between them, it is difficult to obtain a good network structure. Therefore, to improve the performance of a DBN by changing its structure, we need a DBN structure design algorithm that simultaneously and organically combines network depth and node number.
Hence, this paper proposes a DBN structural design algorithm based on information entropy and reconstruction error. The algorithm innovatively combines the network depth and number of nodes into a unified mathematical model, introduces information entropy and reconstruction error, and uses the ISA algorithm to solve the optimization problem. First, using information compression and the distribution characteristics of the sample, a bound on the number of hidden-layer neurons based on information entropy is derived. In addition, the positive correlation between reconstruction error and network energy is proved, and an optimization model that minimizes the reconstruction error is constructed. Then, this paper employs the ISA algorithm to solve for the network depth and node number while training the network. The experimental results show that this algorithm can generate a network structure that is adapted to different datasets. Moreover, the constructed DBN has lower reconstruction and root-mean-square errors in the training process, as well as a low prediction error rate in the test process.
The DBN structure is determined by the number of layers and the number of nodes (or neurons) contained in each layer. Therefore, to adjust the structure, it is essential to automatically solve for the optimal number of layers and nodes for each dataset. From the perspective of mathematical modeling, this problem can be expressed as an optimization in the solution space formed by all feasible DBN structures. Thus, for the general optimization model, the problem can be mathematically expressed in the framework of an objective function and constraint conditions as follows:
min f(x), s.t. g_i(x) ≤ 0, i = 1, 2, …, m
where f(x) is the objective function and g_i(x) are the constraint functions. In our model, these two components are:
- The constraint: the range of the number of hidden-layer neurons, derived from information entropy.
- The objective: the network performance, measured by reconstruction error.
Hence, the DBN structure optimization model is constructed as follows:
(
Here, C represents the DBN structure and
The DBN consists of multiple layers of neurons, where every two adjacent layers form one RBM, as shown in Figure 1. Each RBM has a bipartite graph structure. According to the input and output, the neurons are divided into a visible layer and a hidden layer. Neurons are connected only between layers; there are no intra-layer connections. Each layer of neurons can serve both as the hidden layer of the current RBM and as the visible layer of the next RBM. Therefore, a DBN can be regarded as a deep network in which multiple RBMs are stacked.
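As an illustration of this stacked view, the following minimal sketch (hypothetical, not the paper's implementation) propagates a visible vector up through a stack of RBM weight matrices, where each hidden output becomes the next visible input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn_forward(v, weights, biases):
    """Propagate a visible vector up through stacked RBMs: the hidden
    output of each RBM serves as the visible input of the next."""
    h = v
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)  # inter-layer connections only, none within a layer
    return h

# A toy structure [6, 4, 2]: two stacked RBMs.
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.01, (6, 4)), rng.normal(0, 0.01, (4, 2))]
biases = [np.zeros(4), np.zeros(2)]
top = dbn_forward(rng.random(6), weights, biases)
print(top.shape)  # (2,)
```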
The process of transferring data from the visible layer to the hidden layer in an RBM is a dimensionality-reducing feature extraction process [[
Based on the idea of information compression, when determining the number of hidden-layer nodes, it must be ensured that the maximum amount of information the hidden-layer output vector can store is greater than or equal to the amount of information carried by the visible-layer input data, so that information is transferred losslessly. Otherwise, information is inevitably lost, ultimately reducing overall network performance. Therefore, this paper employs information entropy as the criterion for determining the number of hidden-layer nodes.
Information entropy, proposed by Shannon, is a measure of information quantity. In a physical sense, it refers to the uncertainty of the received signal. The information entropy of a single character is calculated as:
H = −Σ_{j=1}^{J} p_j log2(p_j)
where H is the information entropy, J is the number of distinct characters, and p_j is the probability of occurrence of the j-th character.
Equation (
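To make the entropy criterion concrete, here is a small self-contained sketch (not from the paper) that estimates the Shannon entropy of a symbol sequence from its empirical frequencies:

```python
import numpy as np
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy H = -sum_j p_j * log2(p_j) over the J distinct symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * np.log2(c / total) for c in counts.values())

# A uniform 4-symbol source carries log2(4) = 2 bits per symbol.
print(shannon_entropy(list("ABCDABCDABCD")))  # 2.0
```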
Let the number of visual layer nodes be
(
Further, let the number of hidden layer nodes be
(
Because the maximum amount of information that the hidden-layer output vector can store must be greater than or equal to the amount of information carried by the input data of the visible layer, we obtain:
(
From Equations (
(
Obviously, Equation (
(
To obtain a more reasonable network, the maximum number of neurons in each hidden layer is defined according to [[
(
From Equation (
(
From the above analysis, we hence obtain Conclusion 1, and the range of the number of hidden layer nodes based on information entropy is
To optimize the network structure, we need to introduce an index that can reflect the performance of a DBN. According to [[
Network energy is an important index for judging the performance of a feedback network; its value is inversely proportional to network performance.
Network energy is calculated as:
(
Here, L represents the network energy, T represents the total number of training samples, W represents the weight matrix,
Therefore, in theory, network energy can be used as the optimization objective. However, its computational complexity is high, which may lead to impractically long computation times and memory overflow. Hence, in this paper, based on [[
The reconstruction error is the difference between the samples obtained by Gibbs sampling and the original data. The reconstruction error R is calculated as:
(
Here,
(
(
Here,
(
In RBMs, we use
(
Because
(
Because
(
Combining Equation (
(
Here
(
Moreover, according to Equations (
(
This demonstrates that the reconstruction error has a positive correlation with the network energy. The computational complexity of Equations (
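The reconstruction-error objective can be sketched as follows. Since the paper's exact norm and averaging were not recoverable here, this toy version assumes a summed per-sample L2 distance between each input and its one-step mean-field reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(V, W, b, c):
    """Sum over samples of the L2 distance between each input v and its
    one-step Gibbs reconstruction (mean-field probabilities are used in
    place of stochastic samples; an assumed, simplified form)."""
    H = sigmoid(V @ W + c)        # visible -> hidden probabilities
    V_rec = sigmoid(H @ W.T + b)  # hidden -> visible reconstruction
    return np.sum(np.linalg.norm(V - V_rec, axis=1))

V = rng.random((10, 6))                      # 10 toy samples, 6 visible units
W = rng.normal(0, 0.01, (6, 4))              # small random weights
err = reconstruction_error(V, W, np.zeros(6), np.zeros(4))
print(err)
```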
For the optimization model established in Section 2, a suitable algorithm can be adopted. The simulated annealing (SA) algorithm has many advantages [[
The SA algorithm is a general probabilistic search algorithm that simulates the annealing process of solids in physics. It has a fast search speed and an excellent global search ability. The core concept of SA is to construct a state-transition probability matrix and update the current solution according to it. The probability of a transition from state 1 to state 2 is:
P_{1→2} = 1, if E_2 ≤ E_1; P_{1→2} = exp(−(E_2 − E_1)/T), if E_2 > E_1
Here, E_1 and E_2 are the energies of the two states and T is the current temperature.
In addition, let
(
Here,
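The state-transition and cooling machinery common to SA-style algorithms can be sketched as below; this is the standard Metropolis rule with geometric cooling, not the paper's specific ISA modifications:

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Accept an improving move outright; accept a worsening move with
    probability exp(-delta_e / T), which shrinks as T cools."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def cool(t, alpha=0.9):
    """Geometric cooling schedule T_{k+1} = alpha * T_k (0 < alpha < 1)."""
    return alpha * t

random.seed(0)
t = 1.0
# At T = 1, a move that worsens the energy by 0.5 is accepted with
# probability exp(-0.5) ~ 0.61.
accepted = sum(metropolis_accept(0.5, t) for _ in range(1000))
print(accepted)
```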
The traditional SA algorithm has some disadvantages, such as sensitive parameters, poor convergence performance, and a tendency to fall into local optima. Therefore, according to [[
To study DBN structure design based on the ISA algorithm, two lemmas are introduced.
Lemma 2. The fitting accuracy of the network increases as the number of network layers increases, when the number of training samples is sufficient [[
Lemma 3. Increasing network depth can improve network performance more effectively than increasing network width [[
Combining Conclusions 1 and 2, we obtain the following three Rules.
- 1. The internal energy of the solution in the ISA algorithm is equal to the reconstruction error of the RBM at the highest level of the DBN.
From Conclusion 2 and Lemma 2, we obtain that the reconstruction error of the topmost RBM reflects the upper bound of the performance of the whole network structure, which is the optimization goal of the model. Hence, we obtain a second rule.
- 2. The candidate new value of the number of nodes in a layer is randomly generated, and the state update follows Equation (22).
The number of nodes
(
Hence, we obtain the following equation:
(
According to the Metropolis rules, if
- 3. The number of layers increases monotonically from simple to complex.
According to Lemma 3, the effect of the upper-layer nodes on performance is much higher than that of the lower-layer nodes, so the complexity of the network structure is gradually increased in a layer-by-layer manner. The number of nodes in the bottom layer is optimized first and then fixed. Then, in each subsequent iteration, only the number of nodes in the newly added layer is adjusted.
The pseudocode of the resulting DBN structure design algorithm is shown in Algorithm 1.
1: Initialization: set initial temperature 2: 3: 4: Generate 5: 6: The new number of neurons 7: 8: C = 9: 10: Find 11: 12: 13: 14: Return the optimal network structure. 15:
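Putting the three rules together, the layer-by-layer search can be sketched as follows. This is a toy stand-in: `energy_fn` below is a hypothetical surrogate for the top-RBM reconstruction error, and `lo`/`hi` stand for the entropy-derived node-number bounds; none of the numeric choices come from the paper.

```python
import math
import random

def optimize_layer_width(energy_fn, lo, hi, t0=1.0, alpha=0.9, steps=50, rng=None):
    """SA search for one layer's node count within the bounds [lo, hi].
    energy_fn plays the role of the top-RBM reconstruction error (rule 1)."""
    rng = rng or random.Random(0)
    n = rng.randint(lo, hi)
    e = energy_fn(n)
    best_n, best_e = n, e
    t = t0
    for _ in range(steps):
        cand = rng.randint(lo, hi)                  # rule 2: random candidate width
        delta = energy_fn(cand) - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            n, e = cand, e + delta                  # Metropolis acceptance
            if e < best_e:
                best_n, best_e = n, e
        t *= alpha                                  # geometric cooling
    return best_n, best_e

def design_structure(n_visible, lo, hi, max_depth=4, tol=0.05):
    """Rule 3: grow the network layer by layer, optimize the newest layer's
    width, fix it, and stop once adding a layer no longer reduces the energy
    by at least a tol fraction."""
    rng = random.Random(1)
    structure = [n_visible]
    prev_e = float("inf")
    for depth in range(max_depth):
        # Toy surrogate energy: minimized at some width, worsens with depth.
        target = (lo + hi) // 2 + depth
        energy_fn = lambda n, t=target, d=depth: abs(n - t) + 0.5 * d
        width, e = optimize_layer_width(energy_fn, lo, hi, rng=rng)
        if e >= prev_e * (1 - tol):                 # no meaningful improvement
            break
        structure.append(width)
        prev_e = e
    return structure

print(design_structure(64, 10, 50))
```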
In the evaluation, we refer to the proposed algorithm as the information entropy and reconstruction error via ISA (IEREISA) method. We compare the similarities and differences in performance between IEREISA and some common DBN depth and node-number setting methods. The depth setting methods consist of a fixed method [[
- Reconstruction Error and Equivalent nodes (REE): The number of neurons in each layer is set to be equal, and the decision to increase the network depth is determined by the value of the reconstruction error. Moreover, the maximum network depth is set to ensure the convergence of the algorithm.
- Rate of Correlation and Equivalent nodes (RCE): Similar to REE, the number of neurons in each layer is equal. The value of the cross-correlation coefficient determines whether to increase the network depth, and the maximum network depth is fixed to ensure the convergence of the algorithm.
- Traversal Search with Constant Layers (TSCL): TSCL obtains the optimal architecture by manually setting the network depth and then searching for the number of neurons in each layer by traversal, also called exhaustive search. In the TSCL algorithm, the maximum number of neurons per layer is fixed to ensure the convergence of the algorithm.
- IERESA: The main idea of the IERESA algorithm is the same as the IEREISA algorithm, except that the normal SA algorithm is used instead of ISA.
The corresponding DBNs were generated for the above five different structural algorithms, and experiments were carried out on three public datasets (Cifar-10, Cifar-100, and MNIST) [[
- Reconstruction error in the unsupervised training process. The unsupervised training pre-adjusts the weights and bias, and a lower reconstruction error indicates better training, which further indicates that the structure design algorithm obtains better results.
- Root-mean-square error (RMSE) in the supervised training process. Supervised training uses the error back propagation algorithm to fine-tune the weight. A lower RMSE after training indicates better training and a better network performance.
- The prediction error rate of the test dataset. The error rate of the test results indicates the effectiveness of the algorithm.
- The runtime of the algorithm. When the DBN structure is changed, the new part of the structure needs to be retrained, so the complexity of the algorithm substantially impacts training time. Higher complexity and a larger number of required iterations increase the training time. Therefore, runtime, as an indicator of algorithm complexity, can be compared across the different algorithms.
In the experiment, the initialization parameters of the DBN network were set as follows:
- (1) The weights W were randomly generated according to the normal distribution N(0, 0.01).
- (2) The hidden layer bias c was initialized to zero.
- (3) To control the network scale, Dmax = 10.
- (4) The visible layer bias b_i was produced by the following equation:
(
where b
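The initialization above can be sketched in code. The paper's visible-bias equation was not recoverable here, so this sketch substitutes the commonly used log-odds rule b_i = log(p_i/(1 − p_i)) (an assumption, labeled in the comments):

```python
import numpy as np

def init_rbm(train_data, n_hidden, rng=None):
    """Initialization as in the experiments: W ~ N(0, 0.01), hidden bias
    c = 0. The visible-bias formula is ASSUMED here to be the common
    log-odds rule b_i = log(p_i / (1 - p_i)), where p_i is the mean
    activation of visible unit i in the training set."""
    rng = rng or np.random.default_rng(0)
    n_visible = train_data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    c = np.zeros(n_hidden)
    p = np.clip(train_data.mean(axis=0), 1e-3, 1 - 1e-3)  # avoid log(0)
    b = np.log(p / (1.0 - p))
    return W, b, c

# Toy binary training data: 20 samples, 6 visible units.
data = (np.random.default_rng(1).random((20, 6)) > 0.5).astype(float)
W, b, c = init_rbm(data, 4)
print(W.shape, b.shape, c.shape)
```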
This experiment tests the performance of the methods on high-dimensional input samples. The public dataset Cifar-10 is a classic experimental dataset in machine learning, with 60,000 samples and 10 classes. Each sample contains features and a label: 3072 pixel values in the range 1–255 and a single integer in the range 0–9. We used 50,000 samples as the training set and 10,000 samples as the test set, and the algorithm parameter settings are shown in Table 2. In the IEREISA and IERESA algorithms, R
The reconstruction error of the DBNs obtained by the five structure design algorithms is shown in Figure 2. Over the whole iteration process, the reconstruction error of every algorithm except TSCL gradually decreases. The IEREISA algorithm has the lowest convergence value, demonstrating that it performs best on this dataset.
In Figure 2, the REE and RCE algorithms use an equal number of neurons in each layer, which does not guarantee that the number of neurons in each layer is optimal; hence, the reconstruction error cannot converge to its optimal value. This confirms that the performance of a DBN is determined by both the number of layers and the number of nodes, and that algorithms considering only the number of layers cannot find the optimal network structure. Moreover, the TSCL algorithm adopts a traversal method with a slow convergence speed, so the reconstruction error tends to oscillate and may not converge within the maximum number of iterations. Likewise, an algorithm that considers only the number of nodes without considering the number of layers cannot find the optimal network structure. The IEREISA and IERESA algorithms both perform well, and IEREISA reaches the lowest reconstruction error, because the optimization ability of SA is not as good as that of ISA. The experimental results hence show that the network structure generated by the IEREISA algorithm has the lowest reconstruction error and that IEREISA, which simultaneously and organically combines network depth and node number, can find the optimal DBN structure for the current dataset.
The DBN structures obtained by the five algorithms are shown in Table 3. The DBN structure obtained by the proposed IEREISA algorithm is more reasonable than those of the other algorithms.
The algorithm parameter settings for the supervised training process are shown in Table 2. The RMSE of the DBN networks generated by the algorithms during training is shown in Figure 3. Compared with the other four algorithms, the DBN generated by the IEREISA algorithm has the fastest convergence speed in supervised training and the lowest RMSE convergence value, because IEREISA designs the most proper network structure.
The trained networks were tested using the same test set, and the error rates are shown in Figure 4. The IEREISA algorithm has the lowest error rate, 30.35%. The runtime statistics of the algorithms are shown in Figure 5. The training times of the RCE and REE algorithms are short, those of the IERESA and IEREISA algorithms are somewhat longer, and that of the TSCL algorithm is the longest. This is because the solution space for the number of nodes is much larger than that for the number of layers, so the IERESA, IEREISA, and TSCL algorithms require more searching and take longer to compute. In particular, the TSCL algorithm uses traversal search, which is inefficient. Although the IEREISA algorithm takes more time than some methods, it considers both the network depth and the number of nodes. In contrast to the REE and RCE algorithms, IEREISA obtains both the best network depth and the best number of nodes. IEREISA also improves on the quality of the solution obtained by IERESA.
In summary, the experimental results show that on the Cifar-10 dataset, the proposed IEREISA algorithm can obtain a lower RMSE and reconstruction error than those of other algorithms and has higher prediction accuracy. However, the algorithm incurs a small increase in time complexity owing to the increased scale of the solution space.
This experiment evaluates the performance of the algorithm on other datasets. The experiment uses the MNIST handwriting recognition dataset, a basic experimental dataset for testing network performance, which consists of 60,000 training samples, 10,000 test samples, and 10 classes. Each sample has a 28 × 28 matrix as the input features and a one-hot vector of length 10 as the label. The algorithm parameters were set as shown in Table 4.
In the IEREISA and IERESA algorithms, R
The reconstruction error results are shown in Figure 6. As in the analysis in Section 4.1.1, the IEREISA algorithm also achieves the lowest reconstruction error on the MNIST dataset, demonstrating the effectiveness of the algorithm on more than one dataset.
The DBN structures obtained by the five algorithms are shown in Table 5. Table 5 again shows that the proposed IEREISA algorithm yields the most reasonable network structure, consistent with the result in Table 3.
The RMSE results are shown in Figure 7. The RMSE of the IEREISA algorithm converges to the lowest value, and its convergence speed is the fastest on the MNIST dataset. Compared with the networks of the other algorithms, the DBN designed by the proposed IEREISA algorithm has the most proper structure and shows the best fitting ability.
The error rates are compared in Figure 8. The error rate of the IEREISA algorithm (0.81%) is much lower than that of the other four algorithms. This demonstrates that the network structure generated by the IEREISA algorithm has the best prediction performance on the MNIST dataset.
The time consumed by the five algorithms is shown in Figure 9. The IEREISA algorithm slightly increases the runtime, which is consistent with the experimental results of Section 4.1.3.
In the proposed DBN structure design algorithm, when a new RBM layer is added, the ISA algorithm is used to calculate the optimal number of neurons. To verify the effectiveness of the ISA algorithm, it is compared with the SA algorithm and the genetic algorithm (GA); the GA-based experiment is denoted IEREGA. In the experiment, the parameter settings of the IEREISA and IERESA algorithms are shown in Table 2 and Table 3. The parameter settings on the Cifar-10 dataset are the same as on the Cifar-100 dataset. According to [[
The experimental results of the three algorithms on the three datasets are shown in Table 7, Table 8, and Table 9. Comparing these tables, it can be seen that the IEREISA algorithm obtains a reasonable network structure for different datasets while maintaining low reconstruction error, low RMSE, and high prediction accuracy. Table 8 shows that the SA algorithm may fall into local optima when solving for the number of neurons, which stems from the limited search performance of the SA algorithm.
It can be seen from Table 9 that the IEREGA algorithm also falls into local optima, because the GA is sensitive to the initial population. When searching for the optimal number of neurons, the solution space determined by the GA's coding length is much larger than the range of neuron counts satisfying the constraints, which weakens the GA's search capability. The solution quality is further limited by the GA's insufficient local search ability.
In summary, for different datasets, the proposed IEREISA algorithm maintains the lowest reconstruction error, RMSE and prediction error rate, and has the best fitting and prediction performance compared with other algorithms. The IEREISA algorithm organically combines the methods for determining the number of layers and number of neurons, and simultaneously optimizes both to obtain a better network structure. Compared with the REE and RCE algorithms which only consider the number of layers, the runtime of IEREISA algorithm is longer, but redundancy in the network is avoided. Moreover, a network with better performance and a more reasonable structure is obtained by the IEREISA algorithm. Compared with TSCL, which only considers the number of neurons, IEREISA can not only obtain a network with better performance, but it also improves the efficiency of the algorithm and reduces the runtime. Because TSCL adopts a traversal search, it is difficult to converge for networks with a complex structure.
Compared with the previously proposed method, the IEREISA algorithm, which utilizes information entropy and reconstruction error, optimizes the number of layers and the number of neurons simultaneously and can quickly obtain a DBN network with better performance and a more reasonable structure.
In this paper, an approach that combines and simultaneously optimizes the number of network nodes and the depth of the network in a DBN was proposed. First, we constructed a mathematical model for optimizing the DBN structure by introducing information entropy and reconstruction error. Then, the ISA algorithm was employed to optimize the model. Finally, the proposed algorithm was tested on three public datasets. Experimental results show that, for different datasets, the proposed algorithm achieves lower reconstruction error, RMSE, and prediction error rates. Moreover, this algorithm can adaptively optimize the network structure for different datasets and obtain a better network structure than other algorithms. The DBN structure design algorithm proposed in this paper is superior to the previously proposed algorithms and can provide a reference for setting DBN structural parameters for different datasets, which is an important and often overlooked issue of parameter optimization in DBNs.
The ideas in this article can also be applied to other network models. For example, for a CNN model, a suitably defined reconstruction error could be used as the objective function for network performance, information entropy theory could constrain the number of neurons, and a heuristic search algorithm could be used to obtain the optimal network structure. Because this paper mainly relies on the unsupervised training process of the DBN, the proposed algorithm may not be applicable to networks without an unsupervised training process. Therefore, our follow-up work will build on the ideas of this paper and propose structure design algorithms for other network models.
Graph: Figure 1 RBM structure in a DBN.
Graph: Figure 2 DBN reconstruction error variation of five structural algorithms on the Cifar-10 dataset.
Graph: Figure 3 RMSE variation of five algorithms on the Cifar-10 dataset.
Graph: Figure 4 Prediction error rate of five algorithms on the Cifar-10 dataset.
Graph: Figure 5 Runtime of five algorithms on the Cifar-10 dataset.
Graph: Figure 6 DBN reconstruction error variation of five algorithms on the MNIST dataset.
Graph: Figure 7 RMSE variation of DBN of five algorithms on the MNIST dataset.
Graph: Figure 8 Prediction error rate of five algorithms on the MNIST dataset.
Graph: Figure 9 Runtime of five algorithms on the MNIST dataset.
Table 1 Computational complexity of reconstruction error and network energy.
Means Multiplication Quantity Addition Quantity Reconstruction Error Network Energy
Table 2 Algorithm parameter settings for the Cifar-10 dataset.
| Batch Size | Iterations (Supervised, Unsupervised) | Learning Algorithm | Momentum | Learning Rate (Supervised, Unsupervised) | Activation Function | Output Classifier |
| 2000 | (1500,50) | Momentum gradient | 0.5 | (0.5,0.5) | Sigmoid | Softmax | 1 | 0.7 |
Table 3 The five DBN structures obtained by the above five algorithms on the Cifar-10 dataset.
| Algorithm | DBN Structure | Reconstruction Error |
| REE | [3072,200,200,200,200,200,200,10] | 3.9989 |
| TSCL | [3072,3008,2009,500,507,406,99,208,316,58,36,10] | 5.0036 |
| RCE | [3072,100,100,100,100,100,100,10] | 3.6587 |
| IERESA | [3072,2959,756,1024,146,99,95,10] | 1.4032 |
| IEREISA | [3072,2958,756,1033,134,99,95,10] | 1.1106 |
Table 4 Algorithm parameter settings for the MNIST dataset.
| Batch Size | Iterations (Supervised, Unsupervised) | Learning Algorithm | Momentum | Learning Rate (Supervised, Unsupervised) | Activation Function | Output Classifier |
| 200 | (30,500) | Momentum gradient | 0.5 | (0.5,0.5) | Sigmoid | Softmax | 5 | 0.5 |
Table 5 The five DBN structures obtained by the above five algorithms on the MNIST dataset.
| Algorithm | DBN Structure | Reconstruction Error |
| REE | [784,200,200,200,200,200,200,10] | 3.9989 |
| TSCL | [784,777,659,452,68,106,69,78,16,28,36,10] | 5.0036 |
| RCE | [784,100,100,100,100,100,100,10] | 3.6587 |
| IERESA | [784,150,138,112,102,92,82,10] | 1.4032 |
| IEREISA | [784,155,150,112,112,100,75,10] | 1.1106 |
Table 6 Parameter settings of the IEREGA algorithm on different datasets.
| Dataset | Coding Length | Population | Max Number of Generations | Crossover Probability | Mutation Probability |
| Cifar-10 | 12 | 10 | 10 | 0.75 | 0.01 |
| Cifar-100 | 12 | 10 | 10 | 0.75 | 0.01 |
| MNIST | 10 | 10 | 10 | 0.75 | 0.01 |
Table 7 Experimental results of the IEREISA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
| Cifar-10 | 8 | [3072,2958,756,1033,134,99,95,10] | 1.1106 | 3.3010 | 69.65% |
| Cifar-100 | 10 | [3072,2586,880,112,86,73,99,95,86,100] | 36.2558 | 10.0777 | 61.94% |
| MNIST | 8 | [784,155,150,112,112,100,75,10] | 6.2096 | 0.0299 | 99.19% |
Table 8 Experimental results of the IERESA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
| Cifar-10 | 8 | [3072,2959,756,1024,146,99,95,10] | 1.4032 | 3.4263 | 67.43% |
| Cifar-100 | 10 | [3072,2516,892,117,86,73,98,95,85,100] | 36.8585 | 11.7817 | 61.70% |
| MNIST | 8 | [784,150,138,112,102,92,82,10] | 6.2397 | 0.0302 | 99.08% |
Table 9 Experimental results of the IEREGA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
| Cifar-10 | 9 | [3072,2436,1056,102,461,156,114,95,10] | 2.0031 | 3.4003 | 64.34% |
| Cifar-100 | 10 | [3072,2516,892,201,88,98,102,94,85,100] | 38.6475 | 11.8016 | 61.60% |
| MNIST | 8 | [784,155,150,112,107,95,74,10] | 6.3305 | 0.0311 | 99.07% |
Conceptualization—J.J. (Jianjun Jiang), J.Z. and L.Z.; methodology—J.J. (Jianjun Jiang); software—J.J. (Jianjun Jiang) and L.Z.; validation—J.J. (Jianjun Jiang), J.Z., L.Z. and J.J. (Jun Jiang); formal analysis—J.J. (Jianjun Jiang) and J.Z.; investigation—J.J. (Jun Jiang) and Y.W.; resources—X.R. and J.Z.; data curation—J.J. (Jianjun Jiang) and L.Z.; writing, original draft preparation—J.J. (Jianjun Jiang) and L.Z.; writing—review and editing, J.J. (Jianjun Jiang) and Y.W.; visualization—J.J. (Jianjun Jiang); supervision—J.Z.; project administration—X.R.; funding acquisition—X.R. and J.Z.
This research received no external funding.
The authors declare no conflict of interest.
By Jianjun Jiang; Jing Zhang; Lijia Zhang; Xiaomin Ran; Jun Jiang and Yifan Wu