Most mainstream research on assessing building damage using satellite imagery is based on scattered datasets and lacks unified standards and methods to quantify and compare the performance of different models. To mitigate these problems, the present study develops a novel end-to-end benchmark model, termed the pyramid pooling module semi-Siamese network (PPM-SSNet), based on the large-scale xBD satellite imagery dataset. The high precision of the proposed model is achieved by adding residual blocks with dilated convolution and squeeze-and-excitation blocks into the network. Meanwhile, a highly automated pipeline from satellite imagery input to damage classification output is achieved by employing concurrently learned attention mechanisms in a semi-Siamese network for end-to-end input and output purposes. Our proposed method achieves F1 scores of 0.90, 0.41, 0.65, and 0.70 for the undamaged, minor-damaged, major-damaged, and destroyed building classes, respectively. From the perspective of end-to-end methods, the ablation experiments and comparative analysis confirm the effectiveness and originality of the proposed method. Finally, the consistent prediction results of our model on data from the 2011 Great East Japan Earthquake verify its robustness to the domain shift problem, implying that it is effective for evaluating future disasters.
Keywords: pyramid pooling module; semi-Siamese; benchmark model; damage assessment; end-to-end; xBD dataset
Natural disasters, which have been occurring frequently in recent years [[
The key factor in obtaining disaster information is assessing building damage. Mainstream building damage assessment methods include two main steps: building localization and damage classification. First, building localization is unnecessary if building footprint information is provided; however, this information is rarely available in disaster events, especially those in underdeveloped areas. Second, damage classification relies heavily on building footprint information; therefore, the accuracy of building localization directly affects this classification. Xu et al. [[
To solve the above challenges, some researchers apply five-class semantic segmentation, which simply regards "no building" as a damage class [[
However, these baseline models cannot accurately distinguish between minor- and major-damaged buildings. Indeed, five-class semantic segmentation is a harder task than building localization. Using the transfer learning technique, the performance of some end-to-end models can be improved by initializing the final model with pre-trained building localization weights. Nia and Mori [[
In addition to those works discussed above, Valentijn et al. [[
In this study, we design a concurrent learned attention network, which is an end-to-end trainable, unified model, to localize buildings and classify damage jointly. This network is built on a semi-Siamese strategy that can learn collectively. We use a pixel-level segmentation-based approach as well as residual blocks (RBs) with dilated convolution and squeeze-and-excitation (SE) blocks to detect damage to the segmented buildings. To model the global contextual prior, we also introduce the pyramid-pooling module (PPM) that enhances the scale invariance of images, while lowering over-fitting risk.
To benchmark our method, we develop our model based on the large-scale xBD dataset, which contains satellite images from multiple disaster types worldwide such as earthquakes, hurricanes, floods, and wildfires. To verify our method's effectiveness and practicality, we compare its performance with that of the published baseline model based on the xBD dataset. To demonstrate its usefulness, we use data from the 2011 Great East Japan Earthquake.
We contribute to the body of knowledge in four main ways. First, we propose a benchmark model for assessing building damage based on the large-scale xBD satellite imagery dataset. Second, we put forward an end-to-end model for assessing building damage, termed PPM-SSNet, which adopts the semi-Siamese technique, the PPM, and an attention mechanism. To overcome the difficulty of multi-target learning, we use a weighted combination of dice, focal, and cross-entropy losses. Third, we use five efficient data augmentation methods and four class balance strategies designed for these tasks to improve the task performance of all the mainstream models. Finally, we use different disaster images, including severely damaged images and rare disaster images, to test our model's robustness by comparing it with two strong baseline models.
The xBD dataset [[
Consistent with real-world disaster scenarios, the xBD dataset presents severe class imbalance. In terms of the building/non-building area ratio at the pixel level, non-building pixels occupy 97% of all image pixels, as shown in Table 1. Regarding the proportional distribution of the damage classes at the pixel level, the number of undamaged building pixels far exceeds that of the other three classes, with a ratio of up to 76%. Only 6% of pixels belong to the destroyed class. The minor-damaged and major-damaged categories account for almost the same proportion. Figure 2 compares the class balance.
To verify our method's transferability, we test other satellite imagery with the developed model based on the xBD dataset. Two areas in Higashi Matsushima severely affected by the 2011 Great East Japan Earthquake are used for testing, as shown in Figure 3a–c. These two areas are selected because the xBD dataset does not contain any disaster data from Japan and tsunami data in the xBD dataset are scarce. This design tests the ability of our model to evaluate and predict unknown disasters.
The building damage ground truth data for the testing area are based on the field investigation conducted by TTJS [[
Four-band multispectral high-resolution WorldView-2 images with a spatial resolution of 0.6 m, collected before and after the 2011 Great East Japan Earthquake, were used for validation, as shown in the backgrounds of Figure 3b,c.
The PPM-SSNet model developed in this research employed dilated convolution, the SE mechanism for attention, and the PPM, as detailed below.
Collectively leveraging the global and local features of an input image is effective at solving computer vision problems [[
The SE mechanism was originally developed to improve the performance of image classification on ImageNet [[
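The SE mechanism of Figure 5 can be sketched as follows; the channel count and reduction ratio are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pooling ('squeeze') followed by
    two linear layers ('excitation') producing channel-wise attention weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # w1
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # w2
            nn.Sigmoid(),                                # weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: (b, c) global average pool
        w = self.fc(s).view(b, c, 1, 1)  # excitation: one weight per channel
        return x * w                     # re-weight the activation maps

se = SEBlock(channels=32, reduction=16)
x = torch.randn(2, 32, 8, 8)
y = se(x)
assert y.shape == (2, 32, 8, 8)
```

Because the sigmoid keeps each weight in (0, 1), the block can only suppress channels, which is how it emphasizes important features relative to less useful ones.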
The PPM pools the activation map of each channel in a pyramidal fashion [[
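A minimal PPM sketch in PyTorch, assuming the PSPNet-style grid sizes (1, 2, 3, 6) and an illustrative branch width; the paper's exact configuration is given in Table 2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PPM(nn.Module):
    """Pyramid pooling module: pool the feature map over grids of several
    sizes, project each pooled map, upsample back to the input resolution,
    and concatenate with the input to inject a global contextual prior."""
    def __init__(self, in_ch, branch_ch=64, grids=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(n),                        # N x N pooling grid
                nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
                nn.ReLU(inplace=True),
            )
            for n in grids
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x] + [
            F.interpolate(b(x), size=(h, w), mode="bilinear", align_corners=False)
            for b in self.branches
        ]
        return torch.cat(outs, dim=1)

ppm = PPM(in_ch=128)
y = ppm(torch.randn(1, 128, 16, 16))
assert y.shape == (1, 128 + 4 * 64, 16, 16)  # input + four pooled branches
```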
The task of estimating the damage assessment of buildings is divided into two stages. The first stage identifies the buildings on an image. This can be treated as a localization problem in which a system such as a CNN is employed to estimate the binary localization map for an input image. A location with 1 or 0 on the map indicates whether it is a building or not. The localization map is then employed as a prior for the second stage to estimate the damage assessment of a location with a value equal to 1. Based on this idea, we design a network to jointly estimate buildings' locations and assess their damage. We use the pre-image alone to estimate the location map and then use both the pre- and the post-images to estimate the assessment result. To leverage the localization map to produce an accurate assessment result, we directly multiply it by the output of the assessment estimator. This process corrects the assessment result, improving its quality from a coarse to a fine level (see Figure 7).
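The coarse-to-fine correction described above reduces to an element-wise multiplication; the tensors below are hypothetical stand-ins for the two stages' outputs:

```python
import torch

# Hypothetical tensors: loc_map is the stage-1 binary localization map and
# damage_logits the stage-2 per-class scores (5 channels: no-building + 4 levels).
loc_map = (torch.rand(1, 1, 64, 64) > 0.5).float()   # 1 = building, 0 = background
damage_logits = torch.randn(1, 5, 64, 64)

# Multiplying by the localization prior zeroes the assessment outside buildings,
# correcting the coarse result to a fine one.
refined = damage_logits * loc_map
assert torch.all(refined[:, :, loc_map[0, 0] == 0] == 0)
```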
Figure 7 shows the architecture. The network is built on a semi-Siamese strategy. We let the weights at the shallow layers of the network share the two input images (i.e., pre-/post-images) to enable it to produce a good "filters' bank" by collectively learning the low-level features from both. As the layers go deeper, we stop sharing weights and use independent branches for the two inputs instead. The two branches are merged by subtracting one from the other along their channels, which encourages the network to learn the differences between the pre- and post-images. For the tail of the network, we use a single branch of the layers to produce the final estimation result. In the network, we employ RBs with dilated convolution and SE blocks. Our motivation for using RBs is that the network can extract features from large and small receptive fields by employing the large and small dilated rates used in RBs, which may improve its representation ability for the estimations. In addition, SE blocks are employed to encourage the network to focus on the important features, while suppressing the less useful ones. We employ a PPM at the end of the network, immediately before an SE block, and a convolutional layer to aggregate the features.
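The semi-Siamese strategy can be sketched as the skeleton below; the layer sizes are illustrative placeholders, since the actual RB/SE/PPM stack is specified in Table 2:

```python
import torch
import torch.nn as nn

class SemiSiamese(nn.Module):
    """Skeleton of the semi-Siamese strategy: shallow layers share weights
    across the pre-/post-images, deeper branches are independent, and the
    branches are merged by channel-wise subtraction before a single tail."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(  # one "filters' bank" for both inputs
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.branch_pre  = nn.Conv2d(16, 32, 3, padding=1)  # independent branch
        self.branch_post = nn.Conv2d(16, 32, 3, padding=1)  # independent branch
        self.tail = nn.Conv2d(32, 5, 1)  # 4 damage levels + no-building

    def forward(self, pre, post):
        f_pre  = self.branch_pre(self.shared(pre))
        f_post = self.branch_post(self.shared(post))
        diff = f_post - f_pre  # merge: encourages learning pre/post differences
        return self.tail(diff)

net = SemiSiamese()
out = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
assert out.shape == (1, 5, 64, 64)
```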
Over-sampling and data augmentation are adopted in this study. The assessment metrics as well as loss and mask dilation parameter settings are detailed below.
Building damage detection networks based on xBD generally perform badly when detecting minor and major damage, resulting in comparatively low recalls and F1 scores for these two categories because of imbalanced training data. To overcome this problem, we devise several methods to increase the number of minor damage and major damage instances, one of which is over-sampling the training dataset. Since our model is designed to generate pixel-level classification results, we suggest using a main label to decide how many times a picture containing multi-label pixels should be repeated in the training dataset. A weight vector
$$w = (w_{0}, w_{1}, w_{2}, w_{3}) = (0, 3, 2, 1)$$
where category 0 denotes no damage, category 1 denotes minor damage, and so on. Table 3 shows the main label categories and corresponding repeated times.
Since the images are cropped and randomly augmented later, there is no concern that the repeated pictures are identical to the original ones.
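The main-label over-sampling rule can be sketched as follows, using the repetition counts of Table 3; the list-of-pairs dataset representation is a simplifying assumption:

```python
# Repetition counts per main label (Table 3): rarer damage classes
# are repeated more often to counter class imbalance.
REPEAT = {"no-damage": 0, "minor-damage": 3, "major-damage": 2, "destroyed": 1}

def oversample(dataset):
    """dataset: list of (image_id, main_label) pairs. Each image is kept
    once and then repeated REPEAT[main_label] additional times."""
    out = []
    for image_id, main_label in dataset:
        out.extend([image_id] * (1 + REPEAT[main_label]))
    return out

train = [("a", "no-damage"), ("b", "minor-damage"), ("c", "destroyed")]
assert oversample(train) == ["a", "b", "b", "b", "b", "c", "c"]
```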
After over-sampling, we perform a cropping-and-selecting process with discrimination. Similar to above, we reweight each pixel as inversely proportional to the frequency of its corresponding damage level. The original image size is
To enhance the generalizability of our model, we apply the following data augmentation methods sequentially to each image. As shown in Table 4, every method is assigned a value indicating its probability of being applied. In other words, whether each augmentation method is applied to an image is determined randomly, and methods with a higher order are executed earlier.
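Probability-gated augmentation in the spirit of Table 4 can be sketched as follows, with two toy transforms standing in for the full method list:

```python
import random

def horizontal_flip(img):
    # Flip a 2D list-of-lists image left-right.
    return [row[::-1] for row in img]

def rotate_90(img):
    # Rotate a 2D list-of-lists image 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]

# (transform, probability) pairs in execution order, as in Table 4.
PIPELINE = [(horizontal_flip, 0.5), (rotate_90, 0.95)]

def augment(img, rng=random):
    # Each transform fires independently with its assigned probability.
    for transform, p in PIPELINE:
        if rng.random() < p:
            img = transform(img)
    return img

random.seed(0)
out = augment([[1, 2], [3, 4]])
assert len(out) == 2 and len(out[0]) == 2  # square shape preserved
```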
End-to-end building damage assessment includes two progressive tasks: building localization and damage classification. The former can be regarded as a binary segmentation, while the latter is a multi-classification task. This study adopts F1 scores, precision, recall, and IoU to evaluate our network's performance. For the localization task, the F1 score is used:
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

and $TP$, $FP$, and $FN$ denote the numbers of true positive, false positive, and false negative pixels, respectively. The IoU is defined as

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$

For the classification task, an F1 score is computed for each damage class $c$ in the same way:

$$\mathrm{F1}_{c} = \frac{2 \times \mathrm{Precision}_{c} \times \mathrm{Recall}_{c}}{\mathrm{Precision}_{c} + \mathrm{Recall}_{c}}$$

and the overall classification score is the harmonic mean of the four class-wise F1 scores:

$$\mathrm{F1}_{\mathrm{cls}} = \frac{4}{\sum_{c=1}^{4} 1/\mathrm{F1}_{c}}$$

where $c = 1, \ldots, 4$ correspond to the no-damage, minor-damage, major-damage, and destroyed classes, respectively.
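The evaluation metrics can be computed directly from pixel counts; a small sketch:

```python
# Metrics from pixel counts: tp/fp/fn are true positive, false positive,
# and false negative pixel counts.
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def iou(tp, fp, fn):
    return tp / (tp + fp + fn) if tp + fp + fn else 0.0

def harmonic_mean_f1(f1_scores):
    # Overall classification score: harmonic mean of per-class F1s.
    return len(f1_scores) / sum(1.0 / f for f in f1_scores)

p, r = precision(80, 20), recall(80, 40)       # 0.8 and 2/3
assert abs(f1(p, r) - 8 / 11) < 1e-9
assert abs(iou(80, 20, 40) - 80 / 140) < 1e-9
```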
The output damage scale classification mask has five channels: the four damage levels and the no-building label. We adopt a weighted mixed loss that consists of dice loss and focal loss for the damage scale classification loss $L_{\mathrm{cls}}$:

$$L_{\mathrm{cls}} = \lambda_{1} L_{\mathrm{dice}} + \lambda_{2} L_{\mathrm{focal}}$$

$$L_{\mathrm{dice}} = 1 - \frac{2\sum_{i} p_{i} g_{i} + \epsilon}{\sum_{i} p_{i} + \sum_{i} g_{i} + \epsilon}$$

$$L_{\mathrm{focal}} = -\alpha \left(1 - p_{t}\right)^{\gamma} \log\left(p_{t}\right)$$

where $p_{i}$ and $g_{i}$ denote the predicted probability and the ground truth label at pixel $i$, $\epsilon$ is a smoothing term, $p_{t}$ is the predicted probability of the true class, $\gamma$ is the focusing parameter, $\alpha$ is a balancing weight, and $\lambda_{1}$ and $\lambda_{2}$ weight the two loss terms.
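A minimal sketch of a weighted dice + focal combination; the weights and focal parameters here are illustrative assumptions, not the paper's tuned values:

```python
import torch

def dice_loss(probs, target, eps=1.0):
    """Soft dice loss over one channel; probs and target values in [0, 1]."""
    inter = (probs * target).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)

def focal_loss(probs, target, alpha=1.0, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy, well-classified pixels."""
    p_t = torch.where(target > 0.5, probs, 1 - probs).clamp(eps, 1 - eps)
    return (-alpha * (1 - p_t) ** gamma * torch.log(p_t)).mean()

def combined_loss(probs, target, w_dice=1.0, w_focal=1.0):
    # Weighted mix of the two terms.
    return w_dice * dice_loss(probs, target) + w_focal * focal_loss(probs, target)

probs  = torch.tensor([0.9, 0.8, 0.1, 0.2])
target = torch.tensor([1.0, 1.0, 0.0, 0.0])
loss = combined_loss(probs, target)
assert loss.item() > 0
```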
To achieve better classification at the boundary, we dilate the building damage scale labels. Where the dilated pixel labels overlap, we prioritize the minor-damaged and major-damaged buildings (c = 2, 3), which are the relatively vulnerable classes in the classification.
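The priority-based label dilation can be sketched on a toy integer mask; the one-pixel growth and the exact priority order are illustrative assumptions:

```python
# Label dilation with class priority on a small integer mask
# (0 = background, 1..4 = damage levels). Each label is grown by one
# pixel; overlaps resolve toward the vulnerable classes (2, 3).
def dilate_labels(mask, priority=(2, 3, 4, 1)):
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for c in reversed(priority):  # highest-priority class is written last
        for i in range(h):
            for j in range(w):
                if mask[i][j] == c:
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ni, nj = i + di, j + dj
                            if 0 <= ni < h and 0 <= nj < w:
                                out[ni][nj] = c
    return out

mask = [[0, 1, 0],
        [0, 0, 0],
        [0, 2, 0]]
grown = dilate_labels(mask)
assert grown[1][1] == 2  # overlap of classes 1 and 2 resolved toward minor damage
```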
In this work, we use PyTorch, a Python package that provides tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system, as the deep learning framework. PyTorch is designed to be intuitive, linear in thought, and easy to use. Equipped with acceleration libraries such as Intel MKL and NVIDIA cuDNN, as well as custom GPU memory allocators, PyTorch enables users to train larger deep learning models than other Python packages do.
All the experimentation and modeling tasks are implemented in the public cluster in the x64 Linux environment with the public computing cloud at the Renmin University of China. This computing cloud is equipped with the Simple Linux Utility for Resource Management (Slurm) scheduling system. Computations are performed on the node titan, which is configured with 128 GB of RAM, two Intel Gold 5218 CPUs, and two NVIDIA Titan RTX GPUs.
In this study, we use an ablation experiment to demonstrate the effectiveness of our proposed method. An ablation study typically refers to removing a "feature" of the model or algorithm and verifying how this affects performance. Instead of subtracting, however, we gradually add modules such as the Siamese structure, attention, and pyramid pooling to our proposed baseline network to verify their contribution. Nevertheless, improvements in model performance are not always consistent across the different sub-tasks. Conducting experiments over several rounds guarantees that the modules of interest indeed boost model performance.
Table 5 and Table 6 show the results of the ablation experiment. The shaded rows in the tables represent the performance of our proposed baseline model. The second row in Table 5 indicates that deploying the Siamese network module in the baseline model leads to a significant increase in all the metrics. Adding the attention module results in a slight decline in all the metrics except the recall rate. The increase in the recall rate might be a consequence of the scale-aware semantic image segmentation that an attention mechanism provides. We then introduce the PPM, which slightly raises all the metrics except the recall rate. This variation can be attributed to pyramid pooling, which enhances the scale invariance of images while lowering the risk of over-fitting.
Table 6 shows that sequentially applied modules improve overall performance since total F1, the harmonic mean of the F1 of each category, increases gradually with a mere recession. As for the irregular increment in the metrics, a trade-off between the precision rate and recall rate and the respective F1s of the different classes often results. For instance, deploying the Siamese network raises
Table 7 shows the confusion matrix of our final PPM-SSNet. Our model performs well overall: the non-building category holds the highest accuracy of 96.52%, whereas the accuracy for minor-damage pixels is only 30.29%.
Table 8 compares the results of the pre-and-post strategy (both pre-disaster and post-disaster images are available) and the post-only strategy (only the post-disaster images are used). According to the results, using only post-disaster images performs worse than using both pre- and post-disaster images for locating buildings and assessing damage levels, which demonstrates the important role of pre-disaster images in improving building localization and damage classification.
Since the release of the xBD dataset, some studies have divided a share of its data for training and achieved good results, whereas others use different evaluation metrics to assess accuracy. Moreover, some work is not strictly an end-to-end study, preventing us from being able to compare these published results with ours. To solve this problem, we reproduce previous research results and carry out comparative experiments under uniform experimental conditions. A Mask R-CNN network [[
Weber et al. [[
Hao [[
We train and test our network and the other methods using the same datasets described above and the same parameter settings. The results show that our proposed network easily outperforms the other approaches, as shown in Table 9 and Table 10. We also compare the classification results for earthquakes, tsunamis, floods, typhoons, and volcanic eruptions, as shown in Figure 10. The results again verify the superiority of our method over previous approaches.
Further, our model outperforms the baseline models when predicting building localization and damage classification. Post-disaster images with destroyed buildings introduce noise into building localization, since the edges of destroyed buildings may be vague. FPN-R-CNN classifies the majority of destroyed buildings into the no-building category, while U-Net-Siam-Attn's prediction of destroyed buildings is not robust. In these cases, our model can easily distinguish undamaged from destroyed buildings, but it struggles to distinguish minor from major damage.
The validation areas are characterized by a great diversity of environmental settings, building structures and spatial distributions, tsunami processes, and image acquisition conditions, as shown in Figure 11(a1,b1,a2,b2), respectively.
The predicted results show that the proposed model detects destroyed and undamaged buildings well, but separating minor from major damage remains challenging, as shown in Figure 11(c1,d1,c2,d2) and Table 11. This is partly because the annotation standards of the Tohoku tsunami data and the xBD dataset are not uniform: the Tohoku tsunami building labels come from a field survey, while the xBD labels come from visual interpretation, introducing error into a like-for-like comparison. Because the small validation area shown in Figure 3c contains almost only destroyed buildings, we conducted the quantitative confusion matrix analysis (Table 11) only for the larger validation area with a variety of damage types shown in Figure 3b. Still, visual interpretation shows that the prediction results for the small validation area in Figure 11(c1) are quite consistent with the ground truth data in Figure 3c. Further, satellite remote sensing is limited in detecting fine-scale building damage because of its relatively low spatial resolution; the method's inability to distinguish major from minor damage is therefore understandable. One way to address this challenge would be to use high-resolution drone images. In general, our prediction results are consistent with the field observation data.
In this study, we developed an end-to-end attention-guided semi-Siamese network with a pyramid-pooling module. Our proposed model yielded satisfactory results for building localization and damage classification compared with other methods. Employing dilated convolution, the method leveraged the global and local features of an input image. To improve damage classification performance, we adopted a squeeze-and-excitation mechanism, a weighting system that produces and applies channel-wise weights on a feature map. Our ablation experiments on the xBD dataset demonstrated that the proposed semi-Siamese network, dilated convolution, and squeeze-and-excitation mechanism were all necessary and effective. Meanwhile, the demonstration with 2011 Great East Japan Earthquake data revealed results consistent with the ground truth data, confirming the effectiveness of our proposed method for evaluating future disasters. Further, it achieved true end-to-end input and output. Thanks to the open-sourcing of the large-scale, high-precision xBD dataset, the scarcity of labeled training data, formerly the main obstacle to training deep learning models for building damage assessment from satellite imagery, has been greatly alleviated. Nevertheless, the contribution of this research is a damage detection algorithm based on large-scale benchmark data covering multiple types of disasters; we therefore do not provide targeted solutions for a specific type of disaster.
Our research has some limitations. It is based on the visual information of optical images, meaning that it may be unable to measure extensive flood damage under an intact roof. To address this, researchers could consider using synthetic aperture radar images to detect bottom or sidewall damage [[
Graph: Figure 1 Example of the xBD dataset: Tsunami in Palu, Indonesia. From left to right: (a) Pre-disaster image, (b) Post-disaster image, (c) Damage scale, and (d) Building footprint.
Graph: Figure 2 Ratio of damage class at the pixel level.
Graph: Figure 3 Validation area. (a) Higashi Matsushima in the Tohoku region of Japan; the rectangular areas marked in blue and red are the selected validation areas; (b) Close-up of the blue area in Figure 3a with the ground truth data of building damage; and (c) Close-up of the red area in Figure 3a with the ground truth data of building damage.
Graph: Figure 4 Dilated convolution with dilated rates of 1 (i.e., normal convolution; left side of the figure) and 2 (right side of the figure). g, h, and u mean the input image (or activation map), convolutional kernel, and output. An output u is calculated by summing the multiplications of each value (i, j) at the kernel h and its corresponding value (x, y) at g.
MAP: Figure 5 Squeeze-and-excitation (SE) blocks produce and apply channel-wise attention on the activation maps. GAP means global average pooling. wi denotes the ith linear production layer. ReLU and Sigmoid are employed following w1 and w2 for the activation functions. The columns depicted in different colors represent the activation map of each channel of the input/output tensor.
MAP: Figure 6 The pyramid pooling module (PPM) g represents an activation map of a single channel. N is the number of cells in a row/column of a pooling grid.
Graph: Figure 7 The architecture of the proposed network. c, b, d, and r represent the convolutional layer, batch normalization layer, dropout layer, and ReLU layer. SE, RB', RB, and PPM represent the modules illustrated at the bottom of this figure. The difference between RB' and RB is that RB' has an additional convolutional layer + batch normalization layer, which is designed to change the number of channels or size of the input tensor if needed. See Table 2 for more details.
Graph: Figure 8 FPN R-CNN network.
Graph: Figure 9 Siam-U-Net-Attention network model.
Graph: Figure 10 The results from our proposed method and comparisons with others. (a) Pre-disaster image; (b) Post-disaster image; (c) Ground truth; (d) Proposed PPM-SSNet model; (e) Siam-U-Net model; and (f) FPN-R-CNN model.
Graph: Figure 11 Prediction results from our proposed method in the validation areas. (a1,a2) Pre-disaster image; (b1,b2) Post-disaster image; (c1,c2) Predicted damage scale by the PPM-SSNet model; and (d1,d2) Prediction building footprint by the PPM-SSNet model.
Table 1 Non-building area to building area ratio at the pixel level.
Non-Building Area    Building Area
96.97%               3.03%
Table 2 Details of the proposed network. Conv., RB', RB, SE, Drop, and PPM mean the convolutional layer (c), RB-v2 (RB'), RB, SE, dropout layer (d), and PPM (see Figure 7). For the convolutional layer (Conv./conv.),
[ 7×7, in=3, out=16, stride=1, dila=1 ] [ 3×3, in=16, out=16, stride=1, dila=1 ] [ 3×3, in=16, out=32, stride=2, dila=1 ] conv., 1×1, in =32, out=64, stride=1, dila=1 conv., 3×3, in =64, out=64, stride=2, dila=1 conv., 1×1, in =64, out=256, stride=1, dila=1 down., 1×1, in =32, out=256, stride=2, dila=1 conv., 1×1, in =256, out=64, stride=1, dila=1 conv., 3×3, in =64, out=64, stride=1, dila=1 conv., 1×1, in =64, out=256, stride=1, dila=1 [ in=256, mid=16, out=256 ] conv., 1×1, in =256, out=128, stride=1, dila=1 conv., 3×3, in =128, out=128, stride=2, dila=1 conv., 1×1, in =128, out=512, stride=1, dila=1 down., 1×1, in =256, out=512, stride=2, dila=1 conv., 1×1, in =512, out=128, stride=1, dila=1 conv., 3×3, in =128, out=128, stride=1, dila=1 conv., 1×1, in =128, out=512, stride=1, dila=1 [ in=512, mid=32, out=512 ] conv., 1×1, in =512, out=256, stride=1, dila=1 conv., 3×3, in =256, out=256, stride=1, dila=2 conv., 1×1, in =256, out=1024, stride=1, dila=1 conv., 1×1, in =512, out=1024, stride=1, dila=1 conv., 1×1, in =1024, out=256, stride=1, dila=1 conv., 3×3, in =256, out=256, stride=1, dila=2 conv., 1×1, in =256, out=1024, stride=1, dila=1 [ in=1024, mid=64, out=1024 ] conv., 1×1, in =1024, out=512, stride=1, dila=1 conv., 3×3, in =512, out=512, stride=1, dila=4 conv., 1×1, in =512, out=2048, stride=1, dila=1 conv., 1×1, in =1024, out=2048, stride=1, dila=1 conv., 1×1, in =2048, out=512, stride=1, dila=1 conv., 3×3, in =512, out=512, stride=1, dila=4 conv., 1×1, in =512, out=2048, stride=1, dila=1 3×3, in=2048, out=512, stride=1, dila=2 [ in=512, mid=16, out=512 ] [ out=512] [ in=1024, mid=64, out=1024 ] 1×1, in=1024, out=5, stride=1, dila=1Layer Parameters Number Conv. ×1 Share Conv. ×1 Conv. ×1 Share RB' ×1 RB ×2 SE ×1 RB' ×1 RB ×3 Independent SE ×1 RB' ×1 RB ×22 SE ×1 Single RB' ×1 RB ×2 Drop − ×1 Conv. ×1 SE ×1 Single PPM ×1 SE ×1 Conv. ×1
Table 3 Main labels and corresponding repeated times.
Main Label        No Damage    Minor Damage    Major Damage    Destroyed
Repeated Times    0            3               2               1
Table 4 Data augmentation methods and probabilities.
Method                  Probability
Pre to Post             0.015
Flip                    0.5
Rotate by 90 Degrees    0.95
Shift Pnt               0.1
Rotation                0.1
Scale                   0.7
Color shifts            0.01
Change HSV              0.01
CLAHE                   0.0001
Blur                    0.0001
Noise                   0.0001
Saturation              0.0001
Brightness              0.0001
Contrast                0.0001
Table 5 Ablation experiments of the location methods with different modules (the shaded row represents the results of the ablated model).
Baseline model 94.91 52.57 73.74 54.70 75.27 63.36 95.14 56.07 +Siamese 96.98 66.07 81.53 73.93 82.42 77.95 95.98 61.97 +Siamese + Attention 96.60 65.45 81.03 64.98 87.26 74.49 96.15 60.90 +Siamese + PPM + Attention 97.00 67.33 82.17 71.15 85.58 77.70 95.95 66.40
Table 6 Ablation experiments of the multi-classification methods with different modules (the shaded row represents the results of the ablated model).
Baseline Model 87.22 93.04 90.04 54.64 26.20 35.43 48.14 56.41 51.95 85.41 45.02 58.96 52.95 +Siamese 90.19 79.10 84.28 22.59 55.14 32.05 67.24 65.25 66.23 92.07 55.73 69.44 55.12 +Siamese + Attention 91.35 77.26 83.72 22.52 56.60 32.22 61.73 66.64 64.10 83.07 62.31 71.21 55.08 +Siamese + PPM + Attention 90.64 89.07 89.85 35.51 49.50 41.36 65.80 64.93 65.36 87.08 57.89 69.55 61.55
Table 7 Confusion matrix.
8.88 × 10^8 2.16 × 10^8 2.60 × 10^8 2.84 × 10^8 2.05 × 10^6 2.22 × 10^7 3.67 × 10^7 8.31 × 10^5 3.76 × 10^5 7.43 × 10^4 4.26 × 10^6 2.53 × 10^6 2.06 × 10^6 3.81 × 10^5 1.50 × 10^4 4.93 × 10^6 1.60 × 10^6 1.21 × 10^6 4.15 × 10^6 2.06 × 10^5 1.39 × 10^6 4.09 × 10^5 1.12 × 10^5 1.28 × 10^5 1.95 × 10^6 9.20 × 10^8 6.29 × 10^8 6.80 × 10^6 7.91 × 10^6 4.30 × 10^6 Ground Truth Non-Building No-Damage Minor Damage Major Damage Destroyed Non-building No-damage Prediction Minor damage Major damage Destroyed Total Accuracy (%) 96.52 58.35 30.29 52.47 45.35
Table 8 Comparison between the pre-and-post strategy and the post-only strategy.
Strategy post-only 91.88 47.32 69.60 56.94 58.16 82.84 38.16 63.23 71.10 58.69 pre-and-post 97.00 67.33 82.17 77.70 66.40 89.85 41.36 65.36 69.55 61.55
Table 9 Comparison with other methods on the location task.
Networks Mean Mean Siam-U-Net-Diff 96.50 44.57 70.54 52.75 90.75 66.72 Weber et al. 95.63 48.62 72.13 85.30 82.90 84.10 PPM-SSNet 97.00 67.33 82.17 71.15 85.58 77.70
Table 10 Comparison with other methods on the classification task.
Networks Siam-U-Net-Diff 80.58 49.64 60.51 28.69 26.32 27.45 51.31 27.60 35.89 75.00 33.03 45.86 39.01 Weber et al. 94.80 56.90 71.10 58.90 22.00 32.00 70.10 38.00 49.30 89.50 40.03 60.71 48.73 PPM-SSNet 90.64 89.07 89.85 35.51 49.50 41.36 65.80 64.93 65.36 87.08 57.89 69.55 61.55
Table 11 Confusion Matrix of Tohoku Tsunami Building Damage Prediction Experiment.
Ground Truth \ Prediction    Non-Building    No-Damage    Minor Damage    Major Damage    Destroyed
Non-building                 38,960,379      66,366       50,870          19,195          34,488
No-damage                    215,480         368,283      862             1,962           39,889
Minor damage                 58,680          2,841        34,629          1,736           8,293
Major damage                 86,002          8            4,331           43,611          3,272
Destroyed                    196,579         80,942       12,550          6,839           314,583
Total                        39,517,120      518,080      103,242         73,343          400,525
Accuracy (%)                 98.59           71.04        33.54           59.46           78.54
Conceptualization, Y.B.; methodology, Y.B., J.H., J.S., X.L.; software, X.H., H.L.; validation, X.H., H.L.; formal analysis, Y.B., J.H., J.S., X.H., H.L., X.L.; investigation, Y.B., E.M., S.K.; resources, Y.B., S.M., E.M., S.K.; data curation, Y.B., E.M., and S.K.; writing—original draft preparation, Y.B., J.H., J.S., X.H., H.L., X.L.; visualization, X.H., H.L.; supervision, Y.B., S.M., X.L., E.M., S.K.; project administration, Y.B., S.M., E.M., S.K.; funding acquisition, Y.B., S.M., E.M., S.K. All authors have read and agreed to the published version of the manuscript.
This research was partly funded by the Fundamental Research Funds for the Central Universities, Research Funds of Renmin University of China (20XNF022), fund for building world-class universities (disciplines) of Renmin University of China, Major projects of the National Social Science Fund (16ZDA052), Japan Society for the Promotion of Science Kakenhi Program (17H06108), and Core Research Cluster of Disaster Science and Tough Cyberphysical AI Research Center at Tohoku University.
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
PPM-SSNet Pyramid Pooling Module-based Semi-Siamese Network PPM Pyramid Pooling Module CNN Convolutional Neural Network IoU Intersection over Union SE Squeeze-and-Excitation RBs Residual Blocks
This work was supported by the Public Computing Cloud, Renmin University of China. We also thank the SmartData Club, an Entrepreneurship Incubation Team lead by Jinhua Su of Renmin University of China; Wenqi Wu, students from Renmin University of China; and the Core Research Cluster of Disaster Science at Tohoku University (a Designated National University) for their support. We thank the two reviewers for their helpful and constructive comments on our work. The author gratefully acknowledges the support of K.C. Wong Education Foundation, Hong Kong.
By Yanbing Bai; Junjie Hu; Jinhua Su; Xing Liu; Haoyu Liu; Xianwen He; Shengwang Meng; Erick Mas and Shunichi Koshimura