Vision Transformers for building damage assessment after natural disasters

Adrien Lagrange, Nicolas Dublé, François De Vieilleville, Aurore Dupuis, Stéphane May, Aymeric Walker-Deemin

Crossref (2023)

Abstract
Damage assessment is a critical step in crisis management. It must be fast and accurate in order to organize and scale the emergency response in a manner adapted to the real needs on the ground. The speed requirement motivates automating the analysis, at least in support of photo-interpretation. Deep Learning (DL) appears to be the most suitable methodology for this problem: on the one hand because of the speed with which results are obtained, and on the other because of the high performance these methods achieve in extracting information from images. Following previous studies evaluating the potential contribution of DL methods to building damage assessment after a disaster, several conventional Deep Neural Network (DNN) and Transformer (TF) architectures were compared.

Made available at the end of 2019, the xView2 database appears to be the most relevant database for this study. It gathers images of disasters that occurred between 2011 and 2018, covering six disaster types: earthquakes, tsunamis, floods, volcanic eruptions, fires and hurricanes. For each disaster, pre- and post-disaster images are available with a ground truth containing the building footprints as well as an assessment of the damage divided into four classes (no damage, minor damage, major damage, destroyed) similar to those considered in the study.

This study compares a wide range of DNN architectures, all based on an encoder-decoder structure. Two encoder families were implemented: EfficientNet (B0 to B7 configurations) and Swin TF (Tiny, Small and Base configurations). Three adaptable decoders were implemented: UNet, DeepLabV3+ and FPN. Finally, to benefit from both pre- and post-disaster images, the trained models were designed to process image pairs with a Siamese approach: both images are processed independently by the encoder, and the extracted features are then concatenated by the decoder (see the sketch below).

Taking advantage of the global information present in the image (such as the type of disaster), the Swin TF associated with the FPN decoder achieves better performance than all the other encoder-decoder architectures. The Shifted WINdows mechanism enables the pipeline to process large images in a reasonable time, comparable to the processing time of EfficientNet-based architectures. An interesting additional result is that the models trained in this study do not seem to benefit much from extra-large configurations: the Small and Tiny configurations reach the highest scores.
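The Siamese fusion described above can be made concrete with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' implementation: a toy convolutional encoder stands in for EfficientNet/Swin, a toy decoder for UNet/DeepLabV3+/FPN, and the output class count (background plus the four xView2 damage classes) is an assumption. What it does reflect from the paper is the key design choice: a single shared-weight encoder applied to both images, with fusion by channel-wise concatenation in the decoder.

```python
import torch
import torch.nn as nn

class SiameseDamageNet(nn.Module):
    """Sketch of the Siamese encoder-decoder: pre- and post-disaster images
    pass through the *same* encoder (shared weights); the decoder fuses the
    two feature maps by channel-wise concatenation."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 64, n_classes: int = 5):
        super().__init__()
        # Toy convolutional encoder standing in for EfficientNet or Swin TF.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Toy decoder standing in for UNet / DeepLabV3+ / FPN; it receives the
        # concatenated pre/post features, hence 2 * feat_ch input channels.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(feat_ch, n_classes, 1),  # per-pixel damage logits
        )

    def forward(self, pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
        f_pre = self.encoder(pre)    # both images share the same encoder weights
        f_post = self.encoder(post)
        fused = torch.cat([f_pre, f_post], dim=1)  # fusion by concatenation
        return self.decoder(fused)

# Usage: a batch of pre/post image pairs -> per-pixel class logits
# (hypothetically background + the 4 damage classes).
model = SiameseDamageNet()
pre = torch.randn(2, 3, 512, 512)
post = torch.randn(2, 3, 512, 512)
logits = model(pre, post)  # shape: (2, 5, 512, 512)
```

Sharing the encoder halves the parameter count compared with two independent branches and forces both images into the same feature space, which makes the concatenated features directly comparable for change detection.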