Semantic Segmentation of Multi-source Remote Sensing Images Based on Visual Attention Mechanism

Abstract: In recent years, with the continuous development of spatial sensing technology, the demand for fused processing of multi-source remote sensing images has grown steadily, and effectively extracting the complementary information in multi-source images to complete specific tasks has become a research hotspot. To address the problems of information redundancy and global feature extraction in fused semantic segmentation of multi-source remote sensing images, this paper proposes Transformer U-Net (TU-Net), a Transformer-based semantic segmentation model that fuses multispectral (MS), panchromatic (PAN), and synthetic aperture radar (SAR) images. The model applies a Channel-Exchanging-Network (CEN) to the multi-source feature maps in the fusion branches, exchanging channels across modalities to improve information complementarity and reduce data redundancy. After the feature maps are concatenated, a Transformer module with an attention mechanism models the global context of the fused feature map to extract global features of the multi-source images, and the network segments the images in an end-to-end manner. Training and validation on the MSAW dataset show that, compared with current multi-source fusion semantic segmentation algorithms, TU-Net improves the F1 score by 3.31%~11.47% and the Dice coefficient by 4.87%~8.55%, a clear gain in building segmentation quality.
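The channel-exchanging step described above can be illustrated with a minimal sketch. Following the general CEN idea, a channel whose batch-norm scaling factor is near zero is treated as uninformative and replaced by the corresponding channel from the other modality. The function name, the threshold value, and the use of NumPy here are illustrative assumptions; the paper's actual implementation is not shown.

```python
import numpy as np

def channel_exchange(feat_a, feat_b, gamma_a, gamma_b, threshold=0.02):
    """Exchange channels between two modality branches (illustrative sketch).

    feat_a, feat_b : (C, H, W) feature maps from two modalities (e.g. MS and SAR).
    gamma_a, gamma_b : (C,) batch-norm scale factors of each branch; a near-zero
    gamma marks a channel as uninformative, so it is replaced by the other
    modality's channel at the same index.  `threshold` is an assumed value.
    """
    out_a, out_b = feat_a.copy(), feat_b.copy()
    mask_a = gamma_a < threshold        # channels of A to take from B
    mask_b = gamma_b < threshold        # channels of B to take from A
    out_a[mask_a] = feat_b[mask_a]
    out_b[mask_b] = feat_a[mask_b]
    return out_a, out_b
```

Exchanging rather than concatenating keeps the channel count of each branch unchanged, which is what lets the fusion branches reduce redundancy without growing the model.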
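The global-context modeling performed by the Transformer module rests on self-attention, in which every spatial position of the fused feature map attends to every other position. The sketch below shows single-head scaled dot-product attention over a flattened feature map, omitting the learned query/key/value projections and multi-head structure of a full Transformer; it is a simplified illustration, not the paper's module.

```python
import numpy as np

def global_context_attention(tokens):
    """Single-head scaled dot-product self-attention (simplified sketch).

    tokens : (N, D) array, N = H*W spatial positions of the fused feature
    map, D = channel dimension.  Each output token is a softmax-weighted
    mix of ALL input tokens, which is how attention captures global
    context that a local convolution cannot.
    """
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)        # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ tokens                        # (N, D) globally mixed
```

Because each output row is a convex combination of all inputs, the receptive field of this operation is the entire image in a single step, in contrast to the layer-by-layer growth of a convolutional encoder.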

     
