Abstract:
In recent years, with the continuous development of spatial sensing technology, the demand for fusion processing of multi-source remote sensing images has grown steadily, and how to effectively extract complementary information from multi-source images for specific tasks has become a research hotspot. To address the problems of information redundancy and global feature extraction in the semantic segmentation of multi-source images, this paper proposes Transformer U-Net (TU-Net), a Transformer-based model for the fused segmentation of multi-spectral (MS), panchromatic (PAN), and Synthetic Aperture Radar (SAR) images. The model uses a Channel-Exchanging-Network (CEN) to exchange multi-source remote sensing feature maps between the fusion branches, achieving better information complementarity and reducing data redundancy. After the feature maps are concatenated, a Transformer module with an attention mechanism models the global context of the fused feature map and extracts the global features of the multi-source images, and the model segments the images in an end-to-end manner. Training and validation results on the MSAW dataset show that, compared with current multi-source fusion semantic segmentation algorithms, TU-Net improves the F1 score and Dice coefficient by 3.31% to 11.47% and 4.87% to 8.55%, respectively, significantly improving the segmentation of buildings.
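The abstract names two mechanisms: a CEN-style channel exchange between the modality branches and a Transformer that models global context over the concatenated feature map. The following is a minimal sketch of how these pieces could fit together, assuming a PyTorch implementation; the exchange threshold, layer sizes, and the names `channel_exchange` and `FusedGlobalContext` are illustrative assumptions, not the paper's exact code.

```python
# Minimal sketch, assuming PyTorch; sizes and threshold are illustrative.
import torch
import torch.nn as nn


def channel_exchange(x_a, x_b, bn_a, bn_b, threshold=1e-2):
    """CEN-style exchange: channels whose learned BatchNorm scaling
    factor is small (deemed uninformative) are replaced by the
    corresponding channels of the other modality's feature map."""
    out_a, out_b = x_a.clone(), x_b.clone()
    mask_a = bn_a.weight.abs() < threshold  # weak channels in branch A
    mask_b = bn_b.weight.abs() < threshold  # weak channels in branch B
    out_a[:, mask_a] = x_b[:, mask_a]
    out_b[:, mask_b] = x_a[:, mask_b]
    return out_a, out_b


class FusedGlobalContext(nn.Module):
    """Concatenate per-modality features, then model global context
    with a standard Transformer encoder over the spatial positions."""

    def __init__(self, channels, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats):  # feats: list of (B, C_i, H, W) maps
        fused = torch.cat(feats, dim=1)            # (B, C, H, W)
        b, c, h, w = fused.shape
        tokens = fused.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens
        tokens = self.encoder(tokens)              # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Toy usage: exchange channels between two modality branches, then fuse.
bn_ms, bn_pan = nn.BatchNorm2d(8), nn.BatchNorm2d(8)
f_ms = bn_ms(torch.randn(1, 8, 16, 16))    # e.g. MS branch features
f_pan = bn_pan(torch.randn(1, 8, 16, 16))  # e.g. PAN branch features
f_ms, f_pan = channel_exchange(f_ms, f_pan, bn_ms, bn_pan)
ctx = FusedGlobalContext(channels=16)([f_ms, f_pan])  # (1, 16, 16, 16)
```

In this sketch the exchange criterion follows the published CEN idea of gating on BatchNorm scaling factors; whether TU-Net uses that exact criterion, and where in the encoder the exchange happens, is not specified by the abstract.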