Multi-Branch Feature Fusion Classification Network for Chest X-Ray Image Recognition
Abstract: COVID-19 is an infectious disease caused by the novel coronavirus SARS-CoV-2, and it poses a significant challenge to global public health. In clinical practice, chest X-ray (CXR) examination is an important means of identifying COVID-19 infection and other common lung diseases. However, examining COVID-19 patients is time-consuming and labor-intensive for radiologists and increases their risk of infection. An algorithm that can automatically identify COVID-19 from chest X-ray images is therefore particularly important. This paper proposes a deep-learning-based CXR image classification framework that can generate more discriminative features with limited training data. Specifically, a multi-branch classification network is first formed from residual neural networks (ResNet34 and ResNet50) and a Transformer. The ResNet branches effectively extract rich semantic information and fine texture information through deep residual structures, whereas the Transformer branch captures the global semantic features of the image through a self-attention mechanism. A feature interaction module then lets the rich semantic and texture information extracted by the ResNet branches interact with the global semantic features extracted by the Transformer. Finally, a feature fusion module extracts the multiscale semantic features of the image. This method can extract multiscale feature representations under the condition of limited training data to characterize and localize COVID-19-infected regions. Experiments on the public DLAI3 and COVIDx datasets compared the proposed method with 15 others; relative to a ResNet50 baseline, accuracy improved by 1.37% and 0.76%, respectively. The proposed classification method combines the strengths of ResNet and Transformer networks in feature extraction, making the network's recognition of CXR images more accurate.