Abstract:
X-ray prohibited-item detection is an important task: it is used to detect a variety of dangerous goods in airports, stations, and other public places in order to prevent accidents and protect passengers. However, the complex backgrounds of X-ray images and the large variation in target scale make it difficult for traditional detection algorithms to achieve sufficient accuracy. In this paper, the YOLOv5s network model was improved to perform better on X-ray images with complicated backgrounds and large changes in the scale of prohibited items, while still meeting the model-performance and running-speed requirements of actual detection scenarios. First, to enhance the global modeling ability of the network, a transformer was introduced into the backbone network, improving its ability to extract global information and compensating for the limitations of local information. Second, to detect prohibited items of different scales more accurately, we designed a multi-scale, wide-receptive-field adaptive fusion module based on dilated convolution and a convolutional block attention module (CBAM), which allocates receptive-field information across the different scales. This improved the detection accuracy for different prohibited items against complex backgrounds and allowed the model to better adapt to different task scenarios. Finally, an optimized DIoU (EDIoU) bounding-box regression loss function was used, which introduces a penalty weight φ into DIoU; this not only shortened the training time and reduced the bounding-box loss error, but also further improved the detection accuracy for prohibited items. To verify the feasibility of the proposed optimization method, the optimized YOLOv5s model was evaluated on SIXray_OD, a self-built dataset, in the laboratory. The experimental results showed an average detection accuracy of 89.8% for the optimized model, which was 0.9% higher than that of the original model.
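
For orientation, the standard DIoU loss that EDIoU builds on can be sketched as follows. This is a minimal illustration, not the loss used in this work: the abstract does not state how φ enters the formula, so the `penalty_weight` argument below is a hypothetical placeholder that simply scales the center-distance penalty (the default of 1.0 gives plain DIoU). The function name and the corner-coordinate box format are likewise assumptions.

```python
def diou_loss(box_a, box_b, penalty_weight=1.0):
    """Standard DIoU loss for two boxes given as (x1, y1, x2, y2).

    penalty_weight is a hypothetical stand-in for a weight such as phi;
    its placement here is an assumption, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    center_dist_sq = (
        ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    )

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enclose_w ** 2 + enclose_h ** 2

    # DIoU loss: 1 - IoU + (center distance)^2 / (enclosing diagonal)^2.
    return 1.0 - iou + penalty_weight * center_dist_sq / diag_sq
```

Unlike a plain IoU loss, the distance term still gives a useful training signal when the predicted and ground-truth boxes do not overlap: the loss keeps decreasing as the box centers move closer together.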