LU Zihao, XU Yanjie, SUN Hao, et al. Adversarial robustness enhancement technology for deep image recognition based on multi-model orthogonalization[J]. Journal of Signal Processing, 2024, 40(3): 503-515. DOI: 10.16798/j.issn.1003-0530.2024.03.009.

Adversarial Robustness Enhancement Technology for Deep Image Recognition Based on Multi-Model Orthogonalization

In recent years, deep neural networks (DNNs) have been applied with great success to a variety of computer-vision tasks such as image classification, object detection, and image segmentation. However, DNN models remain exposed to security risks such as adversarial attacks because of their inherent vulnerability: an attacker adds small, hard-to-detect perturbations to an image, which can cause a model to produce incorrect outputs with high confidence. Ensembling multiple DNN models has become an effective way to improve adversarial robustness. However, the transferability of adversarial examples among the sub-models of an ensemble may significantly reduce its defense effectiveness, and an intuitive theoretical analysis of how to reduce this transferability is still needed. This paper introduces the concept of a loss field, quantitatively describes the adversarial transferability between DNN models, derives an upper bound on that transferability, and shows that promoting orthogonality between the models' loss fields and reducing their strength lowers this upper bound. It then adds a PORS penalty term to the original loss, which maintains recognition performance on the original data while enhancing overall adversarial robustness by reducing adversarial transferability among sub-models. Experiments on ensembles trained with PORS on the CIFAR-10 and MNIST datasets compare it with other advanced ensemble defense methods in white-box and black-box attack settings. The results show that PORS significantly improves adversarial robustness under white-box attacks while maintaining very high recognition accuracy on the original datasets, and that under black-box transfer attacks in particular it is the most stable of the ensemble defense methods compared.
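The abstract does not give the exact form of the PORS term, so the following is only a minimal sketch of the general idea it describes: penalize alignment between sub-models' loss fields and penalize their strength. It assumes, for illustration, that a model's loss field at an input can be read as the gradient of its loss with respect to that input, and it uses tiny linear softmax classifiers so the gradient can be written in closed form. The names `input_gradient`, `orthogonality_penalty`, and `strength_weight` are hypothetical and do not come from the paper.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def input_gradient(W, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x for a
    linear softmax model with logits = W x (closed form: W^T (p - onehot(y)))."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p = softmax(logits)
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cosine(a, b, eps=1e-12):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)

def orthogonality_penalty(grads, strength_weight=0.1):
    """Illustrative penalty (an assumption, not the paper's PORS formula):
    squared cosine similarity over all sub-model pairs, plus a weighted
    sum of squared gradient norms as the 'strength' term."""
    pair_term = 0.0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            pair_term += cosine(grads[i], grads[j]) ** 2
    strength_term = sum(dot(g, g) for g in grads)
    return pair_term + strength_weight * strength_term

# Toy two-class sub-models on a 2-D input (made-up weights).
x, y = [1.0, 1.0], 0
W_a = [[1.0, 0.0], [-1.0, 0.0]]   # depends only on the first input coordinate
W_b = [[0.0, 1.0], [0.0, -1.0]]   # depends only on the second input coordinate
W_c = [[1.0, 0.0], [-1.0, 0.0]]   # a copy of sub-model a

g_a = input_gradient(W_a, x, y)
g_b = input_gradient(W_b, x, y)
g_c = input_gradient(W_c, x, y)

penalty_orthogonal = orthogonality_penalty([g_a, g_b])
penalty_aligned = orthogonality_penalty([g_a, g_c])
print(penalty_orthogonal, penalty_aligned)
```

In this toy setup the ensemble of `W_a` and `W_b` has orthogonal input gradients, so a perturbation that raises one sub-model's loss most efficiently does little to the other, and the penalty is lower than for the ensemble of two identical sub-models. In training, a term of this kind would be added to the ordinary classification loss, which in a deep-learning framework requires differentiating through the input gradient.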