Inception3D Net Based Video Face Swapping Forgery Detection Jointly Exploiting Eye and Mouth Areas

    Abstract: DeepFake technology, which has emerged and developed rapidly in recent years, has profoundly changed how multimedia content is forged and how convincing such forgeries are, posing severe new challenges to content security in cyberspace. This paper focuses on video face swapping, the most harmful form of deep forgery, and proposes a two-stream detection method that exploits eye and mouth artifacts and is built on the Inception3D (I3D) network. First, because most existing forgery detection methods ignore the temporal information in video, we extend the commonly used 2D convolution, which perceives only the spatial domain, to I3D convolution, so that the network learns spatial and temporal information simultaneously. Second, we adjust the I3D network structure, originally designed for multi-class classification, into an efficient network better suited to the binary classification task of face-swapping forensics. Third, because the eye and mouth regions are harder to forge and more likely to retain tampering artifacts in face-swapped video, we propose a two-stream network built on these two regions and combine the outputs of both streams into a joint decision. Extensive experiments on the commonly used Celeb-DF, DFDC, DeepFakeDetection, and FaceForensics++ datasets show that the proposed method substantially improves on both the state-of-the-art Xception network and the standard I3D network in detection accuracy and computational efficiency.
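The two ideas the abstract leans on, a convolution that spans time as well as space and a joint decision over an eye stream and a mouth stream, can be illustrated with a small pure-Python sketch. It is not the paper's implementation: the helper names (`conv3d_valid`, `fuse_two_streams`), the toy clips, the logit values, and in particular the fusion rule (a weighted average of per-stream softmax probabilities) are assumptions made for illustration, since the abstract does not specify how the two outputs are combined.

```python
import math


def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over a clip indexed as [t][y][x]."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel window covers kt frames, not just one image.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(eye_logits, mouth_logits, eye_weight=0.5):
    """Assumed fusion rule: weighted average of [real, fake] probabilities."""
    p_eye = softmax(eye_logits)
    p_mouth = softmax(mouth_logits)
    fused = [eye_weight * a + (1.0 - eye_weight) * b
             for a, b in zip(p_eye, p_mouth)]
    label = "fake" if fused[1] > fused[0] else "real"
    return fused, label


# A kernel spanning 2 frames and 1x1 pixels: it responds only to change over time.
temporal_diff = [[[-1.0]], [[1.0]]]

# Hypothetical 3-frame, 2x2-pixel clips: one static, one with a single-frame flicker.
static_clip = [[[5.0, 5.0], [5.0, 5.0]] for _ in range(3)]
flicker_clip = [
    [[5.0, 5.0], [5.0, 5.0]],
    [[9.0, 5.0], [5.0, 5.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]

static_response = conv3d_valid(static_clip, temporal_diff)
flicker_response = conv3d_valid(flicker_clip, temporal_diff)

# Hypothetical per-stream logits in [real, fake] order.
fused, label = fuse_two_streams([2.0, -1.0], [-1.0, 4.0])
print(static_response, flicker_response, fused, label)
```

The temporal-difference kernel returns zeros on the static clip and nonzero values on the flickering one, which is the kind of frame-to-frame inconsistency a per-frame 2D convolution cannot see. In the fusion step, `eye_weight` is a free parameter of this sketch, not a value taken from the paper.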

     
