基于注意力与U-NET的深度洗白图像篡改定位方法

A U-Net and Attention-Based Method for Tampering Localization in Laundered Images

  • 摘要: 近年来,生成式图像技术的迅猛发展为数字内容创作开辟了广阔前景,同时也催生了愈加隐蔽的图像篡改行为。其中,一种被称为“洗白”的新型伪造手段逐渐浮现。该操作借助生成式修复技术,针对经扩散模型等工具篡改的区域,将其痕迹无缝替换为视觉合理的自然内容,从而掩盖原始篡改证据。这一过程严重破坏了传统取证方法所依赖的特征不一致性假设,使其面临前所未有的检测挑战。为此,本文提出一种基于U-Net架构、融合多阶段协同训练机制的图像“洗白”强度预测与篡改定位模型。本文构建了一个三阶段渐进式训练框架,依次对“洗白”强度回归预测、图像内容复原及篡改区域定位三个核心任务进行协同优化。针对不同任务特性,各阶段采用定制化的损失函数作为优化目标,以适配其需学习的数据特征分布。模型在编解码结构中嵌入了多头注意力机制,并将扩散模型中的时间步嵌入替换为“洗白”强度条件向量,从而实现对不同程度修复操作的自适应响应。在多个公开数据集上的实验结果表明,本文方法在“洗白”强度预测、图像复原质量与篡改区域定位等方面均取得了良好效果。与同类篡改定位模型相比,本文模型亦表现出更优的综合性能。消融实验进一步验证了多任务协同训练机制及各模块设计的有效性,证实引入回归结果与复原图像特征作为先验信息可显著提升模型性能。本文所提方法为应对生成式模型“洗白”后的图像取证问题提供了有效的技术路径,在特征感知、结构保持与定位精度方面具备明显优势,对推动图像取证技术发展具有重要意义。

     

    Abstract: In recent years, the rapid development of generative image technology has opened up broad prospects for digital content creation, while also giving rise to increasingly covert forms of image tampering. One of these is an emerging forgery technique known as "laundering". It applies generative inpainting to regions that were tampered with by tools such as diffusion models, seamlessly replacing the tampering traces with visually plausible natural content and thereby concealing the original evidence. This undermines the feature-inconsistency assumption on which traditional forensic methods rely and poses severe detection challenges. To address this problem, this paper proposes a U-Net-based model for "laundering" intensity prediction and tampering localization that incorporates a multi-stage collaborative training mechanism. A three-stage progressive training framework optimizes three core tasks in sequence: regression of the "laundering" intensity, restoration of the image content, and localization of the tampered region. Each stage uses a loss function tailored to its task, so that the optimization objective matches the data feature distribution to be learned. The model embeds multi-head attention in its encoder-decoder structure and replaces the timestep embedding used in diffusion models with a "laundering" intensity condition vector, allowing it to respond adaptively to different degrees of inpainting. Experiments on multiple public datasets show that the proposed method performs well in "laundering" intensity prediction, image restoration quality, and tampered-region localization, and that it achieves better overall performance than comparable tampering localization models. Ablation experiments further validate the multi-task collaborative training mechanism and the design of each module, and confirm that using the regression results and restored-image features as prior information significantly improves model performance. The proposed method offers an effective technical path for the forensic analysis of images "laundered" by generative models, shows clear advantages in feature perception, structure preservation, and localization accuracy, and contributes to the advancement of image forensics technology.
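
The abstract states that the diffusion-style timestep embedding is replaced by a "laundering" intensity condition vector, but it does not specify how that vector is built. The following is a minimal sketch of one plausible construction, assuming the intensity is a scalar in [0, 1] and reusing the sinusoidal form commonly used for timestep embeddings. The function name `intensity_embedding` and the parameters `dim`, `max_period`, and `scale` are hypothetical and are not taken from the paper.

```python
import math


def intensity_embedding(strength, dim=8, max_period=10000.0, scale=1000.0):
    """Sinusoidal embedding of a scalar laundering strength in [0, 1].

    Assumption: this mirrors the usual diffusion timestep embedding, with
    the integer timestep replaced by a scaled strength value. The paper's
    actual condition vector may be built differently.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")

    half = dim // 2
    # Map the strength onto a timestep-like range so the sinusoids vary enough.
    position = strength * scale
    # Geometrically decreasing frequencies, starting at 1.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    sines = [math.sin(position * f) for f in freqs]
    cosines = [math.cos(position * f) for f in freqs]
    # First half holds the sine terms, second half the cosine terms.
    return sines + cosines


if __name__ == "__main__":
    for s in (0.0, 0.25, 1.0):
        print(s, [round(v, 3) for v in intensity_embedding(s)])
```

In a U-Net conditioned this way, such a vector would typically be passed through a small learned projection and added to the feature maps of each block, in the same place a timestep embedding would otherwise enter. Whether the paper's model does exactly this is not stated in the abstract.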

     
