Abstract:
In recent years, the rapid development of generative image technology has opened broad prospects for digital content creation, but it has also enabled increasingly covert image tampering. One emerging forgery technique is known as "laundering". This operation uses generative inpainting to seamlessly replace the traces left in regions altered by tools such as diffusion models with visually plausible natural content, thereby concealing the original evidence of tampering. The process severely undermines the feature-inconsistency assumption on which traditional forensic methods rely, which makes detection considerably harder. To address this, this paper proposes a U-Net-based model for image "laundering" intensity prediction and tampering localization that integrates a multi-stage collaborative training mechanism. A three-stage progressive training framework jointly optimizes three core tasks: "laundering" intensity regression, image content restoration, and tampered-region localization. Each stage uses a loss function tailored to its task, so that the optimization objective matches the feature distribution to be learned at that stage. The model embeds a multi-head attention mechanism in the encoder-decoder structure and replaces the time-step embedding of the diffusion model with a "laundering" intensity condition vector, which allows it to respond adaptively to different degrees of restoration. Experimental results on multiple public datasets show that the proposed method performs well in "laundering" intensity prediction, image restoration quality, and tampered-region localization, and that it achieves better overall performance than comparable tampering localization models.
Ablation experiments further validate the multi-task collaborative training mechanism and the design of each module, confirming that using the regression results and the features of the restored images as prior information significantly improves model performance. The proposed method offers an effective technical path for the forensic analysis of images "laundered" by generative models, shows clear advantages in feature perception, structure preservation, and localization accuracy, and contributes to the advancement of image forensics technology.
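
As a minimal sketch of the conditioning idea summarized above, the following assumes that the scalar "laundering" intensity is embedded with the sinusoidal scheme commonly used for diffusion time steps. The function name `intensity_embedding` and its parameters `dim` and `max_period` are hypothetical and introduced only for illustration; the embedding actually used by the model may differ.

```python
import math


def intensity_embedding(intensity, dim=8, max_period=10000.0):
    """Map a scalar laundering intensity to a sinusoidal condition vector.

    Assumption: the intensity is embedded the way diffusion models commonly
    embed a time step. This is an illustrative stand-in, not the paper's
    confirmed design.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decreasing
    # towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half of the vector holds sines, second half holds cosines.
    sines = [math.sin(intensity * f) for f in freqs]
    cosines = [math.cos(intensity * f) for f in freqs]
    return sines + cosines


# An intensity of zero yields all-zero sines and all-one cosines.
print(intensity_embedding(0.0, dim=4))  # [0.0, 0.0, 1.0, 1.0]
```

In a conditional U-Net, a vector of this kind would typically be passed through a small projection and added to the features of each block, in the place where a diffusion model would inject its time-step embedding.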