A survey on speech forgery and detection

Abstract: Speech carries both linguistic content and the speaker's identity. Speech forgery techniques can imitate a target speaker's voice closely enough to deceive human listeners or machine (automatic) systems. Deepfakes currently pose a serious threat to political, economic, and social stability worldwide, and speech forgery is one of the core techniques through which deepfakes are used to manipulate public opinion. In recent years, forged speech has become markedly more human-like and natural, which makes detecting it considerably harder. This article reviews the state of research on mainstream speech forgery and forged-speech detection, covering: 1) the basic concepts, development history, and recent progress of mainstream speech forgery techniques, namely speech synthesis, voice conversion, and adversarial audio examples; 2) the basic concepts, performance evaluation metrics, main technical principles, and reported performance of forged-speech detection methods; and 3) the main competitions, commonly used datasets, and available code and tool resources related to forged-speech detection. Finally, it discusses open challenges and future research directions in speech forgery and detection.
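Among the performance evaluation metrics for forged-speech detection, the equal error rate (EER) is widely reported, for example in the ASVspoof challenges. It is the operating point at which the rate of spoofed trials accepted as bona fide equals the rate of bona fide trials rejected. The sketch below illustrates the idea and is not the survey authors' method; the function name `compute_eer`, the convention that higher scores mean "bona fide", and the accept-if-`score >= threshold` rule are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a detector whose higher scores mean 'bona fide'.

    Assumption (hypothetical convention): a trial is accepted as bona fide
    when its score >= threshold. Every observed score is tried as a threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    candidates = sorted(set(bonafide_scores) | set(spoof_scores))
    candidates.append(float("inf"))  # a threshold that rejects every trial
    best_gap, best = float("inf"), (1.0, candidates[0])
    for t in candidates:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Where FAR and FRR do not meet exactly, report their midpoint.
            best_gap, best = gap, ((far + frr) / 2, t)
    return best


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.5, 0.1, 0.7, 0.2])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Sweeping only the observed scores keeps the sketch short and exact on small score lists; evaluation toolkits typically compute the ROC curve over all trials and interpolate the crossing point instead of taking a midpoint.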

     
