Research on Audio Super-resolution Method Based on FFTNet-GAN

XU Feng; LI Ping

doi:10.16798/j.issn.1003-0530.2021.01.007

Abstract

This paper proposes a generative adversarial network (GAN) model based on FFTNet for extreme audio super-resolution. The generator is a parallel, non-causal FFTNet with non-local operations and a three-way split-and-sum structure. This shallow model is fast and accurate: it better captures the long-term correlation structure of time-domain audio and extracts features at the desired resolution, which improves reconstruction performance. A discriminator of matching capacity is designed so that it fits stably into the generative adversarial architecture. A frequency-domain perceptual loss is combined with a sample-space loss under fixed weights to reduce reconstruction distortion and improve perceptual quality. In a systematic subjective and objective evaluation, the proposed method outperforms the baseline model. The restoration results at 2x/4x/6x upscaling show that the model is capable of extreme high-frequency reconstruction, which helps improve the temporal resolution of audio signals.
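The fixed-weight combination of a sample-space loss and a frequency-domain loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of either term, so everything here is an assumption made for illustration: the sample-space term is taken to be mean squared error on the waveform, the frequency-domain term is taken to be the mean absolute difference of log-magnitude spectrograms, and the names `combined_loss`, `w_sample`, and `w_freq` as well as the weight values are hypothetical. The adversarial term of the GAN is left out.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def log_magnitude_spectrogram(x, frame_len=256, hop=128, eps=1e-8):
    """Log-magnitude short-time spectrum using a Hann window."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)


def sample_space_loss(pred, target):
    """Assumed sample-space term: mean squared error on waveform samples."""
    return float(np.mean((pred - target) ** 2))


def frequency_domain_loss(pred, target):
    """Assumed frequency-domain term: L1 distance of log-magnitude spectra."""
    diff = log_magnitude_spectrogram(pred) - log_magnitude_spectrogram(target)
    return float(np.mean(np.abs(diff)))


def combined_loss(pred, target, w_sample=1.0, w_freq=0.1):
    """Fixed-weight sum of the two terms; the weights are placeholders."""
    return (w_sample * sample_space_loss(pred, target)
            + w_freq * frequency_domain_loss(pred, target))


# Toy example: a two-tone target and a "reconstruction" missing the high tone.
sr = 16000
t = np.arange(sr // 4) / sr
target = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
pred = np.sin(2 * np.pi * 440 * t)

print(combined_loss(target, target))  # identical signals give zero loss
print(combined_loss(pred, target))    # missing high band gives a positive loss
```

In this toy setup the frequency-domain term penalizes the missing 6 kHz component directly in the spectrum, which is the kind of high-band error a waveform-only loss tends to weight weakly when the missing component has low amplitude.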
