Abstract:
This paper proposes an FFTNet-based generative adversarial network (GAN) for extreme audio super-resolution. The generator is a parallel, non-causal, non-local FFTNet with a three-way split-sum structure. This shallow model is fast and accurate. It more effectively captures the long-term correlation structure of time-domain audio and extracts features at the desired resolution, which helps improve reconstruction performance. In addition, a discriminator of matching capacity is designed so that adversarial training remains stable. A frequency-domain perceptual loss is combined, with a fixed weight, with a sample-space loss to reduce reconstruction distortion and improve perceptual quality. In both subjective and objective evaluations, the proposed method outperforms the baseline model. Results at 2x, 4x, and 6x downsampling factors show that the model can reconstruct high-frequency content even under extreme downsampling, which helps improve the temporal resolution of the audio signal.
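The following NumPy sketch makes the generator's building block concrete by showing one possible non-causal, three-way split-sum layer. It is a hypothetical illustration, not the authors' implementation. The names `split_sum_layer` and `split_sum_stack`, the zero padding, the use of matrix products as 1x1 convolutions, and the ReLU placement (borrowed from the original causal FFTNet layer) are all assumptions. Each output step t combines x[t - s], x[t], and x[t + s] through three separate 1x1 convolutions (the "split"), adds the results (the "sum"), and then applies a second 1x1 convolution.

```python
import numpy as np

# Hypothetical sketch of a non-causal three-way split-sum layer; not the paper's code.


def split_sum_layer(x, shift, w_left, w_center, w_right, w_out):
    """One non-causal, three-way split-sum layer.

    x        : array of shape (c_in, T)
    shift    : offset s >= 0; output step t mixes x[t - s], x[t], x[t + s]
    w_left, w_center, w_right : (c_hidden, c_in) 1x1-convolution weights
    w_out    : (c_out, c_hidden) 1x1-convolution weights
    """
    _, t = x.shape
    # Zero-pad both ends so every output step has a left and a right neighbour.
    padded = np.pad(x, ((0, 0), (shift, shift)))
    left = padded[:, :t]                  # x[t - s]
    center = padded[:, shift:shift + t]   # x[t]
    right = padded[:, 2 * shift:]         # x[t + s]
    # Split into three 1x1 convolutions, sum them, then apply ReLU.
    z = np.maximum(w_left @ left + w_center @ center + w_right @ right, 0.0)
    # Output 1x1 convolution followed by ReLU (assumed, as in causal FFTNet).
    return np.maximum(w_out @ z, 0.0)


def split_sum_stack(x, layers):
    """Apply layers given as (shift, (w_left, w_center, w_right, w_out)) tuples."""
    for shift, weights in layers:
        x = split_sum_layer(x, shift, *weights)
    return x
```

Because no output step depends on an earlier output, all time steps are computed in parallel. Stacking layers with shifts 9, 3, and 1 (powers of three, again an assumption rather than a detail from the paper) gives each output a gapless 27-sample receptive field: every offset from -13 to 13 is a sum of one element each from {-9, 0, 9}, {-3, 0, 3}, and {-1, 0, 1}. The receptive field therefore grows geometrically with depth, which is one way a shallow model can capture long-term structure.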