[R] Audio Denoising with Deep Network Priors
Hi! First author here –
In this paper we tackle the problem of audio and speech denoising. Given a noisy audio stream, i.e., a mixture of speech and background noise, we would like to recover the clean speech signal. While most methods rely on supervised deep learning, we use only the noisy sample itself, without any learned model or additional dataset, giving a fully unsupervised method.
To accomplish this, we train an autoencoder to fit the noisy signal from a fixed random noise input.
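To give a feel for this fitting step, here is a minimal numpy sketch (not the paper's architecture): a tiny two-layer network is trained by gradient descent to reproduce a noisy 1-D signal from a fixed random input vector. The network sizes, learning rate, and synthetic signal are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256                                        # signal length (assumed)
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 5 * t)              # stand-in for the clean signal
noisy = clean + 0.3 * rng.standard_normal(n)   # observed noisy mixture

z = rng.standard_normal(64)                    # fixed random noise input
W1 = 0.1 * rng.standard_normal((128, 64))
W2 = 0.1 * rng.standard_normal((n, 128))

def forward(W1, W2, z):
    h = np.tanh(W1 @ z)        # hidden activation
    return W2 @ h, h           # network output and hidden state

lr = 1e-2
for step in range(2000):
    out, h = forward(W1, W2, z)
    err = out - noisy                          # gradient of 0.5*||out-noisy||^2
    gW2 = np.outer(err, h)
    gh = W2.T @ err
    gW1 = np.outer(gh * (1 - h ** 2), z)       # backprop through tanh
    W2 -= lr * gW2 / n
    W1 -= lr * gW1 / n

out, _ = forward(W1, W2, z)
print(np.mean((out - noisy) ** 2))             # reconstruction error drops as we fit
```

In the actual method a convolutional audio architecture is used, but the principle is the same: the network is asked to reproduce one noisy sample from noise, and the fitting dynamics are what we exploit.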
We observed that the network fits the noise in the signal more slowly than the "clean" part, and during the fitting process the output fluctuates across different stages of training.
Exploiting this, we measure the differences between network outputs at different training stages in the time-frequency domain and use them to build a robust spectral mask for denoising the noisy signal.
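The masking idea can be sketched as follows: outputs saved at different training stages agree on the clean content but fluctuate on the noise, so time-frequency bins with low variability across stages are kept. The frame/FFT sizes, the thresholding rule, and the synthetic "stage outputs" below are all illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(1)

def spectrogram(x, frame=64, hop=32):
    # Magnitude of a simple framed, Hann-windowed FFT: shape (time, freq).
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

n = 1024
t = np.arange(n) / n
tone = np.sin(2 * np.pi * 32 * t)                      # stand-in "clean" content
noisy = tone + 0.5 * rng.standard_normal(n)

# Stand-ins for network outputs at several training stages: the tone is
# reproduced consistently, while the noise realisation fluctuates per stage.
outputs = [tone + 0.5 * rng.standard_normal(n) for _ in range(8)]

specs = np.stack([spectrogram(o) for o in outputs])    # (stage, time, freq)
variability = specs.std(axis=0) / (specs.mean(axis=0) + 1e-8)
mask = (variability < variability.mean()).astype(float)  # keep stable bins

denoised_spec = spectrogram(noisy) * mask              # masked spectrogram
```

With frame length 64, the 32-cycle tone falls in FFT bin 2, and those bins come out with low variability (kept by the mask), while noise-dominated bins fluctuate across stages and are attenuated. In practice the masked spectrogram would then be inverted back to a waveform.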
We also tested the algorithm on audio domains other than speech, and it shows the same effect: it denoises, or filters out, the main content of a signal using only the noisy signal itself.
You can listen to samples and find a comparison with traditional unsupervised methods here
Feel free to ask any questions.