[Research] STFT within neural network pipeline
I have been thinking about this and searching for an answer for a while, but I couldn't really find a satisfactory solution on Google (or maybe I'm not searching for the right thing), hence my post here.
Assume I have a GAN that generates raw audio waveforms. The generator is a convolutional neural network that produces raw audio waveforms, which are passed to a discriminator that evaluates them, and backprop is performed. This is pretty straightforward.
But I found that my discriminator is pretty bad at distinguishing real from fake waveforms, so I think it would be beneficial to convert the generated waveform to a spectrogram with an STFT and discriminate real from fake spectrograms instead.
I understand how the forward pass is performed, but my problem is with backprop. I understand that we compute an error based on the discriminator's predictions and backpropagate it through the discriminator, which is a standard CNN classifier. But what happens between the discriminator and the generator? Do we perform an ISTFT on the backpropagated error? And how is this done in Keras or PyTorch? Would it be some special kind of intermediary layer? I would like to implement this, but I have no idea where to even start.
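To make my question concrete, here is a minimal sketch of what I imagine this might look like in PyTorch, assuming `torch.stft` is differentiable so autograd would handle the backward pass through the transform (the `STFTLayer` name and the parameter values are just my own placeholders):

```python
import torch
import torch.nn as nn

class STFTLayer(nn.Module):
    """Hypothetical layer sitting between generator and discriminator.
    torch.stft is a differentiable op, so gradients should flow through
    it automatically via autograd -- no manual ISTFT on the error."""
    def __init__(self, n_fft=512, hop_length=128):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # register the window as a buffer so it moves with the module's device
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, waveform):
        # waveform: (batch, samples) -> complex spectrogram (batch, freq, frames)
        spec = torch.stft(
            waveform,
            n_fft=self.n_fft,
            hop_length=self.hop_length,
            window=self.window,
            return_complex=True,
        )
        # magnitude spectrogram; still part of the autograd graph
        return spec.abs()

# sanity check: do gradients actually reach the raw waveform?
waveform = torch.randn(4, 16000, requires_grad=True)
spec = STFTLayer()(waveform)
spec.sum().backward()
print(waveform.grad is not None)
```

If this is correct, the discriminator would just consume `STFTLayer()(generator_output)` and the generator would receive gradients as usual, but I'm not sure whether this is the standard way to do it.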
In general, how is a domain conversion handled within a neural network pipeline?
It would be really helpful if you could share your thoughts on this, or point me towards existing work on the topic. Thanks in advance and cheers!