[R] Audio Conversion GAN with Unpaired Data
For the past month I have been working on voice conversion using unpaired data. I naively applied image conversion algorithms to audio spectrograms and, after working out a few obstacles, got convincing, although not perfect, results.
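For anyone unfamiliar with the setup, here is a minimal sketch of how a waveform can be turned into a spectrogram "image" that an image-to-image model can consume. This is not my actual preprocessing: the function name `log_spectrogram`, the FFT size, the hop length, and the 16 kHz test tone are all assumptions chosen just for illustration.

```python
import numpy as np

def log_spectrogram(wave, n_fft=512, hop=128, eps=1e-6):
    """Return a (freq_bins, frames) log-magnitude spectrogram of a 1-D waveform.

    Illustrative parameters only; not the settings used in the actual project.
    """
    window = np.hanning(n_fft)
    # Zero-pad very short signals so that at least one full frame exists.
    if len(wave) < n_fft:
        wave = np.pad(wave, (0, n_fft - len(wave)))
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame: shape (frames, n_fft // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, then transpose so frequency is the vertical axis.
    return np.log(magnitude + eps).T

# Hypothetical test signal: one second of a 440 Hz sine at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
S = log_spectrogram(wave)
print(S.shape)
```

The resulting 2-D array can be treated like a single-channel image, with frequency on one axis and time on the other.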
The exact same algorithm also works for music genre conversion, and the results, despite a fairly shallow, low-capacity generator, are pretty interesting.
Here are some examples:
The model is able to translate audio signals of any length and does not use any vocoder.
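To give an idea of how arbitrary-length input can be handled in general, here is a rough sketch of one common approach: split the spectrogram into fixed-width chunks along the time axis, translate each chunk, and stitch the outputs back together. This is an assumption for illustration and not necessarily what the model does internally. `translate_any_length`, `dummy_generator`, and the chunk width of 128 frames are all hypothetical, and the dummy generator simply stands in for a trained network.

```python
import numpy as np

def translate_any_length(spec, generator, chunk=128):
    """Apply `generator` to fixed-width time chunks of a (freq, time) spectrogram.

    Assumes `spec` has at least one time frame and that `generator` returns
    an array with the same shape as its input.
    """
    n_freq, n_time = spec.shape
    # Pad the time axis up to the next multiple of `chunk`, using the
    # minimum value (roughly "silence" for a log-magnitude spectrogram).
    pad = (-n_time) % chunk
    padded = np.pad(
        spec, ((0, 0), (0, pad)), mode="constant", constant_values=spec.min()
    )
    # Translate each chunk independently.
    pieces = [
        generator(padded[:, start : start + chunk])
        for start in range(0, padded.shape[1], chunk)
    ]
    # Stitch the chunks together and drop the padded frames.
    return np.concatenate(pieces, axis=1)[:, :n_time]

def dummy_generator(x):
    # Placeholder for a trained generator network (hypothetical).
    return x + 1.0

spec = np.random.default_rng(0).normal(size=(257, 300))
out = translate_any_length(spec, dummy_generator)
print(out.shape)
```

A fully convolutional generator could in principle also be run directly on the whole spectrogram without chunking; the sketch above just shows the simplest length-agnostic wrapper.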
I cannot find papers with similar approaches, and I don’t really know what I should do with this research. As an engineering student who doesn’t understand how the academic world works, I think a simple article and a code release may be the best idea.
Thank you for your attention!