[D] CycleGAN performance peaks before first epoch is finished–help needed!
I have been trying to use an unpaired image-to-image style transfer GAN to convert shoes to dresses. After trying a few different algorithms (including DiscoGAN, StarGAN, and MUNIT), I settled on CycleGAN as it has given me the best results.
I have collected 99,500 images of dresses and 114,000 images of shoes. Within each category, all images have the same orientation and scale. Compared to other projects, I think my dataset quality is quite good.
The quality of the generated images initially improves well, until around 1/3 of an epoch (30,000 steps at a batch size of 1). After that, the results get worse, noticeably so by step 32,000 already. The model seems to suffer mode collapse once the first epoch has been completed.
The results generated by the model after 30,000 steps still show strong checkerboard artifacts. I tried replacing the transposed convolutions with upsampling operations (see here), but that resulted in mode collapse very quickly.
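For context, one commonly cited cause of checkered output is uneven kernel overlap in transposed convolutions, which happens when the kernel size is not divisible by the stride. The sketch below is a plain-Python illustration in 1-D, not the training code from this post. `overlap_counts` is a hypothetical helper, and the kernel/stride pairs (3, 2) and (4, 2) are assumed examples rather than the settings actually used here.

```python
def overlap_counts(n_in, kernel, stride):
    """Count how many (input position, kernel tap) pairs write to each
    output position of a 1-D transposed convolution without padding."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):          # each input position ...
        for j in range(kernel):    # ... spreads over `kernel` outputs
            counts[i * stride + j] += 1
    return counts


# Kernel 3, stride 2: interior positions alternate between 2 and 1
# contributions, which is the uneven pattern behind checkerboards.
print(overlap_counts(6, kernel=3, stride=2))
# [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]

# Kernel 4, stride 2: interior positions all receive 2 contributions.
print(overlap_counts(6, kernel=4, stride=2))
# [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
```

Even overlap does not guarantee artifact-free output, since the learned weights also matter, so this is only a quick way to check whether a given kernel/stride choice is a likely contributor.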
I am not sure if I have already reached the best performance I can expect with such small images (64×64) but it’s hard to believe since this would imply that the “best results” are achieved before the GAN has seen the whole training set. I am unable to get anything comparable with larger images (128×128 or 256×256).
Additionally, I feel like I am battling many issues at the same time. First, I am finding it difficult to get high-quality outputs. Second, the GAN is very unstable and collapses frequently. Finally, it is often the case that, even though the GAN produces the same output for any input, it is still able to reconstruct the input image (see image 2).
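The "same output for any input" symptom can be tracked with a number instead of by eye. The following is a minimal sketch, assuming each image has been flattened into a list of pixel values; `mean_pairwise_l1` and `diversity_ratio` are hypothetical helper names, not part of any library. It compares how much a batch of generator outputs differ from each other with how much the corresponding inputs differ.

```python
from itertools import combinations


def mean_pairwise_l1(images):
    """Mean per-pixel absolute difference over all pairs of images.
    `images` is a list of equal-length flat lists of pixel values."""
    pairs = list(combinations(images, 2))
    if not pairs:
        raise ValueError("need at least two images")
    total = 0.0
    for a, b in pairs:
        total += sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def diversity_ratio(inputs, outputs, eps=1e-8):
    """Output diversity relative to input diversity. A value near 0
    means the outputs are almost identical although the inputs differ."""
    return mean_pairwise_l1(outputs) / (mean_pairwise_l1(inputs) + eps)


# Toy example with three 2-pixel "images".
inputs = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(diversity_ratio(inputs, collapsed))  # 0.0
```

Logging this ratio on a fixed validation batch every few thousand steps would show when the collapse starts, which may be easier to compare across runs than inspecting sample grids.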
I have already played around with many of the training settings: batch size, learning rate, decay rate, loss function composition, feature matching, Wasserstein loss, noise added to the input images, extra layers in the discriminator and generator, upsample operations in place of transposed convolutions, and the buffer size. I also attempted to keep the kernel size proportional to the 64×64 image ratio when using larger images. However, since there are so many things to change, my experiments were not exhaustive. In particular, I feel like I could try playing around with the decay rates and the composition of the loss function in more detail. Things that are on my list but that I haven’t tried yet are mini-batch discrimination, packing (from PacGAN), different normalization (currently, I am using InstanceNorm), and logit loss (from LOGAN).
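For readers unfamiliar with the buffer mentioned above: CycleGAN typically feeds the discriminator a mix of fresh and previously generated images drawn from a fixed-size history. Below is a minimal plain-Python sketch of that idea. The class name `ImageBuffer`, the default size of 50, and the 50% swap probability are assumptions based on the commonly described scheme, not details confirmed for this particular setup.

```python
import random


class ImageBuffer:
    """Fixed-size history of generated images (illustrative sketch)."""

    def __init__(self, max_size=50, rng=None):
        self.max_size = max_size
        self.items = []
        self.rng = rng or random.Random()

    def query(self, image):
        """Return the image to show the discriminator for this step."""
        if self.max_size <= 0:
            return image                   # buffer disabled
        if len(self.items) < self.max_size:
            self.items.append(image)       # still filling up
            return image
        if self.rng.random() < 0.5:
            # Swap: hand back an older image and store the new one.
            idx = self.rng.randrange(self.max_size)
            old = self.items[idx]
            self.items[idx] = image
            return old
        return image                       # use the fresh image as-is


# Usage with integers standing in for generated images.
buf = ImageBuffer(max_size=3, rng=random.Random(0))
shown = [buf.query(step) for step in range(10)]
```

A larger buffer exposes the discriminator to older generator states for longer, which is one reason changing the buffer size can alter training stability.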
Any advice would be greatly appreciated, as I have spent many hours on this and, without any positive feedback from my experiments, I am not learning much.
I’d be happy to give more info.