[D] CycleGAN performance peaks before first epoch is finished–help needed!

Written by torontoai on May 29, 2019. Posted in Reddit MachineLearning.

Hi,

I have been trying to use an unpaired image-to-image style transfer GAN to convert shoes to dresses. After trying a few different algorithms (including DiscoGAN, StarGAN, and MUNIT), I settled on CycleGAN as it has given me the best results.

I have collected 99,500 and 114,000 images of dresses and shoes, respectively. All of the shoes and dresses are the same orientation and scale (within category). Compared to other projects, I think my dataset quality is quite good.

I am using an implementation by aitorzip (my fork). I have tried a few different configurations but haven’t been able to get anything better than the default configurations.

The quality of the generated images initially improves well, until around 1/3 of an epoch (30,000 steps at a batch size of 1). After that, even by 32,000 steps, the results worsen. Mode collapse seems to set in once the first epoch has been completed.

Image 1: progression of generator’s outputs and reconstruction over time for shoes2dresses task

The results generated by the model after 30,000 steps are still very checkered. I tried replacing the transposed convolutions with upsamples (see here) but it resulted in a mode collapse very quickly.
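One commonly cited explanation for checkered outputs is uneven kernel overlap in transposed convolutions when the kernel size is not divisible by the stride. Whether that is the cause here is not confirmed. The sketch below only counts, in 1-D, how many kernel taps land on each output position. The kernel size 3 and stride 2 are assumed for illustration, and `overlap_counts` is a hypothetical helper, not part of any library.

```python
def overlap_counts(n_inputs, kernel_size, stride):
    """Count how many kernel taps land on each output position of a
    1-D transposed convolution without padding."""
    out_len = (n_inputs - 1) * stride + kernel_size
    counts = [0] * out_len
    for i in range(n_inputs):
        # Input position i writes to outputs i*stride .. i*stride + kernel_size - 1.
        for j in range(kernel_size):
            counts[i * stride + j] += 1
    return counts


# Assumed setting: kernel 3, stride 2. Interior positions alternate
# between one and two contributions, which can show up as a checkered pattern.
uneven = overlap_counts(4, kernel_size=3, stride=2)

# Kernel 4, stride 2: the kernel size is divisible by the stride, so every
# interior position receives the same number of contributions.
even = overlap_counts(4, kernel_size=4, stride=2)

print(uneven)
print(even)
```

Upsampling followed by a regular convolution avoids this uneven overlap by construction, which is why it is often suggested as a replacement, though as noted above it did not train stably in this case.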

I am not sure if I have already reached the best performance I can expect with such small images (64×64) but it’s hard to believe since this would imply that the “best results” are achieved before the GAN has seen the whole training set. I am unable to get anything comparable with larger images (128×128 or 256×256).

Additionally, I feel like I am battling many issues at the same time. First, I am finding it difficult to get high-quality outputs. Second, the GAN is very unstable and collapses frequently. Finally, it is often the case that even though the GAN produces the same output for any input, it is still able to reconstruct the input image (see image 2).

Image 2: the input is reconstructed well from the output of a mode-collapsed generator
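Because the reconstruction can stay good while the generator collapses, the cycle loss alone will not flag the problem. A rough way to catch it is to measure how different the generator's outputs are across different inputs. The sketch below is a minimal version of that idea, assuming images are available as flattened lists of floats; `mean_abs_diff` and `output_diversity` are hypothetical helper names, and any threshold for "collapsed" would have to be chosen empirically.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def output_diversity(outputs):
    """Average pairwise distance between generator outputs.

    `outputs` is a list of at least two flattened images, one per distinct
    input. A value near 0 means the outputs are nearly identical, which is
    what a collapsed generator produces.
    """
    pairs = [
        mean_abs_diff(outputs[i], outputs[j])
        for i in range(len(outputs))
        for j in range(i + 1, len(outputs))
    ]
    return sum(pairs) / len(pairs)


# Toy data: three identical outputs versus three clearly different ones.
collapsed = [[0.5, 0.5, 0.5, 0.5]] * 3
varied = [[0.0] * 4, [1.0] * 4, [0.5] * 4]

print(output_diversity(collapsed))
print(output_diversity(varied))
```

Logging this value on a fixed set of inputs every few thousand steps would show when diversity starts to drop, independently of the reconstruction quality.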

I have already experimented with many of the training parameters: batch size, learning rate, decay rate, loss function composition, feature matching, Wasserstein loss, noise added to the input images, extra layers in the discriminator and generator, upsample operations in place of transposed convolutions, and the buffer size. With larger images, I also tried keeping the kernel size proportional to the 64×64 image ratio. However, since there are so many things to change, my experiments were not exhaustive. In particular, I feel like I could explore the decay rates and the composition of the loss function in more detail. Things that are on my list but that I haven’t tried yet are mini-batch discrimination, packing (from PACGAN), different normalization (currently, I am using InstanceNorm), and logit loss (from LOGAN).
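For readers unfamiliar with the two knobs singled out above, here is a minimal sketch of how a CycleGAN-style generator loss is typically composed and how a linear learning-rate decay is typically scheduled. The weights 10 and 5 are the commonly used defaults and are assumed here; the fork's actual values may differ. Both function names are hypothetical.

```python
def generator_loss(gan, cycle, identity, lambda_cycle=10.0, lambda_identity=5.0):
    """Weighted sum of the adversarial, cycle-consistency and identity terms.

    The default weights are assumed, commonly used values, not values
    confirmed for the implementation discussed in this post.
    """
    return gan + lambda_cycle * cycle + lambda_identity * identity


def linear_decay_factor(epoch, n_epochs, decay_start_epoch):
    """Learning-rate multiplier: 1.0 until decay_start_epoch, then a
    linear ramp down to 0.0 at n_epochs."""
    if not 0 <= decay_start_epoch < n_epochs:
        raise ValueError("decay_start_epoch must be in [0, n_epochs)")
    epochs_into_decay = max(0, epoch - decay_start_epoch)
    return 1.0 - epochs_into_decay / (n_epochs - decay_start_epoch)


# With these weights the cycle term dominates unless it is already small.
total = generator_loss(gan=1.0, cycle=0.2, identity=0.1)
print(total)

# Multiplier at the start, halfway through the decay, and at the end.
print([linear_decay_factor(e, 200, 100) for e in (0, 150, 200)])
```

Lowering `lambda_cycle` gives the adversarial term more influence, at the cost of weaker pressure to preserve the input, so it is one of the cheaper things to sweep.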

Any advice would be greatly appreciated, as I have spent many hours on this and, without positive feedback, I am not learning much.

I’d be happy to give more info.

submitted by /u/lujkuj