[P] Anyone had any luck training Deep Recursive Networks from scratch?
I’ve been working on a single-image super-resolution (SISR) project lately, and I’m trying to reproduce some interesting network architectures. I’m having trouble replicating the paper on Deep Recursive Residual Networks (DRRN).
I’m training a DRRN with 4 residual blocks, each applied recursively 3 times, to reconstruct face images. Training usually starts smoothly, but somewhere along the way the loss explodes and all progress is lost. This seems to be an artifact of the architecture: each filter’s weights are shared across the recursions, so its gradient is the sum of the contributions from every application, and that sum can get large.
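To make the shared-weight intuition concrete, here is a toy scalar sketch (not the actual DRRN code). It assumes each recursive block computes h_u = h0 + w * h_{u-1}, with a single shared weight w standing in for the shared filters; the function names and the scalar simplification are assumptions made purely for illustration. Backprop adds one gradient term for every reuse of w, so the total grows with the number of recursions and with the size of w.

```python
def recursive_block(h0, w, num_units):
    """Toy scalar recursive block: h_u = h0 + w * h_{u-1}, same w every time."""
    states = [h0]
    h = h0
    for _ in range(num_units):
        h = h0 + w * h
        states.append(h)
    return states


def grad_wrt_shared_weight(h0, w, num_units):
    """Backprop d(h_U)/d(w), accumulating one term per application of w."""
    states = recursive_block(h0, w, num_units)
    grad_w = 0.0
    upstream = 1.0  # d(h_U)/d(h_U)
    for u in range(num_units, 0, -1):
        grad_w += upstream * states[u - 1]  # contribution of the u-th use of w
        upstream *= w                       # gradient flowing back to h_{u-1}
    return grad_w


# With h0 = 1 the output is 1 + w + w^2 + w^3, so the gradient is 1 + 2w + 3w^2.
print(grad_wrt_shared_weight(1.0, 2.0, 3))  # 17.0
```

In this toy setting the gradient for a weight used once would be a single term; reusing it three times gives three terms, and the later ones are scaled by powers of w, which is one way a modest weight change could produce a sudden large gradient.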
I’ve tried to follow the paper’s original implementation details as closely as I could: SGD with 0.9 momentum, the largest batch size that fits in my available VRAM (16, which is admittedly far from the original 128), a decreasing learning rate, and gradient clipping based on the current learning rate. SGD gave horrible results and the network diverged, so I switched to Adam, which worked better. Still, nothing has helped with the random jumps in the loss.
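For reference, "gradient clipping based on the current learning rate" is sketched below under the assumption that it means the adjustable scheme where each gradient value is clipped to [-theta / lr, theta / lr]. The function name, the plain-list representation of gradients, and the default theta are illustrative assumptions, not taken from my Keras code.

```python
def clip_by_learning_rate(grads, learning_rate, theta=0.01):
    """Clip each gradient value to [-theta / lr, theta / lr].

    As the learning rate decays the bound widens, so the largest possible
    plain-SGD step (lr * bound = theta) stays the same throughout training.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    bound = theta / learning_rate
    return [max(-bound, min(bound, g)) for g in grads]


# Example with theta = 1.0 and lr = 0.5, so the bound is 2.0.
print(clip_by_learning_rate([5.0, -5.0, 0.5], learning_rate=0.5, theta=1.0))
# [2.0, -2.0, 0.5]
```

Note that the constant-step property only holds for plain SGD updates; with momentum or Adam the relationship between the clipped gradient and the actual step is less direct, so the same theta may not carry over.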
I was just wondering whether this is normal and whether anyone else has run into it, or whether it’s down to some implementation error on my side (I’m using Keras and the code is kinda messy, but I can make it available on request). If you’ve seen this, can you share some tips on how to train these networks properly?