[D] Your opinions on my residual deconvolution implementation
I am currently looking for a solution for mask generation given an input image, and my approach works as follows:

– conv + activation + max-pooling from the input image until I get down to a given smaller size. Every block halves the input size, so expect max-pooling after every convolution

– upsampling + conv + activation until I get back to the input resolution. Expect upsampling before each convolution

Bear with me: this encoder-decoder architecture is required, as is the upsampling + conv instead of transposed convolution (or deconvolution), so take it as a given. The model works as expected, and in order to improve its quality I decided to go with residual connections, in particular the full pre-activation variant, given its improved performance. I am not using BatchNormalization, so don't take it into account.

For the encoder, each block is defined as:

– activation of the input

– conv

– max-pooling

– creation of the shortcut, defined as projection + max-pooling of the original input

– addition

Now, the interesting part: I want to build a residual deconvolutional architecture for the decoder, and I'm not entirely sure if what I ended up with is the right way of doing it:

– residual encoder, down to the encoded size

– activation (as the last encoder residual block doesn't have one after the addition)

Each decoder block is then defined as:

– upsampling

– activation

– conv

– creation of the shortcut, defined as upsampling + projection of the original input

– addition

Then I get my mask.

What are your thoughts on my approach? Using the traditional approach for downsizing the shortcut (convolutions with kernel_size of 1 and strides of 2) loses 75% of the information at each step, and I would like to find a way to avoid that.

Thank you for your feedback!

submitted by /u/HitLuca
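For concreteness, here is a minimal sketch of how I read the two block definitions above, written with tf.keras (the framework, filter counts, 3x3 kernel size, ReLU activation, and the toy 32x32 model at the end are all my own illustrative assumptions, not part of the original description):

```python
# Sketch of the described residual encoder/decoder blocks (tf.keras assumed;
# filter counts, kernel sizes, and activations are illustrative choices).
import tensorflow as tf
from tensorflow.keras import layers


def residual_encoder_block(x, filters):
    """Full pre-activation residual block that halves spatial size.
    Main path: activation -> conv -> max-pooling.
    Shortcut: 1x1 projection + max-pooling of the block input."""
    main = layers.Activation("relu")(x)
    main = layers.Conv2D(filters, 3, padding="same")(main)
    main = layers.MaxPooling2D(2)(main)
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # projection
    shortcut = layers.MaxPooling2D(2)(shortcut)
    return layers.Add()([main, shortcut])


def residual_decoder_block(x, filters):
    """Residual block that doubles spatial size.
    Main path: upsampling -> activation -> conv.
    Shortcut: upsampling + 1x1 projection of the block input."""
    main = layers.UpSampling2D(2)(x)
    main = layers.Activation("relu")(main)
    main = layers.Conv2D(filters, 3, padding="same")(main)
    shortcut = layers.UpSampling2D(2)(x)
    shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.Add()([main, shortcut])


# Toy 32x32 example: two encoder blocks down to 8x8, then two decoder
# blocks back up to the input resolution, ending in a 1-channel mask head.
inputs = layers.Input((32, 32, 3))
x = residual_encoder_block(inputs, 16)
x = residual_encoder_block(x, 32)
x = layers.Activation("relu")(x)  # last encoder block has no post-addition activation
x = residual_decoder_block(x, 16)
x = residual_decoder_block(x, 8)
mask = layers.Conv2D(1, 1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, mask)
```

Note that the encoder shortcut here is a stride-1 projection followed by max-pooling, as described, rather than a single stride-2 convolution.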
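To make the 75% figure concrete: a 1x1 convolution with stride 2 reads exactly one pixel out of each non-overlapping 2x2 window and never touches the other three. The NumPy snippet below (my own illustration, not from the post) shows this on a 4x4 map, alongside one commonly used alternative: average-pooling each 2x2 window first and then applying a stride-1 1x1 projection, so every input position contributes to the downsampled shortcut.

```python
# Why a 1x1 conv with stride 2 discards 75% of spatial positions,
# and an avg-pool-then-project alternative that keeps all of them.
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)  # a 4x4 single-channel feature map

# With an identity weight, a 1x1 conv at stride 2 is just strided slicing:
# only 4 of the 16 values (25%) ever reach the output.
strided = x[::2, ::2]

# Alternative: 2x2 average pooling before the (stride-1) 1x1 projection.
# Reshape to (rows, 2, cols, 2) windows and average each window, so every
# input pixel contributes to the result.
pooled = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))
```

Whether the averaged shortcut actually helps the mask quality is an empirical question, but it directly addresses the stated concern about discarded information.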