[D] Your opinions on my residual deconvolution implementation

I am currently looking for a way to generate a mask from an input image, and my approach works as follows:

– conv + activation + max-pooling from the input image until I get down to a given smaller size. Every block halves the input size, so there is max-pooling after every convolution

– upsampling + conv + activation until I get back to the input resolution, so there is upsampling before each convolution

Bear with me: this encoder-decoder architecture is required, as is the use of upsampling + conv instead of transposed convolution (deconvolution), so please take it as given.
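
To make the setup concrete, here is a minimal sketch of that plain (non-residual) version. I'm writing it in Keras style; the framework, depth, filter counts, and the final sigmoid 1x1 conv for the mask head are just placeholder choices:

    from tensorflow.keras import layers, Model

    def plain_encoder_decoder(input_shape=(256, 256, 3), depth=4, base_filters=32):
        """Plain encoder-decoder: conv + activation + max-pooling down,
        then upsampling + conv + activation back up to the input resolution."""
        inputs = layers.Input(shape=input_shape)
        x = inputs

        # Encoder: every block halves the spatial size via max-pooling
        for i in range(depth):
            x = layers.Conv2D(base_filters * 2 ** i, 3, padding="same")(x)
            x = layers.Activation("relu")(x)
            x = layers.MaxPooling2D(2)(x)

        # Decoder: upsampling before each convolution (no transposed convolutions)
        for i in reversed(range(depth)):
            x = layers.UpSampling2D(2)(x)
            x = layers.Conv2D(base_filters * 2 ** i, 3, padding="same")(x)
            x = layers.Activation("relu")(x)

        # 1-channel mask at the input resolution (placeholder output head)
        mask = layers.Conv2D(1, 1, activation="sigmoid")(x)
        return Model(inputs, mask)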

The model works as expected, and to improve its quality I decided to add residual connections, in particular the full pre-activation variant shown below, given its improved performance. I am not using BatchNormalization, so don't take it into account.

(figure: the rightmost version of the full pre-activation block, without BN)

For the encoder, each block is defined as follows (a code sketch follows the list):

– activation of the input

– conv

– max-pooling

– creation of the shortcut, defined as projection + max_pooling of the original input

– addition
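
In code, one encoder block looks roughly like this (Keras-style sketch as above; the 1x1 convolution is what I mean by the projection, and the filter count is a placeholder):

    from tensorflow.keras import layers

    def residual_encoder_block(x, filters):
        """Full pre-activation residual encoder block, no BN."""
        # Main path: activation -> conv -> max-pooling
        y = layers.Activation("relu")(x)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        y = layers.MaxPooling2D(2)(y)

        # Shortcut: 1x1 projection + max-pooling of the original input
        s = layers.Conv2D(filters, 1, padding="same")(x)
        s = layers.MaxPooling2D(2)(s)

        # Addition
        return layers.Add()([y, s])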

Now, the interesting part: I want to build a residual deconvolutional architecture for the decoder, and I’m not entirely sure if what I ended up with is the right way of doing it:

– residual encoder, down to the encoded size

– activation (since the last encoder residual block doesn't have one after the addition)

Each decoder block is defined as follows (see the sketch below):

– upsampling

– activation

– conv

– creation of the shortcut, defined as upsampling + projection of the original input

– addition

Then I get my mask.
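
In code, one decoder block looks roughly like this (same Keras-style assumptions as the sketches above):

    from tensorflow.keras import layers

    def residual_decoder_block(x, filters):
        """Full pre-activation residual decoder block, no BN."""
        # Main path: upsampling -> activation -> conv
        y = layers.UpSampling2D(2)(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(filters, 3, padding="same")(y)

        # Shortcut: upsampling + 1x1 projection of the original input
        s = layers.UpSampling2D(2)(x)
        s = layers.Conv2D(filters, 1, padding="same")(s)

        # Addition
        return layers.Add()([y, s])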

What are your thoughts on my approach? The traditional way of downsizing the shortcut, a convolution with kernel_size of 1 and strides of 2, discards 75% of the information at each step, and I would like to find a way to avoid that.
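
For reference, this is the traditional shortcut downsizing I mean; with strides of 2 the 1x1 convolution only reads one of every four spatial positions, which is where the 75% figure comes from:

    from tensorflow.keras import layers

    def strided_projection_shortcut(x, filters):
        """Traditional ResNet-style shortcut downsizing: a 1x1 convolution
        with strides of 2 samples one of every four spatial positions and
        simply skips the other three (unlike the projection + max-pooling
        shortcut in the encoder block above)."""
        return layers.Conv2D(filters, 1, strides=2, padding="same")(x)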

Thank you for your feedback!

submitted by /u/HitLuca