I am currently looking for a solution for mask generation from an input image, and my approach works as follows:
– conv + activation + max-pooling from the input image until I reach a given smaller size. Every block halves the input size, so there is a max-pooling after every convolution
– upsampling + conv + activation until I get back to the input resolution, with an upsampling before each convolution
Bear with me: this encoder-decoder architecture is required, as is the use of upsampling + conv instead of a transposed convolution (or deconvolution), so take it as given.
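For reference, here is a rough sketch of that baseline in Keras (the filter counts, depth, input shape and the final 1x1 sigmoid layer are just placeholders to illustrate the structure, not my actual values):

    import tensorflow as tf
    from tensorflow.keras import layers

    def baseline_model(input_shape=(256, 256, 3), depth=3, filters=32):
        inputs = layers.Input(shape=input_shape)
        x = inputs

        # Encoder: conv + activation + max-pooling, halving the resolution in every block
        for i in range(depth):
            x = layers.Conv2D(filters * 2 ** i, 3, padding="same")(x)
            x = layers.Activation("relu")(x)
            x = layers.MaxPooling2D(pool_size=2)(x)

        # Decoder: upsampling + conv + activation, back up to the input resolution
        for i in reversed(range(depth)):
            x = layers.UpSampling2D(size=2)(x)
            x = layers.Conv2D(filters * 2 ** i, 3, padding="same")(x)
            x = layers.Activation("relu")(x)

        # Final 1x1 convolution producing the mask (placeholder output head)
        mask = layers.Conv2D(1, 1, activation="sigmoid")(x)
        return tf.keras.Model(inputs, mask)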
The model works as expected, and to improve its quality I decided to go with residual connections, in particular the full pre-activation variant shown below, given its improved performance. I am not using BatchNormalization, so don’t take it into account.
For the encoder, I have each block defined as:
– activation of the input
– creation of the shortcut, defined as projection + max_pooling of the original input
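In code, one of my encoder blocks looks roughly like this (a sketch only: I am assuming the usual full pre-activation ordering of activation -> conv -> activation -> conv for the main path, and that the main path is also downsampled so the shapes match for the addition; I left those steps out of the list above):

    from tensorflow.keras import layers

    def residual_encoder_block(x, filters):
        # Main path, full pre-activation style without BatchNormalization
        # (assumed ordering: activation -> conv -> activation -> conv)
        y = layers.Activation("relu")(x)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        # Assumed: downsample the main path as well so it matches the shortcut
        y = layers.MaxPooling2D(pool_size=2)(y)

        # Shortcut: projection (1x1 conv) + max_pooling of the original input
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)
        shortcut = layers.MaxPooling2D(pool_size=2)(shortcut)

        return layers.Add()([y, shortcut])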
Now, the interesting part: I want to build a residual deconvolutional architecture for the decoder, and I’m not entirely sure if what I ended up with is the right way of doing it:
– residual encoder, up to the encoded size
– activation (as the last encoder residual block doesn’t have one after the addition)
Each decoder block is then defined as:
– creation of the shortcut, defined as upsampling + projection of the original input
Then I get my mask.
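As a sketch, a decoder block then becomes something like the following (the exact placement of the first activation and the ordering of the main path are assumptions on my part, and part of what I am unsure about):

    from tensorflow.keras import layers

    def residual_decoder_block(x, filters):
        # Main path: pre-activation, then upsampling before the convolutions
        # (assumed ordering: activation -> upsampling -> conv -> activation -> conv)
        y = layers.Activation("relu")(x)
        y = layers.UpSampling2D(size=2)(y)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(filters, 3, padding="same")(y)

        # Shortcut: upsampling + projection (1x1 conv) of the original input
        shortcut = layers.UpSampling2D(size=2)(x)
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)

        return layers.Add()([y, shortcut])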
What are your thoughts on my approach? The traditional way of downsizing the shortcut, a convolution with kernel_size of 1 and strides of 2, only reads one position in every 2x2 patch, so it throws away 75% of the information at each step, and I would like to find a way to avoid that.
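To make the comparison concrete, here are the two shortcut variants side by side (a sketch; the names are mine):

    from tensorflow.keras import layers

    def shortcut_strided(x, filters):
        # Traditional shortcut: 1x1 convolution with stride 2.
        # Only one position in every 2x2 patch is read, so 75% of the
        # spatial positions never contribute to the shortcut.
        return layers.Conv2D(filters, 1, strides=2, padding="same")(x)

    def shortcut_pooled(x, filters):
        # My variant: 1x1 projection at full resolution, then max_pooling,
        # so every position is at least considered before the resolution is halved.
        x = layers.Conv2D(filters, 1, padding="same")(x)
        return layers.MaxPooling2D(pool_size=2)(x)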
Thank you for your feedback!