
[D] Should autoencoders really be symmetric?

I always find myself wanting to make the decoder side of an autoencoder as symmetric as possible with respect to the encoder side, because it feels like an “elegant” design decision. But I suspect that it’s not optimal, and I’m not finding any direct discussion of this topic on Google.

In most of mathematics, complex functions tend to have even more complex inverses. With respect to CNNs, strided convolutions are not strictly invertible (the downsampling is many-to-one, so information is discarded), so it seems like the Conv2DTranspose operations could benefit from higher complexity and a larger parameter count to approximate the inverse better. I’m curious if anyone has direct experience studying this, or if there are conventions for “optimizing” the decoder side of an autoencoder (or maybe it’s the encoder side that needs more parameters…?).
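To make the non-invertibility concrete, here’s a tiny sketch of my own (a deliberately degenerate case, not a layer from the network below): a 1x1-kernel, stride-2 conv only ever reads the even-indexed pixels, so two different inputs collapse to the same output, and no decoder layer could recover the input exactly.

import numpy as np
import tensorflow as tf

# A stride-2 conv maps many input values to fewer output values, so it
# cannot be injective. With kernel_size=1 and stride=2 ('same' padding),
# the output depends only on the even-indexed pixels.
conv = tf.keras.layers.Conv2D(filters=1, kernel_size=1, strides=2,
                              padding='same', use_bias=False)

x = tf.random.normal((1, 4, 4, 1))
perturb = np.zeros((1, 4, 4, 1), dtype=np.float32)
perturb[0, 1, 1, 0] = 5.0  # change an odd-indexed pixel only

y1 = conv(x)
y2 = conv(x + perturb)
print(np.allclose(y1.numpy(), y2.numpy()))  # True: two distinct inputs, one output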

My first inclination is to just double some of the layer widths on the decoder side to give it twice as many parameters (a sketch of what I mean follows the reference code below). But maybe adding extra layers is better, since depth more significantly increases the complexity of the functions it can approximate. Or maybe none of this is theoretically necessary or relevant…?

Here’s an almost perfectly symmetric reference network. Obviously I could experiment with it to come up with ideas, but I’m more interested in the general theory and whether there are any established ideas on the topic (and not just for CNNs, but for all types of autoencoders).

Encoder:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 48, 48, 3)]       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        2432
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 12, 64)        32832
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 6, 6, 128)         73856
_________________________________________________________________
flatten_3 (Flatten)          (None, 4608)              0
_________________________________________________________________
dense_1 (Dense)              (None, 256)                1179904
_________________________________________________________________
dense_2 (Dense)              (None, 64)                16448
=================================================================
Total params: 1,305,472

Decoder:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 64)]              0
_________________________________________________________________
dense_3 (Dense)              (None, 256)               16640
_________________________________________________________________
dense_4 (Dense)              (None, 4608)              1184256
_________________________________________________________________
reshape_1 (Reshape)          (None, 6, 6, 128)         0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 12, 12, 64)        73792
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 24, 24, 32)        32800
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 48, 48, 3)         2403
=================================================================
Total params: 1,309,891

For reference, the above computation graph was produced with the following code fragment:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers as L

# Encoder: three stride-2 convs (48 -> 24 -> 12 -> 6), then two dense
# layers down to a 64-dimensional code.
enc_input = L.Input(shape=(48, 48, 3))
enc0 = L.Conv2D(filters=32, kernel_size=5, strides=2, padding='same', activation='relu')(enc_input)
enc1 = L.Conv2D(filters=64, kernel_size=4, strides=2, padding='same', activation='relu')(enc0)
enc2 = L.Conv2D(filters=128, kernel_size=3, strides=2, padding='same', activation='relu')(enc1)
enc_flat = L.Flatten()(enc2)
enc_dense = L.Dense(256, activation='tanh')(enc_flat)
enc_out = L.Dense(64, activation='linear')(enc_dense)
encoder = keras.Model(inputs=enc_input, outputs=enc_out, name='Encoder')

# Decoder: mirrors the encoder, with Conv2DTranspose in place of Conv2D
# (6 -> 12 -> 24 -> 48).
dec_input = L.Input(shape=(64,))
dec_dense1 = L.Dense(256, activation='tanh')(dec_input)
dec_dense2 = L.Dense(6*6*128, activation='relu')(dec_dense1)
dec_reshape = L.Reshape((6, 6, 128))(dec_dense2)
dec2 = L.Conv2DTranspose(filters=64, kernel_size=3, strides=2, padding='same', activation='relu')(dec_reshape)
dec1 = L.Conv2DTranspose(filters=32, kernel_size=4, strides=2, padding='same', activation='relu')(dec2)
dec0 = L.Conv2DTranspose(filters=3, kernel_size=5, strides=2, padding='same', activation='linear')(dec1)
decoder = keras.Model(inputs=dec_input, outputs=dec0, name='Decoder')

encoder.summary()
decoder.summary()
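And for concreteness, here’s roughly what I mean by “doubling some numbers on the decoder side”: a hypothetical wider decoder with the same depth but twice the units/filters in every hidden layer (the sizes are purely illustrative; I haven’t validated them).

from tensorflow import keras
from tensorflow.keras import layers as L

# Hypothetical "doubled" decoder: same depth, 2x units/filters per hidden
# layer; the input (64-d code) and output (48x48x3) are unchanged.
wide_input = L.Input(shape=(64,))
x = L.Dense(512, activation='tanh')(wide_input)   # 256 -> 512 units
x = L.Dense(6*6*256, activation='relu')(x)        # 128 -> 256 channels
x = L.Reshape((6, 6, 256))(x)
x = L.Conv2DTranspose(filters=128, kernel_size=3, strides=2, padding='same', activation='relu')(x)  # 64 -> 128
x = L.Conv2DTranspose(filters=64, kernel_size=4, strides=2, padding='same', activation='relu')(x)   # 32 -> 64
wide_out = L.Conv2DTranspose(filters=3, kernel_size=5, strides=2, padding='same', activation='linear')(x)
wide_decoder = keras.Model(inputs=wide_input, outputs=wide_out, name='WideDecoder')
wide_decoder.summary()

One caveat with this naive approach: doubling both the input and output width of a layer quadruples its parameter count (the big Dense goes from ~1.18M to ~4.7M params), so a “doubled” decoder actually ends up with far more than 2x the parameters of the original.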

submitted by /u/etotheipi_