[Research] What happens when CNN devised for 224x224x3 images is trained on high resolution imagery ?
I’m training a CNN on high resolution satellite imagery. Because of hardware constraints, I’m using
EB0) to predict classes from 800x800x3 tiles. This can be done thanks to the 2D
GlobalAveragePooling layer at the end of the model which compresses the 25x25x1280 feature map (7x7x1280 when working with the original 224×224 images) into a 1D 1280 vector to be fed to dense layers and such. It works surprisingly well. Even more, I get better performance with
EB0 used on 800x800x3 tiles than with
EB5 applied on 456x456x3 tiles (the resolution for which they were designed).
How is this possible?