[D] Learning the rotation of 2d images with a CNN
Hi all, this is my first post here so I hope I do this right…
I’m currently trying to get a CNN to learn the rotation angle of 2D images. I first posted this on Stack Overflow because I wasn’t sure whether the question fits this subreddit, but so far that thread hasn’t gained any traction… see here.
I hope it’s ok if I just quote the question from there:
I am trying to build a CNN (in Keras) that can estimate the rotation of an image (or a 2d object). So basically, the input is an image and the output should be its rotation.
My first experiment is to estimate the rotation of MNIST digits (starting with only one digit “class”, let’s say the “3”). So what I did was extract all 3s from the MNIST set and then build a “rotated 3s” dataset by randomly rotating these images multiple times and storing the rotated images together with their rotation angles as ground truth labels.
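A rough sketch of how such a dataset could be built, assuming NumPy and SciPy are available. The function name `build_rotated_dataset` is made up for illustration, the random arrays are only a placeholder for the extracted 3s, and drawing angles from a 20° grid is an assumption based on the angles mentioned further down:

```python
import numpy as np
from scipy.ndimage import rotate


def build_rotated_dataset(images, rotations_per_image, rng):
    """Rotate each image several times; return (rotated_images, angles_deg)."""
    grid = np.arange(0, 360, 20)  # assumed angle grid: 0, 20, ..., 340 degrees
    rotated, angles = [], []
    for img in images:
        for _ in range(rotations_per_image):
            ang = float(rng.choice(grid))
            # reshape=False keeps the original 28x28 shape after rotation
            rotated.append(rotate(img, ang, reshape=False, order=1))
            angles.append(ang)
    return np.stack(rotated), np.array(angles)


# Placeholder input: five random 28x28 arrays standing in for the MNIST 3s
rng = np.random.default_rng(0)
fake_threes = rng.random((5, 28, 28))
X, angles_deg = build_rotated_dataset(fake_threes, rotations_per_image=4, rng=rng)
print(X.shape, angles_deg.shape)
```

With real data, `fake_threes` would be replaced by the 28x28 images of the digit 3 taken from MNIST.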
So my first problem was that a 2d rotation is cyclic and I didn’t know how to model this behavior. Therefore, I encoded the angle as y=sin(ang), x = cos(ang). This gives me my dataset (the rotated 3s images) and the corresponding labels (x and y values).
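A minimal sketch of this encoding, assuming NumPy; the helper name `encode_angle` is hypothetical. It also shows that 0° and 360° map to the same label, which is the cyclic behavior the encoding is meant to capture:

```python
import numpy as np


def encode_angle(angles_deg):
    """Map angles in degrees to (x, y) = (cos(ang), sin(ang)) labels."""
    rad = np.deg2rad(angles_deg)
    return np.stack([np.cos(rad), np.sin(rad)], axis=-1)


labels = encode_angle(np.array([0.0, 90.0, 180.0, 340.0, 360.0]))

# Every label lies on the unit circle
print(np.linalg.norm(labels, axis=-1))

# 0 and 360 degrees produce (numerically) the same label
print(np.allclose(labels[0], labels[4]))
```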
For the CNN, as a start, I just took the Keras MNIST CNN example (https://keras.io/examples/mnist_cnn/) and replaced the last dense layer (which had 10 outputs and a softmax activation) with a dense layer that has 2 outputs (x and y) and a tanh activation (since y=sin(ang), x = cos(ang) are within [-1,1]).
The last thing I had to decide was the loss function, where I basically want a distance measure for angles. Therefore I thought “cosine_proximity” was the way to go.
When training the network I can see that the loss is decreasing and converging to a certain point. However, when I then check the predictions against the ground truth, I observe a (for me) fairly surprising behavior. Almost all x and y predictions tend towards 0 or +/-1. And since the “decoding” of my rotation is ang=atan2(y,x), the predictions are usually 0°, +/-45°, +/-90°, +/-135° or 180°. However, my training and test data only contain angles of 0°, 20°, 40°, … 360°. This doesn’t really change if I change the complexity of the network. I also played around with the optimizer parameters without any success.
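To make the decoding step concrete, here is a small sketch (NumPy assumed, `decode_angle` is a made-up helper name). It decodes the eight (x, y) combinations that are possible when each output sits at 0 or +/-1, and they all land on multiples of 45°:

```python
import numpy as np


def decode_angle(xy):
    """Recover the angle in degrees from (x, y) pairs via atan2(y, x)."""
    return np.rad2deg(np.arctan2(xy[..., 1], xy[..., 0]))


# All (x, y) pairs where both outputs are saturated at 0 or +/-1
# (excluding the undefined case x = y = 0)
saturated = np.array(
    [[1, 0], [1, 1], [0, 1], [-1, 1], [-1, 0], [-1, -1], [0, -1], [1, -1]],
    dtype=float,
)
decoded = decode_angle(saturated)
print(decoded)  # only multiples of 45 degrees, in the range (-180, 180]
```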
Is there anything wrong with the assumptions:
x,y encoding for angle
tanh activation to have values in [-1,1]
cosine_proximity as loss function
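For reference, a NumPy sketch of what cosine_proximity computes as far as I understand it: the negative dot product of the L2-normalized target and prediction. This is a re-implementation for illustration, not the Keras source, and the `eps` guard is my own addition. It shows that the loss depends only on the direction of the predicted (x, y) vector, not on its length:

```python
import numpy as np


def cosine_proximity(y_true, y_pred, eps=1e-12):
    """Negative cosine similarity between target and prediction vectors."""
    t = y_true / np.maximum(np.linalg.norm(y_true, axis=-1, keepdims=True), eps)
    p = y_pred / np.maximum(np.linalg.norm(y_pred, axis=-1, keepdims=True), eps)
    return -np.sum(t * p, axis=-1)


# Target for an angle of 20 degrees
ang = np.deg2rad(20.0)
target = np.array([np.cos(ang), np.sin(ang)])

# Predictions pointing in the same direction but with different lengths
same_direction = np.stack([0.1 * target, 0.5 * target, 1.0 * target])
losses = cosine_proximity(target, same_direction)
print(losses)  # all approximately -1, the minimum of this loss

# A prediction at a right angle to the target gives a loss of about 0
orthogonal = np.array([-np.sin(ang), np.cos(ang)])
print(cosine_proximity(target, orthogonal))
```

So with this loss, nothing forces the two outputs to lie on the unit circle; only their ratio matters.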
Thanks in advance for any advice, tips or pointers to a possible mistake I made!
If this is the wrong place for this question, I’m sorry and would be happy if someone could point me to the right forum or subreddit!