[P] Need help with Image Captioning
Hello, I’m trying to learn CNNs and I’ve hit a dead end with an image captioning project I’m working on for fun.
Dataset: 10k images from Google Conceptual Captions
Tutorial I’m mostly following: Automatic Image Captioning
One difference between my dataset and the Flickr8k dataset used in the tutorial is that mine has only one caption per image, whereas Flickr8k has five captions per image.
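To make the caption-count difference concrete, here is a minimal sketch of one common way captions are turned into training examples for next-word prediction: each caption is expanded into (prefix, next word) pairs. This is an assumption about the general approach, not necessarily what the tutorial does; the function name `caption_to_pairs` and the `<start>`/`<end>` tokens are made up for illustration.

```python
def caption_to_pairs(caption, start="<start>", end="<end>"):
    """Expand one caption into (prefix, next_word) training pairs.

    Hypothetical helper for illustration; the real pipeline would also
    pair every prefix with the image's feature vector.
    """
    tokens = [start] + caption.lower().split() + [end]
    # Each position i yields: words seen so far -> word to predict.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = caption_to_pairs("A dog runs")
for prefix, target in pairs:
    print(prefix, "->", target)
```

Under this scheme, an image with five captions contributes five sets of such pairs, while an image with one caption contributes only one, so the model sees far less variety in how each image can be described.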
The problem is that I am getting the same caption for nearly all images. I have tried:
- LSTM instead of GRU cells
- 50- and 200-dimensional GloVe word embeddings, as well as training my own embeddings on all the captions in the dataset
- both beam search and greedy search to generate predictions
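For anyone unfamiliar with the two decoding strategies, here is a minimal sketch of both. It is not my actual model code: `toy_next_word_probs` and the `TOY` table are hypothetical stand-ins for a trained captioning model, and they ignore the image entirely.

```python
import math

# Hypothetical toy "model": next-word probabilities keyed on the last word only.
TOY = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def toy_next_word_probs(words):
    """Stand-in for model.predict(image_features, words)."""
    return TOY[words[-1]]


def greedy_decode(next_word_probs, start="<start>", end="<end>", max_len=20):
    """Pick the single most probable next word at every step."""
    words = [start]
    for _ in range(max_len):
        probs = next_word_probs(words)
        best = max(probs, key=probs.get)
        words.append(best)
        if best == end:
            break
    return words


def beam_decode(next_word_probs, beam_width=3, start="<start>", end="<end>", max_len=20):
    """Keep the beam_width highest-scoring partial captions at every step."""
    beams = [(0.0, [start])]  # (sum of log-probabilities, words so far)
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            if words[-1] == end:
                candidates.append((score, words))  # finished caption, carry over
                continue
            for word, p in next_word_probs(words).items():
                if p > 0:
                    candidates.append((score + math.log(p), words + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(words[-1] == end for _, words in beams):
            break
    return beams[0][1]


print(greedy_decode(toy_next_word_probs))               # <start> a dog <end>
print(beam_decode(toy_next_word_probs, beam_width=2))   # <start> the dog <end>
```

In this toy table, greedy search commits to "a" early (probability 0.6) and ends with a caption of total probability 0.33, while beam search with width 2 finds "the dog" at 0.36. Note that both decoders only rank whatever distribution the model outputs, so if that distribution hardly changes from image to image, either one would be expected to return the same caption each time.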
What do I do?