[P] Image + Text input classification
Hi, I'm trying to build a network that will run in real-world production, inspired by this article.
The author predicts a product's label from one image input and one product-name text input.
My dataset has 6 attributes (5 image inputs and 1 text input) and 1 class label (the output). So I want to create a model that takes 5 product images plus 1 description text as input and predicts that product's category.
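To make the setup concrete, here is a rough sketch of the kind of model I have in mind. It assumes PyTorch, and everything in it is a placeholder rather than the article's architecture: the class name `MultiInputClassifier`, the tiny shared CNN (a stand-in for a pretrained backbone), the averaged-embedding text branch, and all layer sizes are hypothetical. It also assumes fixed-length token ids with no padding handling.

```python
import torch
import torch.nn as nn


class MultiInputClassifier(nn.Module):
    """Hypothetical sketch: 5 product images + 1 text description -> category logits."""

    def __init__(self, vocab_size=1000, num_classes=10, embed_dim=32, feat_dim=64):
        super().__init__()
        # One small CNN shared across all 5 images (stand-in for a pretrained backbone).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, feat_dim),
            nn.ReLU(),
        )
        # Text branch: mean of token embeddings, projected to a feature vector.
        self.text_encoder = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.text_proj = nn.Linear(embed_dim, feat_dim)
        # Classifier over the concatenated image and text features.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, images, tokens):
        # images: (batch, 5, 3, H, W); tokens: (batch, seq_len) integer ids
        b, n, c, h, w = images.shape
        img_feats = self.image_encoder(images.reshape(b * n, c, h, w))
        img_feats = img_feats.reshape(b, n, -1).mean(dim=1)  # average over the 5 images
        txt_feats = torch.relu(self.text_proj(self.text_encoder(tokens)))
        fused = torch.cat([img_feats, txt_feats], dim=1)
        return self.classifier(fused)  # raw logits, one row per product


model = MultiInputClassifier()
images = torch.randn(4, 5, 3, 64, 64)     # 4 products, 5 RGB images each
tokens = torch.randint(0, 1000, (4, 12))  # 4 descriptions, 12 token ids each
logits = model(images, tokens)
print(logits.shape)  # torch.Size([4, 10])
```

Averaging the five image feature vectors is only one option; concatenating them would keep per-slot information at the cost of a larger classifier input.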
My questions are:
How would using something like this for text feature extraction (without the softmax layer) affect the model? Should I even do this?
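By "without softmax" I mean something like the sketch below. It assumes PyTorch, and `text_clf` is a hypothetical toy classifier standing in for the linked model, which may be structured quite differently. The idea is to cut off the class-score layer and the softmax and keep the hidden activations as text features.

```python
import torch
import torch.nn as nn

# Hypothetical text classifier; a stand-in, not the linked model.
text_clf = nn.Sequential(
    nn.EmbeddingBag(1000, 32, mode="mean"),  # token ids -> averaged embedding
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),   # class scores
    nn.Softmax(dim=1),   # class probabilities
)

# Drop the last two layers (class scores and softmax) to keep the 64-d hidden features.
feature_extractor = text_clf[:-2]

tokens = torch.randint(0, 1000, (4, 12))  # 4 descriptions, 12 token ids each
features = feature_extractor(tokens)
print(features.shape)  # torch.Size([4, 64])
```

These 64-dimensional vectors would then be concatenated with the image features before the final classifier.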