[D] Word embeddings for categorical variables?

Written by torontoai on July 2, 2019. Posted in Reddit MachineLearning.

I am working on a classification problem with a data set containing both numerical and categorical data. A colleague of mine said that instead of encoding the categorical variables in a “primitive way” (label encoding, creating dummy variables, etc.), he would use word2vec to get some kind of word embeddings, which he considers a more realistic way of representing these variables. To me this makes no sense. If I understand correctly, word2vec learns a word’s embedding from its neighbors, so the words we want to embed need some kind of context. In a DataFrame column containing one string per row and maybe 3 to 10 unique categories, there isn’t any context: each entry is independent of the entry in the next row. Am I missing something?
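To make the concern concrete, here is a minimal sketch in plain Python. The `colors` column and all three helper functions are made up for illustration and are not taken from any library. It shows the two "primitive" encodings next to the (center, context) pairs a skip-gram style model would train on, under the assumption that each row is fed in as its own one-token "sentence".

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator vector per row (dummy variables)."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


def skipgram_pairs(tokens, window=2):
    """List the (center, context) pairs a skip-gram model would train on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical column: one category string per row.
colors = ["red", "green", "blue", "green", "red"]

labels, mapping = label_encode(colors)
dummies, categories = one_hot_encode(colors)

# A real sentence gives every word neighbors to learn from.
sentence_pairs = skipgram_pairs(["the", "cat", "sat", "down"], window=1)

# Assumption: each row is treated as its own one-token "sentence".
# A single token has no neighbors, so no training pairs come out.
row_pairs = [pair for value in colors for pair in skipgram_pairs([value])]
```

In this sketch the sentence yields training pairs while the column yields none, which is the missing-context problem I mean.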

I hope I posed the question in a somewhat understandable way.

Thanks, guys.

submitted by /u/aeppelsaeft
